Archive for the ‘python’ Category

SQL & integer division (why 5/2 usually equals 2!)

Monday, March 15th, 2010

I came across integer division in Python 2.x. If you divide one integer by another you get an integer result. So 5/2 = 2 instead of 2.5. You get floor division, not true division (Python – Changing the Division Operator). In Python 3, true division is the default (thank goodness) but in Python 2.x you need to make one of the numbers a float to get a float returned. So 5.0/2 = 2.5. I was bitten by this early on and know the standard way of handling it.
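The two kinds of division can be pinned down explicitly so the result does not depend on the Python version. The sketch below is illustrative only; the helper names true_divide and floor_divide are made up for this example.

```python
def true_divide(num, denom):
    # Converting one operand to float forces true division,
    # which is the Python 2.x workaround described above.
    return float(num) / denom


def floor_divide(num, denom):
    # The // operator always floors, in Python 2.x and 3 alike.
    return num // denom


print(true_divide(5, 2))    # 2.5
print(floor_divide(5, 2))   # 2
print(floor_divide(-5, 2))  # -3 (floors towards negative infinity)
```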

What I didn’t know was that integer division is the norm in SQL SELECT statements. I had mainly been using MySQL, and MySQL turned out to be the exception:

MySQL by default does floating point division, even if both operators are of type INTEGER, so the above [1/2] would return 0.5 in MySQL. All of the other database engines tested do integer division, and return an integer result. (SQLite – Differences Between Engines).

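Anyway, in SOFA Statistics, row and column percentages were affected by this behaviour and always returned x.0 %. There was never anything other than zero after the decimal point. The fix was very simple. Instead of SELECT … 100*num/denom the relevant code is SELECT … 100.0*num/denom. The 100 is now a float for those who missed that small but significant difference. Note that the multiplication has to come before the division: 100.0*(num/denom) would still perform the integer division inside the parentheses first and only then convert the result to a float.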
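The effect is easy to reproduce with the sqlite3 module from the standard library. This is a minimal sketch written for Python 3, not the SOFA Statistics code; the table name counts and the helper pct are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical table holding one numerator/denominator pair
cur.execute("CREATE TABLE counts (num INTEGER, denom INTEGER)")
cur.execute("INSERT INTO counts VALUES (1, 3)")


def pct(expr):
    # Evaluate a percentage expression against the single row
    cur.execute("SELECT %s FROM counts" % expr)
    return cur.fetchone()[0]


int_pct = pct("100*num/denom")       # integer arithmetic throughout
float_pct = pct("100.0*num/denom")   # float introduced before the division
late_pct = pct("100.0*(num/denom)")  # integer division happens first

print(int_pct)    # 33
print(float_pct)  # roughly 33.33
print(late_pct)   # 0.0
con.close()
```

The last expression shows why the position of the float matters: the parenthesised integer division has already thrown the fraction away before the multiplication by 100.0 takes place.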

0.8.11 provides internationalisation support and a major fix for Vista/Windows 7

Monday, November 9th, 2009

The latest version of SOFA Statistics has some important improvements.

  • Fixed major bug preventing interaction with data on Vista/Windows 7. It was caused by the “\U” combination inside project configuration files (e.g. C:\Users\…). The backslash U combination was treated as the start of a unicode escape sequence (used for international text etc), and an invalid one at that. Windows testing using XP didn’t pick this up because XP uses the venerable “Documents and Settings” folder, which was only replaced by the “Users” folder in Vista and Windows 7.
  • Better support for international text and unicode e.g. René, Identität, François etc.
  • Better responses to errors saving data to database tables. For example, when a user tries to save a word containing characters not supported by the underlying database table (such as a unicode letter not found in the Latin character set).
  • For Galician speakers, a version of SOFA Statistics in their own language (currently only working in Ubuntu).
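The “\U” problem in the first item above can be reproduced in a few lines. This is only a sketch written for Python 3, assuming the configuration text is interpreted with Python string-literal escape rules; the path and the helper interpret_escapes are hypothetical, and doubling the backslashes is one possible workaround rather than necessarily the fix used in SOFA Statistics.

```python
def interpret_escapes(text):
    # Apply Python string-literal escape rules to plain ASCII text
    return text.encode("ascii").decode("unicode_escape")


# Hypothetical Vista/Windows 7 style path as it might appear in a config file
vista_path = r"C:\Users\example\proj.txt"

try:
    interpret_escapes(vista_path)
    vista_error = None
except UnicodeDecodeError as err:
    # "\U" must be followed by eight hex digits, and "sers\exa" is not
    vista_error = err

# Doubling each backslash makes every one of them a literal backslash
escaped_path = vista_path.replace("\\", "\\\\")
round_tripped = interpret_escapes(escaped_path)

print(vista_error is not None)      # True
print(round_tripped == vista_path)  # True
```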

There is also a new version of wxWebKit etc available for Karmic (9.10) users thanks to Christoph Willing. NB this will also help some users of Jaunty (9.04) who have updated packages which conflict with those in SOFA Statistics. More details can be found at http://www.sofastatistics.com/predeb.php.

Multi-language SOFA Statistics Begins

Saturday, October 24th, 2009

Launchpad offers great support for translating applications into different languages (https://help.launchpad.net/Translations). Python (http://docs.python.org/library/i18n.html) and wxPython (http://wiki.wxpython.org/Internationalization) both have standard ways of supporting multiple languages. So it was always going to be achievable to make SOFA Statistics multilingual as long as people were willing to help with translation. First to raise his hand has been Indalecio Freiría Santos (see SOFA Statistics discussion thread), so the Galician version should be available first. If you are interested in adding translations please feel free to raise your hand in the discussion group http://groups.google.com/group/sofastatistics at any time.
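For readers unfamiliar with the standard Python approach, the gettext module is the usual starting point. The sketch below is written for Python 3 and is not taken from SOFA Statistics; the domain name "sofa" and the catalogue directory are placeholders. With fallback=True the call still works when no compiled Galician catalogue is present, in which case strings simply come back untranslated.

```python
import gettext

translation = gettext.translation(
    "sofa",                                   # hypothetical message domain
    localedir="locale_dir_for_illustration",  # hypothetical catalogue folder
    languages=["gl"],                         # Galician
    fallback=True,  # no catalogue found: fall back to the original strings
)
_ = translation.gettext

# Any user-visible string is wrapped in _() so it can be translated
label = _("Open")
print(label)
```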

wxPython hourglass cursor not working in Ubuntu

Monday, August 17th, 2009

The following code worked in Windows but not in Ubuntu:

# Hourglass cursor
curs = wx.StockCursor(wx.CURSOR_WAIT)
self.SetCursor(curs)
# ... something happens that takes a while ...
# Return to normal cursor
curs = wx.StockCursor(wx.CURSOR_ARROW)
self.SetCursor(curs)

Use instead:

wx.BeginBusyCursor()
# ... something happens that takes a while ...
wx.EndBusyCursor()

NB it is good to guard EndBusyCursor() with wx.IsBusy(). On Windows, ending the busy cursor when none is active causes an error.

if wx.IsBusy():
    wx.EndBusyCursor()

Misc library issues

Monday, August 17th, 2009

Re: pysqlite-2.5.5-win32-py2.6.exe – it wouldn’t install on my clean virtual XP environment.  It was unable to locate the component msvcr71.dll. So I was forced to include that in the Windows package.

The mysqldb module doesn’t currently have an official 2.6 version of the Windows installer. That was the main reason I had kept the Windows version on Python 2.5, for which there was one (SciPy was no longer relevant, so shifting to 2.6 for all installers was definitely in contention). There had also been mixed experiences with mysqldb packages put together by third parties (https://sourceforge.net/forum/forum.php?thread_id=2316047&forum_id=70460). But I really needed a feature introduced in Python 2.6, namely the float method as_integer_ratio. It was needed to make my float-to-decimal function work (http://docs.python.org/library/decimal.html), which in turn I needed to get the level of precision required to pass the hardest NIST ANOVA test (http://www.itl.nist.gov/div898/strd/anova/SmLs09.html). In the end I went with http://www.thescotties.com/mysql-python/test/MySQL-python-1.2.3c1.win32-py2.6.exe. Another option was http://www.codegood.com/archives/4.
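The float-to-decimal function itself is not shown here, so the following is only a guess at the general approach, written for Python 3: as_integer_ratio returns the exact numerator and denominator of a float, and dividing them as Decimals gives a lossless conversion as long as the working precision is high enough. The name float_to_decimal and the default precision of 60 digits are assumptions made for this sketch.

```python
from decimal import Decimal, localcontext


def float_to_decimal(f, prec=60):
    # A float is exactly numerator / denominator, with a power-of-two denominator
    numerator, denominator = f.as_integer_ratio()
    with localcontext() as ctx:
        # Precision is raised only inside this block
        ctx.prec = prec
        return Decimal(numerator) / Decimal(denominator)


print(float_to_decimal(0.25))  # 0.25
print(float_to_decimal(0.1))   # the exact binary value stored for 0.1
```

The second result is not 0.1 exactly, because the float closest to 0.1 is slightly larger than one tenth; the conversion preserves whatever the float actually holds.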

BTW there is a lot to like about Python 2.6 – it is the gateway to the 3 series and will make that eventual transition a lot easier.

The decimal module in Python

Wednesday, August 12th, 2009

Python has a brilliant decimal module (http://docs.python.org/library/decimal.html) you may need if you want to avoid floating point errors.  This may be necessary if you are faced with compounding errors under special circumstances e.g. if testing a statistical routine against a purpose-built test dataset (e.g. http://www.itl.nist.gov/div898/strd/anova/SmLs09_cv.html).  The performance hit is substantial, however, so it has to be used judiciously.  Anyway, here is an example:

import decimal
D = decimal.Decimal
decimal.getcontext().prec = 120
d1 = D("1.1")
f1 = 1.1
print "Decimal result is: %s" % round((d1**1000 - D("2.46993291801e+41")),3)
print "Floating point result is: %s" % round((f1**1000 - 2.46993291801e+41),3)
>>>

Decimal result is: -4.17366587591e+29
Floating point result is: -3.97456123863e+29

Usually, floating point is good enough – but not under all circumstances.  In which case, it pays to be familiar with the decimal module.
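A much smaller case shows the same kind of drift. This sketch is written for Python 3 and simply adds one tenth ten times, first as floats and then as Decimals built from strings.

```python
from decimal import Decimal

# Ten additions of 0.1 accumulate binary rounding error
float_total = sum([0.1] * 10)

# Decimals built from strings represent one tenth exactly
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

print(float_total == 1.0)             # False
print(decimal_total == Decimal("1"))  # True
```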

Adding ability to import from csv and spreadsheets etc

Tuesday, June 9th, 2009

SOFA Statistics is having new import functionality added.  The first target is csv format files (using the standard Python csv module underneath) followed by Excel spreadsheets.  The solution I have for Excel works even when MS Office has not been installed on a machine but will only work in Windows.   Later on I will target SPSS data files and Open Office Calc spreadsheet files.
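As a rough idea of what the csv module provides, here is a minimal Python 3 sketch that reads a small in-memory sample. The column names and values are made up, and this is not the SOFA Statistics import code.

```python
import csv
import io

# Made-up sample data standing in for a csv file on disk
sample = io.StringIO("id,age,group\n1,34,a\n2,27,b\n")

reader = csv.DictReader(sample)  # the first row supplies the field names
rows = list(reader)

print(reader.fieldnames)  # ['id', 'age', 'group']
print(rows[0]["age"])     # 34 (as a string)
```

Every value arrives as a string, so an importer still has to decide which columns are numeric before the data can be stored in a typed database table.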

Resolving Windows installation glitches

Tuesday, May 26th, 2009

If you had problems installing the Windows version of SOFA 0.6.8, try 0.7.0 (http://www.sofastatistics.com/misc/sofa-0.7.0_python-2.5.zip).  It resolves the main issues with that installation package.  Version 0.7.0 also resolves some other issues within SOFA and represents the first of the 0.7 series – the goal of which is to enable importing of spreadsheets and other, non-SQL database type data.

The Windows comtypes package relied upon by SOFA 0.6.8 proved to be faulty. The 0.7.0 version of SOFA Statistics, which has just been released, uses an older version of comtypes (0.5.2) which is known to work. A version of comtypes 0.6.0 for Python 2.5 is apparently forthcoming.

There is still a delay when using SOFA’s table making functionality for the first time while comtypes generates some data it needs.  NB this is a one-off delay which doesn’t affect anything else.  Ideally, SOFA will handle this process better in a forthcoming release.

Windows package will be Python 2.5 only for now

Saturday, May 16th, 2009

Unfortunately SciPy does not have a Python 2.6 installer yet.  The MySQLdb package does but it is not from the central sourceforge location (http://www.technicalbard.com/files/MySQL-python-1.2.2.win32-py2.6.exe). Also see http://bytes.com/groups/python/854793-will-mysqldb-python-shim-supported-python-2-6-3-x.  For the time being, therefore, the Windows package for SOFA will be a Python 2.5 version only.

BTW nearly ready to release the packages once project hosting is finalised.

The licence is AGPL 3.

SOFA for Windows Package under testing

Friday, May 15th, 2009

A Windows installer for SOFA Statistics has been created and is currently undergoing testing on clean machines. NSIS was used to create the installer, which necessitated learning a new scripting language. The language used by NSIS is a cross between PHP and assembly and was quite a shock after the elegance of Python.