Archive for the ‘developers’ Category

Installing missing dlls in Windows for SOFA Statistics

Tuesday, October 6th, 2009

Creating a Windows installation package that works on everything from XP Home Edition to Vista 64-bit Business Edition is manageable but not exactly trivial. Sometimes a single file can cause a lot of issues, e.g. msvcr71.dll (see http://www.sofastatistics.com/blog/?p=113). Ensuring this file is available on the target computer is not simply a matter of copying it across in the same way as the other files. The correct approach in NSIS is to use the InstallLib macro.

The following item was helpful – http://blacksheepsoftware.com.au/bradley/wordpress/?p=17. The NSIS documentation of relevance is here – http://nsis.sourceforge.net/Docs/AppendixB.html.

The snippet of code used in the latest SOFA Statistics package for Windows is:

; http://nsis.sourceforge.net/Docs/AppendixB.html
; Assumes !include "Library.nsh" and Var ALREADY_INSTALLED appear earlier in the script
IfFileExists "$PROGRAMFILES\sofa\start.pyw" 0 new_installation
StrCpy $ALREADY_INSTALLED 1
new_installation:
!insertmacro InstallLib REGDLL $ALREADY_INSTALLED REBOOT_NOTPROTECTED "G:\3 SOFA dev\sofalibs\msvcr71.dll" $SYSDIR\msvcr71.dll $SYSDIR

0.8.6 supports PostgreSQL and has better output formatting

Monday, August 24th, 2009

New features:

  • Added support for PostgreSQL databases.
  • Each item of output now has a preceding display line and a description of its data source (database and table) and when it was created.
  • Improved layout of exported scripts.
  • Added unit tests for main statistical algorithms used.
  • Better handling of timestamp and autonumber fields in data entry/editing.

Bug fixes:

  • Fixed script export bug.

Additionally, the Windows package now installs a menu shortcut for uninstallation. It always should have, of course, but this is still an example of a little thing that makes newer versions of SOFA Statistics nicer to use. The idea is that, collectively, thousands of details like that will create a sense of polish. The Ubuntu 100 papercuts project is one inspiration.

wxWebKit will enable graphing when it is packaged

Friday, August 21st, 2009

wxWebKit (http://wxwebkit.wxcommunity.com/index.php?n=Main.HomePage) is a very important widget for the SOFA Statistics project as it will be used to display all output. At present, the only Debian package for wxWebKit (kindly supplied by Christoph Willing) does not support the display of local images. Fortunately this is being rectified through the hard work of Kevin Ollivier, and a new package should be out sometime soon. This is expected to be a standard package, which should simplify the installation instructions for Ubuntu users.

Once the wxWebKit package is available, a lot of development work will take place in SOFA Statistics to provide auxiliary graphs which support analysis e.g. by displaying the data distributions in the samples used for an ANOVA. It will finally be possible to really start delivering on the “learn-as-you-go” promise of SOFA Statistics.

Testing the statistical algorithms

Friday, August 21st, 2009

A statistical program has to produce accurate results reliably. And it has to keep doing so even when some aspects of the program change between versions. Seemingly trivial or inconsequential programming changes can have an enormous impact on the final result produced. So the only way to have confidence in a program is through automated testing. In many cases, it is also possible to test against a standard dataset with a guaranteed, known result (e.g. http://www.itl.nist.gov/div898/strd/general/dataarchive.html).

The one-way ANOVA has passed the most difficult NIST test when using the default “precision” setting (as opposed to speed, which relies on floating point maths).

Additionally, the ANOVA, and all the other tests, are now tested using a number of carefully crafted Python functions and a simple test runner called nose (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/testing.html). The tests can feed hundreds of random samples of data into each SOFA Statistics algorithm and check the output against a trusted algorithm e.g. stats.py from SciPy.
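A minimal sketch of that kind of randomised cross-check, written for Python 3 using only the standard library. It is not taken from the SOFA Statistics test suite: anova_f_float stands in for an algorithm under test, and anova_f_exact, which computes the same statistic by a different formula in exact rational arithmetic (fractions.Fraction), stands in for the trusted reference. The function names, group sizes, seed and tolerance are all assumptions made for illustration.

```python
import math
import random
from fractions import Fraction


def anova_f_float(groups):
    """One-way ANOVA F statistic in ordinary floating point (the code under test)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def anova_f_exact(groups):
    """Same statistic via raw sums of squares in exact rationals (the reference)."""
    exact = [[Fraction(x) for x in g] for g in groups]
    n_total = sum(len(g) for g in exact)
    sum_sq = sum(x * x for g in exact for x in g)
    group_term = sum(sum(g) ** 2 / len(g) for g in exact)
    correction = sum(sum(g) for g in exact) ** 2 / n_total
    ss_between = group_term - correction
    ss_within = sum_sq - group_term
    df_between = len(exact) - 1
    df_within = n_total - len(exact)
    return float((ss_between / df_between) / (ss_within / df_within))


def test_anova_matches_reference(n_trials=200, seed=0, rel_tol=1e-9):
    """Feed random three-group samples to both versions and compare the results."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(n_trials):
        groups = [
            [rng.gauss(mu, 10.0) for _ in range(rng.randint(5, 30))]
            for mu in (50.0, 55.0, 60.0)
        ]
        got = anova_f_float(groups)
        expected = anova_f_exact(groups)
        assert math.isclose(got, expected, rel_tol=rel_tol), (got, expected)
    return n_trials


if __name__ == "__main__":
    print("trials passed:", test_anova_matches_reference())
```

Because the function name starts with test_, a runner that collects tests by name (as nose does) would pick it up without further configuration.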

Of course, randomness is not enough to test an algorithm. It is necessary to also feed in cases where some values are very high, very close to zero, or very similar to other values. The specific approach necessary to separate out the weak algorithms depends on the particular test. The NIST ANOVA datasets, for example, include lots of values with the same leading digits and the only difference occurring after the decimal point. A deliberate approach to testing increases the odds of exposing errors.
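To illustrate why such cases separate weak algorithms from strong ones, here is a small Python 3 sketch. The three data values are made up (they are not a NIST dataset) but share thirteen leading digits in the same spirit, and the function names are hypothetical. It compares a one-pass shortcut variance formula, a two-pass formula, and the two-pass formula in Decimal arithmetic.

```python
from decimal import Decimal, localcontext

# Made-up values sharing thirteen leading digits; kept as text so that
# nothing is lost before the Decimal version sees them.
DATA = ["1000000000000.2", "1000000000000.4", "1000000000000.6"]


def variance_one_pass(values):
    """Shortcut formula: subtracts two huge, nearly equal numbers."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


def variance_two_pass(values):
    """Subtract the mean first, then square: far more stable in floats."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_decimal(values_as_text, prec=60):
    """Two-pass formula in Decimal arithmetic at the requested precision."""
    with localcontext() as ctx:
        ctx.prec = prec
        xs = [Decimal(v) for v in values_as_text]
        n = len(xs)
        mean = sum(xs) / n
        return sum((x - mean) ** 2 for x in xs) / (n - 1)


floats = [float(v) for v in DATA]
print("one-pass float:", variance_one_pass(floats))
print("two-pass float:", variance_two_pass(floats))
print("decimal:       ", variance_decimal(DATA))
```

The true sample variance here is 0.04. The Decimal version recovers it exactly, the two-pass float version lands close to it, and the one-pass float version does not, because the squares are so large that the digits after the decimal point are lost before the subtraction happens.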

In the open source world there is no need to take anyone’s word for it. The test script, and all the algorithms for SOFA Statistics, are open source (https://code.launchpad.net/sofastatistics), and any developers or statisticians who can extend or otherwise improve the tests are welcome to do so. That’s the open source way. So if you think of something that could help strengthen SOFA Statistics or its testing, please feel free to contact me.

As part of the testing just completed, a couple of small bugs were detected; these will be corrected in the next release, which is coming soon.

wxPython hourglass cursor not working in Ubuntu

Monday, August 17th, 2009

The following code worked in Windows but not in Ubuntu:

# hourglass cursor

curs = wx.StockCursor(wx.CURSOR_WAIT)
self.SetCursor(curs)
# Something happens that takes a while … … … …
# Return to normal cursor
curs = wx.StockCursor(wx.CURSOR_ARROW)
self.SetCursor(curs)

Use instead:

wx.BeginBusyCursor()
wx.EndBusyCursor()

NB it is good practice to check wx.IsBusy() before calling wx.EndBusyCursor(). On Windows, ending a busy cursor when none is active causes an error.

if wx.IsBusy():
    wx.EndBusyCursor()
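One way to make sure the busy cursor is always switched off again, even if the slow step raises an exception, is to wrap the two calls in a context manager. The following is only a sketch: busy_cursor is a hypothetical helper, not part of wxPython, and it takes the imported wx module as an argument purely so that the sketch itself has no import-time dependency on wxPython.

```python
from contextlib import contextmanager


@contextmanager
def busy_cursor(wx):
    """Show the busy cursor for the duration of a with-block.

    `wx` is the imported wxPython module (passing it in is an assumption
    of this sketch, made so the helper can be exercised without a display).
    """
    wx.BeginBusyCursor()
    try:
        yield
    finally:
        # Only end the busy cursor if one is active; ending one that is
        # not running causes an error on Windows.
        if wx.IsBusy():
            wx.EndBusyCursor()


# Usage inside an application (run_long_task is a placeholder name):
#
#     import wx
#     with busy_cursor(wx):
#         run_long_task()
```

The try/finally means the cursor is restored whether the block finishes normally or fails part-way through.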

Misc library issues

Monday, August 17th, 2009

Re: pysqlite-2.5.5-win32-py2.6.exe – it wouldn’t install on my clean virtual XP environment.  It was unable to locate the component msvcr71.dll. So I was forced to include that in the Windows package.

The mysqldb module doesn’t currently have an official 2.6 version of the Windows installer. That was the main reason I had kept the Windows version on Python 2.5, for which there was one (SciPy was no longer relevant, so shifting to 2.6 for all installers was definitely in contention). There had also been mixed experiences with mysqldb packages put together by third parties (https://sourceforge.net/forum/forum.php?thread_id=2316047&forum_id=70460). But I really needed a feature which was introduced in Python 2.6, namely the float method as_integer_ratio. My float to decimal function depends on it (http://docs.python.org/library/decimal.html), and I needed that function to get the level of precision required to pass the hardest NIST ANOVA test (http://www.itl.nist.gov/div898/strd/anova/SmLs09.html). In the end I went with http://www.thescotties.com/mysql-python/test/MySQL-python-1.2.3c1.win32-py2.6.exe. Another option was http://www.codegood.com/archives/4.
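For readers wondering how as_integer_ratio helps, here is a minimal Python 3 sketch of a float to decimal conversion built on it. It is an illustration only, not the function used in SOFA Statistics: the name float_to_decimal and the default precision are assumptions. Later Python versions also provide Decimal.from_float(), which makes a helper like this unnecessary.

```python
from decimal import Decimal, Inexact, localcontext


def float_to_decimal(f, prec=1100):
    """Return the Decimal holding exactly the value stored in the float f.

    Every finite float is an integer divided by a power of two, so the
    quotient has a finite decimal expansion. The default precision is an
    assumption chosen with generous headroom for any finite double.
    as_integer_ratio() raises for infinities and NaN.
    """
    numerator, denominator = f.as_integer_ratio()
    with localcontext() as ctx:
        ctx.prec = prec
        # Raise an error rather than round silently if prec is too small.
        ctx.traps[Inexact] = True
        return Decimal(numerator) / Decimal(denominator)


print(float_to_decimal(0.5))
print(float_to_decimal(1.1))  # shows the digits hiding behind the literal 1.1
```

The point of converting exactly is that later Decimal arithmetic then starts from the value the float really holds, rather than from a rounded string representation of it.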

BTW there is a lot to like about Python 2.6 – it is the gateway to the 3 series and will make that eventual transition a lot easier.

The decimal module in Python

Wednesday, August 12th, 2009

Python has a brilliant decimal module (http://docs.python.org/library/decimal.html) you may need if you want to avoid floating point errors.  This may be necessary if you are faced with compounding errors under special circumstances e.g. if testing a statistical routine against a purpose-built test dataset (e.g. http://www.itl.nist.gov/div898/strd/anova/SmLs09_cv.html).  The performance hit is substantial, however, so it has to be used judiciously.  Anyway, here is an example:

import decimal
D = decimal.Decimal
decimal.getcontext().prec = 120
d1 = D("1.1")
f1 = 1.1
print "Decimal result is: %s" % round((d1**1000 - D("2.46993291801e+41")),3)
print "Floating point result is: %s" % round((f1**1000 - 2.46993291801e+41),3)
>>>

Decimal result is: -4.17366587591e+29
Floating point result is: -3.97456123863e+29

Usually, floating point is good enough, but not under all circumstances. When it is not, it pays to be familiar with the decimal module.
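A smaller illustration of compounding error, written for Python 3. The helper names are hypothetical. The sketch also shows one way to use the module judiciously: decimal.localcontext() confines a chosen precision to the block that needs it, so the rest of the program can stay with ordinary floats.

```python
from decimal import Decimal, localcontext


def total_float(step, count):
    """Add `step` to a running float total `count` times."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def total_decimal(step_text, count, prec=28):
    """Same accumulation in Decimal, with the precision set only locally."""
    with localcontext() as ctx:
        ctx.prec = prec
        step = Decimal(step_text)  # built from text, so 0.1 is exactly 0.1
        total = Decimal(0)
        for _ in range(count):
            total += step
        return total


print("float sum equals 1?  ", total_float(0.1, 10) == 1.0)
print("decimal sum equals 1?", total_decimal("0.1", 10) == Decimal(1))
```

The float total drifts away from 1 because 0.1 has no exact binary representation and the small error is added in at every step; the Decimal total is exact.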

Please report bugs – it’s good for the project

Saturday, July 18th, 2009

Bugs are never welcome, but the only thing worse than a bug is a bug you don’t know about and could easily fix.  Even worse, an unknown bug could put some people off using your software, which is not a good outcome for anyone.  So how do you report a bug in SOFA Statistics?  Fortunately, Launchpad (which is where the SOFA Statistics source code lives) makes bug reporting easy.  Just go to: https://launchpad.net/sofastatistics/+filebug/+login and register the bug.  I’ll do my best to fix it and keep everyone informed along the way.

Remember – reporting a bug is an act of kindness so please don’t hold back.  Your report could help many other users.

Bazaar – Simple Yet Powerful

Sunday, June 28th, 2009

This project uses Bazaar for version control. Although Bazaar is very powerful, it is also very easy to start using. Here are some of my most commonly used commands:

bzr add – adds files to version control

bzr commit -m "Message in here about changes" – records the current changes as a new revision

bzr push – pushes the revision out to Launchpad

bzr ls -V – lists all versioned files (if any are missing just use add)

Installation testing using VirtualBox snapshots

Thursday, May 28th, 2009

VirtualBox is brilliant. You can set up Windows XP, Ubuntu Jaunty etc. and test installations in them. Then reset to a snapshot, rinse and repeat. Installing onto systems that are not “clean” is never as certain – perhaps you have already installed comtypes or whatever.

Here is one tip for sharing files between a host OS and a Linux (Ubuntu) guest OS (see http://www.virtuatopia.com/index.php/VirtualBox_Shared_Folders).

Within the guest OS, make a directory e.g.

sudo mkdir /transfer

Then mount the shared folder you set up externally in VirtualBox using:

sudo mount -t vboxsf sharename mountpoint

In my case:

sudo mount -t vboxsf transfer /transfer

It is then easy to grab files from the host OS e.g. a deb package that needs to be installed.