Testing the statistical algorithms

August 21st, 2009

A statistical program has to produce accurate results reliably, and it has to keep doing so when aspects of the program change between versions. Seemingly trivial or inconsequential programming changes can have an enormous impact on the final result produced, so the only way to have confidence in a program is through automated testing. In many cases, it is also possible to test against a standard dataset with a guaranteed, known result (e.g. http://www.itl.nist.gov/div898/strd/general/dataarchive.html).

The one-way ANOVA has passed the most difficult NIST test when using the default “precision” setting (as opposed to speed, which relies on floating point maths).

Additionally, the ANOVA, and all the other tests, are now tested using a number of carefully crafted Python functions and a simple test runner called nose (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/testing.html). The tests can feed hundreds of random samples of data into each SOFA Statistics algorithm and check the output against a trusted implementation, e.g. stats.py from SciPy.
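The approach described above can be sketched roughly as follows. This is only an illustration, not the actual SOFA Statistics test script: sample_variance is a hypothetical stand-in for the algorithm under test, and the statistics module from the Python 3 standard library is an assumed stand-in for the trusted reference (the real tests compare against stats.py). The function name starting with test_ follows the convention nose uses to collect tests.

```python
import random
import statistics


def sample_variance(values):
    """Hypothetical algorithm under test: two-pass sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def check_against_reference(n_samples=200, seed=0):
    """Feed random samples to the algorithm and compare with a reference."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(n_samples):
        size = rng.randint(2, 50)
        values = [rng.uniform(-1000, 1000) for _ in range(size)]
        expected = statistics.variance(values)  # assumed trusted reference
        actual = sample_variance(values)
        tolerance = 1e-9 * max(1.0, abs(expected))
        assert abs(actual - expected) <= tolerance, (values, actual, expected)
    return n_samples


def test_variance_matches_reference():
    # nose collects functions whose names start with "test"
    assert check_against_reference() == 200
```

Seeding the random generator is a deliberate choice here: a failing sample can then be regenerated and investigated rather than lost.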

Of course, randomness is not enough to test an algorithm. It is necessary to also feed in cases where some values are very high, very close to zero, or very similar to other values. The specific approach necessary to separate out the weak algorithms depends on the particular test. The NIST ANOVA datasets, for example, include lots of values with the same leading digits and the only difference occurring after the decimal point. A deliberate approach to testing increases the odds of exposing errors.
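To illustrate why such crafted cases matter, here is a minimal sketch using made-up data of the shape described above (identical leading digits, differences only after the decimal point). The values are invented for illustration and are not taken from a NIST file, and all three function names are hypothetical. It compares a naive one-pass variance formula with a two-pass version, using exact rational arithmetic as the yardstick.

```python
import math
from fractions import Fraction


def naive_variance(values):
    """One-pass 'sum of squares' formula; prone to cancellation errors."""
    n = len(values)
    mean = sum(values) / n
    return (sum(v * v for v in values) - n * mean * mean) / (n - 1)


def two_pass_variance(values):
    """Subtract the mean first, then square the (small) deviations."""
    n = len(values)
    mean = math.fsum(values) / n
    return math.fsum((v - mean) ** 2 for v in values) / (n - 1)


def exact_variance(values):
    """Yardstick: exact rational arithmetic on the stored float values."""
    fracs = [Fraction(v) for v in values]
    n = len(fracs)
    mean = sum(fracs) / n
    return float(sum((f - mean) ** 2 for f in fracs) / (n - 1))


# Invented data: same leading digits, differing only after the decimal point
data = [1000000000000.2, 1000000000000.4] * 10

naive_error = abs(naive_variance(data) - exact_variance(data))
two_pass_error = abs(two_pass_variance(data) - exact_variance(data))
print("Naive error: %s" % naive_error)
print("Two-pass error: %s" % two_pass_error)
```

Random samples drawn from an ordinary range would be unlikely to separate these two functions, which is the point of adding deliberately awkward data.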

In the open source world there is no need to take anyone’s word for it. The test script, and all the algorithms for SOFA Statistics, are open source (https://code.launchpad.net/sofastatistics), and any developers or statisticians who can extend or otherwise improve the tests are welcome to do so. That’s the open source way. So if you think of something that could help strengthen SOFA Statistics or its testing, please feel free to contact me.

As part of the testing just completed, a couple of small bugs were detected. These will be corrected in the next release, which is coming soon.

wxPython hourglass cursor not working in Ubuntu

August 17th, 2009

The following code worked in Windows but not in Ubuntu:

# hourglass cursor

curs = wx.StockCursor(wx.CURSOR_WAIT)
self.SetCursor(curs)
# Something happens that takes a while … … … …
# Return to normal cursor
curs = wx.StockCursor(wx.CURSOR_ARROW)
self.SetCursor(curs)

Use instead:

wx.BeginBusyCursor()
# Something happens that takes a while … … … …
wx.EndBusyCursor()

NB: it is good practice to check wx.IsBusy() before calling wx.EndBusyCursor(). On Windows, ending a busy cursor when none is active causes an error.

if wx.IsBusy():
    wx.EndBusyCursor()

Misc library issues

August 17th, 2009

Re: pysqlite-2.5.5-win32-py2.6.exe – it wouldn’t install on my clean virtual XP environment.  It was unable to locate the component msvcr71.dll. So I was forced to include that in the Windows package.

The mysqldb module doesn’t currently have an official 2.6 version of the Windows installer. That was the main reason I had kept the Windows version on Python 2.5, for which there was one (SciPy was no longer relevant, so shifting to 2.6 for all installers was definitely in contention). There had also been mixed experience of mysqldb packages put together by third parties (https://sourceforge.net/forum/forum.php?thread_id=2316047&forum_id=70460). But I really needed a feature which was introduced in Python 2.6, namely the float method as_integer_ratio. This was needed for my float-to-decimal function (http://docs.python.org/library/decimal.html), which in turn I needed to get the level of precision required to pass the hardest NIST ANOVA test (http://www.itl.nist.gov/div898/strd/anova/SmLs09.html). In the end I went with http://www.thescotties.com/mysql-python/test/MySQL-python-1.2.3c1.win32-py2.6.exe. Another option was http://www.codegood.com/archives/4.
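For anyone curious how as_integer_ratio helps, here is a minimal sketch of a float-to-decimal conversion. It is not the actual SOFA Statistics function: the name float_to_decimal and the default precision of 120 are assumptions for illustration, and the code is written for current Python rather than 2.6.

```python
from decimal import Decimal, localcontext


def float_to_decimal(value, prec=120):
    """Convert a float to a Decimal via its exact integer ratio.

    Hypothetical helper; does not handle inf or nan, for which
    as_integer_ratio raises an exception.
    """
    numerator, denominator = value.as_integer_ratio()
    with localcontext() as ctx:
        ctx.prec = prec  # the division is rounded to this many digits
        return Decimal(numerator) / Decimal(denominator)


print(float_to_decimal(0.5))
print(float_to_decimal(1.1))
```

Note that the result is the value the float actually stores, so float_to_decimal(1.1) is not the same as Decimal("1.1"). Later Python versions provide Decimal.from_float for the same purpose.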

BTW there is a lot to like about Python 2.6 – it is the gateway to the 3 series and will make that eventual transition a lot easier.

0.8.5 has stronger ANOVA support and can output in multiple styles

August 17th, 2009

Version 0.8.5 has the following new features:

  • The one-way ANOVA now presents the user with a choice of either precision or speed. Precision passes the hardest NIST test (http://www.itl.nist.gov/div898/strd/anova/SmLs09.html) and is the default.  The speed option uses standard floating point arithmetic with all the pros and cons that entails.
  • ANOVA displays more information in output to enable comparison with known results.
  • HTML output can now display multiple styles for different tables.
  • Importing now requires alphanumeric names for tables.

and the following main bug fixes:

  • Importing CSV files is now working (regression added in 0.8.4).
  • CSV files with multiple data types in columns are handled correctly when user opts to let SOFA Statistics fix a column type.
  • Kruskal Wallis H test now copes with string variables.

The decimal module in Python

August 12th, 2009

Python has a brilliant decimal module (http://docs.python.org/library/decimal.html) you may need if you want to avoid floating point errors.  This may be necessary if you are faced with compounding errors under special circumstances e.g. if testing a statistical routine against a purpose-built test dataset (e.g. http://www.itl.nist.gov/div898/strd/anova/SmLs09_cv.html).  The performance hit is substantial, however, so it has to be used judiciously.  Anyway, here is an example:

import decimal
D = decimal.Decimal
decimal.getcontext().prec = 120
d1 = D("1.1")
f1 = 1.1
print "Decimal result is: %s" % round((d1**1000 - D("2.46993291801e+41")),3)
print "Floating point result is: %s" % round((f1**1000 - 2.46993291801e+41),3)
Output:

Decimal result is: -4.17366587591e+29
Floating point result is: -3.97456123863e+29

Usually, floating point is good enough, but not under all circumstances. When it is not, it pays to be familiar with the decimal module.

0.8.4 adds new features, and lots more polish

July 31st, 2009

The latest version includes a lot more polish and has many rough edges removed.  There are also several important new features:

  • Users can explicitly set variables to Nominal, Ordinal, or Quantity.  These settings are used to limit the variables displayed in various tests to those which are of the appropriate type.
  • Pearson’s Chi Square now has a contingency table with both observed and expected values.
  • The one-way ANOVA and Kruskal Wallis H now provide more information in the output e.g. average rank per group.
  • Variables can be configured from a right click when running statistical tests (as is already possible when making report tables).
  • Can select statistical tests by double clicking.  These are also sorted alphabetically to make direct selection faster.
  • The new way of indicating that a set of values has been limited to the first 20 unique values no longer interrupts the user while making selections.
  • Test data now includes a string variable (browser).

Bug fixes:

  • Numerical values appear in numerical rather than string order when configuring variable details.
  • No longer necessary to complete MS SQL Server details merely for having plugin installed.
  • Statistical output works even if variable is a string variable.
  • Minor problem with start screen positioning on dual monitors resolved (adequately).

Please let me know what you think.  Is the project heading in the right direction from your point of view?

The next lot of development will focus on subsidiary charting (as opposed to charting for main output).  E.g. assessing normality of data before choosing the appropriate test.

0.8.3 supports MS SQL Server – also misc improvements and some important bug fixes

July 18th, 2009

The latest release not only adds some important new functionality, it cleans up a lot of existing code and fixes a lot of bugs.

  • Supports direct connection to MS SQL Server databases.
  • Previous selections become the defaults when configuring statistical tests.
  • Labels are consistently updated in the statistical test dialogs if a new label file is selected.

Plus lots of important bug fixes and usability improvements:

  • Fixed installation bug affecting multi-user Windows installations.
  • Fixed bugs when connecting to MS Access depending on data type.
  • Fixed bug preventing tab traversal of Projects form.
  • Fixed bug when entering data in Windows.
  • Fixed bug when variable in independent tests had large number of unique values.
  • Fixed various bugs occurring when changing database or table selection.

Please report bugs – it’s good for the project

July 18th, 2009

Bugs are never welcome, but the only thing worse than a bug is a bug you don’t know about and could easily fix.  Even worse, an unknown bug could put some people off using your software, which is not a good outcome for anyone.  So how do you report a bug in SOFA Statistics?  Fortunately, Launchpad (which is where the SOFA Statistics source code lives) makes bug reporting easy.  Just go to: https://launchpad.net/sofastatistics/+filebug/+login and register the bug.  I’ll do my best to fix it and keep everyone informed along the way.

Remember – reporting a bug is an act of kindness so please don’t hold back.  Your report could help many other users.

0.8.2 adds final tests

July 10th, 2009

Finally!  Version 0.8.2 is the first with all the core statistical tests functional.  This version:

  • added one-way ANOVA.
  • added Kruskal-Wallis H.
  • added Pearson’s Chi Square.
  • and fixed startup bug affecting Windows users on networked drives.

Of course, the best is yet to come.  You may have noticed that huge empty space in the dialog for configuring a test.  Plus those disabled buttons about reporting level (“results only” through to “full explanation”).  I will be coming back to flesh out those areas.  The intention is that the user will be supplied with little visualisations of the actual data so they can see whether the test is appropriate or not (all explained in words as well with Help on hand).  E.g. a histogram of each sample so the shape of each distribution is visible at a glance.  Plus a small test and its interpretation which lets you know whether the test is usable or not.  The user shouldn’t have to know about, or remember to use, tests of kurtosis or skew or equality of variance.  They should simply choose the most likely test to be appropriate and have SOFA Statistics explain to them whether it will work or not based on the actual data being analysed (along the way some of these ideas are bound to rub off, of course). And when you click on the Help button next to the buttons on Normal/Not Normal, SOFA Statistics should not only explain the concept (with a couple of simple images), but also enable you to visualise the data and run the appropriate tests to decide if it is Normal or not.

0.8.1 adds 4 new tests inc Mann Whitney U and Pearson’s R

July 4th, 2009

Release 0.8.1 of SOFA Statistics added 4 new tests:

  • the Mann Whitney U
  • the Wilcoxon Signed Ranks
  • Pearson’s Correlation
  • Spearman’s Correlation

At this stage only the raw results are presented but the intention is to let users choose the level of explanation they want in their output.  Downloads are available from:

http://www.sofastatistics.com/downloads.php