Archive for the ‘developers’ Category

0.9.3 adds clustered bar charts to Chi Square test

Monday, February 1st, 2010

0.9.3 has nice new graphical output for the Chi Square Test and a few other enhancements. At least as important, however, are all the bug fixes. These are the result of a new pre-release testing process.

Underlying the clustered bar charts is the boomslang library, which provides a simplified interface to common matplotlib charts. What a great idea, and what a great name for a Python library.

Summary of new features in version 0.9.3:

  • Chi Square output includes clustered bar charts to display proportions and frequencies for the two variables selected.
    Chi Square output clustered bar charts


  • Drop-downs default to the most recently used database and table. This recognises that most of the time you are using the same table as you used in the last analysis.
  • More helpful messages if trying to use variables with too many values for Chi Square.

Bug fixes:

  • Fix for Linux users with a 4-digit year date format.
  • Fixed encoding display issue for Windows users.
  • Miscellaneous fixes to the behaviour of the table design dialog. Numerous bugs were flushed out by more extensive user testing before release.
  • The Expand button is disabled if a report runs but does not complete successfully (e.g. returns a warning).
  • The default database and table are saved correctly according to database engine (e.g. MySQL, MS Access etc). This ensures valid projects can always open.

Creating GUI tests

Monday, November 23rd, 2009

Over time, the goal is to extend the test coverage of SOFA Statistics. The GUI side of things needs to be included in this. Here is a link to some good resources:

Misc translation issues

Thursday, October 29th, 2009

The Galician translation is now complete.

e.g. #: dbe_plugins/
msgid "The SQLite details are incomplete"
msgstr "Os datos de SQLite están incompletos"

It is now time to enable SOFA Statistics to use multiple translations successfully. Here are some possible issues:

  • Overly-long strings: these can affect layout e.g. the buttons on the main form. There may be ways of abbreviating strings.
  • A locale not being installed on a computer. There is a lot more to learn about this, but here is a Linux command for identifying what is on your system:
    locale -a
  • How to allow a user to select a locale (or to automatically use the locale of their computer).
  • Making sure everything works on Windows as well.
  • Getting the Galician po file approved by Launchpad (currently stuck in “Needs Review”)

Testing will begin soon. Most importantly, the internationalisation of SOFA Statistics has begun in earnest :-).

Making beautiful output using SVG and JavaScript

Saturday, October 24th, 2009

The charting functionality of SOFA Statistics is not available yet but the technology required is coming together. At the current time the intention is to use the gRaphaelJS library to create the charts and wxWebKit (wxWebKit progress) to display them. The goal is to have beautiful output without using a proprietary technology such as Flash (which also has printing problems). The gRaphaelJS library is still only version 0.2 but progress has been rapid. Dmitry Baranovskiy is doing a great job.

Multi-language SOFA Statistics Begins

Saturday, October 24th, 2009

Launchpad offers great support for translating applications into different languages. And Python (and wxPython) have standard ways of supporting multiple languages. So it was always going to be achievable to make SOFA Statistics multilingual as long as people were willing to help with translation. First to raise their hand has been Indalecio Freiría Santos (see SOFA Statistics discussion thread) and the Galician version should be available first. If you are interested in adding translations please feel free to raise your hand in the discussion group at any time.

Vista and Win 7 Permissions in SOFA Statistics Installer

Tuesday, October 6th, 2009

Successfully installing an NSIS-created package requires some attention to the permissions of the person doing the installation onto their Windows machine.

A useful discussion of permission levels is here –

If a user does not install SOFA Statistics with the appropriate permissions they might receive an error message like:

Error opening file for writing:
C:\Program Files\….

This may occur even if there is no folder called Program Files e.g. if they are installing onto a Swedish version of Windows.  See

If it is necessary to check if a user is installing with administrator permissions, the following may be useful –

Installing missing dlls in Windows for SOFA Statistics

Tuesday, October 6th, 2009

Creating a Windows installation package that works on everything from XP Home Edition to Vista 64-bit Business Edition is manageable but not exactly trivial. Sometimes a single file can create a lot of issues e.g. msvcr71.dll. To ensure this file is available on the target computer, it is not simply a matter of transferring the file in the same way that other files are transferred. The correct approach using NSIS is to run InstallLib.

The following item was helpful – The NSIS documentation of relevance is here –

The snippet of code used in the latest SOFA Statistics package for Windows is:

IfFileExists "$PROGRAMFILES\sofa\start.pyw" 0 new_installation



!insertmacro InstallLib REGDLL $ALREADY_INSTALLED REBOOT_NOTPROTECTED "G:\3 SOFA dev\sofalibs\msvcr71.dll" $SYSDIR\msvcr71.dll $SYSDIR

0.8.6 supports PostgreSQL and has better output formatting

Monday, August 24th, 2009

New features:

  • Added support for PostgreSQL databases.
  • Each item of output now has a preceding display line and a description of its data source (database and table) and when it was created.
  • Improved layout of exported scripts.
  • Added unit tests for main statistical algorithms used.
  • Better handling of timestamp and autonumber fields in data entry/editing.

Bug fixes:

  • Fixed script export bug.

Additionally, the Windows package now installs a menu shortcut for uninstallation. It always should have, of course, but it is still an example of a little thing which makes newer versions of SOFA Statistics nicer to use. The idea is that, collectively, thousands of details like that will create a sense of polish. The Ubuntu 100 papercuts project is one inspiration.

wxWebKit will enable graphing when it is packaged

Friday, August 21st, 2009

wxWebKit is a very important widget for the SOFA Statistics project as it will be used to display all output. At present, the only Debian package for wxWebKit (kindly supplied by Christoph Willing) does not support the display of local images. Fortunately this is being rectified through the hard work of Kevin Ollivier, and a new package should be out sometime soon. This is expected to be a standard package which should simplify the installation instructions for Ubuntu users.

Once the wxWebKit package is available, a lot of development work will take place in SOFA Statistics to provide auxiliary graphs which support analysis e.g. by displaying the data distributions in the samples used for an ANOVA. It will finally be possible to really start delivering on the “learn-as-you-go” promise of SOFA Statistics.

Testing the statistical algorithms

Friday, August 21st, 2009

A statistical program has to produce accurate results reliably. And it has to keep doing so even when some aspects of the program change between versions. Seemingly trivial or inconsequential programming changes can have an enormous impact on the final result produced. So the only way to have confidence in a program is through automated testing. In many cases, it is also possible to test against a standard dataset with a guaranteed, known result.

The one-way ANOVA has passed the most difficult NIST test when using the default “precision” setting (as opposed to speed, which relies on floating point maths).

Additionally, the ANOVA, and all the other tests, are now tested using a number of carefully crafted Python functions and a simple program called NOSE. The tests can feed hundreds of random samples of data into each SOFA Statistics algorithm and check the output against a trusted algorithm e.g. from SciPy.
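The general shape of such a test can be sketched as follows. This is an illustration only and not the actual SOFA Statistics test script. The function sample_variance stands in for an algorithm under test, and the standard library's statistics.variance stands in for the trusted reference (the role SciPy plays above). The sample sizes, distribution, and tolerance are all assumptions.

```python
import math
import random
import statistics


def sample_variance(values):
    """Hypothetical algorithm under test: two-pass sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def check_against_reference(n_samples=200, seed=0):
    """Feed random samples to both implementations and count mismatches."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    failures = 0
    for _ in range(n_samples):
        size = rng.randint(5, 50)
        sample = [rng.gauss(100, 15) for _ in range(size)]
        expected = statistics.variance(sample)  # trusted reference
        if not math.isclose(sample_variance(sample), expected, rel_tol=1e-9):
            failures += 1
    return failures


failures = check_against_reference()
print("mismatches:", failures)
```

A test runner would normally wrap the final check in a test function that asserts the number of mismatches is zero.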

Of course, randomness is not enough to test an algorithm. It is necessary to also feed in cases where some values are very high, very close to zero, or very similar to other values. The specific approach necessary to separate out the weak algorithms depends on the particular test. The NIST ANOVA datasets, for example, include lots of values with the same leading digits and the only difference occurring after the decimal point. A deliberate approach to testing increases the odds of exposing errors.
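To see why values that share the same leading digits are so revealing, here is a small sketch. The data is made up in the spirit of that kind of dataset and is not taken from NIST. The exact version uses Python's fractions module purely as a stand-in for a "precision" calculation; it is not necessarily how SOFA Statistics does it. Both function names are hypothetical.

```python
from fractions import Fraction


def naive_variance(values):
    """Fast one-pass textbook formula using floating point maths."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


def exact_variance(decimal_strings):
    """Slow but exact sample variance using rational arithmetic."""
    xs = [Fraction(s) for s in decimal_strings]
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Made-up values that differ only after the decimal point.
data = ["1000000000000.2", "1000000000000.4", "1000000000000.6"]

exact = exact_variance(data)
naive = naive_variance([float(s) for s in data])
print("exact:", exact, "naive:", naive)
```

The exact result is 1/25 (that is, 0.04). The naive formula subtracts two enormous, nearly equal numbers, so the small differences between the values are lost to rounding before the subtraction happens and the answer cannot be trusted.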

In the open source world there is no need to take anyone’s word for it. The test script, and all the algorithms for SOFA Statistics, are open source, and any developers or statisticians who can extend or otherwise improve the tests are welcome to do so. That’s the open source way. So if you think of something that could help strengthen SOFA Statistics or its testing, please feel free to contact me.

As part of the testing just completed, a couple of small bugs were detected. These will be corrected in the next release, which is coming soon.