Archive for February, 2010

Why should statistics be Open For All?

Sunday, February 21st, 2010

SOFA stands for Statistics Open For All. What does “Open For All” mean and why is it important?

Open For All means:

  • The statistical algorithms used are visible and can be examined by interested users with minimal difficulty (no Non-Disclosure Agreements or similar hurdles required).
  • The software is available in the languages users need. At this stage, SOFA Statistics is available in English and has been largely translated into Galician and partially into Russian, but the goal is to include as many languages as possible.
  • The program is available without payment (especially important for students and people in developing nations).
  • SOFA Statistics will run on as many computer environments as practical. Currently, only Windows and Ubuntu Linux are supported, but the goal is to add a Mac package ASAP, and possibly some other Linux distributions.
  • The program tries to reduce the prior knowledge a user needs in order to use the package successfully and appropriately. It does not assume that statistics can be used without thought or statistical insight, but the goal is to help users make the right decisions at the right points.

So why does this matter?

  • So students can easily access useful and educational statistical software (no, a spreadsheet doesn’t count 😉 )
  • To allow smaller or poorly resourced organisations (e.g. non-governmental social service organisations, charities, and groups in developing nations) to conduct basic quantitative research and generate useful ad hoc and routine reports
  • Because statistical thinking is a fantastic intellectual resource that deserves greater appreciation. It is a shame that the main idea most people have about statistics is “Lies, damned lies, and statistics”.

0.9.4 Additional output for 3 tests and numerous important bug fixes

Sunday, February 7th, 2010

0.9.4 is another important release. The new testing regime is flushing out all sorts of quirky bugs, as well as some more significant ones, so they can be fixed. Please join the discussion group if any surviving bugs are an issue for you (

Here are the new features of this release:

  • Paired t-test output includes a histogram of differences. This makes it easy to assess the normality of the distribution of differences.
    Paired t-test output

  • Kruskal-Wallis output now includes a table for each group containing its median, n, min, etc.
  • Mann-Whitney output now includes medians.
  • When using assistance to select a statistical test, the normality help dialog varies according to whether paired data is selected. If it is, two variables must be chosen, and the normality of their differences is analysed and displayed.
    Normality of Differences

  • When a cell edit fails validation, the cursor returns to the end of the text if possible, ready to edit immediately. This ends one major interface annoyance.
  • Users receive a useful message if there are no values to report in an analysis (e.g. because the data is over-filtered).
  • The Chi Square test provides useful messages if either the row or the column variable has too few values.
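A paired t-test is, in effect, a one-sample t-test on the differences between pairs, which is why it is the normality of those differences (not of each variable separately) that the histogram helps you judge. The following is a minimal sketch using only Python's standard library, assuming two equal-length lists of paired measurements; the function names and sample data are hypothetical and are not SOFA's own code. A library routine such as SciPy's `scipy.stats.ttest_rel` would also supply the p-value.

```python
import math
import statistics


def paired_differences(before, after):
    """Return the pairwise differences (after - before) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    return [a - b for b, a in zip(before, after)]


def paired_t_statistic(diffs):
    """One-sample t statistic of the differences against a mean of zero."""
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1  # (t, degrees of freedom)


def text_histogram(values, bins=5):
    """Count values into equal-width bins; return (lower_edge, count) pairs."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum lands in the last bin
        counts[idx] += 1
    return [(lo + i * width, c) for i, c in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical example data: the same six subjects measured twice.
    before = [12.1, 14.3, 11.8, 13.0, 15.2, 12.7]
    after = [13.0, 15.1, 12.2, 14.1, 15.9, 13.5]
    diffs = paired_differences(before, after)
    t, df = paired_t_statistic(diffs)
    print(f"t = {t:.3f}, df = {df}")
    for edge, count in text_histogram(diffs, bins=4):
        print(f"{edge:6.2f} | {'#' * count}")
```

A roughly bell-shaped spread of bars is what you hope to see before trusting the t-test; a heavily skewed histogram of differences points towards a non-parametric alternative such as the Wilcoxon signed-rank test.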

The list of bug fixes is quite substantial this time:

  • Independent t-test now works even when a string variable is used for grouping.
  • Fixed bug preventing scripts from being run independently of the GUI.
  • Fixed bug when exporting scripts to the saved scripts file.
  • Fixed minor UI bug which meant the paired option remained visible after the statistics selection was switched back to unguided.
  • Fixed bug that closed the cell editor if the user moved the mouse away while entering data.
  • Fixed bug triggered when switching from a project with a default database engine (e.g. MySQL) to another project in which that engine is not available. Changing project now wipes the stored default database engine.
  • Fixed bug when writing scripts containing Unicode characters.
  • If kurtosis etc. cannot be calculated, the rest of the results can potentially still be produced.
  • Chi Square now honours filter values in the script version.
  • Projects no longer have problems with new lines in their notes.
  • System copes with faulty project files better.
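One common way to get the behaviour described in the kurtosis fix is to calculate each statistic in isolation and record a failure instead of aborting the whole report. This is a hypothetical sketch, not SOFA's implementation; the `describe` and `kurtosis` helpers are illustrative. Kurtosis here is excess kurtosis from population moments, which cannot be computed when every value is identical (zero variance).

```python
import statistics


def kurtosis(values):
    """Excess kurtosis from population moments; raises ZeroDivisionError if variance is 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def describe(values):
    """Compute each statistic independently so one failure doesn't block the rest."""
    calculators = {
        "n": len,
        "mean": statistics.mean,
        "median": statistics.median,
        "stdev": statistics.stdev,
        "kurtosis": kurtosis,
    }
    results = {}
    for name, func in calculators.items():
        try:
            results[name] = func(values)
        except (ZeroDivisionError, statistics.StatisticsError, ValueError):
            results[name] = None  # mark as unavailable rather than abandoning the report
    return results


print(describe([5, 5, 5, 5]))  # kurtosis is None; the other statistics are still reported
```

Catching only the specific arithmetic and statistics errors (rather than a bare `except`) keeps genuine programming mistakes visible while still letting the remaining results through.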

Development attention will start turning to the following in due course:

  • Mac packaging
  • Making the “results only” and “brief” explanation level settings operational
  • Making more “Help” buttons functional
  • Enabling user-defined missing values
  • Adding an Oracle plug-in
  • Output charting
  • Connecting to on-line educational resources on statistics

How do you feel about the direction being taken? Good? Bad? Any feedback? Please feel free to discuss any aspect of this project at

0.9.3 adds clustered bar charts to Chi Square test

Monday, February 1st, 2010

0.9.3 has nice new graphical output for the Chi Square Test and a few other enhancements. At least as important, however, are all the bug fixes. These are the result of a new pre-release testing process.

Underlying the clustered bar charts is the boomslang library, which provides a simplified interface to common matplotlib charts. What a great idea, and what a great name for a Python library.

Summary of new features in version 0.9.3:

  • Chi Square output includes clustered bar charts to display proportions and frequencies for the two variables selected.
    Chi Square output clustered bar charts

  • Drop-downs default to the most recently used database and table. This recognises that most of the time you are using the same table as you used in the last analysis.
  • More helpful messages when trying to use variables with too many values for Chi Square.
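Behind a clustered bar chart for Chi Square sits a simple crosstab: the frequency of each combination of the two variables, plus proportions within each group. The sketch below computes both with the standard library and includes a validation step in the spirit of the "too few" and "too many values" messages. It is illustrative only: which proportions SOFA plots (row, column or total) isn't stated here, so column proportions are an assumption, and the `max_values=30` limit is a hypothetical threshold, not SOFA's actual one. SOFA draws the charts with boomslang on top of matplotlib; the drawing step is left out so the example runs without those libraries.

```python
from collections import Counter


def check_chi_square_inputs(rows, cols, max_values=30):
    """Hypothetical check: each variable needs between 2 and max_values distinct values."""
    for label, var in (("row", rows), ("column", cols)):
        n_distinct = len(set(var))
        if n_distinct < 2:
            raise ValueError(f"The {label} variable needs at least 2 values (found {n_distinct}).")
        if n_distinct > max_values:
            raise ValueError(
                f"The {label} variable has too many values ({n_distinct} > {max_values})."
            )


def crosstab(rows, cols):
    """Frequency table as {(row_value, col_value): count}."""
    return Counter(zip(rows, cols))


def column_proportions(table):
    """Share of each row value within its column value (each column sums to 1)."""
    col_totals = Counter()
    for (_, col), count in table.items():
        col_totals[col] += count
    return {(r, c): count / col_totals[c] for (r, c), count in table.items()}


# Hypothetical data: one clustered bar per (gender, smoker) combination.
gender = ["M", "F", "F", "M", "F", "M"]
smoker = ["yes", "no", "no", "no", "no", "no"]
check_chi_square_inputs(gender, smoker)
freqs = crosstab(gender, smoker)
props = column_proportions(freqs)
```

Plotting frequencies shows the raw counts, while proportions make groups of different sizes comparable at a glance, which is why it helps to offer both views.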

Bug fixes:

  • Fix for Linux users with a 4-digit year date format.
  • Fixed encoding display issue for Windows users.
  • Miscellaneous fixes to the behaviour of the table design dialog. Numerous bugs were flushed out by more extensive user testing before release.
  • The Expand button is disabled if a report runs but is not successful (e.g. it returns a warning).
  • The default database and table are saved correctly according to database engine (e.g. MySQL, MS Access etc). This ensures valid projects can always open.