0.9.0 can examine normality of variables when selecting main test to use

January 4th, 2010

The 0.9 series has finally begun! The main emphasis of this series is to enhance the learn-as-you-go goal of SOFA Statistics. For example, when choosing a statistical test it is important to know if the data is normally distributed or not. You can do this at the point where you’re choosing your test:

Graph to help determine normality

Here is the list of new features added in 0.9.0:

  • When selecting a statistical test to use, users can examine the normality of a variable visually and with a test of skew and kurtosis. The thumbnail can be expanded into a more detailed histogram.
  • The project selection dialog now defaults to the project last selected during the session.
  • The data selection dialog opens with the last database and table selected during the session.
  • Information on any filter used is also displayed in the immediate output.
  • The values displayed in tests like the ANOVA take any filtering into account.
  • When exporting a script, the user is reminded where it is being appended.
  • When editing variable details, the user is reminded where the changes are being saved.
  • The user gets a useful message if a deprecated project file format is being used.
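The post doesn't say exactly which skew-and-kurtosis test SOFA Statistics runs. As a rough illustration only, the Python sketch below swaps in the Jarque-Bera test, which combines sample skewness and excess kurtosis into a single statistic. The function name jarque_bera is our own, not part of SOFA.

```python
import math


def jarque_bera(values):
    """Return (skewness, excess_kurtosis, jb_statistic, p_value).

    Uses biased (population) moment estimators. Under normality, JB is
    asymptotically chi-square with 2 degrees of freedom, whose survival
    function is exp(-x / 2), so no stats library is needed for the p-value.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return skew, excess_kurt, jb, p_value
```

The chi-square approximation behind the p-value is asymptotic, so with small samples it is worth looking at the histogram as well as the number.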

Bug fix:

  • Fixed a simple naming bug that prevented the independent t-test and Mann-Whitney test from running.

This is the first release to include Matplotlib (scientific graphing), and the 0.9 series of releases should include lots more graphing and internal documentation to a) help users choose the correct test and b) confirm that they made the right choice.

NB The current plan is to use RaphaelJS for the output charting, not Matplotlib. Matplotlib graphics will be restricted to a supporting role, albeit a very important one. This decision is consistent with the “beautiful output” goal of SOFA Statistics. RaphaelJS is still maturing as a library, so it may be a while until conventional output graphing is part of SOFA Statistics. But most of the foundations are in place.

0.8.15 flexible data filtering options and support for US-style dates

December 23rd, 2009

Version 0.8.15 has the following new features:

* Data filtering: If you’re only interested in data for one nation, for example, you can add a filter so reports are derived from that nation’s subset of the data. It is now easy to add, remove, and alter filters on individual tables. This feature makes analysis much more fluid and is arguably one of the most important changes in a long time. Check it out. Almost anywhere you can see tables listed, you can right-click on the table name and set, alter, rename, or remove a filter.
* Support for US-style dates: SOFA Statistics now lets you set date entry format preferences, e.g. if you are in the US, Canada, the Philippines, Belize, Palau, or the Federated States of Micronesia, you may wish to allow mm/dd/yy as a date format for data entry instead of dd/mm/yy. Over time, there will no doubt be other preference options available.
* Default data table now has a datetime field to experiment with.
* More tooltips and misc usability improvements.

There have also been miscellaneous bug fixes including:

* Fixed a bug that meant you couldn’t immediately open a freshly-created table.

In addition, there has been further expansion of test coverage and a fix of sorts for the Karmic launcher. There is a problem (hopefully temporary) where users of GTK applications can click a button, see it depress, and have nothing actually happen (unless they then press the Enter key). For SOFA Statistics, this meant that when an Ubuntu Karmic user clicked the new Preferences button, the preferences dialog wouldn’t appear. A partial solution has been to change the launcher slightly so it sets the GDK_NATIVE_WINDOWS environment variable. So instead of:

python /usr/share/pyshared/sofa/start.py

the launcher now invokes:

bash -c "GDK_NATIVE_WINDOWS=1 python /usr/share/pyshared/sofa/start.py"

… which solves the problem in many cases. If it doesn’t, the work-around is to click the Preferences button and then press the Enter key. Fortunately, the Preferences button will hardly ever be used. Ideally, the problem will be fixed properly in newer versions of SOFA Statistics as the underlying code libraries are fixed.
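The same work-around can also be expressed in Python rather than bash. This is a minimal sketch, not SOFA’s actual launcher: the helper names launch_env and launch are hypothetical, and it runs start.py with the current interpreter (sys.executable) rather than whichever python is first on the PATH.

```python
import os
import subprocess
import sys

# Path taken from the launcher command in this post.
START_SCRIPT = "/usr/share/pyshared/sofa/start.py"


def launch_env(base_env=None):
    """Return a copy of the environment with GDK_NATIVE_WINDOWS=1 added.

    The caller's environment (or os.environ) is copied, not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GDK_NATIVE_WINDOWS"] = "1"
    return env


def launch(script=START_SCRIPT):
    """Run the start script with the work-around applied; return exit code."""
    return subprocess.call([sys.executable, script], env=launch_env())
```

Setting the variable only in the child process’s environment keeps the work-around scoped to SOFA Statistics instead of every GTK application in the session.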

Adding features, fighting fires

December 20th, 2009

The next version (0.8.15) will have two big new features:

  • The ability to apply filtering to tables. For example, if you are only interested in the data for one gender, you could add a filter so that all the reports you make based on a given table are derived from the selected gender’s subset of the data. It is easy to add, remove, and alter filters on tables.
  • The ability to set preferences, e.g. if you are in the US, Canada, the Philippines, Belize, Palau, or the Federated States of Micronesia, you may wish to allow mm/dd/yy as a date format for data entry instead of dd/mm/yy. Over time, there will no doubt be other preference options available.

There have also been miscellaneous bug fixes, smaller features (e.g. tooltips), and further expansion of test coverage. Now to the most recent fire I had to put out. There is a problem (hopefully temporary) where users of GTK applications can click a button, see it depress, and have nothing actually happen (unless they then press the Enter key). For SOFA Statistics, this meant that when an Ubuntu user clicked the new Preferences button, the preferences dialog wouldn’t appear. The solution was to change the launcher slightly so it sets the GDK_NATIVE_WINDOWS environment variable. So instead of:

python /usr/share/pyshared/sofa/start.py

the launcher now invokes:

bash -c "GDK_NATIVE_WINDOWS=1 python /usr/share/pyshared/sofa/start.py"

… which solves the problem completely.

I made a similar change to the launcher for Eclipse. See http://www.eclipse.org/forums/index.php?&t=msg&th=153842 and http://blog.export.be/2009/10/fixing-eclipse-for-ubuntu-karmic-koala-9-10/.

0.8.14 much more flexible management of data table design inside built-in database

December 9th, 2009

The latest release not only has the usual bug fixes and usability enhancements, it has a few new pieces of functionality.

  • Can modify the design of tables in the built-in SOFA database (variable names, types, order, table names).
  • Able to insert and delete rows when editing value labels.
  • Can delete tables in the built-in database.
  • The sofa id field is now read-only when designing tables and the user gets useful messages if they try to delete it or insert a new field before it.

The bug fixes were:

  • CSV importer copes with inconsistent newline characters wherever they are encountered.
  • Fixed bug when a key is deleted in a value list and then OK is hit.
  • Removed bug where focus would move erratically when entering long lists of value labels.
  • Changing database or table when making table reports resets the column buttons as it should.
  • Numbers like 1000000000000.2 are now displayed as simple numbers instead of as 1e+12 etc.
  • Newly-created tables appear in the table dropdown list immediately.
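The post doesn’t show how the number display was fixed. For context, formatting a float with 12 significant digits (the %.12g format) turns 1000000000000.2 into 1e+12. One way to avoid that in Python 3, sketched below with a hypothetical plain_number helper, is to take repr(), which gives the shortest string that round-trips to the same float, and re-render it in fixed-point notation via decimal.Decimal.

```python
from decimal import Decimal


def plain_number(value):
    """Format a float in fixed-point notation, never scientific notation.

    repr() yields the shortest string that round-trips to the same float
    (e.g. '1000000000000.2' or '1e-07'); Decimal's 'f' format then spells
    it out without an exponent.
    """
    return format(Decimal(repr(value)), "f")


print("%.12g" % 1000000000000.2)        # 1e+12
print(plain_number(1000000000000.2))    # 1000000000000.2
print(plain_number(1e-7))               # 0.0000001
```

Going through repr() first avoids exposing binary rounding noise, which a fixed number of decimal places (e.g. "%f") would show for many values.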

IMPORTANT FOR ALL UPGRADING USERS – you must delete the /username/sofa folder first (after storing anything you wish to save). Also, inside project files, change conn_dets to con_dets.
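For anyone with several project files to update, the rename can be scripted. This is a minimal Python sketch, assuming the project files are plain UTF-8 text; the helper names are hypothetical. It simply replaces every occurrence of conn_dets, so check the result by eye afterwards. It does not delete the sofa folder for you.

```python
import shutil


def upgrade_project_text(text):
    """Replace every occurrence of the old conn_dets name with con_dets."""
    return text.replace("conn_dets", "con_dets")


def upgrade_project_file(path):
    """Rewrite one project file in place, saving a .bak copy first."""
    shutil.copyfile(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(upgrade_project_text(text))
```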

Funding open source to benefit education, health etc

December 1st, 2009

Government investment in open source makes sense as a way of reducing the cost of government service delivery.

The model is this – a government agency invests a sum of money to benefit a government sector e.g. education, health, social service delivery etc. As long as the software is open source, the government sector gets a multiplier effect – any further development of the software in other countries will further benefit education, health, social service delivery etc in the original country.

“This has certainly been New Zealand’s experience with Moodle, a GPL licenced online learning management system. A few years ago Moodle was a nice platform, had a good development community and a few hundred sites. A New Zealand TEC fund was used to add a number of features, not least amongst these were some very solid enterprise and performance capabilities.

Moodle is now used in over 13,000 sites and must be the world’s most popular LMS. Investment from institutions such as the UK’s Open University has now well exceeded NZ’s initial investment. Yet because of the GPL New Zealand still gets to benefit from other peoples’ significant work.” (New Zealand Open Source Society 2007)

Now imagine if there was a hypothetical open source program 😉 that could be used throughout the education sector (senior secondary, polytechs, universities), across a myriad of smaller organisations that supply social services on behalf of government, and across the private sector as well. It enables everything from the teaching of statistics through to routine management reporting. A bit of assistance in its further development could have a large spin-off. Makes you think.

0.8.13 better support for inconsistent data and missing values

December 1st, 2009

Removing rough edges and handling less perfect incoming data have been the recent focus of attention. The main changes in 0.8.13 are:

  • Better support for inconsistent data and missing values.
  • The user is informed what happens to files being imported.
  • The user has the option of halting a large raw data table report.
  • CSV importing now asks the user whether the file has a header row and offers to clean up files with mixed line separators.
  • New projects have the default SOFA database preconfigured, ready to save tables to.
  • The program now returns a useful message if an SQLite table with a median or standard deviation fails because of non-numeric values in a purportedly numeric field.
  • Attempts to make new tables without a connection to the default database now receive a useful message.
  • Users are given the opportunity to pull out of opening a large data table.
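Mixed line separators typically mean a file combining Windows (\r\n), old Mac (\r) and Unix (\n) line endings. SOFA’s importer code isn’t shown here; the hypothetical Python sketch below normalises everything to \n before parsing and takes the user’s answer about the header row as a parameter. Note that it would also rewrite line breaks inside quoted fields.

```python
import csv
import io
import re


def normalise_newlines(text):
    r"""Convert Windows (\r\n) and old Mac (\r) line endings to Unix (\n)."""
    return re.sub(r"\r\n?", "\n", text)


def read_csv_text(text, has_header):
    """Parse CSV text; has_header is the user's answer to the header question.

    Returns (header_row_or_None, data_rows).
    """
    rows = list(csv.reader(io.StringIO(normalise_newlines(text))))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


header, rows = read_csv_text("name,age\r\nAnn,30\rBob,41\nCat,25", has_header=True)
# header == ['name', 'age']
# rows == [['Ann', '30'], ['Bob', '41'], ['Cat', '25']]
```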

Also importantly, there have been a myriad of bug fixes:

  • Fixed bug when using SQLite database other than the SOFA default one.
  • Report tables now handle non-numeric values in numeric fields (SQLite).
  • Fixed simple but fatal bugs affecting raw and summary tables.
  • Fixed bug when user path includes international characters.
  • Fixed bug when using MS SQL to make report tables.
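SQLite lets a column declared as numeric hold text (its type affinity only converts values that look like numbers), which is how non-numeric values end up in a purportedly numeric field. Below is a hedged Python sketch of how such rows could be found before calculating a median or standard deviation; non_numeric_rows and the survey table are hypothetical, not SOFA’s code.

```python
import sqlite3


def non_numeric_rows(con, table, column):
    """Return (rowid, value) pairs whose stored type is not numeric or NULL.

    Identifiers can't be bound as SQL parameters, so table and column are
    quoted and must come from a trusted source (e.g. the schema itself).
    """
    sql = (
        f'SELECT rowid, "{column}" FROM "{table}" '
        f"WHERE typeof(\"{column}\") NOT IN ('integer', 'real', 'null')"
    )
    return con.execute(sql).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE survey (age REAL)")
# "41" is converted to 41.0 by REAL affinity; "unknown" is stored as text.
con.executemany(
    "INSERT INTO survey (age) VALUES (?)",
    [(30,), ("41",), ("unknown",), (None,)],
)
print(non_numeric_rows(con, "survey", "age"))  # [(3, 'unknown')]
```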

Creating GUI tests

November 23rd, 2009

Over time, the goal is to extend the test coverage of SOFA Statistics. The GUI side of things needs to be included in this. Here is a link to some good resources:

http://groups.google.com/group/wxpython-users/browse_thread/thread/f68f415e8ef26b36?hl=en

0.8.12 – more flexible installation on Windows and better international support

November 21st, 2009

Another release with an emphasis on quality rather than new features. More new features will be coming soon once the wxWebKit and RaphaelJS technologies are mature enough. In the meantime:

  • More flexible installation options on Windows e.g. installing to D:\Program Files rather than C:\Program Files.
  • Better support for international text and Unicode (e.g. René, Identität, François) in project files, variable details, HTML report files, and CSS files.
  • Better interface behaviour when configuring a project.
  • Better feedback when errors occur with missing or malformed files.
  • For Galician speakers, a version of SOFA Statistics in their own language now works on Windows.
  • Fixed bug allowing new rows to be added to read-only tables if using Tab/Return key on last cell and then repeatedly hitting Tab keys.

0.8.11 provides internationalisation support and a major fix for Vista/Windows 7

November 9th, 2009

The latest version of SOFA Statistics has some important improvements.

  • Fixed a major bug preventing interaction with data on Vista/Windows 7. It was caused by the “\U” combination inside project configuration files (e.g. C:\Users\…). The backslash-U combination was treated as the start of a Unicode escape sequence (as used for international text etc.), but an invalid one. Windows testing using XP didn’t pick this up because the venerable “Documents and Settings” folder in XP has been replaced with the “Users” folder in Vista and Windows 7.
  • Better support for international text and Unicode, e.g. René, Identität, François etc.
  • Better responses to errors saving data to database tables. For example, a user may try to save a word containing characters not supported by the underlying database table (such as a Unicode letter not found in the Latin character set).
  • For Galician speakers, a version of SOFA Statistics in their own language (currently only working in Ubuntu).
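A short Python 3 illustration of the \U problem, assuming (as the symptom suggests) that the path ends up being parsed as a Python string literal. It is not SOFA’s actual fix, and to_literal is a hypothetical helper. Writing the literal with repr(), which doubles each backslash, avoids the problem; so would using forward slashes, which Windows also accepts.

```python
import ast


def to_literal(path):
    """Return a Python string literal for path with backslashes escaped."""
    return repr(path)


path = "C:\\Users\\Ann"  # the actual text C:\Users\Ann

# Written naively, "\U" starts an 8-hex-digit Unicode escape and fails.
try:
    ast.literal_eval('"' + path + '"')
except SyntaxError as err:
    print("SyntaxError:", err)

# repr() doubles the backslashes, so the literal round-trips safely.
print(to_literal(path))                            # 'C:\\Users\\Ann'
print(ast.literal_eval(to_literal(path)) == path)  # True
```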

There is also a new version of wxWebKit etc available for Karmic (9.10) users thanks to Christoph Willing. NB this will also help some users of Jaunty (9.04) who have updated packages which conflict with those in SOFA Statistics. More details can be found at http://www.sofastatistics.com/predeb.php.

Misc translation issues

October 29th, 2009

The Galician translation is now complete.

For example:

#: dbe_plugins/dbe_sqlite.py:308
msgid "The SQLite details are incomplete"
msgstr "Os datos de SQLite están incompletos"

It is now time to enable SOFA Statistics to use multiple translations successfully. Here are some possible issues:

  • Overly-long strings: these can affect layout e.g. the buttons on the main form. There may be ways of abbreviating strings.
  • A locale not being installed on a computer. There is lots more to learn about this, but here is a Linux command for identifying what is installed on your system:
    locale -a
  • How to allow a user to select a locale (or to automatically use the locale of their computer).
  • Making sure everything works on Windows as well.
  • Getting the Galician po file approved by Launchpad (currently stuck in “Needs Review”)
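On the question of letting a user pick a locale or falling back to the computer’s own, Python’s standard gettext module already covers both cases. A minimal sketch; the domain name sofastats and the locale directory are assumptions, not SOFA’s real settings. With these defaults, a Galician catalogue would be looked for at locale/gl/LC_MESSAGES/sofastats.mo.

```python
import gettext


def get_translator(language=None, localedir="locale", domain="sofastats"):
    """Return a translations object for an explicit or the system language.

    language=None lets gettext consult the LANGUAGE, LC_ALL, LC_MESSAGES
    and LANG environment variables. fallback=True returns a
    NullTranslations object, which hands back the untranslated msgid,
    if no matching .mo file is found.
    """
    languages = [language] if language else None
    return gettext.translation(
        domain, localedir=localedir, languages=languages, fallback=True
    )


_ = get_translator("gl").gettext  # Galician if a gl .mo file is installed
print(_("The SQLite details are incomplete"))
```

Falling back to the untranslated text means a missing locale degrades to English rather than crashing the program.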

Testing will begin soon. Most importantly, the internationalisation of SOFA Statistics has begun in earnest :-).