Archive for March, 2010

Good future for SVG/Javascript graphing

Wednesday, March 31st, 2010

Whether SOFA Statistics finally settles on RaphaelJS or Dojo for output charting, I am glad that it will be using SVG (and Javascript). These technologies, and support for them by mainstream web browsers, are only going to get faster and better.

RaphaelJS Pie Chart (NB dynamic in live version)

Although the example below is not a graph it gives a taste of what is possible using these technologies: http://svg-wow.org/audio/animated-lyrics.html. Imagine being able to add comments and highlights directly to a chart which you can share with anyone. No proprietary viewer necessary :-).

BTW the reason output charting is not yet available is that I’m waiting till the wxWebKit widget supports it. wxWebKit is the technology SOFA Statistics uses to display HTML (web content) internally. The good news is that the improvements to wxWebKit should be ready by June or July.

0.9.7 Improved data design, report tables, and ODS importing

Tuesday, March 30th, 2010

There are four big changes in the latest version. They’re not really new features as such – the changes are mainly focused on making SOFA Statistics more pleasant to use. We hope you like it :-).

1) When designing or redesigning data tables, users now have visual feedback on the changes made in the form of a demonstration table:

Data design form

2) The Projects form has been made a bit more attractive and coherent. Hopefully, it will be easier to use:

Project form refreshed

3) Demonstration report tables are more visually distinct from actual report tables (i.e. those run off the data):

Demo tables more faint

4) Better guidance when asking if a CSV file or spreadsheet has a header or not:

Import has header

Here is the full list of improvements:

  • When designing or redesigning a data table, users can see a demonstration table showing their changes as they go. It uses a few rows of real data where possible to be more realistic.
  • When prompted to choose whether the import source file has a header or not, example images appropriate to the file type are shown and the options are ‘Has header row’, ‘No header’, and ‘Cancel’.
  • Demonstration text data is now more varied in width and more sensible.
  • The Projects form has an improved appearance and is more friendly and easy to understand.
  • Styling of report tables improved, especially for the raw data list table.
  • Demonstration tables are more visually distinct from report tables based on actual data (more faint).
  • Better details left for user if unable to connect to a type of database.
  • Can now use delete key when naming fields in data table design view.

And there are bug fixes:

  • Fixed ODS import bug where repeating columns with values were only treated as one data cell. Major cleanup of ODS importing code.
  • Copes with user having multiple versions of wxpython installed – explicitly uses correct version.
  • Fixed bug when inserting new rows into a new data table design.
  • Fixed small bug where renaming an odd table name in design view wouldn’t result in the original named table being deleted.
  • Empty strings are no longer accepted as valid SQLite table or field names.
  • Fixed bug where enabling of Import button lagged changes to text for source name and SOFA Table Name.

0.9.6 Easier frequency tables; faster large data tables; can import Calc/Gnumeric spreadsheets

Monday, March 22nd, 2010

The latest version of SOFA Statistics (0.9.6) is well worth the upgrade.  Deep down, I was never happy with the approach to creating frequency tables.  It was elegant in some ways to think of a frequency table as a special case of a rows x columns report table but it was still a mistake.  Now there are four types of report table – Frequencies, Crosstabs, Row summaries (means etc), and simple Data Lists.  Here is how to make a simple Frequency Table for Age Groups with column percentages:

New Frequency Tables Interface

Also, large data tables load almost instantly now.  The table I just opened was over 200,000 rows and it simply appeared as soon as I clicked on the Open button.

The third major advance is for people who want to work with spreadsheets.  Until now, Excel has been the only option, which is only available on Windows systems.  Now users can enter data into an OpenOffice Calc or Gnumeric spreadsheet and it should be possible to import it successfully.  Please let me know how that goes.

ODS spreadsheets

Here is the full list of new features:

  • Added Frequencies as new report table type and substantially improved ease-of-use.
  • Major speed-up when opening data files as larger files no longer have their columns autosized.  There is a button to allow that to be triggered manually.
  • Now able to import from ODS files including OpenOffice Calc and Gnumeric-derived spreadsheets.
  • Action buttons on report tables form enabled/disabled according to completeness of configuration data.
  • The import button is disabled until suitable file and table names have been entered.
  • Shifted more close buttons to bottom right location.
  • Minor improvements to wording of importer dialog to reduce possible confusion.
  • The actual results continue to show for a report table if the user cancels changing the row or column configuration.  Doesn’t revert to demo data.
  • Importing now turns single dots ‘.’ into nulls (missing data) and informs the user.
  • Better error messages if import file not found.  Sets focus on SOFA Table Name if not provided.  Better handling of missing/misnamed css files.
  • Variable setting dialog now appears in more sensible position – esp on a notebook or netbook.

The bug fixes will probably be as important, especially if you have experienced any of them:

  • Fixed integer division issue which meant all row and column percentages were rounded down. Now 100.0* … rather than 100* …
  • Now copes with odd field names like ‘weight(kg)’ and ‘strength/100’ that would have broken SQL.
  • Opening the project select dialog now displays the notes for the selected project, which is not necessarily the first one.
  • Fixed bug which made csv importing unable to recover from data type mismatch. Also fixed bug in csv importing when importing missing cells. Now actually extracts nulls rather than the text ‘None’.
  • Minor fixes to row button enabling/disabling on report table dialog.
  • Fixed misc bugs that became apparent in Windows: right clicking opened dialogs twice; faulty script generation after changing table type; problems with ending busy cursor; and not giving proper message when no data in table.
  • Fixed raw table display problem – now shows raw value if no label available for particular item.
  • Project notes can cope with backslash U etc. Now escaped when written to project file.
  • Can view internal tables with dots in the name.
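
For the odd field names mentioned above, one common way to keep SQL valid is to wrap identifiers in double quotes, which SQLite accepts. The following is only a minimal sketch in Python 3 and not necessarily how SOFA handles it: the quote_identifier helper and the demo table are hypothetical names made up for illustration.

```python
import sqlite3


def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded quotes.

    Hypothetical helper for illustration only.
    """
    return '"' + name.replace('"', '""') + '"'


con = sqlite3.connect(":memory:")

# Field names that would break SQL if used unquoted.
fields = ["weight(kg)", "strength/100"]
column_defs = ", ".join(f"{quote_identifier(f)} REAL" for f in fields)
con.execute(f"CREATE TABLE demo ({column_defs})")
con.execute("INSERT INTO demo VALUES (70.5, 0.8)")

# The quoted name is treated as a column, not as a function call or a division.
row = con.execute(
    f"SELECT {quote_identifier('weight(kg)')} FROM demo"
).fetchone()
print(row)
```

Values should still be passed as query parameters; quoting like this is only for table and field names, which cannot be parameterised.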

SQL & integer division (why 5/2 usually equals 2!)

Monday, March 15th, 2010

I came across integer division in Python 2.x. If you divide one integer by another you get an integer result. So 5/2 = 2 instead of 2.5. You get floor division, not true division (Python – Changing the Division Operator). In Python 3, true division is the default (thank goodness) but in Python 2.x you need to make one of the numbers a float to get a float returned. So 5.0/2 = 2.5. I was bitten by this early on and know the standard way of handling it.
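
A quick sketch of the difference, written for Python 3 (the variable names are just for illustration). In Python 2.x the first expression would give 2 unless one operand is a float or the module starts with from __future__ import division.

```python
# Python 3: "/" is true division, "//" is floor division.
true_result = 5 / 2        # 2.5 (in Python 2.x this was 2)
floor_result = 5 // 2      # 2, the same in Python 2.x and 3
float_operand = 5.0 / 2    # 2.5 in both, because one operand is a float

# Floor division rounds down, not towards zero.
negative_floor = -5 // 2   # -3

print(true_result, floor_result, float_operand, negative_floor)
# prints: 2.5 2 2.5 -3
```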

What I didn’t know was that integer division was the norm in SQL database SELECT statements. I had mainly been using MySQL and MySQL was pretty unique as it turned out:

MySQL by default does floating point division, even if both operators are of type INTEGER, so the above [1/2] would return 0.5 in MySQL. All of the other database engines tested do integer division, and return an integer result. (SQLite – Differences Between Engines).

Anyway, in SOFA Statistics, row and column percentages were affected by this behaviour and always returned x.0 %. There was never anything other than zero after the decimal point. The fix was very simple. Instead of SELECT … 100*num/denom the relevant code is SELECT … 100.0*num/denom. The 100 is now a float, for those who missed that small but significant difference, so the division that follows is no longer between two integers.
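
The behaviour is easy to reproduce with Python's built-in sqlite3 module. This is a minimal sketch in Python 3, not SOFA's actual query: the counts table and its num and denom columns are made up for illustration. It also shows that the float has to take part in the division itself. If the integer division sits inside brackets it is evaluated first, and the fraction is already lost by the time the float is applied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table holding one numerator/denominator pair as integers.
con.execute("CREATE TABLE counts (num INTEGER, denom INTEGER)")
con.execute("INSERT INTO counts VALUES (1, 3)")

truncated, still_truncated, fractional = con.execute(
    """
    SELECT 100 * num / denom,      -- all integers: integer division
           100.0 * (num / denom),  -- integer division happens inside the brackets
           100.0 * num / denom     -- float joins in before the division
    FROM counts
    """
).fetchone()

print(truncated)        # 33
print(still_truncated)  # 0.0
print(fractional)       # roughly 33.33
```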

0.9.5 major bug fixes for data importing & better default form height by screen size

Wednesday, March 10th, 2010

SOFA Statistics continues to push on towards an eventual 1.0 release. Even though SOFA is still tagged as an early release on the main project website, and as a beta program on the SourceForge site, there have been over 7,000 downloads already. And the current release feels more solid to me than some of the expensive proprietary products I have used in my career as a researcher and analyst. Of course, please feel free to report any bugs that you find (http://groups.google.com/group/sofastatistics) so they can be fixed before the next release.

The basic list of new features is:

  • SOFA now detects screen resolution and sizes height of dialogs/controls accordingly to better handle both netbooks and larger screens.
  • Now able to delete rows in data entry/editing and configuration tables using the delete key.
  • Added hyperlinks to main form for main project website and community website.
  • The project configuration window can be completely resized making it easier to see all settings.
  • The read-only default project form now has an OK button at the bottom right so it can be closed the same way as every other form.
  • Added preliminary chart configuration form. Output charting still under development. In some ways this is the most important development, even though it doesn’t offer anything to the user in the current version apart from a taste of the likely interface.
  • Hovering over image buttons now shows a different cursor in Ubuntu so users know that they are buttons they can click.

And the bug fixes of course:

  • Spreadsheets lacking a header are now correctly imported including the first line.
  • All forms of data import now cope with data which has altered field names in preparation for import into SQLite.
  • Clustered bar charts in the Chi Square output work even if there are lots of bars missing from some clusters.
  • Removed import statement that caused problems on some systems.
  • Button clicks occurring while inside a configuration text box now work as expected on first attempt.
  • If the primary key for an MS Access database lacks autonumbering, it will only be eligible for saving if it has a non-missing value.
  • Hourglass doesn’t remain open too long when clicking Expand button in Ubuntu.

I’m currently running a small poll on the main project page about the most important things to work on next (http://www.sofastatistics.com). Please vote if you haven’t already.

Follow sofastatistics on twitter

Wednesday, March 3rd, 2010

I finally succumbed and created a twitter account – you can follow the project on http://www.twitter.com/sofastatistics.