SOFA exports high-resolution results and more

February 9th, 2014

SOFA users can now export their output results as high-resolution images, PDFs, and spreadsheets ** without requiring additional plug-ins.

Easy high-quality exporting

It has never been easier to produce high-quality output ready to include in presentations, papers for publication etc.

High resolution output

It will also be possible to export table data. If there is a specialised analysis you can’t do in SOFA it will be much easier to export the data and import it into another stats package for that part of the process.

Export to spreadsheet from data

And the ability to backup your SOFA data and settings is built in.

Backup data and settings

So 1.4.2 is quite a major step forwards for the majority of users. I really hope you like it and spread the word.

** Doesn’t work for Macs currently – very sorry :-(. Any Mac users with Python experience are encouraged to contact the project – there are several ways you might be able to help SOFA resolve this problem. And get a little famous ;-).

1.4.1 Adds regression line for scatterplots

January 31st, 2014

Scatterplots can now be produced with a regression line and slope and intercept details:

Scatterplot regression lines
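
For readers curious about where a slope and intercept come from, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `fit_line` is made up for this illustration, and the post does not say how SOFA itself performs the calculation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration only, not SOFA's own code.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x from its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values have no variation, so the slope is undefined")
    # Sum of cross-products of the x and y deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The guard against zero variation in x matters for charting: a vertical cloud of points has no meaningful regression line.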

Additionally, the Export output plug-in (proprietary add-on) now gives the option of exporting tabular data to a spreadsheet. And there are some other minor improvements:

  • Better positioning of legend in scatterplots made by matplotlib.
  • Tweaked algorithm for getting optimal min and max axis values so more sensible when no variation.
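
The "no variation" case is easiest to see in code. The sketch below is only one plausible way to pick axis limits; the name `axis_bounds` and the 10% padding are assumptions for this example, not SOFA's actual algorithm.

```python
def axis_bounds(values, pad_fraction=0.1):
    """Return (axis_min, axis_max) with some padding around the data.

    Hypothetical helper: if every value is identical, force the min and
    max apart so the axis still has a visible range.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # No variation: pad relative to the value, or by 1.0 if the value is 0
        pad = abs(lo) * pad_fraction or 1.0
        return lo - pad, hi + pad
    # Normal case: pad by a fraction of the data range
    pad = (hi - lo) * pad_fraction
    return lo - pad, hi + pad


print(axis_bounds([5, 5, 5]))
print(axis_bounds([0, 10]))
```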

The latest release also fixes a number of edge-case bugs:

  • Fixed bug in charting when users use variable names SOFA used in underlying SQL queries.
  • Fixed bug when refreshing database table dropdowns when no databases visible – was assuming the databases were included in the number of items in the sizer.
  • Fixed bug when print content redirected to output file in Windows and Mac – now converted to utf-8 byte strings directly by overriding sys.stdout and sys.stderr with codecs.getwriter etc. Immediate impact is fix for bug when recoding a table and field names include non-ascii characters.
  • Fixed up fonts used so always look good on all systems.
  • Fixed bug which can occur when designing a new table. If we recode it before clicking the Update button, SOFA thinks we are trying to override another table of the same name. This is because SOFA started out thinking our table had no name and was never updated to tell it otherwise.
  • Minor changes to enable translation.
  • Fixed bug when importing empty pairs of double or single quotes. These were already being de-escaped (as a side-effect of the approach necessary to handle internal quote escaping in the csv module) and turned to solo quotes – and thus evading the check for blank raw vals which would have been turned to NULL.
  • Fixed bug where the error message complained about too many rows when the real problem was too many columns e.g. in Chi Square test.
  • SOFA now checks very early to see if you’ve installed SOFA under a local user folder instead of a program folder.
  • Fixed bug in PostgreSQL plug-in when working with a numeric field lacking a defined decimal places or numeric precision setting.
  • Show scatterplot minor axis ticks more readily so better when fewer distinct x values.
  • Scatterplots cope with absence of variability in an axis by forcing a different min and max for that axis.
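
The utf-8 fix mentioned in the list above relies on `codecs.getwriter`, which wraps a byte stream so that any text written to it is encoded on the way through. Here is a small sketch of the idea, written for Python 3 and using an in-memory buffer as a stand-in for the redirected output file. The field name is invented for the example, and the post does not show SOFA's real code.

```python
import codecs
import io

# Stand-in for an output file opened in binary mode (assumption for this sketch)
raw = io.BytesIO()

# codecs.getwriter("utf-8") returns a StreamWriter class; calling it with a
# byte stream gives an object whose write() accepts text and emits utf-8 bytes
writer = codecs.getwriter("utf-8")(raw)

# A made-up field name containing a non-ascii character
writer.write("naïve_field\n")

encoded = raw.getvalue()
print(encoded)
```

The same wrapper can be assigned over a process-wide stream so that every print ends up encoded consistently, which is the approach the fix describes for sys.stdout and sys.stderr.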

SOFA releases 1.4.0 for Christmas :-)

December 15th, 2013

Ease of use is one of SOFA’s main goals (along with “learn as you go”, and “beautiful output”). Unfortunately, as new options were added to SOFA for exporting data, the simplicity of the output section of the user interface suffered slightly. New buttons were squeezed in one by one and the interface was getting more and more crowded. Something was going to have to change. And in version 1.4.0 it finally has! – I hope you like the change. It not only removes two interface items but it also adds room for more export options in the future. And there is more horizontal space in the drop-down control to describe each option more clearly and distinctly.

Here is the old design:

Old output layout

Lots of buttons

Not too bad, but lots of buttons, and more needed in the future. Here is the new design:

New output layout

It drops two items. There is also scope for adding more export options. Here are the current options available as displayed in the drop-down control:

Room for more

Obviously, it wouldn’t be hard to add a few more given it is a list. The Show Results button is larger, and its relationship with the report is more obvious by adding “Also” to “Also add to report”.

Large Show Results button

Only modest changes in many ways but hopefully another step in improving the user experience. Merry Christmas 🙂

Never expected to see 150,000 downloads

November 19th, 2013

Creating and sharing a software project is a leap into the unknown. Will anyone use it? Will anyone like it? And although download numbers are a very imperfect measure, they can provide encouragement when engaged in the numerous tasks associated with a project. So it is with great pleasure that I can announce SOFA Statistics has passed the 150,000 milestone on Sourceforge.

150,000 downloads milestone

Thanks to all the people who have helped make this possible. I’m still thinking about what to do with SOFA next but it seems to have found a niche as a general purpose, open source statistics application. And I’m still trying to stay true to the mantra “ease of use, learn as you go, and beautiful output”.

If you’ve liked using SOFA, please consider sending me a brief message. I’m keen to hear where in the world people are from, what sorts of things SOFA has been used for, and anything else interesting. Please include at least your first name – I’d like to display some of these messages to promote SOFA. Thanks in advance. [Note – my purpose is to collect some feedback to share, not to gather email addresses. But I expect I will personally reply to a few emails if I have time.]

Version 1.3.5 adds some simplicity

October 14th, 2013

Colin Chapman, founder of Lotus Cars, had the following philosophy for automotive design: “Simplify, then add lightness”. Architect Ludwig Mies van der Rohe used to say “Less is More”. In a similar vein, the latest version of SOFA adds some simplicity for all those users who only use the default database (see Database?! But I just want to analyse my data!). Here is the list of feature changes introduced with version 1.3.5 of SOFA:

  • Simplified access to data for most users. SOFA now only displays a list of data tables (instead of showing both a list of databases and tables) unless there are multiple databases to choose from. For most users, the only database will be the default database SOFA uses to store imported or hand-entered data.
  • Slovenian support added (thanks to Nino Rode :-)).

And the following bugs have been fixed:

  • Fixed bug in histogram output when limited data spread. Error caused by miscalculation of significant decimal points required for display.
  • Fixed bug stopping late-added title details appearing when exporting output. The demo output is refreshed first so the source file is forced to be up-to-date. Probably needs a proper tidy-up some day but this works well in the mean time.
  • Fixed bug in exporting to desktop folder in fix_pdf – need to strip end off folder name when no AM/PM under localisation used.
  • Fixed various bugs associated with exporting output. When copying output, message about keeping form open now names form it means (to reduce confusion).
  • Fixed bug when using a project after it has just been deleted (by pressing cancel in select projects dialog after having deleted the currently active project).
  • Fixed bug which meant “Show Results” and “Add to report” options were displayed when setting up a project.
  • Fixed bug when cancelling a variable details selection in a project.

I hope you like the latest version.

Database?! But I just want to analyse my data!

August 22nd, 2013

SOFA aims for ease of use as part of its “ease of use, learn as you go, beautiful output” mantra. But it confronts users with having to think about databases, even if just working with simple spreadsheets of data or some data entered by hand.

This was the usability problem brought to my attention by a member of the community, Jan Dittrich. Jan is completing a Masters in Media Arts and Design at the Bauhaus University in Weimar/Germany. He mainly does user research and usability, but has an interest in statistics as well. When using SOFA he noticed that a “database” needs to be selected for most of the activities but that it might be a rather technical concept for some of those who use SOFA. He wrote me an email addressing the problem and we subsequently exchanged ideas.

So how to address this without removing one important ability of SOFA – namely the ability to connect directly to people’s data when it is in a database (e.g. MySQL)?

We explored a few options …

Initial GUI ideas from Jan

… but ended up following the principle of “the least we could do” as recommended in the fantastic usability book “Rocket Surgery Made Easy” by Steve Krug.

Rocket Surgery Made Easy

As Krug notes, tweaking is usually better than redesigning because 1) it actually gets done; 2) larger changes are inevitably going to break some things (think months of squashing all the bugs out again); and 3) redesigns annoy a lot of existing users who have gotten used to the status quo (actually Krug has 9 reasons but these are my favourites).

Anyway, I had no enthusiasm for a major GUI overhaul but it did not make sense to leave a known usability problem in place. What Jan and I came up with was rather simple and elegant. SOFA only shows the Database label and drop-down if the user has configured SOFA to connect to any databases. Expect to see this change in the next version (1.3.5).

Database details only displayed when needed

Users who have database connections will notice no difference. But for everyone else the interface will be simpler and easier to use. Sometimes, less is more.

Where to for SOFA?

August 3rd, 2013

The SOFA Statistics project could go in a number of different directions. Ideally, it would:

  • Add more chart types and more flexibility for graphical customisation
    (without compromising the SOFA goals of beautiful output and ease-of-use)
  • Add a comprehensive array of the most important statistical tests
    (without compromising the ease-of-use and learn-as-you-go goals)
  • Make it much easier to automate reporting
  • Make publishing reports to the web, and office formats, seamless and simple.

I only have limited time to develop SOFA at the moment, so I have to choose the top priorities. Here is what I think I should do:

  • Charting
    • Make it easy to export data so it’s ready for charting using spreadsheet charting tools
    • Provide brief documentation on how to use advanced tools like Matplotlib
  • Statistical Tests
    • Make it really easy to export data from SOFA ready for analysis in R
  • Report automation
    • Provide documentation so people can automate SOFA themselves using Python
  • Publishing
    • Add a plug-in for exporting to a document format

I should also solve the remaining bugs preventing Mac users from being able to export output as images. What do people think about this direction? Drop me a line at

Lots of bug fixes (esp MS Access & SQL Server)

July 23rd, 2013

SOFA Statistics 1.3.4 is not an adventurous release but it squashes plenty of bugs – especially for MS Access and MS SQL Server users. Here are the features:

  • Can make more complex charts and larger series of charts. It is now possible to override the conservative limits on charts e.g. the maximum number of series or charts or clusters. A warning is shown that you may not necessarily produce a viable chart or set of charts. But often it will work so now you get to try and see.

High number of charts

Lots of images

  • Importing now copes with excessively long field names by shortening them automatically (without risking duplicates).
  • MS SQL Server views can now be analysed, not just tables.
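
Shortening field names safely means making sure two long names don't collapse into the same short one. The sketch below shows one way to do that; the function name `shorten_names`, the 20-character limit, and the numbered suffix scheme are all assumptions for illustration, not a description of what SOFA does internally.

```python
def shorten_names(names, max_len=20):
    """Truncate names to max_len characters, keeping them unique.

    Hypothetical helper: when a truncated name is already taken, a
    numbered suffix such as "_1" replaces the end of the name.
    Assumes max_len is comfortably longer than any suffix needed.
    """
    used = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        n = 1
        while candidate in used:
            suffix = f"_{n}"
            # Leave room for the suffix so the result still fits in max_len
            candidate = name[:max_len - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


long_names = ["a" * 30, "a" * 30 + "b", "short"]
print(shorten_names(long_names))
```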

And here are the bug fixes:

  • Fixed bug with calculation of mean with MS SQL Server data (now explicitly cast as float to avoid integer result).
  • Fixed bug in ANOVA output when the precise (as opposed to speed) option is used – it used to try mixing Decimals and other numeric types unsuccessfully.
  • Fixed bugs with chart data gathering queries so works in MS SQL Server properly. Also cleaner for the others in any case.
  • Fixed bug in underlying code if unique=False ever applied with scatterplots (currently never but it was still, technically, a bug).
  • Fixed bug in scatterplot SQL which affected MS Access and MS SQL Server (can’t use aliases in group by etc).
  • Adjusted y-title position in dojo scatterplots to avoid it being cropped.
  • Fixed bug when PostgreSQL date data was being displayed as a category (couldn’t calculate a length of a datetime.datetime object).
  • Fixed layout bug when report table resized after Add to Report checkbox hidden. Now freshens layout when checkbox reappears.
  • Fixed bug preventing charts from being produced when linked to MS Access.
  • Fixed bug adding large delay to display of output when linked to MS Access.
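
The mean fix in the list above comes down to integer arithmetic inside the database: when every operand is an integer, the result can be silently truncated. The sketch below reproduces the effect with SQLite, only because it ships with Python; the table, column, and queries are invented for the example and are not the SQL that SOFA sends to MS SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (val INTEGER)")
con.executemany("INSERT INTO scores (val) VALUES (?)", [(1,), (2,)])

# Integer total divided by integer count: the fractional part is lost
int_mean = con.execute(
    "SELECT SUM(val) / COUNT(val) FROM scores"
).fetchone()[0]

# Casting to a floating point type first keeps the fractional part
float_mean = con.execute(
    "SELECT CAST(SUM(val) AS REAL) / COUNT(val) FROM scores"
).fetchone()[0]

con.close()
print(int_mean, float_mean)
```

The explicit cast costs nothing when the data is already floating point, so applying it everywhere is a safe default.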

There are some important changes coming for SOFA but it was important to tidy up a bit first. Watch this space!

Better Mac Testing of SOFA

May 5th, 2013

SOFA Statistics has a Mac version so I need to be able to test and package SOFA on a Mac. I do this on my Ubuntu Linux host machine using VirtualBox which works pretty well. But until a few minutes ago, the VirtualBox instance of Mac I had running was squeezed into a somewhat restrictive screen resolution. No longer! Here are the two basic steps I followed to resolve this problem:

1) Add a new screen resolution to as per How to Increase Mac OS X Snow Leopard Virtual Machine Screen Resolution on VirtualBox and VMware using Method #1 (but not from the /Extra folder – from the next bit):

Mac screen resolutions

2) Make screen resolution available from VirtualBox end as per Notes on setting 1680×1050 resolution on a Snow Leopard inside a VirtualBox

Resolutions from VirtualBox

And here is the result – a much more pleasant experience of testing SOFA on the Mac platform.

Better mac screen resolution for sofa

Next goal is to get some tricky graphics libraries I need working on the Mac.

Confidence Intervals for ANOVA & t-tests in 1.3.3

April 5th, 2013

95% confidence intervals have now been added to ANOVA and t-tests. And associated output has right justified numbers to make it easier to read.

Confidence Intervals
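
As a rough illustration of what a 95% confidence interval for a mean involves, here is a sketch using only Python's standard library. Note the loud assumption: it uses the normal approximation (a z value of about 1.96), whereas small samples call for a critical value from the t distribution, which gives a wider interval. The function name `mean_ci` is made up, and the post does not describe SOFA's own calculation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, confidence=0.95):
    """Return (lower, upper) bounds of a confidence interval for the mean.

    Hypothetical helper using the normal approximation, so it is only
    reasonable for larger samples.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    # Standard error of the mean, based on the sample standard deviation
    std_err = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_err, centre + z * std_err


lower, upper = mean_ci([10, 12, 14, 16, 18])
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```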

Version 1.3.3 also lets you sort by category labels in clustered bar charts, line charts, area charts and box plots. Area charts can also be sorted ascending or descending by count/mean/sum.

The series and category are now displayed in tooltips e.g. Italy, 20-29 for clustered bar charts, multi-series line charts, and box plots. This is especially helpful when there are lots of categories and/or series.

Boxplot Improvements

  • Improved statistics output footnotes.
  • Borders on bar-type charts are now optional. This can be useful when bars are very short.
  • Chi square clustered bar charts can cope with higher default limits for number of values.
  • Importing field names with more than 90 characters prohibited at the point of import rather than causing problems later.
  • The group by max number of values is now controlled by a single my_globals setting (making it easier to override).
  • The default settings for some remaining max values have been increased.

There was one minor bug-fix this version – line charts now cope better with lots of categories (increased padding around max label width in overall width calculations).

And a problem with the deb installer was also fixed.