Great new SOFA teaching resource

Saturday, January 28th, 2017

Thanks to George Self, there is a great new teaching resource available for SOFA users. Here is George’s announcement, repeated from the discussion group:

I teach an undergrad research methodology class and wrote a SOFA-based lab manual for that class that some of you may be interested in. You can find the manual and the data sets at

The manual has ten chapters:

  1. Introduction (data types, normal distribution, kurtosis, skew, null hypothesis, downloading/installing SOFA, recoding data)
  2. Central Measures (mean, median, mode)
  3. Data Dispersion (range, quartiles, standard deviation)
  4. Visualizing Dispersion (box charts)
  5. Frequency Tables (frequency tables, crosstabs, complex crosstabs)
  6. Visualizing Frequency (histogram, bar chart, clustered bar chart, pie chart, line graph)
  7. Correlation (Pearson’s r, Spearman’s rho, significance, scatter plots)
  8. Regression
  9. Hypothesis Testing: Nonparametric Statistics (SOFA Statistics Wizard, Kruskal-Wallis H, Wilcoxon Signed Ranks, Mann-Whitney U)
  10. Hypothesis Testing: Parametric Statistics (ANOVA, t-test-Independent, t-test-Paired)

There are also two appendices: the first is a data dictionary for each of the data sets used, and the second covers the various report-generating features of SOFA.

The lab manual covers all of the functions and features of SOFA, but in the context of a lab where those functions are practiced rather than just described. The manual also includes a lot of information about how the various statistical measures are used (for example, the difference between correlation and causation). No math knowledge beyond simple high school algebra is assumed on the part of the student and each of the labs includes a “deliverable” activity so instructors can use this as part of a class.

I’ve printed this manual under Creative Commons-BY-ShareAlike so please feel free to use this in any way you want. Of course, I’m also happy to receive comments that could help me improve this manual in the future.


Please give the resource a spin and provide George with any feedback that can improve/refine it. Once again, thanks George for making this available to the community 🙂

Calling Mac Users – Contact Me to Test Latest Greatest Version

Saturday, March 14th, 2015

Over the last year, work on SOFA has been focused on a difficult packaging issue – enabling a Mac version to be built which allows Mac users to export their charts and reports as PNGs and PDFs. That functionality is now working on Snow Leopard and hopefully newer versions as well. But it would be nice to check with some people running Mac. If you’d like to try out the latest version of SOFA Statistics, please drop me a line via

SOFA Shows Support For Linux

Thursday, April 10th, 2014

SOFA works on Windows, Mac, and Linux. But Linux is especially important for the project because SOFA is developed on Ubuntu. So it made sense to support the Linux ecosystem by signing up with the Open Invention Network. In an ideal world, it wouldn’t be necessary to have anything to do with software patents. For various reasons, they’re a bad idea and function more to inhibit innovation than encourage investment in software research and development. But the Open Invention Network plays a protective function in a world where people who create and actually make things can be preyed upon by parasites who have been granted monopolies on ideas – the so-called patent trolls.

The group was created to defend Linux from patent trolls and other attacks from patent holders. It tries to do this with its own patents which are then available royalty-free to any company, institution or individual that agrees not to assert its patents against Linux. While it hasn’t been done, these patents could also, in theory, be used by the OIN, or an OIN member, against a hostile company in a patent war.

Google joins Open Invention Network patent commons as a full member

Anyway, a range of companies and projects large and small (over 800 at present and growing) have signed up for the initiative including Google, Dropbox, IBM, Canonical, Mozilla, Twitter, Puppet Labs, Valve Software, Alfresco, NEC, Blender, OpenShot, Novell, Inkscape, Philips, Red Hat, CentOS, GNOME, Wikimedia, MariaDB Foundation, Rackspace, Moodle, Openstack, Slackware, Tor, and Sony. You get the idea.

PortableApps version of SOFA (alpha only)

Wednesday, April 2nd, 2014

J. David Eisenberg has kindly made a PortableApps version of SOFA for Windows. It is an alpha release only but it works and feedback/assistance is welcome. Here is his announcement as posted on the Google Group:

I have used the PortableApps guidelines (correctly, I hope!) to create a version of SOFA Statistics that can be installed on a USB drive and will retain your data and settings.

You can download the installer at ; this is Windows only.

Known problem: If you add the results of a statistical test to a report, any graphs for that statistical test will show up as a “missing image” icon. The image will be in the report; it just won’t show up on screen.

I have not tried scripting to see if that works properly.

Any comments are welcome at the SOFA statistics Google Group

Using SOFA alongside other statistics packages

Saturday, February 15th, 2014

All statistics packages have their strengths and weaknesses, so it is not uncommon for people to want to use more than one – even on the same project. SOFA is focused on making it easy to use some core statistical tests and on producing attractive, high-definition tabular and charting output. SOFA also makes it easy to link to, or import from, a wide range of formats: xls, xlsx, csv, Google Docs spreadsheets, MS Access, MySQL, MS SQL Server, PostgreSQL, SQLite, and, more recently, CUBRID.

But there is no point overcomplicating SOFA so it can do every statistical test that might be needed for a particular project. SOFA users have been routinely surveyed on what features they would like added, but the responses have not consolidated into a clear list of priorities. People need lots of different things depending on their specific projects.

So a sensible goal for SOFA is to make it easy to import and export data, including metadata such as variable and value labels. This strategy has already resulted in the addition to SOFA (version 1.4.2) of built-in export-to-spreadsheet functionality. And it has already been improved for version 1.4.3 (not released yet at the time of this writing).

The question is, what packages should SOFA target as priorities for interoperability? Feel free to fire me off an email at

Never expected to see 150,000 downloads

Tuesday, November 19th, 2013

Creating and sharing a software project is a leap into the unknown. Will anyone use it? Will anyone like it? And although download numbers are a very imperfect measure, they can provide encouragement when engaged in the numerous tasks associated with a project. So it is with great pleasure that I can announce SOFA Statistics has passed the 150,000 milestone on Sourceforge.

150,000 downloads milestone

Thanks to all the people who have helped make this possible. I’m still thinking about what to do with SOFA next but it seems to have found a niche as a general purpose, open source statistics application. And I’m still trying to stay true to the mantra “ease of use, learn as you go, and beautiful output”.

If you’ve liked using SOFA, please consider sending me a brief message. I’m keen to hear where in the world people are from, what sorts of things SOFA has been used for, and anything else interesting. Please include at least your first name – I’d like to display some of these messages to promote SOFA. Thanks in advance. [Note – my purpose is to collect some feedback to share, not to gather email addresses. But I expect I will personally reply to a few emails if I have time.]

Database?! But I just want to analyse my data!

Thursday, August 22nd, 2013

SOFA aims for ease of use as part of its “ease of use, learn as you go, beautiful output” mantra. But it confronts users with having to think about databases, even if they are just working with simple spreadsheets of data or some data entered by hand.

This was the usability problem brought to my attention by a member of the community, Jan Dittrich. Jan is completing a Masters in Media Arts and Design at the Bauhaus University in Weimar, Germany. He mainly does user research and usability, but has an interest in statistics as well. When using SOFA he noticed that a “database” needs to be selected for most of the activities, but that it might be a rather technical concept for some of those who use SOFA. He wrote me an email addressing the problem and we subsequently exchanged ideas.

So how could this be addressed without removing one important ability of SOFA – namely the ability to connect directly to people’s data when it is in a database (e.g. MySQL)?

We explored a few options …

Initial GUI ideas from Jan

… but ended up following the principle of “the least we could do” as recommended in the fantastic usability book “Rocket Surgery Made Easy” by Steve Krug.

Rocket Surgery Made Easy

As Krug notes, tweaking is usually better than redesigning because 1) it actually gets done; 2) larger changes are inevitably going to break some things (think months of squashing all the bugs out again); and 3) redesigns annoy a lot of existing users who have gotten used to the status quo (actually Krug has 9 reasons but these are my favourites).

Anyway, I had no enthusiasm for a major GUI overhaul but it did not make sense to leave a known usability problem in place. What Jan and I came up with was rather simple and elegant. SOFA only shows the Database label and drop-down if the user has configured SOFA to connect to any databases. Expect to see this change in the next version (1.3.5).

Database details only displayed when needed

Users who have database connections will notice no difference. But for everyone else the interface will be simpler and easier to use. Sometimes, less is more.
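
The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SOFA's actual code: the `ProjectConfig` class, its `external_connections` field, and the `visible_controls` function are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectConfig:
    # Hypothetical: names of external database connections the user has
    # configured (e.g. a MySQL server). Empty for spreadsheet-only users.
    external_connections: List[str] = field(default_factory=list)


def visible_controls(config: ProjectConfig) -> List[str]:
    """Return the data-selection controls to show for this configuration."""
    controls = ["table"]  # choosing a table is always needed
    if config.external_connections:
        # Only show the database label and drop-down when there is
        # actually a database connection to choose from.
        controls.insert(0, "database")
    return controls


spreadsheet_user = visible_controls(ProjectConfig())
database_user = visible_controls(ProjectConfig(["mysql_sales"]))
```

With this approach the decision lives in one small function, so the existing forms only need to hide or show two widgets rather than be redesigned.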

Nice feature for dissertation students

Friday, January 13th, 2012

A helpful user drew my attention to the desirability of adding a small but important feature for dissertation students – namely, the ability to leave the percentage symbol off the numbers in the percentage columns of frequency and cross tabulation report tables. This new feature will be in the forthcoming version of SOFA (1.1.4):

Show (or hide) percentage symbols

Here is the feedback from Doug:

For dissertation writing in the States, Turabian 7th edition and the Chicago Manual of Style 6th edition are standard for many graduate schools on both the masters and the doctoral level. In both cases, tables with percentage figures in them do NOT have percent signs in front of the numbers themselves, because a typical title like “Table 3. % of babies born to men over 40” already tells you what’s inside the table.

Sofa Stats, however, so far as I can see, requires that percentages have the percent sign, which then gets dragged-and-dropped into Word (or Excel, for tidying up first). If the table is a small one, and if there are only one or two, no problem. But many dissertations have tons of them.

There is a way to rid a table of the % signs by using Excel, but it’s awkward and not a part of the regular menu system. I just worked it out myself a few hours ago, after spending half a day on the problem.

What would be *extremely* helpful to graduate students, whom I assume you would like to have as one of your key user groups, would be for you to program in a “switch” that would allow the user to specify percentages with or without percent signs. It’s a small detail, but one that would be much appreciated.

I generally try to avoid adding more features to SOFA in favour of keeping it simple but this seemed a good idea. Thanks again for the feedback Doug.
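
For the curious, a “switch” of the kind Doug describes could look something like the following. This is a minimal sketch, not the code that went into SOFA; the function name `format_pct` and its parameters are assumptions made for illustration.

```python
def format_pct(value: float, decimals: int = 1, show_symbol: bool = True) -> str:
    """Format a percentage for a table cell, with or without the % sign.

    Hypothetical helper for illustration only.
    """
    text = f"{value:.{decimals}f}"
    # Append the symbol only when the user has asked for it.
    return f"{text}%" if show_symbol else text


with_symbol = format_pct(42.857)
without_symbol = format_pct(42.857, show_symbol=False)
```

Keeping the number formatting and the symbol separate means the same rounding is applied whichever style the user picks.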

Easier to “Get Started” and to give feedback

Wednesday, April 13th, 2011

It is easier than ever to get started with SOFA Statistics. A new “Get Started” button has been added to the main form and other buttons have been shifted to better emphasise the most important:

New "Get Started" button

Clicking on the “Get Started” button opens a web page with screenshots and step-by-step examples.

It is now easy for first-time users to give their feedback. Was it as useful as they hoped? Is there anything which can be done to improve SOFA? There is a link on the main start form, plus a pop-up option on first exit from SOFA.

Feedback via simple Google Docs survey

The goal is to make SOFA more useful by finding out what worked, and what didn’t, for users – especially first-time users.

There were two other changes:

  • When importing csv files, SOFA now sets the default for ‘Has header row?’ according to a review of the sample contents.
  • And an Exit control has been added to all forms where appropriate.
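
As an illustration of the header-row change, here is one way a default for ‘Has header row?’ could be guessed from a sample of a csv file, using the `csv.Sniffer` class from Python's standard library. This is a sketch under that assumption; it is not necessarily how SOFA reviews the sample contents, and `guess_has_header` is a made-up name.

```python
import csv


def guess_has_header(sample_text: str) -> bool:
    """Guess whether the first row of a csv sample is a header row.

    Hypothetical helper; relies on the heuristics in csv.Sniffer.
    """
    try:
        return csv.Sniffer().has_header(sample_text)
    except csv.Error:
        # The sniffer could not work out the file's dialect, so fall
        # back to assuming there is no header row.
        return False


sample = "name,age,score\nAnn,34,88.5\nBob,29,91.0\nCy,41,77.25\n"
default_has_header = guess_has_header(sample)
```

The result is only a default: the sniffer's heuristics can be wrong, so the user should still be able to override it in the import dialog.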

Once again, there have been a number of bug fixes:

  • Fixed postgresql quoting error by using single quote values.
  • SOFA now uses the default database when making an initial connection. If connecting to a project fails, SOFA reverts to previous project. Postgresql projects insist on default database if user is not ‘postgres’. Gives useful error if an old project has this problem.
  • Fixed minor bug in Mann-Whitney output exposed whenever labels with %s in them were used.
  • Improved error trapping if error importing wx.lib.iewin.
  • Better font for help text on main form according to platform.
  • CSV importing copes with single-row data.
  • Better font settings for help text on Macs.
  • Fixed display bug in Mac when more than 20 values warning shown (smaller font now).
  • Chart by now filters out data lacking values in chart by variable.

The most important thing for the project is the feedback we will hopefully receive.

Follow sofastatistics on twitter

Wednesday, March 3rd, 2010

I finally succumbed and created a twitter account – you can follow the project on