Archive for the ‘education’ Category

Great new SOFA teaching resource

Saturday, January 28th, 2017

Thanks to George Self, there is a great new teaching resource available for SOFA users. Here is George’s announcement, repeated from the discussion group:

I teach an undergrad research methodology class and wrote a SOFA-based lab manual for that class that some of you may be interested in. You can find the manual and the data sets at

The manual has ten chapters:

  1. Introduction (data types, normal distribution, kurtosis, skew, null hypothesis, downloading/installing SOFA, recoding data)
  2. Central Measures (mean, median, mode)
  3. Data Dispersion (range, quartiles, standard deviation)
  4. Visualizing Dispersion (box charts)
  5. Frequency Tables (frequency tables, crosstabs, complex crosstabs)
  6. Visualizing Frequency (histogram, bar chart, clustered bar chart, pie chart, line graph)
  7. Correlation (Pearson’s r, Spearman’s rho, significance, scatter plots)
  8. Regression
  9. Hypothesis Testing: Nonparametric Statistics (SOFA Statistics Wizard, Kruskal-Wallis H, Wilcoxon Signed Ranks, Mann-Whitney U)
  10. Hypothesis Testing: Parametric Statistics (ANOVA, t-test-Independent, t-test-Paired)

There are also two appendices: the first is a data dictionary for each of the data sets used, and the second covers the various report-generating features of SOFA.

The lab manual covers all of the functions and features of SOFA, but in the context of a lab where those functions are practiced rather than just described. The manual also includes a lot of information about how the various statistical measures are used (for example, the difference between correlation and causation). No math knowledge beyond simple high school algebra is assumed on the part of the student and each of the labs includes a “deliverable” activity so instructors can use this as part of a class.

I’ve published this manual under Creative Commons-BY-ShareAlike, so please feel free to use this in any way you want. Of course, I’m also happy to receive comments that could help me improve this manual in the future.


Please give the resource a spin and provide George with any feedback that can improve/refine it. Once again, thanks George for making this available to the community 🙂

New tutorial videos on SOFA Statistics

Sunday, January 15th, 2012

Check out these two new tutorial videos for SOFA Statistics:

Great new tutorial on Hypothesis Testing

Saturday, June 11th, 2011

J David Eisenberg has written a great new tutorial on hypothesis testing and here is a guest post from him for the SOFA blog. Enjoy:

I teach a psychology research methods course at a local community college. Every semester, I see the students’ confusion about hypothesis testing and significance levels. Even at the end of the semester, there are always a few students who think that a statistical result with a probability of .001 must *not* be significant, because the number is so small.

I do explain the concept during one lecture, but that just doesn’t do the trick. I could write a web page with the explanation, but I’m sure I’d get a TL;DR [1] from the students. So, I decided to make the explanation in the form of a visual novel [2] (VN). I used the Ren’Py [3] visual novel engine to create the script, and it worked fine. The problem is, you need to download a fairly large file in order to display the VN; again, something that students would probably not be eager to do. The solution, which Grant [Developer of SOFA Statistics] suggested, was to make it all web-based. After some failed experiments with canvas and SVG, I was able to achieve the effects I wanted with HTML, CSS, and JavaScript. The result is at

Tutorial on hypothesis testing
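
The misunderstanding described in the guest post comes down to one comparison: a result is called significant when its p value falls below the chosen significance level, so a tiny p value such as .001 counts as significant, not the reverse. Here is a minimal Python sketch of that decision rule. The function name `is_significant` and the default significance level of 0.05 are assumptions made for illustration; they are not taken from the tutorial or from SOFA Statistics.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when p_value is below the significance level alpha.

    Hypothetical helper for illustration only. The strict "<" follows a
    common convention; some texts use "<=" instead.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return p_value < alpha


# A smaller p value means the observed result would be less likely
# if the null hypothesis were true, so small values are the significant ones.
for p in (0.001, 0.04, 0.2):
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"p = {p}: {verdict} at alpha = 0.05")
```

The only design choice here is where the threshold sits. The level of 0.05 is a convention, and a stricter level such as 0.01 can be passed as `alpha`.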


Why should statistics be Open For All?

Sunday, February 21st, 2010

SOFA stands for Statistics Open For All. What does “Open For All” mean and why is it important?

Open For All means:

  • The statistical algorithms used are visible and can be examined by interested users with minimal difficulty (and no Non-Disclosure Agreements required etc).
  • The software is available in the languages users need. At this stage, SOFA Statistics is available in English and has been translated into Galician (largely) and Russian (partially) but the goal is to include as many languages as possible.
  • The program is available without payment (especially important for students and people in developing nations).
  • SOFA Statistics will run on as many computer environments as practical. Currently, only Windows and Ubuntu Linux are supported, but the goal is to add a Mac package ASAP, and possibly some other Linux distributions.
  • The program tries to reduce the amount of prior learning a user has to have to use the package successfully and appropriately. It is not assumed that statistics can be used without thought or any statistical insight, but the goal is to help the user make the right decisions at the right points.

So why does this matter?

  • So students can easily access useful and educational statistical software (no, a spreadsheet doesn’t count 😉 )
  • To allow smaller, or poorly-resourced organisations e.g. non-governmental social service organisations/charities/groups in developing nations etc to conduct basic quantitative research and to generate useful ad hoc and routine reports
  • Because statistical thinking is a fantastic intellectual resource that deserves greater appreciation. It is a shame that the main idea most people have about statistics is “Lies, damned lies, and statistics”.

0.9.4 Additional output for 3 tests and numerous important bug fixes

Sunday, February 7th, 2010

0.9.4 is another important release. The new testing regime is helping to identify and fix all sorts of quirky bugs, as well as some more significant ones. Please join the discussion group if there are any surviving bugs which are an issue for you.

Here are the new features of this release:

  • Paired t-test output includes a histogram of differences. This makes it easy to assess the normality of the distribution of differences.
    Paired t-test output

  • Kruskal-Wallis output now includes a table for each group containing its median, n, min etc.
  • Mann-Whitney output now includes medians.
  • When using assistance to select a statistical test, the normality help dialog varies according to whether or not paired data is selected. If paired, then two variables must be chosen and the normality of the differences is analysed and displayed.
    Normality of Differences

  • When a cell edit fails validation, the cursor returns to the end of the text if possible, ready to edit immediately. This ends one major interface annoyance.
  • Users receive a useful message if there are no values to report in an analysis e.g. the data is over-filtered.
  • The Chi Square test provides useful messages if there are too few values in either the row or column variable.
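
To illustrate what a histogram of differences for paired data involves, here is a minimal Python sketch that computes the paired differences and bins them into equal-width intervals. It is not SOFA's own code: the function names `paired_differences` and `histogram_counts`, the sample scores, and the binning rule are all assumptions made for illustration.

```python
def paired_differences(before, after):
    """Return after - before for each pair (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    return [a - b for b, a in zip(before, after)]


def histogram_counts(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up scores for five subjects measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 13, 9, 14, 13]
diffs = paired_differences(before, after)
for i, count in enumerate(histogram_counts(diffs, n_bins=3)):
    print(f"bin {i}: {'#' * count}")
```

A roughly symmetric, bell-shaped pattern of counts is what you would hope to see before relying on a paired t-test; a strongly lopsided pattern suggests considering a nonparametric alternative.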

The list of bug fixes is quite substantial this time:

  • Independent t-test now works even if using a string variable for grouping.
  • Fixed bug preventing scripts from being run independently of GUI.
  • Fixed bug exporting scripts to the saved scripts file.
  • Fixed minor UI bug which meant the paired option remained visible after the stats selection was back to unguided.
  • Fixed bug that meant if the user moved the mouse away from data being entered the cell editor closed.
  • Fixed bug caused when shifting from one project with a default database engine e.g. MySQL, to another project in which that database is not available. Changing project now wipes the stored default database engine.
  • Fixed bug with writing scripts with unicode characters.
  • If kurtosis etc cannot be calculated, the rest of the results can still potentially be produced.
  • Chi Square now honours filter values in script version.
  • Projects no longer have problems with new lines in their notes.
  • System copes with faulty project files better.

Development attention will start turning to the following in due course:

  • Mac packaging
  • Making the “results only” and “brief” explanation level settings operational
  • Making more “Help” buttons functional
  • Enabling user-defined missing values
  • Adding an Oracle plug-in
  • Output charting
  • Connecting to on-line educational resources on statistics

How do you feel about the direction being taken? Good? Bad? Any feedback? Please feel free to discuss any aspect of this project at

0.9.0 can examine normality of variables when selecting main test to use

Monday, January 4th, 2010

The 0.9 series has finally begun! The main emphasis of this series is to enhance the learn-as-you-go goal of SOFA Statistics. For example, when choosing a statistical test it is important to know if the data is normally distributed or not. You can do this at the point where you’re choosing your test:

Graph to help determine normality

Here is the list of new features added in 0.9.0:

  • When selecting a statistical test to use, users can examine the normality of a variable visually and with a test of kurtosis and skew. The thumbnail can be expanded into a more detailed histogram.
  • When opening the project selection dialog it defaults to the version last selected during session.
  • Opening the data selection dialog uses the last database and table selected during session.
  • Information on filter used also displayed in immediate output.
  • The values displayed in tests like the ANOVA take any filtering into account.
  • When exporting script, user reminded where it is being appended.
  • When editing variable details, user is reminded where the changes are being saved.
  • Useful message to user if using deprecated project file format.
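
For readers curious about what skew and kurtosis measure, here is a minimal Python sketch using the simple moment-based formulas: skew is the third central moment divided by the variance raised to the power 1.5, and excess kurtosis is the fourth central moment divided by the squared variance, minus 3. The function names are hypothetical, and SOFA Statistics may well use different formulas internally (for example, with bias corrections), so treat this as an illustration of the idea only.

```python
from statistics import fmean


def central_moment(values, k):
    """Return the k-th central moment (population form) of values."""
    mean = fmean(values)
    return fmean([(v - mean) ** k for v in values])


def skewness(values):
    """Moment-based skew: 0 for symmetric data, positive for a long right tail."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skew is undefined when all values are equal")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: about 0 for normally distributed data."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Made-up sample for illustration.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]
print("skew:", skewness(sample))
print("excess kurtosis:", excess_kurtosis(sample))
```

Values of both measures close to zero are consistent with a normal distribution, although with small samples they can vary a lot, which is why it helps to look at the histogram as well.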

Bug fix:

  • Fixed simple naming bug which prevented independent t-test and Mann-Whitney from running.

This is the first release with Matplotlib (scientific graphing) included and the 0.9 series of releases should include lots more graphing and internal documentation to a) help users choose the correct test and b) confirm that they made the right choice.

NB The current plan is to use RaphaelJS for the output charting, not Matplotlib. Matplotlib graphics will be restricted to a support role, albeit a very important one. This decision is consistent with the “beautiful output” goal of SOFA Statistics. RaphaelJS is still maturing as a library, so it may be a while until conventional output graphing is part of SOFA Statistics. But most of the foundations are in place.

Funding open source to benefit education, health etc

Tuesday, December 1st, 2009

Government investment in open source makes sense as a way of reducing the cost of government service delivery.

The model is this – a government agency invests a sum of money to benefit a government sector e.g. education, health, social service delivery etc. As long as the software is open source, the government sector gets a multiplier effect – any further development of the software in other countries will further benefit education, health, social service delivery etc in the original country.

“This has certainly been New Zealand’s experience with Moodle, a GPL licenced online learning management system. A few years ago Moodle was a nice platform, had a good development community and a few hundred sites. A New Zealand TEC fund was used to add a number of features, not least amongst these were some very solid enterprise and performance capabilities.

Moodle is now used in over 13,000 sites and must be the world’s most popular LMS. Investment from institutions such as the UK’s Open University has now well exceeded NZ’s initial investment. Yet because of the GPL New Zealand still gets to benefit from other peoples’ significant work.” (New Zealand Open Source Society 2007)

Now imagine if there was a hypothetical open source program 😉 that could be used throughout the education sector (senior secondary, polytechs, universities), across a myriad of smaller organisations that supply social services on behalf of government, and across the private sector as well. It enables everything from the teaching of statistics through to routine management reporting. A bit of assistance in its further development could have a large spin-off. Makes you think.

wxWebKit will enable graphing when it is packaged

Friday, August 21st, 2009

wxWebKit is a very important widget for the SOFA Statistics project as it will be used to display all output. At present, the only Debian package for wxWebKit (kindly supplied by Christoph Willing) does not support the display of local images. Fortunately this is being rectified through the hard work of Kevin Ollivier, and a new package should be out sometime soon. This is expected to be a standard package which should simplify the installation instructions for Ubuntu users.

Once the wxWebKit package is available, a lot of development work will take place in SOFA Statistics to provide auxiliary graphs which support analysis e.g. by displaying the data distributions in the samples used for an ANOVA. It will finally be possible to really start delivering on the “learn-as-you-go” promise of SOFA Statistics.