Great new SOFA teaching resource

January 28th, 2017

Thanks to George Self there is a great new teaching resource available for SOFA users. See https://goo.gl/4lpIaO. Here is George’s announcement, repeated from the discussion group:

I teach an undergrad research methodology class and wrote a SOFA-based lab manual for that class that some of you may be interested in. You can find the manual and the data sets at https://goo.gl/4lpIaO.

The manual has ten chapters:

  1. Introduction (data types, normal distribution, kurtosis, skew, null hypothesis, downloading/installing SOFA, recoding data)
  2. Central Measures (mean, median, mode)
  3. Data Dispersion (range, quartiles, standard deviation)
  4. Visualizing Dispersion (box charts)
  5. Frequency Tables (frequency tables, crosstabs, complex crosstabs)
  6. Visualizing Frequency (histogram, bar chart, clustered bar chart, pie chart, line graph)
  7. Correlation (Pearson’s r, Spearman’s rho, significance, scatter plots)
  8. Regression
  9. Hypothesis Testing: Nonparametric Statistics (SOFA Statistics Wizard, Kruskal-Wallis H, Wilcoxon Signed Ranks, Mann-Whitney U)
  10. Hypothesis Testing: Parametric Statistics (ANOVA, t-test-Independent, t-test-Paired)

There are also two appendices: the first is a data dictionary for each of the data sets used, and the second covers the various report-generating features of SOFA.

The lab manual covers all of the functions and features of SOFA, but in the context of a lab where those functions are practiced rather than just described. The manual also includes a lot of information about how the various statistical measures are used (for example, the difference between correlation and causation). No math knowledge beyond simple high school algebra is assumed on the part of the student and each of the labs includes a “deliverable” activity so instructors can use this as part of a class.

I’ve printed this manual under Creative Commons-BY-ShareAlike so please feel free to use this in any way you want. Of course, I’m also happy to receive comments that could help me improve this manual in the future.

–George

Please give the resource a spin and provide George with any feedback that can improve/refine it. Once again, thanks George for making this available to the community 🙂

SOFA passes quarter-million downloads

January 27th, 2017

I was delighted when SOFA passed 30 downloads in 2009 and here we are in 2017 more than 250,000 downloads later – still can’t believe it :-). Even though SourceForge seems to have become confused about how many downloads there are, having “lost” a whole lot in the last couple of months, I am pretty confident we really have crossed the quarter-million mark. BTW a major new version of SOFA is in the pipeline and will be released when I fix some Mac installer problems.

Installing SOFA on Ubuntu 16.04 and 16.10

January 27th, 2017

tl;dr

echo "deb http://archive.ubuntu.com/ubuntu wily main universe" | sudo tee /etc/apt/sources.list.d/wily-copies.list

sudo apt update
sudo apt install python-wxgtk2.8
sudo rm /etc/apt/sources.list.d/wily-copies.list
sudo apt update
# Download latest deb from http://www.sofastatistics.com/downloads.php
cd ~/Downloads
sudo dpkg -i sofastats-1.4.6-1_all.deb

Details

Even though SOFA is developed on Ubuntu (16.10 at present) there was a problem installing SOFA onto 16.04 or 16.10. The root cause related to Ubuntu support for different versions of wxPython and I spent a lot of time trying different solutions. Fortunately there is a simple workaround that only requires about six terminal commands (see below). Obviously, having to run commands is not as good as a standard installation but it will have to do for now because the main alternatives aren’t currently viable. E.g. some parts of SOFA don’t seem to play nicely with the packaged versions of wxPython 3.0. Snap packaging holds some promise but that will have to wait for later depending on the next releases of Ubuntu.

Thanks to bbobbo for finding a general solution to wxPython 2.8 installation problems on Ubuntu 16 and relating it to the specific SOFA problem, and to Domenico Somma for bringing it to my attention via the SOFA forum. Here are the steps (Solution from SOFA (statistics) – python 2.8 request – unable installation):

1. Add needed repository and update package list

echo "deb http://archive.ubuntu.com/ubuntu wily main universe" | sudo tee /etc/apt/sources.list.d/wily-copies.list

sudo apt update

2. Install wxPython 2.8
sudo apt install python-wxgtk2.8

3. Remove repository entry and update package list again
sudo rm /etc/apt/sources.list.d/wily-copies.list

sudo apt update

4. Install SOFA Statistics
Download latest deb from http://www.sofastatistics.com/downloads.php

cd ~/Downloads

sudo dpkg -i sofastats-1.4.6-1_all.deb

5. Success?
sofastats

Extra info – Warning from http://askubuntu.com/questions/789302/install-python-wxgtk2-8-on-ubuntu-16-04 – “Following this method on large scale can lead to unmet dependency hell. So keep in mind this is similar to PPA’s.” The same comment has more details about solving issues with broken packages.
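For step 5, one way to confirm that Python is picking up the wxPython 2.8 series is a short check like the sketch below. It is not part of SOFA, and the function names `wx_version_string` and `is_wx_28` are made up for illustration. Because python-wxgtk2.8 is a Python 2 package, the check is only meaningful when run with the Python 2 interpreter, so the code avoids Python 3-only syntax.

```python
import importlib


def wx_version_string():
    """Return the wxPython version string, or None if wx cannot be imported."""
    try:
        wx = importlib.import_module("wx")
    except ImportError:
        return None
    # VERSION_STRING is a module-level constant in wxPython, e.g. "2.8.12.1"
    return getattr(wx, "VERSION_STRING", None)


def is_wx_28(version):
    """True if a version string such as "2.8.12.1" belongs to the 2.8 series."""
    if not version:
        return False
    return version.split(".")[:2] == ["2", "8"]


if __name__ == "__main__":
    found = wx_version_string()
    if found is None:
        print("wxPython is not importable from this interpreter")
    elif is_wx_28(found):
        print("wxPython %s found (2.8 series)" % found)
    else:
        print("wxPython %s found, but it is not the 2.8 series" % found)
```

If the script reports that wxPython is not importable, the most likely cause is that step 2 did not complete or that a different interpreter is being used.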

Using SOFA on multiple machines with synced config and data

April 26th, 2016

One of the nice things about open source software like SOFA Statistics is that you can freely install it on as many machines as you like without licensing issues, complex validation etc. But how do you keep your content in sync across all the different devices? One approach is to keep the user sofastats folder in a synced drive.

What might be coming next

April 20th, 2016

Python 2 is reaching its End Of Life (EOL) in 2020 so sometime before then I will want to shift SOFA to Python 3. I much prefer Python 3 but the main thing will be the libraries SOFA relies on to operate – especially on Windows and Mac.

Speaking of Mac, I am finding it very time-consuming supporting the platform. Not to enable SOFA’s core functionality to work but for the image processing libraries (esp convert and gs). Along the way I have spent countless weekends compiling using homebrew etc. Slow, tricky, and often fruitless. And it is difficult to test. I only have access to a Snow Leopard machine (virtualised to allow revert to snapshot) and that is no longer very relevant to what people need for newer versions of OS X. Some very kind people have offered to help with testing (thanks!) but the problem seems to be the packaging steps. Maybe what I need to do is ask the people who volunteered if any of them are able to compile convert and gs for me on their machines. I can then just include those versions in my packages and hopefully everything works.

A final work item is to add a Fisher’s LSD test. A friend is helping with this.

1.4.6 Adds basic time series

January 1st, 2016

SOFA line charts and area charts now treat dates as dates in the x-axis which makes it easier to look at time series data.

New option added to interface

Time series selected – X-axis date aware

Time series not selected – X-axis not date-aware

Example time series chart

Additional improvements include:

  • Better error message when not enough values in group to run analysis e.g. ANOVA.
  • Better handling of precision in p-value results displayed.
  • Better handling of dates pre-1900.
  • Better messages to user about potentially excessive categories in charts.
  • Add support for float years as date values for time series.
  • Add support for specifying port connecting to postgresql.
  • Allows boxplots when fewer values to display.
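To illustrate what treating a float year as a date value can mean, here is a minimal sketch. The post does not describe how SOFA actually performs the conversion, so this is an assumption: the fractional part is read as a fraction of that calendar year's length. The name `float_year_to_datetime` is hypothetical.

```python
from datetime import datetime


def float_year_to_datetime(year_value):
    """Map a float year such as 2015.5 to a point in time within that year.

    Assumption: the fraction is a proportion of the year's actual length,
    so leap years are handled by measuring from 1 January to 1 January.
    """
    year = int(year_value)
    fraction = year_value - year
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return start + (end - start) * fraction


if __name__ == "__main__":
    for value in (1899.0, 2015.5, 2016.0):
        print(value, "->", float_year_to_datetime(value))
```

With this reading, 2015.5 falls at midday on 2 July 2015, because 2015 has 365 days and half of that is 182.5 days after 1 January.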

And there were two other changes:

  • Removed broken google docs integration – just as easy to manually download and import normally.
  • Removed two pop-ups – no longer needed.

There are also a number of bug fixes:

  • No longer a missing legend in multiseries scatterplots just because the first scatterplot only had one series of data.
  • Fixed bug with saving database connection details when a number involved (port).
  • Fixed PostgreSQL bug when saving connection without password – now succeeds rather than failing silently.
  • Fixed MySQL bug with adding rows.
  • Fixed bug in Windows with checkboxes not enabling/disabling properly unless panels refreshed.

1.4.5 More box plot options

August 15th, 2015

There are several ways of doing box plots. Some show outliers, some don’t. Some set the whiskers at the min and max values, some don’t. Until now, SOFA kept it simple by only allowing one approach. But sometimes a little more flexibility is needed. So now users can choose between three options:

Option 1) This is the default. Outliers are displayed. Lower whiskers are set 1.5 times the Inter-Quartile Range below the lower quartile, or at the minimum value, whichever is closer to the middle. Upper whiskers are calculated using the same approach.

Option 2) Outliers hidden. Lower whiskers are set 1.5 times the Inter-Quartile Range below the lower quartile, or at the minimum value, whichever is closer to the middle. Upper whiskers are calculated using the same approach.

Option 3) Whiskers are at the minimum and maximum values.

And SOFA displays a small note at the bottom of each box plot so it is clear what approach has been used.
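The three options can be sketched in a few lines of Python. This follows the wording of the descriptions above literally and is not SOFA's own code: the function name `boxplot_whiskers` is hypothetical, and the quartile method (the "inclusive" method of `statistics.quantiles`, Python 3.8+) is an assumption, since the post does not say how SOFA calculates quartiles.

```python
import statistics


def boxplot_whiskers(values, option=1):
    """Return (lower_whisker, upper_whisker, outliers_to_display).

    option 1: 1.5 x IQR whiskers, outliers displayed (the default)
    option 2: 1.5 x IQR whiskers, outliers hidden
    option 3: whiskers at the minimum and maximum values
    """
    data = sorted(values)
    if option == 3:
        return data[0], data[-1], []
    # Assumed quartile method; other methods give slightly different results
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # "Whichever is closer to the middle": never extend past the data itself
    lower = max(data[0], q1 - 1.5 * iqr)
    upper = min(data[-1], q3 + 1.5 * iqr)
    outliers = [v for v in data if v < lower or v > upper]
    if option == 2:
        outliers = []  # same whiskers, but nothing plotted beyond them
    return lower, upper, outliers


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    for opt in (1, 2, 3):
        print(opt, boxplot_whiskers(sample, opt))
```

For the sample data the quartiles are 3.25 and 7.75, so the upper whisker under options 1 and 2 sits at 14.5 and the value 100 is the only outlier. Under option 3 the whiskers run from 1 to 100.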

Additionally version 1.4.5 adds:

  • ODS importing can now cope with repeated column names.
  • Better error message when unable to get regression line details because of limited variability.

And there are numerous bug fixes as well:

  • Fix bug when problem with imported data.
  • Fix to title and date concatenation code so it doesn’t break when the title has non-ASCII characters, e.g. in Spanish (affecting Windows and Mac).
  • Reordered regression line plotting in js so appears on top of dots.
  • Added zero division error trap to Spearman’s test error output.
  • Fixed bug which prevented ods_reader from importing repeating rows at the end of a spreadsheet. Only repeated empty rows at the end are considered the end of the data.
  • Properly handle all read operations on internal-use text files e.g. proj, css etc. Copes with a UTF-8 BOM (only) so files edited in Windows Notepad still work. Breaks if other encodings are used, which is fair enough.
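The BOM handling in the last item can be illustrated with Python's standard `utf-8-sig` codec, which reads UTF-8 and silently drops a leading byte order mark. Whether SOFA uses this codec is not stated in the post, so treat this as a sketch of the behaviour only. The names `read_internal_text` and `write_temp_file` are hypothetical.

```python
import codecs
import io
import os
import tempfile


def read_internal_text(path):
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    with io.open(path, encoding="utf-8-sig") as f:
        return f.read()


def write_temp_file(raw_bytes):
    """Demo helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".proj")
    with os.fdopen(fd, "wb") as f:
        f.write(raw_bytes)
    return path


if __name__ == "__main__":
    # A file saved with a UTF-8 BOM reads back cleanly
    bom_path = write_temp_file(codecs.BOM_UTF8 + b"var_labels = {}")
    print(repr(read_internal_text(bom_path)))
    os.remove(bom_path)

    # A UTF-16 file is rejected rather than silently misread
    utf16_path = write_temp_file(u"var_labels = {}".encode("utf-16"))
    try:
        read_internal_text(utf16_path)
    except UnicodeDecodeError as e:
        print("Not UTF-8:", e)
    os.remove(utf16_path)
```

A file without any BOM reads the same way, so the one code path covers both plain UTF-8 and Notepad-style UTF-8.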

We hope you like the latest version.

SOFA website OK now on Mobile

June 24th, 2015

The SOFA Statistics website is now “responsive” – that is, it finally works well on mobile devices. I used Bootstrap to achieve this and had a major clean up of cruft. If there are any problems, let me know.

Responsive design for SOFA website

Version 1.4.4 good for Mac Users

May 11th, 2015

Mac users can finally export output in PDF format including individual charts and report tables. And depending on the version of OS X Mac users will also be able to export as PNG images. Ensuring image exporting works across multiple versions of OS X is an ongoing project of work which users can help with if interested.

Users needing to produce monochrome output for publication will like the addition of a new monochrome theme.

All New Features in 1.4.4:

  • Mac users can export output in PDF format (and PNG depending on version of OS X).
  • Added new monochrome theme.
  • Chi Square proportions output much easier to interpret successfully.
  • The name of the grouping variable is now displayed when running comparisons of groups e.g. Country if comparing Italy and Germany.
  • Exporting to spreadsheet detects if too many fields for xls output and informs user that only csv will be generated. Also truncates table name so worksheet name not too long.
  • Import dialog only displays file types suitable for importing.
  • Added message to let user know spreadsheet creation being skipped if no report tables to export.
  • More user help on need for raw data (not pre-summarised) and long-format vs wide-format data as appropriate.
  • Code reorganisation to make it possible for SOFA to be called in GUI form by external GUI code.
  • Scripts are now easier to use for standalone purposes.
  • Added note about treatment of datetime data as categorical by SOFA for purposes of statistical tests.
  • When exporting to spreadsheet and csv, SOFA renames the reserved sofa_id field to was_sofa_id so it is OK to reimport after changes.
  • More informative messages for a larger range of potential problems e.g. database engine not functioning.
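The spreadsheet export item above reflects two limits of the legacy xls format: a worksheet holds at most 256 columns, and a worksheet name can be at most 31 characters. A minimal sketch of that decision follows. The function name `plan_spreadsheet_export`, the message wording, and the idea that csv is always produced alongside xls are assumptions for illustration, not SOFA's actual code.

```python
XLS_MAX_COLUMNS = 256      # column limit of the legacy .xls format
SHEET_NAME_MAX_CHARS = 31  # Excel worksheet names cannot be longer than this


def plan_spreadsheet_export(table_name, n_fields):
    """Decide which files to write and what to call the worksheet.

    Returns (formats, sheet_name, messages), where messages are notes
    to show the user.
    """
    formats = ["csv"]  # assumed to be generated in every case
    messages = []
    if n_fields <= XLS_MAX_COLUMNS:
        formats.append("xls")
    else:
        messages.append(
            "Too many fields (%d) for xls output; only csv will be generated."
            % n_fields)
    sheet_name = table_name[:SHEET_NAME_MAX_CHARS]
    return formats, sheet_name, messages


if __name__ == "__main__":
    print(plan_spreadsheet_export("survey_results", 40))
    print(plan_spreadsheet_export("a_table_name_well_over_thirty_one_chars", 300))
```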

Bug Fixes:

  • Fix bug resulting in Pearson’s r being displayed instead of Spearman’s rho.
  • Fixed bug on some systems when saving a worksheet with spaces in name.
  • Prevented numerous bugs related to quoting table names, fully qualified file names etc.
  • Fixed bug with misuse of escape_pre_write on python code rather than normal content.
  • Skew and normality tests now cope with the nan issue better e.g. sqrt of a negative number. SOFA just says unable to calculate instead of displaying nan (not a number). The skewtest function now copes with a negative number as input to square root.
  • Fixed bug when importing NaN text – now treated as a missing value in a numeric field.
  • Removed bug which sometimes prevented Mac users from being able to successfully change the report name.
  • Stopped making export folder if no output to export into it.
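The two NaN-related fixes above can be sketched as follows. This is an illustration of the behaviour described, not SOFA's code: the names `parse_numeric_cell`, `safe_sqrt` and `display_result`, and the exact wording of the message, are assumptions.

```python
import math


def parse_numeric_cell(text):
    """Convert imported text to a float; blank or NaN text becomes None (missing)."""
    cleaned = text.strip()
    if cleaned == "" or cleaned.lower() == "nan":
        return None
    return float(cleaned)  # raises ValueError for other non-numeric text


def safe_sqrt(x):
    """Square root that returns NaN for negative input instead of raising."""
    return math.sqrt(x) if x >= 0 else float("nan")


def display_result(value, dp=3):
    """Format a statistic for output, hiding NaN and missing values from the user."""
    if value is None or math.isnan(value):
        return "Unable to calculate"
    return "%.*f" % (dp, value)


if __name__ == "__main__":
    print(parse_numeric_cell("NaN"))       # missing value
    print(display_result(safe_sqrt(4.0)))  # a normal result
    print(display_result(safe_sqrt(-4.0)))  # negative input to square root
```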

Calling Mac Users – Contact Me to Test Latest Greatest Version

March 14th, 2015

Over the last year, work on SOFA has been focused on a difficult packaging issue – enabling a Mac version to be built which allows Mac users to export their charts and reports as PNGs and PDFs. That functionality is now working on Snow Leopard and hopefully newer versions as well. But it would be nice to check with some people running Mac. If you’d like to try out the latest version of SOFA Statistics, please drop me a line via http://www.sofastatistics.com/contact.php.