One of the nice things about open source software like SOFA Statistics is that you can freely install it on as many machines as you like without licensing issues, complex validation etc. But how do you keep your content in sync across all the different devices? One approach is to keep the user sofastats folder in a synced drive.
Python 2 reaches its End Of Life (EOL) in 2020, so sometime before then I will want to shift SOFA to Python 3. I much prefer Python 3, but the main issue will be whether the libraries SOFA relies on are available for it, especially on Windows and Mac.
Speaking of Mac, I am finding it very time-consuming to support the platform. The difficulty is not SOFA’s core functionality but the image processing tools it depends on (especially convert and gs). Along the way I have spent countless weekends compiling with Homebrew and the like: slow, tricky, and often fruitless. It is also difficult to test. I only have access to a Snow Leopard machine (virtualised to allow reverting to a snapshot), and that is no longer very relevant to what people need for newer versions of OS X. Some very kind people have offered to help with testing (thanks!) but the problem seems to be the packaging steps. Maybe what I need to do is ask the people who volunteered whether any of them can compile convert and gs for me on their machines. I can then include those versions in my packages and, hopefully, everything will work.
A final work item is to add a Fisher’s LSD test. A friend is helping with this.
SOFA line charts and area charts now treat dates as dates on the x-axis, which makes it easier to look at time series data.
Additional improvements include:
- Better error message when not enough values in group to run analysis e.g. ANOVA.
- Better handling of precision in p-value results displayed.
- Better handling of dates pre-1900.
- Better messages to user about potentially excessive categories in charts.
- Add support for float years as date values for time series.
- Add support for specifying the port when connecting to PostgreSQL.
- Allow boxplots when there are fewer values to display.
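To illustrate the float-year item in the list above: a value such as 2012.5 can be read as the point halfway through 2012. The sketch below shows one plausible conversion. It is an assumption about how such values might be interpreted, not SOFA's actual code, and the function name float_year_to_datetime is hypothetical.

```python
from datetime import datetime


def float_year_to_datetime(year_value):
    """Convert a float year (e.g. 2012.5) to a datetime within that year.

    Assumption: the fractional part is the proportion of the year elapsed,
    so leap years are handled by measuring the actual length of the year.
    """
    year = int(year_value)
    fraction = year_value - year
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return start + (end - start) * fraction


midpoint = float_year_to_datetime(2012.5)
print(midpoint.date())
```

Measuring the real length of each year keeps the spacing of points on a time axis proportional even across leap years.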
And there were two other changes:
- Removed the broken Google Docs integration. It is just as easy to download manually and import normally.
- Removed two pop-ups – no longer needed.
There are also a number of bug fixes:
- No longer a missing legend in multiseries scatterplots just because the first scatterplot only had one series of data.
- Fixed bug with saving database connection details when a number was involved (port).
- Fixed PostgreSQL bug when saving connection without password – now succeeds rather than failing silently.
- Fixed MySQL bug with adding rows.
- Fixed bug in Windows with checkboxes not enabling/disabling properly unless panels refreshed.
There are several ways of doing box plots. Some show outliers, some don’t. Some set the whiskers at the min and max values, some don’t. Until now, SOFA kept it simple by only allowing one approach. But sometimes a little more flexibility is needed. So now users can choose between three options:
Option 1) This is the default. Outliers are displayed. Lower whiskers are 1.5 times the Inter-Quartile Range below the lower quartile, or at the minimum value, whichever is closer to the middle. Upper whiskers are calculated using the same approach.
Option 2) Outliers hidden. Lower whiskers are 1.5 times the Inter-Quartile Range below the lower quartile, or at the minimum value, whichever is closer to the middle. Upper whiskers are calculated using the same approach.
Option 3) Whiskers are at the minimum and maximum values.
And SOFA displays a small note at the bottom of each box plot so it is clear what approach has been used.
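The three options can be sketched in a few lines of Python. This is a minimal illustration that follows the wording above literally; it is not SOFA's implementation. The function name whiskers and the choice of quartile method (statistics.quantiles with method="inclusive") are assumptions, and other quartile definitions will give slightly different whiskers.

```python
from statistics import quantiles


def whiskers(values, option=1):
    """Return (lower_whisker, upper_whisker, outliers_to_display)."""
    vals = sorted(values)
    # Assumed quartile method; SOFA may calculate quartiles differently.
    q1, _median, q3 = quantiles(vals, n=4, method="inclusive")
    iqr = q3 - q1
    if option == 3:
        # Whiskers at the minimum and maximum values, so no outliers.
        return vals[0], vals[-1], []
    # Whichever is closer to the middle: the 1.5 x IQR limit or the extreme value.
    lower = max(vals[0], q1 - 1.5 * iqr)
    upper = min(vals[-1], q3 + 1.5 * iqr)
    outliers = [v for v in vals if v < lower or v > upper]
    if option == 2:
        outliers = []  # same whiskers as option 1, outliers simply hidden
    return lower, upper, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for opt in (1, 2, 3):
    print(opt, whiskers(data, opt))
```

With this sample only option 1 reports the value 100 as an outlier; option 3 stretches the upper whisker out to it instead.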
Additionally, version 1.4.5 adds:
- ODS importing can now cope with repeated column names.
- Better error message when unable to get regression line details because of limited variability.
And there are numerous bug fixes as well:
- Fixed bug when there was a problem with imported data.
- Fixed title and date concatenation code so it doesn’t break when the title has non-ASCII characters e.g. in Spanish (affecting Windows and Mac).
- Reordered regression line plotting in JavaScript so the line appears on top of the dots.
- Added zero division error trap to Spearman’s test error output.
- Fixed bug which prevented ods_reader from importing repeating rows at the end of a spreadsheet. Only repeated empty rows at the end are considered the end of the data.
- Properly handle all read operations on internal-use text files e.g. proj, css etc. Can cope with a utf-8 BOM (only) to cope with Windows Notepad editing. Breaks if other encodings used which is fair enough.
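The BOM handling in the last item can be illustrated with Python's standard "utf-8-sig" codec, which strips a leading UTF-8 BOM when one is present and otherwise reads plain UTF-8. This is a sketch of the general technique under that assumption, not necessarily how SOFA does it; the helper name read_text_file and the demo file name are hypothetical.

```python
import codecs
import os
import tempfile


def read_text_file(path):
    """Read a text file as UTF-8, tolerating a BOM added by Windows Notepad."""
    # Files in other encodings may raise UnicodeDecodeError here.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


# Demo: write a file with a BOM, as Notepad would, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.proj")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + "título = 'Informe'".encode("utf-8"))
    content = read_text_file(path)

print(repr(content))
```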
We hope you like the latest version.
The SOFA Statistics website is now “responsive” – that is, it finally works well on mobile devices. I used Bootstrap to achieve this and had a major clean up of cruft. If there are any problems, let me know.
Mac users can finally export output in PDF format, including individual charts and report tables. And depending on the version of OS X, Mac users will also be able to export as PNG images. Ensuring image exporting works across multiple versions of OS X is an ongoing project which users can help with if interested.
Users needing to produce monochrome output for publication will like the addition of a new monochrome theme.
All New Features in 1.4.4:
- Mac users can export output in PDF format (and PNG depending on version of OS X).
- Added new monochrome theme.
- Chi Square proportions output is now much easier to interpret.
- The name of the grouping variable is now displayed when running comparisons of groups e.g. Country if comparing Italy and Germany.
- Exporting to spreadsheet detects if too many fields for xls output and informs user that only csv will be generated. Also truncates table name so worksheet name not too long.
- Import dialog only displays file types suitable for importing.
- Added message to let user know spreadsheet creation being skipped if no report tables to export.
- More user help on need for raw data (not pre-summarised) and long-format vs wide-format data as appropriate.
- Code reorganisation to make it possible for SOFA to be called in GUI form by external GUI code.
- Scripts are now easier to use for standalone purposes.
- Added note about treatment of datetime data as categorical by SOFA for purposes of statistical tests.
- Exporting to spreadsheet and csv now renames the reserved sofa_id field to was_sofa_id so it is OK to reimport after changes.
- More informative messages for a larger range of potential problems e.g. database engine not functioning.
And the bug fixes:
- Fixed bug resulting in Pearson’s r being displayed instead of Spearman’s rho.
- Fixed bug on some systems when saving a worksheet with spaces in name.
- Prevented numerous bugs related to quoting table names, fully qualified file names etc.
- Fixed bug with misuse of escape_pre_write on python code rather than normal content.
- Skew and normality tests now cope better with the nan issue e.g. sqrt of a negative number. They just say unable to calculate instead of displaying nan (not a number). The skewtest function now copes with a negative number as input to the square root.
- Fixed bug when importing NaN text – now treated as a missing value in a numeric field.
- Removed bug which sometimes prevented Mac users from being able to successfully change the report name.
- Stopped making export folder if no output to export into it.
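The two NaN-related fixes in the list above come down to the same idea: treat "not a number" as missing rather than letting it flow through to the output. A rough sketch of that idea follows. The function names and the exact message text are hypothetical, and this is not SOFA's own code.

```python
import math


def to_numeric_or_missing(raw):
    """Convert imported text to a float, or None when it is empty or NaN text."""
    text = raw.strip()
    if text == "":
        return None
    value = float(text)  # raises ValueError for genuinely non-numeric text
    return None if math.isnan(value) else value


def display_stat(value):
    """Show a readable message instead of 'nan' in results."""
    if value is None or math.isnan(value):
        return "Unable to calculate"
    return str(value)


print(to_numeric_or_missing("NaN"))
print(display_stat(float("nan")))
```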
Over the last year, work on SOFA has been focused on a difficult packaging issue – enabling a Mac version to be built which allows Mac users to export their charts and reports as PNGs and PDFs. That functionality is now working on Snow Leopard and hopefully newer versions as well. But it would be nice to check with some people running Mac. If you’d like to try out the latest version of SOFA Statistics, please drop me a line via http://www.sofastatistics.com/contact.php.
SOFA works on Windows, Mac, and Linux. But Linux is especially important for the project because SOFA is developed on Ubuntu. So it made sense to support the Linux ecosystem by signing up with the Open Invention Network. In an ideal world, it wouldn’t be necessary to have anything to do with software patents. For various reasons, they’re a bad idea and function more to inhibit innovation than encourage investment in software research and development. But the Open Invention Network plays a protective function in a world where people who create and actually make things can be preyed upon by parasites who have been granted monopolies on ideas – the so-called patent trolls.
The group was created to defend Linux from patent trolls and other attacks from patent holders. It tries to do this with its own patents which are then available royalty-free to any company, institution or individual that agrees not to assert its patents against Linux. While it hasn’t been done, these patents could also, in theory, be used by the OIN, or an OIN member, against a hostile company in a patent war.
Anyway, a range of companies and projects large and small (over 800 at present and growing) have signed up for the initiative including Google, Dropbox, IBM, Canonical, Mozilla, Twitter, Puppet Labs, Valve Software, Alfresco, NEC, Blender, OpenShot, Novell, Inkscape, Philips, Red Hat, CentOS, GNOME, Wikimedia, MariaDB Foundation, Rackspace, Moodle, Openstack, Slackware, Tor, and Sony. You get the idea.
J. David Eisenberg has kindly made a PortableApps version of SOFA for Windows. It is an alpha release only but it works and feedback/assistance is welcome. Here is his announcement as posted on the Google Group:
I have used the PortableApps guidelines (correctly, I hope!) to create a version of SOFA Statistics that can be installed on a USB drive and will retain your data and settings.
You can download the installer at http://evc-cit.info/SOFAStatisticsPortable_1.4.3_English.paf.exe ; this is Windows only.
Known problem: If you add the results of a statistical test to a report, any graphs for that statistical test will show up as a “missing image” icon. The image will be in the report; it just won’t show up on screen.
I have not tried scripting to see if that works properly.
Any comments are welcome at the SOFA statistics Google Group
SOFA 1.4.3 now lets you import directly from tab-separated/tab-delimited files.
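For anyone preparing such files by script, Python's standard csv module reads tab-delimited data when given delimiter="\t". The snippet below is a generic sketch of that technique with made-up sample data; it does not show SOFA's import code, and the function name read_tab_delimited is hypothetical.

```python
import csv
import io


def read_tab_delimited(text_stream):
    """Return (header, rows) from a tab-delimited text stream."""
    rows = list(csv.reader(text_stream, delimiter="\t"))
    if not rows:
        raise ValueError("No data found")
    return rows[0], rows[1:]


# Made-up sample data standing in for a real .tsv file.
sample = io.StringIO("name\tage\nAnna\t34\nBo\t27\n")
header, data = read_tab_delimited(sample)
print(header, data)
```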
Another change is less momentous but should really please people doing lots of row stats reports. As SOFA gained more measures, it took more and more effort to select individual measures one by one, checkbox by checkbox. Now there is a toggle button for Select All/Deselect All. Much better :-).
And the bonus themes are now part of the standard release making it easier than ever to make your charts and tables look good – I hope you like them.
One change that most users will never notice is better support for running SOFA via scripts. An exciting automation project is currently under development using this functionality and I hope to have some news to share soon.
Here’s the full list of changes:
- Can import tab-delimited data.
- More options for attractive charts and reports. Three new themes available – sky, prestige (screen), and prestige (print).
- Better support for automation (i.e. headless, running without GUI) esp in international context.
- Exporting to spreadsheet now relies on a more robust code library (xlwt).
- Easy to select or deselect lots of row stats measures at once.
- Faster opening in many cases.
And the bug fixes:
- Minor tweak to PostgreSQL plug-in to handle timestamps without timezone.
- Resolved bug when SQLite numbers are stored in a non-numeric field and processed for Chi Square test.
- Importing csvs now copes better when there are only missing values in the sample of a field. Gives user the choice.
- Fixed bug when doing a Row Stats table with a rows variable e.g. by Gender and some of the fields can’t be calculated for some of the row categories.
- Headless importing now works in the event of inconsistent data types in fields.
- Headless importing now reads entire dataset rather than a sample to avoid need for (human) decisions.
- Scripts no longer rely on translated arguments. Much safer to use on other machines with different locales.
- Fixed circular import bugs which only became visible when other bugs occurred.