Over the last year, work on SOFA has been focused on a difficult packaging issue – building a Mac version that lets Mac users export their charts and reports as PNGs and PDFs. That functionality is now working on Snow Leopard and, hopefully, on newer versions as well. But it would be good to confirm this with some people running Macs. If you’d like to try out the latest version of SOFA Statistics, please drop me a line via http://www.sofastatistics.com/contact.php.
Archive for the ‘community’ Category
SOFA works on Windows, Mac, and Linux. But Linux is especially important for the project because SOFA is developed on Ubuntu. So it made sense to support the Linux ecosystem by signing up with the Open Invention Network. In an ideal world, it wouldn’t be necessary to have anything to do with software patents. For various reasons, they’re a bad idea and function more to inhibit innovation than encourage investment in software research and development. But the Open Invention Network plays a protective function in a world where people who create and actually make things can be preyed upon by parasites who have been granted monopolies on ideas – the so-called patent trolls.
The group was created to defend Linux from patent trolls and other attacks from patent holders. It tries to do this with its own patents which are then available royalty-free to any company, institution or individual that agrees not to assert its patents against Linux. While it hasn’t been done, these patents could also, in theory, be used by the OIN, or an OIN member, against a hostile company in a patent war.
Anyway, a range of companies and projects large and small (over 800 at present and growing) have signed up for the initiative including Google, Dropbox, IBM, Canonical, Mozilla, Twitter, Puppet Labs, Valve Software, Alfresco, NEC, Blender, OpenShot, Novell, Inkscape, Philips, Red Hat, CentOS, GNOME, Wikimedia, MariaDB Foundation, Rackspace, Moodle, OpenStack, Slackware, Tor, and Sony. You get the idea.
J. David Eisenberg has kindly made a PortableApps version of SOFA for Windows. It is an alpha release only but it works and feedback/assistance is welcome. Here is his announcement as posted on the Google Group:
I have used the PortableApps guidelines (correctly, I hope!) to create a version of SOFA Statistics that can be installed on a USB drive and will retain your data and settings.
You can download the installer at http://evc-cit.info/SOFAStatisticsPortable_1.4.3_English.paf.exe ; this is Windows only.
Known problem: If you add the results of a statistical test to a report, any graphs for that statistical test will show up as a “missing image” icon. The image will be in the report; it just won’t show up on screen.
I have not tried scripting to see if that works properly.
Any comments are welcome at the SOFA statistics Google Group
All statistics packages have their strengths and weaknesses so it is not uncommon for people to want to use more than one – even on the same project. SOFA is focused on making it easy to use some core statistical tests and producing attractive, high definition tabular and charting output. SOFA also makes it easy to link to, or import from, a wide range of formats: xls, xlsx, csv, Google Docs spreadsheets, MS Access, MySQL, MS SQL Server, PostgreSQL, SQLite, and, more recently, CUBRID.
But there is no point overcomplicating SOFA so it can do every statistical test that might be needed for a particular project. SOFA users have been routinely surveyed on what features they would like added, and their responses have not consolidated into a clear list of priorities. People need lots of different things depending on their specific projects.
So a sensible goal for SOFA is to make it easy to import and export data, including metadata such as variable and value labels. This strategy has already resulted in the addition to SOFA (version 1.4.2) of built-in export-to-spreadsheet functionality. And it has already been improved for version 1.4.3 (not released yet at the time of this writing).
The question is, what packages should SOFA target as priorities for interoperability? Feel free to fire me off an email at firstname.lastname@example.org.
Creating and sharing a software project is a leap into the unknown. Will anyone use it? Will anyone like it? And although download numbers are a very imperfect measure, they can provide encouragement when engaged in the numerous tasks associated with a project. So it is with great pleasure that I can announce SOFA Statistics has passed the 150,000 download milestone on Sourceforge.
Thanks to all the people who have helped make this possible. I’m still thinking about what to do with SOFA next but it seems to have found a niche as a general purpose, open source statistics application. And I’m still trying to stay true to the mantra “ease of use, learn as you go, and beautiful output”.
If you’ve liked using SOFA, please consider sending me a brief message at email@example.com. I’m keen to hear where in the world people are from, what sorts of things SOFA has been used for, and anything else interesting. Please include at least your first name – I’d like to display some of these messages to promote SOFA. Thanks in advance. [Note – my purpose is to collect some feedback to share, not to gather email addresses. But I expect I will personally reply to a few emails if I have time.]
SOFA aims for ease of use as part of its “ease of use, learn as you go, beautiful output” mantra. But it confronts users with having to think about databases, even if they are just working with simple spreadsheets of data or some data entered by hand.
This was the usability problem brought to my attention by a member of the community, Jan Dittrich. Jan (http://mindthegap.blog.bau-ha.us/) is completing a Masters in Media Arts and Design at the Bauhaus University in Weimar, Germany. He mainly does user research and usability, but has an interest in statistics as well. When using SOFA he noticed that a “database” needs to be selected for most activities, but that this might be a rather technical concept for some SOFA users. He wrote me an email about the problem and we subsequently exchanged ideas.
So how can this be addressed without removing one important ability of SOFA – namely, the ability to connect directly to people’s data when it is in a database (e.g. MySQL)?
We explored a few options …
… but ended up following the principle of “the least we could do” as recommended in the fantastic usability book “Rocket Surgery Made Easy” by Steve Krug.
As Krug notes, tweaking is usually better than redesigning because 1) it actually gets done; 2) larger changes are inevitably going to break some things (think months of squashing all the bugs out again); and 3) redesigns annoy a lot of existing users who have gotten used to the status quo (actually Krug has 9 reasons but these are my favourites).
Anyway, I had no enthusiasm for a major GUI overhaul but it did not make sense to leave a known usability problem in place. What Jan and I came up with was rather simple and elegant. SOFA only shows the Database label and drop-down if the user has configured SOFA to connect to any databases. Expect to see this change in the next version (1.3.5).
Users who have database connections will notice no difference. But for everyone else the interface will be simpler and easier to use. Sometimes, less is more.
A helpful user drew my attention to the desirability of adding a small but important feature for dissertation students – namely, the ability to leave the percentage symbol off the numbers in the percentage columns of frequency and cross tabulation report tables. This new feature will be in the forthcoming version of SOFA (1.1.4):
For dissertation writing in the States, Turabian 7th edition and the Chicago Manual of Style 6th edition are standard for many graduate schools on both the masters and the doctoral level. In both cases, tables with percentage figures in them do NOT have percent signs in front of the numbers themselves, because a typical title like “Table 3. % of babies born to men over 40” already tells you what’s inside the table.
Sofa Stats, however, so far as I can see, requires that percentages have the percent sign, which then gets dragged-and-dropped into Word (or Excel, for tidying up first). If the table is a small one, and if there are only one or two, no problem. But many dissertations have tons of them.
There is a way to rid a table of the % signs by using Excel, but it’s awkward and not a part of the regular menu system. I just worked it out myself a few hours ago, after spending half a day on the problem.
What would be *extremely* helpful to graduate students, whom I assume you would like to have as one of your key user groups, would be for you to program in a “switch” that would allow the user to specify percentages with or without percent signs. It’s a small detail, but one that would be much appreciated.
I generally try to avoid adding more features to SOFA in favour of keeping it simple but this seemed a good idea. Thanks again for the feedback Doug.
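For the curious, the kind of switch Doug describes can be sketched in a few lines of Python. This is only an illustration, not SOFA's actual code: the function name `format_pct` and its parameters `decimals` and `show_symbol` are made up for this example.

```python
def format_pct(value: float, decimals: int = 1, show_symbol: bool = True) -> str:
    """Format a percentage for a report table cell.

    Hypothetical helper for illustration only; the names here are
    assumptions and do not come from SOFA itself.
    """
    # Round to the requested number of decimal places first.
    text = f"{value:.{decimals}f}"
    # Append the percent sign only when the user has asked for it.
    return f"{text}%" if show_symbol else text


# Default behaviour keeps the symbol; dissertation-style tables can drop it.
with_symbol = format_pct(45.678)
without_symbol = format_pct(45.678, show_symbol=False)
```

Keeping the symbol as the default means existing reports look the same unless the user opts out.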
It is easier than ever to get started with SOFA Statistics. A new “Get Started” button has been added to the main form and other buttons have been shifted to better emphasise the most important:
Clicking on the “Get Started” button opens a web page with screen shots and step-by-step examples.
It is now easy for first-time users to give their feedback. Was it as useful as they hoped? Is there anything which can be done to improve SOFA? There is a link on the main start form, plus a pop-up option on first exit from SOFA.
The goal is to make SOFA more useful by finding out what worked, and what didn’t, for users – especially first-time users.
There were two other changes:
- When importing csv files, SOFA now sets the default for ‘Has header row?’ according to a review of the sample contents.
- And an Exit control has been added to all forms where appropriate.
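As a rough idea of how a header-row default can be guessed from sample contents, here is a minimal sketch using the `csv.Sniffer` class from Python's standard library. It is a stand-in under stated assumptions, not a description of how SOFA performs its own review; the function name `guess_has_header` and the fallback default are hypothetical.

```python
import csv


def guess_has_header(sample_text: str, fallback: bool = True) -> bool:
    """Guess whether the first row of a CSV sample is a header row.

    Uses the standard library heuristic, which compares the first row
    with the rows below it. The fallback value is an assumption made
    for this sketch and is used when the sample cannot be analysed.
    """
    try:
        return csv.Sniffer().has_header(sample_text)
    except csv.Error:
        # The sniffer could not work out the file's dialect.
        return fallback


# A sample with text labels above numeric columns.
labelled = "name,age,score\nAlice,30,85\nBob,25,90\nCarol,41,77\n"
# A sample where every row, including the first, is numeric.
unlabelled = "1,2,3\n4,5,6\n7,8,9\n"

labelled_guess = guess_has_header(labelled)
unlabelled_guess = guess_has_header(unlabelled)
```

The result is only a default for the ‘Has header row?’ control; the heuristic can be wrong, so the user should still be able to override it.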
Once again, there have been a number of bug fixes:
- Fixed postgresql quoting error by using single quote values.
- SOFA now uses the default database when making an initial connection. If connecting to a project fails, SOFA reverts to previous project. Postgresql projects insist on default database if user is not ‘postgres’. Gives useful error if an old project has this problem.
- Fixed minor bug in Mann-Whitney output exposed whenever labels with %s in them were used.
- Improved error trapping if error importing wx.lib.iewin.
- Better font for help text on main form according to platform.
- CSV importing copes with single-row data.
- Better font settings for help text on Macs.
- Fixed display bug in Mac when more than 20 values warning shown (smaller font now).
- “Chart by” now filters out data lacking values in the “chart by” variable.
The most important thing for the project is the feedback we will hopefully receive.
I finally succumbed and created a twitter account – you can follow the project on http://www.twitter.com/sofastatistics.
0.9.4 is another important release. The new testing regime is identifying and fixing all sorts of quirky bugs, as well as some more significant ones. Please join the discussion group if there are any surviving bugs which are an issue for you (http://groups.google.com/group/sofastatistics).
Here are the new features of this release:
- Paired t-test output includes a histogram of differences. This makes it easy to assess the normality of the distribution of differences.
- Kruskal Wallis output now includes a table for each group containing its median, n, min etc.
- Mann Whitney output now includes medians.
- If using assistance to select statistical test, the normality help dialog varies according to whether or not paired data is selected. If paired, then two variables must be chosen and the normality of the differences is analysed and displayed.
- When a cell edit fails validation, the cursor returns to the end of the text if possible, ready to edit immediately. This ends one major interface annoyance.
- Users receive a useful message if there are no values to report in an analysis e.g. the data is over-filtered.
- The Chi Square test provides useful messages if too few values in either row or column variables.
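To show what the histogram of differences is built from, here is a small sketch in plain Python. It assumes equal-width bins and uses made-up helper names (`paired_differences`, `histogram_counts`); SOFA's own binning and charting may well work differently.

```python
def paired_differences(first, second):
    """Return second - first for each pair of observations."""
    if len(first) != len(second):
        raise ValueError("Paired data must have the same number of values")
    return [b - a for a, b in zip(first, second)]


def histogram_counts(values, n_bins=5):
    """Count values falling into n_bins equal-width bins.

    Illustrative only: equal-width binning is an assumption here.
    """
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # All values identical, so a single bin holds everything.
        return [len(values)]
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for value in values:
        # The maximum value belongs in the last bin, not one past it.
        idx = min(int((value - lowest) / width), n_bins - 1)
        counts[idx] += 1
    return counts


before = [10, 12, 9, 11]
after = [12, 15, 9, 14]
diffs = paired_differences(before, after)
counts = histogram_counts(diffs, n_bins=3)
```

For a paired t-test it is the distribution of these differences, not of the two original variables, that should look roughly normal.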
The list of bug fixes is quite substantial this time:
- Independent t-test now works even if using a string variable for grouping.
- Fixed bug preventing scripts from being run independently of GUI.
- Fixed bug exporting scripts to the saved scripts file.
- Fixed minor UI bug which meant the paired option remained visible after the stats selection was back to unguided.
- Fixed bug that meant if the user moved the mouse away from data being entered the cell editor closed.
- Fixed bug caused when shifting from one project with a default database engine e.g. MySQL, to another project in which that database is not available. Changing project wipes the stored default database engine.
- Fixed bug with writing scripts with unicode characters.
- If unable to calculate kurtosis etc still potentially able to produce rest of results.
- Chi Square now honours filter values in script version.
- Projects no longer have problems with new lines in their notes.
- System copes with faulty project files better.
Development attention will start turning to the following in due course:
- Mac packaging
- Making the “results only” and “brief” explanation level settings operational
- Making more “Help” buttons functional
- Enabling user-defined missing values
- Adding an Oracle plug-in
- Output charting
- Connecting to on-line educational resources on statistics
How do you feel about the direction being taken? Good? Bad? Any feedback? Please feel free to discuss any aspect of this project at http://groups.google.com/group/sofastatistics.