0.9.2 more support charts; portable output

The latest release has many of the features I have been wanting to add for a long time. In particular, there are more support charts to help users decide whether a particular analysis is appropriate. If you run an ANOVA, for example, how normal are the distributions of the subgroups? And what about their variability? SOFA Statistics provides visual assistance for deciding (e.g. histograms with superimposed normal distribution curves) alongside the numerical results of appropriate tests (e.g. the O’Brien homogeneity of variance test). Instead of expecting users to know the separate steps they ought to take, SOFA Statistics bundles them together and tries to provide guidance and interpretation, which is arguably how it should be. For example, how can a user properly interpret an R result from a correlation without a scatterplot? Many users won’t have studied statistics formally for a long time (if at all), and it is easy to be uncertain about exactly what all the rules are.
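To make the two ANOVA checks mentioned above concrete, here is a minimal pure-Python sketch. It is an illustration only, not SOFA Statistics’ own code, and every function name in it is hypothetical. Normality is summarised with moment-based skew and excess kurtosis, and the "superimposed normal curve" is approximated by comparing each histogram bin’s observed count with the count expected under a normal distribution fitted to that subgroup. The O’Brien test transforms each value so that a group’s transformed values average to its sample variance, then runs an ordinary one-way ANOVA on them; a large F suggests the variances differ. Only F and its degrees of freedom are computed here; turning F into a p-value needs the F distribution (SciPy’s scipy.stats.f.sf, for example).

```python
from statistics import NormalDist, fmean, stdev

def skew_and_kurtosis(values):
    """Moment-based skew (g1) and excess kurtosis (g2); both are 0 for a perfect normal."""
    n = len(values)
    mean = fmean(values)
    m2, m3, m4 = (sum((x - mean) ** p for x in values) / n for p in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def histogram_vs_normal(values, bins=5):
    """(left, right, observed, expected) per bin; expected comes from a fitted normal curve."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]  # pin the last edge to the maximum
    curve = NormalDist(fmean(values), stdev(values))
    rows = []
    for i in range(bins):
        left, right = edges[i], edges[i + 1]
        if i == bins - 1:  # the final bin includes the maximum
            observed = sum(left <= x <= right for x in values)
        else:
            observed = sum(left <= x < right for x in values)
        expected = len(values) * (curve.cdf(right) - curve.cdf(left))
        rows.append((left, right, observed, expected))
    return rows

def obrien_transform(values):
    """O'Brien (1981) transform; the results average to the group's sample variance."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values per group")
    mean = fmean(values)
    sq = [(x - mean) ** 2 for x in values]
    total = sum(sq)
    return [((n - 1.5) * n * d - 0.5 * total) / ((n - 1) * (n - 2)) for d in sq]

def obrien_test(groups):
    """One-way ANOVA on O'Brien-transformed values; returns (F, df_between, df_within)."""
    transformed = [obrien_transform(g) for g in groups]
    grand = fmean([x for g in transformed for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in transformed)
    ss_within = sum((x - fmean(g)) ** 2 for g in transformed for x in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical ANOVA subgroups
groups = {
    "group A": [4.1, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 7.0],
    "group B": [1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 4.5, 9.8],
}
for name, values in groups.items():
    skew, kurt = skew_and_kurtosis(values)
    print(f"{name}: skew={skew:.2f}, excess kurtosis={kurt:.2f}")
    for left, right, observed, expected in histogram_vs_normal(values):
        print(f"  {left:5.2f} to {right:5.2f}: observed {observed}, expected {expected:.1f}")
print("O'Brien F, df:", obrien_test(list(groups.values())))
```

Group B’s long right tail shows up as positive skew and as observed counts that pile up in the first bin well above what the fitted normal curve expects. SciPy also provides scipy.stats.obrientransform and scipy.stats.f_oneway for the same calculation.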

Re: portable output, SOFA Statistics reports are designed for viewing in web browsers (e.g. Firefox). Now that these reports include images, it has become important to make sure they are easily portable. To that end, all internal links to images are relative. This means you can copy a report, together with the subfolder holding its images (which shares the report’s name), anywhere and the report will still work properly. It has never been easier to share the results of your analyses.
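As a rough illustration of why relative links make a report portable, here is a small Python sketch. It assumes a report saved as reports/my_report.htm with its images in a sibling subfolder reports/my_report/; the function name, the file names, and the .htm extension are hypothetical, and this is not the code SOFA Statistics itself uses.

```python
import os
from pathlib import Path

def image_src(report_path, image_name):
    """Relative src for an image kept in a subfolder named after the report (hypothetical layout)."""
    report = Path(report_path)
    image_dir = report.with_suffix("")  # reports/my_report.htm -> reports/my_report
    rel = os.path.relpath(image_dir / image_name, start=report.parent)
    return Path(rel).as_posix()  # browsers expect forward slashes, even on Windows

src = image_src("reports/my_report.htm", "img001.png")
print(f'<img src="{src}">')  # <img src="my_report/img001.png">
```

Because the src attribute names only the subfolder and the file, never the reports/ folder or an absolute path, moving the report and its subfolder together keeps every image link valid.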

Here is a full list of the changes:

ANOVA output now includes histograms for each sample with superimposed normal distribution curves. It also shows kurtosis, skew, and an omnibus measure of normality for each sample as well as the O’Brien homogeneity of variance test. Explanatory footnotes have been added to the output.

(Image: Histograms for subgroups of ANOVA)
Spearman’s and Pearson’s correlation output now includes scatterplots and lines of best fit.

(Image: Scatterplot for assessing linear correlation)
All html reports are portable along with their images (stored in a subfolder of the same name).
When titles/subtitles are being changed, the rest of the example report table stays the same. This removes an annoying “flicker” effect when typing in titles/subtitles.
The redundant Clear button has been removed from Statistical Test dialogs.
An hourglass displays when opening statistical tests and report tables for the first time, in case there is a brief delay on first use.
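The correlation output mentioned above rests on a few simple calculations, sketched here in pure Python: Pearson’s r, Spearman’s rho (Pearson’s r computed on ranks, with tied values sharing their average rank), and an ordinary least-squares line of best fit. The function names are hypothetical and this is not SOFA Statistics’ own code; on Python 3.10 and later, statistics.correlation and statistics.linear_regression offer similar calculations. The example data, a perfect U-shaped relationship, show why the scatterplot matters: r comes out as 0 and the best-fit line is flat, even though y is completely determined by x.

```python
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def best_fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# A perfect but non-linear (U-shaped) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
print(pearson_r(xs, ys), spearman_rho(xs, ys))  # 0.0 0.0
print(best_fit_line(xs, ys))  # (0.0, 4.0): a flat line
```

Neither coefficient on its own reveals the U shape; only plotting the points does. For a monotonic but curved relationship (y = x cubed, say), Spearman’s rho reaches 1 while Pearson’s r stays below it.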

There have also been some important and edge-case bug fixes:

All images are now uniquely named and stored in report-name-based subfolders if “added to report” has been selected, or in the internal folder otherwise. This guarantees the correct images will always be displayed and that saved HTML reports will work.
The page break in independent t-test output has been repositioned to below the histograms.
Changing to raw data display, and then changing table source, no longer prevents the example table from displaying.
Internal footnotes in expanded output now work for Windows users.
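One simple way to get the unique image names described in the first fix above is sketched here in Python, under stated assumptions: each image gets a random UUID-based file name, placed in the report-name subfolder when the output is added to a report and in an internal folder otherwise. The function name, the img_ prefix, the .png extension, the _internal folder argument, and the use of uuid4 are all hypothetical choices for illustration; the release notes do not say how SOFA Statistics generates its names.

```python
import uuid
from pathlib import Path

def new_image_path(report_path, internal_dir, add_to_report):
    """Pick a unique path for a new chart image (hypothetical naming scheme)."""
    if add_to_report:
        folder = Path(report_path).with_suffix("")  # subfolder sharing the report's name
    else:
        folder = Path(internal_dir)
    # uuid4 makes a clash with an earlier image's name practically impossible
    return folder / f"img_{uuid.uuid4().hex}.png"

first = new_image_path("reports/my_report.htm", "_internal", add_to_report=True)
second = new_image_path("reports/my_report.htm", "_internal", add_to_report=True)
print(first != second)  # True: each call returns a fresh name
```

The function only chooses a path; the caller would still create the folder (for example with path.parent.mkdir(parents=True, exist_ok=True)) before saving the image there.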

This entry was posted on Friday, January 22nd, 2010 at 10:28 pm and is filed under general.

4 Responses to “0.9.2 more support charts; portable output”

Wouter says:

January 23, 2010 at 7:35 am

Wow, SOFA is really getting good. I am waiting for the day it can totally replace SPSS for my (limited) statistics needs.
admin says:

January 24, 2010 at 11:30 pm

Thanks for the encouragement.

Re: meeting user needs, SOFA Statistics will have two parts: 1) a core that (hopefully) covers the needs of 80% of likely users; and 2) a plug-in system that makes everything else accessible (assuming, of course, that modules have been built, tested and approved).

For the former, the emphasis will be on ease of use, documentation, and providing the user with appropriate levels of guidance and information. For the latter, the emphasis will be on filling specialist gaps.

But is my understanding of what should be in the core correct? I don’t know. What do you think? What features are you waiting for before you can rely entirely on SOFA Statistics?

A good place to start such a conversation would be the discussion group (http://groups.google.com/group/sofastatistics). I am sure there will be others with opinions on what should be in the core.
Andy says:

January 25, 2010 at 11:48 am

I have to agree with the first poster. I think SOFA is shaping up to be a fantastic program. I love the idea of thinking through what a user needs when they perform various types of analysis. The example you give here is perfect. Of course there is a relationship between age and weight, but the relationship is clearly not strictly linear. Unfortunately, looking at Pearson’s R alone would not let you in on that little secret, so you could come to an incorrect or, as this example shows, incomplete conclusion.

IIRC, SOFA uses Python and I therefore assume that a Python interface will be available for anyone wanting to write extensions, macros, etc.

One strength of SPSS is scripting. Other tools such as R are even better at this. As nice as SOFA is, I need to be able to easily build a simple macro that I can re-use over and over and over again. Obviously, many of us get similar data-sets and run essentially the same basic analysis every time. It’s not exciting – but it is reality. For these situations, I have a set of R scripts that I use to generate a report that I can submit.

It looks like this is something SOFA will be able to handle, but if not, I do think it is an important addition. Further down the road, something similar to Sweave could also be interesting and useful. That level of functionality is clearly beyond the 80% rule you mentioned earlier, but I do think the scripting of routine tasks matters to a large enough group of people that it is worth making sure it is possible.
admin says:

January 26, 2010 at 4:57 pm

Hi Andy,

When I was using SPSS I would always bundle things like correlations and scatterplots together in my syntax so I would automatically do them together. It seems an obvious thing to do, and the only reason I can think of for a program not doing it is historical. I can think of two valid reasons for leaving everything fragmented: 1) computation was very expensive in CPU time etc., so it was important to avoid running any more analyses than were strictly necessary in each specific case; and 2) any user of a statistical program would be a skilled specialist, so perhaps it was safe to leave it to them to put together all the pieces as and when they needed to.

As for scripting and automation, I have some good news for you. SOFA Statistics is built on reusable scripting. Inside the sofa/_internal folder you will find a file called script.py. That is the script which produced the last output you ran. There is also an Export button on every output dialog which lets you export the script for your most recent output onto the end of your designated script output file. I am already using this automation to run a series of analyses for a client. Python is perfect for this, and it is Python all the way down. If you wish to talk more about reusing scripts etc., a good place might be the discussion forum for SOFA Statistics (http://groups.google.com/group/sofastatistics).
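Purely as an illustration of the "append to a designated script output file" idea, here is a minimal Python sketch. The function names and the comment-header format are hypothetical and are not SOFA Statistics’ actual export code.

```python
from pathlib import Path

def format_export(script_text, label):
    """Wrap one exported snippet with a comment header so the combined file stays readable."""
    return f"\n# --- {label} ---\n{script_text.rstrip()}\n"

def export_script(script_text, script_file, label):
    """Append a snippet to the end of the designated script file (created if missing)."""
    with Path(script_file).open("a", encoding="utf-8") as f:
        f.write(format_export(script_text, label))
```

Appending rather than overwriting is what lets a series of analyses accumulate in one file; if each exported snippet is self-contained, running that file with Python repeats the whole series in order, which is the kind of reusable routine described in the comment above.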