Archive for January, 2010

SOFA Statistics passes 5000 downloads on SourceForge

Tuesday, January 26th, 2010

5000 downloads is a milestone worth celebrating for an open source project, and SOFA Statistics has finally passed 5000 downloads on SourceForge.

5000 downloads on SourceForge

Counting Softpedia as well as the original downloads from the main project website (http://www.sofastatistics.com) and Launchpad, there have been 6,000 downloads in total. The next big milestone for the project will be either 10,000 downloads or version 1.0. Either way, SOFA Statistics will gain a lot of additions and improvements before then.

Bullet-point engineering

Sunday, January 24th, 2010

The features page of SOFA Statistics (http://www.sofastatistics.com/features.php) lists features as bullet points. There is nothing wrong with that. But bullet points don’t tell you how well a feature is implemented or how well it integrates with the rest of the program. Here is an interesting and relevant Wikipedia entry: http://en.wikipedia.org/wiki/Bullet-point_engineering . The goal for SOFA Statistics is to limit the range of features in the core but to increase their depth and quality: e.g. greater support for users, help with choosing appropriate tests, easier sharing of output, easier automation of report making, support for more languages, etc. Beyond the core, it is hoped there will be numerous modules available. But for now the emphasis is on the value of the features added rather than the sheer number of them.

0.9.2 more support charts; portable output

Friday, January 22nd, 2010

The latest release has many of the features I have been wanting to add for a long time. In particular, there are more support charts to help users decide whether a particular analysis is appropriate. If you run an ANOVA, for example, how normal are the distributions of the subgroups? And what about their variability? SOFA Statistics provides visual aids for deciding (e.g. histograms with superimposed normal distribution curves) alongside the numerical results of appropriate tests (e.g. the O’Brien homogeneity of variance test). Instead of expecting users to know the separate steps they ought to take, SOFA Statistics bundles those steps together and tries to provide guidance and interpretation, which is arguably how it should be. For example, how can a user properly interpret the r value from a correlation without a scatterplot? Many users won’t have studied statistics formally for a long time (if at all), and it is easy to be uncertain about exactly what all the rules are.
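
To make the point about correlations concrete, here is a minimal sketch in plain Python (standard library only). It is not SOFA Statistics’ own code, and the function names are hypothetical. It computes Pearson’s r, Spearman’s rho (Pearson’s r applied to ranks, with tied values given their average rank) and the least-squares line of best fit that a scatterplot would display.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def best_fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

For y = x² over x = 1 to 5, r comes out at about 0.98 and rho at exactly 1.0, with a best-fit line of y = 6x - 7. A high r on its own does not reveal that the relationship is curved, but a scatterplot shows it at a glance.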

Re: portable output, SOFA Statistics reports are designed for viewing in web browsers (e.g. Firefox). Now that these reports include images, it has become important to make sure they are easily portable. To that end, all internal links to images are relative. This means you can copy a report, together with the subfolder holding its images (which shares the report’s name), anywhere and the report will still work properly. It has never been easier to share the results of your analyses.
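
The relative-link idea can be sketched as follows. This is a minimal Python sketch under stated assumptions, not SOFA Statistics’ implementation: it assumes a report file such as my_report.htm keeps its images in a sibling subfolder named my_report/, and the file names and helper functions are hypothetical.

```python
import os
from pathlib import Path

def images_folder(report_path):
    """Hypothetical layout: reports/my_report.htm keeps images in reports/my_report/."""
    report = Path(report_path)
    return report.parent / report.stem

def img_tag(report_path, image_name):
    """Build an <img> tag whose src is relative to the report's own folder."""
    image_path = images_folder(report_path) / image_name
    rel = os.path.relpath(image_path, start=Path(report_path).parent)
    # URLs use forward slashes, even where the OS path separator is a backslash.
    src = Path(rel).as_posix()
    return f'<img src="{src}">'
```

Because the src contains no absolute path, copying the report and its image folder together to any location keeps every image link valid.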

Here is a full list of the changes:

  • ANOVA output now includes histograms for each sample with superimposed normal distribution curves. It also shows kurtosis, skew, and an omnibus measure of normality for each sample as well as the O’Brien homogeneity of variance test. Explanatory footnotes have been added to the output.
    Histograms for subgroups of ANOVA

  • Spearman’s and Pearson’s correlation output now includes scatterplots and lines of best fit.
    Scatterplot for assessing linear correlation

  • All html reports are portable along with their images (stored in a subfolder of the same name).
  • When titles/subtitles are being changed, the rest of the example report table stays the same. This removes an annoying “flicker” effect when typing in titles/subtitles.
  • The redundant Clear button has been removed from Statistical Test dialogs.
  • An hourglass now displays when statistical tests and report tables are opened for the first time, since there may be a brief delay on first use.
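
The O’Brien test mentioned above transforms each value so that a group’s mean transformed score equals its sample variance, then runs an ordinary one-way ANOVA on those scores. The following stdlib-only Python sketch illustrates the technique. It is not SOFA Statistics’ code, and the function names are hypothetical. It stops at the F statistic because the standard library has no F-distribution CDF. SciPy’s scipy.stats.f.sf could supply the p value, and SciPy also provides scipy.stats.obrientransform.

```python
import statistics

def obrien_transform(sample):
    """O'Brien's transformation: the mean of the returned scores equals the sample variance."""
    n = len(sample)
    if n < 3:
        raise ValueError("each group needs at least 3 values")
    m = statistics.fmean(sample)
    v = statistics.variance(sample)  # sample variance, n - 1 denominator
    return [((n - 1.5) * n * (x - m) ** 2 - 0.5 * v * (n - 1)) / ((n - 1) * (n - 2))
            for x in sample]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def obrien_test(groups):
    """O'Brien test for homogeneity of variance: a one-way ANOVA on transformed scores."""
    return one_way_anova_f([obrien_transform(g) for g in groups])
```

Two groups with identical spread, such as [1, 2, 3] and [11, 12, 13], give F = 0 even though their means differ. The test compares variability, not location.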

There have also been some important and edge-case bug fixes:

  • All images are now uniquely named and stored in report-name-based subfolders if “added to report” has been selected, or in the internal folder otherwise. This guarantees the correct images will always be displayed and that saved HTML reports will work.
  • The page break in independent t-test output has been repositioned to below the histograms.
  • Changing to raw data display, and then changing table source, no longer prevents the example table from displaying.
  • Internal footnotes in expanded output now work for Windows users.

0.9.1 has first of new wave of support charts

Sunday, January 17th, 2010

0.9.1 is out and there have been a lot of improvements this time:

  • All output now displays inside design dialogs. In the case of report tables, there is an option to expand output. This is especially important for displaying larger report tables on netbooks.
  • Independent t-test output now includes two histograms with superimposed normal distribution curves.  It also shows kurtosis, skew, and an omnibus measure of normality for each sample as well as the O’Brien homogeneity of variance test. Explanatory footnotes have been added to the output e.g. explaining the p value or what kurtosis means.
    Independent t-test support graphics

  • When there is more than one sample, guidance is given on the need to assess the normality of each (as part of the test selection process).
  • Hovering over cells in the data entry/editing grid displays appropriate value labels. E.g. hovering over 1 in a gender field may show the tooltip “Male” (if that label has been set up by the user).
  • Variable details can be updated from within the data entry/editing grid by right-clicking on column labels. Tooltips signal this ability.
  • Date format (e.g. US) is now automatically extracted from the operating system rather than requiring user preferences. Preferences now sets the reporting explanation level (not yet operational).
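
A histogram with a superimposed normal curve, like those in the t-test output, needs the curve scaled to match the bars. Here is a minimal stdlib-only Python sketch. It is not SOFA Statistics’ code, and the function name and default bin and point counts are hypothetical. The curve uses the sample mean and standard deviation, and its density is multiplied by n × bin width so the area under it matches the total area of the bars.

```python
import statistics
from statistics import NormalDist

def histogram_with_normal_curve(sample, n_bins=10, n_points=50):
    """Return (bin edges, bin counts, curve points) for a histogram plus scaled normal curve."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in sample:
        # The maximum value goes into the last bin rather than overflowing.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    # Normal curve fitted from the sample, scaled from density to counts.
    dist = NormalDist(statistics.fmean(sample), statistics.stdev(sample))
    scale = len(sample) * width
    step = (hi - lo) / (n_points - 1)
    curve = []
    for i in range(n_points):
        x = lo + i * step
        curve.append((x, scale * dist.pdf(x)))
    return edges, counts, curve
```

The edges, counts and curve points could then be drawn with a plotting library such as Matplotlib (e.g. bars for the counts, a line for the curve).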

There have also been a few bug fixes:

  • Fixed bug in independent t-test where the std dev displayed for sample b was actually that of sample a.
  • Now copes with a filter on a string variable when creating a divider.
  • Fixed a loss-of-focus bug in Windows when typing titles and subtitles after clicking the HTML widget.

0.9.0 can examine normality of variables when selecting main test to use

Monday, January 4th, 2010

The 0.9 series has finally begun! The main emphasis of this series is to enhance the learn-as-you-go goal of SOFA Statistics. For example, when choosing a statistical test it is important to know if the data is normally distributed or not. You can do this at the point where you’re choosing your test:

Graph to help determine normality

Here is the list of new features added in 0.9.0:

  • When selecting a statistical test to use, users can examine the normality of a variable visually and with a test of kurtosis and skew.  The thumbnail can be expanded into a more detailed histogram.
  • When the project selection dialog opens, it defaults to the project last selected during the session.
  • The data selection dialog opens with the last database and table selected during the session.
  • Information on the filter used is also displayed in immediate output.
  • The values displayed in tests like the ANOVA take any filtering into account.
  • When exporting a script, the user is reminded where it is being appended.
  • When editing variable details, the user is reminded where the changes are being saved.
  • A useful message is displayed if the user is using the deprecated project file format.
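
The skew and kurtosis checks mentioned above can be illustrated with the usual moment-based formulas. This minimal Python sketch shows the general technique and is an assumption, not SOFA Statistics’ code, which may use bias-corrected estimators. It does not reproduce the omnibus normality measure either. SciPy’s scipy.stats.normaltest, the D’Agostino-Pearson test that combines skew and kurtosis, is one such omnibus test. The function name is hypothetical.

```python
def skew_and_kurtosis(sample):
    """Moment-based skewness and excess kurtosis (a normal distribution scores 0 on both)."""
    n = len(sample)
    mean = sum(sample) / n
    devs = [x - mean for x in sample]
    m2 = sum(d ** 2 for d in devs) / n  # second central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis
```

For [1, 2, 3, 4, 5] this gives a skew of 0 (perfectly symmetric) and an excess kurtosis of -1.3 (flatter than a normal curve).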

Bug fix:

  • Fixed a simple naming bug which prevented the independent t-test and Mann-Whitney test from running.

This is the first release with Matplotlib (scientific graphing) included and the 0.9 series of releases should include lots more graphing and internal documentation to a) help users choose the correct test and b) confirm that they made the right choice.

NB The current plan is to use RaphaelJS for the output charting, not Matplotlib. Matplotlib graphics will be restricted to a support role, albeit a very important one. This decision is consistent with the “beautiful output” goal of SOFA Statistics. RaphaelJS is still maturing as a library, so it may be a while until conventional output charts are part of SOFA Statistics. But most of the foundations are in place.