0.8.5 has stronger ANOVA support and can output in multiple styles

Monday, August 17th, 2009

Version 0.8.5 has the following new features:

  • The one-way ANOVA now presents the user with a choice of either precision or speed. Precision passes the hardest NIST test (http://www.itl.nist.gov/div898/strd/anova/SmLs09.html) and is the default.  The speed option uses standard floating point arithmetic with all the pros and cons that entails.
  • ANOVA displays more information in output to enable comparison with known results.
  • HTML output can now display multiple styles for different tables.
  • Importing now requires alphanumeric names for tables.
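To make the precision versus speed trade-off concrete, here is a minimal sketch in Python. It is not the code SOFA Statistics actually uses: the function names, the use of the decimal module for the precise path, and the 50-digit working precision are all assumptions made for illustration.

```python
from decimal import Decimal, localcontext


def _f_stat(groups, num):
    """Compute the one-way ANOVA F statistic using the numeric type `num`."""
    # Convert via str so Decimal sees the value as written,
    # not the binary approximation of a float.
    data = [[num(str(x)) for x in g] for g in groups]
    n_total = sum(len(g) for g in data)
    grand_mean = sum((sum(g, num(0)) for g in data), num(0)) / n_total
    ss_between = num(0)
    ss_within = num(0)
    for g in data:
        mean = sum(g, num(0)) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum(((x - mean) ** 2 for x in g), num(0))
    df_between = len(data) - 1
    df_within = n_total - len(data)
    return float((ss_between / df_between) / (ss_within / df_within))


def anova_f(groups, precise=True):
    """Hypothetical helper: precise=True uses Decimal, False uses plain floats."""
    if precise:
        with localcontext() as ctx:
            ctx.prec = 50  # arbitrary but generous working precision
            return _f_stat(groups, Decimal)
    return _f_stat(groups, float)
```

Calling anova_f with values such as "1000000000000.2" (a large constant plus small differences, in the spirit of the NIST data set) is where the two modes can be expected to diverge: the float path loses digits when the large constant is subtracted out, while the decimal path keeps them at the cost of speed.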

and the following main bug fixes:

  • Importing CSV files works again (regression introduced in 0.8.4).
  • CSV files with multiple data types in columns are handled correctly when the user opts to let SOFA Statistics fix a column type.
  • The Kruskal-Wallis H test now copes with string variables.
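As a rough illustration of what fixing a column type can involve, the sketch below treats a column as numeric only if every non-blank value parses as a number, and otherwise falls back to string. The function names and the two type labels are hypothetical; the rule SOFA Statistics really applies may differ.

```python
import csv
import io


def infer_column_type(values):
    """Return 'numeric' if every non-blank value parses as a number, else 'string'."""
    for value in values:
        value = (value or "").strip()
        if value == "":
            continue  # blanks are treated as missing, not as evidence of type
        try:
            float(value)
        except ValueError:
            return "string"
    return "numeric"


def column_types(csv_text):
    """Map each column name in a CSV document to its inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    names = list(rows[0].keys()) if rows else []
    return {name: infer_column_type([row[name] for row in rows]) for name in names}
```

A column holding both "Firefox" and "7" would come back as string under this rule, which is the safe choice because every value can still be stored.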

0.8.4 adds new features, and lots more polish

Friday, July 31st, 2009

The latest version includes a lot more polish and has many rough edges removed.  There are also several important new features:

  • Users can explicitly set variables to Nominal, Ordinal, or Quantity.  These settings are used to limit the variables displayed in various tests to those which are of the appropriate type.
  • Pearson’s Chi Square now has a contingency table with both observed and expected values.
  • The one-way ANOVA and Kruskal-Wallis H now provide more information in the output e.g. average rank per group.
  • Variables can be configured with a right click when running statistical tests (as is already possible when making report tables).
  • Statistical tests can be selected by double clicking.  They are also sorted alphabetically to make direct selection faster.
  • The new way of indicating that a set of values has been limited to the first 20 unique values no longer interrupts the user while making selections.
  • Test data now includes a string variable (browser).
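For anyone wondering where the expected values in the Chi Square contingency table come from, here is a small sketch of the standard calculation (row total times column total, divided by the grand total). The function names are mine and this is not taken from the SOFA Statistics source.

```python
def expected_counts(observed):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Pearson's Chi Square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
```

For the observed table [[10, 20], [30, 40]] the expected table is [[12.0, 18.0], [28.0, 42.0]], and seeing the two side by side makes it easy to spot which cells drive the result.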

Bug fixes:

  • Numerical values appear in numerical rather than string order when configuring variable details.
  • It is no longer necessary to complete MS SQL Server details merely because the plugin is installed.
  • Statistical output works even if the variable is a string variable.
  • Minor problem with start screen positioning on dual monitors resolved (adequately).
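The numerical versus string ordering issue is easy to reproduce in Python. This is only an illustration of the general problem, with a made-up helper name, not the actual fix:

```python
def numeric_order(values):
    """Sort values held as text by their numeric value rather than character by character."""
    return sorted(values, key=float)


values = ["10", "9", "2", "100"]
string_sorted = sorted(values)          # ['10', '100', '2', '9']
numeric_sorted = numeric_order(values)  # ['2', '9', '10', '100']
```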

Please let me know what you think.  Is the project heading in the right direction from your point of view?

The next lot of development will focus on subsidiary charting (as opposed to charting for main output), for example charts for assessing the normality of data before choosing the appropriate test.

0.8.3 supports MS SQL Server – also misc improvements and some important bug fixes

Saturday, July 18th, 2009

The latest release not only adds some important new functionality, it cleans up a lot of existing code and fixes a lot of bugs.

  • Supports direct connection to MS SQL Server databases.
  • Previous selections become the defaults when configuring statistical tests.
  • Labels are consistently updated in the statistical test dialogs if a new label file is selected.
  • Plus lots of important bug fixes and usability improvements.
  • Fixed installation bug affecting multi-user Windows installations.
  • Fixed bugs when connecting to MS Access depending on data type.
  • Fixed bug preventing tab traversal of Projects form.
  • Fixed bug when entering data in Windows.
  • Fixed bug when a variable in independent tests had a large number of unique values.
  • Fixed various bugs occurring when changing database or table selection.

Please report bugs – it’s good for the project

Saturday, July 18th, 2009

Bugs are never welcome, but the only thing worse than a bug is a bug you don’t know about and could easily fix.  Even worse, an unknown bug could put some people off using your software, which is not a good outcome for anyone.  So how do you report a bug in SOFA Statistics?  Fortunately, Launchpad (which is where the SOFA Statistics source code lives) makes bug reporting easy.  Just go to: https://launchpad.net/sofastatistics/+filebug/+login and register the bug.  I’ll do my best to fix it and keep everyone informed along the way.

Remember – reporting a bug is an act of kindness so please don’t hold back.  Your report could help many other users.

0.8.2 adds final tests

Friday, July 10th, 2009

Finally!  Version 0.8.2 is the first with all the core statistical tests functional.  This version:

  • added one-way ANOVA.
  • added Kruskal-Wallis H.
  • added Pearson’s Chi Square.
  • and fixed startup bug affecting Windows users on networked drives.

Of course, the best is yet to come.  You may have noticed that huge empty space in the dialog for configuring a test.  Plus those disabled buttons about reporting level (“results only” through to “full explanation”).  I will be coming back to flesh out those areas.

The intention is that the user will be supplied with little visualisations of the actual data so they can see whether the test is appropriate or not (all explained in words as well with Help on hand).  E.g. a histogram of each sample so the shape of each distribution is visible at a glance.  Plus a small test and its interpretation which lets you know whether the test is usable or not.

The user shouldn’t have to know about, or remember to use, tests of kurtosis or skew or equality of variance.  They should simply choose the most likely test to be appropriate and have SOFA Statistics explain to them whether it will work or not based on the actual data being analysed (along the way some of these ideas are bound to rub off, of course). And when you click on the Help button next to the buttons on Normal/Not Normal, SOFA Statistics should not only explain the concept (with a couple of simple images), but also enable you to visualise the data and run the appropriate tests to decide if it is Normal or not.
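To give a feel for the kind of check that could run behind the scenes, here is a rough sketch of skew and kurtosis computed from the moments of a sample. It uses the simple population (biased) formulas and a function name of my own choosing; whatever SOFA Statistics ends up using may well be different.

```python
def skew_and_kurtosis(sample):
    """Return (skew, excess kurtosis) using population moment formulas."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # variance
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a Normal distribution
    return skew, excess_kurtosis
```

Values near zero for both are consistent with a roughly Normal shape, while a large positive skew suggests a long tail to the right. The point of the planned feature is that the user sees this as a plain-language verdict rather than as two numbers.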

0.8.1 adds 4 new tests inc Mann Whitney U and Pearson’s R

Saturday, July 4th, 2009

Release 0.8.1 of SOFA Statistics added 4 new tests:

  • the Mann Whitney U
  • the Wilcoxon Signed Ranks
  • Pearson’s Correlation
  • Spearman’s Correlation

At this stage only the raw results are presented but the intention is to let users choose the level of explanation they want in their output.  Downloads are available from:

http://www.sofastatistics.com/downloads.php
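For the curious, here is a minimal sketch of what the Mann Whitney U statistic measures, using average ranks for tied values. The function names are hypothetical and this is the textbook calculation of U only (no p value), not the SOFA Statistics implementation.

```python
def average_ranks(values):
    """Map each distinct value to the mean of its 1-based positions in sorted order."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U statistics for two independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[x] for x in sample_a)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)
```

A U of zero means the two samples do not overlap at all, as with [1, 2, 3] against [4, 5, 6].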

SOFA Statistics and R

Friday, July 3rd, 2009

Someone asked me recently about the difference between R and SOFA Statistics.  In short, SOFA is aiming for a very different niche.  One of the initial project slogans/messages is:  “SOFA won’t replace sophisticated statistics systems like R, but there is a good chance it will do what you need and do it well.”

Major points of difference as I see it (open for discussion):

Main users:

  • R: statisticians and experienced quantitative researchers.
  • SOFA: business analysts, secondary school statistics students and their teachers, social science students in the tertiary sector, experienced statisticians doing some quick exploration of data or wanting to create attractive output for a report or presentation, citizen activists wanting to use publicly available data to support their cause.

Main concerns:

  • R: statistical analysis – what are the very best tools available for understanding the data.
  • SOFA: ease of use, simplicity, beautiful output (aesthetics as a value in its own right, not just a means for the communication of information)

Scope of statistical tests:

  • R: everything and anything you might need
  • SOFA: the main tests that most potential users of statistical analysis need.  Favouring thoroughness of support for user over breadth of tests available.  See the second screenshot here – http://www.sofastatistics.com/screenshots.php – for an idea of the philosophy being followed.

Of course, these are generalisations.  R is not uninterested in ease-of-use or aesthetics and SOFA Statistics is intended to be extensible with plugins to allow more sophisticated analysis.  But there is a difference in emphasis and there is room for both approaches as open source software increases its presence in the statistical analysis area.

0.8.0 includes t-tests and help choosing the appropriate statistical test

Wednesday, July 1st, 2009

Version 0.8.0 of SOFA Statistics has now been released.

  • SOFA Statistics now includes both the independent samples t-test and the paired samples t-test.
  • There is the option of assistance when selecting a statistical test.
  • Random quotations on statistics are shown when hovering over the Statistics button.
  • Plus there are minor layout and label changes to increase usability.
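As a quick reminder of how the two t-tests differ, here is a sketch of the t statistics using only the Python standard library. The function names are mine, and the independent version assumes equal variances (pooled variance); I am not claiming this is exactly how SOFA Statistics calculates them.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(sample_a, sample_b):
    """t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (
        n_a + n_b - 2
    )
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def paired_t(before, after):
    """t statistic for paired samples, based on the per-pair differences."""
    diffs = [x - y for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

The paired version works on the differences within each pair, which is why it needs the two samples to line up one to one.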

The statistics selection form is the centrepiece of the new 0.8 series, the goal of which is to implement all the required statistical tests.

Form for selecting appropriate statistical test

Download it on the downloads page – http://www.sofastatistics.com/downloads.php

Candy, Community, Comfort, Credibility etc

Monday, June 22nd, 2009

I have just looked at a range of general/basic open source statistics programs.  Some had extensive lists of tests available.  Some had attractive output.  And some made it easy to edit or import data.  But I couldn’t help feeling your typical business analyst, school student, or medical/social sciences researcher with rusty statistics skills would feel quite daunted by the offerings I experimented with.  Which got me thinking about the different use cases for general purpose statistics/analysis/reporting applications.

So what should the focus be when designing SOFA Statistics and what messages should be communicated and to whom?

Here are some messages that could be conveyed by a statistics/analysis/reporting application:

  • candy – beautiful output, attractive website, splashscreen, dialogs etc
  • comfort – easy interface, lots of help at appropriate level
  • communication – stats are well explained e.g. difference between mean and median
  • correctness – stats you can trust, transparent, verifiable, certified by experts
  • community – help is available for whatever level you are at (school homework, business results, advanced stats, integration with Office suites etc)
  • credibility – backed by a real company, going to be here for the long run, reference group has an impressive membership with good coverage, people have appropriate qualifications etc
  • continuity/compatibility – no need to abandon existing data to start getting benefits of new system.  Has special “Help for users of [popular stats program name here]” etc.
  • code – using the right software, the coolest programming tricks etc
  • cheap – no money and little time required to use
  • customisability – can be made to work and integrate with other systems, can automate processes

And lining these messages up with potential groups they might appeal to:

  • schoolkids – cheap, comfort, communication, community, and candy
  • teachers – cheap (students can use it), comfort, communication (educational), community (start sharing stats teaching resources while they are at it), correctness
  • university students (social sciences etc) – cheap, communication, community, correctness (so they’ll be allowed to use it)
  • university students (statistics – starting off) – same as social science students
  • statisticians (academic, professional) – correctness (paramount), credibility, continuity, customisability (can extend for special needs), cheap (they already have licenses for other products, plus they may want clients to do preliminary analyses using a free product they know themselves)
  • business analysts – continuity/compatibility (must work with Excel, Word, mainstream web browsers etc), candy (produce lots of reports that managers like to look at and show others), comfort, communication (may be very rusty on stats skills), credibility (a must-have for this group), customisability (want to be able to automate processes e.g. reports), community (where people show them how to automate things, tricks to get problems solved etc)
  • social science researchers – credibility (so they can publish based on the data), candy, continuity/compatibility (so they can fall back on an established stats program if there is a problem or if SOFA can’t do something they need), comfort (may be good at social sciences but not computers/programming etc)
  • school administrators – cheap, customisability (for their curriculum)
  • business integrators – customisability, code (developers become more important and they care about code), cheap (so they can make their profit too), continuity/compatibility (with all the systems, input and output, they want to integrate with), credibility (can they make deals with you, will you be around in 5 years’ time?)
  • geeks/developers/coders – code

Of course, having a message is one thing – delivering on it is another.  But it is important to have a clear sense of a project’s priorities and a clear message to take to different groups.

0.7.4 can import from Excel

Saturday, June 20th, 2009

SOFA Statistics has now reached the point where you can probably always get data into it.  The lowest common denominator is the CSV (comma separated values) file.  Beyond that, you can use an Excel spreadsheet (which you could always make in Open Office if you don’t have MS Office), or a MySQL, MS Access, or SQLite database.

Here is the list of main changes:

  • Now able to import from Excel spreadsheets.
  • Importing can now be cancelled.
  • There is a progress bar while importing.
  • CSV importing gives the option to fix and continue if the import has problems.
  • Bug fix – can now cope with CSV files with more columns.
  • Bug fix – now able to create projects even if default project selected.
  • Bug fix – can now select database files without file extensions e.g. SQLite databases.

The 0.8 series should be starting soon, with an emphasis on statistical tests like the t-test, Chi Square etc.  Once those are in place, I will start promoting SOFA Statistics more heavily.