Archive for the ‘general’ Category

Where to for SOFA?

Saturday, August 3rd, 2013

The SOFA Statistics project could go in a number of different directions. Ideally, it would:

  • Add more chart types and more flexibility for graphical customisation
    (without compromising the SOFA goals of beautiful output and ease-of-use)
  • Add a comprehensive array of the most important statistical tests
    (without compromising the ease-of-use and learn-as-you-go goals)
  • Make it much easier to automate reporting
  • Make publishing reports to the web, and office formats, seamless and simple.

I only have limited time to develop SOFA at the moment, so I have to choose the top priorities. Here is what I think I should do:

  • Charting
    • Make it easy to export data so it’s ready for charting using spreadsheet charting tools
    • Provide brief documentation on how to use advanced tools like Matplotlib
  • Statistical Tests
    • Make it really easy to export data from SOFA ready for analysis in R
  • Report automation
    • Provide documentation so people can automate SOFA themselves using Python
  • Publishing
    • Add a plug-in for exporting to a document format

I should also solve the remaining bugs preventing Mac users from being able to export output as images. What do people think about this direction? Drop me a line at grant@sofastatistics.com.

Lots of bug fixes (esp MS Access & SQL Server)

Tuesday, July 23rd, 2013

SOFA Statistics 1.3.4 is not an adventurous release but it squashes plenty of bugs – especially for MS Access and MS SQL Server users. Here are the features:

  • Can make more complex charts and larger series of charts. It is now possible to override the conservative limits on charts e.g. the maximum number of series or charts or clusters. A warning is shown that you may not necessarily produce a viable chart or set of charts. But often it will work so now you get to try and see.

High number of charts

Lots of images

  • Importing now copes with excessively long field names by shortening them automatically (without risking duplicates).
  • MS SQL Server views can now be analysed, not just tables.
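One way to shorten long field names without creating duplicates is to truncate each name and add a numeric suffix whenever the truncated form is already taken. The sketch below is only an illustration: the function name `shorten_field_names` and the `max_len` value are hypothetical, and SOFA's own approach may differ.

```python
def shorten_field_names(names, max_len=20):
    """Truncate names to max_len characters, keeping every result unique.

    Hypothetical helper for illustration; not taken from SOFA's code.
    Assumes max_len is comfortably larger than any suffix such as "_12".
    """
    used = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        n = 1
        # If the shortened name is already taken, make room for a suffix.
        while candidate in used:
            suffix = f"_{n}"
            candidate = name[:max_len - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


long_names = ["a" * 30, "a" * 30 + "b", "short"]
shortened = shorten_field_names(long_names, max_len=10)
```

Names that already fit are left alone, and the order of the fields is preserved.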

And here are the bug fixes:

  • Fixed bug with calculation of mean with MS SQL Server data (now explicitly cast as float to avoid integer result).
  • Fixed bug in ANOVA output for the precise (as opposed to speed) option – it used to try mixing Decimals and other numeric types unsuccessfully.
  • Fixed bugs with chart data gathering queries so they work properly in MS SQL Server. The queries are also cleaner for the other databases in any case.
  • Fixed bug in underlying code if unique=False ever applied with scatterplots (currently never but it was still, technically, a bug).
  • Fixed bug in scatterplot SQL which affected MS Access and MS SQL Server (can’t use aliases in group by etc).
  • Adjusted y-title position in dojo scatterplots to avoid it being cropped.
  • Fixed bug when a PostgreSQL date was displayed as a category (couldn’t calculate the length of a datetime.datetime object).
  • Fixed layout bug when report table resized after Add to Report checkbox hidden. Now freshens layout when checkbox reappears.
  • Fixed bug preventing charts from being produced when linked to MS Access.
  • Fixed bug adding large delay to display of output when linked to MS Access.
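Two of the fixes above involve numeric types, and both are easy to reproduce in plain Python. The sketch below is not SOFA's code. It uses SQLite as a stand-in for MS SQL Server, so the integer result comes from integer division of SUM by COUNT rather than from the server's own averaging, and `add_mixed` is a hypothetical helper.

```python
import sqlite3
from decimal import Decimal

# 1) Integer arithmetic in SQL silently truncates a mean.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,)])
# Integer divided by integer gives an integer result.
int_mean = con.execute("SELECT SUM(x) / COUNT(x) FROM t").fetchone()[0]
# An explicit cast to a floating-point type keeps the fractional part.
float_mean = con.execute(
    "SELECT CAST(SUM(x) AS REAL) / COUNT(x) FROM t"
).fetchone()[0]
con.close()


# 2) Decimal and float cannot be combined directly in arithmetic.
def add_mixed(a, b):
    """Add a Decimal and another number, converting b when needed."""
    try:
        return a + b
    except TypeError:
        # Decimal + float raises TypeError; go via str to avoid binary noise.
        return a + Decimal(str(b))


total = add_mixed(Decimal("1.5"), 0.25)
```

Here `int_mean` comes back as a whole number while `float_mean` keeps the fraction, which is why an explicit cast matters when the source column is an integer type.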

There are some important changes coming for SOFA but it was important to tidy up a bit first. Watch this space!

Confidence Intervals for ANOVA & t-tests in 1.3.3

Friday, April 5th, 2013

95% confidence intervals have now been added to ANOVA and t-tests. The associated output also has right-justified numbers to make it easier to read.

Confidence Intervals
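For readers who want to see what a 95% confidence interval for a group mean involves, here is a minimal sketch using only the Python standard library. It uses a normal approximation (a z value of about 1.96), which is an assumption made to keep the example dependency-free. A t-based interval is wider for small samples, and this is not necessarily how SOFA calculates its intervals. The function name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Return (lower, upper) bounds for the mean of values.

    Normal approximation only; for small samples a t-based
    interval would be wider.
    """
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


lower, upper = mean_ci([10, 12, 14, 16, 18])
```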

Version 1.3.3 also lets you sort by category labels in clustered bar charts, line charts, area charts and box plots. Area charts can also be sorted ascending or descending by count/mean/sum.

The series and category are now displayed in tooltips e.g. Italy, 20-29 for clustered bar charts, multi-series line charts, and box plots. This is especially helpful when there are lots of categories and/or series.

Boxplot Improvements

  • Improved statistics output footnotes.
  • Borders on bar-type charts are now optional. This can be useful when bars are very short.
  • Chi square clustered bar charts can cope with higher default limits for number of values.
  • Importing field names with more than 90 characters is now prohibited at the point of import rather than causing problems later.
  • The group by max number of values is now controlled by a single my_globals setting (making it easier to override).
  • The default settings for some remaining max values have been increased.

There was one minor bug-fix this version – line charts now cope better with lots of categories (increased padding around max label width in overall width calculations).

And a problem with the deb installer was also fixed.

1.3.2 Makes Backing Up SOFA Easy

Sunday, January 27th, 2013

It is now easy to back up SOFA including data, reports, and any variable and project details. The backup button is on the main screen and can be made operational by installing the backup SOFA plug-in (available from www.sofastatistics.com/get_extensions.php).

Backup button added

1.3.1 brings more improvements

Wednesday, January 9th, 2013

The latest version brings lots of small improvements and one useful new feature – the ability to use sum as an option for charting e.g. a line chart showing total sales by country:

New sum option

Here is the full list of improvements:

  • Adding sum as an option for charting e.g. a line chart showing total income per month by product. And the interface has been simplified at the same time.
  • Matplotlib scatterplots now have optimal min and max settings calculated for their x-axis.
  • Added footnote to Wilcoxon output explaining that different statistics packages may report the test statistic differently.
  • Misc fixes to chart layout including left margin offset.
  • Easier to add new variable definition files from within dialog for choosing them.
  • Modified recode column labels and help content to reduce confusion about which columns to enter range information into.

And bug fixes:

  • Fixed code picking optimal min and max axis values for scatterplots and box plots to cope when value range is much smaller than gap to 0.
  • CSV import now copes with new lines inside fields when gathering data for sample display.
  • Extra settings for Line Charts now display when they should even if only changing data type.
  • Fixed bug which allowed line breaks in field names.
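The CSV fix above concerns fields that contain new lines. As a small illustration (not SOFA's import code), Python's `csv` module keeps a quoted multi-line field together as long as the source is opened with `newline=""`, whereas naively splitting the text on line breaks cuts the record in two. The sample data is made up.

```python
import csv
import io

# A quoted field containing a line break (made-up sample data).
raw = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# newline="" stops Python translating line endings before csv sees them.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Naive splitting treats the embedded new line as a record boundary.
naive_lines = raw.splitlines()
```

`rows` holds three records (header plus two data rows), while `naive_lines` has four entries because the comment is split.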

Getting export images/pdfs working for Mac users

Saturday, December 1st, 2012

SOFA has a plug-in for exporting reports and individual output as images (PNG) and/or PDFs. Unfortunately, I haven’t been able to make a version which works for OS X. The plug-in works on Windows and Linux but there are crucial libraries I haven’t yet been able to get working on Mac. Fortunately, there are some signs of progress. Sid Steward of PDF Labs is working on a new version of pdftk (one of the libraries I need working) and will be building a new installer for Mac. And wkhtmltopdf and pyPDF are already working. So getting the export output plug-in working for Mac might be possible after all.

You might be able to help. If you are a Mac user, and you are able to get either of the following libraries working on your machine, please drop me a line (grant@sofastatistics.com) letting me know how you did it.

  • Ghostscript (used to convert PDF → PNG)
  • ImageMagick (used to trim PNG to correct dpi) or, even better still, PythonMagick

1.3.0 brings numerous improvements

Sunday, November 4th, 2012

SOFA 1.3.0 has plenty of small but important additions:

  • Added Mode as an option to Row Stats report tables. Reports mode(s) with N of the mode value(s) e.g. mode weight 72.0, 76.0 (N=23)

    Modes available

  • Line and area charts can show major labels only as an option.
    All labels (the default)

    Major labels only

  • Pie charts now have the option of displaying count and percentage on the chart itself (not just in the tooltips as before).
    Pie Chart Details option available

  • Histograms use consistent bins when charted by a second variable.
  • Better placement of y-axis title when labels are wide.
  • Pie charts keep consistent colours even if sorted by count rather than value or label.

    Consistent category colouring in Pie Charts

  • Better message when adding new reports if the required subfolder with javascript and background images is missing. The message is now only shown if there is a problem.
  • When trying to export report, SOFA checks for expected subfolder as well (otherwise dojo fails for any charts and export fails).
  • SOFA prevents attempt to export report if no report file (yet).
  • No longer displays View or Export Report buttons on Projects dialog.
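The Mode option described at the top of this list reports every value tied for the highest frequency together with that frequency. A minimal sketch of that calculation, assuming missing values are simply skipped, might look like the following. The function name `modes_with_n` is hypothetical and this is not SOFA's implementation.

```python
from collections import Counter


def modes_with_n(values):
    """Return (sorted list of modal values, N of each mode).

    None values are ignored. Returns ([], 0) if nothing is left.
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return [], 0
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top), top


weights = [72.0, 76.0, 72.0, 76.0, 80.0, None]
modes, n = modes_with_n(weights)
label = f"mode weight {', '.join(str(m) for m in modes)} (N={n})"
```

With the made-up weights above, two values tie for the mode, so both are reported with their shared N.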

And there have also been some important bug fixes making it worth upgrading:

  • Fixed bug in row stats where data should have explicitly filtered out None values.
  • Fixed bug in setting of min and max values for y-axis for boxplots when min is below 0.
  • Refactored code for running report in output module. Easier to understand and also made it easy to save copy of internal html output with absolute paths to images – very helpful when exporting images.
  • Built more robust value quoting e.g. for sql statements.

I hope you like it.

FLOSS for Science Interview

Friday, October 12th, 2012

I was lucky enough to get interviewed by FLOSS for Science. Check it out 🙂

FLOSS for Science Interview

Version 1.2.2 has XLSX importing and reportable normality analyses

Thursday, September 20th, 2012

  • The latest SOFA, version 1.2.2, lets you import from Excel XLSX files (previously, Excel files had to be in the XLS format).

    XLSX Format Supported

  • Normality analyses can be included in reports, saved as output etc.

    Normal Curves

  • And there is support for CUBRID databases. CUBRID is an open source relational database highly optimized for Web Applications. That brings to 6 the total number of SQL-type databases that SOFA can directly link to.
    CUBRID Logo

A few bugs were also fixed:

  • Restored standard deviation option to row stats report tables.
  • Fixed bug which meant row % was appended multiple times to config col dialog (until session closed).
  • Restored PostgreSQL functionality by fixing faulty psycopg import statement.

1.2.1 enables export to spreadsheet and more

Tuesday, August 28th, 2012

The latest version of SOFA Statistics makes it easy (via a plug-in) to export to spreadsheet.

Export Data

SOFA also makes it easier to select multiple variables when making report tables.

Select multiple variables

Additionally, there have been some important bug fixes – mainly for bugs which snuck in during the major change to 1.2.0.

  • Fixed nasty bug breaking demo report tables. It was a casualty of the 1.2.0 change which displays titles independently of report tables (so that wide titles no longer mean wide table cells).
  • Fixed bug with display of percent symbols in report tables. A missing ‘not’ was the culprit – another casualty of the big refactoring for 1.2.0.
  • Fixed bug with Data List reports – wouldn’t update display after changing sort order of a variable.
  • Sort by value now works properly in Data List reports.
  • Can now handle excessively long values being used as categories in report tables or charts etc. Checks are also now made for excessively long category variable values.
  • If there are encoding problems, SOFA now tries to use the field encoding e.g. iso-8859-1.
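The last fix describes falling back to another encoding when decoding fails. A minimal sketch of that general pattern is shown below. The function name `decode_with_fallback` and the order of encodings tried are assumptions for illustration, not a description of SOFA's internals.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode bytes by trying each encoding in turn.

    Hypothetical helper. Note that iso-8859-1 maps every byte to a
    character, so it never fails and works as a last resort.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("None of the supplied encodings could decode the data")


latin1_bytes = "café".encode("iso-8859-1")  # not valid UTF-8
utf8_bytes = "café".encode("utf-8")
from_latin1 = decode_with_fallback(latin1_bytes)
from_utf8 = decode_with_fallback(utf8_bytes)
```

Trying the stricter encoding first matters: because iso-8859-1 accepts any byte sequence, putting it first would hide genuine UTF-8 text behind mis-decoded characters.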