August 3rd, 2013
The SOFA Statistics project could go in a number of different directions. Ideally, it would:
- Add more chart types and more flexibility for graphical customisation
(without compromising the SOFA goals of beautiful output and ease-of-use)
- Add a comprehensive array of the most important statistical tests
(without compromising the ease-of-use and learn-as-you-go goals)
- Make it much easier to automate reporting
- Make publishing reports to the web, and office formats, seamless and simple.
I only have limited time to develop SOFA at the moment, so I have to choose the top priorities. Here is what I think I should do:
- Charts
  - Make it easy to export data so it’s ready for charting using spreadsheet charting tools
  - Provide brief documentation on how to use advanced tools like Matplotlib
- Statistical Tests
  - Make it really easy to export data from SOFA ready for analysis in R
- Report automation
  - Provide documentation so people can automate SOFA themselves using Python
  - Add a plug-in for exporting to a document format
I should also solve the remaining bugs preventing Mac users from being able to export output as images. What do people think about this direction? Drop me a line at firstname.lastname@example.org.
July 23rd, 2013
SOFA Statistics 1.3.4 is not an adventurous release but it squashes plenty of bugs – especially for MS Access and MS SQL Server users. Here are the features:
- Can make more complex charts and larger series of charts. It is now possible to override the conservative limits on charts e.g. the maximum number of series or charts or clusters. A warning is shown that you may not necessarily produce a viable chart or set of charts. But often it will work so now you get to try and see.
- Importing now copes with excessively long field names by shortening them automatically (without risking duplicates).
- MS SQL Server views can now be analysed, not just tables.
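To give a feel for what shortening field names without risking duplicates can involve, here is a minimal sketch. It is not SOFA's actual import code: the function name `shorten_field_names`, the `max_len` default, and the numeric-suffix scheme are all assumptions made for illustration.

```python
def shorten_field_names(names, max_len=20):
    """Truncate each name to max_len characters, keeping every result unique.

    Hypothetical helper for illustration only. Assumes max_len is
    comfortably larger than the numeric suffix that may be appended.
    """
    used = set()
    result = []
    for name in names:
        candidate = name[:max_len]
        counter = 1
        # If the candidate is already taken, swap its tail for a numeric
        # suffix and keep trying until the name is unique.
        while candidate in used:
            suffix = f"_{counter}"
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


long_names = [
    "respondent_household_income_before_tax",
    "respondent_household_income_after_tax",
    "age",
]
short_names = shorten_field_names(long_names, max_len=20)
```

The two income fields share the same first 20 characters, so plain truncation would collide. The suffix keeps them distinct while staying inside the length limit.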
And here are the bug fixes:
- Fixed bug with calculation of mean with MS SQL Server data (now explicitly cast as float to avoid integer result).
- Fixed bug in ANOVA output for the precise (as opposed to speed) setting – it used to try mixing Decimals and other numeric types unsuccessfully.
- Fixed bugs with chart data gathering queries so they work properly in MS SQL Server. The queries are also cleaner for the other databases in any case.
- Fixed bug in underlying code that would occur if unique=False were ever applied with scatterplots (currently never but it was still, technically, a bug).
- Fixed bug in scatterplot SQL which affected MS Access and MS SQL Server (can’t use aliases in group by etc).
- Adjusted y-title position in dojo scatterplots to avoid it being cropped.
- Fixed bug when PostgreSQL date data was being displayed as a category (couldn’t calculate the length of a datetime.datetime object).
- Fixed layout bug when report table resized after Add to Report checkbox hidden. Now freshens layout when checkbox reappears.
- Fixed bug preventing charts from being produced when linked to MS Access.
- Fixed bug adding large delay to display of output when linked to MS Access.
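The Decimal problem mentioned in the ANOVA fix is easy to reproduce in plain Python: adding a float to a `decimal.Decimal` raises a `TypeError`. The sketch below shows the failure and one common way around it, converting everything to Decimal first. The helper names `to_decimal` and `precise_mean` are hypothetical and this is not the code SOFA uses.

```python
from decimal import Decimal


def to_decimal(value):
    """Convert an int, float, or Decimal to Decimal.

    Going via str() avoids carrying over binary float artefacts
    (Decimal(0.1) is not the same as Decimal("0.1")).
    """
    if isinstance(value, Decimal):
        return value
    return Decimal(str(value))


def precise_mean(values):
    """Mean computed entirely in Decimal arithmetic (illustrative only)."""
    decimals = [to_decimal(v) for v in values]
    return sum(decimals, Decimal(0)) / len(decimals)


# Mixing the types directly fails:
try:
    Decimal("1.5") + 0.5
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Converting first works, whatever mix of numeric types comes in:
mean_value = precise_mean([Decimal("1.5"), 2, 0.1])
```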
There are some important changes coming for SOFA but it was important to tidy up a bit first. Watch this space!
May 5th, 2013
SOFA Statistics has a Mac version so I need to be able to test and package SOFA on a Mac. I do this on my Ubuntu Linux host machine using VirtualBox which works pretty well. But until a few minutes ago, the VirtualBox instance of Mac I had running was squeezed into a somewhat restrictive screen resolution. No longer! Here are the two basic steps I followed to resolve this problem:
1) Add a new screen resolution to com.apple.Boot.plist as per “How to Increase Mac OS X Snow Leopard Virtual Machine Screen Resolution on VirtualBox and VMware” using Method #1 (but not from the /Extra folder – from the next bit).
2) Make the screen resolution available from the VirtualBox end as per “Notes on setting 1680×1050 resolution on a Snow Leopard inside a VirtualBox”.
And here is the result – a much more pleasant experience of testing SOFA on the Mac platform.
Next goal is to get some tricky graphics libraries I need working on the Mac.
April 5th, 2013
95% confidence intervals have now been added to ANOVA and t-tests. The associated output now has right-justified numbers to make it easier to read.
Version 1.3.3 also lets you sort by category labels in clustered bar charts, line charts, area charts and box plots. Area charts can also be sorted ascending or descending by count/mean/sum.
The series and category are now displayed in tooltips (e.g. “Italy, 20-29”) for clustered bar charts, multi-series line charts, and box plots. This is especially helpful when there are lots of categories and/or series.
- Improved statistics output footnotes.
- Borders on bar-type charts are now optional. This can be useful when bars are very short.
- Chi square clustered bar charts can cope with higher default limits for number of values.
- Importing field names with more than 90 characters is now prohibited at the point of import rather than causing problems later.
- The group by max number of values is now controlled by a single my_globals setting (making it easier to override).
- The default settings for some remaining max values have been increased.
There was one minor bug-fix this version – line charts now cope better with lots of categories (increased padding around max label width in overall width calculations).
And a problem with the deb installer was also fixed.
January 27th, 2013
It is now easy to back up SOFA including data, reports, and any variable and project details. The backup button is on the main screen and can be made operational by installing the backup SOFA plug-in (available from www.sofastatistics.com/get_extensions.php).
January 9th, 2013
The latest version brings lots of small improvements and one useful new feature – the ability to use sum as an option for charting e.g. a line chart showing total sales by country.
Here is the full list of improvements:
- Adding sum as an option for charting e.g. a line chart showing total income per month by product. And the interface has been simplified at the same time.
- Matplotlib scatterplots now have optimal min and max settings calculated for their x-axis.
- Added footnote to Wilcoxon output explaining that different statistics packages may report the test statistic differently.
- Misc fixes to chart layout including left margin offset.
- Easier to add new variable definition files from within dialog for choosing them.
- Modified recode column labels and help content to reduce confusion about which columns to enter range information into.
And bug fixes:
- Fixed code picking optimal min and max axis values for scatterplots and box plots to cope when value range is much smaller than gap to 0.
- CSV import now copes with new lines inside fields when gathering data for sample display.
- Extra settings for Line Charts now display when they should even if only changing data type.
- Fixed bug which allowed line breaks in field names.
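The axis fix above concerns data whose spread is tiny compared with its distance from zero, for example values between 1000 and 1010. Anchoring the axis at 0 would squash all the points into a thin band. A rough sketch of one way to handle this follows. The function `axis_bounds`, the `pad_fraction` parameter, and the "anchor at zero only if zero is within one data-span" rule are assumptions for illustration, not the logic SOFA actually uses.

```python
def axis_bounds(values, pad_fraction=0.1):
    """Pick axis min and max for a set of values (illustrative heuristic).

    The axis is padded around the data. It is only anchored at zero when
    zero is close to the data relative to the spread of the data.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: fall back to their magnitude (or 1).
        span = abs(lo) or 1.0
    pad = span * pad_fraction
    axis_min, axis_max = lo - pad, hi + pad
    if lo >= 0 and lo <= span:
        # Positive data sitting near zero: start the axis at zero.
        axis_min = 0.0
    elif hi <= 0 and -hi <= span:
        # Negative data sitting near zero: end the axis at zero.
        axis_max = 0.0
    return axis_min, axis_max


far_from_zero = axis_bounds([1000, 1004, 1010])
near_zero = axis_bounds([2, 5, 10])
```

With the first data set the axis hugs the data instead of stretching back to 0. With the second, zero is close enough that it is kept as the lower bound.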
December 1st, 2012
SOFA has a plug-in for exporting reports and individual output as images (PNG) and/or PDFs. Unfortunately, I haven’t been able to make a version which works for OS X. The plug-in works on Windows and Linux but there are crucial libraries I haven’t yet been able to get working on Mac. Fortunately there are some signs of progress. Sid Stewart of PDF Labs is working on a new version of pdftk (one of the libraries I need working) and will be building a new installer for Mac. And wkhtmltopdf and pyPDF are already working. So getting the export output plug-in working for Mac might be possible after all.
You might be able to help. If you are a Mac user, and you are able to get either of the following libraries working on your machine, please drop me a line (email@example.com) letting me know how you did it.
- Ghostscript (used to convert PDF → PNG)
- ImageMagick (used to trim PNG to correct dpi) or, better still, PythonMagick
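For anyone testing these two libraries on a Mac, the sketch below shows the kind of command-line calls involved, wrapped in Python. It is only an assumption about how such a conversion might be scripted, not the plug-in's own code. The file names are placeholders, and the helper names are hypothetical. The Ghostscript and ImageMagick flags themselves are standard ones.

```python
import shutil
import subprocess


def ghostscript_pdf_to_png_cmd(pdf_path, png_path, dpi=150):
    """Build a Ghostscript command that renders a PDF page to a PNG."""
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # non-interactive, restricted file access
        "-sDEVICE=png16m",                  # 24-bit colour PNG output device
        f"-r{dpi}",                         # rendering resolution in dpi
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def imagemagick_trim_cmd(png_in, png_out):
    """Build an ImageMagick command that trims the border around an image.

    ImageMagick 7 installs may expose this as `magick` instead of `convert`.
    """
    return ["convert", png_in, "-trim", "+repage", png_out]


def run_if_available(cmd):
    """Run cmd only if its executable is on the PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


gs_cmd = ghostscript_pdf_to_png_cmd("report.pdf", "report.png", dpi=300)
trim_cmd = imagemagick_trim_cmd("report.png", "report_trimmed.png")
```

Calling `run_if_available(gs_cmd)` is a quick way to check whether Ghostscript is installed and callable before reporting back.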
November 20th, 2012
More good news for the SOFA Statistics project – CUBRID recently donated $300 to SOFA Statistics. CUBRID is “a comprehensive open source relational database management system highly optimized for Web Applications” and SOFA recently added a plug-in to connect to CUBRID databases.
It was also great to see the support that CUBRID gave to a range of other projects. One of the interesting things was the range of countries represented: New Zealand (SOFA Statistics), Switzerland, South Korea, Kenya, Russia, Australia, Denmark, Romania, China, Spain, Germany, Indonesia, the Netherlands, and the USA. Truly a global effort.
So on behalf of the SOFA project, thanks 🙂
November 9th, 2012
Great news! SOFA Statistics won the 2012 People’s Choice Award in the NZ Open Source Awards. Thanks to everyone who voted in support.
In addition to the trophy and framed certificate, I was lucky to get a nice new Android tablet from ZaReason (http://zareason.com/shop/zatab.html). Busy playing with that at the moment.
And SOFA was also a finalist for the Best Open Source Project award.
So it was a great awards ceremony for the project.
Video of Open Source People’s Choice Award (Presented by ZaReason)
Video of finalists for Open Source Software Project (SOFA one of 3 finalists)
November 4th, 2012
SOFA 1.3.0 has plenty of small but important additions:
And there have also been some important bug fixes making it worth upgrading:
- Fixed bug in row stats where data should have explicitly filtered out None values.
- Fixed bug in setting of min and max values for y-axis for boxplots when min is below 0.
- Refactored code for running report in output module. The code is easier to understand, and it is now easy to save a copy of the internal HTML output with absolute paths to images, which is very helpful when exporting images.
- Built more robust value quoting e.g. for SQL statements.
I hope you like it.