SOFA Statistics had its 100,000th download today, which is a doubling in just over a year. And more features and user experience refinements are in the pipeline. So please spread the word. There is no advertising budget so we need you to blog, tweet, like, +1 etc. Thanks!
More steady improvements to SOFA:
- Added ability to chart percentages as well as frequencies.
- Chart series now all have consistent y-axes (trellis style).
And there have been some bug fixes:
- Fixed ODS import bugs when encountering fields formatted as fractions, boolean, percentage etc.
- Fixed bug that stopped linked images external to the HTML (e.g. those generated by matplotlib) from displaying in the internal GUI when the report path differed from the default report path.
- Fixed bug when running scatterplots and histograms with chart by (because of use of dd object which cannot be used headless).
- Fixed bug in charts dialog where second variable drop down would sometimes be overly restrictive when doing a chart of average values.
- Fixed bug where old pycs can interfere with Windows upgrades.
Charts now have the option of rotated (vertical) x-axis labels. This can be useful for longer labels.
Note – if you have upgraded SOFA, rotated labels may not work unless you update the sofastats_charts.js file in your local sofastats folder (e.g. C:\Users\username\sofastats\reports\sofastats_report_extras) with the sofastats_charts file for sofastats_report_extras.
Scatterplots now focus on the data better by starting axes just below the minimum x and y values of the data unless the value is close enough to 0 to make it worth using 0 anyway.
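The axis rule described above could be sketched roughly as follows. This is an illustration only, not SOFA's actual code: the function name `axis_min` and the `zero_threshold` and `padding` values are assumptions chosen to make the idea concrete.

```python
def axis_min(data_min, data_max, zero_threshold=0.25, padding=0.05):
    """Return a lower axis bound for data spanning data_min..data_max.

    zero_threshold and padding are illustrative guesses, not SOFA's values.
    """
    span = data_max - data_min
    if span == 0:
        # All values identical; fall back to the value's own size (or 1)
        span = abs(data_min) or 1.0
    # A non-negative minimum that sits close to 0 (relative to the spread
    # of the data) might as well start the axis at 0
    if 0 <= data_min <= zero_threshold * span:
        return 0.0
    # Otherwise start just below the smallest value
    return data_min - padding * span
```

With these assumed settings, data running from 1 to 100 would start the axis at 0, while data running from 50 to 60 would start it at 49.5.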
And for Ubuntu users, a much nicer launcher icon. Actually, it’s a set of icons at different resolutions so that SOFA always looks good on the launcher.
Other changes include:
- Numeric values are right justified in data tables.
- Kurtosis values in the normality test include the Fisher adjustment (subtracting 3).
- Duplicated field names in imports are given unique suffixes and allowed (now that they are unique).
- Excel importing now handles times without dates.
- More date formats are accepted when importing data.
- Better guidance on data preparation before importing data.
- More robust handling of variable definition files if corrupted.
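To make the kurtosis item in the list above concrete, here is a minimal sketch of a kurtosis calculation with the adjustment of subtracting 3, so that a normal distribution scores about 0. It assumes the simple moment-based (population) estimator; SOFA may well use a different estimator or bias correction, and the function name is made up for the example.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (so a normal distribution gives ~0)."""
    n = len(values)
    if n < 2:
        raise ValueError("Need at least two values")
    mean = sum(values) / n
    # Second and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("Kurtosis is undefined when all values are equal")
    return m4 / (m2 ** 2) - 3.0
```

For the values 1 to 5 the unadjusted figure is 1.7, so the adjusted value is -1.3.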
Note: if upgrading on Linux, the two user folders (sofastats and sofastats_recovery) may be shifted from inside your home folder to a better location e.g. “/home/username/Documents” if free desktop standards are supported. After upgrading you may wish to manually replace the contents of the new folders with the contents of the old ones.
- Fixed small bug stopping column labels displaying in data table view.
- Fixed bug in recode operation which would wipe the table if any errors at all were encountered trying to turn the user recode config into SQLite update clauses.
- Fixed bug in getting structured data e.g. for line charts, where a user names a field freq and thus has a conflict with my own freq field. Renamed the internal-use field to _sofa_freq to prevent collisions.
- Creating user’s default proj file now copes with apostrophes etc in user path e.g. /Users/Jim’s/etc.
- The project dialog now displays the default report and css details saved with it from previous occasions.
- Project settings are only applied if the project is selected – they are not automatically triggered by changes when configuring a project.
- Multi-line values entered into data cells e.g. variable label settings, automatically have the line breaks converted into spaces. Prevents errors in display of data e.g. in single line text boxes, and problems storing in python scripts (EOL error) etc.
- Fixed bug where the first SQLite database in a project was assumed to be the default sofa database even though it might not be. Now possible to link to multiple default databases e.g. testing copies etc as long as simple naming convention followed.
- SOFA now rolls back to the last good database connection if a connection fails.
- Fixed strange bug where the default database would lock if you made a new table, then looked at its design, then tried to write to the database e.g. importing, editing data. Just refreshed the cursor after updating the demo table design and the problem was gone.
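The line-break conversion mentioned in the list above can be illustrated with a few lines of Python. This is only a sketch of the idea; the function name is hypothetical and SOFA's own handling may differ in detail.

```python
import re

def flatten_line_breaks(text):
    """Replace each line break (\\r\\n, \\r or \\n) with a single space."""
    # \r\n is listed first so a Windows line ending becomes one space, not two
    return re.sub(r"\r\n|\r|\n", " ", text)
```

A label typed as "Age" and "group" on two lines would be stored as "Age group", which is safe to show in a single-line text box and to write into a quoted string in a Python script.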
The latest version adds a range of improvements:
- Added lower and upper quartiles to Row Stats report tables.
- Box plots now start y-axis from just below the minimum y value of the data unless the content is close enough to the bottom of the graph to make it worth using 0 anyway.
- Showing the percent sign in percent columns for report tables is now optional – which is good news for many dissertation students.
- SOFA now displays value labels sorted by the numerical version of numbers even if stored as text. So no more 1, 11, 2, 3 etc in cases where people have stored the number as a Text data type.
- Added some more valid US date formats using dot dividers.
- New help button for importing data.
- New help button to advise on how to make use of flexible data filters.
- English translations are handled better (no more messages about not having US English and using UK English instead etc).
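The numeric sorting of text values mentioned in the list above could be sketched like this. It assumes the values arrive as strings and that anything which cannot be read as a number should sort after the numbers, alphabetically; the key function name is made up for the example and is not taken from SOFA.

```python
def numeric_aware_key(label):
    """Sort key: numbers (even if stored as text) first, in numeric order."""
    try:
        return (0, float(label), "")
    except ValueError:
        # Not a number, so sort after all the numbers, alphabetically
        return (1, 0.0, label)

values = ["1", "11", "2", "3"]
print(sorted(values))                         # plain text sort: 1, 11, 2, 3
print(sorted(values, key=numeric_aware_key))  # numeric sort: 1, 2, 3, 11
```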
Plus there are some useful bug fixes:
- Fixed bug where getting observed values e.g. for chi square test, fell over when one field in pair had missing values while the other didn’t.
- Fixed bug in calculation of upper and lower whiskers in box plots.
- Single bar charts don’t show a bar title anymore – only needed if multichart.
- Fixed bug which only changed variable definitions when the extra settings dialog was closed with OK and didn’t ever set it otherwise e.g. when changing the selected project.
- Now copes with newer versions of matplotlib on Linux.
- No longer stores empty strings as variable labels if user doesn’t enter a label.
A helpful user drew my attention to the desirability of adding a small but important feature for dissertation students – namely, the ability to leave the percentage symbol off the numbers in the percentage columns of frequency and cross tabulation report tables. This new feature will be in the forthcoming version of SOFA (1.1.4):
For dissertation writing in the States, Turabian 7th edition and the Chicago Manual of Style 6th edition are standard for many graduate schools on both the masters and the doctoral level. In both cases, tables with percentage figures in them do NOT have percent signs in front of the numbers themselves, because a typical title like “Table 3. % of babies born to men over 40” already tells you what’s inside the table.
Sofa Stats, however, so far as I can see, requires that percentages have the percent sign, which then gets dragged-and-dropped into Word (or Excel, for tidying up first). If the table is a small one, and if there are only one or two, no problem. But many dissertations have tons of them.
There is a way to rid a table of the % signs by using Excel, but it’s awkward and not a part of the regular menu system. I just worked it out myself a few hours ago, after spending half a day on the problem.
What would be *extremely* helpful to graduate students, whom I assume you would like to have as one of your key user groups, would be for you to program in a “switch” that would allow the user to specify percentages with or without percent signs. It’s a small detail, but one that would be much appreciated.
I generally try to avoid adding more features to SOFA in favour of keeping it simple but this seemed a good idea. Thanks again for the feedback Doug.
The SOFA installers for Windows and Mac have shrunk substantially – from 43MB to 25MB for Windows and from a rather hefty 85MB to 36MB for Mac. They’ll be quicker to download, and the new installers also avoid possible conflicts with other Python packages on a system. It’s all self-contained. A final benefit is that the installation process itself has become much simpler, with far fewer steps. For those who are technically minded, it is thanks to pyinstaller and py2app (with some initial help from Gui2exe).
SOFA has been reviewed and included in the software CD for a recent edition of Germany’s c’t magazine (c’t 2011 Issue 26 p.118). C’t (Magazin für Computertechnik) has a sold circulation of about 367,000 so it was wonderful to show up on their radar.
As SOFA Statistics has gained more functionality it has grown in complexity – there are modules for reading Excel spreadsheets, connecting to Google Docs spreadsheets, displaying charts, displaying GUI widgets etc. Trying to make a single executable for Windows users was always going to be a challenge and would probably involve a lot of trial and error. So it proved.
But there was one technique I used to make the seemingly impossible task manageable. I made a single python script I called launch.py which was responsible for importing all the main modules the executable would need to handle (e.g. matplotlib, MySQLdb etc). I identified the imports I would need by looking at each and every main module in SOFA and adding any external library module imports not already included.
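I collected the imports by hand, but the same survey could be roughed out with Python's standard ast module. The sketch below is an illustration under stated assumptions rather than what I actually ran: the function name is hypothetical, it expects source that the running Python version can parse (Python 3 will reject Python 2-only syntax), and it cannot tell external libraries from a project's own modules, so the result still needs checking by eye.

```python
import ast

def external_imports(source, already_listed=()):
    """Return sorted top-level module names imported by source.

    Names in already_listed (e.g. imports already in launch.py) are left out.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import wx.grid" only needs "wx" noted at the top level
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return sorted(found - set(already_listed))
```

Running it over the text of each main module in turn, passing in the names already present in the launch script, gives a list of candidates to add.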
The process of making an executable failed initially, so by variously commenting and uncommenting parts of the launch script I was able to isolate problem modules and fix them. To get PostgreSQL working, for example, I needed to add the following fix:
```python
try:
    # I needed to add the Postgres library directory to the PATH
    # variable in Windows. Apparently when Postgres is installed under Windows as a
    # service, this isn't done automatically (no need to) so that library isn't
    # available. [http://osdir.com/ml/python.db.pygresql/2008-03/msg00021.html]
    # OK to hardwire to version available to my installer dev environment. The user experience
    # will depend on whether they have set the PATH properly.
    os.environ['PATH'] += ";C:\\Program Files\\PostgreSQL\\9.1\\bin"
    import pgdb
except ImportError, e:
    pass
```
Here is the full text of launch.py:
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division # so 5/2 = 2.5 not 2 !
from __future__ import print_function
# remove import __future__ from dbe_sqlite

import cgi
import codecs
from collections import defaultdict
from collections import namedtuple
import copy
import csv
import datetime
import decimal
import gettext
import glob
import locale
import math
from operator import itemgetter
import os
import platform
import pprint
import random
import re
import shutil
import socket
import subprocess
import sys
import time
import traceback
from types import IntType, FloatType, ListType, TupleType, StringType
import warnings
import weakref
import webbrowser
import xml.etree.ElementTree as etree
import zipfile

# Even though not used here pyinstaller won't know about it otherwise
# and will not have it when encountered in import2run.py/start.py etc
import MySQLdb as mysql
try:
    # I needed to add the Postgres library directory to the PATH
    # variable in Windows. Apparently when Postgres is installed under Windows as a
    # service, this isn't done automatically (no need to) so that library isn't
    # available. [http://osdir.com/ml/python.db.pygresql/2008-03/msg00021.html]
    # OK to hardwire to version available to my installer dev environment. The user experience
    # will depend on whether they have set the PATH properly.
    os.environ['PATH'] += ";C:\\Program Files\\PostgreSQL\\9.1\\bin"
    import pgdb
except ImportError, e:
    pass
import sqlite3 as sqlite # using sqlite3.dll from Python 2.7 so includes foreign key support
#import wxversion
#wxversion.select("2.8") # Not needed when using executable.
# http://groups.google.com/group/pyinstaller/browse_thread/thread/1b57e64ddc35e772
if not hasattr(sys, 'frozen'):
    import wxversion
    wxversion.select('2.8')
import wx
import wx.lib.iewin as ie
import wx.gizmos
import wx.grid
import wx.html
try:
    from agw import hyperlink as hl
except ImportError: # if it's not there locally, try the wxPython lib.
    import wx.lib.agw.hyperlink as hl
# problem locating eggs folder - solution in http://www.pyinstaller.org/ticket/185
# change pyinstaller-1.5\support\_pyi_egg_install.py
#if os.path.isdir(d):
#    for fn in os.listdir(d):
#        sys.path.append(os.path.join(d, fn))
import numpy as np
#if hasattr(sys, 'frozen') and sys.frozen:
#    import numpy.core.ma
#    sys.modules['numpy.ma'] = sys.modules['numpy.core.ma']
# if include matplotlib before sys.path, matplotlib.collections shadows collections and won't find namedtuple
# Currently problem with Path in environment MATPLOTLIBDATA not a directory
# Must put mpl-data folder in same folder as the executable is finally run from
import matplotlib
#import matplotlib.numerix as Numerix
#from matplotlib.axes import _process_plot_var_args
#from matplotlib.backend_bases import FigureCanvasBase
#from matplotlib.backends.backend_agg import FigureCanvasAgg, RendererAgg
#from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg
#from matplotlib.figure import Figure
#from matplotlib.font_manager import FontProperties
#from matplotlib.projections.polar import PolarAxes
#from matplotlib.transforms import Bbox
# connected to matplotlib
# don't exclude Tkinter, Tkconstants
import wxmpl
import pylab # must import after wxmpl so matplotlib.use() is always first
# don't import boomslang - trouble with import pylab in many cases, even import math.
# works fine if matplotlib baked into exe
#import boomslang
# no need to bake googleapi in as nothing installed as such. Just ensure not using stale pycs from Ubuntu system.
#import googleapi # problem with import os etc if using below
#import googleapi.gdata.spreadsheet.service as gdata_spreadsheet_service
#import googleapi.gdata.spreadsheet as gdata_spreadsheet
#import googleapi.gdata.docs.service as gdata_docs_service
#import googleapi.gdata.service as gdata_service
# no need to bake xlrd in as nothing installed as such. Just ensure not using stale pycs from Ubuntu system.
#import xlrd
import adodbapi
import pywintypes
import win32api
import win32con
import win32com
import win32com.client
import dao36_from_genpy # go to makepy/genpy and look in py files till found - taken and rename and relocate so can directly call
import import2run
```
The code for SOFA is cross-platform and I start the Windows packaging process by copying everything across from Ubuntu. It is important in such a case to wipe all pyc files so that platform-specific ones are created for Windows and included in the executable creation process.
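Wiping the pyc files can be done by hand, but a small script makes it harder to miss one in a subfolder. The following is a minimal sketch written for a current Python; the function name and the example path in the usage note are illustrative, not part of SOFA.

```python
import os

def remove_pyc_files(root):
    """Delete every .pyc file under root and return the paths removed."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".pyc"):
                path = os.path.join(dirpath, filename)
                os.remove(path)
                removed.append(path)
    return removed
```

It would be called on the folder copied across from Ubuntu, for example `remove_pyc_files("C:\\sofastats_build_exe\\sofa.main")`, before building the executable.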
The final import statement is for import2run.py. This means that the executable doesn’t hardwire anything beyond the imports. As it happens I started by having import2run contain just the following line:
Later, once all the basic imports were working, I changed it to:
to actually load SOFA. NB the executable created using the technique described here doesn’t replace all the SOFA modules with a single executable – its purpose is to replace Python and all the extra libraries such as matplotlib. So the exe is expected to live in the main SOFA program folder (usually in C:\Program Files\sofastats) alongside the usual modules such as core_stats.py. If a user actually had Python 2.6 and all the libraries installed they could either use the exe or run start.py directly themselves. It would have the same effect.
Getting matplotlib to work took a while and involved many false leads. In the end the solution was to copy the entire mpl-data folder (from somewhere like C:\Python26\Lib\site-packages\matplotlib) into the same folder as the sofastats.exe was going to end up.
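The copy itself is easy to script. Here is a minimal sketch, assuming you know where matplotlib is installed and where the executable will live; the function name is made up, and the paths in the usage note are only examples based on the locations mentioned in this post.

```python
import os
import shutil

def copy_mpl_data(matplotlib_dir, exe_dir):
    """Copy matplotlib's mpl-data folder next to the executable."""
    src = os.path.join(matplotlib_dir, "mpl-data")
    dst = os.path.join(exe_dir, "mpl-data")
    if os.path.isdir(dst):
        # Replace any stale copy left over from an earlier build
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst
```

For example: `copy_mpl_data("C:\\Python26\\Lib\\site-packages\\matplotlib", "C:\\sofastats_build_exe\\sofa.main")`.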
Some final things I learned about Pyinstaller. --onedir is the default and adds the coll = COLLECT(…) part of the spec file. If making manual changes, remember that if you want the onedir approach, don’t include a.binaries in the EXE(…) part, and exclude_binaries should be True. If, like me, you want a single executable file, don’t bother with coll = COLLECT(…), include a.binaries, and set exclude_binaries to False. And while testing, set debug=True and console=True so you can see what is going wrong as you refine your spec file, launch.py script etc.
Although GUI2EXE is a wonderful program, some aspects may not be compatible with Pyinstaller 1.5.1, so I now build my spec file using makespec.py with the --onefile argument. It works in its basic vanilla form for SOFA using launch.py. You can export the spec file GUI2EXE makes and see the differences.
Here is the final spec file I used:
```python
# -*- mode: python -*-
# used MAKESPEC 1.5.1 with --onefile option
# NB must include mpl-data folder under main sofastats level (i.e. sibling of dbe_plugins etc) for matplotlib to work
# manually set level=9 in PYZ params (inspired by how GUI2EXE did it)
# manually replaced name=os.path.join('dist', 'launch.exe'), with name='C:\\sofastats_build_exe\\sofa.main\\sofastats.exe',
# manually set debug=True, upx=False in EXE params
# manually set exclude_binaries=False in EXE params
a = Analysis([os.path.join(HOMEPATH,'support\\_mountzlib.py'),
              os.path.join(HOMEPATH,'support\\useUnicode.py'),
              'C:\\sofastats_build_exe\\sofa.main\\launch.py'],
             pathex=['C:\\Python26\\pyinstaller-1.5.1'])
pyz = PYZ(a.pure, level=9)
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    exclude_binaries=False,
    name='C:\\sofastats_build_exe\\sofa.main\\sofastats.exe',
    debug=True,
    strip=False,
    upx=False,
    console=True )
```
Before going live, switch debug and console to False.
This post is largely specific to SOFA Statistics but hopefully it includes some tips which might save others a lot of fruitless struggle. If you have trouble, I found the pyinstaller mailing list people helpful.
Version 1.1.2 fixes a bug which affected people trying to install SOFA into many non-English environments. SOFA also has some changes which make it safe for SOFA to communicate progress in more detail while being run in Windows using the non-console version of Python. Overall, SOFA has become much more robust in recent versions.