Over the last year, work on SOFA has been focused on a difficult packaging issue – enabling a Mac version to be built which allows Mac users to export their charts and reports as PNGs and PDFs. That functionality is now working on Snow Leopard and hopefully newer versions as well. But it would be nice to check with some people running Mac. If you’d like to try out the latest version of SOFA Statistics, please drop me a line via http://www.sofastatistics.com/contact.php.
SOFA works on Windows, Mac, and Linux. But Linux is especially important for the project because SOFA is developed on Ubuntu. So it made sense to support the Linux ecosystem by signing up with the Open Invention Network. In an ideal world, it wouldn’t be necessary to have anything to do with software patents. For various reasons, they’re a bad idea and function more to inhibit innovation than encourage investment in software research and development. But the Open Invention Network plays a protective function in a world where people who create and actually make things can be preyed upon by parasites who have been granted monopolies on ideas – the so-called patent trolls.
The group was created to defend Linux from patent trolls and other attacks from patent holders. It tries to do this with its own patents which are then available royalty-free to any company, institution or individual that agrees not to assert its patents against Linux. While it hasn’t been done, these patents could also, in theory, be used by the OIN, or an OIN member, against a hostile company in a patent war.
Anyway, a range of companies and projects large and small (over 800 at present and growing) have signed up for the initiative including Google, Dropbox, IBM, Canonical, Mozilla, Twitter, Puppet Labs, Valve Software, Alfresco, NEC, Blender, OpenShot, Novell, Inkscape, Philips, Red Hat, CentOS, GNOME, Wikimedia, MariaDB Foundation, Rackspace, Moodle, Openstack, Slackware, Tor, and Sony. You get the idea.
J. David Eisenberg has kindly made a PortableApps version of SOFA for Windows. It is an alpha release only but it works and feedback/assistance is welcome. Here is his announcement as posted on the Google Group:
I have used the PortableApps guidelines (correctly, I hope!) to create a version of SOFA Statistics that can be installed on a USB drive and will retain your data and settings.
You can download the installer at http://evc-cit.info/SOFAStatisticsPortable_1.4.3_English.paf.exe ; this is Windows only.
Known problem: If you add the results of a statistical test to a report, any graphs for that statistical test will show up as a “missing image” icon. The image will be in the report; it just won’t show up on screen.
I have not tried scripting to see if that works properly.
Any comments are welcome at the SOFA statistics Google Group
SOFA aims for ease of use as part of its “ease of use, learn as you go, beautiful output” mantra. But it confronts users with having to think about databases, even if they are just working with simple spreadsheets of data or some data entered by hand.
This was the usability problem brought to my attention by a member of the community, Jan Dittrich. Jan (http://mindthegap.blog.bau-ha.us/) is completing a Masters in Media Arts and Design at the Bauhaus University in Weimar, Germany. He mainly does user research and usability, but has an interest in statistics as well. When using SOFA he noticed that a “database” needs to be selected for most of the activities, but that this might be a rather technical concept for some of those who use SOFA. He wrote me an email addressing the problem and we subsequently exchanged ideas.
So how could we address this without removing one important ability of SOFA – namely the ability to connect directly to people’s data when it is in a database (e.g. MySQL)?
We explored a few options …
… but ended up following the principle of “the least we could do” as recommended in the fantastic usability book “Rocket Surgery Made Easy” by Steve Krug.
As Krug notes, tweaking is usually better than redesigning because 1) it actually gets done; 2) larger changes are inevitably going to break some things (think months of squashing all the bugs out again); and 3) redesigns annoy a lot of existing users who have gotten used to the status quo (actually Krug has 9 reasons but these are my favourites).
Anyway, I had no enthusiasm for a major GUI overhaul but it did not make sense to leave a known usability problem in place. What Jan and I came up with was rather simple and elegant. SOFA only shows the Database label and drop-down if the user has configured SOFA to connect to any databases. Expect to see this change in the next version (1.3.5).
Users who have database connections will notice no difference. But for everyone else the interface will be simpler and easier to use. Sometimes, less is more.
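The rule itself can be sketched in a few lines of Python. This is only an illustration, not SOFA’s actual code: the function name `controls_to_show` and the `external_connections` argument are made up for the example.

```python
def controls_to_show(external_connections):
    """Return the names of the selector controls a dialog should display.

    external_connections: names of the database connections the user has
    configured (hypothetical argument; empty if none have been set up).
    """
    controls = ["table"]  # the data table selector is always needed
    if external_connections:
        # Only confront the user with the database concept when it is relevant
        controls.insert(0, "database")
    return controls


print(controls_to_show([]))         # ['table']
print(controls_to_show(["MySQL"]))  # ['database', 'table']
```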
SOFA Statistics has a Mac version so I need to be able to test and package SOFA on a Mac. I do this on my Ubuntu Linux host machine using VirtualBox which works pretty well. But until a few minutes ago, the VirtualBox instance of Mac I had running was squeezed into a somewhat restrictive screen resolution. No longer! Here are the two basic steps I followed to resolve this problem:
1) Add a new screen resolution to com.apple.Boot.plist as per How to Increase Mac OS X Snow Leopard Virtual Machine Screen Resolution on VirtualBox and VMware using Method #1 (but not from the /Extra folder – from the next bit):
2) Make screen resolution available from VirtualBox end as per Notes on setting 1680×1050 resolution on a Snow Leopard inside a VirtualBox
And here is the result – a much more pleasant experience of testing SOFA on the Mac platform.
Next goal is to get some tricky graphics libraries I need working on the Mac.
SOFA has a plug-in for exporting reports and individual output as images (PNG) and/or PDFs. Unfortunately, I haven’t been able to make a version which works for OS X. The plug-in works on Windows and Linux but there are crucial libraries I haven’t yet been able to get working on Mac. Fortunately there are some signs of progress. Sid Stewart of PDF Labs is working on a new version of pdftk (one of the libraries I need working) and will be building a new installer for Mac. And wkhtmltopdf and pyPDF are already working. So getting the export output plug-in working for Mac might be possible after all.
You might be able to help. If you are a Mac user, and you are able to get either of the following libraries working on your machine, please drop me a line (firstname.lastname@example.org) letting me know how you did it.
- Ghostscript (used to convert PDF → PNG)
- ImageMagick (used to trim PNG to correct dpi) or, even better still, PythonMagick
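For anyone wanting to check whether the two tools work on their Mac, here is a rough sketch of the kind of calls involved. It is not the plug-in’s actual code. It assumes a single-page PDF, that the Ghostscript binary is on the PATH as `gs`, and that ImageMagick is available as `convert` (newer ImageMagick releases use `magick` instead). The function names are made up for the example.

```python
import subprocess


def ghostscript_cmd(pdf_path, png_path, dpi=300):
    """Build a Ghostscript command that renders a PDF to a PNG."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=png16m",             # 24-bit colour PNG output device
        "-r%d" % dpi,                  # rendering resolution in dpi
        "-sOutputFile=%s" % png_path,
        pdf_path,
    ]


def imagemagick_trim_cmd(src_png, dest_png):
    """Build an ImageMagick command that trims the surrounding border."""
    return ["convert", src_png, "-trim", "+repage", dest_png]


def pdf_to_trimmed_png(pdf_path, png_path, dpi=300):
    """Run both tools in turn; raises CalledProcessError if either fails."""
    subprocess.check_call(ghostscript_cmd(pdf_path, png_path, dpi))
    subprocess.check_call(imagemagick_trim_cmd(png_path, png_path))
```

If `pdf_to_trimmed_png("report.pdf", "report.png")` runs without an error and produces a cropped image, both libraries are probably in working order.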
I wanted to shrink the font of elements of the SOFA GUI dialogs so I could squeeze more in or relocate items to more logical positions. Can’t be that hard, surely? I have since discovered that on Linux, if a drop-down list (wxPython wx.Choice widget) has lots of items (e.g. 30+) and you are using your own font selection (set with SetFont()), it takes seconds for fresh items to be added to the widget. SetItems() takes a long time as, presumably, it sets the font for each individual item. And given I can’t control how many items will appear in drop-down lists, or avoid having to repopulate lists (e.g. a new data table is selected so variable lists have to be updated), the option of shrinking fonts is not viable. Back to the drawing board.
[UPDATE] I came up with a workaround. Because there is no performance problem when items are included with the initial instantiation of dropdown widgets, all dropdowns are rebuilt each time they are changed. This means they have to be destroyed before being replaced, and the panel they are on must be hidden temporarily to avoid flicker on Windows, but it works. The fact that I was able to clean up some code in the process almost compensates for the considerable extra work.
The SOFA installers for Windows and Mac have shrunk substantially – from 43MB to 25MB for Windows and from a rather hefty 85MB to 36MB for Mac. They’ll be quicker to download, and the new installers also avoid possible conflicts with other Python packages on a system. It’s all self-contained. A final benefit is that the installation process itself has become much simpler, with far fewer steps. For those who are technically minded, it is thanks to pyinstaller and py2app (with some initial help from Gui2exe).
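The shape of the workaround is roughly as follows. This is a sketch rather than the code SOFA uses: `rebuild_dropdown` and its `make_dropdown` factory argument are names invented for the example. The factory is there so the widget toolkit stays out of the helper, and in a real dialog the new widget would also need to be put back into its sizer.

```python
def rebuild_dropdown(panel, old_dropdown, items, make_dropdown):
    """Swap a drop-down for a fresh one whose items are set at creation.

    panel and old_dropdown are expected to behave like wx windows (Hide,
    Show, Destroy). make_dropdown is a caller-supplied factory, e.g.
    lambda parent, choices: wx.Choice(parent, choices=choices)
    """
    panel.Hide()  # hide while rebuilding to avoid flicker
    try:
        if old_dropdown is not None:
            old_dropdown.Destroy()
        # Items are passed in at construction rather than via SetItems()
        new_dropdown = make_dropdown(panel, items)
    finally:
        panel.Show()  # always restore the panel, even if creation fails
    return new_dropdown
```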
As SOFA Statistics has gained more functionality it has grown in complexity – there are modules for reading Excel spreadsheets, connecting to Google Docs spreadsheets, displaying charts, displaying GUI widgets etc. Trying to make a single executable for Windows users was always going to be a challenge and would probably involve a lot of trial and error. So it proved.
But there was one technique I used to make the seemingly impossible task manageable. I made a single python script I called launch.py which was responsible for importing all the main modules the executable would need to handle (e.g. matplotlib, MySQLdb etc). I identified the imports I would need by looking at each and every main module in SOFA and adding any external library module imports not already included.
The process of making an executable failed initially, but by commenting and uncommenting parts of the launch script I was able to isolate the problem modules and fix them. To get PostgreSQL working, for example, I needed to add the following fix:

    try:
        # I needed to add the Postgres library directory to the PATH
        # variable in Windows. Apparently when Postgres is installed under Windows as a
        # service, this isn't done automatically (no need to) so that library isn't
        # available. [http://osdir.com/ml/python.db.pygresql/2008-03/msg00021.html]
        # OK to hardwire to version available to my installer dev environment. The user experience
        # will depend on whether they have set the PATH properly.
        os.environ['PATH'] += ";C:\\Program Files\\PostgreSQL\\9.1\\bin"
        import pgdb
    except ImportError, e:
        pass

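The comment-and-retry process described above can also be partly automated before attempting a build. The following is a sketch only, not something from the SOFA code base: it tries each import on its own and collects the failures. The function name and the module names in the demo are examples.

```python
import importlib
import traceback


def find_problem_imports(module_names):
    """Try each import separately and return {name: error text} for failures."""
    problems = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:  # imports can fail with more than ImportError
            problems[name] = traceback.format_exc()
    return problems


if __name__ == "__main__":
    # Example names only; substitute the imports listed in the launch script
    failures = find_problem_imports(["os", "csv", "no_such_module_xyz"])
    for name, err in sorted(failures.items()):
        print("FAILED: %s\n%s" % (name, err))
```

This only shows which imports fail in the plain Python environment. Problems that appear only inside the frozen executable still have to be tracked down by rebuilding.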
Here is the full text of launch.py:

    #! /usr/bin/env python
    # -*- coding: utf-8 -*-
    from __future__ import absolute_import
    from __future__ import division # so 5/2 = 2.5 not 2 !
    from __future__ import print_function
    # remove import __future__ from dbe_sqlite
    import cgi
    import codecs
    from collections import defaultdict
    from collections import namedtuple
    import copy
    import csv
    import datetime
    import decimal
    import gettext
    import glob
    import locale
    import math
    from operator import itemgetter
    import os
    import platform
    import pprint
    import random
    import re
    import shutil
    import socket
    import subprocess
    import sys
    import time
    import traceback
    from types import IntType, FloatType, ListType, TupleType, StringType
    import warnings
    import weakref
    import webbrowser
    import xml.etree.ElementTree as etree
    import zipfile

    # Even though not used here pyinstaller won't know about it otherwise
    # and will not have it when encountered in import2run.py/start.py etc
    import MySQLdb as mysql
    try:
        # I needed to add the Postgres library directory to the PATH
        # variable in Windows. Apparently when Postgres is installed under Windows as a
        # service, this isn't done automatically (no need to) so that library isn't
        # available. [http://osdir.com/ml/python.db.pygresql/2008-03/msg00021.html]
        # OK to hardwire to version available to my installer dev environment. The user experience
        # will depend on whether they have set the PATH properly.
        os.environ['PATH'] += ";C:\\Program Files\\PostgreSQL\\9.1\\bin"
        import pgdb
    except ImportError, e:
        pass
    import sqlite3 as sqlite # using sqlite3.dll from Python 2.7 so includes foreign key support
    #import wxversion
    #wxversion.select("2.8") # Not needed when using executable.
    # http://groups.google.com/group/pyinstaller/browse_thread/thread/1b57e64ddc35e772
    if not hasattr(sys, 'frozen'):
        import wxversion
        wxversion.select('2.8')
    import wx
    import wx.lib.iewin as ie
    import wx.gizmos
    import wx.grid
    import wx.html
    try:
        from agw import hyperlink as hl
    except ImportError: # if it's not there locally, try the wxPython lib.
        import wx.lib.agw.hyperlink as hl
    # problem locating eggs folder - solution in http://www.pyinstaller.org/ticket/185
    # change pyinstaller-1.5\support\_pyi_egg_install.py
    #if os.path.isdir(d):
    #    for fn in os.listdir(d):
    #        sys.path.append(os.path.join(d, fn))
    import numpy as np
    #if hasattr(sys, 'frozen') and sys.frozen:
    #    import numpy.core.ma
    #    sys.modules['numpy.ma'] = sys.modules['numpy.core.ma']
    # if include matplotlib before sys.path, matplotlib.collections shadows collections and won't find namedtuple
    # Currently problem with Path in environment MATPLOTLIBDATA not a directory
    # Must put mpl-data folder in same folder as the executable is finally run from
    import matplotlib
    #import matplotlib.numerix as Numerix
    #from matplotlib.axes import _process_plot_var_args
    #from matplotlib.backend_bases import FigureCanvasBase
    #from matplotlib.backends.backend_agg import FigureCanvasAgg, RendererAgg
    #from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg
    #from matplotlib.figure import Figure
    #from matplotlib.font_manager import FontProperties
    #from matplotlib.projections.polar import PolarAxes
    #from matplotlib.transforms import Bbox
    # connected to matplotlib
    # don't exclude Tkinter, Tkconstants
    import wxmpl
    import pylab # must import after wxmpl so matplotlib.use() is always first
    # don't import boomslang - trouble with import pylab in many cases, even import math.
    # works fine if matplotlib baked into exe
    #import boomslang
    # no need to bake googleapi in as nothing installed as such. Just ensure not using stale pycs from Ubuntu system.
    #import googleapi
    # problem with import os etc if using below
    #import googleapi.gdata.spreadsheet.service as gdata_spreadsheet_service
    #import googleapi.gdata.spreadsheet as gdata_spreadsheet
    #import googleapi.gdata.docs.service as gdata_docs_service
    #import googleapi.gdata.service as gdata_service
    # no need to bake xlrd in as nothing installed as such. Just ensure not using stale pycs from Ubuntu system.
    #import xlrd
    import adodbapi
    import pywintypes
    import win32api
    import win32con
    import win32com
    import win32com.client
    import dao36_from_genpy # go to makepy/genpy and look in py files till found - taken and rename and relocate so can directly call
    import import2run

The code for SOFA is cross-platform and I start the Windows packaging process by copying everything across from Ubuntu. It is important in such a case to wipe all pyc files so that platform-specific ones are created for Windows and included in the executable creation process.
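Wiping the pyc files is easy to script. A minimal sketch, with `wipe_pyc_files` being a name chosen for this example:

```python
import os


def wipe_pyc_files(root):
    """Delete every .pyc file under root and return the paths removed."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".pyc"):
                path = os.path.join(dirpath, filename)
                os.remove(path)
                removed.append(path)
    return removed
```

Running it against the copied program folder before building means any pyc files that end up in the executable are generated fresh on Windows.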
The final import statement is for import2run.py. This means that the executable doesn’t hardwire anything beyond the imports. As it happens I started by having import2run contain just the following line:
Later, once all the basic imports were working, I changed it to:
to actually load SOFA. NB the executable created using the technique described here doesn’t replace all the SOFA modules with a single executable – its purpose is to replace Python and all the extra libraries such as matplotlib. So the exe is expected to live in the main SOFA program folder (usually in C:\Program Files\sofastats) alongside the usual modules such as core_stats.py. If a user actually had Python 2.6 and all the libraries installed they could either use the exe or run start.py directly themselves. It would have the same effect.
Getting matplotlib to work took a while and involved many false leads. In the end the solution was to copy the entire mpl-data folder (from somewhere like C:\Python26\Lib\site-packages\matplotlib) into the same folder as the sofastats.exe was going to end up.
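The copy step can be scripted as part of the build. This is a sketch under the assumption that the matplotlib package folder and the destination folder are known in advance; `copy_mpl_data` is a made-up helper name.

```python
import os
import shutil


def copy_mpl_data(mpl_package_dir, exe_dir):
    """Copy matplotlib's mpl-data folder next to the executable."""
    src = os.path.join(mpl_package_dir, "mpl-data")
    dest = os.path.join(exe_dir, "mpl-data")
    if os.path.isdir(dest):
        shutil.rmtree(dest)  # replace any stale copy from an earlier build
    shutil.copytree(src, dest)
    return dest
```

With the paths mentioned in this post, the call would look something like `copy_mpl_data(r"C:\Python26\Lib\site-packages\matplotlib", r"C:\sofastats_build_exe\sofa.main")`.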
Some final things I learned about Pyinstaller. --onedir is the default and adds the coll = COLLECT(…) part of the spec file. If making manual changes, remember that if you want the onedir approach, don’t include a.binaries in the EXE(…) part, and set exclude_binaries to True. If, like me, you want a single executable file, don’t bother with coll = COLLECT(…), include a.binaries, and set exclude_binaries to False. And while testing, set debug=True and console=True so you can see what is going wrong as you refine your spec file, launch.py script etc.
Although GUI2EXE is a wonderful program, some aspects may not be compatible with Pyinstaller 1.5.1, so I now build my spec file using makespec.py with the --onefile argument. It works in its basic vanilla form for SOFA using launch.py. You can export the spec file GUI2EXE makes and see the differences.
Here is the final spec file I used:

    # -*- mode: python -*-
    # used MAKESPEC 1.5.1 with --onefile option
    # NB must include mpl-data folder under main sofastats level (i.e. sibling of dbe_plugins etc) for matplotlib to work
    # manually set level=9 in PYZ params (inspired by how GUI2EXE did it)
    # manually replaced name=os.path.join('dist', 'launch.exe'), with name='C:\\sofastats_build_exe\\sofa.main\\sofastats.exe',
    # manually set debug=True, upx=False in EXE params
    # manually set exclude_binaries=False in EXE params
    a = Analysis([os.path.join(HOMEPATH,'support\\_mountzlib.py'),
                  os.path.join(HOMEPATH,'support\\useUnicode.py'),
                  'C:\\sofastats_build_exe\\sofa.main\\launch.py'],
                 pathex=['C:\\Python26\\pyinstaller-1.5.1'])
    pyz = PYZ(a.pure, level=9)
    exe = EXE(
        pyz,
        a.scripts,
        a.binaries,
        a.zipfiles,
        a.datas,
        exclude_binaries=False,
        name='C:\\sofastats_build_exe\\sofa.main\\sofastats.exe',
        debug=True,
        strip=False,
        upx=False,
        console=True )

Before going live switch debug and console to False.
This post is largely specific to SOFA Statistics but hopefully it includes some tips which might save others a lot of fruitless struggle. If you have trouble, I found the pyinstaller mailing list people helpful.
0.9.3 has nice new graphical output for the Chi Square Test and a few other enhancements. At least as important, however, are all the bug fixes. These are the result of a new pre-release testing process.
Underlying the clustered bar charts is the boomslang library, which provides a simplified interface to common matplotlib charts. What a great idea, and what a great name for a Python library.
Summary of new features in version 0.9.3:
- Chi Square output includes clustered bar charts to display proportions and frequencies for the two variables selected.
- Drop-downs default to the most recently used database and table. This recognises that most of the time you are using the same table as you used in the last analysis.
- More helpful messages if trying to use variables with too many values for Chi Square.
- Fix for Linux users with a 4-digit year date format.
- Fixed encoding display issue for Windows users.
- Miscellaneous fixes to the behaviour of the table design dialog. Numerous bugs were flushed out by more extensive user testing before release.
- The Expand button is disabled if a report runs but not successfully (e.g. returns a warning).
- The default database and table are saved correctly according to database engine (e.g. MySQL, MS Access etc). This ensures valid projects can always open.