Better installation in non-English environments

November 23rd, 2011

Version 1.1.2 fixes a bug which affected people trying to install SOFA into many non-English environments. This release also makes it safe for SOFA to report progress in more detail when run on Windows under the non-console version of Python. Overall, SOFA has become much more robust in recent versions.

SOFA Statistics and Open Source Business – Misc

November 12th, 2011

The way ahead for SOFA Statistics from a business point of view is not clear at the moment and I recently wrote about some of the issues and options here: Finding a Viable Open Source Business Model – The SOFA Statistics Experience (so far). This post is a complement to that article and the purpose is to let me store miscellaneous ideas and links of relevance without having to integrate them into a coherent narrative.

Impact of cost of downtime on value of support

In my case we have three steel mills worth $10k+ per hour of downtime… Even more if downtime causes rework. If we have more than an hour down I have vice presidents in my bosses office!

Scale of Customers Matters

The company received a significant boost in late 2011 when it was contacted by Mexico’s largest telecommunications provider, Telmex. That company had been struggling to implement a network management system from another commercial supplier.

“The Telmex guys saw that MNIS was a commercially-supported product, then downloaded the free version and put it in,” Maher said. “Within a day they had it doing all the stuff that they hadn’t been able to achieve with the other product for the previous nine months. They then reached out to us and pre-purchased six commercial modules which we were yet to release, and support and our assistance for implementation.”

The initial contract is estimated as nearly half a million dollars in value, but could expand into a multimillion dollar deal for Opmantek. Maher said the deal had essentially underwritten Opmantek for the next year.

Open source opens doors for Aussie start-up

App Stores not a silver bullet

See Striking It Rich In The App Store: For Developers, It’s More Casino Than Gold Mine

And mobile apps aren’t the answer, even though the sector is growing like crazy. See section entitled “ABOUT THOSE APPS” in

Hard-nosed Realities

So the HP guy comes up to me (at the Melbourne conference) and he says, ‘If you say nasty things like that to vendors you’re not going to get anything’. I said ‘no, in eight years of saying nothing, we’ve got nothing, and I’m going to start saying nasty things, in the hope that some of these vendors will start giving me money so I’ll shut up’. [Quote supposedly from Theo de Raadt – verify]

Impact of different usage patterns

The conventional wisdom on how a business model works is sometimes completely wrong in a particular case. The freemium model, for example, which I am hoping to use with SOFA, was apparently not going to work for Evernote – except it did (Evernote: Company of the Year):

Evernote was being pitched as a so-called freemium service. In other words, people could either use it for free or upgrade to a paid premium version, which is how the company would make money. So far, so good; the freemium model was seen as a smart one. The problem was that, unlike virtually all other entrepreneurs relying on that model, Libin refused to cripple the free version, removing the incentive to upgrade to the paid version. You could pay $5 a month and get additional file storage, but why would anyone do that? asked the VCs. The free version was full featured and offered generous storage.

Libin explained his theory: The more stuff you put in Evernote, the more important the service would be to you. Who would begrudge $5 a month to a company that was storing your memories and helping you retrieve them? “Your notes, your restaurants, your friends, a year of your life, then years of your life,” says Libin. “That’s worth thousands.” The danger wasn’t that people wouldn’t upgrade, he argued; it was that they wouldn’t try the service in the first place or wouldn’t stick with it because the free version was skimpy and failed to impress. Get them to fall in love with the service, and they would eventually pay, because they would be invested in its success. “I want to build a 100-year company, and I’m serious about that,” says Libin. “I don’t need to squeeze money out of you. I’ll have the rest of your life to take your money. It’s my long-term greedy strategy. Our slogan is, ‘We’d rather you stay than pay.’ Basically, I wanted a business model that rhymed.”

And …

Libin showed the group that the rate at which Evernote users were upgrading to the paid version within a month of signing up was half a percent. This was not good—and not surprising, given that the free version worked fine. But then Libin showed the upgrade rates over longer periods of time. Normally, this would be an even grimmer picture, because at almost all companies with freemium models, users who upgrade tend to do so pretty quickly. They sample the hobbled free version, and if they like it, they upgrade right away to get all the features; if they don’t like it enough to upgrade, they tend to abandon the service altogether or use it lightly. But Libin showed that Evernote users became more likely to upgrade over time. For those users who had been using Evernote for a year, the upgrade rate was an impressive 8 percent. If Evernote could get to a million users, explained Libin, sales would be close to $4 million a year. And, at the current growth rate, Evernote would reach 10 million users within two years.

Then Libin showed activity rates, or, roughly, how often an average user was actually using Evernote over time. For many software companies, that curve runs relentlessly downward. Most people who try an app abandon it pretty quickly or use it less frequently as time goes on. But for Evernote, the curve was a smile. There was a slight drop-off in usage after the first few months, but then it went up again—not only because active users were finding the service more and more useful, but also because customers who had stopped using the service were returning to it. People who left Evernote missed it.
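As a rough check on the figures in the quote, here is a minimal sketch of the revenue arithmetic. The $5 monthly price and the 8 percent one-year upgrade rate come from the quoted article; treating all of a million users as year-old users who pay monthly is my own simplifying assumption, so this is an upper-ish estimate and not Evernote's actual model.

```python
def annual_premium_revenue(users, upgrade_rate, monthly_price):
    """Yearly revenue if `upgrade_rate` of `users` pay `monthly_price` every month."""
    paying_users = users * upgrade_rate
    return paying_users * monthly_price * 12


if __name__ == "__main__":
    # Assumption: every user has reached the one-year upgrade rate.
    estimate = annual_premium_revenue(users=1_000_000, upgrade_rate=0.08, monthly_price=5)
    print(f"${estimate:,.0f} per year")
```

Under those assumptions the estimate is about $4.8 million a year, which is in the same ballpark as the "close to $4 million" in the quote. The gap is plausibly because not every user would have been around for a full year.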

Importance of finding your own business model which works for you

NB the value you add may be around the software, rather than the software itself. Notes that open core is very similar to proprietary models.

FOSS4G 2011 Keynote

Visits-Downloads-Sales Process

One has to be careful about drawing conclusions from a relatively small and unverifiable data set. However the results certainly seem to support the much-quoted “industry standard” sales:visits conversion ratio of 1%. But there are huge variations between products.

The fact that the sales:downloads ratio is both lower on average and more variable than the downloads:visitors ratio implies that getting people to download is the easy bit and converting the download to a sale is a tougher challenge.

The average sales:visits conversion ratio is noticeably higher for Mac OS X products than Windows products. This is supported by anecdotal evidence and the author’s own experience with a cross-platform product. However the number of Mac respondents to the survey is too small for the result to be stated with any great confidence. Also remember that the Mac market is still a lot smaller than the Windows market before you rush off to start learning Cocoa and Objective-C.
The truth about conversion ratios for downloadable software
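To make the three ratios in the quote concrete, here is a small sketch that computes them from raw counts. The counts are made up for illustration and are not taken from the survey.

```python
def conversion_ratios(visits, downloads, sales):
    """Return the three funnel ratios as fractions (0.01 means 1%)."""
    if visits <= 0 or downloads <= 0:
        raise ValueError("visits and downloads must be positive")
    return {
        "downloads:visits": downloads / visits,
        "sales:downloads": sales / downloads,
        "sales:visits": sales / visits,
    }


if __name__ == "__main__":
    # Hypothetical month: 10,000 visits, 2,000 downloads, 100 sales.
    for name, ratio in conversion_ratios(10_000, 2_000, 100).items():
        print(f"{name}: {ratio:.1%}")
```

The sales:visits ratio is simply the product of the other two, so a weak sales:downloads step drags the overall figure down no matter how easy it is to get downloads.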

Desktop Apps – Harder Sales Funnel?

Someone visits your website, downloads your trial, and hopefully purchases your program. That process is called a funnel, and if you break it down into concrete steps, the shareware funnel is long and arduous for the consumer:

  1. Start your web session on Google, like everyone does these days.
  2. Google your pain point.
  3. Click on the search result to the shareware site.
  4. Read a little, realize they have software that solves your problem.
  5. Mentally evaluate whether the software works on your system.
  6. Click on the download button.
  7. Wait while it downloads.
  8. Close your browser.
  9. Try to find the file on your hard disk.
  10. Execute the installer.
  11. Click through six screens that no one in the history of man has ever read.
  12. Execute the program.
  13. Get dumped at the main screen.
  14. Play around, fall in love.
  15. Potentially weeks pass.
  16. Find your way back to the shareware site. Check out price.
  17. Type in your credit card details. Hit Checkout.

I could go into more detail if I wanted, but that is seventeen different opportunities for the shareware developer to fail.

Why I’m Done Making Desktop Applications
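The "seventeen opportunities to fail" point can be sketched numerically: if some share of people drops out at each step, the losses compound. The 90% per-step retention figure below is purely hypothetical and does not come from the quoted article.

```python
def funnel_completion_rate(step_retention_rates):
    """Multiply per-step retention rates to get the share who finish every step."""
    rate = 1.0
    for retention in step_retention_rates:
        if not 0.0 <= retention <= 1.0:
            raise ValueError("each retention rate must be between 0 and 1")
        rate *= retention
    return rate


if __name__ == "__main__":
    # Hypothetical: each of the seventeen steps keeps 90% of people.
    overall = funnel_completion_rate([0.9] * 17)
    print(f"{overall:.1%} of visitors complete all seventeen steps")
```

Even with a generous 90% at every step, only about one visitor in six gets all the way through.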

Tracking Usage

On the Internet, privacy expectations have evolved a bit in the last few years. The overwhelming majority of the public has been told that they’re being tracked via cookies and could not care less. If you write a privacy policy, they won’t even bother reading it. Which means that you can disclose in your privacy policy that you track non-personally identifying information, which is very valuable as a software developer.

  • What features of your software are being used?
  • What features of your software are being ignored?
  • What features are used by people who go on to pay?
  • What combination of settings is most common?
  • What separates the power users from the one-try-and-quit users?

Tracking all of these is very possible with modern analytics software like, e.g., Mixpanel. You can even wrestle the information out of Google Analytics if you’re prepared to do some extra work. You can do it in a way which respects your users’ privacy while still maximizing your ability to give them what they want.

Why I’m Done Making Desktop Applications
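The questions in the quoted list boil down to counting feature-use events per anonymous installation. Here is a minimal in-memory sketch of that idea. It does not use Mixpanel's or Google Analytics' APIs, and the class name, feature names and install IDs are all hypothetical.

```python
from collections import Counter, defaultdict


class UsageLog:
    """Tally feature use per anonymous install ID (no personal data stored)."""

    def __init__(self, known_features):
        self.known_features = set(known_features)
        self.uses = defaultdict(Counter)  # install_id -> feature -> count
        self.paying = set()               # install IDs that went on to pay

    def record(self, install_id, feature):
        self.uses[install_id][feature] += 1

    def mark_paid(self, install_id):
        self.paying.add(install_id)

    def feature_totals(self, only_paying=False):
        """Total uses per feature, optionally restricted to paying installs."""
        totals = Counter()
        for install_id, counts in self.uses.items():
            if only_paying and install_id not in self.paying:
                continue
            totals.update(counts)
        return totals

    def ignored_features(self):
        """Known features that nobody has used at all."""
        return self.known_features - set(self.feature_totals())


if __name__ == "__main__":
    log = UsageLog(["import_csv", "box_plot", "anova"])
    log.record("a1", "import_csv")
    log.record("a1", "box_plot")
    log.record("b2", "import_csv")
    log.mark_paid("a1")
    print(log.feature_totals().most_common())
    print(log.feature_totals(only_paying=True).most_common())
    print(log.ignored_features())
```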

The Risk of Starting a Business/Chasing a Dream

Death of a Startup

Selling Software is H.A.R.D

I can’t think of a way of guaranteeing that you can feed yourself whether or not you open-source your code! Making it as an independent software vendor is hard. Above you, you have big companies who like money and won’t hesitate to offer similar software, independently developed, if it looks like you’ve found a good market. Below you, you have FLOSS developers who won’t hesitate to offer similar software for free if it looks like your software offers useful features for users. (In some cases, these groups may overlap.)

That said, you haven’t given us anywhere near enough information to answer your question. Are you talking about highly specialized software for a niche market, or general purpose software with a potentially huge market? The edge-effects of open-source development are much more likely to be useful and beneficial to you in the latter case.

What do you get out of open-sourcing your software? Free publicity is almost certainly the biggest factor. How big is your advertising budget? Also, what about distribution channels? Remember, you’re competing with big companies and (if you go the non-free route) open-source developers/companies. How are people going to hear about your software, and find it if they do hear about it, and decide if they like it better than other similar software?

Making your code proprietary greatly increases your per-user income, but makes it much more difficult (and expensive) to get new users. Open-sourcing your code makes it much easier to get new users, but greatly reduces your per-user income. Independent comic artist Phil Foglio started putting his Girl Genius comic up as a free webcomic, and said that his readership grew tenfold and his sales quadrupled. But that may or may not be typical.

There’s also the possibility of hybrid models, like releasing the core as open source, but charging for add-ons, or, if you think other companies may want to adapt and sell your code, offering a choice between a restrictive free license (e.g. GPL) or a commercial for-pay license. Depending on what your program is and how it works, those may or may not be viable options–you haven’t given us enough information to tell.

Bottom line, though: all the cards are stacked against you no matter which way you go. And, while you’ve given us very little to go on, it’s quite likely that even if you gave us ten times the details you have so far, it still wouldn’t be enough information to make more than a wild guess. Going it independent is hard and extremely risky. There’s a reason that something like 90% of all programmers are employed developing internal software that never gets licensed or distributed outside of a single company–it’s one of the few ways to be sure you eat.

Staggered release open sourcing

I believe parent has nailed it.

Ethically you want to do what is closest to your heart if you will, but unfortunately you need to eat, and usually this involves doing the opposite of ethical (or at least far from what the ideal-ethics tell you)

So I propose this. How about you release version 1.0 and 1.5 for example (or 1.0 and 2.0 or something) as regular closed-source software, and then when the next version comes out, you release the previous one as open source (e.g. release 1.0 and 2.0 for pay, when you release 3.0 for licensing you release at the same time v1.0 as open source)

this is what trolltech, mysql and other companies did. it never goes down well. it’s _extremely_ unpopular, and absolutely guarantees that there will be no community *other* than paid-up staff members involved in the actual development of the software.

the reason is very simple: any person wishing to help make improvements to the software knows full well that they might as well not bother, because the free software version that they’re using is hopelessly out-of-date.

in the case of QT, what actually happened was that the version 3 of QT (QT3) actually developed into an independent fork. the trinity desktop team now have taken full responsibility for its maintenance. bit of a digression here, but that version is years old, _but_ it has the advantage that it’s much much smaller (faster, less code) than QT4 or QT5. QT4 is severe bloat-ware that performs extremely badly on ARM9 and ARM11 platforms.

anyway the point is: the “model” you propose only really works if you’re a large corporation with lots of resources and lots of money and are willing to piss people off and make even the free software community absolutely desperate and beholden to you. that works for things like mysql and qt but dude, your software had better be _really_ shit hot to make these non-community-inclusive options work.

More business models

Monthly subscriptions: meet the target or stop developing (Will You Help Change The Way Open-Source Apps are Funded?)

Good news for Mac & Linux users – Excel importing added

October 9th, 2011

SOFA Statistics 1.1.1 brings good news for Mac and Linux users. You can now import Excel xls files directly. This is no longer a Windows-only feature.

Here is the full list of changes:

  • Excel can be imported from Mac and Linux as well as Windows.
  • ODS importing now copes with single ‘divider’ columns – i.e. columns with no field name in the header.
  • CSV importing now autofills blank columns with field numbers such as Var018.
  • More informative if locale issues.
  • More informative if unable to connect to MySQL on Mac.
  • Changed standard deviation in report tables from population sd to sample sd.
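For anyone wondering what the switch from population sd to sample sd means in practice, here is a short illustration using Python's standard `statistics` module. This is not SOFA's own code, and the data values are made up.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population sd divides the summed squared deviations by n.
population_sd = statistics.pstdev(values)

# Sample sd divides by n - 1, so it is always a little larger.
sample_sd = statistics.stdev(values)

if __name__ == "__main__":
    print(f"population sd: {population_sd:.4f}")
    print(f"sample sd:     {sample_sd:.4f}")
```

The difference shrinks as the number of values grows, but for small data sets the sample sd is usually the one you want when the data is a sample from something larger.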

There is one important set of bug fixes which allows more sophisticated extraction of cell values from ODS spreadsheets. SOFA now copes with formatted content of cells and other complex cases by handling subelements in the XML.
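To show why subelements matter, here is a minimal sketch of pulling text out of an ODS cell whose paragraph contains a nested span. It is an illustration under my own assumptions and not SOFA's actual import code; the sample cell and the `cell_text` helper are hypothetical, and special elements such as `text:s` (repeated spaces) are not handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# A made-up cell where part of the content sits inside a formatting span.
SAMPLE = (
    '<table:table-cell xmlns:table="' + NS["table"] + '" '
    'xmlns:text="' + NS["text"] + '">'
    "<text:p>Total <text:span>income</text:span> (NZD)</text:p>"
    "</table:table-cell>"
)


def cell_text(cell):
    """Join the text of every paragraph in a cell, including nested spans."""
    paragraphs = cell.findall("text:p", NS)
    return "\n".join("".join(p.itertext()) for p in paragraphs)


if __name__ == "__main__":
    cell = ET.fromstring(SAMPLE)
    print(repr(cell.find("text:p", NS).text))  # only the text before the span
    print(repr(cell_text(cell)))               # the full cell content
```

Reading only the paragraph's own `.text` stops at the first subelement, which is the kind of truncation that walking the subelements avoids.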

Version 1.1.0 brings it together

August 20th, 2011

Version 1.1.0 finally brings it together adding some of the last features to round out the original vision for the application. The main change is much easier access to data – users can now open data tables from anywhere the table can be selected e.g. charts, report tables, statistical analyses.

Open your data from anywhere

Another change makes it easier to import from spreadsheets – SOFA now gives a preview of the first few rows of data to make it easier to determine whether there is a header row or not:

Spreadsheet sample

The two extra changes are: Importing from Google Doc spreadsheets now automatically starts import process if downloading was successful; Windows users can install into any folder now, not just one with sofastats in the name.

There are also a couple of bug fixes: Fixed bug when trying to display feedback on resizing operation on data table from dialogs other than data select; and fixed regression when running data list report tables.

Here are all the major feature changes since version 1.0 was released:

  • Single line charts now have option of a trend line and data smoothing (weighted rolling average).

    New options for line charts

  • Averages can be displayed for most chart types e.g. a line chart of average income by month.

    Chart Averages

  • Attractive and dynamic Box and Whisker plots added.

    Box and Whisker Plot button

    Box and Whisker Plot

  • Much easier access to data – can now open data table from anywhere the table can be selected e.g. charts, report tables, stats analyses.
  • Numerous usability improvements and bug fixes.
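For the curious, a weighted rolling average of the kind mentioned for line chart smoothing can be sketched as below. The (1, 2, 1) window and the edge handling are assumptions for illustration; SOFA's actual weights may differ.

```python
def weighted_rolling_average(values, weights=(1, 2, 1)):
    """Smooth `values` with a centred weighted window.

    `weights` should have an odd length. Near the edges only the weights
    that fall inside the series are used, and they are renormalised.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for offset, weight in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(values):
                total += weight * values[j]
                weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


if __name__ == "__main__":
    monthly_income = [10, 14, 9, 15, 20, 18]  # made-up data
    print(weighted_rolling_average(monthly_income))
```

Giving the centre point the largest weight keeps the smoothed line close to the data while still damping month-to-month noise.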

I hope you really like it.

1.0.7 Much easier data entry; better support for non-English text

July 28th, 2011

It is now a lot easier and more pleasant to enter data directly into SOFA. Check it out and see if you agree. It is also easier to get CSV data in if there are lots of fields. Overall this is an incremental step forwards rather than the introduction of lots of new features. Here is the full list of improvements:

  • Much easier and quicker data entry. Return key now functions like Tab in data entry tables. Deleting a cell automatically inserts the appropriate value.
  • Much faster importing of csv files with lots of fields. Now has option of quickly checking field names collectively (in batches under the surface) rather than individually.
  • Improved feedback to user if problem in early stages starting SOFA. Program now makes an error text file on the user desktop as well.
  • All field or table name checks in SQLite now return the SQLite error text as well.
  • Better message to user if installation of wx backend for matplotlib missing.
  • If cancel process of changing file used to define variable config, report table display no longer reverts to random demo.

and bug fixes:

  • Fixed bug in chi square when no labels set for numerical variables. Needed to convert value to unicode before using as label.
  • Fixed bug when importing datetimes with ‘T’ as the separator between date and time.
  • Fixed bug caused by SQLite queries sometimes returning strings instead of floats when extracting REAL (numeric) data. Fixed it where it affected Row Stats medians and std devs; and statistical tests.
  • Fixed bug when uwhisker and lwhisker not set. Also copes better when no boxes are displayed in boxplot.
  • Handling Python 2.6 unicode keyword bug.
  • Replaced pprint.pformat where it messes up unicode e.g. user paths with non-ascii characters. Misc other changes to fix internal issues.
  • Fixed bug allowing None to be displayed in Val A and Val B drop-downs under Group by e.g. ANOVA.
  • Config dialog in Report Tables widened slightly when needed to display title.
  • Fixed bug when decimal entered into value label list for an integer field.
  • Fixed CSV import bug when trying to guess whether a header or not.
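As an illustration of the ‘T’ separator case, here is a minimal sketch of accepting both "2011-07-28 14:30:00" and "2011-07-28T14:30:00". The `parse_datetime` helper and the single supported format are assumptions; SOFA's importer handles more formats than this.

```python
from datetime import datetime


def parse_datetime(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' with either a space or a 'T' separator."""
    normalised = text.strip().replace("T", " ", 1)
    return datetime.strptime(normalised, "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(parse_datetime("2011-07-28T14:30:00"))
    print(parse_datetime("2011-07-28 14:30:00"))
```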

Thanks to all the users who helped identify and resolve problems.

Great new tutorial on Hypothesis Testing

June 11th, 2011

J David Eisenberg has written a great new tutorial on hypothesis testing and here is a guest post from him for the SOFA blog. Enjoy:

I teach a psychology research methods course at a local community college. Every semester, I see the students’ confusion about hypothesis testing and significance levels. Even at the end of the semester, there are always a few students who think that a statistical result with a probability of .001 must *not* be significant, because the number is so small.

I do explain the concept during one lecture, but that just doesn’t do the trick. I could write a web page with the explanation, but I’m sure I’d get a TL; DR [1] from the students. So, I decided to make the explanation in the form of a visual novel [2] (VN). I used the Ren’py [3] visual novel engine to create the script, and it worked fine. The problem is, you need to download a fairly large file in order to display the VN; again, something that students would probably not be eager to do. The solution, which Grant [Developer of SOFA Statistics] suggested, was to make it all web-based. After some failed experiments with canvas and SVG, I was able to achieve the effects I wanted with HTML, CSS, and JavaScript. The result is at

Tutorial on hypothesis testing


50,000 downloads – another milestone

June 6th, 2011

SOFA Statistics had its 50,000th download today, and last month had over 4,600 downloads – a new record.

Download milestone

Box and Whisker Plots in version 1.0.6

June 4th, 2011

Box and Whisker plots have been added.

Box and Whisker Plot button

In keeping with the SOFA ethos these have been made as attractive as possible:

Box and Whisker Plot

As with all SOFA output, the Box and Whisker Plots are themed and, like other charts, the content is dynamic and interactive. A minor feature added to this version is a warning about the need to include the “sofa_report_extras” subfolder to see charts.
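For readers who want to know what goes into a box and whisker plot, here is a sketch of the underlying numbers using Python's standard `statistics` module. The 1.5 x IQR whisker rule and the "inclusive" quartile method are common conventions that I am assuming here; they are not necessarily the exact rules SOFA applies, and the key names are illustrative.

```python
import statistics


def box_plot_stats(values):
    """Return quartiles plus whisker ends using the common 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the limits.
    inside = [v for v in data if low_limit <= v <= high_limit]
    return {
        "lwhisker": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "uwhisker": inside[-1],
        "outliers": [v for v in data if v < low_limit or v > high_limit],
    }


if __name__ == "__main__":
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```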

The following bug fixes have been made:

  • Fixed bug preventing comma being used as decimal separator when importing data.
  • Fixed bug when making line charts with averages – shows smoothed data line and trend line appropriately. Also only enables or disables checkboxes for smoothed data line and trend line where appropriate.
  • Better handling of comtype errors.
  • Fixed bug in histogram code when no default limits supplied.
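The decimal comma case is easy to picture with a small sketch. The `parse_number` helper below is hypothetical and deliberately simple: it only swaps the decimal separator and does not try to handle thousands separators.

```python
def parse_number(text, decimal_sep=","):
    """Convert text such as '3,14' to a float, given the decimal separator in use."""
    cleaned = text.strip().replace(decimal_sep, ".")
    return float(cleaned)


if __name__ == "__main__":
    print(parse_number("3,14"))
    print(parse_number("2.5", decimal_sep="."))
```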

New feature – Chart averages in version 1.0.5

May 16th, 2011

Do you want to make a line chart of average income by month? Or a bar chart of average height by country? The latest release of SOFA lets you do that very easily:

Chart Averages
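Behind a chart of averages is a simple group-and-average step. Here is a minimal sketch of that calculation in plain Python; the field names and rows are made up, and SOFA itself works against your database tables rather than lists of dictionaries.

```python
from collections import defaultdict


def average_by_group(rows, group_key, value_key):
    """Mean of `value_key` for each distinct `group_key`, skipping missing values."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        value = row.get(value_key)
        if value is None:
            continue
        totals[row[group_key]][0] += value
        totals[row[group_key]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


if __name__ == "__main__":
    rows = [
        {"month": "Jan", "income": 100},
        {"month": "Jan", "income": 300},
        {"month": "Feb", "income": 50},
        {"month": "Feb", "income": None},  # missing value is ignored
    ]
    print(average_by_group(rows, "month", "income"))
```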

Other improvements include:

  • Tab and Enter now work in a more intuitive way when in a settings grid. New rows which are incomplete produce a useful warning to the user.
  • Better support for Right-to-Left language locales.
  • Regression line intercept and slope included in Pearson’s Correlation output as well as a better regression line on the scatterplot.
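For reference, Pearson's r and the least-squares regression line come out of the same three sums. The sketch below shows the textbook calculation in plain Python; it is not SOFA's implementation, and the function name is my own.

```python
from math import sqrt


def pearson_and_regression(xs, ys):
    """Return (r, slope, intercept) for the line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


if __name__ == "__main__":
    r, slope, intercept = pearson_and_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"r={r:.3f}, slope={slope:.3f}, intercept={intercept:.3f}")
```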

A number of bugs have been fixed as well:

  • Fixed bug occurring when user selects a google docs spreadsheet and clicks on download after manually selecting a spreadsheet but without manually selecting a worksheet.
  • Fixed bug with report tables which occurred sometimes when a field value was the same as a field name.
  • Fixed bug when pie chart label has new line characters in it.
  • Adjusted y-axis placement on bar charts so not truncated in some cases.
  • When importing ODS data without a header, and responding to a data type inconsistency with Cancel, the progress bar is now reset to the start.
  • Fixed bug when saving projects in Windows with different name (albeit the same name if case insensitive). Would not appear in the Project Selection list and would be unable to reopen the Project Selection dialog.

Easier to “Get Started” and to give feedback

April 13th, 2011

It is easier than ever to get started with SOFA Statistics. A new “Get Started” button has been added to the main form and other buttons have been shifted to better emphasise the most important:

New "Get Started" button

Clicking on the “Get Started” button opens a web page with screen shots and step-by-step examples.

It is now easy for first-time users to give their feedback. Was it as useful as they hoped? Is there anything which can be done to improve SOFA? There is a link on the main start form, plus a pop-up option on first exit from SOFA.

Feedback via simple Google Docs survey

The goal is to make SOFA more useful by finding out what worked, and what didn’t, for users – especially first-time users.

There were two other changes:

  • When importing csv files, SOFA now sets the default for ‘Has header row?’ according to a review of the sample contents.
  • And an Exit control has been added to all forms where appropriate.
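To give a feel for how a header row can be guessed from a sample, here is a sketch using the heuristic built into Python's standard `csv` module. This is a stand-in for illustration: how SOFA reviews the sample contents is not described here, and the `guess_has_header` wrapper is hypothetical.

```python
import csv


def guess_has_header(sample_text):
    """Guess whether the first row of a CSV sample is a header row."""
    try:
        return csv.Sniffer().has_header(sample_text)
    except csv.Error:
        # The sniffer could not work out the dialect; default to "no header".
        return False


if __name__ == "__main__":
    with_header = "name,age\nAnn,34\nBob,45\nCy,29\n"
    without_header = "Ann,34\nBob,45\nCy,29\n"
    print(guess_has_header(with_header))
    print(guess_has_header(without_header))
```

Any heuristic like this can guess wrong, which is why it makes sense to use the result only as the default for ‘Has header row?’ and leave the final call to the user.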

Once again, there have been a number of bug fixes:

  • Fixed postgresql quoting error by using single quote values.
  • SOFA now uses the default database when making an initial connection. If connecting to a project fails, SOFA reverts to previous project. Postgresql projects insist on default database if user is not ‘postgres’. Gives useful error if an old project has this problem.
  • Fixed minor bug in Mann-Whitney output exposed whenever labels with %s in them were used.
  • Improved error trapping if error importing wx.lib.iewin.
  • Better font for help text on main form according to platform.
  • CSV importing copes with single-row data.
  • Better font settings for help text on Macs.
  • Fixed display bug in Mac when more than 20 values warning shown (smaller font now).
  • “Chart by” now filters out data lacking values in the chart-by variable.

The most important thing for the project is the feedback we will hopefully receive.