Archive for the ‘open source’ Category

Great new SOFA teaching resource

Saturday, January 28th, 2017

Thanks to George Self there is a great new teaching resource available for SOFA users. Here is George’s announcement, repeated from the discussion group:

I teach an undergrad research methodology class and wrote a SOFA-based lab manual for that class that some of you may be interested in. You can find the manual and the data sets at

The manual has ten chapters:

  1. Introduction (data types, normal distribution, kurtosis, skew, null hypothesis, downloading/installing SOFA, recoding data)
  2. Central Measures (mean, median, mode)
  3. Data Dispersion (range, quartiles, standard deviation)
  4. Visualizing Dispersion (box charts)
  5. Frequency Tables (frequency tables, crosstabs, complex crosstabs)
  6. Visualizing Frequency (histogram, bar chart, clustered bar chart, pie chart, line graph)
  7. Correlation (pearson’s r, spearman’s rho, significance, scatter plots)
  8. Regression
  9. Hypothesis Testing: Nonparametric Statistics (SOFA Statistics Wizard, Kruskal-Wallis H, Wilcoxon Signed Ranks, Mann-Whitney U)
  10. Hypothesis Testing: Parametric Statistics (ANOVA, t-test-Independent, t-test-Paired)

There are also two appendices: the first is a data dictionary for each of the data sets used, and the second covers the various report-generating features of SOFA.

The lab manual covers all of the functions and features of SOFA, but in the context of a lab where those functions are practiced rather than just described. The manual also includes a lot of information about how the various statistical measures are used (for example, the difference between correlation and causation). No math knowledge beyond simple high school algebra is assumed on the part of the student and each of the labs includes a “deliverable” activity so instructors can use this as part of a class.

I’ve printed this manual under Creative Commons-BY-ShareAlike so please feel free to use this in any way you want. Of course, I’m also happy to receive comments that could help me improve this manual in the future.


Please give the resource a spin and provide George with any feedback that can improve/refine it. Once again, thanks George for making this available to the community 🙂

Using SOFA on multiple machines with synced config and data

Tuesday, April 26th, 2016

One of the nice things about open source software like SOFA Statistics is that you can freely install it on as many machines as you like without licensing issues, complex validation etc. But how do you keep your content in sync across all the different devices? One approach is to keep the user sofastats folder in a synced drive.
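As a rough sketch of that approach (not SOFA's own mechanism), the folder could be moved into the synced drive and replaced with a symlink on the first machine. The function name and both paths in the usage comment are assumptions for illustration, as is the idea that SOFA will happily follow a symlinked folder. Back up the folder before trying anything like this.

```python
from pathlib import Path
import shutil


def link_into_synced_drive(local_folder: Path, synced_root: Path) -> Path:
    """Move local_folder under synced_root and leave a symlink in its place.

    Returns the real location of the folder inside the synced drive.
    """
    if local_folder.is_symlink():
        # Already linked on this machine, so there is nothing to do.
        return local_folder.resolve()
    target = synced_root / local_folder.name
    if target.exists():
        # Refuse to overwrite a copy that is already in the synced drive.
        raise FileExistsError(f"{target} already exists in the synced drive")
    shutil.move(str(local_folder), str(target))
    local_folder.symlink_to(target.resolve(), target_is_directory=True)
    return target


# Hypothetical usage; adjust both paths to your own setup:
# link_into_synced_drive(Path.home() / "sofastats", Path.home() / "Dropbox")
```

On a second machine the synced copy already exists, so only the symlink step would be needed there. On Windows, creating symlinks may require elevated privileges or developer mode.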

Version 1.4.4 good for Mac Users

Monday, May 11th, 2015

Mac users can finally export output in PDF format, including individual charts and report tables. And, depending on the version of OS X, Mac users will also be able to export as PNG images. Ensuring image exporting works across multiple versions of OS X is an ongoing project which interested users can help with.



Users needing to produce monochrome output for publication will like the addition of a new monochrome theme.


All New Features in 1.4.4:

  • Mac users can export output in PDF format (and PNG depending on version of OS X).
  • Added new monochrome theme.
  • Chi Square proportions output much easier to interpret successfully.
  • The name of the grouping variable is now displayed when running comparisons of groups e.g. Country if comparing Italy and Germany.
  • Exporting to spreadsheet detects if too many fields for xls output and informs user that only csv will be generated. Also truncates table name so worksheet name not too long.
  • Import dialog only displays file types suitable for importing.
  • Added message to let user know spreadsheet creation being skipped if no report tables to export.
  • More user help on need for raw data (not pre-summarised) and long-format vs wide-format data as appropriate.
  • Code reorganisation to make it possible for SOFA to be called in GUI form by external code.
  • Scripts are now easier to use for standalone purposes.
  • Added note about treatment of datetime data as categorical by SOFA for purposes of statistical tests.
  • When exporting to spreadsheet and csv changes reserved sofa_id field name to was_sofa_id so it is OK to reimport after changes.
  • More informative for a larger range of potential problems e.g. database engine not functioning.

Bug Fixes:

  • Fixed bug resulting in Pearson’s r being displayed instead of Spearman’s rho.
  • Fixed bug on some systems when saving a worksheet with spaces in name.
  • Prevented numerous bugs related to quoting table names, fully qualified file names etc.
  • Fixed bug with misuse of escape_pre_write on python code rather than normal content.
  • Skew and normality tests now cope with the nan issue better e.g. sqrt of a negative number. SOFA now just says it is unable to calculate instead of displaying nan (not a number). The skewtest function now copes with a negative number as input to the square root.
  • Fixed bug when importing NaN text – now treated as a missing value in a numeric field.
  • Removed bug which sometimes prevented Mac users from being able to successfully change the report name.
  • Stopped making export folder if no output to export into it.

SOFA Shows Support For Linux

Thursday, April 10th, 2014

SOFA works on Windows, Mac, and Linux. But Linux is especially important for the project because SOFA is developed on Ubuntu. So it made sense to support the Linux ecosystem by signing up with the Open Invention Network. In an ideal world, it wouldn’t be necessary to have anything to do with software patents. For various reasons, they’re a bad idea and function more to inhibit innovation than encourage investment in software research and development. But the Open Invention Network plays a protective function in a world where people who create and actually make things can be preyed upon by parasites who have been granted monopolies on ideas – the so-called patent trolls.

The group was created to defend Linux from patent trolls and other attacks from patent holders. It tries to do this with its own patents which are then available royalty-free to any company, institution or individual that agrees not to assert its patents against Linux. While it hasn’t been done, these patents could also, in theory, be used by the OIN, or an OIN member, against a hostile company in a patent war.

Google joins Open Invention Network patent commons as a full member

Anyway, a range of companies and projects large and small (over 800 at present and growing) have signed up for the initiative including Google, Dropbox, IBM, Canonical, Mozilla, Twitter, Puppet Labs, Valve Software, Alfresco, NEC, Blender, OpenShot, Novell, Inkscape, Philips, Red Hat, CentOS, GNOME, Wikimedia, MariaDB Foundation, Rackspace, Moodle, Openstack, Slackware, Tor, and Sony. You get the idea.

SOFA exports high-resolution results and more

Sunday, February 9th, 2014

SOFA users can now export their output results as high-resolution images, PDFs, and spreadsheets ** without requiring additional plug-ins.

Easy high-quality exporting

It has never been easier to produce high-quality output ready to include in presentations, papers for publication etc.

High resolution output

It will also be possible to export table data. If there is a specialised analysis you can’t do in SOFA it will be much easier to export the data and import it into another stats package for that part of the process.

Export to spreadsheet from data

And the ability to backup your SOFA data and settings is built in.

Backup data and settings

So 1.4.2 is quite a major step forwards for the majority of users. I really hope you like it and spread the word.

** Doesn’t work for Macs currently – very sorry :-(. Any Mac users with Python experience are encouraged to contact the project – there are several ways you might be able to help SOFA resolve this problem. And get a little famous ;-).

CUBRID donation helps SOFA project

Tuesday, November 20th, 2012


More good news for the SOFA Statistics project – CUBRID recently donated $300 to SOFA Statistics. CUBRID is “a comprehensive open source relational database management system highly optimized for Web Applications” and SOFA recently added a plug-in to connect to CUBRID databases.

It was also great to see the support that CUBRID gave to a range of other projects. One of the interesting things was the range of countries represented: New Zealand (SOFA Statistics), Switzerland, South Korea, Kenya, Russia, Australia, Denmark, Romania, China, Spain, Germany, Indonesia, the Netherlands, and the USA. Truly a global effort.

So on behalf of the SOFA project, thanks 🙂

SOFA Wins People’s Choice Award

Friday, November 9th, 2012

Great news! SOFA Statistics won the 2012 People’s Choice Award in the NZ Open Source Awards. Thanks to everyone who voted in support.

NZOSA Trophy

In addition to the trophy and framed certificate I was lucky to get a nice new Android tablet from ZaReason. Busy playing with that at the moment.

And SOFA was also a finalist for the Best Open Source Project award.

NZOSA Awards

So it was a great awards ceremony for the project.

Awards Ceremony Speech

Video of Open Source People’s Choice Award (Presented by ZaReason)

Video of finalists for Open Source Software Project (SOFA one of 3 finalists)

FLOSS for Science Interview

Friday, October 12th, 2012

I was lucky enough to get interviewed by FLOSS for Science. Check it out 🙂

FLOSS for Science Interview

Vote for SOFA please

Friday, October 12th, 2012

SOFA has been nominated for The People’s Choice Award as part of the New Zealand Open Source Awards. I would really love it if as many people as possible voted for SOFA at The People’s Choice Award. Tell your friends; spread the word.

New Zealand Open Source Awards

SOFA Statistics and Open Source Business – Misc

Saturday, November 12th, 2011

The way ahead for SOFA Statistics from a business point of view is not clear at the moment and I recently wrote about some of the issues and options here: Finding a Viable Open Source Business Model – The SOFA Statistics Experience (so far). This post is a complement to that article and the purpose is to let me store miscellaneous ideas and links of relevance without having to integrate them into a coherent narrative.

Impact of cost of downtime on value of support

In my case we have three steel mills worth $10k+ per hour of downtime… Even more if downtime causes rework. If we have more than an hour down I have vice presidents in my boss’s office!

Scale of Customers Matters

The company received a significant boost in late 2011 when it was contacted by Mexico’s largest telecommunications provider, Telmex. That company had been struggling to implement a network management system from another commercial supplier.

“The Telmex guys saw that NMIS was a commercially-supported product, then downloaded the free version and put it in,” Maher said. “Within a day they had it doing all the stuff that they hadn’t been able to achieve with the other product for the previous nine months. They then reached out to us and pre-purchased six commercial modules which we were yet to release, and support and our assistance for implementation.”

The initial contract is estimated as nearly half a million dollars in value, but could expand into a multimillion dollar deal for Opmantek. Maher said the deal had essentially underwritten Opmantek for the next year.

Open source opens doors for Aussie start-up

App Stores not a silver bullet

See Striking It Rich In The App Store: For Developers, It’s More Casino Than Gold Mine

And mobile apps aren’t the answer, even though the sector is growing like crazy. See section entitled “ABOUT THOSE APPS” in

Hard-nosed Realities

So the HP guy comes up to me (at the Melbourne conference) and he says, ‘If you say nasty things like that to vendors you’re not going to get anything’. I said ‘no, in eight years of saying nothing, we’ve got nothing, and I’m going to start saying nasty things, in the hope that some of these vendors will start giving me money so I’ll shut up’. [Quote supposedly from Theo de Raadt – verify]

Impact of different usage patterns

The conventional wisdom on how a business model works is sometimes completely wrong in a particular case. The freemium model, for example, which I am hoping to use with SOFA, was apparently not going to work for Evernote – except it did (Evernote: Company of the Year):

Evernote was being pitched as a so-called freemium service. In other words, people could either use it for free or upgrade to a paid premium version, which is how the company would make money. So far, so good; the freemium model was seen as a smart one. The problem was that, unlike virtually all other entrepreneurs relying on that model, Libin refused to cripple the free version, removing the incentive to upgrade to the paid version. You could pay $5 a month and get additional file storage, but why would anyone do that? asked the VCs. The free version was full featured and offered generous storage.

Libin explained his theory: The more stuff you put in Evernote, the more important the service would be to you. Who would begrudge $5 a month to a company that was storing your memories and helping you retrieve them? “Your notes, your restaurants, your friends, a year of your life, then years of your life,” says Libin. “That’s worth thousands.” The danger wasn’t that people wouldn’t upgrade, he argued; it was that they wouldn’t try the service in the first place or wouldn’t stick with it because the free version was skimpy and failed to impress. Get them to fall in love with the service, and they would eventually pay, because they would be invested in its success. “I want to build a 100-year company, and I’m serious about that,” says Libin. “I don’t need to squeeze money out of you. I’ll have the rest of your life to take your money. It’s my long-term greedy strategy. Our slogan is, ‘We’d rather you stay than pay.’ Basically, I wanted a business model that rhymed.”

And …

Libin showed the group that the rate at which Evernote users were upgrading to the paid version within a month of signing up was half a percent. This was not good—and not surprising, given that the free version worked fine. But then Libin showed the upgrade rates over longer periods of time. Normally, this would be an even grimmer picture, because at almost all companies with freemium models, users who upgrade tend to do so pretty quickly. They sample the hobbled free version, and if they like it, they upgrade right away to get all the features; if they don’t like it enough to upgrade, they tend to abandon the service altogether or use it lightly. But Libin showed that Evernote users became more likely to upgrade over time. For those users who had been using Evernote for a year, the upgrade rate was an impressive 8 percent. If Evernote could get to a million users, explained Libin, sales would be close to $4 million a year. And, at the current growth rate, Evernote would reach 10 million users within two years.

Then Libin showed activity rates, or, roughly, how often an average user was actually using Evernote over time. For many software companies, that curve runs relentlessly downward. Most people who try an app abandon it pretty quickly or use it less frequently as time goes on. But for Evernote, the curve was a smile. There was a slight drop-off in usage after the first few months, but then it went up again—not only because active users were finding the service more and more useful, but also because customers who had stopped using the service were returning to it. People who left Evernote missed it.
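The million-user projection in the excerpt can be checked with a few lines of arithmetic. This is only a sketch: it applies a single upgrade rate to every user, the function name is made up for illustration, and the $45 annual price is an assumption that does not appear in the quote (only the $5 a month figure does).

```python
def annual_revenue(users: int, upgrade_rate: float, price_per_year: float) -> float:
    """Yearly sales if a fixed share of users pays a fixed yearly price."""
    paying_users = round(users * upgrade_rate)
    return paying_users * price_per_year


users = 1_000_000

# First-month upgrade rate from the quote (half a percent) at $5 a month.
first_month_rate = annual_revenue(users, 0.005, 5 * 12)

# One-year upgrade rate from the quote (8 percent) at $5 a month.
monthly_plan = annual_revenue(users, 0.08, 5 * 12)

# Same rate with an ASSUMED discounted annual price of $45.
annual_plan = annual_revenue(users, 0.08, 45)

print(first_month_rate, monthly_plan, annual_plan)
```

At 8 percent and $5 a month this gives $4.8 million a year, and $3.6 million at the assumed annual price, which brackets the "close to $4 million" in the quote. At the half-percent first-month rate the same million users would bring in only $300,000.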

Importance of finding your own business model which works for you

NB the value you add may be around the software, rather than the software itself. The keynote linked here also notes that open core is very similar to proprietary models.

FOSS4G 2011 Keynote

Visits-Downloads-Sales Process

One has to be careful about drawing conclusions from a relatively small and unverifiable data set. However the results certainly seem to support the much-quoted “industry standard” sales:visits conversion ratio of 1%. But there are huge variations between products.

The fact that the sales:downloads ratio is both lower on average and more variable than the downloads:visitors ratio implies that getting people to download is the easy bit and converting the download to a sale is a tougher challenge.

The average sales:visits conversion ratio is noticeably higher for Mac OS X products than Windows products. This is supported by anecdotal evidence and the author’s own experience with a cross-platform product. However the number of Mac respondents to the survey is too small for the result to be stated with any great confidence. Also remember that the Mac market is still a lot smaller than the Windows market before you rush off to start learning Cocoa and Objective-C.
The truth about conversion ratios for downloadable software
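The three ratios discussed in that excerpt are linked: sales:visits is simply downloads:visits multiplied by sales:downloads. A minimal sketch, using made-up figures rather than anything from the survey:

```python
def conversion_ratios(visits: int, downloads: int, sales: int) -> dict:
    """Return the three funnel ratios for one product."""
    return {
        "downloads:visits": downloads / visits,
        "sales:downloads": sales / downloads,
        "sales:visits": sales / visits,
    }


# Hypothetical product: 10,000 visits, 2,000 downloads, 100 sales.
ratios = conversion_ratios(visits=10_000, downloads=2_000, sales=100)
for name, value in ratios.items():
    print(f"{name}: {value:.1%}")
```

With these invented numbers a 20% downloads:visits ratio and a 5% sales:downloads ratio combine to give the often-quoted 1% sales:visits figure.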

Desktop Apps – Harder Sales Funnel?

Someone visits your website, downloads your trial, and hopefully purchases your program. That process is called a funnel, and if you break it down into concrete steps, the shareware funnel is long and arduous for the consumer:

  1. Start your web session on Google, like everyone does these days.
  2. Google your pain point.
  3. Click on the search result to the shareware site.
  4. Read a little, realize they have software that solves your problem.
  5. Mentally evaluate whether the software works on your system.
  6. Click on the download button.
  7. Wait while it downloads.
  8. Close your browser.
  9. Try to find the file on your hard disk.
  10. Execute the installer.
  11. Click through six screens that no one in the history of man has ever read.
  12. Execute the program.
  13. Get dumped at the main screen.
  14. Play around, fall in love.
  15. Potentially weeks pass.
  16. Find your way back to the shareware site. Check out price.
  17. Type in your credit card details. Hit Checkout.

I could go into more detail if I wanted, but that is seventeen different opportunities for the shareware developer to fail.

Why I’m Done Making Desktop Applications
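To see why a seventeen-step funnel is so punishing, it helps to multiply the odds through. The sketch below assumes, purely for illustration, that each step independently keeps 90% of the people who reached it. Real steps will vary widely and the figure is not taken from the article.

```python
from math import prod


def funnel_completion(step_rates):
    """Share of visitors who survive every step of the funnel."""
    return prod(step_rates)


# ASSUMPTION: all 17 steps retain 90% of the people who reach them.
overall = funnel_completion([0.9] * 17)
print(f"{overall:.1%}")  # 16.7%
```

Even with a generous 90% at every step, fewer than one visitor in five makes it to the end, so removing steps can matter more than polishing any single one.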

Tracking Usage

On the Internet, privacy expectations have evolved a bit in the last few years. The overwhelming majority of the public has been told that they’re being tracked via cookies and could not care less. If you write a privacy policy, they won’t even bother reading it. Which means that you can disclose in your privacy policy that you track non-personally identifying information, which is very valuable as a software developer.

  • What features of your software are being used?
  • What features of your software are being ignored?
  • What features are used by people who go on to pay?
  • What combination of settings is most common?
  • What separates the power users from the one-try-and-quit users?

Tracking all of these is very possible with modern analytics software like, e.g., Mixpanel. You can even wrestle the information out of Google Analytics if you’re prepared to do some extra work. You can do it in a way which respects your users’ privacy while still maximizing your ability to give them what they want.

Why I’m Done Making Desktop Applications
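The questions in that list boil down to counting events. The sketch below is not the Mixpanel or Google Analytics API. It only assumes you already have anonymous (user id, feature) records from somewhere, and every name and record in it is hypothetical.

```python
from collections import Counter


def feature_usage(events):
    """Count how often each feature appears in (user_id, feature) events."""
    return Counter(feature for _, feature in events)


def ignored_features(events, all_features):
    """Features that never appear in the events at all."""
    used = {feature for _, feature in events}
    return sorted(set(all_features) - used)


def usage_by_payers(events, paying_users):
    """Feature counts restricted to users who went on to pay."""
    return Counter(feature for user, feature in events if user in paying_users)


# Hypothetical, non-personally-identifying event log.
events = [("u1", "chart"), ("u2", "chart"), ("u2", "export"), ("u3", "import")]
print(feature_usage(events))
print(ignored_features(events, ["chart", "export", "import", "recode"]))
print(usage_by_payers(events, paying_users={"u2"}))
```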

The Risk of Starting a Business/Chasing a Dream

Death of a Startup

Selling Software is H.A.R.D

I can’t think of a way of guaranteeing that you can feed yourself whether or not you open-source your code! Making it as an independent software vendor is hard. Above you, you have big companies who like money and won’t hesitate to offer similar software, independently developed, if it looks like you’ve found a good market. Below you, you have FLOSS developers who won’t hesitate to offer similar software for free if it looks like your software offers useful features for users. (In some cases, these groups may overlap.)

That said, you haven’t given us anywhere near enough information to answer your question. Are you talking about highly specialized software for a niche market, or general purpose software with a potentially huge market? The edge-effects of open-source development are much more likely to be useful and beneficial to you in the latter case.

What do you get out of open-sourcing your software? Free publicity is almost certainly the biggest factor. How big is your advertising budget? Also, what about distribution channels? Remember, you’re competing with big companies and (if you go the non-free route) open-source developers/companies. How are people going to hear about your software, and find it if they do hear about it, and decide if they like it better than other similar software?

Making your code proprietary greatly increases your per-user income, but makes it much more difficult (and expensive) to get new users. Open-sourcing your code makes it much easier to get new users, but greatly reduces your per-user income. Independent comic artist Phil Foglio started putting his Girl Genius comic up as a free webcomic, and said that his readership grew tenfold and his sales quadrupled. But that may or may not be typical.

There’s also the possibility of hybrid models, like releasing the core as open source, but charging for add-ons, or, if you think other companies may want to adapt and sell your code, offering a choice between a restrictive free license (e.g. GPL) or a commercial for-pay license. Depending on what your program is and how it works, those may or may not be viable options–you haven’t given us enough information to tell.

Bottom line, though: all the cards are stacked against you no matter which way you go. And, while you’ve given us very little to go on, it’s quite likely that even if you gave us ten times the details you have so far, it still wouldn’t be enough information to make more than a wild guess. Going it independent is hard and extremely risky. There’s a reason that something like 90% of all programmers are employed developing internal software that never gets licensed or distributed outside of a single company–it’s one of the few ways to be sure you eat.

Staggered release open sourcing

I believe parent has nailed it.

Ethically you want to do what is closest to your heart if you will, but unfortunately you need to eat, and usually this involves doing the opposite of ethical (or at least far from what the ideal-ethics tell you)

So I propose this. How about you release version 1.0 and 1.5 for example (or 1.0 and 2.0 or something) as regular closed-source software, and then when the next version comes out, you release the previous one as open source (e.g. release 1.0 and 2.0 for pay, when you release 3.0 for licensing you release at the same time v1.0 as open source)

this is what trolltech, mysql and other companies did. it never goes down well. it’s _extremely_ unpopular, and absolutely guarantees that there will be no community *other* than paid-up staff members involved in the actual development of the software.

the reason is very simple: any person wishing to help make improvements to the software knows full well that they might as well not bother, because the free software version that they’re using is hopelessly out-of-date.

in the case of QT, what actually happened was that the version 3 of QT (QT3) actually developed into an independent fork. the trinity desktop team now have taken full responsibility for its maintenance. bit of a digression here, but that version is years old, _but_ it has the advantage that it’s much much smaller (faster, less code) than QT4 or QT5. QT4 is severe bloat-ware that performs extremely badly on ARM9 and ARM11 platforms.

anyway the point is: the “model” you propose only really works if you’re a large corporation with lots of resources and lots of money and are willing to piss people off and make even the free software community absolutely desperate and beholden to you. that works for things like mysql and qt but dude, your software had better be _really_ shit hot to make these non-community-inclusive options work.

More business models

Monthly subscriptions: meet the funding target or stop developing (Will You Help Change The Way Open-Source Apps are Funded?)