Statistics Open For All

SOFA - Statistics Open For All

The user-friendly, open-source statistics, analysis & reporting package.

If you have trouble analysing your variables in SOFA Statistics, check that:

- Your data is structured the right way for the analysis you want. For example, if SOFA needs one column for gender and one column for height, the analysis will not work if your data instead has one column for male height and another for female height.
- Any variables you need to analyse as numbers, e.g. for correlation analyses or histograms, have actually been entered or imported as numeric data, not as text.

The first step is to think about what you want to find out about the data. Here are some examples.

To compare groups, instead of one column per condition or group, there needs to be a group column and a measures column.

Example of a bad format (for SOFA):

Male    Female
186     167
179     170
...

Example of a good format (for SOFA):

Gender  Height
Male    186
Female  167
Male    179
Female  170
...

In this case, the ranked or averaged variable would be Height, the Group By variable would be Gender, and groups a and b would be Male and Female respectively.
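To show why the group column plus measures column layout is convenient, here is a minimal Python sketch that averages Height for each Gender using the example rows above. It is only an illustration of the data layout, not how SOFA itself is implemented; the function name `mean_by_group` is made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Rows in the "good" format: one group column (Gender), one measures column (Height).
rows = [
    ("Male", 186),
    ("Female", 167),
    ("Male", 179),
    ("Female", 170),
]


def mean_by_group(rows):
    """Average the measure (second field) for each value of the group (first field).

    Hypothetical helper for illustration only.
    """
    values = defaultdict(list)
    for group, measure in rows:
        values[group].append(measure)
    return {group: mean(measures) for group, measures in values.items()}


group_means = mean_by_group(rows)
print(group_means)
```

Because every row carries its own group label, the same code works unchanged for three or more groups.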

Or if we were looking at the fictitious weight data in the demonstration data and we wanted to know if it differed between two countries:

E.g. looking at linear correlation:

Age  Weight
56   86
22   55
...

In the appropriate SOFA dialog you would select one variable as A and the other as B.

E.g. looking to see if there is a difference between fuel consumption before a fuel gadget was added and afterwards:

NB each row would be the data for one vehicle (or one type of vehicle etc depending on what was being studied).

Consumption (before)  Consumption (after)
12.5                  11.7
16.1                  16.0
...

Or a difference in weight before and after a diet:

NB each row would be the data for one person.

Weight  Post-diet Weight
87      90
59      59
...

In the appropriate SOFA dialog you would select one variable as A and the other as B.
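For paired data like this, the reason each row must hold both measurements for the same person (or vehicle) is that the comparison works on row-by-row differences. A minimal Python sketch, assuming the diet example values above; the name `paired_differences` is hypothetical and this is not SOFA's own code:

```python
# Each row is one person: (Weight, Post-diet Weight), as in the diet example.
paired_rows = [
    (87, 90),
    (59, 59),
]


def paired_differences(rows):
    """Return after-minus-before for every row (hypothetical helper)."""
    return [after - before for before, after in rows]


differences = paired_differences(paired_rows)
print(differences)
```

If the before and after values were stored in separate tables, or in a different row order, these differences could not be matched to the right person.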

The most common problem is data that stores each group in its own variable (column) instead of using one group variable and one measures variable.

E.g. height data for two genders:

Male    Female
186     167
179     170
...

The easiest way to handle this might be to change the data in a spreadsheet and import it in the restructured form.

- Delete the variable not needed (Female in this case)

NB You could have used 1 for Male and 2 for Female if you preferred, and then added value labels to Gender once the data was imported into SOFA Statistics. See "Setting variable details e.g. labels".

The same process can be used if there are multiple groups e.g. countries instead of genders.
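If you would rather restructure the data with a script than by hand in a spreadsheet, the same transformation can be sketched in a few lines of Python. This is an assumption-laden illustration, not a SOFA feature: the function `wide_to_long` and its parameter names are invented for this example, and it treats each spreadsheet row as a dictionary keyed by column name.

```python
def wide_to_long(wide_rows, group_name="Gender", measure_name="Height"):
    """Turn one-column-per-group rows into one row per (group, measure) pair.

    Hypothetical helper: column names become the values of the group column.
    """
    long_rows = []
    for row in wide_rows:
        for group, value in row.items():
            if value is None or value == "":
                continue  # groups may have different numbers of observations
            long_rows.append({group_name: group, measure_name: value})
    return long_rows


# The "bad" format: one column per gender.
wide = [
    {"Male": 186, "Female": 167},
    {"Male": 179, "Female": 170},
]

long_rows = wide_to_long(wide)
for row in long_rows:
    print(row["Gender"], row["Height"])
```

For countries instead of genders, you would pass something like `group_name="Country"`; the column names in the wide data then become the country values.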

If you imported your data into SOFA from a spreadsheet, the solution is probably to change the appropriate column data types to numeric and reimport the data. SOFA tries to warn you if it doesn't detect enough numeric variables for the analysis you are conducting e.g. you need at least two numeric variables to conduct a Pearson's R linear correlation analysis.
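Before reimporting, it can help to find out which cells are stopping a column from being treated as numeric. The following Python sketch is one rough way to check; the sample values and the helper names `to_number` and `non_numeric_cells` are made up for illustration, and SOFA's own import checks may work differently.

```python
def to_number(text):
    """Return the cell as a float, or None if it cannot be read as a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def non_numeric_cells(column):
    """List the cells in a column that would force it to be treated as text."""
    return [cell for cell in column if to_number(cell) is None]


# Hypothetical Age column as exported from a spreadsheet (everything is text).
ages = ["56", "22", " 41 ", "n/a", "3O"]  # note the letter O in the last cell

problems = non_numeric_cells(ages)
print(problems)
```

Note that Python's `float` also accepts strings such as "nan" and "inf", so a stricter check may be needed if those could appear in your data.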

help/data_structure.txt · Last modified: 2010/10/06 05:22 (external edit)