SOFA - Statistics Open For All
The user-friendly, open-source statistics,
analysis & reporting package.

# SOFA Statistics

# My Variables Won't Go Into SOFA

If you have trouble analysing your variables in SOFA Statistics, check that:

1. Your data is structured the right way for the analysis you want. For example, if SOFA needs a column for gender and a column for height, your data will not work if it has one column for male height and another for female height.
2. Any variables you need to analyse as numbers, e.g. for correlation analyses or histograms, have actually been entered or imported as numeric data, not as text.

## Structuring data for analysis

The first step is to think about what you want to find out about the data. Here are some examples.

### Types of SOFA Statistics analysis

#### Differences between groups

Instead of one column per condition or group, there needs to be one group column and one measure column.

Example of a bad format (for SOFA):

```
Male Female
186  167
179  170
...
```

Example of a good format (for SOFA):

```
Gender  Height
Male    186
Female  167
Male    179
Female  170
...
```

In this case, the ranked or averaged variable would be Height, the Group By variable would be Gender, and groups a and b would be Male and Female respectively.
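To see why this layout is convenient, here is a minimal Python sketch that averages Height within each Gender group, using the four example rows above. It is not SOFA code, and the variable names are illustrative only; it simply shows that one group column plus one measure column is enough to split the measure by group.

```python
from collections import defaultdict
from statistics import mean

# Long-format rows: one (group, measure) pair per person.
rows = [("Male", 186), ("Female", 167), ("Male", 179), ("Female", 170)]

# Collect the measure values under each value of the group column.
heights_by_gender = defaultdict(list)
for gender, height in rows:
    heights_by_gender[gender].append(height)

# Average the measure within each group.
group_means = {gender: mean(values) for gender, values in heights_by_gender.items()}
print(group_means)  # {'Male': 182.5, 'Female': 168.5}
```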

The same layout applies to the fictitious weight data in the demonstration data: to find out whether weight differed between two countries, weight would be the ranked or averaged variable and country would be the Group By variable.

#### Relationships between two different variables

E.g. looking at linear correlation:

```
Age  Weight
56   86
22   55
...
```

In the appropriate SOFA dialog you would select one variable as A and the other as B.

#### Difference between two "paired" variables

E.g. looking to see if there is a difference between fuel consumption before a fuel gadget was added and afterwards:

NB each row would be the data for one vehicle (or one type of vehicle etc depending on what was being studied).

```
Consumption (before)    Consumption (after)
12.5                    11.7
16.1                    16.0
...
```

Or a difference in weight before and after a diet:

NB each row would be the data for one person.

```
Weight  Post-diet Weight
87      90
59      59
...
```

In the appropriate SOFA dialog you would select one variable as A and the other as B.

The most common problem is data that stores each group in a separate variable (column) instead of using one group column and one measure column.

E.g. height data for two genders:

```
Male Female
186  167
179  170
...
```

The easiest way to handle this might be to change the data in a spreadsheet and import it in the restructured form.

1. Insert a group-by column (Gender).
2. Keep the first variable (Male) by renaming its column to the measure (Height) and filling the group-by column with the group name (Male) for those rows.
3. Transfer the second variable by pasting its height values below the first set and filling the Gender column with the group name (Female) for those rows.
4. Delete the column that is no longer needed (Female in this case).

NB You could use 1 for Male and 2 for Female if you prefer, and add value labels to Gender once the data has been imported into SOFA Statistics. See Setting variable details e.g. labels.

The same process can be used if there are multiple groups e.g. countries instead of genders.
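If you would rather script the restructuring than edit the spreadsheet by hand, the steps above can be sketched in Python using only the standard library. This is an illustration and not part of SOFA: the function name `wide_to_long` and the inline sample data are assumptions made for the example. It works for any number of group columns, so it covers the multiple-countries case as well.

```python
import csv
import io

def wide_to_long(rows, group_name, measure_name):
    """Turn one-column-per-group rows into (group, measure) rows.

    rows: list of dicts such as {"Male": "186", "Female": "167"}.
    Blank cells are skipped, so groups of unequal size are handled.
    """
    long_rows = []
    for row in rows:
        for group, value in row.items():
            if value is None or str(value).strip() == "":
                continue
            long_rows.append({group_name: group, measure_name: value})
    return long_rows

# Sample data in the "bad" layout: one column per gender.
wide_csv = "Male,Female\n186,167\n179,170\n"
wide_rows = list(csv.DictReader(io.StringIO(wide_csv)))

long_rows = wide_to_long(wide_rows, "Gender", "Height")

# Write the restructured data as CSV text, ready to save and import.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Gender", "Height"], lineterminator="\n")
writer.writeheader()
writer.writerows(long_rows)
long_csv = out.getvalue()
print(long_csv)
```

With real data you would read from and write to files rather than strings; the restructuring logic stays the same.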

## Numbers stored in a text variable

If you imported your data into SOFA from a spreadsheet, the solution is probably to change the appropriate column data types to numeric and reimport the data. SOFA tries to warn you if it doesn't detect enough numeric variables for the analysis you are conducting, e.g. you need at least two numeric variables to conduct a Pearson's R linear correlation analysis.
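Before reimporting, it can help to find the cells that stop a column from being read as numeric. The following is a rough Python sketch, not a description of how SOFA itself detects data types: the function name `non_numeric_cells` and the sample data are hypothetical, and "numeric" here simply means that Python's `float()` accepts the value.

```python
import csv
import io

def non_numeric_cells(csv_text, column):
    """Return (row_number, value) pairs in `column` that do not parse as numbers."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        value = (row[column] or "").strip()
        if value == "":
            continue  # treat blank cells as missing, not as errors
        try:
            float(value)
        except ValueError:
            problems.append((row_number, value))
    return problems

# Made-up sample: one stray unit suffix and one text placeholder.
sample = "Age,Weight\n56,86\n22,55kg\n31, 70 \nunknown,64\n"
age_problems = non_numeric_cells(sample, "Age")
weight_problems = non_numeric_cells(sample, "Weight")
print(age_problems)     # [(5, 'unknown')]
print(weight_problems)  # [(3, '55kg')]
```

Correcting the reported cells in the spreadsheet and then setting the column type to numeric should leave the column ready to reimport.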