SOFA
Statistics Open For All
SOFA - Statistics Open For All
The user-friendly, open-source statistics,
analysis & reporting package.

User Tools

Site Tools


help:recoding_data

Contents

Recoding Data

Introduction

Sometimes you need to change your data before you can analyse it. For example, you might have a field called age but you want to look at the percentages in different age groups. You might want 0-19 in one group, 20-29 in another, 30-39 in another, 40-64 in another, and finally 65+ in another.

How do you get from data like this:

To a report table like this:

The easiest way is to use the built-in recoding functionality of SOFA Statistics (see below). This makes it easy, for example, to map ranges of values to single values. If you are wanting to do something more complex, e.g. averaging the values from multiple fields, it is possible to do so using a spreadsheet before importing/reimporting, or SQLite Database Browser. Finally, if the dataset is small, there is the option of manual data entry.

Recoding in SOFA

  1. Click on the “Enter/Edit Data” button on the main SOFA form.
  2. Select a table in the default SOFA database “sofa_db” other than the read-only “demo_tbl”
  3. Click on the “Design” button because we are going to alter the design of the table by adding an agegroup field based on the “age” field
  4. Click on the “Recode” button
  5. Select the variable to recode (in this case, “age”) and enter a new variable name you wish to recode into (in this case, “agegroup”)
  6. Fill in the details
    1. Ranges use the keyword TO e.g. “150 TO 250”. All keywords must be upper case, so “TO” will work but “to” will not.
    2. “MIN” and “MAX” can be used in ranges e.g. “MIN TO 100”, or “100 TO MAX”. You can even use “MIN TO MAX” if you want to leave out missing values.
    3. “REMAINING” and “MISSING” are the two remaining keywords you can use e.g. if you want all missing values to become 99 you would have a line with From as “MISSING”, and To as 99
    4. Only one condition is allowed per line. So if you want to recode < =5 and 10+ to 99 you would have one line with
      “MIN TO 5” as From and 99 as To
      and another line with
      “10 TO MAX” as From and 99 as To.
      Clicking on the “Help” button gives access to built-in and online help
  7. Click on the “Recode” button to modify the table
  8. Please Note - this was a once-off recode - it won't be applied automatically when new rows are added or cells are edited.
  9. The table design has been altered
  10. If you open the table, you will see that the data has been altered as well. The labels you added are now part of your project and are automatically applied to fields of that name.
  11. Now your data is ready to analyse by age group

A video is available showing how to recode data: Recoding data video

More Sophisticated Recoding

Sometimes you need to do something involving multiple variables e.g. making a new variable from the average of three other variables. Or you may have some other, more sophisticated data manipulation requirements. The easiest way to do this is in a spreadsheet before importing (or reimporting) the data.

Using Spreadsheet Functions

Creating a standard function makes this very easy.

Using SQLite Database Browser

Another option is to manipulate data already inside SOFA. SOFA stores its data in an SQLite database called sofa_db. It will be stored in a folder like “C:\Documents and Settings\username\sofastats\_internal” or “/home/username/sofastats/_internal”. You can alter the data directly using the free and open source program SQLite Database Browser

Adding a New Variable

The following syntax works in SQLite (common field types are INTEGER, TEXT, and NUMERIC):

ALTER TABLE mytable ADD newvar INTEGER

Populating a New Variable with Data

The following syntax shows how flexible this approach is:

UPDATE mytable SET newvar = Total/2

or

UPDATE mytable SET newvar = (var1 + var2 + var3)/3

You can also use this approach to alter values in an existing variable. You can also restrict the changes using a WHERE clause e.g.

UPDATE mytable SET existingvar = “Invalid data” WHERE var1 > 100 OR var2 > 100

Anything Else You Can Imagine

Once you have started using SQL there is very little you cannot do in data manipulation. The SQLite SQL syntax documentation is here: SQL As Understood By SQLite

Contents

Wiki

help/recoding_data.txt · Last modified: 2011/04/07 17:04 (external edit)