help:pearsonsr

The Pearson's R Correlation Test (also called the Pearson product-moment correlation coefficient) tells you how strong the linear correlation is for paired numeric data, e.g. height and weight. You should never run this test without viewing a scatterplot and visually examining the basic shape of the relationship. The test could indicate a low linear correlation even though the data has a very strong and clear non-linear pattern, e.g. a U shape. The other thing to look for is outliers. The relationship might be diffuse except for a value or two sitting in the top right-hand corner of the plot, e.g. if you were looking at gun ownership and national income, and those few points can be enough to produce a misleadingly high R. This test is not very resistant to outliers.
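Both pitfalls can be illustrated with a minimal sketch. The `pearson_r` helper and all of the data values below are made up for illustration (they are not part of any particular statistics package); the helper simply applies the textbook product-moment formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A perfect U shape: y is fully determined by x, yet the linear R is zero.
u_xs = [-3, -2, -1, 0, 1, 2, 3]
u_ys = [x * x for x in u_xs]
r_u_shape = pearson_r(u_xs, u_ys)

# A diffuse cloud (hypothetical values) with almost no linear relationship...
cloud_xs = [1, 2, 3, 4, 5, 6]
cloud_ys = [3, 1, 4, 2, 3, 2]
r_cloud = pearson_r(cloud_xs, cloud_ys)

# ...and the same cloud plus one point in the top right-hand corner.
r_with_outlier = pearson_r(cloud_xs + [20], cloud_ys + [20])

print("U shape:", round(r_u_shape, 3))
print("Diffuse cloud:", round(r_cloud, 3))
print("Cloud plus one outlier:", round(r_with_outlier, 3))
```

In this made-up data the single extra point turns a near-zero R into a very high one, which is exactly the kind of result a scatterplot would expose immediately.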

Looking at the patterns below, we have a very strong linear relationship (A), a less strong linear relationship (B), and a weaker linear relationship (C). (D) is barely there, (E) is a very strong non-linear relationship, and (F) is an otherwise weak relationship with an important outlier. Note that even though (C) might have a reasonably high R, there is a considerable spread of y values for any given x-axis value. You couldn't really treat x as a proxy for y, e.g. if you were comparing the results of a cheap and quick measurement tool against the results of an expensive and time-consuming measurement tool. Yes, they produce results with a reasonably high level of correlation, but it is still quite a loose relationship. Many more examples of patterns can be found here: Correlation examples.png.

You should always look at the scatterplot before interpreting the results. Sometimes completely different datasets can produce near-identical summary statistics (see Anscombe's quartet).
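As a rough check, the four datasets of Anscombe's quartet can be run through a hand-written correlation function. The `pearson_r` helper is a hypothetical name used only for this sketch; the data values are the published quartet.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Anscombe's quartet: datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# R for each dataset; all four come out at roughly the same value.
quartet_r = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in quartet_r.items():
    print(name, round(r, 3))
```

The R values are almost the same, yet plotted the four datasets look nothing alike: a linear cloud, a curve, a line with one outlier, and a vertical stack with one outlier.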

Two key things to note about the test are the p value and R. The p value tells you whether or not you can reject the null hypothesis, namely that there is no linear relationship. The R value ranges from -1 to 1 and indicates the strength and direction of the linear relationship. If R is 0.7, we look at R squared to see what proportion of the variation in one variable is explained by the other: in this case 0.49, or less than half. Once again, it is always important to look at the scatterplot when interpreting the findings.
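The sketch below computes R, R squared, and a p value for some made-up height and weight figures. Note the assumptions: the data is invented, `pearson_r` and `permutation_p_value` are hypothetical helper names, and the p value here is estimated with a permutation test (shuffling one variable many times) rather than the t distribution that statistics packages typically use, purely so the example needs nothing beyond the Python standard library.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p value: how often shuffled data gives an |R| at least as large."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical paired data: heights (cm) and weights (kg).
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
weights = [52, 58, 57, 63, 68, 72, 71, 80, 84, 88]

r = pearson_r(heights, weights)
r_squared = r ** 2  # proportion of variation explained
p_value = permutation_p_value(heights, weights)

print("R:", round(r, 3))
print("R squared:", round(r_squared, 3))
print("p value (permutation estimate):", round(p_value, 4))
```

A small p value only says that a linear relationship this strong is unlikely to arise by chance; R squared is what indicates how much of the variation is actually accounted for.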

The Pearson's R Correlation Test is only for data that is numerical and adequately normally distributed. If your data is ordinal or not adequately normal, the appropriate alternative is the Spearman's R Correlation Test.
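To show how the two tests differ, here is a minimal sketch that treats Spearman's correlation as Pearson's correlation applied to ranks, which is the standard definition. The helper names (`pearson_r`, `average_ranks`, `spearman_r`) and the data are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_r(xs, ys):
    """Spearman correlation: Pearson's R computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data that always increases but is far from a straight line.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 3 for x in xs]

pearson_value = pearson_r(xs, ys)
spearman_value = spearman_r(xs, ys)
print("Pearson:", round(pearson_value, 3))
print("Spearman:", round(spearman_value, 3))
```

Because Spearman only looks at the ordering of the values, it reports a perfect relationship for this steadily increasing data, while Pearson is pulled down by the curvature.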

help/pearsonsr.txt · Last modified: 2011/02/17 16:06 (external edit)