
help:pearsonsr [2011/02/17 16:06] (current)

[[http://www.sofastatistics.com/userguide.php | Contents]]
[[:help:stats_tests | Statistical Tests Available]]

====== Correlation - Pearson's R Test ======

The Pearson's R Correlation Test (also called the [[http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient | Pearson product-moment correlation coefficient]]) tells you how strong the linear correlation is for paired numeric data e.g. height and weight. You should never run this test without viewing a scatterplot and visually examining the basic shape of the relationship. The test could indicate a low linear correlation even though the data has a very strong and clear non-linear pattern e.g. a U shape. The other thing to look for is outliers. The relationship might be diffuse except for a value or two sitting in the top right-hand corner of the grid e.g. if you were looking at gun ownership and national income. This test is not very resistant to outliers.

Looking at the patterns below, we have a very strong linear relationship (A), a less strong linear relationship (B), and a weaker linear relationship (C). (D) is barely there, (E) is a very strong non-linear relationship, and (F) is an otherwise weak relationship with an important outlier. Note that even though (C) might have a reasonably high R, there is a considerable spread of y values for any given x-axis value. You couldn't really treat x as a proxy for y e.g. if you were comparing the results of a cheap and quick measurement tool against those of an expensive and time-consuming measurement tool. Yes, they produce results with a reasonably high level of correlation, but it is still quite a loose relationship. Many more examples of patterns can be found here: [[http://en.wikipedia.org/wiki/File:Correlation_examples.png | Correlation examples.png]].
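The cautions about non-linear patterns and outliers can be checked numerically. The following is a minimal sketch in plain Python, not SOFA's own code: ''pearson_r'' is a hypothetical helper written for this illustration, and all of the data values are invented. It shows a perfect U shape producing an R of zero, and a single extreme point turning an uncorrelated cloud into an apparently strong correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's R computed directly from its definition (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Invented data: a perfect U shape. y is fully determined by x,
# yet there is no linear relationship for R to pick up.
u_x = [-3, -2, -1, 0, 1, 2, 3]
u_y = [x ** 2 for x in u_x]
r_u = pearson_r(u_x, u_y)

# Invented data: a small cloud with no linear trend at all.
cloud_x = [1, 2, 3, 4, 5]
cloud_y = [4, 1, 3, 5, 2]
r_cloud = pearson_r(cloud_x, cloud_y)

# The same cloud plus one extreme point in the top right-hand corner.
r_outlier = pearson_r(cloud_x + [20], cloud_y + [20])

print(f"U shape:            R = {r_u:.2f}")
print(f"Cloud:              R = {r_cloud:.2f}")
print(f"Cloud plus outlier: R = {r_outlier:.2f}")  # roughly 0.96
```

In both cases the number alone is misleading, which is why the scatterplot has to be examined alongside it.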
{{:help:scatterplot_patterns.gif|}}

You should always look at the scatterplot before interpreting the results. Sometimes completely different datasets can produce identical summary statistics (see [[http://en.wikipedia.org/wiki/Anscombe%27s_quartet | Anscombe's quartet]]).

Two key things to note about the test are the p value and R. The p value tells you whether or not you can reject the null hypothesis, namely the hypothesis that there is no linear relationship. The R value gives an indication of the strength of the relationship. If R is 0.7, we look at R squared to see how much of the variation in one variable is explained by variation in the other: in this case 0.49, or just under half. Once again, it is always important to look at the scatterplot when interpreting the findings.

The Pearson's R Correlation Test is only for data that is numerical and adequately normally distributed. If your data is ordinal or not adequately normal, the appropriate alternative is the [[:help:spearmansr | Spearman's R Correlation Test]].
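To make R, R squared and the p value concrete, here is a minimal sketch in plain Python. Several things in it are assumptions made for illustration: the height and weight values are invented, ''pearson_r'' and ''permutation_p_value'' are hypothetical helpers, and the p value is estimated by a permutation test (repeatedly shuffling one variable to break the pairing). That is a different technique from the t-distribution calculation a statistics package would normally use for this test, so the numbers will not match such a package exactly.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's R computed directly from its definition (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p value estimated by shuffling ys to simulate 'no relationship'."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_permutations + 1)

# Invented paired data for illustration only.
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
weights_kg = [52, 58, 55, 65, 62, 72, 70, 80, 78, 90]

r = pearson_r(heights_cm, weights_kg)
r_squared = r ** 2  # share of variation in one variable explained by the other
p = permutation_p_value(heights_cm, weights_kg)

print(f"R = {r:.3f}, R squared = {r_squared:.3f}, p (permutation estimate) = {p:.4f}")
```

A small p value only says that a linear relationship of this strength would be unlikely if there were none; R squared is what indicates how tight the relationship is.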