# Pearson Correlation Coefficient (r)

The Pearson Correlation Coefficient is a measure of the linear association between two quantitative variables, and its value always lies between -1 and 1 inclusive. The size of r shows how closely the data points cluster around a straight line. If r is 1 or -1, all of the data points lie exactly on a line. The sign of r shows the direction of the relationship. As demonstrated in the picture, a positive r value indicates a positive relationship (as one variable increases, the other tends to increase), and a negative r value indicates a negative relationship (as one variable increases, the other tends to decrease).
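To make the definition concrete, here is a minimal Python sketch that computes the sample r from deviations about the means. The helper name `pearson_r` and the small data sets are hypothetical, chosen only for illustration. Points that fall exactly on a rising line give r = 1, points on a falling line give r = -1, and a scattered upward trend gives a value in between.

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: sample Pearson r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # points on a rising line  -> 1.0
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # points on a falling line -> -1.0
print(pearson_r(xs, [2, 1, 4, 3, 5]))           # scattered, rising trend  -> 0.8
```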

This is a summary of some important facts to remember about correlation:

1. Correlation does not distinguish between explanatory and response variables. Therefore, it doesn’t matter which variable is called x and which is called y.

2. Correlation requires both variables to be quantitative. We cannot calculate correlation for categorical variables.

3. r does not change when the units of measurement change, because r is computed from the standardized values of the variables. Correlation itself has no unit of measurement; it is simply a number.

4. Positive r indicates a positive association and negative r indicates a negative association, as the picture shows.

5. The correlation always lies between -1 and 1. Values of r near 0 indicate a very weak linear relationship. The strength of the linear relationship increases as r moves away from 0 toward either 1 or -1: the closer r is to 1 or -1, the more tightly the points cluster around a straight line. Values of 1 or -1 indicate a perfectly linear relationship. The picture also demonstrates this idea.

6. Correlation measures only the strength of linear relationships. It does not describe curved relationships, no matter how strong they are.

7. Correlation is not resistant: it can be strongly affected by a few outlying observations.

CAUTION!!! Remember that correlation is not a complete description of two-variable data. Along with the correlation, you should give the means and standard deviations of both x and y to describe your scatterplot more completely. The final picture illustrates this: its bottom row shows that the same correlation does not tell you what the graph actually looks like. Every scatterplot in that row has an r value of 0, yet they are all quite different.

Wikipedia provides a good introduction to the Pearson Correlation Coefficient, including the formulas for the correlation coefficient of both a sample and a population. If you are interested in learning more about Karl Pearson, you can also follow this Wikipedia link to his page. Pearson was influential in many areas of statistics and worked closely with Galton.
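For reference, the standard sample formula (for n paired observations with means x̄ and ȳ) and the population formula (written with the covariance and standard deviations of X and Y) are:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad\qquad
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
```

Dividing the numerator and the denominator of the sample formula by n - 1 shows that r is the sample covariance divided by the product of the sample standard deviations, mirroring the population form.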

This site provides an easy-to-follow example of how to find the correlation coefficient, although there are multiple ways to do so. If you are curious to see someone complete each step of finding the correlation coefficient, this site is for you! This site also includes good general information about the correlation coefficient, and within it I found this Scatterplot Demonstration, which helps reinforce the idea of a strong or weak correlation. Simply click on the various correlation coefficients beside the diagram to see what different correlations look like. Finally, this site covers many of the problems that can occur when using the correlation coefficient and is a good resource for knowing how to respond when they occur.
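One of the "multiple ways" to find r works directly from standardized values: r = (1 / (n - 1)) Σ z_x z_y, where each z-score uses the sample mean and sample standard deviation. The Python sketch below walks through those steps; the helper name `pearson_r_from_z` and the data are hypothetical, and this is not necessarily the exact procedure the linked site follows.

```python
import statistics

def pearson_r_from_z(xs, ys):
    """Hypothetical helper: sample Pearson r via standardized (z) scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    # Step 1: means and sample standard deviations (divisor n - 1).
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    # Step 2: standardize every observation.
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    # Step 3: multiply each pair of z-scores and add the products.
    total = sum(a * b for a, b in zip(zx, zy))
    # Step 4: divide by n - 1.
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(pearson_r_from_z(xs, ys), 6))  # -> 0.8
```

Because each z-score has no units, this version also makes fact 3 (unit invariance) easy to see.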