Category Data Analysis

Anscombe’s Quartet

Here’s an interesting example of why you should never just look at an r value and a regression model without looking at the scatterplot first. Anscombe’s Quartet consists of four data sets. Each has the same sample size and nearly the same mean and standard deviation for the x and y variables, r values, and regression coefficients. But […]
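As a rough illustration of the point, the sketch below compares the first two of Anscombe's four data sets. The helper name `pearson_r` is chosen here for illustration and is not from the original post. The summary numbers come out almost identical even though a scatterplot of each set looks very different.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample Pearson correlation from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's data sets I and II share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r1 = pearson_r(x, y1)
r2 = pearson_r(x, y2)

for name, ys, r in (("I", y1, r1), ("II", y2, r2)):
    print(f"set {name}: mean y = {mean(ys):.2f}, sd y = {stdev(ys):.2f}, r = {r:.3f}")
```

Plotting `x` against `y1` and `x` against `y2` would show a roughly linear cloud in the first case and a clear curve in the second, which the summary statistics alone do not reveal.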

Interquartile Range (IQR)

The interquartile range (IQR) is a measure of the spread of a distribution of a single quantitative variable. The IQR is simple to calculate: it is the difference (hence “range”) between the upper quartile (Q3) and the lower quartile (Q1) (hence “inter” and “quartile”). For a better understanding of quartiles, here is a site […]
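A minimal sketch of the calculation, assuming Python's standard `statistics.quantiles` function with its default "exclusive" method. Textbooks and calculators use slightly different quartile conventions, so other tools may give slightly different values for the same data. The function name `iqr` and the sample values are made up for illustration.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: upper quartile (Q3) minus lower quartile (Q1)."""
    q1, _median, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return q3 - q1

# Hypothetical sample: the integers 1 through 11.
sample = list(range(1, 12))
print(iqr(sample))
```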

Pearson Correlation Coefficient (r)

The Pearson Correlation Coefficient (or the Pearson Product Moment Correlation) is a measure of the strength of the linear association between two quantitative variables. The formula is: $r = \frac{\sum z_x z_y}{n - 1}$, where $z_x$ and $z_y$ are the standard scores for x and y that show how many standard deviations x and y are from the mean and n is the […]
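The standard-score form of r can be sketched as follows. This assumes the sample standard deviation (dividing by n - 1), which is the usual convention when r is written in terms of standard scores. The names `z_scores` and `pearson_r` and the data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standard scores: how many standard deviations each value is from the mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Sum of the products of paired standard scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical paired data.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))
```

A value near 1 or -1 indicates a strong linear association and a value near 0 a weak one, but r says nothing about whether a straight line is the right model in the first place.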

Coefficient of Determination

The coefficient of determination, denoted as R², is a measure of the strength of a given correlation. The value will fall between 0 and 1, with a larger number representing a stronger correlation. There are three ways to calculate the coefficient of determination, though they are not guaranteed to produce the same value. In the scope of […]


Median

As one may know from either middle school or high school, the median is the value in the middle of a sample. That is, there is an equal number of values less than and more than the median. However, if the sample contains an even number of values, the median is the average of the two […]
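The odd and even cases can be sketched in a few lines. The function below is an illustrative implementation, not taken from the original post. Python's standard library offers `statistics.median`, which behaves the same way.

```python
def median(values):
    """Middle value of a sample; for an even count, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two values that straddle the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # odd-sized sample
print(median([7, 1, 3, 5]))  # even-sized sample
```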

Residual Plots

Residual plots graph the distance of each data point from the curve in a chosen model, and can be used to tell whether a selected model fits a given data set. A residual is the vertical distance of a point from the curve, that is, the observed value minus the value the model predicts. When making a residual plot, the x-axis is the same as in the […]
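As a sketch of where the plotted values come from, the code below fits a least-squares line to a small made-up data set and computes each residual as observed minus predicted. The function names and data are hypothetical, and a straight line is assumed as the chosen model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight-line model."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

# Each (x, residual) pair is one point on the residual plot.
for x, e in zip(xs, res):
    print(f"x = {x}, residual = {e:+.2f}")
```

If the model suits the data, these points should scatter around zero with no obvious pattern. A curve or a fan shape in the residuals suggests that a different model may be a better choice.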

Coefficient of Determination

The coefficient of determination is often referred to as R². The coefficient of determination, simply put, is the measure of how well a regression models a data set. If you have a data set with an R² value of 0.95, then the regression explains 95% of the variation in the data, which is excellent. […]
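One common way to compute this value is one minus the ratio of the unexplained variation to the total variation. The sketch below assumes that definition. The function name `r_squared`, the data, and the fitted line y = 0.6x + 2.2 are all made up for illustration.

```python
from statistics import mean

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot: the share of variation in y explained by the model."""
    my = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((y - my) ** 2 for y in observed)                    # total
    return 1 - ss_res / ss_tot

# Hypothetical data and predictions from the assumed line y = 0.6x + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [0.6 * x + 2.2 for x in xs]

r2 = r_squared(ys, predicted)
print(f"R^2 = {r2:.2f}")
```

Here the line explains about 60% of the variation in y, which is noticeably weaker than the 0.95 example in the text.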