The least squares regression line is a “line of best fit” drawn through a scatter plot and used to predict the response value for a given value of the explanatory variable. The formula for this regression line is ŷ = b0 + b1x, where ŷ is the predicted value, b0 is the intercept, and b1 is the slope of the regression line. One can find b0 and b1 as discussed in this website in the third paragraph. Intuitively, if one were to put a straight line through the x-y points on a scatter plot of a sample's data, one would measure the vertical distance from each point to the line, square those distances, and add them up. Comparing that sum to the sums for all other positions of the line, the least squares line is the one for which the sum of the squares is the “least.” A video explanation of this process can be found on the web here.
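To make the "least" idea concrete, here is a minimal Python sketch. It assumes the standard closed-form formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, which may be written differently on the linked website. The function names and the five sample points are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1*x (standard closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sxy: how x and y vary together; Sxx: how x varies on its own.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("sum of squares for the fitted line:", round(best, 3))

# Nudging the line in any direction should only make the sum larger.
for db, dm in [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2)]:
    other = sum_squared_residuals(xs, ys, b0 + db, b1 + dm)
    print("shifted line:", round(other, 3))
```

Each shifted line gives a larger sum of squares than the fitted line, which is what "least squares" means.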
To be useful for making predictions, the least squares regression line requires that the following conditions be satisfied:
1. There must be two quantitative variables.
2. The data must follow a linear pattern.
3. There must be no outliers, as their presence skews the r (correlation) value.
4. The residuals must have equal spread. That is, after the “line of best fit” is positioned, the actual data should scatter about the line with a roughly uniform standard deviation along its whole length.
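Conditions 3 and 4 can be checked roughly by looking at the residuals (actual y minus predicted ŷ). The sketch below is only one informal way to do it: it splits the points into a lower-x half and an upper-x half and compares the standard deviation of the residuals in each. The half-and-half split, the function names, and the data are assumptions for illustration, not a formal test.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the standard closed form."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Actual y minus predicted y-hat for every point."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def spread_by_half(xs, ys):
    """Std. dev. of residuals for the lower-x half and the upper-x half."""
    pairs = sorted(zip(xs, residuals(xs, ys)))  # order points by x
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pstdev(low), pstdev(high)

# Made-up data whose scatter grows as x grows (a "fan" shape).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 3.2, 3.8, 6.5, 4.5, 9.0, 6.0]

low, high = spread_by_half(xs, ys)
print("residual spread, lower half:", round(low, 2))
print("residual spread, upper half:", round(high, 2))
```

For this data the upper half has a much larger spread than the lower half, so the equal-spread condition looks doubtful and predictions at large x values deserve less trust.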
ModelAssist discusses these points in the four assumptions listed in its article. The site also talks about the danger of using the regression line to find ŷ values outside the range of the sample data. Personally, I think reading the whole article is useful, keeping in mind what was discussed in class on 5/10/2013.