Variation, r-squared, and the Line of Best Fit

What is the Best Fit Line?

Once we have plotted our data on a scatter graph and determined that it shows a linear correlation (that is, the points follow a pattern that resembles a line), we want to find the line that best represents the data. We call this the best fit line.

A best-fit line does not necessarily go through every point on the graph. It is the line that keeps the points as close to it as possible overall; in linear regression, this means the sum of the squared vertical distances from each point to the line is as small as it can be. Once we determine the equation of the best fit line, we can then make predictions. This is important since we are unable to measure everything.
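
As a rough illustration of what minimizing the distance looks like in practice, the sketch below fits a least-squares line with NumPy's `polyfit`. The height and weight values are made up for illustration and do not come from any study in these notes.

```python
import numpy as np

# Hypothetical data (made up for illustration): heights in cm, weights in kg.
heights = np.array([150.0, 160.0, 165.0, 170.0, 180.0, 190.0])
weights = np.array([66.0, 72.0, 73.0, 76.0, 79.0, 84.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(heights, weights, 1)

# Residuals are the vertical gaps between each point and the line.
residuals = weights - (slope * heights + intercept)
sum_sq = np.sum(residuals ** 2)

print(f"best fit line: y = {slope:.3f}x + {intercept:.2f}")
print(f"sum of squared residuals: {sum_sq:.3f}")
```

Any other slope or intercept would give a larger sum of squared residuals for the same points, which is the sense in which this line is the best fit.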

Interpreting the Best Fit Line

Suppose I found, based on my height and weight study, that the best fit line was y = 0.42x + 4, where x is height in cm and y is weight in kg.

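The slope, 0.42, is positive, so I have an upward sloping line, which means that as height increases so does a person's weight. This is a positive correlation. More specifically, each additional cm of height is associated with about 0.42 kg of additional weight.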

Suppose I want to estimate or predict the weight of a person who is 170 cm tall. If I plug 170 in for x in the equation above, I get 0.42*170 + 4 = 75.4. Therefore I would estimate that a person 170 cm tall would weigh about 75 kg.
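
The same calculation can be sketched in a few lines of Python. The function name `predict_weight` is just an illustrative choice; the slope and intercept come from the line y = 0.42x + 4.

```python
def predict_weight(height_cm, slope=0.42, intercept=4.0):
    """Return the weight (kg) predicted by the line y = slope*x + intercept."""
    return slope * height_cm + intercept

estimate = predict_weight(170)
print(f"{estimate:.1f} kg")  # prints "75.4 kg"
```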

When you use best-fit lines to predict, you must be very careful that you understand the context of your best-fit equation. In other words, understand its limitations. You should not make predictions outside the range of your data (this is called extrapolation), because you have no evidence that the linear pattern continues there. Also, a best fit line based on past data may not be valid for today or the future.
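
One way to build this caution into a calculation is to refuse any prediction that falls outside the observed data. The sketch below assumes, purely for illustration, that the heights in the study ran from 150 cm to 190 cm; the function name and that range are hypothetical.

```python
def predict_within_range(x, slope, intercept, x_min, x_max):
    """Predict y from the line, refusing to extrapolate beyond [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the observed range [{x_min}, {x_max}]")
    return slope * x + intercept

# Assumed range of observed heights (hypothetical): 150 cm to 190 cm.
print(predict_within_range(170, 0.42, 4.0, 150, 190))

try:
    # 50 cm is far below any height in the assumed data.
    predict_within_range(50, 0.42, 4.0, 150, 190)
except ValueError as err:
    print("No prediction:", err)
```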


r and r-squared

Please make sure the diagnostics are turned on in your calculator so that when you do the linear regression analysis both r and r-squared come up.

r is the correlation coefficient and, as we learned last week, it is a measure of how well the data fits the line. It will be negative if you have a negative correlation and positive if you have a positive correlation. It always falls between -1 and 1, and the closer it is to 1 or -1, the better the line fits the data. Remember, the line DOES NOT need to go through any points to be a good fit.
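
To see how the sign and size of r behave, it can help to compute it for one upward trend and one downward trend. The sketch below uses NumPy's `corrcoef`; all data values are made up for illustration.

```python
import numpy as np

# Hypothetical data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.1, 3.9, 6.2, 7.8, 10.1])    # rises as x rises
y_down = np.array([9.8, 8.1, 6.0, 4.2, 1.9])   # falls as x rises

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(f"upward trend:   r = {r_up:.4f}")
print(f"downward trend: r = {r_down:.4f}")
```

Both sets of points lie close to a line without sitting exactly on it, so r comes out close to 1 for the upward trend and close to -1 for the downward trend.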

r-squared is the coefficient of determination. It measures the proportion of the variation in your dependent variable that is explained by its linear relationship with your independent variable. For example, if I measured hours studied per week and GPA and found them to be positively correlated with an r of 0.9, r-squared would be 0.81. This would mean that 81% of the variation found in GPA can be explained by the number of hours studied per week. Note that this describes an association; it does not by itself show that studying more causes a higher GPA.
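
The sketch below is one way to check this interpretation numerically: it squares r, then separately computes the share of the variation in GPA that the best fit line accounts for, and the two numbers agree. The hours and GPA values are made up for illustration, so the resulting r will not be the 0.9 used in the example above.

```python
import numpy as np

# Hypothetical data (made up for illustration).
hours = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 12.0])
gpa = np.array([2.4, 2.3, 3.0, 2.9, 3.5, 3.3, 3.9])

# Correlation coefficient r.
r = np.corrcoef(hours, gpa)[0, 1]

# Least-squares line and the GPA values it predicts.
slope, intercept = np.polyfit(hours, gpa, 1)
predicted = slope * hours + intercept

# Variation left over after the line vs. total variation in GPA.
ss_res = np.sum((gpa - predicted) ** 2)
ss_tot = np.sum((gpa - gpa.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
print(f"share of variation explained by the line = {r_squared:.3f}")
```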

In your HW and on your final you will be asked to interpret the meaning of r-squared.