# What is the line of best fit?


An explanation of the line of best fit and how to interpret the best fit line, r, and r-squared.

Once we have plotted our collected data on a scatter graph and determined that there is a linear correlation (that is, the points follow a pattern that resembles a line), we want to find the line that best represents the data. We call this the best fit line.

A best fit line does not necessarily go through every point on the graph. It is the line that keeps the points as close to it as possible overall; the calculator's linear regression does this by minimizing the sum of the squared vertical distances from the points to the line. Once we determine the equation of the best fit line, we can then make predictions. This is important since we are unable to measure everything. The best fit line can be found by hand, but we will not go into the mathematics required to do this. For now we will use the calculator to find it.
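For readers curious about what the calculator is doing behind the scenes, here is a minimal sketch of the standard least-squares formulas in Python. The function name `best_fit_line` and the sample points are made up for illustration; they are not data from the course.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = best_fit_line(xs, ys)
print(slope, intercept)
```

Because these sample points sit exactly on a line, the function recovers that line's slope and intercept. With real, scattered data the result is the line that keeps the squared vertical distances as small as possible.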

## Interpretation of the line of best fit

Suppose I found, based on my height and weight study, that the best fit line was y = 0.42x + 4, where x is height in cm and y is weight in kg.

The slope, 0.42, is positive, so I have an upward-sloping line. This means that as height increases, so does a person's weight. This is a positive correlation.

Suppose I want to estimate or predict the weight of a person who is 170 cm tall. If I plug 170 in for x in the equation above, I get 0.42 × 170 + 4 = 75.4. Therefore I would estimate that a person 170 cm tall weighs about 75 kg.
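The same prediction can be checked with a few lines of Python. This is only a sketch: the function name `predict_weight` is made up for illustration, and the slope and intercept are the ones from the height and weight equation above.

```python
def predict_weight(height_cm, slope=0.42, intercept=4):
    """Estimate weight in kg from height in cm using the best fit line."""
    return slope * height_cm + intercept

estimate = predict_weight(170)
print(round(estimate, 1))  # 75.4, which rounds to about 75 kg
```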

When you use best fit lines to predict, you must be very careful to understand the context of your best fit equation, in other words, its limitations. You should not make predictions outside the bounds of your data, because you have no evidence that the pattern continues there. In addition, a best fit line based on past data may not be valid for today or the future.
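One way to respect these limits is to refuse to predict outside the range of the data. The sketch below assumes, purely for illustration, that the heights in the study ranged from 150 cm to 190 cm; that range and the function name `predict_in_range` are hypothetical.

```python
def predict_in_range(x, slope, intercept, x_min, x_max):
    """Predict y from the best fit line, but only for x inside the observed data range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the data range [{x_min}, {x_max}]; "
            "the best fit line should not be used here"
        )
    return slope * x + intercept

# Hypothetical data range: heights from 150 cm to 190 cm
inside = predict_in_range(170, 0.42, 4, 150, 190)
print(round(inside, 1))

try:
    predict_in_range(50, 0.42, 4, 150, 190)
except ValueError as err:
    print("Refused:", err)
```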

## r and r-squared

Please make sure the diagnostics are turned on in your calculator so that when you do the linear regression analysis both r and r-squared come up.

r is the correlation coefficient and, as we learned last week, it is a measure of how well the data fits the line. It will be negative if you have a negative correlation and positive if you have a positive correlation. The closer it is to 1 or -1, the better the line fits the data. Remember, the line DOES NOT need to go through any points to be a good fit.

r-squared is the coefficient of determination. It measures the percentage of the variation in your dependent variable that is explained by its linear relationship with your independent variable. For example, if I were measuring hours studied per week and GPA and found them to be positively correlated with an r of 0.9, r-squared would be 0.81. This would mean that 81% of the variation found in GPA can be attributed to the number of hours studied per week. Keep in mind that this describes how well the line accounts for the data; it does not by itself prove that one variable causes the other.
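To see where these numbers come from, here is a minimal Python sketch of the usual formula for the correlation coefficient, followed by the r = 0.9 example from above. The function name `correlation` and the sample points are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points on a perfectly downward-sloping line, so r is -1
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_down, 2))

# The example from the text: r = 0.9 gives r-squared = 0.81, or 81%
r = 0.9
r_squared = r ** 2
print(round(r_squared, 2))
```

Squaring r always gives a value between 0 and 1, which is why r-squared can be read as a percentage regardless of whether the correlation is positive or negative.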

In your HW and on your final you will be asked to interpret the meaning of r-squared.