
# Regression line example

Created by Sal Khan.

## Want to join the conversation?

• I don't really understand the meaning of the word "regression" being used as a noun in this context. I thought it made sense in the phrase "regression to the mean", as in "returning to the mean". And I can find clear definitions of "regression line" or "regression analysis" but none of the word "regression" on its own.
• It's a quirk of history. The "regression" part of the name came from its early application by Sir Francis Galton who used the technique doing work in genetics during the 19th century. He was looking at how an offspring's characteristics tended to be between those of the parents (i.e. they regressed to the mean of the parents). The "regression" part just ended up stuck as part of the name from then on. Other than that, linear regression has nothing to do with regression to the mean.
• What is the difference between this method of figuring out the formula for the regression line and the one we learned previously? That is: slope = r·(Sy/Sx), and since we know the line goes through the mean of the x's and the mean of the y's, we can find the y-intercept by substituting into y = mx + b.
• When would you learn this? Algebra 1 or 2? Or something else?
• You would probably learn this in HS statistics. If not, then definitely in college stats, especially since the proof in the previous videos uses some calculus. Algebra I/II wouldn't go into this area.
• How can we derive m = r·(std y / std x) from the formula for m in this lesson?
• The formula presented in this lesson for m comes from the method of least squares, which minimizes the sum of the squared errors; it can be rewritten as m = Cov(x, y) / Var(x). The correlation coefficient is defined as r = Cov(x, y) / (std x · std y). Substituting Cov(x, y) = r · std x · std y into the slope formula gives m = r · std x · std y / (std x)² = r · (std y / std x). So the two formulas are algebraically equivalent: the formula in this lesson and m = r × (std y / std x) always produce the same slope.
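A quick numerical check of the equivalence above, using made-up sample data (not from the video): both the covariance-over-variance form and the r·(std y / std x) form give the same slope.

```python
import statistics

# Hypothetical sample data, just to compare the two slope formulas.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 6.0, 9.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sample covariance of x and y over sample variance of x.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
m_least_squares = cov_xy / var_x

# Equivalent form: r * (std y / std x), since r = cov / (std x * std y).
std_x = statistics.stdev(x)
std_y = statistics.stdev(y)
r = cov_xy / (std_x * std_y)
m_from_r = r * (std_y / std_x)

print(abs(m_least_squares - m_from_r) < 1e-12)  # True: the formulas agree
```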
• My textbook uses 'least squares regression line', and has this really complicated equation (or at least it looks like it's complicated) that involves summations and such. Is this the same thing? It has the x's and y's in nearly the same spots, but it's got those different symbols.
• Yes.

`∑_{n=1}^{N} x(n) = x(1) + x(2) + x(3) + ... + x(N)`
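As a sketch of how those textbook summation formulas are used in practice, here is the least-squares line computed from the sums directly, with made-up data: m = (N·Σxy − Σx·Σy) / (N·Σx² − (Σx)²) and b = (Σy − m·Σx) / N. These are just an expanded form of the same least-squares slope and intercept from the video.

```python
# Hypothetical data points for illustration.
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 4.0]
N = len(x)

# The building-block sums that appear in the textbook formula.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept from the summation form of least squares.
m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / N

print(m, b)  # m = 1.0, b ≈ 0.333 for this data
```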
• Is the formula used to find m in this video the same as correlation coefficient * (standard deviation of y/standard deviation of x) which he talked about in a previous video?
• Yes, they give the same answer. The formula used to find m in this video is derived directly by minimizing the sum of squared errors, while m = r × (std y / std x) expresses that same slope in terms of the correlation coefficient and the standard deviations. The two are algebraically equivalent: the least-squares slope equals Cov(x, y) / Var(x), and since r = Cov(x, y) / (std x · std y), substituting gives r · (std y / std x). They are two ways of writing one formula, not different estimates.
• Why do you divide the y's and the x's by 3? Where did the "three" come from?
• He was just calculating the mean: add up all the values, then divide by the total number of data points, which is 3.
• Where can I find the video for the derivation of the slope and y-intercept of the best fitting regression line using least squares? It doesn't seem to follow the introduction video or precede this example.
• Which videos does he refer to at the beginning, i.e. the ones that derive these formulas? Could you please share the link?
• How can we calculate "m" and "b" when we have multiple independent variables? Can I use the above formula for multiple independent variables as well?
• You might be able to use the same process that Sal used in the sequence of videos titled "Proof (part x) Minimizing squared error to regression line" to get slope estimates when there are multiple independent variables. More frequently, matrix algebra is used to get the slopes. If you are familiar with linear algebra, the idea is to say that:

`Y = Xβ + e`

Where:
Y is a vector containing all the values of the dependent variable,
X is a matrix where each column holds all the values of one independent variable (with a column of ones added for the intercept),
β is the vector of coefficients to be estimated, and
e is a vector of residuals.

Then we say that a predicted point is `Yhat = Xβ`, and using matrix algebra we get to `β = (X'X)^(-1) (X'Y)`
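A minimal sketch of that normal-equations solution, β = (X'X)^(-1) X'Y, using NumPy and made-up data (the library choice and the numbers are illustrative assumptions, not from the video). Solving the linear system is numerically preferable to forming the explicit inverse.

```python
import numpy as np

# Made-up data: 4 observations, intercept column of ones + 2 predictors.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
])
Y = np.array([3.0, 4.0, 8.0, 9.0])

# Solve the normal equations (X'X) beta = X'Y rather than inverting X'X.
beta = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ beta          # predicted values, Yhat = X beta
residuals = Y - Y_hat     # e = Y - X beta

print(beta)
```

A property worth checking: least-squares residuals are orthogonal to every column of X, i.e. X'e = 0, which is exactly what the normal equations enforce.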