Regression line example

Name: Regression line example
Uploaded: 2011-02-20T16:54:54Z
Description: Regression Line Example

Regression Line Example. Created by Sal Khan.

Andy Brice
Posted 9 years ago.
I don't really understand the meaning of the word "regression" being used as a noun in this context. I thought it made sense in the phrase "regression to the mean", as in "returning to the mean". And I can find clear definitions of "regression line" or "regression analysis" but none of the word "regression" on its own.
(15 votes)
- Alex Hanning
  Posted 9 years ago.
  It's a quirk of history. The "regression" part of the name came from its early application by Sir Francis Galton who used the technique doing work in genetics during the 19th century. He was looking at how an offspring's characteristics tended to be between those of the parents (i.e. they regressed to the mean of the parents). The "regression" part just ended up stuck as part of the name from then on. Other than that, linear regression has nothing to do with regression to the mean.
  (29 votes)
Samuel Rodriguez
Posted 5 years ago.
What is the difference between this method of figuring out the formula for the regression line and the one we had learned previously? That is: slope = r*(Sy/Sx), and since we know the line goes through the mean of the x's and the mean of the y's, we can figure out the y-intercept by substituting into the formula y = mx + b.
(17 votes)
Esther
Posted 7 years ago.
When would you learn this? Algebra 1 or 2? Or something else?
(3 votes)
- Lance MacBlane
  Posted 7 years ago.
  You would probably learn this in HS statistics. If not, then definitely in college stats, especially since the proof in the previous videos uses some calculus. Algebra I/II wouldn't go into this area.
  (3 votes)
N N
Posted 2 years ago.
How can we derive m = r*(std y / std x) from the formula for m in this lesson?
(3 votes)
- daniella
  Posted 3 months ago.
  The formula presented in this lesson for m is derived from the method of least squares, which minimizes the sum of the squared errors. The correlation coefficient r, on the other hand, quantifies the strength and direction of the linear relationship between x and y. To derive m = r × (std y / std x) from the formula in this lesson, note that the numerator (the mean of the xy's minus the mean of the x's times the mean of the y's) is the covariance of x and y, and the denominator (the mean of the x squareds minus the mean of the x's squared) is the variance of x. Since r is the covariance divided by (std x × std y), the covariance equals r × std x × std y. Dividing that by the variance of x, which is (std x)², leaves m = r × (std y / std x).
  (1 vote)
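One way to make the derivation asked about above concrete is the following sketch. It assumes the covariance and the standard deviations s_x and s_y all use the same divisor (n here; using n − 1 throughout gives the same ratio), and it writes means with overbars, a notation the lesson itself does not use.

```latex
\[
\begin{aligned}
m &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
   = \frac{\frac{1}{n}\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}
          {\frac{1}{n}\sum_{i}(x_i - \bar{x})^2}
   = \frac{\operatorname{cov}(x, y)}{s_x^2}, \\[4pt]
r &= \frac{\operatorname{cov}(x, y)}{s_x\, s_y}
   \;\Longrightarrow\;
   \operatorname{cov}(x, y) = r\, s_x\, s_y, \\[4pt]
m &= \frac{r\, s_x\, s_y}{s_x^2} = r\,\frac{s_y}{s_x}.
\end{aligned}
\]
```

Read this way, slope = r*(Sy/Sx) and the formula in this lesson are two ways of writing the same slope, so both give the same line.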
Sriha S.
Posted 8 years ago.
My textbook uses 'least squares regression line', and has this really complicated equation (or at least it looks like it's complicated) that involves summations and such. Is this the same thing? It has the x's and y's in nearly the same spots, but it's got those different symbols.
(1 vote)
- redthumb.liberty
  Posted 8 years ago.
  Yes.
  
  $$\sum_{n=1}^{N} x_n = x_1 + x_2 + x_3 + \dots + x_N$$
  (3 votes)
Natalya Hoffman
Posted 8 years ago.
At 3:30 and 4:04, why do you divide the "Y's" and the "X's" by 3? Where did the "three" come from?
(2 votes)
- BikerMiker
  Posted 8 years ago.
  He was just calculating the mean: add up all the numbers, then divide by the total number of data points, which is 3.
  (1 vote)
Michael Towery
Posted 7 years ago.
Where can I find the video for the derivation of the slope and y-intercept of the best fitting regression line using least squares? It doesn't seem to follow the introduction video or precede this example.
(2 votes)
Lois Duhourcau
Posted 7 years ago.
Which videos does it refer to at the beginning, i.e. to derive these formulas? Could you please share the link?
(2 votes)
surya ambati
Posted 8 years ago.
How can we calculate "m" and "b" when we have multiple independent variables? Can I use the above formula for multiple independent variable also?
(1 vote)
- Dr C
  Posted 8 years ago.
  You might be able to use the same process that Sal used in the sequence of videos titled "Proof (part x) Minimizing squared error to regression line" to get slope estimates when there are multiple independent variables. More frequently, matrix algebra is used to get the slopes. If you are familiar with linear algebra, the idea is to say that:
  
  Y = Xβ + e
  
  Where:
  Y is a vector containing all the values of the dependent variable.
  X is a matrix where each column is all of the values for a given independent variable (plus a column of 1s if the line has an intercept b).
  β is a vector of the coefficients we want to estimate, one per column of X.
  e is a vector of residuals.
  
  Then we say that the predicted values are Yhat = Xβ, and using matrix algebra we get to β = (X'X)^(-1) (X'Y), where X' is the transpose of X.
  (3 votes)
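The matrix formula above can be tried out with a short script. This is a minimal sketch under stated assumptions, not code from the original answer: the function name `solve_normal_equations` is hypothetical, the two-variable data set is made up for illustration, and the system (X'X)β = X'Y is solved by elimination with exact fractions instead of computing the inverse explicitly.

```python
from fractions import Fraction


def solve_normal_equations(rows, y):
    """Return [b, m1, m2, ...] solving (X'X) beta = X'y exactly.

    rows: one list of independent-variable values per observation.
    y:    the dependent-variable values, one per observation.
    """
    # Design matrix X: a leading 1 in every row gives the intercept b.
    X = [[Fraction(1)] + [Fraction(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * Fraction(yi) for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination, swapping rows to avoid zero pivots.
    # (next() raises StopIteration if X'X is singular.)
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# One independent variable: the three points from the video.
beta = solve_normal_equations([[1], [2], [4]], [2, 1, 3])
print([str(c) for c in beta])  # ['1', '3/7'], i.e. b = 1 and m = 3/7

# Two independent variables: hypothetical data built from y = 1 + 2*x1 + 3*x2.
beta2 = solve_normal_equations([[1, 1], [2, 0], [3, 1], [4, 0]], [6, 5, 10, 9])
print([str(c) for c in beta2])  # ['1', '2', '3']
```

With a single independent variable this reproduces the slope and intercept from the video, which suggests the "m" and "b" formulas are the one-variable special case of the matrix expression.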
ap3921a
Posted 4 years ago.
Could you please explain orthogonal regression with an example?
(1 vote)
- Andy Tang
  Posted 4 years ago.
  An engineer at a medical device company wants to determine whether the company's new blood pressure monitor is equivalent to a similar monitor that is made by a different company. The engineer measures the systolic blood pressure of a random sample of 60 people using both monitors.
  
  To determine whether the two monitors are equivalent, the engineer uses orthogonal regression. Before collecting the data for the orthogonal regression, the engineer ran separate studies on each monitor to estimate the variances. The variance for the new monitor was 1.08. The variance for the other company's monitor was 1.2. The engineer decides to assign the new monitor to be the response variable and the other company's monitor to be the predictor variable. With these assignments, the error variance ratio is 1.08 / 1.2 = 0.9.
  (3 votes)
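Unlike the least-squares line in the video, orthogonal regression allows for measurement error in both variables. As a rough sketch of how the error variance ratio of 0.9 might enter the calculation, the following uses the standard Deming regression slope formula (orthogonal regression in the strict sense is the case where the ratio is 1). The function name `deming_fit` and the blood pressure readings are hypothetical; the original post gives no data.

```python
import math


def deming_fit(xs, ys, delta=1.0):
    """Return (slope, intercept) of the Deming regression line.

    delta is the error variance of y divided by the error variance of x.
    delta = 1 corresponds to orthogonal regression.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample variances and covariance (n - 1 divisor).
    s_xx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form Deming slope; undefined when s_xy is zero.
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical systolic readings (mmHg); NOT data from the original post.
other_monitor = [110, 118, 125, 131, 140, 152]  # predictor
new_monitor = [111, 117, 126, 130, 142, 151]    # response

slope, intercept = deming_fit(other_monitor, new_monitor, delta=1.08 / 1.2)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

If the two monitors really agreed, the fitted line would be close to y = x, so a slope near 1 and an intercept near 0 would be the pattern to look for.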

Video transcript

In the last several videos, we did some fairly hairy mathematics. And you might have even skipped them. But we got to a pretty neat result. We got to a formula for the slope and y-intercept of the best fitting regression line when you measure the error by the squared distance to that line.

And our formula is, and I'll just rewrite it here just so we have something neat to look at. So the slope of that line is going to be the mean of the x's times the mean of the y's minus the mean of the xy's. And don't worry, this seems really confusing, we're going to do an example of this actually in a few seconds. Divided by the mean of the x's squared minus the mean of the x squareds.

And if this looks a little different than what you see in your statistics class or your textbook, you might see this swapped around. If you multiply both the numerator and denominator by negative 1, you could see this written as the mean of the xy's minus the mean of the x's times the mean of the y's. All of that over the mean of the x squareds minus the mean of the x's squared. These are obviously the same thing. You're just multiplying the numerator and denominator by negative 1, which is the same thing as multiplying the whole thing by 1.

And of course, whatever you get for m, you can then just substitute back in this to get your b. Your b is going to be equal to the mean of the y's minus your m. Let me write that in yellow so it's very clear. You solved for the m value. Minus m times the mean of the x's. And this is all you need.

So let's actually put that into practice. So let's say I have three points, and I'm going to make sure that these points aren't collinear. Because, otherwise, it wouldn't be interesting. So let me draw three points over here. Let's say that one point is the point 1 comma 2. So this is 1, 2. And then we also have the point 2 comma 1. And then, let's say we also have the point, let's do something a little bit crazy, 4 comma 3. So this is 4, 3. So those are our three points.
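Written out in symbols, the formulas described in words at the start of this transcript look like this. The overbar notation (a bar over x for the mean of the x's, and so on) is an assumption for this sketch; the transcript only describes the formulas verbally.

```latex
\[
m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}
  = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2},
\qquad
b = \bar{y} - m\,\bar{x}
\]
```

The two fractions for m are the "swapped around" versions mentioned above: each is the other with numerator and denominator multiplied by negative 1.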
And what we want to do is find the best fitting regression line, which we suspect is going to look something like that. We'll see what it actually looks like using our formulas, which we have proven. So a good place to start is just to calculate these things ahead of time, and then to substitute them back in the equation.

So what's the mean of our x's? The mean of our x's is going to be 1 plus 2 plus 4 divided by 3. And what's this going to be? 1 plus 2 is 3, plus 4 is 7, divided by 3. It is equal to 7/3.

Now, what is the mean of our y's? The mean of our y's is equal to 2 plus 1 plus 3. All of that over 3. So this is 2 plus 1 is 3. Plus 3 is 6. Divided by 3 is equal to 2.

Now, what is the mean of our xy's? So our first xy over here is 1 times 2. Plus 2 times 1 plus 4 times 3. And we have three of these xy's. So divided by 3. So what's this going to be equal to? We have 2 plus 2, which is 4. 4 plus 12, which is 16. So it's going to be 16/3.

And then the last one we have to calculate is the mean of the x squareds. So what's the mean of the x squareds? The first x squared is just going to be 1 squared. Plus this 2 squared, plus this 4 squared. And we have three data points again. So this is 1 plus 4, which is 5. Plus 16 is 21. Divided by 3, that's 21/3, which is equal to 7. So that worked out to a pretty neat number.

So let's actually find our m's and our b's. So our slope, our optimal slope for our regression line, the mean of the x's is going to be 7/3. Times the mean of the y's. The mean of the y's is 2. Minus the mean of the xy's. Well, that's 16/3. And then, all of that over the mean of the x's squared. The mean of the x's is 7/3, so that's 7/3 squared. Minus the mean of the x squareds. So it's going to be minus this 7 right over here. And we just have to do a little bit of mathematics. I'm tempted to get out my calculator, but I'll resist the temptation. It's nice to keep things as fractions. Let's see if I can calculate this. This is 14/3 minus 16/3.
All of that over, this is 49/9. And then minus 7. If I wanted to express that as something over 9, that's the same thing as 63/9. So in our numerator, we get negative 2/3. And then in our denominator, what's 49 minus 63? That's negative 14, so negative 14/9. And this is the same thing as negative 2/3 times negative 9/14. Divide numerator and denominator by 3. Well, the negatives are going to cancel out first of all. You divide by 3. That becomes a 1. That becomes a 3. Divide by 2. Becomes a 1. That becomes a 7. So our slope is 3/7. Not too bad.

Now, we can go back and figure out our y-intercept. So let's figure out our y-intercept using this right over here. So our y-intercept, b, is going to be equal to the mean of the y's, the mean of the y's is 2, minus our slope. We just figured out our slope to be 3/7. Times the mean of the x's, which is 7/3. These are just the reciprocals of each other, so they cancel out. That just becomes 1. So our y-intercept is literally just 2 minus 1. So it equals 1.

So we have the equation for our line. Our regression line is going to be y is equal to-- We figured out m. m is 3/7. y is equal to 3/7 x plus, our y-intercept is 1. And we are done.

So let's actually try to graph this. So our y-intercept is going to be 1. It's going to be right over there. And the slope of our line is 3/7. So for every 7 we run, we rise 3. Or another way to think of it, for every 3.5 we run, we rise 1.5. So we're going to go 1.5 right over here. So this line, if you were to graph it, and obviously I'm hand drawing it, so it's not going to be that exact, is going to look like that right over there. And it actually won't go directly through that point. So I don't want to give you that impression. So it might look something like this. And for this line, we have shown that this formula minimizes the sum of the squared vertical distances from each of these points to the line. Anyway, that was, at least in my mind, pretty neat.
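The arithmetic in this example can be checked with a short script. This is a minimal sketch, not part of the video: the function name `least_squares_line` is hypothetical, and exact fractions are used so the result matches the hand calculation instead of a rounded decimal.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (m, b) for the least-squares line y = m*x + b.

    Uses the formulas from the transcript:
      m = (mean_x * mean_y - mean_xy) / (mean_x**2 - mean_x2)
      b = mean_y - m * mean_x
    """
    n = len(points)
    mean_x = sum(Fraction(x) for x, _ in points) / n
    mean_y = sum(Fraction(y) for _, y in points) / n
    mean_xy = sum(Fraction(x) * y for x, y in points) / n
    mean_x2 = sum(Fraction(x) ** 2 for x, _ in points) / n
    # Raises ZeroDivisionError if every x is the same (vertical line).
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b


points = [(1, 2), (2, 1), (4, 3)]  # the three points from the video
m, b = least_squares_line(points)
print(m, b)  # 3/7 1
```

The intermediate values are the ones worked out by hand above: the means are 7/3, 2, 16/3 and 7, giving a numerator of -2/3 and a denominator of -14/9.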