
© 2023 Khan Academy

# Line of best fit: smoking in 1945

The scatter plot shows the percentage of American adults who smoked in various years, starting in 1965. We can estimate the percentage who smoked in 1945 by drawing a line that slopes down through the points, then seeing how high that line would be 20 years before 1965. Created by Sal Khan.

## Want to join the conversation?

- I don't understand this at all... can someone please explain this to me?(20 votes)
- We have a graph with various data points, and it looks like there is a linear relationship between the data points (because if you squint you can kinda see where a line could go, right in the middle of all the points).

Once you sketch this line, you know (even though you can't see it) that the line goes on forever in both directions. We know that 1965 on the graph is where x=0, and **about** 41 or 42% of Americans smoked... but we want to know how many Americans smoked in 1945.

Even though the graph doesn't show 1945, we can draw the line backwards (to the left of the y-axis) and estimate the y-value from the graph. In the video (at 4:32) it looks like the y-value is about 51 or 52%.

Hope this helps a little!(31 votes)

- Is it possible to calculate a perfect line through the points?(22 votes)
- Only through some points. You can have a perfectly straight line when given only two points, but if there are more than two, most often a perfect line doesn't exist.(18 votes)
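
To illustrate the reply above, here is a minimal Python sketch. The three points are made up for illustration (they are not read from the video's graph), and the helper names `line_through` and `on_line` are hypothetical. It shows that a line through two points fits both exactly, while a third point generally falls off that line.

```python
def line_through(p, q):
    """Return (slope, intercept) of the line through two points with different x-values."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def on_line(point, slope, intercept, tol=1e-9):
    """Check whether a point lies on the line y = slope * x + intercept."""
    x, y = point
    return abs(y - (slope * x + intercept)) <= tol

# Hypothetical points (years since 1965, percent), chosen only for illustration.
a, b, c = (0, 42), (20, 32), (10, 38)

slope, intercept = line_through(a, b)
print(on_line(a, slope, intercept), on_line(b, slope, intercept))  # True True
print(on_line(c, slope, intercept))  # False: the third point misses the line
```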

- Is there a way to make the equations easier to understand and do? I am good at drawing the line of best fit, but not the rate of change...(10 votes)
- Well, the rate of change is a slope which you need when drawing a line of best fit. You're just drawing a line that best fits the data.(7 votes)

- Is this a factual chart?(8 votes)
- I was wondering this too, so I looked it up and it's true that 45% of Americans smoked in 1965. What's interesting is that by 2015, the percentage had dropped and only 15% of Americans smoked.(6 votes)

- If we continue the trend backwards like that, is it possible to show that in some year ~100% of the population smoked?(7 votes)
- Assuming the trend stays exactly the same, then yes. You can continue the line for as far back and forward as it can go (from 0% to 100%).(7 votes)
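
As a rough illustration of that reply, the sketch below solves the trend line for the years in which it would reach 100% and 0%. It assumes the values eyeballed in the video (about 42% in 1965, falling about 10 percentage points every 20 years, so about 0.5 per year); the function name is hypothetical. The line only reaches those values if the trend really held that long, which the graph does not show.

```python
# Assumed trend line, eyeballed from the video (approximate, not exact data).
intercept = 42.0   # percent at x = 0, which represents 1965
slope = -0.5       # percentage points per year (about -10 every 20 years)

def year_when_percent_is(target_percent):
    """Solve target = intercept + slope * x for x, then convert to a calendar year."""
    years_since_1965 = (target_percent - intercept) / slope
    return 1965 + years_since_1965

print(year_when_percent_is(100))  # 1849.0
print(year_when_percent_is(0))    # 2049.0
```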

- What if that estimate were a fraction? And what would that fraction be?(8 votes)
- Is there any standard for how to get the "best" line? How do you know that this is the right line and that one is not?(6 votes)
- The best line has the most dots going in the same direction. If the line is wrong, there would be outliers, and you might not be able to use a line at all. I hope this helps!(3 votes)

- Confusing because it started in 1945(7 votes)
- How do you approximately calculate the points in the first place?(6 votes)
- How are you supposed to determine exactly where the line goes? I've been doing some of the practice problems and have gotten every single one wrong because my line wasn't placed exactly where the hint section showed it, and as a result I came up with a different answer.(4 votes)
- To figure out EXACTLY where the line goes, you'd have to check out some of Khan Academy's least squares regression line (aka linear regression or LSRL) videos! The least squares regression line is much more precise than an eyeballed line of best fit, but it is also MUCH MORE ADVANCED! It's in the AP Statistics curriculum!(3 votes)
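
For readers curious what the least squares line mentioned above involves, here is a minimal Python sketch of the standard formulas (slope = sum of products of deviations divided by sum of squared x-deviations; intercept = mean of y minus slope times mean of x). The data points are hypothetical, made up to resemble a downward trend, and are not the values from the video's graph. The function name `least_squares_line` is also just an illustrative choice.

```python
# Hypothetical points (years since 1965, percent who smoke), made up for
# illustration; they are NOT the data from the video's graph.
xs = [0, 5, 10, 20, 30, 40]
ys = [42.0, 39.0, 37.5, 31.0, 27.0, 22.0]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(xs, ys)
print(f"slope = {slope:.3f} percentage points per year")
print(f"intercept = {intercept:.2f}% (the fitted value for 1965)")
# Extend the fitted line 20 years back, to x = -20 (1945).
print(f"estimate for 1945: {intercept + slope * (-20):.1f}%")
```

Unlike an eyeballed line, this calculation gives the same line every time for the same data, which is why practice problems that expect a specific answer usually allow a range around it.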

## Video transcript

The graph below shows the percentage of American adults who smoke over time. Assuming the trend shown in the data has been consistent since 1945, use the graph to estimate the percentage of American adults who smoked in 1945.

So let's see what's going on here. The horizontal axis here, they say years since 1965. So at this point right over here, this is 0 years since 1965. So this really represents 1965. And we see it looks like around-- let's see, if I were to eyeball it, it looks like it's around 42% of Americans, just looking at this graph. I know that's not an exact number. Roughly 41% or 42% of Americans smoked in 1965 based on this graph. And then five years later, this would be 1970. 10 years later, that would be 1975. And they don't sample the data, or we don't have data from every given year. This is just from some of the years that we happen to have.

But what is clear, it looks like we have a negative linear relationship right over here, that it would not be difficult to fit a line. So let me try to do that. So I'm just going to eyeball it and try to fit a line to this data. So our line might look something like that. So it looks like a pretty strong negative linear relationship. When I say it's a negative linear relationship, we see that as time increases, the percentage of smokers in the US is decreasing. So that's what makes it a negative relationship.

Now, what are they asking? They want to estimate the percentage of American adults who smoked in 1945. Well, 1945 would be to the left of 0. So we could even think of it as if 1945 is 20 years before 1965. So let me see if I can draw that. So 20 years before 1965. Let's see, this would be 5 years before 1965, 10 years, 15 years, 20 years before 1965. So I could even put that as negative 20 right over here. Negative 20 years since 1965 you could view as 20 years before 1965. So that would represent 1945 right over there.

And one thing that we could do is very roughly just try to extend this negative linear relationship backwards. And they allow us to do that by saying assuming the trend shown on the data has been consistent. So the trend has been consistent. This line represents the trend. So let's just keep going backwards, keep going backwards at the same rate, so something like that. I want to make sure that it looks like it's the same rate right over here. And you could just try to eyeball it. You could say, well, let's see, 20 years ago, 1945. If I were to extend that line backwards, it looks like there were about 52% of the population was smoking. It seems like we're about 52% right over here.

Another way to think about it would be to actually try to calculate the rate of decline. And let's say we do it over every 20 years, because that will be useful because we're going 20 years back. So if we go 20 years from this point, so this is 1965, you go 20 years in the future. So that is 10 years, and then that is 20 years. So my change in the horizontal is 20 years. What's the change in the vertical? Well, it looks like we have a decrease of a little bit more than 10%. It looks like it's 11% or 12% decrease. So I'll just say minus 11% right there.

And let's see if that's consistent. If we were to go another 20 years. So if we go another 20 years, it looks like once again we've gone down by about 10%. So that looks like roughly 10%. If we're following the line, it should actually be the same number. So let me write it this way. It's approximately down 10%. So that little squiggly line, I'm just saying approximately negative 10% every 20 years.

So if you go back 20 years, you should increase your percentage by 20%. So this should go up by-- or you should increase your percent by 10%, I should say. So if we started at 41% or 42%, once again, this was what we saw when we just eyeballed it, you should get to 51% or 52%. So my estimate of the percentage of American adults who smoked in 1945 would be 51% or 52%.
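
The rate-of-change reasoning in the transcript can be sketched as a short calculation. This is a minimal Python sketch using the values eyeballed in the video (about 42% in 1965, falling roughly 10 percentage points every 20 years); the variable and function names are illustrative and not part of the lesson.

```python
# Values eyeballed from the graph in the video (approximate, not exact data).
percent_in_1965 = 42.0        # y-intercept: x = 0 represents 1965
change_per_20_years = -10.0   # roughly 10 percentage points down every 20 years

# Rate of change in percentage points per year.
slope = change_per_20_years / 20

def estimate_percent(year):
    """Estimate the percentage of smokers in `year` by extending the trend line."""
    years_since_1965 = year - 1965   # negative for years before 1965
    return percent_in_1965 + slope * years_since_1965

print(estimate_percent(1945))  # 52.0
```

Starting from 41% instead of 42% gives 51%, which matches the range of 51% or 52% stated in the video.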