
© 2024 Khan Academy

# Linear regression review

Linear regression is a process of drawing a line through data in a scatter plot. The line summarizes the data, which is useful when making predictions.

### What is linear regression?

When we see a relationship in a scatter plot, we can use a line to summarize that relationship. We can also use the line to make predictions. This process is called **linear regression**.

*Want to see an example of linear regression? Check out this video.*

### Fitting a line to data

There are more advanced ways to fit a line to data, but in general, we want the line to go through the "middle" of the points.
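One of the more advanced ways is least-squares regression, which picks the line that minimizes the sum of squared vertical distances from the points to the line. The sketch below is a minimal illustration, not part of the original lesson: the function name `least_squares_line` and the four data points are hypothetical, made up only to show the arithmetic.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of the slope)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (denominator of the slope)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for illustration only
xs = [0, 10, 20, 30]
ys = [41, 34, 31, 24]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = -0.54x + 40.60
```

The resulting line runs through the "middle" of these points: two sit slightly above it and two slightly below.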

*Want to learn more about fitting a line to data? Check out this video.*

*Want to practice more problems like this? Check out this exercise.*

### Using equations for lines of fit

Once we fit a line to data, we find its equation and use that equation to make predictions.

#### Example: Finding the equation

The percent of adults who smoke, recorded every few years since $1967$, suggests a negative linear association with no outliers. A line was fit to the data to model the relationship.

**Write a linear equation to describe the given model.**

**Step 1:** Find the slope.

This line goes through $(0,40)$ and $(10,35)$, so the slope is $\frac{35-40}{10-0}=-\frac{1}{2}$.

**Step 2:** Find the $y$-intercept.

We can see that the line passes through $(0,40)$, so the $y$-intercept is $40$.

**Step 3:** Write the equation in slope-intercept form, $y=mx+b$.

The equation is $y=-0.5x+40$.
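The three steps can also be checked with a few lines of code. This is a minimal sketch; the helper name `line_through` is an assumption made for illustration, and the two points are the ones read from the graph in Step 1.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    x1, y1 = p1
    x2, y2 = p2
    # Step 1: slope is change in y over change in x
    slope = (y2 - y1) / (x2 - x1)
    # Step 2: solve y1 = slope * x1 + b for the y-intercept b
    intercept = y1 - slope * x1
    return slope, intercept


# Step 3: the pair (slope, intercept) gives y = slope * x + intercept
slope, intercept = line_through((0, 40), (10, 35))
print(slope, intercept)  # -0.5 40.0
```

Computing the intercept from the slope and one point, rather than reading it off the graph, also works when neither point lies on the $y$-axis.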

**Based on this equation, estimate what percent of adults smoked in $1997$.**

To estimate what percent of adults smoked in $1997$, we can plug in $30$ for $x$ (since $x$ represents years since $1967$):

$y=-0.5(30)+40=25$

Based on the equation, about $25\%$ of adults smoked in $1997$.
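The same prediction can be sketched in code. The function name `predict_percent` and its parameter names are hypothetical; the slope, intercept, and base year come from the example above.

```python
def predict_percent(year, base_year=1967, slope=-0.5, intercept=40):
    """Estimate the percent of adults who smoke in a given year from the trend line."""
    # x is the number of years since the base year
    x = year - base_year
    return slope * x + intercept


estimate = predict_percent(1997)
print(estimate)  # 25.0
```

A linear model like this is only a reasonable guide near the range of the data it was fit to. For example, it returns $0$ for $2047$ and negative percentages after that, which cannot be right.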

*Want to practice more problems like these? Check out this exercise.*

## Want to join the conversation?

- How will I know for sure if my rounding to the nearest hundred is correct?(3 votes)
- Then you check your answer again and see if you got it right or wrong.(2 votes)

- Does the line have to have a positive slope for there to be a linear relationship?(1 vote)
- Absolutely not! Slopes can be negative too. That just means m is a negative number in the slope-intercept form y=mx+b, as in y=-0.5x+40.(4 votes)

- In the practice, it asks for the exact number. For example, if I got 97 as an average for an answer, it says my answer is wrong and that the answer is 95 or 96.(2 votes)
- In practice problems, especially in scenarios involving real-world data, it's common to round the answer to a reasonable precision based on the context of the problem. If the problem specifies rounding to the nearest hundredth, then providing an answer of 97 would indeed be incorrect if the expected precision is hundredths.(1 vote)

- How do you calculate linear regression by hand if you don't have a graphing calculator?(2 votes)
- In a later course, Sal describes least squares regression.(1 vote)

- What if the y-intercept is not given? How do you find it then?(2 votes)
- You can also look at the formula of the equation.(2 votes)

- An example above says this line goes through (0,40) and (10,35). How did we calculate (10,35)? What is the logic?(1 vote)
- How do you draw the line correctly so that we get the correct slope?(1 vote)
- What do you do if your question doesn't include a y-intercept?(1 vote)
- Does the line have a slope?(0 votes)
- Yes, the line typically has a slope. The slope represents the rate of change of the dependent variable (y-axis) with respect to the independent variable (x-axis). In linear regression, the slope of the line describes the relationship between the variables being analyzed.(1 vote)

- Can you explain how to find the formula? I am still not understanding.(0 votes)
- To find the formula for the linear equation representing the trend line, you first need to determine the slope and y-intercept. The slope is calculated using the formula:

Slope = change in y / change in x = (y2 - y1) / (x2 - x1)

Once you have the slope (m) and a point (x1, y1), you can use the point-slope form of a linear equation to find the equation of the line:

y - y1 = m(x − x1)

Finally, you can rewrite the equation in slope-intercept form (y = mx + b) by solving for y. The y-intercept (b) can be directly read from the graph or calculated from the equation.(1 vote)
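As a worked illustration of the point-slope route, take the slope $-0.5$ and the point $(10,35)$ from the smoking example above, and assume the $y$-intercept could not be read from the graph:

$$
\begin{aligned}
y-35 &= -0.5(x-10)\\
y-35 &= -0.5x+5\\
y &= -0.5x+40
\end{aligned}
$$

This matches the equation found in the example, so the $y$-intercept is $40$ even though only a slope and one other point were used.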