Linear regression review

Linear regression is a process of drawing a line through data in a scatter plot. The line summarizes the data, which is useful when making predictions.

What is linear regression?

When we see a relationship in a scatter plot, we can use a line to summarize that relationship. We can also use the line to make predictions. This process is called linear regression.
Want to see an example of linear regression? Check out this video.

Fitting a line to data

There are more advanced ways to fit a line to data, but in general, we want the line to go through the "middle" of the points.
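One of those more advanced ways is least squares, which picks the line that minimizes the total squared vertical distance to the points. The sketch below is only an illustration, not the method used in this review: the function name fit_line and the sample points are made up for the example.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    slope = numerator / denominator
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scatter-plot points, made up for illustration only.
sample_points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
slope, intercept = fit_line(sample_points)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Because the fitted line passes through the point of averages, it sits in the "middle" of the data in the sense described above.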
practice problem
Which line fits the data graphed below?
Choose 1 answer:

Want to learn more about fitting a line to data? Check out this video.
Want to practice more problems like this? Check out this exercise.

Using equations for lines of fit

Once we fit a line to data, we find its equation and use that equation to make predictions.

Example: Finding the equation

The percent of adults who smoke, recorded every few years since 1967, suggests a negative linear association with no outliers. A line was fit to the data to model the relationship.
Write a linear equation to describe the given model.
Step 1: Find the slope.
This line goes through (0,40) and (10,35), so the slope is (35-40)/(10-0) = -5/10 = -1/2.
Step 2: Find the y-intercept.
We can see that the line passes through (0,40), so the y-intercept is 40.
Step 3: Write the equation in y=mx+b form.
The equation is y=-0.5x+40.
Based on this equation, estimate what percent of adults smoked in 1997.
To estimate what percent of adults smoked in 1997, we can plug in 30 for x (since x represents years since 1967):
y=-0.5x+40
y=(-0.5)(30)+40
y=-15+40
y=25
Based on the equation, about 25% of adults smoked in 1997.
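The three steps and the prediction can be checked with a short script. This is a minimal sketch using the two points read from the graph in the example; the function name line_through and the variable names are chosen here for illustration.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    # Step 1: slope is the change in y divided by the change in x.
    slope = (y2 - y1) / (x2 - x1)
    # Step 2: solve y1 = slope * x1 + b for the y-intercept b.
    intercept = y1 - slope * x1
    return slope, intercept


# Points from the example: x is years since 1967, y is percent of adults who smoke.
slope, intercept = line_through((0, 40), (10, 35))

# Step 3: use y = mx + b to estimate the percent for 1997.
years_since_1967 = 1997 - 1967
estimate = slope * years_since_1967 + intercept
print(slope, intercept, estimate)  # prints -0.5 40.0 25.0
```

The negative slope matches the negative association in the data: the model drops half a percentage point per year.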
practice problem
Jacob distributed a survey to his fellow students asking them how many hours they'd spent playing sports in the past day. He also asked them to rate their mood on a scale from 0 to 10, with 10 being the happiest. A line was fit to the data to model the relationship.
Which of these linear equations best describes the given model?
Choose 1 answer:
Based on this equation, estimate the mood rating for a student who spent 2.5 hours playing sports.
Round your answer to the nearest hundredth.

Want to practice more problems like these? Check out this exercise.
