
Scatterplots | Lesson

What are scatterplots, and how frequently do they appear on the test?

A scatterplot displays data about two variables as a set of points in the xy-plane. Each axis of the plane usually represents a variable in a real-world scenario.
In this lesson, we'll learn to:
  1. Use the line of best fit to describe scatterplots
  2. Make predictions using the line of best fit
  3. Fit functions to scatterplots
On your official SAT, you'll likely see 2 to 3 questions that test your ability to analyze scatterplots.
You can learn anything. Let's do this!

How do we talk about scatterplots?

Note: More recent SAT practice tests do not explicitly test us on the concepts of linearity, strength, and direction, but it's still useful to know the language for describing scatterplots.

Bivariate relationship linearity, strength and direction


What is the line of best fit?

Interpreting a trend line


The line of best fit

While each point in a scatterplot represents a specific observation, the line of best fit describes the general trend based on all of the points.
For a given data point, we expect to see a difference between its y-value and the y-value predicted by the line of best fit. These differences are used for more advanced statistical analysis; for the SAT, we only need to calculate the difference.
We can also interpret the slope and y-intercept of the line of best fit the same way we interpret line graphs:
  • The slope represents a constant rate of change.
  • The y-intercept represents an initial value.
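To make the difference between a data point and the line concrete, here is a minimal Python sketch. The line y = 2x + 3 and the data point (4, 14) are made-up values chosen for illustration; they are not taken from any figure in this lesson.

```python
def predict(slope, intercept, x):
    """Return the y-value that the line y = slope*x + intercept predicts at x."""
    return slope * x + intercept

# Hypothetical line of best fit: y = 2x + 3
slope, intercept = 2, 3

# Hypothetical observed data point (x, y)
x_obs, y_obs = 4, 14

y_pred = predict(slope, intercept, x_obs)  # 2*4 + 3 = 11
difference = abs(y_obs - y_pred)           # positive difference in y-value

print(y_pred, difference)  # prints: 11 3
```

In this made-up example, the slope of 2 means the predicted y-value rises by 2 for every 1-unit increase in x, and the y-intercept of 3 is the predicted value when x is 0.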

Try it!

Try: find the difference between predicted and actual values
A scatterplot and its line of best fit are shown in the xy-plane above. Each blank takes an integer answer.
The line of best fit passes through the point (15, ____).
Point m has the coordinates (15, ____).
The positive difference in y-value between the data point and the line of best fit is ____.


Try: interpret the meaning of the line of best fit
The scatterplot above shows the relative housing cost and the population density for several large US cities in the year 2005. The equation of the line of best fit is y = 0.0125x + 61.
The constant 61 means that when the population density is 0 people per square mile of land area, the relative housing cost is ____.
The coefficient 0.0125 means that as the population density increases by 1,000 people per square mile of land area, the relative housing cost increases by ____ of the national average cost.


How do I use the line of best fit to make predictions?

Line of best fit: smoking in 1945


Predicting what we can and cannot see

When making predictions based on scatterplots, always use the line of best fit instead of individual data points.
If the prediction lies within the part of the xy-plane shown, it must lie on the line of best fit.
If the prediction lies beyond the part of the xy-plane shown, we can either extend the line of best fit or use its equation to find the prediction.
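As a rough illustration of using the equation rather than individual data points, the sketch below evaluates this lesson's housing-cost line, y = 0.0125x + 61, at two input densities. The inputs 2,000 and 10,000 are arbitrary picks for the example, the function name is hypothetical, and the result is assumed to be a percent of the national average cost.

```python
def predicted_cost(density):
    """Relative housing cost predicted by the line y = 0.0125x + 61.

    density: people per square mile of land area.
    The result is assumed to be a percent of the national average cost.
    """
    return 0.0125 * density + 61

# Arbitrary example inputs, chosen only to show the calculation.
examples = {density: predicted_cost(density) for density in (2_000, 10_000)}

for density, cost in examples.items():
    print(f"{density} people per square mile -> {cost:.1f}")
# prints:
# 2000 people per square mile -> 86.0
# 10000 people per square mile -> 186.0
```

The same function works whether or not the input falls inside the plotted range, which is why the equation is handy when the graph would have to be extended.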

Try it!

Try: predict using the line of best fit
The scatterplot above shows the relative housing cost and the population density for several large US cities in the year 2005. The equation of the line of best fit is y = 0.0125x + 61.
According to the graph, the predicted relative housing cost for a population density of 15,000 people per square mile of land area is approximately ____ of the national average cost.
According to the equation of the line of best fit, the predicted relative housing cost for a population density of 5,000 people per square mile of land area is ____ of the national average cost.


How do I fit functions to scatterplots?

Use direction and intercepts to determine the best fit

On the SAT, questions that ask you to fit a function to a scatterplot are always multiple choice, and all four choices are usually functions of the same type, e.g., four linear functions or four quadratic functions.
For linear functions in the form f(x) = mx + b:
  • Sketch a line that fits the data and approximate its slope.
  • The value of m should match the slope. Make sure to pay attention to the signs!
  • Approximate the y-intercept of the function that best fits the data. Make sure the constant term b matches the y-intercept.
For quadratic functions in the form f(x) = ax^2 + bx + c:
  • Sketch a parabola that approximately fits the data.
  • If the parabola opens upward, a should be positive. If the parabola opens downward, a should be negative.
  • Approximate the y-intercept of the function that best fits the data. Make sure the constant term c matches the y-intercept.
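The elimination steps for quadratics can be sketched in a few lines of Python. The four answer choices here are hypothetical, and the helper name `matches` is made up for this example. The idea is simply to keep the choices whose sign of a and sign of c agree with the sketched parabola.

```python
def matches(choice, opens_upward, intercept_positive):
    """Check whether a quadratic (a, b, c) agrees with the sketched parabola."""
    a, b, c = choice
    return (a > 0) == opens_upward and (c > 0) == intercept_positive

# Hypothetical answer choices, given as (a, b, c) for f(x) = ax^2 + bx + c.
choices = {
    "A": (1.5, 0, 300),
    "B": (-1.5, 0, 300),
    "C": (1.5, 0, -300),
    "D": (-1.5, 0, -300),
}

# Suppose the sketch opens downward and crosses the y-axis above 0.
remaining = [name for name, abc in choices.items()
             if matches(abc, opens_upward=False, intercept_positive=True)]

print(remaining)  # prints: ['B']
```

For linear choices the same approach applies, with the sign of m taking the place of the sign of a and b taking the place of c.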

Try it!

Try: describe a modeling function for a scatterplot
The scatterplot above shows the foot lengths and shoulder heights of the elephants in Kruger National Park in South Africa.
According to the scatterplot, as foot length increases, shoulder height generally ____. Therefore, the slope of the line of best fit for this scatterplot is ____.
If we sketch a line of best fit for the scatterplot, the y-intercept of the line would be close to 0 and slightly ____.


Your turn!

Practice: find the difference between data and prediction
The scatterplot above shows the dimensions of 12 picture frames on Lee's wall along with the line of best fit. Which of the following statements about the widest picture frame is true?


Practice: interpret the line of best fit
A panel is rating different kinds of potato chips. The scatterplot above shows the relationship between their average ratings and the price of the chips. The line of best fit for the data is also shown. According to the line of best fit, which of the following is closest to the predicted increase in average rating for every $0.10 increase in price?


Practice: predict using the line of best fit
The scatterplot above shows data from a random sample of people who reported the age and mileage of their cars. A line of best fit for the data is also shown. Based on the line of best fit, which of the following is closest to the predicted mileage, in thousands of miles, of a car that is 13 years old?


Practice: fit a quadratic function to a scatterplot
The scatterplot above shows y, the number of employees remaining in an office building, x hours after the building's air conditioning stopped working. Of the following equations, which best models the data in the scatterplot?


Want to join the conversation?

  • kemolelochannel
    these questions are way harder than the real tests
    (4 votes)
  • joshuacharlesmachemedze
    How do I draw a line of best fit on a scatterplot?
    (2 votes)
    • Hecretary Bird
      You'd want to make sure that there's a roughly equal number of data points both above and below the line of best fit. Use something straight to make sure that you draw a straight line. When trying to come up with an equation, try picking points where the line crosses grid intersections that are far away from each other to minimize error.
      (9 votes)
  • great student
    For the example under "How do I use the line of best fit to make predictions?" about relative housing cost and population density, why is the predicted relative housing cost for a population density of 15,000 people per square mile of land area 250% and not 300%? Doesn't the question ask about the value from the scatterplot graph? I hope someone can help, my SAT is tomorrow :]
    (3 votes)
    • Hecretary Bird
      This is a weird question, and I think it might be a mistake on Khan Academy's part to say "according to the graph" instead of something like "according to the line of best fit". 300% is the point on the scatterplot, while you get about 250% if you plug 15,000 people into the line of best fit. Rest assured that the SAT will be crystal-clear on where it wants you to get information from. Good luck for tomorrow!
      (5 votes)
  • MJ
    I thought the SAT always draws the line of best fit for us? If I'm wrong, someone please correct me.
    (2 votes)
  • Anuoluwapo Abosede
    How do I draw a line of best fit?
    (0 votes)
    • Hecretary Bird
      On the SAT, it's probably easiest and fastest to just eyeball the line of best fit and draw it on the graph if you're given a scatterplot and need the line to do something like predict a new point or find the y-intercept. To do this, you could either trust that you can figure out where the line of best fit ought to be and draw it there, or you can pick the two furthest-apart non-outlier points and connect those, if you'd like. The SAT won't have two answer choices so close together that you could get the wrong one by drawing the line of best fit slightly off.
      (4 votes)
  • dhvani
    can someone please explain how to form the equation in the last question? thank you!
    (0 votes)
    • Hecretary Bird
      The easiest way to do this question is not to come up with the full equation, but instead to look at what the graph gives you and examine the answer choices for shared characteristics.
      If you look at the answer choices, there are really only 2 decisions you have to make for the equation: a negative or positive x^2 term, and a negative or positive constant (300). The graph curves downwards and intersects the y-axis at a positive value, so we know that our equation would be choice B.
      If you wanted to find the full equation the hard way, you could plug a bunch of points into your graphing calculator and use regression, or I guess take 3 points on the graph and set up a system of equations with them.
      (3 votes)
  • sumedhkulkarni740
    In the question solved above, I don't get why they take a 5-point increase for every $1 increase. I think it should be a 5-point increase for every $0.5 (that's how slope works, right?).

    "A panel is rating different kinds of potato chips. The scatterplot above shows the relationship between their average ratings and the price of the chips. The line of best fit for the data is also shown. According to the line of best fit, which of the following is closest to the predicted increase in average rating for every $0.10 increase in price?"
    (0 votes)
    • Marian Oxford
      A 5-point increase for every $1 increase is correct. If you look at the graph above the $1 mark on the line of best fit, it's at 0 points. Then, when you move up the line of best fit until it is directly above the $2 mark (an increase of one dollar), it's at 5 points (an increase of five points).

      It might be confusing because the answer to the question is 0.5. The question, however, asked what the point increase would be for an increase of $0.10, not $1. You know that it's the same thing though, because if you multiply $0.10 and 0.5 by 10, you get $1 and 5.
      (1 vote)
  • mtalhaasif8
    I am taking the SAT for the first time on March 12, and I am new to this test. Since I have little time, I can't decide how to manage it and get more done in less time. Also, if I watch a video on a topic I feel relatively weak in, e.g. graphs and scatterplots in math, it leads me to other related videos and articles, so most of my time is spent watching videos and less on practice. What do you recommend: should I practice more and watch fewer videos, or stay to the point and not go deep into the concepts?
    (0 votes)
    • Johanna
      I’d recommend that you focus more on practice. Make sure to review your practices to get an idea of how/why you’re getting things right or wrong. Then if you find that there’s a topic you really don’t understand, check out a video or article on it for a bit.
      (1 vote)
  • akshararoysharma
    Are the questions asked in the actual test also this hard?
    (0 votes)
  • sissy koch
    Referring to the practice scatterplot for an elephant's foot length vs. shoulder height, how is the y-intercept "slightly negative"? As one variable increased, the other did too?
    (0 votes)
    • Johanna
      What happens to one variable as the other changes relates to slope, not y-intercept. The y-intercept is the point where the line would intersect the y-axis, or the value of y if you substitute 0 for x. We see on the graph that the line of best fit intersects the y-axis slightly below zero, so the y-intercept is slightly negative.

      Does that help?
      (0 votes)