Outliers in scatter plots

Learn what an outlier is and how to find one!

What are outliers in scatter plots?

Scatter plots often have a pattern. We call a data point an outlier if it doesn't fit the pattern.
A scatterplot plots Backpack weight in kilograms on the y-axis, versus Student weight in kilograms on the x-axis. 5 points rise diagonally in a narrow pattern of points between (40, 4) and (76, 12.5). A point labeled Sharon is above the pattern. A point labeled Brad is below the pattern. All points are estimated.
Consider the scatter plot above, which shows data for students on a backpacking trip. (Each point represents a student.)
Notice how two of the points don't fit the pattern very well. These points have been labeled Brad and Sharon, which are the names of the students they represent.
Sharon could be considered an outlier because she is carrying a much heavier backpack than the pattern predicts.
Brad could be considered an outlier because he is carrying a much lighter backpack than the pattern predicts.
Key idea: There is no special rule that tells us whether or not a point is an outlier in a scatter plot. When doing more advanced statistics, it may become helpful to invent a precise definition of "outlier", but we don't need that yet.
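To give a taste of what a precise definition might look like, here is a minimal Python sketch of one possible rule: fit a straight line to all of the points, then flag any point whose backpack weight is more than a chosen cutoff away from that line. This is only an illustration, not a standard definition. The function names `fit_line` and `find_outliers` are our own, the 4 kg cutoff is an arbitrary choice, and the data values are made up for the example (only the two ends of the pattern, (40, 4) and (76, 12.5), come from the plot description).

```python
def fit_line(points):
    """Least-squares line y = slope * x + intercept through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_outliers(students, cutoff_kg):
    """Return {name: residual} for points farther than cutoff_kg from the line.

    The residual is actual backpack weight minus the weight the line predicts,
    so a positive value means a heavier backpack than predicted.
    """
    slope, intercept = fit_line(list(students.values()))
    residuals = {
        name: y - (slope * x + intercept) for name, (x, y) in students.items()
    }
    return {name: r for name, r in residuals.items() if abs(r) > cutoff_kg}


# Hypothetical (student weight, backpack weight) values in kilograms.
# These are NOT read off the article's plot; they only mimic its shape.
students = {
    "Student 1": (40, 4.0),
    "Student 2": (49, 6.0),
    "Student 3": (58, 8.5),
    "Student 4": (67, 10.5),
    "Student 5": (76, 12.5),
    "Sharon": (50, 13.0),
    "Brad": (72, 5.0),
}

# The 4 kg cutoff is an assumption made for this example.
outliers = find_outliers(students, cutoff_kg=4.0)
for name, residual in outliers.items():
    side = "heavier" if residual > 0 else "lighter"
    print(f"{name}: backpack is {abs(residual):.1f} kg {side} than the line predicts")
```

With these made-up numbers, only Sharon and Brad are flagged. Notice that the line is fitted to every point, so Sharon and Brad themselves tug it toward them, and a different cutoff could change which points get flagged. That is part of why there is no single rule.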

Practice problems

To fully wrap our minds around why certain data points might be considered outliers, let's try a couple of practice problems.

Problem 1: Computer shopping

A scatterplot plots Quality rating on the y-axis, versus Price in dollars on the x-axis. 12 points rise diagonally in a relatively narrow pattern of points between (125, 60) and (175, 80). A point labeled A is at the beginning of the pattern. A point labeled B is above the pattern. A point labeled C is at the end of the pattern. A point labeled D is below the pattern to the right. All points are estimated.
Michele was researching different computers to buy for college. She looked up the prices and quality ratings for a sample of computers. Her data is shown in the scatter plot above, where each point is a computer.
Michele wants to buy a computer whose quality rating is far higher than the pattern would predict based on its price.
Which of the labeled points represents a computer that Michele wants to buy?

Problem 2: Test scores

A scatterplot plots Average math score on the y-axis, versus Participation (percentage taking SAT) on the x-axis. 44 points fall diagonally with a cluster of points between (3, 615) and (25, 525) and another cluster of points between (43, 500) and (85, 500). 3 solid points are a different color and labeled point A, point B, and point C. Point A is plotted at (16, 505). Point B is plotted at (76, 465). Point C is plotted at (93, 468). All points are estimated.
Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot above shows what percent of each state's college-bound graduates took the SAT in 2009-2010, along with that state's average score on the math section.
The three labeled points could be considered outliers.
Why might these points be considered outliers?
