Clusters in scatter plots

AP.STATS: DAT‑1 (EU), DAT‑1.A (LO), DAT‑1.A.6 (EK)
Learn what a cluster in a scatter plot is! 

What are clusters in scatter plots?

Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.
A scatterplot plots Sodium per serving in milligrams on the y-axis, versus Calories per serving on the x-axis. 16 points rise diagonally in a relatively narrow pattern with a cluster of 8 points between (135, 350) and (155, 360) and another cluster of 8 points between (170, 450) and (195, 500). Both clusters are labeled a different color.
Data source: Consumer Reports, June 1986, pp. 366-367
Consider the scatter plot above, which shows nutritional information for 16 brands of hot dogs in 1986. (Each point represents a brand.) The points form two clusters, one on the left and another on the right.
The left cluster is of brands that tend to be low in calories and low in sodium.
The right cluster is of brands that tend to be high in calories and high in sodium.
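One rough way to separate two clusters like these is to look for the widest horizontal gap between neighboring points. The sketch below is only an illustration: the points are hypothetical values, not the Consumer Reports data, and `split_at_largest_gap` is a made-up helper name. It assumes there are exactly two clusters and that they are separated along the x-axis.

```python
# Hypothetical (calories, sodium) points for illustration only.
# These are NOT the actual Consumer Reports measurements.
points = [(135, 300), (140, 320), (150, 340), (155, 360),
          (175, 450), (180, 470), (190, 480), (195, 500)]


def split_at_largest_gap(points):
    """Split points into a left and a right group at the widest gap in x."""
    ordered = sorted(points)  # sort by x-value (ties broken by y-value)
    # Horizontal distance between each pair of neighboring points.
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # index of the first point in the right group
    return ordered[:cut], ordered[cut:]


left, right = split_at_largest_gap(points)
print("Left cluster:", left)
print("Right cluster:", right)
```

With these made-up points, the widest gap falls between 155 and 175 calories, so the four lower-calorie points land in the left group and the four higher-calorie points land in the right group. Real data is rarely this tidy, so a split like this is a starting point for looking at the plot, not a replacement for it.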

Practice problems

To better wrap our minds around the idea of clusters, let's try a couple of practice problems.

Problem 1: Male and female fish

Adult male Lamprologus callipterus (a type of fish) are much bigger than their female counterparts. They weigh about 13 times as much. Also, while females reach a length of 6 centimeters, males reach a length of 15 centimeters.
Which of the plots shown below might describe measurements of a group of adult Lamprologus callipterus?

Problem 2: SAT test scores

Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot below shows what percent of each state's college-bound graduates participated in the SAT in 2009-2010, along with that state's average score on the math section.
Data from National Center for Education Statistics
There is a cluster of states with lower participation, and a cluster of states with higher participation.
What is the best interpretation of these clusters?

Why do clusters exist in data?

Explaining why clusters exist in a particular data set can be difficult. This article presented three data sets, each using data from the real world. Only in the fish data set was there a clear explanation behind the clusters.
If you have a theory that explains the clusters in either of the other data sets, please share your thoughts in the comments below.

Want to join the conversation?

  • Alex
    up vote for a cookie
    (22 votes)
  • Bella St.
    The ingredients in the hot dogs can affect their ratings.
    (9 votes)
  • Peaches
    Perhaps the pattern of lower participation but higher math scores on the SAT is related to ambition. For example, in states with lower educational progress, the majority of students might not treat higher education as a priority. For that reason, the best-performing of the state's students (the minority) would be the most likely to want to go to college and therefore to participate in SAT testing. Or perhaps certain states place more value on teaching every student SAT skills, whereas other states focus on individual students who already perform well and encourage those students to sit for the test. Or, in yet another scenario, perhaps the states with the best education and smartest students have less participation due to some other factor (e.g., derision for the educational system as a whole, lack of ambition due to laziness from being accustomed to learning being so easy, boredom, etc.).
    (6 votes)
  • 23kghamman
    This was very helpful for me, but I am a little confused on #2, could someone please explain to me?
    (3 votes)
    • Hamilton, Michael - P3
      The problem is categorizing the dots into either lower or higher participation. It is asking you to find the best fitting answer, but in this case, it is basically just true or false.

      The first answer choice says, "The states with lower participation typically had lower math scores." If you look at the graph, all the green dots, which represent lower participation, are higher on the y-axis. The y-axis' variable is test scores. This means this is untrue.

      The second choice says, "The states with lower participation typically had higher math scores." Check the graph again. The green dots are high on the y-axis, representing higher test scores. This statement is true.
      (9 votes)
  • Maddy!
    I'm not ready for college LET ALONE THIS
    (6 votes)
  • Pranu
    I think that for the SAT problem, clusters might be present because, in the states with lower participation, only the students who feel that taking the SAT is "worth it" or who have confidence in their abilities take the test. This theory makes sense because the math scores are higher in states with lower participation. I'm not sure if this is really the reason, but I gave this a shot!
    (5 votes)
  • Gage Davis
    Also, some don't make a lot of sense, like the college SATs.
    (5 votes)
  • weirderquark
    The hotdog brand clusters seem to be an example of competitive positioning in marketing. Hotdog brands need to be able to compete with other brands either by being healthier or by being tastier. Any brands with mid-range levels of sodium and calories will be cornered out of the market on all sides.
    (4 votes)
    • ɘɒbiᴎoƚʏT ᴎᴎʏlꟻ
      Yes, that's a good point. Competitive positioning is an important aspect of marketing, and hot dog brands are no exception. Brands need to find a way to differentiate themselves from competitors and appeal to their target audience. In the case of hot dogs, some brands may choose to focus on health aspects, while others may focus on flavor and indulgence. This creates distinct clusters of brands with different value propositions, and as you mentioned, brands with mid-range levels of sodium and calories may struggle to find a place in the market.
      (2 votes)
  • Petra
    wow this kind of helped me. yay
    (4 votes)
  • nicole.cook
    On #2, I don't understand why the answer is "The states with lower participation typically had higher math scores."
    (2 votes)
    • Ian Pulizzotto
      The cluster of green points is to the left of the cluster of blue points, so the green cluster represents states with lower participation (since the horizontal axis represents participation). Most of the green points are above most of the blue points, so states with lower participation usually had higher math scores (since the vertical axis represents average math score).

      Have a blessed, wonderful day!
      (4 votes)