
Course: 8th grade (Illustrative Mathematics) > Unit 6 > Lesson 7: Observing more patterns in scatter plots

# Clusters in scatter plots

AP.STATS: DAT‑1 (EU), DAT‑1.A (LO), DAT‑1.A.6 (EK)
Learn what a cluster in a scatter plot is!

## What are clusters in scatter plots?

Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.
Figure: Scatter plot of sodium per serving (milligrams, y-axis) versus calories per serving (x-axis) for 16 hot dog brands. The points rise diagonally in a fairly narrow band and form two clusters of 8 points each, drawn in different colors: one between about (135, 350) and (155, 360), and another between about (170, 450) and (195, 500).
Data source: Consumer Reports, June 1986, pp. 366-367
Consider the scatter plot above, which shows nutritional information for 16 brands of hot dogs in 1986. (Each point represents a brand.) The points form two clusters, one on the left and another on the right.
The left cluster contains brands that tend to be **low in calories and low in sodium**.
The right cluster contains brands that tend to be **high in calories and high in sodium**.
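The idea of "distinct groups" can be made concrete with a small sketch. The Python below assumes a handful of invented (calories, sodium) points shaped roughly like the hot dog plot; they are not the Consumer Reports values. It splits the points into two clusters at the widest gap between neighboring calorie values. Splitting at the largest gap is just one simple heuristic chosen for illustration, and it only works when the clusters are separated along the x-axis.

```python
# Hypothetical (calories, sodium_mg) points, invented for illustration only.
# They are NOT the Consumer Reports data; they just mimic a two-cluster shape.
points = [
    (138, 352), (142, 355), (146, 351), (150, 358),
    (176, 460), (182, 471), (188, 489), (194, 496),
]


def split_at_largest_gap(pts):
    """Split points into two clusters at the widest gap between
    neighboring x-values (a simple one-dimensional heuristic)."""
    ordered = sorted(pts)  # sorts by x (calories) first
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # index where the right cluster starts
    return ordered[:cut], ordered[cut:]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


left, right = split_at_largest_gap(points)
for name, cluster in (("left", left), ("right", right)):
    avg_cal = mean(x for x, _ in cluster)
    avg_na = mean(y for _, y in cluster)
    print(f"{name}: {len(cluster)} brands, mean {avg_cal:.1f} calories, {avg_na:.1f} mg sodium")
```

With these made-up points, the left cluster averages 144.0 calories and 354.0 mg of sodium, while the right cluster averages 185.0 calories and 479.0 mg, matching the "low and low" versus "high and high" pattern described above.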

## Practice problems

To better wrap our minds around the idea of clusters, let's try a couple of practice problems.

### Problem 1: Male and female fish

Adult male *Lamprologus callipterus* (a type of fish) are much bigger than their female counterparts. They weigh about 13 times as much. Also, while females reach a length of 6 centimeters, males reach a length of 15 centimeters.
Which of the plots shown below might describe measurements of a group of adult *Lamprologus callipterus*?

### Problem 2: SAT test scores

Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot below shows what percent of each state's college-bound graduates participated in the SAT in 2009–2010, along with that state's average score on the math section.
Data from National Center for Education Statistics
There is a cluster of states with **lower participation** and a cluster of states with **higher participation**.
What is the best interpretation of these clusters?

## Why do clusters exist in data?

Explaining why clusters exist in a particular data set can be difficult. This article presented three data sets, each using data from the real world. Only in the fish data set was there a clear explanation behind the clusters.
If you have a theory that explains the clusters in either of the other data sets, please share your thoughts in the comments below.

## Want to join the conversation?

• In states with lower participation, only the best students took the test, so the average score was higher.
• That's a good explanation for why there's a negative relationship between participation and average score, but it doesn't explain why the data is clustered. Why are there no schools with middling participation levels and middling average scores?

As a non-American I don't know enough about the US school system and college entrance to make more than a guess, but it looks like schools are incentivised to have higher average SAT scores. That would make sense if schools are rewarded with, for example, higher budgets when their average SAT scores are higher. So for schools with poorly performing students, there's no real reason to put any effort into excluding some students from the test, but for schools with at least a decent proportion of high-performing students, there is a chance of hitting those incentives if they convince the weaker performers not to take it.

In that way, you tend to get only schools where almost everyone participates or schools where only the best participate, not anything in between. It also suggests a cause for the negative relationship that runs in the opposite direction from the one you stated. Not only does lower participation mean you probably have only your best performers taking the test, but the schools with the best performers are also the ones most likely to convince weaker performers not to take it.
• What is the point of this?
• No one knows.
• Hot dogs: Perhaps distinct clusters form because certain hot dogs are marketed as healthier, while other brands are not trying to be healthy at all. The healthy brands (low sodium and calories) may be clustered together because they are all competing with each other to advertise the lowest calories and sodium. And maybe there are no "middle" brands trying to balance health with flavor. Either a brand is focused on making a healthy hot dog, or it doesn't worry about calorie and sodium levels at all in an effort to make the most delicious hot dog.
• That's an interesting hypothesis! It's possible that hot dog brands are indeed clustered based on their marketing strategies and target audience. Some brands may focus on catering to health-conscious consumers, while others may prioritize flavor and indulgence. It's also possible that there are some brands that try to strike a balance between health and taste, but they may not be as prominent in the market or may not have a distinctive enough profile to form a distinct cluster. Ultimately, more research would be needed to confirm or refute this hypothesis.
• The ingredients in the hot dogs can affect their ratings.
• Hard topic but good information.
• Keep at it Jaidan. After a lot of practice you will find that these hard topics don't seem so hard after all.
• This was very helpful for me, but I am a little confused about #2. Could someone please explain it to me?
• The problem is categorizing the dots into either lower or higher participation. It is asking you to find the best fitting answer, but in this case, it is basically just true or false.

The first answer choice says, "The states with lower participation typically had lower math scores." If you look at the graph, all the green dots, which represent lower participation, are higher on the y-axis. The y-axis variable is the average math score, so this statement is false.

The second choice says, "The states with lower participation typically had higher math scores." Check the graph again. The green dots are high on the y-axis, representing higher test scores. This statement is true.
• Also, some of these don't make a lot of sense, like the college SATs.