Clusters in scatter plots

Learn what a cluster in a scatter plot is! 

What are clusters in scatter plots?

Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.
A scatter plot shows sodium per serving, in milligrams, on the y-axis versus calories per serving on the x-axis. The 16 points rise diagonally in a relatively narrow band, with one cluster of 8 points between (135, 350) and (155, 360) and another cluster of 8 points between (170, 450) and (195, 500). The two clusters are shown in different colors.
Data source: Consumer Reports, June 1986, pp. 366-367
Consider the scatter plot above, which shows nutritional information for 16 brands of hot dogs in 1986. (Each point represents a brand.) The points form two clusters, one on the left and another on the right.
The left cluster consists of brands that tend to be low in both calories and sodium.
The right cluster consists of brands that tend to be high in both calories and sodium.
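To see how a "left" and a "right" group might be separated by a computer instead of by eye, here is a minimal sketch in Python. It assumes there are exactly two clusters and that they are separated by the widest horizontal gap in the plot. The function name split_at_largest_gap is made up for this example, and the points are illustrative values chosen inside the ranges described for the plot above, not the actual Consumer Reports measurements.

```python
def split_at_largest_gap(points):
    """Split (x, y) points into a left and a right group at the widest gap in x.

    Assumes at least two points and exactly two clusters.
    """
    ordered = sorted(points)  # sorts by x first, then by y
    # Distance in x between each point and the next one to its right
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    # Cut just after the point that sits before the widest gap
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Illustrative (calories, sodium) pairs, NOT the real data set
points = [(135, 350), (140, 352), (150, 358), (155, 360),
          (170, 450), (180, 470), (190, 490), (195, 500)]

left, right = split_at_largest_gap(points)
print("Left cluster:", left)
print("Right cluster:", right)
```

This simple rule only works when the groups are clearly separated along one axis. For messier data, people usually turn to a dedicated clustering method instead.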

Practice problems

To better wrap our minds around the idea of clusters, let's try a couple of practice problems.

Problem 1: Male and female fish

Adult male Lamprologus callipterus (a type of fish) are much bigger than their female counterparts: males weigh about 13 times as much as females. Males also reach a length of 15 centimeters, while females reach only 6 centimeters.
Which of the plots shown below might describe measurements of a group of adult Lamprologus callipterus?

Problem 2: SAT test scores

Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot below shows what percent of each state's college-bound graduates participated in the SAT in 2009-2010, along with that state's average score on the math section.
A scatter plot shows participation (percent of graduates taking the SAT) on the x-axis versus average math score on the y-axis. The 47 points fall diagonally, with one cluster of points between (3, 615) and (25, 525) and another cluster of points between (43, 500) and (93, 500). The two clusters are shown in different colors. All points are estimated.
Data from National Center for Education Statistics
There is a cluster of states with lower participation, and a cluster of states with higher participation.
What is the best interpretation of these clusters?

Why do clusters exist in data?

Explaining why clusters exist in a particular data set can be difficult. This article presented three data sets, each using data from the real world. Only in the fish data set was there a clear explanation behind the clusters.
If you have a theory that explains the clusters in either of the other data sets, please share your thoughts in the comments below.

