# Center, spread, and shape of distributions | Lesson

## What are "center, spread, and shape of distributions" problems, and how frequently do they appear on the test?

Center, spread, and shape of distributions are also known as summary statistics (or statistics for short); they concisely describe data sets.
• Center describes a typical value in a data set. The SAT covers three measures of center: mean, median, and occasionally mode.
• Spread describes the variation of the data. Two measures of spread are range and standard deviation.
On your official SAT, you'll likely see 2 to 3 questions that test your ability to calculate, compare, and use the center, spread, and shape of distributions.
You can learn anything. Let's do this!

## What do the measures of center represent?

### Statistics intro: mean, median, & mode

[Video: Statistics intro: mean, median, & mode]

### How do I find the mean, median, and mode?

On the SAT, we need to know how to find the mean, median, and mode of a data set.

#### Mean

The mean is the average value of a data set.
$$\text{mean} = \frac{\text{sum of values}}{\text{number of values}}$$

Example:
2, 5, 6, 7, 10
What is the mean of the data set above?

Example:
| Pets owned | Number of students |
| --- | --- |
| 0 | 4 |
| 1 | 3 |
| 2 | 3 |
| 3 | 2 |
A teacher asked 12 students how many pets they owned. The results are shown in the table above. What is the average number of pets owned by the students?
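
The two mean examples above can be checked with a short Python sketch. The variable names are illustrative, not part of the lesson; the key point is that a frequency table needs each value weighted by how many times it occurs.

```python
# Mean of a plain list: sum of values divided by number of values.
values = [2, 5, 6, 7, 10]
simple_mean = sum(values) / len(values)

# Mean from a frequency table: {pets owned: number of students}.
pet_counts = {0: 4, 1: 3, 2: 3, 3: 2}

# Each pet count is multiplied by the number of students who reported it.
total_pets = sum(pets * students for pets, students in pet_counts.items())
total_students = sum(pet_counts.values())
pets_mean = total_pets / total_students

print(simple_mean)  # 6.0
print(pets_mean)    # 1.25
```

A common mistake is to average only the first column (0, 1, 2, 3); that ignores how many students gave each answer.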

#### Median

The median is the middle value when the data are ordered from least to greatest.
• If the number of values is odd, the median is the middle value.
• If the number of values is even, the median is the average of the two middle values.

Example:
9, 7, 12, 5, 9
What is the median of the data set above?

Example:
2, 5, 6, 7, 7, 10
What is the median of the data set above?
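
As a sketch of the odd and even rules above, the hypothetical helper `median_of` below sorts the data first and then picks the middle value or averages the two middle values. It assumes a non-empty list of numbers.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # order from least to greatest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median_of([9, 7, 12, 5, 9])       # sorted: 5, 7, 9, 9, 12
even_median = median_of([2, 5, 6, 7, 7, 10])   # middle values: 6 and 7

print(odd_median)   # 9
print(even_median)  # 6.5
```

Note that the first data set is not given in order, so skipping the sort would give the wrong middle value.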

#### Mode

The mode is the value that appears most frequently in a data set. A data set can have no mode if no value appears more than any other; a data set can also have more than one mode.

Example:
1, 1, 2, 3, 3, 3, 3, 3, 8
What is the mode of the data set above?
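
One way to sketch the mode in Python is to count occurrences with `collections.Counter`. The helper name `modes` is an assumption for illustration, and it follows this lesson's convention that a data set has no mode when no value appears more often than any other.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    # Lesson's convention: if every distinct value appears equally often,
    # no value appears more than any other, so there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

single_mode = modes([1, 1, 2, 3, 3, 3, 3, 3, 8])
no_mode = modes([1, 2, 3])
two_modes = modes([1, 1, 2, 2, 5])

print(single_mode)  # [3]
print(no_mode)      # []
print(two_modes)    # [1, 2]
```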

### Try it!

Try: find the centers of a distribution
| Item | Price (dollars) |
| --- | --- |
| VHS tape | 3 |
| Salt box | 2 |
| Hammock | 15 |
| Concert poster | 5 |
| Hoodie | 5 |
| Raccoon statue | 7 |
The table above shows the items Stevie bought from a garage sale and their prices.
What is the mean price of the items Stevie bought? ____ dollars
What is the median price of the items Stevie bought? ____ dollars
What is the mode of the prices? ____ dollars

## What do the measures of spread represent?

### Measures of spread: range, variance & standard deviation

[Video: Measures of spread: range, variance & standard deviation]
Note: variance is not covered on the SAT, and you will not need to calculate standard deviation.

### How do I find the range and standard deviation?

On the SAT, we need to know how to find the range of a data set. While we won't be asked to calculate the standard deviation, we do need to have a sense of the relative standard deviations of two data sets.

#### Range

The range measures the total spread of the data; it is the difference between the maximum and minimum values.
$$\text{range} = \text{maximum value} - \text{minimum value}$$
A larger range indicates a greater spread in the data.

Example:
1, 9, 4, 3, 8
What is the range of the data set above?
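
The range calculation for this example can be sketched in a couple of lines of Python; the variable names are illustrative.

```python
values = [1, 9, 4, 3, 8]

# Range is the maximum value minus the minimum value; order does not matter.
data_range = max(values) - min(values)

print(data_range)  # 8
```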

#### Standard deviation

Standard deviation measures the typical spread from the mean; it is roughly the typical distance between a value in the data set and the mean.
Larger standard deviations indicate greater spread in the data.
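
To get a feel for relative standard deviations, here is a minimal sketch using two made-up data sets (not from the lesson) that share the same mean. It uses `statistics.pstdev`, the population standard deviation from Python's standard library. The SAT does not ask for this calculation, only for the comparison.

```python
from statistics import mean, pstdev

# Hypothetical data sets, chosen so both have a mean of 5.
clustered = [4, 5, 5, 5, 6]   # values sit close to the mean
spread_out = [1, 3, 5, 7, 9]  # values sit far from the mean

assert mean(clustered) == mean(spread_out) == 5

sd_clustered = pstdev(clustered)
sd_spread_out = pstdev(spread_out)

# The data set whose values lie farther from the mean has the larger
# standard deviation.
print(sd_clustered < sd_spread_out)  # True
```

On a dot plot, the same idea applies visually: dots bunched near the center suggest a smaller standard deviation than dots scattered toward the ends.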

Example:
Of the two dot plots shown above, which one has a greater standard deviation?

### Try it!

Try: compare two distributions
Guitar practice time in minutes
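| Day | Jazmin | Pablo |
| --- | --- | --- |
| Monday | 30 | 30 |
| Tuesday | 45 | 0 |
| Wednesday | 30 | 45 |
| Thursday | 45 | 30 |
| Friday | 45 | 0 |
| Saturday | 60 | 120 |
| Sunday | 60 | 90 |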
The table above shows the amount of time Jazmin and Pablo spent practicing guitar last week.
The range of Jazmin's practice times is ____ minutes.
The range of Pablo's practice times is ____ minutes.
Both Jazmin and Pablo practiced an average of 45 minutes a day. However, because Jazmin's practice times are ____ the 45-minute mean than Pablo's, the standard deviation of Jazmin's practice times is ____ that of Pablo's practice times.

## How do outliers affect summary statistics?

### Impact on median & mean: removing an outlier

[Video: Impact on median & mean: removing an outlier]

### The effect of outliers

An outlier is a value in a data set that significantly differs from other values. The inclusion of outliers in data sets can greatly skew the summary statistics, which is why outliers are often removed from data sets.

#### Effect on the range and standard deviation

The inclusion of outliers increases the spread of data, leading to a larger range and standard deviation. Conversely, removing outliers decreases the spread of data, leading to a smaller range and standard deviation.

#### Effect on the mean

An outlier can significantly skew the mean of a data set. For example, consider the data set $\{3, 5, 7, 7, 10, 100\}$.
100 is an outlier; it is significantly larger than the other values in the data set. If we include the 100, the mean of the data set is:
$$\frac{3 + 5 + 7 + 7 + 10 + 100}{6} = 22$$
Notice that the mean, 22, is greater than 5 of the 6 values in the data set! If we remove the 100, however, the mean of the remaining values is:
$$\frac{3 + 5 + 7 + 7 + 10}{5} = 6.4$$
The removal of an outlier is guaranteed to change the mean.
• If a very large outlier is removed, the mean of the remaining values will decrease.
• If a very small outlier is removed, the mean of the remaining values will increase.
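
The effect on the mean can be reproduced with a short sketch that uses the lesson's data set; the variable names are illustrative.

```python
data = [3, 5, 7, 7, 10, 100]

# Mean with the outlier included.
mean_with_outlier = sum(data) / len(data)

# Remove the large outlier (100) and recompute.
remaining = [v for v in data if v != 100]
mean_without_outlier = sum(remaining) / len(remaining)

print(mean_with_outlier)     # 22.0
print(mean_without_outlier)  # 6.4
```

Removing the very large value pulls the mean down from 22 to 6.4, which matches the first bullet above.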

#### Effect on the median

The median of the data set $\{3, 5, 7, 7, 10, 100\}$ is 7.
If we remove the outlier 100, the median of the remaining values, $\{3, 5, 7, 7, 10\}$, is still 7!
Because the median is based on the middle values of a data set, an outlier does not affect the median of a data set as strongly as it affects the mean. As such, the removal of an outlier can still change the median, but that change is not guaranteed.
• If a very large outlier is removed, the median of the remaining values will either decrease or remain the same.
• If a very small outlier is removed, the median of the remaining values will either increase or remain the same.
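
Both outcomes for the median can be illustrated with `statistics.median`. The first data set is the lesson's; the second is a made-up variation (an assumption for illustration) in which removing the outlier does shift the median.

```python
from statistics import median

# Lesson's data set: the median does not change when 100 is removed.
lesson_data = [3, 5, 7, 7, 10, 100]
before_same = median(lesson_data)       # average of 7 and 7
after_same = median(lesson_data[:-1])   # middle value of 3, 5, 7, 7, 10

# Hypothetical data set: the median decreases when 100 is removed.
other_data = [3, 5, 7, 9, 10, 100]
before_shift = median(other_data)       # average of 7 and 9
after_shift = median(other_data[:-1])   # middle value of 3, 5, 7, 9, 10

print(before_same, after_same)    # 7.0 7
print(before_shift, after_shift)  # 8.0 7
```

In both cases the median moves by far less than the mean does, and it never moves up when a very large value is removed.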

### Try it!

Try: determine the effect of removing an outlier
The dot plot above shows the height in inches of 20 elementary school students.
If the shortest student is removed from the data set and the summary statistics are re-calculated, how would they compare to the summary statistics for all 20 students?
The mean height of the 19 remaining students would be ____ that of all 20 students.
The median height of the 19 remaining students would be ____ that of all 20 students.
The range of the heights of the 19 remaining students would be ____ that of all 20 students.

## How do I use the mean to calculate a missing value?

### Missing value given the mean

[Video: Missing value given the mean]

### How do I solve for a missing value?

If we know the mean of a data set and the number of values, we can calculate a missing value in the data set by:
1. Calculating the sum of values by multiplying the mean by the number of values.
2. Subtracting all known values from the sum of values.

Example:
20, 20, 40, 60, x
If the mean of the five numbers above is 30, what is the value of x?
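
The two steps above can be sketched as a small function. The name `missing_value` and its parameters are hypothetical, and it assumes exactly one value is unknown.

```python
def missing_value(mean, count, known_values):
    """Find the one unknown value given the mean and the total count."""
    total = mean * count               # step 1: sum of all values
    return total - sum(known_values)   # step 2: subtract the known values

# Example above: 20, 20, 40, 60, x with a mean of 30.
x = missing_value(mean=30, count=5, known_values=[20, 20, 40, 60])

print(x)  # 10
```

Here the total must be 30 × 5 = 150, and the known values add up to 140, so x is the difference.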

### Try it!

Try: find a missing value using the mean
| Game | Points scored |
| --- | --- |
| 1 | 11 |
| 2 | x |
| 3 | 13 |
| 4 | 7 |
| 5 | 9 |
| 6 | 12 |
The table above shows the number of points Marco scored in the last six basketball games he played. Marco doesn't remember how many points he scored in game 2, but his coach tells him he averaged 10 points per game.
What is the total number of points Marco scored in the six games? ____ points
How many points did Marco score in games 1, 3, 4, 5, and 6? ____ points
How many points did Marco score in game 2? ____ points

Practice: compare two distributions
| Name | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 |
| --- | --- | --- | --- | --- | --- |
| Amara | 98 | 95 | 94 | 93 | 95 |
| Lance | 96 | 95 | 100 | 88 | 96 |
Amara and Lance are taking the same class. The table above shows their test scores for the class. Which of the following statements about their test scores is true?

Practice: find the median given frequency data
Ned runs a soybean farm and recorded the yields for 175 different one-acre sections. The results are shown in the graph above. Which of the following could be the median yield of Ned's soybean acres?

Practice: determine the effects of changing a data set
The minimum value of a data set consisting of 15 positive integers is 29. A new data set consisting of 16 positive integers is created by including 22 in the original data set. Which of the following measures must be 7 greater for the new data set than for the original data set?

Practice: find a missing value using the mean
Last week, George drove an average of 52 miles per day. If the day he drove the longest distance is removed, the average distance he drove in the remaining 6 days becomes 40 miles per day. What was the longest distance, in miles, George drove in a single day last week?

## Things to remember

$$\text{mean} = \frac{\text{sum of values}}{\text{number of values}}$$
The median is the middle value when the data are ordered from least to greatest.
• If the number of values is odd, the median is the middle value.
• If the number of values is even, the median is the average of the two middle values.
The mode is the most common value in a data set.
$$\text{range} = \text{maximum value} - \text{minimum value}$$
Standard deviation measures the typical spread from the mean.

## Want to join the conversation?

• How can the average number of pets a person owns be 1.25 pets? This is from the question about the average number of pets owned by the students. Wouldn't it make sense to round in this situation?
• If the question asked something like "How many pets does the average person own", then you would definitely round to 1 pet, but I think that because the question specifically mentions the number of pets and not the pets themselves, it's fine to keep it as a decimal. On the actual SAT, it'll definitely be more explicit if you have to round, like in my first example.
• I don't understand the practice problem on finding the median from frequency data.
• Remember that the median is the middle number, so 50% of the numbers will be less than it and 50% greater than it. If you're given a graph charting the frequency of a data set, the median will be the point where the area of the part left of the median and the area of the part right of the median will be the same.
If you're given a data table, simply keep eliminating both the highest and lowest values until you get to either 2 or 1 numbers in the center. If you end up with 1, that's your median. If you end up with 2, the average of those numbers is your median.
• I'm a little confused by the second last question's explanation, why is it range again?
• Whenever you see the question talk about some maximum or minimum value and then ask you for something like the center, spread, or shape, then start thinking range. In this question, if the minimum is 29, and then 22 gets added to the data set, the new minimum is 7 lower. Since range is the distance between the maximum and minimum values, this would have to increase by 7 if the minimum gets farther away from the maximum by 7.
• For the last question (Last week, George...), why doesn't 40+x/7=52 work?
• Assuming you meant (40+x)/7=52, the problem is that you're only weighting the 40 once, when there were 6 days over which he averaged that distance. You would have to do (40*6+x)/7=52.
• How do I know the 88th number is around 48 bushels?
• All you have to do is count! The leftmost bar represents the number of acres that were between 40 and 45 bushels. Together, these are the lowest 25 numbers. If you go over to the next column, it tells you that 70 acres had between 45 and 50 bushels. This means that we now have 95 of the lowest numbers. 95 is greater than 88, so our 88th number must be somewhere between 45 and 50. The only answer choice that matches this is 48 bushels.
• hey umm, i thought that we're supposed to consider any value just once - even if it repeats - when we're determining the median of a set.
• Well, unfortunately not. The median is the middle value of all the values in the data, even if a lot of the values are repeated. We still count every repeat.
• I didn't understand how outliers affect the value of the median.
• Though outliers will influence the mean a lot, the median is much more stable, so it will stay about the same.