© 2023 Khan Academy
Center, spread, and shape of distributions
What are "center, spread, and shape of distributions" problems, and how frequently do they appear on the test?
Measures of the center, spread, and shape of a distribution are known as summary statistics (or statistics for short); they concisely describe data sets.
- Center describes a typical value in a data set. The SAT covers three measures of center: mean, median, and occasionally mode.
- Spread describes the variation of the data. Two measures of spread are range and standard deviation.
On your official SAT, you'll likely see 2 to 3 questions that test your ability to calculate, compare, and use the center, spread, and shape of distributions.
You can learn anything. Let's do this!
What do the measures of center represent?
Statistics intro: mean, median, & mode
How do I find the mean, median, and mode?
On the SAT, we need to know how to find the mean, median, and mode of a data set.
Mean
The mean is the average value of a data set: the sum of all the values divided by the number of values.
Example:
2, 5, 6, 7, 10
What is the mean of the data set above?
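As a quick check of the arithmetic, here is a minimal Python sketch (the variable names are illustrative, not from the lesson) that adds the values and divides by how many there are:

```python
# Mean = (sum of the values) / (number of values)
data = [2, 5, 6, 7, 10]
mean_value = sum(data) / len(data)  # 30 / 5
print(mean_value)  # 6.0
```

So the mean of the data set is 6.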
Example:
| Pets owned | Number of students |
|---|---|
| 0 | 4 |
| 1 | 3 |
| 2 | 3 |
| 3 | 2 |
A teacher asked 12 students how many pets they owned. The results are shown in the table above. What is the average number of pets owned by the students?
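When data come in a frequency table, each value must be counted once for every student who reported it. The minimal Python sketch below stores the table above in a hypothetical `pets_table` dictionary and multiplies each value by its frequency before dividing by the total number of students:

```python
# Frequency table from the example: pets owned -> number of students
pets_table = {0: 4, 1: 3, 2: 3, 3: 2}

total_students = sum(pets_table.values())  # 4 + 3 + 3 + 2 = 12
# Each value counts once per student: 0*4 + 1*3 + 2*3 + 3*2 = 15
total_pets = sum(pets * count for pets, count in pets_table.items())
average_pets = total_pets / total_students
print(average_pets)  # 1.25
```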
Median
The median is the middle value when the data are ordered from least to greatest.
- If the number of values is odd, the median is the middle value.
- If the number of values is even, the median is the average of the two middle values.
Example:
9, 7, 12, 5, 9
What is the median of the data set above?
Example:
2, 5, 6, 7, 7, 10
What is the median of the data set above?
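The two cases in the median rule (odd versus even number of values) can be written as a small helper. This is a sketch: `median` here is a hypothetical function written for illustration (Python's standard library also provides `statistics.median`), and it assumes a non-empty list of numbers.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of values is even."""
    ordered = sorted(values)      # order from least to greatest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median([9, 7, 12, 5, 9]))     # 9
print(median([2, 5, 6, 7, 7, 10]))  # 6.5
```

In the first example, ordering gives 5, 7, 9, 9, 12, so the median is 9; in the second, the two middle values are 6 and 7, so the median is 6.5.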
Mode
The mode is the value that appears most frequently in a data set. A data set can have no mode if no value appears more than any other; a data set can also have more than one mode.
Example:
1, 1, 2, 3, 3, 3, 3, 3, 8
What is the mode of the data set above?
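Counting how often each value appears is one way to find the mode. The sketch below uses Python's `collections.Counter`; the `modes` helper and its convention of returning an empty list when every value appears equally often are assumptions made for illustration, since sources differ on how to treat such ties.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the most frequent values (assumes a non-empty list).
    Assumed convention: if every value appears equally often, there is no mode."""
    counts = Counter(values)          # value -> number of appearances
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []                     # no value appears more than any other
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 1, 2, 3, 3, 3, 3, 3, 8]))  # [3]
print(modes([1, 1, 2, 2, 5]))              # [1, 2]  (more than one mode)
print(modes([4, 7, 9]))                    # []      (no mode)
```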
What do the measures of spread represent?
Measures of spread: range, variance & standard deviation
Note: variance is not covered on the SAT, and you will not need to calculate standard deviation.
How do I find the range and standard deviation?
On the SAT, we need to know how to find the range of a data set. While we won't be asked to calculate the standard deviation, we do need to have a sense of the relative standard deviations of two data sets.
Range
The range measures the total spread of the data; it is the difference between the maximum and minimum values.
A larger range indicates a greater spread in the data.
Example:
1, 9, 4, 3, 8
What is the range of the data set above?
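Once the maximum and minimum are known, the range is a single subtraction. A minimal Python sketch (`data_range` is an illustrative name, chosen to avoid shadowing Python's built-in `range`):

```python
data = [1, 9, 4, 3, 8]
# Range = maximum - minimum; the values do not need to be sorted first
data_range = max(data) - min(data)  # 9 - 1
print(data_range)  # 8
```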
Standard deviation
Standard deviation measures the typical spread from the mean; roughly speaking, it is the typical distance between a value in the data set and the mean.
Larger standard deviations indicate greater spread in the data.
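To see what "greater spread" means numerically, the sketch below compares two made-up data sets with the same mean. It uses Python's `statistics.pstdev` (the population standard deviation: the square root of the average squared distance from the mean). The data are hypothetical, not the dot plots from this lesson, and on the SAT you only need the comparison, not the calculation.

```python
from statistics import mean, pstdev

# Hypothetical data sets, both with mean 5
clustered = [4, 5, 5, 5, 6]   # values bunched close to the mean
spread_out = [1, 3, 5, 7, 9]  # values farther from the mean

print(mean(clustered), mean(spread_out))  # both means are 5
print(round(pstdev(clustered), 2))        # 0.63
print(round(pstdev(spread_out), 2))       # 2.83
```

Same center, different spread: the data set whose values sit farther from the mean has the larger standard deviation.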
Example:
Of the two dot plots shown above, which one has a greater standard deviation?
How do outliers affect summary statistics?
Impact on median & mean: removing an outlier
The effect of outliers
An outlier is a value in a data set that significantly differs from other values. The inclusion of outliers in data sets can greatly skew the summary statistics, which is why outliers are often removed from data sets.
Effect on the range and standard deviation
The inclusion of outliers increases the spread of data, leading to larger range and standard deviation. Conversely, removing outliers decreases the spread of data, leading to smaller range and standard deviation.
Effect on the mean
An outlier can significantly skew the mean of a data set. For example, consider the data set {3, 5, 7, 7, 10, 100}.
100 is an outlier; it is significantly larger than the other values in the data set. If we include the 100, the mean of the data set is:
$$\frac{3 + 5 + 7 + 7 + 10 + 100}{6} = \frac{132}{6} = 22$$
Notice that the mean, 22, is greater than 5 of the 6 values in the data set! If we remove the 100, however, the mean of the remaining values is:
$$\frac{3 + 5 + 7 + 7 + 10}{5} = \frac{32}{5} = 6.4$$
The removal of an outlier is guaranteed to change the mean.
- If a very large outlier is removed, the mean of the remaining values will decrease.
- If a very small outlier is removed, the mean of the remaining values will increase.
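The effect of removing the large outlier 100 can be checked with Python's `statistics.mean`; this minimal sketch shows the mean dropping from 22 to 6.4:

```python
from statistics import mean

with_outlier = [3, 5, 7, 7, 10, 100]
without_outlier = [3, 5, 7, 7, 10]

print(mean(with_outlier))     # 22: the outlier pulls the mean up
print(mean(without_outlier))  # 6.4: removing a large outlier lowers the mean
```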
Effect on the median
The median of the data set {3, 5, 7, 7, 10, 100} is 7.
If we remove the outlier 100, the median of the remaining values, {3, 5, 7, 7, 10}, is still 7!
Because the median is based on the middle values of a data set, an outlier does not affect the median of a data set as strongly as it affects the mean. As such, the removal of an outlier can still change the median, but that change is not guaranteed.
- If a very large outlier is removed, the median of the remaining values will either decrease or remain the same.
- If a very small outlier is removed, the median of the remaining values will either increase or remain the same.
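Python's `statistics.median` confirms that removing 100 leaves the median at 7. The second pair of lists in this sketch is a made-up data set (not from the lesson) where removing a large outlier does lower the median, from 3 to 2.5, illustrating the "decrease or remain the same" rule:

```python
from statistics import median

# Lesson example: the median does not change
with_outlier = [3, 5, 7, 7, 10, 100]
without_outlier = [3, 5, 7, 7, 10]
print(median(with_outlier), median(without_outlier))  # both equal 7

# Hypothetical example: the median decreases
other_with = [1, 2, 3, 4, 50]
other_without = [1, 2, 3, 4]
print(median(other_with), median(other_without))      # 3, then 2.5
```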
How do I use the mean to calculate a missing value?
Missing value given the mean
How do I solve for a missing value?
If we know the mean of a data set and the number of values, we can calculate a missing value in the data set by:
- Multiplying the mean by the number of values to get the sum of the values.
- Subtracting all known values from that sum.
Example:
20, 20, 40, 60, x
If the mean of the five numbers above is 30, what is the value of x ?
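The two steps translate directly into code. A minimal Python sketch using the numbers from this example (the variable names are illustrative):

```python
known_values = [20, 20, 40, 60]
target_mean = 30
count = len(known_values) + 1  # five numbers in total, including x

total = target_mean * count    # step 1: 30 * 5 = 150
x = total - sum(known_values)  # step 2: 150 - 140 = 10
print(x)  # 10
```

As a check, (20 + 20 + 40 + 60 + 10) / 5 = 150 / 5 = 30.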
Things to remember
The median is the middle value when the data are ordered from least to greatest.
- If the number of values is odd, the median is the middle value.
- If the number of values is even, the median is the average of the two middle values.
The mode is the most common value in a data set.
Standard deviation measures the typical spread from the mean.
- How can the average number of pets a person owns be 1.25 pets? I saw this in the question about the average number of pets owned by the students. Wouldn't it make sense to round in this situation?(1 vote)
- If the question asked something like "How many pets does the average person own", then you would definitely round to 1 pet, but I think that because the question specifically mentions the number of pets and not the pets themselves, it's fine to keep it as a decimal. On the actual SAT, it'll definitely be more explicit if you have to round, like in my first example.(12 votes)
- I don't understand the practice question on finding the median for given frequency data.(2 votes)
- Remember that the median is the middle number, so 50% of the numbers will be less than it and 50% greater than it. If you're given a graph charting the frequency of a data set, the median is the point where the area to its left equals the area to its right.
If you're given a data table, simply keep eliminating both the highest and lowest values until you get to either 1 or 2 numbers in the center. If you end up with 1, that's your median. If you end up with 2, the average of those numbers is your median.(3 votes)
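For frequency data, the same idea works without writing every value out: walk through the values in order, keep a running count, and stop once the count reaches the middle position(s). This is a sketch; `median_from_counts` is a hypothetical helper, and the example reuses the pets table from the lesson (12 students, so the median is the average of the 6th and 7th values):

```python
def median_from_counts(counts):
    """Median of data given as {value: frequency}, found with a running count
    instead of writing out every value. Assumes at least one value."""
    n = sum(counts.values())
    # 1-based positions of the middle value(s); they coincide when n is odd
    lower_pos = (n + 1) // 2
    upper_pos = n // 2 + 1
    seen = 0
    lower = None
    for value in sorted(counts):
        seen += counts[value]  # how many values we have passed, in sorted order
        if lower is None and seen >= lower_pos:
            lower = value
        if seen >= upper_pos:
            return (lower + value) / 2  # average of the two middle values

# Pets table from the lesson: pets owned -> number of students
pets_table = {0: 4, 1: 3, 2: 3, 3: 2}
print(median_from_counts(pets_table))  # 1.0
```

Here 4 students own 0 pets and 7 students own at most 1 pet, so the 6th and 7th values are both 1.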
- I'm a little confused by the second last question's explanation, why is it range again?(1 vote)
- Whenever a question talks about some maximum or minimum value and then asks you about something like the center, spread, or shape, start thinking about range. In this question, if the minimum is 29 and then 22 gets added to the data set, the new minimum is 7 lower. Since range is the distance between the maximum and minimum values, it has to increase by 7 if the minimum moves 7 farther away from the maximum.(3 votes)
- For the last qns (Last week, George...) why doesn't 40+x/7=52 work?(1 vote)
- Assuming you meant (40+x)/7=52, the problem is that you’re only weighting the 40 once, when there were 6 days he averaged that speed. You would have to do (40*6+x)/7.(3 votes)
- How do I know the 88th number is 48 bushels?(2 votes)
- All you have to do is count! The leftmost bar represents the number of acres that were between 40 and 45 bushels. Together, these are the lowest 25 numbers. If you go over to the next column, it tells you that 70 acres had between 45 and 50 bushels. This means that we now have 95 of the lowest numbers. 95 is greater than 88, so our 88th number must be somewhere between 45 and 50. The only answer choice that matches this is 48 bushels.(1 vote)
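The counting in this answer is a running (cumulative) total. The sketch below uses only the two bins quoted in the answer (25 acres at 40 to 45 bushels, 70 acres at 45 to 50 bushels); the rest of the histogram from the practice question is not reproduced, and `bin_containing` is a hypothetical helper:

```python
# Only the two bins quoted in the answer: (label, number of acres)
bins = [("40-45", 25), ("45-50", 70)]

def bin_containing(position, bins):
    """Return the label of the bin holding the value at a 1-based position
    in sorted order, or None if the position lies beyond the listed bins."""
    cumulative = 0
    for label, count in bins:
        cumulative += count  # 25, then 25 + 70 = 95
        if position <= cumulative:
            return label
    return None

print(bin_containing(88, bins))  # 45-50
```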
- hey umm, i thought that we're supposed to consider any value just once - even if it repeats - when we're determining the median of a set.(1 vote)
- Well, unfortunately not. The median is the middle value of all the values in the data, even if a lot of the values are repeated. We still count every one of them.(3 votes)
- i didnt understand how effect of outliers work for the value of median(1 vote)
- Though outliers will influence the mean a lot, the median is much more stable, so it will stay about the same.(2 votes)
- what about shape? it only went over center and spread.(0 votes)
- That is true, and you might want to make a suggestion that Khan Academy adds a bullet point at the beginning just explaining what they mean by the shape. The SAT definitely won't test actual distributions that you may learn in a statistics class or anything as high-level as that, so shape could just be another way to say the distribution's spread and how concentrated the values are, which standard deviation can tell you. It might also have been intended to mean something related to strong and weak correlations in a scatterplot. (I'm not super sure if the SAT tests that, either)(4 votes)
- I am confused about outliers affecting the mean.(1 vote)
- is there a way to calculate the median for very large values, because as in one of the practice questions, it takes too much time to lay down every number and cancel from both sides to get the median(1 vote)
- There is really no shortcut formula for calculating the median of data sets with really high values on the SAT. If you're finding that this takes a long time, though, one thing you can do is cancel groups of numbers from both sides instead of every number individually. Let's say you had the following data set:
12, 12, 12, 13, 16, 23, 27, 28, 31, 31, 31, 31
You can see that there are 5 numbers in the teens and 4 numbers in the thirties. If you cancel the 4 numbers in the thirties, plus the next largest number (28), then that will balance out cancelling the 5 teen numbers. After doing the cancellation of those numbers, we see that the median has to be (23+27)/2 = 25
Hope this little shortcut can help you!(1 vote)
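The shortcut works because removing the same number of values from each end of a sorted list leaves the middle untouched. A minimal Python sketch with the data set from this answer (`k` is an illustrative name for how many values are cancelled from each end):

```python
from statistics import median

data = [12, 12, 12, 13, 16, 23, 27, 28, 31, 31, 31, 31]

# Cancel k values from each end: the 5 teens from the bottom,
# and the 4 thirties plus 28 from the top
k = 5
middle = sorted(data)[k:len(data) - k]
print(middle)          # [23, 27]
print(median(middle))  # 25.0
print(median(data))    # 25.0, the same as computing the median directly
```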