# Example: Describing a distribution

Learn how to describe a distribution of quantitative data by discussing its shape, center, spread, and potential outliers.

## Want to join the conversation?

• Why does Sal say the same thing twice at ?
• He probably did a second take and just forgot to edit out the original take.
• How does Sal know whether to calculate the mean or the median when describing the center of the distribution? Also, how does he know which measure (range, IQR, MAD, standard deviation) to calculate when describing the spread?
• Sal seems to be choosing the options that don't require him to pull out his calculator and do a lot of work. ^_^ There's nothing wrong with that; the idea is just to be able to briefly describe the key points of the distribution in a few words.
• We've never talked about IQR, MAD, or how to find the mean and median on this channel...
This lesson is very confusing for me...
• Sal only mentions IQR and MAD as examples. You can understand this video without having studied IQR and MAD first.
• I don't know what MAD means.
• MAD stands for "mean absolute deviation": it is the average distance of the data points from the mean. To calculate it:
1. Find the mean of the data set.
2. Find the distance of each data point from the mean by subtracting the mean from the point and taking the absolute value of the result (a distance is never negative).
3. Find the mean of those distances: add them all up and divide by the total number of data points.
Hope that helps, and good luck on your stats journey!
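The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_absolute_deviation` and the `scores` data set are made up for the example and do not come from the video.

```python
def mean_absolute_deviation(data):
    """Return the average distance of the data points from their mean."""
    # Step 1: find the mean of the data set.
    mean = sum(data) / len(data)
    # Step 2: find each point's distance from the mean (absolute value).
    distances = [abs(x - mean) for x in data]
    # Step 3: find the mean of those distances.
    return sum(distances) / len(distances)


# Hypothetical data set: the mean is 5, the distances are 3, 1, 1, 1, 4.
scores = [2, 4, 4, 6, 9]
print(mean_absolute_deviation(scores))  # prints 2.0
```

For `scores`, the distances add up to 10, and 10 divided by 5 data points gives a MAD of 2.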
• Could an outlier be in the middle of a dot plot? Say the dot at 10 was moved closer to 5 and a dot was put at around 13. Could that dot be classified as an outlier?
• It's a bit late, but a point can be classified as an outlier if it's far from the central cluster. So if the cluster sits around the values 5-10 and 13 is well separated from it, then yes, 13 could be an outlier.
Again, outliers depend on the central cluster, so find that first and then you can identify the outliers.
Hope this helps! :)
• How do you determine which center and spread method to use?
• For a measure of center, use the mean or the median. The mean is a non-resistant measure of center (it is easily influenced by outliers), so it is typically used for data sets that are roughly symmetrical, with no outliers or skewness. The median is a resistant measure of center (it is not easily influenced by outliers), so it gives a more accurate portrayal of data sets that are skewed or have outliers. For spread, the range is non-resistant (it is calculated from only two values, the maximum and the minimum, and either might be an outlier), and the SD is also non-resistant (outliers and skewness can affect it). Range and SD should only be used for data sets that are roughly symmetrical. The IQR, on the other hand, is resistant to outliers, which is why it should be used for data sets that are skewed or have outliers.

In other words, you should decide which measure to use based on the distribution of your data.
Hope this helps!
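To see the difference between resistant and non-resistant measures in numbers, here is a small sketch using Python's standard `statistics` module. The two data sets and the `summarize` helper are invented for illustration, and the quartiles use the module's default ("exclusive") method, which may differ slightly from the method taught in your course.

```python
import statistics


def summarize(data):
    """Return common measures of center and spread for a data set."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),      # non-resistant center
        "median": statistics.median(data),  # resistant center
        "range": max(data) - min(data),     # non-resistant spread
        "sd": statistics.stdev(data),       # non-resistant spread (sample SD)
        "iqr": q3 - q1,                     # resistant spread
    }


# Hypothetical, roughly symmetrical data set.
symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]
# Same data, but the largest value is replaced by an outlier.
with_outlier = symmetric[:-1] + [40]

for name, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(name, summarize(data))
```

With these numbers, the single outlier pulls the mean, range, and SD up noticeably, while the median and IQR stay the same.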