Lesson 3: Measuring variability in quantitative data

# Mean and standard deviation versus median and IQR

Learn to choose the "preferred" measures of center and spread when outliers are present in a set of data.

## Want to join the conversation?

• Take 1, 2, 3, 1000, 2000, 10000, 20000.
The median is 1000.
It just tries to stay in the middle.
The mean is like finding a point that is closest to all the values, but it gets skewed by extreme ones.
If the mean is a poor summary of a distribution, then so is the SD, obviously, because the SD is measured from the mean.
Standard deviation is roughly how far the data points typically deviate from the mean.
For two data sets, the one with the bigger range is more likely to be the more dispersed one.
The IQR focuses on the middle portion of the sorted data, so it doesn't get skewed.

Why not use only the IQR?
Or compute the standard deviation around the median instead of the mean?
Or create levels expanding outward from the IQR: level 1, level 2, and so on?
Is that a good idea?
• When you perform an exploratory data analysis, you may be interested in the range.

There is no such thing as IQR range. IQR is a form of range (interquartile range).

There is no such thing as levels in IQR. But perhaps you can create a new feature if you feel it is necessary.
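
To make the comparison concrete, here is a minimal sketch that computes all four measures for the seven values listed at the start of this thread. It assumes Python's standard `statistics` module and its default ("exclusive") quartile method; the helper name `iqr` is just illustrative, and other quartile conventions can give slightly different results on other data.

```python
import statistics

# The seven values from the comment at the start of this thread.
data = [1, 2, 3, 1000, 2000, 10000, 20000]

def iqr(values):
    """Interquartile range Q3 - Q1, using the module's default 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
spread_iqr = iqr(data)

print("mean  :", round(mean, 2))
print("median:", median)
print("SD    :", round(sd, 2))
print("IQR   :", spread_iqr)
```

The mean lands far above the median here because the two largest values pull it upward, while the median stays at the middle value of the sorted list.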
• How about mode? Wouldn't that often be more reliable? Like when calculating the average salary in a large population - would the amount most people make not seem the most representative?
• Not necessarily, Powel. The example Carlos explained above is accurate.
• If median and IQR are preferred when there are outliers, doesn't that imply that they are more accurate when there is any variance at all?

The only case where mean and standard deviation are going to be as accurate as median and IQR is if there is no variance at all in the data.

With that being said, is there any situation where mean and standard deviation would be preferable?
• While median and IQR are more robust in the presence of outliers, mean and standard deviation are still useful in certain situations:

- If the data is symmetrically distributed around the mean without significant outliers, mean and standard deviation can provide a good representation of the data's central tendency and spread.
- In datasets that follow a normal distribution, mean and standard deviation are commonly used because they accurately summarize the distribution's properties.
- Mean and standard deviation are often preferred for mathematical calculations and comparisons between different datasets due to their mathematical properties and ease of interpretation.

Ultimately, the choice between mean/standard deviation and median/IQR depends on the nature of the data and the specific objectives of the analysis. If the data is heavily skewed or contains outliers, using median and IQR can provide a more accurate representation of the central tendency and spread.
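
As a rough illustration of the points above, the sketch below summarizes a small, roughly symmetric data set and then the same data with one extreme value added. The numbers are made up for the example, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics

def summarize(values):
    """Return center and spread measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),            # sample SD, divides by n - 1
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

symmetric = [46, 48, 49, 50, 51, 52, 54]   # made-up, roughly symmetric values
with_outlier = symmetric + [500]           # same values plus one extreme value

before = summarize(symmetric)
after = summarize(with_outlier)

# Print each measure before and after the outlier is added.
for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

For the symmetric data the mean and median agree, so either pair of measures describes it well. Once the outlier is added, the mean and standard deviation change a lot while the median and IQR barely move.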
• What does the standard deviation have to do with the IQR?
• They are both measures of spread. The standard deviation describes how far a typical data point is from the mean, and the IQR describes the width of the middle half of the data around the median.
• Why can't we mix and match? Since we figured out that the median captures central tendency better, why can't we use the median in the standard deviation formula? Wouldn't that do a better job of capturing the total variance/spread in the data set?
• Interesting idea.

It would also remedy some of the distortion caused by a biased mean.

But the skew, and thus the bias from an outlier, remains even if you use the median when calculating the standard deviation, because the outlier's squared distance from the center is still part of the sum.

I think that's why we're better off relying on the IQR in these situations, since it simply ignores the most extreme values.
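
Here is a small sketch of the "mix and match" idea, using made-up values with one outlier. `sd_about_median` is a hypothetical helper that plugs the median into the usual standard deviation formula. For contrast, the sketch also computes the median absolute deviation (MAD), a different, commonly used robust measure that was not mentioned in this thread.

```python
import math
import statistics

def sd_about_median(values):
    """Standard-deviation-style formula, but centered on the median instead of the mean."""
    center = statistics.median(values)
    squared = [(x - center) ** 2 for x in values]
    return math.sqrt(sum(squared) / (len(values) - 1))

def median_absolute_deviation(values):
    """Median of the absolute distances from the median (often abbreviated MAD)."""
    center = statistics.median(values)
    return statistics.median(abs(x - center) for x in values)

data = [50, 52, 54, 56, 58, 200]   # made-up values; 200 is the outlier

ordinary_sd = statistics.stdev(data)
median_sd = sd_about_median(data)
mad = median_absolute_deviation(data)

print("SD about the mean  :", round(ordinary_sd, 2))
print("SD about the median:", round(median_sd, 2))
print("MAD                :", mad)
```

The median-centered version is not smaller than the ordinary standard deviation, because the sum of squared distances is smallest when measured from the mean. Squaring the outlier's distance is what inflates both numbers, which is why a measure that does not square or sum the extremes (the IQR, or the MAD shown here) stays small.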
• Is there any simpler way to explain this?
• Sure! In simpler terms, when we talk about "sample biased variance," we're talking about a correction needed when calculating the variance from a sample to make it more accurate. And when we say the data is "skewed," we mean it's not evenly spread out around the average. It's like if most people earn around \$50,000, but one person earns \$1 million, that would skew the average salary higher.
• I have two questions. The first is about variance: why did the previous video refer to it as "sample biased variance," and what does that mean? The second is about the term "skew": what does it mean here? Thank you.
• The term "sample biased variance" likely refers to the fact that when you calculate the variance from a sample (as opposed to the entire population), dividing by n tends to underestimate the true population variance, which makes that version of the sample variance biased. Dividing by n−1 instead (Bessel's correction) corrects for this bias in the estimate of the population variance.

In statistics, "skew" refers to the asymmetry or lack of symmetry in a distribution of data. If a distribution is skewed, it means that the data points are not symmetrically distributed around the mean. In the context of the video, the term "skewed data set" implies that the distribution of salaries is not symmetric, likely due to the presence of extreme values (outliers) like the \$250,000 salary mentioned. These extreme values can disproportionately influence measures of central tendency like the mean, causing it to be skewed or distorted.
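
The bias described above can be checked exactly on a tiny example. The sketch below assumes a made-up population of three values and lists every possible sample of size 2 drawn with replacement; it then averages the two versions of the sample variance over all of those samples. `statistics.pvariance` divides by n and `statistics.variance` divides by n−1.

```python
import itertools
import statistics

population = [1, 2, 3]                              # made-up, fully known population
true_variance = statistics.pvariance(population)    # population variance (divides by N)

# Every possible sample of size 2, drawn with replacement (9 samples in total).
samples = list(itertools.product(population, repeat=2))

# Average each estimator over all possible samples.
avg_divide_by_n = statistics.mean(statistics.pvariance(s) for s in samples)
avg_divide_by_n_minus_1 = statistics.mean(statistics.variance(s) for s in samples)

print("true population variance  :", round(true_variance, 4))
print("average when dividing by n:", round(avg_divide_by_n, 4))
print("average with n - 1        :", round(avg_divide_by_n_minus_1, 4))
```

On average the divide-by-n version comes out too small, while the n−1 version matches the population variance, which is what "unbiased" means here.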
• Would the mean be robust if there are outliers on both sides of the main group of data points?
• Still no, because it is unknown how drastically the outliers differ from each other. For example, if most of the data were between 50 and 60, one outlier could be 30 while another is 200. So as a general rule, if there are any outliers, use the median.
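
A quick sketch of that example, with made-up values for the main group between 50 and 60 plus the two outliers mentioned (30 and 200), using Python's standard `statistics` module:

```python
import statistics

main_group = [50, 52, 54, 55, 56, 58, 60]   # made-up values between 50 and 60
data = [30] + main_group + [200]            # one low outlier and one high outlier

mean_all = statistics.mean(data)
median_all = statistics.median(data)

print("mean without outliers  :", statistics.mean(main_group))
print("mean with outliers     :", round(mean_all, 2))
print("median with outliers   :", median_all)
```

Because 200 is much farther from the main group than 30 is, the two outliers do not cancel out: the mean is pulled above 60, outside the main group entirely, while the median stays in the middle of it.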