# Judging outliers in a dataset

AP.STATS: UNC‑1 (EU), UNC‑1.K (LO), UNC‑1.K.1 (EK)
Using the inter-quartile range (IQR) to judge outliers in a dataset.

## Want to join the conversation?

• How did you get 1.5?
• Good question!
1.5 is a conventional multiplier that statisticians have agreed on for flagging outliers, so it is simply part of the rule: a value counts as an outlier if it is < Q1 - 1.5 * IQR or > Q3 + 1.5 * IQR.
Hope this helps!
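To make the rule concrete, here is a minimal Python sketch. The data set is an assumption, picked to match what the comments describe (two 1s, with 6 as the lowest non-outlier), and the helper names `quartiles` and `outlier_fences` are made up for illustration. Quartiles are computed as medians of the lower and upper halves; other software may use a different quartile convention and give slightly different fences.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2 :]    # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

def outlier_fences(values, k=1.5):
    """Return (low, high); values outside this range are flagged as outliers."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Assumed example data, not taken verbatim from the lesson.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]

print(quartiles(data))   # (13, 14, 18)
print(low, high)         # 5.5 25.5
print(outliers)          # [1, 1]
```

Changing `k` changes how strict the rule is; 1.5 is the usual default, not something derived from the data.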
• In the bottom box-and-whisker plot, I noticed the instructor only moved the lower whisker to 6, so he only changed the lower value since the outliers were removed. If the outliers are removed, wouldn't that also change the median and the other quartiles?
• When Sal says he's going to "not include the outliers" he is only talking about not considering them for the min or max. You are correct that if you took them out of the data set completely it could affect all the quartiles.
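The difference between the two readings can be sketched in Python. This assumes an illustrative data set consistent with the comments (two 1s, lowest non-outlier 6) and median-of-halves quartiles; `quartiles` and `box_plot_summary` are hypothetical helper names. The quartiles come from the full data, and only the whisker ends skip the outliers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data), median(data[n // 2 + n % 2 :])

def box_plot_summary(values, k=1.5):
    """Quartiles from ALL the data; whiskers stop at the most extreme non-outliers."""
    q1, med, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Assumed example data, not taken verbatim from the lesson.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

summary = box_plot_summary(data)
print(summary["whisker_low"], summary["q1"], summary["median"], summary["q3"])  # 6 13 14 18

# For contrast: deleting the outliers from the data set does shift the quartiles.
trimmed = [x for x in data if x not in summary["outliers"]]
print(quartiles(trimmed))  # (13.5, 15, 18.0)
```

With this data the box stays at 13, 14, 18 in the plot, while fully deleting the two 1s would move Q1 to 13.5 and the median to 15.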
• In this example, there were two 1s in the data. Sal puts a single dot to represent the two of them in the box plot as outliers. Can I put two dots instead? Or is it just a matter of convention to represent any number of outliers having the same numerical value with a single dot?
• You can just put one dot to indicate that there is data at that point.
• When you are creating box-and-whisker plots, how do you know whether you should include outliers or not?
• Outliers are by definition elements that fall outside of a pattern (i.e. extreme cases or exceptions). While they might be due to anomalies (e.g. defects in measuring machines), they can also reflect the limits of our ability to measure. Just as there is no perfect mathematical model to characterize the universe, there isn’t a perfect machine to measure it. Hence, when plotting data sets, one should never exclude outliers from plots. Someone might one day come up with a better model for your data and show that those outliers are part of something magnificent.
• So Q1 is the median of the first half, Q2 is the median of the whole set, and Q3 is the median of the second half. So what is Q4?!
• There is no Q4. Splitting a data set into quarters only needs three cut points.
• I'm working with count data for my thesis and apparently 94% of my data are outliers (a lot of zeros as well as extreme values when we find large clusters of the animals). Would I exclude 94% of the data in this instance just because they are outliers?
• What is your data set? Logically, at least 50% of the data can't be outliers, because those values fall between Q1 and Q3. A value is an outlier only if it is < Q1 - 1.5 * IQR or > Q3 + 1.5 * IQR, so it is not possible for 94% of your data to be outliers.
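A quick way to check this is to run the rule on zero-heavy count data. The sketch below uses made-up counts (47 zeros and three large clusters), not the poster's data, and `iqr_outliers` is a hypothetical helper that uses median-of-halves quartiles.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return the values flagged by the 1.5 * IQR rule (median-of-halves quartiles)."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2 :])
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical survey counts: mostly zeros, plus a few large clusters.
counts = [0] * 47 + [30, 55, 120]

flagged = iqr_outliers(counts)
share = len(flagged) / len(counts)

print(flagged)  # [30, 55, 120]
print(share)    # 0.06
```

Here Q1 and Q3 are both 0, so the IQR is 0 and every nonzero count is flagged, yet that is still only 6% of the data. The zeros sit between Q1 and Q3 and can never be flagged.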