# Reading box plots

Is this some kind of cute cat video? No! Box and whisker plots seek to explain data by showing a spread of all the data points in a sample. The "whiskers" are the two opposite ends of the data. This video is more fun than a handful of catnip. Created by Sal Khan and Monterey Institute for Technology and Education.

## Want to join the conversation?

- What is a quartile?(84 votes)
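- A quartile is one of the three values (the first quartile, the median, and the third quartile) that split sorted data into four groups of roughly equal size. On a box plot, each whisker and each half of the box covers about a quarter of the data. I hope this helps.(30 votes)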

- What is the purpose of Box and whisker plots?(44 votes)
- Box plots are used to organize data so that its spread is easier to view.(24 votes)

- How do you organize quartiles if there are an odd number of data points? Say you have the set: 1, 2, 2, 4, 5, 6, 8, 9, 9. How would you distribute the quartiles? Thanks in advance.(17 votes)
- To divide data into quartiles when there is an odd number of values in your set, first take the median, which in your example is 5. Then take the data below the median (1, 2, 2, 4) and find the median of that set; this is the first quartile, which separates the 1st and 2nd quarters of the data. Then take the data above the median (6, 8, 9, 9) and find the median of that set; this is the third quartile, which separates the 3rd and 4th quarters. In your example, the lower end of the interquartile range is 2 and the upper end is 8.5 (when a half has an even number of values, its median is the mean of the two middle values). So the set would look something like this: 1, **2**, **2**, 4, **5**, 6, **8**, **9**, 9, where the bold values are the ones used to find the quartiles and the median. Hope this helps.(29 votes)
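
The steps in this answer can be sketched in Python. This is only a minimal illustration, assuming the convention described above (with an odd count, the overall median is left out of both halves). Other textbooks and software use slightly different quartile rules, and the function name `quartiles_by_halves` is made up for this example.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                   # values below the middle position
    upper = data[half + len(data) % 2:]   # values above the middle position
    return median(lower), median(data), median(upper)

print(quartiles_by_halves([1, 2, 2, 4, 5, 6, 8, 9, 9]))  # (2.0, 5, 8.5)
```

The result matches the answer: the first quartile is 2, the median is 5, and the third quartile is 8.5.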

- If it is half and half then why is the line not in the middle of the box?(11 votes)
- Because it is half of the **data**, *not* half of the number line itself.(8 votes)

- What is the interquartile range?(10 votes)
- The interquartile range (IQR) is the difference between the first and third quartiles. Here is a link to the video: https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-6th/v/calculating-interquartile-range-iqr

Hope this helps!(10 votes)

- Just wondering, how come they call it a "quartile" instead of a "quarter of"? As far as I know, they mean the same thing. Can someone please explain this?(7 votes)
- The first and third quartiles are descriptive statistics that measure position in a data set. Similar to how the median denotes the midway point of a data set, the first quartile marks the quarter or 25% point. Approximately 25% of the data values are less than or equal to the first quartile. The third quartile is similar, but for the upper end: approximately 25% of the data values are greater than or equal to it. We will look into these ideas in more detail in what follows.(9 votes)

- What are the 5 values we need to be able to draw a box and whisker plot and how do we find them?(7 votes)
- It has been a while since I've done a box and whisker plot, but I think I can remember them well enough.

You will first need to find the median of your dataset. This can be found by sorting all of your data from least to greatest and then finding the middle value (or the mean of the two middle values if the count is even).

Then, you will find your lower and upper quartiles. The lower quartile can be found by finding the median of the first half of your data - stop at the median you found previously. The upper quartile can be found by finding the median of the second half of your data - start at the median you found previously.

Then you have your minimum and maximum - these are just the smallest and largest values you have in your dataset.(7 votes)
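
A rough Python sketch of these steps follows. It assumes the same convention as the answer (each half stops short of the overall median), and the function name `five_number_summary` and the sample values are made up for illustration; they are not from the video.

```python
from statistics import median

def five_number_summary(values):
    """Return the five values a box-and-whisker plot is drawn from."""
    data = sorted(values)                     # least to greatest
    half = len(data) // 2
    lower = data[:half]                       # first half, stopping before the median
    upper = data[half + len(data) % 2:]       # second half, starting after the median
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

sample = [21, 8, 33, 14, 50, 10, 25, 18]  # made-up values, not from the video
summary = five_number_summary(sample)
print(summary)
```

The whiskers run from `min` to `q1` and from `q3` to `max`, the box runs from `q1` to `q3`, and the line inside the box sits at `median`.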

- What is an interquartile range?(4 votes)
- The interquartile range is the range of numbers between the first and third (or lower and upper) quartiles. It shows the spread of the middle 50% of a set of data.(13 votes)

- What about if I have data points outside the upper and lower quartiles? What does this mean?(7 votes)
- You will almost always have data outside the quartiles. A quartile is a number that, along with the median, splits the data into quarters, hence the term quartile. One quarter of the data is at the 1st quartile or below. One quarter of the data is at the 3rd quartile or above.

How far the data lies from the quartiles helps describe the spread. Outliers, points more than 1.5 times the interquartile range above the 3rd quartile or below the 1st quartile, are sometimes shown as dots beyond the end of the whisker, depending on the tool used. Outliers can mean something interesting is happening in your data.(6 votes)
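
The 1.5 times IQR rule in this answer can be sketched in a few lines of Python. This is an illustrative sketch, not a standard library routine: the names `outlier_fences` and `is_outlier` are invented here, and the quartiles 2 and 8.5 are borrowed from the earlier question about the set 1, 2, 2, 4, 5, 6, 8, 9, 9.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (low, high) cutoffs; points beyond them count as outliers."""
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if value lies more than k * IQR outside the quartiles."""
    low, high = outlier_fences(q1, q3, k)
    return value < low or value > high

print(outlier_fences(2, 8.5))   # (-7.75, 18.25)
print(is_outlier(9, 2, 8.5))    # False
print(is_outlier(20, 2, 8.5))   # True
```

With those quartiles, a value of 9 sits inside the cutoffs, while a hypothetical value of 20 would be flagged as an outlier.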

- I'm confused.(8 votes)

## Video transcript

An ecologist surveys the age of about 100 trees in a local forest. He uses a box-and-whisker plot to map his data, shown below. What is the range of tree ages that he surveyed? What is the median age of a tree in the forest?

So first of all, let's make sure we understand what this box-and-whisker plot is even about. This is really a way of seeing the spread of all of the different data points, which are the ages of the trees, and to also give other information like, what is the median? And where do most of the ages of the trees sit?

So this whisker part, so you could see this black part is a whisker, this is the box, and then this is another whisker right over here. The whiskers tell us essentially the spread of all of the data. So it says the lowest data point in this sample is an eight-year-old tree. I'm assuming that this axis down here is in years. And it says the highest-- the oldest tree right over here is 50 years. So if we want the range-- and when we think of range from a statistics point of view we're thinking of the highest data point minus the lowest data point. So it's going to be 50 minus 8. So we have a range of 42. So that's what the whiskers tell us. They tell us that everything falls between 8 and 50 years, including 8 years and 50 years.

Now what the box does, the box starts at-- well, let me explain it to you this way. This line right over here, this is the median. And so half of the ages are going to be less than this median. We see right over here the median is 21. So this box-and-whisker plot tells us that half of the ages of the trees are less than 21 and half are older than 21.

And then these endpoints of the box right over here, these are the medians for each of those sections. So this is the median for all the trees that are less than the real median or less than the main median. So this is in the middle of all of the ages of trees that are less than 21. This is the middle age for all the trees that are greater than 21 or older than 21. And so we're actually splitting all of the data into four groups.

This we would call the first quartile. So I'll call it Q1 for our first quartile. Maybe I'll do 1Q. This is the first quartile. Roughly a fourth of the trees, because the way you calculate it, sometimes a tree ends up in one point or another, about a fourth of the trees end up here. A fourth of the trees are between 14 and 21. A fourth are between 21 and it looks like 33. And then a fourth are in this quartile. So we call this the first quartile, the second quartile, the third quartile, and the fourth quartile.

So to answer the question, we already did the range. There's a 42-year spread between the oldest and the youngest tree. And then the median age of a tree in the forest is 21. So even though you might have trees that are as old as 50, the median of the forest is actually closer to the lower end of our entire spectrum of all of the ages. So if you view the median as your central tendency measurement, it's only at 21 years. And you can even see it. It's closer to the left of the box and closer to the end of the left whisker than the end of the right whisker.
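
As a quick check on the arithmetic, the values read off the plot can be put into a short Python sketch. The numbers are the ones mentioned in the transcript (the 33 is an estimate, since the video only says it "looks like 33"); the dictionary keys and the function name `describe` are made up for this illustration.

```python
# Values read off the plot in the video; 33 is an eyeballed estimate.
tree_ages = {"min": 8, "q1": 14, "median": 21, "q3": 33, "max": 50}

def describe(summary):
    """Compute the range, the box width, and how far the median is from each end."""
    return {
        "range": summary["max"] - summary["min"],            # highest minus lowest
        "iqr": summary["q3"] - summary["q1"],                 # width of the box
        "median_to_min": summary["median"] - summary["min"],
        "median_to_max": summary["max"] - summary["median"],
    }

result = describe(tree_ages)
print(result)
# {'range': 42, 'iqr': 19, 'median_to_min': 13, 'median_to_max': 29}
```

The median is 13 years above the youngest tree but 29 years below the oldest, which is the sense in which it sits closer to the lower end of the ages.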