Reading box plots
Is this some kind of cute cat video? No! Box and whisker plots seek to explain data by showing a spread of all the data points in a sample. The "whiskers" are the two opposite ends of the data. This video is more fun than a handful of catnip. Created by Sal Khan and Monterey Institute for Technology and Education.
Want to join the conversation?
- What is a quartile?(79 votes)
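- A quartile is one of the values that split a sorted data set into four equal-sized groups. On a box plot, each group covers a quarter of the data. I hope this helps.(21 votes)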
- What is the purpose of Box and whisker plots?(42 votes)
- Box plots are used to organize data so that its spread is easier to view.(21 votes)
- How do you organize quartiles if there are an odd number of data points? Say you have the set: 1, 2, 2, 4, 5, 6, 8, 9, 9. How would you distribute the quartiles? Thanks in advance.(16 votes)
- To divide data into quartiles when there is an odd number of values in your set, first take the median, which in your example is 5. Then take the data below the median and find the median of that set; this value (the first quartile) separates the 1st and 2nd quarters of the data. Then take the data greater than the median and find the median of that set; this value (the third quartile) separates the 3rd and 4th quarters. In your example, the lower end of the interquartile range is 2 and the upper end is 8.5. (When a set has an even number of values, its median is the mean of the two middle values; that is where 8.5, the mean of 8 and 9, comes from.) So the set would look something like this: 1 2 | 2 4 | 5 | 6 8 | 9 9, with the first quartile falling between the two 2s and the third quartile between 8 and 9. Hope this helps.(25 votes)
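The median-of-halves procedure in this answer can be sketched in Python. This is only a sketch, assuming the convention that the middle value is left out of both halves when the count is odd; the function name `quartiles_by_halves` is made up for illustration, and other tools may use a different quartile convention.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: when the count is odd, the middle value is excluded
    from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values before the middle position
    upper = data[half + n % 2:]   # values after the middle position
    return median(lower), median(data), median(upper)


q1, med, q3 = quartiles_by_halves([1, 2, 2, 4, 5, 6, 8, 9, 9])
print(q1, med, q3)
```

For the set in the question this gives a first quartile of 2, a median of 5, and a third quartile of 8.5, matching the answer.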
- What is the interquartile range(9 votes)
- The interquartile range (IQR) is the third quartile minus the first quartile (Q3 - Q1). Here is a link to the video: https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-6th/v/calculating-interquartile-range-iqr
Hope this helps!(8 votes)
- If it is half and half then why is the line not in the middle of the box?(9 votes)
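- Because the median splits the data points in half, not the number line itself. Half of the values lie on each side of the line, but one half can be spread over a wider stretch of the axis than the other.(5 votes)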
- Just wondering, how come they call it a "quartile" instead of a "quarter of"? As far as I know, they mean the same thing. Can someone please explain this?(7 votes)
- The first and third quartiles are descriptive statistics that measure position in a data set. Similar to how the median denotes the midway point of a data set, the first quartile marks the quarter or 25% point. Approximately 25% of the data values are less than or equal to the first quartile. The third quartile is similar, but for the upper 25% of data values: approximately 25% of the values are greater than or equal to it. We will look into these ideas in more detail in what follows.(8 votes)
- What about if I have data points outside the upper and lower quartiles? What does this mean?(7 votes)
- You will almost always have data outside the quartiles. A quartile is a number that, along with the median, splits the data into quarters, hence the term quartile. One quarter of the data is at the 1st quartile or below. One quarter of the data is at the 3rd quartile or above.
How far the data lies from the quartiles tells you about the spread. Outliers, which are points more than 1.5 times the interquartile range above the 3rd quartile or below the 1st quartile, are sometimes shown as dots beyond the end of the whisker, depending on the tool used. Outliers can mean something interesting is happening in your data.(6 votes)
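As a rough illustration of the 1.5 x IQR rule mentioned here, the sketch below computes the two cutoff values (often called fences) and lists the points outside them. The function names and the sample data are hypothetical, and the quartiles are supplied by hand rather than computed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 * IQR fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical data; Q1 = 2 and Q3 = 9 are the medians of its two halves.
sample = [1, 2, 2, 4, 5, 6, 8, 9, 9, 30]
low, high = outlier_fences(q1=2, q3=9)
outliers = find_outliers(sample, q1=2, q3=9)
print(low, high, outliers)
```

Here the IQR is 7, so the fences sit at -8.5 and 19.5, and only the value 30 is flagged.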
- What is an interquartile range?(4 votes)
- The interquartile range is the range of numbers between the first and third (or lower and upper) quartiles. It shows the spread of the middle 50% of a set of data(12 votes)
- What are the 5 values we need to be able to draw a box and whisker plot and how do we find them?(6 votes)
- It has been a while since I've done a box and whisker plot, but I think I can remember them well enough.
You will first need to find the median of your dataset. This can be found by sorting all of your data from least to greatest and then finding the middle value (or the mean of the two middle values if there is an even number of data points).
Then, you will find your lower and upper quartiles. The lower quartile can be found by finding the median of the first half of your data - stop at the median you found previously. The upper quartile can be found by finding the median of the second half of your data - start at the median you found previously.
Then you have your minimum and maximum - these are just the smallest and largest values you have in your dataset.(6 votes)
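The five values described in this answer can be collected in one place. The following is a minimal sketch, assuming the same halves-around-the-median approach as the answer; `five_number_summary` is a made-up name and the input data is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return the five values needed to draw a box plot.

    Assumption: quartiles are the medians of the lower and upper halves,
    with the middle value left out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    return {
        "min": data[0],
        "q1": median(data[:half]),
        "median": median(data),
        "q3": median(data[half + n % 2:]),
        "max": data[-1],
    }


# Hypothetical dataset, given unsorted on purpose.
summary = five_number_summary([7, 3, 9, 1, 5, 11, 13, 15])
print(summary)
```

For this dataset the summary is a minimum of 1, a lower quartile of 4, a median of 8, an upper quartile of 12, and a maximum of 15.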
- How can I find the mean with a box plot?(6 votes)
An ecologist surveys the age of about 100 trees in a local forest. He uses a box-and-whisker plot to map his data shown below. What is the range of tree ages that he surveyed? What is the median age of a tree in the forest?
So first of all, let's make sure we understand what this box-and-whisker plot is even about. This is really a way of seeing the spread of all of the different data points, which are the ages of the trees, and to also give other information like, what is the median? And where do most of the ages of the trees sit? So this whisker part, so you could see this black part is a whisker, this is the box, and then this is another whisker right over here.
The whiskers tell us essentially the spread of all of the data. So it says the lowest data point in this sample is an eight-year-old tree. I'm assuming that this axis down here is in years. And it says at the highest-- the oldest tree right over here is 50 years. So if we want the range-- and when we think of range from a statistics point of view we're thinking of the highest data point minus the lowest data point. So it's going to be 50 minus 8. So we have a range of 42. So that's what the whiskers tell us. They tell us that everything falls between 8 and 50 years, including 8 years and 50 years.
Now what the box does, the box starts at-- well, let me explain it to you this way. This line right over here, this is the median. And so half of the ages are going to be less than this median. We see right over here the median is 21. So this box-and-whiskers plot tells us that half of the ages of the trees are less than 21 and half are older than 21. And then these endpoints right over here, these are the medians for each of those sections. So this is the median for all the trees that are less than the real median or less than the main median. So this is in the middle of all of the ages of trees that are less than 21. This is the middle age for all the trees that are greater than 21 or older than 21.
And so we're actually splitting all of the data into four groups. This we would call the first quartile. So I'll call it Q1 for our first quartile. Maybe I'll do 1Q. This is the first quartile. Roughly a fourth of the trees, because the way you calculate it, sometimes a tree ends up in one point or another, about a fourth of the trees end up here. A fourth of the trees are between 14 and 21. A fourth are between 21 and it looks like 33. And then a fourth are in this quartile. So we call this the first quartile, the second quartile, the third quartile, and the fourth quartile.
So to answer the question, we already did the range. There's a 42-year spread between the oldest and the youngest tree. And then the median age of a tree in the forest is at 21. So even though you might have trees that are as old as 50, the median of the forest is actually closer to the lower end of our entire spectrum of all of the ages. So if you view median as your central tendency measurement, it's only at 21 years. And you can even see it. It's closer to the left of the box and closer to the end of the left whisker than the end of the right whisker.
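The arithmetic in the video can be checked with a few lines of Python. This is just a sketch using the values read off the plot in the transcript; the dictionary and variable names are made up, and the upper box edge of 33 is approximate ("it looks like 33").

```python
# Values read off the tree-age box plot; q3 = 33 is an approximate reading.
tree_plot = {"min": 8, "q1": 14, "median": 21, "q3": 33, "max": 50}

data_range = tree_plot["max"] - tree_plot["min"]   # oldest minus youngest
iqr = tree_plot["q3"] - tree_plot["q1"]            # width of the box

# Distance from the median to each edge of the box and to each whisker end.
to_box_left = tree_plot["median"] - tree_plot["q1"]
to_box_right = tree_plot["q3"] - tree_plot["median"]
to_min = tree_plot["median"] - tree_plot["min"]
to_max = tree_plot["max"] - tree_plot["median"]

print(f"range = {data_range}, IQR = {iqr}")
print(f"median is {to_box_left} from the left edge of the box, {to_box_right} from the right")
print(f"median is {to_min} from the minimum, {to_max} from the maximum")
```

The range comes out to 42, and the median sits 7 years from the left edge of the box but 12 from the right, and 13 years from the minimum but 29 from the maximum, which is why the line is drawn left of center.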