
## Statistics and probability


© 2023 Khan Academy

# Comparing dot plots, histograms, and box plots

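Data can be represented in various ways, such as dot plots, histograms, and box plots. A dot plot shows every individual value, so you can list the values and find the median, and a box plot marks the median directly. A histogram groups values into ranges, which makes it good for counting how many values fall within a specific range, but it cannot give an exact median.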

## Want to join the conversation?

- Is there anywhere that I can find what a box plot is and how they work? I have never heard of it before, and I do not understand how Sal is reading it throughout the video, or even what it is. Is it simply showing the range of the data points? Or is there more to it? And what does the box in the middle represent? Is there anywhere on Khan Academy that I can find this? (24 votes)
- A box plot (aka box and whisker plot) uses boxes and lines to depict the distributions of one or more groups of numeric data. Box limits indicate the range of the central 50% of the data, with a central line marking the median value. Lines extend from each box to capture the range of the remaining data, with dots placed past the line edges to indicate outliers. (1 vote)
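To make the answer concrete, here is a minimal Python sketch of how the five numbers behind a box plot might be computed. It assumes the common "median of each half" rule for quartiles (textbooks and software differ slightly here), draws the whiskers all the way to the minimum and maximum rather than applying an outlier rule, and uses a made-up data list; `five_number_summary` is a hypothetical helper, not a library function.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the numbers a box plot draws.

    Quartiles use the "median of each half" convention: with an odd
    number of values, the middle value is left out of both halves.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + n % 2:]  # skip the middle value when n is odd
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


# Hypothetical data: the box runs from Q1 to Q3, the line inside it is the
# median, and (in this sketch) the whiskers reach the minimum and maximum.
print(five_number_summary([1, 3, 7, 8, 9, 3, 2]))  # (1, 2, 3, 8, 9)
```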

- What the heck is a box plot? I never saw Sal talk about it in the previous videos. (14 votes)
- Is there a shortcut to finding the median and mode of a group of numbers? (16 votes)
- The mode is the most common number. For example, in 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10, 11, the mode is 5. I do not know any shortcut to the median. (8 votes)
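The definition of the mode in this answer translates directly into a tally. The sketch below runs the commenter's list through Python's `collections.Counter` and the standard `statistics` module. For the median there is no real shortcut beyond sorting, but `statistics.median` does the sorting and middle-picking for you.

```python
from collections import Counter
from statistics import median, mode

data = [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10, 11]

# Tally how many times each value appears; the mode is the value
# with the largest tally.
counts = Counter(data)
value, times = counts.most_common(1)[0]
print(value, times)  # 5 5

# The standard library gives the same mode directly, and the median
# is the 10th of the 19 sorted values.
print(mode(data))    # 5
print(median(data))  # 5
```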

- How are dot plots and bar graphs similar? (12 votes)
- A dot plot is when each dot represents a certain number of something (usually 1). You can tell what a dot represents by looking at the x-axis, and you can tell how many there are by counting the dots. A bar graph is when each bar represents an amount of something (and there is usually more than one bar). You can tell what a bar represents by looking at the x-axis, and you can tell how much there is by reading the number on the y-axis that lines up with the top of the bar. So basically, a dot plot is a bar graph with dots instead of bars and no y-axis. (12 votes)
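One way to see the connection: a dot plot stores the same counts a bar graph would, but because each dot is one data value, the counts can be expanded back into the full list. That is why a dot plot is enough to find the median. The sketch below assumes a small made-up dot plot stored as a Python dictionary mapping each value to its number of dots.

```python
from statistics import median

# A hypothetical dot plot: each key is a value on the x-axis and each
# count is how many dots are stacked above it.
dot_plot = {2: 1, 3: 3, 5: 2, 8: 1}

# Every dot is one data value, so the dot plot can be turned back
# into the full, ordered list of values.
values = [value for value, dots in sorted(dot_plot.items()) for _ in range(dots)]
print(values)          # [2, 3, 3, 3, 5, 5, 8]
print(median(values))  # 3
```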

- What in the world is a box plot? This was not covered anywhere in this section, and neither was finding a median! Help! (10 votes)
- The median is really easy to find. It's just the middle number in a set of data. If there is an even number of data values, the median is the average of the two middle numbers. For example, the median in this set of data:

1 3 5 5 7

Is 5, because 5 is the middle number. In this set:

2 4 6 8

There are two middle numbers, 4 and 6. The median would be the average of the two, which means it's 5. Please note that before you find the middle number, the data must of course be organized from smallest to largest value. So for example,

1 3 7 8 9 3 2

Must be reordered as:

1 2 3 3 7 8 9

To find that the median is 3.

Now, a box plot is difficult to explain without a visual, so try going here: https://www.khanacademy.org/math/probability/data-distributions-a1/box--whisker-plots-a1/v/reading-box-and-whisker-plots (10 votes)
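The steps in this answer (sort the data, then take the middle value or average the two middle values) can be sketched as a short Python function. `median_of` is a hypothetical name chosen for this example; the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Middle value of the data, or the average of the two middle values."""
    data = sorted(values)  # the data must be ordered first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]  # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # even count: average the two middle values


print(median_of([1, 3, 5, 5, 7]))        # 5
print(median_of([2, 4, 6, 8]))           # 5.0
print(median_of([1, 3, 7, 8, 9, 3, 2]))  # 3
```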

- What are quartiles and the interquartile range? (8 votes)
- In descriptive statistics, the quartiles of a ranked (ordered) set of data values are the three points that divide the data set into four equal groups, each comprising a quarter of the data. The interquartile range is the difference between the upper quartile value (Q3) and the lower quartile value (Q1), so it summarizes the spread of the middle 50% of the data. (5 votes)
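As a sketch of these definitions, the Python below computes the quartiles and the interquartile range for a made-up list. Two assumptions are worth flagging: quartiles use the "median of each half" rule (other conventions give slightly different values), and outliers are flagged with the widely used 1.5 × IQR rule, which this answer does not mention. The function names are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule (needs 2+ values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data), median(data[half + n % 2:])


def iqr_and_outliers(values):
    """IQR = Q3 - Q1; values beyond 1.5 * IQR from the box count as outliers."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return iqr, outliers


data = [2, 4, 4, 5, 6, 7, 8, 30]  # hypothetical data
print(quartiles(data))         # (4.0, 5.5, 7.5)
print(iqr_and_outliers(data))  # (3.5, [30])
```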

- What is the difference between a histogram and a bar graph? They look the same to me! (6 votes)
- A histogram counts how many values fall into each range of numbers, so its bars sit side by side along a numeric x-axis. A bar graph shows an amount for each separate category. For example, if you were selling flip flops, a histogram could show how many pairs sold within each price range, while a bar graph could show how many pairs you sold of each kind. (7 votes)
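To illustrate the difference, the sketch below builds histogram data by dropping numbers into equal-width ranges, and bar-graph data by counting a category such as color. The flip-flop prices and colors are invented for the example, and `histogram_counts` is a hypothetical helper name.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count numeric values in equal-width bins [low, low + width).

    Empty bins between the smallest and largest value are kept with a
    count of 0, because a histogram's x-axis is a continuous number line.
    """
    bin_index = [(v - start) // width for v in values]
    counts = Counter(bin_index)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in range(min(bin_index), max(bin_index) + 1)
    }


# Hypothetical flip-flop sales, one entry per pair sold.
prices = [4, 7, 12, 13, 15, 22]  # dollars
colors = ["blue", "red", "blue", "green", "blue", "red"]

print(histogram_counts(prices, start=0, width=10))
# {(0, 10): 2, (10, 20): 3, (20, 30): 1}
print(Counter(colors))  # bar-graph data: one bar per color
# Counter({'blue': 3, 'red': 2, 'green': 1})
```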

- In the practice questions preceding this video (labelled "Practice: Comparing distributions"), you use box and whisker plots and ask questions about the average values of two data sets. I believe that your earlier discussion of box and whisker plots stated that the middle line showed the median, not the average. If this is the case, then it is not possible to use a box and whisker plot to answer questions regarding the average (arithmetic mean) of a data set. Alternatively, if the middle line in a box and whisker plot represents the arithmetic mean of a data set, then you should take care to refer to it as such in earlier videos. At 2:16 of this video, you state that the middle line of the box plot "explicitly tells us what the median is." (8 votes)
- I am lost after 2:34. How can you tell which one to use? (5 votes)
- So the first question, "Which display can be used to find how many vehicles had driven more than 200,000 km?", is asking: FROM WHICH GRAPH can you clearly see the EXACT number of vehicles that have driven more than 200,000 km? The answer would be the histogram, because you can add up the bars above 200,000 km to get an exact count.

The second question, "Which display can be used to find that the median distance was approximately 140,000 km?" is asking: FROM WHICH GRAPH can you see that the median distance (or the MIDDLE VALUE) is approximately 140,000 km? So the answer to that would be the box and whisker plot, because if you know how a box and whisker plot works, you'll know that the line in the middle of the box is the median. In this case, you'll find that the median is approximately 140,000 km.

Hope this helps! (4 votes)
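Both answers can be checked with a short Python sketch. It uses only the histogram bars mentioned in the video transcript (3 cars from 0 to 50,000 km, 3 from 200,000 to 250,000 km, and 2 from 250,000 to 300,000 km) and assumes no car reads exactly 200,000 km. It then pairs that with two invented data sets that share an identical box plot but differ in how many values lie above 200 (thousand km) and in their means, which also bears on the earlier question about averages: a box plot alone does not determine the mean. Quartiles come from `statistics.quantiles` with `method="exclusive"`, which for these 11 values agrees with the median-of-halves rule; `count_above` and `box_plot_numbers` are hypothetical helper names.

```python
from statistics import mean, quantiles

# Histogram bars from the used-car example, as (low km, high km): number of cars.
# Only the bars the transcript mentions are included; the rest are left out.
histogram = {
    (0, 50_000): 3,
    (200_000, 250_000): 3,
    (250_000, 300_000): 2,
}


def count_above(bars, threshold):
    """Add up the bars whose lower edge is at or above `threshold`.

    This gives an exact count only when `threshold` falls on a bar edge;
    if it lands inside a bar, the histogram cannot say how many of that
    bar's values are above it.
    """
    total = 0
    for (low, high), count in bars.items():
        if low < threshold < high:
            raise ValueError("threshold falls inside a bar; exact count unknown")
        if low >= threshold:
            total += count
    return total


def box_plot_numbers(values):
    """Minimum, Q1, median, Q3, maximum: the only numbers a box plot shows."""
    return (min(values), *quantiles(values, n=4, method="exclusive"), max(values))


print(count_above(histogram, 200_000))  # 5

# Two hypothetical data sets (in thousands of km) with identical box plots.
a = [10, 40, 60, 90, 120, 140, 150, 170, 180, 190, 290]
b = [10, 40, 60, 90, 120, 140, 150, 170, 180, 240, 290]
print(box_plot_numbers(a) == box_plot_numbers(b))        # True
print(sum(v > 200 for v in a), sum(v > 200 for v in b))  # 1 2
print(mean(a) == mean(b))                                # False
```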

- Is there any chance you could add box plots as a separate explanation to this module? (6 votes)
- As I answered in another question, I found videos under Subject > Statistics and Probability > Displaying and Describing Data: "Worked Example: Creating a Box Plot (Odd Number of Data Points)" (the even-number version comes next) and the video "Constructing a Box Plot". I didn't find any earlier videos for box plots. (2 votes)

## Video transcript

- [Voiceover] What I wanna do with this video is look at some examples of data represented in different ways, and think about which representation is best, or which can help us answer different questions. So we see this first example. A statistician recorded the length of each of Pixar's first 14 films. The statistician made a dot plot (each dot is a film), a histogram, and a box plot to display the running time data. Which display could be used to find the median?

All right, so let's look at these displays. Over here we see the dot plot. We have a dot for each of the 14 films. One film had a running time of 81 minutes; we see that there. One film had a running time of 92, and one had a running time of 93. We see one had a running time of 95, two had running times of 96 minutes, and so on and so forth. So I claim that I could use this to figure out the median, because I could make a list of all of the running times of the films, order them, and then find the middle value. I could literally write down 81, then 92, then 93, then 95, then 96 twice, then 98, then 100. I think you see where this is going. I could write out the entire list and then find the middle values. So I could definitely use the dot plot to find the median.

Now, what about the histogram? The key here is that to figure out a median, I need a list of numbers. Here, they say I have one film that's between 80 and 85 minutes, but I don't know its exact running time. Its running time might have been 81 minutes, or it might have been 84 minutes.
So I don't know here, and so I can't really make a list of the running times of the films and find the middle values. I don't think I'm gonna be able to do it using the histogram, so I'm not gonna click histogram. Now, with the box plot over here, I might not be able to make a list of all the values, but the box plot explicitly tells us what the median is. The line in the middle of the box tells us the median. If this tick mark is 100, then this one is 99: 95, 96, 97, 98, 99. So it explicitly tells us the median is 99. This is actually the easiest display for finding the median, so I'll go with the box plot. The histogram is of no use to me if I wanna calculate the median.

Let's do a couple more of these. Nam owns a used car lot. He checked the odometers of the cars and recorded how far they had driven. He then created both a histogram and a box plot to display the same data; both diagrams are shown below. Which display can be used to find how many vehicles had driven more than 200,000 kilometers? In this histogram, I have three vehicles that were between 200,000 and 250,000 kilometers, and then I have two vehicles that were between 250,000 and 300,000. So it looks pretty clear that five vehicles had a mileage of more than 200,000, and so I would say that the histogram is pretty useful. But let's verify that the box plot isn't so useful. I wanna know how many vehicles had a mileage of more than 200,000.
Well, I know that if a car has a mileage of more than 200,000, it's going to be in the fourth quartile, but just by looking at this box plot I don't know how many values are sitting in the fourth quartile, so it's not gonna be useful for answering that question.

Let's look at the second question. Which display can be used to find that the median distance was approximately 140,000 kilometers? Well, to calculate the median, you essentially wanna be able to list all of the numbers and then find the middle number. With the histogram, I can't list all of the numbers. I know that there are three values between zero and 50,000 kilometers, but I don't know what they are. They could be 10,000, 10,000, and 10,000, or they could be 10,000, 15,000, and 40,000. If I can't list all of these values and put them in order, I'm really going to have trouble finding the middle value. The middle value is going to be in this range right around here, but I don't know exactly what it is. The histogram is not useful, because it throws all the values into these buckets. The box plot, on the other hand, directly tells me the median value. This line in the middle of the box is the median, and we see that the median value here is 140,000 kilometers: 100, 110, 120, 130, 140,000 kilometers is the median mileage for the cars. So the box plot clearly gives us that data.
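The point about the first histogram bar can be checked directly with the two possibilities mentioned in the transcript. This is only a sketch on those three values, not on the whole car lot, and `bar_height` is a hypothetical helper name: both candidate lists produce the same bar, yet their middle values differ.

```python
from statistics import median


def bar_height(values, low, high):
    """How many values a histogram bar covering [low, high) would count."""
    return sum(1 for v in values if low <= v < high)


# The two possibilities from the transcript for the three cars in the
# first bar (0 to 50,000 km).
option_a = [10_000, 10_000, 10_000]
option_b = [10_000, 15_000, 40_000]

# Both lists draw exactly the same bar...
print(bar_height(option_a, 0, 50_000), bar_height(option_b, 0, 50_000))  # 3 3
# ...but their middle values differ, so the bar alone cannot pin down a median.
print(median(option_a), median(option_b))  # 10000 15000
```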