Comparing dot plots, histograms, and box plots

Data can be represented in various ways such as dot plots, histograms, and box plots. Dot plots and box plots are useful for finding the median, while histograms are great for showing the number of values within a specific range.
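
As a rough illustration of the summary above, the Python sketch below uses a small made-up data set (not the data from the video) to show why a dot plot keeps enough information to compute a median, while a histogram keeps only a count per range. The names `dot_plot`, `values_from_dot_plot`, and `histogram_counts` are hypothetical helpers introduced for this example.

```python
from collections import Counter
from statistics import median

# Hypothetical dot plot: each key is a value on the axis,
# each count is the number of dots stacked above it.
dot_plot = {81: 1, 92: 1, 95: 2, 100: 1, 103: 1}

def values_from_dot_plot(plot):
    """Expand a dot plot back into the sorted list of raw values."""
    return sorted(v for v, n in plot.items() for _ in range(n))

def histogram_counts(values, bin_width):
    """Count values per bin; each key is the bin's lower edge."""
    return dict(Counter((v // bin_width) * bin_width for v in values))

values = values_from_dot_plot(dot_plot)
print(values)                       # [81, 92, 95, 95, 100, 103]
print(median(values))               # 95.0
print(histogram_counts(values, 5))  # {80: 1, 90: 1, 95: 2, 100: 2}
```

The histogram dictionary no longer says whether the value in the 80 to 85 bin was 81 or 84, which is why the list of raw values, and therefore the exact median, cannot be rebuilt from it.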

Want to join the conversation?

  • kelsey.driediger
    Is there anywhere that I can find what a box plot is and how it works? I have never heard of it before, and I do not understand how Sal is reading it throughout the video, or even what it is. Is it simply showing the range of the data points? Or is there more to it? And what does the box in the middle represent? Is there anywhere on Khan Academy that I can find this?
    (26 votes)
    • male robot hal style avatar for user ㅤ
      A box plot (aka box and whisker plot) uses boxes and lines to depict the distributions of one or more groups of numeric data. Box limits indicate the range of the central 50% of the data, with a central line marking the median value. Lines extend from each box to capture the range of the remaining data, with dots placed past the line edges to indicate outliers.
      (1 vote)
  • Jonathan Haroun
    What the heck is a box plot? I never saw Sal talk about it in the previous videos.
    (12 votes)
  • ✌PIXholic Studios
    Is there a shortcut to finding the median and mode of a group of numbers?
    (17 votes)
  • Emon Chandra
    How are dot plots and bar graphs similar?
    (12 votes)
    • Bradley Reynolds
      A dot plot is when you have dots that each represent a certain number of something (usually 1), where you can tell what each stack is representing by looking at the x-axis and you can tell how much there is by counting the dots. A bar graph is when you have one bar representing a number of something (and usually more than one bar), and you can tell what it is representing by looking at the x-axis and you can tell how much there is by looking at the y-axis and what number on it corresponds to the top of the bar. So basically, a dot plot is a bar graph with dots instead of bars and no y-axis.
      (12 votes)
  • Iris M Gross
    What in the world is a box plot? This was not covered anywhere in this section? Nor was finding a median! Help!
    (10 votes)
  • WeideVR
    What is quartile and interquartile?
    (8 votes)
    • Wendy Cao
      Well, there is an interquartile range. You can summarize the majority of data by using the interquartile range. The interquartile range is a value that is the difference between the upper quartile value and the lower quartile value. In descriptive statistics, the quartiles of a ranked set of data values are the three points that divide the data set into four equal groups, each group comprising a quarter of the data.
      (5 votes)
  • craycray_unicorn
    What is the difference between a histogram and a bar graph? They look the same to me!
    (6 votes)
  • Tim Cary
    In the practice questions preceding this video (labelled "Practice: Comparing distributions"), you use box and whisker plots and ask questions about the average values of two data sets. I believe that your earlier discussion of box and whisker plots stated that the middle line showed the median, not the average. If this is the case, then it is not possible to use a box and whisker plot to answer questions regarding the average (arithmetic mean) of a data set. Alternatively, if the middle line in a box and whisker plot represents the arithmetic mean of a data set, then you should take care to refer to it as such in earlier videos. In this video, you state that the middle line of the box plot "explicitly tells us what the median is."
    (8 votes)
  • candacemckinley
    I am lost. How can you tell which one to use?
    (5 votes)
    • sf_3
      So the first question, "Which display can be used to find how many vehicles had driven more than 200,000 km?" is asking: FROM WHICH GRAPH can you clearly see the EXACT number of vehicles that have driven more than 200,000 km? The answer would be the histogram, because you can tally up the number of cars by counting, so you get an exact number.

      The second question, "Which display can be used to find that the median distance was approximately 140,000 km?" is asking: FROM WHICH GRAPH can you see that the median distance (or the MIDDLE VALUE) is approximately 140,000 km? So the answer to that would be the box and whisker plot, because if you know how a box and whisker plot works, you'll know that the line in the middle of the box is the median. In this case, you'll find that the median is approximately 140,000 km.


      Hope this helps!
      (4 votes)
  • Holly E
    Is there any chance you could add box plots as a separate explanation in this module?
    (6 votes)
    • HT
      As I answered in another question, I found videos under Subject>Statistics and Probability>Displaying and Describing Data>Worked Example: Creating a Box Plot (Odd) {next is Even} Number of Display-Points and the video "constructing a Box Plot". I didn't find any earlier videos for Box Plots.
      (2 votes)
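
Several of the questions above ask what the parts of a box plot represent. As a minimal sketch, the Python below computes the five numbers a box plot draws (minimum, lower quartile, median, upper quartile, maximum) and the interquartile range. It assumes the "median of each half" convention for quartiles, leaving the middle value out of both halves when the count is odd. Other conventions exist and can give slightly different quartiles. The function name `five_number_summary` and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values before the median position
    upper = data[half + n % 2:]  # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Made-up data, not taken from the video.
data = [1, 3, 4, 6, 7, 9, 12]
lo, q1, med, q3, hi = five_number_summary(data)
print(lo, q1, med, q3, hi)  # 1 3 6 9 12
print("IQR =", q3 - q1)     # IQR = 6
```

In a box plot of this data, the box would run from 3 to 9 with the median line at 6, and the whiskers would reach out to 1 and 12.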

Video transcript

- [Voiceover] What I wanna do with this video is look at some examples of data represented in different ways, and think about which representation is the best, or can help us answer different questions? So we see this first example. A statistician recorded the length of each of Pixar's first 14 films. The statistician made a dot plot, each dot is a film, a histogram, and a box plot to display the running time data. Which display could be used to find the median? To find the median.

All right, so let's look at these displays. So over here we see, this is the dot plot. We have a dot for each of the 14 films. So one film had a running time of 81 minutes. We see that there. One film had a running time of 92. One had a running time of 93. We see one had a running time of 95. We see two had running times of 96 minutes, and so on and so forth. So I claim that I could use this to figure out the median, because I could make a list of all of the running times of the films, I could order them, and then I could find the middle value. I could literally make a list. I could write down 81, and then write down 92, then write down 93, then write down 95, then I could write down 96 twice, and then I could write down 98, then I could write down 100. I think you see where this is going. I could write out the entire list, and then I could find the middle values. So the dot plot, I could definitely use to find the median.

Now, what about the histogram? This is the histogram right over here. And the key here is, for a median, to figure out a median, I just need to figure out a list of numbers. I need to figure out a list of numbers. So here, I don't know, they say I have one film that's between 80 and 85, but I don't know its exact running time. Its running time might have been 81 minutes, its running time might have been 84 minutes. So I don't know here, and so I can't really make a list of the running times of the films and find the middle values, so I don't think I'm gonna be able to do it using the histogram.

Now, with the box plot right over here, so I'm not gonna click histogram. With the box plot over here, I might not be able to make a list of all the values, but the box plot explicitly tells us what the median is. This middle line in the middle of the box, that tells us the median is, what is this, this median is, if this is 100, this is 99. So this is 95, 96, 97, 98, 99. It explicitly tells us the median is 99. This is actually the easiest for calculating the median. So I'll go with the box plot. So the histogram is of no use to me if I wanna calculate the median.

Let's do a couple more of these. Nam owns a used car lot. He checked the odometers of the cars and recorded how far they had driven. He then created both a histogram and a box plot to display the same data, both diagrams are shown below. Which display can be used to find how many vehicles had driven more than 200,000 kilometers?

So how many vehicles had driven more than 200,000 kilometers? So it looks like here in this histogram, I have three vehicles that were between 200 and 250, and then I have two vehicles that are between 250 and 300. So it looks pretty clear that I have five vehicles, three that had a mileage between 200,000 and 250,000, and then I had two that had mileage between 250,000 and 300,000. So I may be able to answer the question. Five vehicles had a mileage more than 200,000, and so I would say that the histogram is pretty useful.

But let's verify that the box plot isn't so useful. So I wanna know how many vehicles had a mileage more than 200,000. Well, I know that if I have a mileage more than 200,000, I'm going to be in the fourth quartile, but I don't know how many values I have sitting there in the fourth quartile just looking at this data over here, so that's not gonna be useful for answering that question.

Let's look at the second question. Which display can be used to find that the median distance, which display can be used to find that the median distance was approximately 140,000 kilometers? Well, to calculate the median, you essentially wanna be able to list all of the numbers and then find the middle number. And over here, I can't list all of the numbers. I know that there's three values that are between zero and 50,000 kilometers, but I don't know what they are. Could be 10,000, 10,000, 10,000. It could be 10,000, 15,000, and 40,000. I don't know what they are, and so if I can't list all of these things and put them in order, I really am going to have trouble finding the middle value. The middle value, it's going to be in this range right around here, but I don't know exactly what it's going to be. The histogram is not useful, because it's throwing all the values into these buckets.

While on the box plot, it explicitly, it directly tells me the median value. This line right over here, the middle of the box, this tells us the median value, and we see that the median value here, this is 140,000 kilometers. Right, this is 100, 110, 120, 130, 140,000 kilometers is the median mileage for the cars. And so the box plot clearly... clearly gives us that data.
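
The used car example can be sketched in Python to show what a histogram can and cannot answer. In the sketch below, only the counts for the 0 to 50, 200 to 250, and 250 to 300 bins come from the transcript. The other counts, along with the names `hist`, `count_at_or_above`, and `median_bin`, are assumptions made up for illustration.

```python
# Bin counts keyed by each bin's lower edge, in thousands of km (bin width 50).
# The counts for 0, 200 and 250 match the transcript; the rest are made up.
bin_width = 50
hist = {0: 3, 50: 2, 100: 4, 150: 3, 200: 3, 250: 2}

def count_at_or_above(hist, threshold):
    """Count values in bins starting at or above the threshold.

    This is exact only when the threshold sits on a bin edge.
    """
    return sum(n for edge, n in hist.items() if edge >= threshold)

def median_bin(hist, width):
    """Return the (low, high) bin that contains the median position.

    With an even total, the two middle values may fall in different bins;
    this sketch then reports the bin of the upper one.
    """
    total = sum(hist.values())
    target = (total + 1) / 2  # 1-based position of the median
    running = 0
    for edge in sorted(hist):
        running += hist[edge]
        if running >= target:
            return (edge, edge + width)

print(count_at_or_above(hist, 200))  # 5
print(median_bin(hist, bin_width))   # (100, 150)
```

The histogram gives the count above 200,000 km directly, but it can only narrow the median down to a 50,000 km wide bin, whereas the box plot marks the median itself.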