Example: Describing a distribution

Learn how to describe a distribution of quantitative data by discussing its shape, center, spread, and potential outliers.

Want to join the conversation?

  • nabhoneelsilupadhyay
    Why does Sal say the same thing twice at ?
    (20 votes)
  • Hollyyy
    We've never talked about IQR, MAD, and how to find the mean and median on this channel...
    This lesson is very confusing for me...
    (7 votes)
  • lambsandponies
    How does Sal know whether to calculate the mean or the median of the distribution when trying to describe the center? Also how does Sal know which one to calculate (range, IQR, MAD, standard deviation) when trying to describe the spread?
    (5 votes)
    • Matthew Daly
      Sal seems to be choosing the options that don't require him to pull out his calculator and do a lot of work. ^_^ There's nothing wrong with that; the idea is just to be able to briefly describe the key points of the distribution in a few words.
      (6 votes)
  • Ryan🥰
    This is so confusing, please help me.
    (5 votes)
  • LeoDoodlezz
    I don't know what MAD means.
    (3 votes)
    • Liv
      MAD stands for "Mean Absolute Deviation", and it is an average of how much each data point varies from the mean. The way you calculate this is:
      1. Find the mean of the data set.
      2. Calculate the distance of each data point from the mean. You can do this by subtracting each point from the mean and taking the absolute value of the result (distance is always positive).
      3. Find the mean of the values you just got from taking the distance of each data point from the mean. In other words, add all the distances of the data points from the mean and divide by the total number of data points.
      Hope that helps, and good luck on your stats journey!
      (2 votes)
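The three steps in the reply above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_absolute_deviation` and the sample values are assumptions made for this example, not something from the video.

```python
def mean_absolute_deviation(data):
    """Return the mean absolute deviation (MAD) of a non-empty list of numbers."""
    if not data:
        raise ValueError("data must contain at least one value")
    # Step 1: find the mean of the data set.
    mean = sum(data) / len(data)
    # Step 2: distance of each data point from the mean (never negative).
    distances = [abs(x - mean) for x in data]
    # Step 3: the mean of those distances is the MAD.
    return sum(distances) / len(distances)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
sample = [2, 4, 6, 8]
# The mean is 5, the distances are 3, 1, 1, 3, so the MAD is 8 / 4 = 2.0.
print(mean_absolute_deviation(sample))
```

A larger MAD means the data points sit farther from the mean on average, so it describes spread in the same units as the data.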
  • Kaden Peterson
    Could an outlier be in the middle of a dot plot? Say the dot at 10 was moved closer to 5 and the dot was put at, like, 13. Could that be classified as an outlier?
    (2 votes)
  • Najeera Williams
    How do you determine which center and spread method to use?
    (1 vote)
    • Anagha Tiwari
      For a measure of center, use the mean or the median. Because the mean is a non-resistant measure of center (it is easily influenced by outliers), it is typically used for data sets that are roughly symmetrical (they do not have any outliers or skewness). The median is used for data sets that are skewed or have outliers because it is a resistant measure of center (it is not easily influenced by outliers) and gives an accurate portrayal of the data. For spread, range is non-resistant (since it only takes 2 values to calculate range, and they might be outliers) and SD is also non-resistant (outliers and skewness may affect the SD). Range and SD should only be used in data sets that are roughly symmetrical. On the other hand, IQR is resistant to outliers, which is why it should be used for data sets that are skewed.

      In other words, you should decide which measure to use based on the distribution of your data.
      Hope this helps!
      (5 votes)
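To see what "resistant" means in practice, here is a small sketch that compares the measures before and after one extreme value is added. The scores, the added value of 100, and the helper name `center_and_spread` are all hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical scores, roughly symmetric around 22.
scores = [20, 21, 22, 23, 24]
# The same scores with one extreme value appended (also hypothetical).
scores_with_outlier = scores + [100]


def center_and_spread(data):
    """Return the mean, median, range, and sample standard deviation."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),
    }


before = center_and_spread(scores)
after = center_and_spread(scores_with_outlier)

# Print each measure without and with the extreme value.
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

With these numbers the mean jumps from 22 to 35 and the range from 4 to 80, while the median only moves from 22 to 22.5. That is the sense in which the median is resistant and the mean, range, and standard deviation are not.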
  • walllea
    For a measure of center, use the mean or the median. Because the mean is a non-resistant measure of center (it is easily influenced by outliers), it is typically used for data sets that are roughly symmetrical (they do not have any outliers or skewness). The median is used for data sets that are skewed or have outliers because it is a resistant measure of center (it is not easily influenced by outliers) and gives an accurate portrayal of the data. For spread, the range is non-resistant (since it only takes 2 values to calculate range, and they might be outliers) and SD is also non-resistant (outliers and skewness may affect the SD). Range and SD should only be used in data sets that are roughly symmetrical. On the other hand, IQR is resistant to outliers, which is why it should be used for data sets that are skewed.

    In other words, you should decide which measure to use based on the distribution of your data.
    Hope this helps!
    (2 votes)
  • TiannaBruce16
    Was there anything supposed to be under MAD in the spread group?
    (2 votes)
  • Noah Kohail
    How do I find a median on a histogram?
    (2 votes)
    • Liv
      Hello Noah,
      That's a great question. You can't find the exact median on a histogram; however, you can find the range, or "bucket" as it's called in this course, that the center of the data is likely to be in. Hope that helps!
      (1 vote)
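One way to locate that bucket is to add up the bar heights from left to right until the running total reaches the middle position. The sketch below assumes the histogram is given as a list of (label, count) pairs in order. The bucket labels, the counts, and the function name `median_buckets` are made up for illustration.

```python
def median_buckets(buckets):
    """Return the bucket label(s) that hold the middle value(s).

    `buckets` is a list of (label, count) pairs ordered from lowest to
    highest. With an even number of data points there are two middle
    values, so two labels are returned (they are often the same bucket).
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("the histogram contains no data points")

    # 1-indexed positions of the middle value(s) in the sorted data.
    lower_pos = (total + 1) // 2
    upper_pos = total // 2 + 1

    def bucket_at(position):
        running = 0
        for label, count in buckets:
            running += count
            if running >= position:
                return label

    return bucket_at(lower_pos), bucket_at(upper_pos)


# Hypothetical histogram with 20 data points in total.
histogram = [("0-9", 2), ("10-19", 5), ("20-29", 9), ("30-39", 4)]
# The 10th and 11th values both fall in the "20-29" bucket.
print(median_buckets(histogram))
```

If the two labels differ, the median lies on the boundary between those two buckets, which is as precise as a histogram allows.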

Video transcript

- [Instructor] Sometimes in life, like say, on an exam, in particular, like an AP exam, you might be asked to describe or compare a distribution. And so we're gonna get an example of doing that right over here. Sometimes in life, say on an exam, especially on something like an AP exam, you're asked to describe or compare a distribution. And what we're gonna do in this video is do exactly that. In fact, in this one we're gonna describe, and in a future video we're going to compare distributions.

Now before we even read about this distribution or look at this distribution, if you're asked to describe a distribution, there's four things that you should be thinking about. You should be thinking about the shape of the distribution. And when we're talking about shape, there could be left-skew, there could be right-skew, and we'll see examples of these. And we've talked about them in detail in other videos. It could be symmetric. These are the ones that we typically see, although there might be other types of shapes. You'll have your center of distribution. And there's multiple ways to be thinking about the center of distribution, we've talked about this before: you have your mean, you have your median, these are the two most typical ones. You have a notion of spread. And for spread you could use range, you could use interquartile range, you could use something like a mean absolute deviation, you could use the standard deviation. These are all measures of spread. And then, you probably should at least comment about outliers. Even if you don't see them, it's a good idea to comment just to make sure that you are being relatively comprehensive.

So now, given that, let's describe the distribution right over here. It says in the state of Connecticut, the Department of Motor Vehicles, the DMV, requires 16- and 17-year-olds to take a 25-question knowledge test in order to obtain a learner's permit. To pass, prospective drivers must correctly answer at least 20 questions. On one Monday, 22 teenagers took the test. The dot plot below shows their scores. So why don't you pause this video and see if you can take a shot at describing the shape, the center, the spread and the outliers. Some of these you might be able to come up with the actual numbers, you might be able to calculate some of these, but really just to get a sense of it, why don't you take a shot at it.

Alright, now let's do this together. So first, on the shape. So what we see is, most of the distribution is in this part between 20 and 25, but then we have this fairly long tail to the left, and so this tells us that we have a left-skew, or it is a left-skewed distribution right over here. So we have done the shape, it's a left-skewed distribution because the tail goes to the left.

Now what about the center of this distribution? So there's a few ways to measure center, mean or median. Just for the sake of simplicity, I'll think about the median here, I can eyeball that to some degree. You could also calculate the mean, it would take a little bit more time. I would guess that it's someplace, not even calculating it, I would guess that it's someplace in this range right over there, but let me actually calculate it. So the median, there's 22 data points, so the median is whatever number has 11 to the right of it and 11 to the left, half of 22. So let's see, we have one, two, three, four, five, six, seven, eight, nine, 10, 11. So the median here is going to be, let's see, this is 23. Because we have a bunch of 23s, one, two, three, four, five, six 23s, and if we were to just order all of the data points, 11 of the data points would be 23 or less and 11 would be 23 or more. So our median here, so I could say our center, is 23, if we use the median. And actually, let me write that down. So our median is 23, that's the measure of center that I decided to use.

Now, what about spread? Well, the simplest measure of spread is just the range, which is just the highest value minus the lowest value. And so our range here would be 25 minus four. 25 minus four is equal to 21, so that is our range. You could have others but this one is very easy to calculate.

And then if we think about outliers, well, there are a few outliers I would consider, and it's very subjective. People can debate, you know, if there's a dot right over there, is that an outlier or not. But I would say that these four right over here, I would consider outliers. So I would say approximately four outliers, but once again this is subjective. The main point of this exercise is to just get in the habit of thinking about these things. And statistics is all about creating, engineering one could say, different measurements for center, for spread, different ways to describe the shape. But the point is to just think about these various dimensions.
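The same description can be sketched in Python. Note the assumptions: the transcript never lists the individual scores, so the 22 values below are hypothetical and only match what is stated in the video (a low of 4, a high of 25, six 23s, and a tail to the left). The instructor picks outliers by eye and calls the choice subjective. The 1.5 × IQR fence used here is a common convention swapped in for illustration, not the method from the video, and the helper names `quartiles` and `describe` are made up.

```python
import statistics

# Hypothetical scores: NOT the real dot plot, only consistent with the
# facts stated in the transcript (22 scores, min 4, max 25, six 23s).
scores = [4, 10, 12, 14, 20, 21, 22, 22,
          23, 23, 23, 23, 23, 23,
          24, 24, 24, 24, 25, 25, 25, 25]


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Skips the middle value when the number of points is odd.
    upper = ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)


def describe(data):
    """Summarize center, spread, and possible outliers of a data set."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    # Assumed convention: flag points more than 1.5 * IQR beyond a quartile.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": iqr,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```

With an even number of points, the median is the average of the 11th and 12th values, which are both 23 here. For these made-up scores the mean falls below the median, which is what a tail to the left tends to produce, and the fence flags the four lowest scores, in line with the "approximately four outliers" judged by eye.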