© 2023 Khan Academy

# Comparing means of distributions

Sal compares the means of two different distributions given as dot plots. Created by Sal Khan.

## Want to join the conversation?

- Why do you need to multiply to get the answer? Is that a choice?(15 votes)
- Jordan is right, it does take up less space to use multiplication and more space with addition(4 votes)

- It would help if you started the video by labeling the X-axis as "Number of fruit eaten per day" and the Y-axis as "# of people (freshmen or seniors)". That would've made it easier to follow for me.(15 votes)
- Did you (Khan) mean to say the median is the middle number at 6:24? It's hard to tell what you meant, because the mode is the same as the median: 3. But it seems odd to call the mode the middle number.(5 votes)
- Median is the middle number, and the mode is the most commonly occurring number. (Occurs the most in a data set) The mode can be the same as the median if the middle number is also the most commonly occurring number. Does this clear things up?(6 votes)
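
To make the distinction concrete, here is a minimal Python sketch using the standard `statistics` module. The list `freshmen` is an assumed transcription of the freshmen dot plot as Sal reads it out in the video (one 0, two 1s, two 2s, four 3s, three 4s, one 5, one 6, one 19), and the variable names are illustrative only.

```python
import statistics

# Freshmen data, transcribed from the dot plot as described in the video
# (an assumption: the plot itself is not reproduced on this page).
freshmen = [0, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 19]

median_value = statistics.median(freshmen)  # middle value of the sorted data
mode_value = statistics.mode(freshmen)      # most frequently occurring value

print(median_value, mode_value)  # 3 3
```

For this data set the two happen to coincide at 3, which is why the two terms are easy to confuse here.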

- 0 + 2.1 + 2.2 + 4.3 + 3.4 + 5 + 6 + 19 / 15

Why over 15 not over 8? And also why do you need to put the 0 there if it is zero and will not really affect the amount, for it will not increase or decrease the sum of all those numbers? So why does it have to be included?(3 votes)
- Listen to the video more closely, paying particular attention to what Sal is saying and the math that he is writing out. He's not writing a "2.1", he's writing "2x1", because he's saying "We have two data points at 1". Basically, he's using a short-hand notation.

And you don't have to add the zero, but you do have to count it, so that the denominator is correct.(4 votes)
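
Written out in full, the shorthand described in this reply would look roughly like the following, where each product is (number of dots) times (value) and the denominator counts every dot, including the one at 0:

```latex
\[
\text{mean}_{\text{freshmen}}
= \frac{0 + 2\cdot 1 + 2\cdot 2 + 4\cdot 3 + 3\cdot 4 + 5 + 6 + 19}{15}
= \frac{60}{15}
= 4
\]
```

The numerator has 8 terms, but those terms stand for 1 + 2 + 2 + 4 + 3 + 1 + 1 + 1 = 15 data points, which is why the sum is divided by 15 and not by 8.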

- And at 4:30, it is also confusing!(4 votes)
- Sal is multiplying the value of the dots by how many dots there are of that value. For example, there are 5 dots with the value 3 (look at the bottom dot plot). The reason why he is dividing the equation by 16 is because there is a total of 16 dots on the dot plot. Hope this helps!(2 votes)
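
The same idea can be sketched in Python by treating the seniors' dot plot as a frequency table. The counts below follow what Sal reads out in the video; the dictionary name `senior_counts` and the other variable names are made up for this illustration.

```python
from fractions import Fraction

# Seniors' dot plot as a frequency table: value -> number of dots.
# Counts are taken from the transcript; the names are illustrative.
senior_counts = {0: 1, 1: 1, 2: 2, 3: 5, 4: 3, 5: 2, 6: 1, 7: 1}

# Multiply each value by how many dots sit on it, then add everything up.
total = sum(value * count for value, count in senior_counts.items())

# The denominator is the total number of dots, not the number of distinct values.
n_points = sum(senior_counts.values())

mean = Fraction(total, n_points)
print(total, n_points, mean, float(mean))  # 55 16 55/16 3.4375
```

Using `Fraction` keeps the exact result 55/16, which is the 3 and 7/16 from the video.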

- What does ''center of distribution'' exactly mean?(4 votes)
- The centre of distribution is just the middle of the distribution or in this case the middle of the plot.(1 vote)

- At 2:01, it's confusing.(3 votes)
- At 2:01, Sal just realized he had left out the outlier at 19, (an outlier is a value way outside of the data cluster), so he had to add it to the total number of values to get an accurate mean (average).(3 votes)

- The median is the middle number and the mode is the number that occurs the most. You mixed them up.(3 votes)
- A lot of time wasted on calculating the mean. The video could have been a lot shorter and clearer.(3 votes)
- What site is he using? I want to use it to help me.(3 votes)

## Video transcript

Voiceover: Kenny interviewed freshmen and seniors at his high school, asking them how many pieces of fruit they eat each day. The results are shown in the 2 plots below. The first statement that we have to complete is the mean number of fruits is greater for, and actually, let me go down the actual screen, is greater for, we have to pick between freshmen and seniors. Then they said the mean is a good measure for the center of distribution of, and we pick either freshmen or seniors.

Let me go back to my scratch pad here, and let's think about this. Let's first think about the first part. Let's just calculate the mean for each of these distributions. I encourage you to pause the video and try to calculate it out on your own.

Let's first think about
the mean number of fruit for freshmen. Essentially, we're just going to take each of these data points, add them all together, and then divide by the number of data points that we have.

We have one data point at 0. We have one data point at 0, so I'll write 0. And then we have two data points at 1, so we could say plus 2 times 1. And then we have two data points at 2, so you write plus 2 times 2. And then, let's see, we have a bunch of data. We have four data points at 3, so we could say we have four 3s. Let me circle that. So we have four 3s, plus 4 times 3. And then we have three 4s, so plus 3 times 4. And then we have a 5, so plus 5, and then we have a 6. Let me do this in a color that you can see. And then we have a 6 right over here, plus 6.

How many total points did we have? We had 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, oh, actually, be careful. We had 15 points and I didn't put that one in there. Actually, let me just ... So we have 15 points, and I can't forget this one over here, so plus ... my pen is acting a little funny right now, but we'll power through that, plus 19.

So what is this going to be? This is just going to be 0. This is going to be 2. This is going to be 4. This is going to be 12. My pen is really acting up. It's almost like it's running out of digital ink or something. This is going to be another 12, and then we have 5, 6, and 19. So what is this going to be? 2 plus 4 is 6, plus 24 is 30, plus 11 is 41, plus 19 gets us to 60. 60 divided by 15 is 4, so the mean number of fruit per day for the freshmen is 4 pieces of fruit per day. This right over here, that right over there is our mean for the ... Let me put that in a color that you can actually see.

Now let's do the same
calculation for the seniors. We have one data point where they didn't eat any fruit at all each day, not too healthy. Then you have one 1, so I'll just write that as, we could actually write that as 1 times 1, but I'll just write that as 1. Then we have two 2s, so plus 2 times 2. Then we have one, two, three, four, five 3s, five 3s, so plus 5 times 3. And then we have three 4s, so plus 3 times 4. And then we have two 5s, plus 2 times 5, and then we have a 6. We have a 6, plus 6, and we have a 7, someone eats 7 pieces of fruit each day, a lot of fiber, plus 7.

And now, how many data points did we have? We have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 data points. So we're going to divide this by 16.

So what is this going to be? This is just 0. Let's see. This is, just right over, that's 0. This is 4. This is 15. This is 12. This is 10. So we have 1 plus 4 is 5 plus 15 is 20 plus 12 is 32 plus 10 is 42. 42 plus 6 is 48, 48. Am I doing ... 42 plus 6 is 48 plus 7, 48 plus 7 is 55. Did I do that right? Let me do that one more time. 1 plus 4 is 5 plus 15 is 20, 32, 42. 42 plus 13 is 55.

So this is equal to 55 over 16, which is the same thing as, let's see, that's the same thing as 3 and 3 that ... 3 times 16 is 48, so 3 and 7/16. So the mean for the seniors, 3 and 7/16, that's right around ... let's see. This is 3, that's 4, so 7/16, it's a little less than a half. It's right around there.

So the mean number of fruits is definitely greater for the freshmen. They have 4 ... Their mean number of
fruit eaten per day is 4 versus 3 and 7/16.

The mean is a good measure for the center of the distribution of. So when we think about whether it's freshmen or seniors, the mean is fairly sensitive to when you have outliers here. For example, someone here was eating 19 pieces of fruit per day. That's an enormous amount of fruit. They must be only eating fruit. You can imagine if it was even a bigger number, if someone was eating 20 or 30 pieces of fruit, just that one data point will
skew the entire mean upwards. That wouldn't be the effect on the mode [median] because the mode [median] is a middle number. Even if you change this one point all the way out here, it's not going to change
what the middle number is. So the mean is more sensitive to these outliers, to these really, these points that are really, really high, really, really low. And because the seniors don't seem to have any outliers like that, I would say that the mean is a good measure for the center of distribution for the seniors, or a better measure for the center of distribution for the seniors.

Let's fill both of those out. The mean number of fruit is greater for the freshmen, and the mean is a good measure for the center of distribution for the seniors.

You actually even see it here. We saw that the mean number for freshmen was at 4, but if you just ignored this person right over here and just you thought about the bulk of this distribution right over here, 4 really doesn't look like the center of it. The center of it looks closer to 3 here. What happened is this one person eating 19 pieces of fruit per day skewed the mean upwards. While here, that 3 and 7/16 really did look closer to the actual distribution, closer to the ... actually, I shouldn't say ... I mean in both times, we actually did calculate the mean of the actual distribution. But here, since there's no outliers, it does seem the mean seemed much closer to, I guess you could say the middle of this pile right over here. Let's check our answer, and we got it right.
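
The outlier argument can be checked numerically. The following is a rough Python sketch, not part of the original exercise: both lists are assumed transcriptions of the dot plots as read out in the video, and the two modified lists (`without_outlier` and `bigger_outlier`) are hypothetical variations added only to show how the mean and the middle number (the median) react to the point at 19.

```python
import statistics

# Data transcribed from the two dot plots as described in the transcript.
freshmen = [0, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 19]
seniors = [0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7]

# Hypothetical variations on the freshmen data, for illustration only.
without_outlier = [x for x in freshmen if x != 19]         # ignore the 19
bigger_outlier = [30 if x == 19 else x for x in freshmen]  # 19 becomes 30

cases = [
    ("freshmen", freshmen),
    ("seniors", seniors),
    ("freshmen without the 19", without_outlier),
    ("freshmen with 30 instead of 19", bigger_outlier),
]

for label, data in cases:
    mean = statistics.mean(data)      # sum of values / number of values
    median = statistics.median(data)  # middle value of the sorted data
    print(f"{label}: mean = {mean:.2f}, median = {median:.1f}")

# freshmen: mean = 4.00, median = 3.0
# seniors: mean = 3.44, median = 3.0
# freshmen without the 19: mean = 2.93, median = 3.0
# freshmen with 30 instead of 19: mean = 4.73, median = 3.0
```

In every variation the median stays at 3, while the freshmen mean moves from about 2.93 to 4.73 depending on that single point. The seniors' mean of about 3.44 sits close to their median, which matches the conclusion that the mean describes the center of the seniors' distribution better.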