
## AP®︎/College Statistics


Lesson 6: Sampling distributions for sample means

# Example: Probability of sample mean exceeding a value

Estimating the probability that the sample mean exceeds a given value in the sampling distribution of the sample mean. Created by Sal Khan.

## Want to join the conversation?

• It seems to me that there is some kind of underlying assumption here that makes the result suspect (and I'm talking mathematics, not sweating and all those pesky real-life problems :-) ).
What I'm wondering about is that, by increasing the sample size (and perhaps the number of trials as well), it seems possible to get arbitrarily close to a situation where the standard deviation is as small as desired. This way we could convince ourselves that the 2.2 L we have reserved for each man is enough, because we can make the probability of exceeding it arbitrarily low. I'm wondering what sample size would give us the best approximation in the real world.
I realize that the example is simplified for mathematical convenience (which is quite understandable), but it bothers me that just increasing the sample size makes us more certain. I think there is an assumption here that doesn't quite work in real life, but I can't see what it is for now. If anyone can clarify this point, I'd be grateful!
Of course, it's possible that my doubts will be resolved in a video I haven't yet seen, so maybe this question will become moot. Meanwhile, I'd appreciate it if someone would tell me whether I'm mistaken in assuming that increasing the sample size will reduce the SD, and that this in turn will increase our probability as much as we desire. Also, it seems somehow inappropriate to have a sample size larger than or equal to the number of men (50 men, sample size 50), because the whole idea of a sample is that it is smaller than the whole population and tries to represent it.
Anyway, thanks in general to Sal for showing the intuition behind many mathematical concepts that are often just stated in mathematics books.
• You're right to think about the things you're assuming when approaching a statistical problem. For this example to work, you have to assume that:

- each camper's fluid intake is independent of all the others
- the given population mean and standard deviation are accurate, and
- your campers are all drawn from that population.

If all of that is true, then we can estimate how likely the water is to run out (or, rather, how likely it is to find 50 campers whose average consumption is higher than 2.2 L). The key fact is that as the sample size increases, the probability that the sample mean will differ greatly from the population mean decreases: the SD of the sample mean is σ/√n. On top of that, the central limit theorem tells us that regardless of the population distribution, as the size of the random samples increases, the distribution of sample means approaches a normal distribution.

If the population standard deviation is right, then the SD for samples of 1 camper each is 0.7L. If you're randomly picking two campers, most of the time their consumption will balance out a bit, so σ for samples of 2 campers will be around 0.5L. σ for samples of 4 campers should be around 0.35L. For 50 random campers, Sal's probability estimate is right, if our initial assumptions are true.

You're perfectly right in thinking that you can choose sample sizes to make the standard deviation of the sample mean arbitrarily low. This is useful if you want to know how many campers to monitor to make sure your estimates are right, for example: as you increase your sample size, you decrease the likelihood that your sample mean will differ much from the population mean.
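The σ/√n shrinkage described above can be checked empirically. A minimal sketch, assuming the figures from the video (μ = 2 L, σ = 0.7 L) and a normal population:

```python
import random
from statistics import NormalDist, stdev

# Hypothetical sketch: the SD of the sample mean should track sigma / sqrt(n).
# mu = 2.0 L and sigma = 0.7 L are the figures from the video.
random.seed(0)
pop = NormalDist(mu=2.0, sigma=0.7)

for n in (1, 2, 4, 50):
    # Draw 20,000 samples of size n and record each sample's mean.
    means = [sum(pop.samples(n)) / n for _ in range(20_000)]
    print(f"n={n:2d}  empirical SD={stdev(means):.3f}  sigma/sqrt(n)={0.7 / n ** 0.5:.3f}")
```

The empirical column lands close to 0.7, 0.49, 0.35, and 0.099, matching the values quoted in this answer.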
• I don't understand why we use the sampling distribution of the sample mean to calculate this probability. Why wouldn't we get a more accurate result by just taking the area under the population distribution between 2.2 and infinity?
• The area under the population distribution between 2.2 and infinity gives you the probability of one active individual drinking more than 2.2 liters of water.
The question asks for the probability that 50 active guys drink more than 2.2 liters per person on average, which is equivalent to the probability that the 50 guys drink a total of 110 liters.

Suppose you took a sample of 50 guys from this population (some of them drink more than 2 liters of water, some less), took the mean of the 50 amounts drunk, placed a dot on the x-axis corresponding to this sample mean, and then repeated this experiment thousands of times. You would get the sampling distribution of the sample mean for size n=50. Why do we want to know this?
Well, it turns out that
(x1 + x2 + x3 + ... + x50) / 50 >= 2.2
is equivalent to
(x1 + x2 + x3 + ... + x50) >= 50 * 2.2
or
(x1 + x2 + x3 + ... + x50) >= 110

so if we scale the horizontal axis of the sampling distribution of the sample mean by multiplying by 50, we get the distribution of the total amount of water the 50 guys drink.
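The equivalence of the two events can be checked directly. A minimal sketch; the normal population with μ = 2 and σ = 0.7 is an assumption carried over from the video:

```python
import random

# Hypothetical sketch: for any sample of 50 values, the event
# "mean >= 2.2" is exactly the event "sum >= 110".
random.seed(1)
for _ in range(1000):
    xs = [random.gauss(2.0, 0.7) for _ in range(50)]
    assert (sum(xs) / 50 >= 2.2) == (sum(xs) >= 50 * 2.2)
print("the two events always coincide")
```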
• Hi everyone,
As far as I understand, in this exercise you can do your calculations on the sampling distribution of the sample mean because you assume it is normal, thanks to the central limit theorem. My question is: why can you be sure that for n=50 (as in the example) you can assume normality of your sampling distribution? Why not n=10, or maybe n=20, or even n=30 (pointed out as reasonable sample sizes in previous videos)?
cesc
• As I understand the sampling distribution, you will (in most cases) never reach a perfectly normal distribution, but you will get very close to it. The larger the sample size, the better the approximation to the normal distribution. Since it is just a sample, there will be some difference from reality, but in a lot of cases it is too complex and/or expensive to use all possible data (like asking every person in the world whether they are male or female), so using a sample (e.g. asking 1000 people) is a really good way to solve this problem.

By the way, as far as I know, you can also use confidence intervals to quantify how close your sample estimate is likely to be to the true value.
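The approach to normality can be illustrated with a quick simulation. This is a hypothetical sketch using a deliberately skewed (exponential) population, which is not from the video; the sample means' skewness shrinks toward 0 (the skewness of a normal) as n grows:

```python
import random

# Sample skewness: third central moment over SD cubed (0 for a normal).
def skewness(xs):
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / s2 ** 1.5

random.seed(2)
for n in (1, 10, 50):
    # Means of 20,000 samples of size n from a skewed exponential population.
    means = [sum(random.expovariate(1.0) for _ in range(n)) / n
             for _ in range(20_000)]
    print(f"n={n:2d}  skewness of sample means = {skewness(means):.2f}")
```

For an exponential population the skewness of the sample mean is 2/√n, so even a heavily skewed population gives fairly symmetric sample means by n = 50; how large n must be in general depends on how skewed the population is.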
• If we knew for certain that the population distribution was normal, could we not just take the std error as 0.7 and then the z score as 0.2/0.7?
• It is all about means. (Sorry for my bad English, I'm not a native speaker.) No, we can't. A mean of 2 L and a standard deviation of 0.7 L describe how much water one man drinks. We want to know how much water, on average (arithmetically), is drunk per person by a GROUP of 50 such people (the group is our sample), and that group average varies much less than a single person's consumption.
• The first examples (videos on sampling distribution of the sample mean 1 and 2) show large SDs because each was just a "sample". Then, with repeated sampling, the SD of the sample means decreases (its variance is σ²/n).

But in this problem we are told that the mean is 2 and the SD is 0.7, and that is supposedly a true representation of the population (i.e. not a small random sample, but the whole population). Why are we to treat the SD as the distribution of a sample instead?

In other words, I had a hard time wrapping my head around the fact that a sample of size 50 has a smaller SD than the "population SD".

Maybe "the average male" means a "distribution" made from a single person?
• It's not a "sample of size 50" that has a smaller SD.

We have taken a sample of size 50, but that value σ/√n is not the standard deviation of the sample of 50. Rather, it is the SD of the sampling distribution of the sample mean.

Imagine taking a sample of size 50, calculate the sample mean, call it xbar1.
Then take another sample of size 50, calculate the sample mean, call it xbar2.
Then take another sample of size 50, calculate the sample mean, call it xbar3.
And so on.

If we do this repeatedly, we would start to see a distribution of sample means, all calculated from a different sample of size 50. This distribution of sample means has a smaller SD than the population from which the raw data was derived.

Think of an easier example: height. People have heights in some common range, say 4.5 feet tall to 6.5 feet tall. It's possible to be really tall, right? There are people who are 7 feet tall - or even more - but they're kind of few and far between. It's "rare" to see someone that tall, but possible. Now, imagine a collection of 50 people. What would we need in order to see that the average height of these 50 people is 7 feet? Well ... we would need a LOT of REALLY TALL people. Since getting an individual person who is 7 feet tall is pretty rare, getting a lot of people 7 feet tall (or more) is even more rare. Because of this, it's even more unlikely that the sample mean height of 50 people will be 7 feet or more.

This phenomenon manifests itself in Statistics with the SD of the sampling distribution of the sample mean being smaller than the SD of the population.
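The contrast between one individual and the mean of 50 can be computed directly. A minimal sketch using the video's numbers (μ = 2 L, σ = 0.7 L, threshold 2.2 L), assuming the sampling distribution is normal:

```python
from statistics import NormalDist

# One camper's consumption vs. the mean consumption of 50 campers.
one_camper = NormalDist(mu=2.0, sigma=0.7)
mean_of_50 = NormalDist(mu=2.0, sigma=0.7 / 50 ** 0.5)  # SD = sigma / sqrt(n)

p_one = 1 - one_camper.cdf(2.2)   # one camper drinks more than 2.2 L
p_mean = 1 - mean_of_50.cdf(2.2)  # the average of 50 campers exceeds 2.2 L

print(f"P(one camper > 2.2 L) = {p_one:.4f}")
print(f"P(mean of 50 > 2.2 L) = {p_mean:.4f}")
```

The first probability is about 0.39 (not rare at all), while the second is about 0.022, matching the roughly 2% figure Sal reads off the z-table: it is far harder for an average of 50 to be extreme than for one individual.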
• Hi Sal,
I did this question a different way and was wondering whether you could tell me why it works: basically, I defined a statistic T = X1 + X2 + ... + X50, so the statistic's mean is 50*mu and its variance is 50*sigma^2. I then just computed P(T > 110), converted it to a z-score, and used the Z table to get the exact same answer as you...
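It works because the sum and the mean are the same event on different scales, so they produce identical z-scores. A quick check, assuming the video's figures (μ = 2, σ = 0.7, n = 50, independent draws):

```python
from statistics import NormalDist

# T = X1 + ... + X50 has mean n*mu and variance n*sigma^2 for independent Xi.
mu, sigma, n = 2.0, 0.7, 50

z_sum = (110 - n * mu) / (n * sigma ** 2) ** 0.5   # via the sum T
z_mean = (2.2 - mu) / (sigma / n ** 0.5)           # via the sample mean
p = 1 - NormalDist().cdf(z_sum)                    # shared tail probability

print(z_sum, z_mean)  # the two z-scores agree
print(p)
```

Dividing both the numerator and the denominator of z_sum by n turns it algebraically into z_mean, which is why the two routes give the same probability of about 0.022.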
• Sal mentions the Z-score table. How were the values on this table calculated? Where did they come from?
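Each z-table entry is the cumulative area under the standard normal curve up to z, which has a closed form in terms of the error function: Φ(z) = (1 + erf(z/√2)) / 2. A minimal sketch (not from the video):

```python
import math

# Phi(z): cumulative area under the standard normal density up to z,
# i.e. exactly the number printed in a z-table row/column lookup.
def phi(z: float) -> float:
    return (1 + math.erf(z / math.sqrt(2))) / 2

print(round(phi(2.02), 4))
```

Here phi(2.02) comes out to about 0.9783, the table entry used in the video; historically the tables were produced by numerically integrating the bell-curve density, which is what erf encapsulates.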
• What could have been made clearer is that Sal is looking for the probability that X falls between 0 and 2 + (2.02 * 0.099) = 2.19998 liters of water. Since we know that the sampling distribution is normal, we can write:

X ~ N(2, 0.099), and we look for P(0 < X < 2.19998).

Then, if we don't have a z-table, only a Texas Instruments calculator, we can write:

normalcdf(0, 2.19998, 2, 0.099) = 0.9783..., which is the number that Sal finds in the z-table.

Hope it helps :)
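For readers without a TI calculator, the same normalcdf call can be imitated in a few lines. A hypothetical stand-in with the same definition, not the calculator's actual implementation:

```python
from statistics import NormalDist

# normalcdf(lo, hi, mu, sigma): probability that a N(mu, sigma) variable
# lands between lo and hi, i.e. the difference of two CDF values.
def normalcdf(lo: float, hi: float, mu: float, sigma: float) -> float:
    d = NormalDist(mu, sigma)
    return d.cdf(hi) - d.cdf(lo)

print(round(normalcdf(0, 2.19998, 2, 0.099), 4))
```

This reproduces the 0.9783 figure from the comment above; the lower bound 0 is effectively minus infinity here, since 0 is more than 20 standard deviations below the mean.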
• Hi Sal,
is the sampling distribution "less tightly packed" or "more tightly packed"?