

Lesson 2: Comparing two means

# Hypothesis test for difference of means

Created by Sal Khan.

## Want to join the conversation?

• Sal, can we solve the same problem like this?
The mean difference is 1.91, and the null hypothesis mean difference is 0, with a standard deviation of 0.617. Z = (0 - 1.91)/0.617 = -3.09, so it takes -3.09 standard deviations to get a value of 0 in this distribution. Converted to a probability, normsdist(-3.09) = 0.001, which indicates a 0.1% probability — within our 5% significance level. So we can safely reject the null hypothesis. I may have built the wrong concept in my head; please correct me.
• Yes, you are mostly right. There are two ways to think about this problem, since you can try to find either:

- the "probability of the difference of the sample means being 1.91 or more, assuming the difference of the population means is 0" (this is what Sal does here, except that he only tries to show that this probability is less than 5%), or

- the "probability of the difference of the sample means being 0 or less, assuming the difference of the population means is 1.91" (this makes sense because you could theoretically have the people in the first group gaining weight, and because here we have a distribution centered around 1.91).

...and you can see that the probabilities are the same, because you get the same Z score (3.09; the sign is meaningless here) and the corresponding probabilities are equal (0.1%, if you looked up the values right; I didn't check your calculations). The math is the same!

So, essentially, you can think of the problem both ways, but Sal's way is the standard one used by scientists, so when explaining something to someone else it's better to use this form to make sure everybody understands you. (Also, in more complicated setups, the variances might not be the same under the null and alternative hypotheses, so going the "standard" way increases the chances of getting things right.)

Hope I understood your question well enough to answer it, and that this helps.
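For anyone who wants to check the arithmetic in this thread, here is a quick sketch in Python using the 1.91 and 0.617 figures from the video; `statistics.NormalDist` plays the role of the NORMSDIST spreadsheet function:

```python
from statistics import NormalDist

diff_of_means = 1.91   # observed difference of the sample means
se = 0.617             # standard deviation (standard error) of that difference

z = diff_of_means / se          # about 3.10 standard deviations from 0
p = NormalDist().cdf(-z)        # one-tailed probability, about 0.001 (0.1%)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Since 0.1% is far below the 5% significance level, both ways of framing the problem lead to rejecting the null hypothesis.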

offtopic: Sal almost never answers questions, guess he's too busy making new content for other courses...
• Even with an n>30, I still don't agree with using the sample's standard deviation as a valid approximation of the population's standard deviation. Say you test your sample the way Sal does it, and realize that the probability of getting that sample was 1%. Normally, you would reject the null hypothesis. But say the null hypothesis was indeed correct. This means you just happened to choose a lot of values from the far left or far right of the population mean. Since your sample is representative of such a small extreme section, the standard deviation of your sample would have been a lot smaller than the true standard deviation of the total population. Therefore, since Sal used the sample standard deviation as his population standard deviation, he would have underestimated the population standard deviation, and consequently overestimated the z score of that sample. The degree to which he overestimated? I'm not sure; I think I'd have to solve some recursive function that is really hard to think about right now. Anyway, I'm sure there's something I'm missing, because I have full faith in Sal Khan. Please let me know if I have overlooked something.
• > "The degree at which he overestimated? I'm not sure, I think I'd have to solve some recursive function that is really hard to think about right now."

I'm not sure about that. There might be some way to formally express the degree of error, but I like jumping to simulations. The Type I error rate -- rejecting H0 when H0 is actually true -- is slightly inflated. By the time we get to n>30 it's not inflated by much, but it's still a bit above what it should be. This means that using the Z-test when we should use the T-test will declare a significant result a bit too often.
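That inflation is easy to see in a quick simulation (my own sketch, not from the video): draw many samples from a population where the null hypothesis is true, run the Z-test with the sample SD standing in for the population SD, and count the false rejections. A deliberately small n makes the effect obvious; it shrinks as n grows toward and past 30.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)

n = 10            # deliberately small sample so the inflation is visible
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96, two-sided test

trials = 20_000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true here
    # Z statistic with the sample SD used as if it were the population SD:
    z = statistics.fmean(sample) / (statistics.stdev(sample) / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1          # Type I error: H0 rejected although true

rate = rejections / trials
print(f"empirical Type I error rate: {rate:.3f} (nominal: {alpha})")
```

With n = 10 the empirical rate comes out well above the nominal 5%; rerunning with n = 50 or 100 brings it much closer, which is why the sample-SD shortcut is considered acceptable for large samples.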
• If we are assuming that the null hypothesis is true (there is no difference between the two diets), why are we using each of the sample standard deviations separately, as if they are separate populations? If the null hypothesis is true, both samples are taken from the same population. Shouldn't we then take an average of the two standard deviations, or recalculate the standard deviation of the whole n=200 sample?
• I was thinking the same way, so I calculated the null hypothesis distribution's standard deviation from the average of the variances, and the result was 0.6165, which is very close to the 0.617 that Sal used.
• I thought the hypotheses have to complement each other...? So if one is "equals zero" shouldn't the other be "does not equal zero"?
• I guess the "correct" H(0) should be "u1 - u2 <= 0" (I'll call this H(0)_1) instead of H(0): u1 - u2 = 0 as Sal writes (I'll call this H(0)_2), but you can easily see that:

- if we assume any value allowed by H(0)_1 instead of the 0 of H(0)_2, we would get a Z_1 >= Z_2 (think visually: we "slide" the bell curve to the left; or, if you're mathematically inclined, write out the inequalities for Z_1, even if you can't actually calculate a single Z_1)

- next, if Z_1 >= Z_2, then P_1 <= P_2 (where P is the probability of obtaining the given difference of the means)

- next, if P_2 < 5% (our significance threshold) and P_1 <= P_2, then P_1 < 5%

- so rejecting Sal's H(0)_2 also rejects this "correct" H(0)_1; it's practically the same thing.

...so, short answer: yes, you're right that strictly the hypotheses should complement each other, but rejecting "equals 0" in this one-sided test also rejects "less than or equal to 0", so Sal's version leads to the same conclusion.
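The bullet chain above can be checked numerically (a sketch using the 1.91 and 0.617 figures from the video): for any mean difference m allowed by the composite null "u1 - u2 <= 0", the resulting p-value is never larger than the one computed at m = 0.

```python
from statistics import NormalDist

diff, se = 1.91, 0.617

def p_value(m):
    """One-tailed p-value assuming the true mean difference is m."""
    return 1 - NormalDist().cdf((diff - m) / se)

p_point = p_value(0.0)                   # p-value under u1 - u2 = 0
for m in (0.0, -0.5, -1.0, -2.0):        # values allowed by "u1 - u2 <= 0"
    assert p_value(m) <= p_point         # the p-value can only shrink

print(f"worst-case p over the composite null: {p_point:.4f}")
```

So m = 0 is the worst case within the composite null, which is exactly why testing the point null is enough.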
• His critical value Z score is 1.65 because of the 95%. I'm a little confused, because my chart says 95% should be 1.96. Am I using a different chart, for something unrelated to z scores?
• One-sided vs. two-sided is important. If we're testing for a "difference" in either direction, then we need to split the Type I error probability (alpha) between the two tails, so alpha = 0.05 translates to a z value of 1.96. If we're only looking for an increase or a decrease, then we put all of the alpha probability into one tail, which leads to a z value of 1.65.
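Those two critical values are just standard normal quantiles, which can be verified in a couple of lines (computed here with Python's `statistics.NormalDist` instead of a chart):

```python
from statistics import NormalDist

alpha = 0.05
z_one_tail = NormalDist().inv_cdf(1 - alpha)      # about 1.645: one-sided test
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.960: two-sided test
print(f"one-sided: {z_one_tail:.3f}, two-sided: {z_two_tail:.3f}")
```

Both numbers come from the same chart; the difference is only whether alpha sits in one tail or is split across two.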
• At , I don't understand why the mean of the sampling distribution is the same as the mean of the population distribution.

Couldn't your sample mean be quite different? I.e., couldn't you draw a sample that had a higher mean than the overall population?

Or is he saying that the mean of the distribution of all the samples you draw is going to be the same as the mean of the overall population?
• Great question. And yes, your last sentence is the correct explanation! This is why inferential stats can be confusing, and why every single word is critical when we talk about means and samples and distributions. As you say, the mean of ONE SAMPLE could be (in fact, almost always IS) different from the population mean. But the mean value of the distribution of all the sample means (phew!) will be the same as the population mean. Of course, you DON'T HAVE "all the sample means", you only have ONE of them (usually)! So the "distribution of all sample means" is usually something we just imagine as a theoretical abstraction. But it's critical to understand what that distribution represents (we have to imagine doing the same experiment many, many times) in order for hypothesis testing to make sense.
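That "theoretical abstraction" is easy to make concrete in a simulation (my own sketch, with an arbitrary population): draw many samples, record each sample mean, and compare the mean of all those sample means to the population mean.

```python
import random
import statistics

random.seed(0)
pop_mean, pop_sd = 5.0, 2.0

# 10,000 samples of size 50; each one's mean usually differs from pop_mean...
sample_means = [
    statistics.fmean(random.gauss(pop_mean, pop_sd) for _ in range(50))
    for _ in range(10_000)
]

# ...but the distribution of sample means is centered on pop_mean.
print(f"mean of the sample means: {statistics.fmean(sample_means):.3f}")
```

Any single entry in `sample_means` can land noticeably above or below 5.0, yet their overall average sits essentially on the population mean, which is exactly the distinction the answer above is making.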
• I'm baffled as to why we keep using our original sample standard deviations as estimates for the population SDs (c. ) once we're assuming the null hypothesis. If (and I might be barking up the wrong tree here) the hypothesis is that there's no meaningful difference whatsoever in weight loss effect between the two diets, why should their SDs remain distinct when imagined across the whole population? If the two groups' data are basically identical when viewed globally, shouldn't their SDs be identical too?
• Because it leads to the same answer. If you sample twice from the same population, then the best variance estimator is the pooled one, ((n1 - 1)var(x1) + (n2 - 1)var(x2)) / (n1 + n2 - 2) (I know you understand which symbol means what here). Now calculate the variance of the difference of the means of two i.i.d. samples from this population using that pooled estimate: it is the same thing as what Sal does.
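A sketch of that equivalence in Python. The group size of 100 and the 4.67/4.04 sample SDs are my recollection of the video's figures, so treat them as assumptions; the equality of the two standard errors holds for any numbers as long as the two groups have the same size.

```python
import math

# Assumed sample statistics (see lead-in): equal group sizes.
n1 = n2 = 100
s1, s2 = 4.67, 4.04

# Separate-variance standard error of the difference (what Sal computes):
se_separate = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Pooled estimate, as the comments above suggest:
var_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(var_pooled * (1 / n1 + 1 / n2))

print(f"separate: {se_separate:.4f}, pooled: {se_pooled:.4f}")
```

With n1 == n2 the two expressions are algebraically identical, which is why pooling the variances reproduces (up to rounding) the 0.617 discussed in this thread.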
• Man, am I confused. I can't understand why we reject the null if there is a low probability of getting a favorable result. Wouldn't that prove the null is correct?
• All of the above is correct, but I think Sal said it pretty well himself in an earlier video: we presume the null hypothesis is true -- so, if that's the case, what are the chances of getting the sample results we got? He just showed that, if the null hypothesis is true, the chances of getting the results we actually got are < 5%. The result in this case fell into the rejection region. So the lower the percentage, the less likely it is that we would have gotten that result under the null hypothesis -- which is evidence against the null, not for it.