## Statistics and probability

### Course: Statistics and probability > Unit 13

Lesson 2: Comparing two means

# Confidence interval of difference of means

Sal uses a confidence interval to help figure out if a low-fat diet helps obese people lose weight. Created by Sal Khan.

## Want to join the conversation?

• Why does Sal use a Z-table instead of a T-table? • Oliverfaria is correct. Sal, in the video "Z-statistics vs. T-statistics," explained that if the sample size is bigger than 30, use a Z-table; if it's smaller than 30, use a T-table.
• Well, I understand z-scores and the normal distribution, but I'm having a hard time understanding confidence intervals. • Let's say that I run a sandwich shop. From experience, I know that the number of sandwiches I sell is normally distributed with a mean of 100 and a standard deviation of 10. There are two things that can go wrong with my store -- either I don't have enough customers to meet my labor costs, or I run out of bread and have to turn away all my late customers. So when I use my z-score machinery to find that there is about a 99.7% probability that I will sell between 70 and 130 sandwiches, that's important enough that we want to call [70, 130] a 99.7% confidence interval. Then I conclude that if I order 130 rolls and schedule enough workers for a 70-sandwich day, I should expect only about one "bad day" per year.
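The arithmetic in the sandwich example can be checked with a short script (a minimal sketch; the mean of 100 and standard deviation of 10 come from the example above):

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 100, 10   # daily sandwich sales: N(100, 10^2)
lo, hi = 70, 130      # the interval in question: mean +/- 3 sigma

p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
print(f"P({lo} <= sales <= {hi}) = {p:.4f}")   # -> 0.9973
```

This is the familiar three-sigma rule: roughly 99.7% of a normal distribution lies within three standard deviations of the mean.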
• Okay, so we can make a sampling distribution for the people who took the low-fat diet and one for those who didn't. But why do we find their difference? My guess is that by finding the difference we get how much more effective the former is than the latter. But then how does the confidence interval part fit into this picture?

I'm looking for an answer that gives me an intuition more than a technical answer, if possible... • The confidence interval is there to account for the randomness associated with any sort of trial. People lose weight at different rates, some likely stick to the diet better than others, and people experience general fluctuations in weight that may have affected their starting or ending weights. All these things add "noise" to the measurement, and the confidence interval is a way to show how much noise is associated with a given result.

If the results of a study are well within that noise, it makes the difference much less credible than one where the noise is small relative to the difference in results.
• @ Andrew M:
Yes, I have encountered the CLT. Why exactly does it apply here? I thought the CLT said that given a sufficiently large sample size n, the distribution of the means of the samples would be approximately normally distributed. Why should the differences between two distributions of sample means be normally distributed? Those aren't means.

The only thing I can think of intuitively is that maybe if you add two normal r.v.s together, the resulting distribution is normal? Is this true? Otherwise, what parts of the CLT correspond to the difference of sample means?

Thanks!
Beth

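Beth's intuition is right: the sum or difference of independent normal random variables is itself normal, and the CLT makes each sample mean approximately normal. A quick simulation can illustrate this on a difference of sample means; the uniform population and the sample size here are arbitrary choices for the sketch:

```python
import random
import statistics

random.seed(0)

n = 50          # size of each sample (arbitrary choice)
reps = 20000    # number of simulated experiments

# Population: Uniform(0, 1), which is clearly not normal.
# Its mean is 1/2 and its variance is 1/12.
diffs = []
for _ in range(reps):
    mean_x = statistics.fmean(random.random() for _ in range(n))
    mean_y = statistics.fmean(random.random() for _ in range(n))
    diffs.append(mean_x - mean_y)

# Theory: Var(mean_x - mean_y) = (1/12)/n + (1/12)/n
theory_sd = (2 * (1 / 12) / n) ** 0.5

sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)

# If the differences are roughly normal, about 95% of them should
# fall within 1.96 standard deviations of zero.
within = sum(abs(d) < 1.96 * theory_sd for d in diffs) / reps

print(sim_mean, sim_sd, theory_sd, within)
```

The simulated standard deviation matches the theoretical one, and about 95% of the differences land within 1.96 standard deviations, as a normal distribution predicts.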
• Can we use the same calculation if the data is not normally distributed? What about if the sample sizes are not equal?
• Why are you dividing by sample size for the variance here? My understanding was that this is only for the standard deviation of a sampling distribution of sample means, and that does not seem to be the case here. You just have two samples of size n and m. Can you please clarify?
• When to use a Z-table?
I understand the rule for one data set: if the sample size is less than 30, use a T-table.
But for two data sets, each sample size could be different, and you could have a scenario where one sample size is above 30 and the other is below 30. My guess is that a conservative answer would be to use a T-table if either sample size is below 30. Is that a good rule of thumb, or are there other factors that come into play?
• At about , why doesn't 4.67/10 + 4.04/10 work?
• The variance of the sum (or difference) of two independent random variables is additive. So for independent random variables X and Y:

`V( X + Y ) = V(X) + V(Y)`
`V( X - Y ) = V(X) + V(Y)`

This comes from the mathematical theory. I'm not sure where (or if) this is covered on Khan Academy.

Then, 4.67/10 and 4.04/10 are the standard errors (standard deviations) of the two sample means. To combine them, we first need to convert them into variances by squaring them, add the variances, and then take the square root of the sum to get the standard error of the difference.
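Putting the numbers together (a minimal sketch; the values 4.67 and 4.04 come from the question above, and sample sizes of 100 are implied by dividing by 10 = sqrt(100)):

```python
import math

s1, s2 = 4.67, 4.04   # sample standard deviations (from the question)
n, m = 100, 100       # sample sizes implied by the divisor of 10

se1 = s1 / math.sqrt(n)   # standard error of the first mean: 0.467
se2 = s2 / math.sqrt(m)   # standard error of the second mean: 0.404

# Wrong: adding the standard errors directly
wrong = se1 + se2

# Right: add the variances, then take the square root
se_diff = math.sqrt(se1**2 + se2**2)

print(f"naive sum of SEs:      {wrong:.4f}")
print(f"SE of the difference:  {se_diff:.4f}")   # -> 0.6175
```

That is why 4.67/10 + 4.04/10 overstates the spread of the difference: standard deviations don't add, variances do.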