# Small sample hypothesis test

Sal walks through an example of a hypothesis test where he determines if there is sufficient evidence to conclude that a new type of engine meets emission requirements. Created by Sal Khan.

## Want to join the conversation?

• I've always been confused about what a degree of freedom is. My textbook is very unclear, and Wikipedia isn't much help either. Wikipedia describes degrees of freedom as "the number of values in the final calculation of a statistic that are free to vary", which is very vague. Is anyone really clear on what this is? I've seen it used all the time in hypothesis tests, but it's always baffled me.
• Correct me if I'm wrong, but the way I see it can be illustrated by the following example. Let's say the letters A, B, C and D are hidden under four boxes, numbered 1, 2, 3 and 4. The letters are placed randomly, so you have to guess which one is where. You open box 1 and see the letter C, so that one's out. Box 2 reveals A, so that one's out as well, and the letters B and D are left. Once you open the third box and see its letter, you automatically know what's under box 4 too: if box 3 reveals D, then B must be under box 4. So only three of the four boxes are free to vary. The last letter is determined once the first three are known, and there's no need to lift the last box. That is one degree of freedom fewer than the number of boxes.
• Correction: it should use "<=" instead of "=", that is: P(xbar <= 17.17 | H0) < 0.01. A p-value is the probability of the statistic coming out at least as extreme as what was observed. This makes sense since we are working with a one-sided test that rejects only if the mean is low. (For a two-sided test, you double the probability to represent both tails.)
• I agree that the probability phrasing in the video is incorrect. It should be <=. Since this is a continuous distribution, the probability of getting any single value is actually zero, so P(xbar = 17.17 | H0 is true) = 0. We are really looking for the probability of getting a value of xbar at least as extreme as the observed value of 17.17. Later in the video, Sal shifts to finding the probability of a value more extreme than the t-statistic, but that "more extreme than" idea should have been present from the beginning of the analysis.
• Why do we actually use s / sqrt(N) and not s / sqrt(N-1)? I thought we used the latter when the sample size is small, or am I wrong? When do you use one or the other?
• Dividing by n-1 is used when we calculate the standard deviation, s. Once we've done that, we've already adjusted for the bias. The calculation of s / sqrt(n) is calculating the standard error of the sample mean (well, an estimate of it). This calculation uses just sqrt(n) in the denominator.
• I have a basic question on the null hypothesis (H0). Why wasn't the null hypothesis stated as x<20? Is it because the question mentioned Type 1 error, or is there some other reasoning for setting up hypotheses in general?
• Hypothesis tests are designed to find evidence for the alternative hypothesis, so we put what we want to show into H1 and use its opposite as the null, H0.

And yes, this is related to Type I error, which is incorrectly deciding that H1 is true (rejecting H0 when H0 is actually true). So in this case, if we reject H0 (that is, conclude the new engine design meets the emission requirements), the test has been set up so that the probability of doing this when H0 is true is only 1%.

• Very basic question, but it's been a long time since I've done any statistics. How did you find the standard deviation? I left my calculator at school, so I can't try it, but is it ((15.6 - 17.17)^2) * 1/10 + ... + ((13.9 - 17.17)^2) * 1/10?
• When referencing the t-table, why did Sal decide to use the one-tailed test rather than the two-tailed test?
• Why is the null hypothesis μ = 20 ppm and not μ ≥ 20 ppm?
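• The null hypothesis is the claim we assume to be true while running the test, and it is usually written with an equals sign (=).
The alternative hypothesis uses the same value as the null hypothesis, but with an inequality (<, >, or ≠). Testing at the boundary value μ = 20 is also the most conservative choice: if we can reject μ = 20 in favor of μ < 20, then any mean above 20 would be rejected even more strongly.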
• After watching the previous videos, I still do not understand the intuition behind the conditions for H0.
Why does a low probability (<1%) for 17.17 ppm in this problem lead us to reject H0? If 17.17 has a high probability, doesn't that mean we more than meet the required <20 ppm standard? What am I missing here?
• H0 is that we don't meet the standard. By rejecting H0, we are saying we are confident that we do meet the standard. The 1% probability is computed assuming the true mean is 20 ppm; a sample mean as low as 17.17 would be very unlikely if that were true, so it counts as evidence against H0.
In general, rejecting H0 means that we got a statistically significant result. It can seem counterintuitive. I think of it like wanting a negative result on many medical tests, because negative means I don't have whatever disease or condition they were testing for.
• Why not just take the absolute value of t? In the end, it's the magnitude of t that matters, and it may be less confusing to deal only with positive values, especially when reading the t-table.
• That is generally how it's used, though you have to be careful sometimes. If you're performing, say, an upper-tail test and the t-stat is negative, then taking the absolute value and comparing it to the positive critical values could lead to the wrong decision.
• Where does the standard deviation come from?
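Several questions above ask where the standard deviation comes from, why the standard error uses sqrt(n), and what a degree of freedom is. The sketch below uses hypothetical data, since the 10 engine measurements are not listed on this page, and the function names are our own. It divides the summed squared deviations by n - 1 (rather than weighting each one by 1/10) and then takes the square root to get the sample standard deviation. It then divides s by sqrt(n) for the standard error and shows why one degree of freedom is used up: deviations from the sample mean always sum to zero.

```python
import math

def sample_sd(data: list[float]) -> float:
    """Sample standard deviation s: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def standard_error(data: list[float]) -> float:
    """Estimated standard error of the sample mean: s / sqrt(n).
    The n - 1 correction already happened inside sample_sd."""
    return sample_sd(data) / math.sqrt(len(data))

def last_deviation(first_deviations: list[float]) -> float:
    """Deviations from the sample mean always sum to zero, so once n - 1 of
    them are known, the last one is forced (the 'lost' degree of freedom)."""
    return -sum(first_deviations)

# Hypothetical data, NOT the engine measurements from the video.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

print(round(sample_sd(data), 4))       # 2.1381
print(round(standard_error(data), 4))  # 0.7559
print(last_deviation(deviations[:-1]), deviations[-1])  # 4.0 4.0
```

For the engine data, the same two steps (s with n - 1, then s / sqrt(10)) are what turn the 10 measurements into the 2.98 and the standard error used in the t-statistic.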

## Video transcript

The mean emission of all engines of a new design needs to be below 20 parts per million if the design is to meet new emission requirements. 10 engines are manufactured for testing purposes, and the emission level of each is determined. They give us the emission data, 10 data points for the 10 test engines, and I went ahead and calculated the mean of these data points: the sample mean is 17.17. The sample standard deviation of these 10 data points is 2.98. Does the data supply sufficient evidence to conclude that this type of engine meets the new standard? Assume we are willing to risk a type-1 error with a probability of 0.01. We'll touch on this in a second.

Before we do that, let's define our null hypothesis and our alternative hypothesis. Our null hypothesis can be that we don't meet the standard, that we just barely don't meet it: the mean of our new engines is exactly 20 parts per million. Essentially, we want the best possible value, the lowest possible value, at which we still don't meet the standard. Then our alternative hypothesis says no, we do meet the standard: the true mean for our new engines is below 20 parts per million.

To see whether the data we have is sufficient, we're going to assume that the null hypothesis is true. Given that assumption, if the probability of getting a sample mean of 17.17 is less than 1%, we will reject the null hypothesis. And notice, if we do it this way, there will be at most a 1% chance that we are making a type-1 error. A type-1 error is rejecting the null hypothesis even though it's true.
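Written out symbolically, the setup is as follows. The probability is written with "or lower" (≤), matching the "equal to this or less than this" wording used later for the t-value; the labels H_0 and H_1 follow the notation used in the questions above.

```latex
\begin{gathered}
H_0 : \mu = 20 \text{ ppm} \qquad H_1 : \mu < 20 \text{ ppm} \qquad \alpha = 0.01 \\
\text{reject } H_0 \text{ if } P\left(\bar{X} \le 17.17 \mid \mu = 20\right) < \alpha
\end{gathered}
```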
So there's only a 1% chance, or less, that we will reject the null hypothesis if it is true. Now the next thing we have to think about is what type of distribution to use. The first thing that rings in my brain is that we only have 10 data points here, a small sample size. So, assuming the individual emission levels are roughly normally distributed, we're going to be dealing with a t-distribution and a t-statistic.

We can compute a t-statistic from the sample statistics. The t-statistic is our sample mean, 17.17, minus the assumed population mean of 20 parts per million, divided by our sample standard deviation, 2.98, over the square root of our sample size, which is the square root of 10. This is really the definition of the t-statistic. It has the same form as a z-score, except that the unknown population standard deviation is replaced by the sample standard deviation, and that substitution is why it follows a t-distribution rather than a normal distribution.

Getting the calculator out, (17.17 - 20) / (2.98 / sqrt(10)) is almost exactly negative 3. Our t-statistic is -3.00.

Since t-statistics have a t-distribution, what we need to figure out is the probability of getting a t-value equal to this or less than this. Is that less than 1%? If the null hypothesis is true, the distribution of all the t-statistics is a t-distribution centered at 0, and there's going to be some threshold t-value in its lower tail.
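The t-statistic arithmetic above can be checked with a few lines of Python. This is a minimal sketch; `t_statistic` is our own name, and the inputs are the summary statistics given in the problem.

```python
import math

def t_statistic(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """One-sample t-statistic: (sample_mean - mu0) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)  # estimated standard error of the mean
    return (sample_mean - mu0) / standard_error

# Summary statistics from the engine example: mean 17.17, s = 2.98, n = 10, mu0 = 20.
t = t_statistic(sample_mean=17.17, mu0=20.0, sample_sd=2.98, n=10)
print(round(t, 2))  # -3.0
```

The unrounded value is slightly below -3, which is why "almost exactly negative 3" is a fair description.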
We want a threshold t-value such that the probability of getting a t-value less than it is 1%, so that entire area in yellow is 1%. This is for a t-distribution with n equal to 10, or 10 minus 1 equals 9 degrees of freedom. So what is that threshold value? Notice that this is a one-sided test: we care about the 1% in the lower tail, and everything above the threshold is the other 99%.

Most t-tables don't give you a negative threshold value oriented like this. Instead, they give a positive threshold value in the upper tail, where the probability of getting a t-value above it is 1% and the probability of getting a t-value below it is 99%. Since t-distributions are symmetric around their mean, whatever value the table gives, our lower-tail threshold is just its negative: if the table value were 2, our threshold would be negative 2. We just have to keep that in mind.

So let's find the t-value where the probability of getting a t-value below it is 99%, again for a one-sided situation. This table is straight from Wikipedia: for a one-sided test, we want the cumulative probability below the t-value to be 99%. We have 10 data points, and 10 minus 1 is 9, so we read the row for 9 degrees of freedom.
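If no t-table is at hand, the lookup just described can be approximated numerically. This is a minimal, standard-library-only sketch, not how the video does it: it integrates the Student's t density with Simpson's rule and finds the lower-tail critical value by bisection. The function names and the numeric settings (2000 integration steps, a search interval of [-50, 0]) are our own choices; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def lower_tail_critical_value(alpha: float, df: int) -> float:
    """Find c < 0 with P(T <= c) = alpha by bisection (requires alpha < 0.5)."""
    lo, hi = -50.0, 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < alpha:
            lo = mid  # mid is too far into the tail; the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

c = lower_tail_critical_value(alpha=0.01, df=9)
print(round(c, 3))  # -2.821
```

The result is negative because it is found directly in the lower tail. That is the "flip the table value over" step, done automatically.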
So the threshold t-value from the table is 2.821, and because the distribution is completely symmetric, the threshold in the case we care about is just that value flipped over: negative 2.821. This tells us that the probability of getting a t-value less than negative 2.821 is 1%. The t-statistic we got, negative 3, is a good bit less than that, so it falls in what you could call our rejection region. Getting a t-statistic less than negative 3 is even less probable than 1%, because that tail area is a subset of the yellow area. Because the probability of getting a t-statistic at least as extreme as the one we actually got is less than 1%, we can safely reject the null hypothesis and feel pretty good about our alternative hypothesis, that we do meet the emission standards. And because of how we set up the test, the probability of making a type-1 error, rejecting the null hypothesis when it is actually true, is at most 1%.
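To make the type-1 error and "even less probable than 1%" claims concrete, here is a hypothetical Monte Carlo check that is not part of the video. It assumes normally distributed emissions with a true mean of exactly 20 ppm (so the null hypothesis holds) and an arbitrary standard deviation of 3 ppm, which does not affect the distribution of the t-statistic. It then simulates many samples of 10 engines and counts how often the t-statistic lands at or below -2.821 and at or below the observed value. The function name and simulation settings are our own.

```python
import math
import random

def simulate_t_statistics(mu0: float, sigma: float, n: int, trials: int,
                          seed: int = 0) -> list[float]:
    """Draw `trials` samples of size n from Normal(mu0, sigma), i.e. with the
    null hypothesis true, and return each sample's one-sample t-statistic."""
    rng = random.Random(seed)
    t_stats = []
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        t_stats.append((mean - mu0) / (s / math.sqrt(n)))
    return t_stats

# Assumptions: normal emissions, H0 true (mu = 20), sigma = 3.0 chosen arbitrarily.
t_values = simulate_t_statistics(mu0=20.0, sigma=3.0, n=10, trials=50_000)
observed_t = (17.17 - 20.0) / (2.98 / math.sqrt(10))

# Share of H0-true samples rejected by "reject if t <= -2.821":
# an estimate of the type-1 error rate, which should be close to 1%.
type1_rate = sum(t <= -2.821 for t in t_values) / len(t_values)
# Share of H0-true samples at least as extreme as ours: a simulated p-value.
simulated_p = sum(t <= observed_t for t in t_values) / len(t_values)
print(f"type-1 rate ~ {type1_rate:.4f}, p-value ~ {simulated_p:.4f}")
```

Because the observed t-statistic is below -2.821, every simulated sample counted in the p-value is also counted in the rejection rate. This mirrors the "subset of the yellow area" argument.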