Hypothesis test for difference in proportions example

Want to join the conversation?

  • skonukoglu23
    Why do we use the combined value for estimating the standard deviation of (p^_2015 - p^_2000)? Why don't we calculate it with the two actual sample proportions?
    (4 votes)
  • Louis Brockett
    How would we find the mean of this?
    (3 votes)
    • Evan
      Since we're subtracting the two samples, the mean would be the 1st sample mean minus the 2nd sample mean (µ1 - µ2). Sal finds that to be 0.38 - 0.33 = 0.05. In this video, Sal is figuring out if there is convincing evidence that the difference in population means is actually 0.
      (2 votes)
  • George D.
    On an exam, would it be safe to write that I'm assuming that we're sampling less than 10% of the population to meet the independence condition? Like what Sal does in the video?
    P.S- Happy New Year
    (2 votes)
  • josephw
    Wouldn't the null be "less than or equal to," because rejecting the null if it's "equal" doesn't suggest an increase (the alternative), only a change?
    (1 vote)
  • Chuck B
    In the computation of σ, Sal observes that because the premise of the hypothesis test is that the null hypothesis is true, we assume that p^_2015 = p^_2000 and thus use the combined p^_c as the basis for a "best estimate" of σ. This I understand and concur with.

    However, it's not clear to me why the numerator for computing z is p^_2015 - p^_2000 rather than p^_2015 - p^_c. In other videos, Sal describes the numerator as p^ - p_0, where p_0 is the presumed proportion of the population. In this case, wouldn't p^_c be a better estimate of the population proportion than p^_2000?
    (1 vote)
  • Abhi T.
    What happens if the normal condition is not met because of one or two expected counts?
    (1 vote)

Video transcript

- [Instructor] We are told that researchers suspect that myopia, or nearsightedness, is becoming more common over time. A study from the year 2000 showed 132 cases of myopia in 400 randomly selected people. A separate study from 2015 showed 228 cases in 600 randomly selected people. So what we're going to do in this video is a hypothesis test to see if we have evidence to support the researchers' suspicion that myopia is becoming more common over time. If at any point you are inspired, I encourage you to pause the video and try to work through things on your own, but here I go. I'm going to do it with you. So let's start off by setting up our null and alternative hypotheses. Remember, our null hypothesis is the "no news here" hypothesis. So that would be that, contrary to their suspicions, myopia is not becoming more common. And the way that we're measuring "more common over time" is we could look at the proportion of folks who have myopia in 2015 and compare that to the proportion in 2000. So our null hypothesis is that there's no difference: the true proportion of folks who have myopia in 2015 is equal to the true proportion of folks who have myopia in 2000. And then our alternative hypothesis: remember, they suspect it's becoming more common over time. So that would be a situation where our true proportion in 2015 is greater than the true proportion in 2000. In this scenario, myopia would be becoming more common over time because 2015 happens after 2000. So before we even go about testing our null hypothesis, seeing if we can reject it or not, which would suggest our alternative, you have to look at your conditions for inference. And we've done this many times before. You have your random condition, and it looks like we meet that because both of the samples consist of randomly selected people. So that looks good. Then you have your normal condition.
And to meet your normal condition, your number of successes and failures in each of the samples has to be at least 10. And we see that that is the case. In the 2000 sample we have 132 successes, so to speak (not that it's a success for someone to have myopia, but the way this is being constructed that would be a success), and then 400 minus 132, or 268, failures. Both of those numbers are greater than 10. And same thing for the sample from 2015, with 228 successes and 600 minus 228, or 372, failures, so we're meeting the condition in both samples. And then the last condition that we always talk about is the independence condition. There are two ways to get there: either you are sampling with replacement, or you feel good that your sample size is no more than 10 percent of the population. And I think it is safe to say that, even for this larger sample of 600, there are more than 6,000 people out there. So I think it's reasonable to say that we're meeting that independence condition, even though they're not making it explicit here. But it's good to always think about this. Now the next thing you wanna do in a hypothesis test is set your significance level, your alpha. And I'll set my significance level to 0.05. So we're now going to assume the null hypothesis and say, well, what is the probability of getting a difference between 2015 and 2000 that is at least as large as the one that we got? And if that probability is less than our significance level, then we would reject our null hypothesis, and that would suggest the alternative. If that probability is greater than our significance level, then we fail to reject the null hypothesis and we fail to have evidence for the researchers' suspicion. So let's move ahead with that. So what we wanna do is come up with a Z value, or a Z score. Our Z is going to be equal to our sample proportion in 2015 minus our sample proportion in 2000, all of that over the standard deviation of the sampling distribution of the difference between the sample proportions in 2015 and 2000.
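Written out, the Z statistic just described looks roughly like the following. The hat and subscript notation is an assumption chosen to match the spoken description, not something shown in the transcript itself.

```latex
z = \frac{\hat{p}_{2015} - \hat{p}_{2000}}
         {\sigma_{\hat{p}_{2015} - \hat{p}_{2000}}}
```

Here the numerator is the observed difference between the two sample proportions, and the denominator is the standard deviation of the sampling distribution of that difference under the null hypothesis.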
Now, this is going to be, and I will say approximately equal to, because we can calculate this numerator exactly, but this denominator we are going to estimate. So this numerator is going to be, let's see, in 2015 (I'll use some different colors) we have 228 cases out of 600. So it's 228 over 600. And then in 2000, we have 132 cases out of 400. So minus 132 over 400. And then all of that over the square root. And what we use in the denominator here, under the radical sign, is the combined proportion. We could write that as P hat sub C. And the reason why we use the combined proportion, we talked about this in previous videos, is that when we do a hypothesis test, we assume that our null hypothesis is true. And if our null hypothesis is true, there's no difference between our true proportions in 2015 and 2000. And so to get a better estimate of that one true proportion, we should just combine our samples. So our sample size would be 600 plus 400, and the number of cases of myopia would be 228 plus 132, which would get us to 360 over 1,000, which is equal to 0.36. And we can use that inside the expression when we're trying to estimate the standard deviation of this sampling distribution. So under the radical this is going to be 0.36 times one minus 0.36, which is 0.64, over the sample size in 2015, which is 600, plus 0.36 times 0.64 over the sample size in 2000, which is 400. And let's see, before I even get my calculator out, I think I can simplify this a little bit. 228 over 600: 228 divided by 6 is 38, so this would be 0.38. And 132 divided by 4 is 33, so this would be 0.33. And so our entire numerator is going to be 0.05. And so now I can put this into my calculator: 0.05 divided by the square root of 0.36 times 0.64 divided by 600 plus 0.36 times 0.64 divided by 400, which gets me approximately 1.61. So this is going to be approximately 1.61.
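The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation as described in the transcript; the variable names (`p_hat_c`, `se`, and so on) are assumptions made for illustration.

```python
from math import sqrt

# Counts from the two studies described in the transcript.
cases_2000, n_2000 = 132, 400
cases_2015, n_2015 = 228, 600

# Sample proportions (0.33 and 0.38).
p_hat_2000 = cases_2000 / n_2000
p_hat_2015 = cases_2015 / n_2015

# Combined (pooled) proportion: under the null hypothesis both samples
# estimate the same true proportion, so the counts are added together.
p_hat_c = (cases_2000 + cases_2015) / (n_2000 + n_2015)

# Estimated standard deviation of the difference in sample proportions,
# using the combined proportion for both terms.
se = sqrt(p_hat_c * (1 - p_hat_c) / n_2015
          + p_hat_c * (1 - p_hat_c) / n_2000)

# Z score: observed difference divided by its estimated standard deviation.
z = (p_hat_2015 - p_hat_2000) / se

print(round(p_hat_c, 2), round(z, 2))  # prints 0.36 1.61
```

Keeping the two terms under the square root separate mirrors the expression read out in the video, rather than factoring out the shared 0.36 times 0.64.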
And so one way to think about it is that the difference we got between our sample proportions in 2015 and 2000, 0.05, is 1.61 standard deviations above the mean of our sampling distribution, if we assume that the null hypothesis is true. And from this, we can calculate our P value. Remember, our P value is equal to the probability that our Z score is at least that big, that it is greater than or equal to 1.61. And one way to think about it: if you look at the sampling distribution, I really could just look at any normal distribution now, since we've normalized to a Z score, and we're looking at 1.61 standard deviations above the mean. So Z is equal to 1.61, and we're thinking about the area to the right of that. That would be our P value. And to help us with that, we can get out a Z table. This Z table gives us the cumulative area up to some Z score, so we would just have to take whatever it gives us and do one minus that. So if we go to 1.61, we get 0.9463. So our P value is equal to one minus 0.9463, which is 0.0537. And notice, this P value is ever so slightly higher than our significance level. But this is why we wanna set our significance level ahead of time. We don't wanna get tempted to say, oh, I'm so close, let me just raise my significance level a little bit so that I can reject my null hypothesis and then have something that I can tell my friends about. No, that would not be good science. That would not be good statistics. We have to be disciplined. So here, because our P value is greater than our significance level, even though it's by a very small amount, we fail to reject our null hypothesis. And another way to think about it, in terms of the context of the question: we can say that there is not enough evidence to suggest that myopia is becoming more common over time. And we're done.
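The Z table lookup and the final decision can also be sketched in Python. The video uses a printed Z table; this sketch swaps in the standard library's complementary error function to get the same upper-tail area, and the helper name `upper_tail_p_value` is hypothetical.

```python
from math import erfc, sqrt


def upper_tail_p_value(z):
    """Return P(Z >= z) for a standard normal Z (area to the right of z)."""
    return 0.5 * erfc(z / sqrt(2))


alpha = 0.05   # significance level, fixed before looking at the data
z = 1.61       # Z score from the transcript, rounded as in the video

p_value = upper_tail_p_value(z)
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)  # prints 0.0537 False
```

Because the P value is not below alpha, the sketch reaches the same conclusion as the video: fail to reject the null hypothesis.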