Conditions for confidence interval for a proportion worked examples

Worked examples of the conditions for confidence intervals for a population proportion: the random condition, the independence condition, and the normal condition.

Want to join the conversation?

  • Daniel He
    Why is Condition 3 not met? Can we interpret this as him randomly selecting 30 people first and then going to them one by one while sampling? The wording is a bit ambiguous in this context.
    (2 votes)
    • Jerry Nilsson
      Since we're sampling without replacement, the first person has a 1/150 ≈ 0.67% chance of being picked, while the 30th person has a 1/(150 − 29) ≈ 0.83% chance.

      This is too much of a difference for the observations to be treated as independent, so the usual standard error formula √(p̂(1 − p̂)/n) no longer describes the spread of the sample proportion well.

      As a rule of thumb, when sampling without replacement the sample size shouldn't exceed 10% of the population.
      (11 votes)
  • tranhungkcn
    So in this example, what would be a good method of analysis? We can't increase the sample size to meet the 2nd condition, because that would violate the 3rd condition, and we can't decrease the sample size to meet the 3rd condition, because that would violate the 2nd condition?
    (5 votes)
  • Salih Tuzen
    I sometimes see the rule np ≥ 10, and sometimes it says n should be sufficiently large. Are these conditions for different things? How is the central limit theorem related to confidence levels?
    (2 votes)
    • Yellow Shiƒt»
      The rule np ≥ 10 (together with n(1 − p) ≥ 10) is used for proportions, where the count of successes follows a binomial distribution. "n should be sufficiently large" refers to the Central Limit Theorem as applied to sample means. A common rule of thumb is that if n ≥ 30, we can treat the sampling distribution of the sample mean as approximately normal. If n < 30, we cannot safely assert that it is approximately normal unless the population itself is roughly normal. (Note that if n is close to 30, say 25 or more, we can sometimes still assume a normal distribution.)

      I am not sure whether you mean confidence level or confidence interval in your second question since confidence level is usually something you choose according to the situation. I will assume you meant confidence intervals in the following explanation:

      To relate the Central Limit Theorem to confidence intervals, we need to look at the formula for a confidence interval. For a population with mean μ and standard deviation σ, and a sample mean x̄, the confidence interval would be x̄ ± z*(σ/√n). So if n is small, the confidence interval would be wider (less precision in our estimate). If n is large, the confidence interval would be narrower (more precision in our estimate). This makes sense since the more data we have, the more representative the sample is of the population.
      (5 votes)
  • Parthiban Rajendran
    But wouldn't taking a larger sample give better estimates?
    (2 votes)
  • Nick.landry52
    What is required to calculate a confidence interval?
    (1 vote)
  • Guilherme Gurgel Prado
    On the matter of independence, what if he samples all 30 individuals at once? He selects 30 people and then asks, instead of drawing a person and then asking.

    Does that change anything?
    (1 vote)
  • Liang
    Does it apply to np̂ as well? Originally it was np (using the population proportion).
    (1 vote)
  • earl kraft
    I understand that the sample size relative to the population does not meet the description of independence, but two aspects of this exclusion seem problematic in this case. 1) If you were to sample with replacement, it seems that you would be more likely to cause a deviation between the sample and population means, because a replaced item could be counted twice and over-represent that characteristic. 2) If the sample size equals the population size, the means are the same, and if the sample size is close to the population size, the sample mean is likely to be close to or equal to the population mean. Are these concerns unreasonable?
    (1 vote)

Video transcript

- [Instructor] Ali is in charge of the dinner menu for his senior prom, and he wants to use a one-sample Z interval to estimate what proportion of seniors would order a vegetarian option. He randomly selects 30 of the 150 total seniors and finds that seven of those sampled would order the vegetarian option. Which conditions for constructing this confidence interval did Ali's sample meet? So, pause this video, and you can select more than one of these.
Alright now, let's work through this together. So one thing that you might be wondering is, well, what is a one-sample Z interval? Well, you could really interpret that as he's gonna take one sample and then construct a confidence interval based on that. The reason why it might be called a Z interval is that the whole idea behind a confidence interval is you're going to pick a number of standard deviations above and below the true parameter that you are actually trying to estimate, and then use that to make your inferences. And one way of thinking about the number of standard deviations, people will often call that a Z score, or Z is often used as a variable for the number of standard deviations above or below something.
So really, he's just trying to construct a confidence interval, but remember, in order to construct a confidence interval, we have to make some assumptions. There are 150 students, right over here. He's finding it impractical to survey all 150 to figure out the true population proportion. So instead, he samples 30 of the seniors. So, n is equal to 30. And from that, he calculates a sample proportion. Looks like seven out of the 30 want the vegetarian option. And he's going to determine some confidence level and then construct a confidence interval. But remember the conditions that we've talked about in the previous videos. The first thing is, we have to be confident that this is a random sample. So that would be the random condition, and that's what choice A is telling us.
The data is a random sample from the population of interest. Do we know that? Well, it tells us in the passage here, he randomly selects 30 of the total seniors. So I guess we'll take their word for it. We don't know his methodology of what he considers random, but we'll take their word for it, that yes, this has been met. The data is a random sample. If it said he sampled the football team, well, that would not have been a random sample.
The next condition here looks all mathematical, but this is really the normal condition. And the idea behind the normal condition is that, in order to construct these confidence intervals, we're assuming that the sampling distribution of the sample proportions is roughly normal, and it is not skewed to the right or skewed to the left like this. And so, right here it says, look, the sample size times our sample proportion has to be greater than or equal to 10. And our sample size times one minus our sample proportion has to be greater than or equal to 10. Well, another way to think about this is, our successes in our sample need to be greater than or equal to 10, and our failures need to be greater than or equal to 10. Well, how many successes were there? There were seven. And you could even say, look, our n is 30, times our sample proportion, which is seven over 30, which is going to be seven. So our number of successes is less than 10, so actually, we violate the normal condition. Once again, this is a rule of thumb, but this is telling us that our actual sampling distribution might be skewed. Remember, this is just based on one sample, what we're able to figure out. This is a one-sample Z interval. We might be wrong, but we wouldn't feel good that we're meeting the normal condition here, so I would rule this one out.
Individual observations can be considered independent. Well, if he randomly selected people with replacement, then they could be independent.
Or, if his sample size is less than 10% of the total population, then the observations could be considered independent, even though they wouldn't be perfectly independent. But we see here that he sampled 30 people out of 150. So his sample size was 30 out of 150, which is the same thing as one fifth of the population, which is the same thing as 20%. And since this is greater than 10%, we are violating the independence condition. We could have met the independence condition if he was sampling with replacement, which it doesn't seem like he is, or if this thing right over here was less than 10%. But we're not meeting that, so we cannot feel good about that constraint. And so, since we're not meeting two of the three constraints for, I would say, valid confidence intervals, or confidence intervals we would feel confident in, this is not a very good analysis on Ali's part.
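
The three checks worked through in the transcript can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `check_conditions` and its parameters are hypothetical and not part of the video, the random condition is simply passed in as a yes/no judgment, and the 10% rule is coded as "strictly less than 10%" to match the instructor's wording.

```python
def check_conditions(successes, n, population_size, random_sample,
                     with_replacement=False):
    """Rule-of-thumb checks for a one-sample z interval for a proportion.

    Hypothetical helper for illustration only; it is not from the video.
    """
    failures = n - successes

    # Normal condition: at least 10 successes AND at least 10 failures.
    normal_ok = successes >= 10 and failures >= 10

    # Independence condition: sampling with replacement, or a sample
    # smaller than 10% of the population (integer form of n < 0.10 * N).
    independence_ok = with_replacement or (10 * n < population_size)

    return {
        "random": random_sample,
        "normal": normal_ok,
        "independence": independence_ok,
    }


# Ali's sample: 7 of 30 seniors, drawn at random from 150, no replacement.
result = check_conditions(successes=7, n=30, population_size=150,
                          random_sample=True)
print(result)
# → {'random': True, 'normal': False, 'independence': False}
```

With Ali's numbers, only the random condition comes back true, which matches the conclusion in the transcript: 7 successes is fewer than 10, and 30 out of 150 is 20% of the population.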