# Normal conditions for sampling distributions of sample proportions

Conditions for roughly normal sampling distribution of sample proportions.

## Want to join the conversation?

• What is the intuition for the rule np>=10 , n(1-p) >=10 ?
• Think about the edge case of p = 0.1: then n * 0.1 >= 10, i.e. n >= 100. So if your probability parameter is around 10%, you would need a sample size of at least 100 before the SD is tight enough that the left side of the distribution is mostly captured in the [0, 0.1] interval.
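A quick way to try the rule on different values is a small check like the one below. This is only a sketch: the function name `large_counts_ok` and the sample sizes 50 and 200 are made up for illustration, and the threshold of 10 is the rule of thumb from the question.

```python
def large_counts_ok(n, p, threshold=10):
    """Return True if both expected counts meet the rule-of-thumb threshold."""
    expected_successes = n * p        # np
    expected_failures = n * (1 - p)   # n(1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

# With p = 0.1, a sample size of 50 gives np = 5 (too small),
# while a sample size of 200 gives np = 20 and n(1 - p) = 180.
for n in (50, 200):
    print(n, large_counts_ok(n, 0.1))
```

Both counts have to pass, which is why a p close to 1 needs a large n just as much as a p close to 0.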
• So as n increases, even very low or very high values of p start to produce normal sampling distributions?
Are normal distributions impossible if n < 10?
• As for your first question, you might be interested in this video by jbstatistics on YT: https://www.youtube.com/watch?v=fuGwbG9_W1c.
The part ~ graphs the sample distribution of sample proportion with p=0.04 and increasing n, and the part ~ does the same thing except with p=0.96. That might answer your question.

As for the second question:
First, I think the 10 is more of an arbitrary value than an actual rule. That's because it's quite subjective whether a graph with a lower np and/or n(1-p) is normal or not (e.g., the video I linked for your first question gives 15 instead of the 10 used in this video).
Secondly though, as you can see in the video on YouTube, if your np or n(1-p) takes on small values, a part of your otherwise quite normal histogram gets cut off at zero (or at one). Higher np or n(1-p) values make the distribution skinnier relative to its distance from 0 and 1, and therefore prevent the cutoff.

I know your question was posted six months earlier, but hopefully this answers your question if you are still confused with it.
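The cutoff idea can be checked numerically by asking whether an interval of three standard deviations around p still fits inside [0, 1]. The sketch below uses p = 0.04 from the linked video; the sample sizes 50 and 400 and the helper name `three_sd_interval` are assumptions chosen for illustration, and it relies on the standard formula SD = sqrt(p(1-p)/n) for a sample proportion.

```python
import math

def three_sd_interval(n, p):
    """Return (p - 3*SD, p + 3*SD) for the sampling distribution of a sample proportion."""
    sd = math.sqrt(p * (1 - p) / n)
    return p - 3 * sd, p + 3 * sd

for n in (50, 400):
    low, high = three_sd_interval(n, 0.04)
    # If the interval pokes outside [0, 1], part of the would-be bell curve is cut off.
    cut_off = low < 0 or high > 1
    print(f"n={n}: 3-SD interval = ({low:.3f}, {high:.3f}), cut off: {cut_off}")
```

With n = 50 the lower end falls below zero, so the left tail cannot exist; with n = 400 the whole interval fits.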
• In the first example, how could we tell which way it was going to be skewed?
• Someone correct me if I'm wrong.

The mean of the sampling dist is p (population proportion). If your sampling dist is indeed skewed, then when p is closer to 0 than 1, the top of the distribution "hump" will be closer to 0 than to 1 and the long tail will stretch out toward 1, so it will be skewed to the right. When p is closer to 1, the opposite happens and it will be skewed to the left.
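One way to check the direction of the skew is the skewness of a binomial count, (1 - 2p) / sqrt(np(1-p)); dividing the count by n to get a proportion does not change it. A positive value means a longer right tail, a negative value a longer left tail. The function name and the example values below are illustrative assumptions, not taken from the video.

```python
import math

def binomial_skewness(n, p):
    """Skewness of a binomial count (and of the corresponding sample proportion)."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

for p in (0.1, 0.5, 0.9):
    print(p, round(binomial_skewness(50, p), 3))
```

For p below 0.5 the result is positive (tail to the right), for p above 0.5 it is negative (tail to the left), and it shrinks toward zero as n grows.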
• what does np represent?
• That's a good question! np is the mean of a binomial distribution, i.e. the expected number of successes in the sample, where n is the sample size and p is the probability of success. Since it's a binomial variable, the probability of success is the same on every trial.
• Why do the conditions need to be n*p > 10 and n * q > 10?

I thought that n > 30 is what it needs to be in order for any sampling distribution of a sample statistic to be normally distributed according to the central limit theorem.

So,

The sample distribution of the sample means will be normally distributed if n > 30.

The sample distribution of the sample proportions will be normally distributed if n > 30?

What am I missing?
• The conditions n*p > 10 and n*q > 10 ensure that p is not too close to 0 or 1.

For any given value of n, if p is too close to 0 or 1, then the distribution of the number of successes in a binomial distribution with n trials and success probability p would be significantly asymmetric about its mean (and so significantly non-normal).
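To see why n > 30 alone is not enough for proportions, here is a small sketch with made-up values n = 40 and p = 0.02 (so np = 0.8). It computes the exact binomial probability that a sample contains zero successes; the helper `binomial_pmf` is written out here rather than taken from any library.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 40, 0.02          # n > 30, but np = 0.8 is far below 10
prob_zero = binomial_pmf(0, n, p)
print(f"P(sample proportion = 0) = {prob_zero:.3f}")
```

A large share of all samples would have a sample proportion of exactly 0, piled up at the left edge, which is nothing like a symmetric bell curve even though n is above 30.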
• For the first problem, does 'a shipment of 50 tangerines every day' mean the 'population'? If yes, then how can she sample 50 tangerines out of a population of 50?
• In the example, the shipment of 50 tangerines every day represents the sample size (n), not the population. The population would be the larger pool from which these tangerines are drawn, potentially encompassing all tangerines supplied by the distributor. Emiliana's daily sample of 50 tangerines is considered a random sample from this larger population. The confusion might arise from the wording, but in the context of statistical sampling, the population refers to the total set of observations that could be made, not just the number in a specific shipment.
• If we already know the true population proportion, why are we interested in calculating a sample proportion? We are using the "true" population proportion to validate the normal distribution of the sample, but then why not just work with the (known to be accurate) population data? Why bother with sampling at all in this case?
• In short, if the sampling distribution is approximately normal, then we can calculate how likely it is for a sample proportion to deviate from the population proportion by a certain number of standard deviations.

In later lessons we will use this to figure out how likely it is that the population proportion is what it is said to be.
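As a rough sketch of that kind of calculation, assuming the normal conditions hold: the sampling distribution of the sample proportion has mean p and standard deviation sqrt(p(1-p)/n), so a normal curve gives the chance of seeing a sample proportion at least as large as some value. The numbers (p = 0.1, n = 200, sample proportion 0.15) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def prob_at_least(p_hat, n, p):
    """Approximate P(sample proportion >= p_hat) using the normal approximation."""
    sd = math.sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=sd).cdf(p_hat)

# Hypothetical numbers: true proportion 0.1, sample size 200, observed 0.15.
print(round(prob_at_least(0.15, 200, 0.1), 4))
```

A small probability here would mean that a sample proportion of 0.15 is unusual if the population proportion really is 0.1.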
• For these examples, we know that the sample sizes are 50 and 125 respectively, but what about the number of samples (each of which consists of 50 or 125 in this case) taken? Obviously, the more samples taken, the smoother the curve, so is it implicitly assumed that they take many samples (of 50 or 125)?
• It doesn't really matter how many samples we take; the proportion of each sample still deviates from the population proportion with probabilities that resemble those of a normal distribution.

It does, however, take quite a few samples before we can actually see this in a graph.
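A small simulation can illustrate the difference between the sample size and the number of samples. In this sketch the sample size is 125 as in the second example, but p = 0.3, the sample counts 20 and 2000, and the function name are assumptions picked for illustration.

```python
import math
import random
import statistics

def simulate_proportions(num_samples, n, p, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

n, p = 125, 0.3
theoretical_sd = math.sqrt(p * (1 - p) / n)   # depends on n, not on the number of samples
for num_samples in (20, 2000):
    props = simulate_proportions(num_samples, n, p)
    print(num_samples,
          round(statistics.mean(props), 3),
          round(statistics.stdev(props), 3),
          round(theoretical_sd, 3))
```

The theoretical SD is fixed by the sample size of 125. Taking more samples does not change the shape of the sampling distribution; it only makes the simulated mean, SD, and histogram settle closer to it.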