# Sampling distribution of a sample proportion example

Here's the type of problem you might see on the AP Statistics exam where you have to use the sampling distribution of a sample proportion.

## Example: Proportions in polling results

According to the US Census Bureau's American Community Survey, $87\%$ of Americans over the age of 25 have earned a high school diploma. Suppose we are going to take a random sample of $200$ Americans in this age group and calculate what proportion of the sample has a high school diploma.
What is the probability that the proportion of people in the sample with a high school diploma is less than $85\%$?
Let's solve this problem by breaking it down into smaller parts.

### Part 1: Establish normality

Note: The sampling distribution of a sample proportion $\hat{p}$ is approximately normal as long as the expected number of successes and the expected number of failures are both at least $10$.
Question A (Part 1)
What is the expected number of people in the sample with a high school diploma?
people

Question B (Part 1)
What is the expected number of people in the sample without a high school diploma?
people

Question C (Part 1)
Is the sampling distribution of $\hat{p}$ approximately normal?
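
The normality check in Part 1 can be sketched in a few lines of Python. This is only an illustration using the values from the problem ($n = 200$, $p = 0.87$); the variable names are made up for this example.

```python
# Values taken from the problem statement.
n = 200    # sample size
p = 0.87   # population proportion with a high school diploma

# Expected counts of "successes" (diploma) and "failures" (no diploma).
expected_with = n * p
expected_without = n * (1 - p)

# Rule of thumb from the note above: both expected counts should be at least 10.
normal_ok = expected_with >= 10 and expected_without >= 10

print(round(expected_with), round(expected_without), normal_ok)
```

Both expected counts are well above 10 here, so the normal approximation is reasonable for this sample.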

### Part 2: Find the mean and standard deviation of the sampling distribution

The sampling distribution of a sample proportion $\hat{p}$ has:
$\begin{aligned}\mu_{\hat{p}} &= p\\ \sigma_{\hat{p}} &= \sqrt{\frac{p(1-p)}{n}}\end{aligned}$
Note: For this standard deviation formula to be accurate, our sample size needs to be $10\%$ or less of the population so we can assume independence.
Question A (Part 2)
What is the mean of the sampling distribution of $\hat{p}$?
$\mu_{\hat{p}}=$

Question B (Part 2)
What is the standard deviation of the sampling distribution of $\hat{p}$?
You may round your answer to three decimal places.
$\sigma_{\hat{p}}=$
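
As a rough sketch, the two formulas from Part 2 can be evaluated directly in Python with the problem's values ($n = 200$, $p = 0.87$). The variable names are illustrative only.

```python
from math import sqrt

n = 200    # sample size from the problem
p = 0.87   # population proportion from the problem

# Mean of the sampling distribution of p-hat is the population proportion.
mean_p_hat = p

# Standard deviation of the sampling distribution of p-hat.
sd_p_hat = sqrt(p * (1 - p) / n)

print(mean_p_hat, round(sd_p_hat, 3))
```

Rounded to three decimal places, the standard deviation comes out to about $0.024$.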

### Part 3: Use normal calculations to find the probability in question

What is the probability that the proportion of people in the sample with a high school diploma is less than $85\%$?
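
One way to carry out the normal calculation in Part 3 is with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch under the normal approximation established in Parts 1 and 2, not the only way to do it; a z-table or a calculator's normal CDF function should give the same result up to rounding.

```python
from math import sqrt
from statistics import NormalDist

n = 200    # sample size from the problem
p = 0.87   # population proportion from the problem

sd_p_hat = sqrt(p * (1 - p) / n)

# z-score of the cutoff 0.85 within the sampling distribution of p-hat.
z = (0.85 - p) / sd_p_hat

# Normal model for p-hat: mean p, standard deviation sd_p_hat.
sampling_dist = NormalDist(mu=p, sigma=sd_p_hat)

# P(p-hat < 0.85) under the normal approximation.
prob_below = sampling_dist.cdf(0.85)

print(round(z, 2), round(prob_below, 2))
```

Under this approximation the cutoff sits a little less than one standard deviation below the mean, and the probability is roughly $0.20$.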

## Want to join the conversation?

• Why is the formula for the standard deviation sometimes sqrt(np(1-p)) and sometimes sqrt(p(1-p)/n)?
(3 votes)
• It depends on what quantity you’re taking the standard deviation of. In a binomial distribution, the first formula you wrote is the standard deviation of the number of successes, while the second formula you wrote is the standard deviation of the sample proportion of successes.

Have a blessed, wonderful day!
(24 votes)
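
The two formulas in this thread differ only by a factor of n, because the sample proportion is the number of successes divided by n. A quick numerical check, using the values from the example on this page purely as an illustration:

```python
from math import sqrt, isclose

n = 200    # sample size from the example on this page
p = 0.87   # population proportion from the example on this page

# Standard deviation of the NUMBER of successes (binomial count).
sd_count = sqrt(n * p * (1 - p))

# Standard deviation of the sample PROPORTION of successes.
sd_proportion = sqrt(p * (1 - p) / n)

# Dividing the count by n divides its standard deviation by n.
same = isclose(sd_count / n, sd_proportion, rel_tol=1e-9)
print(round(sd_count, 3), round(sd_proportion, 4), same)
```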
• How do you tell if the sampling distribution is describing proportions or means?
(2 votes)
• Proportions would sound like "40% of the population knows Morse code," where it's a "yes or no" situation. They either do have a certain trait/item or don't. In the case of proportions, the p given is the mean of the sampling distribution, as seen in the problem on this page.
For means, it would be more like "The average age of people that know Morse code is 50 years old," where there's a range of possible values.
I hope this helps!
(The situations I made up don't contain real data lol)
(3 votes)
• Sorry, but using a normal distribution to solve this problem gives incorrect results.

The sampling distribution is a binomial distribution. Using the formula for binomial distributions, one can determine that the probability that exactly 85% of the sample has a high school diploma is a whopping 0.0561. It therefore makes a huge difference whether we are looking at the probability that 85% or less of the sample have a high school diploma, or at the probability that strictly less than 85% have a diploma. Using the binomial distribution formula again, the former gives a value of 0.2273 and the latter 0.1711. Since the question is asking about $P(\hat{p} < 0.85)$, 0.17 is in fact the correct answer.
(3 votes)
• The sampling distribution of a sample proportion is based on the binomial distribution. The binomial distribution provides the exact probabilities for the number of successes in a fixed number of independent Bernoulli trials (like success/failure or yes/no).

When the sample size is large, the sampling distribution of the sample proportion can be approximated by a normal distribution due to the Central Limit Theorem. However, when dealing with exact probabilities, especially at specific proportions like "exactly 85%", it's crucial to use the binomial distribution rather than the normal approximation.
(1 vote)
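
The comparison discussed in this thread can be reproduced with a short script. This is a sketch that models the number of diploma holders in the sample as Binomial(200, 0.87) and uses only the standard library; the helper name `binom_pmf` is made up for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 200    # sample size from the problem
p = 0.87   # population proportion from the problem
k = 170    # 85% of 200 people

def binom_pmf(j, n, p):
    """Exact probability of exactly j successes in n independent trials."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

# Exact binomial probabilities.
p_exactly = binom_pmf(k, n, p)                            # P(p-hat = 0.85)
p_strictly_less = sum(binom_pmf(j, n, p) for j in range(k))  # P(p-hat < 0.85)
p_at_most = p_strictly_less + p_exactly                   # P(p-hat <= 0.85)

# Normal approximation used in the lesson (no continuity correction).
normal_approx = NormalDist(p, sqrt(p * (1 - p) / n)).cdf(0.85)

print(round(p_exactly, 4), round(p_strictly_less, 4),
      round(p_at_most, 4), round(normal_approx, 4))
```

The normal approximation lands between the two exact binomial values, which is why the choice between "less than" and "at most" matters more for the exact calculation than for the approximation.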
• What is the best way to find the standard deviation?
(0 votes)
• For the sampling distribution of a sample proportion, the standard deviation (SD) can be calculated using the formula:

SD = sqrt(p(1 - p) / n)

where p is the population proportion and n is the sample size.
(1 vote)
• In a set of 10,000 invoices, it is known that 500 contain errors. If 100 of the 10,000 invoices are randomly selected, what is the probability that the sample proportion of invoices with errors will exceed 0.08?
(0 votes)
• First, calculate your population proportion.
p = 500/10,000 = 0.05

Your sample size is 100.

Next, check for normality.
np >= 10 AND n(1-p) >= 10
100*0.05 = 5 which is NOT >= 10.
100*0.95 = 95 which IS >= 10.

The sampling distribution of the sample proportion does not meet the normality condition, so the normal approximation is not appropriate here.
(4 votes)
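
The condition check in this reply can be written as a small helper, and since the normal approximation is ruled out, one possible alternative is an exact binomial calculation. The sketch below assumes a binomial model, which treats the 100 draws as independent; that is itself an approximation, though a mild one because the sample is only 1% of the 10,000 invoices. The function names are hypothetical.

```python
from math import comb

def normal_condition_met(n, p, threshold=10):
    """Rule of thumb: expected successes and failures both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

def binom_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100              # invoices sampled
p = 500 / 10_000     # population proportion with errors

condition_ok = normal_condition_met(n, p)

# A sample proportion above 0.08 means more than 8 of the 100 invoices have errors.
prob_exceeds = sum(binom_pmf(k, n, p) for k in range(9, n + 1))

print(condition_ok, round(prob_exceeds, 3))
```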
• Why do we need to establish independence to get the standard deviation of the sample proportion, but not to get the mean?
(1 vote)
• Independence is a crucial assumption for using the standard deviation formula of the sample proportion. This assumption ensures that the sampling distribution behaves similarly to the binomial distribution.

The mean does not require the same independence assumption because expected value is linear: the expected value of the sample proportion equals the population proportion whether or not the individual outcomes are independent. The variance of a sum, by contrast, equals the sum of the individual variances only when the outcomes are independent.
(1 vote)
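
A tiny exact example can make this concrete. The sketch below uses a made-up population of 10 people (4 successes, 6 failures) and enumerates every possible sample of 5 drawn without replacement, so the draws are clearly not independent and the sample is half of the population. The numbers are hypothetical and chosen only to keep the enumeration small.

```python
from itertools import combinations
from math import sqrt

# Hypothetical population: 1 = success, 0 = failure, so p = 0.4.
population = [1] * 4 + [0] * 6
n = 5
p = sum(population) / len(population)

# Sample proportion for every equally likely sample of size n (without replacement).
p_hats = [sum(sample) / n for sample in combinations(population, n)]

# Exact mean and standard deviation of the sampling distribution of p-hat.
mean_p_hat = sum(p_hats) / len(p_hats)
sd_p_hat = sqrt(sum((x - mean_p_hat) ** 2 for x in p_hats) / len(p_hats))

# What the usual formula would give if the draws were independent.
formula_sd = sqrt(p * (1 - p) / n)

print(round(mean_p_hat, 4), round(sd_p_hat, 4), round(formula_sd, 4))
```

The mean still equals $p$, but the true standard deviation is noticeably smaller than the formula's value, which is why the formula needs the independence (10%) condition and the mean does not.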
• I don't really know what I'm doing... How do I find the answer to something using only the mean, the standard deviation, and the total population, while knowing there are only two possible outcomes in the total population?
(1 vote)
• You'd use the properties of the normal distribution. The mean (μ) represents the center of the distribution, and the standard deviation (σ) represents the spread.

If there are only two possible outcomes and you want to find the probability of one of them, you'd use the Normal CDF function in Excel or similar tools.
(1 vote)
• I'm still confused as to how we can use normal calculations, like a z-table.

The sampling distribution (of sample proportions) is a discrete distribution, and on a graph, the tops of the rectangles represent the probability.
The z-table/normal calculations gives us information on the area underneath the normal curve, since normal dists are continuous.

So we have a sampling dist, and we want to find the probability that we get a sample proportion that is less than 0.7. We know that the dist is approximately normal, and we have its mean and SD.
The probability that sample proportion < 0.7 is the tops of all the rectangles below 0.7 summed up for the sampling distribution.
But, for the normal dist (density curve) that approximates our sampling dist, using normalcdf on a calculator or a z-table gives us the proportion of the area under the curve that is < 0.7.

So how is the sum of the tops of all the rects below 0.7 equal to the area of the rectangles (area under the normal curve) that is below 0.7?
(1 vote)
• "The sampling distribution (of sample proportions) is a discrete distribution, and on a graph, the tops of the rectangles represent the probability.
The z-table/normal calculations gives us information on the area underneath the normal curve, since normal dists are continuous."

But here we do not have bars where the top of each bar represents a probability. What we have here is a dot plot, and the height of each "bar" in a dot plot doesn't represent a probability.
(0 votes)
• Hi, I do not have a calculator like the one used in this exercise. Can someone explain how I can use Excel to get the Normal CDF? I have tried, and my answer is not the same. Many thanks.
(0 votes)
• To calculate the Normal CDF in Excel, you can use the NORM.DIST function.

For the cumulative probability to the left of a specific value (e.g., less than 0.85), use:
=NORM.DIST(0.85, mean, standard deviation, TRUE)

For the cumulative probability to the right of a specific value, use:
=1 - NORM.DIST(0.85, mean, standard deviation, TRUE)
(1 vote)
• It's not clear to me why in this case we didn't use the Z formula for Central Limit Theorem and instead the standard formula for Z was used?
(0 votes)
• In this case, we are taking one random p-hat out of the whole sampling distribution of p-hats, making our sample size n = 1. Therefore, it doesn't really matter if we apply the CLT z-formula or the standard z-formula, as both give the same result when we plug in n = 1.

IDK which playlist you are on, or if you just found the practice by searching, but I guess that since the AP Statistics playlist hasn't covered the CLT yet, the editor simply used the z-score formula for a single x.

I know that you posted the question six months ago, but hopefully my answer helps if you are still confused.
(3 votes)