© 2024 Khan Academy

# Sampling distribution of sample proportion part 2

Building intuition for the sampling distribution of sample proportions using a simulation.

## Want to join the conversation?

- For anyone who wants to play with the experiment, here is the URL: https://www.khanacademy.org/computer-programming/candy-sampling-distribution/5180356611473408 (18 votes)
- How do I get to that program? (9 votes)
- "So, we're gonna do 50 samples of ten at a time." vs. "And so, here, we can quickly get to a fairly large number of samples. So here, we're over a thousand samples." How do these two sentences from the transcript relate to the previous (part 1) video? What is the value of 'n' here: is it 50 or 1050? (1 vote)
- n is 10 here. We're just taking 50 samples of 10 at once, instead of clicking the button 50 times. (5 votes)
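
The distinction in the reply above (sample size versus number of samples) can be sketched in a few lines. This is only an illustrative sketch, not the code behind the scratch pad: the names `P_GREEN`, `SAMPLE_SIZE`, `NUM_SAMPLES`, and `draw_sample_proportion` are made up for this example, and the seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

P_GREEN = 0.6      # population proportion of green gumballs, as in the video
SAMPLE_SIZE = 10   # n: how many gumballs are in each sample
NUM_SAMPLES = 50   # how many samples are drawn in one go


def draw_sample_proportion(p, n, rng):
    """Draw one sample of n gumballs and return the proportion that are green."""
    greens = sum(rng.random() < p for _ in range(n))
    return greens / n


# 50 sample proportions, each computed from its own sample of 10 gumballs
sample_proportions = [
    draw_sample_proportion(P_GREEN, SAMPLE_SIZE, rng) for _ in range(NUM_SAMPLES)
]

print(len(sample_proportions))  # number of samples drawn
print(sample_proportions[:5])   # each value is a multiple of 1/10
```

Drawing more samples adds more dots to the plot, but each dot still comes from a sample of size n = 10.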

- When Sal says at about 2:18 that “we saw the relation between the sampling distribution of the sample proportion and a binomial random variable,” is he talking about the ideas in this video? https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/binomial-random-variables/v/visualizing-a-binomial-distribution (3 votes)
- Why isn't the program working? (2 votes)
- Ensure you have the necessary libraries installed (`numpy` and `matplotlib`), and check for any syntax errors or typos in the code. Also, confirm that your Python environment is correctly set up to run the script. (1 vote)

- Sal mentions standard deviation in the video. I am confused about why it's standard deviation and not standard error, since we are dealing with a sampling distribution here. (2 votes)
- Sal's mention of standard deviation in the context of a sampling distribution can indeed be confusing because in statistics, when we refer to the variability of a sampling distribution, we often use the term "standard error." The standard error (SE) of the sample proportion is calculated as:

SE = sqrt(p(1 - p) / n)

where p is the population proportion, and n is the sample size. The term "standard deviation" Sal used could be referring to the standard error of the sample proportion, which describes how spread out the sample proportions are around the mean sample proportion. (1 vote)
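
As a quick check of the formula in the reply above, here is a small sketch that plugs in the values used in the video (p = 0.6, n = 10). The function name `standard_error` is just a label chosen for this example.

```python
import math


def standard_error(p, n):
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


se = standard_error(0.6, 10)
print(round(se, 4))  # 0.1549
```

That agrees with the "approximately zero point one five" quoted in the transcript.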

- Hey, thanks for this super video. I am referring to the 10% rule. Based on it, can we have a rule of thumb that a reasonable sample needs to have a size of at least 10% of the population being studied? Regards (1 vote)
- For a proportion, the normal approximation is generally good if np and n(1-p) are each at least 10. We also want the sample size to be 10% or less of the population size, so that the effects of selection without replacement (instead of with replacement) are small, meaning that the independence assumption gives a good approximation. (2 votes)
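
The two rules of thumb in the reply above can be written as a small check. This is a sketch under assumptions: the function name `check_conditions` and the population size of 1000 gumballs are hypothetical and do not come from the video.

```python
def check_conditions(n, p, population_size):
    """Apply the two rules of thumb for a sample proportion.

    normal_ok:       n*p and n*(1-p) are each at least 10
    independence_ok: the sample is at most 10% of the population
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return {
        "normal_ok": expected_successes >= 10 and expected_failures >= 10,
        "independence_ok": n <= 0.10 * population_size,
    }


# Hypothetical machine holding 1000 gumballs, 60% of them green
print(check_conditions(10, 0.6, 1000))  # n*p = 6, so the normal condition fails
print(check_conditions(50, 0.6, 1000))  # n*p = 30 and n*(1-p) = 20, both pass
```

Note the direction of the 10% rule: it is an upper limit on the sample size relative to the population, not a minimum.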

- Is there a video of this for standard deviation? (1 vote)
- Python code for this, anyone? (1 vote)
- Below is an example Python code that simulates drawing samples from a population of gumballs where a certain percentage is green, and then calculates the sampling distribution of the sample proportion.

```python
import numpy as np
import matplotlib.pyplot as plt


def simulate_gumballs(population_proportion, sample_size, num_samples):
    # Simulate drawing samples
    sample_proportions = np.random.binomial(sample_size, population_proportion, num_samples) / sample_size

    # Calculate mean and standard deviation of sample proportions
    mean_sample_proportion = np.mean(sample_proportions)
    std_dev_sample_proportion = np.std(sample_proportions, ddof=1)

    # Plot the distribution of sample proportions
    plt.hist(sample_proportions, bins=30, edgecolor='k', alpha=0.7)
    plt.axvline(mean_sample_proportion, color='red', linestyle='dashed', linewidth=1)
    plt.title('Sampling Distribution of the Sample Proportion')
    plt.xlabel('Sample Proportion')
    plt.ylabel('Frequency')
    plt.show()

    return mean_sample_proportion, std_dev_sample_proportion


# Parameters
population_proportion = 0.6  # 60% are green
sample_size = 10
num_samples = 1000

# Run simulation
mean, std_dev = simulate_gumballs(population_proportion, sample_size, num_samples)
print(f"Mean of the sampling distribution: {mean}")
print(f"Standard Deviation of the sampling distribution: {std_dev}")
```

(1 vote)

- I am slightly confused about the relationship between the distributions. (1 vote)
- The relationship between the population proportion, sample size, and the shape of the sampling distribution of the sample proportion is foundational in statistics. When the sample size is large enough (commonly using the rule of thumb n ⋅ p ≥ 10 and n ⋅ (1 − p) ≥ 10), the sampling distribution of the sample proportion will be approximately normal due to the Central Limit Theorem, regardless of the shape of the population distribution. The mean of this sampling distribution equals the population proportion (p), and its standard error (a measure of spread) decreases as the sample size (n) increases, highlighting the inverse relationship between sample size and the standard error of the sample proportion. (1 vote)
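
The claims in the reply above about the mean and the spread can be checked with a small simulation. This is a rough sketch that uses only the standard library; the helper `simulate_sample_proportions`, the seed, and the choice of 20,000 samples per run are assumptions made for this example.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, rng):
    """Return num_samples sample proportions, each from a sample of size n."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


rng = random.Random(1)  # fixed seed, chosen arbitrarily for repeatability
P = 0.6
results = {}

for n in (10, 50):
    props = simulate_sample_proportions(P, n, 20_000, rng)
    simulated_mean = sum(props) / len(props)
    simulated_sd = statistics.pstdev(props)
    theoretical_sd = math.sqrt(P * (1 - P) / n)
    results[n] = (simulated_mean, simulated_sd, theoretical_sd)
    print(f"n={n}: mean={simulated_mean:.3f}, "
          f"sd={simulated_sd:.3f}, sqrt(p(1-p)/n)={theoretical_sd:.3f}")
```

In both runs the simulated mean should land close to p = 0.6, and the simulated spread should be close to sqrt(p(1 − p)/n), which is smaller for n = 50 than for n = 10.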

## Video transcript

- [Instructor] This, right over here, is a scratch pad on Khan Academy, created by Khan Academy
user Charlotte Auen. And, what you see here, is
a simulation that allows us to keep sampling from
our gumball machine, and start approximating
the sampling distribution of the sample proportion. So, her simulation focuses on
green gumballs, but we talked about yellow before, and
the yellow gumballs, we said 60% were yellow, so let's
make 60% here green. And then, let's take samples of ten, just like we did before. And then, let's just
start with one sample. So, we're gonna draw one
sample, and what we wanna show is the percentages, which is the proportion of
each sample, that are green. So, if we draw that first
sample, notice out of the ten, five ended up being green,
and then it plotted that right over here, under 50%. We have one situation where
50% were green, now let's do another sample, so this
sample 60% are green. And so, let's keep going. Let's draw another sample. And now that one, we have,
we have 50% are green, and so notice now we see
here on this distribution; two of them had 50% green. Now, we could keep drawing samples, and let's just really increase. So, we're gonna do 50
samples of ten at a time. And so, here, we can
quickly get to a fairly large number of samples. So here, we're over a thousand samples. And what's interesting here is we're seeing experimentally
that the mean of our sample proportion here is zero point six two. What we calculated, a few
minutes ago, was that it should be zero point six. We also see that the standard
deviation of our sample proportion, is zero point one six. And what we calculated was approximately zero point one five. And as we draw more and more
samples, we should get even closer, and closer to those values. And, we see that, for the most
part, we are getting closer, and closer, in fact,
now that it's rounded, we are at exactly those values, that we had calculated before. Now, one interesting thing to observe is, when your population proportion
is not too close to zero, and not too close to one, this looks pretty close
to a normal distribution. And that makes sense. Because, we saw the relation
between the sampling distribution of the sample proportion, and a binomial random variable. But, what if our population
proportion is closer to zero? So, let's say our population
proportion is ten percent. Zero point one. What do you think the
distribution is going to look like then? Well, we know that the mean
of our sampling distribution is going to be ten percent,
and so you could imagine that the distribution is
going to be right skewed. But, let's actually see that. So, here we see that our distribution is indeed, right skewed. And that makes sense. Because, you can only get
values from zero to one, and if your mean is closer
to zero, then you're gonna see the meat of your distribution
here, and then you're gonna see a long tail to the right. Which creates that right skew. And, if your population
proportion was close to one, well, you can imagine the
opposite is going to happen. You're going to end up with a left skew. And, we indeed, see right
over here, a left skew. Now, the other interesting
thing to appreciate is, the larger your samples, the
smaller the standard deviation. And so, let's do a population proportion that is right in-between. And so, here, this is similar
to what we saw before, this is looking roughly normal. But now, and that's when
we had sample size of ten, but, what if we have a
sample size of 50 every time? Well, notice, now it looks like
a much tighter distribution. This isn't even going
all the way to one yet, but it is a much tighter distribution. And the reason why that makes
sense is that the standard deviation of your sample proportion
is inversely proportional to the square root of "n". And so that makes sense. So, hopefully you have a good
intuition now for the sample proportion and its distribution,
the sampling distribution of the sample proportion,
and that you can calculate its mean and its standard deviation. And you feel good about it, because we saw it in a simulation.
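
The skew described in the transcript (right skew when the population proportion is near zero, left skew when it is near one) can also be checked numerically. The sketch below is an illustration and not the scratch pad's code: the helpers `sample_proportions_for` and `skewness`, the seed, and the 20,000 samples per setting are assumptions made for this example. Positive skewness means a longer tail to the right, negative means a longer tail to the left.

```python
import random


def sample_proportions_for(p, n, num_samples, rng):
    """Return num_samples sample proportions, each from a sample of size n."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


def skewness(values):
    """Third standardized moment: > 0 for right skew, < 0 for left skew."""
    count = len(values)
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / count
    third_moment = sum((v - mean) ** 3 for v in values) / count
    return third_moment / variance ** 1.5


rng = random.Random(2)  # fixed seed, chosen arbitrarily for repeatability
skews = {
    p: skewness(sample_proportions_for(p, 10, 20_000, rng))
    for p in (0.1, 0.5, 0.9)
}

for p, value in skews.items():
    print(f"p={p}: skewness={value:+.2f}")
```

With samples of ten, p = 0.1 should give a clearly positive value, p = 0.9 a clearly negative one, and p = 0.5 a value near zero.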