# Sampling distribution of sample proportion part 2

Building intuition for the sampling distribution of sample proportions using a simulation.

## Want to join the conversation?

• How do I get to that program?
• "So, we're gonna do 50 samples of ten at a time." versus "And so, here, we can quickly get to a fairly large number of samples. So here, we're over a thousand samples." How do these two sentences from the transcript relate to the previous (part 1) video? What is the value of n here: 50 or 1050?
(1 vote)
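• n is 10 here. We're just taking 50 samples of size 10 at once instead of clicking the button 50 times. The 50 (and later the thousand-plus) counts how many samples have been drawn, not how large each sample is.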
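The difference between the sample size and the number of samples can be sketched in Python. This is only an illustration, not the program used in the video; the population proportion of 0.6, the seed, and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

p = 0.6      # assumed population proportion (illustrative value)
n = 10       # sample size: gumballs drawn in each sample
batch = 50   # number of samples drawn per click

# One click: 50 samples of size n, giving 50 sample proportions.
one_click = rng.binomial(n, p, size=batch) / n

# Twenty-one clicks: 1050 sample proportions, yet every sample still has n = 10.
many_clicks = rng.binomial(n, p, size=21 * batch) / n

print(len(one_click), len(many_clicks))
```

Drawing more samples fills in the shape of the sampling distribution more smoothly; it does not change n.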
• When Sal says “we saw the relation between the sampling distribution of the sample proportion and a binomial random variable,” is he talking about the ideas in this video? https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/binomial-random-variables/v/visualizing-a-binomial-distribution
• Why isn't the program working?
• Ensure you have the necessary libraries installed (numpy and matplotlib), and check for any syntax errors or typos in the code. Also, confirm that your Python environment is correctly set up to run the script.
(1 vote)
• Sal mentions standard deviation in the video. I am confused about why it's standard deviation and not standard error, since we are dealing with a sampling distribution here.
• Sal's mention of standard deviation in the context of a sampling distribution can indeed be confusing because in statistics, when we refer to the variability of a sampling distribution, we often use the term "standard error." The standard error (SE) of the sample proportion is calculated as:

SE = sqrt(p(1 - p) / n)

where p is the population proportion and n is the sample size. The standard error is just the standard deviation of a sampling distribution, so both terms name the same quantity here: how spread out the sample proportions are around their mean, which equals the population proportion p.
(1 vote)
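As a rough check of the formula above, the sketch below compares sqrt(p(1 - p) / n) with the standard deviation of simulated sample proportions. The values p = 0.6 and n = 10, the seed, and the number of simulated samples are assumptions chosen for illustration.

```python
import math
import numpy as np

p = 0.6   # assumed population proportion
n = 10    # sample size

# Standard error from the formula SE = sqrt(p(1 - p) / n)
se_formula = math.sqrt(p * (1 - p) / n)

# Standard deviation of many simulated sample proportions
rng = np.random.default_rng(1)  # arbitrary seed
sample_proportions = rng.binomial(n, p, size=100_000) / n
sd_simulated = sample_proportions.std()

print(f"formula:   {se_formula:.4f}")
print(f"simulated: {sd_simulated:.4f}")
```

With this many simulated samples the two numbers should agree closely, which is why "standard deviation of the sampling distribution" and "standard error" can be used interchangeably in this setting.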
• Hey, thanks for this super video. I am referring to the 10% rule. Based on it, can we have a rule of thumb that a reasonable sample needs to have a size of at least 10% of the population being studied? Regards
(1 vote)
• Not quite: the 10% condition is an upper limit, not a minimum. For a proportion, the normal approximation is generally good if np and n(1-p) are each at least 10. We also want the sample size to be 10% or less of the population size, so that the effect of sampling without replacement (instead of with replacement) is small and the independence assumption gives a good approximation.
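The two conditions in this answer can be written as a small check. The function name `check_conditions` and the population sizes below are hypothetical, chosen only to show each condition passing and failing.

```python
def check_conditions(n, p, population_size):
    """Return (large_counts_ok, ten_percent_ok) for a sample proportion."""
    # Large counts: expected successes and failures are each at least 10.
    large_counts_ok = n * p >= 10 and n * (1 - p) >= 10
    # 10% condition: the sample is at most 10% of the population.
    ten_percent_ok = n <= 0.10 * population_size
    return large_counts_ok, ten_percent_ok

print(check_conditions(10, 0.6, 10_000))  # (False, True): np is only 6
print(check_conditions(50, 0.6, 10_000))  # (True, True)
print(check_conditions(50, 0.6, 200))     # (True, False): 50 is 25% of 200
```

Note that a sample of size 10 with p = 0.6 fails the large counts check, so a histogram of its sample proportions should not be expected to look very normal.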
• Is there a video like this for standard deviation?
(1 vote)
• Python code for this, anyone?
(1 vote)
• Below is example Python code that simulates drawing samples from a population of gumballs in which a given proportion is green. It computes the mean and standard deviation of the resulting sample proportions and plots their distribution.

import numpy as np
import matplotlib.pyplot as plt

def simulate_gumballs(population_proportion, sample_size, num_samples):
    # Simulate drawing samples: each binomial draw is the number of green
    # gumballs in one sample, so dividing by sample_size gives a proportion
    sample_proportions = np.random.binomial(sample_size, population_proportion, num_samples) / sample_size

    # Calculate mean and standard deviation of sample proportions
    mean_sample_proportion = np.mean(sample_proportions)
    std_dev_sample_proportion = np.std(sample_proportions, ddof=1)

    # Plot the distribution of sample proportions
    plt.hist(sample_proportions, bins=30, edgecolor='k', alpha=0.7)
    plt.axvline(mean_sample_proportion, color='red', linestyle='dashed', linewidth=1)
    plt.title('Sampling Distribution of the Sample Proportion')
    plt.xlabel('Sample Proportion')
    plt.ylabel('Frequency')
    plt.show()

    return mean_sample_proportion, std_dev_sample_proportion

# Parameters
population_proportion = 0.6  # 60% are green
sample_size = 10
num_samples = 1000

# Run simulation
mean, std_dev = simulate_gumballs(population_proportion, sample_size, num_samples)
print(f"Mean of the sampling distribution: {mean}")
print(f"Standard Deviation of the sampling distribution: {std_dev}")
(1 vote)