# Differences of sample proportions — Probability examples

Practice using shape, center (mean), and variability (standard deviation) to calculate probabilities of various results when we're dealing with sampling distributions for the differences of sample proportions.

## Intro and review

In this article, we'll practice applying what we've learned about sampling distributions for the differences in sample proportions to calculate probabilities of various sample results.
Skip ahead if you want to go straight to some examples.
Here's a review of how we can think about the shape, center, and variability in the sampling distribution of the difference between two proportions $\hat{p}_1-\hat{p}_2$:

### Shape

The shape of a sampling distribution of $\hat{p}_1-\hat{p}_2$ depends on whether both samples pass the large counts condition.
• If we expect at least $10$ successes and at least $10$ failures in both samples (that is, $n_1p_1$, $n_1(1-p_1)$, $n_2p_2$, and $n_2(1-p_2)$ are all at least $10$), then the sampling distribution of $\hat{p}_1-\hat{p}_2$ will be approximately normal.
• If one or more of these counts is less than $10$, then we shouldn't treat the sampling distribution as approximately normal.
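As a rough sketch, the large counts check can be written as a small Python function. The function name `large_counts_ok` and the proportions and sample sizes in the usage lines are hypothetical, chosen only for illustration.

```python
def large_counts_ok(p1: float, n1: int, p2: float, n2: int, threshold: int = 10) -> bool:
    """Return True if every expected success and failure count is at least `threshold`."""
    expected_counts = [
        n1 * p1,        # expected successes in sample 1
        n1 * (1 - p1),  # expected failures in sample 1
        n2 * p2,        # expected successes in sample 2
        n2 * (1 - p2),  # expected failures in sample 2
    ]
    return all(count >= threshold for count in expected_counts)


# Hypothetical values, for illustration only
print(large_counts_ok(0.30, 60, 0.25, 80))  # expected counts 18, 42, 20, 60 -> True
print(large_counts_ok(0.95, 60, 0.25, 80))  # only 60 * 0.05 = 3 expected failures in sample 1 -> False
```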

### Center

The mean difference is the difference between the population proportions:
$\mu_{\hat{p}_1-\hat{p}_2}=p_1-p_2$

### Variability

The standard deviation of the difference is:
$\sigma_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$
(where $n_1$ and $n_2$ are the sizes of each sample).
This standard deviation formula is exactly correct as long as we have:
• Independent observations between the two samples.
• Independent observations within each sample*.
*If we're sampling without replacement, this formula will actually overestimate the standard deviation, but it's extremely close to correct as long as each sample is less than $10\%$ of its population.
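Here is a minimal Python sketch of the center and variability formulas. The names `diff_mean_sd` and `diff_sd_fpc` are hypothetical. The second function applies the standard finite population correction factor $(N-n)/(N-1)$ to each variance term; that correction isn't used in this article and is included only to illustrate the footnote's claim that the usual formula slightly overestimates the standard deviation when sampling without replacement. The population sizes in the usage lines are made up.

```python
import math


def diff_mean_sd(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Mean and SD of the sampling distribution of p1_hat - p2_hat.

    Assumes independent observations within and between the two samples.
    """
    mean = p1 - p2
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd


def diff_sd_fpc(p1: float, n1: int, N1: int, p2: float, n2: int, N2: int) -> float:
    """SD of p1_hat - p2_hat for simple random sampling WITHOUT replacement.

    Multiplies each variance term by the finite population correction
    (N - n) / (N - 1), where N1 and N2 are the population sizes.
    """
    var1 = p1 * (1 - p1) / n1 * (N1 - n1) / (N1 - 1)
    var2 = p2 * (1 - p2) / n2 * (N2 - n2) / (N2 - 1)
    return math.sqrt(var1 + var2)


# Hypothetical populations of 1,000 each; the samples are 6% and 8% of them.
mean, sd = diff_mean_sd(0.30, 60, 0.25, 80)
sd_exact = diff_sd_fpc(0.30, 60, 1000, 0.25, 80, 1000)
print(f"mean = {mean:.3f}, formula SD = {sd:.4f}, corrected SD = {sd_exact:.4f}")
```

With these made-up numbers the corrected standard deviation comes out only a few percent smaller than the formula's value, which is the sense in which the formula is very close to correct when each sample is under $10\%$ of its population.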
Let's try applying these ideas to a few examples and see if we can use them to calculate some probabilities.

## Example 1

Yuki is a candidate running for office, and she wants to know how much support she has in two different districts. Yuki doesn't know it, but $45\%$ of the $8{,}000$ voters in District A support her, while $40\%$ of the $6{,}500$ voters in District B support her.
Yuki hires a polling firm to take separate random samples of $100$ voters from each district. The firm will then look at the difference between the proportions of voters who support her in each sample $\left(\hat{p}_{\text{A}}-\hat{p}_{\text{B}}\right)$.
Question 1.1
What are the mean and standard deviation of the sampling distribution of $\hat{p}_{\text{A}}-\hat{p}_{\text{B}}$?
Round to three decimal places.
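One way to check the arithmetic for Question 1.1 is the short Python sketch below, which plugs Example 1's values into the mean and standard deviation formulas. It also confirms the conditions: each sample of $100$ is less than $10\%$ of its district, and every expected count ($45$, $55$, $40$, $60$) is at least $10$. The variable names are illustrative.

```python
import math

# Example 1 population values (Yuki doesn't know these in practice)
p_A, N_A = 0.45, 8000   # District A: proportion supporting Yuki, number of voters
p_B, N_B = 0.40, 6500   # District B: proportion supporting Yuki, number of voters
n_A = n_B = 100         # separate random samples of 100 voters each

# 10% condition: each sample is under 10% of its population
ten_percent_ok = n_A < 0.10 * N_A and n_B < 0.10 * N_B

# Large counts condition: expected successes and failures are all at least 10
counts = [n_A * p_A, n_A * (1 - p_A), n_B * p_B, n_B * (1 - p_B)]
large_counts_ok = min(counts) >= 10

mean = p_A - p_B
sd = math.sqrt(p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B)

print(ten_percent_ok, large_counts_ok)      # True True
print(f"mean = {mean:.3f}, sd = {sd:.3f}")  # mean = 0.050, sd = 0.070
```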

## Example 2

A company has two offices, one in Mumbai and the other in Delhi.
• Each office has about $600$ total employees.
• $85\%$ of the employees at the Mumbai office are younger than $40$ years old.
• $81\%$ of the employees at the Delhi office are younger than $40$ years old.
The company plans to take separate random samples of $50$ employees from each office. They'll look at the difference between the proportions of employees in each sample who are younger than $40$ years old $\left(\hat{p}_{\text{M}}-\hat{p}_{\text{D}}\right)$.
The company wonders how likely it is that the difference between the two sample proportions is greater than $10$ percentage points.
Question 2.1
Why is it inappropriate to use a normal distribution to calculate this probability?
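A quick way to see what goes wrong in Example 2 is to compute the four expected counts, as in this Python sketch (the variable names and labels are illustrative). Each sample of $50$ is under $10\%$ of roughly $600$ employees, so independence isn't the issue; the problem is that the expected numbers of employees aged $40$ or older are only $50(0.15)=7.5$ and $50(0.19)=9.5$, both below $10$, so the large counts condition fails.

```python
# Example 2 values: samples of 50 employees from each office of about 600
p_M, p_D = 0.85, 0.81  # proportions younger than 40 in Mumbai and Delhi
n_M = n_D = 50

expected = {
    "Mumbai successes (under 40)": n_M * p_M,
    "Mumbai failures (40 or older)": n_M * (1 - p_M),
    "Delhi successes (under 40)": n_D * p_D,
    "Delhi failures (40 or older)": n_D * (1 - p_D),
}

# Flag any expected count below 10 (large counts condition)
for label, count in expected.items():
    flag = "ok" if count >= 10 else "below 10"
    print(f"{label}: {count:.1f} ({flag})")
```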