# Differences of sample proportions — Probability examples

Practice using shape, center (mean), and variability (standard deviation) to calculate probabilities of various results when we're dealing with sampling distributions for the differences of sample proportions.

## Intro and review

In this article, we'll practice applying what we've learned about sampling distributions for the differences in sample proportions to calculate probabilities of various sample results.
Skip ahead if you want to go straight to some examples.
Here's a review of how we can think about the shape, center, and variability in the sampling distribution of the difference between two sample proportions $\hat{p}_1 - \hat{p}_2$:

### Shape

The shape of a sampling distribution of $\hat{p}_1 - \hat{p}_2$ depends on whether both samples pass the large counts condition.
• If we expect at least 10 successes and at least 10 failures in both samples, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ will be approximately normal.
• If one or more of these counts is less than 10, then the sampling distribution won't be approximately normal.
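
The large counts check can be sketched in a few lines of Python. This is only an illustration: the function name `large_counts_ok` and the input values are hypothetical, not part of the lesson, and the expected counts are taken to be $n p$ and $n(1 - p)$ for each sample.

```python
def large_counts_ok(p1, n1, p2, n2, threshold=10):
    """Return True when all four expected counts reach the threshold."""
    expected_counts = [
        n1 * p1,        # expected successes in sample 1
        n1 * (1 - p1),  # expected failures in sample 1
        n2 * p2,        # expected successes in sample 2
        n2 * (1 - p2),  # expected failures in sample 2
    ]
    # Round to absorb floating-point noise such as 9.999999999999998.
    return all(round(count, 9) >= threshold for count in expected_counts)


# Hypothetical values, chosen only for illustration.
print(large_counts_ok(p1=0.5, n1=40, p2=0.3, n2=20))
```

With these made-up inputs the second sample expects only 6 successes, so the check fails and a normal model would not be a good fit.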

### Center

The mean difference is the difference between the population proportions:
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$

### Variability

The standard deviation of the difference is:
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$$
(where $n_1$ and $n_2$ are the sizes of each sample).
This standard deviation formula is exactly correct as long as we have:
• Independent observations between the two samples.
• Independent observations within each sample*.
*If we're sampling without replacement, this formula will actually overestimate the standard deviation, but it's extremely close to correct as long as each sample is less than 10% of its population.
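
As a rough sketch, the center and variability formulas and the 10% guideline translate directly into Python. The helper names (`diff_mean`, `diff_sd`, `within_ten_percent`) and the inputs below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def diff_mean(p1, p2):
    """Mean of the sampling distribution of the difference in sample proportions."""
    return p1 - p2


def diff_sd(p1, n1, p2, n2):
    """Standard deviation of the difference, assuming independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def within_ten_percent(n, population_size):
    """True when the sample is less than 10% of its population."""
    return n < 0.10 * population_size


# Hypothetical inputs: equal proportions, 200 observations per sample.
print(diff_mean(0.5, 0.5))
print(diff_sd(0.5, 200, 0.5, 200))
print(within_ten_percent(200, 5000))
```

Keeping the 10% check separate from the standard deviation makes it clear that the check only tells us whether the formula is a close approximation; it does not change the formula itself.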
Let's try applying these ideas to a few examples and see if we can use them to calculate some probabilities.

## Example 1

Yuki is a candidate running for office, and she wants to know how much support she has in two different districts. Yuki doesn't know it, but 45% of the 8,000 voters in District A support her, while 40% of the 6,500 voters in District B support her.
Yuki hires a polling firm to take separate random samples of 100 voters from each district. The firm will then look at the difference between the proportions of voters who support her in each sample $(\hat{p}_\text{A} - \hat{p}_\text{B})$.
Question 1.1
What are the mean and standard deviation of the sampling distribution of $\hat{p}_\text{A} - \hat{p}_\text{B}$?
Round to three decimal places.
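
One way to check an answer to this question is to plug the values stated in Example 1 into the center and variability formulas from the review. The variable names below are illustrative, and the script also confirms that each sample of 100 is under 10% of its district, which is what makes the standard deviation formula a close approximation here.

```python
import math

# Values stated in Example 1.
p_a, n_a, pop_a = 0.45, 100, 8000   # District A
p_b, n_b, pop_b = 0.40, 100, 6500   # District B

# Center: difference between the population proportions.
mean_diff = p_a - p_b

# Variability: square root of the sum of the two sampling variances.
sd_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Each sample should be less than 10% of its district.
ten_percent_ok = n_a < 0.10 * pop_a and n_b < 0.10 * pop_b

print(f"mean = {mean_diff:.3f}")
print(f"sd   = {sd_diff:.3f}")
print(f"10% condition met: {ten_percent_ok}")
```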

## Example 2

A company has two offices, one in Mumbai, and the other in Delhi.
• Each office has about 600 total employees.
• 85% of the employees at the Mumbai office are younger than 40 years old.
• 81% of the employees at the Delhi office are younger than 40 years old.
The company plans on taking separate random samples of 50 employees from each office. They'll look at the difference between the proportions of employees in each sample that are younger than 40 years old $(\hat{p}_\text{M} - \hat{p}_\text{D})$.
The company wonders how likely it is that the difference between the two sample proportions is greater than 10 percentage points.
Question 2.1
Why is it inappropriate to use a normal distribution to calculate this probability?
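
A quick way to explore this question is to list the expected counts for each office using the values stated in Example 2. The sketch below treats a "success" as an employee younger than 40; the dictionary layout and labels are illustrative choices, not part of the lesson.

```python
# Values stated in Example 2: (proportion younger than 40, sample size).
offices = {"Mumbai": (0.85, 50), "Delhi": (0.81, 50)}

expected_counts = {}
for office, (p, n) in offices.items():
    expected_counts[f"{office} younger than 40"] = n * p
    expected_counts[f"{office} 40 or older"] = n * (1 - p)

# Report each expected count against the large counts threshold of 10.
for label, count in expected_counts.items():
    status = "ok" if count >= 10 else "below 10"
    print(f"{label}: {count:.1f} ({status})")

normal_ok = all(count >= 10 for count in expected_counts.values())
print("Normal approximation reasonable:", normal_ok)
```

Any count flagged as below 10 means the large counts condition fails, so the sampling distribution should not be treated as approximately normal.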