
© 2023 Khan Academy

# Identifying bias in samples and surveys

AP.STATS: DAT‑2 (EU), DAT‑2.E (LO), DAT‑2.E.1 (EK), DAT‑2.E.2 (EK), DAT‑2.E.3 (EK), DAT‑2.E.4 (EK), DAT‑2.E.5 (EK), DAT‑2.E.6 (EK), VAR‑1 (EU), VAR‑1.E (LO), VAR‑1.E.1 (EK)

It's important to identify potential sources of bias when planning a sample survey.

When we say there's potential bias, we should also be able to argue whether the results will probably be an overestimate or an underestimate.

Try to identify the source of bias in each scenario, and speculate on the direction of the bias (overestimate or underestimate).

## Scenario 1

David hosts a podcast, and he is curious about how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll.

The poll shows that 89% of the 200 respondents "love" his show.
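To see why a voluntary poll like this tends to overestimate, here's a small Python simulation. Every number in it (10,000 listeners, 60% who truly love the show, and the chance that each kind of listener bothers to answer) is made up for illustration; the scenario doesn't tell us any of them. The only assumption doing the work is that listeners who love the show are more likely to respond.

```python
import random

def simulate_voluntary_poll(n_listeners, p_love, p_respond_love, p_respond_other, seed=0):
    """Return (true share who love the show, share among poll respondents).

    All inputs are hypothetical; the scenario only reports the poll result.
    """
    rng = random.Random(seed)
    # Decide, for each listener, whether they truly love the show.
    loves = [rng.random() < p_love for _ in range(n_listeners)]
    # Each listener chooses for themselves whether to answer the poll,
    # and fans are assumed to be more eager to respond.
    respondents = [
        love for love in loves
        if rng.random() < (p_respond_love if love else p_respond_other)
    ]
    true_share = sum(loves) / n_listeners
    poll_share = sum(respondents) / len(respondents)
    return true_share, poll_share

true_share, poll_share = simulate_voluntary_poll(
    n_listeners=10_000, p_love=0.60, p_respond_love=0.05, p_respond_other=0.01
)
print(f"true share: {true_share:.1%}, poll result: {poll_share:.1%}")
```

With these made-up rates, the share among respondents lands far above the true share, even though nobody answered dishonestly. The people who chose to respond simply aren't representative of all listeners.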

## Scenario 2

David hosts a podcast, and he is curious about how much his listeners like his show. He decides to poll the next 100 listeners who send him fan emails.

Not all of them responded, but 94 of the 97 listeners who did respond said they "loved" his show.
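The arithmetic below uses only the numbers in this scenario. It shows that the 3 listeners who didn't respond can't move the result much: even if none of them loved the show, 94 of the 100 people polled still did. So the bigger concern is likely who was polled in the first place, namely only listeners who chose to send fan emails.

```python
# Numbers from Scenario 2
polled = 100     # listeners who sent fan emails and were asked
responded = 97
loved = 94

response_rate = responded / polled   # share of polled listeners who answered
share_loved = loved / responded      # share of respondents who "loved" the show
# Lower bound: assume none of the 3 nonrespondents loved the show
lower_bound = loved / polled

print(f"{response_rate:.0%} responded")                 # 97% responded
print(f"{share_loved:.1%} of respondents loved it")     # 96.9% of respondents loved it
print(f"at least {lower_bound:.0%} of all 100 polled")  # at least 94% of all 100 polled
```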

## Scenario 3

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling 100 people whose names were randomly sampled from the phone book (note that mobile phones and unlisted numbers aren't in phone books). The senator's office called those numbers until they got a response from all 100 people chosen.

The poll showed that 42% of respondents were "very concerned" about internet privacy.
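Here's a Python sketch of how undercoverage could push this estimate down. It builds a made-up state of 100,000 residents in which people who aren't in the phone book (unlisted or mobile-only) are assumed to be more concerned about privacy, then runs many 100-person polls drawn only from phone-book listings. Every default in the hypothetical `phone_book_polls` function is an assumption for illustration, not data from the scenario.

```python
import random

def phone_book_polls(n_people=100_000, p_unlisted=0.30,
                     p_concerned_listed=0.40, p_concerned_unlisted=0.65,
                     poll_size=100, n_polls=2_000, seed=1):
    """Return (true share very concerned, average result of phone-book polls).

    Every default is a made-up assumption for illustration.
    """
    rng = random.Random(seed)
    people = []  # (listed_in_phone_book, very_concerned) for each resident
    for _ in range(n_people):
        unlisted = rng.random() < p_unlisted
        p = p_concerned_unlisted if unlisted else p_concerned_listed
        people.append((not unlisted, rng.random() < p))

    true_share = sum(concerned for _, concerned in people) / n_people

    # The sampling frame is the phone book: unlisted people can never be chosen.
    frame = [concerned for listed, concerned in people if listed]
    estimates = [sum(rng.sample(frame, poll_size)) / poll_size for _ in range(n_polls)]
    return true_share, sum(estimates) / n_polls

true_share, avg_poll = phone_book_polls()
print(f"whole state: {true_share:.1%}, average phone-book poll: {avg_poll:.1%}")
```

Because unlisted people can never be selected, averaging thousands of polls doesn't help: the average settles near the phone-book group's rate (about 40% under these assumptions) instead of the whole state's rate (about 47.5%).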

## Scenario 4

A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached. They called over 1,000 random phone numbers (most people didn't answer) until they had reached 100 respondents.

The poll showed that 46% of respondents were "very concerned" about internet privacy.
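Nonresponse is the big issue here: only 100 of the more than 1,000 people called answered. The Python sketch below first computes worst-case bounds, assuming exactly 1,000 numbers were dialed and treating every unanswered call as someone the senator wanted to reach (both are simplifications). It then uses a small hypothetical helper, `respondent_share`, to show what happens if "very concerned" people are less likely to pick up; the true share of 60% and the answer rates are invented for illustration.

```python
def respondent_share(p_true, answer_rate_concerned, answer_rate_other):
    """Expected share of "very concerned" people among those who answer."""
    concerned = p_true * answer_rate_concerned
    other = (1 - p_true) * answer_rate_other
    return concerned / (concerned + other)

# Worst-case bounds. Assumes exactly 1,000 numbers were dialed and treats
# every unanswered call as a person the senator wanted to reach.
dialed, answered, very_concerned = 1_000, 100, 46
missing = dialed - answered
low = very_concerned / dialed               # none of the 900 are concerned
high = (very_concerned + missing) / dialed  # all of the 900 are concerned
print(f"between {low:.1%} and {high:.1%}")  # between 4.6% and 94.6%

# Hypothetical: 60% are truly concerned, but concerned people answer
# unknown numbers less often (7%) than everyone else (15%).
print(f"{respondent_share(0.60, 0.07, 0.15):.1%}")  # 41.2%
```

The bounds show that with 90% nonresponse, the missing answers could in principle put the true value almost anywhere. If concerned people really do screen calls from unknown numbers more often, the respondents understate concern, which points toward an underestimate.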

## Scenario 5

A high school wanted to know what percent of its students smoke cigarettes. During the week when students visited the counselors to schedule classes, the counselors asked every student in person whether they smoked cigarettes.

The data showed that 5% of students smoked cigarettes.
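Asking every student doesn't remove bias if some students won't answer honestly face to face. This Python sketch assumes, hypothetically, a school of 1,200 students where 12% smoke, each smoker says "no" to a counselor with probability 60%, and non-smokers always answer truthfully. None of these numbers come from the scenario.

```python
import random

def in_person_survey(n_students, p_smoke, p_deny, seed=2):
    """Ask every student face to face; return (true rate, reported rate).

    Assumes non-smokers always answer truthfully and each smoker
    independently says "no" with probability p_deny.
    """
    rng = random.Random(seed)
    smokers = said_yes = 0
    for _ in range(n_students):
        if rng.random() < p_smoke:      # this student actually smokes
            smokers += 1
            if rng.random() >= p_deny:  # ...and admits it to the counselor
                said_yes += 1
    return smokers / n_students, said_yes / n_students

true_rate, reported_rate = in_person_survey(n_students=1_200, p_smoke=0.12, p_deny=0.60)
print(f"actual: {true_rate:.1%}, reported: {reported_rate:.1%}")
```

Since the only dishonest answers in this model are smokers saying "no," the reported rate can only come out at or below the true rate, which is an underestimate.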

## Scenario 6

A high school wanted to know what percent of its students smoke cigarettes. Counselors selected a random sample of students to take a survey on drug use. One of the questions reads, "If you are under the age of 18 years, do you illegally smoke cigarettes?"

The data showed that 5% of students smoked cigarettes.
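A random sample fixes who gets asked, but not how they answer. In this Python sketch, a hypothetical school of 1,500 students has 12% smokers, and the word "illegally" is modeled, as an assumption, by making each smoker answer "no" with probability 60%. Averaging thousands of random samples of 100 students recovers the share who would say yes, not the share who actually smoke.

```python
import random

def repeated_random_samples(n_students=1_500, p_smoke=0.12, p_deny=0.60,
                            sample_size=100, n_samples=5_000, seed=3):
    """Return (true rate, rate who would say yes, average of random-sample results)."""
    rng = random.Random(seed)
    students = []  # (actually_smokes, says_yes_to_worded_question)
    for _ in range(n_students):
        smokes = rng.random() < p_smoke
        # Assumption: the word "illegally" makes each smoker deny with probability p_deny.
        says_yes = smokes and rng.random() >= p_deny
        students.append((smokes, says_yes))

    true_rate = sum(smokes for smokes, _ in students) / n_students
    yes_rate = sum(yes for _, yes in students) / n_students

    # Simple random samples drawn without replacement from the whole school.
    estimates = [
        sum(yes for _, yes in rng.sample(students, sample_size)) / sample_size
        for _ in range(n_samples)
    ]
    return true_rate, yes_rate, sum(estimates) / n_samples

true_rate, yes_rate, avg_estimate = repeated_random_samples()
print(f"actual: {true_rate:.1%}, would say yes: {yes_rate:.1%}, "
      f"average sample result: {avg_estimate:.1%}")
```

Taking more or larger random samples would only make the estimate settle more precisely on the wrong value, so the result is still likely an underestimate.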

## Want to join the conversation?

- How is voluntary response bias different from nonresponse bias?
- Voluntary response bias occurs when people decide for themselves whether to be part of the sample. In the podcast example, the problem with letting listeners respond voluntarily is that listeners who enjoy the show are more motivated to spend time answering than listeners who don't, so the poll likely overestimates how much listeners like the show. Nonresponse bias occurs when people who were chosen for the sample don't respond, and the people who don't respond differ from the people who do. If 1,000 people are sampled and only 100 respond, a 90% nonresponse rate leaves plenty of room for nonresponse bias.

- I have a question. A reporter from the newspaper wanted to know how much time students spend on homework in a typical week, so he passed out questionnaires to students in a grade 9 English class, an art class, and a grade 12 math class, and collected them later. Is this biased or not?
- This would be biased because it is not a random sample of all students; it's a convenience sample of three classes. If he handed out surveys to randomly selected students from all kinds of schools, the survey would be much less likely to be biased.

- Perhaps scenarios 1-3 could be mixed up a little? They are identical to the worked example from a previous video on sample bias, so I find I'm just parroting back the answers rather than actually having my knowledge tested.
- What is convenience sampling?
- Convenience sampling is given away by its name: it's when you choose the sample that is most convenient for you. For example, I could conduct a study about **overall** satisfaction with **any** online learning program and use a sample of only people on Khan Academy learning statistics. That sample is convenient for me because I am on Khan Academy learning statistics. However, it does not reflect overall satisfaction with every online learning program; it only shows the satisfaction of people learning statistics on Khan Academy.

- Is there a book I can read for an in-depth study of statistics?
- Sal made use of the CK-12 book for some sample questions. https://www.ck12.org/book/CK-12-Probability-and-Statistics-Concepts/

- I disagree with Scenario 4's direction of bias. When the senator polls people who still use listed landlines, they are likely avoiding mobile devices intentionally. Those of us who use mobile devices are generally less concerned with internet safety than those who avoid devices with internet access, like mobile phones. Can someone explain what I'm missing here? I think the bias actually produces an overestimate of the population's concern about internet safety.
- From the author: Hi! Scenario 4 says, "She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached."

That means the sample included folks who use landlines and mobile devices. The big issue here is that they tried calling over 1,000 phone numbers and most people didn't answer. It's likely that the folks who didn't answer a call from a strange phone number are more concerned about privacy than the 100 people who did answer the call.

So if this survey finds that 46% of the 100 respondents are concerned about internet privacy, I'd bet that's an underestimate, since the group who didn't answer might care more about privacy in general.

- Aren't these questions supposed to test our knowledge of the subject? These are all either common sense or covered in the video.
- What is the purpose of all this, as far as figuring out whether the questions are biased or not?
- What is the difference between biased wording and response bias?
- Response bias is when people give an answer other than the truthful one, for example because one of the responses feels "bad" to admit. Wording bias is when the wording of the question itself creates that pressure, instead of people already feeling that way.