High school statistics > Unit 5 > Lesson 2: Potential problems with sampling

Examples of bias in surveys

AP Stats: DAT‑2 (EU), DAT‑2.E (LO), DAT‑2.E.1 (EK), DAT‑2.E.2 (EK), DAT‑2.E.6 (EK), VAR‑1 (EU), VAR‑1.E (LO), VAR‑1.E.1 (EK)
Want to join the conversation?
- What are the differences between voluntary response sampling, response bias, and undercoverage?(17 votes)
- Voluntary response bias occurs when the sample responds to a question without being randomly selected; the respondents choose themselves to take part in the survey. This creates bias because people with strong opinions (often in the same direction) are the most likely to respond.
Response bias is a systematic pattern of incorrect responses in a sample survey. Respondents may be untruthful, for several reasons: a sensitive question, a socially acceptable answer, or telling the interviewer what he or she wants to hear. They may give silly answers so they won't appear ignorant about the subject, they may simply misremember, or the timing of the survey may influence their answers.
Undercoverage occurs when the design of the study does not cover everyone in the population, because some people cannot be reached or are left out. For instance, using a random phone number generator for landlines to draw a sample excludes everyone in the population who no longer owns a landline.
I hope this helps you!(47 votes)
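To see how voluntary response skews an estimate, here is a minimal Python simulation. All of the numbers (population size, true approval rate, and the response probabilities for fans versus non-fans) are made-up assumptions chosen only to illustrate the mechanism:

```python
import random

random.seed(42)

# Hypothetical population: 10,000 listeners, exactly 50% truly "love" the show.
population = [True] * 5000 + [False] * 5000

def responds(loves_show):
    # Assumed response rates: fans are far more likely to visit the
    # website and vote than non-fans. These rates are illustrative.
    return random.random() < (0.08 if loves_show else 0.01)

# Only the self-selected volunteers end up in the poll.
sample = [x for x in population if responds(x)]

true_rate = sum(population) / len(population)
sample_rate = sum(sample) / len(sample)

print(f"true rate:   {true_rate:.0%}")
print(f"poll result: {sample_rate:.0%}")
```

Even though half the population loves the show, the poll result comes out far higher, because the people with the strongest positive opinions are the ones who bother to respond.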
- What are the differences between undercoverage and convenience sampling?(7 votes)
- Undercoverage is when the design of a study leaves out part of the population, so we can't draw valid conclusions about the whole population.
Convenience sampling is when we choose a sample from the population based on how easy it is to reach. This introduces bias and is not representative.(8 votes)
- Is voluntary response when those being asked have the option not to respond, or only when the question has no assigned sample it is directed at?
A question on the Practice: Bias in Samples and Surveys exercise reads like this: "A mobile phone service provider wants to survey its customers to study privacy concerns and the sharing of their personal information. They call 5,000 randomly selected phone numbers from a database containing the phone number of every customer. If someone selected doesn't answer, they'll attempt calling back up to 2 more times before giving up on reaching that person.
They reach 350 customers with this strategy, and 60% of those reached say they are at least "somewhat concerned" about their personal information being shared without their knowledge or consent.
Which of these is the most concerning potential source of bias in the provider's survey?"
The answer is nonresponse bias because of how many did not respond, but one of the options was bias from voluntary response. The reason it gives for this not being correct is, "Voluntary response is when a researcher gives an open invitation and people decide whether to be in the sample or not. The service provider selected a random sample of 5,000 customers, so they didn't use a voluntary response strategy."
Again, I know it isn't the correct answer, but I thought voluntary response was a correct way of describing the situation. If not, then voluntary response seems like a not-so-accurate label.(2 votes)
- this is the best response ever, thanks(1 vote)
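The distinction matters because nonresponse can skew results even when the starting sample is perfectly random. Here is a rough Python sketch of a phone-survey scenario like the one quoted above; the true concern rate and the answer rates are made-up assumptions, chosen only to show how the people who can be reached may differ from the people who can't:

```python
import random

random.seed(0)

# Hypothetical: 5,000 randomly selected customers; 80% are truly
# "somewhat concerned" about their personal information being shared.
customers = [True] * 4000 + [False] * 1000

def answers_phone(concerned):
    # Assumption for illustration: the most privacy-concerned customers
    # are the least likely to pick up a call from an unknown number.
    return random.random() < (0.05 if concerned else 0.15)

reached = [c for c in customers if answers_phone(c)]

print(len(reached), "customers reached")
print(f"true rate:     {sum(customers) / len(customers):.0%}")
print(f"survey result: {sum(reached) / len(reached):.0%}")
```

The random selection was fine; the bias comes entirely from who failed to respond, which is why the survey result lands well below the true rate.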
- Can you still solve with negative numbers?(2 votes)
- That would be a hypothetical situation; you can't have -200 podcast listeners, so a negative count isn't meaningful here. Of course, I'm not an expert.(1 vote)
- A survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population.(1 vote)
- What is the right way to observe something other than voluntary response sampling? I know it's intuitive that voluntary response sampling may skew the result, but are we supposed to force people to be our subjects involuntarily? Or maybe we don't force them exactly; it's just that they don't get a choice to refuse? Well, that's still coercion, isn't it? I still don't get it.(1 vote)
- A satisfaction survey could ask participants if they were extremely satisfied, satisfied, or dissatisfied.(1 vote)
Video transcript
- [Instructor] We're told that David hosts a podcast and he is curious how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll. The poll shows that 89% of about 200 respondents "love" his show. What is the most concerning source of bias in this scenario? And well, like always, pause this video and see if you can figure it out on your own, and then we'll work through it together.

Let's think about what's going on. He has this population of listeners, right? I'll assume that the number of listeners is more than 200. And he says, "Hey, I want to find a sample, and I can't ask all of my listeners." Who knows, maybe he has 10,000 listeners; they don't tell us that, but let's say there's 10,000 listeners here. And he says, "Well, I want to get an indication of what percentage like my show, so I need a sample." But instead of taking a truly random sample, he asks them to volunteer. He asks his listeners to visit his website. So that's classic voluntary response sampling. This is not random, because who decides to go to his website, and listens to what he just said, and maybe even has access to a computer? That's not random. In fact, the people more likely to do that, the 200 respondents out of the 10,000 who decided to do it, are more likely to be the people who already like David or like to listen to what he tells them to do. The listeners who are not into David, or don't want to do what he tells them to do, are unlikely to say, "Oh, I'm not really into David and I don't like him telling me what to do, but hey, I'm gonna go to his website anyway, I'm gonna fill out that poll." That's less likely. Or you might get extremes; people who really don't like him might say, "I'm gonna definitely go there." But in this case, I would say it's more likely your fans are gonna do what you ask them to do and go to your website and spend time on your website. And because of that, that 89% is probably an overestimate of the share of listeners who really love his show, 'cause you're more likely to get the ones who love him to show up and fill out that actual survey.

Now, these other forms of bias. Response bias is when you're asking something that people don't necessarily want to answer truthfully, or the way that it's phrased might make someone respond in a biased way. Classic examples of this are, "Have you lied to your parents in the past week?" or "Have you ever cheated on your spouse?" or "Do you smoke?" Any of these things that people might not want to answer completely truthfully, or might be hiding from the world, they might not want to answer truthfully on a survey, and so you're going to have response bias. But that's not the case right over here. And undercoverage is when the way that you're sampling definitely misses out on an important constituency. With voluntary response we're likely missing out on some important constituencies, on some people who might not be into going to your website, but undercoverage is where it's a little bit more clear that that is happening.

Let's do another case, maybe an alternate reality where David's trying to figure this out again. He's still hosting a podcast, he's still curious how much his listeners like his show, but he tries to take a different sample. He decides in this case to poll the next 100 listeners who send him fan emails. They don't all respond, but 94 out of the 97 listeners polled said they "loved his show." What is the most concerning source of bias in this scenario? Well, this is a classic, "Hey, I have a group, I have a sample sitting in front of me, it's in my inbox in my email, let me just go to them." Isn't that convenient? So this is a classic convenience sample. And this isn't just, hey, these are the first 100 people to walk through the door (and a lot of times you can argue why even that might not be so random); these are the next 100 listeners who sent him fan emails. (laughing) So this is convenience sampling, and the sample that you happen to use out of convenience is one that's going to be very skewed toward liking you. So once again, this is overestimating the percent that love his show.

Now, nonresponse is when you ask a certain number of people to fill out a survey or to answer a questionnaire, and for some reason some percent do not fill it out. And you're like, "Wow, who are those people? Maybe they would have said something important, and maybe their viewpoint is not properly represented in the overall number that actually did fill it out." And there is some nonresponse going on here. He asks 100 people who sent fan emails to fill out the survey to say whether they love it or not; 97 fill it out, so there are three people who did not fill out the survey. So there is some nonresponse going on that would be a source of bias, but it's not the most concerning. Right over here they're asking us for the most concerning source of bias, and the convenience sampling is definitely the biggest deal here. There were three people who didn't respond, but that's not as big of a deal. Voluntary response sampling? Well, he didn't ask people, like in the last example, "Hey, can you go here and fill it out?" I guess, I take that back, there is a little bit of voluntary response here, where he goes to these 100 people and he asks them to respond, and so you have the 97 people who chose to respond. But while that could be a source of bias, 97 of the 100 are responding, and once again, the most concerning thing is the convenience sampling, which, based on this sample that he happens to use out of convenience, is going to give a significant overestimate in terms of representing the entire population of his listeners.
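The two scenarios in the video can be sketched in a quick Python simulation. The population size, the true approval rate, and the fan share among email senders are all made-up assumptions, used only to compare a convenience sample against a simple random sample:

```python
import random

random.seed(1)

# Hypothetical population: 10,000 listeners, 55% truly love the show.
loves = [True] * 5500 + [False] * 4500
random.shuffle(loves)

# A simple random sample of 100 listeners.
srs = random.sample(loves, 100)

# Convenience sample: people who send fan emails are overwhelmingly fans.
# (The 97% fan share among email senders is an illustrative assumption.)
fan_emailers = [random.random() < 0.97 for _ in range(100)]

print(f"true rate:          {sum(loves) / len(loves):.0%}")
print(f"random sample:      {sum(srs) / 100:.0%}")
print(f"convenience sample: {sum(fan_emailers) / 100:.0%}")
```

The random sample typically lands close to the true rate, while the convenience sample of fan-email senders overestimates it badly, which mirrors the 94-out-of-97 result in the second scenario.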