© 2023 Khan Academy

# Reasonable samples

To make a valid conclusion, you'll need a representative, not skewed, sample. Created by Sal Khan.

## Want to join the conversation?

- At 3:34, why would asking the whole class be less efficient? Wouldn't the class be the only relevant demographic in this case?(5 votes)
- u still want the answer?... its been 6 years(5 votes)

- I thought when using a computer to generate something random, that it's not truly random? Rather pseudo-random?(4 votes)
- You are right, but for the purpose of the exercise, it is random.(7 votes)
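
To illustrate the pseudo-random point above, here is a minimal Python sketch. It assumes the standard library `random` module as the "computer program"; the seed value 2023 is arbitrary and chosen only for illustration. Two generators given the same seed produce the same draws, which is why such output is called pseudo-random.

```python
import random

# Two generators seeded identically produce the same "random" draws,
# which is what makes them pseudo-random rather than truly random.
rng_a = random.Random(2023)  # the seed value 2023 is arbitrary
rng_b = random.Random(2023)

draws_a = [rng_a.randint(1, 300) for _ in range(5)]
draws_b = [rng_b.randint(1, 300) for _ in range(5)]

print(draws_a == draws_b)  # prints True: same seed, same sequence
```

For a classroom survey this predictability does not matter, as long as nobody picks the seed with a particular outcome in mind.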

- So what percentage of a population is a reasonable sample? For example, if there were 60,000 students in a school, would 50 students be a reasonable number to ask for their opinions?(4 votes)
- Rather than an exact percentage, the more important requirement is random sampling, so 50 may be a reasonable number if it is truly a random sample. Think about a presidential race: there is no easy way to survey even close to 1% of possible voters (138+ million people voted in 2016, so 1% is about 1.38 million people, 0.1% is 138,000, and 0.01% is still 13,800 voters, which is a lot), so pollsters have to define how they choose whom to ask.

Statistics can be misleading if the sample is not random, as with the toothpaste brands that "4 out of 5 dentists recommend."(1 vote)
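
One way to get a feel for the 50-out-of-60,000 question is to simulate it. The sketch below is only an illustration under stated assumptions: the 60% "true" share of students holding an opinion is made up, and the 15-point tolerance is an arbitrary yardstick, not a standard. It draws many simple random samples of 50 and checks how often the sample share lands near the true share.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

POPULATION_SIZE = 60_000
TRUE_SHARE = 0.60   # hypothetical: 60% of students hold some opinion
SAMPLE_SIZE = 50

# 1 = holds the opinion, 0 = does not
n_yes = round(POPULATION_SIZE * TRUE_SHARE)
population = [1] * n_yes + [0] * (POPULATION_SIZE - n_yes)

def sample_share(k):
    """Share of 'yes' answers in a simple random sample of k students."""
    return sum(rng.sample(population, k)) / k

# Repeat the survey 1,000 times to see how much the estimate moves around.
estimates = [sample_share(SAMPLE_SIZE) for _ in range(1000)]
within_15_points = sum(abs(e - TRUE_SHARE) <= 0.15 for e in estimates) / len(estimates)

print(f"sample is {SAMPLE_SIZE / POPULATION_SIZE:.2%} of the population")
print(f"share of samples within 15 points of the truth: {within_15_points:.0%}")
```

Even though 50 students are a tiny fraction of the school, most random samples of that size land reasonably close to the true share. Raising `SAMPLE_SIZE` tightens the spread further.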

- What is the main difference between random and systematic?(4 votes)
- In a statistical context:

Systematic means data points are sampled at a fixed interval in time or space (say, checking the humidity in a room every hour on the hour), so the pattern is predictable (a 1-hour interval).

Random means there is no predictable or biased way in which the sample is drawn (say, picking 6 balls out of 45 in a covered box).

Some sampling schemes combine both (say, picking every odd-numbered decimal place of pi: 3.'1' 4 '1' 5 '9' 2 and so forth).

That case looks systematic, since it has a predictable interval, but it also behaves like a random one: the digits of pi show no apparent pattern, so unless you compute them you cannot predict the next digit.

In a word, predictability is the key difference between random sampling and systematic sampling.(0 votes)
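
The contrast in this answer can be sketched in a few lines of Python. This is an illustration only: the 24 hourly readings, the 4-hour interval, and the sample size of 6 are all hypothetical choices.

```python
import random

# Hypothetical: one humidity reading per hour, labelled by hour 0..23.
readings = list(range(24))

# Systematic sampling: a fixed, predictable interval (every 4th hour).
systematic = readings[::4]

# Simple random sampling: every set of 6 hours is equally likely.
rng = random.Random(1)  # seeded only so the example is repeatable
simple_random = sorted(rng.sample(readings, 6))

print(systematic)     # [0, 4, 8, 12, 16, 20]
print(simple_random)  # depends on the seed
```

Once you know the interval, you can list the systematic sample in advance. The random sample cannot be listed in advance without knowing the generator's seed.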

- I have exam questions that are hard to solve. Is there somewhere I can seek help? I could attach the 6-10 questions I need answered. Do you think I can get help with this?(1 vote)
- what website do you get these questions on?(1 vote)
- A magazine asks people to visit its website to vote for Australia's most popular TV star.

Why would this survey's sample be biased?(1 vote)
- Because it is limited to

(1) People who read that magazine

(2) People who have access to a computer.(0 votes)

- With the first problem, wouldn't it be better to ask the parents, because they would be most affected by the plan?(0 votes)
- The parents at the local playground are not representative of the entire district. In general, a random sample for a survey is almost always the best way to ensure against bias.(1 vote)

- what is the difference(0 votes)

## Video transcript

City Councilwoman Kelly wants to know how the residents of her district feel about a proposed school redistricting plan. Which of the following survey methods will allow Councilwoman Kelly to make a valid conclusion about how residents of her district feel about the proposed plan?

So before we even look at these, we just have to realize that if you're trying to make a valid conclusion about how the residents of her entire district feel about the proposed plan, she has to find a representative sample, not a skewed sample that would just sample parts of her district. So let's look at her choices.

Should she just ask her neighbors? Well, she might live in a part of the neighborhood that might unusually benefit from the redistricting plan or might get hurt by the redistricting plan. And so just her neighbors wouldn't be representative of the district as a whole. So just asking her neighbors probably does not make sense.

Ask the residents of Whispering Pines Retirement Community. So once again, the first one skews by geography. She's oversampling her neighbors and not the entire district. Here, she's oversampling a specific age demographic. So here she is oversampling older residents who might have very different opinions than middle-aged or younger residents. So that doesn't make sense either.

Ask 200 residents of her district whose names are chosen at random. Well, that seems reasonable. It does seem like there's some chance that you somehow oversampled one direction or another, but it's most likely to give a reasonably representative sample. And this is a pretty large sample size. So it's important to ask: what is the random process, and where is she getting these names from? But this actually does seem reasonable.

Ask a group of parents at the local playground. Well, once again, this is just like asking your neighbors. And it's also sampling a specific demographic. Now, this might be the demographic that cares most about the schools. But she wants to know how the whole district feels about the redistricting plan. And once again, this is at a local playground. This isn't at all of the playgrounds in the district somehow. So I wouldn't do this one either.

Let's do one more of these. Mimi wants to conduct a
survey of her 300 classmates to determine which candidate for class president, Napoleon Dynamite or Blair Waldorf, is in the lead in the upcoming election. Mimi will ask the question, "If the election were today, which candidate would get your vote?" Which of the following methods of surveying her classmates will allow Mimi to make valid conclusions about which candidate is in the lead?

So let's see, ask all of the students at Blair's lunch table? No. That would skew it in Blair's favor, probably. That's not a representative sample.

Ask all the members of Napoleon's soccer team? No, same thing. They're likely to go Napoleon's way, or maybe they don't like Napoleon and they'll go against Napoleon. But either way this seems like a skewed sample.

Put the names of all the students in a hat and draw 50 names. Ask those students whose names are drawn. Well, this seems like a nice random sample that could be nicely representative of the entire population.

Ask all students whose names begin with N or B? Well, this could be perceived as kind of random. But notice, N is the same starting letter as Napoleon, B is the same starting letter as Blair. You might say, well, that's fair. You're doing it for each of their letters. But maybe there are like 10 people whose names start with an N and only two people whose names start with a B. Once again, you're not even getting a large sample. And then on top of that, maybe people with the same starting letter somehow like each other more. So I would steer clear of this one.

Ask every student in the class? Well, that would work. There are 300 classmates, so that might not be that time consuming. You can't get a better sample than asking everyone in the population. Which of the following methods of surveying her classmates will allow Mimi to make a valid conclusion about which candidate is in the lead? Well, that would give a pretty good conclusion. People might change their minds, so it's not a done deal. But you can't get a better sample size than the entire population.

Assign numbers to each student in the class and use a computer program to generate 50 random numbers between 1 and 300. Ask those students whose numbers are selected. Well, this is pretty close to putting the names of all the students in a hat and drawing 50 names. So I would go with this one. That seems reasonable as well.
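
The last option can be sketched in Python as follows. This is a minimal illustration, assuming the standard library `random` module plays the role of the "computer program"; the video does not say which tool Mimi would use. Drawing without replacement is a deliberate choice here so that no student is picked twice, which mirrors drawing names from a hat.

```python
import random

CLASS_SIZE = 300
SAMPLE_SIZE = 50

rng = random.Random()  # unseeded: a different draw each run

# Draw 50 distinct student numbers from 1..300.
# random.sample draws without replacement, so no student is picked twice,
# just as a name leaves the hat once it is drawn.
selected = sorted(rng.sample(range(1, CLASS_SIZE + 1), SAMPLE_SIZE))

print(selected)
```

Generating 50 independent random numbers between 1 and 300 instead could repeat a number, leaving Mimi with fewer than 50 distinct students to ask.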