
# Random number list to run experiment

Using a list of random numbers to simulate multiple trials of an experiment.

## Want to join the conversation?

- I understand that the more experiments we conduct, we get closer to the theoretical average, but how do you find the theoretical average for the question that Sal solved?(13 votes)
- If we already have 𝑛 different prizes, then the probability of getting a prize we don't have from the next box of cereals is

𝑝(𝑛) = 1 − 𝑛∕6 = (6 − 𝑛)∕6

The average number of trials until we get a successful trial is

𝑡(𝑛) = 1∕(𝑝(𝑛)) = 6∕(6 − 𝑛)

So, starting with zero prizes, the average number of trials until we have all six prizes is

𝑡(0) + 𝑡(1) + 𝑡(2) + ... + 𝑡(5) =

= 6∕(6 − 0) + 6∕(6 − 1) + 6∕(6 − 2) + ... + 6∕(6 − 5) =

= 14.7(34 votes)
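The sum in the answer above can be checked with exact fractions. This is a minimal sketch, assuming 6 equally likely prizes as in the problem; the function name `expected_boxes` is made up for illustration and does not come from the video.

```python
from fractions import Fraction


def expected_boxes(num_prizes: int) -> Fraction:
    """Add up t(n) = N / (N - n) for n = 0 .. N - 1, where N = num_prizes."""
    return sum(
        (Fraction(num_prizes, num_prizes - n) for n in range(num_prizes)),
        Fraction(0),
    )


exact = expected_boxes(6)
print(exact)         # 147/10
print(float(exact))  # 14.7
```

Using `Fraction` avoids rounding while the terms are added, so the result matches the 14.7 quoted in the answer exactly.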

- What is the intuition behind choosing numbers between 0-9 to run the simulation experiment?(11 votes)
- The random numbers generated are going to be 0-9. By using all digits, randomness is ensured. Since all ten digits will be used in the generation, and only those digits, you use 0-9 to run the simulation. I hope that helps!(9 votes)

- So is there a rule of how many times we need to run the experiment to get the average number?(6 votes)
- I am guessing that the law of large numbers applies, as Sal has shared in previous videos: https://en.wikipedia.org/wiki/Law_of_large_numbers(6 votes)

- Isn't there any other way to calculate this probability? Do we need to run a random digit table and count 1 to 6 for every experiment and then find the average ? This will be tiresome. Mathematics is to make things easy so how is it helpful?(6 votes)
- For this one, no, because we know a fair 6-sided die can only get us numbers 1-6, and three rolls of the die will get us combinations of numbers from 1-6.

The solution: get a table, and write down every single possible outcome, from 111, 112...to 666. Then, circle the outcomes where you win, divide by the number of total possible outcomes. It turns out to be 50%.

And further in the course you will learn combinations and permutations, which deals with exactly this kind of stuff.(5 votes)

- Hi Sal, why don't you look at the invalid numbers? In real life she does not know whether the box has a prize or not, and she chooses a box unknowingly, so I would say the first experiment is 12 boxes out of the total number of boxes. Am I right, or what point am I missing? :-) I mean, I can't go to Walmart and return a box because it has no prize...

Dine(5 votes)
- There are only 6 prizes and every box has a prize. The invalid numbers do not represent a box without a prize; they are simply invalid because they are not needed for how the problem is constructed. If the problem had stated there were only 3 prizes, then we would ignore any number greater than 3 in the random digit table, because it would have no meaning in the context of the problem.(4 votes)

- Is there any way to find the number of boxes scientifically?

It seems like using this method is really slow. Plus, what if no prizes were invalid?(4 votes)

- So using a random sequence simulation is only valid where the samples have an equal likelihood?(3 votes)

- How do you know how many experiments to do or what to stop at?(2 votes)
- The more experiments you do, the more accurate your probability will be. So, it really depends on you, so you can balance accuracy and your convenience.(2 votes)
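To see that trade-off between accuracy and effort, the sketch below repeats the cereal-box simulation for different numbers of experiments and prints the average each time. It assumes Python's `random` module stands in for the random digit table; the function names, the seed, and the chosen experiment counts are all illustrative choices, not part of the lesson.

```python
import random
import statistics


def boxes_until_complete(num_prizes, rng):
    """Open simulated boxes until every prize has been seen once."""
    seen = set()
    boxes = 0
    while len(seen) < num_prizes:
        seen.add(rng.randint(1, num_prizes))  # each prize equally likely
        boxes += 1
    return boxes


def average_boxes(num_experiments, num_prizes=6, seed=0):
    """Average number of boxes over num_experiments simulated experiments."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    counts = [boxes_until_complete(num_prizes, rng) for _ in range(num_experiments)]
    return statistics.mean(counts)


for n in (3, 30, 300, 3000, 30000):
    print(n, "experiments -> average", round(average_boxes(n), 2))
```

With only a handful of experiments the average can land far from 14.7. The typical error shrinks roughly in proportion to one over the square root of the number of experiments, so each extra digit of accuracy costs about 100 times more runs.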

- I made this in Python if you guys want to try it out. In this case, I wanted to find out how many times I have to roll a die to get all 6 numbers. It's experimental and serves the same purpose.

      # The setup:
      import random

      all_rolls = []  # number of rolls needed in each experiment
      max_items = 6   # sides on the die (one per prize)
      max_items_list = list(range(1, max_items + 1))


      def dice_function():
          results = []
          throws = 20000  # safety cap on rolls in one experiment
          dice_sides = max_items_list
          i = 0
          while i < throws:
              dice_throw = random.randint(1, max_items)
              results.append(dice_throw)
              # Stop as soon as every side has shown up at least once.
              if all(elem in results for elem in dice_sides):
                  break
              i += 1
          all_rolls.append(len(results))

          # The layout:
          print("IT TOOK " + str(len(results)) + " ITERATIONS TO COMPLETE THE SET:")
          print(results)
          print(" ")
          print("ALL ROLLS: " + str(all_rolls))
          print("MEAN: " + str(sum(all_rolls) / len(all_rolls)))
          print(" ")
          print(" ")
          print(" ----------------------------------------------------------- ")
          print(" ")
          print(" ")


      # Run 1000 experiments; the running mean is printed after each one.
      x = 0
      while x < 1000:
          dice_function()
          x += 1

  (2 votes)

- Would it be accurate with only 3 experiments?(2 votes)

## Video transcript

- [Instructor] So we're told that Amanda Young wants to win some prizes. A cereal company is giving away a prize in each box of cereal, and they advertise, "Collect all six prizes." Each box of cereal has one prize, and each prize is equally likely to appear in any given box. Amanda wonders how many boxes it takes on average to get all six prizes.

So there are several ways to approach this for Amanda. She could try to figure out a mathematical way to determine the expected number of boxes she would need to collect on average to get all six prizes, or she could run some random numbers to simulate collecting box after box after box and figure out, over multiple trials, how many boxes it takes to win all six prizes.

So for example, she could say, "All right, each box is gonna have one of six prizes," so she could assign a number for each of the prizes, one, two, three, four, five, six, and then she could have a computer generate a random string of numbers, maybe something that looks like this. As the general method, she could start at the left here, and for each new number she gets she can say, "Hey, this is like getting a cereal box, and it's going to tell me which prize I got."

So she starts her first experiment. She'll start here at the left, and she'll say, "Okay, the first cereal box of this experiment, of this simulation, I got prize number one," and she'll keep going. The next one, she gets prize number five. Then the third one, she gets prize number six. Then the fourth one, she gets prize number six again, and she will keep going until she gets all six prizes.

You might say, "Well, look. There are numbers here that aren't one through six." There's zero. There's seven. There's eight or nine. Well, for those numbers, she could just ignore them. She could just pretend like they aren't there and just keep going past them.

So why don't you pause this video and do it for the first experiment? On this first experiment, using these numbers, assuming that this is the first box that you are getting in your simulation, how many boxes would you need in order to get all six prizes?

So let me make a table here. So this is the experiment, and then in the second column, I'm gonna say number of boxes, the number of boxes you would have to get in that simulation. So maybe I'll do the first one in this blue color. So we're in the first simulation, so one box, we got the one. Actually, maybe I'll check things off, so we have to get a one, a two, a three, a four, a five, and a six.

So let's see. We have a one. I'll check that off. We have a five. I'll check that off. We get a six. I'll check that off. Well, the next box we got another six. We've already had that prize, but we're gonna keep getting boxes. Then the next box, we get a two. Then the next box, we get a four. Then the next box, the number is a seven, so we will just ignore this right over here. The box after that, we get a six, but we already have that prize. Then we ignore the next box, zero. That doesn't give us a prize. We assume that that didn't even happen, and then we would go to the number three, which is the last prize that we need.

So how many boxes did we have to go through? Well, we would only count the valid ones, the ones that gave a valid prize between the numbers one through six, including one and six. So let's see. We went through one, two, three, four, five, six, seven, eight boxes in the first experiment, so in experiment number one, it took us eight boxes to get all six prizes.

Let's do another experiment 'cause this doesn't tell us that on average she would expect eight boxes. This just meant that on this first experiment it took eight boxes. If you wanted to figure out the average, you want to do many experiments, and the more experiments you do, the better that average is going to... the more likely that your average is going to predict what it actually takes on average to get all six prizes.

So now let's do our second experiment, and remember, it's important that these are truly random numbers, and so we will now start at the first valid number. So we have a two, so this is our second experiment, and we got a two. We got a one. We can ignore this eight. Then we get a two again. We've already had that prize. Ignore the nine. Five, that's a prize we need in this experiment. Nine, we can ignore, and then four, haven't gotten that prize yet in this experiment. Three, haven't gotten that prize yet in this experiment. One, we already got that prize. Three, we already got that prize. Three, already got that prize. Two, two, already got those prizes. Zero. We already got all of these prizes over here. We can ignore the zero. Already got that prize, and finally, we get prize number six.

So how many boxes did we need in that second experiment? Well, let's see. One, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17 boxes, so in experiment two, I needed 17, or Amanda needed 17 boxes, and she can keep going.

Let's do this one more time. This is strangely fun. So experiment three, and remember, we only want to look at the valid numbers. We'll ignore the invalid numbers, the ones that don't give us a valid prize number. So four, we get that prize. These are all invalid, in fact, and then we go to five. We get that prize. Five, we already have it. We get the two prize. Seven and eight are invalid. Seven's invalid. Six, we get that prize. Seven's invalid. One, we got that prize. One, we already got it. Nine's invalid. Two, we already got it. Nine is invalid. One, we already got the one prize, and then finally, we get prize number three, which was the missing prize.

So how many boxes, valid boxes, did we have to go through? Let's see. One, two, three, four, five, six, seven, eight, nine, 10. 10, so with only three experiments, what was our average? Well, with these three experiments, our average is going to be eight plus 17 plus 10 over three, so let's see. This is 25, 35 over three, which is equal to 11 2/3.

Now, do we know that this is the true theoretical expected number of boxes that you would need to get? No, we don't know that, but the more experiments we run, the closer our average is likely to get to the true theoretical average.