## Statistics and probability



# Law of large numbers

Sal introduces the magic behind the law of large numbers. Created by Sal Khan.

## Want to join the conversation?

- Just to clarify it in my own mind: as the scale of n grows large, the width of any spike grows small, so its impact on the average approaches zero, while in general the 'height' will always hover around the expected value, so the sample mean will approach E(X) as n → infinity. Is that a good way to visualize it?
- Yes, exactly. Early spikes in values that move the average away from the true mean will be drowned out over time by many, many more samples, and the average will converge to the true mean. The best example of this is the Galton board: it's a physical example of a normal distribution appearing over time. You can see that any early spikes in the distribution disappear as more sand is poured through. http://www.youtube.com/watch?v=5_HVBhwhwV8
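The convergence described in this exchange is easy to see in a quick simulation (a minimal sketch in Python; the function name and seed are our own choices, not from the video):

```python
import random

random.seed(42)

def running_means(n_flips):
    """Return the running proportion of heads after each of n_flips fair flips."""
    heads = 0
    means = []
    for i in range(1, n_flips + 1):
        heads += random.random() < 0.5  # one fair coin flip
        means.append(heads / i)
    return means

means = running_means(100_000)
# Early averages can spike far from 0.5, but late averages hug it:
print(means[9], means[99], means[-1])
```

Any early spike is divided by an ever-larger n, so its contribution to the average shrinks toward zero, exactly as the question visualizes.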

- How can you prove the law of large numbers?
- I am pretty certain that this law was empirically derived. A lot of the early mathematicians who worked on probability would roll dice or flip coins for hours, and then painstakingly work through the data. I actually think that the Law of Large Numbers grew out of measurement theory, where scientists were struggling to find accurate numbers for physical constants.

Like calculus, which was also historically an intuitive subject rooted in observation, probability and statistics were "formalized" in the late 19th through the early 20th century.

- Please tell me if my understanding is correct or incorrect. It's really hard to explain, but I'll do my best. I apologize in advance if I confuse you.

Reading from Wikipedia, I understood that the gambler's fallacy states:

Let's say that after 99 flips of a fair coin that all turned out to be heads (very small probability, but still, let's assume that's the case), the probability of the 100th flip being tails is still 50%. But what I am trying to wrap my mind around is this:

Let's say you are going to flip a coin INDEFINITELY. So E(X) would be 50. After the FIRST 50 SEQUENTIAL flips you have, let's say, 45 heads and 5 tails. According to the law of large numbers, you have infinitely many flips left. Does that mean that at some region on that infinite graph you'll get to a point where you'll have 45 tails and 5 heads (not necessarily sequential draws), to even out the average value? Please remember that I am not talking about a finite number of draws; I'm talking about infinity. As I understand it, even if you don't know what the outcome will be on a particular draw, there will always be an opposite outcome to even that one out. Am I right?
- It is a strange idea, but **luminoustedium** is right: there does not need to be a particular 'balancing' effect. Think about it like this:

In your first 50 flips, you get 45 heads and 5 tails. Then for the rest of the flips (out to infinity), you get *exactly* half heads and half tails. So the number of heads is:

(n-50)/2 + 45 = n/2 + 20

That is, for n flips total, we have the 45 early heads, and of the remaining flips (n-50 of them), exactly half are heads. We can see that this is an *increasing function of n* (as n increases, so does the number of heads). Then for the proportion of heads, we divide by n:

(n/2 + 20)/n = 1/2 + 20/n

Now if we take n out to infinity (or take the "limit" over n, if you are familiar with calculus), the 1/2 stays as it is, but the 20/n gets smaller and smaller as n goes to infinity. The limit of this term is zero, meaning that the proportion of heads becomes just 1/2. So even though we had 40 more heads than tails, the proportion would still converge to 50%, because the early bias gets washed out by the sample size.

Now, in general it won't happen that the first 50 flips give 45 heads and 5 tails and the rest are evenly split, but the general idea remains the same: if we are generating from a known distribution (i.e., flipping a fair coin), then while there may be a bias in small samples, that bias will get washed out by the sample size when we take it up to infinity.
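The algebra in that answer can be checked numerically (a small sketch; `heads_proportion` is a hypothetical helper, not from the thread):

```python
def heads_proportion(n):
    """Proportion of heads after n total flips, assuming 45 heads in the
    first 50 flips and an exact half-and-half split in the remaining n - 50."""
    heads = 45 + (n - 50) / 2   # equals n/2 + 20
    return heads / n            # equals 1/2 + 20/n

for n in (50, 1_000, 1_000_000):
    print(n, heads_proportion(n))
```

The 20/n term is the entire early bias, and dividing it by an ever-growing n drives it to zero.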

- Why is the n-th sample n at 3:29? Is this a mistake?
- The sample mean is defined as ( X1 + X2 + X3 + ... + Xn ) / n

So yes, that was a mistake, I think, but I don't want to be a smarty pants. :D

- Sal: I am confused between "the number of draws" and the "sample size of a draw", which are not the same.

If there are x_1, x_2, ..., x_n and you draw x_1, x_2, x_3 and create a mean (divide by 3), then you draw x_1 ... x_4, sum them, and divide by 4, etc. Then you say as n --> infinity. How can n go to infinity? Are you saying n is the entire population? Let's take a different case here: say you draw 3 X's from a pool and create a mean; then put them back in, and draw another set of 3 X's.
- I think your problem is that a single x_k is an entire draw. So the x's don't represent things that you are drawing; each single x is the result of a full-size draw. In Sal's example, there are 100 coins (x's are NOT coins, so n is NOT 100). He 'draws' them all and gets 55 heads, so x_1 = 55. Then he has to start over and 'draw' all 100 coins AGAIN just to get x_2. So n is the number of times he 'draws' all 100 coins, and this is also the "sample size", but not the "sample size of a draw".
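The distinction in that answer can be made concrete with a short simulation (an illustrative sketch; the names and seed are our own):

```python
import random

random.seed(0)

def one_draw(num_coins=100):
    """One full 'draw': flip num_coins fair coins and count the heads."""
    return sum(random.random() < 0.5 for _ in range(num_coins))

# n is the number of draws, NOT the number of coins in a draw.
n = 10_000
xs = [one_draw() for _ in range(n)]   # x_1, ..., x_n, each a heads count
sample_mean = sum(xs) / n
print(sample_mean)  # close to the expected value, 50
```

Each x_k is itself the outcome of a full 100-coin experiment; it is the number of repetitions n, not the 100 coins, that goes to infinity in the law.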

- At 3:35, when Sal says that the sample mean converges on the population mean as the sample size (n) approaches infinity, would it not also be true when the sample size (n) approaches the population size (N) for a finite population?
- That's a great question! It seems to me that our statistical practices make the hidden assumption that we do our sampling "with replacement" as we said back in the probability days. If you look up the difference between the binomial distribution and the hypergeometric distribution, you might guess why mathematicians wanted to avoid that headache. :)

That assumption is an insignificant source of error as long as n is small relative to N, but you're right that you'd notice the difference as the size of the sample grows. The difference becomes very stark when the sample size equals the population size: without replacement you know you've tested the entire population, but with replacement you are all but certain that you've double-checked some people and ignored others, so it doesn't precisely reflect the population for any finite sample size.
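A small experiment illustrates that point (a sketch with a made-up population; `random.sample` draws without replacement, `random.choices` with replacement):

```python
import random

random.seed(3)

# A made-up finite population of N = 500 measurements.
population = [random.gauss(50, 10) for _ in range(500)]
pop_mean = sum(population) / len(population)

# Without replacement, a sample of size N is the whole population,
# so its mean matches the population mean exactly.
no_repl = random.sample(population, k=len(population))
mean_no_repl = sum(no_repl) / len(no_repl)

# With replacement, a sample of size N double-counts some members
# and misses others, so its mean generally differs slightly.
with_repl = random.choices(population, k=len(population))
mean_with_repl = sum(with_repl) / len(with_repl)

print(mean_no_repl - pop_mean)    # zero (up to float rounding)
print(mean_with_repl - pop_mean)  # small but typically nonzero
```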

- At 2:41 it is said that the expected value is the number of trials multiplied by the probability of success. So by this we arrive at E(X) = 100 * 0.5 = 50.

But in the expected value lesson it was never discussed that the expected value can be computed this way. The only thing mentioned about calculating the expected value was that it is the sum of the products of each outcome and its probability, where the outcomes and their probabilities can be derived either by experiment or by sampling.

So how can we say that the expected value will be 50?
- I have a feeling that the videos used to be in a different order. He covers it in this video:

https://www.khanacademy.org/math/probability/random-variables-topic/binomial_distribution/v/expected-value-of-binomial-distribution

The basic idea is this:

For just 1 coin flip, it's easy to get E[X], right? Then note that tossing a coin 100 times and counting the number of heads is the same as adding up the number of heads from 100 single coin tosses.

Secondly, the expected value distributes over addition/subtraction. So if we have two random variables X and Y, E[X + Y] = E[X] + E[Y]. So if we have 100 tosses we're adding up, we can just multiply 100 by the expected value of a single toss.
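That reasoning can be written out directly (a tiny sketch; `expected_value` is our own helper, not from the lesson):

```python
def expected_value(outcomes_and_probs):
    """E[X] = sum of outcome * probability over all outcomes."""
    return sum(x * p for x, p in outcomes_and_probs)

# One fair flip, counting heads: outcome 1 (heads) or 0 (tails).
single_flip = expected_value([(1, 0.5), (0, 0.5)])

# By linearity of expectation, E[X1 + ... + X100] = 100 * E[X1].
hundred_flips = 100 * single_flip
print(single_flip, hundred_flips)  # 0.5 50.0
```

The first line is the "sum of outcome times probability" definition from the earlier lesson; the last line is the n * p shortcut Sal uses at 2:41.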

- Is this the same as the weak law of large numbers?
- The weak and strong laws of large numbers are similar, but not the same. You must know about different modes of convergence (from measure theory or some higher analysis course). Basically, the "formula" is the same, but in the weak law you get convergence in probability, whereas in the strong law you get almost sure convergence. Almost sure convergence implies convergence in probability, so it is "stronger" than convergence in probability. It's hard to explain without using measure theory.

- At numerous points in this video (e.g., 3:47), Sal explains that knowing the past results of a series of experiments will not help with predicting future results. Does this also hold true for a situation where the experiments have already been performed, but there are so many of them that the count is effectively infinite? For example, a random number generator (like a coin-flipping machine or a thermal sensor that outputs the least significant digit of its measurement) fills a terabyte hard drive with 0s and 1s, which is about 8 million million bits. We then choose an address each minute (a different address each time) that points us to one of these bits. With this setup, it would take more than 15 years to reveal even a millionth of the total bits stored. Is there a way, then, to guess whether the bit is 0 or 1 noticeably better than 50/50 before the end of the world?
- Nope. Not if the number generator really was random.

- Why is this called a "Law"?
- Actually, there's no special reason.

In mathematics, you can use the terms "theorem, proposition, law, rule, identity, principle, algorithm" almost interchangeably. So, you could call it the "Theorem of Large Numbers", too. Mathematicians just decided to call it a law because it has a nice ring to it, really.

It has nothing to do with the fact that "probabilities obey it". Any statement that is mathematically proven to be true, such as the LLN, can be called a law, or a theorem, or whatever.

## Video transcript

Let's learn a little bit about
the law of large numbers, which is on many levels, one of the
most intuitive laws in mathematics and in
probability theory. But because it's so applicable
to so many things, it's often a misused law or sometimes,
slightly misunderstood. So just to be a little bit
formal in our mathematics, let me just define it for you first
and then we'll talk a little bit about the intuition. So let's say I have a
random variable, X. And we know its expected value
or its population mean. The law of large numbers just
says that if we take a sample of n observations of our random
variable, and if we were to average all of those
observations-- and let me define another variable. Let's call that x sub n
with a line on top of it. This is the mean of n
observations of our random variable. So it's literally this is
my first observation. So you can kind of say I run
the experiment once and I get this observation and I run it
again, I get that observation. And I keep running it n times
and then I divide by my number of observations. So this is my sample mean. This is the mean of all the
observations I've made. The law of large numbers just
tells us that my sample mean will approach my expected
value of the random variable. Or I could also write it as my
sample mean will approach my population mean for n
approaching infinity. And I'll be a little informal
with what does approach or what does convergence mean? But I think you have the
general intuitive sense that if I take a large enough sample
here that I'm going to end up getting the expected value of
the population as a whole. And I think to a lot of us
that's kind of intuitive. That if I do enough trials that
over large samples, the trials would kind of give me the
numbers that I would expect given the expected value and
the probability and all that. But I think it's often a little
bit misunderstood in terms of why that happens. And before I go into
that let me give you a particular example. The law of large numbers will
just tell us that-- let's say I have a random variable-- X is
equal to the number of heads after 100 tosses of a fair
coin-- tosses or flips of a fair coin. First of all, we know what
the expected value of this random variable is. It's the number of tosses,
the number of trials times the probabilities of
success of any trial. So that's equal to 50. So the law of large numbers
just says if I were to take a sample or if I were to average
the sample of a bunch of these trials, so you know, I get-- my
first time I run this trial I flip 100 coins or have 100
coins in a shoe box and I shake the shoe box and I count the
number of heads, and I get 55. So that would be X1. Then I shake the box
again and I get 65. Then I shake the box
again and I get 45. And I do this n times and then
I divide it by the number of times I did it. The law of large numbers just
tells us that this the average-- the average of all
of my observations, is going to converge to 50 as n
approaches infinity. Or for n approaching 50. I'm sorry, n
approaching infinity. And I want to talk a little
bit about why this happens or intuitively why this is. A lot of people kind of feel
that oh, this means that if after 100 trials that if I'm
above the average that somehow the laws of probability are
going to give me more heads or fewer heads to kind of
make up the difference. That's not quite what's
going to happen. That's often called the
gambler's fallacy. Let me differentiate. And I'll use this example. So let's say-- let
me make a graph. And I'll switch colors. This is n, my x-axis is n. This is the number
of trials I take. And my y-axis, let me make
that the sample mean. And we know what the expected
value is, we know the expected value of this random
variable is 50. Let me draw that here. This is 50. So just going to
the example I did. So when n is equal to--
let me just [INAUDIBLE] here. So my first trial I got 55
and so that was my average. I only had one data point. Then after two trials,
let's see, then I have 65. And so my average is going to
be 65 plus 55 divided by 2, which is 60. So then my average
went up a little bit. Then I had a 45, which
will bring my average down a little bit. I won't plot a 45 here. Now I have to average
all of these out. What's 45 plus 65? Let me actually just
get the number just so you get the point. So it's 55 plus 65. It's 120 plus 45 is 165. Divided by 3. 3 goes into 165 5--
5 times 3 is 15. It's 53. No, no, no. 55. So the average goes
down back down to 55. And we could keep
doing these trials. So you might say that the law
of large numbers tell this, OK, after we've done 3 trials
and our average is there. So a lot of people think that
somehow the gods of probability are going to make it more
likely that we get fewer heads in the future. That somehow the next couple of
trials are going to have to be down here in order to
bring our average down. And that's not
necessarily the case. Going forward the probabilities
are always the same. The probabilities are
always 50% that I'm going to get heads. It's not like if I had a bunch
of heads to start off with or more than I would have expected
to start off with, that all of a sudden things would be made
up and I would get more tails. That would be the
gambler's fallacy. That if you have a long streak
of heads or you have a disproportionate number of
heads, that at some point you're going to have-- you have
a higher likelihood of having a disproportionate
number of tails. And that's not quite true. What the law of large numbers
tells us is that it doesn't care-- let's say after some
finite number of trials your average actually-- it's a low
probability of this happening, but let's say your average
is actually up here. Is actually at 70. You're like, wow, we really
diverged a good bit from the expected value. But what the law of large
numbers says, well, I don't care how many trials this is. We have an infinite
number of trials left. And the expected value for that
infinite number of trials, especially in this type of
situation is going to be this. So when you average a finite
number that averages out to some high number, and then an
infinite number that's going to converge to this, you're going
to over time, converge back to the expected value. And that was a very informal
way of describing it, but that's what the law of
large numbers tells you. And it's an important thing. It's not telling you that if
you get a bunch of heads that somehow the probability of
getting tails is going to increase to kind of
make up for the heads. What it's telling you is, is
that no matter what happened over a finite number of trials,
no matter what the average is over a finite number of
trials, you have an infinite number of trials left. And if you do enough of them
it's going to converge back to your expected value. And this is an important
thing to think about. But this is used in practice
every day with the lottery and with casinos because they know
that if you do large enough samples-- and we could even
calculate-- if you do large enough samples, what's the
probability that things deviate significantly? But casinos and the lottery
every day operate on this principle that if you take
enough people-- sure, in the short-term or with a few
samples, a couple people might beat the house. But over the long-term the
house is always going to win because of the parameters of
the games that they're making you play. Anyway, this is an important
thing in probability and I think it's fairly intuitive. Although, sometimes when you
see it formally explained like this with the random variables
and that it's a little bit confusing. All it's saying is that as you
take more and more samples, the average of that sample is going
to approximate the true average. Or I should be a little
bit more particular. The mean of your sample is
going to converge to the true mean of the population or to
the expected value of the random variable. Anyway, see you in
the next video.