
## Statistics and probability

### Course: Statistics and probability > Unit 9

Lesson 8: More on expected value

# Law of large numbers

Sal introduces the magic behind the law of large numbers. Created by Sal Khan.

## Want to join the conversation?

• How can you prove the law of large numbers?

• I am pretty certain that this law was empirically derived. A lot of the early mathematicians who worked on probability would roll dice or flip coins for hours... and then painstakingly work through the data. I actually think that the Law of Large Numbers grew out of Measurement Theory, where scientists were struggling to find accurate numbers for physical constants.

Like calculus -- which was also historically an intuitive subject rooted in observation -- probability and statistics were "formalized" in the late 19th through the early 20th century.
• Please tell me if my understanding is correct or incorrect. It's really hard to explain, but I'll do my best. I apologize in advance if I confuse you.
Reading Wikipedia, I understood that the gambler's fallacy states the following:
Let's say that after 99 flips of a fair coin, all of them turned out to be heads (a very small probability, but still, let's assume that's the case). The probability of the 100th flip being tails is still 50%. But what I am trying to wrap my mind around is this:

Let's say you are going to flip a coin INDEFINITELY. So E(X) would be 50 per 100 flips. After the FIRST 50 SEQUENTIAL flips you have, let's say, 45 heads and 5 tails. According to this law of large numbers, you have infinity. That means that at some region of that infinite graph, you'll get to a point where you'll have 45 tails and 5 heads (not necessarily sequential draws) to even out the average value. Is that correct? Remember, I am not talking about a finite number of draws; I'm talking about infinity. As I understand it, even if you don't know what the outcome of a particular draw will be, there will always be an opposite outcome to even it out. Am I right?

• It is a strange idea, but luminoustedium is right: there does not need to be a particular 'balancing' effect. Think about it like this:

In your first 50 flips, you get 45 heads and 5 tails. Then for the rest of the flips (up to infinity), you get exactly half heads and half tails. So the number of heads is:

(n-45)/2 + 45 = n/2 + 22.5

That is, for n flips total, we have the 45 heads, and for the rest of them (n-45), we have exactly half of them as heads. We can see that this is an increasing function of n (as n increases, so does the number of heads). Then for the proportion of heads, we divide by n:

(n/2 + 22.5)/n = 1/2 + 22.5/n

Now if we take n out to infinity (or take the "limit" over n, if you are familiar with calculus), the 1/2 stays as it is, but the 22.5/n gets smaller and smaller as n goes to infinity. The limit of this term (when n is "equal" to infinity) is zero, meaning that the proportion of heads becomes just 1/2. So even though we had 45 more heads than tails, the long-run proportion would still be 50%, because the slight bias gets washed out by the sample size.

Now, in general it won't happen that the first 50 will be 45 heads and 5 tails, and the rest will be evenly split, but the general idea remains the same: if we are generating from a known distribution (i.e. flipping a fair coin), then while there may be a bias in small samples, that bias will get washed out by the sample size when we take it up to infinity.
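This washing-out can be checked with a quick simulation (a sketch, not from the video; the seed is an arbitrary choice for reproducibility): start from the lopsided 45-heads-in-50-flips opening, keep flipping a fair coin, and watch the running proportion drift toward 1/2.

```python
import random

random.seed(42)

# Start from the scenario above: 45 heads in the first 50 flips, then keep
# flipping a fair coin and track the running proportion of heads.
heads, flips = 45, 50
for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    while flips < n:
        heads += random.random() < 0.5  # True counts as 1
        flips += 1
    print(f"n = {flips:>9,}  proportion of heads = {heads / flips:.4f}")
```

No later flips "compensate" for the early surplus; the surplus simply becomes negligible relative to n.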
• When Sal says that the sample mean converges on the population mean as the sample size (n) approaches infinity, would it not also be true when the sample size (n) approaches the population size (N) for a finite population?

• That's a great question! It seems to me that our statistical practices make the hidden assumption that we do our sampling "with replacement," as we said back in the probability days. If you look up the difference between the binomial distribution and the hypergeometric distribution, you might guess why mathematicians wanted to avoid that headache. :)

That assumption is an insignificant source of error as long as n is small relative to N, but you're right that you'd notice the difference as the size of the sample grows. That difference becomes very stark when the sample size equals the population size: without replacement you know you've tested the entire population, but with replacement you are all but certain that you've double-checked some people and ignored others, so the sample doesn't precisely reflect the population for any finite sample size.
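The with-versus-without-replacement point can be illustrated with a small sketch (the 100-item half-and-half population is a made-up example, not from the video):

```python
import random
import statistics

random.seed(0)

# Made-up finite population: N = 100 coin outcomes, exactly half of them heads (1).
population = [1] * 50 + [0] * 50

# Sampling WITHOUT replacement with n == N recovers the population mean exactly,
# because the sample is just a permutation of the whole population.
without_repl = random.sample(population, k=len(population))
print(statistics.mean(without_repl))  # 0.5 exactly

# Sampling WITH replacement repeats some items and misses others, so even
# with n == N the sample mean only approximates the population mean.
with_repl = random.choices(population, k=len(population))
print(statistics.mean(with_repl))
```

This is the binomial (with replacement) versus hypergeometric (without replacement) distinction mentioned above, seen through the sample mean.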
• In the video it is said that expected value is the number of trials multiplied by the probability of success. By this we arrive at E(X) = 100 * (0.5) = 50.

But in the Expected value lesson it was never discussed that expected value can be calculated in such a way. The only thing mentioned about calculating expected value was that it is the sum of the products of each outcome and its probability. The values of the outcomes and their probabilities are random and can be derived either by experiment or by sampling.

So how can we say that the expected value will be 50?

• Is this the same as the weak law of large numbers?

• The weak and strong laws of large numbers are similar, but not the same. You must know about different modes of convergence (from measure theory or some higher analysis course). Basically, the "formula" is the same, but in the weak law you get convergence in probability, whereas in the strong law you get almost sure convergence. Almost sure convergence implies convergence in probability, so it is "stronger" than convergence in probability. It's hard to explain without using measure theory....
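On the earlier question about why E(X) = 100 · 0.5 = 50: the shortcut n·p is consistent with the definition from the Expected value lesson (sum of each outcome times its probability) when X counts successes in n independent trials, since P(X = k) follows the binomial distribution. A quick numerical check of that consistency, as a sketch:

```python
from math import comb

# Definition from the lesson: E(X) = sum over outcomes k of k * P(X = k).
# For X = number of heads in n = 100 fair flips, P(X = k) = C(n, k) p^k (1-p)^(n-k).
n, p = 100, 0.5
expected = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(round(expected, 6))  # 50.0, matching the shortcut n * p
```

The same identity can be proved in general by writing X as a sum of n indicator variables, each with expectation p, and using linearity of expectation.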
• At numerous points in this video, Sal explains that knowing the past results of a series of experiments will not help with predicting future results. Does this also hold true for a situation where the trials have already been performed but there are so many of them that the count is effectively infinite? For example, a random number generator (like a coin-flipping machine, or a thermal sensor that outputs the least significant digit of its measurement) fills a terabyte hard drive with 0s and 1s, which is about 8 million million bits. We then choose an address each minute (a different address each time) that points us to one of these bits. With this setup, it would take more than 15 years to reveal even a millionth of the total bits stored. Is there a way, then, to make a guess whether the bit is 0 or 1 that is noticeably better than 50/50 before the end of the world?

• Why is this called a "Law"?

• Actually, there's no special reason.

In mathematics, you can use the terms "theorem, proposition, law, rule, identity, principle, algorithm" almost interchangeably. So, you could call it the "Theorem of Large Numbers", too. Mathematicians just decided to call it a law because it has a nice ring to it, really.

It has nothing to do with the fact that "probabilities obey it." Any statement that is mathematically proven to be true, such as the LLN, can be called a law, or a theorem, or whatever.
• OK, so I get that the law of large numbers refers to the number of trials: as it gets higher and higher, the sample mean approaches E(X). But does this refer to the number of trials rather than to the sample size? So as n -> infinity, does the sample mean approach E(X) regardless of whether the sample size is 2 or 100 (as in this coin example)?