# Bernoulli distribution mean and variance formulas

Sal continues on from the previous video to derive the mean and variance formulas for the Bernoulli distribution. Created by Sal Khan.

## Want to join the conversation?

• If 0 & 1 are taken as arbitrary, why can't we take -1 & 1 instead? Wouldn't that result in a completely different set of formulas?

• Let's say we were using 0 & 1 with p = 0.6, so 1 - p = 0.4.
In this case µ = p = 0.6.
The distance from 0 to 0.6 is 0.6, so µ sits 60% of the way from 0 to 1.

Now let's say we were using -1 & 1 with p = 0.6 and 1 - p = 0.4.
In this case, µ = (1 - p)*(-1) + p*1 = 2p - 1 = 2*0.6 - 1 = 0.2.
The distance from -1 to 0.2 is |-1 - 0.2| = 1.2. The total distance from -1 to 1 is 2.
Note that 1.2/2 = 0.6, meaning that your µ still lies 60% of the way from -1 to 1. So the formulas do change (here µ = 2p - 1), but only because the scale was shifted and stretched; the underlying probabilities are the same.
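
The comparison in this reply can be checked numerically. The following is a minimal sketch, not anything from the video: the helper name `two_point_stats` and its parameters `low` and `high` are made up for illustration. It computes the probability-weighted mean and variance for a variable that takes one of two values, then expresses the mean as a fraction of the way from the low value to the high value.

```python
def two_point_stats(low, high, p):
    """Mean and variance of a variable that equals `high` with
    probability p and `low` with probability 1 - p.
    (Hypothetical helper, written only for this illustration.)"""
    mean = (1 - p) * low + p * high
    variance = (1 - p) * (low - mean) ** 2 + p * (high - mean) ** 2
    return mean, variance

p = 0.6
mean01, var01 = two_point_stats(0, 1, p)    # the usual 0/1 coding
mean11, var11 = two_point_stats(-1, 1, p)   # the -1/1 coding from the question

# Position of the mean as a fraction of the way from low to high
frac01 = (mean01 - 0) / (1 - 0)
frac11 = (mean11 - (-1)) / (1 - (-1))

print(mean01, var01, frac01)
print(mean11, var11, frac11)
```

Up to floating-point rounding, the means are 0.6 and 0.2, while both fractions come out to 0.6. The variance under the -1/1 coding is about 0.96, four times the 0.24 of the 0/1 coding, because the two values are twice as far apart.
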
• There's a problem with using Sal's simplified form for the variance, σ² = p(1 - p): it doesn't take into account the loss of a degree of freedom when calculating the sample variance s².

In the next video, for example, if you used the p(1 - p) formula to calculate s² you would get 24.51/100 = 0.2451 rather than the correct answer of 24.51/99 = 0.2476 as is shown in the video. Does this mean that the simplified formula should only be used when calculating the POPULATION variance and not the SAMPLE variance?

• No, the formulas µ = p and σ² = p(1 - p) are exact for the Bernoulli distribution. Similarly, when we get to the Binomial distribution and see µ = np and σ² = np(1 - p), these are exact for the Binomial distribution. They describe the distribution itself; the n - 1 divisor only comes in when you estimate the variance from a sample.

In practice, if we're going to make much use of these values, we will be doing an approximation of some sort anyway (e.g., assuming something follows a Normal distribution), so whether we divide by n or by n - 1 isn't really a concern here.
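
The two divisors in the question can be compared directly with Python's standard `statistics` module. This is only a sketch under an assumption: the sample of 43 ones and 57 zeros is hypothetical, chosen because it reproduces the 24.51 sum of squares quoted above. The comment itself does not say what the sample was.

```python
import statistics

# Hypothetical sample (an assumption, see above): 100 observations,
# 43 coded as 1 and 57 coded as 0.
data = [1] * 43 + [0] * 57

p_hat = statistics.mean(data)                  # sample proportion of 1s
sum_sq = sum((x - p_hat) ** 2 for x in data)   # sum of squared deviations

var_n = statistics.pvariance(data)             # divides by n
var_n_minus_1 = statistics.variance(data)      # divides by n - 1

print(p_hat, sum_sq)
print(var_n, p_hat * (1 - p_hat))
print(var_n_minus_1, sum_sq / (len(data) - 1))
```

Dividing by n matches p(1 - p) with p replaced by the sample proportion (about 0.2451), while dividing by n - 1 gives the slightly larger value of about 0.2476.
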
• How do we know which outcome should be 0 and which should be 1? In the .4 and .6 example, if we set the .6 outcome as 0 and the .4 outcome as 1, the mean would be .4 rather than .6. How do we select which becomes 0 and which becomes 1?

• We are calculating mu = (1-p)*0 + p*1. That simplifies to p.

Why didn't we calculate mu = p*0 + (1-p)*1, which equates to 1-p?

I am assuming that the 0 and 1 we are multiplying by are purely arbitrary.

• You could calculate mu with either equation. It depends on what the probability p stands for. When Sal calculated mu, p was the probability of a 1. In your 2nd equation you are using p as the probability of a 0.
So if we use the values that Sal used in the previous video
(probability of a 1 = ps = 0.6 and probability of a 0 = pf = 0.4), then...
mu = (1-ps)*0 + ps*1 = (1-.6)*0 + .6*1 = 0.6
mu = pf*0 + (1-pf)*1 = .4*0 + (1-0.4)*1 = 0.6
• In tossing a biased coin once, where heads is twice as likely to occur as tails, let X be the number of heads. Find the moment generating function of X, and hence the mean and variance of X.

• How come, when finding the mean, you do not have to put the whole equation over 2? Don't you have to divide by the number of terms?

• From what I could get, I think it is because the outcomes are not actual numbers. They're not strictly numerical, so we can't add them and then divide by the number of observations. For example, when flipping a coin 5 times, the outcome could be "HHTTT". These aren't numbers we can add and then divide by 5, but we can describe the result using a percentage: if we consider tails (T) a successful outcome, then we could say that we had 60% successes (3/5 = 0.6). Actually, if you analyse what a percentage is (number of something divided by the total), we can see that dividing by the total is the same as dividing by the number of terms. :)

If it was random numbers, for example "10, 3, 7, 2, 4", then it would be okay to find the mean ((10+3+7+2+4)/5). In the case of "HHTTT", it seems logical to explain it using a percentage.
• I don't understand why he writes the expected value as m rather than E(X). Is it the same thing or not?

• Given a normal distribution with µ = 100 and σ = 10, if you select a sample of n = 25, what is the probability that X̄ is less than 95?

• If 20 independent Bernoulli trials are carried out, each with a different probability of success (and therefore of failure), what is the standard deviation in this case? How is this calculated?

• The formula he is using for calculating the variance is not the standard formula, and it does NOT calculate the sum of squares of differences from the mean.

Sum of squares of distances from the mean would be this:

(0 - P)^2 + (1 - P)^2

And you divide that by N to get the variance. Instead what I'm seeing is this:

(1 - P)(0 - P)^2 + P(1 - P)^2

Someone please explain why we are multiplying each square by (what seems to me to be a totally arbitrary) factor?
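
One way to see where the factors come from is to list every member of a population instead of only the two distinct values. The sketch below is an illustration under assumptions, not something shown in the video: the population of 6 ones and 4 zeros and both function names are hypothetical. In a population of N members, the term (0 - P)^2 appears once per zero and (1 - P)^2 once per one, so dividing the full sum by N leaves the weights (1 - P) and P.

```python
from fractions import Fraction

def variance_by_listing(n_ones, n_zeros):
    """Sum of squared differences from the mean over every member
    of the population, divided by the population size N."""
    values = [1] * n_ones + [0] * n_zeros
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n

def variance_by_weights(p):
    """Probability-weighted sum of the two distinct squared differences."""
    return (1 - p) * (0 - p) ** 2 + p * (1 - p) ** 2

# Hypothetical population: 6 ones and 4 zeros, so P = 6/10.
p = Fraction(6, 10)
listed = variance_by_listing(6, 4)
weighted = variance_by_weights(p)
print(listed, weighted, listed == weighted)
```

Both approaches give 6/25, which is 0.24. Exact fractions are used here so the equality holds without floating-point rounding.
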
## Video transcript

In the last video we figured out the mean, variance and standard deviation for our Bernoulli distribution with specific numbers. What I want to do in this video is to generalize it. To figure out really the formulas for the mean and the variance of a Bernoulli distribution if we don't have the actual numbers. If we just know that the probability of success is p and the probability of failure is 1 minus p.

So let's look at this, let's look at a population where the probability of success-- we'll define success as 1-- as having a probability of p, and the probability of failure, the probability of failure is 1 minus p. Whatever this might be. And obviously, if you add these two up, if you view them as percentages, these are going to add up to 100%. Or if you add up these two values, they are going to add to 1. And that needs to be the case because these are the only two possibilities that can occur. If this is 60% chance of success there has to be a 40% chance of failure. 70% chance of success, 30% chance of failure.

Now with this definition of this-- and this is the most general definition of a Bernoulli distribution. It's really exactly what we did in the last video, I now want to calculate the expected value, which is the same thing as the mean of this distribution, and I also want to calculate the variance, which is the same thing as the expected squared distance of a value from the mean. So let's do that.

So what is the mean over here? What is going to be the mean? Well that's just the probability weighted sum of the values that this could take on. So there is a 1 minus p probability that we get failure, that we get 0. So there's 1 minus p probability of getting 0, so times 0. And then there is a p probability of getting 1, plus p times 1. Well this is pretty easy to calculate. 0 times anything is 0. So that cancels out. And then p times 1 is just going to be p. So pretty straightforward. The mean, the expected value of this distribution, is p.

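
Written out, the probability-weighted sum described above is a one-line calculation. The symbols X for the outcome and μ for the mean are notation assumed here; the transcript only speaks of "the mean" and "the expected value".

```latex
\mu = E[X] = (1 - p)\cdot 0 + p \cdot 1 = p
```
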
And p might be here or something. So once again it's a value that you cannot actually take on in this distribution, which is interesting. But it is the expected value.

Now what is going to be the variance? What is the variance of this distribution? Remember, that is the weighted sum of the squared distances from the mean. Now what's the probability that we get a 0? We already figured that out. There's a 1 minus p probability that we get a 0. So that is the probability part. And what is the squared distance from 0 to our mean? Well the squared distance from 0 to our mean-- let me write it over here-- it's going to be 0, that's the value we're taking on-- let me do that in blue since I already wrote the 0-- 0 minus our mean-- let me do this in a new color-- minus our mean. That's too similar to that orange. Let me do the mean in white. 0 minus our mean, which is p, plus the probability that we get a 1, which is just p-- this is the squared distance, let me be very careful. It's the probability weighted sum of the squared distances from the mean.

Now what's the distance-- now we've got a 1-- and what's the difference between 1 and the mean? It's 1 minus our mean, which is going to be p over here. And we're going to want to square this as well. This right here is going to be the variance.

Now let's actually work this out. So this is going to be equal to 1 minus p times-- now 0 minus p is going to be negative p. If you square it you're just going to get p squared. So it's going to be 1 minus p times p squared. Then plus p times-- what's 1 minus p, squared? 1 minus p, squared, is going to be 1 squared, which is just 1, minus 2 times the product of this. So this is going to be minus 2p right over here. And then plus negative p, squared. So plus p squared just like that.

And now let's multiply everything out. This is going to be, this term right over here is going to be p squared minus p to the third. And then this term over here, this whole thing over here, is going to be plus p times 1 is p.
p times negative 2p is negative 2p squared. And then p times p squared is p to the third.

Now we can simplify these. Negative p to the third cancels out with p to the third. And then we have p squared minus 2p squared. So this right here becomes, you have this p right over here, so this is equal to p. And then when you add p squared to negative 2p squared you're left with negative p squared, so minus p squared. And if you want to factor a p out of this, this is going to be equal to p times, if you take p divided by p you get a 1, p squared divided by p is p. So p times 1 minus p, which is a pretty neat, clean formula.

So our variance is p times 1 minus p. And if we want to take it to the next level and figure out the standard deviation, the standard deviation is just the square root of the variance, which is equal to the square root of p times 1 minus p.

And we could even verify that this actually works for the example that we did up here. Our mean is p, the probability of success. We see that indeed it was, it was 0.6. And we know that our variance is essentially the probability of success times the probability of failure. That's our variance right over there. The probability of success in this example was 0.6, probability of failure was 0.4. You multiply the two, you get 0.24, which is exactly what we got in the last example. And if you take its square root for the standard deviation, which is what we do right here, it's about 0.49.

So hopefully you found that helpful, and we're going to build on this later on in some of our inferential statistics.
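
For reference, the algebra spoken through above can be written as a single chain. The symbols σ² and σ for the variance and standard deviation are notation assumed here; the steps follow the transcript term by term, and the last lines repeat the check with p = 0.6.

```latex
\begin{aligned}
\sigma^2 &= (1 - p)(0 - p)^2 + p\,(1 - p)^2 \\
         &= (1 - p)\,p^2 + p\,(1 - 2p + p^2) \\
         &= p^2 - p^3 + p - 2p^2 + p^3 \\
         &= p - p^2 \\
         &= p\,(1 - p) \\[4pt]
\sigma   &= \sqrt{p\,(1 - p)} \\[4pt]
p = 0.6:\quad \sigma^2 &= 0.6 \times 0.4 = 0.24, \qquad \sigma = \sqrt{0.24} \approx 0.49
\end{aligned}
```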