### Course: Statistics and probability > Unit 9

Lesson 6: Binomial mean and standard deviation formulas

- Mean and variance of Bernoulli distribution example
- Bernoulli distribution mean and variance formulas
- Expected value of a binomial variable
- Variance of a binomial variable
- Finding the mean and standard deviation of a binomial random variable
- Mean and standard deviation of a binomial random variable

© 2024 Khan Academy

# Variance of a binomial variable

We can derive the variance of a Bernoulli variable to be p(1-p), so the variance of a binomial variable (the sum of n such independent trials) is np(1-p), and the standard deviation is the square root of the variance.
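As a sanity check (not part of the original lesson), the formula Var(X) = np(1 − p) can be verified by simulation; the values n = 10 and p = 0.3 below are illustrative choices matching the free-throw example later in the video:

```python
import random

def simulate_binomial(n, p, rng):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sample_variance(samples):
    """Population-style variance of a list of samples."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)

rng = random.Random(0)          # seeded for reproducibility
n, p = 10, 0.3                  # illustrative parameters
samples = [simulate_binomial(n, p, rng) for _ in range(200_000)]

formula = n * p * (1 - p)       # np(1-p) = 2.1
estimate = sample_variance(samples)
print(formula, round(estimate, 2))
```

With 200,000 simulated values the sample variance lands very close to the formula's 2.1.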

## Want to join the conversation?

- I think "X = sum Y = nY" misleads people,

since Y is a random variable,

so the sum of Y can't equal n*Y (n is a constant).

To me it feels like Var(X) = Var(sum Y) = Var(nY), which is an incorrect idea:

Var(sum Y) = sum Var(Y) while Var(nY) = n^2 * Var(Y), so they can't be equal.

It would be better to make clear that "X = sum Y" and "Var(X) = Var(sum Y)".

Thanks, and sorry for my bad English.(13 votes)
- Yes, Sal's terminology is a bit sloppy...

It would be clearer that 𝑋 is the sum of independent instances of 𝑌 if he had said

𝑋 = 𝑌(1) + 𝑌(2) + 𝑌(3) + ... + 𝑌(𝑛) = ∑𝑌(𝑖)

Then,

𝐸(𝑋) = 𝐸(∑(𝑌(𝑖))) = ∑𝐸(𝑌(𝑖)) = 𝑛 ∙ 𝐸(𝑌)

Similarly,

Var(𝑋) = 𝑛 ∙ Var(𝑌)(19 votes)
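The identities in this answer, E(X) = n·E(Y) and Var(X) = n·Var(Y), can be checked exactly from the binomial pmf (an editorial illustration; the choice n = 10, p = 0.3 is arbitrary):

```python
from math import comb

n, p = 10, 0.3  # illustrative parameters

# Exact binomial pmf: P(X = k) = C(n, k) p^k (1-p)^(n-k)
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

bern_mean = p            # E(Y) for a single Bernoulli trial
bern_var = p * (1 - p)   # Var(Y) for a single Bernoulli trial

print(mean, n * bern_mean)   # both 3.0 (up to floating-point error)
print(var, n * bern_var)     # both 2.1 (up to floating-point error)
```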

- What does the variance of a binomial variable describe? Why is the variance of a binomial variable important?(4 votes)
- The variance of a binomial variable describes the spread or variability of the distribution around the mean (expected value). It gives us an idea of how dispersed the outcomes are from the expected number of successes. In practical terms, it helps in understanding the reliability or predictability of the outcomes. For example, a higher variance indicates more unpredictability in the number of successes across trials.(2 votes)

- The variance of aY is a^2 * Var(Y). How does that square with the derivation?(1 vote)
- This was exactly my question!... if X = nY, then Var(X) = Var(nY) = n^2 * Var(Y)... so Sal needs to explain his steps...(5 votes)

- Couldn't this be simplified by saying the variance of a binomial variable is the variance of a Bernoulli distribution multiplied by n trials? The reason being the variance addition property.

At 5:10 Sal says "indeed the variance for a binomial variable." I think he means the variance of a Bernoulli distribution?(3 votes)
- Yes, that simplification captures the essence of the derivation. Since a binomial variable X can be thought of as the sum of n independent Bernoulli trials (each with variance p(1 − p)), the variance of X is n times the variance of a single Bernoulli trial, thanks to the property that the variance of a sum of independent variables is the sum of their variances. And, yes, at 5:10 it appears there was a confusion in terminology; what's described is indeed the variance of a single Bernoulli trial.(1 vote)

- Between the last video and this one, I am so confused. I don't even know where to start with my questions.(2 votes)
- Shouldn't var(nx) be n^2*var(x)?(1 vote)
- For the variance of nX, where X is a random variable and n is a constant, it's important to distinguish between multiplying the variable itself by n (affecting the variance by n^2) and summing n independent instances of a variable. For nX, Var(nX) = n^2Var(X) because scaling a variable scales its variance by the square of that factor. However, when you sum n independent copies of X (as in a binomial distribution being the sum of n Bernoulli trials), the variance is n × Var(X), not n^2 × Var(X), because the operation is additive for independent variables, not multiplicative.(2 votes)
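The distinction drawn in this answer can be illustrated numerically (an editorial sketch; X is taken as Bernoulli(p) with arbitrary n = 5, p = 0.4): scaling one variable by n multiplies the variance by n², while summing n independent copies multiplies it by n.

```python
import random

rng = random.Random(1)          # seeded for reproducibility
n, p, trials = 5, 0.4, 200_000  # illustrative parameters

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((x - m) ** 2 for x in samples) / len(samples)

# Case 1: scale a single Bernoulli variable by n  ->  Var = n^2 * p(1-p)
scaled = [n * (rng.random() < p) for _ in range(trials)]

# Case 2: sum n independent Bernoulli copies  ->  Var = n * p(1-p)
summed = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

base = p * (1 - p)                       # Var of one Bernoulli trial: 0.24
print(round(variance(scaled), 2))        # near n^2 * 0.24 = 6.0
print(round(variance(summed), 2))        # near n * 0.24 = 1.2
```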

- Would this derivation of the variance = p(1-p) work if Sal started by using p(0-p)^2 + (1-p)(1-p)^2? I can't seem to derive the same result if I try calculate it this way. I guess my question is why did Sal use p(1-p)^2 as the first term and not p(0-p)^2? Shouldn't we arrive at the same result?(1 vote)
- The key is that each probability weight must be paired with its own outcome. For a Bernoulli variable Y where Y = 1 with probability p and Y = 0 with probability 1 − p, the variance is p(1 − p)^2 + (1 − p)(0 − p)^2: the first term weights the squared distance of the success outcome (1) from the mean p by its probability p, and the second weights the squared distance of the failure outcome (0) by its probability 1 − p. Writing p(0 − p)^2 + (1 − p)(1 − p)^2 pairs each probability with the wrong outcome's squared distance, which gives p^3 + (1 − p)^3 rather than p(1 − p), so you won't arrive at the same result.(1 vote)
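A quick numerical check of this pairing point, for one sample value p = 0.3 (an illustrative choice):

```python
p = 0.3  # illustrative probability of success

# Correct pairing: weight each outcome's squared distance by that
# outcome's own probability.
correct = p * (1 - p) ** 2 + (1 - p) * (0 - p) ** 2

# Swapped pairing from the question: each probability is attached to
# the other outcome's squared distance.
swapped = p * (0 - p) ** 2 + (1 - p) * (1 - p) ** 2

print(correct)  # 0.21, which equals p * (1 - p)
print(swapped)  # 0.37, i.e. p^3 + (1-p)^3, not p(1-p)
```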

- X and Y actually are two sets of data

Therefore Var(X)=Var(Y+Y+Y...) = nVar(Y)

Proof:

Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)

If X and Y are independent of each other, then Cov(X,Y) = 0(0 votes)
- Your statement is correct regarding the variance of sums of independent variables. When X and Y are independent, Cov(X,Y) = 0, and the variance of the sum X + Y equals Var(X) + Var(Y). Extending this to the sum of n identical, independent copies of Y, the variance of that sum indeed equals n × Var(Y).(1 vote)

- Isn't it the case that when X = nY, μx = nμy and σx = nσy, by the rules for scaling a random variable? Then why did Sal infer Var(X) = nVar(Y) from E(X) = nE(Y)?(1 vote)
- The scaling rules would apply if X were literally Y multiplied by the constant n, in which case Var(X) = n^2 * Var(Y) and σx = nσy. But here X is the sum of n independent copies of Y, not a scaled copy, so the variances add: Var(X) = n * Var(Y), which means σx = √n * σy.(0 votes)

- I learned that to find the variance of a function or a random variable I can use characteristic function of it but I couldn't find the characteristic function of X^2 to solve the problem below.

If X ∼ bin(n, p) and Y = X^2

Find the variance of Y by using characteristic function of X.(0 votes)
- To find the variance of Y = X^2 for a binomially distributed X using the characteristic function, you first need to understand that the characteristic function for a binomial distribution X ∼ Bin(n, p) is given by ϕX(t) = (pe^it + (1 − p))^n, where i is the imaginary unit. However, finding the variance of Y = X^2 directly from the characteristic function of X is not straightforward because the characteristic function of Y isn't directly obtained from that of X. You would typically calculate the variance of Y by finding E[Y] and E[Y^2] from the probability distribution of Y, which, in this case, is not trivially related to the characteristic function of X. This approach might not be the most efficient for this particular problem.(1 vote)
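The direct route this answer suggests, computing E[Y] and E[Y²] for Y = X² from the pmf of X, is easy to sketch (an editorial illustration; n = 10 and p = 0.3 are arbitrary choices, not from the exercise):

```python
from math import comb

n, p = 10, 0.3  # illustrative parameters

# Exact binomial pmf of X
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# For Y = X^2:  E[Y] = E[X^2]  and  E[Y^2] = E[X^4]
e_y = sum(k**2 * pk for k, pk in enumerate(pmf))
e_y2 = sum(k**4 * pk for k, pk in enumerate(pmf))

var_y = e_y2 - e_y**2
print(e_y, var_y)
```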

## Video transcript

- [Instructor] What we're
going to do in this video is continue our journey
trying to understand what the expected value
and what the variance of a binomial variable is going to be or what the expected value or the variance of a binomial distribution is going to be which is just the distribution
of a binomial variable. And so, like in the last video I have this binomial variable X that's defined in a very general sense. It's the number of
successes from N trials, so it's a finite number of trials where the probability of
success is equal to P, so the probability is
constant across the trials for each of these independent trials, so the probability of success in one trial is not dependent on what
happened in the other trials. And we also talked in that previous video where we talked about the expected value of this binomial variable we said hey, it could be viewed that this binomial variable can be viewed as the sum of N of what you could really consider to be a Bernoulli variable here. So, this variable, this random variable Y, the probability that's equal to one, you could do that as a
success is equal to P. The probability that it's a failure that Y is equal to zero is one minus P, so you could view Y, the outcome of Y or whether Y is one or zero is really whether we had a success or not in each of these trials, so if you add up N Ys, then you are going to get X and we use that information to figure out what the expected value
of X is going to be because the expected value of Y is pretty straightforward
to directly compute. Expected value of Y is just the probability weighted outcomes. So, it's P times one plus one minus P, one minus P, times zero, times zero. This whole term's gonna be zero and so, the expected value
of Y is really just P and so, if you said the
expected value of X, well, that's just going to be, let me just write it over here, this is all review, we could say that the expected value of X is just going to be equal to, we know from our expected value properties that it's going to be equal to the sum of the expected values of these N Ys, or you could say it is N times the expected value, times the expected value of Y, the expected value of Y is P, so this is going to be equal to N times P. Now, we're gonna do the same idea to figure out what the variance of X is going to be equal to because we could see, we know
from our variance properties, you can't do this with standard deviation but you could do it with variance and then once you figure out the variance, you just take the square root for the standard deviation, the variance of X is similarly going to be the sum of the
variances of these N Ys. So, it's gonna be similarly N times the variance, N
times the variance of Y. So, this all boils down to
what is the variance of Y going to be equal be? So, let me scroll over a little bit, get a little bit of more real estate and I will figure that
out right over here. Alright, so we wanna figure
out the variance of Y, so variance of Y is going to be equal to what? Well, here it's going
to be the probability-weighted squared distances from the expected value. So, we have a probability of P where what is going to
be our squared distance from the expected value? Well, we're going to get a
one with a probability of P, so in that case our distance from the mean or from the expected value, we're at one, the expected value we already know is equal to P, so that's that for that possible outcome, the squared distance times
its probability weight and then we have, actually let me scroll over, well, I'll just do it right over here, plus we have a probability of one minus P, one minus P for the
other possible outcome, so in that outcome we are at zero and the difference between
zero and our expected value? Well, that's just going to be zero minus P and once again we are going
to square that quantity and so, this is the expression
for the variance of Y and we can simplify it a little bit. So, this is all going to be equal to P times one minus P squared plus P squared times one minus P and let's see, we can factor
out a P times one minus P, so what is that going to be left with? So, if we factor out a P
times one minus P here, we're just going to be
left with a one minus P and if we factor out a P
times one minus P here, we're just going to have a plus P. These two cancel out. This whole
thing is just a one. So, you're left with P times one minus P which is indeed the variance
for a binomial variable. We actually proved that in other videos. I guess it doesn't hurt to see it again but there you have it. We know what the variance of Y is. It is P times one minus P and the variance of X is just
N times the variance of Y, so there we go, we deserve
a little bit of a drum roll, the variance of X is equal to N times P times one minus P. So, if we were to take
the concrete example of the last video where if I were to take 10 free throws, so each trial is a shot, is a free throw, so if I were to take 10 free throws and my probability of success is 0.3, I have a 30% free throw percentage, the variance that I would expect to see, so in that case the variance if X is the number of free throws I make after these 10 shots, my variance will be 10 times 0.3, 0.3 times one minus 0.3, so 0.7 and so, that would be what? This right over, so this
would be equal to 10 times .3 times .7, and .3 times .7 is 0.21, so my variance in this situation is going to be equal to 2.1. Is equal to 2.1 and if I wanted to figure
out the standard deviation of this right over here, I would just take the square root of this, so if we want the standard deviation, just take the square root of this expression right over here.
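The transcript's concrete free-throw example (n = 10 shots, p = 0.3) worked out in a few lines, finishing the standard deviation step the video leaves to the viewer:

```python
from math import sqrt

n, p = 10, 0.3  # 10 free throws, 30% success rate (from the video)

variance = n * p * (1 - p)   # 10 * 0.3 * 0.7
std_dev = sqrt(variance)

print(round(variance, 2))    # 2.1, matching the video
print(round(std_dev, 3))     # about 1.449
```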