© 2023 Khan Academy

# Intuition for why independence matters for variance of sum

AP.STATS: VAR‑5 (EU), VAR‑5.E (LO), VAR‑5.E.2 (EK), VAR‑5.E.3 (EK)

## Want to join the conversation?

- Would someone put into words what is being measured by Var(X + Y)? I understand that of all the people in our sample, for both random variables, there was an average spread from the mean of 2 hours. What does it mean when we add these? (5 votes)
- idk, maybe because that is how it was supposed to be. (1 vote)

- I want to ask 2 things here.

Is the variance of dependent variables 0 because they depend on each other, so that if one changes the other changes too and there is no change in variance? Am I understanding this correctly?

And what about the mean of dependent variables? (1 vote)

- In this particular case 𝑋 + 𝑌 is a constant, which is why Var(𝑋 + 𝑌) = 0.

This isn't always the case, though, and besides it's not very relevant.

What Sal wanted to show is that the equation

Var(𝑋 ± 𝑌) = Var(𝑋) + Var(𝑌) doesn't necessarily hold up if 𝑋 and 𝑌 are dependent.

– – –

For your second question, since the outcome of 𝑌 depends on the outcome of 𝑋, then the mean of 𝑌 depends on the mean of 𝑋.

In this case 𝜇(𝑋) is the number of hours that the average person slept yesterday, while 𝜇(𝑌) is the number of hours the average person was awake yesterday.

That gives us 𝜇(𝑌) = 24 − 𝜇(𝑋) (4 votes)

- Why doesn't Var(X + Y) = 8 hrs² make sense? (1 vote)
- So is the only way to calculate the variance of the sum of two dependent variables to sum the individual data points to form a new variable and then apply the variance formula?

What if the two variables have different sizes? (0 votes)

## Video transcript

- [Narrator] So in previous videos we talked about the claim that if I have two random variables, x and y, that are independent, then the variance of the sum of those two random variables or the difference of those two random variables is going to be equal to the sum of the variances. So if you have independent random variables, your variation is going to increase when you take a sum or a difference. And we've built a little bit of intuition there. What I wanna do in this video, and it's really about building even more intuition, is get a gut feeling for why this independence is important for making this claim.

And to get that intuition, let's look at two random variables that are definitely random variables but that are definitely not independent. So let's let x be equal to the number of hours that the next person you meet, so I'll say a random person, slept yesterday. And let's say that y is equal to the number of hours that same person was awake yesterday. And appreciate why these are not independent random variables. One of them is gonna completely determine the other. If I slept eight hours yesterday then I would have been awake for 16 hours. Or if I slept for 16 hours then I would have been awake for eight hours.

We know that even though they're random variables, and there could be variation in x and there could be variation in y, for any given person, remember, these are still based on that same person, x plus y is always going to be equal to 24 hours. So these are not independent. If you're given one of the variables it would completely determine what the other variable is. The probability of getting a certain value for one variable is going to be very different, given what value you got for the other variable. So they're not independent at all.

So in this situation, let's just say for the sake of argument that the variance of x is equal to, I don't know, let's say it's equal to four. The units for variance would be squared hours, so four hours squared. We could say that the standard deviation for x in this case would be two hours. And let's say that the standard deviation of y is also equal to two hours. Then the variance of y would be the square of the standard deviation, so it would be four, and hours squared would be our units.

So if we just tried to blindly say, "Oh, I'm just gonna apply this little expression, this claim we have," without thinking about the independence, we would try to say, "Well then, the variance of x plus y must be equal to the sum of their variances." So it would be four plus four. So is it equal to eight hours squared? Well, that doesn't make any sense, because we know that a random variable that is equal to x plus y is always going to be 24 hours. In fact, it's not going to have any variation. So for these two random variables, because they are so connected, they are not independent at all, this is actually going to be zero. There is zero variance here. X plus y is always going to be 24, at least on earth where we have a 24 hour day. I guess if someone lived on another planet or something it could be slightly different. And we're assuming that we have an exactly 24 hour day on earth.

So this is to give you a gut sense of why independence matters for making this claim. And if you have things that are not independent, it gives you a good sense for why this claim doesn't necessarily hold up.
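
The sleep and awake example can also be checked numerically. The following is a minimal sketch, not something from the video: it assumes, purely for illustration, that hours slept follow a normal distribution with mean 8 and standard deviation 2 (so the variance is about 4 hours squared), and it compares the fully dependent pair with a hypothetical independent pair that has the same spread. The names `simulate`, `y_dep`, and `y_ind` are made up for this example, and the normal model can occasionally produce unrealistic values, which does not matter for the point being made.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Compare Var(X + Y) for a dependent pair and an independent pair."""
    rng = random.Random(seed)

    # Assumed model (not from the video): hours slept ~ Normal(8, sd 2).
    x = [rng.gauss(8, 2) for _ in range(n)]

    # Dependent case from the video: hours awake = 24 - hours slept.
    y_dep = [24 - xi for xi in x]

    # Hypothetical independent case: a separate draw with the same spread.
    y_ind = [rng.gauss(16, 2) for _ in range(n)]

    def var_of_sum(a, b):
        # Population variance of the element-wise sum a + b.
        return statistics.pvariance([ai + bi for ai, bi in zip(a, b)])

    return {
        "var_x": statistics.pvariance(x),
        "var_y_dep": statistics.pvariance(y_dep),
        "var_sum_dep": var_of_sum(x, y_dep),
        "var_sum_ind": var_of_sum(x, y_ind),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, `var_sum_dep` comes out essentially zero because every sum is 24, while `var_sum_ind` lands close to 4 + 4 = 8. This matches the general identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y): when Y = 24 − X, the covariance is −Var(X), so the terms cancel to 4 + 4 − 8 = 0, and when X and Y are independent the covariance is zero.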