Intuition for why independence matters for variance of sum

  • dmariesaunders
    Would someone put into words what is being measured by Var(X + Y)? I understand that of all the people in our sample, for both random variables, there was an average spread from the mean of 2 hours. What does it mean when we add these?
    (7 votes)
    • akshithio
      Here's how I understood it:

      Variance in this context can intuitively be thought of as how "much" the values can differ, almost as if we're in some sense measuring the "range" of all possible values.

      In the video prior to this one, when Sal introduces the concept of adding or subtracting the variances of two variables, he adds the minima and the maxima together for Var(X + Y).

      Now in this case, what are the possible values for X + Y?

      Well there's only one possible answer: 24.

      Because X and Y always need to add up to 24 (Only 24hrs in a day and you're either awake or asleep during all of those hours).

      What is the variance in these values? 0. Since there's not really more than one possible answer that you can have, X + Y can't really "vary".

      In the previous video, you can see a range such as 18 <= X <= 22, meaning the data can "vary" by 4. In our case X + Y = 24. It can be neither less nor more, therefore 0 variance.


      This is definitely a bit of a detour from the typical definition of variance for probability distributions (the probability-weighted sum of the squared differences between each possible value and the expected value) and isn't a particularly rigorous explanation, but this is how I understood it intuitively.

      P.S.: Also 5 years too late, but it might still be helpful. Here's the previous video I've been referring to, just in case: https://www.khanacademy.org/math/ap-statistics/random-variables-ap/combining-random-variables/v/variance-of-sum-and-difference-of-random-variables
      (3 votes)
  • nguyenhaison
    I want to ask 2 things here.
    If the variance of dependent variables is 0 because they depend on each other (if one changes, the other changes too, so there is no change in the variance), do I understand it correctly?
    And what about the mean of dependent variables?
    (2 votes)
    • Jerry Nilsson
      In this particular case 𝑋 + 𝑌 is a constant, which is why Var(𝑋 + 𝑌) = 0.

      This isn't always the case, though, and besides it's not very relevant.
      What Sal wanted to show is that the equation
      Var(𝑋 ± 𝑌) = Var(𝑋) + Var(𝑌) doesn't necessarily hold up if 𝑋 and 𝑌 are dependent.

      – – –

      For your second question, since the outcome of 𝑌 depends on the outcome of 𝑋, then the mean of 𝑌 depends on the mean of 𝑋.

      In this case 𝜇(𝑋) is the number of hours that the average person slept yesterday, while 𝜇(𝑌) is the number of hours the average person was awake yesterday.
      That gives us 𝜇(𝑌) = 24 − 𝜇(𝑋)
      (8 votes)
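Jerry Nilsson's answer can be taken one step further with a short derivation. It relies on the general identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), which holds for any two random variables with finite variances and reduces to the rule from the video when Cov(X, Y) = 0, as it is for independent variables. The derivation assumes, as in the video, that Y = 24 - X exactly.

```latex
\begin{aligned}
Y &= 24 - X \\
\mu(Y) &= E[24 - X] = 24 - \mu(X) \\
\operatorname{Var}(Y) &= \operatorname{Var}(24 - X) = \operatorname{Var}(X) \\
\operatorname{Cov}(X, Y) &= \operatorname{Cov}(X, 24 - X) = -\operatorname{Var}(X) \\
\operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) \\
&= \operatorname{Var}(X) + \operatorname{Var}(X) - 2\operatorname{Var}(X) = 0
\end{aligned}
```

With the video's numbers, Var(X) = Var(Y) = 4 hours squared and Cov(X, Y) = -4 hours squared, so the total is 4 + 4 - 8 = 0 rather than 8.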
  • Ziad Orabi
    Why doesn't Var(X + Y) = 8 (hrs.)² make sense?
    (3 votes)
  • mahmoudsamawal
    The key concept I do not understand here is how to combine two random variables. In the last video we summed or subtracted the extreme values of X and Y; why don't we do that here, and if we did, wouldn't we get some variability? What are the rules for combining two random variables?
    (2 votes)
    • daniella
      The rule for combining two random variables depends on their relationship and the context of the problem. In the example given, where X and Y represent hours slept and hours awake, respectively, their sum X + Y always equals 24 hours due to the nature of a 24-hour day. This means there is no variability in the sum, as it's a constant value. When combining random variables, you need to consider how they interact and whether they are independent or dependent. If they are independent, the variance of their sum or of their difference is the sum of their variances. If they are dependent, you need to analyze their relationship and how it affects the variability of the combined variable.
      (1 vote)
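One way to see why the previous video's trick of combining the extreme values doesn't carry over here is to list which sums X + Y can actually occur. The Python sketch below uses small, made-up ranges of whole hours (hours slept 6 to 10, hours awake 14 to 18, an assumption for illustration only) and compares an independent case, where any value of X can pair with any value of Y, with the video's dependent case, where Y is forced to equal 24 - X.

```python
from itertools import product

# Hypothetical ranges for illustration only (not from the video):
# hours slept X can be 6..10, hours awake Y can be 14..18.
x_values = range(6, 11)
y_values = range(14, 19)

# Independent case: any x can occur together with any y,
# so the extremes combine: min is 6 + 14, max is 10 + 18.
independent_sums = {x + y for x, y in product(x_values, y_values)}

# Dependent case from the video: Y is forced to be 24 - X,
# so only the pairs (x, 24 - x) can actually occur.
dependent_sums = {x + (24 - x) for x in x_values}

print(min(independent_sums), max(independent_sums))
print(sorted(dependent_sums))
```

When X and Y are independent, the smallest X can occur together with the smallest Y, so the extremes add up and the sum spreads from 20 to 28. When Y = 24 - X, a small X always comes with a large Y, and the only possible sum is 24.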
  • zjleon2010
    So the only way to calculate the variance of the sum of two dependent variables is to sum the individual data points to form a new variable and then apply the variance formula?
    What if the two variables have different sizes?
    (1 vote)
    • daniella
      When dealing with dependent variables, such as in the example provided where x represents the number of hours slept and y represents the number of hours awake, you can't directly apply the formula for the variance of the sum of independent variables. Instead, you need to consider the relationship between the variables and how they combine to form a new variable. If the variables are dependent, their combination may not exhibit variability in the same way as independent variables. In cases where the variables have different magnitudes or sizes, you still need to consider their relationship and how they contribute to the overall variability of the combined variable.
      (1 vote)
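On the first part of zjleon2010's question: summing paired data points is not the only route. The identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) gives the same number from the separate variances plus the covariance. Either way, X + Y is only defined for paired observations (the same person's X and Y), so two lists of different lengths cannot be combined like this. The Python sketch below uses a made-up sample of five people and population (divide-by-n) formulas; `pcov` is a helper written just for this example, not a library function.

```python
from statistics import fmean, pvariance

def pcov(xs, ys):
    """Population covariance of paired observations (xs[i], ys[i])."""
    if len(xs) != len(ys):
        # X + Y only makes sense person by person, so the lists must pair up.
        raise ValueError("X and Y must be paired: same number of observations")
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired sample: hours slept (xs) and hours awake (ys)
# for the same five people, so ys[i] == 24 - xs[i].
xs = [6, 7, 8, 9, 10]
ys = [24 - x for x in xs]

direct = pvariance([x + y for x, y in zip(xs, ys)])          # variance of the summed data
via_formula = pvariance(xs) + pvariance(ys) + 2 * pcov(xs, ys)  # includes the covariance term
naive = pvariance(xs) + pvariance(ys)                        # ignores the dependence

print(direct, via_formula, naive)
```

The naive sum of variances (4) disagrees with the other two results (0) because it leaves out the covariance term, which is negative here: people who slept more were awake less.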

Video transcript

- [Narrator] So in previous videos we talked about the claim that if I have two random variables, x and y, that are independent, then the variance of the sum of those two random variables, or of the difference of those two random variables, is going to be equal to the sum of the variances. So if you have independent random variables, your variation is going to increase when you take a sum or a difference. And we've built a little bit of intuition there. What I wanna do in this video is build even more intuition: get a gut feeling for why this independence is important for making this claim. To get that intuition, let's look at two random variables that are definitely random variables but are definitely not independent. So let x be the number of hours that the next person you meet, a random person, slept yesterday. And let y be the number of hours that same person was awake yesterday. Appreciate why these are not independent random variables: one of them completely determines the other. If I slept eight hours yesterday, then I would have been awake for 16 hours. Or if I slept for 16 hours, then I would have been awake for eight hours. There could be variation in x and there could be variation in y, but for any given person (remember, both are based on that same person), x plus y is always going to be equal to 24 hours. So these are not independent. If you're given one of the variables, it completely determines what the other variable is. The probability of getting a certain value for one variable is going to be very different depending on what value you got for the other variable. So they're not independent at all.
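The claim for independent variables can be checked with a quick simulation. This Python sketch assumes made-up distributions: X and Y are whole numbers drawn uniformly and separately from seven consecutive values, which gives each a variance of exactly 4. Because the two draws are independent, the variances of both X + Y and X - Y should come out close to Var(X) + Var(Y) = 8, up to sampling noise.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so reruns give the same numbers
n = 100_000

# Hypothetical distributions, chosen only so that each variance is exactly 4:
# a whole number drawn uniformly from 7 consecutive integers has variance (7**2 - 1) / 12 = 4.
# X and Y are drawn separately, so they are independent.
xs = [random.randint(5, 11) for _ in range(n)]   # mean 8, Var(X) = 4
ys = [random.randint(13, 19) for _ in range(n)]  # mean 16, Var(Y) = 4

var_sum = pvariance([x + y for x, y in zip(xs, ys)])
var_diff = pvariance([x - y for x, y in zip(xs, ys)])

# All three should land near Var(X) + Var(Y) = 8; sampling noise keeps them from being exact.
print(round(pvariance(xs) + pvariance(ys), 2), round(var_sum, 2), round(var_diff, 2))
```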
So in this situation, let's say, just for the sake of argument, that the variance of x is equal to, I don't know, four. The units of variance are squared hours, so that's four hours squared. We could say that the standard deviation of x in this case would be two hours. And let's say that the standard deviation of y is also equal to two hours. Then the variance of y would be the square of the standard deviation, so it would also be four hours squared. So if we just tried to blindly apply this claim without thinking about independence, we would say, "Well then, the variance of x plus y must be equal to the sum of their variances." So it would be four plus four. So is it equal to eight hours squared? Well, that doesn't make any sense, because we know that the random variable x plus y is always going to be 24 hours. In fact, it's not going to have any variation. So for these two random variables, because they are so connected, and not independent at all, the variance of x plus y is actually going to be zero. There is zero variance here. X plus y is always going to be 24, at least on Earth, where we have a 24-hour day. I guess if someone lived on another planet it could be different, and we're assuming that a day on Earth is exactly 24 hours. So this is to give you a gut sense of why independence matters for making this claim, and why the claim doesn't necessarily hold up when the variables are not independent.
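The same kind of simulation shows the failure in the sleep/awake example. As an assumption for illustration, hours slept X is drawn uniformly from the whole numbers 5 through 11 (variance 4, standard deviation 2 hours, matching the numbers in the video), and hours awake is set to Y = 24 - X. Blindly adding the variances gives about 8 hours squared, while the variance of the actual totals is 0.

```python
import random
from statistics import pvariance

random.seed(1)
n = 100_000

# Hypothetical model matching the video's numbers: hours slept X is a whole
# number drawn uniformly from 5..11, so Var(X) = 4 and SD(X) = 2 hours.
xs = [random.randint(5, 11) for _ in range(n)]
# Hours awake is completely determined by hours slept in a 24-hour day.
ys = [24 - x for x in xs]

naive = pvariance(xs) + pvariance(ys)                # blindly adding: about 8 hours squared
actual = pvariance([x + y for x, y in zip(xs, ys)])  # every total is exactly 24

print(round(naive, 2), actual)
```

Replacing the line that defines `ys` with an independent draw, for example `ys = [random.randint(13, 19) for _ in range(n)]`, brings the variance of the totals back to roughly 8, which is the situation the claim is meant for.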