
ANOVA 1: Calculating SST (total sum of squares)

Analysis of Variance 1 - Calculating SST (Total Sum of Squares). Created by Sal Khan.


  • robyn.gibbard
    Sal says that the grand mean is equal to the "mean of the means" of the three groups. Is this always true, or does it depend on the groups having the same number of elements? For example, I can imagine three groups: A={1,2,3}, B={4,3,2}, C={8,9,10,11,12,13,14}. In this case the grand mean is ≈ 7.08 but the mean of means is (2+3+11)/3 ≈ 5.33.
    (8 votes)
    • Dr C
      For the grand mean to be the "mean of means," the groups need to have the same number of observations (i.e., be "balanced").

      ANOVA will still work, but depending on how Sal expressed the formulas, the precise formulas he showed may not apply (I forget what formulas he used, and I don't feel like rewatching to find out).
      (12 votes)
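Robyn's example above can be checked numerically. Here is a short Python sketch (group values taken directly from the question):

```python
# Groups from robyn.gibbard's example: unequal sizes, so the
# grand mean and the "mean of means" disagree.
A = [1, 2, 3]
B = [4, 3, 2]
C = [8, 9, 10, 11, 12, 13, 14]

all_values = A + B + C
grand_mean = sum(all_values) / len(all_values)        # 92/13 ≈ 7.08

group_means = [sum(g) / len(g) for g in (A, B, C)]    # [2.0, 3.0, 11.0]
mean_of_means = sum(group_means) / len(group_means)   # 16/3 ≈ 5.33

print(grand_mean, mean_of_means)
```

With balanced groups (equal n per group) the two quantities coincide, which is why Sal can use them interchangeably in the video.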
  • Jes Victor Daniel
    Hi Sal, I was going through your tutorials ANOVA 1, 2, and 3, and I was not able to find whether ANOVA is a one-tailed or two-tailed test...
    Your help is highly appreciated.
    (2 votes)
  • jjclaudes
    What is homogeneity of variance?
    (4 votes)
    • Peter
      Homogeneity = "same". Variance is a measure of the spread of data. So homogeneity of variance refers to two groups of data that have approximately the same spread. I don't know if that answers your specific question (and I know this was posted quite some time ago) but if you have further questions I'd be happy to try to help.
      (6 votes)
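For the data set used in the video, the three groups happen to have identical sample variances, so the homogeneity assumption holds exactly. A quick Python check (group values from the video, variance computed with the n − 1 denominator):

```python
# The three groups from the video's 3x3 data set.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

variances = [sample_variance(g) for g in groups]
print(variances)   # every group has sample variance 1.0
```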
  • Ron Ghosh
    Is there a specific video where I can learn more about degrees of freedom?
    (3 votes)
    • Jose
      What exactly is your concern with degrees of freedom? Maybe the community can help. If you are just looking for more general theory on it, check out this site; it's short and sweet, with a cool explanation tucked into the end of it.
      www.statsdirect.com/help/basics/degrees_of_freedom.htm
      (4 votes)
  • Mohammad Safaie
    I thought that the degrees of freedom should be the total number of participants minus the total number of groups. But here it is N-1. Would you please make the concept of degrees of freedom clear for me?
    Thanks
    (2 votes)
    • Dr C
      There are several degrees of freedom in an ANOVA setting.

      The total degrees of freedom is N - 1.
      The degrees of freedom for the model is M - 1, where M is the number of groups.
      The degrees of freedom for the residual / error is N - M.
      (6 votes)
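For the 3 × 3 data set in the video, the three degrees-of-freedom quantities Dr C lists work out as follows (a minimal sketch):

```python
N = 9   # total observations (3 groups of 3)
M = 3   # number of groups

df_total = N - 1   # 8, used with the total sum of squares
df_model = M - 1   # 2, "between groups"
df_error = N - M   # 6, "within groups"

# The model and error degrees of freedom partition the total.
assert df_model + df_error == df_total
print(df_total, df_model, df_error)
```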
  • Carson
    If the mean of means is needed to calculate the 9th value, doesn't that count as being not determined by just the 8 values, and thus there are 9 independent values in total, and thus 9 degrees of freedom?
    (2 votes)
    • Christopher Scott
      Degrees of freedom is always the number of values that you have minus 1, in other words, n-1. Plus, if you watch the previous video, Sal explains how we take the rows times the columns, and that gives you N. So in this example, if you multiply rows (3) by columns (3), 3*3 = 9. 9 is your N. Now take 9-1 = 8. For this sample set you have 8 degrees of freedom. I hope this helps.
      (3 votes)
  • Monder Aboukhaled
    I noticed in the other video that SSW and SSB were used. Can anyone tell me which one of them is SSR and which is SSE? Thank you.
    (1 vote)
    • robshowsides
      Good question. You have to be VERY CAREFUL with these, because depending on the source, you could get confused, especially between Regression and ANOVA.
      So, in ANOVA, there are THREE DIFFERENT TRADITIONS:
      1) SSW (Within) + SSB (Between) = SST (Total!!)
      This is what Sal uses. But if you search the web or textbooks, you ALSO FIND:
      2) SSE (Error) + SST (Treatment!!) = SS(Total) THIS IS THE WORST.
      3) SSE (Error) + SSM (Model) = SST (Total)
      Wait, WHAT?! There are two different SST's? I know, it's horrible. Anyway, that's the way it is. If people use SST to mean "treatment", then they have to write SS(Total) for the total sum of squares, or they might even write TSS for "Total Sum of Squares".
      "Error" means the same as "Within groups". This is the variation which is NOT explained by the fact that we can put the data into different groups.
      "Treatment" or "Model" (or sometimes "Factor") means the same as "Between groups". This is the variation that IS explained by the fact that there are different groups of data (often because they come from patients who get different treatments).
      Now, in Regression, we have:
      SSR (Residuals) + SSE (Explained) = SST (Total)
      SSR is the sum of (y_i - yhat_i)^2, so it is the variation of the data away from the regression line. So it is similar to SSW, it is the residual variation of y-values not explained by the changing x-value.
      SSE is the sum of (yhat_i - ybar)^2, so it is the variation of the regression line itself away from the overall mean of the y-values. Thus it tells us how much of the variation in the data is explained by the changing x-values.
      (3 votes)
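Whatever the naming convention, the decomposition itself is easy to verify numerically. Using Sal's data and his SSW/SSB/SST names, a Python sketch:

```python
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
data = [x for g in groups for x in g]
grand_mean = sum(data) / len(data)                       # 4.0

# Total sum of squares: every value vs. the grand mean.
sst = sum((x - grand_mean) ** 2 for x in data)           # 30.0

# Within-groups ("error"): every value vs. its own group mean.
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Between-groups ("treatment"/"model"): group means vs. grand mean,
# weighted by group size.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SSW + SSB = SST, regardless of which names you attach to the pieces.
assert abs(sst - (ssw + ssb)) < 1e-9
print(sst, ssw, ssb)
```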
  • Aimee Wong
    Does anybody know how to calculate the dof for the denominator and the numerator? I have a sample of n=4 and only that one group. The numerator df was 3 but I can't figure out the denominator df.
    (1 vote)
  • ankieadinda
    Is there any video describing OLS regression?
    (2 votes)
  • sarahwinston2
    At around , I don't understand why he's referring to the degrees of freedom as m*n. I'm only familiar with df as "the number of values in the final calculation of a statistic that are free to vary" (Wikipedia), and that it is used in virtually every hypothesis test.
    (1 vote)

Video transcript

In this video and the next few videos, we're just really going to be doing a bunch of calculations about this data set right over here. And hopefully, just going through those calculations will give you an intuitive sense of what the analysis of variance is all about. Now, the first thing I want to do in this video is calculate the total sum of squares. So I'll call that SST. SS-- sum of squares total. And you could view it as really the numerator when you calculate variance. So you're just going to take the distance between each of these data points and the mean of all of these data points, square them, and just take that sum. We're not going to divide by the degree of freedom, which you would normally do if you were calculating sample variance. Now, what is this going to be? Well, the first thing we need to do, we have to figure out the mean of all of this stuff over here. And I'm actually going to call that the grand mean. And I'm going to show you in a second that it's the same thing as the mean of the means of each of these data sets. So let's calculate the grand mean. So it's going to be 3 plus 2 plus 1 plus 5 plus 3 plus 4 plus 5 plus 6 plus 7. And then we have nine data points here so we'll divide by 9. And what is this going to be equal to? 3 plus 2 plus 1 is 6. 6 plus-- let me just add. So these are 6. 5 plus 3 plus 4 is 12. And then 5 plus 6 plus 7 is 18. And then 6 plus 12 is 18 plus another 18 is 36, divided by 9 is equal to 4. And let me show you that that's the exact same thing as the mean of the means. So the mean of this group 1 over here-- let me do it in that same green-- the mean of group 1 over here is 3 plus 2 plus 1. That's that 6 right over here, divided by 3 data points so that will be equal to 2. The mean of group 2, the sum here is 12. We saw that right over here. 5 plus 3 plus 4 is 12, divided by 3 is 4 because we have three data points. And then the mean of group 3, 5 plus 6 plus 7 is 18 divided by 3 is 6. 
So if you were to take the mean of the means, which is another way of viewing this grand mean, you have 2 plus 4 plus 6, which is 12, divided by 3 means here. And once again, you would get 4. So you could view this as the mean of all of the data in all of the groups or the mean of the means of each of these groups. But either way, now that we've calculated it, we can actually figure out the total sum of squares. So let's do that. So it's going to be equal to 3 minus 4-- the 4 is this 4 right over here-- squared plus 2 minus 4 squared plus 1 minus 4 squared. Now, I'll do these guys over here in purple. Plus 5 minus 4 squared plus 3 minus 4 squared plus 4 minus 4 squared. Let me scroll over a little bit. Now, we only have three left, plus 5 minus 4 squared plus 6 minus 4 squared plus 7 minus 4 squared. And what does this give us? So up here, this is going to be equal to 3 minus 4. Difference is 1. You square it. It's actually negative 1, but you square it, you get 1, plus you get negative 2 squared is 4, plus negative 3 squared. Negative 3 squared is 9. And then we have here in the magenta 5 minus 4 is 1 squared is still 1. 3 minus 4 squared is 1. You square it again, you still get 1. And then 4 minus 4 is just 0. So we could-- well, I'll just write the 0 there just to show you that we actually calculated that. And then we have these last three data points. 5 minus 4 squared. That's 1. 6 minus 4 squared. That is 4, right? That's 2 squared. And then plus 7 minus 4 is 3 squared is 9. So what's this going to be equal to? So I have 1 plus 4 plus 9 right over here. That's 5 plus 9. This right over here is 14, right? 5 plus-- yup, 14. And then we also have another 14 right over here because we have a 1 plus 4 plus 9. So that right over there is also 14. And then we have 2 over here. So it's going to be 28-- 14 times 2, 14 plus 14 is 28-- plus 2 is 30. Is equal to 30. 
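The arithmetic in the transcript can be reproduced in a few lines of Python (data values taken from the video):

```python
data = [3, 2, 1, 5, 3, 4, 5, 6, 7]   # the nine values from the video

grand_mean = sum(data) / len(data)   # 36 / 9 = 4.0

# Total sum of squares: squared distance of each value from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in data)
print(sst)   # 30.0
```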
So our total sum of squares-- and actually, if we wanted the variance here, we would divide this by the degrees of freedom. And we've learned multiple times the degrees of freedom here so let's say that we have-- so we know that we have m groups over here. So let me just write it as m and I'm not going to prove things rigorously here, but I want to show you where some of these strange formulas that show up in statistics books actually come from without proving it rigorously. More to give you the intuition. So we have m groups here. And each group here has n members. So how many total members do we have here? Well, we had m times n or 9, right? 3 times 3 total members. So our degrees of freedom-- and remember, you have however many data points you had minus 1 degrees of freedom because if you know the mean of means, if you assume you knew that, then only 9 minus 1, only eight of these are going to give you new information because if you know that, you could calculate the last one. Or it really doesn't have to be the last one. If you have the other eight, you could calculate this one. If you have eight of them, you could always calculate the ninth one using the mean of means. So one way to think about it is that there's only eight independent measurements here. Or if we want to talk generally, there are m times n-- so that tells us the total number of samples-- minus 1 degrees of freedom. And if we were actually calculating the variance here, we would just divide 30 by m times n minus 1 or this is another way of saying eight degrees of freedom for this exact example. We would take 30 divided by 8 and we would actually have the variance for this entire group, for the group of nine when you combine them. I'll leave you here in this video. In the next video, we're going to try to figure out how much of this total variance, how much of this total squared sum, total variation comes from the variation within each of these groups versus the variation between the groups. 
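Dividing SST by the m·n − 1 degrees of freedom gives the sample variance of all nine values pooled together, exactly as described above:

```python
data = [3, 2, 1, 5, 3, 4, 5, 6, 7]
m, n = 3, 3                          # 3 groups of 3 values each

grand_mean = sum(data) / len(data)
sst = sum((x - grand_mean) ** 2 for x in data)   # 30.0

df = m * n - 1                       # 8 degrees of freedom
variance = sst / df                  # 30 / 8 = 3.75
print(variance)
```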
And I think you get a sense of where this whole analysis of variance is coming from. It's the sense that, look, there's a variance of this entire sample of nine, but some of that variance-- if these groups are different in some way-- might come from the variation from being in different groups versus the variation from being within a group. And we're going to calculate those two things and we're going to see that they're going to add up to the total squared sum variation.