ANOVA 1: Calculating SST (total sum of squares)

Analysis of Variance 1 - Calculating SST (Total Sum of Squares). Created by Sal Khan.

• Sal says that the grand mean is equal to the "mean of the means" of the three groups. Is this always true, or does it depend on the groups having the same number of elements? For example, I can imagine three groups: A={1,2,3}, B={4,3,2}, C={8,9,10,11,12,13,14}. In this case the grand mean is 7.08 but the mean of means is (2+3+11)/3=5.3.
• For the grand mean to be the "mean of means," the groups need to have the same number of observations (i.e., be "balanced").

ANOVA will still work, but depending on how Sal expressed the formulas, the precise formulas he showed may not apply (I forget what formulas he used, and I don't feel like rewatching to find out).
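As a quick check of the point about balance, here is a minimal Python sketch using the three groups from the question. The helper name `avg` is just an illustrative choice; the numbers come straight from the question above.

```python
def avg(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

# Groups from the question above (unequal sizes, so "unbalanced").
groups = {
    "A": [1, 2, 3],
    "B": [4, 3, 2],
    "C": [8, 9, 10, 11, 12, 13, 14],
}

all_values = [x for g in groups.values() for x in g]

grand_mean = avg(all_values)                            # 92 / 13
mean_of_means = avg([avg(g) for g in groups.values()])  # (2 + 3 + 11) / 3

# Weighting each group mean by its group size recovers the grand mean.
weighted_mean = sum(len(g) * avg(g) for g in groups.values()) / len(all_values)

print(round(grand_mean, 2), round(mean_of_means, 2), round(weighted_mean, 2))
# prints: 7.08 5.33 7.08
```

With equal group sizes the weights are all the same, which is why the plain "mean of means" works in the balanced case.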
• Hi Sal, I was going through your tutorials ANOVA 1, 2 and 3, and I was not able to find out whether ANOVA is a one-tailed or a two-tailed test...
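• This may be a bit late to help you, but ANOVA is always one-tailed. The F statistic is a ratio of two variance estimates, so only large values (the upper tail) count as evidence against the null hypothesis.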
• What is homogeneity of variance?
• Homogeneity = "same". Variance is a measure of the spread of data. So homogeneity of variance refers to two or more groups of data that have approximately the same spread. I don't know if that answers your specific question (and I know this was posted quite some time ago), but if you have further questions I'd be happy to try to help.
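To make "same spread" concrete, here is a small Python sketch that computes the sample variance of a few groups. The data and the helper name `sample_variance` are made up for illustration; this is only a look at the numbers, not a formal test of homogeneity.

```python
def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by (n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical groups: a and b have different means but the same spread;
# c is far more spread out.
a = [2, 4, 6]
b = [12, 14, 16]
c = [0, 10, 20]

var_a, var_b, var_c = sample_variance(a), sample_variance(b), sample_variance(c)
print(var_a, var_b, var_c)
# prints: 4.0 4.0 100.0
```

Groups `a` and `b` would count as having homogeneous variance even though their means differ, while `c` clearly would not match them.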
• What exactly is your concern with degrees of freedom? Maybe the community can help. If you are just looking for more general theory on it, check out this site. It's short and sweet, with a cool explanation tucked in at the end.
www.statsdirect.com/help/basics/degrees_of_freedom.htm
• I thought that the degrees of freedom should be the total number of participants minus the total number of groups. But here it is N-1. Would you please make the concept of degrees of freedom clear for me?
Thanks
• There are several degrees of freedom in an ANOVA setting.

The total degrees of freedom is N - 1.
The degrees of freedom for the model is M - 1, where M is the number of groups.
The degrees of freedom for the residual / error is N - M.
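As a rough illustration, the three counts can be computed from the group sizes alone. The function name `anova_degrees_of_freedom` is hypothetical, and the sizes (three groups of three) are chosen to match the N - 1 = 8 discussed in this thread.

```python
def anova_degrees_of_freedom(group_sizes):
    """Return (total, model, error) degrees of freedom for a one-way ANOVA."""
    n_total = sum(group_sizes)    # N: total number of observations
    n_groups = len(group_sizes)   # M: number of groups
    df_total = n_total - 1
    df_model = n_groups - 1       # "between groups"
    df_error = n_total - n_groups # "within groups" / residual
    return df_total, df_model, df_error

df_total, df_model, df_error = anova_degrees_of_freedom([3, 3, 3])
print(df_total, df_model, df_error)
# prints: 8 2 6
```

Note that the model and error degrees of freedom add up to the total: (M - 1) + (N - M) = N - 1.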
• If the mean of means is needed to calculate the 9th value, doesn't that mean the 9th value is not determined by the other 8 values alone? Wouldn't there then be 9 independent values in total, and thus 9 degrees of freedom?
• The total degrees of freedom is always the number of values that you have minus 1, in other words, N - 1. Plus, if you watch the previous video, Sal explains how multiplying the rows by the columns gives you N. So in this example, rows (3) x columns (3) = 9, so 9 is your N. Now take 9 - 1 = 8. For this sample set you have 8 total degrees of freedom. The mean of means is itself computed from the 9 values, so it is not an extra piece of information: once you know it and any 8 of the values, the 9th is fixed. I hope this helps.
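One way to see why only 8 of the 9 values are "free" is a small Python sketch. The nine numbers below are made up for illustration; the point is only that once the grand mean is fixed, the last value can be recovered from the other eight.

```python
# Hypothetical data set of N = 9 values (3 groups x 3 observations).
values = [3, 2, 1, 5, 3, 4, 5, 6, 7]
n = len(values)
grand_mean = sum(values) / n

# Pretend the last value is unknown: the mean plus the other 8 values pin it down.
known = values[:-1]
recovered_last = n * grand_mean - sum(known)

# Equivalent view: the deviations from the grand mean always sum to zero.
deviation_sum = sum(x - grand_mean for x in values)

print(grand_mean, recovered_last, deviation_sum)
# prints: 4.0 7.0 0.0
```

So the 9 deviations used in the total sum of squares carry only 8 independent pieces of information.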
• I noticed in the other video that SSW and SSB were used. Can anyone tell me which one of them is SSR and which is SSE? Thank you.
• Good question. You have to be VERY CAREFUL with these, because depending on the source, you could get confused, especially between Regression and ANOVA.
So, in ANOVA, there are THREE DIFFERENT TRADITIONS:
1) SSW (Within) + SSB (Between) = SST (Total!!)
This is what Sal uses. But if you search the web or textbooks, you ALSO FIND:
2) SSE (Error) + SST (Treatment!!) = SS(Total) THIS IS THE WORST.
3) SSE (Error) + SSM (Model) = SST (Total)
Wait, WHAT?! There are two different SSTs? I know, it's horrible. Anyway, that's the way it is. If people use SST to mean "treatment", then they have to write SS(Total) for the total sum of squares, or they might even write TSS for "Total Sum of Squares".
"Error" means the same as "Within groups". This is the variation which is NOT explained by the fact that we can put the data into different groups.
"Treatment" or "Model" (or sometimes "Factor") means the same as "Between groups". This is the variation that IS explained by the fact that there are different groups of data (often because they come from patients who get different treatments).
Now, in Regression, we have:
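SSR (Residuals) + SSE (Explained) = SST (Total)
(Careful again: some sources swap these labels and use SSR for "Regression", meaning explained, and SSE for "Error", meaning residual.)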
SSR is the sum of (y_i - yhat_i)^2, so it is the variation of the data away from the regression line. It is similar to SSW: it is the residual variation of the y-values that is not explained by the changing x-values.
SSE is the sum of (yhat_i - ybar)^2, so it is the variation of the regression line itself away from the overall mean of the y-values. Thus it tells us how much of the variation in the data is explained by the changing x-values.
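Both partitions described above can be checked numerically. The following is a minimal Python sketch, not anyone's official method: the data sets and the variable names (`ssw`, `ssb`, `ss_residual`, `ss_explained`) are made up for illustration, and the regression part assumes an ordinary least-squares line with an intercept.

```python
import math

def avg(xs):
    return sum(xs) / len(xs)

# --- ANOVA: SSW (Within) + SSB (Between) = SST (Total) ---
# Hypothetical data: three groups of three observations.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = avg(all_values)

sst_anova = sum((x - grand_mean) ** 2 for x in all_values)
ssw = sum((x - avg(g)) ** 2 for g in groups for x in g)         # "Error"
ssb = sum(len(g) * (avg(g) - grand_mean) ** 2 for g in groups)  # "Treatment" / "Model"

# --- Regression: SSR (Residuals) + SSE (Explained) = SST (Total) ---
# Hypothetical (x, y) data, fitted with a least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = avg(xs), avg(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * x for x in xs]

ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum of (y_i - yhat_i)^2
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)         # sum of (yhat_i - ybar)^2
sst_regression = sum((y - y_bar) ** 2 for y in ys)

# Both identities hold up to floating-point rounding.
assert math.isclose(ssw + ssb, sst_anova)
assert math.isclose(ss_residual + ss_explained, sst_regression)

print(ssw, ssb, sst_anova)
# prints: 6.0 24.0 30.0
```

Whatever labels a source uses, the structure is the same in both settings: unexplained variation plus explained variation equals total variation.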
• Does anybody know how to calculate the dof for the denominator and the numerator? I have a sample of n=4 and only that one group. The numerator df was 3 but I can't figure out the denominator df.
• If you don't have multiple groups, then ANOVA probably isn't the test you want to be using.