# Sample variance

Thinking about how we can estimate the variance of a population by looking at the data in a sample. Created by Sal Khan.

## Want to join the conversation?

• What if n=1? Then wouldn't the sample variance be infinity?
  • It would be undefined, yes. But would that be a problem? If you only looked at one data point from a population, you really wouldn't have any idea of how dispersed the data is, so an undefined estimate of the population variance is appropriate.
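The reply above can be checked with a minimal Python sketch, assuming the usual sample variance formula that divides by n-1. The helper name `sample_variance` is made up for illustration. With a single data point the sum of squared deviations is 0 and n-1 is also 0, so the calculation is 0/0, which Python reports as an error rather than as infinity.

```python
def sample_variance(data):
    """Sample variance with the n - 1 divisor (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1)  # n - 1 is 0 when n == 1


try:
    result = sample_variance([7.0])  # a sample with one data point
except ZeroDivisionError:
    result = None  # 0 / 0: the estimate is undefined, not infinite

print(result)
```

Python's standard `statistics.variance` behaves in the same spirit: it raises `StatisticsError` when given fewer than two data points.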
• Where did the "-x{bar}" come from? I've totally missed that.
• It really bothers me that these terms are introduced here without a definition. Where would I even go to get some context? It seems like variance doesn't actually get defined until the next course, which is absurd.
• How do we know when it's OK to use a calculator when we're doing the math exercises? I want to be able to do these problems as well as if I were in a 'real' classroom, so I don't want to cheat and use one when I shouldn't, but I don't know where we're meant to use one and where we should be doing the math completely on our own. Some of these topics take quite a while without a calculator, and I've wondered if it would be OK to use one, and now Sal is using one in this video. Would we be using one with this math topic in a classroom setting?
  • It's really up to you. You could even go through the exercise three different times: once doing the figures by hand (or at least until you got incredibly bored!), once with a statistical calculator, and maybe even once with a spreadsheet or a statistical software package. Think about drills like this not as an obligation to a teacher but as an opportunity to develop critical skills to the degree that you would like.
• How do we know when to divide by n and when to divide by n-1? Or is it better to always divide by n-1?
  • Hi RJ,
    We divide by n when our data covers the whole population. For example, if there are 7 tigers and we know all 7 of their ages, then we would divide by n. We divide by n-1 when our data is only a sample of the population. For example, we know the ages of 5 hippos but there are 42 of them. In this case, divide by n-1 because deviations measured from the sample mean tend to understate how spread out the whole population is, so dividing by n would probably underestimate the variance.
    Hope that helps.
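To make the n versus n-1 choice concrete, here is a small Python sketch. The four-value population and the helper name `variance` are made up for illustration. It lists every possible sample of size 2 (drawn with replacement) and averages the variance estimates under each divisor, so the comparison is exact rather than a random simulation.

```python
from itertools import product

population = [1, 2, 3, 4]  # toy population, made up for illustration
N = len(population)
mu = sum(population) / N
# True population variance: divide by N because we have every value.
population_variance = sum((x - mu) ** 2 for x in population) / N


def variance(data, divisor):
    """Sum of squared deviations from the data's own mean, over a chosen divisor."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / divisor


# Every possible sample of size 2, drawn with replacement (16 samples).
samples = list(product(population, repeat=2))
avg_div_n = sum(variance(s, 2) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(population_variance, avg_div_n, avg_div_n_minus_1)
```

For this toy population the true variance is 1.25. Averaged over all samples, dividing by n gives 0.625, which is too small, while dividing by n-1 gives 1.25.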
• When finding the variance, why not take the absolute value of each variable's distance from the mean? Why square it? Would the above procedure just give you the standard deviation?
  • First, we could take their absolute values, but that would give us a totally different statistic, called the Mean Absolute Deviation (or MAD for short). There are various reasons why the standard deviation is preferred over the MAD (but that gets pretty technical). The point is that you can take the absolute value, but it will, in general, give you a different number that is not equal to the standard deviation.

    Lastly, when we square the distance from the mean, we are also squaring the units associated with it. So, if you are gathering data on children's heights and you want to calculate the variance, the result will be, for instance, 16 inches squared. Then we take the square root of the variance (because it makes more sense to talk about height in terms of "inches" rather than "inches squared"), giving us a standard deviation of 4 inches. Does this make sense?
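The difference between the two statistics can be seen in a short Python sketch. The heights are made-up numbers, chosen so that the sample variance comes out to 16 square inches as in the reply above. The variable names are for illustration only.

```python
import math

heights = [46, 46, 50, 54, 54]  # made-up heights in inches
n = len(heights)
mean = sum(heights) / n

# Mean Absolute Deviation: average absolute distance from the mean (inches).
mad = sum(abs(x - mean) for x in heights) / n

# Sample variance: squared distances over n - 1 (square inches).
variance_sq_inches = sum((x - mean) ** 2 for x in heights) / (n - 1)

# Standard deviation: square root of the variance, back in inches.
std_inches = math.sqrt(variance_sq_inches)

print(mad, variance_sq_inches, std_inches)
```

Here the MAD is 3.2 inches while the standard deviation is 4 inches, so the two measure spread differently even though both are expressed in inches.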
• Is there a resource for understanding what a sample variance is? He just starts talking about it as if I were already familiar with the concept and definition, but this is the first time it's been mentioned as I've been working through this class.
• It seems to me, after watching all the videos previous to this one, that statistics is not based on any reasonable methodology. Is that true?