# Simulation providing evidence that (n-1) gives us an unbiased estimate

Simulation by KA user tetef showing that dividing by (n-1) gives us an unbiased estimate of population variance. Simulation at: http://www.khanacademy.org/cs/will-it-converge-towards-1/1167579097. Created by Sal Khan.

## Want to join the conversation?

• Just curious: Was it by simulations like this that statisticians originally figured out the n-1 thing? Or is that conclusion actually really obvious if you just understand the "pure math" underlying it?
• No, they did it analytically. They probably came up with some intuition for the need to adjust the variance, but intuition cannot tell you why you have to divide exactly by n-1.

There is a geometric reason for dividing by n-1: it is the number of degrees of freedom. You can see this for the sample variance by considering the number of independent data points. To compute the sample variance, you first compute the sample mean. This means that, given this sample mean, if someone gives you all the data points except one, you can figure out by yourself what the last data point is. So once the sample mean is fixed, only n-1 of the data points are free to vary: in effect you have n-1 independent pieces of information for computing the sample variance, not n.
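The "figure out the last data point" step can be checked directly. Below is a minimal Python sketch; the function name `recover_last_point` and the four sample values are made up for illustration. It only shows that the mean pins down the last value, which is the same as saying the deviations from the sample mean always sum to zero.

```python
def recover_last_point(known_points, sample_mean):
    """Recover the one missing value from the other n-1 values and the sample mean."""
    n = len(known_points) + 1          # total sample size, including the missing value
    return n * sample_mean - sum(known_points)


# Hypothetical sample, chosen only for illustration.
sample = [4.0, 7.0, 1.0, 8.0]
mean = sum(sample) / len(sample)

# Hide the last value and rebuild it from the rest plus the mean.
missing = recover_last_point(sample[:-1], mean)
print("recovered last value:", missing)

# Equivalent view: the n deviations from the sample mean always sum to zero,
# so only n-1 of them can vary freely.
deviations = [x - mean for x in sample]
print("sum of deviations:", sum(deviations))
```

This does not prove that n-1 is the right denominator. It only illustrates what "losing one degree of freedom" means once the sample mean is used in place of the population mean.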
• I'm sorry, but what does biased and unbiased mean?
• A biased estimate is one that consistently underestimates or overestimates.

For example, sample estimates using (n) tend to consistently underestimate the population variance. So we say it has a BIAS for underestimation.

Sample estimates using (n-1) however do not tend to underestimate or overestimate, so we consider it UNBIASED.

Note that unbiased is not the same thing as accurate. Suppose I use another method that sometimes way underestimates, but at other times way overestimates. This method is not very accurate, but it is also unbiased -- the mean of its errors would be close to zero since the overestimates would "cancel out" the underestimates.
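Here is a rough way to see the bias for yourself. This is a sketch, not the simulation from the video. It assumes normally distributed data with a known population variance of 4, and the function names, sample size, and trial count are all arbitrary choices. It averages both kinds of estimate over many small samples.

```python
import random


def variance_estimates(sample):
    """Return (divide-by-n estimate, divide-by-(n-1) estimate) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


def simulate(num_trials=20000, n=5, sigma=2.0, seed=0):
    """Average both estimates over many samples from a normal population."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(num_trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        est_n, est_n_minus_1 = variance_estimates(sample)
        total_n += est_n
        total_n_minus_1 += est_n_minus_1
    return total_n / num_trials, total_n_minus_1 / num_trials


avg_n, avg_n_minus_1 = simulate()
print("true population variance:   4.0")
print("average when dividing by n:  ", round(avg_n, 3))
print("average when dividing by n-1:", round(avg_n_minus_1, 3))
```

With samples of size 5, the divide-by-n average should settle near 4 * (4/5) = 3.2, while the divide-by-(n-1) average should settle near 4. Any single estimate can still be far off, which is the difference between unbiased and accurate.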
• When should a variance question use (n-1), and when is it just n? Thank you. I would really appreciate a clear answer...
• These explanations are based on empirical evidence. Is there a theoretical explanation for dividing by n-1?
• I understand that n-1 provides a more accurate estimation. However, if we know our population N value, couldn't we just subtract the n/N ratio from n instead? For example, if N=20 and n=10, we would know the ratio is 0.5. Therefore, we could find an even better estimate from n-0.5.
• The number that we subtract has nothing to do with the size of the population. It's not just that it makes the estimate "more accurate," it's that it makes it what statisticians call "unbiased."

Think back to the sampling distribution of the sample mean. Suppose we repeated an experiment over and over again and recorded the sample mean from each repetition. The mean of the sampling distribution of the sample mean -- what Sal sometimes refers to as the "mean of means" -- happens to be equal to the mean of the original distribution. Because of this, we say that the sample mean is "unbiased" - it doesn't systematically overestimate or underestimate the population mean.

This is not the case with the variance. If we calculate the variance over and over again, using n in the denominator, the "mean of variances" (a strange concept, but it's the proper one to think about) will not be equal to σ^2, it will be σ^2 * (n-1)/n. By dividing by n-1 instead of n, we fix this problem. Using n, the sample variance is biased, because it tends to underestimate the population variance. Using n-1, the sample variance is unbiased.

So in the sense of bias, it's not possible to get a better estimate for the variance. Subtracting 1, and specifically 1, is what removes the bias; dividing by anything else brings some bias back. Now, there are other criteria we might look at which may make a different estimate of the variance seem "better," but if we're just talking about removing bias through the denominator, n-1 can't be beat.
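For anyone who wants the theory behind the (n-1)/n factor, here is a short derivation sketch. It assumes the data points X_1, ..., X_n are drawn independently from a population with mean μ and variance σ^2; the symbols μ and X̄ (the sample mean) are notation introduced here, not taken from the video. The only facts used are E[(X_i - μ)^2] = σ^2 and Var(X̄) = σ^2/n.

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\,\sigma^2 \\
E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right] &= \frac{n-1}{n}\,\sigma^2, \qquad
E\left[\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2
\end{aligned}
$$

The first line says that measuring deviations from the sample mean instead of the true mean always gives a sum of squares that is smaller by n(X̄ - μ)^2. On average that shortfall is exactly one σ^2, which is why dividing by n-1 rather than n compensates for it.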
• Hi all,

I have also heard people saying that we divide by the degrees of freedom, which, as I understand, would be the number of values I need to fix to get the information on all values. In this case, this would mean that, if I am provided with the sample mean, I only have n-1 degrees of freedom, as I can calculate the last value in my sample from the information I have.
Question 1: Did I understand this correctly so far?
Question 2: Where is the logical link between 'I can estimate the last value based on the information I am given' and 'I better divide by n-1 to estimate the variance'?
Question 3: The same idea would be true for the population variance. Here, too, I can calculate the missing value given n-1 values and the mean? So why, under the aspect of degrees of freedom, would I still divide by n here?
Question 4: How is the concept of degrees of freedom related to the explanation for using n-1 provided in the video?

Thank you very much for your help!
• Isn't the relative size of the sample compared to the population relevant when calculating the sample variance? I mean, if we calculate the variance of 99 elements out of a population of 100 elements, won't the variance be more accurately estimated by dividing by n, and not (n-1)? Is there a threshold at which a sample should be described by (n-1)?
• That’s an excellent question, and I’m not sure about the answer.

But if our sample size is only one or two less than our population size, we might as well look at every element in the population instead. Sampling is used when it is not practical to take information from the whole population, so there is usually a good portion of the population left over. So, this situation isn’t practical, but it is interesting to think about theoretically.   
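One way to explore this theoretically is to take a tiny population and list every possible sample. The Python sketch below assumes sampling without replacement, with every subset equally likely, which differs from the independent draws used in the usual n-1 argument. The six population values and the helper names are made up for illustration.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


# Hypothetical population of N = 6 values; samples use n = 5 of them.
population = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
N = len(population)
n = 5

pop_var_div_N = sum_sq_dev(population) / N                  # usual population variance
pop_var_div_N_minus_1 = sum_sq_dev(population) / (N - 1)

# Every possible sample of size n drawn without replacement.
samples = list(combinations(population, n))
avg_div_n = mean([sum_sq_dev(s) / n for s in samples])
avg_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print("population variance (divide by N):    ", pop_var_div_N)
print("population variance (divide by N-1):  ", pop_var_div_N_minus_1)
print("average sample variance (divide by n):  ", avg_div_n)
print("average sample variance (divide by n-1):", avg_div_n_minus_1)
```

In this setting the divide-by-(n-1) average matches the population sum of squares divided by N-1, so it lands slightly above the usual divide-by-N population variance, while the divide-by-n average lands slightly below it. So when the sample is nearly the whole population and drawn without replacement, the population size does start to matter a little, which is one more reason to simply measure the whole population in that case.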