

Lesson 4: Variance and standard deviation of a population

# Measures of spread: range, variance & standard deviation

Range, variance, and standard deviation all measure the spread, or variability, of a data set in different ways. The range is the easiest to calculate: it's the difference between the largest and smallest data points in the set. The variance is the average of the squared deviations from the mean, and the standard deviation is the square root of the variance, which measures how spread out the data is from its mean in the same units as the data itself. Created by Sal Khan.
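To make the three definitions concrete, here is a minimal sketch using Python's standard `statistics` module; the data set is made up for illustration:

```python
import statistics

# A small made-up population
data = [1, 2, 3, 8, 12]

# Range: largest value minus smallest value
data_range = max(data) - min(data)      # 12 - 1 = 11

# Population variance: the average of the squared deviations from the mean
variance = statistics.pvariance(data)   # mean is 5.2, variance is 17.36

# Population standard deviation: the square root of the variance,
# expressed in the same units as the data itself
std_dev = statistics.pstdev(data)
```

Note that `pvariance` and `pstdev` treat the data as an entire population, matching this lesson; `variance` and `stdev` would apply the sample (n - 1) correction instead.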

## Want to join the conversation?

• What's the point of squaring the differences just to make them positive when we could have taken the absolute value? • The main reason to square the values is so they are all positive. You could take the absolute value instead, but squaring means that points farther from the mean get a higher weighting. Squaring rather than taking the absolute value also makes taking the derivative of the function easier.
Finally, you can also view the result as being related to the Euclidean distance between all the points and the mean of the points (in the same way that the distance between two points (x1, y1) and (x2, y2) is the square root of (x1-x2)^2 + (y1-y2)^2).

Credits to Peter Collingridge
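The Euclidean-distance view described above can be checked numerically: the population standard deviation equals the straight-line distance between the data vector and a vector of n copies of the mean, divided by the square root of n. A small sketch with made-up numbers:

```python
import math

data = [2.0, 4.0, 6.0, 8.0]
n = len(data)
mean = sum(data) / n                                 # 5.0

# Euclidean distance from the data vector to (mean, mean, ..., mean)
dist = math.sqrt(sum((x - mean) ** 2 for x in data))

# Dividing by sqrt(n) turns that distance into the standard deviation
std_dev = dist / math.sqrt(n)                        # sqrt(5) ≈ 2.236
```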
• Why is it that, for the variance, we square the deviations to make them positive? Doesn't it make more sense to simply take the sum of their absolute values, then divide that by the number of data points?
• In what case will either variance or standard deviation be preferred over the other? At one point Sal says that variance has an odd set of units, so is standard deviation better because it has the same units as the data itself?
• What is the difference between the standard deviation and the variance? Why is the variance in squared units rather than the units of the measurement? If squaring the numbers is just to make them positive, why not use the average absolute deviation?
• Q1) The standard deviation is, basically, the square root of the variance (the mean of the squared differences between the data points and the average). Standard deviation is a measure of how far a typical value in the set is from the average. The smaller the standard deviation, the more closely grouped the data points are. The standard deviation of {1,2,3} would be less than the standard deviation of {0,4,7,10}. You can see clearly that the data points are grouped more closely together in the first set than in the second. And of course, you will see the same when you have endured the boring process of calculating the variance and then the standard deviation.

Therefore, the difference between the variance and the standard deviation is that the variance is "the average of the squared differences from the mean" and the standard deviation is its square root.

Q2) I think we could use the absolute value, but for the official definition, you have to square the differences. Thanks to Lura Ercolano for clearing up my misconception about using the absolute value to get the variance. But remember, by squaring the differences, we give the larger differences a greater influence on the result, which is called higher weighting.
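The two claims in this answer, that the tighter set {1,2,3} has the smaller standard deviation and that squaring gives distant points extra weight, can be verified with a short sketch (Python's `statistics` module, using the sets from the answer above):

```python
import statistics

a = [1, 2, 3]
b = [0, 4, 7, 10]

# Population standard deviations: the tightly grouped set scores lower
sd_a = statistics.pstdev(a)          # ≈ 0.816
sd_b = statistics.pstdev(b)          # ≈ 3.700

# How much of the total "spread" does the farthest point of b account for?
mean_b = statistics.mean(b)          # 5.25, so the point 0 is 5.25 away
abs_share = abs(0 - mean_b) / sum(abs(x - mean_b) for x in b)       # ≈ 0.40
sq_share = (0 - mean_b) ** 2 / sum((x - mean_b) ** 2 for x in b)    # ≈ 0.50
```

Under absolute deviations the farthest point contributes about 40% of the total; under squared deviations it contributes about 50%, which is the "higher weighting" the answer refers to.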
• There are many questioners here (including myself) wondering why squaring is used in the definition of variance instead of the more sensible absolute value. I've done a quick Web search on this question, and I believe I understand this better.

First, almost all the reasons given have to do with ease of computation. This is largely irrelevant, since we have computers to aid us. For example, the first derivative of the absolute value function has a discontinuity at zero. But computer numerical analysis can handle discontinuities, so calculations using the absolute value definition should be easy using an advanced computer calculator program.

Next, some people like Euclidean distance over Manhattan (rectangular) distance. There is not much justification for this, as variance is not obviously an unconstrained 2-D distance problem. In fact, it is an n-dimensional problem, where n is the number of data measurements.

No, the real reason is historical: Gauss used the square variance definition to introduce his concept of normal distributions, where it is a perfect and natural fit. We might say, a least-squares fit, since one of the motivations is fitting 2nd-order polynomials to the error data.

The point is that the use of the mean and squaring in the definition of variance works great for normal (Gaussian) distributions. As soon as you have data derived from two or more normal distributions, or a gamma distribution, or a Poisson distribution, or anything else, using the absolute value works better. In fact, the mean itself is only a reliable summary when the distribution is well-behaved; for heavy-tailed or contaminated data, other, more robust statistical measures must be used. (See https://en.wikipedia.org/wiki/Robust_statistics)

In summary, the definition of variance given here by Khan works best for Gaussian distributions, which frequently arise in nature and in human behavior. But non-Gaussian distributions also frequently arise (such as when making many measurements with a ruler). For non-Gaussian data, this definition of variance can be misleading. Instead, a measure of spread that shows how the data points differ from each other should be used, rather than the standard one, which shows the order-2 spreading from the mean.

Note: for more detailed information on the advantages of the related Absolute Mean Deviation, see http://www.leeds.ac.uk/educol/documents/00003759.htm .
• I'm still kind of confused as to what exactly variance measures.
• What is range?
• So, by reading some of the questions and answers for this video, I have concluded the following: variance and standard deviation are artificial measures of dispersion, designed to be most useful in statistical calculations. The average of the absolute value of the difference of each data point from the mean COULD be used, but the square method (variance) is generally adopted by statisticians and mathematicians for various reasons (e.g. derivatives are easier). Is this conclusion correct? Or are there other reasons that more variable points are given more weight (by use of squares, not absolute values)?
Thanks.
• To some extent, I would say yes. Using squares (or the method of "least squares") certainly does often make derivations easier, though that's not the only reason. The Normal distribution goes hand in hand with the notion of squaring deviations, and scientists centuries ago noticed that the Normal distribution worked quite well to model their astronomical data.

The method of least squares also results in the sample mean - a very intuitive and common measure of central tendency - being the "best" measure of center. And even better, the sampling distribution of the sample mean converges to the Normal distribution, so a lot of methods can be built on top of the Normal distribution, and as long as the sample mean is the starting point, everything should work out.

I have a longer, more detailed answer here (I really wish the KA links were shorter):  
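The "least squares results in the sample mean" point above can be illustrated with a brute-force sketch (made-up data, candidate centers on a fine grid): the mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations.

```python
# Made-up data; its mean is 5.2 and its median is 3
data = [1, 2, 3, 8, 12]
candidates = [c / 100 for c in range(0, 1501)]   # grid 0.00, 0.01, ..., 15.00

# Center that minimizes the sum of SQUARED deviations -> the mean
best_sq = min(candidates, key=lambda c: sum((x - c) ** 2 for x in data))

# Center that minimizes the sum of ABSOLUTE deviations -> the median
best_abs = min(candidates, key=lambda c: sum(abs(x - c) for x in data))
```

Choosing squared error thus singles out the mean as the "best" center, which is one reason the squared definition of variance pairs so naturally with it.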