© 2023 Khan Academy

# Variance of a population

Population variance is a measure of how spread out a group of data points is. Specifically, it quantifies the average squared deviation from the mean. So, if all data points are very close to the mean, the variance will be small; if data points are spread out over a wide range, the variance will be larger. Created by Sal Khan.

## Want to join the conversation?

- At 3:30, why square the distance to the mean to get a positive number? Why not just take the absolute value? (137 votes)
- You're right--we could instead take the absolute value. In fact, what you're describing sounds like what statisticians usually call the "Mean Absolute Deviation", or "MAD", for short (other names sometimes used are the "average deviation" or "mean deviation"). Just as there are different "measures of central tendency" of a set of observations (such as mean, median, mode, etc.) there are also different "measures of dispersion" (besides the MAD and standard deviation, another common "measure of dispersion" is the "Interquartile Range").

However, statisticians usually prefer the variance/standard deviation to the MAD because the MAD is not as "mathematically tractable" as the variance (i.e. the variance is easier to work with than the MAD--though the exact reasons why this is true are beyond the scope of the video).

The important thing to remember, however, is that, in general, the MAD and the standard deviation will NOT be equal. Thus, if your teacher asks for the standard deviation and you calculate the MAD, you will probably get the wrong answer.

For more information, check out:

http://en.wikipedia.org/wiki/Mean_absolute_deviation (43 votes)
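To make the difference concrete, here is a minimal sketch that computes both measures for the five data points used in the video (1, 3, 5, 7 and 14 years of experience). The variable names are illustrative only.

```python
import math

# Years of experience for the five-person population used in the video.
data = [1, 3, 5, 7, 14]

n = len(data)
mean = sum(data) / n  # 6.0

# Mean absolute deviation: the average of |x - mean|.
mad = sum(abs(x - mean) for x in data) / n

# Population standard deviation: square root of the average squared deviation.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

print(f"MAD = {mad:.2f}")                      # MAD = 3.60
print(f"standard deviation = {std_dev:.2f}")   # standard deviation = 4.47
```

For this data set the two measures differ by almost a full year, so they cannot be used interchangeably.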

- What is the purpose of finding population variance? What does this value represent in simple terms? (38 votes)
- Earlier in the playlist, Khan described different "measures of central tendency", specifically, the mean, median, & mode. The next step, however, is to learn about different "measures of dispersion"--i.e. how dispersed the data is. Of these different "measures of dispersion", the variance (and, hence, the standard deviation) is the most frequently used and, thus, the most important.

An example might help:

If you have a city where the average height is 5'6", it could be the case that every adult in the city is exactly 5'6", or it could be the case that half the adults are exactly 4 feet tall and the other half are exactly 7 feet tall. The average height in both cases is 5'6". Thus, we use the variance to measure how spread out a set of data is.

Or, as another example, in finance, the standard deviation of returns is often used to represent the "riskiness" of a company's stock (where a high standard deviation would suggest a risky stock).

Does this help? (104 votes)
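The two hypothetical cities can be compared in a few lines. This is only a sketch: the population size of 10 per city is an arbitrary assumption, heights are in feet (5'6" = 5.5 ft), and `population_variance` is a helper written for this example.

```python
def population_variance(values):
    """Average squared deviation from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

# Heights in feet. The city size of 10 is an arbitrary choice for illustration.
uniform_city = [5.5] * 10             # everyone is exactly 5'6"
split_city = [4.0] * 5 + [7.0] * 5    # half are 4 ft tall, half are 7 ft tall

for name, heights in [("uniform", uniform_city), ("split", split_city)]:
    mean = sum(heights) / len(heights)
    print(name, mean, population_variance(heights))
# uniform 5.5 0.0
# split 5.5 2.25
```

Both cities have the same mean, but only the variance reveals that the second one has no one near the average height.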

- Ok, so the variance of this population is 20. But what does that tell me, really, about this population? What does the number 20 tell me about the experience levels at the Khan Academy? I understand that variance is a measure of spread in the data, but is 20 a large spread? Would we say that the population is very various? Or are these questions not meaningful given the small size of the sample? (39 votes)
- It helps you figure out how good an indication the mean is of a typical employee. If there's a large variance, you know that there's a large gap in experience level between different employees; if the variance is small, you know that all the employees have more or less the mean experience. (28 votes)

- In real life, when would you need to know how to solve this problem? Can someone give me an example on how you would use variance in real life? (3 votes)
- There was an episode of the US television show "Mythbusters" last year where they tested the idea that you can "hold" urination by dancing. The problem is, in their experiment, they did one sample of how long they could "hold it" without dancing, and one sample with dancing. With only one sample, they could not estimate the variance in their data (and thus, they made senseless conclusions).

I assume you're not a television scientist, but the same idea applies any time you collect information about the world. If you're planning a dinner for 50 people, you need to consider the variance in that number: if the variance is large, you'd better have an extra table ready. When I receive post from abroad, it usually comes after one week, but since there's a large variance, I'm not worried if something hasn't yet arrived after 10 days.

Like many things in math, you won't do these explicit calculations every day. Instead, you'll internalize them. You'll understand the main idea, and use mental estimates instead of calculations. But those are estimations you wouldn't have made before you studied these things, and that's why variance is valuable. Hope that helps! :) (25 votes)

- Around 6:10 he's talking about the 20 being the squared distance away from the population mean - is there a time when you would take the square root? (6 votes)
- Yep. The square root of the variance is called the standard deviation, which will be another crucial concept that you'll get to pretty soon. (19 votes)
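For the video's data set, taking the square root works out as follows (written in LaTeX; the rounding to two decimals is ours):

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{20} \approx 4.47
```

The variance is measured in squared units (years squared here), while the standard deviation is back in the original units (years), which is one reason the square root is often the easier number to interpret.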

- Where in real life would we use variance? I mean, I understand the equation, but not the concept behind it. Like why do we square the numbers, why is the answer larger than the population mean, and what's the difference between the upper-case and lower-case sigma? (3 votes)
- Variance is a measure of how much a data set differs from its mean.

Old math joke: Two mathematicians go duck hunting. One shoots 1 foot in front of the duck, the other shoots 1 foot behind the duck. The first cries out, "On average, we got it!"

The mean of their shots was on the duck, but the variance was too large.

If two data sets have the same mean, are they really the same data set (from the same population)? Variance gives you more information about the distribution of the data.

We square the values to make them all positive... in the duck joke, if you only added the distance from each data point to the mean you would get a variance of zero (-1 + 1 = 0). So you find the difference between a data point and the mean, then square that difference (to make it positive), then find the mean of all of those squared differences.

If the data is widely distributed the variance can get very large... the real world is annoying like that.

Upper case sigma (big E) usually means 'sum up a bunch of stuff', while lower case sigma (small o with a tail at the top) means 'standard deviation', which is the square root of the variance. (17 votes)
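The duck example can be checked directly. In this small sketch the shot positions are measured in feet relative to the duck, so the two shots are at -1 and +1.

```python
# Positions of the two shots relative to the duck, in feet (from the joke).
shots = [-1, 1]

mean = sum(shots) / len(shots)

# Adding the signed differences lets the misses cancel each other out.
signed_total = sum(x - mean for x in shots)

# Squaring each difference first keeps every miss positive.
variance = sum((x - mean) ** 2 for x in shots) / len(shots)

print(mean, signed_total, variance)  # 0.0 0.0 1.0
```

The signed differences sum to zero even though neither shot hit, while the variance of 1 (square foot) records that both shots were off.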

- I understand that we square in order to get a positive value, that makes sense. But why not just take the absolute value of each element? For example, have |1 - 6| + |3 - 6| + |5 - 6|. And if you wanted to find the average of how much each deviated by, you could just divide this all by 3. Here you'd get an answer of 3, meaning that on average, each point differs from the mean by 3.

Why isn't this method used? (4 votes)
- As it was explained to me when I asked the exact same question...

just because

An answer sadly lacking in rationale or mathematical rigor.

My understanding is that it could have been, that there would have been advantages and disadvantages in choosing that strategy and that the people involved determined the squares method was the most useful at the time... and once the tradition gets set and people start to produce results, it is devilishly hard to change.

I do know that the square root of the variance is the standard deviation (for any distribution, not only a normal one), which is another useful statistic in application. I don't know if there would be a comparable statistic derived from the absolute value method. (0 votes)

- Hello, everyone. I can understand why we compute the mean - it represents all the numbers in the data. But variance is more complicated. I do not understand where and how I might use that number. And I don't know any areas of life where variance is used. So, can you please give me some examples of practical use of variance? Thank you. (3 votes)
- Variance and standard deviation allow you to quickly understand how close most of a population is to the mean.

For instance, say the average adult male height in the USA is 70 in, with a standard deviation of 2 in.

Then about 2/3 of adult males in the US are between 68 in and 72 in (assuming heights are roughly normally distributed), so the standard deviation tells you that it's normal to be within about 2 inches of the mean. It lets you know that someone 6'4" is rather tall, although he wouldn't be unusually tall if the standard deviation were 6 inches. (4 votes)
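One way to express "rather tall" is to count how many standard deviations a height lies from the mean. The sketch below uses the figures from this answer (mean 70 in, standard deviation 2 in or 6 in); the helper name `deviations_from_mean` is made up for this example.

```python
def deviations_from_mean(value, mean, std_dev):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev

height = 6 * 12 + 4  # 6'4" expressed in inches (76)

print(deviations_from_mean(height, mean=70, std_dev=2))  # 3.0
print(deviations_from_mean(height, mean=70, std_dev=6))  # 1.0
```

With a standard deviation of 2 inches, 6'4" is three standard deviations above the mean; with a standard deviation of 6 inches, it is only one.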

- How does the Std. Dev compare to the average of the absolute value of the differences from the mean? In other words, is taking the average of the absolute values of the differences between each data point and the mean a useful number? (2 votes)
- It used to be that Mean Absolute Deviation (MAD) was the standard way of communicating dispersion in scores. However, when MAD is calculated for a sample, it tends to be (some argue) a negatively biased estimate of the population MAD. In other words, if you knew what the population MAD was, you'd find that a sample MAD would more often be lower than the population MAD. This is not a good characteristic for a sample statistic. We want a sample statistic to be an unbiased estimator of the population parameter. The sample statistic can be higher or lower than the actual population parameter (there is always sampling error), but we'd like the sampling error to be random. It could be too high or too low, but we don't want it to be consistently too low (like sample MAD is). By contrast, the sample variance, when computed with n - 1 in the denominator, is an unbiased estimator of the population variance (its square root, the sample standard deviation, is still slightly biased). Squaring also means that an occasional outlier counts for more, because the impact of the outlier on the overall sample statistic is greater. This helps to make the sample estimate of SD a little bigger, and a better estimate of population SD. (4 votes)
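The bias described here can be checked exhaustively on a tiny case. The sketch below makes several assumptions that are not in the answer: it treats the video's five data points as the population, draws every possible sample of size 2 with replacement, and divides the sample variance by n - 1 (the `ddof` parameter name is borrowed from common library conventions and is just a label here).

```python
from itertools import product

population = [1, 3, 5, 7, 14]  # years of experience from the video

def mean(values):
    return sum(values) / len(values)

def mad(values):
    """Mean absolute deviation around the values' own mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

def variance(values, ddof=0):
    """Sum of squared deviations divided by (n - ddof)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

# Every ordered sample of size 2, drawn with replacement (25 samples).
samples = list(product(population, repeat=2))

avg_sample_mad = mean([mad(s) for s in samples])
avg_sample_var = mean([variance(s, ddof=1) for s in samples])

print("population MAD:", mad(population), "average sample MAD:", avg_sample_mad)
print("population variance:", variance(population), "average sample variance:", avg_sample_var)
```

In this setup the average sample MAD comes out below the population MAD, while the average of the n - 1 sample variances matches the population variance. Other sample sizes or sampling schemes would give different numbers.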

- At 1:27, what's that big 'E'? (2 votes)
- It's a capital sigma - that's the Greek equivalent of the letter S. The capital sigma is used to write a sum when all the terms are very similar. In the video, all the terms are x with some subscript.

This video explains it better than I can in plain text: https://www.khanacademy.org/math/algebra2/sequences-and-series/copy-of-sigma-notation/v/sigma-notation-sum (4 votes)
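As a rough illustration in LaTeX, the sum of the video's five data points can be written with a capital sigma, where the index name i is a conventional choice:

```latex
\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 1 + 3 + 5 + 7 + 14 = 30
```

The number below the sigma is where the index starts, and the number above it is where the index stops.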

## Video transcript

Let's say I'm trying to judge how many years of experience we have at the Khan Academy. Or on average, how many years of experience we have. And in particular, the particular type of average we'll focus on is the arithmetic mean. So I go and I survey the folks there.

And let's say this was when Khan Academy was a smaller organization, when there were only five people in the organization. And I find-- and I'm surveying the entire population-- so years of experience, the entire population of Khan Academy, because that's what I care about, years of experience at our organization, at Khan Academy. And this was when we had five people. And if I were to go-- we're now 36 people, I don't want to date this video too much-- but let's say I go, and I say, OK, there's one person straight out of college, they have one year of experience, or recently out of college, somebody with three years of experience, someone with five years of experience, someone with seven years of experience, and someone very experienced, or reasonably experienced, with 14 years of experience.

So based on these data points, and this is our population for years of experience, I'm assuming that we only have five people in the organization at this point. What would be the population mean for the years of experience? What is the mean years of experience for my population? Well, we can just calculate that.

Our mean experience, and I'm going to denote it with mu, because we're talking about the population now. This is a parameter for the population. It's going to be equal to the sum, from our first data point, so data point one all the way to data point, in this case, data point five-- we have five data points-- of each of-- so we're going to take all, from the first data point, the second data point, the third data point, all the way to the fifth. So this is going to be equal to x1, plus x-- and I'm going to divide it all by the number of data points I have-- plus x2, plus x3, plus x4, plus x sub 5, subscript 5. All of that over 5. And as we said, this is a very fancy way of saying, I'm going to sum up all of these things and then divide by the number of things we have.

So let's do that. Get the calculator out. So I'm going to add them all up, 1 plus 3 plus 5-- I really don't need a calculator for this-- plus 7 plus 14. So that's five data points. And I'm going to divide by 5. And I get 6. So the population mean, for years of experience at my organization, is 6. 6 years of experience.

Well, that's, I
guess, interesting. But now I want to ask another question. I want to get some measure of how much spread there is around that mean. Or how much do the data points vary around that mean. And obviously, I can give someone all the data points. But instead, I actually want to come up with a parameter that somehow represents how much all of these things, on average, are varying from this number right here. Or maybe I will call that thing the variance.

And so, what I do-- so the variance-- and I will do-- and this is a population variance that I'm talking about, just to be clear, it's a parameter. The population variance I'm going to denote with the Greek letter sigma, lowercase sigma-- this is capital sigma-- lowercase sigma squared. And I'm going to say, well, I'm going to take the distance from each of these points to the mean. And just so I get a positive value, I'm going to square it. And then, I'm going to divide by the number of data points that I have. So essentially, I'm going to find the average squared distance.

Now that might sound very complicated, but let's actually work it out. So I'll take my first data point and I will subtract our mean from it. So this is going to give me a negative number. But if I square it, it's going to be positive. So it's, essentially, going to be the squared distance between 1 and my mean. And then, to that, I'm going to add the squared distance between 3 and my mean. And to that, I'm going to add the squared distance between 5 and my mean. And since I'm squaring, it doesn't matter if I do 5 minus 6, or 6 minus 5. When I square it, I'm going to get a positive result regardless. And then, to that I'm going to add the squared distance between 7 and my mean. So 7 minus 6 squared. All of this, this is my population mean that I'm finding the difference between. And then, finally, the squared difference between 14 and my mean.

And then, I'm going to find, essentially, the mean of these squared distances. So I have five squared distances right over here. So let me divide by 5. So what will I get when I make this calculation, right over here? Well, let's figure this out. This is going to be equal to 1 minus 6 is negative 5, negative 5 squared is 25. 3 minus 6 is negative 3, now if I square that, I get 9. 5 minus 6 is negative 1, if I square it, I get positive 1. 7 minus 6 is 1, if I square it, I get positive 1. And 14 minus 6 is 8, if I square it, I get 64. And then, I'm going to divide all of that by 5. And I don't need to use a calculator, but I tend to make a lot of careless mistakes when I do things while making a video. So I get 25 plus 9 plus 1 plus 1 plus 64 divided by 5. So I get 20.

So the average squared distance, or the mean squared distance, from our population mean is equal to 20. You may say, wait, these things aren't 20 away. Remember, it's the squared distance away from my population mean. So I squared each of these things. I liked it, because it made it positive. And we'll see later it has other nice properties about it.

Now the last thing
is, how can we represent this mathematically? We already saw that we know how to represent a population mean, and a sample mean, mathematically like this, and hopefully, we don't find it that daunting anymore. But how would we do the exact same thing? How would we denote what we did, right over here?

Well, let's just think it through. We're just saying that the population variance, we're taking the sum of each-- so we're going to take each item, we'll start with the first item. And we're going to go to the n-th item in our population. We're talking about a population here. And we're going to take-- we're not going to just take the item, this would just be the item-- but we're going to take the item. And from that, we're going to subtract the population mean. We're going to subtract this thing. We're going to subtract this thing. We're going to square it. We're going to square it.

So the way I've written it right now, this would just be the numerator. I've just taken the sum of each of these things, the sum of the squared differences between each data point and the population mean. If I really want to get the way I figure out this variance right over here, I have to divide the whole thing by the number of data points we have.

So this might seem very daunting, and very intimidating. But all it says is, take each of your data points-- well, one, it says, figure out your population mean. Figure that out first. And then, from each data point, in your population, subtract out that population mean, square it, take the sum of all of those things, and then just divide by the number of data points you have. And you will get your population variance.
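The steps described in the transcript can be sketched in Python as follows. The variable names (`experience`, `mu`, `sigma_squared`) are chosen for this example, and the final line cross-checks the result against `statistics.pvariance` from the standard library.

```python
import statistics

# The whole population: years of experience of the five people.
experience = [1, 3, 5, 7, 14]

# Step 1: population mean, mu = (x1 + x2 + ... + xN) / N.
N = len(experience)
mu = sum(experience) / N

# Step 2: squared distance of each data point from the mean.
squared_distances = [(x - mu) ** 2 for x in experience]

# Step 3: population variance, sigma^2 = sum((x_i - mu)^2) / N.
sigma_squared = sum(squared_distances) / N

print(mu)                 # 6.0
print(squared_distances)  # [25.0, 9.0, 1.0, 1.0, 64.0]
print(sigma_squared)      # 20.0

# Cross-check against the standard library's population variance.
assert sigma_squared == statistics.pvariance(experience)
```

Dividing by `N` rather than `N - 1` is what makes this a population variance, which matches the transcript's assumption that all five people were surveyed.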