Statistics and probability
Variance of a population
Population variance is a measure of how spread out a group of data points is. Specifically, it quantifies the average squared deviation from the mean. So, if all data points are very close to the mean, the variance will be small; if data points are spread out over a wide range, the variance will be larger. Created by Sal Khan.
- At 3:30, why square the distance to the mean to get a positive number? Why not just take the absolute value?(137 votes)
- You're right--we could instead take the absolute value. In fact, what you're describing sounds like what statisticians usually call the "Mean Absolute Deviation", or "MAD", for short (other names sometimes used are the "average deviation" or "mean deviation"). Just as there are different "measures of central tendency" of a set of observations (such as mean, median, mode, etc.) there are also different "measures of dispersion" (besides the MAD and standard deviation, another common "measure of dispersion" is the "Interquartile Range").
However, statisticians usually prefer the variance/standard deviation over the MAD because the MAD is not as "mathematically tractable" as the variance (i.e. the variance is easier to work with than the MAD, though the exact reasons why this is true are beyond the scope of the video).
The important thing to remember, however, is that, in general, the MAD and the standard deviation will NOT be equal. Thus, if your teacher asks for the standard deviation and you calculate the MAD, you will probably get the wrong answer.
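To see how far apart the two measures can be, here is a minimal sketch in Python that computes both for the five values used in the video (1, 3, 5, 7, 14). The variable names are only illustrative.

```python
from math import sqrt

# Years of experience for the five-person population used in the video.
experience = [1, 3, 5, 7, 14]

n = len(experience)
mean = sum(experience) / n  # population mean

# Mean absolute deviation: the average of |x - mean|.
mad = sum(abs(x - mean) for x in experience) / n

# Population standard deviation: square root of the average squared deviation.
std_dev = sqrt(sum((x - mean) ** 2 for x in experience) / n)

print(f"mean = {mean}, MAD = {mad}, standard deviation = {std_dev:.3f}")
```

For this data the MAD comes out smaller than the standard deviation, because squaring gives the far-away value (14) more weight.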
- What is the purpose of finding population variance? What does this value represent in simple terms?(38 votes)
- Earlier in the playlist, Khan described different "measures of central tendency", specifically, the mean, median, & mode. The next step, however, is to learn about different "measures of dispersion"--i.e. how dispersed the data is. Of these different "measures of dispersion", the variance (and, hence, the standard deviation) is the most frequently used and, thus, the most important.
An example might help:
If you have a city where the average height is 5'6 it could be the case that every adult in the city is exactly 5'6 or it could be the case that half the adults were exactly 4 feet tall and the other half were exactly 7 feet tall--the average height in both cases is 5'6. Thus, we use the variance to measure how spread out a set of data is.
Or, as another example, in Finance, the standard deviation of returns is often used to represent the "riskiness" of a company's stock (where a high standard deviation would suggest a risky stock).
Does this help??(104 votes)
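The two-city example can be checked with a short sketch. The city sizes (ten adults each) and the list names are made up for illustration; heights are in inches, so 5'6 is 66, 4 feet is 48, and 7 feet is 84.

```python
from statistics import mean, pvariance

# Hypothetical cities of ten adults each, heights in inches.
city_uniform = [66] * 10            # everyone is exactly 5'6
city_split = [48] * 5 + [84] * 5    # half are 4 feet, half are 7 feet

for name, heights in [("uniform", city_uniform), ("split", city_split)]:
    # pvariance divides by the number of data points (population variance).
    print(name, "mean =", mean(heights), "variance =", pvariance(heights))
```

Both cities report the same mean, but only the variance shows that the second city is nothing like the first.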
- Ok, so the variance of this population is 20. But what does that tell me, really, about this population? What does the number 20 tell me about the experience levels at the Khan Academy? I understand that variance is a measure of spread in the data, but is 20 a large spread? Would we say that the population is very various? Or are these questions not meaningful given the small size of the sample?(39 votes)
- It helps you figure out how good an indication the mean is of a typical employee. If there's a large variance, you know that there's a large experience level gap between different employees; if the variance is small, you know that all the employees have more or less the mean experience.(28 votes)
- In real life, when would you need to know how to solve this problem? Can someone give me an example on how you would use variance in real life?(3 votes)
- There was an episode of the US television show "Mythbusters" last year where they tested the idea that you can "hold" urination by dancing. The problem is, in their experiment, they did one sample of how long they could "hold it" without dancing, and one sample with dancing. With only one sample, they could not estimate the variance in their data (and thus, they made senseless conclusions).
I assume you're not a television scientist, but the same idea applies any time you collect information about the world. If you're planning a dinner for 50 people, you need to consider the variance in that number: if the variance is large, you'd better have an extra table ready. When I receive post from abroad, it usually comes after one week, but since there's a large variance, I'm not worried if something hasn't yet arrived after 10 days.
Like many things in math, you won't do these explicit calculations every day. Instead, you'll internalize them. You'll understand the main idea, and use mental estimates instead of calculations. But those are estimations you wouldn't have made before you studied these things, and that's why variance is valuable. Hope that helps! :)(25 votes)
- Around 6:10 he's talking about the 20 being the squared distance away from the population mean - is there a time when you would take the square root?(6 votes)
- Yep. The square root of the variance is called the standard deviation, which will be another crucial concept that you'll get to pretty soon.(19 votes)
- Where in real life would we use variance? I mean, I understand the equation, but not the concept behind it. Like why do we square the numbers, why is the answer larger than the population mean, and what's the difference between the upper-case and lower-case sigma?(3 votes)
- Variance is a measure of how much a data set differs from its mean.
Old math joke: Two mathematicians go duck hunting. One shoots 1 foot in front of the duck, the other shoots 1 foot behind the duck. The first cries out, "On average, we got it!"
The mean of their shots was on the duck, but the variance was too large.
If two data sets have the same mean, are they really the same data set (from the same population)? Variance gives you more information about the distribution of the data.
We square the values to make them all positive... in the duck joke, if you only added the distance from each data point to the mean you would get a variance of zero (-1 + 1 = 0). So you find the difference between a data point and the mean, then square that difference (to make it positive), then find the mean of all of those squared differences.
If the data is widely distributed the variance can get very large... the real world is annoying like that.
Upper case sigma (big E) usually means 'sum up a bunch of stuff' while lower case sigma (small o with a tail at the top) means 'standard deviation' which is the square root of the variance.(17 votes)
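The duck example fits in a few lines of Python. This is only a sketch, with the shots recorded as +1 (one foot in front of the duck) and -1 (one foot behind), which is an assumed sign convention.

```python
# Shot positions relative to the duck, in feet: one in front, one behind.
shots = [1, -1]

n = len(shots)
mean = sum(shots) / n

# Signed deviations cancel out, so their average says nothing about spread.
mean_signed_deviation = sum(x - mean for x in shots) / n

# Squared deviations cannot cancel; their average is the population variance.
variance = sum((x - mean) ** 2 for x in shots) / n

print(mean, mean_signed_deviation, variance)
```

The mean and the average signed deviation are both zero, while the variance is not, which is the point of the joke.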
- I understand that we square in order to get a positive value, that makes sense. But why not just take the absolute value of each element? For example, have |1 - 6| + |3 - 6| + |5 - 6|. And if you wanted to find the average of how much each deviated by, you could just divide this all by 3. Here you'd get an answer of 3, meaning that on average, each point differs from the mean by 3.
Why isn't this method used?(4 votes)
- As it was explained to me when I asked the exact same question...
An answer sadly lacking in rationale or mathematical rigor.
My understanding is that it could have been, that there would have been advantages and disadvantages in choosing that strategy and that the people involved determined the squares method was the most useful at the time... and once the tradition gets set and people start to produce results, it is devilishly hard to change.
I do know that in a normal distribution the square root of the variance is the standard deviation, which is another useful statistic in application. I don't know if there would be a comparable statistic derived from the absolute value method.(0 votes)
- Hello, everyone. I can understand why we compute mean - it represents all numbers in data. But variance is more complicated. I do not understand where and how I might use that number. And I don't know any of life areas where variance is used. So, can you please give me some examples of practical use of variance? Thank you.(3 votes)
- Variance and standard deviation allow you to quickly understand how close most of a population is to the mean.
For instance, average adult male height in the USA is 70in, the standard deviation being 2in.
Now, about 2/3 of adult males in the US are between 68in and 72in, so the standard deviation tells you that it's normal to be within about 2 inches of the mean. It lets you know that someone 6'4'' is rather tall, although he wouldn't be tall if the deviation was 6 inches.(4 votes)
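The "about 2/3" figure can be checked with Python's standard library. This is a sketch that takes the mean of 70in and the standard deviation of 2in from the comment above and additionally assumes heights follow a normal distribution, which the comment does not state explicitly.

```python
from statistics import NormalDist

# Mean and standard deviation quoted above; the normal shape is an assumption.
heights = NormalDist(mu=70, sigma=2)

# Fraction of the population within one standard deviation of the mean.
within_one_sd = heights.cdf(72) - heights.cdf(68)

# How many standard deviations above the mean is 76in (6 ft 4 in)?
z_narrow = (76 - 70) / 2  # with a standard deviation of 2in
z_wide = (76 - 70) / 6    # with a standard deviation of 6in

print(f"within one SD: {within_one_sd:.3f}")
print(f"76in is {z_narrow} SDs above the mean if SD = 2, {z_wide} if SD = 6")
```

Under that assumption roughly 68% of the population falls between 68in and 72in, and the same 76in person sits three standard deviations out in one case and only one in the other.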
- At 1:27, what's that big 'e'?(2 votes)
- It's a capital sigma - that's the Greek equivalent of the letter S. The capital sigma is used to write a sum when all the terms are very similar. In the video, all the terms are x with some subscript.
This video explains it better than I can in plain text: https://www.khanacademy.org/math/algebra2/sequences-and-series/copy-of-sigma-notation/v/sigma-notation-sum(5 votes)
- How does the Std. Dev compare to the average of the absolute value of the differences from the mean? In other words, is taking the average of the absolute values of the differences between each data point and the mean a useful number?(2 votes)
- It used to be that Mean Absolute Deviation (MAD) was the standard way of communicating dispersion in scores. However, when MAD is calculated for a sample, it tends to be (some argue) a negatively biased estimate of the population MAD. In other words, if you knew what the population MAD was, you'd find that a sample MAD would more often be lower than the population MAD. This is not a good characteristic for a sample statistic. We want a sample statistic to be an unbiased estimator of the population parameter. The sample statistic can be higher or lower than the actual population parameter (there is always sampling error), but we'd like the sampling error to be random. It could be too high or too low, but we don't want it to be consistently too low (like sample MAD is). Sample variance, computed with n - 1 rather than n in the denominator, is an unbiased estimator of the population variance; its square root, the sample standard deviation, still tends to run slightly low. Because the differences are squared, an occasional outlier counts for more, and its impact on the overall sample statistic is greater. This helps to make the sample estimate of SD a little bigger, and a better estimate of population SD.(4 votes)
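Claims about bias can be checked exactly on a small population by listing every possible sample. The sketch below uses the video's five values and assumes samples of size 2 drawn with replacement. Both the sample size and the sampling scheme are choices made here to keep the enumeration short, not something from the video.

```python
from itertools import product
from statistics import fmean, pstdev, pvariance, stdev, variance

population = [1, 3, 5, 7, 14]  # the video's five-person population

# Every ordered sample of size 2, drawn with replacement (25 in total).
samples = list(product(population, repeat=2))

# Average each sample statistic over all equally likely samples.
avg_var_n_minus_1 = fmean([variance(s) for s in samples])  # divides by n - 1
avg_var_n = fmean([pvariance(s) for s in samples])         # divides by n
avg_sd_n_minus_1 = fmean([stdev(s) for s in samples])      # sqrt of the n - 1 variance

print("population variance:", pvariance(population))
print("average sample variance (n - 1):", avg_var_n_minus_1)
print("average sample variance (n):", avg_var_n)
print("population SD:", round(pstdev(population), 3))
print("average sample SD (n - 1):", round(avg_sd_n_minus_1, 3))
```

In this enumeration the n - 1 sample variance averages out to the population variance, the divide-by-n version averages to half of it, and the sample standard deviation averages below the population standard deviation.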
Let's say I'm trying to judge how many years of experience we have at the Khan Academy. Or on average, how many years of experience we have. And in particular, the particular type of average we'll focus on is the arithmetic mean. So I go and I survey the folks there. And let's say this was when Khan Academy was a smaller organization, when there were only five people in the organization. And I'm surveying the entire population, because that's what I care about: years of experience at our organization, at Khan Academy. We're now 36 people, I don't want to date this video too much, but this was when we had five people.
So let's say I go, and I say, OK, there's one person straight out of college, or recently out of college, with one year of experience, somebody with three years of experience, someone with five years of experience, someone with seven years of experience, and someone very experienced, or reasonably experienced, with 14 years of experience. So these are our data points, and this is our population, for years of experience. I'm assuming that we only have five people in the organization at this point.
What would be the population mean for the years of experience? What is the mean years of experience for my population? Well, we can just calculate that. Our mean experience, and I'm going to denote it with mu, because we're talking about the population now, is a parameter for the population. It's going to be equal to the sum, from our first data point all the way to, in this case, data point five (we have five data points), of each of the data points, all divided by the number of data points I have. So this is going to be equal to x1, plus x2, plus x3, plus x4, plus x sub 5, all of that over 5. And as we said, this is a very fancy way of saying, I'm going to sum up all of these things and then divide by the number of things we have.
So let's do that. Get the calculator out. So I'm going to add them all up, 1 plus 3 plus 5 (I really don't need a calculator for this) plus 7 plus 14. So that's five data points. And I'm going to divide by 5. And I get 6. So the population mean, for years of experience at my organization, is 6. 6 years of experience.
Well, that's, I guess, interesting. But now I want to ask another question. I want to get some measure of how much spread there is around that mean. Or how much do the data points vary around that mean. And obviously, I can give someone all the data points. But instead, I actually want to come up with a parameter that somehow represents how much all of these things, on average, are varying from this number right here. Or maybe I will call that thing the variance.
And this is a population variance that I'm talking about, just to be clear, it's a parameter. The population variance I'm going to denote with the Greek letter sigma, lowercase sigma (this is capital sigma), lowercase sigma squared. And I'm going to say, well, I'm going to take the distance from each of these points to the mean. And just so I get a positive value, I'm going to square it. And then, I'm going to divide by the number of data points that I have. So essentially, I'm going to find the average squared distance. Now that might sound very complicated, but let's actually work it out. So I'll take my first data point and I will subtract our mean from it. So this is going to give me a negative number. But if I square it, it's going to be positive.
So it's, essentially, going to be the squared distance between 1 and my mean. And then, to that, I'm going to add the squared distance between 3 and my mean. And to that, I'm going to add the squared distance between 5 and my mean. And since I'm squaring, it doesn't matter if I do 5 minus 6, or 6 minus 5. When I square it, I'm going to get a positive result regardless. And then, to that I'm going to add the squared distance between 7 and my mean. So 7 minus 6 squared. In each of these, the 6 is my population mean that I'm finding the difference from. And then, finally, the squared difference between 14 and my mean. And then, I'm going to find, essentially, the mean of these squared distances. So I have five squared distances right over here. So let me divide by 5.
So what will I get when I make this calculation, right over here? Well, let's figure this out. 1 minus 6 is negative 5, negative 5 squared is 25. 3 minus 6 is negative 3, now if I square that, I get 9. 5 minus 6 is negative 1, if I square it, I get positive 1. 7 minus 6 is 1, if I square it, I get positive 1. And 14 minus 6 is 8, if I square it, I get 64. And then, I'm going to divide all of that by 5. And I don't need to use a calculator, but I tend to make a lot of careless mistakes when I do things while making a video. So I get 25 plus 9 plus 1 plus 1 plus 64, which is 100, divided by 5. So I get 20.
So the average squared distance, or the mean squared distance, from our population mean is equal to 20. You may say, wait, these things aren't 20 away. Remember, it's the squared distance away from my population mean. So I squared each of these things. I liked it, because it made it positive. And we'll see later it has other nice properties.
Now the last thing is, how can we represent this mathematically? We already saw that we know how to represent a population mean, and a sample mean, mathematically like this, and hopefully, we don't find it that daunting anymore.
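The arithmetic worked through above can be reproduced in a few lines. This is a minimal sketch in Python that follows the same steps in the same order. The variable names are illustrative and not from the video.

```python
experience = [1, 3, 5, 7, 14]  # years of experience for the whole population

n = len(experience)
mu = sum(experience) / n                    # population mean
deviations = [x - mu for x in experience]   # each data point minus the mean
squared = [d ** 2 for d in deviations]      # squaring makes every term non-negative
variance = sum(squared) / n                 # mean of the squared distances

print("mean:", mu)
print("deviations:", deviations)
print("squared deviations:", squared)
print("population variance:", variance)
```

The list of squared deviations matches the 25, 9, 1, 1 and 64 from the calculation, and dividing their sum by 5 gives the same variance.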
But how would we do the exact same thing? How would we denote what we did, right over here? Well, let's just think it through. We're just saying that for the population variance, we're taking the sum over each item. We'll start with the first item, and we're going to go to the n-th item in our population. We're talking about a population here. And we're not going to just take the item, this would just be the item. We're going to take the item, and from that, we're going to subtract the population mean. And we're going to square it.
So the way I've written it right now, this would just be the numerator. I've just taken the sum of the squared differences between each data point and the population mean. If I really want to get the variance the way I figured it out right over here, I have to divide the whole thing by the number of data points we have.
So this might seem very daunting, and very intimidating. But all it says is: first, figure out your population mean. And then, from each data point in your population, subtract out that population mean, square it, take the sum of all of those things, and then just divide by the number of data points you have. And you will get your population variance.
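Written in summation notation, the procedure described above looks roughly like the following. The symbols N (the number of data points in the population) and x_i (the i-th data point) are notation assumed here; mu and sigma squared are the names used in the video.

```latex
\[
  \mu = \frac{1}{N} \sum_{i=1}^{N} x_i ,
  \qquad
  \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2
\]

% Five-person example from the video: N = 5 and \mu = 6.
\[
  \sigma^2
  = \frac{(1-6)^2 + (3-6)^2 + (5-6)^2 + (7-6)^2 + (14-6)^2}{5}
  = \frac{25 + 9 + 1 + 1 + 64}{5}
  = 20
\]
```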