Statistics and probability
Population standard deviation
The population standard deviation is a measure of how much variation there is among individual data points in a population. It's a way of quantifying how spread out the data is from its mean. A small standard deviation means that the data points are generally close to the mean, while a large standard deviation means that the data is more dispersed. Created by Sal Khan.
- Isn't the dividing part wrong?
I learned it should not be 5 in this case, but it should be 4 which is n-1.(4 votes)
- You divide by "n-1" when dealing with the sample standard deviation. In this video Sal is calculating the standard deviation of the population, which is why he is dividing by "N".(8 votes)
- How could the concept of variance be useful in real life?(7 votes)
- So in this example the standard deviation is 0.562 meters. Does that mean that the 5.5 meters in the original data set is a bit of an outlier, since it's not within one standard deviation of the mean?(6 votes)
- What does Population standard deviation mean??(2 votes)
- The standard deviation of the population. Most if not all the values that we quantify in the field of Statistics - things like the mean (average), or median, or standard deviation, etc - can be thought of in two ways:
1. What is the value of the quantity considering only our sample data? This is what we call a "statistic".
2. What would be the value of the quantity if we were able to get data on the whole population, meaning every possible data point? This is what we call a "parameter".
So there are statistics and parameters. We use a statistic to estimate (make an educated guess at) the value of a parameter. The population standard deviation is simply referencing the population parameter, rather than the sample statistic. Sometimes (often) the value of the parameter is unknown or even unknowable, but we can still think of it in theory.(6 votes)
- Sal's question makes sense: why don't we take the absolute value of each difference instead of raising it to the second power?(3 votes)
- Is 'var' the short form of variance?(2 votes)
- It depends on the context, I've seen it used for both. If it seems to be representing a single number or a function, then it's probably variance. If it seems to reference several characteristics (e.g. height, weight, eye color, etc), then it probably means variables.(4 votes)
- So both variance and standard deviation are used to measure the level of dispersion. What's the difference when you need to pick one to solve a real-world problem?(3 votes)
- It's true that they both measure the level of dispersion. The difference is that the SD is in the same units as the data (meters in this example), so it can be read as a typical distance from the mean, while the variance is in squared units. That makes the SD easier to interpret. The variance is the step you compute just before taking the square root to get the SD.(3 votes)
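- With n = 13 and p = 0.5, find P(at least 10).(3 votes)
- The answer is not 0.5. Assuming the question is about a binomial distribution with n = 13 trials and success probability p = 0.5, P(X ≥ 10) = (286 + 78 + 13 + 1)/2^13 = 378/8192 ≈ 0.046.(1 vote)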
- At 1:35, when I used my calculator, I got a different answer: 18.6 instead of 4.6. I am quite puzzled, because I have repeated the calculation and still get the same wrong answer of 18.6.(1 vote)
- Are you sure you've input the decimals properly? The answer is 4.6. The numbers are 4.0, 4.2, 5.0, 4.3 and 5.5.(4 votes)
- Hello, I'm wondering how you can do this when the mean is a repeating decimal. Mine is 233/6, which equals 38.8333... and keeps going. How do I solve this?(1 vote)
- You can keep it as a fraction, which is 38 5/6.(4 votes)
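For the question in the thread about dividing by N versus n-1, here is a minimal Python sketch, assuming the five car lengths quoted above. It uses the standard library's `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by n-1 (sample). The variable names are only illustrative.

```python
import statistics

# Car lengths in meters, as listed in the thread above.
lengths = [4.0, 4.2, 5.0, 4.3, 5.5]

# Population standard deviation: sum of squared deviations divided by N.
population_sd = statistics.pstdev(lengths)

# Sample standard deviation: same sum, divided by n - 1 instead.
sample_sd = statistics.stdev(lengths)

print(round(population_sd, 3))  # 0.562
print(round(sample_sd, 3))      # 0.628
```

The sample version is slightly larger because the same sum of squared deviations is divided by 4 rather than 5.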
Let's say that you're curious about studying the dimensions of the cars that happen to sit in the parking lot. And so you measure their lengths. Let's just make the computation simple. Let's say that there are five cars in the parking lot. The entire size of the population that we care about is 5. And you go and measure their lengths-- one car is 4 meters long, another car is 4.2 meters long, another car is 5 meters long, the fourth car is 4.3 meters long, and then, let's say the fifth car is 5.5 meters long.
So let's come up with some parameters for this population. So the first one that you might want to figure out is a measure of central tendency. And probably the most popular one is the arithmetic mean. So let's calculate that first. So we're going to do that for the population. So we're going to use mu. So what is the arithmetic mean here? Well, we just have to add all of these data points up and divide by 5. And I'll just get the calculator out just so it's a little bit quicker. This is going to be 4 plus 4.2 plus 5 plus 4.3 plus 5.5. And then, I'm going to take that sum and then divide by 5. And I get an arithmetic mean for my population of 4.6. So that's fine. And if we want to put some units there, it's 4.6 meters.
Now, that's the central tendency or measure of central tendency. We also might be curious about how dispersed the data is, especially from that central tendency. So what would we use? Well, we already have a tool at our disposal-- the population variance. And the population variance is one of many ways of measuring dispersion. It has some very neat properties the way we've defined it as the mean of the squared distances from the mean. It tends to be a useful way of doing it. Let's actually calculate the population variance for this population right over here. Well, all we need to do is find the distance from each of these points to our mean right over here. And then, square them.
And then, take the mean of those squared distances. So let's do that. So it's going to be (4 minus 4.6) squared plus (4.2 minus 4.6) squared plus (5 minus 4.6) squared plus (4.3 minus 4.6) squared. And then, finally-- I'm running out of space-- plus (5.5 minus 4.6) squared. And then, we're going to divide all of that by 5 to get our population variance.
And so what's that going to give us? Let's get our calculator out. 4 minus 4.6 is negative 0.6. Negative 0.6 squared is going to be the exact same thing as 0.6 squared. So let me write that as 0.6 squared. Then 4.2 minus 4.6 is negative 0.4. But when we square it, the negative's going to disappear. So it's going to be plus 0.4 squared. And then, we have 5 minus 4.6. That's 0.4, so plus 0.4 squared. 4.3 minus 4.6. It's negative 0.3. The negative goes away when you square it. It's going to be plus 0.3 squared. And then, finally, 5.5 minus 4.6 is going to be 0.9. So plus 0.9 squared. Then, we will divide by the number of data points we have. And we get 0.316. So the population variance is 0.316.
Now, let me ask you what is a mildly interesting question-- what would be the units for this population variance? Since we happen to care about units in this video. Well, up here, this is 4 meters minus 4.6 meters. 4.2 meters minus 4.6 meters. So these are all measurements in meters. When you subtract them, you'll get meters. But then when you square them, you get meters squared plus meters squared plus meters squared plus meters squared plus meters squared. And then, you're just dividing that by a unitless count of the number of data points you have. So the units here are going to be square meters. And so you might say, hey, that's kind of a weird unit if we're trying to visualize or think about how dispersed we are from the mean.
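The calculator steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the transcript. The variable names (`lengths_m`, `mu`, `squared_distances`, `variance`) are assumptions made for this sketch and do not come from the video.

```python
# Car lengths in meters: the whole population of five cars.
lengths_m = [4.0, 4.2, 5.0, 4.3, 5.5]

# Population mean (mu): add up the data points and divide by N.
N = len(lengths_m)
mu = sum(lengths_m) / N

# Squared distance of each point from the mean, in square meters.
squared_distances = [(x - mu) ** 2 for x in lengths_m]

# Population variance (sigma squared): the mean of the squared distances.
variance = sum(squared_distances) / N

print(round(mu, 3))                               # 4.6
print([round(d, 2) for d in squared_distances])   # [0.36, 0.16, 0.16, 0.09, 0.81]
print(round(variance, 3))                         # 0.316
```

The values are rounded before printing because floating-point arithmetic can leave tiny errors in the last digits.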
When I visualize it, I visualize dispersion or how varied they are in terms of meters, not meters squared. So what could we do? And a big hint-- this comes out of just even the notation for variance, which is this sigma symbol squared. So why don't we just take the square root of our variance? Which we will denote with just a sigma. It makes a lot of sense. And in this case, what's it going to be? It's going to be the square root of 0.316. And then, what are the units going to be? It's going to be just meters. And we end up with-- so let me take the square root of 0.316. And I get 0.56-- I'll just round to the nearest thousandth-- 0.562. So this is approximately 0.562 meters.
So you might be saying, Sal, what do we call this thing that we just did? The square root of the variance. And here we're dealing with the population. We haven't thought about sampling yet. The square root of the population variance, what do we call this thing right over here? And this is a very familiar term. Oftentimes, when you take an exam, this is calculated for the scores on the exam. This is our population-- let me do this in a new color. I'm using that yellow a little bit too much. This is the population standard deviation. It is a measure of how much the data is varying from the mean. In general, the larger this value, the more varied the data is from the population mean. The smaller, the less varied.
And these are all somewhat arbitrary definitions of how we've defined variance. We could have taken things to the fourth power. We could have done other things. We could have not taken them to a power but taken the absolute value here. The reason why we do it this way is it has neat statistical properties as we try to build on it. But that's the population standard deviation, which gives us nice units-- meters. In the next video, we'll think about the sample standard deviation.
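As a rough sketch of this last step, the Python below takes the square root of the population variance to get sigma in meters. For comparison, it also computes the absolute-value alternative mentioned above, the mean of the absolute distances from the mean. The name `mean_abs_dev` and the other variable names are assumptions made for this illustration.

```python
import math

# Car lengths in meters: the whole population of five cars.
lengths_m = [4.0, 4.2, 5.0, 4.3, 5.5]
N = len(lengths_m)
mu = sum(lengths_m) / N

# Population variance, in square meters.
variance = sum((x - mu) ** 2 for x in lengths_m) / N

# Population standard deviation: square root of the variance, back in meters.
sigma = math.sqrt(variance)

# The alternative from the transcript: absolute values instead of squares.
mean_abs_dev = sum(abs(x - mu) for x in lengths_m) / N

print(round(sigma, 3))         # 0.562
print(round(mean_abs_dev, 3))  # 0.52
```

Both results are in meters. They differ because squaring gives more weight to the points farthest from the mean, such as the 5.5 meter car.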