© 2023 Khan Academy
Calculating standard deviation step by step
Introduction
In this article, we'll learn how to calculate standard deviation "by hand".
Interestingly, in the real world no statistician would ever calculate standard deviation by hand. The calculations involved are somewhat complex, and the risk of making a mistake is high. Also, calculating by hand is slow. Very slow. This is why statisticians rely on spreadsheets and computer programs to crunch their numbers.
So what's the point of this article? Why are we taking time to learn a process statisticians don't actually use? The answer is that learning to do the calculations by hand will give us insight into how standard deviation really works. This insight is valuable. Instead of viewing standard deviation as some magical number our spreadsheet or computer program gives us, we'll be able to explain where that number comes from.
Overview of how to calculate standard deviation
The formula for standard deviation (SD) is
$$\text{SD} = \sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$$
where $\sum$ means "sum of", $x$ is a value in the data set, $\mu$ is the mean of the data set, and $N$ is the number of data points in the population.
The standard deviation formula may look confusing, but it will make sense after we break it down. In the coming sections, we'll walk through a step-by-step interactive example. Here's a quick preview of the steps we're about to follow:
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
Step 4: Divide by the number of data points.
Step 5: Take the square root.
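The five steps above can be sketched as a short Python function. This is only an illustration of the process, not something the article itself provides; the function name `population_sd` and the sample values are made up for the example.

```python
import math

def population_sd(data):
    """Population standard deviation, computed in the five steps listed above."""
    n = len(data)
    mean = sum(data) / n                                  # Step 1: find the mean
    squared_distances = [(x - mean) ** 2 for x in data]   # Step 2: squared distance to the mean
    total = sum(squared_distances)                        # Step 3: sum the values from Step 2
    variance = total / n                                  # Step 4: divide by the number of data points
    return math.sqrt(variance)                            # Step 5: take the square root

# Made-up illustrative values: mean 5, squared distances sum to 32, 32 / 8 = 4
print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```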
An important note
The formula above is for finding the standard deviation of a population. If you're dealing with a sample, you'll want to use a slightly different formula (below), which uses $n - 1$ instead of $N$:
$$s_x = \sqrt{\frac{\sum \lvert x - \bar{x} \rvert^2}{n - 1}}$$
Here $\bar{x}$ is the sample mean and $n$ is the number of data points in the sample. The point of this article, however, is to familiarize you with the process of computing standard deviation, which is basically the same no matter which formula you use.
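To see the two formulas side by side, here is a small sketch using Python's standard `statistics` module, where `pstdev` divides by the number of data points and `stdev` divides by one less. The data values and variable names are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative values

pop_sd = statistics.pstdev(data)   # population formula: divides by N
samp_sd = statistics.stdev(data)   # sample formula: divides by n - 1

print(round(pop_sd, 3))    # 2.0
print(round(samp_sd, 3))   # 2.138
```

The sample version comes out slightly larger because the same sum of squared distances is divided by a smaller number.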
Step-by-step interactive example for calculating standard deviation
First, we need a data set to work with. Let's pick something small so we don't get overwhelmed by the number of data points. Here's a good one:
6, 2, 3, 1
Step 1: Finding $\mu$ in $\sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$
In this step, we find the mean of the data set, which is represented by the variable $\mu$.
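As a quick sketch of Step 1, using the four data points from this example (6, 2, 3, 1); the variable names are just for illustration:

```python
data = [6, 2, 3, 1]

# Step 1: add up the data points and divide by how many there are
mean = sum(data) / len(data)

print(mean)  # 3.0
```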
Step 2: Finding $\lvert x - \mu \rvert^2$ in $\sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$
In this step, we find the distance from each data point to the mean (i.e., the deviations) and square each of those distances.
For example, the first data point is 6 and the mean is 3, so the distance between them is 3. Squaring this distance gives us 9.
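Doing the same thing for every data point might look like this in Python (a sketch with illustrative variable names, taking the mean of 3 from Step 1 as given):

```python
data = [6, 2, 3, 1]
mean = 3  # result of Step 1

# Step 2: distance from each point to the mean, then squared
squared_distances = [abs(x - mean) ** 2 for x in data]

print(squared_distances)  # [9, 1, 0, 4]
```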
Step 3: Finding $\sum \lvert x - \mu \rvert^2$ in $\sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$
The symbol $\sum$ means "sum", so in this step we add up the four values we found in Step 2.
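With the squared distances for this example's data (9, 1, 0, and 4), the sum is a one-liner; the names below are illustrative:

```python
squared_distances = [9, 1, 0, 4]  # results of Step 2 for the data 6, 2, 3, 1

# Step 3: add up the squared distances
total = sum(squared_distances)

print(total)  # 14
```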
Step 4: Finding $\frac{\sum \lvert x - \mu \rvert^2}{N}$ in $\sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$
In this step, we divide our result from Step 3 by the variable $N$, which is the number of data points.
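Continuing the sketch for this example, where the sum from Step 3 is 14 and there are 4 data points (variable names are illustrative):

```python
total = 14  # result of Step 3
N = 4       # number of data points

# Step 4: divide the sum of squared distances by N (this value is the variance)
variance = total / N

print(variance)  # 3.5
```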
Step 5: Finding the standard deviation $\sqrt{\frac{\sum \lvert x - \mu \rvert^2}{N}}$
We're almost finished! Just take the square root of the answer from Step 4 and we're done.
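For this example the answer from Step 4 is 3.5, so the last step could be sketched as:

```python
import math

variance = 3.5  # result of Step 4

# Step 5: the standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(round(sd, 2))  # 1.87
```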
Yes! We did it! We successfully calculated the standard deviation of a small data set.
Summary of what we did
We broke down the formula into five steps:
Step 1: Find the mean $\mu$.
$$\mu = \frac{6 + 2 + 3 + 1}{4} = \frac{12}{4} = 3$$
Step 2: Find the square of the distance from each data point to the mean, $\lvert x - \mu \rvert^2$.
x | $\lvert x - \mu \rvert^2$
---|---
6 | $\lvert 6 - 3 \rvert^2 = 3^2 = 9$
2 | $\lvert 2 - 3 \rvert^2 = 1^2 = 1$
3 | $\lvert 3 - 3 \rvert^2 = 0^2 = 0$
1 | $\lvert 1 - 3 \rvert^2 = 2^2 = 4$
Steps 3, 4, and 5:
$$\text{SD} = \sqrt{\frac{9 + 1 + 0 + 4}{4}} = \sqrt{\frac{14}{4}} = \sqrt{3.5} \approx 1.87$$
Want to join the conversation?
- What are the steps to finding the square root of 3.5? I can't figure out how to get to 1.87 without knowing the answer beforehand.(24 votes)
- Without knowing the square root beforehand, I'd say just use a graphing calculator.(23 votes)
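For anyone who does want to approximate a square root without a calculator, one well-known option (not covered in the article) is the Babylonian method: start with a guess, then repeatedly average the guess with the number divided by the guess. A minimal sketch, where the function name `babylonian_sqrt` and the starting guess of 2 are arbitrary choices for illustration:

```python
def babylonian_sqrt(a, guess, steps=4):
    """Approximate sqrt(a) by repeatedly averaging guess and a / guess."""
    for _ in range(steps):
        guess = (guess + a / guess) / 2
    return guess

# Starting from 2: 1.875, then about 1.8708, then it barely changes
print(round(babylonian_sqrt(3.5, guess=2), 2))  # 1.87
```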
- But what actually is standard deviation? I understand how to get it and all but what does it actually tell us about the data?(16 votes)
- The standard deviation is a measure of how close the numbers are to the mean. If the standard deviation is big, then the data is more "dispersed" or "diverse".
As an example let's take two small sets of numbers:
4.9, 5.1, 6.2, 7.8
and
1.6, 3.9, 7.7, 10.8
The average (mean) of both these sets is 6. But the second set is more dispersed: the numbers are further away from the mean.
This is reflected in the standard deviation: using the population formula from the article, the first set has a standard deviation of about 1.15, and the second about 3.52.(37 votes)
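The comparison can be checked with Python's standard `statistics` module; `pstdev` applies the population formula from the article (divide by N, then take the square root). The variable names are illustrative.

```python
import statistics

set_a = [4.9, 5.1, 6.2, 7.8]
set_b = [1.6, 3.9, 7.7, 10.8]

for name, values in (("first", set_a), ("second", set_b)):
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by N before the square root
    print(name, round(mean, 2), round(sd, 2))

# first 6.0 1.15
# second 6.0 3.52
```

Taking the square root of the summed squared distances without first dividing by N would instead give about 2.30 and 7.05, so the division in Step 4 is easy to miss but matters.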
- I want to understand the significance of squaring the values, like it is done at step 2. Why do we actually square the values?(12 votes)
- The important thing is that we want to be sure that the deviations from the mean are always given as positive, so that a sample value one greater than the mean doesn't cancel out a sample value one less than the mean. There are two strategies for doing that: squaring the deviations (which gives you the variance) and taking the absolute value (which gives you a thing called the Mean Absolute Deviation). Even though taking the absolute value is easier to do by hand, it's easier to prove that the variance has a lot of pleasant properties that make a difference by the time you get to the end of the statistics playlist.(20 votes)
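A small sketch of the cancelling problem and the two fixes, using the article's data set (6, 2, 3, 1); the variable names are illustrative:

```python
data = [6, 2, 3, 1]
mean = sum(data) / len(data)                # 3.0

deviations = [x - mean for x in data]       # [3.0, -1.0, 0.0, -2.0]
print(sum(deviations))                      # 0.0 -> positives and negatives cancel

# Fix 1: absolute values, giving the mean absolute deviation
mad = sum(abs(d) for d in deviations) / len(data)
print(mad)                                  # 1.5

# Fix 2: squares, giving the variance
variance = sum(d ** 2 for d in deviations) / len(data)
print(variance)                             # 3.5
```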
- From the class that I am in, my Professor has labeled this equation of finding standard deviation as the population standard deviation, which uses a different formula from the sample standard deviation. Is there a way to differentiate when to use the population and when to use the sample? Or would such a thing be more based on context or on the question directly asking for a given one? Why do we use two different types of standard deviation in the first place when the goal of both is the same?(11 votes)
- The population standard deviation is used when you have the data set for an entire population, like every box of popcorn from a specific brand. Obtaining this data is usually unreasonable and likely impossible. That's why the sample standard deviation is used. Sample standard deviation is used when you have part of a population for a data set, like 20 bags of popcorn. This is much more reasonable and easier to collect.(2 votes)
- What is the formula for calculating the variance of a data set? Is it the same as the formula for standard deviation given in this article but without the square root?
In other words, is standard deviation the square root of the variance?
I remember vaguely that one of the two, SD and variance, is the square (or square root) of the other.(6 votes)
- Yes, the standard deviation is the square root of the variance.(8 votes)
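The relationship is easy to confirm with Python's standard `statistics` module, here on the article's data set (6, 2, 3, 1). `pvariance` and `pstdev` are the population versions; the variable names are illustrative.

```python
import math
import statistics

data = [6, 2, 3, 1]

variance = statistics.pvariance(data)  # population variance
sd = statistics.pstdev(data)           # population standard deviation

print(variance)                                # 3.5
print(math.isclose(sd, math.sqrt(variance)))   # True
```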
- If I have a set of data with repeating values, say 2,3,4,6,6,6,9, would you take the sum of the squared distance for all 7 points or would you only add the 5 different values?(7 votes)
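Going by the formula in the article, the sum runs over every data point and N is the number of data points, so repeated values each count. A sketch with the data from the question (variable names are illustrative):

```python
import math

data = [2, 3, 4, 6, 6, 6, 9]

N = len(data)            # 7: each of the three 6s is its own data point
mean = sum(data) / N

# Sum the squared distance for all 7 points, not just the 5 distinct values
total = sum((x - mean) ** 2 for x in data)
sd = math.sqrt(total / N)

print(N, round(sd, 2))   # 7 2.17
```

Dropping the repeated 6s would change both the mean and N, which means it would describe a different data set.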
- In the formula for the SD of a population, they use mu for the mean. Is there a difference from the x with a line over it in the SD for a sample?(5 votes)
- μ and x̄ are both means and are calculated the same way (no pun intended). The difference is only in what they describe: μ is the mean of a whole population, and x̄ is the mean of a sample.(8 votes)
- I didn't get any of it. I need help really badly. What does this stuff mean?(6 votes)
- It may look more difficult than it actually is, because all the different variables that are used are just there to represent the numbers in your equation. Those variables are placeholders that show how to solve for standard deviation; once you plug in your own data, only numbers remain.(5 votes)
- Hi,
How do I calculate the standard deviation of bivariate data by hand?
Thanks
Sean(7 votes)
- You would use a covariance matrix, whose entries are covariances (Cov).
E.g. Cov(X, X) = Var(X) = standard_deviation_x^2
We could do the same thing for Y.
We can also find Cov(X, Y); just use the definition. If you are not able to find it on Khan Academy, just go to Wikipedia.(1 vote)
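As a rough sketch of that idea, the population covariance can be computed by hand in the same style as the article's steps: multiply each pair of deviations, add them up, and divide by N. The helper name `population_cov` and the paired data below are made up for illustration.

```python
def population_cov(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Made-up paired data
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

cov_matrix = [
    [population_cov(xs, xs), population_cov(xs, ys)],  # Cov(X, X) = Var(X)
    [population_cov(ys, xs), population_cov(ys, ys)],  # Cov(Y, Y) = Var(Y)
]

print(cov_matrix)  # [[1.25, 2.5], [2.5, 5.0]]
```

The diagonal entries are the variances, so their square roots are the standard deviations of X and Y.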
- Why does the formula show n and not n-1?(7 votes)
- n is the denominator for population variance. In contrast, n-1 is the denominator for sample variance.
Depending on the context we use n or n-1. Using n with sample data can result in underestimation.
Because we don't know the exact population mean and the sample mean is used in its place in the variance formula, we will most likely underestimate the variance if we use n when we don't have population data.
In contrast, using n-1 adjusts for this.
I highly recommend you read Probability and Statistics for Engineering and the Sciences by Jay L. Devore. It goes through a proof showing that dividing by n-1 gives an unbiased estimate of the variance from sample data, in contrast to dividing by n.(1 vote)
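The underestimation can be seen on a tiny example without any proof. The sketch below treats the article's data set (6, 2, 3, 1) as the whole population, lists every possible sample of size 2 drawn with replacement, and averages the sample variances computed with each denominator. The helper name `variance` and the sampling setup are assumptions made for illustration.

```python
from itertools import product

population = [6, 2, 3, 1]
N = len(population)
mu = sum(population) / N
pop_variance = sum((x - mu) ** 2 for x in population) / N

def variance(sample, denominator):
    """Sum of squared distances to the sample mean, divided by the given denominator."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / denominator

# All 16 ordered samples of size n = 2, drawn with replacement
samples = list(product(population, repeat=2))

avg_using_n = sum(variance(s, 2) for s in samples) / len(samples)          # divide by n
avg_using_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)  # divide by n - 1

print(pop_variance, avg_using_n, avg_using_n_minus_1)  # 3.5 1.75 3.5
```

On average, dividing by n gives half the true variance here, while dividing by n-1 matches it.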