How parameters change as data is shifted and scaled

See how transforming a data set by adding, subtracting, multiplying, or dividing a constant affects measures of center and spread.

Want to join the conversation?

  • duskpin ultimate style avatar for user learn
    I don't understand why the 1st and 3rd quartiles are different on the spreadsheet than when I do it by hand.
    First I ordered the numbers
    2,3,3,5,5,5,6,7,7,8,10,13
    I got the same median = 5.5
    but for the IQR I got = 3.5
    1st quartile = 4
    3rd quartile = 7.5
    IQR= 3.5

    In the video you can see the results of the spreadsheet:

    1st quartile = 4.5
    3rd quartile = 7.25
    IQR = 2.75

    Is there a difference in how the spreadsheet computes the 1st and 3rd quartiles?
    (17 votes)
    • piceratops ultimate style avatar for user Trần Hữu Minh Hoàng
      The Excel function QUARTILE is considered inaccurate. It treats the quartile like a percentile and then uses linear interpolation to get the output. Its newer version, QUARTILE.EXC, is preferable.

      https://superuser.com/questions/343339/excel-quartile-function-doesnt-work

      The algorithm of QUARTILE.EXC can be described somewhat like below:
      - Calculate i = (n+1)/4, where n is the number of data points (sorted in ascending order).
      - 1st quartile = [i]th number + {i} * ([i]+1th number - [i]th number),
        where [i] = the integer part of i and {i} = the fractional part of i.
      - 2nd (or 3rd) quartile: Multiply i by 2 (or 3), then do the same process.

      For example, in the lesson we have a set of data: 2,3,3,5,5,5,6,7,7,8,10,13

      i = (12 + 1)/4 = 3.25
      3 * i = 9.75

      1st quartile = 3 + 0.25 * (5 - 3) = 3.5
      3rd quartile = 7 + 0.75 * (8 - 7) = 7.75
      IQR = 7.75 - 3.5 = 4.25
      * Tested using QUARTILE.EXC in Excel.

      An alternative recursive algorithm can be used where the data set is split into halves to find the median of each half, which is similar to what Sal taught.

      Both algorithms produce values that separate the data set into groups of 25%. The algorithm implemented in QUARTILE doesn't.
      (23 votes)
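The three quartile conventions discussed in this thread can be compared side by side. The following is a minimal sketch, not Excel's actual implementation: the function names are hypothetical, the exclusive rule follows the (n+1)/4 position described in the reply above, and the inclusive rule assumes a position of (n-1)/4 + 1, which reproduces the spreadsheet values quoted in the question (4.5 and 7.25). The median-of-halves rule is the by-hand method.

```python
from statistics import median

# The twelve data points quoted in the thread.
ages = [2, 3, 3, 5, 5, 5, 6, 7, 7, 8, 10, 13]


def interpolate(sorted_data, position):
    """Value at a 1-based fractional position, by linear interpolation."""
    whole = int(position)          # [i], the integer part
    frac = position - whole        # {i}, the fractional part
    lower = sorted_data[whole - 1]
    upper = sorted_data[min(whole, len(sorted_data) - 1)]
    return lower + frac * (upper - lower)


def quartiles_exclusive(data):
    """Q1 and Q3 at positions (n+1)/4 and 3(n+1)/4 (assumes n >= 3)."""
    s = sorted(data)
    n = len(s)
    return interpolate(s, (n + 1) / 4), interpolate(s, 3 * (n + 1) / 4)


def quartiles_inclusive(data):
    """Q1 and Q3 at positions (n-1)/4 + 1 and 3(n-1)/4 + 1 (assumed rule)."""
    s = sorted(data)
    n = len(s)
    return interpolate(s, (n - 1) / 4 + 1), interpolate(s, 3 * (n - 1) / 4 + 1)


def quartiles_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2             # for odd n the middle value is left out
    return median(s[:half]), median(s[-half:])


for rule in (quartiles_exclusive, quartiles_inclusive, quartiles_halves):
    q1, q3 = rule(ages)
    print(rule.__name__, q1, q3, q3 - q1)
```

On this data set the three rules give IQRs of 4.25, 2.75, and 3.5 respectively, which matches the three different answers reported in the thread.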
  • blobby green style avatar for user vigneshboserajan
    How does the standard deviation scale when we scale the data? If I add 5 to all the data, the SD does not increase, but if we multiply, it increases. But multiplication is repeated addition; it's the same thing as adding 5 five times, so why does the SD scale?
    (5 votes)
    • blobby green style avatar for user Dan Oschrin
      Because...

      Say for example I have 4 and 8. The difference is 4, or in other words they are 4 apart. The ratio between them is 1 to 2 (8 = 2x4).

      Now let's say I multiply both of those numbers by 5. They become 20 and 40. The ratio is exactly the same: 1 to 2. However, the difference between them is much bigger (20 now), because multiplying by the same number doesn't mean that you are adding the same thing. 5 x 4 = 4+4+4+4+4, and 5 x 8 = 8+8+8+8+8.

      So, if we imagine that 4 is the mean in the original set, and 8 is another data point called X, X is now a lot farther away from the mean after scaling (remember that standard deviation is, roughly speaking, the typical distance from the mean).

      What DOESN'T change (I think) is the number of standard deviations away from the mean that data point x is. Say the standard deviation of the dataset is 4. 8 is 1 SD from 4. If we scale the data by 5, then the SD becomes 20, the mean is now 20, and data point x is now 40. Data point x is still 1 SD away.
      (18 votes)
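The claim in the reply above, that a point stays the same number of standard deviations from the mean after scaling, can be checked numerically. This is a small sketch under an assumption: the two-point data set [0, 8] is hypothetical, chosen only because its mean and population standard deviation are both 4, matching the numbers in the reply. The helper name `sds_from_mean` is also made up for illustration.

```python
from statistics import mean, pstdev


def sds_from_mean(data, x):
    """How many population standard deviations x lies from the mean."""
    return (x - mean(data)) / pstdev(data)


original = [0, 8]                      # hypothetical: mean 4, SD 4
scaled = [5 * v for v in original]     # [0, 40]: mean 20, SD 20

print(mean(original), pstdev(original), sds_from_mean(original, 8))
print(mean(scaled), pstdev(scaled), sds_from_mean(scaled, 40))
```

Both lines report a distance of 1 standard deviation, even though the mean and the SD are each five times larger in the second line.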
  • cacteye green style avatar for user Eric Allen Conner
    Is there a Khan Academy-like course for learning excel?
    (2 votes)
  • blobby green style avatar for user zjleon2010
    Why does Sal evaluate the mean and the standard deviation on a population rather than on samples?
    I think it would be more practical to evaluate them on a sample.
    (2 votes)
    • blobby green style avatar for user daniella
      Evaluating mean and standard deviation on the population rather than on samples provides a complete understanding of the entire dataset. While sample statistics can be useful for making inferences about a larger population, calculating parameters on the entire population allows for a more accurate representation of the data without potential sampling biases.
      (1 vote)
  • blobby green style avatar for user steve_lee
    can you tell us how to convert units of measurement?
    (2 votes)
    • blobby green style avatar for user zjleon2010
      I think the measured quantity itself stays the same, because after scaling or shifting the distribution is still defined on all real numbers,
      while the unit of measurement may change depending on the context; say, the transformation converts the sample from degrees Fahrenheit to degrees Celsius.
      (1 vote)
  • piceratops seed style avatar for user Abdelrhman Adel
    Why does the SD change when we scale the data, even though multiplication is repeated addition?
    (1 vote)
  • blobby green style avatar for user ali.elham1252
    I was never taught that adding/subtracting is shifting and multiplying/dividing is scaling. What does it mean here? To me, both of them just increase the numbers, and that is it!
    (1 vote)
    • blobby green style avatar for user daniella
      In statistics, "shifting" refers to adding or subtracting a constant value from each data point, which moves the entire dataset up or down along the number line without changing its relative distribution. "Scaling," on the other hand, involves multiplying or dividing each data point by a constant, which changes the spread or dispersion of the data. The distinction between the two operations is important because they have different effects on the statistical properties of the data.
      (1 vote)
  • winston baby style avatar for user peterbpesch
    Why does Sal suddenly skip to a completely different definition of the IQR?

    For the standard deviation, he explained which function from the spreadsheet program he selected.
    But during the calculation of the IQR he just used some function, without explaining why he chose that one, and without explaining why it leads to a different answer.

    (Within excel, I can choose between 2 different quartile functions, but each one will lead to an IQR which is far away from the IQR we learned to compute in the previous unit ...)
    (1 vote)
    • blobby green style avatar for user daniella
      The choice of function for computing quartiles may indeed lead to different results. Different functions may use distinct algorithms or assumptions about the distribution of the data, resulting in variations in the calculated quartiles. It's essential to understand the specific function being used and ensure it aligns with the method taught in previous units.
      (1 vote)
  • duskpin sapling style avatar for user JJ
    So basically:

    For standard deviation and IQR, they do not change if you shift (+ or -) the data. But scaling (×) it would change them.

    For mean and median, they both change if you shift or scale the data.

    In the cases where they do change, they change only by the shift amount or by the scale factor. For instance, if the data is X_i + 5, its mean will also be increased by 5. If the data is scaled up by 5, the mean will be 5 times the original mean, and the standard deviation or IQR would also be 5 times the original.

    data = X_i

    tl;dr
    +- changes mean and median only
    × changes mean, median, SD, and IQR.
    (1 vote)
  • blobby green style avatar for user HI
    what happens to the variation?
    (0 votes)
    • blobby green style avatar for user daniella
      When the data is shifted (added or subtracted by a constant), the variation, represented by measures like standard deviation and interquartile range, remains the same. However, when the data is scaled (multiplied or divided by a constant), the variation also scales accordingly. For example, if all data points are multiplied by 2, the spread of the data, as measured by the standard deviation or interquartile range, will also be multiplied by 2.
      (1 vote)

Video transcript

- [Instructor] So I have some data here in the spreadsheet. You could use Microsoft Excel or you could use Google Spreadsheet, and we're gonna use the spreadsheet to quickly calculate some parameters. Let's say this is a population. Let's say we're looking at a population of students, and this is their ages, and we wanna calculate some parameters on that. And so first I'm gonna calculate them using the spreadsheet, and then we're gonna think about how those parameters change as we do things to the data. If we were to shift the data up or down, or if we were to multiply all the points by some value, what does that do to the actual parameters?

So the first parameter I'm gonna calculate is the mean. Then I'm gonna calculate the standard deviation. Then I'm gonna calculate the median, and then I wanna calculate, let's say, the interquartile range. I'll call it IQR. So let's do this.

Let's first look at the measures of central tendency. So the mean, the function on most spreadsheets is the AVERAGE function, and then I could use my mouse and select all of these, or I could press Shift with my arrow button and select all those. Okay, that's the mean of that data. Now let's think about what happens if I take all of that data and add a fixed amount to it. So if I took all the data and added five to it. An easy way to do that in a spreadsheet is you select that, you add five, and then I can scroll down. And notice for every data point I had before, I now have five more than that. So this is my new dataset, or as I'm calling it, Data+5. Let's see what the mean of that is. So the mean of that, notice, is exactly five more, and the same would have been true if I added or subtracted any number. The mean would change by the amount that I add or subtract. That shouldn't surprise you, because when you're calculating the mean, you're adding all the numbers up and you're dividing by the number of numbers you have.
If all the numbers are five more, you're gonna add five for each one. In this case, how many numbers are there? One, two, three, four, five, six, seven, eight, nine, 10, 11, 12. You're gonna add 12 more fives and then you're gonna divide by 12, and so it makes sense that your mean goes up by five.

Let's think about how the mean changes if you multiply. So if you take your data and multiply it times five, what happens? So this equals this times five. So now all the data points are five times as large. Now what happens to my mean? Notice my mean is now five times as much. So for this measure of central tendency, if I add or subtract, well, the mean goes up or down by that amount, and if I scale it up by five or if I scaled it down by five, well, my mean would scale up or down by that same amount, and if you looked numerically at how you calculate a mean, it would make sense that this is happening mathematically.

Let's look at the other typical measure of central tendency, and that is the median, to see if that has the same properties. So let's calculate the median here. So once again you order these numbers and just find the middle number, which isn't too hard, but a computer can do it awfully fast. So that's the median for that dataset. What do you think the median's gonna be if you take all of the data plus five? Well, the middle number, if you ordered all of these numbers and made them all five more, you could think of it as being the same order, but now the one in the middle is gonna be five more. So this should be 10.5, and yes, it is indeed 10.5. And what would happen if you multiply everything by five? Well, once again, you still have the same ordering. It should just multiply that by five. Yup, the middle number's now gonna be five times larger. So for both of these measures of central tendency, if you shift all the data points, or if you scale them up, you're going to similarly shift or scale up these measures of central tendency.
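The behavior of the mean and median described above can be reproduced outside a spreadsheet. This is a minimal sketch, assuming the twelve ages are the values listed (in sorted order) in the comment thread; the variable names are illustrative only.

```python
from math import isclose
from statistics import mean, median

# Assumed data: the twelve ages quoted in the comment thread.
ages = [2, 3, 3, 5, 5, 5, 6, 7, 7, 8, 10, 13]
shifted = [x + 5 for x in ages]   # the "Data+5" column
scaled = [x * 5 for x in ages]    # the data multiplied by five

# Shifting adds 5 to both measures of center.
print(isclose(mean(shifted), mean(ages) + 5))
print(median(ages), median(shifted))

# Scaling multiplies both measures of center by 5.
print(isclose(mean(scaled), mean(ages) * 5))
print(median(ages), median(scaled))
```

The means are compared with `isclose` rather than `==` because they are not whole numbers, so tiny floating-point rounding differences are possible.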
Now let's think about these measures of spread. See if that's the same with these measures of spread. So standard deviation. So STDEV. I'm gonna take the population standard deviation. I'm assuming that this is my entire population. So the standard deviation of all of this is going to be 2.99. Let's see what happens when I shift everything by five. Actually, pause the video. What do you think is going to happen? This is a measure of spread. I'll tell you what I think. If I shift everything by the same amount, the mean shifts, but the distance of everything from the mean should not change. So the standard deviation should not change, I don't think, in this example, and indeed, it does not change. So if we shift the dataset, in this case we shifted it up by five, or if we shifted it down by one, your measure of spread, in this case standard deviation, should not change.

But if we scale it, well, I think it should change, because you could imagine a very simple dataset where things that were a certain distance from the mean are now going to be five times further from the mean. So I think we should multiply by five here, and it does look like that is the case if I multiply this by five. So scaling the dataset will scale the standard deviation in a similar way.

What about interquartile range? Where essentially we're taking the third quartile and subtracting from that the first quartile to figure out kind of the range of the middle 50%. Let's do that. We can use the quartile function: equals QUARTILE, and then we want to look at our data, and we want the third quartile. So that's gonna calculate the third quartile. Minus QUARTILE, same dataset. So now we wanna select it again. So same dataset, but this is now going to be the first quartile. So this is gonna give us our interquartile range.
This calculates the third quartile in that dataset and this calculates the first quartile in that dataset. And we get 2.75. Now let's think about whether the interquartile range should change. And I don't think it will. Because remember, everything shifts: the first quartile is gonna be five more, but the third quartile is gonna be five more as well. So the difference shouldn't change. And indeed look, the difference does not change. But similarly, if we scale everything up, if we were to scale up the first quartile and the third quartile by five, well then their difference should scale up by five, and we see that right over there.

So the big takeaway here. I just used the example of shifting up by five and scaling up by five, but you could subtract any number, and you could divide by a number as well. The typical measures of central tendency, mean and median, both shift and scale as you shift and scale the data, but your typical measures of spread, standard deviation and interquartile range, don't change if you shift the data, but they do change and they scale as you scale the data.
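The same check can be sketched for the measures of spread. The assumptions here are that the twelve ages are the values listed in the comment thread, that `pstdev` (population standard deviation) corresponds to the population function used in the video, and that the `"inclusive"` method of Python's `statistics.quantiles` corresponds to the spreadsheet's quartile function; that last choice is made because it reproduces the 2.75 reported above. The helper name `iqr` is hypothetical.

```python
from math import isclose
from statistics import pstdev, quantiles

# Assumed data: the twelve ages quoted in the comment thread.
ages = [2, 3, 3, 5, 5, 5, 6, 7, 7, 8, 10, 13]
shifted = [x + 5 for x in ages]
scaled = [x * 5 for x in ages]


def iqr(data):
    """Third quartile minus first quartile, using the inclusive method."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


# Shifting leaves both measures of spread unchanged.
print(isclose(pstdev(shifted), pstdev(ages)))
print(iqr(ages), iqr(shifted))

# Scaling by 5 multiplies both measures of spread by 5.
print(isclose(pstdev(scaled), 5 * pstdev(ages)))
print(iqr(scaled), 5 * iqr(ages))
```

A different quartile convention would give a different IQR for the original data, but the shift and scale behavior would be the same, since every quartile moves by the shift amount or the scale factor.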