Statistical significance on bus speeds

Sal determines if the results of an experiment about bus speeds are statistically significant.


  • T R Lake
    In all of these videos about hypothesis testing I'm left wondering how the "re-randomisation" is done. It would be helpful to have this explained in more detail.
    (31 votes)
    • Dr C
      I'm assuming that the "re-randomisation" means when we take the N people and redistribute them between the two groups (let me know if that's mistaken, I'm not able to watch the video right at this moment).

      If this is the case, then it's a relatively simple concept. Imagine we have a list of names and the associated group A or B. Just keeping the list of names as-is, take all the group A and B's, and throw them into a hat. Draw one out - the first person is in that group. Draw a second - the second person is in that group, and so on. Draw out all the groups, and just put the next person into that group. Presto, we have a re-randomization of the groups. Rinse and repeat to get a second re-randomization, and so on. Various computer algorithms will do this for us very quickly, but that's the basic idea.
      (25 votes)
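The hat-drawing idea above can be sketched in a few lines of Python. This is only an illustration, not the exact procedure used in the video: the names, the six-person list, and the helper `rerandomize` are all made up for the example.

```python
import random

def rerandomize(labels, rng):
    """Return a shuffled copy of the group labels (the 'hat' draw)."""
    hat = list(labels)   # throw every A and B label into the hat
    rng.shuffle(hat)     # drawing them out one at a time is a shuffle
    return hat

# Hypothetical data: the list of names stays in place, only the labels move.
names = ["P1", "P2", "P3", "P4", "P5", "P6"]
labels = ["A", "A", "A", "B", "B", "B"]

rng = random.Random(0)                      # seeded so the run is repeatable
new_labels = rerandomize(labels, rng)
assignment = dict(zip(names, new_labels))   # name -> re-randomized group
print(assignment)
```

Each call to `rerandomize` gives one re-randomization; the group sizes never change because the same labels go back into the hat every time.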
  • Harikesh
    How and why does re-randomization work?
    (12 votes)
    • doctorfoxphd
      The purpose of re-randomization is to take all the original data, regardless of whether each value came from the treatment group or the control group, and see whether the observed difference in trip times could plausibly be due to random chance. Here is a very simplified version; in practice you would want more measurements. In the case of bus trip lengths, we might have the following:
      1: 53 min (A)
      2: 42 min (A)
      3: 40 min (B)
      4: 53 min (A)
      5: 38 min (A)
      6: 28 min (B)
      7: 52 min (A)
      8: 32 min (B)
      9: 55 min (B)
      10: 33 min (B)
      For calculating the original results, we would find the median of the A bus trips and the median of the B bus trips. Then we would compare them, in this case by finding the difference of the medians.

      For the simulations, we would dump all the data together and group them randomly into two groups over and over. This is re-randomization. We are trying to find out if the results were important and likely to occur as a regular event, or if the results were just a quirk, and not likely to be a regular result.
      How do we do this? For each simulation trial, we would find the median of each random group and the difference between the medians. To have a good simulation, we might do this 150 times or 1000 times, as in this case. Then we would see how often a difference at least as large as the original result shows up among the RANDOM group results. If the random groups produce such a result only very rarely, then we can say that our experimental result was significant. If that is true, we can switch to Bus B and save time on most of our bus rides.
      (33 votes)
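As a rough sketch, the procedure described above could be run on the ten example trips like this. The trip times are the ones listed in the answer; the function name `simulate`, the seed, and the choice of 1000 trials are assumptions for illustration.

```python
import random
from statistics import median

# Trip times in minutes, taken from the list in the answer above.
bus_a = [53, 42, 53, 38, 52]
bus_b = [40, 28, 32, 55, 33]

# Original result: difference of the two medians (A minus B).
observed = median(bus_a) - median(bus_b)

def simulate(a, b, trials, rng):
    """Pool the data, regroup it at random, and record each median difference."""
    pooled = a + b
    diffs = []
    for _ in range(trials):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diffs.append(median(new_a) - median(new_b))
    return diffs

rng = random.Random(42)   # seeded so the run is repeatable
diffs = simulate(bus_a, bus_b, 1000, rng)

# Share of random groupings that favor B at least as strongly as the real data.
share = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference: {observed} min, share at least as large: {share:.3f}")
```

With only five trips per bus the share will be fairly coarse, which is why the answer notes that you would want more measurements in practice.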
  • bryan
    In calculating statistical significance should he be counting both the frequencies that are greater than +8 as well as those that are less than -8 (and not just the +8 ones)? I thought that statistical significance was measuring the likelihood of getting a value as extreme as the one he got (regardless of direction)?
    (11 votes)
    • hannah
      Good question.
      You would be right that we would have to add the frequencies greater than +8 and smaller than -8...
      IF our question was: do A and B differ from each other by more than 8 minutes ?
      In this question, we don't care if A is faster than B or B is faster than A.

      However, here our question is: is it true (can we reasonably assume) that A is faster than B by 8 minutes or more?

      For this reason, we are only interested in those outcomes where A is faster than B by 8 minutes or more. With the video's convention of treatment median minus control median, those are the differences of -8 or lower.

      I hope this answers your question!
      (13 votes)
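To make the two questions concrete, here is a small sketch that counts both ways. The list of simulated differences is made up, and the sign convention (treatment median minus control median, so "A faster" is negative) is an assumption carried over from the video.

```python
def tail_shares(diffs, observed):
    """Return (one-sided, two-sided) shares for a negative observed difference."""
    n = len(diffs)
    # One-sided: A faster than B by at least as much as observed.
    one_sided = sum(d <= observed for d in diffs) / n
    # Two-sided: a gap at least as large as observed, in either direction.
    two_sided = sum(abs(d) >= abs(observed) for d in diffs) / n
    return one_sided, two_sided

# Hypothetical re-randomized differences, for illustration only.
diffs = [-10, -8, -4, -2, 0, 0, 2, 4, 8, 10]
one, two = tail_shares(diffs, -8)
print(one, two)  # 0.2 0.4
```

The one-sided count answers "is A faster by 8 minutes or more?", while the two-sided count answers "do A and B differ by 8 minutes or more?".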
  • Clare
    Why is it that if the probability you get is lower than 5%, then the result is significant? And why is the probability you calculate actually the probability that the results are due to random chance?

    Thx, Clarissa
    (6 votes)
    • Dr C
      The tests that we do make some sort of assumption. We might assume that the population mean is some value, or that the probability of getting heads on a coin flip is 0.5, etc. That assumption is crucial. Once we make that assumption, we can start calculating probabilities. In particular, we want to calculate the probability of the observed result happening by chance. The reason for this is that the outcome - such as the length of time the bus trip lasts - is a "random variable." It can't be predicted exactly. So in this case we make the assumption that the two bus routes have the same mean travel time.

      With that, we have a definite scenario to play with. The times are still random events, so there's an element of chance as to whether one route will be a minute or two longer than the other. We want to know the probability of this happening by chance, because if it's a really small probability, then it's very unlikely to occur by chance, right?

      Now, we try to trust the data - because they're real, they're what actually happened. So if, assuming the two routes have equal travel time, our observed data are very unlikely, that makes our assumption a very poor one, and it's probably wrong. In Statistical jargon, we say this is a "significant" result.
      (9 votes)
  • williampgrady
    In this example the median travel time was used. Is there any reason for using the median instead of the mean?
    (5 votes)
    • Sven Jankowski
      Sometimes the median gives you a far more practical view of a situation.
      For example: you want to know how rich the average person living in a city (let's call it Basin City) is. Suppose the median person earns 100 dollars per year and most people are very close to 100 dollars (e.g. 80% of the population earns between 80 and 120 dollars), yet the range is ridiculously high. Imagine a rich CEO living in Basin City earning dollars a year. This insane range may strongly influence the mean, while the median is less affected by those extremes. Now if somebody used the mean to answer the question "how rich is the average person living in Basin City?", they would get a very distorted answer, depicting the average person in Basin City as far wealthier than is actually the case.
      (6 votes)
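A quick numerical sketch of the Basin City idea is below. All incomes here, including the CEO's, are invented for illustration; they are not figures from the answer above.

```python
from statistics import mean, median

# Hypothetical yearly incomes in dollars for eight ordinary residents.
incomes = [80, 90, 95, 100, 100, 105, 110, 120]

# Add one hypothetical CEO with an extreme income.
with_ceo = incomes + [1_000_000]

print(mean(incomes), median(incomes))    # both are 100
print(mean(with_ceo), median(with_ceo))  # mean jumps to 111200, median stays 100
```

One extreme value pulls the mean far away from what a typical resident earns, while the median does not move at all in this example.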
  • akurnya
    I don't understand. In "Statistical significance on bus speeds", if the chance of Bus A being faster than Bus B is roughly 10% across 1000 simulations (re-randomizations of the sample data), to me that means the claim "Bus A is faster than Bus B" is true about 10% of the time, or that almost 90% of the time the claim doesn't hold, meaning Bus A DOESN'T arrive faster than Bus B.

    What am I missing?
    (4 votes)
    • André Nunes
      You're missing the point that after re-randomization, the values labeled Bus A are not really from Bus A anymore; its values were randomly assigned from the Bus A and Bus B times of the initial experiment. You've switched the times around randomly, so the origin is lost.

      That 10% means the hypothesis from your first experiment is not well supported, because in about 10% of the 1000 random experiments we got the same result or a more extreme one. This tells us that the result of your first experiment might have been caused by chance, so it is not significant.

      I hope that clears things up for you.
      (4 votes)
  • Marcello Cruz
    Maybe because I am not an English speaker, things are not that clear to me.
    According to this video, what I understood was:
    1) Hypothesis: bus A is faster than bus B
    2) Experiment: bus A "median" travel duration is 8 minutes less than bus B
    3) Simulation: the probability of bus A being faster than bus B by 8 minutes or more is 9.3%
    4) Significance: ?
    I don't understand the significance (meaning and importance) of the simulation result. How does it relate to the threshold? Does being below the threshold mean the hypothesis is valid, or the opposite? Is it good to be over or under the threshold? Is there a logical way to define the threshold, or was it chosen arbitrarily?

    Sometimes the explanations are very confusing, especially to a non-English audience. Sometimes the choice of words makes the entire explanation confusing, as when "test of pregnancy" and "test of probability" are later referred to as simply "test". Which test?
    (3 votes)
    • hannah
      Hi Marcello!

      Good question. The threshold is chosen by the statistician. That's also why we always have to mention it. When we say "this experiment was significant at the 5% level", the audience knows that we chose a threshold of 5%.
      (2 votes)
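To show how the conclusion depends on the chosen threshold, here is a tiny sketch. The 9.3% figure comes from the video; the function name `is_significant` and the 10% alternative threshold are assumptions for illustration.

```python
def is_significant(p_value, alpha=0.05):
    """A result counts as significant when its probability is at or below the threshold."""
    return p_value <= alpha

p = 0.093  # probability found in the video's simulation

for alpha in (0.05, 0.10):
    verdict = "significant" if is_significant(p, alpha) else "not significant"
    print(f"at the {alpha:.0%} level: {verdict}")
```

The same 9.3% is not significant at the 5% level but would be at a 10% level, which is why the threshold always has to be reported along with the result.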
  • Abiraj
    I didn't get the threshold thing here. In the video, Sal said that if the threshold is 50%, it is very likely to happen, and if it's 25%, then it's less likely to happen. I thought that if the probability we get after re-randomizing the experiment data is greater than the threshold, then we assume our null hypothesis (Bus A is faster than Bus B) to be true.
    So, if the threshold is 50% or 25% and the probability we got is 9.3%, the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis. Do correct me here, I'm probably wrong.
    (2 votes)
    • Dr C
      > "the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis"

      There's a key element you're missing. Our hypothesis is that the two bus routes do not have different population median travel times. If our hypothesis is wrong, then Bus A is generally faster than Bus B, and so that fact explains the faster time for Bus A. But under our hypothesis, there is no reason that Bus A should tend to be quicker, so the fact that Bus A had a sample median that was 8 minutes faster than Bus B is purely a result of chance - random variation.

      In the re-randomization, we simulate a distribution of the difference in medians - this is a set of possible values that we could have observed if the two bus routes had equal medians, with more likely values showing up more often. If you've seen some of the other Statistics videos, it's comparable to the Sampling Distribution of the Sample Mean. We use this distribution to find the probability of Bus A being at least 8 minutes faster than Bus B under the assumption that the two routes have no difference.

      The observed value, the 8 minutes difference, is derived from reality. It's what really happened. If Bus A is faster, this will be a larger number. The simulated distribution is forced to obey our hypothesis, that neither route is quicker. If there is only a small probability of the observed result when comparing against the simulated distribution, then we know that our hypothesis doesn't really reflect reality (or put another way: reality conflicts with our hypothesis), and we would claim that the hypothesis is wrong, and that one of the routes is indeed quicker than the other.
      (3 votes)
  • Tom Gade
    In statistical studies like this, how would one know when to use the median vs the mean? Conceptually, what would analyzing the mean have given different from analyzing the median?
    (2 votes)
    • Dr C
      When the data are skewed or contain outliers, the mean tends to be a poorer measure of center than the median. It is still sometimes preferable to use the mean instead of the median (due to some other properties, such as the sampling distribution of the sample mean being asymptotically normal).
      (2 votes)
  • abdelkarim
    Why do we keep the names of the groups (treatment group, control group) if we are doing a random shuffle? I think that we could have taken the positive values just like we took the negative ones. And if what I said is true, which one do we pick?
    (2 votes)
    • Dr C
      What do you mean by "we could have taken the positive values just like we took the negative ones"?

      To answer your question: we keep the group labels because we have two different groups (bus routes). We hypothesize / assume that they have the same travel time. Under this assumption, it doesn't matter which route we take, they'll have basically the same travel time, just with random variation (getting stopped at a traffic light, running into heavy traffic, etc). If this assumption is wrong, then one route will have a tendency to produce lower numbers (faster times).

      By shuffling the data between the two groups, we make sure that neither route has a tendency to produce lower or higher numbers. So, we're intrinsically concerned about the two groups, so we need to keep them around to classify the randomization of the data points.
      (1 vote)

Video transcript

- [Voiceover] "Giovanna usually takes bus B to work, but now she thinks that bus A gets her to work faster. She randomized 50 workdays between a treatment group and a control group. For each day from the treatment group, she took bus A; and for each day from the control group, she took bus B. Each day she timed the length of her drive." This is really interesting what she did, it's very important, she randomized the 50 workdays. She did this beforehand, instead of just kinda waking up in the morning and deciding on her own which bus to take, because humans are infamously bad at being random. Even when we think we're being random, we're actually not that random. She might inadvertently be taking bus A earlier in the week, when maybe the commute times are shorter. Or maybe she inadvertently takes bus A when the weather is better, when there's less traffic. Remember, there's a natural tendency for human beings to want to confirm their hypothesis. So, if she thinks that bus A is faster, maybe she'll want to pick the days where she'll get data to confirm her hypothesis. It's really important that she randomize the 50 workdays. What I could imagine she did is maybe she wrote each of the workdays, the dates, on a piece of paper. She would have 50 pieces of paper and then she turned them all upside down or maybe she closed her eyes and then she moved them all over her table. Then with her eyes closed she randomly moved them to either the left or the right of the table. If she moves them to the left of the table, then those are the days she'll take bus A; if she moves them to the right of the table, those are the days she takes bus B. That's how she can make sure that this is truly random. So then they tell us, this is important, "The results of the experiment showed that the median travel duration for bus A is eight minutes less than the median travel duration for bus B." Or one way to think about it, if we said, "The treatment group median minus the control group median.
What would we get?" Well, the treatment group is eight minutes less than the control group, right? This is A, this is B, so if this is eight less than this, then this is going to be equal to negative eight. This is just another way of restating what I have underlined right over here. Someone's car alarm went off, hope you're not hearing that. Anyway, I'll try to pay attention while it's going off (chuckles). "To test whether the results could be explained by random chance, she created the table below, which summarizes the results of 1000 re-randomizations of the data, with differences between medians rounded to the nearest five minutes." What is going on over here? You might say, well look, "She got her result that she wanted to get, this data seems to confirm that bus A gets her to work faster. What's all this other business with re-randomization she's doing?" The important thing to realize is, and she realizes this, is that she might have just gotten this data that I underlined by random chance. There's some chance maybe A and B are completely similar, in terms of how long they take in reality. She just happened to pick bus A on days where bus A got to work faster. Maybe bus B is faster but she just happened to take bus A on the days that it was faster. The days it just happened to have less traffic. What she's doing here is she re-randomized the data and she wants to see that with all this re-randomized data, out of these 1000 re-randomizations, in what fraction of them do I get a result like this? Do I get a result where A is eight minutes or more faster? Or you could say that the median travel duration for bus A is eight minutes less, or even less than that, than the median travel duration for bus B. So if it was nine minutes less, or 10 minutes less, or 15 minutes less, those are all the interesting ones. Those are the ones that confirm our hypothesis, that bus A gets to work faster. Let's look at this table, it's not below, it's actually to the right.
Let's just remind ourselves what she did here, 'cause the first time you try to process this it can seem a little bit daunting. So, in her experiment, let me write this down, experiment... The car alarm outside which you probably, hopefully are not hearing, it's actually a surprisingly pleasant sounding car alarm, sounds like a slightly obnoxious bird, but anyway (laughs). Her experiment is, the way I described it, 25 days she would take bus A, 25 days she would take bus B. She would record all the travel times and let's say that I have 25 data points in each column. Let's say they get 12 minutes, 20 minutes, 25 minutes, and you just keep going, there's 25 data points. Let's just say that there are 12 data points less than 20 minutes and 12 data points more than 20 minutes. In this circumstance, her median time for bus A would be 20 minutes and I just made this number up. So in order for this to be eight minutes less than the median time for bus B, the median for bus B would have to be 28 and maybe you have data points here. Maybe this is 18 and you have 12 more that are less than 28. Then you have 12 more that are greater than 28. So the median time for bus B would be 28, once again I just made this data up. If you took treatment group median, I'll just write TGM for short, TGM minus control group median, what do you get? 20 minus 28 is negative eight. This is the actual results of... These are theoretical, potential results, hypothetical results for her actual experiment. Now what's all of this business over here? What she did is she took these times and she said, "You know what, let's just imagine a world where I could have gotten any of these times randomly on either bus." So she just randomly re-sorted them between A and B, she did that a thousand times. The first time, the second time, the third time. She does this 1000 times.
I'm assuming she used some type of computer program to do it and each time, once again, she just took the data that she had and she just rearranged it, she just reshuffled it. Maybe A on one day. Maybe it got this 18. Maybe it gets the 25. Maybe it gets a 30. Once again, I got the 18, the 25, the 30 and maybe B gets the... You know she's reshuffling all these other data points that I just have with dots and maybe B... Let's see she had the 18, 25, 30, maybe 12, 20, and 28. So in this circumstance, this random reshuffling and she keeps doing it over and over again. In this random reshuffling, the treatment group median minus the control group median is going to be what? It's going to be equal to positive five. In this random shuffling, this hypothetical scenario, bus A's median would have been five minutes longer than bus B's. If she gets this result with this random re-sorting, this would have been... She would have had a column here for five. Then she would have put one notch right over here. It looks like she classified things or maybe she didn't even get the data but she classified them by multiples of two. If she got this again then she would have put a two here. Then she would have said, "Okay, in how many of these random reshufflings am I getting a scenario where there's a five minute difference? Or where the treatment group was five minutes longer?" What is this saying? For example, this is saying that 18 out of the 1000 reshufflings, where she just randomly reshuffled the data, 18 out of those 1000 times, she found a scenario where her treatment group median was 10 minutes longer than her control group. Where bus A's median was, in this hypothetical re-randomization, where the treatment group is 10 minutes slower than the control group. There were 159 times where the treatment group... Once again, in her random reshuffling, these aren't based on observations, these are random reshufflings.
There's 159 times where her treatment group is four minutes slower than her control group. The whole reason for doing this is she says, "Okay, what's the probability of getting a result like this or better?" I say "better" as one that even more confirms her hypothesis, that the treatment group is faster than the control group. Well, the scenario, this scenario is this one right over here and then another one where the treatment group is even faster is this right over here. Here, the treatment group median is 10 less than the control group median. In how many of these scenarios, out of the thousand, is this occurring? Well, this one occurs 85 times, this one occurs eight. If you add these two together, 93 out of the thousand times, out of her re-randomizations, or I guess you could say 9.3 percent of the time, the data... 9.3 percent of the randomized, the 1000 re-randomizations, 9.3 percent of the time she got data that was as validating of her hypothesis as the actual experiment, or more. One way to think about this is, the probability of randomly getting the results from her experiment or better results from her experiment is 9.3 percent. It's low, it's a reasonably low probability that this happened purely by chance. Now, a question is, "What's the threshold?" If it was 50 percent you say, "Okay, this was very likely to happen by chance." If this was 25 percent you're like, "Okay, it's less likely to happen by chance but it could happen." 9.3 percent, it's roughly 10 percent. For every 10 people who do an experiment like she did, even if it was random, one person would get data like this. What typically happens amongst statisticians is they draw a threshold, and the threshold for statistical significance is usually five percent. One way to think about it, the probability of her getting this result by chance, this result or a more extreme result, one that more confirms her hypothesis, by chance is 9.3 percent.
If your cut-off for significance is five percent, if you said, "Okay, this has to be five percent or less," then you say, "Okay, this is not statistically significant." There's more than a five percent chance that I could have gotten this result purely through random chance. Once again, that just depends on where you have that threshold. When we go back, I think we've already answered the final question, "According to the simulations, what is the probability of the treatment group's median being lower than the control group's median by eight minutes or more?" Which once again, eight minutes or more, that would be negative eight and negative 10. We just figured that out, that was 93 out of the 1000 re-randomizations, so it's a 9.3 percent chance. If you set five percent as your cut-off for statistical significance, you say, "Okay, this doesn't quite meet my cut-off so maybe this is not a statistically significant result."
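The arithmetic from the video can be written out as a short sketch. Only the two tallies mentioned for differences of negative eight and negative 10 are used; the rest of the table is not reproduced here, and the variable names are assumptions for illustration.

```python
# Tallies quoted in the video for the re-randomizations where the treatment
# median was at least eight minutes lower than the control median.
counts = {-10: 8, -8: 85}   # difference in minutes -> number of re-randomizations
total = 1000                # total number of re-randomizations
observed = -8               # treatment median minus control median in the experiment

# Add up every tally at least as extreme as the observed difference.
extreme = sum(count for diff, count in counts.items() if diff <= observed)
probability = extreme / total

print(f"{extreme} out of {total} = {probability:.1%}")  # 93 out of 1000 = 9.3%
```

This is the 9.3 percent that gets compared against whatever cut-off was chosen for statistical significance.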