### Course: Statistics and probability > Unit 13

Lesson 2: Comparing two means

# Statistical significance on bus speeds

Sal determines if the results of an experiment about bus speeds are statistically significant.

## Want to join the conversation?

• In all of these videos about hypothesis testing I'm left wondering how the "re-randomisation" is done. It would be helpful to have this explained in more detail.
• I'm assuming that the "re-randomisation" means when we take the N people and redistribute them between the two groups (let me know if that's mistaken, I'm not able to watch the video right at this moment).

If this is the case, then it's a relatively simple concept. Imagine we have a list of names and the associated group, A or B. Keeping the list of names as-is, take all the group labels (the A's and B's) and throw them into a hat. Draw one out - the first person is in that group. Draw a second - the second person is in that group, and so on. Keep drawing until the hat is empty, each time putting the next person on the list into the group you drew. Presto, we have a re-randomization of the groups. Rinse and repeat to get a second re-randomization, and so on. Various computer algorithms will do this for us very quickly, but that's the basic idea.
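The hat-drawing idea can be sketched in a few lines of Python. This is only an illustration: the names, the group sizes, and the `rerandomize` helper are made up for the example, and the seed is there just so the run is repeatable.

```python
import random

def rerandomize(labels, rng):
    """Return a shuffled copy of the group labels (one 'hat' draw)."""
    hat = list(labels)   # throw every A and B label into the hat
    rng.shuffle(hat)     # drawing labels out one by one is a shuffle
    return hat

# Hypothetical list of people and their original groups.
names = ["Ann", "Ben", "Cal", "Dee", "Eve", "Fay"]
labels = ["A", "A", "A", "B", "B", "B"]

rng = random.Random(0)   # seeded only for repeatability
new_labels = rerandomize(labels, rng)

# The list of names stays in place; only the labels move.
assignment = dict(zip(names, new_labels))
print(assignment)
```

Calling `rerandomize` again gives the second re-randomization, and so on. Note that the group sizes never change, because the same labels are reused each time.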
• How and why does re-randomization work?
• The purpose of the re-randomization is to take all the original data, regardless of whether it came from the treatment group or the control group, and see whether a difference in trip times like the one observed is likely to arise by random chance. Here is a very simplified version (in practice you would want more measurements). In the case of bus trip lengths, we might have the following:
1: 53 min (A)
2: 42 min (A)
3: 40 min (B)
4: 53 min (A)
5: 38 min (A)
6: 28 min (B)
7: 52 min (A)
8: 32 min (B)
9: 55 min (B)
10: 33 min (B)
For calculating the original results, we would find the median of the A bus trips and the median of the B bus trips. Then we would compare them, in this case by finding the difference of the medians.
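As a rough check of that step, here is the calculation for the ten trips listed above, using Python's `statistics.median`. It assumes trip 10 belongs to group B; the variable names are just for illustration.

```python
from statistics import median

# (minutes, group) for the ten trips in the list above.
trips = [(53, "A"), (42, "A"), (40, "B"), (53, "A"), (38, "A"),
         (28, "B"), (52, "A"), (32, "B"), (55, "B"), (33, "B")]

times_a = [minutes for minutes, group in trips if group == "A"]
times_b = [minutes for minutes, group in trips if group == "B"]

median_a = median(times_a)            # middle value of the A trips
median_b = median(times_b)            # middle value of the B trips
observed_diff = median_a - median_b   # positive means A took longer

print(median_a, median_b, observed_diff)   # prints: 52 33 19
```

So in this toy data set the median A trip is 19 minutes longer than the median B trip.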

For the simulations, we would dump all the data together and group them randomly into two groups over and over. This is re-randomization. We are trying to find out if the results were important and likely to occur as a regular event, or if the results were just a quirk, and not likely to be a regular result.
How do we do this? For each simulation trial, we would find the medians of each random group and the difference between the medians. To have a good simulation, we might do this 150 times or 1000 times, as in this case. Then we would see how often a difference at least as large as the original result shows up among the RANDOM group results. If the random groups give such a result only very rarely, then we can say that our experimental result was a significant result. If that is true, we can switch to Bus B and save time most of the time on our bus ride.
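Putting the whole procedure together, a minimal sketch might look like the following. It uses the ten toy trips from this reply (assuming trip 10 is in group B), 1000 shuffles, and a fixed seed; the function names are invented for the example, and this is not necessarily how the simulation in the video was produced.

```python
import random
from statistics import median

times = [53, 42, 40, 53, 38, 28, 52, 32, 55, 33]
groups = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "B"]

def median_diff(times, groups):
    """Median of the A trips minus median of the B trips."""
    a = [t for t, g in zip(times, groups) if g == "A"]
    b = [t for t, g in zip(times, groups) if g == "B"]
    return median(a) - median(b)

def rerandomization_test(times, groups, trials=1000, seed=0):
    """Shuffle the group labels `trials` times and report how often the
    shuffled difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = median_diff(times, groups)
    shuffled = list(groups)
    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(shuffled)   # dump the labels together and regroup
        if median_diff(times, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / trials

observed, proportion = rerandomization_test(times, groups)
print(f"observed difference: {observed} min")
print(f"share of shuffles at least as large: {proportion:.3f}")
```

The printed share is the number you compare with your chosen threshold. With only ten trips the estimate is coarse, which is one reason you would want more measurements.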
• In calculating statistical significance should he be counting both the frequencies that are greater than +8 as well as those that are less than -8 (and not just the +8 ones)? I thought that statistical significance was measuring the likelihood of getting a value as extreme as the one he got (regardless of direction)?
• Hi!

Good question.
You would be right that we would have to add the frequencies greater than +8 and smaller than -8...
IF our question was: do A and B differ from each other by more than 8 minutes?
In that question, we don't care whether A is faster than B or B is faster than A.

However, here our question is: is it true (can we reasonably assume) that A is faster than B by 8 minutes?

For this reason, we are only interested in those outcomes where the difference [A-B] >= 8, that is, those outcomes where A is greater than B by 8 minutes or more.
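To make the two ways of counting concrete, here is a small sketch. The list of simulated differences is invented for illustration (it is not the data from the video); only the observed value of 8 comes from the discussion.

```python
# Hypothetical re-randomized differences [A-B], in minutes.
simulated_diffs = [-10, -8, -6, -4, -2, 0, 0, 2, 4, 6, 8, 12]
observed = 8

n = len(simulated_diffs)

# One direction only: outcomes where [A-B] >= 8.
one_sided = sum(d >= observed for d in simulated_diffs) / n

# Both directions: outcomes at least 8 away from zero, either way.
two_sided = sum(abs(d) >= abs(observed) for d in simulated_diffs) / n

print(one_sided, two_sided)
```

Here 2 of the 12 values count in the one-directional version and 4 of the 12 count in the both-directions version, so the second proportion is twice the first.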

• Why is it that if the probability you get is lower than 5%, then the result is significant? How come, when you calculate the probability, you are actually calculating the probability that the results are random?

Thx, Clarissa
• The tests that we do make some sort of assumption. We might assume that the population mean is some value, or that the probability of getting heads on a coin flip is 0.5, etc. That assumption is crucial. Once we make that assumption, we can start calculating probabilities. In particular, we want to calculate the probability of the observed result happening by chance. The reason for this is that the outcome - such as the length of time the bus trip lasts - is a "random variable." It can't be predicted exactly. So in this case we make the assumption that the two bus routes have the same mean travel time.

With that, we have a definite scenario to play with. The times are still random events, so there's an element of chance as to whether one route will be a minute or two longer than the other. We want to know the probability of this happening by chance, because if it's a really small probability, then it's very unlikely to occur by chance, right?

Now, we try to trust the data - because they're real, they're what actually happened. So if, assuming the two routes have equal travel time, our observed data are very unlikely, that makes our assumption a very poor one, and it's probably wrong. In statistical jargon, we say this is a "significant" result.
• In this example the median travel time was used. Is there any reason for using the median instead of the mean?
• Sometimes the median gives you a far more practical view of a situation.
For example: you want to know how rich the average person living in a city (let's call it Basin City) is. The median person earns 100 dollars per year, and most people are very close to 100 dollars (e.g. 80% of the population earns between 80 and 120 dollars), but the range could still be ridiculously high. Imagine a rich CEO living in Basin City earning 5,000,000,000 dollars a year. This one extreme income may strongly influence the mean, while the median is much less affected by such extremes. Now if somebody used the mean to answer the question "how rich is the average person living in Basin City?", they would get a very distorted answer depicting the average person in Basin City as far wealthier than is actually the case.
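A quick way to see this effect is to compute both measures with and without the CEO. The seven ordinary incomes below are invented to match the spirit of the example (everyone between 80 and 120 dollars); only the CEO's 5,000,000,000 comes from the text.

```python
from statistics import mean, median

# Hypothetical yearly incomes of ordinary Basin City residents.
incomes = [80, 90, 100, 100, 100, 110, 120]
with_ceo = incomes + [5_000_000_000]

print("without CEO:", mean(incomes), median(incomes))
print("with CEO:   ", mean(with_ceo), median(with_ceo))
```

Without the CEO both the mean and the median are 100. With the CEO the median is still 100, while the mean jumps to more than 600 million dollars.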
• I don't understand. In "Statistical significance on bus speeds", if the chance of Bus A being faster than Bus B (in travel time from source to destination) is roughly 10% across 1000 simulations that re-randomize the sample data and compare medians, to me that means the claim "Bus A is faster than Bus B" is true 10% of the time, or that almost 90% of the time the proposed claim doesn't hold, meaning Bus A DOESN'T get there faster than Bus B.

What am I missing?
• You're missing the point that after re-randomization, the values labelled Bus A are not really from Bus A anymore; its values were randomly assigned from both Bus A and Bus B of the initial experiment. You've switched the times around randomly, so the origin is lost.

Those 10% mean that the result of your first experiment is not significant, because in about 10% of the 1000 random regroupings we got a result at least as extreme. This tells us that the result of your first experiment might have been caused by chance.

I hope that clears things up for you.
• Maybe because I am not an English speaker, things are not that clear to me.
According to this video, what I understood was:
1) Hypothesis: bus A is faster than bus B
2) Experiment: bus A "median" travel duration is 8 minutes less than bus B
3) Simulation: the probability of bus A being faster than bus B by 8 minutes or more is 9.3%
4) Significance: ?
I don't understand the significance (meaning and importance) of the simulation result. How does it relate to the threshold? Does being below the threshold mean the hypothesis is valid, or the opposite? Is it good to be over or under the threshold? Is there a logical way to define the threshold, or was it chosen arbitrarily?

Sometimes the explanations are very confusing, especially to a non-English-speaking audience. Sometimes the choice of words makes the entire explanation confusing, as when "test of pregnancy" and "test of probability" are later both referred to as simply "test". Which test?
• Hi Marcello!

Good question. The threshold is chosen by the statistician. That's also why we always have to mention it. When we say "this experiment was significant at the 5% level", the audience knows that we chose a threshold of 5%.
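As a small illustration of how the chosen threshold changes the conclusion, here is a sketch that uses the 9.3% figure quoted in the question. The helper name `is_significant` and the 10% alternative threshold are just assumptions for the example.

```python
def is_significant(p_value, threshold=0.05):
    """A result is called significant when its probability under the
    'no difference' assumption is below the chosen threshold."""
    return p_value < threshold

p_value = 0.093   # the 9.3% from the simulation, as quoted above

for threshold in (0.05, 0.10):
    verdict = "significant" if is_significant(p_value, threshold) else "not significant"
    print(f"at the {threshold:.0%} level: {verdict}")
```

The same 9.3% is not significant at the 5% level but would be at a 10% level, which is why the threshold should be fixed before looking at the result and always reported with it.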
• I didn't get the threshold thing here. In the video, Sal said that if the threshold is 50%, it is very likely to happen, and if it's 25% then it's less likely to happen. I thought that if the probability we get after re-randomizing the previous experiment's data is greater than the threshold, then we assume our null hypothesis (Bus A is faster than Bus B) to be true.
So, if threshold is 50% or 25% and the probability we got is 9.3%, the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis. Do correct me here, I'm probably wrong.
• > "the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis"

There's a key element you're missing. Our hypothesis is that the two bus routes do not have different population median travel times. If our hypothesis is wrong, then Bus A is generally faster than Bus B, and so that fact explains the faster time for Bus A. But under our hypothesis, there is no reason that Bus A should tend to be quicker, so the fact that Bus A had a sample median that was 8 minutes faster than Bus B is purely a result of chance - random variation.

In the re-randomization, we simulate a distribution of the difference in medians - this is a set of possible values that we could have observed if the two bus routes had equal medians, with more likely values showing up more often. If you've seen some of the other Statistics videos, it's comparable to the Sampling Distribution of the Sample Mean. We use this distribution to find the probability of Bus A being at least 8 minutes faster than Bus B under the assumption that the two routes have no difference.

The observed value, the 8 minutes difference, is derived from reality. It's what really happened. If Bus A is faster, this will be a larger number. The simulated distribution is forced to obey our hypothesis, that neither route is quicker. If there is only a small probability of the observed result when comparing against the simulated distribution, then we know that our hypothesis doesn't really reflect reality (or put another way: reality conflicts with our hypothesis), and we would claim that the hypothesis is wrong, and that one of the routes is indeed quicker than the other.
• In statistical studies like this, how would one know when to use the median vs the mean? Conceptually, what would analyzing the mean have given different from analyzing the median?
• When the data are skewed or contain outliers, the mean tends to be a poorer measure of center than the median. It is still sometimes preferable to use the mean instead of the median (due to some other properties, such as the sampling distribution of the sample mean being asymptotically normal).