P-values and significance tests

We compare a p-value to a significance level to draw a conclusion in a significance test. Assuming the null hypothesis is true, a p-value is the probability of getting a result at least as extreme as the sample result by random chance alone. If the p-value is lower than our significance level, we reject the null hypothesis. If not, we fail to reject the null hypothesis.

Want to join the conversation?

  • Nguyen Nguyen
    Why do we reject H0 when the p-value is less than the significance level?
    (36 votes)
    • alsoltes
      Because we're looking for the probability that the sample mean (X bar) is greater than or equal to 25 minutes. If we assume the null hypothesis to be true, then the p-value gives the percent chance of getting a result at least that extreme. If the chance is lower than our significance level (1 in 20, or 0.05, in this case), then that's evidence that such an outcome would be rather unlikely to occur if the null hypothesis were true.
      (25 votes)
  • Sergey Li
    In the case of sampling from the yellow version, I don't understand why we can assume the null hypothesis to be equal to 20 minutes. The null hypothesis being equal to 20 minutes doesn't seem to have any relationship to the p-value to me. Can someone help explain?
    (8 votes)
    • Cory Cascalheira
      A mean of 20 conveys that there is no difference. We expect that any new sample of users who use the yellow-background website will spend, on average, the same amount of time as they would have on the off-white-background website.

      That is, if the null hypothesis were true, then every sample of 100 that we take from the population of users who visit the yellow-background website should have a mean close to 20.

      The p-value tells us how likely it is to get a sample mean of 25 or more when the sample mean should be close to 20.

      If the p-value = 0.03, then the probability of randomly selecting a sample of 100 users who spend, on average, at least 25 minutes on the yellow-background website is only 3%!

      In other words, the likelihood of getting a sample mean of 25 or more, given that all sample means should be near 20, is only 3%.

      So, in a world where the null hypothesis is true (a hypothetical world where the yellow background has no effect on the amount of time that people spend on the website, and thus the mean amount of time spent on either website is 20), it would be very unlikely to get a sample of users who spend an average of 25 minutes or more on the yellow-background website. If that hypothetical world existed, then we would get such a sample only 3% of the time. If we continued to take samples of 100 users until we died, then about 3% of those samples would have a mean of 25 minutes or more.

      So, the fact that this first sample returned a mean of 25 given that the null hypothesis is true is very unlikely--below the threshold that one assumes to be due to random chance alone--and therefore, the sample is inconsistent with the null hypothesis, and we reject it.
      (38 votes)
  • victoriacarrera2000
    How would I calculate the p-value if the problem doesn't give me a mean or standard deviation?
    (17 votes)
  • Ken Badal
    What a confusing video
    (11 votes)
  • Ebrahim
    Why do we calculate p(x>=observed value) and not just p(x=observed value)?
    (9 votes)
  • 曾棋能
    In the video, what do you mean by "at least this far away from the mean", and why is the p-value the probability that X bar is greater than or equal to 25 minutes?
    (2 votes)
    • Anwar
      Because in step 3 we're taking a sample and supposing that the sample mean is 25. So we're asking a question: if we take a sample and the sample mean is 25, what's the likelihood of that happening given that our population mean is 20? If the probability of getting a sample mean of at least 25 is very low (less than 0.05), then maybe the population mean is not 20 and we have reason to reject the null hypothesis.
      (8 votes)
  • satyanarayan.ts
    I have a question on the procedure here. When the probability of (X bar) >= 25 is below the significance level, why don't we question the sample that (X bar) came from instead of rejecting the null hypothesis? It may be that the sample we used resulted in this unreasonable probability.
    (6 votes)
  • Raj D
    Can someone please point me to the video where Sal explained how to calculate
    P(x bar >= 25 minutes | Ha is true)?
    Thank you
    (7 votes)
  • priyansh.7299
    I am having a hard time wrapping my head around it, this is my understanding, please confirm if I am correct or correct me if I am wrong.

    - So here we have 2 distributions, one is the distribution of the people who visit the website with white background and the other distribution is of the people who visit the website with yellow background.
    We take sample data from the second distribution with a sample size of 100.

    - p (x_bar >= 25 | H0 is true): is the probability that the sample belongs to the first distribution or, in other words, the probability that the sample was chosen completely by chance and not due to the fact that we changed the background.

    - Alpha is the tolerance that is decided depending on the experiment.

    - If the p-value is less than the tolerance, then we have reason to believe that the sample is different than the first distribution or it is very unlikely to happen by chance and more likely to happen because we changed the background color. And hence we have enough evidence to reject the null hypothesis.

    - If the p-value is higher than alpha, then there is higher probability that the sample belongs to the first distribution or it is more likely to happen by chance and less likely to happen because we changed the background color.
    (5 votes)
  • Parthiban Rajendran
    If the probability of the mean being 20 minutes is so low, how could that automatically mean the alternative hypothesis is true, that the mean is more than 20 minutes? In reality, couldn't it also be less than 20 minutes? (And couldn't that just as well support an alternative hypothesis that the mean is less than 20 minutes, for any other person who assumes that for whatever strange reason?)
    (5 votes)

Video transcript

- Let's say that I run a website that currently has this off-white color for its background, and I know the mean amount of time that people spend on my website. Let's say it is 20 minutes, and I'm interested in making a change that will make people spend more time on my website. My idea is to make the background color of my website yellow. But after making that change, how do I feel good about this actually having the intended consequence? Well, that's where significance tests come into play.
What I would do is first set up some hypotheses, a null hypothesis and an alternative hypothesis. The null hypothesis tends to be a statement that, "Hey, your change actually had no effect, there's no news here," and so this would be that your mean is still equal to 20 minutes after the change to yellow, in this case, for our background. And we would also have an alternative hypothesis. Our alternative hypothesis is actually that our mean is now greater because of the change, that people are spending more time on my site. So our mean is greater than 20 minutes after the change.
Now the next thing we do is we set up a threshold known as the significance level, and you will see how this comes into play in a second. So, your significance level is usually denoted by the Greek letter alpha, and you tend to see significance levels like 1/100 or 5/100 or 1/10, that is, 1%, 5%, or 10%. You might see other ones, but we're gonna set a significance level for this particular case. Let's just say it's going to be 0.05.
And what we're going to do now is take a sample of people visiting this new yellow-background website and calculate statistics: the sample mean, the sample standard deviation. And we're gonna say, "Hey, if we assume that the null hypothesis is true, what is the probability of getting a sample with the statistics that we get?"
And if that probability is lower than our significance level, if that probability is less than 5/100, if it's less than 5%, then we reject the null hypothesis and say that we have evidence for the alternative. However, if the probability of getting the statistics for that sample is at the significance level or higher, then we say, "Hey, we can't reject the null hypothesis, and we aren't able to have evidence for the alternative."
So what we would then do, I will call this step three. In step three, we would take a sample. So let's say we take 100 folks who visit the new website, the yellow-background website, and we measure sample statistics. We measure the sample mean here; let's say that for that sample, the mean is 25 minutes. If we don't know what the actual population standard deviation is, which we typically don't, we would also calculate the sample standard deviation.
Then the next step is we calculate a p-value. And the p-value, which stands for probability value, is the probability of getting a statistic at least this far away from the mean if we were to assume that the null hypothesis is true. So one way to think about it is as a conditional probability. It is the probability that our sample mean, when we take a sample of size n=100, is greater than or equal to 25 minutes, given our null hypothesis is true. And in other videos, we have talked about how to do this. If we assume that the sampling distribution of the sample means is roughly normal, we can use the sample mean, we can use our sample size, we can use our sample standard deviation, perhaps we use a t-statistic, to figure out what this probability is going to be.
And then we decide whether we can reject the null hypothesis. So let me call that step five. So in step five, there are two situations. If my p-value is less than alpha, then I reject my null hypothesis and say that I have evidence for my alternative hypothesis.
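As a rough illustration of the p-value step described above, here is a minimal Python sketch. The video does not give a sample standard deviation, so the value 26.6 minutes below is purely hypothetical, picked so that the result lands near the 0.03 used later. The sketch also uses a normal approximation instead of the t-statistic the video mentions, which should be close for a sample of 100; the function name `one_sided_p_value` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(sample_mean, null_mean, sample_sd, n):
    """Approximate P(sample mean >= observed mean | H0 is true).

    Uses a normal approximation to the sampling distribution of the
    sample mean (a stand-in for the t-statistic mentioned in the video).
    """
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Upper-tail probability: results at least as extreme as the one observed.
    return 1.0 - NormalDist().cdf(z)


# 25, 20 and 100 come from the video; 26.6 is an assumed, hypothetical value.
p_value = one_sided_p_value(sample_mean=25, null_mean=20, sample_sd=26.6, n=100)
print(round(p_value, 3))
```

With a smaller assumed standard deviation the same 5-minute difference would give a smaller p-value, because the sample mean would be expected to vary less from sample to sample.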
Now, if we have the other situation, if my p-value is greater than or equal to, in this case, 0.05, so if it's greater than or equal to my significance level, then I cannot reject the null hypothesis. I wouldn't say that I accept the null hypothesis, I would just say that we do not reject the null hypothesis.
And so, let's say, when I do all of these calculations, I get a p-value which would put me in this scenario right over here. Let's say that I get a p-value of 0.03. 0.03 is indeed less than 0.05, so I would reject the null hypothesis and say that I have evidence for the alternative. And this should hopefully make logical sense, because what we're saying is, hey, look, we took a sample, and if we assume the null hypothesis, the probability of getting a sample result at least this extreme is 3%, it's 3/100, and so since that probability is less than our probability threshold here, we'll reject it and say we have evidence for the alternative.
On the other hand, there might have been a scenario where we do all of the calculations here and we figure out that the p-value we get is equal to 0.5, which you can interpret as saying that, hey, if we assume the null hypothesis is true, that there's no change due to making the background yellow, I would have a 50% chance of getting this result. And in that situation, since it's higher than my significance level, I wouldn't reject the null hypothesis. A world where the null hypothesis is true and I get this result, well, you know, it seems reasonably likely. And so, this is the basis for significance tests generally and, as you'll see, is applicable in almost every field you'll find yourself in.
Now there's one last point of clarification that I wanna make very, very, very clear. Our p-value, the thing that we're using to decide whether or not we reject the null hypothesis, is the probability of getting your sample statistics given that the null hypothesis is true.
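The decision rule in step five can be written down in a few lines. This is only a sketch; the function name `decide` and the wording of the returned strings are made up for the example, while the 0.05 threshold and the two p-values (0.03 and 0.5) are the ones from the video.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    # At or above alpha we do not "accept" H0; we only fail to reject it.
    return "fail to reject H0"


print(decide(0.03))  # reject H0
print(decide(0.5))   # fail to reject H0
print(decide(0.05))  # fail to reject H0 (equal to alpha is not below it)
```

Note that the comparison is strict: a p-value exactly equal to the significance level falls in the "do not reject" case, matching the "greater than or equal to" wording above.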
Sometimes people confuse this and they say, "Hey, is this the probability that the null hypothesis is true given the sample statistics that we got?" And I would say, "Clearly, no, that is not the case." We are not trying to gauge the probability that the null hypothesis is true or not. What we are trying to do is say, "Hey, if we assume the null hypothesis were true, what is the probability that we got the result that we did for our sample?" And if that probability is low, if it's below some threshold that we set ahead of time, then we decide to reject the null hypothesis and say that we have evidence for the alternative.
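One way to see what "the probability of the result, assuming the null hypothesis is true" means is to simulate that hypothetical world directly. The sketch below is an illustration under loud assumptions: visit times are drawn from a gamma distribution with mean 20 minutes (the null hypothesis) and a hypothetical standard deviation of 26.6 minutes, neither of which comes from the video beyond the mean of 20. It draws many samples of 100 visits and counts how often the sample mean reaches 25 or more.

```python
import random

NULL_MEAN = 20.0       # minutes; the population mean claimed by H0
ASSUMED_SD = 26.6      # minutes; hypothetical, not given in the video
SAMPLE_SIZE = 100      # visitors per sample, as in the video
OBSERVED_MEAN = 25.0   # sample mean observed in the video

# Gamma shape and scale chosen so the distribution has the mean and
# standard deviation above (mean = shape * scale, var = shape * scale**2).
SHAPE = (NULL_MEAN / ASSUMED_SD) ** 2
SCALE = ASSUMED_SD ** 2 / NULL_MEAN


def simulate_null_p_value(num_samples=5000, seed=0):
    """Estimate P(sample mean >= OBSERVED_MEAN) in a world where H0 holds."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(num_samples):
        sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(SAMPLE_SIZE)]
        if sum(sample) / SAMPLE_SIZE >= OBSERVED_MEAN:
            extreme += 1
    return extreme / num_samples


estimate = simulate_null_p_value()
print(estimate)
```

Every simulated sample here comes from a world where the null hypothesis is true by construction, so the printed fraction says how often such a world produces a result like ours. It says nothing about how probable the null hypothesis itself is, which is the distinction drawn above.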