# Using P-values to make conclusions

Learn how to use a P-value and the significance level to make a conclusion in a significance test.
This article was designed to provide a bit of teaching and a whole lot of practice. The questions are ordered to build your understanding as you go, so it's probably best to do them in order. Onward!
We use $p$-values to make conclusions in significance testing. More specifically, we compare the $p$-value to a significance level $\alpha$ to make conclusions about our hypotheses.
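If the $p$-value is lower than the significance level we chose, then we reject the null hypothesis ${H}_{0}$ in favor of the alternative hypothesis ${H}_{\text{a}}$. If the $p$-value is greater than or equal to the significance level, then we fail to reject the null hypothesis ${H}_{0}$, but this doesn't mean we accept ${H}_{0}$. To summarize:
$p\text{-value}<\alpha \Rightarrow$ reject ${H}_{0}$ (accept ${H}_{\text{a}}$)
$p\text{-value}\ge \alpha \Rightarrow$ fail to reject ${H}_{0}$ (cannot accept ${H}_{\text{a}}$)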
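The decision rule can also be written as a few lines of Python. This is only a sketch: the function name `conclude` and the sample $p$-values are made up for illustration and are not part of the article.

```python
def conclude(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 only when p_value < alpha."""
    if p_value < alpha:
        return "reject H0"
    # p_value >= alpha: the evidence is too weak to reject H0,
    # which is not the same as accepting H0.
    return "fail to reject H0"


# Hypothetical p-values compared against alpha = 0.05.
for p_value in (0.03, 0.05, 0.20):
    print(p_value, "->", conclude(p_value, alpha=0.05))
```

Note that a $p$-value exactly equal to $\alpha$ falls on the "fail to reject" side, matching the "greater than or equal to" wording above.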
Let's try a few examples where we use $p$-values to make conclusions.

## Example 1

Alessandra designed an experiment where subjects tasted water from four different cups and attempted to identify which cup contained bottled water. Each subject was given three cups that contained regular tap water and one cup that contained bottled water (the order was randomized). She wanted to test if the subjects could do better than simply guessing when identifying the bottled water.
Her hypotheses were ${H}_{0}:p=0.25$ vs. ${H}_{\text{a}}:p>0.25$ (where $p$ is the true likelihood of these subjects identifying the bottled water).
The experiment showed that $20$ of the $60$ subjects correctly identified the bottled water. Alessandra calculated that the statistic $\hat{p}=\frac{20}{60}=0.\overline{3}$ had an associated P-value of approximately $0.068$.
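The article doesn't show how this P-value was computed. One way to arrive at it, assuming the usual normal approximation for a sample proportion (an assumption about the method, with illustrative variable names), is sketched here:

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.25        # proportion claimed by H0
n = 60           # number of subjects
successes = 20   # subjects who identified the bottled water

p_hat = successes / n
# Standard deviation of the sample proportion, assuming H0 is true.
sd_p_hat = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd_p_hat

# Ha: p > 0.25 is one-sided, so only the upper tail counts.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

Under these assumptions the result agrees with the stated value of about $0.068$.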
Question A (Example 1)
What conclusion should be made using a significance level of $\alpha =0.05$?

Question B (Example 1)
In context, what does this conclusion say?

Question C (Example 1)
How would the conclusion have changed if Alessandra had instead used a significance level of $\alpha =0.10$?

## Example 2

A certain bag of fertilizer advertises that it contains [amount missing], but the amounts these bags actually contain are normally distributed with a mean of $7.4$ and a standard deviation of [value missing].
The company installed new filling machines, and they wanted to perform a test to see if the mean amount in these bags had changed. Their hypotheses were ${H}_{0}:\mu =7.4$ vs. ${H}_{\text{a}}:\mu \ne 7.4$ (where $\mu$ is the true mean weight of these bags filled by the new machines).
They took a random sample of $50$ bags and observed a sample mean and standard deviation of $\overline{x}=7.36$ and $s=0.12$. They calculated that these results had a P-value of approximately $0.02$.
Question A (Example 2)
What conclusion should be made using a significance level of $\alpha =0.05$?

Question B (Example 2)
In context, what does this conclusion say?

Question C (Example 2)
How would the conclusion have changed if they had instead used a significance level of $\alpha =0.01$?

## Ethics and the significance level $\alpha$

These examples demonstrate how we may arrive at different conclusions from the same data depending on what we choose as our significance level $\alpha$. In practice, we should make our hypotheses and set our significance level before we collect or see any data. Which specific significance level we choose depends on the consequences of various errors, and we'll cover that in videos and exercises that follow.

## Want to join the conversation?

• Could anyone explain how to get the p-value in the second example?
• Sure!

The p-value is the probability of a statistic at least as deviant as ours occurring under the assumption that the null hypothesis is true.

Under that assumption, and noting also that we are given that the population is normally distributed (or that we took a sample size of at least 30 [by the Central Limit Theorem]), we can treat the sampling distribution of the sample mean as a normal distribution.

So now, we can use the normal cumulative distribution function or a z-table to find this probability. (We could also use a t-table, but it is acceptable to use a z-table since our sample size is larger than 30.)

To use a z-table, we'll need to find the appropriate z-score first.

Since the answer to what we are asking comes from the sampling distribution of the sample mean, we would find the appropriate standard deviation to use by dividing the population standard deviation by the square root of the sample size (since the variance of the sampling distribution is the population variance divided by the sample size, and the standard deviation is the square root of the variance).

That would give us a standard deviation for the sampling distribution of the sample mean.

I say "would" because, unfortunately, we don't always know the population standard deviation. In that case (and it seems they did this here, despite knowing the population standard deviation), we use the sample standard deviation in its place. This gives an estimate of the standard deviation of the sampling distribution of the sample mean, which is also known as the standard error of the mean.

In our example, the standard error of the mean therefore has a value of 0.12 / 50^0.5, or approximately 0.01697.

Taking the difference between our sample mean and the population mean and dividing it by the standard error gives us our z-score (number of standard errors our sample mean is away from the population mean), which is approximately (7.36 - 7.4) / 0.01697 or -2.36.

Since the alternative hypothesis is not specific about the population mean being either greater than or less than the value in the null hypothesis, we have to consider both tails of the distribution, but by symmetry of the standard normal distribution, we can accomplish this by simply doubling the value we get from using our obtained z-score with a z-table.

The value given by a z-table using a z-score of -2.36 is 0.0091, which, when doubled, is 0.0182 or approximately 0.02.

This (or other videos before it in that section) might also help (it comes later in this unit): https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/tests-about-population-mean/v/calculating-p-value-from-t-statistic

:)
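The steps in this reply can be sketched in Python. This follows the reply's own assumptions (the sample standard deviation stands in for the population value, and a normal distribution is used rather than a t-distribution); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 7.4      # mean under H0
x_bar = 7.36   # observed sample mean
s = 0.12       # sample standard deviation
n = 50         # sample size

# Standard error of the mean, estimated from the sample.
se = s / sqrt(n)
z = (x_bar - mu0) / se

# Two-sided alternative: double the area in one tail.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"SE = {se:.5f}, z = {z:.2f}, P-value = {p_value:.2f}")
```

Because the z-score is not rounded to two decimals before the lookup, the unrounded P-value can differ slightly in the fourth decimal place from the z-table figure, but it still rounds to $0.02$.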
• I don't understand the p-value in example 1

Isn't the calculation: binomial(60,20) * 0.75^40 * 0.25^20 = 0.0383?
The problem states it is "0.068". Is this p-value wrong or did I make a mistake in my calculation?
• p-value = P(p-hat ≥ 20/60, given that the actual proportion is 0.25)
So you need to calculate:
binomial(60,20) * 0.75^40 * 0.25^20 (probability that exactly 20 subjects identified the bottled water)
+
binomial(60,21) * 0.75^39 * 0.25^21 (probability that exactly 21 subjects identified the bottled water)
+
binomial(60,22) * 0.75^38 * 0.25^22
+
binomial(60,23) * 0.75^37 * 0.25^23
:
:
binomial(60,60) * 0.75^0 * 0.25^60 (probability that all 60 subjects identified the bottled water)

I guess!
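The sum described in this reply can be evaluated directly. The sketch below uses only the standard library; the helper name `binom_pmf` is made up for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0, observed = 60, 0.25, 20

# The single term from the question: exactly 20 correct identifications.
exactly_20 = binom_pmf(observed, n, p0)

# The full upper tail: 20 or more correct identifications.
at_least_20 = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))

print(f"P(X = 20)  = {exactly_20:.4f}")
print(f"P(X >= 20) = {at_least_20:.4f}")
```

The single term matches the 0.0383 in the question. The exact binomial tail comes out somewhat larger than the article's 0.068, which instead agrees with the normal approximation (z-score of about 1.49) described in another reply in this thread.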
• How do you decide what Significance level you should set??
• A significance level of 0.05 (i.e. 5%) is commonly used, but sometimes other significance levels are used.

Note that the significance level is the probability of a Type 1 error (rejecting a true null hypothesis). Everything else being equal, decreasing the significance level (probability of a Type 1 error) increases the probability of a Type 2 error (failing to reject a false null hypothesis), and vice versa.

So the statistician has to weigh the cost of a Type 1 error (rejecting a true null hypothesis) versus the cost of a Type 2 error (failing to reject a false null hypothesis) in the real-world situation. If the statistician is especially concerned about the cost of a Type 1 error, then he/she will use a significance level that is less than 0.05. However, if instead the statistician is especially concerned about the cost of a Type 2 error, then he/she will use a significance level that is greater than 0.05.
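The trade-off between the two error types can be illustrated numerically. This is a rough sketch for a one-sided z-test only; the function name `type_2_error` and the value `shift = 2.0` (the true mean sitting two standard errors above the null value) are hypothetical choices, not taken from the article.

```python
from statistics import NormalDist

std_normal = NormalDist()


def type_2_error(alpha: float, shift: float) -> float:
    """P(fail to reject H0) in an upper-tailed z-test when the true mean
    lies `shift` standard errors above the value stated in H0."""
    # Reject H0 when the z statistic exceeds this critical value.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centered at `shift`.
    return std_normal.cdf(critical_z - shift)


shift = 2.0  # hypothetical effect size, in standard errors
for alpha in (0.01, 0.05, 0.10):
    beta = type_2_error(alpha, shift)
    print(f"alpha = {alpha:.2f}  P(Type 2 error) = {beta:.3f}")
```

With everything else held fixed, the printed Type 2 error probability shrinks as $\alpha$ grows, which is the trade-off described above.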
• As far as I understand, rejecting H0 doesn't mean accepting Ha in all cases. Rejecting H0 only implies accepting Ha iff both are complements to each other, i.e. exactly one of them must be true. E.g. if H0 says x = 5, and Ha says x > 5, then maybe both are wrong and the truth is x < 5. This will be so weird though because the truth is expected to be either H0 or Ha, but I think it's theoretically possible to happen.
• You're confounding the truthfulness of H0 with the acceptability of Ha. In your example, not accepting Ha says we will not accept that x > 5, in other words x = 5 or x < 5. Not accepting Ha does not report on the truth that x < 5, it still allows the possibility that x = 5 - that is H0 is not rejected. It's very tempting to say H0 is "rejected" because x = 5 is a false statement. The key is to clarify what is meant by "reject". The statistics notion of reject is not based on whether the hypothesis is a true or false statement but on if it is rejected by the acceptability criteria of Ha.

From that perspective verify these statements (the logic flows from one to the next): If you do not accept Ha, then you do not reject H0. The only way you can reject H0 is by accepting Ha. It doesn't make sense to both reject H0 and not accept Ha.
• In the first problem, is 0.068 the correct p-value? Assuming that the null hypothesis is true and p = 0.25, the sampling distribution of the sample proportion with n = 60 should be approximately normal, with a mean of p = 0.25 and a standard deviation of √((p·(1-p))/n) ≈ 0.056. So a sample with p-hat = 0.3 should only have a z-score ≈ 0.89, and there should be ≈ 0.187 probability of getting a sample with p-hat ≥ 0.3. Or am I missing something?
• You generally had the right idea for calculating the p-value. Note that the p-hat value is not 0.3, but rather 20/60 = 1/3 = 0.3333... (perhaps you did not consider the bar on top of the decimal digit 3). So the z-score is about 1.49 instead of 0.89. The probability of equaling or exceeding a z-score of 1.49 is about 0.068.

Have a blessed, wonderful day!
• The p-values are generally given whenever such problems are asked. I think calculating p-values is a separate topic here, so it is not taught in this lesson.
• what is the equation to calculate the p value
• The equation to calculate the p-value depends on the specific hypothesis test being performed. For example:
In a z-test for a population mean, the p-value can be calculated using the standard normal distribution tables or software functions.
In a t-test for a population mean, the p-value is typically calculated using the t-distribution tables or software functions.
In a chi-square test for independence, the p-value is calculated based on the chi-square distribution.
Each test has its own formula for calculating the p-value based on the observed sample data and the assumptions of the test.
• ur... are we going to be told how to calculate this P-value?

I'm confused on what it actually is...
• Yes, there are lessons on how to calculate the p-value, which is the probability of getting a sample statistic at least as extreme as the one observed, assuming that the null hypothesis is true.
• First problem, question B, remark for answer C
"There wasn't enough evidence to reject H0 at this significance level, but that doesn't mean we should accept H0. This experiment didn't attempt to collect evidence in support of H0."

What would be like an experiment that would collect evidence in support of H0?
• This experiment just assumed H0 was true; if the p-value was below our significance level, then our assumption of H0 could be rejected, since it's unlikely we'd get such a deviant (or more deviant) sample proportion if H0 were true.
If the p-value was above our significance level, it tells us that we can't accept Ha, since it's likely enough to get a sample proportion of 0.3333... or more assuming H0; there is no need for Ha to be true (no need for the population proportion to be higher).

But failing to accept Ha doesn't prove H0 (population proportion = 0.25).

For example, our hypotheses could be
H0: p = 0.245
Ha: p > 0.245

And then with a p-hat of 0.3333..., we would have a p-value of around 0.056, which is still above our significance level, meaning that we again can't accept Ha, p > 0.245.
This would be a contradiction if our first H0 had been proven, but it wasn't, so it's not a contradiction.
Notice how failing to accept p > 0.245 doesn't conflict with failing to accept p > 0.25, since we never said p had to be in between 0.245 and 0.25.

You could maybe use the law of large numbers and coerce millions of people into guessing the water in your cups, and see if that proportion is really close to 0.25 to possibly prove that p = 0.25. There's probably a better experiment, but I'm not too experienced in thinking of them :P
• Thank you for the great questions. They helped me so much to prepare for my test.