### Course: Statistics and probability>Unit 12

Lesson 1: The idea of significance tests

# Estimating a P-value from a simulation

Example of estimating a P-value based on a simulation to approximate a sampling distribution assuming the null hypothesis is true.

## Want to join the conversation?

• But why is the p-value based on 20%? The alternative hypothesis just asks for >6%. In that case it's 15 students out of 40. So the p-hat is supposed to be 15/40, right?
• We are trying to reject the null hypothesis. We got a proportion of 20% from the sample, and we want to see how probable it is to get a value at least this high if the null hypothesis (a proportion of 6%) were true. This probability is called the p-value.

There are 25 students in a sample. 40 is the number of samples (each of size 25) she simulates to estimate the p-value.

Also, the p-value is NOT the probability that the null hypothesis is correct. We start the whole experiment assuming that it is correct, and if we fail to reject it we simply return to where we started.
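A rough sketch of the kind of simulation described in this reply, in Python. The function name `simulate_p_value`, its parameters, and the seed are assumptions made for illustration; the video's own simulation tool is not shown in the thread.

```python
import random

def simulate_p_value(n_samples=40, sample_size=25, null_p=0.06,
                     observed_prop=0.20, seed=0):
    """Estimate P(p-hat >= observed_prop), assuming the null proportion is true."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    # Compare counts rather than proportions to avoid floating-point edge cases.
    threshold = round(observed_prop * sample_size)  # 5 vegetarians out of 25
    hits = 0
    for _ in range(n_samples):
        # Each simulated student is a vegetarian with probability null_p.
        count = sum(rng.random() < null_p for _ in range(sample_size))
        if count >= threshold:
            hits += 1
    return hits / n_samples

# 40 samples, as in the exercise: a coarse estimate that moves in steps of 1/40.
print(simulate_p_value())
# Many more samples give a steadier estimate of the same probability.
print(simulate_p_value(n_samples=100_000))
```

With only 40 samples the estimate can change noticeably from one seed to another, which is worth keeping in mind when reading the 7.5% figure.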
• So here the p-value is 7.5%. This means the null hypothesis is not rejected. Correct?
• I think that would depend on the significance level that is set. Sometimes that could be 10%, other times less than 1%. As the significance level isn't given in this question, we can't say whether the null hypothesis is rejected. (Here we're only estimating the p-value, which would then be compared with the significance level to decide whether to reject or fail to reject.)
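To make the dependence on the significance level concrete, here is a small Python sketch. The helper `decide` and the three alpha values are assumptions chosen for illustration; the exercise itself does not state a significance level.

```python
def decide(p_value, alpha):
    """Compare a p-value with a chosen significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_value = 0.075  # the estimate quoted in the question above

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same p-value leads to a rejection at the 10% level but not at the 5% or 1% level.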
• Why take >= 20% for the p-value and not just 20%?
• I think that's because of the definition of the p-value itself. The p-value is the probability of getting test results "at least as extreme as" the observed result (here 20%). That means we have to take values more extreme than 20% into account.
It might be quite confusing, but what we are trying to do here is see whether we can reject the null hypothesis (because that would mean our suspicion that there are more vegetarians in our school is likely to be true).

If the sample proportion was 20%, then sample proportions greater than 20% would count as even stronger evidence against the null hypothesis, so we include them too.
• Why is the professor running a simulation when you can calculate this from the binomial distribution with p = 0.06? For n = 25, P(p-hat >= 20%) = 1 - 0.98495 = 0.01505. That is very, very different from the biased 7.5% found in the exercise.
• The simulation is meant to estimate the p-value itself, not the significance level (alpha), which is chosen before the test is run. With only 40 simulated samples the estimate is rough, so it can land well away from the exact binomial value.
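• Please show us how to obtain the p-value without simulation.
• It is possible here. If the null hypothesis is true, the number of vegetarians in a sample of 25 follows a binomial distribution with n = 25 and p = 0.06, so the p-value is P(X >= 5) = 1 - P(X <= 4), which is about 0.015. The simulation is approximating that same probability.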
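For reference, the tail probability under the null hypothesis can be computed directly from the binomial distribution, as in the earlier comment that gives 1 - 0.98495. A minimal Python sketch, where the function name `binomial_upper_tail` is an assumption for illustration:

```python
from math import comb

def binomial_upper_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 20% of 25 students is 5, so the p-value is P(X >= 5) when p = 0.06.
p_value = binomial_upper_tail(25, 0.06, 5)
print(p_value)  # roughly 0.015
```

This gives the exact probability that a 40-sample simulation can only approximate.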
• Why do we always take the value of significance as 0.05? Is it a universal value or what?
• It's simply a rule of thumb. In medicine, for instance, you would definitely NOT want a significance level as high as 0.05. Instead, you might want a significance level of, for instance, 0.001.
The lower the significance level, the harder it is to reject H0. The reason you'd want H0 to be hard to reject in the medical field is simple. Imagine testing a medicine where H0 says it has no effect. With a significance level of 0.05, there is a 5% chance of rejecting H0, and concluding that the medicine works, when it actually does nothing. That could be catastrophic.
• I tried working this problem by first calculating the standard deviation of the sample proportion, assuming the null hypothesis is true.

sqrt((0.06*0.94)/25) = 0.0475

I then tried plugging this into the normalcdf function on my calculator with the following inputs.

minimum: 0.2
maximum: 1
mean: 0.06
standard deviation: 0.0475

I got an answer of about 0.16%. This is completely off from the 7.5% that Sal got in the video. Why does the way I tried to solve it not work? Thanks for your help!
• The formula z = (p_sample - p_null)/sd does give about 0.16% as the p-value.

But that calculation relies on the z-table, which assumes the sampling distribution of the sample proportion is approximately normal.

As we can see in the simulation above, it is not normally distributed. The expected number of successes (25 × 0.06 = 1.5) is less than 10, even though the expected number of failures (23.5) is greater than 10, so the normal condition is not met.

In short, when the normal condition isn't met, the z-table p-value can't be trusted and we are better off using a simulation. I believe that is the (implicit) point of this video.
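The normal-approximation calculation and the condition check discussed above can be sketched in Python as follows. The variable names are assumptions for illustration, and the threshold of 10 is the rule of thumb used in this reply.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 25, 0.06, 0.20

# Standard deviation of the sample proportion, assuming the null is true.
sd = sqrt(p0 * (1 - p0) / n)  # about 0.0475

# Upper-tail area of a normal curve centred at p0, as normalcdf would give.
normal_tail = 1 - NormalDist(mu=p0, sigma=sd).cdf(p_hat)

# Large-counts check: both expected counts should be at least 10.
expected_successes = n * p0        # 1.5
expected_failures = n * (1 - p0)   # 23.5
normal_ok = expected_successes >= 10 and expected_failures >= 10

print(f"sd = {sd:.4f}, normal tail = {normal_tail:.4%}, condition met: {normal_ok}")
```

Because the condition fails, the tail area of roughly 0.16% is not a reliable p-value for this sample size.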