
When to use z or t statistics in significance tests


Want to join the conversation?

  • ju lee
    when n (sample size) is greater than or equal to 30, we can use z statistics because the sampling distribution of the sample mean is approximately normal, right? if this is the case, then why does the t table contain rows where the degrees of freedom are 100, 1000, etc. (i.e. degrees of freedom = n - 1)? if n is greater than or equal to 30, we would be using a z table anyway, so are the rows in the t table with degrees of freedom greater than 30 redundant?
    (15 votes)
  • Geoffrey Pinkner
    This video would have been so helpful waaay back when we were first introduced to sampling distributions.
    (13 votes)
  • Kartikey
    But when Sal used the simulation technique for calculating the p-value, the answer was very different.

    In this previous video- *"Estimating a P-value from a simulation"* https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/idea-of-significance-tests/v/estimating-p-value-from-simulation
    I calculated the p-value with the formula Sal just described, but the answer is very, very different.

    Simulation p-value: 7.5%
    Formula p-value: 0.16%

    I have calculated it three times. Can anyone explain this difference in p-values?
    (4 votes)
    • Victor Gutierrez
      Yes, I also get a p-value of 0.16% with the formula.
      I think the problem here is that we do not meet all the conditions for inference. We do not meet the normal condition: n < 30, and n · p is well below 10. We can also see this normality problem in the simulation performed in the problem, where the distribution of dots is strongly right-skewed, far from normal.
      (2 votes)
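To see Victor's point concretely, here is a small stdlib-Python sketch (the numbers are made up for illustration, not taken from the video) comparing a simulated p-value against the normal-approximation formula when n is small:

```python
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical numbers (not the ones from the video):
# H0 claims p0 = 0.5; a sample of n = 10 yields 9 successes.
p0, n, observed = 0.5, 10, 9

# Simulation p-value: how often does a sample drawn under H0
# do at least as well as what we observed?
trials = 100_000
hits = sum(
    sum(random.random() < p0 for _ in range(n)) >= observed
    for _ in range(trials)
)
p_sim = hits / trials

# Normal-approximation p-value: z = (p_hat - p0) / sqrt(p0(1 - p0)/n)
p_hat = observed / n
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
p_norm = 1 - NormalDist().cdf(z)

print(f"simulation: {p_sim:.4f}   normal approx: {p_norm:.4f}")
```

With n this small (n < 30, and n · p0 = 5, below 10), the two answers disagree noticeably, which is the same effect Victor describes: the normal approximation behind the z formula is not reliable here.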
  • Ranjit
    Sal seems to suggest that the z method should be used for sample proportions and the t method for the sample mean. Can someone confirm? I thought I saw an earlier video saying that if the sample size is greater than 30 the z score gives more accurate results, while the t score is advised if the sample size is less than 30.
    (4 votes)
  • martynas.venckus97
    Why does Sal introduce another variable, p_0, when he could just use p_1 in calculating the z statistic?
    (3 votes)
    • Jerry Nilsson
      Let's say 𝐻₀: 𝑝 < 𝑝₁

      Then there is no assumed population proportion, we just assume that the true population proportion is less than whatever value 𝑝₁ is, and 𝑝₀ is the true population proportion given that 𝐻₀ is true.

      By convention we always treat 𝑝₀ and 𝑝₁ as separate quantities regardless of what 𝐻₀ says.
      (1 vote)
  • Will Leach
    Does anyone know what the standard deviation of the sampling distribution of the sample mean is? If so, can you explain it?
    (2 votes)
    • el_dAna
      Say we have a population P.
      We are interested in the population mean, but we can't access the entire population.

      We take a sample, call it p1, and find its sample mean.
      We take another sample, say p2, and find its sample mean.
      We do this repeatedly for n samples, so we have n means.
      We then plot the means, and we have a sampling distribution of means.

      By the central limit theorem, the means will be approximately normally distributed... just an aside.

      Now, finding the standard deviation of that sampling distribution we just plotted is what Sal referred to as the standard deviation of the sampling distribution of the sample mean. It's quite a mouthful. Hopefully you can wrap your head around it now.

      H_L.
      (1 vote)
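The procedure described above can be sketched in stdlib Python (the uniform population and the sample sizes here are made up for illustration):

```python
import random
from statistics import mean, stdev

random.seed(1)

# Hypothetical population: uniform on [0, 100],
# so sigma = sqrt(100**2 / 12), about 28.87.
population_sigma = (100 ** 2 / 12) ** 0.5

n = 25              # size of each sample
num_samples = 2000  # how many samples we draw

# Take a sample, record its mean; repeat many times.
sample_means = [
    mean(random.uniform(0, 100) for _ in range(n))
    for _ in range(num_samples)
]

# The standard deviation of this collection of means is the quantity
# Sal calls "the standard deviation of the sampling distribution of
# the sample mean" -- and it comes out close to sigma / sqrt(n).
sd_of_means = stdev(sample_means)
print(sd_of_means, population_sigma / n ** 0.5)
```

The two printed values land close together, which is the σ/√n relationship the video relies on.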
  • EngrMama
    This whole video is great for the difference between when we use z-statistics and when we use t-statistics. How do we know when to use Chi-square-statistics versus f-statistics versus ANOVA versus Linear Regression versus Multiple Regression?
    I guess I need a chart that compares all the requirements for each type of test, similar to the two columns we end up with in this video!
    (2 votes)
  • Carly Smallwood
    I still don't understand why the standard error for the mean is considered an estimate (since it relies on the sample standard deviation sₓ), but σₚ̂ is not considered an estimate (even though it relies on the presumed p₀) in the case of proportions.
    (2 votes)
    • unalivepool
      Finally, someone had the same question!!

      Here's my understanding

      The standard error of the mean (SEₓ̄) is calculated using the sample standard deviation (sₓ) and the square root of the sample size (n): SEₓ̄ = sₓ / √n. It represents the variability or standard deviation of the sample means distribution, not the original data's distribution.

      The standard error of the proportion (SEₚ̂) is calculated using the presumed population proportion (p₀) under the null hypothesis and the sample size (n): SEₚ̂ = sqrt(p₀(1-p₀) / n). It represents the expected variability or standard deviation of the sampling distribution of the proportion.

      SEₓ̄ is considered an estimate because sₓ, the sample standard deviation, is itself an estimate of the unknown population standard deviation σ. Since σ is typically unknown, sₓ is used as the best available estimate, making SEₓ̄ an estimate as well. This introduces uncertainty, because sₓ varies from sample to sample.

      SEₚ̂, on the other hand, is based on p₀, the hypothesized population proportion, not on an estimate computed from the sample.

      This is why SEₚ̂ isn't labeled an "estimate" in the same way as SEₓ̄: it's built on a specific hypothesis or claim (p₀) rather than derived from the sample data.

      In summary, while both standard errors are used in inferential statistics, the reason we consider SEₓ̄ an estimate while not labeling SEₚ̂ as such lies in their sources: one stems directly from sample data, while the other begins with a presumed parameter.
      (1 vote)
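A quick stdlib-Python illustration of the two formulas above (the data, p₀, and n are made up):

```python
from math import sqrt
from statistics import stdev

# Means case: SE_xbar = s_x / sqrt(n) uses the *sample* standard
# deviation, so it is itself an estimate. (Data are hypothetical.)
data = [12.1, 9.8, 11.4, 10.6, 12.9, 10.2, 11.7, 9.5]
se_mean = stdev(data) / sqrt(len(data))

# Proportions case: SE_phat = sqrt(p0(1 - p0)/n) is built from the
# hypothesized p0, not from the sample. (p0 and n are hypothetical.)
p0, n = 0.4, 120
se_prop = sqrt(p0 * (1 - p0) / n)

print(se_mean, se_prop)
```

Note that `se_prop` needs no sample data at all beyond the sample size, which is exactly why it isn't called an estimate.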
  • Yadi Lang
    When using sample standard deviations, why is the biased divisor n used instead of the unbiased estimator's divisor (n − 1)? Thanks!
    (1 vote)
  • PTNLemay

    I've seen this description before and it confuses me greatly. Dividing by the square root of the sample size is going to change the standard deviation tremendously. Shouldn't the standard deviation of a sample be similar to the standard deviation of the overall population?
    (1 vote)
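A stdlib-Python sketch of the distinction behind this question (hypothetical normal population with σ = 10): the standard deviation within one sample stays near σ, as intuition suggests; it is the standard deviation of the sample *means* that shrinks by √n:

```python
import random
from statistics import mean, stdev

random.seed(2)

# Hypothetical population: normal with sigma = 10.
sigma, n = 10.0, 50

# The sd *within* one sample is close to the population sd...
one_sample = [random.gauss(0, sigma) for _ in range(n)]
s_x = stdev(one_sample)

# ...but the sd of *sample means* across many samples is much smaller:
# about sigma / sqrt(n). Dividing by sqrt(n) describes the spread of
# the means, not the spread of the raw data.
means = [mean(random.gauss(0, sigma) for _ in range(n)) for _ in range(2000)]
sd_of_means = stdev(means)

print(s_x, sd_of_means, sigma / n ** 0.5)
```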

Video transcript

- [Tutor] What I wanna do in this video is give a primer on thinking about when to use a z statistic versus a t statistic when we are doing significance tests. There are two major scenarios that we will see in an introductory statistics class: one is when we are dealing with proportions, so I'll write that on the left side right over here, and the other is when we are dealing with means.

In the proportion case, when we are doing our significance test, we will set up some null hypothesis that usually deals with the population proportion. We might say it is equal to some value; let's just call that p sub one. Then maybe you have an alternative hypothesis: that, well, no, the population proportion is greater than that, or less than that, or it's just not equal to that. Let me just go with that one: it's not equal to p sub one. Then what we do to actually do the significance test is take a sample from the population; it's going to have a sample size of n. We need to make sure that we feel good about making the inference; we've talked about the conditions for inference in previous videos multiple times. From this we calculate the sample proportion, and then from that we calculate the p-value. Remember, the p-value is the probability of getting a sample proportion at least this extreme, and if it's below some threshold, we reject the null hypothesis and suggest the alternative.

The way we do that is we find an associated z value for that sample proportion. The way that we calculate it, we say, okay, look, our z is going to be how many of the sampling distribution's standard deviations we are away from the mean, and remember, the mean of the sampling distribution is going to be the population proportion. So here we've got this sample statistic, the sample proportion, and the difference between that and the assumed proportion. Remember, when we do these significance tests, we try to figure out the probability assuming the null hypothesis is true, so when we see this p sub zero, this is the assumed proportion from the null hypothesis. So that's the difference between these two, the sample proportion and the assumed proportion, and then you'd want to divide it by what's often known as the standard error of the statistic, which is just the standard deviation of the sampling distribution of the sample proportion. This works out well for proportions because, for proportions, I can figure out what this is: it is going to be equal to the square root of the assumed population proportion times one minus the assumed population proportion, all of that over n. Then I would use this z statistic to figure out the p-value, and in this case I would look at both tails of the distribution, because I care about how far I am either above or below the assumed population proportion.

Now with means, there are definitely some similarities here. You will make a null hypothesis, maybe you assume the population mean is equal to mu one, and then there's going to be an alternative hypothesis, that maybe your population mean is not equal to mu one. You're going to do something very similar: you take your population, you take a sample of size n, and instead of calculating a sample proportion, you calculate a sample mean, and actually you can calculate other things, like a sample standard deviation. But now you have an issue. You say, well, ideally I would use a z statistic, and you could, if you were able to take the difference between my sample mean and the assumed mean from the null hypothesis, so that would be this right over here, that's what that zero means, the assumed mean from the null hypothesis, and then divide by the standard error of the mean, which is another way of saying the standard deviation of the sampling distribution of the sample mean. But this is not so easy to figure out. In order to figure it out, this is going to be the standard deviation of the underlying population divided by the square root of n. We know what n is going to be, if we conducted a sample, but we don't know the standard deviation of the population. So instead, what we do is estimate it: we take the sample mean, we subtract from that the assumed population mean from the null hypothesis, and we divide by an estimate of this standard error, which is going to be our sample standard deviation divided by the square root of n. But because this is an estimate, instead of saying, hey, this is an estimate of our z statistic, we will call this our t statistic, and as we will see, we will then look this up in a t table. This will give us a better sense of the probability.
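The two statistics from the transcript can be computed directly. Here is a stdlib-Python sketch with made-up numbers (p₀, n, p̂, the sample, and μ₀ are all hypothetical):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Proportions: z = (p_hat - p0) / sqrt(p0(1 - p0)/n)
# Hypothetical numbers: H0 claims p = 0.5; a sample of n = 100 gives p_hat = 0.58.
p0, n, p_hat = 0.5, 100, 0.58
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# Two-sided p-value from the standard normal (both tails, as in the video)
p_value_z = 2 * (1 - NormalDist().cdf(abs(z)))

# Means: t = (x_bar - mu0) / (s_x / sqrt(n))
# Hypothetical sample; mu0 is the mean claimed by H0.
sample = [2.9, 3.4, 3.1, 2.7, 3.6, 3.0, 3.3, 2.8]
mu0 = 3.0
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))
# The p-value for t would come from a t distribution (or t table)
# with len(sample) - 1 = 7 degrees of freedom, not the normal table.

print(z, p_value_z, t)
```

Note the asymmetry the video emphasizes: the z denominator is fully determined by the hypothesized p₀, while the t denominator must be estimated from the sample itself.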