Contingency table chi-square test

Sal uses the contingency table chi-square test to see if a couple of different herbs prevent people from getting sick. Created by Sal Khan.

Want to join the conversation?

  • Heather
    Why do we use the data for both 'sick' and 'not sick' in computing our chi squared statistic? It seems like it will make our result seem more deviant than it really is since in each group the number of 'not sick' people is directly related to the number of 'sick' people.
    (31 votes)
  • Jonathan.Kayes
    Isn't there the potential for one herb to be really effective and the other to be ineffective? I feel like those scores could cancel each other out, leading to a Type II error (failing to reject the null when it is false). I don't understand why, if you're interested in testing several conditions, it makes sense to mix all the data up.
    (8 votes)
  • BHY1976
    At he says that 21% did not get sick and then writes 21% in the row labeled "sick". Did he make a mistake, or am I missing something here?
    (12 votes)
  • Itzik katz
    Why use the Chi square statistics to address this problem instead of the Bernoulli one and continue inferring the data as before? Basically, how does one decide which approach to apply?
    (8 votes)
    • Andrei-Lucian Șerb
      Well, a Bernoulli hypothesis test with two samples would work... if we had two samples :) But in this case we have 3 samples (herb 1, herb 2, and placebo). You can't compare 3 things that way to see if they are the same. If you say x1-x2 = 0, it means that x1 and x2 are the same. But if you say x1-x2-x3 = 0, you can't really say anything about them; they could be any numbers that combine to zero. So the best way to do it is to use a contingency table with a chi-square test.
      (6 votes)
  • Daniel
    Throughout this hypothesis test the actual values were used, resulting in a chi-square value of 2.53, but I decided to try the whole calculation from scratch using the percentages of each subgroup instead. The result was a chi-square value of 2.08. So this makes me wonder: is it possible to manipulate the parameters of the study (e.g. obtain a larger sample of the population) so that it would result in a chi-square value greater than our critical chi-square value?
    (3 votes)
    • Anna Mueller
      Actually, chi-square test statistics are extremely sensitive to sample size, and it is not because larger samples are inherently better. For the same proportions, the chi-square always gets larger with a larger sample size and smaller with a smaller sample size (Daniel's math is correct). Thus, you can have a strong statistical association but fail to find it significant with a chi-square test if you have a small sample size (and the reverse). The chi-square test has many limitations; still, it is one of the most useful tests in social statistics. The key is to learn to use it appropriately and to interpret your findings in light of its limitations. For more information, see your friendly stats textbook. I suggest (because I use this book in my own stats classes and have it handy) The Essentials of Social Statistics for a Diverse Society, page 210, for a discussion of this specific issue (sample size and chi-square test statistics).
      (9 votes)
  • Katharine Young
    Why would you include the number of people who got sick and also took an herb in the expected percentage of who would get sick with no interference? I would think the whole point of having a control group would be to get the actual percentage of people who would get sick with no interference and then test the observed for the two herb groups. Based on this test you could see if there was a difference between observed for the herb and expected for no interference. Then you could answer if the herb made a difference or not. What Sal seems to have done is say that, if we are assuming there is no difference, we can just include the sick people from the herb categories in the percentage that get sick with no interference.
    (6 votes)
  • vixignus
    Shouldn't we be performing a two-tailed test here? The null hypothesis says that the effect of the herbs is nothing and the alternate hypothesis says that the effect is not nothing.
    (4 votes)
    • SwizzleStick
      By its nature, this chi-square test is one-tailed. This is because you square the observed frequency minus the expected frequency, so you will never get a negative number, so you can never have a negative chi-squared as the result. Consequently, the test only needs to have one tail.
      (2 votes)
  • Varvara Stesina
    But aren't we overcounting here, counting one and the same error twice (because second row = 100% - first row)?
    (3 votes)
  • jpostley73
    If Anna Mueller is correct and this is not the proper statistical method for assessing this question, what is the correct method?
    (1 vote)
    • Anna Mueller
      There are many ways you could correctly answer the question of whether the herbs do something. One simple way would be to run two separate chi-squares, one that tests herb1 versus placebo and one that tests herb2 versus placebo. The test you would use depends in part on exactly what question you want answered.
      (5 votes)
  • Alek
    In this video you said that Ho: Herbs are useless. I would have said that Ho: Herbs had an effect. How would you know which one to use for Ho in this one? I thought that Ho was something you're trying to prove. Aren't we trying to prove that the herbs have an effect?
    (1 vote)

Video transcript

Let's say there are a couple of herbs that people believe help prevent the flu. So, to test this, what we do is we wait for flu season and we randomly assign people to three different groups. And over the course of flu season, we have them either in one group taking herb one, in the second group taking herb two, and in the third group they take a placebo. And if you don't know what a placebo is it's something that, to the patient or to the person participating, it feels like they're taking something that you've told them might help them, but it does nothing. It could be just a sugar pill, just so it feels like medicine. The reason why you would even go through the effort of giving them something is because oftentimes there's something called a placebo effect, where people get better just because they're being told that they're being given something that will make them better. So this could, right here, just be a sugar pill, and a very small amount of sugar so it really can't affect the actual likelihood of getting the flu. So over here we have a table, and this is actually called a contingency table. And it has on it in each group the number that got sick, the number that didn't get sick. And so we also can from this calculate the total number. So in group one, we had a total of 120 people. In group two, we had a total of 30 plus 110 is 140 people. And in the placebo group, the group that just got the sugar pill, we had a total of 120 people. And then we could also tabulate the number of people, the total number of people, that got sick. So that's 20 plus 30 is 50 plus 30 is 80. This is the total column right over here. And then the total people that didn't get sick over here is 100 plus 110 is 210 plus 90 is 300, and then the total people here are 380. Both this column and this row should add up to 380. 
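As a rough sketch (not something shown in the video), the contingency table and its totals could be written out in Python like this. The counts come from the transcript; the variable names such as `observed` and `group_totals` are only illustrative.

```python
# Observed counts from the contingency table described in the transcript.
# All names here are illustrative assumptions, not taken from the video.
groups = ["herb 1", "herb 2", "placebo"]
observed = {
    "sick": [20, 30, 30],
    "not sick": [100, 110, 90],
}

# Total per group: sick + not sick in each column.
group_totals = [s + n for s, n in zip(observed["sick"], observed["not sick"])]

# Total per outcome: sum across each row.
outcome_totals = {outcome: sum(counts) for outcome, counts in observed.items()}

# The grand total can be reached from either direction.
grand_total = sum(group_totals)

print(group_totals)    # [120, 140, 120]
print(outcome_totals)  # {'sick': 80, 'not sick': 300}
print(grand_total)     # 380
```

Summing the group totals and summing the outcome totals both give 380, which is the consistency check mentioned above.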
So with that out of the way, let's think about how we can use this information in the contingency table and our knowledge of the chi-square distribution to come up with some conclusion. So let's just make a null hypothesis. Our null hypothesis is that the herbs do nothing. Let's just assume-- let me get some space here-- so let's assume the null hypothesis that the herbs do nothing. And then we have our alternative hypothesis, or alternate hypothesis, that the herbs do something. Notice I don't even care whether they actually improve. I'm just saying they do something. They might even increase your likelihood of getting the flu. We're not testing whether they're actually good. We're just saying, are they different than just doing nothing. So like we do with all of our hypothesis tests, let's just assume the null. We're going to assume the null and, given that assumption, figure out if the likelihood of getting data like this or more extreme is really low. And if it is really low, then we will reject the null hypothesis. And in this test, like every hypothesis test, we need a significance level. And let's say our significance level we care about for whatever reason is 10% or 0.10. That's the significance level that we care about. Now to do this, we have to calculate a chi-square statistic for this contingency table. And to do that, we do it very similar to what we did with the restaurant situation. We figure out, assuming the null hypothesis, the expected results you would've gotten in each of these cells. You could view each of these entries as a cell. You know that's what we do with it. You call each of those entries in Excel also a cell, each of the entries in a table. What we do is we figure out what the expected value would have been if you do assume the null hypothesis. Then we find the squared distance from that expected value, and we, I guess you could call it, normalize it by the expected value. 
Take the sum of all of those differences, and if those squared differences are really big, the probability of getting a result like that would be really small, and maybe we'll reject the null hypothesis. So let's just figure out how we can get the expected number. So we're assuming the herbs do nothing. So if the herbs do nothing, then we can just figure out that this whole population just had nothing happen to them. These herbs were useless. And so we can use this population sample-- or I shouldn't call it the population-- we should use this sample right here to figure out the expected number of people who would get sick or not sick. And so over here, we have 80 out of 380 who got sick. And I want to be careful, I just said the word population, but we haven't sampled the whole universe of all people taking this herb. This is a sample. So I don't want to confuse you. I was using population in more of the conversational sense than the statistical sense. But anyway, of our sample-- and we're using all of the data because we're assuming there's no difference. We might as well just use the total data to figure out the expected frequency of getting sick and not getting sick. So 80 divided by 380 got sick. And that's 21%. 21% got sick. So let me write that over here. So 21, and that's 21% of the total, and then this would be 79% if we just subtract 100% minus 21%. We could divide 300 by 380, and we should get 79% as well. So you would expect-- one would expect-- that 21% of your total, based on the total sample right over here, that our best guess is that 21% should be getting sick and 79% should not be getting sick. So let's look at it for each of these groups. If we assume that 21% of these 120 people should have gotten sick, what would have been the expected value right over here? So let's just multiply 21% times 120. That gets us to 25 point-- I'll just round it-- 25.3 people should have gotten sick. 
So the expected-- so let me write it over here, I'll do expected in yellow-- so the expected right over here. If you assume that 21% of each group should have gotten sick, you would have expected 25.3 people to get sick in group one, in the herb one group. And then the remainder will not get sick. So let's just subtract, or I could actually multiply 79% times 120, either one of those would be good. But let me just take 120 minus 25.3, and then I get 94.7. So you would have expected 94.7 to not get sick. So this is expected again. 94.7 to not get sick. And now let's do that for each of these groups. So once again, group two, you would've expected 21% to get sick. 21% of the total people in that group, so that's 140, so that's 29.4. And then the remainder-- let's see, 140 minus 29.4-- should not have gotten sick. So that gets us this right here. We have 29.4 should have gotten sick if the herbs did nothing. And then, over here, we would have 110.6 should not have gotten sick. And these are pretty close. So, just looking at the numbers, it looks like this herb doesn't do too much relative to the total, all of the groups combined. And then in the placebo group, let's see what happens. We expect 21% to get sick, 21% of our group of 120. So it's 25.2. So this right over here. And actually, this should be 25 point-- since we're rounding, actually, these will be the same number over here-- so I said 21%, but it's 21 point something something something. The group sizes are the same, so we should expect the same number to get sick. So I'll say 25.3 just to make it consistent. The reason why I got 25.2 just now is because I lost some of the trailing decimals over here. But since I had them over here, I'm going to use them over here as well. And then over here in this group, you would expect 94.7 to not get sick. So if you just actually relied on this data, it looks like herb two is actually, to some degree, even worse than the-- oh. 
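The expected counts can be sketched in a few lines of Python, assuming the same observed table as in the transcript. This version keeps the unrounded proportion 80/380 rather than the rounded 21%, so the herb two cells come out as 29.5 and 110.5 instead of the 29.4 and 110.6 obtained with 21%. The variable names are illustrative.

```python
# Observed counts from the transcript: herb 1, herb 2, placebo.
observed_sick = [20, 30, 30]
observed_not_sick = [100, 110, 90]

group_totals = [s + n for s, n in zip(observed_sick, observed_not_sick)]
grand_total = sum(group_totals)

# Under the null hypothesis every group shares the same sick rate,
# estimated from the pooled sample: 80 / 380, roughly 21%.
p_sick = sum(observed_sick) / grand_total

# Expected count in each cell = pooled rate * group size.
expected_sick = [p_sick * n for n in group_totals]
expected_not_sick = [n - e for n, e in zip(group_totals, expected_sick)]

print(round(p_sick, 2))                          # 0.21
print([round(e, 1) for e in expected_sick])      # [25.3, 29.5, 25.3]
print([round(e, 1) for e in expected_not_sick])  # [94.7, 110.5, 94.7]
```

The two groups of 120 get identical expected counts, which is why 25.3 is used for both herb one and the placebo.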
No, no, I take that back. It's not worse because you would have expected a small number, and a lot of people got sick here. So this is the placebo-- Well anyway, we don't want to make judgments just staring at the numbers. Let's figure out our chi-square statistic. And to do that, let's get our statistic, our chi-square statistic. I'll write it like this, maybe, for fun. Or maybe I'll write it as a big X because, really, this random variable's distribution is approximately a chi-square distribution. So I'll write it like that. And, well, we'll talk about the degrees of freedom in a second. Actually, let me write it with the curly X, just so you see that some people write it with the chi instead of the X. So our chi-square statistic over here. We're literally just going to find the squared distance between the observed and expected. And then divide it by the expected. So it's going to be 20 minus 25.3 squared over 25.3 plus 30 minus 29.4 squared over 29.4-- I'm going to run out of space-- plus 30 minus 25.3 squared over 25.3. And then I'm going to have to do these over here, so let me just continue it. You could ignore this H1 over here. So plus 100 minus 94.7 squared over 94.7 plus-- I think you see where this is going-- 110 minus 110.6 squared over 110.6. And then, finally, plus 90 minus 94.7-- let me scroll to the right a little bit-- squared, all of that over 94.7. So let me just get the calculator out to calculate this. Take a little bit of time. 
So we have-- I have to type on the calculator for these parentheses-- so we have 20 minus 25.3 squared divided by 25.3 plus, open parentheses, 30 minus 29.4 squared divided by 29.4 plus, open parentheses, 30 minus 25.3 squared divided by 25.3-- halfway there-- plus 100, open parentheses, this is the tedious part, 100 minus 94.7 squared divided by 94.7 plus 110 minus-- I'll let you type it out, we can do a lot of these in our head, but let me just do it-- 110 minus 110.6 squared divided by 110.6-- and then last one, homestretch, assuming we haven't made any mistakes-- we have 90 minus 94.7 squared divided by 94.7. And let's see what we get. We get 2.528, so let's just say it's 2.53. So our chi-square statistic-- always have trouble saying that-- our chi-square statistic, assuming the null hypothesis is correct, is equal to 2.53. Now, the next thing we have to do is figure out the degrees of freedom that we had in calculating the chi-square statistic. And I'll give you the rule of thumb, and I'll give you a little bit of a sense of why this is the rule of thumb for a contingency table like this. And in the future, we'll talk a little bit more deeply about degrees of freedom. So the rule of thumb for a contingency table is you have the number of rows, so you have rows, and then you have your number of columns. So here we have two rows, and we have three columns. You don't count the totals. So you have three columns over here. And the degrees of freedom, and this is the rule of thumb, the degrees of freedom for your contingency table is going to be the number of rows minus 1 times the number of columns minus 1. In our situation, we have 2 rows and 3 columns. So it's going to be 2 minus 1 times 3 minus 1. So it's going to be 2 minus 1 times 3 minus 1, which is just 1 times 2, which is 2. We have 2 degrees of freedom. 
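Here is a minimal sketch of the same calculator work in Python, using the rounded expected values from the transcript. The list layout and names are assumptions for illustration.

```python
# Rows are "sick" and "not sick"; columns are herb 1, herb 2, placebo.
observed = [
    [20, 30, 30],
    [100, 110, 90],
]
# Rounded expected values as used in the transcript.
expected = [
    [25.3, 29.4, 25.3],
    [94.7, 110.6, 94.7],
]

# Sum of (observed - expected)^2 / expected over all six cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Rule of thumb: (rows - 1) * (columns - 1), not counting the totals.
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2))  # 2.53
print(degrees_of_freedom)    # 2
```

With unrounded expected values the statistic still rounds to 2.53, so the rounding does not change the conclusion here.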
Now, the reason that that should make a little bit of intuitive sense, and we'll talk about this in more depth in the future, is that you assume that you know the totals. So let's just assume that you know the totals. So if you know all of this information over here, if you know the total information-- or actually, if you knew the parameters of the population as well-- but if you know the total information, and if you know this information, or if you know r minus 1 of the information in the rows, the last one can be figured out just by subtracting from the total. So for example, in this situation, if you know this, you can easily figure out this. This is not new information, it's just the total minus 20. Same thing, if you know this one right over here, this one over here is not new information. And similarly, if you know these two, this guy over here isn't new information. You could always just calculate him based on the total and everything else. So that's the sense of why our degrees of freedom are the columns minus 1 times the rows minus 1. But anyway, so our chi-square statistic has 2 degrees of freedom. So what we have to do is remember our alpha value-- let me get it up here, we had it right over here-- our significance level that we care about, our alpha value is 10%. Let me rewrite it over here. So our alpha is 10%. So what we're going to do is figure out what is our critical chi-square statistic that gives us an alpha of 10%. If our statistic is more extreme than that critical value, the probability of getting a result at least that extreme is less than 10%, and we'll reject the null hypothesis. If it's not more extreme, then we won't reject the null hypothesis. So what we need to do is to figure out with the chi-square distribution and 2 degrees of freedom, what is our critical chi-square statistic. So let's just go back. So we have 2 degrees of freedom. We care about a significance level of 10%. So our critical chi-square value is 4.60. 
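The critical value is read from a table in the video, but for the special case of 2 degrees of freedom it can also be checked by hand, because the upper-tail probability of that distribution is exp(-x / 2). The sketch below relies on that shortcut, which does not carry over to other degrees of freedom; for those, a table or a statistics library would be needed.

```python
import math

alpha = 0.10  # significance level from the transcript

# For a chi-square distribution with exactly 2 degrees of freedom,
# P(X > x) = exp(-x / 2). Setting that equal to alpha and solving
# for x gives the critical value. This shortcut is specific to df = 2.
critical_value = -2 * math.log(alpha)

print(round(critical_value, 3))  # 4.605
```

This agrees with the table value, which appears as 4.60 in the video.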
So another way to visualize this. If we look at the chi-square distribution with 2 degrees of freedom, that's this blue one over here, at a value of-- I'm trying to pick a nice blue to use-- at a critical value of 4.60. So 4.60-- this is 5-- so 4.60 will be right around here. At a critical value of 4.60, so this is 4.60, the probability of getting something at least that extreme, so that extreme or more extreme, is 10%. This is what we care about. Now, if the chi-square statistic that we calculated falls into this rejection region, then we're going to reject the null hypothesis. But our chi-square statistic is only 2.53. It is only 2.53. So ours is actually sitting someplace right over here. So it's actually not that crazy to get it if you assume the null hypothesis. So based on our data that we have right now, we cannot reject the null hypothesis. So we don't know for a fact that the herbs do nothing, but we can't say that they do something based on this. So we're not going to reject it. We won't say 100% that it's true, but we can't say that we're rejecting it. So at least from this point of view, it doesn't look like the herbs did anything that would make us believe that they're any different than each other. And one of the groups is obviously a placebo. So any different than a placebo or each other.
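The same decision can be expressed as a p-value, as a hedged sketch. It again uses the fact that a chi-square distribution with 2 degrees of freedom has upper-tail probability exp(-x / 2), so it only applies to this 2-by-3 table. The names `p_value` and `reject_null` are illustrative.

```python
import math

alpha = 0.10       # significance level from the transcript
chi_square = 2.53  # statistic computed in the transcript

# Upper-tail probability for 2 degrees of freedom only:
# the chance of a statistic at least this large if the null is true.
p_value = math.exp(-chi_square / 2)

# Reject the null only when that chance is below the significance level.
reject_null = p_value < alpha

print(round(p_value, 2))  # 0.28
print(reject_null)        # False
```

A probability of roughly 28% is well above 10%, which matches the conclusion that the null hypothesis cannot be rejected.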