Course: Statistics and probability > Unit 14
Lesson 2: Chi-square tests for relationships
© 2023 Khan Academy
Chi-square test for association (independence)
Chi-square test for association/independence.
- Just to make sure, the chi-square calculations for association and for homogeneity are the same, but the interpretation is different. Am I getting this correctly?(17 votes)
- Yes, they are calculated the same way; however, the interpretation is different. A test for association (independence) is done on one sample and asks whether two variables are independent, that is, have no association with one another. A test for homogeneity is done on multiple samples and asks whether the distribution of one variable differs between those samples.(24 votes)
- Maybe I missed it, but when you go back to check the expected values, why did they have to be at least greater than or equal to 5?(7 votes)
- To meet the condition of Large counts for any X^2 Statistic.(4 votes)
- When specifically does one use a t-test, and when a chi-square test?(5 votes)
- A t-test is used to compare means of quantitative data, such as the difference between two sets of measurements. A chi-square test works with counts of categorical data (homogeneity, independence, or goodness-of-fit).(8 votes)
- Where did you get the p value from in the last section of the video?(4 votes)
- You can use the calculator's chi-squared cdf function to find the probability of getting a chi-squared value at least as large as yours. Set the min to the chi-squared statistic you just found, the max to 9E99, and the degrees of freedom to (rows - 1)(columns - 1), as he shows.(3 votes)
- Why isn't it Row total x Column / Table total? also when I tried to use the formula I mentioned it doesn't add up to the total... I don't really understand what is going on(3 votes)
- At 9:21, why is the P value 0.018? According to the Chi squared table, the p value, corresponding to the Alpha level of 0.05 and the Degree of Freedom of 4 should be 9.49.(3 votes)
- That's not how you use the table. The 9.49 is the critical chi-squared value for an alpha of 0.05 and 4 degrees of freedom, not a p-value. The chi-squared statistic is the input, and the tail probability beyond it is the p-value. The alpha level is just a preset threshold that the p-value has to fall below in order to reject H0. Also, a probability can't be higher than 100%.(1 vote)
- I understand all the calculations and hypothesis testing, but I don't understand what "association" means here. What does it mean that there's an association between hand length and foot length or that they're not independent? Does that mean that if you get another sample, you'll get roughly the same distribution/percentages?(2 votes)
- What are the uncertainties during a chi-square association test done using quadrat sampling?(2 votes)
- Say I have data on an entire population that I put into a two-way table based on several categorical variables. If I wanted to measure how far the population data is from what would be expected if there was no association between the categorical variables, would I still use the chi-squared statistic, or would a different number be used because we have population data rather than just sample data?(2 votes)
- Why does df = 4? I had the belief that df should equal the number of categories - 1, so given it's 3x3 possible outcomes, it should be 9 - 1. I could be wrong, but I'm just checking.(2 votes)
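Several of the questions above ask where the P-value of 0.018 comes from, how it relates to the table value of 9.49, and why the degrees of freedom are 4 rather than 8. The following is a minimal Python sketch of that calculation, not the calculator procedure used in the video. The function name `chi2_tail_even_df` is made up for this illustration, and it relies on a closed form of the chi-squared tail probability that holds only when the degrees of freedom are even, which happens to be the case here.

```python
import math

def chi2_tail_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Assumption for this sketch: df is a positive even integer, so the tail
    has the closed form exp(-x/2) * sum over i < df/2 of (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    terms = sum(half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

rows, cols = 3, 3
# The row and column totals are fixed, so only (rows - 1)(cols - 1) cells
# are free to vary: 4, not 9 - 1.
df = (rows - 1) * (cols - 1)

p_value = chi2_tail_even_df(11.942, df)         # statistic from the video
tail_at_critical = chi2_tail_even_df(9.49, df)  # table value for alpha = 0.05

print(df, round(p_value, 3), round(tail_at_critical, 3))
```

The statistic 11.942 lies beyond the table's cutoff of 9.49, which is another way of saying that its tail probability (about 0.018) is below 0.05.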
Video transcript
- [Instructor] We're already familiar with the chi-squared statistic. If you're not, I encourage you to review the videos on that. And we've already done some hypothesis testing with the chi-squared statistic, and we've even done some hypothesis testing based on two-way tables. Now we're going to extend that by thinking about a chi-squared test for association between two variables.

So let's say that we suspect that someone's foot length is related to their hand length, that these things are not independent. Well, what we can do is set up a hypothesis test. And remember, the null hypothesis in a hypothesis test always assumes no news. So what we could say here is that there is no association between foot and hand length. Another way to think about it is that they are independent, which is why this is often called a chi-squared test for independence. And then our alternative hypothesis would be our suspicion: there is an association, so foot and hand length are not independent.

So what we can then do is go to a population and randomly sample it. Let's say we randomly sample 100 folks. For each of those 100 folks, we figure out whether their right hand is longer, their left hand is longer, or both hands are the same. We also do that for the feet, and we tabulate all of the data. And this is the data that we actually get.

Now it's worth thinking for a second about how what we just did is different from a chi-squared test for homogeneity. In a chi-squared test for homogeneity, we sample from two different populations, or look at two different groups, and we see whether the distribution of a certain variable among those groups is the same. Here we are just sampling from one group, but we're thinking about two different variables for that one group. We're thinking about foot length, and we're thinking about hand length.

And so you can see here,
that 11 folks had both their right hand longer and their right foot longer. Three folks had their right hand longer, but their left foot was longer. And then eight folks had their right hand longer, but both feet were the same. Likewise, we had nine people whose left hand and left foot were longer, but two people whose left hand was longer and whose right foot was longer. And we can go through all of these.

But to do our chi-squared test, we need to ask: what would be the expected value of each of these data points if we assumed that the null hypothesis was true, that there was no association between foot and hand length? To help us do that, I'm going to total our columns here, and also total our rows here. Let me draw a line here, so we know what's going on.

So, what is the total number of people who had a longer right hand? Well, it's going to be 11 plus three plus eight, which is 22. The total number of people who had a longer left hand is two plus nine plus 14, which is 25. And then the total number of people whose hands had the same length is 12 plus 13 plus 28; 25 plus 28, that is 53. And then if I were to total this column, 22 plus 25 is 47, plus 53, we get 100, right over here.

And then if we total the number of people who had a longer right foot, 11 plus two plus 12 is 13 plus 12, that is 25. Longer left foot: three plus nine plus 13, that's also 25. And then for both feet the same, we can either add these up, eight plus 14 plus 28, and we would get 50, or we could say, hey, 25 plus 25 plus what is 100? Well, that is going to be equal to 50.

Now to figure out these expected values, remember, we're going to
figure out the expected values assuming that the null hypothesis is true, assuming that foot length and hand length are independent variables. Well, if they are independent, which we are assuming, then our best estimate is that 22% have a longer right hand, and our best estimate is that 25% have a longer right foot. And so out of 100, you would expect 0.22 times 0.25 times 100 to have a longer right hand and a longer right foot. I'm just multiplying the probabilities, which you would do if these were independent variables. And so 0.22 times 0.25 times 100, let's see, one fourth of 22 is 5 1/2, so this is going to be equal to 5.5.

Now what number would you expect to have a longer right hand, but a longer left foot? That would also be 0.22 times 0.25 times 100. Well, we already calculated what that would be. That would be 5.5. And then to figure out the expected number that would have a longer right hand, but both feet the same length, we could multiply 22 out of 100 times 50 out of 100 times 100, which is going to be half of 22, which is equal to 11.

And we can keep going. This value right over here, longer left hand and longer right foot, would be 0.25 times 0.25 times 100; 25 times 25 is 625, so that would be 6.25. This value right over here, longer left hand and longer left foot, would be 0.25 times 0.25 times 100, which is again 6.25. And then this value right over here, longer left hand and both feet the same, a couple of ways we can get it. We can multiply 0.25 times 0.50 times 100, which would get us to 12.5, or we could have said this plus this plus this has to equal 25, so this would be 12.5.

And this expected value, same hand length and longer right foot, we can figure out because 5.5 plus 6.25 plus this is going to equal 25. So let's see, 5.5 plus 6.25 is 11.75, and 11.75 plus 13.25 is equal to 25. Same thing over here: this would be 13.25, because 11.75 plus 13.25 is equal to 25. If we add these two together, we get 26.5. 26.5 plus what is equal to 53? Well, it'd be equal to another 26.5.

Now once you figure out all
of your expected values, that's a good time to test your conditions. The first condition is that you took a random sample. So let's assume we had done that.

The second condition is that the expected value for every one of the data points has to be at least five. And we can see that all of our expected values are at least five. The actual data points we got do not have to be at least five. So it's okay that we got a two here, because the expected value here is five or larger.

And then the last condition is the independence condition: either we are sampling with replacement, or we have to feel comfortable that our sample size is no more than 10% of the population. So let's assume that that happened as well.

So assuming we met all of those conditions, we are ready to calculate
our chi-squared statistic. And so what we're going to do is, for every data point, find the difference between the data point and the expected value, square it, and divide by the expected value. So that's 11 minus 5.5, squared, over 5.5, so I did that one. Now I'll do this one: plus three minus 5.5, squared, over 5.5. Plus, now I'll do this one, eight minus 11, squared, over 11. Then I'll do this one: plus two minus 6.25, squared, over 6.25. And I'll keep doing it. I'm going to do it for all nine of these data points. I actually calculated this ahead of time to save some time. And so if you do this for all nine of the data points, you're going to get a chi-squared statistic of 11.942.

Now before we calculate the P-value, we're going to have to think about what our degrees of freedom are. We have a three-by-three table here, so one way to think about it is that it's the number of rows minus one, times the number of columns minus one, and this is two times two,
which is equal to four. Another way to think about it is that if you know four of these data points, for example a two-by-two block of them, and you know the totals, then you can figure out the other five data points.

And so now we are ready to calculate a P-value. You can do that using a calculator, or you can do that using a chi-squared table, but let's say we did it using a calculator, and we get a P-value of 0.018. And just to remind ourselves what this is, this is the probability of getting a chi-squared statistic at least this large, assuming the null hypothesis is true.

And so next, we do what we always do with hypothesis testing. We compare this to our significance level. We actually should have set our significance level from the beginning. So let's just assume that when we set up our hypotheses here, we also said that we want a significance level of 0.05. You really should do this before you calculate all of this. But then you compare your P-value to your significance level, and we see that this P-value is a good bit less than our significance level.

And so one way to think about it is, we got all these expected values assuming that the null hypothesis was true. But the probability of getting a result this extreme or more extreme is less than 2%, which is lower than our significance level. And so this will lead us to reject our null hypothesis, and it suggests to us that there is an association between hand length and foot length.
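The whole calculation from the transcript can be reproduced in a few lines. The following is a sketch in plain Python, not the instructor's calculator workflow. The variable names are chosen for this illustration, the table layout (rows for hands, columns for feet) is reconstructed from the counts read out above, and the one-line P-value formula is a closed form that applies only because the degrees of freedom here equal 4.

```python
import math

# Observed counts reconstructed from the transcript.
# Rows: right hand longer, left hand longer, hands the same.
# Columns: right foot longer, left foot longer, feet the same.
observed = [
    [11, 3, 8],
    [2, 9, 14],
    [12, 13, 28],
]

row_totals = [sum(row) for row in observed]        # [22, 25, 53]
col_totals = [sum(col) for col in zip(*observed)]  # [25, 25, 50]
n = sum(row_totals)                                # 100

# Expected count under independence:
# (row total / n) * (column total / n) * n = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Large counts condition: every expected count is at least 5.
large_counts_ok = all(e >= 5 for row in expected for e in row)

# Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (3 - 1) = 4

# Upper-tail probability, valid for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

alpha = 0.05
print(expected)
print(large_counts_ok, round(chi2, 3), df, round(p_value, 3), p_value < alpha)
```

Writing the expected count as row total times column total divided by the table total gives the same numbers as multiplying the two proportions by 100, for instance 22 × 25 / 100 = 5.5.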