Correlation and causality
Understanding why correlation does not imply causality (even though many in the press and some researchers often imply otherwise). Created by Sal Khan.
- So how do we know, given some data, that two variables are just correlated or there's some causality between them?(150 votes)
- I'm a statistician and I can categorically state that causality is ideological.
That is, if the data are related (correlated), and you suspect one causes the other, you are making an ideological statement. It might be true, it might not be; there isn't enough information to support or reject that assertion.
Sometimes the statement is very obvious: the temperature is correlated with the length of the day. The length of the day relates to the amount of sunshine, and therefore we can safely say that the length of the day causes changes in temperature. Sometimes the statement isn't so obvious, as in the example above. What appears to be a perfectly logical assumption has no basis. The same used to happen in history, when people thought bad smells gave you diseases (rather than both bad smells and diseases being related to poor hygiene and microbial action).
So at the very least causation is a hypothesis (hypothetical thesis – unproven theory), and at best an accepted theory (i.e. previous studies have confirmed that one is likely to cause the other).
What does this mean? If you find that data are correlated (related), you should then determine if one causes the other.(293 votes)
- So what is the perfect definition for the causality?(2 votes)
- Causality is the relation between one thing as a cause and another thing as its effect.
So it's not "just" about relation (correlation); there must be cause and effect. To make it clear, we have to distinguish causality from correlation.
Let's say we have two variables: A and B.
A and B correlate when their values change together; for example, when A's values increase, B's values decrease. However, we cannot yet say that A causes the change in B.
Here are great examples that correlation doesn't equal causation:
- What is the difference between causality and causation?(3 votes)
- Hmmm, I think they are pretty close, but used in different contexts. "Causality" is a general, absolute property of the universe, which most scientists believe is an important building block of the real world. They want their theories to respect "causality" meaning that the cause (or causes) of every specific event must happen before the event (say, the decay of a radioactive atom must happen before the click in the geiger counter). "Causation" is usually used to refer to categories, and often only in a probabilistic sense, such as "smoking causes lung cancer", or "global warming causes floods".(12 votes)
- Maybe a combination of eating healthy meals and exercise can result in a decrease in obesity?(7 votes)
- Yes, of course. Nutrition and exercise, repeated over long periods of time, are the only significant causes of weight loss.(4 votes)
- Are there real world applications of causality?(0 votes)
- Yes. Oil prices are a cause of inflation in most countries.(9 votes)
- I don't see how this relates to math, but okay.(3 votes)
I have this article right here from WebMD. And the point of this isn't to poke holes at WebMD. I think they have some great articles and they have some great information on their site. But what I want to do here is to think about what a lot of articles you might read or a lot of research you might read are implying and to think about whether they really imply what they claim to be implying. So this is an excerpt of an article, and the title of the article says "Eating breakfast may beat teen obesity." So they're already trying to create this cause-and-effect relationship. The title itself says if you eat breakfast then you're less likely-- or you won't be obese. You're not going to be obese. So the title right there already sets this up: that eating breakfast may beat teen obesity. And then they tell us about the study. "In the study, published in Pediatrics, researchers analyzed the dietary and weight patterns of a group of 2,216 adolescents over a five-year period from public schools in Minneapolis-Saint Paul, Minnesota." And I won't talk too much about this. It looks like a good sample size. It was over a long period of time. I'll just give the researchers the benefit of the doubt, assume that it was over a broad audience, that they were able to control for a lot of variables. But then they go on to say, "The researchers write that teens who ate breakfast regularly had a lower percentage of total calories from saturated fat and ate more fiber and carbohydrates." And to some degree that first-- "than those who skipped breakfast." And to some degree this first sentence is obvious. Breakfast tends to be things like cereals, grains. You eat syrup, you eat waffles-- that all tends to fall in the category of carbohydrates and sugars. And frankly, that's not even necessarily a good thing. It's not obvious to me whether bacon is more or less healthy than downing a bunch of syrup or Fruit Loops or whatever else. But we'll let that be right here.
"In addition, regular breakfast eaters seemed more physically active then the breakfast skippers." So over here they're once again trying to create this other cause-and-effect relationship. Regular breakfast eaters seemed more physically active than the breakfast skippers. So the implication here is that breakfast makes you more active. And then this last sentence right over here, they say "Over time, researchers found teens who regularly ate breakfast tended to gain less weight and had a lower body mass index than breakfast skippers." So you could-- they're telling us that breakfast skipping-- this is the implication here-- is more likely, or it can be a cause of making you overweight or maybe even making you obese. So the entire narrative here, from the title all the way through every paragraph, is look, breakfast prevents obesity. Breakfast makes you active. Breakfast skipping will make you obese. So you just say then, boy, I have to eat breakfast. And you should always think about the motivations and the industries around things like breakfast. But the more interesting question is does this research really tell us that eating breakfast can prevent obesity? Does it really tell us that eating breakfast will cause some to become more active? Does it really tell us that breakfast skipping can make you overweight or make it obese? Or, it is more likely, are they showing that these two things tend to go together? And this is a really important difference. And let me kind of state slightly technical words here. And they sound fancy, but they really aren't that fancy. Are they pointing out causality, which is what it seems like they're implying. Eating breakfast causes you to not be obese. Breakfast causes you to be active. Breakfast skipping causes you to be obese. So it looks like they are kind of implying causality. They're implying cause and effect, but really what the study looked at is correlation. 
The whole point of this is to understand the difference between causality and correlation because they're saying very different things. Causality versus correlation. And, as I said, causality says A causes B. Well, correlation just says A and B tend to be observed at the same time. Whenever I see B happening, it looks like A is happening at the same time. Whenever A is happening, it looks like it also tends to happen with B. And the reason why it's super important to notice the distinction between these is you can come to very, very different conclusions. So the one thing that this research does do, assuming that it was performed well, is it does show a correlation. So the study does show a correlation. It does show, if we believe all of their data, that breakfast skipping correlates with obesity and obesity correlates with breakfast skipping. We're seeing it at the same time. Activity correlates with breakfast and breakfast correlates with activity-- that all of these correlate. What they don't say-- and there's no data here that lets me know one way or the other-- is what is causing what, or whether maybe you have some underlying cause that is causing both. So for example, they're saying breakfast causes activity, or they're implying breakfast causes activity. They're not saying it explicitly. But maybe activity causes breakfast. Maybe. They didn't write this in the study, but people who are active, maybe they're more likely to be hungry in the morning. Activity causes breakfast. And then you start having a different takeaway. Then you say, wait, maybe if you're active and you skip breakfast-- and I'm not telling you that you should. I have no data one way or the other-- maybe you'll lose even more weight. Maybe it's even a healthier thing to do. We're not sure. So they're trying to say, look, if you have breakfast it's going to make you active, which is a very positive outcome. But maybe you can have the positive outcome without breakfast. Who knows?
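One way to see why a correlation cannot tell us the direction of cause and effect is to compute it both ways. The sketch below uses the standard Pearson correlation coefficient, which the video does not name, and a handful of numbers invented purely for illustration (they are not from the study). The variable names `breakfasts_per_week` and `active_hours_per_week` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers for eight imaginary teens (an assumption, not study data).
breakfasts_per_week = [0, 1, 2, 3, 4, 5, 6, 7]
active_hours_per_week = [1, 2, 2, 4, 3, 5, 6, 6]

r_ab = pearson(breakfasts_per_week, active_hours_per_week)
r_ba = pearson(active_hours_per_week, breakfasts_per_week)

# The two values are the same: the formula treats A and B symmetrically.
print(f"corr(breakfast, activity) = {r_ab:.3f}")
print(f"corr(activity, breakfast) = {r_ba:.3f}")
```

Because swapping the two variables gives the same number, the correlation by itself is equally compatible with "breakfast causes activity" and "activity causes breakfast".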
Likewise they say breakfast skipping, or they're implying breakfast skipping, can cause obesity. But maybe it's the other way around. Maybe people who have high body fat-- maybe, for whatever reason, they're less likely to get hungry in the morning. So maybe it goes this way. Maybe there's a causality there. Or even more likely, maybe there's some underlying cause that causes both of these things to happen. And you could think of a bunch of different examples of that. One could be the physical activity. And these are all just theories. I have no proof for it. But I just want to give you different ways of thinking about the same data and maybe not just coming to the same conclusion that this article seems like it's trying to lead us to conclude. That we should eat breakfast if we don't want to become obese. So maybe if you're physically active, that leads to you being hungry in the morning, so you're more likely to eat breakfast. And obviously being physically active also makes it so that you burn calories. You have more muscle. So that you're not obese. So notice if you view things this way, if you say physical activity is causing both of these, then all of a sudden you lose this connection between breakfast and obesity. Now you can't make the claim that somehow breakfast is the magic formula for someone to not be obese. So let's say that there is an obese person-- let's say this is the reality, that physical activity is causing both of these things. And let's say that there is an obese person. What will you tell them to do? Will you tell them, eat breakfast and you won't be obese anymore? Well, that might not work, especially if they're not physically active. I mean, what's going to happen if you have an obese person who's not physically active? And then you tell them to eat breakfast? Maybe that'll make things worse. And based on that, the advice or the implication from the article is the wrong thing.
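The "underlying cause" idea can be sketched with a small simulation. Everything here is an assumption made for illustration: an invented world in which physical activity makes teens more likely to eat breakfast and also lowers their weight gain, while breakfast itself has no effect on weight at all. The probabilities, the weight-gain figures, and the function names are hypothetical and are not taken from the study.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_teens(n=20000, seed=0):
    """Invented world: activity drives both breakfast and weight gain."""
    rng = random.Random(seed)
    teens = []
    for _ in range(n):
        active = rng.random() < 0.5
        # Active teens are hungrier in the morning (assumed probabilities).
        eats_breakfast = rng.random() < (0.8 if active else 0.3)
        # Weight gain depends ONLY on activity, never on breakfast.
        weight_gain = rng.gauss(2.0 if active else 6.0, 1.5)
        teens.append((active, eats_breakfast, weight_gain))
    return teens

def breakfast_weight_correlation(teens):
    breakfast = [1.0 if eats else 0.0 for _, eats, _ in teens]
    gains = [gain for _, _, gain in teens]
    return pearson(breakfast, gains)

teens = simulate_teens()
r_all = breakfast_weight_correlation(teens)
r_active = breakfast_weight_correlation([t for t in teens if t[0]])
r_inactive = breakfast_weight_correlation([t for t in teens if not t[0]])

print(f"all teens:           r = {r_all:+.2f}")
print(f"active teens only:   r = {r_active:+.2f}")
print(f"inactive teens only: r = {r_inactive:+.2f}")
```

Across all simulated teens, breakfast eating and weight gain come out clearly negatively correlated, even though breakfast does nothing in this made-up world. Once the comparison is restricted to teens with the same activity level, the correlation is close to zero, which is the "lost connection" described above.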
Physical activity maybe is the thing that should be focused on. Maybe something other than physical activity. Maybe you have sleep, maybe people who sleep late and they're not getting enough sleep, maybe that leads to obesity. And obviously, because they're not getting enough sleep, they wake up as late as possible and they have to run to the next appointment-- or they have to run to school in the case of students-- and maybe that's why they skip breakfast. So once again, if you find someone that's obese, maybe the rule here isn't to force a breakfast down your throat. Maybe it will become even worse because maybe it is the lack of sleep that's causing your metabolism to slow down or whatever. So it's very, very important when you're looking at any of these studies to try to say, is this a correlation or is this causality? If it's correlation, you cannot make the judgment that, hey, eating breakfast is necessarily going to make someone less obese. All that tells you is that these things move together. A better study would be one that is able to prove causality. And then we could think of other underlying causes that would kind of break down the narrative that this piece is trying to say. I'm not saying it's wrong. Maybe it's absolutely true that eating breakfast will fight obesity. But I think it's equally or more important to think about what the other causes are, not to just make a blanket statement like that. So for example, maybe poverty causes you to skip breakfast for multiple reasons. Maybe both of your parents are working. There's no one there to give you breakfast. Maybe there's more stress in the-- who knows what it might be? And so when you have poverty maybe you're more likely to skip breakfast and maybe when there's poverty, and maybe you have two-- both your parents are working and the kids have to make their own dinner and whatever else-- maybe they also eat less healthy at all times of day and then that leads to obesity. 
So once again in this situation, if this is the reality of things, just telling someone to also eat breakfast regardless of what that breakfast is, even if it's Fruit Loops or syrup, that's probably not going to help the situation. Maybe it's just eating unhealthy dinners that is the underlying cause. And if you eat an unhealthy dinner maybe by breakfast time you're still not hungry because you've binged so much on dinner. So you skip breakfast. And this also leads to obesity. But once again, if this is the actual reality, following the advice that that article's giving might actually be a bad thing. If you eat an unhealthy dinner and then force yourself to eat a breakfast when you're not hungry, that might make the obesity even worse. So the whole point of this video isn't to say that the implications from that article are necessarily wrong. The important thing is to just realize that they might be wrong. And that just because you saw this correlation in the data, it doesn't mean that eating breakfast is going to somehow magically fight obesity.
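The earlier remark that a better study would be one able to prove causality can also be illustrated. One common approach, which the video does not describe, is a randomized experiment: breakfast is assigned by coin flip, so it can no longer be tied to activity, sleep, poverty, or any other hidden cause. The sketch below is a simulation under invented assumptions (activity is the only real cause of weight gain, and all numbers and names are hypothetical), not a description of how the WebMD study was run.

```python
import random
from statistics import mean

def weight_gain_gap(randomized, n=20000, seed=1):
    """Mean weight gain of breakfast eaters minus that of skippers."""
    rng = random.Random(seed)
    eaters, skippers = [], []
    for _ in range(n):
        active = rng.random() < 0.5
        if randomized:
            # Experiment: a coin flip decides who eats breakfast.
            eats_breakfast = rng.random() < 0.5
        else:
            # Observation: active teens choose breakfast more often.
            eats_breakfast = rng.random() < (0.8 if active else 0.3)
        # In this invented world only activity affects weight gain.
        gain = rng.gauss(2.0 if active else 6.0, 1.5)
        (eaters if eats_breakfast else skippers).append(gain)
    return mean(eaters) - mean(skippers)

observational_gap = weight_gain_gap(randomized=False)
randomized_gap = weight_gain_gap(randomized=True)

print(f"observed gap (eaters - skippers):   {observational_gap:+.2f}")
print(f"randomized gap (eaters - skippers): {randomized_gap:+.2f}")
```

In the observational version, breakfast eaters appear to gain much less weight. In the randomized version the gap is close to zero, which matches the true (zero) effect of breakfast built into this simulation. A real experiment would of course have to assign breakfast to actual people, which a simulation cannot replace.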