Course: High school statistics > Unit 4
Lesson 2: Analyzing trend lines in scatterplots
© 2023 Khan Academy
Correlation and causality
Understanding why correlation does not imply causality (even though many in the press and some researchers often imply otherwise). Created by Sal Khan.
Want to join the conversation?
- So how do we know, given some data, that two variables are just correlated or there's some causality between them?(150 votes)
- I'm a statistician and I can categorically state that causality is ideological.
That is, if the data is related (correlated), and if you suspect one causes the other, you are making an ideological statement. It might be true, it might not be – there isn’t enough information to support or reject that assertion.
Sometimes the statement is very obvious - the temperature is correlated to the length of the day... well... the length of the day relates to the amount of sunshine, and therefore we can safely say that the length of the day causes changes in temperature. Sometimes the statement isn't so obvious, like the example above. What appears to be a perfectly logical assumption has no basis. The same used to happen in history, where people thought bad smells gave you diseases (rather than both bad smells and diseases being related to poor hygiene and microbial action).
So at the very least causation is a hypothesis (hypothetical thesis – unproven theory), and at best an accepted theory (i.e. previous studies have confirmed that one is likely to cause the other).
What does this mean? If you find that data are correlated (related), you should then determine if one causes the other.(293 votes)
- So what is the perfect definition for the causality?(2 votes)
- Causality is the relation between one thing as cause and another thing as effect.
So, it's not "just" about relation (correlation), there must be cause and effect. To make it clear, we have to distinguish causality from correlation.
Let's say we have two variables: A and B.
A and B correlate when their values change together; for example, when A's values increase, B's values decrease. However, we cannot yet say that A causes the change in B.
Here are great examples that correlation doesn't equal causation:
http://www.tylervigen.com/spurious-correlations(26 votes)
- What is the difference between causality and causation?(3 votes)
- Hmmm, I think they are pretty close, but used in different contexts. "Causality" is a general, absolute property of the universe, which most scientists believe is an important building block of the real world. They want their theories to respect "causality" meaning that the cause (or causes) of every specific event must happen before the event (say, the decay of a radioactive atom must happen before the click in the geiger counter). "Causation" is usually used to refer to categories, and often only in a probabilistic sense, such as "smoking causes lung cancer", or "global warming causes floods".(12 votes)
- Maybe a combination of eating healthy meals and exercise can result in a decrease in obesity?(7 votes)
- Yes, of course. Nutrition and exercise repeated over long periods of time are the only significant causes of weight loss.(4 votes)
- idk if this is math(5 votes)
- i need help(4 votes)
- I like how the rest is about scatter plots but then this is about obesity(4 votes)
- Are there real world applications of causality?(0 votes)
- Yes. Oil prices are a cause of inflation in most countries.(9 votes)
- But I will add to my previous comment, just to be fair and balanced, the point of your discussion was spot on… always question the narrative, whether it be in “statistical analysis” or life in general. Regards.(3 votes)
Video transcript
I have this article
right here from WebMD. And the point of this isn't
to poke holes at WebMD. I think they have
some great articles and they have some great
information on their site. But what I want to
do here is to think about what a lot of
articles you might read or a lot of research you
might read are implying and to think about
whether they really imply what they
claim to be implying. So this is an excerpt
of an article, and the title of the article
says "Eating breakfast may beat teen obesity." So they're already
trying to create this cause-and-effect
relationship. The title itself says if
you eat breakfast then you're less likely--
or you won't be obese. You're not going to be obese. So the title right there
already sets this up: that eating breakfast
may beat teen obesity. And then they tell
us about the study. "In the study,
published in Pediatrics, researchers analyzed the
dietary and weight patterns of a group of 2,216 adolescents
over a five-year period from public schools in
Minneapolis-Saint Paul, Minnesota." And I won't talk
too much about this. It looks like a
good sample size. It was over a large
period of time. I'll just give the researchers
the benefit of the doubt, assume that it was
over a broad audience, that they were able to control
for a lot of variables. But then they go on to
say, "The researchers write that teens who ate
breakfast regularly had a lower percentage of total
calories from saturated fat and ate more fiber
and carbohydrates than those who skipped breakfast." And to some degree this
first sentence is obvious. Breakfast tends to be
things like cereals, grains. You eat syrup, you
eat waffles-- that all tends to fall in the category
of carbohydrates and sugars. And frankly, that's not even
necessarily a good thing. It's not obvious to me
whether bacon is more or less healthy than
downing a bunch of syrup or Fruit Loops or whatever else. But we'll let that
be right here. "In addition, regular
breakfast eaters seemed more
physically active than the breakfast skippers." So over here they're
once again trying to create this other
cause-and-effect relationship. Regular breakfast eaters
seemed more physically active than the breakfast skippers. So the implication
here is that breakfast makes you more active. And then this last
sentence right over here, they say "Over time,
researchers found teens who regularly
ate breakfast tended to gain less weight and
had a lower body mass index than breakfast skippers." So
they're telling us that breakfast skipping--
this is the implication here-- can be a
cause of making you overweight or maybe even making you obese. So the entire narrative here,
from the title all the way through every
paragraph, is look, breakfast prevents obesity. Breakfast makes you active. Breakfast skipping
will make you obese. So you just say then, boy,
I have to eat breakfast. And you should always
think about the motivations and the industries around
things like breakfast. But the more
interesting question is does this research
really tell us that eating breakfast
can prevent obesity? Does it really tell us
that eating breakfast will cause someone to
become more active? Does it really tell us
that breakfast skipping can make you overweight
or make you obese? Or, more likely,
are they showing that these two things
tend to go together? And this is a really
important difference. And let me kind of state
slightly technical words here. And they sound fancy, but
they really aren't that fancy. Are they pointing
out causality, which is what it seems like
they're implying? Eating breakfast causes
you to not be obese. Breakfast causes
you to be active. Breakfast skipping
causes you to be obese. So it looks like they are
kind of implying causality. They're implying
cause and effect, but really what the study
looked at is correlation. The whole point of this is
to understand the difference between causality
and correlation because they're saying
very different things. Causality versus correlation. And, as I said, causality
says A causes B. Well, correlation just
says A and B tend to be observed at the same time. Whenever I see B
happening, it looks like A is happening
at the same time. Whenever A is happening,
it looks like it also tends to happen with
B. And the reason why it's super
important to notice the distinction
between these is you can come to very,
very, very, very, very different conclusions. So the one thing that
this research does do, assuming that it
was performed well, is it does show a correlation. So the study does
show a correlation. It does show, if we
believe all of their data, that breakfast
skipping correlates with obesity and
obesity correlates with breakfast skipping. We're seeing it
at the same time. Activity correlates with
breakfast and breakfast correlates with activity--
that all of these correlate. What they don't say--
and there's no data here that lets me know one way or
the other-- is what is causing what, or whether maybe you have
some underlying cause that is causing both. So for example, they're saying
breakfast causes activity, or they're implying
breakfast causes activity. They're not saying
it explicitly. But maybe activity
causes breakfast. Maybe. They didn't write in the study
that people who are active, maybe they're more likely
to be hungry in the morning. Activity causes breakfast. And then you start having
a different takeaway. Then you might say, wait,
maybe if you're active and you skip
breakfast-- and I'm not telling you that you should. I have no data one way
or the other-- maybe you'll lose even more weight. Maybe it's even a
healthier thing to do. We're not sure. So they're trying to say,
look, if you have breakfast it's going to make
you active, which is a very positive outcome. But maybe you can have
the positive outcome without breakfast. Who knows? Likewise they say
breakfast skipping, or they're implying breakfast
skipping, can cause obesity. But maybe it's the
other way around. Maybe people who have
high body fat-- maybe, for whatever reason, they're
less likely to get hungry in the morning. So maybe it goes this way. Maybe there's a causality there. Or even more likely,
maybe there's some underlying cause that
causes both of these things to happen. And you could think of a bunch
of different examples of that. One could be the
physical activity. And these are all just theories. I have no proof for it. But I just want to
give you different ways of thinking about the same
data and maybe not just coming to the same conclusion that
this article seems like it's trying to lead us to conclude. That we should eat breakfast if
we don't want to become obese. So maybe if you're
physically active, that leads to you being
hungry in the morning, so you're more likely
to eat breakfast. And obviously being
physically active also makes it so that
you burn calories. You have more muscle. So that you're not obese. So notice if you
view things this way, if you say physical activity
is causing both of these, then all of a sudden
you lose this connection between breakfast and obesity. Now you can't make the
claim that somehow breakfast is the magic formula for
someone to not be obese. So let's say that there
is an obese person-- let's say this is the reality, that
physical activity is causing both of these things. And let's say that there
is an obese person. What will you tell them to do? Will you tell them, eat
breakfast and you won't become obese anymore? Well, that might
not work, especially if they're not
physically active. I mean, what's going
to happen if you have an obese person who's
not physically active? And then you tell
them to eat breakfast? Maybe that'll make things worse. And based on that,
the advice or the implication from the
article is the wrong thing. Physical activity
maybe is the thing that should be focused on. Maybe something other
than physical activity. Maybe it's sleep:
maybe people who sleep late and aren't
getting enough sleep, maybe that leads to obesity. And obviously, because they're
not getting enough sleep, they wake up as late
as possible and they have to run to the
next appointment-- or they have to run to school
in the case of students-- and maybe that's why
they skip breakfast. So once again, if you
find someone that's obese, maybe the rule here isn't
to force breakfast down their throat. Maybe it will become even
worse because maybe it is the lack of sleep that's
causing their metabolism to slow down or whatever. So it's very, very
important when you're looking at any of
these studies to try to say, is this a correlation
or is this causality? If it's correlation, you cannot
make the judgment that, hey, eating breakfast is necessarily
going to make someone less obese. All that tells you is that
these things move together. A better study would be one
that is able to prove causality. And then we could think of
other underlying causes that would kind of break down the
narrative that this piece is trying to tell. I'm not saying it's wrong. Maybe it's absolutely
true that eating breakfast will fight obesity. But I think it's equally or more
important to think about what the other causes
are, not to just make a blanket statement like that. So for example,
maybe poverty causes you to skip breakfast
for multiple reasons. Maybe both of your
parents are working. There's no one there
to give you breakfast. Maybe there's more
stress in the-- who knows what it might be? And so when you
have poverty maybe you're more likely to
skip breakfast. And maybe when there's poverty,
and both your
parents are working and the kids have to make their
own dinner and whatever else-- maybe they also eat less
healthily at all times of day, and then that leads to obesity. So once again in this
situation, if this is the reality of things,
just telling someone to also eat breakfast regardless
of what that breakfast is, even if it's Fruit Loops
or syrup, that's probably not going to
help the situation. Maybe eating
unhealthy dinners is the underlying cause. And if you eat an unhealthy
dinner, maybe by breakfast time you're still not hungry
because you've binged so much at dinner. So you skip breakfast. And this also leads to obesity. But once again, if this
is the actual reality, following the advice that
that article's giving might actually be a bad thing. If you eat an unhealthy
dinner and then force yourself to eat a
breakfast when you're not hungry, that might make
the obesity even worse. So the whole point
of this video isn't to say that the implications
from that article are necessarily wrong. The important thing is to just
realize that they might be wrong. And that just because you saw
this correlation in the data, it doesn't mean that
eating breakfast is going to somehow magically
fight obesity.
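The scenario described in the transcript, where physical activity causes both breakfast eating and lower obesity, can be sketched as a small simulation. This is only an illustration and is not part of the original video: the probabilities (0.8, 0.3, 0.1, 0.4), the sample size, the seed, and the names `pearson`, `simulate`, and `within_group` are all made-up assumptions chosen to make the effect visible. In the simulated world, breakfast never influences obesity at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Simulate teens where activity is the only cause of both outcomes.

    All probabilities below are hypothetical, not taken from any study.
    """
    rng = random.Random(seed)
    active, breakfast, obese = [], [], []
    for _ in range(n):
        is_active = rng.random() < 0.5
        # Activity makes breakfast more likely (assumed 0.8 vs 0.3).
        eats_breakfast = rng.random() < (0.8 if is_active else 0.3)
        # Activity makes obesity less likely (assumed 0.1 vs 0.4).
        # Note that eats_breakfast is never used here.
        is_obese = rng.random() < (0.1 if is_active else 0.4)
        active.append(int(is_active))
        breakfast.append(int(eats_breakfast))
        obese.append(int(is_obese))
    return active, breakfast, obese


active, breakfast, obese = simulate()

# Correlation across everyone: breakfast and obesity move together.
overall = pearson(breakfast, obese)


def within_group(group):
    """Breakfast/obesity correlation among teens with the same activity level."""
    b = [bi for ai, bi in zip(active, breakfast) if ai == group]
    o = [oi for ai, oi in zip(active, obese) if ai == group]
    return pearson(b, o)


r_active = within_group(1)
r_inactive = within_group(0)

print(f"overall correlation:        {overall:+.3f}")
print(f"among active teens only:    {r_active:+.3f}")
print(f"among inactive teens only:  {r_inactive:+.3f}")
```

Under these assumed numbers, the overall correlation between breakfast and obesity comes out clearly negative, while within each activity group it is close to zero. That is the point of the video: the data can show breakfast and lower obesity going together even when telling someone to eat breakfast would change nothing.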