Types of statistical studies

Want to join the conversation?

  • Jay Mitchell
    Sal puts a restriction on the control group of the experiment. Doesn't that take them out of the control group? It seems like in this example we now have two experiments, but no control group. From my understanding of control groups, they either just keep on doing what they're doing, or the changes they make have a known effect, e.g., current best medical treatment (control) versus experimental treatment.
    (9 votes)
    • Jesse Johnson
      Here's what I got from this video, and I hope this helps: what is done with the control group and the experimental group will vary depending on what you're testing for.

      With this experiment example, Sal is trying to see if the amount of computer time has an impact on blood pressure. You can think of the reason for the experiment this way: in the observational study, we saw a positive correlation between computer usage and blood pressure. So the question for the experiment is: "does high computer usage time cause high blood pressure (is there causality)?"

      Since we want to see if there is causality between high computer usage and high blood pressure, we then need one group, the control, to have low computer usage, and the experimental group, the one we're testing, to have higher computer usage. We run the test this way because we need to control the amount of computer time each group has, or we won't have an experiment.

      If both groups, the control and experimental groups, were not given different restrictions on computer time, the test would only be showing what the blood pressure was for each group afterwards, and we wouldn't be able to tell if computer time made any difference.
      (17 votes)
  • aadeoshu
    So what EXACTLY is a confounding variable?
    (4 votes)
    • Abdelrahman
      A variable that's not accounted for that may cause variation in the results. You might conduct a study and conclude that bank thieves are more likely to eat ice cream after the theft. While your results may be statistically robust, you have overlooked the fact that it's because most thefts happen in the summer when the weather is warmer. In this case, "season" or "temperature" is the confounding variable.
      (10 votes)
  • Jay Mitchell
    I'm confused about the difference between the sample and observational study examples. Suppose in the sample study, they collect computer time and blood pressure. It sounds like if they just present the average of both then it remains a sample study, but as soon as they plot them against each other it becomes an observational study. Similarly, in the observational study, they presumably created some criteria to determine which 1000 people were included in the study. Is the difference really based on what is done with the data rather than how the study is conducted?
    (7 votes)
    • Jesse Johnson
      The type of study you use really depends on the data that you have, and what you're trying to find out.

      So, your first question should be: what am I trying to find out? Am I trying to find out something about a population? Am I trying to compare two variables (like computer use and blood pressure)? Am I trying to see if something causes something else?

      Then, the second question to ask is: what kind of data do I need to collect so that I can answer what I am trying to find out?

      I also hope that you continue with the next few videos, where Sal works through examples of each type of study. That will help, I believe, in telling the difference between them.
      (3 votes)
  • mirahbob
    Ethical experiment: Trying to determine if computer time causes high blood pressure
    Unethical experiment: Trying to determine if high blood pressure causes computer time
    (4 votes)
  • gingerseal8
    I'm having trouble understanding the difference between parameters and variables. I know there's something about parameters remaining the same. However, a parameter of time people use the computer could be turned into a variable in an experiment, couldn't it? My ecology teacher said they can be almost interchangeable "for our purposes", but that only confuses me more.
    (1 vote)
    • daniella
      Parameters are fixed characteristics of a population, while variables can vary within a population. In an experiment, time spent on the computer may be manipulated as an independent variable (e.g., 30 minutes vs. 2 hours), making it a variable in that context. Understanding the distinction helps in experimental design and data analysis.
      (1 vote)
  • Mez Cooper
    Basically, you can't ever really know you've eliminated the confounding variables for lots of different scenarios.

    How would you know? Looks like it depends on the scenario being studied.
    (1 vote)
    • daniella
      Eliminating confounding variables entirely can be challenging, and it depends on the specific scenario being studied. Researchers use various techniques such as random assignment in experiments to minimize the influence of confounding variables. However, in observational studies, confounding variables may still exist, making it difficult to establish causal relationships definitively.
      (1 vote)
  • Tom Keller
    According to Wikipedia (https://en.wikipedia.org/wiki/Observational_study) "an observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints."

    This explanation is very different from Sal's.
    Wikipedia's definition seems to me to make more sense judging by the term.

    What am I missing?
    Is Sal's definition incorrect?

    With Sal's definition a Sample Study turns into an Observational Study as soon as we collect more than one variable and look if there is a correlation between them.
    (1 vote)
    • daniella
      Sal's explanation of observational studies focuses on analyzing relationships between variables without intervention, whereas Wikipedia's definition emphasizes the lack of control over the independent variable due to ethical or logistical constraints. Both definitions capture different aspects of observational studies, highlighting their complexity and varying applications.
      (1 vote)
  • RogerMoxswana
    So am I correct in saying that the act of "observing" has nothing to do with an observational study?
    (1 vote)
    • daniella
      In observational studies, "observing" refers to analyzing data to understand relationships between variables, rather than actively intervening or manipulating variables as in experiments. The focus is on observing existing phenomena rather than controlling variables.
      (1 vote)
  • Thompson, Jenna
    There are four types of statistical studies: observational studies, surveys, experiments, and meta analytical studies.
    (1 vote)
  • Yash
    What would be the term for a statistical study in which only one variable (for instance, it can be heights of male giraffes in a certain forest) is studied to find population parameters, and the entire population is studied, instead of just a sample?
    (0 votes)

Video transcript

- [Instructor] Let's talk about the main types of statistical studies. You can have a sample study, and we've already talked about this in several videos, but we'll go over it again in this one. You can have an observational study. Or you can have an experiment. So let's go through each of these, and as always, pause this video and see if you can think about what these words likely mean, or you might already know.

Well, sample study, we have looked at. This is really where you're trying to estimate the value of a parameter for a population. So what's an example of that? Let's say we take the population of people in a city, and that could be hundreds of thousands of people, and the parameter that you care about is how much time on average they spend on a computer. The parameter would be for the entire population. If it were possible, and maybe there's a million people in the city, you would talk to all million of those people and ask them how much time they spend on a computer, and you would get the average, and that would be the parameter. So the population parameter would be average daily time on a computer.

Now you'd determine that it's impractical to go talk to everyone, so you're not going to be able to figure out the exact population parameter, so instead, you do a sample study. You randomly sample, and there's a lot of thought in thinking about whether your sample is truly random, and there are also different techniques of randomly sampling. So you randomly sample people from your population and then you take the average daily time on a computer for your sample, and that is going to be an estimate for the population parameter. So that's your classic sample study.

Now in an observational study, you're not trying to estimate a parameter.
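The sample study described above can be sketched in Python. This is only an illustration under made-up assumptions: the city of 100,000 people, the distribution of daily computer hours, the sample size of 500, and the helper name `estimate_mean` are all hypothetical and do not come from the video.

```python
import random
import statistics


def estimate_mean(population, sample_size, rng):
    """Estimate the population mean from a simple random sample."""
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return statistics.mean(sample)


# Hypothetical city: 100,000 people with made-up daily computer hours.
rng = random.Random(42)
population = [max(0.0, rng.gauss(3.0, 1.5)) for _ in range(100_000)]

# The true parameter is only known here because the data is simulated.
parameter = statistics.mean(population)
estimate = estimate_mean(population, sample_size=500, rng=rng)

print(f"population parameter: {parameter:.2f} hours/day")
print(f"sample estimate:      {estimate:.2f} hours/day")
```

In a real study the population list would not be available, which is the whole reason for sampling; the sketch keeps it only so the estimate can be compared with the parameter.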
You're trying to understand how two variables in a population might move together or not. So let's say you have a population of 1,000 people, and you're curious about how daily time on a computer relates to people's blood pressure. So average computer time, oh, I shouldn't be writing it this way. Instead of average computer time, it should just be computer time. Computer time versus blood pressure. So what you do is you apply a survey to all 1,000 people and you ask them, how much time do you spend on a computer and what is your blood pressure? Or maybe you measure it in some way, and then you plot it all, you look at the data, and you see if those two variables move together.

So what does that mean? Well, let me draw. Let's say this axis is computer time, and this axis is blood pressure. So let's say that there's one person who doesn't spend a lot of time on a computer and they have a relatively low blood pressure. There's another person who spends a lot of time and has high blood pressure. There could be someone who doesn't spend much time on a computer but has a reasonably high blood pressure, but you keep doing this and you get all these data points for those 1,000 people, and I'm not going to sit here and draw 1,000 points, but you see something like this. And so you see, hey, look, there are definitely some outliers, but it looks like these two variables move together. It looks like, in general, the more computer time, the higher the blood pressure, or the higher the blood pressure, the more computer time. And so you can make a conclusion here about these two variables correlating, that they're positively correlated.
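One common way to put a number on "these two variables move together" is the Pearson correlation coefficient, which the video does not name; it is used here as an assumption. The sketch below computes it by hand on synthetic data for 1,000 hypothetical people. The relationship between hours and blood pressure is built into the simulation purely for illustration, and the helper name `pearson_r` is made up.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic survey answers for 1,000 hypothetical people.
rng = random.Random(0)
computer_hours = [rng.uniform(0.0, 8.0) for _ in range(1000)]
blood_pressure = [110.0 + 2.0 * h + rng.gauss(0.0, 10.0) for h in computer_hours]

r = pearson_r(computer_hours, blood_pressure)
print(f"correlation r = {r:.2f}")  # positive r: the variables tend to rise together
```

A positive value of `r` only says the two variables tend to rise together in this data set; it does not say which one, if either, drives the other.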
A reasonable conclusion, if you did the study appropriately, would be that more computer time correlates with higher blood pressure, or that higher blood pressure correlates with more computer time. Now, when you do these observational studies, or when you interpret them, when you read someone else's, it's very important not to say, oh, well, this shows me that computer time causes high blood pressure, because this is not showing causality. And you also can't say that somehow high blood pressure causes people to spend more time in front of a computer. That seems even a little bit sillier, but they're actually the same, 'cause all you're saying is that there's a correlation. These two variables move together. You can't make a conclusion about causality, that computer time causes high blood pressure or that high blood pressure causes more computer time.

Why can't you make that conclusion? Well, there could be what's called a confounding variable, sometimes called a lurking variable. So this is computer time, and this is blood pressure. And it looks like these two things move together. We saw that right over here in our data, but there could be a root variable that drives both of these, a confounding variable, and that could just be the amount of physical activity someone has. So there could just be a lack of physical activity driving both. People who are less active spend more time in front of a computer, and people who are less active have higher blood pressure, and if you were to control for this, if you were to take a bunch of people who had a similar level of activity, you might see that computer time does not correlate with blood pressure, that these are both driven by the same thing, and what you're really seeing here is that a lack of activity drives both of these variables.
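The confounding argument above can be illustrated with a small simulation. In this sketch, physical activity drives both computer time and blood pressure, and there is no direct link between the two by construction. All numbers (5,000 hypothetical people, the coefficients, the noise levels, the 4.5 to 5.5 hours band) are invented for illustration, and Pearson's r is an assumed choice of correlation measure.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(7)
people = []
for _ in range(5000):
    activity = rng.uniform(0.0, 10.0)  # hypothetical exercise hours per week
    # Both outcomes depend on activity only, never on each other.
    computer = 6.0 - 0.4 * activity + rng.gauss(0.0, 0.5)
    pressure = 140.0 - 3.0 * activity + rng.gauss(0.0, 5.0)
    people.append((activity, computer, pressure))

# Correlation across everyone, ignoring activity.
r_overall = pearson_r([p[1] for p in people], [p[2] for p in people])

# "Control for" activity by looking only at people with similar activity levels.
similar = [p for p in people if 4.5 <= p[0] <= 5.5]
r_within = pearson_r([p[1] for p in similar], [p[2] for p in similar])

print(f"overall correlation:              {r_overall:.2f}")
print(f"among similarly active people:    {r_within:.2f}")
```

Under these assumptions the overall correlation is strongly positive, while the correlation among similarly active people is close to zero, which is the pattern the transcript describes.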
So once again, when you do this observational study, and if you do it well, you can draw correlations, and that might give you decent hypotheses for causality, but this does not show causality because you could have these confounding variables.

Now, experiments, and experiments are the basis of the scientific method. Experiments are all about trying to establish causality. If you wanted to do an experiment, you probably wouldn't be able to do it with 1,000 people. Experiments in some ways are the hardest to do of all of these. Maybe you take 100 people, and to avoid having this confounding variable introduce error into your experiment, you randomly assign these hundred people to two groups. It's very important that they're randomly assigned. And that's nice: you might not know all of the confounding variables, but random assignment makes it likely that the activity levels, on average, are similar in each of the groups. It gives you a better chance that one group doesn't have a significantly different activity level than the other.

And then what you do is you have a control group and you have a treatment group. Once again, you've randomly assigned them. And what you might say is, okay, for some amount of time, all of you in the control group can only spend a max of 30 minutes in front of a computer, or maybe if you really wanted to do it, you'd say you have to spend exactly 30 minutes on a computer, and that's maybe a little unrealistic. And then to the treatment group, you say, you have to spend exactly two hours in front of a computer, and I'm making up these numbers at random. And it would be nice to see, okay, what was everyone's blood pressure before the experiment?
And you'd say, okay, well, the averages are similar going into the experiment, and then you go some amount of time and you measure blood pressure, and if you see that, wow, the treatment group definitely has a higher blood pressure, and once again, some of this might have just happened randomly, it might've been the people you happened to put in there, et cetera, but if this was a large enough experiment and you conducted it well, this says, hey, look, I'm feeling like there is a causality here, that making these people spend more time in front of a computer actually raised their blood pressure.

So once again: sample study, you're trying to estimate a population parameter. Observational study, you are seeing if there is a correlation between two things, and you have to be careful not to say, hey, one is causing the other, 'cause you could have confounding variables. Experiment, you're trying to establish or show causality, and you do that by taking your group and randomly assigning people to a control or treatment group. That should, hopefully, evenly distribute the confounding variables. Not always, there's some chance it doesn't. And then for each group, you change how much of one of these variables they get and you see if it drives the other variable.

So anyway, in the next two videos, we'll do some examples of identifying these types of statistical studies and thinking about what we can conclude from them.
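The experiment described above, with random assignment of 100 people to a control and a treatment group, might be sketched as follows. Everything numeric here is an assumption for illustration: the baseline blood pressures are simulated, and the 5-point treatment effect (`ASSUMED_EFFECT`) is built into the simulation rather than being a finding of any study.

```python
import random
import statistics

rng = random.Random(2024)

# 100 hypothetical participants with simulated baseline blood pressure.
participants = list(range(100))
baseline = {pid: rng.gauss(120.0, 10.0) for pid in participants}

# Random assignment: shuffle, then split into two groups of 50.
shuffled = participants[:]
rng.shuffle(shuffled)
control, treatment = shuffled[:50], shuffled[50:]

ASSUMED_EFFECT = 5.0  # made-up effect of the treatment, for illustration only


def simulated_followup(pid, in_treatment):
    """Simulated blood pressure after the study period."""
    effect = ASSUMED_EFFECT if in_treatment else 0.0
    return baseline[pid] + effect + rng.gauss(0.0, 3.0)


# Compare each person's change from their own baseline.
change_control = [simulated_followup(p, False) - baseline[p] for p in control]
change_treatment = [simulated_followup(p, True) - baseline[p] for p in treatment]

print(f"baseline mean, control:   {statistics.mean(baseline[p] for p in control):.1f}")
print(f"baseline mean, treatment: {statistics.mean(baseline[p] for p in treatment):.1f}")

difference = statistics.mean(change_treatment) - statistics.mean(change_control)
print(f"difference in average change: {difference:.1f}")
```

Checking that the two baseline means are similar mirrors the step of measuring everyone's blood pressure before the experiment. A real analysis would also ask whether the observed difference is larger than chance alone would plausibly produce, which this sketch does not attempt.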