Identifying individuals, variables and categorical variables in a data set

The concept of variables in data sets comes to life through an exploration of categorical and quantitative variables. Using nutritional data from a coffee shop as an example, the lesson highlights how variables can represent diverse aspects of a data set, such as drink type, calorie count, sugar content, and caffeine amount.

Video transcript

- [Narrator] We're told that millions of Americans rely on caffeine to get them up in the morning. Which is true, although I'm very sensitive: if I drank caffeine in the morning, I wouldn't be able to sleep at night. Here's nutritional data on some popular drinks at Ben's Beans coffee shop. All right, so here we have the different names of the drinks. And then here we have the type of the drink, and it looks like they're either hot or cold. Here we have the calories for each of those drinks. Here we have the sugar content in grams for each of those drinks. And here we have the caffeine in milligrams for each of those drinks.

And then we are asked, "the individuals in this data set are," and we have three choices: Ben's Beans customers, Ben's Beans drinks, or the caffeine contents. Now, we have to be careful. When someone says the individuals in a data set, they don't necessarily mean that they have to be people. They could be things. And the individuals in this data set, each of these rows, they're referring to a certain type of drink at Ben's Beans coffee shop. So the different types of drinks that Ben's Beans offers, those are the individuals in this data set. So they're Ben's Beans drinks.

Next, they ask us what the data set contains: how many variables, and how many of those variables are categorical. So if we look up here, let's look at the variables. So this first column is essentially giving us the name of the drink. This wouldn't be a variable, this would be more of an identifier. But all of these other columns are representing variables. So, for example, type is a variable. It can either be hot or cold. It can only take on one of a number of buckets: it's either going to be hot or cold, so it's going to fit in one category or another. And you don't just have two categories, you could have more than two categories. But it isn't some kind of number that could take on a bunch of different values.
So this right over here is a categorical variable. Calories is not a categorical variable. You could have something with 4.1 calories. You could have something with 178. Things aren't fitting into nice buckets. Same thing for sugars and for the caffeine. These are quantitative variables that don't just fit into a category. And so here I would say that we have four variables, one, two, three, four. One of which is categorical. So that would be choice A over here.
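The counting in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drink names and every number below are invented, since the transcript does not list the actual table values, and only the column layout (drink name, type, calories, sugars, caffeine) comes from the lesson. The helper `classify_variables` is a hypothetical name, and it uses a simple rule of thumb: a column of numbers is treated as quantitative, anything else as categorical.

```python
# Hypothetical rows: drink names and all numbers are invented for illustration.
# Only the column layout (name, type, calories, sugars, caffeine) follows the lesson.
drinks = [
    {"drink": "Drink A", "type": "hot", "calories": 4.1, "sugars_g": 0, "caffeine_mg": 260},
    {"drink": "Drink B", "type": "cold", "calories": 178, "sugars_g": 30, "caffeine_mg": 150},
    {"drink": "Drink C", "type": "hot", "calories": 120, "sugars_g": 17, "caffeine_mg": 75},
]

ID_COLUMN = "drink"  # labels each individual (row); not counted as a variable


def classify_variables(rows, id_column):
    """Return {column: 'categorical' or 'quantitative'} for each non-identifier column."""
    kinds = {}
    for column in rows[0]:
        if column == id_column:
            continue  # the identifier names the individual, so skip it
        values = [row[column] for row in rows]
        # Rule of thumb (an assumption): all-numeric columns are quantitative.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[column] = "quantitative" if is_numeric else "categorical"
    return kinds


kinds = classify_variables(drinks, ID_COLUMN)
n_individuals = len(drinks)  # one individual per row
n_variables = len(kinds)
n_categorical = sum(1 for kind in kinds.values() if kind == "categorical")

print(n_individuals, "individuals;", n_variables, "variables;", n_categorical, "categorical")
# prints: 3 individuals; 4 variables; 1 categorical
```

The rule of thumb is a simplification: a category stored as a number (a zip code, say) would be misclassified as quantitative, so in practice the decision still depends on what the column means, not just on how it is stored.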