© 2024 Khan Academy

# Identifying individuals, variables and categorical variables in a data set

The concept of variables in data sets comes to life through an exploration of categorical and quantitative variables. Using nutritional data from a coffee shop as an example, the lesson highlights how variables can represent diverse aspects of a data set, such as drink type, calorie count, sugar content, and caffeine amount.

## Want to join the conversation?

- What does categorical mean? (57 votes)
- It means the data can be sorted into categories, in this case hot drinks and cold drinks. The sugar content, on the other hand, is not categorical, because a drink could have any of infinitely many different amounts of sugar.

Hope this helps!(123 votes)

- Why isn't the type of drink classified as a variable?(20 votes)
- The first column (the drink name) isn't a variable because it doesn't describe or measure anything about the drink; it only identifies it. By contrast, "Type" is a categorical variable because it describes whether the drink is hot or cold, and "Sugars" is a quantitative variable because it measures the amount of sugar in the drink.(45 votes)

- What are the prerequisites for the Statistics and Probability course?(7 votes)
- Algebra is a must for any Statistics and Probability course.

Whether or not calculus is also required depends on how deeply the course goes into probability theory. If the course covers topics such as probability density functions of continuous random variables, cumulative distribution functions of continuous random variables, moment generating functions, and/or maximum likelihood estimators, then calculus would be required.(27 votes)

- what is standard deviation?(5 votes)
- Standard deviation is a measurement of the spread of the data. A high standard deviation means the data points tend to be far from the mean, while a low one means they tend to be closer to it.

Hope this helps!(17 votes)
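As a rough illustration of the answer above, the sketch below compares two data sets that share the same mean but differ in spread. The numbers are hypothetical and chosen only for illustration; `statistics.pstdev` is the population standard deviation from Python's standard library.

```python
import statistics

# Hypothetical data sets: both have a mean of 50.
tight = [48, 49, 50, 51, 52]   # values close to the mean
wide = [10, 30, 50, 70, 90]    # values far from the mean

tight_sd = statistics.pstdev(tight)
wide_sd = statistics.pstdev(wide)

print(statistics.mean(tight), statistics.mean(wide))   # 50 50
print(round(tight_sd, 2), round(wide_sd, 2))            # 1.41 28.28
```

The means match, but the second data set has a much larger standard deviation because its values sit farther from 50.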

- I'm interested in learning about stats and probability however I have never learned about it in class. I've covered other math classes like algebra, advanced functions, calculus and a bit of linear algebra. Would this course be good for someone wanting to learn stats for fun? Or should I do other courses prior? Thanks!(8 votes)
- I think this course would be excellent for someone wanting to learn stats for fun. It's modeled off AP Statistics, a class for high schoolers wanting to explore college level statistics. There is a course on Khan Academy called "Get ready for AP Statistics", but if you have experience with linear algebra I think you'll find it very easy. Perhaps you could try Khan Academy's "High school statistics" course, but I don't think you'd need to take other courses prior to this one. Happy learning!(3 votes)

- Where can you find the "individuals" of a set of data in a given table?(2 votes)
- An individual is what the data is describing. In a table like this, each individual is represented by one row. So in this case, the individuals would be the drinks. An example individual is cappuccino, which is a hot coffee that has 60 calories, 8 grams of sugar, and 75 milligrams of caffeine.(9 votes)

- I'm in seventh grade. Is this suitable for my grade level?(3 votes)
- If you mean this specific lesson, yes. As a course, though, AP Statistics is usually taken in high school. The College Board recommends taking Algebra 2 before this course. So, if you have already done that, yes, it is suitable.

If not, there is a Probability unit in the 7th-grade course, which might be what you are looking for: https://www.khanacademy.org/math/cc-seventh-grade-math/cc-7th-probability-statistics

If you've already done that, you could do the 8th-grade Data and Modeling unit: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-data(5 votes)

- I’m looking at this problem from a machine learning perspective. In that sense, wouldn’t the column ‘drink name’ also be counted as a categorical variable? Please help me out here.

Thanks!(3 votes)
- Similar to how Sal explained in the video, the drink name **would not** be a categorical variable, because the variables that follow all **describe it**; therefore, it is an **individual**. (Variables describe the individual.)

Hope this helps!(4 votes)

- Okay, so categorical is more like something that describes it, like in this example, whether the coffee is hot or cold, and quantitative is more like the calories, the actual amounts with numbers?(4 votes)
- Typically, quantitative (quantity) data can be counted or measured as numerical values, while qualitative (quality) data specifies what type something is (e.g., hot, cold, yellow, blue, red, low, high). Be mindful of the data in the problem: not all of it is relevant.

At least that's what I got out of it. Hope this helps.(2 votes)

- Why are there no missions for statistics and probability? If there are, can you tell me where?(3 votes)
- Khan Academy is getting rid of Missions in June 2020, so they have not been adding them to existing courses. If you are wondering about mastery challenges or practice problems, you should be able to access problems and quizzes either by assigning AP Statistics as one of your courses or by looking at the course overview here: https://www.khanacademy.org/math/ap-statistics/analyzing-categorical-ap

Hope this helps!(4 votes)

## Video transcript

- [Narrator] We're told that millions of Americans rely on caffeine to get them up in the morning. Which is true, although I'm very sensitive: if I drank caffeine in the morning, I wouldn't be able to sleep at night. Here's nutritional data on some popular drinks at Ben's Beans coffee shop.

All right, so here we have the different names of the drinks. And then here we have the type of the drink, and it looks like they're either hot or cold. Here we have the calories for each of those drinks. Here we have the sugar content in grams for each of those drinks. And here we have the caffeine in milligrams for each of those drinks.

And then we are asked, the individuals in this data set are, and then we have three choices: Ben's Beans customers, Ben's Beans drinks, or the caffeine contents. Now, we have to be careful. When someone says the individuals in a data set, they don't necessarily mean that they have to be people. They could be things. And the individuals in this data set, each of these rows, they're referring to a certain type of drink at Ben's Beans coffee shop. So the different types of drinks that Ben's Beans offers, those are the individuals in this data set. So they're Ben's Beans drinks.

Next, they ask us, the data set contains, and they say how many variables and how many of those variables are categorical. So if we look up here, let's look at the variables. So this first column is essentially giving us the type of drink. This wouldn't be a variable, this would be more of an identifier. But all of these other columns are representing variables.

So, for example, type is a variable. It can either be hot or cold. And because it can only take on one of a number of buckets, it's either going to be hot or cold, it's going to fit in one category or another. And you don't just have two categories, you could have more than two categories. But it isn't just some type of variable number that could take on a bunch of different values. So this right over here is a categorical variable.

Calories is not a categorical variable. You could have something with 4.1 calories. You could have something with 178. Things aren't fitting into nice buckets. Same thing for sugars and for the caffeine. These are quantitative variables that don't just fit into a category. And so here I would say that we have four variables, one, two, three, four. One of which is categorical. So that would be choice A over here.
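The counting done in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is hypothetical apart from the cappuccino values quoted in the discussion above, the column names and the `classify` helper are invented for this example, and the rule "numeric means quantitative" is a simplification that happens to work for this table.

```python
# Hypothetical rows in the shape of the Ben's Beans table. Only the
# cappuccino values come from the discussion above; the rest are made up.
drinks = [
    {"drink": "Cappuccino", "type": "hot", "calories": 60, "sugars_g": 8, "caffeine_mg": 75},
    {"drink": "Iced tea (made up)", "type": "cold", "calories": 40, "sugars_g": 10, "caffeine_mg": 30},
    {"drink": "Hot cocoa (made up)", "type": "hot", "calories": 190, "sugars_g": 24, "caffeine_mg": 5},
]

IDENTIFIER = "drink"  # names each individual; not counted as a variable


def classify(rows, identifier):
    """Label each non-identifier column as categorical or quantitative."""
    kinds = {}
    for column in rows[0]:
        if column == identifier:
            continue
        values = [row[column] for row in rows]
        # Simplifying assumption: numeric values mean quantitative,
        # anything else (here, text labels) means categorical.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[column] = "quantitative" if is_numeric else "categorical"
    return kinds


kinds = classify(drinks, IDENTIFIER)
individuals = [row[IDENTIFIER] for row in drinks]  # one individual per row
categorical = [col for col, kind in kinds.items() if kind == "categorical"]

print(len(individuals), "individuals")
print(len(kinds), "variables,", len(categorical), "categorical:", categorical)
# 3 individuals
# 4 variables, 1 categorical: ['type']
```

The numeric check would misfire on numbers used as labels, such as zip codes or jersey numbers, so deciding whether a variable is categorical still takes judgment about what the values mean.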