
### Course: AP®︎/College Statistics > Unit 9

Lesson 2: The central limit theorem

# Sampling distribution of the sample mean

Take a sample from a population, calculate the mean of that sample, put everything back, and do it over and over. As long as the population has a well-defined mean and standard deviation, whatever its shape, those sample means will be approximately normally distributed given a reasonably large sample size (a common rule of thumb is at least 30). This is the main idea of the Central Limit Theorem: the sampling distribution of the sample mean is approximately normal for "large" samples. Created by Sal Khan.
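The procedure described above can be sketched as a short simulation. This is a minimal sketch, not the video's method. The population is assumed to be exponential with mean 1, a strongly right-skewed and clearly non-normal choice made only for illustration, and the helper name `simulate_sample_means` is hypothetical. The code draws many samples of size n, records each sample's mean, and checks where those means center and how spread out they are.

```python
import random
import statistics

def simulate_sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from `n` draws of `draw(rng)`."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

# Hypothetical population: exponential with rate 1 (mean 1, standard deviation 1),
# which is heavily right-skewed and nothing like a normal distribution.
def exponential_draw(rng):
    return rng.expovariate(1.0)

means = simulate_sample_means(exponential_draw, n=30, trials=10_000)
print(f"mean of sample means: {statistics.fmean(means):.3f}")  # near the population mean, 1
print(f"sd of sample means:   {statistics.stdev(means):.3f}")  # near 1 / sqrt(30), about 0.183
```

Plotting a histogram of `means` (with any plotting tool) should show a roughly bell-shaped curve, even though the exponential population itself is lopsided.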

## Want to join the conversation?

- If we know the mean and the standard deviation of the population, why are we taking samples if we already have the data?

Thanks in advance. (27 votes)
- Learning statistics can be a little strange. It almost seems like you're trying to lift yourself up by your own bootstraps. Basically, you learn about populations under the assumption that you know the mean/stdev, which is silly, as you say, but later you begin to drop these assumptions and learn to make inferences about populations based on your samples.

Once you have some version of the Central Limit Theorem, you can start answering some interesting questions, but it takes a lot of study just to get there! (44 votes)

- Is there any difference if I take 1 "sample" with 100 "instances", or I take 100 "samples" with 1 "instance"?

(By sample I mean S_1, S_2, and so on. By instances I mean the numbers in each sample, such as [1,1,3,6] and [3,4,3,1].) (11 votes)
- There is a difference. Your "samples" (random selections of values "x") are made up of "instances" (the number of instances in each sample is the sample size, "n"), and each sample's mean is one building block of your Sampling Distribution of the Sample Mean. The instances you draw determine the value of each sample mean. The size of "n" does not change the center of the sampling distribution (it stays at the population mean), but it does set its standard deviation: the population's standard deviation divided by the square root of "n".

For example: if you were to take 1 "sample" with 100 "instances" [1,1,3,6,3,6,3,1,1,1,1,1...], you would get only one piece of data: the mean of those 100 items. The standard deviation of the sampling distribution of the sample mean would be (the population's S.D.)/(the square root of 100), so that single sample mean will likely be very close to the population mean, but you'd have only one sample mean to look at.

Now if you take 100 samples with 1 instance each, such as [3], you'll get many pieces of data, but no reduction in spread: the standard deviation is (the population's S.D.)/(the square root of 1), which is just the population's S.D. Functionally, with enough samples taken like this, you'll re-create your original distribution! You won't be creating a useful sampling distribution of the sample mean, because the mean of a one-item sample is just that item. With 100 "samples" of 1 "instance", you're randomly picking 100 values of "x" and re-plotting them.

I hope that helps. (9 votes)
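The difference between the two sampling plans can be checked with a quick simulation. This is a sketch under assumptions that are not in the answer: the population is taken to be uniform on [0, 1), whose standard deviation is √(1/12) ≈ 0.289, and the helper `spread_of_sample_means` is hypothetical. With n = 1, the "sample means" are just raw values, so their spread should match the population's. With n = 100, the spread should shrink by a factor of √100 = 10.

```python
import math
import random
import statistics

def spread_of_sample_means(n, trials, seed=1):
    """Standard deviation of `trials` sample means, each from n uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(means)

sigma = math.sqrt(1 / 12)  # population standard deviation of uniform(0, 1)
for n in (1, 100):
    observed = spread_of_sample_means(n, trials=2_000)
    print(f"n={n:>3}: observed {observed:.4f}, sigma/sqrt(n) = {sigma / math.sqrt(n):.4f}")
```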

- So if every distribution approaches normal, when do I employ, say, a Poisson, uniform, or Bernoulli distribution? I suppose it's a concept I haven't reached yet, but how do I know when to use which distribution so that I analyze the data appropriately? End goal = solve real-world problems! (1 vote)
- Not every distribution approaches the Normal. The distribution of the sample mean does, but only as the sample size increases. With smaller sample sizes, assuming normality for either the data or the sample mean may be wholly inappropriate.

In terms of identifying the distribution, sometimes it's a matter of considering the nature of the data (e.g., we might think "Poisson" if the data collected are counts of events per some unit or interval), and sometimes it's a matter of doing some exploratory data analysis (histograms, boxplots, some numerical summaries, and the like).

For actually analyzing data, I would suggest hiring someone with more extensive training in statistics. Taking one course in stats, which is basically what Khan Academy goes through, isn't really enough to prepare someone to be a data analyst. I see the primary goal of taking one or two stats courses as giving you enough information to understand the results of statistical analyses. You can better tell the statistician what you want in his/her own terms, and you can better understand what s/he gives back to you. (11 votes)

- Do your samples have to be the same size? E.g., at 1:05 (ish) there are a bunch of samples with a sample size of four. Would it mess up any calculations if you took a sample of four and then, say, a sample of ten? (2 votes)
- Yes, the sample sizes should be the same. The sample size is not considered a variable; it's considered a constant. The sampling distribution of the sample mean can be thought of as "For a sample of size n, the sample mean will behave according to this distribution." Any random draw from that sampling distribution would be interpreted as the mean of a sample of n observations from the original population. (7 votes)

- What is the difference between "sample distribution" and "sampling distribution"? (2 votes)
- The sample distribution is what you get directly from taking a sample. You plot the value of each item in the sample to get the distribution of values across the single sample. When Sal took a sample in the previous video at 2:04 and got S1 = {1, 1, 3, 6}, and graphed the values that were sampled, that was a sample distribution. The 2nd graph in the video above is a sample distribution because it shows the values that were sampled from the population in the top graph.

The sampling distribution is what you get when you compare the results from several samples. You plot the mean of each sample (rather than the value of each thing sampled). In the previous video, Sal did that starting at 4:29, when he plotted the mean of each sample. The 3rd and 4th graphs above are sampling distributions because each shows a distribution of means from the many samples of a particular size.

http://www.psychstat.missouristate.edu/introbook/SBK19.htm also has an explanation. (4 votes)

- Is it possible to determine the sample variance without the population variance? I have an assignment that requires me to show the sampling distribution of the mean with only a population proportion and sample size. (3 votes)
- If a question talks about a "population proportion", then you are dealing with a binomial distribution, except that you divide by the sample size to get the sample proportion rather than the sample count. If the population proportion is p, then the mean of the sample proportions will also be p (as usual, the mean of the sampling distribution is the same as for the whole population), and the variance will be p(1 - p)/n, where n is the size of the sample. You can read about this distribution here (note that they use the letter pi for the population proportion; it does NOT mean 3.14159...):

http://onlinestatbook.com/2/sampling_distributions/samp_dist_p.html (2 votes)
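The two facts in this answer can be checked by simulation: sample proportions center on p, with variance p(1 − p)/n. This is a minimal sketch. The values p = 0.3 and n = 50 and the helper name `sample_proportions` are illustrative assumptions, not part of the assignment described in the question.

```python
import random
import statistics

def sample_proportions(p, n, trials, seed=2):
    """Simulate `trials` sample proportions, each from n yes/no draws with P(yes) = p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

p, n = 0.3, 50
props = sample_proportions(p, n, trials=20_000)
print(f"mean of sample proportions:     {statistics.fmean(props):.4f}")      # near p = 0.3
print(f"variance of sample proportions: {statistics.pvariance(props):.5f}")  # near p(1 - p)/n = 0.0042
```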

- Why can we say that the sampling distribution of the mean follows a normal distribution for a large enough sample size, even though the population may not be normally distributed? (2 votes)
- Properly, the sampling distribution APPROXIMATES a normal distribution for a sufficiently large sample (sometimes cited as n > 30). A single coin flip is not normally distributed; it is either heads or tails. But the number of heads in 30 coin flips follows a binomial distribution that looks reasonably normal (at least in the middle). (2 votes)
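To see how close "30 coin flips" gets to normal in the middle, you can compare an exact binomial probability with its normal approximation. This sketch uses only the standard library. The window of 13 to 17 heads and the ±0.5 continuity correction are illustrative choices and do not come from the answer.

```python
import math

def binomial_prob_between(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 30, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # 15 and about 2.739

exact = binomial_prob_between(n, p, 13, 17)
# Continuity correction: widen the integer window by 0.5 on each side.
approx = normal_cdf(17.5, mu, sigma) - normal_cdf(12.5, mu, sigma)
print(f"exact {exact:.4f} vs normal approximation {approx:.4f}")
```

The two probabilities should agree to within a few thousandths. In the far tails the relative error is larger, which is what "at least in the middle" hints at.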

- At 8:45, it was said that the central limit theorem is true even for single samples. It is not so; the central limit theorem is applicable only to sample MEANS. For example, out of a population of 5000, if I have taken a sample of n = 50, the central limit theorem does NOT apply to that. It applies only when I have taken (e.g.) 40 samples of n = 50. However, this is as per my understanding. Please correct me if I am wrong. (2 votes)
- What is the difference between X-bar and mu? How do you know which one to use? (1 vote)
- X-bar is the mean of a sample (as Sal says at 4:29 in https://www.khanacademy.org/math/probability/descriptive-statistics/central_tendency/v/statistics-sample-vs-population-mean). You use X-bar for the mean calculated from data that was gathered from only part of the population (such as a survey of 1000 adults out of the entire US population).

Mu is the mean of the entire actual population. You can only calculate mu directly if you have data from every element in the population, such as the 2010 census or every porcupine in the zoo; otherwise mu is the unknown quantity that X-bar estimates. (3 votes)

- How is the sampling distribution of a sample mean related to the sampling distribution of a sample proportion? (2 votes)

## Video transcript

In the last video,
we learned about what is quite possibly the most
profound idea in statistics, and that's the
central limit theorem. And the reason why
it's so neat is, we could start with
any distribution that has a well defined mean
and variance-- actually, I wrote the standard
deviation here in the last video, that
should be the mean, and let's say it
has some variance. I could write it
like that, or I could write the standard
deviation there. But as long as it
has a well-defined mean and standard
deviation, I don't care what the
distribution looks like. What I can do is take samples--
in the last video, of, say, size four-- that means
I take literally four instances of this random
variable, this is one example. I take their mean,
and I consider this the sample mean
from my first trial, or you could almost say
for my first sample. I know it's very confusing,
because you can consider that a sample, the
set to be a sample, or you could consider each
member of the set to be a sample. So that can be a little
bit confusing there. But I have this
first sample mean, and then I keep doing
that over and over. In my second sample,
my sample size is four. I got four instances of
this random variable, I average them, I have
another sample mean. And the cool thing about
the central limit theorem, is as I keep plotting
the frequency distribution of my
sample means, it starts to approach
something that approximates the
normal distribution. And it's going to do a
better job of approximating that normal distribution
as n gets larger. And just so we have
a little terminology under our belt, this
frequency distribution right here that
I've plotted out, or here, or up here that
I started plotting out, that is called-- and
it's kind of confusing, because we use the word
sample so much-- that is called the sampling
distribution of the sample mean. And let's dissect
this a little bit, just so that this
long description of this distribution starts
to make a little bit of sense. When we say it's the
sampling distribution, that's telling us that
it's being derived from-- it's a distribution
of some statistic, which in this case happens
to be the sample mean-- and we're deriving
it from samples of an original distribution. So each of these. So this is my first sample,
my sample size is four. I'm using the
statistic, the mean. I actually could have
done it with other things, I could have done the mode or
the range or other statistics. But sampling distribution
of the sample mean is the most common one. It's probably, in
my mind, the best place to start learning about
the central limit theorem, and even frankly,
sampling distributions. So that's what it's called. And just as a little bit
of background-- and I'll prove this to you
experimentally, not mathematically, but
I think the experimental is on some levels more
satisfying with statistics-- that this will
have the same mean as your original distribution. As your original
distribution right here. So it has the same mean, but
we'll see in the next video that this is actually going
to start approximating a normal distribution, even
though my original distribution that this is kind of generated
from, is completely non-normal. So let's do that with
this app right here. And just to give proper
credit where credit is due, this is-- I think was developed
at Rice University-- this is from onlinestatbook.com. This is their app, which I
think is a really neat app, because it really helps you
to visualize what a sampling distribution of
the sample mean is. So I can literally create my
own custom distribution here. So let me make
something kind of crazy. So you could do this, in
theory, with a discrete or a continuous probability
density function. But what they have here, we
could take on one of 32 values, and I'm just going to set
the different probabilities of getting any of
those 32 values. So clearly, this right here
is not a normal distribution. It looks a little bit bimodal,
but it doesn't have long tails. But what I want to do is,
first just use a simulation to understand, or to
better understand, what the sampling
distribution is all about. So what I'm going
to do is, I'm going to take-- we'll start
with-- five at a time. So my sample size
is going to be five. And so when I click animated,
what it's going to do, is it's going to take five
samples from this probability distribution function. It's going to take five
samples, and you're going to see them
when I click animated, it's going to average them and
plot the average down here. And then I'm going
to click it again, and it's going to do it again. So there you go, it got
five samples from there, it averaged them,
and it hit there. So what did I just do? I clicked-- oh, I
wanted to clear that. Let me make this
bottom one none. So let me do that over again. So I'm going to
take five at a time. So I took five samples from up
here, and then it took its mean and plotted the mean there. Let me do it again. Five samples from this
probability distribution function, plotted
it right there. I could keep doing it. It'll take some time. But you can see I
plotted it right there. Now I could do this 1,000 times,
it's going to take forever. Let's say I just wanted
to do it 1,000 times. So this program,
just to be clear, it's actually generating
the random numbers. This isn't like
a rigged program. It's actually going to generate
the random numbers according to this probability
distribution function. It's going to take five at
a time, find their means, and plot the means. So if I click 10,000, it's
going to do that 10,000 times. So it's going to take five
numbers from here 10,000 times and find their
means 10,000 times and then plot the
10,000 means here. So let's do that. So there you go. And notice it's
already looking a lot like a normal distribution. And like I said, the original
mean of my crazy distribution here was 14.45, and after doing
10,000 samples-- or 10,000 trials-- my mean here is 14.42. So I'm already getting pretty
close to the mean there. My standard deviation, you
might notice, is less than the original distribution's. We'll talk about that
in a future video. And the skew and
kurtosis, these are things that help us measure
how normal a distribution is. And I've talked a little
bit about it in the past, and let me actually just digress
a little bit, it's interesting. And they're fairly
straightforward concepts. Skew literally
tells-- so if this is-- let me do it in
a different color-- if this is a perfect
normal distribution-- and clearly my drawing is
very far from perfect-- if that's a perfect
normal distribution, this would have a skew of zero. If you have a
positive skew, that means you have a larger right
tail than you would otherwise expect. So something with a positive
skew might look like this. It would have a large
tail to the right. So this would be
a positive skew, which makes it a
little less than ideal for a normal distribution. And a negative skew
would look like this, it has a long tail to the left. So negative skew
might look like that. So that is a negative skew. If you have trouble
remembering it, just remember which
direction the tail is going. This tail is going towards
a negative direction, this tail is going to
the positive direction. So if something
has no skew, that means that it's nice and
symmetrical around its mean. Now kurtosis, which sounds
like a very fancy word, is similarly not that
fancy of an idea. So once again, if I were to draw
a perfect normal distribution. Remember, there is no
one normal distribution, you could have
different means and different standard deviations. Let's say that's a perfect
normal distribution. If I have positive kurtosis,
what's going to happen is, I'm going to have
fatter tails-- let me draw it a little nicer than
that-- I'm going to have fatter tails, but I'm going to
have a more pointy peak. I didn't have to
draw it that pointy, let me draw it like this. I'm going to have
fatter tails, and I'm going to have a more pointy
peak than a normal distribution. So this right here
is positive kurtosis. So something that has
positive kurtosis-- depending on how positive it
is-- it tells you it's a little bit more pointy than
a real normal distribution. And negative kurtosis
has thinner tails and a flatter peak
near the middle. So it's like this. So something like this would
have negative kurtosis. And maybe in future videos we'll
explore that in more detail, but in the context
of the simulation, it's just telling us how
normal this distribution is. So when our sample
size was n equal 5 and we did 10,000 trials,
we got pretty close to a normal distribution. Let's do another 10,000 trials,
just to see what happens. It looks even more like
a normal distribution. Our mean is now the
exact same number, but we still have a
little bit of skew, and a little bit of kurtosis. Now let's see what happens if we
do the same thing with a larger sample size. And we could actually
do them simultaneously. So here's n equal 5. Let's do here, n equals 25. Just let me clear them. I'm going to do the sampling
distribution of the sample mean. And I'm going to run
10,000 trials-- I'll do one animated trial, just so
you remember what's going on. So I'm literally taking first
five samples from up here, find their mean. Now I'm taking 25 samples
from up here, find their mean, and then plot it down here. So here the sample size
is 25, here it's five. I'll do it one more time. I take five, get
the mean, plot it. Take 25, get the mean, and
then plot it down there. This is a larger sample size. Now that thing that I just did,
I'm going to do 10,000 times. And remember, our
first distribution was just this really crazy
very non-normal distribution, but once we did it--
whoops, I didn't want to make it that big. Scroll up a little bit. So here, what's interesting? I mean they both
look a little normal, but if you look at the
skew and the kurtosis, when our sample size is
larger, it's more normal. This has a lower skew than when
our sample size was only five. And it has a less
negative kurtosis than when our sample
size was five. So this is a more
normal distribution. And one thing that we're
going to explore further in a future video, is not only
is it more normal in its shape, but it's also a tighter
fit around the mean. And you can even think about
why that kind of makes sense. When your sample size
is larger, your odds of getting really far away
from the mean are lower, because it's very
unlikely, if you're taking 25 samples,
or 100 samples, that you're just going to get a
bunch of stuff way out here, or a bunch of
stuff way out here. You're very likely to get a
reasonable spread of things. So it makes sense that your
mean-- your sample mean-- is less likely to be far
away from the mean. We're going to talk a little
bit more about this in the future. But hopefully this
kind of satisfies you that-- at least
experimentally, I haven't proven it to you with
mathematical rigor, which hopefully we'll
do in the future. But hopefully this
satisfies you, at least experimentally, that the
central limit theorem really does apply to any distribution with a well-defined mean and standard deviation. I mean, this is a
crazy distribution. And I encourage you to use this
applet at onlinestatbook.com and experiment with
other crazy distributions to believe it for yourself. But the interesting
thing is that we're approaching a
normal distribution, and as my sample
size got larger, it's a better fit for
a normal distribution.
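The applet experiment in this transcript can be approximated offline. This is a sketch under stated assumptions. The 32-value distribution below uses made-up weights and is not the one drawn in the video. Skewness and kurtosis are computed with the plain moment formulas, and kurtosis is reported as excess kurtosis so that a normal distribution scores about 0. The applet at onlinestatbook.com may use slightly different formulas. The script runs 10,000 trials each for n = 5 and n = 25 and reports the mean, standard deviation, skew, and kurtosis of the sample means.

```python
import math
import random
import statistics

def skewness(xs):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m4 = statistics.fmean((x - m) ** 4 for x in xs)
    return m4 / m2 ** 2 - 3

# Hypothetical "crazy" population: values 0..31 with lumpy, made-up weights.
VALUES = list(range(32))
WEIGHTS = [5 if v < 8 else 1 if v < 24 else 3 for v in VALUES]

def sample_means(n, trials, seed=3):
    """Draw `trials` samples of size n from the weighted population; return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(VALUES, weights=WEIGHTS, k=n)) for _ in range(trials)]

total_weight = sum(WEIGHTS)
pop_mean = sum(v * w for v, w in zip(VALUES, WEIGHTS)) / total_weight
pop_sd = math.sqrt(sum(w * (v - pop_mean) ** 2 for v, w in zip(VALUES, WEIGHTS)) / total_weight)

for n in (5, 25):
    means = sample_means(n, trials=10_000)
    print(
        f"n={n:>2}: mean {statistics.fmean(means):6.2f} (population {pop_mean:.2f}), "
        f"sd {statistics.pstdev(means):5.2f} (sigma/sqrt(n) = {pop_sd / math.sqrt(n):.2f}), "
        f"skew {skewness(means):+.3f}, kurtosis {excess_kurtosis(means):+.3f}"
    )
```

For independent draws, theory says the skewness of the sample mean shrinks like 1/√n and its excess kurtosis like 1/n. So the n = 25 row should usually show skew and kurtosis closer to 0 than the n = 5 row, mirroring the video. With 10,000 trials, though, the random noise in these two estimates is not negligible.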
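Two claims in the transcript have a short derivation: the sampling distribution has the same mean as the original distribution, and a larger sample gives a tighter fit around that mean. The derivation assumes that the n observations X_1, ..., X_n in each sample are independent draws from a population with mean μ and variance σ², as when sampling with replacement. Expectations always add, variances of independent variables add, and the factor 1/n comes out squared in the variance:

```latex
\begin{aligned}
\mathbb{E}[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n},
\qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```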