# Margin of error 1

Finding the 95% confidence interval for the proportion of a population voting for a candidate. Created by Sal Khan.

## Want to join the conversation?

• The variance you got doesn't match the variance calculated by sqrt[p(1 - p)].
I would like to know why.
• There are two possible explanations:

The relevant variance is p(1-p); your calculation of √(p(1-p)) is the standard deviation.

If that's not the reason, then note that Sal is working by treating "successes" as a 1 and "failures" as a 0, and then applying the typical sample variance formula, including division by n-1. The p(1-p) formula assumes division by n. Using p(1-p) will get you 0.2451, whereas Sal got 0.2476.
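The two divisors can be compared directly. This is a minimal sketch, assuming the sample is the 57 zeros and 43 ones used elsewhere in this thread; the variable names are only illustrative.

```python
import statistics

# Sample as described in the thread: 57 "failures" (0) and 43 "successes" (1).
sample = [0] * 57 + [1] * 43

p_hat = statistics.mean(sample)               # sample proportion
var_n = statistics.pvariance(sample)          # divides by n, equals p_hat * (1 - p_hat)
var_n_minus_1 = statistics.variance(sample)   # divides by n - 1

print(p_hat, round(var_n, 4), round(var_n_minus_1, 4))
# → 0.43 0.2451 0.2476
```

With n = 100 the two results differ only in the third decimal place, and the gap shrinks as n grows.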
• In my stats module the interpretation of the confidence interval is not that we estimate the true population mean to be in a certain interval, as the true population mean is not a variable and is not subject to probability statements.
Rather, the confidence interval should be interpreted as saying: if I took a large number of samples from the population and used each sample mean as the center of a confidence interval, then the percentage of those intervals that contain the true population mean would be the confidence level. Is this correct?
• There are two things to note here:

1. Yes, you are correct in your understanding of a confidence interval and its interpretation.

2. The population mean can be subject to probability statements. For instance, it is perfectly valid to write:

0.95 = P( Xbar - 1.96 σ/√n < µ < Xbar + 1.96 σ/√n )

This is how we derive the formula for the confidence interval. However, this is only a valid probability statement when we are thinking of the sample mean, Xbar, as a random variable. The moment we use our sample to calculate the sample mean and plug it in, we have two actual numbers for the bounds of the interval, and hence there is nothing random anymore and we need to switch to the "confidence" interpretation. But before we plug in the observed sample mean to the formula, the "probability" interpretation is still valid.
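The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under assumptions that are not in the thread: a hypothetical true proportion of 0.43, samples of size 100, the normal-approximation interval p-hat ± 1.96 standard errors, and an illustrative function name `coverage`.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def coverage(p_true=0.43, n=100, trials=2000, z=1.96):
    """Fraction of simulated intervals that contain p_true (hypothetical helper)."""
    hits = 0
    for _ in range(trials):
        # Draw one sample of n voters; each is a 1 with probability p_true.
        successes = sum(random.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # Each sample produces its own interval; the true value stays fixed.
        if p_hat - z * se <= p_true <= p_hat + z * se:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)  # should land near 0.95
```

Each individual interval either contains 0.43 or it does not; the 95% describes how often the procedure succeeds across many samples.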
• I am totally confused. What is s?
s-squared is the variance of the sample. So if you take its square root, that's the standard deviation of the sample. Then why do you have to divide that by the square root of the sample size (n) to get the standard deviation? Why are there two standard deviations? What is the second standard deviation of?
• Can we also use the formula for "sampling distribution of sample proportion" here?

sqrt(p(1-p)/n) = sqrt(0.43*0.57/100) = sqrt(0.002451) = 0.0495 ≈ 0.05
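As a rough check, the proportion formula can be set beside the s/√n version. The sketch below assumes the same 57/43 sample discussed in this thread, and its variable names are made up for illustration.

```python
import math
import statistics

sample = [0] * 57 + [1] * 43  # assumed sample: 43 ones out of 100
n = len(sample)
p_hat = sum(sample) / n

# Standard error from the sample-proportion formula (divides the variance by n).
se_formula = math.sqrt(p_hat * (1 - p_hat) / n)

# Standard error from the sample standard deviation s (variance uses n - 1).
se_sample = statistics.stdev(sample) / math.sqrt(n)

print(round(se_formula, 4), round(se_sample, 4))
# → 0.0495 0.0498
```

Both values round to 0.05, so either route gives essentially the same standard error for a sample this size.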
• I don't understand how the sample mean can be 0.43 while the sample standard deviation is 0.50. Wouldn't that possibly put one standard deviation to the left of the mean at a negative value?
• I can't watch the video right now, but even if it's impossible for the things being measured to be negative (e.g. rainfall), those sorts of numbers can come out. If there are a lot of data in some cluster around 0.4, but then a few pretty large numbers - outliers - then the standard deviation can get pretty inflated. Some non-normal distributions might exhibit something like this.
• Sal mentions 100 possible values. What do these values represent? And the distribution is the sampling distribution of the sample mean; why is that, and how is it related to the 100 discrete values? Thanks.
• The values represent the people who answered the survey about which candidate they were going to vote for. If someone indicated that they would vote for person A, then their vote would be assigned a value of 1. Otherwise, they indicated that they would vote for B and thus their vote would be assigned a value of 0. Hope that helps.
• None of that made sense to me, and I didn't understand where exactly you explained margin of error. Could you tell me what the margin of error is and how to calculate it?
• Hi,
The formula is ME (margin of error) = 2 * sqrt(p-hat * (1 - p-hat) / n), where n is the number of people surveyed. The 2 stands for two standard deviations, which corresponds to a 95% confidence interval. P-hat is the result of the survey as a decimal.
For example, say 100 people take a survey, so n = 100, and 48% of them say they dislike English, so p-hat = 0.48. Then 0.48 * (1 - 0.48) = 0.2496. Dividing that by 100 gives 0.002496, and the square root of that is 0.049959984. Multiplying by 2 gives 0.099919968. Multiply by 100 to turn it into a percent: 9.99%, which rounds to 10%. That is the margin of error. At least I think it is; I am still learning all this stuff.
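The arithmetic above can be wrapped in a small function. This is a sketch rather than a standard library routine: the name `margin_of_error` and its parameters are made up here, and the multiplier of 2 is the "two standard deviations" approximation described in this answer.

```python
import math

def margin_of_error(p_hat, n, z=2):
    """Approximate margin of error for a sample proportion (hypothetical helper)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

me = margin_of_error(0.48, 100)   # 48% of 100 people surveyed
print(round(me * 100, 2))         # margin of error as a percent
# → 9.99
```

Passing `z=1.96` instead of the default would give a slightly narrower margin for the same 95% level.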
• Could there be a possibility that the sample mean would not equal the population mean? For example, if we surveyed only the people who were going to vote for the given candidate, the sample mean would not equal the population mean. Please correct me if I'm wrong.
• The sample mean will often not equal the population mean. That's somewhat the point of Statistics: different samples will give different results, and we want to use just one sample to be able to generalize to the population.
• Why does Sal write µ_x̄ instead of x̄ when he refers to the sampling distribution of the sample mean? Isn't that the same as the sample mean x̄?
• Mu of xbar is the mean of the normal distribution of sample means (and is also the population mean). So of course it's not the same as our (one) sample mean.
• How does he know which value should equal 0 and which value should equal 1? He said that they could be switched, but if they were, the sample mean would be different. So how does that work?
• If you switched 0 and 1, the mean would indeed have a different value, because it would represent the proportion of people who vote for candidate A instead of the proportion who vote for candidate B. But it wouldn't affect the variance or standard deviation.

Using the values from Sal's example:
``x-bar = (57 * 0 + 43 * 1) / 100 = 0.43``

``s^2 = (57(0 - 0.43)^2 + 43(1 - 0.43)^2) / 99 = (57 * 0.43^2 + 43 * 0.57^2) / 99 ≈ 0.2476``

If you switch 0 and 1, you get
``x-bar = (57 * 1 + 43 * 0) / 100 = 0.57``

``s^2 = (57(1 - 0.57)^2 + 43(0 - 0.57)^2) / 99 = (57 * 0.43^2 + 43 * 0.57^2) / 99 ≈ 0.2476``

So the sample variance is unchanged.

Since the width of a confidence interval depends only on the sample variance (for a fixed sample size and confidence level), the confidence interval bounds would be the same distance either side of the new mean as they were from the original mean.

In the second part to this video, Sal concludes that the 95% confidence interval is from 33% to 53% (10% either side of x-bar = 43%). That represents the expected proportion of votes for candidate B, which means the expected proportion of votes for candidate A would be 47% (= 100% - 53%) to 67% (= 100% - 33%).

After switching 0 and 1, you'd instead get a confidence interval of 47% to 67% (10% either side of x-bar = 57%), which represents the expected proportion of votes for candidate A. Thus, the expected proportion of votes for candidate B would be 33% (= 100% - 67%) to 53% (= 100% - 47%).

So, ultimately, you get the same results whichever way you assign 0 and 1.
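The symmetry can also be confirmed numerically. The sketch below assumes the 57/43 sample from this answer and uses a multiplier of 2 standard errors to match the "10% either side" figure; the helper name `interval` is made up for illustration.

```python
import math
import statistics

def interval(sample, z=2):
    """Approximate confidence interval: mean ± z standard errors (hypothetical helper)."""
    mean = statistics.mean(sample)
    se = math.sqrt(statistics.variance(sample) / len(sample))
    return mean - z * se, mean + z * se

coded_b = [0] * 57 + [1] * 43        # 1 = vote for candidate B
coded_a = [1 - x for x in coded_b]   # same voters, 0 and 1 switched

low_b, high_b = interval(coded_b)
low_a, high_a = interval(coded_a)

print(round(low_b, 2), round(high_b, 2))   # → 0.33 0.53
print(round(low_a, 2), round(high_a, 2))   # → 0.47 0.67
```

Each bound under one coding is 1 minus the opposite bound under the other, which is the complement relationship described above.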