Techniques of Statistical
Analysis I
Lect_4: Confidence Intervals for the
proportion
Bruno Arpino
Outline
Sample proportion
Confidence interval estimation for a population proportion
Example
We want to estimate the % of people approving the way Obama is handling the economy.
Here you can find results from some polls:
http://www.pollingreport.com/obama_ad.htm
According to the ABC News/Washington Post Poll conducted in the period Sept. 29-Oct. 2, 2011 on a sample of 1,002 adults nationwide, this percentage was 35%.
Point estimation
Similarly to what we have seen for the mean, the best point estimator for the population proportion, π, is the sample proportion, p:
$$p = \frac{\text{number of units in the sample having the characteristic of interest}}{\text{sample size}}$$
This is an unbiased estimator: E(p) = π
In the previous example, p = 35% out of n = 1002 persons means that in the sample 0.35*1002 ≈ 351 interviewed persons said “approve” (351/1002 ≈ 0.35).
What is the variable we are modelling?
Binary variables are usually coded in this way: X = 0 for units without the characteristic of interest (“disapprove” or “unsure”) and X = 1 for units with the characteristic of interest (“approve”).
A proportion can be seen as the mean of the binary variable X:
$$\pi = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{\text{number of units in the population having the characteristic of interest}}{\text{population size}}$$
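The same identity holds in a sample. A minimal sketch, assuming a small made-up list of coded answers (the data below are hypothetical and not taken from the poll):

```python
# Hypothetical coded answers: 1 = "approve", 0 = "disapprove" or "unsure".
answers = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

# Proportion obtained by counting the units with the characteristic of interest.
proportion_by_count = answers.count(1) / len(answers)

# Proportion obtained as the mean of the binary variable.
proportion_as_mean = sum(answers) / len(answers)

print(proportion_by_count, proportion_as_mean)
```

Both computations give the same number because summing a 0/1 variable is the same as counting the ones.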
Assumption: n is “large enough”.
In particular, the CI formula we consider is valid if n is such that: n*p*(1 – p) > 9
An alternative rule of thumb: the formula is OK if at least 15 observations are in the category of interest (“approve”) and at least 15 observations are not in it.
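Both rules of thumb are easy to check mechanically. The sketch below is only an illustration: the function name `large_enough` is made up, and the counts are recovered by rounding n*p, which is an assumption when only a percentage is reported.

```python
def large_enough(n, p):
    """Return the two rule-of-thumb checks as a pair of booleans."""
    # Rule 1: n * p * (1 - p) must exceed 9.
    rule_npq = n * p * (1 - p) > 9
    # Rule 2: at least 15 observations in the category and 15 outside it.
    # Assumption: counts are reconstructed by rounding n * p.
    successes = round(n * p)
    failures = n - successes
    rule_15 = successes >= 15 and failures >= 15
    return rule_npq, rule_15

poll_check = large_enough(1002, 0.35)   # the poll from the example
small_check = large_enough(30, 0.10)    # hypothetical small sample
print(poll_check, small_check)
```

For the poll both checks pass, while the hypothetical sample of 30 with p = 0.10 fails both.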
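Confidence interval for a proportion
Confidence interval for the population proportion (we skip the details):
$$CI_{1-\alpha}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$
where zα/2 is the standard normal value for the level of confidence desired, p is the sample proportion and n is the sample size.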
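A minimal sketch of this formula in code, assuming Python's standard library is used to look up zα/2. The function name `proportion_ci` and the input values are illustrative, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    alpha = 1 - confidence
    # Standard normal value leaving alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard error of the sample proportion.
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical inputs: sample proportion 0.40 from n = 500 observations.
lower, upper = proportion_ci(0.40, 500)
print(f"({lower:.4f}, {upper:.4f})")
```

The function looks z up for any confidence level, so the rounded table values 1.96 or 2.58 may give results that differ slightly in the last decimals.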
A closer look at the CI
Point Estimate ± (Reliability Factor)(Standard Error)
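$$CI_{1-\alpha}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$
Here p is the point estimate, zα/2 is the reliability factor and $\sqrt{p(1-p)/n}$ is the standard error.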
The width is:
$$W = 2\, z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
And the margin of error is:
$$ME = \frac{W}{2} = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
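To see how the margin of error behaves, one can evaluate it for several values of p at a fixed n. This is only a sketch: the sample size n = 1000 and the grid of p values are assumptions chosen for illustration, and z = 1.96 corresponds to 95% confidence.

```python
from math import sqrt

Z_95 = 1.96  # standard normal value for 95% confidence

def margin_of_error(p, n, z=Z_95):
    """Half-width of the large-sample CI for a proportion."""
    return z * sqrt(p * (1 - p) / n)

def width(p, n, z=Z_95):
    """Full width of the CI, i.e. twice the margin of error."""
    return 2 * margin_of_error(p, n, z)

n = 1000  # hypothetical sample size
me_by_p = {p: margin_of_error(p, n) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
for p, me in me_by_p.items():
    print(f"p = {p:.1f}: ME = {me:.4f}, W = {width(p, n):.4f}")
```

For a fixed n the margin of error is largest at p = 0.5 and shrinks as p moves towards 0 or 1.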
Exercise
A random sample of 100 people shows that 25 are left-handed.
Form a 95% confidence interval for the true proportion of left-handers.
n = 100; p = 25/100 = 0.25; 1 - α = 0.95; zα/2 = 1.96.
Exercise (cont’d)
$$CI_{0.95}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right) = (0.1651,\; 0.3349)$$
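The arithmetic behind this interval can be spelled out step by step, using n = 100, p = 0.25 and zα/2 = 1.96 (intermediate values rounded to four decimals):

```latex
\begin{align*}
\sqrt{\frac{p(1-p)}{n}} &= \sqrt{\frac{0.25 \times 0.75}{100}} = \sqrt{0.001875} \approx 0.0433 \\
z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} &\approx 1.96 \times 0.0433 \approx 0.0849 \\
CI_{0.95}(\pi) &\approx (0.25 - 0.0849,\; 0.25 + 0.0849) = (0.1651,\; 0.3349)
\end{align*}
```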
Exercise (cont’d): interpretation
$$CI_{0.95}(\pi) = (0.1651,\; 0.3349)$$
We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
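(Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of CIs obtained by drawing several random samples from the population contain the true population proportion.)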
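The repeated-sampling reading of the confidence level can be explored with a small simulation. This is a sketch under stated assumptions: the true proportion is set to 0.25 purely for illustration (in practice it is unknown), and the function name `coverage`, the number of simulated samples and the seed are arbitrary choices.

```python
import random
from math import sqrt

def coverage(true_pi, n, n_samples=2000, z=1.96, seed=0):
    """Share of simulated CIs that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one random sample of n binary observations.
        successes = sum(rng.random() < true_pi for _ in range(n))
        p = successes / n
        me = z * sqrt(p * (1 - p) / n)
        # Count the interval if it contains the true proportion.
        if p - me <= true_pi <= p + me:
            hits += 1
    return hits / n_samples

share_covering = coverage(true_pi=0.25, n=100)
print(share_covering)
```

The share should come out close to 0.95, though not exactly, since the formula is a large-sample approximation and the simulation itself is random.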
Exercise 2
The ABC News/Washington Post Poll we considered before reports a margin of error of ± 4 for all the polls. What is wrong with this statement? Can you calculate a 95% CI with the available data? Can you safely conclude that a minority of US citizens approve of Obama?
ME = W/2 of a CI. It depends on the value of p!!!
$$ME = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Exercise 2 (cont’d)
While phrases such as “The poll has a margin of error of plus or minus 4” (percentage points!!!) are commonly heard, an additional qualification such as “at a 95 percent confidence level” is also needed in order to indicate precisely what the error refers to.
Moreover, the ME is always positive! (The ± does not make sense.)
$$CI_{0.95}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$
$$CI_{0.95}(\pi) = \left( 0.35 - 1.96\sqrt{\frac{0.35 \times 0.65}{1002}},\; 0.35 + 1.96\sqrt{\frac{0.35 \times 0.65}{1002}} \right) = (0.32,\; 0.38)$$
So actually ME = 0.03! (They probably report the maximum ME for all the polls, or use a confidence level of 99%.) In fact, if we set 1 - α = 0.99, zα/2 = 2.58, then ME = 0.039 (CI = (0.31, 0.39)). So, also in this case, all the “plausible” values are below 50%. We are 99% confident that the proportion of US citizens approving Obama is below 50%.
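The numbers in this exercise can be reproduced with a few lines of code. This is a sketch using the rounded z values quoted in the slides (1.96 and 2.58); the variable names are illustrative. The last two lines evaluate the largest possible ME for n = 1002 (at p = 0.5), which is one way to probe the guess about the reported ± 4, without claiming that this is how the pollster computed it.

```python
from math import sqrt

p, n = 0.35, 1002            # poll result and sample size from the example
se = sqrt(p * (1 - p) / n)   # estimated standard error

results = {}
for confidence, z in ((0.95, 1.96), (0.99, 2.58)):
    me = z * se
    results[confidence] = (me, p - me, p + me)
    print(f"{confidence:.0%}: ME = {me:.3f}, CI = ({p - me:.2f}, {p + me:.2f})")

# Largest possible ME for this sample size, reached at p = 0.5.
max_me_95 = 1.96 * sqrt(0.5 * 0.5 / n)
max_me_99 = 2.58 * sqrt(0.5 * 0.5 / n)
print(f"max ME: {max_me_95:.3f} (95%), {max_me_99:.3f} (99%)")
```

At both confidence levels the upper limit of the interval stays below 0.5.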
Homework (w/o grade!)
http://www.prenhall.com/agresti/applet_files/propci.html
You can use it to check to what extent a statement like “95% of CIs obtained by drawing several random samples from the population contain the true population proportion” is correct.
Compare these two cases: n = 1000, p = 0.5 and n = 1000, p = 0.1.
The bootstrap method
It is a (more) modern method (due to Brad Efron) for generating CIs without using mathematical methods to derive a sampling distribution that assumes a particular population distribution. It is based on repeatedly taking samples of size n (with replacement) from the sample data distribution.
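A minimal sketch of the resampling idea, applied to the left-handers data (25 out of 100). The slides do not say how the interval is read off the resampled proportions; the percentile method used here is an assumption, as are the function name `bootstrap_ci`, the number of resamples and the seed.

```python
import random

def bootstrap_ci(data, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the proportion of ones in binary data."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n observations with replacement and record each proportion.
    props = sorted(
        sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    # Positions of the alpha/2 and 1 - alpha/2 percentiles in the sorted list.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return props[lo_idx], props[hi_idx]

# Observed sample: 25 left-handers (1) and 75 others (0).
sample = [1] * 25 + [0] * 75
boot_lower, boot_upper = bootstrap_ci(sample)
print(boot_lower, boot_upper)
```

With a sample of this size the bootstrap interval is expected to be similar to the formula-based one, although the exact limits depend on the random resamples.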
If something is not clear
(or you find mistakes in the slides)
do not hesitate to come to office hours
or e-mail me