Lecture_4 [Modo de compatibilidad]

Academic year: 2020

(1)

Techniques of Statistical Analysis I

Lect_4: Confidence Intervals for the proportion

Bruno Arpino

(2)

Outline

Sample proportion

Confidence interval estimation for a population proportion

(3)

Example

We want to estimate the percentage of people approving the way Obama is handling the economy.

Here you can find results from some polls:

http://www.pollingreport.com/obama_ad.htm

According to the ABC News/Washington Post Poll conducted in the period Sept. 29-Oct. 2, 2011 on a sample of 1,002 adults nationwide, this percentage was 35%.

(4)

Point estimation

Similarly to what we have seen for the mean, the best point estimator for the population proportion, π, is the sample proportion, p:

$$p = \frac{\text{number of units in the sample having the characteristic of interest}}{\text{sample size}}$$

This is an unbiased estimator: E(p) = π

In the previous example: p = 35% out of n = 1002 persons means that in the sample 0.35*1002 ≈ 351 interviewed persons said “approve” (351/1002 ≈ 0.35)
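As a small illustration of the point estimate, the sketch below recovers the count of “approve” answers from the reported 35% and then recomputes p. The function name `sample_proportion` is hypothetical and not part of the slides.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return p = successes / n, the point estimate of pi."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    return successes / n

# Poll example from the slides: 35% of n = 1002 respondents approve.
n = 1002
approve = round(0.35 * n)          # 350.7 rounds to 351 respondents
p = sample_proportion(approve, n)
print(approve, round(p, 2))
```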

(5)

What is the variable we are modelling?

Consider again the previous example.

For each unit the variable of interest is “X = approves of Obama or not”, i.e., a binary variable (a variable that takes only two values: “yes”, “other”).

Binary variables are usually coded in this way: X = 0 for units without the characteristic of interest (“disapprove” or “unsure”) and X = 1 for units with the characteristic of interest (“approve”).

A proportion can be seen as the mean of the binary variable X:

$$\pi = \frac{\text{number of units in the population having the characteristic of interest}}{\text{population size}} = \frac{\sum_{i=1}^{N} X_i}{N}$$
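The identity between a proportion and the mean of a 0/1 variable can be checked on a toy data set. The ten coded responses below are invented for illustration only.

```python
from statistics import mean

# Hypothetical coded responses: 1 = "approve", 0 = "disapprove" or "unsure".
x = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

proportion_by_count = x.count(1) / len(x)   # units with the characteristic / size
proportion_by_mean = mean(x)                # mean of the binary variable

print(proportion_by_count, proportion_by_mean)
```

Both lines compute the same quantity, which is why results for the mean carry over to the proportion.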

(6)

Assumption: n is “large enough”

X only takes values 0 and 1. So, the normal distribution is not appropriate.

The distribution of the sample proportion is binomial but can be approximated by a normal distribution if the sample is “large enough”.

In particular, the CI formula we consider is valid if n is such that: n*p*(1 – p) > 9

An alternative rule of thumb: the formula is OK if at least 15 observations are in the category of interest (“approve”) and at least 15 observations are not in it.
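The two rules can be sketched as a quick check. The helper name `large_enough` is hypothetical, and the sample of 3 out of 40 is invented to show a case that fails both rules.

```python
def large_enough(successes: int, n: int) -> dict:
    """Evaluate the two sample-size rules stated on the slide."""
    p = successes / n
    failures = n - successes
    return {
        "npq_rule": n * p * (1 - p) > 9,                      # n*p*(1-p) > 9
        "fifteen_rule": successes >= 15 and failures >= 15,   # 15 in, 15 out
    }

print(large_enough(351, 1002))  # poll example
print(large_enough(25, 100))    # 25 successes out of 100
print(large_enough(3, 40))      # hypothetical small sample
```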

(7)

Confidence interval for a proportion

Again similarly to what we have seen for the mean, we prefer a confidence interval estimation to a point estimation in order to take sampling variability into account.

Using the CI formula for the mean we can derive a CI for a proportion (we skip the details):

$$CI_{1-\alpha}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$

where $z_{\alpha/2}$ is the standard normal value for the level of confidence desired, p is the sample proportion and n is the sample size.
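A minimal sketch of this formula in Python follows. The function name `proportion_ci` is hypothetical, and the standard normal value is obtained from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, confidence=0.95):
    """Normal-approximation CI: p -/+ z_{alpha/2} * sqrt(p(1-p)/n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    se = sqrt(p * (1 - p) / n)                # estimated standard error of p
    return p - z * se, p + z * se

# Poll example: p = 0.35, n = 1002.
low, high = proportion_ci(0.35, 1002)
print(round(low, 2), round(high, 2))
```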

(8)

A closer look at the CI

Remember the general structure of a CI:

Point Estimate ± (Reliability Factor)(Standard Error)

$$CI_{1-\alpha}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$

The width is:

$$W = 2 \cdot z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

And the margin of error is:

$$ME = W/2 = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
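To see how the pieces fit together, the sketch below splits the interval into standard error, margin of error and width, using the poll figures (p = 0.35, n = 1002). The helper `ci_parts` is a hypothetical name. The second call is an added illustration: because n sits under a square root, a sample four times larger gives an interval half as wide.

```python
from math import sqrt

def ci_parts(p, n, z=1.96):
    """Split the CI into standard error, margin of error and width."""
    se = sqrt(p * (1 - p) / n)   # standard error
    me = z * se                  # margin of error = reliability factor * SE
    return {"se": se, "me": me, "width": 2 * me}

small = ci_parts(0.35, 1002)
large = ci_parts(0.35, 4 * 1002)   # four times as many respondents
ratio = small["width"] / large["width"]
print(round(small["me"], 3), round(small["width"], 3), round(ratio, 3))
```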

(9)

Exercise

A random sample of 100 people shows that 25 are left-handed.

Form a 95% confidence interval for the true proportion of left-handers.

(10)

Exercise (cont’d)

n = 100; p = 25/100 = 0.25; 1 − α = 0.95; $z_{\alpha/2}$ = 1.96.

$$CI_{0.95}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right) = (0.1651,\; 0.3349)$$
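The arithmetic behind this interval can be written out step by step (intermediate values rounded to four decimals):

$$
\begin{aligned}
\sqrt{\frac{p(1-p)}{n}} &= \sqrt{\frac{0.25 \times 0.75}{100}} = \sqrt{0.001875} \approx 0.0433 \\
z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} &\approx 1.96 \times 0.0433 \approx 0.0849 \\
CI_{0.95}(\pi) &\approx (0.25 - 0.0849,\; 0.25 + 0.0849) = (0.1651,\; 0.3349)
\end{aligned}
$$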

(11)

Exercise (cont’d): interpretation

$$CI_{0.95}(\pi) = (0.1651,\; 0.3349)$$

We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.

(Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed in this manner from repeated random samples will contain the true proportion.)

(12)

Exercise 2

The ABC News/Washington Post Poll we considered before reports a margin of error of ± 4 for all the polls. What is wrong with this statement? Can you calculate a 95% CI with the available data? Can you safely conclude that a minority of US citizens approve of Obama?

ME = W/2 of a CI. It depends on the value of p!!!

$$CI_{1-\alpha}(\pi) = \left( p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right)$$

While phrases such as “The poll has a margin of error of plus or minus 4” (percentage points!!!) are commonly heard, an additional qualification such as "at a 95 percent confidence level" is also needed in order to indicate precisely what the error refers to.

Moreover, the ME is always positive! (“±” does not make sense.)
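That the ME depends on p can be seen by holding n fixed at the poll's 1002 and varying p. The values of p below are chosen for illustration, and `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """ME = z * sqrt(p(1-p)/n) at the confidence level implied by z."""
    return z * sqrt(p * (1 - p) / n)

n = 1002  # sample size of the poll
for p in (0.10, 0.35, 0.50, 0.90):
    print(f"p = {p:.2f}  ME = {margin_of_error(p, n):.3f}")
```

The ME is largest at p = 0.5 and shrinks symmetrically as p moves towards 0 or 1, so a single ME quoted for many polls can at best be an upper bound.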

(13)

Exercise 2

Assume 1 − α = 0.95; $z_{\alpha/2}$ = 1.96

We know: n = 1002; p = 0.35

We can calculate a 95% CI:

$$CI_{0.95}(\pi) = \left( 0.35 - 1.96\sqrt{\frac{0.35 \times 0.65}{1002}},\; 0.35 + 1.96\sqrt{\frac{0.35 \times 0.65}{1002}} \right) = (0.32,\; 0.38)$$

So actually ME = 0.03! (They probably report the maximum ME for all the polls, or use a confidence level of 99%.)

In fact, if we set 1 − α = 0.99; $z_{\alpha/2}$ = 2.58, then ME = 0.039 (CI = (0.31, 0.39)). So, also in this case all the “plausible” values are below 50%. We are 99% confident that the proportion of ...
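The two confidence levels can be compared in a few lines. This sketch takes the z values from `statistics.NormalDist` instead of the rounded table values 1.96 and 2.58, so the third decimal may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1002, 0.35
se = sqrt(p * (1 - p) / n)   # estimated standard error of p

for confidence in (0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 and 2.58 after rounding
    me = z * se
    low, high = p - me, p + me
    print(f"{confidence:.0%}: z = {z:.2f}, ME = {me:.3f}, "
          f"CI = ({low:.2f}, {high:.2f}), below 50%: {high < 0.5}")
```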

(14)

Homework (w/o grade!)

Check out this applet on CI estimation for a proportion:

http://www.prenhall.com/agresti/applet_files/propci.html

You can use it to check to what extent a statement like “95% of the CIs obtained by drawing several random samples from the population contain the true population proportion” is correct.

Compare these two cases: n = 1000, p = 0.5 and n = 1000, p = 0.1
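If the applet is unavailable, the same experiment can be approximated with a short simulation. This is only a sketch and not the applet's code: the function name `coverage`, the number of repetitions and the fixed seed are all assumptions made for illustration.

```python
import random
from math import sqrt

def coverage(pi, n, reps=1000, z=1.96, seed=0):
    """Share of simulated 95% CIs that contain the true proportion pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < pi for _ in range(n))  # one random sample
        p = successes / n
        me = z * sqrt(p * (1 - p) / n)
        hits += (p - me) <= pi <= (p + me)   # True counts as 1
    return hits / reps

for pi in (0.5, 0.1):
    print(pi, coverage(pi, n=1000))
```

The printed shares should be close to 0.95 in both cases, with some simulation noise that shrinks as `reps` grows.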

(15)

The bootstrap method

It is a (more) modern method (due to Brad Efron) for generating CIs without using mathematical methods to derive a sampling distribution that assumes a particular population distribution. It is based on repeatedly taking samples of size n (with replacement) from the sample data distribution.
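The resampling idea can be sketched as follows. The slide does not say how the interval is read off the resampled proportions, so this example assumes the percentile method (take the 2.5th and 97.5th percentiles of the bootstrap proportions). The function name `bootstrap_ci`, the number of resamples and the seed are also assumptions.

```python
import random

def bootstrap_ci(data, confidence=0.95, reps=2000, seed=0):
    """Percentile bootstrap CI for the proportion of 1s in `data`."""
    rng = random.Random(seed)
    n = len(data)
    props = sorted(
        sum(rng.choices(data, k=n)) / n   # resample n values with replacement
        for _ in range(reps)
    )
    alpha = 1 - confidence
    lower = props[int(reps * alpha / 2)]
    upper = props[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper

# Left-handers exercise: 25 ones (left-handed) and 75 zeros.
sample = [1] * 25 + [0] * 75
print(bootstrap_ci(sample))
```

For this sample the bootstrap interval should land near the normal-approximation interval (0.1651, 0.3349), though the exact endpoints depend on the seed.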

(16)

If something is not clear (or you find mistakes in the slides), do not hesitate to come to office hours or e-mail me
