
Academic year: 2020


(1)

Techniques of Statistical Analysis I

Lect_12: ANOVA vs Regression + intro to Multilevel Models

Bruno Arpino

(2)

Regression vs ANOVA

An ANOVA model is equivalent to a particular form of regression where the dependent variable is the same as in the ANOVA and the independent variables are dummy variables representing the different groups.

Consider again the data set “nlsw88.dta” used in slide 16 of Lect_7: Does wage vary significantly by race? (Do different ethnic groups have significantly different average wage?)

Three race groups: white, black, others

$H_0: \mu_1 = \mu_2 = \mu_3$ (average wage is the same for each group)

$H_1$: Not all $\mu_i$ are the same (average wage is different for some group)

(3)

Regression vs ANOVA (cont’d)

The ANOVA model compares the three means (it tests whether at least two of them are significantly different).

We can also specify a regression model for this purpose:

$Y_i = \beta_0 + \beta_1 \text{White}_i + \beta_2 \text{Others}_i + \varepsilon_i$

where $\text{White}_i$ and $\text{Others}_i$ are dummy variables equal to 1 if individual $i$ belongs to that group and 0 otherwise.

Note: we omit the black group in the previous regression. Why? See next slide.

(4)

The “dummies trap”

Remember: one requirement on the independent variables is that none of them can be written as a linear combination of other X variables (perfect linear relationship, multicollinearity).

What would happen if we also include “Black”? Consider a sample of 5 units:

White  Others  Black
1      0       0
1      0       0
0      1       0

We can see that the variable Black can be written as a linear combination of the others: Black = 1 - White - Others.

In general, a categorical regressor with k categories can be entered in a regression model as a set of k-1 dummy variables. What happens to the excluded one? Let’s see…
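The redundancy of the third dummy can be checked numerically. The sketch below is only an illustration: it assumes a hypothetical sample of 5 units (the race labels are made up for the example) and uses a small hand-written determinant instead of a statistics package. With an intercept and all three dummies, the matrix X'X is singular, so the OLS normal equations have no unique solution; with k-1 = 2 dummies it is invertible.

```python
# Hypothetical sample of 5 units; the race labels are illustrative only.
races = ["white", "white", "others", "black", "black"]


def dummy(values, category):
    """Return a 0/1 column that flags membership in `category`."""
    return [1 if v == category else 0 for v in values]


def gram(x):
    """Compute X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Determinant by cofactor expansion (fine for tiny integer matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


white = dummy(races, "white")
others = dummy(races, "others")
black = dummy(races, "black")

# For every unit, Black = 1 - White - Others.
redundant = all(b == 1 - w - o for w, o, b in zip(white, others, black))

# Intercept plus all three dummies: X'X is singular (determinant 0).
full_design = [[1, w, o, b] for w, o, b in zip(white, others, black)]

# Intercept plus k - 1 = 2 dummies (Black omitted): X'X is invertible.
reduced_design = [[1, w, o] for w, o in zip(white, others)]

print(redundant)
print(det(gram(full_design)))
print(det(gram(reduced_design)))
```

Because the arithmetic is on integers, the zero determinant is exact rather than a rounding artefact.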

(5)

F-tests in Regression vs ANOVA

$Y_i = \beta_0 + \beta_1 \text{White}_i + \beta_2 \text{Others}_i + \varepsilon_i$

$\beta_0$ is the expected value of wage when White and Others are 0 (that is, for black people)

$\beta_1$ is the expected difference between the wage of White and Black

$\beta_2$ is the expected difference between the wage of Others and Black

The omitted group, Black, becomes the reference group

The hypotheses of the F-test in this regression are equivalent to those of the ANOVA:

Regression: $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: at least one is different from zero

ANOVA: $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_1$: Not all $\mu_i$ are the same
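The equivalence between the two F-tests can be verified on a toy example. The sketch below uses made-up wages for three groups (not the nlsw88 data) and plain Python: it computes the one-way ANOVA F statistic from the between and within sums of squares, then fits the dummy-variable regression by solving the normal equations and computes the overall regression F statistic. The helper names (`solve`, `mean`) are introduced here for illustration only.

```python
# Hypothetical wages for three groups (illustrative values, not nlsw88 data).
wages = {
    "black": [5.0, 6.0, 7.0, 6.0],
    "white": [8.0, 9.0, 7.0, 8.0],
    "others": [9.0, 10.0, 8.0],
}


def mean(xs):
    return sum(xs) / len(xs)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# One-way ANOVA F statistic.
y = [w for g in wages.values() for w in g]
n, k = len(y), len(wages)
grand = mean(y)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in wages.values())
ss_within = sum((w - mean(g)) ** 2 for g in wages.values() for w in g)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Regression of wage on the White and Others dummies (Black is omitted).
rows = [
    [1.0, float(name == "white"), float(name == "others")]
    for name, g in wages.items()
    for _ in g
]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

# Overall F statistic of the regression (H0: beta1 = beta2 = 0).
rss = sum((yi - (b0 * r[0] + b1 * r[1] + b2 * r[2])) ** 2 for r, yi in zip(rows, y))
tss = sum((yi - grand) ** 2 for yi in y)
f_reg = ((tss - rss) / (k - 1)) / (rss / (n - k))

print(f_anova, f_reg)  # the two statistics coincide up to rounding
print(b0, b1, b2)      # Black mean, White - Black gap, Others - Black gap
```

In this toy data the group means are 6 (Black), 8 (White) and 9 (Others), so the estimated intercept equals the mean of the reference group and the two slopes equal the differences from it.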

(6)

F-tests in Regression vs ANOVA (cont’d)

ANOVA: $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_1$: Not all $\mu_i$ are the same

Anova output:

Regression: $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: at least one is different from zero

MLRM output:

(7)

Which means are different?

After ANOVA we should implement a multiple comparison procedure:

With the MLRM we test directly specific comparisons:

Differences: with the MLRM we do not test all the

(8)

Interpretation of the MLRM output

The regression equation can be written as:

$\widehat{\text{wage}} = b_0 + b_1 \text{White} + b_2 \text{Others}$

$b_0$ is the expected salary for Black

$b_1$ is the expected difference between the salary of White and Black (i.e., the expected salary of White is $b_0 + b_1$)

$b_2$ is the expected difference between the salary of Others and Black (i.e., the expected salary of Others is $b_0 + b_2$)

(9)

Extending the MLRM

We can include other regressors in the MLRM. For example, we could check whether the wage gap between White and Black can be attributed to differences in the average education level (“grade”).

Interpretation:

-2.21 is the expected salary for black people with 0 education (Note that there is only one unit in the sample with this characteristic!)

0.61 is the expected difference between the salary of White and Black with the same level of education. Controlling for

(10)

Extending the MLRM (cont’d)

Interpretation:

1.12 is the expected difference between the salary of Others and Black with the same level of education

0.73 is the effect of one additional completed grade on wage, holding race constant. I.e., the effect of grade is the same for the three race groups.
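The interpretation above can be made concrete by plugging the reported coefficients into the fitted equation. The sketch below uses only the four estimates quoted on the slides (-2.21, 0.61, 1.12, 0.73); the function name `predicted_wage` and the grade values are illustrative choices, not part of the original output.

```python
# Coefficients as quoted on the slides (reference group: Black).
INTERCEPT = -2.21
B_WHITE = 0.61
B_OTHERS = 1.12
B_GRADE = 0.73


def predicted_wage(race: str, grade: float) -> float:
    """Expected wage from the fitted equation for a given race and grade."""
    if race not in {"white", "black", "others"}:
        raise ValueError(f"unknown race group: {race}")
    white = 1 if race == "white" else 0
    others = 1 if race == "others" else 0
    return INTERCEPT + B_WHITE * white + B_OTHERS * others + B_GRADE * grade


# The White - Black gap is the same at every level of education,
# because the model has no interaction between race and grade.
for grade in (8, 12, 16):
    gap = predicted_wage("white", grade) - predicted_wage("black", grade)
    print(grade, round(gap, 2))
```

For example, the expected wage of a white person with 12 completed grades is -2.21 + 0.61 + 0.73 * 12 = 7.16, against 6.55 for a black person with the same education.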

(11)

What happens if we have many groups?

Imagine we sampled 3000 students from 100 schools and we want to assess whether the average Grade Point Average (GPA) of students varies by school (i.e., if there is a “school effect”) after controlling for student characteristics (such as gender, race, parents’ income, parents’ education).

We would need a MLRM with 99 dummy variables (+ the other regressors).

The output would be difficult to read.

(12)

Multilevel Linear Regression Model

A Multilevel Linear Regression Model has the form:

$Y_{ij} = \beta_0 + \varepsilon_{ij} + \eta_j, \quad \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon), \quad \eta_j \sim N(0, \sigma^2_\eta)$

This model is called NULL because there are no covariates.

In this model the error term is decomposed into two components:

$\varepsilon$ represents an individual error (e.g., student)

$\eta$ is a group-level error (e.g., school)

The Intra-Class Correlation coefficient (ICC) compares the group-level variance to the total variance and is an index of the homogeneity of units within groups (or of the importance of the “group” effect):

$ICC = \frac{\sigma^2_\eta}{\sigma^2_\eta + \sigma^2_\varepsilon}$
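To see the error decomposition and the ICC at work, the sketch below simulates data from the null model and recovers the two variance components. Everything in it is an assumption made for illustration: the parameter values, the balanced design (100 groups of 30 units, echoing the 3000 students in 100 schools example), and the simple method-of-moments (one-way ANOVA) estimator, which is not the maximum likelihood estimator that multilevel software would normally use.

```python
import random


def icc(var_group: float, var_indiv: float) -> float:
    """Intra-class correlation: group-level variance over total variance."""
    return var_group / (var_group + var_indiv)


def simulate_null_model(n_groups, n_per_group, beta0, sd_group, sd_indiv, seed=0):
    """Draw Y_ij = beta0 + eta_j + eps_ij for a balanced two-level design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        eta = rng.gauss(0.0, sd_group)  # group-level error, shared within the group
        data.append(
            [beta0 + eta + rng.gauss(0.0, sd_indiv) for _ in range(n_per_group)]
        )
    return data


def anova_variance_components(data):
    """Method-of-moments estimates (group variance, individual variance)."""
    j, n = len(data), len(data[0])
    means = [sum(g) / n for g in data]
    grand = sum(means) / j
    msb = n * sum((m - grand) ** 2 for m in means) / (j - 1)
    msw = sum((y - m) ** 2 for g, m in zip(data, means) for y in g) / (j * (n - 1))
    # E[MSB] = var_indiv + n * var_group, so solve for var_group (floored at 0).
    return max((msb - msw) / n, 0.0), msw


# Hypothetical variance components: the true ICC is 2 / (2 + 8) = 0.2.
print(icc(2.0, 8.0))

data = simulate_null_model(100, 30, beta0=3.0, sd_group=2.0 ** 0.5, sd_indiv=8.0 ** 0.5)
var_group_hat, var_indiv_hat = anova_variance_components(data)
print(icc(var_group_hat, var_indiv_hat))  # should be in the neighbourhood of 0.2
```

An ICC of 0.2 would mean that 20% of the total variability in the outcome lies between groups rather than between individuals within the same group.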

(13)

Multilevel Linear Regression Model (cont’d)

With covariates the MuLRM has the form:

$Y_{ij} = \beta_0 + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij} + \varepsilon_{ij} + \eta_j$

In this model, the variance of $\eta$, the group-level error, can be interpreted as the residual variance across groups, i.e., the variability that exists across groups after differences among groups in terms of the covariates X have been controlled for.

By comparing the residual group-level variance with the variance estimated by the null model we can assess how much the X
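One common way to carry out such a comparison is the proportional reduction in group-level variance when moving from the null model to the model with covariates. The sketch below is an assumption about how the comparison might be summarised, not necessarily the measure used in the lecture; the function name and the variance values are hypothetical.

```python
def explained_group_variance(var_group_null: float, var_group_resid: float) -> float:
    """Share of the null-model group-level variance accounted for by the covariates."""
    if var_group_null <= 0:
        raise ValueError("the null-model group-level variance must be positive")
    return (var_group_null - var_group_resid) / var_group_null


# Hypothetical estimates: group-level variance of 2.0 in the null model
# and 1.5 once the covariates X are included.
print(explained_group_variance(2.0, 1.5))  # 0.25
```

With these made-up numbers, a quarter of the between-group variability would be attributable to differences in the composition of the groups in terms of X.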

(14)

Multilevel questions: an example

Subramanian et al (2001), Does the state you live in make a difference? Multilevel analysis of self-rated health in the US, Social Science & Medicine, 53, 9–19.

How do age, gender, race and income affect self-rated health?

Having taken account of these individual (compositional) characteristics, are there significant variations in self-rated health between US states?

How do state-level characteristics such as per-capita income, income distribution and social capital affect self-rated health?

Are there differential effects of these contextual characteristics (state-level characteristics) across different income groups?

(15)

Multilevel structures: examples

students, classes, schools

individuals, families, regions

immigrants, countries of origin

patients, doctors, hospitals

workers, firms

soccer players, teams

animals, factories

(16)

Multilevel structures: not only hierarchical

In the previous examples the data structures are hierarchical or nested: each unit at the lowest level is nested in a group at the second level (and this is possibly nested into another at the third level and so on). Other structures are possible:

Multiple memberships: a unit belongs to more than one group (e.g., people can change place of residence over time).

Cross-classifications: groups are not nested. E.g., Vitali and Arpino (2010) consider the effect of country of origin and province of residence in Spain on the probability of living with parents for second-generation young adult immigrants. Country of origin and province of residence are not nested: immigrants from Colombia can reside in Barcelona or in other Spanish provinces and, at the same time, immigrants from many different countries reside in Barcelona.
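Whether two classifications are nested or crossed can be checked directly on the data. The sketch below is an illustration with made-up records: the function `is_nested` and the example values (class and school labels, Madrid, Ecuador) are hypothetical and do not come from the cited study.

```python
from collections import defaultdict


def is_nested(pairs):
    """True if every lower-level group appears within exactly one higher-level group."""
    parents = defaultdict(set)
    for lower, higher in pairs:
        parents[lower].add(higher)
    return all(len(p) == 1 for p in parents.values())


# Hierarchical structure: each class belongs to a single school.
classes_in_schools = [
    ("class_a", "school_1"),
    ("class_b", "school_1"),
    ("class_c", "school_2"),
]

# Cross-classified structure: the same country of origin is observed
# in several provinces, and each province hosts several origins.
origin_by_province = [
    ("Colombia", "Barcelona"),
    ("Colombia", "Madrid"),
    ("Ecuador", "Barcelona"),
]

print(is_nested(classes_in_schools))  # True
print(is_nested(origin_by_province))  # False
```

When the check fails in both directions, as in the second example, neither classification can be treated as a level sitting above the other, which is what distinguishes a cross-classified structure from a hierarchical one.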

(17)

Multilevel research: examples

These papers focus on different contextual effects:

School/Teacher (Aitkin et al, 1981)

Family (Curtis et al, 1993)

Space/time (Arzheimer, 2009)

Country/Region (Billari et al, 2008)

Neighbourhood (Cerdà et al, 2009)

Work environment (Jolivet et al, 2010)

Social network (De Miguel Luken and Tranmer, 2010)

(18)

If something is not clear (or you find mistakes in the slides) do not hesitate to come to office hours or e-mail me
