Chapter 1. Linear Panel Models and Heterogeneity

(1)

Chapter 1. Linear Panel Models and Heterogeneity

Master of Science in Economics - University of Geneva

Christophe Hurlin, Université d’Orléans

Université d’Orléans

January 2010

(2)

The outline of the chapter:

1 _{Speci…cation tests and analysis of covariance} 2 _{Linear unobserved e¤ects panel data models} 3 Fixed or random methods?

4 Fixed-E¤ects methods: Least Squares dummy variable

approach

5 Random e¤ects methods

(3)

Section 1. Speci…cation tests and analysis of

covariance

(4)

A linear model commonly used to assess the e¤ects of both quantitative and qualitative factors is postulated as

yi,t =αi,t+β0_i,txi,t+εi,t

where₈i =1, ..,N, ₈t =1, ..,T

αit and β= (β_1it,β_2it, ...,β_Kit)are 1 1 and 1 K vectors of

parameters that vary acrossi andt,

xit = (x1it, ...,xKit)is a 1 K vector of exogenous variables,

(5)

Three aspects of the estimated regression coe¢ cients can be tested:

1 the homogeneity of regression slope coe¢ cients

2 _{the homogeneity of} _{regression intercept coe¢ cients}_.

3 thetime stability of parameters (slopes and constants). We

will not consider this issue (not speci…c to panel data models) here.

(6)

We assume that parameters areconstant over time, but can vary across individuals.

yi,t =αi+β0_ixi,t+εi,t

Three types of restrictions can be imposed on this model. Regression slope coe¢ cients are identical, and intercepts are not (model with individual / unobserved e¤ects).

yi,t =αi +β0xi,t+εi,t

Regression intercepts are the same, and slope coe¢ cients are not

(unusual).

yi,t = α+β0_ixi,t +εi,t

Both slope and intercept coe¢ cients are the same(homogeneous

/ pooled panel).

(7)

De…nition

Anheterogeneous panel data model is a model in which all parameters (constant and slope coe¢ cients) vary accross individuals.

De…nition

Anhomogeneous panel data model (or pooled model) is a model in which all parameters (constant and slope coe¢ cients) are common

(8)

(9)

Figure: Hsiao (2003) [ ]N i H Test :i i 1, 1 0α=αβ=β∀∈ vraie H1 0 rejetée H1 0 t i t i t i x y,=α+β',+ε, [ ]N i H Test:i 1, 2 0β=β∀∈ vraie H2 0 rejetée H2 0 t i t i i i t i x y,=α+β',+ε, TestH: i i[ ]1,N 3 0α=α∀∈ vraie H3 0 Hrejetée 3 0 t i t i i t i x y,=α+β',+ε, t i t i t i x y,=α+β',+ε,

(10)

Lemma

Under the assumption that theεit are independently normally distributed over i and t with mean zero and varianceσ2_ε, F tests

(11)

First step (homogeneous/ pooled assumption)

Let us consider the general model

yi,t =αi +β0xi,t+εi,t

The hypothesis of common intercept and slope can be viewed as a set of(K +1)(N 1)linear restrictions:

H₀1 : β_i = β αi =α 8i 2 [1,N]

H_a1 : ₉(i,j)₂ [1,N] / β_i 6= β_j ou αi 6=αj

(12)

Under the alternative H1,there are NK estimated slope

coe¢ cients for the N vectors β_i (K 1)and estimatedN

constants.

Under H1, the unrestricted residual sum of squaresS1 divided

byσ2_ε has a chi-square distribution with NT N(K +1)

(13)

De…nition

Under the homogeneous assumptionH₀1,

H₀1 : β_i

(K,1)

= β

(K,1)

αi = α 8i 2[1,N]

the F statistic,denotedF1,and de…ned by: F1 =

(RSS1,c RSS1)/[(N 1) (K+1)]

RSS1/[NT N(K+1)]

has a Fischer distribution with(N 1) (K +1)and

NT N(K +1)degrees of freedom. RSS1 denotes the residual

sum of squares of the model andRSS1,c the residual sum of

squares of the constrained model:

yi,t =α+β0xi,t +εi,t

(14)

UnderH1,the residual sum of squares is equal to the sum of the N

residual sum of squares associated to theN individual regressions:

RSS1 = N

∑

i=1 RSS1,i = N

∑

The hypothesis of heterogeneous intercepts but homogeneous

slopes can be reformulated as subject to(N 1)K linear

restrictions (non restrictions on αi).

H₀2 : β_i =β 8i =1, ..N

(18)

De…nition

Under the assumptionH₀2,

H₀2 : β_i =β 8i =1, ..N

the F statistic, denotedF2,and de…ned by:

F2 =

(RSS1,c0 RSS1)/[(N 1)K] RSS1/[NT N(K+1)]

has a Fischer distribution with(N 1)K etNT N(K +1)

degrees of freedom underH₀2. RSS1 denotes the residual sum of

squares of the model andRSS_1,c0 the residual sum of squares of

the constrained model (model with individual e¤ects):

(19)

IfH2

0 is not rejected, one can also apply a conditional test for

homogeneous intercepts (N 1 linear restrictions).

H₀3 : αi =α 8i =1, ..,N given β_i = β

Under the null, the model is homogeneous (pooled) and the restricted residual sum of squares is SCR1,c..

Under the alternative, the model is yi,t =αi+β0xi,t +εi,t,

(21)

De…nition

Under the assumptionH₀3,

H₀3 : αi =α 8i =1, ..,N given β_i = β

the F statistic, denotedF3,and de…ned by: F3 =

(RSS1,c RSS1,c0)/(N 1) RSS1,c0/[N(T 1) K]

(1)

has a Fischer distribution withN 1 and N(T 1) K degrees

of freedom underH₀2. RSS1,c0 denotes the residual sum of squares

of the model with individual e¤ects andSCR1,c the residual sum of

squares of the pooled model previously de…ned.

(22)

(23)

Example

Let us consider a simple panel regression model for the total number of strikes days in OECD countries. We have a balanced

panel data set for 17 countries (N =17) and annual data form

1951 to 1985 (T =35). General idea: evaluate the link between

strikes and some macroeconomic factors (in‡atiion, unemployment etc..)

s,t =αi+β_iu,t+γipit+εit 8i =1, ..,17

si,t the number of strike days for 1000 workers for the country i at time t.

uitt the unemployement rate

pi,t,the in‡ation rate

(24)

(25)

(26)

(27)

Conclusion of section 1.

Fact

It is possible to test the heterogeneity / homogeneity of the parameters under some speci…c assumptions(normality of residuals). More generaly, the assumption of heterogenity / homogeneity of the parameters (slope coe¢ cients and constants) has to be evaluated with an economic interpretation.

Example

Example: It is reasonnable to assume that the slope parameters of the production function are the same accros countries? what does it imply? Should I impose a common mean for the TFP for France and Germany?

(28)

Section 2. Linear unobserved/individual e¤ects

panel data models

(29)

2.1. De…nitions

(30)

We now consider a model with individual e¤ects and common slope parameters.

De…nition

A linear unobserved / individual e¤ects panel data model is de…ned as follows:

yit = αi+β0xit +εit

where₈i =1, ..,N, ₈t =1, ..,T

αi and β= (β₁,β₂, ...,β_K)are 1 1 and 1 K vectors of

parameters,

xit = (x1it, ...,xKit)is a 1 K vector of exogenous variables,

εit is the error term, assumed to be i.i.d.,with 8i =1, ..,N,

8t =1, ..,T

(31)

De…nition

There are many names forαi: (1) unobserved e¤ects, (2)

individual e¤ects, (3) unobserved component, (4) latent variable (for random e¤ects models), (5) individual heterogeneity. De…nition

Theεit are called the idiosyncratic errors or idiosyncratic disturbancesbecause these change accross t as well as accross i.

(32)

Writting in vector form. Let us denote yi (T,1) = 0 B B @ yi,1 yi,2 ... yi,T 1 C C A Xi (T,K) = 0 B B @

x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2

... ... ... ...

x1,i,T x2,i,T ... xK,i,T

1 C C A

Let us denotee a unit vector and εi the vector of errors:

e (T,1)= 0 B B @ 1 1 ... 1 1 C C A εi (T,1) = 0 B B @ εi,1 εi,2 ... εi,T 1 C C A

(33)

De…nition

For a given individuali, the linear unobserved / individual

e¤ects panel data modelis de…ned as follows:

yi =eαi+Xiβ+εi 8i =1, ..,N E(εi) =0

E εiεi0 =σ2εIT

E εiεj0 = 0

(T,T) if i 6=j

(34)

Example

Let us consider the case of a Cobb Douglas production function in

log, as de…ned previously, for the caseT =3.We have:

yi,t =αi+β_kki,t +β_nni,t+εi,t 8i,8t 2[1,3]

or in a vectorial form for a countryi :

0 @ yi,1 yi,2 yi,3 1 A= 0 @ 1 1 1 1 Aαi+ 0 @ ki,1 ni,1 ki,2 ni,2 ki,3 ni,3 1 A β_k β_n + 0 @ εεi,1i,2 εi,3 1 A

(35)

It is also possible to stackle all this vectors/matrix as follows Y =ee_eα+Xβ+ε Y (TN,1)= 0 B B @ y1 y2 ... yN 1 C C A (TNX,K)= 0 B B @ X1 X2 ... XN 1 C C A ε (TN,1)= 0 B B @ ε1 ε2 ... εN 1 C C A

where 0T is the null vector (T,1).

ee (TN,N)= 0 B B @ e 0T ... 0T 0T e ... 0T ... ... ... 0T 0T 0T ... e 1 C C A eα (N,1)= 0 B B @ α1 α2 ... αN 1 C C A

(36)

Example

Consider the case of the production function withT =3 and

N=2 0 B B B B B B B @ y1,1 y1,2 y1,3 y2,1 y2,2 y2,3 1 C C C C C C C A = 0 B B B B B B @ 1 0 1 0 1 0 0 1 0 1 0 1 1 C C C C C C A α1 α2 + 0 B B B B B B B @ k11 n11 k12 n12 k13 n13 k21 n21 k22 n22 k,3 n23 1 C C C C C C C A β_k β_n + 0 B B B B B B B @ ε11 ε12 ε1,3 ε2,1 ε2,2 ε2,3 1 C C C C C C C A ()Y =ee_eα+Xβ+ε

(37)

2.2. Fixed and random e¤ects

(38)

Especially in methodological papers, but also in applications, one often sees a discussion about whetherαi will be treated as a random e¤ect or a…xed e¤ect.

De…nition

In the traditional approach to panel data models,αi is called a

“random e¤ect”when it is treated as a random variable and a

“…xed e¤ect”when it is treated as a parameter to be estimated for each cross section observationi.

(39)

Fact

For Wooldridge (2003), these discussions about whether theαi should be treated as random variables or as parameters to be estimated are wrongheaded for microeconometric panel data applications. With a large number of random draws from the cross section,it almost always makes sense to treat the unobserved e¤ects,αi, as random draws from the population, along with yit and xit. This approach is certainly appropriate from an omitted variables or neglected heterogeneity perspective.

(40)

Fact

As suggested by Mundlak (1978), the key issue involvingαi is whether or not it is uncorrelated with the observed explanatory variables xit, for t =1, ..,T .

Mundlak Y. (1978), ”On the Pooling of Time Series and Cross

(41)

De…nition

In modern econometric parlance, “random e¤ect” is synonymous with zero correlation between the observed explanatory variables and the unobserved e¤ect:

cov(xit,αi) =0, t =1, ..,T

(42)

Actually, a stronger conditional mean independence assumption,

E(αi j xi1, ..,xiT) =0

will be needed to fully justify statistical inference. In applied papers, whenαi is referred to as, say, an “individual random

e¤ect,” thenαi is probably being assumed to be uncorrelated with

(43)

De…nition

In microeconometric applications, the term “…xed e¤ect” does not

usually mean thatαi is being treated as nonrandom; rather, it

means that one is allowing for arbitrary correlation between the unobserved e¤ectαi and the observed explanatory variablesxit.

(44)

Wooldridge (2003) avoids referring toαi as a random e¤ect or a

…xed e¤ect. Instead, we will refer to αi as unobserved e¤ect, unobserved heterogeneity, and so on. Nevertheless, later we will

labeltwo di¤erent estimation methods random e¤ects

estimation and …xed e¤ects estimation. This terminology is so ingrained that it is pointless to try to change it now.

(45)

Example (Production function)

Let us consider the simple example of the Cobb Douglass production function.

yi,t = β_iki,t+γini,t +αi+vi,t

In this case,αi corresponds to the unobserved e¤ect on TFP due to

scountry speci…c omitted factor (climate, institutions, organiszation etc..). In this case, we might expect that the the level of factors are positivily correlated with this component of TFP: the more a country is productive, the more it invests in capital for instance.

cov(αi,ki,t)>0 cov(αi,ni,t)>0

(46)

Example (Patents and R&D)

Hausman, Hall, and Griliches (1984) estimate (nonlinear) distributed lag models to study the relationship between patents awarded to a …rm and current and past levels of R&D spending. A linear version of their model is:

patentsit =θt+zitγ+δ0RDit+δ1RDi,t 1+..+δ5RDi,t 5+αi+vi,t

whereRDit is spending on R&D for …rm i at timet andzit

contains other explanatory variables. αi is a …rm heterogeneity

term that may in‡uencepatentsit and that may be correlated with

current, past, and future R&D expenditures.

(47)

De…nition

The Hausman’s test (1978), is a test of the null hypothesis

cov(xit,αi) =0, t =1, ..,T

and is generally presented as test of speci…cation (…xed or

random) of the unobserved e¤ects (see later, section 5.).

(48)

(49)

Let us consider the following model:

yi =eαi +Xiβ+εi 8i =1, ..,N

whereαi is a constant term,β0 = (β₁β₂....β_K )2RK and:

yi

(T,1)

= εi,1 εi,2 ... εi,T 0

εi

(T,1)

= εi,1 εi,2 ... εi,T 0

e (T,1)= 1 1 ... 1 0 Xi (T,K) = 0 B B @ x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2 ... ... ... ... x1,i,T x2,i,T ... xK,i,T 1 C C A

(50)

Assumtions (H1) Let us assume that errors terms εi,t are i.i.d.₈i ₂[1,N],₈t ₂[1,T]with: E(εi,t) =0 E(εi,tεi,s) = σ 2 ε 0 t =s 8t ₆=s ,or E (εiε0i) =σ2εIT

where It denotes the identity matrix (T,T). E(εi,tεj,s) =0,8j 6=i,8(t,s),orE εiε0_j =0

(51)

Theorem

Under assumption H1,the ordinary-least-squares (OLS) estimator of βis the best linear unbiased estimator (BLUE).

De…nition

In this context, the OLS estimatorbβis called the least-squares

dummy-variable(LSDV) of Fixed E¤ect (FE) estimator, because the observed values of the variable for the coe¢ cient αi

takes the form of dummy variables.

(52)

Given the second FOC (with respect to β)and the previous result,

we have: De…nition

Under assumptionH1,the …xed e¤ect estimator or LSDV estimator

of parameter βis de…ned by:

b β_LSDV = "_N

∑

i=1 T

∑

t=1 (xi,t xi) (xi,t xi)0 # 1 "_N

∑

i=1 T

∑

t=1 (xi,t xi) (yi,t yi) #

(55)

1 _{The computational procedure for estimating the slope}

parameters in this model does not require that the dummy variables for the individual (and/or time) e¤ects actually be included in the matrix of explanatory variables.

2 _{We need only …nd the} _{means of time-series observations}

separately for each cross-sectional unit, transform the observed variables by subtracting out the appropriate

time-series means, and then apply the leastsquares method to the transformed data.

(56)

The foregoing procedure is equivalent to premultiplying theith

equation

yi =eαi +Xiβ+εi

by aT T idempotent (covariance) transformation matrix (within operator)

Q =IT

1

Tee 0

to “sweep out” the individual e¤ectαi so that individual

observations are measured as deviations from individual means (over time).

(57)

Qyi andQXi correspond to the observations are measured as

deviations from individual means :

Qyi = IT 1 Tee 0 _y_i = yi e 1 Te 0_yi = 0 B B @ yi,1 yi,2 ... yi,T 1 C C A 1 T T

∑

t=1 yi,t !0 B B @ 1 1 ... 1 1 C C A

(58)

QXi = Xi 1 Tee 0_Xi = 0 B B @

x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2

... ... ... ...

x1,i,T x2,i,T ... xK,i,T

1 C C A 1 T 0 B B @ 1 1 ... 1 1 C C A ∑Tt=1x1,i,t ∑Tt=1x2,i,t ... ∑Tt=1xK,i,t

(59)

Finally, when the transformationQ is applied to a vector of constant (or a time invariant variable), it lead to a null vector.

Qe = IT 1 Tee 0 _e = e 1 Tee 0_e = e e =0 since e0e = 1 .. 1 0 @ 1 .. 1 1 A=T

(60)

So, we have:

yi =eαi +Xiβ+εi

()Qyi = Qeαi+QXiβ+Qεi

(61)

De…nition

Under assumptionH1,the …xed e¤ect estimator or LSDV estimator

of parameter βis de…ned by:

bβ_LSDV = "_N

∑

i=1 X_i0QXi # 1"_N

∑

i=1 X_i0Qyi # where Q =IT 1 Tee 0

(62)

Example

1951 to 1985 (T =35). General idea: evaluate the link between

strikes and some macroeconomic factors (in‡atiion, unemployment etc..)

(63)

(64)

(65)

Theorem

The LSDV estimatorbβis unbiased and consistent when either N or

T or both tend to in…nity. Its variance–covariance matrix is:

V bβ_LSDV =σ2_ε " N

∑

i=1 X_i0QXi # 1 b β_LSDV p! NT_!∞β

(66)

Theorem

However, the estimator for the intercept,bαi, although unbiased, is consistent only when T _!∞.

b

αi p

!

(67)

2.4. Random e¤ects methods

(68)

De…nition

The random speci…cation of unobserved e¤ects corresponds to a

particular case of variance-component or error-component

model, in which the residual is assumed to consist of three components

yi,t =β0xit+εi,t 8i8t

(69)

Assumptions (H2) Let us assume that error terms εi,t =

αi+λt +vi,t are i.i.d.with

8i =1, ..,N, ₈t =1, .,T E(αi) =E(λt) =E(vi,t) =0 E(αiλt) =E(λtvi,t) =E(αivi,t) =0 E(αiαj) = σ 2 α 0 i =j 8i ₆=j E(λtλs) = σ 2 λ 0 t =s 8t ₆=s E(vi,tvj,s) = σ 2 v 0 t =s,i =j 8t ₆=s,₈i ₆=j E αix_i,t0 =E λtx_i,t0 =E vi,tx_i,t0 =0

(70)

De…nition

As suggested by Wooldridge (2001), the "…xed e¤ect"

speci…cation can be viewed as a case in whichαiis a random

parameter withcov αi,xi0,t 6=0, whereas the "random e¤ect

model" correspond to the situation in whichcov αi,x_i,t0 =0.

De…nition

The variance ofyit conditional onxit is the sum of three

components: σ2y =σ2α+σ 2 λ+σ 2 v

(71)

De…nition

Sometimes, the individual e¤ectsα_i are supposed to have a non

zero mean, withE(αi) =µ,then we can de…ned a individual

e¤ectsαi with zero mean. A linear unobserved e¤ects panel data

models is then de…ned as follows:

yi =eµ+Xiβ+εi εi (T,1) = e (T,1)(Tα,1i) + vi (T,1)

(72)

De…nition

The vectorial expression of the individual e¤ects model is then:

yi (T,1) = Xei (T,K+1) γ (K+1,1) + εi (T,1) εi (T,1) = e (T,1)(Tα,1i) + vi (T,1) with e Xi = (e :Xi) and γ0 = µ: β0

(73)

De…nition

Under assumptions H2,the variance-covariance matrix of εi is

equal to:

V =E εiε0i =E (αie+vi) (αie+vi)0 = σ_α2ee0+σ2vIT

Its inverse is:

V 1= 1 σ2v IT σ 2 α σ2v+Tσ2α ee0

(74)

The presence of αi produces a correlation among residuals of the

same crosssectional unit, though the residuals from di¤erent cross-sectional units are independent

V =E εiε0_i =σ2_αee0+σ2_vIT V = 0 B B @ σ2_α+σ2v σ2α ... σ 2 α σ2_α+σ2_v ... σ2_α ... σ2_α σ2_α+σ2v 1 C C A

(75)

However, regardless of whether theαi are treated as …xed or as

random,the individual-speci…c e¤ects for a given sample can be

swept out by the idempotent (covariance) transformation matrixQ

Qyi =Qeµ+QXiβ+Qeαi+Qvi

SinceQe = IT T 1ee0 e =0,we have

Qyi =Qeµ+QXiβ+Qvi

(76)

Theorem

Under assumptions H2,when αi are treated as random, the LSDV estimator is unbiased and consistent either N or T or both tend to in…nity. However, whereas the LSDV is the BLUE under the assumption that αi are …xed constants, it is not the BLUE in …nite samples whenαi are assumed random. The BLUE in the latter case is the generalized-least-squares (GLS) estimator.

(77)

Fact

Moreover, if the explanatory variables contain some time-invariant variables zi, their coe¢ cients cannot be estimated by LSDV, because the covariance transformation eliminates zi.

(78)

Let us consider the model

yi =Xieγ+εi 8i =1, ..,N

whereεi =αie+vi, Xei = (e,Xi)andγ0 = µ,β0 .The variance

covariance matrix V =E(εiε0_i) is known.

De…nition

If the variance covariance matrix V is known, the GLS estimator of the γ vector, denoted bγ_GLS,is de…ned by:

b γ_GLS = " N

∑

i=1 e X_i0V 1Xei # 1" N

∑

i=1 e X_i0V 1yi #

(79)

De…nition

Following Maddala (1971), we writeV as:

V 1 = 1

σ2v

Q+ψ1

Tee 0

where Q = (IT ee0/T) and where parameter ψis de…ned by:

ψ= σ

2 v

σ2v+Tσ2α

(80)

i=1 e X_i0ee0yi # withXei = (e Xi)and γ0 = µ β0

(81)

b µ_GLS b β_GLS = 2 6 6 4 ψNT ψT N ∑ i=1 x0_i ψT N ∑ i=1 xi N ∑ i=1 X_i0QXi +ψT N ∑ i=1 xix 0 i 3 7 7 5 1 2 4 ψ NT y N ∑ i=1 X_i0Qyi+ψT N ∑ i=1 xiy_i 3 5

Using the formula of the partitioned inverse, we can derivebβ_GLS.

(82)

De…nition

If the variance covariance matrix V is known,the GLS estimator of

vector βis de…ned by:

bβ_GLS = " 1 T N

∑

i=1 X_i0QXi+ψ N

∑

i=1 (xi x) (xi x)0 # 1 " 1 T N

∑

i=1 X_i0Qyi+ψ N

∑

i=1 (xi x) (yi y) #

(83)

This estimator can be expressed as a weigthed average of the LSDV (OLS) estimator and the between estimator.

De…nition

The between-group estimator or between estimator,denotedbβ_BE,

corresponds to theOLS estimator obtained in the model:

y_i =c+β0xi+εi 8i =1, ..,N b β_BE = "_N

∑

i=1 (xi x) (xi x)0 # 1"_N

∑

i=1 (xi x) (yi y) #

The estimatorbβ_BE is called the between-group estimator because

it ignores variation within the group

(84)

De…nition

The pooled estimator,denotedbβ_pooled,corresponds to theOLS

estimator obtained in the model:

yit = α+β0xit +εit 8i =1, ..,N 8t =1, ..,T b β_pooled = " _T

∑

t=1 N

∑

i=1 (xi,t x) (xi x)0 # 1 " _T

∑

t=1 N

∑

i=1 (xit x) (yit y) #

(85)

Theorem

Under assumptions H2, the GLS estimator bβ_GLS is a weighted

average of the between-group bβ_BE and the within-group (LSDV)

estimatorsbβ_LSDV.

b

β_GLS =∆bβ_BE + (IK ∆)bβ_LSDV

where∆ denotes a weigth matrix de…ned by:

∆ = ψT "_N

∑

i=1 X_i0QXi+ψT N

∑

i=1 (xi x) (xi x)0 # 1 "_N

∑

i=1 (xi x) (xi x)0 #

(86)

1 If_ψ!1, then GLS converges to the OLS pooled estimator. b β_GLS p! ψ!1 b β_pooled

2 _If_ψ!_{0, the GLS estimator converges to LSDV estimator.}

bβ_GLS p!

ψ!0

(87)

Fact

The constantψmeasures the weight given to the between-group

variation.

In the LSDV (or …xed-e¤ects model) procedure, this source of variation is completely ignored.

The OLS procedure corresponds toψ=1. The between-group and

within-group variations are just added up.

(88)

Fact

The procedure of treating αi as random provides a solution intermediate between treating them all as di¤erent and treating them all as equal.

(89)

Given the de…nition ofψ,we have: ψ= σ 2 v σ2_v+Tσ2_α Tlim!∞ψ=0 Fact

When T tends to in…nity, the GLS estimator converges to the LSDV estimator:

bβ_GLS !

T_!∞bβLSDV

This is because when T _!∞, we have an in…nite number of observations for each i .Therefore, we can consider eachαi as a random variable which has been drawn once and forever, so that for each i we can pretend that they are just like …xed parameters.

(90)

Fact

Computation of the GLS estimator can be simpli…ed by introducing a transformation matrix P such that

P = hIT 1 ψ1/2 (1/T)ee0 i We have V 1 = 1 σ2v P0P

Premultiplying the model by the transformation matrix P, we obtain the GLS estimator by applying the least-squares method to the transformed model (Theil (1971, Chapter 6)).

(91)

This is equivalent to …rst transforming the data by subtracting a

fraction 1 ψ1/2 of individual meansyi andxi from their

correspondingyit and xit, then regressingyit 1 ψ1/2 yi on

aconstant andxit 1 ψ1/2 xi.

(92)

We can show that: var bβ_GLS =σ2_v " N

∑

i=1 X_i0QXi +ψT N

∑

i=1 (xi x) (xi x)0 # 1

Becauseψ>0, we see immediately that the di¤erence between

the covariance matrices ofbβ_LSDV and bβ_GLS is a positive

semide…nite matrix. ForK =1,we have:

(93)

De…nition

If the variance componentsσ2_ε andσ2_α are unknown, we can use

two-stepGLS estimation.

1 In the …rst step, we estimate the variance components using

some consistent estimators.

2 In the second step, we substitute their estimated values into

b γ_GLS = " N

∑

i=1 e X_i0Vb 1Xei # 1" N

∑

i=1 e X_i0Vb 1yi #

or its equivalent form.

(94)

Lemma

When the sample size is large (in the sense of either N _!∞,or T _!∞), the two-step GLS estimator will have the same asymptotic e¢ ciency as the GLS procedure with known variance components.

Lemma

Even for moderate sample size [for T 3,N (K +1) 9; for T 2,N (K+1) 10], the two-step procedure is still more e¢ cient than the covariance (or within-group) estimator in the sense that the di¤erence between the covariance matrices of the covariance estimator and the two-step estimator is nonnegative de…nite

(95)

Noting thaty_i =αi+β0xi+εi and

(yit yi) = (xit xi) + (vit vi), we can use the within- and

between-group residuals to estimate σ2_ε andσ2_α by

b σ2_v = N ∑ i=1 T ∑ t=1 h (yi,t yi) bβ 0 LSDV (xi,t xi) i2 N(T 1) K b σ2_α = N ∑ i=1 y_i bβ0_LSDVxi 2 N K 1 bσ 2 v

(96)

Then, we have an estimate ofψ andV 1 b ψ= σb 2 v b σ2_v+Tbσ2_α ! b V 1 = 1 b σ2v Q+bψ1 Tee 0

(97)

Example

1951 to 1985 (T =35).

s,t =αi+βu,t+γpit+εit 8i =1, ..,17

(98)

(99)

2.5. Fixed or random methods?

(100)

Fact

Whether to treat the e¤ects as …xed or random makes no

di¤erence when T is large, because both the LSDV estimator and the generalized least-squares estimator become the same estimator:

bβ_GLS !

(101)

WhenT is …nite and N is large, whether to treat the e¤ects as …xed or random is not an easy question to answer. It can make a surprising amount of di¤erence in the estimates of the parameters.

(102)

Example

For example, Hausman (1978) found that using a …xed-e¤ects speci…cation produced signi…cantly di¤erent results from a random-e¤ects speci…cation when estimating a wage equation using a sample of 629 high school graduates followed over six years by the Michigan income dynamics study. The explanatory variables in the Hausman wage equation include a piecewise-linear

representation of age, the presence of unemployment or poor health in the previous year, and dummy variables for

(103)

Figure: Hausman (1978)

(104)

Fact

In the random-e¤ects framework, there are two fundamental assumptions. One is that the unobserved individual e¤ectsαi are random draws from a common population. The other is that the explanatory variables are strictly exogenous. That is, the error terms are uncorrelated with (or orthogonal to) the past, current, and future values of the regressors:

(105)

What happens when this condition is violated?

E(αijxi1, ..,xiK)6=0 or E αixi0,t 6=0

1 The Mundlak’s speci…cation (1978) 2 _{The Hausman test}

(106)

(107)

Mundlak (1978) criticized the random-e¤ects formulation on the grounds that it neglects the correlation that may exist between the e¤ectsαi and the explanatory variables xit. There are reasons to

believe that in many circumstancesαi andxit are indeed correlated.

Mundlak Y. (1978), ”On the Pooling of Time Series and Cross

Section Data”,Econometrica, 46, 69-85.

(108)

The properties of various estimators we have discussed thus far depend on the existence and extent of the relations between the X’s and the e¤ects. Therefore, we have to consider the joint distribution of these variables. However, αi are unobservable. =₎ Mundlak (1978a) suggested that we approximate E(αixi,t)by a linear function.

(109)

De…nition

Let us assume that the individual e¤ects satisfy:

αi =xi0a+αi

with a₂ RK andE(α_ix_it0) =0.Under this assumption, the model

is:

yi,t =µ+β0xi,t+x0_ia+εi,t

εi,t =α_i +vi,t

(110)

Assumptions H4 The error term εi,t = α_i +vi,t satis…es, 8i ₂[1,N],₈t₂ [1,T] E(α_i) =E(vi,t) =0 E(α_ivi,t) =0 E α_iα_j = σ 2 α 0 i =j 8i ₆=j E(vi,tvj,s) = σ 2 v 0 t =s,i =j 8t ₆=s,₈i ₆=j E vi,txi,t0 =E αixi,t0 =0

(111)

The model can be rewritten as follows: yi (T,1) = Xe_i (T,K+1) γ (K+1,1) + εi (T,1) 8 i =1, ..,N

with εi =α_ie+vi, Xei = (ex0i,e,Xi)and γ0 = a,µ,β0 .

E εiε0_j = E h (α_ie+vi) α_je+vj 0 i = σ 2 α ee0+σ 2 vIT =V 0 i =j i ₆=j

(112)

Utilizing the expression for the inverse of a partitioned matrix, we obtain the GLS of (µ, β,a) as:

b

µ_GLS =y x0bβ_BE

ba_GLS =bβ_BE bβ_LSDV

b

(113)

Then, the GLS estimatorbβ_GLS is unbiased and we have: bβ_GLS ! T_!∞bβLSDV b β_GLS ! NT!∞ β

(114)

Now let us assume that the DGP corresponds to the Mundlak’s model

αi =xi0a+αi

and we apply GLS to the initial model:

yi,t =µ+β0xi,t+εi,t

(115)

In general, we have:

b

β_GLS =∆bβ_BE + (IK ∆)bβ_LSDV

In this case, we can show that:

E bβ_BE = β+a bβ_BE p!

N_!∞ β+a

E bβ_LSDV = β

(116)

Theorem

So, ifαi =x0ia+αi,the GLS is biaised when T is …xed. More precisely:

E bβ_GLS =β+∆a

b

β_GLS p!

N_!∞β+∆a As usual, the GLS is asymptotically unbiased:

bβ_GLS p!

(117)

WhenT is …xed and N tends to in…nity, the GLS is biased if there is a correlation between the individual e¤ects and the expanatory variables: plim N!∞ b β_GLS = ∆eplim N!∞ b β_BE + IK ∆e plim N!∞ b β_LSDV = ∆e(β+a) + IK ∆e β = β+∆ea with ∆e = plim N_!∞∆ .

(118)

Summary

E(αijxi1, ..,xiK) =0 E(αijxi1, ..,xiK)6=0

LSDV GLS LSDV GLS

T Fixed, N_!∞ Unbiased — Unbiased Biased

(119)

b. The Hausman’s speci…cation test

(120)

Hausman (1978) proposes a general test of speci…cation, that can be applied in the speci…c context of linear panel models to the issue of speci…cation of individual e¤ects (…xed or random).

Hausman J.A., (1978) ”Speci…cation Tests in Econometrics”, Econometrica, 46, 1251-1271

(121)

The general idea of the an Hausman’s test is the following.Let us consider a particular modely =f (x;β) +ε and particular

hypothesisH0 on this model (parameter, error term etc.).Let us

consider two estimators of theK-vector β,denoted bβ₁ andbβ₂,

both consistent underH0 and asymptotically normally distributed.

1 Under H₀,the estimator b_β

1 attains the asymptotic

Cramer–Rao bound.

2 _Under _H1_,_{the estimator} b_β

2 is biased.

(122)

By examing the "distance" betweenbβ₁ and bβ₂,it is possible to

conclude aboutH0:

1 If the "distance" is small, H0 can not be rejected. 2 _{If the "distance" is large,} _H0 _{can be rejected.}

(123)

This distance is naturally de…ned as follows: H= bβ₂ bβ₁ 0h Var bβ₂ bβ₁ i 1 b β₂ bβ₁

However, the issue is to compute the variance-covariance matrixVar bβ₂ bβ₁ of the di¤erence between both

estimators. Generaly we know V bβ₂ and V bβ₁ , but not

Var bβ₂ bβ₁ .

(124)

Lemma (Hausman, 1978)

Based on a sample of N observations, consider two estimatesbβ₁

andbβ₂ that are both consistent and asymptotically normally

distributed, withbβ₁ attaining the asymptotic Cramer–Rao bound

so thatpN bβ₁ β is asymptotically normally distributed with

variance–covariance matrix V1. Suppose

p

N bβ₂ β is

asymptotically normally distributed, with mean zero and variance–covariance matrix V2. Let bq =bβ₂ bβ₁. Then the

limiting distribution [under the null] ofpN bβ₁ β andpNbq

has zero covariance:

(125)

Theorem

From this lemma, it follows that

Var bβ₂ bβ₁ =Var bβ₂ Var bβ₁

Thus, Hausman suggests using the statistic H= bβ₂ bβ₁ 0h Var bβ₂ Var bβ₁ i 1 b β₂ bβ₁ or equivalently H =bq0[Var(bq)] 1bq

(126)

Under the null hypothesis, this statistic is distributed

asymptotically as central chisquare, withK degrees of freedom.

H HO !

N!∞χ

2_(K₎

Under the alternative, it has a noncentral chi-square distribution with noncentrality parameter_eq0[Var (bq)] 1eq, where_eq is de…ned as follows: e q = plim H1/N_!∞ b β₂ bβ₁

(127)

Problem

Now, apply the Hausman’s test to discriminate between …xed e¤ects methods and random e¤ects methods. We assume thatαi are random variable and the key assumption tested is here de…ned as:

H0 : E(αijXi) =0 H1 : E(αijXi)6=0

(128)

This test can be interpreted as a speci…cation test between "…xed e¤ect methods" and "random e¤ect methods".

1 If the null is rejected, the correlation between individual

e¤ects and the explicative variables induces a bias in the GLS estimates. So, a standard LSDV approach (…xed e¤ect method) has to be privilegiated.

2 _{If the null is not rejected, we can use a GLS estilmator}

(random e¤ect method) and specify the individual e¤ects as random variables (random e¤ects model).

(129)

How to implement this test? Let us consider the standard model with random e¤ects (µ=0):

yi =Xiβ+eαi+vi

1 Under H₀ (and assumptionsH₂)we know that b_β

LSDV and

b

β_GLS are consistent and asymptotically normally distributed.

2 Under H₀, b_β

GLS is BLUE and attains asymptotic Cramer–Rao

bound.

3 Under H₁, b_β

GLS is biased.

(130)

According to the Hausman’s lemma, we have:

cov bβ_GLS, bβ_LSDV bβ_GLS =0()cov bβ_LSDV,bβ_GLS =Var bβ_GLS

Since,

Var bβ_LSDV bβ_GLS =Var bβ_LSDV +Var bβ_GLS 2cov bβ_LSDV,bβ_GLS

We have:

(131)

De…nition

The Hausman’s speci…cation test statistic of individual e¤ect can be de…ned as follows: H= bβ_LSDV bβ_GLS 0h Var bβ_LSDV Var bβ_GLS i 1 b β_LSDV bβ_GLS UnderH0,we have: H H_!O NT_!_∞χ 2_(K₎

(132)

1 WhenN is …xed andT tends to in…nity, b_β

GLS andbβMCG

become identical. However, it was shown by Ahn and Moon

(2001) that the numerator and denominator ofH approach

zero at the same speed. Therefore the ratio remains chi-square distributed. However, in this situation the …xed-e¤ects and random-e¤ects models become indistinguishable for all practical purposes.

2 The more typical case in practice is that N is large relative to

T, so that di¤erences between the two estimators or two

(133)

Example

1951 to 1985 (T =35).

si,t the number of strike days for 1000 workers for the country

i at time t.

(134)

(135)

Section 3. Random coe¢ cients models

(136)

There are cases in which there are changing economic structures or di¤erent socioeconomic and demographic background factors that

imply thatthe response parameters may be varying over time

(137)

3.1. Variable coe¢ cient model

(138)

When data do not support the hypothesis of coe¢ cients being the same, a single-equation model in its most general form can be written as:

yit = K

∑

k=1

β_kitxkit+vit i =1, ..,N andt =1, ..,T

where, in contrast to previous sections, we no longer treat the intercept di¤erently than other explanatory variables and let

(139)

Fact

We assume that parameters do not vary with time (the time stability issue is not speci…c to panel data econometrics)

β_kit =β_ki yit = K

∑

k=1 β_kixkit+vit i =1, ..,N and t =1, ..,T

(140)

In the case in which only cross-sectional di¤erences are present, this approach is equivalent to postulating a separate regression for each cross-sectional unit

yi =Xiβ_i +vi i =1, ..,N

or

yit = β0_ixit +vit i =1, ..,N and t =1, ..,T

whereβ_i = (β_1i,β_2i, ...,β_Ki) is a 1 K vector of parameters,and

(141)

De…nition

Alternatively, each regression coe¢ cient can be viewed asa

random variable with a probability distribution (e.g., Hurwicz (1950); Klein (1953); Theil and Mennes (1959); Zellner (1966)).

(142)

1 The random-coe¢ cient speci…cation reduces the number of

parameters to be estimated substantially, while still allowing the coe¢ cients to di¤er from unit to unit and/or from time to time.

2 _{Depending on the type of assumption about the parameter}

variation, it can be further classi…ed into one of two categories: stationary and nonstationary random-coe¢ cient models.

(143)

De…nition

Stationary random-coe¢ cient models regard the coe¢ cients as having constant means and variance–covariances.Namely, the

K 1 vector of parameters β_i is speci…ed as:

β_i = β+ζ_i

fori =1, ..,N, where βis a K 1 vector of constants, and ζi is a

K 1 vector of stationary random variables with zero means and

constant variance–covariances.

(144)

For this type of model we are interested in

1 _{Estimating the mean coe¢ cient vector} _β_,

2 _{Predicting each individual component} _β

i,

3 Estimating the dispersion of the individual-parameter vector

β_i (variance covariance matrix ∆)

4 Testing the hypothesis that the variances of _ξ

i (and so βi)are

(145)

Fact

Because of the computational complexities, variable-coe¢ cient models have not gained as wide acceptance in empirical work as has the variable-intercept model. However, that does not mean that there is less need for taking account of parameter

heterogeneity in pooling the data.

(146)

De…nition

When regression coe¢ cients are viewed as invariant over time, but varying from one unit to another, we can write the model as

yit = K

∑

k=1

(β_k+ξ_ki)xkit+vit i =1, ..,N and t=1, ..,T

whereβ= (β₁,β₂, ...,β_K)can be viewed as the common mean

coe¢ cient 1 K vector, and ξi = (ξ1i,ξ2i, ...,ξKi)as the

(147)

1 If individual observations are heterogeneous or the

performance of individual units from the data base is of

interest, then ξi are treated as …xed constants.

2 _{If conditional on}_x

kit, individual units can be viewed as

random draws from a common population or the population characteristics are of interest, then ξ_i are generally treated

as random variables having zero means and constant variances and covariances.

(148)

De…nition (Fixed-coe¢ cient model)

Whenβ_i are treated as …xed and di¤erent constants, we can stack

theNT observations in the form of the Zellner (1962) seemingly

unrelated regression model

0 @ y1 . yN 1 A= 0 @ X1 0 0 0 .. .. 0 .. XN 1 A 0 @ β₁ . β_N 1 A+ 0 @ v1 . vN 1 A

whereyi andvi are T 1 vectors (yit, ...,yiT) and(vit, ...,viT),

andXi is the T K matrix of the time-series observations of the

(149)

1 If the covariances between di¤erent cross-sectional units are

not zero,e.g. E(vivj0)6=0, the GLS estimator of β0₁, ...,β0_N

is more e¢ cient than the single-equation estimator of i for each cross-sectional unit.

2 IfX_i are identical for alli orE(v_iv_i₀) =_σ2

iIT and

E(vivj0) =0 for i =6 j, the GLS estimator for β0₁, ...,β0_N is

the same as applying least squares separately to the time-series observations of each cross-sectional unit.

(150)

(151)

De…nition

We now assume that all the coe¢ cients β_i are random, with

common mean β. yit = K

∑

k=1 (β_k+ξ_ki)xkit+vit i =1, ..,N and t=1, ..,T

whereβ= (β₁,β₂, ...,β_K)can be viewed as the common mean

coe¢ cient 1 K vector, and ξ_i = (ξ_1i,ξ_2i, ...,ξ_Ki)as the

individual deviation from the common mean. Let us denote

β_ki = β_k+ξ_ki

(152)

1 The vectorx_i = (x_1i..x_Ki)includes a constant term.The

parameter β_ki associated to this constant term correspond to

an individual e¤ect

2 An alternative notation is:

yit =α+ K

∑

k=2 (β_k+ξ_ki)xkit+αi+vit E(αi) =0

(153)

We will consider a set of assumptions used in the seminal paper of Swamy (1970)

Swamy P.A. (1970), ”E¢ cient Inference in a Random

Coe¢ cient Regression Model”,Econometrica, 38, 311-323

(154)

Assumptions (Swamy, 1970) Let us assume that E(ξi) =0, E(vi) =0 E ξ_iξ0_j = ∆ 0 i =j 8i ₆=j E xitξj0 =0, E ξivj0 =0, 8(i,j) E(vivj0) = σ 2 iIT 0 t =s,i =j 8t ₆=s,₈i ₆=j

(155)

Remark : we assume that the error term vi is heteroskedastic: E vivi0 =σ2iIT

(156)

De…nition

The two …rst moments of the vector of random parameters

β_i = β+ξi are de…ned by,8i =1, ..N : E(β_i) (K,1) = β (K,1) ∆ (K,K)=E βiβ 0 i =E ξiξ0i = 0 B B B @ σ2_ξ 1 σξ1,ξ2 ... σξ1,ξK σξ2,ξ1 σ 2 ξ2 ... σξ2,ξK ... ... ... ... σξK,ξ1 σξK,ξ2 ... σ 2 ξK 1 C C C A

(157)

Fact

The matrix∆ corresponds to variance covariance matrix of the random parametersβ_i = (β_1i,β_2i, ..,β_Ki)0, which is assumed to be

common to all cross section units:

∆ (K,K)= 0 B B B @ σ2_β 1 σβ1,β2 ... σβ1,βK σβ2,β1 σ 2 β2 ... σβ2,βK ... ... ... ... σβK,β1 σβK,β2 ... σ 2 βK 1 C C C A

(158)

De…nition

For each cross section unit, we have:

yi =Xiβ+Xiξ_i+vi

with

β_i = β+ξ_i

where the vectorXi include a constant term (i.e. the average of

(159)

De…nition

This model can be rewriten as follows :

yi =Xiβ+εi

with

εi =Xiξi+vi =Xi(β_i β) +vi

(160)

For a given cross unit, the covariance matrix for the composite disturbance termεi =Xiξi+vi is de…ned by:

Φi = E εiε0_i

= E (Xiξ_i+vi) (Xiξ_i +vi)0 = XiE ξiξ0i Xi0+E vivi0

(161)

De…nition

For a given cross unit, the covariance matrix for the composite disturbance termεi =Xiξi+vi is de…ned by:

Φi =Xi∆Xi0+σ2iIT

Stacking allNT observations, the covariance matrix for the

composite disturbance term is either block-diagonal and heteroskedastic or diagonal and heteroskedastic.

(162)

1 Under Swamy’s assumption, the simple regression ofy onX

will yield an unbiased and consistent estimator of β if

(1/NT)X0_X _{converges to a nonzero constant matrix.}

2 But the estimator is ine¢ cient, and the usual least-squares

formula for computing the variance–covariance matrix of the estimator is incorrect, often leading to misleading statistical inferences.

(163)

De…nition

The best linear unbiased estimator ofβ is the GLS estimator

b β_GLS = N

∑

i=1 X_i0Φ_i 1Xi ! 1 N

∑

i=1 X_i0Φ_i 1yi !

(164)

De…nition

The GLS estimatorbβ_GLS is a matrix-weighted average of the

least-squares estimator bβ_i for each cross-sectional unit, with the

weights inversely proportional to their covariance matrices:

Swamy proposed using the least-squares estimators

b

β_i = (X_i0Xi) 1Xi0yi and their residualsbvi =yi Xibβ_i to obtain

unbiased estimators ofσ2_v and∆

b σ2_i = 1 T Ky 0 i h I Xi Xi0Xi 1 X_i0iyi = 1 T K T

∑

t=1 b vi,t

(167)

For the∆ matrix, we have: b ∆ (K,K) = 1 N 1 N

∑

i=1 b β_i !₀#

This estimator, although not unbiased, is nonnegative de…nite and

(169)

Swamy proved that substituting_bσ2_i and∆b for σ2_i and ∆yields an

asymptotically normal and e¢ cient estimator of β. The speed of

convergence of the GLS estimator isN1/2.

(170)

Summary: how to estimate a random coe¢ cient model?

1 Run theN individual regressionsy_i,t =b_β0

ixi,t+bvi,t.

2 Compute b_σ2

i and the Swamy’s estimator ∆b as follows

i=1 X_i0Φb_i 1yi ! b Φi =Xi∆bXi0+σb2iIT

(172)

Example

Swamy (1970) used this model to reestimate the Grunfeld

investment function with the annual data of 11 U.S. corporations. His GLS estimates of the common-mean coe¢ cients of the …rms’ beginning-of-year value of outstanding shares and capital stock are 0.0843 and 0.1961, with asymptotic standard errors 0.014 and 0.0412, respectively. The estimated dispersion measure of these coe¢ cients is

b

∆= 0.0011 0.0002

(173)

Predicting Individual Coe¢ cients

1 _{Sometimes one may wish to predict the individual component}

β_i, because it provides information on the behavior of each

individual and also because it provides a basis for predicting future values of the dependent variable for a given individual.

2 Swamy (1970, 1971) has shown that the best linear unbiased

predictor, conditional on given β_i, is the least-squares

estimator bβ_i.

(174)

De…nition

Lee and Gri¢ ths (1979) suggest predictingβ_i by

β_i = bβ_GLS +∆Xi0 Xi∆Xi0+σ2ιIT

1

yi Xibβ_GLS

This predictor is the best linear unbiased estimator in the sense thatE(β_i β_i) =0, where the expectation is an unconditional

(175)

3.3. Mixed …xed and random coe¢ cient

model

(176)

Many of the previously discussed models can be treated as special

(177)

De…nition

We assume that each cross section unit is di¤erent

yit = K

∑

k=1 β_kixki + m

∑

l=1 γ_liwli+vit i =1, ..,N andt =1, ..,T

wherexit andwit are each aK 1 and anm 1 vector of

explanatory variables that are independent of the error of the equationvit.

(178)

The parameters β

(NK,1)

= (β0₁,β0₂, ...,β0_K)0 are assumed to be

random and parameters γ

(Nm,1)