Chapter 1. Linear Panel Models and Heterogeneity
Master of Science in Economics - University of GenevaChristophe Hurlin, Université d’Orléans
Université d’Orléans
January 2010
The outline of the chapter:
1 Speci…cation tests and analysis of covariance 2 Linear unobserved e¤ects panel data models 3 Fixed or random methods?
4 Fixed-E¤ects methods: Least Squares dummy variable
approach
5 Random e¤ects methods
Section 1.
Speci…cation tests and analysis of
covariance
A linear model commonly used to assess the e¤ects of both quantitative and qualitative factors is postulated as
yi,t =αi,t+β0i,txi,t+εi,t
where8i =1, ..,N, 8t =1, ..,T
αit and β= (β1it,β2it, ...,βKit)are 1 1 and 1 K vectors of
parameters that vary acrossi andt,
xit = (x1it, ...,xKit)is a 1 K vector of exogenous variables,
Three aspects of the estimated regression coe¢ cients can be tested:
1 the homogeneity of regression slope coe¢ cients
2 the homogeneity of regression intercept coe¢ cients.
3 thetime stability of parameters (slopes and constants). We
will not consider this issue (not speci…c to panel data models) here.
We assume that parameters areconstant over time, but can vary across individuals.
yi,t =αi+β0ixi,t+εi,t
Three types of restrictions can be imposed on this model. Regression slope coe¢ cients are identical, and intercepts are not (model with individual / unobserved e¤ects).
yi,t =αi +β0xi,t+εi,t
Regression intercepts are the same, and slope coe¢ cients are not
(unusual).
yi,t = α+β0ixi,t +εi,t
Both slope and intercept coe¢ cients are the same(homogeneous
/ pooled panel).
De…nition
Anheterogeneous panel data model is a model in which all parameters (constant and slope coe¢ cients) vary accross individuals.
De…nition
Anhomogeneous panel data model (or pooled model) is a model in which all parameters (constant and slope coe¢ cients) are common
Figure: Hsiao (2003) [ ]N i H Test :i i 1, 1 0α=αβ=β∀∈ vraie H1 0 rejetée H1 0 t i t i t i x y,=α+β',+ε, [ ]N i H Test:i 1, 2 0β=β∀∈ vraie H2 0 rejetée H2 0 t i t i i i t i x y,=α+β',+ε, TestH: i i[ ]1,N 3 0α=α∀∈ vraie H3 0 Hrejetée 3 0 t i t i i t i x y,=α+β',+ε, t i t i t i x y,=α+β',+ε,
Lemma
Under the assumption that theεit are independently normally distributed over i and t with mean zero and varianceσ2ε, F tests
First step (homogeneous/ pooled assumption)
Let us consider the general model
yi,t =αi +β0xi,t+εi,t
The hypothesis of common intercept and slope can be viewed as a set of(K +1)(N 1)linear restrictions:
H01 : βi = β αi =α 8i 2 [1,N]
Ha1 : 9(i,j)2 [1,N] / βi 6= βj ou αi 6=αj
Under the alternative H1,there are NK estimated slope
coe¢ cients for the N vectors βi (K 1)and estimatedN
constants.
Under H1, the unrestricted residual sum of squaresS1 divided
byσ2ε has a chi-square distribution with NT N(K +1)
De…nition
Under the homogeneous assumptionH01,
H01 : βi
(K,1)
= β
(K,1)
αi = α 8i 2[1,N]
the F statistic,denotedF1,and de…ned by: F1 =
(RSS1,c RSS1)/[(N 1) (K+1)]
RSS1/[NT N(K+1)]
has a Fischer distribution with(N 1) (K +1)and
NT N(K +1)degrees of freedom. RSS1 denotes the residual
sum of squares of the model andRSS1,c the residual sum of
squares of the constrained model:
yi,t =α+β0xi,t +εi,t
UnderH1,the residual sum of squares is equal to the sum of the N
residual sum of squares associated to theN individual regressions:
RSS1 = N
∑
i=1 RSS1,i = N∑
i=1 Syy,i Sxy0 ,iSxx1,iSxy,i with Syy,i = T∑
t=1 (yi,t yi) 2 with xi = 1 T T∑
t=1 xi,t and yi = 1 T T∑
t=1 yi,t Sxx,i = T∑
t=1 (xi,t xi) (xi,t xi)0 Sxy,i = T∑
t=1 (xi,t xi) (yi,t yi)0UnderH01,the model is :
yi,t =α+β0xi,t +εi,t
the least-squares regression of the pooled model yields parameter estimates b β=Sxx1Sxy Sxx = N
∑
i=1 T∑
t=1(xi,t x) (xi,t x)0 with x =
1 NT N
∑
i=1 T∑
t=1 xi,t Sxy = N∑
i=1 T∑
t=1(xi,t x) (yi,t y)0 with y = 1
NT N
∑
i=1 T∑
t=1 yi,tUnderH01,the overall RSS is de…ned by SCR1,c =Syy Sxy0 Sxx1Sxy with Syy = N
∑
i=1 T∑
t=1 (yi,t yi) 2 Sxx,i = N∑
i=1 T∑
t=1 (xi,t xi) (xi,t xi)0 Sxy,i = N∑
i=1 T∑
t=1 (xi,t xi) (yi,t yi)0Second step (individual/unobserved e¤ects)
Let us consider the general model
yi,t =αi+β0ixi,t+εi,t
The hypothesis of heterogeneous intercepts but homogeneous
slopes can be reformulated as subject to(N 1)K linear
restrictions (non restrictions on αi).
H02 : βi =β 8i =1, ..N
De…nition
Under the assumptionH02,
H02 : βi =β 8i =1, ..N
the F statistic, denotedF2,and de…ned by:
F2 =
(RSS1,c0 RSS1)/[(N 1)K] RSS1/[NT N(K+1)]
has a Fischer distribution with(N 1)K etNT N(K +1)
degrees of freedom underH02. RSS1 denotes the residual sum of
squares of the model andRSS1,c0 the residual sum of squares of
the constrained model (model with individual e¤ects):
UnderH02,the residual sum of squares is: RSS1,c0 = N
∑
i=1 Syy,i N∑
i=1 Sxy,i !0 N∑
i=1 Sxx,i ! 1 N∑
i=1 Sxy,i ! Syy,i = T∑
t=1 (yi,t yi) 2 with xi = 1 T T∑
t=1 xi,t yi = 1 T T∑
t=1 yi,t Sxx,i = T∑
t=1 (xi,t xi) (xi,t xi)0 Sxy,i = T∑
t=1 (xi,t xi) (yi,t yi)0Last step
IfH2
0 is not rejected, one can also apply a conditional test for
homogeneous intercepts (N 1 linear restrictions).
H03 : αi =α 8i =1, ..,N given βi = β
Under the null, the model is homogeneous (pooled) and the restricted residual sum of squares is SCR1,c..
Under the alternative, the model is yi,t =αi+β0xi,t +εi,t,
De…nition
Under the assumptionH03,
H03 : αi =α 8i =1, ..,N given βi = β
the F statistic, denotedF3,and de…ned by: F3 =
(RSS1,c RSS1,c0)/(N 1) RSS1,c0/[N(T 1) K]
(1)
has a Fischer distribution withN 1 and N(T 1) K degrees
of freedom underH02. RSS1,c0 denotes the residual sum of squares
of the model with individual e¤ects andSCR1,c the residual sum of
squares of the pooled model previously de…ned.
Example
Let us consider a simple panel regression model for the total number of strikes days in OECD countries. We have a balanced
panel data set for 17 countries (N =17) and annual data form
1951 to 1985 (T =35). General idea: evaluate the link between
strikes and some macroeconomic factors (in‡atiion, unemployment etc..)
s,t =αi+βiu,t+γipit+εit 8i =1, ..,17
si,t the number of strike days for 1000 workers for the country i at time t.
uitt the unemployement rate
pi,t,the in‡ation rate
Conclusion of section 1.
Fact
It is possible to test the heterogeneity / homogeneity of the parameters under some speci…c assumptions(normality of residuals). More generaly, the assumption of heterogenity / homogeneity of the parameters (slope coe¢ cients and constants) has to be evaluated with an economic interpretation.
Example
Example: It is reasonnable to assume that the slope parameters of the production function are the same accros countries? what does it imply? Should I impose a common mean for the TFP for France and Germany?
Section 2.
Linear unobserved/individual e¤ects
panel data models
2.1. De…nitions
We now consider a model with individual e¤ects and common slope parameters.
De…nition
A linear unobserved / individual e¤ects panel data model is de…ned as follows:
yit = αi+β0xit +εit
where8i =1, ..,N, 8t =1, ..,T
αi and β= (β1,β2, ...,βK)are 1 1 and 1 K vectors of
parameters,
xit = (x1it, ...,xKit)is a 1 K vector of exogenous variables,
εit is the error term, assumed to be i.i.d.,with 8i =1, ..,N,
8t =1, ..,T
De…nition
There are many names forαi: (1) unobserved e¤ects, (2)
individual e¤ects, (3) unobserved component, (4) latent variable (for random e¤ects models), (5) individual heterogeneity. De…nition
Theεit are called the idiosyncratic errors or idiosyncratic disturbancesbecause these change accross t as well as accross i.
Writting in vector form. Let us denote yi (T,1) = 0 B B @ yi,1 yi,2 ... yi,T 1 C C A Xi (T,K) = 0 B B @
x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2
... ... ... ...
x1,i,T x2,i,T ... xK,i,T
1 C C A
Let us denotee a unit vector and εi the vector of errors:
e (T,1)= 0 B B @ 1 1 ... 1 1 C C A εi (T,1) = 0 B B @ εi,1 εi,2 ... εi,T 1 C C A
De…nition
For a given individuali, the linear unobserved / individual
e¤ects panel data modelis de…ned as follows:
yi =eαi+Xiβ+εi 8i =1, ..,N E(εi) =0
E εiεi0 =σ2εIT
E εiεj0 = 0
(T,T) if i 6=j
Example
Let us consider the case of a Cobb Douglas production function in
log, as de…ned previously, for the caseT =3.We have:
yi,t =αi+βkki,t +βnni,t+εi,t 8i,8t 2[1,3]
or in a vectorial form for a countryi :
0 @ yi,1 yi,2 yi,3 1 A= 0 @ 1 1 1 1 Aαi+ 0 @ ki,1 ni,1 ki,2 ni,2 ki,3 ni,3 1 A βk βn + 0 @ εεi,1i,2 εi,3 1 A
It is also possible to stackle all this vectors/matrix as follows Y =eeeα+Xβ+ε Y (TN,1)= 0 B B @ y1 y2 ... yN 1 C C A (TNX,K)= 0 B B @ X1 X2 ... XN 1 C C A ε (TN,1)= 0 B B @ ε1 ε2 ... εN 1 C C A
where 0T is the null vector (T,1).
ee (TN,N)= 0 B B @ e 0T ... 0T 0T e ... 0T ... ... ... 0T 0T 0T ... e 1 C C A eα (N,1)= 0 B B @ α1 α2 ... αN 1 C C A
Example
Consider the case of the production function withT =3 and
N=2 0 B B B B B B B @ y1,1 y1,2 y1,3 y2,1 y2,2 y2,3 1 C C C C C C C A = 0 B B B B B B @ 1 0 1 0 1 0 0 1 0 1 0 1 1 C C C C C C A α1 α2 + 0 B B B B B B B @ k11 n11 k12 n12 k13 n13 k21 n21 k22 n22 k,3 n23 1 C C C C C C C A βk βn + 0 B B B B B B B @ ε11 ε12 ε1,3 ε2,1 ε2,2 ε2,3 1 C C C C C C C A ()Y =eeeα+Xβ+ε
2.2. Fixed and random e¤ects
Especially in methodological papers, but also in applications, one often sees a discussion about whetherαi will be treated as a random e¤ect or a…xed e¤ect.
De…nition
In the traditional approach to panel data models,αi is called a
“random e¤ect”when it is treated as a random variable and a
“…xed e¤ect”when it is treated as a parameter to be estimated for each cross section observationi.
Fact
For Wooldridge (2003), these discussions about whether theαi should be treated as random variables or as parameters to be estimated are wrongheaded for microeconometric panel data applications. With a large number of random draws from the cross section,it almost always makes sense to treat the unobserved e¤ects,αi, as random draws from the population, along with yit and xit. This approach is certainly appropriate from an omitted variables or neglected heterogeneity perspective.
Fact
As suggested by Mundlak (1978), the key issue involvingαi is whether or not it is uncorrelated with the observed explanatory variables xit, for t =1, ..,T .
Mundlak Y. (1978), ”On the Pooling of Time Series and Cross
De…nition
In modern econometric parlance, “random e¤ect” is synonymous with zero correlation between the observed explanatory variables and the unobserved e¤ect:
cov(xit,αi) =0, t =1, ..,T
Actually, a stronger conditional mean independence assumption,
E(αi j xi1, ..,xiT) =0
will be needed to fully justify statistical inference. In applied papers, whenαi is referred to as, say, an “individual random
e¤ect,” thenαi is probably being assumed to be uncorrelated with
De…nition
In microeconometric applications, the term “…xed e¤ect” does not
usually mean thatαi is being treated as nonrandom; rather, it
means that one is allowing for arbitrary correlation between the unobserved e¤ectαi and the observed explanatory variablesxit.
Wooldridge (2003) avoids referring toαi as a random e¤ect or a
…xed e¤ect. Instead, we will refer to αi as unobserved e¤ect, unobserved heterogeneity, and so on. Nevertheless, later we will
labeltwo di¤erent estimation methods random e¤ects
estimation and …xed e¤ects estimation. This terminology is so ingrained that it is pointless to try to change it now.
Example (Production function)
Let us consider the simple example of the Cobb Douglass production function.
yi,t = βiki,t+γini,t +αi+vi,t
In this case,αi corresponds to the unobserved e¤ect on TFP due to
scountry speci…c omitted factor (climate, institutions, organiszation etc..). In this case, we might expect that the the level of factors are positivily correlated with this component of TFP: the more a country is productive, the more it invests in capital for instance.
cov(αi,ki,t)>0 cov(αi,ni,t)>0
Example (Patents and R&D)
Hausman, Hall, and Griliches (1984) estimate (nonlinear) distributed lag models to study the relationship between patents awarded to a …rm and current and past levels of R&D spending. A linear version of their model is:
patentsit =θt+zitγ+δ0RDit+δ1RDi,t 1+..+δ5RDi,t 5+αi+vi,t
whereRDit is spending on R&D for …rm i at timet andzit
contains other explanatory variables. αi is a …rm heterogeneity
term that may in‡uencepatentsit and that may be correlated with
current, past, and future R&D expenditures.
De…nition
The Hausman’s test (1978), is a test of the null hypothesis
cov(xit,αi) =0, t =1, ..,T
and is generally presented as test of speci…cation (…xed or
random) of the unobserved e¤ects (see later, section 5.).
Let us consider the following model:
yi =eαi +Xiβ+εi 8i =1, ..,N
whereαi is a constant term,β0 = (β1β2....βK )2RK and:
yi
(T,1)
= εi,1 εi,2 ... εi,T 0
εi
(T,1)
= εi,1 εi,2 ... εi,T 0
e (T,1)= 1 1 ... 1 0 Xi (T,K) = 0 B B @ x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2 ... ... ... ... x1,i,T x2,i,T ... xK,i,T 1 C C A
Assumtions (H1) Let us assume that errors terms εi,t are i.i.d.8i 2[1,N],8t 2[1,T]with: E(εi,t) =0 E(εi,tεi,s) = σ 2 ε 0 t =s 8t 6=s ,or E (εiε0i) =σ2εIT
where It denotes the identity matrix (T,T). E(εi,tεj,s) =0,8j 6=i,8(t,s),orE εiε0j =0
Theorem
Under assumption H1,the ordinary-least-squares (OLS) estimator of βis the best linear unbiased estimator (BLUE).
De…nition
In this context, the OLS estimatorbβis called the least-squares
dummy-variable(LSDV) of Fixed E¤ect (FE) estimator, because the observed values of the variable for the coe¢ cient αi
takes the form of dummy variables.
OLS estimators ofαi and β and are obtained by minimizing n b αi,bβLSDV o arg min fαi,βgNi=1 S = N
∑
i=1 ε0iεi = N∑
i=1 (yi eαi Xiβ)0(yi eαi Xiβ)FOC1 (with respect toαi): b αi =yi bβ 0 LSDVxi with xi = 1 T T
∑
t=1 xi,t yi = 1 T T∑
t=1 yi,tGiven the second FOC (with respect to β)and the previous result,
we have: De…nition
Under assumptionH1,the …xed e¤ect estimator or LSDV estimator
of parameter βis de…ned by:
b βLSDV = "N
∑
i=1 T∑
t=1 (xi,t xi) (xi,t xi)0 # 1 "N∑
i=1 T∑
t=1 (xi,t xi) (yi,t yi) #1 The computational procedure for estimating the slope
parameters in this model does not require that the dummy variables for the individual (and/or time) e¤ects actually be included in the matrix of explanatory variables.
2 We need only …nd the means of time-series observations
separately for each cross-sectional unit, transform the observed variables by subtracting out the appropriate
time-series means, and then apply the leastsquares method to the transformed data.
The foregoing procedure is equivalent to premultiplying theith
equation
yi =eαi +Xiβ+εi
by aT T idempotent (covariance) transformation matrix (within operator)
Q =IT
1
Tee 0
to “sweep out” the individual e¤ectαi so that individual
observations are measured as deviations from individual means (over time).
Qyi andQXi correspond to the observations are measured as
deviations from individual means :
Qyi = IT 1 Tee 0 yi = yi e 1 Te 0yi = 0 B B @ yi,1 yi,2 ... yi,T 1 C C A 1 T T
∑
t=1 yi,t !0 B B @ 1 1 ... 1 1 C C AQXi = Xi 1 Tee 0Xi = 0 B B @
x1,i,1 x2,i,1 ... xK,i,1 x1,i,2 x2,i,2 ... xK,i,2
... ... ... ...
x1,i,T x2,i,T ... xK,i,T
1 C C A 1 T 0 B B @ 1 1 ... 1 1 C C A ∑Tt=1x1,i,t ∑Tt=1x2,i,t ... ∑Tt=1xK,i,t
Finally, when the transformationQ is applied to a vector of constant (or a time invariant variable), it lead to a null vector.
Qe = IT 1 Tee 0 e = e 1 Tee 0e = e e =0 since e0e = 1 .. 1 0 @ 1 .. 1 1 A=T
So, we have:
yi =eαi +Xiβ+εi
()Qyi = Qeαi+QXiβ+Qεi
De…nition
Under assumptionH1,the …xed e¤ect estimator or LSDV estimator
of parameter βis de…ned by:
bβLSDV = "N
∑
i=1 Xi0QXi # 1"N∑
i=1 Xi0Qyi # where Q =IT 1 Tee 0Example
Let us consider a simple panel regression model for the total number of strikes days in OECD countries. We have a balanced
panel data set for 17 countries (N =17) and annual data form
1951 to 1985 (T =35). General idea: evaluate the link between
strikes and some macroeconomic factors (in‡atiion, unemployment etc..)
s,t =αi+βiu,t+γipit+εit 8i =1, ..,17
si,t the number of strike days for 1000 workers for the country i at time t.
uitt the unemployement rate
Theorem
The LSDV estimatorbβis unbiased and consistent when either N or
T or both tend to in…nity. Its variance–covariance matrix is:
V bβLSDV =σ2ε " N
∑
i=1 Xi0QXi # 1 b βLSDV p! NT!∞βTheorem
However, the estimator for the intercept,bαi, although unbiased, is consistent only when T !∞.
b
αi p
!
2.4. Random e¤ects methods
De…nition
The random speci…cation of unobserved e¤ects corresponds to a
particular case of variance-component or error-component
model, in which the residual is assumed to consist of three components
yi,t =β0xit+εi,t 8i8t
Assumptions (H2) Let us assume that error terms εi,t =
αi+λt +vi,t are i.i.d.with
8i =1, ..,N, 8t =1, .,T E(αi) =E(λt) =E(vi,t) =0 E(αiλt) =E(λtvi,t) =E(αivi,t) =0 E(αiαj) = σ 2 α 0 i =j 8i 6=j E(λtλs) = σ 2 λ 0 t =s 8t 6=s E(vi,tvj,s) = σ 2 v 0 t =s,i =j 8t 6=s,8i 6=j E αixi,t0 =E λtxi,t0 =E vi,txi,t0 =0
De…nition
As suggested by Wooldridge (2001), the "…xed e¤ect"
speci…cation can be viewed as a case in whichαiis a random
parameter withcov αi,xi0,t 6=0, whereas the "random e¤ect
model" correspond to the situation in whichcov αi,xi,t0 =0.
De…nition
The variance ofyit conditional onxit is the sum of three
components: σ2y =σ2α+σ 2 λ+σ 2 v
De…nition
Sometimes, the individual e¤ectsαi are supposed to have a non
zero mean, withE(αi) =µ,then we can de…ned a individual
e¤ectsαi with zero mean. A linear unobserved e¤ects panel data
models is then de…ned as follows:
yi =eµ+Xiβ+εi εi (T,1) = e (T,1)(Tα,1i) + vi (T,1)
De…nition
The vectorial expression of the individual e¤ects model is then:
yi (T,1) = Xei (T,K+1) γ (K+1,1) + εi (T,1) εi (T,1) = e (T,1)(Tα,1i) + vi (T,1) with e Xi = (e :Xi) and γ0 = µ: β0
De…nition
Under assumptions H2,the variance-covariance matrix of εi is
equal to:
V =E εiε0i =E (αie+vi) (αie+vi)0 = σα2ee0+σ2vIT
Its inverse is:
V 1= 1 σ2v IT σ 2 α σ2v+Tσ2α ee0
The presence of αi produces a correlation among residuals of the
same crosssectional unit, though the residuals from di¤erent cross-sectional units are independent
V =E εiε0i =σ2αee0+σ2vIT V = 0 B B @ σ2α+σ2v σ2α ... σ 2 α σ2α+σ2v ... σ2α ... σ2α σ2α+σ2v 1 C C A
However, regardless of whether theαi are treated as …xed or as
random,the individual-speci…c e¤ects for a given sample can be
swept out by the idempotent (covariance) transformation matrixQ
Qyi =Qeµ+QXiβ+Qeαi+Qvi
SinceQe = IT T 1ee0 e =0,we have
Qyi =Qeµ+QXiβ+Qvi
Theorem
Under assumptions H2,when αi are treated as random, the LSDV estimator is unbiased and consistent either N or T or both tend to in…nity. However, whereas the LSDV is the BLUE under the assumption that αi are …xed constants, it is not the BLUE in …nite samples whenαi are assumed random. The BLUE in the latter case is the generalized-least-squares (GLS) estimator.
Fact
Moreover, if the explanatory variables contain some time-invariant variables zi, their coe¢ cients cannot be estimated by LSDV, because the covariance transformation eliminates zi.
Let us consider the model
yi =Xieγ+εi 8i =1, ..,N
whereεi =αie+vi, Xei = (e,Xi)andγ0 = µ,β0 .The variance
covariance matrix V =E(εiε0i) is known.
De…nition
If the variance covariance matrix V is known, the GLS estimator of the γ vector, denoted bγGLS,is de…ned by:
b γGLS = " N
∑
i=1 e Xi0V 1Xei # 1" N∑
i=1 e Xi0V 1yi #De…nition
Following Maddala (1971), we writeV as:
V 1 = 1
σ2v
Q+ψ1
Tee 0
where Q = (IT ee0/T) and where parameter ψis de…ned by:
ψ= σ
2 v
σ2v+Tσ2α
Given this de…nition ofV 1,we have: b γGLS = " N
∑
i=1 e Xi0 Q+ψ1 Tee 0 Xie # 1"N∑
i=1 e Xi0 Q+ψ1 Tee 0 yi # b γGLS = "N∑
i=1 e Xi0QXie +ψ1 T N∑
i=1 e Xi0ee0Xie # 1"N∑
i=1 e Xi0Qyi+ψ1 T N∑
i=1 e Xi0ee0yi # withXei = (e Xi)and γ0 = µ β0b µGLS b βGLS = 2 6 6 4 ψNT ψT N ∑ i=1 x0i ψT N ∑ i=1 xi N ∑ i=1 Xi0QXi +ψT N ∑ i=1 xix 0 i 3 7 7 5 1 2 4 ψ NT y N ∑ i=1 Xi0Qyi+ψT N ∑ i=1 xiyi 3 5
Using the formula of the partitioned inverse, we can derivebβGLS.
De…nition
If the variance covariance matrix V is known,the GLS estimator of
vector βis de…ned by:
bβGLS = " 1 T N
∑
i=1 Xi0QXi+ψ N∑
i=1 (xi x) (xi x)0 # 1 " 1 T N∑
i=1 Xi0Qyi+ψ N∑
i=1 (xi x) (yi y) #This estimator can be expressed as a weigthed average of the LSDV (OLS) estimator and the between estimator.
De…nition
The between-group estimator or between estimator,denotedbβBE,
corresponds to theOLS estimator obtained in the model:
yi =c+β0xi+εi 8i =1, ..,N b βBE = "N
∑
i=1 (xi x) (xi x)0 # 1"N∑
i=1 (xi x) (yi y) #The estimatorbβBE is called the between-group estimator because
it ignores variation within the group
De…nition
The pooled estimator,denotedbβpooled,corresponds to theOLS
estimator obtained in the model:
yit = α+β0xit +εit 8i =1, ..,N 8t =1, ..,T b βpooled = " T
∑
t=1 N∑
i=1 (xi,t x) (xi x)0 # 1 " T∑
t=1 N∑
i=1 (xit x) (yit y) #Theorem
Under assumptions H2, the GLS estimator bβGLS is a weighted
average of the between-group bβBE and the within-group (LSDV)
estimatorsbβLSDV.
b
βGLS =∆bβBE + (IK ∆)bβLSDV
where∆ denotes a weigth matrix de…ned by:
∆ = ψT "N
∑
i=1 Xi0QXi+ψT N∑
i=1 (xi x) (xi x)0 # 1 "N∑
i=1 (xi x) (xi x)0 #1 Ifψ!1, then GLS converges to the OLS pooled estimator. b βGLS p! ψ!1 b βpooled
2 Ifψ!0, the GLS estimator converges to LSDV estimator.
bβGLS p!
ψ!0
Fact
The constantψmeasures the weight given to the between-group
variation.
In the LSDV (or …xed-e¤ects model) procedure, this source of variation is completely ignored.
The OLS procedure corresponds toψ=1. The between-group and
within-group variations are just added up.
Fact
The procedure of treating αi as random provides a solution intermediate between treating them all as di¤erent and treating them all as equal.
Given the de…nition ofψ,we have: ψ= σ 2 v σ2v+Tσ2α Tlim!∞ψ=0 Fact
When T tends to in…nity, the GLS estimator converges to the LSDV estimator:
bβGLS !
T!∞bβLSDV
This is because when T !∞, we have an in…nite number of observations for each i .Therefore, we can consider eachαi as a random variable which has been drawn once and forever, so that for each i we can pretend that they are just like …xed parameters.
Fact
Computation of the GLS estimator can be simpli…ed by introducing a transformation matrix P such that
P = hIT 1 ψ1/2 (1/T)ee0 i We have V 1 = 1 σ2v P0P
Premultiplying the model by the transformation matrix P, we obtain the GLS estimator by applying the least-squares method to the transformed model (Theil (1971, Chapter 6)).
This is equivalent to …rst transforming the data by subtracting a
fraction 1 ψ1/2 of individual meansyi andxi from their
correspondingyit and xit, then regressingyit 1 ψ1/2 yi on
aconstant andxit 1 ψ1/2 xi.
We can show that: var bβGLS =σ2v " N
∑
i=1 Xi0QXi +ψT N∑
i=1 (xi x) (xi x)0 # 1Becauseψ>0, we see immediately that the di¤erence between
the covariance matrices ofbβLSDV and bβGLS is a positive
semide…nite matrix. ForK =1,we have:
De…nition
If the variance componentsσ2ε andσ2α are unknown, we can use
two-stepGLS estimation.
1 In the …rst step, we estimate the variance components using
some consistent estimators.
2 In the second step, we substitute their estimated values into
b γGLS = " N
∑
i=1 e Xi0Vb 1Xei # 1" N∑
i=1 e Xi0Vb 1yi #or its equivalent form.
Lemma
When the sample size is large (in the sense of either N !∞,or T !∞), the two-step GLS estimator will have the same asymptotic e¢ ciency as the GLS procedure with known variance components.
Lemma
Even for moderate sample size [for T 3,N (K +1) 9; for T 2,N (K+1) 10], the two-step procedure is still more e¢ cient than the covariance (or within-group) estimator in the sense that the di¤erence between the covariance matrices of the covariance estimator and the two-step estimator is nonnegative de…nite
Noting thatyi =αi+β0xi+εi and
(yit yi) = (xit xi) + (vit vi), we can use the within- and
between-group residuals to estimate σ2ε andσ2α by
b σ2v = N ∑ i=1 T ∑ t=1 h (yi,t yi) bβ 0 LSDV (xi,t xi) i2 N(T 1) K b σ2α = N ∑ i=1 yi bβ0LSDVxi 2 N K 1 bσ 2 v
Then, we have an estimate ofψ andV 1 b ψ= σb 2 v b σ2v+Tbσ2α ! b V 1 = 1 b σ2v Q+bψ1 Tee 0
Example
Let us consider a simple panel regression model for the total number of strikes days in OECD countries. We have a balanced
panel data set for 17 countries (N =17) and annual data form
1951 to 1985 (T =35).
s,t =αi+βu,t+γpit+εit 8i =1, ..,17
si,t the number of strike days for 1000 workers for the country i at time t.
uitt the unemployement rate
pi,t,the in‡ation rate
2.5. Fixed or random methods?
Fact
Whether to treat the e¤ects as …xed or random makes no
di¤erence when T is large, because both the LSDV estimator and the generalized least-squares estimator become the same estimator:
bβGLS !
WhenT is …nite and N is large, whether to treat the e¤ects as …xed or random is not an easy question to answer. It can make a surprising amount of di¤erence in the estimates of the parameters.
Example
For example, Hausman (1978) found that using a …xed-e¤ects speci…cation produced signi…cantly di¤erent results from a random-e¤ects speci…cation when estimating a wage equation using a sample of 629 high school graduates followed over six years by the Michigan income dynamics study. The explanatory variables in the Hausman wage equation include a piecewise-linear
representation of age, the presence of unemployment or poor health in the previous year, and dummy variables for
Figure: Hausman (1978)
Fact
In the random-e¤ects framework, there are two fundamental assumptions. One is that the unobserved individual e¤ectsαi are random draws from a common population. The other is that the explanatory variables are strictly exogenous. That is, the error terms are uncorrelated with (or orthogonal to) the past, current, and future values of the regressors:
What happens when this condition is violated?
E(αijxi1, ..,xiK)6=0 or E αixi0,t 6=0
1 The Mundlak’s speci…cation (1978) 2 The Hausman test
Mundlak (1978) criticized the random-e¤ects formulation on the grounds that it neglects the correlation that may exist between the e¤ectsαi and the explanatory variables xit. There are reasons to
believe that in many circumstancesαi andxit are indeed correlated.
Mundlak Y. (1978), ”On the Pooling of Time Series and Cross
Section Data”,Econometrica, 46, 69-85.
The properties of various estimators we have discussed thus far depend on the existence and extent of the relations between the X’s and the e¤ects. Therefore, we have to consider the joint distribution of these variables. However, αi are unobservable. =) Mundlak (1978a) suggested that we approximate E(αixi,t)by a linear function.
De…nition
Let us assume that the individual e¤ects satisfy:
αi =xi0a+αi
with a2 RK andE(αixit0) =0.Under this assumption, the model
is:
yi,t =µ+β0xi,t+x0ia+εi,t
εi,t =αi +vi,t
Assumptions H4 The error term εi,t = αi +vi,t satis…es, 8i 2[1,N],8t2 [1,T] E(αi) =E(vi,t) =0 E(αivi,t) =0 E αiαj = σ 2 α 0 i =j 8i 6=j E(vi,tvj,s) = σ 2 v 0 t =s,i =j 8t 6=s,8i 6=j E vi,txi,t0 =E αixi,t0 =0
The model can be rewritten as follows: yi (T,1) = Xei (T,K+1) γ (K+1,1) + εi (T,1) 8 i =1, ..,N
with εi =αie+vi, Xei = (ex0i,e,Xi)and γ0 = a,µ,β0 .
E εiε0j = E h (αie+vi) αje+vj 0 i = σ 2 α ee0+σ 2 vIT =V 0 i =j i 6=j
Utilizing the expression for the inverse of a partitioned matrix, we obtain the GLS of (µ, β,a) as:
b
µGLS =y x0bβBE
baGLS =bβBE bβLSDV
b
Then, the GLS estimatorbβGLS is unbiased and we have: bβGLS ! T!∞bβLSDV b βGLS ! NT!∞ β
Now let us assume that the DGP corresponds to the Mundlak’s model
αi =xi0a+αi
and we apply GLS to the initial model:
yi,t =µ+β0xi,t+εi,t
In general, we have:
b
βGLS =∆bβBE + (IK ∆)bβLSDV
In this case, we can show that:
E bβBE = β+a bβBE p!
N!∞ β+a
E bβLSDV = β
Theorem
So, ifαi =x0ia+αi,the GLS is biaised when T is …xed. More precisely:
E bβGLS =β+∆a
b
βGLS p!
N!∞β+∆a As usual, the GLS is asymptotically unbiased:
bβGLS p!
WhenT is …xed and N tends to in…nity, the GLS is biased if there is a correlation between the individual e¤ects and the expanatory variables: plim N!∞ b βGLS = ∆eplim N!∞ b βBE + IK ∆e plim N!∞ b βLSDV = ∆e(β+a) + IK ∆e β = β+∆ea with ∆e = plim N!∞∆ .
Summary
E(αijxi1, ..,xiK) =0 E(αijxi1, ..,xiK)6=0
LSDV GLS LSDV GLS
T Fixed, N!∞ Unbiased — Unbiased Biased
b. The Hausman’s speci…cation test
Hausman (1978) proposes a general test of speci…cation, that can be applied in the speci…c context of linear panel models to the issue of speci…cation of individual e¤ects (…xed or random).
Hausman J.A., (1978) ”Speci…cation Tests in Econometrics”, Econometrica, 46, 1251-1271
The general idea of the an Hausman’s test is the following.Let us consider a particular modely =f (x;β) +ε and particular
hypothesisH0 on this model (parameter, error term etc.).Let us
consider two estimators of theK-vector β,denoted bβ1 andbβ2,
both consistent underH0 and asymptotically normally distributed.
1 Under H0,the estimator bβ
1 attains the asymptotic
Cramer–Rao bound.
2 Under H1,the estimator bβ
2 is biased.
By examing the "distance" betweenbβ1 and bβ2,it is possible to
conclude aboutH0:
1 If the "distance" is small, H0 can not be rejected. 2 If the "distance" is large, H0 can be rejected.
This distance is naturally de…ned as follows: H= bβ2 bβ1 0h Var bβ2 bβ1 i 1 b β2 bβ1
However, the issue is to compute the variance-covariance matrixVar bβ2 bβ1 of the di¤erence between both
estimators. Generaly we know V bβ2 and V bβ1 , but not
Var bβ2 bβ1 .
Lemma (Hausman, 1978)
Based on a sample of N observations, consider two estimatesbβ1
andbβ2 that are both consistent and asymptotically normally
distributed, withbβ1 attaining the asymptotic Cramer–Rao bound
so thatpN bβ1 β is asymptotically normally distributed with
variance–covariance matrix V1. Suppose
p
N bβ2 β is
asymptotically normally distributed, with mean zero and variance–covariance matrix V2. Let bq =bβ2 bβ1. Then the
limiting distribution [under the null] ofpN bβ1 β andpNbq
has zero covariance:
Theorem
From this lemma, it follows that
Var bβ2 bβ1 =Var bβ2 Var bβ1
Thus, Hausman suggests using the statistic H= bβ2 bβ1 0h Var bβ2 Var bβ1 i 1 b β2 bβ1 or equivalently H =bq0[Var(bq)] 1bq
Under the null hypothesis, this statistic is distributed
asymptotically as central chisquare, withK degrees of freedom.
H HO !
N!∞χ
2(K)
Under the alternative, it has a noncentral chi-square distribution with noncentrality parametereq0[Var (bq)] 1eq, whereeq is de…ned as follows: e q = plim H1/N!∞ b β2 bβ1
Problem
Now, apply the Hausman’s test to discriminate between …xed e¤ects methods and random e¤ects methods. We assume thatαi are random variable and the key assumption tested is here de…ned as:
H0 : E(αijXi) =0 H1 : E(αijXi)6=0
This test can be interpreted as a speci…cation test between "…xed e¤ect methods" and "random e¤ect methods".
1 If the null is rejected, the correlation between individual
e¤ects and the explicative variables induces a bias in the GLS estimates. So, a standard LSDV approach (…xed e¤ect method) has to be privilegiated.
2 If the null is not rejected, we can use a GLS estilmator
(random e¤ect method) and specify the individual e¤ects as random variables (random e¤ects model).
How to implement this test? Let us consider the standard model with random e¤ects (µ=0):
yi =Xiβ+eαi+vi
1 Under H0 (and assumptionsH2)we know that bβ
LSDV and
b
βGLS are consistent and asymptotically normally distributed.
2 Under H0, bβ
GLS is BLUE and attains asymptotic Cramer–Rao
bound.
3 Under H1, bβ
GLS is biased.
According to the Hausman’s lemma, we have:
cov bβGLS, bβLSDV bβGLS =0()cov bβLSDV,bβGLS =Var bβGLS
Since,
Var bβLSDV bβGLS =Var bβLSDV +Var bβGLS 2cov bβLSDV,bβGLS
We have:
De…nition
The Hausman’s speci…cation test statistic of individual e¤ect can be de…ned as follows: H= bβLSDV bβGLS 0h Var bβLSDV Var bβGLS i 1 b βLSDV bβGLS UnderH0,we have: H H!O NT!∞χ 2(K)
1 WhenN is …xed andT tends to in…nity, bβ
GLS andbβMCG
become identical. However, it was shown by Ahn and Moon
(2001) that the numerator and denominator ofH approach
zero at the same speed. Therefore the ratio remains chi-square distributed. However, in this situation the …xed-e¤ects and random-e¤ects models become indistinguishable for all practical purposes.
2 The more typical case in practice is that N is large relative to
T, so that di¤erences between the two estimators or two
Example
Let us consider a simple panel regression model for the total number of strikes days in OECD countries. We have a balanced
panel data set for 17 countries (N =17) and annual data form
1951 to 1985 (T =35).
s,t =αi+βiu,t+γipit+εit 8i =1, ..,17
si,t the number of strike days for 1000 workers for the country
i at time t.
uitt the unemployement rate
pi,t,the in‡ation rate
Section 3.
Random coe¢ cients models
There are cases in which there are changing economic structures or di¤erent socioeconomic and demographic background factors that
imply thatthe response parameters may be varying over time
3.1. Variable coe¢ cient model
When data do not support the hypothesis of coe¢ cients being the same, a single-equation model in its most general form can be written as:
yit = K
∑
k=1
βkitxkit+vit i =1, ..,N andt =1, ..,T
where, in contrast to previous sections, we no longer treat the intercept di¤erently than other explanatory variables and let
Fact
We assume that parameters do not vary with time (the time stability issue is not speci…c to panel data econometrics)
βkit =βki yit = K
∑
k=1 βkixkit+vit i =1, ..,N and t =1, ..,TIn the case in which only cross-sectional di¤erences are present, this approach is equivalent to postulating a separate regression for each cross-sectional unit
yi =Xiβi +vi i =1, ..,N
or
yit = β0ixit +vit i =1, ..,N and t =1, ..,T
whereβi = (β1i,β2i, ...,βKi) is a 1 K vector of parameters,and
De…nition
Alternatively, each regression coe¢ cient can be viewed asa
random variable with a probability distribution (e.g., Hurwicz (1950); Klein (1953); Theil and Mennes (1959); Zellner (1966)).
1 The random-coe¢ cient speci…cation reduces the number of
parameters to be estimated substantially, while still allowing the coe¢ cients to di¤er from unit to unit and/or from time to time.
2 Depending on the type of assumption about the parameter
variation, it can be further classi…ed into one of two categories: stationary and nonstationary random-coe¢ cient models.
De…nition
Stationary random-coe¢ cient models regard the coe¢ cients as having constant means and variance–covariances.Namely, the
K 1 vector of parameters βi is speci…ed as:
βi = β+ζi
fori =1, ..,N, where βis a K 1 vector of constants, and ζi is a
K 1 vector of stationary random variables with zero means and
constant variance–covariances.
For this type of model we are interested in
1 Estimating the mean coe¢ cient vector β,
2 Predicting each individual component β
i,
3 Estimating the dispersion of the individual-parameter vector
βi (variance covariance matrix ∆)
4 Testing the hypothesis that the variances of ξ
i (and so βi)are
Fact
Because of the computational complexities, variable-coe¢ cient models have not gained as wide acceptance in empirical work as has the variable-intercept model. However, that does not mean that there is less need for taking account of parameter
heterogeneity in pooling the data.
De…nition
When regression coe¢ cients are viewed as invariant over time, but varying from one unit to another, we can write the model as
yit = K
∑
k=1
(βk+ξki)xkit+vit i =1, ..,N and t=1, ..,T
whereβ= (β1,β2, ...,βK)can be viewed as the common mean
coe¢ cient 1 K vector, and ξi = (ξ1i,ξ2i, ...,ξKi)as the
1 If individual observations are heterogeneous or the
performance of individual units from the data base is of
interest, then ξi are treated as …xed constants.
2 If conditional onx
kit, individual units can be viewed as
random draws from a common population or the population characteristics are of interest, then ξi are generally treated
as random variables having zero means and constant variances and covariances.
De…nition (Fixed-coe¢ cient model)
Whenβi are treated as …xed and di¤erent constants, we can stack
theNT observations in the form of the Zellner (1962) seemingly
unrelated regression model
0 @ y1 . yN 1 A= 0 @ X1 0 0 0 .. .. 0 .. XN 1 A 0 @ β1 . βN 1 A+ 0 @ v1 . vN 1 A
whereyi andvi are T 1 vectors (yit, ...,yiT) and(vit, ...,viT),
andXi is the T K matrix of the time-series observations of the
1 If the covariances between di¤erent cross-sectional units are
not zero,e.g. E(vivj0)6=0, the GLS estimator of β01, ...,β0N
is more e¢ cient than the single-equation estimator of i for each cross-sectional unit.
2 IfXi are identical for alli orE(vivi0) =σ2
iIT and
E(vivj0) =0 for i =6 j, the GLS estimator for β01, ...,β0N is
the same as applying least squares separately to the time-series observations of each cross-sectional unit.
De…nition
We now assume that all the coe¢ cients βi are random, with
common mean β. yit = K
∑
k=1 (βk+ξki)xkit+vit i =1, ..,N and t=1, ..,Twhereβ= (β1,β2, ...,βK)can be viewed as the common mean
coe¢ cient 1 K vector, and ξi = (ξ1i,ξ2i, ...,ξKi)as the
individual deviation from the common mean. Let us denote
βki = βk+ξki
1 The vectorxi = (x1i..xKi)includes a constant term.The
parameter βki associated to this constant term correspond to
an individual e¤ect
2 An alternative notation is:
yit =α+ K
∑
k=2 (βk+ξki)xkit+αi+vit E(αi) =0We will consider a set of assumptions used in the seminal paper of Swamy (1970)
Swamy P.A. (1970), ”E¢ cient Inference in a Random
Coe¢ cient Regression Model”,Econometrica, 38, 311-323
Assumptions (Swamy, 1970) Let us assume that E(ξi) =0, E(vi) =0 E ξiξ0j = ∆ 0 i =j 8i 6=j E xitξj0 =0, E ξivj0 =0, 8(i,j) E(vivj0) = σ 2 iIT 0 t =s,i =j 8t 6=s,8i 6=j
Remark : we assume that the error term vi is heteroskedastic: E vivi0 =σ2iIT
De…nition
The two …rst moments of the vector of random parameters
βi = β+ξi are de…ned by,8i =1, ..N : E(βi) (K,1) = β (K,1) ∆ (K,K)=E βiβ 0 i =E ξiξ0i = 0 B B B @ σ2ξ 1 σξ1,ξ2 ... σξ1,ξK σξ2,ξ1 σ 2 ξ2 ... σξ2,ξK ... ... ... ... σξK,ξ1 σξK,ξ2 ... σ 2 ξK 1 C C C A
Fact
The matrix∆ corresponds to variance covariance matrix of the random parametersβi = (β1i,β2i, ..,βKi)0, which is assumed to be
common to all cross section units:
∆ (K,K)= 0 B B B @ σ2β 1 σβ1,β2 ... σβ1,βK σβ2,β1 σ 2 β2 ... σβ2,βK ... ... ... ... σβK,β1 σβK,β2 ... σ 2 βK 1 C C C A
De…nition
For each cross section unit, we have:
yi =Xiβ+Xiξi+vi
with
βi = β+ξi
where the vectorXi include a constant term (i.e. the average of
De…nition
This model can be rewriten as follows :
yi =Xiβ+εi
with
εi =Xiξi+vi =Xi(βi β) +vi
For a given cross unit, the covariance matrix for the composite disturbance termεi =Xiξi+vi is de…ned by:
Φi = E εiε0i
= E (Xiξi+vi) (Xiξi +vi)0 = XiE ξiξ0i Xi0+E vivi0
De…nition
For a given cross unit, the covariance matrix for the composite disturbance termεi =Xiξi+vi is de…ned by:
Φi =Xi∆Xi0+σ2iIT
Stacking allNT observations, the covariance matrix for the
composite disturbance term is either block-diagonal and heteroskedastic or diagonal and heteroskedastic.
1 Under Swamy’s assumption, the simple regression ofy onX
will yield an unbiased and consistent estimator of β if
(1/NT)X0X converges to a nonzero constant matrix.
2 But the estimator is ine¢ cient, and the usual least-squares
formula for computing the variance–covariance matrix of the estimator is incorrect, often leading to misleading statistical inferences.
De…nition
The best linear unbiased estimator ofβ is the GLS estimator
b βGLS = N
∑
i=1 Xi0Φi 1Xi ! 1 N∑
i=1 Xi0Φi 1yi !De…nition
The GLS estimatorbβGLS is a matrix-weighted average of the
least-squares estimator bβi for each cross-sectional unit, with the
weights inversely proportional to their covariance matrices:
b βGLS = N
∑
i=1 Wibβi Wi = ( N∑
i=1 h ∆+σ2i Xi0Xi 1 i 1) 1h ∆+σ2i Xi0Xi 1 i 1 b βi = Xi0Xi 1 Xi0yiThe covariance matrix for the GLS estimator is: V bβGLS = N
∑
i=1 Xi0Φi 1Xi ! 1 = ( N∑
i=1 h ∆+σ2i Xi0Xi 1i 1 ) 1Swamy proposed using the least-squares estimators
b
βi = (Xi0Xi) 1Xi0yi and their residualsbvi =yi Xibβi to obtain
unbiased estimators ofσ2v and∆
b σ2i = 1 T Ky 0 i h I Xi Xi0Xi 1 Xi0iyi = 1 T K T
∑
t=1 b vi,tFor the∆ matrix, we have: b ∆ (K,K) = 1 N 1 N
∑
i=1 " b βi 1 N N∑
i=1 b βi ! b βi 1 N N∑
i=1 b βi !0# 1 N N∑
i=1 b σ2i Xi0Xi 1De…nition
However, the previous estimator∆b is not necessarily nonnegative de…nite. In this situation, Swamy (1970) has suggested replacing this estimator by:
b ∆ (K,K)= 1 N 1 N
∑
i=1 " b βi 1 N N∑
i=1 b βi ! b βi 1 N N∑
i=1 b βi !0#This estimator, although not unbiased, is nonnegative de…nite and
Swamy proved that substitutingbσ2i and∆b for σ2i and ∆yields an
asymptotically normal and e¢ cient estimator of β. The speed of
convergence of the GLS estimator isN1/2.
Summary: how to estimate a random coe¢ cient model?
1 Run theN individual regressionsyi,t =bβ0
ixi,t+bvi,t.
2 Compute bσ2
i and the Swamy’s estimator ∆b as follows
b ∆ (K,K)= 1 N 1 N
∑
i=1 " b βi 1 N N∑
i=1 b βi ! b βi 1 N N∑
i=1 bβi !0# b σ2i = 1 T Ky 0 i h I Xi Xi0Xi 1 Xi0iyi3. Compute the GLS estimate of the expectation of individual parameters βi bβGLS = N
∑
i=1 Xi0Φbi 1Xi ! 1 N∑
i=1 Xi0Φbi 1yi ! b Φi =Xi∆bXi0+σb2iITExample
Swamy (1970) used this model to reestimate the Grunfeld
investment function with the annual data of 11 U.S. corporations. His GLS estimates of the common-mean coe¢ cients of the …rms’ beginning-of-year value of outstanding shares and capital stock are 0.0843 and 0.1961, with asymptotic standard errors 0.014 and 0.0412, respectively. The estimated dispersion measure of these coe¢ cients is
b
∆= 0.0011 0.0002
Predicting Individual Coe¢ cients
1 Sometimes one may wish to predict the individual component
βi, because it provides information on the behavior of each
individual and also because it provides a basis for predicting future values of the dependent variable for a given individual.
2 Swamy (1970, 1971) has shown that the best linear unbiased
predictor, conditional on given βi, is the least-squares
estimator bβi.
De…nition
Lee and Gri¢ ths (1979) suggest predictingβi by
βi = bβGLS +∆Xi0 Xi∆Xi0+σ2ιIT
1
yi XibβGLS
This predictor is the best linear unbiased estimator in the sense thatE(βi βi) =0, where the expectation is an unconditional
3.3. Mixed …xed and random coe¢ cient
model
Many of the previously discussed models can be treated as special
De…nition
We assume that each cross section unit is di¤erent
yit = K
∑
k=1 βkixki + m∑
l=1 γliwli+vit i =1, ..,N andt =1, ..,Twherexit andwit are each aK 1 and anm 1 vector of
explanatory variables that are independent of the error of the equationvit.
The parameters β
(NK,1)
= (β01,β02, ...,β0K)0 are assumed to be
random and parameters γ
(Nm,1)