MODELS WITH ARBITRARY ERROR STRUCTURE – CHAMBERLAIN π APPROACH

Simple Regression with Variable Intercepts

Lemma 3.5.1. Based on a sample of N observations, consider two estimates

3.9 MODELS WITH ARBITRARY ERROR STRUCTURE – CHAMBERLAIN π APPROACH

The focus of this chapter is formulation and estimation of linear-regression models when there exist time-invariant and/or individual-invariant omitted (latent) variables. In Sections 3.1–3.7 we have been assuming that the variance–

covariance matrix of the error term possesses a known structure. In fact, when N tends to inﬁnity, the characteristics of short panels allow us to exploit the un-known structure of the error process. Chamberlain (1982, 1984) has proposed to treat each period as an equation in a multivariate setup to transform the problems of estimating a single-equation model involving two dimensions (cross sections and time series) into a one-dimensional problem of estimating a T -variate re-gression model with cross-sectional data. This formulation avoids imposing restrictions a priori on the variance–covariance matrix, so that serial correlation and certain forms of heteroscedasticity in the error process, which covers cer-tain kinds of random-coefﬁcient models (see Chapter 6), can be incorporated.

The multivariate setup also provides a link between the single-equation and simultaneous-equations models (see Chapter 5). Moreover, the extended view of the Chamberlain method can also be reinterpreted in terms of the generalized method of moments (GMM) method to be discussed in Chapter 4 (Cr´epon and Mairesse (1996)).

For simplicity, consider the following model:

yi t = αi^∗+ ␤xi t+ ui t, i = 1, . . . , N,

(3.9.1) t = 1, . . . , T,

and

E(ui t| xi 1, . . . , xi T, α^∗i)= 0. (3.9.2) When T is fixed and N tends to infinity, we can stack the T time-period ob-servations of the i th individual’s characteristics into a vector (y_i, xi), where y_i = (yi 1, . . . , yi T) and x_i = (x_{i 1}, . . . , xi T) are 1× T and 1 × K T vectors, re-spectively. We assume that (y_i, x_i) is an independent draw from a common (unknown) multivariate distribution function with finite fourth-order moments and with Exix_i= x xpositive definite. Then each individual observation vec-tor corresponds to a T -variate regression

T×1 = eα^∗i + (IT ⊗ ␤)xi+ ui, i = 1, . . . , N. (3.9.3) To allow for the possible correlation betweenα^∗i and xi, Chamberlain, fol-lowing the idea of Mundlak (1978a), assumes that

E(α^∗i | xi)= µ +

T t=1

a_txi t = µ + axi, (3.9.4)

where a= (a₁, . . . , aT). While E(yi| xi, αi^∗) is assumed linear, it is possible

3.9 Models with Arbitrary Error Structure 61

to relax the assumption of E(αi^∗| xi) being linear for the linear model. In the case in which E(α^∗i | xi) is not linear, Chamberlain (1984) replaces (3.9.4) by

E^∗(αi^∗| xi)= µ + axi, (3.9.5)

where E^∗(αi^∗| xi) refers to the (minimum-mean-squared-error) linear predictor (or the projection) ofα^∗i onto xi. Then,¹⁸

E^∗(y_i| xi)= E^∗{E^∗(y_i| xi, αi^∗)| xi}

= E^∗{eαi^∗+ (IT⊗ ␤)xi| xi}

= eµ + xi, (3.9.6)

where

T×K T = IT ⊗ ␤+ ea. (3.9.7)

Rewrite equations (3.9.3) and (3.9.6) as

yi = eµ + [IT⊗ xi]␲ + ␯i, i= 1, . . . , N, (3.9.8) where␯i = yi− E^∗(yi| xi) and␲= vec(^∗)= [␲₁, . . . , ␲T] is a 1× K T² vector with␲tdenoting the tth row of. Treating the coefﬁcients of (3.9.8) as if they were unconstrained, we regress (yi− ¯y) on [IT⊗ (xi− ¯x^∗)] and obtain the least-squares estimate of␲ as¹⁹

␲ =ˆ

i=1

[IT⊗ (xi− ¯x^∗)][IT⊗ (xi− ¯x^∗)]

₋₁

i=1

[IT⊗ (xi− ¯x^∗)](yi− ¯y)

= ␲ +

1 N

N i=1

[IT ⊗ (xi− ¯x^∗)][IT ⊗ (xi− ¯x^∗)]

₋₁

1 N

N i=1

[IT⊗ (xi− ¯x^∗)]␯i

, (3.9.9)

where ¯y= (1/N)N

i=1yi and ¯x^∗= (1/N)N i=1xi.

By construction, E(␯i| xi)= 0, and E(␯i⊗ xi)= 0. The law of large num-bers implies that ˆ␲ is a consistent estimator of ␲ when T is ﬁxed and N tends to inﬁnity (Rao (1973, Chapter 2)). Moreover, because

plim

N→∞

1 N

N i=1

(xi− ¯x^∗)(xi− ¯x^∗)= E[xi− Exi][xi− Exi]

= x x − (Ex)(Ex)= x x,

we have√

N ( ˆ␲ − ␲) converging in distribution to (Rao (1973, Chapter 2))

IT⊗ ⁻¹x x

√1 N

N i=1

[IT ⊗ (xi− ¯x^∗)]␯i

IT⊗ ⁻¹x x

√N

N i=1

[␯i⊗ (xi− ¯x^∗)]

. (3.9.10)

So the central limit theorem implies that√

N ( ˆ␲ − ␲) is asymptotically normally distributed, with mean zero and variance–covariance matrix, where²⁰

 = E

(yi− eµ − xi)(yi− eµ − xi)

⊗ ⁻¹x x(xi− Ex)(xi− Ex)⁻¹x x . (3.9.11) A consistent estimator of  is readily available from the corresponding sample moments,

 =ˆ 1 N

N i=1

![(yi− ¯y) − ˆ(xi− ¯x^∗)][(yi− ¯y)

− ˆ(xi− ¯x^∗)]⊗ S⁻¹x x(xi− ¯x^∗)(xi− ¯x^∗)S_{x x}⁻¹"

(3.9.12) where

Sx x = 1 N

N i=1

(xi− ¯x^∗)(xi− ¯x^∗).

Equation (3.9.7) implies that is subject to restrictions. Let ␪ = (␤, a).

We specify the restrictions on [equation (3.9.7)] by the conditions that

␲ = f(␪). (3.9.13)

We can impose these restrictions by using a minimum-distance estimator.

Namely, choose ˆ␪ to minimize

[ ˆ␲ − f(␪)]ˆ⁻¹[ ˆ␲ − f(␪)]. (3.9.14) Under the assumptions that f possesses continuous second partial derivatives, and the matrix of ﬁrst partial derivatives

F= ∂f

∂␪ (3.9.15)

has full column rank in an open neighborhood containing the true parameter␪, the minimum-distance estimator ˆ␪ of (3.9.14) is consistent, and√

N ( ˆ␪ − ␪) is asymptotically normally distributed, with mean zero and variance–covariance matrix

(F⁻¹F )⁻¹. (3.9.16)

3.9 Models with Arbitrary Error Structure 63

The quadratic form

N [ ˆ␲ − f(␪)]ˆ⁻¹[ ˆ␲ − f(␪)] (3.9.17) converges to a chi-square distribution with K T²− K (1 + T ) degrees of freedom.²¹

The advantage of the multivariate setup is that we need only to assume that the T period observations of the characteristics of the i th individual are independently distributed across cross-sectional units with ﬁnite fourth-order moments. We do not need to make speciﬁc assumptions about the error process.

Nor do we need to assume that E(αi^∗| xi) is linear.²²In the more restrictive case that E(α^∗_i | xi) is indeed linear, [then the regression function is linear, that is, E(yi| xi)= eµ + xi] and Var(yi| xi) is uncorrelated with xix_i, (3.9.12) will converge to

E[Var(yi| xi)]⊗ ⁻¹x x. (3.9.18)

If the conditional variance–covariance matrix is homoscedastic, so that Var(yi| xi)= does not depend on xi, then (3.9.12) will converge to

 ⊗ ⁻¹x x. (3.9.19)

The Chamberlain procedure of combining all T equations for a single indi-vidual into one system, obtaining the matrix of unconstrained linear-predictor coefﬁcients, and then imposing restrictions by using a minimum-distance es-timator also has a direct analog in the linear simultaneous-equations model, in which an efﬁcient estimator is provided by applying a minimum-distance procedure to the reduced form (Malinvaud (1970, Chapter 19)). We demon-strate this by considering the standard simultaneous-equations model for the time-series data,²³

yt+ Bxt= ut, t= 1, . . . , T, (3.9.20) and its reduced form

yt = xt+ vt,  = −⁻¹B, vt = ⁻¹ut, (3.9.21) where, B, and are G × G, G × K , and G × K matrices of coefﬁcients, ytand utare G× 1 vectors of observed endogenous variables and unobserved disturbances, respectively, and xtis a K× 1 vector of observed exogenous vari-ables. The ut are assumed to be serially independent, with bounded variances and covariances.

In general, there are restrictions on and B. We assume that the model (3.9.20) is identiﬁed by zero restrictions (e.g., Hsiao (1983)), so that the gth structural equation is of the form

ygt= wgt␪g+ vgt, (3.9.22)

where the components of wgt are the variables in yt and xt that appear in the gth equation with unknown coefﬁcients. Let(␪) and B(␪) be parametric

representations of and B that satisfy the zero restrictions and the normalization rule, where␪= (␪₁, . . . , ␪G). Then␲ = f(␪) = vec{[−⁻¹(␪)B(␪)]}.

Let ˆ be the least-squares estimate of , and

 =˜ 1 minimum-distance estimator is to choose ˆ␪ to

min[ ˆ␲ − f(␪)]˜⁻¹[ ˆ␲ − f(␪)]. (3.9.24) Then we have√

T ( ˆ␪ − ␪) being asymptotically normally distributed, with mean zero and variance–covariance matrix (F˜⁻¹F )⁻¹, where F= ∂f(␪)/∂␪.

The formula for∂␲/∂␪is given in Rothenberg (1973, p. 69):

F= ∂␲

which is the conventional asymptotic covariance matrix for the three-stage least-squares (3SLS) estimator (Zellner and Theil (1962)). If utu_tis correlated with xtx_t, then the minimum-distance estimator of ˆ␪ is asymptotically equivalent to the Chamberlain (1982) generalized 3SLS estimator,

␪ˆG3SLS= (Sw xˆ⁻¹S_{w x} )⁻¹(Sw xˆ⁻¹sx y), (3.9.28)

where ˆ and ˆB are any consistent estimators for and B. When certain equa-tions are exactly identiﬁed, then just as in the conventional 3SLS case, applying the generalized 3SLS estimator to the system of equations, excluding the ex-actly identiﬁed equations, yields the same asymptotic covariance matrix as the estimator obtained by applying the generalized 3SLS estimator to the full set of G equations.²⁴

Appendix 3A 65

However, as with any generalization, there is a cost associated with it. The minimum-distance estimator is efficient only relative to the class of estimators that do not impose a priori restrictions on the variance–covariance matrix of the error process. If the error process is known to have an error-component structure, as assumed in previous sections, then the least-squares estimate of is not efficient (see Section 5.2), and hence the minimum-distance estimator, ignoring the specific structure of the error process, cannot be efficient, although it remains consistent.²⁵The efficient estimator is the GLS estimator. Moreover, computation of the minimum-distance estimator can be quite tedious, whereas the two-step GLS estimation procedure is fairly easy to implement.

APPENDIX 3A: CONSISTENCY AND ASYMPTOTIC

In document [Cheng Hsiao] Analysis of Panel Data(BookFi.org) (Page 78-83)