
Simultaneous-Equations Models

5.4 TRIANGULAR SYSTEM

5.4.1 Identification

A convenient way to model correlations across equations, as well as the correlation of a given individual at different times (or of different members of a group), is to use latent variables to connect the residuals. Let $y_{git}$ denote the value of the variable $y_g$ for the $i$th individual (or group) at time $t$ (or the $t$th member). We can assume that

$$v_{git} = d_g h_{it} + u_{git}, \tag{5.4.1}$$

where the $u_g$ are uncorrelated across equations and across $i$ and $t$. The correlations across equations are all generated by the common omitted variable $h$, which is assumed to have a variance-component structure:

$$h_{it} = \alpha_i + \omega_{it}, \tag{5.4.2}$$

where $\alpha_i$ is invariant over $t$ but is independently identically distributed across $i$ (groups), with mean zero and variance $\sigma_\alpha^2$, and $\omega_{it}$ is independently identically distributed across $i$ and $t$, with mean zero and variance $\sigma_\omega^2$, and is uncorrelated with $\alpha_i$.

An example of the model with $\Gamma$ lower-triangular and $v$ of the form (5.4.1) is (Chamberlain (1977a, 1977b); Chamberlain and Griliches (1975); Griliches (1979))

$$\begin{aligned}
y_{1it} &= \beta_1' x_{it} + d_1 h_{it} + u_{1it}, \\
y_{2it} &= -\gamma_{21} y_{1it} + \beta_2' x_{it} + d_2 h_{it} + u_{2it}, \\
y_{3it} &= -\gamma_{31} y_{1it} - \gamma_{32} y_{2it} + \beta_3' x_{it} + d_3 h_{it} + u_{3it},
\end{aligned} \tag{5.4.3}$$

where $y_1$, $y_2$, and $y_3$ denote years of schooling, a late (postschool) test score, and earnings, respectively, and $x_{it}$ are exogenous variables (which may differ from equation to equation via restrictions on $\beta_g$). The unobservable $h$ can be interpreted as early “ability,” and $u_2$ as measurement error in the test. The index $i$ indicates groups (or families), and $t$ indicates members in each group (or family).

Without the $h$ variables, or if $d_g = 0$, equation (5.4.3) would be only a simple recursive system that could be estimated by applying least squares separately to each equation. The simultaneity problem arises when we admit the possibility that $d_g \neq 0$. In general, if there were enough exogenous variables in the first (schooling) equation that did not appear again in the other equations, the system could be estimated using 2SLS or EC2SLS procedures. Unfortunately, in the income–schooling–ability model using sibling data [e.g., see the survey by Griliches (1979)] there usually are not enough distinct $x$'s to identify all the parameters. Thus, restrictions imposed on the variance–covariance matrix of the residuals will have to be used.

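Given that $h$ is unobservable, we have an indeterminate scale

$$d_g^2\left(\sigma_\alpha^2 + \sigma_\omega^2\right) = c\,d_g^2\left(\frac{1}{c}\sigma_\alpha^2 + \frac{1}{c}\sigma_\omega^2\right). \tag{5.4.4}$$

So we normalize $h$ by letting $\sigma_\alpha^2 = 1$. Then

$$E v_{it} v_{it}' = \left(1 + \sigma_\omega^2\right) d d' + \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_G^2\right) = \Omega, \tag{5.4.5}$$

$$E v_{it} v_{is}' = d d' = \Omega_w \quad \text{if } t \neq s, \tag{5.4.6}$$

$$E v_{it} v_{js}' = 0 \quad \text{if } i \neq j, \tag{5.4.7}$$

where $d = (d_1, \ldots, d_G)'$, and $\operatorname{diag}(\sigma_1^2, \ldots, \sigma_G^2)$ denotes a $G \times G$ diagonal matrix with $\sigma_1^2, \sigma_2^2, \ldots, \sigma_G^2$ on the diagonal.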
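As a numerical illustration of (5.4.4)–(5.4.6), the following Python sketch builds the two covariance matrices and checks that the cross-member covariance has rank 1. The parameter values and variable names are hypothetical, chosen only for illustration; they are not taken from the text.

```python
import numpy as np

# Hypothetical parameter values (illustration only).
d = np.array([1.0, 0.8, 0.5])           # loadings d_g on the latent variable h
sigma2_u = np.array([1.0, 0.6, 0.9])    # variances of u_g
sigma2_omega = 0.25                     # variance of omega_it, with sigma_alpha^2 = 1

# (5.4.4): rescaling d_g^2 by c and the variances of h by 1/c changes nothing.
c = 4.0
lhs = d**2 * (1.0 + sigma2_omega)
rhs = (c * d**2) * (1.0 / c + sigma2_omega / c)
scale_ok = np.allclose(lhs, rhs)

# (5.4.5): covariance of v_it with itself.
Omega = (1.0 + sigma2_omega) * np.outer(d, d) + np.diag(sigma2_u)
# (5.4.6): covariance between two different members of the same group.
Omega_w = np.outer(d, d)

rank_w = np.linalg.matrix_rank(Omega_w)
print("scale indeterminacy holds:", scale_ok)
print("rank of Omega_w:", rank_w)
```

The rank-1 structure of the second matrix is what limits the number of independent equations it can supply for identification.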

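Under the assumption that $\alpha_i$, $\omega_{it}$, and $u_{git}$ are normally distributed, or if we limit our attention to second-order moments, all the information with regard to the distribution of $y$ is contained in

$$C_{y_{tt}} = \Gamma^{-1} B C_{x_{tt}} B' (\Gamma')^{-1} + \Gamma^{-1} \Omega (\Gamma')^{-1}, \tag{5.4.8}$$

$$C_{y_{ts}} = \Gamma^{-1} B C_{x_{ts}} B' (\Gamma')^{-1} + \Gamma^{-1} \Omega_w (\Gamma')^{-1}, \quad t \neq s, \tag{5.4.9}$$

$$C_{yx_{ts}} = -\Gamma^{-1} B C_{x_{ts}}, \tag{5.4.10}$$

where $C_{y_{ts}} = E y_{it} y_{is}'$, $C_{yx_{ts}} = E y_{it} x_{is}'$, and $C_{x_{ts}} = E x_{it} x_{is}'$.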

Stack the coefficient matrices $\Gamma$ and $B$ into a $1 \times G(G + K)$ vector $\theta' = (\gamma_1', \ldots, \gamma_G', \beta_1', \ldots, \beta_G')$. Suppose $\theta$ is subject to $M$ a priori constraints:

$$\Phi(\theta) = \phi, \tag{5.4.11}$$


where $\phi$ is an $M \times 1$ vector of constants. Then a necessary and sufficient condition for local identification of $\Gamma$, $B$, $d$, $\sigma_\omega^2$, and $\sigma_1^2, \ldots, \sigma_G^2$ is that the rank of the Jacobian formed by taking partial derivatives of (5.4.8)–(5.4.11) with respect to the unknowns is equal to $G(G + K) + 2G + 1$ (e.g., Hsiao (1983)).

Suppose there is no restriction on the matrix $B$. The $GK$ equations (5.4.10) can be used to identify $B$ provided that $\Gamma$ is identifiable. Hence, we can concentrate on

$$\Gamma\left[C_{y_{tt}} - C_{yx_{tt}} C_{x_{tt}}^{-1} C_{yx_{tt}}'\right]\Gamma' = \Omega, \tag{5.4.12}$$

$$\Gamma\left[C_{y_{ts}} - C_{yx_{ts}} C_{x_{ts}}^{-1} C_{yx_{ts}}'\right]\Gamma' = \Omega_w, \quad t \neq s. \tag{5.4.13}$$

We note that $\Omega$ is symmetric, and we have $G(G + 1)/2$ independent equations from (5.4.12). But $\Omega_w$ is of rank 1; therefore, we can derive only $G$ independent equations from (5.4.13). Suppose $\Gamma$ is lower-triangular and the diagonal elements of $\Gamma$ are normalized to be unity; there are $G(G - 1)/2$ unknowns in $\Gamma$, and $2G + 1$ unknowns of $(d_1, \ldots, d_G)$, $(\sigma_1^2, \ldots, \sigma_G^2)$, and $\sigma_\omega^2$. We have one less equation than the number of unknowns. In order for the Jacobian matrix formed by (5.4.12), (5.4.13), and a priori restrictions to be nonsingular, we need at least one additional a priori restriction. Thus, for the system

$$\Gamma y_{it} + B x_{it} = v_{it}, \tag{5.4.14}$$

where $\Gamma$ is lower-triangular, $B$ is unrestricted, and $v_{it}$ satisfies (5.4.1) and (5.4.2), a necessary condition for the identification under exclusion restrictions is that at least one $\gamma_{g\ell} = 0$ for $g > \ell$. [For details, see Chamberlain (1976) or Hsiao (1983).]
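The counting argument behind this necessary condition can be tabulated for several values of $G$. The short Python sketch below only does the bookkeeping described above; the function name is ours and purely illustrative.

```python
def count_moments_and_unknowns(G):
    """Equations and unknowns for the triangular system with B unrestricted."""
    eq_within = G * (G + 1) // 2      # distinct elements of the symmetric Omega, (5.4.12)
    eq_between = G                    # the rank-1 Omega_w supplies only G equations, (5.4.13)
    unk_gamma = G * (G - 1) // 2      # below-diagonal elements of Gamma
    unk_other = 2 * G + 1             # d_1..d_G, sigma_1^2..sigma_G^2, sigma_omega^2
    return eq_within + eq_between, unk_gamma + unk_other

shortfalls = {}
for G in (2, 3, 4, 5):
    n_eq, n_unk = count_moments_and_unknowns(G)
    shortfalls[G] = n_unk - n_eq
    print(f"G={G}: equations={n_eq}, unknowns={n_unk}, shortfall={n_unk - n_eq}")
```

The shortfall is one equation whatever the value of $G$, which is why a single additional exclusion restriction is needed.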

5.4.2 Estimation

We have discussed how the restrictions in the variance–covariance matrix can help identify the model. We now turn to the issues of estimation. Two methods are discussed: the purged-instrumental-variable method (Chamberlain (1977a)) and the maximum-likelihood method (Chamberlain and Griliches (1975)). The latter method is efficient, but computationally complicated. The former method is inefficient, but it is simple and consistent. It also helps to clarify the previous results on the sources of identification.

For simplicity, we assume that there is no restriction on the coefficients of exogenous variables. Under this assumption we can further ignore the existence of exogenous variables without loss of generality, because there are no excluded exogenous variables that can legitimately be used as instruments for the endogenous variables appearing in the equation. The instruments have to come from the group structure of the model. We illustrate this point by considering the following triangular system:

The reduced form of (5.4.15) is

$$y_{git} = a_g h_{it} + \epsilon_{git}, \quad g = 1, \ldots, G, \tag{5.4.16}$$

The trick of the purged instrumental-variable (IV) method is to leave h in the residual and construct instruments that are uncorrelated with h. Before going to the general formula, we use several simple examples to show where the instruments come from.


Next, suppose that only $\gamma_{32} = 0$. The reduced form of the model becomes



In this case, the construction of valid instruments is more complicated. It requires two stages. The first stage is to use $y_1$ as a proxy for $h$ in the reduced-form equation for $y_2$:

The second stage is to use $z_2$ as an instrument for $y_1$ in the structural equation $y_3$:

$$y_{3it} = \gamma_{31} y_{1it} + d_3 h_{it} + u_{3it}. \tag{5.4.23}$$

The variable $z_2$ is an appropriate IV because it is uncorrelated with $h$ and $u_3$, but it is correlated with $y_1$, provided $d_2\sigma_1^2 \neq 0$. (If $d_2 = 0$, then $z_2 = y_2 - \gamma_{21} y_1 = u_2$. It is no longer correlated with $y_1$.) Therefore, we require that $h$ appear directly in the $y_2$ equation and that $y_1$ not be proportional to $h$; otherwise we could never separate the effects of $y_1$ and $h$.
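The two stages can be illustrated with a small Monte Carlo sketch in Python. It simulates the $G = 3$ system with $\gamma_{32} = 0$ and no exogenous variables, uses the sibling's $y_1$ as the first-stage instrument, and compares the purged IV estimate of $\gamma_{31}$ with least squares. All parameter values, the sample size, and the variable names are hypothetical assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100_000, 2                      # families and members per family (assumed)
d = np.array([1.0, 0.8, 0.5])          # hypothetical loadings on h
g21, g31 = 0.5, 0.3                    # hypothetical structural coefficients (gamma_32 = 0)
sigma_omega = 0.5

# Latent variable h_it = alpha_i + omega_it with Var(alpha_i) = 1.
alpha = rng.normal(size=(N, 1))
h = alpha + sigma_omega * rng.normal(size=(N, T))
u = rng.normal(size=(3, N, T))         # u_1, u_2, u_3 with unit variances

y1 = d[0] * h + u[0]
y2 = g21 * y1 + d[1] * h + u[1]
y3 = g31 * y1 + d[2] * h + u[2]

# Stage 1: y1 proxies h in the reduced form of y2; the sibling's y1 is the instrument.
y1_sib = y1[:, ::-1]                   # swap the two members within each family
ratio = np.sum(y1_sib * y2) / np.sum(y1_sib * y1)   # estimates a_2 / a_1
z2 = y2 - ratio * y1                   # y2 purged of its dependence on h

# Stage 2: z2 instruments y1 in the y3 equation; h stays in the residual.
g31_iv = np.sum(z2 * y3) / np.sum(z2 * y1)
g31_ols = np.sum(y1 * y3) / np.sum(y1 * y1)
print(f"true {g31:.3f}  purged IV {g31_iv:.3f}  least squares {g31_ols:.3f}")
```

Under these assumed values the least-squares estimate is pulled away from the true coefficient by the omitted $h$, whereas the purged IV estimate is not.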

In order to identify the $y_2$ equation

$$y_{2it} = \gamma_{21} y_{1it} + d_2 h_{it} + u_{2it}, \tag{5.4.24}$$

we can interchange the reduced-form $y_2$ and $y_3$ equations and repeat the two stages. With $\gamma_{21}$ and $\gamma_{31}$ identified, in the third stage we form the residuals

$$v_{2it} = y_{2it} - \gamma_{21} y_{1it} = d_2 h_{it} + u_{2it},$$

Now $d_2/d_1$ and $d_3/d_1$ can be identified by a third application of instrumental variables, using $y_{1is}$, $s \neq t$, as an instrument for $y_{1it}$. (Note that only the ratio of the $d$'s is identified, because of the indeterminate scale of the latent variable.)

Now come back to the construction of IVs for the general system (5.4.15)–(5.4.18). We assume that $T \geq 2$. The instruments are constructed over several stages. At the first stage, let $y_1$ be a proxy for $h$. Then the reduced-form equation for $y_g$ becomes

$$y_{git} = \frac{a_g}{a_1} y_{1it} + \epsilon_{git} - \frac{a_g}{a_1}\epsilon_{1it}, \quad g = 2, \ldots, \ell - 1. \tag{5.4.27}$$

If $T \geq 2$, $a_g/a_1$ can be consistently estimated by using different members in the same group (e.g., $y_{1is}$ and $y_{1it}$, $t \neq s$) as instruments for the $y_g$ equation (5.4.27) when $d_1\sigma_\alpha^2 \neq 0$. Once $a_g/a_1$ is consistently estimated, we form the residual

$$z_{git} = y_{git} - \frac{a_g}{a_1} y_{1it} = \epsilon_{git} - \frac{a_g}{a_1}\epsilon_{1it}, \quad g = 2, \ldots, \ell - 1. \tag{5.4.28}$$

The $z_g$ are uncorrelated with $h$. They are valid instruments for $y_g$ provided $d_g\sigma_1^2 \neq 0$. There are $\ell - 2$ IVs for the $\ell - 2$ variables that remain on the right-hand side of the $\ell$th structural equation after $y_k$ has been excluded.

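To estimate the equations that follow $y_\ell$, we form the transformed variables

$$\begin{aligned}
y^*_{2it} &= y_{2it} - \gamma_{21} y_{1it}, \\
y^*_{3it} &= y_{3it} - \gamma_{31} y_{1it} - \gamma_{32} y_{2it}, \\
&\;\;\vdots \\
y^*_{\ell it} &= y_{\ell it} - \gamma_{\ell 1} y_{1it} - \cdots - \gamma_{\ell,\ell-1} y_{\ell-1,it},
\end{aligned} \tag{5.4.29}$$

and rewrite the $y_{\ell+1}$ equation as

$$y_{\ell+1,it} = \gamma^*_{\ell+1,1} y_{1it} + \gamma^*_{\ell+1,2} y^*_{2it} + \cdots + \gamma^*_{\ell+1,\ell-1} y^*_{\ell-1,it} + \gamma_{\ell+1,\ell}\, y^*_{\ell it} + d_{\ell+1} h_{it} + u_{\ell+1,it}, \tag{5.4.30}$$

where $\gamma^*_{\ell+1,j} = \gamma_{\ell+1,j} + \sum_{m=j+1}^{\ell} \gamma_{\ell+1,m}\gamma^*_{mj}$ for $j < \ell$. Using $y_1$ as a proxy for $h$, we have

$$y_{\ell+1,it} = \gamma^*_{\ell+1,2} y^*_{2it} + \cdots + \gamma_{\ell+1,\ell}\, y^*_{\ell it} + \left(\gamma^*_{\ell+1,1} + \frac{d_{\ell+1}}{d_1}\right) y_{1it} + u_{\ell+1,it} - \frac{d_{\ell+1}}{d_1} u_{1it}. \tag{5.4.31}$$

Because $u_1$ is uncorrelated with $y^*_g$ for $2 \leq g \leq \ell$, we can use $y^*_{git}$ together with $y_{1is}$, $s \neq t$, as instruments to identify $\gamma_{\ell+1,j}$. Once the $\gamma_{\ell+1,j}$ are identified, we can form $y^*_{\ell+1} = y_{\ell+1} - \gamma_{\ell+1,1} y_1 - \cdots - \gamma_{\ell+1,\ell}\, y_\ell$ and proceed in a similar fashion to identify the $y_{\ell+2}$ equation, and so on.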

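Once all the $\gamma$ are identified, we can form the estimated residuals, $\hat v_{it}$. From $\hat v_{it}$ we can estimate $d_g/d_1$ by the same procedure as (5.4.26). Or we can form the matrix $\hat\Omega$ of variance–covariances of the residuals, and the matrix $\hat{\bar\Omega}$ of variance–covariances of averaged residuals $(1/T)\sum_{t=1}^{T} \hat v_{it}$, then solve for $d$, $(\sigma_1^2, \ldots, \sigma_G^2)$, and $\sigma_\omega^2$ from the relations

$$\hat\Omega = \left(1 + \sigma_\omega^2\right) d d' + \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_G^2\right), \tag{5.4.32}$$

$$\hat{\bar\Omega} = \left(1 + \frac{\sigma_\omega^2}{T}\right) d d' + \frac{1}{T}\operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_G^2\right). \tag{5.4.33}$$

In (5.4.33), averaging over the $T$ members leaves the contribution of $\alpha_i$ unchanged and divides the variances of $\omega_{it}$ and $u_{git}$ by $T$.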
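One possible way to solve the two moment relations for $d$, $(\sigma_1^2, \ldots, \sigma_G^2)$, and $\sigma_\omega^2$ is sketched below in Python. It assumes, as (5.4.1)–(5.4.2) with $\sigma_\alpha^2 = 1$ imply, that the averaged residuals have covariance $(1 + \sigma_\omega^2/T)dd' + (1/T)\operatorname{diag}(\sigma_1^2, \ldots, \sigma_G^2)$. The function name, the sign normalization $d_1 > 0$, and the test values are ours, not the text's.

```python
import numpy as np

def solve_variance_components(Omega_hat, Omega_bar_hat, T):
    """Back out d, the sigma_g^2 and sigma_omega^2 from the two covariance matrices."""
    # T * Omega_bar - Omega = (T - 1) d d', so the rank-1 part is isolated first.
    dd = (T * Omega_bar_hat - Omega_hat) / (T - 1)
    d1 = np.sqrt(dd[0, 0])            # sign of d_1 taken as positive (a normalization)
    d = dd[:, 0] / d1
    # An off-diagonal element of Omega equals (1 + sigma_omega^2) d_g d_k.
    sigma2_omega = Omega_hat[0, 1] / (d[0] * d[1]) - 1.0
    sigma2_u = np.diag(Omega_hat) - (1.0 + sigma2_omega) * d**2
    return d, sigma2_u, sigma2_omega

# Check on exact (population) moments built from hypothetical values.
T = 2
d_true = np.array([1.0, 0.8, 0.5])
s2u_true = np.array([1.0, 0.6, 0.9])
s2w_true = 0.25
Omega_hat = (1 + s2w_true) * np.outer(d_true, d_true) + np.diag(s2u_true)
Omega_bar_hat = (1 + s2w_true / T) * np.outer(d_true, d_true) + np.diag(s2u_true) / T

d, s2u, s2w = solve_variance_components(Omega_hat, Omega_bar_hat, T)
print(d, s2u, s2w)
```

With estimated rather than exact moment matrices the system is overdetermined, so a fitting criterion would be needed in place of this exact back-substitution.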

The purged IV estimator is consistent. It also will often indicate quickly if a new model is identified. For instance, to see the necessity of having at least one more $\gamma_{g\ell} = 0$ for $g > \ell$ to identify the foregoing system, we can check if the instruments formed by the foregoing procedure satisfy the required rank condition. Consider the example where $G = 3$ and all $\gamma_{g\ell} \neq 0$ for $g > \ell$. In order to follow the strategy of allowing $h$ to remain in the residual, in the third equation we need IVs for $y_1$ and $y_2$ that are uncorrelated with $h$. As indicated earlier, we can purge $y_2$ of its dependence on $h$ by forming $z_2 = y_2 - (a_2/a_1) y_1$. A similar procedure can be applied to $y_1$. We use $y_2$ as a proxy for $h$, with $y_{2is}$ as an IV for $y_{2it}$. Then form the residual $z_1 = y_1 - (a_1/a_2) y_2$. Again $z_1$ is uncorrelated with $h$ and $u_3$. But $z_1 = -(a_1/a_2) z_2$, and so an attempt to use both $z_2$ and $z_1$ as IVs fails to meet the rank condition.
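The failure of the rank condition in this example is easy to see numerically. In the Python sketch below, the loadings $a_1$, $a_2$ and the simulated data are hypothetical; the point is only that the two purged variables are exactly proportional.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
a1, a2 = 1.0, 1.3                      # hypothetical reduced-form loadings
h = rng.normal(size=n)
y1 = a1 * h + rng.normal(size=n)
y2 = a2 * h + rng.normal(size=n)

z2 = y2 - (a2 / a1) * y1               # y2 purged of h, using y1 as the proxy
z1 = y1 - (a1 / a2) * y2               # y1 purged of h, using y2 as the proxy

proportional = np.allclose(z1, -(a1 / a2) * z2)
corr = np.corrcoef(z1, z2)[0, 1]
print("z1 = -(a1/a2) z2:", proportional, " correlation:", round(corr, 6))
```

Because the two candidate instruments are perfectly collinear, they supply only one instrument for the two right-hand-side variables.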

5.4.2.b Maximum-Likelihood Method

Although the purged IV method is simple to use, it is likely to be inefficient, because the correlations between the endogenous variables and the purged IVs will probably be small. Also, the restriction that (5.4.6) is of rank 1 is not being utilized. To obtain efficient estimates of the unknown parameters, it is necessary to estimate the covariance matrices simultaneously with the equation coefficients. Under the normality assumptions for $\alpha_i$, $\omega_{it}$, and $u_{it}$, we can obtain efficient estimates of (5.4.15) by maximizing the log likelihood function

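$$\log L = -\frac{N}{2}\log|V| - \frac{1}{2}\sum_{i=1}^{N} (y_{1i}', y_{2i}', \ldots, y_{Gi}')\, V^{-1} (y_{1i}', \ldots, y_{Gi}')', \tag{5.4.34}$$

where

$$\underset{T \times 1}{y_{gi}} = (y_{gi1}, \ldots, y_{giT})', \quad g = 1, \ldots, G,$$

$$\underset{GT \times GT}{V} = \Omega \otimes I_T + aa' \otimes e_T e_T', \tag{5.4.35}$$

$$\underset{G \times G}{\Omega} = E(\epsilon_{it}\epsilon_{it}') + \sigma_\omega^2 aa'.$$

Using the relations¹⁰

$$V^{-1} = \Omega^{-1} \otimes I_T - cc' \otimes e_T e_T', \tag{5.4.36}$$

$$|V| = |\Omega|^T\, |1 - T c'\Omega c|^{-1}, \tag{5.4.37}$$

we can simplify the log likelihood function as¹¹

$$\log L = -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NT}{2}\operatorname{tr}(\Omega^{-1}R) + \frac{NT^2}{2} c'\bar R c, \tag{5.4.38}$$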

where $c$ is a $G \times 1$ vector proportional to $\Omega^{-1}a$, $R$ is the matrix of the sums of the squares and cross-products of the residuals divided by $NT$, and $\bar R$ is the matrix of sums of squares and cross-products of the averaged residuals (over $t$ for $i$) divided by $N$. In other words, we simplify the log likelihood function (5.4.34) by reparameterizing it in terms of $c$ and $\Omega$.
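The inverse and determinant relations (5.4.36) and (5.4.37) can be checked numerically. The Python sketch below uses hypothetical values for $a$ and $\Omega$. The scale factor $(1 + T a'\Omega^{-1}a)^{-1/2}$ applied to $\Omega^{-1}a$ is our own working assumption (the passage only says $c$ is proportional to $\Omega^{-1}a$); it is the scaling under which the identities hold in this check.

```python
import numpy as np

G, T = 3, 4
a = np.array([1.0, 1.3, 0.8])                       # hypothetical a
Omega = np.array([[1.25, 0.30, 0.10],
                  [0.30, 1.50, 0.20],
                  [0.10, 0.20, 1.10]])              # hypothetical positive-definite Omega
e = np.ones((T, 1))                                 # T x 1 vector of ones

# (5.4.35): V = Omega (x) I_T + aa' (x) e e'
V = np.kron(Omega, np.eye(T)) + np.kron(np.outer(a, a), e @ e.T)

Oinv_a = np.linalg.solve(Omega, a)
m = a @ Oinv_a                                      # a' Omega^{-1} a
c = Oinv_a / np.sqrt(1.0 + T * m)                   # assumed scaling of c

# (5.4.36) and (5.4.37) evaluated at this c.
V_inv = np.kron(np.linalg.inv(Omega), np.eye(T)) - np.kron(np.outer(c, c), e @ e.T)
det_rhs = np.linalg.det(Omega) ** T / (1.0 - T * (c @ Omega @ c))

inverse_ok = np.allclose(V @ V_inv, np.eye(G * T))
det_ok = np.isclose(np.linalg.det(V), det_rhs)
print("inverse identity holds:", inverse_ok, " determinant identity holds:", det_ok)
```

Working with $c$ and $\Omega$ instead of $V$ avoids inverting a $GT \times GT$ matrix, which is the practical gain from the reparameterization.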

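Taking partial derivatives of (5.4.38), we obtain the first-order conditions¹²

$$\frac{\partial \log L}{\partial \Omega^{-1}} = \frac{NT}{2}\Omega + \frac{NT}{2}\,\frac{1}{1 - T c'\Omega c}\,\Omega c c' \Omega - \frac{NT}{2} R = 0, \tag{5.4.39}$$

$$\frac{\partial \log L}{\partial c} = -\frac{NT}{1 - T c'\Omega c}\,\Omega c + NT^2 \bar R c = 0. \tag{5.4.40}$$

Postmultiplying (5.4.39) by $c$ and regrouping the terms, we have

$$\Omega c = \frac{1 - T c'\Omega c}{1 - (T - 1) c'\Omega c}\, R c. \tag{5.4.41}$$

Combining (5.4.40) and (5.4.41), we obtain

$$\left[\bar R - \frac{1}{T[1 - (T - 1) c'\Omega c]}\, R\right] c = 0. \tag{5.4.42}$$

Hence, the MLE of $c$ is a characteristic vector corresponding to a root of

$$|\bar R - \lambda R| = 0. \tag{5.4.43}$$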

The determinantal equation (5.4.43) has $G$ roots. To find which root to use, substitute (5.4.39) and (5.4.40) into (5.4.38):

$$\begin{aligned}
\log L &= -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NT}{2}\left(G + T \operatorname{tr} c'\bar R c\right) + \frac{NT^2}{2}\operatorname{tr}(c'\bar R c) \\
&= -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NTG}{2}.
\end{aligned} \tag{5.4.44}$$

Let the $G$ characteristic vectors corresponding to the $G$ roots of (5.4.43) be denoted as $c_1 (= c), c_2, \ldots, c_G$. These characteristic vectors are determined only up to a scalar. Choose the normalization $c^{*\prime}_g R c^*_g = 1$, $g = 1, \ldots, G$, where $c^*_g = (c_g' R c_g)^{-1/2} c_g$. Let $C^* = [c^*_1, \ldots, c^*_G]$; then $C^{*\prime} R C^* = I_G$. From (5.4.39) and (5.4.41) we have

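$$\begin{aligned}
C^{*\prime}\Omega C^* &= C^{*\prime} R C^* - \frac{1 - T c'\Omega c}{[1 - (T - 1) c'\Omega c]^2}\, C^{*\prime} R c c' R C^* \\
&= I_G - \frac{1 - T c'\Omega c}{[1 - (T - 1) c'\Omega c]^2}
\begin{bmatrix} (c'Rc)^{1/2} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\begin{bmatrix} (c'Rc)^{1/2} & 0 & \cdots & 0 \end{bmatrix}.
\end{aligned} \tag{5.4.45}$$

Equation (5.4.41) implies that $c'Rc = \{[1 - (T - 1) c'\Omega c]/[1 - T c'\Omega c]\}\, c'\Omega c$. Therefore, the determinant of (5.4.45) is $\{[1 - T c'\Omega c]/[1 - (T - 1) c'\Omega c]\}$. Using $(C^{*\prime})^{-1} C^{*-1} = R$, we have $|\Omega| = \{[1 - T c'\Omega c]/[1 - (T - 1) c'\Omega c]\}\,|R|$. Substituting this into (5.4.44), the log likelihood function becomes

$$\log L = -\frac{NT}{2}\left\{\log|R| + \log(1 - T c'\Omega c) - \log[1 - (T - 1) c'\Omega c]\right\} + \frac{N}{2}\log[1 - T c'\Omega c] - \frac{NTG}{2}, \tag{5.4.46}$$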

which is positively related to $c'\Omega c$ within the admissible range $(0, 1/T)$.¹³ So the MLE of $c$ is the characteristic vector corresponding to the largest root of (5.4.43). Once $c$ is obtained, from Appendix 5A and (5.4.39) and (5.4.40) we can estimate $a$ and $\Omega$ by

$$a' = T\left(1 + T^2 c'\bar R c\right)^{-1/2} c'\bar R, \tag{5.4.47}$$

and

$$\Omega = R - aa'. \tag{5.4.48}$$

Knowing $a$ and $\Omega$, we can solve for the coefficients of the joint dependent variables.
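The steps from the characteristic equation (5.4.43) to the estimates (5.4.47) and (5.4.48) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The function name and test values are hypothetical, and the scale of $c$ is pinned down by our own combination of (5.4.41) and (5.4.42): with $q = c'\Omega c$, the root satisfies $\lambda = 1/\{T[1 - (T - 1)q]\}$ and $c'Rc = q[1 - (T - 1)q]/(1 - Tq)$. The sketch assumes the largest root lies strictly between $1/T$ and $1$.

```python
import numpy as np

def mle_a_omega(R, R_bar, T):
    """Sketch of (5.4.43), (5.4.47), (5.4.48) for given moment matrices R and R_bar."""
    # Solve R_bar c = lambda R c through the Cholesky factor of R (R = L L').
    L = np.linalg.cholesky(R)
    L_inv = np.linalg.inv(L)
    lam, W = np.linalg.eigh(L_inv @ R_bar @ L_inv.T)    # eigenvalues in ascending order
    lam_max = lam[-1]
    v = L_inv.T @ W[:, -1]                              # eigenvector scaled so v' R v = 1

    # Fix the scale of c from the first-order conditions (our derivation, see lead-in).
    q = (T * lam_max - 1.0) / (T * lam_max * (T - 1.0)) # q = c' Omega c
    cRc = q * (1.0 - (T - 1.0) * q) / (1.0 - T * q)
    c = np.sqrt(cRc) * v

    a = T * (1.0 + T**2 * (c @ R_bar @ c)) ** -0.5 * (R_bar @ c)   # (5.4.47)
    Omega = R - np.outer(a, a)                                     # (5.4.48)
    return a, Omega, lam_max

# Check on exact (population) moments implied by hypothetical a and Omega.
T = 4
a_true = np.array([1.0, 1.3, 0.8])
Omega_true = np.array([[1.25, 0.30, 0.10],
                       [0.30, 1.50, 0.20],
                       [0.10, 0.20, 1.10]])
R = Omega_true + np.outer(a_true, a_true)           # E y_it y_it'
R_bar = Omega_true / T + np.outer(a_true, a_true)   # E ybar_i ybar_i'

a_hat, Omega_hat, lam_max = mle_a_omega(R, R_bar, T)
print(a_hat, lam_max)
```

The sign of the recovered $a$ is not determined, because a characteristic vector is defined only up to sign; $aa'$ and $\Omega$ are unaffected.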

When exogenous variables also appear in the equation, and with no restrictions on the coefficients of exogenous variables, we need only replace the exponential term of the likelihood function (5.4.34),

$$-\frac{1}{2}\sum_{i=1}^{N} (y_{1i}', \ldots, y_{Gi}')\, V^{-1} (y_{1i}', \ldots, y_{Gi}')',$$

with

$$-\frac{1}{2}\sum_{i=1}^{N} (y_{1i}' - \pi_1' X_i', \ldots, y_{Gi}' - \pi_G' X_i')\, V^{-1} (y_{1i}' - \pi_1' X_i', \ldots, y_{Gi}' - \pi_G' X_i')'.$$

The MLEs of $c$, $a$, and $\Omega$ remain the solutions of (5.4.43), (5.4.47), and (5.4.48). From knowledge of $\Omega$ and $a$ we can solve for $\Gamma$ and $\sigma_\omega^2$. The MLE of $\Pi$ conditional on $V$ is the GLS of $\Pi$. Knowing $\Pi$ and $\Gamma$, we can solve for $B = -\Gamma\Pi$.

Thus, Chamberlain and Griliches (1975) suggested the following iterative algorithm to solve for the MLE. Starting from the least-squares reduced-form estimates, we can form consistent estimates of $R$ and $\bar R$. Then estimate $c$ by maximizing¹⁴

$$\frac{c'\bar R c}{c' R c}. \tag{5.4.49}$$

Once $c$ is obtained, we solve for $a$ and $\Omega$ by (5.4.47) and (5.4.48). After obtaining $\Omega$ and $a$, the MLE of the reduced-form parameters is just the generalized least-squares estimate. With these estimated reduced-form coefficients, one can form new estimates of $R$ and $\bar R$ and continue the iteration until the solution converges. The structural-form parameters are then solved from the convergent reduced-form parameters.

5.4.3 An Example

Chamberlain and Griliches (1975) used the Gorseline (1932) data of the highest grade of schooling attained ($y_1$), the logarithm of the occupational (Duncan's SES) standing ($y_2$), and the logarithm of 1927 income ($y_3$) for 156 pairs of brothers from Indiana (U.S.) to fit a model of the type (5.4.1)–(5.4.3). Specifically, they let

$$\begin{aligned}
y_{1it} &= \beta_1' x_{it} + d_1 h_{it} + u_{1it}, \\
y_{2it} &= \gamma_{21} y_{1it} + \beta_2' x_{it} + d_2 h_{it} + u_{2it}, \\
y_{3it} &= \gamma_{31} y_{1it} + \beta_3' x_{it} + d_3 h_{it} + u_{3it}.
\end{aligned} \tag{5.4.50}$$

The set X contains a constant, age, and age squared, with age squared appearing only in the income equation.

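The reduced form of (5.4.50) is

$$y_{it} = \Pi x_{it} + a h_{it} + \epsilon_{it}, \tag{5.4.51}$$

where

$$\Pi = \begin{bmatrix} \beta_1' \\ \gamma_{21}\beta_1' + \beta_2' \\ \gamma_{31}\beta_1' + \beta_3' \end{bmatrix}, \quad
a = \begin{bmatrix} d_1 \\ d_2 + \gamma_{21} d_1 \\ d_3 + \gamma_{31} d_1 \end{bmatrix}, \tag{5.4.52}$$

$$\epsilon_{it} = \begin{bmatrix} u_{1it} \\ u_{2it} + \gamma_{21} u_{1it} \\ u_{3it} + \gamma_{31} u_{1it} \end{bmatrix}.$$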


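Therefore,

$$E\epsilon_{it}\epsilon_{it}' = \begin{bmatrix}
\sigma_{u1}^2 & \gamma_{21}\sigma_{u1}^2 & \gamma_{31}\sigma_{u1}^2 \\
 & \sigma_{u2}^2 + \gamma_{21}^2\sigma_{u1}^2 & \gamma_{21}\gamma_{31}\sigma_{u1}^2 \\
 & & \sigma_{u3}^2 + \gamma_{31}^2\sigma_{u1}^2
\end{bmatrix}, \tag{5.4.53}$$

and

$$\Omega = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
 & \sigma_{22} & \sigma_{23} \\
 & & \sigma_{33}
\end{bmatrix} = E(\epsilon_{it}\epsilon_{it}') + \sigma_\omega^2 aa'. \tag{5.4.54}$$

We show that knowing $a$ and $\Omega$ identifies the structural coefficients of the joint dependent variables as follows: For a given value of $\sigma_\omega^2$, we can solve for

$$\sigma_{u1}^2 = \sigma_{11} - \sigma_\omega^2 a_1^2, \tag{5.4.55}$$

$$\gamma_{21} = \frac{\sigma_{12} - \sigma_\omega^2 a_1 a_2}{\sigma_{u1}^2}, \tag{5.4.56}$$

$$\gamma_{31} = \frac{\sigma_{13} - \sigma_\omega^2 a_1 a_3}{\sigma_{u1}^2}. \tag{5.4.57}$$

Equating

$$\gamma_{21}\gamma_{31} = \frac{\sigma_{23} - \sigma_\omega^2 a_2 a_3}{\sigma_{u1}^2} \tag{5.4.58}$$

with the product of (5.4.56) and (5.4.57), and making use of (5.4.55), we have

$$\sigma_\omega^2 = \frac{\sigma_{12}\sigma_{13} - \sigma_{11}\sigma_{23}}{\sigma_{12} a_1 a_3 + \sigma_{13} a_1 a_2 - \sigma_{11} a_2 a_3 - \sigma_{23} a_1^2}. \tag{5.4.59}$$

The problem then becomes one of estimating $a$ and $\Omega$. Table 5.1 presents the MLE of Chamberlain and Griliches (1975) for the coefficients of schooling and (unobservable) ability variables with $\sigma_\alpha^2$ normalized to equal 1. Their least-squares estimates, which ignore the familial information, and the covariance estimates, in which each brother's characteristics (his income, occupation, schooling, and age) are measured around his own family's mean, are also presented in Table 5.1.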
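The back-substitution in (5.4.55)–(5.4.59) can be written out as a short Python sketch. The function name and the numerical values are hypothetical; the check simply builds $a$ and $\Omega$ from assumed structural parameters via (5.4.52)–(5.4.54) and confirms that the formulas recover them.

```python
import numpy as np

def structural_from_reduced(a, Omega):
    """Recover gamma_21, gamma_31, d and sigma_omega^2 from a and Omega (G = 3)."""
    a1, a2, a3 = a
    s11, s12, s13 = Omega[0, 0], Omega[0, 1], Omega[0, 2]
    s23 = Omega[1, 2]
    # (5.4.59)
    sigma2_omega = (s12 * s13 - s11 * s23) / (
        s12 * a1 * a3 + s13 * a1 * a2 - s11 * a2 * a3 - s23 * a1**2)
    sigma2_u1 = s11 - sigma2_omega * a1**2                  # (5.4.55)
    g21 = (s12 - sigma2_omega * a1 * a2) / sigma2_u1        # (5.4.56)
    g31 = (s13 - sigma2_omega * a1 * a3) / sigma2_u1        # (5.4.57)
    # Invert (5.4.52): d_1 = a_1 and d_g = a_g - gamma_g1 * d_1.
    d = np.array([a1, a2 - g21 * a1, a3 - g31 * a1])
    return g21, g31, d, sigma2_omega

# Hypothetical structural parameters used to build a and Omega.
g21_true, g31_true = 0.5, 0.3
d_true = np.array([1.0, 0.8, 0.5])
s2u = np.array([1.0, 1.0, 1.0])
s2w_true = 0.25

a = np.array([d_true[0],
              d_true[1] + g21_true * d_true[0],
              d_true[2] + g31_true * d_true[0]])            # (5.4.52)
lam = np.array([1.0, g21_true, g31_true])                   # loadings of u_1 in epsilon
E_eps = s2u[0] * np.outer(lam, lam) + np.diag([0.0, s2u[1], s2u[2]])   # (5.4.53)
Omega = E_eps + s2w_true * np.outer(a, a)                   # (5.4.54)

g21, g31, d, s2w = structural_from_reduced(a, Omega)
print(g21, g31, d, s2w)
```

The formulas break down when the denominator of (5.4.59) or $\sigma_{u1}^2$ is zero, so the sketch presumes neither occurs.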

The covariance estimate of the coefficient-of-schooling variable in the income equation is smaller than the least-squares estimate. However, the simultaneous-equations model estimate of the coefficient for the ability variable is negative in the schooling equation. As discussed in Section 5.1, if schooling and ability are negatively correlated, the single-equation within-family estimate of the schooling coefficient could be less than the least-squares estimate (here 0.080 versus 0.082). To attribute this decline to “ability” or “family background” is erroneous. In fact, when schooling and ability were treated symmetrically, the coefficient-of-schooling variable (0.088) became greater than the least-squares estimate 0.082.

Table 5.1. Parameter estimates and their standard errors for the income–occupation–schooling model

                                              Method
Coefficients of the         Least-squares     Covariance
structural equations        estimate          estimate          MLE
----------------------------------------------------------------------
Schooling in the:
  Income equation           0.082             0.080             0.088
                            (0.010)^a         (0.011)           (0.009)
  Occupation equation       0.104             0.135             0.107
                            (0.010)           (0.015)           (0.010)
“Ability” in the:
  Income equation                                               0.416
                                                                (0.038)
  Occupation equation                                           0.214
                                                                (0.046)
  Schooling equation                                            −0.092
                                                                (0.178)

^a Standard errors in parentheses.

Source: Chamberlain and Griliches (1975, p. 429).

APPENDIX 5A
