
Simultaneous-Equations Models

5.4 TRIANGULAR SYSTEM

5.4.1 Identification

A convenient way to model correlations across equations, as well as the correlation of a given individual at different times (or of different members of a group), is to use latent variables to connect the residuals. Let $y_{git}$ denote the value of the variable $y_g$ for the $i$th individual (or group) at time $t$ (or for the $t$th member). We can assume that

$$v_{git} = d_g h_{it} + u_{git}, \tag{5.4.1}$$

where the $u_g$ are uncorrelated across equations and across $i$ and $t$. The correlations across equations are all generated by the common omitted variable $h$, which is assumed to have a variance-component structure:

$$h_{it} = \alpha_i + \omega_{it}, \tag{5.4.2}$$

where $\alpha_i$ is invariant over $t$ but is independently identically distributed across $i$ (groups), with mean zero and variance $\sigma_\alpha^2$, and $\omega_{it}$ is independently identically distributed across $i$ and $t$, with mean zero and variance $\sigma_\omega^2$, and is uncorrelated with $\alpha_i$.

An example of the model with $\Gamma$ lower-triangular and $v$ of the form (5.4.1) is (Chamberlain (1977a, 1977b); Chamberlain and Griliches (1975); Griliches (1979))

$$\begin{aligned}
y_{1it} &= \beta_1' x_{it} + d_1 h_{it} + u_{1it}, \\
y_{2it} &= -\gamma_{21} y_{1it} + \beta_2' x_{it} + d_2 h_{it} + u_{2it}, \\
y_{3it} &= -\gamma_{31} y_{1it} - \gamma_{32} y_{2it} + \beta_3' x_{it} + d_3 h_{it} + u_{3it},
\end{aligned} \tag{5.4.3}$$

where $y_1$, $y_2$, and $y_3$ denote years of schooling, a late (postschool) test score, and earnings, respectively, and $x_{it}$ are exogenous variables (which may differ from equation to equation via restrictions on $\beta_g$). The unobservable $h$ can be interpreted as early "ability," and $u_2$ as measurement error in the test. The index $i$ indicates groups (or families), and $t$ indicates members in each group (or family).

Without the $h$ variables, or if $d_g = 0$, equation (5.4.3) would be only a simple recursive system that could be estimated by applying least squares separately to each equation. The simultaneity problem arises when we admit the possibility that $d_g \neq 0$. In general, if there were enough exogenous variables in the first (schooling) equation that did not appear again in the other equations, the system could be estimated using 2SLS or EC2SLS procedures. Unfortunately, in the income–schooling–ability model using sibling data [e.g., see the survey by Griliches (1979)] there usually are not enough distinct $x$'s to identify all the parameters. Thus, restrictions imposed on the variance–covariance matrix of the residuals will have to be used.

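Given that $h$ is unobservable, we have an indeterminate scale

$$d_g^2\left(\sigma_\alpha^2 + \sigma_\omega^2\right) = c\,d_g^2\left(\frac{1}{c}\sigma_\alpha^2 + \frac{1}{c}\sigma_\omega^2\right). \tag{5.4.4}$$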

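So we normalize $h$ by letting $\sigma_\alpha^2 = 1$. Then

$$E v_{it} v_{it}' = \left(1 + \sigma_\omega^2\right) d d' + \mathrm{diag}\left(\sigma_1^2, \dots, \sigma_G^2\right) = \Omega, \tag{5.4.5}$$

$$E v_{it} v_{is}' = d d' = \Omega_w \quad \text{if } t \neq s, \tag{5.4.6}$$

$$E v_{it} v_{js}' = 0 \quad \text{if } i \neq j, \tag{5.4.7}$$

where $d = (d_1, \dots, d_G)'$, and $\mathrm{diag}(\sigma_1^2, \dots, \sigma_G^2)$ denotes a $G \times G$ diagonal matrix with $\sigma_1^2, \sigma_2^2, \dots, \sigma_G^2$ on the diagonal.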
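As an illustration of (5.4.5) and (5.4.6), the following minimal sketch builds the two residual covariance matrices from given values and checks that the cross-member block has rank 1. The function name and the numerical values are hypothetical, chosen only for the example; NumPy is assumed to be available.

```python
import numpy as np

def residual_covariances(d, sigma2_u, sigma2_omega):
    """Return (Omega, Omega_w) as in (5.4.5)-(5.4.6), with sigma_alpha^2 = 1."""
    d = np.asarray(d, dtype=float).reshape(-1, 1)    # G x 1 loading vector
    dd = d @ d.T                                     # rank-one matrix d d'
    omega = (1.0 + sigma2_omega) * dd + np.diag(sigma2_u)  # same group, same member
    omega_w = dd                                     # same group, different members
    return omega, omega_w

# Hypothetical values for G = 3.
omega, omega_w = residual_covariances([1.0, 0.5, 0.4], [1.0, 0.5, 0.8], 0.6)
print(np.linalg.matrix_rank(omega_w))
```

The rank-one structure of the cross-member block is what limits the number of independent moment equations it can supply.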

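Under the assumption that $\alpha_i$, $\omega_{it}$, and $u_{git}$ are normally distributed, or if we limit our attention to second-order moments, all the information with regard to the distribution of $y$ is contained in

$$C_{y_{tt}} = \Gamma^{-1} B C_{x_{tt}} B' \Gamma'^{-1} + \Gamma^{-1} \Omega\, \Gamma'^{-1}, \tag{5.4.8}$$

$$C_{y_{ts}} = \Gamma^{-1} B C_{x_{ts}} B' \Gamma'^{-1} + \Gamma^{-1} \Omega_w \Gamma'^{-1}, \quad t \neq s, \tag{5.4.9}$$

$$C_{yx_{ts}} = -\Gamma^{-1} B C_{x_{ts}}, \tag{5.4.10}$$

where $C_{y_{ts}} = E y_{it} y_{is}'$, $C_{yx_{ts}} = E y_{it} x_{is}'$, and $C_{x_{ts}} = E x_{it} x_{is}'$.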

Stack the coefficient matrices $\Gamma$ and $B$ into a $1 \times G(G + K)$ vector $\theta' = (\gamma_1', \dots, \gamma_G', \beta_1', \dots, \beta_G')$. Suppose $\theta$ is subject to $M$ a priori constraints:

$$\Phi(\theta) = \phi, \tag{5.4.11}$$


where $\phi$ is an $M \times 1$ vector of constants. Then a necessary and sufficient condition for local identification of $\Gamma$, $B$, $d$, $\sigma_\omega^2$, and $\sigma_1^2, \dots, \sigma_G^2$ is that the rank of the Jacobian formed by taking partial derivatives of (5.4.8)–(5.4.11) with respect to the unknowns is equal to $G(G + K) + 2G + 1$ (e.g., Hsiao (1983)).

Suppose there is no restriction on the matrix $B$. The $GK$ equations (5.4.10) can be used to identify $B$ provided that $\Gamma$ is identifiable. Hence, we can concentrate on

$$\Gamma\left(C_{y_{tt}} - C_{yx_{tt}} C_{x_{tt}}^{-1} C_{yx_{tt}}'\right)\Gamma' = \Omega, \tag{5.4.12}$$

$$\Gamma\left(C_{y_{ts}} - C_{yx_{ts}} C_{x_{ts}}^{-1} C_{yx_{ts}}'\right)\Gamma' = \Omega_w, \quad t \neq s. \tag{5.4.13}$$

We note that $\Omega$ is symmetric, and we have $G(G + 1)/2$ independent equations from (5.4.12). But $\Omega_w$ is of rank 1; therefore, we can derive only $G$ independent equations from (5.4.13). Suppose $\Gamma$ is lower-triangular and the diagonal elements of $\Gamma$ are normalized to be unity; there are $G(G - 1)/2$ unknowns in $\Gamma$, and $2G + 1$ unknowns of $(d_1, \dots, d_G)$, $(\sigma_1^2, \dots, \sigma_G^2)$, and $\sigma_\omega^2$. We have one less equation than the number of unknowns. In order for the Jacobian matrix formed by (5.4.12), (5.4.13), and a priori restrictions to be nonsingular, we need at least one additional a priori restriction. Thus, for the system

$$\Gamma y_{it} + B x_{it} = v_{it}, \tag{5.4.14}$$

where $\Gamma$ is lower-triangular, $B$ is unrestricted, and $v_{it}$ satisfies (5.4.1) and (5.4.2), a necessary condition for the identification under exclusion restrictions is that at least one $\gamma_{g\ell} = 0$ for $g > \ell$. [For details, see Chamberlain (1976) or Hsiao (1983).]
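The counting argument behind this necessary condition can be checked mechanically. The short sketch below (the function name is hypothetical) tallies the independent moment equations from (5.4.12) and (5.4.13) against the unknowns in a lower-triangular $\Gamma$ with unit diagonal, $d$, the $\sigma_g^2$, and $\sigma_\omega^2$.

```python
def identification_count(G):
    """Count moment equations and unknowns for the lower-triangular system."""
    # G(G+1)/2 equations from the symmetric block, plus G from the rank-one block.
    equations = G * (G + 1) // 2 + G
    # Below-diagonal elements of Gamma, then d (G), sigma_g^2 (G), sigma_omega^2 (1).
    unknowns = G * (G - 1) // 2 + 2 * G + 1
    return equations, unknowns

for G in (2, 3, 5):
    eqs, unk = identification_count(G)
    print(G, eqs, unk, unk - eqs)
```

For every $G$ the shortfall is exactly one equation, which is why a single additional exclusion restriction is the minimum requirement.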

5.4.2 Estimation

We have discussed how the restrictions in the variance–covariance matrix can help identify the model. We now turn to the issues of estimation. Two methods are discussed: the purged-instrumental-variable method (Chamberlain (1977a)) and the maximum-likelihood method (Chamberlain and Griliches (1975)). The latter method is efficient, but computationally complicated. The former method is inefficient, but it is simple and consistent. It also helps to clarify the previous results on the sources of identification.

For simplicity, we assume that there is no restriction on the coefficients of exogenous variables. Under this assumption we can further ignore the existence of exogenous variables without loss of generality, because there are no excluded exogenous variables that can legitimately be used as instruments for the endogenous variables appearing in the equation. The instruments have to come from the group structure of the model. We illustrate this point by considering the following triangular system:

The reduced form of (5.4.15) is

$$y_{git} = a_g h_{it} + \epsilon_{git}, \qquad g = 1, \dots, G. \tag{5.4.16}$$

The trick of the purged instrumental-variable (IV) method is to leave h in the residual and construct instruments that are uncorrelated with h. Before going to the general formula, we use several simple examples to show where the instruments come from.


Next, suppose that only $\gamma_{32} = 0$. The reduced form of the model becomes

In this case, the construction of valid instruments is more complicated. It requires two stages. The first stage is to use $y_1$ as a proxy for $h$ in the reduced-form equation for $y_2$:

The second stage is to use $z_2$ as an instrument for $y_1$ in the structural equation $y_3$:

$$y_{3it} = \gamma_{31} y_{1it} + d_3 h_{it} + u_{3it}. \tag{5.4.23}$$

The variable $z_2$ is an appropriate IV because it is uncorrelated with $h$ and $u_3$, but it is correlated with $y_1$, provided $d_2 \sigma_1^2 \neq 0$. (If $d_2 = 0$, then $z_2 = y_2 - \gamma_{21} y_1 = u_2$. It is no longer correlated with $y_1$.) Therefore, we require that $h$ appear directly in the $y_2$ equation and that $y_1$ not be proportional to $h$; otherwise we could never separate the effects of $y_1$ and $h$.

In order to identify the $y_2$ equation

$$y_{2it} = \gamma_{21} y_{1it} + d_2 h_{it} + u_{2it}, \tag{5.4.24}$$

we can interchange the reduced-form $y_2$ and $y_3$ equations and repeat the two stages. With $\gamma_{21}$ and $\gamma_{31}$ identified, in the third stage we form the residuals

$$v_{2it} = y_{2it} - \gamma_{21} y_{1it} = d_2 h_{it} + u_{2it},$$

Now $d_2/d_1$ and $d_3/d_1$ can be identified by a third application of instrumental variables, using $y_{1is}$, $s \neq t$, as an instrument for $y_{1it}$. (Note that only the ratio of the $d$'s is identified, because of the indeterminate scale of the latent variable.)
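The two stages for the case $\gamma_{32} = 0$ can be illustrated with a small simulation. This is only a sketch under assumed values: the parameter values, sample size, the helper `iv_slope`, and the use of sibling pairs ($T = 2$) are all hypothetical choices made for the example, and all variables have mean zero so no intercepts are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100_000, 2                       # N groups (families), T members each
g21, g31 = 0.3, 0.2                     # structural coefficients (gamma_32 = 0)
d1, d2, d3 = 1.0, 1.0, 0.5              # loadings on the latent variable h
s2_omega = 0.5                          # Var(omega); Var(alpha) is normalized to 1

alpha = rng.normal(size=(N, 1))
h = alpha + np.sqrt(s2_omega) * rng.normal(size=(N, T))
u1, u2, u3 = rng.normal(size=(3, N, T))

y1 = d1 * h + u1
y2 = g21 * y1 + d2 * h + u2
y3 = g31 * y1 + d3 * h + u3

def iv_slope(y, x, z):
    """Just-identified IV estimate of the slope of y on x with instrument z."""
    return np.sum(z * y) / np.sum(z * x)

# Stage 1: y1 proxies h in the reduced form of y2; the other member's y1 is the instrument.
y1_other = y1[:, ::-1]                  # for T = 2, pairs member t with member s != t
ratio = iv_slope(y2, y1, y1_other)      # consistent for a2/a1 = (d2 + g21*d1)/d1

# Purged instrument: free of h, but still correlated with y1 through u1.
z2 = y2 - ratio * y1

# Stage 2: z2 instruments y1 in the structural y3 equation.
g31_hat = iv_slope(y3, y1, z2)
print(ratio, g31_hat)
```

With these assumed values the first-stage ratio should be close to 1.3 and the second-stage estimate close to 0.2, up to sampling error.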

Now come back to the construction of IVs for the general system (5.4.15)–(5.4.18). We assume that $T \geq 2$. The instruments are constructed over several stages. At the first stage, let $y_1$ be a proxy for $h$. Then the reduced-form equation for $y_g$ becomes

$$y_{git} = \frac{a_g}{a_1} y_{1it} + \epsilon_{git} - \frac{a_g}{a_1}\epsilon_{1it}, \qquad g = 2, \dots, \ell - 1. \tag{5.4.27}$$

If $T \geq 2$, $a_g/a_1$ can be consistently estimated by using different members in the same group (e.g., $y_{1is}$ and $y_{1it}$, $t \neq s$) as instruments for the $y_g$ equation (5.4.27) when $d_1 \sigma_\alpha^2 \neq 0$. Once $a_g/a_1$ is consistently estimated, we form the residual

$$z_{git} = y_{git} - \frac{a_g}{a_1} y_{1it} = \epsilon_{git} - \frac{a_g}{a_1}\epsilon_{1it}, \qquad g = 2, \dots, \ell - 1. \tag{5.4.28}$$

The $z_g$ are uncorrelated with $h$. They are valid instruments for $y_g$ provided $d_g \sigma_1^2 \neq 0$. There are $\ell - 2$ IVs for the $\ell - 2$ variables that remain on the right-hand side of the $\ell$th structural equation after $y_k$ has been excluded.

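To estimate the equations that follow $y_\ell$, we form the transformed variables

$$\begin{aligned}
y_{2it}^* &= y_{2it} - \gamma_{21} y_{1it}, \\
y_{3it}^* &= y_{3it} - \gamma_{31} y_{1it} - \gamma_{32} y_{2it}, \\
&\;\;\vdots \\
y_{\ell it}^* &= y_{\ell it} - \gamma_{\ell 1} y_{1it} - \cdots - \gamma_{\ell,\ell-1} y_{\ell-1,it},
\end{aligned} \tag{5.4.29}$$

and rewrite the $y_{\ell+1}$ equation as

$$y_{\ell+1,it} = \gamma_{\ell+1,1}^* y_{1it} + \gamma_{\ell+1,2}^* y_{2it}^* + \cdots + \gamma_{\ell+1,\ell-1}^* y_{\ell-1,it}^* + \gamma_{\ell+1,\ell}\, y_{\ell it}^* + d_{\ell+1} h_{it} + u_{\ell+1,it}, \tag{5.4.30}$$

where $\gamma_{\ell+1,j}^* = \gamma_{\ell+1,j} + \sum_{m=j+1}^{\ell} \gamma_{\ell+1,m}^* \gamma_{mj}$ for $j < \ell$. Using $y_1$ as a proxy for $h$, we have

$$y_{\ell+1,it} = \gamma_{\ell+1,2}^* y_{2it}^* + \cdots + \gamma_{\ell+1,\ell}\, y_{\ell it}^* + \left(\gamma_{\ell+1,1}^* + \frac{d_{\ell+1}}{d_1}\right) y_{1it} + u_{\ell+1,it} - \frac{d_{\ell+1}}{d_1} u_{1it}. \tag{5.4.31}$$

Because $u_1$ is uncorrelated with $y_g^*$ for $2 \leq g \leq \ell$, we can use $y_{git}^*$ together with $y_{1is}$, $s \neq t$, as instruments to identify $\gamma_{\ell+1,j}^*$. Once the $\gamma_{\ell+1,j}^*$ are identified, we can form $y_{\ell+1}^* = y_{\ell+1} - \gamma_{\ell+1,1} y_1 - \cdots - \gamma_{\ell+1,\ell} y_\ell$ and proceed in a similar fashion to identify the $y_{\ell+2}$ equation, and so on.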

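Once all the $\gamma$ are identified, we can form the estimated residuals, $\hat{v}_{it}$. From $\hat{v}_{it}$ we can estimate $d_g/d_1$ by the same procedure as (5.4.26). Or we can form the matrix $\hat{\Omega}$ of variance–covariances of the residuals, and the matrix $\hat{\bar{\Omega}}$ of variance–covariances of the averaged residuals $(1/T)\sum_{t=1}^{T} \hat{v}_{it}$, then solve for $d$, $(\sigma_1^2, \dots, \sigma_G^2)$, and $\sigma_\omega^2$ from the relations

$$\hat{\Omega} = \left(1 + \sigma_\omega^2\right) d d' + \mathrm{diag}\left(\sigma_1^2, \dots, \sigma_G^2\right), \tag{5.4.32}$$

$$\hat{\bar{\Omega}} = \left(1 + \frac{\sigma_\omega^2}{T}\right) d d' + \frac{1}{T}\,\mathrm{diag}\left(\sigma_1^2, \dots, \sigma_G^2\right). \tag{5.4.33}$$

In (5.4.33) the variance of $\omega_{it}$ is divided by $T$ because, by (5.4.2), $\omega_{it}$ is independent across $t$, so averaging over the $T$ members leaves $\alpha_i$ intact but shrinks the variance of the $\omega$ component to $\sigma_\omega^2/T$.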

The purged IV estimator is consistent. It also will often indicate quickly if a new model is identified. For instance, to see the necessity of having at least one more $\gamma_{g\ell} = 0$ for $g > \ell$ to identify the foregoing system, we can check if the instruments formed by the foregoing procedure satisfy the required rank condition. Consider the example where $G = 3$ and all $\gamma_{g\ell} \neq 0$ for $g > \ell$. In order to follow the strategy of allowing $h$ to remain in the residual, in the third equation we need IVs for $y_1$ and $y_2$ that are uncorrelated with $h$. As indicated earlier, we can purge $y_2$ of its dependence on $h$ by forming $z_2 = y_2 - (a_2/a_1) y_1$. A similar procedure can be applied to $y_1$. We use $y_2$ as a proxy for $h$, with $y_{2is}$ as an IV for $y_{2it}$. Then form the residual $z_1 = y_1 - (a_1/a_2) y_2$. Again $z_1$ is uncorrelated with $h$ and $u_3$. But $z_1 = -(a_1/a_2) z_2$, and so an attempt to use both $z_2$ and $z_1$ as IVs fails to meet the rank condition.
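The failure of the rank condition is purely algebraic and can be seen numerically. In the sketch below the data and the loadings `a1`, `a2` are arbitrary hypothetical values; the point is only that the two purged variables are exactly collinear.

```python
import numpy as np

rng = np.random.default_rng(1)
y1, y2 = rng.normal(size=(2, 1000))    # arbitrary data; the identity below is algebraic
a1, a2 = 0.8, 1.3                      # hypothetical reduced-form loadings

z2 = y2 - (a2 / a1) * y1               # y2 purged of h, with y1 as the proxy
z1 = y1 - (a1 / a2) * y2               # y1 purged of h, with y2 as the proxy

Z = np.column_stack([z1, z2])          # candidate instrument matrix
print(np.allclose(z1, -(a1 / a2) * z2), np.linalg.matrix_rank(Z))
```

The instrument matrix has rank 1 rather than 2, so it cannot identify two coefficients in the third equation.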

5.4.2.b Maximum-Likelihood Method

Although the purged IV method is simple to use, it is likely to be inefficient, because the correlations between the endogenous variables and the purged IVs will probably be small. Also, the restriction that (5.4.6) is of rank 1 is not being utilized. To obtain efficient estimates of the unknown parameters, it is necessary to estimate the covariance matrices simultaneously with the equation coefficients. Under the normality assumptions for $\alpha_i$, $\omega_{it}$, and $u_{it}$, we can obtain efficient estimates of (5.4.15) by maximizing the log likelihood function

$$\log L = -\frac{N}{2}\log|V| - \frac{1}{2}\sum_{i=1}^{N} (y_{1i}', y_{2i}', \dots, y_{Gi}')\, V^{-1} (y_{1i}', \dots, y_{Gi}')', \tag{5.4.34}$$

where $y_{gi} = (y_{gi1}, \dots, y_{giT})'$ is $T \times 1$, $g = 1, \dots, G$; the $GT \times GT$ matrix $V$ is

$$V = \Omega \otimes I_T + a a' \otimes e_T e_T', \tag{5.4.35}$$

and the $G \times G$ matrix $\Omega$ is $\Omega = E(\epsilon_{it}\epsilon_{it}') + \sigma_\omega^2 a a'$. Using the relations¹⁰

$$V^{-1} = \Omega^{-1} \otimes I_T - c c' \otimes e_T e_T', \tag{5.4.36}$$

$$|V| = |\Omega|^T\, |1 - T c' \Omega c|^{-1}, \tag{5.4.37}$$

we can simplify the log likelihood function as¹¹

$$\log L = -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NT}{2}\,\mathrm{tr}(\Omega^{-1} R) + \frac{NT^2}{2}\, c' \bar{R} c, \tag{5.4.38}$$

where $c$ is a $G \times 1$ vector proportional to $\Omega^{-1} a$, $R$ is the matrix of the sums of the squares and cross-products of the residuals divided by $NT$, and $\bar{R}$ is the matrix of sums of squares and cross-products of the averaged residuals (over $t$ for $i$) divided by $N$. In other words, we simplify the log likelihood function (5.4.34) by reparameterizing it in terms of $c$ and $\Omega$.
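The relations (5.4.36) and (5.4.37) can be checked numerically. The text only states that $c$ is proportional to $\Omega^{-1}a$; the specific scaling used below, $c = \Omega^{-1}a\,(1 + T a'\Omega^{-1}a)^{-1/2}$, is an assumption obtained here from the standard rank-one update formula for an inverse, and the values of `a` and `Omega` are hypothetical.

```python
import numpy as np

G, T = 3, 4
a = np.array([1.0, 0.8, 0.6])                  # hypothetical loading vector
Omega = np.array([[2.0, 0.3, 0.2],
                  [0.3, 1.5, 0.1],
                  [0.2, 0.1, 1.0]])            # hypothetical positive definite Omega
e = np.ones((T, 1))
J = e @ e.T                                    # e_T e_T'

V = np.kron(Omega, np.eye(T)) + np.kron(np.outer(a, a), J)   # (5.4.35)

Oinv_a = np.linalg.solve(Omega, a)
s = a @ Oinv_a                                 # a' Omega^{-1} a
c = Oinv_a / np.sqrt(1.0 + T * s)              # assumed scaling of c

V_inv = np.kron(np.linalg.inv(Omega), np.eye(T)) - np.kron(np.outer(c, c), J)  # (5.4.36)
det_V = np.linalg.det(Omega) ** T / (1.0 - T * (c @ Omega @ c))               # (5.4.37)

print(np.allclose(V @ V_inv, np.eye(G * T)), np.isclose(det_V, np.linalg.det(V)))
```

Working with the $G \times G$ matrix $\Omega$ and the vector $c$ avoids ever inverting the $GT \times GT$ matrix $V$ directly.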

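Taking partial derivatives of (5.4.38), we obtain the first-order conditions¹²

$$\frac{\partial \log L}{\partial \Omega^{-1}} = \frac{NT}{2}\,\Omega + \frac{NT}{2}\,\frac{1}{(1 - T c'\Omega c)}\,\Omega c c' \Omega - \frac{NT}{2}\, R = 0, \tag{5.4.39}$$

$$\frac{\partial \log L}{\partial c} = -\frac{NT}{1 - T c'\Omega c}\,\Omega c + N T^2 \bar{R} c = 0. \tag{5.4.40}$$

Postmultiplying (5.4.39) by $c$ and regrouping the terms, we have

$$\Omega c = \frac{1 - T c'\Omega c}{1 - (T - 1) c'\Omega c}\, R c. \tag{5.4.41}$$

Combining (5.4.40) and (5.4.41), we obtain

$$\left[\bar{R} - \frac{1}{T\,[1 - (T - 1) c'\Omega c]}\, R\right] c = 0. \tag{5.4.42}$$

Hence, the MLE of $c$ is a characteristic vector corresponding to a root of

$$|\bar{R} - \lambda R| = 0. \tag{5.4.43}$$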

The determinantal equation (5.4.43) has $G$ roots. To find which root to use, substitute (5.4.39) and (5.4.40) into (5.4.38):

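$$\begin{aligned}
\log L &= -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NT}{2}\left(G + T\,\mathrm{tr}\, c'\bar{R} c\right) + \frac{NT^2}{2}\,\mathrm{tr}(c'\bar{R} c) \\
&= -\frac{NT}{2}\log|\Omega| + \frac{N}{2}\log(1 - T c'\Omega c) - \frac{NTG}{2}.
\end{aligned} \tag{5.4.44}$$

Let the $G$ characteristic vectors corresponding to the $G$ roots of (5.4.43) be denoted as $c_1 (= c), c_2, \dots, c_G$. These characteristic vectors are determined only up to a scalar. Choose the normalization $c_g^{*\prime} R c_g^* = 1$, $g = 1, \dots, G$, where $c_g^* = (c_g' R c_g)^{-1/2} c_g$. Let $C^* = [c_1^*, \dots, c_G^*]$; then $C^{*\prime} R C^* = I_G$. From (5.4.39) and (5.4.41) we have

$$\begin{aligned}
C^{*\prime} \Omega C^* &= C^{*\prime} R C^* - \frac{1 - T c'\Omega c}{[1 - (T - 1) c'\Omega c]^2}\, C^{*\prime} R c c' R C^* \\
&= I_G - \frac{1 - T c'\Omega c}{[1 - (T - 1) c'\Omega c]^2}
\begin{bmatrix} (c' R c)^{1/2} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\begin{bmatrix} (c' R c)^{1/2} & 0 & \cdots & 0 \end{bmatrix}.
\end{aligned} \tag{5.4.45}$$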

Equation (5.4.41) implies that $(c' R c) = \{[1 - (T - 1) c'\Omega c]/[1 - T c'\Omega c]\}\, c'\Omega c$. Therefore, the determinant of (5.4.45) is $\{[1 - T c'\Omega c]/[1 - (T - 1) c'\Omega c]\}$. Using $C^{*\prime -1} C^{*-1} = R$, we have $|\Omega| = \{[1 - T c'\Omega c]/[1 - (T - 1) c'\Omega c]\}\,|R|$. Substituting this into (5.4.44), the log likelihood function becomes

$$\log L = -\frac{NT}{2}\left\{\log|R| + \log(1 - T c'\Omega c) - \log[1 - (T - 1) c'\Omega c]\right\} + \frac{N}{2}\log[1 - T c'\Omega c] - \frac{NTG}{2}, \tag{5.4.46}$$

which is positively related to $c'\Omega c$ within the admissible range $(0, 1/T)$.¹³ So the MLE of $c$ is the characteristic vector corresponding to the largest root of (5.4.43). Once $c$ is obtained, from Appendix 5A and (5.4.39) and (5.4.40) we can estimate $a$ and $\Omega$ by

$$a' = T\,(1 + T^2 c'\bar{R} c)^{-1/2}\, c'\bar{R}, \tag{5.4.47}$$

and

$$\Omega = R - a a'. \tag{5.4.48}$$

Knowing $a$ and $\Omega$, we can solve for the coefficients of the joint dependent variables.
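The steps from (5.4.43) to (5.4.48) can be sketched as follows. This is not the authors' code: the function name is hypothetical, the generalized characteristic-value problem is solved through a Cholesky factor of $R$ (one of several possible routes), and the scaling of $c$ is derived here from (5.4.41) and (5.4.42) rather than quoted from the text. The check uses population moments built from hypothetical true values, for which $R = \Omega + aa'$ and $\bar{R} = \Omega/T + aa'$ when there are no exogenous variables.

```python
import numpy as np

def mle_from_moments(R, R_bar, T):
    """Sketch of (5.4.43), (5.4.47), (5.4.48); a is recovered up to sign."""
    L = np.linalg.cholesky(R)                      # R = L L'
    L_inv = np.linalg.inv(L)
    M = L_inv @ R_bar @ L_inv.T                    # same roots as |R_bar - lambda R| = 0
    roots, W = np.linalg.eigh(M)                   # roots in ascending order
    lam = roots[-1]                                # largest root
    w = L_inv.T @ W[:, -1]                         # characteristic vector with w' R w = 1

    # Scale of c (derived, not quoted): (5.4.42) gives lam = 1 / (T [1 - (T-1) q])
    # with q = c' Omega c, and (5.4.41) gives c' R c = q [1 - (T-1) q] / (1 - T q).
    q = (1.0 - 1.0 / (T * lam)) / (T - 1)
    c = np.sqrt(q * (1.0 - (T - 1) * q) / (1.0 - T * q)) * w

    a = T * (1.0 + T**2 * (c @ R_bar @ c)) ** -0.5 * (R_bar @ c)   # (5.4.47)
    Omega = R - np.outer(a, a)                                     # (5.4.48)
    return a, Omega

# Population moments implied by hypothetical true values.
T = 4
a_true = np.array([1.0, 0.8, 0.6])
Omega_true = np.array([[2.0, 0.3, 0.2], [0.3, 1.5, 0.1], [0.2, 0.1, 1.0]])
R = Omega_true + np.outer(a_true, a_true)
R_bar = Omega_true / T + np.outer(a_true, a_true)

a_hat, Omega_hat = mle_from_moments(R, R_bar, T)
print(np.allclose(np.outer(a_hat, a_hat), np.outer(a_true, a_true)),
      np.allclose(Omega_hat, Omega_true))
```

Because a characteristic vector is determined only up to sign, the comparison is made on $aa'$ rather than on $a$ itself.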

When exogenous variables also appear in the equation, and with no restrictions on the coefficients of exogenous variables, we need only replace the exponential term of the likelihood function (5.4.34),

$$-\frac{1}{2}\sum_{i=1}^{N} (y_{1i}', \dots, y_{Gi}')\, V^{-1} (y_{1i}', \dots, y_{Gi}')',$$

with

$$-\frac{1}{2}\sum_{i=1}^{N} (y_{1i}' - \pi_1' X_i', \dots, y_{Gi}' - \pi_G' X_i')\, V^{-1} (y_{1i}' - \pi_1' X_i', \dots, y_{Gi}' - \pi_G' X_i')'.$$

The MLEs of $c$, $a$, and $\Omega$ remain the solutions of (5.4.43), (5.4.47), and (5.4.48). From knowledge of $\Omega$ and $a$ we can solve for $\Gamma$ and $\sigma_\omega^2$. The MLE of $\Pi$ conditional on $V$ is the GLS of $\Pi$. Knowing $\Pi$ and $\Gamma$, we can solve for $B = -\Gamma\Pi$.

Thus, Chamberlain and Griliches (1975) suggested the following iterative algorithm to solve for the MLE. Starting from the least-squares reduced-form estimates, we can form consistent estimates of $R$ and $\bar{R}$. Then estimate $c$ by maximizing¹⁴

$$\frac{c'\bar{R} c}{c' R c}. \tag{5.4.49}$$

Once $c$ is obtained, we solve for $a$ and $\Omega$ by (5.4.47) and (5.4.48). After obtaining $\Omega$ and $a$, the MLE of the reduced-form parameters is just the generalized least-squares estimate. With these estimated reduced-form coefficients, one can form new estimates of $R$ and $\bar{R}$ and continue the iteration until the solution converges. The structural-form parameters are then solved from the convergent reduced-form parameters.

5.4.3 An Example

Chamberlain and Griliches (1975) used the Gorseline (1932) data of the highest grade of schooling attained ($y_1$), the logarithm of the occupational (Duncan's SES) standing ($y_2$), and the logarithm of 1927 income ($y_3$) for 156 pairs of brothers from Indiana (U.S.) to fit a model of the type (5.4.1)–(5.4.3). Specifically, they let

$$\begin{aligned}
y_{1it} &= \beta_1' x_{it} + d_1 h_{it} + u_{1it}, \\
y_{2it} &= \gamma_{21} y_{1it} + \beta_2' x_{it} + d_2 h_{it} + u_{2it}, \\
y_{3it} &= \gamma_{31} y_{1it} + \beta_3' x_{it} + d_3 h_{it} + u_{3it}.
\end{aligned} \tag{5.4.50}$$

The set X contains a constant, age, and age squared, with age squared appearing only in the income equation.

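The reduced form of (5.4.50) is

$$y_{it} = \Pi x_{it} + a h_{it} + \epsilon_{it}, \tag{5.4.51}$$

where

$$\Pi = \begin{bmatrix} \beta_1' \\ \gamma_{21}\beta_1' + \beta_2' \\ \gamma_{31}\beta_1' + \beta_3' \end{bmatrix}, \qquad
a = \begin{bmatrix} d_1 \\ d_2 + \gamma_{21} d_1 \\ d_3 + \gamma_{31} d_1 \end{bmatrix}, \qquad
\epsilon_{it} = \begin{bmatrix} u_{1it} \\ u_{2it} + \gamma_{21} u_{1it} \\ u_{3it} + \gamma_{31} u_{1it} \end{bmatrix}. \tag{5.4.52}$$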


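Therefore,

$$E\epsilon_{it}\epsilon_{it}' = \begin{bmatrix}
\sigma_{u1}^2 & \gamma_{21}\sigma_{u1}^2 & \gamma_{31}\sigma_{u1}^2 \\
 & \sigma_{u2}^2 + \gamma_{21}^2\sigma_{u1}^2 & \gamma_{21}\gamma_{31}\sigma_{u1}^2 \\
 & & \sigma_{u3}^2 + \gamma_{31}^2\sigma_{u1}^2
\end{bmatrix}, \tag{5.4.53}$$

and

$$\Omega = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
 & \sigma_{22} & \sigma_{23} \\
 & & \sigma_{33}
\end{bmatrix} = E(\epsilon_{it}\epsilon_{it}') + \sigma_\omega^2 a a'. \tag{5.4.54}$$

Both matrices are symmetric, so only the upper triangles are shown. We show that knowing $a$ and $\Omega$ identifies the structural coefficients of the joint dependent variables as follows: For a given value of $\sigma_\omega^2$, we can solve for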

$$\sigma_{u1}^2 = \sigma_{11} - \sigma_\omega^2 a_1^2, \tag{5.4.55}$$

$$\gamma_{21} = \frac{\sigma_{12} - \sigma_\omega^2 a_1 a_2}{\sigma_{u1}^2}, \tag{5.4.56}$$

$$\gamma_{31} = \frac{\sigma_{13} - \sigma_\omega^2 a_1 a_3}{\sigma_{u1}^2}. \tag{5.4.57}$$

Equating

$$\gamma_{21}\gamma_{31} = \frac{\sigma_{23} - \sigma_\omega^2 a_2 a_3}{\sigma_{u1}^2} \tag{5.4.58}$$

with the product of (5.4.56) and (5.4.57), and making use of (5.4.55), we have

$$\sigma_\omega^2 = \frac{\sigma_{12}\sigma_{13} - \sigma_{11}\sigma_{23}}{\sigma_{12} a_1 a_3 + \sigma_{13} a_1 a_2 - \sigma_{11} a_2 a_3 - \sigma_{23} a_1^2}. \tag{5.4.59}$$

The problem then becomes one of estimating $a$ and $\Omega$. Table 5.1 presents the MLE of Chamberlain and Griliches (1975) for the coefficients of schooling and (unobservable) ability variables with $\sigma_\alpha^2$ normalized to equal 1. Table 5.1 also presents their least-squares estimates, which ignore the familial information, and their covariance estimates, in which each brother's characteristics (his income, occupation, schooling, and age) are measured around his own family's mean.
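The back-substitution in (5.4.55)–(5.4.59) can be sketched in a few lines. The function name and all numerical values below are hypothetical; they are used only to build a pair $(\Omega, a)$ that is consistent with (5.4.52)–(5.4.54) and then to confirm that the formulas recover the values that generated it.

```python
def structural_from_reduced(sig, a):
    """Back out sigma_omega^2, sigma_u1^2, gamma_21, gamma_31 via (5.4.55)-(5.4.59).

    sig is the symmetric 3x3 matrix Omega as nested lists; a is (a1, a2, a3).
    """
    a1, a2, a3 = a
    s11, s12, s13, s23 = sig[0][0], sig[0][1], sig[0][2], sig[1][2]
    s2_omega = (s12 * s13 - s11 * s23) / (
        s12 * a1 * a3 + s13 * a1 * a2 - s11 * a2 * a3 - s23 * a1 ** 2)  # (5.4.59)
    s2_u1 = s11 - s2_omega * a1 ** 2                                   # (5.4.55)
    g21 = (s12 - s2_omega * a1 * a2) / s2_u1                           # (5.4.56)
    g31 = (s13 - s2_omega * a1 * a3) / s2_u1                           # (5.4.57)
    return s2_omega, s2_u1, g21, g31

# Hypothetical structural values, used only to build a consistent (Omega, a) pair.
g21, g31, s2w = 0.3, 0.2, 0.6
d1, d2, d3 = 1.0, 0.5, 0.4
su1, su2, su3 = 1.0, 0.5, 0.8
a = (d1, d2 + g21 * d1, d3 + g31 * d1)                                  # (5.4.52)
E = [[su1, g21 * su1, g31 * su1],
     [g21 * su1, su2 + g21 ** 2 * su1, g21 * g31 * su1],
     [g31 * su1, g21 * g31 * su1, su3 + g31 ** 2 * su1]]                # (5.4.53)
sig = [[E[i][j] + s2w * a[i] * a[j] for j in range(3)] for i in range(3)]  # (5.4.54)

print(structural_from_reduced(sig, a))
```

The denominator of (5.4.59) vanishes when $d_2 d_3 = 0$, so the sketch presumes that ability enters both the occupation and the income equations.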

The covariance estimate of the coefficient-of-schooling variable in the income equation is smaller than the least-squares estimate. However, the simultaneous-equations model estimate of the coefficient for the ability variable is negative in the schooling equation. As discussed in Section 5.1, if schooling and ability are negatively correlated, the single-equation within-family estimate of the schooling coefficient could be less than the least-squares estimate (here 0.080 versus 0.082). To attribute this decline to "ability" or "family background" is erroneous. In fact, when schooling and ability were treated symmetrically, the coefficient-of-schooling variable (0.088) became greater than the least-squares estimate 0.082.

Table 5.1. Parameter estimates and their standard errors for the income–occupation–schooling model

                                                  Method
Coefficients of the          Least-squares    Covariance
structural equations         estimate         estimate       MLE

Schooling in the:
  Income equation            0.082            0.080          0.088
                             (0.010)ᵃ         (0.011)        (0.009)
  Occupation equation        0.104            0.135          0.107
                             (0.010)          (0.015)        (0.010)

"Ability" in the:
  Income equation                                            0.416
                                                             (0.038)
  Occupation equation                                        0.214
                                                             (0.046)
  Schooling equation                                         −0.092
                                                             (0.178)

ᵃ Standard errors in parentheses.

Source: Chamberlain and Griliches (1975, p. 429).

APPENDIX 5A