Combining time series and cross-section data in simultaneous linear equations


Institute of Statistics Mimeograph Series No. 505

Raleigh

1966

COMBINING TIME SERIES AND CROSS-SECTION DATA IN SIMULTANEOUS LINEAR EQUATIONS

By

ASHIQ HUSSAIN

HUSSAIN, ASHIQ. Combining Time Series and Cross-Section Data in Simultaneous Linear Equations. (Under the direction of THOMAS DUDLEY WALLACE.)

This thesis is concerned with the estimation of parameters in a system of simultaneous linear equations by the combined use of time series and cross-sectional data. An error component disturbance model is postulated: the disturbance term in each equation of the system is assumed to have three mutually independent random components - one associated with time, another associated with cross-sectional units, and a third one representing measurement error. Usual distributional properties are ascribed to these error components, and methods of estimation appropriate for the model are developed.

A two-stage estimation procedure is given for the reduced-form parameters. In the first stage, covariance matrices $\Sigma_{\mu\mu'}$ ($\mu, \mu' = 1, 2, 3, \ldots, M$) of the reduced-form disturbances are estimated from the ordinary least squares residuals. In the second stage, two sets of estimators for the reduced-form parameters are derived: (i) single equation estimators which result when Aitken's two-stage method is applied to each reduced-form equation separately; and (ii) generalized estimators obtained from the entire set of reduced-form equations. Both sets of estimators compare favorably with the ordinary least squares estimators.

For the estimation of structural parameters, two methods designated as the "Two-Stage Generalized Least Squares Method" and the "Three-Stage Generalized Least Squares Method" [...] simultaneous linear equations. For a system of exactly identified equations, a third method called the "Indirect Generalized Least Squares Method" is also available. All three methods are based upon the covariance matrices of reduced-form disturbances; the estimation of covariance matrices of structural disturbances is avoided. The estimators of structural parameters obtained by these methods are found to have some optimal large sample properties.

Finally, some special cases, including the dummy variable model, are

considered. In this last case, it is found that, although BLU estimators

can be obtained for reduced-form parameters, the estimation of structural

parameters is made difficult by the fact that the number of

COMBINING TIME SERIES AND CROSS-SECTION DATA IN SIMULTANEOUS LINEAR EQUATIONS

by

ASHIQ HUSSAIN

A thesis submitted to the Graduate Faculty of North Carolina State University

at Raleigh

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

DEPARTMENT OF EXPERIMENTAL STATISTICS

RALEIGH

1966

APPROVED BY:

BIOGRAPHY

Born: Sargodha, West Pakistan, January 1, 1924

Married: Fahmida, June 27, 1954

Previous Work:

Undergraduate: B.A. (Mathematics) 1947, University of Punjab, Lahore, Pakistan

Graduate: M.A. (Mathematics) 1950, University of Punjab, Lahore, Pakistan; M.A. (Economics) 1962, University of Peshawar, Peshawar, Pakistan; M.E.S. 1965, North Carolina State University, Raleigh

Employment: Lecturer in Mathematics, Murray College, Sialkot, Pakistan, and Pakistan Military Academy, Kakul, Pakistan, 1950 - 1954; Senior Lecturer in Mathematics

ACKNOWLEDGMENTS

I want to acknowledge my indebtedness to Dr. Thomas Dudley Wallace,

Chairman of my committee, who inspired my interest in the subject of

simultaneous linear equations and from whom I received unfailing

guidance and encouragement throughout my stay at North Carolina State

University.

I am also thankful to the other members of my committee for their

advice and helpful criticism.

I must thank the Agency for International Development, Department

of State, United States Government for financing my education here and

the faculty and staff of the Department of Experimental Statistics whose

cooperation and courtesy made my stay a pleasant one.

A special word of thanks should go to Mrs. Jo Ann Beauchaine for her

excellent typing of this thesis.

Lastly, I owe a debt of gratitude to my wife, Fahmida, who had to

shoulder the burden of supporting our children for the three years that I

was in the United States, earning nothing, but who never failed to send

a word of encouragement from across the seas, and to my father, who

TABLE OF CONTENTS

1.0 INTRODUCTION

2.0 ORDINARY SIMULTANEOUS LINEAR EQUATION MODELS
    2.1 Stochastic Specifications
    2.2 Identification
    2.3 Reduced-Form Estimation
    2.4 Structural Estimation
        2.4.1 Indirect Least Squares (ILS)
        2.4.2 Two-Stage Least Squares (2-SLS)
        2.4.3 Three-Stage Least Squares (3-SLS)
        2.4.4 Limited Information Single Equation (LISE) or Least Variance Ratio Method
        2.4.5 K-Class Estimators
        2.4.6 Full Information Least Generalized Residual Variance Method

3.0 SOME THEOREMS ON STOCHASTIC CONVERGENCE
    3.1 Useful Inequalities
    3.2 Stochastic Convergence, Univariate Case
    3.3 Convergence of Sequences of Random Vectors

4.0 ERROR COMPONENT MODEL: REDUCED-FORM ESTIMATION
    4.1 Description of the System
    4.2 Estimation of Covariance Matrices of Reduced-Form Disturbances
    4.3 Some Useful Results
    4.4 Estimators of Reduced-Form Parameters

5.0 ERROR COMPONENT MODEL: STRUCTURAL ESTIMATION OF AN EXACTLY IDENTIFIED SYSTEM
    5.1 Notation
    5.2 Indirect Generalized Least Squares
    5.3 Two-Stage Generalized Least Squares
    5.4 Three-Stage Generalized Least Squares

6.0 ERROR COMPONENT MODEL: STRUCTURAL ESTIMATION OF OVERIDENTIFIED SYSTEMS
    6.1 Inadequacy of Indirect Generalized Least Squares Method
    6.2 Two Lemmas
    6.3 Two-Stage Generalized Least Squares Estimators
    6.4 Three-Stage Generalized Least Squares Estimators

7.0 ERROR COMPONENT MODEL: SOME SPECIAL CASES
    7.1 Fixed Cross-Sectional Effects
        7.1.1 Estimators of Reduced-Form Parameters
        7.1.2 Structural Estimation
    7.2 Cross-Sectional and Time Effects Random with Finite Nonzero Expectations
    7.3 Dummy Variable Specifications

8.0 SUMMARY AND CONCLUSIONS

LIST OF REFERENCES

1.0 INTRODUCTION

This thesis is concerned with the estimation of parameters in a

system of simultaneous linear equations by the combined use of time

series and cross-section data. The problem was suggested by Hildreth

in 1950; but so far as the present writer knows, it has received little

attention. Most of the work done thus far on simultaneous equations

is based on time series data. The model which is ordinarily used is:

$$y_t' A + x_t' B + u_t' = 0', \qquad (1.1)$$

where $y_t' = (y_{1t}, y_{2t}, \ldots, y_{Mt})$ is the vector of observed values of $M$ endogenous variables for the $t$th time period; $x_t' = (x_{1t}, x_{2t}, \ldots, x_{Kt})$ is the vector of observed values of $K$ exogenous variables for the $t$th time period; $u_t' = (u_{1t}, u_{2t}, \ldots, u_{Mt})$ is the vector of values of $M$ unobservable random variables in the system, specified by the relations

$$E(u_{\mu t}) = 0 \quad \text{for all } t = 1, 2, \ldots,$$

$$E(u_{\mu t} u_{\mu' t'}) = \begin{cases} \sigma_{\mu\mu'} \ \text{(finite)}, & \text{if } t = t', \\ 0, & \text{if } t \neq t', \end{cases} \qquad (\mu, \mu' = 1, 2, \ldots, M).$$

$A$ is an $M \times M$ nonsingular matrix of constants with diagonal elements equal to $-1$; and $B$ is a $K \times M$ matrix of constants.

A few cross-sectional studies have been made using the same general

model (model (1.1)) but data consisting of observations for different

cross-sectional units.

But no attempt has been made to pool time series and cross-section

data for the estimation of simultaneous equations.

In the classical single-equation problem two different models have been suggested for combining time series and cross-section data: (i) the "dummy-variable" model, and (ii) the error-component model. In the "dummy-variable" model, time and cross-sectional effects are assumed constant. A "dummy-variable" version of the simultaneous linear equation system will be the following:

$$y_{it}' A + x_{it}' B + \lambda_i' + \tau_t' + u_{it}' = 0', \qquad (1.2)$$

where $A$ and $B$ are as defined in (1.1);

$y_{it}$ is an $M \times 1$ vector of observations on the endogenous variables for the $i$th cross-section and $t$th time period;

$x_{it}$ is a $K \times 1$ vector of observations on the exogenous variables for the $i$th cross-section and $t$th time period;

$\lambda_i$ is an $M \times 1$ vector of constant effects associated with the $i$th cross-section;

$\tau_t$ is an $M \times 1$ vector of constant effects associated with the $t$th time period; and

$u_{it}$ is an $M \times 1$ vector of values of unobservable random variables specified by the relations

$$E(u_{\mu it}) = 0,$$

$$E(u_{\mu it} u_{\mu' i' t'}) = \begin{cases} \sigma_{\mu\mu'} \ \text{(finite)}, & \text{if } i = i' \text{ and } t = t', \\ 0, & \text{otherwise}, \end{cases} \qquad \text{for } \mu, \mu' = 1, 2, \ldots, M.$$

The model is unsuitable for two reasons. First, it does not take

cognizance of random effects which may cause variations from time to

time and from one cross-sectional unit to another. Second, the number

of cross-sectional units as well as the number of time intervals must be

But this very assumption stands in the way of structural estimation. For

a subset of explanatory variables in each structural equation consists of

endogenous variables which are correlated with the error term. Consequently

the principle of least squares cannot be applied.

In the error-component model, it is assumed that the error term has

three mutually independent random components - a component associated with

time, another associated with cross-sectional units, and a third one

representing measurement error. We adopt this model. We assume that the

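disturbance term of the $\mu$th equation of the system is of the form

$$\varepsilon_{\mu it} = u_{\mu i} + v_{\mu t} + w_{\mu it}, \qquad (1.3)$$

where $u_{\mu i}$, $v_{\mu t}$ and $w_{\mu it}$ are mutually independent random variables with zero means and

$$E(u_{\mu i} u_{\mu' i'}) = \begin{cases} \sigma^{u}_{\mu\mu'} \ \text{(finite)}, & \text{if } i = i', \\ 0, & \text{if } i \neq i', \end{cases} \qquad \text{for } \mu, \mu' = 1, 2, 3, \ldots, M;$$

$$E(v_{\mu t} v_{\mu' t'}) = \begin{cases} \sigma^{v}_{\mu\mu'} \ \text{(finite)}, & \text{if } t = t', \\ 0, & \text{if } t \neq t', \end{cases} \qquad \text{for } \mu, \mu' = 1, 2, 3, \ldots, M;$$

$$E(w_{\mu it} w_{\mu' i' t'}) = \begin{cases} \sigma^{w}_{\mu\mu'} \ \text{(finite)}, & \text{if } i = i' \text{ and } t = t', \\ 0, & \text{otherwise}, \end{cases} \qquad \text{for } \mu, \mu' = 1, 2, 3, \ldots, M.$$

It is further assumed that the components of the error terms are normally distributed.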

Our model is, therefore,

$$y_{it}' A + x_{it}' B + \varepsilon_{it}' = 0', \qquad (1.4)$$

with disturbances $\varepsilon_{it}' = (\varepsilon_{1it}, \varepsilon_{2it}, \ldots, \varepsilon_{Mit})$ as specified in (1.3).
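The covariance pattern implied by the three-component disturbance in (1.3) can be sketched for a single equation as follows. This is only an illustration: the function name `error_cov` and the variance values are hypothetical, and the sketch covers one equation rather than the full cross-equation structure.

```python
def error_cov(i, t, i2, t2, var_u, var_v, var_w):
    """Covariance of eps[i, t] and eps[i2, t2] for one equation under (1.3).

    eps[i, t] = u[i] + v[t] + w[i, t], with mutually independent,
    zero-mean components (assumed variances var_u, var_v, var_w).
    """
    cov = 0.0
    if i == i2:
        cov += var_u  # same cross-sectional unit: shared component u[i]
    if t == t2:
        cov += var_v  # same time period: shared component v[t]
    if i == i2 and t == t2:
        cov += var_w  # same observation: measurement-error component w[i, t]
    return cov


# Hypothetical variances, chosen only for illustration.
var_u, var_v, var_w = 2.0, 1.0, 0.5

same_observation = error_cov(0, 0, 0, 0, var_u, var_v, var_w)
same_unit = error_cov(0, 0, 0, 1, var_u, var_v, var_w)
same_period = error_cov(0, 0, 1, 0, var_u, var_v, var_w)
unrelated = error_cov(0, 0, 1, 1, var_u, var_v, var_w)
```

Two disturbances are correlated whenever they share a cross-sectional unit or a time period, which is why ordinary least squares is not efficient under this specification.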

A good rationalization of this type of error structure is given by

estimating the parameters of the system of equations - that is, the

elements of matrices A and B - which are appropriate to this error model.

It will be shown in the sequel that our methods yield estimators which

have some optimal properties.

The plan of this thesis is as follows: In Chapter 2.0 we present a

review of the existing material on the subject of simultaneous linear

equations, discussing briefly some of the well-known methods of

structural and reduced-form estimation. All these methods are based on

model (1.1). This is followed by a chapter on stochastic convergence;

a number of simple results on the convergence of sequences of random

variables are given, which are helpful in the derivation of large-sample

properties of estimators. The remaining chapters are devoted to the main

topic - estimation of the reduced-form and structural parameters in the

2.0 ORDINARY SIMULTANEOUS LINEAR EQUATION MODELS

2.1 Stochastic Specifications

By far the most important contributions to the subject of simultaneous linear equations are those which use the model (1.1). We have a rich and growing body of literature in this area, and a variety of methods for structural and reduced-form estimation are available. Excellent treatment of these methods is given by Theil [17], Goldberger [8] and Johnston [9].

We review some of these methods briefly here merely to

provide a frame of reference for what is to follow in succeeding chapters.

Let T be the number of observations made on the observable variables

of the system (1.1). We can then write the system of equations compactly

as

$$YA + XB + U = 0, \qquad (2.1)$$

where

$Y$ is a $T \times M$ matrix of observations on $M$ endogenous variables;

$X$ is a $T \times K$ matrix of observations on $K$ exogenous variables; and

$U$ is a $T \times M$ matrix of values of $M$ unobservable random variables $(u_1, u_2, \ldots, u_M)$.

The equation (2.1) will be referred to as the structure, while the equations

$$Y = X\Pi + V, \qquad (2.2)$$

where

$$\Pi = -BA^{-1}, \qquad V = -UA^{-1},$$

will be called the reduced-form of the system.
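The passage from the structure to the reduced form can be checked numerically on a small case. The sketch below assumes a hypothetical system with $M = 2$ and $K = 2$; the matrices `A` and `B` and the single observation are made-up values, and the helper functions are illustrative only.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical structure with M = 2 endogenous and K = 2 exogenous variables.
A = [[-1.0, 0.5], [0.2, -1.0]]  # M x M, diagonal elements equal to -1
B = [[1.0, 0.0], [0.0, 2.0]]    # K x M

A_inv = inv2(A)
Pi = [[-x for x in row] for row in matmul(B, A_inv)]  # Pi = -B A^{-1}

# One observation: pick x_t and u_t, then obtain y_t from the reduced form.
x_t = [[1.0, 3.0]]   # 1 x K row vector
u_t = [[0.1, -0.2]]  # 1 x M row vector
v_t = [[-e for e in row] for row in matmul(u_t, A_inv)]      # v_t' = -u_t' A^{-1}
y_t = [[p + v for p, v in zip(matmul(x_t, Pi)[0], v_t[0])]]  # y_t' = x_t' Pi + v_t'

# Left-hand side of the structure y_t' A + x_t' B + u_t'; it should vanish.
residual = [
    ya + xb + u
    for ya, xb, u in zip(matmul(y_t, A)[0], matmul(x_t, B)[0], u_t[0])
]
```

The residual vanishes up to rounding error, confirming that an observation generated from the reduced form satisfies the structural equations.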

The exogenous variables $x_t' = (x_{1t}, x_{2t}, \ldots, x_{Kt})$ are determined outside the system. The observation matrix $X = (x_1, x_2, \ldots, x_T)'$ is assumed to be generated by some mechanism independently of the disturbance so that

$$E(U \mid X) = E(U) = 0, \qquad E(V \mid X) = E(V) = 0.$$

It is further assumed that $E(x_t x_t') = \Sigma_{xx}$ is nonsingular so that $\operatorname{plim} \left( \frac{X'X}{T} \right)$ exists and is positive definite.

It has been shown that the specifications given above give the same asymptotic results as would be given by the specification that $(x_1, \ldots, x_T)$ are nonstochastic but subject to the condition that the ordinary limit,

$$\lim_{T \to \infty} \left( \frac{X'X}{T} \right), \qquad (2.3)$$

exists and is nonsingular.

In what follows we shall, for sake of convenience, ignore the process

generating observations on the exogenous variables, regarding them as

nonstochastic but subject to the condition (2.3).

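It is clear that

$$E(u_t u_t') = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma_{M1} & \sigma_{M2} & \cdots & \sigma_{MM} \end{pmatrix} = \Sigma_{uu}, \qquad (2.4)$$

and that $u_\mu$, the disturbance vector of the $\mu$th structural equation, has covariance matrix

$$E(u_\mu u_\mu') = \sigma_{\mu\mu} I_T.$$

Continuing, we note that $v_t' = -u_t' A^{-1}$ so that $E(v_t) = 0$ and

$$E(v_t v_t') = (A^{-1})' \Sigma_{uu} A^{-1}.$$

Let $v_\mu$ be the disturbance vector of the $\mu$th equation in the reduced-form (2.2). Clearly

$$v_\mu = -\sum_{r=1}^{M} a^{r\mu} u_r,$$

where $a^{r\mu}$ ($r = 1, 2, \ldots, M$) are elements of the $\mu$th column of $A^{-1}$, so that

$$v_{\mu t} = -\sum_{r=1}^{M} a^{r\mu} u_{rt},$$

or

$$E(v_{\mu t} v_{\mu' t'}) = \begin{cases} \displaystyle\sum_{r=1}^{M} \sum_{r'=1}^{M} a^{r\mu} a^{r'\mu'} \sigma_{rr'}, & \text{for } t = t', \\ 0, & \text{for } t \neq t', \end{cases} \qquad (2.6)$$

($\mu, \mu' = 1, 2, \ldots, M$), which shows that the reduced-form disturbances are temporally uncorrelated.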

2.2 Identification

We assume that the system (1.1) is identified - all underidentified equations are deleted from the system. This means that given $\Pi$ there is at least one solution of

$$\Pi A = -B. \qquad (2.7)$$

There must be a sufficient number of a priori restrictions on the elements of $A$ and $B$ if (2.7) is to have at least one solution. We have already imposed one restriction, namely, that diagonal elements of $A$ are equal to $-1$. Let the remaining restrictions take the form of zero-restrictions - that is, some of the elements of $A$ and $B$ are zeros.

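Let $\alpha_\mu^*$ and $\beta_\mu^*$ be the $\mu$th column vectors of $A$ and $B$, respectively. Let us assume that $\ell_\mu$ elements of $\alpha_\mu^*$ are nonzero. One of these nonzero elements (the $\mu$th element) is $-1$. Therefore, rearranging these elements we can write

$$\alpha_\mu^* = \begin{pmatrix} \alpha_\mu^\circ \\ 0 \end{pmatrix},$$

where $\alpha_\mu^\circ$ is $\ell_\mu \times 1$, having one element equal to $-1$ and all other elements nonzero.

Similarly, assuming that $K_\mu$ elements of $\beta_\mu^*$ are nonzero, while all the remaining elements are zeros, we can write

$$\beta_\mu^* = \begin{pmatrix} \beta_\mu \\ 0 \end{pmatrix},$$

where $\beta_\mu$ is $K_\mu \times 1$.

From (2.7) it follows that

$$-\beta_\mu^* = \Pi \alpha_\mu^*. \qquad (2.8)$$

Partitioning $\Pi$ as

$$\Pi = \begin{pmatrix} \Pi_{11}^\mu & \Pi_{12}^\mu \\ \Pi_{21}^\mu & \Pi_{22}^\mu \end{pmatrix},$$

where $\Pi_{11}^\mu$ is $K_\mu \times \ell_\mu$, $\Pi_{21}^\mu$ is $(K - K_\mu) \times \ell_\mu$, $\Pi_{12}^\mu$ is $K_\mu \times (M - \ell_\mu)$ and $\Pi_{22}^\mu$ is $(K - K_\mu) \times (M - \ell_\mu)$, and using (2.8), we have

$$-\beta_\mu = \Pi_{11}^\mu \alpha_\mu^\circ, \qquad 0 = \Pi_{21}^\mu \alpha_\mu^\circ. \qquad (2.9)$$

The $\mu$th equation in the structure is

$$Y \alpha_\mu^* + X \beta_\mu^* + u_\mu = 0,$$

that is,

$$-y_\mu + Y^{(\mu)} \alpha_\mu + X_\mu \beta_\mu + u_\mu = 0.$$

Here $Y^{(\mu)}$ is the matrix of observations on the $(\ell_\mu - 1)$ endogenous variables other than $y_\mu$ included in the $\mu$th structural equation, $\alpha_\mu$ is the vector of their coefficients, and $X_\mu$ is the matrix of observations on the $K_\mu$ exogenous variables included in this equation. Or

$$y_\mu = Y^{(\mu)} \alpha_\mu + X_\mu \beta_\mu + u_\mu. \qquad (2.10)$$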

Suppose that we know $\Pi$. If the equations (2.9) are solvable for $\alpha_\mu^\circ$ and $\beta_\mu$, we can immediately find the parameters of the $\mu$th structural equation (2.10).

Now, by a theorem of matrix algebra (Goldberger [8], p. 23), a necessary condition for the equation

$$\Pi_{21}^\mu \alpha_\mu^\circ = 0$$

to have a solution is that

$$K - K_\mu \geq \ell_\mu - 1 \qquad \text{or} \qquad K \geq K_\mu + \ell_\mu - 1.$$

The solution of the equation

$$\Pi_{21}^\mu \alpha_\mu^\circ = 0,$$

when substituted for $\alpha_\mu^\circ$ in the first equation of (2.9), will yield a value of $\beta_\mu$. Thus, given a knowledge of reduced-form parameters $\Pi$, we can at once determine the $\mu$th structural equation.

Three situations arise:

When $K = K_\mu + \ell_\mu - 1$, the nullity of the $(K - K_\mu) \times \ell_\mu$ matrix $\Pi_{21}^\mu$ is exactly equal to 1 so that there is a unique solution $(\alpha_\mu, \beta_\mu)$ of equation (2.9), and the $\mu$th structural equation is exactly identified.

When $K > K_\mu + \ell_\mu - 1$, the nullity of $\Pi_{21}^\mu$ exceeds 1. In this case, the equations (2.9) admit more than one solution and the $\mu$th structural equation is overidentified.

Finally, when $K < \ell_\mu + K_\mu - 1$, equations (2.9) have no solution and the $\mu$th structural equation is said to be underidentified. The underidentified relations are indeterminate so that only the first two cases are relevant to the estimation problem.
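The three cases above depend only on the counts $K$, $K_\mu$ and $\ell_\mu$, so the classification can be sketched as a small function. The function name and argument names are hypothetical; only the counting rule stated in the text is encoded.

```python
def classify_identification(K, K_mu, l_mu):
    """Classify the mu-th structural equation by the counting rule in the text.

    K    : number of exogenous variables in the whole system
    K_mu : number of exogenous variables included in the equation
    l_mu : number of endogenous variables included in the equation
    """
    excluded_exogenous = K - K_mu
    required = l_mu - 1
    if excluded_exogenous == required:
        return "exactly identified"
    if excluded_exogenous > required:
        return "overidentified"
    return "underidentified"


# Hypothetical system with K = 4 exogenous variables.
case_exact = classify_identification(K=4, K_mu=2, l_mu=3)
case_over = classify_identification(K=4, K_mu=1, l_mu=2)
case_under = classify_identification(K=4, K_mu=3, l_mu=3)
```

The rule is the counting condition only; it says nothing about whether the relevant submatrix of $\Pi$ actually attains the required rank.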

2.3 Reduced-Form Estimation

The reduced-form equations are

$$y_\mu = X \pi_\mu + v_\mu \qquad (\mu = 1, 2, \ldots, M),$$

where $\pi_\mu$ is the $\mu$th column vector of $\Pi$.

We have seen that $E(v_{\mu t}) = 0$ for $\mu = 1, 2, \ldots, M$, and

$$E(v_{\mu t} v_{\mu t}) = \omega_{\mu\mu} = \sum_{r=1}^{M} \sum_{r'=1}^{M} a^{r\mu} \sigma_{rr'} a^{r'\mu}$$

for all $t$, where $a^{r\mu}$ denotes the $(r, \mu)$ element of $A^{-1}$, so that

$$E(v_\mu v_\mu') = \omega_{\mu\mu} I_T.$$

Hence the ordinary least squares method applied to each reduced-form equation yields unbiased estimates.

We have, therefore,

$$\hat{\pi}_\mu = (X'X)^{-1} X' y_\mu = \pi_\mu + (X'X)^{-1} X' v_\mu, \qquad (2.13)$$

which have covariance matrix

$$E(\hat{\pi}_\mu - \pi_\mu)(\hat{\pi}_\mu - \pi_\mu)' = (X'X)^{-1} \omega_{\mu\mu} \qquad (\mu = 1, 2, \ldots, M). \qquad (2.14)$$

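It will be shown that if we take all the reduced-form equations and simultaneously estimate $\pi_1, \pi_2, \ldots, \pi_M$ we obtain the same estimates as are given by the equation-by-equation ordinary least squares method.

Write the reduced-form equations in the form

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix} = \begin{pmatrix} X\pi_1 \\ X\pi_2 \\ \vdots \\ X\pi_M \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_M \end{pmatrix}. \qquad (2.15)$$

If we write

$$\bar{X} = \begin{pmatrix} X & 0 & \cdots & 0 \\ 0 & X & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & X \end{pmatrix} \qquad (MT \times MK),$$

then the equations (2.15) take the form

$$y = \bar{X} \pi + v, \qquad (2.16)$$

where $\pi = (\pi_1', \pi_2', \ldots, \pi_M')'$ is $MK \times 1$, $v = (v_1', v_2', \ldots, v_M')'$ is $TM \times 1$, and

$$E(v v') = \begin{pmatrix} \omega_{11} I & \omega_{12} I & \cdots & \omega_{1M} I \\ \omega_{21} I & \omega_{22} I & \cdots & \omega_{2M} I \\ \vdots & \vdots & & \vdots \\ \omega_{M1} I & \omega_{M2} I & \cdots & \omega_{MM} I \end{pmatrix} = \begin{pmatrix} \omega_{11} & \omega_{12} & \cdots & \omega_{1M} \\ \omega_{21} & \omega_{22} & \cdots & \omega_{2M} \\ \vdots & \vdots & & \vdots \\ \omega_{M1} & \omega_{M2} & \cdots & \omega_{MM} \end{pmatrix} \otimes I = \Omega \otimes I. \qquad (2.17)$$

By Aitken's generalized least squares method we have

$$\hat{\pi} = [\bar{X}' (\Omega \otimes I)^{-1} \bar{X}]^{-1} \bar{X}' (\Omega \otimes I)^{-1} y.$$

These estimates are B.L.U. Denoting the elements of $\Omega^{-1}$ by $\omega^{jk}$ ($j, k = 1, 2, \ldots, M$), we have

$$\hat{\pi} = \begin{pmatrix} \omega^{11}(X'X) & \omega^{12}(X'X) & \cdots & \omega^{1M}(X'X) \\ \omega^{21}(X'X) & \omega^{22}(X'X) & \cdots & \omega^{2M}(X'X) \\ \vdots & \vdots & & \vdots \\ \omega^{M1}(X'X) & \omega^{M2}(X'X) & \cdots & \omega^{MM}(X'X) \end{pmatrix}^{-1} \begin{pmatrix} \sum_{j=1}^{M} \omega^{1j} X' y_j \\ \sum_{j=1}^{M} \omega^{2j} X' y_j \\ \vdots \\ \sum_{j=1}^{M} \omega^{Mj} X' y_j \end{pmatrix}$$

$$= \begin{pmatrix} \omega_{11}(X'X)^{-1} & \omega_{12}(X'X)^{-1} & \cdots & \omega_{1M}(X'X)^{-1} \\ \omega_{21}(X'X)^{-1} & \omega_{22}(X'X)^{-1} & \cdots & \omega_{2M}(X'X)^{-1} \\ \vdots & \vdots & & \vdots \\ \omega_{M1}(X'X)^{-1} & \omega_{M2}(X'X)^{-1} & \cdots & \omega_{MM}(X'X)^{-1} \end{pmatrix} \begin{pmatrix} \sum_{j=1}^{M} \omega^{1j} X' y_j \\ \sum_{j=1}^{M} \omega^{2j} X' y_j \\ \vdots \\ \sum_{j=1}^{M} \omega^{Mj} X' y_j \end{pmatrix}. \qquad (2.18)$$

Since

$$\sum_{j=1}^{M} \omega_{\mu j} \omega^{j\ell} = \begin{cases} 1, & \text{if } \mu = \ell, \\ 0, & \text{if } \mu \neq \ell, \end{cases}$$

we have from (2.18)

$$\hat{\pi}_\mu = (X'X)^{-1} X' y_\mu, \qquad (2.19)$$

which shows that the classical least squares method applied to each equation separately yields B.L.U. estimators. Further, the covariance matrix of $\hat{\pi}_\mu$ is

$$(X'X)^{-1} \omega_{\mu\mu} = \frac{\omega_{\mu\mu}}{T} \left( \frac{X'X}{T} \right)^{-1},$$

which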
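The equivalence between Aitken's estimator on the stacked reduced form and equation-by-equation least squares can be checked numerically. The sketch below uses NumPy on simulated data; the dimensions, the matrix `Omega` and the random seed are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, M = 30, 3, 2  # hypothetical sample size and dimensions

X = rng.normal(size=(T, K))
Pi = rng.normal(size=(K, M))
Omega = np.array([[1.0, 0.6], [0.6, 2.0]])  # assumed reduced-form covariance
V = rng.multivariate_normal(np.zeros(M), Omega, size=T)
Y = X @ Pi + V

# Equation-by-equation ordinary least squares, as in (2.19).
Pi_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Aitken's generalized least squares on the stacked system (2.16).
X_bar = np.kron(np.eye(M), X)   # block-diagonal MT x MK matrix
y = Y.T.reshape(-1)             # stacks y_1, ..., y_M
W = np.kron(np.linalg.inv(Omega), np.eye(T))  # (Omega kron I)^{-1}
pi_gls = np.linalg.solve(X_bar.T @ W @ X_bar, X_bar.T @ W @ y)
Pi_gls = pi_gls.reshape(M, K).T  # back to a K x M matrix

max_difference = np.max(np.abs(Pi_ols - Pi_gls))
```

The two estimates agree up to rounding error because every equation has the same regressor matrix $X$; with different regressors in different equations the equivalence would not hold in general.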

2.4 Structural Estimation

2.4.1 Indirect Least Squares (ILS)

Suppose that we have estimated $\Pi$ by $\hat{\Pi}$. $\hat{\Pi}$ is clearly consistent. Let us recall the equations (2.9):

$$-\beta_\mu = \Pi_{11}^\mu \alpha_\mu^\circ, \qquad 0 = \Pi_{21}^\mu \alpha_\mu^\circ.$$

If the equations are exactly identified,

$$K = K_\mu + \ell_\mu - 1 \qquad \text{and} \qquad K - K_\mu = \ell_\mu - 1.$$

Therefore the nullity of $\hat{\Pi}_{21}^\mu$, which is a $(K - K_\mu) \times \ell_\mu$ matrix, is exactly equal to 1, and there is a unique solution of

$$\hat{\Pi}_{21}^\mu \alpha_\mu^\circ = 0.$$

Denote it by $\hat{\alpha}_\mu^\circ$. Then $\hat{\alpha}_\mu^\circ$ is a consistent estimate of $\alpha_\mu^\circ$. Therefore

$$\hat{\beta}_\mu = -\hat{\Pi}_{11}^\mu \hat{\alpha}_\mu^\circ$$

is a consistent estimate of $\beta_\mu$. This holds for all $\mu = 1, 2, \ldots, M$.

Thus in order to find consistent estimates of structural parameters

Now if $K > K_\mu + \ell_\mu - 1$, the nullity of $\hat{\Pi}_{21}^\mu$ exceeds one, and there are more than one consistent estimates of $\alpha_\mu^\circ$ corresponding to a consistent estimate $\hat{\Pi}_{21}^\mu$ of $\Pi_{21}^\mu$. Each of these estimates substituted in

$$\hat{\beta}_\mu = -\hat{\Pi}_{11}^\mu \hat{\alpha}_\mu^\circ$$

will give a consistent estimate of $\beta_\mu$. One has to arbitrarily discard all but one of the estimates of $\alpha_\mu^\circ$ in order to obtain a consistent estimate of the $\mu$th structural equation in the overidentified case.

2.4.2 Two-Stage Least Squares (2-SLS)

The two-stage least squares method was developed by Theil [17] and,

independently, by Basmann [3, 4]. The method avoids the arbitrariness

and loss of efficiency involved in the ILS method when the system is

over-identified.

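Suppose we estimate the reduced-form parameters by the ordinary least squares method applied to each reduced-form equation separately. These estimates have been shown to be B.L.U. and consistent. Therefore

$$\hat{y}_m = X(X'X)^{-1} X' y_m$$

is a consistent estimate of $X\pi_m$, for $m = 1, 2, \ldots, M$, and

$$\hat{v}_m = y_m - X(X'X)^{-1} X' y_m = (I - X(X'X)^{-1} X')\, y_m \qquad (2.20)$$

is a consistent estimate of $v_m$ ($m = 1, 2, \ldots, M$).

In the reduced-form estimation we obtain the estimator $\hat{Y}^{(\mu)}$ and the estimator $\hat{V}_\mu$ of the corresponding matrix of reduced-form errors so that

$$Y^{(\mu)} = \hat{Y}^{(\mu)} + \hat{V}_\mu, \qquad (2.21)$$

and (2.10) can be written in the form

$$y_\mu = \hat{Y}^{(\mu)} \alpha_\mu + X_\mu \beta_\mu + (u_\mu + \hat{V}_\mu \alpha_\mu). \qquad (2.22)$$

The two-stage least squares estimators $\hat{\delta}_\mu$ of structural parameters $\delta_\mu = \begin{pmatrix} \alpha_\mu \\ \beta_\mu \end{pmatrix}$ are

$$\hat{\delta}_\mu = \begin{pmatrix} \hat{Y}^{(\mu)\prime} \hat{Y}^{(\mu)} & \hat{Y}^{(\mu)\prime} X_\mu \\ X_\mu' \hat{Y}^{(\mu)} & X_\mu' X_\mu \end{pmatrix}^{-1} \begin{pmatrix} \hat{Y}^{(\mu)\prime} y_\mu \\ X_\mu' y_\mu \end{pmatrix}. \qquad (2.23)$$

It has been shown that these estimators are consistent and that their asymptotic covariance matrix is

$$\frac{\sigma_{\mu\mu}}{T} \operatorname*{plim}_{T \to \infty} \left[ \frac{1}{T} \begin{pmatrix} \hat{Y}^{(\mu)\prime} \hat{Y}^{(\mu)} & \hat{Y}^{(\mu)\prime} X_\mu \\ X_\mu' \hat{Y}^{(\mu)} & X_\mu' X_\mu \end{pmatrix} \right]^{-1}, \qquad (2.24)$$

which is consistently estimated by the matrix

$$s_{\mu\mu} \begin{pmatrix} \hat{Y}^{(\mu)\prime} \hat{Y}^{(\mu)} & \hat{Y}^{(\mu)\prime} X_\mu \\ X_\mu' \hat{Y}^{(\mu)} & X_\mu' X_\mu \end{pmatrix}^{-1}, \qquad (2.25)$$

where $s_{\mu\mu}$, the estimate of $\sigma_{\mu\mu}$ obtained from the two-stage least squares residuals, is given by (2.26).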
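A minimal numerical sketch of the two-stage procedure in (2.23) follows. It uses NumPy and a hypothetical two-equation system whose coefficients (0.5, 1.0, -0.3), disturbance covariance, sample size and seed are all assumptions chosen for illustration; only the first equation is estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000  # hypothetical sample size

# Three exogenous variables and two correlated structural disturbances.
X = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
x1, x2, x3 = X.T
u1, u2 = U.T

# Assumed structure:  y1 =  0.5*y2 + 1.0*x1 + u1
#                     y2 = -0.3*y1 + x2 + x3 + u2
# Solving the two equations jointly for y1 and y2:
y1 = (x1 + 0.5 * (x2 + x3) + u1 + 0.5 * u2) / 1.15
y2 = -0.3 * y1 + x2 + x3 + u2

# First stage: regress the included endogenous variable on all exogenous ones.
Y_mu = y2[:, None]
X_mu = x1[:, None]
Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y_mu)

# Second stage: least squares of y1 on (Y_hat, X_mu), which is (2.23).
Z = np.hstack([Y_hat, X_mu])
delta_2sls = np.linalg.solve(Z.T @ Z, Z.T @ y1)

# Ordinary least squares on the structural equation, for comparison only.
H = np.hstack([Y_mu, X_mu])
delta_ols = np.linalg.solve(H.T @ H, H.T @ y1)
```

With a sample of this size `delta_2sls` should lie close to the assumed coefficients (0.5, 1.0), while `delta_ols` is affected by the correlation between `y2` and `u1`.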

An alternative derivation is the following: Writing $H_\mu = (Y^{(\mu)}, X_\mu)$ and $\delta_\mu = (\alpha_\mu', \beta_\mu')'$, and premultiplying the $\mu$th structural equation by $X'$, we obtain

$$X' y_\mu = X' H_\mu \delta_\mu + X' u_\mu.$$

The covariance matrix of the transformed error vector is $\sigma_{\mu\mu}(X'X)$, which is not diagonal. Ignoring the correlation between $X' Y^{(\mu)}$ and $X' u_\mu$ (which is weak) and applying Aitken's method, we obtain

$$\hat{\delta}_\mu = [H_\mu' X (X'X)^{-1} X' H_\mu]^{-1} H_\mu' X (X'X)^{-1} X' y_\mu. \qquad (2.27)$$

If the equation is exactly identified, $X' H_\mu$ is a square matrix. If it is also nonsingular, (2.27) simplifies to

$$\hat{\delta}_\mu = (X' H_\mu)^{-1} X' y_\mu. \qquad (2.28)$$

2.4.3 Three-Stage Least Squares (3-SLS)

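The three-stage least squares method was developed by Zellner and Theil [20]. Zellner [19] has shown that if the error terms in a set of regression relations were contemporaneously correlated and if the regressors in different relations were different, Aitken's method applied to the entire set of relations would yield better estimates than the ordinary least squares method applied to each equation separately. The three-stage least squares method utilizes the same principle.

Writing $H_\mu = (Y^{(\mu)}, X_\mu)$,

$$H = \begin{pmatrix} H_1 & 0 & \cdots & 0 \\ 0 & H_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & H_M \end{pmatrix}, \quad \bar{X} = \begin{pmatrix} X & 0 & \cdots & 0 \\ 0 & X & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & X \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \end{pmatrix} \quad \text{and} \quad \delta = \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_M \end{pmatrix},$$

and premultiplying each structural equation by $X'$, we obtain

$$\bar{X}' y = \bar{X}' H \delta + \bar{X}' u.$$

Since $E(\bar{X}' u u' \bar{X}) = \bar{X}' (\Sigma_{uu} \otimes I) \bar{X}$, Aitken's generalized least squares method is appropriate. Consequently, we have

$$\hat{\delta} = [H' \bar{X} (\bar{X}' (\Sigma_{uu} \otimes I) \bar{X})^{-1} \bar{X}' H]^{-1} [H' \bar{X} (\bar{X}' (\Sigma_{uu} \otimes I) \bar{X})^{-1} \bar{X}' y],$$

which reduces to

$$\hat{\delta} = \begin{pmatrix} \sigma^{11} H_1' X (X'X)^{-1} X' H_1 & \sigma^{12} H_1' X (X'X)^{-1} X' H_2 & \cdots & \sigma^{1M} H_1' X (X'X)^{-1} X' H_M \\ \sigma^{21} H_2' X (X'X)^{-1} X' H_1 & \sigma^{22} H_2' X (X'X)^{-1} X' H_2 & \cdots & \sigma^{2M} H_2' X (X'X)^{-1} X' H_M \\ \vdots & \vdots & & \vdots \\ \sigma^{M1} H_M' X (X'X)^{-1} X' H_1 & \sigma^{M2} H_M' X (X'X)^{-1} X' H_2 & \cdots & \sigma^{MM} H_M' X (X'X)^{-1} X' H_M \end{pmatrix}^{-1} \begin{pmatrix} \sum_{j=1}^{M} \sigma^{1j} H_1' X (X'X)^{-1} X' y_j \\ \sum_{j=1}^{M} \sigma^{2j} H_2' X (X'X)^{-1} X' y_j \\ \vdots \\ \sum_{j=1}^{M} \sigma^{Mj} H_M' X (X'X)^{-1} X' y_j \end{pmatrix}, \qquad (2.29)$$

where $\sigma^{jk}$ denotes the $(j, k)$ element of $\Sigma_{uu}^{-1}$. Replacing $\sigma^{jk}$ by their estimates $s^{jk}$ obtained in the 2-SLS method, we obtain the three-stage least squares estimators

$$\hat{\delta} = \begin{pmatrix} s^{11} H_1' X (X'X)^{-1} X' H_1 & s^{12} H_1' X (X'X)^{-1} X' H_2 & \cdots & s^{1M} H_1' X (X'X)^{-1} X' H_M \\ s^{21} H_2' X (X'X)^{-1} X' H_1 & s^{22} H_2' X (X'X)^{-1} X' H_2 & \cdots & s^{2M} H_2' X (X'X)^{-1} X' H_M \\ \vdots & \vdots & & \vdots \\ s^{M1} H_M' X (X'X)^{-1} X' H_1 & s^{M2} H_M' X (X'X)^{-1} X' H_2 & \cdots & s^{MM} H_M' X (X'X)^{-1} X' H_M \end{pmatrix}^{-1} \begin{pmatrix} \sum_{j=1}^{M} s^{1j} H_1' X (X'X)^{-1} X' y_j \\ \sum_{j=1}^{M} s^{2j} H_2' X (X'X)^{-1} X' y_j \\ \vdots \\ \sum_{j=1}^{M} s^{Mj} H_M' X (X'X)^{-1} X' y_j \end{pmatrix}.$$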

Zellner and Theil have shown that these estimators are consistent and that their asymptotic covariance matrix is

$$\lim_{T \to \infty} E[T (\hat{\delta} - \delta)(\hat{\delta} - \delta)'] = \operatorname*{plim}_{T \to \infty} T \begin{pmatrix} \sigma^{11} H_1' X (X'X)^{-1} X' H_1 & \cdots & \sigma^{1M} H_1' X (X'X)^{-1} X' H_M \\ \sigma^{21} H_2' X (X'X)^{-1} X' H_1 & \cdots & \sigma^{2M} H_2' X (X'X)^{-1} X' H_M \\ \vdots & & \vdots \\ \sigma^{M1} H_M' X (X'X)^{-1} X' H_1 & \cdots & \sigma^{MM} H_M' X (X'X)^{-1} X' H_M \end{pmatrix}^{-1}, \qquad (2.31)$$

which is consistently estimated by the matrix on the right-hand side of (2.31). It has been, further, shown that these estimators are asymptotically more efficient than the two-stage least squares estimators. It is, of course, easy to see that if all the equations are exactly identified, the 2-SLS and 3-SLS give identical estimators.

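2.4.4 Limited Information Single Equation (LISE) or Least Variance Ratio Method

The two methods are essentially the same, though these were developed under different assumptions - the former by Anderson and Rubin [1, 2] under the assumption of normality of structural disturbances, and the latter by Koopmans and Hood [10] without normality assumption. Consider the first structural equation

$$y_1 = Y^{(1)} \alpha_1 + X_1 \beta_1 + u_1,$$

so that

$$Y_1^* \alpha_1^\circ + X_1 \beta_1 + u_1 = 0,$$

where $Y_1^* = (y_1, Y^{(1)})$ and $\alpha_1^\circ = \begin{pmatrix} -1 \\ \alpha_1 \end{pmatrix}$.

Partition $X$ - the matrix of observations on all exogenous variables in the complete system of equations - as $X = (X_1, X_2)$. [...] the ratio of the residual variance when $Y_1^* \alpha_1^\circ$ is regressed on $X_1$ to the residual variance when $Y_1^* \alpha_1^\circ$ is regressed on $X$ is as small as possible - that is, the addition of excluded exogenous variables $X_2$ should make a minimum improvement to the explained sums of squares. Let $t$ denote the ratio of the two unexplained sums of squares. We have

$$t = \frac{\alpha_1^{\circ\prime} Y_1^{*\prime} [I - X_1 (X_1' X_1)^{-1} X_1'] Y_1^* \alpha_1^\circ}{\alpha_1^{\circ\prime} Y_1^{*\prime} [I - X (X'X)^{-1} X'] Y_1^* \alpha_1^\circ} = \frac{\alpha_1^{\circ\prime} W_1^* \alpha_1^\circ}{\alpha_1^{\circ\prime} W_1 \alpha_1^\circ} \quad \text{(say)}. \qquad (2.32)$$

Minimizing with respect to $\alpha_1^\circ$, we obtain

$$(W_1^* - t W_1)\, \alpha_1^\circ = 0. \qquad (2.33)$$

This equation has a nontrivial solution if and only if the nullity of the matrix $W_1^* - t W_1$ is at least equal to 1 - that is, if and only if

$$|W_1^* - t W_1| = 0. \qquad (2.34)$$

This is a polynomial of degree $\ell_1$ in $t$. If $\hat{t}$ is the smallest of the roots of (2.34), we can solve

$$(W_1^* - \hat{t}\, W_1)\, \alpha_1^\circ = 0 \qquad (2.35)$$

for $\alpha_1^\circ$, obtaining an estimate $\hat{\alpha}_1^\circ$ of $\alpha_1^\circ$.

The next problem is to find an estimator $\hat{\beta}_1$ of $\beta_1$. Regressing $Y_1^*$ [...]

Thus $\beta_1$ is estimated by

$$\hat{\beta}_1 = -(X_1' X_1)^{-1} X_1' Y_1^* \hat{\alpha}_1^\circ, \qquad (2.36)$$

where

$$\hat{\alpha}_1^\circ = \begin{pmatrix} -1 \\ \hat{\alpha}_1 \end{pmatrix}.$$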

2.4.5 K-Class Estimators

Theil [24] defines the K-class estimators by the relations

$$\begin{pmatrix} Y^{(\mu)\prime} Y^{(\mu)} - k \hat{V}_\mu' \hat{V}_\mu & Y^{(\mu)\prime} X_\mu \\ X_\mu' Y^{(\mu)} & X_\mu' X_\mu \end{pmatrix} \begin{pmatrix} \hat{\alpha}_\mu \\ \hat{\beta}_\mu \end{pmatrix}_k = \begin{pmatrix} Y^{(\mu)\prime} - k \hat{V}_\mu' \\ X_\mu' \end{pmatrix} y_\mu \qquad (2.37)$$

($\mu = 1, 2, \ldots, M$), where $Y^{(\mu)}$ and $\hat{V}_\mu$ have been defined before, and $k$ is a scalar which may be a constant, or a stochastic or non-stochastic variable.

When $k = 0$, we have the ordinary least squares estimators;

when $k = \hat{t}$, the smallest root of (2.34), we have the LVR estimators; and

when $k = 1$, we obtain the 2-SLS estimators.
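The special cases $k = 0$ and $k = 1$ of (2.37) can be verified numerically. The sketch below is an illustration with NumPy on simulated data; the function name `k_class`, the data-generating values and the seed are assumptions, and the comparison estimators are computed with `numpy.linalg.lstsq`.

```python
import numpy as np


def k_class(y, Y_mu, X_mu, X, k):
    """Solve the K-class normal equations (2.37) for one structural equation."""
    Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y_mu)  # reduced-form fit
    V_hat = Y_mu - Y_hat                              # reduced-form residuals
    top = np.hstack([Y_mu.T @ Y_mu - k * V_hat.T @ V_hat, Y_mu.T @ X_mu])
    bottom = np.hstack([X_mu.T @ Y_mu, X_mu.T @ X_mu])
    lhs = np.vstack([top, bottom])
    rhs = np.concatenate([(Y_mu - k * V_hat).T @ y, X_mu.T @ y])
    return np.linalg.solve(lhs, rhs)


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(2)
T = 200
X = rng.normal(size=(T, 3))
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
y2 = X @ np.array([0.5, 1.0, 1.0]) + e[:, 1]
y1 = 0.5 * y2 + X[:, 0] + e[:, 0]

Y_mu = y2[:, None]    # included endogenous regressor
X_mu = X[:, [0]]      # included exogenous regressor

# Reference estimators computed directly.
H = np.hstack([Y_mu, X_mu])
ols = np.linalg.lstsq(H, y1, rcond=None)[0]
Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y_mu)
Z = np.hstack([Y_hat, X_mu])
tsls = np.linalg.lstsq(Z, y1, rcond=None)[0]

k0 = k_class(y1, Y_mu, X_mu, X, k=0.0)
k1 = k_class(y1, Y_mu, X_mu, X, k=1.0)
```

The agreement for $k = 1$ rests on the orthogonality of the reduced-form residuals to every column of $X$, so that $Y^{(\mu)\prime} Y^{(\mu)} - \hat{V}_\mu' \hat{V}_\mu = \hat{Y}^{(\mu)\prime} \hat{Y}^{(\mu)}$.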

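It has been shown by Nagar [14] that if $\operatorname*{plim}_{T \to \infty} (k - 1) = 0$, the K-class estimators are consistent; further, if $\sqrt{T}\,(k - 1) \xrightarrow{P} 0$, they have asymptotic covariance matrix

$$\operatorname*{plim}_{T \to \infty} T \sigma_{\mu\mu} \begin{pmatrix} Y^{(\mu)\prime} Y^{(\mu)} - \hat{V}_\mu' \hat{V}_\mu & Y^{(\mu)\prime} X_\mu \\ X_\mu' Y^{(\mu)} & X_\mu' X_\mu \end{pmatrix}^{-1}. \qquad (2.38)$$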

2.4.6 Full Information Least Generalized Residual Variance Method

This method was developed by Koopmans, Rubin and Leipnik [11] under

the specification that the structural disturbances are normally distributed. While in the limited information least variance ratio method only the restrictions on a single structural equation are utilized, this method uses the restrictions on the entire system of structural equations. It is a method for simultaneous estimation of all structural parameters.

In order to estimate the reduced-form system $Y = X\Pi + V$, we minimize

$$\left| \frac{1}{T} V'V \right| = \left| \frac{1}{T} (Y - X\Pi)'(Y - X\Pi) \right|.$$

We may as well minimize $\log |V'V|$. But

$$\tfrac{1}{2} \log |V'V| = -\log \bigl| \det A \bigr| + \tfrac{1}{2} \log |U'U|. \qquad (2.39)$$

From the structural equations,

$$(Y, X) \begin{pmatrix} A \\ B \end{pmatrix} + U = 0 \qquad \text{or} \qquad Z\Lambda + U = 0,$$

we have

$$\tfrac{1}{2} \log |V'V| = -\log \bigl| \det A \bigr| + \tfrac{1}{2} \log |\Lambda' Z' Z \Lambda|. \qquad (2.40)$$

This quantity is minimized subject to the constraints on $\Lambda = \begin{pmatrix} A \\ B \end{pmatrix}$ to obtain estimates of structural parameters. The method has not been extensively used, because on equating to zero the partial derivatives of $\log |V'V|$ with respect to the elements of $\Lambda$, we are confronted with a complex

3.0 SOME THEOREMS ON STOCHASTIC CONVERGENCE

3.1 Useful Inequalities

This chapter concerns the convergence of sequences of random

variables. Using the standard notation, we shall give some theorems

on stochastic convergence which will be useful in the sequel.

First, we state a few inequalities:

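(i) Let $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$ be nonnegative real numbers. Then

$$\sum_{i=1}^{n} x_i y_i \leq \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \left( \sum_{i=1}^{n} y_i^q \right)^{1/q} \quad \text{(Hölder)} \qquad (3.1)$$

where $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$.

(ii) For arbitrary real $x_i, y_i$ ($i = 1, 2, \ldots, n$)

$$\left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \quad \text{(Cauchy-Schwarz)} \qquad (3.2)$$

(iii) If $X$ and $Y$ are random variables,

$$E|XY| \leq [E|X|^p]^{1/p}\, [E|Y|^q]^{1/q} \quad \text{(Hölder)} \qquad (3.3)$$

for $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, provided that the expectations exist.

An extension of (3.3) is the following:

$$E|XYZ \cdots| \leq [E|X|^p]^{1/p}\, [E|Y|^q]^{1/q}\, [E|Z|^r]^{1/r} \cdots, \qquad \frac{1}{p} + \frac{1}{q} + \frac{1}{r} + \cdots = 1. \qquad (3.4)$$

(iv) Let $X$ be a random variable and $c$ a positive constant. Then

$$P[|X| \geq c] \leq \frac{E|X|^r}{c^r} \quad \text{for all } r > 0. \quad \text{(Markov)} \qquad (3.5)$$

(v) Let $X$ and $Y$ be two random variables and $\varepsilon > 0$ an arbitrary number. Then

$$P[|X - Y| \geq \varepsilon] \leq \frac{E(X - Y)^2}{\varepsilon^2}. \qquad (3.6)$$

(vi) For any random variable $X$,

$$|E(X)| \leq E|X|, \quad \text{provided } EX \text{ exists}. \qquad (3.7)$$

(For proof of these inequalities see Rao [15] and Loève [12].)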
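The Markov inequality (3.5) can be checked exactly on a small discrete distribution. The sketch below is illustrative only: the support points and probabilities are hypothetical, and exact rational arithmetic from the standard library is used so that no rounding enters the comparison.

```python
from fractions import Fraction

# Hypothetical discrete distribution for X (values and their probabilities).
values = [-3, -1, 0, 1, 3]
probs = [Fraction(1, 10), Fraction(3, 10), Fraction(2, 10),
         Fraction(3, 10), Fraction(1, 10)]


def tail_probability(c):
    """Exact P[|X| >= c]."""
    return sum(p for x, p in zip(values, probs) if abs(x) >= c)


def markov_bound(c, r):
    """Exact E|X|^r / c^r for positive integers c and r."""
    moment = sum(p * abs(x) ** r for x, p in zip(values, probs))
    return moment / Fraction(c) ** r


# Compare the tail probability with the bound for several c and r.
checks = {
    (c, r): (tail_probability(c), markov_bound(c, r))
    for c in (1, 2, 3)
    for r in (1, 2, 4)
}
```

For each pair `(c, r)` the first entry of the stored tuple never exceeds the second, and different values of `r` give bounds of different tightness for the same `c`.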

3.2 Stochastic Convergence, Univariate Case

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables. A particular member, $X_n$, of this sequence has distribution function $F_n(x)$ which depends upon integer $n$ and moments (if existing) which also depend upon $n$. The various modes of convergence of sequences of random variables seek an answer to the question: What happens to the random variable $X_n$, its distribution function $F_n(x)$ and its moments when $n$ gets large?

Definition 1. A sequence $\{X_n\}$ converges in distribution to the random variable $X$, if the distribution function $F_n(x)$ of $X_n$ converges to the distribution function $F(x)$ of $X$ at all points of continuity of $F(x)$. We write $X_n \xrightarrow{\text{dist.}} X$.

Definition 2. A sequence $\{X_n\}$ of random variables converges in probability to a constant $c$ if for every $\varepsilon > 0$

$$\lim_{n \to \infty} P[|X_n - c| > \varepsilon] = 0.$$

We write $X_n \xrightarrow{P} c$, or $\operatorname*{plim}_{n \to \infty} X_n = c$.

Definition 3. A sequence of random variables $\{X_n\}$ is said to converge in probability to a random variable $X$ if for every $\varepsilon > 0$

$$\lim_{n \to \infty} P[|X_n - X| > \varepsilon] = 0.$$

We write $X_n \xrightarrow{P} X$.

Convergence in probability implies convergence in distribution.
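Definition 2 can be illustrated with the sample proportion of $n$ independent Bernoulli trials, for which the probability in the definition can be computed exactly from the binomial distribution. The sketch below is an illustration only; the choices $p = 0.5$, $\varepsilon = 0.1$ and the values of $n$ are assumptions.

```python
from math import comb


def prob_far_from_p(n, p, eps):
    """Exact P[|S_n / n - p| > eps] when S_n is Binomial(n, p)."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )


p, eps = 0.5, 0.1  # hypothetical success probability and tolerance
sample_sizes = [10, 100, 1000]

exact = [prob_far_from_p(n, p, eps) for n in sample_sizes]

# Bound from inequality (3.6) with Y = p: E(X_n - p)^2 / eps^2 = p(1-p) / (n eps^2).
bounds = [p * (1 - p) / (n * eps ** 2) for n in sample_sizes]
```

The exact probabilities shrink as $n$ grows and stay below the corresponding bounds, which is the behaviour Definition 2 requires of a sequence converging in probability to $p$.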

Theorem (3.1). Let $\{X_n\}$ be a sequence of random variables and $X$ a random variable such that $EX^2 < \infty$, $EX_n^2 < \infty$ for all $n$, and

$$\lim_{n \to \infty} E(X_n - X)^2 = 0.$$

Then

(i) $X_n \xrightarrow{P} X$;

(ii) $\lim_{n \to \infty} EX_n = EX$;

(iii) $\lim_{n \to \infty} E(X_n - EX_n)^2 = E(X - EX)^2$.

Proof: (i) See Wilks [18], p. 100.

(ii) $|EX_n - EX| \leq E|X_n - X| \leq \sqrt{E(X_n - X)^2}$, by (3.7) and (3.3). Therefore,

$$0 \leq \lim_{n \to \infty} |EX_n - EX| \leq \lim_{n \to \infty} \sqrt{E(X_n - X)^2} = 0,$$

so that $\lim_{n \to \infty} EX_n = EX$, which clearly exists because $E(X^2) < \infty$.

(iii) Let $EX_n = a_n$, $EX = a$ so that by (ii) $\lim_{n \to \infty} a_n = a$. Now

$$E(X_n - EX_n)^2 = E(X_n - X + X - a + a - a_n)^2$$

$$= E(X_n - X)^2 + E(X - a)^2 + (a - a_n)^2 + 2E(X_n - X)(X - a) + 2(a - a_n) E(X_n - X) + 2(a - a_n) E(X - a)$$

$$= E(X_n - X)^2 + E(X - a)^2 + (a_n - a)^2 + 2E[(X_n - X)(X - a)] + 2(a - a_n) E(X_n - X).$$

Since $\lim_{n \to \infty} E(X_n - X)^2 = 0$, $E(X_n - X)$ is bounded for all $n$. Moreover,

$$\lim_{n \to \infty} |E(X_n - X)(X - a)| \leq \lim_{n \to \infty} \sqrt{E(X_n - X)^2\, E(X - a)^2} = 0.$$

Therefore, $\lim_{n \to \infty} E(X_n - EX_n)^2 = E(X - a)^2 = E(X - EX)^2$.

Corollary (3.1.1). Let $X$ be a degenerate random variable, $X = c$ with probability 1. Then $\lim_{n\to\infty} E(X_n - c)^2 = 0$ implies that

$$\lim_{n\to\infty} EX_n = c = \operatorname{Plim}_{n\to\infty} X_n.$$

Corollary (3.1.2). Let $\{X_n\}$ and $\{Y_n\}$ be two sequences of random variables such that $\lim_{n\to\infty} EY_n^2 < \infty$, and $\lim_{n\to\infty} EY_n$ exists. If $\lim_{n\to\infty} E(X_n - Y_n)^2 = 0$, then

(i) $\lim_{n\to\infty} E(X_n) = \lim_{n\to\infty} EY_n$,

(ii) $\lim_{n\to\infty} E(X_n - EX_n)^2 = \lim_{n\to\infty} E(Y_n - EY_n)^2$.

Proof: Since $\lim_{n\to\infty} E(Y_n^2) < \infty$, it follows that $EY_n^2$ and $E(Y_n - EY_n)^2$ both exist and are bounded for all $n$.

Now

$$|E(X_n - Y_n)| \le E|X_n - Y_n| \le \sqrt{E(X_n - Y_n)^2},$$

which tends to zero as $n \to \infty$, so that $\lim_{n\to\infty} E(X_n - Y_n) = 0$. Therefore,

$$\lim_{n\to\infty} EX_n = \lim_{n\to\infty} [E(X_n - Y_n + Y_n)] = \lim_{n\to\infty} E(X_n - Y_n) + \lim_{n\to\infty} EY_n = \lim_{n\to\infty} EY_n.$$

This proves (i) above. Further,

$$E(X_n - EX_n)^2 = E(X_n - Y_n + Y_n - EY_n + EY_n - EX_n)^2$$

$$= E(X_n - Y_n)^2 + E(Y_n - EY_n)^2 + (EY_n - EX_n)^2 + 2E[(X_n - Y_n)(Y_n - EY_n)] + 2(EY_n - EX_n)E(Y_n - EY_n) + 2(EY_n - EX_n)E(X_n - Y_n)$$

$$= E(X_n - Y_n)^2 + E(Y_n - EY_n)^2 + 2E[(X_n - Y_n)(Y_n - EY_n)] - (EX_n - EY_n)^2.$$

On taking limit and using the given condition, we obtain

$$\lim_{n\to\infty} E(X_n - EX_n)^2 = \lim_{n\to\infty} E(Y_n - EY_n)^2 + 2\lim_{n\to\infty} E(X_n - Y_n)(Y_n - EY_n).$$

But

$$\lim_{n\to\infty} |E[(X_n - Y_n)(Y_n - EY_n)]| \le \lim_{n\to\infty} \sqrt{E(X_n - Y_n)^2\, E(Y_n - EY_n)^2} = 0.$$

Therefore, $\lim_{n\to\infty} E(X_n - EX_n)^2 = \lim_{n\to\infty} E(Y_n - EY_n)^2$.

The corollary is proved.

Theorem (3.2). Let $\tilde\beta_n$ be a statistic and $\beta$ a finite constant such that for a given integer $\ell \ge 3$

$$\lim_{n\to\infty} E|\tilde\beta_n - \beta|^{\ell} = 0.$$

Let $X_n$ be normal $(0, \sigma_n^2)$ where $\sigma_n^2 < \infty$ and $\lim_{n\to\infty} \sigma_n^2 = \sigma^2$.

Then $\tilde\beta_n X_n \xrightarrow{\text{dist.}}$ normal $(0, \beta^2\sigma^2)$, and the mean and variance of $\tilde\beta_n X_n$ converge respectively to the mean and variance of the limiting distribution.

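Proof: Using Hölder's inequality, we have

$$E(\tilde\beta_n - \beta)^2 \le \left[E|\tilde\beta_n - \beta|^{\ell}\right]^{2/\ell},$$

which tends to zero as $n \to \infty$. Hence, by theorem (3.1), $\tilde\beta_n \xrightarrow{P} \beta$.

Let $X$ be normal $(0, \sigma^2)$. Then $X_n \xrightarrow{\text{dist.}} X$, and, by Slutsky's theorem (Cramér [7], p. 254), $\tilde\beta_n X_n \xrightarrow{\text{dist.}} \beta X$, which is normal $(0, \beta^2\sigma^2)$. Further,

$$E(\tilde\beta_n X_n) = E(\tilde\beta_n - \beta)X_n + \beta EX_n = E(\tilde\beta_n - \beta)X_n,$$

and

$$|E(\tilde\beta_n - \beta)X_n| \le \left[E(\tilde\beta_n - \beta)^2\, EX_n^2\right]^{1/2},$$

which tends to zero as $n$ goes to infinity.

Therefore, $\lim_{n\to\infty} E(\tilde\beta_n - \beta)X_n = 0$, which implies that $\lim_{n\to\infty} E\tilde\beta_n X_n = 0$.

Moreover, using Hölder's inequality, we have

$$EX_n^2(\tilde\beta_n - \beta)^2 \le [EX_n^6]^{1/3}\,[E|\tilde\beta_n - \beta|^3]^{2/3} = (15\sigma_n^6)^{1/3}\,[E|\tilde\beta_n - \beta|^3]^{2/3},$$

which tends to zero as $n \to \infty$, and

$$|EX_n^2(\tilde\beta_n - \beta)| \le [EX_n^4]^{1/2}\,[E(\tilde\beta_n - \beta)^2]^{1/2} = (3\sigma_n^4)^{1/2}\,[E(\tilde\beta_n - \beta)^2]^{1/2},$$

which also tends to zero as $n \to \infty$.

Therefore,

$$\lim_{n\to\infty} EX_n^2\tilde\beta_n^2 = \lim_{n\to\infty} EX_n^2(\tilde\beta_n - \beta)^2 + \beta^2 \lim_{n\to\infty} EX_n^2 + 2\beta \lim_{n\to\infty} EX_n^2(\tilde\beta_n - \beta) = \beta^2\sigma^2.$$

The theorem is proved.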
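The moment convergence asserted in theorem (3.2) can be checked numerically. The following is a minimal Monte Carlo sketch and not part of the original argument. It assumes one particular sequence, $\tilde\beta_n = \beta + e/\sqrt{n}$ with $e$ standard normal and drawn independently of $X_n$, and $\sigma_n^2 = \sigma^2(1 + 1/n)$. The function name and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_product_moments(n, beta=2.0, sigma2=1.5, reps=200_000, seed=0):
    """Return the sample mean and variance of beta_tilde_n * X_n.

    Assumed setup (illustrative only, not taken from the thesis):
      beta_tilde_n = beta + e / sqrt(n),  e ~ N(0, 1)
      X_n ~ N(0, sigma2 * (1 + 1/n)), independent of e
    so that E|beta_tilde_n - beta|^l -> 0 for every l, and sigma_n^2 -> sigma2.
    """
    rng = np.random.default_rng(seed)
    beta_tilde = beta + rng.standard_normal(reps) / np.sqrt(n)
    x = rng.standard_normal(reps) * np.sqrt(sigma2 * (1.0 + 1.0 / n))
    prod = beta_tilde * x
    return prod.mean(), prod.var()


if __name__ == "__main__":
    # The limiting distribution is normal(0, beta^2 * sigma2), variance 6.0 here.
    for n in (10, 100, 10_000):
        mean, var = simulate_product_moments(n)
        print(f"n={n:6d}  mean={mean: .4f}  variance={var:.4f}")
```

Under these assumptions the exact variance for finite $n$ is $(\beta^2 + 1/n)\,\sigma^2(1 + 1/n)$, so the simulated variance should approach $\beta^2\sigma^2$ from above as $n$ grows.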

Corollary (3.2.1). Let $X_n$ and $\tilde\beta_n$ be specified as in theorem (3.2). Let $P(x)$ be a polynomial of degree $p$ $(3p \le \ell)$. Then

$$\lim_{n\to\infty} E|P(\tilde\beta_n) - P(\beta)|^r = 0 \quad \text{for } 0 < r \le \ell/p.$$

Further, $y_n = X_n P(\tilde\beta_n) \xrightarrow{\text{dist.}}$ normal $(0, \sigma^2 P^2(\beta))$ and the first two moments of $y_n$ converge to the corresponding moments of the limiting distribution.

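Proof: By Taylor's theorem,

$$P(\tilde\beta_n) = P(\beta) + \sum_{v=1}^{p} \frac{P^{(v)}(\beta)}{v!}(\tilde\beta_n - \beta)^v, \quad \text{where } P^{(v)}(\beta) = \left.\frac{d^v P(x)}{dx^v}\right|_{x=\beta}.$$

Therefore,

$$E|P(\tilde\beta_n) - P(\beta)|^r = E\left|\sum_{v=1}^{p} \frac{P^{(v)}(\beta)}{v!}(\tilde\beta_n - \beta)^v\right|^r,$$

and the given conditions ensure that

$$\lim_{n\to\infty} E|P(\tilde\beta_n) - P(\beta)|^r = 0 \quad \text{for } 0 < r \le \ell/p.$$

The rest of the corollary follows from theorem (3.2), because $P(\tilde\beta_n)$ plays the same role as $\tilde\beta_n$.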

Corollary (3.2.2). Let $X_n$, $\tilde\beta_n$ and $P(x)$ be specified as in corollary (3.2.1). Let $R(x)$ be a rational function of the form $R(x) = \dfrac{P(x)}{1 + Q(x)}$, $Q(x)$ being any polynomial such that $Q(x) > 0$ for all real $x$. Then

$$\lim_{n\to\infty} E|R(\tilde\beta_n) - R(\beta)|^r = 0 \quad \text{for } 0 < r \le \ell/p.$$

Further, $Z_n = X_n R(\tilde\beta_n) \xrightarrow{\text{dist.}}$ normal $[0, \sigma^2 R^2(\beta)]$, and the first two moments of $Z_n$ converge to the corresponding moments of the limiting distribution.

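Proof: Evidently $\dfrac{Q'(x)}{[1 + Q(x)]^2}$ is bounded for all real $x$, so that there exists a positive constant $M < \infty$ such that

$$\left|\frac{Q'(x)}{[1 + Q(x)]^2}\right| < M.$$

By Taylor's theorem,

$$\frac{1}{1 + Q(\tilde\beta_n)} = \frac{1}{1 + Q(\beta)} - \frac{Q'(Y_n)}{[1 + Q(Y_n)]^2}(\tilde\beta_n - \beta),$$

where $Y_n = \beta + \theta(\tilde\beta_n - \beta)$, $(0 \le \theta \le 1)$. Therefore,

$$E|R(\tilde\beta_n) - R(\beta)|^r = E\left|\frac{P(\tilde\beta_n) - P(\beta)}{1 + Q(\tilde\beta_n)} - P(\beta)\frac{Q'(Y_n)}{[1 + Q(Y_n)]^2}(\tilde\beta_n - \beta)\right|^r$$

$$\le c_r\, E\left|\frac{P(\tilde\beta_n) - P(\beta)}{1 + Q(\tilde\beta_n)}\right|^r + c_r\, |P(\beta)|^r\, E\left|\frac{Q'(Y_n)}{[1 + Q(Y_n)]^2}(\tilde\beta_n - \beta)\right|^r,$$

by the $c_r$-inequality (Loève [12], p. 155), where $c_r = 1$ for $r \le 1$ and $c_r = 2^{r-1}$ for $r > 1$.

Using the results of corollary (3.2.1), we have

$$\lim_{n\to\infty} E|R(\tilde\beta_n) - R(\beta)|^r = 0 \quad \text{for } 0 < r \le \ell/p.$$

The remaining part of the corollary now follows immediately from the fact that $R(\tilde\beta_n)$ satisfies the conditions stated in theorem (3.2).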

3.3 Convergence of Sequences of Random Vectors

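Let $\mathbf{y} = (y_1, y_2, \ldots, y_K)'$ be a K-dimensional random vector defined on the basic probability space $(\Omega, a, P)$, the symbols having the usual meanings. We define

Definition 4.

$$E\mathbf{y} = \int_\Omega \mathbf{y}\,dP = \left(\int_\Omega y_1\,dP,\ \int_\Omega y_2\,dP,\ \ldots,\ \int_\Omega y_K\,dP\right)'.$$

The integral $\int_\Omega \mathbf{y}\,dP$ is said to be finite (or existing), if each of the integrals on the right is finite.

In usual notation, we have

$$|\mathbf{y}| = (\mathbf{y}'\mathbf{y})^{1/2} \le \sum_{i=1}^{K} |y_i|, \qquad \left|\int_\Omega \mathbf{y}\,dP\right| \le \int_\Omega |\mathbf{y}|\,dP \le \sum_{i=1}^{K} \int_\Omega |y_i|\,dP. \qquad (3.8)$$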

The following very useful result is probably not new, but a simple


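Theorem (3.3). Let $\mathbf{y} = (y_1, y_2, \ldots, y_K)'$ be a K-dimensional random vector. Then for every $\varepsilon > 0$,

$$P\left[\max_{1\le i\le K} |y_i| > \varepsilon\right] \le P[\,|\mathbf{y}| > \varepsilon\,] \le \frac{E(\mathbf{y}'\mathbf{y})}{\varepsilon^2}.$$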

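Proof: Since $|\mathbf{y}| \le \varepsilon$ is a sphere in K-dimensional Euclidean space,

$$|\mathbf{y}| \le \varepsilon \implies \max_{1\le i\le K} |y_i| \le \varepsilon,$$

or

$$\max_{1\le i\le K} |y_i| > \varepsilon \implies |\mathbf{y}| > \varepsilon.$$

Therefore,

$$P\left[\max_{1\le i\le K} |y_i| > \varepsilon\right] \le P[\,|\mathbf{y}| > \varepsilon\,] \le \frac{E(|\mathbf{y}|^2)}{\varepsilon^2} = \frac{E\mathbf{y}'\mathbf{y}}{\varepsilon^2}.$$

Theorem (3.4).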

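(i) Let $\{\mathbf{y}_n\}$ be a sequence of K-dimensional random vectors and $\mathbf{y}$ a K-dimensional random vector such that

$$E(\mathbf{y}'\mathbf{y}) < \infty, \quad \text{and} \quad \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})'(\mathbf{y}_n - \mathbf{y}) = 0.$$

Then $\mathbf{y}_n \xrightarrow{P} \mathbf{y}$, $\lim_{n\to\infty} E(\mathbf{y}_n) = E\mathbf{y}$, and

$$\lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)' = E(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})'.$$

(ii) Let $\mathbf{c}$ be a $K \times 1$ vector of constants such that

$$\lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{c})'(\mathbf{y}_n - \mathbf{c}) = 0.$$

Then $\lim_{n\to\infty} E\mathbf{y}_n = \operatorname{Plim}_{n\to\infty} \mathbf{y}_n = \mathbf{c}$.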

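Proof: By Hölder's inequality, we have

$$|E\mathbf{y}| \le E|\mathbf{y}| \le \sqrt{E\mathbf{y}'\mathbf{y}},$$

which is finite by assumption. Hence $E\mathbf{y}$ exists. Moreover, it is evident that $E(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})'$ exists and is finite.

Now let $y_{in}$ and $y_i$ denote the corresponding elements of $\mathbf{y}_n$ and $\mathbf{y}$ respectively. By the stated conditions, it follows that $(y_{in} - y_i)$ is always well-defined.

By theorem (3.3), we have

$$P\left[\max_{1\le i\le K} |y_{in} - y_i| > \varepsilon\right] \le \frac{E(\mathbf{y}_n - \mathbf{y})'(\mathbf{y}_n - \mathbf{y})}{\varepsilon^2}, \quad \text{for every } \varepsilon > 0.$$

But

$$\lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})'(\mathbf{y}_n - \mathbf{y}) = \lim_{n\to\infty} \operatorname{tr} E(\mathbf{y}_n - \mathbf{y})(\mathbf{y}_n - \mathbf{y})' = 0,$$

since $K$ is fixed.

Therefore, $\lim_{n\to\infty} P\left[\max_{1\le i\le K} |y_{in} - y_i| > \varepsilon\right] = 0$, which implies that $\mathbf{y}_n \xrightarrow{P} \mathbf{y}$.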

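Further, $\lim_{n\to\infty} E(\mathbf{y}_n) = \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y}) + E(\mathbf{y})$.

Since

$$\lim_{n\to\infty} |E(\mathbf{y}_n - \mathbf{y})| \le \lim_{n\to\infty} E|\mathbf{y}_n - \mathbf{y}| \le \lim_{n\to\infty} \sqrt{E(\mathbf{y}_n - \mathbf{y})'(\mathbf{y}_n - \mathbf{y})} = 0,$$

it follows that $\lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y}) = \mathbf{0}$, and $\lim_{n\to\infty} E\mathbf{y}_n = E\mathbf{y}$.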

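Also,

$$\lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)' = \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})(\mathbf{y}_n - \mathbf{y})' + \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})(\mathbf{y} - E\mathbf{y})' + \lim_{n\to\infty} E(\mathbf{y} - E\mathbf{y})(\mathbf{y}_n - \mathbf{y})' + E(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})'$$

$$= \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})(\mathbf{y} - E\mathbf{y})' + \lim_{n\to\infty} E(\mathbf{y} - E\mathbf{y})(\mathbf{y}_n - \mathbf{y})' + E(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})'.$$

A typical element of $E(\mathbf{y} - E\mathbf{y})(\mathbf{y}_n - \mathbf{y})'$ is $E(y_i - Ey_i)(y_{jn} - y_j)$. Using Hölder's inequality, we obtain

$$\lim_{n\to\infty} |E(y_i - Ey_i)(y_{jn} - y_j)| \le \lim_{n\to\infty} \sqrt{E(y_i - Ey_i)^2\, E(y_{jn} - y_j)^2} = 0.$$

This implies that

$$\lim_{n\to\infty} E(\mathbf{y} - E\mathbf{y})(\mathbf{y}_n - \mathbf{y})' = 0 = \lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{y})(\mathbf{y} - E\mathbf{y})'.$$

Hence, $\lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)' = E(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})'$.

Q.E.D.

Replacing $\mathbf{y}$ by $\mathbf{c}$, we see that $\lim_{n\to\infty} E(\mathbf{y}_n - \mathbf{c})'(\mathbf{y}_n - \mathbf{c}) = 0$ implies

$$\operatorname{Plim}_{n\to\infty} \mathbf{y}_n = \lim_{n\to\infty} E\mathbf{y}_n = \mathbf{c}.$$

This proves (ii).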
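The chain of inequalities in theorem (3.3), and the way theorem (3.4) uses it, can be illustrated on simulated data. The sketch below is not part of the original text. It assumes a hypothetical sequence $\mathbf{y}_n = \mathbf{y} + \mathbf{e}/\sqrt{n}$ with independent standard normal components, so that $E(\mathbf{y}_n - \mathbf{y})'(\mathbf{y}_n - \mathbf{y}) = K/n$. The function name and parameter values are illustrative assumptions.

```python
import numpy as np


def max_deviation_check(n, K=3, eps=0.5, reps=100_000, seed=1):
    """Empirical versions of the three quantities in theorem (3.3),
    applied to the difference d = y_n - y.

    Assumed setup (illustrative only): y ~ N(0, I_K) and
    y_n = y + e / sqrt(n) with e ~ N(0, I_K) independent of y.
    Returns (P[max_i |d_i| > eps], P[|d| > eps], E(d'd) / eps^2).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((reps, K))
    y_n = y + rng.standard_normal((reps, K)) / np.sqrt(n)
    d = y_n - y
    p_max = np.mean(np.max(np.abs(d), axis=1) > eps)
    p_norm = np.mean(np.linalg.norm(d, axis=1) > eps)
    bound = np.mean(np.sum(d ** 2, axis=1)) / eps ** 2
    return p_max, p_norm, bound


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        p_max, p_norm, bound = max_deviation_check(n)
        print(f"n={n:5d}  P[max]={p_max:.4f}  P[norm]={p_norm:.4f}  bound={bound:.4f}")
```

The ordering of the three returned values holds draw by draw in the sample, not only in expectation. As $n$ grows the bound $K/(n\varepsilon^2)$ shrinks, which is the mechanism behind convergence in probability in theorem (3.4).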

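Theorem (3.5). Let $\mathbf{y}_n = (y_{1n}, y_{2n}, \ldots, y_{Kn})'$ and $\mathbf{Z}_n = (Z_{1n}, Z_{2n}, \ldots, Z_{Kn})'$ be two sequences of K-dimensional random vectors such that

$\lim_{n\to\infty} E\mathbf{y}_n$ exists,

$\lim_{n\to\infty} E(\mathbf{y}_n'\mathbf{y}_n) < \infty$,

and

$$\lim_{n\to\infty} E(\mathbf{Z}_n - \mathbf{y}_n)'(\mathbf{Z}_n - \mathbf{y}_n) = 0.$$

Then (i) $\lim_{n\to\infty} E\mathbf{Z}_n = \lim_{n\to\infty} E\mathbf{y}_n$,

(ii) $\lim_{n\to\infty} E(\mathbf{Z}_n - E\mathbf{Z}_n)(\mathbf{Z}_n - E\mathbf{Z}_n)' = \lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)'$.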

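Proof: Since $\lim_{n\to\infty} E(\mathbf{Z}_n - \mathbf{y}_n)'(\mathbf{Z}_n - \mathbf{y}_n) = 0$,

$$\lim_{n\to\infty} |E(\mathbf{Z}_n - \mathbf{y}_n)| \le \lim_{n\to\infty} \sqrt{\operatorname{tr} E(\mathbf{Z}_n - \mathbf{y}_n)(\mathbf{Z}_n - \mathbf{y}_n)'} = 0,$$

which implies that $\lim_{n\to\infty} E(\mathbf{Z}_n - \mathbf{y}_n) = \mathbf{0}$. Therefore,

$$\lim_{n\to\infty} E(\mathbf{Z}_n) = \lim_{n\to\infty} E(\mathbf{Z}_n - \mathbf{y}_n) + \lim_{n\to\infty} E\mathbf{y}_n = \lim_{n\to\infty} E\mathbf{y}_n,$$

which is finite by assumption.

Further, $\lim_{n\to\infty} E(\mathbf{y}_n'\mathbf{y}_n) < \infty$ implies that $\mathbf{y}_n$ has existing covariance matrix for all $n$ and, moreover, $\lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)'$ exists. Denote this matrix by $V = [v_{ij}]$. Then in the manner of corollary (3.1.2), it follows, from the given conditions, that

$$\lim_{n\to\infty} E(Z_{in} - EZ_{in})(Z_{jn} - EZ_{jn}) = \lim_{n\to\infty} E(y_{in} - Ey_{in})(y_{jn} - Ey_{jn}) = v_{ij} \quad (i, j = 1, 2, \ldots, K).$$

This implies that

$$\lim_{n\to\infty} E(\mathbf{Z}_n - E\mathbf{Z}_n)(\mathbf{Z}_n - E\mathbf{Z}_n)' = \lim_{n\to\infty} E(\mathbf{y}_n - E\mathbf{y}_n)(\mathbf{y}_n - E\mathbf{y}_n)'.$$

The theorem is proved.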

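Theorem (3.6). Let $\tilde\Pi = [\tilde\Pi_{rs}]$ be a $K \times K$ matrix of statistics depending upon the integer $n$, and $\Pi = (\Pi_{rs})$ a $K \times K$ matrix with real nonstochastic elements also depending on $n$ such that

$$\lim_{n\to\infty} \Pi = \bar\Pi \ \text{(positive definite)},$$

and

$$\lim_{n\to\infty} E(\tilde\Pi_{rs} - \Pi_{rs})^4 = 0, \quad (r, s = 1, 2, \ldots, K).$$

Further, let $\mathbf{y}_n$ be K-variate normal $[\mathbf{0}, V = V(n)]$, where

$$\lim_{n\to\infty} V = \bar V \ \text{(positive definite)}.$$

Then (i) $\tilde\Pi\mathbf{y}_n$ converges in distribution to normal $(\mathbf{0}, \bar\Pi\bar V\bar\Pi')$,

(ii) $\lim_{n\to\infty} E\,\tilde\Pi\mathbf{y}_n = \mathbf{0}$, and

(iii) $\lim_{n\to\infty} E\,\tilde\Pi\mathbf{y}_n\mathbf{y}_n'\tilde\Pi' = \bar\Pi\bar V\bar\Pi'$.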

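Proof: Let $\Pi_{rs}$, $\bar\Pi_{rs}$, $v_{rs}$ and $\bar v_{rs}$ denote the typical elements of $\Pi$, $\bar\Pi$, $V$ and $\bar V$ respectively.

Using Hölder's inequality, we have, for $t \le 4$,

$$E|\tilde\Pi_{rs} - \Pi_{rs}|^t \le [E(\tilde\Pi_{rs} - \Pi_{rs})^4]^{t/4}.$$

Therefore, $\lim_{n\to\infty} E|\tilde\Pi_{rs} - \Pi_{rs}|^t = 0$ for $t = 1, 2, 3, 4$.

Further, the $c_r$-inequality (see Loève [12], p. 155) gives

$$E|\tilde\Pi_{rs} - \bar\Pi_{rs}|^{\ell} = E|(\tilde\Pi_{rs} - \Pi_{rs}) + (\Pi_{rs} - \bar\Pi_{rs})|^{\ell} \le 2^{\ell-1}\left[E|\tilde\Pi_{rs} - \Pi_{rs}|^{\ell} + |\Pi_{rs} - \bar\Pi_{rs}|^{\ell}\right].$$

Since $\lim_{n\to\infty} \Pi_{rs} = \bar\Pi_{rs}$, we have

$$\lim_{n\to\infty} E|\tilde\Pi_{rs} - \bar\Pi_{rs}|^{\ell} = 0, \quad \ell = 1, 2, 3, 4.$$

This implies that $\tilde\Pi \xrightarrow{P} \bar\Pi$ (elementwise).

Since $\mathbf{y}_n \xrightarrow{\text{dist.}}$ normal $(\mathbf{0}, \bar V)$, it follows by Slutsky's theorem that $\tilde\Pi\mathbf{y}_n$ approaches in distribution to normal $(\mathbf{0}, \bar\Pi\bar V\bar\Pi')$.

Now $E(\tilde\Pi\mathbf{y}_n) = E(\tilde\Pi - \Pi)\mathbf{y}_n + \Pi E\mathbf{y}_n = E(\tilde\Pi - \Pi)\mathbf{y}_n$. A typical element of $(\tilde\Pi - \Pi)\mathbf{y}_n$ is

$$\delta_{rn} = \sum_{i=1}^{K} (\tilde\Pi_{ri} - \Pi_{ri})\, y_{in},$$

and

$$|E\delta_{rn}| \le \sum_{i=1}^{K} E|(\tilde\Pi_{ri} - \Pi_{ri})\, y_{in}| \le \sum_{i=1}^{K} \sqrt{v_{ii}\, E(\tilde\Pi_{ri} - \Pi_{ri})^2}.$$

Hence,

$$\lim_{n\to\infty} |E(\delta_{rn})| \le \sum_{i=1}^{K} \sqrt{\lim_{n\to\infty} v_{ii}\, E(\tilde\Pi_{ri} - \Pi_{ri})^2} = \sum_{i=1}^{K} \sqrt{\bar v_{ii} \lim_{n\to\infty} E(\tilde\Pi_{ri} - \Pi_{ri})^2} = 0,$$

which implies that

$$\lim_{n\to\infty} E(\tilde\Pi\mathbf{y}_n) = \lim_{n\to\infty} E(\tilde\Pi - \Pi)\mathbf{y}_n = \mathbf{0}.$$

Finally,

$$E(\tilde\Pi\mathbf{y}_n\mathbf{y}_n'\tilde\Pi') = E[(\tilde\Pi - \Pi)\mathbf{y}_n\mathbf{y}_n'(\tilde\Pi - \Pi)'] + \Pi E[\mathbf{y}_n\mathbf{y}_n'(\tilde\Pi - \Pi)'] + E[(\tilde\Pi - \Pi)\mathbf{y}_n\mathbf{y}_n']\Pi' + \Pi E(\mathbf{y}_n\mathbf{y}_n')\Pi'.$$

Consider the matrix $(\tilde\Pi - \Pi)\mathbf{y}_n\mathbf{y}_n'(\tilde\Pi - \Pi)'$. A typical element of this matrix is given by

$$\varphi_{rs} = \left\{\sum_{i=1}^{K} (\tilde\Pi_{ri} - \Pi_{ri})\, y_{in}\right\}\left\{\sum_{i=1}^{K} (\tilde\Pi_{si} - \Pi_{si})\, y_{in}\right\}.$$

Using Hölder's inequality, we obtain

$$\lim_{n\to\infty} |E\varphi_{rs}| \le \sum_{i=1}^{K} \lim_{n\to\infty} \left[\{E(\tilde\Pi_{ri} - \Pi_{ri})^4\}^{1/4}\{E(\tilde\Pi_{si} - \Pi_{si})^4\}^{1/4}\{Ey_{in}^4\}^{1/2}\right]$$

$$+ \sum_{i=1}^{K}\sum_{\substack{j=1 \\ j\ne i}}^{K} \lim_{n\to\infty} \left[\{E(\tilde\Pi_{ri} - \Pi_{ri})^4\}^{1/4}\{E(\tilde\Pi_{sj} - \Pi_{sj})^4\}^{1/4}\{Ey_{in}^2 y_{jn}^2\}^{1/2}\right] = 0,$$

since $Ey_{in}^4 = 3v_{ii}^2$ and $Ey_{in}^2 y_{jn}^2 \le 3v_{ii}v_{jj}$ remain bounded. This implies that

$$\lim_{n\to\infty} E[(\tilde\Pi - \Pi)\mathbf{y}_n\mathbf{y}_n'(\tilde\Pi - \Pi)'] = 0.$$

Similarly, $\lim_{n\to\infty} E[(\tilde\Pi - \Pi)\mathbf{y}_n\mathbf{y}_n'] = 0 = \lim_{n\to\infty} E[\mathbf{y}_n\mathbf{y}_n'(\tilde\Pi - \Pi)']$.

Moreover, $\lim_{n\to\infty} (\Pi\, E\mathbf{y}_n\mathbf{y}_n'\, \Pi') = \lim_{n\to\infty} (\Pi V \Pi') = \bar\Pi\bar V\bar\Pi'$.

Therefore, $\lim_{n\to\infty} E(\tilde\Pi\mathbf{y}_n\mathbf{y}_n'\tilde\Pi') = \bar\Pi\bar V\bar\Pi'$.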


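Corollary (3.6.1). Let $\tilde\Pi$, $\Pi$ and $\mathbf{y}_n$ be specified as in theorem (3.6). Let $\mathbf{Z}_n$ be a sequence of K-dimensional random vectors such that

$$\lim_{n\to\infty} E(Z_{in} - y_{in})^4 = 0, \quad (i = 1, 2, \ldots, K).$$

Then (i) $\tilde\Pi\mathbf{Z}_n \xrightarrow{\text{dist.}}$ normal $(\mathbf{0}, \bar\Pi\bar V\bar\Pi')$,

(ii) $\lim_{n\to\infty} E(\tilde\Pi\mathbf{Z}_n) = \mathbf{0}$,

(iii) $\lim_{n\to\infty} E(\tilde\Pi\mathbf{Z}_n\mathbf{Z}_n'\tilde\Pi') = \bar\Pi\bar V\bar\Pi'$.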

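Proof: Using Hölder's inequality, we have

$$\lim_{n\to\infty} E(Z_{in} - y_{in})^2 \le \lim_{n\to\infty} [E(Z_{in} - y_{in})^4]^{1/2} = 0, \quad (i = 1, 2, \ldots, K),$$

so that

$$\lim_{n\to\infty} E(\mathbf{Z}_n - \mathbf{y}_n)'(\mathbf{Z}_n - \mathbf{y}_n) = \sum_{i=1}^{K} \lim_{n\to\infty} E(Z_{in} - y_{in})^2 = 0.$$

Therefore, by theorem (3.3), $\operatorname{Plim}_{n\to\infty} (\mathbf{Z}_n - \mathbf{y}_n) = \mathbf{0}$.

Let $\mathbf{Z}$ be normal $(\mathbf{0}, \bar V)$. Then $\mathbf{y}_n \xrightarrow{P} \mathbf{Z}$, and

$$\operatorname{Plim}_{n\to\infty} (\tilde\Pi\mathbf{Z}_n - \bar\Pi\mathbf{Z}) = \operatorname{Plim}_{n\to\infty} \tilde\Pi(\mathbf{Z}_n - \mathbf{y}_n) + \operatorname{Plim}_{n\to\infty} (\tilde\Pi - \bar\Pi)\mathbf{y}_n + \operatorname{Plim}_{n\to\infty} \bar\Pi(\mathbf{y}_n - \mathbf{Z}) = \mathbf{0}.$$

Thus $\tilde\Pi\mathbf{Z}_n \xrightarrow{\text{dist.}} \bar\Pi\mathbf{Z}$, which is normal $(\mathbf{0}, \bar\Pi\bar V\bar\Pi')$.

Since $y_{in}$ $(i = 1, 2, \ldots, K)$ have existing moments of all orders which converge to the corresponding moments of $y_i$ $(i = 1, 2, \ldots, K)$, the given condition ensures that $Z_{in}$ has existing moments of the 4th order, and moreover,

$$\lim_{n\to\infty} E(Z_{in}^{\ell}) = \lim_{n\to\infty} E(y_{in}^{\ell}).$$

The remaining part of the corollary, therefore, follows from the theorem if we replace $y_{in}$ by $Z_{in}$ and use these results.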

Note: In many practical problems, we have to deal with random variables whose distribution depends upon two or more integers. The results established in sections (3.2) and (3.3) can be reworded to take care of such a situation. Suppose that in theorem (3.2) we have the following hypothesis: $\tilde\beta$ is a statistic depending upon integers $n$ and $T$, and $\beta$ a finite constant such that $E|\tilde\beta - \beta|^{\ell}$ $(\ell \ge 3)$ goes to zero as $n$ and $T$ tend to infinity independently. If $X$ is normal $[0, \sigma^2_{(n,T)}]$, such that $\lim_{n\to\infty}\lim_{T\to\infty} \sigma^2_{(n,T)} = \sigma^2$, it is straightforward to show that $\tilde\beta X$ converges in distribution to normal $(0, \beta^2\sigma^2)$, and that the mean and the variance of $\tilde\beta X$ converge respectively to the mean and variance of the limiting distribution.

4.0 ERROR COMPONENT MODEL: REDUCED-FORM ESTIMATION

4.1 Description of the System

This and the following three chapters are intended to provide

methods for the estimation of structural and reduced-form parameters in

a system of simultaneous linear equations with error structure specified

in (1.3). Following the common practice, we first consider reduced-form

estimation.

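To begin with, let us restate in greater detail the assumptions underlying the error component model. The structural equations are

$$\mathbf{y}_{it}' A + \mathbf{x}_{it}' B + \boldsymbol{\varepsilon}_{it}' = \mathbf{0} \qquad (4.1)$$

where $\mathbf{y}_{it}' = (y_{1it}, y_{2it}, \ldots, y_{Mit})$, $\mathbf{x}_{it}' = (x_{1it}, x_{2it}, \ldots, x_{Kit})$, $A$ is an $M \times M$ nonsingular matrix of constants with diagonal elements equal to $-1$, $B$ is a $K \times M$ matrix of constants, and $\boldsymbol{\varepsilon}_{it}' = (\varepsilon_{1it}, \varepsilon_{2it}, \ldots, \varepsilon_{Mit})$ is a vector of unobservable errors with

$$\varepsilon_{\mu it} = u_{\mu i} + v_{\mu t} + w_{\mu it}.$$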

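We make the following assumptions:

(i) For each $i$, $t$, $\mathbf{u}_i = (u_{1i}, u_{2i}, \ldots, u_{Mi})'$, $\mathbf{v}_t = (v_{1t}, v_{2t}, \ldots, v_{Mt})'$ and $\mathbf{w}_{it} = (w_{1it}, w_{2it}, \ldots, w_{Mit})'$ are mutually independent M-dimensional normally distributed random vectors with zero means and covariance matrices

$$\Omega_u = E(\mathbf{u}_i\mathbf{u}_i') = \begin{pmatrix} \sigma^u_{11} & \sigma^u_{12} & \cdots & \sigma^u_{1M} \\ \sigma^u_{21} & \sigma^u_{22} & \cdots & \sigma^u_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^u_{M1} & \sigma^u_{M2} & \cdots & \sigma^u_{MM} \end{pmatrix},$$

$$\Omega_v = E(\mathbf{v}_t\mathbf{v}_t') = \begin{pmatrix} \sigma^v_{11} & \sigma^v_{12} & \cdots & \sigma^v_{1M} \\ \sigma^v_{21} & \sigma^v_{22} & \cdots & \sigma^v_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^v_{M1} & \sigma^v_{M2} & \cdots & \sigma^v_{MM} \end{pmatrix}, \quad \text{and}$$

$$\Omega_w = E(\mathbf{w}_{it}\mathbf{w}_{it}') = \begin{pmatrix} \sigma^w_{11} & \sigma^w_{12} & \cdots & \sigma^w_{1M} \\ \sigma^w_{21} & \sigma^w_{22} & \cdots & \sigma^w_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^w_{M1} & \sigma^w_{M2} & \cdots & \sigma^w_{MM} \end{pmatrix},$$

the elements of these matrices being all finite.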

(ii) For each $\mu$ $(= 1, 2, \ldots, M)$, $u_{\mu 1}, u_{\mu 2}, \ldots$ represent independent drawings from the marginal distribution of $u_\mu$ which is normal $(0, \sigma^u_{\mu\mu})$; $v_{\mu 1}, v_{\mu 2}, \ldots$ represent independent drawings from the marginal distribution of $v_\mu$ which is normal $(0, \sigma^v_{\mu\mu})$; and $w_{\mu 11}, w_{\mu 12}, \ldots$ represent independent drawings from the marginal distribution of $w_\mu$ which is normal $(0, \sigma^w_{\mu\mu})$.

(iii) The exogenous variables $x_{jit}$ $(j = 1, 2, \ldots, K)$ are independent of the error terms $\varepsilon_{\mu it}$ $(\mu = 1, 2, \ldots, M)$.

Suppose that we have a sample of $nT$ observations for $n$ cross-sectional units and $T$ time intervals. To avoid complications, we shall assume that the $nT \times K$ matrix $X$ is nonstochastic and subject to the condition that

(a) $\lim_{n\to\infty,\ T\to\infty} \dfrac{X'X}{nT}$ exists and is positive definite, and further that

(b) $\sum_{i=1}^{n}\sum_{t=1}^{T} (x_{jit} - x_{ji\cdot} - x_{j\cdot t} + x_{j\cdot\cdot})^2 > 0 \quad (j = 1, 2, \ldots, K). \qquad (4.2)$

Nonstochastic $X$ implies the assumption (iii) above, while (4.2) ensures that $X$ does not include a column vector of ones.

In terms of $nT$ observations, we can write the $\mu$th structural equation in the compact form

$$\mathbf{y}_\mu = Y_\mu\boldsymbol{\alpha}_\mu + X_\mu\boldsymbol{\beta}_\mu + \boldsymbol{\varepsilon}_\mu, \qquad (4.3)$$

where $\mathbf{y}_\mu$ is an $nT \times 1$ vector of observations on the $\mu$th endogenous variable, $Y_\mu$ is an $nT \times (\ell_\mu - 1)$ matrix of observations on $(\ell_\mu - 1)$ endogenous variables (other than $y_\mu$) included in the $\mu$th equation, $X_\mu$ is an $nT \times K_\mu$ matrix of observations on $K_\mu$ exogenous variables included in the $\mu$th equation, $\boldsymbol{\varepsilon}_\mu$ is an $nT \times 1$ vector of errors, $\boldsymbol{\alpha}_\mu$ is an $(\ell_\mu - 1) \times 1$ vector of unknown parameters, and $\boldsymbol{\beta}_\mu$ is a $K_\mu \times 1$ vector of unknown parameters.

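If we write

$$\underset{nT\times n}{A} = \begin{pmatrix} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \end{pmatrix}, \quad \underset{nT\times T}{B} = \begin{pmatrix} I \\ I \\ \vdots \\ I \end{pmatrix}, \quad \underset{n\times 1}{\mathbf{u}_\mu} = \begin{pmatrix} u_{\mu 1} \\ u_{\mu 2} \\ \vdots \\ u_{\mu n} \end{pmatrix}, \quad \underset{T\times 1}{\mathbf{v}_\mu} = \begin{pmatrix} v_{\mu 1} \\ v_{\mu 2} \\ \vdots \\ v_{\mu T} \end{pmatrix} \quad \text{and} \quad \underset{nT\times 1}{\mathbf{w}_\mu} = \begin{pmatrix} w_{\mu 11} \\ w_{\mu 12} \\ \vdots \\ w_{\mu nT} \end{pmatrix} \qquad (4.4)$$

where $\mathbf{1}$ is a $T \times 1$ vector of ones and $I$ is a $T \times T$ identity matrix, we have

$$\boldsymbol{\varepsilon}_\mu = A\mathbf{u}_\mu + B\mathbf{v}_\mu + \mathbf{w}_\mu.$$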

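Using the assumptions listed above, we obtain

$$\Sigma_{\mu\mu} = E(\boldsymbol{\varepsilon}_\mu\boldsymbol{\varepsilon}_\mu') = AA'\sigma^u_{\mu\mu} + BB'\sigma^v_{\mu\mu} + I_{nT\times nT}\,\sigma^w_{\mu\mu}$$

$$= \begin{pmatrix} J & 0 & \cdots & 0 \\ 0 & J & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & J \end{pmatrix}\sigma^u_{\mu\mu} + \begin{pmatrix} I & I & \cdots & I \\ I & I & \cdots & I \\ \vdots & \vdots & & \vdots \\ I & I & \cdots & I \end{pmatrix}\sigma^v_{\mu\mu} + \begin{pmatrix} I & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & I \end{pmatrix}\sigma^w_{\mu\mu}$$

$$= \begin{pmatrix} \sigma^u_{\mu\mu}J + (\sigma^v_{\mu\mu} + \sigma^w_{\mu\mu})I & \sigma^v_{\mu\mu}I & \cdots & \sigma^v_{\mu\mu}I \\ \sigma^v_{\mu\mu}I & \sigma^u_{\mu\mu}J + (\sigma^v_{\mu\mu} + \sigma^w_{\mu\mu})I & \cdots & \sigma^v_{\mu\mu}I \\ \vdots & \vdots & & \vdots \\ \sigma^v_{\mu\mu}I & \sigma^v_{\mu\mu}I & \cdots & \sigma^u_{\mu\mu}J + (\sigma^v_{\mu\mu} + \sigma^w_{\mu\mu})I \end{pmatrix} \qquad (4.5)$$

Here $J$ is a $T \times T$ matrix with 1 everywhere. There are $n$ rows and $n$ columns of $T \times T$ block matrices.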
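The block pattern in (4.5) can be reproduced with Kronecker products. The sketch below is illustrative and not from the original text. It assumes the observations are stacked unit by unit (all $T$ periods of unit 1, then unit 2, and so on), which is the ordering implied by the matrices in (4.4). The function name and the numerical values are hypothetical.

```python
import numpy as np


def error_component_cov(n, T, s_u, s_v, s_w):
    """Covariance matrix of eps = A u + B v + w for a single equation.

    Assumes observations are ordered unit by unit (index i slow, t fast).
    s_u, s_v, s_w stand for the variances of the unit, time and
    remainder components of that equation.
    """
    A = np.kron(np.eye(n), np.ones((T, 1)))   # nT x n: copies u_i to every period of unit i
    B = np.kron(np.ones((n, 1)), np.eye(T))   # nT x T: copies v_t to every unit
    return s_u * (A @ A.T) + s_v * (B @ B.T) + s_w * np.eye(n * T)


if __name__ == "__main__":
    n, T = 3, 2
    sigma = error_component_cov(n, T, s_u=1.0, s_v=0.5, s_w=2.0)
    print(sigma[:T, :T])        # diagonal block: s_u*J + (s_v + s_w)*I
    print(sigma[:T, T:2 * T])   # off-diagonal block: s_v*I
```

Here `A @ A.T` equals $I_n \otimes J_T$ and `B @ B.T` equals $J_n \otimes I_T$, which is why the diagonal blocks carry $J$ and every off-diagonal block is a multiple of $I$.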

Evidently, $\Sigma_{\mu\mu'} = E(\boldsymbol{\varepsilon}_\mu\boldsymbol{\varepsilon}_{\mu'}')$, where $\mu \ne \mu'$, can be obtained from (4.5) by replacing one of the $\mu$'s by $\mu'$.

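By trial and error and generalization we obtain

$$\Sigma_{\mu\mu}^{-1} = \begin{pmatrix} a_1 I + a_2 J & a_3 I + a_4 J & \cdots & a_3 I + a_4 J \\ a_3 I + a_4 J & a_1 I + a_2 J & \cdots & a_3 I + a_4 J \\ \vdots & \vdots & & \vdots \\ a_3 I + a_4 J & a_3 I + a_4 J & \cdots & a_1 I + a_2 J \end{pmatrix} \qquad (4.6)$$

where

$$a_1 = \frac{1}{\sigma^w_{\mu\mu}} - \frac{\sigma^v_{\mu\mu}}{\sigma^w_{\mu\mu}(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu})},$$

$$a_2 = -\frac{\sigma^u_{\mu\mu}}{\sigma^w_{\mu\mu}(\sigma^w_{\mu\mu} + T\sigma^u_{\mu\mu})}\left[1 - \frac{\sigma^v_{\mu\mu}(2\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu} + T\sigma^u_{\mu\mu})}{(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu})(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu} + T\sigma^u_{\mu\mu})}\right],$$

$$a_3 = -\frac{\sigma^v_{\mu\mu}}{\sigma^w_{\mu\mu}(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu})},$$

$$a_4 = \frac{\sigma^u_{\mu\mu}\sigma^v_{\mu\mu}(2\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu} + T\sigma^u_{\mu\mu})}{\sigma^w_{\mu\mu}(\sigma^w_{\mu\mu} + T\sigma^u_{\mu\mu})(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu})(\sigma^w_{\mu\mu} + n\sigma^v_{\mu\mu} + T\sigma^u_{\mu\mu})}.$$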
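A closed-form inverse of this kind is easy to get wrong, so a numerical check is worthwhile. The sketch below is not part of the original text. The coefficient formulas coded in `inverse_coefficients` are an assumption: they are what one obtains for $a_1, \ldots, a_4$ by inverting $\Sigma_{\mu\mu}$ through its four characteristic roots, written in the block form of (4.6). All function names and test values are hypothetical.

```python
import numpy as np


def sigma_mu_mu(n, T, s_u, s_v, s_w):
    """Sigma = s_u (I_n x J_T) + s_v (J_n x I_T) + s_w I, the structure in (4.5)."""
    return (s_u * np.kron(np.eye(n), np.ones((T, T)))
            + s_v * np.kron(np.ones((n, n)), np.eye(T))
            + s_w * np.eye(n * T))


def inverse_coefficients(n, T, s_u, s_v, s_w):
    """Coefficients a1..a4 of the block inverse (assumed closed forms)."""
    lam_u = s_w + T * s_u                 # root tied to the unit effect
    lam_v = s_w + n * s_v                 # root tied to the time effect
    lam_uv = s_w + n * s_v + T * s_u      # root tied to the overall mean
    a3 = -s_v / (s_w * lam_v)
    a1 = 1.0 / s_w + a3
    a4 = s_u * s_v * (2 * s_w + n * s_v + T * s_u) / (s_w * lam_u * lam_v * lam_uv)
    a2 = -s_u / (s_w * lam_u) + a4
    return a1, a2, a3, a4


def inverse_from_coefficients(n, T, a1, a2, a3, a4):
    """Assemble the nT x nT matrix with diagonal blocks a1 I + a2 J
    and off-diagonal blocks a3 I + a4 J."""
    I, J = np.eye(T), np.ones((T, T))
    diag_block = a1 * I + a2 * J
    off_block = a3 * I + a4 * J
    return (np.kron(np.eye(n), diag_block - off_block)
            + np.kron(np.ones((n, n)), off_block))


if __name__ == "__main__":
    n, T = 4, 3
    params = (1.3, 0.7, 2.1)   # arbitrary positive variances
    sigma = sigma_mu_mu(n, T, *params)
    sigma_inv = inverse_from_coefficients(n, T, *inverse_coefficients(n, T, *params))
    print(np.allclose(sigma @ sigma_inv, np.eye(n * T)))
```

The practical point of (4.6) is that only four scalars are needed, so generalized least squares never requires a numerical inversion of the full $nT \times nT$ matrix.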


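The reduced form of the system written in (4.1) is

$$\mathbf{y}_{it}' = \mathbf{x}_{it}'\,\Pi + \boldsymbol{\varepsilon}_{it}^{*\prime}$$

where $\boldsymbol{\varepsilon}_{it}^{*\prime} = -\boldsymbol{\varepsilon}_{it}' A^{-1}$ and $\Pi = -B A^{-1}$. (4.8)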

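Let $\mathbf{a}^{\mu}$ $(\mu = 1, 2, \ldots, M)$ denote the column vectors of $A^{-1}$, with elements $a^{\mu r}$ $(r = 1, 2, \ldots, M)$. Then

$$\varepsilon^*_{\mu it} = -(\varepsilon_{1it}, \varepsilon_{2it}, \ldots, \varepsilon_{Mit})\,\mathbf{a}^{\mu} = -\sum_{r=1}^{M} a^{\mu r}\varepsilon_{rit} = -\sum_{r=1}^{M} a^{\mu r}u_{ri} - \sum_{r=1}^{M} a^{\mu r}v_{rt} - \sum_{r=1}^{M} a^{\mu r}w_{rit}$$

$$= u^*_{\mu i} + v^*_{\mu t} + w^*_{\mu it},$$

where $u^*_{\mu i} = -\sum_{r=1}^{M} a^{\mu r}u_{ri}$, etc. (4.9)

Recalling the assumptions about $\mathbf{u}_i$, $\mathbf{v}_t$ and $\mathbf{w}_{it}$, we see that

$$E(u^*_{\mu i}\,u^*_{\mu' i'}) = \begin{cases} \sum_{r=1}^{M}\sum_{s=1}^{M} a^{\mu r}a^{\mu' s}\sigma^u_{rs} = \sigma^{u*}_{\mu\mu'}, & \text{if } i' = i, \\ 0, & \text{if } i \ne i', \end{cases} \quad \text{for all } \mu, \mu' = 1, 2, \ldots, M;$$

$$E(v^*_{\mu t}\,v^*_{\mu' t'}) = \begin{cases} \sum_{r=1}^{M}\sum_{s=1}^{M} a^{\mu r}a^{\mu' s}\sigma^v_{rs} = \sigma^{v*}_{\mu\mu'}, & \text{if } t = t', \\ 0, & \text{if } t \ne t', \end{cases} \quad \text{for } \mu, \mu' = 1, 2, \ldots, M;$$

and

$$E(w^*_{\mu it}\,w^*_{\mu' i't'}) = \begin{cases} \sum_{r=1}^{M}\sum_{s=1}^{M} a^{\mu r}a^{\mu' s}\sigma^w_{rs} = \sigma^{w*}_{\mu\mu'}, & \text{if } i = i' \text{ and } t = t', \\ 0, & \text{if } i \ne i' \text{ and/or } t \ne t'. \end{cases} \qquad (4.10)$$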
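The double sums defining the reduced-form variances have a compact matrix form. The sketch below is illustrative and not part of the original text. It assumes that $a^{\mu r}$ denotes the $r$th element of the $\mu$th column of $A^{-1}$, in which case the matrix of the $\sigma^{u*}_{\mu\mu'}$ equals $(A^{-1})'\,\Omega_u\,A^{-1}$, and likewise for the $v$ and $w$ components. The function names and the example matrices are hypothetical.

```python
import numpy as np


def reduced_form_cov(A, omega):
    """Covariance matrix of a reduced-form error component.

    Assumes the component vector transforms as c*' = -c' A^{-1},
    so that Cov(c*) = (A^{-1})' Omega A^{-1}; the sign cancels.
    """
    A_inv = np.linalg.inv(A)
    return A_inv.T @ omega @ A_inv


def reduced_form_cov_by_sums(A, omega):
    """Same quantity computed element by element with explicit double sums."""
    A_inv = np.linalg.inv(A)
    M = A.shape[0]
    out = np.zeros((M, M))
    for mu in range(M):
        for mu2 in range(M):
            out[mu, mu2] = sum(
                A_inv[r, mu] * A_inv[s, mu2] * omega[r, s]
                for r in range(M) for s in range(M)
            )
    return out


if __name__ == "__main__":
    # Hypothetical two-equation system; the diagonal of A is -1 as in (4.1).
    A = np.array([[-1.0, 0.5], [0.3, -1.0]])
    omega_u = np.array([[1.0, 0.2], [0.2, 2.0]])
    print(reduced_form_cov(A, omega_u))
    print(np.allclose(reduced_form_cov(A, omega_u),
                      reduced_form_cov_by_sums(A, omega_u)))
```

Because the transformation is linear and the same for every $i$ and $t$, the reduced-form components inherit the independence pattern of the structural components. This is why the reduced-form disturbances again follow an error component model.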

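Thus we can assert that

(i) $\mathbf{u}^*_i$, $\mathbf{v}^*_t$ and $\mathbf{w}^*_{it}$ are mutually independent M-variate normal vectors with zero means and covariance matrices

$$\Omega_{u*} = \begin{pmatrix} \sigma^{u*}_{11} & \sigma^{u*}_{12} & \cdots & \sigma^{u*}_{1M} \\ \sigma^{u*}_{21} & \sigma^{u*}_{22} & \cdots & \sigma^{u*}_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^{u*}_{M1} & \sigma^{u*}_{M2} & \cdots & \sigma^{u*}_{MM} \end{pmatrix}, \quad \Omega_{v*} = \begin{pmatrix} \sigma^{v*}_{11} & \sigma^{v*}_{12} & \cdots & \sigma^{v*}_{1M} \\ \sigma^{v*}_{21} & \sigma^{v*}_{22} & \cdots & \sigma^{v*}_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^{v*}_{M1} & \sigma^{v*}_{M2} & \cdots & \sigma^{v*}_{MM} \end{pmatrix},$$

$$\Omega_{w*} = \begin{pmatrix} \sigma^{w*}_{11} & \sigma^{w*}_{12} & \cdots & \sigma^{w*}_{1M} \\ \sigma^{w*}_{21} & \sigma^{w*}_{22} & \cdots & \sigma^{w*}_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma^{w*}_{M1} & \sigma^{w*}_{M2} & \cdots & \sigma^{w*}_{MM} \end{pmatrix} \quad \text{respectively;}$$

(ii) For each $\mu$, $u^*_{\mu 1}, u^*_{\mu 2}, \ldots$ are independent drawings from the marginal distribution of $u^*_\mu$ which is normal $(0, \sigma^{u*}_{\mu\mu})$; $v^*_{\mu 1}, v^*_{\mu 2}, \ldots$ are independent drawings from the marginal distribution of $v^*_\mu$ which is normal $(0, \sigma^{v*}_{\mu\mu})$; and $w^*_{\mu 11}, w^*_{\mu 12}, \ldots$ are independent drawings from the marginal distribution of $w^*_\mu$ which is normal $(0, \sigma^{w*}_{\mu\mu})$.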

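$$\Sigma^*_{\mu\mu} = \begin{pmatrix} \sigma^{u*}_{\mu\mu}J + (\sigma^{v*}_{\mu\mu} + \sigma^{w*}_{\mu\mu})I & \sigma^{v*}_{\mu\mu}I & \cdots & \sigma^{v*}_{\mu\mu}I \\ \sigma^{v*}_{\mu\mu}I & \sigma^{u*}_{\mu\mu}J + (\sigma^{v*}_{\mu\mu} + \sigma^{w*}_{\mu\mu})I & \cdots & \sigma^{v*}_{\mu\mu}I \\ \vdots & \vdots & & \vdots \\ \sigma^{v*}_{\mu\mu}I & \sigma^{v*}_{\mu\mu}I & \cdots & \sigma^{u*}_{\mu\mu}J + (\sigma^{v*}_{\mu\mu} + \sigma^{w*}_{\mu\mu})I \end{pmatrix} \qquad (4.12)$$

and

$$\Sigma^{*-1}_{\mu\mu} = \begin{pmatrix} a^*_1 I + a^*_2 J & a^*_3 I + a^*_4 J & \cdots & a^*_3 I + a^*_4 J \\ a^*_3 I + a^*_4 J & a^*_1 I + a^*_2 J & \cdots & a^*_3 I + a^*_4 J \\ \vdots & \vdots & & \vdots \\ a^*_3 I + a^*_4 J & a^*_3 I + a^*_4 J & \cdots & a^*_1 I + a^*_2 J \end{pmatrix} \qquad (4.13)$$

where $a^*_1$, $a^*_2$, $a^*_3$, and $a^*_4$ have the same expressions as those of $a_1$, $a_2$, $a_3$ and $a_4$ with $\sigma^u_{\mu\mu}$, $\sigma^v_{\mu\mu}$, $\sigma^w_{\mu\mu}$ replaced by $\sigma^{u*}_{\mu\mu}$, $\sigma^{v*}_{\mu\mu}$, $\sigma^{w*}_{\mu\mu}$ respectively.

The formula for $\Sigma^*_{\mu\mu'}$ $(\mu \ne \mu')$ can be derived from (4.12) by changing one of the subscripts $\mu$ into $\mu'$.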

The distinctive feature of the error component model is that the

covariance matrices of disturbance vectors of any one of the structural

or reduced-form equations are not diagonal. Consequently, many of the

techniques which are ordinarily employed for structural and reduced-form