Estimation and inference in simultaneous equation models


warwick.ac.uk/lib-publications

A Thesis Submitted for the Degree of PhD at the University of Warwick

Permanent WRAP URL:

http://wrap.warwick.ac.uk/129879

Copyright and reuse:

This thesis is made available online and is protected by original copyright.



ESTIMATION AND INFERENCE IN SIMULTANEOUS EQUATION MODELS

ALASTAIR HALL

Thesis submitted for Ph.D. degree

University of Warwick

Department of Economics


Summary

Chapter 1: INTRODUCTION
1.1 The econometric model and the data generating process
1.2 The linear model as an approximation
1.3 Time varying linear models as an approximation to nonlinear models
1.4 Summary

Chapter 2: STATISTICAL PROPERTIES OF ESTIMATORS AND LINEAR MODEL RESULTS
2.1 Introduction
2.2 Choice of estimators in classical statistics
2.3 Identification
2.4 Information and estimation
2.5 LS, IV and ML in the normal linear model

Chapter 3: ASYMPTOTIC THEORY AND EXISTING LITERATURE ON NLSEM'S
3.1 Asymptotic theory in nonlinear models
3.2 Nonlinear three stage least squares
3.3 Properties of NLFIML: Amemiya (1977) and Phillips (1982)
3.4 Nonlinear instrumental variables

Chapter 4: INFERENCE IN MISSPECIFIED MODELS
4.1 Theory of the quasi MLE
4.2 Identification in nonlinear models

Chapter 5: CONSISTENCY OF NLFIML
5.2 Consistency of NLFIML in the general model
5.3 Consistency of NLFIML when u_t is a weakly stationary process
5.4 Examples of models in which NLFIML is consistent
5.4.1 Expenditure and cost share models
5.4.2 Logs and levels models
5.4.3 Further examples

Chapter 6: MODEL SPECIFICATION. CONDITIONS FOR THE CONSISTENCY AND ASYMPTOTIC NORMALITY OF NLFIML
6.1 Model coherency
6.2 Coherency in piecewise linear models
6.3 The implicit function theorem
6.4 Gale and Nikaido (1968) univalence theorems
6.5 Model specification and estimation
6.6 The implicit function theorem and analytic functions
6.7 Implicit function theorem and consistency of NLFIML
6.8 Asymptotic normality of NLFIML

Chapter 7: CONDITIONS FOR GENERALISATION OF STATIC MODEL RESULTS TO DYNAMIC MODEL
7.1 Introduction
7.2 Convergence of QMLE
7.2.1 Discussion of Problem
7.2.2 Definitions
7.2.3 Proof of convergence of QMLE to KLIC minimising value

numbers for dependent processes
7.3.1 Martingales
7.3.2 Mixingales
7.3.3 Mixing Processes
7.4 Relationship between robustness of NLFIML and the reduced form
7.5 Asymptotic normality of NLFIML in dynamic models
7.6 Verification and suitability of the assumption that series are mixing processes

Chapter 8: THE INFORMATION MATRIX TEST AND THE EXPONENTIAL FAMILY
8.1 Pseudo maximum likelihood estimators
8.2 Linear exponential family
8.3 Poisson models
8.3.1 Poisson distribution
8.3.2 Normal distribution
8.3.3 Gamma distribution
8.3.4 Negative binomial distribution
8.4 Specification tests based on higher order derivatives of the likelihood
8.5 Quadratic exponential family
8.6 Discussion

Chapter 9: CONCLUSIONS

Appendices


Acknowledgements

I am greatly indebted to Ken Wallis for his supervision of this work. He provided invaluable guidance at all stages of the research, and without his support the work would not have been completed.

I have benefitted from very useful discussions on various aspects of this thesis with Peter Burridge, Peter Crouch, Jan Magnus, Peter Phillips, George Rowlands and Mark Salmon.

I would also like to thank Carol Jones for her skilful and patient typing of the original manuscript.

This research was undertaken during the tenureship of a studentship from the Economic and Social Science Research


We examine the asymptotic properties of the full information maximum likelihood estimator (MLE) under the assumption of normality in the general nonlinear simultaneous equations model. The initial analysis is for the static model, and then the conditions which allow the generalisation of the results to the dynamic model are explored.

We concentrate on the question of the consistency of the MLE when the normality assumption is erroneous. The conditions for asymptotic normality are also considered, but are given less emphasis because any tests based on the MLE require consistent estimates of its covariance and so also of its mean. It is demonstrated that if it is possible to write down an explicit reduced form, then we can find families of true nonnormal distributions for which the estimator is consistent. However, if the reduced form is implicit then, apart from some special cases, the estimator can only be proved to be consistent if the model is correctly specified. The nature of the reduced form in nonlinear models is rarely considered, and we examine conditions for its uniqueness. It is demonstrated that this entails more stringent conditions on the Jacobian than are usually acknowledged.

Finally we argue that the information matrix test is a natural choice of specification test for the pseudo MLE strategy suggested by Gourieroux, Monfort and Trognon (1984a), which estimates the parameters of the nonlinear regression model by maximising the likelihood from a member


calculated for the Poisson model example discussed in Gourieroux, Monfort and Trognon (1984b), and their performance contrasted with that of goodness of fit tests. Also tests based on the Edgeworth expansion are compared with tests based on higher derivatives of the standard


1. INTRODUCTION

1.1 The econometric model and the data generating process.

The question of how to explain the behaviour of economic series is one of fundamental importance. The choice of policy instruments, and the appropriate magnitude by which to adjust them, to achieve a particular goal depends on our understanding of the economy. The central problem is that whilst the outcomes of economic agents' actions are observed, it is only possible to hypothesise the decision making process from which these outcomes result. This has naturally led to the use of statistical models to attempt to explain the interrelationship between economic series. It is hoped that by using data to explore the nature of this interrelationship in the past, sufficient information can be acquired to provide useful forecasts of what may happen in the future.

In econometrics it is customary to think of the data as having been generated by a process of the form

$$q(y_t, x_t, a) = u_t, \qquad t = 1, \ldots, T, \qquad (1)$$

where $y_t$, $x_t$, $u_t$ are vectors of endogenous, exogenous and error variables respectively, in period $t$, and $a$ is a vector of unknown parameters. The functional form $q(\cdot)$ is assumed time invariant but is of unknown form. Typically its structure is determined by a mixture of economic theory and prior experience of the variables concerned. Having chosen $q(\cdot)$ the next step is to estimate the unknown parameters. Three main estimation strategies are


and maximum likelihood (ML). The latter requires an assumption about the error distribution, and this is usually that it is normal. It is argued that the transformation $q(\cdot)$ of the underlying series represents the mechanism that generated the data and so, on average over time, the observed values of $y_t$, $x_t$ satisfy $q(y_t, x_t, a) = 0$. However in any time period $q(y_t, x_t, a)$ may be subject to a random deviation from zero. This deviation is considered equally likely to be positive or negative and decreasingly likely as its absolute value increases. This suggests $u_t$ should be modelled as a bell shaped distribution centred on zero. The normal is one such distribution and has the added advantage of making analysis of the model tractable. The properties of LS and IV estimates have been analysed in the literature, but little is known of the properties of ML in nonlinear models.

In this dissertation we are concerned with the situation in which $y$ takes on values in $R^m$ and $q(\cdot)$ is an unspecified function but subject to certain regularity conditions. Necessarily some nonlinear models, for instance qualitative response models, are not encompassed by our analysis. Within this framework we examine the conditions under which the full information ML estimator is consistent and asymptotically normally distributed. From standard likelihood theory it is known that the MLE is consistent, and both asymptotically normally distributed and the most efficient when the model is correctly specified. In this thesis we concentrate on the degree to which the MLE retains these properties when the true distribution is nonnormal, and


The question of the robustness of an estimator is of considerable importance. The eventual power of the model for either forecasting or policy analysis, as well as its accuracy in explaining the data, depends on the use made of our a priori knowledge, which is at best tentative, and specification searches consisting of a succession of diagnostic tests of model adequacy. There is no unique ordering for applying tests, nor any guarantee that different permutations of the sequence lead to the same conclusion. There is, consequently, no guarantee that the original specification was correct nor that the model selection procedures are sufficiently sophisticated to indicate directions in which it might be improved. This is particularly true of the assumed error distribution. The normality specification captures a symmetric, or bell shape, error process in an analytically tractable fashion. As it is not the only choice satisfying this requirement it is important to be aware of any biases in inference caused by its incorrect imposition.

These reservations about test procedures have ramifications for the interpretation of an econometric model. It is important to distinguish between the data generation process (dgp) and approximations to it. If it is possible to find a functional transformation $q(\cdot)$, subject to the conditions in (1), that represents the exact mechanism by which a change in the economic environment affects the behaviour of $y_t$, then this particular representation is the dgp. In the absence of knowledge about the appropriate choice of $q(y_t, x_t, a)$, the model


interrelationship between series is a synthesis of a priori economic theory and diagnostic tests. It has been noted above that such a procedure lacks the sophistication to infallibly determine the dgp. Therefore the econometric model is best regarded as an approximation to the dgp, whose accuracy depends on the estimation and model selection procedures employed.

This is at the centre of the debate on the Lucas policy critique. Lucas (1976) argued that econometric models could not be used for policy analysis as they were by their very nature self-falsifying. "Given that the structure of an econometric model consists of optimal decision rules of economic agents" (Lucas, 1976, p. 41) any change in a policy variable will alter the economic environment and therefore agents' reaction functions. The structure of the econometric model is consequently, he argued, changing with the policy variable over time. However only the outcomes, and not the decision making processes themselves, are observed. Given the reservations cited above about the genesis of a model specification, the equations are, therefore, better interpreted as approximations to the underlying reaction functions. In this case, as Sims (1982) notes, Lucas' conclusion reduces to the point that

"Statistical models are likely to become unreliable when extrapolated to make predictions for conditions far outside the range experienced in the sample" (Sims, 1982, p. 122)

1.2 The linear model as an approximation


original specification. A lot of attention has focused on the use of linear models to explain economic series. These have the advantage of relative computational ease compared to nonlinear models, and so it is important to consider in what situations the choice of a linear model may be suitable. Our arguments suggest that in a large number of cases such models are inappropriate, and so there is a need to develop the theory of their nonlinear counterparts. For this section we confine attention to scalar $y_t$ and a vector of exogenous variables, but the arguments can be generalised to vector $y_t$. We consider two justifications for the linear form

$$y_t = x_t' a + u_t \qquad (2)$$

as an approximation to a nonlinear dgp: the normality of $(y_t, x_t')$ and first order Taylor series expansions.

If $(y_t, x_t')$ have a joint normal distribution then $x_t' a = E(y_t | x_t)$. The assumption of normality can be justified quite easily if $y_t$ is an aggregate, by appeal to central limit theorems. However the sample sizes for which these hold will vary from case to case. If $y_t$ is not an aggregate then, from the Edgeworth expansion of its p.d.f., the normality of $y_t$ results from the assumption that all its cumulants higher than the second are zero.

Alternatively it may be argued that if the dgp is $y_t = f(x_t) + v_t$ then if we take a first order Taylor series


$$f(x) + \sum_{i=1} (x_{it} - x_i) \frac{\partial f}{\partial x_{it}} + \frac{1}{2} \sum_{i,j=1} (x_{it} - x_i)(x_{jt} - x_j) \frac{\partial^2 f}{\partial x_{it} \, \partial x_{jt}} + \ldots$$

then equating higher order terms to a white noise random variable (r.v.) independent of $v_t$, we have a justification for the linear model. There are two main flaws in this argument. Firstly, as noted by Bowden (1974), the derivatives are state dependent, and therefore not fixed as assumed in the linear model. Secondly, as White (1980) has argued, the Taylor series is only valid as a local approximation whereas we wish to explain behaviour throughout the sample space, and use dispersed data to estimate the parameters.
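Both objections to the Taylor series argument can be seen in a small numerical sketch. The function `f` below is a hypothetical nonlinear regression function chosen purely for illustration (it does not come from the thesis), and the helper names are likewise assumptions. The sketch computes the first order Taylor coefficients at different expansion points, so that the slope is seen to depend on the state, and measures how the approximation error grows as we move away from the expansion point.

```python
import math


def f(x):
    # Hypothetical nonlinear regression function; an assumption for illustration only.
    return math.exp(x)


def slope_at(g, x0, h=1e-6):
    # Central finite-difference approximation to the derivative of g at x0.
    return (g(x0 + h) - g(x0 - h)) / (2.0 * h)


def linearise(g, x0):
    # Intercept and slope of the first order Taylor expansion of g about x0,
    # so that g(x) is approximated by intercept + slope * x near x0.
    slope = slope_at(g, x0)
    intercept = g(x0) - slope * x0
    return intercept, slope


def approximation_error(g, x0, x):
    # Absolute gap between g(x) and its linearisation about x0.
    intercept, slope = linearise(g, x0)
    return abs(g(x) - (intercept + slope * x))


if __name__ == "__main__":
    # The "fixed" coefficient of the linear model changes with the expansion point.
    for x0 in (0.0, 1.0):
        print("expansion point", x0, "slope", round(slope_at(f, x0), 4))
    # The error of the expansion about 0 grows with the distance from 0.
    for x in (0.1, 1.0, 2.0):
        print("x =", x, "error:", round(approximation_error(f, 0.0, x), 4))
```

With data dispersed over a wide range of x, a single set of linear coefficients therefore cannot reproduce the local slopes everywhere, which is the substance of the two flaws noted above.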

Linear models are also encountered in the time series literature. The Wold decomposition theorem establishes that a stationary series can be split into deterministic and nondeterministic components, and that this nondeterministic component has an infinite order moving average representation. The removal of trend and seasonal factors from economic series is usually thought to render them stationary and nondeterministic. A more parsimonious representation of this component is an ARMA model and, by using stationarity to pool information, the appropriate order of the model can be identified by the correlogram and partial autocorrelation function of the series. The model in (2) can be derived as a set of parameter restrictions on a multivariate ARMA model for $(y_t, x_t')$. The Wold theorem only states that this moving average representation exists -


Andersen (1978) has demonstrated that identification via the correlogram is only unambiguous within the class of linear models. It can be shown that bilinear models of the form

$$\ldots + \sum_{k=1}^{r} \sum_{m=1}^{s} c_{km} \ldots,$$

with $c_{km} = 0$ for $k > m$, have the same autocovariance structure as an ARMA($p$, max{$q$, $s$}) model. Higher order correlations will be needed to uniquely identify a model within this class, but the complicated nature of this analysis has tended to result in information criteria being used to discriminate between bilinear models. However Granger and Andersen's results underline that the linear representation, whilst analytically tractable, is not accorded any statistical optimality by the Wold theorem. Rather it is just one model formulation consistent with the sample autocorrelation structure.

The use of linear models may be appropriate in certain cases either because the dgp itself is linear or as an approximation to a nonlinear dgp. Whilst a linear model has the advantage of analytical tractability, our review of the theoretical justifications for its use suggests that it is by no means always a suitable model choice. There are also grounds for expecting traditional model diagnostics to be inadequate indicators of situations in which an estimated linear model can be improved on by adopting a nonlinear formulation. The interpretation of specification tests is normally within the context of the linear framework. Tests for incorrect functional form have been developed in the literature but the choice of alternative hypothesis, and its interpretation if accepted, may be problematical. We do not examine these issues but concentrate on the properties of estimators once a nonlinear formulation is chosen.
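The earlier point that the correlogram cannot separate a bilinear process from a linear one can be illustrated by simulation. The sketch below uses a deliberately simple bilinear process, x_t = beta * x_{t-2} * e_{t-1} + e_t with standard normal shocks; this process, the parameter value and the function names are assumptions made for illustration and are not taken from the thesis. For this process the autocovariances at all non-zero lags are zero, so the correlogram suggests white noise, while the squared series remains autocorrelated (at lag 2, with theoretical value beta squared when the shocks have unit variance).

```python
import random


def simulate_bilinear(n, beta, seed=0, burn_in=500):
    # Simulate x_t = beta * x_{t-2} * e_{t-1} + e_t with standard normal shocks e_t.
    # The process is an illustrative assumption, not a model from the text.
    rng = random.Random(seed)
    x_lag1, x_lag2, e_lag1 = 0.0, 0.0, 0.0
    series = []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)
        x = beta * x_lag2 * e_lag1 + e
        if t >= burn_in:
            series.append(x)
        # Shift the lags forward by one period.
        x_lag2, x_lag1, e_lag1 = x_lag1, x, e
    return series


def autocorrelation(series, lag):
    # Sample autocorrelation of the series at the given lag (lag >= 1).
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    variance = sum(c * c for c in centred) / n
    covariance = sum(centred[t] * centred[t - lag] for t in range(lag, n)) / n
    return covariance / variance


if __name__ == "__main__":
    x = simulate_bilinear(n=50000, beta=0.4, seed=1)
    squares = [v * v for v in x]
    for lag in (1, 2, 3):
        # Columns: lag, autocorrelation of levels, autocorrelation of squares.
        print(lag, round(autocorrelation(x, lag), 3), round(autocorrelation(squares, lag), 3))
```

A researcher relying only on the correlogram of the levels would have no reason to look beyond a linear specification here; the dependence only shows up in higher order moments.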

1.3 Time varying linear models as an approximation to nonlinear models

Given the data dependence of the derivatives in a Taylor series approximation, the natural extension to the linear approximation is to adopt a time varying linear model. In this case the coefficients on the $x_t$ are regarded as altering over time with certain properties of their behaviour known. An example of this is the state space system, outlined for instance by Harvey (1981), in which parameter estimates are updated after each observation by an updating procedure such as the Kalman filter. This model is suitable for evolutionary processes, but we argue below that its dependence on past observations may make it inapplicable for modelling nonlinear systems. An alternative is to employ switching regression models, which constitute an extreme form of varying parameter model. These have been suggested by Tong and Lim (1980) in the time series literature, and are familiar in econometrics with reference to markets in disequilibrium. Tong and Lim's (1980) threshold autoregression model takes the form

$$y_t = B(J_t) y_t + A(J_t) y_{t-1} + e_t(J_t) + C(J_t),$$

where $y_t$ is a vector of endogenous variables in period $t$, $A(j)$, $B(j)$ are matrices of fixed coefficients and $e_t(j)$ is


value of the indicator variable $J_t$ which determines the value of $B(J_t)$, $A(J_t)$, $C(J_t)$ and the distribution of $e_t(J_t)$.
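As a rough sketch of how the indicator variable selects the regime, consider a scalar special case of the threshold autoregression with B(J_t) = 0. The two regimes, their coefficient values, the threshold, and the rule that J_t depends only on the sign of the lagged value are all assumptions made for illustration; they are not taken from Tong and Lim (1980) or from the thesis.

```python
import random

# Hypothetical two-regime coefficients for a scalar threshold autoregression;
# all numerical values are assumptions chosen for illustration.
REGIMES = {
    1: {"a": 0.5, "c": 1.0, "sigma": 1.0},    # used when y_{t-1} <= THRESHOLD
    2: {"a": -0.4, "c": -1.0, "sigma": 2.0},  # used when y_{t-1} > THRESHOLD
}
THRESHOLD = 0.0


def indicator(y_prev):
    # Indicator J_t, here assumed to depend only on the lagged value of y.
    return 1 if y_prev <= THRESHOLD else 2


def step(y_prev, shock):
    # One step of y_t = A(J_t) y_{t-1} + e_t(J_t) + C(J_t), with B(J_t) = 0
    # and e_t(J_t) = sigma(J_t) * shock for a standard normal shock.
    regime = REGIMES[indicator(y_prev)]
    return regime["a"] * y_prev + regime["sigma"] * shock + regime["c"]


def simulate(n, y0=0.0, seed=0):
    # Generate a path of length n starting from y0.
    rng = random.Random(seed)
    path, y = [], y0
    for _ in range(n):
        y = step(y, rng.gauss(0.0, 1.0))
        path.append(y)
    return path


if __name__ == "__main__":
    path = simulate(10, seed=1)
    print([round(v, 2) for v in path])
```

Both the coefficients and the error variance switch with the indicator, so the simulation can only be written down once the rule generating J_t is known.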

Whilst this formulation is of little practical use in most econometric settings it does highlight the potential weakness of time-dependent parameter models. The problem is that knowledge of an appropriate indicator is required, but this is unlikely to be available due to the unknown nature of the dgp. This approach is, however, more consistent with the idea of different linear approximations to an underlying nonlinear dgp. In any neighbourhood of a particular point, $y_t$, the behaviour of $y_t$ can be explained by a linear Taylor series approximation with fixed coefficients. However as $y_t$, and so the centre of the expansion $y_t$, moves through the sample space the coefficients of the linear expansion change. However there is no reason to suppose they evolve by a particular stochastic law. If we regard the appropriate linear approximation as being indexed by some state dependent variable, then in varying parameter models in which the coefficients are presumed to evolve over time by some stochastic process, past observations from other regimes are still affecting the estimates. For instance if we pass the hypothetical switch point, the varying parameter model still bases its coefficient estimates on the previous regime. Harrison and Stevens (1976) have sought to adapt the state space representation to a Bayesian framework. This allows the intervention of subjective information in the updating to weight more heavily the last observation when there is reason to expect previous experience to be misleading. The examples they give for this model are short


climate in the next period may well be available. However we typically do not know when the neglect of the underlying nonlinearities of the system will make our model unreliable.

1.4 Summary

We have argued above that linear models with or without time varying parameters are not necessarily always suitable approximations to the dgp. In this thesis we consider situations in which a more general nonlinear model is deemed appropriate. The majority of our analysis deals with models of the generality of equation (1) and is concerned with the properties of the MLE once a functional form has been chosen, and not with methods of selecting the functional form. The consistency and asymptotic normality of the estimator are, of course, prerequisites for specification searches for a better approximation using conventional test procedures such as the Wald, likelihood ratio or score tests.

This work is based on a synthesis of two areas of the literature, and develops new analytical results to answer questions previously unexplored in those areas. Existing work on the properties of estimators in linear and nonlinear models tends to assume the model specification is correct and explores what parts of the specification can be relaxed without losing the desirable properties of the estimator. This is different from the approach taken by White (1982) who examines the properties of the MLE when it is admitted from the outset that the model is misspecified (in this case the estimator is called the quasi MLE (QMLE)). White (1982)


this estimator to the value that minimises the Kullback-Leibler (1951) information criterion (KLIC). Our work follows the practice of the simultaneous equations model (SEM) literature and considers conditions for the convergence of the QMLE to the true value in nonlinear models.

In chapter 2 we discuss the literature on linear SEM's and the interrelationship between the three stage least squares, full information MLE and full information instrumental variables estimator. The aim is to demonstrate the line of argument by which previous authors have established the consistency and asymptotic normality of the MLE in this situation. This work would appear a logical starting point for deriving analogous results for nonlinear models, and so we need to identify at which stages of these arguments linearity is crucial. We also consider the advantages of estimating equations simultaneously (full information (FI) estimation) as opposed to individually (limited information (LI) estimation). In this thesis we focus purely on full information estimators.

In chapter 3 we survey previous explorations of the properties of these three estimators in nonlinear models. Amemiya (1977) has shown that the instrumental variable interpretation of MLE does not persist to nonlinear models, and so Hausman's (1974) proof of the consistency of the MLE does not generalise from linear to nonlinear models. Phillips (1982) has shown that there must exist classes of distributions for which ML estimation under normality provides consistent estimates. However very little is known


argue that the approach taken by Phillips (1982) cannot be extended to provide information on this issue. We also consider the conditions under which an asymptotic theory for nonlinear models can be developed.

Chapter 4 contains an outline of the necessary results from the misspecified model literature. We show that the focus of our work is different from that of White (1982). He derives conditions for the convergence in probability of the QMLE to the KLIC minimising value, whereas we examine the conditions under which this value is in fact the true value. We also explore the difficulty of verifying second order conditions for consistency, and the use of distribution free identification criteria to check these conditions. Attention is focused on the criteria developed by Brown (1983) for nonlinear-in-variables models.
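The distinction between the KLIC minimising value and the true value can be made concrete with a toy calculation. In the sketch below a normal quasi-likelihood with parameters (mu, sigma2) is fitted to data whose true distribution is uniform on [0, 1]; the uniform distribution, the grids and the function names are assumptions chosen for illustration, and this simple location and scale case is not one of the models studied in the thesis. Minimising the KLIC over the parameters is the same as maximising the expected quasi log likelihood, which here depends on the true distribution only through its mean and variance.

```python
import math


def expected_normal_loglik(mu, sigma2, true_mean, true_var):
    # Expected N(mu, sigma2) log density when y has the given true mean and variance.
    # Only the first two moments of the true distribution enter this expectation.
    return (-0.5 * math.log(2.0 * math.pi * sigma2)
            - (true_var + (true_mean - mu) ** 2) / (2.0 * sigma2))


def klic_minimiser(true_mean, true_var, mu_grid, sigma2_grid):
    # Minimising the KLIC over (mu, sigma2) is equivalent to maximising the
    # expected quasi log likelihood, because the term involving the true
    # density does not depend on the parameters.
    return max(
        ((mu, s2) for mu in mu_grid for s2 in sigma2_grid),
        key=lambda p: expected_normal_loglik(p[0], p[1], true_mean, true_var),
    )


if __name__ == "__main__":
    # Assumed true distribution: uniform on [0, 1], with mean 1/2 and variance 1/12.
    mu_grid = [i / 100 for i in range(101)]
    sigma2_grid = [k / 1200 for k in range(1, 401)]
    mu_star, sigma2_star = klic_minimiser(0.5, 1.0 / 12.0, mu_grid, sigma2_grid)
    print("KLIC minimising mean:", mu_star)
    print("KLIC minimising variance:", sigma2_star)
```

In this toy case the KLIC minimising values coincide with the true mean and variance even though the normality assumption is false. The question pursued in the thesis is when a comparable coincidence holds for the structural parameters of a nonlinear simultaneous equations model.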

In chapter 5 we consider various alternative analytical approaches to that of Phillips (1982) for deriving conditions for the consistency of the MLE. We establish that there exists a family of weakly stationary true error processes whose conditional distribution varies over time, for which the MLE under the assumption of independently and identically distributed (i.i.d.) normal errors provides consistent estimators. However the analytical derivation of nonnormal i.i.d. true error distributions, for which ML estimation under normality retains these desirable properties, depends on the nature of the reduced form. If it can be written down explicitly then we can find true distributions for which NLFIML is consistent, although the class is likely to be much narrower than its linear model


In chapter 6 we consider the case where the reduced form is implicit. We show that the condition for consistency involves all the moments of the distribution. In this case the analytical results available are that NLFIML is consistent when the model is correctly specified or if the error is from the class of distributions considered by Phillips. However Phillips' proof only establishes the existence of such a class, and as its exact nature varies from case to case, our results suggest that if we require consistent and asymptotically normal estimates, NLFIML should not be used when the reduced form is implicit. We explore the conditions for a set of structural equations, such as (1), to imply a uniquely defined reduced form. An examination of the work of Gale and Nikaido (1968) shows that these conditions are more stringent than is usually recognised in the econometrics literature. Finally we consider the conditions for the asymptotic normality of NLFIML. White (1983) observes the importance of consistent estimation of the first moment for that of the covariance of the QMLE. Whilst White's analysis contains an algebraic slip, the essence of his comments retains its validity. Without consistent estimates of the covariance, traditional testing procedures based on the parameter estimates break down. In contrast NL3SLS is consistent and asymptotically normal under the same moment conditions as in the linear model, and so would appear the preferred estimator.

Chapter 7 contains a discussion of the conditions under


extended to dynamic models. We examine the types of dynamic processes for which we can apply a version of the strong law of large numbers and so replicate our earlier analysis for static models. Current practice is to employ either martingale or mixing process arguments. McLeish (1975) has shown both types of processes to be mixingales for which the desired law of large numbers can be derived. White and Domowitz (1983) have advocated the use of mixing processes as they have the advantage that functions of them are themselves mixing processes, and so their use involves one basic assumption about $y_t$, whereas the martingale arguments entail a series of assumptions about functions of $y_t$, invariably without examining their consequence for the underlying series. However we argue using some results due to Jones (1976) that, contrary to the view apparently expressed by White and Domowitz, the theoretical validation of whether a particular series generated by a model is in fact a mixing process is likely to prove impossible.

This chapter also contains an extension of a proof by Heijmans and Magnus (1983a) of consistency of the MLE, under weak conditions on the underlying process, in correctly specified models to the case of misspecified models. We show that the MLE converges to the KLIC minimising value in their framework. Finally, we consider the conditions for asymptotic normality of the QMLE in dynamic models. In particular we focus attention on the choice of scaling factor. White and Domowitz (1983) present a central limit theorem that requires a constant scaling factor multiplied by the inverse of the square root of the sample size. They [...] nonnormal asymptotic distribution. We argue, using the work of Hall and Heyde (1981), that this need not be the case.

In chapter 8 we argue that the information matrix test suggested by White (1982) is a natural test of model specification when employing the pseudo maximum likelihood estimation strategy, advocated by Gourieroux, Monfort and Trognon (1984a), for the nonlinear regression model. We calculate the appropriate tests for the Poisson model example considered by Gourieroux, Monfort and Trognon (1984b). The resulting tests of distribution are compared with goodness of fit tests. We compare the higher order likelihood derivative tests (suggested by Chesher, 1983) based on the standard normal likelihood with the tests based on Edgeworth expansions (Kiefer and Salmon, 1983) and show that they coincide for tests of the third and fourth moments but not for the fifth. Finally it is shown that the decomposition of the information matrix test in the linear regression model, demonstrated by Hall (1982), can be extended to its nonlinear counterpart.

Chapter 9 contains some conclusions, after which some [...]

2. STATISTICAL PROPERTIES OF ESTIMATORS AND LINEAR MODEL RESULTS

2.1 INTRODUCTION

The properties of, and relationship between, maximum likelihood (ML), least squares (LS) and instrumental variables (IV) have been explored at length in the literature for the linear model. It is well known that all three can be considered IV estimators, which provides a convenient proof of their consistency and asymptotic normality provided the error process has mean zero. Whereas ML under normality is the most efficient if the specification is correct, a class of IV estimators, including LS, are asymptotically equivalent. In this chapter we outline the basis of these results to illustrate both why linearity delivers such powerful results and why the type of arguments used cannot necessarily be generalised to the nonlinear setting. We also introduce and discuss the criteria for choice of estimators, identification and full or limited information estimation of systems of equations, the basic theoretical issues of which are relevant to all models.

2.2 Choice of Estimators in Classical Statistics

The majority of econometric theory is based on classical statistics. Probability statements have a frequentist interpretation, as the situation envisaged is one in which the researcher can generate unlimited data by repeating the experiment under identical conditions. In econometrics the data are observed passively and so it is [...] stationarity, before the classical framework can be used.

This done, we hypothesise a probability model of the form $q(y, x, \alpha) = u$, with assumptions about $u$, $y$, $x$ and $q(\cdot)$, to explain the observed relationships between economic variables. The model is indexed by an unknown parameter vector $\alpha$ and the aim of classical statistics is to reduce our uncertainty about $\alpha$ by point and interval estimation using information in the data. The point estimate of $\alpha$, $\hat{\alpha}$, is a function of random variables and so is itself stochastic. The interval estimate, or hypothesis test, gives an idea of the sampling distribution of $\hat{\alpha}$ and so of the degree to which $\hat{\alpha}$ evaluated at the realised data values is a "true" reflection of $\alpha$.

We can construct any number of estimators from the data, but as our inference depends on $\hat{\alpha}$ it is desirable to have some method of "screening out" poor estimators. The classical criterion for achieving this is to require $\hat{\alpha}$ to be (i) unbiased: $E(\hat{\alpha}) = \alpha$ and/or (ii) consistent: $\operatorname{plim} \hat{\alpha} = \alpha$. The estimator chosen is the most efficient (in the sense of having minimum variance) of those satisfying (i) and (ii).
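The difference between criteria (i) and (ii) can be illustrated numerically. The following is a minimal sketch, not taken from the thesis: it assumes i.i.d. normal data and uses the variance estimator with divisor n, whose expectation is $(n-1)\sigma^2/n$, so it is biased in every finite sample although the bias vanishes as n grows. The function name, sample sizes and seed are illustrative choices.

```python
import numpy as np

def mean_of_variance_estimator(n, reps, sigma2=4.0, seed=0):
    """Monte Carlo mean of the divisor-n variance estimator
    for N(0, sigma2) samples of size n (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    # ndarray.var uses ddof=0 by default, i.e. it divides by n,
    # so its expectation is (n - 1) / n * sigma2 rather than sigma2.
    return samples.var(axis=1).mean()

small = mean_of_variance_estimator(n=5, reps=20000)    # expectation 3.2
large = mean_of_variance_estimator(n=500, reps=2000)   # expectation 3.992
print(small, large)
```

With a true variance of 4 the average estimate is visibly below 4 for n = 5 and close to 4 for n = 500, which is the sense in which an estimator can fail (i) and still satisfy (ii).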

In econometric models an estimator is usually a complicated function of the error random variables, making its small sample distribution analytically intractable, and so the discussion is limited to large sample properties, namely consistency and asymptotic efficiency. The problem of interval estimation reduces to finding the conditions for consistency and asymptotic normality of $\hat{\alpha}$ under particular circumstances. The argument is that whilst we may know nothing of its small sample behavior, an estimator is [...]

However any interval estimation using asymptotic results requires the assumption that indeed the sample size is large enough, although this is rarely checked. Asymptotic theory can be regarded as an approximation to the finite sample result. In any particular situation better approximations can be developed from the asymptotic estimates by using Edgeworth expansions to analyze the effects of the largest asymptotically negligible terms in the distribution function of the estimator.

2.3 Identification

The analysis of the properties of estimators presupposes that the parameters can be uniquely determined from the data or, in statistical parlance, that the model is identified. Economic theory has limited our attention to a particular family of probability distributions, termed the model, but what we seek is the structure, the particular distribution, most likely to have generated the data. The problem of lack of identification is essentially one of observational equivalence. This arises when two structures imply an identical distribution for the observables, and so are indistinguishable from sample data. A structure is identifiable if, and only if, there are no observationally equivalent structures, in which case the parameters can be uniquely determined from the data.

A well known example of lack of identification is when the common factor restriction occurs in ARMA models. Consider the stationary ARMA(1,1) model:

$$ y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}. \tag{3} $$

By repeated substitution for lagged values of y, (3) can be written as

$$ y_t = \sum_{j=0}^{\infty} \phi^j (\varepsilon_{t-j} + \theta \varepsilon_{t-j-1}) = \varepsilon_t + (\phi + \theta) \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j-1}. $$

Any structure for which $\phi = -\theta$ is not identifiable, as then $y_t$ is white noise. This problem can occur, with the same consequences for identification, in a more general model

$$ H(L) y_t = \Phi(L) \varepsilon_t, $$

if $H(L) = \gamma(L) H^*(L)$ and $\Phi(L) = \gamma(L) \Phi^*(L)$. The model cannot be identified due to the common roots shared by both polynomials $H(L)$ and $\Phi(L)$.
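The common factor problem in the ARMA(1,1) example can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the recursion assumes zero start-up values ($y_{-1} = \varepsilon_{-1} = 0$), which is an assumption made here for simplicity rather than part of the original argument.

```python
import numpy as np

def arma11_ma_weights(phi, theta, n_weights=10):
    """Weights on eps_{t-1}, eps_{t-2}, ... in the moving average form
    y_t = eps_t + (phi + theta) * sum_j phi**j * eps_{t-j-1}."""
    j = np.arange(n_weights)
    return (phi + theta) * phi ** j

def simulate_arma11(phi, theta, eps):
    """Generate y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1},
    assuming zero start-up values (y_{-1} = eps_{-1} = 0)."""
    y = np.zeros_like(eps)
    y[0] = eps[0]
    for t in range(1, len(eps)):
        y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]
    return y

eps = np.random.default_rng(0).normal(size=200)

# Two different structures on the line phi = -theta ...
y_a = simulate_arma11(0.5, -0.5, eps)
y_b = simulate_arma11(-0.2, 0.2, eps)
# ... both reproduce the white noise input exactly.
print(np.allclose(y_a, eps), np.allclose(y_b, eps))
```

Every pair with $\phi = -\theta$ produces the same observable series, so the data cannot distinguish between such structures.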

The problem of lack of identification is essentially one of insufficient information to enable the parameters to be determined. This can be offset by introducing additional information into the problem, in the form of parameter restrictions. These can take two forms: nonstochastic restrictions on $\alpha$ and/or stochastic restrictions on the p.d.f. of u. For a structure to be model admissible, it must satisfy these restrictions, and it is hoped that sufficient restrictions can be imposed to reduce the number of model admissible structures to one.

Identification is a general statistical problem, but in econometrics it is normally associated with simultaneous equation models. For illustrative purposes we consider the [...]

$$ B'y_t + \Gamma'x_t = u_t, \qquad t = 1, \ldots, T, $$

where $y_t$ is an N x 1 vector of endogenous variables, $x_t$ is a K x 1 vector of exogenous variables and $u_t$ is an N x 1 vector of mean zero disturbances with contemporaneous covariance matrix $\Sigma$ and $E(u_t u_s') = 0$ for $t \neq s$. The reduced form for $y_t$ is

$$ y_t = \Pi x_t + v_t, \qquad t = 1, \ldots, T, $$

where $v_t = B'^{-1}u_t$. Note that we require B to be nonsingular for there to be a unique reduced form associated with the structural equations. We return to the conditions for such a mapping between y and u in a more general setting in chapter 6. The reduced form is necessarily identified, and the identification of the structural equations depends on whether, given estimates of $\Pi$, we can uniquely determine $(B, \Gamma)$. The relationship between structural and reduced form parameters is given by

$$ AW = 0 \quad \text{where} \quad A = [B' : \Gamma'], \quad W' = [\Pi' : I_K]. $$

As the system stands there is insufficient information to estimate the parameters of the ith equation, $\alpha_i$. They must satisfy the restrictions $\alpha_i'W = 0$ but as rank(W) = K there are only K linearly independent restrictions on the N+K elements of $\alpha_i$. However if we know that the coefficients have linear restrictions between them of the form $\alpha_i'\Phi = 0$, then this information can be used to achieve identification. The vector must then satisfy $\alpha_i'(W : \Phi) = 0$, and so a [...] up to a scalar multiple is that $\operatorname{rank}(W : \Phi) = N+K-1$. The matrix $[A' : W]$ is a nonsingular matrix of dimension N+K and so its columns form a basis for $R^{N+K}$. We can therefore write

$$ \Phi = A'\xi + W\eta, $$

and as $A\Phi = AA'\xi$, because $AW = 0$, $\operatorname{rank}(A\Phi) = \operatorname{rank}(\xi)$. This enables the condition for identification to be restated in a more convenient form. For $\operatorname{rank}(W : \Phi)$ to be N+K-1, we require there to be N-1 columns in $\Phi$ that are linearly independent both of themselves and of W. We therefore require $\operatorname{rank}(A'\xi) = N-1$, but this in turn implies that $\operatorname{rank}(\xi) = \operatorname{rank}(A\Phi)$ must equal N-1. A necessary and sufficient condition for identification in this model is therefore $\operatorname{rank}(A\Phi) = N-1$.
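The rank condition can be verified on a small numerical example. The sketch below is an illustration under stated assumptions, not part of the original derivation: the two-equation system, its coefficient values and the helper `is_identified` are all hypothetical, and the restriction considered is a single exclusion restriction on the first equation.

```python
import numpy as np

# Hypothetical system with N = 2 endogenous and K = 2 exogenous variables.
# Row i of A holds alpha_i' = [coefficients on y : coefficients on x].
A = np.array([[1.0, -0.5, 2.0, 0.0],    # equation 1 excludes x_2
              [-0.3, 1.0, 0.0, 1.5]])   # equation 2 excludes x_1
N, K = 2, 2

B_t, Gamma_t = A[:, :N], A[:, N:]        # B' and Gamma'
Pi = -np.linalg.solve(B_t, Gamma_t)      # reduced form y_t = Pi x_t + v_t
W = np.vstack([Pi, np.eye(K)])           # W' = [Pi' : I_K], so that A W = 0

# Exclusion restriction on equation 1: coefficient on x_2 is zero,
# written as alpha_1' Phi = 0 with Phi selecting the last element.
Phi = np.array([[0.0], [0.0], [0.0], [1.0]])

def is_identified(A, Phi):
    """Rank condition for one equation: rank(A Phi) = N - 1."""
    return np.linalg.matrix_rank(A @ Phi) == A.shape[0] - 1

rank_W_Phi = np.linalg.matrix_rank(np.hstack([W, Phi]))

# If x_2 is also absent from equation 2, the restriction no longer
# separates the two equations and the rank condition fails.
A_bad = A.copy()
A_bad[1, 3] = 0.0

print(is_identified(A, Phi), rank_W_Phi, is_identified(A_bad, Phi))
```

In the first case both forms of the condition hold together, $\operatorname{rank}(A\Phi) = N-1$ and $\operatorname{rank}(W : \Phi) = N+K-1$; in the second, the excluded variable appears nowhere in the system and so carries no identifying information.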

Note we have sought identification up to a scalar multiple because this type of operation on the parameter vector does not alter the content of the equations. An alternative is to fix one parameter to a set value, for instance unity, and require unique identification, because this normalisation of the equation means that the multiplication of the remaining coefficients in the equation by a scalar alters the nature of the structural equations.

This condition relies on the nonstochastic restrictions $\alpha_i'\Phi = 0$ and the stochastic restriction that $E(u_t) = 0$. An alternative motivation for the result is based on the idea of observational equivalence. If the model is identified then the transformed structural equations

$$ FB'y_t + F\Gamma'x_t = Fu_t, $$

(F nonsingular) should only be observationally equivalent to the original structure if F = I. This can be checked by examining the first and/or second moments of the transformed system. The first moment approach gives the already derived rank condition. The second moment approach uses the fact that if two structures are observationally equivalent, $u_t$ and $Fu_t$ must have the same covariance matrix. However even in the unlikely event of our possessing detailed knowledge of the second moment of $u_t$, this approach yields insufficient restrictions, as $E(u_t u_t')$ has only N(N-1)/2 distinct off diagonal elements, and so even if we assume $\Sigma = \sigma^2 I$, we only reduce the class of admissible F to be orthogonal matrices. Identification could then, however, be achieved by assuming the system to be recursive, so that B would be triangular.
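The limited force of the second moment restriction can be seen in a small numerical check. This is a sketch with arbitrary illustrative values (the dimension, $\sigma^2$ and the matrices F and G are assumptions): under $\Sigma = \sigma^2 I$ any orthogonal F leaves the covariance of $Fu_t$ unchanged, whereas a nonsingular but non-orthogonal transformation does not.

```python
import numpy as np

N, sigma2 = 3, 2.0
Sigma = sigma2 * np.eye(N)               # assumed error covariance sigma^2 I

rng = np.random.default_rng(1)
# The Q factor of a QR decomposition is an orthogonal matrix.
F, _ = np.linalg.qr(rng.normal(size=(N, N)))
G = np.diag([1.0, 2.0, 0.5])             # nonsingular but not orthogonal

cov_F = F @ Sigma @ F.T                  # covariance of F u_t
cov_G = G @ Sigma @ G.T                  # covariance of G u_t

print(np.allclose(cov_F, Sigma), np.allclose(cov_G, Sigma))
```

The covariance restriction therefore rules out G but cannot distinguish the original structure from any orthogonal rotation of it.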

Our original derivation is specific to linear systems, makes only a first moment restriction on the errors, and uses no further distributional assumptions. Alternatively we can condition on the distribution of the errors and derive conditions for local identification of the model. Rothenberg (1971) and Bowden (1973) have demonstrated that the parameter vector, $\alpha$, is locally identified at $\alpha_0$ if the information matrix, defined as minus the expected value of the Hessian of the log likelihood, is positive definite at that point. Rothenberg (1971) shows that if $u_t$ is distributed normally then the rank condition again results for the linear model. We return to those arguments later in our discussion of the conditions for consistency of an estimator [...]

2.4 Information & Estimation

Having considered the identification of a simultaneous equations model, we now examine the methods suggested for its estimation. In practice there are three main approaches: least squares (or minimum distance), instrumental variables and maximum likelihood. Within the normal linear model these three are closely related, and before exploring in chapter 3 the extent to which this relationship persists in the nonlinear setting, we first outline the arguments used to establish the properties of these estimators in the linear model.

As in the identification stage, the proposed methods differ in their explicit distributional assumptions. Least squares and instrumental variables are distribution free, in the sense that assumptions are only made about the first two moments of the error process. However the exogeneity of certain variables will be crucial to the construction of these estimators. It has therefore been implicitly assumed that the factorisation of the joint distribution into the conditional and marginal densities has produced a sequential cut on the parameters of this model. Normality is, of course, sufficient for this, but in some cases, e.g. the multivariate t, the cut will not occur (see Engle, Hendry and Richard, 1983).

In utilising the extra information about the distribution in ML one would intuitively expect to produce more efficient estimators if the assumption is correct, but at the expense of bias if it is false. This robustness/efficiency tradeoff is present in the linear model in small [...] examine the links between identification, information and estimation, the ideas behind which are relevant to all models.

The efficiency of an estimator clearly depends on the amount of information used. In our discussion of identification we were solely concerned with whether we had sufficient information to be able to determine the unknown parameters uniquely from the data. The distinction then was between just and under identification. For our discussion of estimation we need to distinguish a third situation, namely that of overidentification. This occurs when there is more than enough independent information to identify the parameters. For an estimation procedure to be efficient it will have to take account of all these restrictions, as the use of one set of just identifying restrictions does not guarantee the remaining independent restrictions on an equation will be satisfied. In the linear model the properties of LS estimators are closely related to the degree of identification, as both two and three stage LS (2SLS and 3SLS) are undefined when the system is underidentified, but equal when the system is just identified. The existence of estimators in all models will depend on the number of observations, or rather amount of information, relative to the number of variables. LS and ML break down in the undersized sample case, where there are fewer observations than exogenous variables, and in the course of this chapter we note the methods used to overcome this problem.

There may similarly be an information loss from [...] limited information (LI) techniques ignore the information contained in the rest of the system about a particular equation, and so will never be more efficient than full information (FI) methods which incorporate all restrictions. Against this has to be set the fact that our specification is often tentative, and so some restrictions may be incorrect. The tradeoff to the efficiency of FI may well be a lack of robustness, as it allows any erroneous restrictions on one equation to potentially affect the estimation of the whole system. Sims (1980) has argued for the need to match the estimation approach to the manner in which restrictions are placed. If the system is treated equation by equation at the specification stage, which defines the restrictions, it should then be estimated by a LI method. Typically a system with a LI specification but estimated by FI methods will not appear the appropriate formulation when submitted to model diagnostics. The a priori restrictions should therefore be placed by consideration of the entire system. The difficulty of making such restrictions Sims sees as further support for his reduced form estimation using vector autoregressions. In this thesis we are concerned purely with the properties of FI estimators.

2.5 LS, IV and ML in the normal linear model

Within the normal linear SEM there is a close relationship between LS, IV and the ML estimators. Hausman (1974) has shown that both 3SLS and FIML can be considered as IV estimators and this approach will prove convenient for examining consistency and normality of the estimators. [...] as approximations to FIML, and his "estimator generating equation" approach highlights the loss of information, and so (small sample) inefficiencies, of 3SLS and IV. In all of the subsequent analysis systems of equations are assumed to be identified.

Consider the model

$$ YB + X\Gamma = U \tag{4} $$

where Y is a T x N matrix of jointly dependent variables, X is a T x K matrix of predetermined (weakly exogenous) variables, U is a T x N matrix of structural disturbances of the system, T is the number of observations, B is assumed to be nonsingular, $E(X'U) = 0$, and $E(uu') = \Sigma \otimes I_T$ with $u = \operatorname{vec} U$. Therefore we are allowing contemporaneous but not intertemporal correlation between disturbances. The equation used in our discussion of identification in SEMs in 2.3 is the transpose of the tth row of (4). If we impose normalisation then the ith equation of the system can be written as

$$ y_i = Z_i\delta_i + u_i, \qquad (i = 1, \ldots, N) $$

and the whole system as

$$ y = Z\delta + u \tag{5} $$

where

$$ Z = \begin{bmatrix} Z_1 & & 0 \\ & \ddots & \\ 0 & & Z_N \end{bmatrix}, \qquad Z_i = [Y_i \;\; X_i], \qquad \delta_i' = [\beta_i' : \gamma_i'], $$

$y_i$ and $u_i$ are the ith columns of Y and U respectively, $\operatorname{vec} Y = y$, $\operatorname{vec} U = u$, and $\beta_i$, $\gamma_i$ are the unrestricted coefficients on the endogenous and predetermined variables in the ith equation. Let the reduced form associated with this system be

$$ Y = X\Pi' + V \tag{6} $$

where $V = UB^{-1}$, $\Pi' = -\Gamma B^{-1}$.

Brundy and Jorgenson (1971) define the instrumental variable estimator of $\delta$ as d, the solution to the equations

$$ (W'Z)d = W'y, \tag{7} $$

where W is the matrix of instruments satisfying the following conditions:

(i) $\operatorname{plim} T^{-1}W'u = 0$,

(ii) $\operatorname{plim} T^{-1}W'W$ is finite and nonsingular,

(iii) $\operatorname{plim} T^{-1}W'X$ is finite.

Therefore,

$$ d = (W'Z)^{-1}W'y, $$

$$ d = \delta + (W'Z)^{-1}W'u, $$

[...] to $W'u/\sqrt{T}$, we have

$$ \sqrt{T}(d - \delta) \xrightarrow{\;D\;} N\left(0,\; \operatorname{plim}\left(\frac{W'Z}{T}\right)^{-1}\left(\frac{W'(\Sigma \otimes I_T)W}{T}\right)\left(\frac{Z'W}{T}\right)^{-1}\right). $$

The IV estimator is consistent and asymptotically distributed as normal, provided the conditions on W and the first two moments of u are satisfied.
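The role of condition (i) can be illustrated with a small simulation. This is a minimal sketch under assumed values, not the thesis's own example: a single equation with one endogenous regressor, a true coefficient of 1.5, and an instrument x that is independent of the error. Taking W = Z in the same formula gives least squares, which violates condition (i) here because Z is correlated with u.

```python
import numpy as np

def iv_estimator(W, Z, y):
    """Solve (W'Z) d = W'y for d."""
    return np.linalg.solve(W.T @ Z, W.T @ y)

rng = np.random.default_rng(42)
T, delta = 20000, 1.5                         # illustrative sample size and coefficient

x = rng.normal(size=(T, 1))                   # instrument, independent of u
v = rng.normal(size=(T, 1))
u = 0.8 * v + 0.6 * rng.normal(size=(T, 1))   # structural error, correlated with v
Z = x + v                                     # endogenous regressor
y = Z * delta + u

d_iv = iv_estimator(x, Z, y)[0, 0]            # W = x satisfies plim W'u/T = 0
d_ls = iv_estimator(Z, Z, y)[0, 0]            # W = Z does not: plim Z'u/T = 0.8

print(d_iv, d_ls)
```

Under these assumptions the probability limit of the least squares estimate is $1.5 + 0.8/2 = 1.9$, while the IV estimate settles near 1.5.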

Brundy and Jorgenson also prove that for W to yield an asymptotically efficient d, it must be chosen so that the i-jth block of W, $W_{ij}$, is equal to $(W_{ij1}, W_{ij2})$, where

a) $\operatorname{plim} T^{-1}W_{ij1}'X = \sigma^{ij}\pi_j' \operatorname{plim} T^{-1}X'X$,

b) $\operatorname{plim} T^{-1}W_{ij2}'X = \sigma^{ij} \operatorname{plim} T^{-1}X_j'X$,

(where the i-jth elements of $\Sigma$ and $\Sigma^{-1}$ are $\sigma_{ij}$ and $\sigma^{ij}$ respectively). One possible selection is to put $W_{ij} = [\hat{\sigma}^{ij}X\hat{\pi}_j, \hat{\sigma}^{ij}X_j]$, where $\hat{\pi}_j$, $\hat{\sigma}^{ij}$ are consistent estimators of $\pi_j$, $\sigma^{ij}$. Of course the 3SLS estimator,

$$ \hat{\delta}_{3SLS} = [Z'(S^{-1} \otimes X(X'X)^{-1}X')Z]^{-1}[Z'(S^{-1} \otimes X(X'X)^{-1}X')]y, $$

falls into this class. At the first stage the reduced form is estimated by OLS to derive $\hat{\pi}_j$. Each structural equation is then estimated individually by the IV estimator with $W = [X\hat{\pi}_j, X_j]$: this gives the 2SLS (limited information) estimators of $\delta_i$, i = 1, ..., N. The consistent estimator of $\Sigma$, S, is constructed by putting its i-jth element, [...] structural equation. Provided the structural equations are just identified, 3SLS uses the most efficient* estimator of $\pi_i$ in the first stage. However it is shown on page 32 that it is not the most efficient estimator in small samples, although all IV of the form above are equally efficient asymptotically.

* Throughout this thesis we refer to the estimator with the minimum (asymptotic) variance as being (asymptotically) most efficient.

To derive the ML estimator for this model we assume that U is distributed multivariate normal. The log likelihood for the model in (4) is therefore

$$ L(B, \Gamma, \Sigma) = \text{const} + \frac{T}{2}\log\det(\Sigma)^{-1} + T\log|\det(B)| - \frac{T}{2}\operatorname{tr}\left[\frac{1}{T}\Sigma^{-1}(YB + X\Gamma)'(YB + X\Gamma)\right]. $$

The first order conditions for optimisation are then

$$ \frac{\partial L}{\partial B} = T(B')^{-1} - Y'(YB + X\Gamma)\Sigma^{-1} = 0, \tag{8} $$

$$ \frac{\partial L}{\partial \Gamma} = -X'(YB + X\Gamma)\Sigma^{-1} = 0, \tag{9} $$

$$ \frac{\partial L}{\partial \Sigma^{-1}}: \quad T\Sigma - (YB + X\Gamma)'(YB + X\Gamma) = 0. \tag{10} $$

To establish the IV interpretation of FIML, Hausman (1974) concentrates the first order conditions with respect to $\Sigma$. From (10),

$$ \Sigma = T^{-1}(YB + X\Gamma)'(YB + X\Gamma), $$

and substituting this into (8) gives the equations

$$ \begin{bmatrix} (B')^{-1}\Gamma'X' \\ -X' \end{bmatrix}(YB + X\Gamma)\Sigma^{-1} = 0. \tag{11} $$

In terms of our notation in model (5), in which the coefficients are stacked in vector form, (11) can be rewritten as

$$ \bar{Z}'(\Sigma^{-1} \otimes I)(y - Z\delta) = 0, $$

which implies the FIML estimator of $\delta$ is

$$ \hat{\delta} = (W'Z)^{-1}W'y, $$

where $W' = \bar{Z}'(S \otimes I_T)^{-1}$, $\bar{Z} = \operatorname{diag}(\bar{Z}_1, \ldots, \bar{Z}_N)$, $\bar{Z}_i = [X(\Gamma B^{-1})_i, X_i]$, and $S = T^{-1}(YB + X\Gamma)'(YB + X\Gamma)$.

The equations are nonlinear in B and $\Gamma$ and so have to be estimated iteratively, giving the estimator after the kth iteration as

$$ \hat{\delta}_{k+1} = (W_k'Z)^{-1}W_k'y, $$

the instruments, $W_k$, being revised at each step by updating $\bar{Z}$ and S using the parameter estimates from the last iteration. We have assumed that the second order moments are finite and nonsingular, where appropriate, and so $\hat{\delta}_k$ may be considered an IV estimator, for every k, as it satisfies all the necessary requirements. The asymptotic normality and consistency of $\hat{\delta}$ follow from the arguments above, and so are guaranteed for a wide class of nonnormal error distributions.
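The IV form of the system estimators discussed in this section can be sketched numerically for 3SLS, which uses the instruments $W = (S^{-1} \otimes X(X'X)^{-1}X')Z$ in $d = (W'Z)^{-1}W'y$. The example below is an assumption-laden illustration rather than the thesis's own computation: the two-equation model, its coefficient values, the error covariance and the function `three_sls` are all hypothetical, and S is built from 2SLS residuals in the usual way.

```python
import numpy as np

def three_sls(y_list, Z_list, X):
    """3SLS as an IV estimator d = (W'Z)^{-1} W'y with
    W = (S^{-1} kron P_X) Z, where P_X = X (X'X)^{-1} X'."""
    T, n = X.shape[0], len(y_list)
    # P_X Z_j for each equation, without forming the T x T matrix P_X.
    PZ = [X @ np.linalg.solve(X.T @ X, X.T @ Z) for Z in Z_list]
    # 2SLS equation by equation: IV with instruments P_X Z_j.
    d2 = [np.linalg.solve(pz.T @ Z, pz.T @ y)
          for pz, Z, y in zip(PZ, Z_list, y_list)]
    resid = np.column_stack([y - Z @ d for y, Z, d in zip(y_list, Z_list, d2)])
    S_inv = np.linalg.inv(resid.T @ resid / T)
    # Stacked system y = Z delta + u with block-diagonal Z.
    Zb = np.block([[Z_list[j] if i == j else np.zeros((T, Z_list[j].shape[1]))
                    for j in range(n)] for i in range(n)])
    # Block (i, j) of W is s^{ij} P_X Z_j.
    W = np.block([[S_inv[i, j] * PZ[j] for j in range(n)] for i in range(n)])
    y = np.concatenate(y_list)
    return np.linalg.solve(W.T @ Zb, W.T @ y), d2

# Hypothetical data generating process (all values are illustrative):
#   y1 = 0.5*y2 + 1.0*x1 + u1
#   y2 = -0.4*y1 + 1.0*x2 - 1.0*x3 + u2
rng = np.random.default_rng(0)
T = 2000
X = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
rhs = np.column_stack([1.0 * X[:, 0] + U[:, 0],
                       1.0 * X[:, 1] - 1.0 * X[:, 2] + U[:, 1]])
B_norm = np.array([[1.0, 0.4], [-0.5, 1.0]])   # normalised coefficients on Y
Y = rhs @ np.linalg.inv(B_norm)                # solves Y B_norm = rhs

y_list = [Y[:, 0], Y[:, 1]]
Z_list = [np.column_stack([Y[:, 1], X[:, 0]]),
          np.column_stack([Y[:, 0], X[:, 1], X[:, 2]])]
d3, d2 = three_sls(y_list, Z_list, X)
print(d3)   # estimates of (0.5, 1.0, -0.4, 1.0, -1.0)
```

An iterated version in the spirit of the FIML recursion would replace the fixed instruments by ones rebuilt from the current estimates of B, $\Gamma$ and S at each step; that extension is not attempted here.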