warwick.ac.uk/lib-publications
A Thesis Submitted for the Degree of PhD at the University of Warwick
Permanent WRAP URL:
http://wrap.warwick.ac.uk/129879
Copyright and reuse:
This thesis is made available online and is protected by original copyright.
ESTIMATION AND INFERENCE IN SIMULTANEOUS EQUATION MODELS
ALASTAIR HALL
Thesis submitted for Ph.D. degree
University of Warwick
Department of Economics
Summary
Chapter 1: INTRODUCTION
1.1 The econometric model and the data generating process
1.2 The linear model as an approximation
1.3 Time varying linear models as an approximation to nonlinear models
1.4 Summary
Chapter 2: STATISTICAL PROPERTIES OF ESTIMATORS AND LINEAR MODEL RESULTS
2.1 Introduction
2.2 Choice of estimators in classical statistics
2.3 Identification
2.4 Information and estimation
2.5 LS, IV and ML in the normal linear model
Chapter 3: ASYMPTOTIC THEORY AND EXISTING LITERATURE ON NLSEM'S
3.1 Asymptotic theory in nonlinear models
3.2 Nonlinear three stage least squares
3.3 Properties of NLFIML: Amemiya (1977) and Phillips (1982)
3.4 Nonlinear instrumental variables
Chapter 4: INFERENCE IN MISSPECIFIED MODELS
4.1 Theory of the quasi MLE
4.2 Identification in nonlinear models
Chapter 5: CONSISTENCY OF NLFIML
5.2 Consistency of NLFIML in the general model
5.3 Consistency of NLFIML when u_t is a weakly stationary process
5.4 Examples of models in which NLFIML is consistent
5.4.1 Expenditure and cost share models
5.4.2 Logs and levels models
5.4.3 Further examples
Chapter 6: MODEL SPECIFICATION. CONDITIONS FOR THE CONSISTENCY AND ASYMPTOTIC NORMALITY OF NLFIML
6.1 Model coherency
6.2 Coherency in piecewise linear models
6.3 The implicit function theorem
6.4 Gale and Nikaido (1968) univalence theorems
6.5 Model specification and estimation
6.6 The implicit function theorem and analytic functions
6.7 Implicit function theorem and consistency of NLFIML
6.8 Asymptotic normality of NLFIML
Chapter 7: CONDITIONS FOR GENERALISATION OF STATIC MODEL RESULTS TO DYNAMIC MODEL
7.1 Introduction
7.2 Convergence of QMLE
7.2.1 Discussion of problem
7.2.2 Definitions
7.2.3 Proof of convergence of QMLE to KLIC minimising value
7.3 … numbers for dependent processes
7.3.1 Martingales
7.3.2 Mixingales
7.3.3 Mixing processes
7.4 Relationship between robustness of NLFIML and the reduced form
7.5 Asymptotic normality of NLFIML in dynamic models
7.6 Verification and suitability of the assumption that series are mixing processes
Chapter 8: THE INFORMATION MATRIX TEST AND THE EXPONENTIAL FAMILY
8.1 Pseudo maximum likelihood estimators
8.2 Linear exponential family
8.3 Poisson models
8.3.1 Poisson distribution
8.3.2 Normal distribution
8.3.3 Gamma distribution
8.3.4 Negative binomial distribution
8.4 Specification tests based on higher order derivatives of the likelihood
8.5 Quadratic exponential family
8.6 Discussion
Chapter 9: CONCLUSIONS
Appendices
Acknowledgements
I am greatly indebted to Ken Wallis for his supervision of this work. He provided invaluable guidance at all stages of the research, and without his support the work would not have been completed.
I have benefitted from very useful discussions on various aspects of this thesis with Peter Burridge, Peter Crouch, Jan Magnus, Peter Phillips, George Rowlands and Mark Salmon.
I would also like to thank Carol Jones for her skillful, and patient, typing of the original manuscript.
This research was undertaken during the tenureship of a studentship from the Economic and Social Science Research
Summary
We examine the asymptotic properties of the full information maximum likelihood estimator (MLE) under the assumption of normality in the general nonlinear simultaneous equations model. The initial analysis is for the static model, and then the conditions which allow the generalisation of the results to the dynamic model are explored.
We concentrate on the question of the consistency of the MLE when the normality assumption is erroneous. The conditions for asymptotic normality are also considered, but are given less emphasis because any tests based on the MLE require consistent estimates of its covariance and so also of its mean. It is demonstrated that if it is possible to write down an explicit reduced form, then we can find families of true nonnormal distributions for which the estimator is consistent. However if the reduced form is implicit, then, apart from some special cases, the estimator can only be proved to be consistent if the model is correctly specified. The nature of the reduced form in nonlinear models is rarely considered, and we examine conditions for its uniqueness. It is demonstrated that this entails more stringent conditions on the Jacobian than are usually acknowledged.
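A scalar illustration of why uniqueness of the reduced form depends on the Jacobian follows. It is a sketch built on an assumption of our own: the structural equation q(y, x, α) = y³ − αy − x = u is hypothetical and is not one of the models studied in this thesis. Its Jacobian ∂q/∂y = 3y² − α is positive everywhere when α < 0, so y is a unique function of (x, u). When α > 0 the Jacobian changes sign, and for some values of x + u the equation has three solutions, so no unique reduced form exists.

```python
import numpy as np

def reduced_form_solutions(alpha, x, u, tol=1e-9):
    """Real solutions y of the hypothetical structural equation
    y**3 - alpha*y - x = u, i.e. the candidate reduced-form values of y."""
    # Coefficients of y**3 + 0*y**2 - alpha*y - (x + u) = 0
    roots = np.roots([1.0, 0.0, -alpha, -(x + u)])
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real)

def jacobian(alpha, y):
    """dq/dy for the same hypothetical structural equation."""
    return 3.0 * y**2 - alpha

# alpha < 0: the Jacobian is strictly positive, so exactly one solution
print(reduced_form_solutions(alpha=-1.0, x=0.5, u=0.0))
# alpha > 0: the Jacobian changes sign and x + u = 0 gives three solutions
print(reduced_form_solutions(alpha=3.0, x=0.0, u=0.0))
```

The sign change of the Jacobian is only a symptom in this one-dimensional case. In several dimensions the global univalence conditions of Gale and Nikaido are considerably more demanding than a nonvanishing Jacobian determinant.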
Finally we argue that the information matrix test is a natural choice of specification test for the pseudo MLE strategy suggested by Gourieroux, Monfort and Trognon (1984a), which estimates the parameters of the nonlinear regression model by maximising the likelihood from a member … calculated for the Poisson model example discussed in Gourieroux, Monfort and Trognon (1984b), and their performance contrasted with that of goodness of fit tests. Also tests based on the Edgeworth expansion are compared with tests based on higher derivatives of the standard …
1. INTRODUCTION
1.1 The econometric model and the data generating process
The question of how to explain the behaviour of economic series is one of fundamental importance. The choice of policy instruments, and the appropriate magnitude by which to adjust them, to achieve a particular goal depends on our understanding of the economy. The central problem is that whilst the outcomes of economic agents' actions are observed, it is only possible to hypothesise the decision making process from which these outcomes result. This has naturally led to the use of statistical models to attempt to explain the interrelationship between economic series. It is hoped that by using data to explore the nature of this interrelationship in the past, sufficient information can be acquired to provide useful forecasts of what may happen in the future.
In econometrics it is customary to think of the data as having been generated by a process of the form
$$q(y_t, x_t, \alpha) = u_t, \qquad t = 1, \dots, T, \qquad (1)$$
where y_t, x_t, u_t are vectors of endogenous, exogenous and error variables respectively, in period t, and α is a vector of unknown parameters. The functional form q(·) is assumed time invariant but is of unknown form. Typically its structure is determined by a mixture of economic theory and prior experience of the variables concerned. Having chosen q(·), the next step is to estimate the unknown parameters. Three main estimation strategies are least squares (LS), instrumental variables (IV) and maximum likelihood (ML). The latter requires an assumption about the error distribution, and this is usually that it is normal. It is argued that the transformation q(·) of the underlying series represents the mechanism that generated the data and so, on average over time, the observed values of y_t, x_t satisfy q(y_t, x_t, α) = 0. However in any time period q(y_t, x_t, α) may be subject to a random deviation from zero. This deviation is considered equally likely to be positive or negative, and decreasingly likely as its absolute value increases. This suggests u_t should be modelled by a bell-shaped distribution centred on zero. The normal is one such distribution and has the added advantage of making analysis of the model tractable. The properties of LS and IV estimates have been analysed in the literature, but little is known of the properties of ML in nonlinear models.
In this dissertation we are concerned with the situation in which y takes on values in R^m and q(·) is an unspecified function subject to certain regularity conditions. Necessarily some nonlinear models, for instance qualitative response models, are not encompassed by our analysis. Within this framework we examine the conditions under which the full information ML estimator is consistent and asymptotically normally distributed. From standard likelihood theory it is known that, when the model is correctly specified, the MLE is consistent, asymptotically normally distributed and the most efficient estimator. In this thesis we concentrate on the degree to which the MLE retains these properties when the true distribution is nonnormal, and …
The question of the robustness of an estimator is of considerable importance. The eventual power of the model for either forecasting or policy analysis, as well as its accuracy in explaining the data, depends on the use made of our a priori knowledge, which is at best tentative, and on specification searches consisting of a succession of diagnostic tests of model adequacy. There is no unique ordering for applying tests, nor any guarantee that different permutations of the sequence lead to the same conclusion. There is, consequently, no guarantee that the original specification was correct, nor that the model selection procedures are sufficiently sophisticated to indicate directions in which it might be improved. This is particularly true of the assumed error distribution. The normality specification captures a symmetric, or bell-shaped, error process in an analytically tractable fashion. As it is not the only choice satisfying this requirement, it is important to be aware of any biases in inference caused by its incorrect imposition.
These reservations about test procedures have ramifications for the interpretation of an econometric model. It is important to distinguish between the data generation process (dgp) and approximations to it. If it is possible to find a functional transformation q(·), subject to the conditions in (1), that represents the exact mechanism by which a change in the economic environment affects the behaviour of y_t, then this particular representation is the dgp. In the absence of knowledge about the appropriate choice of q(y_t, x_t, α), the model … interrelationship between series is a synthesis of a priori economic theory and diagnostic tests. It has been noted above that such a procedure lacks the sophistication to determine the dgp infallibly. Therefore the econometric model is best regarded as an approximation to the dgp, whose accuracy depends on the estimation and model selection procedures employed.
This is at the centre of the debate on the Lucas policy critique. Lucas (1976) argued that econometric models could not be used for policy analysis as they were by their very nature self-falsifying. "Given that the structure of an econometric model consists of optimal decision rules of economic agents" (Lucas, 1976, p. 41), any change in a policy variable will alter the economic environment and therefore agents' reaction functions. The structure of the econometric model is consequently, he argued, changing with the policy variable over time. However only the outcomes, and not the decision making processes themselves, are observed. Given the reservations cited above about the genesis of a model specification, the equations are, therefore, better interpreted as approximations to the underlying reaction functions. In this case, as Sims (1982) notes, Lucas' conclusion reduces to the point that "Statistical models are likely to become unreliable when extrapolated to make predictions for conditions far outside the range experienced in the sample" (Sims, 1982, p. 122).
1.2 The linear model as an approximation
… original specification. A lot of attention has focused on the use of linear models to explain economic series. These have the advantage of relative computational ease compared to nonlinear models, and so it is important to consider in what situations the choice of a linear model may be suitable. Our arguments suggest that in a large number of cases such models are inappropriate, and so there is a need to develop the theory of their nonlinear counterparts. For this section we confine attention to scalar y_t and a vector of exogenous variables, but the arguments can be generalised to vector y_t. We consider two justifications for the linear form
$$y_t = x_t'\alpha + u_t \qquad (2)$$
as an approximation to a nonlinear dgp: the normality of (y_t, x_t') and first order Taylor series expansions.
If (y_t, x_t') have a joint normal distribution then x_t'α = E(y_t | x_t). The assumption of normality can be justified quite easily if y_t is an aggregate, by appeal to central limit theorems. However the sample sizes for which these hold will vary from case to case. If y_t is not an aggregate then, from the Edgeworth expansion of its p.d.f., the normality of y_t results from the assumption that all its cumulants higher than the second are zero.
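The first justification rests on a standard property of the multivariate normal distribution. Write μ_y, μ_x for the means and Σ_yx, Σ_xx, Σ_xy for the covariance blocks of (y_t, x_t')', where x_t here excludes the constant. This notation is introduced only for illustration. The conditional mean is then linear in x_t:

```latex
E(y_t \mid x_t) = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x_t - \mu_x)
  = \underbrace{(\mu_y - \Sigma_{yx}\Sigma_{xx}^{-1}\mu_x)}_{\text{intercept}}
  + x_t'\,\Sigma_{xx}^{-1}\Sigma_{xy}
```

Model (2) therefore holds once a constant is included among the regressors. The disturbance u_t = y_t − E(y_t | x_t) is then normal and independent of x_t.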
Alternatively it may be argued as follows. Suppose the dgp is y_t = f(x_t) + v_t, and we take a Taylor series expansion about x̄:
$$y_t = f(\bar{x}) + \sum_{i}(x_{it} - \bar{x}_i)\frac{\partial f(\bar{x})}{\partial x_i} + \frac{1}{2}\sum_{i,j}(x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j)\frac{\partial^2 f(\bar{x})}{\partial x_i\,\partial x_j} + \cdots + v_t.$$
If we then equate the terms above first order to a white noise random variable (r.v.) independent of v_t, we have a justification for the linear model. There are two main flaws in this argument. Firstly, as noted by Bowden (1974), the derivatives are state dependent, and therefore not fixed as assumed in the linear model. Secondly, as White (1980) has argued, the Taylor series is only valid as a local approximation, whereas we wish to explain behaviour throughout the sample space and to use dispersed data to estimate the parameters.
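The following is a minimal numerical sketch of Bowden's and White's objections. The dgp y_t = exp(x_t) + v_t, with x_t uniform on [0, 3] and small normal noise, is our own hypothetical choice. Least squares fits of the linear form (2) on different sub-ranges of x_t give very different slopes, because the derivative f′(x) = exp(x) is state dependent. A single fit over the whole sample matches neither local slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30_000
x = rng.uniform(0.0, 3.0, n)               # exogenous variable (hypothetical design)
y = np.exp(x) + rng.normal(0.0, 0.1, n)    # hypothetical nonlinear dgp f(x) = exp(x)

def ols_slope(x, y):
    """Slope from a least squares fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

low, high = x < 1.0, x > 2.0
slope_low = ols_slope(x[low], y[low])      # local linear fit near the bottom of the range
slope_high = ols_slope(x[high], y[high])   # local linear fit near the top of the range
slope_all = ols_slope(x, y)                # one "global" linear approximation
print(f"low: {slope_low:.2f}  high: {slope_high:.2f}  all: {slope_all:.2f}")
```

For this design the population least squares slopes are about 1.69 on [0, 1], 12.5 on [2, 3] and 5.6 on [0, 3]. The "coefficient" of the linear model is therefore a property of where the data happen to lie, not of the dgp.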
Linear models are also encountered in the time series literature. The Wold decomposition theorem establishes that a stationary series can be split into deterministic and nondeterministic components, and that the nondeterministic component has an infinite order moving average representation. The removal of trend and seasonal factors from economic series is usually thought to render them stationary and nondeterministic. A more parsimonious representation of this component is an ARMA model and, by using stationarity to pool information, the appropriate order of the model can be identified from the correlogram and partial autocorrelation function of the series. The model in (2) can be derived as a set of parameter restrictions on a multivariate ARMA model for (y_t, x_t'). However, the Wold theorem only states that this moving average representation exists. Granger and Andersen (1978) have demonstrated that identification via the correlogram is only unambiguous within the class of linear models. It can be shown that bilinear models of the form
$$y_t = \sum_{i=1}^{p} a_i y_{t-i} + \sum_{j=1}^{q} b_j e_{t-j} + \sum_{k=1}^{r}\sum_{m=1}^{s} c_{km}\, y_{t-k} e_{t-m} + e_t,$$
where e_t is a white noise error, with c_km = 0 for k > m, have the same autocovariance structure as an ARMA(p, max{q, s}) model. Higher order correlations will be needed to identify a model uniquely within this class, but the complicated nature of this analysis has tended to result in information criteria being used to discriminate between bilinear models. However Granger and Andersen's results underline that the linear representation, whilst analytically tractable, is not accorded any statistical optimality by the Wold theorem. Rather it is just one model formulation consistent with the sample autocorrelation structure.
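The identification problem can be made concrete with a small simulation. The following is a sketch under assumptions of our own: the simplest diagonal bilinear model, y_t = c·y_{t-1}·e_{t-1} + e_t, with p = q = 0, r = s = 1, c = 0.3 and standard normal e_t. It satisfies c_km = 0 for k > m, so by the result quoted above its autocovariances should be those of an ARMA(0, 1), i.e. MA(1), process. Our own short calculation gives a first-order autocovariance of c² and zero autocovariance beyond lag one. The correlogram alone therefore cannot tell this series apart from a linear MA(1). A simulation is only suggestive, not a proof.

```python
import numpy as np

def simulate_diagonal_bilinear(c, n, burn=1_000, seed=0):
    """Simulate y_t = c * y_{t-1} * e_{t-1} + e_t with i.i.d. N(0, 1) errors."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = c * y[t - 1] * e[t - 1] + e[t]
    return y[burn:]                        # drop the burn-in period

def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 = 1)."""
    d = y - y.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])

y = simulate_diagonal_bilinear(c=0.3, n=200_000)
r = sample_acf(y, max_lag=5)
# Lag 1 should be near 0.075 for c = 0.3 (our own calculation); lags 2-5 near zero
print(np.round(r[1:], 3))
```

Distinguishing this model from a genuine MA(1) requires moments beyond the second, for example autocorrelations of y_t², which is the higher-order analysis referred to above.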
The use of linear models may be appropriate in certain cases, either because the dgp itself is linear or as an approximation to a nonlinear dgp. Whilst a linear model has the advantage of analytical tractability, our review of the theoretical justifications for its use suggests that it is by no means always a suitable model choice. There are also grounds for expecting traditional model diagnostics to be inadequate indicators of situations in which an estimated linear model can be improved on by adopting a nonlinear formulation. The interpretation of specification tests is normally within the context of the linear framework. Tests for incorrect functional form have been developed in the literature, but the choice of alternative hypothesis, and its interpretation if accepted, may be problematic. We do not examine these issues but concentrate on the properties of estimators once a nonlinear formulation is chosen.
1.3 Time varying linear models as an approximation to nonlinear models
Given the data dependence of the derivatives in a Taylor series approximation, the natural extension to the linear approximation is to adopt a time varying linear model. In this case the coefficients on the x_t are regarded as altering over time, with certain properties of their behaviour known. An example of this is the state space system, outlined for instance by Harvey (1981), in which parameter estimates are updated after each observation by an updating procedure such as the Kalman filter. This model is suitable for evolutionary processes, but we argue below that its dependence on past observations may make it inapplicable for modelling nonlinear systems. An alternative is to employ switching regression models, which constitute an extreme form of varying parameter model. These have been suggested by Tong and Lim (1980) in the time series literature, and are familiar in econometrics with reference to markets in disequilibrium. Tong and Lim's (1980) threshold autoregression model takes the form
$$y_t = B(J_t)\,y_t + A(J_t)\,y_{t-1} + e_t(J_t) + c(J_t),$$
where y_t is a vector of endogenous variables in period t, A(j), B(j) are matrices of fixed coefficients and e_t(j) is … value of the indicator variable J_t, which determines the value of B(J_t), A(J_t), c(J_t) and the distribution of e_t(J_t).
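A scalar, two-regime special case of this model can be simulated to show why the indicator matters. In this sketch all of the following are our own assumptions: the parameter values, B(j) = 0, standard normal errors in both regimes, and the choice J_t = 1 if y_{t-1} ≤ 0 and J_t = 2 otherwise. When J_t is known, regime-by-regime least squares recovers A(j) and c(j), because regime membership depends only on the regressor y_{t-1} and not on the current error. A pooled regression that ignores J_t returns a single compromise set of coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
A = {1: 0.5, 2: -0.4}   # hypothetical autoregressive coefficients A(j)
C = {1: 1.0, 2: -1.0}   # hypothetical intercepts c(j)
n = 20_000

y = np.zeros(n + 1)
for t in range(1, n + 1):
    j = 1 if y[t - 1] <= 0.0 else 2            # indicator J_t driven by y_{t-1}
    y[t] = C[j] + A[j] * y[t - 1] + rng.standard_normal()

lag, cur = y[:-1], y[1:]
regime = np.where(lag <= 0.0, 1, 2)

def ls_fit(ylag, ycur):
    """Least squares estimates (intercept, slope) of ycur on a constant and ylag."""
    X = np.column_stack([np.ones_like(ylag), ylag])
    coef, *_ = np.linalg.lstsq(X, ycur, rcond=None)
    return coef

fits = {j: ls_fit(lag[regime == j], cur[regime == j]) for j in (1, 2)}
pooled = ls_fit(lag, cur)                       # ignores the indicator entirely
for j in (1, 2):
    print(f"regime {j}: c = {fits[j][0]:.2f}, A = {fits[j][1]:.2f}")
print(f"pooled:   c = {pooled[0]:.2f}, A = {pooled[1]:.2f}")
```

The sketch only works because the simulation hands us J_t. As argued in the following paragraph, an indicator of this kind is rarely available in practice.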
Whilst this formulation is of little practical use in most econometric settings, it does highlight the potential weakness of time-dependent parameter models. The problem is that knowledge of an appropriate indicator is required, but this is unlikely to be available due to the unknown nature of the dgp. This approach is, however, more consistent with the idea of different linear approximations to an underlying nonlinear dgp. In any neighbourhood of a particular point, ȳ_t, the behaviour of y_t can be explained by a linear Taylor series approximation with fixed coefficients. However as y_t, and so the centre of the expansion ȳ_t, moves through the sample space, the coefficients of the linear expansion change. Moreover, there is no reason to suppose they evolve by a particular stochastic law. If we regard the appropriate linear approximation as being indexed by some state dependent variable, then in varying parameter models, in which the coefficients are presumed to evolve over time by some stochastic process, past observations from other regimes are still affecting the estimates. For instance, if we pass the hypothetical switch point, the varying parameter model still bases its coefficient estimates on the previous regime. Harrison and Stevens (1976) have sought to adapt the state space representation to a Bayesian framework. This allows the intervention of subjective information in the updating, to weight the last observation more heavily when there is reason to expect previous experience to be misleading. The examples they give for this model are short … climate in the next period may well be available. However we typically do not know when the neglect of the underlying nonlinearities of the system will make our model unreliable.
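The point about switch points can be illustrated with a scalar random-walk coefficient model, y_t = x_t·β_t + u_t with β_t = β_{t-1} + w_t, estimated by the standard Kalman filter recursions. This is a sketch, not Harvey's (1981) specification. The variances, the regressor design and the abrupt switch from β = 1 to β = 3 at t = 200 are all our own assumptions. The filter treats the jump as an accumulation of small random-walk steps, so its estimate just after the switch is still dominated by the previous regime and adjusts only gradually.

```python
import numpy as np

def kalman_tvp(y, x, q, r, b0=0.0, p0=10.0):
    """Filtered estimates of beta_t in y_t = x_t*beta_t + u_t,
    beta_t = beta_{t-1} + w_t, with Var(u_t) = r and Var(w_t) = q."""
    b, p = b0, p0
    est = np.empty(len(y))
    for t in range(len(y)):
        p = p + q                                # prediction step for a random walk
        k = p * x[t] / (x[t] ** 2 * p + r)       # Kalman gain
        b = b + k * (y[t] - x[t] * b)            # update with the new observation
        p = (1.0 - k * x[t]) * p                 # updated state variance
        est[t] = b
    return est

rng = np.random.default_rng(2)
n, switch = 400, 200
beta = np.where(np.arange(n) < switch, 1.0, 3.0)   # hypothetical regime switch
x = rng.normal(1.0, 1.0, n)
y = x * beta + rng.standard_normal(n)

est = kalman_tvp(y, x, q=0.01, r=1.0)
print("just before switch:", np.round(est[switch - 3:switch], 2))
print("just after switch: ", np.round(est[switch:switch + 3], 2))
print("end of sample:     ", np.round(est[-3:], 2))
```

Raising q speeds up adaptation after the switch, but at the cost of noisier estimates within each regime. This trade-off is the one the Harrison and Stevens intervention approach tries to manage with outside information.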
1.4 Summary
We have argued above that linear models, with or without time varying parameters, are not necessarily always suitable approximations to the dgp. In this thesis we consider situations in which a more general nonlinear model is deemed appropriate. The majority of our analysis deals with models of the generality of equation (1), and is concerned with the properties of the MLE once a functional form has been chosen, not with methods of selecting the functional form. The consistency and asymptotic normality of the estimator are, of course, prerequisites for specification searches for a better approximation using conventional test procedures such as the Wald, likelihood ratio or score tests.
This work is based on a synthesis of two areas of the literature, and develops new analytical results to answer questions previously unexplored in those areas. Existing work on the properties of estimators in linear and nonlinear models tends to assume the model specification is correct, and explores what parts of the specification can be relaxed without losing the desirable properties of the estimator. This is different from the approach taken by White (1982), who examines the properties of the MLE when it is admitted from the outset that the model is misspecified (in this case the estimator is called the quasi MLE (QMLE)). White (1982) derives conditions for the convergence of this estimator to the value that minimises the Kullback-Leibler (1951) information criterion (KLIC). Our work follows the practice of the simultaneous equations model (SEM) literature and considers conditions for the convergence of the QMLE to the true value in nonlinear models.
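The gap between the KLIC minimising value and the true value can be seen in a small simulation. The following is a sketch with a design of our own choosing. Data are drawn from a skewed gamma distribution (shape 2, scale 1.5). A location parameter is then estimated by maximising two misspecified likelihoods, a normal one and a Laplace one, each with fixed scale; for these two families the location estimate does not depend on the scale. Each QMLE settles at its own KLIC minimising value. For the normal QMLE this is the population mean, 3, and for the Laplace QMLE it is the population median, about 2.52. Which "true value", if either, the QMLE recovers therefore depends on the assumed distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.gamma(shape=2.0, scale=1.5, size=20_000)  # skewed "true" distribution (assumed)
grid = np.arange(1.0, 5.0, 0.002)                 # candidate values of the location mu

def normal_loglik(mu):
    """Normal quasi log-likelihood in mu, unit variance, constants dropped."""
    return -0.5 * np.sum((y - mu) ** 2)

def laplace_loglik(mu):
    """Laplace quasi log-likelihood in mu, unit scale, constants dropped."""
    return -np.sum(np.abs(y - mu))

def qmle(loglik):
    """Maximise a quasi log-likelihood over the grid of location values."""
    values = np.array([loglik(mu) for mu in grid])
    return grid[np.argmax(values)]

mu_normal = qmle(normal_loglik)
mu_laplace = qmle(laplace_loglik)
print(f"normal QMLE:  {mu_normal:.3f}  (sample mean   {y.mean():.3f})")
print(f"Laplace QMLE: {mu_laplace:.3f}  (sample median {np.median(y):.3f})")
```

A grid search is used only to keep the sketch free of optimiser dependencies. Any numerical maximiser would serve, since both quasi log-likelihoods are concave in mu.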
In chapter 2 we discuss the literature on linear SEMs and the interrelationship between the three stage least squares, full information MLE and full information instrumental variables estimators. The aim is to demonstrate the line of argument by which previous authors have established the consistency and asymptotic normality of the MLE in this situation. This work would appear a logical starting point for deriving analogous results for nonlinear models, and so we need to identify at which stages of these arguments linearity is crucial. We also consider the advantages of estimating equations simultaneously (full information (FI) estimation) as opposed to individually (limited information (LI) estimation). In this thesis we focus purely on full information estimators.
In chapter 3 we survey previous explorations of the properties of these three estimators in nonlinear models. Amemiya (1977) has shown that the instrumental variable interpretation of MLE does not persist to nonlinear models, and so Hausman's (1974) proof of the consistency of the MLE does not generalise from linear to nonlinear models. Phillips (1982) has shown that there must exist classes of distributions for which ML estimation under normality provides consistent estimates. However very little is known … argue that the approach taken by Phillips (1982) cannot be extended to provide information on this issue. We also consider the conditions under which an asymptotic theory for nonlinear models can be developed.
Chapter 4 contains an outline of the necessary results from the misspecified model literature. We show that the focus of our work is different from that of White (1982). He derives conditions for the convergence in probability of the QMLE to the KLIC minimising value, whereas we examine the conditions under which this value is in fact the true value. We also explore the difficulty of verifying second order conditions for consistency, and the use of distribution free identification criteria to check these conditions. Attention is focused on the criteria developed by Brown (1983) for nonlinear-in-variables models.
In chapter 5 we consider various analytical approaches, alternative to that of Phillips (1982), for deriving conditions for the consistency of the MLE. We establish that there exists a family of weakly stationary true error processes, whose conditional distribution varies over time, for which the MLE under the assumption of independently and identically distributed (i.i.d.) normal errors provides consistent estimates. However the analytical derivation of nonnormal i.i.d. true error distributions, for which ML estimation under normality retains these desirable properties, depends on the nature of the reduced form. If it can be written down explicitly then we can find true distributions for which NLFIML is consistent, although the class is likely to be much narrower than its linear model …
In chapter 6 we consider the case where the reduced form is implicit. We show that the condition for consistency involves all the moments of the distribution. In this case the analytical results available are that NLFIML is consistent when the model is correctly specified, or if the error is from the class of distributions considered by Phillips. However Phillips' proof only establishes the existence of such a class, and as its exact nature varies from case to case, our results suggest that, if we require consistent and asymptotically normal estimates, NLFIML should not be used when the reduced form is implicit. We explore the conditions for a set of structural equations, such as (1), to imply a uniquely defined reduced form. An examination of the work of Gale and Nikaido (1968) shows that these conditions are more stringent than is usually recognised in the econometrics literature. Finally we consider the conditions for the asymptotic normality of NLFIML. White (1983) observes the importance of consistent estimation of the first moment for that of the covariance of the QMLE. Whilst White's analysis contains an algebraic slip, the essence of his comments retains its validity. Without consistent estimates of the covariance, traditional testing procedures based on the parameter estimates break down. In contrast, NL3SLS is consistent and asymptotically normal under the same moment conditions as in the linear model, and so would appear the preferred estimator.
Chapter 7 contains a discussion of the conditions under … extended to dynamic models. We examine the types of dynamic processes for which we can apply a version of the strong law of large numbers, and so replicate our earlier analysis for static models. Current practice is to employ either martingale or mixing process arguments. McLeish (1975) has shown both types of processes to be mixingales, for which the desired law of large numbers can be derived. White and Domowitz (1983) have advocated the use of mixing processes, as they have the advantage that functions of them are themselves mixing processes, and so their use involves one basic assumption about y_t. The martingale arguments, by contrast, entail a series of assumptions about functions of y_t, invariably without examining their consequences for the underlying series. However we argue, using some results due to Jones (1976), that, contrary to the view apparently expressed by White and Domowitz, the theoretical validation of whether a particular series generated by a model is in fact a mixing process is likely to prove impossible.
This chapter also contains an extension of a proof by Heijmans and Magnus (1983a) of the consistency of the MLE in correctly specified models, under weak conditions on the underlying process, to the case of misspecified models. We show that the MLE converges to the KLIC minimising value in their framework. Finally, we consider the conditions for asymptotic normality of the QMLE in dynamic models. In particular we focus attention on the choice of scaling factor. White and Domowitz (1983) present a central limit theorem that requires the scaling factor to be a constant multiplied by the square root of the sample size. They … nonnormal asymptotic distribution. We argue, using the work of Hall and Heyde (1981), that this need not be the case.
In chapter 8 we argue that the information matrix test suggested by White (1982) is a natural test of model specification when employing the pseudo maximum likelihood estimation strategy advocated by Gourieroux, Monfort and Trognon (1984a) for the nonlinear regression model. We calculate the appropriate tests for the Poisson model example considered by Gourieroux, Monfort and Trognon (1984b). The resulting tests of distribution are compared with goodness of fit tests. We compare the higher order likelihood derivative tests (suggested by Chesher, 1983) based on the standard normal likelihood with the tests based on Edgeworth expansions (Kiefer and Salmon, 1983), and show that they coincide for tests of the third and fourth moments but not for the fifth. Finally, it is shown that the decomposition of the information matrix test in the linear regression model, demonstrated by Hall (1982), can be extended to its nonlinear counterpart.
Chapter 9 contains some conclusions, after which some [...]
2. STATISTICAL PROPERTIES OF ESTIMATORS AND LINEAR MODEL RESULTS
2.1 INTRODUCTION
The properties of, and relationships between, maximum likelihood (ML), least squares (LS) and instrumental variables (IV) have been explored at length in the literature for the linear model. It is well known that all three can be considered IV estimators, which provides a convenient proof of their consistency and asymptotic normality provided the error process has mean zero. Whereas ML under normality is the most efficient if the specification is correct, a class of IV estimators, including LS, is asymptotically equivalent to it. In this chapter we outline the basis of these results to illustrate both why linearity delivers such powerful results and why the type of arguments used cannot necessarily be generalised to the nonlinear setting. We also introduce and discuss the criteria for the choice of estimators, identification, and full or limited information estimation of systems of equations, the basic theoretical issues of which are relevant to all models.
2.2 Choice of Estimators in Classical Statistics
The majority of econometric theory is based on classical statistics. Probability statements have a frequentist interpretation, as the situation envisaged is one in which the researcher can generate unlimited data by repeating the experiment under identical conditions. In econometrics the data are observed passively, and so it is [...] stationarity, before the classical framework can be used.
This done, we hypothesise a probability model of the form $q(y, x, \alpha) = u$, with assumptions about $u$, $y$, $x$ and $q(\cdot)$, to explain the observed relationships between economic variables. The model is indexed by an unknown parameter vector $\alpha$, and the aim of classical statistics is to reduce our uncertainty about $\alpha$ by point and interval estimation using information in the data. The point estimate of $\alpha$, $\hat{\alpha}$, is a function of random variables and so is itself stochastic. The interval estimate, or hypothesis test, gives an idea of the sampling distribution of $\hat{\alpha}$, and so of the degree to which $\hat{\alpha}$ evaluated at the realised data values is a "true" reflection of $\alpha$.
We can construct any number of estimators from the data, but as our inference depends on $\hat{\alpha}$ it is desirable to have some method of "screening out" poor estimators. The classical criterion for achieving this is to require $\hat{\alpha}$ to be (i) unbiased: $E(\hat{\alpha}) = \alpha$, and/or (ii) consistent: $\operatorname{plim} \hat{\alpha} = \alpha$. The estimator chosen is the most efficient (in the sense of having minimum variance) of those satisfying (i) and (ii).
In econometric models an estimator is usually a complicated function of the error random variables, making its small sample distribution analytically intractable, and so the discussion is limited to large sample properties, namely consistency and asymptotic efficiency. The problem of interval estimation reduces to finding the conditions for consistency and asymptotic normality of $\hat{\alpha}$ under particular circumstances. The argument is that whilst we may know nothing of its small sample behaviour, an estimator is [...]
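A simple illustration (not part of the original argument) of how criteria (i) and (ii) can come apart: the normal-theory ML estimator of a variance divides by $T$ rather than $T-1$. Assuming i.i.d. $N(0,1)$ data, its expectation is $(T-1)/T$, so it is biased in small samples, yet it converges in probability to the true value of 1 as $T$ grows. The sample sizes, number of replications and seed below are arbitrary choices for the sketch.

```python
import random

def ml_variance(sample):
    """Normal-theory ML variance estimator: divides by T, not T - 1."""
    mean = sum(sample) / len(sample)
    return sum((v - mean) ** 2 for v in sample) / len(sample)

random.seed(42)  # arbitrary seed, for reproducibility only

# (i) Bias: average the estimator over many small samples (T = 5).
T_small, reps = 5, 20000
estimates = [ml_variance([random.gauss(0.0, 1.0) for _ in range(T_small)])
             for _ in range(reps)]
mean_small = sum(estimates) / reps  # close to (T-1)/T = 0.8, not 1

# (ii) Consistency: a single estimate from one very large sample.
large_sample_estimate = ml_variance([random.gauss(0.0, 1.0) for _ in range(200000)])

print(f"average estimate, T = {T_small}: {mean_small:.3f} (theory {(T_small - 1) / T_small})")
print(f"estimate, T = 200000: {large_sample_estimate:.3f} (true value 1)")
```

The estimator is therefore acceptable under criterion (ii) but not under criterion (i), which is why large sample theory concentrates on consistency.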
However, any interval estimation using asymptotic results requires the assumption that the sample size is indeed large enough, although this is rarely checked. Asymptotic theory can be regarded as providing an approximation to the finite sample result. In any particular situation, better approximations can be developed from the asymptotic results by using Edgeworth expansions to analyse the effects of the largest asymptotically negligible terms in the distribution function of the estimator.
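To make the idea concrete, the sketch below (our own illustration, not the estimators studied later) uses the simplest case: the standardised mean of $n$ i.i.d. observations with skewness $\kappa_3$, for which the first Edgeworth correction gives $P(\sqrt{n}(\bar{x}-\mu)/\sigma \le x) \approx \Phi(x) - \phi(x)\kappa_3(x^2-1)/(6\sqrt{n})$. Assuming Exp(1) data ($\mu = \sigma = 1$, $\kappa_3 = 2$), the exact distribution is available because the sum is Gamma($n$, 1), so both approximations can be compared with the truth.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exact_cdf(x, n):
    """P(sqrt(n)(mean - 1) <= x) for n iid Exp(1) draws.

    The sum S of the draws is Gamma(n, 1), so P(S <= s) = 1 - sum_{k<n} e^{-s} s^k / k!.
    """
    s = n + x * math.sqrt(n)
    if s <= 0.0:
        return 0.0
    term, total = math.exp(-s), 0.0
    for k in range(n):
        total += term          # adds e^{-s} s^k / k!
        term *= s / (k + 1)
    return 1.0 - total

def edgeworth_cdf(x, n, skewness):
    """Normal approximation plus the first (order n^{-1/2}) Edgeworth correction."""
    return normal_cdf(x) - normal_pdf(x) * skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))

n, skew = 10, 2.0  # Exp(1) has skewness 2
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  exact={exact_cdf(x, n):.4f}  "
          f"normal={normal_cdf(x):.4f}  edgeworth={edgeworth_cdf(x, n, skew):.4f}")
```

At $x = 0$ with $n = 10$ the normal approximation misses the exact probability by about 0.04, while the corrected approximation is much closer.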
2.3 Identification
The analysis of the properties of estimators presupposes that the parameters can be uniquely determined from the data or, in statistical parlance, that the model is identified. Economic theory has limited our attention to a particular family of probability distributions, termed the model, but what we seek is the structure, the particular distribution, most likely to have generated the data. The problem of lack of identification is essentially one of observational equivalence. This arises when two distinct structures imply the same distribution for the observed variables, and so are indistinguishable from sample data. A structure is identifiable if, and only if, there are no observationally equivalent structures, in which case the parameters can be uniquely determined from the data.
A well known example of lack of identification arises when the common factor restriction occurs in ARMA models. Consider the stationary ARMA(1,1) model:
$$y_t = \phi y_{t-1} + e_t + \theta e_{t-1}. \qquad (3)$$
By repeated substitution for lagged values of $y$, (3) can be written as
$$y_t = e_t + (\phi + \theta) \sum_{j=0}^{\infty} \phi^j e_{t-j-1}.$$
Any structure for which $\phi = -\theta$ is not identifiable, as then $y_t$ is white noise. This problem can occur, with the same consequences for identification, in the more general model
$$\Phi(L) y_t = \Theta(L) e_t,$$
if $\Phi(L) = \gamma(L)\Phi^*(L)$ and $\Theta(L) = \gamma(L)\Theta^*(L)$. The model cannot be identified because of the common roots shared by the polynomials $\Phi(L)$ and $\Theta(L)$.
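The common factor problem in (3) can be checked numerically. The sketch below (an illustration with arbitrary parameter values) computes the moving average weights implied by the expansion of (3), $\psi_0 = 1$ and $\psi_{j+1} = (\phi + \theta)\phi^j$, truncates them, and derives the autocorrelations. Every pair on the line $\phi = -\theta$ yields exactly the autocorrelations of white noise, so the data cannot distinguish between them.

```python
def psi_weights(phi, theta, n_terms):
    """MA(infinity) weights of y_t = phi*y_{t-1} + e_t + theta*e_{t-1}:
    psi_0 = 1 and psi_{j+1} = (phi + theta) * phi**j, as in the expansion of (3)."""
    return [1.0] + [(phi + theta) * phi ** j for j in range(n_terms - 1)]

def autocorrelations(phi, theta, max_lag, n_terms=500):
    """Autocorrelations rho_0..rho_max_lag computed from truncated psi weights."""
    psi = psi_weights(phi, theta, n_terms)
    gamma = [sum(psi[j] * psi[j + k] for j in range(n_terms - k))
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]

# Structures on the line phi = -theta: all observationally equivalent to white noise.
for phi, theta in [(0.5, -0.5), (-0.3, 0.3), (0.9, -0.9)]:
    print(phi, theta, [round(r, 6) for r in autocorrelations(phi, theta, 3)])

# A structure off that line is distinguishable: its lag-1 autocorrelation is nonzero.
print(0.5, 0.2, [round(r, 6) for r in autocorrelations(0.5, 0.2, 3)])
```

For the last case the lag-1 value agrees with the standard ARMA(1,1) formula $\rho_1 = (1+\phi\theta)(\phi+\theta)/(1+2\phi\theta+\theta^2)$.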
The problem of lack of identification is essentially one of insufficient information to enable the parameters to be determined. This can be offset by introducing additional information into the problem, in the form of parameter restrictions. These can take two forms: nonstochastic restrictions on $\alpha$ and/or stochastic restrictions on the p.d.f. of $u$. For a structure to be model admissible, it must satisfy these restrictions, and it is hoped that sufficient restrictions can be imposed to reduce the number of model admissible structures to one.
Identification is a general statistical problem, but in econometrics it is normally associated with simultaneous equation models. For illustrative purposes we consider the [...]
$$B'y_t + \Gamma'x_t = u_t, \qquad t = 1, \ldots, T,$$
where $y_t$ is an $N \times 1$ vector of endogenous variables, $x_t$ is a $K \times 1$ vector of exogenous variables and $u_t$ is an $N \times 1$ vector of mean zero disturbances with contemporaneous covariance matrix $\Sigma$ and $E(u_t u_s') = 0$ for $t \neq s$. The reduced form for $y_t$ is
$$y_t = \Pi x_t + v_t, \qquad t = 1, \ldots, T,$$
where $v_t = (B')^{-1}u_t$ and $\Pi = -(B')^{-1}\Gamma'$. Note that we require $B$ to be nonsingular for there to be a unique reduced form associated with the structural equations. We return to the conditions for such a mapping between $y$ and $u$ in a more general setting in chapter 6. The reduced form is necessarily identified, and the identification of the structural equations depends on whether, given estimates of $\Pi$, we can uniquely determine $(B, \Gamma)$. The relationship between structural and reduced form parameters is given by $AW = 0$, where $A = [B' : \Gamma']$ and $W' = [\Pi' : I_K]$.
As the system stands there is insufficient information to estimate the parameters of the $i$th equation, $\alpha_i$, where $\alpha_i'$ is the $i$th row of $A$. They must satisfy the restrictions $\alpha_i'W = 0$, but as $\operatorname{rank}(W) = K$ there are only $K$ linearly independent restrictions on the $N+K$ elements of $\alpha_i$. However, if we know that the coefficients satisfy linear restrictions of the form $\alpha_i'\Phi = 0$, then this information can be used to achieve identification. The vector must then satisfy $\alpha_i'[W : \Phi] = 0$, and so a necessary and sufficient condition for $\alpha_i$ to be identified up to a scalar multiple is that $\operatorname{rank}[W : \Phi] = N+K-1$. The matrix $[A' : W]$ is a nonsingular matrix of dimension $N+K$, and so its columns form a basis for $\mathbb{R}^{N+K}$. We can therefore write
$$\Phi = A'\xi + W\eta,$$
and as $A\Phi = AA'\xi$, because $AW = 0$, $\operatorname{rank}(A\Phi) = \operatorname{rank}(\xi)$. This enables the condition for identification to be restated in a more convenient form. For $\operatorname{rank}[W : \Phi]$ to be $N+K-1$, we require there to be $N-1$ columns in $\Phi$ that are linearly independent both of each other and of $W$. We therefore require $\operatorname{rank}(A'\xi) = N-1$, but this in turn implies that $\operatorname{rank}(\xi) = \operatorname{rank}(A\Phi)$ must equal $N-1$. A necessary and sufficient condition for identification in this model is therefore $\operatorname{rank}(A\Phi) = N-1$.
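The rank condition $\operatorname{rank}(A\Phi) = N-1$ can be checked mechanically once $A = [B' : \Gamma']$ and the restriction matrix $\Phi$ are written down. The sketch below uses a hypothetical two-equation system with variables ordered $(y_1, y_2, x_1, x_2)$ and invented integer coefficients; exclusion restrictions become columns of $\Phi$ that pick out the excluded coefficients, and exact rational arithmetic avoids spurious rank decisions from rounding.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank_condition(A, Phi):
    """True if rank(A Phi) = N - 1, where N is the number of equations (rows of A)."""
    return rank(matmul(A, Phi)) == len(A) - 1

# Columns ordered (y1, y2, x1, x2); row i of A = [B' : Gamma'] holds equation i's coefficients.
A = [[2, -1, -2, 0],     # equation 1: excludes x2
     [-1, 1, 0, -2]]     # equation 2: excludes x1
excl_x1 = [[0], [0], [1], [0]]   # Phi for "coefficient on x1 is zero"
excl_x2 = [[0], [0], [0], [1]]   # Phi for "coefficient on x2 is zero"
print(rank_condition(A, excl_x2), rank_condition(A, excl_x1))

# If x2 appears in no equation, excluding it from equation 1 carries no information.
A_bad = [[2, -1, -2, 0],
         [-1, 1, 0, 0]]          # equation 2 now excludes both x1 and x2
excl_both = [[0, 0], [0, 0], [1, 0], [0, 1]]
print(rank_condition(A_bad, excl_x2), rank_condition(A_bad, excl_both))
```

In the second system equation 1 still satisfies the order condition (one exclusion for $N - 1 = 1$), yet the rank condition fails, which is exactly the case the rank formulation is designed to catch.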
Note that we have sought identification up to a scalar multiple because this type of operation on the parameter vector does not alter the content of the equations. An alternative is to fix one parameter to a set value, for instance unity, and require unique identification, because with this normalisation of the equation, multiplying the remaining coefficients in the equation by a scalar alters the nature of the structural equations.
This condition relies on the nonstochastic restrictions $\alpha_i'\Phi = 0$ and the stochastic restriction that $E(u_t) = 0$. An alternative motivation for the result is based on the idea of observational equivalence. If the model is identified, then the transformed structural equations
$$FB'y_t + F\Gamma'x_t = Fu_t$$
($F$ nonsingular) should only be observationally equivalent to the original structure if $F = I$. This can be checked by examining the first and/or second moments of the transformed system. The first moment approach gives the rank condition already derived. The second moment approach uses the fact that if two structures are observationally equivalent, $u_t$ and $Fu_t$ must have the same covariance matrix. However, even in the unlikely event of our possessing detailed knowledge of the second moment of $u_t$, this approach yields insufficient restrictions, as $E(u_t u_t')$ has only $N(N-1)/2$ distinct off-diagonal elements; even if we assume $\Sigma = \sigma^2 I$, we only reduce the class of admissible $F$ to the orthogonal matrices. Identification could then be achieved by assuming the system to be recursive, so that $B$ is triangular.
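The limitation of the second moment approach can be seen directly. Assuming $\Sigma = \sigma^2 I$ with $N = 2$ (an illustrative choice of $\sigma^2$ and rotation angle), any orthogonal $F$ leaves the covariance $F\Sigma F'$ of $Fu_t$ unchanged, so second moments cannot rule it out, whereas a nonsingular but non-orthogonal $F$ does change it. The helper names are our own.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def cov_after_transform(F, Sigma):
    """Covariance matrix of F u_t when u_t has covariance Sigma: F Sigma F'."""
    return matmul(matmul(F, Sigma), transpose(F))

sigma2 = 2.0                                    # illustrative value of sigma^2
Sigma = [[sigma2, 0.0], [0.0, sigma2]]          # Sigma = sigma^2 I with N = 2
a = 0.7                                         # arbitrary rotation angle
F_orth = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]
F_shear = [[1.0, 0.5], [0.0, 1.0]]              # nonsingular but not orthogonal

print(cov_after_transform(F_orth, Sigma))   # equals Sigma up to rounding
print(cov_after_transform(F_shear, Sigma))  # differs from Sigma
```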
Our original derivation is specific to linear systems, makes only a first moment restriction on the errors, and uses no further distributional assumptions. Alternatively, we can condition on the distribution of the errors and derive conditions for local identification of the model. Rothenberg (1971) and Bowden (1973) have demonstrated that the parameter vector, $\alpha$, is locally identified at $\alpha_0$ if the information matrix, defined as minus the expected value of the Hessian of the log likelihood, is positive definite at that point. Rothenberg (1971) shows that if $u_t$ is normally distributed then the rank condition again results for the linear model. We return to these arguments later in our discussion of the conditions for consistency of an estimator.
2.4 Information & Estimation
Having considered the identification of a simultaneous equations model, we now examine the methods suggested for its estimation. In practice there are three main approaches: least squares (or minimum distance), instrumental variables and maximum likelihood. Within the normal linear model these three are closely related and, before exploring the extent to which this relationship persists in the nonlinear setting in chapter 3, we first outline the arguments used to establish the properties of these estimators in the linear model.
As in the identification stage, the proposed methods differ in their explicit distributional assumptions. Least squares and instrumental variables are distribution free, in the sense that assumptions are only made about the first two moments of the error process. However, the exogeneity of certain variables will be crucial to the construction of these estimators. It has therefore been implicitly assumed that the factorisation of the joint distribution into the conditional and marginal densities has produced a sequential cut on the parameters of this model. Normality is, of course, sufficient for this, but in some cases, e.g. the multivariate t, the cut will not occur (see Engle, Hendry and Richard, 1983).
In utilising the extra information about the distribution in ML, one would intuitively expect to produce more efficient estimators if the assumption is correct, but at the expense of bias if it is false. This robustness/efficiency tradeoff is present in the linear model in small [...]
examine the links between identification, information and estimation, the ideas behind which are relevant to all models.
The efficiency of an estimator clearly depends on the amount of information used. In our discussion of identification we were solely concerned with whether we had sufficient information to be able to determine the unknown parameters uniquely from the data. The distinction then was between just identification and underidentification. For our discussion of estimation we need to distinguish a third situation, namely that of overidentification. This occurs when there is more than enough independent information to identify the parameters. For an estimation procedure to be efficient it will have to take account of all these restrictions, as the use of one set of just identifying restrictions does not guarantee that the remaining independent restrictions on an equation will be satisfied. In the linear model the properties of LS estimators are closely related to the degree of identification, as both two and three stage LS (2SLS and 3SLS) are undefined when the system is underidentified, but coincide when the system is just identified. The existence of estimators in all models will depend on the number of observations, or rather the amount of information, relative to the number of variables. LS and ML break down in the undersized sample case, where there are fewer observations than exogenous variables, and in the course of this chapter we note the methods used to overcome this problem.
There may similarly be an information loss from [...]. Limited information (LI) techniques ignore the information contained in the rest of the system about a particular equation, and so will never be more efficient than full information (FI) methods, which incorporate all restrictions. Against this has to be set the fact that our specification is often tentative, and so some restrictions may be incorrect. The price of the efficiency of FI may well be a lack of robustness, as it allows any erroneous restrictions on one equation to affect the estimation of the whole system. Sims (1980) has argued for the need to match the estimation approach to the manner in which restrictions are placed. If the system is treated equation by equation at the specification stage, which defines the restrictions, it should then be estimated by an LI method. Typically a system with an LI specification but estimated by FI methods will not appear to be the appropriate formulation when submitted to model diagnostics. The a priori restrictions should therefore be placed by consideration of the entire system. Sims sees the difficulty of making such restrictions as further support for his reduced form estimation using vector autoregressions. In this thesis we are concerned purely with the properties of FI estimators.
2.5 LS, IV and ML in the normal linear model
Within the normal linear SEM there is a close relationship between the LS, IV and ML estimators. Hausman (1974) has shown that both 3SLS and FIML can be considered as IV estimators, and this approach will prove convenient for examining the consistency and asymptotic normality of the estimators. [...]
as approximations to FIML, and his "estimator generating equation" approach highlights the loss of information, and so the (small sample) inefficiencies, of 3SLS and IV. In all of the subsequent analysis, systems of equations are assumed to be identified.
Consider the model
$$YB + X\Gamma = U, \qquad (4)$$
where $Y$ is a $T \times N$ matrix of jointly dependent variables, $X$ is a $T \times K$ matrix of predetermined (weakly exogenous) variables, $U$ is a $T \times N$ matrix of structural disturbances of the system, $T$ is the number of observations, $B$ is assumed to be nonsingular, $E(X'U) = 0$, and $E[\operatorname{vec}(U)\operatorname{vec}(U)'] = \Sigma \otimes I_T$. We are therefore allowing contemporaneous but not intertemporal correlation between disturbances. The equation used in our discussion of identification in SEMs in section 2.3 is the transpose of the $t$th row of (4). If we impose normalisation, then the $i$th equation of the system can be written as
$$y_i = Z_i\delta_i + u_i, \qquad (i = 1, \ldots, N)$$
and the whole system as
$$y = Z\delta + u, \qquad (5)$$
where
$$Z = \operatorname{diag}(Z_1, \ldots, Z_N), \qquad Z_i = [Y_i \;\; X_i], \qquad \delta_i = [\beta_i' \;\; \gamma_i']',$$
$Y_i$ and $X_i$ contain the endogenous and predetermined variables included on the right-hand side of the $i$th equation, $y_i$ and $u_i$ are the $i$th columns of $Y$ and $U$ respectively, $\operatorname{vec} Y = y$, $\operatorname{vec} U = u$, $\delta = (\delta_1', \ldots, \delta_N')'$, and $\beta_i$, $\gamma_i$ are the unrestricted coefficients on the endogenous and predetermined variables in the $i$th equation. Let the reduced form associated with this system be
$$Y = X\Pi' + V, \qquad (6)$$
where $V = UB^{-1}$ and $\Pi' = -\Gamma B^{-1}$.
Brundy and Jorgenson (1971) define the instrumental variable estimator of $\delta$ as $d$, the solution to the equations
$$(W'Z)d = W'y, \qquad (7)$$
where $W$ is the matrix of instruments satisfying the following conditions:
(i) $\operatorname{plim} T^{-1}W'u = 0$,
(ii) $\operatorname{plim} T^{-1}W'W$ is finite and nonsingular,
(iii) $\operatorname{plim} T^{-1}W'X$ is finite.
Therefore
$$d = (W'Z)^{-1}W'y, \qquad d = \delta + (W'Z)^{-1}W'u,$$
[...]
to $W'u/\sqrt{T}$, we have
$$\sqrt{T}(d - \delta) \xrightarrow{d} N\left(0,\; \operatorname{plim}\left(\frac{W'Z}{T}\right)^{-1}\left(\frac{W'(\Sigma \otimes I_T)W}{T}\right)\left(\frac{Z'W}{T}\right)^{-1}\right).$$
The IV estimator is consistent and asymptotically normally distributed, provided the conditions on $W$ and on the first two moments of $u$ are satisfied.
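A minimal sketch of (7) in the scalar case, assuming a hypothetical single equation $y = z\beta + u$ in which the regressor $z$ is correlated with $u$ and a variable $x$ is available that satisfies condition (i). With one instrument, $d = (W'Z)^{-1}W'y$ reduces to a ratio of sums, and the sandwich covariance above reduces to $s^2 (x'x/T)/(x'z/T)^2$ with $\Sigma \otimes I_T$ replaced by $s^2 I_T$. The data-generating values and seed are invented for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)          # arbitrary seed
T, beta = 20000, 2.0    # illustrative sample size and true coefficient

x = [random.gauss(0.0, 1.0) for _ in range(T)]   # instrument, independent of u
u = [random.gauss(0.0, 1.0) for _ in range(T)]   # structural disturbance
z = [xi + ui for xi, ui in zip(x, u)]            # endogenous regressor: E(z u) = 1
y = [beta * zi + ui for zi, ui in zip(z, u)]     # structural equation y = z*beta + u

ols = dot(z, y) / dot(z, z)   # plim = beta + E(zu)/E(z^2) = 2 + 1/2
d = dot(x, y) / dot(x, z)     # IV: d = (W'Z)^{-1} W'y with W = x

# Scalar sandwich: Var(sqrt(T)(d - beta)) = (W'Z/T)^{-1} (s^2 W'W/T) (Z'W/T)^{-1}
resid = [yi - d * zi for yi, zi in zip(y, z)]
s2 = dot(resid, resid) / T
avar = s2 * (dot(x, x) / T) / (dot(x, z) / T) ** 2
se_d = (avar / T) ** 0.5

print(f"OLS {ols:.3f}, IV {d:.3f} (se {se_d:.4f}), true value {beta}")
```

Least squares corresponds to the choice $W = Z$, which violates condition (i) here and so converges to 2.5 rather than 2, while the IV estimate settles near the true value.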
Brundy and Jorgenson also prove that for $W$ to yield an asymptotically efficient $d$, it must be chosen so that the $i$-$j$th block of $W$, $W_{ij}$, is equal to $(W_{ij1}, W_{ij2})$, where
(a) $\operatorname{plim} T^{-1} W_{ij1}'X = \sigma^{ij}\pi_j' \operatorname{plim} T^{-1}X'X$,
(b) $\operatorname{plim} T^{-1} W_{ij2}'X = \sigma^{ij} \operatorname{plim} T^{-1}X_j'X$,
where the $i$-$j$th elements of $\Sigma$ and $\Sigma^{-1}$ are $\sigma_{ij}$ and $\sigma^{ij}$ respectively, and $\pi_j$ denotes the reduced form coefficients of $Y_j$, so that $Y_j = X\pi_j + V_j$. One possible selection is to put $W_{ij} = [\hat{\sigma}^{ij}X\hat{\pi}_j, \hat{\sigma}^{ij}X_j]$, where $\hat{\pi}_j$ and $\hat{\sigma}^{ij}$ are consistent estimators of $\pi_j$ and $\sigma^{ij}$. Of course the 3SLS estimator,
$$\hat{\delta}_{3SLS} = \left[Z'\left(S^{-1} \otimes X(X'X)^{-1}X'\right)Z\right]^{-1} Z'\left(S^{-1} \otimes X(X'X)^{-1}X'\right)y,$$
falls into this class. At the first stage the reduced form is estimated by OLS to derive $\hat{\pi}_j$. Each structural equation is then estimated individually by the IV estimator with $W = [X\hat{\pi}_j, X_j]$: this gives the 2SLS (limited information) estimators $d_i$ of $\delta_i$, $i = 1, \ldots, N$. The consistent estimator $S$ of $\Sigma$ is constructed by putting its $i$-$j$th element equal to
$$s_{ij} = T^{-1}(y_i - Z_i d_i)'(y_j - Z_j d_j),$$
the sample covariance of the 2SLS residuals from the $i$th and $j$th structural equations. Provided the structural equations are just identified, 3SLS uses the most efficient* estimator of $\pi_j$ in the first stage. However, it is shown on page 32 that it is not the most efficient estimator in small samples, although all IV estimators of the form above are equally efficient asymptotically.
* Throughout this thesis we refer to the estimator with the minimum (asymptotic) variance as being (asymptotically) most efficient.
To derive the ML estimator for this model we assume that $U$ is distributed multivariate normal. The log likelihood for the model in (4) is therefore
$$L(B, \Gamma, \Sigma) = \text{const} + \frac{T}{2}\log\det(\Sigma^{-1}) + T\log\lvert\det B\rvert - \frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}(YB + X\Gamma)'(YB + X\Gamma)\right].$$
The first order conditions for optimisation are then
$$\frac{\partial L}{\partial B} = T(B')^{-1} - Y'(YB + X\Gamma)\Sigma^{-1} = 0, \qquad (8)$$
$$\frac{\partial L}{\partial \Gamma} = -X'(YB + X\Gamma)\Sigma^{-1} = 0, \qquad (9)$$
$$2\frac{\partial L}{\partial \Sigma^{-1}} = T\Sigma - (YB + X\Gamma)'(YB + X\Gamma) = 0. \qquad (10)$$
To establish the IV interpretation of FIML, Hausman (1974) concentrates the first order conditions with respect to $\Sigma$. From (10),
$$\Sigma = T^{-1}(YB + X\Gamma)'(YB + X\Gamma),$$
and substituting this into (8), and combining the result with (9), gives the equations
$$\begin{bmatrix} -(B')^{-1}\Gamma'X' \\ X' \end{bmatrix}(YB + X\Gamma)\Sigma^{-1} = 0. \qquad (11)$$
In terms of our notation in model (5), in which the coefficients are stacked in vector form, (11) can be rewritten as
$$\bar{Z}'(\Sigma^{-1} \otimes I_T)(y - Z\delta) = 0,$$
which implies that the FIML estimator of $\delta$ is
$$\hat{\delta} = (W'Z)^{-1}W'y,$$
where $W' = \bar{Z}'(S \otimes I_T)^{-1}$, $\bar{Z} = \operatorname{diag}(\bar{Z}_1, \ldots, \bar{Z}_N)$, $\bar{Z}_i = [-X(\Gamma B^{-1})_i, X_i]$ with $(\Gamma B^{-1})_i$ denoting the columns of $\Gamma B^{-1}$ corresponding to $Y_i$, and $S = T^{-1}(YB + X\Gamma)'(YB + X\Gamma)$. The equations are nonlinear in $B$ and $\Gamma$ and so have to
be estimated iteratively, giving the estimator after the $k$th iteration as
$$\hat{\delta}_{k+1} = (W_k'Z)^{-1}W_k'y,$$
the instruments, $W_k$, being revised at each step by updating $\bar{Z}_i$ and $S$ using the parameter estimates from the previous iteration. We have assumed that the second order moments are finite and nonsingular where appropriate, and so $\hat{\delta}_k$ may be considered an IV estimator for every $k$, as it satisfies all the necessary requirements. The asymptotic normality and consistency of $\hat{\delta}$ follow from the arguments above, and so are guaranteed for a wide class of nonnormal error distributions.
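The iterated IV interpretation of FIML can be sketched for a small case. The code below assumes a hypothetical two-equation system, $y_1 = \beta_1 y_2 + \gamma_1 x_1 + u_1$ and $y_2 = \beta_2 y_1 + \gamma_2 x_2 + u_2$, with simulated data (all numerical values, the seed, the tolerance and the iteration cap are arbitrary). It starts from 2SLS, which is the same IV formula with $S^{-1}$ replaced by $I$ and OLS reduced form fitted values as instruments, and then repeatedly rebuilds $\bar{Z}_i$ from the restricted reduced form $X\Pi'$ implied by the latest estimates and $S$ from the latest residuals. The helper names are our own, and convergence of this fixed-point iteration is not guaranteed in general; here it relies on a large, well-identified sample.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def solve(a, b):
    """Solve a x = b (a given as a list of rows) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

# Hypothetical system: y1 = b1*y2 + g1*x1 + u1,  y2 = b2*y1 + g2*x2 + u2
random.seed(3)
T = 5000
b1, g1, b2, g2 = 0.5, 1.0, -0.4, 1.5
x1 = [random.gauss(0, 1) for _ in range(T)]
x2 = [random.gauss(0, 1) for _ in range(T)]
e1 = [random.gauss(0, 1) for _ in range(T)]
e2 = [random.gauss(0, 1) for _ in range(T)]
u1 = e1
u2 = [0.5 * p + 0.75 ** 0.5 * q for p, q in zip(e1, e2)]  # corr(u1, u2) = 0.5
den0 = 1 - b1 * b2                                        # det of B, nonzero
y1 = [(g1 * p + s + b1 * (g2 * q + v)) / den0 for p, q, s, v in zip(x1, x2, u1, u2)]
y2 = [(b2 * (g1 * p + s) + g2 * q + v) / den0 for p, q, s, v in zip(x1, x2, u1, u2)]

Ys = [y1, y2]
Z = [[y2, x1], [y1, x2]]  # Z_i = [Y_i  X_i]; delta_i = (beta_i, gamma_i)

def system_iv(zbar, sinv):
    """delta = (W'Z)^{-1} W'y with W' = Zbar'(S^{-1} kron I_T), assembled block by block."""
    rows, rhs = [], []
    for i in range(2):
        for w in zbar[i]:
            rows.append([sinv[i][j] * dot(w, col) for j in range(2) for col in Z[j]])
            rhs.append(sum(sinv[i][j] * dot(w, Ys[j]) for j in range(2)))
    d = solve(rows, rhs)
    return [d[0:2], d[2:4]]

def ols_fitted(target):
    """Unrestricted reduced form: OLS fitted values of target on (x1, x2)."""
    xs = (x1, x2)
    c = solve([[dot(p, q) for q in xs] for p in xs], [dot(p, target) for p in xs])
    return [c[0] * p + c[1] * q for p, q in zip(x1, x2)]

def restricted_fitted(delta):
    """Columns of X Pi' implied by the current structural estimates."""
    (c1, h1), (c2, h2) = delta
    den = 1 - c1 * c2
    f1 = [(h1 * p + c1 * h2 * q) / den for p, q in zip(x1, x2)]
    f2 = [(c2 * h1 * p + h2 * q) / den for p, q in zip(x1, x2)]
    return f1, f2

def s_inverse(delta):
    """Inverse of S = T^{-1} (residuals)'(residuals) for the 2 x 2 case."""
    res = [[Ys[i][t] - delta[i][0] * Z[i][0][t] - delta[i][1] * Z[i][1][t] for t in range(T)]
           for i in range(2)]
    s = [[dot(res[i], res[j]) / T for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]

# Start: equation-by-equation 2SLS (S^{-1} = I, OLS reduced-form instruments).
f1, f2 = ols_fitted(y1), ols_fitted(y2)
delta_2sls = system_iv([[f2, x1], [f1, x2]], [[1.0, 0.0], [0.0, 1.0]])

# Iterate: revise Zbar and S from the last estimates, then re-solve the IV equations.
delta, converged = delta_2sls, False
for k in range(100):
    f1, f2 = restricted_fitted(delta)
    new = system_iv([[f2, x1], [f1, x2]], s_inverse(delta))
    change = max(abs(p - q) for a, b in zip(new, delta) for p, q in zip(a, b))
    delta = new
    if change < 1e-10:
        converged = True
        break
delta_fiml = delta
print("2SLS:", delta_2sls)
print("FIML (iterated IV):", delta_fiml, "after", k + 1, "iterations")
```

At a fixed point the instruments built from $\hat{\delta}$ satisfy $\bar{Z}'(S^{-1} \otimes I_T)(y - Z\hat{\delta}) = 0$, which is the stacked form of the concentrated first order conditions, so the limit of the iteration is a FIML solution under the normality assumption used to derive it.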