Avoiding misspecifications and improving efficiency in hedonic and consumption models: applications of semiparametric methods


AVOIDING MISSPECIFICATIONS AND IMPROVING EFFICIENCY IN HEDONIC AND CONSUMPTION MODELS:

APPLICATIONS OF SEMIPARAMETRIC METHODS

KUO CHUEN LEE

A THESIS SUBMITTED FOR A DEGREE OF DOCTOR OF PHILOSOPHY


UMI Number: U048666


Published by ProQuest LLC 2014. Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC.


ABSTRACT

The objective of this thesis is to avoid misspecifications and to seek efficiency improvements in cross sectional and time series econometric applications using semiparametric methods. We restrict our attention to single equation models and the use of conditional moment restrictions as well as maximum likelihood methods. The first part of the thesis deals with cross sectional studies on the United Kingdom car market and the second part deals with time series studies of the United States consumption function. There are five main contributions of the thesis.


Dedicated to my parents and my wife,


ACKNOWLEDGEMENTS

In preparing this thesis, I am indebted to many people. Foremost on the list is Prof. Peter M. Robinson. The project would not have been undertaken at all had it not been for the encouragement and support from him at the initial stages. The intellectual debts to him are the most important and hardest to measure, and his tactful criticisms, helpful advice, thoughtful guidance and constant encouragement have had the greatest effect on many parts of this research.

I would also like to thank Prof. James Durbin and Dr. Martin Knot for being my supervisors and taking care of my welfare. I would like to thank my colleagues Miguel Delgado, Javier Hildago and Masako Kurosawa for some stimulating discussions and assistance in computer programming. The comments and ideas of Adrian Pagan, Lonnie Magee, John Lane, Steve Pudney, and Orazio Attanasio on some chapters of the draft have led to the improvement in the present thesis. I am also indebted to Dr. Charles Bean, Robert Blackman of Emmerson Hill Associates, Rosemary Roberts of Haymarket Magazine Limited and Vivien Maries of the Association for Consumer Research, for their help in various ways during the collection of the data.

TABLE OF CONTENTS

Abstract
Dedications
Acknowledgements
Table of Contents
List of Tables
List of Figures
Acronyms

1. Introduction
1.1 Preliminaries
1.2 Avoiding Misspecifications
1.3 Efficiency Improvements
1.4 The Scope and Outline of the Thesis

2. The Method of Kernels and Technical Complements
2.1 Introduction
2.2 Some Preliminaries
2.3 Estimation of Univariate and Multivariate Densities
2.4 Nadaraya-Watson Kernel Estimator For Conditional Moments
2.5 Asymptotic Properties of Density Estimator
2.6 Conditional Expectation
2.7 Derivative of Conditional Expectation
2.8 Higher-Order Kernels and Optimal Bandwidth
2.9 Dependent Observations

2.11 High Dimensionality

3. A Semiparametric Hedonic Approach
3.1 Introduction
3.2 A Non-Stochastic Model of Utility Maximization
3.3 Transformation Models
3.3.1 Heteroscedastic Box-Cox (HBC) Transformation Model
3.3.2 Heteroscedastic Inverse Hyperbolic Sine (HIHS) Transformation Model
3.3.3 Heteroscedastic Hyperbolic Box-Cox (HHBC) Transformation Model
3.4 Method of Moments and Maximum Likelihood Method
3.4.1 Consistent Instrumental Variable Estimates
3.4.2 Efficient Instrumental Variable Estimates
3.4.3 Nonparametric Smoothing and Bandwidth Selection
3.4.4 Maximum Likelihood Estimation
3.5 Results
3.6 Identification and Construction of Elasticities
3.7 Conclusion

4. Further Semiparametric Policy Analysis
4.1 Introduction
4.2 The Hedonic Price Function
4.3 Problems with the Two-Stage Procedures

4.5 Results
4.6 Conclusion

5. A Survey of Consumption Models with Rational Expectations
5.1 Introduction
5.2 The Stochastic Implications of the LCPIH
5.3 Asset Pricing and Consumption
5.4 Some General Problems
5.5 The Econometric Methods
5.6 General Observations
5.7 Studies Using Household Data

6. Semiparametric Analysis of a Consumption Model
6.1 Relationship Between Consumption and Real Interest Rates
6.2 The Model
6.3 Semiparametric ARCH (SARCH) and Bandwidth Selection
6.4 Likelihood Bandwidth Selection in SARCH Model
6.5 Inflation Expectations
6.6 Surprise Models and Conditional Variance
6.7 Results
6.8 Automatic Bandwidth Selection and Elasticity Estimates
6.9 Conclusion

7. Further Semiparametric Analysis of Surprise Consumption Function

7.3 Results: Using Standard Two Stage Methods
7.4 Nonparametric Regression and Reduction of Dimensionality
7.5 Estimation of Anticipated Terms
7.6 Estimation of Surprise Terms
7.7 Hausman and Robinson Test Statistics
7.8 Choice of Bandwidth, Kernels, Lag Window and Number
7.9 Results from Semiparametric Methods
7.10 Conclusion

8. Bandwidth Selection: Some Monte Carlo Results
8.2 The Pseudo Log-Likelihood and Cross-Validation Criteria
8.3 Mean Square Error Criterion
8.4 Monte Carlo Results

9. Some Open Problems

Data Appendices
Appendix A: Cross-Sectional Data
Appendix B: Time Series Data

LIST OF TABLES

3.1 Parametric Estimates for Running Cost Function
3.2 Parametric Estimates for Fuel Efficiency Function
3.3 Box-Cox and Inverse Hyperbolic Sine Estimates for the Running Cost Function
3.4 Box-Cox and Inverse Hyperbolic Sine Estimates for the Fuel Efficiency Function
3.5 Estimates for the Capital Cost Function
3.6 Constructed Price Elasticity of Demand For Attributes
3.7 Constructed Price Elasticity of Demand For Attributes
4.1 Parametric Estimates for the Hedonic Price Function
4.2 Semiparametric Estimates for the Hedonic Price Function
4.3 Willingness-To-Pay Estimates
4.4 Cross-Validated Hedonic Price and Willingness-To-Pay Estimates
6.1 Ordinary Least Squares Estimates: Dependent Variable is $\text{Infl}_t$
6.2 Autoregressive Conditional Heteroscedastic Estimates: Dependent Variable is $\text{Infl}_t$
6.3 Semiparametric Autoregressive Conditional Heteroscedastic Estimates: Dependent Variable is $\text{Infl}_t$

Dependent Variable is $\Delta c_t$
6.5 Ordinary Least Squares Estimates:
6.6 Instrumental Variable Estimates:
6.7 Instrumental Variable Semiparametric Estimates:
7.1 Regression Results Using Standard Two Stage Methods
7.2a Regression Results Using Standard Two Stage Methods
7.2b Diagnostic Tests Results
7.3 Specification Tests for Supplementary Regression
7.4 Principal Component Analysis
7.5 Semiparametric Regression Results for Model 1a and 2a
7.6a Semiparametric Regression Results for Model 3a
7.6b Semiparametric Regression Results for Model 3a
7.6c Semiparametric Regression Results for Model 3a
8.1a Results for Partly Linear Model
8.1b Results for Partly Linear Model
8.1c Results for Partly Linear Model
8.1d Results for Partly Linear Model
8.2a Results for Error-In-Variable Model
8.2b Results for Error-In-Variable Model
8.3a Results for GLS Model

LIST OF FIGURES

4.1a Plot of Semiparametric Estimates Against Bandwidth: Air Conditioned
4.1b Plot of Semiparametric Hg Estimates Against Bandwidth: Sunroof
4.1c Plot of Semiparametric Hg Estimates Against Bandwidth: Electric Window
4.1d Plot of Semiparametric Estimates Against Bandwidth: Automatic Transmission
4.1e Plot of Semiparametric Hg Estimates Against Bandwidth: 5 Speed
4.1f Plot of Semiparametric Estimates Against Bandwidth: Power Steering
4.1g Plot of Semiparametric H2 Estimates Against Bandwidth: Unleaded
4.1h Plot of Semiparametric H2 Estimates Against Bandwidth: British Made
4.2a Plot of Semiparametric Estimates Against Bandwidth: Air Conditioned
4.2b Plot of Semiparametric Estimates Against Bandwidth: Sunroof
4.2c Plot of Semiparametric Estimates Against Bandwidth: Electric Window
4.2d Plot of Semiparametric Estimates Against Bandwidth: Automatic Transmission

5 Speed
4.2f Plot of Semiparametric Estimates Against Bandwidth: Power Steering
4.2g Plot of Semiparametric Estimates Against Bandwidth: Unleaded
4.2h Plot of Semiparametric Estimates Against Bandwidth: British Made
4.3a Plot of Cost Estimates for [...] and [...] Against Bandwidth: 10% Increased in Fuel Efficiency
4.3b Plot of t-ratios for [...] and [...] Against Bandwidth:
4.4a Plot of Cost Estimates for [...] and [...] Against Bandwidth:
4.4b Plot of t-ratios for Hg and [...] Against Bandwidth:
4.5 Plot of Cost Estimates for [...] Against Bandwidth: Minimum Standard of 35 MPH
6.1a Plot of Criterion Function Values Against Bandwidth: for SARCH(1,1)
6.1b Plot of Criterion Function Values Against Bandwidth: for SARCH(1,1)
6.2a Plot of Hg Criterion Function Values Against Bandwidth: for Model 1
6.2b Plot of Criterion Function Values Against Bandwidth: for Model 1
6.2c Plot of Hg Criterion Function Values Against Bandwidth:

8.1 Plot of Ratio of Bias of CV Estimates to UNIT for Model 3
8.2 Plot of Ratio of Bias of CV Estimates to OPT for Model 3
8.3 Plot of Ratio of Bias of CV Estimates to GLS for Model 3
8.4 Plot of Ratio of Standard Deviation of CV Estimates to UNIT for Model 3
8.5 Plot of Ratio of Standard Deviation of CV Estimates to OPT for Model 3
8.6 Plot of Ratio of Standard Deviation of CV Estimates to GLS for Model 3
8.7 Plot of Ratio of MSE of CV Estimates to UNIT for Model 3
8.8 Plot of Ratio of MSE of CV Estimates to OPT for Model 3
8.9 Plot of Ratio of MSE of CV Estimates to GLS for

ACRONYMS

ACE      Alternating Conditional Expectations
ARCH     Autoregressive Conditional Heteroscedasticity
ARE      Asymptotic Relative Efficiency
ARMA     Autoregressive Moving Average
ASPE     Average Squared Prediction Error
BC       Box-Cox
CCAPM    Consumption Capital Asset Pricing Model
CRRA     Constant Relative Risk Aversion
CV       Cross-Validation
EVM      Errors-In-Variables Method
FIML     Full Information Maximum Likelihood
GARCH    Generalised ARCH
GCV      Generalised CV
GLS      Generalised Least Squares
GMM      Generalised MM
GN2SLS   Generalised Nonlinear 2SLS
HBC      Heteroscedastic Box-Cox
HHBC     Heteroscedastic Hyperbolic Box-Cox
HIHS     Heteroscedastic Inverse Hyperbolic Sine
i.i.d.   independent identically distributed
ISE      Integrated Squared Error
IV       Instrumental Variable
LCPIH    Life Cycle-Permanent Income Hypothesis
LM       Lagrange Multiplier
LR       Likelihood Ratio
MISE     Mean Integrated Squared Error
ML       Maximum Likelihood
MLS      Multiple Least Squares
MM       Method of Moments
MPG      Miles Per Gallon
MSE      Mean Square Error
NL2SLS   Nonlinear Two Stage Least Squares
NLIV     Nonlinear Instrumental Variable
NLLS     Nonlinear Least Squares
N-W      Nadaraya-Watson
OCE      Ordinal Certainty Equivalence
OLS      Ordinary Least Squares
QTM      Quadratic Transformation Model
RSS      Residual Sum of Squares
RT       Rules-of-the-Thumb
SARCH    Semiparametric ARCH
SML      Semiparametric Maximum Likelihood
TS       Two Stage
VAR


CHAPTER 1

INTRODUCTION


1.1 PRELIMINARIES

Econometrics is useful for policy analysis and counterfactual purposes. For policy studies in general, economists are interested in finding the effects of a change in some controlled instruments on the dependent variables. Notable examples of policy studies are the measurement of the environmental benefits of improving certain neighbourhood qualities and the impact upon car attributes brought about by government restrictions on fuel efficiency, in view of the hike in the price of petrol. As for counterfactual purposes, diagnostic or specification tests are usually employed to cross-examine empirical facts against theory. For example, economists are interested in finding whether anticipated and unanticipated inflation variables and stock prices matter in the consumption model.

One approach to econometric modelling is to formulate an economic model and end up with a relationship in parametric form relating the variables of interest. In the process, econometricians almost always place some additional restrictions on the economic model under study for empirical tractability, with little economic motivation. Generally, the restrictions are in the form of parametric assumptions regarding functional form or the distribution of some variables or the disturbances. The stronger the assumptions, the simpler the structure of the econometric model. For easy interpretability, the resulting models are usually of a very simple form, e.g., a linear or log-linear model. However, if one proceeds to conduct policy analysis based on the model, there are always some doubts as to whether these additional assumptions embodied in the model are valid. These doubts will inevitably affect one's faith in the estimates.

Sometimes, these additional assumptions imposed by the econometrician can be tested using the data. Diagnostic and specification tests are usually employed to perform the task. However, in many instances, successive rejections of these tests leave the econometrician stranded with little idea of how to proceed.

Another approach, known as measurement without or with little theory, is to specify a very general statistical model, seek the best representation of the relationship, and then reconcile the results with existing economic theory. The existing theory is not necessarily consistent with the best fitted models. Indeed, it may indicate that the best model is impossible within a certain theoretical framework. If the "best" model is inconsistent with the theory, then new explanations, and indeed a new theory, have to be offered. Unfortunately, the refined theory will often lead to empirical intractability unless some additional parametric assumptions are added. This leads us to the same problems as in the first approach.

Therefore, it would be desirable to proceed with policy studies or hypothesis testing without further assumptions besides those imposed or implied by the economic model. In this study, we use the economic framework as a benchmark for our studies; when economic information is lacking in assisting us in functional form selection, we resort to nonparametric techniques, which relieve us from making any further assumptions. When this is not possible, we will compromise by relaxing at least some restrictions so that we have a more flexible model. Since we combine the use of parametric and nonparametric components, the approach can be termed semiparametric.

Let $Y$ be a continuous random variable, $X$ a $k \times 1$ vector of continuous variables and $Z$ a $p \times 1$ vector of predetermined continuous independent variables. The econometrician observes the values of $Y$, $X$ and $Z$, namely $\{y_i;\ i = 1, \ldots, N\}$, $\{x_{ij}\}$ and $\{z_{ij}\}$, where $x_i = \{x_{ij};\ j = 1, \ldots, k\}$ and $z_i = \{z_{ij};\ j = 1, \ldots, p\}$. Let us assume for the moment that the $y_i$ and $w_i = (x_i, z_i)$ are related by the model:

$$T_y(y_i, \lambda) = T_w(w_i, \theta) + \varepsilon_i, \qquad i = 1, \ldots, N$$

where $\lambda$ and $\theta$ are some parameters to be estimated, $E[\varepsilon_i \mid w_i] = 0$, $E[\varepsilon_i \varepsilon_j \mid w_i] = 0$ for $i \neq j$, and $E[\varepsilon_i^2 \mid w_i] = \sigma^2(w_i)$. $T_y$ and $T_w$ are transformations or functions to be defined.

1.2 AVOIDING MISSPECIFICATIONS

Misspecification of the model usually leads to loss of consistency or efficiency. In general, consistent estimates can usually be obtained at the expense of a loss of efficiency. The existence of a true model is important in theoretical work. However, many believe that in empirical work the true model is generally unknown and the trade-off between consistency and efficiency does not really exist. Therefore, the preferred strategy for modelling should be to allow for possible misspecifications with minimal assumptions. This brings us to some statistical concepts which we will now discuss.

These three aspects of statistical modelling are best sum m arized by Stone (1985):

Flexibility is the ability of the model to provide accurate fits in a wide variety of situations, inaccuracy here leading to bias in estimation. Dimensionality can be thought of in terms of variance in estimation, the "curse of dimensionality" being that the amount of data required to avoid an unacceptably large variance increases rapidly with increasing dimensionality [...] or, as usually put, between bias and variance. [...] structure.

While flexibility is positively correlated with dimensionality, there is always some trade-off between these two aspects and interpretability in model building. Our foremost concern is to have "enough" flexibility to give us an accurate fit. Unfortunately, the more flexible a model is, the higher the dimension of the parameter space that one has to deal with. Furthermore, it would generally be much harder for one to visualize or comprehend the relationship between the dependent and independent variables. Therefore, the three aspects are not entirely independent of each other, and having a more flexible model will in general incur inflated variance and reduced interpretability. The particular class of semiparametric models that we employ, in general, trades off inflated variance (efficiency) for flexibility (consistency). We are particularly concerned with the following models in our applications:

(1) Method of Moments

The following models all have common conditional moment restrictions of the form $E[\varepsilon_i(w_i, y_i; \theta) \mid w_i] = 0$, where $\theta$ is the vector of parameters of interest. The most efficient method of moments (MM) estimator requires the estimation of $E[\,\cdots \mid w_i] = h(w_i; \theta)$, which is generally unknown.

(a) Transformation model

A transformation model can be expressed as

$$T_y(y_i, \lambda_y) = \sum_j \beta_j T_j(w_{ij}, \lambda_j) + \varepsilon_i, \qquad i = 1, \ldots, N \tag{1}$$

where $E[\varepsilon_i^2 \mid w_i] = \sigma^2$, $T_y$ and $T_j$ are some known transformations, $\lambda_y$ and the $\lambda_j$'s are some transformation parameters to be estimated, and the $\beta_j$'s are some unknown parameters to be estimated.

have the advantage of inducing flexibility but, as we have mentioned, this will increase the dimension of the parameters of interest. In the case of a parametric model with a single transformation parameter for each $T$, the increase in the number of parameters to be estimated is twofold. Interpretability is complicated by the fact that the relationship is between functions of $y$ and functions of the $w$'s.

Various transformations will be introduced in Chapter 3. It should be mentioned that the model is semiparametric only in the sense that no distributional assumptions are imposed on $\varepsilon_i$. The contribution from this relaxation should not be overlooked, the reasons being that many transformation models, e.g., the logarithmic transformation, exclude certain distributional assumptions, and estimators obtained by imposing distributions are not robust. The use of MM will at least guarantee consistent estimates and correct inference.

Nonlinear two stage least squares, a special case of MM, is an appropriate estimation procedure for transformation models. It involves finding "optimal" instruments $E[\,\cdots \mid w_i]$, which are usually assumed to be linear in $w$. However, the expectation is unknown in this case and we can apply the method of Robinson (1988e) in obtaining the optimal instruments rather than estimating the conditional expectations nonparametrically as in Newey (1987).
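To make the idea of a parametric family of transformations concrete, the following minimal sketch implements two textbook members of that family, the Box-Cox and the inverse hyperbolic sine transformations. The function names and the single-parameter forms are illustrative assumptions; the heteroscedastic versions actually used in the thesis are those defined in Chapter 3.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive y; the lam -> 0 limit is log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_hyperbolic_sine(y, lam):
    """IHS transform asinh(lam * y) / lam, defined for any real y.

    The lam -> 0 limit is the identity, so it is returned directly.
    """
    if abs(lam) < 1e-12:
        return y
    return math.asinh(lam * y) / lam

# lam = 1 is a shifted linear model, lam = 0 a log model.
linear_case = box_cox(2.0, 1.0)
log_case = box_cox(math.e, 0.0)
```

Unlike the Box-Cox form, the inverse hyperbolic sine accepts zero and negative values of y, which is one reason both families are worth comparing.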

(2) Partial Linear Model

Now, consider the following model which is partly linear:

$$y_i = x_i'\beta + T_z(z_{i1}, \ldots, z_{ip}) + \varepsilon_i, \qquad i = 1, \ldots, N$$


can be recovered. In the event that we have a large number of $z$'s in $T_z$ and only a medium-sized data set, we may have to employ a more restrictive model of the form

$$y_i = x_i'\beta + T_z(z_i'\alpha) + \varepsilon_i, \qquad i = 1, \ldots, N$$

where $\alpha$ is a vector of known coefficients. This device is to overcome some of the problems in nonparametric estimation. The issues of high dimensionality and inflated variance encountered in multivariate estimation are discussed in Chapter 2. This partly linear model is used in the time series study in Chapter 7. The $z$'s cannot be known with certainty in this case, and this rules out any dummy variables or a constant. This is a consequence of relaxing the assumption that $T_z$ is known. However, this consequence has a very useful purpose, as we shall see in Chapter 4. In many instances, one is only interested in the effect of policy changes in some or all of the $z$'s on the $y_i$. In other words, we are concerned with distributional changes of $y$ from, e.g., $z$ to $z^*$. Let $D$ be a vector of dummy variables and $\delta$ the coefficients. We may then be able to deal with dummy variables easily since we are only interested in

$$E[y] - E[y^*] = E[D\delta + T_z(z)] - E[D\delta + T_z(z^*)] = E[T_z(z) - T_z(z^*)].$$

Thus, a nonparametric policy analysis can be conducted using the suggestions of Stock (1985a). Unlike the case of MM, where the estimator is known to be most efficient within a class, the partly linear model estimator is only known to be root-N-consistent and asymptotically normally distributed (Robinson (1988a)).
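One common way to estimate the linear part of a partly linear model is a double-residual approach: smooth both y and x on z with a kernel regression, then regress the residual of y on the residual of x. The sketch below illustrates this under simplifying assumptions that are not taken from the thesis: scalar x and z, a Gaussian kernel, a fixed bandwidth, and simulated data. The names `nw` and `partly_linear_beta` are hypothetical.

```python
import math
import random

def nw(z0, zs, vs, h):
    """Nadaraya-Watson estimate of E[v | z = z0], Gaussian kernel, bandwidth h."""
    ws = [math.exp(-0.5 * ((z0 - z) / h) ** 2) for z in zs]
    return sum(w * v for w, v in zip(ws, vs)) / sum(ws)

def partly_linear_beta(y, x, z, h):
    """Double-residual estimate of beta in y = x*beta + T(z) + e (scalar x and z)."""
    ey = [yi - nw(zi, z, y, h) for yi, zi in zip(y, z)]   # y minus E[y | z]
    ex = [xi - nw(zi, z, x, h) for xi, zi in zip(x, z)]   # x minus E[x | z]
    return sum(a * b for a, b in zip(ex, ey)) / sum(a * a for a in ex)

# Simulated example: true beta = 2 and T(z) = sin(2z), which is never specified
# to the estimator.
random.seed(0)
n = 400
z = [random.uniform(0.0, 3.0) for _ in range(n)]
x = [zi + random.gauss(0.0, 1.0) for zi in z]
y = [2.0 * xi + math.sin(2.0 * zi) + random.gauss(0.0, 0.1) for xi, zi in zip(x, z)]
beta_hat = partly_linear_beta(y, x, z, h=0.2)
```

The nonlinear component can then be recovered, if needed, by smoothing `y - x * beta_hat` on z with the same `nw` helper.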

1.3 EFFICIENCY IMPROVEMENTS


As mentioned above, the problem in applied work is that the true functional form is seldom known, although economic theory may provide or impose some restrictions on the functional form. In many other instances, there is virtually no information to guide one in selection. Given the level of generality, the usual approach is to use an ad hoc functional form. In many instances, especially when the functional form itself is not of intrinsic interest, e.g., when a nuisance function is involved, a linear functional form is employed. A more respectable approach is to use diagnostic and specification tests to aid one in selecting the correct functional form. The problem is that when successive tests are rejected, one finds it difficult to suggest a suitable functional form.

Recent work has progressed in the direction of adaptive estimation. In simple terms, the adaptive semiparametric estimator has the same efficiency under unknown distribution and/or functional form as a parametric estimator under known distribution and/or functional form. Some of our models are of this class, but we limit ourselves to maximum likelihood (ML) estimation when the error term is known to be from a parametric family.

Our main concern under the heading of efficiency improvement is the efficiency of the estimators; consistency is taken for granted. In other words, if we use an incorrect parametric functional form for the functions, we will still have consistent estimates. The nonparametric technique is only used to estimate the nuisance functions, i.e., functions that are not themselves of intrinsic interest. Of course, in the case of misspecification of the nuisance function, the standard errors are usually inconsistent and therefore the semiparametric model has an advantage.

variance-covariance matrix. Therefore, flexibility of the nuisance functions in ARCH and other conditional heteroscedastic models is important. The other estimators which belong to this class are the method of moments estimators, which include the linear and nonlinear heteroscedastic, transformation and errors-in-variables models.

(1) Maximum Likelihood Method

Maximum likelihood estimators for the following models are not robust to slight misspecifications of the error distribution or nuisance functions. However, efficiency improvement can be attained if additional information, such as information on the error distribution, is used for estimation.

(a) Transformation model

The error term of the transformation model can be known to have certain distributions, e.g., the normal, Gamma or t-distribution. The normal distribution is favoured if it is permitted by the transformation used. In fact, one of the purposes of transformation is to reduce skewness. These distributional assumptions can be checked, but if the distribution is known with certainty, then making use of the information will usually lead to efficiency improvement.

Although one of the intentions of transformation is to induce homoscedasticity, a direct heteroscedasticity correction may still be desirable. This can be done by nonparametrically estimating the conditional variance; a model of this kind is introduced and applied to real data in Chapter 3.

(b) ARCH

(2) Method of Moments

In the following models, $E[\varepsilon_i \mid z_i] = 0$ and possibly $E[\varepsilon_i^2 \mid z_i] = \sigma_i^2$. Since our MM estimators have the interpretation of IV (Instrumental Variable) estimators, efficiency improvement can be attained by using more efficient instruments.

(a) Transformation model

In cross sectional studies, the efficiency of most transformation models can be improved by taking into account heteroscedasticity. It is easy to see that the generalized non-linear two-stage least squares (GNL2SLS) estimator is more efficient than the non-linear two-stage least squares (NL2SLS) estimator. If the heteroscedasticity is of an unknown form but $\sigma_i = \sigma(z_i)$, then it is natural to employ nonparametric estimates for the conditional variance. This class of model is dealt with in Chapter 3. Of course, failure to take heteroscedasticity into account in this case will also invalidate the statistical inference.
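The idea of correcting for heteroscedasticity of unknown form with a nonparametric variance estimate can be illustrated in a deliberately simplified setting. The sketch below is a linear stand-in, not the GNL2SLS estimator itself: it fits ordinary least squares, smooths the squared residuals on z with a kernel, and refits with weights equal to the inverse of the estimated variance. The function names, the Gaussian kernel, the bandwidth and the simulated data are all assumptions made for illustration.

```python
import math
import random

def nw(z0, zs, vs, h):
    """Nadaraya-Watson estimate of E[v | z = z0], Gaussian kernel, bandwidth h."""
    ws = [math.exp(-0.5 * ((z0 - z) / h) ** 2) for z in zs]
    return sum(w * v for w, v in zip(ws, vs)) / sum(ws)

def wls(x, y, w):
    """Weighted least squares of y on an intercept and a scalar x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

def feasible_wls(x, y, z, h):
    """OLS first, then kernel-smooth squared residuals on z and reweight."""
    a0, b0 = wls(x, y, [1.0] * len(x))
    res2 = [(yi - a0 - b0 * xi) ** 2 for xi, yi in zip(x, y)]
    var_hat = [nw(zi, z, res2, h) for zi in z]     # estimated sigma^2(z_i)
    return wls(x, y, [1.0 / v for v in var_hat])

# Simulated example: the error standard deviation grows with x (here z = x).
random.seed(1)
n = 300
x = [random.uniform(1.0, 3.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + 0.2 * xi * random.gauss(0.0, 1.0) for xi in x]
a_hat, b_hat = feasible_wls(x, y, x, h=0.3)
```

The same two-step logic (estimate the variance function, then reweight the moment conditions) is what distinguishes a generalized estimator from its unweighted counterpart.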

(b) Errors-in-variables

By imposing suitable restrictions on the $T$'s in (1), we can have the linear GLS model, and this model is used in Chapter 3. However, if the explanatory variables contain conditional expectation terms, we have an errors-in-variables model. Consider the linear rational expectations models as in Chapters 6 and 7,

$$y_i = \beta_0 + \sum_j \beta_j E[x_{ij} \mid z_i] + \varepsilon_i, \qquad i = 1, \ldots, N$$

In some models where the conditional expectations are unknown functions, we may use the nonparametric estimates as instruments. Of course, it is well known that in IV estimation, consistency is taken for granted while efficiency can be improved by finding instruments closely related to the explanatory variables.
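A small simulation can show how a nonparametric estimate of a conditional expectation might serve as an instrument. In the sketch below, the observed regressor x equals its conditional expectation plus noise, y depends only on the conditional expectation, and a leave-one-out kernel estimate of E[x | z] is used as the instrument for x. Everything here (one regressor, Gaussian kernel, bandwidth, data-generating process, function names) is a hypothetical illustration rather than the procedure of Chapters 6 and 7.

```python
import math
import random

def nw_leave_one_out(i, zs, xs, h):
    """Kernel estimate of E[x | z = zs[i]] that omits observation i itself."""
    num = den = 0.0
    for j, (zj, xj) in enumerate(zip(zs, xs)):
        if j == i:
            continue
        w = math.exp(-0.5 * ((zs[i] - zj) / h) ** 2)
        num += w * xj
        den += w
    return num / den

def iv_slope(y, x, inst):
    """Simple IV estimate of b1 in y = b0 + b1*x + error, instrumenting x by inst."""
    n = len(y)
    ybar, xbar, ibar = sum(y) / n, sum(x) / n, sum(inst) / n
    num = sum((q - ibar) * (yi - ybar) for q, yi in zip(inst, y))
    den = sum((q - ibar) * (xi - xbar) for q, xi in zip(inst, x))
    return num / den

# Simulated example: E[x | z] = z^2 is nonlinear, and the true slope is 0.5.
random.seed(2)
n = 400
z = [random.uniform(-2.0, 2.0) for _ in range(n)]
x = [zi ** 2 + random.gauss(0.0, 1.0) for zi in z]
y = [1.0 + 0.5 * zi ** 2 + random.gauss(0.0, 0.2) for zi in z]
inst = [nw_leave_one_out(i, z, x, h=0.3) for i in range(n)]
b1_hat = iv_slope(y, x, inst)
```

A linear first stage in z would be nearly useless in this design, because z and z squared are uncorrelated when z is symmetric about zero; the kernel instrument picks up the nonlinear relationship without it being specified.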

1.4 THE SCOPE AND OUTLINE OF THE THESIS

essence of the nonparametric technique in the applications. Of course, regression analysis in economics has usually been understood to mean parametric linear or nonlinear regression. In fact, regression analysis extends beyond this class of parametric models and many other topics should be included under this broad heading, e.g., biased estimation (Stein, ridge and principal components regressions, and indeed the nonparametric regression that we are interested in). It is therefore important to understand the concept of nonparametric regression and its distinctions from parametric regression. We have presented and discussed the formulae, motivations and properties of the nonparametric estimators in Chapter 2. This will give us some understanding of the working of the nonparametric techniques and will serve as a reference in later chapters.

Two areas of economics will particularly benefit from the use of the semiparametric models mentioned in Sections 1.2 and 1.3. The first is the hedonic price function and the second is the consumption function.

literature on vehicles, there is however a clear pattern evolving regarding the choice of variables. But this is still somewhat dependent on the functional form employed.

In Chapter 3, we have introduced a different methodology which makes use of the two-stage hypothesis of Ohta and Griliches (1976). The procedures overcome the problem of multicollinearity and extract more information from the fuel efficiency and rental cost functions than other approaches, such as the discrete choice model. In this study of petrol price elasticities of fuel efficiency as well as attributes, an intertemporal economic model is constructed. The fuel efficiency function and the rental cost function need to be estimated before we can construct the elasticity. However, there is virtually no information on the functional forms. Transformation models may be a desirable robust approach to overcome the problem because of the relative simplicity of the model structure. In this case, efficiency improvement can be attained by correcting for heteroscedasticity which is of an unknown form, but known to be a function of the parameters and the attributes. Further efficiency improvement can be achieved in some cases using the maximum likelihood method. Of course, this efficiency improvement is at the expense of making the additional distributional assumption on the error term.

making too many assumptions on the hedonic price function.

In order to better understand and motivate the work in Chapters 6 and 7, Chapter 5 is devoted to the review of some of the surprise consumption literature. The applications in this second part of the thesis will use only time series data.

In Chapter 6, we have tested the consumption capital asset pricing model by assuming that the conditional variance of inflation follows an ARCH process. We have suggested and applied a Pseudo-Gaussian maximum likelihood criterion for bandwidth selection. We found no evidence of expected inflation being an explanatory variable in this consumption model. We have also attempted to establish a relationship between consumption and expected inflation in the presence of possibly non-linear rational expectations formation. The one quarter expected interest rate has no role in explaining consumption. We have also discovered some interesting properties of the cross-validation function.

other linear rational expectations models. Two semiparametric test statistics, namely the R test (Robinson (1988c)) and Hausman's (1978) test, are used for testing zero-type restrictions and diagnostic checking. We have obtained the estimates of interest under a less restrictive environment, with results opposite to those obtained under the more restrictive regime. We have also established a relationship between the expected real interest rate and consumption in this extended model.

In Chapter 8, we have evaluated the performance of the automatic bandwidth selection criterion and various subjective rule-of-the-thumb methods by means of Monte Carlo simulations. The results favour the use of the automatic bandwidth selection method that we have suggested, but only with a reasonably large sample size based on the simulation results.


CHAPTER 2

THE METHOD OF KERNELS AND TECHNICAL COMPLEMENTS


The Method of Kernels [ch. 2]

2.1 INTRODUCTION

Consider th e regression function or curve

$y_i = m(x_i) + \epsilon_i, \quad i = 1, \ldots, N$ (1)

where $y_i$ is a scalar, $x_i = (x_{1i}, \ldots, x_{di})'$, $E[\epsilon_i \mid x_i] = 0$, $E[\epsilon_i \epsilon_j \mid x_i] = 0$ for $i \neq j$, and $E[\epsilon_i^2] = \sigma^2$. $m(x)$ is the regression function and can be estimated by parametric or nonparametric methods. While parametric models are familiar to most economists, the use of nonparametric models is a relatively new idea in the field of economics.

The most common approach to modelling (1) is to specify a parametric form for m, thus giving rise to a linear or nonlinear regression model. It is assumed that the form of the regression function is known and that there are a finite number of parameters, e.g., slope coefficients, to be estimated. In this case, the practitioner selects one particular curve from a whole family. Whatever results and inference are obtained from the data depend heavily on the choice of the functional form of m(x).

An alternative approach to estimating the regression function is to use a nonparametric regression model. In this case, we make no parametric assumptions on m(x) except that it belongs to some infinite dimensional collection of functions. However, some weak smoothness assumptions may still be imposed on m(x), e.g., m is r times differentiable. The nonparametric model is nevertheless less restrictive than the parametric models and relies more on the data for information.


is inadequate information on the functional form of m(x).

From the theoretical point of view, it is believed that parametric models have desirable properties over the nonparametric models, e.g., gains in asymptotic efficiency. However, this belief may in fact be unwarranted if the parametric model is misspecified. Furthermore, any subsequent policy analysis and conclusions from hypothesis testing may be erroneous if the model is indeed misspecified. Therefore, it is unwise to adhere to parametric models when there are in fact undesirable side effects to be set against the possible efficiency improvements and ease of interpretability.

The methodology which underlies our study is dominated by the use of semiparametric models. In other words, the model under study has two components: a parametric and a nonparametric component. The main difference between the parametric component and the nonparametric component is that the former has finitely many unknown parameters while the latter has an infinite number of parameters to be estimated. The most interesting property of the semiparametric model is that not only does it allow for unknown functions to be estimated, it sometimes has the same rate of convergence, and indeed the same efficiency, as parametric estimators.

There are numerous methods of nonparametric estimation including Fourier inversion, histogram, nearest neighbour, orthogonal series, penalty functions, splines, delta sequences, kernels and others. These methods have been surveyed in Prakasa Rao (1983), and brief discussions of some methods are given in Silverman (1986). Eubank (1988) has also described various methods of estimating regression functions.


semiparametric models in empirical work. Furthermore, the asymptotic properties of semiparametric estimators are much easier to understand after some discussion of the theorems on nonparametric estimators.

We begin by discussing the method of kernels in estimating the density function and regression function in the independent and identically distributed (i.i.d.) case. After presenting some asymptotic results and properties, extensions to other cases will also be discussed.

2.2 SOME PRELIMINARIES

Let us assume that $(x_i, y_i) = (x_{1i}, \ldots, x_{di}, y_i)$, $i = 1, \ldots, N$, are independently and identically distributed as the continuous multivariate random variable $(X, Y)$. We are interested in the estimation of the density, $f(x)$, as well as conditional expectations or moments, $E[y \mid x]$ and $E[g(y) \mid x]$. First, consider (1) again:

$y_i = m(x_i) + \epsilon_i, \quad i = 1, \ldots, N$

where m is the regression function or conditional expectation. Let us assume that the problem here is to construct a consistent estimator of $m(x) = E[y \mid x]$. Extension to the conditional expectation of g(y) on x is straightforward. Assume that we have a joint density f(y,x). The associated marginal density for x is given by

$f_X(x) = \int_{-\infty}^{\infty} f(y, x)\,dy$ (2)

The conditional density of Y given X is

$f_{Y|X}(y \mid x) = f(y, x) / f_X(x)$ (3)

and appealing to the definition of conditional moment, we have

$m(x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\,dy.$ (4)


density estimation.

2.3 ESTIMATION OF UNIVARIATE AND MULTIVARIATE DENSITIES

The literature on the estimation of density functions is vast. The bibliography up to the late 1970s is provided by Wertz and Schneider (1979) and up to the early 1980s by Collomb (1981). The number of articles quoted in the two bibliographies demonstrates the enormous interest shown in this field. We do not intend to survey the whole literature. We reiterate that our main purpose is to present the methodology and sufficient asymptotic results in order to understand the properties of the estimators of interest to us. The chapter will also serve as a useful reference as the same techniques are repeatedly used in later chapters.

The work on multivariate density estimation has its origin in the statistical literature of the 1950s. In particular, the first published paper on univariate kernel density estimation was by Rosenblatt (1956). Univariate kernel estimators are unlikely to be very useful in economic problems as multivariate applications are usually encountered in econometric studies. However, to give some initial insights into the working and the properties of the method of kernels, it is useful to discuss univariate estimators. The use of univariate kernels is also useful in simplifying the notation when we discuss extensions and more complicated applications of the method of kernels. Let us give a very brief introduction to the working of a univariate kernel.

UNIVARIATE DENSITY

For this sub-section, let us consider the i.i.d. observations on the scalar random variables $x_i$, $i = 1, \ldots, N$, drawn from the density function f(x). Our intention is to estimate f(x) and by definition

$f(x) = \frac{d}{dx} F(x) = \lim_{a \to 0} a^{-1} [F(x + a/2) - F(x - a/2)]$


$\hat f(x) = [\hat F(x + a/2) - \hat F(x - a/2)]/a$

$= a^{-1}$ [average number of $x_j$, $j = 1, \ldots, N$, in the interval of width $a$ centred at $x$]

$= a^{-1}$ [number of $x_j$ in the interval $(x - a/2,\, x + a/2)$]$/N = (Na)^{-1} \sum_{j=1}^{N} I(|x_j - x| < a/2)$

$= (Na)^{-1} \sum_{j=1}^{N} I(|x_j - x|/a < 1/2) = (Na)^{-1} \sum_{j=1}^{N} I(a^{-1}(x_j - x))$

where $a = a_N$ is a sequence of positive numbers which satisfies the condition $\lim_{N \to \infty} a_N = 0$, and is sometimes known as the window width, bandwidth, or smoothing parameter. As these names suggest, $a$ controls the number of $x_j$ to be averaged. $I(u)$ is the indicator function which takes the value 1 if $|u| < 1/2$ and zero otherwise, and has the properties that $\int I(u)\,du = 1$ and $I(u) \geq 0$.

Rosenblatt (1956) has suggested replacing the indicator function $I(u)$ by any function $K(u)$ which possesses the properties (a) $\int K(u)\,du = 1$ and (b) $K(u) \geq 0$. In this case

$\hat f(x) = (Na)^{-1} \sum_j K(a^{-1}(x_j - x))$

This kernel estimator has the advantage that more weight can be given to the observations closer to x and less weight to those further away. For example, the normal kernel, $K(u) = (2\pi)^{-1/2} \exp(-u^2/2)$, has the property of giving most weight to an observation at x itself, with the weight decaying exponentially in the squared distance from x. Rosenblatt (1956) established the consistency of this univariate kernel estimator.
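The two estimators described above can be sketched numerically. The following is a minimal illustration, not the thesis's own code: the function names, the simulated sample and the bandwidth value 0.3 are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Normal kernel K(u) = (2*pi)^(-1/2) * exp(-u^2 / 2)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def uniform_kernel(u):
    """Indicator kernel I(u): 1 if |u| < 1/2 and 0 otherwise."""
    return (np.abs(u) < 0.5).astype(float)

def kernel_density(x, data, a, kernel=gaussian_kernel):
    """f_hat(x) = (N a)^(-1) * sum_j K((x_j - x) / a) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (data[None, :] - x[:, None]) / a      # shape (len(x), N)
    return kernel(u).sum(axis=1) / (data.size * a)

# Hypothetical data: 2000 draws from a standard normal density.
rng = np.random.default_rng(0)
sample = rng.standard_normal(2000)
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kernel_density(grid, sample, a=0.3)   # bandwidth chosen by hand
```

Passing `kernel=uniform_kernel` gives the indicator-based estimator from which the derivation started; the normal kernel gives a smooth estimate that still integrates to one.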

However, alternative kernels may be desired, including relaxing the assumption that $K(u) \geq 0$ and/or that $K(u)$ is symmetric, for superior performance. In fact, Parzen (1962) was the first to generalize the results to non-negative kernels. Under the following regularity conditions on the kernel


(A2) $\sup_u |K(u)| < \infty$, (A3) $\int |K(u)|\,du < \infty$, and (A4) $\lim_{|u| \to \infty} |uK(u)| = 0$,

together with some conditions on the rate of convergence of the bandwidth a,

(B1) $\lim_{N \to \infty} a = 0$ and (B2) $\lim_{N \to \infty} Na = \infty$,

the kernel estimator is asymptotically unbiased and mean square consistent at every continuity point of f. (B2) requires that, as the number of observations increases, the bandwidth converges to zero at a slower rate than $N^{-1}$. Together the conditions ensure that as we have more observations, we use a smaller bandwidth in order to drop observations further from the point of interest x, while still averaging over a growing number of observations.

Under conditions A1-A4 and B1-B2, $(\hat f - E\hat f)/\mathrm{var}(\hat f)^{1/2}$ is asymptotically distributed as standard normal. Notice that we are centering around $E\hat f$ rather than the true f (see the result of Cacoullos (1964) below, which deals with the latter case).

In fact, these conditions are fairly standard in the kernel literature, though the following conditions are usually imposed on the kernels:

(A1a) $\int K(u)\,du = 1$, (A2a) $\int uK(u)\,du = 0$, (A3a) $\int u^2 K(u)\,du = c \neq 0$, (A4a) $\int K(u)^2\,du < \infty$,

(A1a) is equivalent to the weights summing to one, and (A2a) is automatically satisfied if K is symmetric about zero. (A3a) and (A4a) will be easy to comprehend when we come to the asymptotic properties of the density estimates. Kernels which satisfy conditions (A1a) to (A4a) are usually called second-order kernels or simply kernels. Kernels of order higher than two are discussed below.


It was Cacoullos (1964) who extended the univariate nonparametric estimator to a multivariate framework. Consider the case where we have i.i.d. observations on the d-vector $x_i$, $i = 1, \ldots, N$, drawn from the density function $f(x_1, \ldots, x_d)$. Let us ignore the i subscripts. The estimate $\hat f(x)$ of the density function $f(x)$ is:

$\hat f(x) = (N a_1 a_2 \cdots a_d)^{-1} \sum_j K(a_1^{-1}(x_1 - x_{1j}), \ldots, a_d^{-1}(x_d - x_{dj}))$

where K(u) is a multivariate kernel satisfying certain conditions. For reasons best suited to our purpose, we restrict our attention to product kernels and a common bandwidth $a_1 = \cdots = a_d = a$, where a is a sequence of positive constants satisfying $\lim_{N \to \infty} a = 0$. A diagonal bandwidth matrix or a full bandwidth matrix may be desired in some circumstances. Indeed, empirical workers usually use a diagonal bandwidth matrix with pth diagonal element $a_p = \mathrm{s.d.}(x_p) \times \text{constant} \times N^{\alpha}$, where $\alpha$ is a negative fraction.

The bandwidth parameter controls the degree of smoothness of our estimates. In this case, the bandwidth is in fact the size of the neighbourhood which controls how many observations of x around $x_i$ should be used for local regressions or averages. If we have a common bandwidth, we end up with a simple form

$\hat f(x) = (Na^d)^{-1} \sum_j K(a^{-1}(x - x_j))$

$= (Na^d)^{-1} \sum_j \{\prod_p k_p(a^{-1}(x_p - x_{pj}))\}$ (5)

where $\prod_p$ refers to the product from 1 to d and $k_p$ is usually, but not necessarily, a probability density function. For example, the normal kernel can be expressed as

$k(u) = (2\pi)^{-1/2} \exp(-u^2/2)$ (6)

In some cases, the $k_p$ are not confined to be non-negative.
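A minimal sketch of the product-kernel estimator (5) with a common bandwidth and the normal kernel (6) in every coordinate is given below. The function name and array layout are assumptions of this example; rescaling each regressor by its standard deviation, as described above, would be done before calling it.

```python
import numpy as np

def product_kernel_density(x, data, a):
    """Equation (5) with normal k_p: (N a^d)^(-1) sum_j prod_p k((x_p - x_pj) / a).

    x    : array of shape (M, d), evaluation points
    data : array of shape (N, d), observations
    a    : common bandwidth (scalar)
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    u = (x[:, None, :] - data[None, :, :]) / a          # shape (M, N, d)
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)      # univariate normal kernel
    return k.prod(axis=2).sum(axis=1) / (n * a**d)

# Hypothetical bivariate sample, evaluated at the origin.
rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 2))
f_at_origin = product_kernel_density([[0.0, 0.0]], sample, a=0.4)[0]
```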


measurement. In the case of nonparametric regression discussed below, the Nadaraya-Watson estimator with product kernels and unit covariance matrix can be interpreted as an estimator using a diagonal bandwidth matrix with (p,p)th element [a × standard error($x_p$)].

2.4 NADARAYA-WATSON KERNEL ESTIMATOR FOR CONDITIONAL MOMENTS

We shall return to our original problem of estimating the conditional moments. While there are a number of choices for kernel estimators of conditional moments, what we are about to present is known as the Nadaraya-Watson (N-W) estimator, which originated independently from Nadaraya (1964) and Watson (1964). This estimator will be the most commonly used nonparametric technique in our work.

Let Y be a random variable and X be a d × 1 vector of random variables. We consider the estimation of the regression function $E[Y \mid X]$. Extension to the second and higher moments, $E[g(Y) \mid X]$, is straightforward. The joint density can be estimated by

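$\hat f(y, x) = (Na^{d+1})^{-1} \sum_j K(a^{-1}(x - x_j)) K(a^{-1}(y - y_j))$.

Using (2), the estimator for the marginal density is

$\hat f_X(x) = \int_{-\infty}^{\infty} \hat f(y, x)\,dy$

$= (Na^d)^{-1} \sum_j K(a^{-1}(x - x_j))$.

Using (3), the conditional density is estimated as

$\hat f_{Y|X}(y \mid x) = \hat f(y, x) / \hat f_X(x)$.

Using (4), the estimator for the conditional moment is therefore

$\hat m(x) = \int_{-\infty}^{\infty} y\, \hat f_{Y|X}(y \mid x)\,dy = \hat f_X(x)^{-1} \int_{-\infty}^{\infty} y\, \hat f(y, x)\,dy$

$= (a \sum_j K(a^{-1}(x - x_j)))^{-1} \sum_j K(a^{-1}(x - x_j)) \int_{-\infty}^{\infty} y K(a^{-1}(y - y_j))\,dy$

By a change of variable, $u = a^{-1}(y - y_j)$, i.e., $y = y_j + au$ and $dy = a\,du$, we have

$\hat m(x) = (\sum_j K(a^{-1}(x - x_j)))^{-1} \sum_j K(a^{-1}(x - x_j)) \int_{-\infty}^{\infty} (y_j + au) K(u)\,du$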


(see (A1a)). If we make the assumption (A2a) that $\int uK(u)\,du = 0$, we shall arrive at the N-W estimator

$\hat m(x) = (\sum_{j=1}^{N} K(a^{-1}(x - x_j)))^{-1} \sum_{j=1}^{N} y_j K(a^{-1}(x - x_j))$

$= \sum_j K_j^* y_j$ (7)

or in matrix form, $M(x) = Ky$,

where $K_j^* = (\sum_j K_j(u))^{-1} K_j(u)$, $K$ is known as the smoothing matrix with elements $K_{ij}^*$, and $y$ is the vector with elements $y_j$. Since $\hat m(x)$ is a linear combination of the $y_j$, the N-W estimator belongs to the class of linear estimators given $a$. If the $K_j^*$ are nonnegative and sum to one, the estimator can be taken as a weighted average of the $y_j$. In this case, it is not too difficult to understand how the N-W estimator works. If m(x) is a smooth function, then it is plausible that the observations close to $x_i$ contain useful information about m at $x_i$. One would want to place more weight on the observations close to $x_i$ and less or none of the weight on the observations further away from $x_i$. Since the N-W estimator is constructed by taking a local average of the data close to $x_i$, it can be taken to be a local average estimator. The bandwidth therefore controls how wide the interval should be and so how many observations should be used for averaging.
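The weighted-average interpretation of (7) can be illustrated with a short sketch for a scalar regressor. The function names, the simulated data and the bandwidth are assumptions of this example rather than anything prescribed by the text.

```python
import numpy as np

def nw_weights(x0, x, a):
    """Normalised weights K_j^* = K_j / sum_j K_j at the evaluation point x0."""
    k = np.exp(-0.5 * ((x0 - x) / a) ** 2)   # normal kernel; its constant cancels
    return k / k.sum()

def nadaraya_watson(x0, x, y, a):
    """N-W estimate of E[y | x = x0] as the weighted average sum_j K_j^* y_j."""
    return float(nw_weights(x0, x, a) @ y)

# Hypothetical data: a smooth regression function observed with noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 500)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(500)
m_hat = nadaraya_watson(0.25, x, y, a=0.05)   # true value is sin(pi/2) = 1
```

Because the weights are nonnegative and sum to one, the fitted value always lies between the smallest and largest observed $y_j$, which is one way of seeing the local-average character of the estimator.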


In some cases, in order to overcome technical difficulties, the N-W estimator is augmented to prevent the denominator from going to zero. The trimmed N-W estimator is defined to be

$\hat m(x) = [I(\hat f(x) > b)(\sum_j K(a^{-1}(x - x_j)))^{-1}] \sum_j y_j K(a^{-1}(x - x_j))$ (8)

where I is the usual indicator function and b is a user-chosen trimming constant. In other cases, in order to minimize the influence of outliers or for the construction of cross-validation or other automatic bandwidth selection criteria, we use the leave-one-out estimator, in which the ith observation is excluded from both sums:

$(\sum_{j \neq i} K(a^{-1}(x - x_j)))^{-1} \sum_{j \neq i} y_j K(a^{-1}(x - x_j))$ (9)
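A sketch of the leave-one-out estimator (9), evaluated at each sample point, is given below together with one way it might feed a cross-validation criterion. The least-squares criterion `cv_score`, the candidate bandwidths and the simulated data are assumptions of this example; the text does not specify the criterion at this point.

```python
import numpy as np

def loo_nadaraya_watson(x, y, a):
    """Leave-one-out N-W fits: the i-th fit at x_i excludes observation i, as in (9)."""
    u = (x[:, None] - x[None, :]) / a
    k = np.exp(-0.5 * u**2)               # normal kernel; its constant cancels
    np.fill_diagonal(k, 0.0)              # drop the j = i term from both sums
    # Note: a very small bandwidth can drive a row sum to zero (division by zero).
    return (k @ y) / k.sum(axis=1)

def cv_score(x, y, a):
    """Assumed least-squares cross-validation criterion for bandwidth a."""
    return float(np.mean((y - loo_nadaraya_watson(x, y, a)) ** 2))

# Hypothetical data and a hand-picked set of candidate bandwidths.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(200)
candidates = [0.02, 0.05, 0.1, 0.2]
best = min(candidates, key=lambda a: cv_score(x, y, a))
```

Excluding the own observation matters here: without it, the criterion would be minimised by letting the bandwidth shrink until each fitted value reproduces its own $y_i$.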

In order to gain more insight into the estimators, we have to discuss their asymptotic properties. In particular, it is interesting to know under what regularity conditions the estimators are asymptotically unbiased, consistent and have a limiting normal distribution.

2.5 ASYMPTOTIC PROPERTIES OF DENSITY ESTIMATOR

First of all, let us discuss some of the asymptotic properties of $\hat f(x)$.

Theorem 5.1. (Asymptotic unbiasedness of $\hat f(x)$). Suppose

(i) $\sup_u |K(u)| < \infty$,
(ii) $\int_{R^d} |K(u)|\,du < \infty$,
(iii) $\lim_{|u| \to \infty} |u|^d K(u) = 0$,
(iv) $\int_{R^d} K(u)\,du = 1$,
(v) $\lim_{N \to \infty} a = 0$.

Then at every continuity point x of f,

$\lim_{N \to \infty} E\hat f(x) = f(x)$.

Proof: Cacoullos (1964)'s Theorem 3.1.


asymptotically unbiased if $h(x) = E[g(y) \mid x]$ is continuous at x and $E\,h(x) < \infty$.

Theorem 5.2. (Mean square consistency of $\hat f(x)$). If (i) to (v) hold and

(vi) $\lim_{N \to \infty} (Na^d)^{-1} = 0$.

Then at every continuity point x of f,

$\lim_{N \to \infty} E[\hat f(x) - f(x)]^2 = 0$.

Proof: Cacoullos (1964)'s Theorem 3.2.

The results can again be straightforwardly extended to the case of g(y) with additional regularity conditions on the moments of g(y) and h(x). Besides being asymptotically unbiased and consistent, the estimator has a limiting normal distribution.

Theorem 5.3. (Asymptotic normality of $\hat f(x)$). If (i) to (v) are satisfied, then at every continuity point x of f,

$(Na^d)^{1/2}(\hat f(x) - f(x)) \xrightarrow{d} N(0, \sigma^2)$

where

$\sigma^2 = f(x) \int K^2(u)\,du$.

Proof: Cacoullos (1964) and Prakasa Rao (1983).

We have now centred around the true f as opposed to the $E\hat f$ of Parzen (1962). Therefore, one can now construct confidence intervals for f(x) at each point x, with the estimator of $\sigma^2$ as

$\hat\sigma^2 = \hat f(x) \int K^2(u)\,du$. (10)

In fact, the density estimates are also known to be strongly consistent and uniformly strongly consistent. The last property is one of the measures of the global performance of the estimator.
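As an illustration of how Theorem 5.3 and the variance estimator (10) might be used in the univariate case (d = 1), the following sketch computes a pointwise interval with the normal kernel. The function name and the critical value 1.96 are assumptions of this example, and the interval ignores any remaining smoothing bias.

```python
import numpy as np

# For the normal kernel, int K(u)^2 du = 1 / (2 * sqrt(pi)).
R_GAUSS = 1.0 / (2.0 * np.sqrt(np.pi))

def density_ci(x0, data, a, z=1.96):
    """Pointwise interval f_hat(x0) +/- z * sqrt(sigma2_hat / (N a)), with d = 1."""
    data = np.asarray(data, dtype=float)
    n = data.size
    u = (x0 - data) / a
    f_hat = np.exp(-0.5 * u**2).sum() / (n * a * np.sqrt(2.0 * np.pi))
    sigma2_hat = f_hat * R_GAUSS            # equation (10)
    se = np.sqrt(sigma2_hat / (n * a))      # from (N a)^(1/2) (f_hat - f) -> N(0, sigma^2)
    return f_hat - z * se, f_hat, f_hat + z * se

# Hypothetical sample from a standard normal density.
rng = np.random.default_rng(3)
lower, centre, upper = density_ci(0.0, rng.standard_normal(1000), a=0.25)
```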

2.6 CONDITIONAL EXPECTATION


general estimators of $m(x) = E[g(y) \mid x]$, at distinct $x_i$, $i = 1, \ldots, N$.

Theorem 6.1. If the conditions of Theorem 2 and for function of derivative hold, $\hat m(x)$ is mean square consistent.

Proof: Since both the numerator and denominator can be shown to be mean square consistent, we apply Slutsky's theorem to obtain the results.

Consider the N-W estimator as a ratio

$\hat m(x) = (\sum_j K(a^{-1}(x - x_j)))^{-1} \sum_j y_j K(a^{-1}(x - x_j)) = \hat w(x)/\hat f(x)$

where $\hat f(x) = \sum_j K(a^{-1}(x - x_j))$ (11)

$\hat w(x) = \sum_j y_j K(a^{-1}(x - x_j))$ (12)

Since the N-W estimator is a ratio, some authors have looked at the approximation of the ratio and obtained the asymptotic results. Let $w(x) = \int y f(x,y)\,dy$ and $v(x) = \int y^2 f(x,y)\,dy$. The following theorem demonstrates that the N-W and some other nonparametric regression estimators are asymptotically normally distributed.

Theorem 6.2. Let $x_1, \ldots, x_k$ be distinct points. If

(i) (a) $K(u)$ and $|uK(u)|$ are bounded, (b) $\int uK(u)\,du = 0$, (c) $\int u^2 K(u)\,du < \infty$, and (d) $\lim_{N \to \infty} Na^3 = \infty$ and $\lim_{N \to \infty} Na^5 = 0$,
(ii) $f(x_i) > 0$, $1 \leq i \leq k$,
(iii) $E|y|^3 < \infty$,
(iv) the derivative of v exists and is bounded, w and f are twice differentiable and bounded,

then

$(Na)^{1/2}[\hat m(x_1) - m(x_1), \ldots, \hat m(x_k) - m(x_k)]' \xrightarrow{d} N_k(0, \Omega)$, where $\Omega$ is a diagonal matrix with the ith element


This suggests that interval estimates can be easily constructed for nonparametric regression estimates. We also know that

$\mathrm{var}[y \mid x] = v(x)/f(x) - (w(x)/f(x))^2 = E[y^2 \mid x] - \{E[y \mid x]\}^2$.

It is therefore easy to suggest how to construct the estimate for the conditional variance, since we can express the variance in terms of two regression functions. Indeed, nonparametric variance estimates are useful for heteroscedastic problems.
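The two-regression construction of the conditional variance can be sketched as follows: one N-W regression of $y$ on $x$ and another of $y^2$ on $x$, using the same bandwidth. The function names, the simulated heteroscedastic data and the bandwidth are assumptions of this example.

```python
import numpy as np

def nw(x0, x, y, a):
    """N-W estimate of E[y | x = x0] with a normal kernel (scalar regressor)."""
    k = np.exp(-0.5 * ((x0 - x) / a) ** 2)
    return float(k @ y / k.sum())

def conditional_variance(x0, x, y, a):
    """var[y | x] estimated as E_hat[y^2 | x] - (E_hat[y | x])^2."""
    first = nw(x0, x, y, a)
    second = nw(x0, x, y**2, a)
    return second - first**2

# Hypothetical heteroscedastic data: the error standard deviation grows with x.
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 1000)
y = 1.0 + (0.2 + x) * rng.standard_normal(1000)
v_low = conditional_variance(0.1, x, y, a=0.1)
v_high = conditional_variance(0.9, x, y, a=0.1)
```

With nonnegative weights that sum to one and a common bandwidth, the estimate is a weighted sample variance and so cannot be negative, apart from rounding error.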

2.7 DERIVATIVES OF CONDITIONAL EXPECTATION

In some of our problems, especially in those cases where we have to maximize or minimize a function with respect to the parameters of interest, we will have to consider the derivative of the N-W estimator. In particular, we have in mind the problem of constructing two-step or linearized ML estimators. Suppose that m admits r derivatives and we wish to estimate the $r_1$th ($r_1 < r$) derivative.

Theorem 7.1. If the following conditions are satisfied,

(i) $E[y^2] < \infty$,
(ii) $f(x) > 0$,
(iii) $m(x)$ is r times differentiable,
(iv) if the characteristic function of K is $\varphi$, i.e., $\varphi(u) = \int \exp(iux) K(x)\,dx$, where $i = \sqrt{-1}$, then $\int |u|^r \varphi(u)\,du < \infty$,
(v) $\lim_{|u| \to \infty} |uK(u)| = 0$,
(vii) f is continuous in [a,b],
(viii) $\int |f(u)|\,du < \infty$, for $-\infty < c < a < b < d < \infty$,
(ix) $\int |uK(u)|\,du$ is finite,
(x) $K^{(j)}(u)$ is a continuous function of bounded variation for $j = 0, 1, \ldots, r$,
(xi) f and its first r+1 derivatives are bounded,
(xii) $\lim_{N \to \infty} a/\epsilon = 0$,


then there is a constant C such that for any positive $\epsilon$

I I > e 1 < C ( Na ^ ’^' ^^£^) “ ^

Proof: Schuster and Yakowitz (1974).

From the theorem, we can see that the rate of convergence depends on r. The problems that we will be concerned with only require the first derivative.

2.8 HIGHER ORDER KERNELS AND OPTIMAL BANDWIDTH

As we have mentioned above, it is sometimes desirable to relax the non-negativity of the kernel and take advantage of the smoothness of m. Improved asymptotic rates of convergence can be attained via the use of higher-order kernels. Of course, what we are about to describe is not the only way to reduce bias; the jackknife method (see Schucany and Sommers (1977)) has also been used by other authors in semiparametric models. In fact, it would be useful to introduce the higher-order kernel of Bartlett (1963).

Definition 8.1: Let f be $\ell$ times differentiable in the neighbourhood of x. A kernel of order $\ell$ is to satisfy

(i) $\int u^i K(u)\,du = \delta_{i0}$, $0 \leq i \leq \ell - 1$;
(ii) $\int |u|^{\ell} K(u)\,du \neq 0$;
(iii) $\sup_u (1 + |u|^{\ell+1}) |K(u)| < \infty$;

where $\delta_{ij}$ is Kronecker's delta. (i) ensures that we have enough zero moments. Sometimes, it may be desirable to relax (ii) and (iii) as in other higher-order kernel literature (see Prakasa Rao (1983)).


Theorem 8.1: Let f be $\ell$ times differentiable in the neighbourhood of x and let K be a higher-order kernel as given in Definition 8.1. Then

$E\hat f(x) = f(x) + O(a^{\ell})$.

Proof: Robinson (1989).

This theorem shows that the bias decreases "sufficiently fast" with a. We have assumed that K(u) integrates to 1 and that we have a sufficiently smooth function with enough zero moments. Thus, it is easy to suggest a $k \in K$. With $\ell$ even and $\varphi$ an even function, we can suggest each $k_p(u)$ as a product of $\varphi(u)$ and an even polynomial in u, i.e.,

$k(u) = \sum_{j=0}^{(\ell-2)/2} c_j u^{2j} \varphi(u)$ (13)

We can easily find $c_j$'s which satisfy (i), e.g.,

(a) if $\varphi(u) = \tfrac{1}{2} I(|u| < 1)$, then the moments are

$m_r = 0$ if r is odd;

$m_r = \int u^r \varphi(u)\,du = [r+1]^{-1}$ if r is even;

(b) if $\varphi(u) = (2\pi)^{-1/2} \exp(-u^2/2)$, then the moments are

$m_r = 0$ if r is odd;

$m_r = \int u^r \varphi(u)\,du = r!\,[(r/2)!\,2^{r/2}]^{-1}$ if r is even;

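substituting k(u) into (i), we will have a system of $\ell/2$ equations of the form $Mc = d$, where

$c = (c_0, c_1, \ldots, c_{(\ell-2)/2})'$, $\quad d = (1, 0, \ldots, 0)'$,

$M = \begin{pmatrix} 1 & m_2 & m_4 & \cdots & m_{\ell-2} \\ m_2 & m_4 & \cdots & & \vdots \\ \vdots & & & & \vdots \\ m_{\ell-2} & \cdots & & & m_{2(\ell-2)} \end{pmatrix}$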


a stronger condition $K(u) = O((1 + |u|^{\ell+1+\epsilon})^{-1})$, for some $\epsilon > 0$. When r = 2, we have the simple density estimates. Although the theorem holds for a variety of distributions for K(u), it rules out the Cauchy distribution because its moments do not exist.
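The linear system $Mc = d$ described above can be solved numerically for the normal density in case (b). The sketch below assumes the kernel has the form $k(u) = \sum_j c_j u^{2j} \varphi(u)$ with $j$ running from 0 to $(\ell-2)/2$, so that M has $\ell/2$ rows; the function names are hypothetical.

```python
import numpy as np
from math import factorial

def normal_moment(r):
    """m_r = int u^r phi(u) du for the standard normal density phi."""
    if r % 2 == 1:
        return 0.0
    return factorial(r) / (factorial(r // 2) * 2 ** (r // 2))

def higher_order_coefficients(ell):
    """Solve M c = d for the c_j in k(u) = sum_j c_j u^(2j) phi(u), with ell even."""
    if ell < 2 or ell % 2 == 1:
        raise ValueError("ell must be an even integer >= 2")
    size = ell // 2
    # Row i imposes int u^(2i) k(u) du = delta_{i0}; odd moments vanish by symmetry.
    M = np.array([[normal_moment(2 * (i + j)) for j in range(size)]
                  for i in range(size)])
    d = np.zeros(size)
    d[0] = 1.0
    return np.linalg.solve(M, d)

c4 = higher_order_coefficients(4)   # fourth-order kernel: (3/2 - u^2/2) * phi(u)
```

For $\ell = 2$ the system reduces to $c_0 = 1$, i.e. the normal kernel itself; for $\ell = 4$ the resulting kernel takes negative values in the tails, which is the price paid for the extra zero moment.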

To understand the bandwidth selection problem with higher-order kernels, let us look at the MSE. The MSE of a density estimate based on a higher-order kernel can be expressed as:

$\mathrm{MSE}(\hat f) = \mathrm{bias}(\hat f)^2 + \mathrm{var}(\hat f)$

where

$\mathrm{bias}(\hat f) = \int K(u)\{f(x - au) - f(x)\}\,du$

$\mathrm{var}(\hat f) = N^{-1}[a^{-1} \int K^2(u) f(x - au)\,du - (\int K(u) f(x - au)\,du)^2]$

With the Taylor series expansion, given that au is small, we have

$f(x - au) = f(x) - au f'(x) + \tfrac{1}{2} a^2 u^2 f''(x) + \ldots$

But from the definition of a higher-order kernel, the first r-1 moments of the kernel are zero. This implies that the terms of the expansion all integrate to zero except the term associated with $a^r$. Therefore, we are left with

$\mathrm{bias}(\hat f) = \text{constant} \times f^{(r)}(x) \int u^r K(u)\,du \times a^r = C_1 a^r$

Similarly,

$\mathrm{var}(\hat f) = (Na)^{-1} \int \{f(x) - au f'(x) + \tfrac{1}{2}(au)^2 f''(x) - \ldots\} K^2(u)\,du + O(N^{-1})$

$\approx (Na)^{-1} f(x) \int K^2(u)\,du = C_2 (Na)^{-1}$

We can now consider the "optimal bandwidth" $a_{opt} = \arg\min_a \mathrm{MSE}$ with $\mathrm{MSE} = C_2 (Na)^{-1} + (C_1 a^r)^2$.

Consider the use of simple calculus,


From this first order condition, we have

$a_{opt} = C N^{-\alpha}$

where $C = (C_2/(2rC_1^2))^{\alpha}$, $\alpha = 1/(2r+1)$.
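As a check on this result, the first-order condition can be worked out from $\mathrm{MSE} = C_2 (Na)^{-1} + (C_1 a^r)^2$. The following is a sketch that treats $C_1$ and $C_2$ as constants not depending on $a$:

```latex
\frac{\partial\,\mathrm{MSE}}{\partial a}
  = -C_2 N^{-1} a^{-2} + 2 r C_1^{2} a^{2r-1} = 0
\;\Longrightarrow\;
a^{2r+1} = \frac{C_2}{2 r C_1^{2}}\, N^{-1}
\;\Longrightarrow\;
a_{opt} = \left(\frac{C_2}{2 r C_1^{2}}\right)^{1/(2r+1)} N^{-1/(2r+1)} .
```

The second derivative, $2 C_2 N^{-1} a^{-3} + 2r(2r-1) C_1^2 a^{2r-2}$, is positive for $a > 0$, so the stationary point is a minimum.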

$C$ is a function of $f(x)$ since both $C_1$ and $C_2$ are functions of $f(x)$. As mentioned above, the problem in obtaining the optimal $a$ in practice is that $f(x)$ is unknown. It appears that we may be able to work out the unknown constants $C_1$ and $C_2$ if we take the average $a_{opt}$'s corresponding to two different N's. The results on the optimal bandwidth can be extended straightforwardly to d-dimensional multivariate density estimates. In the case of the multivariate kernel of Cacoullos (1964), we have $a_{opt} = \text{constant} \times N^{-1/\alpha}$, where $\alpha = d + 2r$.
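The closed form for the optimal bandwidth can be checked numerically against a direct grid minimisation of the approximate MSE. The constants $C_1$, $C_2$ and the sample size below are hypothetical values chosen only for illustration; in practice they depend on the unknown $f(x)$.

```python
import numpy as np

def mse(a, n, c1, c2, r):
    """Approximate MSE: variance C2 / (N a) plus squared bias (C1 a^r)^2."""
    return c2 / (n * a) + (c1 * a**r) ** 2

def optimal_bandwidth(n, c1, c2, r):
    """a_opt = C N^(-alpha), C = (C2 / (2 r C1^2))^alpha, alpha = 1 / (2r + 1)."""
    alpha = 1.0 / (2 * r + 1)
    return (c2 / (2 * r * c1**2)) ** alpha * n ** (-alpha)

n, c1, c2, r = 1000, 0.5, 0.3, 2          # hypothetical constants
a_star = optimal_bandwidth(n, c1, c2, r)

# Grid search around the closed-form value as an independent check.
grid = np.linspace(0.5 * a_star, 1.5 * a_star, 2001)
a_grid = grid[np.argmin(mse(grid, n, c1, c2, r))]
```

Doubling the sample size shrinks the optimal bandwidth only by the factor $2^{-1/(2r+1)}$, which for a second-order kernel is about 0.87.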

Since $C_2$ depends on $\int K^2(u)\,du$, Epanechnikov (1969) has gone further by finding the optimum kernel, minimizing $\int K^2(u)\,du$ subject to the constraints that (i) the kernel integrates to one, (ii) the kernel is symmetric and (iii) $\int u^2 K(u)\,du = 1$. This produces what is now known as the Epanechnikov kernel, which is non-negative. Notice also that replacing (ii) by (ii)' $\int uK(u)\,du = 0$ allows for non-symmetric kernels. However, it has been known since Epanechnikov (1969) that there is very little difference in using different kernels and that the choice of kernel should be based on computational and technical considerations. For these reasons, the uniform and normal kernels have been favoured in most applied studies and indeed in our work in this thesis.
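The claim that different kernels perform similarly can be illustrated by comparing $\int K^2(u)\,du$ for the Epanechnikov and normal kernels, both scaled so that $\int u^2 K(u)\,du = 1$. The sketch below uses the standard unit-variance form of the Epanechnikov kernel, $K(u) = \frac{3}{4\sqrt{5}}(1 - u^2/5)$ for $|u| \leq \sqrt{5}$, and a simple Riemann sum; the variable names are assumptions of this example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel scaled to unit variance; support is |u| <= sqrt(5)."""
    return np.clip(0.75 / np.sqrt(5.0) * (1.0 - u**2 / 5.0), 0.0, None)

# Riemann sums over the support; every integrand vanishes at both endpoints.
u = np.linspace(-np.sqrt(5.0), np.sqrt(5.0), 200001)
du = u[1] - u[0]
k = epanechnikov(u)

total = np.sum(k) * du              # constraint (i): integrates to one
second = np.sum(u**2 * k) * du      # constraint (iii): unit second moment
rough_epa = np.sum(k**2) * du       # objective: int K(u)^2 du
rough_gauss = 1.0 / (2.0 * np.sqrt(np.pi))   # same objective for the normal kernel
```

The two values of $\int K^2(u)\,du$ differ by only about five per cent, which is in line with the remark that the choice of kernel matters little compared with the choice of bandwidth.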

2.9 DEPENDENT OBSERVATIONS

In economics, we often deal with time series. It is therefore useful to understand some dependence conditions used in the study of the asymptotic properties of the estimators. We generally assume that the observed economic variable $\{W_t, -\infty < t < \infty\}$ is strictly stationary. Let $M_a^b$ denote the $\sigma$-algebra of events generated by $\{W_t, a \leq t \leq b\}$ for $-\infty \leq a \leq b \leq \infty$. We say $W_t$ is


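(i) Strong Mixing (SM) if

$\alpha_j = \sup_{B \in M_{-\infty}^{0},\, A \in M_{j}^{\infty}} |P(A \cap B) - P(A)P(B)| \to 0$, as $j \to \infty$.

(ii) Absolutely Regular (ARE) if

$\beta_j = E\{\sup_{A \in M_{j}^{\infty}} |P(A \mid M_{-\infty}^{0}) - P(A)|\} \to 0$, as $j \to \infty$.

(iii) Uniform Mixing (UM) if

$\phi_j = \sup_{B \in M_{-\infty}^{0},\, A \in M_{j}^{\infty}} |P(A \mid B) - P(A)| \to 0$, as $j \to \infty$.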

J

It is known that (iii) $\Rightarrow$ (ii) $\Rightarrow$ (i) since $\phi_j \geq \beta_j \geq \alpha_j$. While there are asymptotic results under various stationary processes for the kernel estimators, some results for strong mixing have been provided in Robinson (1983). Under some regularity conditions, additional to the usual conditions imposed on the kernels and the rate of convergence of a, as well as the condition that W is SM, $\hat f(x)$ and $\hat m(x)$ have the same limiting distribution as in the case where the $x_i$'s are i.i.d. However, he warned against placing too much faith in the estimates because the performance of the kernel estimates clearly depends crucially on the choice of bandwidth. It is perhaps not surprising that stronger conditions have to be imposed for the central limit theorem; what is surprising is that the rate of convergence and the covariance matrix are identical to those in the i.i.d. case. The results thus justify the use of kernel estimation in the time series case. Recent work of Robinson (1986) discovered that the bandwidth chosen should be larger than that for the i.i.d. case.

2.10 OUTLIERS AND TAIL EFFECT


rest of the observations, $\sum_j K_j y_j / \sum_j K_j = y_i$. The same problem will occur if one uses too small a common bandwidth, which controls the number of observations to be used for averaging. This is exactly the problem one faces in the case of density estimates when estimating the tails, where there are only a few observations around.

There are solutions to these problems, such as using the leave-one-out estimator to exclude the ith observation when summing over the N observations in the density or N-W estimates. Another solution may be to use a variable bandwidth, which may generate further problems in practice.

2.11 HIGH DIMENSIONALITY

Applying the kernel estimator to multivariate economic problems has its shortcomings. One serious problem with the kernel estimator is the "curse of dimensionality". This problem arises when we have a large number of explanatory variables as we often encounter in economic appl