J O U R N A L O F
Econometrics
ELSEVIER Journal of Econometrics 81 (1997) 193-221
Efficient estimation in semiparametfic G A R C H models
Fe;k¢ C. Drost ~*, Chris A.J. Klaassenb
a Department o f Econometrics, Tilbury University, P. 0. Box 90153, 5000 L E Tilburg, Netherlands
b Department o f Mathematics, University o f Amsterdam, Pluntage Muidergracht 24, 1018 T V Amsterdam, Netherlands
A b s t r a c t
It is w e l l - k n o w n that financial data sets exhibit c o n d i t i o n a l heteroskedasticity. G A R C H - t y p e m o d e l s are often used to m o d e l this p h e n o m e n o n . Since the distribution o f the rescaled i n n o v a t i o n s is g e n e r a l l y far f r o m a n o r m a l distribution, a sem,.'parametric a p p r o a c h is ad- visable. Several publications o b s e r v e d that adaptive estimation o f the E u c l i d e a n parameters is not possible in the usual parametrization w h e n the distribution o f the rescaled i n n o v a - tions is the u n k n o w ~ n u i s a n c e parameter. However, there exists a reparametrization such that the efficient score functions in the parametric m o d e l o f the autoregres_sion parameters are o r t h o g o n a l to the tangent space generated b y the n u i s a n c e parameter, thus suggesting that adaptive estimation o f the autoregression parameters is possible. Indeed, w e con- struct adaptive and h e n c e efficient estimators in a general G A R C H in m e a n - t y p e context i n c l u d i n g integrated G A R C H models.
O u r analysis is based o n a general L A N t h e o r e m for time-series models, p u b l i s h e d elsewhere. In contrast to recent literature about A R C H m o d e l s w e do not n e e d a n y m o m e n t condition. © 1997 Elsevier Science S.A.
K e y words: LAa'Xl in time-series; Semiparametrics; A d a p t i v i t y ; (integrated) G A R C H (in m e a n )
J E L classification: C22; C14
* Corresponding author.
The first author is r e h a s h fellow o f the Royal Netherlands Academy o f Arts and Sciences (K.N.A.W.). Part of this research was done by the second author at the Euler International Mathematical Institute, St. Petersburg. The authors gratefully acknowledge the helpful comments o f Bas Werker, the editor Helmut Lfitkepohl, and three anonymous referees.
0304-4076/97/$17.00 ~D 1997 Elsevier Science S.A. All rights reserved
194 F. C. Drost. C. A.J. Klaassen ! Journal o f Econometrics 81 ( 1 9 9 7 ) 193-221
I. I n t r o d u c t i o n
It is a well-established empirical fact in financial e c o n o m i c s that time series like e x c h a n g e rates and stock prices exhibit conditional heteroskedasticity. Big s h o c k s are clustered together. T h e original paper o f Engle ( 1 9 8 2 ) proposes the A R C H model to incorporate conditional heteroskedasticity in e c o n o m e t r i c m o d - eling o f financial data sets. Bollerslev ( 1 9 8 6 ) introduces the G A R C H m o d e l as a generalization o f A R C H . This facilitates a p a r s i m o n i o u s parametrization w h i c h is particularly useful w h e n s h o c k s are important for a longer period (the idea c o r r e s p o n d s to the generalization o f A R to A R M A models). Several variations a n d extensions h a v e been p r o p o s e d in the literature. N e l s o n ( 1 9 9 1 ) proposes the exponential G A R C H m o d e l to capture the fact that the stock m a r k e t is s m o o t h e r in u p w a r d directions than in the opposite case ( b e c a u s e o f the leverage effect). G o u r i e r o u x and M o n f o r t ( 1 9 9 2 ) suggest a n o n p a r a m e t r i c approach. T h e y do not restrict attention to conditional variances that d e p e n d only u p o n past squared observations, but they try to estimate the functional f o r m o f the conditional het- eroskedastic variance from the data. A n o t h e r important extension is the G A R C H - M - t y p e m o d e l (cf. Engle et al., 1987). A c c o r d i n g to the Capital Asset Pricing M o d e l one expects h i g h e r returns due to risk p r e m i a if the asset is m o r e risky. T o m o d e l this p h e n o m e n o n the conditional variance is also included in the m e a n equation. Lots o f applications h a v e s h o w n the strength o f the G A R C H type o f modeling. In this p a p e r w e do not refer to original application papers but w e w a n t to d r a w attention to the m o n o g r a p h o f Diebold ( 1 9 8 8 ) and the survey paper o f Bollerslev et al. ( 1 9 9 2 ) .
Despite the success o f the G A R C H history there are several topics that re- quire attention. In this p a p e r w e c o n s i d e r the distributional a s s u m p t i o n s on the rescaled innovations. T h e original formulations o f G A R C H - t y p e m o d e l s a s s u m e that these residuals are standard normal. Diebold ( 1 9 8 8 ) , however, s h o w s that this a s s u m p t i o n is often violated in empirical examples. Typically, the innova- tions h a v e fat-tailed d i s t n t m t i o n s and they are also n o n - s y m m e t r i c in several ap- plications. Drost a n d W e r k e r ( 1 9 9 6 ) provide an explanation for high kurtosis i f the observations arise f r o m a G A R C H data-generating process in c o n t i n u o u s t i m e (see also Drost a n d N i j m a n , 1993; Nelson, 1990a). Diebold ( 1 9 8 8 ) suggests that the errors will be ' m o r e n o r m a l ' i f the process is m o r e and m o r e aggregated. De- spite the o b s e r v e d n o n - n o r m a l i t y o f the error structure, W e i s s ( 1 9 8 6 ) and Lee a n d H a n s e n ( 1 9 9 4 ) h a v e s h o w n that q u a s i - m a x i m u m likelihood estimation ( Q M L E ) , b a s e d u p o n the false a s s u m p t i o n o f normality, yields x/f~-consistent estimators: see also L u m s d a i n e ( 1 9 8 9 ) . H o w e v e r , the efficiency loss m a y be considerable. There- fore, several authors try to a v o i d efficiency loss, allov, i n g the error structure to b e l o n g to s o m e flexible p a r a m e t r i c family o f distributions. Student t-distributions are very popular, cf., e.g., Baillie a n d Bollerslev (1989). A s a d r a w b a c k o f the introduction o f such parametric m o d e l s o f the innovation distribution, we m e n t i o n that the results o f W e i s s ( 1 9 8 6 ) a n d Lee a n d H a n s e n ( 1 9 9 4 ) d o not carry o v e r
F.C. Drost. C.A.J. KlaassenlJournal o f Econometrics ;~i (1997) 193--221 195
to general error distributions. W h i l e Q M L E b a s e d u p o n the n o r m a l distribution yields x/~-consistent estimators, Q M L E b a s e d u p o n other distributions (e.g., the student distributions) generally e v e n fails to be consistent i f the true distribution is different.
In the approaches m e n t i o n e d above, the stochastic error structure is still de- scribed by s o m e finite-dimensional statistical model. To avoid the introduction o f a w r o n g parametric family o f innovation distributions leading to inconsistent estimators and to be m o r e flexible, a s e m i p a r a m e t r i c a p p r o a c h is to be preferred. W e w~.nt to estimate the conditional heteroskedastic character o f the G A R C H process but w e do not w a n t t~ restrict the class o f error distributions too much. Apart from s o m e regularity conditions w e will a s s u m e the distribution o f the innovations to be c o m p l e t e l y u n k n o w n . In passing w e will c o n s i d e r the case o f s y m m e t r i c a l l y distributed innovations. A~ first sight, these types o f estimation p r o b l e m s s e e m to be m u c h h a r d e r than the c o r r e s p o n d i n g parametric ones a n d one w o u l d expect that optimal semiparometric estimators are less p r . ~ i s e a s y m p - totically than optimal parametric estimators. F o r lots o f interesting e c o n o m e t r i c m o d e l s this p r e s u m p t i o n turns out to be too pessimistic. A d a p t i v e estimation is oRen possible. A d a p t i v e estimatio~.l is j u s t a special instance o f s e m i p a r a m e t r i c efficient estimation. Just as in parametric m o d e l s , in s e m i p a r a m e t r i c m o d e l s an efficient estimator is an asymptotically n o r m a l estimator with m i n i m a l variance. I f this m i n i m a l variance is the s a m e as w h e n the error distribution is k n o w n , o n e calls the efficient estimator a d a p t i v e since it adapts, so to say, to the u n d e r l y i n g error distribution. Typically, an e s t i m a t o r b a s e d on a ( w r o n g l y ) specified error distribution is not efficient in the s e m i p a r a m e t r i c sense. F o r i.i.d, o b s e r v a t i o n s a lot o f adaptive and s e m i p a r a m e t r i c results are available [cf., e.g., Bickel et al. ( 1 9 9 3 ) ( B K R W , 1993 from n o w o n ) a n d the s u r v e y p a p e r s o f R o b i n s o n ( 1 9 8 8 ) and N e w e y (1990)]. R i g o r o u s results are sparse in a time-series context. A R M A m o d e l s are considered in detail b y Kreiss ( 1 9 8 7 a , b). S o m e ~ s u l t s for G A R C H are o b t a i n e d in E n g l e and G o n z ~ l e z - R i v e r a ( 1 9 9 1 ) and S t e i g e r w a l d ( 1 9 9 2 ) (se¢ also P6tsA:her, 1995; Steigerwald, 1995). L i n t o n ( 1 9 9 3 ) discusses the s e m i p a r a m e t r i c properties o f A R C H m o d e l s in m o r e detail. H o w e v e r , these p a p e r s i m p o s e rather h i g h m o m e n t conditions. T h e p a r a m e t e r estimates o b t a i n e d in empirical w o r k generally fail these m o m e n t conditions and, therefore, the scope for application s e e m s to be limited.
In Drost et al. ( 1 9 9 7 ) ( h e n c e f o r t h D K W , 1997) a general L A N t h e o r e m for time-series m o d e l s is presented together with conditions g u a r a n t e e i n g the exis- tence o f efficient estimators. W e will apply these results to G A R C H type m o d e l s , including, e.g., I - G A R C H a n d G A R C H - M , thus a v o i d i n g severe m o m e n t condi- tions. W e do not r ~ e d the existence o f m o m e n t s neither o f the rescaled inno- vations in the G A : _ C H m o d e l (admitting, for e x a m p l e , C a u c h y errors) n o r o f the observations (as is clear f r o m the inclusion o f i m e g r a t e d G A R C H m o d e l s ) . Since w e only a s s u m e the existence o f a stationary solution o f the G A R C H equations, o u r a p p r o a c h captures the m o d e l s c o m m o n l y used. A general L A N
196 F.C. Drost, ¢ . A . J . KiaassenlJournal o f Econometrics 81 (1997) 193-221
theorem for time-series models is also contained in Theorem 13, Section 4, o f Jeganathan (1995). Based hereon is his Theorem 17, Section 4, which yields adaptive estimators for ARMA-type location models. However, this result is not directly applicable to G A R C H scale models and, moreover, it heavily leans on symmetry o f the innovations.
To keep notation simple we restrict attention to the popular and most commonly used G A R C H ( I , 1 )-type models. This preserves the essential difficulty o f G A R C H (with respect to A R C H ) since both the A R and the M A part are present in the conditional variance equation. All past observations show up (at an exponentially decaying rate). The statement o f our theorem with respect to G A R C H ( I , 1 ) is easily generalized to the general case o f G A R C H ( p , q ) .
The first semiparametric results in a G A R C H context were only partially suc- cessful. Engle and Gonz_Alez-R/vera (1991) state "Monte Carlo evidence suggest that this semiparametric method
(i.e. the discrete maximum penalized likelihood
estimation technique o f Tapia and Thompson,
1978) can improve the efficiencyo f the parameter estimates up to 50% over QMLE, but it does not seem to cap- ture the total potential gain in efficiency. In this sense we say that the estimator is not adaptive' in the class o f densities with mean 0 and variance 1; that is, the estimator is not" fully efficient, and it does not achieve the Cram6r-Rao lower bound. The infornmtion matrix is not block-diagonal between the parameters o f interest (the ones in the mean and in the variance equation) and the nuisance parameters (the know of the density). I f we choose the parametric form o f the model with a conditional parametric density defined by a shape parameter, this one being part o f the parameters to estimate, we can show easily that the expec- tation o f the cross-partial derivatives o f the log-likelihood function respects the parameter o f interest and the shape parameter is different from 0. In other words, the estimation o f the shape parameter affects the efficiency o f the estimates o f the parameters o f interest" (pp. 3 5 5 - 3 5 6 ) . These statements imply that the finite- dimensional parameter describing the G A R C H model (with the standardized error distribution as nuisance parameter) is not adaptively estimable. This is not sur- prising since the classical G A R C H forml:lation contains a scale parameter, and in most models the variance is not adaptively estimable. Therefore, the scale parameter is often included into the (infinite-dimensional) nuisance parameter. For the G A R C H model this procedure does not work: the scores w.r.t, the re- maining autoregression parameteis are still not orthogonal to the tar:gent space. Hence, complete adaptive estimation o f the conditional heteroskedastic character is not possible in G A R C H models. This explains the efficiency loss observed by Engle and Gonzfilez-Rivera (1991). However, calculation o f the scores w.r.t, the parameters o f the G A R C H model shows that there are several orthogonality re- lations between the score space and the tangent space generated by the unknown shape. Linton (1993) and Drost et al. (1994) (henceforth DKW, 1994) obtain alone different lines a reparmnetrization o f the A R C H and G A R C H model, re- spectively, such that the autoregression parameters are adaptively estimable and
F.C. Drost, C.A.J. Klaassen/Journal o f Econometrics 81 (1997) 193-221 197
the location-scale parameters generate the most difficult one-dimensional sub- problems. So, knowledge o f the shape o f the error distribution does not help to construct better estimators of the conditional heteroskedastic character o f the G A R C H process. This resembles the regression model with unknown location p E R, regression parameter fl E R k and completely unknown error distribution, where the regression parameter/~ is adaptively estimable i f the location parame- ter is included into the nuisance (see Bickel, 1982).
The paper is organized along the followiwg lines. In Section 2 we state the LAN theorem for a large set o f G A R C H ( I , I ) - t y p e models, including all sta- tionary classical GARCI] models such as, e.g., I-GARCH and GARCH-M. This LAN property is derived for the parametric model with the shape o f the inno- vations known, and it implies the Convolution Theorem o f Hajek (1970) which we will state next. This Convolution Theorem yields a bound on the asymptotic performance o f estimators in the parametric model and is valid, a f o r t i o r i , for the semiparametric model as well. Section 3 is devoted to the construction o f an estimator o f the autoregression parameters on the assumption that the shape o f the innovations is unknown, i.e. within the semiparametric model. This es- timator happens to attain the bound from the parametric Convolution Theorem and therefore is asymptotically efficient in the parametric model and hence in the semiparametric model, since it does not use knowledge about the shape o f the innovations. Such an estimator, which attains the parametric bound in a semi- parametric model, is called adaptive. The proofs o f these results are based on DKW(1997) and most o f them are given in the appendix.
A small simulation study is presented in Section 4. It turns out that the sug- gested optimal estimator performs as expected: the estimator performs better than Q M L E and the difference with MLE (if the error distribution is known) becomes negligible when the sample size is growing large. The empirical illustration in this section shows that the efficiency loss by using Q M L E may be considerable. Some conclusions are drawn in Section 5.
2. L A N and Convolution Theorem
We consider a generalization o f the reparametrized G A R C H ( p , q ) model as given in Linton (1993), with p - - 0 , and motivated by adaptation arguments in DKW(1994). For notational simplicity, we take p - - q = I. It~ this manner the essential difficulty o f an infinite number o f lags is retained. To obtain the corre- sponding results for the general case (with p , q E N fixed) a careful replacement o f coefficients by vectors suffices.
Let p E R , a > O , or>O, and f l > O be parameters and let {~t: t ~ Z } be an i.i.d. sequence o f innovation errors with location zero, scale one, and density g. Put ~t = # + o , t and note that ~t is a random variable with location /~, scale a, and density c r - l g ( { - - p } / e ) . We introduce the following convention: random
198 F.C. Drost. C.A.J. KlaasseniJournal o f Econometrics 81 (1997) 193-221
variables, like e and ~, d e n o t e a typical e l e m e n t o f the c o r r e s p o n d i n g sequences {e/: t ~ 7/} and {~t: t ~ 7/}.
C o n s i d e r the m o d e l with observations
y t = h ~ / 2 ~ t :l~nt" Lt/2 + ~rh~/2~t, ( 2 . 1 )
w h e r e the u n o b s e r v a b l e heteroskedasticity factors ht d e p e n d on the past via ht = 1 + / ~ h t - ! + ~ Y L l = 1 + h t - 1 ( ~ + oc~Ll). ( 2 . 2 ) O b s e r v e that the Euclidean p a r a m e t e r 0--(g,~ff,/~, a ) ' is identifiable. T h r o u g h o u t w e a s s u m e that Eq. ( 2 . 2 ) a d m i t s a stationary solution {ht: t E ~'}. A necessary a n d sufficient condition is given by (Nelson, 1990b, T h e o r e m 2):
A s s u m p t i o n A.
E ln{/~ + ~ 2 } < 0 . ( 2 . 3 )
O u r s e m i p a r a m e t r i c analysis treats the density g as an infinite-dimensional nui- sance p a r a m e t e r and includes all strictly stationary G A R C H m o d e l s o f type (2.1), (2.2). T h e s e equations contain, e.g., the classical E n g l e ( 1 9 8 2 ) - B o l l e r s l e v ( 1 9 8 6 ) G A R C H m o d e l with a different parametrization and with finite s e c o n d m o m e n t s (fl -t- 0ccr 2 < 1 and / z : - 0 ) , and the I - G A R C H m o d e l o f E n g l e and Bollerslev ( 1 9 8 6 ) (fl+0~o " 2 = 1 and / ~ - - 0 ) . F u r t h e r m o r e , our m o d e l r e s e m b l e s the G A R C H - M m o d e l o f Engle et al. ( 1 9 8 7 ) . In the m e a n equation ( 2 . 1 ) w e h a v e i n c l u d e d the conditional standard deviation o f Yt while Engle et al. ( 1 9 8 7 ) include a k i n d o f conditional variance. M o r e precisely stated, their m o d e l is g i v e n by zt -~ ¢Sht --k Yt and p -- 0, i.e., zt = ¢Sht + ah~/eet. Inserting /~ = 0 in ( 2 . 1 ) or/~----~ = 0 in the G A R C H - M m o d e l yields the classical G A R C H model. G e n e r - ally, risk a v e r s i o n is stronger p r o n o u n c e d in the original G A R C H - M m o d e l than in o u r formulation.
S u p p o s e that w e observe Y l , . . . , y n , and s o m e starting v a l u e h01 initializing (2.2). It is not n e e d e d that h01 arises f r o m the stationary solution o f (2.2). W e are c o n s i d e r i n g estimation o f 0, b a s e d on h o l , y , . . . Yn, in the p r e s e n c e o f the infinite-dimensional nuisance p a r a m e t e r [/. H o w e v e r , in this section w e will fix the nuisance p a r a m e t e r g and in the resulting parametric m o d e l w e will derive a b o u n d on the asymptotic p e r f o r m a n c e o f regular estimators o f 0. a so- called C o n v o l u t i o n T h e o r e m . T o that end w e c h o o s e local s u b m o d e l s and w e will study estimation o f 0 locally asymptotically. T h e a b o v e - m e n t i o n e d C o n v o l u t i o n T h e o r e m holds o n c e the log-likelihood ratios o f the o b s e r v e d r a n d o m variables are locally asymptotically n o r m a l ( L A N ) .
O b s e r v e that the m o d e l with the autoregression p a r a m e t e r s ~ and jff fixed too, c o r r e s p o n d s to the location-scale m o d e l for i.i.d, r a n d o m variables since the in- f o r m a t i o n p r o v i d e d by the o b s e r v a t i o n s h0~, y~ . . . Yn is equal to the information
F C. Drost. C.A.,I. K l a a s s e n l J o u r n a l o f Econometrics 81 ( 1 9 9 7 ) 193-221 199
contained in the i.i.d, r a n d o m variables £ 1 , . . . , ~n- C o n s e q u e n t l y , the location-scale m o d e l is a parametric s u b m o d e l o f o u r time-series m o d e l and it m a k e s sense to a s s u m e that this s u b m o d e i is regular, i.e. (see Hfijek and S i d , ~ , 1967):
Assumption B. T h e distribution o f e possesses an absolutely c o n t i n u o u s L e b e s g u e density g with derivative g ' and finite Fisher information for location
~(~) =/{o'/g}2o(c)d~
a n d for scale
/ { l 4 - E g t / g ( e ) } 2 g ( e ) d e .
& ( o ) =
M o r e o v e r , the r a n d o m variable ~ has location zero a n d scale one.
( 2 . 4 )
( 2 . 5 )
To b e able to derive an a s y m p t o t i c l o w e r b o u n d w e h a v e to rely o n semi- parametric m e t h o d s as presented in, e.g., B K R W ( 1 9 9 3 ) a n d D K W ( 1 9 9 4 , 1997). So w e fix 0 at 00 =(o~o, flo, l~o, oo)' and c h o o s e local parametrizations On =
(ctn,fln, Pn, On)' a n d On=(~n, fln, fin, On) ' such that ]On--Oo[--O(n-t/2), [On--Ool:
O(n--I/2), a n d e v e n
An -- v/'n(0n -- 0,,) ---* 2, as n --, oo. ( 2 . 6 ) In the r e m a i n d e r expectations, c o n v e r g e n c e s , etc., are implicitly t a k e n u n d e r On
and O (unless o t h e r w i s e indicated).
To obtain a u n i f o r m L A N t h e o r e m w e c o n s i d e r the log-likelihood latio An o f h0x,Yl, . . . . y,, for 0n with respect to On u n d e r On ( a n d O fixed). O b s e r v e that t h e residuals a n d the conditional variances up to time t c a n be recursively calculated f r o m 0 a n d the o b s e r v a t i o n s h 0 1 , Y l , . . . , Y t : with h l ( O ) - - h o l , obtain for t = 1 , 2 , . . .
¢ , ( 0 ) = y , / h y 2 ( o ) , c,(0) = { ~ , ( 0 ) - # } / o , h,+~(0) = l + # h , ( 0 ) + ~y2. ( 2 . 7 ) (2.S) ( 2 . 9 ) Conditionally on hol the density of" Yt . . . Yn u n d e r On is
f l o~lh~t'/2g(tr~ ' {h~l/ayt --/tn})
= fl
o~'h~'/2 g({~nt --
I~n}/On)
t = i t----I
lg
= 1-[ ¢r~'h~'/eg(~,,t), t=l
200
F.C. Drost. C.A.J. KlaassenlJournal of Econometrics 81 (1997) 193-221
To e n h a n c e the interpretation o f this formula and to stress the link b e t w e e n the present time-series model and the i.i.d, location-scale model w e introduce the notation L , t =
hi(On ),
:{z,a}(x)=
loga({x- z}/cr)- log#,
Snt
-__ Ell2
_ Lt/2 'O n f l n l ~ O n r # n l
(2.10)
and ~.t --~t(On)- W i t h A s the log-likelihood ratio for h01, the log-likelihood ratio
An
m a y be written as n : , ~ {¢'{(/gn, trn) "4- txnn-l/2(Mnt, Snt)}(~nt) -- :{lan, tXn}(~nt)} -t- A s l = l n -- ~ { : { ( 0 , I ) --Fn-'/2(Mnt, Snt)}(e.nt)
--
:{0, 1 }(ent)} "-k A s. t = l(2.11)
This expression resembles the log-likelihood ratio statistic tbr the i.i.d, location- scale model but here the deviations M . t and S.t are random. In the i.i.d, case the L A N t h e o r e m is obtained with deterministic sequences. W e will apply the results o f D K W ( 1 9 9 7 ) w h i c h allow for such r a n d o m sequences.
To get rid o f the starting condition in the log-likelihood ratk~ statistic w e will use the following regularity condition [compare aesump,),en ( A . 3 ) o f Kxeiss ( 1 9 8 7 a ) and A s s u m p t i o n A o f D K W ( 1 9 9 7 ) ] .
A s s u m p t i o n C. The
density if0 o f the initia/ value hol satisfies, u n d e rOn,
A s = log{~o,/~o,(hol
)} ~ 0 as n ---+ oo. (2.12)To m a k e an appropriate expansion o f An it will be h a n d y to in::'oduce the notation ~,u for the four-dimensional conditional score at time t. T o be m o r e precise, denote the two-dimensional vector derivative o f the conditional variance
by
0 ( Y~-! ) (2.13)
H i ( O ) =
~(~,fl---~ht(O)=
flH,_t(O) -I- ht_l(O)
w i t h / - / 1 ( 0 ) = 0 2 . Define the (4 x 2)-derivative matrix
Wt(O)
[motivated by differ- entiation o f(Mnt,Snt)
with respect to 0. at 0.] b yW , ( O ) = e r - ~ ( ½ hT~(O)HdO)(#'~r) )
F. C. Drost, C. A.J. Klaassen / Journal o f Econometrics 81 ( 1 9 9 7 ) 193-221 denote the location-scale score b y ( w i t h
1'--g'/g)
:'(et(O)) )
q / , ( O ) = - l + ~,(o):'(e, to)) ' 2 0 1 (2.15) and put:~(0)
= w , ( 0 ) ~ ( 0 ) .Then, the conditional score at time t m a y be denoted b y g'nt = g~(0n). O b s e r v e that / is j u s t the heuristic score. A n e x p a n s i o n o f (2. I 1 ) s h o w s that the log-likelihood ratio An m a y be alternatively written as
A n = A,n_,/2 ~ ~ , -- g n -1 ~ {AtLt} 2 + 1 Rn. ( 2 . 1 6 )
t=l t----I
T h e L A N result for the parametric version o f model (2.1), (2.2) is stated in the following theorem. T h e p r o o f is deferred to A p p e n d i x A.
T h e o r e m 2.1 ( L A N ) . Suppose that Assumptions A - C are satisfied. Then the local log-likelihood ratio statistic An, as defined by (2.11 ) and (2.16), is asymp- totically normal. More precisely, under On,
R n L O, A n D_D N(_½),l(Oo))..A,i(Oo).~ ) as n - - ~ o o , (2.17)
where I(0o) is the probability limit o f the averaged score products gntg~nt.
W e are n o w in a position to apply the Convolution T h e o r e m o f Hfijek (1970); cf. T h e o r e m 2.3.1 o f B K R W (1993, p. 24).
T h e o r e m 2.2 (Convolution T h e o r e m ) . Under the assumptions o f the L A N The- orem 2.1, let { T n : n E N } be a regular sequence o f estimators o f q ( 0 ) , where q" IR 4 --+ I~ k is differentiable with total differential matrL, c q. A s usual, regular- ity at 0 "-Oo means that there exists a random k-vector Z such that f o r all sequences {On : n E N}, with nl/2(On - 0 0 ) = O ( 1 ) ,
ni/2{Tn--q(On)} D Z as n--+cx~, ( 2 . 1 8 )
o I
where the convergence is under On. Let Z = q(Oo)l(Oo)- i(Oo) be the efficient influence function, then, under 0o,
(
n ' / 2 { r . - - q ( O o ) - - n - ~ E , = l t ' t }) ( )
D Ao202 F.C. Drost. C.A.J. KlaassenlJournal o f Econometrics 81 (1997) 193-221
o ! o
w h e r e Ao a n d Zo a r e i n d e p e n d e n t a n d Zo is N ( O , q ( O o ) l ( O o ) - q ( O o y ) . M o r e o v e r , { Tn : n E N } is efficient i f { Tn : n E ~ } is a s y m p t o t i c a l l y linear in the efficient i n f l u e n c e f u a c t i o n , i.e. i f Ao -- 0 (a.s.).
A s a c o n c l u s i o n f r o m the C o n v o l u t i o n T h e o r e m we obtain that a regular esti- m a t o r ~), o f 0 satisfies, u n d e r 00,
- 0 0 ) `40 + z 0 ,
i.e. the limit distribution o f On is the c o n v o l u t i o n o f the r a n d o m vector `4o and a G a u s s i a n r a n d o m vector with m e a n zero and variance the inverse o f the infor- m a t i o n matrix I ( 0 o ) . Since `40 add~ noise to the G a u s s i a n v e c t o r Z0, it is cl¢,~r that d o - 0 w o u l d be preferred. This m o t i v a t e s the usual t e r m i n o l o g y (as lower b o u n d , etc.) b e c a u s e ,40 = 0 is attainable in the majority o f situations.
In the r e m a i n d e r o f this p a r a g r a p h w e simplify exposition b y s u p p o s i n g that the scores given a b o v e are stationary such that w e m a y restrict attention to j u s t one specific element; c o m p a r e D K W ( 1 9 9 4 ) . In this w a y it is easier to c o m p r e - h e n d the specific adaptiveness features in the G A R C H model. T h e s e results are d e r i v e d a l o n g the lines o f Sections 2.4 and 3.4 o f B K R W ( 1 9 9 3 ) . This exposi- tory simplification will be suppressed again in the next section w h e n deriving a ( s e m i p a r a m e t r i c ) efficient estimator. This optimal estimator satisfies the properties o b t a i n e d in ( 1 ) - ( I V ) below.
In a stationary setting the Fisher information matrix defined in the L A N T h e - o r e m 2.1 simplifies to
I ( 0 o ) = E l i ' - - EWd./@'W' - E W l l ~ ( g ) W ' ,
w h e r e lls(g) is the information matrix in the location-scale model, ( E(E')2 E~:(f') 2 )
/is(g) -- E~b~b' -- e~:(~f') 2 E( l d- ~:~f,)2 "
I f the i o c a t i o , p a r a m e t e r p is k n o w n to be zero, as in the classical G A R C H case, this f o r m u l a simplifies even further to
I ( 0 o ) = I~(g)EW~ W~', ( 2 . 2 0 )
w h e r e Ws is the t h r e e - d i m e n s i o n a l s u b v e c t o r o f W c o n c e r n i n g the relevant deriva- tives with respect to the scale p a r a m e t e r cr and w h e r e Is(g)----E(I -t-eel) 2 is the i n f o r m a t i o n for scale in the i.i.d, scale model.
(I) I f g is k n o w n a n d if w e w a n t to estimate the autoregression p a r a m e t e r v----(=,fl)' in the presence o f the nuisance p a r a m e t e r ~ / - (l~, or)' then w e see that the efficient influence function, as defined in the C o n v o l u t i o n T h e o r e m 2.2, equals
F.C. Drost. C.,4.J. Klaassen/Journal o f Econometrics 81 (1997) 193-221 203
A s in Proposition 2 . 4 . I . A and f o r m u l a (2.4.3) o f B K R W (1993, pp. 2 8 , 3 0 ) w e m a y write
: = [ E l f : l * ' ] - l : x * , ( 2 . 2 1 )
w h e r e the so-called efficient score function E~* o f v is obtained by the c o m p o - nentwise projection o f ~ , the first t w o e l e m e n t s o f L, onto the o r t h o c o m p l e m e n t o f [ ~ ] , the linear span o f the last t w o c o m p o n e n t s o f / . Here the inner p r o d u c t is the covariance and the o r t h o c o m p l e m e n t is t a k e n in the linear space s p a n n e d by all c o m p o n e n t s o f ~'. It is easy to verify that
:~ -- ½oF -I {(H/h ) -- E ( H / h )}(Iz, a)~, ( 2 . 2 2 ) and that ~'1" is orthogonal to ~ indeed, since
H/h =Ht/h:
d e p e n d s on the past only a n d is i n d e p e n d e n t o f the present innovation et.( I I ) I f g is u n k n o w n and i f w e w a n t to estimate v i : the presence o f the nui- sance p a r a m e t e r s r/ a n d O then w e obtain the s a m e efficient influence function. T o see this note that the c o m p o n e n t s o f ~l* as given in ( 2 . 2 2 ) are orthogonal to e v e r y e l e m e n t o f L°(v) b y the i n d e p e n d e n c e o f present (~ and e) a n d past (h and H ) . B y (3.4.2) and Corollary 3.4.1.A o f B K R W (1993, pp. 7 0 , 7 2 ) w e obtain
l(Po l
v,.~)>~
E::'::'
( 2 . 2 3 )for all regular parametric s u b m o d e l s .~ o f o u r s e m i p a r a m e t r i c m o d e l ~ , i.e. the information at P0 in estimation o f v within the parametric s u b m o d e l .~ equals at least the information at Po in estimation o f v within the parametric model, studied in (I), with 0 k n o w n . In other words, as far z:s e s t i m a t i o n o f v is concerned, no parametric m o d e l .~ is asymptotically m o r e d:,fficult to first order (contains less i n f o r m a t i o n ) than the m o d e l f r o m (I). C o n s e q u e n t l y , the s e m i p a r a m e t r i c m o d e l itself is asymptotically to first order as difficult as the parametric m o d e l with g k n o w n , i.e. the information matrix with respect to v evaluated at P0 for the s e m i p a r a m e t r i c m o d e l ~ equals the l o w e r b o u n d in the parametric m o d e l with g ki~own (case ( I ) ) ,
1(P01 v,~,) = E t ; e ; ' .
O n c e m o r e , the efficient influence function is g i v e n by (2.21). Apparently, introduction o f the nuisance p a r a m e t e r g in the presence o f the Euclidean nui- sance p a r a m e t e r r/ does not c h a n g e the efficient influence fimction for v. Hence, estimation o f v is asymptotically as hard not k n o w i n g g as k n o w i n g 0- O n e usu- ally calls this
adaptivity.
Observe, however, that the presence o f the n u i s a n c e p a r a m e t e r w/ is important to derive this result. I f r/ is k n o w n adaptive e~tima- tion o f v is not possible! T h e s a m e conclusion applies i f ~/ is included into the ' b i g ' infinite d i m e n s i o n a l nuisance p a r a m e t e r 0. So, the nuisance p a r a m e t e r t/ is204 F.C. Drost. C.A.J. K/aassenlJournal o f Econometrics 81 (1997) 193-221
treated in a n o t h e r w a y than the nuisance p a r a m e t e r O- Since location-scale pa- r a m e t e r s are almost always present in econometric models a different treatment is not unreasonable and the usage o f the protected notion ' a d a p t i v i t y ' is legalized. H o w e v e r , with the c o m m e n t s above in mind, a m o r e appropriate w a y of saying this is to call the p a r a m e t e r v r/-adaptive, explicitly referring to the remaining n u i s a n c e parameters present in the model. [Of course, a similar r e m a r k applies to, e.g., the non-synmletfic regression model as discussed in Bickel (1982), w h e r e the regression p a r a m e t e r fl is not fully adaptively estimable. In fact fl is /~-adaptive.]
( I I I ) Estimation o f the r e m a i n i n g p a r a m e t e r r/ is completely a n a l o g o u s to the location-scale p r o b l e m for i.i.d, variables. Obtain the w e l l - k n o w n lower bound for t/ in the semiparametric location-scale model. It suffices to construct a sequence o f estimators { ~ , , n E r~} for r/ attaining this bound. Let 0, be some initial v / J - consistent estimator o f 0, calculate h,t = ht(0n) by plugging in On into (2.9) and obtain the residuals ~,t = ~ t ( 0 , ) -
yt/h,,t I/2,
similarly. I f one proceeds as if the~,t
are i.i.d, observations from some location-scale model, one obtains a semi- parametric efficient estimator for 17 in our model (as is easily verified from the C o n v o l u t i o n T h e o r e m 2.2 by c h o o s i n g an appropriate function q). To be more explicit, w e a s s u m e that O has finite second m o m e n t and w e define the location and scale parameters by standardizing g via the equations E ~ - 0, Ege 2 -- 1. T h e n the square root o f the sample variance is optimal for tr both in the s y m m e t r i c and n o n - s y m m e t r i c case. T h e sample m e a n is optimal for /~ if no s y m m e t r y is a s s u m e d and u n d e r the assumption o f s y m m e t r y one has to use a n efficient es- timator for the symmetric location-problem (cf. Example 7.8.1 o f B K R W , 1993, p. 400). I f one w a n t s to avoid m o m e n t conditions on e one m a y define the location-scale p a r a m e t e r in another way, see the discussion o f the M - e s t i m a t o r in Section 3.( I V ) Finally, w h e n estimating the whole Euclidean p a r a m e t e r 0, the efficient score is simply obtained from ( I I ) and (IIl). Following the a r g u m e n t s leading to (2.23) in ( I I ) this score function yields a lower b o u n d indeed. Optimality o f this b o u n d follows from ( I I I ) by choosing the m o s t difficult direction from the location-scale problem.
O b v i o u s substitutions in T h e o r e m s 2.1 and 2.2 s h o w that the conclusions above are also valid for the classical G A R C H model with /~ =Eq.~ = 0 . A n optimal estimator o f tr in the n o n - s y m m e t r i c case is given t h e n by the square root o f (cf. E x a m p l e 3.2.3 o f B K R W , 1-.',-93, pp. 5 3 - 5 5 )
En ~.3t , = ,
In the symmetric case the limiting behavior o f this estimator and the square root o f the sample variance are the same.
F.C. Drost, C.A.J. KlaasseniJournal o f Econometrics 81 (1997) 193-221 2 0 5 3. Adaptive Estimators
In classical parametric models the maximum likelihood estimator is asymp- totically efficient, typically. In semiparametric models such an estimation princi- ple yielding efficient estimators does not exist. However, there exist methods to upgrade x/J-consistent estimators to efficient ones by a Newton-Raphson tech- nique, provided it is possible to estimate the relevant score or influence func- tions sufficiently accurately. In Klaassen (1987) such a method based on "sam- ple splitting" is described for i.i.d, models. Schick (1986) uses both "sample splitting" and Le Cain's "discretization', again in i.i.d, models. See, e.g., Sec- tion 7.8 of B K R W (1993) for details. Schick's (1986) method has been adapted to time-series models in Theorem 3.1 o f DKW(1997). We assume the existence o f such a preliminary, x/~-consistent estimator.
Assumption D.
There exists a x/n-consistent estimator ~ o f On (under On and g). For our G A R C H model a natural candidate for such an initial estimator is the MLE based on the assumption o f normality o f the innovations Yr. One often calls this estimator the quasi-MLE. Probably, this QMLE is v/~-consistent tmder every density O with Eg~ 4 < o ¢ , this has been shown by Weiss (1986) for ARCH models and under restrictions by Lee and Hansen (1994) for G A R C H models, which are slightly different from ours; see also Lumsdaine (1989). The additional moment condition on v is needed there since a quadratic term appears in the score function o f the scale parameter. To avoid moment conditions altogether, one could use, e.g., another preliminary M-estimator, instead. Let X : R --, R 2 be a sufficiently smooth bounded function with monotonicity properties. As an example we mention X = (Xm,X2)' with2
X l ( x ) - l + e x p { - - x } - 1 ' x E R ,
the location score function for the logistic distribution and
fo x exp{--y}
Xa(x)=
2Y(l + e x p { _ y } ) z d y - 1, x E R .The M-estimator will solve the equations [cf. ( 2 . 7 ) - ( 2 . 9 ) and (2.13), (2.14)]
Wt(0)X(~t(0)) -- 0. (3.1)
t----I
Use of this M-estimator implies that one standardizes g at location 0 and scale 1 by the equation E g x ( e ) = 0 ; in the normal case with QMLE this yields /t as expectation and ~r as standard deviation.
206 F.C. Drost, C.A.J. KiaassenlJournal o f Econometrics 81 (1997) 193-221
To prove that e ~ i m a t i o n via (3.1) s h o w s validity o f A s s u m p t i o n D w e h a v e to prove existence o f this M-estimator and its x/~-consistency. It should be possible to s h o w the existence along the lines o f Scholz ( 1 9 7 1 ) b y studying the 4 by 4 pseudo-information matrix E W z z ' W ' ; see also H u b e r (1981, pp. 138-139). Here w e will not attempt to do this, since the situation is m u c h more complicated than the location-scale p r o b l e m studied in the literature. At the cost o f some generality w e suppose here that x/~-consistent estimators ~,~ and /~n are given. T h e x/'~-consistency o f ~, a n d / ~ , together with the contiguity obtained from the L A N T h e o r e m 2.1 implies that w e m a y treat the parameters 0t and fl as given. So, w e are in fact in the |.i.d. location-scale model and the M-estimators for /~ and tr solving the latter t w o equations in (3.1) are x/~-consistent; see H u b e r ( 1 9 8 1 ) and Bickel (1982). W e conjecture that the p r o o f o f ;he more general M-estimator solving (3.1) can be given along similar lines.
Here w e will focus on efficient and hence adaptive estiraation o f the autore- gression parameters ot and fl (cf. ( 1 ) - ( I V ) o f Section 2); alternatively, in view o f (2.14), note that the score f',,t satisfies the form discussed in E x a m p l e 3.1 o f D K W ( 1 9 9 7 ) . In the appendix w e verify the. conditions o f T h e o r e m 3.1 in D K W ( 1 9 9 7 ) , this yields the following theorem.
T h e o r e m 3.1. U n d e r A s s u m p t i o n s A - D a d a p t i v e e s t i m a t o r s o f ~ a n d ~ do exist. To describe our adaptive estimator m o r e accurately, let On = ( ~n, ~n, ~ln, ~n ) t be a x/-n-consistent estimator o f 0 and c o m p u t e Wt(0n) via (2.13) and (2.14). Let gn| . . . ~nn be the residuals c o m p u t e d from h l , y l , . . . , y ~ and 0, using (2.8). Via a kernel estimate based on an| . . . ~,, with the logistic kernel, say k(-), and b a n d w i d t h b,~ w e estimate 0 ( ' ) by
0 : ' ( ' ) = n , = | b,, b,,
and subsequently ~ ( - ) by ~,,(-); here bn--* 0 and nb 4 --~ oo. N o w our estimator m a y be written as
- - !
x - ~ Wt(On)- - ~ Ws(O,,) ~n(~,,t). (3.2)
n 1=1 n $ = |
With 0n the Q M L E this is the estimator used in the simulations o f Section 4. T o prove that such estimators are adaptive w e need the following two technical modifications.
D i s c r e t i z a t i o n : On is discretized by c h a n g i n g its value in (0, cx~) x (0, oo) × ~ x (0, oo) into (one of) the nearest point(s) in the grid ( c / v / ' n ) ( ~ x ~ x Z × N ) .
E C. Drost, C.,4.J. Klaassen/.lournal o f Econometrics 81 (1997) 193-221 207 This technical trick enables one t o consider 0n to be non-random, and therefore independent o f ~nt, Yt, and hi.
Sample splitting: The
set o f residuals ~,1, . . . . ~-n is split into two samples, which may be viewed as independent now. For ~n~ in the first sample, the second sample is used to estimate 0 ( - ) by ~n2(') and 0n(~nt) in (3.2) is replaced by ~,2(~,,). Similarly for~nt
in the second sample, the first sample is used to estimate ~(-). In this way, again some independence is introduced artificially to make the proof work.This approach has been adopted in DKW(1997). It should be emphasized that both trick3 are merely introduced as a technical device to make proofs work. Other approaches have also been studied in the literature. Klaassen (1987) has shown that discretization may be avoided at the cost o f an extra sample splitting. Schick (1986) and Koul and Schick (1995) show that sample splitting may be avoided at the cost o f some extra conditions.
4. Simulations and an empirical e x a m p l e
To enhance the interpretation and validity o f the theoretical results o f the previous sections we present a small simulation experiment. Furthermore, a case study concerning some exchange rate series is given.
We simulated several G A R C H ( 1,I ) series o f length n ~- 1000, parameters ( ~ f l , o ) - - ( 0 . 3 , 0 . 6 , 1 ) , (0.1,0.8,1), and (0.05,0.9,1) (the parameter /~ is set to zero and is not estimated to allow for a better comparison with previous simulation studies), and eight different innovation distributions: normal, a balanced mixture o f two standard normals with means 2 and - 2 , respectively, double exponen- tial, Student's distributions with v - 5, 7, and 9 degrees o f freedom, and ( s k e w ) chi-scluared distributions with v - - 6 and 12 degrees o f freedom. These densities are rescaled such that they have the required zero mean and unit variance.
It is the purpose o f the simulations to evaluate the moderate sample properties o f the autoregression parameters ~ and /~ which are adaptively estimable, in principle. For each series we estimated these parameters with MLE, QMLE, and a one-step semiparametric procedure. For the latter estimation method we made two estimates: one under general assumptions on the innovation distribution and one under the extra assumption o f symmetry. The theoretical results imply that there should be no difference between these two semiparametric methods if the true m~derlying density is symmetric indeed but small sample properties may differ. In the semiparametric part we used standardized logistic kernels with a bandwidth o f h = 0.5. Reasonable changes o f the bandwidth, say 0.25 < h 4 0 . 7 5 , or another kernel like the normal one do not alter the conclusions below.
In the first part o f the simulation experiment we compared the ML estimator with the semiparametric ones (with the MLE as initial starting value). Asymptot- ically both semiparametric estimators should behave as well as the MLE but one
208 F.C. Drost, C.A.J. KlaassenlJournal o f Econometrics 81 (1997) 193-221
m a y expect that the small sample properties o f the semiparametric
estimators
are worse due to the inherent problems o f choosing the bandwidth. T h e s e results are not reported here but they are comparable to those given in Table 1, from w h i c h M L E can be c o m p a r e d to the semiparametric procedure with the less efficient Q M L E starting value.O f course, M L estimation is not feasible in practice since the underlying dis- tribution is not k n o w n . Therefore, w e used the Q M L E as starting point. Since /~ vanishes for the situation chosen here and act z + fl < I, T h e o r e m s 2 and 3 o f L e e and H a n s e n ( 1 9 9 4 ) are applicable and the Q M L E is v/-n-consistent. This es- timator has b e e n i m p r o v e d by the one-step N e w t o n method. For c o n v e n i e n c e w e also report the b e h a v i o r o f the unfeasible M L E in Table I. T h e m e a n values o f the estimates in 2500 replications are given together with their sample standard deviations.
To calculate the efficiency o f the Q M L E , observe that the asymptotic variance o f the Q M L E is equal to the w e l l - k n o w n variance f o m m l a A - I B A - n , w h e r e A is the expectation u n d e r (~¢,fl,~r,O) o f the second derivative o f the pseudo log-likelihood (with a w r o n g l y specified normal d e n s i t y ) and B the expectation o f the squared first derivative. With Ws as defined j u s t b e l o w (2.20), straightforward calculations s h o w
A = 2EWs Ws', B = (K -- 1 )EW~Ws',
w h e r e x =
f e40(e)de.
Except for the normal distribution, the matrices A -n and B-~ are generally not equal. Since the asymptotic variance o f the Q M L E is equal to the lower b o u n d up to a constant, the asymptotic efficiency o f each c o m p o n e n t o f the Q M L E is given by(~¢ - - 1 ) I s ( 0 )
4 4
= ~ 1 .
f ( e 2 _ 1)2g(~)d~ f ( l + ~:~t(~))2g(~:) dc
T h e latter inequality follows from C a u c h y - S c h w a r z applied to the following iden- tit),:
- - 2 - - E ( e 2 - 1)(1 + e ~ t ( e , ) ) - -
f ( e 2 -
I X I + t ¢ ' ( c ) ) g ( e ) d e .Since the l o w e r b o u n d for 0t and fl does not change in the semiparametric setting, this expression also entails the loss in the semiparametric model and shows the (potential) gain o f the semiparametric estimator (3.2).
Except for the mixture distribution w e can exactly calculate the effÉciency o f Q M L E with respect to M L E . For the standardized double exponential the relative efficiency is ~, for standardized Stud~.~t distributions with v degrees o f freedom it is 1 - 1 2 / v ( v - 1 ), and for standardized chi-squared distributions with v degrees o f freedom it is ( v - - 4 ) / ( v + 6 ) . For these heavy-tailed distributions the efficiency
F.C. Drost, C.A.J. KI~ .. ~ e n l J o u r n a l o f E c o n o m e t r i c s 81 ( 1 9 9 7 ) 193-221 209
T a b l e !
C o m p a r i s o n o f M L E , Q M L E , a n d t w o s e m i p m m n e t , ~ c o n e - s t e p e s t i m a t o r s in t h e G A R C H ( I , I ) m o d e l
with eight different standardized i n n o v a t i o n distributions. N u m b e r o f ob~,ervations n--- I 0 0 0 , true pa- rameters ( ~ , ~ ) = ( 0 . 3 , 0 . 6 ) , (0.1,0.8), a n d ( 0 . 0 5 , 0 . 9 ) , respectively. T h e s a m p l e m e a n s o f 2 5 0 0 inde- p e n d e n t replications a n d t h e i r s a m p l e standard d e v i a t i o n s are g i v e n
0 . 3 0 0 0 . 6 0 0 0 . 1 0 0 0 . 8 0 0 0 . 0 5 0 0 . 9 0 0 N M L = Q M L 0.298 0.593 0.071 0.056 0.099 0.786 0.035 0.073 0.047 0.891 0.022 0.051 l - s t e p 0.298 0.593 0.072 0 . 0 5 7 0.098 0.786 0 . 0 3 6 0.074 0.047 0.892 0.022 0.050 I - s t e l R s y m ) 0.298 0.593 0.072 0.056 0.099 0.786 0 . 0 3 6 0.073 0.047 0.891 0.022 0.051 D E M L 0.299 0.592 0.080 0.070 0.099 0.782 0.038 0.083 0.048 0.885 0.023 0.061 Q M L 0.303 0.588 0.089 0.079 0.100 0.776 0.043 0.094 0.048 0.880 0 . 0 2 6 0.073 I - s t e p 0.294 0.593 0.085 0.074 0.097 0.784 0.040 0.087 0.046 0.886 0.024 0.067 I - s t e p ( s y m ) 0.295 0.592 0.083 0.073 0.097 0.783 0.039 0.086 0.046 0.885 0 . 0 2 4 0.065 N M M L 0.295 0.595 0.058 0.041 0.098 0.790 0.029 0.054 0.047 0.898 0.018 0.0.;0 Q M L 0.295 0.595 0.059 0.042 0.097 0.790 0.030 0.054 0 . 0 4 6 0.897 0.018 0.032 l - s t e p 0.295 0.595 0.060 0.043 0.098 0.793 0.030 0.056 0.047 0.901 0.018 0.032 I - s t e p ( s y m ) 0.295 0.595 0.059 0.042 0.0q9 0.793 0.030 0.056 0.047 0.901 0.018 0.032 M L 0.295 0.592 0.076 0.067 0.100 0.787 0.036 0.071 0.048 0.888 0.021 0 . 0 5 4 Q M L 0 . 2 9 6 0.586 0.098 0.086 0.101 0.777 0.047 0 . I 0 1 0.048 0.879 0.027 0.083 l - s t e p 0.284 0.594 0.080 0.071 0.094 0.791 0.037 0.081 0.044 0.890 0.022 0.064 I - s t e p ( s y m ) 0.285 0.594 0.079 0.070 0.095 0.791 0.037 0.081 0.045 0.889 0.022 0.063 M L 0 . 2 9 6 0.595 0.075 0.060 0.100 0.782 0.037 0.079 0,047 0.885 0.021 0.063 Q M L 0.298 0.592 0.086 0 . 0 7 0 0.101 0.776 0.042 0.094 0.047 0.882 0 . 0 2 4 0.076 I - s t e p 0.291 0.597 0.078 0.064 0.096 0.784 0.038 0.082 0.045 0.886 0.022 0.068 l - s t e p ( s y m ) 0.292 0.597 0.077 0.063 0.097 0.783 0.038 0.082 0.045 0.886 0.022 0.068 M L 0.298 0.592 0 . 0 7 6 0.060 0.098 0.783 0.037 0.077 0.047 0.887 0.022 0.058 Q M L 0.300 0.591 0.083 0.066 0.099 0.781 0.040 0.085 0.048 0.886 0.024 0.064 l - s t e p 0.295 0.593 0.079 0.062 0.096 0.785 0.038 0.080 0 . 0 4 6 0.889 0.022 0.057 l - s t e p ( s y m ) 0.295 0.593 0.077 0.062 0.096 0.784 0.038 0.079 0 . 0 4 6 0 . 9 ° 9 0.022 0.037 X 2 M L 0.297 0.596 0.042 0.034 0.099 0.796 0.{)20 0.036 0 . 0 5 0 0.899 0.012 0.022 Q M L 0.299 0.589 0.091 0.073 0.101 0.780 0.042 0.096 0.048 0 . 8 8 4 0.024 0.072 l - s t e p 0.283 0.603 0.062 0.051 0.092 0.801 0.030 0.061 0.045 0 . 8 9 8 0.017 0.047 ~22 M L 0.298 0 . 5 9 6 0.057 0.045 0.099 0.794 0.029 0.048 0.048 0.893 0 . 0 1 6 0.036 Q M L 0.299 0.592 0.084 0.064 0.100 0.782 0.041 0.079 0.047 0.881 0.023 0.071 l - s t e p 0.289 0.598 0.063 0.051 0.095 0.796 0.032 0.061 0.045 0.891 0.018 0.049
losses o f Q M L E with respect to M L E s h o w up in Table 1 and w e see that the semiparametric m e t h o d s regain m o s t o f the loss caused by the inefficient Q M L E method. For light-tailed alternatives, as in the mixture case, the situation is less clear cut. There the efficiency is approximately 0 . 9 4 and the performance o f the estimators is not m u c h different. For the normal distribution M L E and
210 F. C. Drost, C./I.J. Klaassenl Journal o f Econometrics 81 (1997) 193-221
Q M L E are o f course equivalent. The use o f the additional s y m m e t r y information hardly improves the estimated standard deviation o f the semiparametric estimator ( m a x i m a l 0.002), just as expected from our general theory. In empirical data sets, one often observes outlier-type innovation distributions with high kurtoses. Therefore, it seems worthwhile to apply the semiparametric estimation programs in these situations.
W e conclude this section with a simple empirical e x a m p l e based on daily data. We applied our estimation methods to 15 logarithmic differenced exchange rate series for the period 1 January, 1980 to 1 April, 1994 (n----3719): Austrian Schilling ( A S ) , Australian Dollar ( A D ) , Belgium Franc ( B F ) , British Pound ( B P ) , C a n a d i a n Dollar ( C D ) , Dutch Guilder ( D G ) , Danish Kroner ( D K ) , French Franc ( F F ) , G e r m a n M a r k ( G M ) , Italian Lire (IL), Japanese Yen ( J Y ) , N o r w e g i a n Kro- ner ( N K ) , Swiss Franc ( S F ) , Swedish Kroner ( S K ) , and Spanish Peseta (SP), all with respect to US Dollar. These data are taken f r o m Datastream. To facilitate the interpretation o f the autoregression parameters w e have standardized the series such that the Q M L E o f ¢r equals 1. In all series both the Q M L E method and the scmiparametric procedure estimate the persistence ~ r 2 + fl less than one (for the semiparametric estimates this cannot be inferred from Table 2 since the semipara- metric estimate o f ,7 is not constrained to equal 1). The estimates based on the original data sets are given in the first four c o l u m n s o f Table 2. O f course, w e used the variance formula A - ~ B A - ~ for the direct estimate o f the standard devia- tion o f the Q M L E . As described above, the parameter estimates produced b y the semiparametric procedure are not very sensitive to the choice o f the bandwidth. H o w e v e r , it turns out that the direct variance estimates change dramatically (even for small changes o f the bandwidth). Therefore, these estimates are not reliable and they have been deleted from the table.
For tile simulation study above the situation was quite different since we es- timated the variance o f the semiparametric one-step estimators from indepen- dent parameter estimates in the i-eplications. I Icre w e have only one data seL Independent replications are not available. This leads to the following para- dox. O n the one hand, one m a y have the imprecise Q M L estimate with quite large estimated standard deviations. So it m a y be possible that the hypotheses o f integrated G A R C H or no conditional heteroskedasticity cannot be rejected. On the other, one has the improved semiparametric estimate which allows for m o r e powerful tests. But since the estimated standard deviations are unreliable one can get any answer one wants b y changing the bandwidth. To avoid this paradox, w e propose to use the bootstrap. I.e. simulate replications o f the orig- inal data set with the estimated parameter and the estimated innovation distri- bution as inputs and proceed as in the case o f simulations described above. Then w e have several parameter estimates available from which we calculate the straightforward sample estimate o f the variance. In this m a n n e r we only rely upon the parameter estimates and not on direct estimates o f the variance. Hence, the variability o f the variance due to different bandwidth choices is greatly
F.C. Drost, C.A.J. KlaassenlJournal o f Econometrics 81 (1997) 193-221 211 Table 2
Comparison o f QMLE and a semiparametric one-step estimator for several logarithmic diffs:~c,~xl daily exchange rate series. Observation period I January 1980 to 1 April 1994 (n ---- 3719). The first part o f the table gives the estimates based on the original data set. Estimated standard deviations are deleted for the semiparametric estimators. The sample means and sample standard deviations o f 500 bootstrap replications are given in the second half o f the table
Estimates based on
Original data Bootstrap samples
¢~., dp AD QMLE 0.129 0.843 0.075 0.034 0.116 0.828 0.044 0.063 l-step 0.253 0.867 O. I 12 0.844 0.027 0.024 AS QMLE 0.075 0.891 0.013 0.016 0.073 0.888 0.018 0.018 l-step 0.113 0.897 0.072 0.890 0.014 0.013 BF QMLE 0.068 0.902 0.01 ! 0.023 0.068 0.899 0.019 0.018 l-step 0.093 0.906 0.065 0.903 0.0 i 3 0.0 i 4 B P QMLE 0.052 0.932 0.008 0.0 ! 3 0.05 ! 0.93 ! 0.0 ! 5 0.012 I-step 0.055 0.932 0.050 0.931 0.012 0.010 CD QM LE 0.138 0.798 0.042 0.062 0.139 0.793 0.032 0.03 ! I -step 0. ! 69 0.809 0. ! 33 0.797 0.021 0.021 DG QMLE 0.078 0.888 0.013 0.016 0.077 0.886 0.018 0.018 I-step 0.107 0.916 0.076 0.887 0.015 0.014 D K QMLE 0.067 0.902 0.011 0.016 0.065 0.898 0.015 0.016 l-step 0.095 0.920 0.064 0.90 ! 0.012 0.0 i 4 FF QMLE 0.088 0.873 0.0 ! 6 0.0 i 7 0.088 0.869 0.023 0.022 i-step 0.119 0.913 0.085 0.872 0.017 0.017 GM QMLE 0.073 0.894 0.012 0.013 0.073 0.891 0.017 0.018 I-step 0.095 0.925 0.072 0.893 0.014 0.015 IL QMLE 0.093 0.869 0.016 0.031 0.092 0.864 0.022 0.020 l-step 0.109 0.896 0.090 0.867 0.019 0.017 JY QMLE 0.059 0.89 ! 0.017 0.025 0.060 0.888 0.0 ! 6 0.026 I -step 0.078 0.9 ! 2 0.057 0.891 0.0 ! 2 0.02 i NK QMLE 0.080 0.907 0.009 0.014 0.078 0.906 0.023 0.015 I-step 0.092 0.9 ! 6 0.075 0.908 0.017 0.0 ! 0 SF QM LE 0.059 0.904 0.012 0.0 ! 3 0.058 0.90 ! 0.0 ! 4 0.019 i-step 0.064 0.922 0.057 0.903 0.012 0.0 ! 5 SK QMLE 0.221 0.754 0.035 O. 119 0.210 0.75 i 0.049 0.038 l-step 0. ! 85 0.839 0.209 0.756 0.032 0.020 SP QMLE O. 106 0.871 0.014 0.032 0.1 04 0.868 0.027 0.020 l-step 0.171 0.912 0.100 0.870 0.021 0.014 r e d u c e d . S o m e s i m u l a t i o n e x p e r i m e n t s s h o w t h a t t h i s p r o c e d u r e w o r k s q u i t e w e l l . W e a p p l y t h e b o o t s t r a p p r o c e d u r e t o o u r d a t a s e t s a n d w e r e p o r t t h e s a m p l e m e a n s a n d s a m p l e s t a n d a r d d e v i a t i o n s i n t h e f i n a l f o u r c o l u m n s o f T a b l e 2 . O b s e r v e t h a t t h e e s t i m a t e d s t a n d a r d d e v i a t i o n s o f t h e s e m i p a r a m e t r i c e s t i m a - t o r s o f t h e h e t e r o s k e d a s t i c p a r a m e t e r s a r e b e t w e e n f o u r t e n t h ( A I D ) a n d n i n e t e n t h ( I L ) o f t h e e s t i m a t e d o n e s f o r t h e Q M L E m e t h o d . T h i s i m p l i e s t h e e f f i - c i e n c y o f t h e Q M L E m e t h o d l i e s a p p r o x i m a t e l y i n t h e i n t e r v a l ( 0 . 1 5 , 0 . 8 0 ) i n
2 1 2 F. C. Drost, C. A.J. Klaassen l Journal o f Econometrics 81 (1997) 193-221
these special examples. The efficiency gain is also supported by the plots in Fig. 1 o f the nonparametric density estimates and the corresponding score esti- mates which are far away from the normal density and score. Although these figures suggest some skewness o f the exchange rate densities, they are close to the densities o f Student's t~-distributions with v between 4.1 and 5.4. If the true underlying density would be symmetric, we expect from the simula- tion study that the symmetric nonparametric procedure performs slightly better in moderate samples. However, in the exchange rate applications the latter pro- cedure yields somewhat larger standard deviations (0.003 for AS and less than
A D t(4.1) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 - 1 0 1 2 3 AD t(4.1) - 3 - 3 - 2 - 1 0 1 2 3 A S 0.¢; 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 - 1 0 t(5.3) AS t(5.3) 3 g,/g 4 1 o - 2 - 3 2 3 - 3 - 2 - 1 0 1 2 3 BF t(4.5) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 - 1 0 1 2 3 3 2 1 0 - 1 - 2 - 3 - 3 - 2 - 1 BF t(4.5) 6 0 1 2 3 BP t(4.8) BP t(4.8) 0.6 ~ 3 - ~ " ~ 0.5 2 _...~...gVg 8 0.4 1 " 0.3 0 0.2 - 1 0.1 - 2 0.0 - 3 - 3 - 2 - 1 0 1 2 3 - 3 - 2 - 1 0 1 2 3 CD t(5.1) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 -1 0 1 2 3 CD t(5.1) DG t(4.9) DG t(4.9) 3 ~ 10 0 . 6 ~ 3 2 0.5 2 /~,.._~,g'/g 12 1 ": 0.4 1 o 0.3 0 - 1 ~ 0.2 -1 ~ _ ~ - 2 0.1 - 2 - 3 0.0 - 3 - 3 - 2 - 1 0 I 2 3 - 3 - 2 -1 0 1 2 3 - 3 - 2 -1 0 1 2 3 DK t(4.9) DK t(4.9) FF t(4.8) FF t(4.8)
o il
0.5 2 . . -. 0.4 1 0.4 0.3 0 0.3 0.2 -1 . v ~ . ,, _ 0.2 0.1 - 2 0.1 0.0 - 3 0.0 - 3 -! ... I ... I - 3 - 2 -1 0 1 2 3 - 3 - 2 -1 0 1 2 3 - 3 - 2 -1 0 1 2 3 - 3 - 2 -1 0 1 2 3 F i g . ! . C o m p a r i s o n o f e s t i m a t e d d e n s i t i e s a n d s c o r e s w i t h tv-clensities a n d s c o r e s f o r s e v e r a l l o g a - r i t h m i c d i f f e r e n c e d d a i l y e x c h a n g e r a t e s e r i e s . O b s e r v a t i o n p e r i o d 8 0 0 1 0 1 - 9 4 0 4 0 1 ( n = 3 7 1 9 ) .F.C. Drost. C.A.J. KlaassenlJournal o f Econometrics 81 (1997) 193-221 213 GM t(5.0) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 -3 -2 -1 0 1 2 3 GM t(5.0) 321 ~ 18 0
-,
-2 -3 -3 -2 -1 0 1 2 3 JY t(4.1) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 -1 0 1 2 3 JY t(4.1) 3 9'/9 22 2 0 - 2 -3 - 3 - 2 -1 0 1 2 3 S, c t(5.4) 0.6 0-5 0.4 0.3 0.2 0.1 0.0 -3 -2 -1 0 1 2 3 SFt(5.4)
3
9'/9
26 1 0-,
'X 4
- 2 - 3 -3 - 2 -1 0 1 2 3 SP t[4.5)0 . 6 ~
3
0.5 2 0.4 1 0.3 O 0.2 -1 0.1 - 2 0.0 - 3 - 3 - 2 -1 0 1 2 3 SP t(4.5) - 3 - 2 -1 0 1 2 3 IL t(5.1) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 - 3 - 2 -1 0 1 2 3 3 ¸ 2 1 0 -1 - 2 - 3 It. t(5.1) 9°/g 20 - 3 - 2 -1 0 1 2 3 NK t(4.5) NK U4.5) 0.6 ~ 3 0.5 2 . ~ . . ~ .~1 9 24 0.4 1 -- - ' ~ . " O.3 0 0.2 -1 0.1 - 2 0.0 - 3 - 3 - 2 -1 0 1 2 3 - 3 - 2 -1 0 1 2 3 SK U4.2) 0.6 3 0.5 ~ 2 0.4 1 0.3 0 0.2 -1 0.1 - 20.O
- 3 - 3 - 2 -1 0 1 2 3 SK t(4.2) - 3 - 2 -1 0 1 2 3 Fig. !. Continued.0 . 0 0 2 for the others, these values are not reported here). This indicates that the true densities are not fully symmetric and h e n c e the symmetric semipara- metric approach m a y lead to w r o n g conclusions. Since the p o s s i b l e moderate s a m p l e loss is very small it s e e m s to be safer to use the ordinary n o n - s y m m e t r i c improvement.
Finally, w e note that the simulation results o f Table 1 s h o w that all estinuttors, e v e n the unfeasible MLE, tend to underestimate the heteroskedasticity parameters. This negative bias explains w h y in Table 2, on average, the bootstrap estimates are less in value than the original estimates.
2 1 4 F.C. Drost. C.A.J. Klaassen/Jourtzal o f Econotnetrics 81 (1997) 193-221 5. C o n c l u s i o n s
In this p a p e r w e studied the s e m i p a r a m e t r i c properties o f (;.nteg~at:~.d) G A R C H - M - t y p e models. In this m o d e l , adaptive estimation is not possible. This fact is c o m p l e t e l y c a u s e d by a location-scale parameter. A f t e r a suitable reparametriza- tion o f the m o d e l we s i m w e d that the estimation proble~l: o f the p a r a m e t e r s characterizing the conditional heteroskedastic character o f the process is equally difficult in cases w h e r e the innovation distribution is k n o w n or u n k n o w n , respec- tively. In that sense w e m a y call these p a r a m e t e r s still adaptively estimable. This property is d e r i v e d in a general G A R C H context a v o i d i n g m o m e n t conditions a n d i n c l u d i n g integrated G A R C H models. The simulations s h o w e d that this property is not only interesting f r o m a theoretical point o f view. In m o d e r a t e s a m p l e sizes with n = 1000 observations, usually available in fin~,r, cial time-series, the s e m i p a r a m e t r i c p r o c e d u r e s w o r k reasonably well. Me~t c f the loss c a u s e d by the Q M L E m e t h o d (instead o f the infeasible M L E m e t h o d ) is r e g a i n e d by the one-step e s t i m a t o r in case o f the interesting g r o u p o f h e a v y - t a i l e d alternatives. M o r e o v e r , the empirical e x a m p l e s h o w e d that the efficiency loss c a u s e d b y the Q M L E m e t h o d m a y be considerable.
It is clear from the exposition in this paper that the adaptivity results carry o v e r to c o m p l i c a t e d m o d e l s with t i m e d e p e n d e n t m e a n a n d variance structures, e.g., A R M A with G A R C H errors. T h e basic conditions g i v e n in D K W ( 1 9 9 7 ) do not s e e m to put serious restrictions on the models. H o w e v e r , a c o m p l e t e verification o f the technical details m a y be m u c h m o r e d e m a n d i n g .
A p p e n d i x