Learning from Auxiliary Statistics
The method of sieves introduced in Section 1.3 allowed us to deal with infinite dimensional spaces of unbounded complexity. There, we reviewed results that suggest the possibility (at least in theory) of conducting statistical inference on such large parameter spaces. However, we have not yet commented on the practical implementation of such procedures. We now turn our attention to this issue.
Recall that the consistency of the sieve extremum estimator discussed in Section 1.3 relied fundamentally on the increasing complexity of the sieves. Implicitly, the assumption was also made that an estimator taking values on such sets is available.
Remark 1.5.1. If the method of sieves is to be of any use, an estimator must be available that is practical to work with. The availability of a ‘practical’ sieve estimator might not be a matter of concern in the simple regression and density estimation cases considered until now. Outside these simpler cases, however, complications are likely to occur.
Consider the nonlinear cross-sectional regression problem introduced at the beginning of this chapter. In principle, it is not difficult to work with sieve estimators of the type $\hat\theta_T(x) = \sum_{k=0}^{K_T} \beta_k x^k$, where $K_T \to \infty$ as $T \to \infty$. Indeed, such a sieve estimator takes values in sieves $\Theta_T$ that are spanned by the basis vectors, $\Theta_T \subseteq \mathrm{lin}\{1, x, x^2, \dots, x^{K_T}\}$ for every $T$. Furthermore, given the results of Section 1.3, we know that (under appropriate regularity conditions) an estimator designed in this way can be consistent for a parameter $\theta_0$ lying in a space $\Theta$ of continuous functions in $x$. So far, everything seems to work well. However, consider now the nonlinear dynamic model introduced at the beginning of this chapter,
$$x_t = \theta_0(x_{t-1}) + \epsilon_t,$$
where $\epsilon_t$ is a vector of innovations and the vectors $x_t$ contain both observed and latent variables. In such a setting, difficulties can be expected when applying the sieve estimation methodology to estimate $\theta_0$.
Remark 1.5.2. Even in relatively simple dynamic models, classical estimators such as maximum likelihood and method of moments estimators might be hard to derive. If this is true for relatively simple models, not much can be expected from dynamic models whose complexity must increase with $T$.
Below, we review a solution to our problem that goes by the name of indirect
inference (II). This solution is available for finite dimensional parameter spaces only.
Hence, for the time being, we leave the sieves method aside.
With the availability of increased computational power, the 1980s witnessed a growing interest in simulation-based estimators. On finite dimensional parameter spaces, such estimators offer an alternative to classical estimators and are especially appealing when (due to model complexity or other reasons) the latter are unavailable. This literature includes simulation-based extensions of classical estimators such as simulated maximum likelihood, simulated method of moments and others; see Gourieroux and Monfort (1996), Dave and DeJong (2007) and Ruge-Murcia (2007) for reviews of this literature. The II principle underlying these techniques was introduced in Gourieroux et al. (1993); see also Smith (1993).
As we shall see, the principle of II does more than just describe the fundamental ideas behind simulation-based estimators. It provides a general setting for statistical inference that relies on auxiliary statistics or auxiliary estimators (whether or not simulations are needed). In essence, it deals with estimators that are defined as functionals of other estimators.
Following Gourieroux et al. (1993), let $x_T := (x_1, \dots, x_T)$ denote a $T$-period sample of observed data. Furthermore, suppose that the distribution of $x_T$ is implicitly defined by the following dynamic model,
$$x_t = h(x_{t-1}, z_t, \theta_0), \qquad z_t = g(z_{t-1}, \epsilon_t, \theta_0), \qquad t \in \mathbb{Z},$$
where $\theta_0 \in \Theta \subseteq \mathbb{R}^p$, $z_t$ denotes a vector of latent variables, and $\epsilon_t$ a vector of innovations with known distribution $D$. Suppose that we are interested in conducting inference on $\theta_0$, but that classical estimators are not available. Then, if all the features of the dynamic model are known (except for $\theta_0$), we can still proceed to estimate $\theta_0$ by appealing to the principle of II. In particular, by ‘drawing’ from $D$, we can obtain sequences $\tilde\epsilon_1, \dots, \tilde\epsilon_T$ and use these to simulate sequences of ‘artificial data’, denoted $\tilde x_T(\theta)$, according to
$$\tilde x_t = h(\tilde x_{t-1}, \tilde z_t, \theta), \qquad \tilde z_t = g(\tilde z_{t-1}, \tilde\epsilon_t, \theta), \qquad t \in \mathbb{N},$$
for any $\theta \in \Theta$. By repeating this procedure, we can obtain multiple simulated sequences $\tilde x_T^1(\theta), \dots, \tilde x_T^S(\theta)$. Now, the idea of II is to make use of auxiliary estimators $\hat\beta_T$ and $\tilde\beta_{T,s}(\theta)$ to ‘describe’ the properties of both observed data $x_T$ and simulated data $\tilde x_T^s$, and then to ‘search’ for the parameter $\theta \in \Theta$ that makes simulated data $\tilde x_T^s(\theta)$ as ‘similar’ as possible to observed data $x_T$ (as judged by the auxiliary estimators).
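The simulation step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`simulate_path`, `simulate_paths`, `h_linear`, `g_linear`), the zero initial conditions, the scalar state and the choice of a standard normal for the innovation distribution D are all hypothetical and do not come from the text.

```python
import random


def simulate_path(h, g, theta, eps, x0=0.0, z0=0.0):
    """Iterate z_t = g(z_{t-1}, eps_t, theta) and x_t = h(x_{t-1}, z_t, theta).

    The initial conditions x0 and z0 are assumptions made for illustration.
    """
    x_prev, z_prev = x0, z0
    path = []
    for e in eps:
        z = g(z_prev, e, theta)   # latent variable update
        x = h(x_prev, z, theta)   # observable update
        path.append(x)
        x_prev, z_prev = x, z
    return path


def simulate_paths(h, g, theta, T, S, seed=0):
    """Return S simulated sequences of length T for a given theta.

    Innovations are drawn from a standard normal (an assumed choice of D).
    Fixing the seed reuses the same draws for every theta.
    """
    rng = random.Random(seed)
    draws = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(S)]
    return [simulate_path(h, g, theta, eps) for eps in draws]


# Hypothetical linear example: x_t = a * x_{t-1} + z_t, z_t = b * z_{t-1} + eps_t.
def h_linear(x_prev, z, theta):
    return theta[0] * x_prev + z


def g_linear(z_prev, e, theta):
    return theta[1] * z_prev + e


paths = simulate_paths(h_linear, g_linear, (0.5, 0.5), T=50, S=3, seed=1)
```

Holding the simulated innovations fixed while θ varies means that differences between simulated sequences reflect only the change in θ, not fresh simulation noise.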
For concreteness, let $\hat\beta_T$ denote an estimator in $\mathbb{R}^q$ that ‘describes’ observed data $x_T$, and $\tilde\beta_{T,s}(\theta)$ denote the corresponding estimator obtained from the $s$th sequence of simulated data $\tilde x_T^s(\theta)$. For example, $\hat\beta_T$ and $\tilde\beta_{T,s}(\theta)$ might consist of sample moments, or correspond to estimators of a simpler model describing the dynamic properties of the data. All that matters is that they provide a ‘rich enough’ characterization of the distributions of $x_T$ and $\tilde x_T^s$. In essence, this means that $\tilde\beta_{T,s}(\theta)$ should converge in an appropriate fashion to a singleton limit $\beta^*(\theta)$ that satisfies $\beta^*(\theta) \neq \beta^*(\theta')$ for every $\theta \neq \theta'$ in $\Theta$.
As a deterministic function of $\theta$, the limit $\beta^*$ is called the binding function. The binding function plays an essential role in II estimation as its properties determine the ability to conduct inference on $\Theta$ through the use of auxiliary statistics.
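As an illustration that is not taken from the text, suppose the model is an MA(1) process and the auxiliary estimator is the least-squares slope of an AR(1) regression. Under these assumed choices (and standard conditions for a law of large numbers), the binding function is available in closed form and is strictly increasing on the invertible region, so distinct parameter values are mapped to distinct limits:

```latex
\begin{align*}
x_t &= \epsilon_t + \theta\,\epsilon_{t-1},
  \qquad \epsilon_t \sim \text{i.i.d. } (0,\sigma^2),
  \qquad \theta \in \Theta = (-1,1), \\
\tilde\beta_{T,s}(\theta)
  &= \frac{\sum_{t=2}^{T} \tilde x_t\,\tilde x_{t-1}}{\sum_{t=2}^{T} \tilde x_{t-1}^{2}}
  \;\xrightarrow{\;p\;}\;
  \beta^*(\theta)
  = \frac{\operatorname{Cov}(x_t, x_{t-1})}{\operatorname{Var}(x_t)}
  = \frac{\theta\,\sigma^2}{(1+\theta^2)\,\sigma^2}
  = \frac{\theta}{1+\theta^2}, \\
\frac{d\beta^*(\theta)}{d\theta}
  &= \frac{1-\theta^2}{(1+\theta^2)^2} > 0
  \qquad \text{for all } \theta \in (-1,1).
\end{align*}
```

Outside the interval $(-1,1)$ this property fails, since $\beta^*(\theta) = \beta^*(1/\theta)$; the restriction of the parameter space is therefore part of what makes the auxiliary statistic ‘rich enough’ in this example.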
Finally, we define the II estimator $\hat\theta_T$ as
$$\hat\theta_T := \arg\min_{\theta \in \Theta} \mu\Big(\hat\beta_T,\; \frac{1}{S}\sum_{s=1}^{S} \tilde\beta_{T,s}(\theta)\Big),$$
where $\mu$ is some ‘divergence’ that measures some notion of ‘distance’ between $\hat\beta_T$ and $\frac{1}{S}\sum_{s=1}^{S} \tilde\beta_{T,s}(\theta)$. For the special case of a ‘quadratic weighted divergence’, Gourieroux et al. (1993) show that, under appropriate regularity conditions, $\hat\theta_T$ converges to $\theta_0$. The same authors also show that $\hat\theta_T$ is $\sqrt{T}$-consistent and asymptotically normal; see also Smith (1993).
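A minimal Python sketch of this estimator follows. It rests on several assumptions that do not come from the text: an MA(1) model $x_t = \epsilon_t + \theta\epsilon_{t-1}$ with standard normal innovations, an AR(1) least-squares slope as the auxiliary estimator, a scalar quadratic divergence with unit weight, and a plain grid search in place of a numerical optimizer. All function names and tuning values (`S`, the grid, the seeds) are hypothetical.

```python
import random


def simulate_ma1(theta, eps):
    """Simulate x_t = eps_t + theta * eps_{t-1} from a list of innovations."""
    return [eps[t] + theta * eps[t - 1] for t in range(1, len(eps))]


def auxiliary_estimator(x):
    """Least-squares slope of x_t on x_{t-1} (AR(1) auxiliary model, no intercept)."""
    num = sum(cur * lag for cur, lag in zip(x[1:], x[:-1]))
    den = sum(lag * lag for lag in x[:-1])
    return num / den


def indirect_inference(x_obs, S=5, seed=0):
    """Grid-search II estimator with a scalar quadratic divergence."""
    T = len(x_obs)
    rng = random.Random(seed)
    # Draw the innovations once and reuse them for every theta, so that the
    # objective is a deterministic function of theta.
    eps_sim = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(S)]
    beta_hat = auxiliary_estimator(x_obs)

    def objective(theta):
        # Average the auxiliary estimator over the S simulated sequences.
        beta_bar = sum(
            auxiliary_estimator(simulate_ma1(theta, eps)) for eps in eps_sim
        ) / S
        return (beta_hat - beta_bar) ** 2

    grid = [-0.9 + 0.01 * i for i in range(181)]  # hypothetical search grid
    return min(grid, key=objective)


# Illustrative usage with a hypothetical true parameter theta_0 = 0.5.
data_rng = random.Random(123)
x_obs = simulate_ma1(0.5, [data_rng.gauss(0.0, 1.0) for _ in range(3001)])
theta_hat = indirect_inference(x_obs)
print(round(theta_hat, 2))
```

The grid search keeps the sketch free of dependencies; with a vector-valued auxiliary estimator one would typically replace the squared difference by a weighted quadratic form and use a numerical optimizer.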
We finish this section with a couple of notes on the generality of the II procedure that are important for the theory that follows. First, note that there is no need for auxiliary estimators to be parametric. In fact, non-parametric auxiliary estimators might be preferable on several occasions; see e.g. Billio and Monfort (2003) and Nickl and Pötscher (2009). Second, depending on the ‘objective’ of the econometric exercise, the requirement of correct specification can be weakened or even eliminated. Indeed, a considerable body of literature has been devoted to the study of (i) the properties of II estimators in misspecified models, including the properties of their ‘indirect pseudo-true limit’ $\theta_0^*$ and the role of II estimators in testing encompassing hypotheses (Dhaene et al. 1998), (ii) the development of robust II estimators (Genton and Ronchetti 2003) and (iii) the use of II estimators in semi-parametric models (Dridi and Renault 2000).
The theory in this thesis differs from the above-mentioned literature in the following aspects. First, it allows not only the auxiliary parameter space to be infinite dimensional (as in Billio and Monfort (2003) and Nickl and Pötscher (2009)) but also the parameter space of interest $\Theta$ to be infinite dimensional. Second, interest lies not in a parametric subset of $\theta_0$ (as in Dridi and Renault 2000) but in the ‘unpartitioned’ parameter $\theta_0$. The parameter $\theta_0$ of interest completely defines the ‘true’ distribution. Third, interest lies not in potential effects of misspecification
1.6 Semi-Nonparametric Indirect Inference: Econometric