Organization of the Chapter and Notation - Importance Sampling: Computational Complexity and In

Chapter 4 Importance Sampling: Computational Complexity and In-

4.1.2 Organization of the Chapter and Notation

Section 4.2 describes importance sampling in abstract form. In Sections 4.3 and 4.4 the linear Gaussian inverse problem and the linear Gaussian filtering problem are studied. Our aim is to provide a digestible narrative and hence all proofs are left to an appendix in Section 4.6. Furthermore, as we study the inverse and filtering problems in both finite dimensional Euclidean space and infinite dimensional Hilbert space, there are some technical matters related to Gaussian measures in infinite dimensional spaces that we also detail in the appendix, Subsection 4.6.1, in order not to distract from the narrative flow.

Given a probability measureν on a measurable space (X,F) expectations of a measurable function φ : X → _R with respect to ν will be written as both ν(φ) and Eν[φ]. When it is clear which measure is being used we may drop the suffixν

and write simplyE[φ].Similarly, the variance will be written as Varν(φ) and again

we may drop the suffix when no confusion arises from doing so. All test functions φappearing in this chapter are assumed to be measurable.

We will be interested in sequences of measures indexed by time, by the state space dimension or by a tempering scheme. These are denoted with a subscript, e.g. νt, νd or νi. Anything to do with samples from a measure is denoted with a

superscript: N for the number of samples, andnfor the indices of the samples. The i-th coordinate of a vectoruis denoted by u(i).Thus,un_t(i) denotesi-th coordinate of the n-th sample from the measure of interest at time t. Finally, the law of a random variablev will be denoted by Pv.

4.1.3 Literature Review

Some early developments of importance sampling as a method to reduce the variance in Monte Carlo estimation date back to the early 1950’s [Kahn and Marshall, 1953], [Kahn, 1955]. In particular the paper [Kahn and Marshall, 1953] demonstrates how to optimally choose the proposal density for given test functionφand target density. A modern view of importance sampling in the general framework (4.1.1) is given in [Chopin and Papaspiliopoulos, 2016]. A comprehensive description of Bayesian inverse problems in finite state/data space dimensions can be found in [Kaipio and Somersalo, 2005], and its formulation in infinite dimensional spaces in [Dashti and Stuart, 2016; Lasanen, 2007, 2012a,b; Stuart, 2010]. Text books overviewing the subject of filtering and particle filters include [Del Moral, 2004; Bain and Crisan, 2009], and the article [Crisan and Doucet, 2002] provides a readable introduction to the area. For an up-to-date and in-depth survey of nonlinear filtering see [Crisan and Rozovskii, 2011]. The linear Gaussian inverse problem and the linear Gaussian filtering problem have been extensively studied because they arise naturally in many applications, lead to considerable algorithmic tractability, and provide theoretical insight. For references concerning linear Gaussian inverse problems see [Franklin, 1970; Mandelbaum, 1984; Lehtinen et al., 1989; Kekkonen et al., 2015]. The linear Gaussian filter –the Kalman filter– was introduced in [Kalman, 1960]; see [Lancaster and Rodman, 1995] for further analysis. The inverse problem of determining sub- surface properties of the Earth from surface measurements is discussed in [Oliver et al., 2008], while the filtering problem of assimilating atmospheric measurements for numerical weather prediction is discussed in [Kalnay, 2003].

The key role of ρ, the second moment of the Radon-Nikodym derivative between the target and the proposal, has long been acknowledged [Liu, 1996], [Pitt and Shephard, 1999]. The value of ρ is indeed known to be asymptotically linked to the effective sample size [Kong, 1992], [Kong et al., 1994], [Liu, 1996]. Recent justification for the use of the effective sample size within particle methods is given in [Whiteley et al., 2016]. We will provide a further nonasymptotic justification of

the relevance ofρthrough its appearance in error bounds on the error in importance sampling; in this context it is of relevance to highlight the paper [Crisan et al., 1998] which proved non-asymptotic bounds on the error in the importance-sampling based particle filter algorithm. We will also bound the importance sampling error in terms of different notions of distance between the target and the proposal measures, as in [Chen, 2005]; a useful overview of the subject of distances between probability measures is [Gibbs and Su, 2002].

We formulate problems in both finite dimensional and infinite dimensional state spaces. We refer to [Kallenberg, 2002] for a modern presentation of probability appropriate for understanding the material in this chapter. Some of our results are built on the rich area of Gaussian measures in Hilbert space; we include all the required background on this material in the appendix Subsection 4.6.1, and references are included there. However we emphasize that the presentation in the main body of the text is designed to keep technical material to a minimum and to be accessible to readers who are not versed in the theory of probability in infinite dimensional spaces. Absolute continuity of the target with respect to the proposal – or the existence of a density of the target with respect to the proposal – is central to our developments. This concept also plays a pivotal role in the understanding of Markov chain Monte Carlo (MCMC) methods in high and infinite dimensional spaces [Tierney, 1998]. A key idea in MCMC is that breakdown of absolute continuity on sequences of problems of increasing state space dimension is responsible for poor algorithmic performance with respect to increasing dimension; this should be avoided if possible, such as for problems with a well-defined infinite dimensional limit [Cotter et al., 2013]. Similar ideas will come in to play in this chapter.

As well as the breakdown of absolute continuity through increase in dimension, small noise limits can also lead to sequences of proposal/target measures which are increasingly close to mutually singular and for which absolute continuity breaks down. Small noise regimes are of theoretical and computational interest for both inverse problems and filtering. For instance, in inverse problems there is a growing interest in the study of the concentration rate of the posterior in the small observational noise limit, see [Knapik et al., 2011], [Agapiou et al., 2013], [Knapik et al., 2013], [Agapiou et al., 2014], [Ray, 2013], [Vollmer, 2013], [Kekkonen et al., 2015]. In filtering and multiscale diffusions, the analysis and development of improved propos- als to deal with small noise limits or simulation of rare events is an active research area [Vanden-Eijnden and Weare, 2012], [Zhang et al., 2013], [Dupuis et al., 2012], [Spiliopoulos, 2013] [Tu et al., 2013].

rent concept is that of intrinsic dimension. Several notions of intrinsic dimension have been used in different fields, including dimension of learning problems [Bishop, 2006], [Zhang, 2002], [Zhang, 2005], of statistical inverse problems [Lu and Math´e, 2014], of functions in the context of quasi Monte Carlo (QMC) integration in fi- nance applications [Caflisch et al., 1997], [Moskowitz and Caflisch, 1996], [Kuo and Sloan, 2005], and of data assimilation problems [Chorin and Morzfeld, 2013]. The underlying theme is that in many application areas where models are formulated in high dimensional state spaces, there is often a small subspace which captures most of the features of the system. It is the dimension of this subspace that ef- fects the complexity of the problem. In the context of inverse problems the paper [Bengtsson et al., 2008] proposed a notion of intrinsic dimension, which was shown to have a direct connection with the performance of importance sampling. We introduce a further notion of intrinsic dimension for Bayesian inverse problems which agrees with the notion of effective number of parameters used in machine learning and statistics [Bishop, 2006]. We also establish that this notion of dimension and the one in [Bengtsson et al., 2008] are finite, or otherwise, at the same time. Both intrinsic dimensions account for three key features of the complexity of the inverse problem: the nominal dimension (i.e. the minimum of the dimension of the state space and the data), the size of the observational noise and the regularity of the prior relative to the observation noise. Varying the parameters related to these three features may cause a break-down of absolute continuity. The deterioration of importance sampling in large nominal dimensional limits has been widely investigated [Bengtsson et al., 2008], [Bickel et al., 2008], [Snyder et al., 2008], [Snyder et al., 2015], [Snyder, 2011], [Slivinski and Snyder, 2015]. In particular, the key role of the intrinsic dimension, rather than the nominal one, in explaining this deterioration was studied in [Bengtsson et al., 2008]. Here we study the different behaviour of importance sampling as absolute continuity is broken in the three regimes above, and we investigate whether, in all these regimes, the deterioration of importance sampling may be quantified by the various intrinsic dimensions that we introduce.

In document Assimilating data into mathematical models (Page 89-92)