Sparse tensor discretizations of high-dimensional parametric and stochastic PDEs


doi:10.1017/S0962492911000055 Printed in the United Kingdom


Christoph Schwab and Claude Jeffrey Gittelson

Seminar for Applied Mathematics, ETH Zürich, Rämistrasse 101, CH-8092 Zürich, Switzerland


Partial differential equations (PDEs) with random input data, such as random loadings and coefficients, are reformulated as parametric, deterministic PDEs on parameter spaces of high, possibly infinite dimension. Tensorized operator equations for spatial and temporal k-point correlation functions of their random solutions are derived. Parametric, deterministic PDEs for the laws of the random solutions are derived. Representations of the random solutions’ laws on infinite-dimensional parameter spaces in terms of ‘generalized polynomial chaos’ (GPC) series are established. Recent results on the regularity of solutions of these parametric PDEs are presented. Convergence rates of best N-term approximations, for adaptive stochastic Galerkin and collocation discretizations of the parametric, deterministic PDEs, are established. Sparse tensor products of hierarchical (multi-level) discretizations in physical space (and time), and GPC expansions in parameter space, are shown to converge at rates which are independent of the dimension of the parameter space. A convergence analysis of multi-level Monte Carlo (MLMC) discretizations of PDEs with random coefficients is presented. Sufficient conditions on the random inputs for superiority of sparse tensor discretizations over MLMC discretizations are established for linear elliptic, parabolic and hyperbolic PDEs with random coefficients.

∗ Work partially supported by the European Research Council under grant number ERC AdG 247277-STAHDPDE and by the Swiss National Science Foundation under grant number SNF 200021-120290/1.


CONTENTS

Introduction
1 Operator equations with stochastic data
2 Stochastic Galerkin discretization
3 Optimal convergence rates
4 Sparse tensor discretizations
Appendix
A Review of probability
B Review of Hilbert spaces
C Review of Gaussian measures on Hilbert spaces
References

Introduction

The numerical solution of partial differential equation models in science and engineering has today reached a certain maturity, after several decades of progress in numerical analysis, mathematical modelling and scientific computing. While there certainly remain numerous mathematical and algorithmic challenges, for many ‘routine’ problems of engineering interest, today numerical solution methods exist which are mathematically understood and ‘operational’ in the sense that a number of implementations exist, both academic and commercial, which realize, in the best case, algorithms of provably optimal complexity in a wide range of applications. As a rule, the numerical analysis and the numerical solution methods behind such algorithms suppose that a model of the system of interest is described by a well-posed (in the sense of Hadamard) partial differential equation (PDE), and that the PDE is to be solved numerically to prescribed accuracy for one given set of input data.

With the availability of highly accurate numerical solution algorithms for a PDE of interest and one prescribed set of exact input data (such as source terms, constitutive laws and material parameters) there has been increasing awareness of the limited significance of such single, highly accurate ‘forward’ solves. Assuming, as we will throughout this article, that the PDE model of the physical system of interest is correct, this trend is due to two reasons: randomness and uncertainty of input data and the need for efficient prediction of system responses on high-dimensional parameter spaces.

First, the assumption of availability of exact input data is not realistic: often, the simulation’s input parameters are obtained from measurements or from sampling a large, but finite number of specimens or system snapshots which are incomplete or stochastic. This is of increasing importance in classical engineering disciplines, but even more so in emerging models in the life sciences and social sciences. Rather than efficiently producing accurate answers for single instances of exact input data, increasingly the goal of computation in numerical simulations is to efficiently process statistical information on uncertain input data for the PDE of interest. While mathematical formulations of PDEs with random inputs have been developed with an eye towards uncorrelated, or white noise inputs (see, e.g., Holden, Oksendal, Uboe and Zhang (1996), Da Prato and Zabczyk (1992), Da Prato (2006), Lototsky and Rozovskii (2006), Prévôt and Röckner (2007), Dalang, Khoshnevisan, Mueller, Nualart and Xiao (2009) and the references therein), PDEs with random inputs in numerical simulation in science and engineering are of interest in particular in the case of so-called correlated inputs (or ‘coloured noise’).

Second, in the context of optimization, or of risk and sensitivity analysis for complex systems with random inputs, the interest is in computing the systems’ responses efficiently given dependence on several, possibly countably many parameters, thereby leading to the challenge of numerical simulation of deterministic PDEs on high-dimensional parameter spaces.

Often, the only feasible approach in numerical simulation towards these two problems is to solve the forward problem for many instances, or samples, of the PDE’s input parameters; for random inputs, this amounts to Monte Carlo-type sampling of the noisy inputs, and for parametric PDEs, responses of the system are interpolated from forward solves at judiciously chosen combinations of input parameters.

With the cost of one ‘sample’ being the numerical solution of a PDE, it is immediate that, in particular for transient problems in three spatial dimensions with solutions that exhibit multiple spatial and temporal length scales, the computational cost of uniformly sampling the PDE solution on the parameter space (resp. the probability space) is prohibitive. Responding to this by massive parallelism may alleviate this problem, but ultimately, the low convergence rate 1/2 of Monte Carlo (MC) sampling, respectively the so-called ‘curse of dimensionality’ of standard interpolation schemes in high-dimensional parameter spaces, requires advances at the mathematical core of the numerical PDE solution methods: the development of novel mathematical formulations of PDEs with random inputs, the study of the regularity of their solutions, both with respect to the physical variables and with respect to parameters, and the development of novel discretizations and solution methods of these formulations. Importantly, the parameters may take values in possibly infinite-dimensional parameter spaces: for example, in connection with Karhunen–Loève expansions of spatially inhomogeneous and correlated inputs.

The present article surveys recent contributions to the above questions. Our focus is on linear PDEs with random inputs; we present various formulations, new results on the regularity of their solutions and, based on these regularity results, we design, formulate and analyse discretization schemes which allow one to ‘sweep’ the entire, possibly infinite-dimensional input parameter space approximately in a single computation. We also establish, for the algorithms proposed here, bounds on their efficiency (understood as accuracy versus the number of degrees of freedom) that do not deteriorate with respect to increasing dimension of the computational parameter domain, i.e., that are free from the curse of dimensionality. The algorithms proposed here are variants and refinements of the recently proposed stochastic Galerkin and stochastic collocation discretizations (see, e.g., Xiu (2009) and Matthies and Keese (2005) and the references therein for an account of these developments). We exhibit assumptions on the inputs’ correlations which ensure an efficiency of these algorithms which is superior to that of MC sampling. One insight that emerges from the numerical analysis of recently proposed methods is that the numerical resolution in physical space need not be high uniformly on the entire parameter space. The use of ‘polynomial chaos’ type spectral representations (and their generalizations) of the laws of input and output random fields allows a theory of regularity of the random solutions and, based on this, the optimization of numerical methods for their resolution. Here, we have in mind discretizations in physical space and time as well as in stochastic or parameter space, aiming at achieving a prespecified accuracy with minimal computational work. From this broad view, the recently proposed multi-level Monte Carlo methods can also be interpreted as sparse tensor discretizations. Accordingly, we present in this article an error analysis of single- and multi-level MC methods for elliptic problems with random inputs.

As this article’s title suggests, the notion of sparse tensor products of operators and hierarchical sequences of finite-dimensional subspaces pervades our view of numerical analysis of high-dimensional problems. Sparsity in connection with tensorization has become significant in several areas of scientific computing in recent years: in approximation theory as hyperbolic cross approximations (see, e.g., Temlyakov (1993)) and, in finite element and finite difference discretizations, the so-called sparse grids (see Bungartz and Griebel (2004) and the references therein) are particular instances of this concept. We note in passing that the range of applicability of sparse tensor discretizations extends well beyond stochastic and parametric problems (see, e.g., Schwab (2002), Hoang and Schwab (2004/05) and Schwab and Stevenson (2008) for applications to multiscale problems). On the level of numerical linear algebra, the currently emerging hierarchical low-rank matrix formats, which were inspired by developments in computational chemistry, are closely related to some of the techniques developed here.

The present article extends these concepts in several directions. First, on the level of mathematical formulation of PDEs with random inputs: we present deterministic tensorized operator equations for two- and k-point correlation functions of the random system responses. Such equations also arise in the context of moment closures of kinetic models in atomistic-to-continuum transitions. Discretizations for their efficient, deterministic numerical solution may therefore be of interest in their own right. For the spectral discretizations, we review the polynomial chaos representation of random fields and the Wiener–Itô chaos decomposition of probability spaces and of random fields into tensorized Hermite polynomials of a countable number of Gaussians. The spectral representation of random outputs of PDEs allows for a regularity theory of the laws of random fields which goes substantially beyond the mere existence of moments.

According to the particular application, in this article sparsity in tensor discretizations appears in roughly three forms. First, we use sparse tensor products of multi-level finite element spaces in the physical domain $D \subset \mathbb{R}^d$ to build efficient schemes for the Galerkin approximation of tensorized equations for k-point correlation functions. Second, we consider heterogeneous sparse tensor product discretizations of multi-level finite element, finite volume and finite difference discretizations in the physical domain with hierarchical polynomial chaos bases in the probability space. As we will show, the use of multi-level discretizations in physical space actually leads to substantial efficiency gains in MC methods; nevertheless, the resulting multi-level MC methods are of comparable efficiency to sparse tensor discretizations for random outputs with finite second moments. However, as soon as the outputs have additional summability properties (and the examples presented here suggest that this is so in many cases), adaptive sparse tensor discretizations outperform MLMC methods.

The outline of the article is as follows. We first derive tensorized operator equations for deterministic, linear equations with random data. We establish the well-posedness of these tensorized operator equations, and introduce sparse tensor Galerkin discretizations based on multi-level, wavelet-type finite element spaces in the physical domain. We prove, in particular, stability of sparse tensor discretizations in the case of indefinite operators such as those arising in acoustic or electromagnetic scattering. We also give an error analysis of MC discretizations which indicates the dependence of its convergence rate on the degree of summability of the random solution.

Section 2 is devoted to stochastic Galerkin formulations of PDEs with random coefficients. Using polynomial chaos representations of the random inputs, for example in a Karhunen–Loève expansion, we give a reformulation of the random PDEs of interest as deterministic PDEs which are posed on infinite-dimensional parameter spaces. While the numerical solution of these PDEs with standard tools from numerical analysis is foiled by the curse of dimensionality (the raison d’être for the use of sampling methods on the stochastic formulation), we review recent regularity results for these problems which indicate that sparse, adaptive tensorization of discretizations in probability and physical space can indeed produce solutions whose accuracy, as a function of work, is independent of the dimension of the parameter space. We cover both affine dependence, as is typical in Karhunen–Loève representations of the random inputs, as well as log-normal dependence in inputs. We focus on Gaussian and on uniform measures, where ‘polynomial chaos’ representations use Hermite and Legendre polynomials, respectively (other probability measures give rise to other polynomial systems: see, e.g., Schoutens (2000) and Xiu and Karniadakis (2002b)). Section 3 addresses the regularity of the random solutions in these polynomial chaos representations by an analysis of the associated parametric, deterministic PDE for their laws. The analysis allows us to deduce best N-term convergence rates of polynomial chaos semidiscretizations of the random solutions’ laws.

Section 4 combines the results from the preceding sections with space and time discretizations in the physical domain. The error analysis of fully discrete algorithms reveals that it is crucial for efficiency that the level of spatial and temporal resolution be allowed to depend on the stochastic mode being discretized. Our analysis shows that, in fact, a highly non-uniform level of resolution in physical space should be adopted in order to achieve algorithms that scale favourably with respect to the dimension of the space of stochastic parameters.

As this article and the subject matter draw on tools from numerical analysis, from functional analysis and from probability theory, we provide some background reference material on the latter two items in the Appendix. This is done in order to fix the notation used in the main body of the text, and to serve as a reference for readers with a numerical analysis background. Naturally, the selection of the background material is biased towards the subject matter of the main text. It does not claim to be a reference on these subjects. For a more thorough introduction to tools from probability and stochastic analysis we refer the reader to Bauer (1996), Da Prato (2006), Da Prato and Zabczyk (1992), Prévôt and Röckner (2007) and the references therein.

1. Sparse tensor FEM for operator equations with stochastic data

For the variational setting of linear operator equations with deterministic, boundedly invertible operators, we assume that $X$, $Y$ are separable Hilbert spaces over $\mathbb{R}$ with duals $X'$ and $Y'$, respectively, and that $A \in \mathcal{L}(X, Y')$ is a linear, boundedly invertible deterministic operator. We denote its associated bilinear form by

\[ a(w, v) := {}_{Y'}\langle Aw, v \rangle_{Y}, \qquad w \in X,\ v \in Y. \tag{1.1} \]

Here, and throughout, for $w \in Y'$ and $v \in Y$ the bilinear form ${}_{Y'}\langle w, v \rangle_{Y}$ denotes the $Y' \times Y$ duality pairing. As is well known (see, e.g., Theorem C.20), the operator $A$ from $X$ onto $Y'$ is boundedly invertible if and only if $a(\cdot,\cdot)$ satisfies the following conditions.

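(i) $a(\cdot,\cdot)$ is continuous: there exists $C_1 < \infty$ such that

\[ \forall w \in X,\ v \in Y: \quad |a(w, v)| \le C_1 \|w\|_X \|v\|_Y. \tag{1.2} \]

(ii) $a(\cdot,\cdot)$ is coercive: there exists $C_2 > 0$ such that

\[ \inf_{0 \ne w \in X}\ \sup_{0 \ne v \in Y} \frac{a(w, v)}{\|w\|_X \|v\|_Y} \ge C_2 > 0. \tag{1.3} \]

(iii) $a(\cdot,\cdot)$ is injective:

\[ \forall\, 0 \ne v \in Y: \quad \sup_{0 \ne w \in X} a(w, v) > 0. \tag{1.4} \]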

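If (1.2)–(1.4) hold, then for every $f \in Y'$ the linear operator equation

\[ u \in X: \quad a(u, v) = {}_{Y'}\langle f, v \rangle_{Y} \quad \forall v \in Y \tag{1.5} \]

admits a unique solution $u \in X$ such that

\[ \|u\|_X \le C_2^{-1} \|f\|_{Y'}. \tag{1.6} \]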
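Conditions (1.2)–(1.3) and the bound (1.6) can be checked numerically in a finite-dimensional setting. The sketch below assumes $X = Y = \mathbb{R}^3$ with Euclidean norms and a hypothetical nonsymmetric matrix standing in for $A$; under these assumptions $C_1$ and $C_2$ are the largest and smallest singular values of the matrix. It is an illustration only, not part of the analysis in the text.

```python
import numpy as np

# Hypothetical 3x3 nonsymmetric matrix standing in for the operator A.
# The bilinear form is a(w, v) = v^T A w on R^3 x R^3 with Euclidean norms.
A = np.array([[ 2.0, -0.5,  0.0],
              [-1.5,  2.0, -0.5],
              [ 0.0, -1.5,  2.0]])

# With Euclidean norms, sup_v a(w, v)/|v| = |A w|, so the continuity constant
# is the largest singular value and the inf-sup constant the smallest one.
singular_values = np.linalg.svd(A, compute_uv=False)
C1 = singular_values[0]    # continuity constant in (1.2)
C2 = singular_values[-1]   # inf-sup constant in (1.3)

# Solve A u = f for arbitrary data and compare with the a priori bound (1.6).
rng = np.random.default_rng(0)
f = rng.standard_normal(3)
u = np.linalg.solve(A, f)

lhs = np.linalg.norm(u)
rhs = np.linalg.norm(f) / C2
print(f"C1 = {C1:.4f}, C2 = {C2:.4f}, |u| = {lhs:.4f} <= |f|/C2 = {rhs:.4f}")
```

A strictly positive smallest singular value corresponds to bounded invertibility; as it approaches zero, the bound (1.6) degenerates.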

We consider equation (1.5) with stochastic data: to this end, let $(\Omega, \mathcal{F}, P)$ be a probability space and let $f : \Omega \to Y'$ be a random field, i.e., a measurable map from $(\Omega, \mathcal{F}, P)$ into $Y'$ which is Gaussian (see Appendix C for the definition of Gaussian random fields). Analogous to the characterization of Gaussian random variables by their mean and their (co)variance, a Gaussian random field $f \in L^2(\Omega, \mathcal{F}, P; Y')$ is characterized by its mean $a_f \in Y'$ and its covariance operator $Q_f \in \mathcal{L}_1^+(Y')$.

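We consider the following linear operator equation with Gaussian data: given $f \in L^2(\Omega, \mathcal{F}, P; Y')$, find $u \in L^2(\Omega, \mathcal{F}, P; X)$ such that

\[ Au = f \quad \text{in } L^2(\Omega, \mathcal{F}, P; Y'). \tag{1.7} \]

Problem (1.7) admits a unique solution $u \in L^2(\Omega, \mathcal{F}, P; X)$ if and only if $A$ satisfies (1.2)–(1.4).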

By Theorem C.31, the unique random solution $u \in L^2(\Omega, \mathcal{F}, P; X)$ of (1.7) is Gaussian with associated Gaussian measure $N_{a_u, Q_u}$ on $X$ which, in turn, is characterized by the solution’s mean,

\[ a_u = \operatorname{mean}(u) = A^{-1} a_f, \tag{1.8} \]

and the solution’s covariance operator $Q_u \in \mathcal{L}_1^+(X)$, which satisfies the (deterministic) equation (1.9).

In the Gaussian case, therefore, solving the stochastic problem (1.7) can be reduced to solving the two deterministic problems (1.8) and (1.9). Whereas the mean-field problem (1.8) is one instance of the operator equation (1.7), the covariance equation (1.9) is an equation for the operator $Q_u \in \mathcal{L}_1^+(X)$.

As we show in Theorem C.31, this operator is characterized by the so-called covariance kernel $C_u$, which satisfies, in terms of the corresponding covariance kernel $C_f$ of the data, the covariance equation (see (C.50))

\[ (A \otimes A)\, C_u = C_f, \tag{1.10} \]

which is understood to hold in the sense of $(Y \otimes Y)' \cong Y' \otimes Y'$. One approach to the numerical treatment of operator equations $Au = f$, where the data $f$ are random fields, i.e., measurable maps from a probability space $(\Omega, \mathcal{F}, P)$ into the set $Y'$ of admissible data for the operator $A$, is via tensorized equations such as (1.10) for their statistical moments.
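The covariance equation (1.10) can be illustrated with matrices. The sketch below assumes a finite sample space with `m` outcomes, a hypothetical tridiagonal matrix in place of $A$, and covariance kernels stored as $n \times n$ arrays. With NumPy's row-major flattening, applying `np.kron(A, A)` to the flattened kernel corresponds to $A\,C\,A^{\top}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 5   # n unknowns, m outcomes in a finite sample space (assumption)

# Hypothetical symmetric positive definite stand-in for the operator A.
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

p = rng.random(m)
p /= p.sum()                       # probabilities of the m outcomes
F = rng.standard_normal((n, m))    # column j is the data sample f_j
U = np.linalg.solve(A, F)          # column j is the solution u_j = A^{-1} f_j

def covariance_kernel(samples, weights):
    """Weighted covariance E[(s - Es) (s - Es)^T] over the finite sample space."""
    mean = samples @ weights
    centred = samples - mean[:, None]
    return (centred * weights) @ centred.T

C_f = covariance_kernel(F, p)
C_u = covariance_kernel(U, p)

# Discrete analogue of (A ⊗ A) C_u = C_f on flattened kernels.
residual = np.kron(A, A) @ C_u.ravel() - C_f.ravel()
print("max residual of the covariance equation:", np.abs(residual).max())
```

The Kronecker matrix has $n^2 \times n^2$ entries, which already hints at why full tensor discretizations of moment equations become expensive.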

The simplest approach to the numerical solution of the linear operator equation $Au = f$ with random input $f$ is Monte Carlo (MC) simulation, i.e., generating a large number $M$ of i.i.d. data samples $f_j$ and solving, possibly in parallel, for the corresponding solution ensemble $\{u_j = A^{-1} f_j;\ j = 1, \dots, M\}$. Statistical moments and probabilities of the random solution $u$ are then estimated from $\{u_j\}$. As we will prove, convergence of the MC method as the number $M$ of samples increases is ensured (for suitable sampling) by the central limit theorem. We shall see that the MC method allows in general only the convergence rate $O(M^{-1/2})$.
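The $O(M^{-1/2})$ behaviour can be observed in a small experiment. The following sketch assumes a hypothetical $8 \times 8$ tridiagonal matrix for $A$ and Gaussian data with identity covariance. It estimates the mean field from $M$ samples and averages the squared error over independent repetitions. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8
# Hypothetical stand-in for A: the 1D finite difference Laplacian stencil.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
mean_f = np.ones(n)
u_mean = np.linalg.solve(A, mean_f)   # exact mean field, cf. (1.8)

def mc_error(M):
    """Error of the MC estimate of the mean field from M i.i.d. samples."""
    F = mean_f[:, None] + rng.standard_normal((n, M))   # samples f_j
    U = np.linalg.solve(A, F)                           # u_j = A^{-1} f_j
    return np.linalg.norm(U.mean(axis=1) - u_mean)

def rms_error(M, repetitions=200):
    """Root-mean-square MC error over independent repetitions."""
    return np.sqrt(np.mean([mc_error(M) ** 2 for _ in range(repetitions)]))

sample_sizes = [100, 400, 1600, 6400]
rms = [rms_error(M) for M in sample_sizes]
# If the error behaves like c * M^{-1/2}, these scaled values are roughly constant.
scaled = [e * np.sqrt(M) for e, M in zip(rms, sample_sizes)]
for M, e, s in zip(sample_sizes, rms, scaled):
    print(f"M = {M:5d}   rms error = {e:.4e}   rms error * sqrt(M) = {s:.3f}")
```

Quadrupling $M$ roughly halves the error, so each additional digit of accuracy costs about a hundred times more samples.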

If statistical moments, i.e., mean-field and higher-order moments of the random solution $u$, are of interest, one can exploit the linearity of the equation $Au = f$ to derive a deterministic equation for the $k$th moment of the random solution, similar to the second-moment equation (1.10); this derivation is done in Section 1.1. For the Laplace equation with stochastic data, this approach is due to I. Babuška (1961). We then address the numerical computation of the moments of the solution by either Monte Carlo or by direct, deterministic finite element computation. If the physical problem is posed in a domain $D \subset \mathbb{R}^d$, the $k$th moment of the random solution is defined in the domain $D^k \subset \mathbb{R}^{kd}$; standard finite element (FE) approximations will therefore be inadequate for the efficient numerical approximation of the $k$th moments of the random solution.

The efficient deterministic equation and its FE approximation were investigated in Schwab and Todor (2003a, 2003b) in the case where $A$ is an elliptic partial differential operator. It was shown that the $k$th moment of the solution could be computed in a complexity comparable to that of an FE solution for the mean-field problem by the use of sparse tensor products of standard FE spaces for which a hierarchical basis is available. The use of sparse tensor product approximations is a well-known device in high-dimensional numerical integration going back to Smolyak (1963), in multivariate approximation (Temlyakov 1993), and in complexity theory; see Wasilkowski and Woźniakowski (1995) and the references therein.

In the present section, we address the case when $A$ is a non-local operator, such as a strongly elliptic pseudodifferential operator, as arises in the boundary reduction of boundary value problems for strongly elliptic partial differential equations. In this case, efficient numerical solution methods require, in addition to Galerkin discretizations of the operator equation, some form of matrix compression (such as the fast multipole method or wavelet-based matrix compression) which introduces additional errors into the Galerkin solution that will also affect the accuracy of second and higher moments. We briefly present the numerical analysis of the impact of matrix compressions on the efficient computation of second and higher moments of the random solution. Therefore, the present section will also apply to strongly elliptic boundary integral equations obtained by reduction to the boundary manifold $D = \partial\mathcal{D}$ of elliptic boundary value problems in a bounded domain $\mathcal{D} \subset \mathbb{R}^{d+1}$, as is frequently done in acoustic and electromagnetic scattering. For such problems with stochastic data, the boundary integral formulation leads to an operator equation $Au = f$, where $A$ is an integral operator or, more generally, a pseudodifferential operator acting on function spaces on $\partial\mathcal{D}$. The linearity of the operator equation allows, without any closure hypothesis, formulation of a deterministic tensor equation for the k-point correlation function of the random solution $u = A^{-1} f$. We show that, as in the case of differential operators, sparse tensor products of standard FE spaces allow deterministic approximation of the $k$th moment of the random solution $u$ with relatively few degrees of freedom. To achieve computational complexity which scales log-linearly in the number of degrees of freedom in a Galerkin discretization of the mean-field problem, however, the Galerkin matrix for the operator $A$ must be compressed.

Accordingly, one purpose of this section is the design and numerical analysis of deterministic and stochastic solution algorithms to obtain the $k$th moment of the random solution of possibly non-local operator equations with random data in log-linear complexity in the number $N$ of degrees of freedom for the mean-field problem.

We illustrate the sparse tensor product Galerkin methods for the numerical solution of Dirichlet and Neumann problems for the Laplace or Helmholtz equation with stochastic data. Using a wavelet Galerkin finite element discretization allows straightforward construction of sparse tensor products of the trial spaces, and yields well-conditioned, sparse representations of stiffness matrices for the operator $A$ as well as for its $k$-fold tensor product, which is the operator arising in the $k$th-moment problem.

We analyse the impact of the operator compression on the accuracy of functionals of the Galerkin solution, such as far-field evaluations of the random potential in a point. For example, means and variances of the potential in a point can be computed with accuracy $O(N^{-p})$ for any fixed order $p$, for random boundary data with known second moments in $O(N)$ complexity, where $N$ denotes the number of degrees of freedom on the boundary.

The outline of this section is as follows. In Section 1.1, we describe the operator equations considered here and derive the deterministic problems for the higher moments, generalizing Schwab and Todor (2003b). We establish the Fredholm property for the tensor product operator and regularity estimates for the statistical moments in anisotropic Sobolev spaces with mixed highest derivative. Section 1.2 addresses the numerical solution of the moment equations, in particular the impact of various matrix compressions on the accuracy of the approximated moments, the preconditioning of the product operator and the solution algorithm. In Section 1.4, we discuss the implementation of the sparse Galerkin and sparse MC methods and estimate their asymptotic complexity. Section 1.5 contains some examples from finite and boundary element methods.

1.1. Operator equations with stochastic data

Linear operator equations

We specialize the general setting (1.1) to the case $X = Y = V$, and consider the operator equation

\[ Au = f, \tag{1.11} \]

where $A$ is a bounded linear operator from the separable Hilbert space $V$ into its dual $V'$.

The operator $A$ is a differential or pseudodifferential operator of order $\varrho$ on a bounded $d$-dimensional manifold $D$, which may be closed or have a boundary. Here, for a closed manifold and for $s \ge 0$, $\tilde{H}^s(D) := H^s(D)$ denotes the usual Sobolev space. For $s < 0$, we define the spaces $H^s(D)$ and $\tilde{H}^s(D)$ by duality. For a manifold $D$ with boundary we assume that this manifold can be extended to a closed manifold $\tilde{D}$, and define

\[ \tilde{H}^s(D) := \{\, u|_D \,;\ u \in H^s(\tilde{D}),\ u|_{\tilde{D} \setminus D} = 0 \,\} \]

with the induced norm. If $D$ is a bounded domain in $\mathbb{R}^d$ we use $\tilde{D} := \mathbb{R}^d$. We now assume that $V = \tilde{H}^{\varrho/2}(D)$. In the case when $A$ is a second-order differential operator, this means that we have Dirichlet boundary conditions (other boundary conditions can be treated in an analogous way).

The manifold $D$ may be smooth, but we also consider the case when $D$ is a polyhedron in $\mathbb{R}^d$, or the boundary of a polyhedron in $\mathbb{R}^{d+1}$, or part of the boundary of a polyhedron.

For the deterministic operator $A$ in (1.11), we assume strong ellipticity in the sense that there exist $\alpha > 0$ and a compact operator $T : V \to V'$ such that the Gårding inequality

\[ \forall v \in V: \quad \langle (A + T) v, v \rangle \ge \alpha \|v\|_V^2 \tag{1.12} \]

holds. For the deterministic algorithm in Section 1.4 we need the slightly stronger assumption that $T$ is smoothing with respect to a scale of smoothness spaces (see (1.63) below). Here and in what follows, $\langle \cdot, \cdot \rangle$ denotes the $V' \times V$ duality pairing. We assume also that $A$ is injective, i.e., that

\[ \ker A = \{0\}, \tag{1.13} \]

which implies that for every $f \in V'$, (1.11) admits a unique solution $u \in V$ and, moreover, that $A^{-1} : V' \to V$ is continuous, i.e., there exists $C_A > 0$ such that, for all $f \in V'$,

\[ \|u\|_V = \|A^{-1} f\|_V \le C_A \|f\|_{V'}. \tag{1.14} \]

Here $C_A = C_2^{-1}$ with the constant $C_2$ as in (1.3). We shall consider (1.11) in particular for data $f$ which are Gaussian random fields on the data space $V'$. By the linearity of the operator equation (1.11), the solution $u \in V$ is then a Gaussian random field as well. Throughout, we assume that $V$ and $V'$ are separable Hilbert spaces.

Random data

A Gaussian random field $f$ with values in a separable Hilbert space $X$ is a mapping $f : \Omega \to X$ which maps events $E \in \Sigma$ to Borel sets in $X$, and such that the image measure $f_{\#} P$ on $X$ is Gaussian. In the following, we allow more general random fields. Of particular interest will be their summability properties. We say that a random field $u : \Omega \to X$ is in the Bochner space $L^1(\Omega; X)$ if $\omega \mapsto \|u(\omega)\|_X$ is measurable and integrable so that $\|u\|_{L^1(\Omega;X)} := \int_\Omega \|u(\omega)\|_X \, P(d\omega)$ is finite. In particular, then the ‘ensemble average’

\[ \mathbb{E}u := \int_\Omega u(\omega)\, P(d\omega) \in X \]

exists as a Bochner integral of $X$-valued functions, and it satisfies

\[ \|\mathbb{E}u\|_X \le \|u\|_{L^1(\Omega;X)}. \tag{1.15} \]

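Let $k \ge 1$. We say that a random field $u : \Omega \to X$ is in the Bochner space $L^k(\Omega; X)$ if $\|u\|^k_{L^k(\Omega;X)} = \int_\Omega \|u(\omega)\|^k_X \, P(d\omega)$ is finite. Note that $\omega \mapsto \|u(\omega)\|^k_X$ is measurable due to the measurability of $u$ and the continuity of the norm $\|\cdot\|_X$ on $X$. Also, $L^k(\Omega; X) \supset L^l(\Omega; X)$ for $k < l$.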
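The inclusion of the Bochner spaces for $k < l$ is stated without proof. One standard way to see it, given here as a supplementary sketch that relies on $P$ being a probability measure, is Jensen's inequality:

```latex
% For 1 \le k < l and u \in L^l(\Omega;X): the map t \mapsto t^{l/k} is convex
% on [0,\infty) and P(\Omega) = 1, so Jensen's inequality gives
\[
  \|u\|_{L^k(\Omega;X)}^{\,l}
  = \Bigl( \int_\Omega \|u(\omega)\|_X^{k} \, P(d\omega) \Bigr)^{l/k}
  \le \int_\Omega \|u(\omega)\|_X^{l} \, P(d\omega)
  = \|u\|_{L^l(\Omega;X)}^{\,l}.
\]
% Hence \|u\|_{L^k(\Omega;X)} \le \|u\|_{L^l(\Omega;X)}, and therefore
% L^l(\Omega;X) \subset L^k(\Omega;X).
```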

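Let $B \in \mathcal{L}(X, Y)$ denote a continuous linear mapping from $X$ to another separable Hilbert space $Y$. For a random field $u \in L^k(\Omega; X)$, this mapping defines a random variable $v(\omega) = Bu(\omega)$ taking values in $Y$. Moreover, $v \in L^k(\Omega; Y)$ and we have

\[ \|Bu\|_{L^k(\Omega;Y)} \le C \|u\|_{L^k(\Omega;X)}, \tag{1.16} \]

where the constant $C$ is given by $C = \|B\|_{\mathcal{L}(X,Y)}$. In addition, we have

\[ B \int_\Omega u \, P(d\omega) = \int_\Omega Bu \, P(d\omega). \tag{1.17} \]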

MC estimation of statistical moments

We are interested in statistics of the random solution $u$ of (1.11) and, in particular, in statistical moments. To define them, for a separable Hilbert space $X$ and for any $k \in \mathbb{N}$ we define the $k$-fold tensor product space

\[ X^{(k)} = \underbrace{X \otimes \cdots \otimes X}_{k \text{ times}}, \]

and equip it with the natural cross-norm $\|\cdot\|_{X^{(k)}}$. The significance of a cross-norm was emphasized by Schatten. The cross-norm has the property that, for every $u_1, \dots, u_k \in X$,

\[ \|u_1 \otimes \cdots \otimes u_k\|_{X^{(k)}} = \|u_1\|_X \cdots \|u_k\|_X \tag{1.18} \]

(see Light and Cheney (1985) and the references therein for more on cross-norms on tensor product spaces). The $k$-fold tensor products of, for example, $X'$ are denoted analogously by $(X')^{(k)}$. For $u \in L^k(\Omega; X)$ we now consider the random field $u^{(k)}$ defined by $u(\omega) \otimes \cdots \otimes u(\omega)$. By Lemma C.9, $u^{(k)} = u \otimes \cdots \otimes u \in L^1(\Omega; X^{(k)})$, and we have the isometry

\[ \|u^{(k)}\|_{L^1(\Omega;X^{(k)})} = \int_\Omega \|u(\omega) \otimes \cdots \otimes u(\omega)\|_{X^{(k)}} \, P(d\omega) = \int_\Omega \|u(\omega)\|_X \cdots \|u(\omega)\|_X \, P(d\omega) = \|u\|^k_{L^k(\Omega;X)}. \tag{1.19} \]

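We define the moment $\mathcal{M}^k u$ as the expectation of $u \otimes \cdots \otimes u$.

Definition 1.1. For $u \in L^k(\Omega; X)$, for some integer $k \ge 1$, the $k$th moment of $u(\omega)$ is defined by

\[ \mathcal{M}^k u = \mathbb{E}\bigl[\underbrace{u \otimes \cdots \otimes u}_{k \text{ times}}\bigr] = \int_{\omega \in \Omega} \underbrace{u(\omega) \otimes \cdots \otimes u(\omega)}_{k \text{ times}} \, P(d\omega) \in X^{(k)}. \tag{1.20} \]

Note that (1.15) and (1.18) give, with Jensen’s inequality and the convexity of the norm, the bound

\[ \|\mathcal{M}^k u\|_{X^{(k)}} = \|\mathbb{E} u^{(k)}\|_{X^{(k)}} \le \mathbb{E} \|u^{(k)}\|_{X^{(k)}} = \mathbb{E} \|u\|^k_X = \|u\|^k_{L^k(\Omega;X)}. \tag{1.21} \]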
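The cross-norm identity (1.18) and the moment bound (1.21) can be checked on a finite sample space. The sketch below assumes $X = \mathbb{R}^3$ with the Euclidean norm, represents tensor products by `np.kron`, and uses the Euclidean norm on $\mathbb{R}^{3^k}$ as the cross-norm. The helper `tensor_power` is a hypothetical name introduced for illustration.

```python
import numpy as np
from functools import reduce

def tensor_power(u, k):
    """k-fold tensor product u ⊗ ... ⊗ u, stored as a vector of length len(u)**k."""
    return reduce(np.kron, [u] * k)

rng = np.random.default_rng(7)
n, m, k = 3, 6, 3                  # dimension of X, number of outcomes, moment order
p = rng.random(m)
p /= p.sum()                       # probabilities of the m outcomes
U = rng.standard_normal((n, m))    # column j is the realization u(omega_j)

# Cross-norm property (1.18) for three different vectors.
u1, u2, u3 = U[:, 0], U[:, 1], U[:, 2]
cross = np.linalg.norm(reduce(np.kron, [u1, u2, u3]))
product = np.linalg.norm(u1) * np.linalg.norm(u2) * np.linalg.norm(u3)

# k-th moment (1.20) as a weighted sum over the finite sample space.
Mk = sum(p_j * tensor_power(U[:, j], k) for j, p_j in enumerate(p))

# Bound (1.21): |M^k u| <= E |u|^k.
lhs = np.linalg.norm(Mk)
rhs = sum(p_j * np.linalg.norm(U[:, j]) ** k for j, p_j in enumerate(p))
print(f"cross-norm: {cross:.6f} = {product:.6f}")
print(f"moment bound: {lhs:.6f} <= {rhs:.6f}")
```

The moment vector has $n^k$ entries, so its size grows exponentially with the moment order $k$.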

Deterministic equation for statistical moments

We now consider the operator equation $Au = f$, where $f \in L^k(\Omega; V')$ is given. It follows from (1.16), (1.14) and (1.21) that $u \in L^k(\Omega; V)$, and that we have the a priori estimate

\[ \|\mathcal{M}^k u\|_{V^{(k)}} \le \|u\|^k_{L^k(\Omega;V)} \le C_A^k \|f\|^k_{L^k(\Omega;V')}. \tag{1.22} \]

Remark 1.2. One example of a probability measure $P$ on $X$ is a Gaussian measure; we refer to, e.g., Vakhania, Tarieladze and Chobanyan (1987) and Ledoux and Talagrand (1991) for general probability measures over Banach spaces $X$ and, in particular, to Bogachev (1998) and Janson (1997) for a general exposition of Gaussian measures on function spaces.

Since $A^{-1} : V' \to V$ in (1.11) is bijective, by (1.12) and (1.13), it induces a measure $A^{-1}_{\#} P$ on the space $V$ of solutions to (1.11). If $P$ is Gaussian over $V'$ and $A$ in (1.11) is linear, then this induced measure is Gaussian over $V$ by Theorem C.18. We recall that a Gaussian measure is completely determined by its mean and covariance, and hence only $\mathcal{M}^k u$ for $k = 1, 2$ are of interest in this case.

We now consider the tensor product operator $A^{(k)} = A \otimes \cdots \otimes A$ ($k$ times). This operator maps $V^{(k)}$ to $(V')^{(k)}$. For $v \in V$ and $g := Av$, we obtain that $A^{(k)}\, v \otimes \cdots \otimes v = g \otimes \cdots \otimes g$. Consider a random field $u \in L^k(\Omega; V)$ and let $f := Au \in L^k(\Omega; V')$. Then the tensor product $u^{(k)} = u \otimes \cdots \otimes u$ ($k$ times) belongs to the space $L^1(\Omega; V^{(k)})$, and we obtain from (1.17) with $B = A^{(k)}$ that the k-point correlations $u^{(k)}$ satisfy $P$-a.s. the tensor equation

\[ A^{(k)} u^{(k)} = f^{(k)}, \]

where $f^{(k)} \in L^1(\Omega; (V')^{(k)})$. Now (1.17) implies for linear and deterministic operators $A$ that the k-point correlation functions of the random solutions, i.e., the expectations $\mathcal{M}^k u = \mathbb{E}[u^{(k)}]$, are solutions of the tensorized equations

\[ A^{(k)} \mathcal{M}^k u = \mathcal{M}^k f. \tag{1.23} \]

In the case $k = 1$ this is just the equation $A\,\mathbb{E}u = \mathbb{E}f$ for the mean field. Note that this equation provides a way to compute the moments $\mathcal{M}^k u$ of the random solution in a deterministic fashion, for example by Galerkin discretization. As mentioned before, with the operator $A$ acting on function spaces $X$, $Y$ in the domain $D \subset \mathbb{R}^d$, the tensor equation (1.23) will require discretization in $D^k$, the $k$-fold Cartesian product of $D$ with itself. Using tensor products of, for instance, finite element spaces in $D$, we find for $k > 1$ a reduction of efficiency in terms of accuracy versus number of degrees of freedom due to the ‘curse of dimensionality’. This mandates sparse tensor product constructions.
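The gap between full and sparse tensor product spaces can be quantified by counting degrees of freedom. The sketch below is an illustration under assumptions not taken from this section: a one-dimensional hierarchy $V_L = W_0 \oplus \cdots \oplus W_L$ with $\dim W_l = 2^l$, the full tensor space $V_L^{(k)}$, and the standard sparse-grid choice of summing $W_{l_1} \otimes \cdots \otimes W_{l_k}$ over $l_1 + \cdots + l_k \le L$. The function names are hypothetical.

```python
from itertools import product

def increment_dim(level):
    """Assumed dimension of the detail space W_level in a 1D hierarchy."""
    return 2 ** level

def full_tensor_dim(L, k):
    """Dimension of the k-fold full tensor product of V_L = W_0 + ... + W_L."""
    return sum(increment_dim(l) for l in range(L + 1)) ** k

def sparse_tensor_dim(L, k):
    """Dimension of the sum of W_{l_1} x ... x W_{l_k} over l_1 + ... + l_k <= L."""
    total = 0
    for levels in product(range(L + 1), repeat=k):
        if sum(levels) <= L:
            dim = 1
            for l in levels:
                dim *= increment_dim(l)
            total += dim
    return total

for L in (2, 4, 6, 8):
    N = full_tensor_dim(L, 1)
    print(f"L = {L}: N = {N:4d}, full (k=2) = {full_tensor_dim(L, 2):7d}, "
          f"sparse (k=2) = {sparse_tensor_dim(L, 2):5d}")
```

Under these assumptions the full tensor count grows like $N^k$, while the sparse count for $k = 2$ equals $L\,2^{L+1} + 1$, i.e., $N$ up to a logarithmic factor.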

We will investigate the numerical approximation of the tensor equation (1.23) in Section 1.4. The direct approximation of (1.23) by, for example, Galerkin discretization is an alternative to the Monte Carlo approximation of the moments which will be considered in Section 1.3.
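The deterministic moment equation (1.23) can be illustrated in finite dimensions. The following is a minimal Python sketch for k = 2, assuming that A is replaced by a small invertible matrix and that expectations are replaced by averages over a fixed set of samples; the matrix, the sample set and the function names are hypothetical and serve only this illustration. It compares the solution of the tensorized system with the second moment obtained by solving sample by sample.

```python
import numpy as np

def second_moment_via_tensor_equation(A, samples_f):
    """Solve (A kron A) Z = M2 f, with M2 f the empirical second moment of the data."""
    n = A.shape[0]
    # Empirical second moment of f: average of f f^T, flattened row by row.
    M2f = np.mean([np.outer(f, f) for f in samples_f], axis=0).reshape(n * n)
    A2 = np.kron(A, A)  # matrix representation of A^(2) = A (x) A
    return np.linalg.solve(A2, M2f).reshape(n, n)

def second_moment_via_sampling(A, samples_f):
    """Solve A u_j = f_j for every sample and average u_j u_j^T."""
    solutions = [np.linalg.solve(A, f) for f in samples_f]
    return np.mean([np.outer(u, u) for u in solutions], axis=0)

# Hypothetical data: a 3x3 tridiagonal matrix and 50 Gaussian load samples.
rng = np.random.default_rng(0)
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
samples_f = rng.normal(size=(50, 3))

Z_tensor = second_moment_via_tensor_equation(A, samples_f)
Z_sample = second_moment_via_sampling(A, samples_f)
print(np.max(np.abs(Z_tensor - Z_sample)))  # agreement up to rounding errors
```

Both routes agree because averaging is linear and A is deterministic, which is the finite-dimensional analogue of the argument leading to (1.23). The tensorized system has n^2 unknowns, which already hints at the cost issue addressed by sparse tensor products.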


In the deterministic approach, explicit knowledge of all joint probability densities of $f$ (i.e., the law of $f$) with respect to the probability measure $P$ is not required to determine the order-$k$ statistics of the random solution $u$ from order-$k$ statistics of $f$.

Remark 1.3. For nonlinear operator equations, associated systems of moment equations require a closure hypothesis, which must be additionally imposed and verified. For the linear operator equation (1.11), however, a closure hypothesis is not necessary, as (1.23) holds.

For solvability of (1.23), we consider the tensor product operator $A_1 \otimes A_2 \otimes \cdots \otimes A_k$ for operators $A_i \in \mathcal{L}(V_i, V_i')$, $i = 1, \dots, k$.

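Proposition 1.4. For integer $k > 1$, let $V_i$, $i = 1, \dots, k$, be Hilbert spaces with duals $V_i'$, and let $A_i \in \mathcal{L}(V_i, V_i')$ be injective and satisfy a Gårding inequality, i.e., there are compact $T_i \in \mathcal{L}(V_i, V_i')$ and $\alpha_i > 0$ such that

$$\forall v \in V_i: \quad \langle (A_i + T_i) v, v \rangle \ge \alpha_i \|v\|_{V_i}^2, \qquad (1.24)$$

where $\langle \cdot, \cdot \rangle$ denotes the $V_i' \times V_i$ duality pairing.

Then the product operator $\mathcal{A} = A_1 \otimes A_2 \otimes \cdots \otimes A_k \in \mathcal{L}(\mathcal{V}, \mathcal{V}')$, where $\mathcal{V} = V_1 \otimes V_2 \otimes \cdots \otimes V_k$ and $\mathcal{V}' = (V_1 \otimes V_2 \otimes \cdots \otimes V_k)' \cong V_1' \otimes V_2' \otimes \cdots \otimes V_k'$, is injective, and for every $f \in \mathcal{V}'$, the problem $\mathcal{A} u = f$ admits a unique solution $u$ with

$$\|u\|_{\mathcal{V}} \le C \|f\|_{\mathcal{V}'}.$$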

Proof. The injectivity and the Gårding inequality (1.24) imply the bounded invertibility of $A_i$ for each $i$. This implies the bounded invertibility of $\mathcal{A}: \mathcal{V} \to \mathcal{V}'$ since we can write

$$\mathcal{A} = (A_1 \otimes I^{(k-1)}) \circ (I \otimes A_2 \otimes I^{(k-2)}) \circ \cdots \circ (I^{(k-1)} \otimes A_k),$$

where $I^{(j)}$ denotes the $j$-fold tensor product of the identity operator on the appropriate $V_i$. Note that each factor in the composition is invertible.
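The factorization used in this proof can be checked numerically in finite dimensions. The sketch below is only an illustration under the assumption that the operators are replaced by small invertible matrices and the tensor product by the Kronecker product; the helper names and the test matrices are hypothetical.

```python
from functools import reduce
import numpy as np

def kron_all(mats):
    """Kronecker product A_1 (x) A_2 (x) ... (x) A_k of a list of square matrices."""
    return reduce(np.kron, mats)

def slot_factor(mats, i):
    """I (x) ... (x) A_i (x) ... (x) I, with identities of matching size in all other slots."""
    return kron_all([m if j == i else np.eye(m.shape[0]) for j, m in enumerate(mats)])

def tridiag(n):
    """Hypothetical invertible, non-symmetric test matrix of size n."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)

mats = [tridiag(2), tridiag(3), tridiag(2)]

A_full = kron_all(mats)
# Composition of the k single-slot factors, in the order used in the proof.
A_composed = reduce(np.matmul, [slot_factor(mats, i) for i in range(len(mats))])
# Each factor is invertible, so the inverse is the tensor product of the inverses.
A_inverse = kron_all([np.linalg.inv(m) for m in mats])

print(np.allclose(A_full, A_composed))
print(np.allclose(A_inverse @ A_full, np.eye(A_full.shape[0])))
```

In this matrix setting the identity follows from the mixed-product property of the Kronecker product; the proposition itself concerns bounded operators on Hilbert spaces, which the sketch does not cover.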

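To apply this result to (1.23), we require the special case

$$A^{(k)} := \underbrace{A \otimes A \otimes \cdots \otimes A}_{k \text{ times}} \in \mathcal{L}(V^{(k)}, (V')^{(k)}) = \mathcal{L}(V^{(k)}, (V^{(k)})'). \qquad (1.25)$$

Theorem 1.5. If $A$ in (1.11) satisfies (1.12) and (1.13), then for every $k > 1$ the operator $A^{(k)} \in \mathcal{L}(V^{(k)}, (V')^{(k)})$ is injective on $V^{(k)}$, and for every $f \in L^k(\Omega; V')$, the equation

$$A^{(k)} Z = M^k f \qquad (1.26)$$

has a unique solution $Z \in V^{(k)}$.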


This solution coincides with the $k$th moment $M^k u$ of the random field in (1.20):

$$Z = M^k u.$$

Proof. By (1.21), the assumption $f \in L^k(\Omega; V')$ ensures that $M^k f \in (V')^{(k)}$. The unique solvability of (1.26) follows immediately from Proposition 1.4 and the assumptions (1.12) and (1.13). The identity $Z = M^k u$ follows from (1.23) and the uniqueness of the solution of (1.26).

Regularity

The numerical analysis of approximation schemes for (1.26) will require a regularity theory for (1.26). To this end we introduce a smoothness scale $(Y_s)_{s \ge 0}$ for the data $f$ with $Y_0 = V'$ and $Y_s \subset Y_t$ for $s > t$. We assume that we have a corresponding scale $(X_s)_{s \ge 0}$ of 'smoothness spaces' for the solutions with $X_0 = V$ and $X_s \subset X_t$ for $s > t$, so that $A^{-1}: Y_s \to X_s$ is continuous.

When $D$ is a smooth closed manifold of dimension $d$ embedded into Euclidean space $\mathbb{R}^{d+1}$, we choose $Y_s = H^{-\varrho/2+s}(D)$ and $X_s = H^{\varrho/2+s}(D)$.

The case of differential operators with smooth coefficients in a manifold $D$ with smooth boundary is also covered within this framework by the choices $Y_s = H^{-\varrho/2+s}(D)$ and $X_s = H_0^{\varrho/2}(D) \cap H^{\varrho/2+s}(D)$. Note that in other cases (a pseudodifferential operator on a manifold with boundary, or a differential operator on a domain with non-smooth boundary), the spaces $X_s$ can be chosen as weighted Sobolev spaces which contain functions that are singular at the boundary.

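Theorem 1.6. Assume (1.12) and (1.13), and that there is an $s^* > 0$ such that $A^{-1}: Y_s \to X_s$ is continuous for $0 \le s \le s^*$. Then we have for all $k \ge 1$ and for $0 \le s \le s^*$ some constant $C(k, s)$ such that

$$\|M^k u\|_{X_s^{(k)}} \le C \|M^k f\|_{Y_s^{(k)}} \le C \|f\|^k_{L^k(\Omega; Y_s)}. \qquad (1.27)$$

Proof. If (1.12) and (1.13) hold, then the operator $A^{(k)}$ is invertible, and

$$M^k u = (A^{(k)})^{-1} M^k f = (A^{-1})^{(k)} M^k f.$$

Since $\|A^{-1} f\|_{X_s} \le C_s \|f\|_{Y_s}$, $0 \le s \le s^*$, it follows that

$$\|M^k u\|_{X_s^{(k)}} = \|(A^{-1})^{(k)} M^k f\|_{X_s^{(k)}} \le C_s^k \|M^k f\|_{Y_s^{(k)}}, \quad 0 \le s \le s^*.$$

The second inequality in (1.27) follows from $\|E[f^{(k)}]\|_{Y_s^{(k)}} \le E\bigl[\|f\|_{Y_s}^k\bigr]$.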

1.2. Finite element discretization

In order to obtain a finite-dimensional problem, we need to discretize in both $\Omega$ and $D$. For $D$ we will use a nested family of finite element spaces $V_\ell \subset V$, $\ell = 0, 1, \dots$.


Nested finite element spaces

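The Galerkin approximation of (1.11) is based on a sequence $\{V_\ell\}_{\ell=0}^\infty$ of subspaces of $V$ of dimension $N_\ell = \dim V_\ell < \infty$ which are dense in $V$, i.e., $V = \overline{\bigcup_{\ell \ge 0} V_\ell}$, and nested, i.e.,

$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_\ell \subset V_{\ell+1} \subset \cdots \subset V. \qquad (1.28)$$

We assume that for functions $u$ in the smoothness spaces $X_s$ with $s \ge 0$ we have the asymptotic approximation rate

$$\inf_{v \in V_\ell} \|u - v\|_V \le C N_\ell^{-s/d} \|u\|_{X_s}. \qquad (1.29)$$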

Finite elements with uniform mesh refinement

We will now describe examples for the subspaces $V_\ell$ which satisfy the assumptions of Section 1.2. We briefly sketch the construction of finite element spaces which are only continuous across element boundaries; see Braess (2007), Brenner and Scott (2002) and Ciarlet (1978) for presentations of the mathematical foundations of finite element methods. These elements are suitable for operators of order $\varrho < 3$. Throughout, we denote by $\mathcal{P}_p(K)$ the linear space of polynomials of total degree $\le p$ on a set $K$.

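Let us first consider the case of a bounded polyhedron $D \subset \mathbb{R}^d$. Let $\mathcal{T}_0$ be a regular partition of $D$ into simplices $K$. Let $\{\mathcal{T}_\ell\}_{\ell=0}^\infty$ be the sequence of regular partitions of $D$ obtained from $\mathcal{T}_0$ by uniform subdivision: for example, if $d = 2$, we bisect all edges of the triangulation $\mathcal{T}_\ell$ and obtain a new, regular partition of the domain $D$ into possibly curved triangles which belong to finitely many congruency classes. We set

$$V_\ell = S^p(D, \mathcal{T}_\ell) = \{u \in C^0(D) \,;\ u|_K \in \mathcal{P}_p(K)\ \ \forall K \in \mathcal{T}_\ell\}$$

and let $h_\ell = \max\{\operatorname{diam}(K) \,;\ K \in \mathcal{T}_\ell\}$. Then $N_\ell = \dim V_\ell = O(h_\ell^{-d})$ as $\ell \to \infty$. With $V = \tilde{H}^{\varrho/2}(D)$ and $X_s = H^{\varrho/2+s}(D)$, standard finite element approximation results imply that (1.29) holds for $s \in [0, p + 1 - \varrho/2]$, i.e.,

$$\inf_{v \in V_\ell} \|u - v\|_V \le C N_\ell^{-s/d} \|u\|_{X_s}.$$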
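The approximation rate (1.29) can be observed in the simplest setting. The following Python sketch assumes d = 1, D = (0,1), continuous piecewise linear elements (p = 1) on uniform meshes, the H^1 seminorm as V-norm (a second-order problem) and the smooth function u(x) = sin(πx); all of these choices and the function names are assumptions made for this illustration. Since the nodal interpolant is one particular element of the finite element space, its error bounds the infimum in (1.29) from above, and the expected rate in this setting is N^{-1}.

```python
import numpy as np

def h1_interpolation_error(u, du, n, quad_order=4):
    """H^1-seminorm error of the piecewise linear nodal interpolant of u on n equal cells of (0,1)."""
    nodes = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    xi, w = np.polynomial.legendre.leggauss(quad_order)  # Gauss rule on [-1, 1]
    err2 = 0.0
    for a, b in zip(nodes[:-1], nodes[1:]):
        slope = (u(b) - u(a)) / h                 # derivative of the interpolant on [a, b]
        x = 0.5 * (a + b) + 0.5 * h * xi          # quadrature points mapped to [a, b]
        err2 += 0.5 * h * np.sum(w * (du(x) - slope) ** 2)
    return np.sqrt(err2)

def u(x):
    return np.sin(np.pi * x)

def du(x):
    return np.pi * np.cos(np.pi * x)

cells = [2 ** level for level in range(2, 8)]     # uniform refinement: 4, 8, ..., 128 cells
errors = np.array([h1_interpolation_error(u, du, n) for n in cells])
# Halving the mesh width doubles N; the observed rate is log2 of the error ratio.
observed_rates = np.log2(errors[:-1] / errors[1:])
print(observed_rates)
```

The observed rates approach 1, in line with s/d = 1 for this choice of p and d. In higher dimension d the same mesh halving multiplies the number of degrees of freedom by 2^d, which is the source of the factor 1/d in the exponent.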

For the case when $D$ is the boundary $D = \partial \mathcal{D}$ of a polyhedron $\mathcal{D} \subset \mathbb{R}^{d+1}$ we define finite element spaces on $D$ in the same way as above, but now in local coordinates on $D$, and obtain the same convergence rates (see, e.g., Sauter and Schwab (2010)). For a $d$-dimensional domain $D \subset \mathbb{R}^d$ with a smooth boundary we can first divide $D$ into pieces $D_J$, which can be mapped to a simplex $S$ by smooth mappings $\Phi_J: D_J \to S$ (which must be $C^0$-compatible where two pieces $D_J$, $D_{J'}$ touch). Then we can define on $D$ finite element functions which on $D_J$ are of the form $g \circ \Phi_J$, where $g$ is a polynomial.


For a $d$-dimensional smooth surface $D \subset \mathbb{R}^{d+1}$ we can similarly divide $D$ into pieces which can be mapped to simplices in $\mathbb{R}^d$, and again define finite elements using these mappings.

Finite element wavelet basis for V

To facilitate the accurate numerical approximation of moments of order $k \ge 2$ of the random solution and for the efficient numerical solution of the partial differential equations, we use a hierarchical basis for the nested finite element (FE) spaces $V_0 \subset \cdots \subset V_L$.

To this end, we start with a basis $\{\psi_j^0\}_{j=1,\dots,N_0}$ for the finite element space $V_0$ on the coarsest triangulation. We represent on the finer meshes $\mathcal{T}_\ell$ the corresponding FE spaces $V_\ell$, with $\ell > 0$, as a direct sum $V_\ell = V_{\ell-1} \oplus W_\ell$. Since the subspaces are nested and finite-dimensional, this is possible with a suitable space $W_\ell$ for any hierarchy of FE spaces. We assume, in addition, that we are explicitly given basis functions $\{\psi_j^\ell\}_{j=1,\dots,M_\ell}$ of $W_\ell$. Iterating with respect to $\ell$, we have that $V_L = V_0 \oplus W_1 \oplus \cdots \oplus W_L$, and $\{\psi_j^\ell \,;\ \ell = 0, \dots, L,\ j = 1, \dots, M_\ell\}$ is a hierarchical basis for $V_L$, where $M_0 := N_0$.

(W1) Hierarchical basis. $V_L = \operatorname{span}\{\psi_j^\ell \,;\ 1 \le j \le M_\ell,\ 0 \le \ell \le L\}$.

Let us define $N_\ell := \dim V_\ell$ and $N_{-1} := 0$; then we have $M_\ell := N_\ell - N_{\ell-1}$ for $\ell = 0, 1, 2, \dots, L$.

The hierarchical basis property (W1) is in principle sufficient for the formulation and implementation of the sparse MC–Galerkin method and the deterministic sparse Galerkin method. In order to obtain algorithms of log-linear complexity for integrodifferential equations, we impose on the hierarchical basis the additional properties (W2)–(W5) of a wavelet basis. This will allow us to perform matrix compression for non-local operators, and to obtain optimal preconditioning for the iterative linear system solver.

(W2) Small support. $\operatorname{diam} \operatorname{supp}(\psi_j^\ell) = O(2^{-\ell})$.

(W3) Energy norm stability. There is a constant $C_B > 0$ independent of $L \in \mathbb{N} \cup \{\infty\}$, such that, for all $L \in \mathbb{N} \cup \{\infty\}$ and all

$$v^L = \sum_{\ell=0}^{L} \sum_{j=1}^{M_\ell} v_j^\ell \psi_j^\ell(x) \in V_L,$$

we have

$$C_B^{-1} \sum_{\ell=0}^{L} \sum_{j=1}^{M_\ell} |v_j^\ell|^2 \le \|v^L\|_V^2 \le C_B \sum_{\ell=0}^{L} \sum_{j=1}^{M_\ell} |v_j^\ell|^2. \qquad (1.30)$$

Here, in the case $L = \infty$ it is understood that $V_L = V$.


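(W4) Vanishing moments. Wavelets $\psi_j^\ell$ with $\ell \ge \ell_0$ have vanishing moments up to order $p_0 \ge p - \varrho$:

$$\int \psi_j^\ell(x)\, x^\alpha \, dx = 0, \quad 0 \le |\alpha| \le p_0, \qquad (1.31)$$

except possibly for wavelets where the closure of the support intersects the boundary $\partial D$ or the boundaries of the coarsest mesh. In the case of mapped finite elements we require the vanishing moments for the polynomial function $\psi_j^\ell \circ \Phi_J^{-1}$.

(W5) Decay of coefficients for 'smooth' functions in $X_s$. There exists $C > 0$ independent of $L$ such that, for every $u \in X_s$ and every $L$,

$$\sum_{\ell=0}^{L} \sum_{j=1}^{M_\ell} |u_j^\ell|^2\, 2^{2\ell s} \le C L^\nu \|u\|_{X_s}^2, \qquad \nu = \begin{cases} 0 & \text{for } 0 \le s < p + 1 - \varrho/2, \\ 1 & \text{for } s = p + 1 - \varrho/2. \end{cases} \qquad (1.32)$$

By property (W3), wavelets constitute Riesz bases: every function $u \in V$ has a unique wavelet expansion $u = \sum_{\ell=0}^{\infty} \sum_{j=1}^{M_\ell} u_j^\ell \psi_j^\ell$.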

We define the projection $P_L: V \to V_L$ by truncating this wavelet expansion of $u$ at level $L$, i.e.,

$$P_L u := \sum_{\ell=0}^{L} \sum_{j=1}^{M_\ell} u_j^\ell \psi_j^\ell. \qquad (1.33)$$

Because of the stability (W3) and the approximation property (1.29), we obtain immediately that the wavelet projection $P_L$ is quasi-optimal: with (1.29), for $0 \le s \le s^*$ and $u \in X_s$,

$$\|u - P_L u\|_V \lesssim N_L^{-s/d} \|u\|_{X_s}. \qquad (1.34)$$

We remark in passing that the appearance of the factor $1/d$ in the convergence rate $s/d$ in (1.34), when expressed in terms of $N_L$, the total number of degrees of freedom, indicates a reduction of the convergence rate as the dimension $d$ of the computational domain increases. This reduction of the convergence rate with increasing dimension is commonly referred to as the 'curse of dimensionality'; as long as $d = 1, 2, 3$, this is not severe and, in fact, shared by almost all discretizations. If the dimension of the computational domain increases, however, this reduction becomes a severe obstacle to the construction of efficient discretizations. In the context of stochastic and parametric PDEs, the dimension of the computational domain can, in principle, become arbitrarily large, as we shall next explain.


Full and sparse tensor product spaces

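To compute an approximation for

$$M^k u \in V^{(k)} := \underbrace{V \otimes \cdots \otimes V}_{k \text{ times}}$$

we need a suitable finite-dimensional subspace of $V^{(k)}$. The simplest choice is the tensor product space $V_L \otimes \cdots \otimes V_L = V_L^{(k)}$. However, this full tensor product space has dimension

$$\dim(V_L^{(k)}) = N_L^k = (\dim(V_L))^k, \qquad (1.35)$$

which is not practical for $k > 1$. A reduction in cost is possible by sparse tensor products of $V_L$. The $k$-fold sparse tensor product space $\hat{V}_L^{(k)}$ is defined by

$$\hat{V}_L^{(k)} = \sum_{\substack{\ell \in \mathbb{N}_0^k \\ |\ell| \le L}} V_{\ell_1} \otimes \cdots \otimes V_{\ell_k}, \qquad (1.36)$$

where we denote by $\ell$ the vector $(\ell_1, \dots, \ell_k) \in \mathbb{N}_0^k$ and its length by $|\ell| = \ell_1 + \cdots + \ell_k$. The sum in (1.36) is not direct in general. However, since the $V_\ell$ are finite-dimensional, we can write $\hat{V}_L^{(k)}$ as a direct sum in terms of the complement spaces $W_\ell$:

$$\hat{V}_L^{(k)} = \bigoplus_{\substack{\ell \in \mathbb{N}_0^k \\ |\ell| \le L}} W_{\ell_1} \otimes \cdots \otimes W_{\ell_k}. \qquad (1.37)$$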

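If a hierarchical basis of the subspaces $V_\ell$ (i.e., satisfying hypothesis (W1)) is available, we can define a sparse tensor quasi-interpolation operator $\hat{P}_L^{(k)}: V^{(k)} \to \hat{V}_L^{(k)}$ by a suitable truncation of the tensor product wavelet expansion: for every $x_1, \dots, x_k \in D$,

$$(\hat{P}_L^{(k)} v)(x) := \sum_{0 \le \ell_1 + \cdots + \ell_k \le L} \ \sum_{\substack{1 \le j_\nu \le M_{\ell_\nu} \\ \nu = 1, \dots, k}} v^{\ell_1 \cdots \ell_k}_{j_1 \cdots j_k}\, \psi^{\ell_1}_{j_1}(x_1) \cdots \psi^{\ell_k}_{j_k}(x_k). \qquad (1.38)$$

If a hierarchical basis is not explicitly available, we can still express $\hat{P}_L^{(k)}$ in terms of the projections $Q_\ell := P_\ell - P_{\ell-1}$ for $\ell = 0, 1, \dots$, and with the convention $P_{-1} := 0$ as

$$\hat{P}_L^{(k)} = \sum_{0 \le \ell_1 + \cdots + \ell_k \le L} Q_{\ell_1} \otimes \cdots \otimes Q_{\ell_k}. \qquad (1.39)$$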
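The representation (1.39) of the sparse tensor projection through level increments can be tried out in a discrete toy setting. The sketch below is an illustration only: it assumes k = 2, and it replaces the finite element projections by orthogonal projections onto vectors that are constant on dyadic blocks (a Haar-type hierarchy on a grid of 2^J points), which is a hypothetical stand-in and not the wavelet construction (W1)–(W5). A function of two variables is stored as a matrix, so that the tensor product of two increments acts from the left and from the right.

```python
import numpy as np

def block_average_projector(level, J):
    """Orthogonal projector onto vectors that are constant on 2**level equal blocks of a grid of 2**J points."""
    n_blocks, size = 2 ** level, 2 ** (J - level)
    return np.kron(np.eye(n_blocks), np.full((size, size), 1.0 / size))

def level_increments(L, J):
    """Projectors P_0, ..., P_L and increments Q_l = P_l - P_{l-1}, with P_{-1} = 0."""
    P = [block_average_projector(l, J) for l in range(L + 1)]
    Q = [P[0]] + [P[l] - P[l - 1] for l in range(1, L + 1)]
    return P, Q

def sparse_projection(U, L, J):
    """Sum of (Q_{l1} (x) Q_{l2}) U over all level pairs with l1 + l2 <= L."""
    _, Q = level_increments(L, J)
    out = np.zeros_like(U, dtype=float)
    for l1 in range(L + 1):
        for l2 in range(L + 1 - l1):
            out += Q[l1] @ U @ Q[l2].T
    return out

def full_projection(U, L, J):
    """(P_L (x) P_L) U, the projection onto the full tensor product space."""
    P, _ = level_increments(L, J)
    return P[L] @ U @ P[L].T

J, L = 4, 3
x = (np.arange(2 ** J) + 0.5) / 2 ** J
U = np.exp(-np.abs(x[:, None] - x[None, :]))   # hypothetical two-point correlation kernel
U_sparse = sparse_projection(U, L, J)
U_full = full_projection(U, L, J)
print(np.linalg.norm(U - U_full), np.linalg.norm(U - U_sparse))
```

Because the block-average projectors are nested orthogonal projectors, the increments are mutually orthogonal, so the sparse sum is itself a projection and maps into the range of the full tensor projection. Summing over all pairs with both levels at most L, instead of l1 + l2 at most L, telescopes to the full tensor projection.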

We also note that the dimension of $\hat{V}_L^{(k)}$ is

$$\dim(\hat{V}_L^{(k)}) = O\bigl(N_L (\log N_L)^{k-1}\bigr), \qquad (1.40)$$


that is, it is a log-linear function of the number $N_L$ of the degrees of freedom used for approximation of the first moment. Given that the sparse tensor product space $\hat{V}_L^{(k)}$ is substantially coarser, one wonders whether its approximation properties are substantially worse than those of the full tensor product space $V_L^{(k)}$. The basis for the use of the sparse tensor product spaces $\hat{V}_L^{(k)}$ is the next result, which indicates that $\hat{V}_L^{(k)}$ achieves, up to logarithmic terms, the same asymptotic rate of convergence, in terms of powers of the mesh width, as the full tensor product space. The approximation property of sparse grid spaces $\hat{V}_L^{(k)}$ was established, for example, in Schwab and Todor (2003b, Proposition 4.2), Griebel, Oswald and Schiekofer (1999), von Petersdorff and Schwab (2004) and Todor (2009).
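The difference between the dimension of the full tensor product space in (1.35) and that of the sparse tensor product space, counted via the direct sum (1.37), can be tabulated directly. The following Python sketch assumes a hypothetical one-dimensional hierarchy with N_l = 2^(l+1) - 1 degrees of freedom on level l (interior nodes of a dyadic mesh), so that the complement spaces have dimension M_l = N_l - N_(l-1); these numbers and the function names are assumptions for illustration.

```python
from itertools import product

def level_dims(L):
    """Hypothetical 1D hierarchy: N_l = 2**(l+1) - 1 and M_l = N_l - N_{l-1}, with N_{-1} = 0."""
    N = [2 ** (l + 1) - 1 for l in range(L + 1)]
    M = [N[0]] + [N[l] - N[l - 1] for l in range(1, L + 1)]
    return N, M

def sparse_dim(L, k):
    """Dimension of the sparse space: sum of M_{l1} * ... * M_{lk} over l1 + ... + lk <= L."""
    _, M = level_dims(L)
    total = 0
    for levels in product(range(L + 1), repeat=k):
        if sum(levels) <= L:
            term = 1
            for l in levels:
                term *= M[l]
            total += term
    return total

def full_dim(L, k):
    """Dimension N_L**k of the full tensor product space, as in (1.35)."""
    N, _ = level_dims(L)
    return N[L] ** k

for L in range(1, 8):
    print(L, level_dims(L)[0][L], sparse_dim(L, 2), full_dim(L, 2))
```

For k = 2 and this hierarchy the sparse count equals L * 2^(L+1) + 1, which grows like N_L log N_L, whereas the full count grows like N_L^2.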

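Proposition 1.7.

$$\inf_{v \in \hat{V}_L^{(k)}} \|U - v\|_{V^{(k)}} \le C(k) \begin{cases} N_L^{-s/d}\, \|U\|_{X_s^{(k)}} & \text{if } 0 \le s < p + 1 - \varrho/2, \\ N_L^{-s/d} L^{\nu(k)}\, \|U\|_{X_s^{(k)}} & \text{if } s = p + 1 - \varrho/2. \end{cases} \qquad (1.41)$$

Here, the exponent $\nu(k) = (k-1)/2$ is best possible on account of the $V$-orthogonality of the $V$ best approximation.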

Remark 1.8. The exponent $\nu(k)$ of the logarithmic terms in the sparse tensor approximation rates stated in Proposition 1.7 is best possible for the approximation in the sparse tensor product spaces $\hat{V}_L^{(k)}$ given the regularity $U \in X_s^{(k)}$. In general, these logarithmic terms in the convergence estimate are unavoidable. Removal of all logarithmic terms in the convergence rate estimate as well as in the dimension estimate of $\hat{V}_L^{(k)}$ is possible only if either (a) the norm $\|\circ\|_{V^{(k)}}$ on the left-hand side of (1.41) is weakened, or if (b) the norm $\|\circ\|_{X_s^{(k)}}$ on the right-hand side of (1.41) is strengthened. For example, in the context of sparse tensor FEM for the Laplacian in $(0,1)^d$, it was shown by von Petersdorff and Schwab (2004) and Bungartz and Griebel (2004) that all logarithmic terms can be removed; this is due to the observation that the $H^1((0,1)^d)$ norm is strictly weaker than the corresponding tensorized norm $H^1(0,1)^{(d)}$ which appears in the error bound (1.41) in the case of $d$-point correlations of a random field taking values in $H_0^1(0,1)$.

The same effect allows us to slightly coarsen the sparse tensor product space $\hat{V}_L^{(k)}$. This was exploited, for example, by Bungartz and Griebel (2004) and Todor (2009).

The error bound (1.41) is for the best approximation of $U \in X_s^{(k)}$ from $\hat{V}_L^{(k)}$. To achieve the exponent $\nu(k) = (k-1)/2$ in (1.41) for a sparse tensor quasi-interpolant such as (1.38), the multi-level basis $\psi_j^\ell$ of $V_\ell$ must


multi-level basis can be achieved in $V \subset H^1(D)$, for example, by using so-called spline prewavelets.

Let us also remark that it is even possible to construct $L^2(D)$ orthonormal piecewise polynomial wavelet bases satisfying (W1)–(W5). We refer to Donovan, Geronimo and Hardin (1996) for details.

The stability property (W3) implies the following result (see, e.g., von Petersdorff and Schwab (2004)).

Lemma 1.9. (on the sparse tensor quasi-interpolant $\hat{P}_L^{(k)}$) Assume (W1)–(W5) and that the component spaces $V_\ell$ of $\hat{V}_L^{(k)}$ are $V$-orthogonal between scales and have the approximation property (1.29). Then the sparse tensor projection $\hat{P}_L^{(k)}$ is stable: there exists $C > 0$ (depending on $k$ but independent of $L$) such that, for all $U \in V^{(k)}$,

$$\|\hat{P}_L^{(k)} U\|_{V^{(k)}} \le C \|U\|_{V^{(k)}}. \qquad (1.42)$$

For $U \in X_s^{(k)}$ and $0 \le s \le s^*$, if the basis functions $\psi_j^\ell$ satisfy (W1)–(W5) and are $V$-orthogonal between different levels of mesh refinement, we obtain quasi-optimal convergence of the sparse tensor quasi-interpolant $\hat{P}_L^{(k)} U$ in (1.38):

$$\|U - \hat{P}_L^{(k)} U\|_{V^{(k)}} \le C(k)\, N_L^{-s/d} (\log N_L)^{(k-1)/2}\, \|U\|_{X_s^{(k)}}. \qquad (1.43)$$

Remark 1.10. The convergence rate (1.43) of the approximation $\hat{P}_L^{(k)} U$ from the sparse tensor subspace is, up to logarithmic terms, equal to the rate obtained for the best approximation of the mean field, i.e., in the case $k = 1$. We observe, however, that the regularity of $U$ required to achieve this convergence rate is quite high: the function $U$ must belong to an anisotropic smoothness class $X_s^{(k)}$ which, in the context of ordinary Sobolev spaces, is a space of functions whose (weak) mixed derivatives of order $s$ belong to $V$. Evidently, this mixed smoothness regularity requirement becomes stronger as the number $k$ of moments increases. By Theorem 1.6, the $k$-point correlations $M^k u$ of the random solution $u$ naturally satisfy such regularity.

Galerkin discretization

We first consider the discretization of the problem $Au(\omega) = f(\omega)$ for a single realization $\omega$, bearing in mind that in the Monte Carlo method this problem will have to be approximately solved for many realizations of $\omega \in \Omega$.

The Galerkin discretization of (1.11) reads: find $u_L(\omega) \in V_L$ such that

$$\langle v_L, A u_L(\omega) \rangle = \langle v_L, f(\omega) \rangle \quad \forall v_L \in V_L, \quad P\text{-a.e. } \omega \in \Omega, \qquad (1.44)$$


injectivity (1.13) of $A$, the Gårding inequality (1.12) and the density in $V$ of the subspace sequence $\{V_\ell\}_{\ell=0}^\infty$ imply that there exists $L_0 > 0$ such that, for $L \ge L_0$, problem (1.44) admits a unique solution $u_L(\omega)$. Furthermore, we have the uniform inf-sup condition (see, e.g., Hildebrandt and Wienholtz (1964)): there exists a discretization level $L_0$ and a stability constant $\gamma > 0$ such that, for all $L \ge L_0$,

$$\inf_{0 \ne u \in V_L}\ \sup_{0 \ne v \in V_L} \frac{\langle Au, v \rangle}{\|u\|_V \|v\|_V} \ge \frac{1}{\gamma} > 0. \qquad (1.45)$$

The inf-sup condition (1.45) implies quasi-optimality of the approximations $u_L(\omega)$ for $L \ge L_0$ (see, e.g., Babuška (1970/71)): there exist $C > 0$ and $L_0 > 0$ such that

$$\forall L \ge L_0: \quad \|u(\omega) - u_L(\omega)\|_V \le C \inf_{v \in V_L} \|u(\omega) - v\|_V \quad P\text{-a.e. } \omega \in \Omega. \qquad (1.46)$$

From (1.46) and (1.29), we obtain the asymptotic error estimate: define $\sigma := \min\{s^*, p + 1 - \varrho/2\}$. Then there exists $C > 0$ such that for $0 < s \le \sigma$

$$\forall L \ge L_0: \quad \|u(\omega) - u_L(\omega)\|_V \le C N_L^{-s/d} \|u\|_{X_s} \quad P\text{-a.e. } \omega \in \Omega. \qquad (1.47)$$
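The Galerkin error estimate (1.47) can be observed for a single realization in a model problem. The sketch below is a minimal illustration under assumptions that are not taken from this article: A is the operator -d^2/dx^2 on (0,1) with homogeneous Dirichlet conditions, the load is f(x) = π^2 sin(πx) with exact solution u(x) = sin(πx), and the spaces are continuous piecewise linear elements on uniform meshes. The energy error is evaluated through the identity a(u - u_L, u - u_L) = a(u, u) - a(u_L, u_L), which follows from Galerkin orthogonality for this symmetric problem.

```python
import numpy as np

def galerkin_p1(f, n, quad_order=4):
    """P1 Galerkin solution of -u'' = f on (0,1), u(0) = u(1) = 0, on n equal cells."""
    h = 1.0 / n
    nodes = np.linspace(0.0, 1.0, n + 1)
    # Stiffness matrix of the interior hat functions.
    K = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    xi, w = np.polynomial.legendre.leggauss(quad_order)
    load = np.zeros(n + 1)
    for e in range(n):
        a, b = nodes[e], nodes[e + 1]
        x = 0.5 * (a + b) + 0.5 * h * xi        # quadrature points on cell e
        fx = 0.5 * h * w * f(x)                 # weighted load values
        load[e] += np.sum(fx * (b - x) / h)     # hat function of the left node
        load[e + 1] += np.sum(fx * (x - a) / h) # hat function of the right node
    coeffs = np.linalg.solve(K, load[1:-1])
    return nodes, coeffs, K

def load_function(x):
    return np.pi ** 2 * np.sin(np.pi * x)

def energy_error(n):
    """Energy-norm error, using a(u, u) = pi**2 / 2 for u(x) = sin(pi x)."""
    _, c, K = galerkin_p1(load_function, n)
    return np.sqrt(np.pi ** 2 / 2.0 - c @ K @ c)

cells = [2 ** level for level in range(2, 7)]
errors = np.array([energy_error(n) for n in cells])
observed_rates = np.log2(errors[:-1] / errors[1:])
print(observed_rates)
```

In this setting d = 1 and s = 1, so the observed rates close to 1 are consistent with the exponent s/d in (1.47). In the Monte Carlo method discussed next, one such solve is needed for every sample of the data.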

1.3. Sparse tensor Monte Carlo Galerkin FEM

We next review basic convergence results of the Monte Carlo method for the approximation of expectations of random variables taking values in a separable Hilbert space. As our exposition aims at the solution of operator equations with stochastic data, we shall first consider the MC method without discretization of the operator equation, and show convergence estimates of the statistical error incurred by the MC sampling. Subsequently, we turn to the Galerkin approximation of the operator equation and, in particular, the sparse tensor approximation of the two- and $k$-point correlation functions of the random solution.

Monte Carlo error for continuous problems

For a random variable $Y$, let $Y_1(\omega), \dots, Y_M(\omega)$ denote $M \in \mathbb{N}$ copies of $Y$, i.e., the $Y_i$ are random variables which are mutually independent and identically distributed to $Y(\omega)$ on the same common probability space $(\Omega, \Sigma, P)$. Then the arithmetic average $\bar{Y}_M(\omega)$,

$$\bar{Y}_M(\omega) := \frac{1}{M} \bigl( Y_1(\omega) + \cdots + Y_M(\omega) \bigr),$$

is a random variable on $(\Omega, \Sigma, P)$ as well.

The simplest approach to the numerical solution of (1.11) for $f \in L^1(\Omega; V')$ is MC simulation. Let us first consider the situation without discretization of $V$. We generate $M$ draws $f(\omega_j)$, $j = 1, 2, \dots, M$, of $f(\omega)$ and find the


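solutions $u(\omega_j) \in V$ of the problems

$$A u(\omega_j) = f(\omega_j), \quad j = 1, \dots, M. \qquad (1.48)$$

We then approximate the $k$th moment $M^k u$ with the sample mean $\bar{E}^M[u^{(k)}]$ of $u(\omega_j) \otimes \cdots \otimes u(\omega_j)$:

$$\bar{E}^M[u^{(k)}] := \overline{u \otimes \cdots \otimes u}^M = \frac{1}{M} \sum_{j=1}^{M} u(\omega_j) \otimes \cdots \otimes u(\omega_j). \qquad (1.49)$$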

It is well known that the Monte Carlo error decreases as $M^{-1/2}$ in a probabilistic sense provided the variance of $u^{(k)}$ exists. By (1.18), this is the case for $u \in L^{2k}(\Omega; V)$. We have the following convergence estimate.

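Theorem 1.11. Let $k \ge 1$ and assume that in the operator equation (1.11) $f \in L^{2k}(\Omega; V')$. Then, for any number $M \in \mathbb{N}$ of samples for the MC estimator (1.49), we have the error bound

$$\bigl\| M^k u - \bar{E}^M[u^{(k)}] \bigr\|_{L^2(\Omega; V^{(k)})} \le M^{-1/2} \bigl( C_A \|f\|_{L^{2k}(\Omega; V')} \bigr)^k. \qquad (1.50)$$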

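Proof. We observe that $f \in L^{2k}(\Omega; V')$ implies with (1.22) that $u^{(k)} \in L^2(\Omega; V^{(k)})$. For $i = 1, \dots, M$ we denote by $u_i(\omega)$ the $M$ i.i.d. copies of the random variable $u(\omega) = A^{-1} f(\omega)$, which correspond to the $M$ MC samples $u_i = A^{-1} f_i$.

Using that the $u_i$ are independent and identically distributed, we infer that, for each value of $i$, $u_i(\omega) \in L^{2k}(\Omega; V)$. Therefore

\begin{align*}
\bigl\| E[u^{(k)}] - \bar{E}^M[u^{(k)}] \bigr\|^2_{L^2(\Omega; V^{(k)})}
&= E\Bigl[ \bigl\| E[u^{(k)}] - \bar{E}^M[u^{(k)}] \bigr\|^2_{V^{(k)}} \Bigr] \\
&= E\Bigl[ \Bigl\| E[u^{(k)}] - \frac{1}{M} \sum_{i=1}^{M} u_i^{(k)} \Bigr\|^2_{V^{(k)}} \Bigr] \\
&= E\Bigl[ \Bigl\langle E[u^{(k)}] - \frac{1}{M} \sum_{i=1}^{M} u_i^{(k)},\ E[u^{(k)}] - \frac{1}{M} \sum_{j=1}^{M} u_j^{(k)} \Bigr\rangle \Bigr] \\
&= \frac{1}{M^2} \sum_{i,j=1}^{M} E\Bigl[ \bigl\langle E[u^{(k)}] - u_i^{(k)},\ E[u^{(k)}] - u_j^{(k)} \bigr\rangle \Bigr] \\
&= \frac{1}{M^2} \sum_{i=1}^{M} E\Bigl[ \bigl\| E[u^{(k)}] - u_i^{(k)} \bigr\|^2_{V^{(k)}} \Bigr] && (u_i(\omega) \text{ independent}) \\
&= \frac{1}{M} E\Bigl[ \bigl\| u^{(k)} - E[u^{(k)}] \bigr\|^2_{V^{(k)}} \Bigr] && (u_i(\omega) \text{ identically distributed}) \\
&= \frac{1}{M} E\Bigl[ \bigl\langle u^{(k)} - E[u^{(k)}],\ u^{(k)} - E[u^{(k)}] \bigr\rangle \Bigr] \\
&= \frac{1}{M} \Bigl( -E\bigl[ \langle u^{(k)} - E[u^{(k)}],\ E[u^{(k)}] \rangle \bigr] + E\bigl[ \langle u^{(k)} - E[u^{(k)}],\ u^{(k)} \rangle \bigr] \Bigr) \\
&= \frac{1}{M} E\bigl[ \|u^{(k)}\|^2_{V^{(k)}} \bigr] - \frac{1}{M} \bigl\| E[u^{(k)}] \bigr\|^2_{V^{(k)}} \\
&\le M^{-1} \|u^{(k)}\|^2_{L^2(\Omega; V^{(k)})} = M^{-1} \|u\|^{2k}_{L^{2k}(\Omega; V)}.
\end{align*}

Taking square roots on both sides, and bounding $\|u\|_{L^{2k}(\Omega; V)}$ by $C_A \|f\|_{L^{2k}(\Omega; V')}$ with $C_A$ as in (1.14), completes the proof.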
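The rate M^{-1/2} of Theorem 1.11 can be observed in a small experiment. The following Python sketch is an illustration under hypothetical assumptions: A is a 2x2 matrix, the data f are standard Gaussian vectors (so that the exact second moment of u = A^{-1} f is A^{-1} A^{-T}), k = 2, and the L^2(Ω) norm of the error is estimated from a finite number of independent repetitions of the estimator (1.49).

```python
import numpy as np

def mc_second_moment_rmse(A_inv, M, reps, rng):
    """Root mean square Frobenius error of the sample mean of u (x) u with M samples, over reps repetitions."""
    n = A_inv.shape[0]
    f = rng.normal(size=(reps, M, n))               # i.i.d. draws f(omega_j) ~ N(0, I)
    u = f @ A_inv.T                                 # u_j = A^{-1} f_j, stored as rows
    estimate = np.einsum("rmi,rmj->rij", u, u) / M  # sample mean of u_j (x) u_j per repetition
    exact = A_inv @ A_inv.T                         # exact second moment for this data law
    sq_err = np.sum((estimate - exact) ** 2, axis=(1, 2))
    return np.sqrt(np.mean(sq_err))

rng = np.random.default_rng(2011)
A_inv = np.linalg.inv(np.array([[2.0, -1.0], [-1.0, 2.0]]))

rmse_50 = mc_second_moment_rmse(A_inv, 50, 2000, rng)
rmse_200 = mc_second_moment_rmse(A_inv, 200, 2000, rng)
ratio = rmse_50 / rmse_200
print(rmse_50, rmse_200, ratio)
```

Increasing the number of samples by a factor of 4 should reduce the error by a factor close to 2, up to the statistical fluctuation caused by the finite number of repetitions. The computation in the proof shows that the mean square error equals the variance divided by M for every M, not only asymptotically.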

The previous theorem required that $u^{(k)} \in L^2(\Omega; V^{(k)})$ or (equivalently by (1.18)) that $u \in L^{2k}(\Omega; V)$ (resp. $f \in L^{2k}(\Omega; V')$) in order to obtain the convergence rate $M^{-1/2}$ of the MC estimates (1.49) in $L^2(\Omega; V^{(k)})$.

In the case of weaker summability of $u$, the next estimate shows that the MC method converges in $L^1(\Omega; V^{(k)})$ and at a rate that is possibly lower than $1/2$, as determined by the summability of $u$. We only state the result here and refer to von Petersdorff and Schwab (2006) for the proof.

Theorem 1.12. Let $k \ge 1$. Assume that $f \in L^{\alpha k}(\Omega; V')$ for some $\alpha \in (1, 2]$. For $M \ge 1$ samples we define the sample mean $\bar{E}^M[u^{(k)}]$ as in (1.49). Then there exists $C$ such that, for every $M \ge 1$ and every $0 < \epsilon < 1$,

$$P\left( \bigl\| M^k u - \bar{E}^M[u^{(k)}] \bigr\|_{V^{(k)}} \le C\, \frac{\|f\|^k_{L^{\alpha k}(\Omega; V')}}{\epsilon^{1/\alpha} M^{1 - 1/\alpha}} \right) \ge 1 - \epsilon. \qquad (1.51)$$

The previous results show that one can obtain a rate of up to $M^{-1/2}$ in a probabilistic sense for the Monte Carlo method. Convergence rates beyond $1/2$ are not possible, in general, by the MC method, as is shown by the central limit theorem; in this sense, the rate $1/2$ is sharp.

So far, we have obtained the convergence rate $1/2$ of the MC method essentially in $L^1(\Omega; V^{(k)})$ and in $L^2(\Omega; V^{(k)})$. A $P$-a.s. convergence estimate of the MC method can be obtained using the separability of the Hilbert space of realizations and the law of the iterated logarithm; see, e.g., Strassen (1964) and Ledoux and Talagrand (1991, Chapter 8) and the references therein for the vector-valued case.

Lemma 1.13. Assume that $H$ is a separable Hilbert space and that $X \in L^2(\Omega; H)$. Then, with probability 1,

$$\limsup_{M \to \infty} \frac{\|\bar{X}_M - E(X)\|_H}{(2 M^{-1} \log\log M)^{1/2}} \le \|X - E(X)\|_{L^2(\Omega; H)}. \qquad (1.52)$$

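For the proof, we refer to von Petersdorff and Schwab (2006). Applying Lemma 1.13 to $X = u^{(k)} = u \otimes \cdots \otimes u$ and with $V^{(k)}$ in place of $H$ gives (with $C_A$ as in (1.14))

$$\|u \otimes \cdots \otimes u\|_{L^2(\Omega; V^{(k)})} = \|u\|^k_{L^{2k}(\Omega; V)} \le C_A^k \|f\|^k_{L^{2k}(\Omega; V')},$$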