Stochastic Processes Under Linear Differential Constraints : Application to Gaussian Process Regression for the 3 Dimensional Free Space Wave Equation

arXiv:2111.12035v2 [math.ST] 17 Dec 2021

Iain Henderson, Pascal Noble, Olivier Roustant May 2021

Abstract Let $P$ be a linear differential operator over $D \subset \mathbb{R}^d$ and $U = (U_x)_{x \in D}$ a second order stochastic process. In the first part of this article, we prove a new necessary and sufficient condition for all the trajectories of $U$ to verify the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. When compared to previous similar results [1], the novelty of this result is that the equality $P(U) = 0$ is understood in the sense of distributions, which is a functional analysis framework particularly adapted to the study of PDEs.

This theorem provides valuable insights for the second part of this article, which is dedicated to performing "physically informed" machine learning on data that are solutions to the homogeneous 3 dimensional free space wave equation. We perform Gaussian process regression (GPR) on this data, which is a kernel based Bayesian approach to machine learning. To do so, we put Gaussian process (GP) priors over the wave equation's initial conditions and propagate them through the wave equation. We obtain explicit formulas for the covariance kernel of the corresponding GP; this kernel can then be used for GPR. Our theorem states that this kernel, the trajectories of the corresponding GP and the predictions provided by GPR are all solutions to the wave equation in the sense of distributions. We explore two particular cases: the radial symmetry and the point source. For the former, we derive convolution-free GPR formulas; for the latter, we show a direct link between GPR and the classical triangulation method for point source localization used e.g. in GPS systems. Additionally, this Bayesian framework gives rise to a new answer for the ill-posed inverse problem of reconstructing initial conditions for the wave equation with finite dimensional data, and simultaneously provides a way of estimating physical parameters from this data as in [2]. We finish by showcasing this physically informed GPR on a number of practical examples.

Keywords: Gaussian Process Regression, Partial Differential Equations, Wave equation, Physical Parameter Estimation, Initial Value Inverse Problems

1 Introduction

Machine learning techniques have proved countless times that they were able to provide efficient solutions to difficult problems when field data was available. One key element to a great part of this success is the incorporation of ”expert knowledge” in the corresponding statistical models. In a good deal of practical applications, powerful mathematical models are already available (and somewhat well understood) to describe certain phenomena. This is very common when dealing with problems coming from physics such as thermodynamics, continuum mechanics or fluid mechanics to name a few. In these examples, the mathematical models take the form of Partial Differential Equations (PDEs). The zoology of PDEs is incredibly vast [3] and their applications are ubiquitous. As such, tremendous efforts have been devoted to trying to solve them,

both theoretically [3] and numerically [4] . These equations impose very specific structures, simple or complex, on the observed data. These may be extremely difficult to capture and mimic with general machine learning models : given the data, they are difficult to understand for the model. However, given the theoretical and practical knowledge of the corresponding PDEs at our disposal, one should try to incorporate these structures in the machine learning models. How may this be done? Gaussian Process Regression (GPR) [5], which is a type of Bayesian framework for machine learning, provides a possible answer when the PDE is linear. Indeed, Gaussian processes (GPs) are the most ”linear” of all random processes : they are stable under (finite) linear combinations of their elements [5]. Although they are very simple mathematical objects when compared to non linear ones, linear PDEs are central in general PDE theory and remain physically pertinent in a number of applications such as acoustics, electromagnetics or quantum mechanics. As a matter of fact, the PDE after which this article is titled plays a fundamental role in all three of the aforementioned domains.

1.1 State of the Art

GPR is a "kernel based" machine learning method, which means that it is built around a positive definite function (see equation (2)) called its kernel. Solving or "learning" linear ODEs and PDEs thanks to GPR is not a very new idea. The first initiatives in that direction probably go back to [6] and have been re-explored ever since in a number of cases. They have been developed in the context of latent forces [7] [8] [9] [10] and were then applied to certain wave equations [11] [12]. Latent force models deal with linear PDEs of the form

$$Lu = f \tag{1}$$

where both $u$ and $f$ are defined on the same domain $D \subset \mathbb{R}^d$, with $L$ a linear differential operator. Latent force models put a GP prior on the driving source term $f$. Explicit resolution of (1), thanks to Green's functions, translates this prior into a GP distribution on the solution $u$. Conversely, a second approach [2] [13] rather puts a GP prior on the solution $u$ and straightforwardly translates (1) into a GP distribution on the driving term $f$, avoiding the need for Green's functions and convolutions. Though both of these approaches are "physically informed", they may not account for strict linear equality constraints in the interior domain $D$, which could be exploited for dimension reduction when they are known. Actually, a number of famous PDEs can be studied with no interior source term, in which case initial conditions or more general boundary source terms are provided. This is frequent in the evolution equation literature, which for instance gives rise to the fruitful semi-group theory for PDEs [14]. Equation (1) then reads $Lu = 0$ and while $u$ may be a random object, $0$ should be strictly $0$ and not just a centered stochastic process, as would be the case if using either of the two frameworks described above. A first step towards enforcing strict linear constraints in the approximation space probably dates back to [15] where, in a deterministic context, divergence-free interpolation spaces were first built. This idea was pursued in [16] where an interpolation space comprised of solutions to Laplace's equation was constructed.

In both cases, these spaces are built upon positive deﬁnite kernels (see equation (2)), which makes them easy to transpose in a GPR framework. As such, the kernel from [16] was then used in [17] for performing GPR on Laplace’s equation. Likewise for [18], where GPs are built so that their trajectories are systematically divergence/curl free. This was then taken a step further in [19] [20] [21] [22] in the more general context of Maxwell’s stationary equations. Finally, [23] applies this framework to the 1D heat equation, Laplace’s and Helmholtz’ 2D equations. The matter of enforcing strict homogeneous boundary conditions in the context of GPR has also been addressed in [24] [25]. Enforcing these constraints provides another way of lowering the dimension of the problem. Following [26], [24] builds a PDE-tailored covariance kernel thanks to a Mercer-like expansion in terms of eigenvectors of the diﬀerential operator in question.

In contrast with the rest of the literature, [25] raises the question of rigorous proofs and regularity issues regarding the derivations and applications of GPR for PDEs, and resorts to algebraic techniques to justify the different steps of the approach. We raise the same questions here, though we rather make use of a functional analysis framework adapted to PDEs. All of the aforementioned approaches as well as the one presented in this article can be formulated using the theory of stochastic partial differential equations (SPDE), which are PDEs whose source term is a random function.

The general matter of applying physically informed GPR to linear PDEs thanks to an SPDE formulation is tackled in [27], without addressing regularity questions. [28] presents how spatio-temporal GPR can be reformulated as an SPDE problem, enabling the use of Kalman filter theory for computational efficiency. In [29], the variational formulation (see [3], section 6.1.2 for a definition) of certain linear PDEs has been incorporated into a GPR framework thanks to an SPDE reformulation. This approach requires the use of Gaussian generalized stochastic processes (see [30], section 2.2.1.1), or "functional Gaussian processes" following [29]. In [31], covariance kernels on graphs are obtained thanks to an adaptation of SPDEs on graphs. Finally, [32] focuses on the study of stationary stochastic processes that are solutions of a wide class of linear SPDEs, outside of the context of GPR. In particular, [32] provides a description of all the second order stationary stochastic processes that are solutions to the 3D wave equation, a central equation in the present article; this description is done in terms of the covariance kernel of the corresponding stochastic process. Note that as in this article, [32] also makes use of the theory of generalized functions. On a side note, a thorough theoretical study of a stochastic 3D wave equation and the regularity of the associated random paths is presented in [33]. In a much wider framework, general linearly constrained stochastic processes, and GPs in particular, are thoroughly explored in [1]. Though [1] deals with many different types of linear operators, the application of the corresponding results to linear PDEs is not straightforward, see section 3. Indeed, these results are not phrased using the language of functional analysis, which was in great part designed to deal efficiently with differential equations. This is why we prove in Proposition 2 a result resembling those of [1] but specifically adapted to linear PDEs. A recent survey on linearly constrained GPs presents most of these different approaches [34].

In this article, we focus on the so-called (3 dimensional, free space, time dependent) wave equation. Because this PDE is time dependent, we show that performing GPR on wave equation data amounts to reconstructing the corresponding initial conditions of the wave equation from incomplete scattered data. This immediately echoes questions arising from the inverse problem community [35] [36]; the solution we provide in this article entirely falls in the domain of Photo Acoustic Tomography (PAT), which deals with recovering the initial conditions of wave propagation problems for which the initial value problem (37) is the archetype. We refer to [37] and the many references therein on that topic. Also, Bayesian approaches for inverse problem questions involving the wave equation have been set up a number of times, though never really following the GPR methodology presented in this article when it comes down to reconstructing some initial conditions. In these approaches, a probability prior is set on the model's parameter space, typically the wave propagation speed $c$; the goal is then to estimate the model's true parameters. See [35], sections 5.6 to 5.8 for practical examples, or [38] for an inversion of a non uniform propagation speed $c(x)$.

1.2 Contribution of the paper

We tackle the general problem of applying GPR on scattered observations of solutions of the wave equation using ”physically informed” GPs. We explore both the theoretical and applied aspects of this task.

We begin by proving a general result that provides a simple necessary and suﬃcient condition for the trajectories of any second order stochastic process to be solutions to a given linear PDE in the distributional sense (section 3). This condition is formulated in terms of its covariance function; the hypotheses are minimal and the displayed result is concise. This theorem is phrased in a functional analysis framework, using the permissive language of generalized functions.

We describe a general Gaussian process model for the homogeneous 3D wave equation, with the corresponding proofs (section 4). This model is obtained by putting

GP priors on the initial conditions of the wave equation, which is a natural thing to do when one views these initial conditions as unknown. In particular we derive the corresponding positive deﬁnite covariance kernel which we will directly use for GPR.

For short, we denote by WIGPR the use of this positive definite kernel to perform GPR, as in "Wave Informed GPR". The exposed approach enforces strict linear homogeneous PDE constraints in the interior domain, similarly to what was observed in [19] [20] [22] among others for different PDEs; see the state of the art section for more details. More precisely, the trajectories of the corresponding GP all verify the wave equation, as do the predictions provided by WIGPR. Here, these linear constraints are understood in light of our result from section 3, i.e. in the sense of distributions. The key difference with the kernels presented in [32] is that here, no stationarity assumptions are made on the underlying stochastic process. In particular the spectral measure provided by Bochner's theorem [5], which is the key tool used in [32], is not available anymore. We thus resort to more general integration techniques in the proofs. We then provide an inverse problem interpretation of the use of WIGPR on wave equation data, as the prediction from WIGPR evaluated at $t = 0$ provides a finite dimensional reconstruction of the real initial conditions corresponding to the observed data. This is a natural thing to do when adopting a Bayesian point of view, where the prior GP distribution is conditioned on the data. The resultant posterior distribution enables the estimation of the parameters on which the priors were set, i.e. the initial conditions. Among other things, the solution to the corresponding inverse problem provided by this approach is implementable in practice. Two particular cases are investigated: the radial symmetry and the point source. When the initial conditions exhibit radial symmetry, we derive convolution-free covariance formulas and discuss them when the initial conditions are compactly supported. Indeed, these formulas can be directly linked to the finite speed propagation principle for the 3D wave equation, also known as the strong Huygens principle. In the case of the point source, we show numerically (Figure 1) and theoretically that the parameter fitting step from WIGPR naturally reduces to the classic triangulation approach for point source localization, used for instance in GPS systems.

Indeed, as in [2], WIGPR can be used to jointly estimate physical parameters such as wave speed, source localization or source size. Note that the wave equation differs from most of the PDEs mentioned in the introduction, such as the heat equation or Laplace’s equation, because these are either parabolic or elliptic. There are known regularization effects for such types of PDEs [3] which mitigate the need for precise mathematical argumentation w.r.t. the derivations of GPR for PDEs. Such regularization effects completely disappear for hyperbolic PDEs such as the wave equation [3]. In that case, precise mathematical argumentation and rigorous derivations become critical [3]. This is one reason why throughout this section, no proofs are omitted and precise arguments are established to justify all the derivations that lead to the exposed formulas.

We showcase a few numerical experiments of WIGPR (section 5). They are performed on numerically generated wave equation data, with radially symmetric compactly supported initial conditions. This data takes the form of a number of noise polluted time series, each of them corresponding to an "artificial" sensor placed in the numerical simulation. We thus use the fast-to-compute covariance expressions derived in the previous section. The tackled questions concern the quality of the estimation of certain physical parameters, the quality of the initial condition reconstruction and the sensitivity of the reconstruction step w.r.t. the sensor locations. We display initial condition reconstruction images, in light of the inverse problem interpretation described in the previous section. More complete numerical results are presented in appendix A, showing for each example the quality of the physical parameter estimation as well as $L^2$, $L^1$ and $L^\infty$ relative error estimates in terms of the number of sensors used.

Organization of the paper. The paper is organized as follows. For self-containment, section 2 is dedicated to reminders on GPs, GPR and generalized functions. This section and all the proofs are detailed enough so that this article is accessible both to the analyst and the statistician. In section 3, we state and prove our new necessary and sufficient condition on stochastic processes that are subject to linear differential constraints. Section 4 is dedicated to the study of the wave equation thanks to Gaussian processes and Gaussian process regression. In section 5, we showcase some numerical applications of the previous section on wave equation data. We conclude in section 6.

2 Notations and background

2.1 Notations

Let $m$ be a scalar function defined on an open set $D \subset \mathbb{R}^d$ and $k$ a scalar function defined on $D \times D$. Let $X = (x_1, ..., x_n)^T$ be a column vector in $D^n$. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, over which all the random objects of this article will be defined.

N.1 Denote by $m(X)$ the column vector such that $m(X)_i = m(x_i)$, by $k(X, X)$ the square matrix such that $k(X, X)_{ij} = k(x_i, x_j)$ and, given $x \in D$, by $k(X, x)$ the column vector such that $k(X, x)_i = k(x_i, x)$.

N.2 For any positive definite kernel $k$, $H_k$ denotes the associated Reproducing Kernel Hilbert Space (RKHS) as defined in [30].

N.3 Denote by $L^2(\mathbb{P})$ the Hilbert space of real-valued random variables defined on $\Omega$ with finite second order moment, endowed with the inner product $\langle X, Y \rangle = \mathbb{E}[XY]$.

For a GP $(U(x))_{x \in D}$, the trajectory of $U$ at point $\omega \in \Omega$ is the deterministic function $x \longmapsto U(x)(\omega)$ and is denoted by $U_\omega$. $\mathcal{L}(U) := \mathrm{Span}(U(x), x \in D) \subset L^2(\mathbb{P})$ denotes the Hilbert subspace of $L^2(\mathbb{P})$ induced by $U$. Since $L^2(\mathbb{P})$-limits of Gaussian random variables drawn from the same GP remain Gaussian [39], $\mathcal{L}(U)$ only encompasses Gaussian random variables.

N.4 If $X$ is a random variable, then "$X \in A$ a.s." means "$X \in A$ almost surely", or equivalently $\mathbb{P}(X \in A) = 1$. Likewise, if $f$ is a function defined on $D \subset \mathbb{R}^d$, then "$f(x) \in A$ a.e." means "$f(x) \in A$ almost everywhere" or equivalently, $\lambda_d(\{x \in D : f(x) \notin A\}) = 0$ where $\lambda_d$ is the Lebesgue measure on $\mathbb{R}^d$.

N.5 $L^1_{loc}(D)$ denotes the space of measurable scalar functions $f$ defined on $D$ that are locally integrable, i.e. such that $\int_K |f| < +\infty$ for all compact sets $K \subset D$. $\mathcal{D}(D)$ denotes the space of infinitely differentiable functions with compact support in $D$.

N.6 For $k = (k_1, ..., k_d) \in \mathbb{N}^d$, we use the usual notation $|k| = k_1 + ... + k_d$ and $\partial^k := \partial_{x_1}^{k_1} ... \partial_{x_d}^{k_d}$, where $\partial_{x_i}^{k_i}$ is the $k_i$-th derivative w.r.t. the $i$-th coordinate $x_i$.

N.7 The variables $(r, \theta, \phi)$ will always denote spherical coordinates; $S(0, 1)$ denotes the unit sphere of $\mathbb{R}^3$ and we will always write $d\Omega = \sin\theta \, d\theta \, d\phi$ for its surface differential element; $\gamma = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta)^T \in S(0, 1)$ denotes the unit length vector parametrized by $(\theta, \phi)$.

2.2 Gaussian Processes

We refer to [5] for further details on Gaussian processes.

Definition. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $D \subset \mathbb{R}^d$ an open set. A Gaussian process $(U(x))_{x \in D}$ is a collection of normally distributed random variables defined on $\Omega$ and indexed by $D$ such that for any $(x_1, ..., x_n) \in D^n$, the law of $(U(x_1), ..., U(x_n))^T$ is a multivariate normal distribution. Its trajectories are the deterministic functions $U_\omega : x \longmapsto U(x)(\omega)$ for any $\omega \in \Omega$. The law of a GP is characterized by its mean and covariance functions, defined by

• $m(x) := \mathbb{E}[U(x)]$

• $k(x, x') := \mathrm{Cov}(U(x), U(x')) = \mathbb{E}[(U(x) - m(x))(U(x') - m(x'))]$

We write $(U(x))_{x \in D} \sim GP(m, k)$.

Covariance kernels. The function $m$ can be any function, and is actually often set to zero. On the other hand, the function $k$ has to be positive definite (PD):

$$\sum_{i,j=1}^n a_i a_j k(x_i, x_j) \geq 0 \quad \forall (x_1, ..., x_n) \in D^n, \ \forall (a_1, ..., a_n) \in \mathbb{R}^n \tag{2}$$

PD functions verify the Cauchy-Schwarz inequality [5]:

$$\forall x, x' \in D, \quad |k(x, x')| \leq \sqrt{k(x, x)} \sqrt{k(x', x')} \tag{3}$$

The covariance kernel k is the core element that encodes the mathematical properties of the GP. Furthermore, there is a one-to-one correspondence between positive deﬁnite kernels and (covariance kernels of) centered GPs [40]. Thus we will focus on the design of positive deﬁnite kernels.

Among all covariance kernels, some are said to be stationary, in which case the value at $(x, x')$ only depends on the increment $x - x'$: $k(x, x') = k_S(x - x')$. Common examples are the squared exponential and Matérn kernels [5]; see equation (117) for an example.
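As an illustration of inequalities (2) and (3) and of stationarity, the following minimal sketch checks them numerically on a squared exponential kernel. The function name `squared_exponential`, the length scale, the variance and the random point set are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def squared_exponential(x, y, length_scale=1.0, variance=1.0):
    """Stationary kernel k(x, y) = variance * exp(-|x - y|^2 / (2 * length_scale^2)).

    Hypothetical helper: x and y are arrays of shape (n, d) and (m, d).
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 3))  # 20 points in D = (-1, 1)^3 (assumed)
K = squared_exponential(X, X)             # Gram matrix k(X, X)

# Inequality (2): a^T k(X, X) a >= 0 for every a, i.e. k(X, X) has no
# negative eigenvalue (up to floating point round-off).
min_eig = np.linalg.eigvalsh(K).min()

# Inequality (3): |k(x, x')| <= sqrt(k(x, x)) * sqrt(k(x', x')).
diag = np.sqrt(np.diag(K))
cauchy_schwarz_ok = bool(np.all(np.abs(K) <= np.outer(diag, diag) + 1e-12))

# Stationarity: shifting all points by the same vector h leaves k unchanged.
h = np.array([0.3, -0.2, 0.1])
K_shifted = squared_exponential(X + h, X + h)
```

Such a check only tests (2) on one finite point set; it cannot replace a proof of positive definiteness.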

Bayesian inference of functions. A Gaussian process $U = (U(x))_{x \in D}$ can also be seen as a random variable that is valued in a space of functions, i.e. a random function. Indeed, $U$ can equivalently be viewed as the following random variable:

$$U : \begin{cases} (\Omega, \mathcal{F}, \mathbb{P}) \longrightarrow (E, \mathcal{T}) \\ \omega \longmapsto U_\omega = [x \mapsto U(x)(\omega)] \end{cases} \tag{4}$$

where $(E, \mathcal{T})$ is a measurable space of functions large enough to contain the trajectories of $U$. If $E$ is a Banach space, $\mathcal{T}$ can for example be set to be the Borel $\sigma$-algebra associated to the normed vector space topology of $E$. This in turn defines a probability distribution over $(E, \mathcal{T})$, which is the associated pushforward measure $\mathbb{P}_U$ defined by $\mathbb{P}_U(A) = \mathbb{P}(U \in A)$ for all $A \in \mathcal{T}$. If a function $u \in E$ is unknown, it can be modelled as a random function, for example $U$. Its a priori probability distribution will be $\mathbb{P}_U$ and we will say that we put a Gaussian process prior over $u$. This is typical of Bayesian inference, where probability distributions are assumed over unknown quantities prior to observing them. Thanks to Bayes' theorem, the prior probability distribution of $u$ can then be updated through probability conditioning when data on $u$ is available. The conditioned probability distribution over $u$ is called the posterior. Statistical indicators can then be derived from the posterior, such as expectation and standard deviation, to estimate the unknown quantity over which the prior was initially set. Bayesian inference is one way of understanding Gaussian process regression (next subsection).
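To make the "random function" point of view of (4) concrete, the sketch below draws a few trajectories of a centered GP prior restricted to a finite grid. The kernel, the grid, the length scale and the small diagonal jitter (a purely numerical safeguard) are assumptions chosen for illustration.

```python
import numpy as np

def sq_exp(x, y, length_scale=0.2):
    """Squared exponential kernel on the real line (illustrative choice)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / length_scale**2)

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)  # finite subset of D = (0, 1) (assumed)
K = sq_exp(grid, grid)

# The restriction of U to the grid is a Gaussian vector N(0, k(grid, grid)).
# With K + jitter = L L^T and Z ~ N(0, I), the vector L Z has that law
# (up to the jitter added for numerical stability).
jitter = 1e-8 * np.eye(grid.size)
L = np.linalg.cholesky(K + jitter)
trajectories = L @ rng.standard_normal((grid.size, 5))  # one column per omega
```

Each column of `trajectories` is a discretized sample $U_\omega$ evaluated on the grid, i.e. one draw from the prior.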

2.3 Gaussian Process Regression

We refer to [5] for further details on Gaussian process regression.

Kriging equations. GPs can be used for function interpolation. Let $u$ be a function defined on $D$ of which we know a small dataset of values $B = \{u(x_1), ..., u(x_n)\}$. Conditioning the law of a GP $(U(x))_{x \in D} \sim GP(m, k)$ on the dataset $B$ yields a second GP $\tilde{U}$ with $\tilde{U}(x) := (U(x) \,|\, U(x_i) = u(x_i), \ i = 1, ..., n)$. The law of $\tilde{U}$ is known: $(\tilde{U}(x))_{x \in D} \sim GP(\tilde{m}, \tilde{k})$, where $\tilde{m}$ and $\tilde{k}$ are given by the so-called Kriging equations (5) and (6). Denote $X = (x_1, ..., x_n)^T$ and suppose that $k(X, X)$ is invertible; then [5]

$$\tilde{m}(x) = m(x) + k(X, x)^T k(X, X)^{-1} (u(X) - m(X)) \tag{5}$$

$$\tilde{k}(x, x') = k(x, x') - k(X, x)^T k(X, X)^{-1} k(X, x') \tag{6}$$

In a Bayesian framework, the initial GP $(U(x))_{x \in D}$ is the prior and the conditioned GP $(\tilde{U}(x))_{x \in D}$ is the posterior; the Kriging mean and covariance are simply the mean and covariance of the posterior. At location $x$, $\tilde{m}(x)$ is the prediction of $u(x)$. By construction, for all $i \in \{1, ..., n\}$, we have that $\tilde{m}(x_i) = u(x_i)$ and $\tilde{k}(x_i, x_i) = 0$. If observing noisy data $U_i = U(x_i) + \varepsilon_i$ with $(\varepsilon_1, ..., \varepsilon_n)^T \sim \mathcal{N}(0, \sigma^2 I_n)$ independent from $U$, one replaces $k(X, X)$ with $k(X, X) + \sigma^2 I_n$ above and leaves the terms $k(X, x)$ unchanged. This amounts to applying Tikhonov regularization on $k(X, X)$ and may also be used to approximate (5) and (6) when $k(X, X)$ is ill-conditioned.
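The Kriging equations (5) and (6), together with the noisy variant, can be sketched as follows. This is a minimal illustration under assumed choices (one-dimensional inputs, squared exponential kernel, zero prior mean by default); the names `sq_exp` and `kriging` are hypothetical and not from the article.

```python
import numpy as np

def sq_exp(x, y, length_scale=0.3):
    """Squared exponential kernel on the real line (illustrative choice)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / length_scale**2)

def kriging(kernel, X, u_obs, x_new, noise_var=0.0, mean=None):
    """Posterior mean (5) and covariance (6) evaluated at the points x_new.

    noise_var = sigma^2 replaces k(X, X) by k(X, X) + sigma^2 I_n.
    """
    if mean is None:
        mean = lambda x: np.zeros_like(x)      # centered prior, m = 0
    K = kernel(X, X) + noise_var * np.eye(X.size)
    k_star = kernel(X, x_new)                  # column j is k(X, x_new[j])
    L = np.linalg.cholesky(K)                  # K = L L^T
    # alpha = K^{-1} (u(X) - m(X)), computed with two solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, u_obs - mean(X)))
    V = np.linalg.solve(L, k_star)             # V^T V = k_star^T K^{-1} k_star
    post_mean = mean(x_new) + k_star.T @ alpha
    post_cov = kernel(x_new, x_new) - V.T @ V
    return post_mean, post_cov

X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
u_obs = np.sin(2.0 * np.pi * X)                # assumed test function

# Noiseless case, evaluated at the observation points themselves.
post_mean, post_cov = kriging(sq_exp, X, u_obs, X)

# Noisy case, evaluated on a finer grid.
x_new = np.linspace(0.0, 1.0, 11)
mean_noisy, cov_noisy = kriging(sq_exp, X, u_obs, x_new, noise_var=1e-2)
```

In the noiseless case, `post_mean` reproduces `u_obs` and the diagonal of `post_cov` vanishes up to round-off, which is the interpolation property stated above.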

Tuning covariance kernels. For discussions on general kernel construction and selection, we refer to [5]. Usually, a family of kernels $k_\theta$ indexed by $\theta \in \Theta \subset \mathbb{R}^q$ is first selected. The elements of $\theta$ are the hyperparameters of $k_\theta$. One may then try to find the value $\theta^*$ that best fits the observations, which corresponds to maximizing the marginal likelihood. It is the probability density of the Gaussian random vector $(U(x_1), ..., U(x_n))^T$ at point $(u(x_1), ..., u(x_n))^T$, see equation (8). Denote by $u_{obs} = (u(x_1), ..., u(x_n))^T$ the vector of observations at locations $X = (x_1, ..., x_n)$ and by $p(u_{obs} | \theta)$ the associated marginal likelihood at point $\theta$; we search for $\theta^*$ such that

$$\theta^* = \arg\max_{\theta \in \Theta} \ p(u_{obs} | \theta) \tag{7}$$

Explicitly, assuming that $m \equiv 0$, then $(U(x_1), ..., U(x_n))^T \sim \mathcal{N}(0, k_\theta(X, X))$ and

$$p(u_{obs} | \theta) = \frac{1}{(2\pi)^{n/2} \det k_\theta(X, X)^{1/2}} \, e^{-\frac{1}{2} u_{obs}^T k_\theta(X, X)^{-1} u_{obs}} \tag{8}$$

Set $L(\theta) := -2 \log p(u_{obs} | \theta) - n \log 2\pi$, then (7) is equivalent to

$$\theta^* = \arg\min_{\theta \in \Theta} \ L(\theta) \tag{9}$$

Problem (9) is better behaved numerically. From now on, we call $L(\theta)$ the negative log marginal likelihood and we have, for noiseless observations,

$$L(\theta) = u_{obs}^T k_\theta(X, X)^{-1} u_{obs} + \log \det k_\theta(X, X) \tag{10}$$

and for noisy observations with noise standard deviation $\sigma$,

$$L(\theta, \sigma^2) = u_{obs}^T (k_\theta(X, X) + \sigma^2 I_n)^{-1} u_{obs} + \log \det (k_\theta(X, X) + \sigma^2 I_n) \tag{11}$$

$\sigma$ can be interpreted as an additional hyperparameter and estimated through (9).
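A minimal sketch of how (11) may be evaluated and minimized is given below. It assumes a one-dimensional squared exponential kernel whose only hyperparameter is the length scale, a known noise variance and a plain grid search; the article does not prescribe any of these choices.

```python
import numpy as np

def sq_exp(x, y, length_scale):
    """Squared exponential kernel on the real line (illustrative choice)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / length_scale**2)

def neg_log_marginal_likelihood(length_scale, noise_var, X, u_obs):
    """L(theta, sigma^2) from equation (11), computed with a Cholesky factor."""
    K = sq_exp(X, X, length_scale) + noise_var * np.eye(X.size)
    L = np.linalg.cholesky(K)                   # K = L L^T
    w = np.linalg.solve(L, u_obs)               # w^T w = u_obs^T K^{-1} u_obs
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log det K
    return w @ w + log_det

rng = np.random.default_rng(2)
X = np.linspace(0.0, 1.0, 30)
noise_var = 0.1**2                              # assumed known here
u_obs = np.sin(2.0 * np.pi * X) + 0.1 * rng.standard_normal(X.size)

# Problem (9) solved by brute force over a grid of candidate length scales.
candidates = np.logspace(-2, 1, 40)
values = np.array(
    [neg_log_marginal_likelihood(l, noise_var, X, u_obs) for l in candidates]
)
best_length_scale = candidates[np.argmin(values)]
```

A gradient-based optimizer would usually replace the grid search when several hyperparameters are estimated jointly; the grid is used here only to keep the example dependency-free.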

Scattered data interpolation and the RKHS point of view. Kriging equations (5) and (6) can be encountered without resorting to GPs. Given a positive definite kernel $k$ defined on $D$, one may build a Reproducing Kernel Hilbert Space (RKHS) of functions defined on $D$ which we denote by $H_k$, see N.2. The inner product of $H_k$ verifies the so-called reproducing property:

$$\forall x, x' \in D, \quad \langle k(x, \cdot), k(x', \cdot) \rangle_{H_k} = k(x, x') \tag{12}$$

In the meshfree interpolation framework [30] [41], one may formulate the following constrained (interpolation) optimization problem

$$\min_{v \in H_k} \ ||v||_{H_k} \quad \text{s.t.} \quad v(x_i) = u(x_i) \quad \forall i \in \{1, ..., n\} \tag{13}$$

Solving (13) leads to the Kriging equation for $\tilde{m}$ in (5); the second equation (6) is what is called the power function in [41]. One may also show [30] that equation (5) can be summarized as

$$\tilde{m} = m + p_F(u - m) \tag{14}$$

where $F$ is the finite dimensional space defined as $F = \mathrm{Span}(k(x_1, \cdot), ..., k(x_n, \cdot)) \subset H_k$ and $p_F$ stands for the orthogonal projection operator on $F$ w.r.t. the inner product of $H_k$. In particular, when $m \equiv 0$, equation (14) amounts to $\tilde{m} = p_F(u)$. Likewise, equation (6) amounts to

$$\tilde{k}(x, \cdot) = P_{F^\perp}(k(x, \cdot)) \quad \text{and} \quad \tilde{k}(x, x) = ||P_{F^\perp} k(x, \cdot)||^2_{H_k} \leq ||k(x, \cdot)||^2_{H_k} = k(x, x) \tag{15}$$

One perk of this approach is that the Kriging mean can now be understood as an orthogonal projection over a finite dimensional deterministic space, which is reminiscent of Fourier series or Galerkin reconstruction approaches.
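The projection point of view can be illustrated numerically in the centered case $m \equiv 0$. The sketch below, written under assumed choices (squared exponential kernel, four interpolation points, an arbitrary target function), builds the interpolant as an element of $F$, computes its squared RKHS norm through the reproducing property (12), and checks the bound in (15).

```python
import numpy as np

def sq_exp(x, y, length_scale=0.3):
    """Squared exponential kernel, so that k(x, x) = 1 (illustrative choice)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / length_scale**2)

X = np.array([0.1, 0.4, 0.6, 0.9])        # interpolation points (assumed)
u_X = np.cos(3.0 * X)                     # values u(x_i) of an assumed target
K = sq_exp(X, X)

# The interpolant is v = sum_i coeffs[i] * k(x_i, .), an element of F.
coeffs = np.linalg.solve(K, u_X)

# By (12), ||v||^2_{H_k} = coeffs^T k(X, X) coeffs = u(X)^T k(X, X)^{-1} u(X).
rkhs_norm_sq = coeffs @ K @ coeffs

x_test = np.linspace(0.0, 1.0, 101)
k_star = sq_exp(X, x_test)                # column j is k(X, x_test[j])
interpolant = k_star.T @ coeffs           # Kriging mean (5) with m = 0

# Diagonal of (6): k~(x, x) = k(x, x) - k(X, x)^T k(X, X)^{-1} k(X, x).
power_sq = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
```

The array `power_sq` stays between 0 and $k(x, x) = 1$ up to round-off, in agreement with (15).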

2.4 Generalized functions

We refer to [42] and [43] for further details on generalized functions. In this whole subsection, D is an open set of R^d.

Definitions and properties. Endow $\mathcal{D}(D)$ with its usual LF-space topology, defined for example in [42]. We call generalized function any continuous linear form on $\mathcal{D}(D)$, i.e. any element of $\mathcal{D}(D)'$, the topological dual of $\mathcal{D}(D)$. We will rather denote it by $\mathcal{D}'(D)$ as in [42]. The topology of $\mathcal{D}(D)$ is such that $T \in \mathcal{D}'(D)$ if and only if for every compact set $K \subset D$,

$$\exists C_K > 0, \ \exists n_K \in \mathbb{N}, \ \forall \varphi \in \mathcal{D}(D) \ \text{s.t.} \ \mathrm{Supp}(\varphi) \subset K, \quad |T(\varphi)| \leq C_K \sum_{|k| \leq n_K} ||\partial^k \varphi||_\infty \tag{16}$$

Generalized functions are also called "distributions", a terminology we will only use when there is no risk of confusion with probability distributions. The duality bracket will be denoted by $\langle \cdot, \cdot \rangle$: for $\varphi \in \mathcal{D}(D)$ and $T \in \mathcal{D}'(D)$, we have $\langle T, \varphi \rangle = T(\varphi)$.

• Any function $f \in L^1_{loc}(D)$ can be injectively identified with a generalized function $T_f$ [42] defined as follows:

$$\forall \varphi \in \mathcal{D}(D), \quad \langle T_f, \varphi \rangle := \int_D f(x) \varphi(x) dx \tag{17}$$

The map $L^1_{loc}(D) \ni f \longmapsto T_f$ is linear and injective. Throughout this article, we will use the abusive notation $\langle T_f, \varphi \rangle = \langle f, \varphi \rangle$, as if $\langle \cdot, \cdot \rangle$ were the $L^2$ inner product.

• Any generalized function $T$ can be indefinitely differentiated [42] with the following definition (see N.6)

$$\partial^k T : \varphi \longmapsto \langle T, (-1)^{|k|} \partial^k \varphi \rangle \tag{18}$$

which coincides with the definition of weak derivatives when $T$ is a function that admits the corresponding weak derivatives [42].

In particular, (17) and (18) combined provide a flexible definition for the derivatives of any function $f \in L^1_{loc}(D)$ up to any order.
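As a worked illustration of (17) and (18), consider $f(x) = |x|$ on $D = \mathbb{R}$, whose derivative in the sense of generalized functions is the sign function. The sketch below checks numerically that $\langle \partial T_f, \varphi \rangle = -\langle f, \varphi' \rangle$ equals $\langle \mathrm{sign}, \varphi \rangle$ for one test function. The bump function, its shift by 0.3 and the quadrature grid are assumptions chosen for the example.

```python
import numpy as np

def bump(x):
    """Smooth test function supported in [-1, 1]: exp(-1 / (1 - x^2))."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def bump_prime(x):
    """Classical derivative of bump, also supported in [-1, 1]."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    xi = x[inside]
    out[inside] = np.exp(-1.0 / (1.0 - xi**2)) * (-2.0 * xi / (1.0 - xi**2) ** 2)
    return out

x = np.linspace(-2.0, 2.0, 400001)  # quadrature grid containing the support
dx = x[1] - x[0]
shift = 0.3                         # off-centre test function: phi(x) = bump(x - 0.3)

# Left-hand side, definition (18) with |k| = 1: <dT_f, phi> = -<f, phi'>.
lhs = -np.sum(np.abs(x) * bump_prime(x - shift)) * dx

# Right-hand side, identification (17) applied to the weak derivative sign(x).
rhs = np.sum(np.sign(x) * bump(x - shift)) * dx
```

The two Riemann sums agree up to the quadrature error, even though $|x|$ is not differentiable at the origin in the classical sense.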

Radon measures. In this paper, we call positive Radon measure any positive measure over $D$ that is Borel regular ([44], Def 1.9) and that has finite mass over any compact subset of $D$. Borel regularity is a standard regularity hypothesis from measure theory. We call real-valued Radon measure, or simply Radon measure, any linear combination of positive Radon measures. In [45], Chapter IX, it is proved that the space of Radon measures over $D$ is isomorphic to the space of continuous linear forms over $C_c(D)$, the space of compactly supported continuous functions on $D$ endowed with its usual LF-space topology described e.g. in [43]. The corresponding isomorphism is given by

$$\mu \longmapsto \begin{cases} C_c(D) \longrightarrow \mathbb{R} \\ f \longmapsto \int_D f(x) \mu(dx) \end{cases} \tag{19}$$

We have the following facts:

• Any signed measure that admits a density $f$ w.r.t. the Lebesgue measure such that $f \in L^1_{loc}(D)$ is a Radon measure ([43], p.217).

• Mimicking (17) and (19), any Radon measure can be injectively identified with a generalized function through the following identification [43]

$$\forall \varphi \in \mathcal{D}(D), \quad \langle \mu, \varphi \rangle := \int_D \varphi(x) \mu(dx) \tag{20}$$

In particular, Radon measures can be differentiated up to any order through equation (18).

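• For any Radon measure $\mu$, there is a unique pair $(\mu^+, \mu^-)$ of positive Radon measures such that $\mu = \mu^+ - \mu^-$ ([45], Chapter IX). We then define $|\mu|$ as

$$|\mu| := \mu^+ + \mu^- \tag{21}$$

• If $\mu$ and $\nu$ are two finite Radon measures over $\mathbb{R}^d$ (i.e. $\int_{\mathbb{R}^d} |\mu|(dx) < \infty$ and likewise for $\nu$), their convolution $\mu * \nu$ is defined as follows: let $\mathcal{B}(\mathbb{R}^d)$ be the Borel $\sigma$-algebra of $\mathbb{R}^d$, then

$$\forall A \in \mathcal{B}(\mathbb{R}^d), \quad (\mu * \nu)(A) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} 1_A(x + y) \mu(dx) \nu(dy) \tag{22}$$

and $\mu * \nu$ is also a Radon measure over $\mathbb{R}^d$. When $\mu$ and $\nu$ have densities $f_\mu$ and $f_\nu$, $\mu * \nu$ has the density $f_\mu * f_\nu$ defined by

$$(f_\mu * f_\nu)(x) = \int_{\mathbb{R}^d} f_\mu(y) f_\nu(x - y) dy$$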
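Definition (22) becomes fully explicit for finite combinations of Dirac masses on $\mathbb{R}$: if $\mu = \sum_i a_i \delta_{x_i}$ and $\nu = \sum_j b_j \delta_{y_j}$, then $\mu * \nu = \sum_{i,j} a_i b_j \delta_{x_i + y_j}$. The sketch below implements this special case; the dictionary representation and the helper names are illustrative assumptions.

```python
from collections import defaultdict

def convolve_atomic(mu, nu):
    """Convolution (22) of two atomic measures given as {location: weight}."""
    out = defaultdict(float)
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] += a * b  # mass a * b is placed at the point x + y
    return dict(out)

def measure_of(measure, indicator):
    """Mass that an atomic measure gives to the set A = {z : indicator(z)}."""
    return sum(w for z, w in measure.items() if indicator(z))

mu = {0.0: 0.5, 1.0: 0.5}   # probability measure with two atoms
nu = {0.0: 2.0, 1.0: -1.0}  # signed measure, so mu * nu is signed as well
conv = convolve_atomic(mu, nu)
# conv == {0.0: 1.0, 1.0: 0.5, 2.0: -0.5}

mass_half_line = measure_of(conv, lambda z: z >= 1.0)  # (mu * nu)([1, +inf)) = 0.0
```

Taking $A = \mathbb{R}$ in (22) shows that the total mass of $\mu * \nu$ is the product of the total masses of $\mu$ and $\nu$, which can be checked on `conv`.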

Remark 1. What is meant by the term "Radon measure" varies between authors. [44] calls Radon measure what we call positive Radon measure in this article. [45] proves that continuous linear forms over $C_c(D)$ are differences of Radon measures in the sense of the Radon measures defined in [44], but [45] never uses the term Radon measure, positive or not. Likewise, [43] calls positive Radon measure any positive linear form over $C_c(D)$ which, thanks to the proof from [45], reduces to Radon measures in the sense of [44].

Finite order generalized functions. Let $k$ be a non negative integer; we consider $C_c^k(D)$, the space of compactly supported functions of class $C^k$ endowed with its usual LF-space topology [42]. We denote by $C_c^k(D)'$ its topological dual. The topologies of $C_c^k(D)$ and $\mathcal{D}(D)$ are such that the canonical injection $\mathcal{D}(D) \rightarrow C_c^k(D)$ is continuous [43], which yields that $C_c^k(D)' \subset \mathcal{D}'(D)$: continuous linear forms over $C_c^k(D)$, when restricted to $\mathcal{D}(D)$, become continuous linear forms over $\mathcal{D}(D)$, i.e. generalized functions. We then have the following definitions and facts.

• Generalized functions $T \in \mathcal{D}'(D)$ that are restrictions of continuous linear forms over $C_c^k(D)$ are called generalized functions of order $k$. If $T$ is of order $k$ for some $k \in \mathbb{N}$, $T$ is said to be of finite order.

• $T \in \mathcal{D}'(D)$ is at most of order $n$ if in equation (16), the integer $n_K$ can always be taken to be equal to $n$, whatever the compact set $K$.

• Let $T$ be a generalized function of order $k$. Then [43] there exists a family of Radon measures $\{\mu_p\}_{|p| \leq k}$ over $D$ such that

$$T = \sum_{|p| \leq k} \partial^p \mu_p \tag{23}$$

where the equality in (23) holds in $\mathcal{D}'(D)$ and $C_c^k(D)'$. Note that we recover (20) when $k = 0$.

• Among the finite order generalized functions are those that are compactly supported, i.e. those for which the measures $\mu_p$ such that $T = \sum_{|p| \leq k} \partial^p \mu_p$ all have compact support. One property is that one can define the Fourier transform of any compactly supported generalized function [42].

Convolution with generalized functions Let $k$ be a non negative integer. As above, we consider $C_c^k(\mathbb{R}^d)$ endowed with its usual topology. Let $f \in C_c^k(\mathbb{R}^d)$ and $T \in C_c^k(\mathbb{R}^d)'$. Denote by $\tau_x f$ the function $y \mapsto f(y - x)$ and by $\check{f}$ the function $y \mapsto f(-y)$.

Then [43] one may define the convolution between $T$ and $f$ by

$$T * f : x \mapsto \langle T, (\tau_{-x} f)^{\vee} \rangle \tag{24}$$

where $g^{\vee} = \check{g}$, so that $(\tau_{-x} f)^{\vee}(y) = f(x - y)$. $T * f$ is a function in the classical sense, i.e. defined pointwise. When $T$ lies in $L^1_{loc}(D)$, equation (24) reduces to the usual convolution of functions

$$(T * f)(x) = \int_{\mathbb{R}^d} T(y)\, f(x - y)\,dy$$

through the identification defined in equation (17). More general definitions of generalized function convolution are available [43] but this one is sufficient for our use.
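The definition (24) can be sketched in code for $d = 1$, representing a generalized function as a Python callable acting on functions. This is only an illustration under stated assumptions: the helper names (`dirac`, `dirac_prime`, `convolve`) are hypothetical, the Gaussian stands in for a compactly supported test function, and the derivative in `dirac_prime` is approximated by a finite difference.

```python
import math


def dirac(a):
    """Dirac mass at a, seen as the linear form phi -> phi(a) (order 0)."""
    return lambda phi: phi(a)


def dirac_prime(a, eps=1e-5):
    """Derivative of the Dirac mass at a: phi -> -phi'(a) (order 1).

    phi'(a) is approximated by a central finite difference (an assumption of this sketch).
    """
    return lambda phi: -(phi(a + eps) - phi(a - eps)) / (2.0 * eps)


def convolve(T, f):
    """Return x -> <T, y -> f(x - y)>, i.e. formula (24) for a 1D linear form T."""
    return lambda x: T(lambda y: f(x - y))


f = lambda y: math.exp(-y * y)             # smooth stand-in for a test function
shifted = convolve(dirac(0.3), f)          # (delta_a * f)(x) = f(x - a)
derived = convolve(dirac_prime(0.0), f)    # (delta' * f)(x) = f'(x)
```

Convolving with a Dirac mass translates $f$, and convolving with its derivative differentiates $f$; both results are ordinary functions defined pointwise, as stated above.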

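Tensor product of generalized functions For two generalized functions $T_1 \in \mathcal{D}'(D_1)$ and $T_2 \in \mathcal{D}'(D_2)$, $T_1 \otimes T_2 \in \mathcal{D}'(D_1 \times D_2)$ denotes their tensor product [43], which is uniquely determined by the following tensor property:

$$\forall \varphi_1 \in \mathcal{D}(D_1),\ \forall \varphi_2 \in \mathcal{D}(D_2),\quad \langle T_1 \otimes T_2, \varphi_1 \otimes \varphi_2 \rangle = \langle T_1, \varphi_1 \rangle \times \langle T_2, \varphi_2 \rangle \tag{25}$$

$T_1 \otimes T_2$ reduces to the tensor product of functions when $T_1$ and $T_2$ are functions through the identification of equation (17), and to the product measure when $T_1$ and $T_2$ are Radon measures through (20).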

3 Stochastic processes under linear differential constraints

One may wish to force the trajectories of a stochastic process $U = (U(x))_{x \in D}$ to verify linear constraints, i.e. to lie in the kernel of some linear operator. This is a priori an ambitious task as the trajectories of $U$ form a vast set of functions. However, if $U$ is a second order stochastic process (i.e. $\forall x \in D$, $\mathrm{Var}(U(x)) < +\infty$), then in many cases linear constraints on the trajectories of $U$ can be completely translated as linear constraints on the covariance kernel of $U$. In particular, these new linear constraints are imposed on a much smaller set of accessible "explicit" functions. Overall, the resulting constraints on the covariance kernel of $U$ are much easier to handle than the constraints on the trajectories of $U$. This idea was thoroughly explored in [1], where different general frameworks were studied in order to formulate mathematical results on linearly constrained stochastic processes. In Proposition 1, we recall a particular result from [1] that was then applied to the stationary heat equation in the same article.

Denote by $F(D, \mathbb{R})$ the space of real-valued functions defined on $D$. Proposition 1 is based on the so-called Loève isometry [30] between $L(U)$ and $L^2(\mathbb{P})$ (see N.3 for notations), which in turn leads to the following result.

Proposition 1 (Trajectories of GPs under linear constraints [1]). Let $(U(x))_{x \in D} \sim GP(0, k)$ be a centered GP. Denote for all $x \in D$ the function $k_x : y \mapsto k(x, y)$. Let $E$ be a real vector space of functions defined on $D$ that contains the trajectories of $U$ almost surely and $T : E \to F(D, \mathbb{R})$ be a linear operator. Suppose that for all $x \in D$, $T(U)(x) \in L(U)$. Then there exists a unique linear operator $\mathcal{T} : H_k \to F(D, \mathbb{R})$ such that for all $x, x' \in D$,

$$\mathbb{E}[T(U)(x)\,U(x')] = \mathcal{T}(k_{x'})(x) \quad\text{and}\quad \forall x \in D,\ \forall h_n \xrightarrow{H_k} h,\ \mathcal{T}(h_n)(x) \to \mathcal{T}(h)(x).$$

Moreover, the following statements are equivalent:

(i) $\mathbb{P}(\{\omega \in \Omega : T(U_\omega) = 0\}) = 1$

(ii) $\forall x \in D$, $\mathcal{T}(k_x) = 0$

(iii) $\mathcal{T}(H_k) = \{0\}$

This theorem can be applied when $T$ is a differential operator as discussed in [1].
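To make statement (ii) concrete, here is a minimal sketch on an elementary operator chosen for illustration (it is not the differential case treated in [1]): $T(f)(x) = f(x) - f(-x)$ on $D = \mathbb{R}$, whose kernel is the set of even functions. A direct computation gives $\mathbb{E}[T(U)(x)U(x')] = k(x, x') - k(-x, x')$, so (ii) asks that every $k_{x'}$ be even. The base kernel, its length scale and the grid are assumptions of the example.

```python
import math


def k_base(x, y, ell=0.7):
    """Squared-exponential kernel (assumed base kernel for this sketch)."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))


def k_even(x, y):
    """Symmetrized kernel: covariance of the even part (U(x) + U(-x)) / 2 of U ~ GP(0, k_base)."""
    return 0.25 * (k_base(x, y) + k_base(-x, y) + k_base(x, -y) + k_base(-x, -y))


def T(f):
    """Linear operator T(f)(x) = f(x) - f(-x); T(f) = 0 iff f is even."""
    return lambda x: f(x) - f(-x)


grid = [i / 10.0 for i in range(-20, 21)]
# Largest value of |T(k_x)(y)| over the grid, for each kernel.
defect_even = max(abs(T(lambda y: k_even(x, y))(y)) for x in grid for y in grid)
defect_base = max(abs(T(lambda y: k_base(x, y))(y)) for x in grid for y in grid)
```

The symmetrized kernel satisfies the constraint up to rounding, while the base kernel does not; by Proposition 1, a GP with kernel `k_even` therefore has even trajectories almost surely.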

However, in Proposition 1, the differential operator $T$ of order $n$ has to be valued in the space of (classical, pointwise defined) functions $F(D, \mathbb{R})$; in particular for $u \in E$, the function $T(u)$ has to be defined pointwise in order to use the Loève isometry. To summarize, in all generality the derivatives in $T$ have to be understood in a classical sense and $E$ has to be contained in $D^n(D)$, the space of $n$ times differentiable functions on $D$. Requiring that $E \subset D^n(D)$ is a very strong assumption w.r.t. the trajectories of $U$; furthermore, this is not compliant with the usual way of studying PDEs where derivatives are understood in a weaker sense. We present in Proposition 2 an adaptation of Proposition 1 where we make use of the distributional definition of derivatives and relax the assumptions made on $U$ and its trajectories. In this proposition, $(U(x))_{x \in D}$ is not supposed Gaussian and is only required to be second order. We refer to the notation paragraphs N.5 and N.6.

Proposition 2 (Trajectories of stochastic processes under linear differential constraints). Let $D \subset \mathbb{R}^d$ be an open set and let $T = \sum_{|k| \le n} a_k(x)\,\partial^k$ be a linear differential operator with coefficients $a_k \in C^{|k|}(D)$. Let $U = (U(x))_{x \in D}$ be a second order stochastic process with mean function $m(x)$ and covariance kernel $k(x, x')$. For all $x \in D$, denote $k_x : y \mapsto k(x, y)$. Suppose that its mean function $m$ lies in $L^1_{loc}(D)$ as well as its standard deviation function $\sigma : x \mapsto \sqrt{k(x, x)}$.

1) Then on a set of probability 1, the trajectories of $U$ lie in $L^1_{loc}(D)$ as well as the functions $k_x$ for all $x \in D$.

2) Suppose that $T(m) = 0$ in the sense of distributions. Then the following statements are equivalent:

(i) $\mathbb{P}(T(U) = 0 \text{ in the sense of distributions}) = 1$

(ii) $\forall x \in D$, $T(k_x) = 0$ in the sense of distributions.

Here we write down precisely what we mean by (i) and (ii). Denote by $T^*$ the formal adjoint of $T$, defined by $T^* u = \sum_{|k| \le n} (-1)^{|k|}\, \partial^k (a_k(x)\, u)$. By (i), we mean that

$$\exists A \in \mathcal{F},\ \mathbb{P}(A) = 1,\ \forall \omega \in A,\ \forall \varphi \in \mathcal{D}(D),\quad \langle U_\omega, T^*\varphi \rangle = \int_D U_\omega(x)\, T^*\varphi(x)\,dx = 0 \tag{26}$$

Similarly, (ii) means that

$$\forall x \in D,\ \forall \varphi \in \mathcal{D}(D),\quad \langle k_x, T^*\varphi \rangle = \int_D k_x(y)\, T^*\varphi(y)\,dy = 0 \tag{27}$$

This definition can be found e.g. in [46]. The fact that the functions $x \mapsto U_\omega(x)$ and $y \mapsto k_x(y)$ lie in $L^1_{loc}(D)$ ensures the existence of the integrals in equations (26) and (27) (see point 2 of the proof of Proposition 2) as well as the continuity of the associated linear forms over $\mathcal{D}(D)$, following the definition (17). In every case, the term "in the sense of distributions" can be replaced by "in $\mathcal{D}'(D)$": stating that $T(f) = 0$ in the sense of distributions means that $T(f)$, seen as an element of $\mathcal{D}'(D)$, is equal to the null generalized function $0_{\mathcal{D}'(D)} : \varphi \mapsto 0$.
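Condition (27) can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the wave operator of section 4: it takes the first order transport operator $T = \partial_t + c\,\partial_x$ on $D = \mathbb{R}^2$, whose formal adjoint is $T^*\varphi = -\partial_t\varphi - c\,\partial_x\varphi$, and the kernel $k(z, z') = \exp(-|(x - ct) - (x' - ct')|)$. Each $k_{z'}$ is not differentiable across the line $x - ct = x' - ct'$, so $T(k_{z'}) = 0$ only makes sense weakly. The test function, the quadrature rule and all names are choices made for the example.

```python
import math

C = 1.0  # transport speed (assumed value)


def bump(x, t):
    """Test function phi in D(R^2): smooth, supported in the closed unit disk."""
    u = 1.0 - x * x - t * t
    return math.exp(-1.0 / u) if u > 0.0 else 0.0


def adjoint_on_bump(x, t):
    """T* phi = -(d_t phi + C d_x phi) for T = d_t + C d_x, computed in closed form."""
    u = 1.0 - x * x - t * t
    if u <= 0.0:
        return 0.0
    return bump(x, t) * 2.0 * (t + C * x) / (u * u)


def k_transport(z, zp):
    """Non-differentiable kernel depending only on x - C t: each k_z solves T f = 0 weakly."""
    return math.exp(-abs((z[0] - C * z[1]) - (zp[0] - C * zp[1])))


def k_wrong(z, zp):
    """Same profile, but transported in the opposite direction (depends on x + C t)."""
    return math.exp(-abs((z[0] + C * z[1]) - (zp[0] + C * zp[1])))


def pairing(kernel, zp, n=200):
    """Midpoint-rule approximation of <k_zp, T* phi> over the square [-1, 1]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            t = -1.0 + (j + 0.5) * h
            total += kernel((x, t), zp) * adjoint_on_bump(x, t)
    return total * h * h


zp = (0.3, 0.1)
weak_residual = pairing(k_transport, zp)  # close to 0: (27) holds for this phi
wrong_residual = pairing(k_wrong, zp)     # clearly non-zero: (27) fails
```

A single test function does not prove (27), which quantifies over all of $\mathcal{D}(D)$; the point is only to show what the pairing $\langle k_x, T^*\varphi \rangle$ computes and that no derivative of the kernel is ever needed.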

Proof. Suppose first that $U$ is centered, i.e. $m \equiv 0$.

1) We begin by showing that the trajectories of $U$ almost surely lie in $L^1_{loc}(D)$. Note first that thanks to the Cauchy-Schwarz inequality, $\mathbb{E}[|U(x)|] \le \sigma(x)$. Now, let $(K_n)_{n \in \mathbb{N}}$ be an increasing sequence of compact subsets of $D$ such that $\bigcup_{n \in \mathbb{N}} K_n = D$. Then for any $n \in \mathbb{N}$,

$$\mathbb{E}\Big[ \int_{K_n} |U(x)|\,dx \Big] = \int_{K_n} \mathbb{E}[|U(x)|]\,dx \le \int_{K_n} \sigma(x)\,dx < +\infty \tag{28}$$

since $\sigma \in L^1_{loc}(D)$. Using the property that "$\mathbb{E}[|X|] < +\infty \implies |X| < +\infty$ almost surely", this yields a set $B_n \subset \Omega$ of probability 1 such that the random variable $\omega \mapsto \int_{K_n} |U_\omega(x)|\,dx$ takes finite values over $B_n$. Consider now the set $B = \bigcap_{n \in \mathbb{N}} B_n$, which remains of probability 1. For every compact subset $K \subset D$, there exists an integer $n_K$ such that $K \subset K_{n_K}$ and thus for all $\omega \in B$,

$$\int_K |U_\omega(x)|\,dx \le \int_{K_{n_K}} |U_\omega(x)|\,dx < +\infty$$

which shows that the trajectories of $U$ lie in $L^1_{loc}(D)$ almost surely.

Now, we check that for all $x \in D$, $k_x \in L^1_{loc}(D)$: for any compact set $K$, since $\sigma \in L^1_{loc}(D)$ and because of (3),

$$\int_K |k_x(y)|\,dy = \int_K |k(x, y)|\,dy \le \sigma(x) \int_K \sigma(y)\,dy < +\infty$$

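2) Let us check in advance that whatever $f \in L^1_{loc}(D)$, the map $T(f) : \varphi \mapsto \langle f, T^*\varphi \rangle$ is a continuous linear form over $\mathcal{D}(D)$. Since $a_k \in C^{|k|}(D)$, we can apply Leibniz' rule on $T^*\varphi = \sum_{|k| \le n} (-1)^{|k|}\, \partial^k(a_k \varphi)$. This yields a family $\{f_k\}_{|k| \le n}$ of continuous functions over $D$ such that

$$\forall \varphi \in \mathcal{D}(D),\ \forall x \in D,\quad T^*\varphi(x) = \sum_{|k| \le n} f_k(x)\, \partial^k \varphi(x) \tag{29}$$

For all $f \in L^1_{loc}(D)$, for every compact set $K \subset D$ and for all $\varphi \in \mathcal{D}(D)$ such that $\mathrm{Supp}(\varphi) \subset K$, (29) yields

$$|\langle f, T^*\varphi \rangle| \le \int_D |f(x)|\,|T^*\varphi(x)|\,dx \le \Big( \int_K |f(x)|\,dx \times \max_{|k| \le n} \sup_{x \in K} |f_k(x)| \Big) \sum_{|k| \le n} \|\partial^k \varphi\|_\infty < +\infty \tag{30}$$

This proves that $T(f) : \varphi \mapsto \langle f, T^*\varphi \rangle$ is a continuous linear form over $\mathcal{D}(D)$ (eq. (16)).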
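As a worked instance of (29), assume $d = 1$ and $n = 1$ (a choice made here for illustration), with $T = a_0 + a_1\,\partial_x$, $a_0 \in C^0(D)$ and $a_1 \in C^1(D)$. Leibniz' rule gives

$$T^*\varphi = a_0\,\varphi - \partial_x(a_1\,\varphi) = (a_0 - a_1')\,\varphi - a_1\,\varphi', \qquad\text{so that}\qquad f_0 = a_0 - a_1', \quad f_1 = -a_1 .$$

Both $f_0$ and $f_1$ are continuous precisely because $a_1$ is of class $C^1$, which is the role played by the regularity assumption on the coefficients.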

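(i) $\implies$ (ii): Suppose (i). Let $\varphi \in \mathcal{D}(D)$. There exists a set $A \subset \Omega$ such that $\mathbb{P}(A) = 1$ and such that for all $\omega \in A$,

$$\int_D U_\omega(x)\, T^*\varphi(x)\,dx = 0$$

Multiplying the equation above by $U_\omega(x')$, taking the expectation and formally permuting (for now) the integral and the expectation, we obtain

$$0 = \mathbb{E}\Big[ U(x') \int_D U(x)\, T^*\varphi(x)\,dx \Big] = \int_D T^*\varphi(x)\, \mathbb{E}[U(x) U(x')]\,dx = \int_D T^*\varphi(x)\, k(x, x')\,dx = \langle k_{x'}, T^*\varphi \rangle$$

The integral-expectation permutation is justified by writing down the expectation as an integral and using Fubini's theorem, checking that the quantity below is finite:

$$\mathbb{E}\Big[ \int_D |U(x') U(x)\, T^*\varphi(x)|\,dx \Big] = \int_D |T^*\varphi(x)|\, \mathbb{E}[|U(x) U(x')|]\,dx \quad \text{(Tonelli)}$$

$$\le \int_D |T^*\varphi(x)|\, \mathbb{E}[U(x)^2]^{1/2}\, \mathbb{E}[U(x')^2]^{1/2}\,dx \le \sigma(x') \int_D |T^*\varphi(x)|\, \sigma(x)\,dx < +\infty$$

The last integral is finite because of (30) and $\sigma \in L^1_{loc}(D)$. Thus, $\forall x \in D$, $\forall \varphi \in \mathcal{D}(D)$, $\langle k_x, T^*\varphi \rangle = 0$, which proves that (i) $\implies$ (ii).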

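(ii) $\implies$ (i): Suppose (ii). Let $\varphi \in \mathcal{D}(D)$; we have $\langle k_{x'}, T^*\varphi \rangle = 0$. Multiplying this by $T^*\varphi(x')$ and integrating w.r.t. $x'$ yields

$$0 = \int_D T^*\varphi(x') \int_D T^*\varphi(x)\, k(x, x')\,dx\,dx' = \int_D \int_D T^*\varphi(x)\, T^*\varphi(x')\, \mathbb{E}[U(x) U(x')]\,dx\,dx'$$

Permuting formally the expectation and the integrals (justified in equation (31)) yields

$$0 = \int_D \int_D T^*\varphi(x)\, T^*\varphi(x')\, \mathbb{E}[U(x) U(x')]\,dx\,dx' = \mathbb{E}\Big[ \Big( \int_D T^*\varphi(x)\, U(x)\,dx \Big)^2 \Big] = \mathbb{E}[\langle U, T^*\varphi \rangle^2]$$

and thus $\langle U, T^*\varphi \rangle = 0$ a.s.: there exists $A_\varphi \in \mathcal{F}$ with $\mathbb{P}(A_\varphi) = 1$ such that $\forall \omega \in A_\varphi$, $\langle U_\omega, T^*\varphi \rangle = 0$. We justify the expectation-integral permutation with the computation below:

$$\int_D \int_D |T^*\varphi(x)\, T^*\varphi(x')|\, \mathbb{E}[|U(x) U(x')|]\,dx\,dx' \le \int_D \int_D |T^*\varphi(x)\, T^*\varphi(x')|\, \sigma(x)\, \sigma(x')\,dx\,dx' \le \Big( \int_D |T^*\varphi(x)|\, \sigma(x)\,dx \Big)^2 < +\infty \tag{31}$$

The last integral is finite because of (30) and $\sigma \in L^1_{loc}(D)$.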

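This does not finish the proof as we need to find a set $A$ with $\mathbb{P}(A) = 1$, independently from $\varphi$, such that $\forall \omega \in A$, $\langle U_\omega, T^*\varphi \rangle = 0$. For this we use the fact that $\mathcal{D}(D)$ is a separable topological space, which we prove at the end of this proof. Let $F \subset \mathcal{D}(D)$ be a countable dense subset of $\mathcal{D}(D)$, let $A := B \cap \bigcap_{\varphi \in F} A_\varphi$ and let $\omega \in A$. As a countable intersection of events of probability 1, $A$ has probability 1. Since $U_\omega \in L^1_{loc}(D)$, (30) shows that the map $L_\omega : \varphi \mapsto \langle U_\omega, T^*\varphi \rangle$ is a continuous linear form on $\mathcal{D}(D)$. The continuity of $L_\omega$ implies that $L_\omega(F)$ is a dense subset of $L_\omega(\mathcal{D}(D))$ [47]. But $L_\omega(F) = \{0\}$ and therefore $L_\omega(\mathcal{D}(D)) = L_\omega(F) = \{0\}$, which shows that

$$\forall \omega \in A,\ \forall \varphi \in \mathcal{D}(D),\quad \langle U_\omega, T^*\varphi \rangle = L_\omega(\varphi) = 0$$

Since $\mathbb{P}(A) = 1$, this shows that (ii) $\implies$ (i).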

When $U$ is not centered, consider the centered stochastic process $V$ defined by $V(x) = U(x) - m(x)$, for which the above proof can be applied. Since $T$ is linear and $m$ is supposed to verify $T(m) = 0$ in the sense of distributions, the probabilistic events $\{T(U) = 0 \text{ in the sense of distributions}\}$ and $\{T(V) = 0 \text{ in the sense of distributions}\}$ coincide and thus have the same probability. Finally, $U$ and $V$ have the same covariance kernel $k(x, x')$. Thus,

$$\mathbb{P}(T(U) = 0 \text{ in the sense of distributions}) = 1 \iff \mathbb{P}(T(V) = 0 \text{ in the sense of distributions}) = 1 \iff \forall x \in D,\ T(k_x) = 0$$

which finishes the proof in the general case.

Proof that $\mathcal{D}(D)$ is separable: $\mathcal{D}(D)$ is an LF-space as the inductive limit of the Fréchet spaces $\mathcal{D}_{K_i}(D) := \{\varphi \in C^\infty(D) : \mathrm{Supp}(\varphi) \subset K_i\}$, $i \in \mathbb{N}$, where $K_1 \subset K_2 \subset \dots$ are compact subsets of $D$ such that $\bigcup_i K_i = D$ ([43], p.131-133). As such, $\mathcal{D}(D)$ is separable iff $\mathcal{D}_{K_i}(D)$ is separable for all $i \in \mathbb{N}$ [48], which we now show. The Fréchet topology of $\mathcal{D}_{K_i}(D)$ is the one induced by the usual Fréchet topology of $C^\infty(D)$ when $\mathcal{D}_{K_i}(D)$ is seen as a subspace of $C^\infty(D)$ ([42], section 1.46). As a Fréchet space, $C^\infty(D)$ is metrizable. But $C^\infty(D)$ is also a Montel space ([43], Prop 34.4): as a metrizable Montel space, it is automatically separable ([49], p.195). Thus $\mathcal{D}_{K_i}(D)$ is also separable as a subspace of the separable metric space $C^\infty(D)$.

Remark 2. Distributional solutions are the weakest types of solutions for PDEs. In general, additional regularity conditions have to be imposed to obtain physically realistic solutions, such as Sobolev regularity or entropy conditions as for non linear hyperbolic PDEs [50]. However, every step in the above proof remains valid when replacing $\varphi \in \mathcal{D}(D)$ with $\varphi \in C_c^n(D)$. Although we have not clarified the usual topology of $C_c^n(D)$ in this article, we state that this is enough to show that the equalities stated in Proposition 2 also hold in $C_c^n(D)'$, the space of finite order generalized functions of order $n$, rather than just in $\mathcal{D}'(D)$. $C_c^n(D)'$ is a smaller space than $\mathcal{D}'(D)$, though less used in PDE theory than $\mathcal{D}'(D)$.

Remark 3. We gave here an elementary proof that

$$\sigma \in L^1_{loc}(D) \implies \text{the trajectories of } U \text{ lie in } L^1_{loc}(D) \text{ almost surely} \tag{32}$$

Similar results on Sobolev regularity of the trajectories of second order stochastic processes are scarce in the literature. Some are available in [51], though the result (32) is actually not covered in [51], where additional continuity hypotheses would be required in the left hand side of (32) to apply results from [51].

We partially recover Proposition 1 when the trajectories of $U$ lie in $C^n(D)$ and $k \in C^{n,n}(D \times D)$. Indeed, in that case one can show that if $T = \sum_{|k| \le n} a_k(x)\,\partial^k$, then we simply have $\mathcal{T} = T$, where $\mathcal{T}$ denotes the operator on $H_k$ given by Proposition 1. Additionally, $T(U_\omega)$ and $T(k_x)$ both lie in $F(D, \mathbb{R}) \cap L^1_{loc}(D)$, and for any function $g$ that lies in $L^1_{loc}(D)$, we have

$$g = 0 \text{ in the sense of distributions} \iff g = 0 \text{ a.e.} \tag{33}$$

Equation (33) is just another way of saying that the linear map $f \mapsto T_f$ given in (17) is injective. In that framework, Proposition 1 states that

$$\forall x \in D,\ T(k_x) = 0 \iff \mathbb{P}(T(U) = 0) = 1 \tag{34}$$

where the function equalities of the form $T(f) = 0$ in (34) are valid everywhere on $D$.

Following equation (33), Proposition 2 states a slightly weaker result, namely that

$$\forall x \in D,\ T(k_x) = 0 \text{ a.e.} \iff \mathbb{P}(T(U) = 0 \text{ a.e.}) = 1 \tag{35}$$

We can now state the following corollary, which draws the consequences of Proposition 2 when applied to GPR.

Proposition 3 (Heredity of Proposition 2 to conditioned GPs). Let $D$ and $T$ be as defined in Proposition 2. Let $(U(x))_{x \in D} \sim GP(m, k)$ be a Gaussian process that verifies the hypotheses of Proposition 2. Suppose also that

$$T(m) = 0 \quad\text{and}\quad \forall x \in D,\ T(k_x) = 0 \quad\text{both in the sense of distributions} \tag{36}$$

(i) Then whatever the integer $p$, the vector $u = (u_1, ..., u_p)^T \in \mathbb{R}^p$ and the vector $X = (x_1, ..., x_p)^T \in D^p$ such that $k(X, X)$ is invertible, the Kriging mean $\tilde{m}(x)$ and the Kriging standard deviation function $\tilde{\sigma}(x) = \sqrt{\tilde{k}(x, x)}$ both lie in $L^1_{loc}(D)$, and we have

$$T(\tilde{m}) = 0 \quad\text{and}\quad \forall x \in D,\ T(\tilde{k}_x) = 0 \quad\text{both in the sense of distributions}$$

where $\tilde{m}$ and $\tilde{k}$ are defined in equations (5) and (6).

(ii) As such, the trajectories of the conditioned Gaussian process $(\tilde{U}(x))_{x \in D}$ defined by $\tilde{U}(x) = (U(x) \mid U(x_i) = u_i\ \forall i = 1, ..., p)$ are almost surely solutions of the equation $T(f) = 0$ in the sense of distributions:

$$\mathbb{P}(T(\tilde{U}) = 0 \text{ in the sense of distributions}) = 1$$

Proof. Note first that for all $x \in D$, $\tilde{k}(x, x) \le k(x, x)$, which is immediate from (15). Thus the function $\tilde{\sigma} : x \mapsto \sqrt{\tilde{k}(x, x)}$ also lies in $L^1_{loc}(D)$. Point (i) is then a direct consequence of the definition of $\tilde{m}$ and $\tilde{k}$ in equations (5) and (6), and the linearity of $T$. Proposition 2 can then be applied conjointly with (i), which yields point (ii) since the mean and covariance functions of the GP $\tilde{U}$ are $\tilde{m}$ and $\tilde{k}$ (see section 2.2).

Proposition 3 shows that when $U$ is a GP, the results of Proposition 2 are inherited by the conditioned posterior process $\tilde{U}$. One weak consequence of Proposition 3 is that if GPR is performed with a kernel $k$ that verifies point (ii) of Proposition 2, then the predictions provided by GPR are all solutions of the PDE $T(\tilde{m}) = 0$.
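The inheritance described above can be observed on a toy example. The sketch below is only an illustration under stated assumptions: it uses the transport operator $T = \partial_t + c\,\partial_x$ instead of the wave operator, a kernel that depends on $z = (x, t)$ only through $x - ct$, and the usual kriging mean of a centered prior, $\tilde{m}(z) = k(z, X)\,k(X, X)^{-1}u$, which we assume is the form taken by equation (5). The data, the helper `solve` and the finite-difference check are all hypothetical choices.

```python
import math

C = 2.0  # transport speed (assumed value)


def kernel(z, zp):
    """Squared-exponential kernel in the characteristic variable s = x - C t."""
    s, sp = z[0] - C * z[1], zp[0] - C * zp[1]
    return math.exp(-0.5 * (s - sp) ** 2)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


# Observation points z_i = (x_i, t_i) with distinct x - C t, and observed values u_i.
X = [(0.0, 0.0), (1.0, 0.2), (-0.5, 0.4)]
u = [1.0, -0.3, 0.5]
weights = solve([[kernel(a, b) for b in X] for a in X], u)


def kriging_mean(z):
    """Kriging mean of a centered GP: k(z, X) k(X, X)^{-1} u."""
    return sum(w * kernel(z, xi) for w, xi in zip(weights, X))


def transport_residual(z, h=1e-4):
    """Central finite-difference estimate of (d_t + C d_x) applied to the kriging mean."""
    x, t = z
    d_t = (kriging_mean((x, t + h)) - kriging_mean((x, t - h))) / (2.0 * h)
    d_x = (kriging_mean((x + h, t)) - kriging_mean((x - h, t))) / (2.0 * h)
    return d_t + C * d_x


residual = max(abs(transport_residual((0.1 * i, 0.05 * j))) for i in range(-5, 6) for j in range(6))
fit_error = max(abs(kriging_mean(xi) - ui) for xi, ui in zip(X, u))
```

Because the kriging mean is a linear combination of the functions $k_{x_i}$, it interpolates the data while satisfying the PDE everywhere, up to finite-difference error; no PDE residual was penalized during the fit.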

The goal of the next section is to apply this idea to a special case of the (3 dimensional) wave equation defined in eq. (37), by building an "explicit" positive definite kernel $k$ such that $\forall x' \in D$, $\Box k_{x'} = 0$ in the sense of distributions, where the box symbol $\Box$ classically denotes the linear wave operator a.k.a. the d'Alembert operator. With this new kernel, we will perform GPR on observations of a function that is solution to the wave equation and draw a number of related consequences.

4 Gaussian Processes and the 3 Dimensional Wave Equation

4.1 General Solution to the 3 Dimensional Wave Equation

Denote the 3D Laplace operator $\Delta = \partial^2_{xx} + \partial^2_{yy} + \partial^2_{zz}$ and the d'Alembert operator $\Box = \frac{1}{c^2}\partial^2_{tt} - \Delta$ with constant wave speed $c > 0$. We focus on the general initial value problem in the free space $\mathbb{R}^3$

$$\begin{cases} \Box w = 0 & \forall (x, t) \in \mathbb{R}^3 \times \mathbb{R}_+^* \\ w(x, 0) = u_0(x) & \forall x \in \mathbb{R}^3 \\ (\partial_t w)(x, 0) = v_0(x) & \forall x \in \mathbb{R}^3 \end{cases} \tag{37}$$

Throughout this paper, we will refer to $u_0$ as the initial position and $v_0$ as the initial speed. The problem (37) is a Cauchy problem with initial conditions (IC) $u_0$ and $v_0$. It admits a unique solution which can be extended to all times $t \in \mathbb{R}$, and is represented as follows [3]

$$w(x, t) = (F_t * v_0)(x) + (\dot{F}_t * u_0)(x) \quad \forall (x, t) \in \mathbb{R}^3 \times \mathbb{R} \tag{38}$$

where $F_t$ and $\dot{F}_t$ are known generalized functions. Actually, $F_t$ and $\dot{F}_t$ are better known through their Fourier transforms [3], as

$$\mathcal{F}(F_t)(\xi) = \frac{\sin(ct|\xi|)}{c|\xi|} \quad\text{and}\quad \mathcal{F}(\dot{F}_t)(\xi) = \cos(ct|\xi|) \tag{39}$$

where $|\xi|$ is the euclidean norm of $\xi \in \mathbb{R}^3$. Note that the relation $\dot{F}_t = \partial_t F_t$ can be directly deduced from (39). Additionally, the representation (38) is valid in any dimension as well as the Fourier formulas (39), see [3]. Finally, $F_t$ also corresponds to the Green's function of the wave equation [52].
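The deduction of $\dot{F}_t = \partial_t F_t$ from (39) is a one-line computation, spelled out here for convenience:

$$\partial_t\, \mathcal{F}(F_t)(\xi) = \partial_t\, \frac{\sin(ct|\xi|)}{c|\xi|} = \cos(ct|\xi|) = \mathcal{F}(\dot{F}_t)(\xi).$$

Differentiating once more gives $\partial^2_{tt}\, \mathcal{F}(F_t)(\xi) = -c^2|\xi|^2\, \mathcal{F}(F_t)(\xi)$, which is the equation $\Box F_t = 0$ written in the Fourier variable, assuming a Fourier convention for which $\mathcal{F}(\Delta f)(\xi) = -|\xi|^2 \mathcal{F}(f)(\xi)$. At $t = 0$, (39) gives $\mathcal{F}(F_0) = 0$ and $\mathcal{F}(\dot{F}_0) = 1$, which under the same convention is consistent with the initial condition $w(x, 0) = u_0(x)$ in (38).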

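In dimension 3, $F_t$ and $\dot{F}_t$ are compactly supported generalized functions of order 0 and 1 respectively. More explicitly, in dimension 3 $F_t$ and $\dot{F}_t$ are given by

$$F_t = \frac{\sigma_{c|t|}}{4\pi c^2 t} \quad\text{and}\quad \dot{F}_t = \partial_t F_t \tag{40}$$

where $\sigma_R$ is the surface measure of the sphere of center 0 and radius $R$. $\dot{F}_t = \partial_t F_t$ means that

$$\forall f \in C_0^1(\mathbb{R}^3),\quad \langle \dot{F}_t, f \rangle = \partial_t \langle F_t, f \rangle = \partial_t \int_{\mathbb{R}^3} f(x)\, F_t(dx)$$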
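Unfolding (40) in spherical coordinates makes the pairing with $F_t$ explicit. The short computation below is added for convenience; it uses $\sigma_R(dy) = R^2\, d\Omega$ on the sphere of radius $R = c|t|$, with $t \neq 0$ and $\gamma$ ranging over the unit sphere $S(0, 1)$:

$$\langle F_t, f \rangle = \frac{1}{4\pi c^2 t} \int_{|y| = c|t|} f(y)\, \sigma_{c|t|}(dy) = \frac{(c|t|)^2}{4\pi c^2 t} \int_{S(0,1)} f(c|t|\gamma)\, d\Omega = t \int_{S(0,1)} f(c|t|\gamma)\, \frac{d\Omega}{4\pi}.$$

Applied to $y \mapsto v_0(x - y)$, this gives $(F_t * v_0)(x) = t \int_{S(0,1)} v_0(x - c|t|\gamma)\, \frac{d\Omega}{4\pi}$, i.e. $t$ times the mean of $v_0$ over the sphere of center $x$ and radius $c|t|$.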

Suppose that $u_0 \in C^1(\mathbb{R}^3)$ and $v_0 \in C^0(\mathbb{R}^3)$; then $w$ as defined in (38) is a function in the classical sense [43], and in that case an explicit formula for such convolutions is recalled in equation (24) (yet one may actually make sense of (38) when $u_0$ and $v_0$ are only required to be generalized functions [43]). Combining formulas (38) and (40) leads to the Kirchhoff formula [3] (see N.7 for spherical coordinates notations):

$$w(x, t) = \int_{S(0,1)} \Big( t\, v_0(x - c|t|\gamma) + u_0(x - c|t|\gamma) - c|t|\, \gamma \cdot \nabla u_0(x - c|t|\gamma) \Big) \frac{d\Omega}{4\pi} \tag{41}$$
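Formula (41) lends itself to a direct numerical check. The sketch below is a minimal illustration, not the authors' implementation: it approximates the spherical integral with a product midpoint rule in $(\cos\theta, \phi)$ and compares the result with a plane-wave solution chosen for the test, $u_0(x) = \cos(k \cdot x)$ and $v_0 = 0$, for which $w(x, t) = \cos(k \cdot x)\cos(c|k|t)$. The function name `kirchhoff`, the quadrature sizes and the test values are assumptions.

```python
import math


def kirchhoff(u0, grad_u0, v0, x, t, c, n_mu=200, n_phi=64):
    """Approximate formula (41) with a product midpoint rule in (mu, phi), mu = cos(theta).

    In these variables d(Omega) = d(mu) d(phi), so every node has weight 1 / (n_mu * n_phi)
    once the 1 / (4 pi) normalisation is included.
    """
    r = c * abs(t)
    total = 0.0
    for i in range(n_mu):
        mu = -1.0 + (2.0 * i + 1.0) / n_mu
        rho = math.sqrt(1.0 - mu * mu)
        for j in range(n_phi):
            phi = 2.0 * math.pi * (j + 0.5) / n_phi
            gamma = (rho * math.cos(phi), rho * math.sin(phi), mu)
            y = tuple(xi - r * gi for xi, gi in zip(x, gamma))
            radial = sum(gi * di for gi, di in zip(gamma, grad_u0(y)))
            total += t * v0(y) + u0(y) - r * radial
    return total / (n_mu * n_phi)


# Plane-wave initial position (assumed test case): u0(x) = cos(k.x), v0 = 0,
# for which w(x, t) = cos(k.x) cos(c |k| t) solves (37).
K_VEC = (1.0, 2.0, 2.0)  # |k| = 3


def u0(y):
    return math.cos(sum(ki * yi for ki, yi in zip(K_VEC, y)))


def grad_u0(y):
    s = math.sin(sum(ki * yi for ki, yi in zip(K_VEC, y)))
    return tuple(-ki * s for ki in K_VEC)


def v0(y):
    return 0.0


x, t, c = (0.2, -0.1, 0.3), 0.4, 1.5
approx = kirchhoff(u0, grad_u0, v0, x, t, c)
exact = math.cos(sum(ki * xi for ki, xi in zip(K_VEC, x))) * math.cos(c * 3.0 * t)
```

The same routine can be used with the roles swapped ($u_0 = 0$ and $v_0(x) = \cos(k \cdot x)$), in which case the expected value is $\cos(k \cdot x)\sin(c|k|t)/(c|k|)$.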

4.2 Gaussian Process Modelling of the Solution

Suppose now that $u_0$ and $v_0$ are unknown, and only pointwise values of $w$ are observed. In a Bayesian approach, we model $u_0$ and $v_0$ as random functions and put a Gaussian process prior over $u_0$ and $v_0$ as in equation (4). More precisely, we make the following assumptions.

(A1) Suppose that the initial conditions $u_0$ and $v_0$ of Problem (37) are trajectories drawn from two independent Gaussian processes $U^0 \sim GP(0, k_u)$ and $V^0 \sim GP(0, k_v)$: $\exists \omega \in \Omega$, $\forall x \in \mathbb{R}^3$, $u_0(x) = U^0_\omega(x)$ and $v_0(x) = V^0_\omega(x)$.

(A2) Suppose that all trajectories of $U^0$ lie in $C^1(\mathbb{R}^3)$ and that those of $V^0$ lie in $C^0(\mathbb{R}^3)$ almost surely. A sufficient condition for this is given in [39], Thm 1.4.2. This theorem states that under mild technical assumptions, the paths of $(U(x))_{x \in D} \sim GP(0, k)$ lie in $C^l$ a.s. as soon as $k \in C^{2l}(D \times D)$, which we assume from now on.

We now analyse the consequences of these two assumptions. First, they imply that by solving (37), one obtains a time-space stochastic process $W(x, t)$ defined by

$$W(x, t) : \Omega \ni \omega \mapsto (F_t * V^0_\omega)(x) + (\dot{F}_t * U^0_\omega)(x) \tag{42}$$

Here, $V^0_\omega$ denotes the trajectory of $V^0$ at $\omega \in \Omega$ and likewise for $U^0_\omega$. In particular, thanks to assumption (A2), (42) defines a random variable for all $(x, t)$. Denote by $z = (x, t)$ the space-time variable and introduce the random variables

$$V(z) : \omega \mapsto (F_t * V^0_\omega)(x) \quad\text{and}\quad U(z) : \omega \mapsto (\dot{F}_t * U^0_\omega)(x) \tag{43}$$

that is, $W(z) = U(z) + V(z)$. We show in the next proposition that the stochastic processes $U$, $V$ and $W$ are GPs as well. In particular we describe their covariance kernels.

Proposition 4. Define the two functions

$$k_v^{\mathrm{wave}}(z, z') = [(F_t \otimes F_{t'}) * k_v](x, x') \quad\text{and}\quad k_u^{\mathrm{wave}}(z, z') = [(\dot{F}_t \otimes \dot{F}_{t'}) * k_u](x, x') \tag{44}$$

(i) Then $U = (U(z))_{z \in \mathbb{R}^3 \times \mathbb{R}}$ and $V = (V(z))_{z \in \mathbb{R}^3 \times \mathbb{R}}$ as defined in (43) are two independent centered GPs with covariance kernels $k_u^{\mathrm{wave}}$ and $k_v^{\mathrm{wave}}$ respectively. Consequently, $(W(z))_{z \in \mathbb{R}^3 \times \mathbb{R}}$ is a centered GP whose covariance kernel is given by

$$k_W(z, z') = k_v^{\mathrm{wave}}(z, z') + k_u^{\mathrm{wave}}(z, z') \tag{45}$$

(ii) Conversely, any centered second order stochastic process with covariance kernel $k_W$ has its sample paths solution of the 3 dimensional wave equation (37) almost surely.
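Formula (44) can be made concrete for $k_v^{\mathrm{wave}}$. Since pairing with $F_t$ amounts to $t$ times a mean over the sphere of radius $c|t|$ (this is how the first term of (41) arises from (40)), unfolding the tensor product gives $k_v^{\mathrm{wave}}(z, z') = t\,t' \iint k_v(x - c|t|\gamma,\, x' - c|t'|\gamma')\, \frac{d\Omega\, d\Omega'}{(4\pi)^2}$. The sketch below approximates this double spherical mean with a coarse equal-weight quadrature; the prior kernel `k_v`, the node counts, the wave speed and the helper names are assumptions made for illustration, and the expression above is our own unfolding of (44), not a formula quoted from the text.

```python
import math


def sphere_nodes(n_mu=12, n_phi=12):
    """Equal-weight quadrature nodes on the unit sphere (midpoint rule in cos(theta) and phi)."""
    nodes = []
    for i in range(n_mu):
        mu = -1.0 + (2.0 * i + 1.0) / n_mu
        rho = math.sqrt(1.0 - mu * mu)
        for j in range(n_phi):
            phi = 2.0 * math.pi * (j + 0.5) / n_phi
            nodes.append((rho * math.cos(phi), rho * math.sin(phi), mu))
    return nodes


NODES = sphere_nodes()


def k_v(x, xp, ell=1.0):
    """Squared-exponential kernel on R^3 (assumed prior kernel for the initial speed)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)) / (2.0 * ell ** 2))


def k_v_wave(z, zp, c=1.0):
    """Quadrature approximation of t t' times the double spherical mean of k_v."""
    (x, t), (xp, tp) = z, zp
    pts = [tuple(a - c * abs(t) * g for a, g in zip(x, gam)) for gam in NODES]
    pts_p = [tuple(a - c * abs(tp) * g for a, g in zip(xp, gam)) for gam in NODES]
    mean = sum(k_v(p, q) for p in pts for q in pts_p) / (len(pts) * len(pts_p))
    return t * tp * mean


z1 = ((0.0, 0.0, 0.0), 0.5)
z2 = ((0.3, -0.2, 0.1), 0.8)
k12, k21, k11 = k_v_wave(z1, z2), k_v_wave(z2, z1), k_v_wave(z1, z1)
```

The approximation is symmetric and non-negative on the diagonal, as a covariance kernel should be, and for small times $k_v^{\mathrm{wave}}(z, z')/(t\,t')$ approaches $k_v(x, x')$, in line with $V(z) \approx t\, V^0(x)$ near $t = 0$.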

Proof. (i): first we prove that $U$ and $V$ are GPs. Since $U^0$ and $V^0$ are GPs, $L(U^0)$ and $L(V^0)$ are only comprised of Gaussian random variables (see N.3).

To prove that $U$ and $V$ are Gaussian processes, we rely on the Kirchhoff formula (41), writing the integrals as limits of Riemann sums. We start with $V$, that is, we focus on the first term in Kirchhoff's formula (41). To show that $V$ is a Gaussian process, we only need to show that for any $z$, $V(z) \in L(V^0)$ as this will ensure the Gaussian process property. Since the trajectories of $V^0$ are continuous almost surely, there exists a sequence of numbers $a^n_k$ and points $y^n_k$ such that for almost any $\omega \in \Omega$,

$$V(z)(\omega) = (F_t * V^0_\omega)(x) = t \int_{S(0,1)} V^0(x - c|t|\gamma)(\omega)\, \frac{d\Omega}{4\pi}$$

$$= \frac{t}{4\pi} \int_0^{2\pi} \int_0^{\pi} V^0(x - c|t|\gamma(\theta, \phi))(\omega)\, \sin(\theta)\,d\theta\,d\phi = \lim_{n \to \infty} \sum_{k=1}^{n} a^n_k\, V^0(x - y^n_k)(\omega)$$

Thus $V(z)$ is an almost sure limit of elements of $L(V^0)$, which also implies convergence in law. But since $V^0$ is a Gaussian process, convergence in law implies the convergence of the moments of all order [39] and in particular the convergence in $L^2(\mathbb{P})$. Thus, $V(z) \in L(V^0)$ and $V$ is a Gaussian process. The convergence of the first moment implies that $V(z)$ is centered.

We apply the same reasoning to $U$, by applying the above steps to the second part of Kirchhoff's formula (41). One's ability to write out the integrals as a limit of Riemann sums is ensured when the trajectories of $U^0$ lie in $C^1(\mathbb{R}^3)$.

Finally, since $U^0$ and $V^0$ are independent, $L(U^0)$ and $L(V^0)$ are orthogonal in $L^2(\mathbb{P})$:

$$L(U^0) + L(V^0) = L(U^0) \overset{\perp}{\oplus} L(V^0)$$

Since $L(U) \subset L(U^0)$ and likewise for $V$, $U$ and $V$ are independent Gaussian processes (for Gaussian random variables, independence is equivalent to null covariance).