Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

(1)

arXiv:2012.01194v1 [math.NA] 2 Dec 2020

Deep learning based numerical approximation

algorithms for stochastic partial differential equations

and high-dimensional nonlinear filtering problems

Christian Beck

1

, Sebastian Becker

2

, Patrick Cheridito

3

,

Arnulf Jentzen

4,5

_{, and Ariel Neufeld}

6

1 _{Department of Mathematics, ETH Zurich,}

Switzerland, e-mail: [email protected]

2 _{RiskLab, Department of Mathematics, ETH Zurich,}

3 _{RiskLab, Department of Mathematics, ETH Zurich,}

4 _{Faculty of Mathematics and Computer Science, University of M¨}_unster

Germany, e-mail: [email protected]

5 _{Department of Mathematics, ETH Zurich}

6 _{Division of Mathematical Sciences, School of Physical and Mathematical Sciences,}

Nanyang Technological University, Singapore, e-mail: [email protected]

Thursday 3

rd

December, 2020

Abstract

In this article we introduce and study a deep learning based approximation al-gorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every real-ization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. We test the performance of the proposed ap-proximation algorithm in the case of stochastic heat equations with additive noise, stochastic heat equations with multiplicative noise, stochastic Black–Scholes equa-tions with multiplicative noise, and Zakai equaequa-tions from nonlinear filtering. In each of these SPDEs the proposed approximation algorithm produces accurate results with short run times in up to 50 space dimensions.

Keywords: stochastic partial differential equation, numerical method, artificial neural network, nonlinear filtering, Zakai equation

(2)

1 Introduction

Stochastic partial differential equations (SPDEs) are key ingredients in numerous models in engineering, finance, and natural sciences. For example, SPDEs commonly appear in models to price interest-rate based financial derivatives (cf., for example, (1.3) in Filipovi´c et al. [34] and Theorem 3.5 in Harms et al. [49]), to describe random surfaces appear-ing in surface growth models (cf., for example, (3) in Hairer [46] and (1) in Bl¨omker & Romito [15]), to describe the temporal dynamics in Euclidean quantum field theories (cf., for example, (1.1) in Mourrat & Weber [85]), to describe velocity fields in fully developed turbulent flows (cf., for example, (1.5) in Birnir [14] and (7) in Birnir [13]), and to describe the temporal development of the concentration of an unwanted (biological or chemical) contaminant in water (e.g., in the groundwater system, in a river, or in a water basin; cf., for example, (2.2) in Kallianpur & Xiong [63] and (1.1) in Kouritzin & Long [69]). Another prominent situation where SPDEs such as the Kushner equation (cf., for exam-ple, Kushner [75]) and the Zakai equation (cf., for examexam-ple, Zakai [105]) appear is in the case of nonlinear filtering problems where SPDEs describe the density of the state space of the considered system. In particular, we refer, e.g., to [16, 21, 25, 29, 36, 37] for filtering problems in financial engineering, we refer, e.g., to [18, 24, 94, 95, 98, 103] for filtering problems in chemical engineering, and we refer, e.g., to [28, 19, 20, 23, 33, 89] for filtering problems in weather forecasting. SPDEs arising in nonlinear filtering problems are usu-ally high-dimensional as the number of dimensions corresponds to the state space of the considered filtering problem. Most of the SPDEs in the above named applications cannot be solved explicitly and for about 30 years it has been an active field of research to design and study numerical algorithms which can approximatively compute solutions of SPDEs.

In order to be able to implement an approximation scheme for evolutionary type SPDEs on a computer both the time interval [0, T] as well as the infinite dimensional state space have to be discretized. Several types of temporal discretizations and spatial discretizations have been proposed and studied in the scientific literature. In particular, we refer, e.g., to [26, 41, 52, 96, 99, 104] for temporal discretizations based on the linear implicit Euler method, we refer, e.g., to [55, 60, 81, 87, 102] for temporal discretizations based on expo-nential Euler-type methods, we refer, e.g., to [50, 51, 96, 99] for temporal discretizations based on linear implicit Crank–Nicolson-type methods, we refer, e.g., to [1, 9, 12, 35, 44, 76] for temporal discretizations based on splitting up approximation methods, we refer, e.g., to [58, 62, 71, 92, 101] for temporal discretizations based on higher order approximation methods, we refer, e.g., to [17, 38, 64, 70, 80, 99, 104] for spatial discretizations based on finite elements methods, we refer, e.g., to [45, 83, 90, 93, 96, 100] for spatial discretizations based on finite differences methods, and we refer, e.g., to [39, 51, 68, 79, 88, 86] for spatial discretizations based on spectral Galerkin methods. Moreover, the recent article [106] em-ploys deep neural networks to approximately solve some one-dimensional SPDEs. We also refer to the overview articles [42, 59] and to the monographs [61, 72] for further references on numerical approximation methods for SPDEs.

(4)

In this article we introduce and study a deep learning based approximation algorithm for approximating solutions of possibly high-dimensional SPDEs. In the proposed approxi-mation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. The derivation of the proposed approximation scheme is inspired by the ideas in the articles [2, 3, 48] in which deep learning based algorithms for high-dimensional PDEs have been proposed and studied. We also refer, e.g., to [4, 11, 22, 30, 31, 32, 53, 54, 57, 91, 97] and the references therein for further articles on deep learning based approximation methods for PDEs. We test the performance of the approximation algorithm proposed in this article in the case of stochastic heat equations with additive noise (see Subsection 3.1 below), stochastic heat equations with multiplicative noise (see Subsection 3.2 below), stochastic Black–Scholes equations with multiplicative noise (see Subsection 3.3 below), and Zakai equations from nonlinear filtering (see Subsection 3.4 below). In each of these SPDEs the proposed approximation algorithm produces accurate results with short run times in up to 50 space dimensions.

The remainder of this paper is organized as follows. In Section 2 we derive (see Subsec-tions 2.1–2.6 below) and formulate (see SubsecSubsec-tions 2.7–2.8 below) the proposed approxi-mation algorithm for SPDEs. In Section 3 we test the performance of the proposed approx-imation algorithm (see Subsection 2.8 below) in the case of stochastic heat equations with additive noise (see Subsection 3.1 below), stochastic heat equations with multiplicative noise (see Subsection 3.2 below), stochastic Black–Scholes equations with multiplicative noise (see Subsection 3.3 below), and Zakai equations (see Subsection 3.4 below). In Sec-tion 4 we present the Python source codes which we have used for the numerical simulaSec-tions in Section 3.

2 Derivation and description of the proposed

approx-imation algorithm

Let T _∈ (0,_∞), d _∈ N_{, let} _ϕ_: Rd _→ R _{be a continuous function, let (Ω}_,_F_,P) be a

probability space with a normal filtration (_Ft)t∈[0,T] (cf., e.g., Liu & R¨ockner [78,

Defi-nition 2.1.11]), let Z: [0, T]_×Rd_×_Ω _→ R _{be a sufficiently regular random field which} satisfies for every x _∈ Rd _{that (}_Z_t₍_x₎₎_t_∈_[0,T]_{: [0}_{, T}_]_×_Ω _→ R _{is an (}_F_t₎_t_∈_[0,T_]_{-Itˆo process,} let µ: Rd _→ Rd_{, f}_: Rd _×R_×Rd _→ R_, _and _b_: Rd_×R_×Rd _→ R _{be sufficiently} regu-lar functions, let σ: Rd _→ Rd×d _{be a sufficiently regular and sufficiently non-degenerate} function, and let X: [0, T]_× Rd_× _Ω _→ R _{be a random field which satisfies for every}

t _∈ [0, T], x _∈ Rd _that _X_t₍_x_{) : Ω} _→ R _is _F_t_/_B₍R_{)-measurable, which satisfies for every}

(5)

partial derivatives, and which satisfies1 _{that for every}_t_∈_[0_{, T}_],_x_∈_Rd _{it holds}

P-a.s. that Xt(x) =ϕ(x) +

Z t

0

f x, Xs(x),(∇Xs)(x)

ds+

Z t

0

b x, Xs(x),(∇Xs)(x)

dZs(x)

+

Z t

0

h

1

2Trace σ(x)[σ(x)]

∗_(Hess_X

s)(x)

+

µ(x),(_∇Xs)(x)

Rd

i

ds.

(1)

Our goal is to approximately calculate under suitable hypothesis the solution process

X: [0, T]_×Rd_×_Ω_→R_{of the SPDE (1).}

2.1 Temporal approximations

In this subsection we discretize the SPDE (1) in time by employing the splitting-up method (cf., for example, [8, 10, 40, 43, 44, 76]) to obtain a semi-discrete approximation problem. To this end let N _∈N_, _t₀_{, t}₁_{, . . . , t}_N _∈_[0_{, T}_{] be real numbers with}

0 =t0 < t1 < . . . < tN =T. (2)

Observe that (1) yields that for every n_{∈ {}0,1, . . . , N ₋1_}, t _∈[tn, tn+1], x ∈Rd it holds

P-a.s. that

Xt(x) =Xtn(x) +

Z t

tn

ds+

Z t

tn

dZs(x)

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

∗_(Hess_X

s)(x)

+

µ(x),(_∇Xs)(x)

Rd

i

ds.

(3)

This illustrates for every n _{∈ {}0,1, . . . , N ₋1_}, t_∈[tn, tn+1],x∈Rd that

Xt(x)≈Xtn(x)

+

Z tn+1

tn

f x, Xtn(x),(∇Xtn)(x)

ds+

Z tn+1

tn

b x, Xtn(x),(∇Xtn)(x)

dZs(x)

+

Z t

tn

h

1

2 Trace σ(x)[σ(x)]

∗_(Hess_X

s)(x)

+

µ(x),(_∇Xs)(x)

Rd

i

ds.

(4)

This, in turn, suggests for every n _{∈ {}0,1, . . . , N ₋1_}, t_∈[tn, tn+1], x∈Rd that

Xt(x)≈Xtn(x) +f x, Xtn(x),(∇Xtn)(x)

(tn+1−tn)

+b x, Xtn(x),(∇Xtn)(x)

Ztn+1(x)−Ztn(x)

+ Z t tn h 1

2 Trace σ(x)[σ(x)]

∗_(Hess_X

s)(x)

+

µ(x),(_∇Xs)(x)

Rd

i

ds.

(5)

1_{Note that for every} _{d, m}_∈N _{and every (}_d_×_m_)-matrix _A _∈Rd×m

it holds that A∗

∈ Rm×d is the transpose ofA.

(6)

To derive the splitting-up approximation let U: (0, T]_×Rd_×_Ω _→ R _{be a random field} which satisfies for every ω _∈ Ω, n _{∈ {}0,1, . . . , N ₋1_} that (Ut(x, ω))(t,x)∈(tn,tn+1]×Rd ∈

C1,2₍₍_t

n, tn+1]×Rd,R) has at most polynomially growing partial derivatives, which satisfies

for everyω _∈Ω,x_∈Rd_thatRT

0 k(HessUs)(x, ω)kRd×d+k(∇U_s)(x, ω)kRdds <∞, and which

satisfies that for every n_{∈ {}0,1, . . . , N ₋1_},t _∈(tn, tn+1], x∈Rd it holdsP-a.s. that Ut(x) =Xtn(x) +f(x, Xtn(x),(∇Xtn)(x)) (tn+1−tn)

+b(x, Xtn(x),(∇Xtn)(x)) (Ztn+1(x)−Ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

∗_(Hess_U

s)(x)

+

µ(x),(_∇Us)(x)

Rd

i

ds.

(6)

Note that (5) and (6) suggest for every n _{∈ {}1,2, . . . , N_}, x_∈Rd _that

Utn(x)≈Xtn(x). (7)

Next let V : [0, T] _×Rd _× _Ω _→ R _{be a random field which satisfies for every} _ω _∈ _Ω,

n _{∈ {}0,1, . . . , N ₋1_} that (Vt(x, ω))(t,x)∈(tn,tn+1]×Rd ∈ C

1,2₍₍_t

n, tn+1]×Rd,R) has at most

polynomially growing partial derivatives, which satisfies for every ω _∈ Ω, x _∈ Rd _that

RT

0 k(HessVs)(x, ω) kRd×d +k(∇V_s)(x, ω)kRdds < ∞, and which satisfies for every n ∈

{0,1, . . . , N ₋1_}, t_∈(tn, tn+1],x∈Rd that V0(x) =ϕ(x) and

Vt(x) =Vtn(x) +f(x, Vtn(x),(∇Vtn)(x)) (tn+1−tn)

+b(x, Vtn(x),(∇Vtn)(x)) (Ztn+1(x)−Ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

∗_(Hess_V

s)(x)

+

µ(x),(_∇Vs)(x)

Rd

i

ds

(8)

(cf., for example, Deck & Kruse [27], Hairer et al. [47, Section 4.4], Krylov [73, Chapter 8], and Krylov [74, Theorem 4.32] for existence, uniqueness, and regularity results for (6) and (8)). Note that (6) and (8) suggest for every n _{∈ {}1,2, . . . , N_}, x_∈Rd _that

Vtn(x)≈Utn(x). (9)

Combining this with (7), in turn, suggests for every n_{∈ {}1,2, . . . , N_}, x_∈Rd _that

Vtn(x)≈Xtn(x). (10)

Observe that the random field V is a specific splitting-up type approximation for the random field X (cf., for example, [8, 10, 40, 43, 44, 76]). In the next subsection we derive a Feynman-Kac representation for V given Z (cf., for example, Milstein & Tretjakov [84, Section 2]).

(7)

2.2 An approximate Feynman-Kac type representation

In this subsection we derive a Feynman-Kac representation forV givenZ. More specifically, for every sufficiently regular function z: [0, T]_×Rd _→R _let _V(z)_{: [0}_{, T}_]_×Rd_→ R _satisfy for every n _{∈ {}0,1, . . . , N ₋ 1_} that (_V_t(z)(x))(t,x)∈(tn,tn+1]×Rd ∈ C

1,2₍₍_t

n, tn+1] × Rd,R)

has at most polynomially growing partial derivatives, which satisfies for every x _∈ Rd that RT

0 k(HessV (z)

s )(x)kRd×d+k(∇V_s(z))(x)kRdds < ∞, and which satisfies for every n ∈

{0,1, . . . , N ₋1_}, t_∈(tn, tn+1], x∈Rd that V0(z)(x) =ϕ(x) and

Vt(z)(x) =V (z)

tn (x) +f(x,V

(z)

tn (x),(∇V

(z)

tn )(x)) (tn+1−tn)

+b(x,_Vt(z)n (x),(∇V

(z)

tn )(x)) (ztn+1(x)−ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

∗_(Hess_V(z) s )(x)

+

µ(x),(_∇V(z) s )(x)

Rd

i

ds.

(11)

Note that (8) and (11) ensure that for every ω _∈Ω, t_∈[0, T],x_∈Rd _{it holds that}

V(Zs(y,ω))(s,y)∈[0,T]×Rd

t (x) =Vt(x, ω). (12)

Combining this with (10) suggests for every sufficiently regular functionz: [0, T]_×Rd_→_R

and every ω_∈Ω, n _{∈ {}0,1, . . . , N_}, x_∈Rd _that

V(Zs(y,ω))(s,y)∈[0,T]×Rd

tn (x)≈Xtn(x, ω). (13)

Moreover, note that (10)–(12) suggest for every sufficiently regular functionz: [0, T]_×Rd _→ R _{and every} _n _{∈ {}₀_,₁_{, . . . , N}_}_, _x_∈Rd _that

Vt(z)n (x)≈E

Xtn(x)|Z =z

. (14)

In the following we introduce additional artificial stochastic processes in order to incorpo-rate a Feynman-Kac type representation into (11). Let B: [0, T]_×Ω_→Rd _{be a standard} (_Ft)t∈[0,T]-Brownian motion, let ξ: Ω → Rd be an F0/B(Rd)-measurable function which

satisfies for every p _∈ (0,_∞), x _∈ Rd _that P(kξ −xkRd ≤ p) > 0 and E[kξk

p

Rd] < ∞,

assume that Z and (ξ, B) are independent random variables, and let Y : [0, T]_×Ω _→ Rd be an (_Ft)t∈[0,T]-adapted stochastic process with continuous sample paths which satisfies

that for every t _∈[0, T] it holds P-a.s. that Yt=ξ+

Z t

0

µ(Ys)ds+

Z t

0

σ(Ys)dBs. (15)

Note that the assumption that for every p _∈ (0,_∞) it holds that E[kξk

p

Rd] < ∞ and the

assumption that µ: Rd _→ Rd _and _σ_: Rd _→ Rd×d _{are sufficiently regular functions ensure} that for every p_∈(0,_∞) it holds that

sup

t∈[0,T]

E

kYtkpRd

(8)

Moreover, observe that (11) implies that for every sufficiently regular function z: [0, T]_× Rd_→R _{and every} _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_}_, _t_∈₍_t_n_{, t}_n+1_),_x_∈Rd _{it holds that}

∂ ∂t

Vt(z)(x)

=

µ(x),(_∇V_t(z))(x)

Rd+

1

2Trace σ(x)[σ(x)]

∗_(Hess

Vt(z))(x)

. (17) This, in turn, assures that for every sufficiently regular function z: [0, T]_×Rd _→ R _and every n _{∈ {}0,1, . . . , N ₋1_}, t_∈(T ₋tn+1, T −tn),x∈Rd it holds that

∂ ∂t

VT(z)−t(x)

+

µ(x),(_∇V_T(z)₋_t)(x)

Rd +

1

2Trace σ(x)[σ(x)]

∗_(Hess

VT(z)−t)(x)

= 0. (18) Next note that Itˆo’s formula, the hypothesis that for every sufficiently regular function

z: [0, T]_×Rd_→R _{and every}_n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds that (}_V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈

C1,2₍₍_t

n, tn+1] ×Rd,R) (cf. (11)), and (15) guarantee that for every sufficiently regular

function z: [0, T]_×Rd _→R_{and every} _n_{∈ {}₀_,₁_{, . . . , N} ₋₁_}_, _{r, t}_∈_[_T ₋_t_n+1_{, T} ₋_t_n_{) with}

r < t it holds P-a.s. that

VT(z)−t(Yt) =VT(z)−r(Yr) +

Z t

r

(_∇V_T(z)₋_s)(Ys), σ(Ys)dBs

Rd+

Z t

r ∂ ∂s

VT(z)−s

(Ys)ds

+

Z t

r 1

2Trace σ(Ys)[σ(Ys)]

∗_(Hess

VT(z)−s)(Ys)

ds + Z t r

µ(Ys),(∇VT(z)−s)(Ys)

Rdds.

(19)

Combining this with (18) implies that for every sufficiently regular functionz: [0, T]_×Rd_→ R _{and every} _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_}_, _{r, t}_∈_[_T ₋_t_n+1_{, T} ₋_t_n_{) with} _{r < t} _{it holds} P-a.s. that

VT(z)−t(Yt) = VT(z)−r(Yr) +

Z t

r

Rd. (20)

Hence, we obtain that for every sufficiently regular function z: [0, T]_×Rd _→R _{and every}

n _{∈ {}0,1, . . . , N ₋1_}, t_∈(T ₋tn+1, T −tn) it holds P-a.s. that

VT(z)−t(Yt) =Vt(z)n+1(YT−tn+1) +

Z t

T−tn+1

Rd. (21)

Furthermore, note that (16), the hypothesis that σ: Rd _→ _Rd×d _{is a sufficiently regular}

function, and the hypothesis that for every sufficiently regular function z: [0, T]_×Rd_→R and every n _{∈ {}0,1, . . . , N ₋1_} it holds that (tn, tn+1]×Rd ∋ (t, x) 7→ (∇Vt(z))(x) ∈ Rd

is an at most polynomially growing function assure that for every n _{∈ {}0,1, . . . , N ₋1_},

t_∈(T ₋tn+1, T −tn) it holds that

Z t

T−tn+1

E

h

[σ(Y_s)]∗(∇V

(z) T−s)(Ys)

2

Rd

i

(9)

Therefore, we obtain that for every sufficiently regular function z: [0, T]_×Rd _→ R _and every n _{∈ {}0,1, . . . , N ₋1_}, t_∈(T ₋tn+1, T −tn) it holds P-a.s. that

E

Z t

T−tn+1

Rd

FT−tn+1

= 0. (23)

This and (21) demonstrate that for every sufficiently regular function z: [0, T]_×Rd _→ R and every n_{∈ {}0,1, . . . , N ₋1_}, t_∈[T ₋tn+1, T −tn) it holds P-a.s. that

E

VT(z)−t(Yt)

F_T₋_t_n₊₁

=E

Vt(z)n+1(YT−tn+1)

F_T₋_t_n₊₁

. (24)

The fact that for every sufficiently regular function z: [0, T]_×Rd _→ R _{and every} _n _∈

{0,1, . . . , N₋1_}it holds that the function Ω _∋ω_{7→ V}_t(z)_n₊₁(YT−tn+1(ω))∈RisFT−tn+1/B(R

)-measurable hence implies that for every sufficiently regular function z: [0, T]_×Rd _→ _R

and every n_{∈ {}0,1, . . . , N ₋1_}, t_∈[T ₋tn+1, T −tn) it holds P-a.s. that E

VT(z)−t(Yt)

F_T₋_t_n₊₁

=_V_t(z)_n₊₁(YT−tn+1). (25)

Next observe that the hypothesis that for every sufficiently regular functionz: [0, T]_×Rd _→ R _{and every} _n_{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds that (}_V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈C

1,2₍₍_t

n, tn+1]×

Rd_,R_{) has at most polynomially growing partial derivatives and the fact that for every}

ω _∈Ω it holds that [0, T]_∋ t_7→Yt(ω)∈Rd is a continuous function ensure that for every

sufficiently regular function z: [0, T]_×Rd _→ R _{and every} _ω _∈_Ω, _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _it holds that

lim sup

tրT−tn

V

(z)

T−t(Yt(ω))− VT(z)−t(YT−tn(ω))

= 0. (26)

Furthermore, observe that (11) and the hypothesis that for every sufficiently regular function z: [0, T]_×Rd _→ R _{and every} _x _∈ Rd _{it holds that} RT

0 k(HessV (z)

s )(x)kRd×d +

k(_∇Vs(z))(x)kRdds <∞show that for every sufficiently regular functionz: [0, T]×Rd→R

and every ω_∈Ω, n _{∈ {}0,1, . . . , N ₋1_} it holds that lim sup

tրT−tn

V

(z)

T−t(YT−tn(ω))−

Vt(z)n (YT−tn(ω))

+f YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω))

(tn+1−tn) (27)

+b YT−tn(ω),V

(z)

tn )(YT−tn(ω))

ztn+1(YT−tn(ω))−ztn(YT−tn(ω))

= 0.

Combining this with (26) demonstrates that for every sufficiently regular functionz: [0, T]_× Rd_→R _{and every} _ω _∈_Ω, _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds that}

lim sup

tրT−tn

V

(z)

T−t(Yt(ω))−

Vt(z)n (YT−tn(ω))

+f YT−tn(ω),V

(z)

tn )(YT−tn(ω))

(tn+1−tn) (28)

+b YT−tn(ω),V

(z)

tn )(YT−tn(ω))

ztn+1(YT−tn(ω))−ztn(YT−tn(ω))

= 0.

(10)

In addition, note that the hypothesis that for every sufficiently regular functionz: [0, T]_× Rd_→R_{and every}_n _{∈ {}₀_,₁_{, . . . , N}_}_{it holds that (}_V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈C

1,2₍₍_t

n, tn+1]×

Rd_,R_{) has at most polynomially growing partial derivatives and (16) ensure that for every} sufficiently regular function z: [0, T]_×Rd_→_R _{and every} _p_∈₍₀_,_∞_{) it holds that}

sup

t∈[0,T]

E

kYtkpRd

+

sup

t∈[0,T)

E

|VT(z)−t(Yt)|p

+

sup

t∈(0,T]

E

k(_∇Vt(z))(YT−t)kpRd

<_∞. (29) Next note that the fact that for every x _∈ R_, _ω _∈ _{Ω it holds that} _X₀₍_{x, ω}_{) =} _ϕ₍_x_), the hypothesis that for every sufficiently regular function z: [0, T]_×Rd _→ R _{and every}

x _∈ Rd _{it holds that} _V(z)

0 (x) = ϕ(x), and the hypothesis that for every ω ∈ Ω it holds

that (Xt(x, ω))(t,x)∈[0,T]×Rd ∈C0,2([0, T]×Rd,R) has at most polynomially growing partial

derivatives demonstrate that for every sufficiently regular function z: [0, T]_×Rd _→ R _it holds that (_V₀(z)(x))x∈Rd ∈ C2(Rd,R) has at most polynomially growing derivatives. This

and (29) imply that for every sufficiently regular function z: [0, T]_×Rd _→ _R _{and every}

p_∈(0,_∞) it holds that

sup

t∈[0,T]

E

kYtkpRd

+

sup

t∈[0,T]

E

|VT(z)−t(Yt)| p

+

sup

t∈[0,T]

E

k(_∇V_t(z))(YT−t)kpRd

<_∞. (30) The hypothesis that f: Rd_×_R_×_Rd_→_R _and _b_: _Rd_×_R_×_Rd_→_R _{are sufficiently regular}

functions hence proves that for every sufficiently regular function z: [0, T]_×Rd _→R _and every p_∈(0,_∞) it holds that

sup

t∈[0,T]

E

h

V_t(z)(Y_T₋_t)

pi

+ max

n∈{0,1,...,N−1} E

h

V_t(z)_n (Y_T₋_t_n) +f Y_T₋_t_n,V_t(z)_n (Y_T₋_t_n),(∇V_t(z)_n )(Y_T₋_t_n)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn)

pi

<_∞.

(31)

Combining (28) and, e.g., Hutzenthaler et al. [55, Proposition 4.5] therefore demonstrates that for every sufficiently regular functionz: [0, T]_×Rd_→ R_{and every}_n _{∈ {}₀_,₁_{, . . . , N}₋₁_} it holds that

lim sup

tրT−tn

E

h V

(z)

T−t(Yt)−

Vt(z)n (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

i

(11)

This and (25) yield that for every sufficiently regular functionz: [0, T]_×Rd _→R_{and every}

n _{∈ {}0,1, . . . , N ₋1_} it holdsP-a.s. that

Vt(z)n+1(YT−tn+1)

=E

h

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) F_T₋_t_n₊₁ i

.

(33)

The tower property for conditional expectations therefore assures that for every sufficiently regular function z: [0, T]_×Rd_→R _{and every} _n_{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds} P-a.s. that

E

h

Vt(z)n+1(YT−tn+1)|S(YT−tn+1)

i

=E

h

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) S(YT−tn+1)

i

.

(34)

In addition, observe that the fact that for every n _{∈ {}0,1, . . . , N ₋1_} it holds that the function Ω_∋ω _7→YT−tn+1(ω)∈R

d_is_S₍_Y

T−tn+1)/B(R

d_{)-measurable assures that for every}

sufficiently regular function z: [0, T]_×Rd _→ R _{and every} _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds}

P-a.s. that

Vt(z)n+1(YT−tn+1) =E

h

Vt(z)n+1(YT−tn+1)|S(YT−tn+1)

i

. (35)

This and (34) imply that for every sufficiently regular function z: [0, T]_×Rd _→ R _and every n _{∈ {}0,1, . . . , N ₋1_} it holdsP-a.s. that

Vt(z)n+1(YT−tn+1)

=E

h

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) S(YT−tn+1)

i

.

(36)

Equation (36) constitutes the Feynman-Kac type representation we were aiming at. In the following subsection we employ the factorization lemma (cf., for example, Becker et al. [6, Lemma 2.1] or Klenke [66, Corollary 1.97]) and the L2_{-minimality property of conditional}

expectations (cf., for example, Klenke [66, Corollary 8.17]) to reformulate (36) as recursive minimization problems.

2.3 Formulation as iterative recursive minimization problems

In this subsection we reformulate (36) as recursive minimization problems. For this observe that (31) shows that for every sufficiently regular function z: [0, T]_×Rd _→ R _{and every}

(12)

n _{∈ {}0,1, . . . , N ₋1_} it holds that E h V (z)

tn (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

2i

<_∞. (37) The factorization lemma, e.g., in [6, Lemma 2.1] (applied with (S,_S) x ₍Rd_,_B₍Rd_)), Ωx_Ω, _X x_Y_T₋_t

n+1 for n ∈ {0,1, . . . , N −1} in the notation of [6, Lemma 2.1]), theL

2

-minimality property for conditional expectations, e.g., in Klenke [66, Corollary 8.17] (ap-plied withX x_Ω_∋_ω_{7→ V}(z)₍_Y_T₋_t

n(ω)) +f(YT−tn(ω),V

(z)

tn )(YT−tn(ω)))

(tn+1 − tn) +b(YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn))(ztn+1(YT−tn)−ztn(YT−tn)) ∈ R, F x

S₍_Y_T₋_t

n+1), A x F in the notation of [66, Corollary 8.17]), the fact that for every

suffi-ciently regular function z: [0, T]_×Rd _→R _{and every} _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds that} Rd _∋_x_{7→ V}(z)

tn+1(x)∈R is a continuous function, the fact that for every Borel measurable

set A _{∈ B}(Rd_{) with positive Lebesgue measure it holds that min}_n_∈{_0,1,...,N₋₁_}P(YT−t

n+1 ∈

A)>0, and (36) hence imply that for every sufficiently regular functionz: [0, T]_×Rd_→R and every n_{∈ {}0,1, . . . , N ₋1_} it holds that

(_V_t(z)_n₊₁(x))x∈Rd = argmin

u∈C(Rd_,R₎

E

h

u(Y_T₋_t_n₊₁)−

Vt(z)n (YT−tn)

+f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

2i

.

(38)

Therefore, we obtain that for every sufficiently regular function z: [0, T]_×Rd _→ _R _and

every n _{∈ {}1,2, . . . , N_} it holds that (_V_t(z)_n (x))x∈Rd = argmin

u∈C(Rd_,R₎

E

h

u(Y_T₋_t_n)−

Vt(z)n−1(YT−tn−1)

+f YT−tn−1,V

(z)

tn−1(YT−tn−1),(∇V

(z)

tn−1)(YT−tn−1)

(tn−tn−1)

+b YT−tn−1,V

(z)

tn−1(YT−tn−1),(∇V

(z)

tn−1)(YT−tn−1)

ztn(YT−tn−1)−ztn−1(YT−tn−1)

2i

.

(39)

In the following subsection we approximate for every sufficiently regular functionz: [0, T]_× Rd _→R _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_} _{the function} Rd_∋ _x_{7→ V}(z)

tn (x)∈R by suitable deep

neural networks.

2.4 Deep neural network approximations

In this subsection we employ for every sufficiently regular function z: [0, T]_×Rd_→R _and every n _{∈ {}1,2, . . . , N_} suitable approximations for the function

Rd_∋_x_{7→ V}(z)

(13)

More specifically, let ν _∈ N _{and for every sufficiently regular function} _z_{: [0}_{, T}_]_×Rd _→ R _let V(z)_n _{= (}V(z)_n ₍_{θ, x}₎₎_(θ,x)_∈_Rν_×Rd: Rν × Rd → R, n ∈ {0,1, . . . , N}, be continuously

differentiable functions which satisfy for every sufficiently regular functionz: [0, T]_×Rd_→ R _{and every} _θ_∈Rν_, _x_∈Rd _that V(z)

0 (θ, x) =ϕ(x). For every sufficiently regular function

z: [0, T]_×Rd_→R_{, every} _n_{∈ {}₁_,₂_{, . . . , N}_}_,_x_∈Rd_{, and every} _suitable _θ_∈Rν _{we think of} V(z)_n ₍_{θ, x}₎_∈R _{as an appropriate approximation}

V(z)

n (θ, x)≈ V (z)

tn (x) (41)

of_Vt(z)n (x). Combining this and (13) indicates for every sufficiently regular functionz: [0, T]×

Rd_→_R_{, every} _n _{∈ {}₁_,₂_{, . . . , N}_}_, _x_∈_Rd_{, and every} _suitable _θ _∈_Rν _that

V(z)

n (θ, x)≈E

Xtn(x)|Z =z

. (42)

For every sufficiently regular functionz: [0, T]_×Rd_→R_{we propose to choose the functions} V(z)_n _: Rν _×Rd _→R_, _n _{∈ {}₁_,₂_{, . . . , N}_}_{, as deep neural networks (cf., for instance, [7, 77]).} For example, for every k _∈ N _let _T_k_: Rk _→ Rk _{satisfy for every} _x_{= (}_x₁_{, x}₂_{, . . . , x}_k₎_∈ Rk that

Tk(x) = tanh(x1),tanh(x2), . . . ,tanh(xk)

(43) (multidimensional version of the tangens hyperbolicus), for every θ= (θ1, θ2, . . . , θν)∈Rν,

v _∈ N₀ ₌ _{₀_{} ∪} N_, _{k, l} _∈ N _with _v ₊_lk ₊_l _≤ _ν _let _Aθ,v

k,l: Rk → Rl satisfy for every

x= (x1, x2, . . . , xk)∈Rk that

Aθ,v_k,l(x) =

      

θv+1 θv+2 . . . θv+k

θv+k+1 θv+k+2 . . . θv+2k

θv+2k+1 θv+2k+2 . . . θv+3k

..

. ... ... ...

θv+(l−1)k+1 θv+(l−1)k+2 . . . θv+lk

              x1 x2 x3 .. . xk        +        θv+lk+1 θv+lk+2 θv+lk+3 .. . θv+lk+l        (44)

(affine function), let s _{∈ {}3,4,5,6, . . ._}, assume that s(N + 1)d(d+ 1) _≤ν, and for every sufficiently regular function z: [0, T]_×Rd _→R _let V(z)_n _: Rν_×Rd _→R_, _n _{∈ {}₀_,₁_{, . . . , N}_}_, satisfy for every n _{∈ {}1,2, . . . , N_}, θ_∈Rν_, _x_∈Rd _that V(z)

0 (θ, x) =ϕ(x) and

V(z)

n (θ, x) = (45)

Aθ,(sn+s_d,1 −1)d(d+1)_{◦ T}d◦Aθ,(sn+sd,d −2)d(d+1)◦. . .◦ Td◦Aθ,(sn+1)d(d+1)d,d ◦ Td◦Aθ,snd(d+1)d,d

(x).

For every sufficiently regular function z: [0, T]_×Rd _→R _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_}_the function V(z)_n _: Rν _×Rd _→ R _{in (45) describes a fully-connected feedforward deep neural} network with s+ 1 layers (1 input layer withdneurons, s₋1 hidden layers withdneurons each, and 1 output layer with 1 neuron) and multidimensional versions of the tangens hyperbolicus as activation functions (see (43)).

(14)

2.5 Stochastic gradient descent based minimization

We intend to find for every sufficiently regular function z: [0, T] _× Rd _→ R _suitable

θz,1_{, θ}z,2_{, . . . , θ}z,N _∈ _Rν _{in (41) by recursive minimization. More precisely, for every}

suf-ficiently regular function z: [0, T] _× Rd _→ R _{we intend to find for} _n _{∈ {}₁_,₂_{, . . . , N}_}_,

θz,0_{, θ}z,1_{, . . . , θ}z,n−1 _∈_Rν _{a suitable} _θz,n _∈_Rν _{as an approximate minimizer of the function}

Rν _∋_θ _7→E

h

V(z)_n (θ, Y_T₋_t_n)−

V(z)

n−1(θz,n−1, YT−tn−1)

+f YT−tn−1,V

(z)

n−1(θz,n−1, YT−tn−1),(∇xV

(z)

n−1)(θz,n−1, YT−tn−1)

(tn−tn−1)

+b YT−tn−1,V

(z)

n−1(θz,n−1, YT−tn−1),(∇xV

(z)

n−1)(θz,n−1, YT−tn−1)

· ztn(YT−tn−1)−ztn−1(YT−tn−1)

2i

∈R

(46)

(cf. (39) and (41) above). To this end for every sufficiently regular functionz: [0, T]_×Rd_→ R _let _Bz,m_{: [0}_{, T}_]_×_Ω _→ Rd_, _m _∈ N₀_{, be i.i.d. standard (}_F_t₎_t_∈_[0,T]_{-Brownian motions, let}

ξz,m_{: Ω} _→ _Rd_, _m _∈ _N

0, be i.i.d. F0/B(Rd)-measurable functions, for every sufficiently

regular function z: [0, T]_×Rd _→ R _{and every} _m _∈ N₀ _let _Yz,m_{: [0}_{, T}_]_×_Ω _→ Rd _{be an} (_Ft)t∈[0,T]-adapted stochastic process with continuous sample paths which satisfies that for

every t _∈[0, T] it holds P-a.s. that Ytz,m =ξz,m+

Z t

0

µ(Ysz,m)ds+

Z t

0

σ(Ysz,m)dBsz,m, (47)

let γ _∈ (0,_∞), M _∈ N_{, and for every sufficiently regular function} _z_{: [0}_{, T}_]_×Rd _→ R _let

ϑz,n _{= (}_ϑz,n

m )m∈N₀: N₀×Ω→ Rν, n ∈ {0,1, . . . , N}, be stochastic processes which satisfy

for every n _{∈ {}1,2, . . . , N_}, m_∈N₀ _that

ϑz,n_m+1 =ϑz,n_m ₋2γ_·(_∇θV(z)n )(ϑz,nm , Y z,m T−tn)·

h

V(z)

n (ϑz,nm , Y z,m T−tn)−V

(z) n−1(ϑ

z,n−1 M , Y

z,m T−tn−1)

−f Y_Tz,m₋_t_n−1,V

(z) n−1(ϑ

z,n−1 M , Y

z,m

T−tn−1),(∇xV

(z) n−1)(ϑ

z,n−1 M , Y

z,m T−tn−1)

(tn−tn−1)

−b Y_Tz,m₋_t_n−1,V

(z)

n−1(ϑz,nM −1, Y z,m

T−tn−1),(∇xV

(z)

n−1)(ϑz,nM −1, Y z,m T−tn−1)

· ztn(Y

z,m

T−tn−1)−ztn−1(Y

z,m T−tn−1)

i

(48)

(cf. (66)–(68) below).

2.6 Temporal discretization of the auxiliary stochastic process

Equation (48) provides us an implementable numerical algorithm in the special case where for every sufficiently regular function z: [0, T]_×Rd _→ R _{one can simulate exactly from} the solution processes Yz,m_{: [0}_{, T}_] _× _Ω _→ _Rd_, _m _∈ _N

(15)

(15) above). In the case where it is not possible to simulate for every sufficiently regular function z: [0, T]_×Rd _→ R _{exactly from the solution processes} _Yz,m_{: [0}_{, T}_]_×_Ω _→ Rd_,

m_∈N₀_{, of the SDEs in (47), one can employ a numerical approximation method for SDEs,} say, the Euler-Maruyama scheme, to approximatively simulate for every sufficiently regular function z: [0, T]_×Rd _→R _{from the solution processes} _Yz,m_{: [0}_{, T}_]_×_Ω _→ Rd_, _m _∈ N₀_, of the SDEs in (47). This is subject of this subsection. More formally, note that (47) implies that for every sufficiently regular function z: [0, T]_×Rd _→ R _{and every} _m _∈ N₀_,

r, t_∈[0, T] with r_≤t it holds P-a.s. that Ytz,m =Yrz,m+

Z t

r

µ(Y_sz,m)ds+

Z t

r

σ(Y_sz,m)dB_sz,m. (49) Hence, we obtain that for every sufficiently regular function z: [0, T]_×Rd _→R _{and every}

m_∈N₀_, _n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds}P-a.s. that Y_Tz,m₋_t_n =Y_Tz,m₋_t_n₊₁ +

Z T−tn

T−tn+1

µ(Y_sz,m)ds+

Z T−tn

T−tn+1

σ(Y_sz,m)dB_sz,m. (50) This shows that for every sufficiently regular functionz: [0, T]_×Rd_→R_{and every}_m_∈N₀_,

n _{∈ {}0,1, . . . , N ₋1_} it holdsP-a.s. that Y_Tz,m₋_t_N

−(n+1) =Y

z,m T−tN−n+

Z T−tN−(n+1)

T−tN−n

µ(Y_sz,m)ds+

Z T−tN−(n+1)

T−tN−n

σ(Y_sz,m)dB_sz,m. (51) Next we introduce suitable real numbers which allow us to reformulate (51) in a more compact way. More formally, let τn ∈ [0, T], n ∈ {0,1, . . . , N}, satisfy for every n ∈

{0,1, . . . , N_}that

τn =T −tN−n. (52)

Observe that (2) and (52) ensure that

0 =τ0 < τ1 <· · ·< τN =T. (53)

Moreover, note that (51) and (52) demonstrate that for every sufficiently regular function

z: [0, T]_×Rd_→R _{and every} _m _∈N₀_,_n _{∈ {}₀_,₁_{, . . . , N} ₋₁_} _{it holds}P-a.s. that Y_τz,m_n₊₁ =Y_τz,m_n +

Z τn+1

τn

µ(Y_sz,m)ds+

Z τn+1

τn

σ(Y_sz,m)dB_sz,m. (54) This suggests for every sufficiently regular function z: [0, T]_×Rd_→_R_{and every} _m_∈_N

0,

n _{∈ {}0,1, . . . , N ₋1_} that

Y_τz,m_n₊₁ _≈Y_τz,m_n +µ(Y_τz,m_n ) (τn+1−τn) +σ(Yτz,mn ) (B

z,m τn+1−B

z,m

(16)

Based on (55) we now introduce for every sufficiently regular function z: [0, T]_×Rd _→R suitable Euler-Maruyama approximations for the solution processes Yz,m_{: [0}_{, T}_]_×_Ω _→

Rd_, _m _∈ N₀_{, of the SDEs in (47). More formally, for every sufficiently regular function}

z: [0, T]_×Rd_→R_{and every} _m_∈N₀ _let _Yz,m _{= (}_Yz,m

n )n∈{0,1,...,N}: {0,1, . . . , N} ×Ω→Rd

be the stochastic process which satisfies for every n _{∈ {}0,1, . . . , N ₋1_} that _Y₀z,m =ξz,m

and

Yn+1z,m =Ynz,m+µ(Ynz,m) (τn+1−τn) +σ(Ynz,m) (Bτz,mn+1 −B

z,m

τn ). (56)

Observe that (52), (55), and (56) suggest for every sufficiently regular function z: [0, T]_× Rd_→R _{and every} _m _∈N₀_, _n_{∈ {}₀_,₁_{, . . . , N}_}_that

Ynz,m ≈Yτz,mn =Y

z,m

T−tN−n. (57)

This, in turn, suggests for every sufficiently regular function z: [0, T]_×Rd_→R_{and every}

m_∈N₀_, _n _{∈ {}₀_,₁_{, . . . , N}_} _that

Y_Tz,m₋_t_n _{≈ Y}_Nz,m₋_n. (58) In the next step we employ (58) to derive for every sufficiently regular function z: [0, T]_× Rd_→_R _{approximations of the stochastic processes} _ϑz,n_: _N

0×Ω→Rν, n ∈ {0,1, . . . , N},

in (48) which are also implementable in the case where one cannot simulate exactly from the solution processes of the SDEs in (47). More precisely, for every sufficiently regular function z: [0, T]_×Rd _→ _R _{let Θ}z,n _{= (Θ}z,n

m )m∈N₀: N₀ ×Ω → R, n ∈ {0,1, . . . , N}, be

stochastic processes which satisfy for every sufficiently regular function z: [0, T]_×Rd_→R and every n_{∈ {}1,2, . . . , N_},m _∈N₀ _that

Θz,nm+1 = Θz,nm −2γ·(∇θV(z)n )(Θz,nm ,Y z,m N−n)·

h

V(z)

n (Θz,nm ,Y z,m N−n)−V

(z) n−1(Θ

z,n−1 M ,Y

z,m N−n+1)

−f _Y_Nz,m₋_n+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV (z) n−1)(Θ

z,n−1 M ,Y

z,m N−n+1)

(tn−tn−1)

−b _Y_Nz,m₋_n+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(59)

· ztn(Y

z,m

N−n+1)−ztn−1(Y

z,m N−n+1)

i

(cf. (66)–(68) below). Note that (48) , (58), and (59) indicate for every sufficiently regular function z: [0, T]_×Rd _→R_{, every} _n _{∈ {}₁_,₂_{, . . . , N}_}_{, and every sufficiently large} _m _∈ N₀ that

Θz,nm ≈ϑz,nm . (60)

In the following two subsections (Subsection 2.7 and Subsection 2.8) we merge the above derivations to precisely formulate the proposed approximation algorithm, first, in a special case (Subsection 2.7) and, thereafter, in the general case (Subsection 2.8).

(17)

2.7 Description of the proposed approximation algorithm in a

special case

In this subsection we describe the proposed approximation algorithm in the special case where the standard Euler-Maruyama scheme (cf., e.g., Kloeden & Platen [67] and Maruyama [82]) is the employed approximation scheme for discretizing (47) (cf. (56)) and where the plain vanilla stochastic gradient descent method with constant learning rate γ _∈ (0,_∞) and batch size 1 is the employed minimization algorithm. A more general description of the proposed approximation algorithm, which allows to incorporate more sophisticated machine learning approximation techniques such as batch normalization (cf., for instance, Ioffe & Szegedy [56]) and the Adam optimizer (cf., for example, Kingma & Ba [65]), can be found in Subsection 2.8 below.

Framework 2.1 (Special case). Let T, γ _∈ (0,_∞), d, N, M _∈ N_, _ϕ _∈ _C2₍Rd_,R₎_, _s _∈

{3,4,5, . . ._}, ν =s(N + 1)d(d+ 1), t0, t1, . . . , tN ∈[0, T] satisfy

0 =t0 < t1 < . . . < tN =T, (61)

let τ0, τ1, . . . , τn ∈ [0, T] satisfy for every n ∈ {0,1, . . . , N} that τn = T − tN−n, let

f: Rd_×R_×Rd _→ R_, _b_: Rd _×R_×Rd _→ R_, _µ_: Rd _→ Rd_{, and} _σ_: Rd _→ Rd×d _be

con-tinuous functions, let (Ω,_F,P,(Ft)t∈[0,T]) be a filtered probability space, for every function z: [0, T]_×Rd_→R _let _ξz,m_{: Ω}_→Rd_, _m_∈N₀_{, be i.i.d.} _F₀_/_B₍Rd₎_{-measurable random}

vari-ables, let Bz,m_{: [0}_{, T}_]_×_Ω _→_Rd_, _m _∈ _N

0, be i.i.d. standard (Ft)t∈[0,T]-Brownian motions,

let _Yz,m_: _{₀_,₁_{, . . . , N}_{} ×}_Ω_→_Rd_, _m_∈_N

0, satisfy for everym∈N0, n∈ {0,1, . . . , N−1}

that _Y₀z,m =ξz,m _and

Yn+1z,m =Ynz,m+µ(Ynz,m) (τn+1−τn) +σ(Ynz,m) (Bτz,mn+1 −B

z,m

τn ), (62)

let _Td: Rd→Rd satisfy for every x= (x1, x2, . . . , xd)∈Rd that

Td(x) = tanh(x1),tanh(x2), . . . ,tanh(xd)

, (63)

for every θ = (θ1, θ2, . . . , θν) ∈ Rν, k, l ∈ N, v ∈ N0 = {0} ∪N with v +lk+l ≤ ν let

Aθ,v_k,l: Rk _→Rl _{satisfy for every} _x_{= (}_x₁_{, x}₂_{, . . . , x}_k₎_∈Rk _that

Aθ,v_k,l(x) =

θv+kl+1+

_k P

i=1

xiθv+i

, . . . , θv+kl+l+

_k P

i=1

xiθv+(l−1)k+i , (64)

for every function z: [0, T]_×Rd_→R _let V(z)_n _: Rν_×Rd_→R_, _n_{∈ {}₀_,₁_{, . . . , N}_}_{, satisfy for}

every n_{∈ {}1,2, . . . , N_}, θ _∈Rν_, _x_∈Rd _thatV(z)

0 (θ, x) =ϕ(x) and

V(z)

n (θ, x) = (65)

Aθ,(sn+s_d,1 −1)d(d+1)_{◦ T}d◦Aθ,(sn+sd,d −2)d(d+1)◦. . .◦ Td◦Aθ,(sn+1)d(d+1)d,d ◦ Td◦Aθ,snd(d+1)d,d

(18)

for every function z: [0, T]_× Rd _→ R _let _Θz,n_: N₀ _× _Ω _→ Rν_, _n _{∈ {}₀_,₁_{, . . . , N}_}_{, be}

stochastic processes, for every function z: [0, T]_×Rd _→ R _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_}_,

m_∈N₀ _let _φz,n,m_: Rν_×_Ω_→R _{satisfy for every} _θ _∈Rν_, _ω _∈_Ω _that

φz,n,m(θ, ω) =hV(z)

n θ,Y z,m N−n(ω)

−V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)) −(tn−tn−1)

·f _Y_Nz,m₋_n+1(ω),V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)),(∇xV(z)n−1)(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω))

−b _Y_Nz,m₋_n+1(ω),V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)),(∇xV (z) n−1)(Θ

z,n−1 M (ω),Y

z,m

N−n+1(ω))

· ztn(Y

z,m

N−n+1(ω))−ztn−1(Y

z,m

N−n+1(ω))

i2

,

(66)

for every functionz: [0, T]_×Rd_→R_{and every} _n_{∈ {}₁_,₂_{, . . . , N}_}_, _m_∈N₀ _let _Φz,n,m_: Rν_× Ω_→ Rν _{satisfy for every} _θ _∈Rν_, _ω _∈ _Ω _that _Φz,n,m₍_{θ, ω}_{) = (}_∇_θ_φz,n,m₎₍_{θ, ω}₎_{, and assume}

for every function z: [0, T]_×Rd _→R _{and every} _m _∈N₀_, _n _{∈ {}₁_,₂_{, . . . , N}_} _that

Θz,n_m+1 = Θz,n_m ₋γ_·Φz,n,m(Θz,n_m ). (67) In the setting of Framework 2.1 we note that for every function z: [0, T]_×Rd_→R _and every n _{∈ {}1,2, . . . , N_}, m _∈N₀ _{it holds that}

Θz,nm+1 = Θz,nm −2γ·(∇θV(z)n )(Θz,nm ,YNz,m−n)·

h

V(z)_n _(Θz,n_m _,_Yz,m

N−n)−V (z)

n−1(Θz,nM −1,Y z,m N−n+1)

−f _Y_Nz,m₋_n+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(tn−tn−1)

−b _Y_Nz,m₋_n+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(68)

· ztn(Y

z,m

N−n+1)−ztn−1(Y

z,m N−n+1)

i

(cf. (48), (59), (66), and (67) above). Moreover, in the setting of Framework 2.1 we think under suitable hypothesis for sufficiently largeN, M _∈N_{, for sufficiently small}_γ _∈₍₀_,_∞_), for every sufficiently regular function z: [0, T]_×Rd_→_R_{, and for every} _n_{∈ {}₀_,₁_{, . . . , N}_}_,

x_∈Rd _of V(z)_n _(Θz,n

M , x) : Ω→R as a suitable approximation

V(z)

n (Θ z,n

M , x)≈E

Xtn(x)|Z =z

(69) of E[Xt

n(x)|Z = z] where X: [0, T]×R

d _×_Ω _→ _R _{is a random field which satisfies for}

every t _∈ [0, T], x _∈ Rd _that _X_t₍_x_{) : Ω} _→ R _is _F_t_/_B₍R_{)-measurable, which satisfies for} every ω _∈ Ω that (Xt(x, ω))(t,x)∈[0,T]×Rd ∈ C0,2([0, T]×Rd,R) has at most polynomially

growing partial derivatives, and which satisfies that for every t _∈ [0, T], x _∈ Rd _{it holds}

P-a.s. that

Xt(x) =ϕ(x) +

Z t

0

ds+

Z t

0

dZs(x)

+

Z t

0

h

1

2Trace σ(x)[σ(x)]

∗

(HessXs)(x)

+µ(x),(_∇Xs)(x)

Rd

i

ds

(70)

whereZ: [0, T]_×Rd_×_Ω_→R_{is a sufficiently regular random field (cf. (1), (13), (14), (41),} and (42)).

(19)

2.8 Description of the proposed approximation algorithm in the

general case

In this subsection we present in Framework 2.2 a general approximation algorithm which includes the proposed approximation algorithm derived in Subsections 2.1–2.7 above as a special case. Compared to Framework 2.1, Framework 2.2 allows to incorporate other minimization algorithms than just the plain vanilla stochastic gradient descent method (see, e.g., (67) in Framework 2.1 in Subsection 2.7 above) such as, for example, the Adam optimizer (cf. Kingma & Ba [65]). Furthermore, Framework 2.2 also allows to incorporate more advanced machine learning techniques like batch normalization (cf. Ioffe & Szegedy [56] and (74) below).

Framework 2.2 (General case). Let T _∈ (0,_∞), M, N, d, δ, ̺, ν, ς _∈ N_, ₍_J_m₎_m_∈N₀ ⊆ N,

t0, t1, . . . , tN ∈[0, T]satisfy 0 = t0 < t1 < . . . < tN =T, letτ0, τ1, . . . , τn∈[0, T]satisfy for

every n_{∈ {}0,1, . . . , N_}thatτn=T−tN−n, letf: Rd×R×Rd→R, b: Rd×R×Rd→Rδ,

H: [0, T]2_×_Rd_×_Rd _→_Rd_{, and}_H

n: Rd×R×Rd×Rδ →R,n ∈ {1,2, . . . , N}, be functions,

let (Ω,_F,P) be a probability space with a normal filtration (Ft)t∈[0,T], for every function z: [0, T]_×Rd _→ Rδ _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_} _let _Bz,n,m,j_{: [0}_{, T}_]_×_Ω _→ Rd_, _m _∈ N₀_,

j _∈N_{, be i.i.d. standard} ₍_F_t₎_t_∈_[0,T]_{-Brownian motions, for every function} _z_{: [0}_{, T}_]_×Rd_→ Rδ _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_} _let _ξz,n,m,j_{: Ω}_→ Rd_, _m _∈ N₀_, _j _∈ N_{, be i.i.d.} _F₀_/_B₍Rd₎

-measurable random variables, for every functionz: [0, T]_×Rd_→Rδ_letVz,j,s

n : Rν×Rd →R,

j _∈N_, _s_∈Rς_, _n_{∈ {}₀_,₁_{, . . . , N}_}_{, be functions, for every function} _z_{: [0}_{, T}_]_×Rd_→Rδ _and

every n_{∈ {}1,2, . . . , N_}, m_∈N₀_, _j _∈N_let _Yz,n,m,j_: _{₀_,₁_{, . . . , N}_{} ×}_Ω_→Rd_{be a stochastic}

process which satisfies for every k _{∈ {}0,1, . . . , N ₋1_} that _Y₀z,n,m,j =ξz,n,m,j _and

Yk+1z,n,m,j =H(τk+1, τk,Y z,n,m,j k , B

z,n,m,j τk+1 −B

z,n,m,j

τk ), (71)

for every functionz: [0, T]_×Rd_→Rδ_let_Θz,n_: N₀_×_Ω_→Rν_,_n_{∈ {}₀_,₁_{, . . . N}_}_{, be stochastic}

processes, for every function z: [0, T]_×Rd _→ Rδ _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_}_, _m _∈ N₀_,

s_∈Rς _let _φz,n,m,s

: Rν _×_Ω_→R _{satisfy for every} _θ _∈Rν_, _ω _∈_Ω _that

φz,n,m,s

(θ, ω) = 1

Jm Jm

X

j=1

Vz,j,s

n θ,Y z,n,m,j N−n (ω)

− Hn

YNz,n,m,j−n+1(ω),V z,j,s

n−1 Θz,nM −1(ω),Y z,n,m,j N−n+1(ω)

, (72)

(_∇xVz,j,

s

n−1) Θ z,n−1 M (ω),Y

z,n,m,j N−n+1(ω)

, ztn Y

z,n,m,j N−n+1(ω)

−ztn−1 Y

z,n,m,j N−n+1(ω)

2

, for every function z: [0, T]_× Rd _→ _Rδ _{and every} _n _{∈ {}₁_,₂_{, . . . , N}_}_, _m _∈ _N

0, s ∈ Rς

let Φz,n,m,s

: Rν _× _Ω _→ Rν _{satisfy for every} _ω _∈ _Ω_, _θ _{∈ {}_η _∈ Rν_: _φz,n,m,s

(_·, ω) : Rν _→ R _{is differentiable at} _η_} _that

Φz,n,m,s

(θ, ω) = (_∇θφz,n,m,

s

Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

arXiv:2012.01194v1 [math.NA] 2 Dec 2020