• No results found

Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

N/A
N/A
Protected

Academic year: 2021

Share "Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems"

Copied!
58
0
0

Loading.... (view fulltext now)

Full text

(1)

arXiv:2012.01194v1 [math.NA] 2 Dec 2020

Deep learning based numerical approximation

algorithms for stochastic partial differential equations

and high-dimensional nonlinear filtering problems

Christian Beck

1

, Sebastian Becker

2

, Patrick Cheridito

3

,

Arnulf Jentzen

4,5

, and Ariel Neufeld

6

1 Department of Mathematics, ETH Zurich,

Switzerland, e-mail: [email protected]

2 RiskLab, Department of Mathematics, ETH Zurich,

Switzerland, e-mail: [email protected]

3 RiskLab, Department of Mathematics, ETH Zurich,

Switzerland, e-mail: [email protected]

4 Faculty of Mathematics and Computer Science, University of M¨unster

Germany, e-mail: [email protected]

5 Department of Mathematics, ETH Zurich

Switzerland, e-mail: [email protected]

6 Division of Mathematical Sciences, School of Physical and Mathematical Sciences,

Nanyang Technological University, Singapore, e-mail: [email protected]

Thursday 3

rd

December, 2020

Abstract

In this article we introduce and study a deep learning based approximation al-gorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every real-ization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. We test the performance of the proposed ap-proximation algorithm in the case of stochastic heat equations with additive noise, stochastic heat equations with multiplicative noise, stochastic Black–Scholes equa-tions with multiplicative noise, and Zakai equaequa-tions from nonlinear filtering. In each of these SPDEs the proposed approximation algorithm produces accurate results with short run times in up to 50 space dimensions.

Keywords: stochastic partial differential equation, numerical method, artificial neural network, nonlinear filtering, Zakai equation

(2)

Contents

1 Introduction 3

2 Derivation and description of the proposed approximation algorithm 4

2.1 Temporal approximations . . . 5

2.2 An approximate Feynman-Kac type representation . . . 7

2.3 Formulation as iterative recursive minimization problems . . . 11

2.4 Deep neural network approximations . . . 12

2.5 Stochastic gradient descent based minimization . . . 14

2.6 Temporal discretization of the auxiliary stochastic process . . . 14

2.7 Description of the proposed approximation algorithm in a special case . . . 17

2.8 Description of the proposed approximation algorithm in the general case . 19 3 Examples 20 3.1 Stochastic heat equations with additive noise . . . 22

3.2 Stochastic heat equations with multiplicative noise . . . 23

3.3 Stochastic Black–Scholes equations with multiplicative noise . . . 26

3.4 Zakai equations . . . 34

4 Python source codes for the proposed approximation algorithm 38 4.1 A Python source code associated to Subsection 3.1 . . . 41

4.2 A Python source code associated to Subsection 3.2 . . . 42

4.3 A Python source code associated to Subsection 3.3 . . . 44

(3)

1

Introduction

Stochastic partial differential equations (SPDEs) are key ingredients in numerous models in engineering, finance, and natural sciences. For example, SPDEs commonly appear in models to price interest-rate based financial derivatives (cf., for example, (1.3) in Filipovi´c et al. [34] and Theorem 3.5 in Harms et al. [49]), to describe random surfaces appear-ing in surface growth models (cf., for example, (3) in Hairer [46] and (1) in Bl¨omker & Romito [15]), to describe the temporal dynamics in Euclidean quantum field theories (cf., for example, (1.1) in Mourrat & Weber [85]), to describe velocity fields in fully developed turbulent flows (cf., for example, (1.5) in Birnir [14] and (7) in Birnir [13]), and to describe the temporal development of the concentration of an unwanted (biological or chemical) contaminant in water (e.g., in the groundwater system, in a river, or in a water basin; cf., for example, (2.2) in Kallianpur & Xiong [63] and (1.1) in Kouritzin & Long [69]). Another prominent situation where SPDEs such as the Kushner equation (cf., for exam-ple, Kushner [75]) and the Zakai equation (cf., for examexam-ple, Zakai [105]) appear is in the case of nonlinear filtering problems where SPDEs describe the density of the state space of the considered system. In particular, we refer, e.g., to [16, 21, 25, 29, 36, 37] for filtering problems in financial engineering, we refer, e.g., to [18, 24, 94, 95, 98, 103] for filtering problems in chemical engineering, and we refer, e.g., to [28, 19, 20, 23, 33, 89] for filtering problems in weather forecasting. SPDEs arising in nonlinear filtering problems are usu-ally high-dimensional as the number of dimensions corresponds to the state space of the considered filtering problem. Most of the SPDEs in the above named applications cannot be solved explicitly and for about 30 years it has been an active field of research to design and study numerical algorithms which can approximatively compute solutions of SPDEs.

In order to be able to implement an approximation scheme for evolutionary type SPDEs on a computer both the time interval [0, T] as well as the infinite dimensional state space have to be discretized. Several types of temporal discretizations and spatial discretizations have been proposed and studied in the scientific literature. In particular, we refer, e.g., to [26, 41, 52, 96, 99, 104] for temporal discretizations based on the linear implicit Euler method, we refer, e.g., to [55, 60, 81, 87, 102] for temporal discretizations based on expo-nential Euler-type methods, we refer, e.g., to [50, 51, 96, 99] for temporal discretizations based on linear implicit Crank–Nicolson-type methods, we refer, e.g., to [1, 9, 12, 35, 44, 76] for temporal discretizations based on splitting up approximation methods, we refer, e.g., to [58, 62, 71, 92, 101] for temporal discretizations based on higher order approximation methods, we refer, e.g., to [17, 38, 64, 70, 80, 99, 104] for spatial discretizations based on finite elements methods, we refer, e.g., to [45, 83, 90, 93, 96, 100] for spatial discretizations based on finite differences methods, and we refer, e.g., to [39, 51, 68, 79, 88, 86] for spatial discretizations based on spectral Galerkin methods. Moreover, the recent article [106] em-ploys deep neural networks to approximately solve some one-dimensional SPDEs. We also refer to the overview articles [42, 59] and to the monographs [61, 72] for further references on numerical approximation methods for SPDEs.

(4)

In this article we introduce and study a deep learning based approximation algorithm for approximating solutions of possibly high-dimensional SPDEs. In the proposed approxi-mation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. The derivation of the proposed approximation scheme is inspired by the ideas in the articles [2, 3, 48] in which deep learning based algorithms for high-dimensional PDEs have been proposed and studied. We also refer, e.g., to [4, 11, 22, 30, 31, 32, 53, 54, 57, 91, 97] and the references therein for further articles on deep learning based approximation methods for PDEs. We test the performance of the approximation algorithm proposed in this article in the case of stochastic heat equations with additive noise (see Subsection 3.1 below), stochastic heat equations with multiplicative noise (see Subsection 3.2 below), stochastic Black–Scholes equations with multiplicative noise (see Subsection 3.3 below), and Zakai equations from nonlinear filtering (see Subsection 3.4 below). In each of these SPDEs the proposed approximation algorithm produces accurate results with short run times in up to 50 space dimensions.

The remainder of this paper is organized as follows. In Section 2 we derive (see Subsec-tions 2.1–2.6 below) and formulate (see SubsecSubsec-tions 2.7–2.8 below) the proposed approxi-mation algorithm for SPDEs. In Section 3 we test the performance of the proposed approx-imation algorithm (see Subsection 2.8 below) in the case of stochastic heat equations with additive noise (see Subsection 3.1 below), stochastic heat equations with multiplicative noise (see Subsection 3.2 below), stochastic Black–Scholes equations with multiplicative noise (see Subsection 3.3 below), and Zakai equations (see Subsection 3.4 below). In Sec-tion 4 we present the Python source codes which we have used for the numerical simulaSec-tions in Section 3.

2

Derivation and description of the proposed

approx-imation algorithm

Let T (0,), d N, let ϕ: Rd R be a continuous function, let (Ω,F,P) be a

probability space with a normal filtration (Ft)t∈[0,T] (cf., e.g., Liu & R¨ockner [78,

Defi-nition 2.1.11]), let Z: [0, T]×Rd× R be a sufficiently regular random field which satisfies for every x Rd that (Zt(x))t[0,T]: [0, T]× R is an (Ft)t[0,T]-Itˆo process, let µ: Rd Rd, f: Rd ×R×Rd R, and b: Rd×R×Rd R be sufficiently regu-lar functions, let σ: Rd Rd×d be a sufficiently regular and sufficiently non-degenerate function, and let X: [0, T]× Rd× R be a random field which satisfies for every

t [0, T], x Rd that Xt(x) : Ω R is Ft/B(R)-measurable, which satisfies for every

(5)

partial derivatives, and which satisfies1 that for everyt[0, T],xRd it holds

P-a.s. that Xt(x) =ϕ(x) +

Z t

0

f x, Xs(x),(∇Xs)(x)

ds+

Z t

0

b x, Xs(x),(∇Xs)(x)

dZs(x)

+

Z t

0

h

1

2Trace σ(x)[σ(x)]

(HessX

s)(x)

+

µ(x),(Xs)(x)

Rd

i

ds.

(1)

Our goal is to approximately calculate under suitable hypothesis the solution process

X: [0, T]×Rd×Rof the SPDE (1).

2.1

Temporal approximations

In this subsection we discretize the SPDE (1) in time by employing the splitting-up method (cf., for example, [8, 10, 40, 43, 44, 76]) to obtain a semi-discrete approximation problem. To this end let N N, t0, t1, . . . , tN [0, T] be real numbers with

0 =t0 < t1 < . . . < tN =T. (2)

Observe that (1) yields that for every n∈ {0,1, . . . , N 1}, t [tn, tn+1], x ∈Rd it holds

P-a.s. that

Xt(x) =Xtn(x) +

Z t

tn

f x, Xs(x),(∇Xs)(x)

ds+

Z t

tn

b x, Xs(x),(∇Xs)(x)

dZs(x)

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

(HessX

s)(x)

+

µ(x),(Xs)(x)

Rd

i

ds.

(3)

This illustrates for every n ∈ {0,1, . . . , N 1}, t[tn, tn+1],x∈Rd that

Xt(x)≈Xtn(x)

+

Z tn+1

tn

f x, Xtn(x),(∇Xtn)(x)

ds+

Z tn+1

tn

b x, Xtn(x),(∇Xtn)(x)

dZs(x)

+

Z t

tn

h

1

2 Trace σ(x)[σ(x)]

(HessX

s)(x)

+

µ(x),(Xs)(x)

Rd

i

ds.

(4)

This, in turn, suggests for every n ∈ {0,1, . . . , N 1}, t[tn, tn+1], x∈Rd that

Xt(x)≈Xtn(x) +f x, Xtn(x),(∇Xtn)(x)

(tn+1−tn)

+b x, Xtn(x),(∇Xtn)(x)

Ztn+1(x)−Ztn(x)

+ Z t tn h 1

2 Trace σ(x)[σ(x)]

(HessX

s)(x)

+

µ(x),(Xs)(x)

Rd

i

ds.

(5)

1Note that for every d, mN and every (d×m)-matrix A Rd×m

it holds that A∗

∈ Rm×d is the transpose ofA.

(6)

To derive the splitting-up approximation let U: (0, T]×Rd× R be a random field which satisfies for every ω Ω, n ∈ {0,1, . . . , N 1} that (Ut(x, ω))(t,x)∈(tn,tn+1]×Rd ∈

C1,2((t

n, tn+1]×Rd,R) has at most polynomially growing partial derivatives, which satisfies

for everyω Ω,xRdthatRT

0 k(HessUs)(x, ω)kRd×d+k(∇Us)(x, ω)kRdds <∞, and which

satisfies that for every n∈ {0,1, . . . , N 1},t (tn, tn+1], x∈Rd it holdsP-a.s. that Ut(x) =Xtn(x) +f(x, Xtn(x),(∇Xtn)(x)) (tn+1−tn)

+b(x, Xtn(x),(∇Xtn)(x)) (Ztn+1(x)−Ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

(HessU

s)(x)

+

µ(x),(Us)(x)

Rd

i

ds.

(6)

Note that (5) and (6) suggest for every n ∈ {1,2, . . . , N}, xRd that

Utn(x)≈Xtn(x). (7)

Next let V : [0, T] ×Rd × R be a random field which satisfies for every ω Ω,

n ∈ {0,1, . . . , N 1} that (Vt(x, ω))(t,x)∈(tn,tn+1]×Rd ∈ C

1,2((t

n, tn+1]×Rd,R) has at most

polynomially growing partial derivatives, which satisfies for every ω Ω, x Rd that

RT

0 k(HessVs)(x, ω) kRd×d +k(∇Vs)(x, ω)kRdds < ∞, and which satisfies for every n ∈

{0,1, . . . , N 1}, t(tn, tn+1],x∈Rd that V0(x) =ϕ(x) and

Vt(x) =Vtn(x) +f(x, Vtn(x),(∇Vtn)(x)) (tn+1−tn)

+b(x, Vtn(x),(∇Vtn)(x)) (Ztn+1(x)−Ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

(HessV

s)(x)

+

µ(x),(Vs)(x)

Rd

i

ds

(8)

(cf., for example, Deck & Kruse [27], Hairer et al. [47, Section 4.4], Krylov [73, Chapter 8], and Krylov [74, Theorem 4.32] for existence, uniqueness, and regularity results for (6) and (8)). Note that (6) and (8) suggest for every n ∈ {1,2, . . . , N}, xRd that

Vtn(x)≈Utn(x). (9)

Combining this with (7), in turn, suggests for every n∈ {1,2, . . . , N}, xRd that

Vtn(x)≈Xtn(x). (10)

Observe that the random field V is a specific splitting-up type approximation for the random field X (cf., for example, [8, 10, 40, 43, 44, 76]). In the next subsection we derive a Feynman-Kac representation for V given Z (cf., for example, Milstein & Tretjakov [84, Section 2]).

(7)

2.2

An approximate Feynman-Kac type representation

In this subsection we derive a Feynman-Kac representation forV givenZ. More specifically, for every sufficiently regular function z: [0, T]×Rd R let V(z): [0, T]×Rd R satisfy for every n ∈ {0,1, . . . , N 1} that (Vt(z)(x))(t,x)∈(tn,tn+1]×Rd ∈ C

1,2((t

n, tn+1] × Rd,R)

has at most polynomially growing partial derivatives, which satisfies for every x Rd that RT

0 k(HessV (z)

s )(x)kRd×d+k(∇Vs(z))(x)kRdds < ∞, and which satisfies for every n ∈

{0,1, . . . , N 1}, t(tn, tn+1], x∈Rd that V0(z)(x) =ϕ(x) and

Vt(z)(x) =V (z)

tn (x) +f(x,V

(z)

tn (x),(∇V

(z)

tn )(x)) (tn+1−tn)

+b(x,Vt(z)n (x),(∇V

(z)

tn )(x)) (ztn+1(x)−ztn(x))

+

Z t

tn

h

1

2Trace σ(x)[σ(x)]

(HessV(z) s )(x)

+

µ(x),(∇V(z) s )(x)

Rd

i

ds.

(11)

Note that (8) and (11) ensure that for every ω Ω, t[0, T],xRd it holds that

V(Zs(y,ω))(s,y)∈[0,T]×Rd

t (x) =Vt(x, ω). (12)

Combining this with (10) suggests for every sufficiently regular functionz: [0, T]×RdR

and every ωΩ, n ∈ {0,1, . . . , N}, xRd that

V(Zs(y,ω))(s,y)∈[0,T]×Rd

tn (x)≈Xtn(x, ω). (13)

Moreover, note that (10)–(12) suggest for every sufficiently regular functionz: [0, T]×Rd R and every n ∈ {0,1, . . . , N}, xRd that

Vt(z)n (x)≈E

Xtn(x)|Z =z

. (14)

In the following we introduce additional artificial stochastic processes in order to incorpo-rate a Feynman-Kac type representation into (11). Let B: [0, T]×Rd be a standard (Ft)t∈[0,T]-Brownian motion, let ξ: Ω → Rd be an F0/B(Rd)-measurable function which

satisfies for every p (0,), x Rd that P(kξ −xkRd ≤ p) > 0 and E[kξk

p

Rd] < ∞,

assume that Z and (ξ, B) are independent random variables, and let Y : [0, T]× Rd be an (Ft)t∈[0,T]-adapted stochastic process with continuous sample paths which satisfies

that for every t [0, T] it holds P-a.s. that Yt=ξ+

Z t

0

µ(Ys)ds+

Z t

0

σ(Ys)dBs. (15)

Note that the assumption that for every p (0,) it holds that E[kξk

p

Rd] < ∞ and the

assumption that µ: Rd Rd and σ: Rd Rd×d are sufficiently regular functions ensure that for every p(0,) it holds that

sup

t∈[0,T]

E

kYtkpRd

(8)

Moreover, observe that (11) implies that for every sufficiently regular function z: [0, T]× RdR and every n ∈ {0,1, . . . , N 1}, t(tn, tn+1),xRd it holds that

∂ ∂t

Vt(z)(x)

=

µ(x),(∇Vt(z))(x)

Rd+

1

2Trace σ(x)[σ(x)]

(Hess

Vt(z))(x)

. (17) This, in turn, assures that for every sufficiently regular function z: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1}, t(T tn+1, T −tn),x∈Rd it holds that

∂ ∂t

VT(z)−t(x)

+

µ(x),(∇VT(z)t)(x)

Rd +

1

2Trace σ(x)[σ(x)]

(Hess

VT(z)−t)(x)

= 0. (18) Next note that Itˆo’s formula, the hypothesis that for every sufficiently regular function

z: [0, T]×RdR and everyn ∈ {0,1, . . . , N 1} it holds that (V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈

C1,2((t

n, tn+1] ×Rd,R) (cf. (11)), and (15) guarantee that for every sufficiently regular

function z: [0, T]×Rd Rand every n∈ {0,1, . . . , N 1}, r, t[T tn+1, T tn) with

r < t it holds P-a.s. that

VT(z)−t(Yt) =VT(z)−r(Yr) +

Z t

r

(∇VT(z)s)(Ys), σ(Ys)dBs

Rd+

Z t

r ∂ ∂s

VT(z)−s

(Ys)ds

+

Z t

r 1

2Trace σ(Ys)[σ(Ys)]

(Hess

VT(z)−s)(Ys)

ds + Z t r

µ(Ys),(∇VT(z)−s)(Ys)

Rdds.

(19)

Combining this with (18) implies that for every sufficiently regular functionz: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1}, r, t[T tn+1, T tn) with r < t it holds P-a.s. that

VT(z)−t(Yt) = VT(z)−r(Yr) +

Z t

r

(∇VT(z)s)(Ys), σ(Ys)dBs

Rd. (20)

Hence, we obtain that for every sufficiently regular function z: [0, T]×Rd R and every

n ∈ {0,1, . . . , N 1}, t(T tn+1, T −tn) it holds P-a.s. that

VT(z)−t(Yt) =Vt(z)n+1(YT−tn+1) +

Z t

T−tn+1

(∇VT(z)s)(Ys), σ(Ys)dBs

Rd. (21)

Furthermore, note that (16), the hypothesis that σ: Rd Rd×d is a sufficiently regular

function, and the hypothesis that for every sufficiently regular function z: [0, T]×RdR and every n ∈ {0,1, . . . , N 1} it holds that (tn, tn+1]×Rd ∋ (t, x) 7→ (∇Vt(z))(x) ∈ Rd

is an at most polynomially growing function assure that for every n ∈ {0,1, . . . , N 1},

t(T tn+1, T −tn) it holds that

Z t

T−tn+1

E

h

[σ(Ys)]∗(∇V

(z) T−s)(Ys)

2

Rd

i

(9)

Therefore, we obtain that for every sufficiently regular function z: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1}, t(T tn+1, T −tn) it holds P-a.s. that

E

Z t

T−tn+1

(∇VT(z)s)(Ys), σ(Ys)dBs

Rd

FT−tn+1

= 0. (23)

This and (21) demonstrate that for every sufficiently regular function z: [0, T]×Rd R and every n∈ {0,1, . . . , N 1}, t[T tn+1, T −tn) it holds P-a.s. that

E

VT(z)−t(Yt)

FTtn+1

=E

Vt(z)n+1(YT−tn+1)

FTtn+1

. (24)

The fact that for every sufficiently regular function z: [0, T]×Rd R and every n

{0,1, . . . , N1}it holds that the function Ω ω7→ Vt(z)n+1(YT−tn+1(ω))∈RisFT−tn+1/B(R

)-measurable hence implies that for every sufficiently regular function z: [0, T]×Rd R

and every n∈ {0,1, . . . , N 1}, t[T tn+1, T −tn) it holds P-a.s. that E

VT(z)−t(Yt)

FTtn+1

=Vt(z)n+1(YT−tn+1). (25)

Next observe that the hypothesis that for every sufficiently regular functionz: [0, T]×Rd R and every n∈ {0,1, . . . , N 1} it holds that (V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈C

1,2((t

n, tn+1]×

Rd,R) has at most polynomially growing partial derivatives and the fact that for every

ω Ω it holds that [0, T] t7→Yt(ω)∈Rd is a continuous function ensure that for every

sufficiently regular function z: [0, T]×Rd R and every ω Ω, n ∈ {0,1, . . . , N 1} it holds that

lim sup

tրT−tn

V

(z)

T−t(Yt(ω))− VT(z)−t(YT−tn(ω))

= 0. (26)

Furthermore, observe that (11) and the hypothesis that for every sufficiently regular function z: [0, T]×Rd R and every x Rd it holds that RT

0 k(HessV (z)

s )(x)kRd×d +

k(∇Vs(z))(x)kRdds <∞show that for every sufficiently regular functionz: [0, T]×Rd→R

and every ωΩ, n ∈ {0,1, . . . , N 1} it holds that lim sup

tրT−tn

V

(z)

T−t(YT−tn(ω))−

Vt(z)n (YT−tn(ω))

+f YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω))

(tn+1−tn) (27)

+b YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω))

ztn+1(YT−tn(ω))−ztn(YT−tn(ω))

= 0.

Combining this with (26) demonstrates that for every sufficiently regular functionz: [0, T]× RdR and every ω Ω, n ∈ {0,1, . . . , N 1} it holds that

lim sup

tրT−tn

V

(z)

T−t(Yt(ω))−

Vt(z)n (YT−tn(ω))

+f YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω))

(tn+1−tn) (28)

+b YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω))

ztn+1(YT−tn(ω))−ztn(YT−tn(ω))

= 0.

(10)

In addition, note that the hypothesis that for every sufficiently regular functionz: [0, T]× RdRand everyn ∈ {0,1, . . . , N}it holds that (V(z)

t (x))(t,x)∈(tn,tn+1]×Rd ∈C

1,2((t

n, tn+1]×

Rd,R) has at most polynomially growing partial derivatives and (16) ensure that for every sufficiently regular function z: [0, T]×RdR and every p(0,) it holds that

sup

t∈[0,T]

E

kYtkpRd

+

sup

t∈[0,T)

E

|VT(z)−t(Yt)|p

+

sup

t∈(0,T]

E

k(∇Vt(z))(YT−t)kpRd

<. (29) Next note that the fact that for every x R, ω Ω it holds that X0(x, ω) = ϕ(x), the hypothesis that for every sufficiently regular function z: [0, T]×Rd R and every

x Rd it holds that V(z)

0 (x) = ϕ(x), and the hypothesis that for every ω ∈ Ω it holds

that (Xt(x, ω))(t,x)∈[0,T]×Rd ∈C0,2([0, T]×Rd,R) has at most polynomially growing partial

derivatives demonstrate that for every sufficiently regular function z: [0, T]×Rd R it holds that (V0(z)(x))x∈Rd ∈ C2(Rd,R) has at most polynomially growing derivatives. This

and (29) imply that for every sufficiently regular function z: [0, T]×Rd R and every

p(0,) it holds that

sup

t∈[0,T]

E

kYtkpRd

+

sup

t∈[0,T]

E

|VT(z)−t(Yt)| p

+

sup

t∈[0,T]

E

k(∇Vt(z))(YT−t)kpRd

<. (30) The hypothesis that f: Rd×R×RdR and b: Rd×R×RdR are sufficiently regular

functions hence proves that for every sufficiently regular function z: [0, T]×Rd R and every p(0,) it holds that

sup

t∈[0,T]

E

h

Vt(z)(YTt)

pi

+ max

n∈{0,1,...,N−1} E

h

Vt(z)n (YTtn) +f YTtn,Vt(z)n (YTtn),(∇Vt(z)n )(YTtn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn)

pi

<.

(31)

Combining (28) and, e.g., Hutzenthaler et al. [55, Proposition 4.5] therefore demonstrates that for every sufficiently regular functionz: [0, T]×Rd Rand everyn ∈ {0,1, . . . , N1} it holds that

lim sup

tրT−tn

E

h V

(z)

T−t(Yt)−

Vt(z)n (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn)

i

(11)

This and (25) yield that for every sufficiently regular functionz: [0, T]×Rd Rand every

n ∈ {0,1, . . . , N 1} it holdsP-a.s. that

Vt(z)n+1(YT−tn+1)

=E

h

Vt(z)n (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) FTtn+1 i

.

(33)

The tower property for conditional expectations therefore assures that for every sufficiently regular function z: [0, T]×RdR and every n∈ {0,1, . . . , N 1} it holds P-a.s. that

E

h

Vt(z)n+1(YT−tn+1)|S(YT−tn+1)

i

=E

h

Vt(z)n (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) S(YT−tn+1)

i

.

(34)

In addition, observe that the fact that for every n ∈ {0,1, . . . , N 1} it holds that the function Ωω 7→YT−tn+1(ω)∈R

disS(Y

T−tn+1)/B(R

d)-measurable assures that for every

sufficiently regular function z: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1} it holds

P-a.s. that

Vt(z)n+1(YT−tn+1) =E

h

Vt(z)n+1(YT−tn+1)|S(YT−tn+1)

i

. (35)

This and (34) imply that for every sufficiently regular function z: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1} it holdsP-a.s. that

Vt(z)n+1(YT−tn+1)

=E

h

Vt(z)n (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn) S(YT−tn+1)

i

.

(36)

Equation (36) constitutes the Feynman-Kac type representation we were aiming at. In the following subsection we employ the factorization lemma (cf., for example, Becker et al. [6, Lemma 2.1] or Klenke [66, Corollary 1.97]) and the L2-minimality property of conditional

expectations (cf., for example, Klenke [66, Corollary 8.17]) to reformulate (36) as recursive minimization problems.

2.3

Formulation as iterative recursive minimization problems

In this subsection we reformulate (36) as recursive minimization problems. For this observe that (31) shows that for every sufficiently regular function z: [0, T]×Rd R and every

(12)

n ∈ {0,1, . . . , N 1} it holds that E h V (z)

tn (YT−tn) +f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn)

2i

<. (37) The factorization lemma, e.g., in [6, Lemma 2.1] (applied with (S,S) x (Rd,B(Rd)), ΩxΩ, X xYTt

n+1 for n ∈ {0,1, . . . , N −1} in the notation of [6, Lemma 2.1]), theL

2

-minimality property for conditional expectations, e.g., in Klenke [66, Corollary 8.17] (ap-plied withX xω7→ V(z)(YTt

n(ω)) +f(YT−tn(ω),V

(z)

tn (YT−tn(ω)),(∇V

(z)

tn )(YT−tn(ω)))

(tn+1 − tn) +b(YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn))(ztn+1(YT−tn)−ztn(YT−tn)) ∈ R, F x

S(YTt

n+1), A x F in the notation of [66, Corollary 8.17]), the fact that for every

suffi-ciently regular function z: [0, T]×Rd R and every n ∈ {0,1, . . . , N 1} it holds that Rd x7→ V(z)

tn+1(x)∈R is a continuous function, the fact that for every Borel measurable

set A ∈ B(Rd) with positive Lebesgue measure it holds that minn∈{0,1,...,N1}P(YT−t

n+1 ∈

A)>0, and (36) hence imply that for every sufficiently regular functionz: [0, T]×RdR and every n∈ {0,1, . . . , N 1} it holds that

(Vt(z)n+1(x))x∈Rd = argmin

u∈C(Rd,R)

E

h

u(YTtn+1)−

Vt(z)n (YT−tn)

+f YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

(tn+1−tn)

+b YT−tn,V

(z)

tn (YT−tn),(∇V

(z)

tn )(YT−tn)

ztn+1(YT−tn)−ztn(YT−tn)

2i

.

(38)

Therefore, we obtain that for every sufficiently regular function z: [0, T]×Rd R and

every n ∈ {1,2, . . . , N} it holds that (Vt(z)n (x))x∈Rd = argmin

u∈C(Rd,R)

E

h

u(YTtn)−

Vt(z)n−1(YT−tn−1)

+f YT−tn−1,V

(z)

tn−1(YT−tn−1),(∇V

(z)

tn−1)(YT−tn−1)

(tn−tn−1)

+b YT−tn−1,V

(z)

tn−1(YT−tn−1),(∇V

(z)

tn−1)(YT−tn−1)

ztn(YT−tn−1)−ztn−1(YT−tn−1)

2i

.

(39)

In the following subsection we approximate for every sufficiently regular functionz: [0, T]× Rd R and every n ∈ {1,2, . . . , N} the function Rd x7→ V(z)

tn (x)∈R by suitable deep

neural networks.

2.4

Deep neural network approximations

In this subsection we employ for every sufficiently regular function z: [0, T]×RdR and every n ∈ {1,2, . . . , N} suitable approximations for the function

Rdx7→ V(z)

(13)

More specifically, let ν N and for every sufficiently regular function z: [0, T]×Rd R let V(z)n = (V(z)n (θ, x))(θ,x)Rν×Rd: Rν × Rd → R, n ∈ {0,1, . . . , N}, be continuously

differentiable functions which satisfy for every sufficiently regular functionz: [0, T]×Rd R and every θ, xRd that V(z)

0 (θ, x) =ϕ(x). For every sufficiently regular function

z: [0, T]×RdR, every n∈ {1,2, . . . , N},xRd, and every suitable θwe think of V(z)n (θ, x)R as an appropriate approximation

V(z)

n (θ, x)≈ V (z)

tn (x) (41)

ofVt(z)n (x). Combining this and (13) indicates for every sufficiently regular functionz: [0, T]×

RdR, every n ∈ {1,2, . . . , N}, xRd, and every suitable θ Rν that

V(z)

n (θ, x)≈E

Xtn(x)|Z =z

. (42)

For every sufficiently regular functionz: [0, T]×RdRwe propose to choose the functions V(z)n :×Rd R, n ∈ {1,2, . . . , N}, as deep neural networks (cf., for instance, [7, 77]). For example, for every k N let Tk: Rk Rk satisfy for every x= (x1, x2, . . . , xk) Rk that

Tk(x) = tanh(x1),tanh(x2), . . . ,tanh(xk)

(43) (multidimensional version of the tangens hyperbolicus), for every θ= (θ1, θ2, . . . , θν)∈Rν,

v N0 = {0} ∪ N, k, l N with v +lk +l ν let Aθ,v

k,l: Rk → Rl satisfy for every

x= (x1, x2, . . . , xk)∈Rk that

Aθ,vk,l(x) =

      

θv+1 θv+2 . . . θv+k

θv+k+1 θv+k+2 . . . θv+2k

θv+2k+1 θv+2k+2 . . . θv+3k

..

. ... ... ...

θv+(l−1)k+1 θv+(l−1)k+2 . . . θv+lk

              x1 x2 x3 .. . xk        +        θv+lk+1 θv+lk+2 θv+lk+3 .. . θv+lk+l        (44)

(affine function), let s ∈ {3,4,5,6, . . .}, assume that s(N + 1)d(d+ 1) ν, and for every sufficiently regular function z: [0, T]×Rd R let V(z)n :×Rd R, n ∈ {0,1, . . . , N}, satisfy for every n ∈ {1,2, . . . , N}, θ, xRd that V(z)

0 (θ, x) =ϕ(x) and

V(z)

n (θ, x) = (45)

Aθ,(sn+sd,1 −1)d(d+1)◦ Td◦Aθ,(sn+sd,d −2)d(d+1)◦. . .◦ Td◦Aθ,(sn+1)d(d+1)d,d ◦ Td◦Aθ,snd(d+1)d,d

(x).

For every sufficiently regular function z: [0, T]×Rd R and every n ∈ {1,2, . . . , N}the function V(z)n :×Rd R in (45) describes a fully-connected feedforward deep neural network with s+ 1 layers (1 input layer withdneurons, s1 hidden layers withdneurons each, and 1 output layer with 1 neuron) and multidimensional versions of the tangens hyperbolicus as activation functions (see (43)).

(14)

2.5

Stochastic gradient descent based minimization

We intend to find for every sufficiently regular function z: [0, T] × Rd R suitable

θz,1, θz,2, . . . , θz,N Rν in (41) by recursive minimization. More precisely, for every

suf-ficiently regular function z: [0, T] × Rd R we intend to find for n ∈ {1,2, . . . , N},

θz,0, θz,1, . . . , θz,n−1 Rν a suitable θz,n Rν as an approximate minimizer of the function

θ 7→E

h

V(z)n (θ, YTtn)−

V(z)

n−1(θz,n−1, YT−tn−1)

+f YT−tn−1,V

(z)

n−1(θz,n−1, YT−tn−1),(∇xV

(z)

n−1)(θz,n−1, YT−tn−1)

(tn−tn−1)

+b YT−tn−1,V

(z)

n−1(θz,n−1, YT−tn−1),(∇xV

(z)

n−1)(θz,n−1, YT−tn−1)

· ztn(YT−tn−1)−ztn−1(YT−tn−1)

2i

∈R

(46)

(cf. (39) and (41) above). To this end for every sufficiently regular functionz: [0, T]×Rd R let Bz,m: [0, T]× Rd, m N0, be i.i.d. standard (Ft)t[0,T]-Brownian motions, let

ξz,m: Ω Rd, m N

0, be i.i.d. F0/B(Rd)-measurable functions, for every sufficiently

regular function z: [0, T]×Rd R and every m N0 let Yz,m: [0, T]× Rd be an (Ft)t∈[0,T]-adapted stochastic process with continuous sample paths which satisfies that for

every t [0, T] it holds P-a.s. that Ytz,m =ξz,m+

Z t

0

µ(Ysz,m)ds+

Z t

0

σ(Ysz,m)dBsz,m, (47)

let γ (0,), M N, and for every sufficiently regular function z: [0, T]×Rd R let

ϑz,n = (ϑz,n

m )m∈N0: N0×Ω→ Rν, n ∈ {0,1, . . . , N}, be stochastic processes which satisfy

for every n ∈ {1,2, . . . , N}, mN0 that

ϑz,nm+1 =ϑz,nm ·(θV(z)n )(ϑz,nm , Y z,m T−tn)·

h

V(z)

n (ϑz,nm , Y z,m T−tn)−V

(z) n−1(ϑ

z,n−1 M , Y

z,m T−tn−1)

−f YTz,mtn−1,V

(z) n−1(ϑ

z,n−1 M , Y

z,m

T−tn−1),(∇xV

(z) n−1)(ϑ

z,n−1 M , Y

z,m T−tn−1)

(tn−tn−1)

−b YTz,mtn−1,V

(z)

n−1(ϑz,nM −1, Y z,m

T−tn−1),(∇xV

(z)

n−1)(ϑz,nM −1, Y z,m T−tn−1)

· ztn(Y

z,m

T−tn−1)−ztn−1(Y

z,m T−tn−1)

i

(48)

(cf. (66)–(68) below).

2.6

Temporal discretization of the auxiliary stochastic process

Equation (48) provides us an implementable numerical algorithm in the special case where for every sufficiently regular function z: [0, T]×Rd R one can simulate exactly from the solution processes Yz,m: [0, T] × Rd, m N

(15)

(15) above). In the case where it is not possible to simulate for every sufficiently regular function z: [0, T]×Rd R exactly from the solution processes Yz,m: [0, T]× Rd,

mN0, of the SDEs in (47), one can employ a numerical approximation method for SDEs, say, the Euler-Maruyama scheme, to approximatively simulate for every sufficiently regular function z: [0, T]×Rd R from the solution processes Yz,m: [0, T]× Rd, m N0, of the SDEs in (47). This is subject of this subsection. More formally, note that (47) implies that for every sufficiently regular function z: [0, T]×Rd R and every m N0,

r, t[0, T] with rt it holds P-a.s. that Ytz,m =Yrz,m+

Z t

r

µ(Ysz,m)ds+

Z t

r

σ(Ysz,m)dBsz,m. (49) Hence, we obtain that for every sufficiently regular function z: [0, T]×Rd R and every

mN0, n ∈ {0,1, . . . , N 1} it holdsP-a.s. that YTz,mtn =YTz,mtn+1 +

Z T−tn

T−tn+1

µ(Ysz,m)ds+

Z T−tn

T−tn+1

σ(Ysz,m)dBsz,m. (50) This shows that for every sufficiently regular functionz: [0, T]×RdRand everymN0,

n ∈ {0,1, . . . , N 1} it holdsP-a.s. that YTz,mtN

−(n+1) =Y

z,m T−tN−n+

Z T−tN−(n+1)

T−tN−n

µ(Ysz,m)ds+

Z T−tN−(n+1)

T−tN−n

σ(Ysz,m)dBsz,m. (51) Next we introduce suitable real numbers which allow us to reformulate (51) in a more compact way. More formally, let τn ∈ [0, T], n ∈ {0,1, . . . , N}, satisfy for every n ∈

{0,1, . . . , N}that

τn =T −tN−n. (52)

Observe that (2) and (52) ensure that

0 =τ0 < τ1 <· · ·< τN =T. (53)

Moreover, note that (51) and (52) demonstrate that for every sufficiently regular function

z: [0, T]×RdR and every m N0,n ∈ {0,1, . . . , N 1} it holdsP-a.s. that Yτz,mn+1 =Yτz,mn +

Z τn+1

τn

µ(Ysz,m)ds+

Z τn+1

τn

σ(Ysz,m)dBsz,m. (54) This suggests for every sufficiently regular function z: [0, T]×RdRand every mN

0,

n ∈ {0,1, . . . , N 1} that

Yτz,mn+1 Yτz,mn +µ(Yτz,mn ) (τn+1−τn) +σ(Yτz,mn ) (B

z,m τn+1−B

z,m

(16)

Based on (55) we now introduce for every sufficiently regular function z: [0, T]×Rd R suitable Euler-Maruyama approximations for the solution processes Yz,m: [0, T]×

Rd, m N0, of the SDEs in (47). More formally, for every sufficiently regular function

z: [0, T]×RdRand every mN0 let Yz,m = (Yz,m

n )n∈{0,1,...,N}: {0,1, . . . , N} ×Ω→Rd

be the stochastic process which satisfies for every n ∈ {0,1, . . . , N 1} that Y0z,m =ξz,m

and

Yn+1z,m =Ynz,m+µ(Ynz,m) (τn+1−τn) +σ(Ynz,m) (Bτz,mn+1 −B

z,m

τn ). (56)

Observe that (52), (55), and (56) suggest for every sufficiently regular function z: [0, T]× RdR and every m N0, n∈ {0,1, . . . , N}that

Ynz,m ≈Yτz,mn =Y

z,m

T−tN−n. (57)

This, in turn, suggests for every sufficiently regular function z: [0, T]×RdRand every

mN0, n ∈ {0,1, . . . , N} that

YTz,mtn ≈ YNz,mn. (58) In the next step we employ (58) to derive for every sufficiently regular function z: [0, T]× RdR approximations of the stochastic processes ϑz,n: N

0×Ω→Rν, n ∈ {0,1, . . . , N},

in (48) which are also implementable in the case where one cannot simulate exactly from the solution processes of the SDEs in (47). More precisely, for every sufficiently regular function z: [0, T]×Rd R let Θz,n = (Θz,n

m )m∈N0: N0 ×Ω → R, n ∈ {0,1, . . . , N}, be

stochastic processes which satisfy for every sufficiently regular function z: [0, T]×RdR and every n∈ {1,2, . . . , N},m N0 that

Θz,nm+1 = Θz,nm −2γ·(∇θV(z)n )(Θz,nm ,Y z,m N−n)·

h

V(z)

n (Θz,nm ,Y z,m N−n)−V

(z) n−1(Θ

z,n−1 M ,Y

z,m N−n+1)

−f YNz,mn+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV (z) n−1)(Θ

z,n−1 M ,Y

z,m N−n+1)

(tn−tn−1)

−b YNz,mn+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(59)

· ztn(Y

z,m

N−n+1)−ztn−1(Y

z,m N−n+1)

i

(cf. (66)–(68) below). Note that (48) , (58), and (59) indicate for every sufficiently regular function z: [0, T]×Rd R, every n ∈ {1,2, . . . , N}, and every sufficiently large m N0 that

Θz,nm ≈ϑz,nm . (60)

In the following two subsections (Subsection 2.7 and Subsection 2.8) we merge the above derivations to precisely formulate the proposed approximation algorithm, first, in a special case (Subsection 2.7) and, thereafter, in the general case (Subsection 2.8).

(17)

2.7

Description of the proposed approximation algorithm in a

special case

In this subsection we describe the proposed approximation algorithm in the special case where the standard Euler-Maruyama scheme (cf., e.g., Kloeden & Platen [67] and Maruyama [82]) is the employed approximation scheme for discretizing (47) (cf. (56)) and where the plain vanilla stochastic gradient descent method with constant learning rate γ (0,) and batch size 1 is the employed minimization algorithm. A more general description of the proposed approximation algorithm, which allows to incorporate more sophisticated machine learning approximation techniques such as batch normalization (cf., for instance, Ioffe & Szegedy [56]) and the Adam optimizer (cf., for example, Kingma & Ba [65]), can be found in Subsection 2.8 below.

Framework 2.1 (Special case). Let T, γ (0,), d, N, M N, ϕ C2(Rd,R), s

{3,4,5, . . .}, ν =s(N + 1)d(d+ 1), t0, t1, . . . , tN ∈[0, T] satisfy

0 =t0 < t1 < . . . < tN =T, (61)

let τ0, τ1, . . . , τn ∈ [0, T] satisfy for every n ∈ {0,1, . . . , N} that τn = T − tN−n, let

f: Rd×R×Rd R, b: Rd ×R×Rd R, µ: Rd Rd, and σ: Rd Rd×d be

con-tinuous functions, let (Ω,F,P,(Ft)t∈[0,T]) be a filtered probability space, for every function z: [0, T]×RdR let ξz,m: ΩRd, mN0, be i.i.d. F0/B(Rd)-measurable random

vari-ables, let Bz,m: [0, T]× Rd, m N

0, be i.i.d. standard (Ft)t∈[0,T]-Brownian motions,

let Yz,m: {0,1, . . . , N} ×Rd, mN

0, satisfy for everym∈N0, n∈ {0,1, . . . , N−1}

that Y0z,m =ξz,m and

Yn+1z,m =Ynz,m+µ(Ynz,m) (τn+1−τn) +σ(Ynz,m) (Bτz,mn+1 −B

z,m

τn ), (62)

let Td: Rd→Rd satisfy for every x= (x1, x2, . . . , xd)∈Rd that

Td(x) = tanh(x1),tanh(x2), . . . ,tanh(xd)

, (63)

for every θ = (θ1, θ2, . . . , θν) ∈ Rν, k, l ∈ N, v ∈ N0 = {0} ∪N with v +lk+l ≤ ν let

Aθ,vk,l: Rk Rl satisfy for every x= (x1, x2, . . . , xk)Rk that

Aθ,vk,l(x) =

θv+kl+1+

k P

i=1

xiθv+i

, . . . , θv+kl+l+

k P

i=1

xiθv+(l−1)k+i , (64)

for every function z: [0, T]×RdR let V(z)n :×RdR, n∈ {0,1, . . . , N}, satisfy for

every n∈ {1,2, . . . , N}, θ , xRd thatV(z)

0 (θ, x) =ϕ(x) and

V(z)

n (θ, x) = (65)

Aθ,(sn+sd,1 −1)d(d+1)◦ Td◦Aθ,(sn+sd,d −2)d(d+1)◦. . .◦ Td◦Aθ,(sn+1)d(d+1)d,d ◦ Td◦Aθ,snd(d+1)d,d

(18)

for every function z: [0, T]× Rd R let Θz,n: N0 × , n ∈ {0,1, . . . , N}, be

stochastic processes, for every function z: [0, T]×Rd R and every n ∈ {1,2, . . . , N},

mN0 let φz,n,m:×R satisfy for every θ , ω that

φz,n,m(θ, ω) =hV(z)

n θ,Y z,m N−n(ω)

−V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)) −(tn−tn−1)

·f YNz,mn+1(ω),V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)),(∇xV(z)n−1)(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω))

−b YNz,mn+1(ω),V(z)

n−1(Θ z,n−1 M (ω),Y

z,m

N−n+1(ω)),(∇xV (z) n−1)(Θ

z,n−1 M (ω),Y

z,m

N−n+1(ω))

· ztn(Y

z,m

N−n+1(ω))−ztn−1(Y

z,m

N−n+1(ω))

i2

,

(66)

for every functionz: [0, T]×RdRand every n∈ {1,2, . . . , N}, mN0 let Φz,n,m:×satisfy for every θ , ω that Φz,n,m(θ, ω) = (θφz,n,m)(θ, ω), and assume

for every function z: [0, T]×Rd R and every m N0, n ∈ {1,2, . . . , N} that

Θz,nm+1 = Θz,nm γ·Φz,n,m(Θz,nm ). (67) In the setting of Framework 2.1 we note that for every function z: [0, T]×RdR and every n ∈ {1,2, . . . , N}, m N0 it holds that

Θz,nm+1 = Θz,nm −2γ·(∇θV(z)n )(Θz,nm ,YNz,m−n)·

h

V(z)n z,nm ,Yz,m

N−n)−V (z)

n−1(Θz,nM −1,Y z,m N−n+1)

−f YNz,mn+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(tn−tn−1)

−b YNz,mn+1,V(z)

n−1(Θ z,n−1 M ,Y

z,m

N−n+1),(∇xV(z)n−1)(Θ z,n−1 M ,Y

z,m N−n+1)

(68)

· ztn(Y

z,m

N−n+1)−ztn−1(Y

z,m N−n+1)

i

(cf. (48), (59), (66), and (67) above). Moreover, in the setting of Framework 2.1 we think under suitable hypothesis for sufficiently largeN, M N, for sufficiently smallγ (0,), for every sufficiently regular function z: [0, T]×RdR, and for every n∈ {0,1, . . . , N},

xRd of V(z)n z,n

M , x) : Ω→R as a suitable approximation

V(z)

n (Θ z,n

M , x)≈E

Xtn(x)|Z =z

(69) of E[Xt

n(x)|Z = z] where X: [0, T]×R

d × R is a random field which satisfies for

every t [0, T], x Rd that Xt(x) : Ω R is Ft/B(R)-measurable, which satisfies for every ω Ω that (Xt(x, ω))(t,x)∈[0,T]×Rd ∈ C0,2([0, T]×Rd,R) has at most polynomially

growing partial derivatives, and which satisfies that for every t [0, T], x Rd it holds

P-a.s. that

Xt(x) =ϕ(x) +

Z t

0

f x, Xs(x),(∇Xs)(x)

ds+

Z t

0

b x, Xs(x),(∇Xs)(x)

dZs(x)

+

Z t

0

h

1

2Trace σ(x)[σ(x)]

(HessXs)(x)

+µ(x),(Xs)(x)

Rd

i

ds

(70)

whereZ: [0, T]×Rd×Ris a sufficiently regular random field (cf. (1), (13), (14), (41), and (42)).

(19)

2.8

Description of the proposed approximation algorithm in the

general case

In this subsection we present in Framework 2.2 a general approximation algorithm which includes the proposed approximation algorithm derived in Subsections 2.1–2.7 above as a special case. Compared to Framework 2.1, Framework 2.2 allows to incorporate other minimization algorithms than just the plain vanilla stochastic gradient descent method (see, e.g., (67) in Framework 2.1 in Subsection 2.7 above) such as, for example, the Adam optimizer (cf. Kingma & Ba [65]). Furthermore, Framework 2.2 also allows to incorporate more advanced machine learning techniques like batch normalization (cf. Ioffe & Szegedy [56] and (74) below).

Framework 2.2 (General case). Let T (0,), M, N, d, δ, ̺, ν, ς N, (Jm)mN0 ⊆ N,

t0, t1, . . . , tN ∈[0, T]satisfy 0 = t0 < t1 < . . . < tN =T, letτ0, τ1, . . . , τn∈[0, T]satisfy for

every n∈ {0,1, . . . , N}thatτn=T−tN−n, letf: Rd×R×Rd→R, b: Rd×R×Rd→Rδ,

H: [0, T]2×Rd×Rd Rd, andH

n: Rd×R×Rd×Rδ →R,n ∈ {1,2, . . . , N}, be functions,

let (Ω,F,P) be a probability space with a normal filtration (Ft)t∈[0,T], for every function z: [0, T]×Rd and every n ∈ {1,2, . . . , N} let Bz,n,m,j: [0, T]× Rd, m N0,

j N, be i.i.d. standard (Ft)t[0,T]-Brownian motions, for every function z: [0, T]×Rdand every n ∈ {1,2, . . . , N} let ξz,n,m,j: Ω Rd, m N0, j N, be i.i.d. F0/B(Rd)

-measurable random variables, for every functionz: [0, T]×RdletVz,j,s

n : Rν×Rd →R,

j N, s, n∈ {0,1, . . . , N}, be functions, for every function z: [0, T]×Rdand

every n∈ {1,2, . . . , N}, mN0, j Nlet Yz,n,m,j: {0,1, . . . , N} ×Rdbe a stochastic

process which satisfies for every k ∈ {0,1, . . . , N 1} that Y0z,n,m,j =ξz,n,m,j and

Yk+1z,n,m,j =H(τk+1, τk,Y z,n,m,j k , B

z,n,m,j τk+1 −B

z,n,m,j

τk ), (71)

for every functionz: [0, T]×RdletΘz,n: N0×,n∈ {0,1, . . . N}, be stochastic

processes, for every function z: [0, T]×Rd and every n ∈ {1,2, . . . , N}, m N0,

slet φz,n,m,s

: Rν ×R satisfy for every θ , ω that

φz,n,m,s

(θ, ω) = 1

Jm Jm

X

j=1

Vz,j,s

n θ,Y z,n,m,j N−n (ω)

− Hn

YNz,n,m,j−n+1(ω),V z,j,s

n−1 Θz,nM −1(ω),Y z,n,m,j N−n+1(ω)

, (72)

(xVz,j,

s

n−1) Θ z,n−1 M (ω),Y

z,n,m,j N−n+1(ω)

, ztn Y

z,n,m,j N−n+1(ω)

−ztn−1 Y

z,n,m,j N−n+1(ω)

2

, for every function z: [0, T]× Rd Rδ and every n ∈ {1,2, . . . , N}, m N

0, s ∈ Rς

let Φz,n,m,s

: Rν × satisfy for every ω , θ ∈ {η : φz,n,m,s

(·, ω) : Rν R is differentiable at η} that

Φz,n,m,s

(θ, ω) = (θφz,n,m,

s

Figure

Table 1: Numerical simulations for the stochastic heat equations with additive noise in (81).
Table 2: Numerical simulations for the stochastic heat equations with multiplicative noise in (88).
Table 3: Numerical simulations for the stochastic Black–Scholes equations with multiplica- multiplica-tive noise in (97)
Table 4: Numerical simulations for the Zakai equations in (133).

References

Related documents

To provide the integrity, authenticity, authority, and privacy on the data sharing in the cloud storage, the notion of an aggregatable certificateless designated verifier

According to Ferreir´ os (2011: § 2), Cantor maintained that the Well-Ordering Theorem is “a fundamental and momentous law of thought.” Zermelo clearly must have disagreed because

This pattern was designed trough the explicit addressing of the particular cooperation partner. Instead of using, for example, the turn indicator outside the vehicles, which

• Development Proposal Search Screen -- enter search criteria • List window -- view awards retrieved by your selection. The advantage of two separate windows is that your list

This study has revealed that morphine is commonly prescribed to control pain in at-home cancer patients by doctors who support home care medicine in Japan.. The restricted

For an input dataset that is stored in MongoDB and needs to be ana- lyzed, best performance can be achieved when data is read from MongoDB and the output is written to HDFS as it

Caller class – Permission testing not performed Class called – Permission testing performed Caller class – Permission testing not performed. Class called – Permission

Bayesian modelling of the time delay between diagnosis and settlement for Critical Illness Insurance using a Burr generalised-linear-type model. Insurance: Mathematics and