

arXiv:2012.03081v1 [math.PR] 5 Dec 2020

NON-MARKOVIAN AND APPLICATIONS

LOURIVAL LIMA, PAULO RUFFINO, AND FRANCYS SOUZA

Abstract. In this survey we present the near-optimal stochastic control problem according to some recent tools in the literature. In particular, we focus on the approach of a discretization of the noise values instead of the canonical time-discretization. This is the so-called skeleton structure. It allows one to obtain an ε-optimal control in non-Markovian systems (the main theorem). A simple example illustrates the technique. The importance of the approach is emphasised in a final section on open problems related to a more geometrical framework and discontinuous noise.

1. Introduction

Control theory plays a major role in most applications of differential equations to physical systems. The theory is crucial whenever one can control, either constantly in time or in a time-dependent way, one or more parameters of a system, such as temperature, pressure, concentration of substances, investments, humidity, position and velocity of autonomous vehicles, satellites, electromagnetic parameters, or vaccination in a population, just to mention a few. In the last few decades of the 20th century, emboldened by the maturity of deterministic control theory and the constant improvement of stochastic analysis and stochastic dynamics, the theory of stochastic control started to develop rapidly, thanks also to a countless number of relevant applications. For deterministic control theory, among hundreds of excellent introductory texts, we mention e.g. Colonius and Kliemann [8], Bullo and Lewis [5], Bacciotti [3] and references therein; for stochastic analysis, dynamics and control, see e.g. Arnold [1], Oksendal [24], Protter [25] and references therein.

Control theory in stochastic systems is fascinating in the sense that, although the outcome is random and unpredictable, in many cases its law as a random variable can be controlled. This means that many useful properties and tools can be applied in order to optimise the chance that the outcomes are favourable. The purpose of this review is to show some of these tools and, in particular, to show an application which describes an algorithm to obtain an ε-sub-optimal control. The open problems in the final section show how challenging and significant this topic is.

This review is organised as follows: after presenting the general set-up in the Introduction, in Section 2 we introduce the main framework and discretization structure. Section 3 presents recent tools and results concerning near optimality (Theorem 3.1). A toy model illustrates the algorithm of the main theorem. Section 4 summarises some relevant open questions in a wider scenario of near-optimal stochastic control: we discuss the importance of the multiplicative equation and the case of discontinuous noise. Namely, the first open problem regards stochastic control systems in a Lie group, intended to include matrix products; the second problem concerns Lévy noise.

1.1. The set-up and the model. We consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, where the σ-algebra filtration $\mathbb{F} := (\mathcal{F}_t)$ is generated by a stochastic process $Z_t$, $t\in[0,T]$ (the noise), which is a functional of the Brownian motion in the real line, including non-Markovian and non-semimartingale processes controlled by mutually singular measures. We shall denote by $X^u(t)$ the trajectory (the outcome) in the state space corresponding to a certain control function $u$. Our model includes functional dependence on the past trajectory (see more related to pseudo-Markovian equations in [7]). That is:

Date: November 2020.


at any time $t\in[0,T]$, the whole past trajectory of the system in $[0,t]$, denoted by the sub-index $X^u_t$, is considered in the equation. Hence, we have $X^u_t:\Omega\times[0,t]\to\mathbb{R}^n$ in such a way that $X^u_t(s)=X^u_{t'}(s)$ for all $s\le t\le t'$; in particular $X^u_t(t)=X^u(t)$. We assume that $X^u(t)$ is modelled by the following functional stochastic differential equation:

(1.1) $dX^u(t)=\alpha(t,X^u_t,u(t))\,dt+\sigma(t,X^u_t,u(t))\,dZ(t),$

where $\alpha$ and $\sigma$ are sufficiently regular vector fields depending on $t$, on the past trajectories $X^u_t$ and on the control function $u$. More precisely, in our non-anticipative model, the control function $u:\Omega\times[0,T]\to\mathbb{R}^k$ lives in the space $U^T$ of bounded, $\mathcal{F}_t$-adapted stochastic processes. At each time $t\in[0,T]$, the "best future" control segment will be looked for in the space $U^T_t$; $0\le t\le T$, a suitable family of admissible $\mathbb{F}$-adapted controls defined over $(t,T]$, such that $U^T:=U^T_0$.
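To make the role of the past trajectory in equation (1.1) concrete, the following is a minimal simulation sketch, not the method of this survey. It assumes a scalar state, takes the noise $Z$ to be a Brownian motion, and uses a plain Euler scheme on a uniform time grid; the coefficients `alpha`, `sigma` and the feedback `control` are hypothetical choices made only for illustration.

```python
import math
import random

def simulate_functional_sde(alpha, sigma, control, x0, T=1.0, n_steps=200, seed=0):
    """Euler scheme for dX = alpha(t, X_t, u) dt + sigma(t, X_t, u) dZ (scalar state).

    Assumption of this sketch: Z is a standard Brownian motion.
    The coefficients receive the whole past path, not only the current state.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    path = [x0]                                  # path[i] approximates X(i * dt)
    for i in range(n_steps):
        t = i * dt
        u = control(t, path)                     # non-anticipative: sees the past only
        dz = rng.gauss(0.0, math.sqrt(dt))       # Brownian increment over [t, t + dt]
        path.append(path[-1] + alpha(t, path, u) * dt + sigma(t, path, u) * dz)
    return path

# Hypothetical path-dependent drift: pull the state towards its running average.
def alpha(t, past, u):
    running_mean = sum(past) / len(past)
    return u * (running_mean - past[-1])

def sigma(t, past, u):
    return 0.3                                   # constant diffusion coefficient

def control(t, past):
    return max(-1.0, min(1.0, 1.0 - past[-1]))   # bounded feedback with values in [-1, 1]

path = simulate_functional_sde(alpha, sigma, control, x0=1.0)
```

The control is evaluated before the noise increment is drawn, which mirrors the adaptedness requirement on $u$.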

The aim of the controller is to optimise the expectation of a given functional $\xi$, say, the payoff, energy, cost, distance etc. More precisely, denote by $C([0,T])$ the set of continuous functions from $[0,T]$ to the state space $\mathbb{R}^n$; then $\xi: C([0,T])\to\mathbb{R}$ is a Borel (payoff, say) functional. Our aim is, theoretically, to find a control function $u(t)$ which achieves the optimal result:

$\sup_{u\in U^T} E[\xi(X^u)].$

It is well known in the literature that such an optimal control function may not exist, see e.g. [22] or [33]. In view of this fact, we look for an algorithm which provides a sufficiently close approximation in the sense that, given any $\epsilon>0$, it is possible to construct a control $u^*$ such that

(1.2) $E\big[\xi(X^{u^*})\big] > \sup_{u\in U^T} E[\xi(X^u)]-\epsilon.$

The original problem of calculating an optimal control is essentially an infinite-dimensional nonlinear optimisation question for which explicit solutions are not available. Our strategy is to reduce it, via a convenient type of discretization, to a finite-dimensional problem such that near-optimal (i.e. ε-optimal) adapted controls satisfying equation (1.2) can be calculated. A proof of convergence of the discretized version to the original continuous problem can also be given. Note that in this model the functional $\xi$ depends only on the state space. This is not a big restriction since, by increasing the dimension of the state space $\mathbb{R}^n$, one can include dependence of $\xi$ also on the control $u(t)$, on time $t$, or on any other relevant parameter of the model.

2. Classical results and new tools

In this section, we summarise some of the classical results in the literature concerning stochastic optimal control and the respective numerical methods. In the Markovian case, the traditional approach is based on Hamilton-Jacobi-Bellman (HJB) PDEs and on discretizations into Markov chains, see e.g. [15, 17]. We also recall that the use of PDE techniques is numerically efficient only in low dimensions.

Beyond the Markovian context, the value process cannot be characterised by HJB-type PDEs and, consequently, the solution of the stochastic optimal control problem is much more demanding. For details, see e.g., among many others, [23, 26, 30, 28, 32, 27]. The PDE approach in the sense of [10, 9] for the problem of non-Markovian control in continuous time is, in general, restricted to the theoretical characterisation of the value process, and does not lead to a numerical algorithm for the concrete resolution of the problem, that is, for finding optimal or approximately optimal strategies. Such an approach is not computationally tractable, especially in situations in which the state of the system is high-dimensional and non-elliptic.

Recently, a theory of functional stochastic calculus and control applications has been developed in a series of studies [18, 19, 20, 21, 22]. The main advantage of this approach over other methods lies in the numerically viable description of stochastic systems controlled in continuous time, especially in problems of optimal control, possibly non-Markovian and of high dimension. In particular, in [22], the authors develop a discretization method that allows one to calculate near-optimal controls for a given optimal control problem over a time horizon $[0,T]$ with $0<T<\infty$, via a discrete-time PDE.

From now on, we describe the relevant aspects of this construction. In a few words: the main idea in this new approach is to consider a discretization based on a partition in the values of the noise, instead of a partition in the time line. The control function is then updated towards best performance at each stopping time associated to this partition. The updating of the control is performed using dynamic programming (DP). The advantage of this approach is that it naturally gives us a numerical algorithm for simulation (see the numerical example in Section 3.1 below).

For a pair of finite $\mathbb{F}$-stopping times $(M,N)$, we denote the random intervals in the standard way: $]]M,N]] := \{(\omega,t);\ M(\omega)<t\le N(\omega)\}$ and $]]M,+\infty[[\ := \{(\omega,t);\ M(\omega)<t<+\infty\}$. The control function is bounded with values in the so-called action space, which is going to be the compact cube

$A := \{(x_1,\dots,x_r)\in\mathbb{R}^r;\ \max_{1\le i\le r}|x_i|\le a\}$

for some $0<a<+\infty$.

In order to set up the basic structure of our control problem, we first need to define the class of admissible control processes: for each pair $(M,N)$ of a.s. finite $\mathbb{F}$-stopping times such that $M<N$ a.s., we denote

$U^N_M := \{\mathbb{F}\text{-predictable processes } u: ]]M,N]]\to A;\ \text{such that the limit } u(M+) \text{ exists}\}.$

Note that, whenever necessary, we can always extend a given $u\in U^N_M$ by setting $u=0$ in the complement of the random interval $]]M,N]]$. Elements in the family of processes in $U^N_M$ satisfy the following straightforward properties:

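1): Restriction: $u\in U^N_M \Rightarrow u|_{]]M,P]]}\in U^P_M$ for $M<P\le N$ a.s.

2): Concatenation: If $u\in U^N_M$ and $v\in U^P_N$ for $M<N<P$ a.s., then $(u\otimes_N v)(\cdot)\in U^P_M$, where

$(u\otimes_N v)(r) := \begin{cases} u(r); & \text{if } M<r\le N \\ v(r); & \text{if } N<r\le P. \end{cases}$

3): Finite composition: For every $u,v\in U^N_M$ and $K\in\mathcal{F}_M$, we have $u 1_K + v 1_{K^c}\in U^N_M$.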
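The three properties above can be illustrated pathwise. The sketch below fixes one sample point $\omega$, so that the stopping times $M<N<P$ become plain numbers and a control becomes a function of time only; this simplification, and the names `restrict`, `concatenate` and `compose`, are assumptions made for illustration.

```python
def restrict(u, M, P):
    """Restriction of u to (M, P]; extended by 0 outside, as allowed in the text."""
    return lambda r: u(r) if M < r <= P else 0.0

def concatenate(u, v, M, N, P):
    """(u ⊗_N v)(r): follow u on (M, N] and v on (N, P]; 0 outside (M, P]."""
    def w(r):
        if M < r <= N:
            return u(r)
        if N < r <= P:
            return v(r)
        return 0.0
    return w

def compose(u, v, in_K):
    """Finite composition u 1_K + v 1_{K^c}; in_K says whether the fixed omega lies in K."""
    return lambda r: u(r) if in_K else v(r)

u = lambda r: 0.5      # hypothetical constant controls with values in A (a = 1)
v = lambda r: -1.0
w = concatenate(u, v, M=0.0, N=0.4, P=1.0)
```

Here `w` follows `u` up to and including time 0.4 and `v` strictly afterwards, matching the half-open intervals in the definition.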

By $U_M$ we mean $U^\infty_M$. For each $1\le p<\infty$, let $L^p_a(\mathbb{P}\times \mathrm{Leb})$ be the Banach space of all $\mathbb{F}$-adapted $\mathbb{R}^n$-valued processes $Y$ such that

$E\int_0^T \|Y(t)\|^p\,dt<\infty.$

Consider $B^p(\mathbb{F})$, the space of $\mathbb{F}$-adapted processes $Y$ such that

$\|Y\|^p_{B^p} := E\sup_{0\le t\le T}\|Y(t)\|^p<\infty.$

Definition 2.1. A continuous controlled functional is a map $X: U_0\to L^p_a(\mathbb{P}\times \mathrm{Leb})$ for some $p\ge 1$, such that for each $t\ge 0$ and $u\in U_0$, $\{X(s,u);\ 0\le s\le t\}$ depends on the control $u$ only on $(0,t]$ and $X(\cdot,u)$ has continuous paths for each $u\in U_0$.

In many interesting examples, a continuous functional is in fact a Wiener functional. Our discretization technique will require that we consider (payoff) functionals also in the space of discontinuous trajectories. We shall denote by $D^n_T$ the space of càdlàg $\mathbb{R}^n$-valued functions on the interval $[0,T]$ (recall that the French acronym càdlàg refers to continuity on the right with limit on the left). We endow the linear space $D^n_T$


We shall assume the following hypotheses, which concern a sort of continuity of the payoff functional $\xi$ and of the controlled functional $X(\cdot,u)$ with respect to appropriate topologies.

Assumption 1. (Hölder continuity of $\xi$) There exist $\gamma\in(0,1]$ and a constant $C>0$ such that

$|\xi(f)-\xi(g)| \le C\Big(\sup_{0\le t\le T}\|f(t)-g(t)\|\Big)^{\gamma}$

for every $f,g\in D^n_T$.

Assumption 2. (Continuity of $u\mapsto X(\cdot,u)$) There exists a constant $C$ such that

$\|X(\cdot,u)-X(\cdot,v)\|^2_{B^2(\mathbb{F})} \le C\,E\int_0^T \|u(s)-v(s)\|^2_{\mathbb{R}^r}\,ds$

for every $u,v\in U_0$.

From now on, we are going to fix a controlled functional $X: U^T_0\to B^2(\mathbb{F})$. For a given functional $\xi: D^n_T\to\mathbb{R}$, we denote

$\xi_X(u) := \xi\big(X(\cdot,u)\big);\quad u\in U_0,$

and the value process:

(2.1) $V(t,u) := \operatorname{ess\,sup}_{v\in U^T_t} E\big[\xi_X(u\otimes_t v)\,\big|\,\mathcal{F}_t\big];\quad 0\le t<T,\ u\in U_0,$

where $V(T,u) := \xi_X(u)$ a.s. and the process $V(\cdot,u)$ has to be viewed backwards. Throughout this paper, in order to keep notation simple, we omit the dependence of the value process in (2.1) on the controlled functional $X$ and on the payoff $\xi$, in such a way that $V$ means the mapping $V: U_0\to L^1(\mathbb{P}\times \mathrm{Leb})$.

Now, we introduce what we call a controlled embedded discrete structure, see, e.g. [22]. The heuristic is to view a controlled functional $u\mapsto Y(\cdot,u)$ as a family of simplified models which one has to build in order to extract the relevant information, get a concrete description of value processes and construct their associated ε-optimal controls.

The discretization procedure will be based on a class of pure jump processes driven by suitable waiting times which describe the local behaviour of the Brownian motion. We briefly recall the basic properties of this skeleton (see, among others, [22] and references therein). We start by constructing a sequence of stopping times $\mathcal{T} := \{T^k_n;\ n\ge 0\}$ which is the basis for our discretization scheme. We set $T^k_0 := 0$ and

$T^k_n := \inf\{T^k_{n-1}<t<\infty;\ \|B(t)-B(T^k_{n-1})\|_\infty=\epsilon_k\},\quad n\ge 1,$

where $\sum_{k\ge 1}\epsilon_k^2<\infty$ and $\|\cdot\|_\infty$ is the $\ell^\infty$ norm in $\mathbb{R}^n$. This implies that

$\Delta T^k_n := T^k_n - T^k_{n-1} = \min_{j\in\{1,2,\dots,d\}}\{\Delta T^{k,j}_n\}$ a.s.

where, coordinate-wise, for $j=1,\dots,d$

$\Delta T^{k,j}_n := \inf\{0<t<\infty;\ |B^j(t+T^k_{n-1})-B^j(T^k_{n-1})|=\epsilon_k\},\quad n\ge 1.$

Then, we define the discretization of the Brownian motion $A^k := (A^{k,1},\dots,A^{k,d})$ by

$A^{k,j}(t) := \sum_{n=1}^{\infty}\big(B^j(T^k_n)-B^j(T^k_{n-1})\big)1_{\{T^k_n\le t\}};\quad t\ge 0,\ j=1,\dots,d,$

for every $k\ge 1$.
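The construction of the stopping times $T^k_n$ and of the jump process $A^k$ can be sketched numerically in dimension $d=1$. The code below is only an approximation under stated assumptions: the Brownian path is simulated on a fine uniform grid of mesh `dt`, the first grid time at which the increment reaches $\epsilon_k$ is used as a proxy for the hitting time, and the small overshoot caused by the grid is ignored so that every recorded jump has size exactly $\epsilon_k$. The function names are hypothetical.

```python
import math
import random

def skeleton_1d(eps, T=1.0, dt=1e-4, seed=0):
    """Approximate stopping times T_n and jumps of A^k for a 1-d Brownian path on [0, T]."""
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    sd = math.sqrt(dt)
    b, anchor = 0.0, 0.0            # current B(t) and the value of B at the last stopping time
    times, jumps = [0.0], []        # T_0 = 0
    for i in range(1, n_steps + 1):
        b += rng.gauss(0.0, sd)
        if abs(b - anchor) >= eps:                  # the increment has left (-eps, eps)
            jump = math.copysign(eps, b - anchor)   # record +-eps, ignoring grid overshoot
            anchor += jump
            times.append(i * dt)
            jumps.append(jump)
    return times, jumps

def A_k(t, times, jumps):
    """Value at time t of the pure jump process: sum of the jumps with T_n <= t."""
    return sum(j for s, j in zip(times[1:], jumps) if s <= t)

eps_k = 2.0 ** (-2)
times, jumps = skeleton_1d(eps_k)
waiting = [t1 - t0 for t0, t1 in zip(times, times[1:])]   # the waiting times Delta T_n
```

The list `waiting` gives an empirical sample of the waiting times $\Delta T^k_n$, which can be compared with the properties of the skeleton discussed in the text.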

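The completion (including subsets of null sets) of the filtration generated by $A^k$ is denoted by $\mathbb{F}^k := (\mathcal{F}^k_t)_{t\ge 0}$. It satisfies

$\mathcal{F}^k_t\cap\{T^k_n\le t<T^k_{n+1}\} = \mathcal{F}^k_{T^k_n}\cap\{T^k_n\le t<T^k_{n+1}\};\quad t\ge 0.$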

Definition 2.2. The structure $\mathcal{D} = \{\mathcal{T}, A^k;\ k\ge 1\}$ is called a discrete-type skeleton for the Brownian motion.

By the strong Markov property, we observe that (see, e.g. [19]):

(1) The jumps $\Delta A^{k,j}(T^k_n);\ n=1,2,\dots$ are independent and identically distributed (iid).

(2) The waiting times $\Delta T^k_n;\ n=1,2,\dots$ are iid random variables in $\mathbb{R}_+$.

(3) The families $(\Delta A^{k,j}(T^k_n);\ n=1,2,\dots)$ and $(\Delta T^k_n;\ n=1,2,\dots)$ are independent.

Moreover, it is immediate that $A^{k,j}$ is a square-integrable $\mathbb{F}^k$-martingale for each $j=1,\dots,d$.

This structure is the basic setting for many approaches to near-optimal problems. In particular, in the next section we describe how we apply this framework in our dynamic programming technique.

3. State of the art and applications

We introduce the following subclasses of locally constant control functions $U^{k,T^k_n}_{T^k_\ell}\subset U^{T^k_n}_{T^k_\ell}$ with $0\le\ell<n<\infty$. Namely, $U^{k,T^k_n}_{T^k_\ell}$ is the set of $\mathbb{F}^k$-predictable processes of the form

(3.1) $v^k(t) = \sum_{j=\ell+1}^{n} v^k_{j-1}1_{\{T^k_{j-1}<t\le T^k_j\}};\quad T^k_\ell<t\le T^k_n,$

where for each $j=\ell+1,\dots,n$, $v^k_{j-1}$ is an $A$-valued $\mathcal{F}^k_{T^k_{j-1}}$-measurable random variable. To keep notation simple, we use the shorthand notations

(3.2) $U^{k,n}_\ell := U^{k,T^k_n}_{T^k_\ell};\quad 0\le\ell<n,$

where $U^{k,\infty}_\ell$ is the set of all controls $v^k: ]]T^k_\ell,+\infty[[\ \to A$ of the form

$v^k(t) = \sum_{j\ge\ell+1} v^k_{j-1}1_{\{T^k_{j-1}<t\le T^k_j\}};\quad T^k_\ell<t,$

where $v^k_{j-1}$ is an $A$-valued $\mathcal{F}^k_{T^k_{j-1}}$-measurable random variable for every $j\ge\ell+1$, for an integer $\ell\ge 0$.

We also use a shorthand notation for $u^k\otimes_{T^k_\ell}v^k$ as follows: for $u^k\in U^{k,n}_0$ and $v^k\in U^{k,n}_\ell$ with $\ell<n$, we write

$(u^k\otimes_\ell v^k) := (u^k_0,\dots,u^k_{\ell-1},v^k_\ell,\dots,v^k_{n-1}).$

This notation is consistent because $u^k\otimes_{T^k_\ell}v^k$ only depends on the list of variables $(u^k_0,\dots,u^k_{\ell-1},v^k_\ell,\dots,v^k_{n-1})$ whenever $u^k: ]]0,T^k_n]]\to A$ and $v^k: ]]T^k_\ell,T^k_n]]\to A$ are controls of the form (3.1) for $\ell<n$.

Let us now introduce the analogous concept of controlled functional, but based on the filtration $\mathbb{F}^k$. For this purpose, we need to introduce some further notation. Let us define

$e(k,t) := \Big\lceil \frac{\epsilon_k^{-2}\,t}{\chi_d} \Big\rceil;\quad 0\le t\le T,$

where $\lceil x\rceil$ is the smallest integer greater than or equal to $x\ge 0$ and

$\chi_d := E\min\{\tau^1,\dots,\tau^d\},$

where $\tau^j$ is the stopping time $\inf\{t>0;\ |W^j(t)|=1\}$ for each $j=1,\dots,d$. We have the following crucial uniform convergence:


Lemma 3.1. When $k$ goes to infinity, the random variable $T^k_{e(k,s)}$ converges a.s. uniformly to $s$ in the interval $[0,t]$, i.e. for any $0\le t\le T$

$\lim_{k\to\infty}\sup_{0\le s\le t}|T^k_{e(k,s)}-s| = 0$ a.s.

Moreover, the convergence $T^k_{e(k,t)}\to t$ holds also in $L^p(\mathbb{P})$ for any $1\le p\le\infty$.

Proof. The uniform convergence follows from arguments analogous to those of [14, Lemma 2.2], adapting the argument there for $\epsilon_k = 2^{-n/2}$ to our general $\epsilon_k\in\ell^2$. Convergence in $L^p$ holds by the Lebesgue convergence theorem. ✷

Let $O_T(\mathbb{F}^k)$ be the set of all step-wise constant $\mathbb{F}^k$-adapted processes of the form

$Z^k(t) = \sum_{n=0}^{\infty} Z^k(T^k_n)1_{\{T^k_n\le t\wedge T^k_{e(k,T)}<T^k_{n+1}\}};\quad 0\le t\le T,$

where $Z^k(T^k_n)$ is $\mathcal{F}^k_{T^k_n}$-measurable for every $n\ge 0$ and $k\ge 1$.
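The index $e(k,t)$ appearing in Lemma 3.1 and in the definition of $O_T(\mathbb{F}^k)$ is simple to compute once $\chi_d$ is known. The sketch below assumes $\epsilon_k = 2^{-k}$ (the choice used later in the numerical example) and $d=1$, where $\chi_1 = 1$ because the expected exit time of a standard Brownian motion from $(-1,1)$ equals 1; for $d>1$ the value of `chi_d` would have to be supplied, for instance from a Monte Carlo estimate. The function names are hypothetical.

```python
import math

def eps(k):
    """Jump size of the skeleton at level k (assumed here to be 2^{-k})."""
    return 2.0 ** (-k)

def e(k, t, chi_d=1.0):
    """Number of skeleton steps e(k, t) = ceil(eps_k^{-2} * t / chi_d)."""
    return math.ceil(t / (eps(k) ** 2 * chi_d))

steps_level_3 = e(3, 1.0)     # number of stopping times used up to T = 1 at level k = 3
```

Heuristically, each waiting time has mean $\epsilon_k^2\chi_d$, so after $e(k,t)$ steps the skeleton clock $T^k_{e(k,t)}$ is close to $t$, which is the content of the lemma.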

Let us now present two concepts which will play a key role in the main result.

Definition 3.1. A controlled embedded discrete structure $\mathcal{Y} = \big((Y^k)_{k\ge 1},\mathcal{D}\big)$ consists of the following objects: a discrete-type skeleton $\mathcal{D}$ and a map $u^k\mapsto Y^k(\cdot,u^k)$ from $U^{k,e(k,T)}_0$ to $O_T(\mathbb{F}^k)$ such that

$Y^k(T^k_{n+1},u^k)$ depends on the control only at $(u^k_0,\dots,u^k_n)$,

for each integer $n\in\{0,\dots,e(k,T)-1\}$.

Definition 3.2. A strongly controlled functional is a pair $(X,\mathcal{X})$, where $X$ is a controlled functional and $\mathcal{X} = \big((X^k)_{k\ge 1},\mathcal{D}\big)$ is a controlled embedded discrete structure such that the family of random variables $\{\|X^k(\phi)\|_\infty;\ \phi\in U^{k,e(k,T)}_0\}$ is uniformly integrable for each $k\ge 1$ and

$\lim_{k\to+\infty}\ \sup_{\phi\in U^{k,e(k,T)}_0} E\|X^k(\phi)-X(\phi)\|^{\gamma}_\infty = 0,$

for $0<\gamma\le 1$.

The concepts of controlled embedded discrete structures and strongly controlled functionals are nonlinear versions of the structures analysed in [20]. The typical example we have in mind of a strongly controlled (Wiener) functional is a controlled state which drives a stochastic control problem. See the example in Section 3.1 below.

Throughout this section, we are going to fix a controlled embedded discrete structure $u^k\mapsto X^k(\cdot,u^k)$ and we set

$\xi_{X^k}(u^k) := \xi\big(X^k(\cdot,u^k)\big),$

for $u^k\in U^{k,e(k,T)}_0$. We assume that $\{\|X^k(\phi)\|_\infty;\ \phi\in U^{k,e(k,T)}_0\}$ is uniformly integrable for each $k\ge 1$. We then define the discrete value process:

$V^k(T^k_n,u^k) := \operatorname{ess\,sup}_{\phi^k\in U^{k,e(k,T)}_n} E\big[\xi_{X^k}(u^k\otimes_n\phi^k)\,\big|\,\mathcal{F}^k_{T^k_n}\big];\quad n=1,\dots,e(k,T)-1,$

with boundary conditions

$V^k(0) := V^k(0,u^k) := \sup_{\phi^k\in U^{k,e(k,T)}_0} E\big[\xi_{X^k}(\phi^k)\big]$


This naturally defines the map $V^k: U^{k,e(k,T)}_0\to O_T(\mathbb{F}^k)$ with jumps $V^k(T^k_n,u^k);\ n=1,\dots,e(k,T)$ for $u^k\in U^{k,e(k,T)}_0$. One should notice that $V^k(T^k_n,u^k)$ only depends on $u^{k,n-1} := (u^k_0,\dots,u^k_{n-1})$, so it is natural to write

$V^k(T^k_n,u^{k,n-1}) := V^k(T^k_n,u^k);\quad u^k\in U^{k,e(k,T)}_0,\ 0\le n\le e(k,T),$

with the convention that $u^{k,-1} := 0$. By construction, $V^k$ satisfies the definition of a controlled embedded discrete structure.

We present the main theorem in this abstract setting, which naturally includes the optimization problem of the original model (1.1). The discretization structure constructed so far plays a fundamental role:

Theorem 3.1. Let $X$ and $\xi: D^n_T\to\mathbb{R}$ be, respectively, a strongly controlled functional and a (payoff) functional which satisfy the continuity Assumptions 1 and 2 of the last Section. Then:

1): The value function $(V^k_j)_{j=0}^{e(k,T)}$ associated with $X^k$ is the unique solution of

(3.3) $\operatorname{ess\,sup}_{u^k\in U^{k,e(k,T)}_j} E\big[V^k(T^k_{j+1},\theta\otimes_j u^k)\,\big|\,\mathcal{F}^k_{T^k_j}\big] - V^k(T^k_j,\theta) = 0,$ a.s., $j=e(k,T)-1,\dots,0,$

for all $\theta\in U^{k,j}_0$, with boundary condition

$V^k(T^k_{e(k,T)},u) = \xi_{X^k}(u)$ for $u\in U^{k,e(k,T)}_0$.

2): We have

$\lim_{k\to\infty}\Big|V^k(0) - \sup_{u\in U^T_0} E\big(\xi_X(u)\big)\Big| = 0.$

3): There exist control functions $u^{k,\star}_j:\Omega\times[T^k_j,T^k_{j+1}]\to\mathbb{R}$ such that

$V^k(T^k_j,\theta) = E\big[V^k(T^k_{j+1},\theta\otimes_j u^{k,\star}_j)\,\big|\,\mathcal{F}^k_{T^k_j}\big],$

where $j=e(k,T)-1,\dots,1$.

4): Given $\epsilon>0$,

$E\big[\xi_{X^k}(u^{k,\star})\big] \ge \sup_{u\in U^T_0} E\big(\xi_X(u)\big) - \epsilon$

for every $k$ sufficiently large, with $u^{k,\star} = (u^{k,\star}_0,\dots,u^{k,\star}_{e(k,T)-1})$.

Note that in the Theorem above, we have transformed the infinite-dimensional optimization problem into a finite-dimensional algorithm which can be numerically implemented. The near-optimal problem of equation (1.2) is then solved for a reasonably wide range of systems.

Comments on the proof and numerical algorithms: Initially note that formula (3.3) is just the algorithm of the dynamic programming. For its implementation one has to know explicitly the model of the controlled system, i.e. the parameters of equation (1.1). Nevertheless, numerically speaking, mind that if the model is unknown (e.g. typical situation in finance), one still can use machine learning techniques to obtain an optimal control and, consequently, the value function based on data and on the simulation of the environment.

For simplicity, for any control function $u^k_j$ define

$Q^k(T^k_j,\theta) = E\big[V^k(T^k_{j+1},\theta\otimes_j u^k_j)\,\big|\,\mathcal{F}^k_{T^k_j}\big].$

There are several approaches to solving the dynamic programming equation. Given the model, one can try to solve it directly and obtain a closed-form solution for the optimal control. For example, if the controlled system is linear and the instantaneous cost function and the terminal cost are quadratic functions, it is possible to determine the solution of the dynamic programming equation using the Riccati equation. In general, the numerical algorithms to solve the dynamic programming equation are based on the following steps:

• Choose the discretization level $k$ using the convergence rate (see details in [22]).

• Approximate the function $Q^k_j$ at each step $j$ by calculating the conditional expectation. For example, use regression methods with Monte Carlo simulation and machine learning techniques.

• Once we get an approximation $\hat{Q}^k_j$ to the function $Q^k_j$, the optimal control $\hat{u}^k_j$ is obtained by exhaustive search if $A$ is discrete, or by algorithms based on the gradient method via e.g. neural networks, when $A$ is a compact set with nonempty interior.

✷ For a full proof of the theorem, see [22]. Recently, several numerical methods based on deep learning techniques have been proposed to solve the PD equation, see [11, 12, 13]. In this context, Huré et al. [12] propose machine-learning-based methods to solve the dynamic programming equation. In general, under the assumption that the controlled system is Markovian, these authors propose an approach for optimal control through deep learning techniques and calculate the value function via Monte Carlo simulation. Theorem 3.1 above proposes a generalization of these algorithms to the non-Markovian case.
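The backward recursion (3.3) can be illustrated on a toy problem small enough to be solved exactly. The sketch below is not the algorithm of [22]: it assumes a one-dimensional skeleton whose jumps are $\pm\epsilon$ with probability 1/2 each, replaces the cube $A$ by a finite grid of actions so that exhaustive search applies, and computes the conditional expectations by full enumeration instead of regression. The payoff is a hypothetical path-dependent functional (it depends on the running maximum of the noise), chosen only to show that no Markov property is needed.

```python
from functools import lru_cache
from itertools import product

EPS = 0.5                      # jump size of the skeleton (illustrative value)
N_STEPS = 4                    # plays the role of e(k, T)
ACTIONS = (-1.0, 0.0, 1.0)     # finite grid inside the cube A with a = 1

def payoff(history):
    """Hypothetical path-dependent payoff; history is a tuple of (action, jump) pairs."""
    x, noise, running_max = 0.0, 0.0, 0.0
    for a, dA in history:
        x += a * dA                            # controlled state: sum of action * jump
        noise += dA
        running_max = max(running_max, noise)
    return -(x - running_max) ** 2             # depends on the whole noise path

@lru_cache(maxsize=None)
def value(history=()):
    """Return (V_j(history), maximising action), computed by backward recursion."""
    if len(history) == N_STEPS:
        return payoff(history), None           # boundary condition at the last step
    best = None
    for a in ACTIONS:                          # exhaustive search over the action grid
        q = sum(0.5 * value(history + ((a, dA),))[0] for dA in (EPS, -EPS))
        if best is None or q > best[0]:
            best = (q, a)
    return best

def constant_control_value(a):
    """Expected payoff of the constant control a, by enumerating all jump paths."""
    paths = product((EPS, -EPS), repeat=N_STEPS)
    return sum(payoff(tuple((a, dA) for dA in p)) for p in paths) / 2 ** N_STEPS

v0, first_action = value(())
best_constant = max(constant_control_value(a) for a in ACTIONS)
```

By construction `v0` is at least as large as the value of any constant control, since the recursion optimises over all controls adapted to the jump history. In realistic dimensions the enumeration inside `value` would be replaced by the Monte Carlo regression mentioned in the second bullet.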

3.1. Numerical Example. Let us now present a simple example to illustrate the theory presented so far. We intend to show the power of the numerical technique in a toy model, in such a way that one can compare the results of the simulation with the unique theoretical solution. For this purpose, we choose the example of hedging in a Black-Scholes model: for given real parameters $c$ and $K>0$, consider the cost functional

$\varrho_c(x,y,K) := \big(c + x - \max(y-K,0)\big)^2;\quad (x,y)\in\mathbb{R}^2.$

Assume that we have the classical geometric Brownian motion equation:

$dS(t) = S(t)\big(\mu\,dt + \sigma\,dB(t)\big)$

where, for simplicity, we assume drift $\mu=0$ and final time $T=1$. Then, the problem is to find, among all possible control functions $\phi\in U^T_0$,

$\inf E\big[\varrho_c\big(X(T,\phi),S(T),K\big)\big],$

where the controlled state is given by:

$X(t,\phi) = \int_0^t \phi(r)\,dS(r);\quad \phi\in U^T_0,\ 0\le t\le T.$

The control functions represent the absolute percentages of the securities $S$ which an investor holds at time $t\in[0,T]$ when, in our notation, $a=1$. It is well known that there exists a unique choice of $(c^*,\phi^*)\in\mathbb{R}\times U^T_0$ such that

$\inf_{(c,\phi)\in\mathbb{R}\times U^T_0} E\big[\varrho_c\big(X(T,\phi),S(T),K\big)\big] = E\big[\varrho_{c^*}\big(X(T,\phi^*),S(T),K\big)\big] = 0,$

whose solution is given by

$c^* = S_0\Phi(d_1) - \frac{K}{B_T}\Phi(d_2),$

where

$d_1 = \frac{\log\frac{S(0)}{K} + \frac{\sigma^2}{2}T}{\sigma\sqrt{T}},\qquad d_2 = d_1 - \sigma\sqrt{T},$

and $\Phi$ is the cumulative distribution function of the standard Gaussian variable. We recall that $c^*$ is the price of the option and $\phi^*$ is the so-called delta hedging, which can be computed by means of the classical Black-Scholes PDE as a function of $\Phi$. We set $S^1(0) = 49$, $\sigma = 0.2$, $K = 55$ and $\epsilon_k = 2^{-k}$.
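As a quick check of the closed formula for $c^*$ with the parameters above, the following sketch evaluates it directly. It assumes $B_T = 1$ (no discounting), which is our reading of the formula for $d_1$, where no interest rate appears; the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Cumulative distribution function of the standard Gaussian variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(s0, strike, sigma, T=1.0, bond=1.0):
    """c* = S_0 Phi(d_1) - (K / B_T) Phi(d_2); bond = B_T is assumed to be 1 by default."""
    d1 = (math.log(s0 / strike) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - strike / bond * norm_cdf(d2)

c_star = call_price(s0=49.0, strike=55.0, sigma=0.2)
```

Under these assumptions `c_star` can be compared with the "True Value" column of Table 1.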

The discretization is given by

$X^k(t,\phi^k) = X^k(t\wedge T^k_{e(k,T)},\phi^k),\qquad X^k(t,\phi^k) = \int_0^{\bar{t}_k}\phi^k(r)\,dS^k(r);\quad \phi^k\in U^{k,e(k,T)}_0,$

and $S^k(t) = S^k(t\wedge T^k_{e(k,T)})$. To shorten notation, write $S^k_n = S^k(T^k_n)$; $0\le n\le e(k,T)$, and $E_{k,n}$ for the conditional expectation with respect to $\mathcal{F}^k_{T^k_n}$. At first, for a given $c\in\mathbb{R}$, we apply the algorithm described above to get a Monte Carlo optimal control approximation $\phi^{*,k} = (v^{*,k}_0,\dots,v^{*,k}_{m-1})$. In this particular case, we can solve the problem analytically and the optimal control is given by: $v^{\star,k}_m = 0$ and, for $1\le i\le m$,

$v^{\star,k}_{m-i} = -\frac{E_{k,m-i}\Big[\Big(\sum_{j=0}^{i-1} v^{\star,k}_{m-j}\,\Delta S^k_{m+1-j} - H\Big)\Delta S^{k,1}_{m-i+1}\Big]}{\big(S^{k,1}_{m-i}\big)^2\sigma_1^2\epsilon_k^2}$

with $H = \max\{(S(T)-K),0\}$. The estimated value $c^{k,*}$ is computed according to

(3.4) $c^{k,*}\in\arg\min_{c\in\mathbb{R}} E\big[\varrho_c\big(X^k(T,\phi^{k,*}),S^k(T\wedge T^k_{e(k,T)}),K\big)\big].$

In other words, $E\big[\varrho_{c^{k,*}}\big(X^k(T,\phi^{k,*}),S^{k,1}(T\wedge T^k_{e(k,T)}),K\big)\big] = 0$, since the minimum of the function (3.4) is zero. Table 1 presents a comparison between the true call option price $c^*$ and the associated Monte Carlo price $c^{k,*}$. Figure 1 presents the Monte Carlo experiments for $c^{k,*}$ with $k=3$. The number of Monte Carlo iterations in the experiment is $2\times 10^4$.
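For a fixed control, the minimisation over $c$ in (3.4) is a one-dimensional quadratic problem: setting the derivative of $c\mapsto E[(c+X-H)^2]$ to zero gives $c = E[H-X]$. The sketch below applies this to an empirical sample. The samples are synthetic and serve only to exercise the formula; they are not the data of the experiment reported in Table 1, and the function names are hypothetical.

```python
import random

def best_initial_capital(x_samples, h_samples):
    """Minimiser of c -> mean((c + x - h)^2), namely the sample mean of h - x."""
    return sum(h - x for x, h in zip(x_samples, h_samples)) / len(x_samples)

def empirical_cost(c, x_samples, h_samples):
    """Monte Carlo estimate of E[(c + X - H)^2] for a given c."""
    return sum((c + x - h) ** 2 for x, h in zip(x_samples, h_samples)) / len(x_samples)

rng = random.Random(0)
# Synthetic stand-ins: h plays the role of the payoff H, x of the hedging gain X^k(T, phi).
h_samples = [max(rng.gauss(0.0, 1.0), 0.0) for _ in range(1000)]
x_samples = [h - 0.4 + rng.gauss(0.0, 0.1) for h in h_samples]

c_hat = best_initial_capital(x_samples, h_samples)
```

With a perfect hedge the residual cost at `c_hat` would vanish; with an approximate control it measures the remaining hedging error.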

Table 1. Comparison between $c^*$ and $c^{k,*}$ for $\epsilon_k = 2^{-k}$

Result      Mean Square Error   True Value   Difference   Relative Error
1.815708    0.008660            1.811209     0.004499     0.002477

4. Open problems

In this section we describe open problems which are natural extensions of the near-optimal stochastic control problem described in the previous sections. One of them is the more geometrical case of Lie groups and their corresponding quotients (homogeneous spaces), which include spheres, tori, hyperboloid models and many others that are the natural framework for many interesting applications. Another direction to be explored is the case of noise with jumps, for example semimartingales with jumps in general, as in [16], or, in particular, Lévy noise, as in e.g. [2]. Here we intend to describe these open problems and some of the difficulties in dealing with them.

4.1. Lie groups. Besides the purely geometrical motivation for extending the control setting to Lie groups, we point out some other applied importance: 1) this theory encapsulates the multiplicative approach, in the sense that the products here can be considered as the usual product of square matrices; 2) Lie groups are the appropriate frame to work with linear systems: in many of these cases $G$ is the group (or a closed subgroup) of positive determinant $n\times n$ matrices $Gl^+(n,\mathbb{R})$; 3) this approach also includes the framework for any homogeneous space via quotient by closed subgroups, e.g. n-dimensional spheres, tori, Grassmannians, projective spaces, hyperboloid models and many others. In all these cases, the properties of Lie group theory are important tools in the analysis and interpretation of the dynamics: decompositions (Iwasawa, polar, eigenvectors etc.), invariance by translations, adjoints, geometrical structures of fibre bundles, connections, local diffeomorphism with the corresponding Lie algebra, and many more. A vast and wide-ranging literature is available, from applied to more theoretical approaches. Just to mention a few, see e.g. the classical [6], the well-known [31], the more introductory [4] or the more recent [29], including all references therein.

Figure 1. Monte Carlo experiments for $c^{k,*}$ with $\epsilon_k = 2^{-k}$

Let $G$ be a connected Lie group with its corresponding Lie algebra $\mathfrak{g}$, identified with the tangent space $T_eG$, where $e$ is the identity of $G$. The dynamics of the controlled trajectories $X^u(t)\in G$, with initial condition $X^u(0)=e$, is described in this context by the right-invariant vector fields:

(4.1) $dX^u(t) = d(R_{X^u(t)})_e\,\alpha(t,X^u_t,u(t))\,dt + d(R_{X^u(t)})_e\,\sigma(t,X^u_t,u(t))\,dZ(t),$

where $\alpha$ and $\sigma$ are $\mathfrak{g}$-valued functions, and $d(R_{X^u(t)})_e: T_eG\to T_{X^u(t)}G$ is the derivative at the identity of the right translation $R_{X^u(t)}: g\mapsto gX^u(t)$. As before, the functions $\alpha$ and $\sigma$ depend on time $t$, the past trajectory $s\mapsto X^u(s)$ with $s\in[0,t]$ and the control function (bounded measurable) $u$. If the Lie group $G$ is a subgroup of matrices, equation (4.1) can be written with a much more familiar notation

$dX^u(t) = \alpha(t,X^u_t,u(t))\cdot X^u(t)\,dt + \sigma(t,X^u_t,u(t))\cdot X^u(t)\,dZ(t),$

where $\cdot$ stands for the usual product of square matrices.

In this new context, the cost (or payoff) functionals are defined on $C_T$, the set of continuous trajectories in $G$. Hence, as before, for a certain final time $T>0$ one is looking for the optimal performance given by

$\sup_{u\in U^T} E[\xi(X^u)].$

As before, this optimal control function may not exist and the problem turns into finding a near-optimal control function. The previous Assumptions 1 and 2 can be considered naturally in this context. The discrete-type skeleton is again a helpful tool here: note that the Lie group structure allows a special kind of discretization of the dynamics, given by a product of random matrices $M_k\cdots M_2\cdot M_1$, where each $M_j$, $j=0,1,\dots,k$, is an exponential in the Lie group $G$ with $M_0=e$ a.s. Hence the main point here is that controlling $X^u(t)$ is somehow the same as controlling the distributions of the $M_j$'s. To understand this multiplicative approach fully, i.e. in which sense its structure can provide new properties to the solution, demands good answers to questions like: how to translate the cost functional into this product of matrices? What are the distributions of these random matrices? In what sense is controlling $X^u(t)$ the same as controlling the distributions of the $M_j$'s? One needs a comparison of the dynamic programming in the additive case, as stated in Theorem 3.1, with a DP algorithm here which will start from left to right in the product of the $M_j$'s. For many applications it is also relevant to understand how this approach behaves with respect to quotients, i.e. for control systems in homogeneous spaces.
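The product $M_k\cdots M_2\cdot M_1$ of exponentials mentioned above can be sketched in the simplest matrix group, the rotation group SO(2), whose Lie algebra is spanned by a single skew-symmetric matrix. The code below is an illustration under assumptions of ours: the noise is a Brownian motion on a uniform time grid, the equation is read in the Stratonovich sense, and the Lie algebra coefficients are a control-dependent drift `control(t)` and a constant `sigma`. All names are hypothetical.

```python
import math
import random

def rotation(theta):
    """exp(theta * J) in SO(2), where J = [[0, -1], [1, 0]] spans the Lie algebra."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2))
                 for i in range(2))

def product_of_exponentials(control, sigma=0.5, T=1.0, n_steps=100, seed=0):
    """Return X = M_n ... M_2 M_1 with M_j = exp((u dt + sigma dB_j) J), and the total angle."""
    rng = random.Random(seed)
    dt = T / n_steps
    x = ((1.0, 0.0), (0.0, 1.0))          # M_0 = e, the identity matrix
    total_angle = 0.0
    for j in range(n_steps):
        theta = control(j * dt) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        x = matmul(rotation(theta), x)    # new factor multiplies on the left
        total_angle += theta
    return x, total_angle

x_T, angle_T = product_of_exponentials(control=lambda t: 1.0 - t)
```

Because SO(2) is commutative, `x_T` coincides with `rotation(angle_T)`, so controlling the product reduces to controlling a sum of angles; in a non-commutative group the order of the factors matters, which is where the questions raised above become non-trivial.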

4.2. Discontinuous noise and geodesic jumps. Another interesting open problem related to optimal stochastic control concerns discontinuous noise. In particular, we shall consider semimartingales with jumps such that the trajectories are càdlàg. This is the classical case, see e.g. Kurtz, Pardoux and Protter [16], which includes, for example, Lévy noise, see Applebaum [2], Protter [25], Oksendal and Sulem [24], among others. In this context $Z = \{Z_t,\ t\ge 0\}$ is a $k$-dimensional semimartingale and $[Z,Z]$ denotes the matrix of quadratic variation of $Z$, which can be decomposed into $[Z,Z] = [Z,Z]^c + [Z,Z]^d$, where $[Z,Z]^c$ and $[Z,Z]^d$ represent the continuous and purely discontinuous parts respectively. This kind of noise represents the appropriate model for many applications; in fact, just note that the discretization in time of any system driven by Brownian noise (with or without Poisson perturbation) turns out to be of this kind. The open questions here relate to the behaviour of the DP algorithm of Theorem 3.1 with respect to jumps in the noise.

The canonical Marcus equation is an interesting approach for the dynamics of jump diffusions on manifolds. Heuristically, if $\Delta_sZ = Z(s)-Z(s-)$ is the jump of the noise at time $s$, the corresponding jump in the dynamics is performed by a fictitious time-one jump along the deterministic flow of the corresponding vector field multiplied by $\Delta_sZ$. For more details we refer to [16], where many other properties are also presented. The second geometrical aspect that we propose in this open problem is to perform the fictitious time-one jump along the geodesic starting at the point $X(s-)$ in the direction of the vector field at this point multiplied by $\Delta_sZ$. We make this precise in the next paragraph.

Let $M$ be an $m$-dimensional complete Riemannian manifold. Let $\alpha(t,X^u_t,u(t))$ and $\sigma(t,X^u_t,u(t))$ be smooth vector fields with bounded derivatives in $M$. We assume that the controlled process $X^u(t)\in M$ is modelled by the following geodesic Marcus equation:

(4.2) $dX^u(t) = \alpha(t,X^u_t,u(t))\,dt + \sigma(t,X^u_t,u(t))\diamond dZ(t),$

which is defined in the following way: $X^u(t)$ is a solution of equation (4.2) if and only if for any test function $f\in C^2(M)$ we have:

(4.3) $f(X^u(t)) - f(X^u(0)) = \int_0^t \alpha f(s,X^u_s,u(s))\,ds + \int_0^t \sigma f(s,X^u_s,u(s))\,dZ(s) + \frac{1}{2}\int_0^t \sigma'\sigma(s,X^u_s,u(s))f(X^u_s)\,d[Z,Z]^c_s + \sum_{0<s\le t}\Big\{ f\big(\gamma(X^u_{s-},\sigma(s,X^u_s,u(s))\Delta_sZ)\big) - f(X^u_{s-}) - \sigma(s,X^u_{s-},u(s))f(X^u_{s-})\Delta_sZ \Big\}.$


Here $\gamma(X^u_{s-},\sigma(s,X^u_s,u(s))\Delta_sZ)$ is the unique geodesic starting at $X^u_{s-}$ in the direction of $\sigma(s,X^u_s,u(s))\Delta_sZ\in T_{X^u_{s-}}M$. We remark on some aspects. Firstly, note that one can easily include a higher-dimensional noise acting on more than one vector field, in the sense that

$\int_0^t \sigma f(s,X^u_s,u(s))\,dZ(s) := \int_0^t \sigma^i f(s,X^u_s,u(s))\,dZ^i(s).$

Equally,

$\int_0^t \sigma'\sigma(s,X^u_s,u(s))f(X^u_s)\,d[Z,Z]^c_s := \int_0^t \frac{D\sigma}{\partial x_i}\sigma^i(s,X^u_s,u(s))f(X^u_s)\,d[Z^i,Z^j]^c_s$

is a Stieltjes integral with respect to the continuous bounded variation processes $[Z^i,Z^j]^c_s$.

Secondly, note that if the test function $f$ is the identity in a Euclidean space and the rule of jumps $\gamma(X^u_{s-}, \sigma(s, X^u_s, u(s))\Delta_s Z)$ is converted to the time-one flow of $\sigma$, then equation (4.3) reduces to the classical Marcus equation as in [16]. The proposed geodesic jumps correspond to the natural generalization of the Itô equation with càdlàg noise in a Euclidean space to a Riemannian manifold. As expected, in this case we have a coalescing stochastic solution flow.
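As a quick check of the last remark, and assuming $M = R^m$ with the flat metric (so that geodesics are straight lines), the geodesic jump rule can be written out explicitly. The symbol $\gamma_1$ for the time-one point of the geodesic is notation introduced here only for illustration.

```latex
\gamma_1(x, v) = x + v, \qquad
X^u_s = \gamma_1\big(X^u_{s-},\, \sigma(s, X^u_{s-}, u(s))\,\Delta_s Z\big)
      = X^u_{s-} + \sigma(s, X^u_{s-}, u(s))\,\Delta_s Z .
```

In this flat case the jump of the solution is $\Delta X^u_s = \sigma(s, X^u_{s-}, u(s))\,\Delta_s Z$, which is the jump of an Itô equation driven by càdlàg noise. For coordinate test functions each term of the summation over jump times vanishes.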

Finally, note that the subtracted terms in the summation in equation (4.3) are already accounted for in the Itô integral on the right-hand side. Applying Taylor's formula to $f(\gamma(X^u_{s-}, \sigma(s, X^u_{s-}, u(s))\Delta_s Z))$ and using the finite quadratic variation, one sees that the series converges absolutely.
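The Taylor argument can be sketched as follows, under assumptions that are stronger than those stated in the text and are made here only for illustration: the Hessian of $f$ and the vector field $\sigma$ are bounded on $M$. The names $x$, $v$, $g_s$ and $\gamma_r$ (the geodesic at time $r$) are shorthand introduced for this sketch.

```latex
% Fix a jump time s and write x = X^u_{s-}, v = \sigma(s, X^u_{s-}, u(s)).
% Let g_s(r) = f(\gamma_r(x, v\,\Delta_s Z)) for r in [0,1], so g_s(0) = f(x).
\begin{aligned}
g_s'(0) &= (v f)(x)\,\Delta_s Z, \qquad
g_s''(r) = \operatorname{Hess} f\big(\dot\gamma_r, \dot\gamma_r\big)
\quad (\text{since } \nabla_{\dot\gamma}\dot\gamma = 0), \\
\big| g_s(1) - g_s(0) - g_s'(0) \big|
  &= \Big| \int_0^1 (1-r)\, g_s''(r)\, dr \Big|
  \le \tfrac{1}{2}\, \|\operatorname{Hess} f\|_\infty\, \|\sigma\|_\infty^2\, |\Delta_s Z|^2, \\
\sum_{0 < s \le t} \big| g_s(1) - g_s(0) - g_s'(0) \big|
  &\le \tfrac{1}{2}\, \|\operatorname{Hess} f\|_\infty\, \|\sigma\|_\infty^2 \sum_{0 < s \le t} |\Delta_s Z|^2
  \le \tfrac{1}{2}\, \|\operatorname{Hess} f\|_\infty\, \|\sigma\|_\infty^2\, [Z,Z]_t < \infty .
\end{aligned}
```

The second line uses that a geodesic has constant speed $|v|\,|\Delta_s Z|$, and the last line uses that the sum of squared jumps is dominated by the quadratic variation.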

With this geodesic description of the dynamics, many properties have to be proved: existence, uniqueness, flows (of coalescing smooth maps), Itô formula, Itô-Ventsel type formulas, probabilistic properties of the process, and a kind of support theorem ($M$-invariance when $M$ is embedded in a Euclidean space with tangent vector fields), among many others. In the context of the control problem, the challenge is to answer the following questions: how does the DP algorithm behave in this context? What would be the geometrical properties of the near-optimal control function $u^*$ which satisfies $E[\xi(X^{u^*})] > \sup_{u \in U_T} E[\xi(X^u)] - \epsilon$, for any positive $\epsilon$? Dependence on geometrical parameters, curvature, or even on the topology of $M$ looks like a promising and interesting issue.

References

[1] Arnold, L. – Random Dynamical Systems. Springer-Verlag, 1998.

[2] Applebaum, D. – Lévy Processes and Stochastic Calculus. Cambridge University Press, 2009.

[3] Bacciotti, A. – Local Stability of Nonlinear Control Systems. World Scientific, 1992.

[4] Baker, A. – Matrix Groups: An Introduction to Lie Group Theory. Springer, 2002.

[5] Bullo, F. and Lewis, A. – Geometric Control of Mechanical Systems. Springer-Verlag, 2004.

[6] Chevalley, C. – Theory of Lie Groups. 15th printing, Princeton Landmarks in Mathematics. Princeton University Press, 1999.

[7] Claisse, J., Talay, D. and Tan, X. – A Pseudo-Markov Property for Controlled Diffusion Processes. SIAM J. Control Optim., 54, 2, 1017-1029.

[8] Colonius, F. and Kliemann, W. – The Dynamics of Control. Birkhäuser, 2000.

[9] Cont, R. and Fournie, D. – Functional Itô calculus and stochastic integral representation of martingales. Ann. Probab., 41, 1, 109-133, 2013.

[10] Dupire, B. – Functional Itô calculus. Portfolio Research Paper 2009-04. Bloomberg.

[11] Han, J. and Weinan E. (2016) – Deep learning approximation for stochastic control problems. arXiv:1611.07422.

[12] Huré, C., Pham, H., Bachouch, A. and Langrené, N. (2018) – Deep Neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis. arXiv:1812.04300.

[13] Huré, C., Pham, H., Bachouch, A. and Langrené, N. (2018) – Deep Neural networks algorithms for stochastic control problems on finite horizon, part II: numerical applications. arXiv:1812.05916.

[14] Khoshnevisan, D. and Lewis, T.M. (1999) – Stochastic calculus for Brownian motion on a Brownian fracture. Ann. Appl. Probab., 9 (3), 629–667.

[15] Krylov, N. V. (1999) – Approximating value functions for controlled degenerate diffusion processes by using piecewise constant policies. Electron. J. Probab., 4, 1-19.

[16] Kurtz, T., Pardoux, E. and Protter, P. – Stratonovich Stochastic Differential Equations Driven by General Semimartingales. Annales de l'I.H.P., section B, 31 (1995), 351-377.


[17] Kushner, H.J. and Dupuis, P. – Numerical Methods for Stochastic Control Problems in Continuous Time, 2nd edn., Applications of Mathematics, Vol. 24, Springer-Verlag, 2001.

[18] Leão, D. and Ohashi, A. – Weak approximations for Wiener functionals. Ann. Appl. Probab., 23, 4, 1660-1691, 2013.

[19] Leão, D., Ohashi, A. and Simas, A. B. – A weak version of path-dependent functional Itô calculus. Ann. Probab., 46, 6, 3399-3441 (2018).

[20] Leão, D., Ohashi, A. and Simas, A. B. – Differentiability of Wiener functionals and occupation times. Bulletin des Sciences Mathématiques, 149, 23-65, 2018.

[21] Leão, D., Ohashi, A. and Russo, F. – Discrete-type approximations for non-Markovian optimal stopping problems: Part I. J. Appl. Probab., 56, 4, 981-1005 (2019).

[22] Leão, D., Ohashi, A. and Souza, F. – Stochastic near-optimal controls for path-dependent systems. arXiv:1707.04976v2, 2017.

[23] Nutz, M. – A Quasi-Sure Approach to the Control of Non-Markovian Stochastic Differential Equations. Electron. J. Probab., 17, 23, 1-23, 2012.

[24] Oksendal, B. and Sulem, A. – Applied Stochastic Control of Jump Diffusions. 3rd Ed. Springer-Verlag, 2019.

[25] Protter, P. – Stochastic Integration and Differential Equations. Springer, 2005.

[26] Qiu, J. – Viscosity Solutions of Stochastic Hamilton–Jacobi–Bellman Equations. SIAM J. Control Optim., 56, 5, 3708-3730, 2018.

[27] Ren, Z. and Tan, X. – On the convergence of monotone schemes for path-dependent PDEs. Stochastic Process. Appl., 127, 6, 1738-1762, 2017.

[28] Saporito, Y. – Stochastic Control and Differential Games with Path-Dependent Influence of Controls on Dynamics and Running Cost. SIAM J. Control Optim., 57, 2, 1312-1327 (2019).

[29] San Martin, L. – Lie Groups. Springer-Nature, 2020.

[30] Tan, X. – Discrete-time probabilistic approximation of path-dependent stochastic control problems. Ann. Appl. Probab., 24, 5, 1803-1834 (2014).

[31] Warner, F. – Foundations of Differentiable Manifolds and Lie Groups. Springer-Verlag, 1983.

[32] Zhang, J. and Zhuo, J. – Monotone schemes for fully nonlinear parabolic path dependent PDEs. Journal of Financial Engineering, 1 (2014).

[33] Zhou, X.Y. – Stochastic near-optimal controls: Necessary and sufficient conditions for near optimality. SIAM J. Control Optim., 36, 3, 929-947 (1998).

Mathematics Department, Rua Sérgio Buarque de Holanda, 651, UNICAMP - State University of Campinas, 13083-859 Campinas, Brazil. Research supported by CAPES 88882.329064/2019-01.

Email address: [email protected]

Mathematics Department, Rua Sérgio Buarque de Holanda, 651, UNICAMP - State University of Campinas, 13083-859 Campinas, Brazil. Research partially supported by CNPq 305212/2019-2, FAPESP 2020/04426-6 and 2015/50122-0.

Email address: [email protected]

Mathematics Department, Rua Sérgio Buarque de Holanda, 651, UNICAMP - State University of Campinas, 13083-859 Campinas, Brazil. Research supported by FAPESP 2017/23003-6.