Maximum principle for delayed stochastic mean field control problem with state constraint

(1)

R E S E A R C H

Open Access

Maximum principle for delayed stochastic

mean-ﬁeld control problem with state

constraint

Li Chen

1*

_{and Jiandong Wang}

1

*_{Correspondence:} [email protected]

1_{School of Science, China University}

of Mining & Technology, Beijing, Beijing, China

Abstract

In this paper, we consider the optimal control problem for the mean-field stochastic differential equations with delay and state constraint. By virtue of the classical Ekeland’s variational principle, the duality method and a new type of mean-field anticipated backward stochastic differential equation, we obtain the maximum principle of the optimal control for this problem. Our result can be applied to a harvest model from a mean-field system with delay.

Keywords: Mean-ﬁeld; Delay system; Ekeland’s variation principle; Stochastic maximum principle; State constraint

1 Introduction

Mean-ﬁeld stochastic diﬀerential equations (MFSDEs), also called McKean–Vlasov equa-tions, were discussed by Kac [1]. And this study was initiated by McKean [2]. Here, we write a class of commonly used McKean–Vlasov equations as follows:

⎧ ⎨ ⎩

dX=b(t,·,Xt,PXt)dt+σ(t,·,Xt,PXt)dB(t), t∈[0,T], X0=x0,

(1)

where{B(t)}t≥0 is an 1-dimensional standard Brownian motion, deﬁned on a complete probability space (Ω,F,P), denotingPX=P◦X–1 _{as the law of the random variables X.} And the coeﬃcients (b,σ) : [0,T]×Ω×Rd_×_P₂₍_Rd₎_→_R_{are measurable functions. Here,}

P2(Rd_{) is the space of all probability measures on}_Rd_{, equipped with 2-Wassertein metric.}

For more details refer to [3].

Equation (1) has been adapted by many researchers, Buckdanhn, Li and Ma [4] studied the general stochastic control problem of Eq. (1). For other, related work, we refer to [5,

6]. Buckdahn, Li and Peng [7,8] proposed a new kind of backward stochastic differen-tial equations, called mean-field backward stochastic differendifferen-tial equations (MFBSDEs), coupled with MFSDEs. Carmona [9] studied the forward–backward mean-field stochastic differential equations (MFFBSDEs). The pioneering work of mean-field games was done by Lion and Lasry [3,10]. They apply Eq. (1) to study the mean-field potential games in the large players and symmetric equilibrium. More about the mean-field games is in [3,

11].

(2)

In fact, Eq. (1) is too general to be used in the real world. Therefore, many researchers gave many different mean-field definitions. As pointed out by Buckdahn, Li and Ma [4], the main difference of these definitions is in the position of the expectation taken in the current literature. And they classified this conclusion in the following two types:

ϕt,ω,X(t),PX(t)

=

⎧ ⎨ ⎩

E[ϕ˜(t,ω,x,X(t))], (i)

˜

ϕ(t,ω,x,E[X(t)]), (ii) (2)

whereϕis the coefficients function in Eq. (1), i.e.ϕ=b,σ.ϕ˜is a different function corre-sponding tobandσ. Li [12] studied the type (i) mean-field control problem. For type (ii) MFSDEs, Buckdahn, Djehiche and Li [7] proved the general stochastic maximum princi-ple. Yong [13] solved a linear–quadratic (LQ) optimal problem for type (ii) MFSDEs by using a decoupling technique. Li et al. [14,15] further studied the closed-loop optimal LQ control problem and the LQ problem in infinite horizon. Andersson and Djehiche [16] generalized this kind of mean-field definition and we give details for Eq. (3):

ϕ(t,ω,x,PX(t)) =ϕ˜

t,ω,x,EψX(t), (3)

whereψis some general nonlinear function. If we letψ(x) =x, then (3) degenerates to (ii). They obtained Pontryagin’s maximum principle for the optimal control problem. Hu and Øksandal [17] investigated this type singular optimal control, and they apply these results to prove the Nash equilibrium and zero-sum equilibrium. This kind of models can be used to describe the large population interacting system.

The study of the optimal control problem with delay also captured lots of attention. Many results were obtained, such as by Øksendal, Suem [18], Chen and Wu [19,20]. The main reason is that real world systems not just depend on their current state, but also their previous history. Optimal control problems of the delayed systems are very diﬃcult because of the inﬁnite-dimensional state space structure. Therefore current research is divided into two kinds of methods to develop the stochastic maximum principle for de-layed systems. One involves adjoint equation; it is given by the anticipated BSDE which was introduced by Peng and Yang [21]; see Chen and Wu [19,20], Yu [22] and Zhang [23]. Another method is to derive a system of three-coupled adjoint equations, which consists of two BSDEs and one backwards ordinary stochastic equation; see Øksendal and Sulem [18].

(3)

the non-convex control domain rather than the convex control domain. By Ekeland’s vari-ational principle, we derive a necessary condition of the optimal control. The maximum principle diﬀers from the classical one for the adjoint equation will be a new kind of mean-ﬁeld anticipated BSDEs.

The rest of this paper is structured as follows: In Sect.2,we give the preliminary results as regards mean-ﬁeld SDDEs and mean-ﬁeld anticipated BSDEs. The stochastic optimal control problem is formulated in Sect.3. We derive the stochastic maximum principle for a stochastic control system in Sect.4. We apply our result to real world model in Sect.5

to ﬁnalize this paper.

2 Preliminary results

Let (Ω,F,F,P) be a complete filtered probability space. {B(t)}t≥0 is a standard 1-dimensional Brownian motion.F={Ft}t≥0 is the natural filtration augmented by all P-null elements ofF.T > 0 andδ≥0 are given constants. For simplicity of notation, we only consider the 1-dimensional case in this paper and all results can be extended to mul-tidimensional cases without difficulty. The norm inRis denoted by| · |. We will also use the following notation for some positive integern:

Ln₍_F

T,R) :={ξ:ξ isR-valuedFT-measurable random variable s.t.E|ξ|n< +∞}; Ln_F(s,r;R) :={ψ(t) :{ψ(t),s≤t≤r}isR-valued adapted stochastic process s.t.

Er

s |ψ(t)|ndt< +∞};

Sn

F(s,r;R) :={ψ(t) :{ψ(t),s≤t≤r}isR-valued adapted stochastic process s.t. with right-continuous path and left limit s.t.E[sup_s_≤_t_≤_r|ψ(t)|n_{] < +}_∞}_.

2.1 Mean-ﬁeld stochastic differential equation with delay

In this subsection, we will show some preliminary results on mean-field stochastic dif-ferential equations with delay (MFSDDEs) and mean-field anticipated backward stochas-tic differential equations (MABSDEs). These theorestochas-tical results include the existence and uniqueness of the solutions, some estimations of the solutions and the duality relationship between the MFSDDEs and MFABSDEs.

Firstly, we consider the following MFSDDE:

⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩

dx(t) =b(t,x(t),x(t–δ),Eψ(x(t)),Eψ(x(t–δ)))dt

+σ(t,x(t),x(t–δ),Eϕ(x(t)),Eϕ(x(t–δ)))dB(t), t∈[0,T],

x(t) =x0(t), t∈[–δ, 0],

(4)

where

b: [0,T]×R×R×R×R→R,

σ: [0,T]×R×R×R×R→R,

ψ:R→R, ϕ:R→R.

(4)

(H1.1) x,xδ,μ1,μ1δ,μ2,μ2δ,x¯,x¯δ,μ¯1,μ¯1δ,μ¯2,μ¯2δ∈R, there exists a constantC, s.t.

b(t,x,xδ,μ1,μ1δ) –b(t,x¯,x¯δ,μ¯1,μ¯1δ) + σ(t,x,xδ,μ2,μ2δ) –σ(t,x¯,x¯δ,μ¯2,μ¯2δ)

≤C|x–x¯|+|xδ–x¯δ|+|μ1–μ¯1|+|μ1δ–μ¯1δ|+|μ2–μ¯2|+|μ2δ–μ¯2δ|

;

(H1.2) for anyx,xδ,μ1,μ1δ,μ2,μ2δ∈R, the mappingsb(·,x,xδ,μ1,μ1δ)andσ(·,x,xδ,μ2,

μ2δ)areF-adapted andb(·, 0, 0, 0, 0),σ(·, 0, 0, 0, 0)∈L2_F(0,T;R);

(H1.3) ψandϕare continuously diﬀerentiable and their derivatives are bounded.

Theorem 2.1 Suppose that Assumption(H1)holds,and x0(·)∈L2

F(–δ, 0;R).Then

MFS-DDE(4)has a unique t-continuous solution x(t)and Esup₀_≤_t_≤_T|x(t)|2_{< +}_∞_.

Proof Let us introduce a norm in Banach spaceL2

F(–δ,T;R)

x(·)_β=

E

T

–δ

e–βs x(s) 2ds 1

2

, β> 0.

Clearly it is equivalent to the original norm of L2

F(–δ,T;R). We deﬁne a mapping h:

L2

F(–δ,T;R)→L2F(–δ,T;R) by the following equation s.t.h[X(·)] =x(·):

⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩

x(t) =x0(t) +₀tb(s,X(s),X(s–δ),Eψ(X(s)),Eψ(X(s–δ)))ds

+₀tσ(s,X(s),X(s–δ),Eϕ(X(s)),Eϕ(X(s–δ)))dB(s), t∈[0,T],

x(t) =x0(t), t∈[–δ, 0].

(5)

We desire to prove thathis a contraction mapping under the norm · β. For arbitrary

X(·),X¯(·)∈L2_F(–δ,T;R), seth[X(·)] =x(·),h[X¯(·)] =x¯(·), andXˆ(·) =X(·) –X¯(·),xˆ(·) =x(·) –

¯

x(·). Thenxˆ(·) satisﬁes

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ˆ

x(t) =₀t[b(s,X(s),X(s–δ),Eψ(X(s)),Eψ(X(s–δ))) –b(s,X¯(s),X¯(s–δ),Eψ(X¯(s)),Eψ(X¯(s–δ)))]ds

+₀t[σ(s,X(s),X(s–δ),Eϕ(X(s)),Eϕ(X(s–δ)))

–σ(s,X¯(s),X¯(s–δ),Eϕ(X¯(s)),Eϕ(X¯(s–δ)))]dB(s), t∈[0,T],

ˆ

x(t) = 0, t∈[–δ, 0].

(6)

Denoting

A=bs,X(s),X(s–δ),EψX(s),EψX(s–δ) –bs,X¯(s),X¯(s–δ),EψX¯(s),EψX¯(s–δ),

B=σs,X(s),X(s–δ),EϕX(s),EϕX(s–δ) –σs,X¯(s),X¯(s–δ),EϕX¯(s),EϕX¯(s–δ), applying the Itô formula toe–βt_|ˆ_x₍_t₎_|2_{on [0,T], we have}

βE

T

0

x_ˆ(t) 2e–βtdt≤2E

T

0

e–βt xˆ(t)A dt+E

T

0

(5)

By (H1.1) and the Cauchy–Schwarz inequality we obtain

(β– 1)E

T

0

e–βt xˆ(t) 2dt

≤32C2E T

0

e–βt Xˆ(t) 2dt+ 32C2E T

0

e–βt Xˆ(t–δ) 2dt

+ 16C2E T

0

e–βt EψX(t)–EψX¯(t) 2dt

+ 16C2E T

0

e–βt EψX(t–δ)–EψX¯(t–δ) 2dt

+ 16C2E T

0

e–βt EϕX(t)–EϕX¯(t) 2dt

+ 16C2E T

0

e–βt EϕX(t–δ)–EϕX¯(t–δ) 2dt

=1+2+3.

Here1≤64C2E

T

–δe–βt| ˆXt|2dt, and by (H1.3) and the mean value theorem we also have

2≤16C2E

T

0

e–βtE ψx

Xη(t)Xˆ(t) 2dt

+ 16C2E T

0

e–βtE ψxδ

Xη₍_t_–_δ₎_Xˆ₍_t_–_δ₎ 2_dt

≤16C2E T

0

e–βt Xˆ(t) 2dt+ 16C2E T

0

e–βt Xˆ(t–δ) 2dt

≤32C2E T

–δ

e–βt Xˆ(t) 2dt,

where Xη₍_t_{) is the value among}_X₍_t_{) and}_X¯₍_t_{) and the constant}_C _{changes line by line.} Similarly,3≤32C2E

T

–δe

–βt_{| ˆ}_X₍_t₎_|2_dt_.

We selectβ= 256C2_{+ 1; then}

E T

–δ

e–βt|ˆxt|2dt≤

1 2E

T

–δ

e–βt| ˆXt|2dt,

or

x_ˆ(·)_β≤√1

2Xˆ(·)β,

which shows thathis a strict contraction mapping. Then it follows from theﬁxed point theoremthat the MFSDDE (4) has a unique solution inL2

F(–δ,T;R). Sincebandσsatisfy

(6)

2.2 Mean-ﬁeld anticipated backward stochastic differential equation

In this subsection, we consider the following MABSDE:

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

–dy(t) =f(t,y(t),y(t+δ),Eψ(y(t)),Eψ(y(t+δ)),z(t),z(t+δ),Eϕ(z(t)),

Eϕ(z(t+δ)))dt–z(t)dB(t), t∈[0,T],

y(T) =ξ,

y(t) =ξ0(t), z(t) =η0(t), t∈(T,T+δ],

(7)

where the coeﬃcients (f,ξ,ξ0,η0) satisfy Assumption (H2): (H2.1) ξ0(·),η0(·)∈L2

F(T,T+δ;R)andξ∈L2(FT,R);

(H2.2) for any t∈[0,T],y,z∈R, yδ,zδ ∈L2(Ω,Ft+δ;R), f(t,y,yδ,Eψ(y),Eψ(yδ),z,zδ,

Eϕ(z),Eϕ(zδ))isFt-measurable;

(H2.3) f(·, 0, 0, 0, 0, 0, 0, 0, 0)∈L2

F(0,T;R);

(H2.4) for any t ∈[0,T], y,y¯,μ3,μ3δ,z,¯z,μ4,μ4δ ∈R, μ3δ,μ¯3δ,zδ,z¯δ,μ¯4,μ¯4δ ∈L2(Ω, Ft+δ;R), there exists a constantM, s.t.

f(t,y,yδ,μ3,μ3δ,z,zδ,μ4,μ4δ

–f(t,y¯,y¯δ,μ¯3,μ¯3δ,z¯,z¯δ,μ¯4,μ¯4δ)

≤M|y–y¯|+|z–z¯|+|μ3–μ¯3|+ Eϕ(z) –Eϕ(z¯) +EFt|y

δ–¯yδ|+|zδ–z¯δ|+|μ3δ–μ¯3δ|+|μ4δ–μ¯4δ|

;

(H2.5) ψandϕare continuously diﬀerentiable and their derivatives are bounded.

Theorem 2.2 Suppose that Assumption(H2)holds,then the MABSDE(7)admits a unique

solution(y(·),z(·))∈S2

F(0,T;R)×LF2(0,T;R).Moreover,if there exist another set of

coeﬃ-cients(f¯,ξ¯,ξ¯0,η¯0)satisfying(H2),and we denote by(y¯(·),z¯(·))the unique solution of MAB-SDE with coeﬃcients(f¯,ξ¯,ξ¯0,η¯0),then we have the following estimate:

E

sup

0≤t≤T

y(t) –¯y(t) 2+

T

0

z(t) –z¯(t) 2dt

≤CE

|ξ–ξ¯|2+

T+δ

T

ξ0(t) –ξ¯0(t) 2

+ η0(t) –η¯0(t) 2

dt

+

T

0

ft,¯y(t),y¯(t+δ),Eψ¯y(t),Eψ¯y(t+δ),

¯

z(t),z¯(t+δ),Eϕ¯z(t),Eϕz¯(t+δ) –f¯t,y¯(t),y¯(t+δ),Eψy¯(t),Eψy¯(t+δ),

¯

z(t),z¯(t+δ),Eϕ¯z(t),Eϕz¯(t+δ) 2dt

, (8)

(7)

Proof On the interval [T–δ,T], Eq. (7) becomes

y(t) =ξ+

T

t

fs,y(s),ξ0(s+δ),Eψy(s),Eψξ0(s+δ),z(s),η0(s+δ),

Eϕz(s),Eϕη0(s+δ)ds–

T

t

z(s)dB(s),

whereξ0(·),η0(·) are given, i.e. (7) is a mean-ﬁeld BSDE without anticipation. According to Theorem 3.1 and Lemma 3.1 in [7], (7) admits a unique solution under Assumption (H2) and the estimate (8) also holds for (y(·),z(·)) on [T–δ,T]. We can proceed with this argument on [T– 2δ,T–δ], [T– 3δ,T– 2δ] step by step. Hence the conclusions in

Theorem2.2can be proved.

3 Formulation of the optimal control problem

The purpose of this section is to discuss the optimal control for a mean ﬁeld with delay system. We consider the system involving delay terms both in the state and the control variables.

Consider the following MFSDDE with control:

⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩

dx(t) =b(t,x(t),x(t–δ),Eψ(x(t)),Eψ(x(t–δ)),v(t),v(t–δ))dt

+σ(t,x(t),x(t–δ),Eϕ(x(t)),Eϕ(x(t–δ)))dB(t), t∈[0,T],

x(t) =x0(t), v(t) =v0(t), t∈[–δ, 0],

(9)

with

b: [0,T]×R×R×R×R×U×U→R,

σ: [0,T]×R×R×R×R→R.

The control domain U is a nonempty bounded subset ofR. We denote byU ={v(·)∈

L2_F(–δ,T;R)|v(t)∈U, for anyt∈[–δ,T]}.Uis called a feasible control set. The cost func-tional is

Jv(·)=E

T

0

Lt,x(t),x(t–δ),Eφx(t),Eφx(t–δ),v(t),v(t–δ)dt

+EΦx(T),Eχx(T), (10)

Problem 3.1 The optimal control problem is to minimizeJ(v(·)) overv(·)∈U subject to

the following ﬁnal state constraint:

EGx(T)= 0. (11)

Assumption (H3) will be in force throughout the rest of this paper: (H3.1) x0(·)∈L2

F(–δ, 0;R),v0∈L2_F(–δ, 0;R), and for anyx,xδ,μ1,μ1δ,μ2,μ2δ,μ3,μ3δ,μ4,

v,vδ∈R, the mappingsb(·,x,xδ,μ1,μ1δ,v,vδ),σ(·,x,xδ,μ2,μ2δ)andL(·,x,xδ,μ3,

(8)

(H3.2) b(·, 0, 0, 0, 0, 0, 0),σ(·, 0, 0, 0, 0)∈L2

F(0,T;R), L(·, 0, 0, 0, 0, 0, 0) ∈L1F(0,T;R) and

Φ(0, 0)∈L1(FT;R).

(H3.3) b,σandΦare continuously diﬀerentiable with respect to their own variables and their derivatives are both continuous and uniformly bound. And there exists some constantC, s.t.

b(t,x,xδ,μ1,μ1δ,v,vδ) 2

+ L(t,x,xδ,μ3,μ3δ,v,vδ) 2

≤C1 +|x|2+|xδ|2+|μ1|2+|μ1δ|2+|μ3|2+|μ3δ|2+|v|2+|vδ|2

,

σ(t,x,xδ,μ2,μ2δ) 2

≤C1 +|x|2+|xδ|2+|μ2|2+|μ2δ|2

,

Φ(x,μ4) 2

≤C1 +|x|2+|μ4|2

.

(H3.4) ψ,ϕ,φandχare continuously diﬀerentiable and their derivatives are uniformly bounded. And moreover,

ψ(x) 2+ ϕ(x) 2+ φ(x) 2+ χ(x) 2≤C1 +|x|2,

for some constantC.

(H3.5) G(x)isFT-measurable for allx∈R,E|G(0)| ≤+∞, andGis continuously

diﬀer-entiable with bounded derivatives.

Ifv(·) is a feasible control and Assumption (H2) holds, by Theorem2.1, Eq. (9) admits a unique solution denoted byxv₍_·₎_∈_S2

F(0,T;R) and it is easy to check that the cost func-tional is well defined. Ifv(·)∈U also satisfies final state constraint (11), we will call it the admissible control. We denote the set of the admissible controls byUad.

Remark 3.1 We shall postpone obtaining the maximum principle until the following bounded and continuous dependence for the solution of state function (9). These results will play an important role in exploring the maximum principle of Problem3.1. One should note that our control domainUis bounded, while the caseUis unbounded can be treated via the bounded case with a convergence technique as mentioned in Zhang [23].

Lemma 3.2 There exists a constant C> 0,such that,for any v(·),u(·)∈U,we have

E sup

0≤t≤T xv(t) 2

≤C, (12)

E sup

0≤t≤T

xv(t) –xu(t) 2

≤CE T

0

_b

Γu(t),v(t),v(t–δ)–bΓu(t),u(t),u(t–δ) 2dt, (13)

where we have used the abbreviated notations for w=v,u,

Γw(t)=t,xw(t),xw(t–δ),Eψxw(t),Eψxw(t–δ),

Θw(t)=t,xw(t),xw(t–δ),Eϕxw(t),Eϕxw(t–δ).

(9)

Proof Firstly, applying basic inequality and B-D-G inequality to Eq. (9), we deduce that

E sup

0≤t≤T xv(t) 2

≤C x0(0) 2+CE T

0

_b

Γv(t),v(t),v(t–δ) 2+ σΘv(t) 2dt.

Sinceb,σ satisfy Assumption (H3.3) andψ,ϕsatisfy Assumption (H3.4), we have

E sup

0≤t≤T xv(t) 2

≤C x0(0) 2+CE

T

0

1 + xv(t) 2+ xv(t–δ) 2+ Eψxv(t) 2+ Eψxv(t–δ) 2

+ Eϕxv(t) 2+ Eϕxv(t–δ) 2+ v(t) 2+ v(t–δ) 2dt

≤C x0(0) 2

+CE T

0

sup

0≤s≤t

xv(s) 2dt

+CE 0

–δ

x0(t) 2dt+CE T

0

_v₍_t₎ 2

+ v(t–δ) 2dt.

The second inequality is obtained by a change of variable. Thus, the required result (12) follows by applying Gronwall’s inequality and the boundedness of the control variables.

Next, let us turn to the continuous dependence result inequality (13). It is easy to obtain

E sup

0≤t≤T

xv(t) –xu(t) 2

≤CE

T

0

bΓv(t),v(t),v(t–δ)–bΓu(t),u(t),u(t–δ) 2

+ σΘv(t)–σΘu(t) 2dt

. (15)

For the ﬁrst term of the right side of (15), we can get

E T

0

bΓv(t),v(t),v(t–δ)–bΓu(t),u(t),u(t–δ) 2dt

≤CE T

0

bΓv(t),v(t),v(t–δ)–bΓu(t),v(t),v(t–δ) 2

+ bΓu(t),v(t),v(t–δ)–bΓu(t),u(t),u(t–δ) 2dt ≤CE

T

0

_xv₍_t_{) –}_xu₍_t₎ 2

+ bΓu(t),v(t),v(t–δ)–bΓu(t),u(t),u(t–δ) 2dt. (16) Because the coeﬃcientσ does not depend on the control variable, it follows that

E T

0

σΘv(t)–σΘu(t) 2dt≤CE T

0

xv(t) –xu(t) 2dt.

(10)

4 Maximum principle of the mean-ﬁeld optimal control problem with delay

4.1 Variation of the trajectory

Now we let (u(·),xu₍_·_{)) be an optimal solution of Problem}_3.1_{. For the simplicity of notation,}

we denotexu(·) byx(·) in the rest of this paper. Given anyτ ∈[0,T) andv(·)∈U, let us deﬁne the following spike variational control:

u₍_t_{) =} the short-hand notationΓu(t) andΘu(t). We introduce the following linear variational equation:

It is easy to check that the variational equation is a linear MFSDDE and it admits a unique solutionq(·)∈S2_(0,_T_;_R_{). We have the following convergence result.}

Lemma 4.1 Suppose(H3)holds.Then there exists C> 0,which is independent of such that

E sup

0≤t≤T

q(t) 2≤C2. (18)

Proof By the Cauchy–Schwartz inequality and B-D-G inequality we have

(11)

+

Since all the derivatives are bounded and the following results:

E

Applying Gronwall’s inequality, the desired conclusion is obtained.

Lemma 4.2 Suppose(H3)holds,then we have

(12)

+ We will use the following abbreviated notations:

By a Taylor expansion, we can rewrite Eq. (22) and we should point out that the expansion is more complex than the case without mean-ﬁeld terms in Chen and Wu [20]. We have

(13)

+

2 = 0. Also, since all the derivatives are bounded, we can group all above terms to deduce that

sup

(14)

4.2 Necessary maximum principle

We will try to derive the necessary conditions of our optimal control problem in this sub-section.

Since we have the ﬁnal state constraint, we ﬁrst introduce the following Ekeland varia-tional principle in [26].

Lemma 4.3(Ekeland’s variational principle) Let(S,d(·,·))be a completed metric space,

and let F:S→Rbe lower-semicontinuous and bounded from below.If there exists u∈S such that F(u)≤infv∈SF(v) +ρfor someρ> 0,then there exists uρ∈S,such that F(uρ)≤ F(u),d(uρ_,_u₎_{≤ √}_ρ_,_{and F}₍_v_{) +}√_ρ_d₍_uρ_,_v_{) >}_F₍_uρ₎_{for any v}₌_uρ_.

Also we let (u(·),x(·)) be the optimal control and corresponding trajectory of (9)–(10) under the ﬁnal state constraint (11). We deﬁne a metricdonUby

dv(·),u(·)=E

T

–δ

Iv(t)=u(t)dt, (27)

whereIis indicator function and then (U,d) is a completed metric space.

Proposition 4.4 Let us deﬁne

Jρ

v(·)=Jv(·)–Ju(·)+ρ2+ EGxv₍_T₎ 2_, ₍₂₈₎ with v(·)∈U.xv₍_·₎_{is the corresponding trajectory of v}₍_·_)._{Then J}_ρ(_·₎_{is bounded and} contin-uous onU.

Proof By the deﬁnition ofJ(v(·)) and (H3), the bounded ofJρ(·) is obvious. And moreover, for anyv(·),ω(·)∈U,

Jρ

v(·)–Jρ

ω(·) 2

≤ Jρ2

v(·)–Jρ2

ω(·)

= Jv(·)–Ju(·)+ρ2– EGxv(T) 2–Jω(·) –Ju(·)+ρ2+ EGxω₍_T₎ 2

≤ Jv(·)–Ju(·)+ρ2–Jω(·)–Ju(·)+ρ2

+ EGxv(·)2–EGxω(·)2

:=ˆJ1+ˆJ2. (29)

Here

ˆ J1= J

v(·)+Jω(·)– 2Ju(·)+ 2ρ × Jv(·)–Jω(·) . (30) We see that terms within the ﬁrst part on the right side of (30) are bounded by some constantC. By the continuity ofL,Φand their derivatives are bounded we have

Jv(·)–Jω(·) 2

≤CE T

0

(15)

+CE T

0

Lt,xω(t),xω(t–δ),Eφxω(t),Eφxω(t–δ),v(t),v(t–δ)

–Lt,xω₍_t_),_xω₍_t_–_δ_),_E_φ_xω₍_t₎_,_E_φ_xω₍_t_–_δ₎_,_ω₍_t_),_ω₍_t_–_δ₎ 2_dt_. ₍₃₁₎ Then, by Lemma3.2, we can deduce

lim

v(·)→ω(·)

Jv(·)–Jω(·) = 0. (32)

WhileJˆ2can be expanded as

ˆ

J2= EG

xv(T)2–EGxω(T)2

= EGxv(T)+Gxω(T) × EGxv(T)–Gxω(T)

≤CE xv(T) –xω₍_T₎ _. ₍₃₃₎

Combining (29)–(33), we can derive the continuous property ofJρ(v(·)).

Remark4.5 One can notice thatJρ(·) is deﬁned on the feasible control setU rather than the admissible control setUad. It means that we can get rid of the state constraint by the new cost functional.

Now we consider the following free ﬁnal state optimal control problem:

inf

v(·)∈UJρ

v(·).

It is easy to verify that

Jρ

u(·)=ρ, Jρ

v(·)> 0, (34)

and

Jρ

u(·)≤ inf

v(·)∈UJρ

v(·)+ρ. (35)

According to Ekeland’s variational principle, there existsuρ₍_·₎_∈_U_{such that}

(i) Jρ

uρ(·)≤Jρ

u(·)=ρ, (ii) duρ(·),u(·)≤√ρ, (iii) Jρ

v(·)+√ρduρ(·),v(·)≥Jρ

uρ(·), ∀v(·)∈U.

(36)

We can get the necessary conditions of uρ₍_·_{) and then take}_ρ_↓_{0 to derive the proper} conditions ofu(·).

Applying the “spike variation method”, we can construct a uρ₍_·₎_∈_U _{for any}_{> 0 as} follows:

uρ(t) =

⎧ ⎨ ⎩

v(t), τ≤t≤τ+,

(16)

where 0 <<δandτ+≤T. And then

duρ(·),uρ(·)≤. (37)

Letxρ₍_·_{) and}_xρ₍_·_{) be the solution of state function (}₉_{) under the control}_uρ₍_·_{) and}_uρ₍_·_). Following the variational equation (17), we introduce the following expression forqρ₍_·_):

⎧

According to Lemma4.1and Lemma4.2, we have

E sup

Using the Taylor expansion we can check

Juρ₍_·₎_–_J_uρ₍_·₎

By arguments analogous to the previous one,

(17)

(18)

+Lμδ

(19)

=E

T

0

bΓρ(t),uρ(t),uρ(t–δ)–bΓρ(t),uρ(t),uρ(t–δ)pρ(t)dt

+E

T

0

qρ(t)αρ{Lx

Υρ(t),uρ(t),uρ(t–δ) +EFt_L

xδ

Υρ(t),uρ(t),uρ(t–δ)|t+δ

+ELμ

Υρ(t),uρ(t),uρ(t–δ)φx

xρ(t)

+E

T

0

qρ(t)αρLx

Υρ(t),uρ(t),uρ(t–δ) +EFt_L

xδ

+ELμ

Υρ(t),uρ₍_t_),_uρ₍_t_–_δ₎_φ

x

xρ₍_t₎

+ELμδ

EFt_φ

xδ

xρ(t–δ)|t+δ

dt. (50)

Let us assume without loss of generality thatLxδ(t) =Lμδ(t) = 0 on [T,T+δ]. Then Eq. (47) can be written as

E T

0

αρLΥρ(t),uρ(t),uρ(t–δ)–LΥρ(t),uρ(t),uρ(t–δ) +bΓρ(t),uρ(t),uρ(t–δ)–bΓρ(t),uρ(t),uρ(t–δ)pρ(t)dt

+√ρ+o()

=E

τ+

τ

αρLΥρ(t),v(t),uρ₍_t_–_δ₎_–_L_Υρ₍_t_),_uρ₍_t_),_uρ₍_t_–_δ₎ +bΓρ(t),v(t),uρ(t–δ)–bΓρ(t),uρ(t),uρ(t–δ)pρ(t)dt

+E

τ++δ

τ+δ

αρLΥρ(t),uρ₍_t_),_v₍_t_–_δ₎_–_L_Υρ₍_t_),_uρ₍_t_),_uρ₍_t_–_δ₎ +bΓρ(t),uρ(t),v(t–δ)–bΓρ(t),uρ(t),uρ(t–δ)pρ(t)dt

+√ρ+o()

≥0. (51)

By a change of variables, we can obtain

E τ+

τ

αρLΥρ(t),v(t),uρ(t–δ)–LΥρ(t),uρ(t),uρ(t–δ) +bΓρ(t),v(t),uρ(t–δ)–bΓρ(t),uρ(t),uρ(t–δ)pρ(t)dt

+E

τ+

τ

αρLΥρ(t+δ),uρ(t+δ),v(t)–LΥρ(t+δ),uρ(t+δ),uρ(t) +bΓρ(t+δ),uρ₍_t₊_δ_),_v₍_t₎_–_b_Γρ₍_t₊_δ_),_uρ₍_t₊_δ_),_uρ₍_t₎_pρ₍_t₊_δ₎_dt

(20)

Considering the arbitrariness ofτ∈[0,T), dividing (52) byand taking→0+_{, it follows} ofxon the control variables in Lemma3.2, one can obtain

(21)

+EFt_α_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_v₍_t₎_–_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_u₍_t₎

+bΓ(t+δ),u(t+δ),v(t)–bΓ(t+δ),u(t+δ),u(t)p(t+δ)

≥0. (57)

We also need to drop the expectation in (57). In fact, (57) holds for allv(·)∈Usincev(·) is arbitrary. For anyv∈UandA∈Ft, we deﬁnev(t) =vIA+u(t)IA¯. Thenv(·)∈Uand taking

thisv(·) into (57), we have

EαLΥ(t),v,u(t–δ)–LΥ(t),u(t),u(t–δ)IA

+bΓ(t),v,u(t–δ)–bΓ(t),u(t),u(t–δ)IAp(t)

+EFt_α_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_v_–_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_u₍_t₎_I A

+bΓ(t+δ),u(t+δ),v–bΓ(t+δ),u(t+δ),u(t)IAp(t+δ)

≥0. (58)

SinceA∈Ftis chosen arbitrarily, it implies that

EFt_α_L_Υ₍_t_),_v_,_u₍_t_–_δ₎_–_L_Υ₍_t_),_u₍_t_),_u₍_t_–_δ₎

+bΓ(t),v,u(t–δ)–bΓ(t),u(t),u(t–δ)p(t)

+EFt_α_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_v_–_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_u₍_t₎

+bΓ(t+δ),u(t+δ),v–bΓ(t+δ),u(t+δ),u(t)p(t+δ)

≥0.

This will lead to

αLΥ(t),v,u(t–δ)–LΥ(t),u(t),u(t–δ)

+bΓ(t),v,u(t–δ)–bΓ(t),u(t),u(t–δ)p(t)

+EFt_α_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_v_–_L_Υ₍_t₊_δ_),_u₍_t₊_δ_),_u₍_t₎

+bΓ(t+δ),u(t+δ),v–bΓ(t+δ),u(t+δ),u(t)p(t+δ)

≥0.

Let us introduce the following Hamiltonian:

H(t,x,xδ,μ1,μ1δ,μ3,μ3δ,v,vδ,p,α)

=b(t,x,xδ,μ1,μ1δ,v,vδ)p+αL(t,x,xδ,μ3,μ3δ,v,vδ). (59) Set

H(t,v)

=Ht,x(t),x(t–δ),Eψx(t),Eψx(t–δ),

(22)

+EFt_[_H_t₊_δ_,E_ψ_x₍_t₊_δ₎_,E_ψ_x₍_t₎_,E_φ_x₍_t₊_δ₎_,

Eφx(t),u(t+δ),v,p(t+δ),α. (60)

We haveH(t,v)≥H(t,u(t)),∀v∈U, a.e.t∈[0,T],P-a.s.

In summary of the above analysis, we can get the following maximum principle.

Theorem 4.7 Let(u(·),x(·))be an optimal solution of the Problem3.1and Assumption

(H2)hold.Then there existα,γ∈Rwith|α|2+|γ|2= 1such that

(i) p(·),k(·)is a unique solution of(55);

(ii) H(t,v)≥Ht,u(t),

∀v∈U, a.e. t∈[0,T],P-a.s., whereHis deﬁned in(60).

(61)

Remark4.8 WhenG(x) = 0, the state constraint will disappear and the results in our paper will degenerate to the case without state constraint.

5 Application

To conclude this paper, we apply our maximum principle to study a kind of optimal har-vesting problem for a mean-ﬁeld system. The model of our problem comes from Hu, Øk-sendal and Sulem [17]. We modify the model to be a delayed system with continuous harvesting as follows:

⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩

dx(t) ={b1(t)x(t) +b2(t)x(t–δ) +b3(t)E[x(t)] –λ1v(t) –λ2v(t–δ)})dt +{σ1(t)x(t) +σ2(t)E[x(t)]}dB(t), t∈[0,T]

x(0) =x0(t) > 0, t∈[–δ, 0].

(62)

Herex(t) is the density of an unharvested population at timetandv(t) is the harvesting effort andλ1,λ2> 0 are the given harvesting efficiency coefficients.

The performance functional is assumed to be of the form

Jv(·)= –E

T

0

L1(t)x(t) +L2(t)Ex(t)+L3(t)v2(t)dt–EKx(T), (63)

whereK=K(ω) > 0 isFT-measurable, representing the salvage price. Our problem is to

minimize aboveJ(v(·)) under the constraint (11). For simplicity, we assume all the coeﬃ-cientsbi(t) (i= 1, 2, 3),σi(t) (i= 1, 2) andLi(t) > 0 (i= 1, 2, 3) are deterministic functions

on [0,T].

Proposition 5.1 Assumeα> 0,γ > 0and the constraint function G(x)is convex.Then the optimal control of the mean-ﬁeld harvesting problem with delay is

u(t) = –λ2p0(t) +λ2E

Ft_[_p₀₍_t₊_δ_)]

(23)

where p0(t)is solution of the following adjoint equation:

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

–dp0(t) ={b1(t)p0(t) +EFt_[_b₂₍_t₊_δ₎_p₀₍_t₊_δ_{)] +}E_[_b₃₍_t₎_p₀₍_t_)]

+σ1(t)k0(t) +E[σ2(t)k0(t)] –α[L1(t) +L2(t)]}dt –k0(t)dB(t), t∈[0,T],

p0(T) = –αK+γGx(x(T)),

p0(t) = 0, k0(t) = 0, t∈(T,T+δ].

(65)

Proof Applying the necessary condition (61) in Theorem4.7, the control of the form (64) is a candidate of the optimal controls. We also need to prove the optimality ofu(t). Suppose

x(t) is the trajectory ofu(t) andv(t)∈Uad withxv(t) being its corresponding trajectory. Let us denotex¯(t) =xv(t) –x(t). Applying Itô’s formula tox¯(t)p0(t) we have

Ep0(T)x¯(T)

=E

T

0

αx¯(t)L1(t) +L2(t) –λ1p0(t)

v(t) –u(t)–λ2p0(t)

v(t–δ) –u(t–δ)dt. (66) For the mean-ﬁeld optimal harvesting problem, we have

E T

0

Ht,v(t)–Ht,u(t)dt

=E

T

0

–λ1p0(t)

v(t) –u(t)

–λ2EFt

p0(t+δ)v(t) –u(t)–αL3(t)v2(t) –u2(t)dt. (67) Using a change of time variables we conclude that

E T

0

λ2p0(t)

v(t–δ) –u(t–δ)–λ2EFt

p0(t+δ)v(t) –u(t)dt= 0.

Thus, Eqs. (66) and (67) lead to

Eγx¯(T)Gx

x(T)–E

T

0

αx¯(t)L1(t) +L2(t)+L3(t)v2(t) –u2(t)dt

=Eγx¯(T)Gx

x(T)+αJv(·)–Ju(·)

=E

T

0

Ht,v(t)–Ht,u(t)dt

≥0. (68)

Furthermore, we can get

Jv(·)–Ju(·)

≥–γ

αE

Gx

(24)

≥–γ

αE

Gxv(T)–Gx(T)

= 0, (69)

since we have the constraint conditionE[G(x(T))] =E[G(xv₍_T_{))] = 0 and}_G_{is convex. The}

proof of the optimality is completed.

Remark5.2 An example of the constraint function isG(x) = ((N–x)+₎2_{with a ﬁxed} con-stantN. And this state constraint condition implies thatx(T)≥Na.s.

Funding

This work is supported by National Natural Science Foundation of China (11301530).

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors have contributed equally in this paper. All authors read and approved the ﬁnal manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional aﬃliations.

Received: 16 February 2019 Accepted: 1 August 2019 References

1. Kac, M.: Foundations of kinetic theory, Berkeley symposium on mathematical statistics and probability. In: Third Berkeley Symposium on Mathematical Statistics and Probability, pp. 101–110 (1956)

2. Mckean, H.P.: A class of Markov processes associated with nonlinear parabolic equations. Proc. Natl. Acad. Sci. USA

56(6), 1907–1911 (1966)

3. Cardaliaguet, P.: Notes on mean ﬁeld games. Technical report, (2010)

4. Buckdahn, R., Li, J., Ma, J.: A stochastic maximum principle for general mean-ﬁeld systems. Appl. Math. Optim.74(3), 507–534 (2016)

5. Agram, N., Øksendal, B.: Mean-field stochastic control with elephant memory in finite and infinite time horizon. arXiv preprint (2018).arXiv:1804.09918

6. Agram, N., Øksendal, B.: Stochastic control of memory mean-ﬁeld processes. Appl. Math. Optim.2, 1–24 (2017) 7. Buckdahn, R., Djehiche, B., Li, J.: A general stochastic maximum principle for sdes of mean-ﬁeld type. Appl. Math.

Optim.64(2), 197–216 (2011)

8. Buckdahn, R., Li, J., Peng, S.: Mean-field backward stochastic differential equations and related partial differential equations. Stoch. Process. Appl.119(10), 3133–3154 (2009)

9. Carmona, R., Delarue, F., et al.: Forward–backward stochastic diﬀerential equations and controlled McKean–Vlasov dynamics. Ann. Probab.43(5), 2647–2700 (2015)

10. Lasry, J.M., Lions, P.L.: Mean ﬁeld games. Jpn. J. Math.2(1), 229–260 (2007)

11. Carmona, R., Delarue, F.: Probabilistic analysis of mean-ﬁeld games. SIAM J. Control Optim.51(4), 2705–2734 (2013) 12. Li, J.: Stochastic maximum principle in the mean-ﬁeld controls. Automatica48(2), 366–373 (2012)

13. Yong, J.: Linear–quadratic optimal control problems for mean-ﬁeld stochastic diﬀerential equations. SIAM J. Control Optim.51(4), 2809–2838 (2013)

14. Huang, J., Li, X., Yong, J.: A linear–quadratic optimal control problem for mean-field stochastic differential equations in infinite horizon. Math. Control Relat. Fields5(1), 97–139 (2015)

15. Li, X., Sun, J., Yong, J.: Mean-ﬁeld stochastic linear quadratic optimal control problems: closed-loop solvability. Probab. Uncertain. Quant. Risk1(1), 1–24 (2016)

16. Andersson, D., Djehiche, B.: A maximum principle for sdes of mean-ﬁeld type. Appl. Math. Lett.63(3), 341–356 (2011) 17. Hu, Y., Øksendal, B., Sulem, A.: Singular mean-ﬁeld control games. Stoch. Anal. Appl.35(5), 1–29 (2017)

18. Menaldi, J.M., Rofman, E.: In: Sulem, A., Øksendal, B., Sulem, A. (eds.) Optimal Control and Partial Diﬀerential Equations—Innovations and Applications (2000)

19. Chen, L., Wu, Z.: Maximum principle for stochastic optimal control problem of forward–backward system with delay. In: IEEE Conference on Decision & Control, pp. 16–18 (2009). IEEE

20. Chen, L., Wu, Z.: Maximum principle for the stochastic optimal control problem with delay and application. Automatica46(6), 1074–1080 (2009)

21. Peng, S.: A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim.28(4), 966–979 (1990)

22. Yu, Z.: The stochastic maximum principle for optimal control problems of delay systems involving continuous and impulse controls. Automatica48(10), 2420–2432 (2012)

23. Zhang, F.: Maximum principle for delayed stochastic linear–quadratic control problem with state constraint. J. Appl. Math.2013, Article ID 964765 (2013)

(25)

25. Yong, J., Zhou, X.: Stochastic controls: Hamiltonian systems and hjb equations. In: Karatzas, I., Yor, M. (eds.) Applications of Mathematics, 1st edn., vol. 43, pp. 101–156. Springer, New York (1999)