
arXiv:2101.02517v1 [math.PR] 7 Jan 2021

Approximation of martingale couplings on the line in the weak adapted topology

M. Beiglböck∗, B. Jourdain†, W. Margheriti†,‡, G. Pammer∗,§

Abstract

Our main result is to establish stability of martingale couplings: suppose that $\pi$ is a martingale coupling with marginals $\mu, \nu$. Then, given approximating marginal measures $\tilde\mu \approx \mu$, $\tilde\nu \approx \nu$ in convex order, we show that there exists an approximating martingale coupling $\tilde\pi \approx \pi$ with marginals $\tilde\mu, \tilde\nu$.

In mathematical finance, prices of European call / put options yield information on the marginal measures of the arbitrage free pricing measures. The above result asserts that small variations of call / put prices lead only to small variations on the level of arbitrage free pricing measures.

While these facts have been anticipated for some time, the actual proof requires somewhat intricate stability results for the adapted Wasserstein distance. Notably, the result has consequences for several related problems. Specifically, it is relevant for numerical approximations, it leads to a new proof of the monotonicity principle of martingale optimal transport, and it implies stability of weak martingale optimal transport as well as optimal Skorokhod embedding. On the mathematical finance side this yields continuity of the robust pricing problem for exotic options and VIX options with respect to market data. These applications will be detailed in two companion papers.

Keywords: Martingale Optimal Transport, Adapted Wasserstein distance, Robust finance, Convex order, Martingale couplings.

1 Introduction

Subsequently we will carefully explain all required notation and describe relevant literature in some detail. Before going into this, we provide here a first description of our main result and its relevance for martingale transport theory.

In brief, while classical transport theory is concerned with couplings $\Pi(\mu, \nu)$ of probability measures $\mu, \nu$, the martingale variant restricts the problem to the set of couplings which also satisfy the martingale constraint, in symbols $\Pi_M(\mu, \nu)$. Already in the simple (and most relevant) case where $\mu, \nu$ are probabilities on the real line, many arguments and results appear significantly more involved in the martingale context. A basic explanation lies in the rigidity of the martingale condition, which makes classically simple approximation results quite intricate. Specifically, the martingale theory has been missing a counterpart to the following straightforward fact of classical transport theory:

Fact 1.1 (Stability of Couplings). Let $\pi \in \Pi(\mu, \nu)$ and assume that $\mu^k, \nu^k$, $k \in \mathbb{N}$, are probabilities that converge weakly to $\mu$ and $\nu$. Then there exist couplings $\pi^k \in \Pi(\mu^k, \nu^k)$, $k \in \mathbb{N}$, converging weakly to $\pi$.

This is so basic and straightforward that its implicit use is easily overlooked. Note however that it plays a crucial role in a number of instances, e.g. for stability of optimal transport, providing numerical approximations, or in the characterisation of optimality through cyclical monotonicity.

The main result of this article is to establish the martingale counterpart of Fact 1.1, see Theorem 2.5 below. This closes a gap in the theory of martingale transport. It yields basic fundamental results in a unified fashion that is much closer to the classical theory, and it allows one to address questions in martingale optimal transport, optimal Skorokhod embedding and robust finance that have previously remained open. We will consider these applications systematically in two accompanying articles.

∗ University of Vienna, Austria. Emails: mathias.beiglboeck@univie.ac.at, gudmund.pammer@univie.ac.at

† Université Paris-Est, Cermics (ENPC), INRIA, F-77455 Marne-la-Vallée, France. E-mails: benjamin.jourdain@enpc.fr, william.margheriti@enpc.fr

‡ acknowledges support from the “Chaire Risques Financiers”, Fondation du Risque.


1.1 The Martingale Optimal Transport problem

Let (X, dX), (Y, dY) be Polish spaces and C : X×Y → R+ be a nonnegative measurable function.

Denote by P(X) the set of probability measures on X. For µ ∈ P(X) and ν ∈ P(Y), the classical Optimal Transport problem consists in minimising

$$\inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} C(x, y)\, \pi(dx, dy), \tag{OT}$$

where $\Pi(\mu, \nu)$ denotes the set of probability measures in $\mathcal{P}(X \times Y)$ with first marginal $\mu$ and second marginal $\nu$. When $X = Y$ and $C = d_X^r$ for some $r \ge 1$, (OT) corresponds to the well-known Wasserstein distance with index $r$ to the power $r$, denoted $\mathcal{W}_r^r(\mu, \nu)$, see [4, 41, 43, 44] for a study in depth.

The theory of OT goes back to Gaspard Monge [36] and, in its modern formulation, to Kantorovich [31]. It was rediscovered many times under various forms and has an impressive scope of applications. A variant of OT that is suitable for applications in mathematical finance, in particular model-independent pricing, was introduced in [10] in a discrete time setting and in [21] in a continuous time setting. Compared to usual OT, the difference is that one adds a martingale constraint to (OT), which reflects the condition for a financial market to be free of arbitrage.

In detail, the Martingale Optimal Transport (MOT) problem is formulated as follows: given $\pi \in \mathcal{P}(X \times Y)$, we denote by $(\pi_x)_{x \in X}$ its disintegration with respect to its first marginal $\mu$. We then write $\pi(dx, dy) = \mu(dx)\, \pi_x(dy)$, or with a slight abuse of notation, $\pi = \mu \times \pi_x$ if the context is not ambiguous. Let $C : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ be a nonnegative measurable function and $\mu$, $\nu$ be two probability distributions on the real line with finite first moment. Then the MOT problem consists in minimising

$$\inf_{\pi \in \Pi_M(\mu,\nu)} \int_{\mathbb{R} \times \mathbb{R}} C(x, y)\, \pi(dx, dy), \tag{MOT}$$

where $\Pi_M(\mu, \nu)$ denotes the set of martingale couplings between $\mu$ and $\nu$, that is

$$\Pi_M(\mu, \nu) = \Big\{ \pi = \mu \times \pi_x \in \Pi(\mu, \nu) \;\Big|\; \mu(dx)\text{-almost everywhere},\ \int_{\mathbb{R}} y\, \pi_x(dy) = x \Big\}.$$
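For finitely supported couplings, the martingale constraint in the definition of $\Pi_M(\mu, \nu)$ reduces to one barycentre identity per atom of the first marginal. The following is a minimal sketch of that check, assuming a coupling is stored as a dictionary from pairs `(x, y)` to weights; the function name `is_martingale_coupling` and the two example couplings are illustrative and not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def is_martingale_coupling(pi):
    """Check that sum_y y * pi(x, y) == x * sum_y pi(x, y) for every atom x.

    `pi` maps pairs (x, y) to positive weights (exact Fractions or ints).
    """
    mass = defaultdict(Fraction)          # mu({x}), weight of x in the first marginal
    first_moment = defaultdict(Fraction)  # sum over y of y * pi(x, y)
    for (x, y), weight in pi.items():
        mass[x] += weight
        first_moment[x] += weight * y
    return all(first_moment[x] == x * mass[x] for x in mass)


half = Fraction(1, 2)
# From x = 0, move to -1 or +1 with equal probability: the barycentre stays at 0.
symmetric = {(0, -1): half, (0, 1): half}
# From x = 0, stay at 0 or move to +1: the barycentre is 1/2, not 0.
drifting = {(0, 0): half, (0, 1): half}
```

Here `symmetric` satisfies the constraint while `drifting` does not, although both are couplings of a Dirac mass with a two-point measure.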

According to Strassen’s theorem [42], the existence of a martingale coupling between two probability measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ with finite first moment is equivalent to $\mu \le_c \nu$, where $\le_c$ denotes the convex order. We recall that two finite positive measures $\mu, \nu$ on $\mathbb{R}^d$ with finite first moment are said to be in the convex order iff

$$\int_{\mathbb{R}^d} f(x)\, \mu(dx) \le \int_{\mathbb{R}^d} f(y)\, \nu(dy)$$

for every convex function $f : \mathbb{R}^d \to \mathbb{R}$. Note that equality holds for all affine functions, from which we deduce that $\mu$ and $\nu$ have equal mass and satisfy $\int_{\mathbb{R}^d} x\, \mu(dx) = \int_{\mathbb{R}^d} y\, \nu(dy)$.

For adaptations of classical optimal transport theory to the MOT problem, we refer to Henry-Labordère, Tan and Touzi [27] and Henry-Labordère and Touzi [28]. Concerning duality results, we refer to Beiglböck, Nutz and Touzi [13], Beiglböck, Lim and Obłój [12] and De March [17]. We also refer to De March [16] and De March and Touzi [18] for the multi-dimensional case.

Concerning the numerical resolution of the MOT problem, we refer to the articles of Alfonsi, Corbetta and Jourdain [2, 3], De March [15], Guo and Obłój [24] and Henry-Labordère [26]. When $\mu$ and $\nu$ are finitely supported, the MOT problem amounts to linear programming. In the general case, once the MOT problem is discretised by approximating $\mu$ and $\nu$ by probability measures with finite support and in the convex order, Alfonsi, Corbetta and Jourdain raised the question of the convergence of the optimal costs of the discretised problems towards the optimal cost of the original problem. A first partial result was obtained by Juillet [30], who established stability of the left-curtain coupling. Guo and Obłój [24] established the result under moment conditions. More recently, Backhoff-Veraguas and Pammer [9] and Wiesel [45] independently gave a definite positive answer.

1.2 The Adapted Wasserstein distance

The stability result shown by Backhoff-Veraguas and Pammer involves Wasserstein convergence. More precisely, let $\mu^k, \nu^k \in \mathcal{P}(\mathbb{R})$, $k \in \mathbb{N}$, be in the convex order and respectively converge to $\mu$ and $\nu$ in $\mathcal{W}_r$. Under mild assumptions, for all $k \in \mathbb{N}$ there exists $\pi^k \in \Pi_M(\mu^k, \nu^k)$, optimal for (MOT), and any accumulation point of $(\pi^k)_{k \in \mathbb{N}}$ with respect to the $\mathcal{W}_r$-convergence is a martingale coupling between $\mu$ and $\nu$ optimal for (MOT).

However, it turns out that the usual weak topology / Wasserstein distance is not well suited to setups where accumulation of information plays a distinct role, e.g. in mathematical finance. Indeed, the symmetry of this distance does not take into account the temporal structure of stochastic processes. It is easy to convince oneself that two stochastic processes very close in Wasserstein distance can yield radically different information, as illustrated in [5, Figure 1]. Therefore, one needs to strengthen the usual topology of weak convergence accordingly. Over time numerous researchers have independently introduced refinements of the weak topology; we mention Hellwig’s information topology [25], Aldous’s extended weak topology [1], the nested distance / adapted Wasserstein distance of Pflug-Pichler [37] and the optimal stopping topology [6]. Strikingly, all those seemingly different definitions lead to the same topology in the present discrete time framework [6, Theorem 1.1]. We refer to this topology as the adapted weak topology. A natural compatible metric is given by the adapted Wasserstein distance, see [37, 38, 39, 40, 35, 14] among others.

Fix $x_0 \in X$ and $r \ge 1$. We denote the set of all probability measures on $X$ with finite $r$-th moment by $\mathcal{P}_r(X)$, i.e.

$$\mathcal{P}_r(X) = \Big\{ p \in \mathcal{P}(X) \;\Big|\; \int_X d_X^r(x, x_0)\, p(dx) < +\infty \Big\}.$$

Similarly, we denote by $\mathcal{M}(X)$ (resp. $\mathcal{M}_r(X)$) the set of all finite positive measures (resp. with finite $r$-th moment). The sets $\mathcal{M}(X)$ and $\mathcal{M}_r(X)$ are equipped with the weak topologies induced, respectively, by the set $C_b(X)$ of all real-valued bounded continuous functions on $X$ and by the set $\Phi_r(X)$ of all functions in $C(X)$, the set of real-valued continuous functions on $X$, which satisfy the growth constraint

$$\Phi_r(X) = \{ f \in C(X) \mid \exists \alpha > 0,\ \forall x \in X,\ |f(x)| \le \alpha (1 + d_X^r(x, x_0)) \}.$$

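A sequence $(\mu^k)_{k \in \mathbb{N}}$ converges in $\mathcal{M}_r(X)$ to $\mu$ iff

$$\forall f \in \Phi_r(X), \quad \mu^k(f) \underset{k \to +\infty}{\longrightarrow} \mu(f). \tag{1.1}$$

If moreover $\mu$ and $\mu^k$, $k \in \mathbb{N}$, have equal mass, then the convergence (1.1) can be equivalently formulated in terms of the Wasserstein distance with index $r$:

$$\mathcal{W}_r(\mu^k, \mu) := \Big( \inf_{\pi \in \Pi(\mu^k, \mu)} \int_{X \times X} d_X^r(x, y)\, \pi(dx, dy) \Big)^{\frac{1}{r}} \underset{k \to +\infty}{\longrightarrow} 0.$$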

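Given $m_0 > 0$, we can then equip the set of finite positive measures in $\mathcal{M}_r(X \times Y)$ with mass $m_0$ with the Wasserstein topology. However, we can also equip it with a stronger topology, namely the adapted Wasserstein topology. It is induced by the metric $\mathcal{AW}_r$ defined for all $\pi, \pi' \in \mathcal{M}_r(X \times Y)$ such that $\pi(X \times Y) = \pi'(X \times Y) = m_0$ by

$$\mathcal{AW}_r(\pi, \pi') = \Big( \inf_{\chi \in \Pi(\mu, \mu')} \int_{X \times X} \big( d_X^r(x, x') + \mathcal{W}_r^r(\pi_x, \pi'_{x'}) \big)\, \chi(dx, dx') \Big)^{\frac{1}{r}}, \tag{1.2}$$

where $\mu$, resp. $\mu'$, is the first marginal of $\pi$, resp. $\pi'$. It is easy to check that $\mathcal{W}_r \le \mathcal{AW}_r$, and therefore $\mathcal{AW}_r$ indeed induces a stronger topology than $\mathcal{W}_r$. Another useful point of view is the following: let $J : \mathcal{M}(X \times Y) \to \mathcal{M}(X \times \mathcal{P}(Y))$ be the inclusion map defined for all $\pi = \mu \times \pi_x \in \mathcal{M}(X \times Y)$ by

$$J(\pi)(dx, dp) = \mu(dx)\, \delta_{\pi_x}(dp).$$

For all $\pi, \pi' \in \mathcal{M}_r(X \times Y)$ with equal mass, their adapted Wasserstein distance coincides with

$$\mathcal{AW}_r(\pi, \pi') = \mathcal{W}_r(J(\pi), J(\pi')). \tag{1.3}$$

Therefore, the topology induced by $\mathcal{AW}_r$ coincides with the initial topology w.r.t. $J$.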
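The gap between the two distances can be made concrete on a two-point example in the spirit of the illustration cited from [5]. The sketch below evaluates formula (1.2) for $r = 1$ on the real line under a simplifying assumption that is ours, not the paper's: both first marginals are uniform over the same number $n$ of listed atoms (repetitions allowed), so that by the Birkhoff-von Neumann theorem the infimum over $\chi$ is attained at a permutation and can be found by brute force. The helper names `w1_line`, `adapted_w1` and `plain_w1` are hypothetical.

```python
from itertools import permutations


def w1_line(p, q):
    """W_1 between two finitely supported measures of equal mass on R,
    computed as the integral of |F_p - F_q|. Measures are lists of (y, weight)."""
    points = sorted({y for y, _ in p} | {y for y, _ in q})
    total, gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        gap += sum(w for y, w in p if y == left) - sum(w for y, w in q if y == left)
        total += abs(gap) * (right - left)
    return total


def adapted_w1(pi, pi_prime):
    """AW_1 from (1.2) when each coupling is a list of n pairs (x, kernel),
    every listed atom x carrying first-marginal weight 1/n."""
    n = len(pi)
    if n != len(pi_prime):
        raise ValueError("both couplings must list the same number of atoms")
    return min(
        sum(abs(x - pi_prime[j][0]) + w1_line(kernel, pi_prime[j][1])
            for (x, kernel), j in zip(pi, perm))
        for perm in permutations(range(n))
    ) / n


def plain_w1(atoms, atoms_prime):
    """W_1 on R^2 with cost |x - x'| + |y - y'| between uniform measures on n atoms."""
    n = len(atoms)
    return min(
        sum(abs(x - atoms_prime[j][0]) + abs(y - atoms_prime[j][1])
            for (x, y), j in zip(atoms, perm))
        for perm in permutations(range(n))
    ) / n


eps = 0.25
coin = [(-1.0, 0.5), (1.0, 0.5)]
# pi: start at 0, then jump to -1 or +1; the first step reveals nothing.
pi = [(0.0, coin), (0.0, coin)]
# pi': start at -eps or +eps; the first step already determines the second.
pi_prime = [(-eps, [(-1.0, 1.0)]), (eps, [(1.0, 1.0)])]

aw = adapted_w1(pi, pi_prime)
w = plain_w1([(0.0, -1.0), (0.0, 1.0)], [(-eps, -1.0), (eps, 1.0)])
```

In this example `w` equals `eps`, so it vanishes as `eps` tends to 0, whereas `aw` equals `eps + 1` and stays bounded away from 0, which reflects the different information available after the first step.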

Finally, let us mention the interpretation of the adapted Wasserstein distance in terms of bicausal couplings as done in [7]. Let $\pi, \pi' \in \mathcal{P}_r(X \times Y)$, and let $(Z_1, Z_2)$ and $(Z_1', Z_2')$ be random variables with respective distributions $\pi$ and $\pi'$ such that the joint distribution of $(Z_1, Z_2, Z_1', Z_2')$ is a $\mathcal{W}_r$-optimal coupling between $\pi$ and $\pi'$. In many cases, there exists a Monge transport map $T : X \times Y \to X \times Y$ such that $(Z_1', Z_2') = T(Z_1, Z_2)$. As mentioned in [5], the temporal structure of stochastic processes is then not taken into account, since the present value $Z_1'$ is determined from the future value $Z_2$. Therefore, it is more suitable to restrict to couplings $(Z_1, Z_2, Z_1', Z_2')$ between $\pi$ and $\pi'$ such that the conditional distribution of $(Z_1', Z_2')$ (resp. $(Z_1, Z_2)$) given $(Z_1, Z_2)$ (resp. $(Z_1', Z_2')$) is equal to the conditional distribution of $(Z_1', Z_2')$ (resp. $(Z_1, Z_2)$) given $Z_1$ (resp. $Z_1'$).

Let $\mu$ and $\mu'$ denote the respective first marginal distributions of $\pi$ and $\pi'$ and let $\eta \in \Pi(\pi, \pi')$ be a coupling between $\pi$ and $\pi'$. Denote by $(\eta_x(dx', dy'))_{x \in X}$ and $(\eta_{x'}(dx, dy))_{x' \in X}$ the probability kernels such that

$$\int_{y \in Y} \eta(dx, dy, dx', dy') = \mu(dx)\, \eta_x(dx', dy') \quad \text{and} \quad \int_{y' \in Y} \eta(dx, dy, dx', dy') = \mu'(dx')\, \eta_{x'}(dx, dy).$$

Then $\eta$ is called bicausal iff for $\pi(dx, dy)$-almost every $(x, y) \in X \times Y$, resp. $\pi'(dx', dy')$-almost every $(x', y') \in X \times Y$,

$$\eta_{(x,y)}(dx', dy') = \eta_x(dx', dy'), \quad \text{resp.} \quad \eta_{(x',y')}(dx, dy) = \eta_{x'}(dx, dy).$$

We denote by $\Pi_{bc}(\pi, \pi')$ the set of bicausal couplings between $\pi$ and $\pi'$. Another useful characterisation is that $\eta$ is bicausal iff there exist $\chi \in \Pi(\mu, \mu')$ and $(\gamma_{(x,x')}(dy, dy'))_{(x,x') \in X \times X}$ such that $\chi(dx, dx')$-almost everywhere, $\gamma_{(x,x')}(dy, dy') \in \Pi(\pi_x, \pi'_{x'})$ and

$$\eta(dx, dy, dx', dy') = \chi(dx, dx')\, \gamma_{(x,x')}(dy, dy').$$

Then the adapted Wasserstein distance coincides with

$$\mathcal{AW}_r(\pi, \pi') = \Big( \inf_{\eta \in \Pi_{bc}(\pi, \pi')} \int_{X \times Y \times X \times Y} \big( d_X^r(x, x') + d_Y^r(y, y') \big)\, \eta(dx, dy, dx', dy') \Big)^{\frac{1}{r}}.$$

One of the objectives of the present paper is to prove that some well-known stability results for the $\mathcal{W}_r$-convergence also hold for the $\mathcal{AW}_r$-convergence. More details are given in Section 2.

1.3 Outline

Section 2 presents the main result of this article, namely Theorem 2.5. We explain why the conclusion of this theorem could be expected. Moreover, we sketch its proof in order to help the reader see through the technical details.

Section 3 deals with technical lemmas which allow us to deal with difficulties specific to the adapted Wasserstein distance with more ease. They mainly explore properties about approximations and when addition (in a sense explained below) is continuous.

Section 4 focuses on the convex order. It deals with potential functions which are a convenient tool to address the convex order in dimension one. Moreover, it contains the proof of a key result which allows us to see that the main theorem needs only to be proved for irreducible pairs of marginals.

Section 5 is, as the name suggests, devoted to the proof of the main theorem. Before entering its technical proof, we prove that it is enough to prove $\mathcal{AW}_1$-convergence for irreducible pairs of marginals.

2 Main result

Our main result is Theorem 2.5 below. Before stating it, we give a proposition which explains why the conclusion of the theorem should be expected (or at least hoped for). We also state the generalisation of this proposition to Polish spaces. Then, we state a proposition which is key to arguing that the theorem needs only to be proved when the limit pair is irreducible. Next, we state the theorem with a sketch of its proof. It is understood that $(X, d_X)$ and $(Y, d_Y)$ denote any Polish spaces, and $(x_0, y_0)$ is a fixed element of $X \times Y$.

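As already mentioned above, it is well-known (and easy to show) that when one considers sequences of marginals $(\mu^k)_{k \in \mathbb{N}}$, $(\nu^k)_{k \in \mathbb{N}}$, all with equal mass, which converge respectively to $\mu, \nu \in \mathcal{M}_r(X)$, then, informally speaking, there holds

$$\Pi(\mu^k, \nu^k) \underset{k \to +\infty}{\longrightarrow} \Pi(\mu, \nu), \tag{2.1}$$

i.e., any sequence with convergent marginals has accumulation points in $\Pi(\mu, \nu)$, and for any $\pi \in \Pi(\mu, \nu)$ there holds

$$\inf_{\pi^k \in \Pi(\mu^k, \nu^k)} \mathcal{W}_r^r(\pi, \pi^k) \le \mathcal{W}_r^r(\mu, \mu^k) + \mathcal{W}_r^r(\nu, \nu^k) \underset{k \to +\infty}{\longrightarrow} 0. \tag{2.2}$$

Indeed, let $\eta^k \in \Pi(\mu^k, \mu)$, resp. $\tau^k \in \Pi(\nu, \nu^k)$, be optimal for $\mathcal{W}_r(\mu^k, \mu)$, resp. $\mathcal{W}_r(\nu, \nu^k)$, and set

$$\pi^k(dx^k, dy^k) = \int_{(x,y) \in X \times Y} \eta^k(dx^k, dx)\, \pi_x(dy)\, \tau^k_y(dy^k).$$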
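For finitely supported measures, the gluing formula for $\pi^k$ is a triple sum over atoms. The following minimal sketch assumes couplings are stored as dictionaries from pairs of atoms to exact weights; the function name `glue` and the example data are illustrative, and the couplings `eta` and `tau` in the example are merely admissible, not claimed to be optimal.

```python
from collections import defaultdict
from fractions import Fraction


def glue(eta, pi, tau):
    """Return pi_k(xk, yk) = sum_{x, y} eta(xk, x) * pi_x(y) * tau_y(yk).

    eta couples mu_k with mu, pi couples mu with nu, tau couples nu with nu_k.
    All three are dicts {(first atom, second atom): Fraction weight}.
    """
    mu = defaultdict(Fraction)  # first marginal of pi
    nu = defaultdict(Fraction)  # second marginal of pi
    for (x, y), w in pi.items():
        mu[x] += w
        nu[y] += w
    glued = defaultdict(Fraction)
    for (xk, x), w_eta in eta.items():
        for (x2, y), w_pi in pi.items():
            if x2 != x:
                continue
            for (y2, yk), w_tau in tau.items():
                if y2 != y:
                    continue
                # w_pi / mu[x] is the kernel pi_x(y); w_tau / nu[y] is the kernel tau_y(yk)
                glued[(xk, yk)] += w_eta * (w_pi / mu[x]) * (w_tau / nu[y])
    return dict(glued)


def marginals(coupling):
    """First and second marginals of a discrete coupling, as plain dicts."""
    first, second = defaultdict(Fraction), defaultdict(Fraction)
    for (a, b), w in coupling.items():
        first[a] += w
        second[b] += w
    return dict(first), dict(second)


F = Fraction
pi = {(0, -1): F(1, 2), (0, 1): F(1, 2)}                        # mu = delta_0
eta = {(F(-1, 10), 0): F(1, 2), (F(1, 10), 0): F(1, 2)}         # mu_k on {-1/10, 1/10}
tau = {(-1, -1): F(1, 2), (1, 1): F(1, 4), (1, 2): F(1, 4)}     # nu_k on {-1, 1, 2}
pi_k = glue(eta, pi, tau)
```

The glued measure `pi_k` has the first marginal of `eta` and the second marginal of `tau`, as required of an element of $\Pi(\mu^k, \nu^k)$.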

The next two propositions establish (2.1) with respect to $\mathcal{AW}_r$ for finite positive measures with common mass. The first one is formulated for $X = Y = \mathbb{R}$ and provides, under mild assumptions, an estimate of $\inf_{\pi^k \in \Pi(\mu^k, \nu^k)} \mathcal{AW}_r^r(\pi, \pi^k)$ with respect to the marginals as in (2.2). Its proof relies on one-dimensional tools, which we recall here. For $\eta$ a probability distribution on $\mathbb{R}$, we denote by $F_\eta : x \mapsto \eta((-\infty, x])$ its cumulative distribution function, and by $F_\eta^{-1} : (0, 1) \to \mathbb{R}$ its quantile function, defined for all $u \in (0, 1)$ by

$$F_\eta^{-1}(u) = \inf\{ x \in \mathbb{R} \mid F_\eta(x) \ge u \}.$$

The following properties are standard results (see for instance [29, Section 6] for proofs):

(a) $F_\eta$ is càdlàg, $F_\eta^{-1}$ is càglàd;

(b) For all $(x, u) \in \mathbb{R} \times (0, 1)$,

$$F_\eta^{-1}(u) \le x \iff u \le F_\eta(x), \tag{2.3}$$

which implies

$$F_\eta(x-) < u \le F_\eta(x) \implies x = F_\eta^{-1}(u), \tag{2.4}$$

and

$$F_\eta(F_\eta^{-1}(u)-) \le u \le F_\eta(F_\eta^{-1}(u)); \tag{2.5}$$

(c) For $\eta(dx)$-almost every $x \in \mathbb{R}$,

$$0 < F_\eta(x), \quad F_\eta(x-) < 1 \quad \text{and} \quad F_\eta^{-1}(F_\eta(x)) = x;$$

(d) The image of the Lebesgue measure on $(0, 1)$ by $F_\eta^{-1}$ is $\eta$.

The property (d) is referred to as inverse transform sampling.
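For a finitely supported $\eta$, the quantile function and the properties above can be checked directly. The sketch below is an illustration under our own conventions: a measure is a list of `(atom, weight)` pairs with exact weights, and property (d) is verified on a midpoint grid of $(0, 1)$ rather than for the Lebesgue measure itself. The names `cdf`, `quantile` and `pushforward_on_grid` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def cdf(atoms, x):
    """F(x) = eta((-inf, x]) for a measure given as a list of (atom, weight)."""
    return sum(w for a, w in atoms if a <= x)


def quantile(atoms, u):
    """F^{-1}(u) = inf{x : F(x) >= u} for u in (0, 1)."""
    accumulated = Fraction(0)
    for a, w in sorted(atoms):
        accumulated += w
        if accumulated >= u:
            return a
    raise ValueError("u exceeds the total mass")


def pushforward_on_grid(atoms, n):
    """Image under F^{-1} of the uniform measure on the midpoints (2i+1)/(2n)."""
    counts = Counter(quantile(atoms, Fraction(2 * i + 1, 2 * n)) for i in range(n))
    return {a: Fraction(c, n) for a, c in counts.items()}


eta = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

# Property (2.3): F^{-1}(u) <= x  <=>  u <= F(x), tested on a small grid.
galois_ok = all(
    (quantile(eta, Fraction(i, 20)) <= x) == (Fraction(i, 20) <= cdf(eta, x))
    for i in range(1, 20)
    for x in (-1, 0, Fraction(1, 2), 1, 2, 3, 4)
)

sampled = pushforward_on_grid(eta, 1000)
```

On this example the grid happens to align with the jumps of $F_\eta$, so `sampled` reproduces the weights of `eta` exactly; in general a grid only approximates property (d).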

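Proposition 2.1. Let $\mu, \mu^k, \nu, \nu^k \in \mathcal{M}_r(\mathbb{R})$, $k \in \mathbb{N}$, be with equal mass such that $\mu^k$ (resp. $\nu^k$) converges to $\mu$ (resp. $\nu$) in $\mathcal{W}_r$. Let $\pi \in \Pi(\mu, \nu)$. Then:

(a) There exists a sequence $\pi^k \in \Pi(\mu^k, \nu^k)$, $k \in \mathbb{N}$, converging to $\pi$ in $\mathcal{AW}_r$;

(b) If for all $x \in \mathbb{R}$ and $k \in \mathbb{N}$ with $\mu^k(\{x\}) > 0$, there exists $x' \in \mathbb{R}$ such that

$$\mu((-\infty, x')) \le \mu^k((-\infty, x)) < \mu^k((-\infty, x]) \le \mu((-\infty, x']),$$

which is for instance always satisfied when $\mu^k$ is non-atomic, then

$$\mathcal{AW}_r^r(\pi, \pi^k) \le \mathcal{W}_r^r(\mu, \mu^k) + \mathcal{W}_r^r(\nu, \nu^k). \tag{2.6}$$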

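Remark 2.2. If $\pi$ is a martingale coupling, i.e. $\int_{\mathbb{R}} y'\, \pi_{x'}(dy') = x'$, $\mu(dx')$-almost everywhere, then for $\chi^k \in \Pi(\mu^k, \mu)$ an optimal coupling for $\mathcal{AW}_r(\pi^k, \pi)$, we have

$$\begin{aligned}
\int_{\mathbb{R}} \Big| x - \int_{\mathbb{R}} y\, \pi^k_x(dy) \Big|^r \mu^k(dx)
&= \int_{\mathbb{R} \times \mathbb{R}} \Big| x - \int_{\mathbb{R}} y\, \pi^k_x(dy) \Big|^r \chi^k(dx, dx') \\
&\le 2^{r-1} \int_{\mathbb{R} \times \mathbb{R}} \Big( |x - x'|^r + \Big| x' - \int_{\mathbb{R}} y\, \pi^k_x(dy) \Big|^r \Big) \chi^k(dx, dx') \\
&= 2^{r-1} \int_{\mathbb{R} \times \mathbb{R}} \Big( |x - x'|^r + \Big| \int_{\mathbb{R}} y'\, \pi_{x'}(dy') - \int_{\mathbb{R}} y\, \pi^k_x(dy) \Big|^r \Big) \chi^k(dx, dx') \\
&\le 2^{r-1} \int_{\mathbb{R} \times \mathbb{R}} \Big( |x - x'|^r + \mathcal{W}_1^r(\pi^k_x, \pi_{x'}) \Big) \chi^k(dx, dx') \\
&\le 2^{r-1} \mathcal{AW}_r^r(\pi, \pi^k) \underset{k \to +\infty}{\longrightarrow} 0.
\end{aligned}$$

In that sense, $(\pi^k)_{k \in \mathbb{N}}$ is almost a sequence of martingale couplings.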

In the setting of Proposition 2.1, if $\mu^k$ and $\nu^k$ are also in the convex order and $\pi$ is a martingale coupling, then in view of Remark 2.2 one would naturally expect that $\pi^k$ can be slightly modified into a martingale coupling and still converge to $\pi$ in $\mathcal{AW}_r$. This actually requires a considerable amount of work and is the main message of Theorem 2.5 below. We mention that the previous proposition generalises to any Polish spaces $X$ and $Y$, as the next proposition states, but unfortunately without providing an estimate.

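Proposition 2.3. Let $\mu, \mu^k \in \mathcal{M}_r(X)$, $\nu, \nu^k \in \mathcal{M}_r(Y)$, $k \in \mathbb{N}$, all with equal mass and such that $\mu^k$ (resp. $\nu^k$) converges to $\mu$ (resp. $\nu$) in $\mathcal{W}_r$. Let $\pi \in \Pi(\mu, \nu)$. Then there exists a sequence $\pi^k \in \Pi(\mu^k, \nu^k)$, $k \in \mathbb{N}$, converging to $\pi$ in $\mathcal{AW}_r$.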

The next proposition is a key ingredient which allows us to reduce the proof of Theorem 2.5 below to the case of irreducible pairs of marginals. For $\mu \in \mathcal{M}_1(\mathbb{R})$, we denote by $u_\mu$ its potential function, that is the map defined for all $y \in \mathbb{R}$ by $u_\mu(y) = \int_{\mathbb{R}} |y - x|\, \mu(dx)$ (see Section 4 for more details). We recall that a pair $(\mu, \nu)$ of finite positive measures in convex order is called irreducible if $I = \{u_\mu < u_\nu\}$ is an interval, $\mu(I) = \mu(\mathbb{R})$ and $\nu(\bar{I}) = \nu(\mathbb{R})$. If $a \in \mathbb{R}$ is such that $\nu([a, +\infty)) = 0$, then the convex order implies $\mu([a, +\infty)) = 0$, hence

$$u_\mu(a) = a - \int_{\mathbb{R}} x\, \mu(dx) = a - \int_{\mathbb{R}} y\, \nu(dy) = u_\nu(a),$$

so $a \notin I$. Similarly, $\nu((-\infty, a]) = 0 \implies a \notin I$. We deduce that $\nu$ must assign positive mass to any neighbourhood of each of the boundaries of $I$.
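Potential functions give a concrete test of the convex order in dimension one: it is a standard fact that $\mu \le_c \nu$ holds iff the two measures have equal mass, equal first moments and $u_\mu \le u_\nu$. For finitely supported measures, $u_\nu - u_\mu$ is piecewise linear with kinks only at the atoms, so it suffices to compare the potentials there. The following sketch relies on that observation; the names `potential` and `in_convex_order` are illustrative.

```python
from fractions import Fraction


def potential(measure, y):
    """u_mu(y) = integral of |y - x| mu(dx) for a list of (atom, weight) pairs."""
    return sum(w * abs(y - x) for x, w in measure)


def in_convex_order(mu, nu):
    """Test mu <=_c nu for finitely supported measures on the real line."""
    same_mass = sum(w for _, w in mu) == sum(w for _, w in nu)
    same_mean = sum(w * x for x, w in mu) == sum(w * x for x, w in nu)
    kinks = {x for x, _ in mu} | {x for x, _ in nu}
    dominated = all(potential(mu, y) <= potential(nu, y) for y in kinks)
    return same_mass and same_mean and dominated


half = Fraction(1, 2)
mu = [(0, Fraction(1))]
nu = [(-1, half), (1, half)]
```

For this pair, $u_\mu(y) = |y|$ and $u_\nu$ equals 1 on $[-1, 1]$, so $\{u_\mu < u_\nu\} = (-1, 1)$ and the pair is irreducible, with $\nu$ charging both endpoints of the interval.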

According to [11, Theorem A.4], for any pair $(\mu, \nu)$ of probability measures in convex order, there exist $N \subset \mathbb{N}$ and a sequence $(\mu_n, \nu_n)_{n \in N}$ of irreducible pairs of sub-probability measures in convex order such that

$$\mu = \eta + \sum_{n \in N} \mu_n, \quad \nu = \eta + \sum_{n \in N} \nu_n \quad \text{and} \quad \{u_\mu < u_\nu\} = \bigcup_{n \in N} \{u_{\mu_n} < u_{\nu_n}\},$$

where the union is disjoint and $\eta = \mu|_{\{u_\mu = u_\nu\}}$. The sequence $(\mu_n, \nu_n)_{n \in N}$ is unique up to rearrangement of the pairs and is called the decomposition of $(\mu, \nu)$ into irreducible components. Moreover, for any martingale coupling $\pi \in \Pi_M(\mu, \nu)$, there exists a unique sequence of martingale couplings $\pi_n \in \Pi_M(\mu_n, \nu_n)$, $n \in N$, such that

$$\pi = \chi + \sum_{n \in N} \pi_n,$$

where $\chi = (\mathrm{id}, \mathrm{id})_* \eta$ and $_*$ denotes the pushforward operation. This sequence satisfies

$$\forall n \in N, \quad \pi_n(dx, dy) = \mu_n(dx)\, \pi_x(dy). \tag{2.7}$$

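Proposition 2.4. Let $(\mu^k, \nu^k)_{k \in \mathbb{N}}$ be a sequence of pairs of probability measures on the real line in convex order which converge to $(\mu, \nu)$ in $\mathcal{W}_1$. Let $(\mu_n, \nu_n)_{n \in N}$ be the decomposition of $(\mu, \nu)$ into irreducible components and $\eta = \mu|_{\{u_\mu = u_\nu\}}$. Then for any $k \in \mathbb{N}$ there exists a decomposition of $(\mu^k, \nu^k)$ into pairs of sub-probability measures $(\mu^k_n, \nu^k_n)_{n \in N}$, $(\eta^k, \upsilon^k)$ which are in convex order such that

$$\eta^k + \sum_{n \in N} \mu^k_n = \mu^k, \quad \upsilon^k + \sum_{n \in N} \nu^k_n = \nu^k, \quad k \in \mathbb{N}, \tag{2.8}$$

$$\lim_{k \to +\infty} \eta^k = \eta, \quad \lim_{k \to +\infty} \mu^k_n = \mu_n, \quad \lim_{k \to +\infty} \nu^k_n = \nu_n, \quad \lim_{k \to +\infty} \upsilon^k = \eta \quad \text{in } \mathcal{W}_1. \tag{2.9}$$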


We can now state our main result, namely Theorem 2.5 below: any martingale coupling whose marginals are approximated by probability measures in convex order can be approximated by martingale couplings with respect to the adapted Wasserstein distance.

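Theorem 2.5. Let $\mu^k, \nu^k \in \mathcal{P}_r(\mathbb{R})$, $k \in \mathbb{N}$, be in convex order and respectively converge to $\mu$ and $\nu$ in $\mathcal{W}_r$. Let $\pi \in \Pi_M(\mu, \nu)$. Then there exists a sequence of martingale couplings $\pi^k \in \Pi_M(\mu^k, \nu^k)$, $k \in \mathbb{N}$, converging to $\pi$ in $\mathcal{AW}_r$.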

Sketch of the proof. We will first argue that it is enough to consider the case $r = 1$. Thanks to Proposition 2.4, we can also reduce the proof to the case of irreducible pairs of marginals $(\mu, \nu)$, whose single irreducible component is denoted $(l, r) = I$.

Step 1. Fix any martingale coupling $\pi \in \Pi_M(\mu, \nu)$. When directly approximating $\pi$ we would face technical obstacles. First, for $K$ a compact subset of $I$, $\mu|_K \times \pi_x$ is not necessarily compactly supported. Moreover, $\nu$ may put mass on the boundary of $I$. To overcome this, the kernel $\pi_x$ is first compactified to a compact set $[-R, R]$, where $R > 0$, and then pushed forward by the map $y \mapsto \alpha(y - x) + x$, where $\alpha \in (0, 1)$. This yields a martingale coupling $\pi^{R,\alpha}$ close to $\pi$ and easier to approximate, between $\mu$ and a probability measure $\nu^{R,\alpha}$ dominated by $\nu$ in the convex order. We find compact sets $K, L \subset I$ such that the restriction $\pi^{R,\alpha}|_{K \times \mathbb{R}}$ is compactly supported on $K \times L$ and concentrated on $K \times \mathring{L}$, where $\mathring{L}$ denotes the interior of $L$. Since, by irreducibility, $\nu$ puts mass on any neighbourhood of the boundary of $I$, $\nu^{R,\alpha}$ assigns positive mass to two open sets $L_-$, $L_+$ on both sides of $K$ with positive distance to $K$. This is summarised in Figure 1, where $J$ denotes a compact subset of $I$ large enough.

Figure 1: Intervals involved in the proof ($L_-$, $K$, $L_+$ and $J$ inside $I = (l, r)$). The boundaries of the closed intervals are vertical bars and those of the open intervals are parentheses.

Step 2. It is possible to find an approximating sequence $(\hat\pi^k = \hat\mu^k \times \hat\pi^k_x)_{k \in \mathbb{N}}$ of the sub-probability martingale coupling $\pi^{R,\alpha}|_{K \times \mathbb{R}}$ from Step 1. Unfortunately, $\hat\pi^k$ is not necessarily a martingale coupling. Therefore, we free up some mass, and use the mass available on the left and right of $K$ in $L_-$ and $L_+$ to adjust the barycentres of the kernels $\hat\pi^k_x$. Hence we find a sequence $(\tilde\pi^k = \hat\mu^k \times \tilde\pi^k_x)_{k \in \mathbb{N}}$ of sub-probability martingale couplings approximating $\pi^{R,\alpha}|_{K \times \mathbb{R}}$.

Step 3. By construction, up to multiplication by a factor smaller than and close to 1, the first marginal of $\tilde\pi^k$ satisfies $\hat\mu^k \le \mu^k$. Moreover, its second marginal, denoted $\tilde\nu^k$, is such that there exists a probability measure $\nu^{R,\alpha,k}$ which satisfies $\tilde\nu^k \le \nu^{R,\alpha,k} \le_c \nu^k$. Then by using the uniform convergence of potential functions, we show that for $k$ sufficiently large there exist sub-probability martingale couplings $\eta^k \in \Pi_M(\mu^k - \hat\mu^k, \nu^{R,\alpha,k} - \tilde\nu^k)$ so that the sum $\eta^k + \tilde\pi^k$ is a martingale coupling in $\Pi_M(\mu^k, \nu^{R,\alpha,k})$, where the second marginal is dominated by $\nu^k$ in the convex order.

Step 4. In the last step, we use the inverse-transform martingale coupling between $\nu^{R,\alpha,k}$ and $\nu^k$, see [29], to change $\eta^k + \tilde\pi^k$ into a martingale coupling $\pi^k \in \Pi(\mu^k, \nu^k)$. Finally, we estimate the $\mathcal{AW}_1$-distance of $\pi$ to $\pi^k$.

3 On the weak adapted topology

We begin this section with a lemma on uniform integrability which will prove very handy throughout the paper. We formulate it for finite positive measures on $X$, but it is understood that $(X, x_0)$ is replaced with $(Y, y_0)$ for measures on $Y$.

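Lemma 3.1. Let $r \ge 1$ and $\mu \in \mathcal{M}_r(X)$. For $\varepsilon > 0$, let

$$I^r_\varepsilon(\mu) := \sup_{\substack{\tau \in \mathcal{M}(X) \\ \tau \le \mu,\ \tau(X) \le \varepsilon}} \int_X d_X^r(x, x_0)\, \tau(dx). \tag{3.1}$$

(a) $I^r$ […]

(b) The value of $I^r_\varepsilon(\mu)$ vanishes as $\varepsilon \to 0$.

(c) For any $\mu' \in \mathcal{M}_r(X)$ such that $\mu(X) = \mu'(X)$ we have

$$I^r_\varepsilon(\mu) \le 2^{r-1} \big( I^r_\varepsilon(\mu') + \mathcal{W}_r^r(\mu, \mu') \big). \tag{3.2}$$

(d) Let $\mu, \mu^k \in \mathcal{M}_r(X)$, $k \in \mathbb{N}$, be with equal mass such that $\mu^k$ converges weakly to $\mu$. Then

$$\mathcal{W}_r(\mu^k, \mu) \underset{k \to +\infty}{\longrightarrow} 0 \iff \sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k) \underset{\varepsilon \to 0}{\longrightarrow} 0 \ \text{ and } \ \sup_{k \in \mathbb{N}} \int_X d_X^r(x, x_0)\, \mu^k(dx) < +\infty.$$

(e) Finally, if $X = \mathbb{R}^d$ and $\mu \le_c \nu$ with $\nu \in \mathcal{M}_1(\mathbb{R}^d)$, then $I^1_\varepsilon(\mu) \le I^1_\varepsilon(\nu)$.

Remark 3.2. If $\mu(X) \le \varepsilon$, then $I^r_\varepsilon(\mu)$ is simply the $r$-th moment of $\mu$.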

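Proof. The first point (a) is an easy consequence of the definition of $I^r_\varepsilon$.

Next we check (b). Let $\mu \in \mathcal{M}_r(X)$ be such that $\mu(X) > 0$. Since

$$I^r_\varepsilon(\mu) = \mu(X)\, I^r_{\varepsilon / \mu(X)} \Big( \frac{\mu}{\mu(X)} \Big), \tag{3.3}$$

to check convergence of $I^r_\varepsilon(\mu)$ to 0 as $\varepsilon \to 0$, we may suppose that $\mu \in \mathcal{P}_r(X)$. Let $\varepsilon \in (0, 1)$. For $\eta \in \mathcal{M}_r(X)$, we denote by $\bar\eta$ the image of $\eta$ under the map $x \mapsto d_X^r(x, x_0)$. It is then enough to show that

$$I^r_\varepsilon(\mu) = \int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(u)\, du. \tag{3.4}$$

Indeed, since $\mu \in \mathcal{P}_r(X)$ we have $\int_X d_X^r(x, x_0)\, \mu(dx) = \int_0^1 F_{\bar\mu}^{-1}(u)\, du < +\infty$, so we conclude by dominated convergence that $I^r_\varepsilon(\mu)$ vanishes as $\varepsilon \to 0$. Of course an appropriate upper bound would be sufficient, but (3.4) will come in handy for the proof of the last part. Let $\tau^* \in \mathcal{M}(X)$ be such that $\bar\tau^*$ is the right-most measure dominated by $\bar\mu$ with mass equal to $\varepsilon$, namely

$$\tau^*(dx) = \Big( \mathbb{1}_{A_\varepsilon}(x) + \frac{F_{\bar\mu}(y_\varepsilon) - (1 - \varepsilon)}{\mu(B_\varepsilon)}\, \mathbb{1}_{B_\varepsilon}(x) \Big)\, \mu(dx),$$

where

$$y_\varepsilon = F_{\bar\mu}^{-1}(1 - \varepsilon), \quad A_\varepsilon = \{ x \in X \mid d_X^r(x, x_0) > y_\varepsilon \} \quad \text{and} \quad B_\varepsilon = \{ x \in X \mid d_X^r(x, x_0) = y_\varepsilon \},$$

and the second summand of the right-hand side is zero if $\mu(B_\varepsilon) = 0$. By (2.5) and the definition of $y_\varepsilon$, we have

$$F_{\bar\mu}(y_\varepsilon -) \le 1 - \varepsilon, \tag{3.5}$$

hence $\mu(B_\varepsilon) = \bar\mu(\{y_\varepsilon\}) \ge F_{\bar\mu}(y_\varepsilon) - (1 - \varepsilon)$ and therefore $\tau^* \le \mu$. Moreover,

$$\tau^*(X) = \mu(A_\varepsilon) + F_{\bar\mu}(y_\varepsilon) - (1 - \varepsilon) = \bar\mu((y_\varepsilon, +\infty)) - (1 - F_{\bar\mu}(y_\varepsilon)) + \varepsilon = \varepsilon.$$

Let us show that

$$\int_X d_X^r(x, x_0)\, \tau^*(dx) = \int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(u)\, du. \tag{3.6}$$

According to (3.5), $(1 - \varepsilon, F_{\bar\mu}(y_\varepsilon)] \subset (F_{\bar\mu}(y_\varepsilon -), F_{\bar\mu}(y_\varepsilon)]$, so by (2.4), for all $u \in (1 - \varepsilon, F_{\bar\mu}(y_\varepsilon)]$, $F_{\bar\mu}^{-1}(u) = y_\varepsilon$. Then, using the inverse transform sampling and (2.3) for the second equality, we get

$$\begin{aligned}
\int_X d_X^r(x, x_0)\, \tau^*(dx) &= \int_{\mathbb{R}} y\, \mathbb{1}_{(y_\varepsilon, +\infty)}(y)\, \bar\mu(dy) + \big( F_{\bar\mu}(y_\varepsilon) - (1 - \varepsilon) \big)\, y_\varepsilon \\
&= \int_{F_{\bar\mu}(y_\varepsilon)}^1 F_{\bar\mu}^{-1}(u)\, du + \int_{1-\varepsilon}^{F_{\bar\mu}(y_\varepsilon)} F_{\bar\mu}^{-1}(u)\, du = \int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(u)\, du,
\end{aligned}$$

hence (3.6) holds.

Let $\tau \in \mathcal{M}(X)$ be such that $\tau \le \mu$ and $0 < \tau(X) \le \varepsilon$. In view of (3.6), it is enough in order to prove (3.4) to show that

$$\int_X d_X^r(x, x_0)\, \tau(dx) \le \int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(u)\, du. \tag{3.7}$$

Since $\tau \le \mu$, we have $\bar\tau \le \bar\mu$. Using (2.5) for the last inequality, we get for all $u \in (0, 1)$

$$1 - F_{\bar\tau / \tau(X)} \big( F_{\bar\mu}^{-1}(1 - \tau(X) u) \big) = \frac{\bar\tau\big( (F_{\bar\mu}^{-1}(1 - \tau(X) u), +\infty) \big)}{\tau(X)} \le \frac{\bar\mu\big( (F_{\bar\mu}^{-1}(1 - \tau(X) u), +\infty) \big)}{\tau(X)} \le u,$$

hence $F_{\bar\tau / \tau(X)}(F_{\bar\mu}^{-1}(1 - \tau(X) u)) \ge 1 - u$ and by (2.3), $F_{\bar\tau / \tau(X)}^{-1}(1 - u) \le F_{\bar\mu}^{-1}(1 - \tau(X) u)$. Using the inverse transform sampling, we deduce

$$\begin{aligned}
\int_X d_X^r(x, x_0)\, \tau(dx) &= \tau(X) \int_0^1 F_{\bar\tau / \tau(X)}^{-1}(u)\, du = \tau(X) \int_0^1 F_{\bar\tau / \tau(X)}^{-1}(1 - u)\, du \\
&\le \tau(X) \int_0^1 F_{\bar\mu}^{-1}(1 - \tau(X) u)\, du = \int_{1 - \tau(X)}^1 F_{\bar\mu}^{-1}(u)\, du \le \int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(u)\, du,
\end{aligned}$$

hence (3.7) holds.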
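For a finitely supported probability measure on the real line, identity (3.4) says that the supremum in (3.1) is reached by collecting mass $\varepsilon$ from the atoms farthest from $x_0$, splitting an atom if necessary. The sketch below implements this greedy selection under our own conventions (measures as lists of `(atom, weight)` pairs with exact weights, $X = \mathbb{R}$ with the usual distance); the name `tail_functional` is hypothetical.

```python
from fractions import Fraction


def tail_functional(atoms, eps, x0=0, r=1):
    """I^r_eps(mu): largest value of the integral of |x - x0|^r against a
    sub-measure tau <= mu with tau(R) <= eps, for mu = sum of weight * delta_atom."""
    # Sort the atoms by decreasing cost |x - x0|^r, i.e. farthest from x0 first.
    costs = sorted(((abs(x - x0) ** r, w) for x, w in atoms), reverse=True)
    remaining, total = eps, 0
    for cost, weight in costs:
        taken = min(weight, remaining)  # take the whole atom, or only a fraction of it
        total += cost * taken
        remaining -= taken
        if remaining <= 0:
            break
    return total


mu = [(0, Fraction(1, 2)), (1, Fraction(1, 4)), (4, Fraction(1, 4))]
small = tail_functional(mu, Fraction(3, 10))  # all of the atom at 4, then 1/20 of the atom at 1
full = tail_functional(mu, Fraction(1))       # eps >= mass: the first moment, as in Remark 3.2
```

With these data, `small` equals $4 \cdot \tfrac14 + 1 \cdot \tfrac1{20} = \tfrac{21}{20}$, and `full` equals the first moment $\tfrac54$ of `mu`.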

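To see (c), fix $\mu' \in \mathcal{M}_r(X)$ with $\mu(X) = \mu'(X)$. We denote by $\pi(dx, dx') = \mu(dx)\, \pi_x(dx') \in \Pi(\mu, \mu')$ a $\mathcal{W}_r$-optimal coupling. Let $\tau \in \mathcal{M}(X)$ be such that $\tau \le \mu$ and $\tau(X) \le \varepsilon$. Let $\tau' \in \mathcal{M}(X)$ be defined by

$$\tau'(dx') = \int_{x \in X} \pi_x(dx')\, \tau(dx).$$

Since $\pi$ is an element of $\Pi(\mu, \mu')$, we find $\tau' \le \mu'$ and $\tau(X) = \tau'(X)$. Then

$$\begin{aligned}
\int_X d_X^r(x, x_0)\, \tau(dx) &\le 2^{r-1} \int_{X \times X} \big( d_X^r(x', x_0) + d_X^r(x, x') \big)\, \pi_x(dx')\, \tau(dx) \\
&\le 2^{r-1} \Big( I^r_\varepsilon(\mu') + \int_{X \times X} d_X^r(x, x')\, \pi(dx, dx') \Big),
\end{aligned}$$

which, by optimality of $\pi$, shows the assertion.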

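We now show (d). Let $\mu, \mu^k \in \mathcal{M}_r(X)$ be with equal mass such that $\mu^k$ converges weakly to $\mu$. According to (3.3), we may suppose that $\mu, \mu^k \in \mathcal{P}_r(X)$.

Suppose that $\mathcal{W}_r(\mu^k, \mu)$ vanishes as $k$ goes to $+\infty$. Then the sequence of the $r$-th moments of $\mu^k$, $k \in \mathbb{N}$, is bounded since it converges to the $r$-th moment of $\mu$. Let $\eta > 0$. Let $k_0 \in \mathbb{N}$ be such that for all $k > k_0$, $\mathcal{W}_r^r(\mu^k, \mu) < \eta$. Then (c) yields for $\varepsilon > 0$

$$\sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k) \le \sum_{k \le k_0} I^r_\varepsilon(\mu^k) + \sup_{k > k_0} I^r_\varepsilon(\mu^k) \le \sum_{k \le k_0} I^r_\varepsilon(\mu^k) + 2^{r-1} \big( I^r_\varepsilon(\mu) + \eta \big).$$

According to (b) we then get

$$\limsup_{\varepsilon \to 0}\ \sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k) \le 2^{r-1} \eta.$$

Since $\eta > 0$ is arbitrary, we deduce that $\sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k)$ vanishes with $\varepsilon$.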

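Conversely, suppose that $\sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k)$ vanishes with $\varepsilon$ and the sequence of the $r$-th moments of $\mu^k$, $k \in \mathbb{N}$, is bounded. By Skorokhod’s representation theorem, there exist random variables $X$ and $X^k$, $k \in \mathbb{N}$, defined on a common probability space such that $X$, resp. $X^k$, is distributed according to $\mu$, resp. $\mu^k$, and $X^k$ converges almost surely to $X$. Then for all $M > 0$,

$$\mathcal{W}_r^r(\mu^k, \mu) \le \mathbb{E}[d_X^r(X^k, X)] = \mathbb{E}\big[ d_X^r(X^k, X)\, \mathbb{1}_{\{d_X^r(X^k, X) < M\}} \big] + \mathbb{E}\big[ d_X^r(X^k, X)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big].$$

By the dominated convergence theorem, we deduce

$$\limsup_{k \to +\infty} \mathcal{W}_r^r(\mu^k, \mu) \le \limsup_{k \to +\infty} \mathbb{E}\big[ d_X^r(X^k, X)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big].$$

Let us then prove that the right-hand side vanishes as $M$ goes to $+\infty$. Let $\eta > 0$. Let $\varepsilon > 0$ be such that $I^r_\varepsilon(\mu) + \sup_{k \in \mathbb{N}} I^r_\varepsilon(\mu^k) < \eta$. By Markov’s inequality, we have

$$\sup_{k \in \mathbb{N}} \mathbb{E}\big[ \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big] \le \sup_{k \in \mathbb{N}} \frac{\mathbb{E}[d_X^r(X^k, X)]}{M} \le \frac{2^{r-1}}{M} \sup_{k \in \mathbb{N}} \int_X d_X^r(x, x_0)\, (\mu^k + \mu)(dx),$$

where the right-hand side vanishes as $M$ goes to $+\infty$. Therefore, there exists $M_0 > 0$ such that for all $k \in \mathbb{N}$ and $M > M_0$ the event $\{d_X^r(X^k, X) \ge M\}$ has probability at most $\varepsilon$, and hence

$$\begin{aligned}
\mathbb{E}\big[ d_X^r(X^k, X)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big] &\le 2^{r-1} \Big( \mathbb{E}\big[ d_X^r(X^k, x_0)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big] + \mathbb{E}\big[ d_X^r(x_0, X)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big] \Big) \\
&\le 2^{r-1} \big( I^r_\varepsilon(\mu^k) + I^r_\varepsilon(\mu) \big) < 2^{r-1} \eta.
\end{aligned}$$

Therefore, for all $M > M_0$,

$$\limsup_{k \to +\infty} \mathbb{E}\big[ d_X^r(X^k, X)\, \mathbb{1}_{\{d_X^r(X^k, X) \ge M\}} \big] \le 2^{r-1} \eta.$$

Since $\eta$ is arbitrary, this proves the assertion.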

Finally, we want to show (e). Let $X = \mathbb{R}^d$ and $\mu \le_c \nu$ with $\nu \in \mathcal{M}_1(\mathbb{R}^d)$. According to (3.3), we may suppose that $\mu, \nu \in \mathcal{P}_1(\mathbb{R}^d)$. Again, we write $\bar\mu$ and $\bar\nu$ for the pushforward measures of $\mu$ and $\nu$ under the map $x \mapsto |x - x_0|^r$. First, we note that $\bar\mu$ is dominated by $\bar\nu$ in the increasing convex order. Indeed, let $f : \mathbb{R} \to \mathbb{R}$ be convex and nondecreasing; then $x \mapsto f(|x - x_0|^r)$ is a convex, continuous function. Thus,

$$\int_{\mathbb{R}} f(y)\, \bar\mu(dy) = \int_{\mathbb{R}^d} f(|x - x_0|^r)\, \mu(dx) \le \int_{\mathbb{R}^d} f(|x - x_0|^r)\, \nu(dx) = \int_{\mathbb{R}} f(y)\, \bar\nu(dy).$$

The increasing convex order is characterised by the following family of inequalities (see for instance [2, Theorem 2.4]): for all $0 \le \varepsilon \le 1$,

$$\int_{1-\varepsilon}^1 F_{\bar\mu}^{-1}(y)\, dy \le \int_{1-\varepsilon}^1 F_{\bar\nu}^{-1}(y)\, dy.$$

The identity (3.4) concludes the proof.

We now prove Proposition 2.1. Copulas are a handy tool in the construction of the approximating sequence $(\pi^k)_{k \in \mathbb{N}}$. Recall that a two-dimensional copula is an element $C$ of $\Pi(\lambda, \lambda)$ where $\lambda$ is the uniform distribution on $(0, 1)$. A coupling $\pi$ is an element of $\Pi(\mu, \nu)$ if and only if it can be written as the push-forward of a copula $C$ under the quantile map $(F_\mu^{-1}, F_\nu^{-1}) : (0, 1) \times (0, 1) \to \mathbb{R} \times \mathbb{R}$. Clearly, if $C$ is a copula then $\pi = (F_\mu^{-1}, F_\nu^{-1})_* C$ is contained in $\Pi(\mu, \nu)$. Conversely, when $\pi \in \Pi(\mu, \nu)$ is given, we can construct a copula $C$ by

$$C(du, dv) = \mathbb{1}_{(0,1)}(u)\, du\, C_u(dv),$$

where $C_u$ is given by

$$C_u = \big( (y, w) \mapsto F_\nu(y-) + w\, \nu(\{y\}) \big)_* \big( \pi_{F_\mu^{-1}(u)} \times \lambda \big). \tag{3.8}$$

In particular, $u \mapsto C_u$ is constant on the jumps of $F_\mu$. By the inverse transform sampling, showing that $\pi = (F_\mu^{-1}, F_\nu^{-1})_* C$ amounts to showing that the second marginal distribution of $C$ is indeed uniformly distributed on $(0, 1)$, which is a direct consequence of the inverse transform sampling and the well-known result (see for instance [29, Lemma 6.6] for a proof) that for any $\eta \in \mathcal{P}(\mathbb{R})$,

$$\big( (z, w) \mapsto F_\eta(z-) + w\, \eta(\{z\}) \big)_* (\eta \times \lambda) = \lambda. \tag{3.9}$$

Moreover, we have by (2.4) that $F_\nu^{-1}(F_\nu(y-) + w\, \nu(\{y\})) = y$ for all $(y, w) \in \mathbb{R} \times (0, 1]$, hence for all $u \in (0, 1)$,

$$\pi_{F_\mu^{-1}(u)} = (F_\nu^{-1})_* C_u. \tag{3.10}$$

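Proof of Proposition 2.1. Because of homogeneity of the $\mathcal{AW}_r$- and $\mathcal{W}_r$-distances, we can suppose w.l.o.g. that $\mu, \mu^k, \nu, \nu^k$ and $\pi$ are probability measures. Let $C$ be the copula defined by $C(du, dv) = \mathbb{1}_{(0,1)}(u)\, du\, C_u(dv)$, with $C_u$ given by (3.8).

In order to define $\pi^k$, we construct associated copulas $C^k$ where $u \mapsto C^k_u$ is constant on the jumps of $F_{\mu^k}$. Let

$$\begin{aligned}
\theta_k &: \mathbb{R} \times (0, 1) \to (0, 1), \quad (x, w) \mapsto F_{\mu^k}(x-) + w\, \mu^k(\{x\}), \\
C^k_u(dv) &= \int_{w=0}^1 C_{\theta_k(F_{\mu^k}^{-1}(u),\, w)}(dv)\, dw, \\
\pi^k &= (F_{\mu^k}^{-1}, F_{\nu^k}^{-1})_* C^k = (F_{\mu^k}^{-1}, F_{\nu^k}^{-1})_* \big( \mathbb{1}_{(0,1)}(u)\, du\, C^k_u(dv) \big).
\end{aligned}$$

The fact that $C^k$ is a copula, and therefore $\pi^k \in \Pi(\mu^k, \nu^k)$, is a direct consequence of (3.9) and the inverse transform sampling. Since $u \mapsto C_u$ and $u \mapsto C^k_u$ are constant on the jumps of $F_\mu$ and $F_{\mu^k}$ respectively, reasoning as in the derivation of (3.10), we have for $du$-almost every $u$ in $(0, 1)$

$$\pi_{F_\mu^{-1}(u)} = (F_\nu^{-1})_* C_u, \qquad \pi^k_{F_{\mu^k}^{-1}(u)} = (F_{\nu^k}^{-1})_* C^k_u.$$

Moreover, since $(u \mapsto (F_\mu^{-1}(u), F_{\mu^k}^{-1}(u)))_* \lambda$ is a coupling between $\mu$ and $\mu^k$, namely the comonotonous coupling, we have

$$\begin{aligned}
\mathcal{AW}_r^r(\pi, \pi^k) &\le \int_0^1 \Big( |F_\mu^{-1}(u) - F_{\mu^k}^{-1}(u)|^r + \mathcal{W}_r^r\big( \pi_{F_\mu^{-1}(u)}, \pi^k_{F_{\mu^k}^{-1}(u)} \big) \Big)\, du \\
&= \mathcal{W}_r^r(\mu, \mu^k) + \int_0^1 \mathcal{W}_r^r\big( (F_\nu^{-1})_* C_u, (F_{\nu^k}^{-1})_* C^k_u \big)\, du.
\end{aligned} \tag{3.11}$$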

By Minkowski’s inequality we have

Z 1 0 Wrr (Fν−1)∗Cu,(F_ν−k1)∗C k u du 1 r ≤ Z 1 0 Wrr (Fν−1)∗Cu,(Fν−1)∗Cuk du 1 r + Z 1 0 Wrr (Fν−1)∗Cuk,(Fν−k1)∗Cuk du 1 r . (3.12)

Since for any $\eta\in\mathcal P(\mathbb R)$ the map $F_\eta^{-1}\circ F_{C^k_u}^{-1}$ is non-decreasing, we have (see for instance [3, Lemma A.3]) that for $dw$-almost every $w\in(0,1)$,

\[ F_\eta^{-1}\big(F_{C^k_u}^{-1}(w)\big) = F^{-1}_{(F_\eta^{-1})_*C^k_u}(w). \]

Hence, we deduce

\[
\begin{aligned}
\int_{(0,1)} \mathcal W_r^r\big((F_\nu^{-1})_*C^k_u,(F_{\nu^k}^{-1})_*C^k_u\big)\,du
&= \int_{(0,1)}\int_{(0,1)} \big|F_\nu^{-1}(F_{C^k_u}^{-1}(w)) - F_{\nu^k}^{-1}(F_{C^k_u}^{-1}(w))\big|^r\,dw\,du \\
&= \int_{(0,1)}\int_{(0,1)} \big|F_\nu^{-1}(v) - F_{\nu^k}^{-1}(v)\big|^r\,C^k_u(dv)\,du \\
&= \int_{(0,1)} \big|F_\nu^{-1}(v) - F_{\nu^k}^{-1}(v)\big|^r\,dv = \mathcal W_r^r(\nu,\nu^k)\to 0,
\end{aligned}
\tag{3.13}
\]

where we used inverse transform sampling in the second equality. At this stage, we can already show (b). Indeed, the assumption made in (b) ensures that any jump of $F_{\mu^k}$ is included in a jump of $F_\mu$. We already noted that $u\mapsto C_u$ is constant on the jumps of $F_\mu$, and it is therefore also constant on the jumps of $F_{\mu^k}$. This yields for all $u,w\in(0,1)$ that $C_{\theta_k(F_{\mu^k}^{-1}(u),w)} = C_u$ and in particular $C^k_u = C_u$, which makes the first term on the right-hand side of (3.12) vanish. Then the estimate (2.6) follows immediately from (3.11), (3.12) and (3.13).
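The last equality in (3.13) uses the quantile representation of the Wasserstein distance on the line, $\mathcal W_r^r(\nu,\nu^k)=\int_0^1|F_\nu^{-1}(v)-F_{\nu^k}^{-1}(v)|^r\,dv$. The Python sketch below evaluates this integral exactly for finitely supported measures. It is only an illustration: the function names and the example measures `nu` and `nu_k(k)` are assumptions introduced here, not objects from the paper.

```python
from fractions import Fraction


def quantile_pieces(atoms):
    """Describe the quantile function of a finitely supported probability
    measure as a list [(upper_level, location)]: F^{-1}(v) = location for
    v in (previous upper_level, upper_level]."""
    pieces, level = [], Fraction(0)
    for loc, mass in sorted(atoms.items()):
        level += mass
        pieces.append((level, loc))
    return pieces


def wasserstein_r_pow(nu, nu_k, r):
    """W_r^r(nu, nu_k) = int_0^1 |F_nu^{-1}(v) - F_{nu_k}^{-1}(v)|^r dv,
    computed piecewise on the common refinement of the quantile levels."""
    p, q = quantile_pieces(nu), quantile_pieces(nu_k)
    i = j = 0
    v, total = Fraction(0), Fraction(0)
    while i < len(p) and j < len(q):
        nxt = min(p[i][0], q[j][0])
        total += (nxt - v) * abs(p[i][1] - q[j][1]) ** r
        v = nxt
        if p[i][0] == nxt:
            i += 1
        if q[j][0] == nxt:
            j += 1
    return total


# Hypothetical example: nu = (delta_0 + delta_1)/2 and
# nu_k = (delta_{1/k} + delta_1)/2 for k >= 2, which converges to nu.
nu = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def nu_k(k):
    return {Fraction(1, k): Fraction(1, 2), 1: Fraction(1, 2)}


values = [wasserstein_r_pow(nu, nu_k(k), 2) for k in (2, 4, 8)]
print(values)
```

In this example the two quantile functions differ only on $(0,1/2]$, where the gap is $1/k$, so the distance $\mathcal W_2^2(\nu,\nu^k)$ equals $\tfrac12 k^{-2}$ and decreases to $0$.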

To obtain (a) and in view of (3.11), (3.12) and (3.13), it is sufficient to show

\[ \int_0^1 \mathcal W_r^r\big((F_\nu^{-1})_*C_u,(F_\nu^{-1})_*C^k_u\big)\,du \to 0,\qquad k\to+\infty. \]

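This is achieved in two steps. First, we show for $du$-almost every $u\in(0,1)$ that

\[ \mathcal W_r\big((F_\nu^{-1})_*C_u,(F_\nu^{-1})_*C^k_u\big)\to 0. \tag{3.14} \]

Second, we prove that

\[ u\mapsto \mathcal W_r^r\big((F_\nu^{-1})_*C_u,(F_\nu^{-1})_*C^k_u\big),\qquad k\in\mathbb N, \tag{3.15} \]

is uniformly integrable on $(0,1)$ with respect to $\lambda$.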

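To show (3.14), note that $\mathcal W_r$-convergence is already determined by a countable family $\mathcal C\subset\Phi_r(\mathbb R)$ (see [20, Theorem 4.5.(b)]). For this reason, it is sufficient to show that for all $f\in\mathcal C$, for $du$-almost every $u\in(0,1)$,

\[ \int_{(0,1)} f(F_\nu^{-1}(v))\,C^k_u(dv)\to g(u) := \int_{(0,1)} f(F_\nu^{-1}(v))\,C_u(dv),\qquad k\to+\infty, \tag{3.16} \]

where the integrals are $du$-almost everywhere well defined because of inverse transform sampling and the facts that $f\in\Phi_r(\mathbb R)$ and $\nu\in\mathcal P_r(\mathbb R)$. For $u\in(0,1)$, let $x_u = F_\mu^{-1}(u)$ and $x^k_u = F_{\mu^k}^{-1}(u)$. Let $\mathcal U\subset(0,1)$ be the set of continuity points of $F_\mu^{-1}$ and define

\[ \mathcal U_c = \{u\in\mathcal U \mid F_\mu \text{ is continuous at } x_u\}\quad\text{and}\quad \mathcal U_d = \{u\in\mathcal U\setminus\mathcal U_c \mid u\in(F_\mu(x_u-),F_\mu(x_u))\}. \]

By monotonicity of $F_\mu^{-1}$, the complement of $\mathcal U$ in $(0,1)$ is at most countable, and since $\mu$ has countably many atoms, the complement of $\mathcal U_d$ in $\mathcal U\setminus\mathcal U_c$ is also at most countable. We deduce that it is sufficient to show (3.16) for $du$-almost all $u\in\mathcal U_c\cup\mathcal U_d$.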

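Let then $u\in\mathcal U$. If $\mu^k(\{x^k_u\}) = 0$, then $C^k_u = C_u$ and

\[ \int_{(0,1)} f(F_\nu^{-1}(v))\,C^k_u(dv) = g(u). \]

From now on and until (3.16) is proved, we suppose w.l.o.g. that $\mu^k(\{x^k_u\})>0$ for all $k\in\mathbb N$. Then

\[ \int_{(0,1)} f(F_\nu^{-1}(v))\,C^k_u(dv) = \frac{1}{\mu^k(\{x^k_u\})}\int_{F_{\mu^k}(x^k_u-)}^{F_{\mu^k}(x^k_u)} g(w)\,dw. \tag{3.17} \]

Define $l_k = \inf_{n\ge k}x^n_u$ and $r_k = \sup_{n\ge k}x^n_u$. Since $u\in\mathcal U$ we find $l_k\nearrow x_u$ and $r_k\searrow x_u$ when $k$ goes to $+\infty$. Due to right continuity of $F_\mu$ and left continuity of $x\mapsto F_\mu(x-)$ we have

\[ F_\mu(x_u-) = \lim_p F_\mu(l_p-)\quad\text{and}\quad \lim_p F_\mu(r_p) = F_\mu(x_u). \]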

By the Portmanteau theorem and monotonicity of cumulative distribution functions we have

\[ F_\mu(l_p-)\le\liminf_k F_{\mu^k}(l_p-)\le\liminf_k F_{\mu^k}(x^k_u-)\le\limsup_k F_{\mu^k}(x^k_u)\le\limsup_k F_{\mu^k}(r_p)\le F_\mu(r_p). \]

By taking the limit $p\to+\infty$, we find

\[ F_\mu(x_u-)\le\liminf_k F_{\mu^k}(x^k_u-)\le\limsup_k F_{\mu^k}(x^k_u)\le F_\mu(x_u). \tag{3.18} \]

By (2.5), the interval $[F_{\mu^k}(x^k_u-),F_{\mu^k}(x^k_u)]$ contains $u$, and if $u\in\mathcal U_c$, then (3.18) implies that its length $\mu^k(\{x^k_u\})$ vanishes when $k$ goes to $+\infty$. Consequently, (3.17) and the Lebesgue differentiation theorem yield that for $du$-almost every $u\in\mathcal U_c$,

\[ \int_{(0,1)} f(F_\nu^{-1}(v))\,C^k_u(dv)\to g(u). \]

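Suppose now $u\in\mathcal U_d$ and define

\[ a_k = F_{\mu^k}(x^k_u-)\vee F_\mu(x_u-),\qquad b_k = F_{\mu^k}(x^k_u)\wedge F_\mu(x_u). \]

Note that on the interval $(a_k,b_k)$ the function $g$ is constant equal to $g(u)$, so (3.17) reads

\[ \int_{(0,1)} f(F_\nu^{-1}(v))\,C^k_u(dv) = \frac{1}{\mu^k(\{x^k_u\})}\left(\int_{b_k}^{F_{\mu^k}(x^k_u)} g(w)\,dw + \int_{a_k}^{b_k} g(u)\,dw + \int_{F_{\mu^k}(x^k_u-)}^{a_k} g(w)\,dw\right). \]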

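According to (3.18),

\[ a_k - F_{\mu^k}(x^k_u-)\to 0\quad\text{and}\quad F_{\mu^k}(x^k_u) - b_k\to 0,\qquad k\to+\infty. \tag{3.19} \]

Moreover, it is clear with (2.5) in mind that

\[ F_{\mu^k}(x^k_u-) < a_k \implies \mu^k(\{x^k_u\})\ge u - F_\mu(x_u-),\quad\text{and}\quad b_k < F_{\mu^k}(x^k_u) \implies \mu^k(\{x^k_u\})\ge F_\mu(x_u) - u. \tag{3.20} \]

Using the latter fact and the equality

\[ b_k - a_k = \mu^k(\{x^k_u\}) - \big(F_{\mu^k}(x^k_u) - b_k\big) - \big(a_k - F_{\mu^k}(x^k_u-)\big), \]

we get

\[ 1 - \frac{F_{\mu^k}(x^k_u) - b_k}{F_\mu(x_u) - u} - \frac{a_k - F_{\mu^k}(x^k_u-)}{u - F_\mu(x_u-)} \le \frac{b_k - a_k}{\mu^k(\{x^k_u\})} \le 1. \]

Hence by (3.19) we have $\frac{b_k - a_k}{\mu^k(\{x^k_u\})}\to 1$ as $k$ goes to $+\infty$, which implies that $\frac{1}{\mu^k(\{x^k_u\})}\int_{a_k}^{b_k} g(u)\,dw\to g(u)$ as $k\to+\infty$. Therefore, we just have to show that

\[ \frac{1}{\mu^k(\{x^k_u\})}\left(\int_{b_k}^{F_{\mu^k}(x^k_u)} g(w)\,dw + \int_{F_{\mu^k}(x^k_u-)}^{a_k} g(w)\,dw\right)\to 0,\qquad k\to+\infty. \tag{3.21} \]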

Note that we can assume w.l.o.g. that for all $k\in\mathbb N$ either $F_{\mu^k}(x^k_u-) < a_k$ or $b_k < F_{\mu^k}(x^k_u)$. Let $d = (u - F_\mu(x_u-))\wedge(F_\mu(x_u) - u)$, which is positive since $u\in\mathcal U_d$. Then we have by (3.20)

\[ \frac{1}{\mu^k(\{x^k_u\})}\left|\int_{b_k}^{F_{\mu^k}(x^k_u)} g(w)\,dw + \int_{F_{\mu^k}(x^k_u-)}^{a_k} g(w)\,dw\right| \le \frac1d\left|\int_{b_k}^{F_{\mu^k}(x^k_u)} g(w)\,dw + \int_{F_{\mu^k}(x^k_u-)}^{a_k} g(w)\,dw\right|. \tag{3.22} \]

By inverse transform sampling and the facts that $f\in\Phi_r(\mathbb R)$ and $\nu\in\mathcal P_r(\mathbb R)$, we have $\int_0^1|g(w)|\,dw \le \int_{\mathbb R}|f(y)|\,\nu(dy) < +\infty$. Then (3.21) is a direct consequence of (3.22), (3.19) and the dominated convergence theorem. Hence (3.14) is proved for $du$-almost every $u\in(0,1)$.

Next, we show uniform integrability of (3.15). We can estimate

\[ \mathcal W_r^r\big((F_\nu^{-1})_*C_u,(F_\nu^{-1})_*C^k_u\big) \le 2^{r-1}\left(\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C_u(dv) + \int_{(0,1)}|F_\nu^{-1}(v)|^r\,C^k_u(dv)\right). \]

Since by inverse transform sampling we have

\[ \int_{(0,1)}\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C_u(dv)\,du = \int_{\mathbb R}|y|^r\,\nu(dy) < \infty, \]

it is enough to show uniform integrability of $u\mapsto\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C^k_u(dv)$, $k\in\mathbb N$.

On the one hand, using inverse transform sampling and $\nu\in\mathcal P_r(\mathbb R)$, we have

\[ \sup_k\int_{(0,1)}\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C^k_u(dv)\,du = \int_{\mathbb R}|y|^r\,\nu(dy) < +\infty. \]

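On the other hand, let $\varepsilon>0$ and $A$ be a measurable subset of $(0,1)$ such that $\lambda(A)<\varepsilon$. We have

\[ \int_A\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C^k_u(dv)\,du = \int_{\mathbb R}|y|^r\,(F_\nu^{-1})_*\tau^k(dy), \]

where $\tau^k(dv) = \int_{u=0}^1 \mathbb 1_A(u)\,C^k_u(dv)\,du$. Note that $\tau^k\le\lambda$, $(F_\nu^{-1})_*\tau^k\le\nu$ and $(F_\nu^{-1})_*\tau^k(\mathbb R) = \tau^k((0,1)) = \lambda(A)$. Therefore,

\[ \sup_{A\in\mathcal B((0,1)),\ \lambda(A)\le\varepsilon}\ \sup_k\int_A\int_{(0,1)}|F_\nu^{-1}(v)|^r\,C^k_u(dv)\,du \le I^r_\varepsilon(\nu), \]

where $I^r_\varepsilon(\nu)$ is defined by (3.1). By Lemma 3.1, the right-hand side converges to $0$ as $\varepsilon\to0$. This yields uniform integrability of (3.15), which completes the proof.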

As mentioned in Section 2, Proposition 2.1 generalises to Polish spaces. Unsurprisingly, the proof of Proposition 2.3 requires radically different tools from its unidimensional equivalent. In particular, we need to recall the so-called Weak Optimal Transport (WOT) problem introduced by Gozlan, Roberto, Samson and Tetali [23] and studied in [22], formulated according to our needs. Let $C:X\times\mathcal P_r(Y)\to\mathbb R_+$ be nonnegative, continuous, strictly convex in the second argument and such that there exists a constant $K>0$ which satisfies

\[ \forall(x,p)\in X\times\mathcal P_r(Y),\qquad C(x,p)\le K\left(1 + d_X^r(x,x_0) + \int_Y d_Y^r(y,y_0)\,p(dy)\right). \tag{3.23} \]

Then the WOT problem consists in minimising

\[ V_C(\mu,\nu) := \inf_{\pi\in\Pi(\mu,\nu)}\int_X C(x,\pi_x)\,\mu(dx). \tag{WOT} \]

In view of the definition (1.3) of the adapted Wasserstein distance, which involves measures on the extended space $X\times\mathcal P(Y)$, it is natural to consider an extension of (WOT) which also involves this space. Hence we also consider the extended problem

\[ V'_C(\mu,\nu) := \inf_{P\in\Lambda(\mu,\nu)}\int_{X\times\mathcal P(Y)} C(x,p)\,P(dx,dp), \tag{WOT'} \]

where $\Lambda(\mu,\nu)$ is the set of couplings between $\mu$ and any measure on $\mathcal P(Y)$ with mean $\nu$, that is

\[ \Lambda(\mu,\nu) = \left\{P\in\mathcal P(X\times\mathcal P(Y)) \ \middle|\ \int_{(x',p)\in X\times\mathcal P(Y)}\delta_{x'}(dx)\,p(dy)\,P(dx',dp)\in\Pi(\mu,\nu)\right\}. \tag{3.24} \]

Remark 3.3. We gather here useful results on weak transport problems which hold under the assumptions we made on $C$:

(a) (WOT) admits a unique minimiser $\pi^*$ [8, Theorem 1.2];

(b) As a consequence of the necessary optimality criterion [9, Theorem 2.2] and [9, Remark 2.3.(a)], $J(\pi^*)$ is the only minimiser of (WOT');

(c) $V(\mu,\nu) = V'(\mu,\nu)$ [8, Lemma 2.1];

(d) Stability of (WOT) and (WOT'): Let $\mu^k\in\mathcal P_r(X)$, $\nu^k\in\mathcal P_r(Y)$, $k\in\mathbb N$, converge respectively to $\mu\in\mathcal P_r(X)$ and $\nu\in\mathcal P_r(Y)$ in $\mathcal W_r$. For $k\in\mathbb N$, let $\pi^k\in\Pi(\mu^k,\nu^k)$ be optimal for $V(\mu^k,\nu^k)$. Then $\pi^k$, resp. $J(\pi^k)$, converges to the unique minimiser $\pi^*$, resp. $J(\pi^*)$, in $\mathcal W_r$ [9, Theorem 1.3 and Corollary 2.8]. In particular, this shows that $\pi^k$ converges to $\pi^*$ even in $\mathcal{AW}_r$.

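Proof of Proposition 2.3. Let $\varepsilon>0$ and $y_0\in Y$. Define for $R>0$ the $\mathcal W_r$-open ball $B_R$ of radius $R^{1/r}$ and centre $\delta_{y_0}$ and the set

\[ A_R = \{x\in X \mid \pi_x\in B_R\} = \left\{x\in X \ \middle|\ \int_Y d_Y^r(y,y_0)\,\pi_x(dy) < R\right\}. \]

Since $\nu\in\mathcal P_r(Y)$, we have $\int_X\int_Y d_Y^r(y,y_0)\,\pi_x(dy)\,\mu(dx) = \int_Y d_Y^r(y,y_0)\,\nu(dy) < +\infty$, hence $\mu$ is concentrated on $\bigcup_{R>0}A_R$ and we can choose $R$ large enough such that

\[ \mu(A_R) > 1-\varepsilon. \]

By Lusin’s theorem, there exists a closed set $F\subset A_R$ such that

\[ \mu(X\setminus F) < \varepsilon\quad\text{and}\quad x\mapsto\pi_x \text{ restricted to } F \text{ is continuous.} \]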

Let $\widetilde{\mathcal M}_r(Y)$ be the linear space of all finite signed measures on $Y$, the positive and negative parts of which are contained in $\mathcal M_r(Y)$, equipped with the weak topology induced by $\Phi_r(Y)$. Since weak topologies are locally convex, an extension of Tietze’s theorem [19, Theorem 4.1] yields the existence of a continuous map $x\mapsto\tilde\pi_x$ defined on $X$ with values in $\widetilde{\mathcal M}_r(Y)$ such that $\tilde\pi_x = \pi_x$ for all $x\in F$ and

\[ \{\tilde\pi_x \mid x\in X\}\subset\operatorname{co}\{\pi_x \mid x\in F\}\subset B_R, \]

where $\operatorname{co}$ denotes the convex hull.

Next, we define a function which is nonnegative, continuous, strictly convex in the second argument and satisfies a condition of the form (3.23), in order to use the results on weak transport problems detailed in Remark 3.3. Let $\{g_k \mid k\in\mathbb N\}\subset\Phi_1(Y)$ be a family of $1$-Lipschitz continuous functions, absolutely bounded by $1$, which separates $\mathcal P(Y)$ (see [20, Theorem 4.5.(a)]). We have for any pair $p,p'\in\mathcal P(Y)$, $p\ne p'$, that there is $l\in\mathbb N$ such that

\[ \int_Y g_l(y)\,p(dy)\ne\int_Y g_l(y)\,p'(dy). \tag{3.25} \]

Define $C:X\times\mathcal P_r(Y)\to\mathbb R_+$ for all $(x,p)\in X\times\mathcal P_r(Y)$ by

\[ C(x,p) := \rho(\tilde\pi_x,p) + \sum_{k\in\mathbb N}\frac{1}{2^k}\left|\int_Y g_k(y)\,\tilde\pi_x(dy) - \int_Y g_k(y)\,p(dy)\right|^2, \]

where $\rho:\mathcal P(Y)\times\mathcal P(Y)\to[0,1]$ is defined for all $p,p'\in\mathcal P(Y)$ by

\[ \rho(p,p') = \inf_{\chi\in\Pi(p,p')}\int_{Y\times Y}\big(d_Y(y,y')\wedge1\big)\,\chi(dy,dy'). \]

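Since $\rho$ can be interpreted as a Wasserstein distance with respect to a bounded distance, it is immediate that it is a metric on $\mathcal P(Y)$ which induces the weak convergence topology. On the one hand, the map $(x,p)\mapsto\rho(\tilde\pi_x,p)$ is continuous by continuity of $x\mapsto\tilde\pi_x$. On the other hand, by Kantorovich and Rubinstein’s duality theorem and Jensen’s inequality, we have for all $(x,p),(x',p')\in X\times\mathcal P_r(Y)$

\[
\begin{aligned}
&\left|\sum_{k\in\mathbb N}\frac{1}{2^k}\left(\left|\int_Y g_k\,d\tilde\pi_x - \int_Y g_k\,dp\right|^2 - \left|\int_Y g_k\,d\tilde\pi_{x'} - \int_Y g_k\,dp'\right|^2\right)\right| \\
&\quad= \left|\sum_{k\in\mathbb N}\frac{1}{2^k}\left(\int_Y g_k\,d\tilde\pi_x - \int_Y g_k\,dp + \int_Y g_k\,d\tilde\pi_{x'} - \int_Y g_k\,dp'\right)\left(\int_Y g_k\,d\tilde\pi_x - \int_Y g_k\,dp - \int_Y g_k\,d\tilde\pi_{x'} + \int_Y g_k\,dp'\right)\right| \\
&\quad\le \sum_{k\in\mathbb N}\frac{4}{2^k}\left(\left|\int_Y g_k\,d\tilde\pi_x - \int_Y g_k\,d\tilde\pi_{x'}\right| + \left|\int_Y g_k\,dp - \int_Y g_k\,dp'\right|\right) \\
&\quad\le 8\big(\mathcal W_1(\tilde\pi_x,\tilde\pi_{x'}) + \mathcal W_1(p,p')\big) \le 8\big(\mathcal W_r(\tilde\pi_x,\tilde\pi_{x'}) + \mathcal W_r(p,p')\big),
\end{aligned}
\]

where the right-hand side vanishes when $(x',p')$ converges to $(x,p)$ by continuity of $x\mapsto\tilde\pi_x$. We deduce that $C$ is continuous.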

Note that $\rho$ is convex in the second argument. Therefore, to obtain strict convexity of $C(x,\cdot)$ in the second argument, it is sufficient to verify that

\[ F(p) = \sum_{k\in\mathbb N}\frac{1}{2^k}\left|\int_Y g_k(y)\,p(dy)\right|^2 \]

is strictly convex. Let $p,p'\in\mathcal P(Y)$, $p\ne p'$, and $l\in\mathbb N$ such that (3.25) holds. Then strict convexity of the square gives, for every $\alpha\in(0,1)$,

\[ \left|\alpha\int_Y g_l(y)\,p(dy) + (1-\alpha)\int_Y g_l(y)\,p'(dy)\right|^2 < \alpha\left|\int_Y g_l(y)\,p(dy)\right|^2 + (1-\alpha)\left|\int_Y g_l(y)\,p'(dy)\right|^2, \]

which yields strict convexity of $F$ on $\mathcal P(Y)$.
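The strict convexity argument can be illustrated on a toy example. The Python sketch below is not taken from the paper: the three-point space `Y`, the two test functions in `g` (both $1$-Lipschitz and bounded by $1$, and separating probability measures on `Y`), and the measures `p`, `q` are all assumptions chosen for the illustration, and the series defining $F$ is truncated to these two functions. The code evaluates the convexity gap $\alpha F(p)+(1-\alpha)F(p')-F(\alpha p+(1-\alpha)p')$ in exact arithmetic.

```python
from fractions import Fraction

# Hypothetical finite setting: Y = {0, 1, 2} with two separating test
# functions, each 1-Lipschitz and bounded by 1 in absolute value.
Y = [0, 1, 2]
g = [
    lambda y: Fraction(min(y, 1)),               # g_0
    lambda y: Fraction(min(max(y - 1, 0), 1)),   # g_1
]


def integral(f, p):
    """Integral of f against the finitely supported measure p on Y."""
    return sum(f(y) * p[y] for y in Y)


def F(p):
    """Truncated functional F(p) = sum_k 2^{-k} (int g_k dp)^2."""
    return sum(Fraction(1, 2 ** k) * integral(gk, p) ** 2
               for k, gk in enumerate(g))


def mix(alpha, p, q):
    """Convex combination alpha * p + (1 - alpha) * q."""
    return {y: alpha * p[y] + (1 - alpha) * q[y] for y in Y}


p = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}
q = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1, 2)}
alpha = Fraction(1, 3)

# Strictly positive gap <=> strict convexity along the segment [p, q].
gap = alpha * F(p) + (1 - alpha) * F(q) - F(mix(alpha, p, q))
print(gap)
```

For a quadratic functional of this form the gap equals $\alpha(1-\alpha)\sum_k 2^{-k}\big(\int g_k\,dp-\int g_k\,dp'\big)^2$, which is positive as soon as a single $g_l$ separates $p$ and $p'$, in line with (3.25).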

Moreover, we have for all $(x,p)\in X\times\mathcal P_r(Y)$ that $C(x,p)\le1+8=9$, hence $C$ satisfies (3.23). Recall the definitions of $V_C$ and $V'_C$ given in (WOT) and (WOT'). Since for all $x\in F$, $C(x,\pi_x) = C(x,\tilde\pi_x) = 0$, we have

\[ V_C(\mu,\nu)\le\int_{X\setminus F} C(x,\pi_x)\,\mu(dx) < 9\varepsilon. \]

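Let $\pi^{*,\varepsilon}\in\Pi(\mu,\nu)$ be optimal for $V_C(\mu,\nu)$. For $P,P'\in\mathcal P(X\times\mathcal P(Y))$, let

\[ \tilde\rho(P,P') = \inf_{\chi\in\Pi(P,P')}\int_{X\times\mathcal P(Y)\times X\times\mathcal P(Y)}\big((d_X(x,x') + \rho(p,p'))\wedge1\big)\,\chi(dx,dp,dx',dp'). \]

Since $\mu(dx)\,\delta_{\pi_x}(dp)\,\delta_x(dx')\,\delta_{\pi^{*,\varepsilon}_{x'}}(dp')$ is a coupling between $J(\pi)$ and $J(\pi^{*,\varepsilon})$, we can estimate

\[
\begin{aligned}
\tilde\rho(J(\pi),J(\pi^{*,\varepsilon})) &\le \int_X\rho(\pi_x,\pi^{*,\varepsilon}_x)\,\mu(dx) \\
&\le \int_F\rho(\pi_x,\pi^{*,\varepsilon}_x)\,\mu(dx) + \int_{X\setminus F}\int_Y\big(d_Y(y,y_0)\wedge1\big)\,(\pi_x+\pi^{*,\varepsilon}_x)(dy)\,\mu(dx) \\
&\le V_C(\mu,\nu) + 2\varepsilon < 11\varepsilon.
\end{aligned}
\]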

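For $k\in\mathbb N$, let $\pi^{k,\varepsilon}\in\Pi(\mu^k,\nu^k)$ be optimal for $V_C(\mu^k,\nu^k)$. Then $J(\pi^{k,\varepsilon})$ is optimal for $V'_C(\mu^k,\nu^k)$ by Remark 3.3 (b), and converges to $J(\pi^{*,\varepsilon})$ in $\mathcal W_r$ and therefore weakly by Remark 3.3 (d). Then we get

\[ \limsup_{k\to+\infty}\tilde\rho(J(\pi^{k,\varepsilon}),J(\pi)) \le \limsup_{k\to+\infty}\tilde\rho(J(\pi^{k,\varepsilon}),J(\pi^{*,\varepsilon})) + \tilde\rho(J(\pi^{*,\varepsilon}),J(\pi)) \le 11\varepsilon. \tag{3.26} \]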

So far $\varepsilon>0$ was arbitrary. Therefore, there exists a strictly increasing sequence $(k_N)_{N\in\mathbb N^*}$ of positive integers such that

\[ \forall N\in\mathbb N^*,\ \forall k\ge k_N,\qquad \tilde\rho\big(J(\pi^{k,2^{-rN}}),J(\pi)\big)\le12\cdot2^{-rN}. \]

For $k\in\mathbb N$, let $N_k = \max\{N\in\mathbb N \mid k\ge k_N\}$, where the maximum of the empty set is defined as $0$. Since $k_N$ is strictly increasing, we find that $N_k\to+\infty$ as $k\to+\infty$. Then the sequence of couplings

\[ \pi^k = \pi^{k,2^{-rN_k}}\in\Pi(\mu^k,\nu^k),\qquad k\in\mathbb N, \]

is such that $\tilde\rho(J(\pi^k),J(\pi))$ vanishes as $k$ goes to $+\infty$, and therefore $J(\pi^k)$ converges weakly to $J(\pi)$.
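The diagonal selection $N_k=\max\{N \mid k\ge k_N\}$ is a simple counting rule: $N_k$ is the number of thresholds $k_N$ that do not exceed $k$. The Python sketch below illustrates it on a finite, purely hypothetical list of thresholds (`thresholds` and the exponent `r = 2` are assumptions for the example; in the proof the sequence $(k_N)$ is infinite and not explicit).

```python
from bisect import bisect_right


def diagonal_index(k, thresholds):
    """Return N_k = max{N >= 1 : k >= k_N}, or 0 if no such N exists.

    `thresholds` lists k_1 < k_2 < ... in strictly increasing order, so the
    answer is the number of thresholds that are <= k.
    """
    return bisect_right(thresholds, k)


thresholds = [3, 7, 20, 50]  # hypothetical k_1, ..., k_4
r = 2                        # hypothetical exponent

indices = [diagonal_index(k, thresholds) for k in range(0, 60, 5)]
# Accuracy level 2^{-r N_k} used to pick the coupling pi^{k, 2^{-r N_k}}.
tolerances = [2.0 ** (-r * n) for n in indices]
print(indices)
```

The index is non-decreasing in $k$ and increases by one each time $k$ passes a threshold, which is why $N_k\to+\infty$ when the sequence of thresholds is infinite.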

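Moreover, since $\mathcal W_r$-convergence is equivalent to weak convergence coupled with convergence of the $r$-moments, we have that the $r$-moments of $\mu^k$ and $\nu^k$ respectively converge to the $r$-moments of $\mu$ and $\nu$, which implies

\[
\begin{aligned}
\int_{X\times\mathcal P(Y)}\mathcal W_r^r(p,\delta_{y_0})\,J(\pi^k)(dx,dp) &= \int_X\mathcal W_r^r(\pi^k_x,\delta_{y_0})\,\mu^k(dx) = \int_Y d_Y^r(y,y_0)\,\nu^k(dy) \\
&\underset{k\to+\infty}{\longrightarrow}\int_Y d_Y^r(y,y_0)\,\nu(dy) = \int_{X\times\mathcal P(Y)}\mathcal W_r^r(p,\delta_{y_0})\,J(\pi)(dx,dp).
\end{aligned}
\]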

We deduce that $J(\pi^k)$ converges to $J(\pi)$ in $\mathcal W_r$ as $k\to+\infty$. According to (1.3), $\pi^k$ converges to $\pi$ in $\mathcal{AW}_r$, which concludes the proof.

In the proof of Theorem 2.5 we need to be able to confine approximating sequences of couplings to certain sets. The next result provides all necessary tools for this.

Lemma 3.4. Let $\mu,\mu^k\in\mathcal M_r(X)$, $\nu,\nu^k\in\mathcal M_r(Y)$, $k\in\mathbb N$, all with equal mass and $\pi^k\in\Pi(\mu^k,\nu^k)$, $k\in\mathbb N$,