• No results found

GAEP.pdf

N/A
N/A
Protected

Academic year: 2020

Share "GAEP.pdf"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

The GAEP algorithm for the fast computation of

the distribution of a function of dependent

random variables

Philipp Arbenz

Paul Embrechts

Giovanni Puccetti

September 1, 2010

Abstract

We introduce a new algorithm for numerically computing the distribution of an increasing function ofddependent, non-negative random variables with given joint distribution. We prove convergence of the algorithm and give con-vergence rates under regularity conditions.

Key words:Distribution functions; dependent random variables.

AMS Subject Classfication:62E17; 65C20; 65C50.

1 Introduction

In this paper, we introduce the GAEP algorithm in order to computeP[ϕ(X)s], whereXis a random vector in (0,)d andϕis a continuous function, strictly in-creasing in each coordinate. The algorithm is based on the decomposition of the set {x(0,)d:ϕ(x)s} into a countable family of disjoint hypercubes. The cor-responding probabilityP[ϕ(X)≤s] is then approximated by the measure over these hypercubes.

The GAEP (Generalized AEP) algorithm is similar in spirit to the AEP algorithm introduced by the same authors in Arbenz et al. (2010) for the caseϕ(x)=Pd

k=1xk.

As the two algorithms are based on different geometrical decompositions, GAEP is not a proper extension of AEP.

The paper is organized as follows. After some preliminaries in Section 2, we il-lustrate GAEP in dimension two (d =2) in Section 3. Section 4 extends GAEP to arbitrary dimensions, its convergence being discussed in Section 5, 6 and 7. In Sec-tion 8, we test GAEP on some examples, and, in SecSec-tion 9, we compare and contrast it to its main competitors. Section 10 illustrates the differences between GAEP and AEP, while, in Section 11, we provide a method to improve convergence rates in di-mension three. Section 12 concludes the paper.

[email protected], Department of Mathematics, ETH Zurich, 8092 Zurich, Switzerland

[email protected], Department of Mathematics, ETH Zurich, 8092 Zurich, Switzerland[email protected], Department of Mathematics for Decisions, University of Firenze,

(2)

2 Notation and preliminaries

Fixd∈N,d≥2, and defineN=2d. We setR+=[0,+∞) andR=(−∞, 0]. Through-out the paper, (row) vectors are denoted in boldface, for example,ek∈Rdis thekth standard unit vector, forkD={1, . . . ,d}. We writei1, . . . ,iN for all the 2d vectors in {0, 1}d, that is,i1=0=(0, . . . , 0),ik+1=ek,kD, and so on,iN=1=(1, . . . , 1). By #i=Pd

k=1ik, we denote the number of 1’s in the vectori, for example, #i1=0, #iN=

d. We define the componentwise product between two vectorsx=(x1, . . . ,xd),y= (y1, . . . ,yd)∈Rdas

xy=(x1y1, . . . ,xdyd)∈Rd.

For instance,xek=xkek=(0, . . . , 0,xk, 0, . . . , 0). Let≥denote the componentwise order between vectors, that is,xyif and only ifxkykfor allkD. The orders≤,

<and>are defined analogously.

On some probability space (Ω,A,P), assume that the random vectorX=(X1, . . . ,Xd) has joint distributionH. Throughout the paper, we assume the marginal compo-nentsXkto be non-negative, i.e. P[Xk≤0]=0 for allkD. The extension to ran-dom vectors bounded from below is straightforward and will be illustrated in the following. The joint distributionHinduces the probability measureVHonRdvia

VH

h ©

yRd:yxªi

=H(x) , for allxRd.

ForbRdandhRdRd+, we define the hypercubeQ(b,h)Rdas

Q(b,h)= (©

x∈Rd:b<xb+hª

, ifh∈Rd+, ©

xRd:b+h<xbª

, ifhRd. (1)

ForhRd+, theVH-measure ofQ(b,h) can be calculated easily as

VH[Q(b,h)]=P[X∈Q(b,h)]= N

X

j=1

(1)d+#ijH¡ b+hij

¢

. (2)

The caseh∈Rd−is analogous. As a special case of (2) ford=2, the probability

mea-sure of a rectangleQ(b,h)=(b1,b1+h1]×(b2,b2+h2] can be written as

VH[Q(b,h)]=H(b1,b2)−H(b1+h1,b2)−H(b1,b2+h2)+H(b1+h1,b2+h2).

LetN be the set of continuous functionsϕ:RdR, which are strictly increasing in each coordinate, and such that

lim

t→+∞ϕ(b+tek)= +∞, and t→−∞lim ϕ(b+tek)= −∞, for all b∈R

d andk

D.

(3)

Forb∈Rdandp∈Rd−∪Rd+we also define thequasisimplexS(b,p) as

S(b,p)= (©

xRd:b<xb+p,ϕ(x)sª

, ifpRd+,

©

x∈Rd:b+p<xb,ϕ(x)>sª

, ifp∈Rd−.

(3)

Note that, if one or more of the components ofhandpare equal to zero, thenQ(b,h) andS(b,p) are empty.

Sinceϕ∈N, there exists a unique vectorp11∈Rd+such that

ϕ(p11ek)=s, for allkD. (4)

Figure 1 illustrates (3) and (1) as well as (4) ford=2. DefiningS11=S(0,p11) and recalling thatXis non-negative, (4) implies that

ϕ(X)≤s¤

=VH£S11

¤

.

b1

S(b2,p2)

b2

b1+p1

S

1

1

x2

b2+p2

0

{x∈R2+:ϕ(x)=s}

x1

b4+p4

b3

b3+p3

p11

Q(b4,p4)

Q(b3,p3)

S(b1,p1)

b4

Figure 1: Three quasisimplexesS11=S(0,p11),S(b1,p1),S(b2,p2)⊂R2and two

hypercubesQ(b3,p3),Q(b4,p4)⊂R2withp11,p1,p4∈R2+andp2,p3∈R2−. Thick and

dashed lines indicate closed and, respectively, open boundaries of the sets.

3 Description of the GAEP algorithm in dimension

d

=

2

(4)

Leth∈R2+such thathp11. For ease of notation, we writep=p11throughout this

section. As illustrated in Figure 2, the rectangleQ(0,p)⊂R2can be split into four disjoint rectangles along the components ofh, as

Q(0,p)=Q(0,h)∪Q¡

(h1, 0), (p1−h1,h2)

¢ ∪Q¡

(0,h2), (h1,p2−h2)

¢ ∪Q¡

h,ph¢

. (5)

A similar decomposition holds for the quasisimplexS(0,p), for which we have

S(0,p)=S(0,h)S((h1, 0), (p1−h1,h2))∪S((0,h2), (h1,p2−h2))∪S(h,ph).

(6)

Note that S(0,h)=Q(0,h) \S(h,−h). By setting S11=S(0,p), Q11=Q(0,h),

p

x

2

0

x

1

p

x

2

0

x

1

Q

(

0

,

p

)

h

Figure 2: An illustration of (5), where the rectangleQ(0,p)⊂R2is decomposed into four disjoint rectangles.

S1

2 =S(h,−h),S22=S((h1, 0), (p1−h1,h2)),S23=S((0,h2), (h1,p2−h2)) andS24=

S(h,ph), we can reformulate (6) as

S1 1 =

¡

Q1 1\S21

¢

∪S22∪S23∪S24, (7)

as illustrated in Figure 3. Note that eitherS21= ;(ifϕ(h)>s) orS24= ;(ifϕ(h)≤s).

Since theS2t’s are disjoint andS21⊂Q11, (7) translates into the decomposition

VH£S11

¤

=VH£Q11

¤

VH£S21

¤

+VH£S22

¤

+VH£S23

¤

+VH£S24

¤

. (8)

As a first approximation ofVH[S11] we take the valueP1=VH[Q11]. Thus, the differ-ence betweenP1andVH[S11] is given by

VH£S11

¤ −P1=

4

X

t=1

τt

2VH£S2t

¤

,

whereτ2t{1, 1} indicates whether the measureVH[S2t] has to be added (τ22=τ32=

τ4

2=1) or subtracted (τ12= −1).

(5)

Q

1 1

Q

1 1

S

3 2

S

3 2

S

1 2

h

0 0

h

x2 x2

x1

x1

S

2 2

S

4 2

S

2 2

{x∈R2+:ϕ(x)=s}

{x∈R2+:ϕ(x)=s} p p

Figure 3: An illustration of the decomposition (7) for two different choices ofh

Rd

−∪Rd+. In the left picture, we haveϕ(h)>sandS24= ;, while, in the right picture,

we haveϕ(h)<sandS21= ;.

The measure of the four squares so obtained is to be added or subtracted toP1in

order to define a second estimateP2ofVH[S11], so that the difference betweenP2

andVH[S11] will be given by the measure of the remaining sixteen quasisimplexes. The latter are then decomposed again in the following step of the algorithm. By iterating this procedure, we obtain a sequencePn of estimates which we will prove to converge toVH[S11] under the regularity conditions given in Section 5. In the general case, at each iteration of the algorithm, we restrict to quasisimplexes which turn out to be nonempty.

4 Description of the GAEP algorithm in arbitrary

dimen-sions

In this section, we state the two main results of the paper. First, we will extend the measure decomposition (8) to arbitrary dimensions, by showing that the measure of an arbitrary quasisimplex can be decomposed into the measures of a hypercube andN=2d smaller quasisimplexes. Then, we define a sequencePn which will be proved to converge toVH[S11].

Forp∈Rd−∪Rd+, fixh∈H(p), where

H(p)=

x∈Rd:0xpª

, ifp∈Rd+, ©

xRd:px0ª

, ifpRd.

(6)

Lemma 1. For an arbitrary hypercubeQ(b,p)⊂Rd, we have

Q(b,p)= N

[

j=1

b+ijh,ijp+(1−2ij)◦h

¢

, (9)

with the hypercubes on the right-hand side of(9)being disjoint.

Proof. Forj=1, . . . ,N, let the setsCjbe defined as

Cj={x∈Rd:xkbk+hkfor allkwith (ij)k=0,xk>bk+hkfor allkwith (ij)k=1}.

SinceSN

j=1ij ={0, 1}

d, we have that the family {C

j,j=1, . . . ,N} is a partition ofRd, that is,Ci∩Cj= ;andSNj=1Cj=Rd. Hence, we can write

Q(b,p)=

N

[

j=1

¡

Q(b,p)∩Cj¢.

Ifp∈Rd+(the casep∈R−d is analogous), note that for allj =1, . . . ,N andkD, we

have

¡

b+ijh

¢

k=

(

bk, if (ij)k=0, bk+hk, if (ij)k=1,

(10)

¡

ijp+(1−2ij)◦h

¢

k=

(

hk, if (ij)k=0, pkhk, if (ij)k=1.

(11)

Thus, the result follows by observing that

Q(b,p)∩Cj=

n

x∈Rd:bk<xkbk+hkfor allkwith (ij)k=0,

bk+hk<xkpkfor allkwith (ij)k=1

o

=Q¡

b+ijh,ijp+(1−2ij)◦h¢.

The set decomposition (9) translates into the measure decomposition given by the following theorem.

Theorem 2. Letv1, . . . ,vN∈{0, 1}dbe defined asv1=1andvk=ikfor k=2, . . . ,N . Let

b∈Rd,p∈Rd−∪Rd+andh∈H(p). Then

VH

£

S(b,p

=VH[Q(b,h)]+ N

X

j=1

mjVH

£

b+vjh,ijp+(1−2vj)◦h

¢¤

, (12)

(7)

Proof. First assume thatϕ(b)≤sandp∈Rd+. Defining the setΛ={x∈Rd:ϕ(x)≤s}, and using (9), we can write

Λ∩Q(b,p)=S(b,p)=[Nj=1

©

Λ∩Q¡

b+ijh,ijp+(1−2ij)◦h¢ª

=[Nj=1S

¡

b+ijh,ijp+(1−2ij)◦h

¢

. (13)

Since the hypercubes on the right-hand side of (9) are disjoint, the quasisimplexes on the right-hand side of (13) are disjoint, too. Thus, from (13), we get

VH£S(b,p)¤= N

X

j=1

VH£S¡b+ijh,ijp+(1−2ij)◦h¢¤. (14)

Sincei1=0, we have

VH[S(b+i1◦h,i1◦p+(1−2i1)◦h)]=VH[S(b,h)], and we can write (14) as

VH£S(b,p)¤=VH[S(b,h)]+ N

X

j=2

VH£S¡b+ijh,ijp+(1−2ij)◦h¢¤. (15)

Noting thatS(b,h)=Λ∩Q(b,h), and using the fact thatQ(b,h)=Q(b+h,−h),we obtain

VH[S(b,h)]=VH[Λ∩Q(b,h)]=VH[Q(b,h)]−VH[Q(b+h,−h)∩ΛC]

=VH[Q(b,h)]−VH[S(b+1h,0p+(1−2·1)◦h)]. (16)

Substituting (16) in (15), and recalling the definition of thevj’s, we finally get (12). The caseϕ(b)>s,p∈Rd−is analogous. The casesϕ(b)≤s,p∈Rd−, andϕ(b)>s, pRd+, are trivial, since they implyVH[S(b,p)]=0 andVH[Q(b,h)]=VH[S(b+

h,h)].

Note that Theorem 2 holds for all measurable functionsϕ:RdR.

The idea behind the GAEP algorithm is to apply the decomposition (12) recur-sively to the nonempty quasisimplexes on the right side, starting withS11=S(0,p11),

p11being the unique vector satisfying (4). Note that, by changing the conditions (4) and the vectorb11, the algorithm can be applied to the case in which the random vec-torXalso takes negative values but it is still bounded from below, as, for example,

P[Xb11]=1.

At the beginning of thenth iteration (n∈N), the algorithm receives, as input, a family

St

n =S(btn,ptn),tIn⊂{1, . . . ,Nn−1}

of nonempty quasisimplexes. As already remarked, forn=1 we haveI1={1} and

S1

1 =S(0,p11). Given a sequence of splitting pointsh

t

(8)

qua-sisimplexSntis then decomposed into one hypercube andNquasisimplexes via (12):

VH£S(btn,ptn)

¤

=VH£Q(btn,htn)

¤ +

N

X

j=1

mjVH£S¡bnt+vjhtn,ijpnt+(1−2vj)◦htn

¢¤

=VH£Q(btn,htn)

¤ +

N

X

j=1

mjVH

h

bnN t+1N+j,pnN t+1N+j´i,tIn,

(17)

where the sequencesbnt andptnare defined by their initial valuesb11=0andp11, and by

bnN t+1N+j=bnt+vjhnt, (18)

pN tn+1N+j=ijpnt+(1−2vj)◦htn, (19) for allj=1, . . . ,N andtIn. Recall from Theorem 2 thatm1= −1 andmj =1 for j=2, . . . ,N. At this point, the family of nonempty quasisimplexes obtained on the right-hand side of (17),

S(btn+1,ptn+1),tIn+1, with In+1={t∈{1, . . . ,Nn} :S(bnt+1,ptn+1)6= ;},

is passed to the (n+1)th iteration of the algorithm.

As an example, we illustrate the first iteration of the algorithm in the cased=3, where we denotep11=pwith0hp. For the simplexS11=S(0,p), the measure decomposition (12) gives

VH[S(0,p)]=VH[Q(0,h)]−VH[S(h,−h)]+VH[S((h1, 0, 0), (p1−h1,h2,h3))]

+VH[S((0,h2, 0), (h1,p2−h2,h3))]+VH[S((0, 0,h3), (h1,h2,p3−h3))]

+VH[S((h1,h2, 0), (p1−h1,p2−h2,h3))]+VH[S((h1, 0,h3), (p1−h1,h2,p3−h3))]

+VH[S((0,h1,h2), (h1,p2−h2,p3−h3))]+VH[S(h,ph)].

In Figure 4, we illustrate the case in whichϕ(pek)=s,kD(see conditions (4)),

ϕ(h1,h2, 0)≤s,ϕ(h1, 0,h3)≤sandϕ(0,h2,h3)>s, leading toS27=S28= ;.

There-fore, we haveI1={1} andI2={1, 2, 3, 4, 5, 6}. Note also that, in the two-dimensional

cases described in Figure 3, we hadI2={1, 2, 3} (left) andI2={2, 3, 4} (right).

In general, the set of indexesIn+1, which identifies the quasisimplexesSnt+16= ;,

depends on the vectorshtn,tIn and on the functionϕ, at each iteration of the algorithm. Nevertheless, for a fixedn, the quasisimplexesSnt+1,tIn+1, are always

disjoint (this follows from the proof of Theorem 2).

Now, define the familyQnt =Q(bnt,htn),tIn, and the sequencePnas the sum of theVH-measures of all theQnt’s multiplied by the correspondingτtn, as

Pn=Pn−1+

X

tIn τt

nVH

£

Qt n

¤ =

n

X

i=1

X

tIi τt

iVH

£

Qt i

¤

,nN, (20)

whereP0=0 and the sequenceτnt is defined by its initial valueτ11=1 and by

τN tN+j

n+1 =τ

t

(9)

{x∈R3+:ϕ(x)=s}

S3 2

S4 2

S1 2

h2

h3

h1

S6 2

S2 2

p3

p2

p1

0

h

S5 2

Figure 4: An illustration of the decomposition (12) ofS(0,p)R3for some vector

hRd+.

The valueτtn{1, 1} indicates whether the measure of the quasisimplexSnthas to be added (τtn=1) or subtracted (τtn= −1) in order to compute an approximation of VH[S11].

We now show that, at each iteration of the algorithm, the error committed by takingPnas an approximation ofVH

£

S1 1

¤

can be expressed in terms of the measures of the nonempty quasisimplexesSnt+1,tIn+1passed to the (n+1)th iteration of the

algorithm.

Theorem 3. With the notation introduced above, we have that

VH

£

S1 1

¤ −Pn=

X

tIn+1

τt n+1VH

£

St n+1

¤

. (22)

Proof. We proceed by induction onn. Recalling thatτ11=1 andP0=0, (22)

corre-sponds, forn=0, to the trivial equalityVH

£

S1 1

¤ =VH

£

S1 1

¤

. Hence, suppose that (22) holds for somen∈N. Substituting (17) in (22), and using (20), yields

VH

£

S1 1

¤ =Pn+

X

tIn+1

τt n+1VH

£

Qt n+1

¤ + X

tIn+1

τt n+1

ÃN X

j=1

mjVH

h

SN tN+j n+2

i !

=Pn+1+

X

tIn+1

N

X

j=1

τt

n+1mjVH

h

SnN t+2−N+j

i

.

Recalling (21), we get

VH

£

S1 1

¤

=Pn+1+

X

tIn+1

N

X

j=1

τN tN+j n+2 VH

h

SN tN+j n+2

i

=Pn+1+

X

tIn+2

τt n+2VH

£

St n+2

¤

,

where the last equality follows from the fact that the simplexesSnt+2,tIn+2, are

(10)

5 Convergence of the algorithm

In this section, we give sufficient conditions for the convergence of the sequence Pn, defined above, to the valueVH

£

S1 1

¤

. First, in Lemma 4, we give a simple set rep-resentation ofPn. Recall that the setIn+1identifies the non empty quasisimplexes

St

n+1,tIn+1, which are passed to the (n+1)th iteration of the algorithm. We

parti-tion the setIn+1into the families

I+n+1={tIn:τnt+1= +1} andIn+1={tIn:τtn+1= −1}.

Lemma 4. For any nN,we have that Pn=VH[Bn], where

Bn=

Ã

S1 1

[

tIn+1

St n+1

!

\ [ tI+n+1

St

n+1. (23)

Proof. Using induction, we prove that, for a fixednand alltIn, we have

St n

(

S1

1, ifτtn= +1, Q(0,p11) \S11, ifτtn= −1.

Then, the result easily follows from (22) and the fact that, for a fixedn, theSnt+1’s are disjoint.

From its definition (3), a quasisimplexS(b,p) lies inS11if and only ifpR+

d. It lies instead inQ(0,p11) \S11if and only ifp∈R−d. Therefore, we equivalently have to show that, for a fixednand alltIn,

ptn∈ (

R+

d, ifτ t n= +1,

R−

d, ifτ t n= −1.

(24)

To this end, assume that (24) is true for a fixedn(forn=1, it trivially holds) and choose an arbitrarySnt+1,tIn+1such thatτnt+1= +1 (the caseτ

t

n+1= −1 is

anal-ogous). By (21), there existt0I

n andj ∈{1, . . . ,N}, withτtn+1=τ

N t0N+j

n+1 =τ

t0

nmj. Sinceτnt+1= +1, either (case I)τnt0= −1 andmj = −1 (in this case,j=1) or (case II)

τt0

n =1 andmj= +1 (in this case,j6=1).

Case I:If j=1, using (19) we obtainpnN t+01N+j = −hnt0. Since it also holds thatτtn0= −1, using the induction assumption, we obtain thatptn∈R−d and finally (recall that

hnt H(p))pnN t+01N+j= −htnR+

d. Case II:Forj6=1, (11) yields

(pN tn+01N+j)k=

(

(hnt0)k, if (ij)k=0,

(pnt0−hnt0)k, if (ij)k=1.

Since it also holds thatτtn0 = +1, using the induction assumption, we obtain that

(11)

A simple illustration of (23) in dimensiond=2 is given by (7) and by the corre-sponding Figure 3. Now, we use Lemma 4 to obtain bounds onPn.

Theorem 5. Ifϕ∈N then, for all n∈N, we have

ϕ(X)≤ln¤≤Pn≤P£ϕ(X)≤un¤, (25)

where

ln=min

©

s, min tI+

n+1

ϕ(btn+1

and un=max

©

s, max tI

n+1

ϕ(btn+1

.

Proof. Suppose that, for a quasisimplexSnt+1, we haveτtn+1= −1. Then (see (24))

pnt+10. Therefore, over the quasisimplexSnt+1, the functionϕN attains its maximum atbnt+1, i.e.

max x∈St

n+1

ϕ(x)=ϕ(bnt+1).

Using the above result and (23), we can write

Bn⊂S11

[

tIn+1

St n+1⊂

n

x∈Rd+:ϕ(x)≤max ©

s, max tIn+1ϕ(b

t n+1)

ªo

,

which yieldsBn©

x∈Rd+:ϕ(x)≤un

ª

. Recalling from Lemma 4 thatPn=VH[Bn], the right-hand side of (25) follows. Analogously, ifτtn+1= +1, we have thatptn+10

and

inf x∈St n+1

ϕ(x)=ϕ(bnt+1).

Recalling thatbnt+1Snt+1whenptn+10, we obtain thatϕ(btn+1)<ϕ(x) for allx

St

n+1. Using again (23), it follows that

n

x∈Rd+:ϕ(x)≤min ©

s, min tI+

n

ϕ(btn+1)ªo ⊂S11\

[

tI+ n+1

St

n+1⊂Bn,

which yields©

xRd+:ϕ(x)ln

ª

⊂Bnand, consequently, the left-hand side of (25).

Figure 5 illustrates Theorem 5 for the first two iterations of the algorithm in the cased=2. Now, we define the sequenceDn,n∈N, as

Dn=max {sln,uns}=max tIn+1

¯

¯ϕ(btn+1)−s ¯

¯. (26)

The following lemma will turn out to be useful in the remainder of the paper.

Lemma 6. IfϕN, then, for all nN, we have

¯

¯PnVH[S11] ¯

(12)

0

x2

h11

x1 {x∈R2+:ϕ(x)=l1}

{x∈R2+:ϕ(x)=u1}

x2

h11

{x∈R2+:ϕ(x)=l2} {x∈R2+:ϕ(x)=u2}

x1 0

{x∈R2+:ϕ(x)=s} {x∈R2+:ϕ(x)=s}

p11 p11

Figure 5: An illustration of Theorem 5 ford=2,n=1 (left) andn=2 (right). The dark grey area identifies the setsB1=Q11(left) andB2=(Q11∪Q23∪Q23∪Q42) \Q21

(right). We seth42=0, thusQ24= ;. The circles indicate whereϕattains either its maximum or its minimum.

Proof. From (25) and (26), we have that

Pn≤P[ϕ(X)≤un]≤P[ϕ(X)≤s+Dn].

SinceDn≥0, we obtain

PnVH[S11]≤P[ϕ(X)≤s+Dn]−P[ϕ(X)≤s]

=P[s<ϕ(X)≤s+Dn]≤P[sDn<ϕ(X)≤s+Dn].

Analogously, we can write

PnVH[S11]≥P[ϕ(X)≤sDn]−P[ϕ(X)≤s]

= −P[sDn<ϕ(X)≤s]≥ −P[sDn<ϕ(X)≤s+Dn].

Recall the definition ofDnin (26).

Theorem 7. AssumeXis absolutely continuous andlimn→∞Dn=0. Then

lim n→∞Pn=P

£

ϕ(X)s¤

=VH[S11].

Proof. If limn→∞Dn =0, andϕ(X) is continuous, then the theorem follows from Lemma (6). Therefore, it is sufficient to show thatP[ϕ(X)=s]=0, for alls ∈R. Fix somes ∈R, and let Γs ={x∈Rd :ϕ(x)=s}. Sinceϕ∈N, for eachy∈Rd−1 there exists a uniquezy∈Rsuch thatϕ(y1, . . . ,yd−1,zy)=s. It is easy to see that

(13)

Note that, in Theorem 7, the assumption of absolutely continuity ofXcannot be dropped. As a counterexample, taked=2 andX=(U, 1−U), whereUis a random variable uniformly distributed on [0, 1]. For the functionϕ(x)=x1+x2, we have that

Xis continuous, whileP[ϕ(X)=1]=1. In this case, and depending on the sequence

hnt, the sequencePnmay fail to converge.

6 The choice of the h

nt

: the bisection rule

Assuming continuity ofϕ(X), convergence of the GAEP algorithm is guaranteed by Theorem 7 whenever the sequenceDn goes to zero. Of course, the (speed of ) con-vergence of the sequenceDn depends on the choice of the vectors {htn,tIn}, at each iteration. The aim of this and the following section is to find good criteria for the choice of thehtnsuch thatDnconverges (rapidly) to 0.

Note that, wheneverSnt=Q(btn,ptn), in (17) it is convenient to sethnt =ptn, so that no simplexes are passed on to the following iteration of the algorithm. In fact, using (19), havinghtn =pnt implies thatSN tN+1

n+1 =S(b

t

n+pnt,−ptn)=Q(btn,ptn) \ St

n = ;. Moreover, for j =2, . . . ,N, (11) implies that at least one component of

pN tn+1N+jis zero, henceSnN t+1N+j= ;.

As a first choice forhtn, we propose the so-calledbisection rule. Using this choice, convergence of the sequenceDnis guaranteed under some extra assumptions onϕ.

Theorem 8. Assume thatϕN0, the set of all twice differentiable functionsϕN

for which there exists a constant r>0such that(x)>r for all kD andx∈Rd. For all nNand tIn, let the sequencehnt be defined as

htn= (

ptn, ifSnt=Qnt,

1/2ptn, otherwise. (27)

Then, Dn=O(2−n) ,n→ ∞.

Proof. It is immediate thathtn ∈H(pnt), and hencehtn is correctly defined. Since ϕis twice differentiable, it is also Lipschitz continuous onQ(0,p11), the closure of Q(0,p11). Thus, there exists a constantL< ∞such that¯

¯ϕ(x)−ϕ(y) ¯ ¯<L

¯ ¯ ¯ ¯xy

¯ ¯ ¯ ¯for

allx,y∈Q(0,p11).

Now, consider the nonempty quasisimplexSnt+1,tIn+1, for which there exist

t0∈Inandj ∈{1, . . . ,N} such thatSnt+1=SN t

0N+j

n+1 . For the quasisimplexSt

0

n , we have that (from the definition (3))Snt0=S(bnt0,ptn0)Q(btn0,pnt0). AsSnt+16= ;, (27) implies thatQ(btn0,pnt0) \Snt06= ;.

It is then possible to findytn0∈Snt0 andzt

0

n ∈Qt

0

n \St

0

n , for which we have that eitherϕ(ytn0)s,ϕ(znt0)>s (whenpnt0 0) orϕ(ytn0)>s,ϕ(znt0)s (whenptn0 0). In both cases, continuity ofϕguarantees (by the intermediate point theorem) the existence of a vectorxtn0on the curvec: [0, 1]7→(1−c)ytn0+cztn0, withϕ(xtn0)=s. Clearly,

xtn0∈Qt 0

n, and by (18) and (10), alsob N t0N+j

n+1 ∈Q

t0

(14)

¯ ¯ ¯ ¯ ¯ ¯b

N t0N+j

n+1 −xt

0

n

¯ ¯ ¯ ¯ ¯ ¯≤

¯ ¯ ¯ ¯ ¯ ¯p

t0 n

¯ ¯ ¯ ¯ ¯

¯. Therefore, for eachtIn+1, there existst 0I

nsuch that

¯

¯ϕ(btn+1)−s ¯ ¯=

¯ ¯ ¯ϕ(b

N t0N+j

n+1 )−s

¯ ¯ ¯=

¯ ¯ ¯ϕ(b

N t0N+j

n+1 )−ϕ(x

t0

n)

¯ ¯ ¯≤L

¯ ¯ ¯ ¯ ¯ ¯b

N t0N+j

n+1 −x

t0

n

¯ ¯ ¯ ¯ ¯ ¯≤L

¯ ¯ ¯ ¯ ¯ ¯p

t0

n

¯ ¯ ¯ ¯ ¯ ¯.

Combining the above result with the definition ofDngiven in (26), it follows that

Dn=max tIn+1

¯ ¯ϕ(bnt

+1)−s

¯

¯≤Lmax

tIn ¯ ¯ ¯ ¯pnt

¯ ¯ ¯

¯. (28)

Note that (28) holds for a general sequencehnt,tIn. Using the definition of the

pN tn+1N+j(see (19)) and the bisection rule (27), it follows that

pN tn+1N+j=ijptn+(1−2vj)◦htn=ijptn+(1−2vj)◦1/2pnt

=(ij+1/2(1−2vj))◦pnt =(ijvj+1/21)◦ptn=

(

1/2ptnifj6=1,

−1/2ptnifj=1.

Thus, we have

max tIn+1

¯ ¯ ¯ ¯pnt+1¯

¯ ¯

¯=1/2 max

tIn ¯ ¯ ¯ ¯pnt¯

¯ ¯ ¯.

Using (28), we finally obtain that

DnLmax tIn

¯ ¯ ¯ ¯ptn¯

¯ ¯

¯=1/2Lmax

tIn−1

¯ ¯ ¯ ¯ptn1¯

¯ ¯

¯=(1/2)n−1Lmax

tI1

¯ ¯ ¯ ¯pt1¯

¯ ¯

¯=(1/2)n−1L ¯ ¯ ¯ ¯p11¯

¯ ¯ ¯,

which implies the theorem. In the proof above, note that, even iftIn+1, there does

not necessarily exist a vectorxtn+1Qnt+1such thatϕ(xnt+1)=s. This case occurs, for example, whenSnt+1=Qnt+1.

In order to have convergence of the bisection method, we only needϕto be Lipschitz continuous onQ(0,p11). However, to keep notation simple, we defined the smaller set of functionsN0, which we will use in the following section.

7 The choice of the h

nt

: the gradient rule

In this section, we present thegradient method, a different way of choosing the se-quencehtn, which guarantees a better asymptotic convergence rate in the cased=2. Byxy, denote the componentwise minimum ofxandy, and, byxy, the compo-nentwise maximum. We keep the assumption thatϕ∈N0, see Theorem 8.

First, we use Taylor expansion to find a constant 0≤R< ∞such that

¯

¯ϕ(b+δ)− ¡

ϕ(b)+ ∇ϕ(b)¢¯

¯≤R||δ||2,

for allxQ(b,p) andx+δQ(b,p), whereϕdenotes the gradient ofϕ. Sinceϕ is twice differentiable, the constantRcan be chosen to be the same for everyb

(15)

Theorem 9. Assume thatϕ∈N0and fixα∈(1/d, 1). Forsomen∈Nand all tIn, let the sequencehtnbe defined as

htn=   

 

pnt, ifSnt=Q(bnt,ptn), (ht

nptn)∨0, ifSnt6=Q(bnt,ptn),ptn∈Rd+,

(hnt∗∨ptn)∧0, ifSnt6=Q(bnt,ptn),ptn∈Rd−,

(29)

where

¡ hnt∗¢

k=α

sϕ(btn) ∂kϕ(btn)

, for all kD.

Then, we have that

Dn+1≤

µ

max{1α,αd1}+α2R r2Dn

Dn. (30)

Before proving Theorem 9, we need the following result.

Lemma 10. Assume thatϕ∈N0and fix n∈N. Lethtn,tInbe defined by(29). Then, for all tIn, we have that

max j:SN tN+j

n+1 6=;

¯ ¯ ¯ϕ(b

N tN+j n+1 )−s

¯ ¯ ¯≤

µ

max{1α,αd1}+α2R r2

¯

¯ϕ(bnt)−s ¯ ¯ ¶

¯

¯ϕ(btn)−s ¯ ¯.

(31)

Proof. In the caseSnt=Q(btn,pnt), there is nothing to prove. Suppose, instead, that ϕ(btn)<s,Snt6=Q(btn,ptn) andpRd+, the other non-trivial case whereϕ(btn)>sand

pRdbeing analogous. It follows from (29) that, for allj1, . . . ,N,

ϕ(bN tn+1N+j)s=ϕ(btn+vjhtn)−s

ϕ(btn+hnt∗)−sϕ(bnt)+ ∇ϕ(btn)Thtn∗−s+R¯¯ ¯ ¯htn

¯ ¯ ¯ ¯

2

=ϕ(btn)+(sϕ(btn))s+R¯ ¯ ¯ ¯htn∗¯

¯ ¯ ¯

2

∞=(αd−1)(sϕ(b

t n))+R

¯ ¯ ¯ ¯hnt∗¯

¯ ¯ ¯

2

∞.

Recalling that∂kϕ(x)>r,kD, we also have thatkhtn∗k∞≤α/r

¯

¯ϕ(btn)−s ¯ ¯, which

gives

ϕ(bN tn+1N+j)s≤ |ϕ(bnt)s|

µ

(αd1)+α2R r2|ϕ(b

t n)−s|

. (32)

Note that (vjhtn)k=pkimpliesSnN t+1N+j= ;. Hence, for alljwithSnN t+1N+j6=

;, we havevjhtn=vjhtn∗. Thus,

ϕ(bN tn+1N+j)s=ϕ(btn+vjhtn∗)−s

ϕ(btn)+ ∇ϕ(btn)T(vjhtn∗)−sR

¯ ¯ ¯ ¯vjhtn

¯ ¯ ¯ ¯

2

(16)

As #vj≥1, for allj=1, . . . ,N, we finally get

ϕ(bN tn+1N+j)s≥ −|ϕ(btn)s|

µ

1α+α2R r2|ϕ(b

t n)−s|

. (33)

Combining (32) and (33) yields (31).

Proof of Theorem 9. First of all, note thathtnH(ptn), hencehtnis correctly defined. Due to Lemma 10, and recalling thatR>0,we have that

Dn+1=max

tIn+1|ϕ

(btn+1)−s|

=max tIn

max j:SN tN+j

n+1 6=;

|ϕ(bN tn+1N+j)−s|

≤max tIn

µ

max{1−α,αd−1}+α2R

r2

¯

¯ϕ(btn)−s ¯ ¯ ¶

¯

¯ϕ(bnt)−s ¯ ¯

= µ

max{1−α,αd1}+α2R

r2Dn

Dn.

Now, we are ready to prove convergence of the gradient method.

Theorem 11. Assume thatϕ∈N0and fixα∈(1/d, 1). For somenˆ∈Nandξ∈(0, 1), assume that

max{1−α,αd1}+α2R

r2Dnˆ=1−ξ<1. (34)

For all nn, let the sequenceˆ hnt be defined as(29). Then, we have thatlimn→∞Dn=0 and indeed

Dn=O

¡

(max{1α,αd1})n¢

.

Proof. Using (34) and (30) iteratively, we obtain thatDn≤(1−ξ)nnˆDnˆfor allnnˆ.

Hence, limn→∞Dn=0 and

lim n→∞

Dn+1

Dn ≤ lim n→∞

µ

max{1α,αd1}+α2R r2Dn

=max{1α,αd1}.

The condition (34), which guarantees the convergence of the gradient method with a twice differentiable functionϕ, can always be achieved by using the bisection method for the first iterations of the algorithm. In fact, if one defines thehtn us-ing (27), the sequenceDn will go to zero (Theorem 8) and will satisfy (34) for some integer ˆnlarge enough. From that ˆnon, one can then use the gradient method with convergence guaranteed.

Finally, observe thatα∗=d2+1minimizes max{1−α,αd−1} with max{1−α∗,αd

1}=dd−+11. Thus, the rate

Dn=O

µµd −1 d+1

n

, (35)

(17)

8 Applications

In this section, we test the GAEP algorithm on some random vectorsX=(X1, . . . ,Xd) and several functionsϕ. For illustrative reasons, we provide the joint distributionH ofXin terms of the marginal distributionsFXi,i=1, . . . ,d, and a copulaC. For the

theory of copulas and the definition of the Gumbel and Clayton copula families, we refer the reader to Nelsen (2006).

In Table 1, we consider the two-dimensional case (d=2) with Pareto marginals, that is,

FXi(x)=P[Xix]=1−(1+x)

θi,x0,i=1, 2,

with tail parametersθ1=1 andθ2=2. We couple these Pareto marginals via a

Gum-bel copulaCγGuwith a parameterγ=1.5. For this example, we compute the approxi-mationPn(see (20)) using both the bisection and the gradient method, for different values of the thresholdssand different numbers of iterationsn of the algorithm. Here, we setϕ(x1,x2)=(1+x1)2/3(1+x2)1/3−1. For the gradient method, we

pro-vide the differencesPnP16and their average computation times, for all iterations

nand thresholds. This has been done to show the speed of convergence of GAEP. The choice ofn=16 represents the maximum number of GAEP iterations allowed by the memory (4 GB RAM) of our laptop under the gradient method. Within the same table, we give the differencesPnP18, but using the bisection method. Again,

n=18 is the maximum number of GAEP iterations under the bisection method. Note that these numbers are different because the number of quasisimplexes pro-duced at each iteration of GAEP and, consequently, the memory used by the algo-rithm, depend on the method chosen. In Table 1, we also compute the ratio

Rn= Dn

sϕ(0). (36)

Since, from Lemma 6, we have

¯ ¯PnVH

£

S1 1

¤¯ ¯≤P

·

1Rn<ϕ

(X)ϕ(0)

sϕ(0) ≤1+Rn

¸

,

the sequenceRn provides a relative measure of convergence of the algorithm. In-deed, the convergence ofRn to 0 implies that the algorithm converges to a certain value. Note that, since analytical values forVH[S11] are not available for this ex-ample, nothing can be said about the correctness of the limit. However, for a two-dimensional portfolio, we see that the estimateP9(for the gradient method) andP13

(bisection) could be already considered reasonably accurate and are both obtained in less than 0.1 second.

In Tables 2 (d=3) to 4 (d=5), we perform the same analysis for different Gum-bel and Clayton models in which we progressively increase the number of Pareto random variables used, and we also change the functionϕ. In all tables, the ref-erence values used represent the maximum number of iterations admissible under the corresponding method.

(18)

Section 9. At this stage, we only note that ford=2, the gradient method, measured in terms ofRn, is more accurate than the bisection method, whereas ford=4, 5 the opposite is the case. Memory constraints made estimates ford≥6 prohibitive.

9 Convergence rates and comparison with MC and QMC

methods

In this section, we compare the GAEP algorithm to its main competitors for the es-timation ofVH[S11], which are the so-called Monte Carlo and quasi-Monte Carlo

methods.

GivenM pointsx1, . . . ,xM inS11, it is possible to approximateVH[S11] by the

average of the density functionvHofHevaluated at those points, i.e.

VH[S11]=

Z

S1 1

vH(x)dxV(S11)

1 M

M

X

i=1

vH(xi),

whereV(S11) is the Lebesgue measure ofS11. If thexi’s are chosen to be (pseudo)randomly distributed, this is theMonte Carlo(MC) method. If thexi’s are generated from a so-called low discrepancy sequence (see Niederreiter (1992)), this is thequasi-Monte Carlo(QMC) method. The main features of (Q)MC methods (their convergence rates included) do not depend on the functionϕ. We refer to Arbenz et al. (2010) for ref-erences and a more detailed discussion on both methods relevant for the present paper.

Unfortunately, we were not able to find a convergence rate for the sequencePn, which would be necessary to compare GAEP to (Q)MC methods. However, it is pos-sible to calculate bounds on convergence rates forDn, which, assuming that the random variableϕ(X) has a density nears, has the same asymptotic behavior ofPn. Indeed, because of Lemma 6, we have that

|PnVH[S11]| ≤P[sDn<ϕ(X)≤s+Dn]=O(Dn).

The total numberM(n) of evaluations of the joint distributionHperformed by GAEP after thenth iteration (as well as the computation time used) is proportional to the number of simplexes needed to calculatePn. Since the numberIn of qua-sisimplexes passed to thenth iteration is bounded byN=2d, we have that

M(n)≤B Nn,

whereBis a constant depending only on the dimensiond.

From (35), we know thatDn=O

³³

d−1

d+1

´n´

is the best convergence rate attainable with GAEP, when the gradient method is used. Analogously, from Theorem 8, we know thatDn=O

¡

2−n¢

, for the bisection method. By expressing the convergence rates forDnin terms ofM(n), we find that

Dn=O¡M(n)κ¢ with κ=

(

dln(2)1 ln

³

d+1

d−1

´

, for the gradient method,

(19)
(20)

G ra d ie n t me thod s = 0. 01 s = 1 s = 10 0 s = 10 00 0 P8 (r ef . v a lue) 0 .00 07 11 82 230 07 55 11 0. 17 114 54 86 287 12 5 0 .98 95 55 692 75 09 83 0. 999 89 81 648 15 62 PnP8 Rn PnP8 Rn PnP8 Rn PnP8 Rn n = 1 (7. 8e-05 sec .) -1 .2 2e-05 8 .92 e-0 1 -3.5 7e-02 6 .94 e-0 1 -1. 45e -02 6.5 2e-01 -1 .48 e-0 4 6. 43 e-0 1 n = 3 (2. 5e-03 sec .) -8 .4 8e-05 5 .65 e-0 1 -3.3 6e-02 5 .90 e-0 1 -5. 08e -03 5.9 5e-01 -2 .58 e-0 5 5. 95 e-0 1 n = 5 (4. 5e-01 sec .) -1 .7 2e-05 3 .51 e-0 1 -3.8 3e-03 3 .53 e-0 1 -6. 90e -04 3.5 7e-01 -6 .67 e-0 6 3. 57 e-0 1 n = 7 (3.5 e+01 sec .) -4 .1 2e-06 2 .24 e-0 1 -1.1 5e-03 .261 e-0 1 -8. 88e -05 2.3 5e-01 -1 .16 e-0 6 2. 37 e-0 1 B ise ct ion me thod s = 0. 01 s = 1 s = 10 0 s = 10 00 0 P8 (r ef . v a lue) 0 .00 07 293 52 44 876 18 77 0. 17 462 64 28 280 20 7 0 .98 97 34 882 11 65 19 0.9 99 899 60 57 26 889 PnP8 Rn PnP8 Rn PnP8 Rn PnP8 Rn n = 1 (5. 9e-05 sec .) 2 .00 e-0 5 1.3 4e+00 1 .61 e-0 3 1.1 1e+00 -9. 72e -03 1 .06 e+ 0 0 -9 .96 e-0 5 1 .0 5e+00 n = 3 (7. 2e-04 sec .) 1 .77 e-0 4 6 .11 e-0 1 3 .26 e-0 2 4 .63 e-0 1 -1. 19e -03 4.8 2e-01 -1 .39 e-0 5 4. 86 e-0 1 n = 5 (7. 9e-02 sec .) 8 .70 e-0 5 2 .43 e-0 1 1 .80 e-0 2 1 .51 e-0 1 -1. 36e -04 1.3 3e-01 -2 .83 e-0 6 1. 29 e-0 1 n = 7 (5.7 e+00 sec .) 1 .47 e-0 5 1 .19 e-0 1 2 .55 e-0 3 5 .07 e-0 2 1.9 8e-05 3.8 1e-02 -3 .94 e-0 7 3. 50 e-0 2 T a bl e 3 : T his is the same as T able 1, bu t fo r a rand o m v ect or having fo ur ( d = 4) P ar et o mar ginals w ith tail in dexes θi = i , i = 1, .. ., 4, coupled b y a Clayton c o p u la w it h pa ram e ter 0.5 . F o r the func tion ϕ ( x ) = x1 + x2 + x3 + x4 + 0. 1( x

+x1

+)2

1/2

+

0.

1(

x

+x3

+)4

(21)

Since, in general, we do not know the exact number of simplexes passed to the next iteration by GAEP, the rates provided by (37) represent only an upper bound on the real convergence rates of the GAEP algorithm. As a matter of fact, the convergence rates encountered in many numerical examples turned out to be much better than those predicted by (37).

In Figure 6, we plot absolute errors|PnVH[S11]|versus computation time, for random vectors with independent marginals and functionsϕfor whichVH[S11] is available analytically. We use linear least squares fitting on these curves in order to calculate so-calledempiricalconvergence rates for the algorithm. Here, computa-tion time (which is proporcomputa-tional toM(n)) is used as a measure of numerical com-plexity. These results are collected in Table 5, where the empirical convergence rates obtained from Figure 6, as well as the bounds obtained from (37), are compared with convergence rates for the MC and QMC methods.

0.1ms 10ms 1s 1e−11

1e−8 1e−5 1e−2

Computation time

Error

d = 2

Gradient Bisection

0.1ms 10ms 1s 1e−6

1e−4 1e−2

Computation time

Error

d = 3

Gradient Bisection

0.1ms 10ms 1s 1e−3

1e−2 1e−1

Computation time

Error

d = 4

Gradient Bisection

0.1ms 10ms 1s 1e−2

1e−1

Computation time

Error

d = 5

Gradient Bisection

Figure 6: Errors|PnVH[S(0, 3)]|from the GAEP algorithm for random vectors of different dimensions having independent Pareto marginals with tail indexesθi = i,i =2, . . . , 5. GAEP errors are plotted versus computation time, for the function ϕ(x)=Qd

k=1(xk+1).

(22)

com-d=2 d=3 d=4 d=5

GAEP, g. (bound) M−0.79 M−0.33 M−0.18 M−0.12

GAEP, g. (empirical) M−2.18 M−0.74 M−0.40 M−0.26

GAEP, b. (bound) M−0.5 M−0.33 M−0.25 M−0.2

GAEP, b. (empirical) M−1.62 M−0.74 M−0.40 M−0.28

MC M−0.5 M−0.5 M−0.5 M−0.5

QMC (best) M−1 M−1 M−1 M−1

QMC (worst) M−1(logM)2 M−1(logM)3 M−1(logM)4 M−1(logM)5

Table 5: Asymptotic convergence rates of GAEP, g(radient) and b(isection) method, MC and QMC methods. Here, we use the simplified notationM=M(n). For (Q)MC methods,Mis the number of samples used.

petitive than the gradient, as confirmed by the results in Tables 3-4. However, Table 6 (and the corresponding empirical rates) indicate that the gradient rule can be bet-ter, computationally, also in higher dimensions. Here, it is important to remark that exact convergence rates forPnarenotavailable, and (empirical) convergence rates for the two methods depend on the probability model under study.

With respect to (Q)MC methods, the figures indicate that awell-designedQMC algorithm will perform better, asymptotically, than GAEP under a smooth probabil-ity model and for dimensionsd≥3. At this point, it is however important to stress that GAEP and (Q)MC methods are substantially different. First of all, (Q)MC meth-ods provide a final estimate which contains a source ofrandomness, while the GAEP algorithm, being solely based on geometric properties of a certain domain, is purely deterministic. This can be seen in Table 6, where we compare GAEP and MC esti-mates on two examples.

Also recall that (Q)MC methods need either a density (everywhere onS11) or a sampling algorithm for the distribution function ofX. Instead of this, the GAEP algo-rithm does not require the density ofHin analytic form, nor does it have to assume overall smoothness. In order to use GAEP, one only needs thatHcan be evaluated numerically and thatϕN0. Furthermore, (Q)MC methods need to be tailored to

(23)

(a) GAEP estimate MC estimate MC s.e.

s (n=12, 61 sec.) (M=5e07, 64 sec.)

100 0.416123502687784 0.41614132 6.97e-05 102 0.860937621800580 0.86099074 4.89e-05 104 0.975466112029285 0.97545864 2.19e-05 106 0.996034158165874 0.99603776 8.88e-06

(b) GAEP estimate MC estimate MC s.e.

s (n=8, 41 sec.) (M=4e07, 47 sec.)

10−2 0.000729352448762 0.000715050 4.22e-06

100 0.174626428280207 0.172616075 5.98e-05 102 0.989734882116519 0.989638175 1.60e-05 104 0.999899605726889 0.999899550 1.58e-06

Table 6: GAEP (bisection) and MC estimates forVH[S11] for the example described

in: (a) Table 2; (b) Table 3.

10 GAEP versus AEP

The GAEP algorithm is similar to the AEP algorithm introduced by the same authors in Arbenz et al. (2010) for the sum operator, but cannot be seen as an extension of AEP. In the caseϕ(x)=Pd

k=1xk, the geometrical decompositions of the set

S1

1 ={x∈Rd:0<xandx1+ · · · +xks},

used by AEP and GAEP (gradient method), are equivalent ford =2, 3, but differ-ent ford4. AEP decomposesS11in a countable family ofoverlappingsimplexes, whereas the (quasi)simplexes produced by GAEP are alwaysdisjoint. The decompo-sition used by AEP has the advantage that all the simplexes generated by the algo-rithm are scaled copies of the simplexes from which they have been generated. On the other hand, using overlapping simplexes implies a larger numerical complexity, particularly in high dimensions, and a much more cumbersome proof of conver-gence. Indeed, the problem of convergence of AEP in dimensionsd≥9 is still open. This is different with GAEP, which uses disjoint simplexes and is more efficient than AEP forϕ(x)=Pd

k=1xk in dimensionsd ≥4. Moreover, convergence (under

some smoothness ofϕ) is easily stated in arbitrary dimensions. Unfortunately, the disjoint (quasi)simplexes produced by GAEP can be “cut off” at the edges. This does not allow the use of the extrapolation technique described in Arbenz et al. (2010, Section 5). This is the reason why the AEP algorithm, in its extrapolated version (AEP-E), turns out to be better than GAEP for the sum operator, in every dimensions. In Figure 7, we plot the error committed by AEP, AEP-E (see (43) below) and GAEP (gradient method) versus computation time, when the three algorithms are applied to random vectors of different dimensions for the functionϕ(x)=Pd

i=1xi. In these

(24)

are equivalent in dimensionsd=2 andd=3. As already remarked, GAEP is more efficient than standard AEP in dimensionsd≥4 due to the use of a disjoint decom-position. However, AEP-E turns out to be the best algorithm to be used with the sum operator, in all dimensionsd.

0.1ms 10ms 1s 1e−14

1e−11 1e−8 1e−5 1e−2

Computation time

Error

d = 2

AEP = GAEP AEP−E

0.1ms 10ms 1s 100s 1e−10

1e−7 1e−4 1e−1

Computation time

Error

d = 3

AEP = GAEP AEP−E

0.1ms 10ms 1s 100s 1e−5

1e−4 1e−3 1e−2 1e−1

Computation time

Error

d = 4

GAEP AEP AEP−E

0.1ms 10ms 1s 100s 1e−4

1e−3 1e−2 1e−1

Computation time

Error

d = 5

GAEP AEP AEP−E

Figure 7: Errors|PnVH[S(0, 10)]|from the AEP, AEP-E and GAEP (gradient) al-gorithms for random vectors of different dimensions having independent Gamma marginals with scale parameter 1 and shape parameters 0.5+0.5i,i=1, . . . , 5. GAEP errors are plotted versus computation time, for the sum operatorϕ(x)=Pd

i=1xi.

In order to better clarify the difference between AEP and GAEP, suppose we want to decompose the simplex {x∈R2:0xp,ϕ(x)≤s}⊂R2, where0hpand ϕ(h)<s. In the case of a general functionϕ∈N, an overlapping decomposition, analogous to the one used by AEP, can be carried out, leading to

VH[{x:0xp,ϕ(x)≤s}]=VH[Q(0,h)]+VH[{x: (0,h2)≤xp,ϕ(x)≤s}]

Figure

Figure 1 illustrates (3) and (1) as well as (4) for d = 2. Defining S 1 1 = S (0, p 1 1 ) and recalling that X is non-negative, (4) implies that
Figure 2: An illustration of (5), where the rectangle Q(0,p) ⊂ R 2 is decomposed into four disjoint rectangles.
Figure 3: An illustration of the decomposition (7) for two different choices of h ∈ R d − ∪ R d +
Figure 4: An illustration of the decomposition (12) of S (0,p) ⊂ R 3 for some vector h ∈ R d + .
+7

References

Related documents

• Follow up with your employer each reporting period to ensure your hours are reported on a regular basis?. • Discuss your progress with

Such a collegiate cul- ture, like honors cultures everywhere, is best achieved by open and trusting relationships of the students with each other and the instructor, discussions

For BYOD programs, these capabilities are sometimes used with very minimal device policy management, giving users more freedom to use devices as they wish while carving out

19% serve a county. Fourteen per cent of the centers provide service for adjoining states in addition to the states in which they are located; usually these adjoining states have

In this present study, antidepressant activity and antinociceptive effects of escitalopram (ESC, 40 mg/kg) have been studied in forced swim test, tail suspension test, hot plate

/ L, and a different adsorbent mass, is shown in Figure 5. The results obtained show that the removal rate increases quickly at first, until a time of 60 min, and

National Conference on Technical Vocational Education, Training and Skills Development: A Roadmap for Empowerment (Dec. 2008): Ministry of Human Resource Development, Department