A novel expected hypervolume improvement algorithm for Lipschitz multi-objective optimisation: Almost Shubert's Algorithm in a special case

(1)

A novel expected hypervolume improvement

algorithm for Lipschitz multi-objective

optimisation: Almost Shubert’s algorithm in a

special case

Cite as: AIP Conference Proceedings 2070, 020031 (2019); https://doi.org/10.1063/1.5089998

Published Online: 12 February 2019

Heleen J. Otten, and Sander C. Hille

ARTICLES YOU MAY BE INTERESTED IN

The R2 indicator: A study of its expected improvement in case of two objectives AIP Conference Proceedings 2070, 020054 (2019); https://doi.org/10.1063/1.5090021

A two-phase approach in a global optimization algorithm using multiple estimates of hölder constants

AIP Conference Proceedings 2070, 020033 (2019); https://doi.org/10.1063/1.5090000

Lower and upper bounds for the general multiobjective optimization problem

(2)

A Novel Expected Hypervolume Improvement Algorithm

For Lipschitz Multi-Objective Optimisation:

Almost Shubert’s Algorithm In A Special Case

Heleen J. Otten

1,b)

and Sander C. Hille

1,a)

1_{Mathematical Institute, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands} a)_{Corresponding author: [email protected]}

b)_{[email protected]}

Abstract.An algorithm is proposed for multi-objective optimisation of Lipschitz objective functions that each satisfy a Lipschitz condition of which a Lipschitz constant isa priori known. The number of function evaluations is reduced by determining a good next point of evaluation using an Expected Hypervolume Improvement (EHVI) approach. It is closely related to Shubert’s Algorithm for single objective optimisation on one-dimensional decision space, but sampling sequences can be slightly different.

INTRODUCTION

Algorithms for optimising Lipschitz continuous objective functions for which Lipschitz constants are known have attracted some attention over the past decades. Shubert [1] introduced the algorithm (named later after him) for global optimisation of a single Lipschitz continuous objective function on one-dimensional decision space. ˇZilinskas and

ˇ

Zilinskas [2] introduced an approach to computing the Pareto optimal set for a bi-objective optimisation problem with Lipschitz objective functions on ad-dimensional hyper-rectangular decision space. The Pareto optimal set is approximated by that of a natural Lipschitz lower bound that is iteratively improved. See e.g. [2] for further references. Here we propose an approach for optimisation ofnLipschitz continuous functions ond-dimensional decision space, motivated by the Expected Hypervolume Improvement (EHVI) method introduced in Emmerich [3] and elab-orated upon in Emmerichet al.[4]. We show that our EHVI method reduces ‘almost’ to Shubert’s Algorithm in the casen 1, d 1. In multi-objective optimisation of a function f : D Rd Ñ Rn the main objectives are to

determine the Pareto optimal solutions (simply called the ‘Pareto front’) inRnand the corresponding set of decisions

inD(cf. Miettinen [5]). In case of minimising, this amounts to determining the points in fpDqthat are not dominated by any other point in fpDq. We say that an elementy py1_{, . . . ,}_yd_q_{in objective space}

Rnisdominatedbyy1, written

asy1 y, ifpy1qi¤yifor alliP t1, . . . ,nuandpy1qi yifor at least oneiP t1, . . . ,nu. Ifn1 the Pareto front is simply the global minimum.

The objective of the proposed EHVI algorithm is to approximate the Pareto front of a Lipschitz continuous f. Recall that this entails the following:

Definition 1. A function f : D Rd Ñ Rn, with fpxq pf1pxq, . . . ,fnpxqqfor any x P Dis called Lipschitz

continuous on Dor is said to satisfy a Lipschitz condition on D with constantL pL1, . . . ,Ln_{q P}

Rn if for all

x,yPD:

|fkpxq fkpyq|¤Lk}xy}, k1, . . . ,n.

Here we take}xy}:°d_i₁|xiyi|, the so-called Manhattan metric. (Note that fkandLkare not powers of f and L, but indicate the components of the vector f andL).

The objective is to use as few functions evaluations fpxqas possible, because in applications the evaluation fpxq

can be computationally quite expensive. The EHVI algorithm exploits thea prioriknowledge of a Lipschitz constantL

(3)

FIGURE 1.The set of points that are dominated by a set Y4 typ1q, . . . ,yp4qu R2relative to reference pointrPR2(red) and its hypervolume indicator (the area). The area of the blue region is the hypervolume improvement of Z tzp1q_,_zp2q_u_{relative to Y}

4.

values, that maximises the expected improvement – in a suitable sense – of the approximation of the Pareto front. This ‘educated guess’ of the new positionxis based on the hypervolume improvement measure, that we discuss next.

Expected Hypervolume Improvement

Fix a reference pointrPRn. ForY Rn, theset of points dominated by Y(relative tor) is the set

DomrpYq: tuPRn|u rand there existsyPY: y uu. (1)

Definition 2. Thehypervolume improvement of Z over Yis the increase of size of the set of dominated points relative toZcompared to that relative toY, as measured byn-dimensional Lebesgue measureλn:

HVIpZ|Yq:λn DomrpZq zDomrpYq

. (2)

Figure 1 illustrates the concepts discussed so far. IfZ tzu, a single point, we shall write HVIpz|Yq.

Emmerichet al.[4] showed that theexpectedhypervolume improvement is a useful tool for global optimisation. Suppose one has evaluated the Lipschitz objective functionf(with constantL) at the pointsxP Xk: txp1q, . . . ,xpkqu. LetYk: fpXkqand writeypjq: fpxpjqq. Becausef is Lipschitz continuous, we know that if we evaluatefinxPRd,

the corresponding valuey: fpxq P_Rn_{satisfies for all}_i_{P t}₁_{, . . . ,}_n_u_and _j_{P t}₁_{, . . . ,}_k_u_:

fi xpjqLi}xxpjq} ¤yi¤ fi xpjq Li}xxpjq}. (3) That is,yhas to be in the hyper-rectangleExpXkqthat is ann-fold Cartesian product of intervals inR:

ExpXkq: n ¡

i1

max

j !

fi

xpjq Li}xxpjq}

) ,min

j !

fi

xpjq Li}xxpjq}

)

. (4)

Since one has no further information on the location ofy withinExpXkq, we assume that its location is a random

variableY that is homogeneously distributed overExpXkq. WriteExExpXkqand – motivated by [4] – define

Definition 3. Theexpected hypervolume improvement (EHVI)of a pointx P Drelative to the setXk of previously evaluated points and corresponding valuesYk fpXkqis EIpx|Xkq:_ErHVIpY|Ykqs.

Observe that the hypervolume improvement ofYrelative toYkwill be 0 ifY PDomrpYkq XEx. Otherwise it will

be HVIpY|Ykq. Therefore,

EIpx|Xkq 1 VolpExq

»

ExzDomrpYkq

HVIpy|Ykqdy, (5)

(4)

THE EXPECTED HYPERVOLUME IMPROVEMENT ALGORITHM

The proposed EHVI algorithm for approximating the Pareto front consists of the following steps:

1. Selectxp1qPDand putX1: txp1qu.

2. Computeyp1q_: _f_p_xp1q_q_{and put}_Y

1: typ1qu.

3. Selectxpk 1q_P_{arg max}

xPDEIpx|Xkqand putXk 1:XkY txpk 1qu. 4. Computeypk 1q_: _f_p_xpk 1q_q_{and put}_Yk

1:YkY typk 1qu.

5. Stop if EIpxpk 1q_|_Xk_{q ¤}_ε_{, otherwise increase}_k_{and return to Step 3.}

After stopping, the subset ofYk 1consisting of those points that are not dominated by any other point inYk 1provide

an approximation of the part of the Pareto front of f intuPRn|u ru, to an accuracy that is controlled byε¡0.

This algorithm is interesting to consider – roughly speaking – when computing a global maximum of the functions DÑR:xÞÑEIpx|Xkq(k1,2,3, . . .q, required in Step 3, is computationally more efficient than evaluating f.

RELATION TO SHUBERT’S ALGORITHM

Now we will take a closer look at the case forn1 andd 1, i.e. single objective optimisation in one dimensional decision space. We takeD ra,bs Rand the single objective function f : ra,bs Ñ Ris assumed to satisfy a

Lipschitz condition with constantL. Bruno O. Shubert introduced in 1972 an algorithm to approximate the global maximum of f onra,bsin [1]. Our main conclusion concerning the relationship to Shubert’s Algorithm, which will be made precise below, is:

The sampling sequence of the Expected Hypervolume Improvement Algorithm applied to single objective optimisation pn 1qof a Lipschitz continuous objective function onra,bs Rpd 1qwill generally follow that of Shubert’s

Algorithm, but may deviate at steps, occasionally.

Shubert’s Algorithm

We reformulate the algorithm in Shubert [1] for minimisation. Putφ:minxPra,bsfpxqandΦ:arg minxPra,bsfpxq. Shubert’s Algorithm defines a sampling sequencex0,x1,x2, . . . of points fromra,bsrecursively, by selecting

(arbi-trarily)x0P ra,bs. Oncex0, . . . ,xnhave been selected,xn 1is selected according to

Fnpxq: max

k0,...,npfpxkq L

|xxk|q, xn 1Parg min

xPra,bs

Fnpxq. (6)

It is shown in [1] that the sequencepxnqconverges to a point inΦand that the minimal valuesMn:minxPra,bsFnpxq converges toφ. In practice one usually starts with x0 a after which one can take x1 b. This version of the

algorithm one may call theCanonical Shubert Algorithm (CSA). An example is visualised in Figure 2 (left).

Computation of the Expected Hypervolume Improvement

Select a reference pointr P Rsufficiently large, such thatr ¥ maxxPra,bs fpxq. Suppose that evaluations have been made at points x0, . . . ,xk1, with k ¥ 1. Put Xk : tx0, . . . ,xk1uand Yk : fpXkq. Assume for simplicity of

exposition thata,b PXk. FixxP ra,bszXkand define xas the point inXkclosest toxsuch thatx x. Similarly, x is the point closest toxwithx ¡x, see Figure 2 (right). Putymin :minpYkqand define

Mx:min fpxq Lpxxq,fpx q Lpxx q (

, mx:max fpxq Lpxxq,fpx q Lpxx q (

. (7)

The computation of an expression for EIpx|Xkqand its maximisation are established in the following lemmas.

Lemma 4. ExpXkq Ris determined by the evaluations at xand x only: ExpXkq rmx,Mxs.

Lemma 5. HVIpy|Ykq yminy for yPExpXkqzDomrpYkq

minpmx,yminq,ymin

. Lemma 6. EIpx|Xkq pyminmxq2

2pMxmxq if mx ymin, andEIpx|Xkq 0otherwise.

Lemma 7. Define Fx,x pξq:mintfpxqLpξxq,fpx q Lpξx qu. Thenarg maxxPrx,x sEIpx|Xkq txLu, where xLis the location of the unique minimum of Fx,x :

(5)

FIGURE 2.Left: Visualisation of a sampling sequence in the Canonical Shubert’s Algorithm, where x0 a and x1 b. Right: The upper and lower bound for the values of fpxqin between two evaluated points xand x . The set ExpXkqof possible values for fpxqis denoted by the vertical dashed lines. xLis the position of the minimum of the lower bound Fx,x .

Comparison

Letx1₀ x1₁ x1_k₁be the enumeration ofXkin increasing order and puty1_i : fpx1_iq. In Shubert’s Algorithm the next point xk is chosen at a position whereFk1pxqis minimal.Fk1 is the minimum of the functionsFx1_i,x1_i ₁

defined in Lemma 7,iP t0,1, . . . ,k2u. LetxL,ibe thexL-location of the intervalrx1_i,x1_i ₁s. PutyL,i:Fx1_i,x1_i ₁pxL,iq. ThenxkxL,i for indexifor whichyL,i is minimal. Hence,zi :yminyL,i is maximal.

In our EHVI algorithm the next pointxkis chosen where EIpx|Xkqis maximal. According to Lemma 7, xkis one of the pointsxL,i. A computation shows thatMxL,imxL,i 2rminpy1i,y1i 1q yL,is. Thus, Lemma 6 yields

Ei:EIpxL,i|Xkq 1₄ pyminyL,iq

2

minpy1_i,y1_i ₁q yL,i

1 4

z2

i

wi zi, withzi:yminyL,i, wi:minpy 1

i,y1i 1q ymin. (9)

ThenxkequalsxL,iforifor whichEiis maximal. This isnot necessarilyatiwith maximalzi, as in Shubert’s Algorithm. Depending on the valueswi, the EHVI algorithm may select a next point xkdifferent from Shubert’s Algorithm. It remains to be investigated how this phenomenon affects convergence rates to global minimum.

ACKNOWLEDGMENTS

This work was part of H. Otten’s research project for obtaining her Master’s degree in Mathematics at Leiden Univer-sity under supervision of dr. S.C. Hille (Mathematical Institute) and dr. M.T.M. Emmerich (LIACS).

REFERENCES

[1] B. O. Shubert,SIAM Journal on Numerical Analysis9, 379–388 (1972).

[2] A. ˇZilinskas and J. Zilinskas, Communications in Nonlinear Science and Numerical Simulationˇ 21, 89 – 98 (2015), numerical Computations: Theory and Algorithms (NUMTA 2013), International Conference and Summer School.

[3] M. Emmerich, “Single- and multi-objective evolutionary design optimization assisted by gaussian random field metamodels,” Ph.D. thesis, Fachbereich Informatik, Chair of Systems Analysis, University of Dortmund 2005.

[4] M. Emmerich, K. Yang, A. Deutz, H. Wang, and C. M. Fonseca, “A multicriteria generalization of bayesian global optimization,” inAdvances in Stochastic and Deterministic Global Optimization(Springer International Publishing, 2016), pp. 229–242.