
3.4 The Spectral Bundle Method

3.4.2 Details of the Spectral Bundle Method

Being a specialisation of the proximal bundle method described in Section 3.3.4, the spectral bundle method also builds a minorising model, solves the minimisation problem for the augmented model and performs the descent test (3.41) in order to decide between descent step and null step. The differences from the proximal bundle method stem from the specific oracle being used and from the choice of the minorising model.

The Oracle and its Computation

The oracle in the spectral bundle method computes, for a given trial point $y^k$, the maximum eigenvalue of the matrix $C - \mathcal{A}^T y^k$ and a matrix

$$W_S^k \in \operatorname*{Argmax}_{W \in \mathcal{W}} \langle C - \mathcal{A}^T y^k, W \rangle. \tag{3.55}$$

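From Section 3.4.1, we know that $b - \mathcal{A}W_S^k$ is a subgradient of the function (3.49) that should be minimised. Furthermore, we established that a specific $W_S^k$ can always be given as $W_S^k = vv^T$ with $v$ a normalised eigenvector to the maximum eigenvalue of $C - \mathcal{A}^T y^k$.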
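The oracle can be illustrated by a small dense sketch. It assumes that the operator $\mathcal{A}$ is given by a list of symmetric matrices $A_i$ with $(\mathcal{A}W)_i = \langle A_i, W \rangle$, that $\mathcal{W}$ contains the positive semidefinite matrices of trace one, and that the objective is $f(y) = \lambda_{\max}(C - \mathcal{A}^T y) + b^T y$ as in (3.71). The function names and the random test data are hypothetical, and an exact eigenvalue decomposition stands in for the iterative methods discussed next.

```python
import numpy as np

def apply_A(A_mats, W):
    """Operator A: symmetric matrix W -> vector of inner products <A_i, W>."""
    return np.array([np.sum(A_i * W) for A_i in A_mats])

def apply_AT(A_mats, y):
    """Adjoint operator A^T: vector y -> sum_i y_i A_i."""
    return sum(y_i * A_i for y_i, A_i in zip(y, A_mats))

def oracle(C, A_mats, b, y):
    """Return f(y), the matrix W_S = v v^T and the subgradient b - A W_S."""
    M = C - apply_AT(A_mats, y)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # normalised eigenvector to lambda_max
    W_S = np.outer(v, v)
    f_val = eigvals[-1] + b @ y
    subgrad = b - apply_A(A_mats, W_S)
    return f_val, W_S, subgrad

# Hypothetical random instance, only for illustration.
rng = np.random.default_rng(0)
n, m = 5, 3
sym = lambda X: (X + X.T) / 2
C = sym(rng.standard_normal((n, n)))
A_mats = [sym(rng.standard_normal((n, n))) for _ in range(m)]
b = rng.standard_normal(m)
y = rng.standard_normal(m)
f_y, W_S, g = oracle(C, A_mats, b, y)
```

The returned vector `g` satisfies the subgradient inequality $f(z) \ge f(y) + g^T(z - y)$ for every $z$, which is what the bundle method needs from the oracle.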

Depending on the size and structure of the matrix in question, in our case $C - \mathcal{A}^T y$, the eigenvalue and eigenvector computations can be done by exact or, more typically, by iterative methods such as the power method or the Lanczos method. The power method starts with a randomly chosen normalised vector. It repeatedly multiplies this vector from the left by the matrix and normalises the result, which produces a series of vectors. This series has an accumulation point in the eigenspace of the maximum eigenvalue of the matrix, provided that this eigenvalue is also the one of largest absolute value (which can be enforced by a shift of the spectrum) and that the starting vector has some component in this eigenspace, i.e., that it does not live completely in the eigenspaces of other eigenvalues. This can be seen by looking at the eigenvalue decomposition of the matrix and observing that components in the direction of the maximal eigenspace will dominate the product eventually.
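The power method described above can be sketched in a few lines. This is a minimal illustration, not the eigenvalue routine used in the spectral bundle method; the function name, the stopping rule and the choice of the shift (the infinity norm of the matrix, which makes the shifted matrix positive semidefinite) are assumptions made for this example.

```python
import numpy as np

def power_method_lambda_max(M, max_iters=10_000, tol=1e-12, seed=0):
    """Approximate the maximum eigenvalue of a symmetric matrix M and an eigenvector."""
    n = M.shape[0]
    # Shift so that all eigenvalues of B are nonnegative; then the maximum
    # eigenvalue of M corresponds to the eigenvalue of B of largest modulus.
    shift = np.linalg.norm(M, ord=np.inf)
    B = M + shift * np.eye(n)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                 # random normalised starting vector
    lam = v @ M @ v                        # Rayleigh quotient
    for _ in range(max_iters):
        w = B @ v
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:                  # v lies in the null space of B
            break
        v = w / norm_w
        lam_new = v @ M @ v
        converged = abs(lam_new - lam) <= tol * max(1.0, abs(lam_new))
        lam = lam_new
        if converged:
            break
    return lam, v

# Without the shift, the iteration on this matrix would be dominated by -4.
M = np.diag([3.0, 1.0, -4.0])
lam_max, v_max = power_method_lambda_max(M)
```

The speed of convergence is governed by the ratio of the two largest shifted eigenvalues, which is one reason why poorly separated eigenvalues slow down iterative eigenvalue methods.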

Although the Lanczos method is related to the power method, explaining it would be much more involved, so we just refer to [65] for a review of its use in the spectral bundle method and to Golub and van Loan [58] for details. Here, we only mention practical implications given in [65] regarding its performance in the spectral bundle method.

• The method is particularly efficient if matrix-vector multiplications can be performed efficiently, which, for instance, is the case if the matrix is sparse. That is why the spectral bundle method is well suited for our problem if the objective coefficient matrix is sparse, because the underlying graph is sparse, and if we take care to restrict the support of separated inequalities (compare Section 4.3.4). Furthermore, we can use structured constraint matrices like $ff^T$ of the bisection constraint in (3.19).

• In theory, the Lanczos method should stop after at most $n$ matrix-vector multiplications, where $n$ denotes the dimension of the symmetric matrix. Convergence can even be sped up if a starting vector with a large component within the maximal eigenspace is employed, e.g., the eigenvector computed for the last successful trial point. If, however, the maximal eigenvalue is not well separated from the remaining ones, then numerical difficulties lead to much larger iteration numbers than $n$, and the Lanczos method becomes the bottleneck of the spectral bundle method. This effect can be seen when we are near the optimal solution; it is typical for SDPs and caused by the facial structure of their feasible set. A way to partially overcome this problem is to use inexact computations in case a null step can be foreseen. This is the case if the approximate eigenvectors of the iterative procedure already suffice, via (3.47), to show that the function value of the next trial point will cause the descent test to fail. Then, the Lanczos method can be stopped prematurely, and its current iterate is used to enhance the minorising model of the eigenvalue function.
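The benefit of sparse and structured matrices for the matrix-vector products can be made concrete. The sketch below assumes, purely for illustration, that $\mathcal{A}^T y$ consists of a diagonal part and a single rank-one term $y_f\, ff^T$; this shape, the function name and the tiny example graph are hypothetical and not taken from (3.19). The point is that $ff^T$ is never formed.

```python
import numpy as np

def matvec_structured(rows, cols, vals, y_diag, y_f, f, v):
    """Compute (C - Diag(y_diag) - y_f * f f^T) v without forming f f^T.

    rows, cols, vals are the coordinate entries of the sparse symmetric
    matrix C with both triangles stored. The cost is O(nnz(C) + n).
    """
    out = np.zeros_like(v, dtype=float)
    np.add.at(out, rows, vals * v[cols])   # sparse product C v
    out -= y_diag * v                      # diagonal part of A^T y
    out -= y_f * f * (f @ v)               # rank-one part: f (f^T v)
    return out

# Tiny example: path graph on 4 nodes with unit edge weights.
rows = np.array([0, 1, 1, 2, 2, 3])
cols = np.array([1, 0, 2, 1, 3, 2])
vals = np.ones(6)
f = np.ones(4)
y_diag = np.array([0.5, -1.0, 2.0, 0.0])
y_f = 0.3
v = np.array([1.0, -2.0, 0.5, 4.0])
result = matvec_structured(rows, cols, vals, y_diag, y_f, f, v)
```

Forming $ff^T$ explicitly would turn every product into a dense $O(n^2)$ operation, whereas the structured version keeps the cost proportional to the number of edges plus the number of nodes.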

The Cutting Surface Model

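At the end of Section 3.4.1, we noted that a function (3.54) minorising our objective function (3.49) can be obtained by restricting $\mathcal{W}$ to a smaller set $\widehat{\mathcal{W}} \subseteq \mathcal{W}$. In iteration $k$, this set is chosen as

$$\widehat{\mathcal{W}}^k = \left\{ P_k V P_k^T + \alpha \overline{W}^k : \operatorname{tr} V + \alpha = 1,\ V \in S^{r_k}_+,\ \alpha \ge 0 \right\}, \tag{3.56}$$

where $P_k$ is an orthonormal $n \times r_k$ matrix and $\overline{W}^k \in \mathcal{W}$ is called the aggregate matrix. The resulting model, the cutting surface model, reads

$$f_{\widehat{\mathcal{W}}^k}(y) = \max_{W \in \widehat{\mathcal{W}}^k} \langle C - \mathcal{A}^T y, W \rangle + b^T y. \tag{3.57}$$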

The model is called cutting surface model, because it is – in contrast to cutting plane models like the one in the proximal bundle method – not polyhedral, since we use the nonpolyhedral positive semidefinite cone for its definition.

In [65], it was shown that the value of this model at y is

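$$f_{\widehat{\mathcal{W}}^k}(y) = \max\left\{ \lambda_{\max}\!\left(P_k^T (C - \mathcal{A}^T y) P_k\right),\ \langle C - \mathcal{A}^T y, \overline{W}^k \rangle \right\} + b^T y. \tag{3.58}$$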

What can we learn from this expression? The computationally most expensive term in (3.58) is the maximum eigenvalue. So, we realise that the model value can be efficiently computed if the bundle size $r_k$ is kept small, because then the eigenvalue computation involves only an $r_k \times r_k$ matrix and can be performed quickly.
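Formula (3.58) translates almost literally into code. The sketch below assumes the same representation of $\mathcal{A}$ by a list of symmetric matrices as before; the function names and the random data are hypothetical. Only a small $r_k \times r_k$ eigenvalue problem has to be solved.

```python
import numpy as np

def apply_AT(A_mats, y):
    """Adjoint operator A^T: vector y -> sum_i y_i A_i."""
    return sum(y_i * A_i for y_i, A_i in zip(y, A_mats))

def model_value(C, A_mats, b, P, W_bar, y):
    """Value of the cutting surface model at y following (3.58)."""
    M = C - apply_AT(A_mats, y)
    lam_small = np.linalg.eigvalsh(P.T @ M @ P)[-1]   # r_k x r_k eigenproblem
    aggregate = np.sum(M * W_bar)                     # <C - A^T y, W_bar>
    return max(lam_small, aggregate) + b @ y

# Hypothetical random instance with bundle size r = 2.
rng = np.random.default_rng(0)
n, m, r = 6, 3, 2
sym = lambda X: (X + X.T) / 2
C = sym(rng.standard_normal((n, n)))
A_mats = [sym(rng.standard_normal((n, n))) for _ in range(m)]
b = rng.standard_normal(m)
y = rng.standard_normal(m)
P, _ = np.linalg.qr(rng.standard_normal((n, r)))      # orthonormal columns
H = rng.standard_normal((n, n))
W_bar = H @ H.T
W_bar /= np.trace(W_bar)                              # PSD with trace one
value = model_value(C, A_mats, b, P, W_bar, y)
```

Since the model minorises the objective, `value` can never exceed $\lambda_{\max}(C - \mathcal{A}^T y) + b^T y$, and both coincide if $P$ spans the whole space.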

Furthermore, in order to achieve a high model value in the neighbourhood of the current point $y^k$, i.e. a good approximation of the objective function by the minorising model, one should choose $P_k$ so that it spans the eigenspaces of the largest eigenvalues of $C - \mathcal{A}^T y^k$, because, by the continuity of eigenvalues and eigenvectors, these large eigenvalues and corresponding eigenvectors will produce a large $v^T (C - \mathcal{A}^T y) v$ in the neighbourhood of $y^k$.

From Proposition 117, we have learned that $P_k V P_k^T$ corresponds to a small face of the semidefinite cone $S^n_+$ with dimension $\binom{r_k+1}{2}$. Corollary 118 tells us that this face might be too small to contain a matrix $W^* = \sum_{i=1}^{k} \mu_i v_i (v_i)^T$, where the $v_i$ form a basis of the eigenspace belonging to the optimal $\lambda_{\max}(C - \mathcal{A}^T y^*)$. There are two ways out of this trap.

The first way is pursued by the spectral bundle method and uses the term $\alpha \overline{W}^k$. It enables us to reach from the face into the interior of $S^n_+$, while the additional computational costs are negligible.

The second way is to provide enough columns $r_k$ to $P_k$ so that it can span eigenvectors to the largest eigenvalue (compare Corollary 118). A nontrivial lower bound on the number of columns needed is given in Lemma 5.3.9 of [65] as $r_k > \bar{r} + 1$, where $\bar{r} \in \mathbb{N}$ is the largest number satisfying $\binom{\bar{r}+1}{2} \le m + 1$ and $m$ denotes the number of primal constraints. Its proof uses Pataki's bound on the rank of matrices in faces of the feasible set of semidefinite programs as given in Proposition 124. Clearly, the second way is not feasible for us, since the number of primal constraints $m$ will be rather large when we use a branch-and-cut approach.
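To get a feeling for how this bound grows with $m$, the number $\bar{r}$ can be computed directly from its definition. The helper name `largest_r_bar` and the sample values of $m$ are hypothetical; the code only evaluates the inequality $\binom{\bar{r}+1}{2} \le m + 1$ stated above.

```python
from math import comb

def largest_r_bar(m):
    """Largest natural number r with binom(r + 1, 2) <= m + 1."""
    r = 0
    # Increase r as long as the next candidate r + 1 still satisfies the bound.
    while comb(r + 2, 2) <= m + 1:
        r += 1
    return r

r_bar_small = largest_r_bar(2)
r_bar_large = largest_r_bar(1000)
```

Because the binomial coefficient grows quadratically, $\bar{r}$ grows like $\sqrt{2m}$, so a thousand constraints already call for a bundle with several dozen columns.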

Finding the Next Trial Point

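As in the proximal bundle method, we now want to find the next trial point $y^{k+1}$ and an associated matrix $W_S^k$ given the current point $\hat{y}^k$ and the cutting surface model (3.57). Again, we will augment it by a trust region term $\frac{u}{2}\|y - \hat{y}^k\|^2$. Thus, it reads

$$f^k(y) = \max_{W \in \widehat{\mathcal{W}}^k} \langle C - \mathcal{A}^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.59}$$

The function

$$L^k(y, W) = \langle C - \mathcal{A}^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2 \tag{3.60}$$

is called the augmented Lagrangian. In order to find the next trial point, we have to solve the following problem.

$$\min_{y \in \mathbb{R}^m} f^k(y) = \min_{y \in \mathbb{R}^m} \max_{W \in \widehat{\mathcal{W}}^k} \langle C - \mathcal{A}^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.61}$$

Its dual is found by exchanging min and max.

$$\max_{W \in \widehat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} \langle C - \mathcal{A}^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.62}$$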

Observe that the inner minimisation problem of (3.62) with respect to y is unconstrained, so we can explicitly compute its optimal solution as

$$y^k_{\min}(W) = \hat{y}^k + \frac{1}{u}(\mathcal{A}W - b) = \hat{y}^k - \frac{1}{u}\nabla f_W \tag{3.63}$$

for any fixed $W \in \widehat{\mathcal{W}}^k$. Using semidefinite duality [65] or general theorems of convex duality, it can be shown that strong duality holds for problems (3.61) and (3.62). The semidefinite duality approach is particularly helpful, because it exposes a way to determine the optimal $W$.

Proposition 128 ([65]). Given the augmented Lagrangian $L^k(y, W)$ as in (3.60),

$$\min_{y \in \mathbb{R}^m} \max_{W \in \widehat{\mathcal{W}}^k} L^k(y, W) = L^k(y^{k+1}, W^{k+1}) = \max_{W \in \widehat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} L^k(y, W), \tag{3.64}$$

where $y^{k+1} = y^k_{\min}(W^{k+1})$ is unique and $W^{k+1}$ is an optimal solution of the quadratic semidefinite problem

$$\begin{aligned} \min\ & \frac{1}{2u}\|b - \mathcal{A}W\|^2 - \langle W, C - \mathcal{A}^T \hat{y}^k \rangle - b^T \hat{y}^k \\ \text{s.t.}\ & W = P_k V P_k^T + \alpha \overline{W}^k, \\ & \operatorname{tr} V + \alpha = 1, \\ & V \succeq 0,\ \alpha \ge 0. \end{aligned} \tag{3.65}$$

Problem (3.65) can be solved efficiently by interior point methods if its dimension, which is controlled by $r_k$, is kept small. For a description of the interior point method that is used in the spectral bundle method, we refer the interested reader to [65].
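The connection between the augmented Lagrangian (3.60), the explicit minimiser (3.63) and the objective of (3.65) can be verified by a short calculation. The following lines are a sketch that uses only the definitions given above and the identity $\langle C - \mathcal{A}^T y, W \rangle + b^T y = \langle C, W \rangle + \langle b - \mathcal{A}W, y \rangle$; they are not quoted from [65].

```latex
% Inner minimisation in (3.62) for fixed W: set the gradient w.r.t. y to zero.
\nabla_y L^k(y, W) = b - \mathcal{A}W + u\,(y - \hat{y}^k) = 0
\quad\Longrightarrow\quad
y^k_{\min}(W) = \hat{y}^k + \tfrac{1}{u}(\mathcal{A}W - b).

% Substituting y^k_{\min}(W) back into L^k:
L^k\bigl(y^k_{\min}(W), W\bigr)
  = \langle C, W \rangle + \langle b - \mathcal{A}W, \hat{y}^k \rangle
    - \tfrac{1}{u}\|b - \mathcal{A}W\|^2 + \tfrac{1}{2u}\|b - \mathcal{A}W\|^2
  = \langle C - \mathcal{A}^T \hat{y}^k, W \rangle + b^T \hat{y}^k
    - \tfrac{1}{2u}\|b - \mathcal{A}W\|^2 .
```

Maximising this concave quadratic function of $W$ over $\widehat{\mathcal{W}}^k$ is the same as minimising its negative, which is exactly the objective of (3.65).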

Remark 129. Using the equivalence of the eigenvalue optimisation problem (3.26) to the dual SDP (3.20), it can be worked out that the optimal solutions $W^{k+1} = P_k V^* P_k^T + \alpha^* \overline{W}^k$ can be interpreted as infeasible approximate solutions to the primal problem (3.22) (compare Theorem 5.3.8 in [65]). For the case of the spectral bundle method with bounds, this fact was proved by Helmberg in [71]. We will use this property in our branch-and-cut approach.

Updating the Cutting Surface Model

Once we have computed the next trial point $y^{k+1}$, the oracle is called to deliver the corresponding $W_S^{k+1} = v^{k+1}(v^{k+1})^T$ with $v^{k+1}$ a normalised eigenvector to $\lambda_{\max}(C - \mathcal{A}^T y^{k+1})$. It gives rise to a subgradient and a supporting cutting plane of our objective function (3.45) in $y^{k+1}$, and this information should be incorporated into the cutting surface model. If the maximal number of allowed columns of $P_k$ is not yet reached, we simply orthonormalise the new eigenvector $v^{k+1}$ with respect to the columns of $P_k$ and append it. Otherwise, the bundle has to be compressed in a way that tries to preserve the most important information of $\widehat{\mathcal{W}}^k$. From relation (3.64) of Proposition 128, we can conclude that $W^{k+1}$ and its associated cutting plane $f_{W^{k+1}}$ are important, because they guarantee that the value of the augmented model cannot decrease in a null step. To ensure convergence of the model to the true function near the optimal solution, one has to guarantee that

$$W^{k+1},\ W_S^{k+1} \in \widehat{\mathcal{W}}^{k+1} \tag{3.66}$$

(compare the convergence analysis given in [65]). To this end, it would suffice to set $\overline{W}^{k+1} = W^{k+1}$ and $P_{k+1} = v^{k+1}$, i.e., to keep $r_k = 1$. Despite this, it is helpful to use a larger bundle size in order to store a larger subspace spanned by the current columns of $P_k$, as explained in the discussion of the cutting surface model above. However, this has to be balanced against the costs of solving the quadratic semidefinite subproblem (3.65). In any case, a promising subspace to be kept can be determined from the optimal solution $W^{k+1} = P_k V^* P_k^T + \alpha^* \overline{W}^k$ of the quadratic semidefinite subproblem (3.65). If $Q \Lambda Q^T = V^*$ denotes an eigenvalue decomposition of $V^*$ with $Q^T Q = I$, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{r_k})$ and $\lambda_1 \ge \ldots \ge \lambda_{r_k} \ge 0$, then we can write

$$W^{k+1} = (P_k Q) \Lambda (P_k Q)^T + \alpha^* \overline{W}^k = P_k \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} P_k^T + \alpha^* \overline{W}^k \tag{3.67}$$

with $\Lambda_1$ an upper-left principal submatrix of $\Lambda$ corresponding to large eigenvalues, $\Lambda_2$ containing the remaining eigenvalues, and $Q_1$ and $Q_2$ splitting up $Q$ correspondingly. By the ordering of the eigenvalues, $Q_1$ carries more information about $W^{k+1}$ than $Q_2$. This information will be directly conserved in the new bundle $P_{k+1}$, which is computed so that its columns form an orthonormal basis of the space spanned by $P_k Q_1$ and $v^{k+1}$, i.e.,

$$P_{k+1} = \operatorname{orth}[P_k Q_1, v^{k+1}]. \tag{3.68}$$

The maximal possible dimension of $Q_1$ is determined by a parameter $n_K \in \mathbb{N}$. Furthermore, note that the Lanczos method delivers several vectors which are candidates to span a promising new subspace. Therefore, the spectral bundle method may be allowed to include more of them into the new bundle. The maximal number of new vectors to add is given by a parameter $n_A \in \mathbb{N}$. The sum $n_K + n_A$ determines the maximal allowed bundle size. The new aggregate matrix

$$\overline{W}^{k+1} = \frac{(P_k Q_2) \Lambda_2 (P_k Q_2)^T + \alpha^* \overline{W}^k}{\operatorname{tr} \Lambda_2 + \alpha^*} \tag{3.69}$$

keeps the information given by the columns of $Q_2$. It can be checked that $P_{k+1}$ and $\overline{W}^{k+1}$ conform to (3.56) and that (3.66) is fulfilled (compare Proposition 5.2.3 of [65]).
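The update rules (3.67) to (3.69) can be sketched with dense linear algebra. The function name `update_bundle`, the parameter `n_keep` (playing the role of $n_K$), the use of a QR factorisation for the orth operation and the fallback for a vanishing denominator are assumptions of this illustration; only one new vector is added, i.e., $n_A = 1$.

```python
import numpy as np

def update_bundle(P, W_bar, V_star, alpha_star, v_new, n_keep):
    """Return (P_new, W_bar_new) following (3.68) and (3.69)."""
    lam, Q = np.linalg.eigh(V_star)            # ascending eigenvalues
    order = np.argsort(lam)[::-1]              # reorder: largest first
    lam, Q = lam[order], Q[:, order]
    Q1, Q2 = Q[:, :n_keep], Q[:, n_keep:]
    lam2 = lam[n_keep:]
    # (3.68): orthonormal basis of span[P Q1, v_new] via QR. This assumes that
    # v_new is not (numerically) contained in the span of P Q1.
    P_new, _ = np.linalg.qr(np.column_stack([P @ Q1, v_new]))
    # (3.69): aggregate the discarded directions together with the old aggregate.
    PQ2 = P @ Q2
    numerator = PQ2 @ np.diag(lam2) @ PQ2.T + alpha_star * W_bar
    denominator = lam2.sum() + alpha_star
    # Assumption: keep the old aggregate if nothing is to be aggregated.
    W_bar_new = numerator / denominator if denominator > 0 else W_bar
    return P_new, W_bar_new

# Hypothetical data with tr V* + alpha* = 1 and tr W_bar = 1.
rng = np.random.default_rng(1)
n, r, n_keep = 6, 3, 2
P, _ = np.linalg.qr(rng.standard_normal((n, r)))
G = rng.standard_normal((r, r))
alpha_star = 0.2
V_star = G @ G.T
V_star *= (1.0 - alpha_star) / np.trace(V_star)
H = rng.standard_normal((n, n))
W_bar = H @ H.T
W_bar /= np.trace(W_bar)
v_new = rng.standard_normal(n)
v_new /= np.linalg.norm(v_new)
P_new, W_bar_new = update_bundle(P, W_bar, V_star, alpha_star, v_new, n_keep)
```

With this choice, $W^{k+1}$ can be written as $(P_k Q_1)\Lambda_1 (P_k Q_1)^T + (\operatorname{tr}\Lambda_2 + \alpha^*)\,\overline{W}^{k+1}$, so it lies in the new model set, and $v^{k+1}(v^{k+1})^T$ does so because $v^{k+1}$ is in the range of $P_{k+1}$.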

Stopping the Computations

For lack of a lower bound, we cannot judge whether our current solution is sufficiently close to the optimum. However, we know that the quality of the cutting surface model in the interesting region is improved in every step. Furthermore, the trust region approach guarantees that we seek the next trial point within a ball in which the current model is sufficiently accurate. If this ball is large enough ($u$ small enough), and the model value of the next trial point $f_{W^{k+1}}(y^{k+1})$ is close to the function value $f(\hat{y}^k)$ of the current centre of stability, chances for reasonable progress are low and we stop. This stopping criterion can be expressed as

$$f(\hat{y}^k) - f_{W^{k+1}}(y^{k+1}) < \epsilon\,\bigl(f(\hat{y}^k) + 1\bigr). \tag{3.70}$$

Incorporating Variable Bounds

So far, we have explained the spectral bundle method for unbounded variables $y \in \mathbb{R}^m$ corresponding to equality constraints in our primal SDP relaxation. However, we already have one inequality constraint in our primal SDP, namely the bisection constraint, and the branch-and-cut approach will add many more inequality constraints. Therefore, we will need to use the spectral bundle method with bounds as described by Helmberg and Kiwiel in [68]. Following again [65], we will explain the main idea on the problem

$$\min_{y \in \mathbb{R}^m_+} \lambda_{\max}(C - \mathcal{A}^T y) + b^T y. \tag{3.71}$$

Denote by $f(y)$ the objective function of (3.71). Using an indicator function $\iota_{\mathbb{R}^m_+}$ of $\mathbb{R}^m_+$ with $\iota_{\mathbb{R}^m_+}(y) = 0$ if $y \in \mathbb{R}^m_+$ and $\iota_{\mathbb{R}^m_+}(y) = \infty$ otherwise, we can rewrite this problem as an unbounded problem with respect to $y$ as

$$\min_{y \in \mathbb{R}^m} f_{\mathbb{R}^m_+}(y) \tag{3.72}$$

with

$$f_{\mathbb{R}^m_+}(y) = \lambda_{\max}(C - \mathcal{A}^T y) + b^T y + \iota_{\mathbb{R}^m_+}(y). \tag{3.73}$$

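Corresponding to (3.51), we can also find for $f_{\mathbb{R}^m_+}(y)$ linear minorants

$$f_{W,\eta}(y) = \langle C, W \rangle + \langle b - \eta - \mathcal{A}W, y \rangle, \tag{3.74}$$

where $\eta \in \mathbb{R}^m_+$ collects the Lagrange multipliers to the nonnegativity constraints for $y$ (or subgradients of $\iota_{\mathbb{R}^m_+}$). Each entry of $\eta$ can also be interpreted as a primal slack variable which transforms the corresponding inequality constraint of the primal SDP into an equality constraint. Similar to (3.50), the objective function can again be reformulated as an optimisation problem, but this time over $\mathcal{W}$ and $\mathbb{R}^m_+$,

$$f_{\mathbb{R}^m_+}(y) = \sup_{W \in \mathcal{W},\ \eta \in \mathbb{R}^m_+} \langle C, W \rangle + \langle b - \eta - \mathcal{A}W, y \rangle. \tag{3.75}$$

The subdifferential of $f_{\mathbb{R}^m_+}(y)$ at $y \in \mathbb{R}^m_+$ turns out to be

$$\partial f(y) = \left\{ \nabla f_{W,\eta}(y) : W \in \mathcal{W},\ \eta \in \mathbb{R}^m_+,\ f_{W,\eta}(y) = f(y) \right\} \tag{3.76}$$

with $\nabla f_{W,\eta}(y) = b - \eta - \mathcal{A}W$. Again, we restrict our choice of $W$ to some subset $\widehat{\mathcal{W}} \subseteq \mathcal{W}$ to construct a cutting surface model minorising $f_{\mathbb{R}^m_+}$ as

$$f_{\widehat{\mathcal{W}},\eta}(y) = \max_{W \in \widehat{\mathcal{W}}} f_{W,\eta}(y). \tag{3.77}$$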

It is also possible to define the augmented Lagrangian as

$$L^k(y, W, \eta) = \langle C - \mathcal{A}^T y, W \rangle + (b - \eta)^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.78}$$

The next trial point $y^{k+1}$ is again found as the optimal solution of

$$\min_{y \in \mathbb{R}^m} \max_{W \in \widehat{\mathcal{W}}^k,\ \eta \in \mathbb{R}^m_+} L^k(y, W, \eta) \tag{3.79}$$

or of its dual

$$\max_{W \in \widehat{\mathcal{W}}^k,\ \eta \in \mathbb{R}^m_+} \min_{y \in \mathbb{R}^m} L^k(y, W, \eta). \tag{3.80}$$

In the unbounded case, the dual was efficiently solvable, because the inner minimisation over $y$ was unconstrained and the outer maximisation over $W$ could be reformulated as a small quadratic semidefinite problem (3.65) in Proposition 128. The first property still holds here. Hence, corresponding to (3.63), we get

$$y^k_{\min}(W, \eta) = \hat{y}^k + \frac{1}{u}(\mathcal{A}W - b + \eta) = \hat{y}^k - \frac{1}{u}\nabla f_{W,\eta}. \tag{3.81}$$

However, the second property does not hold anymore, because $\eta$ introduces $m$ more variables. To steer clear of this difficulty, (3.80) is solved approximately by a series of coordinate-wise steps: First, an $\eta^+$ is fixed and

$$\max_{W \in \widehat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} L^k(y, W, \eta^+) \tag{3.82}$$

is solved by the same interior point method used to solve the subproblem stated in Proposition 128 in order to find an optimal $W^+$. Afterwards, this $W^+$ is kept fixed and

$$\max_{\eta \in \mathbb{R}^m_+} \min_{y \in \mathbb{R}^m} L^k(y, W^+, \eta) \tag{3.83}$$

is solved exactly to find the optimal $\eta^+$. From $W^+$ and $\eta^+$, we can compute $y^+ = y^k_{\min}(W^+, \eta^+)$ using formula (3.81), and $y^+$ is guaranteed to be feasible, because (3.83) is solved exactly. The two steps are iterated until the approximate solution $(y^+, W^+, \eta^+)$ is good enough, e.g., fulfils

$$f_{\widehat{\mathcal{W}}^k,\eta^+}(y^+) - f_{W^+,\eta^+}(y^+) \le \kappa_M \left( f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) \right) \tag{3.84}$$

for some $\kappa_M \in (0, \infty]$. Indeed, if the approximate solution is optimal in (3.79) and (3.80), then the left-hand side of (3.84) is zero (compare Lemma 5.4.3 in [65]). Note that $f_{\widehat{\mathcal{W}}^k,\eta^+}(y^+)$ can be computed easily using (3.58) and that the stopping criterion (3.70) can already be checked for any approximate solution $(y^+, W^+, \eta^+)$. If the test (3.84) is passed, the oracle and the descent test are called with $y^{k+1} = y^+$, $W^{k+1} = W^+$ and $\eta^{k+1} = \eta^+$.
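The exact solution of (3.83) has a simple closed form, which the following sketch derives from (3.78) and (3.81) rather than quoting it from [65] or [68]. Inserting (3.81) into (3.78) gives a concave quadratic in $\eta$ that separates by coordinates, so its maximiser over $\eta \ge 0$ is $\eta^+ = -u \min\{0, \hat{y}^k + \frac{1}{u}(\mathcal{A}W^+ - b)\}$, taken componentwise. The function name and the numbers are hypothetical.

```python
import numpy as np

def eta_step(y_hat, AW, b, u):
    """Solve (3.83) for fixed W, given the vector AW; return (eta_plus, y_plus)."""
    y_unconstrained = y_hat + (AW - b) / u        # formula (3.63), ignoring bounds
    eta_plus = -u * np.minimum(y_unconstrained, 0.0)
    y_plus = y_hat + (AW - b + eta_plus) / u      # formula (3.81)
    return eta_plus, y_plus

# Hypothetical data with m = 3.
y_hat = np.array([0.5, 0.1, 2.0])
AW = np.array([0.2, -1.0, 0.0])
b = np.array([1.0, 0.5, -1.0])
u = 2.0
eta_plus, y_plus = eta_step(y_hat, AW, b, u)
```

In this form, $y^+$ is the projection of the unconstrained minimiser onto the nonnegative orthant, which explains why it is feasible, and $\eta^+$ is positive only in those coordinates where the bound is active.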

We are now prepared to give an overview of the whole spectral bundle method with bounds in Algorithm 1.

Algorithm 1 Spectral bundle method with bounds

Input: $y^0 \in \mathbb{R}^m$, $\epsilon \ge 0$, $\kappa \in (0, 1)$, $\kappa_M \in (0, \infty]$, $u > 0$.

1: {Initialisation} Set $k = 0$, $\hat{y}^0 = y^0$, compute $f(y^0)$ and $\widehat{\mathcal{W}}^0$, find $\eta^0$ optimal in (3.83) for some fixed $W \in \widehat{\mathcal{W}}^0$.

2: loop

3:   {Trial point finding} Set $\eta^+ = \eta^k$.

4:   repeat

5:     Find $W^+ \in \widehat{\mathcal{W}}^k$ optimal in (3.82) for fixed $\eta^+$.

6:     Find $\eta^+ \in \mathbb{R}^m_+$ optimal in (3.83) for fixed $W^+$.

7:     Set $y^+ = y^k_{\min}(W^+, \eta^+)$ (see (3.81)).

8:     if $f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) < \epsilon\,(f(\hat{y}^k) + 1)$ then

9:       return Precision achieved.

10:     end if

11:   until $f_{\widehat{\mathcal{W}}^k,\eta^+}(y^+) - f_{W^+,\eta^+}(y^+) \le \kappa_M \left( f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) \right)$

12:   Set $y^{k+1} = y^+$, $W^{k+1} = W^+$ and $\eta^{k+1} = \eta^+$.

13:   {Oracle} Find $W_S^{k+1} \in \operatorname*{Argmax}_{W \in \mathcal{W}} \langle C - \mathcal{A}^T y^{k+1}, W \rangle$ and $f(y^{k+1})$.

14:   if $f(\hat{y}^k) - f(y^{k+1}) \ge \kappa \left( f(\hat{y}^k) - f_{W^{k+1},\eta^{k+1}}(y^{k+1}) \right)$ then

15:     {Descent step} Set $\hat{y}^{k+1} = y^{k+1}$ and find $\eta^{k+1}$ optimal in (3.83) for fixed $W_S^{k+1}$.

16:   else

17:     {Null step} Set $\hat{y}^{k+1} = \hat{y}^k$.

18:   end if

19:   {Model updating} Choose $\widehat{\mathcal{W}}^{k+1} \supset \{W^{k+1}, W_S^{k+1}\}$ of the form (3.56).

20:   Set $k = k + 1$.

21: end loop
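The two scalar tests of Algorithm 1, the stopping criterion (3.70) in line 8 and the descent test in line 14, can be written as small predicates. The function names and the sample numbers are hypothetical; the formulas follow the text literally.

```python
def precision_reached(f_center, model_val, eps):
    """Stopping criterion (3.70): predicted decrease is small relative to f.

    Note: some implementations use abs(f_center) on the right-hand side to
    cope with negative function values; that variant is not taken from the text.
    """
    return f_center - model_val < eps * (f_center + 1.0)

def is_descent_step(f_center, f_trial, model_val, kappa):
    """Descent test: actual decrease is at least kappa times the predicted decrease."""
    return f_center - f_trial >= kappa * (f_center - model_val)

# Hypothetical values: centre value 10, model predicts 8 at the trial point.
f_center, model_val, kappa, eps = 10.0, 8.0, 0.1, 1e-3
good_trial = is_descent_step(f_center, 9.5, model_val, kappa)   # decrease 0.5
poor_trial = is_descent_step(f_center, 9.9, model_val, kappa)   # decrease 0.1
stop_now = precision_reached(f_center, model_val, eps)
stop_later = precision_reached(f_center, 9.995, eps)
```

A trial point that realises at least the fraction $\kappa$ of the decrease promised by the model becomes the new centre of stability; otherwise only the model is improved.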

Using the Method in a Branch-and-Cut Approach

The theoretical convergence of the spectral bundle method with bounds in a setting where primal inequalities are given by separation oracles was investigated by Helmberg in [71]. There were three important assumptions. First, the intersection of the primal SDP relaxation with the polytope defined by a finite set of cutting planes had to be strictly feasible. Second, the cutting planes had to be given by maximum violation oracles, i.e., by separation procedures that always return a maximally violated inequality if there are violated inequalities at all. Third, the separation oracle had to be called for each $W^+$ solving (3.82), i.e., after line 5 of Algorithm 1. We build on a practical version of this algorithm pursued by Helmberg in [71]. The main implication of [71] was to be cautious when inequalities were deleted. Separation was only employed after descent steps. From a practitioner's point of view, one hopes that in a branch-and-cut approach branching takes place long before theoretical convergence problems occur.

Chapter 4

Implementation of a Branch-and-Cut Approach

We attempt to solve the minimum bisection problem by a branch-and-cut approach. We have already introduced the basic principles of branch-and-cut and its utilisation to solve graph partitioning problems in Sections 1.4.5 and 2.1. We will use the primal semidefinite relaxation (3.19) and its dual (3.20) introduced in Section 3.2 to derive primal feasible solutions and to compute dual bounds. The dual relaxation is solved in its equivalent form as the maximum eigenvalue minimisation problem (3.26) by the spectral bundle method with bounds explained in Section 3.4. The first section of this chapter will provide an overview of our solution approach. The later sections go into the details of the algorithms and the implementation.

4.1 Overview of the Branch-and-Cut Approach

This section gives an overview of our branch-and-cut approach. Section 4.1.1 briefly describes the branch-and-cut framework SCIP, which we use. After some general remarks, we will concentrate on the main solving loop in SCIP which eventually calls the SDP solver to solve SDP relaxations corresponding to nodes of the branch-and-bound tree. The main administrative algorithm of this SDP solver, which is implemented as a plugin to SCIP, is described in Section 4.1.2. For the actual computations, it uses calls to an external solver implementing an interface to the spectral bundle method. This external solver will be the topic of Section 4.1.3.

In document Branch-and-Cut for a Semidefinite Relaxation of Large-scale Minimum Bisection Problems (Page 114-123)