3.4 The Spectral Bundle Method
3.4.2 Details of the Spectral Bundle Method
Being a specialisation of the proximal bundle method, described in Section 3.3.4, the spectral bundle method also builds a minorising model, solves the minimisation problem for the augmented model and performs the descent test (3.41) in order to decide between descent step and null step. The differences to the proximal bundle method described in Section 3.3.4 are caused by the specific oracle being used and the choice of the minorising model.
The Oracle and its Computation
The oracle in the spectral bundle method computes for a given trial point $y^k$ the maximum eigenvalue of the matrix $C - A^T y^k$ and a matrix
$$W_S^k \in \operatorname*{Argmax}_{W \in \mathcal{W}} \langle C - A^T y^k, W \rangle. \tag{3.55}$$
From Section 3.4.1, we know that $b - A W_S^k$ is a subgradient of the function (3.49) that should be minimised. Furthermore, we established that a specific $W_S^k$ can always be given as $W_S^k = vv^T$ with $v$ a normalised eigenvector to the maximum eigenvalue of $C - A^T y^k$.
Depending on the size and structure of the matrix in question, in our case $C - A^T y$, the eigenvalue and eigenvector computations can be done by exact or, more typically, by iterative methods such as the power method or the Lanczos method. The power method starts with a randomly chosen normalised vector. It repeatedly multiplies this vector from the left by the matrix to produce a sequence of vectors. After normalisation, this sequence has an accumulation point in the eigenspace of the maximum eigenvalue of the matrix (more precisely, of the eigenvalue of largest absolute value, which is the maximum eigenvalue after a suitable shift of the matrix) if the starting vector has some component in this eigenspace, i.e., if it does not lie completely in the eigenspaces of other eigenvalues. This can be seen by looking at the eigenvalue decomposition of the matrix and observing that components in the direction of the maximal eigenspace will eventually dominate the product.
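The power method and the resulting oracle can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in the spectral bundle method: the operator $A$ is assumed to be given as a list `A_mats` of symmetric matrices $A_i$ (so that $A^T y = \sum_i y_i A_i$ and $(AW)_i = \langle A_i, W \rangle$), the function names are hypothetical, and the matrix is shifted by its Frobenius norm so that the dominant eigenvalue of the shifted matrix is the algebraically largest one.

```python
import numpy as np

def power_method_max_eig(M, iters=2000, tol=1e-12, seed=0):
    """Approximate (lambda_max, v) of a symmetric matrix M by power iteration."""
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    # Assumption: shifting by the Frobenius norm makes all eigenvalues of the
    # shifted matrix nonnegative, so the iteration targets lambda_max(M).
    shift = np.linalg.norm(M, "fro")
    if shift == 0.0:
        return 0.0, v  # zero matrix: every unit vector is an eigenvector
    lam = v @ M @ v
    for _ in range(iters):
        w = M @ v + shift * v          # one matrix-vector multiplication
        v = w / np.linalg.norm(w)
        lam_new = v @ M @ v            # Rayleigh quotient
        converged = abs(lam_new - lam) <= tol * (1.0 + abs(lam_new))
        lam = lam_new
        if converged:
            break
    return lam, v

def oracle(C, A_mats, b, y):
    """Return a function value estimate, a subgradient b - A(vv^T), and v."""
    M = C - sum(yi * Ai for yi, Ai in zip(y, A_mats))
    lam, v = power_method_max_eig(M)
    W_S = np.outer(v, v)
    subgrad = b - np.array([np.sum(Ai * W_S) for Ai in A_mats])
    return lam + b @ y, subgrad, v
```

Because $v$ has unit norm, the returned value and subgradient always define a linear minorant of the objective, even if the power iteration has not fully converged.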
Although it is related to the power method, an explanation of the Lanczos method would be much more involved, so we just refer to [65] for a review of its use in the spectral bundle method and to Golub and van Loan [58] for details. Here, we only mention practical implications given in [65] regarding its performance in the spectral bundle method.
• The method is particularly efficient if matrix-vector multiplications can be performed efficiently, which, for instance, is the case if the matrix is sparse. That is why the spectral bundle method is well suited for our problem if the objective coefficient matrix is sparse, because the underlying graph is sparse, and if we take care to restrict the support of separated inequalities (compare Section 4.3.4). Furthermore, we can use structured constraint matrices like $ff^T$ of the bisection constraint in (3.19).
• In theory, the Lanczos method should stop after at most n matrix-vector multiplications, where n denotes the dimension of the symmetric matrix. Convergence can even be sped up if a starting vector with a large component within the maximal eigenspace is employed, e.g., the eigenvector computed for the last successful trial point. If, however, the maximal eigenvalue is not well separated from the remaining ones, then numerical difficulties lead to much larger iteration numbers than n, and the Lanczos method becomes the bottleneck of the spectral bundle method. This effect can be seen when we are near the optimal solution; it is typical for SDPs and is caused by the facial structure of their feasible set. A way to partially overcome this problem is to use inexact computations in case a null step can be foreseen. This is the case if the approximate eigenvectors of the iterative procedure are, via (3.47), already sufficient to show that the function value of the next trial point will cause the descent test to fail. Then, the Lanczos method can be stopped prematurely, and its current iterate is used to enhance the minorising model of the eigenvalue function.
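The two points above can be illustrated by a minimal Lanczos sketch. It is not the variant discussed in [65]: for simplicity it uses full reorthogonalisation (two Gram-Schmidt passes), which a production code would avoid, and the function name and arguments are hypothetical. The matrix enters only through the callable `matvec`, so sparsity or structure can be exploited there, and `v0` allows a warm start with a previously computed eigenvector.

```python
import numpy as np

def lanczos_max_eig(matvec, v0, steps):
    """Ritz approximation of (lambda_max, v) after at most `steps` Lanczos steps."""
    n = v0.size
    steps = min(steps, n)
    Q = np.zeros((n, steps))           # Lanczos vectors
    alpha, beta = [], []               # diagonal / off-diagonal of T
    q = v0 / np.linalg.norm(v0)
    for j in range(steps):
        Q[:, j] = q
        w = matvec(q)                  # the only access to the matrix
        alpha.append(q @ w)
        # Full reorthogonalisation against all Lanczos vectors, done twice
        for _ in range(2):
            w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        nrm = np.linalg.norm(w)
        if nrm < 1e-10:                # invariant subspace found
            break
        beta.append(nrm)
        q = w / nrm
    m = len(alpha)
    off = beta[: m - 1]
    T = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)
    theta, S = np.linalg.eigh(T)       # small tridiagonal eigenproblem
    ritz_vec = Q[:, :m] @ S[:, -1]
    return theta[-1], ritz_vec / np.linalg.norm(ritz_vec)
```

Stopping early simply means calling the routine with a small `steps`; the returned Ritz value never exceeds the true maximum eigenvalue, so the unit Ritz vector still yields a valid minorant.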
The Cutting Surface Model
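At the end of Section 3.4.1, we noted that a function (3.54) minorising our objective function (3.49) can be achieved by restricting $\mathcal{W}$ to a smaller set $\hat{\mathcal{W}} \subseteq \mathcal{W}$. In iteration k, this set is chosen as
$$\hat{\mathcal{W}}^k = \left\{ P_k V P_k^T + \alpha \overline{W}_k : \operatorname{tr} V + \alpha = 1,\ V \in S^{r_k}_+,\ \alpha \ge 0 \right\}, \tag{3.56}$$
where $P_k$ is an orthonormal $n \times r_k$ matrix and $\overline{W}_k \in \mathcal{W}$ is called the aggregate matrix. The resulting cutting surface model reads
$$f_{\hat{\mathcal{W}}^k}(y) = \max_{W \in \{P_k V P_k^T + \alpha \overline{W}_k :\ \operatorname{tr} V + \alpha = 1,\ V \in S^{r_k}_+,\ \alpha \ge 0\}} \langle C - A^T y, W \rangle + b^T y. \tag{3.57}$$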
The model is called cutting surface model, because it is – in contrast to cutting plane models like the one in the proximal bundle method – not polyhedral, since we use the nonpolyhedral positive semidefinite cone for its definition.
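In [65], it was shown that the value of this model at y is
$$f_{\hat{\mathcal{W}}^k}(y) = \max\left\{ \lambda_{\max}\left(P_k^T (C - A^T y) P_k\right),\ \langle C - A^T y, \overline{W}_k \rangle \right\} + b^T y. \tag{3.58}$$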
What can we learn from this expression? The computationally most expensive term in (3.58) is the maximum eigenvalue. So, we realise that the model value can be efficiently computed if the bundle size rk is kept small, because then, the eigenvalue computation could be performed quickly.
Furthermore, in order to achieve a high model value in the neighbourhood of the current point $y^k$, i.e., a good approximation of the objective function by the minorising model, one should choose $P_k$ so that it spans the eigenspaces of the largest eigenvalues of $C - A^T y^k$, because, by the continuity of eigenvalues and eigenvectors, these large eigenvalues and corresponding eigenvectors will produce a large $v^T (C - A^T y) v$ in the neighbourhood of $y^k$.
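A minimal NumPy sketch of the model value (3.58) makes the cost argument concrete: only an $r_k \times r_k$ eigenvalue problem is solved. The sketch assumes that $\mathcal{W}$ is the set of positive semidefinite matrices with trace one and that $A$ is given as a list `A_mats` of symmetric matrices; all function names are hypothetical.

```python
import numpy as np

def apply_At(A_mats, y):
    """A^T y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A_mats))

def true_value(C, A_mats, b, y):
    """Objective lambda_max(C - A^T y) + b^T y, via a full n x n eigenproblem."""
    return np.linalg.eigvalsh(C - apply_At(A_mats, y))[-1] + b @ y

def model_value(C, A_mats, b, P, W_bar, y):
    """Cutting surface model (3.58) for bundle P and aggregate W_bar."""
    M = C - apply_At(A_mats, y)
    small = P.T @ M @ P                          # only r_k x r_k
    lam_small = np.linalg.eigvalsh(small)[-1]
    aggregate_term = np.sum(M * W_bar)           # <M, W_bar>
    return max(lam_small, aggregate_term) + b @ y
```

If the columns of `P` contain an eigenvector to the maximum eigenvalue of $C - A^T y^k$, the model agrees with the objective at $y^k$ and stays below it everywhere else.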
From Proposition 117, we have learned that $P_k V P_k^T$ corresponds to a small face of the semidefinite cone $S^n_+$ with dimension $\binom{r_k+1}{2}$. Corollary 118 tells us that this face might be too small to contain a matrix $W^* = \sum_{i=1}^{k} \mu_i v^i (v^i)^T$, where the $v^i$ form a basis of the eigenspace of an eigenvector to the optimal $\lambda_{\max}(C - A^T y^*)$. There are two ways out of this trap.
The first way is pursued by the spectral bundle method and uses the term $\alpha \overline{W}_k$. It enables us to reach from the face into the interior of $S^n_+$, while the additional computational costs are negligible.
The second way is to provide enough columns $r_k$ to $P_k$ so that it can span eigenvectors to the largest eigenvalue (compare Corollary 118). A nontrivial lower bound on the number of columns needed is given in Lemma 5.3.9 of [65] as $r_k > \bar{r} + 1$, where $\bar{r} \in \mathbb{N}$ is the largest number satisfying $\binom{\bar{r}+1}{2} \le m + 1$ and m denotes the number of primal constraints. Its proof uses Pataki's bound on the rank of matrices in faces of the feasible set of semidefinite programs as given in Proposition 124. Clearly, the second way is not feasible for us, since the number of primal constraints m will be rather large when we use a branch-and-cut approach.
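To get a feeling for this bound, the following short sketch computes $\bar{r}$ for a given number of constraints m directly from the defining inequality. The function name `r_bar` is hypothetical. For m = 1000, for example, $\binom{45}{2} = 990 \le 1001 < 1035 = \binom{46}{2}$, so $\bar{r} = 44$ and the bound asks for more than 45 columns.

```python
from math import comb

def r_bar(m):
    """Largest r with binom(r + 1, 2) <= m + 1."""
    r = 0
    # Increase r as long as the next candidate r + 1 still satisfies the bound.
    while comb(r + 2, 2) <= m + 1:
        r += 1
    return r
```

Since $\binom{\bar{r}+1}{2}$ grows quadratically, $\bar{r}$ grows roughly like $\sqrt{2m}$, and the quadratic semidefinite subproblem would have to handle a matrix variable of that order.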
Finding the Next Trial Point
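As in the proximal bundle method, we now want to find the next trial point $y^{k+1}$ and an associated matrix $W_S^k$ given the current point $\hat{y}^k$ and the cutting surface model (3.57). Again, we will augment it by a trust region term $\frac{u}{2}\|y - \hat{y}^k\|^2$. Thus, it reads
$$f^k(y) = \max_{W \in \hat{\mathcal{W}}^k} \langle C - A^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.59}$$
The function
$$L^k(y, W) = \langle C - A^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2 \tag{3.60}$$
is called the augmented Lagrangian.
In order to find the next trial point, we have to solve the following problem.
$$\min_{y \in \mathbb{R}^m} f^k(y) = \min_{y \in \mathbb{R}^m} \max_{W \in \hat{\mathcal{W}}^k} \langle C - A^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.61}$$
Its dual is found by exchanging min and max.
$$\max_{W \in \hat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} \langle C - A^T y, W \rangle + b^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.62}$$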
Observe that the inner minimisation problem of (3.62) with respect to y is unconstrained, so we can explicitly compute its optimal solution as
$$y^k_{\min}(W) = \hat{y}^k + \frac{1}{u}(AW - b) = \hat{y}^k - \frac{1}{u}\nabla f_W \tag{3.63}$$
for any fixed $W \in \hat{\mathcal{W}}^k$. Using semidefinite duality [65] or general theorems of convex duality, it can be shown that strong duality holds for problems (3.61) and (3.62). The semidefinite duality approach is particularly helpful, because it exposes a way to determine the optimal W.
Proposition 128 ([65]). Given the augmented Lagrangian $L^k(y, W)$ as in (3.60),
$$\min_{y \in \mathbb{R}^m} \max_{W \in \hat{\mathcal{W}}^k} L^k(y, W) = L^k(y^{k+1}, W^{k+1}) = \max_{W \in \hat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} L^k(y, W), \tag{3.64}$$
where $y^{k+1} = y^k_{\min}(W^{k+1})$ is unique and $W^{k+1}$ is an optimal solution of the quadratic semidefinite problem
$$\begin{array}{ll} \min & \frac{1}{2u}\|b - AW\|^2 - \langle W, C - A^T \hat{y}^k \rangle - b^T \hat{y}^k \\ \text{s.t.} & W = P_k V P_k^T + \alpha \overline{W}_k, \\ & \operatorname{tr} V + \alpha = 1, \\ & V \succeq 0,\ \alpha \ge 0. \end{array} \tag{3.65}$$
Problem (3.65) can be solved efficiently by interior point methods if its dimension controlled by rk is kept small. For a description of the interior point method that is used in the spectral bundle
method, we refer the interested reader to [65].
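The link between (3.60), (3.63) and the objective of (3.65) can be checked numerically. The sketch below is an illustration under assumptions, not the interior point code of [65]: $A$ is again represented by a hypothetical list `A_mats` of symmetric matrices, and `dual_value` is the value obtained by substituting (3.63) into (3.60), which is the negative of the objective in (3.65).

```python
import numpy as np

def A_op(A_mats, W):
    """(A W)_i = <A_i, W>."""
    return np.array([np.sum(Ai * W) for Ai in A_mats])

def At_op(A_mats, y):
    """A^T y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A_mats))

def lagrangian(C, A_mats, b, u, y_hat, y, W):
    """Augmented Lagrangian (3.60)."""
    prox = 0.5 * u * np.sum((y - y_hat) ** 2)
    return np.sum((C - At_op(A_mats, y)) * W) + b @ y + prox

def y_min(A_mats, b, u, y_hat, W):
    """Closed-form minimiser (3.63) of the Lagrangian over y for fixed W."""
    return y_hat + (A_op(A_mats, W) - b) / u

def dual_value(C, A_mats, b, u, y_hat, W):
    """min_y L^k(y, W), i.e. minus the objective of (3.65)."""
    r = b - A_op(A_mats, W)
    return np.sum((C - At_op(A_mats, y_hat)) * W) + b @ y_hat - (r @ r) / (2.0 * u)
```

Maximising `dual_value` over the set (3.56) is exactly the small quadratic semidefinite problem; its variables are only $V$ and $\alpha$, which is why the bundle size $r_k$ controls the cost.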
Remark 129. Using the equivalence of the eigenvalue optimisation problem (3.26) to the dual SDP (3.20), it can be worked out that the optimal solutions $W^{k+1} = P_k V^* P_k^T + \alpha^* \overline{W}_k$ can be interpreted as infeasible approximate solutions to the primal problem (3.22) (compare Theorem 5.3.8 in [65]). For the case of the spectral bundle method with bounds, this fact was proved by Helmberg in [71]. We will use this property in our branch-and-cut approach.
Updating the Cutting Surface Model
Once we have computed the next trial point $y^{k+1}$, the oracle is called to deliver the corresponding $W_S^{k+1} = v^{k+1}(v^{k+1})^T$ with $v^{k+1}$ a normalised eigenvector to $\lambda_{\max}(C - A^T y^{k+1})$. It gives rise to a subgradient and a supporting cutting plane of our objective function (3.45) in $y^{k+1}$, and this information should be incorporated into the cutting surface model. If the maximal number of allowed columns of $P_k$ is not yet reached, we simply orthonormalise the new eigenvector $v^{k+1}$ with
to preserve the most important information of $\hat{\mathcal{W}}^k$. From relation (3.64) of Proposition 128, we can conclude that $W^{k+1}$ and its associated cutting plane $f_{W^{k+1}}$ are important, because they guarantee that the value of the augmented model cannot decrease in a null step. To ensure convergence of the model to the true function near the optimal solution, one has to guarantee that
$$W^{k+1},\ W_S^{k+1} \in \hat{\mathcal{W}}^{k+1} \tag{3.66}$$
(compare the convergence analysis given in [65]). To this end, it would suffice to set $\overline{W}_{k+1} = W^{k+1}$ and $P_{k+1} = v^{k+1}$, i.e., to keep $r_k = 1$. Despite this, it is helpful to use a larger bundle size in order to store a larger subspace spanned by the current columns of $P_k$, as explained on page 82. However,
this has to be balanced against the costs of solving the quadratic semidefinite subproblem (3.65). In any case, a promising subspace to be kept can be determined from the optimal solution $W^{k+1} = P_k V^* P_k^T + \alpha^* \overline{W}_k$ of the quadratic semidefinite subproblem (3.65). If $Q \Lambda Q^T = V^*$ denotes an eigenvalue decomposition of $V^*$ with $Q^T Q = I$, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{r_k})$ and $\lambda_1 \ge \ldots \ge \lambda_{r_k} \ge 0$, then we can write
$$W^{k+1} = (P_k Q) \Lambda (P_k Q)^T + \alpha^* \overline{W}_k = P_k \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} P_k^T + \alpha^* \overline{W}_k \tag{3.67}$$
with $\Lambda_1$ an upper-left principal submatrix of $\Lambda$ corresponding to large eigenvalues, $\Lambda_2$ containing the remaining eigenvalues, and $Q_1$ and $Q_2$ splitting up $Q$ correspondingly. By the ordering of the eigenvalues, $Q_1$ carries more information about $W^{k+1}$ than $Q_2$. This information will be directly conserved in the new bundle $P_{k+1}$, which is computed so that its columns form an orthonormal basis of the space spanned by $P_k Q_1$ and $v^{k+1}$, i.e.,
$$P_{k+1} = \operatorname{orth}[P_k Q_1, v^{k+1}]. \tag{3.68}$$
The maximal possible dimension of $Q_1$ is determined by a parameter $n_K \in \mathbb{N}$. Furthermore, note that the Lanczos method delivers several vectors which are candidates to span a promising new subspace. Therefore, the spectral bundle method may be allowed to include more of them into the new bundle. The maximal number of new vectors to add is given by a parameter $n_A \in \mathbb{N}$. The sum $n_K + n_A$ determines the maximal allowed bundle size. The new aggregate matrix
$$\overline{W}_{k+1} = \frac{(P_k Q_2) \Lambda_2 (P_k Q_2)^T + \alpha^* \overline{W}_k}{\operatorname{tr} \Lambda_2 + \alpha^*} \tag{3.69}$$
keeps the information given by the columns of $Q_2$. It can be checked that $P_{k+1}$ and $\overline{W}_{k+1}$ conform to (3.56) and that (3.66) is fulfilled (compare Proposition 5.2.3 of [65]).
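The update (3.67) to (3.69) can be sketched as follows. This is a simplified illustration with a hypothetical function name: it adds exactly one new vector (i.e. $n_A = 1$), uses a QR factorisation for the `orth` operation, and assumes that $\operatorname{tr} \Lambda_2 + \alpha^* > 0$ and that the new eigenvector does not already lie in the span of $P_k Q_1$.

```python
import numpy as np

def update_bundle(P, V, alpha, W_bar, v_new, n_keep):
    """Return (P_next, W_bar_next) following (3.68) and (3.69)."""
    lam, Q = np.linalg.eigh(V)              # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]           # reorder: lambda_1 >= ... >= lambda_r
    lam, Q = lam[order], Q[:, order]
    Q1, Q2 = Q[:, :n_keep], Q[:, n_keep:]   # kept part / aggregated part
    lam2 = lam[n_keep:]
    # (3.68): orthonormal basis of span[P Q1, v_new] via QR
    basis = np.column_stack([P @ Q1, v_new])
    P_next, _ = np.linalg.qr(basis)
    # (3.69): remaining spectral information goes into the aggregate
    weight = lam2.sum() + alpha             # assumed to be positive
    PQ2 = P @ Q2
    W_bar_next = ((PQ2 * lam2) @ PQ2.T + alpha * W_bar) / weight
    return P_next, W_bar_next
```

With this choice, $W^{k+1}$ equals $(P_k Q_1)\Lambda_1 (P_k Q_1)^T$ plus `weight` times the new aggregate, and the new bundle contains $v^{k+1}$, which is the content of (3.66).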
Stopping the Computations
For lack of a lower bound, we cannot judge whether our current solution is sufficiently close to the optimum. However, we know that the quality of the cutting surface model in the interesting region is improved in every step. Furthermore, the trust region approach guarantees that we seek the next trial point within a ball in which the current model is sufficiently accurate. If this ball is large enough (u small enough), and the model value of the next trial point $f_{W^{k+1}}(y^{k+1})$ is close to the function value $f(\hat{y}^k)$ of the current centre of stability, chances for a reasonable progress are low and we stop. This stopping criterion can be expressed as
$$f(\hat{y}^k) - f_{W^{k+1}}(y^{k+1}) < \epsilon\,(f(\hat{y}^k) + 1). \tag{3.70}$$
Incorporating Variable Bounds
So far, we have explained the spectral bundle method for unbounded variables $y \in \mathbb{R}^m$ corresponding to equality constraints in our primal SDP relaxation. However, we already have one inequality constraint in our primal SDP, namely the bisection constraint, and the branch-and-cut approach will add many more inequality constraints. Therefore, we will need to use the spectral bundle method with bounds as described by Helmberg and Kiwiel in [68]. Following again [65], we will explain the main idea on the problem
$$\min_{y \in \mathbb{R}^m_+} \lambda_{\max}(C - A^T y) + b^T y. \tag{3.71}$$
Denote by $f(y)$ the objective function of (3.71). Using an indicator function $\iota_{\mathbb{R}^m_+}$ of $\mathbb{R}^m_+$ with $\iota_{\mathbb{R}^m_+}(y) = 0$ if $y \in \mathbb{R}^m_+$ and $\iota_{\mathbb{R}^m_+}(y) = \infty$ otherwise, we can rewrite this problem as an unbounded problem with respect to y as
$$\min_{y \in \mathbb{R}^m} f_{\mathbb{R}^m_+}(y) \tag{3.72}$$
with
$$f_{\mathbb{R}^m_+}(y) = \lambda_{\max}(C - A^T y) + b^T y + \iota_{\mathbb{R}^m_+}(y). \tag{3.73}$$
Corresponding to (3.51), we can also find for $f_{\mathbb{R}^m_+}(y)$ linear minorants
$$f_{W,\eta}(y) = \langle C, W \rangle + \langle b - \eta - AW, y \rangle, \tag{3.74}$$
where $\eta \in \mathbb{R}^m_+$ collects the Lagrange multipliers to the nonnegativity constraints for y (or subgradients of $\iota_{\mathbb{R}^m_+}$). Each entry of $\eta$ can also be interpreted as a primal slack variable which transforms the corresponding inequality constraint of the primal SDP into an equality constraint. Similar to (3.50), the objective function can again be reformulated as an optimisation problem, but this time over $\mathcal{W}$ and $\mathbb{R}^m_+$,
$$f_{\mathbb{R}^m_+}(y) = \sup_{W \in \mathcal{W},\ \eta \in \mathbb{R}^m_+} \langle C, W \rangle + \langle b - \eta - AW, y \rangle. \tag{3.75}$$
The subdifferential of $f_{\mathbb{R}^m_+}(y)$ at $y \in \mathbb{R}^m_+$ turns out to be
$$\partial f(y) = \{\nabla f_{W,\eta}(y) : W \in \mathcal{W},\ \eta \in \mathbb{R}^m_+,\ f_{W,\eta}(y) = f(y)\} \tag{3.76}$$
with $\nabla f_{W,\eta}(y) = b - \eta - AW$. Again, we restrict our choice of W to some subset $\hat{\mathcal{W}} \subseteq \mathcal{W}$ to construct a cutting surface model minorising $f_{\mathbb{R}^m_+}$ as
$$f_{\hat{\mathcal{W}},\eta}(y) = \max_{W \in \hat{\mathcal{W}}} f_{W,\eta}(y). \tag{3.77}$$
It is also possible to define the augmented Lagrangian as
$$L^k(y, W, \eta) = \langle C - A^T y, W \rangle + (b - \eta)^T y + \frac{u}{2}\|y - \hat{y}^k\|^2. \tag{3.78}$$
The next trial point $y^{k+1}$ is again found as the optimal solution of
$$\min_{y \in \mathbb{R}^m} \max_{W \in \hat{\mathcal{W}}^k,\ \eta \in \mathbb{R}^m_+} L^k(y, W, \eta) \tag{3.79}$$
or of its dual
$$\max_{W \in \hat{\mathcal{W}}^k,\ \eta \in \mathbb{R}^m_+} \min_{y \in \mathbb{R}^m} L^k(y, W, \eta). \tag{3.80}$$
In the unbounded case, the dual was efficiently solvable, because the inner minimisation over y was unconstrained and the outer maximisation over W could be reformulated as a small quadratic semidefinite problem (3.65) in Proposition 128. The first property still holds here. Hence, corresponding to (3.63), we get
$$y^k_{\min}(W, \eta) = \hat{y}^k + \frac{1}{u}(AW - b + \eta) = \hat{y}^k - \frac{1}{u}\nabla f_{W,\eta}. \tag{3.81}$$
However, the second property does not hold anymore, because $\eta$ introduces m more variables. To steer clear of this difficulty, (3.80) is solved approximately by a series of coordinatewise steps: First, an $\eta^+$ is fixed and
$$\max_{W \in \hat{\mathcal{W}}^k} \min_{y \in \mathbb{R}^m} L^k(y, W, \eta^+) \tag{3.82}$$
is solved by the same interior point method used to solve the subproblem stated in Proposition 128 in order to find an optimal $W^+$. Afterwards, this $W^+$ is kept fixed and
$$\max_{\eta \in \mathbb{R}^m_+} \min_{y \in \mathbb{R}^m} L^k(y, W^+, \eta) \tag{3.83}$$
is solved exactly to find the optimal $\eta^+$. From $W^+$ and $\eta^+$, we can compute $y^+ = y^k_{\min}(W^+, \eta^+)$ using formula (3.81), and $y^+$ is guaranteed to be feasible, because (3.83) is solved exactly. The two steps are iterated until the approximate solution $(y^+, W^+, \eta^+)$ is good enough, e.g., fulfils
$$f_{\hat{\mathcal{W}}^k,\eta^+}(y^+) - f_{W^+,\eta^+}(y^+) \le \kappa_M \left[ f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) \right] \tag{3.84}$$
for some $\kappa_M \in (0, \infty]$. Indeed, if the approximate solution is optimal in (3.79) and (3.80), then the left-hand side of (3.84) is zero (compare Lemma 5.4.3 in [65]). Note that $f_{\hat{\mathcal{W}}^k,\eta^+}(y^+)$ can be computed easily using (3.58) and that the stopping criterion (3.70) can already be checked for any approximate solution $(y^+, W^+, \eta^+)$. If the test is passed, the oracle and the descent test are called with $y^{k+1} = y^+$, $W^{k+1} = W^+$ and $\eta^{k+1} = \eta^+$.
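The exact solution of (3.83) has a simple closed form, which the following sketch illustrates. The formula is derived here from (3.78) and (3.81) and is not quoted from [65] or [68]: substituting (3.81) into (3.78) leaves, for fixed W, the separable concave function $-\eta^T \hat{y}^k - \frac{1}{2u}\|b - AW - \eta\|^2$ of $\eta$, whose maximiser over $\eta \ge 0$ is obtained componentwise. The function names and the argument `AW` (the vector $AW^+$) are hypothetical.

```python
import numpy as np

def eta_step(AW, b, u, y_hat):
    """Maximiser of (3.83) over eta >= 0 for fixed W, where AW = A W."""
    # Unconstrained stationary point is b - AW - u * y_hat; clip at zero.
    return np.maximum(0.0, b - AW - u * y_hat)

def trial_point(AW, b, u, y_hat, eta):
    """Formula (3.81): minimiser over y for fixed W and eta."""
    return y_hat + (AW - b + eta) / u
```

With this $\eta^+$, the trial point is the projection of the unbounded point from (3.63) onto the nonnegative orthant, which shows why $y^+$ is feasible, and $\eta^+$ and $y^+$ are complementary.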
We are now prepared to give an overview of the whole spectral bundle method with bounds in Algorithm 1.
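Algorithm 1 Spectral bundle method with bounds
Input: $y^0 \in \mathbb{R}^m$, $\epsilon \ge 0$, $\kappa \in (0, 1)$, $\kappa_M \in (0, \infty]$, $u > 0$.
1: loop
2:   {Initialisation} Set $k = 0$, $\hat{y}^0 = y^0$, compute $f(y^0)$ and $\hat{\mathcal{W}}^0$, find $\eta^0$ optimal in (3.83) for some fixed $W \in \hat{\mathcal{W}}^0$.
3:   repeat {Trial point finding}
4:     Set $\eta^+ = \eta^k$.
5:     Find $W^+ \in \hat{\mathcal{W}}^k$ optimal in (3.82) for fixed $\eta^+$.
6:     Find $\eta^+ \in \mathbb{R}^m_+$ optimal in (3.83) for fixed $W^+$.
7:     Set $y^+ = y^k_{\min}(W^+, \eta^+)$ (v. (3.81)).
8:     if $f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) < \epsilon\,(f(\hat{y}^k) + 1)$ then
9:       return Precision achieved.
10:    end if
11:  until $f_{\hat{\mathcal{W}}^k,\eta^+}(y^+) - f_{W^+,\eta^+}(y^+) \le \kappa_M \left[ f(\hat{y}^k) - f_{W^+,\eta^+}(y^+) \right]$
12:  Set $y^{k+1} = y^+$, $W^{k+1} = W^+$ and $\eta^{k+1} = \eta^+$.
13:  {Oracle} Find $W_S^{k+1} \in \operatorname*{Argmax}_{W \in \mathcal{W}} \langle C - A^T y^{k+1}, W \rangle$ and $f(y^{k+1})$.
14:  if $f(\hat{y}^k) - f(y^{k+1}) \ge \kappa \left[ f(\hat{y}^k) - f_{W^{k+1},\eta^{k+1}}(y^{k+1}) \right]$ then
15:    {Descent step} Set $\hat{y}^{k+1} = y^{k+1}$ and find $\eta^{k+1}$ optimal in (3.83) for fixed $W_S^{k+1}$.
16:  else
17:    {Null step} Set $\hat{y}^{k+1} = \hat{y}^k$.
18:  end if
19:  {Model updating} Choose $\hat{\mathcal{W}}^{k+1} \supset \{W^{k+1}, W_S^{k+1}\}$ of the form (3.56).
20:  Set $k = k + 1$.
21: end loop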
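The two scalar decisions in lines 8 and 14 of Algorithm 1 can be written down directly. The sketch below only restates these tests with hypothetical function and argument names; `f_hat` stands for $f(\hat{y}^k)$, `model_val` for the minorant value $f_{W^+,\eta^+}(y^+)$ and `f_new` for $f(y^{k+1})$. The right-hand side of the stopping test follows (3.70) as it is printed here.

```python
def precision_reached(f_hat, model_val, eps):
    """Stopping test of line 8: predicted decrease is relatively small."""
    return f_hat - model_val < eps * (f_hat + 1.0)

def is_descent_step(f_hat, f_new, model_val, kappa):
    """Descent test of line 14: the actual decrease reaches at least the
    fraction kappa of the decrease predicted by the model."""
    return f_hat - f_new >= kappa * (f_hat - model_val)
```

For example, with `f_hat = 10.0`, `model_val = 9.0` and `kappa = 0.3`, a new function value of 9.5 leads to a descent step, whereas 9.9 leads to a null step.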
Using the Method in a Branch-and-Cut Approach
The theoretical convergence of the spectral bundle method with bounds in a setting where primal inequalities are given by separation oracles was investigated by Helmberg in [71]. There were three important assumptions. First, the intersection of the primal SDP relaxation with the polytope defined by a finite set of cutting planes had to be strictly feasible. Second, the cutting planes had to be given by maximum violation oracles, i.e., by separation procedures that always return a maximally violated inequality if there are violated inequalities at all. Third, the separation oracle had to be called for each $W^+$ solving (3.82), i.e., after line 5 of Algorithm 1. We build on a practical version of this algorithm pursued by Helmberg in [71]. The main implication of [71] was to be cautious when inequalities were deleted. Separation was only employed after descent steps. From a practitioner's point of view, one hopes that in a branch-and-cut approach branching takes place long before theoretical convergence problems occur.
Chapter 4
Implementation of a Branch-and-Cut Approach
We attempt to solve the minimum bisection problem by a branch-and-cut approach. We have already introduced the basic principles of branch-and-cut and its utilisation to solve graph partitioning problems in Sections 1.4.5 and 2.1. We will use the primal semidefinite relaxation (3.19) and its dual (3.20) introduced in Section 3.2 to derive primal feasible solutions and to compute dual bounds. The dual relaxation is solved in its equivalent form as the maximum eigenvalue minimisation problem (3.26) by the spectral bundle method with bounds explained in Section 3.4. The first section of this chapter will provide an overview of our solution approach. The later sections go into details of the algorithms and the implementation.
4.1 Overview of the Branch-and-Cut Approach
This section gives an overview of our branch-and-cut approach. Section 4.1.1 briefly describes the branch-and-cut framework SCIP, which we use. After some general remarks, we will concentrate on the main solving loop in SCIP which eventually calls the SDP solver to solve SDP relaxations corresponding to nodes of the branch-and-bound tree. The main administrative algorithm of this SDP solver, which is implemented as a plugin to SCIP, is described in Section 4.1.2. For the actual computations, it uses calls to an external solver implementing an interface to the spectral bundle method. This external solver will be the topic of Section 4.1.3.