3.3 Solution Methods for SDPs
This system has $n(n+1)/2$ equations in $m + p + n + n(n+1)/2$ variables. The variables $z$ and $L_0$ can be defined as functions of the variables $y$, $u$ and $w$. This fact can be exploited to rewrite (3.38) as the nonlinear program
\[
\min\ \langle b, y\rangle + \langle h, u\rangle + \langle d, z(y, w, u)\rangle - 2\mu \sum_{i=1}^{n} \log w_i - \mu \sum_{j=1}^{p} \log u_j \quad \text{s.t.}\quad (y, u, w) \in \mathbb{R}^m \times \mathbb{R}^p_{++} \times \mathbb{R}^n_{++}. \tag{3.40}
\]
Burer et al. solve this problem with a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach using a Wolfe-Powell line search [16]. They show that the sparsity of the primal objective coefficient matrix and of the primal constraint matrices can be exploited to evaluate the objective function of (3.40) and its gradient efficiently. The gradient computations involve matrix multiplications with a matrix $X(w, y, u)$ that is dense in general and serves as an infeasible approximation to primal feasible matrices of (3.36). Nevertheless, Burer et al. were able to find a sparse analogue $\hat X$ of $X$ that lives only on the nonzero components of the primal objective and coefficient matrices.
In [23, 70, 98], the gradient-based log-barrier approach was compared to the spectral bundle method and several other methods. The result was that large-scale semidefinite programs arising from combinatorial optimisation could only be solved by these two methods. Between them, no clear winner emerged, since both methods have advantages and disadvantages. However, the gradient-based log-barrier approach seemed to be ahead when the number of constraints is large. Note that Burer et al. did not specify whether an efficient restart is possible when the primal problem is modified by additional constraints.
3.3.4 The Bundle Method
An outline of the history and development of bundle methods, which reaches back to the 1970s, can be found in the textbook by Hiriart-Urruty and Lemaréchal [75]. We start by explaining the idea of Kiwiel's proximal bundle method [87], of which both the bundle method as used in combinatorial optimisation [46, 113, 125] and the spectral bundle method are specialisations. We finish this section by sketching the bundle method as used in [46, 113, 125]. The spectral bundle method is the topic of Section 3.4.
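The proximal bundle method is designed to minimise a nonsmooth convex function $f : \mathbb{R}^m \to \mathbb{R}$. We assume that this function is given by a first-order oracle that delivers, for a given point $\bar y$, the function value $f(\bar y)$ and a subgradient $\bar g \in \partial f(\bar y)$. The subgradients satisfy the subgradient inequality
\[
f(y) \ge f(\bar y) + \langle \bar g, y - \bar y\rangle \quad \forall y \in \mathbb{R}^m.
\]
Suppose the oracle has been evaluated at the points $y^1, \dots, y^k$, yielding subgradients $g^i \in \partial f(y^i)$. Then a cutting plane model $\hat f_k$ minorising $f$ on $\mathbb{R}^m$ can be written as
\[
\hat f_k(y) = \max_{i=1,\dots,k} \left\{ f(y^i) + \langle g^i, y - y^i\rangle \right\}.
\]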
This model will be of reasonable quality only in the neighbourhoods of the $y^i$. That is why we use the proximal point idea: we restrict the search for the next trial point $y^{k+1}$ to a neighbourhood of the last successful iterate $\hat y^k$, called the centre of stability, by minimising the augmented model
\[
f_k(y) = \hat f_k(y) + \frac{u}{2} \left\| y - \hat y^k \right\|^2
\]
with a weight $u > 0$. Figure 3.1 gives an overview of the situation reached so far.
Figure 3.1: The function $f$, two supporting hyperplanes making up the model $\hat f$, and the augmented model $f_k$.
We accept the minimiser $y^{k+1}$ of $f_k$ as the new centre of stability $\hat y^{k+1}$ in a so-called descent step only if it passes the following descent test for a given parameter $\kappa \in (0, 1)$:
\[
f(\hat y^k) - f(y^{k+1}) \ge \kappa \left[ f(\hat y^k) - \hat f_k(y^{k+1}) \right]. \tag{3.41}
\]
Since $\hat f_k$ minorises $f$, the decrease $f(\hat y^k) - \hat f_k(y^{k+1})$ predicted by the cutting plane model is never smaller than the real decrease $f(\hat y^k) - f(y^{k+1})$ and will overestimate it most of the time. Therefore, we demand that the model predicts the real decrease reasonably well. The test is particularly hard to pass for large values of $\kappa$. If the test is not passed, the model is not yet trustworthy, and we instead use the new candidate $y^{k+1}$ together with its subgradient $g^{k+1}$ to improve the model, i.e., we make a so-called null step by setting $\hat y^{k+1} = \hat y^k$ and adding the new hyperplane $f(y^{k+1}) + \langle g^{k+1}, y - y^{k+1}\rangle$, supporting $f$ at $y^{k+1}$, to the model. The whole process stops when the model predicts a relative progress below a prespecified $\varepsilon > 0$. Some remarks are in order.
• The term bundle originates from the method's ability to collect information about the function in the bundle of supporting hyperplanes.
• The choice of $u$ is a tricky business. On the one hand, a neighbourhood that is too small ($u$ too large) is obstructive if we are at a point where good progress is still possible. This would be indicated by a series of descent steps. On the other hand, a neighbourhood that is too large ($u$ too small) does not help at all if it results in a long series of null steps, caused by trial points that are too far away from the current centre and whose cutting planes cannot improve the model value in the neighbourhood of the current centre. A good range for the choice of $u$ depends on the type of problem at hand. To overcome these difficulties, $u$ can be updated. For possible strategies see, e.g., [87].
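The loop described above (trial point, descent test, null step, stopping rule) can be sketched in Python for a one-dimensional toy problem. This is only an illustration under stated assumptions, not Kiwiel's implementation: the function names, the toy function $f(y) = |y - 1| + 0.5\,|y + 2|$, the fixed weight `u`, and the concrete form of the relative stopping test are all hypothetical choices. The augmented model is minimised by a ternary search, which works only for $m = 1$ and stands in for the quadratic subproblem of the general method.

```python
def oracle(y):
    """First-order oracle for the toy function f(y) = |y - 1| + 0.5*|y + 2|."""
    value = abs(y - 1.0) + 0.5 * abs(y + 2.0)
    # Any element of the subdifferential is admissible; at a kink we pick one.
    g = (1.0 if y >= 1.0 else -1.0) + 0.5 * (1.0 if y >= -2.0 else -1.0)
    return value, g


def cutting_plane(bundle, y):
    """Cutting plane model: max_i f(y_i) + g_i * (y - y_i)."""
    return max(fi + gi * (y - yi) for yi, fi, gi in bundle)


def minimise_augmented(bundle, centre, u, iters=200):
    """Minimise model(y) + u/2 * (y - centre)^2 by ternary search (1-D only)."""
    # Optimality gives u * (centre - y) in the subdifferential of the model,
    # hence |y - centre| <= max_i |g_i| / u, which brackets the minimiser.
    radius = max(abs(gi) for _, _, gi in bundle) / u
    lo, hi = centre - radius, centre + radius

    def augmented(y):
        return cutting_plane(bundle, y) + 0.5 * u * (y - centre) ** 2

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if augmented(a) <= augmented(b):
            hi = b  # convexity: a minimiser lies in [lo, b]
        else:
            lo = a  # convexity: a minimiser lies in [a, hi]
    return 0.5 * (lo + hi)


def proximal_bundle(y0, u=1.0, kappa=0.1, eps=1e-8, max_iter=100):
    """Return the final centre of stability and its function value."""
    centre = y0
    f_centre, g = oracle(centre)
    bundle = [(centre, f_centre, g)]  # triples (point, value, subgradient)
    for _ in range(max_iter):
        trial = minimise_augmented(bundle, centre, u)
        predicted = f_centre - cutting_plane(bundle, trial)
        # Assumed form of the "relative progress" stopping rule.
        if predicted <= eps * (abs(f_centre) + 1.0):
            break
        f_trial, g_trial = oracle(trial)
        bundle.append((trial, f_trial, g_trial))
        if f_centre - f_trial >= kappa * predicted:
            centre, f_centre = trial, f_trial  # descent step
        # Otherwise: null step, the centre stays and only the model improves.
    return centre, f_centre
```

Calling `proximal_bundle(5.0)` moves the centre of stability towards the minimiser $y = 1$ of the toy function, where $f(1) = 1.5$. The sketch never discards bundle elements and keeps `u` fixed; practical codes do both differently.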
To use the (proximal) bundle method for the computation of dual bounds of primal SDP relaxations, we have to specify the function $f$, how it can be evaluated, and how a minorising model can be constructed. The explanation is most convenient if we consider a primal maximisation problem with two sets of equality constraints, $\mathcal{A}X = a$ and $\mathcal{B}X = b$:
\[
\max\ \langle C, X\rangle \quad \text{s.t.}\quad \mathcal{A}X = a,\quad \mathcal{B}X = b,\quad X \succeq 0. \tag{3.42}
\]
We assume that this programme can be solved efficiently without the constraints $\mathcal{A}X = a$, e.g., by some interior point code, but that adding $\mathcal{A}X = a$ makes it hard to solve. Lifting the difficult equality constraints $\mathcal{A}X = a$ with a Lagrange multiplier $y \in \mathbb{R}^m$ into the objective function, we obtain the Lagrangian
\[
L(X, y) = \langle C, X\rangle + y^T (a - \mathcal{A}X). \tag{3.43}
\]
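Assuming that $\mathcal{X} := \{X \succeq 0 : \mathcal{B}X = b\}$ is nonempty and bounded, we introduce the dual functional
\[
f(y) = \max_{X \in \mathcal{X}} L(X, y) = a^T y + \max_{X \in \mathcal{X}} \left\langle C - \mathcal{A}^T y, X \right\rangle, \tag{3.44}
\]
which should be minimised over $y \in \mathbb{R}^m$ to get an upper bound on the optimal objective value of (3.42). This means that we have to solve
\[
\min_{y \in \mathbb{R}^m} f(y) = \min_{y \in \mathbb{R}^m} \max_{X \in \mathcal{X}} L(X, y) = \max_{X \in \mathcal{X}} \min_{y \in \mathbb{R}^m} L(X, y),
\]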
where the last equality follows from general results of convex analysis using the boundedness of $\mathcal{X}$ (Corollary 37.3.2 in [114]). From (3.44), we see that the evaluation of $f$ for some fixed $\hat y$ amounts to solving a semidefinite programme including only the easy constraints $\mathcal{B}X = b$. The result of this evaluation is the function value $f(\hat y)$ and the maximiser $\hat X$. Together, they form a so-called matching pair $(\hat X, \hat y)$ with $f(\hat y) = L(\hat X, \hat y)$. Given such a pair, we deduce from (3.43) that a subgradient $g(\hat y)$ is given by $a - \mathcal{A}\hat X$, i.e.,
\[
f(y) \ge f(\hat y) + \left\langle a - \mathcal{A}\hat X, y - \hat y \right\rangle .
\]
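The deduction from (3.43) can be spelled out in three lines. It uses only the definitions above: $\hat X \in \mathcal{X}$ is feasible, though not necessarily optimal, for the maximisation defining $f(y)$, and $L(\hat X, \cdot)$ is affine in $y$.

\[
\begin{aligned}
f(y) = \max_{X \in \mathcal{X}} L(X, y) &\ge L(\hat X, y) \\
&= \langle C, \hat X\rangle + \hat y^T (a - \mathcal{A}\hat X) + (y - \hat y)^T (a - \mathcal{A}\hat X) \\
&= f(\hat y) + \left\langle a - \mathcal{A}\hat X, y - \hat y \right\rangle .
\end{aligned}
\]

The last step uses $f(\hat y) = L(\hat X, \hat y)$, which is exactly the matching pair property.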
The bundle consists of the matching pairs collected so far,
\[
\left\{ (\hat X^1, \hat y^1), \dots, (\hat X^k, \hat y^k) \right\}.
\]
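Initially, this set consists of just one matching pair. Since
\[
\max_{i=1,\dots,k} \left\{ f(\hat y^i) + \left\langle g(\hat y^i), y - \hat y^i \right\rangle \right\} = \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ \sum_{i=1}^{k} \lambda_i \left( f(\hat y^i) + \left\langle g(\hat y^i), y - \hat y^i \right\rangle \right),
\]
the model $\hat f_k(y)$ minorising $f(y)$ on $\mathbb{R}^m$ can be written as
\[
\hat f_k(y) = \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ \sum_{i=1}^{k} \left( \lambda_i f(\hat y^i) + \lambda_i \left\langle g(\hat y^i), y - \hat y^i \right\rangle \right).
\]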
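Using the identities $f(\hat y^i) = L(\hat X^i, \hat y^i)$ for $i = 1, \dots, k$ and the definition of $L(X, y)$, we can equivalently write
\[
\hat f_k(y) = \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ \sum_{i=1}^{k} \left( \lambda_i \left\langle C, \hat X^i \right\rangle + \lambda_i \left\langle g(\hat y^i), y \right\rangle \right) = \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ F^T\lambda + y^T G \lambda,
\]
where $F := \left( \langle C, \hat X^1\rangle, \dots, \langle C, \hat X^k\rangle \right)^T$ and $G := \left( g(\hat y^1), \dots, g(\hat y^k) \right)$.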
Again, we use the proximal point idea and minimise the augmented model, i.e.,
\[
\min_{y \in \mathbb{R}^m}\ \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ F^T\lambda + y^T G\lambda + \frac{u}{2} \left\| y - \hat y^k \right\|^2 = \max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ \min_{y \in \mathbb{R}^m}\ F^T\lambda + y^T G\lambda + \frac{u}{2} \left\| y - \hat y^k \right\|^2,
\]
where the equality follows once more from general results of convex analysis using the compactness of the set $\{\lambda \in \mathbb{R}^k_+ : e^T\lambda = 1\}$ (Corollary 37.3.2 in [114]). The latter problem is solved by observing that the inner minimisation is an unconstrained quadratic problem with respect to $y$; so, setting the partial derivatives with respect to $y$ equal to zero, the minimiser $\hat y$ can be determined explicitly and plugged into the dual in order to yield a convex quadratic problem over $\{\lambda \in \mathbb{R}^k_+ : e^T\lambda = 1\}$. This problem can be solved efficiently by interior point methods; it is interesting to observe that it is less expensive than the semidefinite programme required for the function evaluation with its determination of $\hat X$.
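For concreteness, the elimination of $y$ can be carried out explicitly. The following is a sketch derived only from the augmented model above; the notation $y(\lambda)$ for the inner minimiser is introduced here for illustration. Setting the gradient of the inner objective with respect to $y$ to zero gives

\[
G\lambda + u \left( y - \hat y^k \right) = 0 \quad\Longrightarrow\quad y(\lambda) = \hat y^k - \frac{1}{u}\, G\lambda ,
\]

and substituting $y(\lambda)$ back yields the problem in $\lambda$ alone:

\[
\max_{\lambda \in \mathbb{R}^k_+,\ e^T\lambda = 1}\ F^T\lambda + (\hat y^k)^T G\lambda - \frac{1}{2u} \left\| G\lambda \right\|^2 .
\]

The objective is concave and quadratic in $\lambda$, and the problem has only $k$ variables, one per bundle element. For an optimal $\lambda^*$, the new trial point is $y(\lambda^*)$, i.e., the current centre moved along a convex combination of the collected subgradients, scaled by $1/u$.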
Note that the use of primal inequality constraints is possible and realised in the same fashion as in the case of the spectral bundle method. Therefore, we defer this topic to later.
Dynamic bundle methods of the kind explained above, but with possible inequality constraints, were used by Fischer, Gruber, Rendl and Sotirov in [46] and by Rendl, Rinaldi and Wiegele in [113, 125]. The term 'dynamic' refers to the scheme of dynamically adding and deleting primal inequalities during the course of the optimisation. Such an approach was required by the large number of possible constraints, e.g., $4\binom{n}{3}$ triangle inequalities in the case of the maximum cut problem. Inequalities to be added to the problem were separated with respect to a convex combination of the primal points $\hat X^{k-1}$ and $\hat X^k$, while inequalities to be deleted from the problem were detected by a small value of the corresponding Lagrange multiplier $\hat y_i$ compared to the maximal Lagrange multiplier in the problem [46]. A theoretical justification, from a convergence point of view, of the dynamic deletion and addition of constraints was given by Helmberg for the spectral bundle method [71]; it can be transferred to the dynamic bundle method, because the latter can be viewed as a restricted version of the former.
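The deletion rule can be illustrated by a few lines of Python. This is a hypothetical sketch, not the code of [46]: the function name, the relative threshold `rel_tol` and its default value are assumptions made here, since the text only states that a multiplier is compared to the maximal one.

```python
def constraints_to_delete(multipliers, rel_tol=1e-3):
    """Indices of inequalities whose multiplier is small relative to the largest.

    `multipliers` holds the nonnegative Lagrange multipliers of the
    inequalities currently in the problem; `rel_tol` is an assumed threshold.
    """
    largest = max(multipliers, default=0.0)
    if largest <= 0.0:
        # No multiplier is positive, so "small relative to the largest"
        # is not meaningful; this sketch then deletes nothing.
        return []
    return [i for i, y in enumerate(multipliers) if y < rel_tol * largest]
```

For example, with multipliers `[0.5, 1e-6, 0.2, 0.0]` the second and fourth inequalities fall below the threshold `1e-3 * 0.5` and would be marked for deletion.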
Fischer et al. computed relaxations of the maximum cut problem and the $k$-equipartition problem. Dimensions of problem instances solved reached up to $n = 2000$ and $m = 27000$ for maximum cut and $n = 500$ and $m = 83000$ for $k$-equipartition. The practical limits of the approach are clearly given by the necessity to solve a semidefinite programme with an $n \times n$ matrix variable for the function evaluation. Rendl, Rinaldi and Wiegele [113, 125] applied the dynamic bundle method as the bounding procedure within a branch-and-cut framework to solve maximum cut relaxations of sizes up to $n = 400$ to (near) optimality. Of particular interest to us are results obtained on the equicut problem for a set of test graphs known as the Johnson graphs [77], where Rendl et al. were able to prove optimality of known primal solutions for the first time. We will come back to these results in our numerical evaluation.
It should be expected that the bundle method cannot solve problems as large as those handled by the spectral bundle method, because the former is restricted by the size of the semidefinite programmes that have to be solved for the function evaluation. However, the bundle method works on a stronger oracle, because fewer constraints are relaxed; so when both methods are applicable, the dual bounds of the bundle method should, in general, be better.