
4.2 Primal Heuristics

4.2.1 Constructing Partitions

In Remark 129, we stated that the spectral bundle method provides us with approximations to solutions $X$ of the primal SDP relaxation (3.19), which are in general infeasible, via

$$W_{k+1} = P_k V_* P_k^T + \alpha_* \bar{W}_k. \qquad (4.3)$$

Here, $V_*$ is a positive semidefinite matrix, so its eigenvalue decomposition establishes the existence of a symmetric matrix $V_*^{1/2}$ with $V_* = V_*^{1/2} V_*^{1/2}$, which is called the square root of $V_*$.

With $P = P_k V_*^{1/2}$ and $\bar{W} = \alpha_* \bar{W}_k$, we write our primal approximations as

$$X \approx \tilde{X} := P P^T + \bar{W}. \qquad (4.4)$$

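The diagonal entries of $\tilde{X}$ may differ from one, and its off-diagonal entries may lie outside $[-1, 1]$. Both properties would hold if $\operatorname{diag}(\tilde{X}) = e$ and $\tilde{X} \succeq 0$, since then $|\tilde{X}_{ij}| \le \sqrt{\tilde{X}_{ii}\tilde{X}_{jj}} = 1$. In order to use $\tilde{X}$ for separation or in primal heuristics, we modify it slightly by the following three approaches given by Helmberg in [71]:

$$\tilde{X}_{ij} := \frac{\tilde{X}_{ij}}{\max\{\tilde{X}_{vv} : v \in V\}} \qquad (4.5)$$

$$\tilde{X}_{ij} := \frac{\tilde{X}_{ij}}{\sqrt{\tilde{X}_{ii}\tilde{X}_{jj}}} \qquad (4.6)$$

$$\tilde{X}_{ij} := \max\{-1, \min\{\tilde{X}_{ij}, 1\}\} \qquad (4.7)$$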
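All three modifications are entrywise operations. The following NumPy sketch applies each of them to a small stand-in for $\tilde{X}$; the function names and the stand-in matrix (a random $PP^T$ plus $0.1 I$ in place of $\bar{W}$) are hypothetical and serve only as illustration.

```python
import numpy as np

def scale_by_max_diagonal(X):
    """(4.5): divide all entries by the largest diagonal entry."""
    return X / np.max(np.diag(X))

def scale_by_diagonal(X):
    """(4.6): divide X_ij by sqrt(X_ii * X_jj); assumes a positive diagonal."""
    d = np.sqrt(np.diag(X))
    return X / np.outer(d, d)

def clip_to_unit_interval(X):
    """(4.7): clip all entries into [-1, 1]."""
    return np.clip(X, -1.0, 1.0)

# Hypothetical stand-in for X~ = P P^T + W-bar, with W-bar replaced by 0.1 * I.
rng = np.random.default_rng(0)
P = rng.standard_normal((5, 2))
X_tilde = P @ P.T + 0.1 * np.eye(5)

X45 = scale_by_max_diagonal(X_tilde)
X46 = scale_by_diagonal(X_tilde)
X47 = clip_to_unit_interval(X_tilde)
```

Since this stand-in is positive semidefinite, (4.5) and (4.6) already bring all entries into $[-1, 1]$, and only (4.6) also produces a unit diagonal; (4.7) does not rely on positive semidefiniteness at all.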

In this section, we use the matrices $\tilde{X}$ and $P$ to derive bisections.

Initial Heuristic Rounding

This heuristic uses only the matrix $\tilde{X}$. Once $\tilde{X}$ has been modified appropriately (see (4.5), (4.6) and (4.7)), its off-diagonal entries $\tilde{X}_{ij}$ lie in $[-1, 1]$. A value $\tilde{X}_{ij} = -1$ indicates that nodes $i$ and $j$ should be placed in different clusters, while $\tilde{X}_{ij} = 1$ suggests putting nodes $i$ and $j$ into the same cluster.

The heuristic Rounding, displayed in Algorithm 2, builds on this intuition. Furthermore, we exploit that the support $\bar{E}$ is connected, which holds because the problem always contains the bisection constraint of the LP relaxation on the spanning star (see Section 4.1.1). $\bar{E}$ is traversed edge by edge in a breadth-first search manner, preferring edges $ij$ with large $|\tilde{X}_{ij}|$. The nodes $i$ and $j$ of every considered edge are placed into the two clusters depending on the sign of $\tilde{X}_{ij}$ and on the clusters of adjacent nodes that have already been placed. This assignment is encoded in the vector $z \in \{-1, 0, 1\}^{|V|}$, where $z_v = -1$ and $z_v = 1$ correspond to $v$ being in one of the two clusters, and $z_v = 0$ means that $v$ is not yet assigned to any set.

In line 1, we initialise an empty assignment $z$. We insert an edge $e_{\max}$ with largest absolute value $|\tilde{X}_{e_{\max}}|$ into an initially empty edge heap $\hat{H}$. This heap is sorted by nonincreasing values $|\tilde{X}_{ij}|$, so that its top position is taken by an edge $\{i, j\}$ with maximum value $|\tilde{X}_{ij}|$. Furthermore, we initialise two auxiliary edge sets $\bar{H}$ and $\bar{D}$ as empty.

Algorithm 2 Initial heuristic Rounding

Input: $\tilde{X} \in [-1, 1]^{|V| \times |V|}$ with $\tilde{X}_{vv} = 1$ for all $v \in V$, weights $f_v$ for all $v \in V$, capacity $F$
Output: Bisection vector $z \in \{-1, 1\}^{|V|}$ or zero vector $0_{|V|}$

1: Initialise $z \in \{-1, 0, 1\}^{|V|}$, $z := 0_{|V|}$, edge heap $\hat{H} := \{e_{\max}\}$ with $e_{\max} \in \bar{E}$ such that $|\tilde{X}_{e_{\max}}| \ge |\tilde{X}_e|$ for all $e \in \bar{E}$, edge sets $\bar{H} := \emptyset$, $\bar{D} := \emptyset$
2: while $\hat{H} \neq \emptyset$ do
3:   Choose $\{i, j\} \in \hat{H}$ with $|\tilde{X}_{ij}| \ge |\tilde{X}_{kl}|$ for all $\{k, l\} \in \hat{H}$
4:   if $\{i, j\} = e_{\max}$ then
5:     $z_i := 1$, $z_j := \operatorname{sgn} \tilde{X}_{ij}$
6:   else
7:     if $z_i \neq 0 \neq z_j$ then {both i, j already assigned}
8:       {nothing to be done}
9:     else {at least one of i, j not yet assigned}
10:      if $z_i = 0 \wedge z_j \neq 0$ then {j already assigned}
11:        $z_i := \operatorname{sgn}(\tilde{X}_{ij})\, z_j$
12:      end if
13:      if $z_j = 0 \wedge z_i \neq 0$ then {i already assigned}
14:        $z_j := \operatorname{sgn}(\tilde{X}_{ij})\, z_i$
15:      end if
16:    end if
17:  end if
18:  $\bar{H} := \{e \in \bar{E} : e \notin \hat{H}, e \notin \bar{D}, e \text{ incident to } i \text{ or } j\}$
19:  $\hat{H} := \hat{H} \cup \bar{H}$
20:  $\hat{H} := \hat{H} \setminus \{\{i, j\}\}$, $\bar{D} := \bar{D} \cup \{\{i, j\}\}$
21: end while
22: return $z :=$ Algorithm 5$(z, f, F)$ {try to derive a bisection}

In each iteration, we choose an edge $\{i, j\} \in \hat{H}$ with maximum absolute value $|\tilde{X}_{ij}|$ (line 3). Since $\hat{H}$ is a heap, we find such an edge at its top. If the chosen edge is $e_{\max}$, we assign $i$ and $j$ to the same or to different clusters depending on the sign of $\tilde{X}_{ij}$ (line 5). Otherwise, at least one of $i$ and $j$ is already assigned to a cluster. If both $i$ and $j$ are already assigned, nothing has to be done (line 8). Otherwise, we again consider the sign of $\tilde{X}_{ij}$ and place the unassigned node into the same cluster as the assigned one if $\tilde{X}_{ij} > 0$, or into the opposite cluster if $\tilde{X}_{ij} < 0$ (lines 10 to 15).

After the assignment of nodes $i$ and $j$ has been decided, we extend the heap by all edges incident to $i$ or $j$ that are neither in $\hat{H}$ nor in $\bar{D}$ (lines 18 and 19). Furthermore, we remove the edge $\{i, j\}$ from the heap $\hat{H}$ and insert it into the set $\bar{D}$ of edges that have already been considered (line 20).

Once all edges of $\bar{E}$ have been considered, we end up with a partition of $V$ encoded in $z$. Note that this partition need not be a bisection, i.e., the weight of one cluster may exceed $F$. Therefore, we call Algorithm 5 (see Section 4.2.2) to try to derive a bisection from the given partition.
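A compact Python sketch of Algorithm 2 follows, under stated assumptions: $\bar{E}$ is given as a list of node pairs $(i, j)$ (a hypothetical input format), Python's `heapq` min-heap plays the role of the max-heap $\hat{H}$ by negating the keys $|\tilde{X}_{ij}|$, and the final call to Algorithm 5 is omitted, so the function returns the raw partition $z$. The function name `rounding` is hypothetical.

```python
import heapq
import numpy as np

def rounding(X, edges):
    """Sketch of Algorithm 2 (Rounding) without the final call to Algorithm 5.

    X: symmetric array with entries in [-1, 1]; edges: node pairs (i, j) of the
    connected support E-bar. Returns z in {-1, 0, 1}^n.
    """
    n = X.shape[0]
    z = np.zeros(n, dtype=int)
    incident = {v: [] for v in range(n)}
    for e in edges:
        incident[e[0]].append(e)
        incident[e[1]].append(e)
    e_max = max(edges, key=lambda e: abs(X[e]))      # line 1: edge with largest |X_ij|
    heap = [(-abs(X[e_max]), e_max)]                 # H-hat as a max-heap via negated keys
    in_heap, done = {e_max}, set()                   # done plays the role of D-bar
    while heap:                                      # line 2
        _, (i, j) = heapq.heappop(heap)              # line 3: edge at the top of the heap
        in_heap.discard((i, j))
        s = int(np.sign(X[i, j]))
        if (i, j) == e_max:                          # line 5
            z[i], z[j] = 1, s
        elif z[i] == 0 and z[j] != 0:                # lines 10-11: only j assigned
            z[i] = s * z[j]
        elif z[j] == 0 and z[i] != 0:                # lines 13-14: only i assigned
            z[j] = s * z[i]
        done.add((i, j))                             # line 20
        for e in incident[i] + incident[j]:          # lines 18-19: extend the heap
            if e not in in_heap and e not in done:
                heapq.heappush(heap, (-abs(X[e]), e))
                in_heap.add(e)
    return z

# Toy X~ with two evident clusters {0, 1} and {2, 3} on the complete support.
X = np.array([[ 1.0,  0.9, -0.8, -0.7],
              [ 0.9,  1.0, -0.6, -0.5],
              [-0.8, -0.6,  1.0,  0.9],
              [-0.7, -0.5,  0.9,  1.0]])
E_bar = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
z = rounding(X, E_bar)
```

Here the edge $\{0, 1\}$ starts the search, edge $\{0, 2\}$ then separates node 2, and edge $\{2, 3\}$ places node 3 next to node 2, giving $z = (1, 1, -1, -1)$. A node keeps $z_v = 0$ only if it is never reached or if $\tilde{X}_{ij} = 0$ on the edge that would have decided it.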

Initial Heuristic Goemans-Williamson

If we neglect the influence of the matrix $\bar{W}$ in (4.4), we can view the approximate primal solution $\tilde{X}$ as being given by $P P^T$. Thus, $\tilde{X}$ can be viewed as the Gram matrix of the columns of $P^T$, i.e., of the rows $P_{i\cdot}$ of $P$. Goemans and Williamson used such vectors to come up with a very useful geometric interpretation of the feasible set of (3.19) with $F \ge f(V)$ and applied this interpretation in their well-known approximation algorithm for the maximum cut problem [57].

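Let us explain this interpretation directly in terms of a primal heuristic for the minimum bisection problem. We associate with every node $i$ the row vector $P_{i\cdot}$. Assuming $1 = X_{ii} = P_{i\cdot}(P_{i\cdot})^T$, we get $\|P_{i\cdot}\| = 1$. Now, we can view $P_{i\cdot}$ as a relaxation of $x_i \in \{-1, 1\}$, and $P_{i\cdot}(P_{j\cdot})^T \in [-1, 1]$ as a relaxation of $x_i x_j \in \{-1, 1\}$. This approach, usually known as vector labelling and already present in the seminal paper of Lovász [93], enables us to reformulate the primal semidefinite relaxation (3.19) of (MB) as

$$\begin{aligned}
\min \quad & \sum_{1 \le i \le j \le n} c_{ij}\, P_{i\cdot}(P_{j\cdot})^T \\
\text{s.t.} \quad & \sum_{1 \le i \le j \le n} (f f^T)_{ij}\, P_{i\cdot}(P_{j\cdot})^T \le (2F - f(V))^2 \\
& P_{i\cdot}(P_{i\cdot})^T = 1 \quad \forall i \in V \\
& P_{i\cdot} \in \mathbb{R}^n \quad \forall i \in V.
\end{aligned} \qquad (4.8)$$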

Since $P_{i\cdot}(P_{j\cdot})^T = \|P_{i\cdot}\| \cdot \|P_{j\cdot}\| \cdot \cos\angle(P_{i\cdot}, P_{j\cdot})$ and $\|P_{i\cdot}\| = \|P_{j\cdot}\| = 1$, we can relate a solution $X$ of (3.19) to a solution of (4.8) via $X_{ij} = P_{i\cdot}(P_{j\cdot})^T = \cos\angle(P_{i\cdot}, P_{j\cdot})$. On the one hand, this means that an angle $0^\circ \le \angle(P_{i\cdot}, P_{j\cdot}) < 90^\circ$ corresponds to a value $X_{ij} > 0$ and suggests placing $i$ and $j$ in the same cluster. On the other hand, an angle $90^\circ < \angle(P_{i\cdot}, P_{j\cdot}) \le 180^\circ$ recommends separating $i$ and $j$. Of course, conflicts can occur if we try to follow this rule for every node pair, as displayed in Figure 4.4, where we would like to separate each pair of nodes.

Figure 4.4: Node separation by a random hyperplane
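The conflict can be made concrete with three unit vectors that pairwise enclose an angle of $120^\circ$, a hypothetical configuration chosen for illustration. All off-diagonal entries of their Gram matrix equal $\cos 120^\circ = -0.5$, so every pair asks to be separated, yet a brute-force check over all $2^3$ assignments shows that at most two of the three pairs can end up in different clusters.

```python
import itertools
import numpy as np

# Hypothetical configuration: three unit vectors, pairwise 120 degrees apart.
angles = np.deg2rad([90.0, 210.0, 330.0])
P = np.column_stack([np.cos(angles), np.sin(angles)])   # rows P_i. have norm 1
X = P @ P.T                                             # X_ij = cos of the angle between P_i. and P_j.

pairs = [(0, 1), (0, 2), (1, 2)]
# Largest number of pairs separated by any assignment z in {-1, 1}^3.
best = max(sum(z[i] != z[j] for i, j in pairs)
           for z in itertools.product([-1, 1], repeat=3))
```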

To resolve the conflict, we use a random hyperplane $H$ through the origin and place all nodes whose vectors lie on one side of the hyperplane in one cluster and all other nodes in the other cluster. Thus, we can compute a partition $z \in \{-1, 1\}^{|V|}$ by

$$z_v = \operatorname{sgn}\Big(\frac{1}{\|P_{v\cdot}\|} P_{v\cdot} h^T\Big) \quad \text{for all } v \in V,$$

where $h$ denotes the normal vector of the hyperplane $H$. The resulting heuristic is realised by Algorithm 3 for input parameter $m = 1$.
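Approach $m = 1$ fits in a few lines of NumPy. The text only speaks of a random vector $h$; drawing $h$ from a standard normal distribution, which makes the direction of the hyperplane uniformly distributed as in the Goemans-Williamson analysis, is an assumption of this sketch, as is mapping a zero projection to $+1$. The function name and the toy rows are hypothetical.

```python
import numpy as np

def hyperplane_partition(P, rng):
    """Approach m = 1: z_v = sgn(P_v. h^T / ||P_v.||) for a random normal vector h of H."""
    h = rng.standard_normal(P.shape[1])            # assumption: Gaussian h, uniformly random direction
    proj = (P @ h) / np.linalg.norm(P, axis=1)     # normalised projections of the rows P_v.
    return np.where(proj >= 0, 1, -1)              # assumption: sgn(0) is sent to +1

# Hypothetical rows: node 2 is antipodal to node 0, node 3 is antipodal to node 1.
P = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0], [-0.6, -0.8]])
z = hyperplane_partition(P, np.random.default_rng(1))
```

Antipodal rows always land in different clusters, whereas the relative position of nodes 0 and 1 depends on the drawn hyperplane; rerunning with other seeds therefore yields different partitions.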

The partition derived by the approach described above may violate the bisection constraint. Therefore, we try a second approach that keeps a closer eye on the cluster sizes. The resulting heuristic is given by Algorithm 3 for input parameter $m = 2$. Initially, we place all nodes in the second cluster ($z_v = 1$ for all $v \in V$, line 5). We sort the nodes by nondecreasing values of $\frac{1}{\|P_{v\cdot}\|} P_{v\cdot} h^T$ (line 6). Starting with a node with smallest such value, we move each node $v$ into the first cluster if the weight of the first cluster including $v$ is still smaller than $f(V) - F$ (lines 8 and 9). Otherwise, we check whether moving $v$ would increase the weight of the first cluster above the upper capacity $F$, in which case we leave the node in the second cluster (lines 11 and 12). In the remaining case, moving node $v$ into the first cluster creates a bisection. We realise this move if it results in a better objective value than that achieved by not moving the node (lines 14 to 18). If the final partition is not a bisection, we call Algorithm 5 (see Section 4.2.2) to try to derive a bisection. Since we use a random hyperplane, we can, of course, run Algorithm 3 for both $m = 1$ and $m = 2$ several times and hopefully obtain different bisections.

Algorithm 3 Initial heuristic Goemans-Williamson

Input: $P \in \mathbb{R}^{|V| \times k}$, weights $f_v$ for all $v \in V$, capacity $F$, objective coefficient matrix $C$, approach $m \in \{1, 2\}$
Output: Bisection vector $z \in \{-1, 1\}^{|V|}$ or zero vector $0_{|V|}$

1: Generate a random vector $h \in \mathbb{R}^k$
2: if $m = 1$ then {first approach}
3:   $z_v := \operatorname{sgn}\big(\frac{1}{\|P_{v\cdot}\|} P_{v\cdot} h^T\big)$ for all $v \in V$
4: else {second approach}
5:   $z_v := 1$ for all $v \in V$
6:   Sort $V = \{v_1, \dots, v_{|V|}\}$ such that $\frac{1}{\|P_{v_i\cdot}\|} P_{v_i\cdot} h^T \le \frac{1}{\|P_{v_j\cdot}\|} P_{v_j\cdot} h^T$ if $i < j$
7:   for $v = v_1, \dots, v_{|V|}$ do
8:     if $f_v + \sum_{i \in V : z_i = -1} f_i < f(V) - F$ then
9:       $z_v := -1$
10:    else
11:      if $f_v + \sum_{i \in V : z_i = -1} f_i > F$ then
12:        $z_v := 1$
13:      else
14:        if $\langle C(z - 2e_v), z - 2e_v \rangle < \langle Cz, z \rangle$ then
15:          $z_v := -1$
16:        else
17:          $z_v := 1$
18:        end if
19:      end if
20:    end if
21:  end for
22: end if
23: return $z :=$ Algorithm 5$(z, f, F)$ {try to derive a bisection}
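Lines 5 to 21 of Algorithm 3 translate into the following sketch. It receives the sorting keys $\frac{1}{\|P_{v\cdot}\|} P_{v\cdot} h^T$ as a precomputed array `scores`, assumes a symmetric objective matrix $C$ so that $\langle Cz, z\rangle = z^T C z$, and omits the final call to Algorithm 5. The function name and the toy instance (unit weights, $F = 2$, and $C$ taken as the Laplacian of a small weighted graph, for which $z^T C z$ equals four times the cut weight) are hypothetical.

```python
import numpy as np

def sorted_greedy(scores, f, F, C):
    """Lines 5 to 21 of Algorithm 3 (approach m = 2) for precomputed sorting keys."""
    z = np.ones(len(f), dtype=int)                 # line 5: all nodes in the second cluster
    lower = f.sum() - F                            # cluster one must weigh at least f(V) - F
    for v in np.argsort(scores, kind="stable"):    # lines 6-7: nondecreasing keys
        w = f[v] + f[z == -1].sum()                # weight of cluster one if v joined it
        if w < lower:
            z[v] = -1                              # lines 8-9: cluster one still too light
        elif w > F:
            z[v] = 1                               # lines 11-12: moving v would exceed F
        else:
            z_moved = z.copy()
            z_moved[v] = -1                        # z - 2 e_v
            if z_moved @ C @ z_moved < z @ C @ z:  # lines 14-17: move only if better
                z[v] = -1
    return z

W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 5.0
W[2, 3] = W[3, 2] = 5.0
W[1, 2] = W[2, 1] = 1.0
C = np.diag(W.sum(axis=1)) - W                     # graph Laplacian: z^T C z = 4 * cut weight
z = sorted_greedy(np.array([0.1, 0.2, 0.3, 0.4]), np.ones(4), 2.0, C)
```

Node 0 is forced into the first cluster, node 1 joins it because cutting the light edge $\{1, 2\}$ is cheaper than cutting $\{0, 1\}$, and nodes 2 and 3 stay put because moving them would exceed $F$.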

Initial Heuristic SumPi

This heuristic, displayed in Algorithm 4, also uses the rows $P_{i\cdot}$ of the matrix $P$. With $k$ denoting the size of the current bundle, the matrix $P \in \mathbb{R}^{n \times k}$ can be written with the help of the $k$ largest eigenvalues $\lambda_1, \dots, \lambda_k$ and corresponding normalised eigenvectors $v_1, \dots, v_k$ of $\tilde{X}$, i.e.,

$$\tilde{X} = P P^T = V_{\tilde{X}} \Lambda_{\tilde{X}}^{1/2} \big(V_{\tilde{X}} \Lambda_{\tilde{X}}^{1/2}\big)^T = V_{\tilde{X}} \Lambda_{\tilde{X}} V_{\tilde{X}}^T$$

with $\Lambda_{\tilde{X}} = \operatorname{Diag}(\lambda_1, \dots, \lambda_k)$, $\lambda_1 \ge \dots \ge \lambda_k$, $\|v_1\| = \dots = \|v_k\| = 1$ and $V_{\tilde{X}} = (v_1 \dots v_k)$. Thus, we can write

$$\tilde{X}_{ij} = \sum_{l=1}^{k} \lambda_l (v_l)_i (v_l)_j = \sum_{l=1}^{k} P_{il} P_{jl}.$$

We see that eigenvectors corresponding to large eigenvalues have a greater influence on $\tilde{X}$ than eigenvectors of small eigenvalues. In particular, in later stages of the computation, the first eigenvalues are much greater than the remaining ones. Therefore, we concentrate on the first $\bar{k}$ columns of $V_{\tilde{X}}$, or equivalently on the first $\bar{k}$ entries of the row vectors $P_{i\cdot}$. In fact, this is the main difference to the heuristic Goemans-Williamson, where all columns of $P$ have an equal influence because we use a random vector $h$.
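The relation $P = V_{\tilde{X}} \Lambda_{\tilde{X}}^{1/2}$ can be checked numerically. The sketch below, with a hypothetical function name and a random rank-3 test matrix, takes the $k$ largest eigenpairs from `numpy.linalg.eigh` (which returns eigenvalues in ascending order), scales each eigenvector $v_l$ by $\sqrt{\lambda_l}$, and confirms that $PP^T$ reproduces $\tilde{X}$.

```python
import numpy as np

def factor_from_eigenpairs(X, k):
    """Return P = V Lambda^{1/2} from the k largest eigenpairs of a symmetric PSD matrix X."""
    lam, V = np.linalg.eigh(X)                   # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(lam)[::-1][:k]            # indices of the k largest eigenvalues
    lam_k = np.clip(lam[order], 0.0, None)       # remove tiny negative round-off
    return V[:, order] * np.sqrt(lam_k), lam_k   # column l is sqrt(lambda_l) * v_l

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 3))
X_tilde = B @ B.T                                # hypothetical PSD matrix of rank 3
P, lam = factor_from_eigenpairs(X_tilde, 3)
```

Because column $l$ of `P` is $\sqrt{\lambda_l}\, v_l$, its squared norm equals $\lambda_l$, so keeping only the first $\bar{k}$ columns retains exactly the dominant terms of the sum for $\tilde{X}_{ij}$.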

Now suppose, for ease of explanation, that $\lambda_1 \neq 0$ and $\lambda_2 = \dots = \lambda_k = 0$. Then the sign and absolute value of $\tilde{X}_{ij} = P_{i\cdot}(P_{j\cdot})^T$ are completely determined by the signs and absolute values of $P_{i1}$ and $P_{j1}$. Thus, the partition encoded in $\tilde{X}$ can be read from the column $P_{\cdot 1}$ in the sense that nodes $i$ and $j$ with large absolute values $|P_{i1}|$ and $|P_{j1}|$ and equal signs are most likely in the same set, while opposite signs suggest separating them.
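Spelled out, the rank-one case follows from the sum for $\tilde{X}_{ij}$ above: with $\lambda_2 = \dots = \lambda_k = 0$, we have $P_{il} = \sqrt{\lambda_l}\,(v_l)_i = 0$ for all $l \ge 2$, and therefore

$$\tilde{X}_{ij} = \sum_{l=1}^{k} P_{il} P_{jl} = P_{i1} P_{j1}, \qquad \operatorname{sgn}\tilde{X}_{ij} = \operatorname{sgn}P_{i1} \cdot \operatorname{sgn}P_{j1}, \qquad |\tilde{X}_{ij}| = |P_{i1}|\,|P_{j1}|.$$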

If more eigenvalues than just the first one are nonzero, then the later ones also influence $\tilde{X}_{ij}$. Thus, if we want to take the first $\bar{k} < k$ eigenvalues into account, we should sort the nodes by increasing values of $\sum_{l=1}^{\bar{k}} P_{il}$. Since $-v$ is an eigenvector if and only if $v$ is an eigenvector, we can also consider negated columns of $P$ by sorting the nodes $v$ by the sums

$$p_v = P_{v1} + \sum_{l=2}^{\bar{k}} a_{l-1} P_{vl},$$

where $a \in \{-1, 1\}^{\bar{k}-1}$ (line 2). Each choice of $a$ may lead to a different order of the nodes and, therefore, to a different but equally sensible partition. For the sake of simplicity, we define $a$ as an input parameter of Algorithm 4.
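The sums $p_v$ for all sign vectors $a$ can be enumerated directly. In the sketch below, the function name and the $3 \times 2$ toy matrix are hypothetical; with $\bar{k} = 2$, the two choices $a = (1)$ and $a = (-1)$ already yield different node orders.

```python
import itertools
import numpy as np

def sumpi_scores(P, kbar, a):
    """p_v = P_v1 + sum_{l=2}^{kbar} a_{l-1} P_vl for a sign vector a in {-1, 1}^(kbar-1)."""
    signs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    return P[:, :kbar] @ signs

P = np.array([[3.0, 1.0], [1.0, -2.0], [-2.0, 0.5]])   # hypothetical rows P_v. for 3 nodes
kbar = 2
# Node order (by nondecreasing p_v) for every sign vector a.
orders = {a: np.argsort(sumpi_scores(P, kbar, a), kind="stable").tolist()
          for a in itertools.product([-1, 1], repeat=kbar - 1)}
```

For $a = (1)$ the sums are $(4, -1, -1.5)$ and the order is $2, 1, 0$; for $a = (-1)$ they are $(2, 3, -2.5)$ and the order becomes $2, 0, 1$.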

As in the heuristic Goemans-Williamson, we follow two approaches. Approach one ($m = 1$ in Algorithm 4) simply partitions the nodes according to the sign of $p_v$. Approach two ($m = 2$ in Algorithm 4) sorts the nodes by nondecreasing $p_v$ and then tries to construct a partition in the same manner as approach two of the heuristic Goemans-Williamson.
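Both approaches of Algorithm 4 can be sketched as follows, given the sums $p_v$ for a fixed sign vector $a$. Approach two mirrors lines 5 to 21 of Algorithm 3 with $p_v$ as the sorting key. Mapping $p_v \ge 0$ to $z_v = 1$ in approach one, the symmetric objective matrix $C$ with objective $z^T C z$, the toy instance, and the omission of the repair via Algorithm 5 are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def sumpi_partition(p, f, F, C, m):
    """Approach m = 1 or m = 2 of the SumPi heuristic for given sums p_v (repair step omitted)."""
    if m == 1:
        return np.where(p >= 0, 1, -1)             # split by the sign of p_v (ties to +1: assumption)
    z = np.ones(len(p), dtype=int)                 # all nodes start in the second cluster
    for v in np.argsort(p, kind="stable"):         # nondecreasing p_v
        w = f[v] + f[z == -1].sum()                # weight of cluster one if v joined it
        if w < f.sum() - F:
            z[v] = -1                              # cluster one is still too light
        elif w <= F:                               # moving v yields a bisection
            z_moved = z.copy()
            z_moved[v] = -1
            if z_moved @ C @ z_moved < z @ C @ z:  # move only if the objective improves
                z[v] = -1
    return z

W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = W[2, 3] = W[3, 2] = 5.0
W[1, 2] = W[2, 1] = 1.0
C = np.diag(W.sum(axis=1)) - W                     # hypothetical objective: graph Laplacian
p = np.array([-2.0, -1.0, -0.5, 3.0])
z1 = sumpi_partition(p, np.ones(4), 2.0, C, m=1)
z2 = sumpi_partition(p, np.ones(4), 2.0, C, m=2)
```

On this toy instance, approach one puts three unit-weight nodes into one cluster and violates $F = 2$, whereas approach two returns the bisection $\{0, 1\}, \{2, 3\}$.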

In document Branch-and-Cut for a Semidefinite Relaxation of Large-scale Minimum Bisection Problems (Page 129-133)