CHAPTER 9. Integer Programming

(1)

CHAPTER 9 Integer Programming

An integer linear program (ILP) is, by definition, a linear program with the addi- tional constraint that all variables take integer values:

(9.1) max c^Tx s t Ax ≤ b and x integral

Integrality restrictions occur in many situations. For example, the products in a linear production model (cf. p. 81) might be “indivisible goods” that can only be produced in integer multiples of one unit. Many problems in operations re- search and combinatorial optimization can be formulated as ILPs. As integer programming is NP-hard (see Section 8.3), every NP-problem can in principle be formulated as an ILP. In fact, such problems usually admit many different ILP formulations. Finding a particularly suited one is often a decisive step towards the solution of a problem.

9.1. Formulating an Integer Program

In this section we present a number of (typical) examples of problems with their corresponding ILP formulations.

Graph Coloring. Let us start with the combinatorial problem of coloring the nodes of a graph G = V E so that no two adjacent nodes receive the same color and as few colors as possible are used (cf. Section 8.1). This problem occurs in many applications. For example, the nodes may represent “jobs” that can each be executed in one unit of time. An edge joining two nodes may indicate that the corresponding jobs cannot be executed in parallel (because they use perhaps common resources). In this interpretation, the graph G would be the conflict graph of the given set of jobs. The minimum number of colors needed to color its nodes equals the number of time units necessary to execute all jobs.

Formulating the node coloring problem as an ILP, we assume V = {1 n}

and that we have n colors at our disposal. We introduce binary variables yk, k = 1 n, to indicate whether color k is used yk = 1 or not yk = 0 . Fur- thermore, we introduce variables xik to indicate whether node i receives color k.

181

(2)

The resulting ILP is (9.2) min Pn

k=1yk s t 1 Pn

k=1xik = 1 i = 1 n

2 xik− yk ≤ 0 i k = 1 n

3 xik+ xjk ≤ 1 i j ∈ E k = 1 n

4 0 ≤ xik y_k ≤ 1

5 x_ik y_k ∈ Z

The constraints (4) and (5) ensure that the xik and yk are binary variables. The constraints (1)–(3) guarantee (in this order) that each node is colored, node i re- ceives color k only if color k is used at all, and any two adjacent nodes have different colors.

EX. 9.1. Show: If the integrality constraint (5) is removed, the resulting linear program has optimum value equal to 1.

The Traveling Salesman Problem (TSP). This is one of the best-known com- binatorial optimization problems: There are n towns and a ”salesman”, located in town 1, who is to visit each of the other n − 1 towns exactly once and then return home. The tour (traveling salesman tour) has to be chosen so that the to- tal distance traveled is minimized. To model this problem, consider the so-called complete graph Kn, i.e., the graph Kn = V E with n = |V| pairwise adjacent nodes. With respect to a given cost (distance) function c : E → R we then seek to find a Hamilton circuit C ⊆ E, i.e., a circuit including every node, of minimal cost.

An ILP formulation can be obtained as follows. We introduce binary variables x_ik i k = 1 n to indicate whether node i is the kth node visited. In addition, we introduce variables ye e ∈ E to record whether edge e is traversed:

(9.3)

minP

e∈Ec_ey_e

s t Pn x₁₁ = 1

k=1xik = 1 i = 1 n Pn

i=1xik = 1 k = 1 n P

eye = n

xi⁺k−1+ xjk− ye ≤ 1 e = i j k ≥ 2 xin+ x11− ye ≤ 1 e = i 1

0 ≤ x^ik ye ≤ 1 xik ye ∈ Z

EX. 9.2. Show that each feasible solution of (9.3) corresponds to a Hamilton circuit and conversely.

In computational practice, other TSP formulations have proved more efficient.

To derive an alternative formulation, consider first the following simple program with edge variables ye, e ∈ E:

(3)

9.1. FORMULATING AN INTEGER PROGRAM 183

(9.4) min c^Ty s t y^Jt i = 2 i = 1 n 0 ≤ y ≤ 1

y integral (Recall our shorthand notationy^-7i =P

e∈^" i y_efor the sum of ally-values on

edges incident with node i.)

ILP (9.4) does not describe our problem correctly: We still must rule out solu- tions corresponding to disjoint circuits that cover all nodes. We achieve this by adding more inequalities, so-called subtour elimination constraints. To simplify the notation, we write fory ∈ R^E and two disjoint subsets S T ⊆ V

y S : T = X

e =^ei^¡ j^f i ∈ S^¡ j ∈ T

ye

The subtour elimination constraints

y S : S ≥ 2

make sure that there will be at least two edges in the solution that lead from a proper nonempty subset S ⊂ V to its complement S = V \ S. So the corresponding tour is connected. A correct ILP-formulation is thus given by

(9.5)

min c^Ty s t y^Jt i = 2 i = 1 n y S : S ≥ 2 ∅ ⊂ S ⊂ V

0 ≤ y ≤ 1

y integral

Note the contrast to our first formulation (9.3): ILP (9.5) has exponentially many constraints, one for each proper subset S ⊂ V. If n = 30, there are more than 2³⁰ constraints. Yet, the way to solve (9.5) in practice is to add even more constraints!

This approach of adding so-called cutting planes is presented in Sections 9.2 and 9.3 below.

REMARK. The mere fact that (9.5) has exponentially many constraints does not prevent us from solving it (without the integrality constraints) efficiently (cf. Section 10.6.2).

Maximum Clique. This is another well-studied combinatorial problem, which we will use as a case study for integer programming techniques later. Consider again the complete graph Kn = V E on n nodes. This time, there are weights c ∈ R^V andd ∈ R^E given on both the vertices and the edges. We look for a set C ⊆ V that maximizes the total weight of vertices and induced edges:

(9.6) max

C⊆V c C + d EC

As Kn = V E is the complete graph, each C ⊆ V is a clique (set of pairwise adjacent nodes). Therefore, we call (9.6) the maximum weighted clique problem.

(4)

EX. 9.3. Given a graph G = V E⁰ with E⁰⊆ E, choose c = 1 and de=

0 e ∈ E⁰

−n otherwise

Show: With these parameters (for Kn= V E), (9.6) reduces to the problem of finding a clique C in G of maximum cardinality.

Problem (9.6) admits a rather straightforward ILP-formulation:

(9.7)

max c^Tx + d^Ty

ye− xⁱ ≤ 0 e ∈ E i ∈ e x_i+ xj− ye ≤ 1 e = i j ∈ E

0 ≤ x y ≤ 1 x y integer

A vector x y with all components xi ye ∈ {0 1} that satisfies the constraints of (9.7) is the so-called (vertex-edge) incidence vector of the clique

C = {i ∈ V | xi = 1}

In other words, x ∈ R^V is the incidence vector of C andy ∈ R^E is the incidence vector of EC .

REMARK. The reader may have noticed that all ILPs we have formulated so far are binary programs, i.e., the variables are restricted to take values in {0 1} only. This is not by pure accident. The majority of integer optimization problems can be cast in this setting. But of course, there are also others (e.g., the integer linear production model mentioned in the introduction to this chapter).

9.2. Cutting Planes I

Consider the integer linear program

(9.8) max c^Tx s t Ax ≤ b and x integral

For the following structural analysis it is important (see Ex. 9.4) to assume thatA andb are rational, i.e., A ∈ Q^m×n andb ∈ Q^m. In this case, the polyhedron

(9.9) P = {x ∈ Rⁿ | Ax ≤ b}

is a rational polyhedron (cf. Section 3.6). The set of integer vectors in P is a discrete set, whose convex hull we denote by

(9.10) P_I= conv {x ∈ Zⁿ | Ax ≤ b}

Solving (9.8) is equivalent with maximizingc^Tx over the convex set PI (Why?).

Below, we shall prove that also PIis a polyhedron and derive a system of inequal- ities describing PI. We thus show how (at least in principle) the original problem (9.8) can be reduced to a linear program.

EX. 9.4. Give an example of a (non-rational) polyhedron P ⊆ Rⁿsuch that the set PIis not a polyhedron.

(5)

9.2. CUTTING PLANES I 185

PROPOSITION 9.1. Let P ⊆ Rⁿ be a rational polyhedron. Then also PI is a rational polyhedron. In case P_I 6= ∅, its recession cone equals that of P.

Proof. The claim is trivial if P is bounded (as P then contains only finitely many integer points and the result follows by virtue of the discussion in Section 3.6). By the Weyl-Minkowski Theorem 3.2, a rational polyhedron generally decomposes into

P = conv V + cone W

with finite sets of rational vectors V ⊆ Qⁿand W ⊆ Qⁿ. By scaling, if necessary, we may assume that W ⊆ Zⁿ. Denote byV and W the matrices whose columns are the vectors in V and W respectively. Thus eachx ∈ P can be written as

x = Vλ + Wµ where λ µ≥ 0 and 1^Tλ= 1

Let bµc be the integral part of µ 6= 0 (obtained by rounding down each compo- nent ^O i≥ 0 to the next integer b^O ic). Splitting µ into its integral part bµc and its non-integral part µ = µ − bµc yields

x = Vλ + Wµ + Wbµc = x + Wbµc with bµc ≥ 0 integral and x ∈ P, where

P = {Vλ + Wµ | λ ≥ 0 1^Tλ= 1 0 ≤ µ 1}

Because W ⊆ Zⁿ,x is integral if and only if x is integral. Hence P ∩ Zⁿ = P ∩ Zⁿ+ {Wz | z ≥ 0 integral}

Taking convex hulls on both sides, we find (cf. Ex. 9.5) PI = conv P ∩ Zⁿ + cone W

Since P is bounded, P ∩ Zⁿ is finite. So the claim follows as before.

EX. 9.5. Show: conv V + W = conv V + conv W for all V W ⊆ Rⁿ.

We next want to derive a system of inequalities describing PI. There is no loss of generality when we assume P to be described by a systemAx ≤ b with A and b integral. The idea now is to derive new inequalities that are valid for PI (but not necessarily for P) and to add these to the systemAx ≤ b. Such inequalities are called cutting planes as they “cut off” parts of P that are guaranteed to contain no integral points.

Consider an inequality c^Tx ≤ ^V that is valid for P. If c ∈ Zⁿ but ^V 6∈ Z, then each integral x ∈ P ∩ Zⁿ obviously satisfies the stronger inequality c^Tx ≤ b^V c.

Recall from Corollary 2.6 that valid inequalities for P can be derived from the systemAx ≤ b by taking nonnegative linear combinations. We therefore consider inequalities of the form

(9.11) y^TA x ≤ y^Tb y ≥ 0

(6)

Ify^TA ∈ Zⁿ, then everyx ∈ P ∩ Zⁿ(and hence everyx ∈ PI) satisfies

(9.12) y^TA x ≤ by^Tbc

We say that (9.12) arises from (9.11) by rounding (ify^TA ∈ Zⁿ). In particular, we regain the original inequalitiesAx ≤ b taking as y all unit vectors. We conclude

PI ⊆ P⁰= {x ∈ Rⁿ| y^TA x ≤ by^Tbc y ≥ 0 y^TA ∈ Zⁿ} ⊆ P

Searching for inequalities of type (9.12) withy^TA ∈ Zⁿ, we may restrict ourselves to 0 ≤ y ≤ 1. Indeed, each y ≥ 0 splits into its integral part z = byc ≥ 0 and non-integral part y = y − z. The inequality (9.12) is then implied by the two inequalities

(9.13) z^TA x ≤ z^Tb ∈ Z

y^TA x ≤ by^Tbc

(Recall that we assume A and b to be integral.) The first inequality in (9.13) is implied byAx ≤ b. To describe P⁰, it thus suffices to augment the systemAx ≤ b by all inequalities of the type (9.12) with0 ≤ y 1, which describes

(9.14) P⁰ = {x ∈ Rⁿ| y^TA x ≤ by^Tbc 0 ≤ y ≤ 1 y^TA ∈ Zⁿ}

by a finite number of inequalities (see Ex. 9.6) and thus exhibits P⁰ as a polyhedron.

EX. 9.6. Show: There are only finitely many vectors y^TA ∈ Zⁿwith0 ≤ y ≤ 1.

EX. 9.7. Show: P ⊆ Q implies P⁰⊆ Q⁰. (In particular, P⁰ depends only on P and not on the particular systemAx ≤ b describing P.)

Iterating the above construction, we obtain the so-called Gomory sequence (9.15) P ⊇ P⁰⊇ P⁰⁰⊇ ⊇ P^k ⊇ ⊇ PI

Remarkably (cf. Gomory [34], and also Chvatal [9]), Gomory sequences are al- ways finite:

THEOREM 9.1. The Gomory sequence is finite in the sense that P^t = PI holds for some t ∈ N.

Before giving the proof, let us examine in geometric terms what it means to pass from P to P⁰. Consider an inequality

y^TA x ≤ y^Tb

with y ≥ 0 and y^TA ∈ Zⁿ. Assume that the components of y^TA have greatest common divisor d = 1 (otherwise replace y by d⁻¹y). Then the equation

y^TA x = by^Tbc

(7)

admits an integral solution x ∈ Zⁿ (cf. Ex. 9.8). Hence passing from P to P⁰ amounts to shifting all supporting hyperplanes H of P “towards” PI until they

“touch” Zⁿin some pointx (not necessarily in PI).

F^IGURE 9.1. Moving a cutting plane towards PI

EX. 9.8. Show: An equation c^Tx =^' withc ∈ Zⁿ,^' ∈ Z admits an integer solution if and only if the greatest common divisor of the components ofc divides^' (Hint: Section 2.3).

The crucial step in proving Theorem 9.1 is the observation that the Gomory se- quence (9.15) induces Gomory sequences on all faces of P simultaneously. More precisely, assume F ⊆ P is a proper face. From Section 3.6, we know that

F = P ∩ H holds for some rational hyperplane

H = {x ∈ Rⁿ| y^TA x = y^Tb}

withy ∈ Q^m₊ (and hencey^TA ∈ Qⁿ andy^Tb ∈ Q).

LEMMA 9.1. F = P ∩ H implies F⁰ = P⁰∩ H.

Proof. From Ex. 9.7 we conclude F⁰ ⊆ P⁰. Since, furthermore, F⁰ ⊆ F ⊆ H holds, we conclude F⁰ ⊆ P⁰∩ H. To prove the converse inclusion, note that F is the solution set of

Ax ≤ b y^TAx = y^Tb

Scalingy if necessary, we may assume that y^TA and y^Tb are integral. By defini- tion, F⁰is described by the inequalities

(9.16) w^TA +¹ y^TA x ≤ bw^Tb +¹ y^Tbc

with w ≥ 0, ¹ ∈ R (not sign-restricted) and w^TA +¹ y^TA ∈ Zⁿ. We show that each inequality (9.16) is also valid for P⁰∩ H (and hence P⁰∩ H ⊆ F⁰).

If^1¢ 0, observe that forx ∈ H (and hence for x ∈ P⁰∩ H) the inequality (9.16) remains unchanged if we increase¹ by an integer k ∈ N: Since x satisfies y^TAx =

(8)

y^Tb ∈ Z, both the left and right hand side will increase by ky^Tb if¹ is increased to

1 + k. Hence we can assume¹ ≥ 0 without loss of generality. If¹ ≥ 0, however, (9.16) is easily recognized as an inequality of type (9.12). (Takey = w +¹ y ≥ 0.) So the inequality is valid for P⁰ and hence for P⁰∩ H.

We are now prepared for the

Proof of Theorem 9.1. In case P = {x ∈ Rⁿ| Ax = b} is an affine subspace, the claim follows from Corollary 2.2 (cf. Ex. 9.9). In general, P is presented in the form

(9.17) Ax = b

A⁰x ≤ b⁰

with n − d equalities Ai·x = bi and s ≥ 0 facet inducing (i.e., irredundant) in- equalitiesA⁰_j·x ≤ b⁰j.

C^ASE 1: PI = ∅. Let us argue by induction on s ≥ 0. If s = 0, P is an affine subspace and the claim is true. If s ≥ 1, we remove the last inequality A⁰_s·x ≤ b⁰s

in (9.17) and let Q ⊆ Rⁿbe the corresponding polyhedron. By induction, we then have Q^t = QI for some t ∈ N. Now PI = ∅ implies

QI∩ {x ∈ Rⁿ| A⁰_s·x ≤ b⁰s} = ∅

Since P^t ⊆ Q^t and (trivially) P^t ⊆ {x ∈ Rⁿ | A⁰_s·x ≤ b⁰s}, we conclude that P^t = ∅ holds, too.

C^ASE 2: PI6= ∅. We proceed now by induction on dim P. If dim P = 0, P = {p}

is an affine subspace and the claim is true. In general, since PI is a polyhedron, we can represent it as

Ax = b Cx ≤ d withC and d integral.

We show that each inequalityc^Tx ≤ of the system Cx ≤ d will eventually be- come valid for some P^t, t ∈ N (which establishes the claim immediately). So fix an inequalityc^Tx ≤ . Since P and PI(and hence all P^t) have identical recession cones by Proposition 9.1, the values

t

= max

x∈P^£^t^¤ c^Tx

are finite for each t ∈ N. The sequence ^t is decreasing. Indeed, from the definition of the Gomory sequence we conclude that ^t+1 ≤ b ^tc. Hence the sequence

t reaches its limit

:= lim_t→∞ ^t ≥

in finitely many steps. If = , there is nothing left to prove. Suppose therefore

= ^t and consider the face

F := {x ∈ P^t | c^Tx = }

(9)

Then FI must be empty since everyx ∈ PI ⊇ FI satisfies c^Tx ≤ ^> . If c^T ∈ rowA, then c^Tx is constant on P ⊇ P^t ⊇ P^I, so ^x¥ is impossible. Hence c^T 6∈ row A, i.e., dim F dim P. By induction, we conclude from Lemma 9.1

F^k = P^t+k ∩ {x ∈ Rⁿ | c^Tx = } = ∅ for some finite k. Hence ^t+k , a contradiction.

EX. 9.9. Assume P = {x ∈ Rⁿ| Ax = b}. Show that either P = PI or P⁰ = PI = ∅.

(Hint: Corollary 2.2 and Proposition 9.1)

EX. 9.10 (Matching Polytopes). Let G = V E be a graph with an even number of nodes. A perfect matching in G is a set of pairwise disjoint edges covering all nodes.

Perfect matchings in G are in one-to-one correspondence with integral (and hence binary) vectorsx ∈ R^E satisfying the constraints

(1) x^vgHi^L = 1 i ∈ V

(2) 0 ≤ x ≤ 1.

Let P ⊆ R^E be the polytope described by these constraints. The associated polytope PI

is called the matching polytope of G. Thus PI is the convex hull of (incidence vectors of) perfect matchings in G. (For example, if G consists of two disjoint triangles, we have R^E ' R⁶, P = {¹2· 1} and P^I= ∅).

To construct the Gomory polytope P⁰, consider some S ⊆ V. When we add the constraints (1) for i ∈ S, every edge e = i j with i j ∈ S occurs twice. So the resulting equation is

(1’) x^vgH S + 2x E S = |S|

(Recall that E S ⊆ E is the set of edges induced by S.) On the other hand, (2) implies

(2’) x^vgH S ≥ 0

From (1’) and (2’) we conclude that x E S^L ≤ ¹₂|S| is valid for P. Hence for S ⊆ V (3) x E S ≤ b¹₂|S|c

is valid for P⁰. It can be shown (cf. [12]) that the inequalities (1)-(3) describe PI. So P⁰ = P^Iand the Gomory sequence has length 1.

Gomory’s Cutting Plane Method. Theorem 9.1 tells us that – at least in principle – integer programs can be solved by repeated application of linear programming.

Conceptually, Gomory’s method works as follows. Start with the integer linear program

(9.18) max c^Tx s t Ax ≤ b x integral

and solve its LP-relaxation, which is obtained by dropping the integrality con- straint:

(9.19) max c^Tx s t Ax ≤ b

So c^Tx is maximized over P = {x ∈ Rⁿ | Ax ≤ b}. If the optimal solution is integral, the problem is solved. Otherwise, determine P⁰ and maximizec^Tx over

P⁰etc.

(10)

Unfortunately, this approach is hopeless inefficient. In practice, if the optimumx^∗ of (9.19) is non-integral, one tries to find cutting planes (i.e., valid inequalities for PIthat “cut off” a part of P containingx^∗) right away in order to add these to the systemAx ≤ b and then solves the new system etc.. This procedure is generally known as the cutting plane method for integer linear programs.

Of particular interest in this context are cutting planes that are best possible in the sense that they cut as much as possible off P. Ideally, one would like to add inequalities that define facets of PI. Numerous classes of such facet defining cutting planes for various types of problems have been published in the literature.

In Section 9.3, we discuss some techniques for deriving such cutting planes.

9.3. Cutting Planes II

The cutting plane method has been successfully applied to many types of problems. The most extensively studied problem in this context is the traveling sales- man problem (see, e.g., [12] for a detailed exposition). Here, we will take the max clique problem from Section 9.1 as our guiding example, trying to indicate some general techniques for deriving cutting planes. Moreover, we take the opportunity to explain how even more general (seemingly nonlinear) integer programs can be formulated as ILPs.

The following unconstrained quadratic boolean (i.e., binary) problem was studied in Padberg [64] with respect to a symmetric matrixQ = qij ∈ R^n×n:

(9.20) max

Xn i⁺ j=1

qijxixj xi∈ {0 1}

As xi· xⁱ = xⁱ holds for a binary variable xi, the essential nonlinear terms in the objective function are the terms qijxixj i 6= j . These may be linearized with the help of new variables yij = xixj. Since xixj = xjxi, it suffices to introduce just nn − 1 ⁶ 2 new variables ye, one for each edge e = i j ∈ E in the complete graph Kn= V E with V = {1 n}.

The salient point is the fact that the non-linear equation ye = xix_j is equivalent with the three linear inequalities

ye ≤ xⁱ ye ≤ x^j and xi+ x^j− y^e ≤ 1 if xi xjand ye are binary variables.

(11)

9.3. CUTTING PLANES II 191

With ci = qii and de = qij+ qji for e = i j ∈ E, problem (9.20) can thus be written as an integer linear program:

(9.21)

max Xn

i=1

c_ix_i+X

e∈E

d_ey_e s.t.

y_e− xi ≤ 0 e ∈ E i ∈ e xi+ x^j− y^e ≤ 1 e = i j ∈ E

0 ≤ xi y_e ≤ 1 xi ye integer.

Note that (9.21) is precisely our ILP formulation (9.7) of the weighted max clique problem.

Let P ⊆ R^V∪E be the polytope defined by the inequality constraints of (9.21).

As we have seen in Section 9.1, PI is then the convex hull of the (vertex-edge) incidence vectors x y ∈ R^V∪E of cliques (subsets) C ⊆ V.

The polytope P ⊆ R^V∪E is easily seen to have full dimension n + ⁿ₂

(because, e.g., x = ¹₂ · 1 and y = ¹₃ · 1 yields an interior point x y of P). Even PI is full-dimensional (see Ex. 9.11).

EX. 9.11. Show: R^V∪E is the affine hull of the incidence vectors of the cliques of sizes 0,1 and 2.

What cutting planes can we construct for PI? By “inspection”, we find that for any three vertices i j k ∈ V and corresponding edges e f g ∈ E, the following triangle inequality

(9.22) xi+ xj+ xk− ye− yf − yg ≤ 1 holds for any clique incidence vector x y ∈ R^V∪E.

EX. 9.12. Show: (9.22) can also be derived from the inequalities describing P by round- ing.

This idea can be generalized. To this end, we extend our general shorthand notation and write for x y ∈ R^V∪E and S ⊆ V:

x S =X

i∈S

x_i and y S = X

e∈ES

y_e

For example, (9.22) now simply becomes: x S − y S ≤ 1 for |S| = 3.

For every S ⊆ V and integer¹ ∈ N, consider the following clique inequality (9.23) ¹ x S − y S ≤ ^1?1 + 1 ⁶ 2

PROPOSITION 9.2. Each clique inequality is valid for PI.

(12)

Proof. Let x y ∈ R^V∪E be the incidence vector of some clique C ⊆ V. We must show that x y satisfies (9.23) for each S ⊆ V and¹ ∈ N. Let s = |S ∩ C|. Then x S = s and y S = ss − 1 ⁶ 2. Hence

1?1 + 1 ⁶ 2 −¹ x S + y S = [^1v1 + 1 − 2¹ s + ss − 1 ]⁶ 2

= ^?1 − s ^?1 − s + 1 ⁶ 2 which is nonnegative since both¹ and s are integers.

A further class of inequalities can be derived similarly. For any two disjoint sub- sets S T ⊆ V, the associated cut inequality is

(9.24) x S + y S + y T − y S : T ≥ 0

(Recall from Section 9.1 thaty S : T denotes the sum of all y-values on edges joining S and T).

PROPOSITION 9.3. Each cut inequality is valid for PI.

Proof. Assume that x y ∈ R^V∪E is the clique incidence vector of C ⊆ V. With s = |C ∩ S| and t = |C ∩ T|, we then find

x S + y S + y T − y S : T = s + s s − 1 ⁶ 2 + tt − 1 ⁶ 2 − st

= s − t s − t + 1 ⁶ 2 ≥ 0

Multiplying a valid inequality with a variable xi ≥ 0, we obtain a new (nonlin- ear!) inequality. We can linearize it by introducing new variables as explained at the beginning of this section. Alternatively, we may simply use linear (lower or upper) bounds for the nonlinear terms, thus weakening the resulting inequality.

For example, multiplying a clique inequality (9.23) by 2xi, i ∈ S, yields 2¹ X

j∈S

xixj− 2xⁱy S ≤ ^1?1 + 1 xi

Because of xiy S ≤ y S , x²_i = xi and xix_j = ye, e = i j ∈ E, the following so-called i-clique inequality

(9.25) 2¹ yi : S \ {i} − 2y S −^1?1 − 1 xi ≤ 0 must be valid for PI. (This may also be verified directly.)

REMARK. Most of the above inequalities actually define facets of PI. Consider, e.g., for some ^% , 1 ≤^% ≤ n − 2, the clique inequality

% xS − y S ≤^% ^% + 1 2

which is satisfied with equality by all incidence vectors of cliques C ⊆ V with |C ∩ S| =^% or |C ∩ S| =^% + 1. Let H ⊆ R^V∪E be the affine hull of all these incidence vectors.

To prove that the clique inequality is facet defining, one has to show dim H = dim PI− 1

(13)

9.4. BRANCH AND BOUND 193

i.e., H is a hyperplane in R^V∪E. This is not too hard to do. (In the special case S = V and

% = 1, it follows readily from Ex. 9.11).

The cutting plane method suffers from a difficulty we have not mentioned so far.

Suppose we try to solve an integer linear program, starting with its LP-relaxation and repeatedly adding cutting planes. In each step, we then face the problem of finding a suitable cutting plane that cuts off the current non-integral optimum.

This problem is generally difficult. E.g., for the max clique problem one can show that it is N P-hard to check whether a given x^∗ y^∗ ∈ R^V∪E satisfies all clique inequalities and, if not, find a violated one to cut off x^∗ y^∗ .

Moreover, one usually has only a limited number of different classes (types) of cutting planes to work with. In the max clique problem, for example, we could end up with a solution x^∗ y^∗ that satisfies all clique, i-clique and cut inequalities and yet is non-integral. The original system and these three classes of cutting planes namely describe PIby no means completely.

The situation in practice, however, is often not so bad. Quite efficient heuristics can be designed that frequently succeed to find cutting planes of a special type.

Macambira and de Souza [57], for example, solve max clique instances of up to 50 nodes with the above clique and cut inequalities and some more sophisticated generalizations thereof.

Furthermore, even when a given problem is not solved completely by cutting planes, the computation was not futile: Typically, the (non-integral) optimum obtained after having added hundreds of cutting planes provides a rather tight estimate of the true integer optimum. Such estimates are extremely valuable in a branch and bound method for solving ILPs as discussed in Section 9.4 below. For example, the combination of cutting planes and a branch and bound procedure has solved instances of the TSP with several thousand nodes to optimality (cf. [12]).

9.4. Branch and Bound

Any linear maximization program (ILP) with binary variables x1 xn can in principle be solved by complete enumeration: Check all 2ⁿpossible solutions for feasibility and compare their objective values. To do this in a systematic fashion, one constructs an associated tree of subproblems as follows. Fixing, say the first variable x1, to either x1= 0 or x1= 1, we generate two subproblemsILP | x¹= 0 and ILP | x¹= 1 . These two subproblems are said to be obtained from (ILP) by branching on x1.

Clearly, an optimal solution of (ILP) can be inferred by solving the two subprob- lems. Repeating the above branching step, we can build a binary tree whose nodes correspond to subproblems obtained by fixing some variables to be 0 or 1. (The term binary refers here to the fact that each node in the tree has exactly two lower neighbors.) The resulting tree may look as indicated in Figure 9.2 below.

(14)

I LP | x¹= 0 x3= 1

I LP | x¹= 0 x3= 0

I LP | x1= 0

I LP I LP | x1= 1

FIGURE 9.2.

Having constructed the complete tree, we could solve (ILP) bottom up and inspect the 2ⁿleaves of the tree, which correspond to ”trivial” (all variables fixed) prob- lems. In contrast to this solution by complete enumeration, branch and bound aims at building only a small part of the tree, leaving most of the “lower part”

unexplored. This approach is suggested by the following two obvious facts:

• If we can solve a particular subproblem, say ILP | x1= 0 x₃= 1 , di- rectly (e.g., by cutting planes), there is no need to inspect the subprob- lems in the branch below ILP | x¹= 0 x3= 1 in the tree.

• If we obtain an upper bound U x1 = 0 x3 = 1 for the sub-problem

ILP | x¹= 0 x3= 1 that is less than the objective value of some known feasible solution of the original (ILP), then ILP | x¹= 0 x3= 1 offers no optimal solution.

Only if neither of these circumstances occurs we have to explore the subtree rooted at ILP | x¹ = 0 x3 = 1 for possible optimal solutions. We do this by branching at ILP | x¹ = 0 x3 = 1 and creating two new subproblems in the search tree. An efficient branch and bound procedure tries to avoid such branching steps as much as possible. To this end, one needs efficient algorithms that produce

(1) “good” feasible solutions of the original (ILP).

(2) tight upper bounds for the subproblems.

There is a trade-off between the quality of the feasible solutions and upper bounds on the one hand and the size of the search tree we have to build on the other. As a rule of thumb, “good” solutions should be almost optimal and bounds should differ from the true optimum by less than 10%.

Algorithms for computing good feasible solutions usually depend very much on the particular problem at hand. So there is little to say in general. Quite often, however, simple and fast heuristic procedures for almost optimal solutions can be found. Such algorithms, also called heuristics for short, are known for many problem types. They have no guarantee for success, but work well in practice.

REMARK [LOCALSEARCH]. In the max clique problem the following simple local search often yields surprisingly good solutions: We start with some C ⊆ V and check

(15)

9.5. LAGRANGIAN RELAXATION 195

whether the removal of some node i ∈ C or the addition of some node j ∈ C yields an improvement. If so, we add (delete) the corresponding node and continue this way until no such improvement is possible (in which case we stop with the current local optimum C ⊆ V). This procedure may be repeated with different initial solutions C ⊆ V.

Computing good upper bounds is usually more difficult. Often, one just solves the corresponding LP-relaxations. If these are too weak, one can try to improve them by adding cutting planes as outlined in Section 9.3 . An alternative is to obtain upper bounds from Lagrangian relaxation (see Section 9.5 below).

Search and Branching Strategies. For the practical execution of a branch and bound algorithm, one needs to specify how one should proceed. Suppose, for example, that we are in a situation as indicated in Figure 9.2, i.e., that we have branched from (ILP) on variable x1 and from ILP | x¹= 0 on variable x3. We then face the question which subproblem to consider next, either ILP | x¹ = 1 or one of the subproblems of ILP | x¹= 0 . There are two possible (extremal) strategies: We either always go to one of the “lowest” (most restricted) subproblems or to one of the “highest” (least restricted) subproblems. The latter strategy, choosing ILP | x¹ = 1 in our case, is called breadth first search while the for- mer strategy is referred to as depth first search, as it moves down the search tree as fast as possible.

A second question concerns the way of branching itself. If LP-relaxation or cutting planes are used for computing upper bounds, we obtain a fractional optimum x^∗ each time we try to solve a subproblem. A commonly used branching rule then branches on the most fractional x^∗_i. In the case of (0 1)-variables, this rule branches on the variable xifor which x^∗_i is closest to 1⁶ 2. In concrete applications, we have perhaps an idea about the “relevance” of the variables. We may then al- ternatively decide to branch on the most relevant variable xi. Advanced software packages for integer programming allow the user to specify the branching process and support various upper bounding techniques.

REMARK. The branch and bound approach can easily be extended to general integer problems. Instead of fixing a variable xi to either 0 or 1, we may restrict it to xi≤^% ior xi≥^% i+ 1 for suitable^% i∈ Z. Indeed, the general idea is to partition a given subproblem into a number of (possibly more than just two) subproblems of similar type.

9.5. Lagrangian Relaxation

In Section 5.1, Lagrangian relaxation was introduced as a means for calculating upper bounds for optimization problems. Thereby, one “relaxes” (dualizes) some (in)equality constraints by adding them to the objective function using Lagrangian multipliersy ≥ 0 (in case of inequality constraints) to obtain an upper bound Ly . The crucial question is which constraints to dualize. The more constraints are dualized, the weaker the bound becomes. On the other hand, dualizing more constraints facilitates the computation of Ly . There is a trade-off between the

(16)

quality of the bounds we obtain and the effort necessary for their computation.

Generally, one would dualize only the “difficult” constraints, i.e., those that are difficult to deal with directly (see Section 9.5.2 for an example).

Held and Karp [39] were the first to apply the idea of Lagrangian relaxation to integer linear programs. Assume that we are given an integer program as

(9.26) max {c^Tx | Ax ≤ b Bx ≤ d x ∈ Zⁿ}

for given integral matrices A B and vectors b c d and let z^∗I P be the optimum value of (9.26). Dualizing the constraints Ax − b ≤ 0 with multipliers u ≥ 0 yields the upper bound

Lu = max {c^Tx − u^T Ax − b | Bx ≤ d x ∈ Zⁿ} (9.27)

= u^Tb + max { c^T− u^TA x | Bx ≤ d x ∈ Zⁿ} and thus the Lagrangian dual problem

(9.28) z^∗_D= min

u≥0 Lu

EX. 9.13. Show that Lu is an upper bound on z^∗_{I P}for everyu ≥ 0.

It is instructive to compare (9.28) with the linear programming relaxation (9.29) z^∗_{L P}= max {c^Tx | Ax ≤ b Bx ≤ d}

which we obtain by dropping the integrality constraintx ∈ Zⁿ. We find that La- grangian relaxation approximates the true optimum z^∗_{I P}at least as well:

THEOREM 9.2. z^∗_{I P} ≤ z^∗D ≤ z^∗L P.

Proof. The first inequality is clear (cf. Ex. 9.13). The second one follows from the fact that the Lagrangian dual of a linear program equals the linear programming dual. Formally, we may derive the second inequality by applying linear programming duality twice:

z^∗_D = min

u≥0 Lu

= min

u≥0 [u^Tb + max

x {c^T− u^TA x | Bx ≤ d x ∈ Zⁿ}]

≤ min

u≥0 [u^Tb + max

x {c^T− u^TA x | Bx ≤ d}]

= min

u≥0 [u^Tb + min

v {d^Tv | v^TB = c^T− u^TA v ≥ 0}]

= min_u

+v {u^Tb + v^Td | u^TA + v^TB = c^T u ≥ 0 v ≥ 0}

= max

x {c^Tx | Ax ≤ b Bx ≤ d} = z^∗L P

(17)

9.5. LAGRANGIAN RELAXATION 197

REMARK. As the proof of Theorem 9.2 shows, z^∗_D= z^∗L P holds if and only if the integrality constraintx ∈ Zⁿis redundant in the Lagrangian dual problem defining z^∗_D. In this case, the Lagrangian dual is said to have the integrality property (cf. Geoffrion [29]).

It turns out that solving the Lagrangian dual problem amounts to minimizing a

”piecewise linear” function of a certain type. We say that a function f : Rⁿ → R is piecewise linear convex if f is obtained as the maximum of a finite number of affine functions fi : Rⁿ → R (cf. Figure 9.3 below). (General convex functions are discussed in Chapter 10).

x f x

FIGURE 9.3. fu = max{ fⁱu | 1 ≤ i ≤ k}

PROPOSITION 9.4. Let U be the set of vectors u ≥ 0 such that (9.30) L u = u^Tb + max

x {c^T− u^TA x | Bx ≤ d x ∈ Zⁿ} ∞ Then L is a piecewise linear convex function on U.

Proof. For fixedu ≥ 0, the maximum in (9.30) is obtained by maximizing a linear function f x = c^T− u^TA x over

P_I = conv {x | Bx ≤ d x ∈ Zⁿ} = conv V + cone E

say, with finite sets V ⊆ Zⁿ and E ⊆ Zⁿ (cf. Proposition 9.1). If Lu ∞, the maximum in (9.30) is attained at somev ∈ V (Why?). Hence

L u = u^Tb + max {c^T− u^TA v | v ∈ V}

exhibiting the restriction of L to U as the maximum of the finitely many affine functions

R iu = u^Tb − Avi + c^Tvi vi ∈ V