CSC5160: Combinatorial Optimization and Approximation Algorithms
Topic: Introduction to Linear and Integer Programming
Date: 14/02/2008
Lecturer: Lap Chi Lau
Scribe: Shu Tong Tse, Tony Wing Hong Wong
In this lecture, we discuss how Linear Programming (LP) can be used to solve combinatorial optimization problems. The lecture is divided into two parts. In the first part, we discuss the theoretical aspects of LP and illustrate by examples how combinatorial problems can be reformulated as LP problems. In the second part, we introduce two popular algorithms for solving LP problems: the Simplex Method and the Ellipsoid Method.
9.1 The Linear Programming Formulation
In an LP problem, we are given a set of linear constraint functions $g_i : \mathbb{R}^n \to \mathbb{R}$ with constraint values $b_i$, and our goal is to maximize (or minimize) a linear objective function $f : \mathbb{R}^n \to \mathbb{R}$. In other words, our task is to find a vector $x \in \mathbb{R}^n$ which maximizes $f(x) = \sum_{j=1}^{n} c_j x_j$ and satisfies $g_i(x) \le b_i$ for all $i$. LP is a subclass of Mathematical Programming, in which the objective function and the constraint functions may be non-linear. If we additionally require the solution $x$ to lie in $\mathbb{Z}^n$, the problem is called an Integer Linear Program.
Example: Maximum-Weight Perfect Matching
Suppose $e = \{u, v\} \in E(G)$, i.e. vertices $u$ and $v$ are joined by an edge. Let $w_e$ denote the weight of the edge $e$, and let $x_e$ indicate whether $u$ and $v$ are matched to each other ($x_e = 1$) or not ($x_e = 0$). Our optimization problem then becomes maximizing $\sum_{e \in E(G)} w_e x_e$. How about the constraint that every vertex must be matched to exactly one other vertex? It can be encoded as $\sum_{e \in \delta(v)} x_e = 1$ for every vertex $v$. (Recall that $\delta(v)$ denotes the set of edges incident to $v$.)
Example: Maximum Satisfiability
Recall that in the Maximum Satisfiability problem we look for a truth assignment satisfying the clauses of a formula such as
$(x_1 \vee x_2 \vee x_3) \wedge \cdots \wedge (x_3 \vee \bar{x}_4 \vee \bar{x}_1)$
Naturally, we can use the boolean value itself to distinguish between assigning true and false, i.e. $x_i = 0$ means assigning false and $x_i = 1$ means assigning true. Note that the whole formula is true if and only if each clause (each parenthesis) is true. Let us focus on $(x_1 \vee x_2 \vee x_3)$ first. It is true iff at least one of $x_1$, $x_2$ or $x_3$ is true. This is equivalent to $x_1 + x_2 + x_3 \ge 1$. For $(x_3 \vee \bar{x}_4 \vee \bar{x}_1)$, note that the negated literal $\bar{x}_i$ is true iff $1 - x_i = 1$. Therefore the linear constraint can be written as $x_3 + (1 - x_4) + (1 - x_1) \ge 1$.
The latter example shows that deciding whether an integer linear program has a feasible solution is NP-hard, since satisfiability reduces to it; the problem is in fact NP-complete.
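The clause-to-constraint translation above can be checked mechanically. The following is a minimal sketch, assuming a clause is encoded as a list of signed integers (+i for the literal x_i, -i for its negation); this encoding and the function names are choices made for the illustration, not part of the lecture.

```python
from itertools import product

def clause_to_constraint(clause, n):
    """Return (coeffs, rhs) so that the clause holds iff sum(coeffs[i]*x[i]) >= rhs.

    A positive literal x_i contributes x_i; a negated literal contributes (1 - x_i),
    whose constant 1 is moved to the right-hand side.
    """
    coeffs = [0] * n
    rhs = 1
    for lit in clause:
        i = abs(lit) - 1
        if lit > 0:
            coeffs[i] += 1
        else:
            coeffs[i] -= 1
            rhs -= 1
    return coeffs, rhs

def clause_is_true(clause, x):
    """Evaluate the clause directly on a 0/1 assignment x."""
    return any(x[abs(l) - 1] == (1 if l > 0 else 0) for l in clause)

def encoding_is_exact(clause, n):
    """Brute-force check over all 0/1 assignments that constraint and clause agree."""
    coeffs, rhs = clause_to_constraint(clause, n)
    return all(
        clause_is_true(clause, x) == (sum(a * v for a, v in zip(coeffs, x)) >= rhs)
        for x in product((0, 1), repeat=n)
    )
```

For the second clause of the example, encoded as `[3, -4, -1]`, the sketch produces the constraint -x1 + x3 - x4 >= -1, which is the same inequality as x3 + (1 - x4) + (1 - x1) >= 1 with the constants collected on the right.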
9.2 Different Forms of LP
Before proceeding further, let us first discuss how one form of LP can be converted to another form. By applying these conversion techniques, we can choose the most convenient LP formulation for our original optimization problem. There are altogether 5 useful conversions:
1. Maximization problem ⇔ minimization problem:
$\max c^T x \Leftrightarrow \min -c^T x$
2. Equality ⇔ a pair of inequalities:
$a_i^T x = b_i \Leftrightarrow a_i^T x \le b_i$ and $a_i^T x \ge b_i$
3. With the use of a slack variable, we can represent an inequality as a combination of an equality and a non-negativity constraint:
$a_i^T x \le b_i \Leftrightarrow a_i^T x + s_i = b_i,\ s_i \ge 0$
4. Non-positivity ⇔ non-negativity:
$x_j \le 0 \Leftrightarrow -x_j \ge 0$
5. Restricting $x$ in sign:
For a real-valued $x_j$ unrestricted in sign, we can break it into a positive part $x_j^+$ and a negative part $x_j^-$, write $x_j = x_j^+ - x_j^-$ and require $x_j^+, x_j^- \ge 0$.
Using these rules, the canonical form $\min\{c^T x : Ax \ge b\}$ can be transformed into the standard form $\min\{c^T x^+ - c^T x^- : Ax^+ - Ax^- - Is = b,\ x^+, x^-, s \ge 0\}$, in which all variables are non-negative and we have a system of linear equality constraints.
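The canonical-to-standard transformation can be written out as a small sketch. It assumes matrices are plain lists of rows; the names `canonical_to_standard` and `lift` are hypothetical helpers introduced only for this illustration.

```python
def canonical_to_standard(A, b, c):
    """Convert min{c^T x : Ax >= b} into min{c_std^T z : A_std z = b, z >= 0},
    where z = (x_plus, x_minus, s) and A_std = [A, -A, -I]."""
    m = len(A)
    A_std = [
        list(row) + [-v for v in row] + [-1 if i == k else 0 for k in range(m)]
        for i, row in enumerate(A)
    ]
    c_std = list(c) + [-v for v in c] + [0] * m
    return A_std, list(b), c_std

def lift(A, b, x):
    """Map a point x with Ax >= b to the corresponding standard-form point z."""
    x_plus = [max(v, 0) for v in x]        # positive parts
    x_minus = [max(-v, 0) for v in x]      # negative parts
    slack = [sum(a * v for a, v in zip(row, x)) - bi for row, bi in zip(A, b)]
    return x_plus + x_minus + slack
```

Lifting a feasible point of the canonical form gives a non-negative vector that satisfies the equality system and has the same objective value, which is what the equivalence of the two forms requires.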
9.3 Relaxing the constraint
In the above examples, we have constraints of the form $x_e \in \{0, 1\}$. Because discrete constraints are mathematically inconvenient, we would like to replace them by constraints of the form $0 \le x_e \le 1$. However, by doing so, the optimal solution of the LP may become fractional, which is meaningless for our original optimization problem. Surprisingly, this does NOT happen for many problems: the optimal solution of the relaxed LP is still integral! Before discussing the reasons behind this, let us illustrate by one counterexample that fractional solutions may arise.
Example: Fractional Solution may arise after relaxing the constraints
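Consider the minimum vertex cover problem on the complete graph on 5 vertices (see Figure 9.3.1). We can formulate it as an integer linear program as follows:
\[ \min \sum_{v} x_v \quad \text{subject to} \quad x_u + x_v \ge 1 \ \ \forall e = \{u, v\}, \qquad x_v \in \{0, 1\} \ \ \forall v. \]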
Figure 9.3.1: a counterexample
Obviously, an optimal solution covers all but one vertex, and thus $\min \sum_v x_v = 4$. However, if we allow $0 \le x_v \le 1$, then the optimal solution becomes $x_v = 0.5$ for all $v$, and the optimal value drops to $\min \sum_v x_v = 2.5$. Moreover, this fractional solution makes no sense for the original vertex cover problem.
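The two values in this counterexample can be verified by brute force. The sketch below enumerates all 0/1 assignments for the integer program, and certifies the value 2.5 for the relaxation by a counting argument (summing all edge constraints) rather than by calling an LP solver; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product

n = 5
edges = list(combinations(range(n), 2))      # all 10 edges of the complete graph

def is_feasible(x):
    """Check 0 <= x_v <= 1 and x_u + x_v >= 1 for every edge {u, v}."""
    return all(0 <= v <= 1 for v in x) and all(x[u] + x[v] >= 1 for u, v in edges)

# Integer program: enumerate all 2^5 assignments.
integer_opt = min(sum(x) for x in product((0, 1), repeat=n) if is_feasible(x))

# Relaxation: the all-halves point is feasible with value 5/2.
half = [Fraction(1, 2)] * n
fractional_value = sum(half)

# Lower bound for the relaxation: every vertex lies on n - 1 = 4 edges, so summing
# the 10 edge constraints gives 4 * sum(x) >= 10 for every feasible x.
lp_lower_bound = Fraction(len(edges), n - 1)
```

Since the all-halves point attains the lower bound, 2.5 is indeed the optimum of the relaxation, strictly below the integer optimum of 4.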
9.4 The Geometry of LP
Figure 9.4.2: the geometry of LP
Geometrically, unwanted fractional solutions arise because the corners (the red vertex in Figure 9.4.2) of our enlarged solution space are not integer solutions. It is geometrically clear that an optimal solution is attained at one of the corners. Therefore, for a “good relaxation”, we require every corner to be integral!
9.5 Vertex solutions
Let $P = \{x : Ax = b, x \ge 0\}$ be the solution space. Below we give a mathematical definition of the corner concept discussed above:
Definition 9.5.1 $x \in P$ is a vertex of $P$ if there is no $y \ne 0$ such that $x + y \in P$ and $x - y \in P$.
Now we prove the following important theorem, which says the optimal solution must be attainable at one of the vertices.
Theorem 9.5.2 Assume $\min\{c^T x : x \in P\}$ is finite. Then for every $x \in P$ there exists a vertex $x'$ such that $c^T x' \le c^T x$.
Proof: If $x$ is a vertex, then take $x' = x$ and we are done.
If $x$ is not a vertex, then by definition there exists $y \ne 0$ such that $x + y, x - y \in P$. From $A(x + y) = b$ and $A(x - y) = b$ we obtain $Ay = 0$. Moreover, since $x + y \ge 0$ and $x - y \ge 0$, we have $y_j = 0$ whenever $x_j = 0$.
WLOG, we assume $c^T y \le 0$ (we can take either $y$ or $-y$). If $c^T y = 0$, then since $y \ne 0$ and $c^T y = c^T(-y) = 0$, one of $y$ and $-y$ has a negative component; choose it and name it $y$ (abusing notation). We then consider $x + \lambda y$ for $\lambda > 0$. Note that in either situation we have $c^T(x + \lambda y) \le c^T x$ and $A(x + \lambda y) = b$, so $x + \lambda y \in P$ as long as $x + \lambda y \ge 0$.
Case 1: $\exists j$ such that $y_j < 0$. As $\lambda$ increases, component $j$ decreases and eventually reaches zero. Let $\lambda_{\max}$ be the largest $\lambda$ such that $x + \lambda y \ge 0$. Since $y_j = 0$ whenever $x_j = 0$, the point $x + \lambda_{\max} y$ has at least one more zero component than $x$. We replace $x$ by $x + \lambda_{\max} y$ and repeat from the beginning.
Case 2: $y_j \ge 0$ for all $j$. By the choice of $y$ above, this implies $c^T y < 0$. In this case, $x + \lambda y$ is feasible for every $\lambda > 0$ and $c^T(x + \lambda y) \to -\infty$ as $\lambda \to \infty$, implying that the LP is unbounded, which is a contradiction.
Since $x$ has $n$ components, Case 1 can happen at most $n$ times. By induction on the number of non-zero components of $x$, we eventually reach a vertex $x'$.
9.6 Basic Solutions
In this section, we will see how to use bases to obtain solutions to the standard LP problem. Before discussing the details, we first prove a theorem which helps us check whether $x$ is a vertex.
Theorem 9.6.1 Let $P = \{x : Ax = b, x \ge 0\}$. For $x \in P$, let $A_x$ be the submatrix of $A$ consisting of the columns $j$ with $x_j > 0$. Then $x$ is a vertex iff $A_x$ has linearly independent columns (i.e. $A_x$ has full column rank).
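Example: For
\[ A = \begin{pmatrix} 2 & 1 & 3 & 0 \\ 7 & 3 & 2 & 1 \\ 0 & 0 & 0 & 5 \end{pmatrix}, \quad x = \begin{pmatrix} 2 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad A_x = \begin{pmatrix} 2 & 3 \\ 7 & 2 \\ 0 & 0 \end{pmatrix}, \]
the columns of $A_x$ are linearly independent, therefore $x$ is a vertex.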
Proof: We first prove, by contrapositive, that if $A_x$ has linearly independent columns then $x$ is a vertex. Suppose $x$ is not a vertex. Then, by definition, $\exists y \ne 0$ s.t. $x + y, x - y \in P$. Let $A_y$ be the submatrix of $A$ corresponding to the non-zero components of $y$. Since $Ay = 0$ and $y \ne 0$, $A_y$ has linearly dependent columns. Moreover, since both $x + y \ge 0$ and $x - y \ge 0$, we have $y_j = 0$ whenever $x_j = 0$. Therefore, $A_y$ is a submatrix of $A_x$ and hence $A_x$ has linearly dependent columns.
We prove the other direction by contrapositive as well. Suppose $A_x$ has linearly dependent columns. Then $\exists y$ s.t. $A_x y = 0$, $y \ne 0$. We extend $y$ to $\mathbb{R}^n$ by adding zero components. Then we have $y \in \mathbb{R}^n$ s.t. $Ay = 0$, $y \ne 0$ and $y_j = 0$ whenever $x_j = 0$. Let $y' = \lambda y$. For $\lambda > 0$ small enough, as in the proof of Theorem 9.5.2, we can show that $x + y', x - y' \in P$, which means $x$ is not a vertex.
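Theorem 9.6.1 turns the vertex test into a rank computation. A minimal sketch, assuming the point x is already known to satisfy Ax = b and x >= 0 (the code does not check this), with exact arithmetic via the standard `fractions` module and hypothetical helper names:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        r += 1
    return r

def is_vertex(A, x):
    """Vertex test of Theorem 9.6.1; assumes x is feasible (Ax = b, x >= 0)."""
    support = [j for j, v in enumerate(x) if v > 0]
    A_x = [[row[j] for j in support] for row in A]   # columns with x_j > 0
    return rank(A_x) == len(support)
```

On the example following Theorem 9.6.1, the support of x consists of the first and third columns, which have rank 2, so the test reports a vertex.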
Let $P = \{x : Ax = b, x \ge 0\}$ be our solution space, where $A$ is an $m \times n$ matrix ($m$ constraints on $x \in \mathbb{R}^n$). Without loss of generality, we assume that there are no redundant constraints, or equivalently, that the $m$ rows of $A$ are linearly independent. In this case $n \ge m$ necessarily holds, since a matrix with $n$ columns has rank at most $n$.
Our next step is to extract $m$ columns from $A$ to obtain a square matrix. Let $B \subset \{1, \ldots, n\}$ such that $|B| = m$. We retain the columns of $A$ indexed by $B$ and remove the other columns. Denote the resulting square matrix by $A_B$, and denote by $x_B$ the vector of the corresponding entries of $x$. If $A_B$ is non-singular, the system $A_B x_B = b$ has a unique solution $x_B = A_B^{-1} b$. If $x_B \ge 0$, then we obtain a solution to the original system $\{Ax = b, x \ge 0\}$ by setting $x_N = 0$ for $N = \{1, \ldots, n\} \setminus B$. By the previous theorem, $x$ is a vertex.
Here is some terminology. A set $B$ as above, with $|B| = m$ and $A_B$ non-singular, is called a basis. When $x_B \ge 0$, the corresponding solution $x$ is called a basic feasible solution. Be careful that a basis need not lead to a basic feasible solution: $A_B^{-1} b$ may have a negative component. Also, a vertex can correspond to several bases (obtained by choosing $B$ in different ways).
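Example:
\[ x_1 + x_2 + x_3 = 5, \qquad 2x_1 - x_2 + 2x_3 = 1, \qquad x_1, x_2, x_3 \ge 0. \]
We select as a basis $B = \{1, 2\}$. Thus $N = \{3\}$,
\[ A_B = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}, \qquad A_B^{-1} b = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \]
and $x = (2, 3, 0)^T$ is a basic feasible (vertex) solution.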
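For a system this small, all candidate bases can simply be enumerated. The sketch below tries every set B of m columns, solves for x_B exactly, and keeps the non-negative solutions. It uses 0-based column indices (so the basis {1, 2} of the example appears as `(0, 1)`), and the helper names are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * q for a, q in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def basic_feasible_solutions(A, b):
    """Map each basis B (tuple of column indices) to its basic feasible solution."""
    m, n = len(A), len(A[0])
    found = {}
    for B in combinations(range(n), m):
        x_B = solve([[row[j] for j in B] for row in A], b)
        if x_B is None or min(x_B) < 0:
            continue                      # not a basis, or not feasible
        x = [Fraction(0)] * n
        for j, v in zip(B, x_B):
            x[j] = v
        found[B] = x
    return found
```

On the example, the column set {1, 3} is rejected because its two columns are parallel, and the remaining two bases give the vertices (2, 3, 0) and (0, 3, 2).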
9.6.1 Basic Solutions in General Form
Recall that in the canonical LP the constraints are $Ax \ge b$, whereas in the standard LP the constraints are $Ax = b, x \ge 0$. In either case, when an inequality constraint holds with equality at $x$ (for instance $a_i^T x = b_i$, or $x_j = 0$), we say the constraint is tight for $x$. From our geometric interpretation, corners are intuitively the places where inequalities become tight. In a space with $n$ variables, there are at most $n$ linearly independent tight constraints. If a solution $x$ lies only on the intersection of $n'$ linearly independent tight constraints where $n' < n$, then there are $n - n' \ge 1$ degrees of freedom, so $x$ can be represented as the average of two other solutions and thus is not a vertex solution. So, we have the following characterization of basic solutions, which will be used frequently later in this course.
Lemma 9.6.2 A basic solution x is the unique solution of n linearly independent tight constraints, where n is the number of variables in the linear program.
Note that this characterization holds in general form; it is instructive to check that this generalizes the previous characterization of basic solutions in standard form.
Figure 9.7.3: Visualization of the Simplex Method
9.7 The Simplex Method
The Simplex Method (Dantzig 1951) is a popular algorithm for solving LP problems. Although it takes exponential time in the worst case, it is very efficient in practice. Moreover, it is easy to understand and simple to implement.
The idea is to focus on vertex solutions, since there is always a vertex which attains optimality. The algorithm starts from an arbitrary vertex and then moves to a neighbouring vertex with a better objective value. This is repeated until no improvement can be made, at which point we have reached a local extremum. Since the solution space is a convex set and the objective function is linear, a local extremum is a global extremum as well. Therefore, we have attained global optimality.
Below we sketch a more mathematical argument for why the simplex algorithm works.
We begin by writing our LP in the form:
\[ \min\ c_B^T x_B + c_N^T x_N \quad \text{such that} \quad A_B x_B + A_N x_N = b \ \text{ and } \ x_B, x_N \ge 0. \]
Here $B$ is the basis of the basic feasible solution we start from. Notice that for any solution $x$, $x_B = A_B^{-1} b - A_B^{-1} A_N x_N$, and that its total cost $c^T x$ can be written as
\[ c^T x = c_B^T x_B + c_N^T x_N = c_B^T (A_B^{-1} b - A_B^{-1} A_N x_N) + c_N^T x_N = c_B^T A_B^{-1} b + (c_N^T - c_B^T A_B^{-1} A_N) x_N. \]
We denote the reduced cost of the non-basic variables by $d_N^T = c_N^T - c_B^T A_B^{-1} A_N$. If there is a $j \in N$ such that $d_j < 0$, then by increasing $x_j$ up from zero we decrease the cost. We can keep increasing $x_j$ until one of the components of $x_B$ becomes zero. When one of the components does reach zero, we remove it from the basis and replace it by the variable $x_j$, which was originally non-basic but is now positive. Iterate.
When we reach a stage where there is no $j$ such that $d_j < 0$, we stop because we are at an optimal solution. This follows from the expression for $c^T x$ above, since $x_N$ is non-negative.
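The pivoting scheme just described can be sketched in a few lines. This is an illustrative implementation under several assumptions that the notes do not fix: a starting basis with a feasible basic solution is supplied by the caller, column indices are 0-based, the entering variable is the smallest index with negative reduced cost, and ties in the ratio test are broken by smallest index (Bland's rule). Exact arithmetic via `fractions` avoids rounding issues.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M z = rhs exactly (M is assumed non-singular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * q for a, q in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def simplex(A, b, c, basis):
    """Minimise c^T x subject to Ax = b, x >= 0, starting from a feasible basis."""
    m, n = len(A), len(c)
    basis = list(basis)
    while True:
        A_B = [[A[i][j] for j in basis] for i in range(m)]
        x_B = solve(A_B, b)
        entering = None
        for j in range(n):
            if j in basis:
                continue
            u = solve(A_B, [A[i][j] for i in range(m)])         # A_B^{-1} A_j
            d_j = c[j] - sum(c[basis[i]] * u[i] for i in range(m))  # reduced cost
            if d_j < 0:
                entering, direction = j, u
                break
        if entering is None:              # no negative reduced cost: optimal
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                x[j] = x_B[i]
            return x, basis
        # Ratio test: the first basic variable driven to zero leaves the basis.
        ratios = [(x_B[i] / direction[i], basis[i], i)
                  for i in range(m) if direction[i] > 0]
        if not ratios:
            raise ValueError("LP is unbounded")
        _, _, leave = min(ratios)
        basis[leave] = entering
```

For instance, on the system x1 + x2 + x3 = 5, 2x1 - x2 + 2x3 = 1 from Section 9.6 with the (arbitrarily chosen) cost vector c = (1, 1, 0) and starting basis `[0, 1]`, one pivot moves from the vertex (2, 3, 0) to the vertex (0, 3, 2), where all reduced costs are non-negative.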
Remarks: There are many different rules to choose a neighbor. However, so far every rule has a
9.8 Ellipsoid Algorithm
We now introduce another algorithm for solving linear programs. Although the simplex algorithm is fast in practice, none of its known variants has been shown to run in polynomial time. The ellipsoid algorithm, proposed by the Russian mathematician Shor in 1977 for general convex optimization problems and applied to linear programming by Khachiyan in 1979, runs in polynomial time, though it is not very fast in practice compared with the simplex method. However, as it is polynomial-time in theory, it has important consequences for combinatorial optimization problems. The ellipsoid algorithm essentially focuses on finding a point $x \in P$, where $P \subset \mathbb{R}^n$ is a given bounded convex set. We shall see later that linear programming can be reduced to finding an $x$ in $P = \{x \in \mathbb{R}^n : Cx \le d\}$. The ellipsoid algorithm is as follows:
• Let $E_0$ be an ellipsoid containing $P$
• While the center $a_k$ of the ellipsoid $E_k$ is not in $P$, do
− Find a vector $c$ such that $c^T x \le c^T a_k$ for all $x \in P$
− Find an ellipsoid $E_{k+1}$ of minimum volume containing $E_k \cap \{x : c^T x \le c^T a_k\}$
− $k \leftarrow k + 1$
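The loop above can be sketched for a polytope given explicitly as P = {x : Cx <= d}. The sketch represents an ellipsoid by its center a and a positive definite matrix A as in Definition 9.8.1, and uses the standard closed-form update for the minimum-volume ellipsoid containing a half-ellipsoid; that update formula is not derived in these notes and is taken here as an assumption. Floating-point arithmetic, the iteration cap `max_iter`, and the requirement n >= 2 are further simplifications of this illustration.

```python
import math

def ellipsoid_feasibility(C, d, center, radius, max_iter=200):
    """Look for x with Cx <= d, starting from the ball E(center, radius^2 * I).

    Returns a feasible point, or None if none is found within max_iter steps.
    """
    n = len(center)
    a = [float(v) for v in center]
    A = [[radius ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        # Separation step: find a row of C violated by the current center.
        c = next((row for row, rhs in zip(C, d)
                  if sum(r * v for r, v in zip(row, a)) > rhs), None)
        if c is None:
            return a                      # the center lies in P
        # Every x in P satisfies c^T x <= rhs < c^T a, so cut with c^T x <= c^T a.
        Ac = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c[i] * Ac[i] for i in range(n)))
        bvec = [v / norm for v in Ac]
        a = [a[i] - bvec[i] / (n + 1) for i in range(n)]
        factor = n * n / (n * n - 1.0)
        A = [[factor * (A[i][j] - 2.0 / (n + 1) * bvec[i] * bvec[j])
              for j in range(n)] for i in range(n)]
    return None
```

As a usage example, for the small box 0.2 <= x1 <= 0.4, 0.6 <= x2 <= 0.8 and the starting ball of radius sqrt(2)/2 around (0.5, 0.5), the sketch cuts first with x1 <= 0.4 and then with x2 >= 0.6, after which the center lies inside the box.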
Before we study this algorithm in greater detail, we first need to define ellipsoids.
Definition 9.8.1 Given a center $a$ and a positive definite matrix $A$, the ellipsoid $E(a, A)$ is defined as $\{x \in \mathbb{R}^n : (x - a)^T A^{-1} (x - a) \le 1\}$.
Recall that for any positive definite matrix $A$, there exists a matrix $B$ such that $A = B^T B$, and hence $A^{-1} = B^{-1} (B^{-1})^T$. In other words, ellipsoids are just affine transformations of unit spheres. To illustrate this, consider the (bijective) affine transformation $x \mapsto y = (B^{-1})^T (x - a)$. It maps $E(a, A)$ to $E(0, I) = \{y : y^T y \le 1\}$.
The following lemma ensures that ellipsoids constructed by the ellipsoid algorithm have shrinking volumes. This means that if the set $P$ has positive volume, we will eventually find a point in $P$. Later, we will need to take care of the case when $P$ has no volume (for example, when $P$ is a single point or an empty set), and discuss when we can stop and be guaranteed that either we have a point in $P$ or we know that $P$ is empty.
Lemma 9.8.2 $\dfrac{Vol(E_{k+1})}{Vol(E_k)} < e^{-\frac{1}{2(n+1)}}$.
We just consider a simpler case, in which the ellipsoid $E_k$ is the unit sphere centered at the origin and the set $P \subset \{x : x_1 \le 0\}$; the analysis of the general case is similar but more complicated. If we pick the vector $c = (1, 0, 0, \ldots, 0)$, then we claim that a possible choice of the ellipsoid $E_{k+1}$ containing $E_k \cap \{x : x_1 \le 0\}$ is
\[ E_{k+1} = \left\{ x : \left(\frac{n+1}{n}\right)^2 \left(x_1 + \frac{1}{n+1}\right)^2 + \frac{n^2-1}{n^2} \sum_{i=2}^{n} x_i^2 \le 1 \right\} \]
and it satisfies the volume ratio given in Lemma 9.8.2.
Figure 9.8.4: Ellipsoid Method
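Proof: For any $x \in E_k \cap \{x : x_1 \le 0\}$, we see that
\[
\begin{aligned}
\left(\frac{n+1}{n}\right)^2 \left(x_1 + \frac{1}{n+1}\right)^2 + \frac{n^2-1}{n^2} \sum_{i=2}^{n} x_i^2
&= \frac{n^2+2n+1}{n^2}\, x_1^2 + \frac{2n+2}{n^2}\, x_1 + \frac{1}{n^2} + \frac{n^2-1}{n^2} \sum_{i=2}^{n} x_i^2 \\
&= \frac{2n+2}{n^2}\, x_1 (x_1 + 1) + \frac{1}{n^2} + \frac{n^2-1}{n^2} \sum_{i=1}^{n} x_i^2 \\
&\le \frac{1}{n^2} + \frac{n^2-1}{n^2} = 1
\end{aligned}
\tag{9.8.1}
\]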
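where the last inequality is due to $x_1 \le 0$, $x_1 + 1 \ge 0$ and $\sum_{i=1}^{n} x_i^2 \le 1$. Hence, $E_k \cap \{x : x_1 \le 0\} \subset E_{k+1}$. As for the volume ratio, since the volume of an ellipsoid is proportional to the product of its axis lengths, we have
\[
\frac{Vol(E_{k+1})}{Vol(E_k)} = \frac{\frac{n}{n+1} \left(\frac{n^2}{n^2-1}\right)^{\frac{n-1}{2}}}{1}
= \left(1 - \frac{1}{n+1}\right) \left(1 + \frac{1}{n^2-1}\right)^{\frac{n-1}{2}}
\le e^{-\frac{1}{n+1}}\, e^{\frac{n-1}{2(n^2-1)}} = e^{-\frac{1}{n+1}}\, e^{\frac{1}{2(n+1)}} = e^{-\frac{1}{2(n+1)}}
\tag{9.8.2}
\]
where the inequality is due to $1 + x \le e^x$ for all $x$ (with strict inequality for $x \ne 0$, which gives the strict bound of the lemma).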
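Both claims of the proof are easy to sanity-check numerically. The sketch below evaluates the exact volume ratio from (9.8.2) against the bound of Lemma 9.8.2, and tests membership in the ellipsoid $E_{k+1}$ of the simplified case; the function names and the small floating-point tolerance are choices made for this illustration.

```python
import math

def volume_ratio(n):
    """Vol(E_{k+1}) / Vol(E_k) in the unit-sphere case: product of the axis lengths."""
    return (n / (n + 1)) * (n * n / (n * n - 1.0)) ** ((n - 1) / 2)

def bound(n):
    """Right-hand side of Lemma 9.8.2."""
    return math.exp(-1.0 / (2 * (n + 1)))

def in_next_ellipsoid(x, tol=1e-12):
    """Membership test for E_{k+1} as defined in the simplified case (needs n >= 2)."""
    n = len(x)
    lhs = ((n + 1) / n) ** 2 * (x[0] + 1 / (n + 1)) ** 2 \
        + (n * n - 1) / (n * n) * sum(v * v for v in x[1:])
    return lhs <= 1 + tol
```

The points (-1, 0, 0) and (0, 1, 0) lie on the boundary of the half-ball and also on the boundary of $E_{k+1}$, which reflects that the chosen ellipsoid is tight, while the discarded point (1, 0, 0) falls outside it.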
9.8.1 From Feasibility to Optimization
We need to show how to reduce an optimization problem to the problem of finding a feasible point in a polytope. Let $c^T x$ be the objective function we would like to minimize over $P$, where without loss of generality we may assume that $c \in \mathbb{Z}^n$. Instead of optimizing, we can check the non-emptiness of
\[ P' = P \cap \{x : c^T x \le d + \tfrac{1}{2}\} \]
for $d \in \mathbb{Z}$; our optimal value corresponds to the smallest such $d$ for which $P'$ is non-empty. As $P$ is the convex hull of a set $S \subseteq \{0, 1\}^n$ (see Theorem 9.8.3), $d$ must be in the range $[-n c_{\max}, n c_{\max}]$ where $c_{\max} = \max_i |c_i|$. To find $d$, we can use binary search (checking the non-emptiness of $P'$ with the ellipsoid algorithm). This takes $O(\log(n c_{\max})) = O(\log n + \log c_{\max})$ steps, which is polynomial in the input size.
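The binary search over d can be sketched independently of how non-emptiness is tested. In the sketch below, `is_nonempty` is a hypothetical callback standing in for a run of the ellipsoid algorithm on $P'$; it is assumed to be monotone in d and true at the upper end of the range.

```python
def smallest_feasible_d(is_nonempty, lo, hi):
    """Smallest integer d in [lo, hi] for which is_nonempty(d) holds.

    Assumes is_nonempty is monotone (False, ..., False, True, ..., True)
    and that is_nonempty(hi) is True. Also returns the number of oracle calls.
    """
    calls = 0
    while lo < hi:
        mid = (lo + hi) // 2
        calls += 1
        if is_nonempty(mid):
            hi = mid          # the answer is mid or smaller
        else:
            lo = mid + 1      # the answer is larger than mid
    return lo, calls
```

As a toy usage example, take S = {(1,0,1), (0,1,1), (1,1,0)} and c = (3, -2, 4). Since a linear function attains its minimum over conv(S) at a point of S, a brute-force stand-in for the oracle is `lambda d: min(values) <= d + 0.5`, and the search over [-12, 12] needs 5 oracle calls to return d = 1.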
9.8.2 Starting Ellipsoid
We now consider using the ellipsoid algorithm to find a feasible point in $P'$ or decide that $P'$ is empty. As starting ellipsoid, we can use the sphere centered at $(\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2})$ of radius $\frac{1}{2}\sqrt{n}$ (which goes through all $\{0, 1\}^n$ vectors). This sphere has volume $Vol(E_0) = \frac{1}{2^n} (\sqrt{n})^n\, Vol(B_n)$, where $B_n$ is the unit sphere. We have $Vol(B_n) = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}$, for which, for our purpose here, we can even use the (very weak) upper bound $\pi^{n/2}$. This shows that $\log(Vol(E_0)) = O(n \log n)$.
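The claims about the starting sphere can be checked numerically for small n. The sketch below uses `math.lgamma` from the standard library to evaluate the logarithm of the volume formula above; the function names are illustrative.

```python
import math
from itertools import product

def log_volume_start(n):
    """log Vol(E_0) for the sphere of radius sqrt(n)/2 in R^n."""
    log_unit_ball = (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)
    return n * math.log(math.sqrt(n) / 2) + log_unit_ball

def all_cube_corners_on_sphere(n):
    """Check that every 0/1 vector is at squared distance n/4 from (1/2, ..., 1/2)."""
    r2 = n / 4
    return all(abs(sum((v - 0.5) ** 2 for v in x) - r2) < 1e-12
               for x in product((0, 1), repeat=n))
```

For n = 2 the starting sphere is the disc of radius sqrt(2)/2, whose area is pi/2, and the formula reproduces log(pi/2).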
9.8.3 Termination Criterion
It can be argued through detailed calculations that the ellipsoid algorithm takes polynomial time to find a feasible point or discover that $P'$ is empty. This is because we are using binary search, and the set $P'$ is not too small if it is non-empty. However, the proof is omitted here.
9.8.4 Separation Oracle
One of the crucial steps in the ellipsoid algorithm is to decide, given $x \in \mathbb{R}^n$, whether $x \in P'$. If not, we also need to find a violated inequality. The beauty here is that we do not necessarily need a complete and explicit description of $P$ in terms of linear inequalities. There are examples in which we can even handle descriptions with exponentially many inequalities. What we need is a separation oracle for $P$: given $x^* \in \mathbb{R}^n$, either decide that $x^* \in P$ or find an inequality $a^T x \le b$ valid for $P$ such that $a^T x^* > b$. If this separation oracle runs in polynomial time, we succeed in finding the optimal value $d$ when optimizing $c^T x$ over $P$ (or $S$) in polynomial time.
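To illustrate how a separation oracle can cope with exponentially many inequalities, consider a toy polytope that is not taken from the lecture: P = {x : s^T x <= 1 for every sign vector s in {-1, +1}^n}, which has 2^n defining inequalities. The most violated inequality is obtained by matching each sign to the sign of the corresponding coordinate, so separation takes linear time. The function name below is hypothetical.

```python
def separate_cross_polytope(x):
    """Separation oracle for P = {x : s^T x <= 1 for all s in {-1, +1}^n}.

    Returns None if x is in P; otherwise returns (a, b) such that a^T y <= b
    is valid for P and a^T x > b.
    """
    a = [1 if v >= 0 else -1 for v in x]       # sign pattern maximising a^T x
    if sum(ai * v for ai, v in zip(a, x)) <= 1:
        return None                            # even the worst inequality holds
    return a, 1
```

The oracle never enumerates the 2^n inequalities: if the inequality with the matching sign pattern holds, then all the others hold as well, because that pattern maximises the left-hand side.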
9.8.5 Finding an optimum solution
There is one more issue. This algorithm gives us a point $x^* \in P$ of value at most $d + \frac{1}{2}$, where $d$ is the optimal value. However, we are interested in finding a point $x \in P \cap \{0, 1\}^n = S$ of value exactly $d$. This can be done by starting from $x^*$ and finding any extreme point $x$ of $P$ such that $c^T x \le c^T x^*$. Details are again omitted.
In summary, we obtain the following important theorem shown by Grötschel, Lovász and Schrijver, 1979.
Theorem 9.8.3 Let $S \subseteq \{0, 1\}^n$ and $P = conv(S)$. Assume that $P$ is full-dimensional and we are given a separation oracle for $P$. Then, given $c \in \mathbb{Z}^n$, one can find $\min\{c^T x : x \in S\}$ by the ellipsoid algorithm using a polynomial number of operations and calls to the separation oracle.