
CSC5160: Combinatorial Optimization and Approximation Algorithms

Topic: Introduction to Linear and Integer Programming Date: 14/02/2008 Lecturer: Lap Chi Lau Scribe: Shu Tong Tse, Tony Wing Hong Wong

In this lecture, we will talk about the technique of using Linear Programming (LP) to solve combinatorial optimization problems. The lecture is divided into two parts. In the first part, we discuss the theoretical aspects of LP and illustrate by examples how combinatorial problems can be reformulated as LP problems. In the second part, we introduce two popular algorithms for solving LP problems: the Simplex Method and the Ellipsoid Method.

9.1 The Linear Programming Formulation

In an LP problem, we are given a set of linear constraint functions g_i : Rⁿ → R and constraint values b_i, and our goal is to maximize (or minimize) a linear objective function f : Rⁿ → R. In other words, our task is to find a vector x ∈ Rⁿ which maximizes f(x) = ∑_{j=1}^{n} c_j x_j and satisfies g_i(x) ≤ b_i for all i. LP is a subclass of Mathematical Programming problems, in which the objective function and the constraint functions can be non-linear. In an LP problem, if we require the solution x to be in Zⁿ, then it is called an Integer Linear Programming problem.

Example: Perfect Matching with maximum weight

Suppose e = {u, v} ∈ E(G), i.e. vertex u and vertex v are connected. Let w_e denote the weight associated with the edge e, and let us use x_e to indicate whether u and v are not matched to each other (x_e = 0) or matched (x_e = 1). Then our optimization problem becomes maximizing ∑_{e ∈ E(G)} w_e x_e. How about the constraint that every vertex must be matched to exactly one other vertex? It can be encoded as ∑_{e ∈ δ(v)} x_e = 1 for every vertex v. (Remember that δ(v) denotes the set of edges incident to v.)

Example: Maximum Satisfiability

Remember that the Maximum Satisfiability problem is to find a truth assignment satisfying as many clauses as possible of a formula like

(x_1 ∨ x_2 ∨ x_3) ∧ · · · ∧ (x_3 ∨ ¬x_4 ∨ ¬x_1)

Naturally, we can use the boolean value itself to distinguish between assigning true or false, i.e. x_i = 0 means assigning false and x_i = 1 means assigning true. Note that the whole formula is true if and only if every clause is true. Let us focus on (x_1 ∨ x_2 ∨ x_3) first. It is true iff at least one of x_1, x_2 or x_3 is true. This is equivalent to x_1 + x_2 + x_3 ≥ 1. For (x_3 ∨ ¬x_4 ∨ ¬x_1), note that ¬x_i is true iff 1 − x_i = 1. Therefore the linear constraint can be written as x_3 + (1 − x_4) + (1 − x_1) ≥ 1.

The latter example shows that deciding whether an integer linear program has a feasible solution is NP-complete, since the satisfiability problem reduces to it.
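The clause-to-constraint translation can be checked mechanically. The following is a minimal sketch, assuming clauses are given as lists of signed 1-based integers (a negative number stands for a negated variable); the function names are illustrative and not part of the lecture.

```python
from itertools import product

def clause_to_constraint(clause):
    """Turn a clause into (coeffs, rhs), meaning sum(coeffs[i] * x[i]) >= rhs."""
    coeffs = {}
    rhs = 1
    for lit in clause:
        var = abs(lit)
        if lit > 0:
            coeffs[var] = coeffs.get(var, 0) + 1   # literal x_i contributes x_i
        else:
            coeffs[var] = coeffs.get(var, 0) - 1   # literal NOT x_i contributes 1 - x_i
            rhs -= 1                                # constant 1 moved to the right-hand side
    return coeffs, rhs

def satisfies(coeffs, rhs, x):
    """Check the linear constraint on a 0/1 assignment x (dict: variable -> 0 or 1)."""
    return sum(c * x[v] for v, c in coeffs.items()) >= rhs

def clause_true(clause, x):
    """Evaluate the clause logically on the same assignment."""
    return any((x[abs(lit)] == 1) == (lit > 0) for lit in clause)

# The two clauses written out in the text: (x1 v x2 v x3) and (x3 v NOT x4 v NOT x1).
clauses = [[1, 2, 3], [3, -4, -1]]
for bits in product([0, 1], repeat=4):
    x = dict(zip(range(1, 5), bits))
    for clause in clauses:
        coeffs, rhs = clause_to_constraint(clause)
        assert satisfies(coeffs, rhs, x) == clause_true(clause, x)
```

On 0/1 assignments the linear constraint and the clause agree exactly; the difference only appears once fractional values are allowed.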

9.2 Different Forms of LP

Before proceeding further, let us first discuss how one form of LP can be converted to another form. By applying these conversion techniques, we can choose the most convenient form of LP formulation to solve our original optimization problem. There are altogether 5 useful conversions:

1. maximization problem ⇔ minimization problem

   max c^T x ⇔ min −c^T x

2. equality ⇔ a pair of inequalities

   a_i^T x = b_i ⇔ a_i^T x ≤ b_i and a_i^T x ≥ b_i

3. With the use of a slack variable, we can represent an inequality as a combination of equality and non-negativity constraints

   a_i^T x ≤ b_i ⇔ a_i^T x + s_i = b_i, s_i ≥ 0

4. non-positivity ⇔ non-negativity

   x_j ≤ 0 ⇔ −x_j ≥ 0

5. Restricting x in sign

   For a real-valued x_j unrestricted in sign, we can break it into a positive part x⁺_j and a negative part x⁻_j, write x_j = x⁺_j − x⁻_j and require x⁺_j, x⁻_j ≥ 0.

Using these rules, the canonical form min{c^T x : Ax ≥ b} can be transformed into the standard form min{c^T x⁺ − c^T x⁻ : Ax⁺ − Ax⁻ − Is = b, x⁺, x⁻, s ≥ 0}, in which all variables are non-negative and we have a system of linear equality constraints.
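As an illustration of this transformation, here is a small sketch that builds the standard-form data from (A, b, c) and lifts a canonical-feasible point to the new variables. The function names and the sample numbers are assumptions made for the example only.

```python
def canonical_to_standard(A, b, c):
    """min{c^T x : A x >= b}  ->  min{c_std^T z : A_std z = b, z >= 0},
    where z = (x_plus, x_minus, s) and x = x_plus - x_minus."""
    m = len(A)
    A_std = []
    for i, row in enumerate(A):
        minus_identity = [-1 if k == i else 0 for k in range(m)]   # the -I block
        A_std.append(list(row) + [-a for a in row] + minus_identity)
    c_std = list(c) + [-cj for cj in c] + [0] * m
    return A_std, list(b), c_std

def lift(A, b, x):
    """Map a point x with A x >= b to the corresponding non-negative vector z."""
    x_plus = [max(v, 0) for v in x]
    x_minus = [max(-v, 0) for v in x]
    s = [sum(a * v for a, v in zip(row, x)) - bi for row, bi in zip(A, b)]
    return x_plus + x_minus + s

# Hypothetical data: x has a negative component, so both x_plus and x_minus are used.
A = [[1, -1], [2, 1]]
b = [-5, 1]
c = [1, 2]
x = [-1, 3]
A_std, b_std, c_std = canonical_to_standard(A, b, c)
z = lift(A, b, x)
```

The lifted point satisfies the equality system exactly and has the same objective value as x, which is what makes the two forms interchangeable.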

9.3 Relaxing the constraint

In the above examples, we have constraints of the form x_e ∈ {0, 1}. Because discrete constraints are mathematically inconvenient, we would like to replace them by constraints of the form 0 ≤ x_e ≤ 1.

However, by doing so, the solution of the LP problem may change to a fractional one, which is meaningless in our original optimization problem. Surprisingly, this is NOT the case for many problems: the solution of the new LP problem is still integral! Before discussing the reasons behind this, let us illustrate by one counterexample that fractional solutions may arise.

Example: Fractional Solution may arise after relaxing the constraints

Consider the minimum vertex cover problem for the complete graph on 5 vertices (see Figure 9.3.1). We can formulate it as an integer LP problem as follows:

    min ∑_v x_v
    subject to  x_u + x_v ≥ 1   ∀e = {u, v}
                x_v ∈ {0, 1}    ∀v


Figure 9.3.1: a counterexample

Obviously, the optimal solution covers all but one vertex and thus min ∑_v x_v = 4. However, if we allow 0 ≤ x_v ≤ 1, then the optimal solution becomes x_v = 0.5 ∀v, and the optimal value changes to min ∑_v x_v = 2.5. Besides, this fractional solution makes no sense for the original vertex cover problem.
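The two optimal values can be verified with a short sketch. The integral optimum is found by brute force; for the relaxation we do not call an LP solver but use a counting argument (an addition to the notes): every vertex of the complete graph lies on 4 edges, so summing all 10 edge constraints gives 4 ∑_v x_v ≥ 10.

```python
from fractions import Fraction
from itertools import combinations, product

vertices = range(5)
edges = list(combinations(vertices, 2))   # complete graph on 5 vertices: 10 edges

def feasible(x):
    """Every edge must be covered: x_u + x_v >= 1."""
    return all(x[u] + x[v] >= 1 for u, v in edges)

# Integral optimum by brute force over all 0/1 vectors.
integral_opt = min(sum(x) for x in product([0, 1], repeat=5) if feasible(x))

# The fractional point x_v = 1/2 is feasible for the relaxation.
half = [Fraction(1, 2)] * 5
fractional_value = sum(half)

# Lower bound for the relaxation: sum of all edge constraints is deg * sum(x) >= |E|.
degree = [sum(1 for e in edges if v in e) for v in vertices]
lp_lower_bound = Fraction(len(edges), max(degree))
```

Since the fractional point attains the lower bound, 2.5 is indeed the optimum of the relaxed problem, strictly below the integral optimum of 4.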

9.4 The Geometry of LP

Figure 9.4.2: the geometry of LP

Geometrically, unwanted fractional solutions arise because the corners (the red vertex in Figure 9.4.2) of our enlarged solution space are not integer solutions. Whenever the optimum is finite, an optimal solution is attained at one of the corners (this is proved in Section 9.5). Therefore, for a “good relaxation”, we require every corner to be integral!

9.5 Vertex solutions

Let P = {x : Ax = b, x ≥ 0} be the solution space. Below we give a mathematical definition of the corner concept discussed above:

Definition 9.5.1 x ∈ P is a vertex of P if ∄ y ≠ 0 such that x + y, x − y ∈ P.


Now we prove the following important theorem, which says the optimal solution must be attainable at one of the vertices.

Theorem 9.5.2 Assume min{c^T x : x ∈ P} is finite. Then ∀x ∈ P, ∃ a vertex x′ such that c^T x′ ≤ c^T x.

Proof: If x is a vertex, then take x′ = x and we are done.

If x is not a vertex, then by definition, ∃y ≠ 0 such that x + y, x − y ∈ P. From A(x + y) = b and A(x − y) = b we obtain Ay = 0.

WLOG, we assume c^T y ≤ 0 (we can take either y or −y). In the case c^T y = 0, since y ≠ 0 and c^T y = c^T(−y) = 0, either y or −y must have a negative component; we choose that one and name it y (abusing notation). We then consider x + λy for λ > 0. Note that in any situation we have c^T(x + λy) ≤ c^T x and A(x + λy) = b, so x + λy ∈ P as long as x + λy ≥ 0.

Case 1: ∃j such that y_j < 0. As λ increases, component j of x + λy decreases and eventually reaches zero. Let λ_max be the largest λ such that x + λy ≥ 0. Note that x + λ_max y has at least one more zero component than x: since x ± y ≥ 0, we have y_j = 0 whenever x_j = 0, so no zero component of x becomes non-zero. We replace x by x + λ_max y and repeat from the beginning.

Case 2: y_j ≥ 0 ∀j. By our choice of y above, this implies c^T y < 0. In this case x + λy is feasible for every λ > 0 and c^T(x + λy) → −∞ as λ → ∞, implying that the LP is unbounded, which is a contradiction.

Since x has n components, Case 1 can happen at most n times. By induction on the number of non-zero components of x, we eventually reach a vertex x′.

9.6 Basic Solutions

In this section, we will see how to use bases to obtain solutions to the standard LP problem. Before discussing the details, we shall first prove a theorem which helps us check if x is a vertex.

Theorem 9.6.1 Let P = {x : Ax = b, x ≥ 0}. For x ∈ P, let A_x be the submatrix of A consisting of the columns j such that x_j > 0. Then x is a vertex iff A_x has linearly independent columns (i.e. A_x has full column rank).

Example: For

    A = [ 2 1 3 0 ]
        [ 7 3 2 1 ]
        [ 0 0 0 5 ],    x = (2, 0, 1, 0)^T,

we have

    A_x = [ 2 3 ]
          [ 7 2 ]
          [ 0 0 ],

whose columns are linearly independent; therefore x is a vertex.
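The criterion of Theorem 9.6.1 is easy to test by machine. Below is a minimal sketch using exact rational Gaussian elimination; `column_rank` and `is_vertex` are illustrative names, and `is_vertex` assumes that x is already known to lie in P.

```python
from fractions import Fraction

def column_rank(columns):
    """Rank of a list of column vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]   # transpose to rows
    rank = 0
    for col in range(len(columns)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * p for a, p in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def is_vertex(A, x):
    """Assuming x is in P, test whether the columns of A on the support of x
    are linearly independent."""
    support = [j for j, xj in enumerate(x) if xj > 0]
    columns = [[row[j] for row in A] for j in support]
    return column_rank(columns) == len(support)

A = [[2, 1, 3, 0],
     [7, 3, 2, 1],
     [0, 0, 0, 5]]
print(is_vertex(A, [2, 0, 1, 0]))
```

For the point x = (1, 1, 1, 0)^T, three columns with zero last entry are selected, so they cannot be independent and the test fails.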

Proof: We prove by contrapositive that if A_x has linearly independent columns then x is a vertex. Suppose x is not a vertex. Then, by definition, ∃y ≠ 0 s.t. x + y, x − y ∈ P. Let A_y be the submatrix corresponding to the non-zero components of y. Since Ay = 0 and y ≠ 0, A_y has linearly dependent columns. Moreover, notice that since both x + y, x − y ≥ 0, y_j = 0 whenever x_j = 0. Therefore, A_y is a submatrix of A_x and hence A_x has linearly dependent columns.

We prove the other direction by contrapositive as well. Suppose A_x has linearly dependent columns. Then ∃y s.t. A_x y = 0, y ≠ 0. We extend y to Rⁿ by adding 0 components. Then we have y ∈ Rⁿ s.t. Ay = 0, y ≠ 0 and y_j = 0 whenever x_j = 0. Let y′ = λy. For λ > 0 small enough, as in the proof of Theorem 9.5.2, we can show that x + y′, x − y′ ∈ P, which means x is not a vertex.

Let P = {x : Ax = b, x ≥ 0} be our solution space, where A is an m × n matrix (m constraints on x ∈ Rⁿ). Without loss of generality, we assume that there are no redundant constraints, or equivalently, that the m rows of A are linearly independent. In this case n ≥ m is necessary, so we assume n ≥ m as well.

Our next step is to extract m columns from A to obtain a square matrix. Let B ⊆ {1, · · · , n} such that |B| = m. We retain the columns corresponding to the elements of B and remove the other columns. Denote the new square matrix by A_B. We also remove the corresponding entries in x and name the new vector x_B. If A_B is non-singular, the system A_B x_B = b has a unique solution x_B = A_B^{-1} b. If x_B ≥ 0, then we obtain a solution to the original problem {Ax = b, x ≥ 0} by setting x_N = 0 for N = {1, · · · , n} \ B. By the previous theorem, x is a vertex.

Here is some terminology. A set B as above, with |B| = m and A_B non-singular, is called a basis. When x_B ≥ 0, the corresponding solution x is called a basic feasible solution. Be careful that a basis does not lead to a basic feasible solution if A_B^{-1} b has a negative component. Also, several bases can correspond to the same vertex (by choosing B in different ways).

Example:

    x_1 + x_2 + x_3 = 5
    2x_1 − x_2 + 2x_3 = 1
    x_1, x_2, x_3 ≥ 0

We select as a basis B = {1, 2}. Thus N = {3},

    A_B = [ 1  1 ]
          [ 2 −1 ],     A_B^{-1} b = (2, 3)^T,

and x = (2, 3, 0)^T is a basic feasible (vertex) solution.
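To see all bases of this small example at once, the following sketch enumerates every choice of m columns, solves A_B x_B = b exactly, and reports which basic solutions are feasible. It is a brute-force illustration with hypothetical helper names, not an efficient method; column indices are 0-based in the code.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]      # normalise pivot row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def basic_solutions(A, b):
    """Yield (B, x) for every basis B of A, with x_N = 0 and x_B = A_B^{-1} b."""
    m, n = len(A), len(A[0])
    for B in combinations(range(n), m):
        A_B = [[row[j] for j in B] for row in A]
        x_B = solve_square(A_B, b)
        if x_B is None:
            continue                       # A_B singular: B is not a basis
        x = [Fraction(0)] * n
        for j, value in zip(B, x_B):
            x[j] = value
        yield B, x

A = [[1, 1, 1], [2, -1, 2]]
b = [5, 1]
for B, x in basic_solutions(A, b):
    status = "feasible" if all(v >= 0 for v in x) else "infeasible"
    print(B, [str(v) for v in x], status)
```

Columns 1 and 3 of A are identical, so {1, 3} is not a basis; the remaining two choices both give basic feasible solutions.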

9.6.1 Basic Solutions in General Form

Remember that in the canonical LP we have the constraints {x : Ax ≥ b}, whereas in the standard LP the constraints are {x : Ax = b, x ≥ 0}. In either case, when a constraint holds with equality at x, we say the constraint is tight for x. From our geometric interpretation, intuitively corners are places where inequalities become tight. In a space with n variables, there are at most n linearly independent tight constraints. If a solution x is the intersection of only n′ linearly independent tight constraints where n′ < n, then there are n − n′ ≥ 1 degrees of freedom, so x can be represented as the average of two other solutions and thus is not a vertex solution. So, we have the following characterization of basic solutions, which will be used frequently later in this course.

Lemma 9.6.2 A basic solution x is the unique solution of n linearly independent tight constraints, where n is the number of variables in the linear program.

Note that this characterization holds in general form; it is instructive to check that this generalizes the previous characterization of basic solutions in standard form.


Figure 9.7.3: Visualization of the Simplex Method

9.7 The Simplex Method

The Simplex Method (Dantzig 1951) is a popular algorithm for solving LP problems. Although it has exponential running time in the worst case, it is very efficient in practice. Moreover, it is also easy to understand and simple to implement.

The idea is to focus on vertex solutions, since there is always a vertex which attains optimality. The algorithm starts from an arbitrary vertex and then moves to one of its neighbours with a better objective value. We repeat the procedure until no improvement can be made. At this stage, we have reached a local extremum. Since the solution space is a convex set and the objective function is linear, it is a global extremum as well. Therefore, we have now attained global optimality.

Below we sketch a more mathematical proof of why the Simplex algorithm works.

We begin by writing our LP in the form:

    min c_B^T x_B + c_N^T x_N such that A_B x_B + A_N x_N = b and x_B, x_N ≥ 0

Here B is the basis from which we obtain the basic solution to start from. Notice that for any solution x, x_B = A_B^{-1} b − A_B^{-1} A_N x_N, and that its total cost c^T x can be written as

    c^T x = c_B^T x_B + c_N^T x_N
          = c_B^T (A_B^{-1} b − A_B^{-1} A_N x_N) + c_N^T x_N
          = c_B^T A_B^{-1} b + (c_N^T − c_B^T A_B^{-1} A_N) x_N

We denote the reduced cost of the non-basic variables by d_N^T = c_N^T − c_B^T A_B^{-1} A_N. If there is a j ∈ N such that d_j < 0, then by increasing x_j up from zero we decrease the cost. We can keep on increasing x_j until one of the components of x_B becomes zero. When one of the components does reach zero, we remove it from the basis and replace it by the variable x_j, which was originally non-basic but is now positive. Iterate.

When we reach a stage where there is no j such that d_j < 0, we stop because we are at an optimal solution. This follows from the new expression for c^T x, since x_N is non-negative.
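The iteration just described can be sketched in tableau form. This is a minimal illustration, not the lecture's own implementation: it assumes a feasible starting basis is supplied, uses exact rational arithmetic, and picks entering and leaving variables by smallest index (Bland's rule, one of several possible pivoting rules). The last row of the tableau stores the reduced costs d_j and the negated objective value.

```python
from fractions import Fraction

def pivot(T, r, col):
    """Gauss-Jordan pivot on tableau entry T[r][col]."""
    p = T[r][col]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][col] != 0:
            f = T[i][col]
            T[i] = [a - f * q for a, q in zip(T[i], T[r])]

def simplex(A, b, c, basis):
    """Minimise c^T x subject to A x = b, x >= 0.
    basis: m column indices (0-based) with A_B non-singular and A_B^{-1} b >= 0."""
    m, n = len(A), len(c)
    T = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    T.append([Fraction(v) for v in c] + [Fraction(0)])      # cost row
    row_of, free_rows = {}, list(range(m))
    for j in basis:                       # bring T to x_B + A_B^{-1} A_N x_N = A_B^{-1} b
        r = next(i for i in free_rows if T[i][j] != 0)
        pivot(T, r, j)
        free_rows.remove(r)
        row_of[r] = j                     # row r now represents basic variable j
    while True:
        # Entering variable: smallest index with negative reduced cost.
        entering = next((j for j in range(n) if T[m][j] < 0), None)
        if entering is None:              # all d_j >= 0: optimal
            x = [Fraction(0)] * n
            for r, j in row_of.items():
                x[j] = T[r][n]
            return x, -T[m][n]
        # Ratio test: how far can x_entering grow before a basic variable hits zero?
        candidates = [(T[r][n] / T[r][entering], row_of[r], r)
                      for r in range(m) if T[r][entering] > 0]
        if not candidates:
            raise ValueError("LP is unbounded")
        _, _, leaving_row = min(candidates)
        pivot(T, leaving_row, entering)
        row_of[leaving_row] = entering

# The constraints of the example in Section 9.6, with a hypothetical cost vector.
A = [[1, 1, 1], [2, -1, 2]]
b = [5, 1]
c = [1, 2, 3]
x_opt, value = simplex(A, b, c, basis=[1, 2])    # start from the vertex (0, 3, 2)
```

Starting from the basis {2, 3} (indices 1 and 2 in the code), a single pivot brings x_1 into the basis and reaches the other vertex of the feasible segment.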

Remarks: There are many different rules to choose a neighbor. However, so far every rule has a


9.8 Ellipsoid Algorithm

We now introduce another algorithm to solve linear programs. Although the simplex algorithm is fast in practice, none of its known variants has been shown to run in polynomial time. The ellipsoid algorithm, proposed by the Russian mathematician Shor in 1977 for general convex optimization problems and applied to linear programming by Khachyan in 1979, runs in polynomial time, though it is not very fast in practice compared with the simplex method. However, as it is of polynomial time in theory, it has important consequences for combinatorial optimization problems. The ellipsoid algorithm essentially focuses on finding x ∈ P, where P ⊂ Rⁿ is a given bounded convex set. We shall see later that linear programming can be reduced to finding an x in P = {x ∈ Rⁿ : Cx ≤ d}. The ellipsoid algorithm is as follows:

• Let E_0 be an ellipsoid containing P

• While the center a_k of the ellipsoid E_k is not in P, do

    − Find a vector c such that c^T x ≤ c^T a_k for all x ∈ P

    − Find an ellipsoid E_{k+1} of minimum volume containing E_k ∩ {x : c^T x ≤ c^T a_k}

    − k ← k + 1
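The loop above can be sketched for P = {x : Cx ≤ d} as follows. The update formulas for the minimum-volume ellipsoid are not stated in these notes; the sketch assumes the standard central-cut formulas a_{k+1} = a_k − b/(n+1) and A_{k+1} = n²/(n²−1) · (A_k − 2/(n+1) · b b^T) with b = A_k c / √(c^T A_k c), valid for n ≥ 2. It uses floating point and a fixed iteration cap instead of the termination criterion discussed later.

```python
import math

def ellipsoid_feasibility(C, d, center, radius, max_iter=1000):
    """Search for x with C x <= d, starting from the ball E(center, radius^2 * I).
    Returns a feasible point, or None if max_iter is exhausted. Requires n >= 2."""
    n = len(center)
    a = list(center)
    A = [[radius ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        # Look for a violated inequality; its row serves as the vector c.
        c = next((row for row, di in zip(C, d)
                  if sum(ci * ai for ci, ai in zip(row, a)) > di), None)
        if c is None:
            return a                      # the center lies in P
        # Every x in P satisfies c^T x <= d_i < c^T a, so P lies in the half-ellipsoid.
        Ac = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        scale = math.sqrt(sum(c[i] * Ac[i] for i in range(n)))
        bvec = [v / scale for v in Ac]
        a = [a[i] - bvec[i] / (n + 1) for i in range(n)]
        factor = n * n / (n * n - 1.0)
        A = [[factor * (A[i][j] - 2.0 / (n + 1) * bvec[i] * bvec[j])
              for j in range(n)] for i in range(n)]
    return None

# Hypothetical polytope: the triangle x1 >= 2, x2 >= 2, x1 + x2 <= 5.
C = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
d = [-2.0, -2.0, 5.0]
point = ellipsoid_feasibility(C, d, center=[0.0, 0.0], radius=10.0)
```

Here the explicit list of inequalities plays the role of the separation step; only the ability to produce a violated inequality is used, never the full description of P.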

Before we study this algorithm in greater detail, we first need to define ellipsoids.

Definition 9.8.1 Given a center a and a positive definite matrix A, the ellipsoid E(a, A) is defined as {x ∈ Rⁿ : (x − a)^T A^{-1} (x − a) ≤ 1}.

Recall that for any positive definite matrix A, there exists a matrix B such that A = B^T B, and hence A^{-1} = B^{-1} (B^{-1})^T. In other words, ellipsoids are just affine transformations of unit balls. To illustrate this, consider the (bijective) affine transformation x ↦ y = (B^{-1})^T (x − a). It maps E(a, A) ↦ E(0, I) = {y : y^T y ≤ 1}.

The following lemma ensures that ellipsoids constructed by the ellipsoid algorithm have shrinking volumes. This means that if the set P has positive volume, we will eventually find a point in P. Later, we will need to take care of the case when P has zero volume (for example, when P is a single point or an empty set), and discuss when we can stop and be guaranteed that either we have a point in P or we know that P is empty.

Lemma 9.8.2 Vol(E_{k+1}) / Vol(E_k) < e^{−1/(2(n+1))}.

We just consider a simpler case, in which the ellipsoid E_k is the unit ball centered at the origin and the set P ⊂ {x : x_1 ≤ 0}; the analysis of the general case is similar but more complicated. If we pick the vector c = (1, 0, 0, . . . , 0), then we claim that a possible choice of the ellipsoid E_{k+1} containing E_k ∩ {x : x_1 ≤ 0} is

    E_{k+1} = {x : ((n+1)/n)^2 (x_1 + 1/(n+1))^2 + ((n^2−1)/n^2) ∑_{i=2}^{n} x_i^2 ≤ 1}

and it satisfies the volume ratio given in Lemma 9.8.2.

Figure 9.8.4: Ellipsoid Method

Proof: For any x ∈ E_k ∩ {x : x_1 ≤ 0}, we see that

    ((n+1)/n)^2 (x_1 + 1/(n+1))^2 + ((n^2−1)/n^2) ∑_{i=2}^{n} x_i^2
      = ((n^2+2n+1)/n^2) x_1^2 + ((2n+2)/n^2) x_1 + 1/n^2 + ((n^2−1)/n^2) ∑_{i=2}^{n} x_i^2
      = ((2n+2)/n^2) x_1 (x_1 + 1) + 1/n^2 + ((n^2−1)/n^2) ∑_{i=1}^{n} x_i^2
      ≤ 1/n^2 + (n^2−1)/n^2 = 1                                            (9.8.1)

where the last inequality is due to x_1 ≤ 0, x_1 + 1 ≥ 0 and ∑_{i=1}^{n} x_i^2 ≤ 1. Hence, E_k ∩ {x : x_1 ≤ 0} ⊂ E_{k+1}. As for the volume ratio, since the volume of an ellipsoid is proportional to the product of its axis lengths, we have

    Vol(E_{k+1}) / Vol(E_k) = (n/(n+1)) (n^2/(n^2−1))^{(n−1)/2}
                            = (1 − 1/(n+1)) (1 + 1/(n^2−1))^{(n−1)/2}
                            ≤ e^{−1/(n+1)} e^{(n−1)/(2(n^2−1))}
                            = e^{−1/(n+1)} e^{1/(2(n+1))}
                            = e^{−1/(2(n+1))}                              (9.8.2)

where the inequality is due to 1 + x ≤ e^x for all x; it is strict for x ≠ 0, which gives the strict inequality in Lemma 9.8.2.
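As a quick numerical sanity check of the bound (not a replacement for the proof), the sketch below compares the exact ratio, computed as the product of the semi-axis lengths n/(n+1) and n/√(n²−1) (the latter repeated n−1 times), with e^{−1/(2(n+1))} for a range of dimensions.

```python
import math

def volume_ratio(n):
    """Vol(E_{k+1}) / Vol(E_k) in the unit-ball case: product of the semi-axis lengths."""
    return (n / (n + 1)) * (n * n / (n * n - 1)) ** ((n - 1) / 2)

def bound(n):
    """The upper bound of Lemma 9.8.2."""
    return math.exp(-1 / (2 * (n + 1)))

for n in range(2, 11):
    print(n, round(volume_ratio(n), 6), round(bound(n), 6))
```

Both quantities approach 1 as n grows, which is why the number of iterations needed to shrink the volume by a constant factor grows linearly with n.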


9.8.1 From Feasibility to Optimization

We need to show how to reduce an optimization problem to the problem of finding a feasible point in a polytope. Let c^T x be the objective function we would like to minimize over P, where without loss of generality we may assume that c ∈ Zⁿ. Instead of optimizing, we can check the non-emptiness of

    P′ = P ∩ {x : c^T x ≤ d + 1/2}

for d ∈ Z; our optimal value corresponds to the smallest such d for which P′ is non-empty. As the feasible solutions form a set S ⊆ {0, 1}ⁿ with P = conv(S), d must be in the range [−n c_max, n c_max], where c_max = max_i |c_i|. To find d, we can use binary search (and check the non-emptiness of P′ with the ellipsoid algorithm). This will take O(log(n c_max)) = O(log n + log c_max) steps, which is polynomial in the input size.
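The binary search over d can be sketched as follows. The non-emptiness test is passed in as a function; in the lecture it would be the ellipsoid algorithm, while here a brute-force check over a small, made-up set S stands in for it so that the example is self-contained.

```python
from itertools import product

def smallest_feasible_d(nonempty, lo, hi):
    """Smallest integer d in [lo, hi] with nonempty(d) true, assuming nonempty is
    monotone (false up to some point, true afterwards) and nonempty(hi) is true."""
    while lo < hi:
        mid = (lo + hi) // 2
        if nonempty(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical instance: S = 0/1 vectors of length 3 with at least two ones.
S = [x for x in product([0, 1], repeat=3) if sum(x) >= 2]
c = [3, -2, 5]

def nonempty(d):
    # P' = conv(S) with the extra constraint c^T x <= d + 1/2 is non-empty iff some
    # point of S satisfies it, since a linear function is minimised over conv(S)
    # at a point of S. The inequality is doubled to stay in integer arithmetic.
    return any(2 * sum(ci * xi for ci, xi in zip(c, x)) <= 2 * d + 1 for x in S)

n, c_max = len(c), max(abs(ci) for ci in c)
best = smallest_feasible_d(nonempty, -n * c_max, n * c_max)
```

The number of oracle calls is about log₂(2 n c_max), matching the O(log n + log c_max) bound above.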

9.8.2 Starting Ellipsoid

We need to consider using the ellipsoid algorithm to find a feasible point in P′ or decide that P′ is empty. As starting ellipsoid, we can use the sphere centered at (1/2, 1/2, . . . , 1/2) and of radius (1/2)√n (which goes through all {0, 1}ⁿ vectors). This sphere has volume Vol(E_0) = (1/2ⁿ)(√n)ⁿ Vol(B_n), where B_n is the unit ball. We have that Vol(B_n) = π^{n/2} / Γ(n/2 + 1), for which, for the purpose here, we can even use the (very weak) upper bound of π^{n/2}. This shows that log(Vol(E_0)) = O(n log n).
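The claims about the starting sphere can be checked numerically. This sketch verifies that every 0/1 vector lies at distance (1/2)√n from the center, and compares the exact logarithm of Vol(E_0) with the weak bound and with n log n; note that the bound Vol(B_n) ≤ π^{n/2} is used here for n ≥ 2, where Γ(n/2 + 1) ≥ 1.

```python
import math
from itertools import product

def log_vol_E0_upper(n):
    """log of (1/2^n) * sqrt(n)^n * pi^(n/2), using the weak bound on Vol(B_n)."""
    return -n * math.log(2) + (n / 2) * math.log(n) + (n / 2) * math.log(math.pi)

def log_vol_E0_exact(n):
    """log Vol(E_0), using Vol(B_n) = pi^(n/2) / Gamma(n/2 + 1)."""
    return log_vol_E0_upper(n) - math.lgamma(n / 2 + 1)

n = 4
center = [0.5] * n
radius = math.sqrt(n) / 2
distances = [math.sqrt(sum((ci - vi) ** 2 for ci, vi in zip(center, v)))
             for v in product([0, 1], repeat=n)]

for k in (2, 10, 100):
    print(k, log_vol_E0_exact(k), log_vol_E0_upper(k), k * math.log(k))
```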

9.8.3 Termination Criterion

It can be argued through detailed calculations that the ellipsoid algorithm takes polynomial time to find a feasible point or discover that P′ is empty. This is because we are using binary search, and the set P′ is not too small if it is non-empty. However, the proof is omitted here.

9.8.4 Separation Oracle

One of the crucial steps in the ellipsoid algorithm is to decide, given x ∈ Rⁿ, whether x ∈ P′. If not, we also need to find a violated inequality. The beauty here is that we do not necessarily need a complete and explicit description of P in terms of linear inequalities. There are examples in which we can even handle descriptions with exponentially many inequalities. What we need is a separation oracle for P: given x* ∈ Rⁿ, either decide that x* ∈ P or find an inequality a^T x ≤ b valid for P such that a^T x* > b. If this separation oracle runs in polynomial time, we have succeeded in finding the optimal value d when optimizing c^T x over P (or S).
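To make the idea of separating over exponentially many inequalities concrete, here is a toy sketch. The polytope is a made-up example, not one from the lecture: P = {x : ∑_{i∈T} x_i ≥ 1 for every index set T with |T| = k}. There are (n choose k) constraints, but the most violated one is always given by the k smallest coordinates, so sorting suffices.

```python
from itertools import combinations

def separate(x, k):
    """Separation oracle for the hypothetical polytope
    P = {x : sum_{i in T} x_i >= 1 for all T with |T| = k}.
    Returns None if x is in P, else (a, b) with a^T y <= b valid for P and a^T x > b."""
    smallest = sorted(range(len(x)), key=lambda i: x[i])[:k]   # most violated subset
    if sum(x[i] for i in smallest) >= 1:
        return None
    a = [-1 if i in smallest else 0 for i in range(len(x))]
    return a, -1            # the constraint -sum_{i in T} y_i <= -1

def in_P_brute_force(x, k):
    """Check all subsets explicitly (exponential; only for comparison)."""
    return all(sum(x[i] for i in T) >= 1 for T in combinations(range(len(x)), k))

print(separate([0.5, 0.5, 0.5, 0.5], 2))
print(separate([0.25, 1.0, 0.5, 1.0], 2))
```

The oracle runs in O(n log n) time regardless of how many constraints the description contains, which is exactly the property the ellipsoid algorithm needs.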

9.8.5 Finding an optimum solution

There is one more issue. This algorithm gives us a point x* ∈ P of value at most d + 1/2, where d is the optimal value. However, we are interested in finding a point x ∈ P ∩ {0, 1}ⁿ = S of value exactly d. This can be done by starting from x* and finding any extreme point x of P such that c^T x ≤ c^T x*. Since such an x lies in S and c is integral, c^T x is an integer that is at most d + 1/2, hence equal to d. Details are again omitted.


In summary, we obtain the following important theorem shown by Grötschel, Lovász and Schrijver, 1979.

Theorem 9.8.3 Let S ⊆ {0, 1}ⁿ and P = conv(S). Assume that P is full-dimensional and we are given a separation oracle for P. Then, given c ∈ Zⁿ, one can find min{c^T x : x ∈ S} by the ellipsoid algorithm using a polynomial number of operations and calls to the separation oracle.
