Dynamic programming. Doctoral course Optimization on graphs - Lecture 4.1. Giovanni Righini. January 17 th, 2013

(1)

Dynamic programming

Doctoral course “Optimization on graphs” - Lecture 4.1

Giovanni Righini

January 17th , 2013

(2)

Implicit enumeration

Combinatorial optimization problemsare in general NP-hard and we usually resort toimplicit enumerationto solve them tooptimality(this is also useful forapproximationpurposes).

Two main methods for implicit enumeration are:

• branch-and-bound,

• dynamic programming.

Dynamic programming is also an algorithmic framework to solve polynomial timeproblems efficiently.

It also allows to devisepseudo-polynomial timealgorithms as well as approximationalgorithms.

(3)

An introductory example

Consider the problem of finding a shortest path from node 0 to 9 on this graph, which isdirected,acyclicandlayered.

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

(4)

An introductory example

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

A greedy algorithm from 0 would produce apathof cost 33.

(5)

An introductory example

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

A greedy algorithm from 9 would produce apathof cost 34.

(6)

An introductory example

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

Theoptimal pathhas cost 30.

(7)

Bellman’s optimality principle (1953)

An optimal policy is made of a set of optimal sub-policies.

Bypolicywe mean a sequence of decisions, i.e. of value assignments to the variables).

Asub-policyis a sub-sequence of decisions, i.e. of value assignments to a subset of the variables.

Richard E. Bellman (New York, 1920 - 1984)

(8)

The example revisited

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

A decision must be taken every time the path is extended from one layer to the next.

Apolicyis a path from 0 to 9. Asub-policyis any other path.

(9)

The example revisited

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3

The optimal policy is made of pairs of optimal sub-policies of the form(0,i)and(i,9).

We only need to store optimal sub-policies. All the other sub-policies can be disregarded.

(10)

Dominance

Given two sub-policies S^′ and S^′′,S^′dominates S^′′only if

• all sub-policies that can be appended to S^′′can also be appended to S^′, with no greater cost;

• the cost of S^′ is less than the cost of S^′′.

When this occurs, S^′′can be neglected from further consideration: it cannot be part of an optimal policy.

(11)

The example revisited

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3 18

In our example, all sub-policies leading to the same node (paths of the form(0,i)), can be completed by appending to them the same sub-policies (paths of the form(i,9)).

Then we need to store onlyan optimal oneamong them. All the others aredominated.

(12)

The example finally solved

5

8 2

0

7

3

1 4

6

9

6

8

13

7 12 8 10 15 8 9

7 8 20 15

4 3 18

6

8

13

15

20

30

26

30 0

LetL = {0, . . . ,L}be the set of layers.

LetNl the subset of nodes in layer l ∈ L.

Let w be the weight function of the arcs of the graph.

• c(0) =0

• ∀l∈ L,l ≥1 ∀j ∈ Nl c(j) =min_i∈N_l

−1{c(i) +wij}

(13)

Terminology and correspondence

• Nodes in the graph are states.

• Arcs of the graph are state transitions.

• Paths in the graphs are (sub-)policies.

• Values c(i)(costs of shortest paths from 0 to i) are labels.

The solution process applies these two rules:

• Initialization (recursion base): c(0) =0

• Label extension (recursive step):

∀l∈ L,l ≥1 ∀j ∈ Nl c(j) =min_i∈N_l−1{c(i) +wij}.

It resembles recursive programming, but it is bottom-up instead of top-down.

The execution of a dynamic programming algorithm resembles the evolution of adiscrete dynamic system (automaton).

The recursive step requires to solve a very easyoptimization problem.

(14)

Another example

Consider the problem of finding a shortestHamiltonianpath from node 0 to node 5 on this graph, that isdirectedbutcyclic.

0

3 1 2

4

5

A policy (i.e. a path from 0 to 5) is feasible if and only if it visits all nodes exactly once.

(15)

Dominance?

0

3 1 2

4

5

Reaching the same node is no longer sufficient for a path (sub-policy) to dominate another: they do not reach the samestate.

(16)

Dominance

0

3 1 2

4

5

Only if they have also visited the same subset of nodes, then they reach the same state.

• Initialization: c(0, {0}) =0

• Extension: c(j,S) =min_i∈S\{j}{c(i,S\{j}) +w_ij} ∀j6=0.

(17)

Complexity

The two D.P. algorithms have differentcomplexity.

Ex. 1: n. states = n. of distinct values of j = n. of nodes in the graph.

• State:(i)[last reached node];

• Initialization: c(0) =0;

• Extension: c(j) =min_i_∈N_l−1{c(i) +w(i,j)} ∀l ≥1∈ L, ∀j ∈ N_l. Ex. 2: n. states = n. distinct values of(j,S)= exponential number in the n. of nodes in the graph.

• State:(S,i)[visited subset, last reached node];

• Initialization: c(0, {0}) =0;

• Extension: c(j,S) =min_i∈S\{j}{c(i,S\{j}) +wij} ∀j6=0.

This is equivalent to solve the same problem as in Example 1 but on a larger state graph (one node for each distinct value of(j,S)).

The complexity of the D.P. algorithm depends on thenumber of arcs in the state graph.

(18)

Comparison with branch-and-bound

-

100 00

110 010

101 001

011

11 10 01 0

1

B&B: sub-policies onlydiverge.

-

111 00

001 000

011

11 01 0

1

D.P.: sub-policies sometimes converge.

In both cases graphs aredirected,acyclicandlayered.

But one is an arborescence, the other can be not.

(19)

Dynamic Programming in three steps

In order to design a D.P. algorithm we need to:

1. put the decisions (i.e. the variables) in asequence(policy);

2. define thestate, i.e. the amount of information needed to transform a sub-policy into a complete policy;

3. find therecursive extension function (REF)to compute the labels of the states, following the sequence.

The state of a dynamic system at time t summarizes the past history of the system up to time t, such that the evolution of the system after t depends on the state at time t but not on how it has been reached.

Solving a problem with D.P. amounts at defining a suitable state-transition graph, on which we search for an optimal path.

This graph isdirected,acyclicandlayered.

The number of its nodes and arcs determines thecomplexityof the D.P. algorithm.

(20)

Shortest path on a digraph

Consider the problem of finding ashortest pathfrom node 0 to node 5 on thedirectedandcyclicgraph of the previous example, assuming there are no cycles with negative cost.

0

3

1 2

4

5

(21)

Shortest path on a digraph

0

3

1 2

4

5

We cannotsetthe node labels once for ever: we need tocorrectthem every time we discover a better path to the same node.

However no optimal path can take more than N−1 arcs (in this case, 5 arcs).

The labels we compute at each extension areoptimal for each number of arcs of the path.

(22)

Shortest path on a digraph

We reformulate the problem on a directed,acyclicandlayeredgraph, with N layers.

0 0

3 1

2

4

5

0

3 1

2

4

5

0

3 1

2

4

5

….

• State:(l,i)[n. extensions, last reached node];

• Initialization: c(0,0) =0;

• Extension: c(l,j) =

=min_i∈N{c(l−1,i) +w_ij}

∀l≥1∈ L, ∀j ∈ N.

(23)

Bellman-Ford algorithm (1958)

Thelabel correctingalgorithm obtained in this way is known as the Bellman-Ford algorithm.

• Time complexity: O(N³), as the number of arcs in the layered graph.

• Space complexity: O(N²), to represent the graph. For each node inN we need to store acostand apredecessor.

No label can be considered permanent until:

• either all layers have been labeled,

• or no label update has occurred from one layer to the next one.

(24)

Dijkstra algorithm (1959)

Under the hypothesis that arc weights are non-negative, we can permanentlysetthe label of minimum cost in each layer. Thelabel settingalgorithm obtained in this way is known as theDijkstra algorithm.

b

0

3 1

2

4

5

1

2

4

5

1

2

4

….

(25)

Dijkstra algorithm (1959)

• State:(l,i)[layer, node];

• Initialization: c(0,0) =0; last=0; Permanent(0) = {0};

• Extension:

• c(l,j) =min{c(l−1,j),c(l−1,last(l−1)) +w(last(l−1),j)}

∀l≥1∈ L,∀j6∈Permanent(l−1);

• Permanent(l) =Permanent(l−1) ∪ {minj6∈Permanent(l−1){c(l,j)}}.

Every node has only two predecessors: the time complexity is O(N²).

(26)

Shortest trees and arborescences

Bellman-Ford’s and Dijkstra’s algorithms compute theshortest paths arborescence, that contains all shortest paths from the root node to all the others (for the optimality principle).

With a slight modification we can obtain Prim’s algorithm to compute theminimum spanning treeon weighted (unoriented) graphs.

All these are examples ofdynamic programming algorithmsfor polynomial-time combinatorial optimization problems.

(27)

D.P. for NP-hard problems: the knapsack problem

min z=X

j∈N

cjxj

s.t. X

j∈N

a_jx_j ≤b

xj ∈ {0,1} ∀j∈ N

Assume b is a known constant and all data are integer.

The number of possible solutions is 2^Nand, even worse, the problem is NP-hard!

(28)

D.P. for NP-hard problems: the knapsack problem

• Policy: sort the items inN (the variables) from x1to xN.

• State:

• Feasibility depends on theresidual capacity;

• Cost does not depend on previous decisions.

Hence the state is given bythe last item considered(i) andthe capacity used so far(u).

• R.E.F.:

• Initialization: z(0,0) =0;

• Extension: z(i,u) =max{z(i−1,u),z(i−1,u−ai) +ci}.

(29)

D.P. for NP-hard problems: the knapsack problem

i-1, u Xi=0

Xi=1

i, u+ai

i, u

The state graph has a layer for each item (variable) j ∈ N and b+1 nodes per layer.

Complexity: The graph has O(Nb)nodes and each of them has only two predecessors. Then the D.P. algorithm has complexity O(Nb), which ispseudo-polynomial.

(30)

D.P. for strongly NP-hard problems: the ESPP

Consider the problem of finding theelementaryshortest path from s to t in a weighted directed cyclic graphwith general weights.

s

2 1

t

1 -1

The constraint that the path must be elementary is necessary in presence of negative cycles, because otherwise a finite optimal solution may not exist.

(31)

D.P. for strongly NP-hard problems: the RCESPP

Then we need to put into the state the information on which nodes have already been visited.

• State:(S,i)(visited subset, last reached node);

• Initialization: c({s},s) =0;

• Extension: c(S,j) =min_i∈S\{j}{c(S\{j},i) +w_ij} ∀j6=0.

The problem can also have other constraints, represented as consumptions ofresources(capacity, time,...): this problem is called the Resource Constrained Elementary Shortest Path Problem.

• State:(S,q,i)(visited subset, resource consumption, last reached node);

• Initialization: c({s},0,s) =0;

• Extension:

c(S,q,j) =min_i∈S\{j}{c(S\{j},q−dj,i) +wij} ∀j 6=0, where d_j is the consumption of resources when visiting node j.

(32)

Dominance

A label L^′= (S^′,q^′,i^′)dominates a label L^′′= (S^′′,q^′′,i^′′)only if:

• S^′⊆S^′′

• q^′ ≤q^′′

• i^′=i^′′

• c(L^′) ≤c(L^′′).

Complexity: the n. of states growsexponentiallywith the size of the graph.

(33)

Bi-directional extension

Inbi-directional D.P.labels are extended both forward from vertex s to its successors and backward from vertex t to its predecessors.

Backward states, recursive extension functions and dominance rules are symmetrical.

A path from s to t is detected each time a forward state and a backward state can be feasibly joined through arc(i,j).

Let L^fw = (S^fw,q^fw,i)be a forward path and L^bw = (S^bw,q^bw,j)be a backward path. When they are joined, the cost of the resulting s-t path is c(L^fw) +w_ij+c(L^bw).

The two paths can be joined subject to:

• S^fw ∩S^bw = ∅

• q^fw +q^bw ≤Q

(34)

Stopping at a half-way point

We can stop extending a path in one direction when we have the guarantee that the remaining part of the path will be generated in the other direction and therefore no optimal solution will be lost.

Then the extension of forward and backward labels isstoppedwhile preserving the guarantee that the optimal solution will be found.

Without this the bi-directional algorithm would simply produce twice as many labels compared to the mono-directional one.

Possible stop criteria:

• stop when N/2 arcs have been used;

• stop when half the overall available amount of a resource has been consumed.

(35)

Bounding for fathoming labels

Boundingis used as in branch-and-bound algorithms to detect labels that are not worth extending because lead to optimal solutions.

For instance, given a label L= (S,q,i)it is possible to compute a lower bound LB(L)on the costs to be paid after visiting node i with the remaining amount Q−q of available resources.

min LB= X

j∈N \S

b_jy_j

subject to X

j∈N \S

djyj ≤Q−q

yj ∈ {0,1} ∀j ∈ N \S

where bj is a lower bound on the cost of visiting node j6∈S and dj is the resource consumption when visiting node j.

Label L can be fathomed ifc(L) +LB≥UB, where UB is an incumbent upper bound.

(36)

State space relaxation

State space relaxationwas introduced by Christofides, Mingozzi and Toth in 1981.

The state spaceSexplored by the D.P. algorithm is projected onto a lower dimensional spaceT, so that each state inT retains the minimum cost among those of its corresponding states inS (assuming minimization).

SSR: S 7→ T such that c(t) =mins∈S:SSR(s)=t{c(s)}.

In this way:

• the number of states to be explored is drastically reduced;

• some infeasible states s inS can be projected onto a state t corresponding to a feasible solution inT.

The D.P. algorithm exploringT instead ofS is faster and it does not guarantee to find an optimal solution, but rather a dual bound.

(37)

SSR of the set of visited nodes

We map each state(S,q,i)onto a new state(σ,q,i), whereσ = |S|

represents the number of nodes visited (excluding s).

Dominance condition S^′⊆S^′′is replaced byσ^′ ≤ σ^′′.

Sinceσ ≤N the D.P. haspseudo-polynomial time complexity.

Since the state does no longer keep information about the set of already visited vertices, cycles are no longer forbidden; the solution

• is guaranteed to be feasible with respect to the resource constraints;

• is not guaranteed to be elementary.

This technique can also be applied to bi-directional search.

(38)

Decremental state space relaxation

Decremental SSRallows to tune a trade-off between D.P. with SSR (usingσ) and exact D.P. (using S):

• a setV ⊆ N ofcritical nodesis defined;

• S is now a subset ofV;

• σcounts the overall number of visited nodes.

• ForV = N, DSSR is equivalent toexact D.P..

• ForV = ∅, DSSR is equivalent toD.P. with SSR.

The algorithm is executed multiple times and after each iteration the nodes visited more than once are inserted intoV.

The algorithm ends when the optimal solution is also feasible (the path is elementary). An increasingly betterlower boundis produced at each iteration.

Computational experiments show that in many cases a critical set containing about 15%of the nodes is enough.