Implementation Details and Variants - On Flows, Paths, Roots, and Zeros

We describe some implementation choices and details.

Lazy Potential Update. The update of the potentials in DualStep should not be done in a naive way, since this would directly imply quadratic running time in n for a single iteration. Instead, the potentials can be updated lazily, as already described in [BK14], in such a way that the potential y_v of a node v is only set once in each call of DualStep, namely at the time when v enters S_k for some k.
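The saving from the lazy update can be illustrated with a minimal sketch. It assumes, purely for illustration, that the naive rule shifts the potential of every node that has not yet entered the set by the step size of the current step; the sign convention, the names naive_update and lazy_update, and the encoding of the search as an array entry_step are hypothetical and not taken from [BK14].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoding of one DualStep call: entry_step[v] = k means that
// node v enters the set at the end of step k (0-based); the value
// delta.size() means that v never enters. delta[k] is the step size of step k.

// Naive rule (assumed): in every step, shift all nodes that have not yet
// entered the set. This costs O(n) per step, hence O(n^2) overall.
std::vector<long long> naive_update(std::vector<long long> y,
                                    const std::vector<std::size_t>& entry_step,
                                    const std::vector<long long>& delta) {
    assert(y.size() == entry_step.size());
    for (std::size_t k = 0; k < delta.size(); ++k)
        for (std::size_t v = 0; v < y.size(); ++v)
            if (entry_step[v] >= k) y[v] += delta[k];
    return y;
}

// Lazy rule: accumulate the step sizes once and touch every potential
// exactly one time, using the accumulated shift at the time v enters.
std::vector<long long> lazy_update(std::vector<long long> y,
                                   const std::vector<std::size_t>& entry_step,
                                   const std::vector<long long>& delta) {
    assert(y.size() == entry_step.size());
    const std::size_t steps = delta.size();
    std::vector<long long> acc(steps + 1, 0);  // acc[k] = delta[0] + ... + delta[k]
    long long running = 0;
    for (std::size_t k = 0; k < steps; ++k) {
        running += delta[k];
        acc[k] = running;
    }
    acc[steps] = running;  // nodes that never enter receive the full shift
    for (std::size_t v = 0; v < y.size(); ++v) {
        assert(entry_step[v] <= steps);
        y[v] += acc[entry_step[v]];
    }
    return y;
}
```

Both functions return the same potentials; the lazy one needs time linear in the number of nodes plus the number of steps.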

Choosing the Starting Node. We implemented different ways of choosing the starting node s ∈ V from which the search in DualStep takes off. The first variant is to start with the first node and, as soon as this one has deficit 0, move over to the second one, and so on. This is what is proposed in Section 2.2 as well as in the pseudo-code; we refer to it as snbal. The second variant is to always choose the node with maximal absolute deficit; we call it maxdef. A third variant combines the previous two: choose the node with maximal absolute deficit and keep it as the starting node until it is balanced; then move on to the node that now has maximal absolute deficit, and so on. We call this variant snbaldef.
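The three selection rules can be sketched as follows. The function names, the representation of the deficits as a plain vector, and the convention of returning the number of nodes when every node is balanced are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Deficits = std::vector<long long>;

// maxdef: the node with maximal absolute deficit (b.size() if all are balanced).
std::size_t choose_maxdef(const Deficits& b) {
    std::size_t best = b.size();
    long long best_abs = 0;
    for (std::size_t v = 0; v < b.size(); ++v) {
        const long long a = std::llabs(b[v]);
        if (a > best_abs) {
            best_abs = a;
            best = v;
        }
    }
    return best;
}

// snbal: keep the current node while it is unbalanced, otherwise advance
// in index order to the next unbalanced node.
std::size_t choose_snbal(const Deficits& b, std::size_t current) {
    assert(current <= b.size());
    while (current < b.size() && b[current] == 0) ++current;
    return current;
}

// snbaldef: keep the current node while it is unbalanced, otherwise switch
// to the node with maximal absolute deficit.
std::size_t choose_snbaldef(const Deficits& b, std::size_t current) {
    if (current < b.size() && b[current] != 0) return current;
    return choose_maxdef(b);
}
```

Each function would be called once per iteration, with current set to the starting node of the previous iteration (0 in the first iteration).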

Different Tree Updates in Primal Step. The primal update described in the pseudo-code in Section 2.2 can be implemented efficiently as follows: Let T_0 be the spanning tree in the residual network G_x (rooted at the starting node s) that is returned by DualStep, and let T = A ∩ (T_0 ∪ −T_0) be the corresponding spanning tree in G that is given to PrimalStep. Compute f as follows: Traverse the tree from bottom to top. At a node v ∈ T, let a = ±(v, w) ∈ A be the arc between v and the node w on the level above v. Now, send the amount of flow f_a along the arc a that minimizes |b_{x+f}(v)|, respecting the capacity and non-negativity constraints corresponding to a. More formally, let

f_a ← max{0, min{u^x_a, |b_{x+f}(v)|}}, and set x_a ← x_a + f_a.

Then, update the deficits at v and w accordingly. Thereafter, traverse the tree from top to bottom and, at each node v, for any arc a = ±(v, w) between v and a child w, change f as follows: Reduce f_a such that min{0, b_x(v)} ≤ b_{x+f}(v) ≤ max{0, b_x(v)} holds. The idea of this approach is to first greedily send the maximum amount of deficit up the tree and then, in the second traversal, to correct the flows such that the absolute deficit of each node never increases and its sign stays the same.

We implemented another variant of the algorithm that we obtain by using a different primal update. The idea of this alternative approach is not to traverse the tree back down, but to stay with the greedily chosen flows f after the first traversal of the tree from bottom to top. Note that the difference between this and the original approach is that for an arc a that goes up in T from v to w, we ignore the change in |b_{x+f}(w)| while choosing f_a. We call this variant defupt for deficit up tree, and the original version rbn for restore balanced nodes. The latter name originates from the fact that in this version all nodes with deficit zero stay at zero deficit after the primal update.
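A minimal sketch of the single bottom-up traversal may help to see why the deficit of a parent can grow in this variant. The tree encoding (a parent index and two residual capacities per node), the sign convention (a positive deficit is pushed towards the parent, a negative one is served from the parent), and the name greedy_up are assumptions of this sketch and not the data structures of the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TreeArc {
    std::size_t parent;   // node on the level above
    long long cap_up;     // residual capacity from the node to its parent
    long long cap_down;   // residual capacity from the parent to the node
};

// Processes the non-root nodes in bottom-up order (children before parents)
// and greedily moves as much deficit as the tree arc allows to the parent.
// Returns the signed flow per node: positive means "towards the parent".
std::vector<long long> greedy_up(std::vector<long long>& b,
                                 const std::vector<TreeArc>& arc,
                                 const std::vector<std::size_t>& bottom_up) {
    assert(b.size() == arc.size());
    std::vector<long long> flow(b.size(), 0);
    for (std::size_t v : bottom_up) {
        const std::size_t w = arc[v].parent;
        if (b[v] > 0) {
            const long long f = std::max(0LL, std::min(arc[v].cap_up, b[v]));
            b[v] -= f;
            b[w] += f;  // the effect on |b[w]| is ignored when choosing f
            flow[v] = f;
        } else if (b[v] < 0) {
            const long long f = std::max(0LL, std::min(arc[v].cap_down, -b[v]));
            b[v] += f;
            b[w] -= f;
            flow[v] = -f;
        }
    }
    return flow;
}
```

On a path with root 0, inner node 1, and leaf 2 with deficits -4, 1, and 3, the leaf pushes 3 units to node 1, whose absolute deficit thereby temporarily grows from 1 to 4 before it is passed on to the root.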

Breaking out of the Dual Step. Another variant of the algorithm is obtained by breaking out of DualStep after fewer than n − 1 steps. We experimented with several different approaches here. The first stops as soon as a node is found whose deficit has a different sign than that of the starting node. Note that this approach strongly resembles successive shortest path, with the only difference being that the primal update is still done on a tree and not only on the path from s to that node. This version did not prove very efficient on the instances tested; however, a relaxed version of it brings great advantages. The idea is to break out of DualStep only in the iteration k in which the deficit b_x(S_k) of the set S_k changes its sign, or even only when the deficit hits exactly zero, i.e., b_x(S_k) = 0. We call these two variants sc and def0 for sign changes and deficit zero, respectively. Note that in the lazy potential update version, the potentials of v ∈ V \ S_t have to be shifted by the last ∆ in order to maintain the invariants. The advantage of these approaches is that they do not search through the whole graph in each iteration. This, of course, comes at the cost of a poorer dual update, but might be beneficial when large portions of the graph already have optimal node potentials.
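The two relaxed stopping rules can be sketched on the sequence of deficits of the nodes in the order in which they enter the set. The enum BreakRule, the function steps_until_break, and the decision to let sc also stop when the accumulated deficit is exactly zero are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class BreakRule { None, SignChange, DeficitZero };

// deficit[k] is the deficit of the node that enters the set in step k;
// deficit[0] belongs to the starting node and is assumed to be non-zero.
// Returns the number of nodes in the set when the search stops.
std::size_t steps_until_break(const std::vector<long long>& deficit,
                              BreakRule rule) {
    assert(!deficit.empty() && deficit[0] != 0);
    const bool start_positive = deficit[0] > 0;
    long long total = deficit[0];  // accumulated deficit of the current set
    for (std::size_t k = 1; k < deficit.size(); ++k) {
        total += deficit[k];
        if (rule == BreakRule::DeficitZero && total == 0) return k + 1;
        if (rule == BreakRule::SignChange &&
            (total == 0 || (total > 0) != start_positive)) {
            return k + 1;
        }
    }
    return deficit.size();  // no break: the whole graph was searched
}
```

With the deficits 5, -2, -4, 1 the accumulated values are 5, 3, -1, 0, so sc stops after three nodes whereas def0 continues until the fourth node has entered.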

Priority Queue. From a theoretical point of view, the in- and outgoing arcs of S_k should be kept in a priority queue such as a Fibonacci heap [FT87]. In this case, the complexity of DualStep is O(m + n log n), similar to Dijkstra's algorithm [Dij59]. We implemented the following two approaches. First, we implemented our algorithm using the STL priority_queue that is built into C++. Second, we implemented what we call a hybrid_queue. It can be seen as a hybrid of a bucket queue and the standard priority queue. The hybrid_queue stores a normal array called bucket, a priority queue Q (we again used the STL implementation), and a variable called lower_bound. It provides two functions called push() and top(). If push() is called with an element v with key k = lower_bound, it is stored in bucket. If v, however, has key k > lower_bound, it is stored within Q. If k < lower_bound, then all elements in bucket are moved to Q, v is added to bucket, and lower_bound is set to k. If top() is called, then an element is taken from bucket as long as it is non-empty. If bucket is empty, the minimum element of Q is returned, and lower_bound is set to the key of that element. In order to explain the idea underlying this approach, let us assume that the same starting node s is chosen for several iterations. Then, it is likely that there are a lot of arcs with reduced costs 0 spanning from s into the graph, and these arcs can be taken out of the bucket at smaller cost using this approach.
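The behaviour of the hybrid queue described above can be sketched in a few lines. The class name HybridQueue, the element type (an integer id with an integer key), the initial value of lower_bound, and the choice that top() also removes the returned element (unlike top() of the STL priority_queue) are assumptions of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

class HybridQueue {
public:
    using Key = long long;
    using Item = std::pair<Key, int>;  // (key, element id)

    void push(int v, Key k) {
        if (k == lower_bound_) {
            bucket_.push_back(v);              // cheap O(1) path
        } else if (k > lower_bound_) {
            heap_.push(Item(k, v));
        } else {
            // New minimum key: the bucket content moves to the heap.
            for (int u : bucket_) heap_.push(Item(lower_bound_, u));
            bucket_.clear();
            bucket_.push_back(v);
            lower_bound_ = k;
        }
    }

    // Removes and returns an element with minimum key.
    Item top() {
        assert(!empty());
        if (!bucket_.empty()) {
            const int v = bucket_.back();
            bucket_.pop_back();
            return Item(lower_bound_, v);
        }
        const Item item = heap_.top();
        heap_.pop();
        lower_bound_ = item.first;
        return item;
    }

    bool empty() const { return bucket_.empty() && heap_.empty(); }

private:
    // All elements in bucket_ have key lower_bound_, and no key in heap_
    // is smaller than lower_bound_.
    std::vector<int> bucket_;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap_;
    Key lower_bound_ = std::numeric_limits<Key>::max();
};
```

Starting with the largest representable key as lower_bound means that the first pushed element always lands in the bucket. Elements whose key equals the current minimum never touch the heap, which is the case the approach is designed for.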

Greedy Dual Ascent Update. Clearly, the evaluation of g_S at all event-points would yield quadratic run-time for the dual step. However, the arcs on the cut can be kept in a balanced binary search tree, see for example [AL62] or [GS78], with the keys being the event-points κ(a). In addition, for every a ∈ δ^out(S) ∪ δ^in(S), we store a prefix-sum and a postfix-sum. The prefix-sum corresponds to the sum of the capacities of all out-going arcs b with κ(b) < κ(a), and the postfix-sum is equal to the sum of the capacities of all in-going arcs b with κ(b) ≥ κ(a). The update of the balanced binary search tree can be done in logarithmic time in this way, as well as the computation of the maximum of f_S. We use AVL-trees [AL62] for the cut edges and complement our own implementation with the ability of computing the prefix- and postfix-sums as described above. We denote the greedy dual ascent algorithm with min_cost_flow_gda. We remark that the greedy dual step, in the form described in the pseudo-code above, does not take any information about the primal iterate into account. An alternative approach that takes the primal iterates into account is to run the greedy dual step on the residual network instead of the original network. We call this variant min_cost_flow_gdar.
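To make the definition of the two sums concrete, the following sketch computes them for a fixed set of cut arcs. It deliberately replaces the balanced search tree by a sorted array, so it shows only what the stored values are, not how they are maintained in logarithmic time under insertions and deletions. The names CutArc, CutSums, and build_sums are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct CutArc {
    long long kappa;     // event-point of the arc
    long long capacity;
    bool outgoing;       // true for arcs leaving S, false for arcs entering S
};

struct CutSums {
    std::vector<CutArc> arcs;        // sorted by event-point
    std::vector<long long> prefix;   // out-going capacity with smaller kappa
    std::vector<long long> postfix;  // in-going capacity with kappa >= own kappa
};

CutSums build_sums(std::vector<CutArc> arcs) {
    std::sort(arcs.begin(), arcs.end(),
              [](const CutArc& a, const CutArc& b) { return a.kappa < b.kappa; });
    const std::size_t n = arcs.size();
    CutSums s{std::move(arcs), std::vector<long long>(n, 0),
              std::vector<long long>(n, 0)};

    long long out_sum = 0;
    for (std::size_t i = 0; i < n;) {  // groups of equal event-points
        std::size_t j = i;
        while (j < n && s.arcs[j].kappa == s.arcs[i].kappa) ++j;
        for (std::size_t k = i; k < j; ++k) s.prefix[k] = out_sum;
        for (std::size_t k = i; k < j; ++k) {
            assert(s.arcs[k].capacity >= 0);
            if (s.arcs[k].outgoing) out_sum += s.arcs[k].capacity;
        }
        i = j;
    }

    long long in_sum = 0;
    for (std::size_t j = n; j > 0;) {  // same groups, from the back
        std::size_t i = j;
        while (i > 0 && s.arcs[i - 1].kappa == s.arcs[j - 1].kappa) --i;
        for (std::size_t k = i; k < j; ++k)
            if (!s.arcs[k].outgoing) in_sum += s.arcs[k].capacity;
        for (std::size_t k = i; k < j; ++k) s.postfix[k] = in_sum;
        j = i;
    }
    return s;
}
```

Arcs with equal event-points share the same pair of sums, because the prefix-sum uses a strict and the postfix-sum a non-strict comparison.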

Since the ambition of min_cost_flow_gda and min_cost_flow_gdar is to yield better progress in terms of the dual objective value at the cost of progress in primal feasibility, it might also be interesting to combine iterations of greedy dual ascent with iterations of the standard dual ascent method. We tried out the following two variants: min_cost_flow_gda_aq and min_cost_flow_gdar_aq. In these two methods, the aq stands for alternating queue, i.e., we use gda and gdar, respectively, in every second iteration and the default variant of the dual ascent algorithm in the remaining iterations.

Experimental Setting and Evaluation Details. We performed experiments on a compute server with 32 Intel (R) Xeon (R) E5-2680 2.70GHz cores and a total of 256 GB RAM running Debian GNU/Linux 7 with kernel 3.10.60. The code was compiled with gcc version 4.7.2 using the -O3 flag. In all plots in this chapter, the results are averages over 25 runs, 5 runs each on 5 graphs of that size. The error bars indicate the 95%-confidence intervals of the estimated means over 25 runs.
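For reference, a confidence interval for a mean over a small number of runs can be computed as in the following sketch. It assumes the usual Student t interval (mean plus or minus the t quantile times the standard error); the text does not state which estimator was used, and the names Interval and confidence_interval are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Interval {
    double mean;
    double half_width;  // the interval is [mean - half_width, mean + half_width]
};

// t_quantile is the two-sided 95% quantile of the t distribution with
// runs.size() - 1 degrees of freedom, supplied by the caller.
Interval confidence_interval(const std::vector<double>& runs, double t_quantile) {
    assert(runs.size() >= 2);
    const double n = static_cast<double>(runs.size());
    double sum = 0.0;
    for (double r : runs) sum += r;
    const double mean = sum / n;
    double squares = 0.0;
    for (double r : runs) squares += (r - mean) * (r - mean);
    // Standard error of the mean based on the unbiased sample variance.
    const double std_err = std::sqrt(squares / (n - 1.0) / n);
    return {mean, t_quantile * std_err};
}
```

For 25 runs, the quantile with 24 degrees of freedom is approximately 2.064.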

2.4 Experimental Evaluation
