
Tuning Algorithms, Tuning Code

4.1 Reducing Instruction Counts

Algorithm tuning has received less systematic treatment than code tuning in the experimental literature; therefore, we start with two case studies to illustrate the basic approach. Section 4.1.1 extends these examples and presents some general guidelines for tuning algorithms.

Case Study: Bin Packing by Exhaustive Search

The bin packing problem was introduced in Section 3.2: Given a list L containing n weights from the real interval [0,1), pack them into unit-capacity bins so as to minimize the number of bins used.

Figure 4.1 shows an example of 10 weights packed two different ways. Packing (a) is the result of using the next fit packing algorithm. Next fit works through the list from left to right, maintaining a single “open” bin as it goes. It packs as many weights as possible into the open bin and starts a new bin when the next weight will not fit. Thus (.4, .1) are packed into the first bin, but .6 will not fit, so it starts a new bin, and so forth. The next fit rule packs this list into seven bins, while the optimal packing (b) uses only five bins to pack the same list.

[Figure 4.1 diagram. List: .4 .1 .6 .9 .2 .2 .7 .5 .1 .8. (a) Next fit, packing order: .4 .1 .6 .9 .2 .2 .7 .5 .1 .8. (b) Optimal, packing order: .9 .1 .8 .2 .7 .2 .6 .4 .5 .1.]

Figure 4.1. Bin packing. Two packings of the same list of 10 weights. The next fit packing uses seven bins, and the optimal packing uses five bins.

1   global list[0..n-1];             // list to be packed
2   global optcost;                  // minimum bin count
3   procedure binPack (k) {
4     if (k == n) {
5       b = binCount();              // use next fit
6       if (b < optcost) optcost = b;
      }
7     else
8       for (i = k; i < n; i++) {
9         swap (list, k, i);         // try it
10        binPack (k+1);             // recur
11        swap (list, k, i);         // restore it
        }
    }

Figure 4.2. binPack. An exhaustive search algorithm for bin packing.

Next fit runs in O(n) time and can never use more than twice as many bins as the optimal packing (since a new bin is opened only when the next weight does not fit, the contents of any two consecutive bins sum to more than 1).
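As a concrete illustration of the next fit rule, here is a minimal Java sketch. The class name NextFitDemo and the tolerance EPS are assumptions of this sketch, not part of the AlgLab code; the tolerance is there only to keep floating-point round-off from rejecting a weight that exactly fills a bin.

```java
// Illustrative sketch of the next fit rule (hypothetical names, not the AlgLab code).
final class NextFitDemo {
    // Round-off tolerance; an assumption of this sketch.
    static final double EPS = 1e-9;

    /** Returns the number of unit-capacity bins next fit uses on the list, in order. */
    static int binCount(double[] list) {
        int bins = 0;
        double fill = 0.0;                       // total weight in the open bin
        for (double w : list) {
            if (bins == 0 || fill + w > 1.0 + EPS) {
                bins++;                          // weight does not fit: open a new bin
                fill = w;
            } else {
                fill += w;                       // weight fits in the open bin
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] original = {0.4, 0.1, 0.6, 0.9, 0.2, 0.2, 0.7, 0.5, 0.1, 0.8};
        double[] reordered = {0.9, 0.1, 0.8, 0.2, 0.7, 0.2, 0.6, 0.4, 0.5, 0.1};
        System.out.println(binCount(original));  // list order of Figure 4.1(a)
        System.out.println(binCount(reordered)); // list order of Figure 4.1(b)
    }
}
```

Run on the two orderings of Figure 4.1, the sketch reports seven bins for the original list and five for the reordered one, matching the figure.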

The bin packing problem is NP-hard; that means that no polynomial-time algorithm is known that is guaranteed to find an optimal packing of every list. In this section we consider an exponential-time exact algorithm that does guarantee to find optimal packings.

The binPack procedure sketched in Figure 4.2 is an example of an algorithm in the exhaustive search paradigm, which solves an optimization problem by (exhaustively) checking all possible solutions. The procedure constructs all permutations of the list recursively, using k as a parameter. At stage k, the elements in list[0..k-1] have been fixed in the permutation; the stage considers all remaining elements for position list[k] and recurses to generate the rest of the permutation. Once a permutation is complete (when k == n), the algorithm calls binCount() to build a next fit packing and saves the cost of the best packing found as optcost.

It is not difficult to see that this algorithm must find an optimal packing: listing the weights of an optimal packing bin by bin gives a permutation on which next fit uses no more bins than that packing does. The algorithm takes O(n · n!) time to generate all n! permutations and compute binCount for each. Java implementations of binPack and the several variations discussed here are available for downloading from AlgLab. The runtime experiments mentioned in this section were performed on the HP platform described in the first timing experiment of Section 3.1.2.

Like most exhaustive search algorithms, binPack is painfully slow: a Java implementation takes a little more than an hour to run through all permutations when n = 14. (In contrast, the polynomial-time first fit algorithm in Section 3.2 can pack a list of size n = 100,000 in about 0.01 second.)
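The procedure of Figure 4.2 can be sketched in runnable Java as follows. This is a minimal sketch, not the AlgLab implementation: the class name BinPackV0, the stages counter, the tolerance EPS, and the choice to initialize optcost to n (one bin per weight always suffices) are all assumptions made for illustration.

```java
// Illustrative sketch of the exhaustive search in Figure 4.2 (not the AlgLab code).
final class BinPackV0 {
    static final double EPS = 1e-9;   // round-off tolerance (an assumption of this sketch)

    final double[] list;              // list to be packed; permuted in place
    int optcost;                      // minimum bin count found so far
    long stages = 0;                  // number of binPack invocations

    BinPackV0(double[] weights) {
        list = weights.clone();
        optcost = weights.length;     // one bin per weight always suffices
    }

    // Next fit bin count for the current order of the whole list.
    int binCount() {
        int bins = 0;
        double fill = 0.0;
        for (double w : list) {
            if (bins == 0 || fill + w > 1.0 + EPS) { bins++; fill = w; }
            else { fill += w; }
        }
        return bins;
    }

    void swap(int a, int b) {
        double t = list[a]; list[a] = list[b]; list[b] = t;
    }

    void binPack(int k) {
        stages++;
        if (k == list.length) {
            int b = binCount();               // use next fit
            if (b < optcost) optcost = b;
        } else {
            for (int i = k; i < list.length; i++) {
                swap(k, i);                   // try it
                binPack(k + 1);               // recur
                swap(k, i);                   // restore it
            }
        }
    }

    // Convenience wrapper: returns the optimal bin count for the given weights.
    static int solve(double[] weights) {
        BinPackV0 search = new BinPackV0(weights);
        search.binPack(0);
        return search.optcost;
    }
}
```

Calling BinPackV0.solve on the 10 weights of Figure 4.1 returns 5. The stages field records how many times binPack was invoked, which is the quantity the tuning steps below try to reduce.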

Here are some ideas for making the exact algorithm run faster via algorithm tuning.

Branch-and-bound. The branch-and-bound technique is an important tuning strategy for any exhaustive-search algorithm. The idea is to insert a test to compare the minimum cost found so far (optcost) to a lower-bound estimate on the final cost of a partially constructed solution. If the lower bound is at least optcost, then this partial solution cannot possibly lead to a new optimum, and further recursion can be abandoned. We say that this “branch” of the recursion tree can be “pruned.” Here are three lower bounds that could be checked against optcost at stage k.

• The bin count for a partial list list[0..k] is a lower bound on the bin count for the whole list. Define function binCount(k) to compute the bin count for list[0..k].

• The sum of weights in a list (rounded up) is a lower bound on bin count for the list. For example, if the weights sum to 12.3, at least 13 bins are needed to pack them. Define function weightSum(k+1) to sum the weights in list[k+1..n-1]. The quantity

    Ceiling( weightSum(k+1) - (1-list[k]) )

is a lower bound on the number of additional bins needed for the remaining weights in list[k+1..n-1]. The Ceiling function performs the rounding-up step. The negated second term reflects the possibility that some weights, totaling at most (1 - list[k]), might be packed together with the weight in list[k] and not included in the sum.

• The sum of these two lower bounds is even better. If

    binCount(k) + Ceiling( weightSum(k+1) - (1-list[k]) ) >= optcost

then further recursion on the list can be skipped.
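The three bounds can be written as small helper functions. The following Java sketch is illustrative only: the class name LowerBounds, the explicit list parameter, and the tolerance EPS are assumptions. The tolerance is subtracted before rounding up so that round-off cannot push a sum just past an integer and inflate the bound.

```java
// Illustrative sketch of the lower bounds at stage k (hypothetical helper class).
final class LowerBounds {
    static final double EPS = 1e-9;   // round-off tolerance (an assumption of this sketch)

    /** Next fit bin count for the prefix list[0..k]. */
    static int binCount(double[] list, int k) {
        int bins = 0;
        double fill = 0.0;
        for (int j = 0; j <= k; j++) {
            if (bins == 0 || fill + list[j] > 1.0 + EPS) { bins++; fill = list[j]; }
            else { fill += list[j]; }
        }
        return bins;
    }

    /** Sum of the weights in list[from..n-1]. */
    static double weightSum(double[] list, int from) {
        double sum = 0.0;
        for (int j = from; j < list.length; j++) sum += list[j];
        return sum;
    }

    /** binCount(k) + Ceiling(weightSum(k+1) - (1 - list[k])). */
    static int combinedBound(double[] list, int k) {
        double spill = weightSum(list, k + 1) - (1.0 - list[k]);
        return binCount(list, k) + (int) Math.ceil(spill - EPS);
    }
}
```

For the list of Figure 4.1 with k = 3, the prefix .4 .1 .6 .9 occupies three bins, the remaining weights sum to 2.5, and at most .1 of that can share a bin with .9, so the combined bound is 3 + Ceiling(2.4) = 6.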

Applying this tuneup to the loop in Figure 4.2 we obtain the following code fragment.

8     for (i = k; i < n; i++) {
9       swap (list, k, i);                          // try it
9.1     b = binCount(k);
9.2     w = weightSum(k+1);
9.3     if (b + Ceiling(w - (1-list[k])) < optcost) // check bound
10        binPack (k+1);                            // recur if needed
11      swap (list, k, i);                          // restore it
      }

[Figure 4.3 plot: x-axis n, from 6 to 14; y-axis Recursions, logarithmic scale from 10^1 to 10^10.]

Figure 4.3. Branch-and-bound. The x-axis marks problem sizes n. The y-axis marks total recursive stages executed, on a logarithmic scale. Crosses show the counts for one test of V0 at each problem size. Circles show the results of three random trials of V1 at each problem size. With branch-and-bound the algorithm executes between 5 × 10^1 and 2.3 × 10^9 times fewer recursions.

Branch-and-bound adds code that increases the cost of each recursive stage, in hopes of reducing the total number of stages executed. There is no guarantee that the reduction in recursion count will be enough to justify the extra cost of the binCount and weightSum procedures. Experiments can be used to evaluate the trade-off.
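To make the trade-off concrete, here is a minimal Java sketch of the search with the bound test switched on or off, counting recursive stages in both modes. It is not the AlgLab code: the class name BinPackV1, the prune flag, the stages counter, the tolerance EPS, and the initial value of optcost are assumptions of the sketch.

```java
// Illustrative sketch: exhaustive search with optional branch-and-bound.
final class BinPackV1 {
    static final double EPS = 1e-9;   // round-off tolerance (an assumption of this sketch)

    final double[] list;
    final boolean prune;              // true = apply the lower-bound test
    int optcost;
    long stages = 0;                  // number of binPack invocations

    BinPackV1(double[] weights, boolean prune) {
        this.list = weights.clone();
        this.prune = prune;
        this.optcost = weights.length;            // one bin per weight always suffices
    }

    int binCount(int k) {                         // next fit over list[0..k]
        int bins = 0;
        double fill = 0.0;
        for (int j = 0; j <= k; j++) {
            if (bins == 0 || fill + list[j] > 1.0 + EPS) { bins++; fill = list[j]; }
            else { fill += list[j]; }
        }
        return bins;
    }

    double weightSum(int from) {                  // sum of list[from..n-1]
        double sum = 0.0;
        for (int j = from; j < list.length; j++) sum += list[j];
        return sum;
    }

    void swap(int a, int b) {
        double t = list[a]; list[a] = list[b]; list[b] = t;
    }

    void binPack(int k) {
        stages++;
        int n = list.length;
        if (k == n) {
            int b = binCount(n - 1);
            if (b < optcost) optcost = b;
            return;
        }
        for (int i = k; i < n; i++) {
            swap(k, i);                           // try it
            boolean recur = true;
            if (prune) {
                int b = binCount(k);
                double w = weightSum(k + 1);
                recur = b + (int) Math.ceil(w - (1.0 - list[k]) - EPS) < optcost;
            }
            if (recur) binPack(k + 1);            // recur if needed
            swap(k, i);                           // restore it
        }
    }
}
```

Running binPack(0) on the same weights with prune set to false and then to true, and comparing the two stages counters, gives a small-scale version of the experiment described next. Both modes must report the same optcost, since the test only discards branches that cannot improve on it.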

Figure 4.3 shows results of an experiment to compare total recursive stages invoked in our original version, called V0, to a branch-and-bound version V1. Since V0 always recurs the same number of times, only one test was performed per problem size. V1 was measured in three random trials at each problem size, using list weights drawn uniformly from [0,1).

Branch-and-bound is clearly very effective at reducing the total number of recursive stages, although the amount of reduction can vary significantly among trials. At n = 14, V0 executed 236.9 billion recursions while V1 executed between 730,000 and 1.59 billion recursions, which represents improvements by factors between 150 and 2200. Overall in these tests counts of recursive stages improved by factors as low as 50 and as high as 230 million.
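The V0 count can be checked by direct calculation, assuming a "recursive stage" means one invocation of binPack: there is one call at depth 0, n calls at depth 1, n(n-1) at depth 2, and so on down to n! calls at depth n. The short Java sketch below sums these terms; the class and method names are hypothetical.

```java
// Illustrative sketch: number of binPack invocations made by the unpruned search.
final class StageCount {
    /** Sum over depths k = 0..n of n!/(n-k)!, the number of calls at depth k. */
    static long v0Stages(int n) {
        long total = 0;
        long callsAtDepth = 1;                 // a single call at depth 0
        for (int k = 0; k <= n; k++) {
            total += callsAtDepth;
            callsAtDepth *= (n - k);           // each call at depth k makes n-k recursive calls
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(v0Stages(14));      // prints 236975164805
    }
}
```

For n = 14 the sum is 236,975,164,805, in line with the 236.9 billion recursions quoted above, and dividing it by 1.59 billion gives a factor of about 150.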

This reduction in recursion counts translates to important runtime improvements, despite increases in the cost per stage. V0 takes 63 minutes of CPU time at n = 14, while V1 has average runtimes near 20 seconds, which represents a speedup by a factor of roughly 190.

3     binPack (k, bcount, capacity, sumwt) {     // capacity = room left in the open bin
4       if (k == n) {
5         if (bcount < optcost)
6           optcost = bcount;
        }
7       else {
8         for (i = k; i < n; i++) {
9           swap (list, k, i);                   // try it
9.1         if (list[k] > capacity) {            // does it fit?
9.2           b = bcount + 1;                    // use new bin
9.3           c = 1 - list[k];
9.4         }
            else {
9.5           b = bcount;                        // use old bin
9.6           c = capacity - list[k];
            }
9.7         w = sumwt - list[k];                 // update sumwt
9.8         if (b + Ceiling(w-c) < optcost)      // check bound
10            binPack (k+1, b, c, w);            // recur if necessary
11          swap (list, k, i);                   // restore it
          }
        }
      }

Figure 4.4. binPack V2. This version applies branch-and-bound and propagation. Branch-and-bound is a special case of pruning, which is discussed more fully in Section 4.1.1 under Recursion-Heavy Paradigms.

Propagation. Our next tuning strategy focuses on speeding up binCount and weightSum, which together contribute O(n) extra work at each recursive stage. Implementation V2 uses propagation to replace these methods with incremental calculations that take only constant time per stage.

The new version is shown in Figure 4.4. To compute the weight sum incrementally we introduce a new parameter sumwt, initialized to equal the sum of the entire list. On line 9.7 the weight in list[k] is subtracted from sumwt and the result is passed to the next recursive stage. Calculation of binCount(k) is propagated by introducing two parameters, bcount and capacity, and performing next fit incrementally during the recursion. Lines 9.1 through 9.6 determine whether the current weight list[k] fits into the current open bin, or whether a new bin is needed. Now that the remaining capacity of the open bin is available, it can be used to give a tighter lower bound on the estimated bin count, so 1-list[k] is replaced with c in the test on line 9.8.
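A runnable Java rendering of the propagation idea might look like the sketch below. It is not the AlgLab code, and it makes several assumptions: capacity is taken to mean the room left in the open bin, the first weight always opens a bin, optcost starts at n, and the names BinPackV2, stages, and EPS are introduced for illustration.

```java
// Illustrative sketch of branch-and-bound with propagation (in the spirit of Figure 4.4).
final class BinPackV2 {
    static final double EPS = 1e-9;   // round-off tolerance (an assumption of this sketch)

    final double[] list;
    int optcost;
    long stages = 0;                  // number of binPack invocations

    BinPackV2(double[] weights) {
        list = weights.clone();
        optcost = weights.length;     // one bin per weight always suffices
    }

    void swap(int a, int b) {
        double t = list[a]; list[a] = list[b]; list[b] = t;
    }

    // bcount: bins used by list[0..k-1]; capacity: room left in the open bin;
    // sumwt: total weight of list[k..n-1].
    void binPack(int k, int bcount, double capacity, double sumwt) {
        stages++;
        int n = list.length;
        if (k == n) {
            if (bcount < optcost) optcost = bcount;
            return;
        }
        for (int i = k; i < n; i++) {
            swap(k, i);                                    // try it
            int b;
            double c;
            if (bcount == 0 || list[k] > capacity + EPS) { // does it fit?
                b = bcount + 1;                            // use new bin
                c = 1.0 - list[k];
            } else {
                b = bcount;                                // use old bin
                c = capacity - list[k];
            }
            double w = sumwt - list[k];                    // update sumwt
            if (b + (int) Math.ceil(w - c - EPS) < optcost) // check bound
                binPack(k + 1, b, c, w);                   // recur if necessary
            swap(k, i);                                    // restore it
        }
    }

    // Convenience wrapper: sums the list once, then starts the recursion.
    static int solve(double[] weights) {
        BinPackV2 search = new BinPackV2(weights);
        double sum = 0.0;
        for (double x : weights) sum += x;
        search.binPack(0, 0, 0.0, sum);
        return search.optcost;
    }
}
```

Each stage now does a constant amount of arithmetic instead of rescanning the list, which is the point of propagation. On the weights of Figure 4.1, BinPackV2.solve returns 5, the same optimum as the unpruned search.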

Tests using the Java -Xprof profiler to compare V1 and V2 on identical inputs show that propagation cuts the average cost of each recursive stage in half; that translates to a 50 percent reduction in total running time. The new lower bound test on line 9.8 yields small improvements: about half the time there is no difference in recursion counts, and 90 percent of the time the improvement is less than 20 percent.
