Systematic Search for Bin Packing - Combinatorial optimisation for sustainable cloud computing

For this reason we use the χ² test to further validate the results.

The χ² Test. As a complementary approach, we used the χ² goodness-of-fit test, which is less sensitive to outliers in the sample data. The null hypothesis is that the observed and expected distributions are not statistically different.

The procedure requires grouping items into γ categories according to their size. Based on these categories, we can compute the expected number of values in each category, assuming that the item sizes are drawn from a Weibull distribution with shape and scale parameters estimated from the data set. The χ² statistic is then computed as:

$$\chi^2 = \sum_{i=1}^{\gamma} \frac{(O_i - E_i)^2}{E_i},$$

from which we can obtain the corresponding p-value, where $O_i$ and $E_i$ are the observed and expected frequencies of each category i, respectively. We model the tail of the distribution, in the standard way, by building a wider category that counts all items in the tail of the distribution. The other γ − 1 categories are equally sized.
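The computation of the statistic described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Table 7.1: the function names and the `tail_start` parameter (the lower bound of the tail category) are hypothetical, and "equally sized" is assumed here to mean equal-width intervals. The p-value would then be read from the χ² distribution, which this sketch does not compute.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull CDF: 1 - exp(-(x / scale) ** shape) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def chi_square_statistic(sizes, shape, scale, n_categories, tail_start):
    """Sum of (O_i - E_i)^2 / E_i over n_categories size categories.

    Assumption: the first n_categories - 1 categories are equal-width
    intervals covering [0, tail_start); the last category collects every
    item of size >= tail_start (the tail of the distribution).
    """
    width = tail_start / (n_categories - 1)
    lower_edges = [i * width for i in range(n_categories)]

    # Observed frequencies O_i.
    observed = [0] * n_categories
    for s in sizes:
        idx = min(int(s // width), n_categories - 1)
        observed[idx] += 1

    # Expected frequencies E_i under the fitted Weibull distribution.
    n = len(sizes)
    statistic = 0.0
    for i in range(n_categories):
        cdf_lo = weibull_cdf(lower_edges[i], shape, scale)
        if i == n_categories - 1:
            cdf_hi = 1.0  # the tail category is unbounded above
        else:
            cdf_hi = weibull_cdf(lower_edges[i + 1], shape, scale)
        expected = n * (cdf_hi - cdf_lo)
        statistic += (observed[i] - expected) ** 2 / expected
    return statistic
```

With shape 1 and scale 1 the Weibull distribution reduces to the exponential distribution, so placing the tail boundary at ln 2 gives two categories of probability 0.5 each, which makes the sketch easy to check by hand.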

As shown in Table 7.1, the null hypothesis cannot be rejected for any of the benchmarks that are presented. Therefore, the conclusion is that the Weibull distribution provides a good fit for the item size distributions in the benchmark instances we considered. We conjecture that it will also do so in very many other cases encountered in practice.

7.3 Systematic Search for Bin Packing

We consider the performance of a systematic constraint-based bin packing method on a wide range of classes of Weibull-based bin packing benchmarks. Our experiment involved varying the parameters of the Weibull distribution so that item sets for bin packing instances could be generated. A range of bin capacities was studied. The details of the experimental setup are described in Section 7.3.1.

7.3.1 Bin Packing Instances and Solver

We considered problem instances involving 100 items. We fixed the scale parameter, λ, of the Weibull distribution to 1000. As experimental parameters we varied both the capacity of the bins and the shape of the Weibull distribution, generating 100 instances for each combination of parameters. The capacities we considered were c × max(I), where c ∈ {1.0, 1.1, ..., 1.9, 2.0} and max(I) is the maximum item size encountered in the instance. Therefore, the bin capacities considered ranged from the size of the largest item to twice that size.

Table 7.1: The parameters of the best-fit Weibull distributions obtained for randomly selected instances of a number of real-world examination timetabling benchmarks.

                    Weibull best fit       KS test    χ² test
Set      Instance   shape   scale          p-value    #(cat)   lbTail    p-value
ETT      Nott       1.044   43.270         0.7864     7        100       0.059
         MelA       0.946   109.214        0.091      10       427       0.073
         MelB       0.951   117.158        0.079      5        47        0.051
         Cars       1.052   85.438         0.037      18       53        0.109
         hec        1.139   138.362        0.436      10       293       0.204
         yor        1.421   37.049         0.062      7        117       0.068
RA ODEF  a12 3      0.447   104,346.70     0.005      30       163,000   0.105
         a13₃       0.549   88,267.85      0.001      15       54,800    0.068
         a25 1      0.562   67,029.83      0.000      30       470,000   0.768
         a24 4      0.334   103,228.30     0.001      30       500,000   0.051
         b3 6       0.725   40,469.74      0.000      20       185,000   0.060
         b5 3       0.454   91,563.28      0.000      30       140,000   0.088

For the shape parameter of the Weibull distribution we considered a very large range, {0.1, 0.2, ..., 19.9}, yielding 199 settings of this parameter. By fixing the scale parameter to 1000 we considered item sizes that could span three orders of magnitude. To build our problem generator we used the Boost library [Boo], a C++ API that includes type definitions for random number generators and a Weibull distribution, which is parameterized by the random number generator, the shape, and the scale. Iteration capabilities for traversing the distribution of generated values are also provided.

We generated 100 item sets for each combination of shape and scale (the scale being fixed), giving 199 classes of item sets and 19,900 item sets in total. For each of these sets we generated bin packing instances by associating it with each bin capacity in the range described above. In this way, as we changed the bin capacity, the specific set of items under consideration was held fixed.
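The generation scheme can be sketched as follows. This is an illustration only: the original generator was written in C++ with Boost, whereas this sketch uses Python's standard `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape. The function names are hypothetical, and two details are assumptions not stated in the text: item sizes are rounded to integers of at least 1, and capacities are rounded down to integers.

```python
import random


def generate_item_set(n_items, shape, scale, rng):
    """Draw n_items sizes from a Weibull distribution.

    Assumption: sizes are rounded to integers and clamped to at least 1.
    """
    return [max(1, round(rng.weibullvariate(scale, shape)))
            for _ in range(n_items)]


def capacities_for(items):
    """Map each capacity factor c in {1.0, 1.1, ..., 2.0} to c * max(I).

    Factors are handled in tenths to avoid floating-point drift;
    rounding down to an integer capacity is an assumption.
    """
    largest = max(items)
    return {c10 / 10: (largest * c10) // 10 for c10 in range(10, 21)}


def generate_benchmark(shape_tenths, n_sets=100, n_items=100,
                       scale=1000, seed=0):
    """Yield (shape, items, capacities) for every item set.

    The same item set is paired with every capacity, so changing the
    capacity never changes the items being packed.
    """
    rng = random.Random(seed)
    for k10 in shape_tenths:
        shape = k10 / 10
        for _ in range(n_sets):
            items = generate_item_set(n_items, shape, scale, rng)
            yield shape, items, capacities_for(items)
```

Calling `generate_benchmark(range(1, 200))` walks through the 199 shape values 0.1, 0.2, ..., 19.9 and produces 100 item sets for each, 19,900 in total.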

Constraint-based Bin Packing Model. For our experiments we have used Gecode 3.7.0 [Gec06]. The bin packing model used is the most efficient one included in the Gecode distribution for finding the minimum number of bins for a given bin packing instance [STL12]. This model employs the L1 lower bound on the minimum number of bins.

7. ON BIN PACKING INSTANCES    7.3 Systematic Search for Bin Packing

It uses an upper bound based on the first-fit bin packing heuristic which packs each item into the first bin with sufficient capacity.
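The two bounds can be sketched as follows. This is a minimal illustration rather than the Gecode implementation: the function names are hypothetical, integer item sizes are assumed, and L1 is taken to be the usual continuous bound, the total item size divided by the bin capacity and rounded up.

```python
def l1_lower_bound(items, capacity):
    """Continuous lower bound: ceil(sum(items) / capacity).

    Assumes integer sizes, so the ceiling is computed with integer
    arithmetic to avoid floating-point error.
    """
    return -(-sum(items) // capacity)


def first_fit(items, capacity):
    """Number of bins used when each item goes into the first bin
    with sufficient remaining capacity, opening a new bin if none fits."""
    free = []  # remaining capacity of each open bin, in opening order
    for size in items:
        for b, space in enumerate(free):
            if size <= space:
                free[b] -= size
                break
        else:
            free.append(capacity - size)
    return len(free)
```

Any optimal solution uses a number of bins between these two values, so the gap between them indicates how much work is left for the search.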

The model uses the following variables: one variable to represent the number of bins used to pack the items; one variable per item representing which bin the item is assigned to; and a variable per bin representing its load. The main constraint included in the model is the global bin packing constraint proposed by Paul Shaw [Sha04], enforcing that the packing of items into bins corresponds to the load variables. Those items whose size is greater than half of the bin capacity are directly placed into different bins. If a solution uses a number of bins smaller than the upper bound, then the load associated with unused bins is set to 0, and symmetry breaking constraints ensure that this reasoning applies to the lexicographically last variables first. Additional symmetry breaking constraints ensure that search avoids different solutions involving permutations of items with equal size.
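The relationship between the three families of variables can be illustrated with a small checker. This is a sketch under assumptions, not part of the Gecode model: the function names are hypothetical, bins are numbered from 0, and the symmetry breaking on unused bins is read as "bins numbered at or beyond the number of bins used must be empty".

```python
def is_consistent(sizes, capacity, n_bins, bin_of, load):
    """Check a candidate assignment against the model's variables.

    sizes[i]  : size of item i
    bin_of[i] : bin that item i is assigned to
    load[b]   : load of bin b (one entry per bin up to the upper bound)
    n_bins    : number of bins actually used
    """
    # Load variables must equal the total size packed into each bin.
    computed = [0] * len(load)
    for size, b in zip(sizes, bin_of):
        computed[b] += size
    if computed != list(load):
        return False
    # No bin may exceed its capacity.
    if any(l > capacity for l in load):
        return False
    # Unused bins are the last ones, and their load is 0.
    return all(l == 0 for l in load[n_bins:])


def preassign_large_items(sizes, capacity):
    """Give each item larger than half the capacity a bin of its own.

    No two such items can share a bin, so they can be fixed to
    distinct bins before search starts.
    """
    large = [i for i, s in enumerate(sizes) if 2 * s > capacity]
    return {item: b for b, item in enumerate(large)}
```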

The search strategy used is as follows. The variable representing the number of bins used in the solution is labelled first, and in increasing order, thus ensuring that the first solution found is optimal. The variables representing the item assignments to bins, and the load on each bin, are then labelled using the complete decreasing best fit strategy proposed by Gent and Walsh [GW97], which tries to place the items into those bins with sufficient but least free space. In our experiments a timeout of 10 seconds is used to ensure that our experiments take a reasonable amount of time. We verified that increasing this to five minutes does not significantly increase the proportion of solved instances. However, of course, for some classes a large number of time-outs were observed, so further empirical study is needed in those cases.
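The search strategy can be sketched as a plain depth-first search. This is a simplified illustration, not the Gecode model: it performs no constraint propagation, the function names are hypothetical, and starting the bin count at the continuous lower bound is an assumption. It does keep the two ideas described above: the number of bins is tried in increasing order, and items are placed in decreasing size order into the feasible bin with the least free space first.

```python
import time


def pack_into(items, capacity, n_bins, deadline):
    """Try to pack all items into n_bins bins.

    Returns a list of (size, bin) pairs, or None if no packing exists.
    Raises TimeoutError once the deadline has passed.
    """
    order = sorted(items, reverse=True)  # decreasing item size
    free = [capacity] * n_bins
    placed = []

    def place(i):
        if time.monotonic() > deadline:
            raise TimeoutError
        if i == len(order):
            return True
        size = order[i]
        # Best fit first: feasible bins by increasing free space.
        candidates = sorted(
            (b for b in range(n_bins) if free[b] >= size),
            key=lambda b: free[b],
        )
        tried = set()
        for b in candidates:
            # Bins with equal free space are interchangeable: try one.
            if free[b] in tried:
                continue
            tried.add(free[b])
            free[b] -= size
            placed.append(b)
            if place(i + 1):
                return True
            placed.pop()
            free[b] += size
        return False

    return list(zip(order, placed)) if place(0) else None


def solve(items, capacity, timeout_s=10.0):
    """Return (optimal number of bins, packing), or None on timeout."""
    if items and max(items) > capacity:
        raise ValueError("an item is larger than the bin capacity")
    deadline = time.monotonic() + timeout_s
    # Increasing order: the first feasible bin count is optimal.
    n_bins = max(1, -(-sum(items) // capacity))
    try:
        while True:
            packing = pack_into(items, capacity, n_bins, deadline)
            if packing is not None:
                return n_bins, packing
            n_bins += 1
    except TimeoutError:
        return None
```

Because every smaller bin count has been refuted before a larger one is tried, the first packing found is optimal, which mirrors the labelling order described above.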

7.3.2 Small Weibull Shape Parameter Values

In this section we explore the behaviour of a systematic search on bin packing instances generated using our Weibull-based approach when considering small values of the distribution's shape parameter, specifically values ranging from 0.5 to 5.0 in steps of 0.1. Figure 7.3 presents the results: Figure 7.3a shows the average time required to solve those instances that were solved within the timeout, and Figure 7.3b shows the proportion of instances that this corresponds to. In these plots we only consider capacity factors 1.0, 1.5, and 2.0.

[Figure 7.3: Average runtime and percentage of solved instances for values of the shape parameter in the 0.5, ..., 5.0 range. (a) Average running time, in milliseconds, for instances that did not time out, plotted against the shape factor for c = 1.0, c = 1.5, and c = 2.0. (b) Percentage of instances solved within the timeout, plotted against the shape factor for the same capacity factors.]

It is clear that the shape factor, which defines the spread of item sizes, has a dramatic impact on the average time taken to find the optimal solution to a bin packing instance. By referring back to Figure 7.1a one can observe how the distribution of item sizes is changing. The lower values of the shape parameter correspond to distributions that have greater skew towards smaller items. As the shape parameter increases, for example to 1.5, there is a much greater range of possible item sizes. At higher shape values, for example 5.0, the distribution of item sizes becomes more symmetric.
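The change in skew described above can be quantified from the moments of the Weibull distribution. The sketch below uses the standard closed form for the skewness in terms of the gamma function; the function name is hypothetical. The skewness does not depend on the scale parameter, so only the shape is needed.

```python
import math


def weibull_skewness(shape):
    """Skewness of a Weibull distribution with the given shape.

    Uses the raw moments of the unit-scale distribution,
    E[X^r] = Gamma(1 + r / shape).
    """
    m1 = math.gamma(1 + 1 / shape)  # mean
    m2 = math.gamma(1 + 2 / shape)  # second raw moment
    m3 = math.gamma(1 + 3 / shape)  # third raw moment
    variance = m2 - m1 ** 2
    return (m3 - 3 * m1 * variance - m1 ** 3) / variance ** 1.5
```

Evaluating this at shapes 0.5, 1.5, and 5.0 gives a large positive skewness, a moderate positive skewness, and a value close to zero, respectively, in line with the progression towards a more symmetric distribution.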

This shift in item size distribution affects the difficulty of bin packing at lower shape values when the capacity of the bin is smaller. Consider the effort required when the bin capacity is equal to the largest item, i.e. capacity factor 1.0, in Figure 7.3a. The range of shapes over which these problems are hard is quite narrow and, as we shall see in the next section, is influenced by the bin capacity associated with the problem instance. This difficulty arises from the interaction between item size distribution and bin capacity, whereby finding the best combinations of items to place in the same bin becomes challenging. As the shape parameter increases, the range of item sizes again decreases which, given the small bin capacity, makes the instance easy once more. For a bin capacity equal to the largest item size, the hard region corresponds to values of Weibull shape between 1.5 and 3.0. Increasing the capacity of the bins dramatically increases the computational challenge of the problems since, again, search effort is invested in finding a good combination of items to fit into each bin. From Figure 7.3a we can see that problem difficulty increases as bin capacity increases.

Using our proposed Weibull-based model for generating bin packing instances, we claim that not only can one model some real-world bin packing settings, as shown earlier in this chapter, but it is also possible to carry out very controlled experiments on the behaviour of bin packing methods, studying the effect of the various aspects of the problem, such as bin capacity and item size distribution, in isolation or together.

7.3.3 Full Range of Shape Parameters

We have also performed a more wide-ranging study of the interaction between the shape of the Weibull distribution, bin capacity, and the hardness of bin packing for a systematic method. In this section we will briefly present a set of experiments that exhibit the various behaviours discussed above. We consider all values of the shape parameter in our data set, 0.1 ≤ k ≤ 19.9. Figure 7.1b shows how the distributions with larger shape parameter values differ from those with the smaller values studied above. Essentially, these distributions have lower spread shown by successively taller density functions centering towards the value of the scale parameter.

[...] within the timeout and the percentage of instances that this corresponds to (Figure 7.4b). The average number of bins associated with these instances is presented in Figure 7.5.

In these plots we can see that, as before, problem difficulty peaks at a specific value of Weibull shape for each value of capacity (Figure 7.4). From Figure 7.5, which presents the average number of bins in an optimal solution, we can extract the average number of items per bin for each class, since all of our instances have 100 items. As before, the range of shapes over which search is hard corresponds to specific ranges of numbers of bins (or average numbers of items per bin). Therefore, there is a clear interrelationship between bin capacity, item size distribution, problem hardness, and the number of items per bin.
