Evaluation: Setup - On-Line Grid Scheduling 6

On-Line Grid Scheduling 6

6.5 Evaluation: Setup

The main experiments performed and described in this chapter pertain to the eval-uation and comparison of the hierarchical two-level on-line scheduling algorithms presented in sections 6.4.2-6.4.4 and their comparison to the fully centralized single-level algorithm which schedules incoming jobs on the resource set yielding minimal completion time in a greedy fashion. Due to the nature of the divisible load based algorithms, the experiments consist of two phases:

• Solving an off-line steady state scheduling problem (in the context of a Grid dimensioning problem)

• Simulating the on-line scheduling of the appropriate workload (approxi-mated in the first phase) on the Grid used in the first phase

The last step - the workload scheduling - consists of the resource selection and allo-cation phases as indicated in section 6.2.3. Our experiments have been performed for a wide range of parameter variations: for our simulations, we have varied the Grid interconnection network connectivity, the stochastic workload used during the dimensioning and scheduling phases and - for the divisible load based algo-rithms - the definition and implementation of the divisible load cost function and deviation metric.

6.5.1 Simulated Topologies

Our simulations have been performed for different Grid interconnection (i.e. Op-tical Transport Network) topologies, inspired by the sample European network shown in figure 6.3 and introduced in section 5.4.2.

Figure 6.3: 13-Node European Network

We used the sets of random networks as described in section 5.4.7. These ran-dom networks (all having13 nodes) were generated by constructing a connected

set of13 nodes through repeated addition of node-link pairs, and then added in extra links following a probability p in[0, 1]. This parameter was varied to gener-ate networks with different average connectivity values. All Grid topologies have been dimensioned so that excess load generated at one site can be handled by the aggregation of all remote sites. We have chosen13 computational resources with geometrically increasing processing capacity, with a reference capacity of3 units of work per time unit and a capacity increase of5% between successive computa-tional resources.

6.5.2 Simulated Scenarios

The scheduling scenarios studied are those found in two-tier Grids. In particular, we study scenarios where excess workload arrives at one site (the top tier) which must then be distributed to the other, remote sites (the second tier). In case of uniform workload distribution, these scenarios reduce to the scenarios detailed in section 5.2.3. The scenarios used in this chapter differ from the scenarios described in section 5.2.3 in that uniform workload distribution is not enforced. Rather, we look for the optimal excess workload distribution over the remote sites.

In the dimensioning phase of our experiments, each Grid topology (as de-scribed in the previous section) has been dimensioned in order to support all possi-ble two-tier excess load scenarios as described above. This means that the off-line combined Grid dimensioning and scheduling problem solved describes - in our case - thirteen workload scenarios. Again, the computational resources have been dimensioned to handle load up to the60% percentile of the distribution describing the arriving workload at that resource’s site. Excess load is then created by having one site generate more workload than can be handled by its own computational resource. The simulation results presented below were obtained by scheduling the same excess workload in these thirteen scenarios on the resulting Grid.

In order to measure the relevant metrics in steady state, transient effects oc-curring in the start and end phases of each simulation (corresponding to workload build up and workload draining mechanics in the Grid) have been eliminated from our calculations.

6.5.3 Simulated Workload

The workload used in the scheduling simulations is chosen in such a way that the excess load arriving per time unit at the single source site in each scenario equals 90% of the Grid’s residual computational capacity. To process this amount of ex-cess load, co-operation of all remote Grid sites is required. An on-off distribution was used to model the intermittent arrival of large and small jobs. The distribu-tions from which excess job parameters were drawn are shown in table 6.3. We have used a standard workload and a workload featuring the same average values,

yet bigger variance for its constituent parameters. This latter workload is mainly used in section 6.6.4. Distributions are taken to be uniform. For use in our DLT based dimensioning and scheduling model, the parameters shown in table 6.3 yield the same average amounts of arriving computational load per time unit as well as the same average amount of data transferred per unit of processing. Our on-line workload consisted of1000 jobs. To eliminate transient scheduling and queue draining effects, performance metrics such as average job response time have been measured without taking into account the first and last100 jobs.

Workload On-period (Jobs) On-Joblength On-Data(MB) Off-Joblength Off-Data(MB)

Standard 3/10 40-50 5000-9000 10-17 5000-9000

Inc.^σ_µ 1/10 100-150 5000-9000 10-13.2 5000-9000

Table 6.3: Excess Workload Characteristics

In our scenarios, the average job interarrival time was taken to be0.5s. Each job reads its input from and returns it output to its submission site. The Grid’s network has been dimensioned (see section 6.6) to support the average expected network load resulting from this.

In document Dimensionerings- en werkverdelingsalgoritmen voor lambda grids (Page 159-162)