
5.1 The Comprehensive Model

5.1.4 The Complete Model

Given a job arrival process {Arr_i} obtained with the model in Chapter 4, we summarize our full model as the following stages:

1. Call the runtime classification procedure described in Section 5.1.1 to obtain the mixture-of-Gaussians parameters (µ_k, σ_k; p_k), k = 1, ..., G, and the classification labels L_i ∈ {1, ..., G}, i = 1, ..., n. Then calculate the runtime distributions of single jobs {p^job_k} and of jobs in BoTs {p^bot_k} as presented in Section 5.1.2.

2. Call the procedure in Section 5.1.3 to classify the parallelism process and determine the classification labels C_i, i = 1, ..., n.

3. Fit the BoT sizes from the real data to a Zipf distribution Z.

4. Call Algorithm 1 with inputs from the above steps to obtain a synthetic runtime process {Run_i} and a synthetic parallelism process {Cpu_i}. The set of triples {Arr_i, Run_i, Cpu_i} constitutes the full synthetic parallel system workload W0.

In Algorithm 1, we first calculate the transition conditional probability table Pr(c, l) mentioned in Section 5.1.3. Pr(c, l) is the ratio between the probability P(c, l) that a job has parallelism label c and runtime label l at the same time and the probability P(l) that a job has runtime label l. Note that, for a given real workload, we only need to calculate the table Pr(c, l) once and can then reuse it to generate several synthetic workloads. Secondly, we initialize the runtime Run_1 and the number of processors Cpu_1 of the first job in the synthetic workload by calling the function RandomlyGenerate. This function randomly generates a pair of runtime and number of processors in such a way that we can control

60 Chapter 5. A Comprehensive Parallel Workload Model

Algorithm 1 Generate synthetic runtimes and parallelisms. n is the number of jobs in the real data. The model enables the generation of as many jobs as desired.

Input: Job arrivals {Arr_i}, clustering parameters (µ_k, σ_k; p^job_k, p^bot_k), k = 1, ..., G, runtime classification L_i, i = 1, ..., n, parallelism classification C_i, i = 1, ..., n, the BoT parameter ∆ and the fitted BoT size Zipf distribution Z.

Output: A synthetic runtime process {Run_i} and a synthetic parallelism process {Cpu_i}.

// Calculate the transition conditional probability table
maxL = max({L_i}) = G; maxC = max({C_i});
for l = 1 to maxL do
    P(l) = length({x ∈ {L_i} : x = l}) / n;
    for c = 1 to maxC do
        P(c, l) = length({j ∈ [1, n] : C_j = c, L_j = l}) / n, where j represents a job;
        Pr(c, l) = P(c, l) / P(l);
    end for
end for

// Initialize
Assign BoTs = 1 and sample BoTSize from the fitted Zipf distribution Z;
[Run_1, Cpu_1] = RandomlyGenerate(BoTSize);

// Main loop to generate {Run_i} and {Cpu_i}
for j = 2 to length({Arr_i}) do
    if (Arr_j − Arr_{j−1} ≤ ∆ and BoTs < BoTSize) then
        Run_j = Run_{j−1};
        Cpu_j = Cpu_{j−1};
        BoTs = BoTs + 1;
    else
        Assign BoTs = 1 and sample BoTSize from the fitted Zipf distribution Z;
        [Run_j, Cpu_j] = RandomlyGenerate(BoTSize);
    end if
end for

function [rRun, rCpu] = RandomlyGenerate(bs)
    1. Randomly select a runtime label sl ∈ [1, G] using probabilities {p^bot_k} if bs > 1 or {p^job_k} if bs = 1;
    2. Randomly select a parallelism label sc using the transition probability table Pr(c, l) with l = sl;
    3. Assign rRun by sampling the Gaussian distribution f_sl(µ_sl, σ_sl);
    4. Assign rCpu by calling Algorithm 2 with inputs sl and sc;
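The table computation at the top of Algorithm 1 and step 2 of RandomlyGenerate can be sketched in Python as follows. This is only an illustrative sketch, not the author's implementation: the function names `transition_table` and `select_parallelism_label`, the dictionary layout keyed by (c, l), and the toy label lists are assumptions made for the example. Since both P(c, l) and P(l) are divided by n, the sketch works directly with counts.

```python
import random
from collections import Counter

def transition_table(L, C):
    """Return a dict Pr[(c, l)] = P(c, l) / P(l) for label pairs seen in the data.

    L and C are the runtime and parallelism labels of the n real jobs.
    The common factor 1/n in P(c, l) and P(l) cancels, so counts suffice.
    """
    count_l = Counter(L)            # n * P(l)
    count_cl = Counter(zip(C, L))   # n * P(c, l)
    return {(c, l): k / count_l[l] for (c, l), k in count_cl.items()}

def select_parallelism_label(Pr, sl, rng=random):
    """Draw a parallelism label sc with probability Pr(sc, sl) (step 2)."""
    labels = [c for (c, l) in Pr if l == sl]
    weights = [Pr[(c, sl)] for c in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

# Toy data (hypothetical): five real jobs with runtime and parallelism labels.
L = [1, 1, 1, 2, 2]
C = [1, 1, 2, 2, 2]
Pr = transition_table(L, C)
```

Because the table depends only on the real labels, it can be built once and passed to every call that generates a synthetic job.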


Algorithm 2 Generate the synthetic parallelism for a job.

Input: Runtime classification L_i, i = 1, ..., n, parallelism classification C_i, i = 1, ..., n, runtime label sl and parallelism label sc.

Output: Number of processors procs.

1. Determine all jobs in the real data whose runtime and parallelism labels are sl and sc, respectively, based on {L_i} and {C_i}. Let X be the multiset, i.e. including multiple occurrences, of the numbers of processors of these jobs.

2. Select uniformly at random an element of X to obtain procs.
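A minimal Python sketch of Algorithm 2 is given below. It assumes that the processor counts of the real jobs are available as a list `cpus` aligned with the label lists; that argument, the function name `generate_parallelism` and the optional `rng` parameter are hypothetical and introduced only for illustration.

```python
import random

def generate_parallelism(L, C, cpus, sl, sc, rng=random):
    """Pick the processor count of a random real job labelled (sl, sc).

    L, C : runtime and parallelism labels of the n real jobs.
    cpus : processor counts of the same jobs (assumed input, see lead-in).
    """
    # Step 1: build the multiset X by one pass over the n real jobs.
    X = [p for li, ci, p in zip(L, C, cpus) if li == sl and ci == sc]
    if not X:
        raise ValueError("no real job has labels (sl, sc)")
    # Step 2: uniform choice over X, so frequent values are drawn more often.
    return rng.choice(X)
```

Keeping duplicates in X is what lets the sampled values follow the empirical distribution of processor counts within each label pair.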

their cross-correlation. This cross-correlation is controlled by steps 2 and 4 of the function, since the parallelism label sc is selected using the transition conditional probability table Pr(c, l), where the runtime label sl is already known. With the selected label sc, the parallelism value is generated using Algorithm 2. This algorithm determines all jobs in the real data with runtime label sl and parallelism label sc and forms a multiset of the numbers of processors of these jobs. It then generates the parallelism value by selecting an element of the multiset uniformly at random. Thirdly, we control BoT behaviour in the main loop. Any two consecutive jobs j−1 and j that satisfy the condition Arr_j − Arr_{j−1} ≤ ∆ are considered similar and are therefore given the same runtime and number of processors. In addition, we control the size of each BoT by sampling a value BoTSize from the fitted Zipf distribution Z. Whenever the size of a BoT reaches BoTSize, we stop that BoT and start a new one by calling the function RandomlyGenerate and sampling a new value for BoTSize.
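The BoT control in the main loop of Algorithm 1 can be sketched as follows. This is a hedged illustration rather than the original code: `generate_workload` is a hypothetical name, and the two callables `sample_bot_size` (a draw from the fitted Zipf distribution Z) and `randomly_generate` (the function RandomlyGenerate) are assumed to be supplied by the caller.

```python
def generate_workload(arr, delta, sample_bot_size, randomly_generate):
    """Return synthetic runtimes and processor counts for the arrivals in arr.

    arr               : job arrival times in increasing order.
    delta             : the BoT parameter (maximum gap inside a BoT).
    sample_bot_size   : callable returning a BoT size (assumed helper).
    randomly_generate : callable bs -> (runtime, cpus) (assumed helper).
    """
    if not arr:
        return [], []
    # Initialize the first BoT and the first job.
    bots, bot_size = 1, sample_bot_size()
    run, cpu = randomly_generate(bot_size)
    runs, cpus = [run], [cpu]
    for j in range(1, len(arr)):
        if arr[j] - arr[j - 1] <= delta and bots < bot_size:
            # Same BoT: job j copies the runtime and parallelism of job j-1.
            bots += 1
        else:
            # Gap too large or BoT full: start a new BoT with fresh values.
            bots, bot_size = 1, sample_bot_size()
            run, cpu = randomly_generate(bot_size)
        runs.append(run)
        cpus.append(cpu)
    return runs, cpus
```

For instance, with a fixed BoT size of 2 and arrivals 0, 1, 2, 3, 100 under delta = 5, the first two jobs share one pair of values, the next two share a second pair, and the last job starts a third BoT because its inter-arrival gap exceeds delta.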

As for the complexity of the two algorithms, Algorithm 2 is O(n), since its first step has to traverse L_i, i = 1, ..., n and C_i, i = 1, ..., n. With respect to Algorithm 1, we do not count the step of calculating the table Pr(c, l) in its complexity because this table is static, i.e. calculated once. The main loop of Algorithm 1 runs m − 1 times, where m is the number of synthetic jobs, and each iteration may call Algorithm 2 through the function RandomlyGenerate. Therefore, the total complexity of Algorithm 1 is O(m·n).

5.2 Experimental Results

In this section we present experiments that validate our model. We apply the model to real traces to generate synthetic workloads, and evaluate the quality of the synthetic workloads by comparing them with the real data. The long range dependence and temporal burstiness of the synthetic job arrival process are controlled well by our job arrival model in Chapter 4. Here, we evaluate the Bag-of-Tasks behaviour, spatial burstiness and the cross-correlation between runtime and parallelism. Quantitative metrics for these characteristics are given in Section 2.3. In addition, we evaluate our model based on the marginal distributions. Finally, we present a simulation experiment with our model and compare its performance with that of real-world data.

In document Workload modeling and performance evaluation in parallel systems (Page 70-73)