
4.3 Scheduling SDaaS Software Workflows in the Cloud

4.3.7 Scheduling algorithms

The scheduling needed for software workflows is a multi-criteria scheduling which aims to meet the execution requirements of each activity and reduce the overall execution cost (of all workflows) while not significantly increasing the execution time (of individual workflows). Here, we define the terms related to the scheduling algorithms:

• Workflow engines pool: is a pool of workflow engines deployed on similar virtual machines (in terms of computational power and deployment model).

• Workflow makespan (execution time): is the difference between the execution start time of the first activity in the workflow and the execution end time of the last activity in the workflow.

• Workflow engine operational hours: are the hourly units of time starting from the time a workflow engine starts.

• Workflow engines pool size (R): is the maximum number of active workflow engines a pool can have at any given operational hour.

• Execution cost: is the cost of executing all the desired workflows in the SDaaS architecture. This can be calculated by aggregating the cost of running each workflow engine instance as follows:

$$\mathrm{Cost} = \sum_{i=1}^{n} VM_i \cdot t_i \qquad (4.13)$$

where $VM_i$ is the price per partial hour of running the virtual machine that hosts workflow engine i, $t_i$ is the number of partial hours that workflow engine i has been running, and n is the number of workflow engine instances.
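As a minimal sketch of Equation 4.13, the cost can be computed from the running time of each workflow engine. The function names, the minute-based input and the prices (given in cents) are assumptions made for illustration only.

```python
import math

def partial_hours(minutes_running):
    # Billing is per partial hour, so the running time is rounded up.
    return math.ceil(minutes_running / 60)

def execution_cost(engines):
    """engines: iterable of (price_per_partial_hour, minutes_running) pairs."""
    return sum(price * partial_hours(minutes) for price, minutes in engines)

# Hypothetical prices in cents per partial hour (not taken from any provider).
engines = [(10, 70), (10, 20), (25, 60)]
print(execution_cost(engines))
```

Here the first engine is billed for two partial hours and the other two for one partial hour each.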

Workflow engines are deployed on virtual machines (VMs) in the cloud. Most cloud providers charge per partial hour of VM usage, which means that the usage time is rounded up to a whole number of hours. For example, a usage of one hour and ten minutes is charged as two hours. Therefore, to reduce cost, the workflow engines pool size R for a given pool should be limited, and the workflow engines in the pool should be utilised as fully as possible before they are shut down. For example, if a workflow engine becomes idle after executing a 10-minute activity, it can be kept on standby for the next 50 minutes (to accommodate any upcoming activities) without incurring any extra cost.

To illustrate how the allocation of activities to workflow engines affects the execution cost, consider Figure 4.3. The figure shows three activities [A1, A2, A3] with execution times of [20, 40, 20] minutes respectively. Each activity is allocated to its own workflow engine, resulting in a cost of three partial hours and underutilised workflow engines (the grey areas represent 100 minutes of idle time).

In Figure 4.4, by contrast, the three activities are allocated to a single workflow engine, resulting in a cost of two partial hours and better utilisation of the workflow engine (40 minutes of idle time). The latter scenario would be ideal if the three activities were sequential.

However, if they were concurrent, some of the activities would have to wait for others to finish executing. Therefore, there is a trade-off between the workflow execution cost and the makespan.
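The two allocations of Figures 4.3 and 4.4 can be compared with a short sketch. It assumes that all three activities are ready at time zero and that the activities on one engine run back to back; the function name and the return tuple are illustrative.

```python
import math

def allocation_summary(engines):
    """engines: one list of activity durations (minutes) per workflow engine.

    Returns (billed partial hours, idle minutes, makespan in minutes),
    assuming every activity is ready at time 0.
    """
    billed_hours = 0
    idle_minutes = 0
    makespan = 0
    for durations in engines:
        busy = sum(durations)
        hours = math.ceil(busy / 60)          # rounded up per partial hour
        billed_hours += hours
        idle_minutes += hours * 60 - busy
        makespan = max(makespan, busy)
    return billed_hours, idle_minutes, makespan

separate = allocation_summary([[20], [40], [20]])  # as in Figure 4.3
shared = allocation_summary([[20, 40, 20]])        # as in Figure 4.4
print(separate, shared)
```

Under these assumptions the separate allocation costs three partial hours with 100 idle minutes and finishes after 40 minutes, while the shared allocation costs two partial hours with 40 idle minutes but finishes after 80 minutes.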

Since the pricing for VMs is per partial hour, VMs should be started and shut down at the beginning of an operational hour. The decision to allocate an activity to a workflow engine should be made only when the activity becomes ready to execute, i.e., it is an event-driven decision. In this chapter, we evaluate four different scheduling algorithms and benchmark their performance in terms of both execution cost and makespan.

Chapter 4: Cost-efficient Scheduling of Software Processes Execution in the Cloud

[Figure: A1 (20 mins), A2 (40 mins) and A3 (20 mins), each on its own workflow engine]

Figure 4.3: Allocating activities to workflow engines (a)

[Figure: A1 (20 mins), A2 (40 mins) and A3 (20 mins), all on one workflow engine]

Figure 4.4: Allocating activities to workflow engines (b)

Below we explain the four algorithms. These algorithms are all dynamic and centralised.

4.3.7.1 Unlimited First Come First Serve (UFCFS)

This is the simplest scheduling approach, in which the pool size R is always set to infinity. Once an activity is ready to execute, it is allocated to an available workflow engine in the relevant workflow engines pool (if one exists); otherwise, a new pool and/or workflow engine is created. Figure 4.5 shows the UFCFS algorithm.

4.3.7.2 Limited First Come First Serve (LFCFS)

This is a similar approach to the UFCFS except that there is a universal limit on the number of active workflow engines in any workflow engines pool at any time. Figure 4.6 shows the LFCFS algorithm. The workflow engines pool size limit (R) is an arbitrary value which aims to restrict the execution cost. If all workflow engines in a pool are busy and their number has reached R and a new activity is ready to be executed in this pool, the scheduler will allocate this activity to the workflow engine with the earliest finishing time. This means that the activity will be delayed until a suitable workflow engine becomes available again.
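The allocation rule for a single pool can be sketched as follows. This is only an illustration under stated assumptions: the `Engine` class, the `busy_until` bookkeeping and the use of known activity durations are hypothetical, and times are in minutes.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Engine:
    busy_until: int = 0                       # time at which the jobs queue empties
    jobs: list = field(default_factory=list)

def lfcfs_allocate(pool, activity, duration, now, limit):
    """Allocate one ready activity within a pool and return its start time."""
    idle = [e for e in pool if e.busy_until <= now]
    if idle:
        engine = idle[0]                      # an engine is available now
    elif len(pool) < limit:
        engine = Engine()                     # below the limit R: start a new engine
        pool.append(engine)
    else:
        # Limit reached: queue on the engine with the earliest finishing time.
        engine = min(pool, key=lambda e: e.busy_until)
    start = max(now, engine.busy_until)
    engine.busy_until = start + duration
    engine.jobs.append(activity)
    return start

pool = []
activities = [("A1", 20), ("A2", 40), ("A3", 20)]
starts = [lfcfs_allocate(pool, name, d, now=0, limit=2) for name, d in activities]
print(starts, len(pool))
```

With a limit of 2, the third activity is delayed until the first engine finishes. Passing `limit=math.inf` in this sketch gives the unlimited behaviour of UFCFS.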

Activity A;
List<WorkflowEnginePool> pools;

start
  find a pool in pools which matches the computational resources and privacy requirements of A.
  if (pool is found)
  {
    find an available workflow engine
    if (workflow engine is found)
      add A to the jobs queue of the engine;
    else
    {
      create and start a new workflow engine and add A to its jobs queue;
    }
  }
  else
  {
    create a pool;
    create and start a new workflow engine in the new pool and add A to its jobs queue;
  }
end

Figure 4.5: Unlimited First Come First Serve algorithm
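The pool lookup and creation steps of Figure 4.5 might be sketched as below. The representation is an assumption: pools are kept in a dictionary keyed by a hypothetical (resources, privacy) requirements tuple, and each engine is a plain dictionary with a `busy` flag.

```python
def ufcfs_allocate(pools, activity, requirements):
    """Allocate a ready activity; pools maps a requirements key to a list of engines."""
    engines = pools.setdefault(requirements, [])   # create the pool if none matches
    for engine in engines:
        if not engine["busy"]:
            break                                  # reuse an available engine
    else:
        # No available engine: start a new one (there is no upper limit).
        engine = {"busy": False, "jobs": []}
        engines.append(engine)
    engine["busy"] = True
    engine["jobs"].append(activity)
    return engine

pools = {}
e1 = ufcfs_allocate(pools, "A1", ("small", "public"))
e2 = ufcfs_allocate(pools, "A2", ("small", "public"))   # e1 is busy: new engine
e3 = ufcfs_allocate(pools, "A3", ("large", "private"))  # no matching pool: new pool
print(len(pools), len(pools[("small", "public")]))
```

Because an engine is always created when none is available, no activity ever waits, at the price of a potentially large number of billed partial hours.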

4.3.7.3 Pool-based Adaptive Task Schedule

This algorithm is adapted from the Adaptive Task Schedule algorithm [109] described in Section 4.2. Here, we define a workflow engines pool size limit R dynamically for each pool at the beginning of each operational hour, hence the name Pool-based. The algorithm consists of two main steps:

1. Matching each ready-to-execute activity with a suitable workflow engines pool (a pool which contains workflow engines matching the required resources for the activity).

2. For each workflow engines pool i, the pool size limit $R_i$ is dynamically calculated using the following formula:

$$R_i = T \cdot E_i \qquad (4.14)$$

where T is a universal, arbitrarily chosen real value between 0 and 1 which sets the ratio of workflow engines to the activities to be executed. For example, when T is 0.5, there should be one workflow engine for every two activities. $E_i$ is the number of activities which match pool i and are expected to start in the next hour.

Activity A;
List<WorkflowEnginePool> pools;
int R; // the max number of workflow engines in each pool

Figure 4.6: Limited First Come First Serve algorithm

Unlike the original algorithm, which has two versions (one looking forward and one looking backward), here we only look at the activities expected in the next hour (forward). Since the activities arrive in a non-deterministic way, the history alone does not necessarily give an accurate prediction of the load in the next hour.
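A per-pool limit following Equation 4.14 could be computed as in the sketch below. Rounding the product up to a whole number of engines is an assumption made here, since a pool size must be an integer; the pool names and expected counts are hypothetical.

```python
import math

def pool_limits(T, expected):
    """expected: dict mapping pool id to the number of activities expected next hour."""
    if not 0 <= T <= 1:
        raise ValueError("T must lie between 0 and 1")
    # Assumption: round T * E_i up so that the limit is a whole number of engines.
    return {pool: math.ceil(T * count) for pool, count in expected.items()}

limits = pool_limits(0.5, {"pool_a": 4, "pool_b": 7})
print(limits)
```

With T set to 0.5, a pool expecting four activities is limited to two workflow engines, in line with the one-engine-per-two-activities example above.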

Figure 4.7 shows this algorithm. As we can see, the algorithm is very similar to the LFCFS algorithm except that each pool has its own R.

Activity A;
List<WorkflowEnginePool> pools;
List<int> R; // the max number of workflow engines for each pool

Figure 4.7: Pool-based Adaptive task scheduling algorithm adapted from [109]

4.3.7.4 Proportional Adaptive Task Schedule

Similar to the previous two algorithms, this algorithm sets a limit R on the workflow engines pool size. The difference is that R is now calculated from the ratio between the total execution time of the activities that are predicted to start in the next hour and that of the activities which started execution in the past hour. The following formula is applied when $R_i$ for a given pool i is calculated for the first time:

$$R_i = \left\lfloor \frac{T_{next}}{60} \right\rfloor \qquad (4.15)$$

where $T_{next}$ is the total execution time (in minutes) of the activities that will start in the next hour. Therefore, $R_i$ is the floor of the number of execution hours expected to be needed for the activities that will start in the next hour. When $R_i$ has been set before, the following formula is applied to recalculate $R_i$ at every operational hour:

$$R_i = \left\lceil \frac{T_{next}}{T_{past}} \cdot R_i' \right\rceil$$

where $T_{past}$ is the total execution time (in minutes) of the activities that have started in the past hour and $R_i'$ is the last value of $R_i$. The proportional adaptive task schedule algorithm itself is the same as the pool-based adaptive task schedule algorithm in Figure 4.7.
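The hourly update of the limit can be illustrated with a small sketch. It assumes that the first value is the floor of the predicted minutes divided by 60, and that later values scale the previous limit by the ratio of predicted to past execution time, rounded up. Falling back to the first-time formula when no past execution time is available is an additional assumption, as is the function name.

```python
import math

def proportional_limit(t_next, t_past=None, r_prev=None):
    """Return the pool size limit for the next operational hour.

    t_next: predicted execution minutes of activities starting next hour.
    t_past: execution minutes of activities started in the past hour.
    r_prev: the last value of the limit, or None on the first calculation.
    """
    if r_prev is None or not t_past:
        # First calculation (or, by assumption, no past load to compare with).
        return math.floor(t_next / 60)
    # Scale the previous limit by the predicted-to-past ratio and round up.
    return math.ceil(t_next * r_prev / t_past)

first = proportional_limit(150)                        # 2.5 hours of work
grown = proportional_limit(180, t_past=120, r_prev=first)
shrunk = proportional_limit(60, t_past=120, r_prev=grown)
print(first, grown, shrunk)
```

In this run the limit starts at 2, grows to 3 when the predicted load is 1.5 times the past load, and drops back to 2 when the predicted load halves.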
