A New Task Scheduling Algorithm Based on Improved Genetic Algorithm in Cloud Computing Environment

¹Congcong Xiong, ²Long Feng, ³Lixian Chen

*¹ College of Computer Science and Information Engineering, Tianjin University of Science & Technology, Tianjin 300222, China, E-mail: [email protected]

² College of Computer Science and Information Engineering, Tianjin University of Science & Technology, Tianjin 300222, China, E-mail: [email protected]

³ Informatization Construction and Management Office, Tianjin University of Science & Technology, Tianjin 300222, China, E-mail: [email protected]

Abstract

Task scheduling and resource allocation are two core techniques in cloud computing. To use resources efficiently in a heterogeneous environment, this paper presents a new task scheduling algorithm based on the Genetic Algorithm (GA). The model considers four aspects of task scheduling in a cloud computing environment: task finish time, task expense, bandwidth and reliability. The optimization target of the model is to achieve minimum time, minimum cost, maximum bandwidth and maximum reliability. In addition, the new algorithm adopts rule-bound crossover and mutation operations to improve individual quality. Simulation experiments show that, compared with the existing GA, the new GA, which introduces Quality of Service (QoS), reflects users’ satisfaction with the scheduling results comprehensively and solves the task scheduling problem in a cloud computing environment effectively.

Keywords: Genetic Algorithm (GA), cloud computing, task scheduling

1. Introduction

A cloud environment contains many computing resources, including CPU, memory, bandwidth and so on, and scheduling tasks effectively on them has become a hot topic in computer science. The task scheduler in a cloud computing environment determines a proper assignment of resources to the tasks of each job so that all jobs received from users are completed. Researchers have proposed static, dynamic and mixed resource scheduling strategies for cloud computing environments, such as FIFO (First In, First Out) and its simple extensions, ISH [1], ETF [1], GA-based task scheduling and so on [2-5]. The first two are simple static strategies but usually perform poorly, because the resource pool quotas and job queues partly depend on manual settings. GA-based task scheduling algorithms are heuristic intelligent algorithms, but they often suffer from problems such as slow convergence and one-sided optimization targets.

Considering the shortcomings of the existing task scheduling algorithms in cloud computing, a new task scheduling algorithm based on an improved genetic algorithm is presented. A well-distributed strategy is proposed to avoid premature convergence: when the initial population is generated, the chromosome matching rate is used to distribute individuals uniformly in the solution space. In addition, Quality of Service (QoS) is introduced to improve the fitness function, which not only finds the correspondence between tasks and virtual machines quickly and effectively, but also reflects users’ satisfaction with the scheduling results.

2. Task scheduling

Resource scheduling in cloud computing has many similarities to, as well as differences from, resource scheduling in other environments. The most remarkable difference is the object of scheduling. The objects of traditional resource scheduling are the threads and tasks running on physical resources, which is fine-grained scheduling. The objects of scheduling in the cloud environment are virtual machines, which is coarse-grained scheduling. A cloud computing environment contains many computation resources, storage resources and other resources. The task scheduling algorithm not only needs to meet the deadlines of the jobs, but also has to respect users’ expectations on the bandwidth and cost of the resources. Task scheduling in a cloud computing environment should therefore be a multi-objective scheduling problem.

After users submit jobs to the cloud computing environment, the MapReduce model divides each job into several interdependent tasks. The execution process of the tasks can be represented by a Directed Acyclic Graph (DAG), as shown in Figure 1. The nodes are the tasks to be executed. The directed edges show the dependency relationships between the tasks.

Figure 1. An example of a DAG

The task scheduling in a cloud computing environment is to execute $N$ interdependent tasks $T = \{T_1, T_2, \ldots, T_N\}$ on $M$ resources ($P = \{P_1, P_2, \ldots, P_M\}$) efficiently, and the result should satisfy users’ expectations.

The Expected Time to Compute (ETC) matrix entry $ETC[i, j]$ represents the expected computation time of the task $T_i$ on the resource $P_j$. If the task $T_i$ cannot execute on the resource $P_j$, $ETC[i, j] = \infty$. The total finish time of the task $T_i$ could be obtained from Equation 1.

$$ETF_i = \min_{1 \le j \le M} \left( s_j + ETC[i, j] \right) \tag{1}$$

Then the total execution time of all the tasks could be expressed as follows:

$$Execute\_time = \max_{1 \le i \le N} ETF_i \tag{2}$$

where $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, M\}$.
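As an illustration of Equations 1 and 2, the following minimal Python sketch computes the per-task finish times and the total execution time from an ETC matrix. The matrix values are hypothetical, `math.inf` stands for a task that cannot run on a resource, and the optional `start` list is an assumed reading of the per-resource $s$ term in Equation 1 (it defaults to zero, meaning every resource is idle at the start).

```python
import math

def finish_times(etc, start=None):
    """ETF_i = min over j of (start_j + ETC[i][j]) for every task i (Equation 1)."""
    n_resources = len(etc[0])
    # Assumption: resources are idle at time 0 unless start offsets are given.
    start = start if start is not None else [0.0] * n_resources
    return [min(s + t for s, t in zip(start, row)) for row in etc]

def execute_time(etc, start=None):
    """Execute_time = max over i of ETF_i (Equation 2)."""
    return max(finish_times(etc, start))

# Hypothetical ETC matrix: 3 tasks (rows) on 2 resources (columns).
ETC = [
    [4.0, 6.0],
    [math.inf, 3.0],  # task 2 cannot execute on resource 1
    [5.0, 2.0],
]

print(finish_times(ETC))  # [4.0, 3.0, 2.0]
print(execute_time(ETC))  # 4.0
```

Using infinity for infeasible pairs means the minimum in Equation 1 never selects them, so no separate feasibility check is needed.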

Network transmission, which is determined by bandwidth, has a significant effect on applications that communicate with others frequently or carry a large amount of information. Given that $BW_{wm}$ is the bandwidth of the resource, the total bandwidth used by all the tasks can be defined as follows:

[Equation 3: $Execute\_bw$, a logarithmic expression in $BW_{wm}$, $TaskNum(J)$ and $TaskTotal$; the exact form is not recoverable from the source text.] (3)

Price is one of the most common QoS constraints at present. Since resources are charged by unit, the expense of executing the task $T_i$ on the resource $P_j$ could be defined as:

$$EC[i, j] = q_1 C_{cpu/num} + q_2 C_{mem/MB} + q_3 C_{stor/MB} + q_4 C_{bw/Mbps} \tag{4}$$

where $C$ is the unit price of each resource type, and $q_k$ ($k = 1, \ldots, 4$) is the number of resources of that type. Then, the total price of all the tasks could be obtained from Equation 5.

$$Execute\_cost = \sum_{i=1}^{N} \min_{1 \le j \le M} EC[i, j] \tag{5}$$

Assume that $Fail[i, j]$ is the breakdown rate of resources obtained by the resource monitoring system. The user expected function for the completion rate can then be obtained from Equation 6.

$$Execute\_succ = \prod_{i=1}^{N} \left( 1 - Fail[i, j] \right) \tag{6}$$

DEFINITION 1 The execution of a task is said to achieve user satisfaction when the resource consumption of the task is close to the value the user expected. $AR_i$ is the real resource consumption of the task $T_i$, and $ER_i$ is the resource consumption that the user expects for it. Formally, the user satisfaction function could be expressed as:

$$W_i = \theta \ln \frac{AR_i}{ER_i} \tag{7}$$

where $\theta$ is a balance constant between 0 and 1. The value of the user satisfaction function is 0 when $AR_i$ is equal to $ER_i$, which indicates that the scheduling result has achieved the user’s satisfaction. If $W_i > 0$, the real resource consumption of the task exceeds the user’s expectation, while $W_i < 0$ indicates the opposite.

Let the user expected cost $Expect\_cost$, the user expected computation time $Expect\_time$, the user expected bandwidth $Expect\_bw$ and the user expected completion rate $Expect\_succ$ be set by users. A weighted objective function can then be used as the fitness function, which is defined as:

$$f = \omega_1 \ln \frac{Execute\_time}{Expect\_time} + \omega_2 \ln \frac{Execute\_bw}{Expect\_bw} + \omega_3 \ln \frac{Execute\_cost}{Expect\_cost} + \omega_4 \ln \frac{Execute\_succ}{Expect\_succ} \tag{8}$$

where $\omega_i$ indicates the weight of the $i$-th QoS constraint and $\sum_{i=1}^{4} \omega_i = 1$. As various applications may require different QoS, the weight coefficient vector can be set differently to satisfy their demands.
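A minimal Python sketch of the weighted fitness function of Equation 8 is given below. It assumes the terms are ordered time, bandwidth, cost, completion rate and that each term is the logarithm of the achieved value over the expected value; the dictionary keys and sample numbers are hypothetical. Terms with zero weight are skipped so that an unavailable metric cannot trigger a logarithm domain error.

```python
import math

QOS_KEYS = ("time", "bw", "cost", "succ")  # assumed term order of Equation 8

def fitness(execute, expect, weights):
    """Weighted sum of ln(Execute_k / Expect_k) over the four QoS terms."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(
        w * math.log(execute[k] / expect[k])
        for w, k in zip(weights, QOS_KEYS)
        if w != 0  # skip unused terms
    )

# Hypothetical expectations and one scheduling result.
expect = {"time": 100.0, "bw": 50.0, "cost": 20.0, "succ": 0.9}
result = {"time": 200.0, "bw": 50.0, "cost": 20.0, "succ": 0.9}
weights = [0.6, 0.1, 0.3, 0.0]

print(fitness(expect, expect, weights))  # 0.0
print(fitness(result, expect, weights))
```

When every achieved value equals its expectation, each logarithm is zero and the fitness is exactly 0, which matches the interpretation of the user satisfaction function in Definition 1.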

3. Genetic algorithm

The Genetic Algorithm (GA) became popular through the work of John Holland in the early 1970s [6]. A GA generates solutions to optimization problems using techniques inspired by natural evolution [7]. It has become a widely used global optimization algorithm in many fields because of its high efficiency, stability and suitability for parallel processing [8].

The Basic Genetic Algorithm (BGA) often suffers from problems such as premature convergence. Several algorithms and methods have been proposed to solve task scheduling problems. Most of them, however, consider only one side: either reducing the completion time and the average completion time, as in [2,9-11], or reflecting users’ integrated requirements on network bandwidth, price and so on. Research that addresses all of these aspects together is therefore needed. Based on the basic idea of GA and the characteristics of task scheduling in a cloud computing environment, a new optimization method built on the BGA is presented to solve the above-mentioned problems. The improved algorithm avoids premature convergence effectively by using the chromosome matching rate. In addition, following the definition of QoS, its fitness function takes users’ expectations, including service response time, network bandwidth, task expense and reliability, as the standard for measuring the scheduling results.

3.1. Encoding of chromosomes

Binary encoding and float encoding are the most common types of encoding. Considering the division of jobs in a cloud computing environment, this paper adopts an indirect encoding type: resource-task encoding. The length of each chromosome is the total number of tasks, and the value of each gene is the number of the resource assigned to the task at that locus. For example, suppose the length of each chromosome is 10 and the value of each gene ranges from 1 to 3.

The chromosome {2,3,1,2,3,2,1,2,2,1} means that the first task is carried out on the second virtual machine (resource), the second task is carried out on the third virtual machine, and so on. Therefore, three tasks have been sent to the first virtual machine to be executed: $T_3$, $T_7$ and $T_{10}$. Five tasks have been sent to the second virtual machine and two to the third virtual machine.
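The decoding of this example chromosome can be checked with a short Python sketch. The function name `tasks_per_resource` is hypothetical; tasks and resources are numbered from 1, as in the text.

```python
from collections import defaultdict

def tasks_per_resource(chromosome):
    """Group 1-based task numbers by the resource number stored in each gene."""
    groups = defaultdict(list)
    for task, resource in enumerate(chromosome, start=1):
        groups[resource].append(task)
    return dict(groups)

chromosome = [2, 3, 1, 2, 3, 2, 1, 2, 2, 1]
print(tasks_per_resource(chromosome))
# {2: [1, 4, 6, 8, 9], 3: [2, 5], 1: [3, 7, 10]}
```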

3.2. Initial population generating

The initial population has a great influence on the convergence of GA. The population size is usually set between 50 and 160. Given the population size $S$ and the chromosome length $N$, the initial population is generated randomly.
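Random generation of the initial population can be sketched as follows. This minimal Python example covers only the uniform random generation described in this subsection; the matching-rate rule mentioned in the introduction is not specified in enough detail to reproduce, so it is left out. The function and parameter names are hypothetical.

```python
import random

def initial_population(size, n_tasks, n_resources, rng=random):
    """Return `size` chromosomes of length `n_tasks` with genes in 1..n_resources."""
    return [
        [rng.randint(1, n_resources) for _ in range(n_tasks)]
        for _ in range(size)
    ]

# Seeded generator so that the sketch is reproducible.
population = initial_population(size=50, n_tasks=10, n_resources=3,
                                rng=random.Random(0))
print(len(population), population[0])
```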

3.3. Fitness function

During each successive generation, individual solutions are selected through a fitness function, which measures the quality of the represented solution. The fitness function is therefore a crucial part of a GA. In the traditional GA the fitness function is usually designed as a one-sided target function, which is not suitable for cloud computing. The satisfaction with a service in a cloud computing environment can be measured by Quality of Service (QoS). Considering the commercial objective of cloud computing and the QoS model, the fitness function can be set as Equation 8.

3.4. Operators

3.4.1. Selection Operation

The objective of the selection operation is to give better solutions a higher probability of being transmitted to the next generation. The selection rate can be defined as:

$$p(i) = \frac{f_i}{\sum_{k=1}^{S} f_k} \tag{9}$$

where $f_i$ is the fitness value of the individual $i$. The roulette wheel selection scheme is adopted to implement the selection step. The cumulative probability of the individual $i$ could be obtained from Equation 10.

$$P_s(i) = \sum_{k=1}^{i} p(k) \tag{10}$$

3.4.2. Crossover and Mutation

Crossover is a basic genetic operator. It partially exchanges information between two selected chromosomes [6,12]. Once a string is picked at random from the population to be subjected to crossover, several crossover points are chosen randomly and the alleles between them are exchanged with its mate to form two new strings. For example, two crossing chromosomes can exchange one or more alleles. Mutation helps to avoid getting stuck at a local optimum and preserves population diversity. A chromosome reversal strategy is used as the mutation method in this paper: a substring of the chromosome is selected randomly and inverted.
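The two operators described above can be sketched in Python as follows. This is a minimal illustration, not the paper's exact rule-bound operators: the number of cut points (`n_points`) and the function names are assumptions, and chromosomes are plain Python lists.

```python
import random

def multipoint_crossover(a, b, n_points=2, rng=random):
    """Swap alternating segments between randomly chosen cut points."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child1, child2 = list(a), list(b)
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        if swap:
            # Exchange the alleles of the current segment between the mates.
            child1[prev:cut], child2[prev:cut] = child2[prev:cut], child1[prev:cut]
        swap, prev = not swap, cut
    return child1, child2

def inversion_mutation(chromosome, rng=random):
    """Reverse a randomly selected substring (chromosome reversal)."""
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]

rng = random.Random(42)
parent_a = [1] * 10
parent_b = [2] * 10
print(multipoint_crossover(parent_a, parent_b, rng=rng))
print(inversion_mutation([2, 3, 1, 2, 3, 2, 1, 2, 2, 1], rng=rng))
```

Both operators keep the chromosome length unchanged, and inversion only reorders genes, so every gene stays within the valid resource range.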

3.5. Proposed Algorithm Procedure

The main procedure of the new algorithm is described as follows.

Step 1: Generate an initial population P(0) using the matching rate, and set k=0.

Step 2: Sort the fitness values of the chromosomes in ascending order.

Step 3: Choose two chromosomes using the roulette wheel and prepare them for crossover and mutation.

Step 4: Use crossover and mutation to create a new population P(k+1), and set k=k+1.

Step 5: If neither the maximum number of generations nor convergence has been reached, return to Step 2.
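The steps above can be sketched as a compact Python loop. This is a simplified illustration under several assumptions: the initial population is purely random (no matching rate), crossover is single-point for brevity, the sorting of Step 2 is omitted because roulette wheel selection does not depend on order, termination uses only the maximum number of generations, and the fitness passed in must be strictly positive so that the selection probabilities of Equations 9 and 10 are well defined. `toy_fitness` is a hypothetical stand-in, not the QoS fitness of Equation 8.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability f_i / sum(f) (Equations 9 and 10)."""
    total = sum(fitnesses)
    cumulative = list(accumulate(f / total for f in fitnesses))
    idx = bisect_right(cumulative, rng.random())
    return population[min(idx, len(population) - 1)]  # guard against rounding

def evolve(fitness, n_tasks, n_resources, size=50, generations=80,
           p_cross=0.8, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(1, n_resources) for _ in range(n_tasks)]
                  for _ in range(size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        next_population = []
        while len(next_population) < size:
            a = roulette_select(population, scores, rng)
            b = roulette_select(population, scores, rng)
            c1, c2 = list(a), list(b)
            if rng.random() < p_cross:  # single-point crossover (simplification)
                cut = rng.randrange(1, n_tasks)
                c1, c2 = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (c1, c2):
                if rng.random() < p_mut:  # inversion mutation
                    i, j = sorted(rng.sample(range(n_tasks + 1), 2))
                    child[i:j] = child[i:j][::-1]
                next_population.append(child)
        population = next_population[:size]
    return max(population, key=fitness)

def toy_fitness(chromosome, n_resources=3):
    """Hypothetical fitness: prefers balanced task counts per resource."""
    loads = [chromosome.count(r) for r in range(1, n_resources + 1)]
    return 1.0 / (1 + max(loads))

best = evolve(toy_fitness, n_tasks=10, n_resources=3)
print(best, toy_fitness(best))
```

The default parameters mirror the experimental settings reported in Section 4 (80 generations, crossover probability 0.8, mutation probability 0.2); the population size of 50 is simply the lower end of the range given in Section 3.2.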

4. Experimental results and evaluation

A simulation experiment is designed to compare the scheduling performance of the BGA and the Improved Genetic Algorithm (IGA). The experiments have been carried out on the CloudSim simulation platform. The initial parameters of the algorithms are as follows: maximum number of generations 80, resource number 20, crossover probability 0.8, mutation probability 0.2. The task number ranges from 20 to 100, and the weight coefficient array $\{\omega_1, \omega_2, \omega_3, \omega_4\}$ is set to $\{0.6, 0.1, 0.3, 0\}$ ($\omega_4$ is set to 0 because the breakdown rate of resources is obtained from the CloudSim platform). The finish time of the two algorithms is shown in Figure 2, and the fitness value of the two algorithms is shown in Figure 3.

As shown in Figure 2, the average finish time of the BGA in the preliminary stage is less than that of the IGA. However, as the number of generations increases, the advantages of the IGA become more and more obvious. The reason is that the crossover and mutation of the IGA improve its global search ability.

The fitness value reflects users’ satisfaction with the scheduling result. The scheduling result matches users’ expectations when the fitness value is 0. If the fitness value is greater than 0, the scheduling result exceeds users’ expectations. The experimental results show that the IGA has a higher fitness value than the BGA, which means that the scheduling result of the IGA satisfies users’ expectations better.

Figure 3. Fitness value of two algorithms

5. Conclusions

Resource scheduling has become more complex with the introduction of virtualization technology in cloud computing [13]. This paper presented an improved task scheduling algorithm for cloud computing, based on the basic genetic algorithm combined with QoS, with the objective of satisfying users’ expectations on service response time, network bandwidth, task expense and reliability. The results show that the new IGA-based task scheduling algorithm not only achieves higher resource utilization, but also reflects the conformity between the scheduling result and users’ expectations. The next step is to study dynamic queue scheduling algorithms in order to realize a universal task scheduling algorithm, and to make comprehensive comparisons with other related algorithms according to different performance indices.

6. References

[1] Kwok Y K, Ahmad I, “Static scheduling algorithms for allocating directed task graphs to multiprocessors”, ACM Computing Surveys, Vol.31, No.4, pp.406 ~ 471, 1999.

[2] Chenghua Shi, Xiaomin Wang, “Scheduling Model of Dispatching Ready Mixed Concrete Trucks Based on GA”, AISS, Vol. 4, No. 8, pp. 131 ~ 136, 2012.

[3] JianPing Wang, YanLi Zhu, HongYu Feng, “A Multi-Task Scheduling Method Based on Ant Colony Algorithm Combined QoS in Cloud Computing”, AISS, Vol. 4, No. 11, pp. 185 ~ 192, 2012.

[4] Paton N W, de Aragao M A T, Lee K, et al, “Optimizing utility in cloud computing through automatic workload execution”, IEEE Data Eng Bull, Vol.32, No.1, pp.51 ~ 58, 2009.

[5] Luyun Xu, Yunsheng Zhang, Xia-an Bi, “A New Model and Queue Management Algorithm for Congestion Control in Cloud Service”, AISS, Vol. 4, No. 11, pp. 320 ~ 327, 2012.

[6] RUDOLPH G, “Convergence analysis of canonical genetic algorithms”, IEEE Trans on Neural Networks, Vol.5, No.1, pp.96-101, 1994.

[7] Di Martino V, Mililotti M, “Suboptimal scheduling in a grid using genetic algorithms”, Parallel Computing, Vol.30, pp.553-565, 2004.

[8] Correa R.C., Ferreira A., Rebreyend P., “Scheduling multiprocessor tasks with genetic algorithms”, IEEE Transactions on Parallel and Distributed Systems, Vol.10, No.8, pp.825-837.

[9] JinFeng Wang, KaiYu Chu, “An Application of Genetic Algorithms for the Flexible Job-shop Scheduling Problem”, IJACT, Vol. 4, No. 3, pp. 271 ~ 278, 2012.

[10] Salcedo-Sanz S., Bousono-Calzon C., Figueiras-Vidal A.R., “A mixed neural-genetic algorithm for the broadcast scheduling problem”, IEEE Transactions on Wireless Communications, Vol.2, No.2, pp.277-283.

[11] Zomaya A.Y., Ward C., Macey B., “Genetic Scheduling for parallel processor systems: comparative studies and performance issues”, IEEE Transactions on Parallel and Distributed Systems, Vol.10, No.8, pp.795-812, 1999.

[12] Arnold D V, Hans-Georg B, “A General Noise Model and Its Effects on Evolution Strategy Performance”, IEEE Transaction on Evolutionary Computation, Vol.10, No.4, pp.380-391, 2006.

[13] Jianfeng Zhao, Wenhua Zeng, Miu Liu, Guangming Li, “A model of Virtual Resource Scheduling