Impact Of Parallelism And Virtualization On Task Scheduling In Cloud Environment


Sanjay Kumar Sharma, Nagresh Kumar

Abstract- Today, cloud computing has placed itself in every field of the IT industry. It provides infrastructure, platform, and software as amenities that users can access effortlessly via the internet. It serves a large number of users and must handle a large number of task executions, which requires a suitable task scheduling algorithm. Virtualization and parallelism are core components of cloud computing for optimizing resource utilization and increasing system performance. Virtualization is used to provide IT infrastructure on demand; to take further advantage of it, parallel processing plays an important role in improving system performance. It is therefore important to understand the mutual effect of parallel processing and virtualization on system performance, and this research helps to better understand that effect. In this paper, we study the effect of parallelization and virtualization on the scheduling of tasks in order to optimize time and cost. We found that virtualization along with parallelization boosts system performance. The experimental study has been done using the CloudSim simulator.

Keywords: virtualization; parallel programming; distributed processing


1. INTRODUCTION

In the present world of information technology, cloud computing has emerged as a new computing technology due to its economical and operational benefits. Cloud computing is able to process an enormous amount of data using high computing capacity and distributed servers. Clients can avail themselves of this facility on a pay-per-use basis. When user needs change, the cloud servers' capacity scales up and down to meet those requirements. It is highly flexible, reduces capital expenditure, offers robust disaster recovery, and can be operated from anywhere through the internet. Users access these services by simply submitting requests to the environment provided by the service provider. Parallel processing is a computing technique in which more than one process executes simultaneously on different processors, so that multiple processes solve a given problem efficiently. The divide-and-conquer technique is used to divide a task into multiple subtasks, and a parallel program written on this basis executes the subtasks on multiple processors. The requirement for high computation power cannot be fulfilled by a single CPU, so parallel processing improves system computation power by increasing the number of CPUs; it is the most cost-effective technique for enhancing computation power and can also be used for load balancing in cloud computing [1]. Virtualization is a key component of cloud computing. Virtualization is a mechanism used to create interactive environments such as servers, storage, operating systems, and desktops, as expected by the user. Hardware virtualization is based on a software layer called a hypervisor or virtual machine manager (VMM); this software operates each VM instance in complete isolation. A high-performance server built from several machines needs suitable, customized software on demand. This approach helps deploy tasks in parallel to the available resources on demand, as offered by platforms such as VMware, Amazon EC2, vCloud, and RightScale [1][2][3][4].
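To make the divide-and-conquer idea concrete, the following minimal Java sketch (our own illustration, not part of the original experiments) splits a summation into independent subtasks that the fork/join framework schedules on the available CPU cores; the class name and threshold value are chosen by us.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Divide-and-conquer sketch: sum an array by recursively splitting it into subtasks.
    public class ParallelSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;   // below this size, compute sequentially
        private final long[] data;
        private final int lo, hi;

        ParallelSum(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {                // conquer: range is small, sum it directly
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;                 // divide: split the range into two subtasks
            ParallelSum left = new ParallelSum(data, lo, mid);
            ParallelSum right = new ParallelSum(data, mid, hi);
            left.fork();                               // run the left half on another worker thread
            return right.compute() + left.join();      // combine the partial results
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1L);
            long total = ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
            System.out.println("sum = " + total);      // prints: sum = 1000000
        }
    }

On a multicore machine the two halves of each split can run on different cores, which is the same principle a cloud exploits when independent cloudlets are mapped to separate virtual CPUs.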

Parallelism and virtualization are the key components of cloud computing. The question, then, is how parallelism and virtualization affect each other with respect to system performance: do they hinder or boost it, and which other parameters also influence them? This paper is organized as follows: Section 2 reviews previous work; Section 3 details the proposed methodology for parallelization and virtualization together with the experimental setting and simulation results; Section 4 presents the analysis; Section 5 concludes and outlines future work.

2. RELATED WORK

Parallel architecture is an organization of resources that maximizes resource utilization and improves system performance. The organization of resources falls into three categories: shared memory multiprocessor (SMP), cluster parallel computers, and hybrid. An SMP is also called a multiprocessor, in which every processor accesses a single shared memory; a task divided into subtasks must therefore share that memory, and communication between subtasks takes place via shared variables. In cluster parallel computers, multiple computing nodes, each with its own memory and processor, are connected to one another, so two processors must exchange messages in order to share variable values. A hybrid architecture is simply a set of interconnected SMPs: communication within an SMP takes place through variable values in shared memory, while SMP nodes communicate via message passing. Cluster architecture is highly scalable [5]. OpenMP is a standard application program interface (API) that consists of a set of directives and library subroutines [6]. OpenMP is a shared-memory multiprocessing programming environment that supports the C/C++ and Fortran programming languages; a programmer can write and execute a parallel program in the OpenMP environment by dividing a task into independent subtasks [7].

Virtualization is a technology used to provide better resource utilization. Different operating systems can be installed on different virtual machines (VMs) to increase system flexibility. A hypervisor, also called a virtual machine manager (VMM), runs on top of the hardware: it hides the actual physical hardware and provides virtual resources as per user expectations. The hypervisor is also responsible for protecting the hardware from


malware and unauthorised access. All external requests are managed by the hypervisor. Virtualization and parallelization are key components of cloud computing for improving system performance, but they also place some overhead on the complete system; the overhead percentage depends upon the host configuration. Perera et al. [8] compare the hypervisors VMware [10] and Xen [9] and conclude that Xen performs better than VMware for paravirtualized guests, whereas VMware performs better than Xen for fully virtualized guests [10]. Tafa et al. [11] study the behaviour of parameters such as memory utilization, transfer time, and CPU consumption in five different hypervisors and conclude that KVM imposes more overhead than the other hypervisors. To study the effect of parallelism and virtualization, Xu et al. [12] compare the performance of the OpenMP API, MPI, and a hybrid paradigm in the Xen hypervisor. They conclude that an ideal speedup can be achieved in an SMP: if the number of virtual CPUs is not more than the number of physical CPUs, a linear speedup is achieved. Sharma et al. [13] found that parallelization of a sequential process improves performance; their results show that the parallel algorithm is approximately twice as fast as the sequential algorithm, and the work can also be applied, to a large extent, to real-time implementations on high-performance systems.

3. METHODOLOGY

For the experimental study, we use the CloudSim simulator and configure the hypervisor to analyse how the parameters of parallelism and virtualization affect system performance.

VM Allocation Model

To exhibit the simulation results in the CloudSim simulator, space-shared and time-shared allocation policies are provisioned for both cloudlets and VMs [14]. The execution of cloudlets and VMs under time-shared and space-shared provisioning is shown in Fig. 1, which depicts the execution of eight tasks (t1, t2, …, t8) on a host with two CPU cores running two VMs, each VM having two cores.

In Fig. 1(a), space-shared provisioning is used for both cloudlets and VMs. Under space sharing only one VM can run at a time; since each VM has two cores, two tasks can execute simultaneously, as shown in Fig. 1(a). Under the space-shared policy for VMs and cloudlets, the estimated finish time eft(p) of a task p managed by virtual machine i is given by

$$eft(p) = est(p) + \frac{rl}{capacity \times cores(p)}$$

where est(p) is the task's execution start time, rl is the total number of instructions the task must execute on a processor, and cores(p) is the number of PEs it requires. The value of eft(p) depends on the position of the task in the queue, because tasks are placed in the ready queue only when the required number of free PEs is available in the VM. The total capacity of a host with np processing elements (PEs) is given by:

$$capacity = \frac{\sum_{i=1}^{np} cap(i)}{np}$$

where cap(i) is the processing strength of the individual PEs.
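As a hypothetical worked instance of the two expressions above (the numbers are ours, not taken from the experiments): for a host with np = 4 PEs of 1000 MIPS each and a cloudlet with rl = 10000 instructions, cores(p) = 2, and est(p) = 0,

$$capacity = \frac{4 \times 1000}{4} = 1000 \text{ MIPS}, \qquad eft(p) = 0 + \frac{10000}{1000 \times 2} = 5 \text{ time units.}$$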

Fig. 1: Provisioning policies for cloudlets and VMs. (a) Cloudlets: Space-shared, VMs: Space-shared; (b) Cloudlets: Time-shared, VMs: Space-shared; (c) Cloudlets: Space-shared, VMs: Time-shared; (d) Cloudlets: Time-shared, VMs: Time-shared.

Figure 1(b) shows a space-shared policy for VMs and a time-shared policy for cloudlets. All the tasks assigned to a VM run simultaneously during the lifetime of that VM; when one VM completes its lifetime, the next VM starts executing. The estimated finish time of a cloudlet managed by a VM is given by

$$eft(p) = ct + \frac{rl}{capacity \times cores(p)}$$

where ct is the current simulation time and cores(p) is the number of PEs required by the cloudlet. The total processing capacity of the cloud host is given by:

$$capacity = \frac{\sum_{i=1}^{np} cap(i)}{\max\left(\sum_{j=1}^{U} cores(j),\; np\right)}$$

where cap(i) is the processing strength of the individual PEs and the summation in the denominator runs over the cores requested by all U virtual machines currently allocated to the host.
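A hypothetical worked instance of the time-shared capacity formula (again with our own numbers): with np = 4 PEs of 1000 MIPS each and U = 4 VMs each requesting 2 cores, the denominator is max(8, 4) = 8, so

$$capacity = \frac{4 \times 1000}{\max(8,\,4)} = 500 \text{ MIPS per requested core.}$$

Oversubscribing the host therefore halves the effective speed of every core, which is consistent with the roughly doubled turnaround times reported for the time-shared columns of the tables below.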

In Figure 1(c), a time-shared allocation policy for VMs and a space-shared policy for tasks is used. In this scenario each VM is assigned a time slice, and with each time slice a context switch takes place among the available VMs; at a given instant of a time slice, only one task can execute. In Fig. 1(d), a time-shared allocation policy is used for both tasks and VMs: the processing power of the VMs is shared among the tasks, and there is no queuing delay in this scenario.
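In CloudSim terms, the four combinations of Fig. 1 correspond to the VM scheduler attached to the host and the cloudlet scheduler attached to each VM. The following fragment is a minimal sketch assuming the CloudSim 3.x class names; the surrounding datacenter and broker code is omitted here and appears in the configuration sketch of the next section, and the broker id used is a hypothetical placeholder.

    // Host with two 1000-MIPS PEs; the VmScheduler decides how VMs share these PEs.
    List<Pe> peList = new ArrayList<Pe>();
    peList.add(new Pe(0, new PeProvisionerSimple(1000)));
    peList.add(new Pe(1, new PeProvisionerSimple(1000)));
    Host host = new Host(0, new RamProvisionerSimple(65535), new BwProvisionerSimple(10000),
            1000000, peList, new VmSchedulerSpaceShared(peList));   // or: new VmSchedulerTimeShared(peList)

    // The CloudletScheduler decides how cloudlets share one VM's PEs. Fig. 1(a)-(d) are the four
    // combinations of {space, time}-shared VM scheduler with {space, time}-shared cloudlet scheduler.
    int brokerId = 0;   // id of the DatacenterBroker that will own this VM (hypothetical value)
    Vm vm = new Vm(0, brokerId, 1000, 2, 512, 1000, 10000, "Xen",
            new CloudletSchedulerSpaceShared());                    // or: new CloudletSchedulerTimeShared()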

Workload Variability [15]

When evaluating the mutual effect of virtualization and parallelization, a particular set of applications exhibits different results under different performance parameters. Small changes in the host configuration, the cloudlet and VM allocation policies, queuing delay, task scheduling algorithms, the nature of the hypervisor, and the underlying hardware platform can produce completely different results. The dimensions of the virtualization and parallelization model are reflected in the simulation parameters described below.

Datacentre Parameters

String arch = "x86"; // system architecture String os = "Linux"; // operating system String vmm = "Xen";

Number of host = 2

Host Parameters

int ram = 65535;          // host memory (MB)
long storage = 1000000;   // host storage
int bw = 10000;           // host bandwidth
Each host is a quad-core machine.

VM Parameters

long size = 10000;        // image size (MB)
int ram = 512;            // VM memory (MB)
int mips = 1000;
long bw = 1000;
int pesNumber = n;        // n = number of CPUs per VM
String vmm = "Xen";       // VMM name

Cloudlet Parameters

long length = 1000;
long fileSize = 300;
long outputSize = 300;
int pesNumber = 1;

VM Scheduling Policy: time-shared and space-shared.
Cloudlet Scheduling Policy: time-shared and space-shared.
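Putting the parameters above together, a condensed setup sketch is shown below. It assumes the CloudSim 3.x API; the original experiment code is not reproduced in the paper, so this is only an approximation of the described configuration (error handling and result printing omitted, and the time-zone and cost figures in the datacenter characteristics are placeholder values).

    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.LinkedList;
    import java.util.List;
    import org.cloudbus.cloudsim.*;
    import org.cloudbus.cloudsim.core.CloudSim;
    import org.cloudbus.cloudsim.provisioners.*;

    public class ParallelismVirtualizationSim {
        public static void main(String[] args) throws Exception {
            CloudSim.init(1, Calendar.getInstance(), false);            // one cloud user, no trace events

            // Two quad-core hosts, as in the host parameters above.
            List<Host> hosts = new ArrayList<Host>();
            for (int h = 0; h < 2; h++) {
                List<Pe> pes = new ArrayList<Pe>();
                for (int p = 0; p < 4; p++) pes.add(new Pe(p, new PeProvisionerSimple(1000)));
                hosts.add(new Host(h, new RamProvisionerSimple(65535), new BwProvisionerSimple(10000),
                        1000000, pes, new VmSchedulerSpaceShared(pes))); // swap for VmSchedulerTimeShared
            }
            DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                    "x86", "Linux", "Xen", hosts, 10.0, 3.0, 0.05, 0.001, 0.0);
            new Datacenter("Datacenter_0", characteristics,
                    new VmAllocationPolicySimple(hosts), new LinkedList<Storage>(), 0);

            DatacenterBroker broker = new DatacenterBroker("Broker_0");

            // One scenario from Table 1: 8 single-PE VMs; cloudlet scheduler chosen per column.
            int numVms = 8, pesPerVm = 1;
            List<Vm> vms = new ArrayList<Vm>();
            for (int i = 0; i < numVms; i++)
                vms.add(new Vm(i, broker.getId(), 1000, pesPerVm, 512, 1000, 10000,
                        "Xen", new CloudletSchedulerSpaceShared()));     // or CloudletSchedulerTimeShared

            // Twenty identical cloudlets of length 1000, each needing one PE.
            List<Cloudlet> cloudlets = new ArrayList<Cloudlet>();
            UtilizationModel full = new UtilizationModelFull();
            for (int i = 0; i < 20; i++) {
                Cloudlet c = new Cloudlet(i, 1000, 1, 300, 300, full, full, full);
                c.setUserId(broker.getId());
                cloudlets.add(c);
            }

            broker.submitVmList(vms);
            broker.submitCloudletList(cloudlets);
            CloudSim.startSimulation();
            CloudSim.stopSimulation();
        }
    }

Varying numVms, pesPerVm, the number of cloudlets, and the two scheduler classes reproduces the scenarios summarised in the tables below.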

Simulation Results

Simulation results are recorded as the average turnaround time (in simulator time units) in the four tables below; a sketch of how this metric can be computed follows the list. Four abbreviations are used:

(i) SS: space-shared for VMs and space-shared for cloudlets.
(ii) ST: space-shared for VMs and time-shared for cloudlets.
(iii) TS: time-shared for VMs and space-shared for cloudlets.
(iv) TT: time-shared for VMs and time-shared for cloudlets.
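The average turnaround time reported in the tables can be obtained, for instance, from the finished cloudlets returned by the broker once the simulation stops. This is a sketch continuing the configuration code above and assuming the CloudSim 3.x Cloudlet accessors getFinishTime() and getSubmissionTime():

    // After CloudSim.stopSimulation(): average turnaround time over all finished cloudlets.
    List<Cloudlet> finished = broker.getCloudletReceivedList();
    double totalTurnaround = 0.0;
    for (Cloudlet c : finished) {
        // turnaround = completion time - submission time, in simulator time units
        totalTurnaround += c.getFinishTime() - c.getSubmissionTime();
    }
    System.out.printf("Average turnaround time: %.2f%n", totalTurnaround / finished.size());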

TABLE 1: Cloudlets = 20, PEs/VM = 1

Prov (VMs)   SS     ST     TS     TT
2            5.6    5.6    10.1   10.1
4            3.1    3.1    5.1    5.1
8            1.9    1.9    2.7    2.7
16           1.9    1.9    2.7    2.7
32           1.9    1.9    2.7    2.7
64           1.9    1.9    2.7    2.7

TABLE 2: Cloudlets = 20, PEs/VM = 2

Prov (VMs)   SS     ST     TS     TT
2            3.1    3.1    5.1    5.1
4            1.9    1.9    2.6    2.6
8            1.9    1.9    2.6    2.6
16           1.9    1.9    2.6    2.6
32           1.9    1.9    2.6    2.6
64           1.9    1.9    2.6    2.6

TABLE 3: Cloudlets = 40, PEs/VM = 1

Prov (VMs)   SS     ST     TS     TT
2            10.6   10.6   20.1   20.1
4            5.6    5.6    10.1   10.1
8            3.1    3.1    5.1    5.1
16           3.1    3.1    5.1    5.1
32           3.1    3.1    5.1    5.1

TABLE 4: Cloudlets = 40, PEs/VM = 2

Prov (VMs)   SS     ST     TS     TT
2            5.6    5.6    10.1   10.1
4            3.1    3.1    5.1    5.1
8            3.1    3.1    5.1    5.1
16           3.1    3.1    5.1    5.1
32           3.1    3.1    5.1    5.1

Note: 'Prov' is used as an abbreviation for the number of provisioned virtual machines.

Fig-2: Corresponding to Table-1

Fig-3: Corresponding to Table-2

Fig-4: Corresponding to Table-3

(Figures 2-4 plot the average turnaround time against the number of virtual machines for the configurations of Tables 1-3; in each graph the SS and ST curves coincide, as do the TS and TT curves.)


Fig-5: Corresponding to Table-4

Observations:

1. The space-shared virtual machine policy performs better than the time-shared policy in terms of average turnaround time.

2. If the number of PEs per VM increases, the turnaround time decreases in all four scenarios.

3. An exponential increase in the number of virtual machines produces an exponential decrease in turnaround time up to a point, after which the turnaround time remains constant for the rest of the task executions.

4. More switching occurs when virtual machines and cloudlets are provisioned with the time-shared policy.

5. Multicore virtual machines perform better than single-core virtual machines.

4. ANALYSIS

In the above simulation results, we varied only four parameters: the number of cloudlets, the VM allocation policy, the cloudlet allocation policy, and the number of PEs per VM. All other parameters were kept constant, and the size of the cloudlets is the same in all scenarios.

1. As we increase the number of cloudlets while keeping the other parameters constant, the average turnaround time increases. Since all cloudlets are of the same size, the turnaround time necessarily increases in all scenarios.

2. If we increase the number of VMs, the turnaround time decreases up to a certain number of VMs and then stays constant, because beyond that point the required resources are already available and the amount of switching does not change.

3. As we increase the number of cores and PEs per VM, more tasks can execute per unit of time under both space-shared and time-shared provisioning. In space-shared mode only one virtual machine executes at a time, and if that VM has n PEs then n cloudlets can execute simultaneously; in time-shared mode more cloudlets can be executed with less switching.

4. If the number of available VMs is equal to or greater than the number of VMs required by the ready tasks, then the cloudlet and VM allocation policies have no effect.

5. Space-shared provisioning for cloudlets and VMs is more energy-efficient than time-shared provisioning because less switching occurs.

6. The divergence rate of time-shared provisioning of VMs is faster than that of space-shared provisioning, as shown in all four graphs.

It is now clear from the graphs, the data recorded in the four tables, and the result analysis that parallel computing reduces the average turnaround time and increases overall system performance. The availability of multicore VMs allows independent execution of tasks, so more speedup can be achieved. However, more virtualization generates extra overhead, which should be minimized. A hypervisor is responsible for scheduling the available physical resources and allocating them to the different installed virtual machines, so it must take the points raised above into account. The selection of the scheduling algorithm plays a key role in the hypervisor for increasing system performance and improving resource utilization. A suitable scheduling algorithm minimizes the waiting time of VMs; this policy keeps each VM unaware of any competition from other VMs for the available resources. To schedule tasks on the best resources, an optimization algorithm should be implemented in the hypervisor, and due to the heterogeneous nature of a large number of VMs and cloudlets, an intelligent program is also needed for parallel execution.

5. CONCLUSION

Cloud computing is a leading technology in the field of information technology, and virtualization and parallelization are its key components. Parallelization and virtualization clearly improve system performance: the maximum benefit of virtualization is obtained by executing tasks in parallel on virtualized VMs, and such parallelization is possible only through feasible virtualization. At the same time, parallelization and virtualization create extra overhead in cloud computing. In this paper, the mutual effect of virtualization and parallelization is analysed so that a hypervisor can obtain the maximum benefit in terms of average turnaround time. The next step is to overcome the extra overhead of virtualization and parallelization.

REFERENCES

[1] Rajkumar Buyya, Christian Vecchiola, S. Thamarai Selvi, "Mastering Cloud Computing: Foundations and Applications Programming," Morgan Kaufmann, 1st edition, pp. 31-35, ISBN-13: 978-0124114548, 2013.

[2] D. Ruest and N. Ruest, “Virtualization: a beginner’s guide,” McGraw Hill, 2009.

[3] S. Zaghloul, “Virtualization: key to cloud computing,” MASAUM International Conference on Information Technology (MICIT’12), 2012.

[4] X. Chen and X. Huo, "Cloud computing research and development trend," In the IEEE Second International Conference on Future Networks, pp. 93-97, 2010.

[5] A. Kaminsky, "Building Parallel Programs: SMPs, Clusters, and Java," Course Technology, Cengage Learning, 2010.


[7] P. Pacheco, "Parallel Programming with MPI," Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1997.

[8] P. Perera and C. Keppitiyagama, "A Performance Comparison of Hypervisors," The International Conference on Advances in ICT for Emerging Regions (ICTer2011), 2011.

[9] Xen® Hypervisor. “www.xen.org”.

[10] VMware® Hypervisor. “www.vmware.com”.

[11] I. Tafa, E. Beqiri, H. Paci, E. Kajo and A. Xhuvani, “The evaluation of transfer time, CPU consumption and memory utilization in Xen-PV, Xen-HVM, OpenVZ, KVM-FV and KVM-PV hypervisors using FTP and HTTP approaches,” The Third International Conference on Intelligent Networking and Collaborative Systems, 2011.

[12] X. Xu, F. Zhou, J. Wan and Y. Jiang, “Quantifying performance properties of virtual machines,” The International Symposium on Information Science and Engineering, 2008.

[13] S. K. Sharma and K. Gupta, "Parallel Performance of Numerical Algorithms on Multi-core System Using OpenMP," Proceedings of the Second International Conference on Advances in Computing and Information Technology (ACITY), Springer Advances in Computing and Information Technology, Volume 2, ISBN 978-3-642-31552-7, 2012.

[14] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, Volume 41, Issue 1, pp. 23-50, ISSN 0038-0644, January 2011.
