Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster


International Journal of Research and Innovations in Science & Technology,

©SAINTGITS College of Engineering, INDIA www.journals.saintgits.org

Research paper


Mayuri A Mehta, Supriya Pati

Computer Engineering Department, Sarvajanik College of Engineering and Technology, Surat, India

Copyright © 2015 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A scheduling algorithm is required to manage cluster resources efficiently in a Hadoop cluster, thereby increasing resource utilization and reducing response time. The job aware scheduling algorithm schedules non-local map tasks of jobs based on job execution time, earliest deadline first, or workload of the job. In this paper, we present a performance evaluation of the job aware scheduling algorithm using the MapReduce WordCount benchmark. The experimental results are compared with those of the matchmaking scheduling algorithm. The results show that the job aware scheduling algorithm considerably reduces average waiting time and memory wastage compared to the matchmaking algorithm.

Keywords: Scheduling algorithm, heterogeneous cluster, Hadoop, MapReduce

1. Introduction

Hadoop processes large amounts of data in parallel on large clusters of commodity hardware [1]. A central component of Hadoop is MapReduce. MapReduce provides a parallel programming model to distribute and execute data-intensive jobs [2]. The MapReduce programming model divides a job into Map tasks and Reduce tasks. Map tasks can further be divided into local and non-local map tasks. As the number of jobs submitted by users increases, the load of the cluster increases. To manage this load and improve the overall cluster performance, a scheduling mechanism is needed.

Job scheduling algorithms have been studied abundantly in the literature [3-13]. However, the applicability of existing algorithms is restricted as they suffer from one or more of the following issues:

• Random scheduling of non-local map tasks

• Limited resource utilization

• Negligence of small jobs in scheduling

• Inapplicability to heterogeneous clusters

To overcome the above limitations, we have presented a job aware scheduling algorithm in Hadoop for heterogeneous clusters. Unlike conventional algorithms, it schedules non-local map tasks based on one of three criteria: 1) job execution time, 2) earliest deadline first, or 3) workload of the job. In this paper, we present the performance analysis of the job aware scheduling algorithm based on job execution time and workload of the job. The average waiting time of jobs reduces considerably because non-local map tasks of jobs are scheduled smallest execution time first. Moreover, memory wastage is reduced because the scheduler selects the non-local map task that best fits the free memory available.


The rest of the paper is organized as follows: Section 2 presents the system setup considered for evaluation of the job aware scheduling algorithm. Section 3 briefly discusses the job aware scheduling algorithm. Section 4 presents the performance analysis of job aware scheduling. Finally, Section 5 gives the conclusion and some future work.

2. System Model

Hadoop has a master-slave architecture. The two major components of Hadoop are MapReduce and HDFS. The MapReduce programming model consists of a JobTracker (master node) and several TaskTrackers (slave nodes).

JobTracker (JT) is a process that handles and allocates jobs in the JobQueue. TaskTracker (TT) is a process that sends a heartbeat to the JobTracker and, in response, receives a task to be executed on its node [14][15]. The file system component of Hadoop is HDFS. HDFS comprises two components: the Name Node and the Data Node. The Name Node stores all the metadata, and the Data Node stores the actual input data for the tasks to be executed.

Before discussing the job aware scheduling algorithm, we first describe the terminology used in the algorithm. A MapReduce job is divided into several map tasks (m) and several reduce tasks (r). Map tasks can further be divided into local and non-local map tasks. A local map task tl is a map task that is executed on a slave node containing its input data. A non-local map task tnl is a map task that is executed on a slave node that does not contain its input data [4][7-8].

Data locality rate lr is defined as the ratio of the number of local map tasks tl to the number of all map tasks m. MapReduce slot is defined as the maximum number of map and reduce tasks that can run in parallel on a cluster node [1-4]. A locality marker is used to mark the slave nodes in order to ensure that each slave node gets a fair chance to grab its local map tasks [8]. A heartbeat is a signal that a slave node sends to the master node. It carries information about total storage capacity, fraction of storage in use, number of free map slots, number of reduce slots, and the number of data transfers currently in progress [1]. Jobs with a small number of input files, such as ad-hoc queries, sampling, and periodic reporting jobs, are known as small jobs [1-4]. Job execution time is the time needed for a job to complete [7]. Average waiting time is the average of the waiting times of all the jobs in a job queue [7]. The notations used in the proposed algorithm are described in Table 1.

Table 1: List of Notations.

Notation    Description
SNi         i^th slave node, i = 1 to n-1
MN          Master node
NLi         Locality marker of i^th slave node
sci         Free slot count of node SNi
JQ          Job queue
LJQ         Length of job queue
Jk          k^th job of job queue, k = 1 to LJQ
PQ          Priority queue
Cidle       CPU idle time of SN
Didle       Disk idle space of SN
Ridle       RAM idle space of SN
CPUk        CPU requirement of Jk
Dk          Disk requirement of Jk
Rk          RAM requirement of Jk

The algorithm is designed for a Hadoop cluster that consists of n nodes n1, n2, ..., nn. Cluster nodes are categorized as Slave Nodes SN1, SN2, ..., SNn-1 and a Master Node MN. Every Slave Node has access to the Master Node. The Master Node is responsible for monitoring the available resources of the Slave Nodes in the cluster. Each cluster node is composed of a combination of various resources including processor, memory, disk and network connectivity.

3. The Proposed Job Aware Scheduling Algorithm

In Figure 1, we present the proposed job aware scheduling algorithm. The Master Node MN maintains a job queue JQ. Each job is divided into a number of map tasks and reduce tasks. Every map task is further classified as a local map task tl or a non-local map task tnl. Whenever a new node is added to the cluster, its locality marker NLi is set to NULL.

When the i^th slave node SNi has free slots (sci > 0), it sends a heartbeat to MN. The scheduler in MN is responsible for scheduling jobs to SNi. The scheduler checks for unassigned local map tasks tl of the jobs in JQ. If it finds a tl, the task is allocated to SNi for execution, the locality marker NLi of SNi is incremented by one, and sci is decremented by one. The process continues as the scheduler checks the subsequent jobs in JQ for tl. However, if it does not find any tl for the current heartbeat, it sets the locality marker NLi of SNi to zero and does not assign any task to SNi during that heartbeat. If it again finds no tl for the next heartbeat, it allocates an unassigned non-local map task tnl from JQ to SNi in order to avoid wasting cluster resources. The selection of tnl from JQ is based on one of the following three criteria: job execution time, earliest deadline first, or workload of the job.

Input: Job queue, n-1 slave nodes
Output: Result of completed jobs
1. FOR i = 1 to n-1
2.     initialize NLi = NULL
3. FOR i = 1 to n-1
4.     MN receives heartbeat from SNi:
5.     WHILE sci > 0
6.         Initialize flag = NLi
7.         FOR k = 1 to LJQ
8.             IF Jk contains tl then
9.                 Allocate tl of Jk to SNi
10.                sci = sci – 1
11.                IF NLi equals to NULL then
12.                    NLi = 1
13.                ELSE
14.                    NLi = NLi + 1
15.        IF flag = NLi then
16.            NLi = 0
17.        ELSEIF NLi = 0 then
18.            IF (jobs are interactive)
19.                min_task (JQ, SNi)
20.            ELSEIF (jobs have a strict deadline)
21.                deadline_aware (JQ, SNi)
22.            ELSEIF (cluster load > threshold)
23.                workload_aware (JQ, SNi)
24.            sci = sci – 1

Figure 1: Job aware scheduling algorithm.
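The heartbeat handling described above can be sketched in Python as follows. This is a minimal illustration that follows the prose description (local tasks first, no assignment at the first heartbeat without a local task, one non-local task at the next such heartbeat), not the authors' implementation. The names `Job`, `SlaveNode`, `take_local`, `on_heartbeat`, and `shortest_first` are hypothetical, and representing each map task by the set of nodes that hold its input is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    exec_time: float        # estimated execution time (hypothetical field)
    tasks: List[Set[str]]   # per unassigned map task: names of nodes holding its input


@dataclass
class SlaveNode:
    name: str
    free_slots: int                 # sci in Table 1
    marker: Optional[int] = None    # NLi; None stands in for NULL


def take_local(job: Job, node: SlaveNode) -> bool:
    """Remove one unassigned map task of `job` whose input lives on `node`."""
    for idx, holders in enumerate(job.tasks):
        if node.name in holders:
            del job.tasks[idx]
            return True
    return False


Picker = Callable[[List[Job]], Optional[Job]]


def on_heartbeat(node: SlaveNode, queue: List[Job], pick_job: Picker) -> List[Tuple[str, str]]:
    """Handle one heartbeat of `node`; return the (job name, task kind) pairs assigned."""
    assigned: List[Tuple[str, str]] = []
    # Pass over the job queue and hand out local map tasks while slots remain.
    for job in queue:
        if node.free_slots == 0:
            break
        if take_local(job, node):
            node.free_slots -= 1
            node.marker = 1 if node.marker is None else node.marker + 1
            assigned.append((job.name, "local"))
    if assigned or node.free_slots == 0:
        return assigned
    if node.marker == 0:
        # Second heartbeat in a row without a local task: assign one non-local task.
        job = pick_job([j for j in queue if j.tasks])
        if job is not None:
            job.tasks.pop(0)
            node.free_slots -= 1
            assigned.append((job.name, "non-local"))
    else:
        # First heartbeat without a local task: mark the node and assign nothing.
        node.marker = 0
    return assigned


def shortest_first(jobs: List[Job]) -> Optional[Job]:
    """Job execution time criterion: pick the job with the smallest execution time."""
    return min(jobs, key=lambda j: j.exec_time, default=None)
```

In this sketch the selection criterion is passed in as `pick_job`, so the same heartbeat handler can be reused with a deadline-based or workload-based selector.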

Criterion 1: Scheduling based on job execution time
Input: Job Queue, SNi
Output: Result of task
1. FOR k = 1 to LJQ
2.     Insert Jk into PQ based on execution time
3. Allocate tnl of Jk to SNi

Criterion 2: Scheduling based on earliest deadline first
Input: Job Queue, SNi
2. Insert Jk into PQ based on deadline of Jk
3. Allocate tnl of Jk to SNi

Criterion 3: Scheduling based on workload of the job
Input: Job Queue, SNi
2. IF Cidle > CPUk
4. ELSEIF Didle > Dk
6. ELSEIF Ridle > Rk

Figure 2: Algorithm for scheduling of jobs based on different criteria.
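The two priority-queue criteria of Figure 2 (job execution time and earliest deadline first) can be illustrated with a small Python sketch built on the standard `heapq` module. The `Job` fields and the `next_job` helper are hypothetical names introduced for this example, and breaking ties by queue position is an assumption, since Figure 2 does not specify tie handling.

```python
import heapq
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    name: str
    exec_time: float       # estimated execution time of the job
    deadline: float        # deadline of the job
    non_local_tasks: int   # number of unassigned non-local map tasks


def next_job(queue: List[Job], key: str) -> Optional[Job]:
    """Return the job whose non-local map task should be scheduled next.

    `key` is "exec_time" for criterion 1 or "deadline" for criterion 2.
    """
    # Build the priority queue PQ; the queue index breaks ties in FIFO order.
    pq = [(getattr(job, key), idx, job)
          for idx, job in enumerate(queue)
          if job.non_local_tasks > 0]
    heapq.heapify(pq)
    return heapq.heappop(pq)[2] if pq else None


jobs = [
    Job("report", exec_time=9.0, deadline=5.0, non_local_tasks=1),
    Job("query", exec_time=2.0, deadline=20.0, non_local_tasks=1),
    Job("sample", exec_time=1.0, deadline=1.0, non_local_tasks=0),
]
shortest = next_job(jobs, "exec_time")   # selects "query"
earliest = next_job(jobs, "deadline")    # selects "report"
```

The job "sample" is skipped under both criteria because it has no unassigned non-local map task left.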

The functioning of the proposed algorithm as per the three criteria is shown in Figure 2. Under the job execution time criterion, a tnl of the job Jk with the minimum execution time is selected first. Next, a tnl of the job with the next minimum execution time is selected for execution, and so on. Thus, the average waiting time of jobs eventually decreases considerably. This criterion is more suitable if the majority of the jobs in the job queue are interactive jobs. Under the earliest deadline first criterion, a tnl of the Jk having the earliest deadline is selected first. This criterion is appropriate when jobs have a strict deadline. The third criterion is workload of the job, in which the resource requirements (CPUk, Dk, Rk) of all jobs are calculated. When SNi sends a heartbeat to MN, a job is selected from JQ whose resource requirement best fits the available resources (Cidle, Didle, Ridle) on SNi. A tnl of that selected job is scheduled on SNi. The best fit approach to resource allocation is more storage efficient and wastes fewer resources than the first fit approach. This criterion is more appropriate for a cluster that is highly loaded.
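The difference between first fit and best fit selection can be sketched as follows. This is an illustrative example, not the authors' code: the `Resources` tuple and both helper functions are hypothetical, a job is assumed to fit when each requirement is at most the idle amount, and leftover RAM is assumed to be the measure of fit because the evaluation in Section 4 reports memory wastage.

```python
from typing import List, NamedTuple, Optional


class Resources(NamedTuple):
    cpu: float
    disk: float
    ram: float


def fits(need: Resources, idle: Resources) -> bool:
    """A job fits when every requirement is covered by the idle resources."""
    return all(n <= i for n, i in zip(need, idle))


def first_fit(needs: List[Resources], idle: Resources) -> Optional[int]:
    """Index of the first job in queue order that fits, or None."""
    for k, need in enumerate(needs):
        if fits(need, idle):
            return k
    return None


def best_fit(needs: List[Resources], idle: Resources) -> Optional[int]:
    """Index of the fitting job that leaves the least RAM unused, or None."""
    candidates = [k for k, need in enumerate(needs) if fits(need, idle)]
    if not candidates:
        return None
    return min(candidates, key=lambda k: idle.ram - needs[k].ram)


idle = Resources(cpu=8, disk=500, ram=80)   # Cidle, Didle, Ridle of one slave node
needs = [                                   # (CPUk, Dk, Rk) of four queued jobs
    Resources(2, 50, 10),
    Resources(2, 50, 60),
    Resources(4, 100, 75),
    Resources(2, 50, 90),
]
```

With these made-up numbers, first fit picks the first job and leaves 70 units of RAM unused, whereas best fit picks the third job and leaves only 5.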

4. Performance Analysis

We evaluate the job aware scheduling algorithm with respect to two parameters: average waiting time and memory wastage. We carry out the experimental analysis using the MapReduce WordCount job as a benchmark in Hadoop 2.2.0 with the default block size of 128 MB. We submit five job queues with the same job arrival time to the Hadoop cluster. The number of jobs in each job queue is shown in Table 2. In our experimental analysis, a small job is a WordCount MapReduce job whose input file size is less than the default block size. As per the proposed job scheduling algorithm in Figure 1 and Figure 2, data locality is maintained by executing the local map tasks of each job first. Once all local map tasks have been executed, the non-local map task of the job with the shortest execution time is scheduled.

Table 2: Number of jobs in job queues.

Job Queue         Q1   Q2   Q3   Q4   Q5
Number of jobs    2    4    6    8    10

Figure 3: Average waiting time of MapReduce WordCount benchmark in best case.

Algorithm                                  Average Waiting Time (min:sec)
Proposed job aware scheduling algorithm    1.386
Matchmaking algorithm                      4.292

Figure 4: Average waiting time of MapReduce WordCount benchmark in worst case.

Algorithm                                  Average Waiting Time (min:sec)
Proposed job aware scheduling algorithm    4.292
Matchmaking algorithm                      4.292
Performance Improvement                    0%


In the first experimental scenario, we evaluate the performance of the proposed algorithm under the job execution time criterion. It is evaluated for three cases: best case, average case, and worst case. In the best case, each job consists entirely of non-local map tasks, whereas in the worst case, each job consists entirely of local map tasks. In the average case, a job consists of several non-local map tasks and several local map tasks; that is, if a job has tl local map tasks, the number of non-local map tasks tnl is m - tl, where m is the total number of map tasks and 0 <= tl <= m. The waiting time of a job is calculated as the difference between its turnaround time and its execution time.
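To make the waiting time definition concrete, the sketch below computes the average waiting time of a queue of jobs that arrive together. It assumes, purely for illustration, that the jobs run one after another in the given order, which is a simplification of a real cluster; the function name and the execution times are made up.

```python
from typing import List


def average_waiting_time(exec_times: List[float]) -> float:
    """Average waiting time of jobs that arrive at time 0 and run in list order."""
    clock = 0.0
    total_wait = 0.0
    for exec_time in exec_times:
        turnaround = clock + exec_time        # completion time minus arrival time (0)
        total_wait += turnaround - exec_time  # waiting time = turnaround - execution
        clock = turnaround
    return total_wait / len(exec_times)


arrival_order = [8.0, 2.0, 4.0]
fifo_wait = average_waiting_time(arrival_order)              # waits 0, 8, 10
shortest_wait = average_waiting_time(sorted(arrival_order))  # waits 0, 2, 6
```

Under these assumptions the arrival order gives an average wait of 6.0 time units, while ordering by smallest execution time first gives about 2.67, which is the effect the job execution time criterion relies on.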

The average waiting time of MapReduce WordCount jobs for the above discussed three cases is shown in Figure 3, Figure 4, and Figure 5.

Figure 5: Average waiting time of MapReduce WordCount benchmark in average case.

The performance of the job aware scheduling algorithm is highly dependent on the number of non-local map tasks. In the best case, the number of non-local map tasks is highest; hence the average waiting time is reduced significantly, by 67%. In the worst case, since each job has only local map tasks, the scheduling order of the proposed job aware scheduling algorithm and the matchmaking algorithm is the same, and so the performance is similar. In the average case, we have considered an equal number of local and non-local map tasks. In this case, the average waiting time is reduced by 19%.

In the second experimental scenario, the workload of the job criterion is evaluated using three jobs. A Hadoop cluster can be highly loaded, moderately loaded, or lightly loaded. In a highly loaded cluster, the total file size for each job is greater than the amount of free memory available on the node. In a moderately loaded cluster, it is equal to the amount of free memory available. In a lightly loaded cluster, it is less than the amount of free memory available.

In moderately and lightly loaded clusters, the jobs complete their execution in any ordering. Hence, when the total file size for each job is less than or equal to the amount of free memory available, the proposed and matchmaking algorithms produce the same amount of memory wastage. However, in a highly loaded cluster, memory must be utilized efficiently. As per the proposed job scheduling algorithm in Figure 1 and Figure 2, we schedule the non-local map tasks of jobs that best fit the free memory available on the node, rather than following the first fit approach.

We evaluate the proposed algorithm considering MapReduce jobs with different map ranges. Jobs with 0-100 maps are considered small jobs, jobs with 100-500 maps are medium-sized jobs, and jobs with 500-1000 maps are large jobs [16]. The number of maps multiplied by the default block size gives the job size in megabytes (MB). In other words, jobs with sizes in the ranges 0GB-12.8GB, 12.8GB-64GB, and 64GB-128GB are considered small, medium, and large jobs respectively. In our experimental analysis, we have considered three WordCount MapReduce jobs: small, medium and large. The amount of free memory available in our experimental setup is 80GB.
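The size ranges above follow directly from the map ranges and the 128 MB block size, as the short sketch below shows. The function names are hypothetical, and two choices are assumptions of this example: the conversion uses 1 GB = 1000 MB to match the quoted figures, and a job on a shared boundary (100 or 500 maps) is placed in the smaller class, since the text leaves the boundaries ambiguous.

```python
BLOCK_SIZE_MB = 128  # default block size used in the experiments


def job_size_gb(maps: int) -> float:
    """Job size in GB: number of maps times the block size (1 GB = 1000 MB)."""
    return maps * BLOCK_SIZE_MB / 1000


def job_class(maps: int) -> str:
    """Classify a job by its number of maps; boundaries go to the smaller class."""
    if not 0 <= maps <= 1000:
        raise ValueError("map count outside the 0-1000 range considered here")
    if maps <= 100:
        return "small"
    if maps <= 500:
        return "medium"
    return "large"
```

For example, `job_size_gb(100)` gives 12.8 and `job_size_gb(1000)` gives 128.0, matching the upper limits of the small and large ranges.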

The sizes of the small, medium, and large jobs submitted to the cluster are shown in Table 3. The average fractions of memory wastage and memory usage in a highly loaded cluster for the proposed algorithm and the Matchmaking algorithm are shown in Figure 6.

Figure 5 data: the average waiting time (min:sec) is 4.417 for the proposed job aware scheduling algorithm and 5.447 for the Matchmaking algorithm.


Table 3: File size of jobs.

Job Type       File Size (GB)
Small jobs     5, 10
Medium jobs    15, 20, 25, 30, 35, 40, 45, 50, 55, 60
Large jobs     65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125

Figure 6: The average memory wastage of MapReduce WordCount benchmark for highly loaded cluster. (Plot of memory wastage and memory usage, on a scale from 0 to 1.2, for the proposed algorithm and the Matchmaking algorithm under the workload of job criterion.)

It is observed that when the non-local tasks of the jobs that best fit the amount of free memory available are scheduled first, the average amount of memory wastage reduces considerably. The memory wasted over all 260 possible combinations of jobs is 80.4375GB for the proposed algorithm and 113.75GB for the Matchmaking algorithm. The average memory wastage is therefore 0.309375 for the proposed algorithm and 0.4375 for the Matchmaking algorithm. Hence, the performance analysis shows that the proposed algorithm reduces the memory wastage by 12.81% in a highly loaded cluster, which is the difference between the two averages (0.4375 - 0.309375 = 0.128125).
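The reported averages can be reproduced from the totals with a few lines of Python. The sketch assumes that the 260 combinations are formed by taking one small, one medium, and one large job from Table 3 (2 x 10 x 13 = 260); this reading is consistent with the numbers but is not stated explicitly in the text. All variable names are illustrative.

```python
from itertools import product

# File sizes in GB from Table 3.
SMALL = [5, 10]
MEDIUM = list(range(15, 61, 5))    # 15, 20, ..., 60
LARGE = list(range(65, 126, 5))    # 65, 70, ..., 125

# Assumed reading: one job of each type per combination.
combinations = list(product(SMALL, MEDIUM, LARGE))

TOTAL_WASTE_PROPOSED = 80.4375     # reported total for the proposed algorithm
TOTAL_WASTE_MATCHMAKING = 113.75   # reported total for the Matchmaking algorithm

avg_proposed = TOTAL_WASTE_PROPOSED / len(combinations)
avg_matchmaking = TOTAL_WASTE_MATCHMAKING / len(combinations)
difference = avg_matchmaking - avg_proposed

print(len(combinations), avg_proposed, avg_matchmaking, round(difference * 100, 2))
```

The difference between the two averages, expressed as a percentage, is the 12.81% figure quoted above.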

5. Conclusion

In this paper, we have presented a job aware scheduling algorithm in Hadoop that is applicable to heterogeneous clusters. In order to validate the resourcefulness of the proposed job aware scheduling algorithm, we have performed a set of experiments. Specifically, we have analyzed the performance of the proposed algorithm in scheduling non-local map tasks of jobs under the following scenarios: 1) job execution time criterion: performance under multiple job queues, and 2) workload of the job criterion: performance under a highly loaded cluster.

The experimental results in the first scenario show that the proposed algorithm reduces the average waiting time by 67% in the best case and 19% in the average case by scheduling the non-local map tasks of jobs based on job execution time. The worst case is likely to occur seldom. In the second scenario, it is observed that the memory wastage is reduced by 12.81% in a highly loaded cluster. Thus, as per our observations, the proposed job aware scheduling algorithm is suitable for interactive jobs where waiting time is crucial, and is appropriate for jobs with high resource consumption where efficient memory consumption is necessary.

In the future, we aim to analyze the performance of the proposed algorithm based on earliest deadline first. In addition, we aim to evaluate its performance with various cluster settings and different benchmarks.

References

[1] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop Distributed File System,” IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-10, May 2010.

[2] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), pp. 137-150, USENIX Association, December 2004.

[3] D. Yoo and K. M. Sim, “A Comparative Review of Job Scheduling For Mapreduce,” in Proceedings of IEEE Cloud Computing and Intelligence Systems (CCIS), pp. 353-358, September 2011.

[4] B. Thirumala Rao and Dr. L. S. S. Reddy, “Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments,” International Journal of Computer Applications (0975 – 8887), November 2011.

[5] “Hadoop MapReduce Next Generation – Fair Scheduler [Online].” Available: http://hadoop.apache.org/docs/current/Hadoop-yarn/Hadoop-yarn-site/FairScheduler.html [Last accessed: November, 2014].

[6] “Hadoop MapReduce Next Generation – Capacity Scheduler [Online].” Available: http://hadoop.apache.org/docs/current/Hadoop-yarn/Hadoop-yarn-site/CapacityScheduler.html [Last accessed: November, 2014].

[7] M. Zaharia, D. Borthakur, J. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling,” in Proceedings of the 5th European Conference on Computer Systems, ACM, pp. 265-278, 2010.

[8] C. He, Y. Lu, and D. Swanson, “Matchmaking: A New MapReduce Scheduling Technique,” IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 40-47, December 2011.

[9] M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments,” USENIX OSDI, 2008.

[10] Q. Chen, D. Zhang, M. Guo, Q. Deng, and S. Guo, “SAMR: A Self-Adaptive MapReduce Scheduling Algorithm In Heterogeneous Environment,” IEEE 10th International Conference on Computer and Information Technology (CIT 2010), pp. 2736-2743, July 2010.

[11] X. Sun, C. He, and Y. Lu, “ESAMR: An Enhanced Self-Adaptive MapReduce Scheduling Algorithm,” IEEE 18th International Conference on Parallel and Distributed Systems, pp. 148-155, December 2012.

[12] M. Elteir, H. Lin, and W. Feng, “Enhancing MapReduce via Asynchronous Data Processing,” in Proceedings of IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS), pp. 397-405, December 2010.

[13] K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama, “Asynchronous Algorithm in MapReduce,” in Proceedings - IEEE International Conference on Cluster Computing (ICCC), pp. 245-254, September 2010.

[14] J. Venner, “Tuning Your MapReduce Jobs,” in Pro Hadoop, CA: Apress, 2009.

[15] T. White, “How MapReduce Works,” in Hadoop: The Definitive Guide, Third ed. CA: O’Reilly, 2012.

[16] X. Dai and B. Bensaou, “A Novel Decentralized Asynchronous Scheduler for Hadoop,” IEEE Global Communications Conference (GLOBECOM), pp. 1470-1475, December 2013.