An Optimal Task Selection Scheme for Hadoop Scheduling


IERI Procedia 10 ( 2014 ) 70 – 75

Selection and peer review under responsibility of Information Engineering Research Institute doi: 10.1016/j.ieri.2014.09.093

ScienceDirect

2014 International Conference on Future Information Engineering

S. Suresh*, N.P. Gopalan

Department of Computer Applications, National Institute of Technology, Tiruchirappalli 632 015, India

Abstract

MapReduce is a popular parallel programming model used to solve a wide range of BigData applications in cloud computing environments. Hadoop is an open source implementation of MapReduce and is used by a vast number of users. It provides an abstracted environment for running large scale data intensive applications in a scalable and fault tolerant manner. Several Hadoop scheduling algorithms have been proposed in the literature with various performance goals. In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To increase the percentage of local tasks likely to be launched for a job in the future, the scheme launches, among the local tasks available for a node, the task whose input has the fewest replicas, whose input disk on the node is least loaded, and whose expected wait for its next local node is longest. The proposed method was evaluated by extensive experiments, and it was observed that the method improves performance significantly. In the experiments, an improvement of around 20% was achieved in terms of locality and fairness.

© 2014 The Authors. Published by Elsevier B.V.

Keywords: Hadoop, MapReduce, task selection, job scheduling.

* Corresponding author. Tel.: +91-9941506562. E-mail address: sureshtvmalai85@gmail.com

1. Introduction

With the recent rapid advancement of computing systems, they have been applied to solve very complex problems in diverse fields. The data generated and processed by such applications are extremely large. Several computing paradigms, such as grid computing and cloud computing, have emerged to solve large scale data intensive applications efficiently. However, traditional programming models and traditional load balancing, debugging and scheduling solutions are not efficient when dealing with BigData applications. Redefining the solutions to these problems for large scale data intensive applications has attracted much research attention. Several frameworks have been proposed in the literature to handle BigData applications efficiently [1-4].

MapReduce [1] is a popular programming model for data intensive applications in cloud environments. It hides the difficulties of parallel programming from users and provides an abstracted environment. Hadoop [5] is a widely used open source implementation of MapReduce. Its ability to provide fault tolerance, load balancing and scalable service over commodity hardware has made Hadoop a standard for BigData processing. The performance of Hadoop depends mainly on its scheduler. Several scheduling algorithms [6-10] have been proposed for different kinds of workloads, environments and applications.

In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To increase the percentage of local tasks likely to be launched for a job in the future, the scheme launches, among the local tasks available for a node, the task whose input has the fewest replicas, whose input disk on the node is least loaded, and whose expected wait for its next local node is longest. The proposed method was evaluated by extensive experiments, and it was observed that the method improves performance significantly. In the experiments, an improvement of around 20% was achieved in terms of locality and fairness. The remainder of the paper is organized as follows. Section 2 reviews related work on Hadoop scheduling. The proposed method is presented in Section 3. The performance of the proposed method is evaluated in Section 4, and Section 5 concludes the paper.

2. Related Works

Hadoop uses a master-slave architecture for running applications concurrently. The master and slave nodes communicate by sending heartbeat messages. Whenever a heartbeat is received from a node with a free slot, the Hadoop scheduler launches a task on that node. By default, Hadoop uses a FIFO scheduler for job scheduling. FIFO performs poorly in terms of the response time of small jobs and fair sharing among users. Several scheduling algorithms [6-10] have been proposed for different kinds of workloads, environments and applications. Yahoo proposed the Capacity Scheduler [6] to share a Hadoop cluster among various organizations. A set of queues with minimum guaranteed capacity is created, and the cluster is shared accordingly.

Facebook proposed the Hadoop Fair Scheduler (HFS) [7] to share the cluster fairly among users and applications. A set of pools is created and a minimum share is assigned to each. Zaharia et al. proposed LATE [8] for launching speculative tasks efficiently and delay scheduling [9] for achieving improved locality by slightly relaxing fairness. The deadline constraints of jobs are considered in [10]. Readers interested in further Hadoop scheduling techniques can refer to the detailed survey in [11]. Although several algorithms have been proposed in the literature, none of them considers the problem of selecting an optimal task from the set of local tasks available for a node.

3. Proposed Method

Whenever a node becomes free, the Hadoop scheduler chooses a job to which the slot is assigned according to the scheduling policy. After deciding on the job, the scheduler searches the job's list of unscheduled tasks for a local task to launch on the free node. If a local task is found, the scheduler launches it immediately. If no local task is available, the scheduler launches a non-local task or gives the chance to the next job in the queue, depending on the scheduling policy. By default, Hadoop launches the first local task found among the available local tasks of the job. When many local tasks are available for the job, the benefit of choosing one of them over another is not considered. Carefully choosing a task from the set of available local tasks based on various factors may improve the locality of other tasks in the future and may achieve better load balancing.
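As a rough illustration of the default behaviour described above, the following Python sketch models a job's unscheduled tasks and picks the first local task for a free node. It is a simplified model under stated assumptions, not Hadoop's actual scheduler code: the `Task` class, the node names and the helper functions are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical model of an unscheduled map task (not a Hadoop class)."""
    task_id: str
    replica_nodes: List[str]  # nodes that hold a replica of the task's input block


def first_local_task(unscheduled: List[Task], node: str) -> Optional[Task]:
    """Default-style choice: return the first task whose input is on `node`."""
    for task in unscheduled:
        if node in task.replica_nodes:
            return task
    return None  # no local task; the caller falls back to the scheduling policy


def local_candidates(unscheduled: List[Task], node: str) -> List[Task]:
    """All tasks local to `node`; a selection scheme would choose among these."""
    return [t for t in unscheduled if node in t.replica_nodes]


tasks = [
    Task("t1", ["n1", "n2", "n3"]),  # three replicas: many later chances for locality
    Task("t2", ["n1"]),              # single replica: n1 is its only local node
    Task("t3", ["n2", "n4"]),
]
print(first_local_task(tasks, "n1").task_id)                # t1
print([t.task_id for t in local_candidates(tasks, "n1")])   # ['t1', 't2']
```

In this toy case the first-found rule launches t1 on n1 even though t2 can only ever be local on n1, which is the kind of situation a more careful choice among the candidates is meant to address.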

By default, Hadoop replicates each input block on 3 different nodes. Several replication algorithms create and delete files/blocks dynamically according to workload characteristics in order to balance the load among the nodes. Hadoop uses the Hadoop Distributed File System (HDFS) for storing input and output data, which divides files into fixed size blocks. This results in a situation where the input of an application (which may consist of many files) is stored with different replica counts. Because the number of nodes is usually smaller than the number of blocks and each block is replicated in several places (through either static or dynamic replication), many input blocks of a job are stored on the same node. In addition, many files may be supplied as the input of an application, which further increases the chance that a job has several tasks local to a node.

To choose an optimal task with the goal of the highest possible locality for the other tasks, the future arrival of free slots on their preferred nodes needs to be known. In addition, launching a task whose input is stored on the least loaded of the disks attached to the node achieves a high transfer rate and yields better load balancing at the disk level. The future arrival times of nodes are unknown before they arrive. To predict the future arrival time of nodes for tasks, the progress rates of the nodes are used. The following factors are considered while selecting a task: the replica count of the input file/block, the predicted arrival time of the task's next free node, and the current load of the disk where the task's input is stored.

The future arrival time of a node with data locality for a task is estimated and used for selecting a task, in a manner similar to the optimal page replacement algorithm in operating systems. Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of unscheduled tasks of a job whose input is on a given node (the set of tasks local to that node). $L_i$ is the average length of the tasks running on the slots where the input of task $t_i$ is available. $R_t$ is the replication factor of the input of $t_i$, and $S_{ik}$ is the number of slots available in the $k$th local node of $t_i$. The task's expected time for the next free slot is calculated as follows:

$$ET(t_i) = \frac{L_i}{\sum_{k=1}^{R_t} S_{ik}} \tag{1}$$

The load of a disk attached to a node is equal to the sum of the I/O requests from the tasks running on the node and the I/O requests from non-local tasks running on other nodes that access the disk through the network. To distribute the load across the disks attached to a node, it is desirable to choose the task whose input is stored on the disk that holds the inputs of the fewest running tasks. For simplicity, non-local tasks are not considered when calculating the disk load. The load of the disk on which a task's input is stored is calculated as follows:

$$DL(t_i) = 1 - \frac{N_i}{N_s} \tag{2}$$

where $N_s$ is the number of slots in the node and $N_i$ is the number of running tasks that read their input from the disk where the input of task $t_i$ is stored.
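A small numerical sketch may help to make Eqs. (1) and (2) concrete. The function and parameter names below are hypothetical and the input values are invented for illustration; only the two formulas themselves come from the text.

```python
from typing import Sequence


def expected_wait(avg_task_length: float, slots_per_local_node: Sequence[int]) -> float:
    """Eq. (1): ET(t_i) = L_i / sum of S_ik over the R_t nodes holding the input."""
    total_slots = sum(slots_per_local_node)
    if total_slots <= 0:
        raise ValueError("at least one slot is needed on the task's local nodes")
    return avg_task_length / total_slots


def disk_load_score(readers_on_disk: int, slots_in_node: int) -> float:
    """Eq. (2): DL(t_i) = 1 - N_i / N_s (higher means a less loaded disk)."""
    if slots_in_node <= 0:
        raise ValueError("the node must have at least one slot")
    return 1.0 - readers_on_disk / slots_in_node


# Assumed example: average task length of 60 s, input replicated on three
# nodes with two slots each, so ET = 60 / 6.
print(expected_wait(60.0, [2, 2, 2]))  # 10.0

# Assumed example: one running task reads from the disk, the node has four slots.
print(disk_load_score(1, 4))           # 0.75
```

A task whose input has fewer replicas (fewer terms in the sum) gets a larger expected wait, and a task whose input sits on an idle disk gets a disk score close to one.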

Similarly, the replication weightage based on the replica count is calculated as follows:

$$RW(t_i) = 1 - \frac{R_t - \min(R_t)}{\max(R_t) - \min(R_t)} \tag{3}$$

where $\max(R_t)$ and $\min(R_t)$ are the maximum and minimum number of replicas of the tasks' input blocks, respectively. A larger replica count gives a smaller value of $RW(t_i)$, and a smaller replica count gives a higher value of $RW(t_i)$.
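The replication weightage can be sketched in the same way. The function name is hypothetical, and the handling of the case where all candidate tasks have the same replica count (where the formula is undefined) is an assumption of this sketch rather than something stated in the paper.

```python
from typing import List, Sequence


def replication_weights(replica_counts: Sequence[int]) -> List[float]:
    """RW(t_i) = 1 - (R_t - min(R_t)) / (max(R_t) - min(R_t)) for each task."""
    r_min, r_max = min(replica_counts), max(replica_counts)
    if r_max == r_min:
        # Assumption: with identical replica counts the formula is 0/0,
        # so every task is given the same weight of 1.0.
        return [1.0 for _ in replica_counts]
    return [1.0 - (r - r_min) / (r_max - r_min) for r in replica_counts]


# Assumed example: three local tasks whose inputs have 1, 3 and 5 replicas.
print(replication_weights([1, 3, 5]))  # [1.0, 0.5, 0.0]
```

The task with the fewest replicas receives the highest weight, which reflects the idea that it has the fewest alternative nodes on which it could later run locally.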

The values of $DL(t_i)$ and $RW(t_i)$ range from zero to one. The value of $ET(t_i)$ is normalized between zero and one. Finally, for task selection, a multi objective function is formulated as the linear combination of the three aforementioned objective functions:

$$TS(T) = \max_{1 \le i \le n} \left\{ \alpha \times ET(t_i) + \beta \times DL(t_i) + \gamma \times RW(t_i) \right\} \tag{4}$$

$$TS(T) = \max_{1 \le i \le n} \left\{ \alpha \times \frac{L_i}{\sum_{k=1}^{R_t} S_{ik}} + \beta \times \left(1 - \frac{N_i}{N_s}\right) + \gamma \times \left[1 - \frac{R_t - \min(R_t)}{\max(R_t) - \min(R_t)}\right] \right\} \tag{5}$$

The symbols $\alpha$, $\beta$ and $\gamma$ are proportion variables for the corresponding objective functions, and $n$ is the number of local tasks available for a node from the job that would be scheduled next according to the scheduling policy. The performance overhead of the proposed method is trivial, because the number of tasks of a job that are local to a node is small even when the cluster is large, and calculating the value of $TS(T)$ for a small $n$ takes a negligible amount of time. In addition, the proposed method can be used with any Hadoop scheduling algorithm and with systems beyond Hadoop that offer several scheduling options. The objective function is defined as a linear combination of several objectives, which gives the flexibility to add or remove objectives easily as required. The values of the proportion variables $\alpha$, $\beta$ and $\gamma$ are fixed according to need and can be chosen to sum to one. A higher value means more weightage for the corresponding objective function.

4. Performance Evaluation

The performance of the proposed method is evaluated by extensive experiments. The simulation experiments were run on a 20 node cluster. Each node has a 2.4 GHz quad core processor and 4 GB of RAM. Each node has three 500 GB SATA hard disk drives (7200 RPM), and the nodes are connected by 1 Gbps Ethernet. The WordCount job is used for the experiments, and the input data is generated using the RandomTextWriter program in the Hadoop 1.0.4 distribution. In total, 100 jobs with random input sizes are generated with an inter-arrival time of 15 seconds. The values of α, β and γ are fixed at 0.4, 0.3 and 0.3 respectively throughout the experiments. All experiments are repeated 5 times and the average values of the results are presented.
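To illustrate how the selection rule of Eq. (4) behaves with the weights used in these experiments (0.4, 0.3 and 0.3), the following Python sketch scores a few hypothetical local tasks and picks the one with the highest score. The `LocalTask` class, the example values and the normalization of ET by the largest ET among the candidates are assumptions of this sketch; the paper states only that ET is normalized between zero and one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalTask:
    """Hypothetical description of one candidate local task."""
    task_id: str
    avg_task_length: float       # L_i
    local_node_slots: List[int]  # S_ik for each of the R_t nodes holding the input
    readers_on_disk: int         # N_i


def select_task(tasks: List[LocalTask], slots_in_node: int,
                alpha: float = 0.4, beta: float = 0.3,
                gamma: float = 0.3) -> Tuple[LocalTask, float]:
    """Return the task maximizing alpha*ET + beta*DL + gamma*RW, and its score."""
    if not tasks:
        raise ValueError("no local tasks to choose from")
    et = [t.avg_task_length / sum(t.local_node_slots) for t in tasks]   # Eq. (1)
    et_max = max(et)
    replicas = [len(t.local_node_slots) for t in tasks]                 # R_t
    r_min, r_max = min(replicas), max(replicas)

    best, best_score = tasks[0], float("-inf")
    for task, e, r in zip(tasks, et, replicas):
        et_norm = e / et_max if et_max > 0 else 0.0  # assumed normalization
        dl = 1.0 - task.readers_on_disk / slots_in_node                 # Eq. (2)
        # Assumption: equal replica counts give every task a weight of 1.0.
        rw = 1.0 if r_max == r_min else 1.0 - (r - r_min) / (r_max - r_min)
        score = alpha * et_norm + beta * dl + gamma * rw                # Eq. (4)
        if score > best_score:
            best, best_score = task, score
    return best, best_score


candidates = [
    LocalTask("t1", 60.0, [2, 2, 2], readers_on_disk=2),
    LocalTask("t2", 60.0, [2], readers_on_disk=1),
    LocalTask("t3", 60.0, [2, 2], readers_on_disk=0),
]
chosen, score = select_task(candidates, slots_in_node=4)
print(chosen.task_id)  # t2
```

Here t2 is chosen because its input has a single replica and therefore both the longest expected wait and the highest replication weightage, even though t3 reads from an idle disk. Changing the three weights shifts this balance.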

The Hadoop Fair Scheduler (HFS) [7] is used as the scheduler in the experiments. The performance of the proposed method is measured by the improvement achieved in data locality. Figure 1 shows the percentage of local tasks launched with HFS and with HFS plus the proposed task selection method for various job sizes. Small jobs (1 to 2 tasks) achieve a lower percentage of locality in HFS because their input is available on few nodes. Using the proposed method with HFS does not improve the locality significantly for small jobs either, because the proposed method is not applied to single task jobs, and jobs with 2-4 tasks have a very low probability that the input blocks of more than one task are stored on the same node. For large jobs (jobs with more than 50 tasks), the proposed method also achieves negligible improvement over HFS. Large jobs usually have their input data on a larger number of nodes, which leads to high locality by default. On the other hand, the proposed method improves the locality significantly for medium size jobs (5 to 50 map tasks). An improvement of around 20% in launching local tasks is achieved by the proposed method over HFS.

Delay scheduling [9], proposed by Zaharia et al., improves data locality by relaxing the fairness of resource allocation. Users can achieve a desired locality level by setting the delay threshold accordingly. The performance of the proposed scheme is also tested using HFS with delay scheduling. The results obtained in terms of locality are not presented here because the performance improvement is negligible. On the other hand, a considerable improvement is achieved in the fairness of resource allocation among jobs (Fig. 2). The proposed method increases the chance of task locality, which reduces the amount by which delay scheduling must relax fairness to achieve the desired locality. The proposed method therefore improves fairness in cases where delay scheduling is used for higher locality. Improvement in locality and fairness also yields better job response time and load balancing. In the future, we plan to analyze the performance of the proposed algorithm with various scheduling algorithms, cluster sizes and workloads. Several other factors that influence system performance will also be considered for task selection.

Fig. 1. Percentage of local tasks launched for various job sizes. [Figure not reproduced. y-axis: Percent Local Maps (0 to 100); x-axis: Job Size (Number of Map Tasks), with bins 1-4, 5-20, 21-50, 51-150, 151-300, >350.]

Fig. 2. Fairness improvements of proposed method over delay scheduling

5. Conclusion

In this paper, an optimal task selection method is proposed for Hadoop environments. A multi objective function based on several objectives, namely the replica count of the input file/block, the estimated arrival time of the task's next free node and the current load of the disk, is presented to select a task from the set of available local tasks of a job. The performance of the proposed method is evaluated by extensive experiments, and an improvement of around 20% in terms of locality over HFS is observed. Further, when used with delay scheduling, it also achieves a significant improvement in fairness without affecting locality. In the future, several other factors that influence system performance will be considered for task selection.

References

[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[2] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 41(3):59–72, 2007.

[3] Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. Twister: a runtime for iterative MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 810–818, 2010.

[4] Yunhong Gu and Robert L Grossman. Sector and sphere: the design and implementation of a high-performance data cloud. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1897):2429–2445, 2009.

[5] Apache Hadoop. http://hadoop.apache.org/.

[6] Hadoop’s Capacity Scheduler: http://hadoop.apache.org/docs/stable/capacity_scheduler.html

[7] Hadoop’s Fair Scheduler: http://hadoop.apache.org/docs/stable/fair_scheduler.html

[8] M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008.

[9] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer systems (EuroSys), 2010.

[10] K. Kc and K. Anyanwu. Scheduling Hadoop Jobs to Meet Deadlines. In Proc. CloudCom, 388–392, 2010.

[11] B. Thirmala Rao and L.S.S. Reddy. Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments. International Journal of Computer Applications, 34(9):29–33, 2011.