Optimizing Make Span and Total Completion Time Together to Improve the Performance of the Map Reduce Workloads

(1)

Optimizing Make Span and Total Completion Time Together to

Improve the Performance of the Map Reduce Workloads

B. Haritha Gayathri & P. Venu

1_{M. Tech Student, Department of CSE, Malineni Lakshmaiah Women’s Engineering College, Pulladigunta, Guntur,}

AP, India

2_{Associate Professor, Department of CSE, Malineni Lakshmaiah Women’s Engineering College, Pulladigunta,}

Guntur, AP, India

ABSTRACT─ MapReduce and additionally Hadoop

are utilized to manage bunch preparing for slots

submitted from various clients (i.e., MapReduce

workloads). In spite of numerous exploration

endeavors committed to enhance the execution of a

single MapReduce work, there is moderately little

consideration paid to the framework execution of

MapReduce workloads. Therefore, this paper to

improve the performance of MapReduce workloads,

we proposed a dynamic job ordering and map/reduce

slot configuration algorithms and these to algorithms

can mitigate the makespan as well as total

completion time of the scheduling process of jobs.

Makespan and total completion time (TCT) are two

key performance metrics. Therefore, in this paper, we

aim to optimize these two metrics.

Keywords: MapReduce Programming Model,

Hadoop, Job Ordering

1. INTRODUCTION

In cloud systems, a provider offers elastic computing

assets (virtual compute nodes) to some of users. The

info of the underlying infrastructure is transparent to

the users. This computing paradigm is attracting

increasing hobby from both instructional researchers

and industry practitioners as it permits users to scale

their packages up and down seamlessly in a pay-as-

you-cross manner. To disarm the entire energy of

cloud computing, it's far broadly established that a

cloud statistics processing machine must offer a

excessive diploma of elasticity, scalability and fault

tolerance. MapReduce is recognized as a likely

manner to perform elastic information processing in

the cloud.

MapReduce is considerably unique from formerly

analyzed models of parallel computation as it

interleaves parallel and sequential computation. In

latest years several nontrivial MapReduce algorithms

have emerged, from computing the diameter of a

graph to implementing the EM algorithm to cluster

large records sets. Each of these algorithms gives a

few insights into what can be completed in a

MapReduce framework; but, there may be a loss of

rigorous algorithmic analyses of the problems

involved.

MapReduce has been extensively seemed as a

promising opportunity to large-scale data or statistics

evaluation consisting of graph processes, device

studying, and statistics mining. These programs

which might be submitted to MapReduce clusters are

done within the shape of jobs, and every activity

contains a number of obligations. Every venture can

be assigned to a node, which is normally known as

slave node, via task scheduler in clusters. Task

(2)

MapReduce, it specially controls the order of project

executing and resource allocation. In addition, it may

without delay have an impact on the performance of

MapReduce clusters and the execution time of the

exclusive priority responsibilities. Therefore, the

precise venture scheduling is very crucial for

MapReduce clusters. MapReduce itself presents three

essential task scheduling algorithms, which are the

First-In-First-Out (FIFO), the potential scheduling,

and the honest scheduling. FIFO algorithm is the

build-in scheduler in MapReduce clusters. The

benefits of FIFO are easy and easy to implement,

because it deals with the jobs within the way of first

in first out, this is, the older task may be deal first,

and the new process may be treated later. However, it

does now not take completely into account that there

are distinctive sizes of jobs consisting of small and

large jobs in clusters, and does no longer bear in

mind the support of more than one customer. To deal

with the issues of FIFO, the truthful scheduling is

developed to cope with small and big jobs as pretty

as viable in clusters. In order to gain this goal,

process priorities, pool weights, and postpone

scheduling is introduced. The scheduling of jobs is

managed through activity priorities with a suitable

weight, and weight is split into a positive degree. But

the honest scheduling wishes a lot of manually

configuration, which could significantly affect the

overall performance of the roles.

The data analysis applications variety in

functionality, complexity, resource wishes, and facts

transport deadlines. This variety creates competing

requirements for software design, activity scheduling,

and workload management rules in MapReduce

environments. However, in spite of different user

goals one purpose is commonplace: to enhance the

usability and performance of the MapReduce

framework. The process execution efficiency is

particularly essential for processing manufacturing

workloads whilst a given set of MapReduce jobs and

workflows wishes to be finished periodically on new

facts. Typically, the default FIFO scheduler is used

for processing manufacturing jobs for the reason that

primary performance objective is to minimize the

general execution time (makespan) of a given set.

Such manufacturing workloads are analyzed off-line

for optimizing their execution. There are a slew of

optimization strategies introduced for enhancing

statistics read/write performance in a hard and fast of

manufacturing jobs. For exceptional MapReduce jobs

working over the equal dataset, a more efficient task

scheduling proposes merge their executions in order

that the input statistics is best scanned once.

2. RELATED WORK

MapReduce is a famous computing paradigm for

big-scale information processing in cloud computing.

However, the slot-based MapReduce system (e.g.,

Hadoop MRv1) can be afflicted by poor performance

because of its unoptimized resource allocation. To

address it S. Tang, B. S Lee, and B He recognized &

optimizes the useful resource allocation from three

key factors. First, because of the pre-configuration of

awesome map slots and reduce slots which are not

mutually interchangeable; slots may be seriously

beneath-utilized. Because map slots might be

absolutely utilized while reduce slots are empty, and

vice-versa. They proposed an alternative approach

known as Dynamic Hadoop Slot Allocation by

(3)

slot allocation requirement to allow spaces to be

reallocated to either outline diminish assignments

depending on their necessities. Second, the

theoretical execution can handle the straggler bother,

which has demonstrated to improve the execution for

a single procedure however to the detriment of the

bunch execution. In view of this, they proposed

Speculative Execution Performance Balancing to

balance the performance tradeoff among a single task

and a batch of jobs. Third, put off scheduling has

shown to enhance the records locality but on the fee

of fairness. Alternatively, they proposed a method

called Slot PreScheduling that may improve the facts

locality but and not using a impact on fairness.

Finally, through combining those techniques

collectively, they form a step-by using-step slot

allocation machine referred to as DynamicMR that

may improve the overall performance of MapReduce

workloads substantially.

S. Tang, B. S Lee, and B He proposed Dynamic

Hadoop Fair Schedulers (DHFS) to enhance the

utilization and performance of MapReduce clusters at

the same time as making sure the fairness. The center

technique is dynamically allocating map (or reduce)

slots to map and decrease tasks. Two types of DHFS

are presented, namely, PI-DHFS and PD-DHFS,

based totally on equity for cluster and swimming

pools, respectively. The experimental effects display

that their proposed DHFS can enhance the

performance and utilization of the Hadoop cluster

appreciably.

P. Sanders and J. Speck tested the parallel activity

scheduling problem using a version for execution

time which accounts for each computational speedup

and communication slowdown with the purpose to

reduce makespan. The scheduling trouble for a

sequence of submitted jobs studied entails the

determination of what number of processors to assign

to a task and a start time for starting execution. They

investigated a easy but powerful set of rules, Earliest

Completion Time. This on line algorithm minimizes

the of entirety time for a process given the modern

status of the system in spite of everything previous

jobs have been scheduled.

Dynamic map-reduce method presented by means of

J. Polo et al avoidance overload in server at

scheduling approach for multi-job MapReduce

environments, and exhibit its allocation technique.

The approach dynamically adjusts the allocation of to

be had execution slots across jobs so that you can

meet their final touch time dreams, supplied at

submission time. The system constantly video display

units the common venture duration for all jobs in all

nodes, and uses this information to calculate and

modify the anticipated of entirety time for all jobs.

Dynamic resource allocation set of rules used for

important server of un-obtained storage records

forward to physical node of related server. Finally,

the unreachable storage information added from

related server to the specific receiver. So, mission

lowering the total quantity of network site visitors for

a given workload.

3. FRAMEWORK

A. Overview of the Proposed System

In this paper, Makespan and Total Completion Time

(TCT) are two major performance metrics to the

(4)

defined because the term for the reason that begin of

the first task until the entirety of the final task for a

hard and fast of jobs. It considers the computation

time of jobs and is often used to degree the overall

performance and utilization efficiency of a machine.

In assessment, general of completion time is called

the sum of finished time periods for all jobs for the

reason that begin of the primary job. It is a

generalized makespan with queuing time covered.

We can use it to measure the satisfaction to the

device from a single task’s perspective through

dividing the entire of entirety time by using the

quantity of jobs.

In this paper, we describe the MK_JR set of rules that

produces the optimized task order and also prove its

approximation ratio. We also describe the process

order which offers the worst, i.e., longest makespan,

that's used for derivation of the higher sure makespan

of a workload. Next, we describe the MK_TCT_JR

algorithm, which optimizes each makespan and

overall of completion time.

B. Classify the Slots

The overall performance of a MapReduce cluster

through optimizing the slots utilization basically from

perspectives; first, classify the slots into sorts,

specifically, idle slots (i.e., no strolling

responsibilities) and busy slots (i.e., with going for

walks obligations) . Given the overall wide variety of

map and reduce slots configured with the aid of

customers, one optimization technique (i.e.,

macro-level optimization) is to enhance the slot usage by

means of maximizing the wide variety of busy slots

and decreasing the quantity of idle slots. Second, it is

well worth noting that not each busy slot can be

successfully utilized. Thus, our optimization

approach is to enhance the usage efficiency of busy

slots after the macrolevel optimization.

Slot Allocation:

Current layout of MapReduce suffers from a

below-utilization of the respective slots because the range of

map and decrease responsibilities varies over time,

resulting in events where the wide variety of slots

allocated for map/reduce is smaller than the wide

variety of map/reduce obligations. Our dynamic slot

allocation policy is primarily based on the

clarification at one-of-a-kind period of time there

may be idle map (reduce) slots, because the job

proceeds from map phase to reduce segment. We can

utilize the unused guide slots are the ones

over-burden reduces undertakings to upgrade the

execution of the MapReduce workload, and the other

way around. For example, toward the start of

MapReduce workload calculation, there may be no

registering decrease obligations and just figuring map

commitments, i.e., all the calculation the thickness

fundamentally based bunching corresponding to a

vast range of parameter settings.

C. Job Execution

Word Count Example reads text documents and

counts how regularly words arise. The center is

textual content documents and the output is text

documents, each line of which contains a phrase and

rely of ways often it happened, separated via a tab.

Each mapper takes a line as input and breaks it into

words. It then emits a key/value pair of the phrase

and 1. Each reducer sums the counts for each phrase

(5)

As an enhancement, the reducer is additionally

utilized as a combiner on the conduct yields. This

reduces the measure of information dispatched all

through the group by utilizing joining each

expression into a single record.

4. EXPERIMENTAL RESULTS

In this experiment, we take inputs and run as

unoptimized by using normal map reducer concept.

Here we are running 3 types of jobs such as word

count, sorting & creating inverted index. After run

these three jobs, we can run the MK_JR algorithm.

As per this algorithm, we order jobs in J based on the

following principles: Partition jobs set J into two

disjoint sub-sets JobA and JobB:

JobA = when T(m) <= T(r)

JobB = when T(m) > T(r)

MK_TCT_JR algorithm is also similar to MK_JR

algorithm but difference is based on time threshold

value it arrange the jobs. After completion of these

two algorithms, we can run the MK_SF_JR

algorithm. This algorithm shows that how many

processes required for job and MK_TCT_SF_JR

algorithm provide the processes information based on

time threshold.

Finally, we can see the makespan time for all

algorithms.

5. CONCLUSION

In this paper we conclude that, we proposed job

ordering optimization algorithm & map/reduce slot

configuration optimization algorithm. By using these

two algorithms provided that the makespan is optimal

but total completion time is poor. To improve this,

we proposed moreover, a new grasping job ordering

algorithm and a map/reduce slot configuration

algorithm to minimize the makespan and total

crowning glory time together. From the experimental

consequences, we discovered that we completed our

purpose is minimizing makespan and total

completion time.

REFERENCES

[1] Shanjiang Tang, Bu-Sung Lee, and Bingsheng

He, “Dynamic Job Ordering and Slot Configurations for MapReduce Workloads”,

IEEE Transactions On Services Computing,

VOL 9, NO 1, JANUARY/FEBRUARY 2016

[2] M. Zaharia, D. Borthakur, J. Sen Sarma, K.

Elmeleegy, S. Shenker, and I. Stoica, “Job

scheduling for multi-user mapreduce clusters,”

(6)

USA, Tech. Rep. UCB/EECS-2009-55, Apr.

2009

[3] A. Verma, L. Cherkasova, and R. H. Campbell,

“Two sides of a coin: Optimizing the schedule of

mapreduce jobs to minimize their makespan and

improve cluster performance,” in Proc. IEEE

20th Int. Symp. Model., Anal. Simul. Comput.

Telecommun. Syst., 2012, pp. 11–18.

[4] S. Tang, B.-S. Lee, and B. He, “Dynamicmr: A

dynamic slot allocation optimization framework

for mapreduce clusters,” IEEE Trans. Cloud

Comput., vol. 2, no. 3, pp. 333–347, Jul. 2014.

[5] J. Polo, Y. Becerra, D. Carrera, M. Steinder, I.

Whalley, J. Torres, and E. Ayguade, “Deadline-based mapreduce workload management,” IEEE

Trans. Netw. Service Manage., vol. 10, no. 2, pp.

231–244, Jun. 2013.

[6] S. Tang, B.-S. Lee, and B. He, “Dynamic slot

allocation technique for mapreduce clusters,” in

Proc. IEEE Int. Conf. Cluster Comput., Sep.

2013, pp. 1–8

[7] P. Sanders and J. Speck, “Efficient parallel

scheduling of malleable tasks,” in Proc. IEEE

Int. Parallel Distrib. Process. Symp., 2011, pp.

1156–1166.

[8] W. Cirne and F. Berman, “When the herd is

smart: Aggregate behavior in the selection of job

request,” IEEE Trans. Parallel Distrib. Syst., vol.

14, no. 2, pp. 181–192, Feb. 2003.

[9] T. Condie, N. Conway, P. Alvaro, J. M.

Hellerstein, K. Elmeleegy, and R. Sears,

“Mapreduce online,” in Proc. 7th USENIX Conf.

Netw. Syst. Design Implementation, 2010, p. 21.

[10]J. Dean and S. Ghemawat, “Mapreduce:

Simplified data processing on large clusters,” in

Proc. 6th Conf. Symp. Oper. Syst. Design