Optimizing Make Span and Total Completion Time Together to
Improve the Performance of the Map Reduce Workloads
B. Haritha Gayathri & P. Venu
1M. Tech Student, Department of CSE, Malineni Lakshmaiah Women’s Engineering College, Pulladigunta, Guntur,
AP, India
2Associate Professor, Department of CSE, Malineni Lakshmaiah Women’s Engineering College, Pulladigunta,
Guntur, AP, India
ABSTRACT─ MapReduce and additionally Hadoop
are utilized to manage bunch preparing for slots
submitted from various clients (i.e., MapReduce
workloads). In spite of numerous exploration
endeavors committed to enhance the execution of a
single MapReduce work, there is moderately little
consideration paid to the framework execution of
MapReduce workloads. Therefore, this paper to
improve the performance of MapReduce workloads,
we proposed a dynamic job ordering and map/reduce
slot configuration algorithms and these to algorithms
can mitigate the makespan as well as total
completion time of the scheduling process of jobs.
Makespan and total completion time (TCT) are two
key performance metrics. Therefore, in this paper, we
aim to optimize these two metrics.
Keywords: MapReduce Programming Model,
Hadoop, Job Ordering
1. INTRODUCTION
In cloud systems, a provider offers elastic computing
assets (virtual compute nodes) to some of users. The
info of the underlying infrastructure is transparent to
the users. This computing paradigm is attracting
increasing hobby from both instructional researchers
and industry practitioners as it permits users to scale
their packages up and down seamlessly in a pay-as-
you-cross manner. To disarm the entire energy of
cloud computing, it's far broadly established that a
cloud statistics processing machine must offer a
excessive diploma of elasticity, scalability and fault
tolerance. MapReduce is recognized as a likely
manner to perform elastic information processing in
the cloud.
MapReduce is considerably unique from formerly
analyzed models of parallel computation as it
interleaves parallel and sequential computation. In
latest years several nontrivial MapReduce algorithms
have emerged, from computing the diameter of a
graph to implementing the EM algorithm to cluster
large records sets. Each of these algorithms gives a
few insights into what can be completed in a
MapReduce framework; but, there may be a loss of
rigorous algorithmic analyses of the problems
involved.
MapReduce has been extensively seemed as a
promising opportunity to large-scale data or statistics
evaluation consisting of graph processes, device
studying, and statistics mining. These programs
which might be submitted to MapReduce clusters are
done within the shape of jobs, and every activity
contains a number of obligations. Every venture can
be assigned to a node, which is normally known as
slave node, via task scheduler in clusters. Task
MapReduce, it specially controls the order of project
executing and resource allocation. In addition, it may
without delay have an impact on the performance of
MapReduce clusters and the execution time of the
exclusive priority responsibilities. Therefore, the
precise venture scheduling is very crucial for
MapReduce clusters. MapReduce itself presents three
essential task scheduling algorithms, which are the
First-In-First-Out (FIFO), the potential scheduling,
and the honest scheduling. FIFO algorithm is the
build-in scheduler in MapReduce clusters. The
benefits of FIFO are easy and easy to implement,
because it deals with the jobs within the way of first
in first out, this is, the older task may be deal first,
and the new process may be treated later. However, it
does now not take completely into account that there
are distinctive sizes of jobs consisting of small and
large jobs in clusters, and does no longer bear in
mind the support of more than one customer. To deal
with the issues of FIFO, the truthful scheduling is
developed to cope with small and big jobs as pretty
as viable in clusters. In order to gain this goal,
process priorities, pool weights, and postpone
scheduling is introduced. The scheduling of jobs is
managed through activity priorities with a suitable
weight, and weight is split into a positive degree. But
the honest scheduling wishes a lot of manually
configuration, which could significantly affect the
overall performance of the roles.
The data analysis applications variety in
functionality, complexity, resource wishes, and facts
transport deadlines. This variety creates competing
requirements for software design, activity scheduling,
and workload management rules in MapReduce
environments. However, in spite of different user
goals one purpose is commonplace: to enhance the
usability and performance of the MapReduce
framework. The process execution efficiency is
particularly essential for processing manufacturing
workloads whilst a given set of MapReduce jobs and
workflows wishes to be finished periodically on new
facts. Typically, the default FIFO scheduler is used
for processing manufacturing jobs for the reason that
primary performance objective is to minimize the
general execution time (makespan) of a given set.
Such manufacturing workloads are analyzed off-line
for optimizing their execution. There are a slew of
optimization strategies introduced for enhancing
statistics read/write performance in a hard and fast of
manufacturing jobs. For exceptional MapReduce jobs
working over the equal dataset, a more efficient task
scheduling proposes merge their executions in order
that the input statistics is best scanned once.
2. RELATED WORK
MapReduce is a famous computing paradigm for
big-scale information processing in cloud computing.
However, the slot-based MapReduce system (e.g.,
Hadoop MRv1) can be afflicted by poor performance
because of its unoptimized resource allocation. To
address it S. Tang, B. S Lee, and B He recognized &
optimizes the useful resource allocation from three
key factors. First, because of the pre-configuration of
awesome map slots and reduce slots which are not
mutually interchangeable; slots may be seriously
beneath-utilized. Because map slots might be
absolutely utilized while reduce slots are empty, and
vice-versa. They proposed an alternative approach
known as Dynamic Hadoop Slot Allocation by
slot allocation requirement to allow spaces to be
reallocated to either outline diminish assignments
depending on their necessities. Second, the
theoretical execution can handle the straggler bother,
which has demonstrated to improve the execution for
a single procedure however to the detriment of the
bunch execution. In view of this, they proposed
Speculative Execution Performance Balancing to
balance the performance tradeoff among a single task
and a batch of jobs. Third, put off scheduling has
shown to enhance the records locality but on the fee
of fairness. Alternatively, they proposed a method
called Slot PreScheduling that may improve the facts
locality but and not using a impact on fairness.
Finally, through combining those techniques
collectively, they form a step-by using-step slot
allocation machine referred to as DynamicMR that
may improve the overall performance of MapReduce
workloads substantially.
S. Tang, B. S Lee, and B He proposed Dynamic
Hadoop Fair Schedulers (DHFS) to enhance the
utilization and performance of MapReduce clusters at
the same time as making sure the fairness. The center
technique is dynamically allocating map (or reduce)
slots to map and decrease tasks. Two types of DHFS
are presented, namely, PI-DHFS and PD-DHFS,
based totally on equity for cluster and swimming
pools, respectively. The experimental effects display
that their proposed DHFS can enhance the
performance and utilization of the Hadoop cluster
appreciably.
P. Sanders and J. Speck tested the parallel activity
scheduling problem using a version for execution
time which accounts for each computational speedup
and communication slowdown with the purpose to
reduce makespan. The scheduling trouble for a
sequence of submitted jobs studied entails the
determination of what number of processors to assign
to a task and a start time for starting execution. They
investigated a easy but powerful set of rules, Earliest
Completion Time. This on line algorithm minimizes
the of entirety time for a process given the modern
status of the system in spite of everything previous
jobs have been scheduled.
Dynamic map-reduce method presented by means of
J. Polo et al avoidance overload in server at
scheduling approach for multi-job MapReduce
environments, and exhibit its allocation technique.
The approach dynamically adjusts the allocation of to
be had execution slots across jobs so that you can
meet their final touch time dreams, supplied at
submission time. The system constantly video display
units the common venture duration for all jobs in all
nodes, and uses this information to calculate and
modify the anticipated of entirety time for all jobs.
Dynamic resource allocation set of rules used for
important server of un-obtained storage records
forward to physical node of related server. Finally,
the unreachable storage information added from
related server to the specific receiver. So, mission
lowering the total quantity of network site visitors for
a given workload.
3. FRAMEWORK
A. Overview of the Proposed System
In this paper, Makespan and Total Completion Time
(TCT) are two major performance metrics to the
defined because the term for the reason that begin of
the first task until the entirety of the final task for a
hard and fast of jobs. It considers the computation
time of jobs and is often used to degree the overall
performance and utilization efficiency of a machine.
In assessment, general of completion time is called
the sum of finished time periods for all jobs for the
reason that begin of the primary job. It is a
generalized makespan with queuing time covered.
We can use it to measure the satisfaction to the
device from a single task’s perspective through
dividing the entire of entirety time by using the
quantity of jobs.
In this paper, we describe the MK_JR set of rules that
produces the optimized task order and also prove its
approximation ratio. We also describe the process
order which offers the worst, i.e., longest makespan,
that's used for derivation of the higher sure makespan
of a workload. Next, we describe the MK_TCT_JR
algorithm, which optimizes each makespan and
overall of completion time.
B. Classify the Slots
The overall performance of a MapReduce cluster
through optimizing the slots utilization basically from
perspectives; first, classify the slots into sorts,
specifically, idle slots (i.e., no strolling
responsibilities) and busy slots (i.e., with going for
walks obligations) . Given the overall wide variety of
map and reduce slots configured with the aid of
customers, one optimization technique (i.e.,
macro-level optimization) is to enhance the slot usage by
means of maximizing the wide variety of busy slots
and decreasing the quantity of idle slots. Second, it is
well worth noting that not each busy slot can be
successfully utilized. Thus, our optimization
approach is to enhance the usage efficiency of busy
slots after the macrolevel optimization.
Slot Allocation:
Current layout of MapReduce suffers from a
below-utilization of the respective slots because the range of
map and decrease responsibilities varies over time,
resulting in events where the wide variety of slots
allocated for map/reduce is smaller than the wide
variety of map/reduce obligations. Our dynamic slot
allocation policy is primarily based on the
clarification at one-of-a-kind period of time there
may be idle map (reduce) slots, because the job
proceeds from map phase to reduce segment. We can
utilize the unused guide slots are the ones
over-burden reduces undertakings to upgrade the
execution of the MapReduce workload, and the other
way around. For example, toward the start of
MapReduce workload calculation, there may be no
registering decrease obligations and just figuring map
commitments, i.e., all the calculation the thickness
fundamentally based bunching corresponding to a
vast range of parameter settings.
C. Job Execution
Word Count Example reads text documents and
counts how regularly words arise. The center is
textual content documents and the output is text
documents, each line of which contains a phrase and
rely of ways often it happened, separated via a tab.
Each mapper takes a line as input and breaks it into
words. It then emits a key/value pair of the phrase
and 1. Each reducer sums the counts for each phrase
As an enhancement, the reducer is additionally
utilized as a combiner on the conduct yields. This
reduces the measure of information dispatched all
through the group by utilizing joining each
expression into a single record.
4. EXPERIMENTAL RESULTS
In this experiment, we take inputs and run as
unoptimized by using normal map reducer concept.
Here we are running 3 types of jobs such as word
count, sorting & creating inverted index. After run
these three jobs, we can run the MK_JR algorithm.
As per this algorithm, we order jobs in J based on the
following principles: Partition jobs set J into two
disjoint sub-sets JobA and JobB:
JobA = when T(m) <= T(r)
JobB = when T(m) > T(r)
MK_TCT_JR algorithm is also similar to MK_JR
algorithm but difference is based on time threshold
value it arrange the jobs. After completion of these
two algorithms, we can run the MK_SF_JR
algorithm. This algorithm shows that how many
processes required for job and MK_TCT_SF_JR
algorithm provide the processes information based on
time threshold.
Finally, we can see the makespan time for all
algorithms.
5. CONCLUSION
In this paper we conclude that, we proposed job
ordering optimization algorithm & map/reduce slot
configuration optimization algorithm. By using these
two algorithms provided that the makespan is optimal
but total completion time is poor. To improve this,
we proposed moreover, a new grasping job ordering
algorithm and a map/reduce slot configuration
algorithm to minimize the makespan and total
crowning glory time together. From the experimental
consequences, we discovered that we completed our
purpose is minimizing makespan and total
completion time.
REFERENCES
[1] Shanjiang Tang, Bu-Sung Lee, and Bingsheng
He, “Dynamic Job Ordering and Slot Configurations for MapReduce Workloads”,
IEEE Transactions On Services Computing,
VOL 9, NO 1, JANUARY/FEBRUARY 2016
[2] M. Zaharia, D. Borthakur, J. Sen Sarma, K.
Elmeleegy, S. Shenker, and I. Stoica, “Job
scheduling for multi-user mapreduce clusters,”
USA, Tech. Rep. UCB/EECS-2009-55, Apr.
2009
[3] A. Verma, L. Cherkasova, and R. H. Campbell,
“Two sides of a coin: Optimizing the schedule of
mapreduce jobs to minimize their makespan and
improve cluster performance,” in Proc. IEEE
20th Int. Symp. Model., Anal. Simul. Comput.
Telecommun. Syst., 2012, pp. 11–18.
[4] S. Tang, B.-S. Lee, and B. He, “Dynamicmr: A
dynamic slot allocation optimization framework
for mapreduce clusters,” IEEE Trans. Cloud
Comput., vol. 2, no. 3, pp. 333–347, Jul. 2014.
[5] J. Polo, Y. Becerra, D. Carrera, M. Steinder, I.
Whalley, J. Torres, and E. Ayguade, “Deadline-based mapreduce workload management,” IEEE
Trans. Netw. Service Manage., vol. 10, no. 2, pp.
231–244, Jun. 2013.
[6] S. Tang, B.-S. Lee, and B. He, “Dynamic slot
allocation technique for mapreduce clusters,” in
Proc. IEEE Int. Conf. Cluster Comput., Sep.
2013, pp. 1–8
[7] P. Sanders and J. Speck, “Efficient parallel
scheduling of malleable tasks,” in Proc. IEEE
Int. Parallel Distrib. Process. Symp., 2011, pp.
1156–1166.
[8] W. Cirne and F. Berman, “When the herd is
smart: Aggregate behavior in the selection of job
request,” IEEE Trans. Parallel Distrib. Syst., vol.
14, no. 2, pp. 181–192, Feb. 2003.
[9] T. Condie, N. Conway, P. Alvaro, J. M.
Hellerstein, K. Elmeleegy, and R. Sears,
“Mapreduce online,” in Proc. 7th USENIX Conf.
Netw. Syst. Design Implementation, 2010, p. 21.
[10]J. Dean and S. Ghemawat, “Mapreduce:
Simplified data processing on large clusters,” in
Proc. 6th Conf. Symp. Oper. Syst. Design