• No results found

MapReduce. Simplified Data Processing on Large Clusters

N/A
N/A
Protected

Academic year: 2022

Share "MapReduce. Simplified Data Processing on Large Clusters"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

MapReduce

Simplified Data Processing on Large Clusters

(2)

“As a reaction to this

complexity, we designed a new abstraction that allows us to express the simple

computations we were trying to perform but hides the

messy details of parallelization,

fault-tolerance, data distribution and load balancing in a library.”

“The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code.”

[2]

(3)

What is MapReduce?

● Jeffrey Dean and Sanjay Ghemawat of Google, 2004

● Born out of Google’s need to perform distributed computations on large datasets (petabytes)

● Programming model for such computations

● Partitioning data, communication, fault-tolerance, load balancing are abstracted in a library

● Fundamentally comprised of Map and Reduce functions

● Runs on clusters of “commodity” machines, not supercomputers

● Reduces cognitive overhead of parallel programming

(4)

MapReduce

Takes set of input key/value pairs, returns a set of output key/value pairs

● The user implements the Map and Reduce functions

● Map

○ Takes input key/value pairs and does some computation (implemented by the user), producing intermediate key/value pairs

○ MapReduce library groups intermediate values associated with the same key, and passes key and values to the Reduce function

● Reduce

○ Takes intermediate key and associated set of values and merges these values (according to the user), typically producing zero or one output value

Additional mapreduce specification object specified by user, containing input

and output files, optional tuning parameters

(5)

Example: Word Count

map(String key, String value):

// key: document name

// value: document contents for each word w in value:

EmitIntermediate(w, "1");

reduce(String key, Iterator values):

// key: a word

// values: a list of counts int result = 0;

for each v in values:

result += ParseInt(v);

Emit(AsString(result));

(6)

[1]

(7)

Implementation

● Google use case:

○ Dual processor x86 Linux machines with 2-4 GB memory

○ 100 Mbps to 1 Gbps switched ethernet

○ Cluster consists of 100s to 1,000s of commodity machines (machine failures common)

○ Storage is inexpensive hard drives

○ In-house distributed file system uses replication to ensure availability and reliability

○ Users submit jobs, consisting of tasks, to scheduler

○ Scheduler maps tasks to available nodes in cluster

(8)

Step 1

MapReduce partitions the input file(s) into M splits, typically 16 to 64 MB each.

MapReduce forks the program and copies it to each node in the cluster.

[2]

(9)

Step 2

One fork is the master node, the rest are workers. The master picks idle workers and assigns each one a map task or a

reduce task.

There are M map tasks and R reduce tasks. M is determined automatically, while R is

specified by the user (number of

output files).

(10)

Step 3

A worker who is assigned a map task reads the contents of its corresponding input split(s) and parses key/value pairs.

Key/value pairs are passed to the user implemented Map

function. Intermediate key/value

pairs are stored in a buffer.

(11)

Step 4

Buffered intermediate key/value pairs are periodically written to local memory and partitioned into R regions.

Locations of written

intermediate key/value pairs are

sent to the master node, who

forwards the locations to reduce

workers.

(12)

Step 5

When a reduce worker is

notified by the master, it reads the intermediate key/value pairs from the local memory of map workers.

When all intermediate data has

been read, the reduce worker

sorts the key/value pairs,

grouping by key.

(13)

Step 6

The reduce worker iterates over the sorted intermediate

key/value pairs and for each unique key, it passes the key and set of values to the user implemented Reduce function.

The output of the Reduce

function is appended to the final

output file for its corresponding

reduce partition.

(14)

Step 7

When all map tasks and reduce tasks have completed, the

master and workers join with the

user program. The output is

available in the R output files.

(15)

Master Data Structures

The master node stores the state (idle, in-progress, completed) of each map task and reduce task, along with the id of the worker node (non-idle)

● The master node is the “conduit” through which the location of intermediate data propagates from map tasks to reduce tasks

For each completed map task, the master stores the location and size of the R intermediate regions produced by the map task

○ Updates to size and location are received when a map task completes

Size and location information is forwarded incrementally to in-progress reduce tasks

(16)

Fault Tolerance

● With 100s to 1,000s of machines involved, machine failures are common--we want to tolerate them gracefully, and abstract this from the programmer

● Worker failure

○ Master pings workers regularly

○ If no response received

Worker is marked failed

In-progress map tasks and reduce tasks on failed workers are reset to idle state and rescheduled

○ Complete map tasks are re-executed on failure because output is stored on failed local disk

○ Complete reduce task are not re-executed on failure because output is stored on global fs

○ Reduce workers are notified of re-execution

(17)

Fault Tolerance

● Master failure

○ Master writes periodic checkpoints of master data structures to global fs

○ If master fails, new master assigned and started from checkpointed state

(18)

Task Granularity

● What values should we choose for M and R?

● M and R >> number of workers

○ Improves load balancing

○ Quicker recovery after failure because completed map tasks can be distributed

● In practice (at Google)

○ Choose M such that map tasks input is <= fs block size

○ Choose R to be a small multiple of number of workers

(19)

Backup Tasks

● “Straggler” machine takes longer

○ Hardware issues e.g. faulty disk

○ Resource contention e.g. other tasks scheduled on machine, sharing CPU, memory, network

● Backup Tasks

○ When a MapReduce program nears completion, the master schedules backup executions of remaining in-progress tasks

○ Task marked complete when either task completes

(20)

[2]

(21)

Apache Hadoop

● Open-source framework for distributed computing using MapReduce

[3]

(22)

WordCount.java

[4]

(23)

Sources

[1] MapReduce Tutorial | Mapreduce Example in Apache Hadoop

[2] MapReduce: Simplified Data Processing on Large Clusters

[3] An Analysis of the Hadoop HDFS Distributed File System Architecture | by Amit Kriplani

[4] MapReduce Tutorial - Apache Hadoop

References

Related documents

Department of Natural Resource M anagement.. Seven participatory tools w ere applied to assess com m unity m em bers’ experience o f current clim ate change conditions,

The voluntary activity of refugees thus helped diffuse conflicts arising from the contradictions between refugees and the state, by using social capital between refugees as

With CDI, the requirements of individual business users cede to the processing requirements of the applications that need access to customer master data.. The

In order to improve the work on the organisation of the conference and the planned publishing of the next volumes of the journal „Family Upbringing” we kindly ask you to fill out

Hitherto, the RPA framework has not been used to assess breast cancer leaflets to determine the presence or absence of the RPA constructs (risk (including

The PROMs questionnaire used in the national programme, contains several elements; the EQ-5D measure, which forms the basis for all individual procedure

Reduce worker is notified by master with pair locations; uses RPC to read intermediate data from local disk of map workers and sorts it by intermediate key to group tuples by

 Dean, Jeff and Ghemawat, Sanjay, MapReduce: Simplified Data Processing on Large