• No results found

Developing a MapReduce Application

N/A
N/A
Protected

Academic year: 2021

Share "Developing a MapReduce Application"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

MapReduce Paradigm Job Tracker Example

Developing a MapReduce Application

Oguzhan Gencoglu

TIE 12206 - Apache Hadoop Tampere University of Technology, Finland

(2)

MapReduce Paradigm Job Tracker Example

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports

3 Example

Word Count Job Tracker Key Points

(3)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports 3 Example

Word Count Job Tracker Key Points

(4)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) data sets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs. Keys and values may be of any type.

(5)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) data sets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs.

(6)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

What is MapReduce

MapReduce is a software framework for processing (large) data sets in a distributed fashion over several machines.

Core idea

< key, value > pairs

Almost all data can be mapped into key, value pairs. Keys and values may be of any type.

(7)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports 3 Example

Word Count Job Tracker Key Points

(8)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions

Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

Do profiling to tune the performance

(9)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

(10)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem

Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

Do profiling to tune the performance

(11)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

(12)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

Do profiling to tune the performance

(13)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

(14)

MapReduce Paradigm Job Tracker Example What is MapReduce MapReduce Workflow

MapReduce Workflow

Write your map and reduce functions Test with a small subset of data

If it fails use your IDE’s debugger to find the problem Run on full dataset

If it fails Hadoop provides some debugging tools

e.g. IsolationRunner : runs a task over the same input which it failed.

Do profiling to tune the performance

(15)

MapReduce Paradigm

Job Tracker

Example

Hadoop Default Ports

Outline

1 MapReduce Paradigm

What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports

3 Example

Word Count Job Tracker Key Points

(16)

MapReduce Paradigm

Job Tracker

Example

Hadoop Default Ports

Hadoop Default Ports

Handful of ports over TCP.

Some used by Hadoop itself (to schedule jobs, replicate blocks, etc.).

Some are directly for users (either via an interposed Java client or via plain old HTTP)

(17)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports 3 Example

Word Count

Job Tracker Key Points

(18)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Word Count

Task: Counting the word occurances (frequencies) in a text file (or set of files).

< word, count > as < key, value > pair

Mapper: Emits < word, 1 > for each word (no counting at this part).

Shuffle in between: pairs with same keys grouped together and passed to a single machine.

Reducer: Sums up the values (1s) with the same key value. Oguzhan Gencoglu Developing a MapReduce Application

(19)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

(20)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports 3 Example

Word Count

Job Tracker

Key Points

(21)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Job Tracker

(22)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Tasks

(23)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Name Node

(24)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Outline

1 MapReduce Paradigm What is MapReduce MapReduce Workflow 2 Job Tracker

Hadoop Default Ports 3 Example

Word Count Job Tracker

Key Points

(25)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion of the data.

(26)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion of the data.

Track the jobs, debug, do profiling

(27)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion of the data.

(28)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Key Points

Test mapper and reducer outside hadoop.

Copy your MapReduce function and files to DFS.

Test mapper and reducer with hadoop using a small portion of the data.

Track the jobs, debug, do profiling

(29)

MapReduce Paradigm Job Tracker Example Word Count Job Tracker Key Points

Questions/Comments

References

Related documents

Amongst many available options, the user can introduce custom instructions and matching functional units; modify existing units; change the pipelining depth

Table 6 shows the activities in counts per minute of photoheterotrophic protein fractions eluted from a calf thymus (CT) DNA- cellulose column when incubated

Business logic Controller (Servlet) Model Actions State objects Client (Browser) Page actions Operation actions Form beans Unit beans State objects Configuration file: - Action

90’ı 5 µm altı olan gezegen bilyalı değirmende 4 saat boyunca öğütülen tozla (C4 tozu) devam edilmesi uygun görülmüş olup bu tozun detaylı tane boyut analizi Şekil

It is also important to note Somali women reported experiences of positive aspects of childbirth, for example women reported an appreciation for care received, support from

The scheduler used by the Xen hypervisor (and with modi- fications by Amazon EC2) is vulnerable to such timing-based manipulation—rather than receiving its fair share of CPU

counseling was feasible to implement in outpatient commu- nity-based substance abuse treatment settings, was effective in producing modest abstinence rates and strong reductions

In the service sector there are still only a few firms which receive public R&amp;D fundings: 67 firms in our sample have future access to R&amp;D subsidies.. All other firms are