• No results found

Performance of MapReduce on Multicore Clusters

N/A
N/A
Protected

Academic year: 2020

Share "Performance of MapReduce on Multicore Clusters"

Copied!
42
0
0

Loading.... (view fulltext now)

Full text

(1)

S

A

L

S

A

S

A

L

S

A

Performance of MapReduce on

Multicore Clusters

UMBC, Maryland

Judy Qiu

http://salsahpc.indiana.edu

School of Informatics and Computing

Pervasive Technology Institute

(2)

S

A

L

S

A

Important Trends

•Implies parallel computing

important again

•Performance from extra

cores – not extra clock

speed

•new commercially

supported data center

model building on

compute grids

•In all fields of science and

throughout life (e.g. web!)

•Impacts preservation,

access/use, programming

model

Data Deluge

Technologies

Cloud

eScience

Multicore/

Parallel

Computing

•A spectrum of eScience or

eResearch applications

(biology, chemistry, physics

social science and

humanities …)

•Data Analysis

•Machine learning

(3)

3

Data

Information

Knowledge

(4)

S

A

L

S

A

DNA Sequencing Pipeline

Illumina/Solexa Roche/454 Life Sciences Applied Biosystems/SOLiD

Modern Commerical Gene Sequences Internet

Read Alignment

Visualization Plotviz

Blocking alignmentSequence

MDS

Dissimilarity Matrix

N(N-1)/2 values FASTA File

N Sequences Pairingsblock

Pairwise clustering

MapReduce

MPI

• This chart illustrate our research of a pipeline mode to provide services on demand (Software as a Service SaaS)

(5)

5

Parallel Thinking

(6)

6

Flynn’s Instruction/Data Taxonomy of Computer Architecture

v

Single Instruction Single Data Stream (SISD)

A sequential computer which exploits no parallelism in either the instruction or

data streams. Examples of SISD architecture are the traditional uniprocessor

machines like a old PC.

v

Single Instruction Multiple Data (SIMD)

A computer which exploits multiple data streams against a single instruction

stream to perform operations which may be naturally parallelized. For example,

GPU.

v

Multiple Instruction Single Data (MISD)

Multiple instructions operate on a single data stream. Uncommon architecture

which is generally used for fault tolerance. Heterogeneous systems operate on the

same data stream and must agree on the result. Examples include the Space

Shuttle flight control computer.

v

Multiple Instruction Multiple Data (MIMD)

Multiple autonomous processors simultaneously executing different instructions

on different data. Distributed systems are generally recognized to be MIMD

(7)

7

Questions

If we extend Flynn’s Taxonomy to software,

v

What classification is MPI?

(8)

8

v

MapReduce is a new programming model

for

processing and generating

large data sets

(9)

S

A

L

S

A

MapReduce “File/Data Repository” Parallelism

Instruments

Disks

Map

1

Map

2

Map

3

Reduce

Communication

Map

= (data parallel) computation reading and writing data

Reduce

= Collective/Consolidation phase e.g. forming multiple

global sums as in histogram

Portals

/Users

MPI and Iterative MapReduce

Map

Map

Map

Map

(10)

S

A

L

S

A

MapReduce

Implementations support:

Splitting of data

Passing the output of map functions to reduce functions

Sorting the inputs to the reduce function based on the

intermediate keys

Quality of services

Map(Key, Value)

Reduce(Key, List<Value>)

Data Partitions

Reduce Outputs

A hash function maps the

results of the map tasks to

r reduce tasks

(11)

S

A

L

S

A

Google MapReduce Apache Hadoop Microsoft Dryad Twister Azure Twister

Programming

Model MapReduce MapReduce DAG execution,Extensible to MapReduce and other patterns

Iterative

MapReduce MapReduce-- willextend to Iterative MapReduce

Data Handling GFS (Google File

System) HDFS (HadoopDistributed File System)

Shared Directories &

local disks Local disksand data management tools

Azure Blob Storage

Scheduling Data Locality Data Locality; Rack aware, Dynamic task scheduling through global queue Data locality; Network topology based run time graph optimizations; Static task partitions Data Locality; Static task partitions Dynamic task scheduling through global queue

Failure Handling Re-execution of failed tasks; Duplicate

execution of slow tasks

Re-execution of failed tasks;

Duplicate execution of slow tasks

Re-execution of failed tasks; Duplicate execution of slow tasks

Re-execution

of Iterations Re-execution offailed tasks; Duplicate execution of slow tasks

High Level Language Support

Sawzall Pig Latin DryadLINQ Pregel has related features

N/A

Environment Linux Cluster. Linux Clusters, Amazon Elastic Map Reduce on EC2

Windows HPCS

cluster Linux ClusterEC2 Window AzureCompute, Windows Azure Local

Development Fabric

Intermediate

data transfer File File, Http File, TCP pipes,shared-memory FIFOs

Publish/Subscr

(12)

S

A

L

S

A

Hadoop & DryadLINQ

• Apache Implementation of Google’s MapReduce

• Hadoop Distributed File System (HDFS) manage data

• Map/Reduce tasks are scheduled based on data locality in HDFS (replicated data blocks)

• Dryad process the DAG executing vertices on compute clusters

• LINQ provides a query interface for structured data

• Provide Hash, Range, and Round-Robin partition patterns

Job

Tracker

Name

Node

1

3

2

2

3

4

M

M

M

M

R

R

R

R

HDFS

Data

blocks

Data/Compute Nodes

Master Node

Apache Hadoop

Microsoft DryadLINQ

Edge :

communication path

Vertex : execution task

Standard LINQ operations DryadLINQ operations

DryadLINQ Compiler

Dryad Execution Engine

Directed

Acyclic Graph

(

DAG

) based

execution flows

(13)

S

A

L

S

A

Applications using Dryad & DryadLINQ

Perform using DryadLINQ and Apache Hadoop implementations

Single “Select” operation in DryadLINQ

“Map only” operation in Hadoop

CAP3

-

Expressed Sequence Tag assembly to

re-construct full-length mRNA

Input files (FASTA)

Output files

CAP3 CAP3 CAP3

Average

Time

(Seconds

)

0 100 200 300 400 500 600

Time to process 1280 files each with ~375 sequences

Hadoop

DryadLINQ

(14)

S

A

L

S

A

Map() Map()

Reduce

Results

Optional

Reduce

Phase

HDFS

HDFS

exe exe

Input Data Set

Data File

Executable

Classic Cloud Architecture

(15)

S

A

L

S

A

Cap3 Efficiency

•Ease of Use – Dryad/Hadoop are easier than EC2/Azure as higher level models

•Lines of code including file copy

Azure : ~300 Hadoop: ~400 Dyrad: ~450 EC2 : ~700

Usability and Performance of Different Cloud Approaches

•Efficiency = absolute sequential run time / (number of cores * parallel run time)

•Hadoop, DryadLINQ - 32 nodes (256 cores IDataPlex)

•EC2 - 16 High CPU extra large instances (128 cores)

•Azure- 128 small instances (128 cores)

(16)

S

A

L

S

A

(17)

S

A

L

S

A

Scaled Timing with

Azure/Amazon MapReduce

Number of Cores * Number of files

64 * 1024 96 * 1536 128 * 2048 160 * 2560 192 * 3072

Time

(s)

1000 1100 1200 1300 1400 1500 1600 1700 1800 1900

Cap3 Sequence Assembly

Azure MapReduce

Amazon EMR

(18)

S

A

L

S

A

Cap3 Cost

Num. Cores * Num. Files

64 * 102496 * 1536 128 *

2048

160 *

2560

192 *

3072

Co

st

($

)

0

2

4

6

8

10

12

14

16

18

Azure MapReduce

Amazon EMR

(19)

S

A

L

S

A

Alu and Metagenomics Workflow

“All pairs” problem

Data is a collection of N sequences. Need to calcuate N

2

dissimilarities

(distances) between sequnces (all pairs).

These cannot be thought of as vectors because there are missing characters

“Multiple Sequence Alignment” (creating vectors of characters) doesn’t seem to

work if N larger than O(100), where 100’s of characters long.

Step 1:

Can calculate N

2

dissimilarities (distances) between sequences

Step 2:

Find families by

clustering

(using much better methods than Kmeans). As no

vectors, use vector free O(N

2

) methods

Step 3:

Map to 3D for visualization using Multidimensional Scaling (

MDS

) – also O(N

2

)

Results:

N = 50,000 runs in

10

hours (the complete pipeline above) on

768

cores

Discussions:

Need to address millions of sequences …..

Currently using a mix of MapReduce and MPI

(20)

S

A

L

S

A

All-Pairs Using DryadLINQ

35339 50000

0 2000 4000 6000 8000 10000 12000 14000 16000 18000

20000 DryadLINQ

MPI

Calculate Pairwise Distances (Smith Waterman Gotoh)

125 million distances

4 hours & 46 minutes

Calculate pairwise distances for a collection of genes (used for clustering, MDS)

Fine grained tasks in MPI

Coarse grained tasks in DryadLINQ

Performed on 768 cores (Tempest Cluster)

(21)

S

A

L

S

A

Biology MDS and Clustering Results

Alu Families

This visualizes results of Alu repeats from Chimpanzee and Human Genomes. Young families (green, yellow) are seen as tight clusters. This is projection of MDS dimension reduction to 3D of 35399 repeats – each with about 400 base pairs

Metagenomics

(22)

S

A

L

S

A

Hadoop/Dryad Comparison

Inhomogeneous Data I

Standard Deviation

0 50 100 150 200 250 300

Ti

me

(s)

1500 1550 1600 1650 1700 1750 1800 1850 1900

Randomly Distributed Inhomogeneous Data Mean: 400, Dataset Size: 10000

DryadLinq SWG

Hadoop SWG

Hadoop SWG on VM

Inhomogeneity of data does not have a significant effect when the sequence

lengths are randomly distributed

(23)

S

A

L

S

A

Hadoop/Dryad Comparison

Inhomogeneous Data II

Standard Deviation

0 50 100 150 200 250 300

To

ta

lTi

me

(s)

0 1,000 2,000 3,000 4,000 5,000 6,000

Skewed Distributed Inhomogeneous data Mean: 400, Dataset Size: 10000

DryadLinq SWG Hadoop SWG Hadoop SWG on VM

This shows the natural load balancing of Hadoop MR dynamic task assignment

using a global pipe line in contrast to the DryadLinq static assignment

(24)

S

A

L

S

A

Hadoop VM Performance Degradation

15.3% Degradation at largest data set size

0% 5% 10% 15% 20% 25% 30% 35%

No. of Sequences

10000 20000 30000 40000 50000

Perf. Degradation On VM (Hadoop)

(25)

25

Publications

Jaliya Ekanayake, Thilina Gunarathne, Xiaohong Qiu,

Cloud

Technologies for Bioinformatics Applications

, invited paper accepted

by the Journal of IEEE Transactions on Parallel and Distributed

Systems. Special Issues on Many-Task Computing.

Software Release

Twister (Iterative MapReduce)

http://www.iterativemapreduce.org/

(26)

S

A

L

S

A

Twister: An iterative MapReduce

Programming Model

configureMaps(..)

Two configuration options :

1. Using local disks (only for maps)

2. Using pub-sub bus

configureReduce(..)

runMapReduce(..)

while(

condition

){

} //end while

updateCondition()

close()

User program’s process space

Combine()

operation

Reduce()

Map()

Worker Nodes

Communications/data transfers

via the pub-sub broker network

Iterations

May send <Key,Value> pairs directly

Local Disk

(27)

S

A

L

S

A

(28)

S

A

L

S

A

Iterative Computations

K-means

Multiplication

Matrix

Performance of K-Means

Parallel Overhead Matrix Multiplication

(29)

S

A

L

S

A

Pagerank – An Iterative MapReduce Algorithm

Well-known pagerank algorithm [1]

Used ClueWeb09 [2] (1TB in size) from CMU

Reuse of map tasks and faster communication pays off

[1] Pagerank Algorithm,

http://en.wikipedia.org/wiki/PageRank

[2] ClueWeb09 Data Set,

http://boston.lti.cs.cmu.edu/Data/clueweb09/

M

R

Current

Page ranks

(Compressed)

Partial

Adjacency

Matrix

Partial

Updates

C

Partially

merged

Updates

Iterations

Performance of Pagerank using ClueWeb Data (Time for 20 iterations)

(30)

S

A

L

S

A

Applications & Different Interconnection Patterns

Map Only

Classic

MapReduce

Iterative Reductions

MapReduce++

Loosely Synchronous

CAP3

Analysis

Document conversion

(PDF -> HTML)

Brute force searches in

cryptography

Parametric sweeps

High Energy Physics

(

HEP

) Histograms

SWG

gene alignment

Distributed search

Distributed sorting

Information retrieval

Expectation

maximization

algorithms

Clustering

Linear Algebra

Many MPI scientific

applications utilizing

wide variety of

communication

constructs including

local interactions

- CAP3 Gene Assembly

- PolarGrid Matlab data

analysis

Information Retrieval

-HEP Data Analysis

- Calculation of Pairwise

Distances for ALU

Sequences

- Kmeans

-

Deterministic

Annealing Clustering

-

Multidimensional

Scaling

MDS

- Solving Differential

Equations and

- particle dynamics

with short range forces

Input

Output

map

Input

map

reduce

Input

map

reduce

iterations

Pij

(31)

Bare-metal Nodes

Linux Virtual

Machines

Microsoft Dryad / Twister

Apache Hadoop / Twister/

Sector/Sphere

Smith Waterman Dissimilarities, PhyloD Using DryadLINQ, Clustering,

Multidimensional Scaling, Generative Topological Mapping

Xen, KVM Virtualization / XCAT Infrastructure

SaaS

Applications

Cloud

Platform

Cloud

Infrastructure

Hardware

Nimbus, Eucalyptus, Virtual appliances, OpenStack, OpenNebula,

Hypervisor/

Virtualization

Windows Virtual

Machines

Linux Virtual

Machines

Windows Virtual

Machines

Apache PigLatin/Microsoft DryadLINQ

Higher Level

Languages

Cloud Technologies and Their Applications

Swift, Taverna, Kepler,Trident

(32)

S

A

L

S

A

• Switchable clusters on the same hardware (~5 minutes between different OS such as Linux+Xen to Windows+HPCS)

• Support for virtual clusters

• SW-G : Smith Waterman Gotoh Dissimilarity Computation as an pleasingly parallel problem suitable for MapReduce style applications

SALSAHPC Dynamic Virtual Cluster on

FutureGrid -- Demo at SC09

Pub/Sub Broker Network Summarizer Switcher Monitoring Interface iDataplex Bare-metal Nodes XCAT Infrastructure Virtual/Physical Clusters

Monitoring & Control Infrastructure

iDataplex Bare-metal Nodes

(32 nodes)

XCAT Infrastructure

Linux Bare-system Linux on Xen Windows Server 2008 Bare-system SW-G Using

Hadoop SW-G UsingHadoop SW-G UsingDryadLINQ

Monitoring Infrastructure

Dynamic Cluster

Architecture

(33)

S

A

L

S

A

SALSAHPC Dynamic Virtual Cluster on

FutureGrid -- Demo at SC09

• Top: 3 clusters are switching applications on fixed environment. Takes approximately 30 seconds.

• Bottom: Cluster is switching between environments: Linux; Linux +Xen; Windows + HPCS. Takes approxomately 7 minutes

• SALSAHPC Demo at SC09. This demonstrates the concept of Science on Clouds using a FutureGrid iDataPlex.

(34)

Summary of Initial Results

Cloud technologies (Dryad/Hadoop/Azure/EC2) promising for

Life Science computations

Dynamic Virtual Clusters allow one to switch between different

modes

Overhead of VM’s on Hadoop (15%) acceptable

Twister allows iterative problems (classic linear

algebra/datamining) to use MapReduce model efficiently

(35)

S

A

L

S

A

FutureGrid: a Grid Testbed

http://www.futuregrid.org/

NID

: Network Impairment Device

Private

(36)

S

A

L

S

A

FutureGrid key Concepts

FutureGrid provides a testbed with a wide variety of

computing services to its users

Supporting users developing new applications and new

middleware using

Cloud, Grid and Parallel computing

(Hypervisors – Xen, KVM, ScaleMP, Linux, Windows, Nimbus,

Eucalyptus, Hadoop, Globus, Unicore, MPI, OpenMP …)

Software supported by FutureGrid or users

~5000 dedicated cores distributed across country

The FutureGrid testbed provides to its users:

A rich development and testing platform for middleware and

application users looking at

interoperability

,

functionality

and

performance

Each use of FutureGrid is an

experiment

that is

reproducible

A rich

education and teaching

platform for advanced

cyberinfrastructure classes

(37)

S

A

L

S

A

FutureGrid key Concepts II

Cloud

infrastructure supports loading of general images on

Hypervisors

like Xen;

FutureGrid dynamically provisions

software as

needed onto “bare-metal” using Moab/xCAT based environment

Key early user oriented milestones:

June 2010

Initial users

November 2010-September 2011

Increasing number of users allocated by

FutureGrid

October 2011

FutureGrid allocatable via TeraGrid process

To apply for FutureGrid access or get help

, go to homepage

www.futuregrid.org

. Alternatively for help send email to

[email protected]

. Please send email to

PI: Geoffrey Fox

(38)

S

A

L

S

A

(39)

S

A

L

S

A

University of Arkansas Indiana University University of California at Los Angeles Penn State Iowa State Univ.Illinois at Chicago University of Minnesota Michigan State Notre Dame University of Texas at El Paso IBM Almaden Research Center Washington University San Diego Supercomputer Center University of Florida Johns Hopkins

July 26-30, 2010 NCSA Summer School Workshop

http://salsahpc.indiana.edu/tutorial

(40)
(41)

41

Summary

A New Science

“A new, fourth paradigm for science is based on data intensive

computing” … understanding of this new paradigm from a

variety of disciplinary perspective

– The Fourth Paradigm: Data-Intensive Scientific Discovery

A New Architecture

“Understanding the design issues and programming challenges

for those potentially ubiquitous next-generation machines”

(42)

42

Acknowledgements

S

A

L

S

A

HPC Group

http://salsahpc.indiana.edu

… and Our Collaborators

References

Related documents

In this paper, we propose a mobility pattern based location tracking scheme based, which efficiently reduces the location updates and searching cost in the

It is commonly used for itsChemotherapeutic effect, Antioxidant effect ,Antitumoral effect ,Biological effect ,Antiviral and Immunoenhancing effect ,Mechanical effect

Number of approaches in rational drug design, the molecular mechanism of drug action, Drug metabolizing enzyme action upon the structure of to drug molecule,

The corrosion inhibition of Armco steel in 0.5 M sulfuric acid in the presence of cationic Copolymers CQGP of Quaternary 4-vinylpyridine (QVPy) Graft

By obtaining the results from performing trial spot welding between 0.1 mm in thickness and 0.3 mm in thickness of Hilumin® tabs, the value range of maximum supply voltage,

The analysis conducted in the present study demonstrated that the municipalities of Minas Gerais are able to receive energy production fields from renewable sources (solar and

A first mode carries out an unsupervised learning, it uses area and color features with a practical thresholding classifier to differentiate between weed and vegetable

Temperature dependence of voltage- gated H + currents in human neutrophils, rat alveolar epithelial cells,.. and