Classical and Iterative MapReduce on Azure

(1)

Cloud Futures Workshop

Microsoft Conference Center Building 33, Redmond, Washington

June 2 2011

Geoffrey Fox

http://www.infomall.org

http://www.salsahpc.org

Director, Digital Science Center, Pervasive Technology Institute

Associate Dean for Research and Graduate Studies, School of Informatics and Computing

Indiana University Bloomington

Work with Thilina Gunarathne, Judy Qiu

(2)

https://portal.futuregrid.org

Simple Assumptions

• Clouds may not be suitable for everything, but they are suitable for the majority of data-intensive applications
  – Solving partial differential equations on 100,000 cores probably needs classic MPI engines
• Cost effectiveness, elasticity and a quality programming model will drive use of clouds in many areas such as genomics
• Need to solve issues of
  – Security-privacy-trust for sensitive data
  – How to store data: “data parallel file systems” (HDFS) or the classic HPC approach with shared file systems such as Lustre
• Programming model is likely to be MapReduce-based initially
  – Look at high-level languages
  – Compare with databases (SciDB?)
  – Must support iteration for many problems

(3)

Why We Need Cost-Effective Computing!

(4)

MapReduceRoles4Azure Architecture

(5)

MapReduceRoles4Azure

• Use distributed, highly scalable and highly available cloud services as the building blocks.
  – Azure Queues for task scheduling.
  – Azure Blob storage for input, output and intermediate data storage.
  – Azure Tables for metadata storage and monitoring.
• Utilize eventually consistent, high-latency cloud services effectively to deliver performance comparable to traditional MapReduce runtimes.
• Minimal management and maintenance overhead.
• Supports dynamic scaling up and down of the compute resources.
• MapReduce fault tolerance.
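
One way queue-based scheduling can give fault tolerance is the visibility timeout offered by cloud queues: a dequeued message is hidden rather than removed, and reappears if the worker never deletes it. The following is a minimal in-memory sketch of that idea, not the MapReduceRoles4Azure implementation. The `VisibilityQueue` class, the task names, the simulated clock and the forced crash of `map-1` are all hypothetical.

```python
class VisibilityQueue:
    """In-memory stand-in (hypothetical) for a cloud queue with visibility timeouts."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._visible_at = {}  # task -> simulated time at which it is visible again

    def put(self, task):
        self._visible_at[task] = 0

    def get(self, now):
        """Return a visible task and hide it for `timeout` ticks, or None."""
        for task, visible_at in self._visible_at.items():
            if visible_at <= now:
                self._visible_at[task] = now + self.timeout
                return task
        return None

    def delete(self, task):
        """Called by a worker only after the task has finished successfully."""
        self._visible_at.pop(task, None)

    def empty(self):
        return not self._visible_at


queue = VisibilityQueue(timeout=5)
for name in ["map-0", "map-1", "map-2"]:
    queue.put(name)

completed, attempts, now = [], {}, 0
while not queue.empty():
    task = queue.get(now)
    now += 1  # advance the simulated clock by one tick
    if task is None:
        continue  # every remaining task is currently hidden
    attempts[task] = attempts.get(task, 0) + 1
    # Assumption for illustration: the first attempt at map-1 crashes,
    # so the message is never deleted and reappears after the timeout.
    crashed = task == "map-1" and attempts[task] == 1
    if not crashed:
        completed.append(task)
        queue.delete(task)
```

No central scheduler tracks the failed worker here; the task is simply picked up again once its message becomes visible.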

(6)

MapReduceRoles4Azure Performance

• Parallel efficiency
• AzureMapReduce
  – Azure small instances: single core (1.7 GB memory)
• Hadoop Bare Metal: IBM iDataPlex cluster
  – Two quad-core CPUs (Xeon 2.33 GHz), 16 GB memory, Gigabit Ethernet per node
• EMR & Hadoop on EC2
  – Cap3: High-CPU Extra Large instances (8 cores, 20 CU, 7 GB memory per instance)
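
The definition of parallel efficiency is not reproduced in the text of this slide. The usual definition, assumed here, compares the sequential run time T(1) with the time T(p) on p cores; the symbol names are assumptions.

```latex
E_p = \frac{T(1)}{p \, T(p)}
```

An efficiency of 1 (100%) means the p cores deliver a p-fold speedup.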

(7)

SWG Sequence Alignment Performance

(8)

(9)

Why Iterative MapReduce? K-Means Clustering

• Iteratively refining operation
• Typical MapReduce runtimes incur extremely high overheads
  – New maps/reducers/vertices in every iteration
  – File system based communication
• Long-running tasks and faster communication in Twister enable it to perform close to MPI

[Figure: K-means clustering as iterative MapReduce, with time for 20 iterations. The user program drives the iterations. Map: compute the distance from each data point to each cluster center and assign points to cluster centers. Reduce: compute new cluster centers.]
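
The map and reduce roles described for K-means can be sketched in plain Python. This is an illustrative single-process version, not Twister code. The function names, the toy data and the choice to keep an empty cluster's old center are assumptions.

```python
def kmeans_map(points, centers):
    """Map: assign each point of one partition to its nearest center.

    Returns {center_index: (coordinate_sums, count)} for this partition.
    """
    partial = {}
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )
        sums, count = partial.get(nearest, ([0.0] * len(p), 0))
        partial[nearest] = ([s + a for s, a in zip(sums, p)], count + 1)
    return partial


def kmeans_reduce(partials, old_centers):
    """Reduce: combine the partial sums and compute the new cluster centers."""
    totals = {}
    for partial in partials:
        for c, (sums, count) in partial.items():
            t_sums, t_count = totals.get(c, ([0.0] * len(sums), 0))
            totals[c] = ([t + s for t, s in zip(t_sums, sums)], t_count + count)
    # Assumption: a cluster that received no points keeps its previous center.
    return [
        tuple(s / totals[c][1] for s in totals[c][0]) if c in totals else old_centers[c]
        for c in range(len(old_centers))
    ]


def kmeans(partitions, centers, iterations=20):
    """User program: the data partitions stay fixed, only the centers change."""
    for _ in range(iterations):
        partials = [kmeans_map(part, centers) for part in partitions]
        centers = kmeans_reduce(partials, centers)
    return centers


partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
final_centers = kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)], iterations=5)
```

Only the small list of centers changes between iterations, which is why a runtime that restarts tasks and rereads the partitions every iteration pays a high overhead.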

(10)

Performance of PageRank using ClueWeb data (time for 20 iterations) using 32 nodes (256 CPU cores)

(11)

High Level Flow Twister4Azure

• Merge Step
• In-Memory Caching of static data
• Cache-aware hybrid scheduling using Queues as well as a bulletin board (special table)

[Figure: Twister4Azure flow. Job Start -> Map/Combine tasks (with Data Cache) -> Reduce -> Merge -> Add Iteration? Yes: hybrid scheduling of the new iteration. No: the loop ends.]

(12)

Cache aware scheduling

• New Job (1st iteration)
  – Through queues
• New iteration
  – Publish entry to Job Bulletin Board
  – Workers pick tasks based on in-memory data cache and execution history (MapTask Meta-Data cache)
  – Any tasks that do not get

(13)

Twister4Azure Performance Comparisons

[Figure: BLAST Sequence Search. Parallel efficiency vs. number of query files (128 to 728) for Twister4Azure, Hadoop-Blast and DryadLINQ-Blast.]

[Figure: Cap3 Sequence Assembly. Adjusted time (s) vs. Num. of Cores * Num. of Blocks, and parallel efficiency vs. Num. of Cores * Num. of Files, for Twister4Azure, Amazon EMR and Apache Hadoop.]

(14)

Twister4Azure Performance: KMeans Clustering

[Figure: relative parallel efficiency and time (s) vs. Num Instances X Num Data Points (8 X 16M to 64 X 128M). Panel titles: Performance with/without data caching; Speedup gained using data cache; Scaled speedup.]

(15)

Visualizing Metagenomics

• Multidimensional Scaling (MDS) is a natural way to map sequences to 3D so you can visualize them
• Minimize Stress
• Improve with deterministic annealing (gives lower stress with less variation between random starts)
• Need to iterate Expectation Maximization
• N² pairwise dissimilarities (Needleman-Wunsch) between sequences i and j
• Communicate N positions X between steps
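
The stress formula itself did not survive in the text of this slide. A common form, assumed here rather than taken from the source, is the weighted stress below, where \delta_{ij} are the N² input dissimilarities, d_{ij}(X) is the Euclidean distance between mapped points i and j, and w_{ij} are optional weights. All symbol names are assumptions.

```latex
\sigma(X) = \sum_{i<j \le N} w_{ij} \left( d_{ij}(X) - \delta_{ij} \right)^2
```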

(16)

(17)

Multi-Dimensional-Scaling

• Many iterations
• Memory & data intensive
• 3 MapReduce jobs per iteration
• $X_k = \text{invV} \cdot B(X_{k-1}) \cdot X_{k-1}$
• 2 matrix-vector multiplications, termed BC and X

[Figure: the three jobs in one iteration. BC: Calculate BX. X: Calculate invV (BX). Calculate Stress. Stages shown: Map, Reduce, Merge.]
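
The update X_k = invV * B(X_(k-1)) * X_(k-1) can be illustrated with a small sequential sketch of the three per-iteration computations (BC, X, Stress). It assumes unit weights, in which case applying invV reduces to dividing by N; the slides do not state the weighting. The function names and toy data are hypothetical, and nothing here is distributed.

```python
import math


def pairwise_distances(X):
    return [[math.dist(a, b) for b in X] for a in X]


def stress(X, delta):
    """Stress step: sum of squared differences between distances and dissimilarities."""
    d = pairwise_distances(X)
    n = len(X)
    return sum((d[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def bc_step(X, delta):
    """BC step: compute B(X) * X, where b_ij = -delta_ij / d_ij for i != j
    and each diagonal entry makes its row of B sum to zero."""
    n, dim = len(X), len(X[0])
    d = pairwise_distances(X)
    BX = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > 0.0:
                b_ij = -delta[i][j] / d[i][j]
                for k in range(dim):
                    BX[i][k] += b_ij * (X[j][k] - X[i][k])
    return BX


def x_step(BX):
    """X step: apply invV, which is a division by N under the unit-weight assumption."""
    n = len(BX)
    return [[v / n for v in row] for row in BX]


# Toy input: dissimilarities taken from the corners of a unit square.
target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
delta = pairwise_distances(target)
X = [[0.0, 0.0], [2.0, 0.1], [1.5, 1.5], [-0.3, 1.0]]

history = [stress(X, delta)]
for _ in range(10):
    X = x_step(bc_step(X, delta))
    history.append(stress(X, delta))
```

Each pass through the loop corresponds to one iteration of the three jobs, and `history` records the stress, which should not increase from one iteration to the next.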

(18)

MDS Execution Time Histogram

10 iterations, 30000 * 30000 data points, 15 Azure Instances

[Figure: histogram of execution times. Series: BC Calculation MR, X Calculation MR, Stress Calculation MR.]

(19)

MDS Executing Task Histogram

10 iterations, 30000 * 30000 data points, 15 Azure Instances

[Figure: histogram of executing tasks vs. Time Since Job Start (s). Series: BC Calculation, X Calculation, Stress Calculation.]

(20)

MDS Performance

[Figure: Execution Time vs. Number of Iterations (5 to 20).]

[Figure: Time Per Iteration vs. Number of Iterations (5 to 20).]

30,000 * 30,000 data points, 15 instances, 3 MR steps per iteration, 30 Map tasks per application

(21)

(22)

Types of Data

• Loop-invariant data (static data): traditional MR key-value pairs
  – Comparatively larger sized data
  – Cached between iterations
• Loop-variant data (dynamic data): broadcast to all the map tasks at the beginning of the iteration
  – Comparatively smaller sized data

(23)

In-Memory Data Cache

• Caches the loop-invariant (static) data across iterations
  – Data that are reused in subsequent iterations
• Avoids the data download, loading and parsing cost between iterations
  – Significant speedups for some data-intensive iterative MapReduce applications
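
The effect of caching loop-invariant data can be sketched as a small memoizing cache. This is an illustration under stated assumptions, not Twister4Azure code. `StaticDataCache`, `load_partition` and the key `partition-0` are hypothetical, and the loader stands in for downloading and parsing a blob.

```python
class StaticDataCache:
    """Keeps loop-invariant data in memory so it is loaded only once."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}
        self.loads = 0  # number of times the expensive loader actually ran

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._loader(key)
            self.loads += 1
        return self._cache[key]


def load_partition(key):
    # Stand-in for download + load + parse of one static data partition.
    return [float(token) for token in "1 2 3 4".split()]


cache = StaticDataCache(load_partition)
totals = []
for iteration in range(3):
    scale = iteration + 1             # loop-variant data, broadcast each iteration
    data = cache.get("partition-0")   # loop-invariant data, served from memory
    totals.append(scale * sum(data))
```

Across the three iterations the loader runs once; the later iterations skip the download and parsing cost entirely.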

(24)

Cache Aware Scheduling

• Map tasks need to be scheduled with cache awareness
  – A map task that processes data ‘X’ needs to be scheduled to the worker with ‘X’ in its cache
• Nobody has a global view of the data products cached in workers
  – Decentralized architecture
  – Impossible to do cache-aware assignment of tasks to workers
• Solution: workers pick tasks based on the data they have in the cache
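
The "workers pick tasks" idea can be sketched as follows: each worker scans a shared bulletin board and claims only the tasks whose input it already holds. This is a single-process illustration. The `Worker` class, the task names and the shared `claimed` set are hypothetical, and sending unclaimed tasks to a fallback queue is an assumption, since the slides only say that scheduling combines queues with the bulletin board.

```python
from collections import deque


class Worker:
    def __init__(self, name, cached_keys):
        self.name = name
        self.cache = set(cached_keys)  # data partitions held in memory

    def pick_tasks(self, bulletin_board, claimed):
        """Claim every unclaimed task whose input data is already cached here."""
        picked = []
        for task in bulletin_board:
            if task not in claimed and task in self.cache:
                claimed.add(task)
                picked.append(task)
        return picked


# Tasks are named after the data partition they process.
bulletin_board = ["X0", "X1", "X2", "X3"]
workers = [Worker("w0", ["X0", "X1"]), Worker("w1", ["X2"])]

claimed = set()
assignment = {w.name: w.pick_tasks(bulletin_board, claimed) for w in workers}

# Assumption: tasks nobody has cached are handed out through a queue instead.
fallback_queue = deque(t for t in bulletin_board if t not in claimed)
```

No component needs a global view of the caches: the decision is made locally by whichever worker holds the data.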

(25)

Merge Step

• Extension to the MapReduce programming model to support iterative applications
  – Map -> Combine -> Shuffle -> Sort -> Reduce -> Merge
• Receives all the Reduce outputs and the broadcast data for the current iteration
• User can add a new iteration or schedule a new MR job from the Merge task.
  – Serves as the “loop-test” in the decentralized architecture
    • Number of iterations
    • Comparison of result from previous iteration and current iteration
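
A Merge task acting as the loop-test might look like the sketch below, which uses both criteria from the slide: an iteration limit and a comparison of the previous and current results. The toy computation (Newton's iteration for the square root of 2 split across two map tasks), the function names and the default thresholds are assumptions for illustration only.

```python
def map_task(task_id, x):
    # Toy map tasks: task 0 emits x, task 1 emits 2 / x.
    return x if task_id == 0 else 2.0 / x


def reduce_task(values):
    # Averaging x and 2 / x is one Newton step towards sqrt(2).
    return sum(values) / len(values)


def merge(reduce_outputs, broadcast, iteration, max_iterations=20, tolerance=1e-6):
    """Loop-test: receives the reduce outputs and this iteration's broadcast data,
    and decides whether another iteration should be added."""
    new_value = reduce_outputs[0]
    converged = abs(new_value - broadcast) <= tolerance
    add_iteration = not converged and iteration + 1 < max_iterations
    return new_value, add_iteration


x, iteration, add_iteration = 1.0, 0, True
while add_iteration:
    map_outputs = [map_task(task_id, x) for task_id in range(2)]
    reduce_outputs = [reduce_task(map_outputs)]
    x, add_iteration = merge(reduce_outputs, x, iteration)
    iteration += 1
```

The driver loop contains no convergence logic of its own; it only asks the Merge step whether to continue.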

(26)

Multiple Applications per Deployment

• Ability to deploy multiple MapReduce applications in a single deployment
• Possible to invoke different MR applications in a single job
• Support for many application invocations in a

(27)

Conclusions

• Twister4Azure enables users to easily and efficiently perform large-scale iterative data analysis and scientific computations on the Azure cloud.
  – Supports classic and iterative MapReduce
• Utilizes a hybrid scheduling mechanism to provide the caching of static data across iterations.
• Can integrate with Trident (or other workflow)
• Plenty of testing and improvements to come!
• Open source: Please use http://salsahpc.indiana.edu/twister4azure