HPC ABDS: The Case for an Integrating Apache Big Data Stack


(1)

HPC‐ABDS: The Case for an Integrating Apache Big Data Stack with HPC

1st JTC 1 SGBD Meeting, SDSC San Diego, March 19 2014

Judy Qiu, Shantenu Jha (Rutgers), Geoffrey Fox
[email protected]
http://www.infomall.org
School of Informatics and Computing, Digital Science Center, Indiana University Bloomington

(2)

Enhanced Apache Big Data Stack (ABDS)

• ~120 Capabilities
• >40 Apache
• Green layers have strong HPC Integration opportunities

Goal
• Functionality of ABDS
• Performance of HPC

(3)

Broad Layers in HPC‐ABDS

• Workflow‐Orchestration
• Application and Analytics
• High level Programming
• Basic Programming model and runtime
– SPMD, Streaming, MapReduce, MPI
• Inter process communication
– Collectives, point to point, publish‐subscribe
• In memory databases/caches
• Object‐relational mapping
• SQL and NoSQL, File management
• Data Transport
• Cluster Resource Management (Yarn, Slurm, SGE)
• File systems (HDFS, Lustre …)
• DevOps (Puppet, Chef …)
• IaaS Management from HPC to hypervisors (OpenStack)
• Cross Cutting
– Message Protocols
– Distributed Coordination
– Security & Privacy
– Monitoring

(4)
(5)
(6)

Getting High Performance on Data Analytics (e.g. Mahout, R …)

• On the systems side, we have two principles
– The Apache Big Data Stack with ~120 projects has important broad functionality with a vital large support organization
– HPC including MPI has striking success in delivering high performance, however with a fragile sustainability model
• There are key systems abstractions, which are levels in the HPC‐ABDS software stack, where the Apache approach needs careful integration with HPC
– Resource management
– Storage
– Programming model: horizontal scaling parallelism
– Collective and Point to Point communication
– Support of iteration
– Data interface (not just key‐value)
• In application areas, we define application abstractions to support
– Graphs/network
– Geospatial
– Images etc.

(7)

4 Forms of MapReduce

[Figure: four forms of MapReduce, each drawn as Input, map and reduce stages]

(a) Map Only: BLAST Analysis, Parametric sweep, Pleasingly Parallel
(b) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search
(c) Iterative MapReduce: Expectation maximization, Clustering e.g. Kmeans, Linear Algebra, Page Rank
(d) Loosely Synchronous: Classic MPI, PDE Solvers and particle dynamics

Domain of MapReduce and Iterative Extensions: Science Clouds
Also labelled in the figure: MPI, Giraph

MPI is Map followed by Point to Point or Collective Communication  – as in style c) plus d)
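As a rough illustration of forms (a) to (c), the following is a minimal single-process sketch in plain Python. It is not taken from the talk and uses no Hadoop, Twister or Harp API; the function names `map_only`, `classic_mapreduce` and `kmeans_1d` are hypothetical and chosen only for this example.

```python
from collections import Counter


def map_only(task, inputs):
    """(a) Map Only: independent tasks, no reduce step (pleasingly parallel)."""
    return [task(x) for x in inputs]


def classic_mapreduce(mapper, inputs):
    """(b) Classic MapReduce: map emits (key, value) pairs, reduce sums per key."""
    totals = Counter()
    for x in inputs:
        for key, value in mapper(x):
            totals[key] += value
    return dict(totals)


def kmeans_1d(points, centers, iterations=10):
    """(c) Iterative MapReduce: the same map and reduce repeated each iteration."""
    centers = list(centers)
    for _ in range(iterations):
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        # Map: assign every point to its nearest center.
        for p in points:
            k = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            sums[k] += p
            counts[k] += 1
        # Reduce: each new center is the mean of its points (unchanged if empty).
        centers = [
            sums[k] / counts[k] if counts[k] else centers[k]
            for k in range(len(centers))
        ]
    return centers
```

For example, `classic_mapreduce(lambda e: [(int(e // 10), 1)], events)` builds a histogram with bins of width 10, in the spirit of the HEP histogram case. In a real deployment the map and reduce steps would run on separate workers, and form (c) would feed the reduced centers back to the mappers at every iteration.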

(8)

HPC‐ABDS Hourglass

HPC ABDS System (Middleware)
• 120 Software Projects

System Abstractions/standards
• Data format
• Storage
• HPC Yarn for Resource management
• Horizontally scalable parallel programming model
• Collective and Point to Point communication
• Support of iteration

Application Abstractions/standards
• Graphs, Networks, Images, Geospatial ….

High performance Applications
• SPIDAL (Scalable Parallel Interoperable Data Analytics Library) or High performance Mahout, R, Matlab …..

(9)
(10)

We are sort of working on Use Cases with HPC‐ABDS

• Use Case 10 Internet of Things: Yarn, Storm, ActiveMQ
• Use Case 19, 20 Genomics. Hadoop, Iterative MapReduce, MPI, Much better analytics than Mahout
• Use Case 26 Deep Learning. High performance distributed GPU (optimized collectives) with Python front end (planned)
• Variant of Use Case 26, 27 Image classification using Kmeans: Iterative MapReduce
• Use Case 28 Twitter with optimized index for Hbase, Hadoop and Iterative MapReduce
• Use Case 30 Network Science. MPI and Giraph for network structure and dynamics (planned)
• Use Case 39 Particle Physics. Iterative MapReduce (wrote proposal)
• Use Case 43 Radar Image Analysis. Hadoop for multiple individual images moving to Iterative MapReduce for global integration over “all” images
• Use Case 44 Radar Images. Running on Amazon

(11)

Features of Harp Hadoop Plug in

• Hadoop Plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
• Hierarchical data abstraction on arrays, key‐values and graphs for easy programming expressiveness.
• Collective communication model to support various communication operations on the data abstractions.
• Caching with buffer management for memory allocation required from computation and communication
• BSP style parallelism
• Fault tolerance with check‐pointing
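To make the combination of collective communication and BSP style parallelism concrete, here is a small simulation of an allreduce using recursive doubling. This is an illustrative sketch only: it is not Harp's API or implementation, the function name `allreduce_sum` is hypothetical, and it assumes a power-of-two number of workers that all run inside one Python process.

```python
def allreduce_sum(partials):
    """Simulate a recursive-doubling allreduce over per-worker vectors.

    partials[i] is the local vector held by worker i. The return value has
    one entry per worker, and every entry is the element-wise global sum.
    """
    n = len(partials)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two worker count")
    state = [list(v) for v in partials]
    step = 1
    while step < n:
        # One BSP superstep: worker i exchanges its vector with partner
        # i XOR step, and all workers combine before the next superstep.
        state = [
            [a + b for a, b in zip(state[i], state[i ^ step])]
            for i in range(n)
        ]
        step *= 2
    return state


if __name__ == "__main__":
    # Four workers, each holding a partial sum of two values.
    print(allreduce_sum([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

After log2(n) supersteps every worker holds the same global sum, which is the property an iterative algorithm such as Kmeans needs when combining partial centroid sums at the end of each iteration.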

(12)

Architecture

[Figure: layered architecture]
Application: MapReduce Applications, Map‐Collective Applications
Framework: MapReduce V2, Harp
YARN

(13)

Performance on Madrid Cluster (8 nodes)

[Figure: K‐Means Clustering, Harp vs. Hadoop on Madrid. Execution Time (s) against Problem Size (100m points with 500 centers, 10m with 5k, 1m with 50k) for Hadoop and Harp on 24, 48 and 96 cores.]

Note: compute is the same in each case, as the product of centers times points is identical.

Increasing Communication, Identical Computation
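The note about identical computation can be checked with a few lines of arithmetic. The sketch below assumes the axis labels mean number of points and number of centers, with "k" and "m" standing for thousand and million; the helper `parse_size` is hypothetical.

```python
def parse_size(text):
    """Convert labels such as '100m' or '5k' to integers (k = 10^3, m = 10^6)."""
    suffixes = {"k": 10**3, "m": 10**6}
    if text[-1] in suffixes:
        return int(text[:-1]) * suffixes[text[-1]]
    return int(text)


# (points, centers) for the three problem sizes on the chart.
cases = [("100m", "500"), ("10m", "5k"), ("1m", "50k")]

# Kmeans work per iteration is proportional to points times centers.
work = [parse_size(p) * parse_size(c) for p, c in cases]

if __name__ == "__main__":
    for (p, c), w in zip(cases, work):
        print(f"{p} points x {c} centers = {w:.1e} distance evaluations")
```

All three products equal 5 x 10^10, so the compute per iteration matches across the three cases. The number of centers grows from 500 to 50k, which is presumably why the centroid data exchanged per iteration, and so the communication, increases.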

(14)

Mahout and Hadoop MR – Slow due to MapReduce

Python slow as Scripting

Spark Iterative MapReduce, non optimal communication

Harp Hadoop plug in with ~MPI collectives 

MPI fastest as C not Java

Increasing Communication Identical Computation

(15)

Performance of MPI Kernel Operations

[Figures: four panels of average time (us) against message size (bytes)]
• Performance of MPI send and receive operations
• Performance of MPI allreduce operation
– Series in both panels: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI‐nightly Java FG, OMPI‐trunk Java FG, OMPI‐trunk C FG
• Performance of MPI send and receive on Infiniband and Ethernet
• Performance of MPI allreduce on Infiniband and Ethernet
– Series in both panels: OMPI‐trunk C Madrid, OMPI‐trunk Java Madrid, OMPI‐trunk C FG, OMPI‐trunk Java FG

Pure Java as in FastMPJ slower than Java interfacing to C version of MPI

(16)

Use case 28: Truthy: Information diffusion research from Twitter Data

• Building blocks:
– Yarn
– Parallel query evaluation using Hadoop MapReduce
– Related hashtag mining algorithm using Hadoop MapReduce
– Meme daily frequency generation using MapReduce over index tables
– Parallel force‐directed graph layout algorithm using Twister (Harp) iterative MapReduce

(17)

Use case 28: Truthy: Information diffusion research from Twitter Data

[Figure: Two months’ data loading for varied cluster size]
[Figure: Scalability of iterative graph layout algorithm on Twister]

Hadoop‐FS not indexed

(18)

Pig

[Figure: Different Kmeans Implementation. Total execution time (s) vs. number of mappers (24, 48, 96) for Hadoop, Harp, Pig HD1 and Pig Yarn, each at problem sizes 100m,500; 10m,5000; 1m,50000.]

(19)

Lines of Code

                 Pig Kmeans   Hadoop Kmeans   Pig IndexedHBase        IndexedHBase
                                              meme‐cooccur‐count      meme‐cooccur‐count
Java             ~345         780             152                     ~434
Pig              10           0               10                      0
Python / Bash    ~40          0               0                       28
Total Lines      395          780             162                     462

(20)

DACIDR for Gene Analysis (Use Case 19, 20)

• Deterministic Annealing Clustering and Interpolative Dimension Reduction Method (DACIDR)
• Use Hadoop for pleasingly parallel applications, and Twister (being replaced by Yarn) for iterative MapReduce applications

Sequences – Cluster Centers

Add Existing data and find Phylogenetic Tree

[Figure: Simplified Flow Chart of DACIDR, with stages All‐Pair Sequence Alignment, Streaming, Pairwise Clustering, Multidimensional Scaling, Visualization]

(21)

Summarize a million Fungi Sequences

Spherical Phylogram Visualization

RAxML result visualized in FigTree.

Spherical Phylogram from new MDS  method visualized in PlotViz

(22)

Lessons / Insights

• Integrate (don’t compete) HPC with “Commodity Big data” (Google to Amazon to Enterprise data Analytics)
– i.e. improve Mahout; don’t compete with it
– Use Hadoop plug‐ins rather than replacing Hadoop
• Enhanced Apache Big Data Stack HPC‐ABDS has 120 members – please improve!
• HPC‐ABDS+ Integration areas include
– file systems,
– cluster resource management,
– file and object data management,
– inter process and thread communication,
– analytics libraries,
– Workflow,
– monitoring
