• No results found

Analysis Tools for Data Enabled Science

N/A
N/A
Protected

Academic year: 2020

Share "Analysis Tools for Data Enabled Science"

Copied!
48
0
0

Loading.... (view fulltext now)

Full text

(1)

S

A

L

S

A

HPC Group

http://salsahpc.indiana.edu

(2)

Twister

Bingjing Zhang

Funded by Microsoft Foundation Grant,

Indiana University's Faculty Research Support

Program and NSF OCI-1032677 Grant

Twister4Azure

Thilina Gunarathne

Funded by Microsoft Azure Grant

High-Performance

Visualization Algorithms

For Data-Intensive Analysis

Seung-Hee Bae and Jong Youl Choi

(3)

DryadLINQ CTP Evaluation

Hui Li, Yang Ruan, and Yuduo Zhou

Funded by Microsoft Foundation Grant

Million Sequence Challenge

Saliya Ekanayake, Adam Hughs, Yang Ruan

Funded by NIH Grant 1RC2HG005806-01

Cyberinfrastructure for

Remote Sensing of Ice Sheets

Jerome Mitchell

(4)

Linux HPC

Bare-system

Amazon Cloud Windows Server

HPC

Bare-system

Virtualization

CPU Nodes

Virtualization

Infrastructure

Hardware

Azure Cloud

Grid

Appliance

GPU Nodes

Cross Platform Iterative MapReduce (Collectives, Fault Tolerance, Scheduling)

Kernels, Genomics, Proteomics, Information Retrieval, Polar Science

Scientific Simulation Data Analysis and Management

Dissimilarity Computation, Clustering, Multidimentional Scaling, Generative

Topological Mapping

Applications

Programming

Model

Services and Workflow

High Level Language

Distributed File Systems

Data Parallel File System

Runtime

Storage

Object Store

(5)

(a) Map Only (b) Classic MapReduce (c) Iterative MapReduce (d) Loosely Synchronous Input map reduce Input map reduce Iterations Input Output map

P

ij CAP3 Analysis Smith-Waterman Distances Parametric sweeps

PolarGrid Matlab data analysis

High Energy Physics (HEP) Histograms

Distributed search Distributed sorting Information retrieval

Many MPI scientific applications such as solving differential equations and particle dynamics

Domain of MapReduce and Iterative Extensions MPI

Expectation maximization clustering e.g. Kmeans

(6)

GTM

MDS (SMACOF)

Maximize Log-Likelihood

Minimize STRESS or SSTRESS

Objective

Function

O(KN) (K << N)

O(N

2

)

Complexity

Non-linear dimension reduction

Find an optimal configuration in a lower-dimension

Iterative optimization method

Purpose

EM

Iterative Majorization (EM-like)

Optimization

Method

Vector-based data

Non-vector (Pairwise similarity matrix)

(7)

Parallel Visualization

(8)

Distinction on static and variable

data

Configurable long running

(cacheable) map/reduce tasks

Pub/sub messaging based

communication/data transfers

Broker Network for facilitating

(9)

configureMaps(..)

configureReduce(..)

runMapReduce(..)

while(

condition

){

} //end while

updateCondition()

close()

Combine()

operation

Reduce()

Map()

Worker Nodes

Communications/data transfers via the

pub-sub broker network & direct TCP

Iterations

May send <Key,Value> pairs directly

Local Disk

Cacheable map/reduce tasks

Main program may contain many

MapReduce invocations or iterative

MapReduce invocations

(10)

Worker Node

Local Disk

Worker Pool

Twister Daemon

Master Node

Twister

Driver

Main Program

B

B

B

B

Pub/sub

Broker Network

Worker Node

Local Disk

Worker Pool

Twister Daemon

Scripts perform:

Data distribution, data collection,

and

partition file

creation

map

reduce

Cacheable tasks

(11)

Master Node

Twister

Driver

Twister-MDS

ActiveMQ

Broker

MDS Monitor

PlotViz

I. Send message to

start the job

II. Send intermediate

results

(12)

Method A

Hierarchical Sending

Method B

Improved Hierarchical Sending

Method C

(13)

Twister Driver Node Twister Daemon Node ActiveMQ Broker Node

Broker-Daemon Connection Broker-Broker Connection 8 Brokers and 32 Daemon Nodes in total

(14)

Twister Daemon Node ActiveMQ Broker Node

Broker-Daemon Connection Broker-Broker Connection

8 Brokers and 32 Daemon Nodes in total

Twister Driver Node

(15)

Time used for the first level sending,

(

𝑁

𝑏

+ 𝑏−1)𝛼

Time used for the second level sending

𝑁

𝑏

𝛼

(sending in

parallel)

𝑁

is the number of Twister Daemon Nodes

𝑏

is the number of brokers

𝛼

is the transmission time for each sending

Get the derivation of

𝑏

,

𝑏 = 2𝑁

That is when

𝑏 = 2𝑁

, the total broadcasting time is the

(16)

Twister Driver Node Twister Daemon Node

ActiveMQ Broker Node 7 Brokers and 32 DaemonNodes in total

(17)

Twister Daemon Node

ActiveMQ Broker Node 7 Brokers and 32 ComputingNodes in total

Twister Driver Node

(18)

𝑡 = 𝑏−1 𝛼 +

𝑁

𝑏−1

𝛼

,

𝑡

comes to the minimum when

𝑏 = 𝑁 + 1

,

𝑡 = 2 𝑁 𝛼

𝑁

is the number of Twister Daemon Nodes

𝑏

is the number of brokers

(19)

Number of Brokers

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39

Execution

Time

(Unit:

Second)

(20)

Twister Daemon Node

ActiveMQ Broker Node 5 Brokers and 4 ComputingNodes in total

Twister Driver Node

(21)

Centroids

Centroid 1

Centroid 2

Centroid 3

(22)

Centroid 1

Twister Daemon Node ActiveMQ Broker Node

Twister Driver Node

Centroid 2 Centroid 3 Centroid 4

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

(23)

Twister Map Task ActiveMQ Broker Node

Centroid 1

Centroid 1

Centroid 1

Centroid 1

Centroid 2

Centroid 2

Centroid 2

Centroid 2

Centroid 3

Centroid 3

Centroid 3

Centroid 3

Centroid 4

Centroid 4

Centroid 4

Centroid 4

Twister Reduce Task

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

Centroid 4

Centroid 3

Centroid 1

Centroid 2

(24)

13.07 18.79 24.50 46.19 70.56 93.14

400M 600M 800M

Broadcasting Time (Unit: Second) 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 100.00

(25)

Distributed, highly scalable and highly available cloud

services as the building blocks.

Utilize eventually-consistent , high-latency cloud services

effectively to deliver performance comparable to

traditional MapReduce runtimes.

Decentralized architecture with global queue based

dynamic task scheduling

Minimal management and maintenance overhead

Supports dynamically scaling up and down of the compute

resources.

(26)
(27)
(28)

Performance with/without

data caching

Speedup gained using data cache

Scaling speedup

Increasing number of iterations

Number of Executing Map Task Histogram

Strong Scaling with 128M Data Points

(29)

Performance with/without

data caching

Speedup gained using data cache

Scaling speedup

Increasing number of iterations

Azure Instance Type Study Number of Executing Map Task Histogram

Weak Scaling

Data Size Scaling

(30)
(31)

Chris Hemmerich, Adam Hughes, Yang Ruan, Aaron Buechlein, Judy Qiu, and Geoffrey Fox. Map-Reduce Expansion of the ISGA Genomic Analysis Web Server (2010) The 2nd IEEE International Conference on Cloud Computing Technology and Science

ISGA

Ergatis

TIGR Workflow

SGE

Condor

Other DCEs

Cloud,

<<XML>>

<<XML>>

(32)
(33)

Gene Sequences (N

= 1 Million)

Distance Matrix Interpolative MDS with Pairwise Distance Calculation Multi-Dimensional Scaling (MDS) Visualizatio

n 3D Plot

Reference Sequence Set

(M = 100K)

N - M Sequence Set (900K) Select Referenc e Reference Coordinates

x, y, z

N - M

Coordinates x, y, z

Pairwise Alignment & Distance Calculation

(34)

Input DataSize: 680k

Sample Data Size: 100k

Out-Sample Data Size: 580k

Test Environment: PolarGrid with 100 nodes, 800 workers.

(35)
(36)

MPI / MPI-IO

Finding K clusters for N data points

Relationship is a bipartite graph (bi-graph)

Represented by K-by-N matrix (K << N)

Decomposition for P-by-Q compute grid

Reduce memory requirement by 1/PQ

K latent

points

N data

points

1

2

A

B

C

1

2

A

B

C

Parallel File System

Cray / Linux / Windows Cluster

Parallel HDF5

ScaLAPACK

(37)

Parallel MDS

O(N

2

) memory and computation

required.

100k data

480GB memory

Balanced decomposition of NxN

matrices by P-by-Q grid.

Reduce memory and computing

requirement by 1/PQ

Communicate via MPI primitives

MDS Interpolation

Finding approximate

mapping position w.r.t.

k-NN’s prior mapping.

Per point it requires:

O(M) memory

O(k) computation

Pleasingly parallel

Mapping 2M in 1450 sec.

vs. 100k in 27000 sec.

7500 times faster than

estimation of the full MDS.

37

c1

c2

c3

r1

(38)

Full data processing by GTM or MDS is computing- and

memory-intensive

Two step procedure

Training

: training by M samples out of N data

Interpolation

: remaining (N-M) out-of-samples are

approximated without training

n

In-sample

N-n

Out-of-sample

Total N data

(39)

PubChem data with CTD

visualization by using MDS (left)

and GTM (right)

About 930,000 chemical compounds

are visualized as a point in 3D space,

annotated by the related genes in

Comparative Toxicogenomics

Database (CTD)

Chemical compounds shown in

literatures, visualized by MDS (left)

and GTM (right)

(40)
(41)
(42)
(43)
(44)
(45)

Investigate in applicability and performance of DryadLINQ CTP to

develop scientific applications.

Goals:

Evaluate key features and

interfaces

Probe parallel programming

models

Three applications:

SW-G bioinformatics application

Matrix Multiplication

(46)

Parallel algorithms for

matrix multiplication

Row partition

Row column partition

2 dimensional block

decomposition in Fox

algorithm

Multi core technologies

PLINQ, TPL, and Thread Pool

Hybrid parallel model

Port multi-core to Dryad task

to improve Performance

Timing model for MM

Input data size

2400 4800 7200 9600 12000 14400 16800 19200

Parallel Efficiency 0 0.1 0.2 0.3 0.4

0.5 RowPartition RowColumnPartition Fox-Hey

RowPartition RowColumnPartition Fox-Hey

Speed up 0 20 40 60 80 100 120 140

(47)

Workload of SW-G, a pleasingly parallel application, is heterogeneous

due to the difference in input gene sequences. Hence workload balancing

becomes an issue.

Two approach to alleviate it:

Randomized distributed input data

Partition job into finer granularity tasks

0 50 100 150 200 250 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Skewed Randomized Standard Deviation Exectuion Time (Seconds)

Number of Partitions

31 62 124 186 248

Execution Time (Seconds) 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

(48)

S

A

L

S

A

HPC Group

Indiana University

References

Related documents

Hitherto, the RPA framework has not been used to assess breast cancer leaflets to determine the presence or absence of the RPA constructs (risk (including

3 ' Also, because parents in a joint custody arrangement work together rather than against one another, potential conflicts arise less often.' 32 An added

Francesco Longo (Univ. and INFN Trieste, Italy) Martino Marisaldi (INAF-IASF Bologna, Italy) Sandro Mereghetti (INAF-IASF, Milano, Italy) Kazuhiro Nakazawa

Sec- ondly, their leading export engines are not only traditional sectors, such as E&amp;E in Malay- sia or mining in Chile, but also ′ new ′ resource-based sectors, such as such

She examines how school structures and practices are aligned with middle- class culture and how precisely through serving the middle-class agenda, schools privilege

Berdasarkan studi pendahuluan yang telah dilakukan, didapatkan hasil bahwa dari pemeriksaan kadar hemoglobin dengan metode sahli yaitu, 7 dari 10 mahasiswi

Bantul district government has five strategies to strengthen gender capacity within social dimension[s]: (1) empowering women community groups in disaster response[s];

A RESOLUTION AUTHORIZING THE CITY MANAGER TO APPLY FOR AND ACCEPT FUNDS FOR A CRIME DATA TECHNICAN FOR THE POLICE DEPARTMENT FROM THE OFFICE OF THE GOVERNOR - CRIMINAL