
(1)

Towards High Performance Data Analytics with Java

SALIYA EKANAYAKE

4/1/2013 SALSA PRESENTATION


(2)

A Bit of Background

Gene Sequence Clustering and Visualization

Projects

Million sequence project

http://salsahpc.indiana.edu/millionseq/

Work on COG (Protein) sequences

http://salsacog.blogspot.com/

Work on phylogenetic trees

http://salsafungiphy.blogspot.com/

Publications

Y. Ruan, G. L. House, S. Ekanayake, U. Schütte, J. D. Bever, H. Tang, and G. Fox, “Integration of Clustering and Multidimensional Scaling to Determine Phylogenetic Trees as Spherical Phylograms Visualized in 3 Dimensions,” in C4Bio 2014, IEEE/ACM CCGrid 2014, Chicago, USA, 2014

L. Stanberry, R. Higdon, W. Haynes, N. Kolker, W. Broomall, S. Ekanayake, A. Hughes, Y. Ruan, J. Qiu, E. Kolker, and G. Fox, “Visualizing the protein sequence universe,” in Proceedings of the 3rd International Workshop on Emerging Computational Methods for the Life Sciences, Delft, The Netherlands, 2012, pp. 13-22

Y. Ruan, S. Ekanayake, M. Rho, H. Tang, S.-H. Bae, J. Qiu, and G. Fox, “DACIDR: deterministic annealed clustering with interpolative dimension reduction using a large collection of 16S rRNA sequences,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Orlando, Florida, 2012, pp. 329-336

A. Hughes, Y. Ruan, S. Ekanayake, S.-H. Bae, Q. Dong, M. Rho, J. Qiu, and G. Fox, “Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets,” BMC Bioinformatics, vol. 13, Suppl. 2, p. S9, 2012

(3)

Under the Hood


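[Figure: sequence processing pipeline with the stages Alignment and Distance Calculation, Dimension Reduction, Clustering, and Visualization (labels D1 to D5). Sample data: input sequences (>G0H13NN01D34CL GTCGTTTAAGCCATTACGTC …, >G0H13NN01DK2OZ GTCGTTAAGCCATTACGTC …); 3D coordinates in a "# X Y Z" file (0 0.358 0.262 0.295; 1 0.252 0.422 0.372); cluster assignments in a "# Cluster" file (0 1; 1 3).]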
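The coordinate and cluster samples in the pipeline figure look like plain whitespace-separated text: a `#` header line, then one row per point that starts with the point index. A minimal Java sketch of reading both files back for the visualization step, assuming exactly that layout; `PipelineFiles`, `readPoints`, and `readClusters` are hypothetical names, not part of the SALSA tools:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the "# X Y Z" and "# Cluster" text layouts (layout is assumed). */
final class PipelineFiles {
    private PipelineFiles() {}

    /** Parses "index x y z" rows into index -> {x, y, z}; blank lines and '#' headers are skipped. */
    static Map<Integer, double[]> readPoints(List<String> lines) {
        Map<Integer, double[]> points = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] f = trimmed.split("\\s+");
            double[] xyz = {Double.parseDouble(f[1]), Double.parseDouble(f[2]), Double.parseDouble(f[3])};
            points.put(Integer.parseInt(f[0]), xyz);
        }
        return points;
    }

    /** Parses "index cluster" rows into index -> cluster id. */
    static Map<Integer, Integer> readClusters(List<String> lines) {
        Map<Integer, Integer> clusters = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] f = trimmed.split("\\s+");
            clusters.put(Integer.parseInt(f[0]), Integer.parseInt(f[1]));
        }
        return clusters;
    }
}
```

Because both maps are keyed by point index, attaching a cluster ID to each 3D point for coloring is a single lookup per point.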

Reality Is More Complex

Study of Biological Sequence Structure

http://salsahpc.blogspot.com/2013/05/study-of-biological-sequence-structure.html

Million Sequence Processes

http://salsahpc.indiana.edu/millionseq/fungi2/fungi2_index.html

Runs On

Tempest: Windows HPC cluster

FutureGrid, BigRed II, Quarry: traditional Linux-based HPC clusters

Algorithms

Alignment and Distance Calculation

SALSA-SWG: C# MPI

SALSA-SWG-MBF: C# MPI

SALSA-NW-MBF: C# MPI

SALSA-SWG-MBF2Java: Java MapReduce

SALSA-NW-BioJava: Java MapReduce

Dimension Reduction

MDSasChisq: C# MPI

DA-SMACOF: C# MPI

Twister DA-SMACOF: Java Iterative MapReduce

WDA-SMACOF: Java Iterative MapReduce

Clustering

DAPWC: C# MPI
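The SWG in the SALSA-SWG names presumably refers to Smith-Waterman-Gotoh local alignment, whose pairwise scores feed the distance calculation. The Java sketch below shows only a simplified Smith-Waterman score: it uses a linear gap penalty and fixed match/mismatch scores instead of Gotoh's affine gaps and a substitution matrix, and the scoring values used with it are illustrative assumptions.

```java
/** Simplified Smith-Waterman local alignment score (linear gap penalty, fixed scores). */
final class SmithWaterman {
    private SmithWaterman() {}

    /**
     * @param match      score added for identical characters (e.g. 2)
     * @param mismatch   score added for differing characters (typically negative, e.g. -1)
     * @param gapPenalty amount subtracted per gap position (e.g. 2)
     */
    static int score(String a, String b, int match, int mismatch, int gapPenalty) {
        // h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] - gapPenalty;    // a[i-1] aligned to a gap
                int left = h[i][j - 1] - gapPenalty;  // b[j-1] aligned to a gap
                // Clamping at 0 is what makes the alignment local
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }
}
```

With these example scores, the two sample reads shown in the pipeline figure, `GTCGTTTAAGCCATTACGTC` and `GTCGTTAAGCCATTACGTC`, score 36: 19 matches (38) minus one gap (2).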

(4)

Towards Java

Motivation

Immediate

Limited availability of Windows HPC clusters

Future

Integrate with Apache Big Data Stack (ABDS)

Options

Keep C#

Run on Azure cloud

Not ideal for MPI because of high latency and low bandwidth

Run on Mono

We tried it; it worked, but performance was poor

Convert to Java

Time-consuming, but gave good results

“Java Ready” Applications

Deterministic Annealing Vector Sponge (DAVS)

(5)

Evaluations

MPI Frameworks

MPI.NET

A high-performance message passing interface for the .NET environment

FastMPJ

A pure Java implementation of the mpiJava 1.2 specification

OpenMPI

Java wrapper for the native MPI implementation

Nightly snapshot 1.9a1r28881 (OMPI-nightly) – conforms to the mpiJava 1.2 specification

Source tree revision 30301 (OMPI-trunk)

Release candidate version 1.7.5rc5 (OMPI-175rc5) – latest of the three

Kernel Benchmarks

OSU Micro-Benchmarks (OMB) suite

Send and receive

Allreduce

Application Benchmarks

DAVS and DAPWC on Real Data

Parallel Patterns of T x P x N

T - # threads per process

P - # MPI processes per node

N - # nodes

Threads from Habanero Java Library


Mainly for Parallel Loops
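With the TxPxN pattern there are P x N MPI processes, each running T threads. One common way to divide a loop of n items over that layout is a two-level block decomposition: first across the P x N processes, then across the T threads inside each process. A minimal Java sketch under stated assumptions: processes are simulated sequentially rather than launched through MPI, and the JDK's parallel `IntStream` stands in for the Habanero Java library's parallel loops. `blockStart`, `blockSize`, and `hybridSum` are hypothetical helpers.

```java
import java.util.stream.IntStream;

/** Hypothetical two-level (process, then thread) block decomposition for a TxPxN layout. */
final class HybridDecomposition {
    private HybridDecomposition() {}

    /** First index of block i when n items are split into `parts` near-equal blocks. */
    static int blockStart(int n, int parts, int i) {
        return i * (n / parts) + Math.min(i, n % parts);
    }

    /** Size of block i; the first n % parts blocks receive one extra item. */
    static int blockSize(int n, int parts, int i) {
        return n / parts + (i < n % parts ? 1 : 0);
    }

    /** Sums 0..n-1 over P*N simulated processes, each splitting its block into T parallel tasks. */
    static long hybridSum(int n, int t, int p, int nodes) {
        int ranks = p * nodes;
        long total = 0;
        for (int rank = 0; rank < ranks; rank++) {            // stands in for the MPI processes
            int procStart = blockStart(n, ranks, rank);
            int procSize = blockSize(n, ranks, rank);
            // T tasks on the JDK's common fork/join pool stand in for Habanero Java threads
            total += IntStream.range(0, t).parallel()
                    .mapToLong(thread -> {
                        int s = procStart + blockStart(procSize, t, thread);
                        int len = blockSize(procSize, t, thread);
                        long sum = 0;
                        for (int k = s; k < s + len; k++) sum += k;
                        return sum;
                    })
                    .sum();
        }
        return total; // in a real MPI run, combining per-process sums would be an allreduce
    }
}
```

Splitting by contiguous blocks keeps each thread's data range contiguous, which tends to suit cache-friendly parallel loops; other schedules (cyclic, dynamic) are equally possible.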


(6)

Kernel Benchmarks

MPI Send and Receive

[Figure: MPI send and receive, average time (us, log scale) vs. message size (0B to 1MB). First panel: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG, OMPI-nightly C FG. Second panel: OMPI-trunk C and Java on Madrid and on FG.]
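Send/receive latency benchmarks such as OMB's typically use a ping-pong: one rank sends a message, the other echoes it back, and the one-way latency is taken as half the average round-trip time. The Java sketch below reproduces only that measurement pattern, with two threads and `SynchronousQueue` hand-offs standing in for two MPI ranks, so its numbers say nothing about MPI or network performance. The message-size sweep mirrors the 0B to 1MB axis of the figure; `PingPong` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

/** Ping-pong latency pattern; two threads stand in for two MPI ranks (not an MPI benchmark). */
final class PingPong {
    private PingPong() {}

    /** Message sizes 0, 1, 2, 4, ... up to maxBytes (a power of two), as on the figure's x-axis. */
    static List<Integer> messageSizes(int maxBytes) {
        List<Integer> sizes = new ArrayList<>();
        sizes.add(0);
        for (int s = 1; s <= maxBytes; s *= 2) sizes.add(s);
        return sizes;
    }

    /** Average one-way latency in microseconds; iterations must be positive. */
    static double oneWayLatencyMicros(int size, int warmup, int iterations) throws InterruptedException {
        SynchronousQueue<byte[]> ping = new SynchronousQueue<>();
        SynchronousQueue<byte[]> pong = new SynchronousQueue<>();
        int total = warmup + iterations;
        Thread echo = new Thread(() -> {              // "rank 1": receive, then send back
            try {
                for (int i = 0; i < total; i++) pong.put(ping.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        byte[] message = new byte[size];
        long start = 0;
        for (int i = 0; i < total; i++) {
            if (i == warmup) start = System.nanoTime(); // exclude warm-up round trips
            ping.put(message);                          // "rank 0": send
            pong.take();                                // wait for the echo
        }
        long elapsed = System.nanoTime() - start;
        echo.join();
        return elapsed / 1_000.0 / iterations / 2.0;    // half the average round trip, in us
    }
}
```

Warm-up iterations are excluded because the first exchanges include one-time costs (thread start-up, JIT compilation), which benchmark suites also discard.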

(7)

Kernel Benchmarks

MPI Allreduce


[Figure: MPI Allreduce, average time (us, log scale) vs. message size (4B to 8MB). Panel "Performance with Different MPI Frameworks": MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG, OMPI-nightly C FG. Panel "OMPI-trunk Performance with and without Infiniband".]
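Allreduce combines one buffer from every rank with an operation such as sum and leaves the identical result on every rank. A sequential Java sketch of those semantics for an element-wise sum, with in-memory arrays standing in for ranks; real MPI libraries implement this with tree- or ring-based message exchanges over the network, which is what the benchmark times. `AllreduceSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Sequential simulation of Allreduce(SUM) semantics over in-memory "ranks". */
final class AllreduceSketch {
    private AllreduceSketch() {}

    /** Returns one buffer per rank; each holds the element-wise sum of all ranks' inputs. */
    static double[][] allreduceSum(double[][] perRank) {
        int len = perRank[0].length;
        double[] sum = new double[len];
        for (double[] buffer : perRank) {
            if (buffer.length != len) throw new IllegalArgumentException("buffers must have equal length");
            for (int i = 0; i < len; i++) sum[i] += buffer[i];
        }
        double[][] result = new double[perRank.length][];
        for (int r = 0; r < perRank.length; r++) {
            result[r] = Arrays.copyOf(sum, len);   // every rank receives its own copy of the result
        }
        return result;
    }
}
```

For example, ranks holding {1, 2}, {3, 4}, and {5, 6} all end up with {9, 12}.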

(8)

DAVS Performance

Mode – Charge5

[Figure: DAVS Charge5 results for MPI.NET, OMPI-nightly, and OMPI-trunk. Panels: speedup for TxPxN 1x1x1 to 1x8x1; time (hours) for 2x1x8, 4x1x8, 8x1x8, 1x2x8, 4x2x8, 1x4x8, 2x4x8; time (hours) for 1x1x1 to 1x8x1.]
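The speedup panels appear to be normalized to the 1x1x1 run, since their axes start at 1. Assuming that baseline, speedup is the 1x1x1 time divided by a configuration's time, and parallel efficiency divides that speedup by the total parallelism T x P x N. A small Java sketch of both; `Scaling` and its methods are hypothetical helpers:

```java
/** Hypothetical helpers for speedup and parallel efficiency of TxPxN runs. */
final class Scaling {
    private Scaling() {}

    /** Total parallelism T * P * N of a pattern written as "TxPxN", e.g. "2x4x8" gives 64. */
    static int parallelism(String pattern) {
        String[] parts = pattern.split("x");
        if (parts.length != 3) throw new IllegalArgumentException("expected TxPxN: " + pattern);
        return Integer.parseInt(parts[0]) * Integer.parseInt(parts[1]) * Integer.parseInt(parts[2]);
    }

    /** Speedup of a run relative to a baseline run (assumed here to be 1x1x1). */
    static double speedup(double baselineTime, double time) {
        return baselineTime / time;
    }

    /** Parallel efficiency: speedup divided by T * P * N (1.0 means linear scaling). */
    static double efficiency(double baselineTime, double time, String pattern) {
        return speedup(baselineTime, time) / parallelism(pattern);
    }
}
```

Efficiency makes patterns with different total parallelism comparable; for instance, it separates "slower because fewer cores" from "slower because threads scale poorly".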

(9)

DAVS Performance

Mode – Charge2


[Figure: DAVS Charge2 results for MPI.NET, OMPI-nightly, and OMPI-trunk. Panels: "Pure MPI", time (hours) for TxPxN 1x1x1 to 1x4x2; "Pure MPI Speedup" for 1x1x1 to 1x4x2; "MPI with Threads" for 2x1x8, 4x1x8, 8x1x8, 1x2x8, 4x2x8, 1x4x8, 2x4x8, 1x8x8.]

(10)

DAVS Performance

Single Node Charge 2, Charge 5 and Charge 6

Points

OMPI-trunk performed best, with OMPI-nightly close behind

MPI.NET may be suffering from poor Infiniband performance

FastMPJ had issues that prevented it from running the applications

Threaded performance in Java is lower than expected

(11)

DAPWC Performance

OMPI-175 Only (Chosen over OMPI-trunk)


[Figure: DAPWC running time (hours) for TxPxN configurations ordered by total parallelism, starting at 1x1x1.]

(12)

DAPWC Performance

[Figure: DAPWC running time (hours) for TxPxN configurations grouped by total parallelism T x P x N: 16 (1x1x16 to 8x1x2), 32 (1x1x32 to 8x1x4), 64 (1x2x32 to 8x1x8), 128 (1x4x32 to 8x1x16), and 256 (1x8x32 to 8x1x32).]
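The DAPWC configurations in each parallelism group are consistent with three constraints inferred from the plotted labels: T and P are powers of two, T x P is at most 8 per node (presumably the cores per node, an assumption), and N is a power of two up to 32. Under those inferred constraints, the Java sketch below regenerates each group in the plotted order; `TxPxN.configurations` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates TxPxN patterns with a given total parallelism (constraints inferred from the plot). */
final class TxPxN {
    private TxPxN() {}

    static final int MAX_PER_NODE = 8;  // assumed limit on T * P per node
    static final int MAX_NODES = 32;    // largest N that appears in the plot

    static List<String> configurations(int parallelism) {
        List<String> result = new ArrayList<>();
        for (int t = 1; t <= MAX_PER_NODE; t *= 2) {
            for (int p = 1; t * p <= MAX_PER_NODE; p *= 2) {
                if (parallelism % (t * p) != 0) continue;
                int n = parallelism / (t * p);
                // N must be a power of two no larger than MAX_NODES
                if (n <= MAX_NODES && Integer.bitCount(n) == 1) {
                    result.add(t + "x" + p + "x" + n);
                }
            }
        }
        return result;
    }
}
```

For parallelism 256 this yields 1x8x32, 2x4x32, 4x2x32, and 8x1x32; for 16 it yields the ten patterns from 1x1x16 to 8x1x2.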

(13)

DAPWC Performance

Points

Threaded performance is better than for DAVS, but the Tx1xN pattern behaves peculiarly

FastMPJ failed as before

MPI.NET and OMPI-nightly runs are still pending

[Figure: DAPWC speedup for TxPxN configurations ordered by total parallelism, from 1x1x1 to 8x1x32.]

(14)

Current Tasks and Future

Current

Complete migration of applications to Java

Evaluate performance

Investigate the lower-than-expected thread performance

Future

How to integrate with ABDS?

(15)

Thank you!
