Big Data Challenges in Simulation-based Science

(1)

Big Data Challenges in Simulation-based

Science

Manish Parashar

Rutgers Discovery Informatics Institute (RDI

2

₎

NSF Center for Cloud and Autonomic Computing (CAC)

http://parashar.rutgers.edu

*Hoang Bui, Tong Jin, Qian Sun, Fan Zhang and other ex-students and

collaborators (ORNL, GT, UU, NCSU, SNL, LBNL, LLNL, …

Outline

• Data Grand Challenges in Simulation-base Science

• Rethinking the simulations -> insights pipeline

• The DataSpaces Project

(2)

Modern Science & Society Transformed by

Compute & Data

New paradigms and prac/ces

in science and engineering

• Data-‐driven, data and

compute-‐intensive

• Inherently mul/-‐

disciplinary

• Collabora/ve (university,

na/onal, global)

Many Challenges

• Computing, Data, Software,

People

4

Many Challenges….

• Computing

–  Multicore; large and increasing core counts, deep memory

hierarchies (TH-2: 54.9 PF, 3.12 M Cores, 1.4 PB, 25 MW)

–  New prgm. models, concerns (fault tolerance, energy, etc)

–  New models & technologies: Clouds, grids, hybrid

manycore, accelerators, deep storage hierarchies, …

• Data

–  Generating more data than in all of human history: preserve,

mine, share?

–  How do we create “data scientists/engineers”?

• Software

–  Complex applications on coupled compute-data-networked

environments, tools needed

–  Modern apps: 106+ lines, many groups contribute, take

decades to develop, very long lifetimes

• 

People

–  Multidisciplinary expertise essential!

(3)

The Era of eScience and Big Data

B

yt

es per da

y

2012

2020

Genomics LHC TeraGrid/ XSEDE, Blue Waters Square Kilometer Array Genomics LHC Climate, Environment LSST Exa Bytes Peta Bytes Tera Bytes Giga Bytes Climate, Environment

Many smaller datasets…

Genome sequencing are doubling in output every 9

months ! _{100PB data from LSST}

by the end of this decade!! Climate Data 2004 -> 36TB 2012 -> 2PB Keep increasing! LHC will produce roughly 15PB data/ year!

SKA project will generate 1 EB data/day in 2020!

Credit: R. Pennington/A. Blatecky

Clearly, modern scientific network/

instruments/experiments/… are

producing Big Data!!

But what about HPC?

O

Scientific Discovery through Simulations (I)

•  ACI providing increasing computational scales, capabilities and capacities •  Scientific simulations running on high-end computing systems generate huge

amounts of data!

–  If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~170TB per hour -> ~700PB per day -> ~1.4EBper year

•  Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data

How we enable the computation scientists to

e

ﬃ

ciently manage and explore extreme scale data:

“

ﬁ

nd the needles in haystack” ??

(4)

Scientific Discovery through Simulations (II)

• Complex, heterogeneous

components

• Large data volumes and data rates

• Complex couplings, data

re-distribution, data transformations

• Dynamic data exchange patterns

• Strict performance, energy

constraints

• Complex work

ﬂ

ows integrating

coupled models, data management/

processing, analytics

• Tight / loose coupling, data

driven, ensembles

• Advanced numerical methods (E.g.,

Adaptive Mesh Re

ﬁ

nement)

• Integrated (online) uncertainty

quanti

ﬁ

cation, analytics

Traditional Simulation -> Insight Pipelines Break Down

• Traditional

simulation -> insight

pipeline:

–  Run large-scale simulation

workflows on large supercomputers –  Dump data on parallel disk systems –  Export data to archives

–  Move data to users’ sites – usually selected subsets

–  Perform data manipulations and analysis on mid-size clusters –  Collect experimental / observational

data

–  Move to analysis sites –  Perform comparison of

experimental/observational to validate simulation data

(5)

Challenges Faced by Traditional HPC Data Pipelines

• Scalable data analytics challenge

•  Can current data mining, manipulation and visualization algorithms still work effectively on extreme scale machine?

The

costs of data movement

are increasing and dominating!

Figure. Traditional data analysis pipeline

• I/O and storage challenge

•  Increasing performance gap: disks are outpaced by computing speed

• Data movement challenge

•  Lots of data movement between simulation and analysis, between coupled mutli-physics simulation components => longer latencies

•  Improving data locality is critical: do work where the data resides!

• Energy challenge

•  Future extreme systems are designed to have low-power chips – however, much greater power consumption will be due to memory and data movement!

• The energy cost of moving data is a

significant concern

From K. Yelick, “Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer”"

The Cost of Data Movement

Energy _ move _ data= bitrate * length

2

cross_section_area_of_wire

performance gap

• Moving data between node

memory and persistent

storage is slow!

(6)

Challenges Faced by Traditional HPC Data Pipelines

Traditional data analysis pipeline

We need to Rethink the Data Management Pipeline!

–

 

Reduce data movement

–

 

Move computation/analytics closer to the data

–

 

Add value to simulation data along the IO path

The

costs of data movement

(power and performance) are

increasing and dominating!

Rethinking the Data Management Pipeline – Hybrid Staging

+ In-Situ & In-Transit Execution

Issues/Challenges

• Programming abstractions/systems

• Mapping and scheduling

• Control and data

ﬂ

ow

(7)

Design space of possible workflow architectures

•  Loca%on of the compute resources

– Same cores as the simula/on (in situ) – Some (dedicated) cores on the same nodes

– Some dedicated nodes on the same machine – Dedicated nodes on an external resource •  Data access, placement, and persistence

– Direct access to simula/on data structures

– Shared memory access via hand-‐oﬀ / copy – Shared memory access via non-‐vola/le near

node storage (NVRAM)

– Data transfer to dedicated nodes or external

resources

•  Synchroniza%on and scheduling

–  Execute synchronously with simula/on every nth_{simula/on /me step}

–  Execute asynchronously

Processing data on remote nodes

Using distinct cores on same node Sharing cores with the simulation

DRA M DRA M DR A M Simulation Node DR A M Staging Node Network Communication Analysis Tasks Simulation Visualization DRAM NVRAM SSD Hard Disk CPUs DRAM NVRAM SSD Hard Disk CPUs Network Node 1Node 2 Node N ... Staging option 1 Staging option 2 Staging option 3

DataSpaces

In-situ/In-transit Data Management & Analytics

! 

Virtual shared-space programming abstraction

! 

Simple API for coordination, interaction and messaging

! 

Distributed, associative, in-memory object store

! 

Online data indexing, flexible querying

! 

Adaptive cross-layer runtime management

! 

Hybrid in-situ/in-transit execution

! 

Efficient, high-throughput/low-latency asynchronous data transport

ADIOS/

Da

taSpac

es

da

taspac

es

.or

g

(8)

DataSpaces: A Scalable Shared Space Abstraction for Hybrid

Data Staging [HPDC10, JCC12]

!  Virtual shared-space abstraction !  Simple API for coordination,

interaction and messaging

!  Provides a global-view

programming abstraction consistent with PGAS

!  Distributed, associative,

in-deep-memory object store

!  Online data indexing, flexible

querying

!  Adaptive cross-layer runtime

management

!  Hybrid in-situ/in-transit execution !  Data-centric mappings

!  High-throughput/low-latency

memory-to-memory asynchronous data transport

•  Dynamic coordination and interaction patterns between the coupled applications

–  Transparent data redistribution –  Complex geometry-based queries –  In-space (online) data

transformation and manipulations

DataSpaces Abstraction: Indexing + DHT

• Dynamically constructed

structured overlay using hybrid

staging cores

• Index constructed online using

SFC mappings and applications

attributes

•  E.g., application domain, data, field values of interest, etc.

• DHT used to maintain

meta-data information

•  E.g., geometric descriptors for the shared data, FastBit indices, etc.

• Data objects load-balanced

separately across staging cores

• The SFC maps the global domain to a

set of intervals

•  Intervals are non-contiguous and can lead to load imbalance

•  Second mapping compacts the intervals into a contiguous virtual interval

•  Split the contiguous interval equally to the DataSpaces nodes

(9)

DataSpaces: Scalability on ORNL Titan

0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 512/32 1024/64 2048/1284096/2568192/51216K/1K 32K/2K 64K/4K 2GB 4GB 8GB 16GB 32GB 64GB 128GB 256GB Time(s)

Data redistribution time

0 10 20 30 40 50 60 70 80 90 100 110 120 512/32 1024/64 2048/1284096/2568192/51216K/1K 32K/2K 64K/4K 2GB 4GB 8GB 16GB 32GB 64GB 128GB 256GB Throughput(GB/s)

Aggregate data redistribution throughput

•  Weak scaling with increasing number of processors

•  Applications redistribute data using DataSpaces

•  Application 1 runs on M processors and inserts data into the space

•  Application 2 runs on N processors and retrieves data from the space

•  Result: A 128 fold increase in the application sizes from 512 to 64K writers, total data size exchanged per step is increased from 2GB to 256GB

Multphysics Code Coupling at Extreme Scales [CCGrid10]

PGAS Extensions for Code Coupling [CCPE13]

DataSpaces: Enabling Coupled Scientific

Workflows at Extreme Scales

Data-centric Mappings for In-Situ Workﬂows [IPDPS12] Dynamic Code Deployment In-Staging [IPDPS11]

kernel_min { for i = 1, n for j = 1, m for k = 1, p if (min > A(i, j, k)) min = A(i, j, k) } data_kernels.c 0x69 0x20 0x61 0x6d 0x20 0x63 0x6f 0x6f 0x6c 0x0 data_kernels.o Applications.o Applications executable apr un apr un Compute nodes Link gcc -c Staging nodes rexec() return() for i = 0, ni-1 do for j = 0, nj-1 do for k = 0, nk-1 do val = input:get_val(i,j,k) if min > val then min = val end end end data_kernels.lua

Runtime execution system (Rexec)

(10)

Combustion: S3D uses chemical model to simulate combustion process in order to understand the ignition characteristic of bio-fuels for automotive usage [SC 2012]

Fusion: EPSi simulation workﬂow provides insight into edge plasma physics in

magnetic fusion devices by simulating movement of millions particles [CCGrid 2010]

Subsurface modeling: IPARS uses multi-physics, multi-model formulations and

coupled multi-block grids to support oil reservoir simulation of realistic, high-resolution reservoir studies with a million or more grids [CCGrid 2011]

Computational mechanics: FEM coupled with AMR is often used to solve heat

transfer or ﬂuid dynamic problems. AMR only performs reﬁnements on important region of the mesh [SC 2014 (in progress)]

Material Science: QMCPack utilizes quantum Monte Carlo (QMC) studies in

heterogeneous catalysis of transition metal nanoparticles, phase transitions, properties of materials under pressure, and strongly correlated materials

Material Science: SNS workﬂow enables integration of data, analysis, and

modeling tools for materials science experiments to accelerate scientiﬁc discoveries which requires beam lines data query capability

Workflows beyond simulations

• Content-based image retrieval (CBIR) on digitized

peripheral blood smear specimens using low-level

morphological features (MICAII’2012, CAC’2013,

MICAII’2013)

• Asynchronous Replica Exchange - Monitor the

progress of replicas, using the secondary structure

prediction methods (J. Comput. Chem.’2008)

• Control fluid flow in micro-devices by placing a

sequence of pillar obstacles (MTAGS’2013,

CISE’2013)

• Monte Carlo numerical strategies to unravel how

nucleosomes shape DNA into chromatin (J.

Biophysical’2014)

(11)

In-Situ Feature Extraction and Tracking using

Decentralized Online Clustering (DISC’12, ICAC’10)

DOC workers executed in-‐situ on simula%on machines

Simula/on Compute Nodes

One compute node

Processor core runs simula/on Processor core runs DOC worker

Beneﬁts of runtime feature extraction and tracking (1) Scientists can follow the events of interest (or data of interest)

(2) Scientists can do real-time monitoring of the running simulations DOC Overlay

Pixplot

8 cores

Pixmon

1 core

(login node)

ParaView Server

4 cores

In-situ viz. and

monitoring with staging"

Pixie3D

1024 cores

DataSpaces

record.bp

pixie3d.bp

(12)

DataSpaces

AMR-as-a-Service using DataSpaces

Grid GridField

FEM

(TalyFEM)

AMR

•  Finite Element Model (FEM):

•  Models using uniform 3D meshes for near-realistic engineering problems such as heat-transfer, fluid flow and phase transformation •  AMR Service:

•  Refines localized areas of interest, e.g., with high local error

•  DataSpaces-as-a-Service:

•  Persistent staging servers run as a service on HEC platform

•  Applications can connect, exchange data objects, and disconnect independently

pGrid pGridField Grid GridField pGrid pGridField

Overview of Recent DataSpaces Research

•  Programming abstractions / system

–  DataSpaces: Interaction, coordination and messaging abstractions for coupled scientific workflow [HPDC10, CCGrid10, HiPC12, JCC12]

–  XpressSpaces: PGAS extensions for coupling using DataSpaces [CCGrid11, CCPE13] –  ActiveSpaces: Dynamic code deployment for in-staging data processing [IPDPS11]

•  Runtime mechanisms

–  Data-centric Task Mapping: Reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]

–  In-situ & In-transit Data Analytics: Simulation-time analysis of large volume data by combining in-situ and in-transit execution [SC12, DISC12]

–  Cross-layer Adaptation: Adaptive cross-layer approach for dynamic data management in large scale simulation-analysis workflow [SC13]

–  Value-based Data Indexing and Querying: Use FastBit to build in-situ, in-memory value-based indexing and query support in the staging area [In review]

–  Power/performance Tradeoffs: Characterizing power/performance tradeoffs for data-intensive simulations workflows [SC13]

–  Data Staging over Deep Memory Hierarchy: Build distributed associative object store over hierarchical memory storage, e.g., DRAM/NVRAM/SSD [HiPC13]

•  High-throughput/low-latency asynchronous data transport

–  DART: Network independent transport library for high speed asynchronous data extraction and transfer [HPDC08]

(13)

Integrating In-Situ and In-Transit Analytics (SC’12)

• S3D: First-principles direct

numerical simulation

• Simulation resolves features on

the order of 10 simulation time

steps

• Currently on the order of every

400

th

_{time step can be written to}

disk

• Temporal fidelity is compromised

when analysis is done as a

post-process

Recent data sets generated by S3D, developed at the Combus/on Research Facility, Sandia Na/onal Laboratories

In-situ Topological Analysis as Part of S3D*

33 Topology Sta/s/cs

Identify features of

interest

Volume Rendering

*J. C. Bennett et al., “Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientiﬁc

(14)

Integrating In-Situ and In-Transit Analytics

(SC’12)

• Primary resources execute

the main simulation and in

situ computations

• Secondary resources

provide a staging area

whose cores act as

containers for in transit

computations

• 4896 cores total (4480 simula/on/in situ; 256 in

transit; 160 task scheduling/data movement)

• Simula/on size: 1600x1372x430

• All measurements are per simula/on /me step

Simulation case study with S3D: Timing results

for 4896 cores and analysis every 10

th

simulation time step

168.5& 119.81&

0& 20& 40& 60& 80& 100& 120& 140& 160& simula1on& hybrid&topology& hybrid&visualiza1on& in>situ&visualiza1on& hybrid&sta1s1cs& in>situ&sta1s1cs& seconds&

S3D&

in>situ&

data&movement&

in>transit&&

1.0% 1.0% .43% .004%

(15)

Simulation case study with S3D: Timing results

for 4896 cores and analysis every 100

th

simulation time step

1685% 0% 200% 400% 600% 800% 1000% 1200% 1400% 1600% simula/on% hybrid%topology% hybrid%visualiza/on% in<situ%visualiza/on% hybrid%sta/s/cs% in<situ%sta/s/cs% seconds%

S3D%

in<situ%

data%movement%

in<transit%%

.1% .1% .04%

.004% .16%

Challenges Opportunities Ahead: The Vision of

the Exascale Computing Initiative

• Achieve order 10

18

_{operations per second and order}

10

18

_{bytes of storage}

• Address the next generation of scientific, engineering,

and large

-

data problems

• Enable extreme scale computing: 1,000X capabilities

of today's computers with a similar size and power

footprint

–  20 pJ per average operation => ~ x40 improvement over today's efficiency –  Billion-way concurrency

• “Productive” system based on marketable

technologies

–  Ecosystem for application development and execution, system management

(16)

Ensemble of Models with UQ

Visual Analy/cs

First Principles Fusion Code XGC, “core-‐edge” turbulence with UQ Model Model Model Model Analysis Compara/ve Analy/cs Visualiza/on Experiment mock up

Knowledge

Database

KSTAR Tokamak diagnos/c diagnos/c diagnos/c diagnos/c diagnos/c diagnos/c Synthe/c Diagnos/c

Automate the process of these workﬂows and move work/data to sa/sfy metrics speciﬁed by users, and track provenance

Use ﬁrst principle calcula/ons on HPC to generate more accurate models Generate a DB of knowledge by codes, to use for predic/ve capability during/aker each experiment

Challenges Opportunities Ahead: Pervasive

Computational Ecosystems

(17)

Summary & Conclusions

• Complex applications running on high-end systems

generate extreme amounts of data that must be managed

and analyzed to get insights

–

 

Data costs (performance, latency, energy) are quickly dominating

–

 

Traditional data management/analytics pipelines are breaking

down

• Hybrid data staging, in-situ workflow execution, etc.

can

address this challenges

–

 

Users to efficiently intertwine applications, libraries, middleware for

complex analytics

• Many challenges; Programming, mapping and scheduling,

control and data flow, autonomic runtime management

….

–

 

The DataSpaces project explores solutions at various levels

• However, many challenges opportunities ahead !!

Thank You!

Manish Parashar, Ph.D.

Rutgers Discovery Informatics Institute (RDI2₎

Cloud & Autonomic Computing Center (CAC) Rutgers, The State University of New Jersey Email: [email protected]

WWW: rdi2.rutgers.edu WWW: dataspaces.org

!

Big Data Challenges in Simulation-based Science