Big Data Challenges in Simulation-based
Science
Manish Parashar
Rutgers Discovery Informatics Institute (RDI2)
NSF Center for Cloud and Autonomic Computing (CAC)
http://parashar.rutgers.edu
*Hoang Bui, Tong Jin, Qian Sun, Fan Zhang and other ex-students and
collaborators (ORNL, GT, UU, NCSU, SNL, LBNL, LLNL, …)
© Manish Parashar, Rutgers University
Outline
•
Data Grand Challenges in Simulation-based Science
•
Rethinking the simulations -> insights pipeline
•
The DataSpaces Project
Modern Science & Society Transformed by
Compute & Data
New paradigms and practices
in science and engineering
•
Data-driven, data- and
compute-intensive
•
Inherently multidisciplinary
•
Collaborative (university,
national, global)
Many Challenges
•
Computing, Data, Software,
People
Many Challenges….
•
Computing
– Multicore; large and increasing core counts, deep memory
hierarchies (TH-2: 54.9 PF, 3.12 M Cores, 1.4 PB, 25 MW)
– New programming models, new concerns (fault tolerance, energy, etc.)
– New models & technologies: Clouds, grids, hybrid
manycore, accelerators, deep storage hierarchies, …
•
Data
– Generating more data than in all of human history: preserve,
mine, share?
– How do we create “data scientists/engineers”?
•
Software
– Complex applications on coupled compute-data-networked
environments, tools needed
– Modern apps: 10^6+ lines, many groups contribute, take
decades to develop, very long lifetimes
•
People
– Multidisciplinary expertise essential!
The Era of eScience and Big Data
[Figure: data volumes in bytes per day, 2012 vs. 2020, on a scale from gigabytes through terabytes and petabytes to exabytes, for genomics, LHC, climate/environment, LSST, TeraGrid/XSEDE, Blue Waters and the Square Kilometer Array]
Many smaller datasets…
• Genome sequencing output is doubling every 9 months!
• 100 PB of data from LSST by the end of this decade!
• Climate data: 36 TB in 2004 -> 2 PB in 2012, and still increasing!
• LHC will produce roughly 15 PB of data per year!
• SKA project will generate 1 EB of data per day in 2020!
Credit: R. Pennington/A. Blatecky
Clearly, modern scientific network/
instruments/experiments/… are
producing Big Data!!
But what about HPC?
Scientific Discovery through Simulations (I)
• ACI providing increasing computational scales, capabilities and capacities
• Scientific simulations running on high-end computing systems generate huge amounts of data!
– If a single core produces 2 MB/minute on average, one of these machines could generate ~170 TB of simulation data per hour -> ~4 PB per day -> ~1.4 EB per year
• Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data
How do we enable computational scientists to
efficiently manage and explore extreme-scale data:
“find the needles in the haystack”?
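The per-core estimate on this slide can be checked with a few lines of arithmetic. This is only a sketch: the core count below is an assumption (it is not stated on the slide), chosen so that the hourly volume lands near the quoted ~170 TB, and decimal units are assumed throughout.

```python
# Back-of-the-envelope check of the simulation data volume estimate.
MB_PER_CORE_PER_MINUTE = 2   # from the slide
CORES = 1_400_000            # assumption: not stated on the slide

mb_per_hour = MB_PER_CORE_PER_MINUTE * 60 * CORES
tb_per_hour = mb_per_hour / 1e6        # 1 TB = 10^6 MB (decimal units)
pb_per_day = tb_per_hour * 24 / 1e3    # 1 PB = 10^3 TB
eb_per_year = pb_per_day * 365 / 1e3   # 1 EB = 10^3 PB

print(f"{tb_per_hour:.0f} TB/hour, {pb_per_day:.1f} PB/day, {eb_per_year:.2f} EB/year")
```

Changing `CORES` shows how directly the data volume tracks machine size.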
Scientific Discovery through Simulations (II)
•
Complex, heterogeneous
components
•
Large data volumes and data rates
•
Complex couplings, data
re-distribution, data transformations
•
Dynamic data exchange patterns
•
Strict performance, energy
constraints
•
Complex workflows integrating
coupled models, data management/
processing, analytics
•
Tight / loose coupling, data
driven, ensembles
•
Advanced numerical methods (e.g.,
Adaptive Mesh Refinement)
•
Integrated (online) uncertainty
quantification, analytics
Traditional Simulation -> Insight Pipelines Break Down
•
Traditional
simulation -> insight
pipeline:
– Run large-scale simulation workflows on large supercomputers
– Dump data on parallel disk systems
– Export data to archives
– Move data to users’ sites (usually selected subsets)
– Perform data manipulations and analysis on mid-size clusters
– Collect experimental / observational data
– Move to analysis sites
– Compare experimental/observational data against simulation data to validate it
Challenges Faced by Traditional HPC Data Pipelines
•
Scalable data analytics challenge
• Can current data mining, manipulation and visualization algorithms still work effectively on extreme-scale machines?
The
costs of data movement
are increasing and dominating!
Figure. Traditional data analysis pipeline
•
I/O and storage challenge
• Increasing performance gap: disks are outpaced by computing speed
•
Data movement challenge
• Lots of data movement between simulation and analysis, and between coupled multi-physics simulation components => longer latencies
• Improving data locality is critical: do work where the data resides!
•
Energy challenge
• Future extreme systems are designed to have low-power chips – however, much greater power consumption will be due to memory and data movement!
•
The energy cost of moving data is a
significant concern
From K. Yelick, “Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer”
The Cost of Data Movement
$$E_{\text{move data}} = \frac{\text{bitrate} \times \text{length}^2}{\text{cross-section area of wire}}$$
performance gap
•
Moving data between node
memory and persistent
storage is slow!
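To make the scaling in the wire-energy relation on this slide concrete (energy proportional to bitrate times length squared, divided by the wire's cross-section area), here is a minimal sketch. The function name and the unit-less inputs are illustrative assumptions; only ratios between results are meaningful.

```python
def relative_wire_energy(bitrate, length, cross_section_area):
    """Energy to move data over a wire, up to a constant factor.

    Hypothetical helper: inputs are in arbitrary but consistent units.
    """
    return bitrate * length ** 2 / cross_section_area

# Same bitrate and wire cross-section, but a path 10x longer.
short_path = relative_wire_energy(bitrate=1.0, length=1.0, cross_section_area=1.0)
long_path = relative_wire_energy(bitrate=1.0, length=10.0, cross_section_area=1.0)
ratio = long_path / short_path
print(ratio)  # 100.0: energy grows with the square of the distance
```

The quadratic dependence on distance is what makes keeping work close to the data pay off.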
Challenges Faced by Traditional HPC Data Pipelines
Traditional data analysis pipeline
We need to Rethink the Data Management Pipeline!
–
Reduce data movement
–
Move computation/analytics closer to the data
–
Add value to simulation data along the IO path
The
costs of data movement
(power and performance) are
increasing and dominating!
Rethinking the Data Management Pipeline – Hybrid Staging
+ In-Situ & In-Transit Execution
Issues/Challenges
•
Programming abstractions/systems
•
Mapping and scheduling
•
Control and data flow
Design space of possible workflow architectures
• Location of the compute resources
– Same cores as the simulation (in situ)
– Some (dedicated) cores on the same nodes
– Some dedicated nodes on the same machine
– Dedicated nodes on an external resource
• Data access, placement, and persistence
– Direct access to simulation data structures
– Shared memory access via hand-off / copy
– Shared memory access via non-volatile near-node storage (NVRAM)
– Data transfer to dedicated nodes or external resources
• Synchronization and scheduling
– Execute synchronously with simulation every nth simulation time step
– Execute asynchronously
[Figure: three staging options: (1) sharing cores with the simulation, (2) using distinct cores on the same node, (3) processing data on remote staging nodes. Each node has CPUs and a DRAM / NVRAM / SSD / hard disk hierarchy; simulation, analysis and visualization tasks communicate over the network.]
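The two scheduling options in the design space (synchronous analysis every nth time step versus asynchronous hand-off) can be contrasted with a small sketch. Everything here is illustrative: `simulate_step` and `analyze` are hypothetical stand-ins, and a Python thread plays the role of a dedicated analysis core.

```python
import queue
import threading

def simulate_step(step):
    """Stand-in for one simulation time step; returns a small 'field'."""
    return [step * 0.5 + i for i in range(4)]

def analyze(step, field):
    """Stand-in analysis kernel: reduce the field to its minimum."""
    return step, min(field)

def run_synchronous(num_steps, every_n):
    """Synchronous: the simulation blocks while the analysis runs."""
    results = []
    for step in range(1, num_steps + 1):
        field = simulate_step(step)
        if step % every_n == 0:
            results.append(analyze(step, field))
    return results

def run_asynchronous(num_steps, every_n):
    """Asynchronous: hand off a copy to a worker and keep simulating."""
    handoff = queue.Queue()
    results = []

    def worker():
        while True:
            item = handoff.get()
            if item is None:  # sentinel: no more data will arrive
                break
            results.append(analyze(*item))

    thread = threading.Thread(target=worker)
    thread.start()
    for step in range(1, num_steps + 1):
        field = simulate_step(step)
        if step % every_n == 0:
            # Copy so the simulation is free to reuse its own buffer.
            handoff.put((step, list(field)))
    handoff.put(None)
    thread.join()
    return results

print(run_synchronous(30, 10))
print(run_asynchronous(30, 10))
```

Both variants produce the same analysis results; they differ in whether the simulation waits for them, and in the extra copy the asynchronous path pays for.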
DataSpaces
In-situ/In-transit Data Management & Analytics
• Virtual shared-space programming abstraction
• Simple API for coordination, interaction and messaging
• Distributed, associative, in-memory object store
• Online data indexing, flexible querying
• Adaptive cross-layer runtime management
• Hybrid in-situ/in-transit execution
• Efficient, high-throughput/low-latency asynchronous data transport
ADIOS/DataSpaces
dataspaces.org
DataSpaces: A Scalable Shared Space Abstraction for Hybrid
Data Staging [HPDC10, JCC12]
• Virtual shared-space abstraction
• Simple API for coordination, interaction and messaging
• Provides a global-view programming abstraction consistent with PGAS
• Distributed, associative, in-deep-memory object store
• Online data indexing, flexible querying
• Adaptive cross-layer runtime management
• Hybrid in-situ/in-transit execution
• Data-centric mappings
• High-throughput/low-latency memory-to-memory asynchronous data transport
• Dynamic coordination and interaction patterns between the coupled applications
– Transparent data redistribution
– Complex geometry-based queries
– In-space (online) data transformation and manipulations
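The shared-space idea (writers insert data described by its geometry, readers retrieve any region without knowing who produced it) can be sketched as a toy in-memory store. This is not the DataSpaces API: the `SharedSpace` class, its `put`/`get` signatures and the 1D domain are assumptions made purely for illustration.

```python
class SharedSpace:
    """Toy in-memory associative store keyed by (name, version)."""

    def __init__(self):
        # (name, version) -> list of (lo, hi, values) blocks
        self._objects = {}

    def put(self, name, version, lo, values):
        """Insert a 1D block covering global indices [lo, lo + len(values))."""
        block = (lo, lo + len(values), list(values))
        self._objects.setdefault((name, version), []).append(block)

    def get(self, name, version, lo, hi):
        """Assemble global indices [lo, hi) from whichever blocks overlap."""
        out = [None] * (hi - lo)
        for b_lo, b_hi, values in self._objects.get((name, version), []):
            for i in range(max(lo, b_lo), min(hi, b_hi)):
                out[i - lo] = values[i - b_lo]
        if None in out:
            raise KeyError(f"{name} v{version}: [{lo}, {hi}) not fully available")
        return out

space = SharedSpace()
domain = list(range(12))

# Application 1: M = 4 writers, each owning 3 cells of the global domain.
for rank in range(4):
    space.put("temperature", 1, rank * 3, domain[rank * 3:(rank + 1) * 3])

# Application 2: N = 3 readers, each asking for 4 cells.
reads = [space.get("temperature", 1, r * 4, (r + 1) * 4) for r in range(3)]
print(reads)
```

The readers' regions straddle the writers' blocks, so the M-to-N redistribution happens inside `get`, transparently to both applications.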
DataSpaces Abstraction: Indexing + DHT
•
Dynamically constructed
structured overlay using hybrid
staging cores
•
Index constructed online using
SFC mappings and applications
attributes
• E.g., application domain, data, field values of interest, etc.
•
DHT used to maintain
meta-data information
• E.g., geometric descriptors for the shared data, FastBit indices, etc.
•
Data objects load-balanced
separately across staging cores
•
The SFC maps the global domain to a
set of intervals
• Intervals are non-contiguous and can lead to load imbalance
• Second mapping compacts the intervals into a contiguous virtual interval
• Split the contiguous interval equally to the DataSpaces nodes
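The two-step mapping described above (space-filling curve to intervals, then compaction into one contiguous virtual interval that is split equally) can be sketched as follows. The slide does not say which curve is used; a Z-order (Morton) curve is swapped in here as a simple stand-in, and all function names are hypothetical.

```python
def morton2d(x, y, bits=4):
    """Interleave the bits of x and y into a Z-order (Morton) index."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code

def box_to_intervals(x0, x1, y0, y1, bits=4):
    """Map an inclusive 2D box to sorted, merged intervals on the curve."""
    codes = sorted(morton2d(x, y, bits)
                   for x in range(x0, x1 + 1)
                   for y in range(y0, y1 + 1))
    intervals = []
    for c in codes:
        if intervals and c == intervals[-1][1] + 1:
            intervals[-1][1] = c        # extend the current interval
        else:
            intervals.append([c, c])    # start a new interval
    return [tuple(iv) for iv in intervals]

def compact(intervals):
    """Second mapping: lay the intervals end to end on a virtual interval."""
    table, offset = [], 0
    for start, end in intervals:
        table.append((start, end, offset))
        offset += end - start + 1
    return table, offset

def owner(code, table, total, num_nodes):
    """Staging node owning a curve index after an equal split."""
    for start, end, virtual_start in table:
        if start <= code <= end:
            virtual = virtual_start + (code - start)
            return virtual * num_nodes // total
    raise KeyError(code)

intervals = box_to_intervals(1, 2, 0, 1)
table, total = compact(intervals)
print(intervals)                                  # non-contiguous on the curve
print([owner(c, table, total, 2) for c in (1, 3, 4, 6)])
```

Splitting the raw curve range would leave nodes that own only gaps with nothing to do; splitting the compacted interval gives each node the same number of occupied cells.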
DataSpaces: Scalability on ORNL Titan
[Figure: two plots versus writers/readers (512/32, 1024/64, 2048/128, 4096/256, 8192/512, 16K/1K, 32K/2K, 64K/4K) and total data size (2 GB to 256 GB): data redistribution time in seconds (axis 0 to 2.8) and aggregate data redistribution throughput in GB/s (axis 0 to 120)]
• Weak scaling with increasing number of processors
• Applications redistribute data using DataSpaces
• Application 1 runs on M processors and inserts data into the space
• Application 2 runs on N processors and retrieves data from the space
• Result: with a 128-fold increase in application size (from 512 to 64K writers), the total data exchanged per step grows from 2 GB to 256 GB
Multiphysics Code Coupling at Extreme Scales [CCGrid10]
PGAS Extensions for Code Coupling [CCPE13]
DataSpaces: Enabling Coupled Scientific
Workflows at Extreme Scales
Data-centric Mappings for In-Situ Workflows [IPDPS12] Dynamic Code Deployment In-Staging [IPDPS11]
[Figure: dynamic code deployment in-staging. data_kernels.c is compiled with gcc -c into data_kernels.o and linked with Applications.o into the application executable; aprun launches the compute nodes and the staging nodes; kernels are shipped to the staging nodes with rexec() and results come back with return().]

data_kernels.c (pseudocode):
    kernel_min {
      for i = 1, n
        for j = 1, m
          for k = 1, p
            if (min > A(i, j, k))
              min = A(i, j, k)
    }

data_kernels.lua:
    for i = 0, ni-1 do
      for j = 0, nj-1 do
        for k = 0, nk-1 do
          val = input:get_val(i,j,k)
          if min > val then min = val end
        end
      end
    end
Runtime execution system (Rexec)
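The idea behind shipping a kernel such as `kernel_min` to the staging nodes, rather than pulling the data back, can be sketched in a few lines. This is an illustration only: the `StagingNode` class and its `rexec` method are hypothetical stand-ins, not the actual runtime interface.

```python
class StagingNode:
    """Toy staging node that holds one 3D block of data in memory."""

    def __init__(self, block):
        self._block = block  # nested lists indexed as block[i][j][k]

    def rexec(self, kernel):
        """Run a kernel next to the data and return only its result."""
        return kernel(self._block)

def kernel_min(block):
    """Minimum over a 3D block, mirroring the kernel_min pseudocode."""
    return min(value for plane in block for row in plane for value in row)

nodes = [
    StagingNode([[[5.0, 3.0], [8.0, 9.0]], [[7.0, 6.0], [4.0, 2.5]]]),
    StagingNode([[[1.5, 9.0], [6.0, 7.0]], [[3.0, 8.0], [2.0, 4.0]]]),
]

# Each node reduces its own block; only one number per node moves.
partial_minima = [node.rexec(kernel_min) for node in nodes]
global_min = min(partial_minima)
print(partial_minima, global_min)
```

Only the per-node minima cross the "network" here, which is the point of moving the code to the data.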
Combustion: S3D uses a chemical model to simulate the combustion process in order to understand the ignition characteristics of bio-fuels for automotive use [SC 2012]
Fusion: The EPSi simulation workflow provides insight into edge plasma physics in magnetic fusion devices by simulating the movement of millions of particles [CCGrid 2010]
Subsurface modeling: IPARS uses multi-physics, multi-model formulations and coupled multi-block grids to support realistic, high-resolution oil reservoir simulations with a million or more grids [CCGrid 2011]
Computational mechanics: FEM coupled with AMR is often used to solve heat transfer or fluid dynamics problems. AMR refines only the important regions of the mesh [SC 2014 (in progress)]
Material Science: QMCPack applies quantum Monte Carlo (QMC) methods to studies of heterogeneous catalysis of transition metal nanoparticles, phase transitions, properties of materials under pressure, and strongly correlated materials
Material Science: The SNS workflow integrates data, analysis, and modeling tools for materials science experiments to accelerate scientific discovery, which requires the ability to query beamline data
Workflows beyond simulations
•
Content-based image retrieval (CBIR) on digitized
peripheral blood smear specimens using low-level
morphological features (MICAII’2012, CAC’2013,
MICAII’2013)
•
Asynchronous Replica Exchange - Monitor the
progress of replicas, using the secondary structure
prediction methods (J. Comput. Chem.’2008)
•
Control fluid flow in micro-devices by placing a
sequence of pillar obstacles (MTAGS’2013,
CISE’2013)
•
Monte Carlo numerical strategies to unravel how
nucleosomes shape DNA into chromatin (J.
Biophysical’2014)
In-Situ Feature Extraction and Tracking using
Decentralized Online Clustering (DISC’12, ICAC’10)
DOC workers executed in-situ on simulation machines
[Figure: simulation compute nodes; within one compute node, processor cores run either the simulation or a DOC worker, and the DOC workers form the DOC overlay]
Benefits of runtime feature extraction and tracking:
(1) Scientists can follow the events of interest (or data of interest)
(2) Scientists can do real-time monitoring of the running simulations
In-situ viz. and monitoring with staging
[Figure: Pixie3D (1024 cores), Pixplot (8 cores), Pixmon (1 core, login node) and a ParaView server (4 cores), connected through DataSpaces, with pixie3d.bp and record.bp as the exchanged data sets]
AMR-as-a-Service using DataSpaces
[Figure: FEM (TalyFEM) and AMR components, with Grid / GridField data objects]
• Finite Element Model (FEM):
• Models using uniform 3D meshes for near-realistic engineering problems such as heat transfer, fluid flow and phase transformation
• AMR Service:
• Refines localized areas of interest, e.g., with high local error
• DataSpaces-as-a-Service:
• Persistent staging servers run as a service on HEC platform
• Applications can connect, exchange data objects, and disconnect independently
[Figure labels: pGrid, pGridField, Grid, GridField]
Overview of Recent DataSpaces Research
• Programming abstractions / system
– DataSpaces: Interaction, coordination and messaging abstractions for coupled scientific workflows [HPDC10, CCGrid10, HiPC12, JCC12]
– XpressSpaces: PGAS extensions for coupling using DataSpaces [CCGrid11, CCPE13]
– ActiveSpaces: Dynamic code deployment for in-staging data processing [IPDPS11]
• Runtime mechanisms
– Data-centric Task Mapping: Reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]
– In-situ & In-transit Data Analytics: Simulation-time analysis of large volume data by combining in-situ and in-transit execution [SC12, DISC12]
– Cross-layer Adaptation: Adaptive cross-layer approach for dynamic data management in large-scale simulation-analysis workflows [SC13]
– Value-based Data Indexing and Querying: Use FastBit to build in-situ, in-memory value-based indexing and query support in the staging area [In review]
– Power/performance Tradeoffs: Characterizing power/performance tradeoffs for data-intensive simulation workflows [SC13]
– Data Staging over Deep Memory Hierarchy: Build distributed associative object store over hierarchical memory storage, e.g., DRAM/NVRAM/SSD [HiPC13]
• High-throughput/low-latency asynchronous data transport
– DART: Network independent transport library for high speed asynchronous data extraction and transfer [HPDC08]
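To give a feel for what value-based indexing buys, here is a minimal sketch of a binned bitmap index answering a "value >= threshold" query. It is a simplified illustration of the general technique, not FastBit's implementation or API; the function names and the use of Python integers as bit sets are assumptions.

```python
from bisect import bisect_right

def build_bitmap_index(values, bin_edges):
    """One bitmap (a Python int used as a bit set) per value bin."""
    bitmaps = [0] * (len(bin_edges) + 1)
    for pos, value in enumerate(values):
        bitmaps[bisect_right(bin_edges, value)] |= 1 << pos
    return bitmaps

def query_ge(values, bitmaps, bin_edges, threshold):
    """Positions whose value is >= threshold.

    Bins entirely above the threshold are answered from the bitmaps
    alone; only the one boundary bin is checked against raw values.
    """
    boundary = bisect_right(bin_edges, threshold)
    sure_hits = 0
    for bitmap in bitmaps[boundary + 1:]:
        sure_hits |= bitmap
    candidates = bitmaps[boundary]
    return [pos for pos in range(len(values))
            if (sure_hits >> pos) & 1
            or ((candidates >> pos) & 1 and values[pos] >= threshold)]

values = [0.1, 2.5, 1.2, 3.7, 2.0]
edges = [1.0, 2.0, 3.0]
index = build_bitmap_index(values, edges)
print(query_ge(values, index, edges, 2.2))  # [1, 3]
```

Most of the query is resolved with bitwise operations on the index, so the raw data is touched only for the boundary bin.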
Integrating In-Situ and In-Transit Analytics (SC’12)
•
S3D: First-principles direct
numerical simulation
•
Simulation resolves features on
the order of 10 simulation time
steps
•
Currently, only on the order of every
400th time step can be written to
disk
•
Temporal fidelity is compromised
when analysis is done as a
post-process
Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories
In-situ Topological Analysis as Part of S3D*
Topology, Statistics, Volume Rendering
Identify features of interest
*J. C. Bennett et al., “Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific
Integrating In-Situ and In-Transit Analytics
(SC’12)
•
Primary resources execute
the main simulation and in
situ computations
•
Secondary resources
provide a staging area
whose cores act as
containers for in transit
computations
•
4896 cores total (4480 simulation/in situ; 256 in
transit; 160 task scheduling/data movement)
•
Simulation size: 1600x1372x430
•
All measurements are per simulation time step
Simulation case study with S3D: Timing results
for 4896 cores and analysis every 10th simulation time step
[Figure: bar chart of time in seconds (axis 0 to 160) for simulation, hybrid topology, hybrid visualization, in-situ visualization, hybrid statistics and in-situ statistics, split into S3D, in-situ, data movement and in-transit time. Labeled values: 168.5, 119.81; 1.0%, 1.0%, .43%, .004%]
Simulation case study with S3D: Timing results
for 4896 cores and analysis every 100th simulation time step
[Figure: bar chart of time in seconds (axis 0 to 1600) for simulation, hybrid topology, hybrid visualization, in-situ visualization, hybrid statistics and in-situ statistics, split into S3D, in-situ, data movement and in-transit time. Labeled values: 1685; .1%, .1%, .04%, .004%, .16%]
Challenges and Opportunities Ahead: The Vision of
the Exascale Computing Initiative
•
Achieve order 10^18 operations per second and order
10^18 bytes of storage
•
Address the next generation of scientific, engineering,
and large-data problems
•
Enable extreme scale computing: 1,000X capabilities
of today's computers with a similar size and power
footprint
– 20 pJ per average operation => ~40x improvement over today's efficiency
– Billion-way concurrency
•
“Productive” system based on marketable
technologies
– Ecosystem for application development and execution, system management
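The 20 pJ target ties directly to the power footprint, as a quick calculation shows. This is a sketch using only the figures on this slide; the "implied" present-day energy per operation is simply the 20 pJ target multiplied by the stated ~40x, not a measured value.

```python
OPS_PER_SECOND = 1e18        # order 10^18 operations per second
TARGET_PJ_PER_OP = 20        # target energy per average operation
IMPROVEMENT_FACTOR = 40      # stated improvement over today's efficiency

# Power = operations per second x energy per operation.
target_power_mw = OPS_PER_SECOND * TARGET_PJ_PER_OP * 1e-12 / 1e6

# Energy per operation implied for "today" by the 40x factor.
implied_today_pj = TARGET_PJ_PER_OP * IMPROVEMENT_FACTOR
power_at_today_efficiency_mw = OPS_PER_SECOND * implied_today_pj * 1e-12 / 1e6

print(f"target: {target_power_mw:.0f} MW")
print(f"at the implied current efficiency: {power_at_today_efficiency_mw:.0f} MW")
```

At the target efficiency an exascale machine draws about 20 MW; without the improvement the same machine would need roughly 800 MW.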
Ensemble of Models with UQ
[Figure: components include the first-principles fusion code XGC (“core-edge” turbulence with UQ), an ensemble of models, synthetic diagnostics, KSTAR tokamak diagnostics, an experiment mock-up, analysis, comparative analytics, visualization, visual analytics and a knowledge database]
• Automate the process of these workflows and move work/data to satisfy metrics specified by users, and track provenance
• Use first-principles calculations on HPC to generate more accurate models
• Generate a DB of knowledge by codes, to use for predictive capability during/after each experiment
Challenges and Opportunities Ahead: Pervasive
Computational Ecosystems
Summary & Conclusions
•
Complex applications running on high-end systems
generate extreme amounts of data that must be managed
and analyzed to get insights
–
Data costs (performance, latency, energy) are quickly dominating
–
Traditional data management/analytics pipelines are breaking
down
•
Hybrid data staging, in-situ workflow execution, etc.
can address these challenges
–
Allow users to efficiently intertwine applications, libraries, and middleware for
complex analytics
•
Many challenges: programming, mapping and scheduling,
control and data flow, autonomic runtime management, …
–
The DataSpaces project explores solutions at various levels
•
However, many challenges and opportunities lie ahead!
Thank You!
Manish Parashar, Ph.D.
Rutgers Discovery Informatics Institute (RDI2)
Cloud & Autonomic Computing Center (CAC)
Rutgers, The State University of New Jersey
Email: [email protected]
WWW: rdi2.rutgers.edu
WWW: dataspaces.org