Addressing Crosscutting Compute/Data
Challenges
Manish Parashar
[email protected]
© Manish Parashar, Rutgers University
Rutgers Discovery
Informatics Institute (RDI
2
)
New Jersey
’
s Center for Advanced
Computation
2
CDSE - Many crosscutting challenges
•
Computing
– Multicore; large and increasing core counts, deep memory
hierarchies (TH-2: 54.9 PF, 3.12 M Cores, 1.4 PB, 25 MW)
– New prgm. model, concerns (fault tolerance, energy, etc)
– New models & technologies: Clouds, grids, hybrid
manycore, accelerators, deep storage hierarchies, …
•
Data
– Generating more data than in all of human history: preserve,
mine, share?
•
Software
– Complex applications on coupled compute-data-networked
environments, tools needed
– Modern apps: 106+ lines, many groups contribute, take
decades to develop, very long lifetimes •
People
– Multidisciplinary expertise essential!
Complex Coupled Application Workflows
•
Tight coupling
– Coupled simulations/models run concurrently and exchange data frequently – E.g., Coupled fusion simulations, multi-block subsurface modeling
simulations
•
Loose coupling
– Coupling and data exchange are less frequent, and often asynchronous and possibly opportunistic
– E.g., Combustion simulations – DNS + LES coupling, material science simulations
•
Data-flow (data-driven) coupling
– Workflows expressed as dataflow based DAG where data produced by a simulation flows through one or more data processing pipelines – E.g., End-to-end simulations workflows with data analytics
•
Ensemble workflows
• Multiple fan-out stages composed of large numbers of loosely coupled or independent simulations
• E.g., Uncertainty quantification, history matching, data assimilation
Scientific Simulations: A Big Data Challenge
•
Scientific simulations running on high-end computing systems generate
huge amounts of data!
– If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~170TB per hour -> ~700PB per day ->
~1.4EBper year
•
Successful scientific discovery depends on a comprehensive understanding
of this enormous simulation data
How we enable the computation scientists to
e
ffi
ciently manage and explore extreme scale data:
“
fi
nd the needles in haystack” ??
Challenges Faced by Traditional HPC Data Pipelines
The
costs of data movement
are increasing and dominating!
Figure. Traditional data analysis pipeline•
Data analysis challenge
• Can current data mining, manipulation and visualization algorithms still work effectively on extreme scale machine?
•
I/O challenge
• Increasing performance gap: disks are outpaced by computing speed
•
Data movement challenge
• Lots of data movement between simulation and analysis machines, between coupled mutli-physics simulation components -> longer latencies
• Improving data locality is critical: do work where the data resides!
•
Energy challenge
• Future extreme systems are designed to have low-power chips – however, much greater power consumption will be due to memory and data movement!
Coupling Cycles of Innovation
Interoperability/Standards?
7
Organization Organization Organization
Grid Middleware Infrastructure
Virtual Organization
Applications
Programming System
(Programming Model + Abstract Machine)
Virtualization
(Create and manage VOs and VMs)
Abstraction
(Support abstract behaviors/assumptions)
Con ce ptual Im pl ementati o n
Virtual Machine Virtual Machine Virtual Machine
Organization Organization Organization
Organization
Organization OrganizationOrganization OrganizationOrganization
Grid Middleware Infrastructure
Virtual Organization
Applications
Programming System
(Programming Model + Abstract Machine)
Virtualization
(Create and manage VOs and VMs)
Abstraction
(Support abstract behaviors/assumptions)
Virtualization
(Create and manage VOs and VMs)
Abstraction
(Support abstract behaviors/assumptions)
Con ce ptual Im pl ementati o n
Virtual Machine Virtual Machine Virtual Machine
Addressing Complexity
– Programming System
• programming model, languages/
abstraction – syntax + semantics • abstract machine – execution
context and assumptions • infrastructure, middleware &
runtime – Hide v/s expose?
• virtual homogeneity – ease of programming
• robustness ?
• cross-layer adaptation & optimization
– the inverted stack …
–
Abstractions?
“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 19, No. 3, March 2005.!
Software Engineering and CDS&E
• Many sources of complexity– complexity from the problem (complex physics) – complexity of the end-to-end application workflow
– complexity from coordination & coupling across codes and research teams – complexity from the codes and how they are developed and implemented – complexity from the underlying disruptive infrastructure
• Challenge: Manage complexity while maintaining performance/scalability
• Coupling cycles of innovation
• Software Engineering in the Large v/s Software Engineering in the Small ?
– Software management of large systems v/s smaller but highly complex systems – Well defined v/s rapidly evolving requirements/execution environments – Primary complexity from size v/s application requirements
– Relative importance of performance – …
• Design Principles
– Separation of Concerns – Abstractions
• Interoperability and standardization
Complexity suggests a SOA approach
•
Service Oriented Architecture (SOA): Software as a composition of
“services”
– Used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams
– Service: “… a well-defined, self-contained, and independently developed software element that does not depend on the context or state of other services.”
– Abstraction & separation
• Computations from compositions and coordination • Interface from implementations
– Existing and proven concept - widely accepted/used by the enterprise computing community
• For example, extreme-scale data analysis services routinely run at Yahoo & Google, for near real-time web data analysis, for rapid deployment of new web services, etc.
Examples of some relevant services
• I/O services: Efficient and adaptable I/O support– I/O services should be componentized so that it allows codes to switch between and tune different methods easily.
– Example: a user may easily switch between file-based and
memory-based I/O or switch formats without changing any code, all the while maintaining the same data model when switching I/O components.
• Code-coupling services: Code-coupling services includes codes that can run on the same or different machines, codes that are tightly coupled (memory-to-memory), as well as codes that are loosely coupled (exchange of data through files).
• Automatic online (in-transit) data processing services: Efficient data movement services for distributing and archiving data
– Example: generating summary statistics, graphs, images, movies, etc.
• Monitoring Services: dynamic real-time monitoring of large-scale simulations (during execution)
• Automated provenance collections services: Data, system workflow, application, etc.
ADIOS: Adaptable I/O System
•
Provides portable, fast, scalable,
easy-‐to-‐use, metadata rich output
•
Simple API
•
Layered so;ware architecture:
– Allows plug-‐ins for different I/O
implementaCons
– Abstracts the API from the method used
for I/O
•
Beyond I/O – coupling,
coordinaCon, in-‐situ/in-‐transit,
etc.
•
Open source:
– hGp://www.nccs.gov/user-‐support/
center-‐projects/adios/
•
Research methods from many
groups:
– Georgia Tech, Rutgers, NCSU, Emory, Auburn, Sandia, LBL, PPPL Interface)to)apps)for)descrip/on)of)data)(ADIOS,)etc.)) Buffering) Feedback) )Schedule) Mul/Bresolu/on)
methods) Data)Compression)methods) (FastBit))methods)Data)Indexing) Data)Management)Services)
Workflow))Engine) Run/me)engine) Data)movement) Provenance) Plugins)to)the)hybrid)staging)area) Visualiza/on)Plugins) Analysis)Plugins) Parallel)and)Distributed)File)System) IDX) HDF5)
AdiosBbp) pnetcdf) “raw”)data) Image)data) Viz.)Client) 0 5 10 15 20 25 30 35 S3D SEC Fine/Turbo G B/ s
Simulations with and without ADIOS I/O performance of the Combustion S3D code (96K
cores), the SCEC PCML3D (30K cores), and the Fine/Turbo (4K cores) codes.
MPI-IO OR PHDF5 ADIOS
Application Code Contact
Astrophysics+ Chimera+ T.+Mezzacappa+
Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+
CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ M.+Wheeler+ Materials+ M.+Parashar+ Fusion+ GTCCP+ S.+Ethier+
Application Code Contact
Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+
Image+Analysis+ T.+Kurc+
AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+
Climate+ J.+Fu+
Climate+ CAM+ K.+Evans+ Application Code Contact
Astrophysics+ Chimera+ T.+Mezzacappa+
Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+
CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ M.+Wheeler+ Materials+ M.+Parashar+ Fusion+ GTCCP+ S.+Ethier+
Application Code Contact
Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+
Image+Analysis+ T.+Kurc+
AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+
Climate+ J.+Fu+
Climate+ CAM+ K.+Evans+
ADIOS Application
Partners
Over 158 Publications from 2006 - 2013
The ADIOS Approach
•
The overarching design philosophy of our framework is
based on the
Service-Oriented Architecture
–
Used to deal with system/application complexity, rapidly
changing requirements, evolving target platforms, and
diverse teams
•
Applications constructed by assembling services based
on a universal view of their functionality using a
well-defined API
•
Service implementations can be changed easily
•
Integrated simulation can be assembled using these
ADIOS SOA: I/O Componentization
• End users should be able to select the most efficient I/O method for their code, with minimal effort in terms of code alternations/updates.
• Performance-driven choices should not prevent data from being stored in the desired file format, since this is crucial for later data analysis.
• Have efficient ways of identifying and selecting certain data for analysis, to help end users cope with the flood of data being produced.
• Make it easy to introduce new research I/O methods, without changing your code. – allow the “best” CDS&E researchers to contribute in application science projects. • Make it easy to allow I/O to do more than just I/O à code coupling, in situ
visualization.
•
Exploit multi-levels in-memory
Hybrid Data Staging
to:
– Decrease the gap between CPU and IO speeds
– Dynamically deploy and execute data analytical or pre-processing operations either in-situ or in-transit
– Improve IO write performance
Rethinking the Data Management Pipeline – Hybrid Staging
+ In-Situ & In-Transit Execution
In-situ/In-transit Data Management & Analytics
l
Virtual shared-space programming abstraction
l
Simple API for coordination, interaction and messaging
l
Distributed, associative, in-memory object store
l
Online data indexing, flexible querying
l
Adaptive cross-layer runtime management
l
Hybrid in-situ/in-transit execution
l
Efficient, high-throughput/low-latency asynchronous data transport
Overview of Research Work @ RU
•
Programming abstractions / system
– DataSpaces/XpressSpaces: Interaction, coordination and messaging abstraction for coupled scientific workflow [HPDC10, CCGrid10,11, HiPC12, CCPE13]
– ActiveSpaces: Dynamic code deployment for in-staging data processing [IPDPS11]
•
Runtime mechanisms
– Data-centric Task Mapping: Reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]
– In-situ & In-transit Data Analytics: Simulation-time analysis of large volume data by combining in-situ and in-transit execution [SC’12]
– Cross-layer Adaptation: adaptive cross-layer approach for dynamic data management in large scale simulation-analysis workflow
•
High-throughput/low-latency asynchronous data transport
– DART: Network independent transport library for high speed asynchronous data extraction and transfer [HPDC08]
DataSpaces: A Shared Space Abstraction for the
Hybrid Staging Area [HPDC’10]
•
Semantically-specialized
virtual
shared space abstraction in the
staging area
– Shared (distributed) data objects – Simple put/get/query API – Supports coupled app. workflows – Provide a global-view programming
abstraction consistent with the PGAS model (UPC, GA)
•
DataSpaces (HPDC10)
– Constructed on-the-fly on hybrid staging nodes
• Indexes data for quick access and retrieval
• Provides asynchronous coordination and interaction support
– Complements existing interaction/ coordination mechanisms
•
Code coupling using DataSpaces
– Transparent data redistribution – Complex geometry-based queries – In-space (online) datatransformation and manipulations
Figure. Graphical view of the shared space model (derived from Tuple space) for coordination/ interaction
The Replica Exchange Algorithm
•
A powerful sampling algorithm that preserves canonical distributions
– Allows for efficient crossing of high energy barriers that separate thermodynamically stable states
– Can significantly reduce sampling times as compared to formulations based on constant temperatures
•
Algorithm overview
– Several copies, or replicas, of the system are simulated in parallel at different temperatures using “walkers”
– Walkers occasionally swap temperatures to allow them to bypass enthalpic barriers by moving to a high temperature
• Exchanges governed by a probability condition to ensure detailed balance
•
Replica exchange can significantly impact the fields of structural
biology and drug design
– structure based drug design associated with protein misfolding, for example, structure based drug design and binding affinity optimization – molecular basis of human diseases associated with protein misfolding
Parallel Replica Exchange
•
General formulation requires dynamic and complex
coordination and communication patterns between
the walkers
–
Pair-wise and asynchronous
–
Dependent on the current state (temperature, energy) of
replicas, which are changing dynamically
•
Implementation based on commonly used parallel
programming frameworks is challenging
–
Message passing frameworks, e.g. MPI, require matching
sends and receives to be explicitly defined for each
interaction
–
Implementations use restrictive formulations using
synchronous exchanges
• Exchanges between neighboring temperatures only
Asynchronous Replica Exchange using
DataSpaces
•
DataSpaces provides a semantically specialized virtual
shared space abstraction to support scalable
asynchronous replica exchange for molecular dynamics
applications
–
Enables general non-nearest neighboring temperature exchanges
–
Exchanges are decoupled and asynchronously and dynamically
determined
–
Communications are decentralized and peer-to-peer, and occur in
parallel
•
Operation
–
Walkers post temperature range of interest to the
shared space using post
–
Walkers discover partners and negotiate exchange
Example: Salsa Operation
Summary
• Unprecedented opportunities for dramatic insights through
computation!
• Challenge: Manage complexity while maintaining performance/scalability.
– complexity from the problem (complex physics)
– complexity from the codes and how they are developed and implemented – complexity of underlying infrastructure (disruptive hardware trends) – complexity from coordination across codes and research teams
• Overarching philosophy
– Abstraction & Separation through Service Orientation • Independence in development, deployment, execution
• Computations from compositions and coordination; Interface from implementations – Existing and proven concepts - widely accepted/used by the enterprise
computing community
• ADIOS
– Reducing barriers from a scientists perspective • Ease-of-use, simple code integration and maintainability – Minimizing performance impact
Thank You!
Manish Parashar, Ph.D.
Prof., Dept. of Electrical & Computer Engr. Rutgers Discovery Informatics Institute (RDI2) Cloud & Autonomic Computing Center (CAC) Rutgers, The State University of New Jersey Email: [email protected]
WWW: rdi2.rutgers.edu
!