Addressing Crosscutting Compute/Data Challenges

(1)

Addressing Crosscutting Compute/Data

Challenges

Manish Parashar

[email protected]

Rutgers Discovery

Informatics Institute (RDI

2 ₎

New Jersey

’

_{s Center for Advanced}

Computation







2

CDSE - Many crosscutting challenges

• Computing

–  Multicore; large and increasing core counts, deep memory

hierarchies (TH-2: 54.9 PF, 3.12 M Cores, 1.4 PB, 25 MW)

–  New prgm. model, concerns (fault tolerance, energy, etc)

–  New models & technologies: Clouds, grids, hybrid

manycore, accelerators, deep storage hierarchies, …

• Data

–  Generating more data than in all of human history: preserve,

mine, share?

• Software

–  Complex applications on coupled compute-data-networked

environments, tools needed

–  Modern apps: 106+ lines, many groups contribute, take

decades to develop, very long lifetimes • 

People

–  Multidisciplinary expertise essential!

(2)

Complex Coupled Application Workflows

• Tight coupling

–  Coupled simulations/models run concurrently and exchange data frequently – E.g., Coupled fusion simulations, multi-block subsurface modeling

simulations

• Loose coupling

–  Coupling and data exchange are less frequent, and often asynchronous and possibly opportunistic

– E.g., Combustion simulations – DNS + LES coupling, material science simulations

• Data-flow (data-driven) coupling

–  Workflows expressed as dataflow based DAG where data produced by a simulation flows through one or more data processing pipelines – E.g., End-to-end simulations workflows with data analytics

• Ensemble workflows

•  Multiple fan-out stages composed of large numbers of loosely coupled or independent simulations

•  E.g., Uncertainty quantification, history matching, data assimilation







(3)







Scientific Simulations: A Big Data Challenge

• Scientific simulations running on high-end computing systems generate

huge amounts of data!

–  If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~170TB per hour -> ~700PB per day ->

~1.4EBper year

• Successful scientific discovery depends on a comprehensive understanding

of this enormous simulation data

How we enable the computation scientists to

e

ﬃ

ciently manage and explore extreme scale data:

“

ﬁ

nd the needles in haystack” ??







Challenges Faced by Traditional HPC Data Pipelines

The

costs of data movement

are increasing and dominating!

Figure. Traditional data analysis pipeline

• Data analysis challenge

•  Can current data mining, manipulation and visualization algorithms still work effectively on extreme scale machine?

• I/O challenge

•  Increasing performance gap: disks are outpaced by computing speed

• Data movement challenge

•  Lots of data movement between simulation and analysis machines, between coupled mutli-physics simulation components -> longer latencies

•  Improving data locality is critical: do work where the data resides!

• Energy challenge

•  Future extreme systems are designed to have low-power chips – however, much greater power consumption will be due to memory and data movement!

(4)

Coupling Cycles of Innovation

Interoperability/Standards?

7







Organization Organization Organization

Grid Middleware Infrastructure

Virtual Organization

Applications

Programming System

(Programming Model + Abstract Machine)

Virtualization

(Create and manage VOs and VMs)

Abstraction

(Support abstract behaviors/assumptions)

Con ce ptual Im pl ementati o n

Virtual Machine Virtual Machine Virtual Machine

Organization Organization Organization

Organization

Organization OrganizationOrganization OrganizationOrganization

Grid Middleware Infrastructure

Virtual Organization

Applications

Programming System

(Programming Model + Abstract Machine)

Virtualization

Abstraction

Virtualization

Abstraction

Con ce ptual Im pl ementati o n

Virtual Machine Virtual Machine Virtual Machine

Addressing Complexity

–  Programming System

•  programming model, languages/

abstraction – syntax + semantics •  abstract machine – execution

context and assumptions •  infrastructure, middleware &

runtime –  Hide v/s expose?

•  virtual homogeneity – ease of programming

•  robustness ?

•  cross-layer adaptation & optimization

– the inverted stack …

–

 

Abstractions?

“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 19, No. 3, March 2005.!

(5)







Software Engineering and CDS&E

•  Many sources of complexity

–  complexity from the problem (complex physics) –  complexity of the end-to-end application workflow

–  complexity from coordination & coupling across codes and research teams –  complexity from the codes and how they are developed and implemented –  complexity from the underlying disruptive infrastructure

•  Challenge: Manage complexity while maintaining performance/scalability

•  Coupling cycles of innovation

•  Software Engineering in the Large v/s Software Engineering in the Small ?

–  Software management of large systems v/s smaller but highly complex systems –  Well defined v/s rapidly evolving requirements/execution environments –  Primary complexity from size v/s application requirements

–  Relative importance of performance –  …

•  Design Principles

–  Separation of Concerns –  Abstractions

•  Interoperability and standardization







Complexity suggests a SOA approach

• Service Oriented Architecture (SOA): Software as a composition of

“services”

–  Used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams

–  Service: “… a well-defined, self-contained, and independently developed software element that does not depend on the context or state of other services.”

–  Abstraction & separation

•  Computations from compositions and coordination •  Interface from implementations

–  Existing and proven concept - widely accepted/used by the enterprise computing community

•  For example, extreme-scale data analysis services routinely run at Yahoo & Google, for near real-time web data analysis, for rapid deployment of new web services, etc.

(6)

Examples of some relevant services

•  I/O services: Efficient and adaptable I/O support

–  I/O services should be componentized so that it allows codes to switch between and tune different methods easily.

–  Example: a user may easily switch between file-based and

memory-based I/O or switch formats without changing any code, all the while maintaining the same data model when switching I/O components.

•  Code-coupling services: Code-coupling services includes codes that can run on the same or different machines, codes that are tightly coupled (memory-to-memory), as well as codes that are loosely coupled (exchange of data through files).

•  Automatic online (in-transit) data processing services: Efficient data movement services for distributing and archiving data

–  Example: generating summary statistics, graphs, images, movies, etc.

•  Monitoring Services: dynamic real-time monitoring of large-scale simulations (during execution)

•  Automated provenance collections services: Data, system workflow, application, etc.







ADIOS: Adaptable I/O System

• Provides portable, fast, scalable,

easy-‐to-‐use, metadata rich output

• Simple API

• Layered so;ware architecture:

– Allows plug-‐ins for diﬀerent I/O

implementaCons

– Abstracts the API from the method used

for I/O

• Beyond I/O – coupling,

coordinaCon, in-‐situ/in-‐transit,

etc.

• Open source:

– hGp://www.nccs.gov/user-‐support/

center-‐projects/adios/

• Research methods from many

groups:

–  Georgia Tech, Rutgers, NCSU, Emory, Auburn, Sandia, LBL, PPPL Interface)to)apps)for)descrip/on)of)data)(ADIOS,)etc.)) Buﬀering) Feedback) )Schedule) Mul/Bresolu/on)

methods) Data)Compression)methods) (FastBit))methods)Data)Indexing) Data)Management)Services)

Workﬂow))Engine) Run/me)engine) Data)movement) Provenance) Plugins)to)the)hybrid)staging)area) Visualiza/on)Plugins) Analysis)Plugins) Parallel)and)Distributed)File)System) IDX) HDF5)

AdiosBbp) pnetcdf) “raw”)data) Image)data) Viz.)Client) 0 5 10 15 20 25 30 35 S3D SEC Fine/Turbo G B/ s

Simulations with and without ADIOS I/O performance of the Combustion S3D code (96K

cores), the SCEC PCML3D (30K cores), and the Fine/Turbo (4K cores) codes.

MPI-IO OR PHDF5 ADIOS

(7)







Application Code Contact

Astrophysics+ Chimera+ T.+Mezzacappa+

Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+

CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ M.+Wheeler+ Materials+ M.+Parashar+ Fusion+ GTCCP+ S.+Ethier+

Application Code Contact

Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+

Image+Analysis+ T.+Kurc+

AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+

Climate+ J.+Fu+

Climate+ CAM+ K.+Evans+ Application Code Contact

Astrophysics+ Chimera+ T.+Mezzacappa+

Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+

CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ M.+Wheeler+ Materials+ M.+Parashar+ Fusion+ GTCCP+ S.+Ethier+

Application Code Contact

Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+

Image+Analysis+ T.+Kurc+

AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+

Climate+ J.+Fu+

Climate+ CAM+ K.+Evans+

ADIOS Application

Partners

Over 158 Publications from 2006 - 2013







The ADIOS Approach

• The overarching design philosophy of our framework is

based on the

Service-Oriented Architecture

–

 

Used to deal with system/application complexity, rapidly

changing requirements, evolving target platforms, and

diverse teams

• Applications constructed by assembling services based

on a universal view of their functionality using a

well-defined API

• Service implementations can be changed easily

• Integrated simulation can be assembled using these

(8)

ADIOS SOA: I/O Componentization

•  End users should be able to select the most efficient I/O method for their code, with minimal effort in terms of code alternations/updates.

•  Performance-driven choices should not prevent data from being stored in the desired file format, since this is crucial for later data analysis.

•  Have efficient ways of identifying and selecting certain data for analysis, to help end users cope with the flood of data being produced.

•  Make it easy to introduce new research I/O methods, without changing your code. –  allow the “best” CDS&E researchers to contribute in application science projects. •  Make it easy to allow I/O to do more than just I/O à code coupling, in situ

visualization.







• Exploit multi-levels in-memory

Hybrid Data Staging

to:

–  Decrease the gap between CPU and IO speeds

–  Dynamically deploy and execute data analytical or pre-processing operations either in-situ or in-transit

–  Improve IO write performance

Rethinking the Data Management Pipeline – Hybrid Staging

+ In-Situ & In-Transit Execution

(9)







In-situ/In-transit Data Management & Analytics

l 

Virtual shared-space programming abstraction

l 

Simple API for coordination, interaction and messaging

l 

Distributed, associative, in-memory object store

l 

Online data indexing, flexible querying

l 

Adaptive cross-layer runtime management

l 

Hybrid in-situ/in-transit execution

l 

Efficient, high-throughput/low-latency asynchronous data transport







Overview of Research Work @ RU

• Programming abstractions / system

–  DataSpaces/XpressSpaces: Interaction, coordination and messaging abstraction for coupled scientific workflow [HPDC10, CCGrid10,11, HiPC12, CCPE13]

–  ActiveSpaces: Dynamic code deployment for in-staging data processing [IPDPS11]

• Runtime mechanisms

–  Data-centric Task Mapping: Reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]

–  In-situ & In-transit Data Analytics: Simulation-time analysis of large volume data by combining in-situ and in-transit execution [SC’12]

–  Cross-layer Adaptation: adaptive cross-layer approach for dynamic data management in large scale simulation-analysis workflow

• High-throughput/low-latency asynchronous data transport

–  DART: Network independent transport library for high speed asynchronous data extraction and transfer [HPDC08]

(10)

DataSpaces: A Shared Space Abstraction for the

Hybrid Staging Area [HPDC’10]

• Semantically-specialized

virtual

shared space abstraction in the

staging area

–  Shared (distributed) data objects –  Simple put/get/query API –  Supports coupled app. workflows –  Provide a global-view programming

abstraction consistent with the PGAS model (UPC, GA)

• DataSpaces (HPDC10)

–  Constructed on-the-fly on hybrid staging nodes

•  Indexes data for quick access and retrieval

•  Provides asynchronous coordination and interaction support

–  Complements existing interaction/ coordination mechanisms

• Code coupling using DataSpaces

–  Transparent data redistribution –  Complex geometry-based queries –  In-space (online) data

transformation and manipulations

                    

Figure. Graphical view of the shared space model (derived from Tuple space) for coordination/ interaction







The Replica Exchange Algorithm

• A powerful sampling algorithm that preserves canonical distributions

–  Allows for efficient crossing of high energy barriers that separate thermodynamically stable states

–  Can significantly reduce sampling times as compared to formulations based on constant temperatures

• Algorithm overview

–  Several copies, or replicas, of the system are simulated in parallel at different temperatures using “walkers”

–  Walkers occasionally swap temperatures to allow them to bypass enthalpic barriers by moving to a high temperature

•  Exchanges governed by a probability condition to ensure detailed balance

• Replica exchange can significantly impact the fields of structural

biology and drug design

–  structure based drug design associated with protein misfolding, for example, structure based drug design and binding affinity optimization –  molecular basis of human diseases associated with protein misfolding

(11)







Parallel Replica Exchange

• General formulation requires dynamic and complex

coordination and communication patterns between

the walkers

–

 

Pair-wise and asynchronous

–

 

Dependent on the current state (temperature, energy) of

replicas, which are changing dynamically

• Implementation based on commonly used parallel

programming frameworks is challenging

–

 

Message passing frameworks, e.g. MPI, require matching

sends and receives to be explicitly defined for each

interaction

–

 

Implementations use restrictive formulations using

synchronous exchanges

•  Exchanges between neighboring temperatures only







Asynchronous Replica Exchange using

DataSpaces

• DataSpaces provides a semantically specialized virtual

shared space abstraction to support scalable

asynchronous replica exchange for molecular dynamics

applications

–

 

Enables general non-nearest neighboring temperature exchanges

–

 

Exchanges are decoupled and asynchronously and dynamically

determined

–

 

Communications are decentralized and peer-to-peer, and occur in

parallel

• Operation

–

 

Walkers post temperature range of interest to the

shared space using post

–

 

Walkers discover partners and negotiate exchange

(12)

Example: Salsa Operation







Summary

•  Unprecedented opportunities for dramatic insights through

computation!

•  Challenge: Manage complexity while maintaining performance/scalability.

–  complexity from the problem (complex physics)

–  complexity from the codes and how they are developed and implemented –  complexity of underlying infrastructure (disruptive hardware trends) –  complexity from coordination across codes and research teams

•  Overarching philosophy

–  Abstraction & Separation through Service Orientation •  Independence in development, deployment, execution

•  Computations from compositions and coordination; Interface from implementations –  Existing and proven concepts - widely accepted/used by the enterprise

computing community

•  ADIOS

–  Reducing barriers from a scientists perspective •  Ease-of-use, simple code integration and maintainability –  Minimizing performance impact

(13)







Thank You!

Manish Parashar, Ph.D.

Prof., Dept. of Electrical & Computer Engr. Rutgers Discovery Informatics Institute (RDI2₎ Cloud & Autonomic Computing Center (CAC) Rutgers, The State University of New Jersey Email: [email protected]

WWW: rdi2.rutgers.edu

!

EPSi

Edge Physics Simulation