(aivazis@caltech.edu) -- PSAAP REVIEW -- 28-29 OCTOBER 2009
UQ pipeline implementation
and software integration
michael aivazis
psaap review
28‐29 october 2009
Table of contents
1. Introduction
   people, computational resources
2. Overview of the UQ pipeline
   problem scope
   pipeline architecture
   the ingredients: pyre, mystic, VTF, eureka
   capabilities, verification and validation
3. Status and outlook
   summary of work in progress
   planned activities for the remainder of the year
   tasks that specifically address recommendations from the last review
   assessment
introduction
people
computational resources
resource utilization
software engineering
The software integration team
From CSE:
Michael Aivazis, Sharon Brunett, Julian Cummings, Jan Lindheim, Santiago Lombeyda, Mike McKerns, Mark Stalzer, Leif Strand
From Solid Dynamics and Materials:
Michael Ortiz, Anna Pandolfi, Bo Li
From UQ:
Tim Sullivan
Expect the group to grow with representation from Experimental Science and CFD
Computational resources
hera@llnl: 12,800 opteron cores
lobo@lanl: 2144 opteron cores
Computational resources - details
machine    site  cores    cpu type                    memory (GB/core)  interconnect                      os       compilers
shc        CACR  810      AMD Opteron 2.{2,4,5} GHz   2                 Infiniband                        RHEL     Pathscale
coyote     LANL  2,580    AMD Opteron 2.6 GHz         4                 Infiniband                        RHEL     Pathscale
ubgl       LLNL  82,000   IBM PowerPC2 700 MHz        1/2               torus+tree                        BG SuSE  IBM
lobo       LANL  4,352    AMD Opteron 2.2 GHz         2                 Infiniband                        CHAOS    Pathscale
hera       LLNL  13,824   AMD Opteron 2.3 GHz         2                 Infiniband                        CHAOS    Pathscale
cerrillos  LANL  720 720  AMD Opteron Cell            1/4 4             Infiniband between Opteron cores  Modified
Resource utilization
CPU cycles used by Caltech Jan 1 to Oct 20, 2009
[Figure: bar chart of CPU hours per machine (hera, coyote, lobo, shc); bar values 5,564,069, 28,210, 6,024,505, 174,523]
Complexity management
Sources:
  Project size:
    asset complexity: number of lines of code, files, entry points
    dependencies: number of modules, third-party libraries
    runtime complexity: number of object types and instances
  Problem size: number of processors needed, amount of memory, cpu time
  Project longevity:
    life cycle, duty cycle
    cost/benefit of reuse
    managing change: people, hardware, software, technologies
  Locality of needed resources
    compute/persist: where, how, when, who
  Usability: access, interfaces, security, etc.
Risk mitigation:
  Promote and ensure key practices
    weekly meetings
    coding and documentation standards
    uniform builds
    testing: regression, benchmarks, verification, validation
    software process: svn, trac, wiki, and doxygen, epydoc, …
the UQ pipeline
problem scope
pipeline architecture
the ingredients: pyre, mystic, VTF, eureka
capabilities, verification and validation
The physical system
High fidelity modeling of ballistic impact requires
  non-linear kinematics
  advanced material models
  a method for handling the extreme deformations caused by the penetration
  fracture and fragmentation
  erosion
  robust contact detection and resolution algorithms
Problem scope
Our methodology involves global optimizations that require efficient exploration of a huge parameter space
  D_F is the largest deviation in performance when each input parameter is allowed to vary over its entire range
  D_{F-G} is very similar
Need to stage, monitor and analyze the output of thousands of simulations against the backdrop of constantly shifting computational resources
Managing such a complex computational environment requires a sophisticated software infrastructure
Pyre, the Caltech ASC component framework, has partial support for much of the required infrastructure
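The diameter described on this slide can be illustrated on a toy model. The following is a minimal sketch, not the center's implementation: `performance` is a made-up stand-in for a simulation, a coarse grid search replaces the global optimizers, and combining the per-parameter sub-diameters in quadrature is an assumption borrowed from McDiarmid-type bounds rather than something stated here.

```python
import itertools
import math

def grid(lo, hi, n):
    """Return n evenly spaced points covering [lo, hi]."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]

def performance(x):
    # Hypothetical stand-in for an expensive simulation: F(x0, x1) = 3*x0 + x1**2
    return 3.0 * x[0] + x[1] ** 2

def sub_diameter(f, bounds, i, n=11):
    """Largest change in f when only input i moves over its range (grid search)."""
    axes = [grid(lo, hi, n) for lo, hi in bounds]
    others = [axes[j] for j in range(len(bounds)) if j != i]
    worst = 0.0
    for rest in itertools.product(*others):
        values = []
        for xi in axes[i]:
            x = list(rest)
            x.insert(i, xi)          # put the varying input back in position i
            values.append(f(x))
        worst = max(worst, max(values) - min(values))
    return worst

def diameter(f, bounds, n=11):
    """Sub-diameters and their combination in quadrature (assumed convention)."""
    subs = [sub_diameter(f, bounds, i, n) for i in range(len(bounds))]
    return subs, math.sqrt(sum(d * d for d in subs))

bounds = [(0.0, 1.0), (-1.0, 2.0)]
subs, d_f = diameter(performance, bounds)
```

Each sub-diameter is itself a global optimization over the whole parameter space, which is why the number of simulations grows so quickly in practice.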
Workflow overview
Evaluating the model diameters involves
  analyzing the available datasets from previous runs
  preparing a collection of "interesting" input-decks
  identifying appropriate computational resources for the new set of runs
  preparing the computational environment for each run
    shipping the input decks
    verifying that ancillary requirements, such as connections to the monitoring and journaling agents, are satisfied
  scheduling the job with the machine queue manager
  monitoring the runs
    wait for a job to start, monitor its progress, collect output (including debugging/performance info)
  collecting partly-digested results from the remote machine and archiving them along with their input deck for later analysis
See sketch of the architecture on next slide…
Architectural overview
[Figure: architecture sketch. Boxes: optimizer (convergence?), population generator, journal, archiver, monitor and, for each machine up to machine N, a job manager and a queue manager running job1, job2, …, jobn]
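The loop in the architecture sketch can be written down in a few lines. This is only an illustration under stated assumptions: `run_job`, `generate_population` and `pipeline` are hypothetical names, the "simulation" is a one-parameter toy function, and jobs run serially in-process instead of going through remote queue managers.

```python
import random

def run_job(deck):
    # Hypothetical stand-in for a simulation run: returns the quantity of interest
    return (deck["x"] - 0.3) ** 2

def generate_population(center, radius, size, rng):
    """Population generator: input decks scattered around the current best point."""
    return [{"x": center + rng.uniform(-radius, radius)} for _ in range(size)]

def pipeline(generations=40, size=8, tol=1e-6, seed=0):
    rng = random.Random(seed)
    archive = []                                  # archiver: keeps every (deck, result) pair
    best_deck = {"x": 0.0}
    best_cost = run_job(best_deck)
    archive.append((best_deck, best_cost))
    radius = 1.0
    for _ in range(generations):
        decks = generate_population(best_deck["x"], radius, size, rng)
        results = [run_job(d) for d in decks]     # job manager: serial here, queued in practice
        archive.extend(zip(decks, results))
        cost, deck = min(zip(results, decks), key=lambda pair: pair[0])
        if cost < best_cost:
            best_deck, best_cost = deck, cost
        radius *= 0.7                             # optimizer: narrow the search each generation
        if radius < tol:                          # convergence test
            break
    return best_deck, best_cost, archive

best_deck, best_cost, archive = pipeline()
```

Keeping the archive separate from the optimizer state means earlier runs can be re-analyzed later without repeating them.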
Early pipeline implementation
DAKOTA + ABAQUS
Limitations
  file I/O based: interactive diagnostics for the progress/state of UQ or optimization are not built in
  the UQ, Dakota, and model codes must be separately compiled and tested on each machine
  simulated annealing not available (to complement genetic algorithms)
  deployment limited to single machines: tapping resources on remote machines intelligently requires added functionality
  tailoring optimization algorithms for specific needs is difficult
  some algorithms only find infima: user must negate the objective function to do supremum optimization
  not a "standard" package already installed on most lab systems
  diagnostics involving visualization are clumsy for large variable counts
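The negation workaround mentioned in the limitations is easy to show. A minimal sketch, assuming a hand-written golden-section minimizer as a stand-in for whatever minimization routine is available; `response` is a hypothetical single-peaked objective.

```python
import math

def minimize_golden(f, lo, hi, tol=1e-8):
    """Golden-section search: locates a minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d                     # minimum lies in [a, d]
        else:
            a = c                     # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

def maximize(f, lo, hi):
    """Supremum via a minimizer: minimize -f, then evaluate f at the result."""
    x = minimize_golden(lambda t: -f(t), lo, hi)
    return x, f(x)

def response(t):
    # Hypothetical objective with a single peak at t = 1.5
    return 4.0 - (t - 1.5) ** 2

x_max, f_max = maximize(response, 0.0, 3.0)
```

Wrapping the negation in one helper keeps the sign bookkeeping out of the objective function itself.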
Ingredients of the UQ pipeline
Our implementation:
  mystic, a distributed optimization framework
    fully deployed on lab machines
    managed calculations using the SPHIR surrogate on thousands of cores
  a large number of optimization algorithms
    scipy: community supported/maintained; see http://www.scipy.org
    local variations that are well suited to our problem
  a simulation archiving subsystem
    PostgreSQL database back end
    web based user interface
  a distributed simulation monitoring subsystem
    diagnostics, probes
    custom simulation viewer for debugging and post-mortem analysis
  computational engines for modeling the impact
    VTF
    next generation codes from our center
    a surrogate for the SPHIR gun
Pyre
Pyre is a software architecture:
  a specification of the organization of the software system
  a description of the crucial structural elements and their interfaces
  a specification for the possible collaborations of these elements
  a strategy for the composition of structural and behavioral elements
Pyre is multi-layered
  flexibility
  complexity management
  robustness under evolutionary pressures
Pyre is a component framework
[Figure: layer diagram with labels application-general, application-specific, framework, computational engines]
Using components
Component based solutions are ideal for complex systems
  encourage the decomposition of the problem into manageable functional units
  expose the interaction mechanisms between these units
  enable the nearly independent evolution of the parts
Component frameworks enable an incremental and evolutionary approach
  existing codes can start producing results immediately
  new services can be incorporated incrementally
The goal is to encapsulate and deploy F
[Figure: component diagram with labels input ports, output ports, properties, component core, name, control]
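The component picture (a named core with properties, input ports and output ports) can be sketched as follows. This is a toy illustration only: the `Component` class and the `connect` helper are hypothetical and are not the pyre API.

```python
class Component:
    """Toy component: a named core with properties, input ports and output ports."""

    def __init__(self, name, core, **properties):
        self.name = name
        self.core = core                      # the wrapped computation
        self.properties = dict(properties)
        self.inputs = {}
        self.outputs = {}

    def configure(self, **overrides):
        """Control interface: adjust properties without touching the core."""
        unknown = set(overrides) - set(self.properties)
        if unknown:
            raise KeyError(f"{self.name}: unknown properties {sorted(unknown)}")
        self.properties.update(overrides)

    def run(self):
        self.outputs = self.core(self.inputs, self.properties)
        return self.outputs

def connect(source, out_port, sink, in_port):
    """Feed one component's output port into another component's input port."""
    sink.inputs[in_port] = source.outputs[out_port]

def scale_core(inputs, props):
    return {"y": props["factor"] * inputs["x"]}

def offset_core(inputs, props):
    return {"z": inputs["y"] + props["shift"]}

scale = Component("scale", scale_core, factor=2.0)
offset = Component("offset", offset_core, shift=1.0)
scale.inputs["x"] = 3.0
scale.run()
connect(scale, "y", offset, "y")
result = offset.run()
```

Because the cores only see their ports and properties, either one can be replaced or reconfigured without changes to the other, which is the near-independent evolution the slide refers to.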
Services for computational engines
Normal engine life cycle:
  deployment: staging, instantiation, static initialization, dynamic initialization, resource allocation
  launching: input delivery, execution control, hauling of output
  teardown: resource de-allocation, archiving, execution statistics
Exceptional events
  core dumps, resource allocation failures
  diagnostics: errors, warnings, informational messages
  monitoring: debugging information, self consistency checks
Parallel processing
Distributed computing
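One way to picture the life cycle is as a session that always reaches teardown, even when an exceptional event interrupts the launch. A minimal sketch under stated assumptions: `engine_session`, `launch` and `EngineFailure` are hypothetical names, and the log entries stand in for the real deployment and teardown work.

```python
import contextlib

class EngineFailure(Exception):
    """Hypothetical exception standing in for core dumps or allocation failures."""

@contextlib.contextmanager
def engine_session(log):
    log.append("deploy")                      # staging, initialization, resource allocation
    try:
        yield log
    except EngineFailure as err:
        log.append(f"diagnostic: {err}")      # the exceptional event is reported, not lost
    finally:
        log.append("teardown")                # de-allocation, archiving, statistics

def launch(log, inputs, fail=False):
    log.append("launch")                      # input delivery and execution
    if fail:
        raise EngineFailure("resource allocation failed")
    return sum(inputs)

normal_log, failed_log = [], []
with engine_session(normal_log) as log:
    total = launch(log, [1, 2, 3])
with engine_session(failed_log) as log:
    launch(log, [1, 2, 3], fail=True)
```

In both cases the last log entry is the teardown step, so resources are released and the run can still be archived.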
Simulation services
Problem specification
  components and their properties
Solid modeling
  overall geometry
  model construction
  topological and geometrical information
Boundary and initial conditions
  high level specification
  access to the underlying solver data structures in a uniform way
Materials and constitutive models
  materials properties database
  strength models and EOS
  association with a region of space
Computational engines
  selection and association with geometry
  solver specific initializations
Simulation driver
  initialization
  appropriate timestep computation
  orchestration of data exchanges
  checkpoints and field dumps
Active monitoring
  instrumentation: sensors, actuators
  real-time visualization
Full simulation archiving
Simulation archiving
Produce a fully repeatable execution by recording
  scripts
  user choices
  sources (cvs/svn tags or even the files themselves)
  build procedure
  required third party libraries
  version of as many runtime components as can be determined
  generated data sets (urls, actual files)
Implementation
  meta-data in PostgreSQL
  HDF5
    embed XML meta-data
    parsed for deducing the layout of the file as format evolves
    can be extracted for easy indexing
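The kind of record needed for a repeatable execution can be sketched with the standard library alone. This is an assumption-laden illustration: the field names, the `build_record` helper and every value below are hypothetical, and JSON stands in for the PostgreSQL and HDF5/XML storage named above.

```python
import hashlib
import json
import platform
import sys

def fingerprint(text):
    """Content hash, so an archived script can be checked against what actually ran."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_record(script_text, user_choices, source_tag, libraries, outputs):
    # Field names are illustrative; they are not the schema used by the center.
    return {
        "script_sha256": fingerprint(script_text),
        "user_choices": dict(user_choices),
        "source_tag": source_tag,
        "libraries": dict(libraries),
        "runtime": {"python": sys.version.split()[0], "platform": platform.system()},
        "outputs": list(outputs),
    }

record = build_record(
    script_text="run(impact_speed=2.5)\n",                      # hypothetical driver script
    user_choices={"impact_speed": 2.5, "plate_thickness": 0.1},  # hypothetical inputs
    source_tag="svn:r0000 (hypothetical)",
    libraries={"hdf5": "unknown"},
    outputs=["file:///tmp/run-0001.h5 (hypothetical)"],
)
serialized = json.dumps(record, sort_keys=True)
restored = json.loads(serialized)
```

Storing a hash next to the script text makes it cheap to detect that a script changed between two runs that claim the same inputs.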
mystic: key components
The job manager
  stages and launches new jobs
  broadcasts execution control directives
  maintains a registry of submitted jobs
The iterator
  adjusts the cost function parameters
  reacts to control directives
The mapping strategy
  provides an algorithm to distribute the workload among available resources
The launcher
  knows how to submit jobs on the current execution environment
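A mapping strategy can be as simple as a greedy assignment. The sketch below is one possible strategy, not the one mystic uses: `map_jobs` is a hypothetical helper, the machine names and core counts are made up, and job costs are assumed to be known in advance as core-hour estimates.

```python
import heapq

def map_jobs(job_costs, machines):
    """Greedy mapping: give each job, largest first, to the currently least busy machine.

    job_costs: {job name: estimated core-hours}
    machines:  {machine name: available cores}
    """
    heap = [(0.0, name) for name in sorted(machines)]   # (projected busy time, machine)
    heapq.heapify(heap)
    plan = {name: [] for name in machines}
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        busy, name = heapq.heappop(heap)
        plan[name].append(job)
        # A machine with more cores works through the same cost in less wall time
        heapq.heappush(heap, (busy + cost / machines[name], name))
    return plan

machines = {"alpha": 100, "beta": 50}                    # hypothetical resources
jobs = {"job1": 400.0, "job2": 300.0, "job3": 200.0, "job4": 100.0}
plan = map_jobs(jobs, machines)
```

Swapping in a different strategy only requires another function with the same inputs and outputs, which is the point of keeping the mapping separate from the launcher.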
User interface for the simulation archive
Custom simulation viewer
Simulation capability: the VTF
Initial capability built using adlib, the center's Lagrangian solver
  finite kinematics
  parallel explicit dynamics with excellent scalability
  flexible, scalable meshing
  elements: ten-noded quadratic, ten-noded composite
  material models: power law, and J2+vinet
  contact: smooth and non-smooth surface based contact
  pyre integration
PSAAP extensions
  contact: volume based: billiard-ball
  element erosion:
Status: VTF simulation capability
Components:
  new contact and element erosion algorithms in place
  parallelization complete: large runs on all platforms
Verification:
  in the process of collecting and organizing the historical verification tests into a coherent test suite
  element types, material models, contact: in place
  element erosion: in progress
Validation:
  building validation applications for all major components
  materials: uniaxial tests complete, shear tests need revival
  contact algorithms are being validated against Molinari [2002] for impact speeds below 500 m/s with good initial agreement
  comparison with our experiments is in progress
Perforation using the VTF
Perforation using the VTF - II
Validation of the contact algorithm
The billiard ball contact algorithm has three free parameters:
  p: the fractional interpenetration volume
  k: the stiffness of the contact restoring force
  b: a damping factor
Comparison with experiments [Molinari 2002] of spherical steel projectiles on thick steel plates
[Figure: three panels, v = 200 m/s, v = 400 m/s, v = 600 m/s]
Simulation profiling
Time consuming portions of the code revealed by profiling tools
  Speedshop, gprof, etc.
  Outliers shown in load balancing reports
Erosion computation is responsible for ~85% of the time in simulations' explicit integration
  checking is currently done every step: costly
  residual_general
  log_mulss
  assemble: loop unrolling possibilities!
Computing and updating correctors and restoring forces, ~15%
Need to keep exploring
  fracture based erosion scheme
  less costly element erosion computation
[Figure: SpeedShop profile of a run with 465K elements, 64 MPI tasks on 4 hera nodes]
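The ~85% figure puts a ceiling on what a cheaper erosion check can buy. A back-of-the-envelope sketch, assuming the entire erosion share is the per-step check and that checking once every N steps divides that share by N; both assumptions are simplifications, and the effect on accuracy is ignored.

```python
def speedup(erosion_fraction, check_interval):
    """Amdahl-style estimate when the erosion check runs once every check_interval steps."""
    if not 0.0 <= erosion_fraction <= 1.0:
        raise ValueError("erosion_fraction must lie in [0, 1]")
    if check_interval < 1:
        raise ValueError("check_interval must be at least 1")
    remaining = (1.0 - erosion_fraction) + erosion_fraction / check_interval
    return 1.0 / remaining

# Estimated speedups for a few check intervals, using the profiled 85% share
estimates = {n: speedup(0.85, n) for n in (1, 2, 5, 10)}

# Upper bound if the erosion check cost nothing at all
limit = 1.0 / (1.0 - 0.85)
```

Under these assumptions the gain saturates below a factor of seven, so the remaining ~15% would have to be addressed for anything beyond that.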
Scaling of the contact algorithm
  147K elements in initial mesh, grows with core count
  Contact occurs in a relatively small region of the plate
  Execution time increases with core count, as some cores handle larger contact regions
  For larger core counts, the contact region begins to be effectively distributed
[Figure: proxyball scaling on hera; avg. time to perform contact detection (sec) vs. log(2) cores]
[Figure: full application, vtf with proxyball, weak scaling; 22K element base mesh]
Load balance issues on small core counts
3.2M element run on 32 shc cores
[Figure: MPI_allreduce balance; % of MPI time vs. MPI task]
Application scaling
[Figure: avg time/step (sec) vs. log(2) CPUs for hera (LLNL), lobo (LANL), shc (Caltech)]
Next generation lagrangian code
eureka: a new solid dynamics capability
  object oriented finite element and meshfree framework
  highly flexible and extensible
  finite deformations, visco-elasticity, visco-plasticity, thermal coupling, contact, fracture and fragmentation
  extensive material model library
OTM (optimal transportation meshfree)
  based on optimal transportation theory with material point sampling
  both solid and fluid flows
  exact essential boundary condition enforcement
  exact linear and angular momentum conservation
  free from tensile instabilities
  contact
  provably convergent
  energy-based material
SPHIR surrogate
Model of the SPHIR response:
  perforation diameter as a function of projectile speed and plate thickness
[Figure: plot of the surrogate response]
status and outlook
summary of work in progress
planned activities for the remainder of the year
tasks that specifically address recommendations from the last review
assessment
In progress
Complete validation of the new contact algorithm
Integrate new erosion criterion into the simulation drivers
  Validation against our experimental data
  Parallelization: contact, element erosion
Conduct preliminary runs using Ta (J2+vinet) for projectile, target
Manage the deluge of information from our runs
  db schema almost complete
  data harvesting techniques
  job tracking: both programmatic and interactive
Deploy the prototype distributed optimization framework
  pyre driven applications
Planned activities
Software
  test suites
  automation
  coherent verification strategy
Simulation capability
  lagrangian solver:
    contact validation
    improve simulation capability and validate against our experiments
  eureka:
    framework integration
    parallelization
  eulerian code:
    repeat our process with the new code
UQ framework
  simulation archiving
  exploration of optimization algorithms and their effect on the methodology
  VTF driven by mystic
  recasting of existing simulation drivers as pyre applications for better integration with the UQ framework
Review recommendations
Verification:
  in previous years, verification and validation was the responsibility of the research groups and the results were published in the literature
  we have embarked on a systematic construction of thorough regression, benchmark, verification and validation test suites
  testing strategy is documented on our wiki
  testing will be integrated with the simulation launching and archiving facilities so tests can be submitted anywhere the code runs, at any time
Computational requirements:
  we believe we understand the capabilities necessary to model ballistic impact
  performance modeling is underway: we collect data from every run and we are building resource predictors
Identification, quantification and reduction of the major sources of uncertainty
  individual variable sub-diameters are excellent metrics
  uncertainty reduction is now a high priority task
Assessment
Assessment from last year:
  good continuity from ASC to PSAAP
    management structure
    software development process
  VTF extensions to handle the new application are well underway
Since then:
  completed deployment of UQ pipeline on lab machines
  initiated the construction of our test suites
  instrumented simulation to collect data for performance modeling
  new contact algorithm
    implementation, verification; validation underway; parallelization
  new erosion criterion
    implementation, verification and validation tests to be constructed; parallelization
  preliminary design and implementation of the pyre-based distributed optimization framework; see poster for details
Deferred:
  Ta/Ta impacts: until experiments are available