Generating Virtual Worlds with
Supercomputer Simulations
Lehrstuhl für Informatik 10 (Systemsimulation) Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
December 19, 2007
U. Rüde (LSS Erlangen)
Overview
Computers as tools for scientists: What is Computational Science?
Examples of simulation for science and engineering
Flow simulations
Biomedical applications
Motivation
How much is a PetaFlops?
10^6 = 1 MegaFlops: Intel 486, 33 MHz PC (~1989)
10^9 = 1 GigaFlops: Intel Pentium III, 1 GHz (~2000)
If every person on earth does one operation every 6 seconds, all humans together have 1 GigaFlops performance (less than a current laptop from Aldi)
10^12 = 1 TeraFlops: HLRB-I, 1344 processors (~2000)
10^15 = 1 PetaFlops: >250 000 processor cores? (~2008?)
If every person on earth runs a 486 PC, we all together have an aggregate performance of 6 PetaFlops.
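
The two comparisons on this slide are simple products and can be checked in a few lines. The sketch below assumes a world population of about 6 billion, which the slide does not state but which both figures imply; the function names are illustrative only.

```python
# Back-of-the-envelope check of the "all humans together" comparisons.
# Assumption (not stated on the slide): about 6 billion people on earth.
WORLD_POPULATION = 6.0e9

MEGA, GIGA, TERA, PETA = 1e6, 1e9, 1e12, 1e15


def human_flops(seconds_per_operation: float) -> float:
    """Aggregate rate if every person performs one operation per interval."""
    return WORLD_POPULATION / seconds_per_operation


def aggregate_flops(flops_per_machine: float) -> float:
    """Aggregate rate if every person runs one machine of the given speed."""
    return WORLD_POPULATION * flops_per_machine


if __name__ == "__main__":
    # One operation every 6 seconds per person, expressed in GigaFlops.
    print(f"{human_flops(6.0) / GIGA:.1f} GigaFlops")
    # One 1-MegaFlops 486 PC per person, expressed in PetaFlops.
    print(f"{aggregate_flops(MEGA) / PETA:.1f} PetaFlops")
```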
The Two Principles of Science
Theory: mathematical models, differential equations, Newton
Experiments: observation and prototypes, empirical sciences
Three
Computational Science: simulation, optimization, (quantitative) virtual reality
SIAM's Definition of CSE
CSE is a broad multidisciplinary area that encompasses applications in science/engineering, applied mathematics, numerical analysis, and computer science. Computer models and computer simulations have become an important part of the research repertoire, supplementing (and in some cases replacing) experimentation. Going from application area to computational results requires domain expertise, mathematical modeling, numerical analysis, algorithm development, software implementation, program execution, analysis, validation and visualization of results. CSE involves all of this.
http://www.
SIAM's Definition of CSE (2)
CSE makes use of the techniques of applied mathematics and computer science for the development of problem-solving methodologies and robust tools which will be the building blocks for solutions to scientific and engineering problems of ever-increasing complexity. It differs from mathematics or computer science in that analysis and methodologies are directed specifically at the solution of problem classes from science and engineering, and will generally require a detailed knowledge or substantial collaboration from those disciplines. The computing and mathematical techniques used may be more domain specific, and the computer science and mathematics skills needed will be broader.
CSE is more than a scientist or engineer using a canned code to generate and visualize results (skipping all of the intermediate steps).
Especially:
Fluid Flow Simulation
Metal Foams
Nano Technology
Fancy Physics
In collaboration with:
Lehrstuhl Werkstoffkunde und Technologie der Metalle, Erlangen (R.F. Singer, C. Körner)
Lehrstuhl für Bauinformatik, TU München (E. Rank)
Institut für Computeranwendungen im Bauingenieurwesen, TU Braunschweig (M. Krafczyk)
Lehrstuhl für Feststoff- und Grenzflächenverfahrenstechnik, Erlangen (W. Peukert, H.-J. Schmid)
First Test: "Breaking Dam"
Falling Drop
Falling Meteor
The Interface between Liquid and Gas
Compute only the fluid
Why so compute intensive?
Millions to billions of cells (1000 x 1000 x 1000)
Thousands to millions of time steps
Hundreds of operations in each cell and time step
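
Multiplying the three factors above gives the total work of one simulation run. The sketch below is only an order-of-magnitude estimate; the concrete values in the demo (10 000 time steps, 200 operations per cell update, sustained machine speeds) are illustrative assumptions chosen from within the ranges named on the slide.

```python
def total_operations(cells: int, time_steps: int, ops_per_cell_step: int) -> int:
    """Total floating-point operations for one simulation run."""
    return cells * time_steps * ops_per_cell_step


def runtime_seconds(total_ops: float, sustained_flops: float) -> float:
    """Wall-clock time at a given sustained floating-point rate."""
    return total_ops / sustained_flops


if __name__ == "__main__":
    # Assumed example: 1000^3 cells, 10 000 steps, 200 operations per update.
    ops = total_operations(1000**3, 10_000, 200)
    print(f"total work: {ops:.1e} operations")
    for name, rate in [("1 GigaFlops", 1e9), ("1 TeraFlops", 1e12)]:
        hours = runtime_seconds(ops, rate) / 3600.0
        print(f"{name}: {hours:.1f} hours")
```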
Visualization
Ray tracing: refraction, reflection, caustics
About 15 min per frame = 1 day for 4 secs
Process Simulation of Foam Production
Poorly understood: coalescence, collapse, drying, solidification etc.
Simulation as tool to understand and control the process
Rising Bubbles
Simultaneously Rising Bubbles
Experimental Verification
Simulation and Experiment: Diploma thesis, N. Thürey
Foaming Simulation
Fancy Physics
Moving Nano Particles in a Liquid
K. Iglberger - Master Thesis
C. Feichtinger - Diploma Thesis
Bio-medical and Bio-chemical Simulation
Blood Flow in an Aneurysm
HIV Protease
Pulsating Blood Flow in an Aneurysm
Data set: Master Thesis Jan Götz
Collaboration with Neuroradiology (Prof. Dörfler, Dr. Richter)
Image Processing, Simulation, Fluid Mechanics (Prof. Durst)
Bio-Electromagnetic Fields
Source Localisation
Erlangen neurosurgeons at work
View through operation microscope
Collaboration with: Chr. Johnson (Univ. of Utah), C. Popa (Ovidius Univ. Constanta), Bart Vanrumste (Univ. of Canterbury, New Zealand), G. Greiner, F. Fahlbusch (Erlangen), C. Wolters (Münster)
Simulation or better do experiments?
Source localisation by open brain measurements
Operation planning with a virtual head model
Molecular Dynamics Simulation of HIV
International Master (and PhD) Programme Computational Engineering
What is this about?
it is not Computer Science
it is not Mathematics
it is not a conventional engineering field
it is an interdisciplinary combination of all three - the foundation of future science
Master Program in Erlangen
Honours Option (Elite Program) jointly with TU Munich
Acknowledgements
Collaborators
In Erlangen: WTM, LSE, LSTM, LGDV, RRZE, Neurozentrum, Radiologie, etc.
Especially for foams: C. Körner (WTM)
International: Utah, Technion, Constanta, Ghent, Boulder, München, Zürich, ...
Dissertation Projects
U. Fabricius (AMG methods and SW engineering for parallelization)
C. Freundl (Parallel Expression Templates for PDE solvers)
K. Iglberger (Rigid Body Dynamics)
J. Götz (LBM, blood flow)
T. Gradl (Parallel multigrid)
... and 8 more
25 Diplom/Master Theses
Studien/Bachelor Theses
Especially for performance analysis/optimization for LBM
• J. Wilke, K. Iglberger, S. Donath, B. Gmeiner
... and 23 more
KONWIHR, DFG, NATO, BMBF
Elitenetzwerk Bayern
Bavarian Graduate School in Computational Engineering (with TUM, since 2004)
Special International PhD program: Identifikation, Optimierung und Steuerung für technische Anwendungen
Thank you for your interest!
Questions?
Part II-a
Towards Scalable FE Software
Scalable Algorithms: Multigrid
What is Multigrid?
Has nothing to do with "grid computing"
A general methodology
multi-scale (actually it is the "original")
many different applications
developed in the 1970s - ...
Useful e.g. for solving elliptic PDEs
large sparse systems of equations
iterative
convergence rate independent of problem size
asymptotically optimal complexity -> algorithmic scalability!
can solve e.g. 2D Poisson problem in ~30 operations per gridpoint
efficient parallelization - if one knows how to do it
Multigrid: V-Cycle
Goal: solve $A_h u_h = f_h$ using a hierarchy of grids
Relax, compute residual, restrict, solve by recursion, interpolate, correct
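
The V-cycle steps (relax, residual, restrict, solve by recursion, interpolate, correct) can be sketched for the simplest model problem. The following is a minimal illustration, not the solver used in the talk: it assumes the 1D Poisson problem -u'' = f on (0, 1) with zero boundary values, weighted Jacobi smoothing, full-weighting restriction and linear interpolation. All function names are hypothetical.

```python
import numpy as np


def relax(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f (arrays include boundary points)."""
    for _ in range(sweeps):
        u_new = u.copy()
        u_new[1:-1] = (1.0 - omega) * u[1:-1] + omega * 0.5 * (
            u[:-2] + u[2:] + h * h * f[1:-1])
        u = u_new
    return u


def residual(u, f, h):
    """Residual r = f - A u of the standard 3-point discretization."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r


def restrict(r):
    """Full weighting onto the grid with half as many intervals."""
    rc = np.zeros((r.size - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    return rc


def interpolate(ec):
    """Linear interpolation onto the grid with twice as many intervals."""
    e = np.zeros(2 * (ec.size - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e


def v_cycle(u, f, h):
    """One V-cycle for A_h u_h = f_h; the interval count must be a power of 2."""
    if u.size - 1 == 2:
        # Coarsest grid: a single interior unknown, solved exactly.
        u = u.copy()
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = relax(u, f, h)                                   # pre-smoothing
    rc = restrict(residual(u, f, h))                     # residual, restrict
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)         # solve by recursion
    u = u + interpolate(ec)                              # interpolate, correct
    return relax(u, f, h)                                # post-smoothing


def solve_poisson_1d(n_intervals=64, cycles=10):
    """Run V-cycles for f = pi^2 sin(pi x); returns x, u, residual norms."""
    x = np.linspace(0.0, 1.0, n_intervals + 1)
    h = 1.0 / n_intervals
    f = np.pi**2 * np.sin(np.pi * x)
    u = np.zeros_like(x)
    norms = [np.abs(residual(u, f, h)).max()]
    for _ in range(cycles):
        u = v_cycle(u, f, h)
        norms.append(np.abs(residual(u, f, h)).max())
    return x, u, norms


if __name__ == "__main__":
    x, u, norms = solve_poisson_1d()
    for k, r in enumerate(norms):
        print(f"cycle {k}: max residual {r:.3e}")
```

The residual should shrink by a roughly constant factor per cycle regardless of the grid size, which is the property the slide refers to as a convergence rate independent of problem size.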
Part II-b
Towards Scalable FE Software
Scalable Architecture: Hierarchical Hybrid Grids
Hierarchical Hybrid Grids (HHG)
Unstructured input grid
Resolves geometry of problem domain
Patch-wise regular refinement
generates nested grid hierarchies naturally suitable for geometric multigrid algorithms
New:
Modify storage formats and operations on the grid to exploit the regular substructures
Does an unstructured grid with 100 000 000 000 elements
HHG Refinement Example
Structured Interior
Common HHG Misconceptions
Hierarchical hybrid grids (HHG)
are not just another block-structured grid
HHG are more flexible (unstructured, hybrid input grids)
are not just another unstructured geometric multigrid package
HHG achieve better performance
unstructured treatment of regular regions does not improve performance
Parallel HHG - Framework Design Goals
To realize good parallel scalability:
Minimize latency by reducing the number of messages that must be sent
Optimize for high-bandwidth interconnects ⇒ large messages
HHG for Parallelization
Use regular HHG patches for partitioning the domain
HHG Parallel Update Algorithm
for each vertex do
    apply operation to vertex
end for
for each edge do
    copy from vertex interior
    apply operation to edge
    copy to vertex halo
end for
for each element do
    copy from edge/vertex interiors
    apply operation to element
    copy to edge/vertex halos
end for
update vertex primary dependencies
update vertex primary dependencies
update edge primary dependencies
Part II-c
Towards Scalable FE Software
Performance Results
Single Processor HHG Performance on Itanium for Relaxation of a Tetrahedral Finite Element Mesh
HHG: Parallel Scalability
#Procs   #DOFs x 10^6   #Els x 10^6   #Input Els   GFLOP/s     Time [s]
64       2,144          12,884        6144         100/75      68
128      4,288          25,769        12288        200/147     69
256      8,577          51,539        24576        409/270     76
512      17,167         103,079       49152        762/545     75
1024     17,167         103,079       49152        1,456/964   43
Parallel scalability of Poisson problem discretized by tetrahedral finite elements - SGI Altix (Itanium-2 1.6 GHz)
B. Bergen, F. Hülsemann, U. Ruede: Is 1.7 × 10^10 unknowns the largest finite element system that can be solved today?
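
The timings in the table can be turned into scaling efficiencies. The sketch below reads the rows for 64 to 512 processors as a weak-scaling series (the number of unknowns per processor stays constant) and the step from 512 to 1024 processors as a strong-scaling step (same problem, twice the processors). That reading is an interpretation of the table, not a statement from the slide, and the helper names are hypothetical.

```python
# (processors, unknowns in millions, time in seconds), copied from the table.
RUNS = [
    (64, 2144, 68.0),
    (128, 4288, 69.0),
    (256, 8577, 76.0),
    (512, 17167, 75.0),
    (1024, 17167, 43.0),
]


def unknowns_per_processor(procs: int, mdofs: float) -> float:
    """Unknowns handled by each processor."""
    return mdofs * 1e6 / procs


def weak_scaling_efficiency(t_ref: float, t: float) -> float:
    """Ideal weak scaling keeps the run time constant, so efficiency = t_ref / t."""
    return t_ref / t


def strong_scaling_efficiency(p_ref: int, t_ref: float, p: int, t: float) -> float:
    """Measured speedup divided by the increase in processor count."""
    return (t_ref / t) / (p / p_ref)


if __name__ == "__main__":
    p0, _, t0 = RUNS[0]
    for procs, mdofs, t in RUNS[:4]:
        per_proc = unknowns_per_processor(procs, mdofs)
        eff = weak_scaling_efficiency(t0, t)
        print(f"{procs:5d} procs: {per_proc:.2e} unknowns/proc, "
              f"weak-scaling efficiency {eff:.2f}")
    (p1, _, t1), (p2, _, t2) = RUNS[3], RUNS[4]
    eff = strong_scaling_efficiency(p1, t1, p2, t2)
    print(f"{p1} -> {p2} procs: strong-scaling efficiency {eff:.2f}")
```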
Part III-a
Free Surface Flow Simulation
The Lattice Boltzmann Method
Free surface flow: Breaking Dam
The Lattice-Boltzmann Method (2)
Weakly compressible approximation of the Navier-Stokes equations
Easy implementation
Applicable for small Mach numbers (< 0.1)
Easy to adapt, e.g. for
Complicated or time-varying geometries
Free surfaces
The Lattice-Boltzmann Method (3)
Real-valued representation of particles
Discrete velocities and positions
Algorithm proceeds in two steps:
Stream
Collide
Fluid Cell Treatment
Algorithm proceeds in two steps:
Stream: advect fluid elements (copy distribution functions, DFs, to neighbors)
Collide: compute collisions of fluid molecules
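
The two steps can be sketched for a small periodic 2D domain. This is a minimal illustration under stated assumptions, not the code from the talk: it uses the common D2Q9 velocity set, a single-relaxation-time (BGK) collision with relaxation time `tau`, and periodic boundaries via `np.roll`. All function names are hypothetical.

```python
import numpy as np

# D2Q9 lattice: nine discrete velocities and their standard weights.
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, u):
    """Equilibrium DFs for density rho (nx, ny) and velocity u (nx, ny, 2)."""
    eu = np.einsum("id,xyd->xyi", E, u)        # e_i . u for every cell
    uu = np.einsum("xyd,xyd->xy", u, u)        # |u|^2 for every cell
    return W * rho[..., None] * (1.0 + 3.0 * eu + 4.5 * eu**2
                                 - 1.5 * uu[..., None])


def stream(f):
    """Stream: copy each DF to the neighbor cell in its velocity direction."""
    out = np.empty_like(f)
    for i, (ex, ey) in enumerate(E):
        out[..., i] = np.roll(f[..., i], shift=(int(ex), int(ey)), axis=(0, 1))
    return out


def collide(f, tau):
    """Collide: relax every cell's DFs towards the local equilibrium."""
    rho = f.sum(axis=-1)
    u = np.einsum("xyi,id->xyd", f, E) / rho[..., None]
    return f - (f - equilibrium(rho, u)) / tau


def step(f, tau):
    """One time step: stream, then collide."""
    return collide(stream(f), tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho0 = 1.0 + 0.01 * rng.random((16, 16))   # small density perturbation
    f = equilibrium(rho0, np.zeros((16, 16, 2)))
    mass_before = f.sum()
    for _ in range(50):
        f = step(f, tau=0.8)
    print(f"relative mass change: {abs(f.sum() - mass_before) / mass_before:.2e}")
```

Both steps touch only a cell and its direct neighbors, which is why the method is simple to implement and to adapt to changing geometries.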
The Collide Step
Accounts for collisions of particles during movement
Weigh equilibrium velocities and velocities from
LBM in Equations
Stream/Collide:
Equilibrium DF:
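
The formulas on this slide were embedded as images and did not survive extraction. For orientation, the commonly used single-relaxation-time (BGK) form in lattice units is given below; the notation ($f_i$ for the distribution functions, $\mathbf{e}_i$ for the discrete velocities, $w_i$ for the lattice weights, $\tau$ for the relaxation time) is an assumption and may differ from the original slide.

```latex
% Stream/collide in one update (BGK collision operator)
f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t)
  - \frac{1}{\tau} \left( f_i(\mathbf{x}, t) - f_i^{eq}(\rho, \mathbf{u}) \right)

% Equilibrium distribution function
f_i^{eq}(\rho, \mathbf{u})
  = w_i \, \rho \left[ 1 + 3 \, \mathbf{e}_i \cdot \mathbf{u}
  + \frac{9}{2} (\mathbf{e}_i \cdot \mathbf{u})^2
  - \frac{3}{2} \, \mathbf{u}^2 \right]

% Macroscopic quantities
\rho = \sum_i f_i , \qquad \rho \, \mathbf{u} = \sum_i \mathbf{e}_i f_i
```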
Stability & Turbulence Modelling
Smagorinsky Subgrid Model:
Similar to approach for NS solvers
Model subgrid-scale vortices by locally changing the viscosity
Implementation for LBM
Reynolds stress tensor computed for each cell
Changes only in collision operator
Ca. 20% slowdown, significant gain due to decreased resolution requirements
Falling Drop with Turbulence Model
Falling Drop with Turbulence Model (slower)
Part III-b
Free Surface Flow Simulation
Volume of Fluids
Free surfaces with LBM
Metal foams - huge gas volumes
Only simulate and track fluid motion
Compute boundary conditions at free surface
Three cell types: Empty/Gas, Fluid, Interface
Boundary Conditions
[Figure labels: Gas, Liquid]
Problem: missing distribution functions at interface cells after streaming!
Reconstruction such that macroscopic boundary conditions are satisfied.
Körner et al. Lattice Boltzmann Model for Free Surface Flow, Journal of Computational Physics
Free surface simulations
Algorithmic Overview:
Before stream step, compute mass exchange across cell boundaries for interface cells
Calculate bubble volumes and pressure
Surface curvature for surface tension
Change topology if interface cells become full or empty - keep layer of interface cells closed
Free Surface Cell Conversions
Emptied interface cell -> gas
Filled interface cell -> fluid
Guarantee closed layer of interface cells
Redistribute mass in the neighborhood
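
The conversion rules can be sketched on a small 2D grid of cell flags. This is a simplified illustration, not the implementation from the talk: it assumes an 8-cell neighborhood, gives filled cells priority over emptied neighbors, and leaves out the redistribution of excess mass and the initialization of new interface cells. All names are hypothetical.

```python
import numpy as np

GAS, INTERFACE, FLUID = 0, 1, 2


def neighbors(x, y, shape):
    """Yield the in-bounds cells of the 8-neighborhood of (x, y)."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            i, j = x + dx, y + dy
            if (dx, dy) != (0, 0) and 0 <= i < shape[0] and 0 <= j < shape[1]:
                yield i, j


def convert_cells(cell_type, fill):
    """Return updated cell types; `fill` is the fluid fraction of each cell."""
    new_type = cell_type.copy()
    shape = cell_type.shape
    filled = (cell_type == INTERFACE) & (fill >= 1.0)
    emptied = (cell_type == INTERFACE) & (fill <= 0.0)

    # Filled interface cells become fluid; gas neighbors become interface
    # so that fluid never touches gas directly.
    for x, y in zip(*np.nonzero(filled)):
        new_type[x, y] = FLUID
        for i, j in neighbors(x, y, shape):
            if new_type[i, j] == GAS:
                new_type[i, j] = INTERFACE

    # Emptied interface cells become gas, unless a neighbor was just filled;
    # fluid neighbors become interface to keep the layer closed.
    for x, y in zip(*np.nonzero(emptied)):
        if any(filled[i, j] for i, j in neighbors(x, y, shape)):
            continue
        new_type[x, y] = GAS
        for i, j in neighbors(x, y, shape):
            if new_type[i, j] == FLUID:
                new_type[i, j] = INTERFACE
    return new_type


def layer_is_closed(cell_type):
    """True if no fluid cell has a gas cell in its neighborhood."""
    for x, y in zip(*np.nonzero(cell_type == FLUID)):
        if any(cell_type[i, j] == GAS for i, j in neighbors(x, y, cell_type.shape)):
            return False
    return True


if __name__ == "__main__":
    types = np.array([[FLUID, FLUID, INTERFACE, GAS, GAS]])
    fill = np.array([[1.0, 1.0, 1.05, 0.0, 0.0]])
    print(convert_cells(types, fill))
```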
Curvature calculation (version I)
Alternative approaches:
Integrate normals over surface (weighted triangles)
Level set methods (track surface as implicit function)
Surface Tension (Vers. 2)
Marching-cube surface triangulation
Compute a curvature for each triangle: $\kappa = \frac{1}{2} \frac{\delta A}{\delta V}$, with $\delta A = A' - A$
[Figure: surface triangle with areas $A$ and $A'$, volume $\delta V$, and normals $\vec{n}_1$, $\vec{n}_2$, $\vec{n}_3$]
Associate with each LBM cell the average curvature of its triangles
Complicated
Part III-c
Free Surface Flow Simulation
Application: Metal Foam
Towards Simulating Metal Foams
Bubble growth, coalescence, collapse, drainage, rheology, etc. are still poorly understood
Simulation as a tool to better understand, control and optimize the process
Rising Bubbles
More Rising Bubbles
Simulation Verification by Experiment
Simulation and Experiment: Diploma thesis, N. Thürey
Foaming Simulation 1
Numerical Experiment: Single Rising Bubble
Part III-d
Free Surface Flow Simulation
Parallel Performance
Parallelization
Standard LBM code: scalability on SR 8000-F1
Largest simulation:
1.08 × 10^9 cells
370 GByte memory
Communication cost because of large data volume (64 MByte)
-> Efficiency ~ 75%
Dissertation T. Pohl
Parallelization
Free surface LBM code
Standard LBM: 1 sweep through grid
Free surface LBM: 5 sweeps through grid
Cell type changes, closed boundary for bubbles, initialization of modified cells, mass balance correction
Performance on SR 8000
[Figure: free surface LBM code vs. standard LBM code]
Performance lousy on a single node!
Conditionals: 2.9 (standard LBM) -> 51 (free surface LBM)
Pentium 4: almost no degradation ~ 10%
SR 8000: enormous degradation (pseudo-vector, predictable jumps)
Parallel Performance
LSS Cluster
Fujitsu-Siemens
Part III-c
Free Surface Flow Simulation
Visualization and Animation
Adaptive Grids Performance
Speed up: factor 2-4 for larger resolutions
Example Coupled Simulations
Physically Based Animation
Special effects, e.g. for computer-generated movies
Realistic appearance necessary, but only where it's absolutely necessary
-> Control fluid or other simulations
Examples of fluid simulations in movies: Harry Potter 4 (ship scene), Ice Age 2 (throughout), Poseidon
Part IV
Outlook
Acknowledgements
Collaborators
In Erlangen: WTM, LSE, LSTM, LGDV, RRZE, Neurozentrum, Radiologie, etc.
Especially for foams: C. Körner (WTM)
International: Utah, Technion, Constanta, Ghent, Boulder, München, Zürich, ...
Dissertation Projects
U. Fabricius (AMG methods and SW engineering for parallelization)
C. Freundl (Parallel Expression Templates for PDE solvers)
J. Härtlein (Expression Templates for FE applications)
N. Thürey (LBM, free surfaces)
T. Pohl (Parallel LBM)
... and 6 more
19 Diplom/Master Theses
Studien/Bachelor Theses
Especially for performance analysis/optimization for LBM
• J. Wilke, K. Iglberger, S. Donath
... and 23 more
KONWIHR, DFG, NATO, BMBF
Elitenetzwerk Bayern
Bavarian Graduate School in Computational Engineering (with TUM, since 2004)
Special International PhD program: Identifikation, Optimierung und Steuerung für technische Anwendungen