Multiphysics Simulations and Petascale Computing
3.4 Multiscale Algorithms
Petascale simulations of multiphysics, multiscale phenomena require scalable algorithms and application codes that can effectively harness the power of the computers described above. Specifically, each symponent within a federated simulation needs to be scalable within its processor space. Scalability across symponents may be achieved via cooperative parallelism using an MPMD paradigm. This section discusses the importance of scalable numerical algorithms within a single physics regime, and it previews some novel approaches to scaling across multiple physics regimes.
In the scientific applications of interest here, one typically approximates the solution of a system of partial differential equations (PDEs) describing some physical phenomenon on a discretized spatial mesh. The discretization scheme and the numerical PDE solver are closely entwined, and together they determine the accuracy and scalability of the resulting simulation. The overall application code, consisting of the discretization and the solver, must be scalable both in its underlying numerical algorithms and in its parallel implementation. Such a code has a time to solution that remains constant as the problem size grows in proportion to the machine size (i.e., weak scaling). In practice, the scalability burden typically falls on the solver, which is the subject of the next section.
3.4.1 Parallel multigrid methods
Multigrid methods, which emerged as practical solvers in the 1970s, are provably optimal solvers for various classes of PDEs. Here optimality means that the algorithm has complexity O(n), where n is the number of grid points on the discretized spatial mesh. In contrast, many other solution techniques are O(n²) or worse. The optimality of the multigrid solver translates into mathematical scalability: the amount of work required for a solution is linearly proportional to the problem (mesh) size. If this numerical method can be efficiently implemented in parallel (for example, by overlapping communication and computation), then the overall algorithm (and that portion of the application code) is scalable.

TABLE 3.2: Execution times (in seconds) for a parallel multigrid method. The table compares two coarsening operators (C-old and C-new) for each of two data querying techniques (global and assumed partition).

                         Data query via        Data query via
                         global partition      assumed partition
Processors   Unknowns    C-old     C-new       C-old     C-new
4,096        110.6M      12.42      3.06       12.32      2.86
64,000       1.73B       67.19     10.45       19.85      4.23
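To make the optimality claim concrete, here is a minimal sketch (not the LLNL solver) of a geometric multigrid V-cycle for the 1D model problem -u'' = f on (0, 1) with zero boundary values, discretized on n = 2^L - 1 interior points. It assumes weighted Jacobi smoothing, full-weighting restriction of the residual, and linear-interpolation prolongation of the correction, all standard textbook choices; the function names are illustrative. Each V-cycle costs O(n) work, and the number of cycles needed to reduce the residual by a fixed factor should stay roughly constant as n grows, which is the mathematical scalability described above.

```python
import math

# Illustrative 1D geometric multigrid sketch; all names are hypothetical.


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for (2u_i - u_{i-1} - u_{i+1}) / h^2 = f_i, zero boundaries."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (h * h * f[i] + left + right)
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """r = f - A u for the standard 3-point Laplacian."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def restrict(r):
    """Full weighting: coarse point j coincides with fine point 2j + 1."""
    nc = (len(r) - 1) // 2
    return [0.25 * (r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]) for j in range(nc)]


def prolong(ec, n):
    """Linear interpolation of a coarse-grid correction onto n fine points."""
    nc = len(ec)
    e = [0.0] * n
    for i in range(n):
        j, odd = divmod(i, 2)
        if odd:
            e[i] = ec[j]                                  # coincident point
        else:
            left = ec[j - 1] if j >= 1 else 0.0           # boundary value is zero
            right = ec[j] if j < nc else 0.0
            e[i] = 0.5 * (left + right)
    return e


def v_cycle(u, f, h, nu=2):
    if len(u) == 1:
        return [0.5 * h * h * f[0]]                       # exact solve on the coarsest grid
    smooth(u, f, h, nu)                                   # pre-smoothing (in place)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, nu)        # coarse-grid correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return smooth(u, f, h, nu)                            # post-smoothing


def cycles_to_converge(levels, tol=1e-8, max_cycles=50):
    n = 2 ** levels - 1
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = math.sqrt(sum(r * r for r in residual(u, f, h)))
    for k in range(1, max_cycles + 1):
        u = v_cycle(u, f, h)
        if math.sqrt(sum(r * r for r in residual(u, f, h))) <= tol * r0:
            return k
    return max_cycles


counts = {2 ** L - 1: cycles_to_converge(L) for L in (5, 7, 9, 11)}
print(counts)  # cycle counts stay roughly constant as n grows
```

Pure Python lists keep the sketch dependency-free; a production code would use vectorized arrays and, on a parallel machine, distribute each grid level across processors.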
A detailed discussion of multigrid methods is beyond the scope of this chapter, but the key idea is this: multigrid methods solve successively smaller problems on a hierarchy of grids to accelerate the solution of the original fine-grid problem. Specifically, one must properly define coarsening and prolongation operators that map the intermediate approximations from one grid to another. (Coarsening restricts the approximate solution to a coarser grid; prolongation interpolates the approximate solution onto a finer grid.) These operators are highly problem-dependent. Much recent mathematical multigrid research has concentrated on defining these operators for real applications so that the resulting algorithm retains the hallmark optimality (and mathematical scalability). Toward this end, considerable work has been done over the past decade on algebraic multigrid (AMG) methods. AMG methods do not presume that the underlying mesh is structured. Instead, they rely on algebraic properties inferred from the underlying system of discretized equations, and this information is used to define the coarsening and prolongation operators. Researchers have successfully applied AMG methods to challenging PDEs discretized on unstructured meshes for a variety of applications, including computational astrophysics, structural mechanics, and fluid dynamics.
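To illustrate what "inferred algebraic properties" can mean, the sketch below implements the classical Ruge-Stüben strength-of-connection test (point i strongly depends on j when -a_ij ≥ θ · max over k≠i of (-a_ik), commonly with θ = 0.25), followed by a simplified version of the classical first-pass C/F splitting. It uses only matrix entries, never mesh coordinates. The dict-of-dicts matrix format and the function names are illustrative assumptions; the sketch shows the idea only and does not reproduce the parallel coarsening operators compared in Table 3.2.

```python
# Illustrative sketch of classical AMG strength of connection and C/F splitting.
# The dict-of-dicts matrix format and function names are hypothetical.


def strong_connections(A, theta=0.25):
    """i strongly depends on j (j != i) when -a_ij >= theta * max_{k != i} (-a_ik).
    A maps row index -> {column index: value}."""
    S = {}
    for i, row in A.items():
        neg = {j: -a for j, a in row.items() if j != i}
        biggest = max(neg.values(), default=0.0)
        S[i] = {j for j, v in neg.items() if biggest > 0.0 and v >= theta * biggest}
    return S


def cf_split(S):
    """Simplified classical first pass: repeatedly make the undecided point with the
    largest measure a C-point and turn the points that strongly depend on it into F-points."""
    ST = {i: set() for i in S}                  # ST[j] = points that strongly depend on j
    for i, deps in S.items():
        for j in deps:
            ST[j].add(i)
    measure = {i: len(ST[i]) for i in S}
    undecided, C, F = set(S), set(), set()
    while undecided:
        i = max(undecided, key=lambda k: (measure[k], -k))   # ties go to the lowest index
        C.add(i)
        undecided.discard(i)
        for j in ST[i] & undecided:
            F.add(j)
            undecided.discard(j)
            for k in S[j] & undecided:          # favour points that new F-points depend on
                measure[k] += 1
    return C, F


def laplacian_1d(n):
    """Tridiagonal [-1, 2, -1] matrix in dict-of-dicts form."""
    A = {}
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        A[i] = row
    return A


C, F = cf_split(strong_connections(laplacian_1d(7)))
print(sorted(C), sorted(F))  # [1, 3, 5] [0, 2, 4, 6]
```

On the 1D Laplacian the splitting recovers the familiar every-other-point coarse grid, even though no grid geometry was supplied.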
In the past decade, considerable effort has been focused on improving the parallel scalability of AMG methods. Key aspects of classical AMG, which was designed for serial computers, are inherently sequential and do not parallelize well on massively parallel machines. In particular, although the computational complexity remains optimal, storage and communication costs increase significantly on parallel computers. This problem has not been solved, but recent advances in coarsening strategies have ameliorated the complexity and setup costs, halving the storage requirements and reducing the execution times by an order of magnitude [8].
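One common way to quantify AMG storage growth is the operator complexity of the multigrid hierarchy: the total number of nonzeros in the matrices on all levels divided by the number of nonzeros in the finest-level matrix. Grid complexity is the analogous ratio of row counts. The sketch below computes both; the level sizes are made up purely for illustration and are not measurements from [8].

```python
# Illustrative AMG complexity measures; the level sizes below are hypothetical.


def operator_complexity(nnz_per_level):
    """Total nonzeros over all levels divided by nonzeros of the finest-level matrix."""
    return sum(nnz_per_level) / nnz_per_level[0]


def grid_complexity(rows_per_level):
    """Total unknowns over all levels divided by unknowns on the finest level."""
    return sum(rows_per_level) / rows_per_level[0]


# Hypothetical hierarchy, finest level first.
nnz = [1_000_000, 800_000, 640_000, 200_000]
rows = [140_000, 70_000, 30_000, 8_000]

op_c = operator_complexity(nnz)    # 2.64: the hierarchy stores 2.64x the fine matrix
grid_c = grid_complexity(rows)
print(op_c, round(grid_c, 3))
```

In this made-up hierarchy the coarse matrices carry more nonzeros per row than the fine one, the kind of stencil growth that drives operator complexity (and hence memory and communication volume) upward.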
The parallel scalability issues become even more pronounced on massively parallel computers like Blue Gene/L. Consider the kernel operation of answering a global data distribution query, that is, determining which processor owns a given piece of the globally distributed data. In a traditional parallel implementation (such as one that uses MPI's MPI_Allgatherv collective operation), this requires O(p) storage and communication, where p is the number of processors. On a machine like Blue Gene/L, with more than 100,000 processors, and on future petascale machines with upwards of one million processors, storing O(p) data on every processor is impractical if not impossible. A novel “assumed partition” algorithm [3] employs a rendezvous algorithm to answer queries with O(1) storage and O(log p) computational costs.
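The following single-process simulation sketches the assumed-partition idea under simplifying assumptions; it is not the algorithm of [3] itself. Global indices 0..N-1 are actually owned in contiguous but uneven blocks. Every processor can compute, in O(1) time and storage, an assumed owner for any index from a simple block formula. In a rendezvous step, each processor "sends" the pieces of its actual range to the assumed owners of those pieces; afterwards a query about index i goes to its assumed owner, which answers by binary search over its short directory. In a real code the sends and queries would be MPI messages; here they are list operations, and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical single-process simulation of an assumed-partition directory.


def assumed_owner(i, N, p):
    """O(1): rank q whose assumed block [q*N//p, (q+1)*N//p) contains index i."""
    return ((i + 1) * p - 1) // N


def assumed_block(q, N, p):
    return q * N // p, (q + 1) * N // p


def build_directories(actual_ranges, N):
    """actual_ranges[r] = (lo, hi): indices rank r really owns (ranges tile [0, N)).
    Simulates the rendezvous step: each rank sends the parts of its range to the
    assumed owners of those parts, which store (start, end, owner) triples."""
    p = len(actual_ranges)
    directories = [[] for _ in range(p)]
    for r, (lo, hi) in enumerate(actual_ranges):
        if lo >= hi:
            continue                        # this rank owns nothing
        for q in range(assumed_owner(lo, N, p), assumed_owner(hi - 1, N, p) + 1):
            a_lo, a_hi = assumed_block(q, N, p)
            directories[q].append((max(lo, a_lo), min(hi, a_hi), r))
    for entries in directories:
        entries.sort()
    return directories


def query_owner(i, directories, N):
    """Ask i's assumed owner, which binary-searches its (short) directory."""
    entries = directories[assumed_owner(i, N, len(directories))]
    k = bisect_right(entries, (i, float("inf"), float("inf"))) - 1
    start, end, owner = entries[k]
    assert start <= i < end
    return owner


N = 100
actual = [(0, 10), (10, 55), (55, 60), (60, 100)]   # uneven actual partition, p = 4
dirs = build_directories(actual, N)
owners = [query_owner(i, dirs, N) for i in range(N)]
```

No rank ever stores all p range boundaries: each directory holds only the actual ranges that overlap its assumed block, which stays small when the actual partition is close to the assumed one.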
To illustrate the power of efficiently implemented parallel multigrid methods, consider Table 3.2, which demonstrates the scalability of the LLNL AMG solver on the Blue Gene/L supercomputer: a problem with nearly two billion grid points (30³ = 27,000 unknowns per processor) is solved in just over four seconds. The 16-fold improvement over the previous algorithm (67.19 s versus 4.23 s on 64,000 processors) results from a combination of the improved coarsening algorithm and the faster communication routine mentioned above. (Note that the underlying algorithm is also mathematically scalable in terms of the number of iterations required for convergence.)
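The figures quoted above can be checked directly from Table 3.2. The short computation below (values transcribed from the table; the dictionary layout is only for illustration) recovers the roughly 30³ unknowns per processor at both machine sizes, the approximately 16-fold speedup, and a weak-scaling efficiency T(4,096)/T(64,000) for the assumed-partition, C-new configuration. An efficiency of 1.0 would correspond to the perfectly constant time to solution described at the start of this section.

```python
# Values transcribed from Table 3.2; times in seconds.
table = {
    4_096: {"unknowns": 110.6e6, "global_C_old": 12.42, "assumed_C_new": 2.86},
    64_000: {"unknowns": 1.73e9, "global_C_old": 67.19, "assumed_C_new": 4.23},
}

# Unknowns per processor: both runs sit near 30**3 = 27,000, i.e. a weak-scaling study.
per_proc = {p: row["unknowns"] / p for p, row in table.items()}

# Old (global partition, C-old) vs. new (assumed partition, C-new) on 64,000 processors.
speedup = table[64_000]["global_C_old"] / table[64_000]["assumed_C_new"]

# Weak-scaling efficiency: 1.0 would mean a constant time to solution.
weak_eff = table[4_096]["assumed_C_new"] / table[64_000]["assumed_C_new"]

print({p: round(v) for p, v in per_proc.items()})  # {4096: 27002, 64000: 27031}
print(round(speedup, 1), round(weak_eff, 2))       # 15.9 0.68
```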
3.4.2 ALE-AMR discretization
The preceding discussion touched on the importance of the underlying spatial mesh to the mathematical and parallel performance of the PDE solver. It is easier to define and implement optimally performing multigrid methods on structured Eulerian meshes (whose regular communication patterns facilitate efficient parallel implementation), but such meshes may not adequately represent important problem features, such as complex moving parts in an engineering application.
To overcome this deficiency, computational scientists have typically turned to one of two competing discretization methodologies: adaptive mesh refinement (AMR) or arbitrary Lagrangian-Eulerian (ALE) meshing. In AMR, one refines the mesh during runtime based on various estimates of the solution error. This allows one to obtain the accuracy of a much finer mesh at a fraction of the storage and computational costs. The underlying grid still has a fixed topology, however. In an ALE approach, the mesh moves in response to evolving problem dynamics. This allows one to track complex physics more accurately, but the mesh often becomes so tangled that it must be remapped periodically. Constructing robust mesh motion algorithms for ALE schemes remains a central challenge.
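As a minimal sketch of error-driven refinement, assume a 1D cell-centered solution and use the undivided difference |u[i+1] - u[i-1]|/2 as a stand-in for the error estimate. Real AMR codes use a variety of estimators and cluster flagged cells into rectangular patches, which this sketch omits; the function names and the threshold are hypothetical. Flagged cells, plus a one-cell buffer, are split in two.

```python
# Hypothetical 1D AMR flagging and refinement sketch.


def flag_cells(u, threshold, buffer=1):
    """Flag cell i when the undivided difference |u[i+1] - u[i-1]| / 2 exceeds
    threshold, plus `buffer` neighbouring cells on each side."""
    n = len(u)
    flagged = set()
    for i in range(1, n - 1):
        if abs(u[i + 1] - u[i - 1]) / 2.0 > threshold:
            for k in range(i - buffer, i + buffer + 1):
                if 0 <= k < n:
                    flagged.add(k)
    return sorted(flagged)


def refine(edges, flagged, ratio=2):
    """Split every flagged cell [edges[i], edges[i+1]] into `ratio` equal cells."""
    flagged = set(flagged)
    new_edges = [edges[0]]
    for i in range(len(edges) - 1):
        a, b = edges[i], edges[i + 1]
        parts = ratio if i in flagged else 1
        for s in range(1, parts + 1):
            new_edges.append(a + (b - a) * s / parts)
    return new_edges


n = 20
edges = [i / n for i in range(n + 1)]
centers = [(i + 0.5) / n for i in range(n)]
u = [1.0 if x < 0.5 else 0.0 for x in centers]      # a sharp front at x = 0.5
flags = flag_cells(u, threshold=0.1)
fine_edges = refine(edges, flags)
print(flags, len(fine_edges) - 1)   # [8, 9, 10, 11] 24
```

Only the four cells around the front are refined, so the added cost scales with the size of the feature rather than with the whole domain.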
An interesting recent idea is to combine the best features of ALE and AMR into a new discretization approach that is better suited to petascale simulation of multiphysics applications. The method, called ALE-AMR [2], is illustrated in Figure 3.2. Standard ALE is illustrated in the left graphic: one starts with an Eulerian mesh, which deforms over time in response to problem dynamics and eventually must be remapped. Intermediate meshes may contain highly skewed elements, which cause numerical difficulties. On the other hand, ALE resolves complex features such as shock fronts well.

FIGURE 3.2: ALE-AMR (center) combines the moving mesh feature of ALE (left) with the adaptive refinement of AMR (right) to yield a cost-effective mesh discretization technique that accurately resolves evolving physical phenomena such as shock fronts.

A typical AMR grid hierarchy is shown on the right. The meshing is simpler, with uniform elements, but the shock resolution per grid point is inferior to that of the ALE scheme. The ALE-AMR approach is shown in the center of Figure 3.2. The essential idea is that, rather than allowing the mesh to deform excessively, one refines a portion of it as in AMR, through the dynamic insertion and deletion of grid points, thereby combining the advantages of both types of adaptivity in one method. This helps to avoid many of the undesirable numerical properties associated with the highly skewed elements that arise in ALE schemes. It also allows one to leverage much of the efficient computer science machinery associated with AMR grid hierarchy management.
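A moving-mesh code needs some test for when elements have become too distorted. One widely used measure for quadrilaterals is the minimum scaled Jacobian: at each corner, the cross product of the two incident edges divided by the product of their lengths, which is the sine of the corner angle. A value at or below zero signals a tangled (invalid) element, and small positive values signal skewed ones. The sketch below computes it; the threshold and the idea of using it to trigger remapping or refinement are illustrative assumptions, not the criterion used in ALE-AMR [2].

```python
import math

# Illustrative element-quality check; names and the skew threshold are hypothetical.


def scaled_jacobians(quad):
    """quad: four (x, y) vertices in counter-clockwise order.
    For each corner, return cross(e1, e2) / (|e1| |e2|), the sine of the corner angle,
    where e1 and e2 run from the corner to its next and previous vertices."""
    values = []
    for k in range(4):
        x0, y0 = quad[k - 1]            # previous vertex (k - 1 wraps to 3 for k = 0)
        x1, y1 = quad[k]                # the corner itself
        x2, y2 = quad[(k + 1) % 4]      # next vertex
        ax, ay = x2 - x1, y2 - y1
        bx, by = x0 - x1, y0 - y1
        cross = ax * by - ay * bx
        values.append(cross / (math.hypot(ax, ay) * math.hypot(bx, by)))
    return values


def classify(quad, skew_tol=0.3):
    """'tangled' if any corner value is <= 0, 'skewed' if the minimum scaled
    Jacobian is below skew_tol, otherwise 'ok'. skew_tol is an arbitrary example value."""
    q = min(scaled_jacobians(quad))
    if q <= 0.0:
        return "tangled"
    return "skewed" if q < skew_tol else "ok"


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sheared = [(0, 0), (1, 0), (3, 0.2), (2, 0.2)]       # thin parallelogram
arrowhead = [(0, 0), (1, 0), (0.2, 0.2), (0, 1)]     # reentrant corner at (0.2, 0.2)
print([classify(c) for c in (square, sheared, arrowhead)])  # ['ok', 'skewed', 'tangled']
```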
The combination of ALE and AMR technology, each challenging in itself, presents many numerical and software issues that are still being researched. While some fundamentals of ALE-AMR algorithms have been established, the incorporation of more specialized physics capabilities, such as sliding surfaces, globally coupled diffusion physics, and multi-material treatments, continues to pose research challenges.
3.4.3 Hybrid atomistic-continuum algorithms
The discussion so far has focused on continuum methods, that is, numerical methods for approximating a solution on a spatial mesh. In multiphysics applications, however, one needs a range of models to simulate the underlying physical phenomena accurately. These applications may include some combination of continuum and atomistic (e.g., particle) methods. For instance, consider shock-induced turbulent mixing of two fluids, as shown in Figure 3.3. Continuum computational fluid dynamics (CFD) methods (e.g., those based on the Euler or Navier-Stokes equations) adequately describe fluid motion away from the interface, but their resolution is limited by the smallest scales of the computational mesh. On the other hand, atomistic methods (e.g., direct simulation Monte Carlo) adequately resolve the shock fronts, but they are too expensive to use throughout the problem domain.

[Figure 3.3 diagram: fluid A and fluid B; continuum representation (Euler, Navier-Stokes) away from the interface; DSMC representation at the interface.]

FIGURE 3.3: An illustrative multiphysics simulation of a shock propagating through two fluids. Continuum methods (e.g., based on Navier-Stokes) accurately describe the fluid motion away from the interface, but one needs an atomistic method (e.g., direct simulation Monte Carlo, DSMC) to simulate behavior at the interface. Since the atomistic method is too expensive to use throughout the domain, a hybrid algorithm is attractive.
Several researchers have recently investigated hybrid continuum-atomistic methods via adaptive mesh and algorithmic refinement (AMAR) [18]. As noted in the preceding section, traditional AMR allows one to refine a continuum calculation around dynamically moving and growing interfaces. Specifically, AMR refines the mesh around a feature of interest, say a shock front, and applies the same continuum method within the refined mesh. In a hybrid algorithm such as AMAR, one instead switches to a discrete atomistic method at the finest grid scale. This allows one to use an appropriate (but expensive) method only where it is needed. This is illustrated in Figure 3.4, where one can see the particles of an atomistic method embedded within an AMR grid hierarchy.
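The control flow of algorithmic refinement can be sketched in one dimension: cells are refined while an interface indicator fires, and once a flagged cell reaches the finest allowed level, it is handed to a particle model instead of being refined further. This toy uses hypothetical names, takes the indicator to be simply the jump in a given profile across a cell, and performs no actual DSMC; it only shows how the model choice follows from the hierarchy and does not reproduce AMAR [18].

```python
# Hypothetical toy of AMAR-style model selection on a 1D cell hierarchy.


def build_hybrid_cells(u, n0, max_level, threshold):
    """Return the leaf cells (level, left, right, model) of a 1D hierarchy on [0, 1].
    A cell is split in two while the jump in u across it exceeds `threshold`; a flagged
    cell that has reached max_level is assigned to the particle model instead."""
    leaves = []
    stack = [(0, i / n0, (i + 1) / n0) for i in range(n0)]
    while stack:
        level, a, b = stack.pop()
        flagged = abs(u(b) - u(a)) > threshold
        if flagged and level < max_level:
            mid = 0.5 * (a + b)
            stack.append((level + 1, a, mid))
            stack.append((level + 1, mid, b))
        else:
            leaves.append((level, a, b, "particle" if flagged else "continuum"))
    return sorted(leaves, key=lambda cell: cell[1])


def interface(x):
    """Hypothetical density-like profile with a sharp interface at x = 0.33."""
    return 1.0 if x < 0.33 else 0.0


cells = build_hybrid_cells(interface, n0=10, max_level=2, threshold=0.5)
for level, a, b, model in cells:
    if model == "particle" or level > 0:
        print(level, round(a, 4), round(b, 4), model)
```

Only the single finest-level cell containing the interface is assigned to the particle model; every other cell stays with the cheaper continuum description.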
The implementation of hybrid methods like AMAR could be facilitated by an MPMD programming paradigm like cooperative parallelism. For example, one could dynamically allocate additional processors to the finer meshes or to the direct simulation Monte Carlo (DSMC) method. Although this can be done in a data-parallel context, a properly implemented MPMD programming paradigm should make it easier to implement, but this remains to be demonstrated.
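As one hypothetical illustration of such dynamic allocation, an MPMD driver could periodically split its processor pool among symponents in proportion to estimated workloads. The sketch below uses largest-remainder rounding and guarantees each component a minimum allocation; the component names and workload numbers are invented for the example, and a real framework would also have to migrate data when allocations change.

```python
# Hypothetical processor apportionment among symponents by estimated workload.


def apportion(total_procs, workloads, min_procs=1):
    """Split total_procs among components in proportion to their estimated workloads,
    using largest-remainder rounding and guaranteeing min_procs per component."""
    names = list(workloads)
    spare = total_procs - min_procs * len(names)
    if spare < 0:
        raise ValueError("not enough processors for the minimum allocation")
    total_work = sum(workloads.values())
    quotas = {n: spare * workloads[n] / total_work for n in names}
    alloc = {n: min_procs + int(quotas[n]) for n in names}
    leftover = total_procs - sum(alloc.values())
    # Give the remaining processors to the largest fractional remainders.
    by_remainder = sorted(names, key=lambda n: quotas[n] - int(quotas[n]), reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


# Hypothetical workload estimates (arbitrary units) for three symponents.
work = {"coarse_continuum": 1.0, "fine_continuum": 3.0, "dsmc": 12.0}
print(apportion(1024, work))  # {'coarse_continuum': 65, 'fine_continuum': 192, 'dsmc': 767}

# Later the particle region grows, so the driver re-apportions.
work["dsmc"] = 28.0
print(apportion(1024, work))  # {'coarse_continuum': 33, 'fine_continuum': 97, 'dsmc': 894}
```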
FIGURE 3.4: (See color insert following page 18.) Simulation of a moving interface via a hybrid continuum-atomistic method. The white grid blocks show where a direct simulation Monte Carlo particle method is applied at the finest AMR grid scale to resolve the physics at the interface between two fluids. A continuum-scale method is applied elsewhere in the fluid. (Adapted from Hornung et al. [13].)