
3.2 Adaptive Mesh Refinement

3.2.3 Adaptive Mesh Refinement Libraries

Supporting adaptive mesh refinement in an application requires managing a large amount of dynamic, complex, and distributed data. Abstraction in the software is often used to provide flexible implementations of key components such as a patch. This means that AMR functionality is well suited to being encapsulated in a purpose-built library. Many existing libraries provide the concepts necessary to construct a structured AMR application, including Chombo [35] from Lawrence Berkeley National Laboratory, SAMRAI [164] from Lawrence Livermore National Laboratory, and PARAMESH [103] from NASA Goddard and Drexel University.

Extending an application to support AMR adds housekeeping code associated with managing an adaptive mesh. This metadata and bookkeeping overhead can become a large proportion of simulation runtime. Reducing the cost of storing and managing the metadata required to advance a simulation on an adaptive mesh is an active area of research, and a number of techniques are used to help ensure that the required algorithms and data structures are scalable [65, 99, 100]. For example, the metadata about the location of patches in the hierarchy can be distributed, saving space but adding complexity [64]. Using a library means that users and application developers are no longer responsible for writing this complex code, and can instead focus on scientific and domain-specific concerns.

The components found in the AMR hierarchy provide a convenient program design template, allowing concepts to be encapsulated inside classes, such that most of the program code can act on these objects with some level of abstraction. Development of AMR libraries focuses both on scalable implementations of the core AMR algorithms (regridding, synchronisation, and boundary conditions) and on the usability of the package from a user perspective. Here we present an overview of current AMR libraries.

The AMR software released by Berger contains all the numerical routines needed to implement AMR for hyperbolic conservation laws in two or three dimensions [19]. However, this Fortran package is not designed to provide an object-oriented view of AMR concepts, which can make it more difficult to integrate with existing applications, or to ensure that new applications are written in such a way as to be extensible and easy to maintain.

The AMRCLAW library is a general framework for running wave propagation algorithms on an adaptive mesh, and is used for modelling tsunamis and other ocean phenomena as part of the GEOCLAW software [20, 22, 32]. The AMR implementation follows Berger’s original formulation, with some specific modifications for the simulation of wave propagation.

The BoxLib library from the Center for Computational Science and Engineering and Lawrence Berkeley National Laboratory (LBNL) was the first of these more component-based libraries [38]. BoxLib provides abstractions for a programmer to use when creating an application: a global index space, rectangular regions of the index space, and data defined on the regions of the index space. Each of these abstractions is a class, and so also provides a set of operations that offer an intuitive way to work with the data. Based in part on the work of the BoxLib developers, Chombo is another C++ AMR framework from LBNL. Designed to support parallel AMR calculations at a range of scales, the Chombo framework has been used both for domain-specific scientific simulations and for research into scalable AMR techniques [34, 160]. The Berkeley libraries have provided a well-designed set of abstractions for AMR that are used by many other libraries.
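The three abstractions listed above (an index space, rectangular regions of it, and data defined on those regions) can be illustrated with a minimal Python sketch. This is not the BoxLib or Chombo API: the class names `Box` and `BoxData`, their methods, and the two-dimensional, inclusive-bounds convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical rectangular region of a 2D global index space.

    Bounds are inclusive cell indices; this is an illustrative
    convention, not taken from any particular library.
    """
    lo: Tuple[int, int]
    hi: Tuple[int, int]

    def num_cells(self) -> int:
        return (self.hi[0] - self.lo[0] + 1) * (self.hi[1] - self.lo[1] + 1)

    def intersect(self, other: "Box") -> Optional["Box"]:
        """Return the overlapping region, or None if the boxes are disjoint."""
        lo = (max(self.lo[0], other.lo[0]), max(self.lo[1], other.lo[1]))
        hi = (min(self.hi[0], other.hi[0]), min(self.hi[1], other.hi[1]))
        if lo[0] > hi[0] or lo[1] > hi[1]:
            return None
        return Box(lo, hi)

    def refine(self, ratio: int) -> "Box":
        """Map this box into the index space of a level `ratio` times finer."""
        lo = (self.lo[0] * ratio, self.lo[1] * ratio)
        hi = ((self.hi[0] + 1) * ratio - 1, (self.hi[1] + 1) * ratio - 1)
        return Box(lo, hi)


class BoxData:
    """Hypothetical cell-centred data defined on one Box (row-major list)."""

    def __init__(self, box: Box, fill: float = 0.0):
        self.box = box
        self.nx = box.hi[0] - box.lo[0] + 1
        self.values = [fill] * box.num_cells()

    def _offset(self, i: int, j: int) -> int:
        # Translate a global index (i, j) into a position in the flat list.
        return (j - self.box.lo[1]) * self.nx + (i - self.box.lo[0])

    def get(self, i: int, j: int) -> float:
        return self.values[self._offset(i, j)]

    def set(self, i: int, j: int, value: float) -> None:
        self.values[self._offset(i, j)] = value

    def copy_from(self, src: "BoxData") -> int:
        """Copy values on the overlap of the two boxes; return cells copied."""
        overlap = self.box.intersect(src.box)
        if overlap is None:
            return 0
        for j in range(overlap.lo[1], overlap.hi[1] + 1):
            for i in range(overlap.lo[0], overlap.hi[0] + 1):
                self.set(i, j, src.get(i, j))
        return overlap.num_cells()
```

Because every patch is addressed in the same global index space, operations such as filling a patch from a neighbour reduce to a box intersection followed by a copy, which is the kind of intuitive operation the class-based design is intended to offer.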

The AMROC package is a generic block-structured AMR package written in C++ [44, 45]. AMROC’s design focuses on the definition of routines that will advance the simulation on a single patch. These routines are primarily the numerical integrator, the physical boundary settings, and the initial conditions. The AMROC package adds an additional constraint: refined patches are never distributed to a processor other than the one that owns the corresponding coarse region of the domain. This constraint means that most of the AMR algorithm can be performed locally, avoiding the complex communication required when patches are distributed. However, this approach may lead to load imbalance. Nevertheless, the AMROC package has been used to perform detonation simulations in parallel on up to 48 processors [43].
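The trade-off in this constraint can be made concrete with a small Python sketch. It is not AMROC code: the function names, the region labels, and the cell counts are hypothetical, and load is measured simply as the number of fine cells assigned to each processor.

```python
def assign_to_parent_owner(coarse_owner, fine_patches):
    """Place each fine patch on the rank that owns its coarse parent region.

    coarse_owner: dict mapping a coarse region id to the owning rank.
    fine_patches: list of (parent region id, number of fine cells).
    Returns a dict mapping each rank to its total fine-cell load.
    """
    load = {rank: 0 for rank in coarse_owner.values()}
    for parent, cells in fine_patches:
        load[coarse_owner[parent]] += cells
    return load


def imbalance(load):
    """Ratio of the largest per-rank load to the mean load (1.0 is ideal)."""
    mean = sum(load.values()) / len(load)
    if mean == 0:
        return 1.0
    return max(load.values()) / mean


# Hypothetical case: four coarse regions on four ranks, with a feature
# such as a shock front lying almost entirely inside region "A".
coarse_owner = {"A": 0, "B": 1, "C": 2, "D": 3}
fine_patches = [("A", 256), ("A", 256), ("A", 256), ("B", 256)]

load = assign_to_parent_owner(coarse_owner, fine_patches)
print(load)             # {0: 768, 1: 256, 2: 0, 3: 0}
print(imbalance(load))  # 3.0
```

No fine patch ever leaves the rank of its parent, so no parent-child communication crosses ranks, but in this made-up case rank 0 carries three times the mean load while ranks 2 and 3 hold no fine cells at all.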

The PARAMESH package, from Drexel University and the NASA Goddard Space Flight Center, contains a set of Fortran subroutines designed to allow a developer to extend an existing serial application with AMR capability [103]. Rather than allowing arbitrary patches, PARAMESH uses a fixed sub-grid approach, where grids can be toggled on and off, dependent on whether or not they contain an area of interest. Each sub-grid is identical in logical structure to its parent, so a grid containing 64 cells would be refined into four 64-cell sub-grids, where the spatial resolution is twice as fine.
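A minimal Python sketch of this fixed sub-grid refinement in two dimensions follows. It is not the PARAMESH interface: the `Block` class, its fields, and the 8 by 8 block shape are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical 2D sub-grid with a fixed logical shape of nx by ny cells."""
    x0: float  # lower-left corner of the block in physical space
    y0: float
    dx: float  # cell size in each direction
    dy: float
    nx: int    # number of cells in each direction
    ny: int

    def num_cells(self) -> int:
        return self.nx * self.ny

    def refine(self) -> List["Block"]:
        """Split into four children with the same logical shape.

        Each child covers one quadrant of the parent and keeps nx by ny
        cells, so its cell size is half that of the parent.
        """
        half_dx, half_dy = self.dx / 2, self.dy / 2
        width, height = self.nx * half_dx, self.ny * half_dy
        return [
            Block(self.x0 + i * width, self.y0 + j * height,
                  half_dx, half_dy, self.nx, self.ny)
            for j in (0, 1)
            for i in (0, 1)
        ]


parent = Block(x0=0.0, y0=0.0, dx=1.0, dy=1.0, nx=8, ny=8)
children = parent.refine()
```

Because every block has the same logical shape, a single set of loop bounds serves every block at every level, which fits the goal of extending an existing serial code; the cost is that whole blocks are refined even when the feature of interest covers only part of them.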

The Structured Adaptive Mesh Refinement Application Infrastructure (SAMRAI) package from Lawrence Livermore National Laboratory is a collection of AMR abstractions, with design roots in the work of the Berkeley developers [78]. Like Chombo, SAMRAI is used both as a framework for algorithmic research in AMR and for large scientific simulations. We selected the SAMRAI package when designing CleverLeaf because of the US Department of Energy’s Exascale program and the close working relationship that the University of Warwick has with members of Lawrence Livermore National Laboratory. The SAMRAI library is described fully in Chapter 5.

Adaptive mesh refinement can also be supported using dynamic runtime systems, such as Charm++ and Uintah [100, 125]. Charm++ organises computation around the concept of migratable objects, which are created by dividing the problem space up into chunks of work. The objects are assigned to processors, and relationships created between the objects allow for communication of boundary conditions. In an AMR application, the concept of migratable objects maps neatly to patches. Uintah uses a task-graph approach to describe computational tasks and data communication. A scheduler can then assign tasks to processors dynamically, allowing for a high degree of parallelism. As with Charm++, using tasks to represent patches allows the development of AMR applications within Uintah. Other dynamic runtime systems include Overture and GrACE [68].
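The task-graph idea can be sketched in a few lines of Python. This is not Uintah's interface or scheduler: the `schedule` function, the task names, the costs, and the least-loaded placement rule are hypothetical, and the sketch only balances total assigned work rather than simulating execution time. It relies on `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def schedule(deps, cost, num_procs):
    """Assign tasks to processors as their dependencies are satisfied.

    deps: dict mapping each task to the set of tasks it depends on.
    cost: dict mapping each task to an estimated amount of work.
    Returns (placement, load, order): the processor chosen for each task,
    the total work given to each processor, and the order of assignment.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    load = [0.0] * num_procs
    placement, order = {}, []
    while sorter.is_active():
        # Tasks whose prerequisites are all done; sorted for repeatability.
        for task in sorted(sorter.get_ready()):
            proc = min(range(num_procs), key=lambda p: load[p])
            placement[task] = proc
            load[proc] += cost[task]
            order.append(task)
            sorter.done(task)
    return placement, load, order


# Hypothetical per-patch tasks: advance two patches, exchange their
# boundary data, then regrid.
deps = {
    "advance_p0": set(),
    "advance_p1": set(),
    "exchange_p0_p1": {"advance_p0", "advance_p1"},
    "regrid": {"exchange_p0_p1"},
}
cost = {"advance_p0": 4.0, "advance_p1": 4.0,
        "exchange_p0_p1": 1.0, "regrid": 2.0}

placement, load, order = schedule(deps, cost, num_procs=2)
```

The two `advance` tasks have no dependencies on each other, so they are placed on different processors, while the exchange is only released once both have been marked as done.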

3.3 Summary

In this chapter we have described the partial differential equations used to represent the motions of fluids, and shown how they are solved computationally. Understanding hydrodynamics is essential in a range of industrial and research contexts such as astrophysics, defence, and the oil and gas industries. The discretisation of the equations influences the accuracy of the solution, but a high-resolution calculation requires more computational resources. AMR is a technique used to increase the resolution of a computational simulation only in areas where it will be most effective. Combined with the observation that most scientific domains exhibit locality, where important problem features such as shock waves are confined to a small portion of the domain, this allows us to reduce the resources required while maintaining solution accuracy.

We have discussed Euler’s equations, one form of equations describing fluid motion, and presented an outline of the solution scheme used in the two hydrodynamics mini-applications described in this thesis. Supporting AMR in an application requires managing a large amount of dynamic, complex, and distributed data, and software-level abstraction can provide flexible implementations of key components such as a patch. We have considered examples from the literature of available AMR libraries, and discussed the AMR library we chose to use for the work presented in this thesis.

Whilst AMR has the potential to deliver results faster, it requires complex communication and management to ensure that the simulation is advanced correctly. This complexity can harm application performance, and understanding and improving this performance on current and future architectures is an issue at the core of this thesis.

Performance Engineering with Mini-Applications

The United States of America’s Department of Energy has maintained that exascale computing power (10^18 floating point operations per second) will be ready to use in the next decade [5, 89]. Whilst vendor technology roadmaps and research and development strategies are tightly protected by non-disclosure agreements, it is accepted that the first exascale systems will look dramatically different to traditional supercomputer architectures [89]. Some predictions for an exascale architecture feature accelerator-type devices with slower processing cores and vastly increased opportunities for executing many instructions in parallel. Multi-level memory hierarchies may be used, offering complex tiers of memory performance on-node. The network infrastructure will be fast and may connect millions of cores across the full system. With this huge core count comes a reduction in the mean time between failures: even if the expected failure rate of a single processor core is low, when there are over one million cores in the system, the failure rate of the machine could become a significant issue [146].
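The effect of core count on the mean time between failures can be illustrated with a short calculation. The sketch below assumes that cores fail independently with a constant failure rate, which is a simplification, and the per-core figure of 100 years is a made-up value for illustration rather than a measured one.

```python
import math


def system_mtbf_hours(core_mtbf_hours, num_cores):
    """Mean time to the first failure anywhere in the machine.

    Assumes independent cores with constant (exponential) failure rates,
    so the rates add and the system MTBF is the core MTBF divided by N.
    """
    return core_mtbf_hours / num_cores


def prob_failure_within(hours, core_mtbf_hours, num_cores):
    """Probability that at least one core fails within the given time."""
    return 1.0 - math.exp(-hours * num_cores / core_mtbf_hours)


# Hypothetical figures: each core fails once per 100 years on average.
core_mtbf = 100 * 365 * 24   # 876,000 hours
cores = 1_000_000

print(system_mtbf_hours(core_mtbf, cores))         # about 0.876 hours
print(prob_failure_within(1.0, core_mtbf, cores))  # about 0.68
```

Under these assumptions, a core that fails once a century still yields a machine that sees a failure somewhere roughly every 53 minutes.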

One example of a machine similar in style to the predicted exascale architecture is Sequoia, an IBM Blue Gene/Q at Lawrence Livermore National Laboratory. However, it still lacks some of the more novel features such as low-powered accelerator-type processing cores and a complex memory hierarchy. Investigating these kinds of systems now is essential in preparing for the complexities future supercomputers will introduce.

In this chapter we discuss the problems faced when porting production codes to future architectures, with a particular focus on the exascale machines predicted to arrive in the next five years. We introduce mini-applications (small, self-contained programs that represent the performance characteristics of a production application) as a possible solution to this problem. The value of mini-applications is fully realised when they are carefully applied to consider specific questions; this requires guidance and some high-level organisation. We describe the Mantevo project, a collection of mini-applications from a wide range of scientific domains, as one provider of this direction. Two of the mini-applications in the Mantevo suite were developed at the University of Warwick and the Atomic Weapons Establishment (AWE) as part of this thesis, and we use these as examples of how mini-applications can be used to solve the problems of moving production codes to future architectures.

4.1 Production Applications and Future Archi-