Termination Detection.
2.8 Case Study: Computational Chemistry
Our third case study, like the first, is from computational science. It is an example of an application that accesses a distributed data structure in an asynchronous fashion and that is amenable to a functional decomposition.
2.8.1 Chemistry Background
Computational techniques are being used increasingly as an alternative to experiment in chemistry. In what is called ab initio quantum chemistry, computer programs are used to compute fundamental properties of atoms and molecules, such as bond strengths and reaction energies, from first principles, by solving various approximations to the Schrödinger equation that describes their basic structures. This approach allows the chemist to explore reaction pathways that would be hazardous or expensive to explore experimentally. One application for these techniques is in the investigation of biological processes. For example, Plate 6 shows a molecular model for the active site region in the enzyme malate dehydrogenase, a key enzyme in the conversion of glucose to the high-energy molecule ATP. This image is taken from a simulation of the transfer of a hydride anion from the substrate, malate, to a cofactor, nicotinamide adenine dinucleotide (NAD+). The two isosurfaces colored blue and brown represent lower and higher electron densities, respectively, calculated by using a combined quantum and classical mechanics methodology. The green, red, blue, and white balls are carbon, oxygen, nitrogen, and hydrogen atoms, respectively.
Fundamental to several methods used in quantum chemistry is the need to compute what is called the Fock matrix, a two-dimensional array representing the electronic structure of an atom or molecule. This matrix, which is represented here as F, has size $N \times N$ and is formed by evaluating the following summation for each element:
$$F_{ij} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} D_{kl} \left( I_{ijkl} - \tfrac{1}{2} I_{ikjl} \right) \qquad (2.3)$$
where D is a two-dimensional array of size $N \times N$ that is only read, not written, by this computation, and the I represent integrals that are computed using elements i, j, k, and l of a read-only, one-dimensional array A with $O(N)$ elements. An integral can be thought of as an approximation to the repulsive force between two electrons.
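The summation can be made concrete with a minimal sketch. It takes Equation 2.3 to have the usual closed-shell form, in which each (k, l) term combines D_kl with I_ijkl minus one half of I_ikjl. The function `toy_integral` is a hypothetical stand-in with the right index symmetries; it is not a real two-electron integral, and the names `fock_naive` and `calls` are likewise illustrative.

```python
def toy_integral(A, i, j, k, l):
    """Hypothetical stand-in for an integral (not real chemistry).

    It is unchanged when i and j are swapped, when k and l are swapped,
    and when the pair (i, j) is exchanged with the pair (k, l).
    """
    return 1.0 / ((1.0 + abs(A[i] - A[j])) * (1.0 + abs(A[k] - A[l])))


def fock_naive(D, A):
    """Evaluate the double summation directly for every element of F.

    Returns the matrix F and the number of integral evaluations performed.
    """
    N = len(D)
    F = [[0.0] * N for _ in range(N)]
    calls = 0
    for i in range(N):
        for j in range(N):
            total = 0.0
            for k in range(N):
                for l in range(N):
                    # Two integrals per (k, l) term: I_ijkl and I_ikjl.
                    total += D[k][l] * (
                        toy_integral(A, i, j, k, l)
                        - 0.5 * toy_integral(A, i, k, j, l)
                    )
                    calls += 2
            F[i][j] = total
    return F, calls
```

Written this way, every element of F reads every element of D, and the evaluation count grows as twice the fourth power of N.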
Because Equation 2.3 includes a double summation, apparently $2N^2$ integrals must be computed for each element of F, for a total of $2N^4$ integrals. However, in practice it is possible to exploit redundancy in the integrals and symmetry in F and reduce this number to a total of $N^4/8$. When this is done, the algorithm can be reduced to the rather strange logic given as Algorithm 2.3. In principle, the calculation of each element of F requires access to all elements of D and A; furthermore, access patterns appear highly irregular. In this respect, the Fock matrix construction problem is representative of many numeric problems with irregular and nonlocal communication patterns.
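The size of the reduction can be checked by enumeration. The sketch below assumes the standard eightfold symmetry of the integrals (swapping i with j, swapping k with l, and exchanging the two pairs all leave the value unchanged) and keeps one representative index quadruple per symmetry class. The function names are illustrative and are not taken from Algorithm 2.3.

```python
def canonical_quadruples(N):
    """Yield one (i, j, k, l) per class of symmetry-equivalent integrals.

    Representatives satisfy i >= j, k >= l, and pair(i, j) >= pair(k, l),
    where pair(a, b) = a*(a+1)/2 + b numbers the pairs with a >= b.
    """
    for i in range(N):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j
            for k in range(i + 1):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l
                    if kl <= ij:
                        yield (i, j, k, l)


def count_unique_integrals(N):
    """Count the integrals that remain after exploiting symmetry."""
    return sum(1 for _ in canonical_quadruples(N))
```

For N = 20 this enumeration gives 22155 unique integrals, close to N^4/8 = 20000; the relative excess shrinks as N grows.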
For the molecular systems of interest to chemists, the problem size N may be in the range . Because the evaluation of an integral is a fairly expensive operation, involving operations, the construction of the Fock matrix may require operations. In addition, most methods require that a series of Fock matrices be constructed, each representing a more accurate approximation to a molecule's electronic structure. These considerations have motivated a considerable amount of work on both efficient parallel algorithms for Fock matrix construction and improved methods that require the computation of less than integrals.
Partition.
Because the Fock matrix problem is concerned primarily with the symmetric two-dimensional matrices F and D, an obvious partitioning strategy is to apply domain decomposition techniques to these matrices to create N(N+1)/2 tasks, each containing a single element from each matrix ($F_{ij}$, $D_{ij}$) and responsible for the operations required to compute its $F_{ij}$. This yields N(N+1)/2 tasks, each with $O(1)$ data and each responsible for computing $2N^2$ integrals, as specified in Equation 2.3.
This domain decomposition strategy is simple but suffers from a significant disadvantage: it cannot easily exploit redundancy and symmetry and, hence, performs eight times too many integral computations. Because an alternative algorithm based on functional decomposition techniques is significantly more efficient (it does not perform redundant computation and does not incur high communication costs), the domain decomposition algorithm is not considered further.
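The factor of eight follows from simple counting, as the short sketch below shows. It assumes that each of the N(N+1)/2 domain-decomposition tasks evaluates the full double summation, that is, two integrals for each of the N^2 (k, l) pairs, and compares the resulting total with roughly N^4/8 for a symmetry-aware algorithm. The function names are illustrative only.

```python
def domain_decomposition_integrals(N):
    """Integrals evaluated when each task computes one element of F in full."""
    tasks = N * (N + 1) // 2       # one task per element of symmetric F
    per_task = 2 * N * N           # two integrals per (k, l) pair
    return tasks * per_task


def symmetry_reduced_integrals(N):
    """Approximate integral count when redundancy and symmetry are exploited."""
    return N ** 4 / 8


def redundancy_factor(N):
    """Ratio of the two counts; approaches 8 as N grows."""
    return domain_decomposition_integrals(N) / symmetry_reduced_integrals(N)
```

For N = 100 the two counts are 101,000,000 and 12,500,000, a ratio of 8.08.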
Figure 2.31: Functional decomposition of Fock matrix problem. This yields about $N^2$ data tasks, shown in the upper part of the figure, and about $N^4/8$ computation tasks, shown in the lower part of the figure. Computation tasks send read and write requests to data tasks.
Quite a different parallel algorithm can be developed by focusing on the computation to be performed rather than on the data structures manipulated, in other words, by using a functional decomposition. When redundancy is considered, one naturally thinks of a computation as comprising a set of integrals (the integral procedure of Algorithm 2.3), each requiring six D elements and contributing to six F elements. Focusing on these computations, we define $N^4/8$ "computation" tasks, each responsible for one integral.
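The work done by one computation task can be sketched as follows. This is not Algorithm 2.3 itself. It is a simplified equivalent that assumes the eightfold integral symmetry and the closed-shell form of the summation, and applies one integral value to every index permutation that shares it. The helper names and `toy_integral` are hypothetical.

```python
def toy_integral(A, i, j, k, l):
    """Hypothetical stand-in for an integral with eightfold symmetry."""
    return 1.0 / ((1.0 + abs(A[i] - A[j])) * (1.0 + abs(A[k] - A[l])))


def canonical_quadruples(N):
    """Yield one representative (i, j, k, l) per symmetry class."""
    for i in range(N):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j
            for k in range(i + 1):
                for l in range(k + 1):
                    if k * (k + 1) // 2 + l <= ij:
                        yield (i, j, k, l)


def symmetry_orbit(i, j, k, l):
    """Distinct index quadruples that share the value of integral (i, j, k, l)."""
    return {(i, j, k, l), (j, i, k, l), (i, j, l, k), (j, i, l, k),
            (k, l, i, j), (l, k, i, j), (k, l, j, i), (l, k, j, i)}


def computation_task(F, D, value, i, j, k, l):
    """Apply one integral value to F, reading only the D elements it needs."""
    for p, q, r, s in symmetry_orbit(i, j, k, l):
        F[p][q] += D[r][s] * value          # contribution of the I_pqrs term
        F[p][r] -= 0.5 * D[q][s] * value    # contribution of the -I/2 term


def build_fock(D, A):
    """Run one computation task per unique integral; return F and task count."""
    N = len(D)
    F = [[0.0] * N for _ in range(N)]
    tasks = 0
    for i, j, k, l in canonical_quadruples(N):
        computation_task(F, D, toy_integral(A, i, j, k, l), i, j, k, l)
        tasks += 1
    return F, tasks
```

For four distinct indices, a task touches the F elements with index pairs {i,j}, {k,l}, {i,k}, {i,l}, {j,k}, and {j,l} (counting an element and its transpose once) and reads the D elements with the same six pairs, which matches the six-and-six count given in the text.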
Having defined a functional decomposition, we next need to distribute data structures over tasks. However, we see no obvious criteria by which data elements might be associated with one computation task rather than another: each data element is accessed by many tasks. In effect, the F, D, and A arrays constitute large data structures that the computation tasks need to access in a distributed and asynchronous fashion. This situation suggests that the techniques described in Section 2.3.4 for asynchronous communication may be useful. Hence, for now we simply define two sets of "data" tasks that are responsible only for responding to requests to read and write data values. These tasks encapsulate elements of the two-dimensional arrays D and F ($D_{ij}$, $F_{ij}$) and of the one-dimensional array A ($A_i$), respectively. In all, our partition yields a total of approximately $N^4/8$ computation tasks and $N^2$ data tasks (Figure 2.31).
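The role of a data task can be illustrated with a minimal sketch that uses a thread and a request queue in place of a task and its channels. This is one possible realization under those assumptions, not the mechanism prescribed in Section 2.3.4; the class name `DataTask` and its request format are hypothetical.

```python
import queue
import threading


class DataTask:
    """Owns some data elements and only responds to read and write requests."""

    def __init__(self, values):
        self._values = dict(values)        # elements encapsulated by this task
        self.requests = queue.Queue()      # incoming requests, served in order
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            op, key, payload = self.requests.get()
            if op == "stop":
                break
            if op == "read":
                payload.put(self._values[key])    # payload is a reply queue
            elif op == "accumulate":
                self._values[key] += payload      # payload is the increment

    def read(self, key):
        """Send a read request and wait for the reply."""
        reply = queue.Queue(maxsize=1)
        self.requests.put(("read", key, reply))
        return reply.get()

    def accumulate(self, key, delta):
        """Send a write request; the caller does not wait for completion."""
        self.requests.put(("accumulate", key, delta))

    def stop(self):
        self.requests.put(("stop", None, None))
        self._thread.join()
```

A computation task would call `read` on the data tasks holding the D elements it needs and `accumulate` on those holding the F elements it updates. Because each data task serves its requests one at a time, concurrent updates to the same element are not lost.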