Conclusions and Future Work

In this chapter a study of thread and memory placement effects on the Gaussian code was undertaken. It comprises three parts:

First, an 18-Crown-6 Ether molecule with a 6-31G* basis set and Valinomycin with a 3-21G basis set were used to obtain serial and parallel timings from a modified version of Gaussian that could perform thread and memory placement. Timing results were obtained for two cache blocking factors, and speedup plots were presented. Serial results, using 18-Crown-6 Ether, showed that good cache blocking can potentially mitigate the effect of poor thread and memory placement on a NUMA machine like the SunFire X4600 M2. Parallel results, using Valinomycin, indicate that good speedups can be obtained by combining cache blocking with the co-location of threads and memory. Speedup results for an unmodified instance of the Gaussian code indicated that it scales similarly to an instance in which all memory is allocated on one node.

Second, a straightforward extension of the LPM to NUMA systems was proposed and evaluated. The extension uses a simple additive model to account for the cost penalty associated with fetching data items from non-local NUMA domains. The NUMA and multi-threaded LPM requires cache miss counts for each NUMA domain. Using targeted thread and memory placements, a set of base timings was obtained, and combinations of these were used to calculate times for new thread and memory placements. The extended model was then evaluated for single- and multi-threaded experiments. Results for up to 4 threads had a 5% error in prediction, whereas for 8 threads the error increased to 15% for the dual-core case. The extended model does not account for interconnect contention explicitly. It was proposed that two additional scaling factors could be used to reduce the modelling errors on multi-core platforms.

Third, the use of page migration to affect data locality was assessed using the memory placement APIs in the Solaris operating system. These were used to place Fock matrices locally and to interleave the pages of the Density matrix across nodes to reduce contention. A series of placement experiments was performed for the HF, BLYP and B3LYP methods. All calculation types benefited from the use of page migration for the Fock matrix and node interleaving of the Density matrix.
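The additive NUMA extension summarised in the second part above can be illustrated with a minimal sketch. The exact form of the LPM is defined earlier in the thesis and is not reproduced here, so the function name `predict_numa_time`, its parameters and the numbers in the example are all hypothetical. The sketch only shows the general idea of adding a per-domain miss penalty to a base time measured with local placement.

```python
def predict_numa_time(base_time, misses_per_domain, penalty_per_miss):
    """Additive estimate of run time for a given thread/memory placement.

    base_time         -- time (s) measured with threads and memory co-located
                         (hypothetical input, not a value from the thesis)
    misses_per_domain -- dict: NUMA domain id -> cache misses served by it
    penalty_per_miss  -- dict: NUMA domain id -> extra seconds per miss
                         relative to a local fetch (0.0 for the local domain)
    """
    extra = 0.0
    for domain, misses in misses_per_domain.items():
        # Each miss served by a domain adds that domain's penalty.
        # A domain without a calibrated penalty raises KeyError on purpose.
        extra += misses * penalty_per_miss[domain]
    return base_time + extra


# Illustrative numbers only: domain 0 is local, domain 1 is one hop away.
example = predict_numa_time(
    base_time=10.0,
    misses_per_domain={0: 1_000_000, 1: 500_000},
    penalty_per_miss={0: 0.0, 1: 2e-6},
)
print(f"predicted time: {example:.3f} s")
```

Because the penalties are simply summed, a sketch of this kind shares the limitation noted above: it has no term for interconnect contention between threads.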

For future work, it would be useful to incorporate the extended LPM as a feedback loop into PRISM, PRISMC and CALDFT to adaptively alter thread and memory placement decisions based on the topology of the underlying hardware.

A Comparative Study of Charges Obtained for a Set of Water Cluster Complexes

6.1 Introduction

The primary focus of this thesis is the creation and use of performance models for the Gaussian quantum chemistry code on NUMA platforms. The electronic structure calculations performed in previous chapters generated a wavefunction for a fixed molecular geometry. As the wavefunction is a mathematical construct, computational chemists have devised procedures to extract meaning from it in line with their ‘chemical intuition’. One such model relates to the assignment of charge to the constituent atoms of a molecular system. Charges are experimentally observable quantities [213], and relate to chemical processes such as bond formation, electronegativity, polarization and sites for electrophilic or nucleophilic attack [188, 315].

In this chapter the Gaussian code is used to perform charge analysis of the electronic wavefunction for a set of molecular systems. This analysis is an example of how Gaussian is used by computational chemists. In Chapter 4 two test molecular systems (k300a-04 and k300a-08) were used in assessing the Linear Performance Model (LPM). These two test systems are part of a larger ensemble of molecular systems used in this chapter. The larger ensemble lies at the forefront of system sizes that are amenable to calculation at present.

In 2004, Bliznyuk and Rendell [28] studied the electronic charge on a potassium ion (K+) located at two specific positions in a large molecular structure known as a potassium ion channel [167]. (The potassium ion channel is a pore-like protein, present in all biological cells, that is responsible for regulating the flow of ions into and out of the cell. A schematic is shown in Figure 6.1.) Their charge results are reproduced in Table 6.1. The table gives the net charge on the K+ ion obtained using the HF and B3LYP methods with a 6-31G* basis set. Charges were obtained using the Mulliken Population Analysis (MPA) method [189].

Table 6.1: Mulliken Charges (au) obtained using HF and the B3LYP DFT functional on a K+ ion positioned at two locations (Point B, Point C). The 6-31G* basis set was used for both methods. Reproduced from Table 5 in [28].

Shell      Structure 2 (Point B)    Structure 3 (Point C)
Cutoff     HF        DFT            HF        DFT
 3         0.745     0.580          0.572     0.277
 5         0.617     0.369          0.565     0.261
 6         0.490     0.169          0.540     0.221
 8         0.433     0.079          0.532     0.195
10         0.426     0.068          0.530     0.193

Table 6.1 refers to two structures (‘Structure 2’ and ‘Structure 3’), in which the potassium ion is located at ‘Point B’ or ‘Point C’ respectively. These two locations were chosen as they represent different electronic environments for the passage of the K+ ion through the channel. The two structures were taken from a molecular dynamics (MD) study performed in earlier work by Bliznyuk et al. [29]. The ‘Shell Cutoff’ column in the table refers to the distance criterion used to determine whether a given molecule was included when computing the charge on the K+ ion. A key point is the variation of charge on the K+ ion: the DFT results show that the K+ ion loses almost all of its charge as the system size is increased, and this is true for both structures. The dramatic reduction in charge on the K+ ion using DFT is clearly an unphysical result.

Bliznyuk and Rendell point out that this result reinforces “the view that electron density results obtained from DFT calculations should be viewed with extreme caution, especially for large molecules”. They also note that the dramatic reduction in charge on K+ is due to DFT methods overemphasising the importance of polarization in large molecules, as has been observed elsewhere [103], but they do not prescribe alternative strategies for performing charge analysis using DFT methods¹.

The aim of this study is to further investigate the issue of how to obtain charges for K+ ions using DFT wavefunctions, including the use of two different charge analysis methods – the Mulliken Population Analysis and Natural Population Analysis (NPA).
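As a reminder of what the first of these methods computes, the standard Mulliken expression assigns atom A the charge q_A = Z_A - sum over basis functions mu on A of (PS)_mu,mu, where P is the density matrix, S the overlap matrix and Z_A the nuclear charge. The following minimal sketch implements that textbook formula in plain Python. It is not the Gaussian implementation, and the function name, argument layout and the two-function H2-like example are assumptions made purely for illustration.

```python
def mulliken_charges(P, S, basis_to_atom, nuclear_charges):
    """Return Mulliken charges q_A = Z_A - sum_{mu on A} (P S)_{mu mu}.

    P, S            -- square density and overlap matrices (lists of lists)
    basis_to_atom   -- basis_to_atom[mu] is the index of the atom that
                       basis function mu is centred on
    nuclear_charges -- Z_A for each atom
    """
    n = len(P)
    populations = [0.0] * len(nuclear_charges)
    for mu in range(n):
        # Gross population of basis function mu: diagonal element of P*S.
        gross = sum(P[mu][nu] * S[nu][mu] for nu in range(n))
        populations[basis_to_atom[mu]] += gross
    return [z - pop for z, pop in zip(nuclear_charges, populations)]


# Illustrative minimal-basis H2-like system (overlap value is hypothetical).
s = 0.65
S = [[1.0, s], [s, 1.0]]
# Doubly occupied bonding orbital: every element of P equals 1 / (1 + s).
p = 1.0 / (1.0 + s)
P = [[p, p], [p, p]]
print(mulliken_charges(P, S, basis_to_atom=[0, 1], nuclear_charges=[1.0, 1.0]))
```

By symmetry each atom in this example receives one electron, so both charges vanish to within rounding. The equal split of the overlap population between the two centres is the feature that makes MPA sensitive to the choice of basis set.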

The chapter layout is as follows: Section 6.2 presents background material. Section 6.3 briefly discusses the test molecular systems, software and methodologies used in this chapter. Section 6.4 presents results for charges on a K+ ion obtained from the HF, BLYP and B3LYP methods using both the MPA and NPA methods. The distribution of charge within a water cluster system is analyzed in Section 6.5. Following this, Section 6.6 presents an analysis of charge distribution in a water cluster as a function of the radial distribution function. In Section 6.7 we compare spherically integrated electron density obtained from the HF, BLYP and B3LYP methods. Section 6.8 considers the use of alternative basis sets. Previous work is discussed in Section 6.9 and the chapter concludes in Section 6.10.

Figure 6.1: Schematic illustration of the KcsA potassium ion channel showing the location of Points B and C. Taken from [28].

¹ DFT methods are widely used due to their O(...) scaling for large systems and their ability to obtain ...

In document Performance Models for Electronic Structure Methods on Modern Computer Architectures (Page 179-183)