
5.4 Self-consistent field algorithms

5.4.7 Combining self-consistent field algorithms

In the previous sections we mentioned quite a few approaches to solve the HF minimisation problem using a self-consistent field ansatz. Needless to say, different algorithms tend to perform best in different cases. For this reason, a mixture of methods is often employed in practice to guarantee fast and reliable convergence. This section represents my own judgement of the situation and gives some suggestions based on my own experience. Hardly any of this has resulted from proper scientific evaluation²⁴, so it should not be taken as a final answer, but rather as a guideline.

In the beginning of the procedure, ODA or tODA work well, since they reliably steer the coefficients in the right direction and break the oscillatory behaviour of the plain Roothaan algorithm. The energy DIIS can be seen as an accelerated improvement of these methods and is, in my view, the recommended initial SCF algorithm for cGTO-based discretisations.

For the intermediate steps, a Pulay DIIS typically converges faster than the energy DIIS [201]. This can be rationalised by considering the conditions on the coefficients of the linear combination of Fock matrices. In the Pulay DIIS these conditions are much laxer than in the energy DIIS²⁵: the coefficients only need to sum to one, whereas the energy DIIS additionally requires them to be non-negative. This makes it easier to explore the SCF manifold and to search for directions that lead to nearby stationary points.
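To illustrate how lax the Pulay constraint is, here is a minimal sketch of the Pulay DIIS extrapolation step in plain Python. It assumes the error vectors e_i of the previous iterations are already available as lists of floats (a common choice is the flattened commutator FDS - SDF); the function names solve_linear and pulay_diis_coefficients are hypothetical. The coefficients minimise the norm of the combined error subject only to summing to one, enforced through a Lagrange multiplier.

```python
from typing import List

Vector = List[float]


def solve_linear(a: List[List[float]], b: Vector) -> Vector:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def pulay_diis_coefficients(errors: List[Vector]) -> Vector:
    """Minimise |sum_i c_i e_i|^2 subject only to sum_i c_i = 1 (no sign constraint)."""
    k = len(errors)
    # Overlap matrix B_ij = <e_i, e_j>, bordered by the Lagrange-multiplier row and column.
    a = [[sum(x * y for x, y in zip(ei, ej)) for ej in errors] + [-1.0] for ei in errors]
    a.append([-1.0] * k + [0.0])
    rhs = [0.0] * k + [-1.0]
    return solve_linear(a, rhs)[:k]  # drop the Lagrange multiplier


if __name__ == "__main__":
    # Two error vectors pointing the same way: the optimal combination
    # extrapolates beyond them and therefore needs a negative coefficient.
    print(pulay_diis_coefficients([[2.0], [1.0]]))  # [-1.0, 2.0]
```

The coefficients (-1, 2) cancel the error exactly, but an energy DIIS would not admit them, since it restricts the search to convex combinations of the previous Fock matrices.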

²⁴ Unfortunately, I am not aware of any work that properly compares the large range of SCF algorithms with one another. Most papers only compare against the Pulay DIIS.

25

140 CHAPTER 5. NUMERICAL APPROACHES FOR SOLVING HF

Close to convergence the DIIS becomes numerically less stable, whereas second-order SCF schemes like QCSCF now show the fastest and most reliable convergence to the SCF minimum. These should therefore be employed in the final SCF steps, after the DIIS, in order to obtain a highly accurate SCF minimum.

5.5 Takeaway

The SCF algorithms we discussed in this chapter all follow the general scheme in which a Fock-update step and a coefficient-update or density-matrix-update step are executed repeatedly. In the former step a new Fock matrix $F^{(n)}$ is constructed from the present set of SCF coefficients $C^{(n)}$ or the present density matrix $D^{(n)}$. In the latter step this Fock matrix $F^{(n)}$, possibly together with additional insight gained in previous iterations, is used to generate a new set of coefficients $C^{(n+1)}$ and possibly from these a new density matrix $D^{(n+1)}$. For Roothaan's repeated diagonalisation, the optimal damping algorithm and the geometric direct minimisation algorithm this sequence of steps is emphasised in figures 5.14 on page 130 and 5.15 on the previous page, where in each case the Fock-update step is highlighted in red and the coefficient/density-update step in blue. Since even algorithms as different in structure as these fit this pattern, I consider it reasonable to assume that all SCF algorithms can be thought of as such a two-step process.
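The two-step structure can be captured by a driver that only knows about two exchangeable update functions. The following is a minimal sketch under strong simplifying assumptions: the "density" and "Fock matrix" are plain floats, and toy_fock and toy_density form a hypothetical scalar model chosen only because its fixed-point iteration is a contraction. It is not Hartree-Fock and not any particular SCF algorithm from this chapter.

```python
import math
from typing import Callable, Tuple


def two_step_scf(fock_update: Callable[[float], float],
                 density_update: Callable[[float], float],
                 d0: float, tol: float = 1e-10, max_iter: int = 200) -> Tuple[float, int]:
    """Alternate a Fock-update and a density-update step until self-consistency."""
    d = d0
    for n in range(max_iter):
        f = fock_update(d)         # Fock-update step: F^(n) from D^(n)
        d_new = density_update(f)  # density-update step: D^(n+1) from F^(n)
        if abs(d_new - d) < tol:
            return d_new, n + 1
        d = d_new
    raise RuntimeError("SCF did not converge within max_iter iterations")


# Hypothetical scalar toy model, NOT Hartree-Fock: the "Fock matrix" depends
# linearly on the "density", and the new "density" is a Fermi-like occupation.
def toy_fock(d: float) -> float:
    return 0.5 + 1.0 * d


def toy_density(f: float) -> float:
    return 1.0 / (1.0 + math.exp(f))


if __name__ == "__main__":
    d, iterations = two_step_scf(toy_fock, toy_density, d0=0.0)
    print(f"self-consistent d = {d:.8f} after {iterations} iterations")
```

In this sketch the update functions are stateless; an algorithm that uses insight from previous iterations, such as a DIIS history, would keep that state inside its density-update step, leaving the driver unchanged.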

Another key result of this chapter is that different basis function types give rise to a different numerical structure of the quantities involved in the SCF procedure. We focused mostly on the Fock matrices of contracted Gaussian, finite-element and Coulomb-Sturmian discretisations, which are shown in figures 5.4 on page 100, 5.9 on page 112 and 5.13 on page 125. These matrices differ both in size and in sparsity. For both FE-based and CS-based discretisations, a contraction-based ansatz²⁶, where one avoids building the Fock matrix altogether and instead thinks in terms of matrix-vector applications, showed noteworthy improvements in formal computational scaling.

As we will discuss in depth in the next chapter, a contraction-based ansatz can be thought of as a generalisation of a scheme that keeps the matrices in memory. This suggests targeting a contraction-based SCF scheme to achieve maximum generality of the SCF algorithm and, potentially, independence of the SCF code from the basis function type in a quantum-chemistry program.
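Why a stored matrix is a special case of a contraction-based operator can be sketched in a few lines: if generic code only ever asks for the action of an operator on a vector, a matrix kept in memory is simply one way of providing that action. The class names DenseOperator and ContractionOperator and the method name apply are hypothetical, chosen for this illustration only.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class DenseOperator:
    """Operator backed by a matrix stored in memory."""

    def __init__(self, rows: Sequence[Sequence[float]]):
        self.rows = [list(r) for r in rows]

    def apply(self, x: Vector) -> Vector:
        return [sum(a * b for a, b in zip(row, x)) for row in self.rows]


class ContractionOperator:
    """Operator known only through a function computing its action on a vector."""

    def __init__(self, apply_fn: Callable[[Vector], Vector]):
        self._apply_fn = apply_fn

    def apply(self, x: Vector) -> Vector:
        return self._apply_fn(x)


def rayleigh_quotient(op, x: Vector) -> float:
    """Generic routine: it needs nothing but op.apply, so both operator kinds work."""
    ax = op.apply(x)
    return sum(a * b for a, b in zip(x, ax)) / sum(a * a for a in x)


if __name__ == "__main__":
    dense = DenseOperator([[2.0, 1.0], [1.0, 2.0]])
    lazy = ContractionOperator(lambda x: [2.0 * x[0] + x[1], x[0] + 2.0 * x[1]])
    print(rayleigh_quotient(dense, [1.0, 1.0]), rayleigh_quotient(lazy, [1.0, 1.0]))
```

Code written against such an interface does not need to know which basis function type produced the operator, which is the kind of independence referred to above.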

As mentioned before, this implies formulating the SCF in terms of coefficients in order to exploit the favourable computational scaling for some basis function types like the FE or the CS functions. We indicated for the ODA algorithm how approximations allow one to transform this density-matrix-based SCF into the tODA scheme, which can be formulated as a contraction-based SCF. In section 5.1 on page 86 we furthermore gave more general suggestions that, in theory, allow one to transform every density-matrix-based SCF into a coefficient-based SCF. We therefore believe it possible to construct an efficient contraction-based SCF that is independent of the type of basis function used and in which one can switch between multiple algorithms depending on the numerical requirements of the basis functions as well as the chemical system. This in turn opens the door to a single quantum-chemistry program that is, in theory, compatible with every type of basis function. We will present such a program in chapter 7.
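One reason a coefficient-based formulation pays off can be seen from applying the density matrix to a vector. Assuming, for illustration, the convention $D = C C^T$ with $C$ holding only the occupied coefficient vectors (occupation factors omitted), the product $Dx = C(C^T x)$ can be evaluated as two contractions costing $O(N n_\text{occ})$, without ever storing the $N \times N$ matrix $D$. The function names below are hypothetical.

```python
from typing import List

Vector = List[float]


def apply_density(c_occ: List[Vector], x: Vector) -> Vector:
    """Compute D x with D = C C^T without building D.

    c_occ holds the n_occ occupied coefficient vectors (columns of C), each of length N.
    """
    # First contraction over the basis index: y = C^T x (n_occ numbers).
    y = [sum(c_mu * x_mu for c_mu, x_mu in zip(col, x)) for col in c_occ]
    # Second contraction over the orbital index: C y (N numbers).
    return [sum(col[mu] * y_k for col, y_k in zip(c_occ, y)) for mu in range(len(x))]


def build_density_explicitly(c_occ: List[Vector]) -> List[Vector]:
    """Reference only: stores all N x N elements of D = C C^T."""
    n = len(c_occ[0])
    return [[sum(col[mu] * col[nu] for col in c_occ) for nu in range(n)] for mu in range(n)]


if __name__ == "__main__":
    c_occ = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    x = [1.0, 2.0, 3.0]
    d = build_density_explicitly(c_occ)
    print(apply_density(c_occ, x), [sum(a * b for a, b in zip(row, x)) for row in d])
```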

26

Chapter 6

Contraction-based algorithms and lazy matrices

There is a race between the increasing complexity of the systems we build and our ability to develop intellectual tools for understanding their complexity. If the race is won by our tools, then systems will eventually become easier to use and more reliable. If not, they will continue to become harder to use and less reliable for all but a relatively small set of common tasks. Given how hard thinking is, if those intellectual tools are to succeed, they will have to substitute calculation for thought.

— Leslie Lamport (1941–present)

Summarised in one sentence, the main idea of contraction-based algorithms is to avoid storing large matrices or tensors in memory and instead employ highly optimised contraction expressions for the necessary computations. We already saw in the previous chapter that applying such a strategy to the Fock matrix resulting from a FE-based or a CS-based discretisation can lead to an improved formal computational scaling, making these methods a promising approach. Contraction-based algorithms are, however, not at all limited to SCF procedures or quantum-chemical calculations. This chapter gives a general overview of contraction-based methods, presents some examples where these methods are employed and discusses their potential as well as some drawbacks.
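A textbook instance of this idea is the product of a Kronecker product $A \otimes B$ with a vector. Reading the vector $x$ as a matrix $X$ with $X_{ij} = x_{i n_B + j}$, one has $(A \otimes B)x = \operatorname{vec}(A X B^T)$, so two small contractions replace the $(n_A n_B)^2$ elements of the full matrix. The sketch below assumes square $A$ and $B$ and row-major flattening; kron_dense exists only to check the result.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def kron_apply(a: Matrix, b: Matrix, x: Vector) -> Vector:
    """Compute (A kron B) x without forming the Kronecker product (square A, B)."""
    na, nb = len(a), len(b)
    xm = [x[i * nb:(i + 1) * nb] for i in range(na)]  # X[i][j] = x[i * nb + j]
    # First contraction, over the index of A: T = A X.
    t = [[sum(a[i][k] * xm[k][j] for k in range(na)) for j in range(nb)] for i in range(na)]
    # Second contraction, over the index of B: Y = T B^T.
    y = [[sum(t[i][j] * b[l][j] for j in range(nb)) for l in range(nb)] for i in range(na)]
    return [v for row in y for v in row]


def kron_dense(a: Matrix, b: Matrix) -> Matrix:
    """Reference only: the explicit (na*nb) x (na*nb) Kronecker product."""
    na, nb = len(a), len(b)
    return [[a[i][k] * b[l][j] for k in range(na) for j in range(nb)]
            for i in range(na) for l in range(nb)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[0.0, 1.0], [1.0, 5.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(kron_apply(a, b, x))
    print([sum(m * v for m, v in zip(row, x)) for row in kron_dense(a, b)])
```

The contraction-based version needs $O(n_A n_B (n_A + n_B))$ operations and no storage beyond the factors and the vectors, compared with $O(n_A^2 n_B^2)$ for the explicit matrix.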

Closely connected to contraction-based methods is the concept of lazy matrices, a direct generalisation of conventional matrices in the form of a domain-specific language for coding contraction-based algorithms. The main goal of the lazy matrix language is to yield code that can be used both with matrices stored in memory and in a contraction-based fashion without noteworthy changes. A preliminary C++ implementation of lazy matrices with a focus on user-friendliness and flexibility is available in the lazyten library.
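The flavour of such a language can be conveyed by a small Python sketch in which arithmetic on matrices only builds an expression tree, and the tree is evaluated solely through matrix-vector applications. This is a hypothetical illustration of the concept; it is not the interface of the C++ lazyten library, and all class names are made up for this example.

```python
from typing import Callable, List

Vector = List[float]


class LazyMatrix:
    """Base class: a matrix defined solely by its action on vectors."""

    def apply(self, x: Vector) -> Vector:
        raise NotImplementedError

    def __add__(self, other: "LazyMatrix") -> "LazyMatrix":
        return LazySum(self, other)      # nothing is computed yet

    def __matmul__(self, other: "LazyMatrix") -> "LazyMatrix":
        return LazyProduct(self, other)  # nothing is computed yet


class Stored(LazyMatrix):
    """A conventional matrix kept in memory, usable in the same expressions."""

    def __init__(self, rows: List[Vector]):
        self.rows = [list(r) for r in rows]

    def apply(self, x: Vector) -> Vector:
        return [sum(a * b for a, b in zip(row, x)) for row in self.rows]


class FromFunction(LazyMatrix):
    """A matrix given only by a contraction expression."""

    def __init__(self, fn: Callable[[Vector], Vector]):
        self.fn = fn

    def apply(self, x: Vector) -> Vector:
        return self.fn(x)


class LazySum(LazyMatrix):
    def __init__(self, a: LazyMatrix, b: LazyMatrix):
        self.a, self.b = a, b

    def apply(self, x: Vector) -> Vector:
        return [u + v for u, v in zip(self.a.apply(x), self.b.apply(x))]


class LazyProduct(LazyMatrix):
    def __init__(self, a: LazyMatrix, b: LazyMatrix):
        self.a, self.b = a, b

    def apply(self, x: Vector) -> Vector:
        return self.a.apply(self.b.apply(x))  # (A B) x = A (B x)


if __name__ == "__main__":
    a = Stored([[1.0, 2.0], [3.0, 4.0]])
    d = FromFunction(lambda x: [2.0 * x[0], 3.0 * x[1]])  # diag(2, 3), never stored
    expr = (a + d) @ a  # builds an expression tree only
    print(expr.apply([1.0, 1.0]))  # [23.0, 58.0]
```

Stored and function-defined matrices mix freely in one expression, and neither the sum nor the product is ever formed as a matrix.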


6.1 Contraction-based algorithms

The underlying idea of contraction-based methods, namely to avoid storing large matrices in favour of using matrix-vector-product expressions, is hardly new. In his 1975 paper [66], Davidson not only describes his now famous iterative diagonalisation method (see section 3.2.6), but also suggests using an algorithmic expression for computing the required matrix-vector products. The use case Davidson had in mind was the diagonalisation of the CI or full CI matrix, which is, even today, too large to keep in memory, see remark 4.9 on page 56.

Nowadays contraction-based methods are rather widespread in quantum chemistry, even though the contraction expressions are sometimes given different names, such as working equations, which makes the concept less apparent. Examples are recent implementations of the algebraic diagrammatic construction (ADC) scheme [209–211], which do not build the complete ADC matrix to be diagonalised, and efficient coupled-cluster schemes [84], which similarly avoid explicitly constructing the matrix governing the CC root-finding problem. Instead, both methods use appropriate tensor contractions and compute matrix-vector products on the fly during the respective iterative solves. A somewhat related take on this are the recent matrix-free methods [162] for solving partial differential equations in a finite-element discretisation without building the system matrix in memory at all.

From the algorithmic point of view one should notice that in particular the direct eigensolvers and linear solvers, as they are implemented in LAPACK [212], require random access into the matrix and are thus not available for a contraction-based ansatz. In practice this is an acceptable restriction. Firstly, because for large matrices direct methods become unfavourably expensive anyway¹. Secondly, because many diagonalisation methods and methods for solving linear systems do not need the problem matrix in memory. Instead they can be operated just like the Davidson algorithm [66], by coding an expression that delivers the required matrix-vector products. Practically all Krylov-subspace approaches fall into this category, including widely adopted algorithms like Arnoldi, Lanczos, conjugate gradient or GMRES [62, 63, 67]. In the context of eigenproblems, one should mention that such iterative methods have an additional disadvantage: it is typically very costly to obtain a large number of eigenpairs of the diagonalised matrix. Fortunately this is hardly needed for large matrices, and techniques like Chebyshev filtering [213–215] or spectral transformations (see section 3.2.4 on page 38) allow one to effectively direct the diagonalisation routines towards the part of the eigenspectrum one is truly interested in.
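As an example of a Krylov-subspace method that touches the problem matrix only through matrix-vector products, here is a sketch of the textbook conjugate-gradient algorithm for symmetric positive definite systems. The test problem is an assumption made for illustration: the one-dimensional finite-difference Laplacian tridiag(-1, 2, -1), whose product with a vector is coded directly, so the matrix is never stored. All function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(apply_a: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A, accessing A only via apply_a."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = apply_a(p)  # the only place where A enters
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def apply_laplacian_1d(x: Vector) -> Vector:
    """Matrix-free product with tridiag(-1, 2, -1) in O(n) time and memory."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    n = 50
    x = conjugate_gradient(apply_laplacian_1d, [1.0] * n)
    # For this right-hand side the exact solution is x_i = (i + 1)(n - i) / 2.
    print(max(abs(x[i] - (i + 1) * (n - i) / 2) for i in range(n)))
```

Besides the matrix-vector product, the algorithm keeps only a handful of vectors of length n, which is the linear memory footprint discussed in the next paragraph.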

On the one hand, employing a contraction-based method thus hardly restricts the range of numerical problems that can be tackled. On the other hand, avoiding the storage of the problem matrix immediately reduces the memory scaling from quadratic to linear in the system size. The rationale is that the memory bottleneck of most subspace algorithms is storing the generated subspace, i.e. a fixed number of vectors, each of which requires linear storage. This makes contraction-based methods especially attractive for problems where memory is a bottleneck. Consequently the concept has been introduced in a range of fields of numerics and scientific computing under different names. Terms like apply-based method or matrix-free method, or phrases like using matrix-vector product expressions or using matrix-vector products, all largely describe the same concept. I personally like the term contraction-based best, because in many of the cases I came across, evaluating such matrix-vector products involves, under the hood, contractions over tensors of rank larger than 2. Consider for example the coupled-cluster doubles working equations (4.96) or the contraction expression for the exchange matrix in a CS-based discretisation of Hartree-Fock (5.52). Additionally, calling such algorithms contraction-based indicates that the idea of substituting storage by expressions is more general than the matrix-vector product. In theory, one could think of similar approaches for higher-order tensor contractions as well.

Storage layer    Latency / ns    FLOPs
L1 cache         0.5             13
L2 cache         7               180
Main memory      100             2600
SSD read         1.5 · 10⁴       4 · 10⁵
HDD read         1 · 10⁷         3 · 10⁸

Table 6.1: Typical latencies for random access into selected layers of storage. The right-hand column gives the peak number of floating-point operations a Sandy Bridge CPU with 3.2 GHz clock frequency could perform in the same time, assuming perfect pipelining (about 25.6 FLOPs per nanosecond, i.e. 8 FLOPs per clock cycle). Data taken from [216] and [217]. Notice that the seek time on HDDs averages out in sequential reads: for example, reading 1 MB from disk only takes about 2 · 10⁷ ns, i.e. only twice as long as the seek by itself. For other types of storage this effect is less pronounced.
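To put the quadratic-versus-linear memory argument into numbers, the following sketch compares storing a dense $N \times N$ matrix of double-precision numbers with storing $k$ subspace vectors of length $N$. The values $N = 10^5$ and $k = 30$ are arbitrary illustrative assumptions, and all auxiliary storage is ignored.

```python
BYTES_PER_DOUBLE = 8


def dense_matrix_bytes(n: int) -> int:
    """Memory for an n x n matrix of doubles: quadratic in n."""
    return BYTES_PER_DOUBLE * n * n


def subspace_bytes(n: int, k: int) -> int:
    """Memory for k subspace vectors of length n: linear in n for fixed k."""
    return BYTES_PER_DOUBLE * n * k


if __name__ == "__main__":
    n, k = 100_000, 30
    print(f"dense matrix: {dense_matrix_bytes(n) / 1e9:.1f} GB")          # 80.0 GB
    print(f"{k} subspace vectors: {subspace_bytes(n, k) / 1e6:.1f} MB")  # 24.0 MB
```

Doubling $N$ quadruples the first number but only doubles the second, which is why the subspace, rather than the matrix, becomes the limiting factor once the matrix is replaced by a contraction expression.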