
4.8 Complexity of Ewald summation and parameter selection

4.8.6 Selection of P²NFFT parameters

A crucial point for the success of a numerical algorithm is the determination of good parameters. We aim to fulfill prescribed error bounds at the shortest runtime, which emphasizes the need for accurate error estimates. Error estimates for P³M have been derived in [39] and can also be used to determine the parameters of the 3d-periodic P²NFFT. In this section, however, we present an alternative parameter tuning based on the Ewald summation error estimates (4.42), (4.43) presented in Section 4.8. Our approach enables us to reach very high accuracy and can be generalized to non-periodic boundary conditions. For the sake of simplicity, we assume a cubic box of size $L \times L \times L$. The list of P²NFFT parameters is given as follows:

• short-range cutoff $r_c > 0$,
• Ewald splitting parameter $\alpha > 0$,
• NFFT grid size $\hat{m} \in 2\mathbb{N}^3$,
• FFT grid size $M = \sigma \hat{m} \in 2\mathbb{N}^3$,
• NFFT window cutoff $c_\phi \in \mathbb{N}$,
• extended box shape $H \in \mathbb{R}^{3 \times 3}$, and
• degree of smoothness $s \in \mathbb{N}$ of the regularization.

Here, we introduced the vector-valued oversampling factor $\sigma := M \hat{m}^{-1} \in [1, \infty)^3$, where the quotient is taken componentwise. We will see that tuning $\sigma$ instead of $M$ is more convenient, since $\sigma$ stays constant for $N \to \infty$.

In the following, we present our parameter tuning for 3d-periodic boundary conditions. Afterward, we extend this approach to non-periodic boundary conditions. The starting point of our parameter tuning is a sensible choice of the short-range cutoff $r_c$. It should be chosen such that the computation of the short-range interaction part does not become too expensive. Next, we compute $\alpha$ and $k_c$ from (4.42) such that $\Delta\varphi_{\mathrm{sr}} = \varepsilon/\sqrt{2}$ and $\Delta\varphi_{\mathrm{lr}} = \varepsilon/\sqrt{2}$ are fulfilled for a prescribed rms potential error bound $\varepsilon$. The resulting values of $\alpha$ and $k_c$ are given by

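$$
\alpha = \frac{1}{r_c} \sqrt{W\!\left( \sqrt{\frac{Q\, r_c}{L^3}}\, \frac{1}{\tilde\varepsilon} \right)}, \qquad
k_c = \frac{\sqrt{3}}{2}\, \frac{\alpha L}{\pi} \sqrt{W\!\left[ \frac{4}{3} \left( \frac{2}{\alpha\pi}\, \frac{Q}{L^3} \right)^{2/3} \frac{1}{\tilde\varepsilon^{4/3}} \right]}, \tag{4.53}
$$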

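where $\tilde\varepsilon := \varepsilon/\sqrt{2}$. Note that (4.44) implies that the same formulas can be used with $\tilde\varepsilon := \sqrt{N Q^{-1}}\, \varepsilon/\sqrt{2}$ in order to ensure an rms energy bound $\Delta U \le \varepsilon$. Alternatively, if $\varepsilon$ were the rms field error bound, we would compute $\alpha$ and $k_c$ from (4.43) such that $\Delta E_{\mathrm{sr}} = \varepsilon/\sqrt{2}$ and $\Delta E_{\mathrm{lr}} = \varepsilon/\sqrt{2}$, i.e.,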

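$$
\alpha = \frac{1}{r_c} \sqrt{\ln\!\left( \sqrt{\frac{Q}{r_c L^3}}\, \frac{2}{\tilde\varepsilon} \right)}, \qquad
k_c = \frac{\alpha L}{2\pi} \sqrt{W\!\left[ 64 \left( \frac{\alpha}{\pi^3}\, \frac{Q}{L^3} \right)^{2} \frac{1}{\tilde\varepsilon^{4}} \right]}. \tag{4.54}
$$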

Again by (4.44), the same formulas can be used with $\tilde\varepsilon := \sqrt{N Q^{-1}}\, \varepsilon/\sqrt{2}$ to ensure the rms field error bound $\Delta F \le \varepsilon$. In [93, Figure 7] it was shown numerically that these formulas also hold for the mixed-periodic case. Note that $k_c$ represents a radial cutoff in Fourier space, whereas the NFFT grid size $\hat{m}$ gives the side lengths of a box-shaped cutoff scheme. Therefore, we choose the components of $\hat{m} = (\hat{m}_0, \hat{m}_1, \hat{m}_2)$ as

$$
\hat{m}_t = 2 \lceil k_c \rceil, \qquad t = 0, 1, 2. \tag{4.55}
$$
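The potential-based choice of $\alpha$, $k_c$ and $\hat{m}$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $W$ in (4.53) is read as the principal branch of the Lambert W function, $Q$ is passed in as a given number, and the helper names `lambert_w` and `ewald_parameters_potential` are hypothetical and not part of any P²NFFT interface. The Lambert W function is evaluated with a plain Newton iteration so that the block needs only the standard library.

```python
import math


def lambert_w(z, tol=1e-12, max_iter=100):
    """Principal branch W(z) for z >= 0: the solution w >= 0 of w * exp(w) = z."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    # log(1 + z) bounds W(z) from above for z >= 0, so Newton's method
    # on the convex function f(w) = w * exp(w) - z decreases monotonically.
    w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def ewald_parameters_potential(r_c, L, Q, eps):
    """Return (alpha, k_c, m_hat) following (4.53) and (4.55).

    Assumption of this sketch: eps is the rms potential error bound and
    both error contributions are set to eps / sqrt(2).
    """
    eps_t = eps / math.sqrt(2.0)
    # alpha from the short-range condition
    alpha = math.sqrt(lambert_w(math.sqrt(Q * r_c / L**3) / eps_t)) / r_c
    # k_c from the long-range condition
    arg = (4.0 / 3.0) * (2.0 * Q / (alpha * math.pi * L**3)) ** (2.0 / 3.0)
    arg /= eps_t ** (4.0 / 3.0)
    k_c = 0.5 * math.sqrt(3.0) * alpha * L / math.pi * math.sqrt(lambert_w(arg))
    # box-shaped cutoff that encloses the radial cutoff k_c, see (4.55)
    m_t = 2 * math.ceil(k_c)
    return alpha, k_c, (m_t, m_t, m_t)
```

For the field-based variant (4.54), the same structure applies with the logarithm in place of the first Lambert W evaluation and the corresponding argument in the second one.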

Next, we need to tune the NFFT parameters such that the NFFT approximation error is negligible in comparison to the prescribed approximation error $\varepsilon$ of the truncated Ewald sum. More precisely, these parameters are the FFT mesh size $M \in 2\mathbb{N}^3$ and the window cutoff parameter $c_\phi$. We claim that these two parameters need to be tuned only for a small test system. For larger particle numbers the same accuracy can be achieved by keeping the oversampling factor $\sigma = M \hat{m}^{-1} \in [1, \infty)^3$ and the window cutoff $c_\phi$ constant. In Section 4.11.2 we give some numerical evidence that this is true. At the moment the parameters of the NFFT are not tuned automatically, but the development of an automatic NFFT parameter tuning is the subject of current research [102, 103]. Finally, the extended box shape can be chosen as $H = \mathrm{diag}(L, L, L)$, and the degree of smoothness $s$ is not present in the 3d-periodic case.

Once the parameters for the 3d-periodic P²NFFT are found, we can use the following heuristic to find a parameter set for 2d-, 1d-, and 0d-periodic boundary conditions. This heuristic was proposed in [3] and we only give a short summary of the approach. The key idea is to keep the NFFT parameters $c_\phi$, $\sigma$ from the 3d-periodic case and define the grid size $\hat{m}$ in such a way that the number of grid points per volume in the extended box $H[-1/2, 1/2)^3$ remains constant. More precisely, we set

$$
\hat{m}_t =
\begin{cases}
2 \lceil k_c \rceil & \text{if the $t$-th dimension is periodic,} \\
4 \sqrt{3 - p}\, \lceil k_c \rceil + P & \text{if the $t$-th dimension is non-periodic.}
\end{cases} \tag{4.56}
$$

Here, $p$ denotes the number of periodic dimensions, and the so-called transition grid size $P \in 2\mathbb{N}$ denotes a small number of extra grid points that are introduced within the support of the transitions $T_{k_0,k_1}$, $T_{k_0}$, $T$ in (4.17), (4.20), and (4.23). Note that the additional factor $2\sqrt{3 - p}$ in the non-periodic case results from the period of the regularizations (4.17), (4.20), and (4.23). Here, $\sqrt{3 - p}$ reflects the scaling of a $(3 - p)$-dimensional ball into a $(3 - p)$-dimensional cube. Next, the extended box shape $H = \mathrm{diag}(H_0, H_1, H_2)$ is given by

$$
H_t = L\, \frac{\hat{m}_t}{2 \lceil k_c \rceil}. \tag{4.57}
$$

Note that for a given transition grid size $P$ the NFFT grid size $\hat{m}$ and the extended box shape $H$ can be easily computed. Our numerical tests in Section 4.11.3 confirm the claim that $P$ stays constant for $N \to \infty$ and, therefore, does not affect the complexity of the P²NFFT.
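As a minimal sketch of this computation, the following Python function evaluates the grid size and the diagonal of the extended box for a given periodicity pattern. It reads (4.56) as $4\sqrt{3-p}\,\lceil k_c \rceil + P$ in non-periodic dimensions and (4.57) as $H_t = L\,\hat{m}_t / (2\lceil k_c \rceil)$. Since $\sqrt{3-p}$ is irrational for $p = 0$ and $p = 1$, the sketch rounds the non-periodic grid size up to the next even integer. That rounding, like the function name `adjusted_grid_and_box`, is an assumption of this example and is not taken from the text.

```python
import math


def adjusted_grid_and_box(k_c, L, periodic, P):
    """Return (m_hat, H_diag) for a cubic box of edge length L.

    periodic: three booleans, one per dimension.
    P: even transition grid size.
    """
    p = sum(periodic)            # number of periodic dimensions
    kc_ceil = math.ceil(k_c)
    m_hat, H_diag = [], []
    for is_periodic in periodic:
        if is_periodic:
            m_t = 2 * kc_ceil
        else:
            raw = 4.0 * math.sqrt(3 - p) * kc_ceil + P
            # Assumption of this sketch: round up to an even integer
            # so that m_t is a valid grid size in 2N.
            m_t = 2 * math.ceil(raw / 2.0)
        m_hat.append(m_t)
        # extended box edge, see (4.57); equals L in periodic dimensions
        H_diag.append(L * m_t / (2 * kc_ceil))
    return tuple(m_hat), tuple(H_diag)
```

In periodic dimensions the function returns $H_t = L$, so the fully periodic case reproduces $H = \mathrm{diag}(L, L, L)$.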

The parameter selection can be summarized in the following steps:

1. Choose the short-range cutoff $r_c$.
2. Calculate the Ewald splitting parameter $\alpha$ and the NFFT grid size $\hat{m}$ by (4.53)–(4.55).
3. Determine the NFFT oversampling factor $\sigma$ and the window cutoff $c_\phi$ numerically for a small test system.
4. Set the degree of smoothness $s := 10$ and tune the transition grid size $P$ numerically for a small test system.
5. Adjust the grid size $\hat{m}$ according to (4.56) and set the extended box shape $H$ according to (4.57).
6. Apply the NFFT oversampling factor $\sigma$ to the adjusted grid size $\hat{m}$ to get the adjusted FFT grid size $M = \sigma \hat{m}$.

Once all parameters have been determined, we can execute P²NFFT for a small test system and record the runtime $t^{\mathrm{sr}}_N \sim r_c^3 N$ of the short-range computations and the runtime $t^{\mathrm{lr}}_N \sim k_c^3 \log k_c^3$ of the long-range computations. With these two times we can easily extrapolate the value of $r_c$ that minimizes the total runtime $t^{\mathrm{sr}}_N + t^{\mathrm{lr}}_N$ for any particle number $N$.
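The extrapolation can be illustrated with a small Python sketch. It assumes the two-term model $a\, r_c^3 N + b\, k_c^3 \log k_c^3$, calibrates the constants $a$ and $b$ from one measured pair of runtimes, and then scans a list of candidate cutoffs. The function names and the callable `k_c_of_r_c`, which stands for the dependence of $k_c$ on $r_c$ via the parameter formulas, are hypothetical. The sketch also requires $k_c > 1$ so that the logarithm is positive.

```python
import math


def calibrate(t_sr, t_lr, r_c, k_c, N):
    """Fit the model constants from one test run (measured times t_sr, t_lr)."""
    a = t_sr / (r_c**3 * N)
    b = t_lr / (k_c**3 * math.log(k_c**3))
    return a, b


def predicted_runtime(a, b, r_c, k_c, N):
    """Model of the total runtime: short-range part plus long-range part."""
    return a * r_c**3 * N + b * k_c**3 * math.log(k_c**3)


def best_cutoff(a, b, N, k_c_of_r_c, candidates):
    """Return the candidate r_c with the smallest predicted total runtime."""
    return min(
        candidates,
        key=lambda r_c: predicted_runtime(a, b, r_c, k_c_of_r_c(r_c), N),
    )
```

A brute-force scan over candidates is used here for clarity only. Because the short-range term grows and the long-range term shrinks with increasing $r_c$, any one-dimensional minimizer would serve equally well.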

Remark 4.7. The asymptotic runtime $t^{\mathrm{lr}}_N \sim k_c^3 \log k_c^3$ of the long-range part is a rather crude approximation for small $k_c$. Indeed, the runtime of the long-range part accumulates from many steps that have different asymptotic runtimes. The most time-consuming parts are given by the matrix decomposition (3.20) of the NFFT. These are the deconvolution step, the FFT, and the discrete convolution step. In order to get a good prediction, we must measure and extrapolate the runtime of these steps individually. This is possible in our implementation, since all these times are available through the PNFFT timer interface.

In document Massively Parallel, Fast Fourier Transforms and Particle-Mesh Methods: Massiv parallele schnelle Fourier-Transformationen und Teilchen-Gitter-Methoden (Page 127-130)