4.8 Complexity of Ewald summation and parameter selection
4.8.6 Selection of P2NFFT parameters
A crucial point for the success of a numerical algorithm is the determination of good parameters. This means that we aim to fulfill some prescribed error bounds at the
shortest runtime, which emphasizes the need for accurate error estimates. Error estimates for the P3M have been derived in [39] and can also be used to determine
the parameters of the 3d-periodic P2NFFT. However, in this section we present
an alternative parameter tuning based on the Ewald summation error estimates (4.42), (4.43) presented in Section 4.8. Our approach enables us to reach very high accuracy and can be generalized to non-periodic boundary conditions. For the sake of simplicity, we assume a cubic box of size L×L×L. The list of P2NFFT parameters
is given as follows:
• short-range cutoff $r_c > 0$,
• Ewald splitting parameter $\alpha > 0$,
• NFFT grid size $\hat{m} \in 2\mathbb{N}^3$,
• FFT grid size $M = \sigma \hat{m} \in 2\mathbb{N}^3$,
• NFFT window cutoff $c_\varphi \in \mathbb{N}$,
• extended box shape $H \in \mathbb{R}^{3\times 3}$, and
• degree of smoothness $s \in \mathbb{N}$ of the regularization.
Here we have introduced the vector-valued oversampling factor $\sigma := M \hat{m}^{-1} \in [1, \infty)^3$, where the division is understood componentwise. We will see that tuning $\sigma$ instead of $M$ is more convenient, since $\sigma$ stays constant for $N \to \infty$.
In the following, we present our parameter tuning for 3d-periodic boundary conditions. Afterward, we will extend this approach to non-periodic boundary conditions. The starting point of our parameter tuning is a sensible choice of the short-range cutoff $r_c$. It should be chosen such that the computation of the short-range interaction part does not become prohibitively expensive. Next, we compute $\alpha$ and $k_c$ from (4.42) such that $\Delta\phi_{sr} = \varepsilon/\sqrt{2}$ and $\Delta\phi_{lr} = \varepsilon/\sqrt{2}$ are fulfilled for a prescribed rms potential error bound $\varepsilon$, so that the two contributions combine to $\sqrt{\Delta\phi_{sr}^2 + \Delta\phi_{lr}^2} = \varepsilon$. The resulting values of $\alpha$ and $k_c$ are given by
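\[
\alpha = \frac{1}{r_c}\sqrt{W\!\left(\sqrt{\frac{Q r_c}{L^3}}\,\frac{1}{\tilde\varepsilon}\right)},
\qquad
k_c = \frac{\sqrt{3}}{2}\,\frac{\alpha L}{\pi}\sqrt{W\!\left[\frac{4}{3}\left(\frac{2}{\alpha\pi}\,\frac{Q}{L^3}\right)^{2/3}\frac{1}{\tilde\varepsilon^{4/3}}\right]},
\tag{4.53}
\]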
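where $\tilde\varepsilon := \varepsilon/\sqrt{2}$. Note that (4.44) implies that the same formulas can be used with $\tilde\varepsilon := \sqrt{N Q^{-1}}\,\varepsilon/\sqrt{2}$ in order to ensure an rms energy bound $\Delta U \le \varepsilon$. Alternatively, if $\varepsilon$ is the rms field error bound, we compute $\alpha$ and $k_c$ from (4.43) such that $\Delta E_{sr} = \varepsilon/\sqrt{2}$ and $\Delta E_{lr} = \varepsilon/\sqrt{2}$, i.e.,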
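\[
\alpha = \frac{1}{r_c}\sqrt{\ln\!\left(\sqrt{\frac{Q}{r_c L^3}}\,\frac{2}{\tilde\varepsilon}\right)},
\qquad
k_c = \frac{\alpha L}{2\pi}\sqrt{W\!\left[\frac{64\,\alpha}{\pi^3}\left(\frac{Q}{L^3}\right)^{2}\frac{1}{\tilde\varepsilon^{4}}\right]}.
\tag{4.54}
\]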
Again by (4.44), the same formulas can be used with $\tilde\varepsilon := \sqrt{N Q^{-1}}\,\varepsilon/\sqrt{2}$ to ensure the rms force error bound $\Delta F \le \varepsilon$. In [93, Figure 7] it was shown numerically that these formulas also hold for the mixed-periodic case. Note that $k_c$ represents a radial cutoff, whereas the NFFT grid size $\hat{m}$ gives the side lengths of a box-shaped cutoff scheme. Therefore, we choose the components of $\hat{m} = (\hat{m}_0, \hat{m}_1, \hat{m}_2)$ as
\[
\hat{m}_t = 2\lceil k_c \rceil, \quad t = 0, 1, 2. \tag{4.55}
\]
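As a minimal sketch of how (4.53) and (4.55) might be evaluated, the following Python fragment computes $\alpha$, $k_c$, and $\hat{m}_t$ for a prescribed rms potential error bound. It assumes that $W$ denotes the principal branch of the Lambert W function and reads (4.53) as $\alpha = r_c^{-1}\sqrt{W(\sqrt{Q r_c/L^3}/\tilde\varepsilon)}$ and $k_c = \tfrac{\sqrt{3}}{2}\tfrac{\alpha L}{\pi}\sqrt{W[\tfrac{4}{3}(2Q/(\alpha\pi L^3))^{2/3}\tilde\varepsilon^{-4/3}]}$. The function names, the Newton iteration used for $W$, and the sample values are illustrative assumptions and are not taken from the P2NFFT implementation.

```python
import math


def lambert_w(x, tol=1e-14, max_iter=100):
    """Principal branch W(x) for x >= 0: the w >= 0 with w * exp(w) = x."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    if x == 0.0:
        return 0.0
    # log(1 + x) >= W(x) for x >= 0, so Newton's method on the convex
    # function w * exp(w) - x approaches the root monotonically from above.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def ewald_params_potential(rc, L, Q, eps):
    """Return (alpha, kc, m_hat_t) for an rms potential error bound eps.

    Assumed reading of (4.53) and (4.55); Q is the charge-dependent
    quantity that appears in the error estimates.
    """
    eps_t = eps / math.sqrt(2.0)  # error budget per part (short/long range)
    alpha = math.sqrt(lambert_w(math.sqrt(Q * rc / L**3) / eps_t)) / rc
    arg = (4.0 / 3.0) * (2.0 * Q / (alpha * math.pi * L**3)) ** (2.0 / 3.0)
    arg /= eps_t ** (4.0 / 3.0)
    kc = math.sqrt(3.0) / 2.0 * alpha * L / math.pi * math.sqrt(lambert_w(arg))
    m_hat_t = 2 * math.ceil(kc)  # (4.55), identical in all three dimensions
    return alpha, kc, m_hat_t


# Hypothetical sample values, chosen only to exercise the formulas.
alpha, kc, m_hat_t = ewald_params_potential(rc=4.0, L=10.0, Q=100.0, eps=1e-4)
print(alpha, kc, m_hat_t)
```

For an rms energy bound, the same function could be called with `eps` rescaled by $\sqrt{N Q^{-1}}$, as stated in the text after (4.53).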
Next, we need to tune the NFFT parameters such that the NFFT approximation error is negligible in comparison to the prescribed approximation error $\varepsilon$ of the truncated Ewald sum. More precisely, these parameters are the FFT mesh size $M \in 2\mathbb{N}^3$ and the window cutoff parameter $c_\varphi$. We claim that these two parameters need to be tuned only once, for a small test system. For larger particle numbers the same accuracy can be achieved by keeping the oversampling factor $\sigma = M \hat{m}^{-1} \in [1, \infty)^3$ and the window cutoff $c_\varphi$ constant. In Section 4.11.2 we give numerical evidence for this claim. At the moment the parameters of the NFFT are not tuned automatically, but the development of an automatic NFFT parameter tuning is the subject of current research [102, 103]. Finally, the extended box shape can be chosen as $H = \mathrm{diag}(L, L, L)$, and the degree of smoothness $s$ is not needed in the 3d-periodic case.
Once the parameters for the 3d-periodic P2NFFT are found, we can use the following heuristic to find a parameter set for 2d-, 1d-, and 0d-periodic boundary conditions. This heuristic was proposed in [3] and we only give a short summary of the approach. The key idea is to keep the NFFT parameters $c_\varphi$, $\sigma$ from the 3d-periodic case and define the grid size $\hat{m}$ in such a way that the number of grid points per volume in the extended box $H[-1/2, 1/2)^3$ remains constant. More precisely, we set
\[
\hat{m}_t =
\begin{cases}
2\lceil k_c \rceil & \text{if the $t$-th dimension is periodic},\\
4\sqrt{3-p}\,\lceil k_c \rceil + P & \text{if the $t$-th dimension is non-periodic}.
\end{cases}
\tag{4.56}
\]
Here, $p$ denotes the number of periodic dimensions, and the so-called transition grid size $P \in 2\mathbb{N}$ denotes a small number of extra grid points that are introduced within the support of the transitions $T_{k_0,k_1}$, $T_{k_0}$, $T$ in (4.17), (4.20), and (4.23). Note that the additional factor $2\sqrt{3-p}$ in the non-periodic case results from the period of the regularizations (4.17), (4.20), and (4.23). Thereby, $\sqrt{3-p}$ reflects the scaling of a $(3-p)$-dimensional ball into a $(3-p)$-dimensional cube. Next, the extended box shape $H = \mathrm{diag}(H_0, H_1, H_2)$ is given by
\[
H_t = L\,\frac{\hat{m}_t}{2\lceil k_c \rceil}. \tag{4.57}
\]
Note that for a given transition grid size $P$ the NFFT grid size $\hat{m}$ and the extended box shape $H$ can be easily computed. Our numerical tests in Section 4.11.3 confirm the claim that $P$ stays constant for $N \to \infty$ and, therefore, does not affect the complexity of the P2NFFT.
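The adjustment of the grid size and the extended box for mixed-periodic boundary conditions can be sketched as follows. The fragment reads (4.56) as $\hat{m}_t = 4\sqrt{3-p}\,\lceil k_c\rceil + P$ for non-periodic dimensions and (4.57) as $H_t = L\,\hat{m}_t/(2\lceil k_c\rceil)$. Since $\sqrt{3-p}$ is irrational for $p = 0, 1$, the sketch rounds the non-periodic grid size up to the next even integer before adding $P$; this rounding rule, the function names, and the sample values are assumptions made here for illustration.

```python
import math


def round_up_to_even(x):
    """Smallest even integer that is >= x."""
    return 2 * math.ceil(x / 2.0)


def grid_and_box(kc, L, periodic, P):
    """Return (m_hat, H_diag) following the assumed reading of (4.56), (4.57).

    periodic: three booleans, True where the dimension is periodic.
    P:        even transition grid size (tuned on a small test system).
    """
    p = sum(1 for flag in periodic if flag)  # number of periodic dimensions
    base = 2 * math.ceil(kc)                 # periodic grid size, as in (4.55)
    m_hat, H_diag = [], []
    for flag in periodic:
        if flag:
            m_t = base
        else:
            # Assumed rounding so that m_t stays an even integer.
            m_t = round_up_to_even(4.0 * math.sqrt(3 - p) * math.ceil(kc)) + P
        m_hat.append(m_t)
        H_diag.append(L * m_t / base)        # keeps grid points per length fixed
    return m_hat, H_diag


# Hypothetical 2d-periodic example: periodic in the first two dimensions.
m_hat, H_diag = grid_and_box(kc=5.57, L=10.0, periodic=(True, True, False), P=4)
print(m_hat, H_diag)
```

In periodic dimensions the sketch returns $H_t = L$, so the fully periodic case reduces to $H = \mathrm{diag}(L, L, L)$.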
The parameter selection can be summarized in the following easy steps:
1. Choose the short-range cutoff $r_c$.
2. Calculate the Ewald splitting parameter $\alpha$ and the NFFT grid size $\hat{m}$ by (4.53)–(4.55).
3. Determine the NFFT oversampling factor $\sigma$ and the window cutoff $c_\varphi$ numerically for a small test system.
4. Set the degree of smoothness $s := 10$ and tune the transition grid size $P$ numerically for a small test system.
5. Adjust the grid size $\hat{m}$ according to (4.56) and set the extended box shape $H$ according to (4.57).
6. Apply the NFFT oversampling factor $\sigma$ to the adjusted grid size $\hat{m}$ to get the adjusted FFT grid size $M = \sigma \hat{m}$.
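Step 6 can be illustrated by a short Python sketch that applies a componentwise oversampling factor to an adjusted grid size. Because $\sigma_t \hat{m}_t$ need not be an even integer, the sketch rounds up to the next even number so that $M \in 2\mathbb{N}^3$ holds; this rounding rule and the function name `fft_grid` are assumptions for illustration, not the behavior of a particular library.

```python
import math


def fft_grid(m_hat, sigma):
    """Componentwise M_t = sigma_t * m_hat_t, rounded up to an even integer."""
    if any(s < 1.0 for s in sigma):
        raise ValueError("oversampling factors must be >= 1")
    return [2 * math.ceil(s * m / 2.0) for m, s in zip(m_hat, sigma)]


# Hypothetical values: a 2d-periodic grid with uniform oversampling 1.5.
M = fft_grid([12, 12, 28], [1.5, 1.5, 1.5])
print(M)
```

Keeping `sigma` fixed while `m_hat` grows with the system size mirrors the claim that the oversampling factor only has to be tuned for a small test system.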
Once all parameters have been determined, we can execute P2NFFT for a small test system and record the runtime $t^{sr}_N \sim r_c^3 N$ of the short-range computations and the runtime $t^{lr}_N \sim k_c^3 \log k_c^3$ of the long-range computations. With these two times we can easily extrapolate the value of $r_c$ that minimizes the total runtime $t^{sr}_N + t^{lr}_N$ for any particle number $N$.
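The extrapolation described above can be sketched in a few lines of Python. The proportionality constants of the two runtime models are fitted from a single measurement on the test system, and the predicted total runtime is then minimized over a list of candidate cutoffs. The callable `kc_of`, which maps a cutoff $r_c$ and a particle number $N$ to the resulting $k_c$ (for instance via (4.53)), as well as all function names and numbers, are hypothetical placeholders.

```python
import math


def fit_constants(rc0, kc0, N0, t_sr0, t_lr0):
    """Fit c_sr, c_lr in t_sr = c_sr*rc^3*N and t_lr = c_lr*kc^3*log(kc^3)."""
    c_sr = t_sr0 / (rc0**3 * N0)
    c_lr = t_lr0 / (kc0**3 * math.log(kc0**3))
    return c_sr, c_lr


def predicted_runtime(rc, N, kc_of, c_sr, c_lr):
    """Predicted total runtime t_sr + t_lr for cutoff rc and N particles."""
    kc = kc_of(rc, N)
    if kc <= 1.0:
        raise ValueError("the model kc^3 * log(kc^3) requires kc > 1")
    return c_sr * rc**3 * N + c_lr * kc**3 * math.log(kc**3)


def best_rc(N, candidates, kc_of, c_sr, c_lr):
    """Candidate cutoff with the smallest predicted total runtime."""
    return min(candidates,
               key=lambda rc: predicted_runtime(rc, N, kc_of, c_sr, c_lr))


def toy_kc(rc, N):
    # Hypothetical stand-in: kc shrinks as rc grows; not a formula from the text.
    return 40.0 / rc


# Hypothetical timings of a test run with rc = 4, kc = 10, N = 1000.
c_sr, c_lr = fit_constants(rc0=4.0, kc0=10.0, N0=1000, t_sr0=0.5, t_lr0=0.25)
print(best_rc(10**6, [2.0, 3.0, 4.0, 5.0, 6.0], toy_kc, c_sr, c_lr))
```

A grid search over candidates is used here only for simplicity; any one-dimensional minimizer could take its place.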
Remark 4.7. The asymptotic runtime $t^{lr}_N \sim k_c^3 \log k_c^3$ of the long-range part is a rather crude approximation for small $k_c$. Indeed, the runtime of the long-range part is accumulated from several steps with different asymptotic runtimes. The most time-consuming parts are given by the matrix decomposition (3.20) of the NFFT: the deconvolution step, the FFT, and the discrete convolution step. In order to get a good prediction, we must measure and extrapolate the runtimes of these steps individually. This is possible in our implementation, since all these times are available via the PNFFT timer interface.
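A per-step extrapolation along the lines of the remark could look like the following Python sketch. It assumes the usual NFFT work estimates: the deconvolution scales with the number of NFFT grid points $|\hat{m}|$, the FFT with $|M| \log |M|$, and the discrete convolution with $N c_\varphi^3$. These scalings, the dictionary keys, and the timings are assumptions made for illustration; in particular, the sketch does not use the PNFFT timer interface and takes the measured times as plain numbers.

```python
import math


def step_work(m_hat, M, N, c_phi):
    """Assumed work estimates for the three NFFT steps."""
    n_hat = math.prod(m_hat)  # number of NFFT grid points
    n_fft = math.prod(M)      # number of oversampled FFT grid points
    return {
        "deconvolution": float(n_hat),
        "fft": n_fft * math.log(n_fft),
        "convolution": float(N * c_phi**3),
    }


def extrapolate(measured, work_small, work_large):
    """Scale each measured step time by the ratio of the work estimates."""
    return {step: t * work_large[step] / work_small[step]
            for step, t in measured.items()}


# Hypothetical timings (in seconds) of a small test system.
measured = {"deconvolution": 0.01, "fft": 0.05, "convolution": 0.04}
small = step_work([12, 12, 12], [18, 18, 18], N=1000, c_phi=4)
large = step_work([24, 24, 24], [36, 36, 36], N=8000, c_phi=4)
predicted = extrapolate(measured, small, large)
print(predicted, sum(predicted.values()))
```

Since $c_\varphi$ and $\sigma$ are kept constant, only the ratios of grid sizes and particle numbers enter the prediction, so the exact constant in the convolution estimate cancels.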