CHAPTER 2. LITERATURE REVIEW
2.1 Simulation Methods
2.1.3 Tricks of computer simulation
Computer simulation relies on a number of tricks, for at least two important reasons: one is to mimic the “real” system, since a simulation is never a “real” experiment; the other is to accelerate the computation. The former includes the periodic boundary condition and the Nosé-Hoover thermostat; the latter includes the neighborlist and parallel computing.
2.1.3.1 periodic boundary condition
Compared with most “real” experiments, the number of particles in a molecular simulation is often much smaller, so a significant fraction of them lies near the walls of the simulation box. These particles (in experimental terms, the particles on the surface) would experience very different forces from the particles in the interior of the box (bulk particles). If we are interested in simulating the properties of the bulk particles, the periodic boundary condition must be applied [107]. The periodic boundary condition is illustrated in Figure 2.2.
For simplicity, only two dimensions are shown; particles leave and enter the boxes through the four walls. The basic idea of the periodic boundary condition is that, as a particle (dark orange background) moves in the simulation box (shaded in dark blue), its periodic images (light orange background) in the neighboring image boxes (shaded in light blue) move in the same way. When the particle leaves the simulation box (particle 1 with dark orange background), its image (particle 1 with light orange background) enters the simulation box from the neighboring image box through the opposite wall. As a result, the environment (inter-atomic potentials) of the particles close to the walls is the same as that of the bulk particles, and the total number of particles stays constant.
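The two operations implied by the periodic boundary condition, re-entering through the opposite wall and interacting with the nearest periodic image, can be sketched as follows. This is a minimal illustration, assuming an orthogonal box with its origin at a corner; the function names `wrap`, `minimum_image`, and `distance` are hypothetical and do not refer to any particular simulation package.

```python
import math

def wrap(x, box_length):
    """Map a coordinate back into the primary box [0, box_length).

    A particle that leaves through one wall re-enters through the
    opposite wall (up to floating-point rounding at the boundary).
    """
    return x % box_length

def minimum_image(dx, box_length):
    """Separation between a particle and the nearest periodic image
    of another particle along one axis."""
    return dx - box_length * round(dx / box_length)

def distance(p, q, box):
    """Distance between p and the nearest periodic image of q.

    p and q are coordinate tuples; box holds the box length per axis.
    """
    return math.sqrt(sum(minimum_image(a - b, length) ** 2
                         for a, b, length in zip(p, q, box)))

if __name__ == "__main__":
    box = (10.0, 10.0)  # two dimensions, as in the illustration
    print(wrap(10.5, box[0]))                      # particle left through the right wall
    print(distance((0.5, 0.5), (9.5, 9.5), box))   # neighbors across the corner
```

With the minimum-image separation, two particles sitting on opposite sides of a wall are treated as close neighbors, which is why a particle near the wall sees the same environment as a bulk particle.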
2.1.3.2 the Nosé-Hoover thermostat
The canonical ensemble is often used in molecular simulation; it is also called the NVT ensemble because the number of particles, the volume, and the temperature are held constant. Most real processes can be modeled in the NVT ensemble rather than the microcanonical ensemble (NVE), because temperature is a measurable quantity in experiments. To set and maintain the temperature in the simulation, a thermostat must be introduced; the Nosé-Hoover thermostat [108,109] is widely used in molecular simulation. The basic idea is to introduce a fictitious variable ζ into the Newtonian equation of motion (Equation 2.1), which governs the dynamics of the particles:
$$ \frac{d^2 \vec{r}_i}{dt^2} = \frac{\vec{F}_i}{m_i} - \zeta \frac{d\vec{r}_i}{dt} \qquad (2.18) $$
Effectively, the term containing ζ acts as a damping force, which tunes the acceleration of the particles. The evolution of ζ is tied to the temperature by:
$$ \frac{d\zeta(t)}{dt} = \frac{1}{Q} \left[ \sum_{i=1}^{N} \frac{m_i \vec{v}_i^{\,2}}{2} - \frac{3N+1}{2} k_B T \right] \qquad (2.19) $$
where Q is the Nosé-Hoover mass, which determines the relaxation of the damping dynamics, k_B is the Boltzmann constant, and T denotes the target temperature. It can be seen that with this Nosé-Hoover approach the system tends to the steady temperature T, at which $\sum_{i=1}^{N} m_i \vec{v}_i^{\,2}/2 = \frac{3N+1}{2} k_B T$ (in three dimensions; the +1 arises from the additional degree of freedom ζ). When the kinetic energy exceeds this target, ζ grows and the damping slows the particles; when it falls below the target, ζ decreases and the particles speed up. It is necessary to emphasize that the Nosé-Hoover thermostat does not fix the system’s temperature but allows some fluctuations, which is very similar to the experimental scenario.
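The feedback between the kinetic energy and ζ in Equations 2.18 and 2.19 can be made concrete with a short sketch. It assumes reduced units, a user-supplied `force` callable, and an explicit Euler step chosen only for brevity; production MD codes use more careful integrators, and all function names here are hypothetical.

```python
def kinetic_energy(masses, velocities):
    """Sum over particles of m_i |v_i|^2 / 2."""
    return sum(0.5 * m * sum(c * c for c in v)
               for m, v in zip(masses, velocities))

def zeta_rate(masses, velocities, Q, kT):
    """Right-hand side of Eq. 2.19 for N particles in three dimensions."""
    n = len(masses)
    return (kinetic_energy(masses, velocities) - 0.5 * (3 * n + 1) * kT) / Q

def euler_step(positions, velocities, zeta, masses, force, Q, kT, dt):
    """One explicit Euler step of Eqs. 2.18 and 2.19 (illustration only).

    force is an assumed callable returning one force vector per particle.
    """
    forces = force(positions)
    new_zeta = zeta + dt * zeta_rate(masses, velocities, Q, kT)
    new_positions = [[x + dt * v for x, v in zip(r, vel)]
                     for r, vel in zip(positions, velocities)]
    # Acceleration = F_i / m_i - zeta * v_i, as in Eq. 2.18
    new_velocities = [[v + dt * (f / m - zeta * v) for v, f in zip(vel, frc)]
                      for vel, frc, m in zip(velocities, forces, masses)]
    return new_positions, new_velocities, new_zeta

if __name__ == "__main__":
    masses = [1.0, 1.0]
    velocities = [[2.0, 2.0, 2.0], [4.0, 0.0, 0.0]]  # hotter than the target
    # Positive rate: zeta grows, so the damping term strengthens
    print(zeta_rate(masses, velocities, Q=10.0, kT=1.0))
```

The sign of `zeta_rate` is the whole mechanism: it is positive when the system is hotter than the target and negative when it is colder, so ζ continually pushes the kinetic energy back toward the target value while still allowing fluctuations around it.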
2.1.3.3 neighborlist
Figure 2.3 The feasibility of the neighborlist method.
No matter which inter-atomic potential is used, the time taken to compute the forces on the N particles in the system is O(N^2) if we execute a double loop. To reduce the computational load, Verlet [76] suggested recording, for each particle i, a list of its neighbor particles, which is updated periodically every few simulation time steps; this method can reduce the time taken to compute the forces to O(N). The basic idea of the Verlet neighborlist is to surround the potential cut-off radius r_c by a “skin” of width r_s, so that all particles within a distance r_c + r_s of particle i are recorded as its neighbors. The neighborlist data is stored in an array and kept until the next update is needed. The feasibility of the neighborlist method is shown in Figure 2.3: the skin width r_s is chosen reasonably large such that, between neighborlist updates, a particle (for example, particle 5 in Figure 2.3) which is not in the list of particle i will not move through the skin area into the r_c sphere.
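A minimal sketch of the neighborlist idea is given below. It assumes open boundaries (no periodic images) and uses a plain double loop to build the list, so the build itself is still O(N^2); only the force evaluation between updates benefits. The names `build_neighbor_list` and `interacting_pairs` are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build_neighbor_list(positions, r_c, r_s):
    """For each particle i, record every j within r_c + r_s (cut-off plus skin)."""
    r_list_sq = (r_c + r_s) ** 2
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            if squared_distance(positions[i], positions[j]) < r_list_sq:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def interacting_pairs(positions, neighbors, r_c):
    """Between updates, only listed pairs are tested against the cut-off r_c."""
    r_c_sq = r_c ** 2
    pairs = []
    for i, listed in enumerate(neighbors):
        for j in listed:
            # j > i counts each pair once
            if j > i and squared_distance(positions[i], positions[j]) < r_c_sq:
                pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.0, 0.0), (10.0, 0.0, 0.0)]
    neighbors = build_neighbor_list(positions, r_c=2.5, r_s=0.5)
    print(neighbors)
    print(interacting_pairs(positions, neighbors, r_c=2.5))
```

In this example the pair (0, 2) sits in the skin: it is stored in the list but does not yet interact, so it is already being tracked if the two particles drift inside r_c before the next update.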
In practical implementations, faster neighborlist storage and indexing methods have been developed, including the cell index [110,111], the stenciled cell list [112], and LBVH (linear bounding volume hierarchy) tree indexing [113]. Note that although most algorithms can detect a “dangerous” neighborlist build (for example, particle 5 moving into the r_c sphere) and update the neighborlist automatically, it is still important to set the update interval δt and the skin width r_s reasonably, to reduce the number of “dangerous” builds and save computational time. For example, if δt is large, r_s should be set correspondingly larger, and vice versa.
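The coupling between the update interval and the skin width can be illustrated with a simple estimate. The sketch below rests on two assumptions that are common conservative choices rather than anything prescribed above: two particles can approach each other at up to twice the maximum speed `v_max`, and a rebuild is triggered once any particle has moved more than half the skin width since the last build. All names are hypothetical.

```python
import math

def min_skin(v_max, update_interval):
    """Smallest skin width that is safe for a given time between updates.

    Two particles moving toward each other at v_max close their gap
    by at most 2 * v_max * update_interval.
    """
    return 2.0 * v_max * update_interval

def needs_rebuild(positions, reference, r_s):
    """True once any particle has moved more than r_s / 2 from its
    position at the last neighborlist build (reference)."""
    largest = max(math.dist(p, q) for p, q in zip(positions, reference))
    return largest > 0.5 * r_s

if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    current = [(0.1, 0.0, 0.0), (1.0, 0.3, 0.0)]
    print(needs_rebuild(current, reference, r_s=0.5))
    print(min_skin(v_max=0.5, update_interval=0.2))
```

Doubling the update interval doubles the skin width required by this estimate, which in turn lengthens every neighborlist, so the two parameters trade list-building cost against force-evaluation cost.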
2.1.3.4 parallel computing
Most simulation algorithms, or variations of them, can be executed in parallel, and on the implementation side, parallel computing has become an important area of research in computer architectures and software systems. Simulation algorithms can be greatly accelerated using parallel computing techniques. The attainable speedup is described by Amdahl’s law [114]:
$$ S = \frac{T_s + T_p}{T_s + T_p/N} \qquad (2.20) $$
where S is the “speedup” of the whole computational task, T_s is the portion of the computational task which cannot be parallelized, T_p is the parallelizable portion, and N is the number of processors. It is clear that, in order to increase the speedup S, we may increase N and/or design the algorithm wisely such that T_p ≫ T_s.
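Equation 2.20 is easy to evaluate numerically, which makes the role of the serial portion visible. The sketch below is a direct transcription of the formula; the function name `amdahl_speedup` and the example timings are illustrative assumptions, not measurements.

```python
def amdahl_speedup(t_serial, t_parallel, n_processors):
    """Speedup S from Eq. 2.20 for a task with serial time t_serial and
    parallelizable time t_parallel, run on n_processors processors."""
    return (t_serial + t_parallel) / (t_serial + t_parallel / n_processors)

if __name__ == "__main__":
    # Hypothetical task: 1 time unit serial, 9 time units parallelizable
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(1.0, 9.0, n), 3))
```

For these example timings the speedup on 10 processors is about 5.26, and no number of processors can push it past (T_s + T_p)/T_s = 10, which is why reducing the serial portion matters as much as adding processors.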
The hardware architectures used for parallel computing include: the multi-core processor, in which two (dual), four (quad), or more processing units are assembled together, and which is now commonly seen in personal computers; the computer cluster, which combines a set of computers, each referred to as a “node”, and which in modern installations can contain tens to hundreds of nodes; and general-purpose computing on graphics processing units (GPGPU). Compared to a multi-core processor, a GPU contains more “cores”, but they have a lower clock rate, simpler instructions, and less cache memory, as illustrated in Figure 2.4. This figure was retrieved from the NVIDIA CUDA toolkit documentation on Feb. 15, 2018 [115]; the orange blocks represent the dynamic random-access memory (DRAM) and the cache memories, the yellow blocks represent the control units, and the green blocks represent the arithmetic logic units. GPGPU is well suited to computations such as MD or MC, which are data-intensive (millions of particles) but involve only simple arithmetic operations (Newton’s law/MC algorithm). In this thesis, all simulations are performed on GPU clusters.
Figure 2.4 The GPU Devotes More Transistors to Data Processing.