Analysis and sampling - Molecular Dynamics simulations

2.3 Molecular Dynamics simulations

2.3.4 Analysis and sampling

In MD, macroscopic magnitudes are obtained by averaging the microscopic variables of different configurations over time. As previously stressed, the equations of motion that define the system trajectory should be ergodic for those configurations to represent a correct sample of the working ensemble. Two time scales need to be considered in order to obtain correct averages. First, in MD simulations the system evolves towards states of lower free energy, and therefore the microscopic properties of successive configurations change with time before the steady state is reached. It is thus important to identify relaxation times, so that averaging is only performed on equilibrium states. Second, consecutive configurations are correlated, and these correlations need to be accounted for when determining the statistical error of measurements. In order to get rid of such repetitive samples, only configurations separated by intervals longer than the decorrelation time are recorded or used to calculate the macroscopic averages. However, since the system usually has several degrees of freedom with different relaxation and decorrelation times, the estimation of those time scales may not be obvious. In order to mitigate the poor statistics caused by long correlation times in specific degrees of freedom, averages are usually taken by blocks. The time series of a magnitude is divided into 10 blocks or so, each one containing $n_i$ samples. The mean values from each block, $A_i = a_i/n_i$, where $a_i$ is the sum of the samples in block $i$, are averaged together and their standard deviation is calculated in order to get a good estimate of the observable:

$$ \langle A \rangle = \sum_{i=1}^{n_{\text{blocks}}} \frac{A_i}{n_{\text{blocks}}}, \qquad \sigma(A) = \sqrt{\frac{\sum_{i=1}^{n_{\text{blocks}}} \left(A_i - \langle A \rangle\right)^2}{n_{\text{blocks}} - 1}} \tag{2.42} $$

In general, macroscopic quantities depend on particle positions, velocities and forces. They are either calculated directly during the simulation and averaged at the end, or the configurations are recorded and the magnitudes of interest are obtained from post-simulation analysis. Temperature and pressure are typical thermodynamic quantities calculated in MD simulations. They not only give important information about the system in the NVE ensemble, but are also crucial for thermostat and barostat dynamics in NVT and NPT. The temperature is related to the velocities of the particles via the equipartition theorem, which predicts that the average kinetic energy of each degree of freedom equals $k_B T/2$:

$$ \left\langle \sum_{i=1}^{N} \frac{|\mathbf{p}_i|^2}{m_i} \right\rangle = 2 \langle \mathcal{T} \rangle = N d\, k_B T \tag{2.43} $$

2 Methods

where $N$ is the number of particles, $d$ is the number of degrees of freedom per particle, and $\mathcal{T}$ is the instantaneous kinetic energy. The pressure can also be obtained from the positions, velocities and forces using the virial equation (eq. 2.39).
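
As a rough illustration of eqs. 2.42 and 2.43, the sketch below inverts the equipartition relation to get an instantaneous temperature from the momenta and then block-averages a time series of such values. It is a minimal sketch in reduced units ($k_B = 1$); the function names, the data layout and the synthetic Maxwell-Boltzmann momenta are assumptions made for the example, not part of the method described here.

```python
import math
import random

K_B = 1.0  # Boltzmann constant in reduced units (assumption for this sketch)


def instantaneous_temperature(momenta, masses, dim=3, k_b=K_B):
    """Solve eq. 2.43 for T using a single configuration.

    momenta: list of per-particle momentum tuples, masses: list of floats.
    """
    twice_kinetic = sum(
        sum(c * c for c in p) / m for p, m in zip(momenta, masses)
    )
    return twice_kinetic / (len(masses) * dim * k_b)


def block_average(series, n_blocks=10):
    """Eq. 2.42: mean of the block means and their sample standard deviation."""
    size = len(series) // n_blocks  # samples per block; any remainder is dropped
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    means = [
        sum(series[b * size:(b + 1) * size]) / size for b in range(n_blocks)
    ]
    mean = sum(means) / n_blocks
    variance = sum((a - mean) ** 2 for a in means) / (n_blocks - 1)
    return mean, math.sqrt(variance)


# Synthetic data (hypothetical): 200 frames of 100 unit-mass particles with
# momenta drawn from a Maxwell-Boltzmann distribution at T = 1.5.
rng = random.Random(0)
target_t, masses = 1.5, [1.0] * 100
sigma_p = math.sqrt(K_B * target_t)  # sqrt(m k_B T) with m = 1
temperatures = []
for _ in range(200):
    momenta = [tuple(rng.gauss(0.0, sigma_p) for _ in range(3)) for _ in masses]
    temperatures.append(instantaneous_temperature(momenta, masses))

mean_t, sigma_t = block_average(temperatures, n_blocks=10)
print(f"<T> = {mean_t:.3f}, sigma over blocks = {sigma_t:.3f}")
```

The synthetic frames are statistically independent, so the blocks only illustrate the bookkeeping. With real MD data the block length should exceed the longest decorrelation time for the block means to be independent.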

Diffusion describes the mobility of the particles in a solvent. In membranes, phospholipid and protein diffusion determines the contact rates and the membrane deformation, both important in signal transduction and cell communication. The diffusion constant is obtained from the mean squared displacement of the particles:

$$ D = \frac{1}{6\tau} \left\langle |\mathbf{r}_i(t+\tau) - \mathbf{r}_i(t)|^2 \right\rangle \tag{2.44} $$

where the average is taken over the ensemble.
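
In practice the ensemble average of eq. 2.44 is often approximated by averaging over particles and time origins of a recorded trajectory. The sketch below follows that assumption; the function name, the list-of-frames layout and the synthetic Brownian trajectory are hypothetical, and positions are assumed to be unwrapped (no periodic-boundary jumps).

```python
import math
import random


def diffusion_constant(trajectory, lag, dt, dim=3):
    """Estimate D from the mean squared displacement at a lag of `lag` frames.

    trajectory[t][i] is the unwrapped position tuple of particle i at frame t.
    The average runs over particles and time origins; for dim = 3 the
    prefactor 1 / (2 * dim * tau) is the 1 / (6 tau) of eq. 2.44.
    """
    n_frames = len(trajectory)
    if not 0 < lag < n_frames:
        raise ValueError("lag must satisfy 0 < lag < number of frames")
    total, count = 0.0, 0
    for t in range(n_frames - lag):
        for r0, r1 in zip(trajectory[t], trajectory[t + lag]):
            total += sum((b - a) ** 2 for a, b in zip(r0, r1))
            count += 1
    return (total / count) / (2 * dim * lag * dt)


# Hypothetical test data: free Brownian particles with a known D.
rng = random.Random(1)
d_true, dt, n_particles, n_frames = 0.5, 0.01, 100, 200
step = math.sqrt(2.0 * d_true * dt)  # per-coordinate displacement per frame
frame = [(0.0, 0.0, 0.0)] * n_particles
trajectory = [frame]
for _ in range(n_frames - 1):
    frame = [tuple(x + rng.gauss(0.0, step) for x in r) for r in frame]
    trajectory.append(frame)

print(f"D estimate = {diffusion_constant(trajectory, lag=10, dt=dt):.3f}")
```

Passing `dim=2` would give the $1/(4\tau)$ prefactor that applies to purely lateral motion, for instance of lipids within a membrane plane.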

In many cases, structural magnitudes are also important to characterize the system. Order parameters, for example, are very useful to determine the different phases of membranes. Those magnitudes are often defined for each specific system, and need to be carefully chosen. Order parameters form a set of independent variables that can completely characterize the system. Sometimes the whole set does not need to be defined, and a single one of them can describe the features or processes of interest.

The free energy completely characterizes the thermodynamics of the system. The equilibrium states, the energy barriers, and the probability of finding the system in a specific state are easily derived from it. Analytically, it can only be calculated for a few simple cases, and computationally it is not trivial to obtain. When all the degrees of freedom of a system can be fully sampled, the free energy is obtained from the probabilities of the different states contained in the definition of the partition function, $Z$:

$$ Z = \frac{1}{N!\,\Lambda^{3}} \int e^{-U(q_0, q_1 \ldots q_N)/k_B T} \, dq_0\, dq_1 \ldots dq_N \tag{2.45} $$

where $q_i$ are the different degrees of freedom and the coefficient before the integral corresponds to the integration of the kinetic part of the Hamiltonian. The free energy is then given by:

$$ F = -k_B T \ln Z \tag{2.46} $$

However, full sampling of the system usually cannot be achieved, because there are states with low probabilities that are poorly sampled or even inaccessible when energy barriers separate two regions of the configuration space (Fig. 2.4). In some cases the system can be initialized at specific states, but in others, when those configurations result from the collective behavior of many particles, they can be totally inaccessible with the normal dynamics. Luckily, important information can be obtained without determining the free energy fully and exactly. In many cases, the study of the system is focused along a specific reaction coordinate able to characterize the process under study. A partition function can be defined when one of the degrees of freedom is fixed:

Figure 2.4: The energy barrier prevents the system from evolving to states of high energy or crossing the barrier. In this case, not all the states can be sampled by normal dynamics.

$$ Z(\zeta) = \frac{1}{N!\,\Lambda^{3}} \int e^{-U(q_0, q_1 \ldots q_N)/k_B T}\, \delta(\zeta - q_0) \, dq_0\, dq_1 \ldots dq_N \tag{2.47} $$

and the equivalent to the free energy for this partition function is called the potential of mean force:

$$ F(\zeta) = -k_B T \ln Z(\zeta) \tag{2.48} $$
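
When the coordinate is sampled well enough, $Z(\zeta)$ is proportional to the probability density of observing $\zeta$, so the potential of mean force can be estimated from a histogram up to an additive constant. A minimal sketch, assuming equilibrium samples of $\zeta$ are already available; the function name and the binning choices are hypothetical:

```python
import math


def pmf_from_samples(samples, lo, hi, n_bins, k_b_t=1.0):
    """Histogram estimate of F(zeta) = -k_B T ln rho(zeta), shifted so min(F) = 0."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for z in samples:
        j = math.floor((z - lo) / width)
        if 0 <= j < n_bins:  # samples outside [lo, hi) are ignored
            counts[j] += 1
    if sum(counts) == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    centers = [lo + (j + 0.5) * width for j in range(n_bins)]
    # Empty bins get an infinite free energy: they were never visited.
    # With equal bin widths the normalization only adds a constant to F.
    pmf = [-k_b_t * math.log(c) if c > 0 else math.inf for c in counts]
    f_min = min(pmf)
    return centers, [f - f_min for f in pmf]


# Toy usage with made-up samples: 4 visits to the first bin, 1 to the second.
centers, pmf = pmf_from_samples([0.1, 0.2, 0.3, 0.4, 1.5], lo=0.0, hi=2.0, n_bins=2)
print(centers, pmf)
```

Bins that are never visited carry no information, which is precisely the limitation that biased sampling methods are meant to address.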

When $\zeta$ takes all the values along the reaction coordinate, the potential of mean force can be understood as the projection of the free energy on that coordinate. Sampling can be difficult even for a single coordinate when some states are unlikely to occur. If this is the case, an enhanced sampling method needs to be applied. Umbrella Sampling [121] is one of those methods: the reaction coordinate is divided into different windows, and an artificial biasing potential $V_b(\zeta)$ is added to the system potential energy $U(q)$ in each of them (Fig. 2.5):

$$ V_0 = U(q) + V_b(\zeta) \tag{2.49} $$

The biasing potential shifts the original potential so that the probability of sampling otherwise inaccessible states is increased. Multiple simulations are performed with different biasing umbrella potentials $V_b^i(\zeta)$ that center the sampling in different overlapping regions, or windows, along $\zeta$. A reasonable choice to produce the biased ensembles is to use for every window $i$ a harmonic function of the form:

$$ V_b^i(\zeta) = \frac{1}{2} k (\zeta - \zeta_i)^2 \tag{2.50} $$
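
To make the role of the windows concrete, the sketch below samples a one-dimensional double well under a harmonic bias of this form. It is only an illustration: Metropolis Monte Carlo is swapped in for MD to keep the example short, and the double-well potential, the spring constant, the window spacing and all function names are assumptions, not taken from the text.

```python
import math
import random


def harmonic_bias(zeta, center, k):
    """Harmonic umbrella potential of the window centered at `center`."""
    return 0.5 * k * (zeta - center) ** 2


def double_well(zeta):
    """Hypothetical system potential with minima at -1 and +1, barrier at 0."""
    return (zeta * zeta - 1.0) ** 2


def sample_window(potential, center, k, n_steps, k_b_t=0.2, step=0.2, seed=0):
    """Metropolis sampling of exp(-(U + V_b) / k_B T) (MC stand-in for MD)."""
    rng = random.Random(seed)
    zeta = center
    energy = potential(zeta) + harmonic_bias(zeta, center, k)
    samples = []
    for _ in range(n_steps):
        trial = zeta + rng.uniform(-step, step)
        e_trial = potential(trial) + harmonic_bias(trial, center, k)
        # Accept downhill moves always, uphill moves with the Boltzmann factor.
        if e_trial <= energy or rng.random() < math.exp((energy - e_trial) / k_b_t):
            zeta, energy = trial, e_trial
        samples.append(zeta)
    return samples


# Thirteen windows from -1.5 to 1.5. The barrier top (zeta = 0) lies 5 k_B T
# above the minima, but the window centered there samples it directly.
centers = [-1.5 + 0.25 * i for i in range(13)]
windows = [sample_window(double_well, c, k=5.0, n_steps=20000, seed=i)
           for i, c in enumerate(centers)]
for c, w in zip(centers, windows):
    print(f"window at {c:+.2f}: mean zeta = {sum(w) / len(w):+.3f}")
```

The spacing between window centers is chosen comparable to the width of each biased distribution so that neighboring histograms overlap, which is what later allows the windows to be stitched together.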

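The new system with potential energy $V_0$ can be correctly sampled, but the resulting distributions, $\rho_b^i(\zeta)$, are biased and do not correspond to those of the original system, $\rho_i(\zeta)$, although the two are related:

$$ \rho_i(\zeta) = e^{\beta V_b^i(\zeta)} e^{-\beta (U(q) + V_b^i(\zeta))} = e^{\beta V_b^i(\zeta)} \rho_b^i(\zeta) \tag{2.51} $$

where $\beta = 1/k_B T$.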
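
Read bin by bin, the relation above multiplies each biased histogram value by the Boltzmann factor of the bias with the sign reversed. The sketch below does this for a single harmonic window and renormalizes the result; the function name and the list-based histogram layout are assumptions made for the example.

```python
import math


def unbias_histogram(biased_density, centers, window_center, k, k_b_t=1.0):
    """Reweight one window: rho_j is proportional to exp(beta * V_b(zeta_j)) * rho_b_j.

    The result is renormalized so that the bin probabilities sum to one.
    """
    beta = 1.0 / k_b_t
    weights = [
        rho * math.exp(beta * 0.5 * k * (z - window_center) ** 2)
        for rho, z in zip(biased_density, centers)
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy check: a flat system potential seen through a harmonic bias (k = 2)
# gives a Gaussian-shaped biased histogram; reweighting recovers the flat one.
centers = [-1.0, 0.0, 1.0]
biased = [math.exp(-1.0), 1.0, math.exp(-1.0)]
print(unbias_histogram(biased, centers, window_center=0.0, k=2.0))
```

Far from the window center the exponential factor becomes large while the biased histogram holds few counts, so the tails of a single reweighted window are noisy.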


Figure 2.5: System potential along a reaction coordinate, $\zeta$, in red, and biasing potential along the same coordinate, in blue, used together in umbrella sampling.

Since the biasing potential is known, the samples can be unbiased, but to obtain a full unbiased distribution, data from all the simulations have to be combined. The Weighted Histogram Analysis Method (WHAM) [122] combines the samplings from the different windows and their resulting biased distributions to give an estimate of the free energy of the system. For a set of $S$ umbrella sampling simulations where $N_i$ samples are obtained for the $i$-th simulation, WHAM proceeds in the following way: these samples are discretized into $M$ bins to determine a histogram with respect to the biased coordinate. $\rho_{ij}$ is the estimate of the biased probability in the $j$-th bin of the $i$-th simulation. The total biased probability distribution from the $i$-th simulation is then:

$$ \rho_b^i(\zeta) = \sum_{j=1}^{M} \rho_{ij} \tag{2.52} $$

and the combined distribution from all the windows gives the total unbiased distribution:

$$ \rho(\zeta) = \sum_{i=1}^{S} c_i(\zeta)\, \rho_i(\zeta) = \sum_{i=1}^{S} c_i(\zeta)\, e^{\beta V_b^i(\zeta)} \rho_b^i(\zeta) \tag{2.53} $$

where $c_i(\zeta)$ are the normalized weights, $\sum_i c_i(\zeta) = 1$. Since, in general, the total potential is not flat, the optimal weights will not be equal, and should be chosen so that the samples with higher uncertainty in their estimates of the unbiased probabilities are weighted less. WHAM chooses the weights by minimizing the variance of the estimated unbiased probability:

$$ \frac{\partial \sigma^2[\rho(\zeta)]}{\partial c_i} = 0 \tag{2.54} $$

Eqs. 2.53 and 2.54 form a system of coupled equations that is solved iteratively until self-consistency to obtain the weights that define the unbiased probability, so that the potential of mean force along the reaction coordinate can be calculated from eq. 2.48.
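
The iteration can be sketched in the form in which WHAM is commonly implemented, where the weights are eliminated and a free energy offset $f_i$ per window is iterated together with the unbiased bin probabilities. Whether this matches the exact variant intended above is an assumption, as are the function name, the data layout and the synthetic double-well test case, which uses exact expected histograms instead of noisy samples.

```python
import math


def wham(counts, bias, k_b_t=1.0, tol=1e-10, max_iter=5000):
    """Self-consistent WHAM iteration on binned umbrella sampling data.

    counts[i][j]: number of samples of window i in bin j.
    bias[i][j]:   biasing potential of window i at the center of bin j.
    Returns (p, f): unbiased bin probabilities and per-window offsets f_i.
    """
    beta = 1.0 / k_b_t
    n_win, n_bins = len(counts), len(counts[0])
    n_samples = [sum(row) for row in counts]
    total = [sum(counts[i][j] for i in range(n_win)) for j in range(n_bins)]
    boltz = [[math.exp(-beta * v) for v in row] for row in bias]
    f = [0.0] * n_win
    p = [1.0 / n_bins] * n_bins
    for _ in range(max_iter):
        # Combine all windows in each bin, given the current offsets.
        p = [
            total[j] / sum(n_samples[i] * math.exp(beta * f[i]) * boltz[i][j]
                           for i in range(n_win))
            for j in range(n_bins)
        ]
        norm = sum(p)
        p = [x / norm for x in p]
        # Offsets that normalize each window's biased distribution.
        f_new = [
            -k_b_t * math.log(sum(pj * b for pj, b in zip(p, row)))
            for row in boltz
        ]
        converged = max(abs(a - b) for a, b in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    return p, f


# Synthetic check (hypothetical data): exact expected histograms for a double
# well U = (zeta^2 - 1)^2 seen through 13 harmonic windows, k = 5, k_B T = 1.
zeta = [-1.5 + 0.1 * j for j in range(31)]
u_true = [(z * z - 1.0) ** 2 for z in zeta]
window_centers = [-1.5 + 0.25 * i for i in range(13)]
bias = [[0.5 * 5.0 * (z - c) ** 2 for z in zeta] for c in window_centers]
counts = []
for row in bias:
    w = [math.exp(-(u + v)) for u, v in zip(u_true, row)]
    counts.append([10000.0 * x / sum(w) for x in w])

p, f = wham(counts, bias)
pmf = [-math.log(x) for x in p]  # eq. 2.48 per bin, up to a constant
pmf = [x - min(pmf) for x in pmf]
print([round(x, 3) for x in pmf])
```

With real, noisy histograms some bins may be empty in every window; those bins get zero probability and would need to be skipped before taking the logarithm.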

In document Dynamics of virus assembly and budding: a coarse-grained approach (Page 69-73)