CHAPTER 2. METHODS
2.5 Trajectory Analysis
Root Mean Square Deviation (RMSD) is a quantitative measure of the structural difference between two conformations of a molecular structure. To calculate the RMSD, the two conformations of the structure first have to be superimposed so as to eliminate the translational and rotational deviations. The RMSD can then be calculated by:
$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( \mathbf{r}_n^{i} - \mathbf{r}_n^{j} \right)^2} \tag{2.73}$$
where $N$ is the number of atoms included in the measurement, $n$ is the index of the atom, $\mathbf{r}_n$ is the coordinates of the $n$-th atom, and $i$ and $j$ refer to the $i$-th and $j$-th conformations of the molecule, respectively. The summation in equation 2.73 can include any set of atoms of the structure, and the RMSD is usually reported in units of Å. For biomolecular simulation systems, the RMSD analysis often includes only the heavy (non-hydrogen) atoms. In the case of proteins and DNA, the analysis usually includes only the backbone atoms or alpha carbons (Cα) of the protein and the phosphate groups of the DNA backbone, since they are usually sufficient to describe the structural difference.
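As a minimal sketch of equation 2.73, the following Python/NumPy function superimposes two conformations and then evaluates the RMSD. The superposition uses the Kabsch algorithm, which is an assumption here because the text does not name a fitting method; the function name `kabsch_rmsd` and the (N, 3) array layout are likewise illustrative choices.

```python
import numpy as np

def kabsch_rmsd(conf_i, conf_j):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Hypothetical helper: the Kabsch fit is an assumed choice of overlay method.
    """
    # Remove the translational deviation: move both centroids to the origin.
    p = conf_i - conf_i.mean(axis=0)
    q = conf_j - conf_j.mean(axis=0)

    # Remove the rotational deviation: optimal rotation from the SVD of the
    # 3x3 correlation matrix between the two centered coordinate sets.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    correction = np.diag([1.0, 1.0, d])
    p_rotated = p @ u @ correction @ vt

    # Equation 2.73: square root of the mean squared atomic displacement.
    squared_displacements = np.sum((p_rotated - q) ** 2, axis=1)
    return float(np.sqrt(squared_displacements.mean()))
```

Passing only the rows for the Cα or phosphate atoms restricts the sum to that subset, as described above.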
2.5.2 Principal Components Analysis
Principal components analysis (PCA) is a general data analysis method that identifies patterns in multi-dimensional data and highlights both the similarities and the differences among samples. In PCA, the original data set is transformed to a new coordinate system in which the greatest variance is obtained when the data are projected onto the first principal component (PC), the second greatest variance when they are projected onto the second PC, and so on. PCA has been shown to make it easier to extract the largest-amplitude modes (the slowest dynamic motions), which are often of the most interest, from biomolecular simulations (98).
Consider an $n$-dimensional trajectory data set consisting of $m$ snapshots from an MD simulation. This trajectory data set can be expressed as a matrix of $n$ columns and $m$ rows. The data set is multi-dimensional, and the whole point of PCA is to reduce the dimensionality in order to examine the relationships among the different conformations from the simulation. To perform PCA, the mean of each dimension of the data set is first subtracted from that dimension, followed by construction of a covariance matrix as shown in equation 2.74. The covariance between any pair of variables (dimensions) indicates how much one variable varies from its mean with respect to the other.
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{m - 1} \tag{2.74}$$
where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ over the $m$ snapshots. With the multi-dimensional MD trajectory data set, multiple covariances can be calculated using different pairs of variables. As a result, the covariance matrix is an $n$ by $n$ matrix for an $n$-dimensional trajectory data set, with each element being the covariance between one pair of variables. Note that this covariance matrix is symmetric, since $\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$, and the elements on its diagonal are the covariances between one variable and itself, $\mathrm{cov}(x, x)$, which are its variances. The eigenvectors of the covariance matrix are the PCs we are interested in. Diagonalizing the covariance matrix yields the eigenvectors and the corresponding eigenvalues. We can then represent the original trajectory data in terms of these orthogonal eigenvectors instead of the original Cartesian coordinates. The magnitude of the variance in the direction of a particular eigenvector is given by the corresponding eigenvalue.
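The construction and diagonalization of the covariance matrix can be sketched in Python/NumPy as follows. The function name `covariance_eigendecomposition` and the assumption that the trajectory is stored as an array with one snapshot per row and one coordinate per column are illustrative, not taken from the text.

```python
import numpy as np

def covariance_eigendecomposition(traj):
    """Covariance matrix of a trajectory and its eigen-decomposition.

    traj: array with one row per snapshot and one column per coordinate
    (an assumed layout for this sketch).
    """
    n_snapshots = traj.shape[0]

    # Subtract the mean of each dimension from that dimension.
    centered = traj - traj.mean(axis=0)

    # Equation 2.74 evaluated for every pair of dimensions at once.
    cov = centered.T @ centered / (n_snapshots - 1)

    # eigh is meant for symmetric matrices; it returns eigenvalues in
    # ascending order with the matching eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return cov, eigvals, eigvecs
```

Using `np.linalg.eigh` rather than a general eigensolver exploits the symmetry noted above and guarantees real eigenvalues and orthonormal eigenvectors.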
After the eigenvectors and eigenvalues of the covariance matrix are found, they are sorted from high to low according to the eigenvalues, with the first PC being the eigenvector with the highest eigenvalue, indicating that it contributes the most to the variance of the trajectory data set. It is usually the first few PCs that interest us, since they account for most of the variance (the percentage contribution of one PC can be calculated by dividing the associated eigenvalue by the sum of all eigenvalues). Projecting the original data set onto the chosen PCs reduces the dimensionality while retaining the large-scale motions of the trajectory data set. The transformation can be done using equation 2.75:
$$X_r = E^{T} X^{T} \tag{2.75}$$
where $X_r$ is the transformed data matrix, which has reduced dimensionality if not all of the eigenvectors are chosen, $E^{T}$ is the transposed matrix of the chosen eigenvectors, and $X^{T}$ is the transpose of the original data matrix with the mean subtracted for each dimension. The original trajectory data set is now represented in terms of the chosen eigenvectors. It is usually of interest to plot up to the first three PCs against one another to visualize the projections of the data onto them, so that the large-scale motions can be identified by locating the paths between clusters in the plot.
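A minimal sketch of the sorting, variance-fraction, and projection steps (equation 2.75) is given below in Python/NumPy. The function name `project_onto_pcs`, the parameter `n_pcs`, and the snapshots-by-coordinates array layout are assumptions made for illustration.

```python
import numpy as np

def project_onto_pcs(traj, n_pcs=2):
    """Project a trajectory onto its leading principal components.

    traj: array with one row per snapshot and one column per coordinate
    (an assumed layout for this sketch).
    Returns the projections, one row per PC, and the fraction of the total
    variance carried by every PC.
    """
    # Original data matrix with the mean subtracted for each dimension.
    centered = traj - traj.mean(axis=0)

    # Covariance matrix (normalized by the number of snapshots minus one).
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort from the highest to the lowest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Contribution of each PC: its eigenvalue over the sum of all eigenvalues.
    fraction = eigvals / eigvals.sum()

    # Equation 2.75 with only the chosen eigenvectors kept as columns of E.
    chosen = eigvecs[:, :n_pcs]
    projections = chosen.T @ centered.T
    return projections, fraction
```

Plotting the first row of the returned projections against the second gives the kind of two-dimensional PC map described above.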
2.5.3 Normal Mode Analysis
Strictly speaking, Normal Mode Analysis (NMA) is a simulation technique for probing the large-scale motions of biomolecules rather than an analysis tool for MD trajectories. A standard NMA requires a set of coordinates and a force field describing the interactions between the atoms. The process of a standard NMA includes minimization of the potential energy function, construction of the Hessian matrix (the matrix of second derivatives of the potential energy function with respect to the mass-weighted atomic coordinates), and diagonalization of the Hessian matrix (the mathematical treatment is the same as that of the covariance matrix in PCA). It is computationally expensive due to the high dimensionality of biomolecular systems. Therefore, a simplified model called the Elastic Network Model (ENM) was introduced into protein studies (99) and became popular due to its simplicity. In ENM, the atoms are connected by a network of elastic springs. With ENM, energy minimization is not needed, given that all the elastic springs are at their equilibrium distances, and the number of atoms included in the model is greatly reduced to the number of residues (or even fewer). Even with a great deal of simplification compared with the standard NMA, a respectable degree of correspondence has been shown between the two methods (99).
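To illustrate the idea, the sketch below builds and diagonalizes the Hessian of one common ENM variant, the anisotropic network model, in Python/NumPy. The choice of this variant, the distance cutoff, the uniform spring constant, the equal masses, and the function name `anm_modes` are all assumptions for illustration and are not necessarily the formulation used in (99).

```python
import numpy as np

def anm_modes(coords, cutoff=15.0, spring_constant=1.0):
    """Normal modes of a simple elastic network (anisotropic network model).

    coords: (N, 3) array with one node per residue, e.g. Cα positions.
    cutoff and spring_constant are assumed, illustrative parameters.
    """
    n_nodes = coords.shape[0]
    hessian = np.zeros((3 * n_nodes, 3 * n_nodes))

    for a in range(n_nodes):
        for b in range(a + 1, n_nodes):
            separation = coords[b] - coords[a]
            dist_sq = separation @ separation
            if dist_sq > cutoff ** 2:
                continue  # no spring between nodes farther apart than cutoff

            # 3x3 off-diagonal block for a spring at its equilibrium length.
            block = -spring_constant * np.outer(separation, separation) / dist_sq
            sa = slice(3 * a, 3 * a + 3)
            sb = slice(3 * b, 3 * b + 3)
            hessian[sa, sb] = block
            hessian[sb, sa] = block

            # Each diagonal block is minus the sum of its off-diagonal blocks.
            hessian[sa, sa] -= block
            hessian[sb, sb] -= block

    # Same mathematical treatment as for the covariance matrix in PCA.
    eigvals, eigvecs = np.linalg.eigh(hessian)
    return eigvals, eigvecs
```

For a well-connected network, the six lowest eigenvalues are numerically zero and correspond to rigid-body translation and rotation; the modes that follow are the lowest-frequency, largest-amplitude internal motions.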
CHAPTER 3. PROBING DNA CLAMP MECHANICS WITH SINGLE-MOLECULE