
In this chapter, we evaluate the Gaussian Network Model (GNM), reviewed in the previous chapter, on a set of protein structures downloaded from the Protein Data Bank. We explain how we implemented the algorithm in MATLAB (Matrix Laboratory), a numerical computing program widely used in academic institutions. We first describe the procedure for a single structure and then extend it to the whole set of protein structures. Finally, using the implemented algorithm, we calculate correlations with experimental data and present the distribution of those correlations.

3.1 Introduction

Proteins are among the most important biological compounds. A chain of amino acids folds into a three-dimensional protein structure. In other words, the order of the amino acids in the sequence determines the particular shape of the three-dimensional structure, and that shape defines the functional role of the protein. For this reason, the relationship between protein structure and dynamics has been studied for decades. The better we understand the dynamics of the entire structure, the better we understand its biological function. As discussed in the previous chapter, studying the structural fluctuations of proteins is therefore essential for understanding biological function.

Normal mode analysis requires the potential energy to be at a minimum, where the protein is in a stable state. The potential energy can then be approximated by a quadratic function around this minimum. The system of equations of motion (2.4) can then be solved, giving a solution for analyzing atomic fluctuations as well as overall structural vibrations. However, computing the energy minimum is usually very expensive because the Hessian matrix is huge. In addition, the potential energy function is itself only an approximation, so it carries errors at the atomic level. There is therefore a need for a model that does not require energy minimization yet still reflects the structural vibrations. The Gaussian Network Model (GNM) was proposed for this purpose, and it has turned out to be especially useful for investigating structural fluctuations. In this chapter, we examine how well the GNM predicts residue-level fluctuations compared with experimental data.

3.2 Preparation for evaluation

We downloaded a set of protein structures from the Protein Data Bank that satisfy the following conditions. First, we chose structures determined by X-ray crystallography at high resolution (better than 1.5 Å). Second, we filtered the search results by sequence similarity at 30% identity; in other words, multiple structures whose sequences share at least 30% sequence identity are represented by a single structure. This yielded 2,052 protein structures.

Since GNM is a residue-level model, we take the Cα atom as the representative atom of each residue in a given structure. The contact matrix of the structure is then obtained from the distances between Cα atoms. Next, we compute the residue mean square fluctuations from the singular value decomposition of the contact matrix for each structure in the set. Finally, because the mean square fluctuation has a linear relationship with the B-factor by equation (2.30), we can calculate the correlation coefficient between the residue mean square fluctuations computed by GNM and the experimental X-ray crystallography B-factors given in the PDB file.
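The first two steps above (extracting Cα atoms and building the contact matrix) can be sketched as follows. This is an illustrative Python/NumPy sketch, not the MATLAB code used in this work. The function names `parse_ca_atoms` and `kirchhoff_matrix` are hypothetical, the 7 Å cutoff is an assumed value that this chapter does not state, and the matrix form (off-diagonal entries of -1 for residues in contact, diagonal entries equal to the number of contacts) is the usual Laplacian convention, assumed here to match definition (2.23). Alternate locations and multiple models are not handled.

```python
import numpy as np

def parse_ca_atoms(pdb_text):
    """Collect C-alpha coordinates and B-factors from fixed-column ATOM records."""
    coords, bfactors = [], []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; HETATM records (e.g. calcium ions) are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Columns 31-38, 39-46, 47-54 hold x, y, z; columns 61-66 hold the B-factor.
            coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
            bfactors.append(float(line[60:66]))
    return np.array(coords), np.array(bfactors)

def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the contact (Laplacian) matrix; cutoff in angstroms is an assumed value."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contact = dist <= cutoff
    np.fill_diagonal(contact, False)          # a residue is not its own neighbor
    gamma = np.where(contact, -1.0, 0.0)      # -1 for each pair in contact
    np.fill_diagonal(gamma, contact.sum(axis=1))  # diagonal = number of contacts
    return gamma
```

Every row of the resulting matrix sums to zero, which is why the matrix is singular and a pseudo-inverse is needed in the next step.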

3.3 Implementation of algorithm

We implemented an algorithm in MATLAB running on a standard desktop workstation.

By definition (2.23), the contact matrix is a so-called Laplacian matrix. Because the contact matrix of GNM always has a zero eigenvalue, its inverse does not exist. Therefore, we compute the pseudo-inverse using the singular value decomposition (SVD) of the matrix. Specifically, in the implementation we kept only the singular values greater than 1e-5 when computing the mean square fluctuations with equation (2.28). After calculating the mean square fluctuations, we compute their correlation coefficient with the experimental B-factors in the PDB file. As discussed in Chapter 2, the B-factors are just a constant multiple of the mean square fluctuations (2.22), so the correlation coefficient between the predicted mean square fluctuations and the experimental B-factors can be computed directly. Once the correlation coefficient for one structure is obtained, we repeat the same procedure for every structure in the set. The resulting correlation coefficients are then used to show the distribution over the chosen set of protein structures.
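The pseudo-inverse and correlation steps can be sketched in a few lines. As before, this is an illustrative Python/NumPy sketch rather than the MATLAB implementation, and the function names `gnm_msf` and `msf_bfactor_correlation` are hypothetical. It assumes that equation (2.28) gives the mean square fluctuation of residue i as proportional to the i-th diagonal entry of the pseudo-inverse; the constant prefactor is omitted because it does not affect the correlation coefficient.

```python
import numpy as np

def gnm_msf(gamma, tol=1e-5):
    """Diagonal of the pseudo-inverse of gamma, using singular values above tol."""
    u, s, vt = np.linalg.svd(gamma)
    keep = s > tol                      # drop the (numerically) zero singular value
    # pinv = V diag(1/s) U^T, so its i-th diagonal entry is sum_k V[i,k] * U[i,k] / s[k]
    return (vt[keep, :].T * u[:, keep] / s[keep]).sum(axis=1)

def msf_bfactor_correlation(msf, bfactors):
    """Pearson correlation coefficient between predicted fluctuations and B-factors."""
    return float(np.corrcoef(msf, bfactors)[0, 1])

# Toy example: a three-residue chain in which only neighbors are in contact.
gamma = np.array([[1.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])
msf = gnm_msf(gamma)   # the end residues fluctuate more than the middle one
```

Because the Pearson coefficient is unchanged by scaling either variable by a positive constant, the unscaled diagonal can be compared with the B-factors directly.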

3.4 Evaluation Results

Figure 3.1: Distribution of correlations for protein structures using GNM

Figure 3.1 shows the distribution of correlation coefficients for the given set of protein structures. Of the 2,052 protein structures, correlations were computed for only 1,817. We removed structures for which the number of Cα atoms used to predict the mean square fluctuations does not equal the number of Cα atoms with experimental B-factors in the PDB file. Sometimes, the

the X-ray crystallography B-factors. We also excluded any chosen structure that is not a protein, such as a DNA or RNA structure. Figure 3.1 illustrates the smooth distribution of the correlation coefficients for the 1,817 structures. Almost 70% of the structures have correlations greater than or equal to 0.5, which indicates that GNM predicts the residue mean square fluctuations reasonably well. In Chapter 5 and Chapter 6, we will compare our proposed models to this benchmark to understand how well they predict the structural fluctuations.
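Summary statistics of this kind can be computed from the list of per-structure correlation coefficients as sketched here. The function names are hypothetical, the bin count is an arbitrary choice, and the values in the example are made-up toy numbers, not the results reported in this chapter.

```python
def fraction_at_least(values, threshold=0.5):
    """Fraction of correlation coefficients greater than or equal to threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)

def histogram_counts(values, n_bins=10, lo=-1.0, hi=1.0):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp the index so that values at the edges fall in the first or last bin.
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    return counts

# Toy correlation coefficients (illustrative only).
toy = [0.2, 0.55, 0.7, 0.9, -0.1]
share = fraction_at_least(toy)          # 3 of the 5 toy values are >= 0.5
bins = histogram_counts(toy, n_bins=4)  # bins of width 0.5 over [-1, 1]
```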
