The study of protein motion is of fundamental importance in structural biology due to its intimate link with protein function. The most prevalent method for modelling protein motion computationally is through MD simulations [Gohlke and Thorpe, 2006]. An MD force field describes the interactions between the atoms of the protein, and by numerically solving Newton’s equations of motion, structural fluctuations
of widely used programs to implement the simulations [Karplus and McCammon, 2002]. The detailed atomic trajectories derived make MD simulations easier to probe than experimental data [Karplus and Petsko, 1990]. Since proteins are made up of many thousands of atoms, the large number of equations that must be solved makes MD simulations computationally intensive. Accurate trajectories
require small timesteps on the order of 1 fs [Wells et al., 2005]. Some of the
most biologically relevant motions take place on the ms to s timescale, which is currently inaccessible to MD. To simulate larger-amplitude protein motion, methods of coarse graining can be applied [Clementi, 2008]. In such simulations, atomistic detail is neglected and conglomerates of atoms form the unit particles instead. As a result the force fields are simplified, the number of calculations is reduced and longer
simulations are possible. Geometric simulations, such as those applied with froda,
involve the exploration of a conformational space defined by the structure of the
protein and the constraints placed on its atoms by the bond network. The first software can
be used to establish the bond network and the rigidity it confers on the protein. The application of normal mode analysis to a coarse-grained elastic network model (ENM) [Tirion, 1996] can be used to generate vectors along which to direct such an exploration. These concepts are introduced in the following sections, and described further elsewhere [Wells et al., 2005; Jimenez-Roldan et al., 2012].
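To put the timescale gap described above in numbers, the short calculation below counts how many integration steps a 1 fs timestep implies for a given simulated duration. It is only an illustrative sketch: the function name `md_steps` and the choice to express all times as integer femtoseconds are assumptions made here, not part of any MD package.

```python
# Times are held as integer femtoseconds so the arithmetic is exact.
FS_PER_MS = 10**12   # femtoseconds in one millisecond
FS_PER_S = 10**15    # femtoseconds in one second


def md_steps(simulated_time_fs, timestep_fs=1):
    """Number of integration steps needed to cover simulated_time_fs.

    timestep_fs defaults to 1 fs, the order of magnitude quoted in the text.
    """
    if timestep_fs <= 0:
        raise ValueError("timestep must be positive")
    # Ceiling division: a partial final step still costs one full step.
    return -(-simulated_time_fs // timestep_fs)


steps_for_ms = md_steps(FS_PER_MS)  # 10**12 steps for one millisecond
steps_for_s = md_steps(FS_PER_S)    # 10**15 steps for one second
```

Each of these steps requires a force evaluation over every atom in the system, which is why the ms to s range remains out of reach for atomistic MD.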
1.5.1 froda
The constrained geometric simulation software froda (framework rigidity optimized
dynamic algorithm) rapidly simulates protein motion [Wells et al., 2005]. froda
is a module within first. The framework approach searches for conformers which
satisfy the constraints of the network defined by first. Random motions are used
to generate new conformations, and the re-application of constraints within the network ensures that the conformer becomes valid. Trajectories are formed from continuous pathways between sets of acceptable conformers. The progression of motion is not measured using time, as in MD, but in distances calculated using the newly generated conformers and a reference conformer, typically the initial input structure [Wells et al., 2005]. The algorithm scales linearly with system size, and large-amplitude motions in proteins with hundreds or thousands of residues can be rapidly explored using a desktop computer on a timescale of minutes [Jimenez-Roldan et al., 2012]. The speed of the algorithm comes at the price of sacrificed detail. There is no information on the energy of different conformational states, and
the lack of time in a froda simulation makes direct quantitative comparison with
between froda simulations and NMR experiments have been made [Wells et al.,
2005; Gohlke and Thorpe, 2006]. Details of the algorithm can be found elsewhere
[Wells et al., 2005]. froda has been used to investigate protein-protein docking
problems [Jolley et al., 2006] as well as substrate recognition and conformational changes [Macchiarulo et al., 2007].
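The perturb-then-restore cycle described in this section can be illustrated with a toy example. The sketch below is not the froda algorithm itself (see [Wells et al., 2005] for that); it only assumes a set of points joined by fixed-distance constraints, displaces them randomly, and then iteratively moves pairs of points until every constraint is met again. All names (`enforce_constraints`, `geometric_simulation`), the helical toy coordinates and the parameter values are hypothetical, and progress is reported as an unaligned RMSD from the reference conformer.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enforce_constraints(points, constraints, tol=1e-6, max_iter=2000):
    """Shift pairs of points until all distance constraints hold within tol.

    Returns True on success and False if max_iter sweeps were not enough.
    """
    for _ in range(max_iter):
        worst = 0.0
        for i, j, target in constraints:
            d = distance(points[i], points[j])
            error = d - target
            worst = max(worst, abs(error))
            if d == 0.0:
                continue
            shift = 0.5 * error / d  # each end absorbs half of the error
            for axis in range(3):
                delta = shift * (points[j][axis] - points[i][axis])
                points[i][axis] += delta
                points[j][axis] -= delta
        if worst < tol:
            return True
    return False


def rmsd(points, reference):
    """Root-mean-square deviation without superposition (a simplification)."""
    total = sum(distance(p, q) ** 2 for p, q in zip(points, reference))
    return math.sqrt(total / len(points))


def geometric_simulation(reference, constraints, n_steps=50, step_size=0.2, seed=0):
    """Random moves followed by constraint restoration; keeps valid conformers."""
    rng = random.Random(seed)
    current = [list(p) for p in reference]
    trajectory = []
    for _ in range(n_steps):
        trial = [[x + rng.uniform(-step_size, step_size) for x in p] for p in current]
        if enforce_constraints(trial, constraints):
            current = trial
            trajectory.append((rmsd(current, reference), current))
    return trajectory


# Toy reference: eight points on a helix, constrained to their first and
# second neighbours along the chain (made-up geometry, not a real protein).
reference = [[2.3 * math.cos(1.745 * k), 2.3 * math.sin(1.745 * k), 1.5 * k]
             for k in range(8)]
constraints = [(i, j, distance(reference[i], reference[j]))
               for i in range(8) for j in (i + 1, i + 2) if j < 8]
trajectory = geometric_simulation(reference, constraints)
```

Each entry of `trajectory` pairs a distance from the reference with a conformer that satisfies the constraint network, mirroring the way progress is tracked by distance rather than by time.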
1.5.2 Normal mode analysis
In addition to using random motions, it is possible to direct the motion so that particular conformational changes can be modelled. Using normal mode analysis (NMA) [Diamond, 1990], we direct the motion of the protein along low-frequency normal mode vectors, the superposition of which describes large-scale motion [Hinsen, 1998]. Previous applications of NMA include predictions of conformational changes in proteins, steered MD simulations and high-throughput comparisons of dynamics in protein families [Bahar and Rader, 2005]. In Chapter 6 we will discuss the results of hydrogen-deuterium exchange NMR experiments which, as shown in Figure 1.2, concern long timescale protein motion. We use NMA to model the slow large-amplitude motions of the protein using normal mode vectors so that we can make predictions concerning such long timescales of protein motion.
Global protein motion can be described as the superposition of a set of independent concerted motions [Suhre and Sanejouand, 2004; Petrone and Pande, 2006]. Each independent motion describes a state of the system where all of the particles oscillate with the same characteristic frequency [Dobbins et al., 2008]. The vectors describing these independent motions are the normal mode vectors; NMA determines the vectors along with their respective frequencies. The calculation of normal mode vectors from an input structure involves approximating the potential energy of the system around an energy minimum. In standard NMA the potential energy function is described using bond lengths, bond angles, and dihedral angles between bonded atoms as well as steric repulsions, van der Waals attractions, and electrostatic interactions between non-bonded atoms [Levitt, 1983]. Minimising this potential energy function is computationally intensive and often inaccurate [Tirion, 1996]. The computational demands of standard NMA applied to proteins are compounded by the large system sizes (tens of thousands of atoms) typically involved [Hinsen, 1998].
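The approximation of the potential energy around a minimum can be written out explicitly. The expressions below are the standard textbook form of the harmonic approximation rather than equations taken from the works cited, and the notation is introduced here for illustration: \mathbf{r}_0 is the minimised structure, \mathbf{H} the matrix of second derivatives (Hessian) evaluated there, \mathbf{M} the diagonal mass matrix, and \mathbf{v}_k and \omega_k the k-th normal mode vector and its angular frequency.

```latex
V(\mathbf{r}) \approx V(\mathbf{r}_0)
  + \tfrac{1}{2}\, \Delta\mathbf{r}^{\mathsf{T}} \mathbf{H}\, \Delta\mathbf{r},
\qquad \Delta\mathbf{r} = \mathbf{r} - \mathbf{r}_0,
\qquad H_{ij} = \left. \frac{\partial^2 V}{\partial r_i \, \partial r_j} \right|_{\mathbf{r}_0}

\mathbf{H}\, \mathbf{v}_k = \omega_k^2\, \mathbf{M}\, \mathbf{v}_k
```

The first-order term is absent because the gradient of V vanishes at a minimum, which is why a minimised structure is required before the modes can be computed.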
The potential energy function may be simplified in order to remove the minimisation step [Tirion, 1996]. In this simplification, the pairwise interactions between neighbouring atoms are modelled as springs, and the spring constant is the same for each interaction [Tirion, 1996]. This simple ball-and-spring representation is
the ENM. The advantage of the ENM is that the input structure is by definition a minimum for the potential and minimisation is no longer required. The lowest frequency normal modes are not significantly affected when calculated in this way [Tirion, 1996]. Large-amplitude protein motion can be accurately described by a small number of low-frequency normal modes [Krebs et al., 2002; Alexandrov et al., 2005; Jimenez-Roldan et al., 2012].
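One common way of writing the single-parameter spring potential described above is given below. The symbols are notation assumed here: C is the universal spring constant, R_c a distance cutoff that defines which pairs count as neighbours, \mathbf{r}_{ab} the vector between particles a and b, and \mathbf{r}^0_{ab} the same vector in the input structure.

```latex
E(\mathbf{r}) = \frac{C}{2}
  \sum_{\substack{a < b \\ |\mathbf{r}^0_{ab}| < R_c}}
  \left( |\mathbf{r}_{ab}| - |\mathbf{r}^0_{ab}| \right)^2
```

Every term is non-negative and vanishes when \mathbf{r} = \mathbf{r}^0, so E \geq 0 with E(\mathbf{r}^0) = 0. The input structure is therefore a minimum of the potential by construction, which is what removes the need for minimisation.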
Further simplification is achieved through coarse-graining of the protein. In the coarse-grained ENM, each residue of the protein is represented by a point
mass located at the Cα position. This simplification remains sufficient to calculate
the backbone motion of the protein and therefore accurately characterise the low-frequency normal modes [Hinsen, 1998]. Further coarse-graining methods have also been successfully implemented [Gohlke and Thorpe, 2006]. We implement the
NMA of the coarse-grained ENM using the elnemo software [Suhre and Sanejouand,
2004]. Normal modes are rapidly generated. The first six normal modes, m1, ..., m6,
are the three rotations and three translations of a rigid body in space, each with a frequency of zero. These are neglected, and the subsequent normal modes are used. In the work in
Chapter 4 we simulate motion along m7–m16, the first ten non-trivial modes. These
are the normal modes with the lowest non-zero frequencies. There we use an approach which combines random motion with directed motion along the mode direction [Jimenez-Roldan et al., 2011].
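The mode selection described above can be sketched with NumPy. This is not the elnemo implementation: it assumes unit masses, a single spring constant, a 10 Å cutoff and a made-up helical set of Cα-like coordinates, and the function names (`toy_helix`, `enm_hessian`, `normal_modes`) are hypothetical. It builds the Hessian of a coarse-grained ENM, diagonalises it, and separates the six zero-frequency rigid-body modes from the first ten non-trivial modes.

```python
import numpy as np


def toy_helix(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Made-up C-alpha-like coordinates on an ideal helix (angstroms)."""
    k = np.arange(n_residues)
    angle = np.deg2rad(turn_deg) * k
    return np.column_stack((radius * np.cos(angle), radius * np.sin(angle), rise * k))


def enm_hessian(coords, cutoff=10.0, spring=1.0):
    """Hessian of a one-parameter elastic network, evaluated at the input structure."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # pair is not connected by a spring
            block = -spring * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, cutoff=10.0, spring=1.0):
    """Eigenvalues (proportional to squared frequency) and modes, lowest first."""
    eigenvalues, eigenvectors = np.linalg.eigh(enm_hessian(coords, cutoff, spring))
    return eigenvalues, eigenvectors.T  # one mode per row


coords = toy_helix(20)
eigenvalues, modes = normal_modes(coords)

# Rigid-body translations and rotations cost no energy, so the six lowest
# eigenvalues are zero to numerical precision.
n_trivial = int(np.sum(np.abs(eigenvalues) < 1e-8))

# Modes m7 to m16 in the numbering of the text (zero-based indices 6 to 15).
low_modes = modes[6:16]
```

With unit masses the eigenvalues are proportional to the squared mode frequencies, so sorting by eigenvalue orders the modes from lowest to highest frequency.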