Chapter 2 Methods
2.5 Analysis
In the analysis of all interfacial simulations carried out in this work, it was important that the adsorbate-surface separation was defined in a consistent manner. In the case of the aqueous α-Quartz interface, the average z co-ordinate of the Si atoms in the top layer of silica was chosen as the reference point. The same definition has been used in previous studies [Notman and Walsh [2009]; Oren et al. [2010]]. Similarly, binding distances were measured from the z co-ordinate of the gold atoms in the uppermost layer of each of the three gold surfaces. For the reconstructed Au(100) interface, this was an average co-ordinate because of undulations in the surface.
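As a minimal sketch of this definition, the separation can be computed as the difference between an adsorbate z co-ordinate and the mean z co-ordinate of the top-layer surface atoms. The function name and the assumption that the top-layer atoms have already been selected are illustrative and not part of the original analysis scripts.

```python
def surface_separation(z_adsorbate, z_top_layer):
    """Return the adsorbate-surface separation along z.

    z_adsorbate : z co-ordinate of the adsorbate atom (or site) of interest.
    z_top_layer : z co-ordinates of the atoms in the uppermost surface layer
                  (e.g. top-layer Si or Au atoms), assumed pre-selected.
    """
    if not z_top_layer:
        raise ValueError("z_top_layer must contain at least one atom")
    # Reference plane: average z co-ordinate of the top-layer atoms.
    z_reference = sum(z_top_layer) / len(z_top_layer)
    return z_adsorbate - z_reference


# Illustrative numbers only (arbitrary length units).
separation = surface_separation(12.0, [10.0, 10.2, 9.8])
```

Averaging over the layer means the same function covers both the flat surfaces and the undulating reconstructed Au(100) surface.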
2.5.1 Hydrogen Bonding
The geometric criteria defined by Jedlovszky et al. to determine the existence of a hydrogen bond in a box of TIP3P water were employed in the analysis of most simulations reported in this thesis [Jedlovszky et al. [1998]]. (Hydrogen bond analysis presented in Chapters 5 and 6 was carried out slightly differently for the reasons outlined in the text.) Specifically, a hydrogen bond was said to be present when two distance criteria (O/N···O and H···O) and one angular criterion (H···(O/N)···O) were satisfied simultaneously. In Chapters 3 and 4, cut-off distances were fine-tuned for each specific type of hydrogen bond using the position of the first minimum in the appropriate radial distribution function (RDF), following the precedent set by Jedlovszky et al. [1998]. Distances derived for intermolecular hydrogen bonding in TIP3P water were used in all other cases. An angular cut-off of 30° was used [Jedlovszky et al. [1998]; Luzar and Chandler [1996, 1993]].
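The three geometric criteria can be sketched as a single test on one donor-hydrogen-acceptor triplet. This is an illustration under stated assumptions rather than the analysis code used in the thesis: the function and parameter names are hypothetical, co-ordinates are plain Cartesian tuples with no periodic boundary handling, and the distance cut-offs in the usage example are placeholders (the thesis derives them from RDF minima and does not list them here).

```python
import math


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     r_da_max, r_ha_max, angle_max_deg=30.0):
    """Geometric hydrogen-bond test for one donor-H...acceptor triplet.

    donor, hydrogen, acceptor : (x, y, z) tuples (no periodic images applied).
    r_da_max : cut-off for the donor(O/N)...acceptor(O) distance.
    r_ha_max : cut-off for the H...acceptor(O) distance.
    angle_max_deg : cut-off for the H...(O/N)...O angle, measured at the donor.
    """
    d_to_a = [a - d for a, d in zip(acceptor, donor)]
    d_to_h = [h - d for h, d in zip(hydrogen, donor)]
    h_to_a = [a - h for a, h in zip(acceptor, hydrogen)]

    r_da = math.sqrt(sum(c * c for c in d_to_a))
    r_dh = math.sqrt(sum(c * c for c in d_to_h))
    r_ha = math.sqrt(sum(c * c for c in h_to_a))

    # Both distance criteria must hold.
    if r_da > r_da_max or r_ha > r_ha_max:
        return False

    # Angle between the donor->hydrogen and donor->acceptor vectors.
    cos_angle = sum(p * q for p, q in zip(d_to_h, d_to_a)) / (r_dh * r_da)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= angle_max_deg


# Placeholder cut-offs in angstrom, chosen only for illustration.
bonded = is_hydrogen_bond((0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (2.8, 0.0, 0.0),
                          r_da_max=3.5, r_ha_max=2.45)
```

Keeping the distance cut-offs as required arguments reflects the fact that they were tuned separately for each type of hydrogen bond.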
The persistence of a hydrogen bond over two different time-scales (ps and ns) can be probed by calculation of the continuous S(t) and instantaneous C(t) hydrogen-bond time autocorrelation functions respectively. These are defined as:
\[
X(t) = \frac{\langle h(0) \cdot h(t) \rangle}{\langle h \rangle} \tag{2.33}
\]
where ⟨h⟩ is the number of hydrogen bonds at t = 0, averaged over a number of different starting points throughout the simulation. h(t) is 1 if a hydrogen bond present at time 0 is also present at time t, and 0 otherwise; for X(t) = S(t) this holds only if the particular hydrogen bond exists continuously between times 0 and t, whilst for X(t) = C(t) it is independent of this condition. For short time-scale simulations, such as the CPMD simulations reported in Chapter 4, the continuous hydrogen-bond time autocorrelation function, S(t), has been reported to be a better measure of hydrogen-bond lifetime [Rosenfeld and Schmuttenmaer [2011]; Paul and Chandra [2004]] because it normally displays exponential decay within the timescales probed and is not dependent on the time at which the hydrogen bond was formed. In addition, for small length-scale systems it is inevitable that hydrogen bonds between specific atoms will break and reform in time because of the limited number of atoms. This could lead to an over-estimation of the characteristic hydrogen-bond lifetime if the instantaneous, rather than the continuous, time autocorrelation function were used. On the other hand, for large systems and larger time-steps, the instantaneous correlation function is more appropriate since it is independent of sampling frequency, except at very short lifetimes. It must be noted that in this study hydrogen bonds between individual atoms, rather than between molecules, have been identified in all analyses. (For example, hydrogen bonds HaOc−Hb···OdH2 and HbOc−Ha···OdH2 were classed as distinct.)
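The difference between the two functions can be illustrated with a short sketch that evaluates Equation 2.33 from a table of hydrogen-bond occupancies. The data layout (one row per frame, one column per atom-level donor-hydrogen-acceptor triplet), the function name and the choice to normalise each lag by the bonds present at the time origins usable for that lag are assumptions made for illustration.

```python
def hbond_autocorrelation(h, max_lag, continuous=False):
    """Evaluate X(t) = <h(0) h(t)> / <h> for lags 0..max_lag (in frames).

    h          : h[frame][bond] is truthy if that hydrogen bond exists.
    continuous : True gives S(t) (bond must exist at every frame from the
                 origin to the lag); False gives C(t) (bond must exist at
                 the origin and at the lag, regardless of what lies between).
    """
    n_frames = len(h)
    if not 0 <= max_lag < n_frames:
        raise ValueError("max_lag must satisfy 0 <= max_lag < number of frames")
    n_bonds = len(h[0])

    values = []
    for lag in range(max_lag + 1):
        present_at_origin = 0  # accumulates <h> over all time origins
        surviving = 0          # accumulates <h(0) h(t)>
        for t0 in range(n_frames - lag):
            for b in range(n_bonds):
                if not h[t0][b]:
                    continue
                present_at_origin += 1
                if continuous:
                    alive = all(h[t][b] for t in range(t0, t0 + lag + 1))
                else:
                    alive = bool(h[t0 + lag][b])
                surviving += alive
        values.append(surviving / present_at_origin if present_at_origin else 0.0)
    return values


# One bond that breaks at frame 2 and reforms at frame 3.
occupancy = [[True], [True], [False], [True]]
c_t = hbond_autocorrelation(occupancy, max_lag=2, continuous=False)
s_t = hbond_autocorrelation(occupancy, max_lag=2, continuous=True)
```

In this toy case the reformed bond keeps C(t) non-zero at a lag of two frames while S(t) has already decayed to zero, which is the break-and-reform effect described above.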
2.5.2 Peptide Structure Clustering
To identify the structure that a peptide or protein sequence is most likely to adopt in a particular environment, it is often useful to group similar structures generated during a simulation into clusters. Cluster analysis can also:
• give an indication of the amount of peptide conformational space explored by a system. (For example, a peptide sequence that samples a larger number of clusters than another is likely to be conformationally more labile, whilst, when benchmarking two methods using the same system, the method that enhances conformational sampling the most will give rise to the largest total number of clusters.)
• be used as a metric for equilibration of a simulation by monitoring convergence in the total number of clusters identified.
Throughout this thesis (Chapters 5, 7 and 8), the Daura clustering algorithm was used to classify peptide (QBP-1 [Oren et al. [2007]] and AuBP-1 [Hnilova et al. [2008]]) structures into clusters [Daura et al. [1999]]. Briefly, the protocol for this procedure was as follows:
1. Calculate the root mean squared deviation (RMSD) of a subset of peptide atoms, after least-squares fitting, between all pairs of structures within the pool of structures generated during a trajectory.
2. Determine the number of neighbours each structure has. A ‘neighbour’ is defined as another structure for which the RMSD between the pair is less than a cut-off value.
3. Assign the structure with the largest number of neighbours (labelled the ‘centroid’ structure of the cluster), along with all its neighbouring structures, to a cluster.
4. Remove all structures assigned to the cluster from the overall pool of structures.
5. Repeat steps 1-4 until all structures have been assigned to a cluster.
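The steps above can be sketched as follows, assuming the pairwise RMSD matrix (after least-squares fitting) has already been computed by some other tool. The function name, the tie-breaking rule (lowest structure index wins) and the convention that a structure counts as its own neighbour are illustrative assumptions, not details taken from Daura et al. [1999].

```python
def daura_cluster(rmsd, cutoff):
    """Greedy neighbour-count clustering on a precomputed RMSD matrix.

    rmsd   : square, symmetric matrix (list of lists); rmsd[i][j] is the
             RMSD between structures i and j after fitting.
    cutoff : RMSD below which two structures are neighbours.
    Returns a list of (centroid_index, sorted_member_indices) tuples in
    the order the clusters were extracted (largest first).
    """
    pool = set(range(len(rmsd)))
    clusters = []
    while pool:
        # Step 2: neighbours of each structure still in the pool.
        # Each structure is its own neighbour because rmsd[i][i] == 0.
        neighbours = {
            i: [j for j in pool if rmsd[i][j] < cutoff] for i in pool
        }
        # Step 3: the structure with the most neighbours is the centroid.
        # Ties are broken in favour of the lowest index (an assumption).
        centroid = max(sorted(pool), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[centroid])
        clusters.append((centroid, members))
        # Step 4: remove the clustered structures from the pool.
        pool -= set(members)
    return clusters


# Four toy structures: 0, 1 and 2 lie close together, 3 is an outlier.
toy_rmsd = [
    [0.0, 1.0, 3.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [3.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
toy_clusters = daura_cluster(toy_rmsd, cutoff=2.0)
```

Because the RMSD matrix does not change between iterations, only the neighbour counts need to be recomputed on the shrinking pool in each pass.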
Unless stated otherwise, peptide structures were clustered according to their backbone atom positions, with an RMSD cut-off of 2 Å. The same procedure and cut-off were used to assign structures of QBP-1 in solution [Notman et al. [2010]]. Here a cluster was defined as being ‘significant’ based on its size. Specifically, the total number of ‘significant’ clusters was the number of clusters required to account for 95% of the total structure population.
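One way to read the 95% criterion is to count clusters in order of decreasing population until the cumulative population reaches the threshold. The sketch below follows that reading; the function name and the decreasing-size ordering are assumptions, since the text specifies only the 95% figure.

```python
def count_significant_clusters(cluster_sizes, fraction=0.95):
    """Number of clusters needed to cover `fraction` of all structures.

    cluster_sizes : number of structures in each cluster (any order).
    Clusters are counted from the most to the least populated
    (an assumed ordering).
    """
    total = sum(cluster_sizes)
    running = 0
    for count, size in enumerate(sorted(cluster_sizes, reverse=True), start=1):
        running += size
        if running >= fraction * total:
            return count
    return 0  # only reached when there are no structures at all


# Toy populations: 60 + 30 = 90 structures is short of 95, so a third cluster is needed.
n_significant = count_significant_clusters([60, 30, 6, 4])
```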
To investigate whether the structures most likely to be adopted by a given peptide were similar across different simulations (Chapter 5) or environments (Chapters 7 and 8), additional RMSD calculations were performed. In this case, only the centroid structures of the most populated clusters from each environment/simulation were chosen. Clusters, and the structures within them, were said to be ‘identical’ if the RMSD was less than the cut-off used in the original cluster analysis (2 Å) and ‘similar’ if it fell within the range 2-3 Å.
Figure 2.2: Boundaries in φ/ψ space demarking the principal regions in a Ramachandran plot for analysis of QBP-1 structure.