

1.4.4 Interaction Thermodynamics

1.4.4.8 Binding Affinity Prediction

In section 1.3.1.3, it was shown how knowing the strength of the interactions between the molecular constituents of biological systems is a prerequisite for understanding cellular logic. There are a number of methods of calculating affinities from structure, such as thermodynamic integration, free-energy perturbation, MM-PBSA and others (Zacharias, 2010b; Gilson and Zhou, 2007). However, these methods are computationally very expensive, as they require extensive conformational sampling. Even considering the advances in GPU-accelerated molecular dynamics, scoring whole interactomes is beyond the reach of these techniques. Faster methods can be broadly split into two categories: knowledge-based potentials and "master" thermodynamic equations, both of which must be empirically parametrised.


inhibitors/other enzyme-inhibitors/antibody-antigen/small peptides/others). The parameters (Par.), number of variable terms (Var.), reported performance (Perf.), the method used (Method) and the reference (Reference) are also given. The method is reported as either a potential of mean force (PMF) or a sum of terms (sum), with the name given where applicable.

| Cases | Var. | Perf. | Method | Par. | Reference |
|---|---|---|---|---|---|
| 3 (0/0/3/0/0) | 0 | 10.5^a | sum | Electrostatics, hydrophobic burial, side-chain entropy, constant | Novotny et al. (1989) |
| 15 (13/0/1/0/1) | 3 | 0.96^b | sum | Hydrophobic burial, polar burial, constant | Horton and Lewis (1992) |
| 9 (7/0/0/0/0) | 0 | 2.4^a,c | sum | Electrostatics, H-bonding, side-chain entropy, constant | Krystek et al. (1993) |
| 9 (7/0/0/0/0) | 0 | 1.3^a,c | sum | Electrostatics, hydrophobic burial, side-chain entropy, constant | Vajda et al. (1994) |
| 14 (14/0/0/0/0) | 2 | 0.9^a | sum | Electrostatics, desolvation | Nauchitel et al. (1995) |
| 9 (9/0/0/0/0) | 4 | 0.74^b | sum | Statistical function | Wallqvist et al. (1995) |
| 21 (16/0/2/0/3) | 4 | 0.86^b | sum | Hydrophobic burial, hydrophilic burial, # hydrophilic pairs, constant | Xu et al. (1997) |
| 9 (9/0/0/0/0) | 0 | 0.7^b | PMF, ACE | Coefficient, offset | Zhang et al. (1997) |
| 20 (16/0/3/0/1) | 0 | 0.94^b,c | sum | Electrostatics, hydrophobic burial, side-chain entropy, constant | Weng et al. (1997) |
| 2 (1/0/1/0/0) | 0 | 0.54^a | sum | Electrostatics (water and self), VDW (water and self), cavitation, entropy (translational, rotational, vibrational and configurational) | Noskov and Lim (2001) |
| 28 (16/1/7/0/4) | 1 | 0.75^b | PMF | Coefficient, offset | Jiang et al. (2002) |
| 19 (16/1/1/0/1) | 3 | 0.95^b | sum | Hydrophobic burial, side-chain entropy, # hydrophilic pairs, constant | Ma et al. (2002) |
| 69 (19/2/8/27/13) | 2 | 0.87^b | PMF, DComplex | Coefficient, offset | Liu et al. (2004) |
| 52 (14/1/5/27/5)^d | 0 | 0.79^b,e, 0.85^b | sum, Rosetta | Electrostatics, H-bonding, VDW, EEF1 desolvation, pair-potential, water potential | Jiang et al. (2005) |
| 82 (21/3/14/27/18) | 2 | 0.73^b | PMF, DComplex | Coefficient, offset | Zhang et al. (2005b) |
| 24 (18/1/2/0/3) | 7 | 0.98^b, 0.95^f, 0.62^a | sum, AffinityScore | Interface gap volume, # exposed charges, # salt bridges, # hydrogen bonds, # constricted torsions, # exposed hydrophobic groups, constant | Audie and Scarlata (2007) |
| 20 (7/2/5/0/6) | 5 | 0.83^b | sum | Hydrophobic burial, polar burial, charge-charge interaction, charge burial, side-chain entropy | Bougouffa and Warwicker (2008) |
| 86 (25/2/13/26/20) | 0 | 0.76^b, 2.24^a | PMF | None | Su et al. (2009) |
| 63 (6/15/4/0/37) | 8^g | 0.73^b,c,f | sum | Trans/rot entropy, # atom pairs, # non-polar residues, # non-polar atom pairs, interface planarity, interface gap volume, gap volume/interface area ratio, constant | Bai et al. (2011) |

^a RMSE, kcal mol⁻¹.
^b Correlation.
^c Two complexes omitted from evaluation.
^d Identities from personal communication.
^e Without water potential.
^f Leave-one-out cross-validation.
^g Feature selection.
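The two performance measures reported in the table, RMSE and correlation, can be illustrated with a short sketch. It assumes that "correlation" means Pearson's r, which the footnote does not state; the function names and all numbers are hypothetical and are not taken from any of the cited studies.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error, in the units of the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient (assumed meaning of 'correlation')."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Made-up binding free energies in kcal/mol, for illustration only.
predicted_dg = [-10.0, -12.0, -8.0, -14.0]
observed_dg = [-11.0, -12.5, -9.0, -13.0]

print("RMSE:", rmse(predicted_dg, observed_dg))
print("r:", pearson_r(predicted_dg, observed_dg))
```

A prediction that differs from experiment by a constant offset has perfect correlation but a non-zero RMSE, so entries in the Perf. column that use different measures are not directly comparable.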

In the knowledge-based approach, statistical potentials (described in section 1.4.4.4) are used to predict the binding free energy of complexes with known binding affinity. Appropriately formulated, these do not require training on binding free energies, only on structures (Zhang et al., 1997; Su et al., 2009). Having no adjustable parameters, these methods carry no risk of over-fitting. Often, however, a correction to this prediction scheme, in the form of a coefficient and an offset, is derived by linear regression of the predicted energies against empirical binding free energies (Liu et al., 2004; Zhang et al., 2005b), or by adjusting the gradient alone (Jiang et al., 2002). Such pair potentials cannot tell us how the free energy is partitioned into entropic and enthalpic components, nor can the free energy be decomposed according to physical origin: electrostatics, desolvation, van der Waals interactions and so on.
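The linear correction described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited paper: the function names are hypothetical, the numbers are invented, and reading "adjusting the gradient" as a least-squares fit through the origin is an assumption.

```python
def fit_coefficient_and_offset(scores, dg_exp):
    """Ordinary least squares for dg_exp ~ a * score + b.

    `scores` are raw statistical-potential energies and `dg_exp` are
    experimental binding free energies for the same complexes.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_g = sum(dg_exp) / n
    sxx = sum((s - mean_s) ** 2 for s in scores)
    sxy = sum((s - mean_s) * (g - mean_g) for s, g in zip(scores, dg_exp))
    a = sxy / sxx            # coefficient (gradient)
    b = mean_g - a * mean_s  # offset (intercept)
    return a, b


def fit_gradient_only(scores, dg_exp):
    """Least squares through the origin: dg_exp ~ a * score (assumed reading)."""
    return sum(s * g for s, g in zip(scores, dg_exp)) / sum(s * s for s in scores)


# Invented potential scores (arbitrary units) and affinities (kcal/mol).
raw_scores = [-30.0, -42.0, -25.0, -51.0]
measured_dg = [-9.5, -12.0, -8.8, -14.1]

a, b = fit_coefficient_and_offset(raw_scores, measured_dg)
corrected = [a * s + b for s in raw_scores]
print("coefficient:", a, "offset:", b)
print("corrected predictions:", corrected)
```

Either correction introduces parameters fitted to affinity data, so the rescaled potential is no longer free of adjustable parameters in the sense used above.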

In the "master equation" scheme, a number of physically relevant terms are calculated explicitly and taken in linear combination. In some cases, the inclusion and weighting of each term is based on physical law, such as Coulomb’s law, or on empirically parametrised functions, such as the function relating the change in hydrophobic surface to the free energy of transfer from aqueous to non-aqueous solvent (Chothia, 1974). This information is derived from data other than the affinity test set and thus, again, no fitting is required (Novotny et al., 1989; Krystek et al., 1993; Vajda et al., 1994; Weng et al., 1997). In other approaches, the weights of some or all terms are determined by linear regression against a training set (Horton and Lewis, 1992; Xu et al., 1997; Ma et al., 2002; Bougouffa and Warwicker, 2008; Bai et al., 2011). The physical origins are known in these methods, as the contributions are calculated explicitly, and their form can also be related to the enthalpy and entropy of binding for comparison with isothermal titration calorimetry data (Weng et al., 1997). However, the parameters are often collinear (Bougouffa and Warwicker, 2008; Vajda et al., 1994), and thus it is difficult to be certain that the functional form appropriately reflects the underlying physics. For instance, the affinities of almost identical test sets can be predicted equally well by equations of differing functional form (Horton and Lewis, 1992; Xu et al., 1997; Weng et al., 1997).
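As a rough illustration of the regression-based variant, the sketch below fits the weights of a linear "master equation" by least squares and evaluates it by leave-one-out cross-validation. All names and data are hypothetical, the solver is a plain normal-equations implementation chosen for self-containment, and the absolute pivot threshold used to flag collinear terms is a crude assumption rather than a recommended diagnostic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # crude singularity check (assumption)
            raise ValueError("singular system: terms may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_weights(terms, dg_exp):
    """Least-squares weights for dg ~ sum_i w_i * term_i + constant.

    Returns the term weights followed by the constant as the last entry.
    """
    rows = [list(t) + [1.0] for t in terms]  # trailing 1.0 models the constant
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * g for r, g in zip(rows, dg_exp)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(weights, term_values):
    """Evaluate the fitted equation for one complex."""
    return sum(w * t for w, t in zip(weights, term_values)) + weights[-1]


def leave_one_out(terms, dg_exp):
    """Predict each complex from a model fitted to all the others."""
    predictions = []
    for i in range(len(terms)):
        train_terms = terms[:i] + terms[i + 1:]
        train_dg = dg_exp[:i] + dg_exp[i + 1:]
        predictions.append(predict(fit_weights(train_terms, train_dg), terms[i]))
    return predictions


# Invented per-complex terms, e.g. [buried hydrophobic area, # polar pairs].
example_terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
example_dg = [2.0 * t[0] - 3.0 * t[1] + 0.5 for t in example_terms]

print("weights and constant:", fit_weights(example_terms, example_dg))
print("leave-one-out predictions:", leave_one_out(example_terms, example_dg))
```

When two terms are exactly proportional the normal equations become singular and the sketch raises an error; with nearly collinear terms the fit succeeds but the individual weights become unstable, which is the practical difficulty in interpreting them physically.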

A summary of methods published to date is given in Table 1.3.

In most efforts to predict binding free energy, the rigid-body approximation was invoked. Either the rigid-body assumption was implicit, as the test and training sets were derived from previous publications in which structures that undergo conformational changes were excluded (Jiang et al., 2002, 2005; Su et al., 2009), or flexible cases were explicitly excluded from the set of complexes. Prior to Liu et al. (2004), only small sets of proteins were used, and these mostly comprised protease/inhibitor interactions and other high-affinity complexes composed of rigid binding partners, such as barnase/barstar, the insulin dimer, the α and β chains of deoxyhaemoglobin and lysozyme/antibody complexes. In these studies, excellent agreement was obtained between experiment and theory, with correlations of up to 0.96 reported (Horton and Lewis, 1992; Nauchitel et al., 1995; Weng et al., 1997).

Between 2004 and 2009, four papers were published in which free energy functions were applied to more diverse sets of complexes (Liu et al., 2004; Jiang et al., 2005; Zhang et al., 2005b; Su et al., 2009). Most of this added diversity came from the inclusion of small peptides, typically between 2 and 5 residues in length, and mostly involving interactions with oligopeptide-binding protein (Sleigh et al., 1999, 1997; Tame et al., 1995). However, other interactions, such as those involving G-binding proteins, signal transduction complexes and hormone/receptor pairs, started entering the data sets. Correspondingly, there was a decrease in the correlation between the experimental binding free energies and theoretical results. In the work of Bai et al. (2011), feature selection and multiple regression were used to construct models to predict affinity, dissociation and association rates for a diverse set of complexes. However, it is not clear how they implemented their feature selection algorithm, and there are a large number of adjustable parameters. Further, the binding energy function seems incongruous, involving terms such as the number of contacted atom pairs per 100 Å² of interface area, the gap volume at the interface per 100 Å² of interface area, the planarity of the interface and the number of non-polar residues at the interface.

A vastly more diverse set of interactions for which structural and affinity data were available was collected and tested by Kastritis and Bonvin (2010). In this set, no class of interaction is over-represented, as the list of complexes from which it was derived contains no pairs with high sequence identity, except for antibodies (Chen et al., 2003). Post-corrigendum, their test set contained 46 complexes, to which 12 binding free energy functions were applied. The greatest correlation with these data is 0.53, highlighting the degree to which previous studies were biased towards rigid proteins and particular classes of interaction.