Normal Mode Analysis and Conformational Transitions
3.2.6 Energy Function
3.3.1.1 PSO Variant Selection
Table 3.1: Performance of the PSO variants described in section 3.2.4.3 for local bound-bound docking, with results aggregated over the 7 test complexes. Hits is the number of times a correct solution was found, irrespective of its ranking. For successful runs, the mean neighbourhood fraction (k/n), mean population size (n) and mean number of iterations (steps) taken to find the bound complex are also given. These indicate which neighbourhood and population sizes work best for each variant and how many iterations each variant typically needs to locate the correct solution.
Method    Hits   Mean k/n   Mean n   Mean steps
BPSO       100     0.36       339       219
CPSO        68     0.23       350       317
CPSOvc     223     0.35       326       246
CPSO2       56     0.19       339       169
RPSO       728     0.48       323       273
The first test of the algorithm was performed to select the best-performing PSO variant. Initial benchmarking of the five PSO variants described in section 3.2.4.3 was carried out by rigid-body docking of seven diverse complexes in their bound conformations.
These complexes are: the antibody/antigen complexes 1MLC and 1E6J, the enzyme/inhibitor complexes 2MTA and 1F34, and three other complexes, 1GCQ, 1I4D and 1H1V. 1MLC, a Fab/lysozyme complex, has a highly hydrophobic interface, whilst 1E6J, a Fab/HIV-1 capsid protein p24 complex, has a significant electrostatic contribution to the binding energy. 2MTA, a methylamine dehydrogenase/amicyanin complex, is hydrophobic, with no intermolecular hydrogen bonds and one salt bridge, and 1F34, a pepsin/pepsin inhibitor complex, has a large interface with 13 hydrogen bonds. 1GCQ, a GRB2/VAV complex, has a fairly small interface. 1I4D, the arfaptin/RAC1-GDP complex, is hydrophobic, with only 2 hydrogen bonds. Finally, the actin/gelsolin complex 1H1V has a large interface that is highly electrostatic in nature.
As this test was intended to characterise the behaviour of the swarm, only local docking was performed. Ligands were pulled away from the receptor by 20Å, displaced randomly by an amount drawn from a Gaussian distribution (σ = 30Å) and given a random orientation. Different swarm sizes, n, and neighbourhood sizes, k, were tested. For each complex, 25 parameter sets were tested, comprising all combinations of n = 100, 200, 300, 400, 500 and k = 2, 0.25n, 0.5n, 0.75n, n − 1. Runs for each parameter set were repeated 10 times. The number of times the binding site was found (RMSD < 1.0Å relative to the crystal structure) is shown in Table 3.1, along with the average neighbourhood size, swarm size and number of iterations for successful runs. In Figure 3.3, correct binding site identification events are placed into bins of 25 iterations for each method.
Figure 3.3: Correctly determined poses were placed into bins of 25 iterations for the 5 different PSO variants. The RPSO method is considerably more likely to find the correct binding site than the other methods. The downward trend after roughly the first hundred iterations is due to the swarm focusing on low-energy regions away from the binding site and no longer exploring new regions of search space.
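The benchmarking protocol above can be sketched as follows. This is a minimal illustration, not the thesis code: the helper names (`parameter_grid`, `perturb_ligand`, `is_hit`, `bin_hits`) are hypothetical, and several details are assumptions not stated in the text. Fractional neighbourhood sizes are rounded to the nearest integer, the 20Å pull is taken along the receptor-to-ligand centre-of-mass vector, the Gaussian displacement is applied independently to each axis, the random orientation is a uniform rotation about the ligand centre of mass, and RMSD is computed over matched atoms without superposition.

```python
import numpy as np

SWARM_SIZES = [100, 200, 300, 400, 500]


def neighbourhood_sizes(n):
    """The five neighbourhood sizes k tested for a swarm of n particles."""
    # Assumption: fractional sizes are rounded to the nearest integer.
    return [2, round(0.25 * n), round(0.5 * n), round(0.75 * n), n - 1]


def parameter_grid():
    """All 25 (n, k) combinations tested for each complex."""
    return [(n, k) for n in SWARM_SIZES for k in neighbourhood_sizes(n)]


def random_rotation(rng):
    """Uniformly distributed rotation matrix from a normalised Gaussian quaternion."""
    w, x, y, z = rng.normal(size=4)
    norm = np.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def perturb_ligand(ligand, receptor_com, rng, pull=20.0, sigma=30.0):
    """Move a bound ligand (N x 3 array, in Å) away from its crystal pose."""
    com = ligand.mean(axis=0)
    outward = (com - receptor_com) / np.linalg.norm(com - receptor_com)
    # Pull 20 Å away from the receptor, then add an isotropic Gaussian shift.
    new_com = com + pull * outward + rng.normal(0.0, sigma, size=3)
    # Randomly reorient the ligand about its own centre of mass.
    return (ligand - com) @ random_rotation(rng).T + new_com


def rmsd(pose, reference):
    """RMSD (Å) between matched atom sets, without superposition."""
    return float(np.sqrt(((pose - reference) ** 2).sum(axis=1).mean()))


def is_hit(pose, crystal, cutoff=1.0):
    """A pose counts as a hit if it lies within 1.0 Å RMSD of the crystal pose."""
    return rmsd(pose, crystal) < cutoff


def bin_hits(hit_iterations, width=25):
    """Count hit events per 25-iteration bin (iterations 1-25 -> bin 0, ...)."""
    counts = {}
    for it in hit_iterations:
        b = (it - 1) // width
        counts[b] = counts.get(b, 0) + 1
    return counts


print(len(parameter_grid()))        # 25
print(neighbourhood_sizes(100))     # [2, 25, 50, 75, 99]
print(bin_hits([1, 25, 26, 100]))   # {0: 2, 1: 1, 3: 1}
```

Keeping the rotation about the ligand's own centre of mass means the translation and orientation perturbations are independent, so the σ = 30Å displacement alone controls how far the starting pose lies from the binding site.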
It is clear from Table 3.1 and Figure 3.3 that the RPSO variant performs considerably better than the others: it found the binding site 728 times, more than three times as often as the next best variant, CPSOvc (223 hits). For this reason, RPSO was chosen for further parametrisation and for all subsequent docking runs. Based on the average population size and neighbourhood size of successful runs shown in Table 3.1, all subsequent runs used n = 350 particles with a neighbourhood size of k = 114.
It was speculated that the better performance of the RPSO variant arose because its distance-dependent repulsion term maintains the diversity of the swarm while no particular region of search space has yet been shown to be consistently lower in energy. Mechanistically, contraction of the swarm reduces the magnitude of the repulsion term, which in turn hastens further contraction. The initial collapse occurs when p_i and p_{g,i} coincide for many particles, so that the particles all head toward nearby regions of search space. To test whether this effect prevented premature convergence, the diversity of the swarm was tracked as the algorithm progressed. As a measure of how diffuse the swarm is, the mean Euclidean distance between particles was calculated every 25 iterations, i.e. the distance separating the centres of mass of two particles, averaged over all particle pairs. The results, averaged over all runs, are shown in Figure 3.4A. At termination, the particle centres of mass in successful runs are around 5Å closer to one another than in runs in which the correct binding site was not found. This is consistent with the swarm collapsing onto a region once it has been identified as promising, indicating that the repulsion term behaves as expected.
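The diversity measure described above, and the mechanism it was used to test, can be illustrated with a toy swarm. This is a sketch under stated assumptions, not the RPSO of section 3.2.4.3: particles are points in a 3-D space minimising a simple sphere function rather than ligand poses scored by the energy function, the neighbourhood is a hypothetical ring of k neighbours, and the repulsion term `c_rep * r3 * (x_i - centroid)` is a hypothetical form chosen only because its magnitude, like that of the term described in the text, shrinks as the swarm contracts. `mean_pairwise_distance` is the metric itself: the Euclidean distance between particle positions (centres of mass in the docking case) averaged over all unordered pairs.

```python
import numpy as np


def mean_pairwise_distance(positions):
    """Mean Euclidean distance over all unordered pairs of rows in an (n, d) array."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(positions), k=1)  # count each pair once
    return float(dist[upper].mean())


def sphere(x):
    """Toy single-funnel objective standing in for the docking energy."""
    return float((x ** 2).sum())


def ring_best(pbest_f, i, k):
    """Index of the best personal best among particle i and k ring neighbours."""
    n = len(pbest_f)
    members = [(i + o) % n for o in range(-(k // 2), k - k // 2 + 1)]
    return min(members, key=lambda j: pbest_f[j])


def run_toy_swarm(f, n=30, k=6, dim=3, iters=200, w=0.729, c1=1.494,
                  c2=1.494, c_rep=0.1, record_every=25, seed=0):
    """Local-best PSO with a hypothetical repulsion term; returns the diversity trace."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-30.0, 30.0, size=(n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    trace = [mean_pairwise_distance(x)]
    for t in range(1, iters + 1):
        centroid = x.mean(axis=0)
        for i in range(n):
            g = ring_best(pbest_f, i, k)
            r1, r2, r3 = rng.random(3)
            v[i] = (w * v[i]
                    + c1 * r1 * (pbest[i] - x[i])      # attraction to p_i
                    + c2 * r2 * (pbest[g] - x[i])      # attraction to p_{g,i}
                    + c_rep * r3 * (x[i] - centroid))  # repulsion, shrinks as swarm contracts
            x[i] += v[i]
            fx = f(x[i])
            if fx < pbest_f[i]:
                pbest[i] = x[i]
                pbest_f[i] = fx
        if t % record_every == 0:
            trace.append(mean_pairwise_distance(x))
    return trace


trace = run_toy_swarm(sphere)
print([round(d, 2) for d in trace])  # diversity at iterations 0, 25, ..., 200
```

Sampling every 25 iterations mirrors the interval used for Figure 3.4. Setting `c_rep=0.0` gives a plain local-best PSO for comparison.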
To examine this effect further, successful runs were separated into 6 groups according to when the correct binding site was found. The results can be seen in Figure 3.4B. The swarm clearly collapses earlier when the binding site is found earlier, an effect that was not observed for any of the other PSO variants tested. For runs in which the correct binding site was found between iterations 101 and 200, the swarm had already started to collapse before iteration 100. This indicates that the low-energy attractors, p_i and p_{g,i}, had already flagged the binding funnel as a promising region on which to focus the search, in keeping with the mechanism described above.
Figure 3.4: The convergence behaviour of the RPSO. (A) The mean Euclidean distance between ligand centres of mass, averaged over all runs, over successful runs and over runs that failed to locate the correct binding site. (B) The mean Euclidean distance for successful runs, grouped by the iteration at which the correct binding site was found.
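The grouping used for Figure 3.4B can be sketched as below. The text states only that successful runs were split into 6 groups by when the binding site was found; the 100-iteration group width (consistent with the 101 to 200 group mentioned above), the choice to put all later hits into the last group, and the `group_traces` name are assumptions. Each run contributes its diversity trace, sampled at the same iterations in every run, and the example data are invented for illustration.

```python
import numpy as np


def group_traces(runs, group_width=100, n_groups=6):
    """Average diversity traces of successful runs, grouped by discovery iteration.

    runs: iterable of (hit_iteration, trace) pairs; every trace must be sampled
    at the same iterations so that traces can be averaged element-wise.
    Group g holds runs whose first hit falls in iterations
    g * group_width + 1 .. (g + 1) * group_width; later hits go to the last group.
    """
    grouped = {}
    for hit_iteration, trace in runs:
        g = min((hit_iteration - 1) // group_width, n_groups - 1)
        grouped.setdefault(g, []).append(np.asarray(trace, dtype=float))
    return {g: np.mean(np.stack(traces), axis=0)
            for g, traces in sorted(grouped.items())}


# Hypothetical example data: (iteration of first hit, diversity trace in Å).
runs = [(50, [10, 8, 6]), (80, [12, 10, 8]), (150, [10, 9, 9]), (900, [5, 5, 5])]
means = group_traces(runs)
print(sorted(means))       # [0, 1, 5]
print(means[0].tolist())   # [11.0, 9.0, 7.0]
```

Plotting each group's mean trace against iteration gives curves of the kind shown in Figure 3.4B, where an earlier drop in a group's trace signals an earlier swarm collapse.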