
1.4. Computational Methods Used in This Project

1.4.1. Screening And Docking

1.4.1.1. High-Throughput Screening

In virtual screening, a very large number of compounds are investigated with the aim of generating a rough ranking, from which can be drawn a smaller, refined set of compounds for testing by more accurate methods130. There are two options: ligand-based screening and structure-based screening. Shangary and Wang44 discuss the roles that cheaper, computational methods, such as in silico screening and de novo compound design, have played in the development of p53-Mdm2 inhibitors.

Ligand-based screening seeks to predict new inhibitors through comparison with existing ligands or, in the case of enzymes, with the substrate and/or the predicted transition state which must be stabilised. One piece of software facilitating such screening is ROCS (OpenEye)131. Ligand-based design is relevant to the current project in that oligobenzamides have been chosen for use as scaffolds based on their ability to place side chains in similar relative locations to those of residues in the p53 helix. However, ligand-based screening is of little relevance to the testing of potential inhibitors in this project. It is biased towards ligands which resemble existing molecules, which prevents a fair comparison of candidates. Furthermore, this bias means that ligand-based screening is not a source of interesting new chemistry.

In contrast, structure-based drug design makes direct use of the target protein’s structure rather than relying on the properties of the target protein being encapsulated in those of the known binding molecules. The success of structure-based methods depends on the availability of accurate structural information, which can be difficult to obtain. Many proteins, particularly membrane proteins, are not amenable to crystallisation or NMR, so no suitable structure is available. Fortunately, there are many X-ray crystal and NMR structures of Mdm2, the focus of this project.

Structure-based screening typically involves docking, the computational fitting of compounds into the surface of a protein to predict where they will bind most strongly. In virtual screening, the docking must be rapid, so the structures of the protein and ligands are often static. In such cases, a number of different conformations of the ligand are generated and docked to account for the ligand flexibility132. If the ligand conformation is changed during docking, this movement may be limited to a small number of degrees of freedom. For example, in ReCore (BioSolveIT)133, the problem of finding a side chain that fits is simplified to a search for groups with the correct vectors, the vectors being the direction of the leaving bond (the bond connecting the fragment to a larger molecule) and the directionality of a pharmacophore feature such as a hydrogen bond donor. Possible side chains are rotated relative to the core structure but there is no rotation about any of the bonds within the core or the side chain. Possible side chains are ranked based on the relative geometry of the query vectors and the corresponding vectors in the side chain133. According to the ReCore manual, the closer the vectors and the more similar their directionality, the greater the score.
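The idea of ranking candidates by how well their vectors match a query can be illustrated with a short sketch. This is not ReCore's scoring function, which is not reproduced here; the function names, the `dist_scale` parameter and the particular way of combining closeness and alignment are all assumptions made for illustration. It only shows the qualitative behaviour described above: a higher score for vectors that are closer together and more similarly directed.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def vector_match_score(query, candidate, dist_scale=1.0):
    """Score one (origin, direction) vector pair; hypothetical functional form.

    query, candidate: ((x, y, z), (dx, dy, dz)) tuples.
    Returns a value in [0, 1]; 1 means same origin and same direction.
    """
    (q_origin, q_dir), (c_origin, c_dir) = query, candidate
    distance = math.dist(q_origin, c_origin)
    cos_angle = sum(a * b for a, b in zip(unit(q_dir), unit(c_dir)))
    closeness = 1.0 / (1.0 + distance / dist_scale)   # 1 when origins coincide
    alignment = (1.0 + cos_angle) / 2.0               # 1 parallel, 0 antiparallel
    return closeness * alignment

def rank_candidates(query_vectors, candidates):
    """Rank candidate side chains (name -> list of vectors) by summed score."""
    totals = {
        name: sum(vector_match_score(q, c) for q, c in zip(query_vectors, vecs))
        for name, vecs in candidates.items()
    }
    return sorted(totals, key=totals.get, reverse=True)
```

In this sketch each candidate is assumed to supply its vectors in the same order as the query (for example, leaving bond first, then the pharmacophore feature); a real implementation would also have to search over rigid-body rotations of the side chain relative to the core.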

In high-throughput screening, compounds pass or fail based on whether they satisfy certain criteria, for example, based on the existence of hydrogen bond acceptors or aromatic rings in specific positions. When searching for drug-like compounds these criteria are pharmacophore constraints. Galatin and Abraham134 used quantitative structure-activity relationships (QSAR) derived using a structure of Mdm2 bound to the p53 transactivation domain peptide to predict successfully the effect on binding affinity of small changes to the peptide sequence.

Lu et al.118 used the program CAVEAT135 to filter compounds to ensure that they all had side chains which could mimic those of Phe19, Trp23 and Leu26 in the human p53 transactivation domain, three residues identified as key "hot spots" by Böttger et al.26 using phage display. Criteria for screening can be generated using the binding positions of small, fragment-like molecules. The docking of small fragments is unreliable136 but fragment binding sites can be determined by X-ray crystallography or NMR137. Fragment positions can be used to generate possible binding molecules as well as to screen a database for them; positioned fragments can be grown138 or joined139. The smaller chemical space of fragments as compared to that of larger compounds has made predicting compounds by ligand-based drug design an increasingly popular alternative to high-throughput synthesis and screening140.

1.4.1.2. Docking

In more rigorous docking, the ligand is considered flexible during the docking process; however, this comes at the expense of using more computational power. The protein is still often kept rigid in the docking process but some software can also move flexible parts of the protein. Autodock, one of the programs used in this project, can do this, although this feature was not used here. If the protein has a flexible binding site but the software does not permit movement of protein atoms, then it is possible to dock into an ensemble of X-ray structures141 or conformations generated by computational means. Computational methods of generating an ensemble include taking snapshots from a molecular dynamics simulation142, using normal mode analysis143 and docking a few compounds with a method that does allow for protein movement144.

There are disadvantages to using multiple structures. As well as slowing down the docking process overall, using different conformations can introduce error if the conformations used do not accurately represent the possible conformations of the protein145. This can occur even if the ensemble members are different crystal structures141.

When docking a molecule, a scoring method is needed both to predict the relative stability of binding poses during the docking process and to rank the resulting poses. Huang et al.146 describe four types of scoring function: force-field-based functions, empirical functions, knowledge-based methods and consensus methods, which involve a combination of these. Examples of force-field-based scoring functions are DOCK and AutoDock146. Force fields are discussed in detail in the context of molecular dynamics simulations on p67.

Empirical scoring functions are simpler, typically using an equation like the following.

\[ \Delta G = \sum_i W_i V_i \qquad (1.14) \]

The score, ∆𝐺, is the weighted sum of a number of chosen variables, V. These may include the numbers of electrostatic interactions, hydrogen bonds and van der Waals interactions, the number of rotatable bonds and the hydrophobic surface area at the ligand-protein interface. The weights, W, are empirically derived using the structures of known complexes146. Examples of empirical scoring functions are ChemScore147, X-Score148 and FlexX149. Knowledge-based scoring functions are methods which use the experimentally determined structures of protein-ligand complexes, specifically, the frequency with which certain pairwise interactions are found in those structures.

In general,

\[ \Delta G = -kT \ln \left( \prod_i \frac{\rho_i(r_i)}{\rho_i^{*}(r_i)} \right) \qquad (1.15) \]

where ρi is the probability density of finding a distance of ri between the interacting pair of atoms, i, at the binding surface. ρ*i is the probability density which would be observed in a theoretical reference state in which there is no interaction146. k is the Boltzmann constant and T is the temperature in kelvins. Potentials of mean force are calculated using a formula of this type150.
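Equations 1.14 and 1.15 can be made concrete with a minimal sketch. The descriptor names, weights and density values used with these functions are hypothetical and are not taken from ChemScore, X-Score, FlexX or any published potential; the only manipulation of Equation 1.15 is that the logarithm of the product is evaluated as a sum of logarithms. The constant is expressed per mole (the gas constant), so that scores come out in kcal/mol.

```python
import math

# Boltzmann constant per mole (gas constant) in kcal/(mol K).
K_B = 0.0019872041

def empirical_score(weights, variables):
    """Equation 1.14: weighted sum of descriptor values V_i with weights W_i.

    weights, variables: dicts keyed by descriptor name (names are illustrative).
    """
    return sum(weights[name] * value for name, value in variables.items())

def knowledge_based_score(densities, reference_densities, temperature=298.15):
    """Equation 1.15: -kT ln(prod_i rho_i / rho*_i), summed in log space.

    densities: observed probability densities rho_i(r_i), one per atom pair.
    reference_densities: reference-state densities rho*_i(r_i), same order.
    """
    log_ratio_sum = sum(
        math.log(rho / rho_ref)
        for rho, rho_ref in zip(densities, reference_densities)
    )
    return -K_B * temperature * log_ratio_sum
```

For example, `empirical_score({"hbond": -1.0, "rotors": 0.5}, {"hbond": 3, "rotors": 2})` gives -2.0 with these made-up weights. In the knowledge-based form, atom pairs found at a distance more often than in the reference state (ρi > ρ*i) make the score more negative, that is, more favourable.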

In this project, three docking programs were used: FlexX (BioSolveIT)149, Autodock 4 (version 4.2)151 and Autodock Vina152. In FlexX, docking is a sequential process comprising the placement of a core fragment and then incremental addition of atoms to this core until the fully docked compound has been produced. Each step is repeated hundreds of times and the highest-scoring poses (evaluated using the FlexX statistical potential scoring function153) are selected for the next stage153. Other popular docking programs utilising this incremental approach are DOCK, Glide and Hammerhead154.

In contrast to the docking methods above, Autodock151 uses a genetic algorithm, meaning that the state of the ligand is treated like genetic material. In the case of Autodock, the information comprises the location (three coordinates), orientation (the four components of a quaternion) and conformation (one variable for each torsion angle) of the molecule155. A set of random poses is generated, the best poses are selected and then a new set of poses is generated from the selected poses by combining and then mutating their information155. The process of pose selection, genetic recombination and pose production is repeated thousands of times to produce a population of high-scoring poses. Where many of the resulting poses are very similar, they are clustered by RMSD (p78) to produce distinct poses. The docking program GOLD also uses a genetic algorithm. Autodock uses a Lamarckian genetic algorithm, meaning that some local fitting of each pose is performed before its genetic material (updated to take into account the effects of this local fitting) is processed to generate the next pose set156.
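The select, recombine, mutate cycle with a Lamarckian local-fitting step can be sketched as a toy program. This is an illustration under stated assumptions, not Autodock's implementation: the pose is a flat list of numbers standing in for the translation, quaternion and torsion genes, the `score` function is a made-up distance to a hypothetical ideal pose rather than a docking score, and the population size, mutation rate and random-walk local search are arbitrary choices. A real implementation would also need to keep the quaternion normalised.

```python
import random

def score(pose, target):
    """Toy fitness: squared distance to a hypothetical ideal pose (lower is better)."""
    return sum((p - t) ** 2 for p, t in zip(pose, target))

def local_search(pose, target, rng, steps=20, step_size=0.1):
    """Random-walk hill climbing; keeps a trial only if it improves the score."""
    best, best_score = list(pose), score(pose, target)
    for _ in range(steps):
        trial = [g + rng.gauss(0.0, step_size) for g in best]
        trial_score = score(trial, target)
        if trial_score < best_score:
            best, best_score = trial, trial_score
    return best

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(pose, rng, rate=0.1, scale=0.5):
    """Perturb each gene with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g for g in pose]

def lamarckian_ga(target, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    n_genes = len(target)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally refined pose replaces the genes it came from.
        population = [local_search(p, target, rng) for p in population]
        population.sort(key=lambda p: score(p, target))
        parents = population[: pop_size // 2]          # selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda p: score(p, target))
```

Calling `lamarckian_ga(target)` with an eight-number target (three translation values, four quaternion components and one torsion, say) returns the best pose found. Because the refined coordinates are written back into the genes, improvements found by local fitting are inherited by the next generation, which is what distinguishes the Lamarckian variant from a plain genetic algorithm.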

In terms of the computational time required for docking, Autodock Vina is much more efficient than Autodock 4. Its global search uses a third type of algorithm, based on the Iterated Local Search method devised by Baxter157. It is a Markov chain Monte Carlo iterative method, meaning that each iteration consists of the production of a possible pose based on the current one and the acceptance or rejection of this pose based on its probability relative to the current pose. In the case of Autodock Vina, the production of the possible pose involves a mutation followed by application of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method152. This method uses both the value and the gradient of the scoring function in the optimisation.
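The mutate, locally optimise, then accept-or-reject loop can be sketched as follows. This is a toy under stated assumptions rather than Autodock Vina's code: the `score` landscape is invented, plain gradient descent stands in for BFGS to keep the example free of dependencies, acceptance uses the standard Metropolis criterion, and the step sizes, temperature and iteration count are arbitrary.

```python
import math
import random

def score(x):
    """Invented rugged landscape with several local minima (not a docking score)."""
    return sum(xi * xi + 2.0 * math.sin(3.0 * xi) for xi in x)

def gradient(x):
    """Analytical gradient of `score`."""
    return [2.0 * xi + 6.0 * math.cos(3.0 * xi) for xi in x]

def local_optimise(x, steps=200, rate=0.02):
    """Gradient descent used here as a simple stand-in for BFGS."""
    x = list(x)
    for _ in range(steps):
        x = [xi - rate * gi for xi, gi in zip(x, gradient(x))]
    return x

def monte_carlo_search(start, iterations=200, temperature=1.0, seed=0):
    rng = random.Random(seed)
    current = local_optimise(start)
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(iterations):
        # Mutation: perturb one randomly chosen coordinate of the current pose.
        mutated = list(current)
        i = rng.randrange(len(mutated))
        mutated[i] += rng.uniform(-2.0, 2.0)
        # Local optimisation using the value and gradient of the score.
        candidate = local_optimise(mutated)
        delta = score(candidate) - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Starting from `[3.0, 3.0, 3.0]`, the initial local optimisation settles in a nearby local minimum; the random mutations are what allow the search to reach lower-scoring basins that a purely local optimiser would never visit.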

At the beginning of this project, rapid docking and scoring were used to predict the relative binding affinity of oligobenzamides for Mdm2; however, the synthetic inaccessibility of the compounds meant that validation was not possible. Schneider159 highlights the relatively few examples of drug discovery projects using virtual screening, citing the poor accuracy of these methods but also the need to identify synthetically accessible compounds, with docking ideally forming part of a cycle of computational and synthetic work.

Docking predicts binding poses successfully160, but docking scores are typically poor estimates of the true binding affinity of compounds161,162. Oligobenzamides are large molecules and therefore have the potential to make numerous interactions when they bind. The more interactions a set of compounds can make with a binding site, the harder it is to determine their relative affinities correctly163. Docking aims to find the most stable pose of a compound and docking scores are based on this pose. The true affinity of a compound, in contrast, depends on all of the system’s possible states, and the relative frequencies at which those states (each comprising a set of atom positions and momenta) occur as the compound and protein move around. Therefore, for accurate binding energy predictions, this phase space must be fully sampled, for example, by carrying out molecular dynamics simulations161,162.
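The difference between scoring a single best pose and accounting for all accessible states can be illustrated with a toy calculation. This is a deliberately simplified model, not a binding free energy method: it assumes a small discrete set of bound states with made-up energies in kcal/mol, ignores the unbound state, and uses the standard relation F = -kT ln Σ exp(-E/kT).

```python
import math

# Boltzmann constant per mole (gas constant) in kcal/(mol K).
K_B = 0.0019872041

def free_energy_from_states(energies, temperature=298.15):
    """Free energy of a discrete set of states: -kT ln(sum_i exp(-E_i / kT)).

    The minimum energy is factored out of the sum for numerical stability.
    """
    kt = K_B * temperature
    e_min = min(energies)
    partition = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(partition)

# Hypothetical compound A: one deep pose. Compound B: ten slightly shallower poses.
compound_a = [-10.0]
compound_b = [-9.8] * 10

best_pose_a, best_pose_b = min(compound_a), min(compound_b)
free_energy_a = free_energy_from_states(compound_a)
free_energy_b = free_energy_from_states(compound_b)
```

Ranked by best pose alone, compound A (-10.0) beats compound B (-9.8). Once all ten states of B are counted, its free energy falls to roughly -11.2 kcal/mol and the ranking reverses, which is the kind of effect a single-pose docking score cannot capture.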

In document Predicting and Testing Helix-Mimetic Inhibitors of the p53-Mdm2 Interaction (Page 60-64)