Scoring Functions - Molecular Docking - Structure Based Virtual Screening

1 Computational

1.8 Virtual Screening

1.8.2 Structure Based Virtual Screening

1.8.2.1 Molecular Docking

1.8.2.1.1 Scoring Functions

It is important to distinguish between a docking study and an SBVS experiment. Docking involves predicting the binding mode of individual molecules, with the aim of identifying the orientation that is closest in geometry to the observed X-ray structure. Studies evaluating the performance of docking programs on datasets derived from the Protein Data Bank (PDB) showed that, when the native ligand was docked back into the active site, the programs correctly predicted the binding geometry in more than 70% of cases.320-322 It is not always clear, however, which docking program will give the best result for a particular case,323, 324 so the results from individual studies should be considered carefully.
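How close a docked pose is to the crystallographic one is commonly expressed as a root-mean-square deviation (RMSD) over matched atoms. The sketch below is illustrative only: the function names and the 2.0 Å success cutoff are assumptions, not values taken from the studies cited above, and it assumes both poses list the same atoms in the same order in the same coordinate frame.

```python
import math


def pose_rmsd(pose, reference):
    """RMSD (in the units of the input, e.g. angstroms) between two poses.

    Both arguments are sequences of (x, y, z) tuples with identical atom
    ordering. No superposition is performed, because a docked pose and the
    crystal ligand already share the protein's coordinate frame.
    """
    if len(pose) != len(reference) or len(pose) == 0:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(pose, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(pose))


def docking_success_rate(rmsds, cutoff=2.0):
    """Fraction of redocked ligands whose RMSD is within the cutoff.

    The 2.0 cutoff is an assumed, commonly used convention.
    """
    return sum(r <= cutoff for r in rmsds) / len(rmsds)
```

Applied to a set of redocking results, `docking_success_rate` gives the kind of percentage quoted above; symmetry-equivalent atoms are not handled in this simplified version.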

For an SBVS experiment, once a pose has been generated in the binding site it is necessary to score or rank that ligand, using some function related to the free energy of association between the protein and ligand in forming the intermolecular complex. A wide range of scoring functions is available,325 and the ability to accurately predict the potency of ligand binding within a protein is of significant value, providing useful starting points for drug discovery.326, 327 Once the ligands are docked, the resulting interactions can be scored, giving a quantitative measure of fit quality. Scoring functions are approximate mathematical methods used to predict the strength of the non-covalent interactions between two molecules after they have been docked, a quantity also referred to as binding affinity. It is common practice to use scoring functions in protein-ligand docking,328 but they can also be used to predict the strength of intermolecular interactions between two proteins,329 or even between a protein and DNA.330 Scoring functions can be grouped into three categories: force field based, empirical, and knowledge based.331
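As a point of reference for what a scoring function is trying to approximate, the standard thermodynamic relation between a measured dissociation constant and the free energy of association can be written in a few lines. This is a minimal sketch; the function name and the default temperature are illustrative assumptions.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L.

    Uses delta_G = R * T * ln(Kd), with a 1 mol/L standard state.
    More negative values correspond to tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature * math.log(kd_molar)
```

Under this relation, each tenfold improvement in Kd corresponds to roughly 1.4 kcal/mol at room temperature, which gives a sense of the accuracy a scoring function would need in order to rank ligands of similar potency reliably.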

Force field based scoring functions include methods such as GOLDScore313, 320 (see Chapter V), although the boundary between these and empirical scoring functions is not sharp. The scores are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex. Intramolecular energies of the two binding partners are also frequently included. Since binding normally takes place in the presence of water, the desolvation energies of the ligand and the protein are sometimes taken into account using implicit solvation, a method that represents the solvent as a continuous medium rather than as explicit solvent molecules. Force field based scoring functions are primarily derived from force fields such as AMBER,332 which are frequently used in molecular dynamics simulations.
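The pairwise summation described above can be sketched as follows. This is not GOLDScore or AMBER: it is a generic 12-6 Lennard-Jones term plus a Coulomb term, with a single assumed well depth, contact distance, and dielectric constant applied to every atom pair, and with the intramolecular and desolvation terms left out.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*angstrom/(mol*e^2)


def pair_energy(r, epsilon, r_min, q_i, q_j, dielectric=4.0):
    """van der Waals plus electrostatic energy (kcal/mol) for one atom pair."""
    ratio6 = (r_min / r) ** 6
    vdw = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)  # minimum of -epsilon at r_min
    elec = COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
    return vdw + elec


def force_field_score(ligand_atoms, protein_atoms, epsilon=0.1, r_min=3.8):
    """Sum pair energies over every ligand-protein atom pair.

    Atoms are (x, y, z, partial_charge) tuples. The shared epsilon and r_min
    are illustrative assumptions; a real force field assigns them per atom type.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for px, py, pz, pq in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            total += pair_energy(r, epsilon, r_min, lq, pq)
    return total
```

A lower (more negative) total indicates a more favourable predicted interaction in this sign convention.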

Empirical scoring functions include ChemScore333, 334 (see Chapter V) and are derived to reproduce experimental data for complexes of known structure, using terms based on physicochemical properties. They are based on counting the number of various types of interactions between the two binding partners.335 Counting may be based on the number of ligand and receptor atoms in contact with each other, or on the change in solvent accessible surface area in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fitted using multiple linear regression (MLR) methods, and may include contributions from hydrogen bonding, ionic interactions, lipophilic interactions and the loss of internal conformational freedom of the ligand.
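The fitting step can be illustrated with a small ordinary least squares routine. This is a sketch under assumptions, not the ChemScore parameterisation: the choice of descriptors and the function names are hypothetical, and the fit is done with the normal equations in plain Python to keep the example self-contained.

```python
def fit_mlr(descriptors, affinities):
    """Fit affinity = w0 + w1*d1 + ... by ordinary least squares.

    descriptors: one tuple of interaction counts per complex.
    affinities: measured binding energies for the same complexes.
    Returns [w0, w1, ...] with w0 the intercept.
    """
    rows = [[1.0] + list(d) for d in descriptors]
    n = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, affinities))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def empirical_score(weights, descriptor_counts):
    """Score a new pose as the fitted weighted sum of its interaction counts."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], descriptor_counts))
```

Here each descriptor tuple might hold, for example, a hydrogen bond count, a lipophilic contact count, and a count of rotatable bonds frozen on binding; the fitted weights then play the role of the coefficients described above.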

Knowledge based methods rely on the idea that a sufficiently large data sample can be used to derive the rules and general principles implicit in that body of knowledge.331, 336-339 One such scoring function is DrugScore,338 which is used to describe the binding geometry of ligands in proteins. It is based on statistical observations of intermolecular close contacts in large 3D datasets, such that the interaction potential between each ligand-protein atom pair is calculated as a potential of mean force. The method is founded on the assumption that close intermolecular interactions between certain types of atoms or functional groups that occur more frequently than would be expected from a random distribution are likely to be energetically favourable and therefore contribute favourably to binding affinity.340
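The idea of turning contact frequencies into a potential of mean force can be sketched with the inverse Boltzmann relation. This is a simplified illustration and not the DrugScore derivation: the binning, the reference distribution, the handling of empty bins, and the function name are all assumptions.

```python
import math


def pair_potential(observed_counts, reference_counts, kT=0.593):
    """Distance-binned potential for one ligand-protein atom-type pair.

    observed_counts: contacts counted per distance bin for this atom-type pair.
    reference_counts: contacts per bin in the reference (expected) distribution.
    kT defaults to roughly 0.593 kcal/mol, its value near 298 K.
    Returns -kT * ln(observed frequency / reference frequency) for each bin.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            # No statistics for this bin: treated as neutral (an assumption).
            potential.append(0.0)
        else:
            ratio = (obs / obs_total) / (ref / ref_total)
            potential.append(-kT * math.log(ratio))
    return potential
```

Bins in which a contact is seen more often than the reference predicts receive a negative (favourable) value, matching the assumption stated above; a pose would then be scored by summing the bin values over all of its atom pairs.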

Docking a large compound library generates a vast amount of data, comprising the predicted binding pose for each compound along with the predicted binding affinity of that ligand at the target. It is therefore conceivable that a list of compounds for testing could be chosen on the basis of their rank ordering.341 It is well known, however, that current scoring functions used in virtual screening are often inadequate at predicting the true binding affinity of a ligand for a receptor,342 and there is currently no universally applicable scoring function.343 One popular strategy for overcoming this is consensus scoring.344-346 In this approach, a given docking function is used to generate the top ranked pose for each compound in the target receptor, and other scoring functions are then used to rescore that pose. Only those top ranked compounds common to every scoring function (the consensus) are chosen for biological testing, an approach that has been shown to improve the true hit rate from virtual screening.347
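The selection step of consensus scoring can be sketched as a set intersection over the top ranked compounds from each scoring function. The function names, the input layout, and the per-function flag for score direction are illustrative assumptions; the flag is included because some scoring functions report more favourable binding as lower values and others as higher values.

```python
def top_ranked(scores, n, lower_is_better=True):
    """Return the identifiers of the n best compounds for one scoring function.

    scores maps compound identifier to score.
    """
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    return set(ordered[:n])


def consensus_hits(score_tables, n):
    """Compounds that appear in the top n of every scoring function.

    score_tables is a list of (scores, lower_is_better) pairs, one per
    scoring function used to rescore the docked poses.
    """
    if not score_tables:
        return set()
    top_sets = [top_ranked(scores, n, lower) for scores, lower in score_tables]
    return set.intersection(*top_sets)


# Hypothetical scores for four compounds from two scoring functions.
energy_like = {"c1": -9.1, "c2": -7.5, "c3": -8.8, "c4": -5.0}   # lower is better
fitness_like = {"c1": 62.0, "c2": 40.0, "c3": 35.0, "c4": 58.0}  # higher is better
hits = consensus_hits([(energy_like, True), (fitness_like, False)], n=2)
```

In this made-up example only `c1` is in the top two of both rankings, so it alone would go forward for testing. Stricter or looser variants (for instance rank averaging) exist, but the intersection form follows the description above most directly.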

As a multitude of parameters govern the operation of docking programs, it is advisable to spend time investigating the options available for each docking run that is performed.192 Molecular docking and particular scoring functions are discussed further in Chapter V.

In document Antimalarial drug design: targeting the plasmodium falciparum cytochrome bc1 complex through computational modelling, chemical synthesis and biological testing (Page 71-74)