2.3 Docking Methods

2.3.2 Docking Simulation Methods: Flexible Ligand Search Algorithms

Docking simulation methods allow the flexibility of the ligand to be modelled. Such methods are more physically detailed and typically slower than matching techniques. Most of these methods treat the protein as a rigid body and consider only the conformational space of the ligand. Docking approaches that consider both ligand and protein flexibility have been developed in recent years, and progress within the field is discussed in a recent review by B-Rao and co-workers 205.

The methods that model flexibility within the ligand can be broadly categorised into three types: systematic, random or stochastic, and classical simulation.

Systematic docking algorithms attempt to explore all possible degrees of freedom within a ligand. Conformational search, fragmentation and database methods are all used in systematic docking algorithms. Conformational search methods systematically rotate every rotatable bond in the ligand through 360° using a fixed increment, generating and then evaluating all possible combinations. This method has very limited applicability because the number of generated structures grows exponentially with the number of rotatable bonds. For example, with a 30° increment each bond multiplies the count by 12, so ten rotatable bonds already give 12^10 (about 6.2 × 10^10) combinations. This is an example of a combinatorial explosion. The dimensionality of the problem can be reduced by applying constraints and restraints to the ligand 206.

In fragmentation methods the ligand is grown incrementally into the clefts (binding pocket) of a protein structure. This is done either by docking a rigid fragment of the ligand and then successively adding its flexible regions, or by docking several fragments into the binding pocket and then linking them covalently 206. Examples of programs that implement a fragmentation search method include LUDI 207, ADAM 208, FlexX 209 and DOCK 210.
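
The cost of a systematic conformational search can be made concrete with a short Python sketch. It is a minimal illustration, not the implementation used by any docking program. `toy_energy` is a hypothetical stand-in for a real scoring function, and only the torsion angles are searched, with the ligand's position and orientation held fixed.

```python
import itertools
import math


def toy_energy(torsions):
    """Hypothetical energy in which every torsion prefers 60 degrees;
    a stand-in for a real docking scoring function."""
    return sum(1.0 - math.cos(math.radians(t - 60)) for t in torsions)


def systematic_search(n_rotatable, increment_deg, energy):
    """Rotate every rotatable bond through 360 degrees in fixed increments,
    evaluate every combination, and return the lowest-energy one."""
    angles = range(0, 360, increment_deg)
    combos = itertools.product(angles, repeat=n_rotatable)  # all k**n combinations
    best = min(combos, key=energy)
    return best, energy(best)


def n_conformations(n_rotatable, increment_deg):
    """Number of torsion combinations a full systematic search must evaluate."""
    return (360 // increment_deg) ** n_rotatable


best, e = systematic_search(3, 30, toy_energy)
print(best)  # (60, 60, 60)
for n in (2, 5, 10):
    print(n, n_conformations(n, 30))  # 144, then 248832, then 61917364224
```

Three bonds need only 1,728 evaluations. The count for ten bonds shows why constraints, fragmentation or conformer libraries are needed in practice.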

Database methods use libraries of pre-generated ligand conformations to avoid the combinatorial explosion caused by ligand flexibility. The program FLOG uses distance geometry constraints to generate a library of 25 conformations for each database ligand, which are subsequently docked into the rigid protein 211.

Random or stochastic algorithms primarily use either a Monte Carlo (MC) algorithm or a genetic algorithm (GA). These methods make incremental changes to the ligand or to a population of ligands. Each change is accepted or rejected at every step according to a predetermined probability function.

In MC methods, a Boltzmann probability forms the basis of the criterion used to evaluate each new ligand configuration 206. Programs that use MC-based algorithms include DockVision 212,213, Prodock 214 and MCDOCK 215.
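
The Boltzmann acceptance test used in MC docking, the Metropolis criterion, can be sketched in Python. This is a generic minimal example, not the algorithm of DockVision, Prodock or MCDOCK. The one-dimensional state, the `energy` and `perturb` callables, and the temperature (expressed in energy units, i.e. as kT) are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng):
    """Boltzmann (Metropolis) criterion: always accept a move that lowers the
    energy; accept an uphill move with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def mc_search(x0, energy, perturb, n_steps, kt, seed=0):
    """Generic Monte Carlo search: perturb the current state, then keep or
    reject the trial state using the Metropolis criterion."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = perturb(x, rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kt, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Toy usage: a hypothetical one-dimensional "ligand coordinate", minimum at x = 2
def energy(x):
    return (x - 2.0) ** 2


def perturb(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = mc_search(-5.0, energy, perturb, n_steps=5000, kt=0.1)
```

Uphill moves are not rejected outright. This lets the search escape shallow minima, and lowering kT makes such moves rarer.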

Genetic algorithms 216, pioneered by John Holland, are global heuristic search algorithms inspired by evolutionary biology. A GA randomly generates an initial population of candidate solutions (individuals). Each individual is represented abstractly, commonly as a binary string, by genes organised into a chromosome. The individual solutions within a population evolve over generations towards better solutions. Only a proportion of the existing population is carried forward to produce the new generation, and individuals are selected to progress according to their fitness. Pairs of selected individuals (parents) combine in a process termed crossover to produce offspring, which then form the new population. Some of these offspring undergo random mutation. Selection, crossover and mutation are repeated in each new generation. The process terminates only when either a solution with a satisfactory fitness level has been generated or the predetermined maximum number of iterations has been reached. In the latter case, a solution of satisfactory fitness may not have been obtained 203. Examples of programs that include a GA method for molecular docking are GOLD 217,218, DIVALI 219, AutoDock version 3 220 and DARWIN 221.
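
A minimal binary-string GA in Python shows how these pieces fit together. It is a sketch under stated assumptions, not the method of any program listed above. Selection keeps the fitter half of the population (truncation selection), crossover is one-point, and mutation flips each bit with a small probability. The toy fitness function, the count of 1 bits, is hypothetical. Fitness is maximised here, whereas docking energies are minimised.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, max_gens=200,
                      target=None, p_mut=0.01, seed=0):
    """Minimal binary-string GA. Returns (best individual, generations used).
    Stops when an individual reaches `target` fitness or after `max_gens`."""
    rng = random.Random(seed)
    # Random initial population: each individual is a list of 0/1 genes
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for gen in range(max_gens):
        ranked = sorted(pop, key=fitness, reverse=True)
        if target is not None and fitness(ranked[0]) >= target:
            return ranked[0], gen
        # Selection: only the fitter half is carried forward as parents
        parents = ranked[: pop_size // 2]
        offspring = []
        while len(offspring) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < p_mut else g for g in child]
            offspring.append(child)
        pop = offspring  # the offspring form the new population
    return max(pop, key=fitness), max_gens


# Toy problem ("OneMax"): fitness is the number of 1 bits
best, gens = genetic_algorithm(sum, n_bits=20, target=20)
print(sum(best), gens)
```

The two stopping conditions match the termination rule described above: a satisfactory fitness (`target`) or a maximum number of iterations (`max_gens`).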

Simulation methods applied to molecular docking include simulated annealing and energy minimization. MD has difficulty crossing high-energy barriers in the energy landscape of a biological system. Applying simulated annealing to the docking problem circumvents this limitation and allows a larger region of conformational space to be searched. Energy minimization methods are rarely used as a docking search technique in their own right; however, they are commonly used to optimize potential ligand conformations 206.
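
Simulated annealing is Metropolis Monte Carlo run at a temperature that is gradually lowered. The Python sketch below illustrates this on a hypothetical one-dimensional double-well energy. The starting point lies in the higher well, and the global minimum lies on the far side of a barrier. The geometric cooling schedule and all parameter values are assumptions chosen for illustration, not those of AutoDock's Metropolis method or any other program.

```python
import math
import random


def simulated_annealing(x0, energy, perturb, t_start=10.0, t_end=0.01,
                        cooling=0.95, steps_per_t=100, seed=0):
    """Metropolis Monte Carlo with a temperature (in energy units) that is
    lowered geometrically: high T favours global exploration and barrier
    crossing, low T restricts the search to the current energy basin."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            trial = perturb(x, rng)
            e_trial = energy(trial)
            de = e_trial - e
            if de <= 0.0 or rng.random() < math.exp(-de / t):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # geometric cooling schedule (an assumed choice)
    return best_x, best_e


# Hypothetical double well: local minimum near x = +0.96, global minimum
# near x = -1.04, separated by a barrier close to x = 0
def energy(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def perturb(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(1.0, energy, perturb)
```

Starting at x = 1.0, a search that never accepted uphill moves would stay in the higher well. The early high-temperature phase lets the walker cross the barrier, and the later low-temperature phase refines the result locally.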

2.3.3 AutoDock

In the AutoDock program of Morris and co-workers, the docking simulation can be carried out with one of the following search methods: MC simulated annealing; a GA; a local search (LS); or a hybrid global-local search method, the Lamarckian genetic algorithm (LGA).

Versions 1 and 2 of the AutoDock program contained only an MC simulated annealing search option, referred to as the Metropolis method. Simulated annealing has both global and local search attributes: it performs a global search at higher temperatures and a more localised search at lower temperatures. However, this method had limitations when docking ligands with more than eight rotatable bonds. The GA, LS and hybrid LGA methods were developed to address the limitations of the Metropolis method and were introduced in version 3 of the program. Version 3 also contains an empirical binding free energy force field, developed to predict the binding free energies of docked ligands with greater accuracy 220.

Atomic affinity potentials, pre-calculated for each atom type in the ligand, enable rapid energy evaluation. The AutoGrid routine embeds the protein in a three-dimensional grid and places a probe atom at each grid point. The energy of interaction of this probe atom with the protein is assigned to the grid point. An affinity grid is calculated for each type of atom in the ligand, typically carbon, oxygen, nitrogen and hydrogen. A grid of electrostatic potential is also calculated, using either a Poisson-Boltzmann finite difference method or a probe point charge of +1. The energetics of a particular ligand configuration are determined by trilinear interpolation of the affinity values at the eight grid points surrounding each ligand atom. The electrostatic interaction is obtained by interpolating the values of the electrostatic potential and multiplying by the charge on the atom. With these grids, the time taken to perform the energy calculation is proportional only to the number of atoms in the ligand and is independent of the number of atoms in the protein 222.
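
The grid look-up can be sketched in Python. `trilinear` interpolates a value from the eight grid points surrounding an atom, and `ligand_energy` sums the affinity and charge-weighted electrostatic terms over the ligand atoms. This is a simplified illustration under assumed conventions, not the AutoDock or AutoGrid code. The grid is stored as nested lists indexed `grid[i][j][k]`, a single spacing is used for all axes, and atoms are given as hypothetical `(type, charge, (x, y, z))` tuples.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a property stored on a regular 3D grid, grid[i][j][k],
    at a point inside the grid from the 8 surrounding grid points."""
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    idx, frac = [], []
    for p, o, n in zip(point, origin, shape):
        u = (p - o) / spacing                   # position in grid units
        if not 0.0 <= u <= n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(u)), n - 2)      # lower corner of the cell
        idx.append(i)
        frac.append(u - i)
    (i, j, k), (dx, dy, dz) = idx, frac
    value = 0.0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                # Corner weight = product of the three 1D linear weights
                w = ((dx if a else 1.0 - dx)
                     * (dy if b else 1.0 - dy)
                     * (dz if c else 1.0 - dz))
                value += w * grid[i + a][j + b][k + c]
    return value


def ligand_energy(atoms, affinity_maps, elec_map, origin, spacing):
    """Grid-based energy of one ligand configuration: the affinity map for
    each atom's type, plus the atom's charge times the interpolated
    electrostatic potential. `atoms` holds (atom_type, charge, (x, y, z))."""
    total = 0.0
    for atom_type, charge, xyz in atoms:
        total += trilinear(affinity_maps[atom_type], origin, spacing, xyz)
        total += charge * trilinear(elec_map, origin, spacing, xyz)
    return total


# Toy usage: a linear field, which trilinear interpolation reproduces exactly
grid = [[[i + 2 * j + 3 * k for k in range(4)] for j in range(4)] for i in range(4)]
print(trilinear(grid, (0.0, 0.0, 0.0), 0.5, (0.25, 0.5, 0.75)))  # 7.0
```

Each atom costs eight look-ups per map regardless of protein size, which is why the evaluation scales with the number of ligand atoms only.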

Genetic Algorithms in Molecular Docking

In molecular docking, the position of the ligand with respect to the protein is described by its state variables. These are the variables that describe the translation, orientation and conformation of the ligand relative to the protein, and each state variable corresponds to a gene. The ligand's state corresponds to the genotype, and the atomic coordinates of the ligand correspond to the phenotype. The total interaction of the ligand with the protein, as determined by the energy function, defines the fitness of a solution. Crossover is the process by which random pairs of individuals (solutions) are mated to produce offspring that inherit genes from either parent. Changes to the genes of the offspring can be introduced by random mutation. The offspring of the current generation then undergo selection based on their fitness; this ensures that better solutions reproduce and that poorer solutions are eliminated 220.
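
The genotype-to-phenotype mapping for the rigid-body part of the ligand can be sketched in Python. Several assumptions are hypothetical rather than taken from AutoDock. The genes are laid out as `[tx, ty, tz, ux, uy, uz, angle, torsions...]`, with the orientation stored as a rotation axis and an angle in degrees that is converted to a unit quaternion. The reference coordinates are centred on the origin. Torsion genes are ignored, so no internal conformational change is applied.

```python
import math


def axis_angle_to_quaternion(ux, uy, uz, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about axis u."""
    n = math.sqrt(ux * ux + uy * uy + uz * uz)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / n
    return math.cos(half), ux * s, uy * s, uz * s


def quaternion_to_matrix(w, x, y, z):
    """3x3 rotation matrix of a unit quaternion."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def map_genotype(genes, ref_coords):
    """Genotype -> phenotype for the rigid-body genes only: rotate the
    reference coordinates (assumed centred on the origin), then translate.
    Torsion genes, genes[7:], are ignored in this sketch."""
    tx, ty, tz, ux, uy, uz, angle = genes[:7]
    r = quaternion_to_matrix(*axis_angle_to_quaternion(ux, uy, uz, angle))
    t = (tx, ty, tz)
    return [
        tuple(sum(r[row][col] * p[col] for col in range(3)) + t[row]
              for row in range(3))
        for p in ref_coords
    ]


# Toy usage: rotate one atom 90 degrees about z, then shift it 10 units along x
coords = map_genotype([10.0, 0.0, 0.0, 0.0, 0.0, 1.0, 90.0], [(1.0, 0.0, 0.0)])
```

The fitness of the individual would then be computed from `coords` with the energy function, for example with a grid look-up.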

The Lamarckian Genetic Algorithm of AutoDock

The LGA is a hybrid search technique that combines an adaptive global optimizer, a GA, with a pseudo-Solis and Wets (pSW) LS method. In this GA the chromosome is composed of a string of real-valued genes, each of which encodes one state variable. The genes are: three Cartesian coordinates specifying the ligand translation; four variables defining a quaternion, which specifies the ligand orientation; and one real value for each ligand torsion. AutoTors, an AutoDock routine, creates a torsion tree that defines the order of the genes encoding the torsion angles. The ligand's state variables are therefore mapped one-to-one onto the genes of an individual's chromosome. Using real-valued encodings to represent the genome limits the search to reasonable domains. Binary encodings of the genome, by contrast, can lead to an inefficient search by producing values outside the domain of interest 220.

The LGA first creates a random population of individuals, the size of which is user-defined. For each individual, random values are assigned to the genes as follows:
each of the three x, y and z translation genes receives a uniformly distributed random value between the minimum and maximum x, y and z extents of the grid maps;
the four genes describing the orientation receive a random quaternion, consisting of a random unit vector and a random rotation angle between −180° and +180°;
the torsion angle genes receive random values between −180° and +180°.
Creation of the initial population is followed by iterations of the algorithm over generations until the termination criteria have been met. Each generation consists of five processes carried out in the following order: mapping and evaluation, selection, crossover, mutation and elitist selection. Following each generation, the LS is performed on a user-defined proportion of the population 220.
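
The initialisation step can be sketched in Python. The flat gene layout `[x, y, z, ux, uy, uz, angle, torsions...]`, with orientation stored as a unit vector plus an angle, and the helper names are assumptions made for illustration; AutoDock's internal representation may differ. A uniformly random direction is obtained by normalising a three-dimensional Gaussian sample.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed direction: normalise a 3D Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:  # guard against a (vanishingly rare) zero vector
            return [c / n for c in v]


def random_individual(grid_min, grid_max, n_torsions, rng):
    """One individual as a flat list of real-valued genes:
    [x, y, z, ux, uy, uz, angle, tor_1 .. tor_n], angles in degrees."""
    translation = [rng.uniform(lo, hi) for lo, hi in zip(grid_min, grid_max)]
    orientation = random_unit_vector(rng) + [rng.uniform(-180.0, 180.0)]
    torsions = [rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
    return translation + orientation + torsions


def initial_population(size, grid_min, grid_max, n_torsions, seed=0):
    """User-defined number of random individuals within the grid extents."""
    rng = random.Random(seed)
    return [random_individual(grid_min, grid_max, n_torsions, rng)
            for _ in range(size)]


# Toy usage: 50 individuals for a ligand with 4 torsions in a 10 x 20 x 30 box
pop = initial_population(50, (0.0, 0.0, 0.0), (10.0, 20.0, 30.0), n_torsions=4)
```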

Mapping is performed across the entire population and translates each individual's genotype into its phenotype. Following mapping, the fitness of each individual is calculated as the sum of the intermolecular interaction energy between the ligand and protein and the intramolecular interaction energy of the ligand. The total count of energy evaluations is incremented every time an individual's energy is calculated. Proportional selection determines which individuals reproduce and ensures that individuals with better-than-average fitness receive more offspring. The number of offspring allocated to an individual is determined in accordance with:

$$ n_o = \frac{f_w - f_i}{f_w - \langle f \rangle}, \qquad f_w \neq \langle f \rangle \qquad (2.2) $$

where $n_o$ is the integer number of offspring allocated to an individual, $f_w$ is the fitness of the worst individual, $f_i$ is the fitness of the individual, and $\langle f \rangle$ is the mean fitness of the population. Because fitness here is an energy (lower is better), the numerator is at least as large as the denominator for any individual whose fitness is equal to or better than the population mean. Such individuals are therefore always assigned at least one offspring. When $f_w$ equals the mean fitness of the population, the docking simulation is assumed to have converged and is terminated 220.
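
Equation (2.2) can be evaluated with a few lines of Python. This sketch rests on two assumptions. Fitness values are energies (lower is better). The integer offspring count is taken as the floor of the ratio; how AutoDock distributes the remaining fractional offspring is not covered here.

```python
import math


def offspring_counts(fitnesses):
    """Proportional selection after Eq. (2.2). Fitness values are energies,
    so lower is better and the worst individual has the highest value.
    Returns the integer offspring count for each individual, or None when
    f_w equals <f> (the run is treated as converged)."""
    f_w = max(fitnesses)                          # worst individual
    f_mean = sum(fitnesses) / len(fitnesses)      # <f>
    if math.isclose(f_w, f_mean):
        return None
    # Fractional parts are discarded here (an assumed rounding rule)
    return [math.floor((f_w - f_i) / (f_w - f_mean)) for f_i in fitnesses]


print(offspring_counts([-10.0, -8.0, -6.0, -4.0, -2.0]))  # [2, 1, 1, 0, 0]
print(offspring_counts([-5.0, -5.0, -5.0]))               # None
```

In the first call the best individual (−10) receives two offspring. The next two (−8, and −6, which equals the mean) receive one each, and the two worst receive none. A full implementation would still need to fill the remaining places to keep the population size constant.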

The number of randomly chosen members of the population that undergo crossover and mutation is user-defined. Two-point crossover is performed first, and breaks are permitted only between genes, never within a gene. The offspring produced by crossover replace their parents in the population, so the population size remains constant. Mutation follows crossover. It is performed by adding to the gene a random real number drawn from a Cauchy distribution, which is defined by:

$$ C(\alpha, \beta, x) = \frac{\beta}{\pi\left[\beta^2 + (x - \alpha)^2\right]}, \qquad \alpha \geq 0,\ \beta > 0,\ -\infty < x < \infty \qquad (2.3) $$

where $\alpha$ and $\beta$ are parameters that set the location and spread of the distribution. Optionally, an elitism parameter allows the user to define the number of top individuals that are carried over to the next generation. Once one of the termination criteria is met, AutoDock reports the fitness, state variables and coordinates of the docked conformation, and carries out a conformational analysis of the docked conformations to determine which

In document Structural study of the adenylation domain by molecular dynamics simulation (Page 84-90)