1.4 Target Identification
1.4.3 Structure-Based Druggability Assessment
With the growing number of available X-ray crystal structures and increasing computational power, structure-based druggability assessment is now often employed during target identification. Assessing druggability at this early stage exposes potential target liabilities and focuses effort on targets that are more susceptible to therapeutic intervention. This reduces attrition in drug discovery and avoids unnecessary investment of time and money.9,162 Target druggability analysis assesses the likelihood that a target is amenable to functional regulation through interaction with a drug-like molecule.162 Many computational methods are available that enable rapid and robust evaluation of druggability.
All drugs must not only interact with their molecular target but also be able to reach their site of action. Oral drugs, which must be absorbed from the gastrointestinal tract, therefore typically share specific physicochemical properties, as described by Lipinski’s Rule of 5 (Ro5): molecules should contain no more than 5 H-bond donors and no more than 10 H-bond acceptors, and have a molecular weight of no more than 500 Da and a CLogP of no more than 5.163 Most existing drugs engage their target proteins at predefined ligand binding sites, suggesting that compounds similar to the endogenous ligand should possess biological activity.152 Furthermore, for the vast majority of these targets the substrate, product or allosteric effector is Ro5 compliant. However, this may reflect ease of design, and the absence of a Ro5-compliant ligand does not guarantee that a target is undruggable.162 This is fortunate, as most endogenous ligands are not drug-like: fewer than a third of the proteins annotated with ligands in the Human Metabolome Database164 are associated with a Ro5-compliant ligand.162
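The Ro5 criteria lend themselves to a simple programmatic filter. The following is a minimal sketch, assuming the four descriptors have already been calculated by a cheminformatics toolkit; the function names and the example descriptor values are hypothetical and serve only as illustration.

```python
def ro5_violations(h_bond_donors, h_bond_acceptors, mol_weight, clogp):
    """Return the list of Ro5 criteria that a molecule violates.

    Thresholds follow the commonly stated form of the rule:
    at most 5 donors, at most 10 acceptors, MW <= 500 Da, CLogP <= 5.
    """
    violations = []
    if h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if clogp > 5:
        violations.append("CLogP > 5")
    return violations


def is_ro5_compliant(h_bond_donors, h_bond_acceptors, mol_weight, clogp):
    """Strict reading of the rule: compliant only if nothing is violated."""
    return not ro5_violations(h_bond_donors, h_bond_acceptors, mol_weight, clogp)


# Hypothetical descriptor values, for illustration only.
small_ligand = dict(h_bond_donors=1, h_bond_acceptors=4, mol_weight=180.2, clogp=1.2)
large_ligand = dict(h_bond_donors=7, h_bond_acceptors=14, mol_weight=1100.0, clogp=3.0)

print(is_ro5_compliant(**small_ligand))
print(ro5_violations(**large_ligand))
```

A strict reading with no permitted violations is assumed here; some implementations tolerate a single violation, which would only require comparing the length of the returned list against 1.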
Methods for structure-based assessment of target druggability typically comprise three elements: a mechanism to detect potential binding sites, a mechanism to evaluate these sites based on their physicochemical properties, and a set of reference targets used to validate and refine the assessment.162 Pocket-finding algorithms, used to predict ligand-binding sites, can rely purely on geometry or include physicochemical considerations. Fpocket, which is used in this work, uses a geometry-only method to predict binding sites and then scores them based on the physicochemical properties of the surrounding atoms.165
Fpocket is an open-source package that employs Voronoi tessellation and alpha spheres in pocket detection.165 Voronoi tessellation is a method of partitioning space with respect to a set of predefined points, here the protein atoms. In this instance qvoronoi from the qhull package166 is used to produce a set of Voronoi vertices, and the radii of the alpha spheres centred at these positions are measured. An alpha sphere is a sphere which contacts four atoms at its boundary and contains no internal atoms. By definition, the four atoms are equidistant from the centre of the alpha sphere, at a distance equal to the sphere radius. The radius of an alpha sphere is therefore dependent on the local curvature of the protein, as described by the four atoms. Very small spheres are found in the interior of proteins and very large spheres at the exterior, while clefts or cavities give rise to alpha spheres of intermediate radii, allowing them to be detected. By identifying clusters of such alpha spheres of intermediate radii, fpocket is able to discern structural pockets. The spheres are then categorised according to the four surrounding atoms, which enables the clusters to be filtered with respect to their physicochemical properties, e.g. by hydrophobicity.165
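The geometric definition of an alpha sphere can be illustrated with a short sketch. The code below is not taken from fpocket: it simply computes the sphere passing through four atom centres, checks that no other atom lies inside it, and labels the radius as intermediate or not. The function names are hypothetical, and the radius bounds of 3.0 and 6.0 Å are illustrative assumptions rather than a statement of fpocket's defaults.

```python
from math import dist


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(p0, p1, p2, p3):
    """Centre and radius of the sphere through four non-coplanar points.

    Equating squared distances from the centre c to p0 and to each p_i gives
    the linear system 2 (p_i - p0) . c = |p_i|^2 - |p0|^2, solved here
    with Cramer's rule.
    """
    rows = [[2.0 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(p[k] ** 2 - p0[k] ** 2 for k in range(3)) for p in (p1, p2, p3)]
    d = _det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are coplanar; no unique sphere exists")
    centre = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(_det3(m) / d)
    return tuple(centre), dist(centre, p0)


def is_empty(centre, radius, atoms, tol=1e-9):
    """True if no atom centre lies strictly inside the sphere."""
    return all(dist(centre, a) >= radius - tol for a in atoms)


def is_intermediate(radius, min_radius=3.0, max_radius=6.0):
    """Keep only spheres of intermediate radius (bounds are assumptions)."""
    return min_radius <= radius <= max_radius


# Four points on a sphere of radius 4 centred at the origin.
contacts = [(4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, 0, 4)]
centre, radius = circumsphere(*contacts)
print(radius, is_empty(centre, radius, contacts), is_intermediate(radius))
```

In a full pocket finder the candidate centres would come from the Voronoi vertices rather than from enumerating atom quadruplets, and the retained spheres would then be clustered; this sketch covers only the per-sphere test.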
It is good practice to validate structure-based target druggability methods against a set of reference targets of known tractability. Huang and Schroeder compiled a standard test set of 48 protein targets for which bound and apo crystal structures were available and used these to validate pocket-finding methods.167 The average success rate for correctly predicting the true ligand binding site as the top-scoring pocket was 60% for the apo structures and 67% for the bound structures. In comparison, fpocket achieved 69% for the apo structures and 83% for the bound structures.165 Fpocket is also one of the few target druggability methods to have been validated against a set of reference targets with known degrees of druggability, and can therefore be used to assess druggability quantitatively. This approach was pioneered by Hajduk et al., who designed a simple model to assign pockets a druggability score.168 They included terms for polar and apolar surface area, surface complexity and pocket dimensions. The scores were fitted to hit rates from nuclear magnetic resonance (NMR)-based fragment screening, used as an index of binding site druggability. The resulting algorithm was validated against a test set of 23 proteins with 57 pockets, none of which were used in the training set, and correctly classified 94%. The algorithm thus enables a quantitative assessment of the capacity of a given pocket to bind small compounds with high affinity and specificity. Cheng et al. developed a different but complementary approach two years later.169 They used a biophysical binding free energy model, predominantly dependent on the curvature and hydrophobic surface area of the binding pocket, to estimate the maximal affinity predicted for a passively absorbed oral drug (MAPPOD) for the target. A test set of 27 pharmaceutical targets was compiled, with 17 classified as druggable, 6 as difficult and 4 as undruggable. They found that a MAPPOD threshold of 100 nM clearly separated the druggable targets from those that are difficult or undruggable, affording a mechanism to assess druggability quantitatively in other targets. Their manually curated dataset is now considered the benchmark for developing and validating new algorithms.162 The validation set used in training fpocket is notable as it was compiled via an open collaborative platform (http://fpocket.sourceforge.net/dcd) and is the largest publicly available.170 It combines targets from the two aforementioned studies with others manually annotated from the PDB.158 The fpocket logistic model was trained on this dataset using local hydrophobic density, hydrophobicity and normalised polarity as pocket descriptors.162,165
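To put the 100 nM MAPPOD threshold in energetic terms, a dissociation constant can be converted to a binding free energy with the standard relation ΔG = RT ln(Kd). The sketch below assumes a temperature of 298.15 K and a 1 M standard state; the function names are hypothetical, and the binary threshold test is a simplification of the three-class scheme (druggable, difficult, undruggable) used by Cheng et al.

```python
from math import exp, log

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol K)


def kd_to_dg(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (M)."""
    return R_KCAL_PER_MOL_K * temperature_k * log(kd_molar)


def dg_to_kd(dg_kcal_per_mol, temperature_k=298.15):
    """Dissociation constant (M) from a binding free energy (kcal/mol)."""
    return exp(dg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


def meets_affinity_threshold(kd_molar, threshold_molar=100e-9):
    """True if the predicted maximal affinity is 100 nM or tighter."""
    return kd_molar <= threshold_molar


# 100 nM corresponds to roughly -9.5 kcal/mol at 298.15 K.
print(round(kd_to_dg(100e-9), 2))
print(meets_affinity_threshold(dg_to_kd(-11.0)))
```

Under these assumptions a pocket whose predicted maximal affinity corresponds to a free energy more negative than about -9.5 kcal/mol falls on the druggable side of the threshold.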
In the preceding studies, druggability is assessed solely on the properties of the protein. Alternatively, or additionally, the energetics of protein-ligand binding can be calculated via docking or molecular simulation. Molecular docking predicts the binding modes and associated affinities of ligands at putative binding sites on the protein. Huang and Jacobson virtually screened ~11,000 diverse fragments against 152 binding sites, using docking to calculate a computational hit rate for each site as an indicator of druggability.171 They showed that these hit rates correlated with those previously published from NMR-based screening.168 However, this evolving technology still faces significant challenges that affect the accuracy of results, particularly in its sampling approaches and scoring functions.172 Owing to the flexibility of proteins and ligands, there are too many potential binding conformations to sample exhaustively, and although flexible docking is prevalent, it is computationally expensive. The scoring function provides an estimate of ligand binding affinity. Current scoring algorithms underestimate the contributions of entropy, structural water and surrounding ions.173 Seco et al. applied a markedly different technique based on molecular dynamics (MD) simulations of the interaction of the protein with isopropyl alcohol.174 Analysis of the simulations yields the interaction free energies between the protein and the probe molecule, which are used to detect binding sites and predict the maximal affinity of drug-like molecules at these positions.
MD simulations are increasingly used in druggability assessment. Proteins are dynamic, but a single X-ray crystal structure typically captures only one of many possible conformations, although the B-factor provides a measure of the displacement of atoms from their mean position. These alternative conformations are not well evaluated by standard structure-based target druggability assessment. Structural variability can be examined either by assessing multiple crystal structures of one target or by evaluating MD trajectories. Somewhat surprisingly, the druggabilities calculated from different crystal structures of the same protein can vary a great deal. This is often attributable to structural perturbation, such as mutated or missing residues, but in many cases it is the result of protein flexibility. A computationally expensive but thorough approach involves scoring the druggability of a series of structures along MD trajectories.162 This highlights any plausible conformational changes that could occur under conditions of temperature, pressure, solvation and pH simulating those in vivo, and which may affect the druggability of the target.173
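Once a druggability score has been calculated for each frame of a trajectory, the variability described above can be summarised with a few descriptive statistics. The following is a minimal sketch, assuming the per-frame scores for one pocket lie between 0 and 1 and have been obtained separately; the function name, the 0.5 cut-off and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def summarise_druggability(scores, threshold=0.5):
    """Summarise per-frame druggability scores for a single pocket.

    `scores` is a sequence of floats, one per trajectory frame.
    `threshold` is an assumed cut-off above which a frame counts as druggable.
    """
    if not scores:
        raise ValueError("at least one frame score is required")
    n_druggable = sum(1 for s in scores if s >= threshold)
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),
        "min": min(scores),
        "max": max(scores),
        "fraction_druggable": n_druggable / len(scores),
    }


# Hypothetical scores from four frames, for illustration only.
frame_scores = [0.2, 0.4, 0.6, 0.8]
summary = summarise_druggability(frame_scores)
print(summary["mean"], summary["fraction_druggable"])
```

A large spread, or a fraction of druggable frames well below 1, would indicate that the assessment from any single structure of that pocket should be treated with caution.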