LIGAND DESIGN PROGRAM KNOWLEDGE BASE.
SPROUT APPLICATION FOR 17β-HYDROXYSTEROID DEHYDROGENASE TYPE 1 ENZYME
Sari Alho
2005
Laboratory of Organic Chemistry, Department of Chemistry,
University of Helsinki, Finland
University of Helsinki
Faculty of Science
Department of Chemistry
Laboratory of Organic Chemistry
P.O. Box 55, FIN-00014 University of Helsinki
ACADEMIC DISSERTATION
To be presented with the permission of the Faculty of Science of the University of Helsinki for public criticism in Auditorium A 110 of the Department of Chemistry,
A. I. Virtasen aukio 1, on April 1st, 2005 at 12 o’clock noon
ISBN 952-91-6304-5 (paperback) ISBN 952-10-1354-0 (PDF)
http://ethesis.helsinki.fi Helsinki 2005 Gummerus Oy
CONTENTS
ABSTRACT 4
ACKNOWLEDGEMENTS 6
ABBREVIATIONS 8
1. INTRODUCTION 11
1.1 Structure-based drug design 11
1.2 SPROUT and SynSPROUT 15
1.3 Biological background 17
2. AIMS OF THE STUDY 19
3. OVERVIEW OF SPROUT COMPONENT PROGRAM 21
3.1 Survey of de novo ligand design programs 21
3.2 SPROUT 24
3.2.1 Current developments of SPROUT 24
3.2.2 General features 25
3.2.3 CANGAROO 28
3.2.4 HIPPO 30
3.2.4.1 Boundary surface 31
3.2.4.2 HIPPO target sites 32
3.2.4.3 Pharmacophore module 38
3.2.5 ELEFANT 39
3.2.5.1 SPROUT template library 40
3.2.6 SPIDER 43
3.2.6.1 User defined parameters 44
3.2.6.2 Template joining 45
3.2.6.3 The search process 47
3.2.7 ALLIGATOR 48
3.3 SynSPROUT 50
3.3.1 Knowledge base and PATRAN language 51
3.3.1.1 Chemical patterns 51
3.3.1.2 Joining rules 51
3.3.1.3 Other specifications 52
Differences between Classic and SynSPROUT 53
3.4 Further modelling applications 54
3.4.1 Moloc 54
3.4.2 MacroModel 54
3.4.3 AutoDock 55
3.4.4 eHiTS® 55
3.4.5 SPA-Docking 55
3.4.6 CAESA 56
4. REVIEW OF STEROID HORMONES AND HYDROXYSTEROID DEHYDROGENASES 57
4.1 Structure of the steroid hormones 57
4.2 Physiological effects of estrogens 58
4.3 Estrogen biosynthesis 59
4.4 Hydroxysteroid dehydrogenase family 63
4.4.1 SDR and AKR protein superfamilies 63
4.4.2 Members of the hydroxysteroid dehydrogenase family important for human physiology 65
4.4.2.1 3β-Hydroxysteroid dehydrogenase/ketosteroid isomerase 66
4.4.2.2 11β-Hydroxysteroid dehydrogenase 66
4.4.2.3 3α-Hydroxysteroid dehydrogenase 67
4.4.2.4 20α-Hydroxysteroid dehydrogenase 68
4.4.2.5 Multiple specificities of hydroxysteroid dehydrogenases 69
4.4.3 17β-Hydroxysteroid dehydrogenase/ketosteroid reductase 69
4.4.3.1 Members of the 17βHSD/KSR family 73
4.4.3.2 Crystal structure information of the 17βHSD/KSR in PDB 78
4.4.3.3 Overall description of the 17βHSD/KSR (type 1) enzyme structure 79
4.4.3.4 Ligand-binding domain and the interactions 81
4.4.3.5 Cofactor-binding site 87
4.4.3.6 Reduction mechanism 88
4.4.3.7 Inhibition studies of 17βHSD/KSR enzymes 89
4.4.4 Crystallisation studies of estrogen receptor α and β 91
5.1 Development of SynSPROUT knowledge base 94
5.1.1 1,3-Dipolar cycloaddition reactions 94
5.1.1.1 Azomethine ylides 96
5.1.1.2 Stereochemistry of the 1,3-dipolar cycloaddition reactions 99
5.1.2 Azomethine ylide chemical patterns and joining rules 100
5.2 Inhibitor design for 17βHSD/KSR1 104
5.2.1 Crystal structures selection 104
5.2.1.1 Active site study of estradiol complex 105
5.2.1.2 Active site study of equilin complex 108
5.2.1.3 Active site study of dihydrotestosterone complex 112
5.2.1.4 Active site study of dehydroepiandrosterone complex 115
5.2.1.5 Active site studies of the estrogen receptor α and β 118
5.2.2 Structure generation 119
5.2.2.1 Structure generation for estradiol complex 121
5.2.2.2 Structure generation for equilin complex 125
5.2.2.3 Structure generation for dihydrotestosterone complex 129
5.2.2.4 Structure generation for dehydroepiandrosterone complex 132
5.2.2.5 New structure generation for dihydrotestosterone complex with the latest version of SPROUT 135
5.2.3 Examination of potential inhibitor structures 136
5.2.3.1 Modifications and optimisation studies of the selected structures 137
5.2.3.2 Energy optimisation and further analysis of selected molecules 144
5.2.4 Docking studies 152
5.2.4.1 Docking simulations into the 17βHSD/KSR1 active site 152
5.2.4.2 Docking simulations into the estrogen receptor α and β active sites 155
5.2.5 Retrosynthesis and synthesis plan 156
5.2.6 Retrosynthesis by CAESA 158
6. CONCLUSIONS AND FUTURE PERSPECTIVES 165
7. REFERENCES 167
ABSTRACT
As part of this thesis, various de novo ligand design programs are briefly surveyed. The use and characteristics of the SPROUT ligand design program are presented in more detail. The thesis also discusses the process that led towards an extension of the knowledge base of the SynSPROUT ligand design program. It was envisaged that pyrrolidine moieties might constitute a key structural element of the sought-after 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 enzyme inhibitor candidates. A literature survey of azomethine ylide reactions, which are capable of producing pyrrolidine and related ring structures, was therefore carried out so that information on these reactions could be added to the SynSPROUT program’s knowledge base. Text files containing chemical patterns that describe the functional groups of 1,3-dipoles and dipolarophiles were written for the knowledge base. Before they can be added to the knowledge base, the next step is to develop the ring formation programming language currently lacking in SynSPROUT.
A survey of the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase enzyme family is also presented. These enzymes are responsible for the final step of the biosynthesis of the sex hormones, which in many cases stimulate the proliferation of breast and prostate cancers. Because the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 (17β-HSD/KSR1) enzyme catalyses estrogen synthesis, it is an attractive target for structure-based ligand design for the prevention and control of breast tumour growth.
The experimental work focused on inhibitor ligand design for the 17β-HSD/KSR1 enzyme using the SPROUT de novo ligand design program. The three-dimensional crystal structure coordinates of the type 1 enzyme complexed with four different substrates (estradiol, equilin, dihydrotestosterone and dehydroepiandrosterone) were used for the study. The structure generation of the novel ligand molecule libraries is described step by step. Thousands of new molecules were created for the enzyme active site. A set of 64 molecules was selected using the SPROUT program’s scoring function and the ALLIGATOR module. Modifications of the functional groups and energy optimisations were carried out for the selected molecules. The interaction results of the optimised molecules were compared with the SPROUT information. In silico docking simulations were performed for a promising subset of molecules, and the best docking result was also compared with the original molecule generated using SPROUT. Both the optimisation and the docking simulation results supported the SPROUT generation results. The retrosynthesis and synthetic plan for one new molecule are presented as an example, together with the results of the retrosynthetic program for that molecule.
ACKNOWLEDGEMENTS
The experimental work for this thesis was carried out in the Laboratory of Organic Chemistry of the University of Helsinki and the Institute for Computer Applications in the Molecular Science (ICAMS) of the University of Leeds, United Kingdom.
I am most grateful to my supervisor, Professor Kristiina Wähälä, for introducing the fascinating world of molecular modelling to me. I particularly appreciate her numerous suggestions and helpful criticism during this work.
I am exceedingly grateful to Professor A. Peter Johnson and Dr. Kimmo Vihko for reviewing the manuscript of the thesis and for their helpful comments, and Dr. Louise Fletcher for revising the language of the present manuscript.
Sincere thanks to all members of the Organic Chemistry Laboratory at the University of Helsinki. I wish to thank Emeritus Professor Tapio Hase for his constructive comments on organic chemistry problems and Dr. Jorma Koskimies for his help with molecular modelling problems at the beginning of my studies. Many warm thanks to all former and present members of the Phyto-Syn group at the Organic Chemistry Laboratory. I am deeply indebted to Barbara for her endless support and help.
Warm thanks are owing to the members of the ICAMS group in Leeds University. Special thanks to Vilmos, Aniko and Krisztina for their help during my work in Leeds University.
I would like to thank my friends and former study mates for their support. Thanks are extended to Katariina, Päivi and Maarit for providing the berth during my stays in Helsinki. I am deeply grateful to my good friends in Leeds, especially Sari and Houry, with whom I have had many fruitful conversations.
I would like to express my deepest gratitude to my parents and siblings for their encouragement during my studies, and special thanks to my father for financial support during the last year. Finally, a multitude of thanks to my fiancé Paul for his support, understanding and love.
Financial support from National Technology Agency of Finland (TEKES), The Academy of Finland, Marie Curie Fellowship Association, the Magnus Ehrnrooth Foundation, Etelä-Pohjanmaa Regional Fund of the Finnish Cultural Foundation and the University of Helsinki is gratefully acknowledged.
Leeds, United Kingdom,
February 2005
ABBREVIATIONS
1,3-DC 1,3-Dipolar cycloaddition
3α/3β-Adiol Androstanediol, androst-5α-an-3α/3β,17β-diol
3-D Three-dimensional
17βHSD/KSR 17β-Hydroxysteroid dehydrogenase/ketosteroid reductase
17βHSD/KSR1 17β-Hydroxysteroid dehydrogenase/ketosteroid reductase type 1
20αDHP 20α-Dihydroprogesterone, pregn-4-en-20α-ol-3-one
∆4-dione Androstenedione, androst-4-en-3,17-dione
∆5-diol Androstenediol, androst-5-en-3β,17β-diol
ADH Short Chain Alcohol Dehydrogenase
Adione Androstanedione, androst-5α-an-3,17β-dione
ADT Androsterone, 3α-hydroxy-5α-androstan-17-one
AKR Aldoketoreductase
ALLIGATOR Algorithms for Ligand Testing and Ordering of Results
AR Androgen Receptor
CAESA Computer Assisted Estimation of Synthetic Accessibility
CANGAROO Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation
CoMFA Comparative Molecular Field Analysis
CR Carbonyl Reductase
CSD Cambridge Structural Database
DHEA Dehydroepiandrosterone, 3β-hydroxy-5-androsten-17-one
DHT Dihydrotestosterone, 17β-hydroxy-5α-androstan-3-one
E1 Estrone, 3-hydroxyestra-1,3,5(10)-triene-17-one
E2 Estradiol, estra-1,3,5(10)-triene-3,17β-diol
E3 Estriol, estra-1,3,5(10)-triene-3,16α,17β-triol
eHiTS Electronic High Throughput Screening
ELEFANT Election of Functional Groups and Anchoring them to Target Sites
ER Estrogen Receptor
ERα Estrogen Receptor α
ERβ Estrogen Receptor β
EQU Equilin, 3-hydroxyestra-1,3,5(10),7-tetraen-17-one
FGI Functional Group Interconversion
FMO Frontier Molecular Orbital
GA Genetic Algorithm
GA-LS Genetic Algorithm and Local Search
HDE Hydratase Dehydrogenase
HIPPO Hydrogen Bonding Interaction Site Prediction as Positions with Orientations
ICAMS Institute for Computer Applications in the Molecular Science
KSR Ketosteroid Reductase
LBD Ligand-binding domain
LEA Ligand Energy Alone
LGA Lamarckian Genetic Algorithm
LHASA Logic and Heuristic Applied to Synthetic Analysis
NAD Nicotinamide Adenine Dinucleotide
NADP Nicotinamide Adenine Dinucleotide Phosphate
NMR Nuclear Magnetic Resonance
NR Nuclear Receptor
MCSS Multiple Copy Simultaneous Search
MDDR MDL Drug Data Report
MDL Molecular Design Ltd.
MFE Multifunctional Enzyme
MLE Minimised Ligand Energy
MR Mineralocorticoid Receptor
P450arom cytochrome P450 aromatase
ORAC Organic Reactions Accessed by Computer
P Progesterone, pregn-4-en-3,20-dione
PDB Brookhaven Protein Data Bank
PPARα Peroxisome Proliferators Activated Receptor α
PTCR Porcine testicular carbonyl reductase
REACCS Reaction ACCess System
RMS Root Mean Square
RoDH1 Retinal Dehydrogenase type 1
SCP2 Sterol Carrier Protein type 2
SD Structural Data file format of MDL
SDR Short-chain Dehydrogenase/Reductase
SGI Silicon Graphics
SPA Systematic Population Annealing
SPIDER Structure Production with Interactive Design of Results
T Testosterone, 17β-hydroxyandrost-4-en-3-one
TIM Triose phosphate Isomerase
TOPAS TOPology-Assigning System
vdW van der Waals
Essential Amino Acids
Alanine, Ala, A
Arginine, Arg, R
Aspartate, Asp, D
Asparagine, Asn, N
Cysteine, Cys, C
Glutamate, Glu, E
Glutamine, Gln, Q
Glycine, Gly, G
Histidine, His, H
Isoleucine, Ile, I
Leucine, Leu, L
Lysine, Lys, K
Methionine, Met, M
Phenylalanine, Phe, F
Proline, Pro, P
Serine, Ser, S
Threonine, Thr, T
Tryptophan, Trp, W
Tyrosine, Tyr, Y
Valine, Val, V
1. INTRODUCTION
1.1 Structure-based drug design
Structure generation is one of the many approaches used in the computer-aided lead discovery cycle. It has been demonstrated for a large number of different molecular targets that the three-dimensional (3-D) structure of a protein can be used to design novel ligands.1,2 After a therapeutically interesting target has been selected, the crystal structure information of the enzyme is used for ligand design. A considerable number of 3-D protein structures determined by X-ray crystallography, NMR spectroscopy or theoretical homology modelling are available in the Brookhaven Protein Data Bank (PDB).3,4 Initially, structure-based design concentrated on different procedures for screening databases of known 3-D chemical structures.5,6 The advantage of this approach is that the compounds found are commercially available or have a known synthesis. On the other hand, it is not possible to discover novel structures in this way. Nowadays, de novo design programs7 make it possible to design new compounds without relying on existing molecule databases, such as the MDL® Screening Compounds Directory.8 A de novo approach produces large sets of potential structures, but it is limited by the problem of synthetic accessibility.
Applications of 3-D searching can be divided into four groups, as shown in Scheme 1 (page 12), depending on whether the structures of the receptor and/or the ligand are known. When both the receptor and the ligand 3-D structures are unknown, it is still possible to apply computer-aided screening methods, which include chemical similarity searching based on structure (High Throughput Screening) and/or combinatorial chemistry applications. If the ligand is known while the receptor is unknown, database and similarity searches can be applied to identify the pharmacophore of the ligand (Analogy-Based Drug Design). When only the receptor is known, it is possible to perform de novo ligand design or receptor-based 3-D searching. In such virtual screening, compounds that match a given pharmacophore hypothesis are identified in silico.
                                      Receptor structure
Ligand structure      Unknown                       Known
Unknown               Computer-Aided Screening      De Novo Design
Known                 Analogy-Based Drug Design     Structure-Based Drug Design
Scheme 1. Applications of 3-D searching are divided into four groups.
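The classification in Scheme 1 amounts to a lookup on two yes/no questions. The short Python sketch below is only an illustration of that reading; the function name `design_approach` and the lower-case category labels are assumptions introduced here and do not come from any of the programs discussed.

```python
def design_approach(ligand_known: bool, receptor_known: bool) -> str:
    """Return the Scheme 1 category for the available structural knowledge."""
    scheme = {
        # (ligand known?, receptor known?) -> approach named in Scheme 1
        (False, False): "computer-aided screening",
        (True, False): "analogy-based drug design",
        (False, True): "de novo design",
        (True, True): "structure-based drug design",
    }
    return scheme[(ligand_known, receptor_known)]


# Only the receptor structure is known, as in a pure de novo design study.
print(design_approach(ligand_known=False, receptor_known=True))
```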
De novo ligand design involves the generation of drug candidates based purely on the structure of the binding site, so that the bound molecule either inhibits or alters protein activity. The constraints of the binding site are defined using crystallographic information on the receptor. By knowing which amino acids are present in the binding site and where they are located, it is possible to identify the binding interactions. Good inhibitors must possess structural and chemical properties that are complementary to their target receptor; molecular skeletons are therefore generated to fit a set of steric, electrostatic and hydrophobic constraints. In addition to satisfying these constraints, the molecules should exist in a low-energy conformation. After binding site characterisation it is possible to design a molecule that has the correct size, geometry and functional groups to interact with the amino acid residues. Structure-based and de novo ligand design processes have helped to design new potential inhibitors that have also been tested in clinical trials.9,10,11 Although no new structures are based purely on any de novo ligand design program, the research and development of these programs is important because they assist rapid and efficient drug discovery.
Frequently the determined crystal structure includes a ligand bound to the receptor, and the location of this ligand is used as the binding site. If the receptor is unknown, constraints can be derived from a pharmacophore hypothesis. This approach creates a picture of the receptor binding requirements by analysing the molecules that are known to bind to the receptor. Many examples of successful 3-D searching using pharmacophore queries or hypotheses have been published,12,13,14,15 and some expert systems for the automatic prediction of pharmacophoric groups have been presented.16
The de novo ligand design process includes four main steps, regardless of the design program:17
1) Definition of the constraints: Analysis of the X-ray crystallographic information on the receptor provides one starting point for design. However, in many cases crystal structures are not available. Another starting point for ligand design is therefore the pharmacophore hypothesis, where structures are generated to fit less well-defined constraints. From the constraints it is possible to identify the interaction sites. Usually it is also possible to define the volume of the binding site.
2) Skeleton generation: Once a number of interaction sites have been defined, then de novo design methods start to generate structures that have atoms or fragments placed at the interaction sites and have complementary shape, volume and appropriate chemical functional groups with the binding site.
3) Organisation of the results: Programs give large sets of answers and users need tools for navigating through these sets. These tools may include sorting, clustering and ranking techniques, which help in the selection of the designed molecule sets by estimating their chemical and biological properties.
4) Structure evaluation: New ligands must satisfy a number of criteria. For example, a potential enzyme inhibitor must be able to bind to the active site, must be synthetically accessible and must, among other biological requirements, have the required transport properties.
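The four steps above can be pictured as a small filter-and-rank pipeline. The following Python sketch is a toy illustration under strong assumptions, not the procedure of SPROUT or of any other program: skeleton generation is stood in for by a pre-enumerated list of candidates, a site counts as satisfied when any atom lies within a hypothetical `tolerance` of it, and the score is simply the atom count. All names (`Candidate`, `reaches_all_sites`, `shortlist`) are invented for this example.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Candidate:
    name: str
    atoms: tuple  # (x, y, z) atom positions in angstroms


def reaches_all_sites(candidate, sites, tolerance=1.0):
    """Steps 1-2: keep a skeleton only if every interaction site has an atom nearby."""
    return all(
        any(dist(atom, site) <= tolerance for atom in candidate.atoms)
        for site in sites
    )


def shortlist(candidates, sites, score, keep=2):
    """Step 3: rank the surviving candidates; step 4: pass the best on for evaluation."""
    hits = [c for c in candidates if reaches_all_sites(c, sites)]
    ranked = sorted(hits, key=score)  # lower score is treated as better here
    return ranked[:keep]


# Two interaction sites 3 angstroms apart (made-up coordinates).
sites = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
candidates = [
    Candidate("A", ((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0))),
    Candidate("B", ((0.0, 0.0, 0.0), (1.5, 0.0, 0.0))),  # misses the second site
    Candidate("C", ((0.2, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.1, 0.0, 0.0))),
]
best = shortlist(candidates, sites, score=lambda c: len(c.atoms))
print([c.name for c in best])
```

In a real program each stage is far more elaborate, but the separation into constraint checking, generation, ordering and evaluation is the same.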
After binding site characterisation it would be tempting to design a ligand that fits perfectly. However, it is probable that the result is a disappointment because there could be an experimental error in the crystallographic structure. Moreover, the flexible binding site could change shape depending on the molecule it binds with. Therefore, it is better to design a loose-fitting molecule structure.
Programs for de novo ligand design7,18 that generate new structures by joining together atoms or fragments are called atom-based or fragment-based ligand design programs. The advantage of the single-atom-based programs is that they can produce huge numbers of chemical structures. The major advantage of the fragment-based approach is that it improves the synthetic accessibility of the generated structures.19,20,21 There are two main strategies for this approach. The first strategy connects fragments placed at selected interaction points with suitable linking groups. The connections are made so as to produce an optimal fit between the generated skeleton and the protein binding site. In the second strategy the generation of the skeleton starts from one selected point of the receptor’s interaction site and grows piece by piece until the partial skeleton has reached all interaction points one by one (see section 3.1, page 21). Molecules generated in these ways should also fulfil the required steric, electrostatic and hydrophobic properties. The first strategy tends to generate rigid structures, whereas the second is inclined to produce flexible ones. Atoms or fragments can adopt many different conformations when added to a growing skeleton, and these lead to a large number of possible structures. Because structures are generally built in a stepwise manner from smaller fragments, it is necessary to ensure that the final structures are chemically stable and synthetically feasible.
Most de novo design applications are based on the docking method. This means that the programs try to find alignments of two molecules, a ligand and its receptor, in which they interact favourably. Several de novo programs have been developed that can be used to characterise an enzyme receptor site and to generate novel molecule sets. These programs can be categorized in several different ways. One of these focuses on the way the programs connect fragments and generate molecular skeletons: 1) programs that link predefined fragments placed into the interaction sites; 2) programs that place one fragment as a base and then generate skeletons in a stepwise manner; and 3) programs that are based on random or stochastic methods. NEWLEAD,22 HOOK,23 LUDI24,25,26,27 and SPROUT17,28,29,30,31,32 are examples of programs belonging to the first category. The second category includes programs such as LEGEND,33,34 GenStar35 and GROW.36 PRO_LIGAND37 is an example of a program using stochastic methods (section 3.1, page 21, briefly describes the features of these design programs).
In this research38 SPROUT (versions 3.2, 4.01, 4.11, 5.0, 5.1 and 6.0) was used as the de novo ligand design program for the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 (17β-HSD/KSR1) enzyme (EC 1.1.1.62).39 Crystal structures of the enzyme are available from the Brookhaven Protein Data Bank4 complexed with, for example, estradiol 1 (entry code 1A27),40 equilin 2 (entry code 1EQU),41 dihydrotestosterone 3 (entry code 1DHT)42 and dehydroepiandrosterone 4 (entry code 3DHE)42 ligands.
[Chemical structures of estradiol 1, equilin 2, dihydrotestosterone 3 and dehydroepiandrosterone 4]
The SPROUT program was chosen because, after a brief comparison with LUDI, the other reliable structure generation system, it was found to be more appropriate for the chosen method of design work. SPROUT differs in an important way from most other de novo design programs in both binding site identification and structure generation. For example, LUDI uses real molecules as fragments, whereas SPROUT can also use templates. Templates are 3-D molecular graphs whose edges are labelled by bond type and whose vertices are labelled only by hybridization state and not by atom type. When the generated skeletons satisfy all the required constraints, the program converts each skeleton into a molecular structure by atom substitution. The use of generalized templates reduces the amount of data that has to be handled (see section 3.2.5.1, page 40). Recently, combinatorial chemistry programs have utilized template-based de novo methods for the design of bioactive molecules.43,44,45 These approaches produce molecular skeletons that can be made using combinatorial chemistry methods.
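The template idea can be made concrete with a small data structure. The Python sketch below is a simplified illustration and not SPROUT's internal representation: the 3-D coordinates are omitted, and the `Template` class, the `ALLOWED` substitution table and the `substitute` function are hypothetical names introduced here. It shows how a single six-membered aromatic template can stand for both a benzene-like and a pyridine-like ring once atom types are assigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    hybridization: tuple  # one label per vertex, e.g. "sp2" or "sp3"
    bonds: tuple          # (i, j, bond_type) edges between vertex indices


# Hypothetical substitution table: elements allowed at a vertex of each type.
ALLOWED = {"sp2": ("C", "N", "O"), "sp3": ("C", "N", "O")}


def substitute(template, choice):
    """Assign an element to every vertex; vertices not in `choice` stay carbon."""
    atoms = []
    for index, hyb in enumerate(template.hybridization):
        element = choice.get(index, "C")
        if element not in ALLOWED[hyb]:
            raise ValueError(f"{element} is not allowed at {hyb} vertex {index}")
        atoms.append(element)
    return tuple(atoms)


# One generalized six-membered aromatic ring template.
ring = Template(
    hybridization=("sp2",) * 6,
    bonds=tuple((i, (i + 1) % 6, "aromatic") for i in range(6)),
)
benzene_like = substitute(ring, {})
pyridine_like = substitute(ring, {0: "N"})
print(benzene_like, pyridine_like)
```

Because the atom types are chosen only at the end, the search works on one skeleton instead of one structure per heteroatom pattern, which is the data reduction referred to above.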
The SPROUT program itself contains a module (ALLIGATOR) that can help the user with the major problems of de novo design: combinatorial explosion and synthetic accessibility. In addition, the possibility of using the CAESA46,47,48 (Computer Assisted Estimation of Synthetic Accessibility) and SynSPROUT49 (Synthetic SPROUT) programs was beneficial. SPROUT was also selected because of the opportunity to contribute to the enhancement of the program itself. Following a literature survey, information on the azomethine ylide class of 1,3-dipolar cycloaddition reactions was added to the SynSPROUT program’s knowledge base.
1.2 SPROUT and SynSPROUT
The SPROUT molecular modelling software was developed at ICAMS50 (Institute for Computer Applications in the Molecular Science), University of Leeds, United Kingdom, for molecular structure generation. It can be used for de novo structure generation in cases where the 3-D structure of the target protein is known. SPROUT can also be used for “receptor mapping” (pharmacophore identification) when the structure of the target macromolecule is unknown.7,51 SPROUT uses information derived from the enzyme receptor site to provide steric and electrostatic constraints for the design of new ligands, for example potential inhibitors. The steric constraints used by the program include a volume and some interaction sites within the volume.17 Skeletons are built in a step-by-step manner by joining together small 3-D fragments so that the constraints are satisfied (see section 3.2.6, page 43).
The SPROUT program is composed of five modules (CANGAROO, HIPPO, ELEFANT, SPIDER and ALLIGATOR) and a template library manager.48,52 The modules perform different tasks one after another, leading to structure generation and evaluation. CANGAROO (Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation) detects potential binding pockets in protein structures. This module requires an input file containing the atomic coordinates of the receptor. Two output files are then generated: a cavity file and a receptor file. These files contain all the information necessary for the next module. HIPPO (Hydrogen Bonding Interaction Site Prediction as Positions with Orientations) identifies favourable hydrogen-bonding sites and hydrophobic regions within a binding pocket. The hydrogen-bonding sites are directional and are used to define target sites for the positions of potential ligand atoms. ELEFANT (Election of Functional Groups and Anchoring them to Target Sites) selects functional groups and positions them at the target sites to form starting fragments for the structure generation. SPIDER (Structure Production with Interactive Design of Results) generates skeletons that satisfy the steric constraints of a binding pocket by growing spacer fragments onto the starting fragments and then connecting the resulting partial skeletons together. ALLIGATOR (Algorithms for Ligand Testing and Ordering of Results) clusters and scores the solutions to provide the user with an efficient tool for evaluating and navigating through the results. Atoms can also be substituted on the basis of the information contained in the Vertex Score Table of each ligand. Three scores are allocated to each structure. The Site Substituted Score is an estimate of the putative ligand’s binding affinity (pKi) for the active site (see section 3.2.7, page 48).
Since it is likely that only a limited number of answers can be synthesised in the laboratory, it is necessary to be able to sort and rank the answer set in some rational way. SPROUT can predict binding affinity using an empirically derived function that takes into account hydrogen bonds, hydrophobic interactions, etc., and this provides one mechanism for ranking. In addition to the ALLIGATOR module, the CAESA program is helpful for clustering and sorting the output (see section 3.4.6, page 56).
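Ranking by an empirical function can be sketched as a weighted sum of interaction terms followed by a sort. The Python example below is purely illustrative: the functional form, the weights `w_hbond` and `w_hydrophobic`, the field names and the ligand values are invented placeholders and are not SPROUT's scoring function or its parameters.

```python
def estimated_score(n_hbonds, hydrophobic_area, w_hbond=1.0, w_hydrophobic=0.01):
    """Toy empirical score: a weighted sum of two interaction terms.

    The weights are placeholders chosen for the example, not fitted values.
    """
    return w_hbond * n_hbonds + w_hydrophobic * hydrophobic_area


def rank(ligands):
    """Order ligands from highest to lowest estimated score."""
    return sorted(
        ligands,
        key=lambda lig: estimated_score(lig["hbonds"], lig["hydrophobic_area"]),
        reverse=True,
    )


# Made-up answer set: hydrogen-bond counts and hydrophobic contact areas.
answer_set = [
    {"name": "a", "hbonds": 2, "hydrophobic_area": 100.0},
    {"name": "b", "hbonds": 3, "hydrophobic_area": 50.0},
    {"name": "c", "hbonds": 1, "hydrophobic_area": 150.0},
]
ordered = [lig["name"] for lig in rank(answer_set)]
print(ordered)
```

A ranking of this kind only orders the answer set; synthetic accessibility still has to be judged separately.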
The SynSPROUT program generates new potential ligands using commercially available molecules as fragments, which guarantees the synthetic feasibility of the generated molecules. It uses the Classic SPROUT program modules and the same structure generation principles (see section 3.3, page 50).
1.3 Biological background
The estrogen receptor (ER) plays a crucial role in a number of processes such as the control of reproduction and the development of secondary sexual characteristics. The ER binds a ligand and undergoes a conformational change, which allows the receptor-ligand complex to bind with high affinity to specific estrogen response elements (ERE) and to modulate the transcription of target genes (see section 4.4.4, page 91).53 The effect of estrogen on the mammary gland is of particular interest because of its link to breast cancer.
Estrogen antagonists are used in breast cancer treatment to prevent estrogen action on cell proliferation. Another way to reduce the estrogen effect on breast cancer is to decrease the amount of endogenous estrogens. This could be accomplished by inhibiting the enzyme activities involved in estrogen biosynthesis. The most promising pathways for inhibition are the cytochrome P450 aromatase, estrone sulfatase (EC 3.1.6.1) and 17β-hydroxysteroid dehydrogenase/ketosteroid reductase (17βHSD/KSR) pathways54 (see section 4.3, page 59). This research concentrates on the 17βHSD/KSR enzymes,55 which are responsible for the final step of the biosynthesis of the sex hormones that in many cases stimulate the proliferation of breast and prostate cancers. Both estrogens and androgens are more active in the 17-hydroxy form than in the corresponding 17-keto form. Human estrogenic 17βHSD/KSR1 is one of eleven isoenzymes that have been identified in mammals so far (see section 4.4.3). The human type 1 enzyme catalyses the reduction of inactive estrone (E1) 5 to the active 17β-estradiol (E2) 1 in the presence of NADPH (nicotinamide adenine dinucleotide phosphate) 6 as a cofactor (see section 4.4.3.5, page 87).56
[Chemical structures of estrone 5 and NADPH 6]
Because the 17βHSD/KSR1 enzyme catalyses estrogen synthesis, it is an attractive target for structure-based ligand design for the prevention and control of breast tumour growth. The inhibitor development is based mainly on the binding site information obtained from SPROUT’s HIPPO module. These important structural features and the detailed binding site interactions are presented in this thesis. After the new molecules had been generated, the best candidates were modified and examined in detail. The structure generation is presented step by step (see section 5.2, page 104).
2. AIMS OF THE STUDY
Many studies have shown that sex steroids, including 17β-estradiol (E2) 1, testosterone (T) 7 and androstenediol (∆5-diol) 8, are potent stimulators of cancer cell growth. They also have an effect on the formation of steroid-based breast and prostate tumours and cancers. Because 17βHSD/KSR enzymes catalyse the final step in the biosynthesis of estrogens and androgens, they are an important target for the design of inhibitors of steroid production in tumour growth.
The primary aim of this research was to examine the 17βHSD/KSR type 1 enzyme and its ligand binding site and to design a novel potential inhibitor molecule for the enzyme. Although the 17βHSD/KSR family currently comprises eleven different subtypes, type 1 is used here because it is the best known and a great deal of information about it has been published. Most significantly, crystal structure information and mutation studies are available for type 1, which are not as extensive for the other subtypes. From the therapeutic point of view concerning breast cancer, a non-steroidal inhibitor is the most desirable. Both 17βHSD/KSR1 and the ER bind the estrogen 5 ligand. A new inhibitor, however, should show binding affinity for the type 1 enzyme but not for the ER. Therefore, the crystal structure information of the estrogen receptors is also studied for inhibitor design.
The plan was to create non-steroidal inhibitor molecules with skeletons different from those in earlier studies, in which diverse steroid analogues have mainly been used as inhibitors. The aim was to generate a selection of molecular structures with different substituents and functional groups using the SPROUT de novo ligand design program. After structure generation, the energies and other properties of the created sets of molecules needed to be investigated. It is important to carry out ligand minimization in the receptor’s active site and to examine the ligand-receptor features after that. Finally, promising structures needed to be studied using in silico docking simulations. The molecules were docked into the 17βHSD/KSR1 enzyme as well as the estrogen receptor active sites for interaction studies and to confirm the lack of binding affinity for the estrogen receptors. With the help of these generated structure libraries it was also possible to test the performance of the different modules of the SPROUT program and to correct possible malfunctions.
The secondary aim was to assist in the development of the SynSPROUT program. Although this is a powerful system, it currently cannot construct rings from acyclic precursors. 1,3-Dipolar cycloaddition (1,3-DC) reactions were chosen as a specific area in which to test this desired enhancement to the system. These are classic reactions in organic chemistry, and most of the resulting five-membered ring structures are easy to synthesise. This study concentrated on azomethine ylide reactions because of the close resemblance of their products to peptide structures. The plan was to carry out a thorough literature survey and then add the information about these reactions to the SynSPROUT program’s knowledge base.
3. OVERVIEW OF SPROUT COMPONENT PROGRAM
3.1 Survey of de novo ligand design programs
De novo design programs are classified here into three groups, as mentioned previously. The first and second categories are based on so-called deterministic methods and the third on stochastic methods. Programs belonging to the first category of deterministic methods, for example NEWLEAD, HOOK, LUDI and SPROUT, generate molecules by linking predefined fragments placed into the interaction sites. This linking method is also called the “outside-in” method (Figure 1).
Figure 1. Several fragments are placed simultaneously into the binding site and connected by suitable linking groups. Adapted from Verlinde57
The NEWLEAD22 program generates molecules to fit the constraints of a known pharmacophore. Fragments are docked to binding sites and the program links fragments finding an appropriate spacer from a database of simple molecules to generate whole molecules. It connects two isolated moieties repeating the connection until the parts are satisfactorily connected using single-atoms, library spacers and fuse-ring fragments. Generated structures are ranked on the basis of van der Waals violations.
The HOOK23 program is based on 3-D searching systems. It uses databases of existing molecules, such as CSD58,59 (Cambridge Structural Database), to find linking molecules between docked fragments in interaction sites. The program uses MCSS60,61 (Multiple Copy Simultaneous Search) method for identification of the energetically favourable binding site and the HOOK-part searches systematically a database for skeletons, which logically connect binding sites. The molecules are scored according to their steric interactions with the receptor.
The LUDI24,25,26,27 was the first commercially available de novo design program (1992). It is a fragment-based structure generation program that can also be used for 3-D database searching. At the start the program identifies interaction sites using a purely geometrical approach. The program recognises four different kinds of interaction sites: hydrogen-donor, hydrogen-acceptor, lipophilic-aliphatic and lipophilic-aromatic. The two hydrogen-bonding interaction sites are strongly directional and are represented by sets of vectors rather than by a single position. Fragments are taken from the program’s own fragment library and fitted into interaction sites using the root mean square (RMS) method. Connecting fragments are found from other template libraries. The program docks fragments into the interaction sites and gives a scored list of possible starting fragments. Users can choose one fragment at a time from the list, select the connection point and dock the next fragments onto the connection point. The molecule is ready when all interaction sites are connected. The result is one new ligand molecule. Nowadays programs also take synthetic accessibility into account. The LUDI is a module of the InsightII program, which makes it a powerful tool for structure generation.62
As for the SPROUT,17,28,29,30,31,32 it too uses fragment joining methods, linking added templates in a stepwise manner to form a novel molecule skeleton. The program automatically identifies protein target sites such as hydrogen bond donor and acceptor sites, complex hydrogen bonding sites (multicentered and bifurcated situations), covalent bonding sites and lipophilic regions (see section 3.2.4, page 30). The SPROUT’s HIPPO module identifies interaction sites as regions, just as LUDI does, but an important difference is that HIPPO preserves the continuous nature of each region and also stores the directionality of hydrogen bond or covalent bond sites. Users can choose simple fragments (templates) for target site docking from the program’s own fragment library. An enormous variety of skeletons can be generated from a small number of fragments. After structure generation is completed the program arranges the candidate structures in ascending order using a scoring function based on predicted ligand-binding affinities (see further section 3.2.7, page 48). This program, unlike the LUDI, is capable of creating a large library of possible ligand skeletons during one simulation run.
The second category of deterministic methods generates molecule structures in a stepwise manner using a sequential growing method, also called the “inside-out” method (Figure 2, page 23). This category includes programs such as GROW, LEGEND and GenStar.
Figure 2. New ligands can be designed by positioning a seed that is further extended by additional building blocks. Adapted from Verlinde57
The LEGEND33,34 program uses an atom-based structure generation system. Sixteen atom types and five bond types are allowed. The input file of the program includes pre-calculated electrostatic and van der Waals interactions in a PDB format.4 Structure building starts with the generation of an anchor atom. For each subsequent addition, the program selects a random atom of the existing partial structure as the root. The new atom is rejected if it occupies a forbidden position (van der Waals violations) with respect to previously generated atoms or the receptor. Atom additions are repeated until the skeleton fulfils the volume and electrostatic requirements.
The GenStar35 is another atom-based program. It builds skeletons similarly to the LEGEND; the only difference is that the atoms of the partial skeleton are scored and every new atom is added to one of the best-scoring 20% of the tested partial skeleton atoms instead of to a randomly selected one. The program is allowed to form rings and branches instead of growing only linearly. The user specifies the number and size of the structures that are generated.
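The difference between the two root-selection strategies described above can be illustrated with a short sketch. This is not code from LEGEND or GenStar; the function names, the score dictionary and the convention that a higher score is better are assumptions made purely for illustration.

```python
import random

def pick_root_random(atoms, rng):
    """LEGEND-like choice: any atom of the partial structure, uniformly."""
    return rng.choice(atoms)

def pick_root_top_fraction(atoms, scores, rng, top_fraction=0.20):
    """GenStar-like choice: pick among the best-scoring 20% of the atoms.

    Assumption: a higher score means a better atom to grow from.
    """
    ranked = sorted(atoms, key=lambda a: scores[a], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return rng.choice(ranked[:n_top])

rng = random.Random(0)
atoms = list(range(10))                 # hypothetical atom indices
scores = {a: float(a) for a in atoms}   # made-up scores for the example
root = pick_root_top_fraction(atoms, scores, rng)
```

With ten atoms the candidate pool shrinks to the two best-scoring atoms, which shows how scoring narrows the growth points compared with a purely random choice.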
The GROW36 was the first published method for sequential build-up of molecules, initially developed for the design of peptides. The program applies a stepwise joining method for structure generation using amino acids as building blocks. Skeleton building starts by adding an acyl group as the seed in the active site. After this amino acids are added one by one and partial skeletons are scored after every addition to find the best peptide chain. Structure generation terminates when the peptide length reaches a defined size.
The third category includes programs based on random or stochastic methods such as genetic algorithms (GA).3 ,63 7 9 0 This method mimics the evolutionary process of natural selection. The structures can undergo genetic-type operations such as crossover, whereby fragments are mixed from two different structures, and mutation, where a torsional angle in a structure is altered or an element type is changed. The PRO_LIGAND is one such program, which uses the genetic algorithm method.
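The two genetic operations named above can be sketched as follows. This is a minimal illustration, not the PRO_LIGAND implementation: representing a structure as a list of fragment names plus a list of torsion angles, and the parameter max_step, are assumptions introduced here.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Mix fragments of two structures at a single random cut point."""
    cut = rng.randrange(1, min(len(parent_a), len(parent_b)))
    return parent_a[:cut] + parent_b[cut:]

def mutate_torsion(torsions, rng, max_step=30.0):
    """Alter one randomly chosen torsion angle (degrees), wrapped to 0-360."""
    mutated = list(torsions)
    i = rng.randrange(len(mutated))
    mutated[i] = (mutated[i] + rng.uniform(-max_step, max_step)) % 360.0
    return mutated

rng = random.Random(42)
parent_a = ["A1", "A2", "A3", "A4"]   # hypothetical fragment labels
parent_b = ["B1", "B2", "B3", "B4"]
child = crossover(parent_a, parent_b, rng)
new_torsions = mutate_torsion([60.0, 180.0, 300.0], rng)
```

The child always starts with a fragment from the first parent and ends with one from the second, so information from both structures is carried forward.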
The PRO_LIGAND37 is similar in procedure to the LUDI. Binding site identification is based on an analogous rule-based approach. The program uses fragments that are labelled with atom properties and docks these into the binding site. The first fragment that is found to satisfy the constraints is accepted and the program continues to the next operation. The newly generated structures are evaluated via a fitness function, according to which only the best “individuals” survive for further reproduction. GA allows the mixing of information from high and low scoring structures and aims to increase the average score of the whole set. The program also includes the possibility of designing novel molecules using pharmacophore mapping or CoMFA (Comparative Molecular Field Analysis) techniques.64
3.2 SPROUT
3.2.1 Current developments of SPROUT
The SPROUT17,2 ,3 is an automatic, interactive and comprehensive set of tools for the rational design of enzyme inhibitors. The first version of the SPROUT built molecule skeletons by joining new templates to each other until all the constraints were satisfied.28,29 The updated versions of the program, in addition, build skeletons from many target sites simultaneously.51 Development of the SPROUT program has continued for a decade, and validations65 of the program as well as some successful applications for lead discovery have been reported.66,67
In addition to updated and more sophisticated Classic SPROUT releases, recent developments of the program include, alongside the sequential system, a parallel structure generation package that clusters SGI (Silicon Graphics) and PC/Linux platforms.4 ,529 0 8 Moreover, several other projects aim to extend the Classic SPROUT, for example VLSPROUT and SynSPROUT.5 The former screens virtual combinatorial libraries and the latter generates synthetically accessible ligands by de novo design. To generate ligands with this approach, the program needs information on readily available starting materials and high-yielding chemical reactions. Information on the high-yield reactions is programmed into the knowledge base using retro-synthetic rules, which are encoded in the PATRAN language.68 The SynSPROUT uses the Classic SPROUT program for de novo design. The result of the virtual synthesis in the receptor cavity is a readily synthesisable structure. Such a result has a great advantage over a Classic SPROUT result, which still requires plenty of work after structure generation.
3.2.2 General features
The SPROUT is designed to build molecules for a range of applications based on molecular identification and structure generation. Generally structure generation is divided into two main stages:28,29
• Primary structure generation to generate skeletons or molecular graphs that satisfy steric constraints and
• Secondary structure generation to convert the skeletons into molecules by making atom substitutions.
Primary structure generation gives skeletons, which have a required shape to satisfy the primary constraints. A skeleton that does not satisfy all the requirements is called a partial skeleton. Skeleton structures are an approximate solution to the problem. The final and chemically realistic molecule structures are produced after secondary structure generation.
The primary constraints defined by the SPROUT require the X-ray diffraction or NMR information of the enzyme-ligand complex, which defines steric and electrostatic constraints of the binding site. Steric constraints are the most important limitation in determining the shape of possible ligands (Figure 3).
[Figure 3 flowchart labels: 3D coordinates; steric constraints determining shape and volume; boundary and target sites; templates with joining rules and conformational analysis; skeleton generation; electrostatic and hydrophobic constraints; atom substitution; molecules; organise the results; selected molecules for minimisation; primary structure generation; secondary structure generation.]
Figure 3. Outline of the components required for structure generation using the SPROUT.
The 3-D shape of the receptor and substrate defines the volume of the binding site. The volume is enclosed by a boundary, which restricts the shapes of the new ligands. The electrostatic constraints are divided into more directional effects such as hydrogen bonds and less localised effects such as charge distribution and hydrophobicity.28 The weaker electrostatic and hydrophobic constraints are used later on, when converting primary structure skeletons into real molecules. The way a ligand binds within a receptor active site is attributable to shape, volume and electrostatic properties; an inhibitor is thus able to bind to a binding site on the enzyme because its shape and electrostatic properties are complementary to those of the receptor site. Within the volume there are interaction sites. These are regions which, if occupied by an atom of the ligand, can lead to favourable interactions between the ligand and the receptor.29 If the interaction sites are satisfactorily localised, they are used as primary constraints and consequently promote and direct primary structure generation. These localised interaction sites are called target sites, and satisfying these regions is a requirement for primary structure generation. Small 3-D fragments, called templates, are docked into the target sites and connected to form molecular skeletons using a linking or sequential growing method. A solution is found when all the steric and geometric requirements are satisfied and no boundary violations have occurred (Figure 4). A huge range of skeletons can be generated from a small number of templates.29
Figure 4. Primary structure generation is initiated by constraints definition (boundary, receptor site and target sites). Templates are added to the partial skeleton to fulfil the target site requirements to give final skeleton.29
In the secondary structure generation stage, atom substitutions are made to convert the approximate skeleton structures into molecules that are consistent with the secondary constraints and can therefore be scored and analysed (see Figure 3, page 26). Once the results have been analysed and clustered by the SPROUT, some conformational analysis of the selected molecule structures is required. The minimised ligands and their interactions with the receptor can be re-examined using the SPROUT (see section 5.2.3, page 137). Retrosynthetic analysis can be carried out using the CAESA program (see section 5.2.6, page 159).
The program is interactive in the sense that users can control each step of the structure generation. The program consists of five modules (Figure 5, page 28). Sequential use of all five modules leads to the generation of the skeletons and their subsequent analysis. On opening the window of the SPROUT program users can either create a New Job File or open an old file. At the beginning users can specify the nature of the generated skeleton, such as a general molecule skeleton or a peptide skeleton. It is also possible to design peptide ligands consisting of natural and/or synthetic amino acids.
In terms of the general de novo design process, the five modules of the SPROUT can be divided into three groups:
a) Identification by CANGAROO and HIPPO
b) Structure generation by ELEFANT and SPIDER
c) Scoring, ranking and clustering by ALLIGATOR
Figure 5. The main window of SPROUT.
3.2.3 CANGAROO
CANGAROO (Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation) automatically detects potential binding sites, also called clefts (Figure 6, page 29), within the protein structure.51 A cleft is defined as a large inward facing area on the surface of the protein. The module requires a PDB4 input file, which contains the crystal structure of the target enzyme. The bonds are determined according to a table that contains the connectivity of naturally occurring amino acids (Appendix I A and B).
Figure 6. The cleft of the binding site of the 17βHSD/KSR type1 enzyme, together with estradiol 1. This is the region of the protein immediately surrounding the ligand.
CANGAROO identifies the binding cleft using any one of four alternative procedures:
1) Automatic identification of a ligand and receptor in a protein–ligand complex (Figure 7, page 30). Ligand is automatically separated from the protein and selected. Two PDB output files are generated, one for the cavity (ligand) file and another for the receptor file. Users can define the diameter of the desired part of the receptor using ligand as a centre.
2) Identification of all clefts in the protein based on the surface curvature of individual regions of the protein. Users select one from the many clefts identified in this way. It is also helpful to use the information of the active site residues based on the literature. Only a receptor file is generated.
3) Selection of receptor residues known to be involved in binding based on the knowledge of probable binding site (a crystal structure without ligand). A large number of publications are available on enzyme crystal structure and inhibitors revealing information about the ligand–enzyme interactions.4 ,5 ,691 6 ,70 Analysis of that data can provide useful information about the residues of the active site of an enzyme, which is important for binding to known inhibitors. Only a receptor file is generated.
4) Pharmacophore mode. If the protein crystal structure information is not available CANGAROO can also identify the structural information of the known active ligand.51 The program produces only the cavity (ligand) file.
Figure 7. 17βHSD/KSR1 enzyme-estradiol-NADPH complex. E2 1 is selected automatically (red) as a ligand and NADPH 6 (green) is recognised as another possible ligand.
These files, generated by CANGAROO, serve as input files for the next module of the SPROUT, called HIPPO.
3.2.4 HIPPO
The HIPPO (Hydrogen Bonding Interaction Site Prediction as Positions with Orientations) module identifies interaction sites within the cavity that can be used as starting points for structure generation.31,52,71 A graphical presentation of the boundary surface (see section 3.2.4.1, page 31) and of the proposed good interactions between the ligand and the receptor helps users to analyse and select a subset of these interaction sites, called target sites (see section 3.2.4.2, page 32). These sites are small regions in space within the receptor cavity, which supply good starting points for structure generation because of the highly directional nature of hydrogen bonding interactions. This continuous region presentation is unique to SPROUT, while other de novo programs still sample orientations within the binding site.7 The HIPPO program identifies simple hydrogen bond donor and acceptor sites, complex hydrogen bonding sites, covalent bonding sites including bonds to metal ions, and hydrophobic regions.
3.2.4.1 Boundary surface
One feature of HIPPO is the calculation and display of the boundary surface of the receptor as a 3-D grid.52 This surface represents the suggested minimum distance between a ligand atom and a protein atom:
Distance = x + y
where x is the van der Waals (vdW) radius of the protein atom and y varies according to the nature of the protein atom: hydrophobic y = 1.5 Å, hydrogen bonding y = 1.0 Å and covalent y = 0.0 Å.
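As a worked example of the boundary formula, the following sketch computes the minimum ligand-protein distance for a given protein atom. The offsets y are the values quoted above; the function name, the class labels and the example radius of 1.70 Å (a commonly tabulated van der Waals radius for carbon) are illustrative assumptions, not SPROUT parameters.

```python
# Offsets y (in Å) taken from the text; dictionary keys are hypothetical labels.
BOUNDARY_OFFSET = {
    "hydrophobic": 1.5,
    "hydrogen_bonding": 1.0,
    "covalent": 0.0,
}

def boundary_distance(vdw_radius, atom_class):
    """Return Distance = x + y for a protein atom of the given class."""
    return vdw_radius + BOUNDARY_OFFSET[atom_class]

# Example: a hydrophobic carbon atom with an assumed vdW radius of 1.70 Å.
d_hydrophobic = boundary_distance(1.70, "hydrophobic")
d_covalent = boundary_distance(1.70, "covalent")
```

For the same atom the boundary therefore lies 1.5 Å further out when the atom is treated as hydrophobic than when it is treated as a covalent bonding partner.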
The SPROUT boundary surface gives a good and clear outline for the active site. It helps users to comprehend the size and shape of the active site as well as select good target sites. For example the LUDI24 program does not give any kind of illustration of the active site, which makes it difficult to select fragments for the connections.
Different regions of the boundary surface are shown in different colours according to the hydrogen bonding and hydrophobic/hydrophilic properties of the regions. Colours that are used for hydrogen bonding sites are related to the areas of the boundary; red corresponds to a hydrogen acceptor area, blue to a hydrogen donor area and purple to a complex hydrogen bonding (acceptor-donor) area. Green correlates to a hydrophobic region (Figure 8, page 32). HIPPO also defines and displays the hydrophobic region of the receptor in which case it is shown in yellow.
Figure 8. The boundary surface for the 17β-HSD/KSR1-equilin 2 ligand complex, as determined by the HIPPO module.
3.2.4.2 HIPPO target sites
Using the database of natural amino acid residues (see Appendix I) it is possible to identify the hydrogen bonding regions. To discover the number and nature of the target sites, the program carries out the following operations:
• The hydrogen positions for hydrogen donors and the electron pairs for acceptors are calculated on the basis of hybridisation.
• An optimal intra-molecular hydrogen-bonding network is generated next. Limits for intra-molecular hydrogen bonds within the receptor are calculated as follows: The hydrogen-acceptor distance is calculated (for all hydrogens) and compared to the limit (default 2.75Å). If the distance is less than this limit, the donor-hydrogen-acceptor angle is also checked (Figure 9, page 33). The angle is accepted if it is at least Θ (by default the tolerance is 45°, Θ = 180° - 45° = 135°). If the bond is acceptable then it is scored and stored in a list that is sorted by the scores of the potential hydrogen bonds.52
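The acceptance test for an intra-molecular hydrogen bond described above can be sketched as follows. The default limits (2.75 Å and a 45° tolerance) come from the text; the function names and the use of plain coordinate tuples are assumptions for illustration, and the scoring and sorting of accepted bonds is not shown.

```python
import math

def angle_deg(a, b, c):
    """Angle at point b (in degrees) formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.sqrt(sum(p * p for p in v1)) * math.sqrt(sum(q * q for q in v2))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=2.75, tolerance=45.0):
    """Accept if H...acceptor distance < limit and D-H...A angle >= 180 - tolerance."""
    if math.dist(hydrogen, acceptor) >= max_dist:
        return False
    return angle_deg(donor, hydrogen, acceptor) >= 180.0 - tolerance

# Hypothetical coordinates in Å.
linear = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (3, 0, 0))
bent = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (1, 2, 0))
too_far = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (4, 0, 0))
```

The distance is checked first, mirroring the order given in the text, so the angle is only evaluated for pairs that are close enough.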
Figure 9. Definition of the intra-molecular hydrogen bonds is shown with the default values. Graphic presentation by Z. Zsoldos,52 representations adapted from the SPROUT homepage (SimBioSys Inc.).48
Detecting the intra-molecular hydrogen bonds usually has the effect of fixing the orientation of terminal functional groups and of solvent molecules that have been allowed to remain in the receptor cavity.7 When residues can exist in different protonation states (e.g. Glu, His, Asn), SPROUT selects the protonation state that allows the highest number of intra-molecular hydrogen bonds. The donor hydrogens and the acceptor lone pairs that are used in intra-molecular hydrogen bonding are eliminated from further investigation. Complementary donor and acceptor regions are generated within the cavity to correspond to the hydrogen donor and acceptor atoms of the receptor. Tolerances are then applied to the most favourable hydrogen bond length and angle to identify the target site regions where suitable heteroatoms could form particularly strong hydrogen bonds.49,52
The hydrogen bonding target sites are represented by specifically shaped volumes around potential hydrogen bonding residues. The target sites generated by HIPPO are:
• Hydrogen acceptor sites are defined using the position of the hydrogen donor atoms of the receptor. Limits are applied to both the distance and to the direction of the ideal bond. To form a hydrogen bond, ligand (L) needs to reside in the red area (Figure 10, page 34).52
[Figure 10 labels: hydrogen acceptor site; L, D, H, δ, min dist, max dist, 1.0 Å. Default values: min dist = 1.6 Å, max dist = 2.2 Å, δ = 45°; D = receptor atom, L = ligand atom.]
Figure 10. A ligand atom can be placed anywhere within the red region (graphic). The SPROUT illustration of an acceptor site. After Zsoldos.48,52
• Hydrogen donor sites are derived similarly to acceptor sites, according to the position of the hydrogen acceptor groups in the receptor. In the graphic illustration (Figure 11), O stands for the receptor hydrogen acceptor atom, C for the receptor atom next to it, L for the ligand hydrogen donor atom and H for the hydrogen atom covalently bonded to it. A hydrogen bond is formed between O and L through H. The ligand atom (L) needs to lie in the blue area while the hydrogen atom (H) resides in the white area.52
[Figure 11 labels: hydrogen donor site; C, O, H, L, δ, C-H, vdW cut-off, 1.0 Å, 1.0 Å, min dist, max dist. Default values: min dist = 1.6 Å, max dist = 2.2 Å, δ = 45°.]
Figure 11. A donor atom of the ligand (L) can be placed anywhere within the blue region with the hydrogen located within the white region (graphic). After Zsoldos.48,52
• Complex hydrogen bonding sites are identified by the intersection of simple sites. These sites can be, for example multicentered and bifurcated situations (Figure 12). If a hydroxyl-group exists within this region it can form two particularly strong hydrogen bonds at the same time.52
Figure 12. A complex hydrogen donor-acceptor intersection site is presented as a purple area.
• Covalent site regions are also identified within the receptor cavity. Covalent sites, like metal sites, have a geometry similar to that of hydrogen acceptor sites. The program recognises structures that can form a temporary covalent bond with the ligand in the transition state and creates a covalent site around this group. In Figure 13, O represents the terminal hydroxyl group of the serine residue.
[Figure 13 labels: covalent bond site; C, O, θmin, θmax, bmin, bmax.]
Figure 13. The geometric representation of the covalent site includes tolerances for the bond length and bond angle. A ligand atom can be placed anywhere within the green region to form a covalent bond with the Ser residue of the receptor. After Zsoldos.48,52
• Metal ion sites in metalloproteins are usually part of the active site of the protein and are involved in the catalytic cycle. For this reason a good inhibitor for these proteins should include an atom that can interact with the metal ion. HIPPO identifies metal ions (Zn, Mg, Cu, Ca, Co, Fe, Ni, Mn) in a receptor PDB file, calculates the most likely direction of the free valence according to the existing connections and generates an appropriate target site. Metal ion target sites are generated by a geometric approach similar to that used for covalent sites (Figure 14).
[Figure 14 labels: metal ion site; M, X, δ, tolerance, bond length.]
Figure 14. Metal ion target site is shown as a grey area. After Zsoldos.48,52
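The geometric site definitions in the list above all combine a distance range with a directional tolerance. As an illustration, the hydrogen acceptor site of Figure 10 can be tested with the sketch below. It rests on two assumptions that the text does not state explicitly: that min dist and max dist are measured from the receptor hydrogen (H) to the ligand atom (L), and that δ is the maximum deviation of the H to L direction from the D to H bond direction. The function name is hypothetical.

```python
import math

def in_acceptor_site(donor, hydrogen, ligand,
                     min_dist=1.6, max_dist=2.2, delta=45.0):
    """Return True if the ligand atom lies inside the acceptor-site region.

    Assumed geometry: distance H...L within [min_dist, max_dist] (Å) and
    deviation from the D-H axis of at most delta degrees.
    """
    dh = [h - d for d, h in zip(donor, hydrogen)]     # direction of the D-H bond
    hl = [l - h for h, l in zip(hydrogen, ligand)]    # vector from H to L
    dist = math.hypot(*hl)
    if not (min_dist <= dist <= max_dist):
        return False
    cos_dev = sum(p * q for p, q in zip(dh, hl)) / (math.hypot(*dh) * dist)
    deviation = math.degrees(math.acos(max(-1.0, min(1.0, cos_dev))))
    return deviation <= delta

# Hypothetical coordinates in Å: donor at the origin, hydrogen 1.0 Å along x.
on_axis = in_acceptor_site((0, 0, 0), (1, 0, 0), (2.9, 0, 0))
off_axis = in_acceptor_site((0, 0, 0), (1, 0, 0), (1, 1.9, 0))
out_of_range = in_acceptor_site((0, 0, 0), (1, 0, 0), (4, 0, 0))
```

Under the same assumptions, a complex site could be tested by requiring that a point satisfies two such membership tests at once, since complex sites are described as intersections of simple sites.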
The set of hydrogen bonding sites initially generated by HIPPO is quite large (usually 20-100 sites, depending on the size of the cavity). However, removing the portions that violate the steric constraints of the receptor shrinks the sites significantly. In addition, the user is able to adjust the angle and distance tolerance values, as well as to generate new spheric sites for hydrophobic regions.
• In the earlier SPROUT versions (v3.2 to 4.11) users were able to generate spheric sites in the cavity and place them anywhere in the binding site, typically in the most strongly hydrophobic regions (Figure 15 a, page 37). Spheric sites were not hydrogen bonding sites, but they provided starting points for the structure generation. The latest published versions of the program (v5.0 and later) offer a different way to deal with hydrophobic regions. The program analyses the hydrophobicity of the cavity and provides the most hydrophobic areas as a starting point for the structure generation (Figure 15 b, page 37). Appropriate fragments (see section 3.2.5, page 39) are possible
Figure 15. a) Spheric sites were used as hydrophobic regions for skeleton generation in the binding site (older versions). b) The hydrophobic areas of the new versions are now used similarly for structure generation.
At this point it is also possible to analyse the native ligand and identify the hydrogen bond interactions it forms. This assists the choice of target sites. The information is also important for structure generations for the reason that novel molecules should have better scoring values and interactions than the native ligand.
It would be practically impossible for any single inhibitor to satisfy all the sites that the HIPPO module generates. For that reason a subset of the available binding sites of the boundary surface has to be selected by users (Figure 16a and 16b, page 38). Different subsets of target sites will give skeletons with different structural characteristics and binding properties. There could be many reasons for choosing any individual subset, but the usual practice is to base this choice on literature data. The selected sites, which are going to be the interaction points between the receptor and the generated ligand, are saved and serve as input for the next module of the SPROUT.
Figure 16. a) The boundary surface for the 17βHSD/KSR1-estradiol 1 complex. b) The boundary and selected interaction sites (two acceptor, one donor and three spheric sites) for the enzyme. The graphical display (lower part) shows the target sites within the boundary. Red labels indicate the selected acceptor sites (in this case histidine His221 and asparagine Asn152) and blue indicates the donor site (tyrosine Tyr218).
3.2.4.3 Pharmacophore module
If the protein crystal structure information is not available and the CANGAROO module has stored only the structural information of the known active ligand, SPROUT progresses directly to the HIPPO pharmacophore module.51 The information on the active ligand (Figure 17, page 39) is used for pharmacophore identification by creating spheric sites in appropriate regions in space. The easiest way to create spheric sites is to use the atoms of the active ligand for site generation. In this way a 3-D sphere with atom-type colouring appears at the correct location in space. These sites act as acceptor, donor and hydrophobic sites and are stored for fragment docking. After target site identification and generation, pharmacophore modelling progresses to the next module in the same way as normal de novo design.
Figure 17. The active ligand (here the ligand from the 17βHSD/KSR1-estradiol complex) is used as a starting point for pharmacophore identification by HIPPO.
3.2.5 ELEFANT
The ELEFANT (ELEction of Functional Groups and ANchoring them to Target Sites) module allows users to choose and dock the starting fragments for the target sites that were selected in HIPPO. To carry out this task users first have to select a group of target sites, which can consist of an individual target site or of several sites. Starting fragments or templates are selected from a template library30 (see section 3.2.5.1, page 40). Users select any number of templates from the template library and the ELEFANT docks the templates in such a way that they satisfy the group of target sites. The docking process uses rigid templates and generates all mapping combinations between the template atoms and the selected target sites.73 In addition to templates, it is possible to import known ligands or dock unknown ones (as a PDB or MDL file) into the target site and use these as a starting point for the automated ligand generation algorithm.52,72 This gives very wide freedom of choice, since it allows users to choose from a huge number of structures. A template that does not satisfy the target site or that violates the boundary surface of the receptor is rejected. In the case of a group of target sites that contains more than one site, each template must contain vertices that satisfy all the target sites in the group.
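The idea of generating all mapping combinations between template atoms and target sites can be sketched as a simple enumeration. This is not ELEFANT's algorithm: it only lists the candidate assignments, and the geometric fit and boundary checks that decide whether a mapping is accepted are left out. The function name and the vertex and site labels are hypothetical.

```python
from itertools import permutations

def candidate_mappings(template_vertices, target_sites):
    """Yield every assignment of distinct template vertices to the target sites."""
    for combo in permutations(template_vertices, len(target_sites)):
        yield dict(zip(target_sites, combo))

# A three-vertex template docked against a group of two target sites.
mappings = list(candidate_mappings(["v1", "v2", "v3"],
                                   ["acceptor_5", "spheric_1"]))
```

For a template with three vertices and a group of two sites this yields six candidate mappings, each of which would then have to pass the target site and boundary tests described above.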
A set of target sites is shown in Figure 18a (page 40). The six target sites have been divided into five groups. The first group consists of two target sites, an acceptor site (5) and a spheric site (1). The remaining target sites are single sites: two of them are hydrophobic spheric sites (2 and 3), one is an acceptor site (4) and one a donor site (6). Some of the templates docked into the target sites are presented in Figure 18b. The graphical display (lower part) shows the number of selected templates for each target site.
Figure 18. a) Five groups of target sites and b) some selected templates for 17βHSD/KSR1 enzyme.
3.2.5.1 SPROUT template library
The template library concept is generally subdivided into two different groups: template library and template library manager.
The template library contains 3-D molecular graphs of simple organic structures. The edges of the templates are labelled by bond type, but atom types are unspecified (generic templates). Instead of atom types, the vertices of each template are defined by their hybridisation state, since different hybridisations lead to different geometries.28,29,30,52 Because skeletons are treated as rigid structures, templates have to be stored in the library in several low-energy conformations (Figure 19a, page 41) to introduce some ligand flexibility into the structure generation model. Generalising the templates keeps the library small and saves considerable computation time, since many fragments with similar geometry do not need to be processed separately (Figure 19b, page 41).52 The template library for peptide generation is simpler than the standard library, containing only parts of peptides as templates. Only generic templates that satisfy the steric constraints are converted into actual molecules by heteroatom substitution (see section 3.2.7, page 48).
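The idea that one generic template stands for several molecular fragments can be sketched as a small data structure. The class `GenericTemplate`, the table `ALLOWED` and the function `substitutions` are hypothetical names, and the substitution table is a deliberately crude assumption (an element is allowed on a vertex if the vertex has no more heavy-atom neighbours than the listed maximum). It is not the rule set used by SPROUT and is not chemically complete.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GenericTemplate:
    """Vertices carry only a hybridisation label; edges carry a bond type."""
    hybridisation: tuple  # one label per vertex, e.g. ("sp3", "sp3")
    bonds: tuple          # (vertex_i, vertex_j, bond_type) triples


# Assumed, illustrative table: element -> maximum number of heavy-atom
# neighbours it may have for a given hybridisation.
ALLOWED = {
    "sp3": {"C": 4, "N": 3, "O": 2, "S": 2},
    "sp2": {"C": 3, "N": 2, "O": 1},
}


def substitutions(template):
    """List every element assignment compatible with the ALLOWED table."""
    choices = []
    for vertex, hyb in enumerate(template.hybridisation):
        # Number of bonds in which this vertex takes part.
        degree = sum(vertex in (i, j) for i, j, _ in template.bonds)
        choices.append([element for element, max_degree in ALLOWED[hyb].items()
                        if degree <= max_degree])
    return list(product(*choices))
```

A single stored geometry therefore expands into many concrete fragments only at the substitution stage, which is why the library itself can stay small.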
Figure 19. a) The cyclohexane structure exists in three favourable conformations (chair, boat and twist boat) in the template library. b) Each template represents several molecular fragments.
The basic template library includes a selection of acyclic templates, which contain sp2 or sp3 atoms, and a range of 3-6 membered rings in various conformations. The updated versions (4.11 and 5) of the template library include not only generic templates but also specific templates such as aminopyridine, indole and the carboxy group, as well as nitrogen and sulphur atoms (Figures 20 and 21, page 42). There is no limit to the number of templates that can be selected. However, if the choice of starting templates is restricted, structure generation is quicker and the generated structures are less diverse.
Figure 20. The standard template library: generic templates (left), specific templates (right). The number inside each ring indicates the number of conformations.
Figure 21. The standard template selection menu of SPROUT (v4.11); two different conformations of the cyclohexane ring are shown in the windows.
The template library manager enables users to create an additional template library. New templates can be built in another modelling program (e.g. MacroModel or Moloc) and saved as MDL mol files. These imported fragment files are the starting point for the new template library. The template library manager automatically detects the necessary features of the molecules and calculates the information essential for structure generation. Users can check and modify this information with the interactive tools provided (Figure 22, page 43). The general template library can also be included as part of the new library.
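As an illustration of the kind of information that can be read from an imported fragment, the sketch below parses the atom and bond blocks of an MDL mol file (V2000 fixed-column layout) and assigns a crude hybridisation label to each atom. The text does not specify which features the template library manager actually calculates, so the function names `parse_molfile` and `guess_hybridisation` and the bond-order heuristic are assumptions, not a description of SPROUT.

```python
def parse_molfile(text):
    """Read atoms and bonds from a V2000 mol file given as a string.

    Returns (atoms, bonds): atoms is a list of (symbol, (x, y, z)) and bonds
    is a list of (i, j, bond_type) with zero-based atom indices.
    """
    lines = text.splitlines()
    counts = lines[3]  # fourth line: atom and bond counts, 3 columns each
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]) - 1, int(line[3:6]) - 1, int(line[6:9])))
    return atoms, bonds


def guess_hybridisation(n_atoms, bonds):
    """Assumed heuristic: label each atom by its highest bond order.

    Single bonds only -> sp3, any double or aromatic (type 4) bond -> sp2,
    any triple bond -> sp. Conjugation effects (e.g. amide nitrogen) are
    ignored, so the labels are only a first guess.
    """
    highest = [1] * n_atoms
    for i, j, bond_type in bonds:
        order = 2 if bond_type == 4 else bond_type
        highest[i] = max(highest[i], order)
        highest[j] = max(highest[j], order)
    labels = {1: "sp3", 2: "sp2", 3: "sp"}
    return [labels.get(order, "unknown") for order in highest]
```

A first guess of this kind would still need review, which fits the description above of users checking and modifying the calculated information interactively.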
An ELEFANT output file is generated when the starting templates for all the target sites are selected and saved. This file is the input for the penultimate and probably the most important module of SPROUT.