Weight Epidermal Growth Factor complex
4.1 Synopsis
This chapter discusses the use of the DAPMatch algorithm to
predict a complex of epidermal growth factor (EGF) and epidermal
growth factor binding protein (EGFBP). The full complex is formed
from two EGF molecules and two EGFBP molecules and is known as
high molecular weight EGF (HMWEGF). The structure of this complex
has not yet been solved, by either crystallography or N.M.R. This
work was undertaken in collaboration with Dr B. Bax (Birkbeck
College, University of London) who modelled the EGF binding protein
using the structures of tonin and kallikrein. The DAPMatch program
was used to suggest possible modes of binding. Careful examination
of the data revealed a problem with the protein structures used;
details of this problem are given and its solution is described. A
variety of biochemical data was used to narrow the search and a
single binding orientation was chosen. Since the observed structure
of the complex is not known, the result presented in this chapter is
the proposed model of the complex.
4.2 Introduction
4.2.1 Epidermal Growth Factor
Epidermal growth factor (EGF) was one of the first growth
factors to be discovered and has since been widely studied. It
stimulates the division of epidermal, epithelial and connective tissue
cells. EGF transmits a signal to the interior of a cell by binding to EGF
receptors in the cell membrane and inducing structural changes to
the receptor. EGF is found in large quantities in the mouse
submandibular gland, where it exists as a high molecular weight
complex (HMWEGF). This complex appears to consist of two EGF
molecules bound to two molecules of a much larger protein, known
as EGF binding protein (EGFBP). EGFBP has been sequenced (Blaber, et al., 1987) and thereby identified as a member of the serine proteinase family, with close homology to the protein kallikrein. Three glandular
kallikreins were suspected of binding EGF, and were called EGFBP
types A, B and C. In fact, only EGFBP type C has been shown to bind
EGF (Isackson, et al., 1987); in this chapter type C will be referred to simply as EGFBP.
An N.M.R. structure of EGF has been determined (Montelione,
et al., 1992). The structure of EGFBP has been modelled by Dr B. Bax and Ms G. Ferguson using the program COMPOSER (Sutcliffe, et al.,
1987a; Sutcliffe and Hayes, 1987; Sutcliffe, et al., 1987b) and the graphics package SYBYL (Tripos Associates Inc.). The known
structures of porcine pancreatic kallikrein A (PPK) (Bode, et al.,
1983) and rat submaxillary gland tonin (Fujinaga and James, 1987)
were used as the basis for the modelling procedure. These structures
are almost identical, except for seven structurally variable regions
(see Section 1.4). For each of these loops a comparison of the EGFBP
sequence and the tonin and kallikrein sequences was made, and the
most homologous loop was used in the EGFBP structure. In each case
a loop of the correct length, size and properties was found in either
tonin or kallikrein and it was not necessary to make a more general
search of known protein structures. The model was then energy
minimised to remove any short range clashes between atoms.
4.2.2 Application of the DAPMatch algorithm
The DAPMatch program was used to examine the docking of
EGF to EGFBP. The N.M.R. structure of EGF and the model of EGFBP
were the essential starting point for this investigation. As well as
these structures, constraints were necessary before the DAPMatch
program could be used with confidence. Several pieces of biochemical
data were known:
i. The N-terminal domain of EGFBP interacts with EGF (Blaber et al., unpublished results).
ii. The C-terminal residues of EGF are retained in the active site of EGFBP. This is suggested by the fact that the removal of the
C-terminal arginine residue of EGF prevents formation of the
HMWEGF complex (Server, et al., 1976).
iii. HMWEGF is known to be formed from two EGF/EGFBP complexes (Taylor, et al., 1974). A two fold axis of symmetry is therefore almost certain to be present in HMWEGF.
iv. Neither EGFBP nor EGF normally exists as a dimer. This implies that the full complex can only be formed after independent
EGF/EGFBP complexes have been formed.
These conditions suggest a complex of the form illustrated in
Figure 4.1. Here the first step of forming the complex is the
interaction of the C-terminus of EGF(A1) with EGFBP(B1). This
binding is weak, but can be stabilised by the presence of another
EGF/EGFBP complex (A2/B2). The final, stable HMWEGF complex
involves the binding of EGF(A1) with EGFBP(B2) and, by symmetry,
EGF(A2) and EGFBP(B1). Hence each EGF molecule has large contact
areas with both EGFBP molecules. However, the amount of contact
between the two EGF molecules (A1/A2) and between the two EGFBP
molecules (B1/B2) would be much smaller, explaining the absence of
homodimers of both EGF and EGFBP. The A1/B1 and A2/B2
intermediate complexes are identical and this will form a two fold
axis of symmetry in the final HMWEGF complex.
Figure 4.1 A schematic binding model for the HMWEGF complex. The first step (a) is a relatively unstable association between an EGF molecule (A1) and an EGFBP molecule (B1), involving the binding of the C-terminal arm of EGF(A1) to the specificity pocket of EGFBP(B1). Two such dimers come together (b) to form the final, symmetrical, HMWEGF complex.
The DAPMatch program was used to investigate the binding of
the guest molecule EGF(A2) with host molecule EGFBP(B1). This
binding was complicated by the presence of the C-terminus of
EGF(A1). The C-terminal residues of EGF(A1) were ill-defined in the
N.M.R. structure. However, since it was known that these residues
bound to the active site of EGFBP, it was possible to model the three
C-terminal residues into the specificity pocket of EGFBP, using the
residues occupying the P1, P2 and P3 sites (Section 1.9) in the
structure of the complex between PPK and BPTI. An additional two residues from
the C-terminal region were also modelled leaving the EGFBP binding
cleft (Figure 4.2). These residues were chosen to be in an extended
conformation. This augmented EGFBP model could then be used as a
host molecule in the DAPMatch algorithm.
4.2.3 Constraints
The known biochemical data of the EGF/EGFBP system had to
be translated into a set of constraints. Previously two types of
constraint had been used (Section 2.12). The first was the binding
site constraint, which could be applied to both the host and guest
molecules and stipulated that certain residues be present in the
binding region. The second was the loose distance constraint, which
stipulated that a particular residue of the host molecule be within
interacting distance of a certain residue of the guest molecule. A new
constraint was now used, which was similar to the binding site
constraint but more specific. This constraint required that a
particular residue of one of the molecules was not only present in the
binding region, but also made an interaction with an undefined
residue from the other molecule. This constraint could be applied to
residues that were known to form part of the binding region, but
whose precise interaction was not known, and so a distance
constraint could not be used.
Figure 4.2 The C-terminal arm of EGF (green trace, all atoms shown) as modelled into the specificity sites of EGFBP ([illegible] trace, Cα only).
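The interaction constraint described in this section can be illustrated with a minimal sketch. This is not the DAPMatch implementation: the atom-list data layout, the function names and the 4.0 Å cutoff are all assumptions made for illustration, since the text does not state what distance counts as an interaction.

```python
import math

# Hypothetical cutoff (Angstroms); the text does not give a value.
INTERACTION_CUTOFF = 4.0


def residue_interacts(residue_atoms, partner_atoms, cutoff=INTERACTION_CUTOFF):
    """True if any atom of the residue lies within `cutoff` of any partner atom."""
    return any(
        math.dist(a, b) <= cutoff
        for a in residue_atoms
        for b in partner_atoms
    )


def interaction_constraint(candidates, partner_atoms, cutoff=INTERACTION_CUTOFF):
    """Satisfied when at least one candidate residue touches the partner molecule.

    candidates: dict mapping a residue label to a list of (x, y, z) atom positions.
    partner_atoms: list of (x, y, z) positions for the whole partner molecule.
    The partner residue is left undefined, as in the constraint described above.
    """
    return any(
        residue_interacts(atoms, partner_atoms, cutoff)
        for atoms in candidates.values()
    )


# Toy coordinates only: three host residues and a one-atom "guest".
loop = {39: [(0.0, 0.0, 0.0)], 40: [(10.0, 0.0, 0.0)], 41: [(20.0, 0.0, 0.0)]}
print(interaction_constraint(loop, [(12.5, 0.0, 0.0)]))  # residue 40 is 2.5 A away
print(interaction_constraint(loop, [(5.0, 0.0, 0.0)]))   # nearest residue is 5.0 A away
```

Unlike a distance constraint, the check names a residue on one molecule only, which is why it can be used when the interacting partner residue is unknown.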
This interaction constraint was applied so that:
i. At least one of the three C-terminal residues of EGF(A1)
interacted with the guest molecule, EGF(A2).
ii. At least one of the EGFBP residues 39, 40 or 41 interacted with the guest molecule. These residues form a surface loop which is
strongly implicated in the complex formation
(Blaber et al., unpublished results).
Binding region and loose distance constraints were also applied.
iii. The binding region of the EGFBP was known. The area around the catalytic triad (Ser 57, His 107, Asp 195) was involved, as
was the C-terminal section of EGF(A1) (Trp 49, Trp 50, Glu 51,
Leu 52, Arg 53). Hence surface slices taken from the EGFBP
molecule were only considered if they contained at least 90% of
these residues.
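The binding region test in item iii amounts to a coverage check on each surface slice. The sketch below is one possible reading, assuming a slice can be summarised as the set of residues it contains; the labels, the function name and the representation are hypothetical, and the residue numbers are copied as printed in the text.

```python
# Residues named in item iii, labelled (molecule, residue number).
# The labelling scheme is an assumption made for this example.
REQUIRED = [
    ("EGFBP", 57), ("EGFBP", 107), ("EGFBP", 195),
    ("EGF_A1", 49), ("EGF_A1", 50), ("EGF_A1", 51),
    ("EGF_A1", 52), ("EGF_A1", 53),
]


def slice_passes(slice_residues, required=REQUIRED, min_fraction=0.9):
    """Keep a surface slice only if it contains enough of the required residues."""
    required_set = set(required)
    present = required_set & set(slice_residues)
    return len(present) >= min_fraction * len(required_set)


print(slice_passes(REQUIRED))       # all eight residues present
print(slice_passes(REQUIRED[:7]))   # one of the eight missing
```

If the required set is taken to be exactly the eight residues named in item iii, 90% of eight is 7.2, so under this reading a slice would need to contain all eight.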
These three constraints proved inadequate, and several
hundred binding orientations still remained. One final piece of
biochemical data still to be used was the implied symmetry of the
HMWEGF complex (Section 4.2.1). One way for this symmetry to be
satisfied would be for the two C-terminal regions of the EGF
molecules, between the section bound to EGFBP and the bulk of the
EGF molecule, to form a section of anti-parallel β-sheet. Similar
structures are seen in prealbumin (Blake, et al., 1978) and concanavalin A (Hardman and Ainsworth, 1972). This type of
symmetry was present in several of the DAPMatch orientations, and
it was decided to choose only structures of this type. To allow the
formation of a β-sheet at the centre of symmetry it was necessary
for the C-terminal residues of the guest EGF molecule (A2) to be in
contact with the C-terminal residues of EGF(A1) which had been
modelled into the specificity pocket of EGFBP(B1). Two constraints
were used to allow for two different ways of associating the individual β-strands to form a sheet. These were that either residue
51(A1) was within 10Å of residue 47(A2) or that residue 53(A2) was within 15Å of residue 47(A2). The two β-sheets allowed by these
constraints place a centre of symmetry at either residue 49 or
residue 50 of the EGF C-terminus.
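The pair of alternative distance constraints can be written as an "either/or" test on each candidate orientation. The sketch below is illustrative only: it assumes each residue is represented by a single point (for example its Cα atom, which the text does not specify), and the dictionary layout and function names are hypothetical. The residue pairs and distance limits are copied as printed above.

```python
import math

# (residue, residue, limit in Angstroms), as printed in the text.
SHEET_ALTERNATIVES = [
    ((51, "A1"), (47, "A2"), 10.0),
    ((53, "A2"), (47, "A2"), 15.0),
]


def within(coords, res_a, res_b, limit):
    """True if the two residue positions are no further apart than `limit`."""
    return math.dist(coords[res_a], coords[res_b]) <= limit


def sheet_constraint(coords, alternatives=SHEET_ALTERNATIVES):
    """Satisfied if at least one of the alternative distance constraints holds."""
    return any(within(coords, a, b, limit) for a, b, limit in alternatives)


# Toy coordinates only; one representative point per residue.
orientation = {
    (51, "A1"): (0.0, 0.0, 0.0),
    (47, "A2"): (8.0, 0.0, 0.0),
    (53, "A2"): (30.0, 0.0, 0.0),
}
print(sheet_constraint(orientation))  # first alternative holds: 8.0 A <= 10.0 A
```

Treating the two constraints as alternatives rather than joint requirements is what allows both strand pairings, and hence both positions of the centre of symmetry, to survive the filter.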
4.3 Method
4.3.1 The First Application of DAPMatch
The initial data for the DAPMatch program was the EGF
structure, as supplied by Dr G. Montelione and the model of EGF
binding protein, as supplied by Dr Bax. The algorithm presented in
Chapter 2, and applied to antibody/antigen complexes in Chapter 3,
was followed. The same parameters were used throughout. Surface
slices were taken from both structures and the steric matching
procedure followed. After clustering 1858 orientations were [...]
filtering was then used. The increased selectivity filter presented in
Section 3.8 could not be used in this case since there was no evidence
for strong electrostatic linkages in the EGF/EGFBP complex. The
constraints detailed in Section 4.2.3 were then applied.
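The successive narrowing of the candidate list can be pictured as a chain of filters applied in order. The following is a generic sketch under that assumption, not the DAPMatch code: the function name, the representation of an orientation and the example predicates are all hypothetical.

```python
def apply_filters(orientations, filters):
    """Apply named predicates in order, recording how many candidates survive each.

    orientations: any iterable of candidate docking orientations.
    filters: list of (name, predicate) pairs; a predicate returns True to keep.
    """
    survivors = list(orientations)
    history = [("start", len(survivors))]
    for name, keep in filters:
        survivors = [o for o in survivors if keep(o)]
        history.append((name, len(survivors)))
    return survivors, history


# Toy stand-ins: integers play the role of orientations.
survivors, history = apply_filters(
    range(10),
    [("even", lambda o: o % 2 == 0), ("above four", lambda o: o > 4)],
)
print(survivors)
print(history)
```

Recording the count after each stage makes it easy to see which constraint does most of the pruning and which leaves the candidate list largely unchanged.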
The remaining 84 complexes were examined by eye on a
graphics workstation. It was clear to both Dr Bax and myself that
none of the structures proposed by the DAPMatch program were
satisfactory. The complexes produced fell into three broad categories.
i. Straddling (Figure 4.3a,b). This class of structures brought the EGF in contact with the cleft walls of EGFBP, but not with the
cleft itself. Two patches of surface area were buried, but the
cleft was left unoccupied and a hole formed.
ii. End on approach (Figure 4.3c). These structures bury an area around one end of the major axis of the roughly ellipsoidal EGF
molecule. This surface of EGF is bound to the cleft of EGFBP, but
the total surface area buried is small and the binding cleft is
not fully occupied.
iii. Overhanging (Figure 4.3d). These structures brought a variety of areas of the EGF molecule into contact with EGFBP. However,
none of the complexes fully occupied the binding cleft; instead
one or other of the binding cleft walls formed the centre of the
interaction region. This meant that the EGF molecule wrapped
around EGFBP, contacting areas that were not implicated
in binding.
None of the structures suggested by the DAPMatch program
fully occupied the EGFBP binding cleft. The structures of EGF and
EGFBP were closely examined for features that might cause this and
it was noted that some of the surface side-chains on the EGFBP model
pointed directly out into solvent. The surface side-chains are
generally mobile and hence the modelling process cannot normally
predict their orientations. Due to this lack of information, the
orientation of many of the solvent exposed side-chains had been
chosen in an arbitrary manner. In particular, a cluster of residues
was noted on one wall of the EGFBP binding cleft whose side-chains
were highly solvent exposed (Figure 4.4). Surface slices of the
binding cleft, as taken by the DAPMatch program, included these
residues and consequently the size of the binding cleft wall was
artificially enhanced. For this reason the DAPMatch program was
unable to bring the EGF molecule into satisfactory contact with the
EGFBP cleft without causing energetically prohibitive clashes with the
cleft wall. This observation explained why DAPMatch produced the
classes of complex described above: types i and iii avoided the centre of the cleft completely and type ii brought a narrow strip of the EGF molecule into contact with the cleft without making any contact with
the cleft wall.
Figure 4.3a,b Two stereo views of a type i docking, produced by the first application of DAPMatch to the EGF/EGFBP system. View (a) shows the Cα trace of EGF (red) straddling the binding cleft of EGFBP (green). View (b) shows a van der Waals sphere representation of the same complex, highlighting the lack of contact between EGF and EGFBP in the central region.
Figure 4.3c,d Stereo views of a type ii docking (c) and a type iii docking (d), produced by the first application of DAPMatch. The type ii docking results in a small surface area of contact between the molecules and the type iii docking overhangs the binding cleft.
4.3.2 Pruning
The DAPMatch program was designed to allow for areas of
mismatch during the docking process. The nature of the soft potential
results in a greater tolerance for cavities between surface slices than
for steric clashes (Section 2.9). This feature was used to overcome the