
A Practical Application of DAPMatch to the modelling of the High Molecular Weight Epidermal Growth Factor complex

4.1 Synopsis

This chapter discusses the use of the DAPMatch algorithm to predict a complex of epidermal growth factor (EGF) and epidermal growth factor binding protein (EGFBP). The full complex is formed from two EGF molecules and two EGFBP molecules and is known as high molecular weight EGF (HMWEGF). The structure of this complex has not yet been solved, by either crystallography or N.M.R. This work was undertaken in collaboration with Dr B. Bax (Birkbeck College, University of London), who modelled the EGF binding protein using the structures of tonin and kallikrein. The DAPMatch program was used to suggest possible modes of binding. Careful examination of the data revealed a problem with the protein structures used; details of this problem are given and its solution is described. A variety of biochemical data was used to narrow the search and a single binding orientation was chosen. Since the observed structure of the complex is not known, the result presented in this chapter is the proposed model of the complex.

4.2 Introduction

4.2.1 Epidermal Growth Factor

Epidermal growth factor (EGF) was one of the first growth factors to be discovered and has since been widely studied. It stimulates the division of epidermal, epithelial and connective tissue cells. EGF transmits a signal to the interior of a cell by binding to EGF receptors in the cell membrane and inducing structural changes in the receptor. EGF is found in large quantities in the mouse submandibular gland, where it exists as a high molecular weight complex (HMWEGF). This complex appears to consist of two EGF molecules bound to two molecules of a much larger protein, known as EGF binding protein (EGFBP). EGFBP has been sequenced (Blaber, et al., 1987) and thereby identified as a member of the serine proteinase family, with close homology to the protein kallikrein. Three glandular kallikreins were suspected of binding EGF, and were called EGFBP types A, B and C. In fact, only EGFBP type C has been shown to bind EGF (Isackson, et al., 1987); in this chapter type C will be referred to simply as EGFBP.

An N.M.R. structure of EGF has been determined (Montelione, et al., 1992). The structure of EGFBP has been modelled by Dr B. Bax and Ms G. Ferguson using the program COMPOSER (Sutcliffe, et al., 1987a; Sutcliffe and Hayes, 1987; Sutcliffe, et al., 1987b) and the graphics package SYBYL (Tripos Associates Inc.). The known structures of porcine pancreatic kallikrein A (PPK) (Bode, et al., 1983) and rat submaxillary gland tonin (Fujinaga and James, 1987) were used as the basis for the modelling procedure. These structures are almost identical, except for seven structurally variable regions (see Section 1.4). For each of these loops a comparison of the EGFBP sequence and the tonin and kallikrein sequences was made, and the most homologous loop was used in the EGFBP structure. In each case a loop of the correct length, size and properties was found in either tonin or kallikrein and it was not necessary to make a more general search of known protein structures. The model was then energy minimised to remove any short range clashes between atoms.

4.2.2 Application of the DAPMatch algorithm

The DAPMatch program was used to examine the docking of EGF to EGFBP. The N.M.R. structure of EGF and the model of EGFBP were the essential starting point for this investigation. As well as these structures, constraints were necessary before the DAPMatch program could be used with confidence. Several pieces of biochemical data were known:

i. The N-terminal domain of EGFBP interacts with EGF (Blaber et al., unpublished results).

ii. The C-terminal residues of EGF are retained in the active site of EGFBP. This is suggested by the fact that the removal of the C-terminal arginine residue of EGF prevents formation of the HMWEGF complex (Server, et al., 1976).

iii. HMWEGF is known to be formed from two EGF/EGFBP complexes (Taylor, et al., 1974). A two-fold axis of symmetry is therefore almost certain to be present in HMWEGF.

iv. Neither EGFBP nor EGF normally exists as a dimer. This implies that the full complex can only be formed after independent EGF/EGFBP complexes have been formed.

These conditions suggest a complex of the form illustrated in Figure 4.1. Here the first step of forming the complex is the interaction of the C-terminus of EGF(A1) with EGFBP(B1). This binding is weak, but can be stabilised by the presence of another EGF/EGFBP complex (A2/B2). The final, stable HMWEGF complex involves the binding of EGF(A1) with EGFBP(B2) and, by symmetry, EGF(A2) with EGFBP(B1). Hence each EGF molecule has large contact areas with both EGFBP molecules. However, the amount of contact between the two EGF molecules (A1/A2) and between the two EGFBP molecules (B1/B2) would be much smaller, explaining the absence of homodimers of both EGF and EGFBP. The A1/B1 and A2/B2 intermediate complexes are identical and this will form a two-fold axis of symmetry in the final HMWEGF complex.

Figure 4.1

A schematic binding model for the HMWEGF complex. The first step (a) is a relatively unstable association between an EGF molecule (A1) and an EGFBP molecule (B1), involving the binding of the C-terminal arm of EGF(A1) to the specificity pocket of EGFBP(B1). Two such dimers come together (b) to form the final, symmetrical, HMWEGF complex.
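The two-fold axis described above can be illustrated with a short sketch. It assumes, purely for illustration, that the axis is specified by a point and a direction; the function name `rotate_180` and the toy coordinates are hypothetical and are not part of DAPMatch. A half-turn about an axis sends each point to its mirror image through the axis line, so applying it to an atom of the A1/B1 pair would give the equivalent atom of the A2/B2 pair.

```python
import math

def rotate_180(point, axis_point, axis_dir):
    """Rotate `point` by 180 degrees about the line through `axis_point`
    with direction `axis_dir` (a two-fold symmetry operation)."""
    norm = math.sqrt(sum(d * d for d in axis_dir))
    u = [d / norm for d in axis_dir]                          # unit axis direction
    rel = [p - c for p, c in zip(point, axis_point)]          # point relative to the axis
    along = sum(r * ui for r, ui in zip(rel, u))              # component along the axis
    foot = [c + along * ui for c, ui in zip(axis_point, u)]   # closest point on the axis
    # A half-turn maps the point to its mirror image through the axis line.
    return tuple(2 * f - p for f, p in zip(foot, point))

# Toy example (hypothetical coordinates): a two-fold axis along z through the origin.
atom_a1 = (1.0, 2.0, 3.0)
atom_a2 = rotate_180(atom_a1, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Applying the operation twice returns the starting coordinates, which is the defining property of a two-fold axis.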

The DAPMatch program was used to investigate the binding of the guest molecule EGF(A2) with the host molecule EGFBP(B1). This binding was complicated by the presence of the C-terminus of EGF(A1). The C-terminal residues of EGF(A1) were ill-defined in the N.M.R. structure. However, since it was known that these residues bound to the active site of EGFBP, it was possible to model the three C-terminal residues into the specificity pocket of EGFBP, using the residues occupying the P1, P2 and P3 sites (Section 1.9) in the structure of the complex between PPK and BPTI. An additional two residues from the C-terminal region were also modelled leaving the EGFBP binding cleft (Figure 4.2). These residues were chosen to be in an extended conformation. This augmented EGFBP model could then be used as a host molecule in the DAPMatch algorithm.

4.2.3 Constraints

The known biochemical data of the EGF/EGFBP system had to be translated into a set of constraints. Previously two types of constraint had been used (Section 2.12). The first was the binding site constraint, which could be applied to both the host and guest molecules and stipulated that certain residues be present in the binding region. The second was the loose distance constraint, which stipulated that a particular residue of the host molecule be within interacting distance of a certain residue of the guest molecule. A new constraint was now used, which was similar to the binding site constraint but more specific. This constraint required that a particular residue of one of the molecules was not only present in the binding region, but also made an interaction with an undefined residue from the other molecule. This constraint could be applied to residues that were known to form part of the binding region, but whose precise interaction was not known, so that a distance constraint could not be used.

Figure 4.2

The C-terminal arm of EGF (green trace, all atoms shown) as modelled into the specificity sites of EGFBP (red trace, Cα only).
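The difference between the three constraint types can be sketched as three predicates. This is a minimal illustration under stated assumptions, not the DAPMatch implementation: each residue is reduced to a single representative point, and the function names, residue labels, coordinates and cutoff are all hypothetical.

```python
import math

def binding_site_ok(required_residues, binding_region):
    """Binding site constraint: every required residue lies in the binding region."""
    return set(required_residues) <= set(binding_region)

def loose_distance_ok(host, guest, host_res, guest_res, cutoff):
    """Loose distance constraint: a named host residue lies within `cutoff`
    of a named guest residue."""
    return math.dist(host[host_res], guest[guest_res]) <= cutoff

def interaction_ok(molecule, other, residue, binding_region, cutoff):
    """Interaction constraint: the residue is in the binding region and is
    within `cutoff` of at least one (unspecified) residue of the other molecule."""
    if residue not in binding_region:
        return False
    return any(math.dist(molecule[residue], p) <= cutoff for p in other.values())

# Toy data (hypothetical labels, coordinates in ångströms, and cutoff).
host = {"H1": (0.0, 0.0, 0.0), "H2": (10.0, 0.0, 0.0)}
guest = {"G1": (3.0, 0.0, 0.0), "G2": (30.0, 0.0, 0.0)}
region = {"H1", "H2"}
```

In this toy case both host residues satisfy the binding site constraint, but only `H1` satisfies the interaction constraint at a 4 Å cutoff, because `H2` is not close to any guest residue. The interaction constraint needs no named partner, which is what distinguishes it from the loose distance constraint.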

This interaction constraint was applied so that:

i. At least one of the three C-terminal residues of EGF(A1) interacted with the guest molecule, EGF(A2).

ii. At least one of the EGFBP residues 39, 40 or 41 interacted with the guest molecule. These residues form a surface loop which is strongly implicated in complex formation (Blaber et al., unpublished results).

Binding region and loose distance constraints were also applied.

iii. The binding region of the EGFBP was known. The area around the catalytic triad (Ser 57, His 107, Asp 195) was involved, as was the C-terminal section of EGF(A1) (Trp 49, Trp 50, Glu 51, Leu 52, Arg 53). Hence surface slices taken from the EGFBP molecule were only considered if they contained at least 90% of these residues.
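The 90% criterion on surface slices can be sketched as a simple fraction test. This assumes the percentage is counted per residue, which the text does not state explicitly; the function name `slice_passes` and the residue label format are hypothetical.

```python
def slice_passes(slice_residues, required_residues, min_fraction=0.9):
    """Return True if the slice contains at least `min_fraction`
    of the required residues (counted per residue, by assumption)."""
    required = set(required_residues)
    present = required & set(slice_residues)
    return len(present) >= min_fraction * len(required)

# The eight residues named in the text; the label format is an assumption.
REQUIRED = [
    "Ser57", "His107", "Asp195",                      # catalytic triad as listed
    "Trp49(A1)", "Trp50(A1)", "Glu51(A1)",            # modelled C-terminal
    "Leu52(A1)", "Arg53(A1)",                         # section of EGF(A1)
]
```

Under this per-residue reading, a slice holding seven of the eight residues reaches only 87.5% and would be rejected, so the threshold is met only when all eight are present.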

These three constraints proved inadequate, and several hundred binding orientations still remained. One final piece of biochemical data still to be used was the implied symmetry of the HMWEGF complex (Section 4.2.1). One way for this symmetry to be satisfied would be for the two C-terminal regions of the EGF molecules, between the section bound to EGFBP and the bulk of the EGF molecule, to form a section of anti-parallel β-sheet. Similar structures are seen in prealbumin (Blake, et al., 1978) and concanavalin A (Hardman and Ainsworth, 1972). This type of symmetry was present in several of the DAPMatch orientations, and it was decided to choose only structures of this type. To allow the formation of a β-sheet at the centre of symmetry it was necessary for the C-terminal residues of the guest EGF molecule (A2) to be in contact with the C-terminal residues of EGF(A1) which had been modelled into the specificity pocket of EGFBP(B1). Two constraints were used to allow for two different ways of associating the individual β-strands to form a sheet. These were that either residue 51(A1) was within 10 Å of residue 47(A2) or that residue 53(A2) was within 15 Å of residue 47(A2). The two β-sheets allowed by these constraints place a centre of symmetry at either residue 49 or residue 50 of the EGF C-terminus.
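The either/or form of these two distance constraints can be sketched as a filter over candidate orientations. The residue pairs and cutoffs are taken as quoted in the text; everything else (one point per residue, the function name `sheet_constraint_ok`, and the toy coordinates) is an assumption made for illustration.

```python
import math

def sheet_constraint_ok(coords, pairs):
    """True if at least one (residue_a, residue_b, cutoff) pair is satisfied."""
    return any(math.dist(coords[a], coords[b]) <= cutoff for a, b, cutoff in pairs)

# Residue pairs and cutoffs (in ångströms) as quoted in the text.
# Keys are (residue number, molecule label).
PAIRS = [
    ((51, "A1"), (47, "A2"), 10.0),
    ((53, "A2"), (47, "A2"), 15.0),
]

# Two toy orientations with hypothetical coordinates.
orientation_close = {
    (51, "A1"): (0.0, 0.0, 0.0),
    (47, "A2"): (8.0, 0.0, 0.0),
    (53, "A2"): (40.0, 0.0, 0.0),
}
orientation_far = {
    (51, "A1"): (0.0, 0.0, 0.0),
    (47, "A2"): (50.0, 0.0, 0.0),
    (53, "A2"): (0.0, 20.0, 0.0),
}

# Keep only orientations that satisfy at least one of the two constraints.
kept = [o for o in (orientation_close, orientation_far)
        if sheet_constraint_ok(o, PAIRS)]
```

Using `any` rather than `all` reflects the fact that each constraint corresponds to a different strand pairing, so an orientation only needs to satisfy one of them.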

4.3 Method

4.3.1 The First Application of DAPMatch

The initial data for the DAPMatch program was the EGF structure, as supplied by Dr G. Montelione, and the model of EGF binding protein, as supplied by Dr Bax. The algorithm presented in Chapter 2, and applied to antibody/antigen complexes in Chapter 3, was followed. The same parameters were used throughout. Surface slices were taken from both structures and the steric matching procedure followed. After clustering, 1858 orientations remained and filtering was then used. The increased selectivity filter presented in Section 3.8 could not be used in this case since there was no evidence for strong electrostatic linkages in the EGF/EGFBP complex. The constraints detailed in Section 4.2.3 were then applied.

The remaining 84 complexes were examined by eye on a graphics workstation. It was clear to both Dr Bax and myself that none of the structures proposed by the DAPMatch program were satisfactory. The complexes produced fell into three broad categories.

i. Straddling (Figure 4.3a,b). This class of structures brought the EGF in contact with the cleft walls of EGFBP, but not with the cleft itself. Two patches of surface area were buried, but the cleft was left unoccupied and a hole formed.

ii. End-on approach (Figure 4.3c). These structures bury an area around one end of the major axis of the roughly ellipsoidal EGF molecule. This surface of EGF is bound to the cleft of EGFBP, but the total surface area buried is small and the binding cleft is not fully occupied.

iii. Overhanging (Figure 4.3d). These structures brought a variety of areas of the EGF molecule into contact with EGFBP. However, none of the complexes fully occupied the binding cleft; instead one or other of the binding cleft walls formed the centre of the interaction region. This meant that the EGF molecule wrapped around EGFBP, making contact with areas that were not implicated in binding.

None of the structures suggested by the DAPMatch program fully occupied the EGFBP binding cleft. The structures of EGF and EGFBP were closely examined for features that might cause this and it was noted that some of the surface side-chains on the EGFBP model pointed directly out into solvent. The surface side-chains are generally mobile and hence the modelling process cannot normally predict their orientations. Due to this lack of information, the orientation of many of the solvent exposed side-chains had been chosen in an arbitrary manner. In particular, a cluster of residues was noted on one wall of the EGFBP binding cleft whose side-chains were highly solvent exposed (Figure 4.4). Surface slices of the binding cleft, as taken by the DAPMatch program, included these residues and consequently the size of the binding cleft wall was artificially enhanced. For this reason the DAPMatch program was unable to bring the EGF molecule into satisfactory contact with the EGFBP cleft without causing energetically prohibitive clashes with the cleft wall. This observation explained why DAPMatch produced the classes of complex described above: types i and iii avoided the centre of the cleft completely and type ii brought a narrow strip of the EGF molecule into contact with the cleft without making any contact with the cleft wall.

Figure 4.3a,b

Two stereo views of a type i docking, produced by the first application of DAPMatch to the EGF/EGFBP system. View (a) shows the Cα trace of EGF (red) straddling the binding cleft of EGFBP (green). View (b) shows a van der Waals sphere representation of the same complex, highlighting the lack of contact between EGF and EGFBP in the central region.

Figure 4.3c,d

Stereo views of a type ii docking (c) and a type iii docking (d), produced by the first application of DAPMatch. The type ii docking results in a small surface area of contact between the molecules and the type iii docking overhangs the binding cleft.

4.3.2 Pruning

The DAPMatch program was designed to allow for areas of mismatch during the docking process. The nature of the soft potential results in a greater tolerance for cavities between surface slices than for steric clashes (Section 2.9). This feature was used to overcome the