Structural Analysis, Comparison and Alignment
2.2.3 Structural Comparison
2.2.3.1 Structure-Structure Comparison
The COCOPLOT toolkit generates a measure of structural similarity based on the number of equivalent, i.e. overlapping, contacts between two protein structures. Inter-residue contacts buried in the protein core not only provide a description of tertiary protein structure but also contribute to the stability of the overall fold. As a result, these contacts are often highly conserved during evolution.
During evolution a structure can be affected not only by point mutations, where one residue changes identity to another, but also by indels of sometimes large fragments of amino acid sequence (see section 1.2.4.1). Therefore, any method that examines these distant structural relationships must take these indels into account by allowing gaps to be introduced into the alignment. For structure-structure comparisons the SSAP algorithm (Orengo & Taylor, 1996) was used to generate a structural alignment between the two proteins. To generate the pairwise contact overlap score, the contact maps of each structure were first generated, and the SSAP alignment was then used to identify equivalent positions in the pairwise alignment where both structures have an inter-residue contact. The contact overlap score, S_{structure-structure}, is given by the overlapping contacts as a percentage of the larger number of contacts between the two structures (see equation 2.3).
S_{structure-structure} = \frac{C_{overlap}}{C_{max}} \times 100 \qquad (2.3)
Where
C_{overlap} = Number of overlapping contacts between structures (I) and (J)
C_{max} = Max(Contacts_I, Contacts_J)
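The calculation in equation 2.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the COCOPLOT implementation: the function name `contact_overlap_score`, the representation of contacts as sets of residue-index pairs, and the representation of the alignment as a dictionary from residues of structure (I) to residues of structure (J) are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # residue index pair (i, j) with i < j


def contact_overlap_score(
    contacts_i: Set[Contact],
    contacts_j: Set[Contact],
    alignment: Dict[int, int],
) -> float:
    """Percentage contact overlap in the spirit of equation 2.3.

    `alignment` maps residue indices of structure I to the equivalent
    residue indices of structure J; residues aligned to a gap are absent.
    """
    c_max = max(len(contacts_i), len(contacts_j))
    if c_max == 0:
        return 0.0  # assumption: two empty contact maps score zero

    c_overlap = 0
    for a, b in contacts_i:
        # Both residues of the contact need an equivalent position in J.
        if a in alignment and b in alignment:
            pair = tuple(sorted((alignment[a], alignment[b])))
            if pair in contacts_j:
                c_overlap += 1
    return 100.0 * c_overlap / c_max


# Toy data (hypothetical): structure I has 4 contacts, structure J has 3.
contacts_i = {(1, 10), (2, 12), (3, 15), (5, 20)}
contacts_j = {(1, 9), (2, 11), (4, 19)}
# Residue 3 of I is aligned to a gap, so it is missing from the mapping.
alignment = {1: 1, 2: 2, 5: 4, 10: 9, 12: 11, 15: 14, 20: 18}

score = contact_overlap_score(contacts_i, contacts_j, alignment)
print(f"{score:.1f}")  # 2 overlapping contacts out of C_max = 4 -> 50.0
```

Normalising by the larger contact count means that a small structure whose contacts are all matched in a much larger one does not receive a score of 100.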
Figure 2.6 shows the comparison contact map between two actin-binding proteins (PDB codes 2vik and 1svq) based on an SSAP structural alignment. These two proteins are structurally similar (SSAP score of 83), yet evolutionarily distant (17% sequence identity). In the comparison contact map, the contacts in the first structure (2vik) are shown as grey dots, the contacts in the second structure (1svq) are shown as black dots and the overlapping contacts are shown as red dots. The minimum sequence distance of 8 residues can be seen as the yellow band on the main diagonal. This is imposed to avoid including frequently occurring contact patterns between residues close in sequence, since these patterns (typical of secondary structures) are common to both related and unrelated structures.
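The effect of the minimum sequence distance can be illustrated with a small contact map routine. This is a sketch under stated assumptions rather than the method used by COCOPLOT: the use of one coordinate per residue (for example a C-alpha atom), the 8 Å distance cutoff, and the reading of "minimum sequence distance" as excluding pairs fewer than 8 positions apart are choices made here for illustration, and the function name `contact_map` is hypothetical.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_map(
    coords: List[Coord],
    cutoff: float = 8.0,    # assumed distance threshold in angstroms
    min_seq_dist: int = 8,  # minimum sequence separation between residues
) -> Set[Tuple[int, int]]:
    """Return residue pairs (i, j), i < j, that are within `cutoff`
    and at least `min_seq_dist` positions apart in sequence."""
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Starting at i + min_seq_dist skips the band along the diagonal.
        for j in range(i + min_seq_dist, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


# Toy chain (hypothetical): 12 residues on a straight line, 1 angstrom apart.
coords = [(float(k), 0.0, 0.0) for k in range(12)]
contacts = contact_map(coords)
print(sorted(contacts))  # only pairs exactly 8 positions apart qualify here
```

Pairs closer than the sequence threshold are never examined, which corresponds to the empty band along the main diagonal of the comparison contact map.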
Chapter 2. Inter-Residue Contacts for Structural Comparison
[Figure 2.6 panel details. Protein 1: 2vik00, CATH 3.40.20.10, length 126, 297 contacts. Protein 2: 1svq00, CATH 3.40.20.10, length 94, 206 contacts.]
Figure 2.6: Pairwise structure-structure comparison by overlapping contact maps. The alignment between the structures is shown by a secondary structure schematic (alpha-helices in magenta, beta-strands in yellow) with the contacts of the first structure shown as black dots, the contacts of the second structure as grey dots and overlapping contacts as red dots. The values seen in the bottom-left box relate to CONALIGN parameters (see section 2.3).
2.2.3.2 Structure-Template Comparison
To compare the similarity in contact maps between a query structure and a 3D template, a structural alignment must again be performed to identify the set of equivalent positions. This structural alignment was achieved using the CORALIGN program from the CORA suite (Orengo, 1999). The CORALIGN program is used to align a single protein structure to the consensus structural template generated from a CORA multiple structure alignment (Orengo, 1999). The COCOPLOT program is then used to generate a COM for the template and a contact map for the single structure. The contact overlap score, S_{structure-template}, is then calculated from the number of contacts that occur between equivalent positions in the alignment, expressed as a percentage of the larger number of contacts (see equation 2.4).
S_{structure-template} = \frac{C_{overlap}}{C_{max}} \times 100 \qquad (2.4)
Where
C_{overlap} = Number of contacts in structure (I) that overlap consensus contacts in structural template (J)
C_{max} = Max(Contacts_I, Consensus Contacts_J)
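As a worked illustration of equation 2.4, consider purely hypothetical counts (not taken from any dataset in this work): a query structure with 150 contacts, a template with 120 consensus contacts, and 90 contacts found at equivalent positions in both.

```latex
C_{max} = \max(150,\ 120) = 150
\qquad
S_{structure-template} = \frac{C_{overlap}}{C_{max}} \times 100
                       = \frac{90}{150} \times 100 = 60
```

Because the denominator is the larger of the two counts, the score would be 60 rather than 75 even though three quarters of the template's consensus contacts are matched.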