2.4 Conclusion
In this chapter, we have demonstrated that a large amount of lectin binding information is readily available from databases such as that of the Consortium for Functional Glycomics (CFG). These data can be used to highlight the structural elements involved in the binding of a lectin to a known glycan, which can in turn inform inhibitor design for anti-adhesion therapy.
Combining powerful statistical tools, such as random forest and linear discriminant analysis, with the information within these databases highlights the potential for using large data sets to assist in the classification of unknown lectin samples based on their binding profiles. This could be used to identify toxic environmental lectins, such as cholera toxin and ricin, from binding alone. Such identification is currently challenging, as samples may be contaminated with environmental lectins that show similar binding to either ricin or cholera toxin.
However, the use of these databases to inform real-world experiments is limited by the poor accessibility of the data, non-uniform formatting of data sets, variations in experimental protocol between data sets, and the effect of the spacers on lectin binding. As a result, the data in the CFG can only be compared with lectin binding measured on exactly the same microarray platform. This is unfavourable, as these microarrays are expensive.
In conclusion, whilst there is a wealth of information available in terms of carbohydrate binding profiles, its use has many limitations, not least of which is the expense of the microarray platforms. It would be favourable to combine many of the analysis techniques examined within this chapter (such as linear discriminant analysis) with a much cheaper, more customisable surface modification technique. Coupled with standardised techniques for assessing binding (be that fluorescence or other ‘label-free’ methodologies), this could be used to generate a database of profiles that can be accessed and reproduced around the world, with the aim of using the database for sample classification based on binding profiles.
2.5 Methods
Consortium for Functional Glycomics data extraction: The data files listed in Table 2.2 were downloaded from the Consortium for Functional Glycomics (CFG). All data plotted are the mean of the six repeats, and error bars represent the standard error. All graphs were plotted in OriginPro, and binding isotherms were fitted using its nonlinear curve fitting function.
Abbreviation in text    Sample name in CFG database
RCA120                  RCA120 (10 µg.mL-1)
CTxB                    CTxB (0.001 µg.mL-1, 0.01 µg.mL-1, 0.1 µg.mL-1, 1 µg.mL-1, 10 µg.mL-1)
Ag                      Ag I/II Vhelical-GFP: Ag I/II Vhelical-GFP
ArtB                    ArtB toxin: ArtB bacterial toxin- 200 µg.mL-1
Blon                    Blon 2468 mut
BOA                     Burkholdeira oklahomensis agglutinin (BOA): BOA lectin
EalB                    EalB-h6:EalB-h6 ADP-ribosylating toxin
EplB                    EplB-h6: EplB-h6 ADP ribosylating enzyme
EtpA                    recombinant EtpA glycoptorein: rEtpA
EtxB                    EtxB:etxAB-100 µg.mL-1
FimH                    FimH:FimH type 1 fimbriae 10 µg.mL-1
Fm1D                    FimD:FimD (200 µg.mL-1)
FliD                    FliD:FliD
Msmeg                   MSMEG_3662:MSMEG3662-2B
OmpA                    OmpA+E. coli:OmpA+E. coli
PFL                     PFL:PFL
Rv1419                  Iron regulated heparin binding hemmaglutinin from Mycobacterium tuberculosis
SAOUHSC                 SAOUHSC_00176
SSLO                    SSLO:SSLO-488 (200 µg.mL-1)
StcE                    StcE:StcE
Typhoid                 Typhoid toxin: WT_Typhoid toxin- 20 µg.mL-1
Table 2.2 Lectin abbreviations used in the text and the corresponding sample names in the CFG database. Where a sample is listed at more than one concentration, the concentrations stated are those used for the method.
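The averaging applied to the CFG repeats can be illustrated with a minimal Python sketch. It assumes that the standard error is the sample standard deviation divided by the square root of the number of repeats; the function name and the fluorescence values are hypothetical and are not taken from the CFG files.

```python
import math
import statistics


def mean_and_standard_error(repeats):
    """Return (mean, standard error of the mean) for one glycan's repeats."""
    n = len(repeats)
    if n < 2:
        raise ValueError("at least two repeats are needed for a standard error")
    mean = statistics.fmean(repeats)
    # Assumption: sample standard deviation (n - 1 denominator) over sqrt(n).
    sem = statistics.stdev(repeats) / math.sqrt(n)
    return mean, sem


# Hypothetical relative fluorescence values for six repeats of one glycan.
example_repeats = [1020.0, 980.0, 1005.0, 995.0, 1010.0, 990.0]
mean_rfu, sem_rfu = mean_and_standard_error(example_repeats)
print(f"mean = {mean_rfu:.1f}, standard error = {sem_rfu:.2f}")
```

The mean gives the plotted point and the standard error gives the half-length of its error bar.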
Linear discriminant analysis: Six repeats of the relative fluorescence for each of the bacterial lectins binding to each of the chosen monosaccharide surfaces were extracted from the CFG. These data formed a training matrix, which was subjected to classical linear discriminant analysis using the ‘dapc’ function in the ‘adegenet’ package (version 1.4-2)29 in the open-source statistical package R (version 3.1.3).30
Cross-validation was performed using the ‘xvalDapc’ function, whereby 10 % of the data set is ‘left out’ of the model and the resulting model is then used to classify the ‘left out’ data. This is repeated multiple times, with the ‘left out’ data randomly selected each time.
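The hold-out procedure can be sketched in Python as follows. This illustrates the resampling scheme only, not ‘xvalDapc’ itself: a simple nearest-centroid classifier is swapped in for the discriminant model, and the function names, the number of rounds and the example profiles are all hypothetical.

```python
import random
from collections import defaultdict


def nearest_centroid_fit(rows, labels):
    """Compute the mean binding profile (centroid) of each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in grouped.items()
    }


def nearest_centroid_predict(centroids, row):
    """Assign a profile to the class whose centroid is closest."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=squared_distance)


def repeated_holdout_accuracy(rows, labels, holdout_fraction=0.1,
                              n_rounds=100, seed=0):
    """Leave out a random fraction, classify it, and repeat n_rounds times."""
    rng = random.Random(seed)
    n = len(rows)
    n_holdout = max(1, round(n * holdout_fraction))
    correct = 0
    total = 0
    for _ in range(n_rounds):
        held_out = set(rng.sample(range(n), n_holdout))
        train_rows = [rows[i] for i in range(n) if i not in held_out]
        train_labels = [labels[i] for i in range(n) if i not in held_out]
        centroids = nearest_centroid_fit(train_rows, train_labels)
        for i in held_out:
            correct += nearest_centroid_predict(centroids, rows[i]) == labels[i]
            total += 1
    return correct / total


# Hypothetical, well-separated profiles: three lectins, six repeats each.
base_profiles = {
    "lectinA": [1.0, 0.1, 0.2],
    "lectinB": [0.1, 1.0, 0.2],
    "lectinC": [0.2, 0.1, 1.0],
}
rows, labels = [], []
for name, base in base_profiles.items():
    for k in range(6):
        rows.append([value + 0.01 * k for value in base])
        labels.append(name)

accuracy = repeated_holdout_accuracy(rows, labels)
print(f"fraction of left-out profiles correctly classified: {accuracy:.2f}")
```

Fixing the seed makes the random selection of ‘left out’ data reproducible between runs.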
Random forest analysis: The data set of six repeats of the binding of 20 bacterial lectins to 13 monosaccharide surfaces, as extracted from the CFG database, was used to produce a random forest model using the ‘randomForest’ function (version 4.6.10)31 in the open-source statistical package R (version 3.1.3).30 The model produced was an ensemble of 500 trees. Data were cross-validated using a ‘leave-one-out’ approach, and the percentage of correct reassignment for each bacterial lectin was calculated.
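The ‘leave-one-out’ tally of correct reassignment per lectin can be sketched in Python as below. The classifier here is a one-nearest-neighbour stand-in and is not a random forest; any function with the same arguments could be passed in its place. All names and data are hypothetical.

```python
from collections import Counter


def one_nearest_neighbour(train_rows, train_labels, query):
    """Stand-in classifier (NOT a random forest): label of the closest profile."""
    closest = min(
        range(len(train_rows)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_rows[j], query)),
    )
    return train_labels[closest]


def leave_one_out_percent_correct(rows, labels, classify=one_nearest_neighbour):
    """Percentage of profiles of each lectin reassigned to the correct lectin."""
    totals = Counter(labels)
    correct = Counter()
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Train on every profile except the one being classified.
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if classify(train_rows, train_labels, row) == label:
            correct[label] += 1
    return {label: 100.0 * correct[label] / totals[label] for label in totals}


# Hypothetical profiles: two lectins, six repeats each, two surfaces.
loo_rows = [[1.0 + 0.01 * k, 0.1] for k in range(6)]
loo_rows += [[0.1, 1.0 + 0.01 * k] for k in range(6)]
loo_labels = ["lectinA"] * 6 + ["lectinB"] * 6

percent_correct = leave_one_out_percent_correct(loo_rows, loo_labels)
for lectin, percent in percent_correct.items():
    print(f"{lectin}: {percent:.0f} % correctly reassigned")
```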
Apparent Kd calculations: Kdapp calculations were performed as described by Orosz and Ovadi.19 Briefly, relative fluorescence data produced upon binding of different concentrations of cholera toxin to the glycans assessed were extracted from the Consortium for Functional Glycomics.
Fluorescence levels were then rescaled to a relative scale, with 1 indicating the maximum fluorescence and 0 the lowest. Points were then plotted in OriginPro and a binding isotherm was fitted to the data using the nonlinear curve fit tool (as shown in Figure 2.13A). The concentration of cholera toxin at a variety of relative fluorescence levels (i) was extracted from the fitted curve. All values were then transformed and plotted as shown in Figure 2.13B, and the gradient of the fitted line (m) was extracted.
$K_{d}^{\mathrm{app}} = 1/m$ (Eq. 1)
The Kdapp can then be determined using Eq. 1. For any data set where the binding isotherm fit failed to converge, the Kdapp could not be calculated and is reported as NA.
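The rescaling and transformation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the OriginPro workflow: the nonlinear isotherm fit is not reproduced, the (concentration, i) pairs are taken as already read from the fitted curve, and the line is assumed to be 1/(1-i) plotted against [CTxB]/i, so that Kdapp = 1/m as in Eq. 1. The function names and the synthetic one-site binding data are hypothetical.

```python
def min_max_scale(values):
    """Rescale raw fluorescence so the lowest value is 0 and the highest is 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot rescale constant data")
    return [(value - lowest) / (highest - lowest) for value in values]


def apparent_kd(concentrations, relative_fluorescence):
    """Estimate Kdapp as 1/m from a straight-line fit of the transformed data.

    Assumption: x = [CTxB]/i and y = 1/(1 - i); m is the least-squares gradient.
    """
    xs, ys = [], []
    for conc, i in zip(concentrations, relative_fluorescence):
        if not 0.0 < i < 1.0:
            continue  # the transformation is undefined at i = 0 and i = 1
        xs.append(conc / i)
        ys.append(1.0 / (1.0 - i))
    if len(xs) < 2:
        raise ValueError("at least two usable points are needed")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return 1.0 / gradient


# Rescaling example with hypothetical raw fluorescence values.
scaled = min_max_scale([120.0, 560.0, 1000.0])

# Synthetic points from a one-site isotherm i = c / (Kd + c) with Kd = 2.0
# (arbitrary concentration units), standing in for values read off a fit.
true_kd = 2.0
concs = [0.5, 1.0, 2.0, 4.0, 8.0]
fractions = [c / (true_kd + c) for c in concs]

kd_app = apparent_kd(concs, fractions)
print(f"Kdapp = {kd_app:.3f}")
```

For this idealised one-site isotherm the transformed points fall on a straight line through the origin with gradient 1/Kd, which is why the synthetic Kd is recovered.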
Figure 2.13 Example data set showing the transformation needed to calculate Kdapp as described by Orosz and Ovadi.19 A) Relative fluorescence (i) is plotted against cholera toxin concentration and a binding isotherm is fitted to the data. The fitted isotherm is then used to estimate the cholera toxin concentration at various relative fluorescence (i) values. B) These extracted values are further transformed and plotted, and the gradient of the fitted line (m) is extracted.
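[Figure 2.13 image not reproduced. Panel A axis labels: [CTxB] and relative fluorescence (i). Panel B axis labels: [CTxB]/i and 1/(1-i), with a fitted line of gradient m.]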
2.6 References
1. K. F. Medzihradszky, K. Kaasik and R. J. Chalkley, Mol Cell Proteomics, 2015.
2. A. Lopez-Ferrer, C. Barranco and C. de Bolós, American journal of clinical pathology, 2002, 118, 749-755.
3. J. B. Lowe and J. D. Marth, Annual review of biochemistry, 2003, 72, 643-691.
4. J. J. Lundquist and E. J. Toone, Chemical reviews, 2002, 102, 555-578.
5. T. K. Dam and C. F. Brewer, Glycobiology, 2010, 20, 270-279.
6. CFG, http://www.functionalglycomics.org.
7. R. Ranzinger, S. Herget, C. W. von der Lieth and M. Frank, Nucleic Acids Res, 2011, 39, D373-376.
8. R. Raman, M. Venkataraman, S. Ramakrishnan, W. Lang, S. Raguram and R. Sasisekharan, Glycobiology, 2006, 16, 82R-90R.
9. C. J. O'Neal, M. G. Jobling, R. K. Holmes and W. G. Hol, Science, 2005, 309, 1093-1096.
10. D. Cassel and T. Pfeuffer, Proc Natl Acad Sci U S A, 1978, 75, 2669-2673.
11. R. G. Zhang, D. L. Scott, M. L. Westbrook, S. Nance, B. D. Spangler, G. G. Shipley and E. M. Westbrook, J Mol Biol, 1995, 251, 563-573.
12. E. Rutenber, B. J. Katzin, S. Ernst, E. J. Collins, D. Mlsna, M. P. Ready and J. D. Robertus, Proteins, 1991, 10, 240-250.
13. S. V. Heyningen, Science, 1974, 183, 656-657.
14. J. M. Lord, L. M. Roberts and J. D. Robertus, FASEB J, 1994, 8, 201-208.
15. M. Mattarella, J. Garcia-Hartjes, T. Wennekes, H. Zuilhof and J. S. Siegel,
16. P. I. Kitov, J. M. Sadowska, G. Mulvey, G. D. Armstrong, H. Ling, N. S. Pannu, R. J. Read and D. R. Bundle, Nature, 2000, 403, 669-672.
17. T. R. Branson, T. E. McAllister, J. Garcia-Hartjes, M. A. Fascione, J. F. Ross, S. L. Warriner, T. Wennekes, H. Zuilhof and W. B. Turnbull, Angew Chem Int Ed Engl, 2014, 53, 8323-8327.
18. S. J. Richards, M. W. Jones, M. Hunaban, D. M. Haddleton and M. I. Gibson, Angewandte Chemie International Edition, 2012, 51, 7812-7816.
19. F. Orosz and J. Ovadi, J Immunol Methods, 2002, 270, 155-162.
20. I. Moustafa, H. Connaris, M. Taylor, V. Zaitsev, J. C. Wilson, M. J. Kiefel, M. von Itzstein and G. Taylor, J Biol Chem, 2004, 279, 40819-40826.
21. M. Jones, L. Otten, S.-J. Richards, R. Lowery, D. Phillips, D. Haddleton and M. Gibson, Chemical Science, 2014, 5, 1611-1616.
22. R. L. Phillips, O. R. Miranda, C. C. You, V. M. Rotello and U. H. Bunz, Angew Chem Int Ed Engl, 2008, 47, 2590-2594.
23. C.-C. You, O. R. Miranda, B. Gider, P. S. Ghosh, I.-B. Kim, B. Erdogan, S. A. Krovi, U. H. F. Bunz and V. M. Rotello, Nat Nano, 2007, 2, 318-323.
24. Z. Yan, J. Li, Y. Xiong, W. Xu and G. Zheng, Oncol Rep, 2012, 28, 1036-1042.
25. M. Khalilia, S. Chakraborty and M. Popescu, BMC Med Inform Decis Mak, 2011, 11, 51.
26. E. Arigi, O. Blixt, K. Buschard, H. Clausen and S. B. Levery, Glycoconj J, 2012, 29, 1-12.
27. T. Kodadek, Chem Biol, 2001, 8, 105-115.
28. Y. Fei, Y. S. Sun, Y. Li, K. Lau, H. Yu, H. A. Chokhawala, S. Huang, J. P. Landry, X. Chen and X. Zhu, Mol Biosyst, 2011, 7, 3343-3352.
29. T. Jombart, C. Collins, P. Solymos, I. Ahmed, F. Calboli and A. Cori, adegenet: an R package for the exploratory analysis of genetic and genomic data, http://adegenet.r-forge.r-project.org/.
30. R Development Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria: the R Foundation for Statistical Computing, http://www.R-project.org/.