6.5 Materials and Methods

6.5.6 Structure Validation

Structures were collected from the RCSB Protein Data Bank [3]. Structure alignments for Figure 6.3 and Figure 6.4 were made using Cn3D [18]. The Cn3D alignments were coloured by identity, such that conserved positions are red and non-conserved positions are blue. RMSD for structure alignments was calculated using PyMOL [5]. The structure alignment for Figure 6.5 was also created using PyMOL [5]. The entire structure alignment was rendered using the ‘cartoon’ renderer. Important residues and the NAD cofactor were emphasized through stick rendering on top of the original alignment. NAD is coloured red. The region of high local covariation is coloured blue.
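For readers unfamiliar with the RMSD values reported for the structure alignments, the following is a minimal sketch of the quantity itself, not of PyMOL's implementation. It assumes the two structures have already been superimposed and that their atoms have been paired one-to-one; PyMOL performs both of those steps itself. The function name `rmsd` and the example coordinates are hypothetical and chosen only for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes the two coordinate lists are already superimposed and that
    element i of coords_a corresponds to element i of coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must pair up one-to-one")
    if not coords_a:
        raise ValueError("at least one atom pair is required")

    # Sum of squared Euclidean distances over all atom pairs.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    # Mean over pairs, then square root.
    return math.sqrt(total / len(coords_a))


# Hypothetical example: two atoms, each displaced by 1 unit along z.
example_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
example_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(example_a, example_b))  # prints 1.0
```

Identical coordinate sets give an RMSD of zero, and larger values indicate greater structural divergence between the aligned positions.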

[1] W R Atchley, K R Wollenberg, W M Fitch, W Terhalle, and A W Dress. Correlations among amino acid sites in bhlh protein domains: an information theoretic analysis. Mol Biol Evol, 17(1):164–178, 2000.

[2] WR Atchley, KR Wollenberg, WM Fitch, W Terhalle, and AW Dress. Correlations among amino acid sites in bhlh protein domains: an information theoretic analysis. Molecular Biology and Evolution, 17(1):164, 2000.

[3] H.M Berman, J Westbrook, Z Feng, G Gilliland, TN Bhat, H Weissig, I.N Shindyalov, and P.E Bourne. The protein data bank. Nucleic Acids Research, 28(1):235, 2000.

[4] Michele Clamp, James Cuff, Stephen M Searle, and Geoffrey J Barton. The jalview java alignment editor. Bioinformatics, 20(3):426–7, Feb 2004.

[5] WL Delano. The pymol molecular graphics system. Jan 2002.

[6] R.J Dickson, L.M Wahl, A.D Fernandes, and G.B Gloor. Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS One, 5(6):e11082, 2010.

[7] SD Dunn, LM Wahl, and GB Gloor. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics, 24(3):333–340, 2008.

[8] SD Dunn, LM Wahl, and GB Gloor. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics, 24(3):333, 2008.

[9] R.C Edgar. Quality measures for protein alignment benchmarks. Nucleic Acids Research, 38(7):2145, 2010.

[10] M.A Fares and S.A.A Travers. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics, 173(1):9, 2006.

[11] J Felsenstein. Inferring phylogenies. Sunderland, Jan 2004.

[12] WM Fitch and E Markowitz. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics, 4(5):579–593, 1970.

[13] Don Gilbert. Sequence file format conversion with command-line readseq. Feb 2003.

[14] Gregory B Gloor, Gaurav Tyagi, Dana M Abrassart, Andrew J Kingston, Andrew D Fernandes, Stanley D Dunn, and Christopher J Brandl. Functionally compensating coevolving positions are neither homoplasic nor conserved in clades. Mol Biol Evol, 27(5):1181–91, May 2010.

[15] X Gu. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol, 16(12):1664–74, Dec 1999.

[16] X Gu. Maximum-likelihood approach for gene family evolution under functional divergence. Mol Biol Evol, 18(4):453–64, Apr 2001.

[17] Xun Gu. A simple statistical method for estimating type-ii (cluster-specific) functional divergence of protein sequences. Mol Biol Evol, 23(10):1937–45, Oct 2006.

[18] C W Hogue. Cn3d: a new generation of three-dimensional molecular structure viewer.

[19] R Ihaka and R Gentleman. R: a language for data analysis and graphics. Journal of computational and graphical statistics, pages 299–314, 1996.

[20] I Kass and A Horovitz. Mapping pathways of allosteric communication in groel by analysis of correlated mutations. Proteins, 48(4):611–617, 2002.

[21] Alexander Kawrykow, Gary Roumanis, Alfred Kam, Daniel Kwak, Clarence Leung, Chu Wu, Eleyine Zarour, Phylo players, Luis Sarmenta, Mathieu Blanchette, and Jérôme Waldispühl. Phylo: a citizen science approach for improving multiple sequence alignment. PLoS One, 7(3):e31362, 2012.

[22] Changhoon Kim and Byungkook Lee. Accuracy of structure-based sequence alignment of automatic methods. BMC bioinformatics, 8:355, 2007.

[23] B. P Kleinstiver, A. D Fernandes, G. B Gloor, and D. R Edgell. A unified genetic, computational and experimental framework identifies functionally relevant residues of the homing endonuclease i-bmoi. Nucleic Acids Research, 38(7):2411–2427, Apr 2010.

[24] Andrew Kuziemko, Barry Honig, and Donald Petrey. Using structure to explore the sequence alignment space of remote homologs. PLoS Computational Biology, 7(10):e1002175, 2011.

[25] J.A Lake. Reconstructing evolutionary trees from dna and protein sequences: paralinear distances. Proceedings of the National Academy of Sciences, 91(4):1455, 1994.

[26] D.Y Little and L Chen. Identification of coevolving residues and coevolution potentials emphasizing structure, bond formation and catalytic coordination in protein evolution. PLoS One, 4(3):e4762, 2009.

[27] Z.J Liu, Y.J Sun, J Rose, Y.J Chung, C.D Hsiao, W.R Chang, I Kuo, J Perozich, R Lindahl, and J Hempel. The first structure of an aldehyde dehydrogenase reveals novel interactions between nad and the rossmann fold. Nature Structural & Molecular Biology, 4(4):317–326, 1997.

[28] A Marchler-Bauer, AR Panchenko, BA Shoemaker, PA Thiessen, LY Geer, and SH Bryant. Cdd: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Research, 30(1):281, 2002.

[29] L Ni, S Sheikh, and H Weiner. Involvement of glutamate 399 and lysine 192 in the mechanism of human liver mitochondrial aldehyde dehydrogenase. Journal of Biological Chemistry, 272(30):18823, 1997.

[30] O Olmea, B Rost, and A Valencia. Effective use of sequence correlation and conservation in fold recognition. Journal of molecular biology, 293(5):1221–1239, 1999.

[31] S.J Perez-Miller and T.D Hurley. Coenzyme isomerization is integral to catalysis in aldehyde dehydrogenase. Biochemistry, 42(23):7100–7109, 2003.

[32] Art Poon and Lin Chao. The rate of compensatory mutation in the dna bacteriophage phix174. Genetics, 170(3):989–999, 2005.

[33] A Rodionov, A Bezginov, J Rose, and E.R.M Tillier. A new, fast algorithm for detecting protein coevolution using maximum compatible cliques. Algorithms for molecular biology, 6(1):17, 2011.

[34] Ryo Takeuchi, Abigail R Lambert, Amanda Nga-Sze Mak, Kyle Jacoby, Russell J Dickson, Gregory B Gloor, Andrew M Scharenberg, David R Edgell, and Barry L Stoddard. Tapping natural reservoirs of homing endonucleases for targeted gene modification. Proc Natl Acad Sci U S A, 108(32):13077–82, Aug 2011.

[35] R Thangudu, M Manoharan, N Srinivasan, F Cadet, R Sowdhamini, and B Offmann. Analysis on conservation of disulphide bonds and their structural features in homologous protein domain families. BMC Structural Biology, 8(1):55, 2008.

[36] J.D Thompson, P Koehl, R Ripp, and O Poch. Balibase 3.0: latest developments of the multiple sequence alignment benchmark. Proteins, 61(1):127–136, 2005.

[37] JD Thompson, F Plewniak, and O Poch. Balibase: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics, 15(1):87, 1999.

[38] A.M Waterhouse, J.B Procter, D Martin, M Clamp, and G.J Barton. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25(9):1189, 2009.

[39] Y.B Xu and E.R.M Tillier. Regional covariation and its application for predicting protein contact patches. Proteins, 78(3):548–558, 2010.

[40] C Yanofsky, V Horn, and D Thorpe. Protein structure relationships revealed by mutational analysis. Science, 146(3651):1593, 1964.

Discussion

This work contributes to our collective understanding of sequence alignment, coevolution, and their intertwined relationship. Prior to this work, the alignment-coevolution relationship was not adequately characterized in the literature, which has led to misuses of coevolutionary methods. This work is an attempt to acknowledge the fundamental implicit assumptions made about evolution when conducting sequence analysis experiments in silico. It also advances the fields of multiple sequence alignment and, both directly and indirectly, coevolutionary inference.

7.1 Improvements to multiple sequence alignment
