SEQUENCE ALIGNMENT

6.3 SIGNIFICANCE OF SEQUENCE ALIGNMENT

Sequence alignment is useful for discovering functional, structural, and evolutionary information in biological sequences. It is important to obtain the best possible or so-called “optimal” alignment to discover this information. Sequences that are very much alike, or “similar” in the parlance of sequence analysis, probably have the same function, be it a regulatory role in the case of similar DNA molecules, or a similar biochemical function and three-dimensional structure in the case of proteins. Additionally, if two sequences from different organisms are similar, there may have been a common ancestor sequence, and the sequences are then defined as being homologous. The alignment indicates the changes that could have occurred between the two homologous sequences and a common ancestor sequence during evolution.

With the advent of genome analysis and large-scale sequence comparisons, it becomes important to recognize that sequence similarity may be an indicator of several possible types of ancestor relationships, or there may be no ancestor relationship at all. For example, new gene evolution is often thought to occur by gene duplication, creating two tandem copies of the gene, followed by mutations in these copies. In rare cases, new mutations in one of the copies provide an advantageous change in function. The two copies may then evolve along separate pathways. Although the resulting separation of function will generate two related sequence families, sequences in both families will still be similar because they derive from a single ancestral gene. In addition, genetic rearrangements can reassort domains in proteins, leading to more complex proteins with an evolutionary history that is difficult to reconstruct (Henikoff et al. 1997).

Evolutionary theory provides terms that may be used to describe sequence relationships. Homologous genes that share a common ancestry and function in the absence of any evidence of gene duplication are called orthologs. When there is evidence for gene duplication, the genes in an evolutionary lineage derived from one of the copies and with the same function are also referred to as orthologs. The two copies of the duplicated gene and their progeny in the evolutionary lineage are referred to as paralogs.

In other cases, similar regions in sequences may not have a common ancestor but may have arisen independently by two evolutionary pathways converging on the same function, a process called convergent evolution. There are some remarkable examples in protein structures. For instance, although the enzymes chymotrypsin and subtilisin have totally different three-dimensional structures and folds, the active sites show similar structural features, including histidine (H), serine (S), and aspartic acid (D) in the catalytic sites of the enzymes (for discussion, see Branden and Tooze 1991). Additional examples are given. In such cases, the similarity will be highly localized. Such sequences are referred to as analogous (Fitch 1970). A closer examination of alignments can help to sort out possible evolutionary origins among similar sequences (Tatusov et al. 1997).

As pointed out by Fitch and Smith (1983), sequences can be either homologous or nonhomologous, but not in between. The genetic rearrangements referred to above can give rise to chimeric genes, in which some regions are homologous and others are not. Referring to the entire sequences as homologous in such situations leads to an inaccurate and incomplete description of the sequence lineage.
Another complication in tracing the origins of similar sequences is that individual genes may not share the same evolutionary origin as the rest of the genome in which they presently reside. Genetic events such as symbioses and virus-induced transduction can cause horizontal transfer of genetic material between unrelated organisms. In such cases, the evolutionary history of the transferred sequences and that of the organisms will be different. Again, with the capability of detecting such events in the genomes of organisms comes the responsibility to describe these changes with the correct evolutionary terminology. In this case, the sequences are xenologous (Gray and Fitch 1983). Lawrence and Ochman (1997) have shown that, in enteric bacteria, horizontal transfer of genes between species is as common as mutation, if not more common. Describing such changes requires a careful description of sequence origins.

6.4 Let us sum up:

Many bioinformatics tasks depend upon successful alignments. Alignments are conventionally shown as traces. In a symbolic sequence, each base or residue monomer is represented by a letter. When two symbolic representations of DNA or protein sequences are arranged next to one another so that their most similar elements are juxtaposed, they are said to be aligned.
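The idea of juxtaposing the most similar elements of two symbolic sequences can be illustrated with a short Python sketch. It takes two sequences that are already aligned (with "-" marking a gap), prints them one above the other with a match line between them, and reports the percentage of identical columns. The function names, the example sequences, and the choice to count gap columns as non-matching are assumptions made for illustration only; other definitions of percent identity are in use.

```python
def show_alignment(a: str, b: str) -> str:
    """Render two already-aligned sequences with a match line between them."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # '|' marks a column where both sequences carry the same letter (not a gap).
    match_line = "".join(
        "|" if x == y and x != "-" else " " for x, y in zip(a, b)
    )
    return "\n".join([a, match_line, b])


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of all aligned columns.

    Assumption: gap columns count as non-matching and stay in the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Hypothetical example: two short DNA sequences, each with one gap.
seq1 = "GATTACA-"
seq2 = "GAT-ACAT"
print(show_alignment(seq1, seq2))
print(percent_identity(seq1, seq2))
```

In this example six of the eight columns hold identical letters, so the reported identity is 75 percent.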

6.5 Lesson end activities

1. Find out the methodology for (i) Global Alignment (ii) Local Alignment.

6.6 Check your progress: Model answers

1. Your answer must include these points:

Global Alignment – Needleman-Wunsch algorithm
Local Alignment – Smith-Waterman algorithm

6.7 Points for Discussion

1. “Sequence alignment has made the task of biological scientists easy” - Comment.
2. How do you rate the local and global alignments?
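As a starting point for the second discussion question, the following Python sketch contrasts global and local alignment scores using the same dynamic-programming table. It is a minimal illustration, not a full implementation of the Needleman-Wunsch or Smith-Waterman algorithms: it returns only the optimal score, performs no traceback, and uses a simple linear gap penalty. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming (score only, no traceback).

    local=False: global scoring in the style of Needleman-Wunsch.
    local=True:  local scoring in the style of Smith-Waterman.
    The scoring values are illustrative assumptions, not recommended settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment: a negative score restarts the alignment at zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


# Hypothetical sequences sharing the core "GATTACA" with unrelated flanks.
x = "TTTGATTACAGGG"
y = "CCGATTACACC"
print("global:", alignment_score(x, y))
print("local: ", alignment_score(x, y, local=True))
```

With these sequences the local score reflects only the shared core, while the global score is lowered by the mismatched and unequal flanking regions. This is the usual reason for preferring local alignment when only part of each sequence is expected to be related.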

6.8 References

1. Altschul, S.F. (1989) Gap costs for multiple sequence alignment. J. Theor. Biol., 138, 297–309.

2. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

3. Barton, G.J. and Sternberg, M.J. (1987) A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol., 198, 327–337.

4. Boguski, M.S. and Schuler, G. (1995) ESTablishing a human transcript map.

LESSON – 7
