Multiple Sequence Alignment

Learning to Paraphrase: An Unsupervised Approach Using Multiple Sequence Alignment

Our work presents a novel knowledge-lean algorithm that uses multiple-sequence alignment (MSA) to learn to generate sentence-level paraphrases essentially from unannotated corpus data alone. In contrast to previous work using MSA for generation (Barzilay and Lee, several versions of their component sentences. This could, for example, aid machine-translation evaluation, where it has become common to evaluate systems by comparing their output against a bank of several reference translations for the same sentences (Papineni et al., 2002). See Bangalore et al. (2002) and Barzilay and Lee (2002) for other uses of such data.

Prenominal Modifier Ordering via Multiple Sequence Alignment

We believe that multiple sequence alignment is well-suited for aligning linguistic sequences, and that these alignments can be used to predict prenominal modifier ordering for any given set of modifiers. Our technique utilizes simple features within the raw text, and does not require any semantic information. We achieve good performance using this approach, with results competitive with earlier work (Shaw and Hatzivassiloglou, 1999; Malouf, 2000; Mitchell, 2009) and higher recall and F-measure than that reported in Mitchell (2009) when tested on the same corpus.

Assessing the efficiency of multiple sequence alignment programs

Background: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology and there are several programs and algorithms available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next generation deep sequencing platforms, and increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both accuracy and cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE and discuss the relevance of some implementations embedded in each program's algorithm. Accuracy of alignment was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and time of execution.
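
The two accuracy measures named above can be illustrated with a small sketch. It assumes the usual definitions: the sum-of-pairs score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the total-column score is the fraction of reference columns reproduced exactly. The function names and the toy alignments are hypothetical, and the actual BAliBASE scoring program may differ in details (for example, by restricting the comparison to core blocks).

```python
from itertools import combinations


def column_residues(alignment):
    """Describe each column as a tuple of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        entry = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in columns:
        for (i, a), (j, b) in combinations(enumerate(col), 2):
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(column_residues(reference))
    test_pairs = aligned_pairs(column_residues(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns reproduced exactly by the test alignment."""
    ref_cols = column_residues(reference)
    test_cols = set(column_residues(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


# Toy data (made up for illustration): the same three sequences aligned two ways.
reference = ["AC-GT", "ACTGT", "A--GT"]
test = ["ACG-T", "ACTGT", "A-G-T"]

print(sp_score(test, reference))  # 0.8
print(tc_score(test, reference))  # 0.6
```

Here the test alignment misplaces one residue of the first sequence, which costs two of the ten reference pairs and two of the five reference columns.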

Progressive Multiple Sequence Alignment with Indel Evolution

Nowadays, the supply of multiple sequence alignment tools is impressively wide, and the decision of which tool to use depends mostly on individual preferences, the type of data to be processed and the size of the dataset. One can identify four main purposes of multiple sequence alignment: structure prediction, database searching, sequence comparison, and phylogenetic analysis, each requiring a different underlying mathematical model [97, 98]. The multiple sequence alignment tools can be further classified by their inference method: local or global alignment, use of iterative refinement, optimization of the sum-of-pairs (SP) criterion, or probabilistic inference (posterior probability or maximum likelihood). These references provide a quick link to further information: [4, 25, 32, 66, 86, 98, 107, 111, 116, 120, 144, 155, 161]. In this thesis we will focus only on evolutionary alignments, which represent character homology between related sequences. They are, for instance, a fundamental prerequisite for phylogenetic inference [98]. Multiple sequence alignments are conventionally represented in a table where taxa are placed in successive rows and the sequence characters form the columns. In this representation a character state is obtained by intersecting a row and a column [98]. Gaps are added so that homologous characters, those sharing a common ancestor, are positioned together in the same column. It is clear at this point that the fundamental units of a homology hypothesis are the characters, which translates into a proper placement of gaps inside the table [38]. It is worth mentioning the distinction between the information represented in a phylogeny and that in a multiple sequence alignment. A phylogenetic tree reproduces the relationships among taxa defined by their sequences, whereas an alignment depicts in columns the relationships among characters [98].
When studying homologies we therefore have to focus on the evolutionary events affecting characters [98]; in this sense, homology exists only with reference to a phylogeny [6, 16].

A Survey of the State of the Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

Evolutionary modeling applications are the best way to provide full information to support in-depth understanding of the evolution of organisms. These applications mainly depend on identifying the evolutionary history of existing organisms and understanding the relations between them, which is possible through deep analysis of their biological sequences. Multiple Sequence Alignment (MSA) is considered an important tool in such applications, where it gives an accurate representation of the relations between different biological sequences. In the literature, much effort has been put into presenting new MSA algorithms or improving existing ones. However, little effort has gone into optimizing parallel MSA algorithms. Nowadays, large datasets have become a reality and big data has become a primary challenge in various fields, which should also be a new milestone for new bioinformatics algorithms.

Multiple Sequence Alignment Based Method for Construction of Phylogenetic Trees

number of biological data in the field of proteomics, multiple sequence alignment methods are required that deal with multiple sequences at a time. The biological data is available in different data types, such as genome databases, sequence databases, structure databases, enzyme databases, etc. The multiple sequence alignment methods take sequence databases for evolutionary study. There are different types of sequence databases, such as GenBank and EMBL (European Molecular Biology Laboratory) for nucleotides, and Swiss-Prot and UniProt for proteins, etc.

MSARC: Multiple sequence alignment by residue clustering

Second, only sequences belonging to currently aligned subsets contribute to their pairwise alignment. Even if a guide-tree reflects correct phylogenetic relationships, these alignments may be inconsistent with remaining sequences and the inconsistencies are propagated to further steps. To address this problem, in recent programs [4-8] progressive alignment is usually preceded by consistency transformation (incorporating information from all pairwise alignments into the objective function) and/or followed by iterative refinement of the multiple alignment of all sequences. Moreover, recently several strategies avoiding guide trees altogether were also proposed [9-11]. In the present paper we propose MSARC, a new non-progressive multiple sequence alignment algorithm. MSARC constructs a graph with all residues from all sequences as nodes and edges weighted with alignment affinities of its adjacent nodes. Columns of best multiple alignments tend to form clusters in this graph, so in the next step residues are clustered (see Figure 1). Finally, MSARC refines the multiple alignment corresponding to the clustering.

Bootstrapping Lexical Choice via Multiple Sequence Alignment

izations vary considerably, and none directly matches the entire semantic input. For instance, it is not obvious without domain knowledge that “Given a and b as in the theorem statement” matches “a=0” and “b=0”, nor that “their product” and “a∗b” are equivalent. Moreover, sentence (3) omits the goal argument entirely. However, as Figure 2 shows, the combination of these verbalizations, as computed by our multiple-sequence alignment method, exhibits high structural similarity to the semantic input: the indicated “sausage” structures correspond closely to the three arguments of show-from.

Refin-Align: New Refinement Algorithm For Multiple Sequence Alignment

Multiple sequence alignment can help biologists predict structure and function information for a set of sequences. Indeed, we can reveal information about biological functions common to biological macromolecules from several different organisms by identifying similar regions, which often play important structural or functional roles. Multiple sequence alignment can also help in the classification of macromolecules into different families according to the similar substrings detected. In addition, multiple sequence alignment can help to construct a phylogenetic tree and analyse relationships between species in order to establish a common biological ancestor.

DNA Multiple Sequence Alignment by a Hidden Markov Model and Fuzzy Levenshtein Distance based Genetic Algorithm

In the last decade, biologists have experienced a fundamental shift away from traditional empirical research to large-scale, computer-based research. Today bioinformatics is a systematic and predictive discipline which encompasses genomics, informatics, automation, and miniaturization. This fusion of biology and information science is expected to continue and expand for the foreseeable future. DNA sequence alignment is a common problem in bioinformatics for establishing similarity and evolutionary relationships between DNA sequences. This paper presents a DNA multiple sequence alignment technique using a genetic algorithm based on a Hidden Markov Model and Fuzzy Levenshtein Distance.

Cluster Analysis Method for Multiple Sequence Alignment

With the addition of more data in the field of proteomics, computational methods need to be more efficient. The part of a molecular sequence that is more resistant to change is functionally more important to the molecule. Comparative approaches are used to ensure the reliability of sequence alignment. A multiple sequence alignment (MSA) is a proposition about evolutionary history: each column in the alignment establishes an explicit homologous correspondence between the individual sequence positions it contains. In the present work, the different pairwise sequence alignment methods are discussed. The limitation of these methods is that they can align only a limited number of short sequences. A new method is proposed for sequence alignment based on local alignment with a consensus sequence. Sequences of Triticum wheat varieties, downloaded from the NCBI databank, are considered. The dataset is divided into two parts and two phylogenetic trees are constructed for each dataset. Using advanced pruning techniques, a single tree is constructed from the two trees generated. Then, by applying the threshold condition, the closely related sequences are extracted and an optimal MSA is obtained using shift operations in both directions.

Instability in progressive multiple sequence alignment algorithms

The creation of a multiple sequence alignment is a routine step in the analysis of homologous genes or proteins. For aligning more than a few hundred sequences, most methods use a heuristic approach termed “progressive alignment” by Feng and Doolittle [1]. This is a two-stage process: first a guide tree [2] is created by clustering the sequences based on some distance or similarity measure, and then the branching structure of the guide tree is used to order the pairwise alignment of sequences. The power of progressive multiple sequence alignment may come from the fact that “more similar” sequences are aligned first: “...assuming that in progressive alignment, the best accuracy is obtained at each node by aligning the two profiles that have fewest differences, even if they are not evolutionary neighbours” [3].
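
The first stage of this process can be sketched in a few lines. The excerpt only says that sequences are clustered by "some distance or similarity measure", so the average-linkage (UPGMA-style) rule used here is an assumption chosen for illustration, as are the function name `guide_tree_order` and the toy distance matrix. The returned list gives the order in which a progressive aligner would merge sequences or profiles.

```python
def guide_tree_order(names, dist):
    """Average-linkage clustering; returns the merges in the order performed.

    names: list of sequence labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        merges.append((clusters[a], clusters[b]))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted average of the distances to the merged members.
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges


# Toy distances (made up): s1 and s2 are closest, s4 is the most distant.
names = ["s1", "s2", "s3", "s4"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]

for left, right in guide_tree_order(names, dist):
    print(left, "+", right)
```

In the second stage, each merge would be carried out as a pairwise alignment of the two sequences or profiles involved, so the most similar sequences are aligned first.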

Heuristics for multiobjective multiple sequence alignment

Background: Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by current software packages are highly dependent on parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and give overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment.
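
The trade-off can be made concrete with a small sketch that scores candidate alignments on the two objectives and keeps only the non-dominated ones. This is an illustration under stated assumptions, not the paper's heuristics: the match/mismatch values, the function names, and the candidate alignments are all hypothetical, and the candidates are simply given rather than searched for.

```python
from itertools import combinations


def objectives(alignment, match=1, mismatch=-1):
    """Return (substitution score, indel count), both summed over all row pairs."""
    subst = 0
    indels = 0
    for row_a, row_b in combinations(alignment, 2):
        for x, y in zip(row_a, row_b):
            if x == "-" and y == "-":
                continue  # gap against gap contributes to neither objective
            if x == "-" or y == "-":
                indels += 1
            else:
                subst += match if x == y else mismatch
    return subst, indels


def pareto_front(candidates):
    """Keep alignments not dominated on (maximize subst, minimize indels)."""
    scored = [(objectives(a), a) for a in candidates]
    front = []
    for (s, g), a in scored:
        dominated = any(
            s2 >= s and g2 <= g and (s2 > s or g2 < g)
            for (s2, g2), _ in scored
        )
        if not dominated:
            front.append(((s, g), a))
    return front


# Three made-up alignments of ACGT and AGCT.
candidates = [
    ["ACGT", "AGCT"],      # no gaps, two mismatches
    ["A-CGT", "AGC-T"],    # two indels, no mismatches
    ["ACGT--", "--AGCT"],  # four indels and two mismatches
]

for (subst, indels), alignment in pareto_front(candidates):
    print(subst, indels, alignment)
# 0 0 ['ACGT', 'AGCT']
# 3 2 ['A-CGT', 'AGC-T']
```

Neither surviving alignment is better on both objectives, which is the kind of choice the multiobjective formulation leaves to the practitioner.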

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

three levels: multiple threads based on central processing unit (CPU) on a single machine, multiple threads based on graphics processing unit (GPU) on a single machine, and multiple threads based on CPUs or GPUs on cluster machines. CPU-based multiple threads, which are common and effortless, suit small-scale sequence alignment. With emergence of bottlenecks in increasing clock frequency of multi-core CPUs, Moore’s law became meaningless [3]. Based on NVIDIA GPU, compute unified device architecture (CUDA) technique was designed for efficient parallelism [4, 5]. GPU functions in real-time rendering of screens, because hundreds of cores in GPUs can efficiently calculate pixels or coordinates in parallel. However, under limited video memory size and bandwidth, alignment of ultra-large sequences becomes difficult or even impossible [6]. With high computational

Inferring phylogenies of evolving sequences without multiple sequence alignment

support the latter view. The alignment-free approach implemented here appears to have no difficulty, at appropriate parameter settings across our simulated datasets, in capturing homology signal and generating topologies that are very similar or identical to those generated by MSA followed by Bayesian inference, arguably the current standard in phylogenetics (see below). The robustness of alignment-free methods to rearrangements and insertions/deletions represents a critical advantage, since these events are common among microbial genomes 3 and frequently interrupt individual genes 45. Our findings

P-Coffee: A New Divide-and-conquer Method for Multiple Sequence Alignment

in separate base list (called the primary library in T-Coffee), with the sequence identity (%) as their preliminary weights (or scores). For the local alignment library, the ten highest scoring local alignments are accepted. Next, the union is taken over all the residue pairs listed in the NW and SIM primary libraries to construct a single primary library. For residue pairs that exist in both libraries, the preliminary scores are simply added. Then, T-Coffee gives additional weight to the residue pairs that can be linked by another residue contained in the remaining sequences. For each residue pair in the primary library, all the remaining sequences are examined in search of such linkage. Whenever a link residue is found, the smaller weight of either linkage is added to the current weight. This process is called library extension. The final residue score can be expressed as the following equation.
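
The library extension step described above can be sketched as follows. This is a minimal illustration based only on the prose description, not T-Coffee's or P-Coffee's actual code: the residue encoding `(sequence_name, position)`, the function name `extend_library`, and the example weights are assumptions made for the example.

```python
def extend_library(primary):
    """Add, for every residue pair, the weaker link weight via each third residue.

    primary: dict mapping (residue_x, residue_y) -> weight, where a residue is
             a (sequence_name, position) tuple and each pair is listed once.
    """
    # Index the library by residue so linked residues can be looked up quickly.
    neighbours = {}
    for (x, y), w in primary.items():
        neighbours.setdefault(x, {})[y] = w
        neighbours.setdefault(y, {})[x] = w

    extended = {}
    for (x, y), w in primary.items():
        total = w
        for z, w_xz in neighbours[x].items():
            if z[0] in (x[0], y[0]):
                continue  # the link residue must come from a remaining sequence
            w_zy = neighbours[y].get(z)
            if w_zy is not None:
                total += min(w_xz, w_zy)  # smaller weight of the two linkages
        extended[(x, y)] = total
    return extended


# Made-up primary library over the first residues of sequences A, B and C.
A0, B0, C0 = ("A", 0), ("B", 0), ("C", 0)
primary = {(A0, B0): 88, (A0, C0): 77, (B0, C0): 100}

extended = extend_library(primary)
print(extended[(A0, B0)])  # 165 = 88 + min(77, 100)
print(extended[(B0, C0)])  # 177 = 100 + min(88, 77)
```

Pairs that are supported by consistent links through other sequences end up with higher weights, which is what lets the later alignment step favour them.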

Multiple sequence alignment algorithms for the phylogenic analysis of chloroplast DNA

Progressive alignment methods are the most commonly used and have the advantage of speed and simplicity (Notredame 2002). Progressive alignment successively aligns pairs of sequences using pairwise alignment algorithms (such as Needleman-Wunsch). Progressive alignment algorithms differ in several key ways: the way they choose the order in which to do the alignment, whether they align a single sequence to a single growing alignment or build up subfamilies that lead to alignments of alignments, and the method of aligning and scoring sequences or alignments against existing alignments. The most important heuristic used in progressive alignment algorithms is to align the most similar sequences first (those with the smallest edit distance). Progressive sequence alignment algorithms are sensitive to the order of the pairwise alignments, which is determined solely by alignments of only two sequences at a time (Morgenstern, Dress & Werner 1996). This has been addressed recently by using a travelling salesman approach to determine the order of alignments (Chantal & Gaston 1999).
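
As a point of reference for the pairwise step, here is a compact sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for the example; real aligners typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

A progressive aligner would apply this kind of dynamic programming repeatedly, first to the most similar pair of sequences and then to the growing alignments.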

FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies

Although there are many scripts and online platforms that address these issues or manipulate sequence alignments with single processing steps, a software tool which enables combined processing steps in a single operation is lacking. Software like SequenceMatrix [39], TranslatorX [40], and CONCATENATOR [41] are pure concatenation tools which can be used only via graphical user interface or which are web server designed and therefore cannot be implemented in automatic process pipelines. 2matrix [42] is a pure concatenation tool as well but command line driven. SCaFoS [43] is a phylogenomic tool for selecting and concatenating sequences in large multigene and species datasets at either the amino acid or nucleotide level. Although SCaFoS is efficient at selecting orthologous sequences, creating chimerical sequences, and selecting genes according to their level of missing data, it lacks alignment processing options such as sequence translation, RY-coding, secondary structure handling, sequence renaming and consensus sequence generation.

DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment

As with traditional objective functions for sequence alignment, numerically optimal pairwise alignments can be calculated efficiently in the segment-based approach. In DIALIGN, this is done by a space-efficient fragment-chaining algorithm [10,11]. However, it is computationally not feasible to find mathematically optimal multiple alignments. Thus, heuristics must be used if more than two sequences are to be aligned. All previous versions of DIALIGN used a greedy algorithm for multiple alignment. In an initial step, optimal pairwise alignments are calculated for all possible pairs of input sequences. Since these pairwise alignments are completely independent of each other, they can be calculated on parallel processors [12]. Fragments from these pairwise alignments are then sorted by their scores, i.e. based on their P-values, and then included one-by-one into a growing consistent set of fragments involving all pairs of sequences – provided they are consistent with the previously included fragments.

A Tabu Search Approach to Multiple Sequence Alignment

Tabu C uses an intensification procedure. Whereas Tabu B incorporated an HMM algorithm to locally improve the best solution at the end, Tabu C goes into an intensification phase many times. Generally, an intensification procedure revisits and examines good solutions. It maintains the good portions of this solution and searches to find a better neighboring solution. Tabu C enters an intensification phase when it stabilizes. The tabu is considered stable when a single MSA continues to have the highest score for many iterations. The intensification phase starts with the best MSA. There are two types of moves that will potentially lead to a better solution during this phase. The first type of move swaps individual gap positions in the MSA. The second type of move concentrates on areas that have multiple columns with gaps in over half of the sequences. Either the gaps are removed and the sequences are adjusted or the gaps are simply randomly moved around within that area. This intensification process stops when there is no improvement for many iterations. The intensification phase iteratively refines the best MSA, so that gaps introduced early in the alignment can be removed or switched around. A solution can only enter an intensification phase one time.