An Improved Needleman-Wunsch Algorithm for Pairwise Sequence Alignment of Protein-Albumin

Abstract: This paper aims to improve optimal global sequence alignment in order to increase computational performance. The huge number of genome sequences is the main problem in alignment. One global sequence alignment method is the Needleman-Wunsch algorithm. The algorithm constructs an M×N matrix, where M is the length of the first sequence and N is the length of the second sequence. All cells of the matrix are filled to compute the score for the global pairwise sequence alignment, so the time and space complexity are very high. The improved Needleman-Wunsch algorithm (INWA) is therefore proposed, which computes the score in only part of the cells. The test set consisted of 1250 pairwise sequence alignments of human protein-albumin, and the results were compared with the original method. The results show that the space and time complexity of INWA is O(N) instead of O(MN).

Serial and parallel implementation of Needleman-Wunsch algorithm

Various approaches have been introduced for sequence alignment, such as dynamic programming and heuristic approaches, the latter at the expense of accuracy [5]. Two widely known dynamic-programming-based pairwise sequence alignment algorithms are the Needleman-Wunsch (NW) algorithm [2] for global alignment and the Smith-Waterman (SW) algorithm [6] for local alignment [7]. Both algorithms find an optimal alignment for a given pair of sequences, and their computation time is proportional to the product of the lengths of the two sequences to be aligned [8]. The computation time may therefore increase significantly when sequence lengths reach the millions. Tools that employ heuristic approaches, such as FASTA [9] and BLAST [10], have been shown to run 40 times faster than the Central Processing Unit (CPU) based serial implementation of the SW algorithm [11]. However, the outputs of these tools are approximations of the optimal solution [5].

A Noble Approach on Bioinformatics: Smart Sequence Alignment Algorithm applying DNA Replication (SSAADR)

Sequence alignment is a robust field that can retrieve information about the occurrence of patterns across disciplines such as biology, image processing, and phrase and pattern matching. Sequence alignment is significant for finding associative information about the nature, formation and behavior of the patterns. In this paper, the concept of sequence alignment from bioinformatics is used [1] [2]. It is a way of arranging sequences of Deoxyribonucleic Acid (DNA), Ribonucleic Acid (RNA) and protein [3] [4] [5]. Two bioinformatics algorithms are most widely used: the Needleman-Wunsch algorithm and the Smith-Waterman algorithm [1] [6]. They are dynamic programming algorithms used to find an optimal alignment between two strings [7] [8] [9] [10] [11].

Fast Dynamic Algorithm for Sequence Alignment based on Bioinformatics

Sequence alignment is widely used in bioinformatics to identify differences between genome sequences. It is a central problem of computational biology. Any sequence of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or protein can be aligned by a range of bioinformatics algorithms. This paper presents a newly implemented algorithm for sequence alignment based on concepts from these algorithms, called the fast dynamic algorithm for sequence alignment (FDASA). The algorithm builds an M×N matrix (M is the length of the first sequence, N is the length of the second sequence), then fills only the three main diagonals, skipping the unused cells, and at the same time obtains the optimal solution; as a result, execution time and memory use decrease and performance is high. The implementation presented in this paper compares the dynamic programming algorithms Needleman-Wunsch and Smith-Waterman with FDASA in terms of execution time. The results show that FDASA reduces execution time compared with the Needleman-Wunsch and Smith-Waterman algorithms.

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

protein sequences were constructed by the scores of the global alignment (Needleman-Wunsch algorithm) between only N-terminal signal sequences. The high prediction performance of our method was verified, using the eukaryotic plant and non-plant data sets, through 5-fold cross-validation as well as the jackknife validation. The advantages of our prediction system are: (1) the discriminative power of the feature vector is expected to increase, since it contains the information on positive as well as negative data; (2) our prediction system has important biological implications because it considers only N-terminal signal sequences; and (3) the system is easy to understand and implement. Despite these advantages, there remain two basic limitations inherent in this approach. First, the vectorization of protein sequences is computationally "expensive", because it is based on a dynamic programming algorithm. Second, our prediction system is not suitable for the discrimination between cytoplasmic and nuclear proteins, since the sorting signals of these protein sequences are not located at the N-terminus. Therefore, what remains to be done in future research is to extend the proposed system to circumvent these limitations.

DNA Sequence alignment using programme by algorithm

can be applied to produce global alignments via the Needleman-Wunsch algorithm, and local alignments via the Smith-Waterman algorithm. In typical usage, protein alignments use a scoring matrix to assign scores to amino-acid matches or mismatches, and a gap penalty for matching an amino acid in one sequence to a gap in the other. DNA and RNA alignments may use a scoring matrix, but in practice often simply assign a positive match score, a negative mismatch score, and a negative gap penalty. (In standard dynamic programming, the score of each position is independent of the identity of its neighbors, so base stacking effects are not taken into account. However, it is possible to account for such effects.) A common extension to standard linear gap costs is the use of two different gap penalties for opening a gap and for extending a gap. Typically the former is much larger than the latter, e.g. -10 for gap open and -2 for gap extension. Thus, the number of gaps in an alignment is usually reduced and residues and gaps are kept together, which typically makes more biological sense. The Gotoh algorithm implements affine gap costs by using three
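
The affine gap scheme in this excerpt can be illustrated with a minimal Python sketch that keeps three score tables, in the spirit of the three-matrix formulation usually attributed to Gotoh. The function name `affine_gap_score`, the match and mismatch scores, and the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend` are assumptions made for illustration; they are not taken from the excerpted paper.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-10, gap_extend=-2):
    """Best global alignment score of a and b under affine gap costs.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    m, n = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, n + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[m][n], X[m][n], Y[m][n])
```

With these assumed scores, aligning "AAAA" against "AA" favors a single gap of length two over two separate gaps, which is the behavior the excerpt describes as keeping gaps together.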

A RANDOMIZED ALGORITHM FOR FAST SEQUENCE ALIGNMENT

Ever since sequence alignment gained significance, a large number of algorithms have been published. Most of these can be divided into two categories: pairwise sequence alignment algorithms and multiple sequence alignment (MSA) algorithms. Pairwise sequence alignment is probably of mainly theoretical interest, as most problems deal with multiple sequences. However, we need to understand how pairwise sequence alignment works in order to proceed to multiple sequence alignment. Compared to MSA, pairwise sequence alignment algorithms can produce optimum results because of the small size of their input sequences. Several dynamic programming algorithms are available for this, the most famous being that of Needleman and Wunsch [1] and that of Smith and Waterman [2]. The former deals with global alignments and the latter with local alignments. Both produce the best possible alignment for the two given sequences. Several improvements to the time complexity of these algorithms were made later, most notably by Gotoh [3] and then by Altschul and Erickson [4].

Faster and efficient algorithm for sequence alignment

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels. The goal of this paper is to explore computational approaches to sequence alignment in a faster and optimal way. Two techniques that have been studied are global alignment and local alignment. In this paper, I have used the idea of both alignment techniques separately. Each technique follows an algorithm (the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment) which helps in generating a proper optimal alignment accordingly. Multiple DNA

Improvisation of Global Pairwise Sequence Alignment Algorithm Using Dynamic Programming

Global pairwise sequence alignment algorithms based on dynamic programming match each base pair step by step, from start to end of the sequences under observation. This approach increases the time complexity, which grows manyfold when large sequences are used. Needleman-Wunsch, Smith-Waterman, ALIGN, FASTA, BLAST and many other pairwise sequence alignment algorithms are based on the dynamic programming approach. The present communication is an attempt to provide a method for improvisation in the dynamic programming used for pairwise sequence alignment. The proposed technique is based on a look-ahead method which decides whether to continue or stop processing the alignment steps for the sequence pair under observation from the current point if a significant match score is not achieved. A threshold, set by the user a priori, indicates the minimum percent match per base error to be accepted in the sequence alignment process. The present improvisation method of dynamic programming can reduce bulky computational steps and hence save a reasonable amount of time in the pairwise sequence alignment process.

A SHARED MEMORY BASED IMPLEMENTATION OF NEEDLEMAN WUNSCH ALGORITHM USING SKEWING TRANSFORMATION

Needleman-Wunsch [1] and Smith-Waterman [2] are two well-known dynamic-programming-based algorithms, developed in the 70s and early 80s, for detecting similarity between a pair of DNA/protein sequences. BLAST [6] is the most commonly used sequence alignment program for pairwise alignment. It is based on the principle of hashing small matching sequences and then extending the hash matches to create high-scoring segment pairs until the highest possible score is obtained. BLAST is faster than any dynamic-programming-based approach. However, unlike dynamic programming, it does not guarantee the optimal alignment of the query and database.

Fragmented protein sequence alignment using two-layer particle swarm optimization (FTLPSO)

The MSA optimization problem has traditionally been solved using dynamic programming (Needleman and Wunsch, 1970; Smith and Waterman, 1981) and progressive methods (Al Ait et al., 2013; Lalwani et al., 2015). However, these may face problems of high processing cost and high memory usage, with no guarantee that the system will reach the optimal solution. The new trend is to use iterative techniques because of their simplicity and their ability to solve multidimensional optimization problems in many fields (Das et al., 2008; Kiranyaz et al., 2009). Particle swarm optimization (PSO) is a swarm intelligence technique that has proven able to solve the MSA problem. However, PSO suffers from the trapping of particles in local optima. Moreover, the PSO algorithm can handle short sequences efficiently, but increasing sequence lengths lead to decreasing solution accuracy. The main targets of this paper are to:

Pairwise Sequence Alignment between HBV and HCC Using Modified Needleman Wunsch Algorithm

The Needleman-Wunsch algorithm is a dynamic programming approach that produces an optimal global alignment. However, this algorithm has high computational time and space complexity. Therefore, the filling and backtracking processes need to be optimized. The main difference of the proposed method is that the alignment matrix is filled only in a specific area, as in Figure 5. If both sequences have the same length, the matrix is filled on the main diagonal (D) and on (D-1) and (D+1), where (D-1) is the diagonal below the main diagonal and (D+1) is the diagonal above it. In this case, we use the same approach as the FAST Needleman Wunsch algorithm [5].
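
The idea of filling only the main diagonal and its two neighbors can be sketched as a generic banded score computation. This is a minimal Python illustration, not the method of the excerpted paper: the function name `banded_nw_score`, the scoring values, and the `band` parameter (1 gives the diagonals D-1, D and D+1) are assumptions.

```python
def banded_nw_score(a, b, match=1, mismatch=-1, gap=-2, band=1):
    """Global alignment score using only cells with |i - j| <= band."""
    m, n = len(a), len(b)
    if abs(m - n) > band:
        raise ValueError("end cell lies outside the band")
    # Each row is a dict {column: score} holding only the in-band cells,
    # so at most 2 * band + 1 scores are stored per row.
    prev = {j: j * gap for j in range(min(n, band) + 1)}
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - band), min(n, i + band) + 1):
            candidates = []
            if j > 0 and (j - 1) in prev:   # diagonal move: match or mismatch
                s = match if a[i - 1] == b[j - 1] else mismatch
                candidates.append(prev[j - 1] + s)
            if j in prev:                   # vertical move: gap in b
                candidates.append(prev[j] + gap)
            if j > 0 and (j - 1) in cur:    # horizontal move: gap in a
                candidates.append(cur[j - 1] + gap)
            cur[j] = max(candidates)
        prev = cur
    return prev[n]
```

Only two band-wide rows are kept at a time, so memory grows with the band width rather than with M×N. The trade-off is that any alignment whose path leaves the band is never considered, so the banded score can be lower than the score from the full matrix.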

DECIPHERING THE SEQUENCE ALIGNMENT BY NEEDLEMAN-WUNSCH ALGORITHM ON TO REDUCE COMPUTATIONAL TIME VIA HIGH PERFORMANCE COMPUTING

All possible pairs of residues (DNA bases or protein amino acids), one from each sequence, are represented in a 2-dimensional array. The sequences are written across the top and down the left side of the matrix, except that an extra row (row #0) and column (column #0) are added to allow the alignment to begin with a gap of any length in either sequence. The gap row and column are filled with penalty scores for gaps of increasing lengths. Maximum possible values are calculated for all other boxes, below the top row and to the right of the left column, using the above scoring functions. All possible alignments are represented by pathways through this matrix. Each cell holds the maximum possible score for an alignment ending at that point. For each cell, look at all possible pathways back to the beginning of the sequence (allowing gaps) and give that cell the value of the maximum scoring pathway.
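
The matrix fill described above, followed by a traceback along one maximum scoring pathway, can be sketched in Python as follows. The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, since the excerpt does not show its scoring functions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    m, n = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j];
    # row 0 and column 0 are the extra gap row and column.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # pair residues
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace one maximum scoring pathway back from the bottom-right cell.
    out_a, out_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(out_a)), "".join(reversed(out_b))
```

When several pathways share the maximum score, this sketch prefers the diagonal move, so it returns one optimal alignment rather than all of them.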

An improved algorithm for channel allocation on direct sequence spread spectrum

The wireless LAN standards 802.11b and 802.11g distribute data using Direct Sequence Spread Spectrum (DSSS) technology (Goldsmith, 2005). DSSS works by taking a data stream of zeros and ones and modulating it with a second pattern, the chipping sequence. Various other electronic devices in a home, such as cordless phones, garage door openers, and microwave ovens, may use the same frequency range. Any such device can interfere with a WLAN, slowing down its performance and potentially breaking network connections (Perez, 1998; Theodore, 2001).

YOC, A new strategy for pairwise alignment of collinear genomes

Aligning closely related bacterial genomes (for instance strains of the same species) should be one of the simplest cases for genome aligners, since the genomes are of moderate size (generally 1 to 6 Mb) and divergence times are short. Nevertheless, we observed that even in such cases, some WGA tools fail to capture more divergent regions, which are left out of the alignment, or conversely, tend to include wrong alignments of unrelated regions that need to be filtered out in a post-processing step [15,16]. With the aim of addressing this issue, we designed a more sensitive method for the similarity detection phase and a strategy to avoid the inclusion of badly aligned regions. We implemented this strategy in a new whole genome aligner named YOC, designed for robust pairwise alignment of collinear bacterial genomes. YOC provides several improvements: the strategy is simplified compared to other anchor-based tools and little parameter tuning is needed. Moreover, its sensitivity makes it possible to align more distantly related bacterial genomes. We also analyzed the quality and the reliability of the resulting alignments, which were extensively evaluated on several bacterial datasets. To this end, we introduce a quantitative criterion, GRA-FIL, based on the GRAPe software [25], and applied it to benchmark several tools. We show that this criterion measures efficiently the unreliable parts of the alignments, thus enabling rapid comparison of the performances of different genome aligners.

Analysing Multiple DNA Sequence Alignment Algorithms Smith Waterman Algorithm and Parallel Smith Waterman Algorithm

In [3], Isokawa et al. presented a method to deal with multiple sequence alignment using a genetic algorithm (GA). In [4], Notredame et al. presented a method called SAGA for sequence alignment by genetic algorithms. It involved evolving a population of alignments in a quasi-evolutionary manner and gradually improving the fitness of the population by an objective function that measures the multiple sequence alignment quality. In [5], Stoye presented techniques of multiple sequence alignment using a divide-and-conquer method, where an increase in speed compared to optimal multiple alignment by dynamic programming can be guaranteed. In [6], Thompson et al. presented a method for improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Parallelizing and Analyzing the Behavior of Sequence Alignment Algorithm on a Cluster of Workstations for Large Datasets

Speedup (Sp) is defined as the ratio of the computation time on a single core to the computation time on C cores. The speedup ratios obtained for DSM and DSR with different numbers of cores and chunk sizes are presented in Figure 9(a)–(e), respectively. It is clearly seen from these figures that our parallel Wavefront algorithm scales almost linearly. In Figure 9(a), the speedup line for DSR is higher than the line for DSM. The number of communications decreases from chunk size 2^14 to 2^10 (Figure 9(b) to 9(e)), and hence the time required for internal computation increases. This elevation of the line for both datasets at the most time-consuming chunk size confirms the suitability of our parallel Wavefront algorithm for huge datasets with a large chunk size. The speedup values for both datasets in Figure 9(a)–(e) are almost the same up to 8 processors. The separation of the lines at 16 processors can be explained by the unsustainable balance between locally computed data and communicated data. There is comparatively less communication for 8 cores and less calculation for 16 cores. This situation changes for more than 16 cores for all chunk sizes, where the lines at each processor count do not match. The

Protein sequence alignment with family specific amino acid similarity matrices

test. The outcome of the t-test is considered to be statistically significant if its p-value is less than 0.05. To avoid over-fitting, the alignment quality scores are obtained using a 3-fold cross-validation. In this procedure, each SABmark group is randomly partitioned into three non-overlapping sub-sets, with two sub-sets used for training and the remaining sub-set used for testing. The process is repeated three times, so that each sub-set is used for testing once. In the case of a general-purpose similarity matrix, “training” means optimizing gap penalties for this matrix. In the case of a similarity matrix specific for a SABmark group k, “training” means deriving the group-specific matrix itself and optimizing gap penalties for this matrix. “Testing” means using the matrix and optimized gap penalties obtained during the training step to align sequences from the test sub-set. During cross-validation, gap initiation, a, and gap extension, b, penalties for a given group k and a given similarity matrix A are optimized using the following grid search procedure: Sequences from the training set of group k are aligned using matrix A, and all possible combinations of integer gap penalties in the range 1 ≤ a ≤ 50, 1 ≤ b ≤ 30 are tested; the combination (a, b) that results in the highest average quality score is selected as the best and is used with matrix A to align sequences from the test set of group k.
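
The cross-validated grid search described above can be outlined in a short Python sketch. The names `three_fold_splits` and `grid_search_gap_penalties` are hypothetical, and `quality` stands in for whatever function aligns the training sequences with penalties (a, b) and returns their average quality score; that function is not shown in the excerpt and is assumed here.

```python
import random
from itertools import product


def three_fold_splits(items, seed=0):
    """Yield (train, test) pairs; each item is used for testing exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)       # random, reproducible partition
    folds = [items[k::3] for k in range(3)]  # three non-overlapping sub-sets
    for k in range(3):
        train = [x for f in range(3) if f != k for x in folds[f]]
        yield train, folds[k]


def grid_search_gap_penalties(quality, a_range=range(1, 51), b_range=range(1, 31)):
    """Return ((a, b), score) maximizing quality(a, b) over the integer grid.

    quality is an assumed callable: (gap initiation a, gap extension b) -> score.
    """
    best_pair, best_score = None, float("-inf")
    for a, b in product(a_range, b_range):
        score = quality(a, b)
        if score > best_score:
            best_pair, best_score = (a, b), score
    return best_pair, best_score
```

In use, one would call `grid_search_gap_penalties` once per training split and then apply the selected pair to the matching test sub-set. The default ranges cover 50 × 30 = 1500 combinations per split, so the cost of `quality` dominates the run time.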

Refin-Align: New Refinement Algorithm For Multiple Sequence Alignment

Each algorithm that adopts a progressive or iterative approach produces mistakes in multiple sequence alignment; we therefore use refinement algorithms to correct badly aligned residues, which can improve the quality of the multiple alignment by improving its scores. All refinement algorithms apply a set of modifications to an initial multiple sequence alignment in order to construct a new one with better scores than the previous alignment. These modifications are repeated until convergence (i.e. no improvement can be made on the current alignment). There are different algorithms for refinement of multiple sequence alignments:

DNA Sequence Alignment Algorithm Based on k tuple Statistics

Deoxyribonucleic acid (DNA) molecules are information macromolecules which form the blueprint for life on earth, and each strand of the famous double helix is a linear combination of the polymerized nucleotides (bases) adenine (A), guanine (G), cytosine (C), and thymine (T). Therefore, one DNA sequence can be treated as a string of characters over a four-character set Σ = {A, C, G, T}. Thus, each DNA sequence S ∈ Σ* [1].