Abstract: This paper aims to improve the optimal global **sequence** **alignment** method in order to increase computational performance. The huge number of genome sequences is the main problem for **alignment**. One global **sequence** **alignment** method is the **Needleman**-**Wunsch** **algorithm**. This **algorithm** is implemented by constructing an M×N matrix, where M is the length of the first **sequence** and N is the length of the second **sequence**. All cells of the matrix are filled to compute the score for constructing the global **pairwise** **sequence** **alignment**, so the time and space complexity are very high. Therefore, the **improved** **Needleman**-**Wunsch** **algorithm** (INWA) is proposed, which computes the score for only part of the cells. The test set consisted of 1250 **pairwise** **sequence** alignments of human **protein**-**albumin**, and the results are compared to the original method. The results show that the space and time complexity of INWA is O(N) instead of O(MN).

Various approaches have been introduced for **sequence** **alignment**, such as dynamic programming approaches and heuristic approaches that trade accuracy for speed [5]. Two widely known dynamic programming based **pairwise** **sequence** **alignment** algorithms are the **Needleman**-**Wunsch** (NW) **algorithm** [2] for global **alignment** and the Smith-Waterman (SW) **algorithm** [6] for local **alignment** [7]. Both algorithms find an optimal **alignment** for a given pair of sequences, and their computation time is proportional to the product of the lengths of the two sequences to be aligned [8]. Therefore, the computation time may increase significantly when the **sequence** length reaches the millions. Tools that employ heuristic approaches, such as FASTA [9] and BLAST [10], have been shown to perform 40 times faster than the Central Processing Unit (CPU) based serial implementation of the SW **algorithm** [11]. However, the outputs of these tools are approximations of the optimal solution [5].
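
To make the cost of the dynamic programming approach concrete, the following is a minimal sketch of Smith-Waterman local alignment scoring. The function name, the scoring values (match 2, mismatch -1, linear gap -2), and the two-row storage are assumptions chosen for illustration, not taken from the cited works. The nested loops visit every pair of positions, which is where the quadratic running time comes from.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap cost).

    Illustrative sketch: scoring values are assumptions, not from the source.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below 0, so an
            # alignment may start anywhere in either sequence.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only the score is returned here; recovering the aligned region itself would require keeping the full matrix or additional bookkeeping.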

can be applied to produce global alignments via the **Needleman**-**Wunsch** **algorithm**, and local alignments via the Smith-Waterman **algorithm**. In typical usage, **protein** alignments use a substitution matrix to assign scores to amino-acid matches or mismatches, and a gap penalty for matching an amino acid in one **sequence** to a gap in the other. DNA and RNA alignments may use a scoring matrix, but in practice often simply assign a positive match score, a negative mismatch score, and a negative gap penalty. (In standard dynamic programming, the score of each position is independent of the identity of its neighbors, so base stacking effects are not taken into account. However, it is possible to account for such effects.) A common extension to standard linear gap costs is the use of two different gap penalties for opening a gap and for extending a gap. Typically the former is much larger than the latter, e.g. -10 for gap open and -2 for gap extension. Thus, the number of gaps in an **alignment** is usually reduced and residues and gaps are kept together, which typically makes more biological sense. The Gotoh **algorithm** implements affine gap costs by using three
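
As a hedged illustration of affine gap costs, the sketch below computes a global alignment score with three score tables in the style of Gotoh. The table names `M`, `X`, `Y`, the function name, the match and mismatch values, and the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend` are assumptions made for this example; other conventions exist.

```python
NEG_INF = float("-inf")

def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-10, gap_extend=-2):
    """Global alignment score with affine gap costs (three-table sketch)."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a gap is expensive; extending an existing one is cheap.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these defaults, one gap of length two costs -12, while two separate gaps of length one cost -20, which is why affine costs tend to keep gaps together.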

Ever since **sequence** **alignment** gained significance, a large number of algorithms have been published. Most of these can be divided into two categories: **pairwise** **sequence** **alignment** algorithms and multiple **sequence** **alignment** (MSA) algorithms. **Pairwise** **sequence** **alignment** is probably of mostly theoretical interest, as most problems deal with multiple sequences. However, we need to understand how **pairwise** **sequence** **alignment** works in order to proceed to multiple **sequence** **alignment**. Compared to MSA, **pairwise** **sequence** **alignment** algorithms can produce optimal results because of the small size of their input. Several dynamic programming algorithms are available for this, the most famous being that of **Needleman** and **Wunsch** [1] and that of Smith and Waterman [2]. The former deals with global alignments and the latter with local alignments. Both produce the best possible **alignment** for the two given sequences. Several improvements in the time complexity of these algorithms were made later, most notably by Gotoh [3] and then by Altschul and Erickson [4].

In bioinformatics, a **sequence** **alignment** is a way of arranging the sequences of DNA, RNA, or **protein** to identify regions of similarity. If two sequences in an **alignment** share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels. The goal of this paper is to explore computational approaches to **sequence** **alignment** that are both fast and optimal. Two techniques that have been studied are global **alignment** and local **alignment**. In this paper, I have used the two **alignment** techniques separately. Each technique follows an **algorithm** (the **Needleman**-**Wunsch** **algorithm** for global **alignment** and the Smith-Waterman **algorithm** for local **alignment**) which generates the corresponding optimal **alignment**. Multiple DNA

The global **pairwise** **sequence** **alignment** algorithms based on dynamic programming match each base pair step by step in the **sequence** under observation from start to end. This approach increases the time complexity, which grows manyfold when large sequences are used. **Needleman**-**Wunsch**, Smith-Waterman, ALIGN, FASTA, BLAST and many other **pairwise** **sequence** **alignment** algorithms are based on the dynamic programming approach. The present communication is an attempt to provide a method for improving the dynamic programming used for **pairwise** **sequence** **alignment**. The proposed technique is based on a look-ahead method which decides, from the current point, whether to continue or stop processing the **alignment** steps for the **sequence** pair under observation if a significant match score has not been achieved. A threshold is set by the user a priori, indicating the minimum percent match per base error to be accepted in the **sequence** **alignment** process. This improvement to dynamic programming can reduce bulky computational steps and hence save a reasonable amount of time in the **pairwise** **sequence** **alignment** process.
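
The general idea of stopping early against a user-set threshold can be sketched as follows. This is not the authors' method, whose details are not given here; it is a simplified, gap-free comparison in which the function name, the `min_match_fraction` threshold, and the `check_every` checkpoint interval are all hypothetical.

```python
def lookahead_compare(a, b, min_match_fraction=0.8, check_every=10):
    """Compare a and b position by position, stopping early on poor matches.

    Returns (accepted, positions_examined). Simplified illustration only:
    no gaps are considered and all parameter names are hypothetical.
    """
    n = min(len(a), len(b))
    matches = 0
    for k in range(n):
        if a[k] == b[k]:
            matches += 1
        examined = k + 1
        # At each checkpoint, give up if the running match fraction
        # has fallen below the user-defined threshold.
        if examined % check_every == 0 and matches / examined < min_match_fraction:
            return False, examined
    return (n > 0 and matches / n >= min_match_fraction), n
```

For a clearly dissimilar pair, the comparison ends at the first checkpoint instead of scanning both sequences to the end, which is the source of the time saving.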

The MSA optimization problem has traditionally been solved using dynamic programming (**Needleman** and **Wunsch**, 1970; Smith and Waterman, 1981) and progressive methods (Al Ait et al., 2013; Lalwani et al., 2015). However, these methods may face problems of high processing cost and high memory usage, with no guarantee that the system will reach the optimal solution. The new trend is to use iterative techniques because of their simplicity and their ability to solve multidimensional optimization problems in many fields (Das et al., 2008; Kiranyaz et al., 2009). Particle swarm optimization (PSO) is a swarm intelligence technique that has proven its ability to solve the MSA problem. However, PSO suffers from the trapping of particles in local optima. Moreover, the PSO **algorithm** can handle short sequences efficiently, but increasing **sequence** lengths lead to decreasing solution accuracy. The main targets of this paper are to:

The **Needleman**-**Wunsch** **algorithm** is a dynamic programming approach for optimal global **alignment**. However, this **algorithm** has high computational time and space complexity. Therefore, the filling and backtracking processes need to be optimized. The main difference of the proposed method is that the **alignment** matrix is filled only in a specific area, as shown in Figure 5. If both sequences have the same length, then the matrix is filled on the main diagonal (D) and on the diagonals (D-1) and (D+1). (D-1) is the diagonal below the main diagonal and (D+1) is the diagonal above it. In this case, we use the same approach as the FAST **Needleman** **Wunsch** **algorithm** [5].
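
The restricted filling described above can be sketched as a banded score computation. The following is an illustration under stated assumptions, not the paper's implementation: it handles only equal-length sequences, returns the score without backtracking, and the function name, scoring values, and `band` parameter (1 gives the three diagonals D-1, D, D+1) are hypothetical.

```python
NEG_INF = float("-inf")

def banded_nw_score(a, b, match=1, mismatch=-1, gap=-2, band=1):
    """Global alignment score restricted to diagonals within `band` of D."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes sequences of equal length")
    n = len(a)
    # prev maps column j -> score for the previous row, band cells only.
    prev = {j: j * gap for j in range(0, min(n, band) + 1)}
    for i in range(1, n + 1):
        curr = {}
        for j in range(max(0, i - band), min(n, i + band) + 1):
            if j == 0:
                curr[j] = i * gap  # first column: leading gap of length i
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band are treated as unreachable.
            diag = prev.get(j - 1, NEG_INF) + sub
            up = prev.get(j, NEG_INF) + gap
            left = curr.get(j - 1, NEG_INF) + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[n]
```

Each row touches at most `2 * band + 1` cells, so the work grows linearly with the sequence length for a fixed band. The trade-off is that any alignment whose path leaves the band is never considered, so the result is optimal only among alignments that stay near the main diagonal.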

All possible pairs of residues (DNA bases or **protein** amino acids), one from each **sequence**, are represented in a 2-dimensional array. The sequences are written across the top and down the left side of the matrix, except that an extra row (row #0) and column (column #0) are added to allow the **alignment** to begin with a gap of any length in either **sequence**. The gap row and column are filled with penalty scores for gaps of increasing lengths. Maximum possible values are calculated for all other cells, below the top row and to the right of the left column, using the above scoring functions. All possible alignments are represented by pathways through this matrix. Each cell holds the maximum possible score for an **alignment** ending at that point. For each cell, look at all possible pathways back to the beginning of the **sequence** (allowing gaps) and give that cell the value of the maximum scoring pathway.
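
The matrix construction described above can be sketched in a few lines. This is a minimal illustration with assumed scoring values (match 1, mismatch -1, linear gap -1) and a hypothetical function name; when several pathways tie, the traceback here prefers the diagonal move, which is an arbitrary choice.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Row 0 and column 0 hold penalties for leading gaps of growing length.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the pair
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace one maximum-scoring pathway back to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` aligns `ACGT` against `A-GT` with a score of 2: three matches and one gap.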

The wireless LAN standards 802.11b and 802.11g distribute data using Direct **Sequence** Spread Spectrum (DSSS) technology (Goldsmith, 2005). DSSS works by taking a data stream of zeros and ones and modulating it with a second pattern, the chipping **sequence**. Various other electronic devices in a home, such as cordless phones, garage door openers, and microwave ovens, may use the same frequency range. Any such device can interfere with a WLAN, slowing down its performance and potentially breaking network connections (Perez, 1998; Theodore, 2001).
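
The modulation step can be illustrated with a toy example. The sketch below spreads each data bit over an 11-chip pattern by XOR and recovers it by majority vote; the chip pattern, the function names, and the majority-vote receiver are assumptions chosen for illustration and do not model a real 802.11 physical layer.

```python
# Assumed 11-chip pattern (Barker-style), used here only for illustration.
CHIPS = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]

def spread(bits, chips=CHIPS):
    """XOR every data bit with each chip: one bit becomes len(chips) chips."""
    return [bit ^ chip for bit in bits for chip in chips]

def despread(chip_stream, chips=CHIPS):
    """Recover data bits by comparing each block with the chip pattern."""
    n = len(chips)
    bits = []
    for k in range(0, len(chip_stream), n):
        block = chip_stream[k:k + n]
        # A 0 bit leaves the pattern unchanged; a 1 bit inverts it.
        differing = sum(c ^ p for c, p in zip(block, chips))
        bits.append(1 if differing > n // 2 else 0)
    return bits
```

Because each bit is carried by eleven chips, a few chips corrupted by interference still leave the majority intact, which is the basic robustness argument for spreading.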

Aligning closely related bacterial genomes (for instance strains of the same species) should be one of the simplest cases for genome aligners, since the genomes are of moderate size (generally 1 to 6 Mb) and divergence times are short. Nevertheless, we observed that even in such cases, some WGA tools fail to capture more divergent regions, which are left out of the **alignment**, or conversely, tend to include wrong alignments of unrelated regions that need to be filtered out in a post-processing step [15,16]. With the aim of addressing this issue, we designed a more sensitive method for the similarity detection phase and a strategy to avoid the inclusion of badly aligned regions. We implemented this strategy in a new whole genome aligner named YOC, designed for robust **pairwise** **alignment** of collinear bacterial genomes. YOC provides several improvements: the strategy is simplified compared to other anchor-based tools and little parameter tuning is needed. Moreover, its sensitivity makes it possible to align more distantly related bacterial genomes. We also analyzed the quality and the reliability of the resulting alignments, which were extensively evaluated on several bacterial datasets. To this end, we introduce a quantitative criterion, GRA-FIL, based on the GRAPe software [25], and applied it to benchmark several tools. We show that this criterion efficiently measures the unreliable parts of the alignments, thus enabling rapid comparison of the performance of different genome aligners.

In [3], Isokawa et al. presented a method to deal with multiple **sequence** **alignment** using a genetic **algorithm** (GA). In [4], Notredame et al. presented a method called SAGA for **sequence** **alignment** by genetic algorithms. It involved evolving a population of alignments in a quasi-evolutionary manner and gradually improving the fitness of the population by an objective function that measures the multiple **sequence** **alignment** quality. In [5], Stoye presented techniques of multiple **sequence** **alignment** using a divide-and-conquer method, where an increase in speed compared to optimal multiple **alignment** by dynamic programming can be guaranteed. In [6], Thompson et al. presented a method for improving the sensitivity of progressive multiple **sequence** **alignment** through **sequence** weighting, position-specific gap penalties and weight matrix choice.

Speedup (Sp) is defined as the ratio of the computation time on a single core to the computation time on C cores. The speedup ratios obtained for DSM and DSR with different numbers of cores and different chunk sizes are presented in Figure 9(a)–(e), respectively. These figures clearly show that our parallel Wavefront **algorithm** scales almost linearly. In Figure 9(a), the speedup line for DSR is higher than the line for DSM. The number of communications decreases from chunk size 2^14 to 2^10 (Figure 9(b) to 9(e)), and hence the time required for internal computation increases. The elevation of the line for both datasets at the most time-consuming chunk size confirms the suitability of our parallel Wavefront **algorithm** for huge datasets with large chunk sizes. The speedup values for both datasets in Figure 9(a)–(e) are almost the same up to 8 processors. The separation of the lines at 16 processors can be explained by an unsustainable balance between locally computed data and communicated data. There is comparatively less communication for 8 cores and comparatively less computation for 16 cores. This situation changes beyond 16 cores for all chunk sizes, where the lines at each processor count do not match. The
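
The speedup definition translates directly into a small calculation. In the sketch below, the function names and the timing values in the usage note are hypothetical and are not measurements from the paper; parallel efficiency (speedup divided by core count) is added as a common companion metric that the text itself does not define.

```python
def speedup(t_single, t_parallel):
    """Sp = time on one core / time on C cores."""
    return t_single / t_parallel

def efficiency(t_single, t_parallel, cores):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup(t_single, t_parallel) / cores
```

With made-up timings of 120 s on one core and 15 s on 8 cores, `speedup(120.0, 15.0)` gives 8.0 and `efficiency(120.0, 15.0, 8)` gives 1.0, which is what "scales almost linearly" would look like in the ideal case.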

test. The outcome of the t-test is considered to be statistically significant if its p-value is less than 0.05. To avoid over-fitting, the **alignment** quality scores are obtained using a 3-fold cross-validation. In this procedure, each SABmark group is randomly partitioned into three non-overlapping sub-sets, with two sub-sets used for training and the remaining sub-set used for testing. The process is repeated three times, so that each sub-set is used for testing once. In the case of a general-purpose similarity matrix, “training” means optimizing gap penalties for this matrix. In the case of a similarity matrix specific for a SABmark group k, “training” means deriving the group-specific matrix itself and optimizing gap penalties for this matrix. “Testing” means using the matrix and optimized gap penalties obtained during the training step to align sequences from the test sub-set. During cross-validation, the gap initiation penalty a and the gap extension penalty b for a given group k and a given similarity matrix A are optimized using the following grid search procedure: sequences from the training set of group k are aligned using matrix A, and all possible combinations of integer gap penalties in the range 1 ≤ a ≤ 50, 1 ≤ b ≤ 30 are tested; the combination (a, b) that results in the highest average quality score is selected as the best and is used with matrix A to align sequences from the test set of group k.
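
The grid search over gap penalties can be sketched as an exhaustive loop. The `quality` callback below is a hypothetical stand-in for "align this training item with penalties (a, b) and return its quality score"; the function name and the tie-breaking rule (first best combination wins) are also assumptions.

```python
from itertools import product

def grid_search_gap_penalties(training_items, quality,
                              a_values=range(1, 51), b_values=range(1, 31)):
    """Return ((a, b), average_quality) for the best integer gap penalties.

    `quality(item, a, b)` is a hypothetical scoring callback supplied by
    the caller; it is not part of any real library.
    """
    best_params, best_avg = None, float("-inf")
    for a, b in product(a_values, b_values):
        avg = sum(quality(item, a, b) for item in training_items) / len(training_items)
        if avg > best_avg:
            best_params, best_avg = (a, b), avg
    return best_params, best_avg
```

The default ranges give 50 × 30 = 1500 combinations, each requiring every training item to be aligned once, so the alignment step dominates the cost of the search.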

Each **algorithm** adopting a progressive or iterative approach produces mistakes in multiple **sequence** **alignment**. We therefore used refinement algorithms to correct badly aligned residues, which can improve the quality of the multiple **alignment** by improving its scores. All refinement algorithms apply a set of modifications to an initial multiple **sequence** **alignment** in order to construct a new one with better scores than the previous **alignment**. These modifications are repeated until convergence (i.e. no improvement can be made on the current **alignment**). There are different algorithms for refinement of multiple **sequence** alignments:

Deoxyribonucleic acid (DNA) molecules are information macromolecules which form the blueprint for life on earth, and each strand of the famous double helix is a linear combination of the polymerized nucleotides (bases) adenine (A), guanine (G), cytosine (C), and thymine (T). Therefore, one DNA **sequence** can be treated as a string of characters over a four-character set Σ = {A, C, G, T}. Thus, each DNA **sequence** S ∈ Σ* [1].