Top PDF Serial and parallel implementation of Needleman-Wunsch algorithm

Serial and parallel implementation of Needleman-Wunsch algorithm

The Needleman-Wunsch dynamic programming algorithm measures the similarity of a pair of sequences and finds the optimal pair among a given number of sequences. The task becomes nontrivial as the number of sequences to compare or the length of the sequences increases. This research aims to parallelize the computation involved in the algorithm using CUDA in order to reduce computation time. However, the dynamic programming recurrence introduces a data dependency: each cell depends on previously computed neighbouring cells. As a solution, this research introduces the heterogeneous anti-diagonal approach, which benefits from the interaction between the serial implementation on the CPU and the parallel implementation on the GPU. We then measure and compare the computation time of the proposed approach and of a straightforward serial approach that uses the CPU only. Computation times are measured under the same experimental setup, using various sequence pairs of different lengths. The experiment showed that the proposed approach is approximately three times faster than the serial method. Moreover, the computation time of the proposed heterogeneous anti-diagonal approach increases gradually despite large increments in sequence length, whereas the computation time of the serial approach grows rapidly.
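The anti-diagonal idea can be illustrated with a minimal Python sketch. It is not the paper's CUDA implementation: the scoring values (match = 1, mismatch = -1, gap = -1) and the function names are assumptions chosen for illustration. Every cell on the anti-diagonal i + j = d depends only on cells from anti-diagonals d - 1 and d - 2, so the inner loop over one anti-diagonal has no internal dependency and is the part a GPU could compute in parallel.

```python
def _init_matrix(rows, cols, gap):
    """Create the score matrix with gap penalties on the first row and column."""
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    return H


def nw_score_rowwise(a, b, match=1, mismatch=-1, gap=-1):
    """Serial reference: fill the matrix row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    H = _init_matrix(rows, cols, gap)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[-1][-1]


def nw_score_antidiagonal(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the matrix one anti-diagonal (i + j = d) at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    H = _init_matrix(rows, cols, gap)
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        # Cells in this loop are mutually independent; a GPU kernel could
        # assign one thread per cell (assumption, not the paper's code).
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[-1][-1]
```

Both functions compute the same score; only the traversal order differs. Short anti-diagonals near the matrix corners contain few cells, which is one plausible reason for handing them to the CPU in a heterogeneous scheme.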

Some Computational Results on MPI Parallel Implementation of Derived Subgraph Algorithm

In this section, we introduce a serial derived subgraphs algorithm, SDSA [6], which calculates the number of derived subgraphs for a given graph G. The algorithm also determines the residual and non-residual edges. The parameters of the algorithm are:

Implementation of Radix 4 Multiplier with a Parallel MAC unit using MBE Algorithm

Our project gives a clear picture of the different multipliers. We found that parallel multipliers are a much better option than serial multipliers. We concluded this from the results for power consumption and total area. For parallel multipliers, the total area is much less than that of serial multipliers, and hence the power consumption is also lower. This is clearly depicted in our results. This speeds up the calculation and makes the system faster.

DECIPHERING THE SEQUENCE ALIGNMENT BY NEEDLEMAN-WUNSCH ALGORITHM ON TO REDUCE COMPUTATIONAL TIME VIA HIGH PERFORMANCE COMPUTING

The matrices are global and are accessible to the CPUs on a grid. While implementing the above parallel Needleman-Wunsch algorithm using the Alchemi framework [14], we faced the problem of increased network traffic. For small matrices the overhead is not significant. However, with typical sizes of DNA sequences, the network traffic overhead has to be reduced. To handle these problems, the following two formulas were used:

Pairwise Sequence Alignment between HBV and HCC Using Modified Needleman Wunsch Algorithm

Chen [8]. Furthermore, the performance of dynamic programming has been improved by using shared memory to speed up the alignment process. The dynamic programming method for sequence alignment has been developed on a shared-memory system using four different data partitioning schemes: blocked columnwise, rowwise, antidiagonal, and revised blocked columnwise [9]. Other research uses parallel computing on clusters of computers, known as distributed memory. That research used the star algorithm in a parallel environment, using MPI to distribute the computation of DNA multiple sequence alignment [10].

A SHARED MEMORY BASED IMPLEMENTATION OF NEEDLEMAN WUNSCH ALGORITHM USING SKEWING TRANSFORMATION

An efficient GPU-based implementation of multiple sequence alignment is given by Liu et al. [14]. They reformulated the compute-intensive stage of CLUSTAL-W so that it suits the GPU architecture. It involves parallelizing the Needleman-Wunsch algorithm. An efficient implementation of the Needleman-Wunsch algorithm on a graphics processing unit is also presented in [15]. Our approach differs from the one presented in [15] in its use of lock-free and lock-based approaches for block synchronization on the GPU. Our approach to parallelizing the Needleman-Wunsch algorithm also differs in applying a skewing transformation to the original data access pattern to expose the inherent parallelism in the code.

FPGA Implementation of Serial and Parallel FIR Filters by using Vedic and Wallace tree Multiplier

a) Serial architecture of the microprogrammed FIR filter: The serial architecture of the N-tap microprogrammed FIR filter is shown in Fig. 2. It basically comprises an MCU and a datapath unit. The MCU consists of a microprogram counter and microprogram memory. The datapath unit comprises 2N data (X) and coefficient (W) registers, an M-to-N decoder (M = log2 N), two N-input multiplexers for selecting the data and coefficients, a multiplier and an adder, a two-input multiplexer to control the flow of data from the multiplier or accumulator, one 16-bit accumulator, and a 16-bit register to latch the data.

IMPLEMENTATION OF PARALLEL ARTIFICIAL BEE COLONY ALGORITHM ON VEHICLE ROUTING PROBLEM

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. The majority of processors today have multiple cores, and even on a single core multiple threads can be implemented. In general, a system of n parallel processors, each of speed k, is less efficient than one processor of speed n * k. However, the parallel system is usually much cheaper to build and its power consumption is significantly smaller. [2][10]

5.1 Independent parallel runs approach

Implementation of Parallel Multiplier using Advanced Modified Booth Encoding Algorithm

Silver Oak College of Engineering & Technology (SOCET), Gujarat Technological University, Ahmedabad, India.

Abstract - This research paper presents the implementation of an Advanced Modified Booth Encoding (AMBE) parallel multiplier. The existing Booth and Baugh-Wooley multipliers are used only for signed numbers, while array multipliers are used only for unsigned numbers. Modern computer systems need a very high-speed parallel multiplier that can be used for both signed and unsigned numbers. This multiplier is obtained by extending a sign bit from the Modified Booth Encoder and generating an additional partial product; the proposed multiplier can be used for both signed and unsigned operands. A Carry Save Adder (CSA) tree and a Carry Look Ahead (CLA) adder are used to add all partial products and generate the final product. Because this multiplier handles both signed and unsigned numbers, total chip area and power are reduced. The AMBE parallel multiplier, for 8 x 8-bit and 64 x 64-bit signed-unsigned operands, is simulated in Verilog HDL using the Xilinx ISE 13.2 simulator and implemented on a Spartan 3E starter board.

Dissociating Parallel and Serial Processing of Numerical Value

data also confirm slower RTs for the serial search compared to the parallel search. This is expected because the salient target in the parallel search captures participants’ attention, whereas according to the classical view, in the serial search participants would have to scan each item in the display until finding the target (Duncan & Humphreys, 1989; Treisman & Gelade, 1980). Overall, RT patterns for the ensemble task did not increase with increased display size as they did in serial search, but in fact stayed the same or decreased, suggesting that participants did not need to examine each digit individually in order to incorporate it into a representation of the average. Rather, participants were able to extract information from items across the entire display in parallel. As discussed above, in Experiment 1, the reduction in RTs for larger displays in the ensemble task was driven by the greater-than-five condition; RTs in the less-than-five condition remained consistent across display sizes. Furthermore, in Experiment 1, accuracy increased with display size for only the greater-than-five condition. In addition to possible contribution of the brightness difference between conditions, it may be that participants showed a sample size bias such that they were more likely to respond that the mean is greater than five when there were more items in the display. Evidence for sample size biases whereby larger set sizes yield larger average estimations has been demonstrated in a task where participants were asked to provide numerical estimates of the mean of set of digits (Smith & Price, 2010). In our experiment, such a bias would be expected to also result in proportionately lower accuracy with increasing display size in the less-than-five condition; in Experiment 1, there was a non-significant trend in this direction, however in Experiment 2, the pattern of increased accuracy across display sizes did not differ based

Implementation and Evaluation of a Parallel Algorithm for Structure Learning in Bayesian Networks

robust MPI library in C. BDeu is chosen as the scoring function, as it is also used in Tamada et al. [TIM11]. By default BDeu produces a score to be maximized, and we multiply it by −1 to turn it into a score to be minimized. An uninformative value of 1 is used as the ESS in BDeu. Gathering counts efficiently from the dataset is an important task, and it is done using an AD-tree [ML98]. Distributed data communication is quite intense, especially in Para-OS: processors have to send queries and replies to each other. To speed up the communication, an algorithm is employed to decide the processor sender-receiver pairs during communication. We did not implement score pruning in Para-OS because local scores are stored in a highly dispersed way; score pruning would either introduce more data communication between processors or require changing the algorithm.

Parallel Matrix Implementation of an Integer Division Algorithm Using FPGA

With “Altium”, the schematic can be drawn hierarchically, using graphical symbols already available in generic libraries [7] for almost all frequently used hardware components. This integrated environment has the major advantage of maintaining an optimal implementation through background use of the vendor-specific tools, which is entirely transparent, so the designer can focus on the project itself rather than on learning the tools [8].

Serial to Parallel Code Converter Tools: A Review

developed as a platform for research on compiler techniques for high-performance machines. It is flexible, modular, powerful, and complete enough to compile benchmark programs. SUIF has been successfully used to perform research on various concepts including loop transformation, array data dependence analysis, software prefetching, scalar optimizations and instruction scheduling. The SUIF system includes a parallelizer that automatically searches for parallel loops and generates corresponding parallel code. To support parallelization, the system supplies many features such as reduction variable recognition and data dependence analysis [20].

Parallel Implementation of the Gauss Seidel Algorithm on k Ary n Cube Machine

Most of the results show that the time spent on communication between processors limits the speed of parallel computation. Motivated by this fact, and to overcome this problem, we use the k-ary n-cube machine to change the interconnection network topology of parallel computing, and develop a cluster-based Gauss-Seidel algorithm that is suitable for parallel computing. A generic approach for the method will be developed and implemented. Execution time prediction models will also be presented and verified.

Development of a 6 DOF Parallel Serial Hybrid Manipulator

The new design is based on a modular approach that provides a large reachable dextrous workspace along with the desired rigidity and positional accuracy for diverse applications. For this purpose, a simple hybrid parallel-serial manipulator is proposed after a design and feasibility study. In this system, a serial arm is coupled with a parallel platform to provide the base motion. The development of the hybrid manipulator system covers mechanical system design, system dynamic modelling and simulation, design optimization, trajectory generation and control system design. The different aspects of the hybrid manipulator motion control scheme are shown in Figure 5. These aspects of manipulation and control include

GPU Implementation of Parallel Support Vector Machine Algorithm with Applications to Intruder Detection

SVM training and testing processes involve matrix operations such as multiplication, cumulative sum and seeking extreme value. The computational complexities of these operations are proportional to the size of data sets. GSVM algorithm uses parallel reduction methods to optimize these calculation processes. The basic idea of this method is described as follows. First, the data are divided into n parts and transferred to parallel computing nodes. Second, each computing node summarizes its data and executes corresponding operations, such as multiplication and addition. Finally, each parallel computing node transmits its operation results to the aggregation node for implementing the last operation.
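The three steps above (split, reduce locally, aggregate) can be sketched in Python as follows. This is only an illustration of the general parallel reduction pattern, not the GSVM code: the helper names `split` and `parallel_reduce` are hypothetical, and worker threads stand in for the parallel computing nodes (in CPython they show the structure rather than a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n):
    """Step 1: divide data into n nearly equal contiguous parts."""
    size, extra = divmod(len(data), n)
    parts, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_reduce(data, op, n=4):
    """Apply the reduction `op` (e.g. sum or max) per part, then once more on the partials."""
    parts = [p for p in split(data, n) if p]  # drop empty parts
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Step 2: each worker reduces its own part independently.
        partials = list(pool.map(op, parts))
    # Step 3: the aggregation node performs the last operation.
    return op(partials)
```

For example, `parallel_reduce(values, sum)` gives a total and `parallel_reduce(values, max)` an extreme value. The pattern is valid for associative operations whose result on the whole equals the result on the partial results.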

Robust Parallel Implementation of a Lanczos-based Algorithm for an Structured Electromagnetic Eigenvalue Problem

2.2.3. Main interval decomposition. As we have mentioned before, the shift-and-invert version of the Lanczos’ algorithm computes a subset of the spectrum centred in the shift point. The number of eigenvalues required will determine the number of iterations of the Lanczos’ algorithm and its spatial cost [7]. Obviously, we cannot apply the Lanczos’ algorithm to the main interval [α, β] where all the desired eigenvalues lie. The original problem should be split into many smaller ones to ensure the optimal performance of the Lanczos’ algorithm.

PARALLEL AND SERIAL CONTROL STRATEGIES OF IMAGE UNDERSTANDING

Many image processing tasks exhibit a high degree of data locality and parallelism and map quite readily to specialized massively parallel computing hardware. Parallel distribution of image files reduces the complexity and increases the capability of image enhancement. Image understanding and computer vision are two closely related multidisciplinary research fields concerned with the use of computer algorithms to modify or analyze digital images using signal and image processing, machine learning, and artificial intelligence techniques in order to achieve certain tasks or applications. One of the main goals of image understanding and computer vision is to duplicate the abilities of human vision by electronically perceiving and understanding an image. Image understanding algorithms are required to solve more practical applications. In this paper we present serial and parallel control strategies to handle digital images in an optimized way.

Title: Parallel Implementation of AES Algorithm on GPU

Advanced Encryption Standard (AES) is a variant of the Rijndael cipher algorithm, a symmetric block cipher that translates plain text into cipher text in blocks. The algorithm has a fixed input block size of 128 bits and a key size of 128, 192, or 256 bits. The input, the array of bytes A0, A1, ..., A15, is copied into the state array.
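The copy of the 16 input bytes into the state array can be sketched as below. The function names are hypothetical; the column-major layout (state row r, column c holds input byte r + 4c) follows the AES specification, so each group of four consecutive input bytes fills one column of the 4 x 4 state.

```python
def bytes_to_state(block):
    """Copy a 16-byte block into a 4x4 state array, column by column."""
    if len(block) != 16:
        raise ValueError("AES operates on 128-bit (16-byte) blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Inverse mapping: read the state column by column back into 16 bytes."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

Because every 16-byte block is mapped to its own state independently, blocks can be assigned to separate GPU threads in modes where blocks do not depend on each other.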

Implementation of Modified Booth Algorithm for Parallel MAC

A MAC is composed of an adder, a multiplier and an accumulator. The adders used are usually Carry-Select or Carry-Save adders, as speed is of utmost importance in DSP (Chandrakasan, Sheng, & Brodersen, 1992 and Weste & Harris, 3rd Ed). One implementation of the multiplier could be a parallel array multiplier. The inputs for the MAC are fetched from a memory location and fed to the multiplier block of the MAC, which performs the multiplication and passes the result to the adder, which accumulates the result and then stores it into a memory location. This entire process is to be achieved in a single clock cycle (Weste & Harris, 3rd Ed). The architecture of the MAC unit designed in this work consists of one 16-bit register, one 16-bit Modified Booth multiplier, and a 32-bit accumulator. To multiply the values of A and B, a Modified Booth multiplier is used instead of a conventional multiplier because it can increase the speed of the MAC unit design and reduce multiplication complexity. An SPST adder is used for the addition of partial products and a register is used for accumulation. The operation of the designed MAC unit is as in equation (6). The product Ai x Bi is always fed back into the 32-bit accumulator and then added to the next product Ai x Bi. This MAC unit is capable of multiplying and adding with previous product consecutively up to as many as times.
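A behavioural Python sketch of such a multiply-accumulate loop is given below. It is an illustration under stated assumptions, not the paper's hardware design: the function names are hypothetical, the multiplier uses standard radix-4 Modified Booth recoding, the SPST adder is not modelled, and wrapping the accumulator to 32 bits on overflow is an assumption. Equation (6) is not reproduced in the excerpt, so the sketch simply accumulates the sum of the products Ai x Bi.

```python
def booth_radix4_digits(b, bits=16):
    """Recode a signed `bits`-bit multiplier into radix-4 digits in {-2, -1, 0, 1, 2}."""
    u = b & ((1 << bits) - 1)  # two's-complement bit pattern
    digits, prev = [], 0       # prev is the implicit bit b[-1] = 0
    for k in range(0, bits, 2):
        b0 = (u >> k) & 1
        b1 = (u >> (k + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a, b, bits=16):
    """Sum the partial products d_k * a * 4**k (half as many as bit-by-bit)."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_radix4_digits(b, bits)))


def to_signed(x, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mac(pairs, bits=16, acc_bits=32):
    """Accumulate the products of (Ai, Bi) pairs in a 32-bit accumulator."""
    acc = 0
    for a, b in pairs:
        acc = to_signed(acc + booth_multiply(a, b, bits), acc_bits)
    return acc
```

The recoding produces bits / 2 partial products instead of bits, which is the source of the reduced multiplication complexity mentioned above.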