Sequence assembly

Top PDF Sequence assembly:

Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

In this paper, we propose a new algorithm for DNA sequence assembly that uses a different strategy from previous methods. Based on preliminary investigations, our method promises to be very fast and practical for DNA sequence assembly. Our algorithm takes advantage of several key features of the sequence data. First, the redundancy, or depth of coverage c, is often much larger than 2. Thus, when only pairwise comparisons are considered, information about multiple overlaps of fragments is ignored, which complicates the reconstruction process. Second, the fragments are sequenced with errors but are usually 98% accurate. Therefore, there are very likely to be long stretches with no errors. In considering the basic data for shotgun sequencing, we have come to a very distinct strategy that takes advantage of these properties. Before we describe our strategy, we will give a sketch of the computer science associated with what seems at first to be an entirely different problem, sequencing by hybridization.
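
The claim that 98% accurate fragments are likely to contain long error-free stretches can be checked with a little arithmetic. The sketch below is not from the paper: it assumes that errors occur independently at each base, which is a simplification, and the function names are hypothetical.

```python
def error_free_probability(accuracy: float, length: int) -> float:
    """Probability that `length` consecutive bases are all correct,
    assuming (as a simplification) independent per-base errors."""
    return accuracy ** length


def expected_error_free_run(accuracy: float) -> float:
    """Expected number of correct bases before the first error
    (mean of a geometric distribution counting successes before a failure)."""
    return accuracy / (1.0 - accuracy)


# Illustrative values for the 98% accuracy quoted in the snippet.
p20 = error_free_probability(0.98, 20)   # roughly 0.67
mean_run = expected_error_free_run(0.98)  # roughly 49 bases
```

Under this assumption, about two out of three 20-base windows contain no error at all, which is why exact matching of short words can be a useful starting point.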

GPU-Euler: Sequence Assembly using GPGPU

The Eulerian-based de novo methods have always been widely used, and were inspired by the sequencing-by-hybridization approach [16, 28]. These algorithms represent each read by its set of k-mers (smaller subsequences) and construct a de Bruijn graph. A de Bruijn graph is a directed graph where vertices are k-mers, and there exists an edge between two vertices if there is an overlapping subsequence of length (k − 1) between them. Finding the Eulerian path or tour, in which each edge of the de Bruijn graph is visited exactly once, leads to the sequence assembly solution. Before performing the Eulerian tour, these approaches use different heuristics to remove from the de Bruijn graph the nodes and edges that are created by sequencing errors and by repeat regions within the genome. Myers presented another graph-oriented approach based on the notion of a bi-directed string graph [24]. A bi-directed string graph has a direction associated with both end points of an edge, produced by modeling the forward and reverse orientation of sequence reads. The Eulerian tour of such a graph enforces additional constraints that lead to improved accuracy and length of the produced sequence contigs.
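
The de Bruijn construction described above can be sketched in a few lines. This is a minimal illustration, not GPU-Euler's implementation: it assumes error-free reads from a single strand, adds one edge per distinct (k+1)-mer observed in a read (an assumption about how edges are derived), uses Hierholzer's algorithm for the Eulerian path, and skips the error and repeat heuristics entirely. All function names and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Vertices are k-mers. Each distinct (k+1)-mer seen in a read adds one
    edge from its k-mer prefix to its k-mer suffix (they overlap by k-1 bases)."""
    words = set()
    for read in reads:
        for i in range(len(read) - k):
            words.add(read[i:i + k + 1])
    graph = defaultdict(list)
    for word in sorted(words):
        graph[word[:-1]].append(word[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once.
    Assumes the graph is non-empty and actually has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the vertex with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Concatenate the first k-mer with the last base of each following k-mer."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # toy, error-free reads
contig = spell(eulerian_path(build_de_bruijn(reads, k=3)))
```

On real data the graph contains tips, bubbles and repeat-induced branches, so the Eulerian path is generally not unique and the cleaning heuristics mentioned in the snippet become essential.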

Scaling short read de novo DNA sequence assembly to gigabase genomes

Controversially, scientists have begun exploring the use of these short-read technologies for de novo genome sequencing of large eukaryotic organisms, where the genomic sequence is determined solely from the short reads without using a reference genome as a guide, instigating debate similar to that which arose when the Human Genome Project strove to use Sanger sequencing to de novo sequence the human genome. In particular, it is still unclear whether short reads can successfully and reliably enable sequence assembly algorithms to reconstruct the original genomic sequence of complex organisms from the set of reads alone, as the short read length inherently limits the specificity of the location in the genome the read was sampled from; longer reads are desired because eukaryotic genomes contain repeated sub-sequences that are often longer than a single short read, making disambiguation difficult or impossible. To mitigate this factor, however, pairs of reads that are separated by a statistically known distance are used to virtually extend the length of the read beyond that of most repeats.

Using SWARM service for a GRID based Sequence Assembly

EST sequence assembly is the process of assembling the Expressed Sequence Tags [1] of an organism into contigs and then predicting gene functions from them. At the beginning of an EST project, the starting material for the construction of the cDNA library [9] is selected. This can be cells, tissues or even whole organisms. From this, the messenger RNAs are isolated. mRNAs are highly unstable, so they are reverse transcribed into a relatively more stable form called complementary DNA, or cDNA. The cDNA has to be amplified to form a cDNA library. This is accomplished by cloning the cDNA into plasmid vectors. The plasmids are amplified by transforming the bacterium E. coli to generate the cDNA library. The cDNA library forms the basis for generating EST sequences. Usually the cloning of cDNA is done directionally, that is, it is known at which end of the vector the 5′ and 3′ ends of the cDNA are located. The cloned sequence can thus be sequenced from both ends simultaneously. The identified nucleotide sequence can be exported to a computer, and the raw data is then processed.

Sequence Assembly with CAFTOOLS

Genomic sequencing is now a semi-industrial process that is being increasingly automated. The amount of finished sequence produced in large centers worldwide more than doubles each year. This effort has required a huge investment in bioinformatics, and new software is under continual development both within these centers and in the wider academic community. High-throughput sequence assembly is a complicated multistep pipeline, using many pieces of software, and we as users want to be in a position to use the best set of software tools, even if this causes problems reconciling the various data formats they use. In addition, because more than one tool may be suitable for the same task (e.g., for manually editing sequence assemblies) we also want to offer alternatives within the same framework.

Improving the Quality of Automatic DNA Sequence Assembly using Fluorescent Trace-Data Classifications

If desired, a single class may be assigned to base trace-data by selecting peak or valley according to which has the higher sum of scores, and then strong, medium, or [r]

TRITEX: chromosome scale sequence assembly of Triticeae genomes with open source tools

suggesting fewer chimeric sequences. However, the size distribution of the elements in the Morex V2 assembly indicates a large population of overly large full-length elements (Additional file 1: Table S6). In contrast, the size distribution of full-length elements in Morex V1 is narrower and shows two characteristic peaks corresponding to the autonomous and non-autonomous subfamilies (T. Wicker, unpublished results). Manual inspection of 50 randomly selected elements between 9900 and 10,000 bp in length showed that the large sizes of these elements are mainly due to large sequence gaps (i.e., long stretches of N’s). In the 50 manually inspected copies, we found 70 sequence gaps in the internal domain and only 5 short gaps in the LTRs. The latter observation is not surprising as our method to identify full-length copies relied on largely gap-free LTRs. In only three cases, the large size of the element was caused by the genuine insertion of additional TEs. Overall, the Morex V2 assembly had more and larger gaps as TE length increased (Additional file 1: Table S6), a pattern that is absent from the Morex V1 assembly. In summary, the representation of repetitive sequence is similar in both assembly versions of the Morex genome. The longer read lengths and k-mer sizes used in the TRITEX pipeline may have resulted in a better representation of short tandem repeats in V2. However, the gap-free assembly of very recently inserted full-length TEs may benefit from prior complexity reduction such as BAC sequencing.

A gene expression microarray for Nicotiana benthamiana based on de novo transcriptome sequence assembly

DNA microarrays and high-throughput RNA sequencing (RNA-Seq) [4]. The latter technique directly reveals the sequence of transcripts and is becoming increasingly popular, as a result of continuous improvements in both the sequencing technology and the data analysis software. This increase has been marked by the development of sequencing centers and large consortia focused on specific organisms (Rice Genome Annotation Project, 1001 Arabidopsis Genomes Project, The Maize Genome Sequencing Consortium, to name just a few). These communities work on developing and standardizing protocols to facilitate the aggregation and comparison of various datasets. Current RNA-Seq applications involve assembly of the transcriptome, with or without the reference genome information, gene discovery and expression analysis, identification of unknown exon junctions and alternative transcripts, measuring allele-specific expression and many more [5–8]. In contrast, microarrays can only derive information on targets that are actually represented by the microarray probes and are sensitive to cross-hybridization, as well as displaying poor signal resolution and increased variation at low signal intensities [9, 10]. Despite these drawbacks, the results generated on microarray platforms are concordant with those obtained with RNA-Seq [11, 12]. Additionally, thousands of studies performed over the past decades proved that microarrays reflect the transcriptome composition with high fidelity and that they are a rich source of biologically valuable information. Since their introduction, microarrays have been effectively used in searching for disease markers [13], alternative splicing [14], gene function prediction [15], identification of transcriptionally active regions of the nuclear, mitochondrial and chloroplast genomes [16–18] and many other applications.
Microarray experiments are still much cheaper than RNA-Seq, not only regarding the price of consumables and reagents but also the computational and human resources required for data analysis and storage. The latter are often underestimated when calculating the real costs of high-throughput sequencing experiments [19]. Remarkably, extracting biological information from RNA-Seq data requires combining computational skills with deep knowledge of the problem of interest, typically through the close cooperation of experts in each of those fields. Therefore, sequencing-based experiments may pose a substantial challenge for individual laboratories. With the small size of the resulting datasets and the relatively easy data analysis, DNA microarrays are still an attractive alternative to RNA-Seq for a variety of studies, e.g., those focused on differential analysis of known genes in the conditions of study and time-course studies, where a large number of samples are to be processed and compared in a repeatable manner. We surveyed the gene expression profiling

A New Algorithm for DNA Sequence Assembly

First, for each fragment, we apply the hashing methods (Dumas and Ninio, 1982; Wilbur and Lipman, 1983; Pevzner and Waterman, 1995) to see where a fragment might align well to the in[r]

The use of a complexity model to facilitate in the selection of a fuel cell assembly sequence

To reduce the large workspace associated with products that have many components, several metaheuristic approaches have been extensively researched in the literature. Common methods include genetic algorithms (GA), ant colony optimisation (ACO), particle swarm optimisation (PSO), and simulated annealing (SA) [3]. These approaches do not guarantee the optimal solution, but have been considered successful. In general, these approaches transform the information in the graph, combine it with objectives such as minimising assembly direction changes and tool changes, and add constraints such as precedence, to form multi-criteria objectives that are solved to find the optimum. Common challenges ascribed to soft-computing metaheuristic approaches are high computational time, tedious data entry and premature convergence [3]. Many of the works present limited insight into the quality of the results and tend to discuss and conclude how a given approach makes headway in the aforementioned challenging areas.

A new approach to modularity in product development – utilising assembly sequence knowledge

In general, the above DFA approaches are applied at, and limited to, the detailed design phase. As a consequence, these approaches may require design changes after assembly analysis if certain design features are not suitable for assembly operations. This can lead to significant rework, including redesign, reanalysis through modelling and simulation, and even re-prototyping. Clearly, this rework will increase the product development cost and lead time. To address these deficiencies, the objective of this work is to start using assembly knowledge in the design process from the earliest stages of development and thus act on a product concept in a preliminary study phase. This will allow potential issues and negative consequences to be identified at an early design stage, so that any design decisions leading to significant negative assembly consequences can be eliminated early. The changes are then easier to manage at an early stage of development than at the end of the detailed study. Barnes et al. [17] propose an approach that generates assembly sequences in parallel to the design (before the end of a project) but on a product with a very high level of detail. They begin with the structure of parts to define the assembly sequence to the types of connections between parts.

An assembly sequence planning approach with a multi state gravitational search algorithm

The costs of assembly processes are determined by assembly plans. Assembly sequence planning, which is an important part of assembly process planning, plays an essential role in the manufacturing industry. Given a product-assembly model, assembly sequence planning (ASP) determines the sequence of component installation to shorten assembly time or save assembly costs [1]. ASP is regarded as a large-scale, highly constrained combinatorial optimization problem because it is nearly impossible to generate and evaluate all assembly sequences to obtain the optimal sequence, either with human interaction or through computer programs.
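
To make the combinatorial claim concrete, the following sketch enumerates every ordering of a small product and keeps those that respect precedence constraints. It is a brute-force illustration only, not the gravitational search algorithm of the paper; the part names and constraints are hypothetical.

```python
from itertools import permutations
from math import factorial


def feasible_sequences(parts, precedence):
    """Return every installation order in which, for each (a, b) in
    `precedence`, part a is installed before part b."""
    result = []
    for order in permutations(parts):
        position = {part: i for i, part in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedence):
            result.append(order)
    return result


# Hypothetical three-part product: the base must go in before the cover.
parts = ["base", "gear", "cover"]
orders = feasible_sequences(parts, {("base", "cover")})

# Exhaustive enumeration grows factorially with the number of parts.
candidates_for_20_parts = factorial(20)  # 2432902008176640000
```

With only 20 components there are already more than 2 × 10^18 candidate orderings, which is why heuristic search methods are used instead of enumeration.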

Importance of Basic Residues in the Nucleocapsid Sequence for Retrovirus Gag Assembly and Complementation Rescue

If MLV Gag molecules truly interact only at the plasma membrane, then any MLV Gag protein with a mutant L domain, which would be targeted to the membrane but otherwise be budding defective, should be capable of being rescued into particles. Since the L domain of MLV has not been mapped, we decided to examine a derivative of BgM that contains a deletion of the RSV L domain. For this, we made use of the RSV Gag mutant T10C.PR2 (Fig. 5A), which lacks amino acids 122 to 336, including the entire L domain, but retains the M and I domains (34). As shown in Fig. 5D (lanes 3), this molecule is budding incompetent but is rescuable by full-length RSV Gag (Fig. 5D, lanes 5). This well-characterized RSV L-domain mutation was introduced into BgM to create chimera T10M.PR2 (lanes 4). When expressed by itself, this recombinant, like T10C.PR2, was unable to bud from the cell (Fig. 5D, lanes 4). The presence of the strong membrane-binding sequence of Src and the high proteolytic activity of the PR1 form (data not shown) suggest that T10M is targeted to the plasma membrane. When coexpressed with an assembly-competent molecule (M.M1.PR2), T10M.PR2 was readily rescued into particles (Fig. 5D, lanes 6), which further indicates that it is not severely disrupted by the T10 deletion. This evidence supports the idea that interactions among MLV Gag proteins occur after the molecules are targeted to, and concentrated on, the plasma membrane (see Discussion).

A framework for automatically realizing assembly sequence changes in a virtual manufacturing environment

can be found in the literature, with each researcher choosing different areas to focus on and differing ontological structures to meet the requirements of their case. Lohse presented the ONTOMAS framework to reduce assembly system design effort using domain ontologies and implementing a function-behavior-structure paradigm to capture the characteristics of modular assembly system equipment [3]. A similar abstraction approach was proposed by Hui et al., who used semantic objects to retrieve information from documents of various formats and, by inference, allowed domain-specific tools to become better integrated [15]. Lanz used feature-based modelling to capture detailed product knowledge, categorizing features into geometric and non-geometric, to provide knowledge for a holonic manufacturing system [16]. Raza and Harrison described a collaborative production line planning approach supported by knowledge management theory [17]. A service-oriented architecture supported by semantic web services, which allowed automatic discovery and execution of assembly processes by modelling and mapping assembly processes and systems, was proposed in [18]. An influential architecture for integrating the PPR domains is the Virtual Factory Framework (VFF), a data model that links and stores knowledge to support engineering concurrency in the resource domain [19], but that does not have the granularity to model system control logic. More recently, knowledge-based mapping has been used to support the selection of function blocks for manufacturing resource components [20], and Ramis et al. [21] showed how product requirements could be translated directly through to dynamically changing programmable controller logic. Chen et al. extended EAST-ADL (a language developed to model automotive electronic systems, see [22]) to model production systems using MetaEdit+ [23].
Mapping within and between the concepts of Equipment, Process and Product was achieved through the EAST-ADL feature links.

Exploring the sequence space for (tri-)peptide self-assembly to design and discover new hydrogels

Abstract. Peptides that self-assemble into nanostructures are of tremendous interest for biological, medical, photonic and nanotechnological applications. The enormous sequence space that is available from 20 amino acids likely harbours many interesting candidates, but it is currently not possible to predict supramolecular behaviour from sequence alone. Here, we demonstrate computational tools to screen for the aqueous self-assembly propensity in all of the 8,000 possible tripeptides, and evaluate these by comparison with known examples. We applied filters to select for candidates that simultaneously optimize the apparently contradicting requirements of aggregation propensity and hydrophilicity, which resulted in a set of design rules for self-assembling sequences. A number of peptides were subsequently synthesized and characterised, including the first reported tripeptides that are able to form a hydrogel at neutral pH. These tools, which enable the peptide sequence space to be searched for supramolecular properties, enable minimalistic peptide nanotechnology to deliver on its promise.
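
The figure of 8,000 tripeptides follows directly from 20 amino acids at each of three positions (20^3). The short sketch below only enumerates that sequence space using one-letter amino acid codes; it does not reproduce the paper's screening or filtering tools, and the function name is hypothetical.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_peptides(length):
    """Enumerate every peptide sequence of the given length."""
    return ["".join(residues) for residues in product(AMINO_ACIDS, repeat=length)]


tripeptides = all_peptides(3)
n_tripeptides = len(tripeptides)  # 20 ** 3
```

Any scoring function for aggregation propensity or hydrophilicity would then be applied to each entry of this list; those scores are outside the scope of this sketch.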

Integrated product relationships management : a model to enable concurrent product design and assembly sequence planning

According to the above approaches, a lack of associativity in PLM systems was highlighted by Tremblay et al. (2006), where only a ‘parent–child’ (i.e. ‘is part of’ class) relationship exists. For a large-scale company, the management of the relative positions of parts using matrices is implemented in PDM systems in order to be more closely related to the geometric models embedded in CAD systems, and to facilitate change management and part positioning. Weber et al. (2003) proposed an advanced PDM system based on a property-driven development/design approach by introducing the handling of predicted engineering characteristics (i.e. structure, shape, and material) and properties (i.e. the product’s behaviour) of the product, with their interdependencies, in a separate manner. However, information related to product relationships and assembly process engineering is not effectively treated in their proposal. More recently, PLM systems have moved towards Web-based and Web-service technologies, in order to facilitate information exchange and access in distributed and extended enterprises (Huang et al. 1999, Liu and Xu 2001, Georgiev et al. 2007). An additional effort towards ontology and the semantic Web can also be found (Matsokis and Kiritsis 2010). According to the above applications and approaches, a lack of support for associativity among product models using product relationships still exists and is a barrier to effective and integrated lifecycle-oriented design (Tremblay et al. 2006, Sy and Mascle 2011).

Physical mapping and nucleotide sequence of a herpes simplex virus type 1 gene required for capsid assembly.

In this report, we describe some phenotypic properties of a temperature-sensitive mutant of herpes simplex type 1 H[r]

Chromosomal-level assembly of the Asian seabass genome using long sequence reads and multi-layered scaffolding

S5 Fig. Comparison of the GC content of the Asian seabass genome assembly (v2) with a few selected fish genomes (A), with representatives from the different classes of vertebrates (B), and comparison of GC content with genome size of selected fishes (C). The GC content of the genomes of interest was calculated using a 20 kb sliding window (BedTools utilities [145]). In addition to Lates calcarifer, the genomes analyzed included (A) six teleosts (Danio rerio, Gadus morhua, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis) or (B) six vertebrates (Anolis carolinensis, Callorhinchus milii, Gallus gallus, Homo sapiens, Petromyzon marinus, and Xenopus tropicalis). Sliding windows with more than 25% of Ns (gaps) were discarded, and the proportion of sliding windows with a given GC content (%) was calculated and plotted. The script utilized to run BedTools [145] and perform downstream processing is available at https://github.com/ramadatta/Scripts/blob/master/Average_GC_Content_Analysis/knowGC-contentrun1.sh. (C) Genome size of selected fish genomes compared with their average GC content. BP: Boleophthalmus pectinirostris; DR: Danio rerio; GM: Gadus morhua; GA: Gasterosteus aculeatus; LC: Lates calcarifer; NB: Neolamprologus brichardi; OL: Oryzias latipes; ON: Oreochromis niloticus; TR: Takifugu rubripes; TN: Tetraodon nigroviridis. (TIF)
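
The windowed GC calculation described in the caption can be sketched in plain Python. This is not the authors' BedTools script: the function name is hypothetical, the windows are non-overlapping unless a step is given, and computing GC over the non-N bases only is an assumption, since the caption does not say how Ns inside retained windows were handled.

```python
def gc_windows(seq, window=20_000, step=None, max_n_fraction=0.25):
    """GC content (%) per window, skipping windows with too many Ns.

    Assumption: GC is computed over called (non-N) bases only.
    """
    step = step or window  # default: adjacent, non-overlapping windows
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        n_count = chunk.count("N")
        if n_count / window > max_n_fraction:
            continue  # discard gap-rich window, as in the caption
        called = window - n_count
        if called == 0:
            continue  # guard against division by zero for all-N windows
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / called)
    return values


# Toy sequence with a window of 4 bases instead of 20 kb.
toy = "GGCC" + "ATAT" + "NNNA" + "GCNA"
toy_gc = gc_windows(toy, window=4)
```

In the toy example the third window is dropped because 75% of it is N, while the fourth (25% N) is kept, since only windows with more than 25% Ns are discarded.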

Programmed assembly of polymer-DNA conjugate nanoparticles with optical readout and sequence-specific activation of biorecognition

tation may have important applications in drug delivery, e.g. controlled targeting in the presence of a therapeutic nucleic acid sequence. In addition, sensing applications can be envisioned for systems that, like the FRET pair shown herein, can modulate their spectral properties in a programmable way, allowing for in vitro and in vivo monitoring of unshielding. Finally, the flexibility in design and synthesis offered by nucleic acid based materials, combined with the opportunity to tailor polymers and ligands for specific biomedical tasks, suggests that materials of this type may prove useful in personalized diagnostics and patient-group stratified therapeutics.

Exploiting orthology and de novo transcriptome assembly to refine target sequence information

The paper is structured as follows: We first present an estimate of how many sequences might benefit from our refinement approach in five pharmaceutical model species. Next, we validate the general idea of refining poorly annotated protein sequences by aligning the known protein sequences from human to de novo assemblies from three tissue-specific transcriptomes (brain, liver and kidney) of these species. For this purpose, we use the 20,350 manually reviewed human protein sequences in UniProtKB/Swiss-Prot (hereafter referred to as “known human protein sequences”) as reference sequences. The Swiss-Prot subset of the UniProtKB/Swiss-Prot database [2] is probably the most comprehensive resource for curated protein sequences. The number of human entries in this database has been quite stable for almost a decade, indicating that most human proteins are known. We generalise the approach used during validation of the general idea with an automated sequence refinement workflow implemented in the a&o-tool and show an example application. For our analyses we used both publicly available data (mouse, rat, dog, pig and human) and newly generated paired-end RNA-Seq data (cynomolgus monkey).