structure prediction methods

Top PDF structure prediction methods:

Ab initio Structure Prediction Methods for Battery Materials, pp. 103-118

The search for new thermodynamically stable materials (those favoured to form during synthesis, when kinetic factors are excluded) using CSP can take one of many approaches (14), but all involve a search for the lowest energy minimum in a high-dimensional configuration space. The configuration space for a periodic structure with N atoms per unit cell has dimension 3N+3, taking into consideration the rotational symmetries and unit-cell degrees of freedom, whilst the number of local minima in the space scales exponentially with N (15). Ideally, all low-lying minima would be sampled during CSP since metastable phases may be synthesised experimentally, or indeed be thermodynamically stable under different conditions; for example, graphite is the most stable allotrope of carbon under ambient conditions, but diamond can be easily synthesised under high pressure. Particularly popular approaches to CSP include the use of evolutionary algorithms to ‘breed’ new structures (15) and particle swarm optimisation (16–18).
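As a rough check on the 3N+3 figure quoted above, the degrees of freedom can be counted as follows. The bookkeeping convention used here (nine lattice-vector components, with rigid rotations and rigid translations of the whole crystal removed) is an assumption about how the count is obtained; the excerpt itself only states the result.

```latex
\begin{align*}
\dim &= \underbrace{3N}_{\text{atomic coordinates}}
      + \underbrace{9}_{\text{lattice-vector components}}
      - \underbrace{3}_{\text{rigid rotations}}
      - \underbrace{3}_{\text{rigid translations}} \\
     &= 3N + 3 .
\end{align*}
```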

Application of Bioinformatics on Protein Structure Prediction

Protein tertiary structure prediction has been an important scientific problem for a few decades, especially in bioinformatics and computational biology (Eisenhaber et al., 1995). Although more and more native structures are being deposited in the Protein Data Bank (PDB), the gap between sequenced proteins and solved native structures is still widening because of the exponential increase in protein sequences produced by large-scale genome and transcriptome sequencing. It is estimated that <1% of protein sequences have native structures in the PDB (Rigden et al., 2009). Therefore, accurate computational methods for protein tertiary structure prediction that are much cheaper and faster than experimental structure determination techniques are needed to reduce this large sequence-structure gap. Furthermore, computational structure prediction methods are important for obtaining the structures of membrane proteins, which are hard to determine by experimental techniques such as X-ray crystallography (Yonath et al., 2011).

Evolutionary GRSA for Protein Structure Prediction

Because of the complexity of the problem and the long time needed to analyze all possible conformations (even for a small protein molecule, the high dimensionality of the search space makes the problem intractable [4]), only a tiny portion of protein sequences have experimentally solved three-dimensional structures. This fact has motivated further research into computational protein structure prediction methods. Different computational approaches for finding the three-dimensional structure have been proposed. The algorithms based on these strategies for solving the protein folding problem (PFP) search for structures in a huge space of possible solutions, and they can obtain several structures very close to the native structure. These computational strategies can be classified into three categories: (a) ab initio, (b) homology, and (c) threading [7]. Homology and threading methods use information from known protein structures to find a solution; in contrast, ab initio uses only the amino acid sequence, without additional structural information. Anfinsen (Nobel Prize in Chemistry, 1972) showed that a protein's native structure is determined by its amino acid sequence, which motivates the ab initio approach to PFP [1]. Ab initio is an interesting strategy for the following reasons: (a) many proteins have no homology with any protein whose native structure is known; (b) the other strategies give no information about why a protein adopts a certain structure; and (c) even when some proteins closely resemble other proteins, they adopt completely different structures [8]. In contrast, ab initio is grounded in physical concepts based on energy functions [9], so the problem can be modelled as an optimization problem. As a result, only predictions made with ab initio can be fully reliable. The algorithm proposed in this work belongs to the ab initio strategy.

Proteins 2D Structure Prediction from 1D Sequence by Signal Processing and Soft Computing Methods

of unique protein folds exist in nature and structure prediction of a target sequence can be performed by consulting a database of known folds and determining which fold model best fits the sequence. Both homology modeling and threading rely on the existence of known structures, and the disadvantage of such approaches is that accurate prediction relies on proteins of similar structure already being solved. Another approach, namely the ab initio techniques [13] or prediction from first principles, bases structure prediction on known biochemical and biophysical facts related to the proteins. In general these are computationally very expensive methods. Machine learning methods such as neural network and nearest neighbor techniques utilize a localized prediction methodology, in the sense that a window, typically of fewer than 20 amino acids, is presented to the prediction system with the aim of predicting secondary structure. However, local information accounts for only approximately 65% of secondary structure formation [8]. Therefore, prediction can potentially be improved by incorporating a more global prediction scheme [9]. Secondary structure prediction methods often employ neural networks (NNs) [14], SVMs [15], and hidden Markov models (HMMs) [16, 17]. Neural networks and SVMs utilize an encoding scheme to represent the amino acid residues by numerical vectors. In HMM methods, on the other hand, hidden states generate segments of amino acids that correspond to the non-overlapping secondary structure segments. There are two types of protein secondary structure prediction algorithms. A single-sequence algorithm does not use information about other similar proteins, so it should be suitable for a nonhomologous sequence with no sequence similarity to any other protein sequence. Algorithms of the other type explicitly use sequences of homologous proteins, which often have similar structures.
The accuracy (sensitivity) of the best current single-sequence prediction methods is below 70%. The prediction accuracy of the best prediction methods that employ information from multiple alignments is close to 82.0% [18].
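The windowed encoding described in this excerpt can be sketched as follows. This is a minimal illustration rather than any cited method's implementation: the function names, the 13-residue default window, the one-hot scheme, and the all-zero padding at the chain ends are assumptions chosen for clarity.

```python
# Illustrative sketch only: names, window size and padding scheme are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def one_hot(residue):
    """Return a 20-element 0/1 vector; all zeros for padding or unknown residues."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, window=13):
    """Build one flat feature vector per residue from a window centred on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    # '-' marks positions that fall off either end of the chain.
    padded = "-" * half + sequence + "-" * half
    features = []
    for i in range(len(sequence)):
        vec = []
        for residue in padded[i:i + window]:
            vec.extend(one_hot(residue))
        features.append(vec)
    return features
```

Each residue then yields a vector of length 20 x window, which is the kind of numerical input a neural network or SVM classifier would consume.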

SPOT-Seq-RNA: Predicting protein-RNA complex structure and RNA-binding function by fold recognition and binding affinity prediction

probabilistic matching between sequence profiles generated from PSI-BLAST (15) for query and template sequences and between structural features of a template and those predicted by SPINE X (16–18) for a query sequence. Predicted structural features include secondary structure (17), backbone torsion angles (16), and residue solvent accessibility (18). For binding affinity prediction, we extracted a knowledge-based energy function, DRNA, from protein-RNA complex structures (19) based on a distance-scaled finite ideal-gas reference (DFIRE) state (20). The DFIRE reference state was found to be one of the best reference states for deriving knowledge-based energy functions for folding and binding studies (21, 22). While many template-based structure prediction methods and knowledge-based energy functions for protein-RNA interactions exist, the coupling between fold recognition by SPARKS X and binding affinity prediction by DRNA in SPOT-Seq-RNA provides the first dedicated high-resolution function prediction for RBPs.

ROSEFW RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem

Contact Map (CM) prediction is a bioinformatics (and specifically a protein structure prediction) classification task that is an ideal test case for a big data challenge for several reasons. As the next paragraphs will detail, CM data sets easily reach tens of millions of instances, hundreds (if not thousands) of attributes and have an extremely high class imbalance. In this section we describe in detail the steps for the creation of the data set used to train the CM prediction method of [26].

Predictive Modeling of Henry’s Law Constant in Chemical Structures Using LSSVM and ANFIS Algorithms

The literature contains some approaches that predict H of organic compounds in water directly from chemical structure. Additionally, a number of indirect approaches predict H from vapor-liquid equilibrium data, including the activity coefficient; however, their applicability to the prediction of H has not been rigorously assessed [5,6]. Consequently, in this paper we focus on the approaches that predict H directly. There are two main types of correlation for prediction

ASSESSMENT OF SOIL COLLAPSE PREDICTION METHODS

collapse, there has been comparatively little literature on the comparison of prediction criteria. Tables 2a and 2b present a quantitative evaluation of these existing methods using experimental data available in the literature. In these tables, the collapse criteria reviewed here are applied to several soils of different depositional histories. The tables include soils encountered by the authors and other data reported in the literature. It was noted that in most cases no single criterion accurately predicted the collapsibility of a particular soil. For example, for the criterion of Denisov [14], where the coefficient of susceptibility to soil collapse given by the corresponding expression should be less than 1, only 5 of the 16 reported soils were identified as collapsible. Furthermore:

Molecular cloning, phylogenetic analysis and structure prediction of amt1 from azolla anabaena azollae symbiotic system

Molecular Evolutionary Genetics Analysis (MEGA) software was developed for comparative analyses of DNA and protein sequences aimed at inferring the molecular evolutionary patterns of genes, genomes, and species over time (Kumar et al., 1994; Tamura et al., 2011). The phylogenetic analysis of the cloned AMT1 will be helpful in finding its relationship with other genes of the AMT1 family. A protein structure can be predicted from the amino acid sequence of the cloned AMT1 by structure prediction tools. The 3D structure of an unknown protein can be predicted using an experimentally determined protein template with high homology to the target protein. Comparative modelling is the most reliable and accurate protein structure prediction method (Baker and Sali, 2001). Protein structure prediction helps to infer the biological function and mechanism of action of an unknown protein (Khan et al., 2016). The RaptorX structure prediction server (Källberg et al., 2012; Peng and Xu, 2011) helps in predicting the 3D structure of protein sequences when homologs are lacking in the Protein Data Bank (PDB). When an input sequence is given, RaptorX predicts its secondary and tertiary structures, contacts, solvent accessibility, disordered regions and binding sites. RaptorX also assigns confidence scores to indicate the quality of a predicted 3D model: the P-value for the relative global quality, GDT (global distance test) and uGDT (un-normalized GDT) for the absolute global quality, and the modeling error for each residue.

Homology modeling a fast tool for drug discovery: Current perspectives

constructed homology models of Varicella Zoster virus thymidine kinase (VZV TK) based on the herpes simplex virus type 1 thymidine kinase (HSV-1 TK) structure as template. Acyclovir and ganciclovir were docked in the constructed model to investigate the predictivity of these models as well as the characteristics of the binding with other substrates. It was found that there are slight differences in the way VZV TK binds the substrates with respect to HSV-1 TK. Missing loops in the VZV TK were modeled using the loop search routine of SYBYL 6.8. The study suggested that these differences could be exploited in future ligand design to obtain more selective drugs. Li et al. [158] built homology models for a glycogen

Structure-Leveraged Methods in Breast Cancer Risk Prediction

Breast cancer is the most common non-skin malignancy affecting women, with approximately 1.67 million cases diagnosed annually worldwide (Ferlay et al., 2013). If an individual’s risk of breast cancer could be predicted, then screening, prevention, and treatment strategies could be targeted toward those women to maximize survival benefit and minimize harm. Risk prediction models are important tools to improve breast cancer care by leveraging multi-dimensional electronic health data. Traditional breast cancer risk prediction models use demographic risk factors to estimate breast cancer risk, but they demonstrate only limited discriminatory power. In clinical practice, mammography is the most common breast cancer screening test, and the only imaging modality supported by randomized trials demonstrating reduction in mortality rate. However, its effectiveness is not universally accepted (Freedman et al., 2004). Recent advances in genome-wide association studies (GWAS) have revitalized the quest for genetic variants (single-nucleotide polymorphisms, SNPs) in risk prediction. However, the optimism of these studies has been tempered by disappointment and caution (Gail, 2008, 2009; Wacholder et al., 2010).

Prediction of RNA 3D Structure

number of nucleotides present in the structure. It is very difficult to select the best structure among all possible kissing-pair formations. Hence, prediction is based on similarity. The concept is to find the sequence for the kissing pair. This dot-file sequence is searched for in the PDB, which contains more than 13600 sequences. If the dot file matches any PDB sequence 100%, the corresponding nucleotide sequence is checked. The sequence that gives the maximum similarity in nucleotides is selected as the most appropriate sequence. The algorithm Similarity therefore calculates similarity using dynamic programming.
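The excerpt does not say which dynamic-programming recurrence its Similarity algorithm uses. As a hedged illustration of the general idea, the sketch below scores two nucleotide strings with a standard global-alignment recurrence (Needleman-Wunsch style) and picks the best-scoring candidate. The function names and the match, mismatch and gap values are assumptions, not taken from the source.

```python
# Illustrative sketch: a generic global-alignment score, not the source's algorithm.
def similarity_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def most_similar(query, candidates):
    """Pick the candidate sequence with the highest similarity to the query."""
    return max(candidates, key=lambda c: similarity_score(query, c))
```

With these assumed scores, identical sequences of length n score n, and each gap or mismatch lowers the score, so ranking candidates by score mirrors the "maximum similarity in nucleotides" selection described in the excerpt.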

Annotation inconsistencies beyond sequence similarity-based function prediction – phylogeny and genome structure

The former case is a classical example of domain fusion without supporting evidence. We will focus on the latter case, whose annotation history can be traced. It is encoded by gene MAC_00341, which is predicted to contain two domains, the Nup75 domain at positions 244–898 and the aconitase domain at positions 900–1899: the linker sequence at positions 878–920 encodes for the C-terminal region of nucleoporin Nup75 – Figure S5 in [10]. There are no indications from any expression or short-read data that an aconitase domain follows – see also: Data Supplement 06 in [10]. Unfortunately, this mis-annotation has already propagated into other database entries since its original release in May 2010, in particular actual Nup75 homologs in other fungi, with GI numbers (date submitted): 531865436 (November 2012), 572277876 (December 2013), 597570643 (March 2014), 632915374 (April 2014), which do not appear to be homologous to aconitases, and yet they are characterized precisely as such in their description lines. While Pfam searches do not admit this description, the fact remains that the original entry is presented in domain architecture charts as a rare instance of the two domains joined into a single fusion protein. These cases should not only be treated differently deploying a number of community criteria to be agreed on, but literally blacklisted in automated function prediction (AFP) efforts. Thus, examining phylogenetic distributions of genes, proteins or protein families can also be expanded to encompass phylogenetic and genomic patterns to enhance the quality of annotation.

New product development resource forecasting

The New Product Development (NPD) environment is one in which forecasting resource demand is particularly challenging (Anderson Jr and Joglekar, 2005; Loch and Terwiesch, 2007; Loch and Terwiesch, 1998). In most environments the goal of planning is to reduce uncertainty about events and their outcomes. Inhibiting uncertainty in NPD narrows the potential for innovation, defeating the objective of developing something new. However, not everything is uncertain, and assumptions can be applied to the main types of activities that will be required and their likely outcomes (Kerzner, 2006). The problem of prediction is a complex one: multiple activities with multiple potential outcomes dictate proceeding activities, and multiple factors can impact the likelihood of each outcome. However sophisticated the planning tools in which the resource data is packaged, using an estimation-based approach to generate resource forecasts results in a number of issues (Hird 2013):

Machine Learning Methods for Diabetes Prediction

The appropriate performance measure for a model usually depends on the learning process, the techniques, and the type of data. Performance measures used in previous research include accuracy, sensitivity, specificity, Peirce skill score (PSS), Heidke skill score (HSS), AUC/ROC, precision, recall, kappa statistic, confusion matrix, mean square error (MSE), Matthews correlation coefficient (MCC), and more. In this study, we focus on accuracy because it is the most general measure and the one most researchers use. Performance evaluation is mostly used to justify a model when a new strategy improves its results, and to compare several models. However, for diabetes, prediction accuracy is needed not only when the model is well trained: the model must also be able to handle big data or EHR with consistent accuracy, reliability, and optimized computational time.
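Several of the measures listed in this excerpt can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name, the dictionary keys, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion-matrix counts."""

    def safe_div(num, den):
        # Assumption: report 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),  # also called recall
        "specificity": safe_div(tn, tn + fp),
        "precision": safe_div(tp, tp + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }
```

Accuracy alone can look high on imbalanced data, which is one reason measures such as sensitivity, specificity and MCC are often reported alongside it.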

Approaches to Prediction of Protein Structure: A Review

structure prediction largely depends upon the information available in the amino acid sequence. Evolutionary algorithms such as simple genetic algorithms (GA), messy GA, and fast messy GA have addressed this problem. The Support Vector Machine (SVM) represents a new approach to supervised pattern classification that has been successfully applied to a wide range of pattern recognition problems, including object recognition, speaker identification, and gene function prediction with microarray expression profiles. In these cases, the performance of SVM either matches or is considerably higher than that of traditional machine learning approaches, including neural networks. However, SVMs are still black-box models. The artificial neural network (ANN) is a good technique for protein structure prediction that relies on the sound theory of the back-propagation algorithm. Protein secondary structure prediction has been satisfactorily performed by machine learning techniques such as ANNs and SVMs. Most secondary structure prediction programs target alpha helix and beta sheet structures and group all other structures into the random coil pseudo-category. For the classification, the ANN is employed as a binary classifier.

Prediction and Modeling of the Structure of 16S rRNA

16S rRNA base-pairs (921-922)·(1395-1396) and (923-925)·(1391-1393), which are part of region 28, are unstable and an alternate arrangement, (921-923)·(1532-1534), is detected by psoralen photochemical crosslinking. Site-directed mutagenesis has been used to investigate whether changes in base-paired region 28 or the alternate secondary structure is responsible for the inactivity of the subunit. 30S subunits with the substitution C1533A or with deletion of nucleotides 1534 to 1542 can still be inactivated like the wild-type 30S subunit. On the other hand, 30S subunits that contain sequence changes in the 920 to 926 region show moderate to severe decreases in tRNA binding even under activating conditions. When 30S subunits containing these mutations were subjected to chemical probing, they failed to show the normal hyper-reactivity of nucleotide G926 and, instead, reactivity was shifted to G925 or to G928 and G929. Two mutations in the 920 region result in structures in which A1394 is base-paired rather than being unpaired as normal; deletion but not substitution of A1394 resulted in loss of tRNA binding activity and depression of the reactivity of G926. Mutations were made to insert or delete a nucleotide at position 920. The deletion mutant but not the insertion mutant has decreased tRNA

Protein structure prediction with evolutionary algorithms

When evaluating these penalty methods, it is important to consider whether they correctly bias the search strategy to feasible regions. The GAs discussed in Section 3 that use a penalty method apply a fixed constant penalty term C per violation. This policy can cause problems if the second penalty method is applied without the extension of Patton et al. For fixed values of C it is possible to construct examples where the structure with optimal energy under the penalty method does not correspond to the optimal energy for the HP model. It is also important to consider the efficacy of the penalty methods to understand how well they facilitate optimization. For example, we believe that the extended formulation proposed by Patton et al. may lead to a less effective search than other methods. When the hydrophobic amino acids are prevented from contributing to the objective function because they overlap, the fitness landscape may have large flat regions, which can make the optimization problem more difficult. These considerations recommend the use of a fixed penalty approach that is adapted based on the number of hydrophobics available in the protein sequence,
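To make the fixed-penalty idea concrete, the sketch below evaluates a conformation of the HP model on a 2D square lattice and adds a constant penalty C for each overlap. It is an illustration under stated assumptions, not the formulation of any paper cited in the excerpt: the move alphabet, the function names, the default C = 2.0, and the way overlaps are counted (each extra residue on an already occupied site is one violation) are all hypothetical choices.

```python
from collections import Counter

# Assumed encoding: a conformation is a string of unit moves on a square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Return lattice coordinates of each residue, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def penalised_energy(sequence, moves, penalty=2.0):
    """HP energy (-1 per non-bonded H-H contact) plus a fixed penalty per overlap."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold(moves)
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dist = abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1])
                if dist == 1:
                    contacts += 1
    # Each residue beyond the first on a lattice site counts as one violation.
    violations = sum(n - 1 for n in Counter(coords).values())
    return -contacts + penalty * violations
```

Because overlapping hydrophobic residues still contribute contacts in this variant, an infeasible fold can score better than the best feasible one when C is small, which is the kind of mismatch the excerpt warns about.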

Structure-based software reliability prediction

Prevalent approaches to software reliability modeling are black-box based, i.e., the software system is considered as a whole and only its interactions with the outside world are modeled, without looking into its internal structure. However, with the advancement and widespread use of object-oriented systems design and development, the use of component-based software development is on the rise. Software systems are developed in a heterogeneous fashion (multiple teams in different environments), and hence it may be inappropriate to model the overall failure process of such systems using only one of the several software reliability growth models. In this paper we outline the constituents of the structural models. We then present an exhaustive analysis of the classes of methods where the architecture of the application is modeled either as a discrete time Markov chain (DTMC) or a continuous time Markov chain (CTMC), and illustrate these methods using examples.

Novel methods for the detection and prediction of changepoints

In the previous chapters of this thesis we have seen that, more often than not, the data sets we encounter are non-stationary in nature. We have also seen that in many important application areas, e.g. time series forecasting in Chapter 4, it is important to capture the (temporal) dependence structure between observations adequately, otherwise future predictions may be unreliable. In this chapter, we turn our attention to a non-parametric framework in which we model such non-stationary time series. Specifically, we introduce wavelets (Section 5.1) and review the literature surrounding their application within locally stationary time series modelling (Section 5.2). Finally, in Section 5.3 we review the literature surrounding detecting changepoints using the model described in Section 5.2. These ideas will be used in Chapter 6 for proposing a new method for detecting changes in variance, and in Chapter 7 we extend this into detecting changes in autocovariance.