Top PDF A survey of models for inference of gene regulatory networks

A survey of models for inference of gene regulatory networks

Besides gene expression data, other data such as protein-DNA interaction data, protein-protein interaction data and microRNAs should be considered for revealing gene regulatory mechanisms. Only a small fraction of RNAs are coding RNAs, whereas the larger part of the eukaryotic genome is transcribed into non-coding RNAs. In the last few years, several classes of small non-coding RNAs, such as microRNAs and siRNAs, have been discovered [4]. MicroRNAs are about 18–25 nucleotides long [5]. MicroRNAs cause transcript cleavage or translational repression by binding to their target mRNA [6]. MicroRNAs regulate the expression of more than 30% of coding genes [7, 8]. Besides TFs, microRNAs also interact with other cis-regulatory elements. Similarly, genes contain binding sites for other TFs that may be targeted by microRNAs. Thus, the mutual influence between microRNAs and TFs makes microRNAs important components of gene regulation.

Bayesian Inference of Gene Regulatory Networks : From Parameter Estimation to Experimental Design

Second, in this thesis an efficient algorithm for performing Bayesian experimental design is given and applied to a high-dimensional non-linear ODE model describing gene regulatory networks in Chapter 7. We showed that with maximum entropy sampling it is possible to perform Bayesian experimental design for general distributions of model parameters, which do not have to follow a certain parametric form. Experiments are then selected such that the information in the posterior distribution is maximally increased. Performing general Bayesian experimental design (BED) in the standard way would require solving triple integrals. With maximum entropy sampling, this computationally intractable task reduces to estimating the entropy, a single integral, of the distribution of the data that would be obtained by performing the experiment. The experiment with maximal entropy is then chosen, i.e., the experiment whose outcome is least known. To obtain entropy estimates, one needs a large number of samples from the distribution of the model parameters. Reliable samples of the model parameters are obtained with a population-based Markov chain Monte Carlo algorithm, which is parallelized to speed up the sampling procedure. The method was evaluated on simulated data as well as on the DREAM 2 Challenge #3 data. It was compared to the choice of random experiments and to the classical D-optimal experimental design for ODE models described in Atkinson [ADT07], Chapter 17.8. We show that the maximum entropy approach clearly outperforms the choice of random experiments as well as the classical experimental design method.
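The selection rule described above (pick the experiment whose predicted data has the largest entropy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy decay model, the candidate time points, the uniform stand-in for posterior samples, and above all the Gaussian approximation used to estimate entropy, which is a simplification and not the estimator used in the thesis.

```python
import math
import random
import statistics

def gaussian_entropy(samples):
    """Entropy (in nats) of a 1-D Gaussian fitted to the samples.

    Assumption: a Gaussian fit stands in for a proper entropy estimator.
    """
    var = max(statistics.pvariance(samples), 1e-300)  # guard against log(0)
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def pick_experiment(theta_samples, candidates, simulate):
    """Return the candidate experiment whose predicted data has maximal entropy."""
    scores = {}
    for c in candidates:
        predicted = [simulate(theta, c) for theta in theta_samples]
        scores[c] = gaussian_entropy(predicted)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy model: x(t) = exp(-theta * t), measured at one time point t.
random.seed(0)
theta_samples = [random.uniform(0.5, 1.5) for _ in range(2000)]  # stand-in for MCMC output

def simulate(theta, t):
    return math.exp(-theta * t)

best_t, scores = pick_experiment(theta_samples, [0.1, 1.0, 5.0], simulate)
print(best_t, scores)
```

In this toy setting the intermediate time point is selected: very early and very late measurements look almost the same for every parameter sample, so observing them would teach little.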

Structural influence of gene networks on their inference: analysis of C3NET

Detailed review: This manuscript investigates the performance of C3NET, a method developed to infer the structure of gene regulatory networks (GRNs) from gene expression data. C3NET is compared with ARACNE, another GRN inference method. Data sets are generated from network models designed to mimic expression data from GRNs with 100 genes. The methods section states that “In the first step of C3NET, aiming at the elimination of nonsignificant edges, we used the optimal cut-off value, which is the threshold (I0) that maximizes the F-score for each data set with respect to the true underlying network structure [36,37].” This would seem to give an unfair advantage to C3NET, since in any application to real data the true network will be unknown. It is not clear how the parameters are set for other methods. On p. 7 the authors state “The DPI tolerance parameter of ARACNE, when used for comparison purposes, was chosen as 0.1 [30],” but later they state “the results presented in this section were obtained by using the optimal threshold values (I0) for all algorithms (C3NET, ARACNE and MRNET).” These statements seem inconsistent. This also raises questions about how the results for synthetic data would look without such tuning. The authors must describe how to set the parameters appropriately without knowledge of the true network structure. There have been several assessments of GRN inference through the DREAM challenge, and ideally it would be worthwhile to apply the method to some of these published data sets as well. The manuscript should be modified to make the methods and results more self-contained. While some references to technical details are acceptable, it is too much to ask a reader to refer to previous papers for essential information such as the method itself (which can be compactly specified), performance metrics, etc. The discussion of the performance for different edge types, p. 9, is a bit confusing because two concepts are being explored: (1) the recall
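The reviewer's concern is easy to see once the "optimal cut-off" procedure is written down. The sketch below is an illustration under stated assumptions, not C3NET's code: the edge scores, the gold-standard edge set and the function names are all hypothetical. Note that `best_threshold` takes the true network as an argument, which is exactly the information that is unavailable for real data.

```python
def f_score(predicted, truth):
    """F-score (harmonic mean of precision and recall) of a predicted edge set."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def best_threshold(edge_scores, truth):
    """Scan candidate thresholds I0 and keep the one maximizing the F-score.

    Requires the true network, so it is only possible on synthetic data.
    """
    best_i0, best_f = None, -1.0
    for i0 in sorted(set(edge_scores.values())):
        predicted = {e for e, v in edge_scores.items() if v >= i0}
        f = f_score(predicted, truth)
        if f > best_f:
            best_i0, best_f = i0, f
    return best_i0, best_f

# Hypothetical mutual information scores and gold standard.
edge_scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "c"): 0.2,
               ("c", "d"): 0.6, ("a", "d"): 0.1}
truth = {("a", "b"), ("a", "c"), ("c", "d")}

print(best_threshold(edge_scores, truth))
```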

Inference of gene regulatory networks from genome-wide knockout fitness data

owing to the emergence of microarray technology, which allows for simultaneous measurement of gene expression on the genome-wide scale (Bonneau et al., 2006; Chou and Voit, 2009; Friedman et al., 2000; Liang and Wang, 2008; Margolin et al., 2006; Reiss et al., 2006; Shmulevich et al., 2002; Stuart et al., 2003). The vast amounts of data provided by gene expression microarrays enable accurate estimation of gene regulatory network organization, which has greatly benefited a broad range of disciplines, from basic biological sciences to bioengineering to medical diagnosis and treatment (Hanai et al., 2006; Mischel et al., 2004). The goal of inference algorithms is to discover the connectivity structure and, potentially, dynamic characteristics of these networks based on such time- or other state-series data. Among other things, the nature of inference algorithms varies depending on the types of biological networks and the way they are modelled (de Jong, 2002; Hendrickx et al., 2011; Lecca et al., 2011; Liu et al., 2006; Samoilov et al., 2001; Shmulevich et al., 2002; Tian and Burrage, 2003; Wang and Schonfeld, 2010). One category of models quantizes the empirical data into binary numbers and views network structures as Boolean constraints (Bornholdt, 2008; Kauffman et al., 2003). Although this could be attempted in a deterministic framework, both the uncertainties introduced by measurement errors and the inherent stochasticity of gene expression make any experimental data substantially probabilistic. To impart this random nature to the Boolean framework, probabilistic Boolean network models have been introduced (Akutsu et al., 1999; Huang, 1999; Shmulevich et al., 2002). However, as biological processes are neither digital nor homogeneous, further refinements in gene regulatory modelling and inference may be achieved by using alternative probabilistic network descriptions (Craciun et al., 2013; Kellam et al., 2002; Liu et al., 2006), continuous-time differential equations (Chen et al., 1999; Holter et al., 2001; Wang et al., 2008), stochastic differential equations (Tian and Burrage, 2003; Yeung et al., 2002), and control theory methods (Beal et al., 2005; Cook et al., 1998; Rangel et al., 2004), among others. Although each of these methods offers certain advantages and disadvantages in attempting to capture the structure and dynamics of gene regulatory networks, it should be noted that they have largely been designed for describing gene expression data.

Modeling gene regulatory networks through data integration

Given the complex nature of these problems, I started building statistical frameworks that allowed integration of different models, where each model captures a different aspect of complexity. In chapter 4, we propose an extended model inspired by module networks [10] and stochastic blockmodels [11] for integrative learning of regulatory modules from gene expression and protein-DNA interaction data, e.g. ChIP binding information. Through this integration, we assign as regulators those Transcription Factors (TFs) that have both physical interaction with genes and predictive power in explaining their expression. This model allows inference of combinatorial interactions between regulators. Incorporating complementary interaction data improves accuracy by avoiding false assignments of indirect regulators or regulators with correlated expression [12]. It also enhances the computational tractability and scalability of the method by restricting the space of possible regulatory structures. We developed a reversible-jump MCMC procedure for learning modules and model parameters, and we also illustrated the theoretical advantages of this integration in terms of model identifiability and the practical significance of the model on M. tuberculosis data [13, 14].

Analysis of the PC algorithm as a tool for the inference of gene regulatory networks: evaluation of the performance, modification and application to selected case studies.

Gat-Viks and Ron Shamir system. In 2007 Gat-Viks and Ron Shamir developed a system (Figure 2.11) that, starting from prior knowledge of a LGN, adds interactions between the genes of the LGN and expands the LGN with additional genes and their relationships [Gat-Viks and Shamir, 2007]. It starts from prior biological knowledge, formalizes it as a Bayesian network in which each variable (node) can have several discrete states, and then obtains the Bayesian scoring matrix. The method finds the discrete function that represents the different relationships between genes of the LGN. The next step consists of generating two evaluation models starting from different levels of expression: observed and predicted. The observed expression level derives from a measurement in biological experiments (gene expression data, measures of metabolism and/or proteins). The predicted expression level, instead, is the probabilistic expectation of the variable given the model and the experimental data (gene expression data of the genetic perturbation). In the final steps these two expression levels are compared, and a disagreement between observed and predicted expression levels indicates a possible edge to be added to the LGN. The new score of the Bayesian matrix with these new edges is calculated, and if it is higher than the score of the original model the edge is added to the LGN. The same method is used to expand a LGN. Each hypothetical expansion gene is added to the LGN and the scoring matrix is recalculated [Gat-Viks and Shamir, 2007].

Learning a Markov Logic network for supervised gene regulatory network inference

a (dynamical) system [3] and (3) supervised edge prediction approaches that focus on the graph of regulation and only predict the presence/absence of regulations [4-7]. In the first family, relevance networks like ARACNE [8], CLR [9] and TD-ARACNE [10] use a mutual information score between the expression profiles of each pair of genes and, given a threshold, decide whether or not to predict an interaction. The second family is based on a model of the behavior of the network, either static or dynamic. In the case of static models devoted to steady-state data, Gaussian Graphical Models (GGM) [11,12] allow one to build a linear regression model that expresses how one gene can be predicted using the set of remaining genes. Interestingly, GGMs build a network using partial correlation coefficients, providing a stronger measure of dependence than the correlation coefficients used in relevance networks. A powerful approach to regression and network inference based on an ensemble of randomized regression trees [13] has also been shown to outperform competitors in inferring gene regulatory networks in recent DREAM competitions. Bayesian networks [14] provide another important approach in static modeling. Learning a Bayesian network involves learning the acyclic oriented graph that describes the parental relations between variables and the conditional probabilities that govern the behavior of the network. While appropriate for gene regulation cascades, Bayesian networks cannot, however, model cycles in the network. Other models incorporating dynamical modeling have therefore been proposed in the literature: dynamical Bayesian networks and differential equations [15-17].
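Why partial correlation is a stronger measure of dependence than plain correlation can be shown on a three-gene chain X → Z → Y. The sketch below is illustrative only: the simulated data, the noise level and the function names are assumptions, and it conditions on a single gene with the standard first-order formula, whereas a GGM conditions on all remaining genes.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical chain: X regulates Z, and Z regulates Y; X has no direct effect on Y.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(5000)]
z = [a + 0.5 * random.gauss(0, 1) for a in x]
y = [b + 0.5 * random.gauss(0, 1) for b in z]

r_xy = pearson(x, y)                  # strong, although the link is indirect
r_xy_given_z = partial_corr(x, y, z)  # close to zero once Z is accounted for
print(round(r_xy, 2), round(r_xy_given_z, 2))
```

A relevance network thresholding `r_xy` would draw a spurious X-Y edge here, while the partial correlation correctly suggests that the dependence is mediated by Z.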

The Local Edge Machine: inference of dynamic models of gene regulation

The issue of non-identifiability of network models for gene regulatory networks has been recognized [6, 7] but not widely studied. This issue arises when distinct networks (i.e., network topologies) have the capability of generating the same dynamics within similar parameter regimes. By definition, no inference algorithm can distinguish between such non-identifiable pairs of networks. Since LEM takes a Bayesian approach, it implicitly rewards models that are robust to changes of parameters (see Additional file 1: Section 3 for a theoretical justification and Additional file 20 for examples). Thus, if two distinct models generate the same dynamics, then LEM will place a higher posterior probability on the more robust model. Although LEM cannot overcome the theoretical limits on inference placed by non-identifiability issues, we observe that LEM nonetheless performs quite well, as evidenced by its ability to find global systems of differential equations that fit the data (see Additional file 1: Section 8 and Additional file 20 for examples). This phenomenon also appears in yeast cell-cycle network 1, where LEM does not capture all of the gold standard edges, but it does generate dynamics that closely approximate the observed data (see Additional file 21). Predictions made by LEM for yeast cell-cycle network 1 that were not previously identified by experiments are in the process of being tested.

Application of Logic Synthesis Toward the Inference and Control of Gene Regulatory Networks

There are several observations that impact the formulation of our GRN model and predictor inference algorithm. First, the activity level (i.e. activation or repression) of all genes at a particular time t represents the state of the GRN at that time t. From our knowledge of biological systems, we observe that over time, cellular processes converge to sequences of stable attractor states. Some of these attractor states represent normal cellular phenomena in biology (e.g. cell cycle and division), while other attractor states are consistent with disease (e.g. metastasis of cancer). Second, the GRN is often inferred by observing microarray-based experimental data through which the activity level of genes is measured. Both observations of gene activity (or state) can be used to infer the gene regulation network. The disadvantage of using microarray data is that such studies do not involve controlled time-series experimental data. Hence the measurements are assumed to arise from cyclic sequences of gene expression states (attractor states) in steady state. Such a sequence is referred to as an attractor cycle. The GRN is then inferred from this data, using methods traditionally based on probabilistic transition models [34, 35].
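The notions of network state and attractor cycle can be made concrete with a small synchronous Boolean network. The sketch below is only an illustration: the three-gene update rule is a hypothetical example (a ring in which one gene represses the next), and synchronous updating is an assumption rather than something the excerpt specifies.

```python
def step(state):
    """One synchronous update of a hypothetical 3-gene ring: c represses a, a activates b, b activates c."""
    a, b, c = state
    return (int(not c), a, b)

def find_attractor(state, update=step):
    """Follow the trajectory from `state` until a state repeats; return the attractor cycle."""
    seen = {}        # state -> index at which it was first visited
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[seen[state]:]  # drop the transient, keep the cycle

print(find_attractor((0, 0, 0)))  # a cycle through six states
print(find_attractor((0, 1, 0)))  # a shorter cycle through two states
```

Steady-state measurements are then interpreted as samples drawn from states on such a cycle, without knowing their order in time.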

Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders

The inference method should be chosen depending both on what kind of data, such as gene expression, gene-transcription factor (TF) [4], or protein-protein interaction (PPI) [5] data, are used for inference and on which type of network model, such as a directed or undirected graph [6], we assume. In addition, we have to consider the case of data integration. Namely, not only individual data but also multiple data types together (e.g. integration of gene expression and gene-TF data [7]) can be used for more reliable inference [8, 9]. In this work, we limit our inference methods to directed networks with a single data type: gene expression data. In order to decipher regulatory interactions with gene microarray data, which provide the expression level of genes regulated by other genes directly or indirectly, a number of effective network inference methods have been proposed, employing a variety of computational and structural models based on Boolean networks [10], Bayesian networks [11], information theory [12], regression models [13], and so on. Depending on the approach, however, the results tend to vary because each inference solution has inherently different advantages and limitations [14]. The results of the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project [15] describe well the pros and cons of the different methods as well as how effectively they can work together when the advantages of all methods are integrated (although this does not mean that any combination always outperforms every standalone method). More specifically, we note that they draw two conclusions from the experiments: (i) without integration, there is a limit to how far a single criterion can continue to improve network inference, and (ii) a bootstrapping (re-sampling) based regression method [16] is required to avoid overfitting in regression-based methods [15].

Simulation and identification of gene regulatory networks

variants), which are more easily identified when associated with a group of etraits rather than with individual etraits. Several approaches to associating DNA variants with groups of etraits have recently been proposed, e.g. [38, 97, 130, 173, 185]. A major goal of systems genetics studies is to reconstruct a causal network whose nodes are the phenotypes, the etraits (and potentially other omics variables) and the DNA variants. Methods proposed to achieve this goal include Bayesian networks [186], differential equation models [23, 46], structural equation modeling [101, 103] and undirected dependency graphs or co-expression networks with edge orientation using DNA variants as causal anchors [21, 126]. While multiple methods for QTL mapping of etraits (omics variables) and for causal network inference are available, at the present time not much is known about the strengths and weaknesses of all of these proposed methods and whether or when some methods perform better than others. However, researchers increasingly realize that thorough verification of algorithms in bioinformatics and (genetical) systems biology is required. In fact, several international competitions are organized on an annual basis to compare computational methods for systems biology and genetic analysis. These include the Dialogue for Reverse Engineering Assessments and Methods (DREAM) project with its reverse-engineering challenges [17, 164, 165], for which SysGenSIM has been used to produce the systems genetics challenges in 2010, and the Genetic Analysis Workshops [9, 40], which compare analysis tools relevant for current analytical problems in genetic epidemiology, statistical genetics and genetical systems biology.

NetDiff – Bayesian model selection for differential gene regulatory network inference

Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, that enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.

Identifying Gene Regulatory Networks from Gene Expression Data

The network states, S_i, correspond to steady states of gene expression of all the genes in the network following a single gene perturbation. To derive the network topology the authors use a parsimony argument, whereby the set of regulators in the network is postulated to be equal to the smallest set of nodes needed to explain the differentially expressed genes between the pairs of network states. The problem thus becomes the classical combinatorial optimization problem of minimum set covering, which is NP-complete in general. They solved small instances of it using standard branch and bound techniques. The solutions were graphs, or gene network topologies. To complete the network inference, i.e. to identify the Boolean functions at the nodes, they built truth tables from the input data and the inferred regulators for each node. This procedure does not yield a unique network in general. The authors proposed an information-theoretic approach for predicting the next experiment to perform, namely the one that best disambiguates the inferred network according to a score of information content. Their results confirmed that the number of experiments needed to fully infer a Boolean network is proportional to log n, with double perturbation experiments having better resolving power on average than single perturbation ones.
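The minimum set covering step can be sketched for small instances as follows. This is a simplified illustration, not the authors' procedure: it uses exhaustive search over subsets of increasing size instead of branch and bound, and the mapping from candidate regulators to the differentially expressed genes they could explain (`covers`) is hypothetical.

```python
from itertools import combinations

def minimum_cover(universe, covers):
    """Return a smallest set of regulators whose covered genes include `universe`.

    Exhaustive search by increasing subset size; exponential in the worst case,
    so only suitable for small instances.
    """
    names = sorted(covers)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(covers[name] for name in combo))
            if universe <= covered:
                return set(combo)
    return None  # no combination explains every gene

# Hypothetical instance: genes that changed between two network states,
# and the genes each candidate regulator could explain.
changed = {"g1", "g2", "g3", "g4", "g5"}
covers = {
    "r1": {"g1", "g2", "g3"},
    "r2": {"g2", "g4"},
    "r3": {"g3", "g4"},
    "r4": {"g4", "g5"},
}

print(minimum_cover(changed, covers))
```

Several covers of the same minimal size may exist, which is one reason the overall procedure does not yield a unique network.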

Construction of gene regulatory networks using biclustering and bayesian networks

The ongoing development of high-throughput technologies such as microarrays prompts researchers to study the complexity of gene regulatory networks (GRNs) in cells. GRN inference algorithms have a significant impact on drug development and on the understanding of disease ontology. Many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. This is because RNA molecules are more easily accessible than proteins and metabolites. One of the major problems with time series microarrays is that a dataset consists of relatively few time points with respect to a large number of genes. Reducing the data dimensions is one of the interesting problems in GRN modelling. The most common and important design rule for modelling gene networks is that their topology should be sparse. This means that each gene is regulated by only a few

Model checking the evolution of gene regulatory networks

The key characteristics of the behaviour of a GRN are typically summarised by a directed graph where nodes represent genes and edges denote the type of regulation between the genes. A regulation edge is either activation (one gene’s activity increases the activity of the other gene) or repression (one gene’s activity decreases the activity of the other gene) [23]. In Wagner’s model of a GRN, in addition to the activation types between genes, each gene is assigned a threshold and each edge (pair of genes) is assigned a weight. The threshold of a gene models the level of activation necessary to sustain activity of the gene. The weight on an edge quantifies the influence of the source gene on the destination gene of the edge. We extend Wagner’s model by allowing a range of values for weight parameters. We call our model GRN space, denoting that all GRNs instantiated from that space share the same topology, and their parameters fall into given ranges. We assume that each gene always has some minimum level of expression without any external influence. In the model, this constant input is incorporated by a special gene which is always active, and activates all other genes in the network. The weight on the edge between the special gene and some other gene represents the minimum level of activation. The minimal activation is also subject to perturbation.
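A threshold-and-weight model of this kind can be simulated in a few lines. The sketch below rests on several assumptions that the excerpt does not fix: updates are synchronous, a gene is active when its summed input strictly exceeds its threshold, repression is encoded as a negative weight, and the concrete genes, weights and thresholds are hypothetical. In this small example the always-active special gene (`const`) feeds only gene A.

```python
def step(state, weights, thresholds):
    """One synchronous update: a gene is active iff its weighted input exceeds its threshold."""
    nxt = {"const": 1}  # the special gene is always active
    for gene, thr in thresholds.items():
        total = sum(w * state[src]
                    for (src, dst), w in weights.items() if dst == gene)
        nxt[gene] = 1 if total > thr else 0
    return nxt

# Hypothetical two-gene network: basal input to A, A activates B, B represses A.
weights = {("const", "A"): 1.0, ("A", "B"): 2.0, ("B", "A"): -3.0}
thresholds = {"A": 0.0, "B": 1.0}

state = {"const": 1, "A": 0, "B": 0}
trajectory = []
for _ in range(4):
    state = step(state, weights, thresholds)
    trajectory.append((state["A"], state["B"]))

print(trajectory)
```

A "GRN space" in the sense of the text would replace each fixed weight by an interval; every choice of weights inside the intervals gives one concrete network with the same topology.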

Prophetic Granger Causality to infer gene regulatory networks.

One difficulty in the field of GRN inference is the ability to unequivocally evaluate methods, as gold standard datasets are in limited supply. The DREAM series of challenges was launched to formalize the creation of benchmarks. While the choice of metric for DREAM challenges may be somewhat arbitrary, with several possible alternatives available, the challenges have the distinct advantage of eliminating the so-called “self-assessment trap”, in which method developers either consciously or unconsciously bias the evaluation in favor of their own methods [21]. DREAM has often found that ‘wisdom of crowds’ approaches combining several strategies perform better than any stand-alone approach [22], consistent with classic work on ensembles demonstrating that weak learners can be combined to form a more accurate method, as the errors of the weak learners tend to be mutually uncorrelated and average out [23, 24]. The accuracy of the top-performing ensembles reveals that considerable room for improvement exists in the ability of individual methods to reverse-engineer GRNs. New methodologies, or those that draw inspiration from different fields of research, could complement existing algorithms.
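One simple way to combine several methods in a 'wisdom of crowds' fashion is to average the rank each method gives to every candidate edge. The sketch below shows that idea under stated assumptions: rank averaging is one common choice and is not claimed to be the exact scheme used in any particular DREAM analysis, and the three score tables are hypothetical.

```python
def average_rank_ensemble(predictions):
    """Combine edge confidence scores from several methods by average rank.

    predictions: list of dicts mapping an edge to a confidence score
                 (higher means more confident). Edges a method did not
                 score are ranked last for that method.
    Returns the edges ordered from most to least supported.
    """
    edges = sorted(set().union(*predictions))
    totals = {e: 0.0 for e in edges}
    for scores in predictions:
        ranked = sorted(edges, key=lambda e: scores.get(e, float("-inf")),
                        reverse=True)
        for rank, e in enumerate(ranked, start=1):
            totals[e] += rank
    n = len(predictions)
    return sorted(edges, key=lambda e: totals[e] / n)

# Hypothetical outputs of three inference methods.
method_1 = {"A->B": 0.9, "A->C": 0.2, "B->C": 0.5}
method_2 = {"A->B": 0.4, "A->C": 0.1, "B->C": 0.8}
method_3 = {"A->B": 0.7, "A->C": 0.6, "B->C": 0.3}

consensus = average_rank_ensemble([method_1, method_2, method_3])
print(consensus)
```

Using ranks rather than raw scores sidesteps the fact that different methods report confidences on incomparable scales.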

Practical aspects of gene regulatory inference via conditional inference forests from expression data

output (response) and the remaining transcripts (genes) are taken as input (predictor variables). For each response, a CIF/CIT is constructed and a variable importance measure (VIM) is defined (see below). These measures per gene are either based on a single CIT or are aggregated over several CITs in gene-based CIFs, depending on the view taken to construct a network from trees. In general, a (statistically) “significant” VIM for gene X in predicting gene Y will lead to a connection between X and Y in the network. Because of the direction of prediction, the connection is presented as a directed edge, naturally giving rise to a directed network (i.e., GRN). The so-called predicted network is compared to a gold standard (when available), using network prediction performance criteria as suggested by [Marbach, et al. 2012; Prill, et al. 2010]: 1) the area under the receiver operating characteristic curve (AUROC), 2) the area under the precision-recall curve (AUPR), and 3) the DREAM challenge specific score. The ROC curve plots the sensitivity (i.e., true positive rate) versus 1 minus specificity (i.e., 1 minus the true negative rate) and is well-known in statistics. Precision-Recall curves or PR curves are often used in Information Retrieval and offer an alternative to ROC curves for skewed class distributions. An algorithm may be a good performer based on ROC but not based on PR. Whereas recall is defined as the true positive rate, precision is defined as the fraction of examples classified as positive that are truly positive. When the number of unconnected nodes exceeds the number of connected nodes in the GS networks, as is the case with GRNs, more information about comparative performance of methods can be retrieved from precision-recall curves [Davis and Goadrich 2006]. For more details about ROC-PR comparisons, we refer to [Davis and Goadrich 2006]. The overall score summarizes performance over several network scenarios and is defined as in [Marbach, et al. 2012] as the mean of the (minus log10-transformed) network specific p-values p_PR and p_ROC. The PR and ROC p-
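The difference between the ROC and PR views on sparse networks shows up already at a single cut-off. The sketch below computes precision, recall and false positive rate for the top k predicted edges, plus a mean of minus log10 p-values in the spirit of the overall score; the edge list, gold standard, network size and p-values are hypothetical, and `overall_score` follows only the verbal description above, not a verified reimplementation of the DREAM scoring code.

```python
import math

def rates_at_k(ranked_edges, gold, n_possible, k):
    """Precision, recall and false positive rate when the top k edges are predicted."""
    predicted = set(ranked_edges[:k])
    tp = len(predicted & gold)
    fp = len(predicted) - tp
    fn = len(gold) - tp
    tn = n_possible - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)      # true positive rate (sensitivity)
    fpr = fp / (fp + tn)         # 1 minus specificity
    return precision, recall, fpr

def overall_score(p_values):
    """Mean of the minus log10-transformed p-values."""
    return sum(-math.log10(p) for p in p_values) / len(p_values)

# Hypothetical 10-gene network: 90 possible directed edges, only 2 of them real.
ranked = ["e1", "e2", "e3", "e4", "e5"]
gold = {"e1", "e3"}

precision, recall, fpr = rates_at_k(ranked, gold, n_possible=90, k=4)
print(precision, recall, round(fpr, 4))
print(overall_score([1e-4, 1e-2]))
```

With true edges this rare, half of the four predictions are wrong, yet the false positive rate stays near 2%: the ROC view looks excellent while precision exposes the errors.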

Gene Regulatory Network Inference Using Machine Learning Techniques

Template-based methods exploit the idea that orthologous TFs regulate orthologous genes. Thus, in this category, one starts with the well-reconstructed GRN of a well-known organism (the template) and then transfers information about regulation to orthologous genes in the genome of interest. This methodology requires the entire template genome and its GRN, i.e. the set of its TF-gene interactions. The genome can be represented either by its nucleotide sequence (DNA sequence) or by its protein sequences. These sequences are then used to determine their representatives (orthologs) in the genome of interest. Orthologs are detected using sequence alignment tools. To present this category, we consider the work of Babu et al. [8], in which they used one of the most well-characterized bacterial networks, that of E. coli, as a template to reconstruct the networks of 175 prokaryotic genomes. Orthology is detected using a hybrid method combining sequence alignment and the Bidirectional Best Hit (BBH) method. BBH consists of finding the pairs of genes in two different genomes that are more similar to each other than either is to any other gene in the other genome. Research has recently demonstrated that detecting homology with DNA is a challenging task [176], as DNA sequences evolve rapidly. Hence, it will be almost impossible to identify homologous sequences after many years of divergence. Nowadays, homology is detected using protein sequences.
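The Bidirectional Best Hit criterion itself is short enough to sketch. The example below assumes that pairwise similarity scores (for instance alignment bit scores) have already been computed in both directions; the gene names and scores are hypothetical, and tie handling is deliberately naive.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    scores: dict mapping (query, target) to a similarity score.
    """
    best = {}
    for (query, target), s in scores.items():
        if query not in best or s > best[query][1]:
            best[query] = (target, s)
    return {query: target for query, (target, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50,
          ("a2", "b2"): 120, ("a3", "b2"): 150}
b_vs_a = {("b1", "a1"): 198, ("b2", "a3"): 149, ("b2", "a2"): 118}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Gene `a2` is dropped even though `b2` is its best hit, because `b2` is more similar to `a3`; this reciprocity requirement is what makes the method conservative.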

A Bayesian inference method for the analysis of transcriptional regulatory networks in metagenomic data

The results shown in Table 1 provide an outline of the Firmicutes CsoR meta-regulon of the human gut microbiome. The inferred CsoR meta-regulon is in broad agreement with the reported CsoR regulons in Firmicutes [23, 24, 27, 28], but also displays several characteristic features that have not been previously reported. The inferred human gut Firmicutes CsoR meta-regulon comprises six distinct eggNOG/COG identifiers with annotated function, but is primarily defined by two COG identifiers that encompass 96% of the putatively CsoR-regulated promoters (Additional files 2, 3). COG1937 maps to the CsoR repressor, and all the putatively regulated complete gene sequences mapping to this COG contain the conserved C-H-C motif (Additional file 4). This indicates that these COG1937 instances are functional copper-responsive regulators and suggests that the reported self-regulation of CsoR is a common trait of human gut Firmicutes species [17, 23]. COG2217 maps to the copper-translocating P-type ATPases (CopA). These proteins harbor heavy metal-associated (HMA; IPR006121), haloacid dehydrogenase-like (HAD-like; IPR023214) and P-type ATPase A (IPR008250) domains and are canonical members of the Firmicutes CsoR regulon [25]. The remaining eggNOG/COGs map to proteins containing a HMA (IPR006121) domain [NOG218972, NOG81268], an unknown function (DUF2318; IPR018758) membrane domain [NOG72602] or HMA (IPR006121), DsbD_2 (IPR003834) and DUF2318 (IPR018758) transmembrane domains [COG2836]. Proteins mapping to NOG218972 and NOG81268 are often annotated as copper chaperones, whereas those mapping to COG2836 are mainly annotated as heavy metal transport/detoxification proteins, and those mapping to NOG72602 are simply annotated as membrane proteins. Analysis of site score distribution for the eggNOG/COGs reported in Table 1 indicates the presence of a single putative false positive. The sequences mapping to NOG109008 belong to clonal instances of a glycoside hydrolase family 18 protein-coding sequence harboring an average (19.42 score) putative CsoR-binding site in its promoter region.

CyNetworkBMA: a Cytoscape app for inferring gene regulatory networks

binding, protein-metabolite interaction, and protein-protein interaction data using Bayesian networks [7, 8]. Other methods rank edges based on correlation or mutual information [9, 10]. Regression-based algorithms formulate network inference as a variable selection problem with the goal of searching for candidate regulators (i.e., parent nodes) for each target gene, for example [11–13]. In particular, we previously showed the effectiveness of Bayesian Model Averaging (BMA) regression methods using time series data, in which snapshots of expression levels are taken at a few regular intervals after exposure to a drug perturbation [14]. Later work highlighted the ability of BMA to integrate external biological knowledge in the network building process to improve prediction accuracy [15]. Most recently, we have introduced the ScanBMA method for searching the model space, which significantly improves prediction accuracy and computational efficiency [16]. These BMA network inference methods are implemented in the networkBMA package [17] as part of Bioconductor [18].
