ing on the bistable nature of a genetically-regulated model, it was shown that the decision to move and the decision about location preference operate in parallel. This conclusion was reached by evaluating the dynamic stability of cellular behaviors as a function of cellular phenotype and environmental inputs to the cell’s gene-regulatory network. Similarly, the synthetic biology community has begun to use models of GRNs to develop hypotheses for experimental validation. The theoretical biology community has been studying the dynamics of GRNs for many years, beginning with Boolean networks, where genes are defined by on-off states [83, 85]. Mathematical methods have extended this class of models with tools for the analytical discovery of steady states, that is, attractors that the GRNs will tend towards without external stimuli. Such analysis led to the prediction that evolution will drive single-cell genetic network dynamics toward greater dynamic stability. A detailed analysis of a Boolean GRN derived from the biology of the cell cycle showed that this ubiquitous GRN is inherently modular, with a switch that triggers the completion of the cell cycle after passing a restriction point. These analytical and dynamical studies of Boolean GRNs have led to concrete predictions that can be experimentally verified.
Unsupervised learning aims to find patterns in the input data. Recognizing patterns in unlabelled datasets leads to clustering (unsupervised classification), which groups together data with similar features. Pattern recognition is one of the key stages of recognition systems and has found application in diagnosing diseases, data mining, document classification, face recognition, etc. [22, 26]. Data mining, as its name implies, involves automatically or semi-automatically mining (extracting) useful information from massive datasets [27, 28]. Self-organising maps are artificial neural network algorithms for data mining, and they allow massive data to be analysed and visualized efficiently. In one study, unsupervised neural networks based on self-organising maps were used to cluster medical data with three subspaces, namely patients’ drugs, body locations, and physiological abnormalities. In another, a self-organising map was used to analyse and visualize yeast gene expression, and was identified as an excellent, fast and convenient technique for organizing and interpreting massive datasets such as yeast gene expression data. Unsupervised learning also performs the task of reducing the number of variables in high-dimensional data, a process known as dimensionality reduction. Dimensionality reduction can be further divided into feature extraction and feature selection. Feature selection involves selecting a subset of relevant variables from the original dataset [32, 33]. Feature extraction transforms the dataset from a high-dimensional space to a low-dimensional space. Principal component analysis is one of the best techniques for extracting linear features. Dimensionality reduction, which can be facilitated by unsupervised artificial neural network algorithms, makes high-dimensional data easier to classify, visualize, transmit and store.
In one study, autoencoders with effectively initialized weights were presented as a better tool than principal component analysis for data dimensionality reduction. Dimensionality reduction is usually performed at the pre-processing stage of other tasks to reduce computational complexity and improve the performance of machine learning models. In another study, principal component analysis, an unsupervised learning algorithm, was used to reduce the dimension of the data before classification, improving both performance and computational speed.
Although its relevance from the point of view of control system design has not to date been considered, the problem of obtaining consistent estimates of parameters in the Michaelis-Menten model structure has been previously investigated and reviewed. Two earlier studies analysed different methods for fitting the Michaelis-Menten equation, and both concluded that different fitting methods will give different estimates of the parameters unless the experimental data is free from error (which in biological reality it never is). Different approaches to estimating the Michaelis-Menten coefficients have also been studied in several further works, and those studies concluded that it is difficult to obtain a consistent estimate of the Michaelis-Menten coefficients unless particular design considerations are taken into account.
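The dependence of the estimates on the fitting method can be illustrated with a small sketch. It is not taken from the cited studies: the "true" parameters, the substrate grid, the 5% multiplicative noise level and the choice of two classical linearizations (Lineweaver-Burk and Hanes-Woolf) are all assumptions made for illustration.

```python
import random

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def lineweaver_burk(s, v):
    # Linearization: 1/v = (Km/Vmax) * (1/s) + 1/Vmax
    slope, intercept = linear_fit([1 / x for x in s], [1 / y for y in v])
    vmax = 1 / intercept
    return vmax, slope * vmax  # (Vmax, Km)

def hanes_woolf(s, v):
    # Linearization: s/v = (1/Vmax) * s + Km/Vmax
    slope, intercept = linear_fit(s, [x / y for x, y in zip(s, v)])
    vmax = 1 / slope
    return vmax, intercept * vmax  # (Vmax, Km)

VMAX, KM = 10.0, 2.0  # hypothetical "true" parameters
substrate = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
clean = [VMAX * s / (KM + s) for s in substrate]

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = [v * (1 + rng.gauss(0, 0.05)) for v in clean]

for label, rates in (("error-free", clean), ("5% noise", noisy)):
    print(label, "LB:", lineweaver_burk(substrate, rates),
          "HW:", hanes_woolf(substrate, rates))
```

On error-free data both linearizations recover the same (Vmax, Km) pair up to rounding; once noise is added, each method weights the measurements differently and the two estimates separate.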
A number of studies have applied Boolean models to cancer analysis, both by considering specific pathways (Saadatpour and Albert, 2013), (Davidich and Bornholdt, 2008), (Fumiã and Martins, 2013) and through more abstract systems-level studies (Huang et al., 2005), (Huang et al., 2009). Many of these studies have carried out an attractor analysis of the resulting models in order to gain insights into the biological system’s stable states (Albert and Othmer, 2003), (Huang et al., 2005), (Davidich and Bornholdt, 2008), typically associating these with phenotypes. In (Poret and Boissel, 2014), the authors went a step further and identified nodes whose state would affect the accessible attractors; this can help in identifying potential drug targets for preventing the expression of pathological phenotypes. Discrete models such as BNs have been shown to be equivalent to continuous models when only the steady states of the system are considered (Veliz-Cuba et al., 2012); however, it should be borne in mind that BNs are not appropriate when a detailed quantitative understanding of a process is required. For a review of Boolean modelling in biology, see (Saadatpour and Albert, 2013).
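As a minimal sketch of what an attractor analysis involves, the following enumerates every state of a small Boolean network under synchronous updating and records the attractor each state falls into. The three-node network and its rules are hypothetical and chosen only for illustration; they do not correspond to any of the cited models.

```python
from itertools import product

NODES = ("A", "B", "C")
# Hypothetical rules: A and B repress each other, C follows A.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
    "C": lambda s: s["A"],
}

def step(state):
    """Synchronous update: every node reads the same previous state."""
    named = dict(zip(NODES, state))
    return tuple(int(RULES[n](named)) for n in NODES)

def find_attractors():
    """Map each attractor (a frozenset of states) to the size of its basin."""
    attractors = {}
    for state in product((0, 1), repeat=len(NODES)):
        seen = []
        while state not in seen:
            seen.append(state)
            state = step(state)
        # The first revisited state marks the start of the cycle.
        cycle = frozenset(seen[seen.index(state):])
        attractors[cycle] = attractors.get(cycle, 0) + 1
    return attractors

for cycle, basin in find_attractors().items():
    kind = "fixed point" if len(cycle) == 1 else f"cycle of length {len(cycle)}"
    print(sorted(cycle), kind, "basin size", basin)
```

Exhaustive enumeration visits all 2^N states, so it is only practical for small networks. The fixed points are the candidates usually associated with phenotypes, and clamping one node to 0 or 1 and re-running the search is one simple way to see which attractors remain accessible.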
Developmental issues have been addressed by the Artificial Life research community. An early attempt to integrate multiple scales of developmental mechanism into a single model included cells with complex internal dynamics that communicated with each other via chemical and electrical signals as well as physical interactions. One of the findings of this study was that, while the multiple mechanisms enabled the robust production of interesting phenotypes, they also made the design of specific phenotypes more difficult. Later research demonstrated that this difficulty could be addressed by using a representation of the regulatory network that could be artificially evolved.
in most satellite cells in culture, introduction of dominant-negative Pax7 specifically abolishes MyoD (Relaix et al., 2006) but not Myf5 expression or satellite cell differentiation. The role of Pax7 in adult satellite cells has been controversial. A first report on conditional Pax7 mutants indicated that the satellite cell population was still present and that muscle regeneration could take place, even in the absence of both Pax7 and Pax3 (Lepper et al., 2009). Since then this view has been modified, and in a more extensive study muscle regeneration was shown to be severely impaired when Pax7 ablation is attained in most satellite cells, preventing repopulation of the satellite cell pool (Günther et al., 2013; von Maltzahn et al., 2013). In this adult situation the satellite cell pool is not maintained, not due to cell death but probably because of premature differentiation at the expense of proliferation (Günther et al., 2013). Pax3/Pax7 are normally downregulated prior to activation of Myogenin, cell cycle exit, and differentiation. Artificial maintenance of their expression in myoblasts has been reported to retard differentiation (Crist et al., 2012; Olguin and Olwin,
A node within the network can be selected as an “output” node, representing a class label attribute. There may be more than one output node. Various algorithms for learning can be applied to the network. Rather than returning a single class label, the classification process can return a probability distribution that gives the probability of each class. A major advantage of Bayesian network models is the ability to learn them from observed data. Bayesian networks can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables. They are suitable for modeling gene networks because of their ability to represent stochastic events, to describe locally interacting processes, to handle noisy or missing biological data in a principled statistical way and to possibly make causal inferences from the derived models [20,21]. Hence, Bayesian networks, including their variants Dynamic Bayesian networks, Gaussian networks, Module networks, mixture Bayesian networks and state-space models (SSMs), etc., have become widely used tools for regulatory-network modeling.
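The idea of an output node that returns a probability distribution rather than a single label can be sketched with a very small discrete Bayesian network: one class node with two child nodes. The structure, the node names and all probability values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical network: class node -> g1, class node -> g2.
PRIOR = {"active": 0.3, "inactive": 0.7}

# P(gene is "high" | class); P("low" | class) is the complement.
P_HIGH = {
    "g1": {"active": 0.9, "inactive": 0.2},
    "g2": {"active": 0.7, "inactive": 0.4},
}

def posterior(evidence):
    """Return P(class | evidence) as a dict.

    `evidence` maps a gene name to "high" or "low". A gene missing from
    the dict is treated as unobserved; in this structure, summing over
    an unobserved child contributes a factor of 1, so it is skipped.
    """
    scores = {}
    for cls, prior in PRIOR.items():
        p = prior
        for gene, level in evidence.items():
            p_high = P_HIGH[gene][cls]
            p *= p_high if level == "high" else 1 - p_high
        scores[cls] = p
    total = sum(scores.values())
    return {cls: p / total for cls, p in scores.items()}

print(posterior({"g1": "high", "g2": "high"}))  # both genes observed
print(posterior({"g1": "high"}))                # g2 missing
```

The second call shows how a missing measurement is handled without imputation: the unobserved node simply drops out of the product, and the result is still a proper distribution over the classes.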
To evaluate the generalization of predictability to another independent dataset, models parameterized from the entire TF induction data were used to predict expression in a hypoxic condition. This hypoxia time course expression data was preprocessed using RMA, and expression on days 1 to 14 was normalized relative to day zero. The RMA preprocessing was done independently of the training data, i.e. the TF induction data, as results were more consistent with companion RT-PCR data. To generate a single model for each gene that passed the above validation, the best model structure was selected and trained on the entire TF induction expression data set (figure 2.4). This step used the genes validated by cross-validation to predict the expression of genes during hypoxia and re-aeration. It tests the ability of the models, generated from data derived from a baseline aerobic condition, to generalize and predict the expression of genes during a different, hypoxic, condition. Each time point during hypoxia and re-aeration was predicted separately and independently of previous time points. We are thus currently predicting steady-state expression rather than time-series evolution. Only genes whose expression changed by more than 2-fold, prior to normalization, were considered. After predicting the expression at each time point, we calculate the SSE between the model predictions at all time points and the actual normalized expression data. Similarly, we use this SSE to calculate an F-test p-value as above. We also compare the predictions of the models to random TFs as described above to calculate an empirical FDR.
Once the reference network is loaded, we open the assessment dialog and select the inferred network and the reference from their respective drop-down lists. Every edge in our network under assessment has a posterior probability assigned to it. When a network does not have such probabilities on the edges, the application assumes all edges have probabilities equal to 1. After we click OK, CyNetworkBMA runs the assessment function and presents a window with three tabs (Fig. 4). The first tab shows various assessment statistics for a given probability threshold, the value of which can be changed by moving a slider. The user can export the underlying data to a Cytoscape table, from where it can be saved to a file (see Additional file 3). The other two tabs show ROC and precision-recall curves, respectively, and their corresponding area under curve (AUC). The curves can also be exported to an image file. Our example network has an area under ROC curve of around 0.74. For networks inferred using the other four 100-gene data sets from DREAM4, this value ranges from 0.65 to 0.72. Table 1 shows other assessment scores for the example network.
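The kind of statistics reported in such an assessment can be sketched independently of the tool. The code below is not CyNetworkBMA's implementation; the four-gene network, the edge probabilities and the convention that a candidate edge absent from the inferred network scores 0 are assumptions made for illustration.

```python
from itertools import permutations

GENES = ["g1", "g2", "g3", "g4"]
REFERENCE = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}  # hypothetical gold standard
INFERRED = {  # hypothetical posterior edge probabilities
    ("g1", "g2"): 0.9, ("g2", "g3"): 0.6,
    ("g1", "g3"): 0.7, ("g4", "g1"): 0.2,
}
CANDIDATES = list(permutations(GENES, 2))  # all 12 possible directed edges

def score(edge):
    # Assumption: an edge absent from the inferred network scores 0.
    return INFERRED.get(edge, 0.0)

def stats_at(threshold):
    """Confusion counts, precision and recall at one probability threshold."""
    tp = sum(1 for e in CANDIDATES if score(e) >= threshold and e in REFERENCE)
    fp = sum(1 for e in CANDIDATES if score(e) >= threshold and e not in REFERENCE)
    fn = len(REFERENCE) - tp
    tn = len(CANDIDATES) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

def auroc():
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [score(e) for e in CANDIDATES if e in REFERENCE]
    neg = [score(e) for e in CANDIDATES if e not in REFERENCE]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(stats_at(0.5))
print(round(auroc(), 4))
```

Moving the threshold passed to `stats_at` plays the role of the slider: a lower threshold admits more edges, which raises recall and usually lowers precision.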
In addition, several known subtype-dependent interactions were revealed from this analysis. For example, we observed a cluster of cell type dependent interactions involving the S6, p70S6K, GSK3B, and Akt proteins (Fig 4), which involve a set of cell proliferation-related genes that respond to nutrient signals such as the mTOR-AKT pathway. In support of these findings, p70 S6 kinase, which targets the ribosomal subunit S6 also in this subnetwork, has been found to act as an alternate route for downstream signaling when Akt is inhibited [48]. Thus, the cell-specific interactions in this subnetwork may reflect tissue-dependent growth-related signaling. Another strong cluster of cell type dependent interactions found by the method involves EGFR and HER2, which direct growth signaling in response to binding growth factors produced by the stromal environment. The EGFR-family protein, HER2, does not bind ligand on its own but instead modulates the activity of other EGFR-family members through heterodimerization. HER2 plays a well-documented role in aberrant growth signaling in breast and other cancers where HER2 gene copies are amplified and/or overexpressed
relevant gene states and providing insights into the regulatory mechanism of the decision-making among gene states.
Unlike other methods that use randomization strategies to explore parameter sensitivity in gene circuits [141–144], RACIPE adopts a more carefully designed sampling strategy that randomizes circuit parameters over a wide range while satisfying the half-functional rule, in order to gain a comprehensive understanding of circuit dynamics. Instead of looking for the sensitivity of the circuit function to parameter variations [141,144] or the parameters that best fit the experimental data [142,143], we focused on uncovering conserved features from the ensemble of RACIPE models. This was carried out with standard statistical learning methods such as hierarchical clustering analysis. We showed the power of RACIPE to predict the robust gene states for a circuit with a given topology. Also, conceptually similar to the mixed-effects models used to describe a cell population for a very simple system, i.e. one-gene transcription without a regulator, RACIPE could potentially be applied to a very large gene circuit to describe the gene expression dynamics of a cell population with an ensemble of models - an aspect we will work on in our future study. Moreover, it is easy to implement gene modifications such as knockdown or overexpression treatments with the RACIPE method to learn the significance of each gene or link in the circuit. Therefore, RACIPE provides a new way to model a gene circuit without knowing the detailed circuit parameters.
Our findings in Paper I provide a preliminary view of the regulatory landscape of causal molecular processes active within and across a majority of tissues believed to be central to advanced CAD. We identified 94 TS modules and 77 CT modules using X-WGCNA, an extension of WGCNA (explained in Paper II). Computationally it was not feasible to consider all genes from the seven STAGE tissues, and we therefore considered only the most variant genes from each tissue. Nonetheless, we could still identify TS and CT RGNs that included both established and previously unreported CAD candidate genes in the form of key drivers. These candidate genes participate in diverse molecular processes and established pathways of atherosclerosis, cholesterol and glucose metabolism, and acute inflammation, and were regulated in both TS and CT networks. Importantly, we found that nearly half of the RGNs were evolutionarily conserved, as judged from validation against the HMDP. As proof of concept, in RGN 42, a cross-species-validated, mouse atherosclerosis- and CAD-causal network active in AAW and involving RNA-processing genes, four key drivers (AIP, DRAP1, POLR2I, and PQBP1) specifically activated the same network genes and affected THP-1 foam cell formation. The entire RGN 42 was also re-identified in independent gene expression data from both CAD macrophages and carotid lesions.
Abstract: Synthetic biologists are increasingly interested in the idea of using synthetic feedback control circuits for the mitigation of perturbations to gene regulatory networks that may arise due to disease and/or environmental disturbances. Models employing Michaelis-Menten kinetics with Hill-type nonlinearities are typically used to represent the dynamics of gene regulatory networks. Here, we identify some fundamental problems with such models from the point of view of control system design, and argue that an alternative formalism, based on so-called S-System models, is more suitable. Using tools from system identification, we show how to build S-System models that capture the key dynamics of an example gene regulatory network, and design a genetic feedback controller with the objective of rejecting an external perturbation. Using a sine sweeping method, we show how the S-System model can be approximated by a linear transfer function and, based on this transfer function, we design our controller. Simulation results using the full nonlinear S-System model of the network show that the synthetic control circuit is able to mitigate the effect of external perturbations. Our study is the first to highlight the usefulness of the S-System modelling formalism for the design of synthetic control circuits for gene regulatory networks.
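For orientation, the two model structures contrasted here are commonly written as follows. This is a sketch in generic notation: the symbols $x_i$, $V_{\max}$, $K$, $n$, $d$, $\alpha_i$, $\beta_i$, $g_{ij}$ and $h_{ij}$ are assumptions for illustration and are not taken from the paper's own equations. The first line is a Hill-type production term with first-order degradation; the second is the S-System form, in which production and degradation are each a single product of power laws.

```latex
% Hill-type activation of x_i by x_j, with first-order degradation
\dot{x}_i = V_{\max}\,\frac{x_j^{\,n}}{K^{n} + x_j^{\,n}} - d\,x_i

% S-System: one aggregated production term and one aggregated degradation term
\dot{x}_i = \alpha_i \prod_{j=1}^{N} x_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{N} x_j^{\,h_{ij}},
\qquad \alpha_i,\ \beta_i \ge 0
```

Taking logarithms turns each S-System term into an expression that is linear in the kinetic orders $g_{ij}$ and $h_{ij}$, which is one reason the formalism is often regarded as convenient for identification from data.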
To extract relations, one should first recognize the named entities involved. This is particularly difficult in molecular biology, where many forms of variation frequently occur. Synonymy is very common due to lack of standardization of gene names; BYP1, CIF1, FDP1, GGS1, GLC6, TPS1, TSS1, and YBR126C are all synonyms for the same gene/protein. Additionally, these names are subject to orthographic variation originating from differences in capitalization and hyphenation, as well as syntactic variation of multiword terms (e.g. riboflavin synthetase beta chain = beta chain of riboflavin synthetase). Moreover, many names are homonyms, since a gene and its gene product are usually named identically, causing cross-over of terms between semantic classes. Finally, paragrammatical variations are more frequent in life science publications than in common English due to the large number of publications by non-native speakers (Netzel et al., 2003).
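Two of the variation types described above, synonymy and orthographic variation, can be handled with a simple lookup, sketched below. The choice of TPS1 as the canonical name and the helper names `normalize` and `resolve` are assumptions for illustration; the sketch does nothing about homonymy or syntactic variation of multiword terms.

```python
import re

# Synonym list taken from the text; picking TPS1 as the canonical
# identifier is an arbitrary choice made for this sketch.
SYNONYMS = {
    "TPS1": ["BYP1", "CIF1", "FDP1", "GGS1", "GLC6", "TPS1", "TSS1", "YBR126C"],
}

def normalize(name):
    """Collapse capitalization, hyphenation and whitespace variants."""
    return re.sub(r"[\s\-]+", "", name).upper()

# Map every normalized alias to its canonical name.
LOOKUP = {
    normalize(alias): canonical
    for canonical, aliases in SYNONYMS.items()
    for alias in aliases
}

def resolve(mention):
    """Return the canonical gene name for a mention, or None if unknown."""
    return LOOKUP.get(normalize(mention))

print(resolve("cif-1"))    # hyphenation and case variant of CIF1
print(resolve("Ybr126c"))  # case variant of the systematic name
print(resolve("ABC1"))     # not in the synonym table
```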
Although theoretically rigorous, there are very few systems for which the ME has been solved analytically. Direct numerical integration is often very difficult, due to the fact that the state space grows very rapidly, and also to the stiffness of the equations. One way to overcome these issues is to employ some kind of approximation scheme. In certain limits, one can consider continuum approximations leading to partial differential equations, such as van Kampen’s linear-noise approximation and Fokker-Planck equations (13). However, these approximations disregard the discreteness of the state of the system, and can therefore give rise to significant deviations when the number of molecules is very small, as can be the case for gene regulatory networks (GRNs).
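To make the difficulty of direct integration concrete, the sketch below integrates the ME for about the simplest possible case, a single species produced at a constant rate and degraded at a rate proportional to its copy number. The rate constants, the truncation of the state space at `n_max` and the use of forward Euler are assumptions made for illustration and are not taken from the source.

```python
def integrate_master_equation(k=5.0, gamma=1.0, n_max=40, dt=0.001, t_end=10.0):
    """Forward-Euler integration of a birth-death master equation.

    p[n] is the probability of having n molecules. Production occurs at
    rate k and degradation at rate gamma * n. The state space is cut off
    at n_max by forbidding production out of the last state.
    """
    p = [0.0] * (n_max + 1)
    p[0] = 1.0  # start with zero molecules
    for _ in range(round(t_end / dt)):
        new = p[:]
        for n in range(n_max + 1):
            birth = k if n < n_max else 0.0
            outflow = (birth + gamma * n) * p[n]
            inflow = 0.0
            if n > 0:
                inflow += k * p[n - 1]                # production from n - 1
            if n < n_max:
                inflow += gamma * (n + 1) * p[n + 1]  # degradation from n + 1
            new[n] = p[n] + dt * (inflow - outflow)
        p = new
    return p

p = integrate_master_equation()
mean = sum(n * pn for n, pn in enumerate(p))
print("total probability:", sum(p))
print("mean copy number:", mean)
```

With one species, 41 states are enough. With M species each truncated at the same level the number of states grows as 41^M, and the largest transition rate, which limits the Euler step size, grows with the copy numbers; this is the state-space growth and stiffness referred to above.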
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Knowledge Discovery in Data is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from large data sets, involving methods such as artificial intelligence, machine learning, statistics and database systems. In a sense, data mining is the
In this research, a Bayesian network is proposed as a model for constructing gene regulatory networks from the S. cerevisiae cell-cycle data set, owing to its ability to handle microarray data sets with missing values. The aim of this research is to study and understand the design of Bayesian networks, and then to construct gene regulatory networks from Saccharomyces cerevisiae cell-cycle gene expression data and Escherichia coli data by building Bayesian network models using the hill-climbing algorithm together with Efron’s bootstrap approach. The gene networks constructed for Saccharomyces cerevisiae are compared with the sub-networks constructed by Dejori. At the end of this study, the gene networks constructed for Saccharomyces cerevisiae not only achieved a high True Positive Rate (more than 90%), but also uncovered more potential interactions between genes. It can therefore be concluded that the gene networks constructed using Bayesian networks in this study perform better, because they reveal more relationships between genes.
(b) The state space of the network represented as a directed graph where each node represents a state of the network while arcs represent valid state transitions.
(GPBN) model proposed by Graudenzi et al., which is a generalization of the classical RBN model. In a GPBN, gene-to-gene interactions are mediated by the synthesis of proteins and other products. However, post-transcriptional regulation carried out by miRNAs is still not fully considered. Figure 2 depicts a very simple example of a GRN modeling a cellular regulatory activity that includes all entities that need to be considered in order to extend the GPBN model to include post-transcriptional regulation mechanisms. According to the example, G1 and G2 are transcribed into two mRNA molecules (mRNA1 and mRNA2); P1 and P2 are the resulting proteins. P2 works as an upstream promoter of gene G3, i.e., G2 is a transcription factor of gene G3. miRNA1 (still a product of G1) acts as a post-transcriptional repressor of mRNA2, which results in a translational repression of P2 and therefore in an inhibition effect on gene G3.
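The regulatory logic of this example can be written down as a plain Boolean sketch. This is not the GPBN model itself, which also tracks the timing of product synthesis and decay; the one-step delays and the rule that P2 is translated only when mRNA2 is present and miRNA1 is absent are simplifying assumptions made for illustration.

```python
def settle(g1, g2, max_steps=10):
    """Iterate synchronous Boolean updates until the state stops changing.

    g1 and g2 are held fixed as inputs (1 = expressed, 0 = silent).
    Returns the steady state as a dict.
    """
    state = {"G1": g1, "G2": g2, "mRNA1": 0, "mRNA2": 0,
             "miRNA1": 0, "P1": 0, "P2": 0, "G3": 0}
    for _ in range(max_steps):
        nxt = {
            "G1": state["G1"],
            "G2": state["G2"],
            "mRNA1": state["G1"],    # G1 is transcribed into mRNA1
            "miRNA1": state["G1"],   # miRNA1 is also a product of G1
            "mRNA2": state["G2"],    # G2 is transcribed into mRNA2
            "P1": state["mRNA1"],    # mRNA1 is translated into P1
            # miRNA1 blocks translation of mRNA2 into P2
            "P2": int(state["mRNA2"] and not state["miRNA1"]),
            "G3": state["P2"],       # P2 activates G3
        }
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached")

print("G1 off, G2 on -> G3 =", settle(0, 1)["G3"])
print("G1 on,  G2 on -> G3 =", settle(1, 1)["G3"])
```

With G1 silent, G2 drives G3 through P2. With G1 expressed, miRNA1 prevents P2 from being produced and G3 stays off, even though G2 and mRNA2 are both present.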