Present Inference Methods - Development of dynamic Bayesian network for the analysis of high-di

3.1.2 Present Inference Methods

Existing methods for inferring gene regulatory networks (GRNs) are categorised here into differential equation methods and knowledge-based methods. Other approaches, such as Boolean networks and correlation-based methods, exist but are beyond the scope of this study.

3.1.2.1 Differential Equation Methods

The formalisms of differential equations, such as partial and ordinary differential equations, have been widely used to study dynamical systems in engineering and scientific research. These powerful methods have been adapted to model and represent metabolic processes and the dynamics of gene regulation [93], [94]. The focus here is on ordinary differential equations (ODEs), which can simultaneously represent changes over time and the dynamics of causal relationships across the four inference levels discussed earlier (Figure 3.1).

In an ODE model, the rate of change of each gene's expression is represented as a function of the concentrations of all the components, and within the ODE system the causal effects of the genes are fixed [95], [96]. The ODE can be formulated mathematically as

$$\frac{dX}{dt} = \theta(t, X) \tag{3.1}$$

where $X = X(t) = (X_1(t), \ldots, X_n(t))^T$ denotes the expression values of genes $1, \ldots, n$ at time point $t$, with $t \in [t_0, T]$ and $0 \le t_0 \le T \le \infty$. The function $\theta$, which may be linear or nonlinear, describes the relationship between the first-order derivative of $X$ (the rate of change in concentration of the genes) and the concentrations of their causal regulators in the regulatory system. For a linear ODE model, the relationship can be written as

$$\frac{dX_i}{dt} = \alpha_{i0} + \sum_{j=1}^{n} \beta_{ij} X_j(t), \quad i = 1, \ldots, n \tag{3.2}$$

where $\alpha_{i0}$ is the intercept and $\beta = \{\beta_{ij}\}_{i,j=1,\ldots,n}$ represents how the genes in the regulatory system affect the rate of change of expression of the $i$-th gene.
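To make the linear model of Eq. (3.2) concrete, the following is a minimal sketch that integrates it numerically with forward Euler steps. The three-gene system, its coefficient values and the function name `simulate_linear_grn` are illustrative assumptions and are not taken from the cited work.

```python
import numpy as np

def simulate_linear_grn(alpha, beta, x0, t_end=5.0, dt=0.001):
    """Integrate dX/dt = alpha + beta @ X (Eq. 3.2) with forward Euler steps."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    for k in range(n_steps):
        dxdt = alpha + beta @ traj[k]        # right-hand side of Eq. (3.2)
        traj[k + 1] = traj[k] + dt * dxdt    # Euler update
    times = np.linspace(0.0, n_steps * dt, n_steps + 1)
    return times, traj

# Hypothetical 3-gene system (assumed values): gene 1 activates gene 2,
# gene 2 represses gene 3, and every gene degrades (negative diagonal).
alpha = np.array([1.0, 0.0, 1.0])
beta = np.array([[-1.0,  0.0,  0.0],
                 [ 0.8, -1.0,  0.0],
                 [ 0.0, -0.5, -1.0]])

times, traj = simulate_linear_grn(alpha, beta, x0=np.zeros(3), t_end=20.0)

# At steady state alpha + beta @ X = 0, so X solves (-beta) X = alpha.
steady_state = np.linalg.solve(-beta, alpha)
```

Row i of `beta` holds the coefficients acting on gene i, so a non-zero entry `beta[i, j]` corresponds to a regulatory edge from gene j to gene i.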

Reconstructing the regulatory network from data is thereby transformed into a problem of identifying the parameters of the ODE system. Likelihood-based and least-squares-based methods have been used for this purpose [97], [77]. These methods are, however, not very effective for reverse engineering, and an integrative pipeline approach was therefore proposed [95], [96]. It is a two-step process: the first step fits the mean curves of the gene expression data and estimates their derivatives; the second step carries out variable selection using regularization methods such as LASSO [98] and the Smoothly Clipped Absolute Deviation (SCAD) penalty [99], which shrink the coefficients of irrelevant variables.
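The two-step idea can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline of [95], [96]: a per-gene polynomial fit stands in for the mean-curve fitting step, LASSO is solved with a plain coordinate-descent loop, and all names (`lasso_cd`, `two_step_fit`, `lam`) and data values are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min (1/2m)||y - b0 - X b||^2 + lam * ||b||_1."""
    m, n = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / m
    b = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            resid = yc - Xc @ b + Xc[:, j] * b[j]   # residual without gene j
            rho = Xc[:, j] @ resid / m
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    b0 = y_mean - x_mean @ b
    return b0, b

def two_step_fit(t, X_obs, lam, degree=5):
    """Step 1: smooth and differentiate each gene; step 2: LASSO per gene."""
    n = X_obs.shape[1]
    X_smooth = np.empty_like(X_obs, dtype=float)
    dX = np.empty_like(X_obs, dtype=float)
    for i in range(n):
        coefs = np.polyfit(t, X_obs[:, i], degree)   # assumed smoother
        X_smooth[:, i] = np.polyval(coefs, t)
        dX[:, i] = np.polyval(np.polyder(coefs), t)  # derivative estimate
    alpha_hat, beta_hat = np.zeros(n), np.zeros((n, n))
    for i in range(n):
        alpha_hat[i], beta_hat[i] = lasso_cd(X_smooth, dX[:, i], lam)
    return alpha_hat, beta_hat

# Synthetic data from an assumed 3-gene linear system (Euler integration).
true_alpha = np.array([1.0, 0.0, 1.0])
true_beta = np.array([[-1.0, 0.0, 0.0], [0.8, -1.0, 0.0], [0.0, -0.5, -1.0]])
x, samples, dt = np.zeros(3), [], 0.001
for k in range(5001):
    if k % 100 == 0:
        samples.append(x.copy())              # keep every 100th state
    x = x + dt * (true_alpha + true_beta @ x)
t = np.linspace(0.0, 5.0, 51)
rng = np.random.default_rng(0)
X_obs = np.array(samples) + rng.normal(scale=0.01, size=(51, 3))

alpha_hat, beta_hat = two_step_fit(t, X_obs, lam=0.01)
```

The penalty weight `lam` controls how many entries of `beta_hat` are shrunk to exactly zero; in practice it would be chosen by cross-validation or an information criterion, and a single short time series such as this one may not identify every coefficient reliably.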

In an ODE model, gene regulation is represented by derivative equations in which the dependent variable, the rate of change of one gene's expression, is quantified as a function of the expression of other related genes. Transcription factors (TFs) regulate the transcriptional processes of the genes in the regulatory system, and the independent variables are assumed to be the TF proteins. Under this assumption, reverse engineering the regulatory network becomes a problem of inferring the parameters of the functions, such as linear functions, used to model the gene expression data [95].

Due to the underlying differences between the statistical and mathematical perspectives on inferring a GRN, ODEs aid the modelling of the regulatory system but not its inference [100]. This is because functional relationships are assumed to be described by derivative equations, whereas inference of the regulatory architecture is done by statistical methods such as variable selection and parameter estimation [95]. The time delays associated with self-degradation and activation can be integrated into the system by introducing the respective self-degradation and activation terms into the differential equations.
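As an illustration only, one way such terms could be added to the linear model of Eq. (3.2) is shown below. The self-degradation rate $\lambda_i$ and the activation delay $\tau_{ij}$ are symbols assumed here for the sketch; they do not appear in the source, and the cited works may use a different form.

```latex
\frac{dX_i(t)}{dt} = \alpha_{i0} - \lambda_i X_i(t)
  + \sum_{j=1}^{n} \beta_{ij} X_j(t - \tau_{ij}),
  \quad i = 1, \ldots, n
```

Setting $\lambda_i = 0$ and $\tau_{ij} = 0$ for all $i, j$ recovers Eq. (3.2).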

3.1.2.2 Knowledge-based Methods

Genuine and accurate transcriptional regulations are difficult to identify solely through data-driven approaches, and the efficiency of reverse engineering from gene expression data alone is hard to guarantee [101], [97], [102]. Integrating a priori knowledge of regulations is therefore an important requirement for developing more useful and robust inference methods. Key functional linkages and relationships can be derived from documented regulations [103], [104], protein-protein interactions [91], protein-DNA binding data from ChIP-Seq [105] and motifs from TF binding sequences [82]. Transcriptional regulatory networks can be identified by integrating this a priori knowledge with gene expression data.

Reliable integration of a priori knowledge often leads to probabilistic inference frameworks such as Bayesian networks [106], [107]. Many techniques have been proposed to calculate the associated probabilities. Drawing on statistical physics, an energy function approach was proposed in [108], [109], which introduces prior knowledge from multiple sources by expressing it as a function of network energy. A Gibbs distribution is used to obtain the prior distribution over network structures [108], and the weights of the prior knowledge are represented by the parameters of this distribution. The Bayesian network thus integrates the prior knowledge when learning the structure of the regulatory network. The authors of [108] and [109] achieved higher inference performance using both real and simulated data.
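A minimal sketch of a Gibbs prior over network structures is given below. It assumes an energy equal to the total absolute disagreement between a candidate adjacency matrix `G` and a prior-knowledge matrix `B` with entries in [0, 1], and a prior P(G) proportional to exp(-weight × E(G)). This energy form, the matrix `B`, and the names `energy`, `gibbs_prior` and `weight` are assumptions made for illustration, and the exact definitions in [108], [109] may differ.

```python
import itertools
import numpy as np

def energy(G, B):
    """Disagreement between candidate adjacency G and prior matrix B
    (self-loops ignored)."""
    off_diag = ~np.eye(G.shape[0], dtype=bool)
    return np.abs(B - G)[off_diag].sum()

def gibbs_prior(B, weight):
    """Enumerate all graphs without self-loops and return their Gibbs prior."""
    n = B.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    graphs, unnorm = [], []
    for bits in itertools.product([0, 1], repeat=len(edges)):
        G = np.zeros((n, n))
        for (i, j), b in zip(edges, bits):
            G[i, j] = b
        graphs.append(G)
        unnorm.append(np.exp(-weight * energy(G, B)))
    probs = np.array(unnorm) / np.sum(unnorm)   # normalise by the partition function
    return graphs, probs

# Hypothetical prior knowledge for 3 genes: 0.5 means "no information".
B = np.full((3, 3), 0.5)
B[0, 1] = 0.9   # edge 0 -> 1 is supported by prior knowledge
B[1, 2] = 0.1   # edge 1 -> 2 is discouraged by prior knowledge

graphs, probs = gibbs_prior(B, weight=2.0)
best = graphs[int(np.argmax(probs))]

# Prior marginal probability that edge 0 -> 1 is present.
has_edge = np.array([G[0, 1] == 1 for G in graphs])
p_edge_01 = probs[has_edge].sum()
```

Exhaustive enumeration is only feasible for very small networks (64 graphs here); for realistic sizes the prior is typically combined with the data likelihood inside a sampling scheme. With `weight=0` the prior is uniform, and larger values give the prior knowledge more influence.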

An ODE-based linear programming (LP) method for integrating prior knowledge into the reverse engineering of GRNs was proposed in [110]. In this method, the association gap between the network structure and the gene expression data is minimised by building an LP model, and an integrated regulatory network is obtained by solving the LP. The objective of the LP is to minimise the number of gene connections, which enforces sparseness of the inferred network, subject to constraints given by the prior knowledge and the linear additive equations. The sparsest solution that satisfies both sets of constraints is chosen as the inferred network [110].
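The flavour of such an LP can be sketched with `scipy.optimize.linprog`. This is not the formulation of [110]: it assumes an L1 norm as a convex surrogate for the number of connections, a single target gene without an intercept, slack variables that absorb the misfit of the linear additive equations, and prior knowledge expressed only as edges known to be absent. The names `lp_sparse_regulators`, `forbidden` and `fit_weight` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def lp_sparse_regulators(X, y, forbidden=(), fit_weight=10.0):
    """Sparse regulators of one target gene via an L1 linear program.

    Variables (all >= 0): u, v with beta = u - v, and slacks rp, rn with
    X @ beta + rp - rn = y.  Objective: sum(u + v) + fit_weight * sum(rp + rn).
    """
    m, n = X.shape
    cost = np.concatenate([np.ones(2 * n), fit_weight * np.ones(2 * m)])
    A_eq = np.hstack([X, -X, np.eye(m), -np.eye(m)])
    bounds = [(0, None)] * (2 * n + 2 * m)
    for j in forbidden:              # prior knowledge: regulator j is excluded
        bounds[j] = (0, 0)
        bounds[n + j] = (0, 0)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:2 * n]

# Synthetic example (assumed values): 4 candidate regulators, 2 true ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))              # regulator expression levels
true_beta = np.array([0.0, 1.5, 0.0, -0.7])
y = X @ true_beta                         # derivative of the target gene
beta_hat = lp_sparse_regulators(X, y, forbidden=[0])
```

Running the function once per target gene yields one row of the coefficient matrix at a time. Prior knowledge about edges known to be present could be encoded in a similar way, for example by lowering the cost on the corresponding entries.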

In document Development of dynamic Bayesian network for the analysis of high-dimensional biomedical data (Page 54-57)