A Bayesian Approach for Inferring Local Causal Structure in Gene Regulatory Networks

(1)

The following full text is a publisher's version.

For additional information about this publication click this link.

http://hdl.handle.net/2066/197601

Please be advised that this information was generated on 2019-06-02 and may be subject to

change.

(2)

A Bayesian Approach for Inferring Local Causal Structure in Gene

Regulatory Networks

Ioan Gabriel Bucur [email protected]

Tom van Bussel [email protected]

Tom Claassen [email protected]

Tom Heskes [email protected]

Faculty of Science, Radboud University, Postbus 9010, 6500GL Nijmegen, The Netherlands

Abstract

Gene regulatory networks play a crucial role in controlling an organism’s biological processes, which is why there is significant interest in developing computational methods that are able to extract their structure from high-throughput genetic data. A typical approach consists of a series of conditional independence tests on the covariance structure meant to progressively reduce the space of possible causal models. We propose a novel efficient Bayesian method for discovering the local causal relationships among triplets of (normally distributed) variables. In our approach, we score the patterns in the covariance matrix in one go and we incorporate the available background knowledge in the form of priors over causal structures. Our method is flexible in the sense that it allows for different types of causal structures and assumptions. We apply the approach to the task of inferring gene regulatory networks by learning regulatory relationships between gene expression levels. We show that our algorithm produces stable and conservative posterior probability estimates over local causal structures that can be used to derive an honest ranking of the most meaningful regulatory relationships. We demonstrate the stability and efficacy of our method both on simulated data and on real-world data from an experiment on yeast.

Keywords: Causal discovery; Structure learning; Covariance selection; Bayesian inference; Gene regulatory networks.

1. Introduction

Gene regulatory networks (GRNs) play a crucial role in controlling an organism’s biological pro-cesses, such as cell differentiation and metabolism. If we knew the structure of a GRN, we would be able to intervene in the developmental process of the organism, for instance by targeting a specific gene with drugs. In recent years, researchers have developed a number of methods for inferring regulatory relationships from data on gene expression, the process by which genetic instructions are used to synthesize gene products such as proteins. Gene regulatory relationships are inherently causal: we can manipulate the expression level of one gene (the ‘cause’) to regulate that of another gene (the ‘effect’). Because of this, many GRN inference algorithms such as ‘Trigger’ (Chen et al., 2007), ‘CIT’ (Millstein et al., 2009) or ‘CMST’ Neto et al. (2013) are aimed at finding promising causal regulatory relationships among genes.

An efficient way to derive causal relationships from observational data, which results in clear and easily interpretable output, is to find local causal patterns in the data. Thelocal causal discovery

(LCD) algorithm (Cooper, 1997) makes use of a combination of observational data and background knowledge when searching for unconfounded causal relationships among triplets of variables.

(3)

Trig-ger is also designed to search for this LCD pattern in the data, using the background knowledge that genetic information is randomized at birth, before any other measurements can be made. Mani et al. (2006), on the other hand, divide the causal discovery task into identifying so-called Y structures on subsets of four variables. The Y structure is the smallest pattern containing an unconfounded causal relationship that can be learned solely from observational data in the presence of latent variables.

A key feature of Trigger is that it can estimate the probability of causal regulatory relationships, while controlling for the false discovery rate (Chen et al., 2007). The algorithm consists of a series of likelihood ratio tests for regression coefficients that are translated into statements about conditional (in)dependence, which are then used to identify the presence of the LCD pattern. Testing whether regression coefficients are significantly different from zero essentially boils down to testing whether partial correlation coefficients are significantly different from zero (Meinshausen and B¨uhlmann, 2006), which means that all the information needed for the tests lies in the covariance structure.

We propose a Bayesian approach for local causal discovery on triplets of (normally distributed) variables that makes use of the information in the covariance structure. With our method, we directly score patterns in the data by computing posterior probabilities over all possible three-dimensional covariance structures in one go, with the end goal of identifying plausible causal relationships. This provides a stable, efficient and elegant way of expressing the uncertainty in the underlying local causal structure, even in the presence of latent variables. Moreover, it is straightforward to incorporate background knowledge in the form of priors on causal structures. We show how we can plug in our method into an algorithm that searches for local causal patterns in a GRN and outputs a well-calibrated and reliable ranking of the most likely causal regulatory relationships.

The rest of the paper is organized as follows. In Section 2, we introduce some standard back-ground notation and terminology. In Section 3, we describe our Bayesian approach for inferring the covariance pattern of a three-dimensional Gaussian random vector. By defining simple priors, we then derive the posterior probabilities of local causal structures given the data. In Section 4, we present the results of applying our method on simulated and real-world data. We conclude by discussing advantages and disadvantages of our approach in Section 5.

2. Background

Causal structures can be represented by directed graphs, where the nodes in the graph represent (random) variables and the edges between nodes represent causal relationships. Maximal

ances-tral graphs(MAGs) encode conditional independence information and causal relationships in the

presence of latent variables and selection bias (Richardson and Spirtes, 2002). We refer to MAGs without undirected edges asdirected maximal ancestral graphs(DMAGs). DMAGs are closed un-der marginalization, which means they preserve the conditional independence information in the presence of latent variables.

Two causal structures areMarkov (independence) equivalentif they imply the same conditional independence statements. TheMarkov equivalence classof a MAG (or DMAG) is represented by a partial ancestral graph (PAG), which displays all the edge marks (arrowhead or tail) shared by all members in the class and displays circles for those marks that are not common among all members. In this work, we will consider two types of graphs: directed acyclic graphsanddirected maximal

ancestral graphs. However, the results presented can be applied to any causal graph structure.

A(conditional) independence modelIover a finite set of variablesV is a set of tripleshX, Y |Zi,

(4)

be empty (Studeny, 2006). We can induce a (probabilistic) independence model over a probability distributionP ∈ Pby letting:

hA, B|Ci ∈ I(P) ⇐⇒ A⊥⊥B|Cw.r.t. P.

The conditional independence model induced by a multivariate Gaussian distribution is a

com-positional graphoid(Sadeghi and Lauritzen, 2014), which means that it satisfies thegraphoid

ax-iomsand thecompositionproperty. Because of this, there is a one-to-one correspondence between the conditional independence models that can be induced by a multivariate Gaussian and the Markov equivalence classes of a causal graph structure.

3. Bayes Factors of Covariance Structures (BFCS)

Model Markov Equivalence Class (PAG) Covariance

Matrix Precision Matrix ‘X1⊥6 ⊥X2⊥6 ⊥X3’ (Full) X1 X2 X3 ‘X1⊥⊥X3’ (Acausal) X1 X2 X3

0

‘X₁⊥⊥X₃|X₂’ (Causal) X1 X2 X3

0

‘(X1,X3)⊥⊥X2’ (Independent) X1 X2 X3

0

‘X1⊥⊥X2⊥⊥X3’ (Empty) X1 X2 X3

0 0

0

0 0 0

0 0

0

0 0 0

Figure 1: Overview of the five canonical independence patterns between three variables, depicting the equivalence between causal models, conditional independences, and covariance structures

We are interested in inferring the local covariance structure from observational data, assuming the data follows a (latent) Gaussian model. We will be working with triplets of variables. With finite data, we can never be sure about the true covariance structure underlying the data. Hence, we prefer to work with probability distributions over covariance matrices. For a general three-dimensional

(5)

covariance matrixΣ, the likelihood reads: p(D|Σ) = (2π)−32n|Σ|− n 2 exp −1 2tr SΣ −1 ,

whereS=D|_D_{is the scatter matrix.}

Under the Gaussianity assumption, there is a one-to-one correspondence between the constraints in the covariance matrix and the conditional independences among the variables. There are five specific canonical patterns to consider, which are depicted in Figure 1. The ‘full’ and ‘empty’ covariance patterns are self-explanatory. We call ‘independent’ the pattern occurring when one variable is independent of the other two. We call the pattern on the second row ‘acausal’ because

X2cannot causeX1orX3if conditioning uponX2turns a conditional independence betweenX1and X3 into a conditional dependence. We call the pattern on the third row ‘causal’ becauseX2 either

causesX₁ orX₃ if conditioning upon X₂ turns a conditional dependence between X₁ andX₃ into a conditional independence (Claassen and Heskes, 2011). The five patterns translate into eleven distinct covariance structures when considering all permutations of three variables. These are the only possible covariance structures on three variables, since the conditional independence model induced by a multivariate Gaussian is a compositional graphoid (Sadeghi and Lauritzen, 2014).

Our goal is to compute the posterior probability of each of the possible conditional independence models given the data. We denote byJ ={M0,M1, ...,M10}the set of all possible conditional

independence models. The model evidence is then, forM_j ∈ J:

p(D| Mj) = Z

dΣp(D|Σ)p(Σ| Mj).

To facilitate computation, we derive the Bayes factors of each conditional independence model (M_j) compared to a reference independence model (M₀):

Bj = p(D| M_j) p(D| M0) = R dΣp(D|Σ)p(Σ| M_j) R dΣp(D|Σ)p(Σ| M0).

As we shall see in Subsection 3.2, many terms will cancel out, making the resulting ratios much simpler to compute (see for example Equation 5). Finally, we arrive at the posterior probabilities:

p(Mj|D) = p(D| Mj)·p(Mj) P jp(D| Mj)·p(Mj) = PBj·p(Mj) jBj ·p(Mj) . (1)

3.1 Choosing the Prior on Covariance Matrices

We consider the inverse Wishart distribution for three-dimensional covariance matrices, which is parameterized by the positive definite scale matrixΨand the number of degrees of freedomν:

Σ∼ W₃−1(Ψ, ν); p(Σ) = |Ψ| ν 2 232νΓ₃(ν 2) |Σ|−ν+42 exp −1 2tr ΨΣ −1 .

The inverse Wishart is the conjugate prior on the covariance matrix of a multivariate Gaus-sian vector, which means the posterior is also inverse Wishart. Given the data setDcontainingn

observations andS=D|_D_{the scatter matrix, the posterior reads:}

(6)

In order to choose appropriate parameters for the inverse Wishart prior, we analyze the implied distribution in the space of correlation matrices. By transforming the covariance matrix into a correlation matrix, we end up with a so-calledprojected inverse Wishart distribution on the latter, which we denote by PW−1_{. Barnard et al. (2000) have shown that if the correlation matrix} _R

follows a projected inverse Wishart distribution with scale parameterΨandνdegrees of freedom, then the marginal distributionp(Rij), i6=j,for off-diagonal elements is uniform if we takeΨto be

any diagonal matrix andν=p+ 1, wherepis the number of variables. We are working with three variables, so we chooseν = 4.

It is easy to check that for any diagonal matrixD, the projected inverse Wishart is scale invariant:

PW−1(Ψ, ν)≡ PW−1(DΨD, ν).

From (2), it then follows that we can make the posterior distribution on the correlation matrices independent of the scale of the data by choosing the prior scale matrixΨ=03,3. Since that would

lead to an undefined prior distribution, we can achieve the same goal by settingΨ=I3in the limit

↓0, whereIis the identity matrix. Summarizing, we will consider the prior distribution: Σ∼ W−1

3 (I3,4), ↓0.

3.2 Deriving the Bayes Factors

As reference model (M₀), we choose the most general case in which no independences can be found in the data (‘X1⊥6 ⊥X2⊥6 ⊥X3’), which means that the covariance matrix is unconstrained (Figure 1,

first row). We assume that the covariance matrix follows an inverse Wishart distribution

p(Σ|X1⊥6 ⊥X2⊥6 ⊥X3) =W3−1(Σ;I3, ν),

where we consider the limit↓ 0and setν = 4(see Subsection 3.1). Using the conjugacy of the inverse Wishart prior for the covariance matrix, we immediately get the model evidence

p(D|X1⊥6 ⊥X2⊥6 ⊥X3) = 32νΓ₃(n+ν 2 ) π32nΓ3(ν 2) |S+I3|− n+ν 2 , (3)

whereΓpis thep-variate gamma function andS=D|Dis the scatter matrix. We first compare the

evidence for the conditional independence model ‘X1⊥⊥X2⊥⊥X3’ to the evidence for the reference

model ‘X₁⊥6 ⊥X₂⊥6 ⊥X₃’ by computing the Bayes factor:

B(X1⊥⊥X2⊥⊥X3) =

p(D|X1⊥⊥X2⊥⊥X3)

p(D|X1⊥6 ⊥X2⊥6 ⊥X3)

.

We can implement the ‘X1⊥⊥X2⊥⊥X3’ case (Figure 1, last row) by constrainingΣto be diagonal,

which means we only have to consider the parametersΣ11,Σ22,Σ33. We propose to take:

p(Σ|X1⊥⊥X2⊥⊥X3) = 3

Y i=1

W₁−1(Σii;, ν). (4)

The likelihood also factorizes in this case and becomes

p(D|Σ) = 3 Y i=1 (2πΣii)− n 2 exp −1 2tr SiiΣ −1 ii ,

(7)

yielding the model evidence p(D|X1⊥⊥X2⊥⊥X3) = 3 Y i=1 " ν2Γ₁(n+ν 2 ) πn2Γ1(ν 2) (Sii+)− n+ν 2 # .

Dividing by the model evidence from (3) and taking the limit↓0, we obtain the Bayes factor

B(X1⊥⊥X2⊥⊥X3) = Γ3(ν₂) Γ3(n+₂ν) _Γ1₍n+ν 2 ) Γ1(ν₂) 3 |C|−n+2ν = n+ν−2 ν−2 Γ(ν−₂1) Γ(n+₂ν) Γ(ν₂) Γ(n+ν₂−1)|C| −n+₂ν_, (5) withCthe sample correlation matrix andΓthe (univariate) gamma function. Due to the choice (4), the evidence for ‘X1⊥⊥X2⊥⊥X3’ also scales with

ν

2, so the dominant terms depending oncancel

out and the Bayes factor depends only on the correlation matrix in the limit↓ 0. The derivations for the other cases (left out due to space constraints) are similar, leading to the Bayes factors:

In conclusion, for deriving the Bayes factors in (6), we only need to plug in the correlation matrix with the number of samples and compute a limited number of closed-form terms. This is then sufficient to obtain the full posterior distribution over the covariance structures, which makes the BFCS method fast and efficient.

3.3 Priors on Causal Structures

To do a full Bayesian analysis, we need to specify priors over the different conditional independence models. Assuming faithfulness, there is a one-to-one correspondence between the Markov equiv-alence classes of the underlying causal graph structure and the conditional independence models. By taking a uniform prior over causal graphs and denoting by|Mj|the number of causal graphs

consistent with the independence modelM_j ∈ J, we arrive at the prior:

p(Mj) =

|Mj| P

j|Mj|

, ∀Mj ∈ J.

In Table 1 we count the number of DAGs and DMAGs (see Section 2) consistent with each covariance pattern (see Figure 1). For example, if we assume an underlying DAG structure, then

p(X1⊥⊥X2⊥⊥X3) = ₂₅1. The addition of background knowledge (BK) reduces the number of causal

(8)

Pattern CI Model Description DAG DAG w/ BK DMAG DMAG w/ BK Full M0 X1⊥6 ⊥X2⊥6 ⊥X3 6 2 19 3 M1 X1⊥⊥X2 1 1 3 2 Acausal M2 X2⊥⊥X3 1 0 3 0 M₃ X3⊥⊥X1 1 1 3 2 M₄ X1⊥⊥X2|X3 3 1 5 1 Causal M5 X2⊥⊥X3|X1 3 1 5 1 M6 X3⊥⊥X1|X2 3 1 5 1 M7 X1⊥⊥(X2,X3) 2 2 3 3 Independent M₈ X₂⊥⊥(X3,X1) 2 1 3 1 M₉ X3⊥⊥(X1,X2) 2 1 3 1 Empty M₁₀ X1⊥⊥X2⊥⊥X3 1 1 1 1 All 25 12 53 16

Table 1: Number of causal graph structures over three variables in each Markov equivalence class. In the columns marked ‘w /BK’, the background knowledge thatX1 precedes all other variables,

i.e., there can be no arrowhead towardsX1, is added when counting the number of structures.

discovering causal regulatory relationships is the background knowledge that the genetic marker precedes the expression traits, i.e., thatX1 precedes all other variables. This additional constraint

leads to the counts in the columns marked ‘w/ BK’ in Table 1. Some of the covariance patterns imply acausal / causal statements (Figure 1), which is what allows us to directly translate the posterior probabilities over covariance patterns into statements over causal relationships.

Now that we have defined priors on the conditional independence models (Markov equivalence classes), we can derive the posterior probabilities from equations (1) and (6). The posterior proba-bilities could also be derived by combining the Bayesian Gaussian equivalent (BGe) score (Geiger and Heckerman, 1994) with the priors on causal structures defined in this subsection. Due to our choice of priors on covariance matrices (see Subsection 3.1), however, the Bayes factors are simpler to compute. This makes our approach more efficient when used to infer causal relationships in large regulatory networks.

To summarize, we have developed a method for computing the posterior probabilities of the covariance structures over three variables (BFCS). We will employ this procedure as part of an algorithm for discovering regulatory relationships. Similarly to LCD and Trigger, the idea is to search over triplets of variables to find potential local causal structures (see Algorithm 1).

4. Experimental Results

4.1 Consistency of Detecting Local Causal Structures

In this simulation, we assessed how well our BFCS approach is able to detect the causal structure

X1 → X2 → X3, which is crucial to the application of the LCD and Trigger algorithms. We

considered the three generating structures depicted in Figure 2. In all three cases, the variables are mutually marginally dependent, but only in the first modelX1⊥⊥X3|X2holds.

We sampled random structural parameters (interaction strengths) for each causal relationship from independent standard normal distributions. Then, given each generating model, we generated

(9)

X1 X2 X3

(a) Causal model

X1 X2 X3

(b) Independent model

X1 X2 X3

(c) Full model

Figure 2: Generating models

data sets of different sizes (from102 up to106samples). For each data set, we computed the corre-lation matrix, which we plugged into 6 for computing the Bayes factors. We repeated this procedure for 1000 different parameter configurations. We assumed thatX1 precedes all other variables and

we allowed for latent variables, so we did not use the knowledge that the data is causally sufficient. We considered a uniform prior over the twelve possible DMAG structures (see Table 1).

0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

Number of samples on the base−10 logarithmic scale

Probability of causal model

(a) Causal model

0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

(b) Independent model 0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

(c) Full model 0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

(d) Causal model 0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

(e) Independent model

0.00 0.25 0.50 0.75 1.00 2 3 4 5 6

(f) Full model

Figure 3: Box plots of the posterior probabilities ofp(X1 →X2 →X3|D)output by BFCS across

the 1000 different parameter configurations for each of the generating models in Figure 2. As we increased the number of samples, the probabilityp(X1 →X2→X3|D)converged to one when the

data was generated from the causal model (a) and converged to zero when it was not (b and c).Top: x= (X1,X2,X3)is multivariate Gaussian;Bottom:X1is Bernoulli and(X2,X3)|X1is Gaussian.

In the first experiment we generated random multivariate data from the models in Figure 2. As expected, the posterior probability p(X1 → X2 → X3 |D) converged to one (Figure 3, top row)

when the true generating model was the one in Figure 2a. At the same time,p(X1→X2 →X3|D)

converged to zero when the true generating model was the independent or full model. Note that it is easier to distinguish the causal model from the independent model than from the full model. When generating data from the full model, it is possible to generate structural parameters that are close to zero. If the direct interaction betweenX1 andX3 is close to zero, it looks as ifX1⊥⊥X3|X2.

In the second experiment, we considered the same generating models, but we sampledX1from

a Bernoulli distribution to mimic a genetic variable, e.g., an allele or the parental strain in the yeast experiment (Chen et al., 2007). We sampled a different success probability for the Bernoulli variable in each repetition from a uniform distribution between 0.1 and 0.9. The random vector(X2,X3)was

(10)

see that when the Gaussian assumption did not hold forX1, we lost some power in recovering the

correct model. When we increased the number of samples, this loss in power due to the violation of the Gaussian assumption became less severe and BFCS remained consistent.

4.2 Causal Discovery in Gene Regulatory Networks

0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 1 − Specificity Sensitivity ROC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Recall Precision PRC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 1 − Specificity Sensitivity ROC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Recall Precision PRC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 1 − Specificity Sensitivity ROC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Recall Precision PRC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 1 − Specificity Sensitivity ROC 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 Recall Precision PRC

trigger 1 trigger 2 trigger 3 BFCS DAG BFCS DMAG BGe

Figure 4: Comparing the performance of Trigger and BFCS in terms of the ROC and Precision-Recall (PRC) curves. We generated both a sparse GRN (left column) and a dense GRN (right column) consisting of 51 and 491 regulatory relationships, respectively. We generated 100 samples (top row) and then 1000 samples (bottom row) from both networks. We ran Trigger three times (labeled ‘trigger 1’, ‘trigger 2’, and ‘trigger 3’) on the simulated data to account for differences when sampling the null statistics. We compared Trigger against two versions of BFCS, in which we took a uniform prior over DAGs (‘BFCS DAG’) and DMAGs (‘BFCS DMAG’), respectively. For reference, we also show the performance of an equivalent method that uses the BGe score (‘BGe’). In this experiment, we simulated a transcriptional regulatory network meant to emulate the yeast data set analyzed in Chen et al. (2007). We randomly generated data for 100 genetic markers, where each marker is an independent Bernoulli variable with ‘success’ probability uniformly sampled between 0.1 and 0.5. We then generated transcript level expression data from the structural equation model:

t:=Bt+l+ε,

where Bis a lower triangular matrix,t = (T1, T2, ..., T100)| is the random vector of the

(11)

N(0100,I100)is added noise. The true causal relationships are in the directed graph structure

de-fined byB. Each expression traitTiwas causally linked to the genetic markerLi,∀i∈ {1,2, ...,100}.

For each pair of expression traits(Ti, Tj)and for every genetic markerLk, we derived the

pos-terior probability ofLk→Ti →Tj using BFCS. We then took the maximum of these probabilities

overk, which is a lower bound of the probability that there existsLksuch that the triple(Lk, Ti, Tj)

has the causal structureLk→ Ti →Tj. Similarly to Trigger, we reported this value as a

conserva-tive estimate for the probability ofTi →Tj. For reference, we also evaluated the performance of an

equivalent method that uses the BGe score instead of the Bayes factors computed in BFCS.

In Figure 4, we compare the performance of Trigger and BFCS by showing the ROC and Precision-Recall (PRC) curves. Both Trigger and BFCS performed much better when the under-lying network structure was sparse. For sparse networks, BFCS shows a significant improvement in the AUC measure for both curves. For dense networks, Trigger and BFCS have fairly similar ROC curves, but BFCS shows a noticeable improvement in the precision-recall curve. More specifically, the higher precision for the first causal relationships that are recalled shows that BFCS is better at ranking the top regulatory relationships. Furthermore, the probabilities output by BFCS are bet-ter calibrated than those by Trigger (see Table 2). This is because Trigger is overconfident in its predictions, which explains the large spread in probabilities compared to BFCS.

Sparse GRN Dense GRN

Samples trigger BFCS DAG BFCS DMAG BGe trigger BFCS DAG BFCS DMAG BGe 100 0.037 0.022 0.015 0.009 0.481 0.201 0.148 0.231 1000 0.028 0.011 0.008 0.006 0.343 0.376 0.310 0.445

Table 2: Comparing the calibration of Trigger and BFCS using the Brier score (lower is better).

4.3 Comparing Results from an Experiment on Yeast Algorithm 1Running BFCS on the yeast data set

1: Input:Yeast data set consisting of 3244 markers and 6216 gene expression measurements

2: for allexpression traitsTido

3: for allexpression traitsTj, j 6=ido

4: for allgenetic markersLkdo

5: Compute the Bayes factors for the triplet(Lk, Ti, Tj)

6: Derive the posterior probability of the structureLk→Ti→Tjgiven the data

7: end for

8: Savemaxkp(Lk→Ti→Tj)as the probability of geneiregulating genej

9: end for 10: end for

11: Output:Matrix of regulation probabilities

Chen et al. (2007) showcased the Trigger algorithm by applying it to an experiment on yeast, in which two distinct strains were crossed to produce 112 independent recombinant segregant lines. Genome-wide genotyping and expression profiling were performed on each segregant line. We computed the probabilities over the triples in the yeast data set usingtrigger(Chen et al., 2017) and BFCS (Algorithm 1). For BFCS, we used DMAGs to allow for the possibility of latent variables.

(12)

Rank Gene Chen et al. trigger BFCS 1 MDM35 0.973 0.999 0.678 2 CBP6 0.968 0.997 0.683 3 QRI5 0.960 0.985 0.678 4 RSM18 0.959 0.984 0.672 5 RSM7 0.953 0.977 0.684 6 MRPL11 0.924 0.999 0.670 7 MRPL25 0.887 0.908 0.675 8 DLD2 0.871 0.896 0.660 9 YPR126C 0.860 0.904 0.634 10 MSS116 0.849 0.997 0.659 (a) Genes regulated by NAM9, sorted by ‘Chen et al.’.

Rank Gene Chen et al. trigger BFCS 1 FMP39 0.176 0.401 0.691 2 DIA4 0.493 0.987 0.691 3 MRP4 0.099 0.260 0.691 4 MNP1 0.473 0.999 0.691 5 MRPS18 0.527 0.974 0.690 6 MTG2 0.000 0.000 0.690 7 YNL184C 0.299 0.768 0.690 8 YPL073C 0.535 0.993 0.690 9 MBA1 0.290 0.591 0.690 10 ACN9 0.578 0.927 0.690 (b) Genes regulated by NAM9, sorted by ‘BFCS’.

Table 3: The column ‘Chen et al.’ shows the original results of the Trigger algorithm as reported in Chen et al. (2007). The ‘trigger’ column contains the probabilities we obtained when running the algorithm from the Bioconductortriggerpackage (Chen et al., 2017) on the entire yeast data set with default parameters. The column ‘BFCS’ contains the output of running Algorithm 1 on the yeast data set, for which we took a uniform prior over DMAGs.

In Table 3a, we report the top ten genes purported to be regulated by the putative regulator NAM9, sorted according to the probability estimates reported in Chen et al. (2007). We see that BFCS also assigns relatively high, albeit better calibrated (Table 2) and much more conservative, probabilities to the most significant regulatory relationships found by Trigger. In Table 3b, we see that some relationships ranked significant by BFCS are assigned a very small probability by Trigger. The regulatory relationship NAM9→ MTG2, of which both genes are part of the mitochondrial ribosome assembly, is ranked sixth by BFCS, but is assigned zero probability by Trigger. This is because, in the Trigger algorithm, the genetic markerLi exhibiting the strongest linkage withTi

(NAM9 in this case) is preselected and then only the probability ofLi → Ti → Tj is estimated.

With BFCS, on the other hand, we estimated the probability of this structure for all genetic markers. 5. Discussion

We have introduced a novel Bayesian approach for inferring gene regulatory networks that uses the information in the local covariance structure over triplets of variables to make statements about the presence of causal relationships. One key advantage of our method is that we consider all possible causal structures at once, whereas other methods only look at and test for a subset of structures. Because we focus on discovering local causal structures, our method is simple, fast, and inherently parallel, which makes it applicable to very large data sets. Furthermore, the probability estimates produced by BFCS constitute a measure of reliability in the inferred causal relations. We have demonstrated the effectiveness of our approach by comparing it against the Trigger algorithm, a state-of-the-art procedure for inferring causal regulatory relationships. Other methods for inferring gene regulatory networks such as ‘CIT’ (Millstein et al., 2009) or ‘CMST’ (Neto et al., 2013) output

p-values instead of probability estimates, which is why they are not directly comparable to BFCS. In this paper, we have proposed a simple uniform prior on two types of causal graph structures, namely DAGs and DMAGs. However, our method allows for more informative causal priors to be incorporated, taking into consideration properties such as the sparsity of the networks. Moreover, our approach is structure-agnostic, by which we mean we can consider different causal graph

(13)

struc-tures incorporating various data-generating assumptions. The tricky part is then to come up with an appropriate prior on the set of causal graph structures from which we assume the data is generated.

Acknowledgments

This research has been partially financed by the Netherlands Organisation for Scientific Research (NWO), under project 617.001.451. We would like to thank the anonymous reviewers for their thoughtful comments and efforts towards improving our manuscript.

References

J. Barnard, R. McCulloch, and X.-L. Meng. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage. Stat. Sin., 10(4):1281–1311, 2000. L. S. Chen, F. Emmert-Streib, and J. D. Storey. Harnessing naturally randomized transcription to

infer regulatory relationships among genes. Genome Biology, 8:R219, Oct. 2007.

L. S. Chen, D. P. Sangurdekar, and J. D. Storey. trigger, 2017. URLhttps://doi.org/doi: 10.18129/B9.bioc.trigger.

T. Claassen and T. Heskes. A Logical Characterization of Constraint-based Causal Discovery. UAI’11, pages 135–144, Arlington, Virginia, United States, 2011. AUAI Press.

G. F. Cooper. A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships. Data Mining and Knowledge Discovery, 1(2):203–224, June 1997. D. Geiger and D. Heckerman. Learning Gaussian Networks. UAI’94, pages 235–243, San

Fran-cisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc. ISBN 978-1-55860-332-5.

S. Mani, P. Spirtes, and G. F. Cooper. A Theoretical Study of Y Structures for Causal Discovery. UAI’06, pages 314–323, Arlington, Virginia, United States, 2006. AUAI Press.

N. Meinshausen and P. B¨uhlmann. High-Dimensional Graphs and Variable Selection with the Lasso.

The Annals of Statistics, 34(3):1436–1462, 2006.

J. Millstein, B. Zhang, J. Zhu, and E. E. Schadt. Disentangling molecular relationships with a causal inference test. BMC Genetics, 10:23, May 2009.

E. C. Neto, A. T. Broman, M. P. Keller, A. D. Attie, B. Zhang, J. Zhu, and B. S. Yandell. Modeling Causality for Pairs of Phenotypes in System Genetics. Genetics, 193(3):1003–1013, Mar. 2013. T. Richardson and P. Spirtes. Ancestral Graph Markov Models. The Annals of Statistics, 30(4):

962–1030, 2002. ISSN 0090-5364.

K. Sadeghi and S. Lauritzen. Markov properties for mixed graphs. Bernoulli, 20(2):676–696, May 2014. ISSN 1350-7265. doi: 10.3150/12-BEJ502.

M. Studeny. Probabilistic Conditional Independence Structures. Springer Science & Business Media, June 2006. ISBN 978-1-84628-083-2. Google-Books-ID: NJ4iwCMoznIC.