BRAMP: Mondrian Process Change-points - Machine learning in systems biology at different scales

5.2 Model

5.2.5 BRAMP: Mondrian Process Change-points

The global change-point process described in the previous section cannot create segmentations with spatially varying length scales and different local fineness and coarseness characteristics. In fact, introducing global change-points that might improve segmentation in one region can introduce artefacts in the form of undesired partitioning in other regions. In order to provide varying levels of fineness of the segments and thereby account for spatial alterations of the regulatory relationships among species on a local scale, I adapt a local partitioning approach called the Mondrian process, introduced in [124] and described in detail in [123]. The Mondrian process can be expressed as a recursive generative process that randomly executes axis-aligned cuts, partitioning the underlying space in a hierarchical fashion akin to decision trees or kd-trees (Figure 5.3). The distinguishing feature of this recursive stochastic process is that it assigns probabilities to the various events in such a way that it is consistent (in a sense I make precise later). The implication of consistency is that the Mondrian process can be extended to infinite spaces and used as a non-parametric prior in multiscale modelling. It can also be regarded as an n-dimensional generalization of the Poisson process, and it has the same self-consistency property.

Here I introduce the Mondrian process into the framework of a Bayesian regression model, partitioning a 2-dimensional domain $\Theta_1 \times \Theta_2$ (longitude times latitude) inhabited by the species profiles as published in [3]. The so-called budget is a hyper-parameter $\lambda$ that determines the average number of cuts in the partition. At each stage of the recursion, a Mondrian sample can either define a trivial partition $\Theta_1 \times \Theta_2$, i.e. a segment, or a cut that creates two sub-processes $m_<$ and $m_>$:

$$m = \langle i, \chi, \lambda', m_<, m_> \rangle,$$

where $i$ is the horizontal or vertical direction and $\chi$ the position of the cut. The direction $i$ and position $\chi$ are drawn from a Bernoulli and a uniform distribution, respectively, both depending on $\Theta_1$ and $\Theta_2$, as shown in lines 6 and 7 of Algorithm 1. The process of cutting a segment is limited by the budget $\lambda$ associated with each segment and the cost $E$ of a cut. Conditional on the half-perimeter $\tau = |\Theta_1| + |\Theta_2|$, a cut is introduced, yielding $m_<$ and $m_>$, if the cost $E \sim \exp(\tau)$ (an exponential distribution with rate $\tau$) does not exceed the budget $\lambda$, i.e. satisfies $\lambda' = \lambda - E > 0$. The process is recursively repeated on $m_<$ and $m_>$ until the budgets are exhausted, as shown in Algorithm 1. This creates a binary tree with the initial Mondrian sample $m_{k=1}$ as the root node spanning the unit square $[0, 1]^2$ and sub-nodes representing Mondrian samples $m_{1<k\le K}$, $k \in \{1, \ldots, K\}$, where $K$ is the total number of nodes in the tree, e.g. $K = 15$ in Figure 5.3. The leaf nodes represent non-overlapping segments, and each is associated with a latent variable $h(k) \in \{1, \ldots, H\}$, labelled $m_{h(k)}$ in the tree (right panel in Figure 5.3), where $H$ is the total number of leaves, which equals the number of uncut segments in the 2-dimensional domain, e.g. $H = 8$ in the left panel of Figure 5.3. Hence, the variable $h(k)$ is an index to the segments or 'spatial neighbourhoods' in space and can be understood as a latent effect that determines the different interactions among species, as described in the regression model in Section 5.2.2.
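The recursive generative process described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the function names, the dictionary representation of the tree, the seed and the budget value are all assumptions made for the example, and so is the convention that direction 1 splits the first interval and direction 2 the second.

```python
import random

def sample_mondrian(x_range, y_range, budget, rng):
    """Recursively sample a Mondrian partition of an axis-aligned box.

    Leaves carry only their box; internal nodes also carry the cut.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    tau = (x1 - x0) + (y1 - y0)        # half-perimeter of the box
    cost = rng.expovariate(tau)        # E ~ exponential with rate tau
    remaining = budget - cost          # lambda' = lambda - E
    if remaining <= 0:                 # budget exhausted: uncut segment
        return {"box": (x_range, y_range), "leaf": True}
    # Assumption: split dimension 1 with probability |Theta_1| / tau,
    # otherwise dimension 2; the cut position is uniform on that side.
    if rng.random() < (x1 - x0) / tau:
        axis, chi = 1, rng.uniform(x0, x1)
        lo = sample_mondrian((x0, chi), y_range, remaining, rng)
        hi = sample_mondrian((chi, x1), y_range, remaining, rng)
    else:
        axis, chi = 2, rng.uniform(y0, y1)
        lo = sample_mondrian(x_range, (y0, chi), remaining, rng)
        hi = sample_mondrian(x_range, (chi, y1), remaining, rng)
    return {"box": (x_range, y_range), "leaf": False, "axis": axis,
            "cut": chi, "budget": remaining, "lo": lo, "hi": hi}

def leaves(node):
    """Collect the boxes of all uncut segments (leaf nodes)."""
    if node["leaf"]:
        return [node["box"]]
    return leaves(node["lo"]) + leaves(node["hi"])

def count_nodes(node):
    """Count all nodes K of the binary tree."""
    if node["leaf"]:
        return 1
    return 1 + count_nodes(node["lo"]) + count_nodes(node["hi"])

rng = random.Random(0)                 # illustrative seed
tree = sample_mondrian((0.0, 1.0), (0.0, 1.0), budget=2.0, rng=rng)
segments = leaves(tree)
H = len(segments)                      # number of uncut segments
K = count_nodes(tree)                  # total number of tree nodes
```

Because every cut replaces one leaf by an internal node with two children, any tree sampled this way satisfies $K = 2H - 1$, which agrees with $K = 15$ and $H = 8$ in Figure 5.3.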

I apply the same regression model as defined in Section 5.2.2, with the sole difference that the segment indices $h \in \{1, \ldots, H\}$ are replaced with the uncut Mondrian partition indices $h(k) \in \{1, \ldots, H\}$.

Algorithm 1 MCMC Mondrian cut. Note that the Mondrian generative process corresponds to lines 1-7 and 10, i.e. the MCMC move extends it by considering the acceptance probability in lines 8-9.

 1: Input: $m$, $\lambda$
 2: $h(k) \leftarrow \mathcal{U}(1, Z)$    ▷ uniformly select uncut segment $h(k)$
 3: $\lambda' \leftarrow \lambda - E$ with $E \sim \exp(|\Theta_1^{h(k)}| + |\Theta_2^{h(k)}|)$
 4: if $\lambda' \ge 0$ then    ▷ if budget sufficient, draw direction $i \in \{1, 2\}$,
 5:                             ▷ where $i = 1$ is vertical and $i = 2$ is horizontal
 6:     $i \sim \mathcal{B}(|\Theta_1^{h(k)}| / (|\Theta_1^{h(k)}| + |\Theta_2^{h(k)}|))$
 7:     $\chi \mid i \sim \mathcal{U}(\Theta_i^{h(k)})$    ▷ draw cut position $\chi$
 8:     $\alpha \leftarrow \min\{1, r\}$    ▷ acceptance probability, equation 5.13
 9:     if $\alpha > u \sim \mathcal{U}(0, 1)$ then    ▷ accept with sub-trees $m_<$, $m_>$
10:         $m_{h(k)} \leftarrow \langle i, \chi, \lambda', m_<, m_> \rangle$
11:     end if
12: end if

5.2.5.1 Prior probabilities

The priors for the parameter $w_{n,h(k)}$, regulator set $\pi_n$, variance $\sigma_n^2$, and signal-to-noise hyper-parameter $\delta_n^2$ are defined in the same way as for BRAM in Section 5.2.4.1. However, the notation of the segment changes from $h$ to $h(k)$ so that I can identify the partition of a Mondrian sample $m_k$ with node index $k$. In addition, the size of matrix $D_{n,h(k)}$ below Equation 5.7 becomes $s_{n,h(k)} = |\widehat{\Theta}_1^{h(k)}||\widehat{\Theta}_2^{h(k)}| \times (p_n + 2)$, where $|\widehat{\Theta}_i^{h(k)}|_{i \in \{1,2\}}$ denotes the size of a Mondrian sample in direction $i$.

The prior distribution of the Mondrian process depends on the hyper-parameter $\lambda$ and is defined via the generative process described in Algorithm 1. However, for the RJMCMC scheme described below all that is needed is the prior ratio, which is given by (5.15).


Figure 5.3: Mondrian process example. The left panel shows an example partitioning with a Mondrian process. The right panel displays the associated tree with labels of the latent variable $h(k)$ identifying each non-overlapping segment with leaf nodes (light grey) designated as $m_{h(k)}$, where $k$ indexes all tree nodes.

5.2.5.2 Posterior probability

The likelihood follows from Equation (5.1) and closely resembles the previously defined likelihood of BRAM in Equation (5.8):

$$L(x_n^{h(k)} \mid G_n, w_n^{h(k)}, \sigma_n^{h(k)}) = \left(\sqrt{2\pi}\,\sigma_n^{h(k)}\right)^{-s_{n,h(k)}} \times \exp\left(-\frac{(x_n^{h(k)} - D_{n,h(k)}(x)\,w_n^{h(k)})^{\dagger}(x_n^{h(k)} - D_{n,h(k)}(x)\,w_n^{h(k)})}{2(\sigma_n^{h(k)})^2}\right)$$
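In practice a likelihood of this form is best evaluated on the log scale to avoid numerical underflow. The following is a minimal sketch for a single species and segment, using plain Python lists; the function name and argument layout are hypothetical, and the exponent $s_{n,h(k)}$ is taken to be the number of observations in the segment (the number of rows of the design matrix), which is an assumption about the notation.

```python
import math

def segment_log_likelihood(x, D, w, sigma):
    """Gaussian log-likelihood of the observations in one segment.

    x: list of s observations; D: s-by-p design matrix as a list of rows;
    w: list of p regression coefficients; sigma: noise standard deviation.
    """
    s = len(x)  # assumed meaning of s: number of observations in the segment
    # Residuals x - D w, computed row by row.
    residuals = [xi - sum(dij * wj for dij, wj in zip(row, w))
                 for xi, row in zip(x, D)]
    rss = sum(r * r for r in residuals)  # (x - D w)^T (x - D w)
    return (-s * math.log(math.sqrt(2.0 * math.pi) * sigma)
            - rss / (2.0 * sigma ** 2))

# Toy example: two observations that are fitted exactly.
ll = segment_log_likelihood([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 1.0)
```

With zero residuals and unit variance only the normalising term remains, so the toy value equals $-\log(2\pi)$.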

An attractive feature of the chosen model is that the marginalization over the parameters $w = \{w_n^{h(k)}, 1 \le n \le N, 1 \le h(k) \le H\}$ and $\sigma^2 = \{(\sigma_n^{h(k)})^2, 1 \le n \le N, 1 \le h(k) \le H\}$ is analytically tractable [89, 7], and I obtain a closed-form expression for the marginal likelihood:

$$L(x_n^{h(k)} \mid G_n, \delta^2) = \int\!\!\int L(x_n^{h(k)} \mid G_n, w_n^{h(k)}, \sigma_n^{h(k)})\; p([\sigma^{h(k)} \cdots$$

The objective of Bayesian inference is to sample from the posterior distribution given by

$$p(m, G, \kappa, \delta^2 \mid x) \propto L(x_n^{h(k)} \mid G_n, \delta^2)\, p(\delta^2)\, p(E)\, p(G \mid \kappa)\, p(m \mid \lambda) \tag{5.12}$$

where all prior distributions have been defined above. To this end, I pursue a Gibbs-sampling-like strategy, where I iteratively sample new hyper-parameters from $p(\kappa, \delta^2 \mid G, m, x)$, a new network structure from $p(G \mid \kappa, \delta^2, m, x)$, and a new Mondrian process segmentation of the spatial domain from $p(m \mid G, y, \lambda)$. The first distribution is of standard form due to conjugacy of the prior, and the hyper-parameters can be sampled directly. However, direct sampling from the other two distributions is intractable, and I therefore apply RJMCMC [56]. To sample new network structures $G$, I follow the scheme described by Lèbre et al. [89], which is based on edge birth and death moves. To sample new Mondrian process partitions, I adopt the method proposed by Wang et al. [147], which I will briefly outline in the next section. The scheme could be extended to infer $\lambda$, but that has not been done yet, and I assume this hyper-parameter to be fixed and hence have not made it explicit on the left-hand side of equation (5.12).

I am primarily interested in a sample of network structures from the posterior distribution $p(G \mid x)$, which I obtain by marginalizing over the hyper-parameters and Mondrian process partitionings. I get the posterior probabilities of all species interactions $p(n \to n' \mid x)$ by further marginalization, which define the ranking of interactions in terms of posterior confidence.

5.2.5.3 Inference

As described above, an essential step of the inference procedure is to sample a new Mondrian process segmentation $m$ from $p(m \mid G, y, \lambda)$. The current state of the Mondrian process $m$ is represented by a tree structure as illustrated in Figure 5.3 and a model parameter vector $\zeta$, which contains all previous costs $E_k$ and cut locations $\chi_k$. Note that all budgets and domains can be computed recursively from these. When a cut move is proposed (marked with $+$), the current parameter values are augmented by supplementary random variates $u_1$ and $u_2$ in such a way that the dimensions in the higher and lower dimensional parameter spaces are matched. I uniformly sample a spatial segment $h(k)$, draw $u_1$ and $u_2$ from the density $q(u_1, u_2)$ and set $\zeta \to \zeta^+ =$ […] proceeds as shown in Algorithm 1, where $\chi^{h(k)}$ defines the position proportional to the sample domain size, which follows a Bernoulli distribution $\mathcal{B}$. The proposed new Mondrian process state $m^+$ is accepted with probability $\alpha = \min\{1, r\}$,

$$r = \frac{P(m^+ \mid \lambda)}{P(m \mid \lambda)} \times \frac{q(m \mid m^+)}{q(m^+ \mid m)} \times \frac{L(x_<^{h(k)} \mid G, \delta^2)\, L(x_>^{h(k)} \mid G, \delta^2)}{L(x^{h(k)} \mid G, \delta^2)} \times J \tag{5.13}$$

$$\frac{q(m \mid m^+)}{q(m^+ \mid m)} = \frac{Z}{\phi(m^+)\, q(E^{h(k)}, \chi^{h(k)})}, \tag{5.14}$$

$$\frac{P(m^+ \mid \lambda)}{P(m \mid \lambda)} = \frac{\omega_<^{h(k)}\, \omega_>^{h(k)}\, p(E^{h(k)})\, p(\chi^{h(k)})}{\omega^{h(k)}} \tag{5.15}$$

Here, the subscripts $>$ and $<$ refer to the two new spatial segments associated with the cut, $x_>^{h(k)}$ and $x_<^{h(k)}$ are the corresponding subsets of $x^{h(k)}$, and $x^{h(k)}$ denotes the species abundance data associated with segment/leaf node $h(k)$. Following the standard RJMCMC scheme [56], the four terms in (5.13) are the prior ratio, inverse proposal ratio, marginal likelihood ratio and Jacobian. The latter is one, $J = 1$, the marginal likelihood is given by (5.11), and the proposal and prior ratios are given by (5.14) and (5.15), respectively, where $\phi$ denotes the number of Mondrian leaf siblings, i.e. adjacent segments that can be merged in order to restore $m$, and

$$\omega^{h(k)} = \int_{\lambda^{h(k)}}^{\infty} \tau^{h(k)} \exp(-\tau^{h(k)} e)\, de = \exp(-\tau^{h(k)} \lambda^{h(k)})$$

denotes the probability of no further cut. By setting $q(E_k, \chi_k) = p(E_k)\,p(\chi_k)$, the expression naturally simplifies. The state $m$ is replaced by the proposal $m^+$ if the move is accepted. The probability of removing a cut is given by the inverse of (5.13). A shift move replaces the direction $i$ and position $\chi$ of a cut, which separates the adjacent segments $h(k_1)$ and $h(k_2)$, yielding the proposal segments $h(k_1)^+$ and $h(k_2)^+$. The acceptance probability is

$$\alpha = \min\left\{1, \frac{L(x^{h(k_1)^+})\, L(x^{h(k_2)^+})}{L(x^{h(k_1)})\, L(x^{h(k_2)})}\right\}$$

after cancelling the proposal and prior ratios because budget, cost and number of Mondrian samples remain invariant. Whenever a segment is cut or merged, the affected regression coefficients are sampled from the posterior.
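To make the simplification concrete: with $q(E, \chi) = p(E)\,p(\chi)$ and $J = 1$, substituting (5.14) and (5.15) into (5.13) cancels the densities of the cost and the cut position, leaving $r = \frac{Z}{\phi(m^+)} \cdot \frac{\omega_< \omega_>}{\omega}$ times the marginal likelihood ratio. The sketch below evaluates this ratio, and the shift-move ratio, on the log scale. It is an illustration under stated assumptions rather than the code used here: all function and argument names are hypothetical, the log marginal likelihoods are supplied by the caller, and both children are assumed to inherit the remaining budget $\lambda' = \lambda - E$.

```python
import math
import random

def log_no_cut_prob(tau, budget):
    """log omega = -tau * budget, the log probability of no further cut."""
    return -tau * budget

def log_cut_ratio(n_leaves, n_mergeable, parent, child_lo, child_hi,
                  loglik_parent, loglik_lo, loglik_hi):
    """log r for a cut move, assuming q(E, chi) = p(E) p(chi) and J = 1.

    parent, child_lo, child_hi: (half_perimeter, budget) pairs.
    n_leaves is Z before the cut; n_mergeable is phi(m+) after the cut.
    """
    log_proposal = math.log(n_leaves) - math.log(n_mergeable)
    log_prior = (log_no_cut_prob(*child_lo) + log_no_cut_prob(*child_hi)
                 - log_no_cut_prob(*parent))
    log_lik = loglik_lo + loglik_hi - loglik_parent
    return log_proposal + log_prior + log_lik

def log_shift_ratio(loglik_new_1, loglik_new_2, loglik_old_1, loglik_old_2):
    """log ratio for a shift move: only the marginal likelihoods change."""
    return (loglik_new_1 + loglik_new_2) - (loglik_old_1 + loglik_old_2)

def accept(log_r, rng):
    """Accept with probability alpha = min(1, r)."""
    return rng.random() < math.exp(min(0.0, log_r))

# Toy example: cut the unit square (tau = 2, budget 1) in the middle with
# cost E = 0.3, so each child has tau = 1.5 and budget 0.7. The log
# marginal likelihoods are set to zero purely for illustration.
log_r = log_cut_ratio(1, 1, (2.0, 1.0), (1.5, 0.7), (1.5, 0.7), 0.0, 0.0, 0.0)
```

In the toy example the data terms are neutral, so the ratio reduces to $\omega_< \omega_> / \omega = \exp(-2.1 + 2.0)$: the prior mildly penalises the extra cut, and the move is accepted only if the split improves the marginal likelihood enough to compensate.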

5.3 Data
