Introduction to Bayesian Networks

Chapter 2 Graphical Models

2.3 Introduction to Bayesian Networks

A BN model is a pair $B = (D, P)$, where $D = (V, E)$ is a DAG and $P$ is a probability measure. Recall that in a DAG $D = (V, E)$, the set of vertices is given by a well-ordered set $V = \{v_1, \ldots, v_n\}$, where each vertex $v_i$ represents a variable $Z_i$, and the edge set $E$ contains a collection of directed edges $(v_i, v_j)$, $i < j$.

The edges in E enable analysts to describe whether a particular variable provides any relevant probabilistic statement to explain another variable given a set of contextual information.

In a BN, the encoding of probabilistic hypotheses is possible due to the concept of conditional independence. For example, take three discrete random variables $Z_1$, $Z_2$ and $Z_3$ whose sets of categories are, respectively, $\mathcal{Z}_1 = \{1, \ldots, L_1\}$, $\mathcal{Z}_2 = \{1, \ldots, L_2\}$ and $\mathcal{Z}_3 = \{1, \ldots, L_3\}$. If a domain analyst believes that collecting information on the value of $Z_2$ once the value of $Z_3$ is known brings no further improvement in explaining variable $Z_1$, then $Z_2$ is not probabilistically relevant to variable $Z_1$ given the value of $Z_3$. In this case we say that $Z_1$ is conditionally independent of $Z_2$ given $Z_3$. This implies that for every triple $(z_1, z_2, z_3)$ in $\mathcal{Z}_1 \times \mathcal{Z}_2 \times \mathcal{Z}_3$ with $p(Z_2 = z_2, Z_3 = z_3) > 0$, we have that $p(Z_1 = z_1 \mid Z_2 = z_2, Z_3 = z_3) = p(Z_1 = z_1 \mid Z_3 = z_3)$.
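The equality above can be checked numerically for any finite joint distribution. The following is a minimal sketch, not part of the original text: the function name `is_cond_independent` and both example tables are hypothetical, and the joint distribution is assumed to be supplied as a dictionary keyed by triples (z1, z2, z3).

```python
from itertools import product


def is_cond_independent(joint, tol=1e-9):
    """Check Z1 _||_ Z2 | Z3 for a pmf given as {(z1, z2, z3): probability}."""
    # Recover the category sets of Z1, Z2 and Z3 from the keys.
    levels = [sorted({key[k] for key in joint}) for k in range(3)]

    def get(z1, z2, z3):
        # Triples absent from the table are treated as having probability zero.
        return joint.get((z1, z2, z3), 0.0)

    for z2, z3 in product(levels[1], levels[2]):
        p_23 = sum(get(a, z2, z3) for a in levels[0])  # p(z2, z3)
        if p_23 <= 0.0:
            continue  # conditioning on (z2, z3) is undefined here
        p_3 = sum(get(a, b, z3) for a in levels[0] for b in levels[1])  # p(z3)
        for z1 in levels[0]:
            lhs = get(z1, z2, z3) / p_23                          # p(z1 | z2, z3)
            rhs = sum(get(z1, b, z3) for b in levels[1]) / p_3    # p(z1 | z3)
            if abs(lhs - rhs) > tol:
                return False
    return True


# Hypothetical table built as p(z3) p(z1 | z3) p(z2 | z3), so the property holds.
p3 = {1: 0.4, 2: 0.6}
p1_given_3 = {1: {1: 0.2, 2: 0.8}, 2: {1: 0.7, 2: 0.3}}  # indexed [z3][z1]
p2_given_3 = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.1, 2: 0.9}}  # indexed [z3][z2]
ci_joint = {
    (z1, z2, z3): p3[z3] * p1_given_3[z3][z1] * p2_given_3[z3][z2]
    for z1, z2, z3 in product((1, 2), repeat=3)
}

# Hypothetical counter-example: Z1 is a copy of Z2, and Z3 is unrelated to both.
dep_joint = {(a, a, c): 0.25 for a in (1, 2) for c in (1, 2)}
```

In the second table, knowing $Z_2$ determines $Z_1$ exactly even after $Z_3$ is observed, so the check fails, as expected.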

This idea generalises directly to random vectors and continuous probability measures, as formally stated in Definition 16.

Definition 16 (Conditional Independence). Take three random vectors $X$, $Y$ and $Z$ in a probability space $(\Omega, \mathcal{A}, P)$. We say that $X$ is conditionally independent of $Y$ given $Z$ under $P$, and write $X \perp\!\!\!\perp Y \mid Z$, if and only if for every set $A \in \mathcal{A}$ the conditional probability of $X \in A$ given $(Y, Z)$ depends on $Z$ alone, i.e.

$$X \perp\!\!\!\perp Y \mid Z \iff P(X \in A \mid Y, Z) = P(X \in A \mid Z), \tag{2.1}$$

whenever $p(y, z)$ is strictly positive.

Let $Z(m) = \{Z_1, \ldots, Z_m\}$ denote the first $m$ variables of a set of ordered random variables $Z = \{Z_1, \ldots, Z_N\}$ in a probability space $(\Omega, \mathcal{A}, P)$. Also let $pa(Z_j) = \{Z_i \in Z(j-1) ; v_i \in pa(v_j)\}$ be the parent set of $Z_j$ with respect to $D$. We can now introduce a useful Markov property that enables us to relate a probability measure to a graphical topology.

Definition 17 (Ordered Markov Property (OMP)). Take a set of ordered random variables $Z$ in a probability space $(\Omega, \mathcal{A}, P)$ and a DAG $D$. The probability measure $P$ satisfies the ordered Markov property relative to $D$ if, for every pair of non-adjacent vertices $v_i$ and $v_j$ in $V$ with $i < j$, the variable $Z_j$ is conditionally independent of the variable $Z_i$ given its parent set $pa(Z_j)$.

Now we can use the OMP to formally define a BN model.

Definition 18 (Bayesian Network). A Bayesian Network (BN) is a graphical model constituted by a set of random variables $Z$ in a probability space $(\Omega, \mathcal{A}, P)$ and by a DAG $D$ such that the probability measure $P$ satisfies the ordered Markov property relative to $D$.

Example 2 below presents a naive BN model to describe the radicalisation process of inmates in a prison system.

Example 2 (Radicalisation Process). A model of a male prisoner’s radicalisation within prisons uses as explanatory variables his social networks and how the population is affected by prison transfers (Hannah et al., 2008; Neumann, 2010; Silke, 2011). The physical movements and social interactions of prisoners are constantly monitored and recorded. Here the radicalisation process is summarised by a variable $R$ that classifies a prisoner into one of three categories: resilient to ($r$), vulnerable to ($v$) or adopting ($a$) radicalisation. A social network (variable $N$) can take one of the following three levels: $s$ (sporadic), $f$ (frequent) or $i$ (intense). These levels measure the frequency with which a “standard” prisoner is able to socially interact with other prisoners who are identified as potential recruiters to radicalisation. A binary variable $T$ records whether an inmate remains in the prison ($n$) or is transferred ($t$) to another prison.

Figure 2.1: The BN associated with Example 2

Assume that the variable Transfer T is independent of the variable Network N given the variable Radicalisation R. Consider the hypothesis that all those prisoners who have not adopted radicalisation are equally likely to be transferred. Figure 2.1 depicts a possible BN to represent this process.
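To make the two hypotheses concrete, the sketch below encodes one possible parametrisation. Since Figure 2.1 is not reproduced here, the chain N → R → T is an assumption that is merely compatible with the stated independence, and every probability value is hypothetical and chosen only for illustration. The hypothesis about transfers appears as identical rows for the categories r and v in the table for T.

```python
from itertools import product

N_LEVELS = ("s", "f", "i")   # sporadic, frequent, intense
R_LEVELS = ("r", "v", "a")   # resilient, vulnerable, adopting
T_LEVELS = ("n", "t")        # not transferred, transferred

# Hypothetical numbers, not taken from the source.
p_N = {"s": 0.5, "f": 0.3, "i": 0.2}
p_R_given_N = {
    "s": {"r": 0.80, "v": 0.15, "a": 0.05},
    "f": {"r": 0.60, "v": 0.30, "a": 0.10},
    "i": {"r": 0.40, "v": 0.35, "a": 0.25},
}
# Prisoners who have not adopted radicalisation (r or v) share one transfer law.
p_T_given_R = {
    "r": {"n": 0.9, "t": 0.1},
    "v": {"n": 0.9, "t": 0.1},
    "a": {"n": 0.6, "t": 0.4},
}

# Joint distribution implied by the assumed chain N -> R -> T.
joint = {
    (n, r, t): p_N[n] * p_R_given_N[n][r] * p_T_given_R[r][t]
    for n, r, t in product(N_LEVELS, R_LEVELS, T_LEVELS)
}


def p_T_given_N_R(t, n, r):
    """p(T = t | N = n, R = r), recovered from the joint table."""
    return joint[(n, r, t)] / sum(joint[(n, r, u)] for u in T_LEVELS)
```

Under these assumptions `p_T_given_N_R(t, n, r)` returns the same value for every network level `n`, which is the conditional independence of T and N given R.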

Larger collections of conditional independence structures than those described by the OMP can be read from a BN model using the following properties satisfied by the ternary conditional independence relation (Dawid, 1979, Spohn, 1980):

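Symmetry: $X \perp\!\!\!\perp Y \mid Z \Rightarrow Y \perp\!\!\!\perp X \mid Z$

Decomposition: $X \perp\!\!\!\perp (Y, W) \mid Z \Rightarrow X \perp\!\!\!\perp Y \mid Z$ and $X \perp\!\!\!\perp W \mid Z$

Weak Union: $X \perp\!\!\!\perp (Y, W) \mid Z \Rightarrow X \perp\!\!\!\perp Y \mid (Z, W)$

Contraction: $X \perp\!\!\!\perp Y \mid Z$ and $X \perp\!\!\!\perp W \mid (Y, Z) \Rightarrow X \perp\!\!\!\perp (Y, W) \mid Z$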

These four properties constitute the semi-graphoid axioms. They allow analysts to explore the relevance of information using the graphical topology initially elicited (Pearl and Paz, 1987). If the probability measure $P$ is strictly positive then the fifth property given below also holds and we have a graphoid. For an intuitive interpretation of the graphoid axioms see Pearl (2009).

Intersection: $X \perp\!\!\!\perp Y \mid (Z, W)$ and $X \perp\!\!\!\perp W \mid (Y, Z) \Rightarrow X \perp\!\!\!\perp (Y, W) \mid Z$
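As a brief illustration of how such axioms can be verified, the sketch below derives Decomposition for discrete variables. The lower-case symbols $x$, $y$, $w$, $z$ denote generic values of $X$, $Y$, $W$, $Z$; this notation is introduced here for illustration, and $p(z) > 0$ is assumed throughout.

```latex
% Hypothesis: X \perp\!\!\!\perp (Y, W) \mid Z, i.e.
%   p(x, y, w \mid z) = p(x \mid z)\, p(y, w \mid z) for all x, y, w.
\begin{align*}
p(x, y \mid z) &= \sum_{w} p(x, y, w \mid z) \\
               &= \sum_{w} p(x \mid z)\, p(y, w \mid z) \\
               &= p(x \mid z)\, p(y \mid z).
\end{align*}
% Hence X \perp\!\!\!\perp Y \mid Z. Summing over y instead of w
% gives X \perp\!\!\!\perp W \mid Z by the same argument.
```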

An alternative way to read conditional independences is to use the d-separation theorem initially stated in Pearl (1986, 1988) and then more formally treated in Verma and Pearl (1990) and Geiger and Pearl (1990). To review this result, take two vertices $v_a$ and $v_b$ of a DAG $D = (V, E)$ and any subset $V_S \subset V \setminus \{v_a, v_b\}$. A trail $\tau$ between $v_a$ and $v_b$ is said to be blocked by $V_S$ in $D$ if there is a vertex $v \in \tau$ such that one of the following conditions holds:

1. $v$ belongs to $V_S$ and $v$ is a non-collider vertex with respect to $\tau$; or

2. $v$ is a collider vertex in $\tau$ and neither $v$ nor any of its descendants is in $V_S$.

If every trail between $v_a$ and $v_b$ is blocked then $v_a$ and $v_b$ are said to be d-separated by $V_S$. More generally, two disjoint subsets $V_A$ and $V_B$ are said to be d-separated by a subset $V_S \subset V \setminus (V_A \cup V_B)$ if and only if every pair of vertices $(v_a, v_b)$, such that $v_a \in V_A$ and $v_b \in V_B$, is d-separated by $V_S$.
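The blocking conditions translate directly into a brute-force check that enumerates every trail. The sketch below is only an illustration for small graphs, not an efficient algorithm: the function name `d_separated`, the edge-list representation and the two test graphs are assumptions introduced here.

```python
def d_separated(edges, a, b, given):
    """Return True if every trail between a and b is blocked by `given`.

    `edges` is an iterable of (parent, child) pairs describing a DAG;
    `given` must contain neither a nor b.
    """
    edge_set, given = set(edges), set(given)
    children, parents = {}, {}
    for u, w in edge_set:
        children.setdefault(u, set()).add(w)
        parents.setdefault(w, set()).add(u)

    def descendants(v):
        seen, stack = set(), [v]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def blocked(trail):
        # Inspect each interior vertex together with its two neighbours on the trail.
        for prev, v, nxt in zip(trail, trail[1:], trail[2:]):
            collider = (prev, v) in edge_set and (nxt, v) in edge_set
            if collider:
                # Condition 2: collider with no conditioned vertex at or below it.
                if v not in given and not (descendants(v) & given):
                    return True
            elif v in given:
                # Condition 1: conditioned non-collider.
                return True
        return False

    def trails(path):
        # Enumerate simple paths from a to b in the undirected skeleton.
        last = path[-1]
        if last == b:
            yield path
            return
        for nb in children.get(last, set()) | parents.get(last, set()):
            if nb not in path:
                yield from trails(path + [nb])

    return all(blocked(t) for t in trails([a]))


# Hypothetical test graphs: a collider with a descendant, and a chain.
COLLIDER = [("A", "C"), ("B", "C"), ("C", "D")]
CHAIN = [("N", "R"), ("R", "T")]
```

In the collider graph, A and B are d-separated by the empty set, but not by {C} or by {D}, because conditioning on a collider or on one of its descendants opens the trail. In the chain, N and T are d-separated by {R} only.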

Theorem 1 (d-Separation Theorem, Pearl (1986, 1988)). Assume a BN model $B = (D, P)$, where $D = (V, E)$, and take any three disjoint subsets $V_A$, $V_B$ and $V_S$ of $V$. Let $Z_A$, $Z_B$ and $Z_S$ be the sets of random variables corresponding, respectively, to $V_A$, $V_B$ and $V_S$. It then follows that

$$Z_A \perp\!\!\!\perp Z_B \mid Z_S \iff V_S \text{ d-separates } V_A \text{ and } V_B. \tag{2.2}$$

The property in Equation 2.2 is often called a global Markov property.

In Lauritzen et al. (1990) and Cowell et al. (2007), the d-separation theorem is rewritten using an undirected graph $D^M(A \cup B \cup S) = (V^M, E^M)$ corresponding to a transformation of the DAG $D = (V, E)$ spanned by $V_A$, $V_B$ and $V_S$ by the following steps:

1. Take the graph $D_{anc} = (V_{anc}, E_{anc})$, where $V_{anc} = An(V_A \cup V_B \cup V_S)$ in $D$ and $E_{anc} = \{(v_i, v_j) \in E ; v_i, v_j \in V_{anc}\}$.

2. Construct the graph $D^M(A \cup B \cup S) = (V^M, E^M)$ from $D_{anc}$, where $V^M = V_{anc}$ and $E^M = \tilde{E}_{anc} \cup E_{mar}$. $\tilde{E}_{anc}$ is the set of undirected edges corresponding to $E_{anc}$, i.e. $\tilde{E}_{anc} = \{(v_i, v_j) ; (v_i, v_j) \in E_{anc}\}$. $E_{mar}$ is the set of undirected edges between any pair of vertices $(v_i, v_j)$, $i < j$, in $V^M$, such that in $D_{anc}$ the vertices $v_i$ and $v_j$ have a common child.

In an undirected graph $G = (V, E)$, $V_S$ is said to separate $V_A$ and $V_B$, where $V_A$, $V_B$ and $V_S$ are any three disjoint subsets of $V$, if every path between any pair of vertices $(v_a, v_b)$, $v_a \in V_A$ and $v_b \in V_B$, passes through $V_S$. The criterion of d-separation as presented in Theorem 1 can then be restated as follows:

$$Z_A \perp\!\!\!\perp Z_B \mid Z_S \iff V_S \text{ separates } V_A \text{ and } V_B \text{ in } D^M(A \cup B \cup S). \tag{2.3}$$

This alternative formulation is often more useful and appealing operationally. Using the d-separation property, domain experts can identify local conditional independence structures that potentially characterise their processes. Therefore the qualitative aspects of a process can be more deeply analysed and detailed, and the probability distributions embedded within a model can be more precisely elicited and calibrated. The d-separation theorem also provides us with a solid criterion with which to manipulate and factorise complex graphical structures into local graphical components with simpler topologies. These local subgraphs constitute a key aspect that enables us to design and justify efficient inference and model selection algorithms: see e.g. Cowell et al. (2007), Korb and Nicholson (2011), Neapolitan (2004) and Smith (2010).
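The separation criterion in Equation 2.3 can be sketched in a few lines. The code below is a hedged illustration rather than the algorithm of any cited reference: the function name `separated_in_moral_graph` and the edge-list input are assumptions, and the moralising edges are taken, as in the standard construction, to join every two parents of a common child.

```python
def separated_in_moral_graph(edges, A, B, S):
    """Check whether S separates A and B in the moralised ancestral graph.

    `edges` is an iterable of (parent, child) pairs of a DAG; A, B and S
    are disjoint collections of vertices.
    """
    A, B, S = set(A), set(B), set(S)
    edges = list(edges)
    parents = {}
    for u, w in edges:
        parents.setdefault(w, set()).add(u)

    # Step 1: vertices of the ancestral graph (A, B, S and all their ancestors).
    anc, stack = set(), list(A | B | S)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))

    # Step 2: drop edge directions and join parents that share a child.
    adj = {v: set() for v in anc}
    for u, w in edges:
        if u in anc and w in anc:
            adj[u].add(w)
            adj[w].add(u)
    for child in anc:
        pas = sorted(parents.get(child, ()))
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # Separation: search outwards from A without entering S.
    seen, stack = set(), list(A)
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in B:
            return False  # found a path from A to B that avoids S
        stack.extend(n for n in adj[v] if n not in S and n not in seen)
    return True


# Hypothetical test graphs: a collider with a descendant, and a chain.
COLLIDER = [("A", "C"), ("B", "C"), ("C", "D")]
CHAIN = [("N", "R"), ("R", "T")]
```

One practical appeal of this formulation is that it replaces trail enumeration with a single reachability search, whose cost grows polynomially with the size of the ancestral graph.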

It can also be shown that in a BN model $B = (D, P)$ the probability measure $P$ over the set of random variables $Z$ recursively factorises as follows:

$$p(Z = z \mid D) = \prod_{Z_i \in Z} p(Z_i = z_i \mid Z_{pa(Z_i)} = z_{pa(Z_i)}), \tag{2.4}$$

where $Z = (Z_1, \ldots, Z_N)$ and $Z_{pa(Z_i)} = (Z_{i_1}, \ldots, Z_{i_k})$ are random vectors whose every component is, respectively, a random variable in $Z$ and $pa(Z_i)$. This also implies that in a BN model $B = (D, P)$ every variable is conditionally independent of its non-descendant variables with respect to $D$ given its parent set. For further details see e.g. Cowell et al. (2007).
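The recursive factorisation in Equation 2.4 amounts to multiplying one local conditional probability per variable. The sketch below illustrates this on a small hypothetical network of four binary variables; the structure, the names `parents`, `cpt` and `joint_prob`, and all the numbers are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical DAG over binary variables: Z1 -> Z3 <- Z2 and Z3 -> Z4.
# Keys are listed in an order compatible with the DAG.
parents = {"Z1": (), "Z2": (), "Z3": ("Z1", "Z2"), "Z4": ("Z3",)}

# cpt[var][parent values] = p(var = 1 | parents); values are hypothetical.
cpt = {
    "Z1": {(): 0.3},
    "Z2": {(): 0.6},
    "Z3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
    "Z4": {(0,): 0.2, (1,): 0.8},
}


def joint_prob(z):
    """Product of local conditionals for a full assignment z (dict var -> 0/1)."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(z[q] for q in pa)]
        prob *= p_one if z[var] == 1 else 1.0 - p_one
    return prob


# Full joint table obtained by applying the factorisation to every assignment.
order = list(parents)
table = {
    vals: joint_prob(dict(zip(order, vals)))
    for vals in product((0, 1), repeat=len(order))
}
```

For instance, `joint_prob({"Z1": 1, "Z2": 0, "Z3": 1, "Z4": 1})` multiplies 0.3, 0.4, 0.7 and 0.8, and the entries of `table` sum to one because each local table is itself normalised.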
