
2.1.3 Finding d-Separations

Since d-separations entail conditional independencies, we want an efficient algorithm for determining whether two sets are d-separated by another set. We develop such an algorithm next. After that, we show a useful application of the algorithm.

2.1. ENTAILED CONDITIONAL INDEPENDENCIES 77

[Figure 2.7: a directed graph on the nodes U, X, Y, Z, W, V, S, T, Q, M, and N, with edges labeled 1 through 5.]

Figure 2.7: If the set of legal pairs is {(X → Y, Y → V), (Y → V, V → Q), (X → W, W → S), (X → U, U → T), (U → T, T → M), (T → M, M → S), (M → S, S → V), (S → V, V → Q)}, and we are looking for the nodes reachable from {X}, Algorithm 2.1 labels the edges as shown. Reachable nodes are shaded.

An Algorithm for Finding d-Separations

We will develop an algorithm that finds the set of all nodes d-separated from one set of nodes B by another set of nodes A. To accomplish this, we will first find every node X such that there is at least one active chain given A between X and a node in B. This latter task can be accomplished by solving the following more general problem first. Suppose we have a directed graph (not necessarily acyclic), and we say that certain edges cannot appear consecutively in our paths of interest. That is, we identify certain ordered pairs of edges (U → V, V → W) as legal and the remaining as illegal. We call a path legal if it does not contain any illegal ordered pairs of edges, and we say Y is reachable from X if there is a legal path from X to Y. Note that we are looking only for paths; we are not looking for chains that are not paths.

We can find the set R of all nodes reachable from X as follows: We note that any node V such that the edge X → V exists is reachable. We label each such edge with a 1, and add each such V to R. Next, for each such V, we check all unlabeled edges V → W and see if (X → V, V → W) is a legal pair. We label each such edge with a 2 and we add each such W to R. We then repeat this procedure with V taking the place of X and W taking the place of V. This time we label the edges found with a 3. We keep going in this fashion until we find no more legal pairs. This is similar to a breadth-first graph search except we are visiting links rather than nodes. In this way, we may investigate a given node more than once. Of course, we want to do this because there may be a legal path through a given node even though another edge reaches a dead-end at the node. Figure 2.7 illustrates this method. The algorithm that follows, which is based on an algorithm in [Geiger et al, 1990a], implements it.

Before giving the algorithm, we discuss how we present algorithms. We use a very loose C++-like pseudocode. That is, we use a good deal of simple English description, we ignore restrictions of the C++ language such as the inability to declare local arrays, and we freely use data types peculiar to the given application without defining them. Finally, when it will only clutter rather than elucidate the algorithm, we do not define variables. Our purpose is to present the algorithm using familiar, clear control structures rather than adhere to the dictates of a programming language.

Algorithm 2.1

Find Reachable Nodes

Problem: Given a directed graph and a set of legal ordered pairs of edges, determine the set of all nodes reachable from a given set of nodes.

Inputs: a directed graph G = (V, E), a subset B ⊂ V, and a rule for determining whether two consecutive edges are legal.

Outputs: the subset R ⊂ V of all nodes reachable from B.

void find_reachable_nodes (directed_graph G = (V, E),
                           set-of-nodes B,
                           set-of-nodes& R)
{
   for (each X ∈ B) {
      add X to R;
      for (each V such that the edge X → V exists) {
         add V to R;
         label X → V with 1;
      }
   }
   i = 1;
   found = true;
   while (found) {
      found = false;
      for (each V such that U → V is labeled i)
         for (each unlabeled edge V → W
              such that (U → V, V → W) is legal) {
            add W to R;
            label V → W with i + 1;
            found = true;
         }
      i = i + 1;
   }
}
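The labeling procedure can also be sketched in Python. This is a minimal illustration rather than the book's implementation: the edge-list representation, the `is_legal` callback, and the returned `labels` dictionary are assumptions made for the example, and the sample graph contains only the edges named in the legal pairs of Figure 2.7 (any other edges in the figure are not modeled).

```python
from collections import defaultdict

def find_reachable_nodes(edges, B, is_legal):
    """Sketch of Algorithm 2.1: return (R, labels) for the start set B."""
    out = defaultdict(list)              # out[u] = nodes v with an edge u -> v
    for u, v in edges:
        out[u].append(v)
    R, labels, frontier = set(B), {}, []
    for x in B:
        for v in out[x]:
            if (x, v) not in labels:
                labels[(x, v)] = 1
                R.add(v)
                frontier.append((x, v))
    i = 1
    while frontier:                      # an empty frontier means found == false
        next_frontier = []
        for u, v in frontier:            # the edges labeled i
            for w in out[v]:
                if (v, w) not in labels and is_legal((u, v), (v, w)):
                    labels[(v, w)] = i + 1
                    R.add(w)
                    next_frontier.append((v, w))
        frontier = next_frontier
        i += 1
    return R, labels

# Legal pairs listed in the caption of Figure 2.7, with edges written as tuples.
legal = {
    (("X", "Y"), ("Y", "V")), (("Y", "V"), ("V", "Q")),
    (("X", "W"), ("W", "S")), (("X", "U"), ("U", "T")),
    (("U", "T"), ("T", "M")), (("T", "M"), ("M", "S")),
    (("M", "S"), ("S", "V")), (("S", "V"), ("V", "Q")),
}
edges = sorted({e for pair in legal for e in pair})
R, labels = find_reachable_nodes(edges, {"X"}, lambda e1, e2: (e1, e2) in legal)
```

With this input the edge S → V is first tried from W → S, where the pair is illegal, and is labeled only when it is reached again from M → S. That is the situation the text describes as investigating a node more than once.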

Geiger et al [1990b] proved Algorithm 2.1 is correct. We analyze it next.

Analysis of Algorithm 2.1 (Find Reachable Nodes)

Let n be the number of nodes and m be the number of edges. In the worst case, each of the nodes can be reached from n entry points. (Note that the graph is not necessarily a DAG, so there can be an edge from a node to itself.) Each time a node is reached, an edge emanating from it may need to be re-examined. For example, in Figure 2.7 the edge S → V is examined twice. This means each edge may be examined n times, which implies the worst-case time complexity is the following:

W(m, n) ∈ θ(mn).

Next we address the problem of identifying the set of nodes D that are d-separated from B by A in a DAG G = (V, E). First we will find the set R such that Y ∈ R if and only if either Y ∈ B or there is at least one active chain given A between Y and a node in B. Once we find R, D = V − (A ∪ R).

If there is an active chain ρ between node X and some other node, then every 3-node subchain U − V − W of ρ has the following property: Either

1. U − V − W is not head-to-head at V and V is not in A; or

2. U − V − W is head-to-head at V and V is or has a descendent in A.

Initially we may try to mimic Algorithm 2.1. We say we are mimicking Algorithm 2.1 because now we are looking for chains that satisfy certain conditions; we are not restricting ourselves to paths as Algorithm 2.1 does. We mimic Algorithm 2.1 as follows: We call a pair of adjacent links (U − V, V − W) legal if and only if U − V − W satisfies one of the two conditions above. Then we proceed from X as in Algorithm 2.1, numbering links and adding reachable nodes to R. This method finds only nodes that have an active chain between them and X, but it does not always find all of them. Consider the DAG in Figure 2.8 (a). Given that the node A is the only member of the set A and X is the only node in B, the edges in that DAG are numbered according to the method just described. The active chain X → A ← Z ← T ← Y is missed because the edge T → Z is already numbered by the time the chain A ← Z ← T is investigated, which means the chain Z ← T ← Y is never investigated. Since this is the only active chain between X and Y, Y is not added to R.

We can solve this problem by creating from G = (V, E) a new directed graph G′ = (V, E′), which has the links in G going in both directions. That is,

E′ = E ∪ {U → V such that V → U ∈ E}.

[Figure 2.8: panels (a) and (b), each on the nodes X, Y, Z, T, and A; the edges in (a) are numbered 1, 1, 2, 2 and the edges in (b) are numbered 1, 1, 2, 2, 3, 4.]

Figure 2.8: The directed graph G′ in (b) is created from the DAG G in (a) by making each link go in both directions. The numbering of the edges in (a) is the result of applying a mimic of Algorithm 2.1 to G, while the numbering of the edges in (b) is the result of applying Algorithm 2.1 to G′.

We then apply Algorithm 2.1 to G′, calling (U → V, V → W) legal in G′ if and only if U − V − W satisfies one of the two conditions above in G. In this way every active chain between X and Y in G has associated with it a legal path from X to Y in G′, and will therefore not be missed. Figure 2.8 (b) shows G′, when G is the DAG in Figure 2.8 (a), along with the edges numbered according to this application of Algorithm 2.1. The following algorithm, taken from [Geiger et al, 1990a], implements the method.

Algorithm 2.2

Find d-Separations

Problem: Given a DAG, determine the set of all nodes d-separated from one set of nodes by another set of nodes.

Inputs: a DAG G = (V, E) and two disjoint subsets A, B ⊂ V.

Outputs: the subset D ⊂ V containing all nodes d-separated from every node in B by A. That is, I_G(B, D | A) holds and no superset of D has this property.

void find_d_separations (DAG G = (V, E),
                         set-of-nodes A, B,
                         set-of-nodes& D)
{
   for (each V ∈ V) {
      if (V ∈ A)
         in[V] = true;
      else
         in[V] = false;
      if (V is or has a descendent in A)
         descendent[V] = true;
      else
         descendent[V] = false;
   }
   E′ = E ∪ {U → V such that V → U ∈ E};

   // Call Algorithm 2.1 as follows:
   find_reachable_nodes(G′ = (V, E′), B, R);
   // Use this rule to decide whether (U → V, V → W) is legal in G′:
   // The pair (U → V, V → W) is legal if and only if U ≠ W
   // and one of the following hold:
   // 1) U − V − W is not head-to-head in G and in[V] is false;
   // 2) U − V − W is head-to-head in G and descendent[V] is true.

   D = V − (A ∪ R);   // We do not need to remove B because B ⊆ R.
}
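A compact Python sketch of the same method follows. It is an illustration under stated assumptions, not the book's code: the DAG is passed as a node list and an edge list, `in[V]` is replaced by a membership test on the set `A`, `descendent` is a set rather than an array, and the small DAG X → Z ← Y, Z → W used at the end is hypothetical and does not come from any figure in the text.

```python
from collections import defaultdict

def find_d_separations(nodes, edges, A, B):
    """Return the set D of nodes d-separated from every node in B by A."""
    A, B = set(A), set(B)
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
        children[u].add(v)

    # descendent: nodes that are in A or have a descendent in A (walk up from A).
    descendent, stack = set(A), list(A)
    while stack:
        for p in parents[stack.pop()]:
            if p not in descendent:
                descendent.add(p)
                stack.append(p)

    def is_legal(u, v, w):
        """Legality rule for the pair (u -> v, v -> w) in G'."""
        if u == w:
            return False
        if u in parents[v] and w in parents[v]:   # head-to-head at v in G
            return v in descendent
        return v not in A

    # Algorithm 2.1 on G': each link of G can be traversed in both directions.
    neighbors = {v: parents[v] | children[v] for v in nodes}
    R, labeled, frontier = set(B), set(), []
    for x in B:
        for v in neighbors[x]:
            labeled.add((x, v))
            R.add(v)
            frontier.append((x, v))
    while frontier:
        next_frontier = []
        for u, v in frontier:
            for w in neighbors[v]:
                if (v, w) not in labeled and is_legal(u, v, w):
                    labeled.add((v, w))
                    R.add(w)
                    next_frontier.append((v, w))
        frontier = next_frontier
    return set(nodes) - (A | R)

# Hypothetical DAG: X -> Z <- Y, Z -> W.
nodes = ["X", "Y", "Z", "W"]
edges = [("X", "Z"), ("Y", "Z"), ("Z", "W")]
d_empty = find_d_separations(nodes, edges, A=set(), B={"X"})
d_given_w = find_d_separations(nodes, edges, A={"W"}, B={"X"})
d_given_z = find_d_separations(nodes, edges, A={"Z"}, B={"X"})
```

In this hypothetical DAG, Y is d-separated from X by the empty set, conditioning on the descendent W of the head-to-head node Z activates the chain X → Z ← Y, and conditioning on Z separates W from X.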

Next we analyze the algorithm:

Analysis of Algorithm 2.2 (Find d-Separations)

Although Algorithm 2.1’s worst-case time complexity is in θ(mn), where n is the number of nodes and m is the number of edges, we will show this application of it requires only θ(m) time in the worst case. We can implement the construction of descendent[V] as follows. Initially set descendent[V] = true for all nodes in A. Then follow the incoming edges of the nodes in A to their parents, their parents’ parents, and so on, setting descendent[V] = true for each node found along the way. In this way, each edge is examined at most once, and so the construction requires θ(m) time. Similarly, we can construct in[V] in θ(m) time.

Next we show that the execution of Algorithm 2.1 can also be done in θ(m) time (assuming m ≥ n). To accomplish this, we use the following data structure to represent G. For each node we store a list of the nodes that point to that node. For example, this list for node T in Figure 2.8 (a) is {X, Y}. Call this list the node’s inlist. We then create an outlist for each node, which contains all the nodes to which that node points. For example, this list for node X in Figure 2.8 (a) is {A, T}. Clearly, these lists can be created from the inlists in θ(m) time. Now suppose Algorithm 2.1 is currently trying to determine for edge U → V in G′ which pairs (U → V, V → W) are legal. We simply choose all the nodes in V’s inlist or outlist or both according to the following pseudocode:

if (U → V in G) {                  // U points to V in G.
   if (descendent[V] == true)
      choose all nodes W in V’s inlist;
   if (in[V] == false)
      choose all nodes W in V’s outlist;
}
else {                             // V points to U in G.
   if (in[V] == true)
      choose no nodes;
   else
      choose all nodes W in V’s inlist and in V’s outlist;
}

So for each edge U → V in G′ we can find all legal pairs (U → V, V → W) in constant time. Since Algorithm 2.1 only looks for these legal pairs at most once for each edge U → V, the algorithm runs in θ(m) time.
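The selection rule can be written as a small Python function. This is a sketch under assumptions: `inlist` and `outlist` are dictionaries of lists, `in_A` and `descendent` are dictionaries of booleans standing in for the arrays `in[V]` and `descendent[V]`, and the final filter applies the U ≠ W condition from Algorithm 2.2, which the pseudocode above leaves implicit. The example DAG X → Z ← Y, Z → W with A = {W} is hypothetical.

```python
def legal_successors(u, v, inlist, outlist, in_A, descendent):
    """Nodes w such that the pair (u -> v, v -> w) is legal in G'."""
    if u in inlist[v]:                    # u points to v in G
        chosen = []
        if descendent[v]:                 # head-to-head at v: w is a parent of v
            chosen += inlist[v]
        if not in_A[v]:                   # not head-to-head: w is a child of v
            chosen += outlist[v]
    elif in_A[v]:                         # v points to u in G and v is in A
        chosen = []
    else:                                 # v points to u in G and v is not in A
        chosen = inlist[v] + outlist[v]
    return [w for w in chosen if w != u]  # enforce u != w

# Hypothetical DAG X -> Z <- Y, Z -> W, with A = {W}.
inlist = {"X": [], "Y": [], "Z": ["X", "Y"], "W": ["Z"]}
outlist = {"X": ["Z"], "Y": ["Z"], "Z": ["W"], "W": []}
in_A = {"X": False, "Y": False, "Z": False, "W": True}
descendent = {"X": True, "Y": True, "Z": True, "W": True}

from_x = legal_successors("X", "Z", inlist, outlist, in_A, descendent)
from_w = legal_successors("W", "Z", inlist, outlist, in_A, descendent)
at_w = legal_successors("Z", "W", inlist, outlist, in_A, descendent)
```

Each call only reads the two lists of V and does a fixed number of boolean checks, so the work for an edge is proportional to the number of neighbors of V that are inspected.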

Next we prove the algorithm is correct.

Theorem 2.2 The set D returned by Algorithm 2.2 contains all and only nodes d-separated from every node in B by A. That is, we have I_G(B, D | A) and no superset of D has this property.

Proof. The set R determined by the algorithm contains all nodes in B (because Algorithm 2.1 initially adds nodes in B to R) and all nodes reachable from B via a legal path in G′. For any two nodes X ∈ B and Y ∉ A ∪ B, the chain X − ··· − Y is active in G if and only if the path X → ··· → Y is legal in G′. Thus R contains the nodes in B plus all and only those nodes that have active chains between them and a node in B. By the definition of d-separation, a node is d-separated from every node in B by A if the node is not in A ∪ B and there is no active chain between the node and a node in B. Thus D = V − (A ∪ R) is the set of all nodes d-separated from every node in B by A.

An Application

In general, the inference problem in Bayesian networks is to determine P(B | A), where A and B are two sets of variables. In the application of Bayesian networks to decision theory, which is discussed in Chapter 5, we are often interested in determining how sensitive our decision is to each parameter in the network so that we do not waste effort trying to refine values which do not affect the decision. This matter is discussed more in [Shachter, 1988]. Next we show how Algorithm 2.2 can be used to determine which parameters are irrelevant to a given computation.

[Figure 2.9: the node X with parent P_X, where P(x1 | p_x) = p_x and P(x2 | p_x) = 1 − p_x.]

Figure 2.9: P_X is a variable whose possible values are the probabilities we may assign to x1.

[Figure 2.10: a DAG; the node labels in the figure are H, B, F, L, C.]

Figure 2.10: A DAG.

Suppose variable X has two possible values x1 and x2, and we have not yet ascertained P(x). We can create a variable P_X whose possible values lie in the interval [0, 1], and represent P(X = x) using the Bayesian network in Figure 2.9. In Chapter 6 we will discuss assigning probabilities to the possible values of P_X in the case where the probabilities are relative frequencies. In general, we can represent the possible values of the parameters in the conditional distributions associated with a node using a set of auxiliary parent nodes. Figure 2.11 shows one such parent node for each node in the DAG in Figure 2.10. In general, each node can have more than one auxiliary parent node, and each auxiliary parent node can represent a set of random variables. However, this is not important to our present discussion, so we show only one node representing a single variable for the sake of simplicity. You are referred to Chapters 6 and 7 for the details of this representation. Let G″ be the DAG obtained from G by adding these auxiliary parent nodes, and let P be the set of auxiliary parent nodes. Then to determine which parameters are necessary to the calculation of P(B | A) in G, we need only first use Algorithm 2.2 to determine D such that I_G″(B, D | A) and no superset of D has this property, and then take D ∩ P. The auxiliary parent nodes in D ∩ P represent parameters that are irrelevant to the calculation; the remaining nodes in P represent the parameters that are needed.

[Figure 2.11: the nodes H, B, F, X, and L, with the shaded auxiliary parent nodes P_H, P_B, P_L, P_X, and P_F.]

Figure 2.11: Each shaded node is an auxiliary parent node representing possible values of the parameters in the conditional distributions of the child.

Example 2.4 Let G be the DAG in Figure 2.10. Then G″ is as shown in Figure 2.11. To determine P(f) we need ascertain all and only the values of P_H, P_B, P_L, and P_F because we have I_G″({F}, {P_X}), and P_X is the only auxiliary parent variable d-separated from {F} by the empty set. To determine P(f | b) we need ascertain all and only the values of P_H, P_L, and P_F because we have I_G″({F}, {P_B, P_X} | {B}), and P_B and P_X are the only auxiliary parent variables d-separated from {F} by {B}. To determine P(f | b, x) we need ascertain all and only the values of P_H, P_L, P_F, and P_X, because I_G″({F}, {P_B} | {B, X}), and P_B is the only auxiliary parent variable d-separated from {F} by {B, X}.

It is left as an exercise to write an algorithm implementing the method just described.
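One possible shape for such an algorithm is sketched below in Python. It is only an illustration under assumptions: the helper `d_separated` is a compact stand-in for Algorithm 2.2, the `"P_" + name` convention for auxiliary parents is invented for the example, exactly one auxiliary parent is added per node, and the chain S → T → Y is a hypothetical DAG rather than the one in Figure 2.10.

```python
from collections import defaultdict

def d_separated(nodes, edges, A, B):
    """Compact stand-in for Algorithm 2.2: nodes d-separated from B by A."""
    A, B = set(A), set(B)
    pa, ch = defaultdict(set), defaultdict(set)
    for u, v in edges:
        pa[v].add(u)
        ch[u].add(v)
    desc, stack = set(A), list(A)         # nodes that are, or have a descendent, in A
    while stack:
        for p in pa[stack.pop()]:
            if p not in desc:
                desc.add(p)
                stack.append(p)
    R, seen = set(B), set()
    frontier = [(None, x) for x in B]     # (None, x) stands for "start at x"
    while frontier:
        u, v = frontier.pop()
        for w in pa[v] | ch[v]:
            if w == u or (v, w) in seen:
                continue
            if u is None:                 # first link out of a node in B
                legal = True
            elif u in pa[v] and w in pa[v]:   # head-to-head at v
                legal = v in desc
            else:
                legal = v not in A
            if legal:
                seen.add((v, w))
                R.add(w)
                frontier.append((v, w))
    return set(nodes) - (A | R)

def irrelevant_parameters(nodes, edges, A, B):
    """Add one auxiliary parent per node, then return D ∩ P."""
    aux = {v: "P_" + v for v in nodes}
    nodes2 = list(nodes) + list(aux.values())
    edges2 = list(edges) + [(aux[v], v) for v in nodes]
    return d_separated(nodes2, edges2, A, B) & set(aux.values())

# Hypothetical chain S -> T -> Y (not a figure from the text).
nodes = ["S", "T", "Y"]
edges = [("S", "T"), ("T", "Y")]
none_given = irrelevant_parameters(nodes, edges, A=set(), B={"Y"})
given_t = irrelevant_parameters(nodes, edges, A={"T"}, B={"Y"})
```

For this chain, every parameter matters when nothing is observed, while observing T leaves only the parameters of Y relevant to the distribution of Y.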

In document Learning Bayesian Networks Neapolitan R E pdf (Page 86-94)