2.1.3 Finding d-Separations
Since d-separations entail conditional independencies, we want an efficient algorithm for determining whether two sets are d-separated by another set. We develop such an algorithm next. After that, we show a useful application of the algorithm.
2.1. ENTAILED CONDITIONAL INDEPENDENCIES 77
[Figure 2.7: a directed graph on the nodes U, X, Y, Z, W, V, S, T, Q, M, and N, with numbered edges.]
Figure 2.7: If the set of legal pairs is {(X → Y, Y → V), (Y → V, V → Q), (X → W, W → S), (X → U, U → T), (U → T, T → M), (T → M, M → S), (M → S, S → V), (S → V, V → Q)}, and we are looking for the nodes reachable from {X}, Algorithm 2.1 labels the edges as shown. Reachable nodes are shaded.
An Algorithm for Finding d-Separations
We will develop an algorithm that finds the set of all nodes d-separated from one set of nodes B by another set of nodes A. To accomplish this, we will first find every node X such that there is at least one active chain given A between X and a node in B. This latter task can be accomplished by solving the following more general problem first. Suppose we have a directed graph (not necessarily acyclic), and we say that certain edges cannot appear consecutively in our paths of interest. That is, we identify certain ordered pairs of edges (U → V, V → W) as legal and the remaining as illegal. We call a path legal if it does not contain any illegal ordered pairs of edges, and we say Y is reachable from X if there is a legal path from X to Y. Note that we are looking only for paths; we are not looking for chains that are not paths.
We can find the set R of all nodes reachable from X as follows: We note that any node V such that the edge X → V exists is reachable. We label each such edge with a 1, and add each such V to R. Next for each such V, we check all unlabeled edges V → W and see if (X → V, V → W) is a legal pair. We label each such edge with a 2 and we add each such W to R. We then repeat this procedure with V taking the place of X and W taking the place of V. This time we label the edges found with a 3. We keep going in this fashion until we find no more legal pairs. This is similar to a breadth-first graph search except we are visiting links rather than nodes. In this way, we may investigate a given node more than once. Of course, we want to do this because there may be a legal path through a given node even though another edge reaches a dead-end at the node. Figure 2.7 illustrates this method. The algorithm that follows, which is based on an algorithm in [Geiger et al, 1990a], implements it.
Before giving the algorithm, we discuss how we present algorithms. We use a very loose C++ like pseudocode. That is, we use a good deal of simple English description, we ignore restrictions of the C++ language such as the inability to declare local arrays, and we freely use data types peculiar to the given application without defining them. Finally, when it will only clutter rather than elucidate the algorithm, we do not define variables. Our purpose is to present the algorithm using familiar, clear control structures rather than adhere to the dictates of a programming language.
Algorithm 2.1 Find Reachable Nodes
Problem: Given a directed graph and a set of legal ordered pairs of edges, determine the set of all nodes reachable from a given set of nodes.
Inputs: a directed graph G = (V, E), a subset B ⊂ V, and a rule for determining whether two consecutive edges are legal.
Outputs: the subset R ⊂ V of all nodes reachable from B.
void find_reachable_nodes (directed_graph G = (V, E),
                           set-of-nodes B,
                           set-of-nodes& R)
{
   for (each X ∈ B) {
      add X to R;
      for (each V such that the edge X → V exists) {
         add V to R;
         label X → V with 1;
      }
   }
   i = 1;
   found = true;
   while (found) {
      found = false;
      for (each V such that U → V is labeled i)
         for (each unlabeled edge V → W
              such that (U → V, V → W) is legal) {
            add W to R;
            label V → W with i + 1;
            found = true;
         }
      i = i + 1;
   }
}
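The following Python sketch is one possible rendering of Algorithm 2.1, offered only as an illustration. The function name, the representation of edges as (tail, head) tuples, and the is_legal callback are our assumptions, not part of the original pseudocode. Because the full edge set of Figure 2.7 is not listed in the text, the example uses only the edges that occur in the legal pairs of its caption.

```python
from collections import defaultdict

def find_reachable_nodes(edges, start_nodes, is_legal):
    """Sketch of Algorithm 2.1: nodes reachable from start_nodes by legal paths.

    edges: iterable of (tail, head) tuples.
    is_legal(e1, e2): True if edge e2 may directly follow edge e1.
    Returns (reachable, label), where label maps each labeled edge to its number.
    """
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append((u, v))

    reachable = set(start_nodes)
    label = {}
    frontier = []
    for x in start_nodes:                 # edges leaving B get label 1
        for e in out_edges[x]:
            if e not in label:
                label[e] = 1
                reachable.add(e[1])
                frontier.append(e)

    i = 1
    while frontier:                       # frontier holds the edges labeled i
        next_frontier = []
        for e1 in frontier:
            for e2 in out_edges[e1[1]]:
                if e2 not in label and is_legal(e1, e2):
                    label[e2] = i + 1
                    reachable.add(e2[1])
                    next_frontier.append(e2)
        frontier = next_frontier
        i += 1
    return reachable, label

# Legal pairs from the caption of Figure 2.7.
legal_pairs = {
    (("X", "Y"), ("Y", "V")), (("Y", "V"), ("V", "Q")),
    (("X", "W"), ("W", "S")), (("X", "U"), ("U", "T")),
    (("U", "T"), ("T", "M")), (("T", "M"), ("M", "S")),
    (("M", "S"), ("S", "V")), (("S", "V"), ("V", "Q")),
}
# Assumption: only the edges named in the legal pairs are used here.
edges = sorted({e for pair in legal_pairs for e in pair})
reachable, label = find_reachable_nodes(
    edges, {"X"}, lambda e1, e2: (e1, e2) in legal_pairs)
print(sorted(reachable))
print(label[("S", "V")])   # S -> V is only labeled via M -> S, so it gets 5
```

The sketch keeps an explicit list of the edges labeled i instead of rescanning all edges in each round, which is a design choice of ours; the labels it produces are the same as those the pseudocode assigns.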
Geiger et al. [1990b] proved that Algorithm 2.1 is correct. We analyze it next.
Analysis of Algorithm 2.1 (Find Reachable Nodes)
Let n be the number of nodes and m be the number of edges. In the worst case, each of the nodes can be reached from n entry points. (Note that the graph is not necessarily a DAG, so there can be an edge from a node to itself.) Each time a node is reached, an edge emanating from it may need to be re-examined. For example, in Figure 2.7 the edge S → V is examined twice. This means each edge may be examined n times, which implies the worst-case time complexity is the following:
W(m, n) ∈ θ(mn).
Next we address the problem of identifying the set of nodes D that are d-separated from B by A in a DAG G = (V, E). First we will find the set R such that Y ∈ R if and only if either Y ∈ B or there is at least one active chain given A between Y and a node in B. Once we find R, D = V − (A ∪ R).
If there is an active chain ρ between node X and some other node, then every 3-node subchain U − V − W of ρ has the following property: Either
1. U − V − W is not head-to-head at V and V is not in A; or
2. U − V − W is head-to-head at V and V is or has a descendent in A.
Initially we may try to mimic Algorithm 2.1. We say we are mimicking Algorithm 2.1 because now we are looking for chains that satisfy certain conditions; we are not restricting ourselves to paths as Algorithm 2.1 does. We mimic Algorithm 2.1 as follows: We call a pair of adjacent links (U − V, V − W) legal if and only if U − V − W satisfies one of the two conditions above. Then we proceed from X as in Algorithm 2.1, numbering links and adding reachable nodes to R. This method finds only nodes that have an active chain between them and X, but it does not always find all of them. Consider the DAG in Figure 2.8 (a). Given that the node A is the only member of the set A and X is the only node in B, the edges in that DAG are numbered according to the method just described. The active chain X → A ← Z ← T ← Y is missed because the edge T → Z is already numbered by the time the chain A ← Z ← T is investigated, which means the chain Z ← T ← Y is never investigated. Since this is the only active chain between X and Y, Y is not added to R.
We can solve this problem by creating from G = (V, E) a new directed graph G′ = (V, E′), which has the links in G going in both directions. That is,
E′ = E ∪ {U → V such that V → U ∈ E}.
We then apply Algorithm 2.1 to G′, calling (U → V, V → W) legal in G′ if and only if U − V − W satisfies one of the two conditions above in G. In this way every active chain between X and Y in G has associated with it a legal path from X to Y in G′, and will therefore not be missed. Figure 2.8 (b) shows G′, when G is the DAG in Figure 2.8 (a), along with the edges numbered according to this application of Algorithm 2.1.
[Figure 2.8: panels (a) and (b), each on the nodes X, Y, Z, T, and A, with numbered edges.]
Figure 2.8: The directed graph G′ in (b) is created from the DAG G in (a) by making each link go in both directions. The numbering of the edges in (a) is the result of applying a mimic of Algorithm 2.1 to G, while the numbering of the edges in (b) is the result of applying Algorithm 2.1 to G′.
The following algorithm, taken from [Geiger et al, 1990a], implements the method.
Algorithm 2.2 Find d-Separations
Problem: Given a DAG, determine the set of all nodes d-separated from one set of nodes by another set of nodes.
Inputs: a DAG G = (V, E) and two disjoint subsets A, B ⊂ V.
Outputs: the subset D ⊂ V containing all nodes d-separated from every node in B by A. That is, I_G(B, D | A) holds and no superset of D has this property.
void find_d_separations (DAG G = (V, E),
                         set-of-nodes A, B,
                         set-of-nodes& D)
{
   for (each V ∈ V) {
      if (V ∈ A)
         in[V] = true;
      else
         in[V] = false;
      if (V is or has a descendent in A)
         descendent[V] = true;
      else
         descendent[V] = false;
   }
   E′ = E ∪ {U → V such that V → U ∈ E};
   // Call Algorithm 2.1 as follows:
   find_reachable_nodes(G′ = (V, E′), B, R);
   // Use this rule to decide whether (U → V, V → W) is legal in G′:
   // The pair (U → V, V → W) is legal if and only if U ≠ W
   // and one of the following holds:
   // 1) U − V − W is not head-to-head in G and in[V] is false;
   // 2) U − V − W is head-to-head in G and descendent[V] is true.
   D = V − (A ∪ R);   // We do not need to remove B because B ⊆ R.
}
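As an illustration, Algorithm 2.2 might be rendered in Python roughly as follows. This is a sketch under our own assumptions: the function name, the tuple representation of edges, and the small four-node DAG used to exercise it (X → Z ← Y, Z → W) are hypothetical and do not come from the figures in this section.

```python
from collections import defaultdict

def find_d_separations(nodes, edges, A, B):
    """Sketch of Algorithm 2.2: nodes d-separated from B by A in a DAG."""
    nodes, E, A, B = set(nodes), set(edges), set(A), set(B)

    parents = defaultdict(set)
    for u, v in E:
        parents[v].add(u)

    # descendent holds every node that is in A or has a descendent in A.
    descendent = set(A)
    stack = list(A)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in descendent:
                descendent.add(p)
                stack.append(p)

    # E' contains every link of G in both directions.
    E_prime = E | {(v, u) for (u, v) in E}
    out_edges = defaultdict(list)
    for e in E_prime:
        out_edges[e[0]].append(e)

    def is_legal(e1, e2):
        (u, v), (_, w) = e1, e2
        if u == w:
            return False
        head_to_head = (u, v) in E and (w, v) in E   # U -> V <- W in G
        return v in descendent if head_to_head else v not in A

    # Algorithm 2.1 applied to G' = (V, E').
    R = set(B)
    labeled = set()
    frontier = []
    for x in B:
        for e in out_edges[x]:
            if e not in labeled:
                labeled.add(e)
                R.add(e[1])
                frontier.append(e)
    while frontier:
        next_frontier = []
        for e1 in frontier:
            for e2 in out_edges[e1[1]]:
                if e2 not in labeled and is_legal(e1, e2):
                    labeled.add(e2)
                    R.add(e2[1])
                    next_frontier.append(e2)
        frontier = next_frontier

    return nodes - (A | R)

# Hypothetical DAG: X -> Z <- Y, Z -> W.
V = {"X", "Y", "Z", "W"}
E = {("X", "Z"), ("Y", "Z"), ("Z", "W")}
print(find_d_separations(V, E, set(), {"X"}))    # the uninstantiated collider Z blocks X - Z - Y
print(find_d_separations(V, E, {"W"}, {"X"}))    # W is a descendent of Z, so Y is no longer separated
print(find_d_separations(V, E, {"Z"}, {"X"}))    # Z blocks the chain X -> Z -> W
```

With the empty conditioning set the sketch reports Y as d-separated from X; conditioning on W, a descendent of the head-to-head node Z, activates the chain and leaves nothing d-separated; conditioning on Z separates W from X.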
Next we analyze the algorithm:
Analysis of Algorithm 2.2 (Find d-Separations)
Although Algorithm 2.1’s worst-case time complexity is in θ(mn), where n is the number of nodes and m is the number of edges, we will show that this application of it requires only θ(m) time in the worst case. We can implement the construction of descendent[V] as follows. Initially set descendent[V] = true for all nodes in A. Then follow the incoming edges of the nodes in A to their parents, their parents’ parents, and so on, setting descendent[V] = true for each node found along the way. In this way, each edge is examined at most once, and so the construction requires θ(m) time. Similarly, we can construct in[V] in θ(m) time.
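The backward traversal just described can be sketched in Python as follows. The function name, the counter of examined edges, and the five-node example DAG are our own hypothetical choices; the counter is included only to make the claim that each edge is examined at most once visible.

```python
from collections import defaultdict

def mark_descendent(nodes, edges, A):
    """Set descendent[v] = True iff v is in A or has a descendent in A.

    Returns (descendent, examined), where examined counts how many
    incoming edges were followed during the traversal.
    """
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)

    descendent = {v: False for v in nodes}
    stack = []
    for a in set(A):
        descendent[a] = True
        stack.append(a)

    examined = 0
    while stack:
        v = stack.pop()              # each node is pushed at most once
        for p in parents[v]:         # so each incoming edge is followed at most once
            examined += 1
            if not descendent[p]:
                descendent[p] = True
                stack.append(p)
    return descendent, examined

# Hypothetical DAG: X -> Z <- Y, Z -> W, Z -> V.
nodes = ["X", "Y", "Z", "W", "V"]
edges = [("X", "Z"), ("Y", "Z"), ("Z", "W"), ("Z", "V")]
descendent, examined = mark_descendent(nodes, edges, {"W"})
print(descendent)
print(examined, "of", len(edges), "edges examined")
```

A node is pushed onto the stack only when its flag changes from false to true, which is what bounds the number of edge examinations by m.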
Next we show that the execution of Algorithm 2.1 can also be done in θ(m) time (assuming m ≥ n). To accomplish this, we use the following data structure to represent G. For each node we store a list of the nodes that point to that node. For example, this list for node T in Figure 2.8 (a) is {X, Y}. Call this list the node’s inlist. We then create an outlist for each node, which contains all the nodes to which that node points. For example, this list for node X in Figure 2.8 (a) is {A, T}. Clearly, these lists can be created from the inlists in θ(m) time. Now suppose Algorithm 2.1 is currently trying to determine for edge U → V in G′ which pairs (U → V, V → W) are legal. We simply choose all the nodes in V’s inlist or outlist or both according to the following pseudocode:
if (U → V in G) {                // U points to V in G.
   if (descendent[V] == true)
      choose all nodes W in V’s inlist;
   if (in[V] == false)
      choose all nodes W in V’s outlist;
}
else {                           // V points to U in G.
   if (in[V] == true)
      choose no nodes;
   else
      choose all nodes W in V’s inlist and in V’s outlist;
}
So for each edge U → V in G′ we can find all legal pairs (U → V, V → W) in constant time. Since Algorithm 2.1 looks for these legal pairs at most once for each edge U → V, the algorithm runs in θ(m) time.
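The selection rule above might look as follows in Python. This is only a sketch: the function name, the dictionary-based inlists and outlists, and the example DAG (X → Z ← Y, Z → W) are hypothetical. We also filter out W = U, which the legality rule in Algorithm 2.2 requires but the pseudocode above leaves implicit.

```python
def legal_successors(u, v, E, inlist, outlist, in_A, descendent):
    """Nodes W such that (U -> V, V -> W) is legal in G', for an edge U -> V of G'.

    E is the edge set of the DAG G, used to tell whether U points to V in G.
    """
    if (u, v) in E:                      # U points to V in G
        chosen = []
        if descendent[v]:                # head-to-head at V is allowed
            chosen += inlist[v]
        if not in_A[v]:                  # continuing out of V is allowed
            chosen += outlist[v]
    else:                                # V points to U in G
        chosen = [] if in_A[v] else inlist[v] + outlist[v]
    return [w for w in chosen if w != u]

# Hypothetical DAG: X -> Z <- Y, Z -> W.
E = {("X", "Z"), ("Y", "Z"), ("Z", "W")}
inlist = {"X": [], "Y": [], "Z": ["X", "Y"], "W": ["Z"]}
outlist = {"X": ["Z"], "Y": ["Z"], "Z": ["W"], "W": []}

# Case 1: A is empty, so no node is in A and no node has a descendent in A.
none = {n: False for n in inlist}
print(legal_successors("X", "Z", E, inlist, outlist, none, none))
print(legal_successors("W", "Z", E, inlist, outlist, none, none))

# Case 2: A = {W}; every node is W or has W as a descendent.
in_A = {"X": False, "Y": False, "Z": False, "W": True}
desc = {"X": True, "Y": True, "Z": True, "W": True}
print(legal_successors("X", "Z", E, inlist, outlist, in_A, desc))
```

Storing E as a set is our choice for testing the direction of a link in expected constant time; an implementation that records the direction with each edge of G′ would not need the lookup.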
Next we prove the algorithm is correct.
Theorem 2.2 The set D returned by Algorithm 2.2 contains all and only nodes d-separated from every node in B by A. That is, we have I_G(B, D | A) and no superset of D has this property.
Proof. The set R determined by the algorithm contains all nodes in B (because Algorithm 2.1 initially adds nodes in B to R) and all nodes reachable from B via a legal path in G′. For any two nodes X ∈ B and Y ∉ A ∪ B, the chain X − · · · − Y is active in G if and only if the path X → · · · → Y is legal in G′. Thus R contains the nodes in B plus all and only those nodes that have active chains between them and a node in B. By the definition of d-separation, a node is d-separated from every node in B by A if the node is not in A ∪ B and there is no active chain between the node and a node in B. Thus D = V − (A ∪ R) is the set of all nodes d-separated from every node in B by A.
An Application
In general, the inference problem in Bayesian networks is to determine P(B|A), where A and B are two sets of variables. In the application of Bayesian networks to decision theory, which is discussed in Chapter 5, we are often interested in determining how sensitive our decision is to each parameter in the network so that we do not waste effort trying to refine values which do not affect the decision. This matter is discussed more in [Shachter, 1988]. Next we show how Algorithm 2.2 can be used to determine which parameters are irrelevant to a given computation.
[Figure 2.9: a two-node network in which PX is the parent of X, with P(x1 | px) = px and P(x2 | px) = 1 − px.]
Figure 2.9: PX is a variable whose possible values are the probabilities we may assign to x1.
[Figure 2.10: a DAG with node labels H, B, F, L, C.]
Figure 2.10: A DAG.
Suppose variable X has two possible values x1 and x2, and we have not yet ascertained P(x). We can create a variable PX whose possible values lie in the interval [0, 1], and represent P(X = x) using the Bayesian network in Figure 2.9. In Chapter 6 we will discuss assigning probabilities to the possible values of PX in the case where the probabilities are relative frequencies. In general, we can represent the possible values of the parameters in the conditional distributions associated with a node using a set of auxiliary parent nodes. Figure 2.11 shows one such parent node for each node in the DAG in Figure 2.10. In general, each node can have more than one auxiliary parent node, and each auxiliary parent node can represent a set of random variables. However, this is not important to our present discussion, so we show only one node representing a single variable for the sake of simplicity. You are referred to Chapters 6 and 7 for the details of this representation. Let G″ be the DAG obtained from G by adding these auxiliary parent nodes, and let P be the set of auxiliary parent nodes. Then to determine which parameters are necessary to the calculation of P(B|A) in G, we need only first use Algorithm 2.2 to determine D such that I_G″(B, D | A) and no superset of D has this property, and then take D ∩ P. The auxiliary parent nodes in D ∩ P are irrelevant to the calculation; the remaining auxiliary parent nodes are the ones whose values we need.
[Figure 2.11: the nodes H, B, F, X, and L, each with a shaded auxiliary parent node PH, PB, PF, PX, and PL respectively.]
Figure 2.11: Each shaded node is an auxiliary parent node representing possible values of the parameters in the conditional distributions of the child.
Example 2.4 Let G be the DAG in Figure 2.10. Then G″ is as shown in Figure 2.11. To determine P(f) we need to ascertain all and only the values of PH, PB, PL, and PF because we have I_G″({F}, {PX}), and PX is the only auxiliary parent variable d-separated from {F} by the empty set. To determine P(f|b) we need to ascertain all and only the values of PH, PL, and PF because we have I_G″({F}, {PB, PX} | {B}), and PB and PX are the only auxiliary parent variables d-separated from {F} by {B}. To determine P(f|b, x) we need to ascertain all and only the values of PH, PL, PF, and PX, because I_G″({F}, {PB} | {B, X}), and PB is the only auxiliary parent variable d-separated from {F} by {B, X}.
It is left as an exercise to write an algorithm implementing the method just described.