

2.1.2 d-Separation

We showed in Section 2.1.1 that the Markov condition entails I_P({C}, {G} | {F}) for the DAG in Figure 2.1. This conditional independency is an example of a DAG property called 'd-separation'. That is, {C} and {G} are d-separated by {A, F} in the DAG in Figure 2.1. Next we develop the concept of d-separation, and we show the following: 1) the Markov condition entails that all d-separations are conditional independencies; and 2) every conditional independency entailed by the Markov condition is identified by d-separation. That is, if (G, P) satisfies the Markov condition, every d-separation in G is a conditional independency in P. Furthermore, every conditional independency that is common to all probability distributions satisfying the Markov condition with G is identified by d-separation.

2.1. ENTAILED CONDITIONAL INDEPENDENCIES 71

All d-separations are Conditional Independencies

First we need to review more graph theory. Suppose we have a DAG G = (V, E) and a set of nodes {X_1, X_2, ..., X_k}, where k ≥ 2, such that (X_{i−1}, X_i) ∈ E or (X_i, X_{i−1}) ∈ E for 2 ≤ i ≤ k. We call the set of edges connecting the k nodes a chain between X_1 and X_k. We denote the chain using both the sequence [X_1, X_2, ..., X_k] and the sequence [X_k, X_{k−1}, ..., X_1]. For example, [G, A, B, C] and [C, B, A, G] represent the same chain between G and C in the DAG in Figure 2.3. Another chain between G and C is [G, F, B, C]. The nodes X_2, ..., X_{k−1} are called interior nodes on chain [X_1, X_2, ..., X_k]. The subchain of chain [X_1, X_2, ..., X_k] between X_i and X_j is the chain [X_i, X_{i+1}, ..., X_j], where 1 ≤ i < j ≤ k. A cycle is a chain between a node and itself. A simple chain is a chain containing no subchains which are cycles.

We often denote chains by showing undirected lines between the nodes in the chain. For example, we would denote the chain [G, A, B, C] as G − A − B − C. If we want to show the direction of the edges, we use arrows. For example, to show the direction of the edges in the previous chain, we would replace each line in G − A − B − C with an arrow (→ or ←) pointing in the direction of the corresponding edge in Figure 2.3. A chain containing two nodes, such as X − Y, is called a link. A directed link, such as X → Y, represents an edge, and we will call it an edge. Given the edge X → Y, we say the tail of the edge is at X and the head of the edge is at Y. We also say the following:

• A chain X → Z → Y is a head-to-tail meeting, the edges meet head-to-tail at Z, and Z is a head-to-tail node on the chain.

• A chain X ← Z → Y is a tail-to-tail meeting, the edges meet tail-to-tail at Z, and Z is a tail-to-tail node on the chain.

• A chain X → Z ← Y is a head-to-head meeting, the edges meet head-to-head at Z, and Z is a head-to-head node on the chain.

• A chain X − Z − Y, such that X and Y are not adjacent, is an uncoupled meeting.
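The four kinds of meeting can be told apart mechanically from the edge set. The following is a minimal sketch, not taken from the text: the function name `classify_meeting` and the small DAG in `edges` are hypothetical, and an edge X → Y is assumed to be stored as the pair `(tail, head)`.

```python
def classify_meeting(edges, x, z, y):
    """Classify how the chain x - z - y meets at the interior node z.

    `edges` is a set of (tail, head) pairs, so (u, v) stands for u -> v.
    Returns (kind, uncoupled).
    """
    into_z_from_x = (x, z) in edges   # x -> z
    into_z_from_y = (y, z) in edges   # y -> z
    out_z_to_x = (z, x) in edges      # z -> x
    out_z_to_y = (z, y) in edges      # z -> y
    if into_z_from_x and into_z_from_y:
        kind = "head-to-head"
    elif out_z_to_x and out_z_to_y:
        kind = "tail-to-tail"
    elif (into_z_from_x and out_z_to_y) or (into_z_from_y and out_z_to_x):
        kind = "head-to-tail"
    else:
        raise ValueError("x - z - y is not a chain in this graph")
    # The meeting is uncoupled when the two end nodes are not adjacent.
    uncoupled = (x, y) not in edges and (y, x) not in edges
    return kind, uncoupled


# Hypothetical DAG used only for illustration: A -> B -> C -> D and A -> C.
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")}
print(classify_meeting(edges, "A", "C", "B"))  # ('head-to-head', False)
print(classify_meeting(edges, "A", "C", "D"))  # ('head-to-tail', True)
print(classify_meeting(edges, "C", "A", "B"))  # ('tail-to-tail', False)
```

Only the second meeting is uncoupled, because A and D are not adjacent in this hypothetical graph.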

Figure 2.4 shows an uncoupled head-to-head meeting. We now have the following definition:

Definition 2.2 Let G = (V, E) be a DAG, A ⊆ V, X and Y be distinct nodes in V − A, and ρ be a chain between X and Y. Then ρ is blocked by A if one of the following holds:

1. There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet head-to-tail at Z.

2. There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet tail-to-tail at Z.

3. There is a node Z, such that Z and all of Z's descendents are not in A, on the chain ρ, and the edges incident to Z on ρ meet head-to-head at Z.

We say the chain is blocked at any node where one of the above conditions holds. There may be more than one such node. The chain is called active given A if it is not blocked by A.

Figure 2.5: A DAG used to illustrate chain blocking. (Nodes: W, X, Y, Z, R, S, T.)

Example 2.1 Consider the DAG in Figure 2.5.

1. The chain [Y, X, Z, S] is blocked by {X} because the edges on the chain incident to X meet tail-to-tail at X. That chain is also blocked by {Z} because the edges on the chain incident to Z meet head-to-tail at Z.

2. The chain [W, Y, R, Z, S] is blocked by ∅ because R ∉ ∅, T ∉ ∅, and the edges on the chain incident to R meet head-to-head at R.

3. The chain [W, Y, R, S] is blocked by {R} because the edges on the chain incident to R meet head-to-tail at R.

4. The chain [W, Y, R, Z, S] is not blocked by {R} because the edges on the chain incident to R meet head-to-head at R. Furthermore, this chain is not blocked by {T} because T is a descendent of R.
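The three blocking conditions can be checked one interior node at a time. The sketch below is an illustration rather than part of the text. The edge list `EDGES` is an assumption: Figure 2.5 is not reproduced here, so the edges were inferred from the meetings described in Examples 2.1 and 2.2. The helper names `descendents` and `blocking_nodes` are hypothetical.

```python
# ASSUMPTION: edges of Figure 2.5 as (tail, head) pairs, inferred from the
# meetings described in Examples 2.1 and 2.2 rather than read off the figure.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}


def descendents(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        u = stack.pop()
        for tail, head in edges:
            if tail == u and head not in found:
                found.add(head)
                stack.append(head)
    return found


def blocking_nodes(chain, a, edges):
    """Return the interior nodes at which `chain` is blocked by the set `a`.

    `chain` is assumed to be a valid chain, so any meeting that is not
    head-to-head is head-to-tail or tail-to-tail.
    """
    blocked_at = []
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        head_to_head = (prev, z) in edges and (nxt, z) in edges
        if head_to_head:
            # Condition 3: neither z nor any descendent of z is in a.
            if z not in a and not (descendents(z, edges) & a):
                blocked_at.append(z)
        elif z in a:
            # Conditions 1 and 2: z is in a and the meeting is not head-to-head.
            blocked_at.append(z)
    return blocked_at


print(blocking_nodes(["Y", "X", "Z", "S"], {"X"}, EDGES))       # ['X']
print(blocking_nodes(["W", "Y", "R", "Z", "S"], set(), EDGES))  # ['R']
print(blocking_nodes(["W", "Y", "R", "Z", "S"], {"T"}, EDGES))  # []
```

An empty list means the chain is active given the set, which matches item 4 of Example 2.1 under the assumed edges.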

We can now define d-separation.

Definition 2.3 Let G = (V, E) be a DAG, A ⊆ V, and X and Y be distinct nodes in V − A. We say X and Y are d-separated by A in G if every chain between X and Y is blocked by A.

It is not hard to see that every chain between X and Y is blocked by A if and only if every simple chain between X and Y is blocked by A.

Example 2.2 Consider the DAG in Figure 2.5.

1. X and R are d-separated by {Y, Z} because the chain [X, Y, R] is blocked at Y, and the chain [X, Z, R] is blocked at Z.

2. X and T are d-separated by {Y, Z} because the chain [X, Y, R, T] is blocked at Y, the chain [X, Z, R, T] is blocked at Z, and the chain [X, Z, S, R, T] is blocked at Z and at S.

3. W and T are d-separated by {R} because the chains [W, Y, R, T] and [W, Y, X, Z, R, T] are both blocked at R.

4. Y and Z are d-separated by {X} because the chain [Y, X, Z] is blocked at X, the chain [Y, R, Z] is blocked at R, and the chain [Y, R, S, Z] is blocked at S.

5. W and S are d-separated by {R, Z} because the chain [W, Y, R, S] is blocked at R, and the chains [W, Y, R, Z, S] and [W, Y, X, Z, S] are both blocked at Z.

6. W and S are also d-separated by {Y, Z} because the chain [W, Y, R, S] is blocked at Y, the chain [W, Y, R, Z, S] is blocked at Y, R, and Z, and the chain [W, Y, X, Z, S] is blocked at Z.

7. W and S are also d-separated by {Y, X}. You should determine why.

8. W and X are d-separated by ∅ because the chain [W, Y, X] is blocked at Y, the chain [W, Y, R, Z, X] is blocked at R, and the chain [W, Y, R, S, Z, X] is blocked at S.

9. W and X are not d-separated by {Y} because the chain [W, Y, X] is not blocked at Y, since Y ∈ {Y}, and clearly it could not be blocked anywhere else.

10. W and T are not d-separated by {Y} because, even though the chain [W, Y, R, T] is blocked at Y, the chain [W, Y, X, Z, R, T] is not blocked at Y, since Y ∈ {Y}, and this chain is not blocked anywhere else because no other nodes are in {Y} and there are no other head-to-head meetings on it.
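Since only simple chains need to be examined, d-separation of two nodes can be tested by brute force on a small DAG: enumerate every simple chain and check that each one is blocked. The sketch below is one way to do this and is not the text's own procedure. The edge list is again an assumption inferred from the examples, and the function names are hypothetical.

```python
# ASSUMPTION: edges of Figure 2.5 as (tail, head) pairs, inferred from the
# meetings described in Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}

# Undirected adjacency, used to walk chains regardless of edge direction.
NEIGHBORS = {}
for tail, head in EDGES:
    NEIGHBORS.setdefault(tail, set()).add(head)
    NEIGHBORS.setdefault(head, set()).add(tail)


def descendents(node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        u = stack.pop()
        for tail, head in EDGES:
            if tail == u and head not in found:
                found.add(head)
                stack.append(head)
    return found


def is_blocked(chain, a):
    """Definition 2.2: is `chain` blocked by the set `a`?"""
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in EDGES and (nxt, z) in EDGES:      # head-to-head at z
            if z not in a and not (descendents(z) & a):
                return True
        elif z in a:                                      # head-to-tail or tail-to-tail
            return True
    return False


def simple_chains(x, y):
    """Yield every simple chain (no repeated node) between x and y."""
    def extend(chain):
        if chain[-1] == y:
            yield chain
            return
        for n in sorted(NEIGHBORS[chain[-1]]):
            if n not in chain:
                yield from extend(chain + [n])
    yield from extend([x])


def d_separated(x, y, a):
    """Definition 2.3, restricted to simple chains."""
    a = set(a)
    return all(is_blocked(chain, a) for chain in simple_chains(x, y))


print(d_separated("W", "S", {"Y", "X"}))  # True  (item 7)
print(d_separated("W", "X", set()))       # True  (item 8)
print(d_separated("W", "T", {"Y"}))       # False (item 10)
```

Enumerating simple chains takes exponential time in general, so this approach is only meant for graphs as small as the one in the example.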

Definition 2.4 Let G = (V, E) be a DAG, and A, B, and C be mutually disjoint subsets of V. We say A and B are d-separated by C in G if for every X ∈ A and Y ∈ B, X and Y are d-separated by C. We write

I_G(A, B | C).

If C = ∅, we write only

I_G(A, B).

Example 2.3 Consider the DAG in Figure 2.5. We have

I_G({W, X}, {S, T} | {R, Z})

because every chain between W and S, W and T, X and S, and X and T is blocked by {R, Z}.
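Definition 2.4 reduces d-separation of sets to d-separation of every pair of nodes drawn from the two sets. A compact brute-force sketch of that reduction follows. It takes the edge set as a parameter, the name `d_separated_sets` is hypothetical, and the edges of Figure 2.5 are an assumption inferred from the examples rather than read off the figure.

```python
from itertools import product


def d_separated_sets(edges, a, b, c):
    """Definition 2.4 by brute force: test I_G(a, b | c) pair by pair."""
    a, b, c = set(a), set(b), set(c)
    if a & b or a & c or b & c:
        raise ValueError("a, b, and c must be mutually disjoint")
    neighbors, children = {}, {}
    for tail, head in edges:
        neighbors.setdefault(tail, set()).add(head)
        neighbors.setdefault(head, set()).add(tail)
        children.setdefault(tail, set()).add(head)

    def descendents(node):
        found, stack = set(), [node]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found

    def blocked(chain):
        for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
            if (prev, z) in edges and (nxt, z) in edges:   # head-to-head at z
                if z not in c and not (descendents(z) & c):
                    return True
            elif z in c:                                   # other meetings
                return True
        return False

    def has_active_chain(chain, y):
        """Depth-first search over the simple chains that extend `chain`."""
        if chain[-1] == y:
            return not blocked(chain)
        return any(has_active_chain(chain + [n], y)
                   for n in neighbors.get(chain[-1], ()) if n not in chain)

    return not any(has_active_chain([x], y) for x, y in product(a, b))


# ASSUMPTION: edges of Figure 2.5, inferred from Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}

print(d_separated_sets(EDGES, {"W", "X"}, {"S", "T"}, {"R", "Z"}))  # True
```

Passing an empty set for `c` corresponds to the shorter notation I_G(A, B).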

We write I_G(A, B | C) because, as we show next, d-separation identifies all and only those conditional independencies entailed by the Markov condition for G. We need the following three lemmas to prove this:

Lemma 2.1 Let P be a probability distribution of the variables in V and G = (V, E) be a DAG. Then (G, P) satisfies the Markov condition if and only if for every three mutually disjoint subsets A, B, C ⊆ V, whenever A and B are d-separated by C, A and B are conditionally independent in P given C. That is, (G, P) satisfies the Markov condition if and only if

I_G(A, B | C) ⟹ I_P(A, B | C). (2.1)

Proof. The proof that, if (G, P) satisfies the Markov condition, then each d-separation implies the corresponding conditional independency is quite lengthy and can be found in [Verma and Pearl, 1990] and in [Neapolitan, 1990].

As to the other direction, suppose each d-separation implies a conditional independency. That is, suppose Implication 2.1 holds. It is not hard to see that a node's parents d-separate the node from all its nondescendents that are not its parents. That is, if we denote the sets of parents and nondescendents of X by PA_X and ND_X respectively, we have

I_G({X}, ND_X − PA_X | PA_X).

Since Implication 2.1 holds, we can therefore conclude

I_P({X}, ND_X − PA_X | PA_X),

which clearly states the same conditional independencies as

I_P({X}, ND_X | PA_X),

which means the Markov condition is satisfied.
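The graph-theoretic step in this proof, that a node's parents d-separate it from its other nondescendents, can be spot-checked on a concrete DAG. The sketch below does so for every node of the DAG in Figure 2.5. It is a check on one example, not a proof. The edge list is an assumption inferred from Examples 2.1 and 2.2, and all function names are hypothetical.

```python
# ASSUMPTION: edges of Figure 2.5, inferred from Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}
NODES = {n for edge in EDGES for n in edge}
NEIGHBORS = {n: {u for e in EDGES if n in e for u in e if u != n} for n in NODES}


def parents(x):
    return {tail for tail, head in EDGES if head == x}


def descendents(x):
    found, stack = set(), [x]
    while stack:
        u = stack.pop()
        for tail, head in EDGES:
            if tail == u and head not in found:
                found.add(head)
                stack.append(head)
    return found


def blocked(chain, a):
    """Definition 2.2 applied to one chain."""
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in EDGES and (nxt, z) in EDGES:      # head-to-head at z
            if z not in a and not (descendents(z) & a):
                return True
        elif z in a:
            return True
    return False


def d_separated(x, y, a):
    """True if every simple chain between x and y is blocked by a."""
    def active(chain):
        if chain[-1] == y:
            return not blocked(chain, a)
        return any(active(chain + [n]) for n in NEIGHBORS[chain[-1]]
                   if n not in chain)
    return not active([x])


def parents_separate(x):
    """Check I_G({x}, ND_x - PA_x | PA_x) for the node x."""
    pa = parents(x)
    nd = NODES - descendents(x) - {x}      # nondescendents of x
    return all(d_separated(x, y, pa) for y in nd - pa)


print(all(parents_separate(x) for x in NODES))  # True
```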

According to the previous lemma, if A and B are d-separated by C in G, the Markov condition entails I_P(A, B | C). For this reason, if (G, P) satisfies the Markov condition, we say G is an independence map of P.

We close with an intuitive explanation for why every d-separation is a conditional independency. If G = (V, E) and (G, P) satisfies the Markov condition, any dependency in P between two variables in V would have to be through a chain between them in G that has no head-to-head meetings. For example, suppose P satisfies the Markov condition with the DAG in Figure 2.5. Any dependency in P between X and T would have to be either through the chain [X, Y, R, T] or the chain [X, Z, R, T]. There could be no dependency through the chain [X, Z, S, R, T] owing to the head-to-head meeting at S. If we instantiate a variable on a chain with no head-to-head meeting, we block the dependency through that chain. For example, if we instantiate Y we block the dependency between X and T through the chain [X, Y, R, T], and if we instantiate Z we block the dependency between X and T through the chain [X, Z, R, T]. If we block all such dependencies, we render the two variables independent. For example, the instantiation of Y and Z renders X and T independent. In summary, the fact that we have I_G({X}, {T} | {Y, Z}) means we have I_P({X}, {T} | {Y, Z}). If every chain between two nodes contains a head-to-head meeting, there is no chain through which they could be dependent, and they are independent. For example, if P satisfies the Markov condition with the DAG in Figure 2.5, W and X are independent in P. That is, the fact that we have I_G({W}, {X}) means we have I_P({W}, {X}). Note that we cannot conclude I_P({W}, {X} | {Y}) from the Markov condition, and we do not have I_G({W}, {X} | {Y}).

Every Entailed Conditional Independency is Identified by d-separation

Could there be conditional independencies, other than those identified by d-separation, that are entailed by the Markov condition? The answer is no. The next two lemmas prove this. First we have a definition.

Definition 2.5 Let V be a set of random variables, and A_1, B_1, C_1, A_2, B_2, and C_2 be subsets of V. We say conditional independency I_P(A_1, B_1 | C_1) is equivalent to conditional independency I_P(A_2, B_2 | C_2) if for every probability distribution P of V, I_P(A_1, B_1 | C_1) holds if and only if I_P(A_2, B_2 | C_2) holds.

Lemma 2.2 Any conditional independency entailed by a DAG, based on the Markov condition, is equivalent to a conditional independency among disjoint sets of random variables.

Proof. The proof is developed in Exercise 2.4.

Due to the preceding lemma, we need only discuss disjoint sets of random variables when investigating conditional independencies entailed by the Markov condition. The next lemma states that the only such conditional independencies are those that correspond to d-separations:

Lemma 2.3 Let G = (V, E) be a DAG, and 𝒫 be the set of all probability distributions P such that (G, P) satisfies the Markov condition. Then for every three mutually disjoint subsets A, B, C ⊆ V,

I_P(A, B | C) for all P ∈ 𝒫 ⟹ I_G(A, B | C).

Proof. The proof can be found in [Geiger and Pearl, 1990].

Before stating the main theorem concerning d-separation, we need the following definition:

Definition 2.6 We say conditional independency I_P(A, B | C) is identified by d-separation in G if one of the following holds:

1. I_G(A, B | C).

2. A, B, and C are not mutually disjoint; A′, B′, and C′ are mutually disjoint, I_P(A, B | C) and I_P(A′, B′ | C′) are equivalent, and we have I_G(A′, B′ | C′).

X → Y → Z

P(x1) = a                 P(x2) = 1 − a

P(y1|x1) = 1 − (b + c)    P(y1|x2) = 1 − (b + d)
P(y2|x1) = c              P(y2|x2) = d
P(y3|x1) = b              P(y3|x2) = b

P(z1|y1) = e              P(z1|y2) = e              P(z1|y3) = f
P(z2|y1) = 1 − e          P(z2|y2) = 1 − e          P(z2|y3) = 1 − f

Figure 2.6: For this (G, P), we have I_P({X}, {Z}) but not I_G({X}, {Z}).

Theorem 2.1 Based on the Markov condition, a DAG G entails all and only those conditional independencies that are identified by d-separation in G.

Proof. The proof follows immediately from the preceding three lemmas.

You must be careful to interpret Theorem 2.1 correctly. A particular distribution P that satisfies the Markov condition with G may have conditional independencies that are not identified by d-separation. For example, consider the Bayesian network in Figure 2.6. It is left as an exercise to show I_P({X}, {Z}) for the distribution P in that network. Clearly, I_G({X}, {Z}) is not the case. However, there are many distributions which satisfy the Markov condition with the DAG in that figure that do not have this independency. One such distribution is the one given in Example 1.25 (with X, Y, and Z replaced by V, C, and S respectively). The only independency that exists in all distributions satisfying the Markov condition with this DAG is I_P({X}, {Z} | {Y}), and I_G({X}, {Z} | {Y}) is the case.
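The independency in Figure 2.6 can be checked numerically before working the exercise by hand. The sketch below builds the joint distribution P(x, y, z) = P(x) P(y|x) P(z|y) from the tables in the figure and tests whether P(x, z) = P(x) P(z) for all values, using exact fractions. The particular values chosen for a through f are arbitrary assumptions. The optional parameter `e2`, which lets P(z1|y2) differ from P(z1|y1), is a hypothetical addition used only to exhibit a distribution on the same DAG without the independency.

```python
from fractions import Fraction as F
from itertools import product


def joint(a, b, c, d, e, f, e2=None):
    """Joint P(x, y, z) = P(x) P(y|x) P(z|y) for the network in Figure 2.6.

    With e2 left as None the tables are exactly those of the figure.
    Passing e2 (hypothetical) replaces P(z1|y2) = e by P(z1|y2) = e2.
    """
    if e2 is None:
        e2 = e
    p_x = {1: a, 2: 1 - a}
    p_y_given_x = {1: {1: 1 - (b + c), 2: c, 3: b},
                   2: {1: 1 - (b + d), 2: d, 3: b}}
    p_z_given_y = {1: {1: e, 2: 1 - e},
                   2: {1: e2, 2: 1 - e2},
                   3: {1: f, 2: 1 - f}}
    return {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
            for x, y, z in product((1, 2), (1, 2, 3), (1, 2))}


def x_z_independent(p):
    """True if P(x, z) = P(x) P(z) for every pair of values."""
    p_xz = {(x, z): sum(p[x, y, z] for y in (1, 2, 3))
            for x, z in product((1, 2), (1, 2))}
    p_x = {x: p_xz[x, 1] + p_xz[x, 2] for x in (1, 2)}
    p_z = {z: p_xz[1, z] + p_xz[2, z] for z in (1, 2)}
    return all(p_xz[x, z] == p_x[x] * p_z[z] for x, z in p_xz)


# Arbitrary illustrative parameter values (assumptions, not from the text).
params = dict(a=F(1, 3), b=F(1, 5), c=F(1, 4), d=F(1, 2), e=F(1, 6), f=F(2, 3))

print(x_z_independent(joint(**params)))               # True
print(x_z_independent(joint(**params, e2=F(1, 2))))   # False
```

The first result reflects the fact that, with the figure's tables, P(z1|x1) and P(z1|x2) both reduce to e(1 − b) + fb. Once P(z1|y2) differs from P(z1|y1) and c differs from d, the two conditional probabilities no longer agree, so the independency is a property of this particular distribution rather than of the DAG.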