d-Separation - Creating Bayesian Networks Using Causal Edges


2.1 Entailed Conditional Independencies

2.1.2 d-Separation

We showed in Section 2.1.1 that the Markov condition entails I_P({C}, {G} | {F}) for the DAG in Figure 2.1. This conditional independency is an example of a DAG property called 'd-separation'. That is, {C} and {G} are d-separated by {A, F} in the DAG in Figure 2.1. Next we develop the concept of d-separation, and we show the following: 1) the Markov condition entails that all d-separations are conditional independencies; and 2) every conditional independency entailed by the Markov condition is identified by d-separation. That is, if (G, P) satisfies the Markov condition, every d-separation in G is a conditional independency in P. Furthermore, every conditional independency that is common to all probability distributions satisfying the Markov condition with G is identified by d-separation.

All d-separations are Conditional Independencies

First we need to review more graph theory. Suppose we have a DAG G = (V, E) and a set of nodes {X_1, X_2, ..., X_k}, where k ≥ 2, such that (X_{i−1}, X_i) ∈ E or (X_i, X_{i−1}) ∈ E for 2 ≤ i ≤ k. We call the set of edges connecting the k nodes a chain between X_1 and X_k. We denote the chain using both the sequence [X_1, X_2, ..., X_k] and the sequence [X_k, X_{k−1}, ..., X_1]. For example, [G, A, B, C] and [C, B, A, G] represent the same chain between G and C in the DAG in Figure 2.3. Another chain between G and C is [G, F, B, C]. The nodes X_2, ..., X_{k−1} are called interior nodes on chain [X_1, X_2, ..., X_k]. The subchain of chain [X_1, X_2, ..., X_k] between X_i and X_j is the chain [X_i, X_{i+1}, ..., X_j], where 1 ≤ i < j ≤ k. A cycle is a chain between a node and itself. A simple chain is a chain containing no subchains which are cycles.

We often denote chains by showing undirected lines between the nodes in the chain. For example, we would denote the chain [G, A, B, C] as G − A − B − C. If we want to show the direction of the edges, we use arrows; for example, we denote the previous chain as G ← A → B → C. A chain containing two nodes, such as X − Y, is called a link. A directed link, such as X → Y, represents an edge, and we will call it an edge. Given the edge X → Y, we say the tail of the edge is at X and the head of the edge is at Y. We also say the following:

• A chain X → Z → Y is a head-to-tail meeting, the edges meet head-to-tail at Z, and Z is a head-to-tail node on the chain.

• A chain X ← Z → Y is a tail-to-tail meeting, the edges meet tail-to-tail at Z, and Z is a tail-to-tail node on the chain.

• A chain X → Z ← Y is a head-to-head meeting, the edges meet head-to-head at Z, and Z is a head-to-head node on the chain.

• A chain X − Z − Y, such that X and Y are not adjacent, is an uncoupled meeting.
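The three kinds of meeting can be told apart mechanically from the edge set. The following is a minimal sketch, assuming a DAG is stored as a Python set of (tail, head) pairs; the function names `meeting` and `is_uncoupled` are illustrative choices, not notation from the text. The first edge set contains only the three edges of the chain G ← A → B → C mentioned above, not the whole of Figure 2.3.

```python
def meeting(edges, x, z, y):
    """Classify how the two edges of the chain x - z - y meet at the interior node z."""
    x_into_z, z_into_x = (x, z) in edges, (z, x) in edges
    y_into_z, z_into_y = (y, z) in edges, (z, y) in edges
    if not (x_into_z or z_into_x) or not (y_into_z or z_into_y):
        raise ValueError("x - z - y is not a chain: a link is missing")
    if x_into_z and y_into_z:
        return "head-to-head"   # x -> z <- y
    if z_into_x and z_into_y:
        return "tail-to-tail"   # x <- z -> y
    return "head-to-tail"       # x -> z -> y  or  x <- z <- y


def is_uncoupled(edges, x, z, y):
    """True if x - z - y is a chain whose end nodes x and y are not adjacent."""
    meeting(edges, x, z, y)  # raises ValueError if x - z - y is not a chain
    return (x, y) not in edges and (y, x) not in edges


# The chain G <- A -> B -> C, written as (tail, head) pairs.
chain_edges = {("A", "G"), ("A", "B"), ("B", "C")}
print(meeting(chain_edges, "G", "A", "B"))  # tail-to-tail
print(meeting(chain_edges, "A", "B", "C"))  # head-to-tail

# A hypothetical two-edge DAG X -> Z <- Y, used only to show the third case.
collider = {("X", "Z"), ("Y", "Z")}
print(meeting(collider, "X", "Z", "Y"))       # head-to-head
print(is_uncoupled(collider, "X", "Z", "Y"))  # True
```

Storing edges as ordered pairs keeps the tail and head of each edge explicit, which is all the classification needs.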

Figure 2.4 shows an uncoupled head-to-head meeting. We now have the following definition:

Definition 2.2 Let G = (V, E) be a DAG, A ⊆ V, X and Y be distinct nodes in V − A, and ρ be a chain between X and Y. Then ρ is blocked by A if one of the following holds:

1. There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet head-to-tail at Z.

2. There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet tail-to-tail at Z.

3. There is a node Z, such that Z and all of Z's descendants are not in A, on the chain ρ, and the edges incident to Z on ρ meet head-to-head at Z.

We say the chain is blocked at any node where one of the above meetings takes place. There may be more than one such node. The chain is called active given A if it is not blocked by A.

Figure 2.5: A DAG used to illustrate chain blocking.

Example 2.1 Consider the DAG in Figure 2.5.

1. The chain [Y, X, Z, S] is blocked by {X} because the edges on the chain incident to X meet tail-to-tail at X. That chain is also blocked by {Z} because the edges on the chain incident to Z meet head-to-tail at Z.

2. The chain [W, Y, R, Z, S] is blocked by ∅ because R ∉ ∅, T ∉ ∅, and the edges on the chain incident to R meet head-to-head at R.

3. The chain [W, Y, R, S] is blocked by {R} because the edges on the chain incident to R meet head-to-tail at R.

4. The chain [W, Y, R, Z, S] is not blocked by {R} because the edges on the chain incident to R meet head-to-head at R. Furthermore, this chain is not blocked by {T} because T is a descendant of R.
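The blocking test of Definition 2.2 can be sketched in a few lines. This is a minimal illustration, not the book's algorithm: the edge list for Figure 2.5 is an assumption reconstructed from the meetings described in Examples 2.1 and 2.2 (the figure itself is not reproduced here), and the names `descendants`, `blocking_nodes`, and `is_blocked` are hypothetical helpers.

```python
# Edges (tail, head) of the DAG in Figure 2.5. ASSUMPTION: reconstructed from the
# head-to-head, head-to-tail, and tail-to-tail meetings named in Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}


def descendants(edges, node):
    """All nodes reachable from `node` by following edges from tail to head."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found


def blocking_nodes(edges, chain, a):
    """Interior nodes at which `chain` is blocked by the node set `a`.

    `chain` is a list of nodes and is assumed to be a valid chain in the DAG.
    """
    a = set(a)
    blocked_at = []
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in edges and (nxt, z) in edges:
            # Head-to-head at z (condition 3): z and its descendants must avoid a.
            if z not in a and not (descendants(edges, z) & a):
                blocked_at.append(z)
        elif z in a:
            # Head-to-tail or tail-to-tail at a node of a (conditions 1 and 2).
            blocked_at.append(z)
    return blocked_at


def is_blocked(edges, chain, a):
    """True if the chain is blocked by `a` in the sense of Definition 2.2."""
    return bool(blocking_nodes(edges, chain, a))


# The four items of Example 2.1.
print(blocking_nodes(EDGES, ["Y", "X", "Z", "S"], {"X"}))       # ['X']
print(blocking_nodes(EDGES, ["W", "Y", "R", "Z", "S"], set()))  # ['R']
print(blocking_nodes(EDGES, ["W", "Y", "R", "S"], {"R"}))       # ['R']
print(is_blocked(EDGES, ["W", "Y", "R", "Z", "S"], {"R"}))      # False
print(is_blocked(EDGES, ["W", "Y", "R", "Z", "S"], {"T"}))      # False
```

Returning the list of blocking nodes, rather than a bare truth value, mirrors the phrase "blocked at" used in the examples.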

We can now define d-separation.

Definition 2.3 Let G = (V, E) be a DAG, A ⊆ V, and X and Y be distinct nodes in V − A. We say X and Y are d-separated by A in G if every chain between X and Y is blocked by A.

It is not hard to see that every chain between X and Y is blocked by A if and only if every simple chain between X and Y is blocked by A.

Example 2.2 Consider the DAG in Figure 2.5.

1. X and R are d-separated by {Y, Z} because the chain [X, Y, R] is blocked at Y, and the chain [X, Z, R] is blocked at Z.

2. X and T are d-separated by {Y, Z} because the chain [X, Y, R, T] is blocked at Y, the chain [X, Z, R, T] is blocked at Z, and the chain [X, Z, S, R, T] is blocked at Z and at S.

3. W and T are d-separated by {R} because the chains [W, Y, R, T] and [W, Y, X, Z, R, T] are both blocked at R.

4. Y and Z are d-separated by {X} because the chain [Y, X, Z] is blocked at X, the chain [Y, R, Z] is blocked at R, and the chain [Y, R, S, Z] is blocked at S.

5. W and S are d-separated by {R, Z} because the chain [W, Y, R, S] is blocked at R, and the chains [W, Y, R, Z, S] and [W, Y, X, Z, S] are both blocked at Z.

6. W and S are also d-separated by {Y, Z} because the chain [W, Y, R, S] is blocked at Y, the chain [W, Y, R, Z, S] is blocked at Y, R, and Z, and the chain [W, Y, X, Z, S] is blocked at Z.

7. W and S are also d-separated by {Y, X}. You should determine why.

8. W and X are d-separated by ∅ because the chain [W, Y, X] is blocked at Y, the chain [W, Y, R, Z, X] is blocked at R, and the chain [W, Y, R, S, Z, X] is blocked at S.

9. W and X are not d-separated by {Y} because the chain [W, Y, X] is not blocked at Y, since Y ∈ {Y}, and clearly it could not be blocked anywhere else.

10. W and T are not d-separated by {Y} because, even though the chain [W, Y, R, T] is blocked at Y, the chain [W, Y, X, Z, R, T] is not blocked at Y, since Y ∈ {Y}, and this chain is not blocked anywhere else because no other nodes are in {Y} and there are no other head-to-head meetings on it.
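Definition 2.3 can be checked by brute force on a small DAG: enumerate every simple chain between the two nodes and test each one for blocking. The sketch below does exactly that. It relies on the remark that simple chains suffice; the edge list for Figure 2.5 is an assumption reconstructed from the examples, and the helper names are hypothetical. Enumerating chains is exponential in general, so this is meant only for graphs of this size.

```python
# Edges (tail, head) of Figure 2.5. ASSUMPTION: reconstructed from Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}


def descendants(edges, node):
    """All nodes reachable from `node` by following edges from tail to head."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found


def is_blocked(edges, chain, a):
    """True if `chain` (a list of nodes) is blocked by the set `a` (Definition 2.2)."""
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in edges and (nxt, z) in edges:       # head-to-head at z
            if z not in a and not (descendants(edges, z) & a):
                return True
        elif z in a:                                        # head-to-tail or tail-to-tail
            return True
    return False


def simple_chains(edges, x, y, prefix=None):
    """Yield every simple chain (no node repeated) between x and y, ignoring direction."""
    prefix = prefix or [x]
    last = prefix[-1]
    if last == y:
        yield prefix
        return
    for tail, head in sorted(edges):
        if last in (tail, head):
            other = head if last == tail else tail
            if other not in prefix:
                yield from simple_chains(edges, x, y, prefix + [other])


def d_separated(edges, x, y, a):
    """True if every simple chain between x and y is blocked by `a` (Definition 2.3)."""
    a = set(a)
    return all(is_blocked(edges, chain, a) for chain in simple_chains(edges, x, y))


print(d_separated(EDGES, "W", "S", {"Y", "X"}))  # True  (item 7)
print(d_separated(EDGES, "W", "X", set()))       # True  (item 8)
print(d_separated(EDGES, "W", "X", {"Y"}))       # False (item 9)
print(d_separated(EDGES, "W", "T", {"Y"}))       # False (item 10)
```

Item 7, which the text leaves to the reader, comes out as expected: every chain that leaves Y toward R is blocked at Y, and every chain that passes through X is blocked at X.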

Definition 2.4 Let G = (V, E) be a DAG, and A, B, and C be mutually disjoint subsets of V. We say A and B are d-separated by C in G if for every X ∈ A and Y ∈ B, X and Y are d-separated by C. We write

I_G(A, B | C).

If C = ∅, we write only

I_G(A, B).

Example 2.3 Consider the DAG in Figure 2.5. We have

I_G({W, X}, {S, T} | {R, Z})

because every chain between W and S, W and T, X and S, and X and T is blocked by {R, Z}.
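Definition 2.4 reduces d-separation of sets to d-separation of every pair of nodes, which is easy to express in code. The sketch below is a compact illustration under the same assumptions as before: the edge list for Figure 2.5 is reconstructed from the examples, the function names are hypothetical, and chains are enumerated exhaustively, which is only practical for small DAGs.

```python
# Edges (tail, head) of Figure 2.5. ASSUMPTION: reconstructed from Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}


def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found


def is_blocked(edges, chain, c):
    """True if `chain` is blocked by the node set `c` (Definition 2.2)."""
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in edges and (nxt, z) in edges:       # head-to-head at z
            if z not in c and not (descendants(edges, z) & c):
                return True
        elif z in c:                                        # head-to-tail or tail-to-tail
            return True
    return False


def simple_chains(edges, x, y, prefix=None):
    """Yield every simple chain between x and y, ignoring edge direction."""
    prefix = prefix or [x]
    if prefix[-1] == y:
        yield prefix
        return
    for tail, head in sorted(edges):
        if prefix[-1] in (tail, head):
            other = head if prefix[-1] == tail else tail
            if other not in prefix:
                yield from simple_chains(edges, x, y, prefix + [other])


def d_separated_sets(edges, a, b, c):
    """I_G(a, b | c): every x in a is d-separated from every y in b by c (Definition 2.4)."""
    a, b, c = set(a), set(b), set(c)
    if a & b or a & c or b & c:
        raise ValueError("a, b, and c must be mutually disjoint")
    return all(is_blocked(edges, chain, c)
               for x in a for y in b
               for chain in simple_chains(edges, x, y))


print(d_separated_sets(EDGES, {"W", "X"}, {"S", "T"}, {"R", "Z"}))  # True (Example 2.3)
print(d_separated_sets(EDGES, {"W", "X"}, {"S", "T"}, {"Y"}))       # False
```

The second call fails because, for instance, W and T are not d-separated by {Y}, as item 10 of Example 2.2 shows.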

We write I_G(A, B | C) because, as we show next, d-separation identifies all and only those conditional independencies entailed by the Markov condition for G. We need the following three lemmas to prove this:

Lemma 2.1 Let P be a probability distribution of the variables in V and G = (V, E) be a DAG. Then (G, P) satisfies the Markov condition if and only if for every three mutually disjoint subsets A, B, C ⊆ V, whenever A and B are d-separated by C, A and B are conditionally independent in P given C. That is, (G, P) satisfies the Markov condition if and only if

I_G(A, B | C) ⟹ I_P(A, B | C). (2.1)

Proof. The proof that, if (G, P) satisfies the Markov condition, then each d-separation implies the corresponding conditional independency is quite lengthy and can be found in [Verma and Pearl, 1990] and in [Neapolitan, 1990].

As to the other direction, suppose each d-separation implies a conditional independency. That is, suppose Implication 2.1 holds. It is not hard to see that a node's parents d-separate the node from all its nondescendants that are not its parents. That is, if we denote the sets of parents and nondescendants of X by PA_X and ND_X respectively, we have

I_G({X}, ND_X − PA_X | PA_X).

Since Implication 2.1 holds, we can therefore conclude

I_P({X}, ND_X − PA_X | PA_X),

which clearly states the same conditional independencies as

I_P({X}, ND_X | PA_X),

which means the Markov condition is satisfied.
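The key step of this direction, that a node's parents d-separate it from its other nondescendants, can be checked exhaustively on a concrete DAG. The sketch below does so for the DAG of Figure 2.5. It is an illustration on one graph, not a proof; the edge list is an assumption reconstructed from Examples 2.1 and 2.2, and all function names are hypothetical.

```python
# Edges (tail, head) of Figure 2.5. ASSUMPTION: reconstructed from Examples 2.1 and 2.2.
EDGES = {("W", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "R"),
         ("Z", "R"), ("Z", "S"), ("R", "S"), ("R", "T")}
NODES = {node for edge in EDGES for node in edge}


def parents(edges, node):
    """PA_X: the tails of all edges whose head is `node`."""
    return {tail for tail, head in edges if head == node}


def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found


def is_blocked(edges, chain, a):
    """True if `chain` is blocked by the node set `a` (Definition 2.2)."""
    for prev, z, nxt in zip(chain, chain[1:], chain[2:]):
        if (prev, z) in edges and (nxt, z) in edges:       # head-to-head at z
            if z not in a and not (descendants(edges, z) & a):
                return True
        elif z in a:                                        # head-to-tail or tail-to-tail
            return True
    return False


def simple_chains(edges, x, y, prefix=None):
    """Yield every simple chain between x and y, ignoring edge direction."""
    prefix = prefix or [x]
    if prefix[-1] == y:
        yield prefix
        return
    for tail, head in sorted(edges):
        if prefix[-1] in (tail, head):
            other = head if prefix[-1] == tail else tail
            if other not in prefix:
                yield from simple_chains(edges, x, y, prefix + [other])


def parents_separate_nondescendants(edges, nodes, x):
    """Check I_G({x}, ND_x - PA_x | PA_x) by enumerating simple chains."""
    pa = parents(edges, x)
    nd = nodes - {x} - descendants(edges, x)
    return all(is_blocked(edges, chain, pa)
               for y in nd - pa
               for chain in simple_chains(edges, x, y))


for node in sorted(NODES):
    print(node, parents_separate_nondescendants(EDGES, NODES, node))  # True for every node
```

For T, for example, the only parent is R, and every chain leaving T passes through R with the edge R → T having its tail at R, so the meeting at R can never be head-to-head and the chain is blocked there.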

According to the previous lemma, if A and B are d-separated by C in G, the Markov condition entails I_P(A, B | C). For this reason, if (G, P) satisfies the Markov condition, we say G is an independence map of P.

We close with an intuitive explanation for why every d-separation is a conditional independency. If G = (V, E) and (G, P) satisfies the Markov condition, any dependency in P between two variables in V would have to be through a chain between them in G that has no head-to-head meetings. For example, suppose P satisfies the Markov condition with the DAG in Figure 2.5. Any dependency in P between X and T would have to be either through the chain [X, Y, R, T] or the chain [X, Z, R, T]. There could be no dependency through the chain [X, Z, S, R, T] owing to the head-to-head meeting at S. If we instantiate a variable on a chain with no head-to-head meeting, we block the dependency through that chain. For example, if we instantiate Y we block the dependency between X and T through the chain [X, Y, R, T], and if we instantiate Z we block the dependency between X and T through the chain [X, Z, R, T]. If we block all such dependencies, we render the two variables independent. For example, the instantiation of Y and Z renders X and T independent. In summary, the fact that we have I_G({X}, {T} | {Y, Z}) means we have I_P({X}, {T} | {Y, Z}).

If every chain between two nodes contains a head-to-head meeting, there is no chain through which they could be dependent, and they are independent. For example, if P satisfies the Markov condition with the DAG in Figure 2.5, W and X are independent in P. That is, the fact that we have I_G({W}, {X}) means we have I_P({W}, {X}). Note that we cannot conclude I_P({W}, {X} | {Y}) from the Markov condition, and we do not have I_G({W}, {X} | {Y}).
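A small numeric check can make the last point concrete. The sketch below restricts attention to the three-node DAG W → Y ← X (the head-to-head meeting at Y, taken in isolation rather than within all of Figure 2.5) with binary variables. The probability values are made up purely for illustration; they do not come from the text. Exact fractions are used so that equality tests are not affected by rounding.

```python
from fractions import Fraction as F
from itertools import product

# HYPOTHETICAL parameters for the DAG W -> Y <- X with binary variables.
P_W = {1: F(3, 10), 0: F(7, 10)}
P_X = {1: F(6, 10), 0: F(4, 10)}
P_Y1_GIVEN_WX = {(1, 1): F(9, 10), (1, 0): F(1, 2),
                 (0, 1): F(1, 2), (0, 0): F(1, 10)}

# The Markov condition for this DAG gives P(w, x, y) = P(w) P(x) P(y | w, x).
joint = {}
for w, x, y in product((0, 1), repeat=3):
    p_y = P_Y1_GIVEN_WX[(w, x)] if y == 1 else 1 - P_Y1_GIVEN_WX[(w, x)]
    joint[(w, x, y)] = P_W[w] * P_X[x] * p_y


def prob(**fixed):
    """Probability of the event fixing some of the variables w, x, y."""
    total = F(0)
    for (w, x, y), p in joint.items():
        values = {"w": w, "x": x, "y": y}
        if all(values[name] == value for name, value in fixed.items()):
            total += p
    return total


# I_P({W}, {X}): P(w, x) = P(w) P(x) for all values.
w_x_independent = all(
    prob(w=w, x=x) == prob(w=w) * prob(x=x)
    for w, x in product((0, 1), repeat=2))

# I_P({W}, {X} | {Y}): P(w, x, y) P(y) = P(w, y) P(x, y) for all values.
w_x_independent_given_y = all(
    prob(w=w, x=x, y=y) * prob(y=y) == prob(w=w, y=y) * prob(x=x, y=y)
    for w, x, y in product((0, 1), repeat=3))

print(w_x_independent)          # True
print(w_x_independent_given_y)  # False
```

With these numbers, instantiating Y creates a dependency between W and X even though they are marginally independent. Other parameter choices could make the conditional independency hold as well, which is why the Markov condition alone neither entails nor rules it out.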

Every Entailed Conditional Independency is Identified by d-separation

Could there be conditional independencies, other than those identified by d-separation, that are entailed by the Markov condition? The answer is no. The next two lemmas prove this. First we have a definition.

Definition 2.5 Let V be a set of random variables, and A_1, B_1, C_1, A_2, B_2, and C_2 be subsets of V. We say conditional independency I_P(A_1, B_1 | C_1) is equivalent to conditional independency I_P(A_2, B_2 | C_2) if for every probability distribution P of V, I_P(A_1, B_1 | C_1) holds if and only if I_P(A_2, B_2 | C_2) holds.

Lemma 2.2 Any conditional independency entailed by a DAG, based on the Markov condition, is equivalent to a conditional independency among disjoint sets of random variables.

Proof. The proof is developed in Exercise 2.4.

Due to the preceding lemma, we need only discuss disjoint sets of random variables when investigating conditional independencies entailed by the Markov condition. The next lemma states that the only such conditional independencies are those that correspond to d-separations:

Lemma 2.3 Let G = (V, E) be a DAG, and 𝒫 be the set of all probability distributions P such that (G, P) satisfies the Markov condition. Then for every three mutually disjoint subsets A, B, C ⊆ V,

I_P(A, B | C) for all P ∈ 𝒫 ⟹ I_G(A, B | C).

Proof. The proof can be found in [Geiger and Pearl, 1990].

Before stating the main theorem concerning d-separation, we need the following definition:

Definition 2.6 We say conditional independency I_P(A, B | C) is identified by d-separation in G if one of the following holds:

1. I_G(A, B | C).

2. A, B, and C are not mutually disjoint; A′, B′, and C′ are mutually disjoint; I_P(A, B | C) and I_P(A′, B′ | C′) are equivalent; and we have I_G(A′, B′ | C′).

Figure 2.6: For this (G, P), we have I_P({X}, {Z}) but not I_G({X}, {Z}).

Theorem 2.1 Based on the Markov condition, a DAG G entails all and only those conditional independencies that are identified by d-separation in G.

Proof. The proof follows immediately from the preceding three lemmas.

You must be careful to interpret Theorem 2.1 correctly. A particular distribution P that satisfies the Markov condition with G may have conditional independencies that are not identified by d-separation. For example, consider the Bayesian network in Figure 2.6. It is left as an exercise to show I_P({X}, {Z}) for the distribution P in that network. Clearly, I_G({X}, {Z}) is not the case. However, there are many distributions that satisfy the Markov condition with the DAG in that figure and do not have this independency. One such distribution is the one given in Example 1.25 (with X, Y, and Z replaced by V, C, and S respectively). The only independency that exists in all distributions satisfying the Markov condition with this DAG is I_P({X}, {Z} | {Y}), and I_G({X}, {Z} | {Y}) is the case.
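The phenomenon is easy to reproduce numerically. The sketch below assumes the DAG is the chain X → Y → Z, which is consistent with the statements above but is an assumption, since Figure 2.6 is not reproduced here. All parameter values are hypothetical and are not the ones in the book's figure. Y takes three values; in the first distribution P(y3 | x) is the same for both values of x and P(z1 | y1) = P(z1 | y2), which makes X and Z independent even though they are not d-separated by ∅.

```python
from fractions import Fraction as F
from itertools import product


def chain_joint(p_x1, p_y_given_x, p_z1_given_y):
    """Joint P(x, y, z) = P(x) P(y | x) P(z | y) for the chain X -> Y -> Z."""
    joint = {}
    for x, y, z in product((1, 2), (1, 2, 3), (1, 2)):
        p_x = p_x1 if x == 1 else 1 - p_x1
        p_z = p_z1_given_y[y] if z == 1 else 1 - p_z1_given_y[y]
        joint[(x, y, z)] = p_x * p_y_given_x[x][y] * p_z
    return joint


def marginal(joint, x=None, y=None, z=None):
    """Probability of the event fixing any of x, y, z (None means 'summed out')."""
    return sum(p for (xv, yv, zv), p in joint.items()
               if x in (None, xv) and y in (None, yv) and z in (None, zv))


def x_z_independent(joint):
    """I_P({X}, {Z}): P(x, z) = P(x) P(z) for all values."""
    return all(marginal(joint, x=x, z=z) == marginal(joint, x=x) * marginal(joint, z=z)
               for x, z in product((1, 2), repeat=2))


def x_z_independent_given_y(joint):
    """I_P({X}, {Z} | {Y}): P(x, y, z) P(y) = P(x, y) P(y, z) for all values."""
    return all(joint[(x, y, z)] * marginal(joint, y=y)
               == marginal(joint, x=x, y=y) * marginal(joint, y=y, z=z)
               for x, y, z in product((1, 2), (1, 2, 3), (1, 2)))


# HYPOTHETICAL parameters. P(z1 | y) is shared by both distributions.
P_Z1_GIVEN_Y = {1: F(1, 5), 2: F(1, 5), 3: F(3, 5)}

special = chain_joint(
    F(1, 3),
    {1: {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)},
     2: {1: F(1, 4), 2: F(1, 2), 3: F(1, 4)}},   # same P(y3 | x) for both x
    P_Z1_GIVEN_Y)

generic = chain_joint(
    F(1, 3),
    {1: {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)},
     2: {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}},   # P(y3 | x) now depends on x
    P_Z1_GIVEN_Y)

print(x_z_independent(special), x_z_independent_given_y(special))  # True True
print(x_z_independent(generic), x_z_independent_given_y(generic))  # False True
```

Both distributions satisfy the Markov condition with the chain by construction, so both have the conditional independency given Y. Only the specially tuned one has the extra marginal independency, which is why that independency is not entailed by the DAG.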

In document Learning Bayesian Networks Neapolitan R E pdf (Page 80-86)