First Orientation Rule - The Mixture Approach

5.2 The Mixture Approach

5.2.5 First Orientation Rule

We require the definition of a minimal independence set for the first orientation rule:

Definition 6. If $O_i \perp\!\!\!\perp O_j \mid (W, S)$ with $W \subseteq O \setminus \{O_i, O_j\}$, but we have $O_i \not\perp\!\!\!\perp O_j \mid (B, S)$ for any $B \subset W$, then $W$ is a minimal independence set for $O_i$ and $O_j$.
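Definition 6 can be checked mechanically when a conditional independence oracle is available. The following is only a minimal sketch: `ci_oracle`, `proper_subsets`, `is_minimal_independence_set` and `toy_oracle` are hypothetical names introduced for illustration, and the selection set $S$ is assumed to be handled implicitly inside the oracle.

```python
from itertools import combinations


def proper_subsets(W):
    """Yield every proper subset B of W, including the empty set."""
    items = sorted(W)
    for r in range(len(items)):  # r stops at len(W) - 1, so W itself is excluded
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_minimal_independence_set(ci_oracle, oi, oj, W):
    """Return True if W satisfies Definition 6 for the pair (oi, oj).

    ci_oracle(oi, oj, cond) is a hypothetical callable returning True when
    oi and oj are independent given cond (with S assumed implicit).
    """
    W = frozenset(W)
    if not ci_oracle(oi, oj, W):
        return False
    # Minimality: dependence must hold given every proper subset of W.
    return all(not ci_oracle(oi, oj, B) for B in proper_subsets(W))


def toy_oracle(oi, oj, cond):
    """Toy oracle (an assumption): independence holds iff cond contains O3 and O4."""
    return {"O3", "O4"} <= set(cond)
```

With this toy oracle, `{"O3", "O4"}` is a minimal independence set for `O1` and `O2`, whereas `{"O3", "O4", "O5"}` is not, because independence already holds given the proper subset `{"O3", "O4"}`. The check queries every proper subset, so it is exponential in the size of $W$ and is meant only to make the definition concrete.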

Recall that Proposition 3 lists three cases (List 1) in which we have $O_i \not\perp\!\!\!\perp O_j \mid (B, S)$.

We would, however, like to claim that if $O_i \perp\!\!\!\perp O_j \mid (W, S)$ with $W$ having minimal cardinality, then $O_i^t \not\perp\!\!\!\perp O_j^t \mid (B^t, S^t)$ at some $t \in S(f(T))$ for all $B \subset W$. However, the first item in the above list introduces a problem because we have $O_i^t \perp\!\!\!\perp O_j^t \mid (B^t, S^t)$ at all $t \in S(f(T))$.

We can nonetheless make progress by realizing that satisfying the first item in List 1 is very difficult in practice. Let $Z = W \setminus B$. Consider the following definition:

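Definition 7. A non-empty variable set $Z \subseteq W$ is parameter independence inducing (PII) if and only if (1) $W$ is the smallest set such that $O_i^t \perp\!\!\!\perp O_j^t \mid (W^t, S^t)$ at all $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W,S}} \perp\!\!\!\perp \lambda_{I_{O_j \mid W,S}}$, but (2) $O_i^t \perp\!\!\!\perp O_j^t \mid (W^t \setminus Z^t, S^t)$ at all $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W \setminus Z,S}} \not\perp\!\!\!\perp \lambda_{I_{O_j \mid W \setminus Z,S}}$.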

In other words, if conditional independence holds given $\{W^t \setminus Z^t, S^t\}$ for all $t \in S(f(T))$, then introducing $Z$ maintains the conditional independence but also induces parameter independence.

Now I believe that PII sets almost never occur in practice. For example, consider the CMJ in Figure 5.1a. Assume that $f^t(O_i, O_j)$ uniquely changes at the dotted red lines in Figure 5.1a. Then, we have $O_1^t \perp\!\!\!\perp O_2^t$ at all $t \in S(f(T))$ by the global directed Markov property, but parameter dependence may hold. We can however consider a variable set $Z$ that takes four distinct values between the dotted red lines in the intervals $[0,1)$, $[1,2)$, $[2,3)$ and $[3,4)$. Then $O_1^t \perp\!\!\!\perp O_2^t \mid Z^t$ at all $t \in S(f(T))$ and $\lambda_{I_{O_1 \mid Z}} \perp\!\!\!\perp \lambda_{I_{O_2 \mid Z}}$, so $Z$ induces parameter independence. I can make a similar statement for another $Z$ in Figure 5.1b that takes five values.

Notice that choosing $Z$ depends on $P_T$. If nature constructs a PII variable set, then the variable set must correspond with the investigator-determined $P_T$. On the other hand, if an investigator constructs a PII variable set, then he or she must design $P_T$ using prior knowledge about the underlying CMJ. I believe both of the aforementioned cases are uncommon. I therefore feel safe to make the following assumption:

Assumption 2. Parameter faithfulness holds if and only if we cannot query the CI oracle with any PII variable set.

Observe that we may also construct a Lebesgue measure zero argument for parameter faithfulness, as we did with mixture faithfulness, since we must have the algebraic equality $P_{\lambda_{I_{O_i \mid W,S}} \lambda_{I_{O_j \mid W,S}}} = P_{\lambda_{I_{O_i \mid W,S}}} P_{\lambda_{I_{O_j \mid W,S}}}$ when parameter independence holds.

Figure 5.1: (a) An example of a CMJ and a time distribution $P_T$. A variable taking on 4 unique values between the dotted red lines ensures that parameter independence holds. (b) We can likewise consider a variable taking 5 unique values in this shifted case.

Parameter faithfulness has another close connection with mixture faithfulness. Mixture faithfulness allows us to claim the following biconditional by Proposition 4: we have $O_i \perp\!\!\!\perp O_j \mid (W, S)$ if and only if $O_i \perp\!\!\!\perp O_j \mid (W, S, \lambda_{I_{O_i O_j \mid W,S}})$ and $\lambda_{I_{O_i \mid W,S}} \perp\!\!\!\perp \lambda_{I_{O_j \mid W,S}}$.

We may rewrite the above statement in terms of $B \subset W$ as follows: $O_i \not\perp\!\!\!\perp O_j \mid (B, S)$ if and only if $O_i^t \not\perp\!\!\!\perp O_j^t \mid (B^t, S^t)$ for some $t \in S(f(T))$ or $\lambda_{I_{O_i \mid B,S}} \not\perp\!\!\!\perp \lambda_{I_{O_j \mid B,S}}$. The "or" logical disjunction again presents the problem here because we may have $O_i^t \perp\!\!\!\perp O_j^t \mid (B^t, S^t)$ for all $t \in S(f(T))$ but $\lambda_{I_{O_i \mid B,S}} \not\perp\!\!\!\perp \lambda_{I_{O_j \mid B,S}}$. However, the parameter faithfulness assumption allows us to avoid this problematic case when used in conjunction with mixture faithfulness:

Proposition 5. Assume that mixture faithfulness and parameter faithfulness hold. If $O_i \perp\!\!\!\perp O_j \mid (W, S)$ with minimal independence set $W$, then for each $B \subset W$ there exists $t \in S(f(T))$ such that $O_i^t \not\perp\!\!\!\perp_d O_j^t \mid (B^t, S^t)$; note that $t$ may not necessarily be the same for all $B \subset W$.

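Proof. If $O_i \perp\!\!\!\perp O_j \mid (W, S)$, then $O_i^t \perp\!\!\!\perp O_j^t \mid (W^t, S^t)$ at all $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W,S}} \perp\!\!\!\perp \lambda_{I_{O_j \mid W,S}}$ by mixture faithfulness. By parameter faithfulness, if we remove any non-empty variable set $Z$ from $W$, then we cannot have $O_i^t \perp\!\!\!\perp O_j^t \mid (W^t \setminus Z^t, S^t)$ at all $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W \setminus Z,S}} \not\perp\!\!\!\perp \lambda_{I_{O_j \mid W \setminus Z,S}}$. Therefore, by minimality of $W$ and Lemma 1, we must have either (1) $O_i^t \not\perp\!\!\!\perp O_j^t \mid (W^t \setminus Z^t, S^t)$ at some $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W \setminus Z,S}} \perp\!\!\!\perp \lambda_{I_{O_j \mid W \setminus Z,S}}$, or (2) $O_i^t \not\perp\!\!\!\perp O_j^t \mid (W^t \setminus Z^t, S^t)$ at some $t \in S(f(T))$ with $\lambda_{I_{O_i \mid W \setminus Z,S}} \not\perp\!\!\!\perp \lambda_{I_{O_j \mid W \setminus Z,S}}$. In either case, $O_i^t \not\perp\!\!\!\perp O_j^t \mid (W^t \setminus Z^t, S^t)$ at some $t \in S(f(T))$, so the conclusion follows by the global directed Markov property.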
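The case analysis behind Proposition 5 can be laid out as a small truth table. This is only an illustrative sketch: each case is encoded as a hypothetical pair of booleans (dependence at some $t$, parameter dependence), and the variable names are my own rather than part of the formal development.

```python
from itertools import product

# Each case is (dependent_at_some_t, parameters_dependent).
all_cases = list(product([False, True], repeat=2))

# Dependence given (B, S) holds iff at least one disjunct is true.
dependence_cases = [c for c in all_cases if c[0] or c[1]]

# Parameter faithfulness rules out independence at all t paired with
# parameter dependence, i.e. the case (False, True).
remaining = [c for c in dependence_cases if c != (False, True)]

# In every remaining case, dependence holds at some t.
every_remaining_case_has_dependence_at_some_t = all(c[0] for c in remaining)
```

The disjunction yields three dependence cases. Once the case with independence at every time point but dependent parameters is excluded, both remaining cases have dependence at some $t$, which is the step the proof relies on.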

I find the above proposition useful for justifying part of the first orientation rule:

Lemma 8. Assume mixture faithfulness, d-separation faithfulness with respect to $P_{X^t}$, $\forall t \in S(f(T))$, as well as parameter faithfulness. If $O_i \perp\!\!\!\perp O_k \mid (W, S)$ with minimal independence set $W$ and $O_j \in W$, then $O_j^t \in An(\{O_i^t, O_k^t\} \cup S^t)$ at some $t \in S(f(T))$. If further $f(O_i, O_k \mid B, S)$ is stationary for all $B \subseteq (W \setminus O_j)$, then $O_j^t \in An(\{O_i^t, O_k^t\} \cup S^t)$ at all $t \in S(f(T))$.

Proof. I first introduce the following technical lemma which is a modification of Lemma 14 in [Spirtes et al., 1999]:

Lemma 9. Suppose that we have $O_j^t \notin An(\{O_i^t, O_k^t\} \cup S^t)$ at all $t \in S(f(T))$. If there is a set $W \subseteq O \setminus \{O_i, O_k\}$ containing $O_j$ such that $O_i^{t_0}$ and $O_k^{t_0}$ are d-connected given $B^{t_0} \cup S^{t_0}$ at some $t_0 \in S(f(T))$ for each subset $B \subseteq (W \setminus O_j)$, then $O_i^{t_0}$ and $O_k^{t_0}$ are also d-connected given $W^{t_0} \cup S^{t_0}$ for each subset $B \subseteq (W \setminus O_j)$; note that $t_0$ may not necessarily be the same for all $B \subseteq (W \setminus O_j)$.

Proof. I write $G \in An(\{O_i, O_k\} \cup S)$ if and only if $G^t \in An(\{O_i^t, O_k^t\} \cup S^t)$ at some time $t \in S(f(T))$. Let $B_* = An(\{O_i, O_k\} \cup S) \cap W$ and $B_*^t = An(\{O_i^t, O_k^t\} \cup S^t) \cap W^t$. Recall that $O_j \notin An(\{O_i, O_k\} \cup S)$ by hypothesis, so $B_* \subseteq (W \setminus O_j)$. We also know that there exists a path $\pi^{t_0}$ which d-connects $O_i^{t_0}$ and $O_k^{t_0}$ given $B_*^{t_0} \cup S^{t_0}$ by hypothesis. Thus, every vertex on $\pi^{t_0}$ is in $An(\{O_i^{t_0}, O_k^{t_0}\} \cup B_*^{t_0} \cup S^{t_0})$ by the definition of d-connection; in other words, every vertex on $\pi^{t_0}$ has a corresponding vertex in $An(\{O_i, O_k\} \cup B_* \cup S)$. But since we also have $B_*^{t_0} = An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0}) \cap W^{t_0}$, every vertex on $\pi^{t_0}$ more specifically lies in $An(\{O_i^{t_0}, O_k^{t_0}\} \cup B_*^{t_0} \cup S^{t_0}) = An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0})$ and therefore has a corresponding vertex in $An(\{O_i, O_k\} \cup S)$.

Now observe that $W \setminus B_* = W \cap (\neg An(\{O_i, O_k\} \cup S) \cup \neg W) = (W \cap \neg An(\{O_i, O_k\} \cup S)) \cup (W \cap \neg W) = W \cap \neg An(\{O_i, O_k\} \cup S)$, which is not in $An(\{O_i, O_k\} \cup S)$. Thus, no vertex in $W \setminus B_*$ can exist on $\pi^{t_0}$. Next, observe that $(B_* \cup S) \subseteq (W \cup S)$ and $(W \cup S) \setminus (B_* \cup S) = W \setminus B_*$, so the additional vertices in $W \cup S$ cannot exist on $\pi^{t_0}$. Hence $\pi^{t_0}$ still d-connects $O_i^{t_0}$ and $O_k^{t_0}$ given $W^{t_0} \cup S^{t_0}$.
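The two set identities used in the proof of Lemma 9 can be checked on a small example. The sketch below assumes a hypothetical DAG at a single time point, treats every vertex as its own ancestor, and assumes that $W$ and $S$ are disjoint; the graph, the `ancestors` helper and the variable names are illustrative only.

```python
def ancestors(parents, targets):
    """Return targets together with all their ancestors (child -> parents map)."""
    seen = set(targets)
    stack = list(targets)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Hypothetical DAG: A -> Oi -> Ok -> Oj, Oi -> Oj, C -> S.
parents = {"Oi": ["A"], "Ok": ["Oi"], "Oj": ["Oi", "Ok"], "S": ["C"]}
universe = {"A", "C", "Oi", "Oj", "Ok", "S"}
S = {"S"}
W = {"A", "C", "Oj"}  # contains Oj, which is not an ancestor of {Oi, Ok} or S

An = ancestors(parents, {"Oi", "Ok"} | S)
B_star = An & W

# Identity 1: W \ B_* equals W intersected with the complement of An.
lhs_1 = W - B_star
rhs_1 = W & (universe - An)

# Identity 2: (W u S) \ (B_* u S) equals W \ B_*, given that W and S are disjoint.
lhs_2 = (W | S) - (B_star | S)
```

In this graph `B_star` is `{"A", "C"}` and both identities reduce to the single vertex `Oj`. That is the only vertex added when the conditioning set grows from $B_* \cup S$ to $W \cup S$, and it is not an ancestor of $\{O_i, O_k\} \cup S$.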

Now suppose for a contradiction that we have $O_j^t \notin An(\{O_i^t, O_k^t\} \cup S^t)$ at all $t \in S(f(T))$ for the first claim. With parameter faithfulness, we know that we have $O_i^t \not\perp\!\!\!\perp_d O_k^t \mid (B^t, S^t)$ at some $t \in S(f(T))$ for each $B \subseteq (W \setminus O_j)$ by Proposition 5. We can therefore invoke Lemma 9 and claim that we have $O_i^t \not\perp\!\!\!\perp_d O_k^t \mid (W^t \cup S^t)$ at some $t \in S(f(T))$. Hence, $O_i^t \not\perp\!\!\!\perp O_k^t \mid (W^t, S^t)$ at some $t \in S(f(T))$ by d-separation faithfulness and $O_i \not\perp\!\!\!\perp O_k \mid (W, S)$ by the contrapositive of Lemma 4. However, this contradicts the fact that we must have $O_i \perp\!\!\!\perp O_k \mid (W, S)$.

We will need the following lemma for the second claim:

Lemma 10. Suppose that we have $O_j^{t_0} \notin An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0})$ at some $t_0 \in S(f(T))$. If there is a set $W \subseteq O \setminus \{O_i, O_k\}$ containing $O_j$ such that, for every subset $B \subseteq (W \setminus O_j)$, $O_i^t$ and $O_k^t$ are d-connected given $B^t \cup S^t$ at all $t \in S(f(T))$, then $O_i^{t_0}$ and $O_k^{t_0}$ are also d-connected given $W^{t_0} \cup S^{t_0}$.

Proof. The proof is similar to that of Lemma 9. Recall that $O_j^{t_0} \notin An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0})$ by hypothesis, so $B_*^{t_0} \subseteq W^{t_0} \setminus O_j^{t_0}$. We know that there exists a path $\pi^{t_0}$ that d-connects $O_i^{t_0}$ and $O_k^{t_0}$ given $B_*^{t_0} \cup S^{t_0}$ by hypothesis. Thus, every vertex on $\pi^{t_0}$ is in $An(\{O_i^{t_0}, O_k^{t_0}\} \cup B_*^{t_0} \cup S^{t_0})$ by the definition of d-connection. But since we also have $B_*^{t_0} = An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0}) \cap W^{t_0}$, every vertex on $\pi^{t_0}$ more specifically lies in $An(\{O_i^{t_0}, O_k^{t_0}\} \cup B_*^{t_0} \cup S^{t_0}) = An(\{O_i^{t_0}, O_k^{t_0}\} \cup S^{t_0})$ and therefore has a corresponding vertex in $An(\{O_i, O_k\} \cup S)$. Now observe again that $W \setminus B_* = W \cap \neg An(\{O_i, O_k\} \cup S)$, which is not in $An(\{O_i, O_k\} \cup S)$. Thus, no vertex in $W \setminus B_*$ can exist on $\pi^{t_0}$. Again, $(W \cup S) \setminus (B_* \cup S) = W \setminus B_*$, so the additional vertices in $W \cup S$ cannot exist on $\pi^{t_0}$. Hence $\pi^{t_0}$ still d-connects $O_i^{t_0}$ and $O_k^{t_0}$ given $W^{t_0} \cup S^{t_0}$.

Now suppose again for a contradiction that $f(O_i, O_k \mid B, S)$ is stationary for all $B \subseteq (W \setminus O_j)$, but we have $O_j^t \notin An(\{O_i^t, O_k^t\} \cup S^t)$ at some $t \in S(f(T))$. We can then follow a similar deductive argument as in the proof of the previous claim. With parameter faithfulness, we know that we have $O_i^t \not\perp\!\!\!\perp_d O_k^t \mid (B^t \cup S^t)$ at some $t \in S(f(T))$ for each $B \subseteq (W \setminus O_j)$ by Proposition 5. It follows that $O_i^t \not\perp\!\!\!\perp O_k^t \mid (B^t, S^t)$ at some $t \in S(f(T))$ for each $B \subseteq (W \setminus O_j)$ by d-separation faithfulness. We can more strongly claim that we have $O_i^t \not\perp\!\!\!\perp O_k^t \mid (B^t, S^t)$ at all $t \in S(f(T))$ for each $B \subseteq (W \setminus O_j)$ by Lemma 2, because we also know that $f(O_i, O_k \mid B, S)$ is stationary for all $B \subseteq (W \setminus O_j)$. Hence, $O_i^t \not\perp\!\!\!\perp_d O_k^t \mid (B^t, S^t)$ at all $t \in S(f(T))$ for each $B \subseteq (W \setminus O_j)$ by the global directed Markov property. We can therefore invoke Lemma 10 and claim that we have $O_i^t \not\perp\!\!\!\perp_d O_k^t \mid (W^t, S^t)$ at some $t \in S(f(T))$. Thus, $O_i^t \not\perp\!\!\!\perp O_k^t \mid (W^t, S^t)$ at some $t \in S(f(T))$ by d-separation faithfulness and $O_i \not\perp\!\!\!\perp O_k \mid (W, S)$ by the contrapositive of Lemma 4. However, this again contradicts the fact that we must have $O_i \perp\!\!\!\perp O_k \mid (W, S)$.

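I can therefore use the above lemma in order to apply the following orientation rules:

1. R1a∗: If (1) $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ •—∗ $O_k$, (2) $O_i \perp\!\!\!\perp O_k \mid (W, S)$ with minimal independence set $W$ and $O_j \in W$, and (3) $f(O_i, O_k \mid B, S)$ is stationary for all $B \subseteq (W \setminus O_j)$, then orient $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ •—∗ $O_k$ as $O_i$ ∗→ $O_j$ —∗ $O_k$.

2. R1b∗: If (1) $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ ◀—∗ $O_k$, (2) $O_i \perp\!\!\!\perp O_k \mid (W, S)$ with minimal independence set $W$ and $O_j \in W$, and (3) $f(O_i, O_k \mid B, S)$ is non-stationary for at least one $B \subseteq (W \setminus O_j)$, then orient $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ ◀—∗ $O_k$ as $O_i$ ∗→ $O_j$ •—∗ $O_k$.

3. R1c∗: If (1) $O_i$ ∗—▶ $O_j$ ◦—∗ $O_k$, $O_i$ ∗—▶ $O_j$ •—∗ $O_k$, $O_i$ ∗— $O_j$ ◦—∗ $O_k$ or $O_i$ ∗— $O_j$ •—∗ $O_k$, (2) $O_i \perp\!\!\!\perp O_k \mid (W, S)$ with minimal independence set $W$ and $O_j \in W$, and (3) $f(O_i, O_k \mid B, S)$ is stationary for all $B \subseteq W \setminus O_j$, then orient (C1) $O_i$ ∗—▶ $O_j$ —∗ $O_k$ when we have $O_i$ ∗—▶ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗—▶ $O_j$ •—∗ $O_k$, or (C2) $O_i$ ∗— $O_j$ —∗ $O_k$ when we have $O_i$ ∗— $O_j$ ◦—∗ $O_k$ or $O_i$ ∗— $O_j$ •—∗ $O_k$.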
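The decision logic shared by R1a∗ and R1b∗ can be sketched as a small function acting on the endpoint at $O_j$ of the $O_j$, $O_k$ edge. This is only an illustration under stated assumptions: the string labels for the endpoint marks, the function `apply_r1_star` and its boolean arguments are hypothetical, and reading the second pattern of R1b∗ as a filled arrowhead at $O_j$ is my interpretation of the notation. Conditions (1) to (3) are assumed to have been evaluated elsewhere.

```python
TAIL = "tail"                          # endpoint with no mark
CIRCLE = "circle"                      # unfilled circle
FILLED_CIRCLE = "filled_circle"        # filled circle
FILLED_ARROWHEAD = "filled_arrowhead"  # assumed reading of the R1b* pattern


def apply_r1_star(mark_at_oj, arrowhead_into_oj, oj_in_minimal_set,
                  stationary_for_all_b):
    """Return the endpoint mark at Oj on the Oj-Ok edge after R1a*/R1b*.

    The mark is returned unchanged when neither rule applies.
    """
    # Conditions (1) and (2): Oi *-> Oj, and Oj lies in the minimal
    # independence set W for Oi and Ok.
    if not (arrowhead_into_oj and oj_in_minimal_set):
        return mark_at_oj
    # R1a*: stationary for all B, so the endpoint becomes a tail.
    if stationary_for_all_b and mark_at_oj in (CIRCLE, FILLED_CIRCLE):
        return TAIL
    # R1b*: non-stationary for at least one B, so it becomes a filled circle.
    if not stationary_for_all_b and mark_at_oj in (CIRCLE, FILLED_ARROWHEAD):
        return FILLED_CIRCLE
    return mark_at_oj
```

For example, a circle at $O_j$ becomes a tail when stationarity holds for every $B$ and a filled circle otherwise. R1c∗ is not covered here because it also depends on the mark at $O_j$ on the $O_i$, $O_j$ edge.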

We also have the following result:

Lemma 11. If we have the edges $O_i$ — $O_j$ or $O_i$ — $O_j$, then we cannot have an incoming unfilled arrowhead at $O_i$ or $O_j$. Similarly, if we have the edge $O_i$ — $O_j$, then we cannot have an incoming unfilled arrowhead, filled arrowhead or square at $O_i$ or $O_j$.

Proof. Recall that we do not allow instantaneous feedback loops. Thus, if we have $O_i$ — $O_j$ or $O_i$ — $O_j$, then both $O_i^t$ and $O_j^t$ must be ancestors of $S^t$ at some time point $t \in S(f(T))$. Hence, $O_i^t$ and $O_j^t$ cannot be non-ancestors of $S^t$ at all time points $t \in S(f(T))$. Similarly, if we have $O_i$ — $O_j$, then both $O_i^t$ and $O_j^t$ must be ancestors of $S^t$ at all time points $t \in S(f(T))$. Hence, $O_i^t$ and $O_j^t$ cannot be non-ancestors of $S^t$ at some time point $t \in S(f(T))$.

We can thus further expand on R1 as follows by taking the contrapositive of Lemma 11:

1. R1a: If (1)-(3) hold as in R1a∗, then orient $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ •—∗ $O_k$ as $O_i$ ∗→ $O_j$ → $O_k$.

2. R1b: If (1)-(3) hold as in R1b∗, then orient $O_i$ ∗→ $O_j$ ◦—∗ $O_k$ or $O_i$ ∗→ $O_j$ ◀—∗ $O_k$ as $O_i$ ∗→ $O_j$ •—▶ $O_k$.

3. R1c: If (1)-(3) hold as in R1c∗, then orient (C1) $O_i$ ∗—▶ $O_j$ —▶ $O_k$ when $O_i$ ∗—▶ $O_j$ ◦—∗ $O_k$ or

In document Causal Discovery Under Non-Stationary Feedback (Page 67-74)