
Chapter 3 A NOVEL ALGORITHM FOR LOCAL LEARNING OF MARKOV BLANKET :

3.2 IPC-MB algorithm specification and proof

3.2.4 Learn Spouses

By Line 12 of IPC-MB (Figure 3-3), we have obtained the true parents and children of the target T, as discussed in the last section. In fact, we have also collected all candidate spouses of T through the repeated calls of FindCanPC( ).

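Lemma 3.4 Given X in the output of FindCanPC(T), if T is in the output of FindCanPC(X), then the output of FindCanPC(X) contains the candidate spouses of T, if there are any.

Proof. Theorem 3.1 tells us that the output of FindCanPC(X) contains all parents/children of X. Given X in the output of FindCanPC(T), if T is in the output of FindCanPC(X), then X is known to be a true parent/child of T (Lemma 3.3). If X is a child of T, and if it is a common child of T and some Y, then Y must be returned by FindCanPC(X). This applies to all of X's parents, which are at the same time T's spouses. █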

The output of FindCanPC(X) for each such X is cached (Line 9, IPC-MB), indexed by the subscript X, for later reference. Obviously, each cached set contains more than what we want:

• T itself, since T is in the output of FindCanPC(X);

• True parents and/or children of T, which are ignored;

• True spouses of T, i.e. those having X as their child, as T does. These are what we are interested in distinguishing here;

• False positives (neither parents, children nor spouses of T).

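Lemma 3.5 Given the output of FindCanPC(T), all spouses of T are contained in the union of the outputs of FindCanPC(X), where X is in the output of FindCanPC(T) and T is in the output of FindCanPC(X), i.e. X is a true parent/child of T.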

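Proof. Assume there exists some spouse Y of T which is not contained in this union, which means that Y is not contained in the output of any FindCanPC(X), where X is in the output of FindCanPC(T) and T is in the output of FindCanPC(X). This may happen only when (1) the common child X of this Y and T is not contained in the output of FindCanPC(T), or (2) Y is not returned by FindCanPC(X) for its common child X with T, though X is in the output of FindCanPC(T). Both cases contradict the fact that FindCanPC( ) returns all parents and children of its argument (Theorem 3.1). █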

With Lemma 3.5, it is known that the cached outputs together contain all candidate spouses of T, by Line 12 of IPC-MB; we refer to this collection as the candidate spouse set. However, it is known to contain many false positives as well, which await further processing.
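The collection of candidate spouses described above can be sketched in Python. This is a minimal illustration, not the thesis's implementation: `find_can_pc` is a hypothetical stand-in for FindCanPC( ) that is assumed to return exactly the true parents and children of its argument, and the toy DAG `EDGES` and the helper name `collect_candidates` are assumptions introduced only for this example.

```python
# Toy DAG as (parent, child) pairs; an illustrative assumption, not from the thesis.
EDGES = [("P", "T"), ("T", "X"), ("T", "W"), ("W", "X"), ("Y", "X"), ("X", "D")]


def find_can_pc(node):
    """Hypothetical stand-in for FindCanPC: idealized, returns true parents/children only."""
    parents = {p for p, c in EDGES if c == node}
    children = {c for p, c in EDGES if p == node}
    return parents | children


def collect_candidates(target):
    """Return (pc, cache): verified parents/children of target and the cached output per member."""
    pc, cache = set(), {}
    for x in find_can_pc(target):
        can_pc_x = find_can_pc(x)
        if target in can_pc_x:      # symmetry check: keep X only if FindCanPC(X) returns T
            pc.add(x)
            cache[x] = can_pc_x     # cached under subscript X for later reference
    return pc, cache


pc, cache = collect_candidates("T")
# Drop T itself and the known parents/children. What remains still mixes
# true spouses (Y) with false positives (D) and needs a further test.
candidates = {x: cache[x] - {"T"} - pc for x in pc}
print(sorted(pc), {x: sorted(s) for x, s in sorted(candidates.items())})
```

In this toy graph only X contributes candidates, namely Y (a true spouse of T) and D (a child of X, hence a false positive).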

Similarly to the discovery of the parents and children of T, we depend on the underlying connectivity information to recognize the true spouses among the candidates. For any true spouse Y, there are two facts available for reference: (1) it has to belong to the cached output of FindCanPC(X) for some X that is a true parent/child of T; (2) it is independent of T conditioned on the cached sepset of (T, Y) or of (Y, T) (that is why it is not included in the parents and children of T), but it is dependent on T conditioned on that sepset together with X. The first observation is obvious given the underlying topology, and the second is based on Theorem 1.4.

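Lemma 3.6 For each candidate spouse Y that is not a parent/child of T, there must exist some conditioning set such that Y and T are conditionally independent given that set.

Proof. The proof is trivial since, if there were no such set, Y should be among the parents and children of T. █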

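Lemma 3.7 In IPC-MB, for each candidate spouse Y that is not a parent/child of T, either the sepset of (T, Y) or the sepset of (Y, T) is not NIL (note that an empty sepset means the empty set, while NIL means the null pointer, i.e. there is no record for the corresponding subscript).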

Proof. Given each such Y, (1) if it is a non-descendant of T, it will be recognized as conditionally independent of T given some set within FindCanPC(T); (2) else, if it is a descendant of T, it may be falsely decided to be conditionally dependent on T, which means that no sepset is recorded for (T, Y), and it will be contained in the output of FindCanPC(T); (3) since we call FindCanPC for each variable in the output of FindCanPC(T), if Y is in that output, T will be recognized as conditionally independent of Y given some set within FindCanPC(Y). In short, each Y can always be recognized as conditionally independent of T given some set, and therefore at least one of the two sepsets is not NIL. █

Since Lemma 3.7 only guarantees that one of the two sepsets, that of (T, Y) or that of (Y, T), is not NIL, it is necessary to check both before the assignment, as done at Line 15 of IPC-MB.
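The check before the assignment can be sketched as follows. This is only an illustration under stated assumptions: the cache is modelled as a Python dict keyed by ordered pairs, NIL is modelled as `None`, and the names `lookup_sepset` and `sepsets` are hypothetical. The point of the sketch is that an empty separating set is a valid record and must not be confused with a missing one.

```python
NIL = None  # models "no record for this subscript", distinct from the empty set


def lookup_sepset(sepsets, t, y):
    """Return the recorded separating set for (t, y) or (y, t), whichever is not NIL."""
    for key in ((t, y), (y, t)):
        found = sepsets.get(key, NIL)
        if found is not NIL:   # identity test: an empty frozenset is still a valid record
            return found
    raise KeyError(f"no sepset recorded for {t!r} and {y!r} in either order")


# Hypothetical cache contents, for illustration only.
sepsets = {
    ("T", "Y"): frozenset(),        # T and Y are marginally independent
    ("D", "T"): frozenset({"X"}),   # recorded under the reversed subscript
}

print(lookup_sepset(sepsets, "T", "Y"))
print(lookup_sepset(sepsets, "T", "D"))
```

A truthiness test such as `if not found` would wrongly treat the empty set recorded for (T, Y) as missing, which is why the sketch compares against NIL by identity.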

Lemma 3.8 Given the faithfulness assumption, saying that two variables X and Y are conditionally independent given a set Z is equivalent to saying that all paths between X and Y are blocked by Z, i.e. X is d-separated from Y by Z.
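Because Lemma 3.8 ties conditional independence to d-separation, a d-separation check on a known DAG can serve as an independence oracle when experimenting with the lemmas of this section. The sketch below is an assumption-laden illustration, not part of IPC-MB: it uses the standard moralized ancestral graph criterion (an equivalent test for d-separation) instead of enumerating paths, and the function name `d_separated` and the edge-list representation are choices made for this example.

```python
from collections import deque


def d_separated(edges, a, b, z):
    """True if a and b are d-separated by the set z in the DAG given as (parent, child) pairs."""
    z = set(z)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
        parents.setdefault(p, set())

    # 1. Keep only a, b, z and their ancestors.
    keep, stack = set(), [a, b, *z]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))

    # 2. Moralize: link each node to its parents and marry parents of a common child.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for i, p in enumerate(ps):
            adj[v].add(p)
            adj[p].add(v)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # 3. Remove z and test whether b is still reachable from a.
    seen, queue = {a}, deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return False
        for w in adj[v]:
            if w not in z and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


collider = [("T", "X"), ("Y", "X")]   # T -> X <- Y
chain = [("Y", "X"), ("X", "T")]      # Y -> X -> T
print(d_separated(collider, "T", "Y", set()), d_separated(collider, "T", "Y", {"X"}))
print(d_separated(chain, "T", "Y", set()), d_separated(chain, "T", "Y", {"X"}))
```

The two toy graphs show the contrast the section relies on: conditioning on a common child opens the path between its parents, while conditioning on an intermediate node of a chain blocks it.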

Lemma 3.9 Given and = FindCanPC( ), .

Proof. Theorem 3.2 tells us that FindCanPC( ) won't output 's non-descendants. Since , it means that , i.e. given each . █

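Theorem 3.4 Given a true parent/child X of T and the cached output of FindCanPC(X), for each Y in that output that is not a parent/child of T and has not already been recognized as a spouse of T (excluding those already processed, and descendants if there are any), if Y is conditionally dependent on T given the sepset of (T, Y) together with X, or the sepset of (Y, T) together with X (depending on which one is not NIL), then Y is known to be a true spouse of T.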

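Proof. Given a true parent/child X of T, and Y in the output of FindCanPC(X) but subject to the two exclusions stated in the theorem, it is safe to declare that Y is connected with X and that Y is NOT connected with T. Besides, because Y is not connected with T, it is known that T and Y are conditionally independent given the cached sepset of (T, Y) or of (Y, T), whichever is not NIL (Lemma 3.6 and Lemma 3.7). In other words, this sepset blocks all possible paths connecting T and Y (Lemma 3.8). To prove the statement, we have to study the following six cases separately, considering that X may be a parent/child of T and Y can be a parent/child/descendant of X: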

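1. X is a parent of T and Y is a parent of X, i.e. Y → X → T, but Y is not adjacent to T. To block the path Y → X → T, X must belong to the sepset. Otherwise, we have at least one non-blocked path, which contradicts the conditional independence of T and Y given the sepset and Lemma 3.8. Therefore, adding X leaves the sepset unchanged, and conditional dependence given the sepset together with X won't happen in this case;

2. X is a parent of T and Y is a child of X, i.e. T ← X → Y, but Y is not adjacent to T. Same proof as case 1;

3. X is a child of T and Y is a parent of X, i.e. T → X ← Y, but Y is not adjacent to T. It is easy to prove that adding X makes the path T → X ← Y non-blocked, i.e. the sepset together with X won't d-separate T and Y anymore. Therefore, T and Y are conditionally dependent given the sepset together with X;

4. X is a child of T and Y is a child of X, i.e. T → X → Y, but Y is not adjacent to T. Same proof as case 1;

5. X is a parent/child of T and Y is a descendant of X. (1) Since Y is returned by FindCanPC(X) and Y is a descendant of X, there must exist at least one non-blocked path from X to Y. (2) Because Y is not a parent/child of T, all paths connecting T and Y must be blocked by some sepset. Assuming that there is one path from X to Y known to be open, it is extendable to reach T via X, since X is adjacent to T, e.g. T → X → ... → Y. To ensure d-separation, this path has to be blocked; therefore X has to be observed, i.e. X belongs to the sepset. Otherwise, the path will stay open (since there is no chance to construct a converging pattern at X here), which contradicts the conditional independence of T and Y given the sepset. Since X already belongs to the sepset, it is impossible for T and Y to become conditionally dependent given the sepset together with X;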

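These six cases cover all possible situations, so the proof is complete. From the discussion above, we see that only a true spouse Y can be conditionally dependent on T given the sepset together with X, where X is a true parent/child of T and Y is in the output of FindCanPC(X), subject to the exclusions stated in the theorem. █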
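The test that Theorem 3.4 justifies can be sketched in a few lines of Python. This is a hedged illustration, not the IPC-MB pseudocode of Figure 3-3: the function `find_spouses`, the dict-based caches, and the hand-written `independent` oracle for the toy DAG T → X ← Y, X → D are all assumptions made for this example. In practice the oracle would be a statistical conditional independence test on data.

```python
def independent(a, b, given):
    """Hand-written oracle for the toy DAG T -> X <- Y, X -> D (an assumption)."""
    facts = {
        ("T", "Y", frozenset()),        # T and Y are marginally independent
        ("T", "D", frozenset({"X"})),   # X blocks the chain T -> X -> D
    }
    return (a, b, frozenset(given)) in facts


def find_spouses(target, pc, cache, sepsets, independent):
    """Keep Y as a spouse if it depends on target given its sepset plus the common child X."""
    spouses = set()
    for x in sorted(pc):
        for y in sorted(cache[x]):
            if y == target or y in pc or y in spouses:
                continue                      # T itself, known parents/children, processed spouses
            sep = sepsets.get((target, y))
            if sep is None:                   # None models NIL; try the reversed subscript
                sep = sepsets.get((y, target))
            if sep is None:
                continue                      # no record in either order; skip this candidate
            if not independent(target, y, sep | {x}):
                spouses.add(y)                # adding X opens T -> X <- Y
    return spouses


pc = {"X"}
cache = {"X": {"T", "Y", "D"}}
sepsets = {("T", "Y"): frozenset(), ("T", "D"): frozenset({"X"})}
print(find_spouses("T", pc, cache, sepsets, independent))
```

Here D is rejected because its sepset already contains X, so adding X changes nothing (as in case 4), while Y is accepted because conditioning on the common child X makes it dependent on T (case 3).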

Theorem 3.5 Under the assumptions that the independence tests are correct and that the learning data is an independent and identically distributed sample from a probability distribution faithful to a DAG, all spouses of the target of interest, T, are found with IPC-MB.

Proof. The parent/child set obtained at Line 12 of IPC-MB contains all the true parents and children of T, and the candidate spouse set contains all spouses of T. Each candidate Y that is not among the spouses found so far and not in the parent/child set will be correctly recognized if it is a true spouse (Theorem 3.4). Since this check applies to all variables in the candidate spouse set, we are able to find all spouses of T. █

The determination of any true spouse is done in a manner different from the learning of parents/children. While searching for T's parents and children, we try to filter out as many false positives as possible, reaching a set that contains the true parents and children, though some descendants are included as well. Then, those false positives are further filtered out, with only true positives left. However, while searching for the spouses of T, we directly check whether each candidate is a true spouse or not.

Though the search for spouses proceeds in a different way, it depends on the output of FindCanPC, including the spouse candidates and the cached sepsets. This again reflects the importance of FindCanPC.
