Ascertaining Causal Influences Using Manipulation

1.4 Creating Bayesian Networks Using Causal Edges

1.4.1 Ascertaining Causal Influences Using Manipulation

dictionary definition of a cause is ‘the one, such as a person, an event, or a condition, that is responsible for an action or a result.’ Although useful, this simple definition is certainly not the last word on the concept of causation, which has been wrangled about philosophically for centuries (See e.g. [Eells, 1991], [Hume, 1748], [Piaget, 1966], [Salmon, 1994], [Spirtes et al, 1993, 2000].). The definition does, however, shed light on anoperationalmethod for identifying causal relationships. That is, if the action of making variable X take some value sometimes changes the value taken by variableY, then we assume X is

1.4. CREATING BAYESIAN NETWORKS USING CAUSAL EDGES 45 responsible for sometimes changingY’s value, and we concludeXis a cause of

Y. More formally, we say we manipulateX when we forceX to take some value, and we sayXcausesY if there is some manipulation ofXthat leads to a change in the probability distribution ofY. We assume that if manipulatingX

leads to a change in the probability distribution ofY, thenXobtaining a value by any means whatsoever also leads to a change in the probability distribution of

Y. So we assume causes and their eﬀects are statistically correlated. However, as we shall discuss soon, variables can be correlated without one causing the other. A manipulation consists of a randomized controlled experiment (RCE) using some specific population of entities (e.g. individuals with chest pain) in some specific context (E.g., they currently receive no chest pain medication and they live in a particular geographical area.). The causal relationship discovered is then relative to this population and this context.

Let’s discuss how the manipulation proceeds. We first identify the population of entities we wish to consider. Our random variables are features of these entities. Next we ascertain the causal relationship we wish to investigate. Suppose we are trying to determine if variableX is a cause of variableY. We then sample a number of entities from the population (See Section 4.2.1 for a discussion of sampling.). For every entity selected, we manipulate the value of

Xso that each of its possible values is given to the same number of entities (IfX

is continuous, we choose the values ofX according to a uniform distribution.). After the value of X is set for a given entity, we measure the value of Y for that entity. The more the resultant data shows a dependency betweenX and

Y the more the data supports thatXcausally influencesY. The manipulation of X can be represented by a variable M that is external to the system being studied. There is one valuemiof M for each value xi of X, the probabilities of all values ofM are the same, and whenM equalsmi, Xequals xi. That is, the relationship betweenM andXis deterministic. The data supports thatX

causally influencesY to the extent that the data indicatesP(yi_|mj)₆=P(yi_|mk)

forj₆=k.Manipulation is actually a special kind of causal relationship that we assume exists primordially and is within our control so that we can define and discover other causal relationships.

An Illustration of Manipulation

We demonstrate these ideas with a comprehensive example concerning recent headline news. The pharmaceutical company Merck had been marketing its drug finasteride as medication for men for a medical condition. Based on anecdotal evidence, it seemed that there was a correlation between use of the drug and regrowth of scalp hair. Let’s assume that Merck determined such a correlation does exist. Should they concludefinasteride causes hair regrowth and therefore market it as a cure for baldness? Not necessarily. There are quite a few causal explanations for the correlation of two variables. We discus these next.

Possible Causal Relationships LetF be a variable representingfinasteride use andGbe a variable representing scalp hair growth. The actual values ofF

(b) (d) F G H G F (a) G F (c) G F (e) F G Y

Figure 1.12: All five causal relationships could account for F and G being correlated.

andGare unimportant to the present discussion. We could use either continuous or discrete values. If F caused G, then indeed they would be statistically correlated, but this would also be the case ifG causedF, or if they had some hidden common causeH. If we again represent a causal influence by a directed edge, Figure 1.12 shows these three possibilities plus two more. Figure 1.12 (a) shows the conjecture thatF causes G, which we already suspect might be the case. However, it could be that Gcauses F (Figure 1.12 (b)). You may argue that, based on domain knowledge, this does not seem reasonable. However, in general we do not have domain knowledge when doing a statistical analysis. So from the correlation alone, the causal relationships in Figure 1.12 (a) and (b) are equally reasonable. Even in this domain, G causing F seems possible. A man may have used some other hair regrowth product such as minoxidil, which caused him to regrow hair, became excited about the regrowth, and decided to try other products such asfinasteride which he heard might cause regrowth. As a third possibility, it could be both that finasteride causes hair regrowth and hair regrowth causes use offinasteride, meaning we could have a causal loop or

1.4. CREATING BAYESIAN NETWORKS USING CAUSAL EDGES 47 feedback. Therefore, Figure 1.12 (c) is also a possibility. For example, finasteride may cause regrowth, and excitement about regrowth may cause use of finasteride. A fourth possibility, shown in Figure 1.12 (d), is thatF andGhave some hidden common cause H which accounts for their statistical correlation. For example, a man concerned about hair loss might try bothfinasteride and minoxidil in his eﬀort to regrow hair. The minoxidil may cause hair regrowth, while the finasteride does not. In this case the man’s concern is a cause of finasteride use and hair regrowth (indirectly through minoxidil use), while the latter two are not causally related. Afifth possibility is that we are observing a population in which all individuals have some (possibly hidden) eﬀect of both

F and G. For example, suppose finasteride and apprehension about lack of hair regrowth are both causes of hypertension2_{, and we happen to be observing} individuals who have hypertensionY. We say a node isinstantiatedwhen we know its value for the entity currently being modeled. So we are saying Y is instantiated to the same value for all entities in the population we are observing. This situation is depicted in Figure 1.12 (e), where the cross throughY means the variable is instantiated. Ordinarily, the instantiation of a common effect creates a dependency between its causes because each cause explains away the occurrence of the effect, thereby making the other cause less likely. Psycholo- gists call thisdiscounting. So, if this were the case, discounting would explain the correlation between F and G. This type of dependency is calledselection bias. Afinal possibility (not depicted in Figure 1.12) is thatF andGare not causally related at all. The most notable example of this situation is when our entities are points in time, and our random variables are values of properties at these different points in time. Such random variables are often correlated without having any apparent causal connection. For example, if our population consists of points in time, J is the Dow Jones Average at a given time, andL

is Professor Neapolitan’s hairline at a given time, thenJ and Lare correlated. Yet they do not seem to be causally connected. Some argue there are hidden common causes beyond our ability to measure. We will not discuss this issue further here. We only wish to note the diﬃculty with such correlations. In light of all of the above, we see then that we cannot deduce the causal relationship between two variables from the mere fact that they are statistically correlated. It may not be obvious why two variables with a common cause would be correlated. Consider the present example. Suppose His a common cause ofF

and G and neither F nor G caused the other. Then H and F are correlated becauseH causes F, H andG are correlated becauseH causes G, which im- plies F and G are correlated transitively through H. Here is a more detailed explanation. For the sake of example, suppose h1is a value of H that has a causal influence on F taking value f1and on G taking value g1. Then if F

had value f1, each of its causes would become more probable because one of them should be responsible. SoP(h1_|f1)> P(f1). Now since the probability ofh1has gone up, the probability ofg1would also go up becauseh1causesg1.

2_{There is no evidence that either}_fi_{nasteride or apprenhension about lack of hair regrowth}

M

F

G

P(m1) = .5 P(m2) = .5 P(f1|m1) = 1 P(f2|m1) = 0 P(f1|m2) = 0 P(f2|m2) = 1

Figure 1.13: A manipulation investigating whetherF causes G. Therefore,P(g1_|f1)> P(f1), which meansF andGare correlated.

Merck’s Manipulation Study Since Merck could not conclude finasteride causes hair regrowth from their mere correlation alone, they did a manipulation study to test this conjecture. The study was done on 1,879 men aged 18 to 41 with mild to moderate hair loss of the vertex and anterior mid-scalp areas. Half of the men were given 1 mg. of finasteride, while the other half were given 1 mg. of placebo. Let’s define variables for the study, including the manipulation variableM:

Variable Value When the Variable Takes this Value

F f1 Subject takes 1 mg. offinasteride.

f2 Subject takes 1 mg. of placebo.

G g1 Subject has significant hair regrowth.

e2 Subject does not have significant hair regrowth.

M m1 Subject is chosen to take 1mg offinasteride.

m2 Subject is chosen to take 1mg of placebo.

Figure 1.13 shows the conjecture that F causes G and the RCE used to test this conjecture. There is an oval around the system being modeled to indicate the manipulation comes from outside that system. The edges in that figure represent causal influences. The RCE supports the conjecture that F causesG

to the extent that the data supportP(g1_|m1)=₆ P(g1_|m2).Merck decided that ‘significant hair regrowth’ would be judged according to the opinion of independent dermatologists. A panel of independent dermatologists evaluated photos of the men after 24 months of treatment. The panel judged that significant hair regrowth was demonstrated in 66 percent of men treated withfinasteride compared to 7 percent of men treated with placebo. Basing our probability on

1.4. CREATING BAYESIAN NETWORKS USING CAUSAL EDGES 49

F

D

G

Figure 1.14: A causal DAG depicting thatF causesDandDcausesG. these results, we haveP(g1_|m1)_≈.67andP(g1_|m2)_≈.07. In a more analytical analysis, only 17 percent of men treated withfinasteride demonstrated hair loss (defined as any decrease in hair count from baseline). In contrast, 72 percent of the placebo group lost hair, as measured by hair count. Merck concluded that finasteride does indeed cause hair regrowth, and on Dec. 22, 1997 announced that the U.S. Food and Drug Administration granted marketing clearance to Propecia(TM) (finasteride 1 mg.) for treatment of male pattern hair loss (an- drogenetic alopecia), for use in men only. See [McClennan and Markham, 1999] for more on this.

Causal Mediaries The action offinasteride is well-known. That is, manipulation experiments have shown it significantly inhibits the conversion of testosterone to dihydro-testosterone (DHT) (See e.g. [Cunningham et al, 1995].). So without performing the study just discussed, Merck could assume finasteride (F) has a causal effect on DHT level (D). DHT is believed to be the andro- gen responsible for hair loss. Suppose we know for certain that a balding man, whose DHT level was set to zero, would regrow hair. We could then also conclude DHT level (D) has a causal effect on hair growth (G). These two causal relationships are depicted in Figure 1.14. Could Merck have used these causal relations to conclude for certain thatfinasteride would cause hair regrowth and avoid the expense of their study? No, they could not. Perhaps, a certain minimal level of DHT is necessary for hair loss, more than that minimal level has no further effect on hair loss, and finasteride is not capable of lowering DHT level below that level. That is, it may be thatfinasteride has a causal effect on DHT level, DHT level has a causal effect on hair growth, and yetfinasteride has no effect on hair growth. If we identify that F causesD andDcausesG, and

F and Gare probabilistically independent, we say the probability distribution of the variables is not faithful to the DAG representing their identified causal relationships. In general, we say (_G, P) satisfies the faithfulness condition if

(_G, P)satisfies the Markov condition and the only conditional independencies in

P are those entailed by the Markov condition. So, ifF andGare independent, the probability distribution does not satisfy the faithfulness condition with the DAG in Figure 1.14 because this independence is not entailed by the Markov condition. Faithfulness, along with its role in causal DAGs, is discussed in detail in Chapter 2.

Notice that if the variableDwas not in the DAG in Figure 1.14, and if the probability distribution did satisfy the faithfulness condition (which we believe based on Merck’s study), there would be an edge fromF directly intoGinstead

of the directed path throughD. In general, our edges always represent only the relationships among the identified variables. It seems we can usually conceive of intermediate, unidentified variables along each edge. Consider the following example taken from [Spirtes et al, 1993, 2000] [p. 42].

IfCis the event of striking a match, andAis the event of the match catching on fire, and no other events are considered, then C is a direct cause ofA. If, however, we addedB; the sulfur on the match tip achieved suﬃcient heat to combine with the oxygen, then we could no longer say thatC directly causedA, but ratherC directly causedB and Bdirectly causedA. Accordingly, we say thatB is a causal mediary betweenC andAifCcausesB andB causesA.

Note that, in this intuitive explanation, a variable name is used to stand also for a value of the variable. For example, Ais a variable whose value ison-fire

ornot-on-fire, andAis also used to represent that the match is onfire. Clearly, we can add more causal mediaries. For example, we could add the variable D

representing whether the match tip is abraded by a rough surface. Cwould then cause D, which would cause B, etc. We could go much further and describe the chemical reaction that occurs when sulfur combines with oxygen. Indeed, it seems we can conceive of a continuum of events in any causal description of a process. We see then that the set of observable variables is observer dependent. Apparently, an individual, given a myriad of sensory input, selectively records discernible events and develops cause-eﬀect relationships among them. There- fore, rather than assuming there is a set of causally related variables out there, it seems more appropriate to only assume that, in a given context or application, we identify certain variables and develop a set of causal relationships among them.

Bad Manipulation

Before discussing causation and the Markov condition, we note some cautionary procedures of which one must be aware when performing a RCE. First, we must be careful that we do not inadvertently disturb the system other than the disturbance done by the manipulation variable M itself. That is, we must be careful we do not accidentally have any other causal edges into the system being modeled. The following is an example of this kind of bad manipulation (due to Greg Cooper [private correspondence]):

Example 1.33 Suppose we want to determine the relative eﬀectiveness of home treatment and hospital treatment for low-risk pneumonia patients. Consider those patients of Dr. Welby who are randomized to home treatment, but whom Dr. Welby normally would have admitted to the hospital. Dr. Welby may give more instructions to such home-bound patients than he would give to the typical home-bound patient. These instructions might influence patient outcomes. If those instructions are not measured, then the RCE may give biased estimates of the eﬀect of treatment location (home or hospital) on patient outcome. Note, we

1.4. CREATING BAYESIAN NETWORKS USING CAUSAL EDGES 51

are interested in estimating the effect of treatment location on patient outcomes, everything else being equal. The RCE is actually telling us the effect of treatment allocation on patient outcomes, which is not of interest here (although it could be of interest for other reasons). The manipulation of treatment allocation is a bad manipulation of treatment location because it not only results in a manipulation M of treatment location, but it also has a causal effect on physicians’ other actions such as advice given. This is an example of what some call a ‘fat hand’ manipulation, in the sense that one would like to manipulate just one variable, but one’s hand is so fat that it ends up affecting other variables as well.

Let’s show with a DAG how this RCE inadvertently disturbs the system being modeled other than the disturbance done by M itself. If we letL represent treatment location, A represent treatment allocation, and M represent the manipulation of treatment location, we have these values:

Variable Value When the Variable Takes this Value

L l1 Subject is at home

l2 Subject is in hospital

A a1 Subject is allocated to be at home

a2 Subject is allocated to be in hospital

M m1 Subject is chosen to stay home

m2 Subject is chosen to stay in hospital

Other variables in the system include E representing the doctor’s evaluation of the patient,T representing the doctor’s treatments and other advice, andOrep- resenting patient outcome. Since these variables can have more than two values

In document Learning Bayesian Networks Neapolitan R E pdf (Page 54-61)