Modeling dependencies

5.1.1 Directed dependencies

Directed dependencies lead from one variable to another and are typically used to represent cause-effect relationships. For example, the printer’s power button being off causes the printer to be down, so a direct dependency exists between Printer Power Button On and Printer State. Figure 5.1 illustrates this dependency, with a directed edge between the two variables. (Edge is the usual term for an arrow between two nodes in a graph.) The first variable (in this case, Printer Power Button On) is called the parent, and the second variable (Printer State) is called the child.

Why does the arrow go from cause to effect? A simple reason is that causes tend to happen before effects. More deeply, the answer is closely related to the concept of the generative model explored in chapter 4. Remember, a generative model describes a process for generating values of all of the variables in your model. Typically, the generative process simulates a real-world process. If a cause leads to an effect, you want to generate the value of the cause first, and use that value when you generate the effect. In our example, if you create a model of a printer, and imagine generating values for all of the variables in the model, you’ll first generate a value for Printer Power Button On and then use this value to generate Printer State.
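To make the ordering concrete, here's a minimal sketch of that generative process in plain Python. The probabilities and the `generate_printer` helper are made up for illustration; the only point is that the parent's value is generated first and then used to generate the child.

```python
import random

# Hypothetical numbers, chosen only for illustration.
P_BUTTON_ON = 0.95                            # P(Printer Power Button On = true)
P_UP_GIVEN_BUTTON = {True: 0.9, False: 0.0}   # P(Printer State = up | button value)


def generate_printer(rng):
    """Generate the cause first, then the effect conditioned on the cause."""
    # Step 1: generate the parent (the cause).
    button_on = rng.random() < P_BUTTON_ON
    # Step 2: generate the child (the effect), using the parent's value.
    state_up = rng.random() < P_UP_GIVEN_BUTTON[button_on]
    return button_on, ("up" if state_up else "down")


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(generate_printer(rng))
```

Because the probability of the printer being up is 0 when the button is off, this sketch never generates an up printer with the power button off.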

Now, it bears repeating that the direction of a dependency isn’t necessarily the direction of reasoning. You can reason from the printer power button being off to the printer state being down, but you can also reason in the opposite direction: if the printer state is up, you know for sure that the power button isn’t off. Many people make the mistake of constructing their models in the direction they intend to reason. In a diagnosis application, you might observe that the printer is down and try to determine its causes, so you’d reason from Printer State to Printer Power Button On. You might be tempted to make the arrow go from Printer State to Printer Power Button On. This is incorrect. The arrow should express the generative process, which follows the cause-effect direction.
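The following sketch shows reasoning against the arrow while the model itself stays in the cause-effect direction. It applies Bayes' rule by enumerating the two values of the parent. The probabilities and the `posterior_button_on` function are hypothetical, not taken from any particular library.

```python
# Hypothetical numbers, chosen only for illustration.
P_BUTTON_ON = 0.95                            # P(Printer Power Button On = true)
P_UP_GIVEN_BUTTON = {True: 0.9, False: 0.0}   # P(Printer State = up | button value)


def posterior_button_on(observed_state):
    """Return P(button on | Printer State) by enumerating the parent's values."""
    def likelihood(button_on):
        # The model is specified cause-to-effect: P(state | button).
        p_up = P_UP_GIVEN_BUTTON[button_on]
        return p_up if observed_state == "up" else 1.0 - p_up

    prior = {True: P_BUTTON_ON, False: 1.0 - P_BUTTON_ON}
    # Joint probability of each parent value together with the observation.
    joint = {b: prior[b] * likelihood(b) for b in (True, False)}
    # Normalizing the joint gives the effect-to-cause posterior.
    return joint[True] / (joint[True] + joint[False])


if __name__ == "__main__":
    print(posterior_button_on("up"))
    print(posterior_button_on("down"))
```

With these made-up numbers, observing that the printer is up gives a posterior of 1 that the button is on, matching the "you know for sure" argument above. Observing that it's down gives a posterior of about 0.66, so the evidence shifts belief toward the button being off without settling the question.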

I’ve said that directed dependencies typically model cause-effect relationships. In fact, cause-effect is just one example of a general class of asymmetric relationships between variables. Let’s have a closer look at various kinds of asymmetric relationships—first, cause-effect relationships, and then other kinds.

[Diagram: Printer Power Button On → Printer State]

Figure 5.1 Directed dependency expressing cause-effect relationship

VARIETIES OF CAUSE-EFFECT RELATIONSHIPS

Here are some kinds of cause-effect relationships:

■ What happens first to what happens next—The most obvious kind of cause-effect relationship is between one thing that leads to another thing at a later time. For example, if someone turns the printer power off, then after that, the printer will be down. This temporal relationship is such a common characteristic of cause-effect relationships that you might think all cause-effect relationships involve time, but I don’t agree with this.

■ Cause-effect of states—Sometimes you can have two variables that represent different aspects of the state of the situation at a given point in time. For example, you might have one variable representing whether the printer power button is off and another representing whether the printer is down. Both of these are states that hold at the same moment in time. In this example, the printer power button being off causes the printer to be down, because it makes the printer have no power.

■ True value to measurement—Whenever one variable is a measurement of the value of another variable, you say that the true value is a cause of the measurement. For example, suppose you have a Power Indicator Lit variable that represents whether the printer’s power LED is lit. An asymmetric relationship exists from Printer Power Button On to Power Indicator Lit. Typically, measurements are produced by sensors, and there may be more than one measurement of the same value. Also, measurements are usually observed, and you want to reason from the measurements to the true values, so this is another example of the direction of the dependencies being different from the direction of reasoning.

■ Parameter to variable that uses the parameter—For example, consider the bias of a coin, representing the probability that a toss will come out heads, and a toss of that coin. The toss uses the bias to determine the outcome. It’s clear that the bias is generated first, and only then the individual toss. And when there are many tosses of the same coin, they’re all generated after the bias.
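The parameter-to-variable case can be sketched the same way. In this minimal Python example, the bias is generated once and every toss is then generated from it. The uniform prior over the bias and the `generate_tosses` helper are assumptions made for illustration.

```python
import random


def generate_tosses(rng, num_tosses):
    """Generate the parameter (bias) first, then every toss that uses it."""
    # Assumption: the bias is drawn uniformly from [0, 1).
    bias = rng.random()
    # Each toss depends on the same, already-generated bias.
    tosses = ["heads" if rng.random() < bias else "tails"
              for _ in range(num_tosses)]
    return bias, tosses


if __name__ == "__main__":
    bias, tosses = generate_tosses(random.Random(7), 10)
    print(bias, tosses)
```

In a real application you'd typically observe the tosses and reason back to the bias, which is one more case where the direction of reasoning runs against the direction of the dependency.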

ADDITIONAL ASYMMETRIC RELATIONSHIPS

The preceding cases are by far the most important and least ambiguous. If you understand these cases, you’ll be 95% of the way to determining the correct direction of dependencies. Now let’s go deeper by considering a variety of other relationships that, although obviously asymmetric, are ambiguous about the direction of dependency. I’ll list these relationships and then describe a rule of thumb that can help you resolve the ambiguity.

■ Part to whole—Often, the properties of part of an object lead to properties of the object as a whole. For example, consider a printer with toner and a paper feeder. Faults with either the toner or paper feeder, which are parts of the printer, can lead to faults with the printer as a whole. Other times, properties of the whole can determine properties of the part. For example, if the printer

In document Exploring Data Science (Page 140-142)