Using Markov networks

Modeling dependencies with Bayesian and

have potential value 0.4, while the one pixel that’s false has potential value 0.6. Table 5.4 shows the potential values from the four binary potentials. The cases where the two pixels have the same value have potential value 0.9, whereas the other two cases have potential value 0.1. This covers all potentials in the model.

Next, you multiply the potential values from all of the potentials. In our example, you get 0.4 × 0.4 × 0.4 × 0.6 × 0.9 × 0.1 × 0.9 × 0.1 = 0.00031104. Why do you multiply? Think about the “all else being equal” principle. If two worlds have the same probability except for one potential, then the probabilities of the worlds are proportional to their potential value according to that potential. This is exactly the effect you get when you multiply the probabilities by the value of this potential. Continuing this reasoning, you multiply the potential values of all of the potentials to get the “probability” of a possible world.
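The multiplication above can be reproduced in a few lines of plain Scala. This is only an illustrative sketch, not Figaro code: the object and method names are made up for this example, and the layout of the world (pixels 11, 12, and 21 on, pixel 22 off) is inferred from the potential values in table 5.4.

```scala
object UnnormalizedExample {
  // Unary potential from the text: 0.4 if the pixel is on, 0.6 if it is off.
  def unary(on: Boolean): Double = if (on) 0.4 else 0.6

  // Binary potential from the text: 0.9 if adjacent pixels agree, 0.1 otherwise.
  def binary(a: Boolean, b: Boolean): Double = if (a == b) 0.9 else 0.1

  // The example world, indexed as world(row)(col).
  // Inferred layout: pixels 11, 12, and 21 are on; pixel 22 is off.
  val world: Vector[Vector[Boolean]] =
    Vector(Vector(true, true), Vector(true, false))

  // The four adjacent pairs in a 2 x 2 grid, as ((row, col), (row, col)).
  val edges: Seq[((Int, Int), (Int, Int))] =
    Seq(((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)))

  // Multiply every unary and every binary potential value for the given world.
  def unnormalized(w: Vector[Vector[Boolean]]): Double = {
    val unaryPart = w.flatten.map(unary).product
    val binaryPart = edges.map { case ((r1, c1), (r2, c2)) =>
      binary(w(r1)(c1), w(r2)(c2))
    }.product
    unaryPart * binaryPart
  }
}
```

Calling `UnnormalizedExample.unnormalized(UnnormalizedExample.world)` gives the same product as the hand calculation, up to floating-point rounding.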

I put “probability” in quotes because it’s not actually a probability. When you multiply the potential values in this way, you’ll find that the “probabilities” don’t sum to 1. This is easily fixed. To get the probability of any possible world, you normalize the “probabilities” computed by multiplying the potential values. You call these the unnormalized probabilities. The sum of these unnormalized probabilities is called the normalizing factor and is usually denoted by the letter Z. So you take the unnormalized probabilities and divide them by Z to get the probabilities. Don’t worry if this process sounds cumbersome to you; Figaro takes care of all of it.
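To make the normalization step concrete, here is a minimal plain-Scala sketch that enumerates all 16 possible worlds of the 2 × 2 image, sums their unnormalized probabilities to get Z, and divides. The names (`NormalizationExample`, `World`, `probability`) are hypothetical and chosen for illustration; Figaro does this work internally and its API looks nothing like this.

```scala
object NormalizationExample {
  // Potentials from the text.
  def unary(on: Boolean): Double = if (on) 0.4 else 0.6
  def binary(a: Boolean, b: Boolean): Double = if (a == b) 0.9 else 0.1

  // A possible world assigns a value to (pixel 11, pixel 12, pixel 21, pixel 22).
  type World = (Boolean, Boolean, Boolean, Boolean)

  // Product of the four unary and four binary potential values.
  def unnormalized(w: World): Double = {
    val (p11, p12, p21, p22) = w
    unary(p11) * unary(p12) * unary(p21) * unary(p22) *
      binary(p11, p12) * binary(p21, p22) * binary(p11, p21) * binary(p12, p22)
  }

  // All 2^4 = 16 possible worlds.
  val bools: Seq[Boolean] = Seq(true, false)
  val worlds: Seq[World] =
    for (a <- bools; b <- bools; c <- bools; d <- bools) yield (a, b, c, d)

  // The normalizing factor Z is the sum over every possible world.
  val z: Double = worlds.map(unnormalized).sum

  // Dividing by Z turns an unnormalized probability into a probability.
  def probability(w: World): Double = unnormalized(w) / z
}
```

Enumeration is feasible here only because the image is tiny. The number of worlds doubles with every added pixel, so a 10 × 10 image already has 2^100 of them.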

A surprising point comes out of this discussion. In a Bayesian network, you could compute the probability of a possible world by multiplying the relevant CPD entries. In a Markov network, you can’t determine the probability of any possible world without considering all possible worlds. You need to compute the unnormalized probability of every possible world to calculate the normalizing factor. For this reason, some people find that representing Markov networks is harder than Bayesian networks, because it’s harder to interpret the numbers as defining a probability. I say that if you keep in mind the “all else being equal” principle, you’ll be able to define the parameters of a Markov network with confidence. You can leave computing the normalizing factor to Figaro. Of course, the parameters of both Bayesian networks and Markov networks can be learned from data. Use whichever structure seems more appropriate to you, based on the kinds of relationships in your application.

Table 5.4 Potential values for binary potentials for the example possible world

Variable 1    Variable 2    Potential value
Pixel 11      Pixel 12      0.9
Pixel 21      Pixel 22      0.1
Pixel 11      Pixel 21      0.9
Pixel 12      Pixel 22      0.1

5.5.2 Representing and reasoning with Markov networks

There’s one way Markov networks are definitely simpler than Bayesian networks: in the reasoning patterns. A Markov network has no notion of induced dependencies. You can reason from one variable to another variable along any path, as long as that path isn’t blocked by a variable that has been observed. Two variables are dependent if there’s a path between them, and they become conditionally independent given a set of variables if those variables block all paths between the two variables. That’s all there is to it.
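The path-blocking rule can be sketched as a reachability search on the pixel grid. The code below is an illustrative plain-Scala sketch, not part of Figaro: `SeparationExample`, `neighbors`, and `dependent` are hypothetical names, the grid is assumed to connect each pixel to its horizontal and vertical neighbors, and both endpoints are assumed to be unobserved.

```scala
object SeparationExample {
  type Pixel = (Int, Int)

  // Horizontal and vertical neighbors of a pixel in an n x n grid.
  def neighbors(p: Pixel, n: Int): Seq[Pixel] = {
    val (i, j) = p
    Seq((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
      .filter { case (r, c) => r >= 0 && r < n && c >= 0 && c < n }
  }

  // True if some path from `from` to `to` avoids every observed pixel,
  // which is the condition for the two variables to be dependent.
  def dependent(from: Pixel, to: Pixel, observed: Set[Pixel], n: Int): Boolean = {
    @annotation.tailrec
    def search(frontier: List[Pixel], visited: Set[Pixel]): Boolean =
      frontier match {
        case Nil => false
        case p :: _ if p == to => true
        case p :: rest =>
          // Observed pixels block the path, so they are never expanded.
          val next = neighbors(p, n).filterNot(q => visited(q) || observed(q))
          search(rest ::: next.toList, visited ++ next)
      }
    search(List(from), Set(from))
  }
}
```

For example, in a 3 × 3 grid the corners (0, 0) and (2, 2) are dependent when nothing is observed, and they become conditionally independent once both neighbors of (0, 0), namely (0, 1) and (1, 0), are observed. Observing only the center pixel (1, 1) isn't enough, because a path around the border remains open.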

Also, because all edges in a Markov network are undirected, there’s no notion of cause and effect or past and future. You don’t usually think of tasks such as predicting future outcomes or inferring past causes of current observations. Instead, you simply infer the values of some variables, given other variables.

REPRESENTING THE IMAGE-RECOVERY MODEL IN FIGARO

In the image-recovery application, you’ll assume that some of the pixels are observed and the rest are unobserved. You want to recover the unobserved pixels. You’ll use the model described in the previous section, which specifies both the potential value for each pixel being on and the potential value for adjacent pixels having the same value. Here’s the Figaro code for representing the model. Remember that in section 5.1.2, I said there are two methods for specifying symmetric relationships, a constraints method and a conditions method. This code uses the constraints method:

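import com.cra.figaro.language._
import com.cra.figaro.library.compound.^^

// Unary potentials: each pixel is its own Flip(0.4) element
val pixels = Array.fill(10, 10)(Flip(0.4))

// Binary potential on one pair of pixels, given their coordinates
def setConstraint(i1: Int, j1: Int, i2: Int, j2: Int): Unit = {
  val pixel1 = pixels(i1)(j1)
  val pixel2 = pixels(i2)(j2)
  val pair = ^^(pixel1, pixel2)
  pair.addConstraint(bb => if (bb._1 == bb._2) 0.9 else 0.1)
}

// Constrain every pair of vertically or horizontally adjacent pixels
for {
  i <- 0 until 10
  j <- 0 until 10
} {
  if (i <= 8) setConstraint(i, j, i + 1, j)
  if (j <= 8) setConstraint(i, j, i, j + 1)
}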

A few notes on this code are in order:

■ In the definition of pixels, Array.fill(10, 10)(Flip(0.4)) creates a 10 × 10 array and fills every element of the array with a different instance of Flip(0.4). All of the different pixels are defined by different Flip elements, which is important because they can all have different values.

■ You might be wondering why a Flip element is used at all for the unary potentials rather than a constraint. For a unary potential, defining it in the usual

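Code annotations, in listing order: the Flip(0.4) elements in pixels set the unary constraint on each variable; setConstraint sets the binary constraint on a pair of variables given their coordinates; the for loop applies the binary constraint to all pairs of adjacent variables.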
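The first note on the code, that Array.fill produces a different instance for every cell, follows from Array.fill taking its element argument by name and re-evaluating it for each position. The plain-Scala sketch below illustrates this with a hypothetical `Element` class standing in for a Figaro element; it doesn't use Figaro itself.

```scala
object FillExample {
  // Hypothetical stand-in for a Figaro element; equality is by identity.
  final class Element

  // The expression `new Element` is evaluated once per cell,
  // so the grid holds four different instances.
  val distinct: Array[Array[Element]] = Array.fill(2, 2)(new Element)

  // Creating the element first and passing the reference
  // puts the same instance in every cell.
  val single: Element = new Element
  val shared: Array[Array[Element]] = Array.fill(2, 2)(single)

  // Number of different instances in a grid.
  def distinctCount(grid: Array[Array[Element]]): Int =
    grid.flatten.toSet.size
}
```

Here `FillExample.distinctCount(FillExample.distinct)` is 4, whereas `FillExample.distinctCount(FillExample.shared)` is 1. In the image model, the shared form would force every pixel to take the same value, which is why the per-cell form matters.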

