**2.2 Probabilistic Graphical Models**

**2.2.5 Probabilistic Inference in Bayesian Networks and Influence Diagrams**

Bayesian networks and influence diagrams can answer different types of queries about the nodes they contain. Making queries to a BN or ID is a form of *probabilistic reasoning* or *probabilistic inference* [68]. Normally, the inference starts when we have evidence, which means we observe a variable or set of variables, and we want to know the probability of other variables given the evidence. For example, in the situation described by figure 2.2.4 we observe that $J = 1$ and want to know the probability of $R = 0$; we represent this query as $P(R = 0 \mid J = 1)$. This operation is known as *conditioning*. Another query is *marginalization*, in which we look for the probability of a variable not conditioned on the other variables. In the example shown in figure 2.2.4 we could marginalize to obtain $P(J)$, $P(T)$, $P(S)$, or $P(R)$. Other operations that BNs allow are: most probable explanation (MPE), maximum a posteriori probability (MAP), and sensitivity analysis [69, 72, 74].

Figure 2.2.4: Example of Bayesian Network

Let us have a BN composed of the sets $\mathbf{E}$ and $\mathbf{Q}$. The set $\mathbf{E}$ contains the evidence variables, whereas $\mathbf{Q}$ contains the remainder of the variables of the BN that are not evidence. When $\mathbf{E} = \mathbf{e}$, MPE finds the instantiation $\mathbf{q}$ of $\mathbf{Q}$ that maximizes the probability $P(\mathbf{q} \mid \mathbf{e})$. In other words, MPE looks for the instantiation of all the non-evidence variables $\mathbf{Q}$ that explains $\mathbf{E} = \mathbf{e}$ with the highest probability. MAP does the same, but only for a subset of the non-evidence variables.

Out of the aforementioned types of queries, conditioning, or the *conditional probability query*, is the most common [69]. To solve this type of query there exist exact and approximate inference algorithms. For this dissertation I use an exact inference algorithm called *variable elimination* (VE). The main idea behind the VE algorithm is to eliminate the variables that are neither query nor evidence. The VE algorithm takes the factorized joint probability distribution and sums out the variables that it needs to eliminate. This operation is also called *factor marginalization*.

The factorized form of the joint probability distribution obtained with the chain rule for Bayesian networks (equation 2.2.7) can be seen as a product of factors. A factor is a function $\phi$ that maps a set of variables $\mathbf{X}$ to a real number, $\phi : Val(\mathbf{X}) \rightarrow \mathbb{R}$, where $Val(\mathbf{X})$ denotes the set of values of the variables in $\mathbf{X}$ and $\rightarrow$ means "maps to". The set $\mathbf{X}$ is called the *scope* of the factor; therefore, $Scope[\phi] = \mathbf{X}$. Equation 2.2.18 shows the chain rule for Bayesian networks as a product of factors:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_{X_i}) = \prod_{i=1}^{n} \phi_{X_i}, \qquad (2.2.18)$$

where the conditional probability $P(X_i \mid Pa_{X_i})$ is represented by the factor $\phi_{X_i}$. The scope of $\phi_{X_i}$ is the variable $X_i$ and its parents $Pa_{X_i}$.


Let us consider a set of variables $\mathbf{X}$ and a variable $Y \notin \mathbf{X}$, with $\mathbf{X} \cup \{Y\}$ being the scope of the factor $\phi$: $\phi(\mathbf{X}, Y)$. The factor marginalization of $Y$ in $\phi$, denoted $\sum_Y \phi(\mathbf{X}, Y)$, is equivalent to a factor $\psi$ over $\mathbf{X}$ such that [69]:

$$\psi(\mathbf{X}) = \sum_Y \phi(\mathbf{X}, Y). \qquad (2.2.19)$$

Another name for the operation in 2.2.19 is *summing out* $Y$ from $\phi$. In this operation we should only add up the entries whose states of $\mathbf{X}$ coincide.
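To make factor marginalization concrete, the following sketch represents a factor as a table over discrete variables and sums one variable out as in equation 2.2.19. The `Factor` class and the numeric values are illustrative choices of mine, not taken from [69]:

```python
# A minimal, illustrative factor representation: a factor maps assignments
# of its scope to real numbers, stored as a dictionary.
class Factor:
    def __init__(self, scope, values):
        self.scope = list(scope)    # ordered variable names, e.g. ["X", "Y"]
        self.values = dict(values)  # {(state of each scope variable): real number}

    def marginalize(self, var):
        """Sum out `var` (eq. 2.2.19), adding up the entries whose states
        of the remaining variables coincide."""
        i = self.scope.index(var)
        new_scope = self.scope[:i] + self.scope[i + 1:]
        new_values = {}
        for assignment, value in self.values.items():
            key = assignment[:i] + assignment[i + 1:]   # drop var's state
            new_values[key] = new_values.get(key, 0.0) + value
        return Factor(new_scope, new_values)

# psi(X) = sum_Y phi(X, Y) for binary X and Y (made-up numbers):
phi = Factor(["X", "Y"], {(0, 0): 0.125, (0, 1): 0.25,
                          (1, 0): 0.375, (1, 1): 0.25})
psi = phi.marginalize("Y")
print(psi.values)  # {(0,): 0.375, (1,): 0.625}
```

Note that marginalization shrinks the scope by one variable while preserving the total mass of the table.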

The factor product and summation operations have properties equivalent to those of the product and summation over numbers [69]. Both operations are commutative: $\phi_1 \cdot \phi_2 = \phi_2 \cdot \phi_1$ and $\sum_X \sum_Y \phi = \sum_Y \sum_X \phi$; the product is associative: $(\phi_1 \cdot \phi_2) \cdot \phi_3 = \phi_1 \cdot (\phi_2 \cdot \phi_3)$; and they are interchangeable:

$$\sum_X (\phi_1 \cdot \phi_2) = \phi_1 \cdot \sum_X \phi_2, \qquad (2.2.20)$$

if $X \notin Scope[\phi_1]$. This property allows us to "push in" the summation, so that the summation is performed only on the subset of factors that contain the variable we want to eliminate. For instance, in 2.2.20, since we want to eliminate $X$, we push in the summation to sum only over $\phi_2$, because $\phi_1$ does not contain $X$.
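Property 2.2.20 can be checked numerically with plain nested sums over binary variables; the two factor tables below are made-up numbers, used only to illustrate the identity:

```python
# phi1 has scope {W}; phi2 has scope {W, X}. Since X is not in Scope[phi1],
# sum_X (phi1 * phi2) must equal phi1 * (sum_X phi2)  (property 2.2.20).
phi1 = {0: 0.6, 1: 0.4}                                      # phi1(W)
phi2 = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.5}  # phi2(W, X)

for w in (0, 1):
    lhs = sum(phi1[w] * phi2[(w, x)] for x in (0, 1))   # summation outside the product
    rhs = phi1[w] * sum(phi2[(w, x)] for x in (0, 1))   # summation pushed in over phi2
    assert abs(lhs - rhs) < 1e-12
print("property 2.2.20 holds on this example")
```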

The variable elimination (VE) algorithm takes advantage of the aforementioned properties. Figure 2.2.5 shows a very simple Bayesian network. The joint probability distribution for this Bayesian network is $P(A, B, C, D) = \phi_A \cdot \phi_B \cdot \phi_C \cdot \phi_D$. If, for instance, we want to know the marginal probability of $D$, $P(D)$, we apply factor marginalization: $P(D) = \sum_C \sum_B \sum_A P(A, B, C, D)$. By applying property 2.2.20 to it we get:

$$\begin{aligned}
P(D) &= \sum_C \sum_B \sum_A \phi_A \cdot \phi_B \cdot \phi_C \cdot \phi_D \\
     &= \sum_C \sum_B \phi_C \cdot \phi_D \sum_A \phi_A \cdot \phi_B \\
     &= \sum_C \phi_D \sum_B \phi_C \sum_A \phi_A \cdot \phi_B.
\end{aligned} \qquad (2.2.21)$$

The procedure in 2.2.21 can be summarized as:

$$\sum_{\mathbf{Z}} \prod_{\phi \in \Phi} \phi. \qquad (2.2.22)$$

Figure 2.2.5: Simple BN for illustration
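The saving in 2.2.21 can be checked numerically. The sketch below assumes figure 2.2.5 is the chain $A \rightarrow B \rightarrow C \rightarrow D$ implied by the factor scopes in the derivation, and all CPT numbers are invented, since the text does not provide them:

```python
# Invented CPTs for an assumed chain A -> B -> C -> D, all variables binary.
pA   = {0: 0.2, 1: 0.8}                                        # phi_A(A)   = P(A)
pBgA = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}    # phi_B(A,B) = P(B|A)
pCgB = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}    # phi_C(B,C) = P(C|B)
pDgC = {(0, 0): 0.25, (0, 1): 0.75, (1, 0): 0.4, (1, 1): 0.6}  # phi_D(C,D) = P(D|C)

# First line of 2.2.21: sum over all of A, B, C at once.
brute = {d: sum(pA[a] * pBgA[(a, b)] * pCgB[(b, c)] * pDgC[(c, d)]
                for a in (0, 1) for b in (0, 1) for c in (0, 1))
         for d in (0, 1)}

# Last line of 2.2.21: summations pushed in, eliminating A, then B, then C.
tau_A = {b: sum(pA[a] * pBgA[(a, b)] for a in (0, 1)) for b in (0, 1)}
tau_B = {c: sum(pCgB[(b, c)] * tau_A[b] for b in (0, 1)) for c in (0, 1)}
pD    = {d: sum(pDgC[(c, d)] * tau_B[c] for c in (0, 1)) for d in (0, 1)}

assert all(abs(pD[d] - brute[d]) < 1e-12 for d in (0, 1))
assert abs(pD[0] + pD[1] - 1.0) < 1e-12
```

Both orders give the same $P(D)$, but the pushed-in order never builds a table over more than two variables at once.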

The expression 2.2.22 is also called the *sum-product inference task* [69]. The variable elimination (VE) algorithm performs this inference task by summing out variables one at a time using property 2.2.20. When summing out a variable, we multiply the factors that contain that variable to obtain a product factor. The next step is to sum the variable out of this product factor to generate a new factor, which goes to the next iteration as part of the new set of factors that the VE algorithm will be applied on. The VE algorithm iterates until it removes all the variables it aims to eliminate. We can summarize the VE algorithm as follows [68, 69]:


The VE algorithm receives a set of factors $\Phi$, a set of variables to eliminate $\mathbf{Z}$, and an ordering $\prec$ on $\mathbf{Z}$. If the set $\mathbf{Z}$ is $\{Z_1, \dots, Z_k\}$, let the ordering be $Z_i \prec Z_j$ if and only if $i < j$. The set $\mathbf{Z}$ encompasses those variables that are neither query nor evidence. We refer to this algorithm as the procedure Sum-Product-VE($\Phi$, $\mathbf{Z}$, $\prec$), which follows these steps:

**for** $i = 1, \dots, k$ **do**
&nbsp;&nbsp;&nbsp;&nbsp;$\Phi \leftarrow$ Sum-Product-Eliminate-Var($\Phi$, $Z_i$)
$\phi^* \leftarrow \prod_{\phi \in \Phi} \phi$ after completing the $k$-th iteration
**return** $\phi^*$ at the end

The procedure Sum-Product-Eliminate-Var($\Phi$, $Z_i$) is performed for each of the iterations $i = 1, \dots, k$. This procedure receives the set of factors $\Phi$ and the variable to be eliminated $Z$; then it performs the following operations:

1. Form a set $\Phi'$ with the factors that have $Z$ in their scope: $\Phi' \leftarrow \{\phi \in \Phi : Z \in Scope[\phi]\}$
2. Form a set $\Phi''$ with the factors that do not have $Z$ in their scope, which is the set $\Phi$ without $\Phi'$: $\Phi'' \leftarrow \Phi - \Phi'$
3. Multiply the factors in the set $\Phi'$ and save the result in the factor $\psi$: $\psi \leftarrow \prod_{\phi \in \Phi'} \phi$
4. Add up the entries of the factor $\psi$ where $Z$ varies; this action eliminates $Z$ from that factor. After eliminating $Z$ from $\psi$, save the result in the factor $\tau$: $\tau \leftarrow \sum_Z \psi$
5. Return the union of the set $\Phi''$ and the factor $\tau$: **return** $\Phi'' \cup \{\tau\}$
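The steps above can be sketched in code. The factor representation (a tuple of scope names plus a value table over binary states) is my own choice for illustration; the procedures themselves follow the Sum-Product-VE description:

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names, and
# table maps each assignment (tuple of 0/1 states aligned with scope) to a
# real number. All variables are binary for simplicity.

def factor_product(f, g):
    """Multiply two factors, aligning shared variables (step 3)."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for assignment in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, assignment))
        table[assignment] = (ft[tuple(env[v] for v in fs)] *
                             gt[tuple(env[v] for v in gs)])
    return scope, table

def sum_out(f, var):
    """Sum a variable out of a factor (step 4, eq. 2.2.19)."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for assignment, value in table.items():
        key = assignment[:i] + assignment[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + value
    return scope[:i] + scope[i + 1:], new_table

def sum_product_eliminate_var(factors, z):
    """One elimination step: steps 1-5 of Sum-Product-Eliminate-Var."""
    with_z = [f for f in factors if z in f[0]]        # step 1
    without_z = [f for f in factors if z not in f[0]] # step 2
    psi = with_z[0]
    for f in with_z[1:]:                              # step 3
        psi = factor_product(psi, f)
    tau = sum_out(psi, z)                             # step 4
    return without_z + [tau]                          # step 5

def sum_product_ve(factors, elimination_order):
    """Sum-Product-VE: eliminate each variable in order, multiply what is left."""
    for z in elimination_order:
        factors = sum_product_eliminate_var(factors, z)
    result = factors[0]
    for f in factors[1:]:
        result = factor_product(result, f)
    return result

# Usage on an assumed chain A -> B -> C -> D with invented CPTs: marginal P(D).
phi_A = (("A",), {(0,): 0.3, (1,): 0.7})
phi_B = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})
phi_C = (("B", "C"), {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.3})
phi_D = (("C", "D"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9})
scope, pD = sum_product_ve([phi_A, phi_B, phi_C, phi_D], ["A", "B", "C"])
```

At each step only the factors whose scope mentions the eliminated variable are multiplied, which is exactly the "push in" of property 2.2.20.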

The VE algorithm also applies when introducing evidence. Let us have a Bayesian network $\mathcal{B}$ that parameterizes a set of variables, a set of query variables $\mathbf{Y}$, and evidence $\mathbf{E} = \mathbf{e}$. When introducing evidence, the task is to compute $P(\mathbf{Y}, \mathbf{e})$. To execute this task, the factors are reduced by $\mathbf{E} = \mathbf{e}$, and the variables $\mathbf{Z}$ that are in neither $\mathbf{Y}$ nor $\mathbf{E}$ are eliminated by applying the Sum-Product-VE($\Phi$, $\mathbf{Z}$, $\prec$) procedure to the network. The factor $\phi^*$ that comes from Sum-Product-VE($\Phi$, $\mathbf{Z}$, $\prec$) is the unnormalized $P(\mathbf{Y}, \mathbf{e})$; dividing it by the normalizing constant $\alpha$ yields the conditional probability $P(\mathbf{Y} \mid \mathbf{e})$. This whole procedure whereby $P(\mathbf{Y}, \mathbf{e})$ is obtained is called the Cond-Prob-VE($\mathcal{B}$, $\mathbf{Y}$, $\mathbf{E}$) procedure. It encompasses the following steps [69]:

1. $\Phi \leftarrow$ factors parameterizing the network $\mathcal{B}$
2. Replace each $\phi \in \Phi$ by the reduced factor $\phi[\mathbf{E} = \mathbf{e}]$
3. Set an elimination ordering $\prec$
4. $\mathbf{Z} \leftarrow$ the variables of the network that are not in $\mathbf{Y} \cup \mathbf{E}$
5. $\phi^* \leftarrow$ Sum-Product-VE($\Phi$, $\mathbf{Z}$, $\prec$)
6. $\alpha \leftarrow \sum_{y \in Val(\mathbf{Y})} \phi^*(y)$
7. **return** $\alpha$, $\phi^*$
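A minimal sketch of these steps, assuming the network of figure 2.2.5 is the chain $A \rightarrow B \rightarrow C \rightarrow D$ with invented binary CPTs, query $\mathbf{Y} = \{D\}$, and evidence $B = 1$; all names and numbers are illustrative:

```python
# Cond-Prob-VE sketch on an assumed chain A -> B -> C -> D (invented CPTs).
pA   = {0: 0.3, 1: 0.7}                                      # P(A)
pBgA = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # P(B|A)
pCgB = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.3}  # P(C|B)
pDgC = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}  # P(D|C)

b = 1  # evidence: B = 1

# Step 2: reduce each factor by E = e (fix B = 1 wherever B appears).
red_B = {a: pBgA[(a, b)] for a in (0, 1)}   # phi_B[B=1], scope {A}
red_C = {c: pCgB[(b, c)] for c in (0, 1)}   # phi_C[B=1], scope {C}

# Steps 4-5: Z = {A, C}; eliminate them with Sum-Product-VE.
tau_A = sum(pA[a] * red_B[a] for a in (0, 1))                   # eliminate A
phi_star = {d: tau_A * sum(red_C[c] * pDgC[(c, d)] for c in (0, 1))
            for d in (0, 1)}                                    # phi*(D) = P(D, B=1)

# Steps 6-7: alpha = sum_y phi*(y) = P(B=1); normalize to get P(D | B=1).
alpha = sum(phi_star.values())
pD_given_e = {d: phi_star[d] / alpha for d in (0, 1)}
assert abs(sum(pD_given_e.values()) - 1.0) < 1e-12
print(pD_given_e)
```

Here $\phi^*$ is the unnormalized table $P(D, B=1)$ and $\alpha = P(B=1)$, so the division in the last step recovers $P(D \mid B=1)$.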
