Direct Decomposition Bounds for Influence Diagrams
3.3 Bounding Scheme using Exponentiated Utility Functions
3.3.2 Join-graph Decomposition Bounds for Exponentiated IDs
Given the exponentiated ID Me := hX, D, P, Ue, Oi of M := hX, D, P, U, Oi, a join-graph decomposition GJG := hG(C, S), χ, ψi for Me can be obtained in the usual way using the mini-bucket tree decomposition (see Example 3.1). Figure 3.5 and Example 3.5 show the
Figure 3.5: Join-graph Decomposition of the Exponentiated ID.
changes to the join-graph decomposition due to the exponentiated ID Me.
Example 3.5. Figure 3.5 presents a join-graph decomposition GJG:= hG(C, S), χ, ψi of the ID in Figure 3.1a. Here we do not use the valuation algebra over the Jensen’s potentials as was shown in Figure 3.1b. The join-graph in Figure 3.5 treats probability functions and exponentiated utility functions as unnormalized factors in a probabilistic graphical models.
Theorem 3.4, shows that the MEU of an ID M can be bounded by the log-partition function of the exponentiated ID Me. Therefore, we can now apply any variational decomposition bounds in the mixed inference task for bounding the log-partition function to yield bounds to the MEU of M. Namely, we bound the MEU by first applying Theorem 3.4 to obtain Me from M, and then by using variational decomposition bounds in the probabilistic graphical models to Me.
3.3.2.1 Derivation of Upper Bounds
Given a join-graph decomposition GJG := hG(C, S), χ, ψi of Me, we denote a cost-shifting function over the separator (Ci, Cj) ∈ S by δCi,Cj(XCi,Cj). The upper bounds on the MEU can be parameterized over GJG as described in the following proposition.
Proposition 3.3. Given an ID M := hX, D, P, U, Oi, where U = {U1, . . . , Ul} and its exponentiated ID Me := hX, D, P, Ue, Oi, Ue = {eU1, . . . , eUl}, a total elimination order Oelim := {X1, X2, . . . , XN} over the variables X ∪ D, and given a join-graph decomposition GJG := hG(C, S), χ, ψi of Me, the upper bound on the MEU can be parameterized over GJG as follows. UMEU denotes as usual an upper bound over the MEU. We claim that for a given set of weights.
and we denote the expression on the right hand side as UMEU where,
PCi(XCi) = Y
k is the weight for the variable Xk assigned at cluster C that satisfies wXk =P
C∈CwCX
k.
Proof. We start our derivation from Eq. (3.61) in Theorem 3.4 bounding the MEU of M by the log-partition function of Me repeated in Eq. (3.73). To get Eq. (3.74), we rearrange the probability functions and exponentiated utility functions at each cluster Ci according to GJG, and introduce cost-shifting functions over the separators, which cancel each other. Namely,
δCi,Cj(XCi,Cj)δCj,Ci(XCi,Cj) = 0(XCi,Cj) Next, to derive Eq. (3.75), we rewrite functions in the log-space and replaces the exponentiated utility functions Fi(XFi) with the original utility functions Ui(XUi) Subsequently, we switch the order of the multiplication operation and the elimination operation using decomposition bounds on mini-buckets [Ping et al., 2015]
yielding Eq. (3.76), and obtain the desired expression by changing log product to sum log and rewriting the powered-sum operation explicitly, yielding Eq. (3.77).
MEU =
Proposition 3.3 introduces two kinds of optimization parameters: (1) cost-shifting functions δCi,Cj(XCi,Cj), and (2) weight parameters wCi that satisfies the following condition, wX = P
Ci∈CwCXi for all X ∈ χ(Ci). Compared with algorithm JGD-ID presented in Section 3.2.2, algorithm JGD-EXP has only two types of optimization parameters because JGD-EXP does not use the valuation algebra for IDs.
Algorithm 3.8 Join-graph GDD of Exponentiated IDs (JGD-EXP)
Require: Exponentiated ID Me := hX, D, P, Ue, Oi of M := hX, D, P, U, Oi, weights wXi associated with each variable Xi ∈ X, i-bound, iteration limit M for the outer-loop of BCD.
Ensure: an upper bound of the MEU, UMEU Initialization Steps
1: Generate a join-graph decomposition GJG = hG(C, S), χ, ψi of Me with given i-bound
2: Initialize weights for each chance variable Xi ∈ X by uniform weights, wXC
i = |{C|X1
i∈χ(C)}|, and for each decision variable Di ∈ D by a constant close to zero, wDC
i ≈ 0.
Outer-loop of BCD
3: iter=0, UMEU= inf
4: while iter < M or Uvalue is not converged do Inner-loop of BCD
5: for each variable Xi ∈ X do
6: UMEU ← min(UMEU, Update-Weights(GJG, Xi)) . use Algorithm 3.3 with the gradient expression shown in Eq. (3.78)
7: for each edge (Ci, Cj) ∈ S do
8: UMEU ← min(UMEU, Update-Costs(GJG, (Ci, Cj))) . see Algorithm 3.9
9: iter = iter + 1 return UMEU
3.3.2.2 Optimizing the Paramterized Bounds
Next, we present algorithm JGD-EXP that solves a convex optimization problem for tighten-ing the upper bound UMEUdefined in Eq. (3.68) by optimizing over the cost-shifting functions δCi,Cj(XCi,Cj) and the weight parameters wCX
k parameterized over the join-graph G(C, S). As we will see, we will need to modify algorithm JGD-ID shown in Algorithm 3.2 in order to present JGD-EXP.
Message Passing Algorithm (JGD-EXP) Algorithm 3.8 outlines the overall procedure of BCD updates in JGD-EXP. In the initialization step of JGD-EXP, we generate a join-graph of Me and initialize its weights as usual (line 1-2), The inner-loop of JGD-EXP alternates updating weights by UPDATE-WEIGHTS(GJG,Xi) and updating cost-shifting functions by UPDATE-COSTS(GJG,(Ci,Cj)) (line 5-8). During the updating if the weight parameters, we optimize a subset of weight parameters wX = {wXC|X ∈ χ(C)} for each variable X using
Algorithm 3.3, which was also used for updating the weight parameters in JGD-ID. Yet, we modify the gradient expression for the exponentiated utility bounds as will be shown in Eq. (3.78). For updating the cost-shifting functions, we use Algorithm 3.9 to be presented shortly which performs gradient descent updates for δCi,Cj(XCi,Cj) over the separators S.
Since both JGD-ID and JGD-EXP are based on GDD bounds [Ping et al., 2015], they are quite similar. However, JGD-EXP does not involve the complicated non-convex optimization problems for updating the parameters, and the derived gradient expressions are much simpler than JGD-ID, as we will show in the following Section. The main difference between these two bounding schemes is in how each handles the additive utility functions. In JGD-ID, we generalize the powered-sum elimination for IDs and derived decomposition bounds by Minkowski’s inequality and Hölder’s inequality. In JGD-EXP, we bound the MEU of M by first using the exponentiated utility bound, which gives Me, and then by applying the decomposition bounds to the log-partition function of Me.
Next, we will present the optimization routines for optimzing the weight parametrs and the cost-shifting functions.
Updating Weights The UPDATE-WEIGHTS routine in Algorithm 3.8 updates the weights wCXji for each variable Xi over clusters {C|Xi ∈ χ(C)} like in algorithm JGD-ID. The only modification required in Algorithm 3.3 to accommodate that is to provide the gradient expression due to the new objective function UMEU defined in Eq. (3.68). Namely,
∂UMEU
Algorithm 3.9 UPDATE-COSTS(GJG, (Ci, Cj))
Require: a join-graph decomposition GJG, Xi) of an exponentiated ID Me = hX, D, P, Ue, Oi of M = hX, D, P, U, Oi, separator (Ci, Cj) iteration limit M for the gradient descent
Ensure: an upper bound of the MEU, UMEU,
1: λCi ←P
6: Compute gradient of UMEU with respect to the cost-shifting function by Eq. (3.80)
7: Find cost-shifting functions δCi,Cj(XCi,Cj) by Eq. (3.32).
8: λCi(XCi) ← FCi(XCi) − δCi,Cj(XCi,Cj)
9: λCj(XCj) ← FCj(XCj) + δCi,Cj(XCi,Cj)
10: Compute new UMEU using Eq. (3.68) with the updated λCi(XCi) and λCj(XCj)
11: t = t + 1
Eq. (3.78) is a conditional entropy of the pseudo marginal at cluster C defined by Eq. (3.79).
For more details for deriving Eq. (3.78), see [Ping et al., 2015].
Updating Cost-shifting Functions We now derive the UPDATE-COSTS routine that is used in Algorithm 3.8 to update the cost-shifting functions δCi,Cj(XCi,Cj) over separators (Ci, Cj) ∈ S. As usual, this is done via a gradient descent algorithm. We obtain a closed-form gradient expression for the cost-shifting paramters by evaluating the partial derivatives of UMEU in Eq. (3.68) with respect to δCi,Cj(XCi,Cj) as follows.
For the derivation of the gradients, see [Ping et al., 2015].
Algorithm 3.9 presents the gradient descent algorithm for optimizing the cost-shifting func-tions. Given a separator (Ci, Cj), we combine the probability functions and the utility functions assigned at cluster Ci and Cj to λCi(XCi) and λCj(XCj), respectively (line 1-2).
Note that this algorithm performs factor operations in the log-scale. In each iteration, we
set the cost-shifting function δCi,Cj(XCi,Cj) to a neutral factor 0(XCi,Cj). Then we compute gradient using Eq. (3.80) and find the cost-shifting function by Eq. (3.32) (line 6-7), and then we reparameterize λCi(XCi) and λCj(XCj) (line 8-9). We repeat this until we reach the iteration limit M or until convergence.
Complexity of JGD-EXP We can analyze the time and space complexity of Algorithm 3.8 by following the same reasoning for analyzing the complexity of algorithm JGD-ID (Theorem 3.2).
Theorem 3.5. Given an ID M := hX, D, P, U, Oi, and its exponentiated ID Me :=
hX, D, P, Ue, Oi, and given a join-graph decomposition GJG := hG(C, S), χ, ψi of Mewith an i-bound, the time complexity of JGD-EXP is O M1·M2·(|X∪D|·|C|·Gw·ki+1+·|S|·Gc·ki+1+) and the space complexity is O (|C| · ki+1+ |S| · k|sep|), where M1 is the iteration limit of the outer-loop of the BCD method, M2 is the iteration limit of the gradient descent algorithms in the inner-loop of the BCD method, |X ∪ D| is the number of variables, k is the maximum domain size, |C| and |S| are the number of clusters and separators in the join-graph decom-position GJG, and |sep| is the separator width. Gw, and Gc are constants that bounds the number of factor operations for evaluating gradients.
Summary Algorithm JGD-EXP generates upper bounds by adapting the GDD algorithm over the join-graph decomposition after exponentiating the utility functions. This formu-lation avoids the non-convexity issue in the previous algorithms JGD-ID and WMBE-ID.
However, the bounding scheme cannot recover the exact solution, even if the i bound ex-ceeds the constrained induced width.
3.3.3 Weighted Mini-bucket Moment Matching Bounds for Exponentiated