Empirical Evaluation of Decomposition Bounds for Influence Diagrams

4.4 Convergence Behavior for Iterative Algorithms

In this section, we report the convergence behavior of the iterative algorithms JGD-ID, WMBE-ID, JGD-EXP, and GDD-MI. Note that WMBE-ID and WMBMM-EXP generate an upper bound only once, when they terminate. Therefore, for WMBE-ID and WMBMM-EXP we report the upper bound obtained at termination.
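As a rough illustration of how an upper bound can be read off as a function of time, the sketch below assumes that each algorithm logs (elapsed seconds, upper bound) pairs and that the value reported at a time bound is the smallest bound logged up to that point. The function name `bound_at`, both traces, and all numbers are hypothetical and are not taken from the experiments. An algorithm that emits a bound only at termination has a single log entry.

```python
from bisect import bisect_right


def bound_at(trace, time_bound):
    """Return the smallest upper bound logged at or before time_bound.

    trace: list of (elapsed_seconds, upper_bound) pairs sorted by time.
    Returns None when nothing has been logged yet at that time bound.
    """
    times = [t for t, _ in trace]
    k = bisect_right(times, time_bound)  # number of entries with t <= time_bound
    if k == 0:
        return None
    # Every logged value is a valid upper bound, so the best one is the minimum.
    return min(ub for _, ub in trace[:k])


# Hypothetical traces, made up for illustration only.
iterative_trace = [(2.0, 950.0), (40.0, 610.0), (300.0, 480.0), (5000.0, 455.0)]
one_shot_trace = [(120.0, 520.0)]  # a single bound, emitted at termination

time_bounds = [10, 100, 1000, 10000]
iterative_curve = [bound_at(iterative_trace, t) for t in time_bounds]
one_shot_curve = [bound_at(one_shot_trace, t) for t in time_bounds]

print(iterative_curve)
print(one_shot_curve)
```

Under this assumption, the one-shot trace yields no value before its termination time and a constant value afterwards, which would appear as a flat line in a plot of bound against time.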

4.4.1 FH-MDP Domain

Figures 4.1 and 4.2 report upper bounds as a function of time for the mdp8-28-3-6-4 and mdp9-32-3-8-3 instances. The plots on the left-hand side show the upper bounds from all algorithms in log scale, with i-bounds varying from 1 to 15. The plots on the right-hand side focus on the expected utility close to the best upper bounds, in linear scale. JGD-ID and GDD-MI converged more slowly, especially at larger i-bounds. The bounds from WMBE-ID are tighter than those from JGD-EXP and WMBMM-EXP, as was also shown in Tables 4.12 and 4.13.

4.4.2 FH-POMDP Domain

Figures 4.3 and 4.4 report upper bounds for the pomdp5-6-4-3-5-3 and pomdp10-12-7-3-8-4 instances. We first see that the upper bounds from JGD-EXP and WMBMM-EXP are an order of magnitude tighter than those from the other algorithms, and that WMBE-ID showed clear improvements in the upper bounds given higher i-bounds. Comparing the convergence behavior, we see that JGD-EXP and GDD-MI improved their upper bounds smoothly over time, while JGD-ID showed step-wise improvements. At pomdp5-6-4-3-5-3, the upper bound from JGD-EXP is tighter than that from WMBMM-EXP after a time bound of 1,000 seconds when both were given i-bound 1 or 5. When the i-bound is increased to 10 and 15, WMBMM-EXP produced tighter upper bounds than JGD-EXP.

4.4.3 RAND Domain

Figures 4.5 and 4.6 report upper bounds for the rand-c50d15o1-03 and rand-c70d21o1-01 instances. As in the previous domains, we observe similar convergence behavior on both instances. Namely, WMBMM-EXP terminated and generated its upper bound earlier than the other algorithms, and JGD-EXP converged to tighter upper bounds in shorter time bounds than the other iterative algorithms, JGD-ID and GDD-MI. WMBE-ID improved the quality of its upper bounds when it was given higher i-bounds.

4.4.4 BN Domain

Figures 4.7 and 4.8 report upper bounds for the BN-14w57d12 and BN-78-w24d6 instances. On both instances, we observe a transition in the best-performing algorithm as the i-bound increases. Namely, the upper bounds from JGD-EXP are tighter than those from WMBMM-EXP when both were given i-bound 1 or 5. However, WMBMM-EXP started to generate tighter upper bounds than JGD-EXP when it was given an i-bound greater than 10.

4.4.5 Summary

In this section, we presented the convergence behavior of our bounding algorithms. Among the iterative algorithms, JGD-EXP and WMBMM-EXP generated the tightest upper bounds, an order of magnitude tighter than those of the other algorithms at shorter time bounds. Comparing JGD-ID and GDD-MI, we see that JGD-ID showed step-wise improvements until convergence, whereas GDD-MI showed smoother curves. WMBE-ID greatly improved its upper bounds given higher i-bounds, whereas the other iterative algorithms often generated worse upper bounds or only minor improvements. As we also observed in the tabular results earlier, WMBMM-EXP generated tighter upper bounds than JGD-EXP with higher i-bounds or when both have time bounds shorter than 1,000 seconds.

Figure 4.1: Convergence Behavior over Varying i-bounds at mdp8-28-3-6-4. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.2: Convergence Behavior over Varying i-bounds at mdp9-32-3-8-3. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.3: Convergence Behavior over Varying i-bounds at pomdp5-6-4-3-5-3. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.4: Convergence Behavior over Varying i-bounds at pomdp10-12-7-3-8-4. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.5: Convergence Behavior over Varying i-bounds at rand-c50d15o1-03. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.6: Convergence Behavior over Varying i-bounds at rand-c70d21o1-01. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.7: Convergence Behavior over Varying i-bounds at BN-14w57d12. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Figure 4.8: Convergence Behavior over Varying i-bounds at BN-78-w24d6. Panels (a)-(b) show i=1, (c)-(d) i=5, (e)-(f) i=10, and (g)-(h) i=15. The x-axis is the time in log scale. Figures on the left-hand side show the expected utility in log scale, and those on the right-hand side show the expected utility in linear scale in a focused region.

Domain: All (n: 75.6, f: 87.1, k: 2.4, s: 5.9, w: 25.9)

i-bd  time (sec)  JGD-ID  WMBE-ID  JGD-EXP  WMBMM-EXP  GDD-MI  WMBMM-MMAP  MBE-ID
1     10E+1       0.017   0.0      0.086    0.708      0.011   0.005       0.002
1     10E+2       0.048   0.006    0.504    0.708      0.055   0.005       0.002
1     10E+3       0.112   0.059    0.67     0.708      0.105   0.005       0.002
1     10E+4       0.584   0.096    0.73     0.708      0.195   0.005       0.002
1     10E+6       0.663   0.181    0.786    0.708      0.272   0.005       0.002
5     10E+1       0.045   0.0      0.019    0.753      0.01    0.022       0.066
5     10E+2       0.078   0.065    0.457    0.753      0.036   0.022       0.066
5     10E+3       0.175   0.152    0.686    0.753      0.095   0.022       0.066
5     10E+4       0.449   0.262    0.759    0.753      0.176   0.022       0.066
5     10E+6       0.539   0.305    0.808    0.753      0.288   0.022       0.066
10    10E+1       0.062   0.0      0.034    0.818      0.008   0.087       0.2
10    10E+2       0.151   0.099    0.449    0.818      0.027   0.087       0.2
10    10E+3       0.252   0.284    0.67     0.818      0.071   0.087       0.2
10    10E+4       0.46    0.441    0.749    0.818      0.108   0.087       0.2
10    10E+6       0.468   0.486    0.816    0.818      0.226   0.087       0.2
15    10E+1       0.062   0.025    0.034    0.798      0.007   0.184       0.33
15    10E+2       0.147   0.099    0.227    0.861      0.025   0.184       0.33
15    10E+3       0.266   0.283    0.574    0.861      0.055   0.184       0.33
15    10E+4       0.433   0.489    0.682    0.861      0.083   0.184       0.33
15    10E+6       0.525   0.631    0.77     0.861      0.106   0.184       0.33
20    10E+1       0.062   0.0      0.019    0.557      0.005   0.271       0.478
20    10E+2       0.184   0.124    0.137    0.829      0.02    0.271       0.478
20    10E+3       0.262   0.249    0.335    0.879      0.056   0.275       0.478
20    10E+4       0.363   0.445    0.568    0.904      0.072   0.275       0.478
20    10E+6       0.479   0.705    0.614    0.904      0.075   0.275       0.478

Table 4.29: Average Quality of Upper Bounds Over All Domains. The table shows the average quality (the closer to 1.0, the higher the quality) of each algorithm at varying i-bounds and time bounds. i-bd is the i-bound, time is the time bound in seconds, n is the average number of variables, f is the average number of functions, k is the average maximum domain size, s is the average maximum scope size, and w is the average constrained induced width.
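The comparison between JGD-EXP and WMBMM-EXP can be checked directly against the averages in Table 4.29. The sketch below copies only those two columns from the table and finds, for each i-bound, the first time bound at which JGD-EXP has the higher average quality. The helper name `first_crossover` is hypothetical, and the time bounds are kept as the labels printed in the table rather than converted to seconds.

```python
# Average quality values copied from Table 4.29 (JGD-EXP and WMBMM-EXP columns).
TIME_BOUNDS = ["10E+1", "10E+2", "10E+3", "10E+4", "10E+6"]

JGD_EXP = {
    1: [0.086, 0.504, 0.67, 0.73, 0.786],
    5: [0.019, 0.457, 0.686, 0.759, 0.808],
    10: [0.034, 0.449, 0.67, 0.749, 0.816],
    15: [0.034, 0.227, 0.574, 0.682, 0.77],
    20: [0.019, 0.137, 0.335, 0.568, 0.614],
}

WMBMM_EXP = {
    1: [0.708, 0.708, 0.708, 0.708, 0.708],
    5: [0.753, 0.753, 0.753, 0.753, 0.753],
    10: [0.818, 0.818, 0.818, 0.818, 0.818],
    15: [0.798, 0.861, 0.861, 0.861, 0.861],
    20: [0.557, 0.829, 0.879, 0.904, 0.904],
}


def first_crossover(i_bound):
    """Return the first time-bound label where JGD-EXP beats WMBMM-EXP, else None."""
    rows = zip(TIME_BOUNDS, JGD_EXP[i_bound], WMBMM_EXP[i_bound])
    for label, jgd_quality, wmbmm_quality in rows:
        if jgd_quality > wmbmm_quality:
            return label
    return None


crossovers = {i_bound: first_crossover(i_bound) for i_bound in JGD_EXP}
for i_bound, label in crossovers.items():
    print(f"i-bound {i_bound}: JGD-EXP first ahead at {label}")
```

On these averages, JGD-EXP overtakes WMBMM-EXP only at i-bounds 1 and 5 and only at the longer time bounds. At i-bounds 10, 15, and 20 WMBMM-EXP stays ahead at every listed time bound.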