A Comparison with Simpler Approximation Methods

5.4 Sequential Monte Carlo with Bayes Linear for Graphical Models

5.5.2 A Comparison with Simpler Approximation Methods

In this section, we justify the use of the SMC sample as an approximation to the MRF model by comparing the updates with other simpler approximate inference methods. The key ideas behind the SMC method are:

CHAPTER 5. SEQUENTIAL MONTE CARLO WITH BAYES LINEAR 115

Figure 5.5.3: (a) Greedy; (b) Bayes-UCB. The mean cumulative number of relevant items screened using the three models on the small network. The true network is defined using non-linear prior conditional A (Table 5.5.1b) and the clique factor (Table 5.5.1a) with [λ1, λ2] = [0.5, 0.5]. The BL model finds just as many relevant items as the SMC and MRF models.

1. It only takes a Monte Carlo sample of the random variables directly involved in the observations.

2. It uses BL updates to approximate both the probability of the realisations within the Monte Carlo sample and the latent variables not in the sample, given the realisations.

3. We have chosen the optimal resampling scheme.
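For reference, the BL updates mentioned above rest on the standard Bayes linear adjustment of a quantity by observed data. The block below states that general form only; the symbols X (the quantity being adjusted) and D (the observed data) are generic placeholders chosen for illustration and are not the notation of Chapter 4.

```latex
\begin{align}
  \mathrm{E}_D(X)   &= \mathrm{E}(X) + \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr), \\
  \mathrm{Var}_D(X) &= \mathrm{Var}(X) - \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, X).
\end{align}
```

Only prior means, variances and covariances are needed, which is why the approximate inference methods can be initialised from the prior mean and covariance alone.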

The simplest approximation technique we consider is an IS method, where a Monte Carlo sample is taken of the full set of latent variables, given the prior mean and covariance. This allows us to evaluate the impact of only including a subset of the latent variables in the approximation. The particles are sampled using Algorithm 7 and the corresponding weights are then calculated using BL updates. As observations are made, we update the weights to approximate the posterior distribution of Z|Y. Henceforth, this method is called the BL IS sampler.

We can evaluate the impact of using BL updates to approximate the probabilities in the Monte Carlo sample by also considering an IS sampler where the samples are re-weighted so that they approximate the true posterior distribution, henceforth called the MRF IS sampler.
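The re-weighting step shared by the two IS samplers can be illustrated with a minimal sketch. It is closest in spirit to the MRF IS sampler, since the weights are multiplied by an exact likelihood; the BL IS sampler would replace that factor with a BL approximation. Everything named here is an assumption made for illustration: the functions `sample_prior_particles`, `toy_likelihood`, `reweight` and `posterior_mean` are hypothetical, the particles are drawn independently from the prior means rather than with Algorithm 7, and the Bernoulli likelihood is a stand-in for the beta conditional of Table 5.5.1b.

```python
import random


def sample_prior_particles(prior_mean, n_particles, rng):
    """Draw binary particles, each coordinate independently from its prior mean.

    Assumption: independence is a simplification. The text samples using the
    prior mean and covariance (Algorithm 7), which is not reproduced here.
    """
    return [tuple(1 if rng.random() < p else 0 for p in prior_mean)
            for _ in range(n_particles)]


def toy_likelihood(observation, z):
    """Hypothetical Bernoulli likelihood for an outcome y observed on edge (i, j)."""
    i, j, y = observation
    p_one = 0.9 if (z[i] == 1 and z[j] == 1) else 0.1
    return p_one if y == 1 else 1.0 - p_one


def reweight(particles, weights, likelihood, observation):
    """Multiply each weight by the likelihood of the observation, then normalise."""
    updated = [w * likelihood(observation, z) for w, z in zip(weights, particles)]
    total = sum(updated)
    if total == 0.0:
        raise ValueError("all particles have zero likelihood")
    return [w / total for w in updated]


def posterior_mean(particles, weights):
    """Weighted average of each latent variable over the particles."""
    n_vars = len(particles[0])
    return [sum(w * z[k] for w, z in zip(weights, particles)) for k in range(n_vars)]


rng = random.Random(0)
particles = sample_prior_particles([0.5, 0.5, 0.5], 2000, rng)
weights = [1.0 / len(particles)] * len(particles)
# One positive observation on the edge joining nodes 0 and 1.
weights = reweight(particles, weights, toy_likelihood, (0, 1, 1))
print(posterior_mean(particles, weights))
```

The particle set itself never changes in this scheme. Only the weights move, which is why the IS methods depend on the prior alone for where the particles lie.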

The optimal resampling scheme given in Algorithm 8 is used to decide which particles to include in the Monte Carlo approximation. A key feature of this resampling algorithm is that no duplicate particles are selected. These particles are chosen to minimise the squared error loss for approximating a discrete probability mass function with a finite support. We compare this scheme with multinomial resampling, the default resampling method in particle filters, which allows duplicate particles in the Monte Carlo sample. Additionally, we compare it with max-weight resampling (Tugnait, 1982; Punskaya et al., 2002), which also ensures that no duplicate particles are selected but deterministically selects the particles with maximum weight.
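The difference between the two comparison schemes can be seen in a small sketch. It covers only multinomial and max-weight resampling as described in the text; the optimal scheme of Algorithm 8 is not reproduced. The function names `multinomial_resample` and `max_weight_resample`, and the toy particles and weights, are hypothetical and chosen for illustration.

```python
import random


def multinomial_resample(particles, weights, n, rng):
    """Draw n particles with replacement, with probability proportional to weight.

    Duplicates are possible, and the resampled particles are equally weighted.
    """
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0 / n] * n


def max_weight_resample(particles, weights, n):
    """Deterministically keep the n particles with the largest weights.

    Assumes the input particles are distinct, so the output has no duplicates.
    The retained weights are renormalised to sum to one.
    """
    order = sorted(range(len(particles)), key=lambda k: weights[k], reverse=True)[:n]
    total = sum(weights[k] for k in order)
    return [particles[k] for k in order], [weights[k] / total for k in order]


particles = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [0.6, 0.25, 0.1, 0.05]

rng = random.Random(0)
print(multinomial_resample(particles, weights, 3, rng))
print(max_weight_resample(particles, weights, 2))
```

With heavily skewed weights, multinomial resampling tends to return several copies of the same particle, whereas max-weight resampling always retains distinct particles at the cost of discarding all low-weight ones.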

Finally, we also compare the updates using the BL model described in Chapter 4. The SMC method was developed to better cope with non-linearities in the relationship between observations and latent variables. Hence, for the conditional distributions considered, we would expect the BL model to give a worse approximation to the MRF model than the SMC method.

We run each approximate inference method and the MRF model on 50 sets of 200 observations screened on randomly chosen edges for a network with 25 nodes, shown in Figure 5.5.1b. This network is large enough to require resampling in the SMC model. The true node realisations are simulated using the method in Section 4.6.2, with δ = 0.8. The prior clique factor in Table 5.5.1a with [λ1, λ2] = [0.4, 0.4] is used to define the prior MRF model for the Zs. The values for λ1 and λ2 are chosen to ensure that the conditions in Lemma 5.3.3 are met. The prior mean and covariance used in the approximate inference methods are calculated directly from the MRF model. The relationship between the latent variables and observations is described using the beta distribution with parameters given by prior conditional A in Table 5.5.1b.

The differences between the posterior expectations of the latent variables under the approximate methods and under the MRF model are shown in Figure 5.5.4, after a given number of observations and over all repetitions. For the SMC methods and the IS methods, we consider a maximum number of particles of M = 1000. Additionally, we run the IS methods with M = 4000, which ensures that the run time for updating the Monte Carlo sample in the IS models given an observation roughly matches the maximum run time of an iteration in the SMC method, as the IS methods are less computationally expensive. The plots are split over the true relevance of the nodes in the network.

Figure 5.5.4: (a) SMC Optimal Resampling; (b) SMC Multinomial Resampling; (e) MRF IS, M = 1000; (f) BL IS, M = 4000; (g) MRF IS, M = 4000; (h) Bayes Linear. 25 node network with observations on random edges. The distribution of differences between inference in the Markov random field using prior conditional A and the approximate inference methods over 50 sets of random observations, given the relevance of the nodes. The Monte Carlo methods have a maximum number of particles of M = 1000. The SMC models give the best approximation to the MRF model for this network.

The SMC methods with optimal and max-weight resampling schemes produce the smallest errors compared to the MRF model. Optimal resampling (Figure 5.5.4a) provides a slightly better approximation than max-weight resampling, particularly for nodes with a true relevance of 1. SMC using multinomial resampling gives a very poor approximation. The BL IS and MRF IS models perform poorly in comparison to the SMC method with optimal resampling, even when we consider a larger number of particles. The particles in the IS methods depend only on the prior mean and covariance, whilst the SMC method also takes the observations into consideration when expanding the particle space and possibly contains only a subset of the random variables. The BL IS model performs similarly to the MRF IS model, suggesting that the BL method gives a good approximation to the probabilities in the particle approximation. The BL model gives a poor approximation when the true node relevance is 1. This is to be expected, as the prior conditional probability distribution is non-linear.

The SMC model performs well compared to the simpler approximate methods when observations are on random edges. However, the SMC model was developed for inference within sequential decision problems, where there is likely to be less exploration of the network than results from choosing random edges. We therefore consider the sets of observations obtained from running the greedy algorithm 50 times on the MRF model, rather than random edge observations. The greedy algorithm explores far less, so it is possible that observations are made on only a small number of edges in the network. Figure 5.5.5 shows the differences between the posterior values under the MRF model and the approximate inference methods. The SMC model with optimal resampling gives the most accurate approximation to the MRF model (see Figure 5.5.5a). Max-weight resampling gives a less accurate approximation than optimal resampling; the diversity in the particles is lost at an earlier point in time than with optimal resampling. This shows the influence a good resampling scheme can have on the accuracy of the method.

In document Inference and decision making in large weakly dependent graphical models (Page 124-128)