4.5 A better proposal distribution for MH step

The confidence sampler requires the term $\psi(x^m_{k+1}, x^*_{k+1})$ as an input, which in turn depends on the proposal distribution $q(x^*_{k+1} \mid x^m_{k+1})$. In its simplest form, the proposal can be based on a random walk. The resulting scheme is termed random walk Metropolis (RWM) and was first described in [MRR+53]. The proposed value for a d-dimensional Markov chain is chosen randomly from a pre-specified Lebesgue density, with a given step size. The acceptance rate of the RWM algorithm is reasonable for a one-dimensional state space, but it decreases dramatically as the dimensionality increases. Also, if the target distribution is multimodal with widely separated modes, the Markov chain can get stuck at one of the modes and may take a very long time to reach the others. This problem is exacerbated in higher dimensions. Therefore, it makes sense to propose samples from a distribution that is well suited to the problem, i.e. one whose support is wide enough that any region with non-zero mass can be reached. A sufficient condition is that $q(x^*_{k+1} \mid x^m_{k+1})$ is positive everywhere. The accept-reject step ensures that, once the chain reaches stationarity, it samples from the correct target distribution, i.e. from $p(x_{k+1} \mid Z_{k+1})$.
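The drop in acceptance rate with dimension can be illustrated with a minimal sketch of random walk Metropolis. The isotropic Gaussian increment, the standard normal target, the fixed step size and all function names below are assumptions made for illustration; they are not taken from the source.

```python
import numpy as np

def random_walk_metropolis(log_target, x0, step_size, n_steps, rng):
    """Random walk Metropolis with an isotropic Gaussian increment (sketch)."""
    x = np.asarray(x0, dtype=float)
    log_p = log_target(x)
    chain = np.empty((n_steps, x.size))
    n_accepted = 0
    for n in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.size)
        log_p_prop = log_target(proposal)
        # The increment is symmetric, so the proposal ratio cancels and only
        # the target ratio enters the accept-reject test.
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
            n_accepted += 1
        chain[n] = x
    return chain, n_accepted / n_steps

def log_std_normal(x):
    """Unnormalised log-density of a standard normal (illustrative target)."""
    return -0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
rates = {}
for d in (1, 5, 20):
    start = rng.standard_normal(d)  # start from a draw of the target
    _, rates[d] = random_walk_metropolis(log_std_normal, start, 1.0, 5000, rng)
```

With the step size held fixed, the entries of `rates` should fall as `d` grows, which is the behaviour described above. In practice the step size would be tuned per dimension, which this sketch deliberately does not do.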

The next most obvious choice for the proposal distribution is the prior density $p(x_{k+1} \mid Z_k)$. With this choice, $\psi(x^m_{k+1}, x^*_{k+1})$ reduces to $\log(u)/M$, making it independent of the state values. While quite simple, this choice is often not very efficient, especially when the prior and posterior distributions have significant probability mass in well separated regions of the state space. This can be the case when the likelihood is sharply peaked, and it results in a very low acceptance rate. As a possible remedy, one could sample from regions of the posterior with significant probability mass to improve the acceptance rate.
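The reduction to $\log(u)/M$ can be made explicit. The expression below assumes that $\psi$ has the form commonly used for the confidence sampler, with $u$ the uniform draw of the accept-reject step and $M$ the number of measurements. This form is an assumption here, since the definition is not restated in this section; the definition given earlier in the chapter takes precedence.

```latex
\psi(x^m_{k+1}, x^*_{k+1})
  = \frac{1}{M}\,\log\!\left[\, u \,
    \frac{p(x^m_{k+1}\mid Z_k)\; q(x^*_{k+1}\mid x^m_{k+1})}
         {p(x^*_{k+1}\mid Z_k)\; q(x^m_{k+1}\mid x^*_{k+1})} \right],
\qquad
q(x^*_{k+1}\mid x^m_{k+1}) = p(x^*_{k+1}\mid Z_k)
\;\Longrightarrow\;
\psi(x^m_{k+1}, x^*_{k+1}) = \frac{\log u}{M}.
```

Under this assumed form, choosing the prior as an independence proposal makes the prior and proposal terms cancel pairwise, so the ratio inside the logarithm equals one and only $\log(u)/M$ remains.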

As alluded to in the introduction, the log-homotopy based particle flow can be used to form a better proposal. This is because the flow incrementally moves the particles towards their posterior locations by gradually incorporating the measurements, which helps solve the issue of degeneracy in a standard estimation problem. The DHF, if carefully implemented, can also be computationally cheaper than a standard particle filter [KUK17]. Hence, it is natural to use the particles output by the DHF to form the proposal distribution for the subsequent MCMC step. Below we describe some basics of the homotopy based particle flow and its implementation methodology.

4.5.1 Log homotopy based particle flow

The whole procedure is shown in algorithmic form in Algorithm 9, where $\{\hat{x}^i_{k+1}\}_{i=1}^{N_p}$ and $\{\bar{x}^i_{k+1}\}_{i=1}^{N_p}$ are the sets of prior and posterior particles, respectively. We plan to use the DHF based approximation of the posterior density as the proposal in the confidence sampling based SMCMC, i.e. $q(x_{k+1} \mid \cdot) \approx \hat{p}_{\mathrm{DHF}}(x_{k+1} \mid Z_{k+1})$. Before this can be done, two main issues have to be resolved. The first is the processing time of the DHF. As the main focus of this work is to propose an MCMC based method that can handle massive sensor data, the dimensionality of the measurement space becomes a critical factor. The non-zero diffusion constrained flow equation requires the Hessian of the log-likelihood function, so a direct application of the DHF can be prohibitively expensive. The question then becomes how to use the DHF while still maintaining a reasonably low processing cost. One answer is to decimate or sub-sample the measurement set. The second issue is finding an analytical approximation for $\hat{p}_{\mathrm{DHF}}(x_{k+1} \mid Z_{k+1})$ that allows both sampling and evaluation.

Algorithm 9 Log homotopy flow based measurement update

1: procedure LOGHOMOTOPYFLOWUPDATE($\{\hat{x}^i_{k+1}\}_{i=1}^{N_p}$, $\{\Delta\lambda_j, \lambda_j\}_{j=1}^{N_\lambda}$, $z_{k+1}$)

2: $\hat{P}_{k+1}$ = SHRINKAGEESTIMATOR($\hat{x}^i_{k+1}$) ⊲ Estimate the prior covariance matrix

3: for $i = 1 : N_p$ do

4: $y_0 = \hat{x}^i_{k+1}$ ⊲ Temporary variable

5: for $j = 1 : N_\lambda$ do

6: $H_\lambda$ = GETHESSIANMATRIX($\log h(z_k \mid y_{j-1})$)

7: $h_\lambda$ = GETGRADIENTVECTOR($\log h(z_k \mid y_{j-1})$)

8: $m(y_j) = -\left[\hat{P}^{-1}_k + \lambda_j H_\lambda\right]^{-1} h_\lambda^T$ ⊲ Non zero diffusion constrained flow

9: $y_j = y_{j-1} + m(y_j)\,\Delta\lambda_j$ ⊲ Propagate the particles in pseudo time

10: end for

11: $\bar{x}^i_{k+1} = y_{N_\lambda}$

12: end for

13: Evaluate the posterior mean $\bar{\mu}_{k+1}$ and covariance matrix $\bar{P}_{k+1}$

14: REDRAWPARTICLES($\{\bar{x}^i_{k+1}\}_{i=1}^{N_p}$) ⊲ Redraw particles (optional)

15: return $\{x^i_{k+1}\}_{i=1}^{N_p}$, $\bar{\mu}_{k+1}$, $\hat{P}_{k+1}$

16: end procedure
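The loop structure of Algorithm 9 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the source: the plain sample covariance stands in for the shrinkage estimator, the optional redraw step is omitted, and the linear-Gaussian measurement model with `C`, `R_inv` and `z` is hypothetical. The flow is written with the Hessian of the log-homotopy under a Gaussian prior, $-\hat{P}^{-1} + \lambda H_\lambda$, where $H_\lambda$ and $h_\lambda$ are the Hessian and gradient of the log-likelihood; the sign convention may differ from the one used in step 8.

```python
import numpy as np

def log_homotopy_flow_update(prior_particles, dlambdas, grad_loglik, hess_loglik):
    """Deterministic pseudo-time propagation of prior particles (sketch)."""
    # Plain sample covariance; a shrinkage estimator would replace this line.
    P_inv = np.linalg.inv(np.atleast_2d(np.cov(prior_particles, rowvar=False)))
    lambdas = np.cumsum(dlambdas)  # pseudo-time at the end of each step
    flowed = np.empty_like(prior_particles)
    for i, y in enumerate(prior_particles):
        for lam, dlam in zip(lambdas, dlambdas):
            H = hess_loglik(y)  # Hessian of log h(z | y)
            g = grad_loglik(y)  # gradient of log h(z | y)
            # Assumed convention: with a Gaussian prior the Hessian of the
            # log-homotopy is -P^{-1} + lam * H, and the flow is minus its
            # inverse applied to the log-likelihood gradient.
            m = -np.linalg.solve(-P_inv + lam * H, g)
            y = y + m * dlam  # Euler step in pseudo-time
        flowed[i] = y
    cov = np.atleast_2d(np.cov(flowed, rowvar=False))
    return flowed, flowed.mean(axis=0), cov

# Hypothetical linear-Gaussian example: z = C x + noise, noise ~ N(0, R).
rng = np.random.default_rng(1)
C = np.array([[1.0, 0.5]])
R_inv = np.array([[4.0]])
z = np.array([1.2])
prior = rng.standard_normal((500, 2))
post, post_mean, post_cov = log_homotopy_flow_update(
    prior,
    np.full(20, 1.0 / 20),
    grad_loglik=lambda y: C.T @ R_inv @ (z - C @ y),
    hess_loglik=lambda y: -C.T @ R_inv @ C,
)
```

For this linear-Gaussian case the mean of the flowed particles can be checked against the Kalman-type update computed from the sample prior moments. The flowed covariance is not expected to match the posterior covariance, because the sketch, like the listing above, contains no diffusion noise term.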

4.5.2 Data reduction

Firstly, we tackle the issue of dimensionality reduction. Below, we list some of the methods which could be employed for reducing the number of measurements.

94 4 Bayesian processing of Massive sensor data with log-homotopy based particle flow

4.5.2.1 Naive subsampling

This is the most basic method for shrinking the measurement set. It can be done by discarding elements at random while keeping the cardinality of the retained set fixed, or alternatively by taking every mth measurement. This method does not take the structure present in the data set into account. Although it is one of the simplest methods, it can deteriorate the performance of the filter if not enough samples are retained.
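Both variants can be sketched as follows, assuming the measurements are stored as the rows of a NumPy array. The function names and the toy array are illustrative only.

```python
import numpy as np

def random_subsample(measurements, n_keep, rng):
    """Keep n_keep measurements chosen uniformly at random, in original order."""
    idx = rng.choice(len(measurements), size=n_keep, replace=False)
    return measurements[np.sort(idx)]

def every_mth(measurements, m):
    """Keep every m-th measurement, starting with the first."""
    return measurements[::m]

rng = np.random.default_rng(0)
Z = np.arange(20).reshape(10, 2)  # ten toy two-dimensional measurements
Z_rand = random_subsample(Z, 4, rng)
Z_dec = every_mth(Z, 3)
```

Neither function looks at the measurement values themselves, which is why the structure of the data set is ignored.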

4.5.2.2 Data Clustering

The next approach is based on clustering the data points in the measurement space. Clustering turns out to be quite an effective means of dimensionality reduction. It is a thoroughly studied topic, with applications in areas such as image processing, computer vision and machine learning. We use two clustering methods: K-means clustering and K-medoids clustering, the latter with partitioning done around medoids.

K-means clustering. K-means clustering is essentially a vector quantization technique, with origins in voice compression, that later became popular for data analysis. The basic idea is to partition a fixed number of N-dimensional observations into K sets (clusters), where each observation belongs to the cluster whose centroid is closest to it. It is an iterative method consisting of two steps: Expectation (E) and Maximization (M). The process starts by choosing K random data points as the initial centroids. In the E step, each data point is assigned to its closest centroid, with the L2-norm usually chosen as the metric for measuring inter-point distance. In the M step, the centroid locations are updated by averaging the points in their respective clusters. The procedure is repeated until convergence or until a fixed number of iterations has been carried out.
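The two alternating steps can be sketched with NumPy as below. This is a minimal illustration rather than the implementation used in the source; the function names, the stopping test and the two-blob toy data are assumptions.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid (L2 norm) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, rng, n_iter=100):
    """Plain K-means with random data points as initial centroids (sketch)."""
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)  # assignment ("E") step
        new_centroids = centroids.copy()
        for c in range(k):                            # update ("M") step
            members = points[labels == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, nearest_centroid(points, centroids)

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centroids, labels = kmeans(Z, 2, rng)
```

For data reduction, the K centroids would presumably stand in for the full measurement set; note that they are averages and in general do not coincide with any actual measurement.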

K-medoids clustering. This is the second method we employ for data clustering. A medoid is a point within a cluster whose average dissimilarity to all other points in the cluster is minimal, i.e. it is the most centrally located point in the cluster. While in K-means clustering the centroid is usually not a member of the cluster (it is the average of the cluster's points), a medoid is always a point within the cluster. As in the previous method, the E and M steps are followed iteratively in the K-medoids algorithm. K-medoids clustering is said to be more robust to noise and outliers than K-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances.
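A minimal sketch of the alternating scheme is given below. It uses the simple assign-then-update iteration with Euclidean dissimilarities, which is a simplification chosen for brevity and not the swap-based partitioning around medoids procedure; all names and the toy data, including the single outlier, are illustrative.

```python
import numpy as np

def kmedoids(points, k, rng, n_iter=100):
    """Alternating K-medoids on Euclidean dissimilarities (sketch)."""
    # Pairwise dissimilarity matrix; any other dissimilarity could be used.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = rng.choice(len(points), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign to the nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size > 0:
                # Member with the smallest total dissimilarity to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, D[:, medoids].argmin(axis=1)

rng = np.random.default_rng(0)
Z = np.vstack([
    rng.normal(0.0, 0.1, (30, 2)),
    rng.normal(5.0, 0.1, (30, 2)),
    [[50.0, 50.0]],  # a single outlier
])
medoids, labels = kmedoids(Z, 2, rng)
representatives = Z[medoids]  # always actual measurements
```

Because `medoids` holds row indices into `Z`, the reduced set `representatives` consists of real measurements, in contrast to K-means centroids.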

4.5.3 Proposal density representation

As discussed before, the output of the DHF is a set of approximate posterior samples, represented through a Dirac-delta approximation. For them to be used as a proposal density within an MCMC step, they have to be further approximated by a closed form probability density expression. As described in [KU15], the redrawing step in Algorithm 9 (step 14) returns an approximated density, either as a single multivariate Gaussian (MVG) or as a Gaussian mixture model (GMM). In the current work, we follow a similar approach and use an MVG approximation for the proposal density.
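A single-Gaussian proposal of this kind needs two operations, sampling and density evaluation. The sketch below fits the Gaussian by plain moment matching to a particle set; the class name, the moment-matching fit and the synthetic particles are assumptions for illustration and do not reproduce the redrawing step of [KU15].

```python
import numpy as np

class GaussianProposal:
    """Single multivariate Gaussian fitted to a particle set (sketch)."""

    def __init__(self, particles):
        self.mean = particles.mean(axis=0)
        cov = np.atleast_2d(np.cov(particles, rowvar=False))
        self.chol = np.linalg.cholesky(cov)  # lower-triangular factor
        self.dim = self.mean.size
        # log of the normalising constant: -0.5 * log det(2 * pi * cov)
        self.log_norm = (-0.5 * self.dim * np.log(2.0 * np.pi)
                         - np.log(np.diag(self.chol)).sum())

    def sample(self, rng):
        """Draw one proposed state; it does not depend on the current state."""
        return self.mean + self.chol @ rng.standard_normal(self.dim)

    def log_pdf(self, x):
        """Log-density of the fitted Gaussian at x."""
        w = np.linalg.solve(self.chol, x - self.mean)
        return self.log_norm - 0.5 * (w @ w)

rng = np.random.default_rng(2)
# Synthetic stand-in for the flowed particles produced by the DHF.
particles = (rng.standard_normal((300, 3)) * np.array([1.0, 0.5, 2.0])
             + np.array([1.0, -2.0, 0.0]))
q = GaussianProposal(particles)
x_star = q.sample(rng)
log_q = q.log_pdf(x_star)
```

Since the fitted density is an independence proposal, both `log_pdf(x_star)` and `log_pdf` at the current state would be needed when forming the proposal ratio in the accept-reject step.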

In document Nonlinear Filtering based on Log-homotopy Particle Flow : Methodological Clarification and Numerical Evaluation (Page 102-105)