3.4 Comments on WMC
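Now, marginally integrating over the estimates $\hat{d}^{\psi}_{j,i}$,
\[
p(y) = \int p\big(y \mid \{\hat{d}^{\psi}_{j,i}\}\big)\, p\big(\{\hat{d}^{\psi}_{j,i}\}\big)\, d\hat{d}^{\psi}_{j,i}
     = E\big[p\big(y \mid \{\hat{d}^{\psi}_{j,i}\}\big)\big]
     = f(y) + \sum_{j,i} E\big[\hat{d}^{\psi}_{j,i}\big]\, \psi_{j,i}(y).
\]
Assuming that the estimate $\hat{d}^{\psi}_{j,i}$ is unbiased ($E[\hat{d}^{\psi}_{j,i}] = d^{\psi}_{j,i}$), we get
\[
p(y) = f(y) + \sum_{j,i} d^{\psi}_{j,i}\, \psi_{j,i}(y).
\]
So, all we require is that $\hat{d}^{\psi}_{j,i}$ is unbiased. By construction, our estimate (3.4.30) is indeed unbiased.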
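The unbiasedness argument can be checked numerically. The following is a minimal sketch, not the estimator (3.4.30) itself: it assumes Haar wavelets, a hypothetical starting density f = U(0, 1), three made-up coefficients `d_true`, and estimates formed by adding zero-mean Gaussian noise. It treats $p(y \mid \{\hat{d}^{\psi}_{j,i}\})$ formally as $f(y) + \sum_{j,i} \hat{d}^{\psi}_{j,i}\psi_{j,i}(y)$ and averages over many draws of the estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar(j, i, y):
    """Haar wavelet psi_{j,i}(y) = 2^(j/2) * psi(2^j * y - i)."""
    u = 2.0 ** j * y - i
    up = ((u >= 0.0) & (u < 0.5)).astype(float)
    down = ((u >= 0.5) & (u < 1.0)).astype(float)
    return 2.0 ** (j / 2) * (up - down)

y = np.linspace(0.0, 1.0, 200, endpoint=False)
f = np.ones_like(y)  # hypothetical starting density U(0, 1)

# Hypothetical true difference coefficients d_{j,i} (illustration only).
d_true = {(0, 0): 0.3, (1, 0): -0.2, (1, 1): 0.1}
psi = np.stack([haar(j, i, y) for (j, i) in d_true])  # shape (3, 200)
d = np.array(list(d_true.values()))

# Unbiased estimates: true coefficient plus zero-mean noise (an assumption).
n_draws = 20000
d_hat = d + rng.normal(0.0, 0.5, size=(n_draws, d.size))

# Average of f + sum d_hat * psi over the draws approximates E[p(y | d_hat)].
p_marginal = (f + d_hat @ psi).mean(axis=0)
p_target = f + d @ psi
max_err = np.max(np.abs(p_marginal - p_target))
```

Because the expression is linear in the estimates, the Monte Carlo average matches $f(y) + \sum_{j,i} d^{\psi}_{j,i}\psi_{j,i}(y)$ up to sampling error, regardless of how noisy each individual estimate is.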
Clearly, using an estimator will affect the total probability of leaving a state $x_t$,
\[
\sum_{j,i} \frac{\big[\hat{d}^{\psi}_{j,i}\, \psi_{j,i}(x_t)\big]^{-}}{r f(x_t)}, \tag{3.4.33}
\]
and, in turn, will affect the total number of $\hat{d}^{\psi}_{j,i}$ that need to be estimated in order to obtain a sample from the target.
3.4.2 Implications of Theorem 3.3.2
Theorem 3.3.2 proves that, under certain conditions, using the transition rate $\lambda_{t,ji}$ defined by (3.3.15), the Kolmogorov forward equation holds and, hence, $f_t(\cdot)$ is the correct marginal distribution for all times $t \in [0, 1]$. The Markov process induced by $\lambda_{t,ji}$ essentially applies the pWMC algorithm (Section 3.2) at infinitely small time steps. To avoid applying this algorithm at infinitely small increments of time, survival analysis theory was applied to make the algorithm practical. Although the goal of WMC is to produce samples from the target density $g(\cdot)$, there are points $x_s$ sampled from intermediate distributions $f_s(\cdot)$, $s \in [0, 1)$, which have an associated survival time $t > s$. What exactly does it mean for a point $x_s$ to have survived for $\delta_s = t - s$ amount of time?
At the core of WMC, the transition intensity density $\lambda_{t,ji}$ dictates how the process will unfold, and $\lambda_{t,ji}$ is constructed based on pWMC. So, looking from the pWMC perspective, pWMC was applied to the point $x_s$ sequentially between times $s \geq 0$ and $t > s$, for $\delta_s = t - s$ amount of time. As time evolved from $s$ to $t$, the point $x_s$ was never 'rejected' at any instant, because the event 'no pair $(j, i)$ is selected' kept occurring, with probability
\[
1 - \sum_{j \in \mathbb{Z}} \sum_{i \in \mathbb{Z}} p_{j,i}(x_s), \qquad 0 \leq s < t. \tag{3.4.34}
\]
Only at the very last instant, at time $t$, was a pair $(j, i)$ sampled, indicating that the point $x_s$ has survived for $\delta_s = t - s$ and that a new point now needs to be sampled.
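The link between the step-by-step picture and a sampled survival time can be illustrated with a small sketch. It assumes, purely for illustration, a constant total leaving rate `rate` (in WMC the rate depends on $t$ and $x_s$, and time is restricted to $[0, 1]$), so that the probability of selecting some pair in a step of length `dt` is `rate * dt`. The number of steps until the first selection is then geometric, and the resulting survival time is compared with a direct exponential draw.

```python
import numpy as np

rng = np.random.default_rng(1)

rate = 3.0   # hypothetical constant total leaving rate (assumption)
dt = 1e-3    # small time step of the sequential scheme
n = 20000    # number of simulated points

# Sequential view: at every step "no pair is selected" with probability
# 1 - rate * dt; the index of the first selection is geometric.
steps_survived = rng.geometric(rate * dt, size=n)
stepwise_times = steps_survived * dt

# Survival-analysis view: draw the survival time in one go.
direct_times = rng.exponential(1.0 / rate, size=n)

mean_stepwise = stepwise_times.mean()
mean_direct = direct_times.mean()
```

Both means are close to `1 / rate`, which is why a single survival-time draw can stand in for a long run of 'not rejected' steps under this constant-rate assumption.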
As time evolved, under the pWMC algorithm the starting sample point $x_s \sim f_s(\cdot)$ was 'accepted' as a sample from every intermediate distribution $f_l(\cdot)$, where $s \leq l < t$. So, as a consequence of Theorem 3.3.2, a sampled survival time $t > s$ for a point $x_s$ with $s \geq 0$ indicates that $x_s \sim f_l(\cdot)$ for $s \leq l < t$. However, we observe that we can only claim that the point $x_s$ is a representative conditional on the fact that it did not move for the $t - s$ amount of time or, in other words, conditional on the point's history from $s$ to $t$, $H_s^t(x_s)$.
Figure 3.4: Illustrative example for (3.4.35). Here $f_s(\cdot)$ and $f_k(\cdot)$ are the densities of the uniform distributions $U(0, 0.66)$ and $U(0.33, 1)$, respectively. The point $x_s$ survives until time $t$ and, under standard WMC, if we do not condition on the history of $x_s$, we would also conclude that $x_s \sim f_k(\cdot)$. However, if we do condition on the history of the point $x_s$ at time $k$, $H_s^k(x_s)$, it is clear that $x_s \not\sim f_k(\cdot)$, due to the limited range of support it passes through.
Definition 3.4.1. Given any point $x_s$ with $0 \leq s \leq 1$ and a time interval $I = (t_1, t_2)$, we denote the history of $x_s$ over the interval $I$ as $H_{t_1}^{t_2}(x_s)$.
In general, we have that if xs survives until some point in time t > s, then
Figure 3.4 presents an example in which the conditioning issue is rather clearly demonstrated.
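The uniform example of Figure 3.4 can be mimicked with a few lines of simulation. This is a sketch under a simplifying assumption that is not taken from the text: a point drawn from $f_s = U(0, 0.66)$ is taken to remain unmoved up to time $k$ only if it lies in the support of $f_k = U(0.33, 1)$. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000

# Points sampled at time s from f_s = U(0, 0.66).
x_s = rng.uniform(0.0, 0.66, size=n)

# Assumption for illustration: only points inside the support of f_k
# can still be sitting unmoved at time k.
survivors = x_s[x_s >= 0.33]

# Genuine draws from f_k = U(0.33, 1) for comparison.
f_k_draws = rng.uniform(0.33, 1.0, size=n)

# Share of mass in [0.66, 1): zero for the survivors, about half under f_k.
frac_survivors_high = np.mean(survivors >= 0.66)
frac_fk_high = np.mean(f_k_draws >= 0.66)
```

The unmoved points occupy only $[0.33, 0.66)$, whereas roughly half of the mass of $f_k$ lies in $[0.66, 1)$, so the surviving points on their own cannot be regarded as draws from $f_k(\cdot)$.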
On the subject of conditioning, the final target density $g(\cdot)$ could be interpreted as an infinite mixture of distributions, each corresponding to a particular particle history,
\[
g(x) = \sum_{H(x)} f\big(x \mid H(x)\big)\, p\big(H(x)\big), \tag{3.4.36}
\]
where $H(x)$ is the full history of a point $x$ up to time $t = 1$ and $f(x \mid H(x))$ is the conditional density of the point $x$.
3.4.3 Finite range of resolution levels
The WMC theorems are proved to hold if one has access to an infinite range of resolution levels $j \in (-\infty, +\infty)$, as, for example, in the decomposition of the difference function $d(x)$ in (3.1.7). In practice, however, we will restrict ourselves to a coarsest resolution level $j_{\min}$ and a finest resolution level $j_{\max}$ when implementing WMC. How does this restriction affect the samples produced and, in particular, given this restriction, from which target exactly are the samples being produced?
The moment the restriction is made, we no longer have access to the levels $j > j_{\max}$ and $j < j_{\min}$, and it is clear that samples produced by WMC using a limited range of resolution levels cannot be from the target $\sum_{j,i} g^{\psi}_{j,i}\psi_{j,i}(x)$. As we start with samples from $f(x) = \sum_{j,i} f^{\psi}_{j,i}\psi_{j,i}(x)$, all we are really doing in WMC is changing the coefficients from $f^{\psi}_{j,i}$ to $g^{\psi}_{j,i}$. If we have access to an infinite range of resolution levels, eventually all coefficients could be changed; however, when working with a limited range, certain levels are restricted and therefore some $f^{\psi}_{j,i}$ coefficients stay the same. For this reason, our actual distribution at $t = 1$ in practice becomes
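\[
\hat{g}(x) = \sum_{j=j_{\min}}^{j_{\max}} \sum_{i} g^{\psi}_{j,i}\,\psi_{j,i}(x)
           + \sum_{j<j_{\min}} \sum_{i} f^{\psi}_{j,i}\,\psi_{j,i}(x)
           + \sum_{j>j_{\max}} \sum_{i} f^{\psi}_{j,i}\,\psi_{j,i}(x). \tag{3.4.37}
\]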
Assuming that $\hat{g}(\cdot)$ above satisfies the properties of a probability density, if we were to replace $g(\cdot)$ with $\hat{g}(\cdot)$ in Theorem 3.3.2, the proof would still hold and, in addition, the resolution range across which WMC is performed would be limited to $j \in [j_{\min}, j_{\max}]$. This would mean that the algorithm could be implemented exactly.
Generally, $\hat{g}(x)$ will not satisfy the density properties, specifically non-negativity everywhere; for this reason, $g(\cdot)$ will be used as the target in practice, but WMC samples will be treated as though they are from $\hat{g}(x)$.
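The effect of restricting the resolution range in (3.4.37) can be explored with a small numerical sketch. It makes several assumptions not taken from the text: Haar wavelets on $[0, 1)$, densities represented by their values on a dyadic grid of $2^J$ cells, a hypothetical uniform $f$ and a hypothetical piecewise-constant $g$. For Haar wavelets, replacing the coefficients on levels $j_{\min}, \dots, j_{\max}$ is equivalent to adding the difference of two dyadic block averages of $g - f$, which is what the helper functions `project` and `restricted_target` (both illustrative names) compute.

```python
import numpy as np

def project(h, j):
    """Average h over the 2**j dyadic cells of [0, 1) (Haar projection onto V_j)."""
    cells = h.reshape(2 ** j, -1)
    return np.repeat(cells.mean(axis=1), cells.shape[1])

def restricted_target(f, g, j_min, j_max):
    """g_hat: Haar coefficients of g on levels j_min..j_max, those of f elsewhere."""
    d = g - f
    # The Haar details of d on levels j_min..j_max sum to P_{j_max+1} d - P_{j_min} d.
    return f + project(d, j_max + 1) - project(d, j_min)

J = 6                                    # grid of 2**J cells (assumption)
x = (np.arange(2 ** J) + 0.5) / 2 ** J   # cell midpoints in [0, 1)
f = np.ones_like(x)                      # hypothetical start density U(0, 1)
g = np.where(x < 0.25, 3.6, np.where(x < 0.5, 0.0, 0.2))  # hypothetical target

g_full = restricted_target(f, g, 0, J - 1)  # every level available on this grid
g_hat = restricted_target(f, g, 1, J - 1)   # coarsest level j = 0 kept from f

total_mass = g_hat.mean()   # integral of g_hat over [0, 1)
lowest_value = g_hat.min()
```

With all levels available the target is recovered on the grid, while dropping only the coarsest level yields a function that still integrates to one but takes negative values on $[0.25, 0.5)$, in line with the remark that $\hat{g}$ generally fails non-negativity.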