Finite range of resolution levels

3.4 Comments on WMC


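Now, marginally integrating over the estimates $\hat{d}^{\psi}_{j,i}$,

\[
\begin{aligned}
p(y) &= \int p\big(y \mid \{\hat{d}^{\psi}_{j,i}\}\big)\, p\big(\{\hat{d}^{\psi}_{j,i}\}\big)\, \mathrm{d}\hat{d}^{\psi}_{j,i} = \mathbb{E}\big[p\big(y \mid \{\hat{d}^{\psi}_{j,i}\}\big)\big] \\
&= f(y) + \sum_{j,i} \mathbb{E}\big[\hat{d}^{\psi}_{j,i}\big]\, \psi_{j,i}(y),
\end{aligned}
\]

and, assuming that the estimate $\hat{d}^{\psi}_{j,i}$ is unbiased ($\mathbb{E}[\hat{d}^{\psi}_{j,i}] = d^{\psi}_{j,i}$), we get

\[
p(y) = f(y) + \sum_{j,i} d^{\psi}_{j,i}\, \psi_{j,i}(y).
\]

So all we require is that $\hat{d}^{\psi}_{j,i}$ is unbiased. By construction, our estimate (3.4.30) is indeed unbiased.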
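The role of unbiasedness can be checked numerically. The following is a minimal sketch, not the estimator of (3.4.30): it assumes Haar wavelets on [0, 1), a few made-up coefficient values, zero-mean Gaussian noise as a stand-in for an unbiased estimator, and hypothetical helper names (`haar_psi`, `expansion`, `averaged_expansion`). It averages the expansion over many noisy coefficient draws and compares the result with the expansion built from the true coefficients. It ignores whether each noisy expansion is itself a valid density.

```python
import random


def haar_psi(j, i, y):
    """Haar wavelet psi_{j,i}(y) = 2^(j/2) * psi(2^j * y - i)."""
    u = (2 ** j) * y - i
    if 0.0 <= u < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def expansion(f_y, coeffs, y):
    """Evaluate f(y) + sum_{j,i} d_{j,i} * psi_{j,i}(y)."""
    return f_y + sum(d * haar_psi(j, i, y) for (j, i), d in coeffs.items())


def averaged_expansion(f_y, true_coeffs, y, n_draws=20000, noise_sd=0.5, seed=0):
    """Monte Carlo average of the expansion over unbiased noisy coefficients."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Assumption: estimate = truth + zero-mean noise, hence unbiased.
        noisy = {k: d + rng.gauss(0.0, noise_sd) for k, d in true_coeffs.items()}
        total += expansion(f_y, noisy, y)
    return total / n_draws


# Made-up values for illustration only.
true_coeffs = {(0, 0): 0.3, (1, 0): -0.2, (1, 1): 0.1}
y, f_y = 0.3, 1.0

exact = expansion(f_y, true_coeffs, y)
averaged = averaged_expansion(f_y, true_coeffs, y)
print(f"true coefficients: {exact:.4f}, averaged over estimates: {averaged:.4f}")
```

The two numbers agree up to Monte Carlo error because the expansion is linear in the coefficients, so only the mean of the estimator enters the marginal.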

Clearly, using an estimator will affect the total probability of leaving a state $x_t$,

\[
\frac{\sum_{j,i} \big[\hat{d}^{\psi}_{j,i}\, \psi_{j,i}(x_t)\big]^{-}}{r f(x_t)}, \tag{3.4.33}
\]

and, in turn, will affect the total number of $\hat{d}^{\psi}_{j,i}$ that need to be estimated in order to achieve a sample from the target.

3.4.2 Implications of Theorem 3.3.2

Theorem 3.3.2 proves that, under certain conditions, using the transition rate $\lambda_{t,ji}$ defined by (3.3.15), the Kolmogorov forward equation holds and, hence, $f_t(\cdot)$ is the correct marginal distribution for all times $t \in [0, 1]$. The Markov process induced by $\lambda_{t,ji}$ essentially applies the pWMC algorithm (Section 3.2) at infinitely small time steps. To avoid applying this algorithm at infinitely small increments of time, survival analysis theory was applied to make the algorithm practical. Although the goal of WMC is to produce samples from the target density $g(\cdot)$, there are points $x_s$ sampled from intermediate distributions $f_s(\cdot)$, $s \in [0, 1)$, which have an associated survival time $t > s$. What exactly does it mean for a point $x_s$ to have survived for $\delta_s = t - s$ amount of time?

At the core of WMC, the transition intensity density $\lambda_{t,ji}$ dictates how the process will unfold, and $\lambda_{t,ji}$ is constructed based on pWMC. So, from the pWMC perspective, pWMC was applied to the point $x_s$ sequentially between times $s \ge 0$ and $t > s$, for $\delta_s = t - s$ amount of time. As time evolved from $s$ to $t > s$, the point $x_s$ was never `rejected' at any instant, because the event `no pair $(j, i)$ is selected' always occurred, with probability

\[
1 - \sum_{j \in \mathbb{Z},\, i \in \mathbb{Z}} p_{j,i}(x_s), \qquad 0 \le s < t. \tag{3.4.34}
\]

Only at the very last instant, at time $t$, was a pair $(j, i)$ sampled, indicating that the point $x_s$ has survived for $\delta_s = t - s$ and that a new point now needs to be sampled.
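The link between repeated "no pair selected" trials at small time steps and a directly sampled survival time can be sketched as follows. This is an illustration under strong simplifying assumptions, not the WMC algorithm itself: the total leaving rate is taken to be a constant (the hypothetical `rate`), whereas in WMC it depends on time and on the point, and the time horizon is not capped at 1. The function names are also hypothetical.

```python
import random


def survival_time_discretised(rate, dt, rng):
    """Run a 'leave / stay' trial every dt and return the time of the first leave.

    Assumption: the probability of selecting some pair (j, i) within one
    step of length dt is rate * dt.
    """
    t = 0.0
    while True:
        t += dt
        if rng.random() < rate * dt:
            return t


def survival_time_direct(rate, rng):
    """Sample the survival time in one draw (exponential for a constant rate)."""
    return rng.expovariate(rate)


rng = random.Random(1)
rate, dt, n = 2.0, 0.01, 5000

mean_discretised = sum(survival_time_discretised(rate, dt, rng) for _ in range(n)) / n
mean_direct = sum(survival_time_direct(rate, rng) for _ in range(n)) / n
print(f"discretised: {mean_discretised:.3f}, direct: {mean_direct:.3f}, 1/rate: {1 / rate:.3f}")
```

Both approaches give survival times with mean close to `1 / rate`, but the direct draw needs a single random number, while the discretised version needs on the order of `1 / (rate * dt)` trials per point.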

As time evolved, under the pWMC algorithm the starting sample point $x_s \sim f_s(\cdot)$ was `accepted' as a sample from all intermediate distributions $f_l(\cdot)$, where $s \le l < t$. So, as a consequence of Theorem 3.3.2, a sampled survival time $t > s$ for a point $x_s$ with $s \ge 0$ indicates that $x_s \sim f_l(\cdot)$ for $s \le l < t$. However, we observe that we can only claim that the point $x_s$ is representative conditional on the fact that it did not move for the $t - s$ amount of time, or, in other words, conditional on the history of the point $x_s$ from $s$ to $t$, $H_s^t(x_s)$.

Figure 3.4: Illustrative example for (3.4.35). Here $f_s(\cdot)$ and $f_k(\cdot)$ are the densities of the uniform distributions $U(0, 0.66)$ and $U(0.33, 1)$, respectively. The point $x_s$ survives until time $t$, so under standard WMC, if we do not condition on the history of the point $x_s$, we would also conclude that $x_s \sim f_k(\cdot)$. However, if we do condition on the history of the point $x_s$ at time $k$, $H_s^k(x_s)$, it is very clear that $x_s \not\sim f_k(\cdot)$, due to the limited range of support it passes through.

Definition 3.4.1. Given any point $x_s$ with $0 \le s \le 1$ and a time interval $I = (t_1, t_2)$, we denote the history of $x_s$ over the interval $I$ as $H_{t_1}^{t_2}(x_s)$.

In general, we have that if xs survives until some point in time t > s, then

Figure 3.4 presents an example in which the conditioning issue is rather clearly demonstrated.
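The effect described in the caption of Figure 3.4 can be reproduced with a few lines of simulation. This is only a sketch of the conditioning argument: it assumes, as a simplification not stated in the text, that a point drawn from $U(0, 0.66)$ can remain unmoved only while it lies inside the support of $U(0.33, 1)$, and the function name `unmoved_points` is hypothetical.

```python
import random


def unmoved_points(n, seed=0):
    """Draw x_s ~ U(0, 0.66) and keep the points that never had to move.

    Assumption: a point stays in place only if it lies in the support
    of f_k = U(0.33, 1), i.e. x_s >= 0.33.
    """
    rng = random.Random(seed)
    points = [rng.uniform(0.0, 0.66) for _ in range(n)]
    return [x for x in points if x >= 0.33]


survivors = unmoved_points(10000)

# Fraction of unmoved points above 0.66, versus the same fraction under f_k.
frac_survivors_above = sum(x > 0.66 for x in survivors) / len(survivors)
frac_fk_above = (1.0 - 0.66) / (1.0 - 0.33)

print(f"unmoved points above 0.66: {frac_survivors_above:.3f}")
print(f"U(0.33, 1) mass above 0.66: {frac_fk_above:.3f}")
```

Points that have not moved since time $s$ never exceed 0.66, whereas roughly half of the mass of $U(0.33, 1)$ lies above 0.66, so conditional on their history these points cannot be distributed according to $f_k(\cdot)$.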

On the subject of conditioning, the final target density $g(\cdot)$ could be interpreted as an infinite mixture of distributions, each corresponding to a particular particle history,

\[
g(x) = \sum_{H(x)} f\big(x \mid H(x)\big)\, p\big(H(x)\big), \tag{3.4.36}
\]

where $H(x)$ is the full history of a point $x$ up to time $t = 1$ and $f(x \mid H(x))$ is the conditional density of the point $x$.

3.4.3 Finite range of resolution levels

The WMC theorems are proved to hold if one has access to an infinite range of resolution levels $j \in (-\infty, +\infty)$, as, for example, in the decomposition of the difference function $d(x)$ in 3.1.7. In practice, however, we will restrict ourselves to a coarsest resolution level $j_{\min}$ and a finest resolution level $j_{\max}$ when implementing WMC. How does this restriction affect the samples produced and, in particular, given this restriction, from exactly which target are the samples being produced?

The moment the restriction is made, we no longer have access to the levels $j > j_{\max}$ and $j < j_{\min}$, and it is clear that samples produced by WMC using a limited range of resolution levels cannot be from the target $\sum_{j,i} g^{\psi}_{j,i}\psi_{j,i}(x)$. As we start with samples from $f(x) = \sum_{j,i} f^{\psi}_{j,i}\psi_{j,i}(x)$, all we are really doing in WMC is changing the coefficients from $f^{\psi}_{j,i}$ to $g^{\psi}_{j,i}$. If we have access to an infinite range of resolution levels, eventually all coefficients could be changed; however, when working with a limited range, certain levels are out of reach and therefore some $f^{\psi}_{j,i}$ coefficients stay the same. For this reason, our actual distribution at $t = 1$ in practice becomes

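\[
\hat{g}(x) = \sum_{j=j_{\min}}^{j_{\max}} \sum_{i} g^{\psi}_{j,i}\psi_{j,i}(x) + \sum_{j<j_{\min}} \sum_{i} f^{\psi}_{j,i}\psi_{j,i}(x) + \sum_{j>j_{\max}} \sum_{i} f^{\psi}_{j,i}\psi_{j,i}(x). \tag{3.4.37}
\]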

Assuming that $\hat{g}(\cdot)$ above satisfies the properties of a probability density, if we were to replace $g(\cdot)$ with $\hat{g}(\cdot)$ in Theorem 3.3.2, the proof would still hold and, in addition, the resolution range across which WMC is performed would be limited to $j \in [j_{\min}, j_{\max}]$. This would mean that the algorithm could be implemented exactly.

Generally, $\hat{g}(x)$ will not satisfy the density properties, specifically non-negativity everywhere; for this reason, $g(\cdot)$ will be used as the target in practice, but WMC samples will be treated as though they are from $\hat{g}(x)$.
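A small discrete example shows how the truncated target of (3.4.37) can fail to be non-negative. This is a sketch under assumptions that are not part of the text: densities are piecewise constant on 8 equal cells of [0, 1], the wavelet is the orthonormal discrete Haar wavelet with levels $j = 0, 1, 2$, and the single scaling coefficient (which on a bounded grid replaces the levels extending to $-\infty$) is kept from $f$. All function names and the values of `f_vals` and `g_vals` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(values):
    """Orthonormal discrete Haar transform of a list of length 2^J.

    Returns (approx, details), where details[j] holds the 2^j detail
    coefficients of level j (j = 0 is the coarsest level).
    """
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        current = [(a + b) / SQRT2 for a, b in pairs]
    return current[0], details


def haar_inverse(approx, details):
    """Invert haar_forward, going from the coarsest level to the finest."""
    current = [approx]
    for level in details:
        nxt = []
        for a, d in zip(current, level):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = nxt
    return current


def truncated_target(f_vals, g_vals, j_min, j_max):
    """Use g's coefficients for j_min <= j <= j_max and f's coefficients elsewhere."""
    f_approx, f_det = haar_forward(f_vals)
    _, g_det = haar_forward(g_vals)
    mixed = [g_det[j] if j_min <= j <= j_max else f_det[j] for j in range(len(f_det))]
    return haar_inverse(f_approx, mixed)


f_vals = [1.0] * 8                 # uniform starting density
g_vals = [4.0, 4.0] + [0.0] * 6    # target concentrated on [0, 0.25)

g_hat = truncated_target(f_vals, g_vals, j_min=1, j_max=1)
g_full = truncated_target(f_vals, g_vals, j_min=0, j_max=2)

print([round(v, 6) for v in g_hat])   # approximately [3, 3, -1, -1, 1, 1, 1, 1]
print([round(v, 6) for v in g_full])  # recovers g_vals when all levels are available
```

With only level $j = 1$ available, the result still integrates to one but takes the value $-1$ on two cells, so it is not a density. With all three levels available, the target $g$ is recovered exactly.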

In document Theory, Analysis and Implementation of Wavelet Monte Carlo. (Page 62-66)