Controlling samples from intermediate distributions

8.1.5 Controlling samples from intermediate distributions

In a conventional MIS setup, several importance densities are chosen in advance, and samples are drawn from them directly to estimate moments of a target distribution. In the WMC scenario, however, we cannot choose the intermediate distributions a priori and sample from them directly; sampling from the intermediate distributions is uncontrolled and determined by a random process. Nonetheless, from the WMC theory (3.4.2) we know that if a point x_s at time t = s has an associated survival time t = t*, then x_s can be treated as a sample from any distribution between f_s(·) and f_{t*}(·), excluding the density at t = t*,

    x_s ∼ f_l(·),  s ≤ l < t*,    (8.1.7)

which means that x_s is a representative sample from all intermediate distributions between f_s(·) and f_{t*}(·), t* > s, excluding the density at time t*. Figure 8.1 demonstrates this process for a starting point x_0. First we sample x_0 ∼ f(·) at time t = 0; second we sample a survival time t*, after which we would sample a new point x_{t*} ∼ ψ_{j,i}(·) if t* < 1. Having observed that the point x_0 existed at all times 0 ≤ s < t*, we conclude that x_0 ∼ f_s(·) for any 0 ≤ s < t*.

Figure 8.1: Diagram showing how randomly sampled intermediate points in a WMC run are assigned to a distribution. Point x_0 has survival time t = t*, so it exists at all times 0 ≤ s < t*; hence we conclude x_0 ∼ f_l(·), 0 ≤ l < t*.
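The membership rule in (8.1.7) amounts to a half-open interval test. The following is a minimal sketch of that test only; the function name `covers` and the numerical values are illustrative assumptions, not part of the WMC algorithm itself.

```python
def covers(birth_time: float, survival_time: float, l: float) -> bool:
    """True if a point alive on [birth_time, survival_time) counts as a sample from f_l."""
    return birth_time <= l < survival_time


# x_0 is drawn at time 0 and survives until t* = 0.4 (illustrative value).
print(covers(0.0, 0.4, 0.0))    # True: the starting density f_0 is included
print(covers(0.0, 0.4, 0.25))   # True: any l with 0 <= l < t*
print(covers(0.0, 0.4, 0.4))    # False: the density at t* itself is excluded
```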

The question remains of how to decide to which distribution f_t(·) the intermediate sample points should be assigned during a full run of WMC with a starting sample of N points from f(·).

The idea is to create checkpoints t_k with each single WMC run, which indicate the intermediate distributions f_{t_k} to which the points x_s, s ∈ [0, 1), should be assigned. For the first sample x ∼ f(·) a survival time t is sampled, and if t < 1 a new point x* ∼ ψ_{j,i}(·) is sampled according to the WMC algorithm. The sampled survival time t becomes a checkpoint created by the initial point from the starting distribution f(·). After this, a survival time t* for the point x* is sampled, and if t* < 1 we record t* as a checkpoint as well. So each starting point x_k ∼ f(·) and its associated intermediate points create a set of checkpoints t_{k,l(k)}, where k ∈ {1, 2, ..., N} indicates the run in which the checkpoint was created and l(k) ∈ ℕ indicates the l-th checkpoint in the k-th WMC run. Therefore, after a total of N runs we end up with a pooled collection of checkpoints {t_{k,l(k)}}, where k ∈ {1, 2, ..., N} and l(k) = 1, ..., l_max^(k). It may be that no checkpoints are created in the k-th run; in that case we have l_max^(k) = ∅ and t_{k,l(k)} = ∅. The checkpoint creation procedure is illustrated in Figure 8.2.

Figure 8.2: Illustrating how checkpoints are created over several WMC runs. With each run new checkpoints are created then pooled into a single collection.
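The checkpoint creation described above can be sketched as a short loop. This is an illustration under stated assumptions: the survival-time sampler `toy_survival` (birth time plus an exponential lifetime) is a hypothetical stand-in for the survival-time law of WMC, and the names `run_checkpoints` and `pooled` are introduced only for this example.

```python
import random


def run_checkpoints(sample_survival, rng):
    """Return the checkpoints created by one WMC run on the time interval [0, 1)."""
    checkpoints = []
    birth = 0.0
    while True:
        t = sample_survival(birth, rng)   # survival time of the current point
        if t >= 1.0:                      # the point survives to t = 1: no new checkpoint
            return checkpoints
        checkpoints.append(t)             # t < 1: record a checkpoint; a new point is born at t
        birth = t


def toy_survival(birth, rng):
    """Hypothetical stand-in for the WMC survival-time law (an assumption)."""
    return birth + rng.expovariate(2.0)


rng = random.Random(0)
N = 5
# pooled[k] holds t_{k,1}, ..., t_{k,l_max^(k)}; an empty list means run k created no checkpoint.
pooled = {k: run_checkpoints(toy_survival, rng) for k in range(1, N + 1)}
for k, times in pooled.items():
    print(k, [round(t, 3) for t in times])
```

Keeping the run index k with each checkpoint is what later allows the checkpoints of a point's own run to be discarded during allocation.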

Having created all the checkpoints, we next allocate points to intermediate distributions. Given any starting point x_n ∼ f(·), where n ∈ {1, ..., N}, and the associated intermediate points that were created in the n-th run, the allocation process is as follows:

1. Given a point, observe its initial time t_I; for x_n ∼ f(·) we have t_I = 0. We also note the survival time of x_n, say t_{n,1} < 1.

2. From the full collection of checkpoints {t_{k,l(k)}}, k ∈ {1, 2, ..., N}, l(k) = 1, ..., l_max^(k), we discard all checkpoints created by the n-th run to obtain the sub-collection {t_{k,l(k)}}_{k≠n} of checkpoints with k ≠ n.

3. We allocate the point x_n to all intermediate distributions f_{t_{k,l(k)}}(·) for which the inequality t_I ≤ t_{k,l(k)} < t is satisfied, where t is the survival time of the point (t = t_{n,1} for x_n) and t_{k,l(k)} ∈ {t_{k,l(k)}}_{k≠n}.

Exactly the same steps are taken to allocate the intermediate points x ∼ ψ_{j,i}(·).
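The three allocation steps can be sketched as a single filter over the pooled checkpoints. The function name `allocate`, the dictionary layout (run index mapped to that run's checkpoints) and the numerical values are assumptions made for illustration only.

```python
def allocate(birth, survival, run, pooled):
    """Checkpoints t_{k,l} with k != run and birth <= t_{k,l} < survival."""
    return sorted(
        t
        for k, times in pooled.items()
        if k != run                    # step 2: discard checkpoints of the point's own run
        for t in times
        if birth <= t < survival       # step 3: checkpoints the point survived through
    )


pooled = {1: [0.30, 0.70], 2: [0.50], 3: []}   # illustrative checkpoints of three runs

# Starting point of run 1: t_I = 0, survival time t_{1,1} = 0.30.
print(allocate(0.0, 0.30, 1, pooled))    # []
# Intermediate point of run 1, born at 0.30 and surviving until 0.70.
print(allocate(0.30, 0.70, 1, pooled))   # [0.5]
# Starting point of run 3, which survives all the way to t = 1.
print(allocate(0.0, 1.0, 3, pooled))     # [0.3, 0.5, 0.7]
```

The first call returns an empty list, which corresponds to a point that is not assigned to any intermediate distribution because no foreign checkpoint falls between its initial time and its survival time.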

Figure 8.3: After the full collection of checkpoints has been created over N runs, each starting point x_0 ∼ f(·) and its associated intermediate points x ∼ ψ_{j,i}(·) are allocated to intermediate distributions based on the checkpoints that the point has survived through. The point x_0 has survived past time t_1 and hence is assigned to f_{t_1}(·). The point x_1, on the other hand, is not assigned to any intermediate distribution because there are no checkpoints between its initial time and its survival time. Points can also be allocated to several intermediate distributions at once: points x_2 and x_3 each survive through two checkpoints and hence are each assigned to two intermediate distributions.

Samples in Figure 8.3

x_0 ∼ f(·)          x_3 ∼ f_{t_4}(·)
x_0 ∼ f_{t_1}(·)    x_3 ∼ f_{t_5}(·)
x_2 ∼ f_{t_2}(·)    y := x_3 ∼ g(·)
x_2 ∼ f_{t_3}(·)

Table 8.1: Summary of the samples produced in Figure 8.3. In addition to the starting sample x_0 ∼ f(·) and the target sample y ∼ g(·), exactly one point was assigned to every intermediate distribution.

After a full WMC run with N points from a starting distribution we end up with:

1. {x_i}_{i=1}^{N} ∼ f(x)

2. {y_i}_{i=1}^{N} ∼ g(x)

3. {x_{n,k}}_{n=1}^{N−1} ∼ f_{t_k}(x), for k ∈ {1, ..., r}, where, as before, r is the total number of intermediate distributions used (checkpoints created) and x_{n,k} is the n-th sample from the distribution f_{t_k}(x).
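The count of N − 1 samples per intermediate distribution in item 3 can be checked numerically. The sketch below assumes a hypothetical exponential lifetime in place of the WMC survival-time law; only the bookkeeping (half-open lifetimes that tile [0, 1) in each run, own-run checkpoints excluded) follows the description in this section.

```python
import random

rng = random.Random(1)
N = 6

# Each run is a list of (birth, survival) intervals tiling [0, 1]; the last
# survival time is capped at 1. The lifetime law is a hypothetical stand-in.
runs = []
for _ in range(N):
    intervals, birth = [], 0.0
    while birth < 1.0:
        death = min(1.0, birth + rng.expovariate(3.0))
        intervals.append((birth, death))
        birth = death
    runs.append(intervals)

# Checkpoints are the survival times strictly below 1, tagged with their run index.
checkpoints = [(k, d) for k, iv in enumerate(runs) for (_, d) in iv if d < 1.0]

# For every checkpoint, count the points from the other runs alive at that time.
counts = [
    sum(1 for j, iv in enumerate(runs) if j != k for (b, d) in iv if b <= t < d)
    for k, t in checkpoints
]
print(all(c == N - 1 for c in counts))   # True
```

Each of the other N − 1 runs contributes exactly one point because its lifetimes are half-open intervals that partition [0, 1), so any checkpoint time falls in exactly one of them.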

As Figure 8.3 shows, because the time parameter t is continuous, each checkpoint is passed exactly once in each WMC run. Consequently, if we start with N samples from the starting distribution, N − 1 points are assigned to every intermediate distribution defined by a checkpoint. The count is N − 1 rather than N because, as described in the allocation process above, the checkpoints created by a particular WMC run are not used when allocating that run's own points.

It is also possible to predefine the checkpoints manually. A manually selected grid of checkpoints would significantly reduce the total number of intermediate distributions used in the construction of the estimator G̃_w and would reduce the correlation between samples from f_t(·) and f_s(·) when t ≈ s, i.e. when s and t are almost equal. On the other hand, manual selection assumes that the user knows the distribution of survival points and can choose checkpoints in a meaningful manner. The dynamic allocation of checkpoints presented in this section is not uniform and is strongly influenced by the discrepancy between the starting distribution f(·) and the target g(·). If f(·) and g(·) are highly similar, the checkpoints are expected to concentrate towards t = 1, so a uniform grid would not be a meaningful way of creating checkpoints: much of the information would be wasted rather than directed towards a more accurate computation of G̃_w.

A thinned-out, informative grid could be constructed after the checkpoints have been collected and analysed. The idea is to reduce the number of checkpoints on the original grid while maintaining the overall distribution and structure of that grid. In this way the grid would still reflect where points tend to become extinct, but it would be coarse enough to mitigate the correlation between points that were assigned to several intermediate distributions.
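One possible way to thin the grid while keeping its overall shape is to retain evenly spaced order statistics of the pooled checkpoints, which act as empirical quantiles. This is only a sketch of one option; the text does not prescribe a thinning rule, and the function name `thin_checkpoints`, the parameter `m` and the example values are assumptions.

```python
def thin_checkpoints(checkpoints, m):
    """Keep m checkpoints at evenly spaced ranks of the sorted pooled collection."""
    ordered = sorted(checkpoints)
    if m <= 0:
        return []
    if m >= len(ordered):
        return ordered
    # Evenly spaced ranks behave like empirical quantiles, so regions where the
    # original grid is dense remain dense in the thinned grid.
    step = len(ordered) / m
    return [ordered[int((i + 0.5) * step)] for i in range(m)]


# Illustrative pooled checkpoints concentrated towards t = 1.
pooled = [0.05, 0.62, 0.70, 0.74, 0.78, 0.81, 0.85, 0.88, 0.91, 0.93, 0.96, 0.98]
print(thin_checkpoints(pooled, 4))   # [0.62, 0.78, 0.88, 0.96]
```

In this example the thinned grid stays concentrated near t = 1, unlike a uniform grid on [0, 1) with the same number of checkpoints.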

In document Theory, Analysis and Implementation of Wavelet Monte Carlo. (Page 169-174)