7.4 Implications of Besov theory on WMC
8.1.5 Controlling samples from intermediate distributions
In a conventional MIS setup, several importance densities are picked in advance, and samples are drawn from them directly to estimate moments of a target distribution. In the WMC scenario, however, we cannot pick intermediate distributions a priori and sample from them directly; sampling from intermediate distributions is uncontrolled and determined by a random process. Nonetheless, from the WMC theory (3.4.2) we know that if a given point x_s at time t = s has an associated survival time t = t*, then x_s can be treated as a sample from any distribution between f_s(·) and f_{t*}(·), excluding the density at t = t*,
x_s ∼ f_l(·), s ≤ l < t*, (8.1.7)
that is, x_s is a representative sample from every intermediate distribution on the time interval [s, t*). Figure 8.1 demonstrates this process for a starting point x_0. First, we sample x_0 ∼ f(·) at time t = 0; second, we sample a survival time t*, after which, if t* < 1, we would sample a new point x_{t*} ∼ ψ_{j,i}(x). Having observed that the point x_0 existed at all times 0 ≤ s < t*, we conclude that x_0 ∼ f_s(x) for any 0 ≤ s < t*.
Figure 8.1: Diagram showing how randomly sampled intermediate points in a WMC run are assigned to distributions. The point x_0 has survival time t = t*, so it existed at all times 0 ≤ s < t*; hence we conclude x_0 ∼ f_l(·), 0 ≤ l < t*.
The question remains how to decide to which distributions f_t(·) the intermediate sample points should be assigned during the full run of WMC, for a starting sample of N points from f(·).
The idea is to create checkpoints t_k with each single WMC run, which indicate the intermediate distributions f_{t_k} to which points x_s, s ∈ [0, 1), should be assigned. For the first sample x ∼ f(·) a survival time t is sampled, and if t < 1 a new point x* ∼ ψ_{j,i}(·) is sampled according to the WMC algorithm. The sampled survival time t becomes a checkpoint created by the initial point from the starting distribution f(·). After this, a survival time t* for the point x* is sampled, and if t* < 1 we record t* as a further checkpoint.
So, each starting point x_k ∼ f(·) and its associated intermediate points create a set of checkpoints t_{k,l(k)}, where k ∈ {1, 2, ..., N} indicates the run in which the checkpoint was created and l(k) ∈ ℕ indicates the l-th checkpoint in the k-th WMC run. Therefore, after a total of N runs we end up with a pooled collection of checkpoints {t_{k,l(k)}}, where k ∈ {1, 2, ..., N} and l(k) = 1, ..., l_max^{(k)}. It could be the case that no checkpoints are created in the k-th run; in that case we would have l_max^{(k)} = ∅ and t_{k,l(k)} = ∅. The checkpoint creation procedure is illustrated in Figure 8.2.
Figure 8.2: Illustrating how checkpoints are created over several WMC runs. With each run new checkpoints are created then pooled into a single collection.
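The checkpoint creation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: survival times are treated as absolute times on the WMC clock that exceed the birth time of the point, and `toy_survival` is a hypothetical placeholder law, not the WMC survival distribution. The names `run_checkpoints` and `pooled_checkpoints` are likewise introduced here for illustration.

```python
import random


def toy_survival(rng, birth):
    """Hypothetical survival-time sampler (an assumption, not the WMC law).

    Returns an absolute time that is no smaller than the birth time.
    """
    return birth + rng.expovariate(2.0)


def run_checkpoints(rng, sample_survival):
    """Return the checkpoints created by a single WMC run.

    A checkpoint is any sampled survival time below 1. The replacement
    point is born at that time and gets its own survival time.
    """
    checkpoints = []
    birth = 0.0
    while True:
        death = sample_survival(rng, birth)
        if death >= 1.0:
            # The current point survives until t = 1: the run ends
            # without creating a further checkpoint.
            return checkpoints
        checkpoints.append(death)
        birth = death


def pooled_checkpoints(n_runs, seed=0):
    """Pool checkpoints over n_runs runs, keyed by run index k = 1..n_runs.

    A run that creates no checkpoint is stored as an empty list.
    """
    rng = random.Random(seed)
    return {k: run_checkpoints(rng, toy_survival) for k in range(1, n_runs + 1)}
```

Keeping the run index k as the dictionary key preserves the information needed later, when the checkpoints of the n-th run have to be excluded while allocating the points of that run.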
Having created all the checkpoints, we next allocate points to intermediate distributions. Given any starting point x_n ∼ f(·), where n ∈ {1, ..., N}, and its associated intermediate points that were created in the n-th run, the allocation process is as follows:
1. Given a point, observe its initial time t_I; for x_n ∼ f(·) we have t_I = 0. Also note the survival time of x_n, say t_{n,1} < 1.
2. From the full collection of checkpoints {t_{k,l(k)}}, k ∈ {1, 2, ..., N}, l(k) = 1, ..., l_max^{(k)}, discard all checkpoints created by the n-th run to obtain the sub-collection of checkpoints with k ≠ n, {t_{k,l(k)}}_{k≠n}.
3. Allocate the point x_n to all intermediate distributions f_{t_{k,l(k)}}(·) for which the inequality t_I ≤ t_{k,l(k)} < t is satisfied, where t is the survival time of the point (t = t_{n,1} for x_n) and t_{k,l(k)} ∈ {t_{k,l(k)}}_{k≠n}.
The same exact steps above are taken in allocating intermediate points x ∼ ψj,i(x).
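The three allocation steps can be sketched as follows. This is a minimal illustration, assuming each run is stored as a list of `(label, birth, death)` triples, with `death = 1.0` used as a convention for a point that survives to the end of the run. The function name `allocate` and the toy runs are hypothetical and only serve the example.

```python
def allocate(runs):
    """Allocate points to checkpoint distributions.

    runs maps a run index n to a list of (label, birth, death) triples.
    Returns a dict mapping each checkpoint t to the labels of the points
    treated as samples from f_t.
    """
    allocation = {}
    for n, points in runs.items():
        # Step 2: keep only checkpoints (death times below 1) from runs k != n.
        others = [
            death
            for k, pts in runs.items()
            if k != n
            for (_, _, death) in pts
            if death < 1.0
        ]
        for label, birth, death in points:
            # Steps 1 and 3: the point is assigned to every checkpoint t
            # with birth <= t < death.
            for t in others:
                if birth <= t < death:
                    allocation.setdefault(t, []).append(label)
    return allocation


# Three toy runs: runs 1 and 2 each create one checkpoint, run 3 creates none.
runs = {
    1: [("x0", 0.0, 0.4), ("x1", 0.4, 1.0)],
    2: [("y0", 0.0, 0.7), ("y1", 0.7, 1.0)],
    3: [("z0", 0.0, 1.0)],
}
result = allocate(runs)
```

In this toy example each of the two checkpoints receives N − 1 = 2 points, because the lifetimes within a run partition [0, 1) and the run that created a checkpoint is excluded from it.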
Figure 8.3: After the full collection of checkpoints has been created over N runs, each starting point x_0 ∼ f(·) and its associated intermediate points x ∼ ψ_{j,i}(·) are allocated to intermediate distributions based on the checkpoints that the point has survived through. The point x_0 has survived past the time t_1 and hence is assigned to f_{t_1}(·). On the other hand, the point x_1 is not assigned to any intermediate distribution, because there are no checkpoints between its initial time and its survival time to which it could be allocated. Furthermore, points can be allocated to several intermediate distributions at the same time: points x_2 and x_3 each survive through two checkpoints and hence are each assigned to two intermediate distributions.
Samples in Figure 8.3
x_0 ∼ f(·)          x_3 ∼ f_{t_4}(·)
x_0 ∼ f_{t_1}(·)    x_3 ∼ f_{t_5}(·)
x_2 ∼ f_{t_2}(·)    y := x_3 ∼ g(·)
x_2 ∼ f_{t_3}(·)
Table 8.1: Table summarising the samples produced in Figure 8.3. In addition to a starting sample x0 ∼ f (·) and a target sample y ∼ g(·), there was exactly one point
assigned to every intermediate distribution.
Starting with N samples from a starting distribution, we end up with:
1. {x_i}_{i=1}^{N} ∼ f(x)
2. {y_i}_{i=1}^{N} ∼ g(x)
3. {x_{n,k}}_{n=1}^{N−1} ∼ f_{t_k}(x), for k ∈ {1, ..., r}, where, as before, r is the total number of intermediate distributions used (checkpoints created) and x_{n,k} is the n-th sample from the distribution f_{t_k}(x).
As we can see in Figure 8.3, due to the continuity of the time parameter t, each checkpoint is passed exactly once in each WMC run. Consequently, if we start with N samples from a starting distribution, N − 1 points are assigned to every intermediate distribution defined by a checkpoint. There are N − 1 samples rather than N because, as described in the allocation process above, the checkpoints created by a particular WMC run are not used when allocating the points of that run, so the run that created a checkpoint contributes no sample to it.
It is also possible to predefine checkpoints manually. A manually selected grid of checkpoints would significantly reduce the total number of intermediate distributions used in the construction of the estimator G̃_w and would reduce the correlation between samples from f_t(·) and f_s(·) where t ≈ s, i.e. s and t are almost equal. On the other hand, manual selection of checkpoints assumes that the user has knowledge of the distribution of survival points and can select checkpoints in a meaningful manner. The dynamic allocation of checkpoints presented in this section is not uniform and is strongly influenced by the discrepancy between the starting distribution f(·) and the target g(·). If f(·) and g(·) are highly similar, the checkpoints are expected to be more concentrated towards t = 1; a uniform grid would then not be a meaningful way of creating checkpoints, as a lot of information would be wasted rather than directed towards a more accurate computation of G̃_w.
A thinned-out, informative grid could be constructed after the checkpoints have been collected and analysed. The idea would be to reduce the number of checkpoints on the original grid while maintaining the overall distribution and structure of the original grid. In this way the grid would still reflect where points usually tend to become extinct, but it would also be coarse enough to mitigate the correlation between points that were assigned to several intermediate distributions.
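One possible way to build such a thinned grid, given here only as a sketch and not as a procedure prescribed by the text, is to keep checkpoints at evenly spaced ranks of the sorted pooled collection. The retained checkpoints then follow the empirical distribution of the original ones. The function name `thin_checkpoints` and the parameter `m` (the number of checkpoints to keep) are assumptions made for this example.

```python
def thin_checkpoints(checkpoints, m):
    """Keep m checkpoints at evenly spaced ranks of the sorted collection.

    The sorted checkpoints are split into m rank blocks of equal size and
    the checkpoint at the middle rank of each block is retained, so dense
    regions of the original grid stay dense in the thinned grid.
    """
    pts = sorted(checkpoints)
    if m <= 0:
        return []
    if m >= len(pts):
        return pts
    n = len(pts)
    return [pts[((2 * j + 1) * n) // (2 * m)] for j in range(m)]


# Toy pooled collection, concentrated towards t = 1.
pooled = [0.9, 0.1, 0.95, 0.8, 0.85, 0.97, 0.5, 0.92, 0.99, 0.88]
thinned = thin_checkpoints(pooled, 2)
```

Because the selection works on ranks rather than on a uniform partition of [0, 1), a collection concentrated near t = 1 yields a thinned grid that is also concentrated near t = 1.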