IV. Multiple proposal Markov chain Monte Carlo
4.3 Multiple proposal piecewise deterministic MCMC algorithms
4.3.1 Algorithm description
Some MCMC algorithms use piecewise deterministic proposal kernels to update the state of the Markov chain. Some of these algorithms extend the target distribution on the space $X$ to a joint distribution on a product space $X \times V$ whose marginal on $X$ equals the original target distribution. Elements of $V$ determine how the piecewise deterministic kernel makes proposals. In this section, we first describe the algorithms in an abstract setting. We then show how specific algorithms, such as Hamiltonian Monte Carlo (HMC) or the bouncy particle sampler (BPS), fit into this framework.

In the extended space $X \times V$, we assume that the target distribution has density $\bar\pi(x)\psi(v)$ for $(x, v) \in X \times V$ with respect to a product reference measure denoted by $dx\,dv$. The $X$-component $dx$ of the reference measure is the same as the original reference measure on $X$, with respect to which the original target distribution has density $\bar\pi(x)$. The variable $v \in V$ is often called the momentum variable in HMC and the velocity variable in the BPS. We define a collection of deterministic maps $S_\tau : X \times V \to X \times V$ for possibly various values of $\tau$. The map $S_\tau$ may be interpreted as the evolution of a particle for time duration $\tau$ in a system, such that $S_\tau(x, v)$ denotes the final position-velocity pair of a particle that moves in the system with initial position $x$ and initial velocity $v$.
To ensure that the target density $\bar\pi(x)\psi(v)$ is stationary for the algorithm, we impose the following conditions on $\{S_\tau\}$ and $\psi(\cdot)$.
• Measure preserving condition. First, the map $S_\tau$ for each $\tau$ preserves the reference measure $dx\,dv$: that is, for every measurable set $A \subseteq X \times V$,
$$ \int \mathbf{1}[S_\tau(x, v) \in A] \, dx\,dv = \int_A dx\,dv. \tag{4.3} $$
Then for any integrable measurable function $f$, we have
$$ \int f\{S_\tau(x, v)\} \, dx\,dv = \int f(x, v) \, dx\,dv. $$
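• Reversibility condition. Second, we assume that there exists a velocity reflection operator $R(x) : V \to V$ defined for every point $x \in X$, such that
$$ R(x) \circ R(x) = \mathrm{id} \quad \text{for all } x \in X, \tag{4.4} $$
$$ R(x) \text{ preserves the reference measure } dv, \tag{4.5} $$
$$ \psi\{R(x)v\} = \psi(v) \quad \text{for all } (x, v) \in X \times V, \tag{4.6} $$
and if we define a map $T : X \times V \to X \times V$ as $T(x, v) := (x, R(x)v)$, then
$$ T \circ S_\tau \circ T \circ S_\tau = \mathrm{id}. \tag{4.7} $$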
Algorithm 4: Multiple proposal piecewise deterministic MCMC
Input: Distribution of the maximum number of proposals and the maximum number of accepted proposals, $\nu(N, L)$
       Time step length distribution, $\mu(d\tau)$
       Velocity distribution density, $\psi(v)$
       Time evolution operators, $\{S_\tau\}$
       Velocity reflection operator, $R(x)$
       Velocity refreshment probability, $p_{\mathrm{ref}}$
       Number of iterations, $M$
Output: Markov chain $(X^{(i)})_{i \in 1:M}$
Initialize: Set $X^{(0)}$ arbitrarily and draw $V^{(0)} \sim \psi(\cdot)$.
for $i \leftarrow 0 : M-1$ do
    Draw $N, L \sim \nu(\cdot, \cdot)$
    Draw $\tau \sim \mu(\cdot)$
    Draw $\Lambda \sim \mathrm{unif}(0, 1)$
    Set $(X^{(i+1)}, V^{(i+1)}) \leftarrow (X^{(i)}, R(X^{(i)})V^{(i)})$
    Set $n_a \leftarrow 0$
    for $n \leftarrow 1 : N$ do
        Set $(Y_n, W_n) = S_\tau(Y_{n-1}, W_{n-1})$, where we understand $Y_0 := X^{(i)}$ and $W_0 := V^{(i)}$
        if $\Lambda < \dfrac{\pi(Y_n)\psi(W_n)}{\pi(X^{(i)})\psi(V^{(i)})}$ then $n_a \leftarrow n_a + 1$
        if $n_a = L$ then
            Set $(X^{(i+1)}, V^{(i+1)}) \leftarrow (Y_n, W_n)$
            Break
        end
    end
    With probability $p_{\mathrm{ref}}$, refresh $V^{(i+1)} \sim \psi(\cdot)$
end
In the above, id denotes the identity map on the corresponding space, $V$ or $X \times V$. The reversibility condition can be understood as an abstraction of a property of Hamiltonian dynamics: if we reverse the velocity of a particle and advance in time, the particle traces back its past trajectory. Its meaning will become clearer in the explicit cases of HMC and the BPS. The proof of the following lemma is provided in the appendix.
Lemma IV.3. Suppose (4.4) and (4.7) hold. Define recursively $S_\tau^n := S_\tau^{n-1} \circ S_\tau$, where $S_\tau^1 = S_\tau$. Then for any $n \ge 1$, we have $T \circ S_\tau^n \circ T \circ S_\tau^n = \mathrm{id}$. Moreover, $S_\tau$ is a bijective map.
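As a numerical illustration of the two conditions and of Lemma IV.3, the following minimal sketch checks them for one concrete choice of maps. Everything specific here is an assumption made for illustration and is not taken from the text: the target is a standard Gaussian, $S_\tau$ is a single leapfrog step (as in HMC), and the reflection is $R(x)v = -v$. The function names `leapfrog`, `flip`, `iterate` and `jacobian_det` are hypothetical. A unit Jacobian determinant is used as a proxy for preservation of Lebesgue measure $dx\,dv$.

```python
import numpy as np

def grad_neg_log_pi(x):
    # Assumed target: standard Gaussian, so -log pi(x) = |x|^2 / 2 and its gradient is x.
    return x

def leapfrog(x, v, tau):
    """One leapfrog step of size tau, playing the role of S_tau (an assumption)."""
    v_half = v - 0.5 * tau * grad_neg_log_pi(x)
    x_new = x + tau * v_half
    v_new = v_half - 0.5 * tau * grad_neg_log_pi(x_new)
    return x_new, v_new

def flip(x, v):
    """The map T(x, v) = (x, R(x)v) with the assumed reflection R(x)v = -v."""
    return x, -v

def iterate(x, v, tau, n):
    """S_tau^n: apply the leapfrog step n times."""
    for _ in range(n):
        x, v = leapfrog(x, v, tau)
    return x, v

def jacobian_det(x, v, tau, eps=1e-6):
    """Central finite-difference Jacobian determinant of S_tau at (x, v)."""
    d = x.size
    z = np.concatenate([x, v])
    jac = np.empty((2 * d, 2 * d))
    for j in range(2 * d):
        step = np.zeros(2 * d)
        step[j] = eps
        plus = np.concatenate(leapfrog(*np.split(z + step, 2), tau))
        minus = np.concatenate(leapfrog(*np.split(z - step, 2), tau))
        jac[:, j] = (plus - minus) / (2 * eps)
    return np.linalg.det(jac)

rng = np.random.default_rng(0)
x0, v0 = rng.normal(size=3), rng.normal(size=3)
tau, n = 0.3, 5

# Measure preservation: the Jacobian determinant of S_tau should be close to 1.
det = jacobian_det(x0, v0, tau)

# Lemma IV.3: T o S_tau^n o T o S_tau^n should return the starting point.
x1, v1 = iterate(x0, v0, tau, n)
x2, v2 = flip(x1, v1)
x3, v3 = iterate(x2, v2, tau, n)
x_back, v_back = flip(x3, v3)

print("Jacobian determinant:", det)
print("max round-trip error:", max(np.abs(x_back - x0).max(), np.abs(v_back - v0).max()))
```

The round trip reverses the velocity, retraces the trajectory, and reverses the velocity again, which is the "trace back the past trajectory" reading of the reversibility condition.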
fashion as multiple proposal Metropolis-Hastings algorithms do. The pseudocode is shown in Algorithm 4. The main differences from Algorithm 3 are that proposals are obtained deterministically by the relation $(Y_n, W_n) = S_\tau(Y_{n-1}, W_{n-1})$ and that the acceptability criterion takes into account the density $\psi$ as well. If there are fewer than $L$ acceptable proposals in the sequence of proposals, the next state of the Markov chain is set to $(X^{(i+1)}, V^{(i+1)}) = (X^{(i)}, R(X^{(i)})V^{(i)})$. To facilitate better mixing, the velocity $V^{(i+1)}$ may be refreshed with probability $p_{\mathrm{ref}}$ at the end of each iteration by drawing from $\psi(\cdot)$. The output $(X^{(i)})_{i \in 1:M}$ is obtained by simply discarding the velocity variables $(V^{(i)})_{i \in 1:M}$.
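The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the target and the velocity density are both taken to be standard Gaussian, $S_\tau$ is a single leapfrog step, $R(x)v = -v$, and the distributions $\nu$ and $\mu$ are degenerate (fixed `N`, `L` and `tau`). The function name `mp_pdmcmc` and all parameter names are hypothetical. The acceptability test is carried out on the log scale, and the check $n_a = L$ is nested inside the acceptance branch, which is equivalent for $L \ge 1$.

```python
import numpy as np

def log_pi(x):
    # Assumed unnormalized target density: standard Gaussian on R^d.
    return -0.5 * np.dot(x, x)

def log_psi(v):
    # Assumed velocity density: standard Gaussian (up to a constant).
    return -0.5 * np.dot(v, v)

def leapfrog(x, v, tau):
    # Assumed choice of S_tau: one leapfrog step; grad(-log pi)(x) = x for this target.
    v = v - 0.5 * tau * x
    x = x + tau * v
    v = v - 0.5 * tau * x
    return x, v

def mp_pdmcmc(x0, M, N, L, tau, p_ref, rng):
    """Sketch of Algorithm 4 with fixed N, L and tau (degenerate nu and mu)."""
    x = np.asarray(x0, dtype=float)
    v = rng.normal(size=x.shape)                  # V(0) ~ psi
    chain = np.empty((M, x.size))
    for i in range(M):
        log_lam = np.log1p(-rng.uniform())        # log of Lambda ~ unif(0, 1)
        log_cur = log_pi(x) + log_psi(v)
        x_next, v_next = x, -v                    # default: stay put, reflect velocity
        y, w, n_acc = x, v, 0
        for _ in range(N):
            y, w = leapfrog(y, w, tau)            # (Y_n, W_n) = S_tau(Y_{n-1}, W_{n-1})
            if log_lam < log_pi(y) + log_psi(w) - log_cur:
                n_acc += 1
                if n_acc == L:                    # take the L-th acceptable proposal
                    x_next, v_next = y, w
                    break
        x, v = x_next, v_next
        if rng.uniform() < p_ref:                 # velocity refreshment
            v = rng.normal(size=x.shape)
        chain[i] = x                              # velocities are discarded
    return chain

chain = mp_pdmcmc(np.zeros(2), M=10000, N=4, L=2, tau=0.5, p_ref=0.5,
                  rng=np.random.default_rng(1))
print("sample mean:", chain.mean(axis=0))
print("sample variance:", chain.var(axis=0))
```

With these assumed settings the sample mean and variance of the position should be close to those of the standard Gaussian target, which gives a quick sanity check of the sketch.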
We finally note that the time length $\tau$ for the evolution map $S_\tau$ can be drawn either collectively or separately for each $n \in 1:N$ in Algorithm 4. The pseudocode in Algorithm 4 shows the case where $\tau$ is drawn collectively, so that the same value of $\tau$ is used for all $n \in 1:N$. Alternatively, the line "Draw $\tau \sim \mu(\cdot)$" can be moved right below the line "for $n \leftarrow 1:N$ do", so that for each $n \in 1:N$ a different value $\tau_n$ is drawn independently and $S_{\tau_n}$ is used to obtain $(Y_n, W_n)$.
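The two ways of drawing the time length can be contrasted in a short sketch. The helper `proposal_sequence`, the toy free-flight map $S_\tau(x, v) = (x + \tau v, v)$ and the uniform choice of $\mu$ are all assumptions introduced here for illustration only; any evolution map with the signature `S(x, v, tau)` could be passed instead.

```python
import numpy as np

def free_flight(x, v, tau):
    # Toy evolution map S_tau(x, v) = (x + tau * v, v), used only as a placeholder.
    return x + tau * v, v

def draw_tau(rng):
    # Assumed mu: uniform on (0.1, 0.5).
    return rng.uniform(0.1, 0.5)

def proposal_sequence(x, v, N, S, draw_tau, rng, per_step=False):
    """Return [(Y_n, W_n, tau_n)] for n = 1..N.

    per_step=False: one tau is drawn and shared by all n (as written in Algorithm 4).
    per_step=True:  an independent tau_n is drawn inside the loop for each n.
    """
    tau = None if per_step else draw_tau(rng)
    y, w, out = x, v, []
    for _ in range(N):
        tau_n = draw_tau(rng) if per_step else tau
        y, w = S(y, w, tau_n)
        out.append((y, w, tau_n))
    return out

rng = np.random.default_rng(0)
x0, v0 = np.zeros(2), np.ones(2)
shared = proposal_sequence(x0, v0, 4, free_flight, draw_tau, rng)
separate = proposal_sequence(x0, v0, 4, free_flight, draw_tau, rng, per_step=True)
print("shared tau:  ", [round(t, 3) for _, _, t in shared])
print("separate tau:", [round(t, 3) for _, _, t in separate])
```

Only the placement of the draw changes between the two modes; the rest of the iteration (acceptability test, reflection, refreshment) is unaffected.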
The invariance of the target distribution $\bar\pi(x)\psi(v)$ can be shown in a similar way to Section 4.2.2.
Proposition IV.4. Algorithm 4 constructs a reversible Markov chain with respect to the density $\bar\pi(x)\psi(v)$.
Proof. See Appendix 4.D.