Coupling Methods for Hamiltonian Monte Carlo
Nawaf Bou-Rabee
Joint work with Andreas Eberle (Bonn)
and Katharina Schuh (Bonn)
Supported in part by the NSF under Grant No. DMS-181637
and the Alexander von Humboldt Foundation
Coupling Methods for HMC
N. Bou-Rabee
Markov Chain Monte Carlo (MCMC)
The goal of MCMC is to approximate the integral
$\mu(f) = \int_S f(x)\,\mu(dx)$, where $f : S \to \mathbb{R}$,
w.r.t. a probability measure $\mu$ (called the target measure) on a Polish space $(S, \mathcal{B})$. MCMC is used where exact sampling or numerical quadrature is impractical due to an unknown normalizing constant, non-log-concavity of $\mu$, or high dimensionality of $S$.
Computational Vaccine Design
Lelièvre, Rousset, Stoltz 2010, Free Energy Computations: A Mathematical Perspective
(a) free energy computation: $S = \mathbb{R}^d$, $\mu(dx) \propto e^{-U(x)}\,dx$ and $f \equiv \mathbb{1}_A$.
Computational Materials Science
Prokhorenko, Kalka, Nahas, Bellaiche 2018 npj Computational Materials
[Figure: phases of BaTiO3: rhombohedral, orthorhombic, tetragonal, cubic.]
Quantum Statistical Mechanics
[Figure: discretization refinement, panels n = 16, n = 32, n = 64.]
Korol, Rosa-Raíces, Bou-Rabee, Miller 2020 J Chem Phys
(c) path integral molecular dynamics:
$S = H = L^2([0, ], \mathbb{R}^d)$, $\mu(dx) \propto e^{-U(x)}\,\mathcal{N}(0, ( P + aI)^{-1})(dx)$, where $a > 0$ is a parameter and $f : H \to \mathbb{R}$.
Statistical Inverse Problems
Stuart 2010: Inverse Problems: A Bayesian Perspective; Cotter, Dashti, Robinson, Stuart 2009: Bayesian Inverse Problems for Functions and Applications to Fluid Mechanics
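$y = F(x) + \eta$, where $y$ are the sparse observations, $x$ is the (unknown) initial condition, and $\eta$ is the observational noise.
2D Navier-Stokes: $\partial_t v - \nu \Delta v + (v \cdot \nabla) v + \nabla p = f$, $\nabla \cdot v = 0$, $v|_{t=0} = x$.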
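(d) Eulerian data assimilation: $S = H = \{ u \in L^2(\mathbb{T}^2) : \int_{\mathbb{T}^2} u = 0,\ \nabla \cdot u = 0 \}$, $\mu(dx) \propto \exp\left( -\tfrac{1}{2} |y - F(x)|_{\Sigma}^2 \right) \mathcal{N}(u_b, A^{-\alpha})(dx)$ for any $\alpha > 1$ and $u_b \in H^{\alpha} \subset H$; and $f(x) \equiv x$.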
MCMC
Question
How many MCMC steps $m$ ensure that $\nu\pi^m$ is a good approximation of $\mu$?
Some references: (i) Douc, Moulines, Priouret, Soulier 2018: Markov Chains; (ii) Eberle 2021: Markov Processes; (iii) Hairer 2010: Convergence of Markov Processes; (iv) Joulin, Ollivier 2010: Curvature, Concentration, and Error Estimates for Markov Chain Monte Carlo; (v) Levin, Peres, Wilmer 2009: Markov chains and mixing times; and (vi) Madras 2002: Lectures on Monte Carlo Methods.
The basic idea is to simulate a time-homogeneous Markov chain $(X_m)_{m \in \mathbb{N}}$ on $(S, \mathcal{B})$ with initial distribution $\nu$ and transition kernel $\pi$ satisfying $\mu = \mu\pi$, and then to estimate $\mu(f)$ by
$\frac{1}{m} \sum_{i=1}^{m} f(X_{i+b})$, where $b$ is the "burn-in time."
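The burn-in estimator can be sketched in a few lines. This is only an illustration under assumptions not taken from the slides: the chain is a hypothetical AR(1) chain, chosen because it leaves N(0, 1) invariant, and the names `ar1_step` and `ergodic_average` are made up for this example.

```python
import math
import random

def ar1_step(x, rho, rng):
    """One step of X' = rho*X + sqrt(1 - rho^2)*Z, which leaves N(0, 1) invariant."""
    return rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)

def ergodic_average(f, x0, m, b, rho=0.5, seed=0):
    """Estimate mu(f) by (1/m) * sum_{i=1}^m f(X_{i+b})."""
    rng = random.Random(seed)
    x = x0
    # Burn-in: advance to X_b without recording anything.
    for _ in range(b):
        x = ar1_step(x, rho, rng)
    total = 0.0
    for _ in range(m):
        x = ar1_step(x, rho, rng)  # x is now X_{i+b}
        total += f(x)
    return total / m

# f(x) = x^2 has mu(f) = 1 under mu = N(0, 1); the start x0 = 10 is deliberately poor.
estimate = ergodic_average(lambda x: x * x, x0=10.0, m=20000, b=100)
print(estimate)
```

The printed value should be close to 1; how close depends on m, on the burn-in b, and on how fast the chain mixes, which is the question the slide poses.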
Powerful tools and techniques have been developed to address this question, including
• geometric (e.g. conductance and isoperimetric inequalities)
• analytic (e.g. spectral gaps, functional inequalities, and hypocoercivity)
• probabilistic (e.g. minorization/drift conditions, coupling methods)
Hamiltonian Monte Carlo (HMC)
HMC is an MCMC method for approximate sampling from target measures of the form $\mu(dx) \propto e^{-U(x)}\,dx$,
where $U : \mathbb{R}^d \to \mathbb{R}_+$ is a twice differentiable potential energy function satisfying $\int_{\mathbb{R}^d} e^{-U(x)}\,dx < \infty$.
HMC uses a fictitious dynamics
$M \frac{d^2}{dt^2} q_t(x, v) = -\nabla U(q_t(x, v))$, $q_0(x, v) = x$, $\frac{d}{dt} q_t(x, v)\big|_{t=0} = v$,
where $M$ is a $d \times d$ symmetric, positive definite mass matrix.
These are Newton’s equations for a particle in Rd with potential energy U(x).
Some references: (i) Bou-Rabee, Sanz-Serna 2018: Geometric integrators and the Hamiltonian Monte Carlo method; (ii) Duane, Kennedy, Pendleton, Roweth 1987: Hybrid Monte Carlo; (iii) Lelièvre, Rousset, Stoltz 2010: Free energy computations: a mathematical perspective; (iv) Neal 2011: MCMC Using Hamiltonian Dynamics
HMC Transition Step
Definition (Transition Kernel of Exact HMC)
For any $x \in \mathbb{R}^d$ and $A \in \mathcal{B}$, the transition kernel of exact HMC is defined by $\pi(x, A) = P[q_T(x, \xi) \in A]$, where $\xi \sim \mathcal{N}(0, M^{-1})$.
A transition step of exact HMC inputs $x \in \mathbb{R}^d$ and a duration parameter $T > 0$ and outputs $q_T(x, \xi)$, where $\xi \sim \mathcal{N}(0, M^{-1})$.
[Figure: one transition step from $x$ to $q_T(x, \xi)$.]
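A transition step of exact HMC can be written out explicitly in the one case where the flow has a closed form. The sketch below assumes a one-dimensional quadratic potential U(x) = K x^2 / 2 with M = 1, which is an illustrative choice and not an example from the slides; the function names are hypothetical.

```python
import math
import random

def exact_flow_quadratic(x, v, T, K=1.0):
    """q_T(x, v) for U(x) = K*x^2/2 and M = 1, i.e. the solution of q'' = -K*q."""
    w = math.sqrt(K)
    return x * math.cos(w * T) + (v / w) * math.sin(w * T)

def exact_hmc_step(x, T, rng, K=1.0):
    """One transition step: draw xi ~ N(0, 1) and return q_T(x, xi)."""
    xi = rng.gauss(0.0, 1.0)
    return exact_flow_quadratic(x, xi, T, K)

rng = random.Random(1)
K, T = 4.0, 0.25
x = 3.0
samples = []
for _ in range(20000):
    x = exact_hmc_step(x, T, rng, K)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # the target here is N(0, 1/K), so var should be near 0.25
```

For a general potential the flow has no closed form and must be approximated by a numerical integrator, which is where unadjusted and Metropolis-adjusted HMC differ from the exact HMC discussed here.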
HMC
Question
How many HMC steps $m$ ensure that $\nu\pi^m$ is a good approximation of $\mu$?
By drift/minorization conditions, geometric ergodicity was verified in: (i) Bou-Rabee, Sanz-Serna 2017: Randomized HMC; (ii) Durmus, Moulines, Saksman 2020: On the convergence of HMC; (iii) Livingstone, Betancourt, Byrne, Girolami 2019: On the geometric ergodicity of HMC.
Theorem
The transition kernel of HMC satisfies $\mu\pi = \mu$.
We answer this question by designing couplings tailored to HMC. These couplings are inspired by couplings for hypoelliptic diffusions: (i) Ben Arous, Cranston, Kendall 1995: Coupling Constructions for Hypoelliptic Diffusions: Two Examples; (ii) Eberle, Guillin, Zimmer 2019: Coupling and quantitative contraction rates for Langevin diffusions.
Probability Metrics and Couplings
Let (S, B) be a Polish state space and let P(S) be the set of probability measures on S.
Definition (Coupling of Probability Measures)
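A coupling of $\nu, \eta \in \mathcal{P}(S)$ is a $\gamma \in \mathcal{P}(S \times S)$ such that for any $A \in \mathcal{B}$,
$\gamma(A \times S) = \nu(A)$ and $\gamma(S \times A) = \eta(A)$.
Denote the set of all couplings of $\nu, \eta$ by $\mathrm{Couplings}(\nu, \eta)$.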
Definition (Wasserstein Distance)
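The $L^1$ Wasserstein distance with respect to the metric $d$ on $S$ of two probability measures $\nu, \eta \in \mathcal{P}(S)$ is defined by
$W^1_d(\nu, \eta) = \inf \{ E[d(X, Y)] : \mathrm{Law}(X, Y) \in \mathrm{Couplings}(\nu, \eta) \}$.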
Probability Metrics and Couplings
Definition (Total Variation Distance)
The total variation distance of probability measures $\nu, \eta \in \mathcal{P}(S)$ is defined by $\mathrm{TV}(\nu, \eta) = \sup \{ |\nu(A) - \eta(A)| : A \in \mathcal{B} \}$.
Some references: (i) den Hollander 2012: Probability Theory: Coupling Method; (ii) Lindvall 2002: Lectures on the Coupling Method; (iii) Villani 2009: Optimal Transport – Old and New.
Lemma (Coupling characterization of TV distance)
Coupling Construction for HMC
The coupling transition step inputs $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$ and outputs $(q_T(x, \xi), q_T(y, \eta))$, where $\xi \sim \mathcal{N}(0, M^{-1})$ and $\eta \sim \mathcal{N}(0, M^{-1})$ are defined on a common probability space.
Definition (Coupling of HMC)
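For any $x, y \in \mathbb{R}^d$ and $A, B \in \mathcal{B}$, define the coupling transition kernel by $\Pi((x, y), A \times B) = P[q_T(x, \xi) \in A,\ q_T(y, \eta) \in B]$,
where $(\xi, \eta)$ is a pair of random variables such that $\mathrm{Law}(\xi) = \mathrm{Law}(\eta) = \mathcal{N}(0, M^{-1})$ and $\eta = \Phi(\xi)$ with maximal probability, i.e.,
$P[\eta \neq \Phi(\xi)] = \mathrm{TV}(\mathrm{Law}(\xi), \mathrm{Law}(\Phi(\xi)))$, where $\Phi : \mathbb{R}^d \to \mathbb{R}^d$ is a measurable, near-identity map.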
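One way to realize "equal with maximal probability" can be sketched in one dimension. The construction below is an assumption for illustration, since the slides do not spell it out: the near-identity map is taken to be a shift by a constant b, M = 1, and when the shift is rejected the velocity is reflected. The name `coupled_velocities` is hypothetical.

```python
import math
import random

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def coupled_velocities(b, rng):
    """Return (xi, eta), each N(0, 1), with eta = xi + b with maximal probability."""
    xi = rng.gauss(0.0, 1.0)
    # Accept the shift with probability min(1, phi(xi + b) / phi(xi)).
    if rng.random() * phi(xi) <= phi(xi + b):
        return xi, xi + b
    # Otherwise reflect, which restores the N(0, 1) marginal of eta.
    return xi, -xi

rng = random.Random(2)
b = 0.5
pairs = [coupled_velocities(b, rng) for _ in range(20000)]
hit = sum(1 for xi, eta in pairs if eta == xi + b) / len(pairs)
tv = math.erf(abs(b) / (2.0 * math.sqrt(2.0)))  # TV(N(0, 1), N(b, 1))
eta_mean = sum(eta for _, eta in pairs) / len(pairs)
eta_var = sum((eta - eta_mean) ** 2 for _, eta in pairs) / len(pairs)
print(hit, 1.0 - tv, eta_mean, eta_var)
```

The empirical frequency of the shift being accepted should be close to 1 - TV, and the second coordinate should still look standard normal, which are the two properties the definition asks for.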
Synchronous Coupling
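[Figure: synchronous coupling. Starting points $x$ and $y$ use the same velocity $\xi$ and move to $q_T(x, \xi)$ and $q_T(y, \xi)$.]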
Synchronous Coupling
$U$ is a function in $C^2(\mathbb{R}^d)$ satisfying:
(A1) $U$ has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A2) there exists $K \in (0, \infty)$ such that
$(x - y) \cdot (\nabla U(x) - \nabla U(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$.
Theorem (Chen and Vempala 2019)
Suppose $U$ satisfies (A1)-(A2) and $T > 0$ satisfies $LT^2 \le 1/4$. Then for all initial distributions $\nu, \eta \in \mathcal{P}(\mathbb{R}^d)$, and for all $m \ge 0$,
$W^1(\nu\pi^m, \eta\pi^m) \le e^{-cm}\, W^1(\nu, \eta)$, where $c = KT^2/10$.
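The contraction under synchronous coupling can be checked numerically on a toy potential. This sketch makes several assumptions beyond the slides: the potential U(x) = x^2/2 + log cosh(x) is a hypothetical one-dimensional example with K = 1 and L = 2, and the exact flow is replaced by velocity Verlet with a small step size as a stand-in.

```python
import math
import random

def grad_U(x):
    """Gradient of U(x) = x^2/2 + log cosh(x); U'' lies in [1, 2], so K = 1, L = 2."""
    return x + math.tanh(x)

def flow(x, v, T, n_sub=50):
    """Approximate q_T(x, v) by velocity Verlet with n_sub substeps."""
    h = T / n_sub
    for _ in range(n_sub):
        v -= 0.5 * h * grad_U(x)
        x += h * v
        v -= 0.5 * h * grad_U(x)
    return x

K, L, T = 1.0, 2.0, 0.35  # L*T^2 = 0.245 <= 1/4
c = K * T * T / 10.0
rng = random.Random(3)
x, y = 5.0, -3.0
d0 = abs(x - y)
dists = []
for m in range(1, 21):
    xi = rng.gauss(0.0, 1.0)  # the same velocity drives both copies
    x, y = flow(x, xi, T), flow(y, xi, T)
    dists.append(abs(x - y))

print(dists[-1], math.exp(-c * 20) * d0)
```

In this run the distance between the two copies stays below the bound e^{-cm} |x - y| at every step, consistent with the stated rate; the rate in the theorem is a worst case, so the observed contraction is faster.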
Free One-Shot Coupling
[Figure: free one-shot coupling, with labels $x$, $y$, $x + T\xi$, $\xi$.]
Free One-Shot Coupling
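$U$ is a function in $C^2(\mathbb{R}^d)$ satisfying:
(A1) $U$ has a local minimum at 0, and $U(0) = 0$.
(A2) $U$ has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A3) there exist constants $R \in [0, \infty)$ and $K \in (0, \infty)$ such that
$(x - y) \cdot (\nabla U(x) - \nabla U(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$ with $|x - y| \ge R$.
Theorem (Bou-Rabee, Eberle, and Zimmer 2020)
Suppose $U$ satisfies (A1)-(A3) and $T > 0$ satisfies $LT^2 \le \min\left( \frac{K}{L}, \frac{1}{4}, \frac{1}{256\, L R^2} \right)$. Then for all initial distributions $\nu, \eta \in \mathcal{P}(\mathbb{R}^d)$, and for all $m \ge 0$,
$W^1(\nu\pi^m, \eta\pi^m) \le M e^{-cm}\, W^1(\nu, \eta)$, where $M = e^{\frac{5}{2}(1 + R/T)}$ and $c = \frac{1}{10} \min\left( 1, \frac{1}{2} K T^2 \left(1 + \frac{R}{T}\right) e^{-R/(2T)} \right) e^{-2R/T}$.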
Mean-Field Model
Consider a high-dimensional mean-field model where $U : \mathbb{R}^{dn} \to \mathbb{R}$ is defined as
$U(x) = \sum_{i=1}^{n} \Big( V(x_i) + \frac{\epsilon}{n} \sum_{j=1, j \neq i}^{n} W(x_i - x_j) \Big)$, $x = (x_1, \dots, x_n)$, $x_i \in \mathbb{R}^d$.
Here $V$, $W$ are functions in $C^2(\mathbb{R}^d)$ satisfying:
(A1) $V$ has a local minimum at 0, and $V(0) = 0$.
(A2) $L = \sup \|\nabla^2 V\| < \infty$ and $\tilde{L} = \sup \|\nabla^2 W\| < \infty$.
(A3) there exist constants $R \in [0, \infty)$ and $K \in (0, \infty)$ such that
$(x - y) \cdot (\nabla V(x) - \nabla V(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$ with $|x - y| \ge R$.
Mean-field models were introduced by Kac to understand the statistical properties of high-dimensional systems. Some references: (i) Guillin, Liu, Wu, Zhang 2019: The kinetic Fokker-Planck equation with mean-field interaction; (ii) Kac 1956: Foundations of Kinetic Theory; (iii) McKean 1966: A class of Markov processes associated with nonlinear parabolic equations; (iv) Méléard 1996: Asymptotic behavior of some interacting particle systems; McKean-Vlasov and Boltzmann models; (v) Mischler and Mouhot 2013: Kac's program in kinetic theory.
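The mean-field potential and its gradient, which is what the HMC dynamics needs, can be sketched for d = 1. The particular V and W below are hypothetical choices with bounded second derivatives, not the ones used in the talk.

```python
import math

def V(x):
    return 0.5 * x * x  # hypothetical confinement potential

def dV(x):
    return x

def W(r):
    return math.log(math.cosh(r))  # hypothetical interaction with bounded W''

def dW(r):
    return math.tanh(r)

def U(x, eps):
    """U(x) = sum_i [ V(x_i) + (eps/n) * sum_{j != i} W(x_i - x_j) ] for d = 1."""
    n = len(x)
    total = 0.0
    for i in range(n):
        total += V(x[i])
        total += (eps / n) * sum(W(x[i] - x[j]) for j in range(n) if j != i)
    return total

def grad_U(x, eps):
    """Gradient of U; particle i enters W both as first and as second argument."""
    n = len(x)
    g = []
    for i in range(n):
        inter = sum(dW(x[i] - x[j]) - dW(x[j] - x[i]) for j in range(n) if j != i)
        g.append(dV(x[i]) + (eps / n) * inter)
    return g

x = [0.3, -1.2, 0.8, 2.0]
print(U(x, eps=0.1), grad_U(x, eps=0.1))
```

The double sum costs O(n^2) per gradient evaluation; the componentwise structure of the gradient is what the componentwise coupling exploits.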
Componentwise Coupling
[Figure: componentwise coupling, with labels $x_i$, $y_i$, $x_i + T\xi_i$, $\xi_i$, and the coupled velocity $\Phi_i(\xi_i)$.]
Componentwise Coupling
Theorem (Bou-Rabee and Schuh 2020)
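Suppose $V$, $W$ satisfy (A1)-(A3). Suppose $T > 0$ and $\epsilon \ge 0$ satisfy
$LT^2 \le \frac{3}{5} \min\left( \frac{3K}{10L}, \frac{1}{4}, \frac{3K}{256 \cdot 5 \cdot 2^6\, L R^2 (L + K)} \right)$,
$|\epsilon| \tilde{L} < \min\left( \frac{K}{6}, \frac{1}{2} \left( \frac{K}{36 \cdot 149} \right)^2 \left( T + 8R \sqrt{\frac{L + K}{K}} \right)^2 \exp\left( -\frac{40R}{T} \sqrt{\frac{L + K}{K}} \right) \right)$.
Then for all initial distributions $\nu, \eta \in \mathcal{P}(\mathbb{R}^{dn})$, and for all $m \ge 0$,
$W^1_{\ell^1}(\nu\pi^m, \eta\pi^m) \le M e^{-cm}\, W^1_{\ell^1}(\nu, \eta)$, where $\ell^1(x, y) = \sum_{i=1}^{n} |x_i - y_i|$, $M = \exp\left( \frac{5}{2} \left( 1 + \frac{4R}{T} \sqrt{\frac{L + K}{K}} \right) \right)$ and $c = \frac{K T^2}{156} \exp\left( -\frac{10R}{T} \sqrt{\frac{L + K}{K}} \right)$.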
Perturbation of a Gaussian Measure in ∞-Dimension
Consider a target measure on a Hilbert space $(H, \langle \cdot, \cdot \rangle)$: $\mu(dx) \propto \exp(-U(x))\, \mathcal{N}(0, C)(dx)$.
Here $C$ is a positive, trace class, symmetric linear operator with eigenfunctions $\{e_i\}$ and corresponding eigenvalues $\{\lambda_i\}$ arranged in descending order:
$\lambda_1 \ge \lambda_2 \ge \cdots$
$U$ is a function in $C^2(H)$ satisfying:
(A1) $U$ has a local minimum at 0.
(A2) the first and second derivatives of $U$ are bounded, i.e.,
$K = \sup \|\nabla U\| < \infty$ and $L = \sup \|\nabla^2 U\| < \infty$.
In this setting, convergence bounds for preconditioned Crank-Nicolson and preconditioned MALA were developed in: (i) Eberle 2014: Error bounds for Metropolis-Hastings algorithms applied to perturbations of Gaussian measures in high dimensions; (ii) Hairer, Stuart, Vollmer 2014: Spectral gaps for a Metropolis-Hastings algorithm in infinite dimension.
Exact Preconditioned HMC
Let $M = C^{-1}$. Then the dynamics simplifies to
$\frac{d^2}{dt^2} q_t(x, v) = -q_t(x, v) - C \nabla U(q_t(x, v))$, $q_0(x, v) = x$, $\frac{d}{dt} q_t(x, v)\big|_{t=0} = v$.
Definition (Transition Kernel of Exact Preconditioned HMC)
For any $x \in H$ and $A \in \mathcal{B}$, the transition kernel of exact preconditioned HMC is defined by
$\pi(x, A) = P[q_T(x, \xi) \in A]$, where $\xi \sim \mathcal{N}(0, C)$.
Preconditioned HMC was introduced in an infinite-dimensional setting by Beskos, Pinski, Sanz-Serna, Stuart 2011: Hybrid Monte Carlo on Hilbert spaces.
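The effect of the choice M = C^{-1} is easiest to see in the special case U = 0, where the dynamics reduces to q'' = -q and every mode rotates at the same unit frequency. The sketch below assumes that special case, a finite truncation to n modes, and a hypothetical spectrum lambda_i = 1/i^2; none of these choices come from the slides.

```python
import math
import random

def precond_flow_free(x, v, T):
    """q_T(x, v) for q'' = -q (the case U = 0), applied mode by mode."""
    c, s = math.cos(T), math.sin(T)
    return [c * xk + s * vk for xk, vk in zip(x, v)]

def draw_velocity(eigs, rng):
    """xi ~ N(0, C) in the eigenbasis of C: independent N(0, lambda_i) coordinates."""
    return [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigs]

n, T = 50, 1.0
eigs = [1.0 / (i * i) for i in range(1, n + 1)]  # hypothetical trace-class spectrum
rng = random.Random(4)
x = [0.0] * n
second_moments = [0.0] * n
steps = 5000
for _ in range(steps):
    x = precond_flow_free(x, draw_velocity(eigs, rng), T)
    second_moments = [a + xk * xk for a, xk in zip(second_moments, x)]

# Empirical second moment of each mode divided by its eigenvalue.
ratios = [a / steps / lam for a, lam in zip(second_moments, eigs)]
print(min(ratios), max(ratios))  # each ratio should be near 1
```

In this free case two synchronously coupled copies contract by the factor |cos T| in every mode, independently of n, which gives some intuition for why dimension-free bounds are plausible once the perturbation U is controlled.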
Two-Scale Coupling
[Figure: two-scale coupling. High modes ($i > n$): labels $x_i$, $y_i$, $q_T(x_i, \xi_i)$, $q_T(y_i, \xi_i)$, $\xi_i$, $\xi_i$. Low modes ($1 \le i \le n$): labels $x_i$, $y_i$, $x_i + T\xi_i$, $\xi_i$, and the coupled velocity $\Phi_i(\xi_i)$.]
This splitting of the Hilbert space goes back to work on stochastic Navier-Stokes (Mattingly PhD thesis 1998) and analogous deterministic results (Foias/Prodi 1967). See: (i) Mattingly 2003: On recent progress for the stochastic Navier–Stokes equations; (ii) Zimmer 2017: Explicit contraction rates for a class of degenerate and infinite-dimensional diffusions.
Two-Scale Coupling
Theorem (Bou-Rabee and Eberle 2020)
Suppose U satisfies (A1)-(A2). Set
R = 16p60 21K2/2 + trace(C) 1/2 (1 + 1L)pL . Suppose T > 0 satisfies (1 + 1L)T2 min✓ 196p 1 3 1L(1 + 1L), 1 256(1 + 1L)R2 ◆ . Then for all initial distributions ⌫, ⌘ 2 P(H), and for all m 0,
W1(µ, ⌫⇡m) M e cm where c = min✓ 1 32T2, 1 128T max(R, T )e max(R,T )/T ◆ .
One-Shot Coupling
[Figure: one-shot coupling, with labels $x$, $y$, $q_T(x, \xi)$, $\xi$, and the coupled velocity $\Phi(\xi)$.]
Lemma (Müller and Ortiz 2004)
Suppose (A2) holds. For any $a, b \in \mathbb{R}^d$ and for any $T > 0$ such that $LT^2 \le (2/5)\pi^2$, there exists a unique solution to the boundary value problem:
$\frac{d^2}{dt^2} q_t = -\nabla U(q_t)$, $q_0 = a$, $q_T = b$.
One-Shot Coupling
$U$ is a function in $C^3(\mathbb{R}^d)$ satisfying:
(A1) $U$ has a global minimum at 0 and $U(0) = 0$.
(A2) $U$ has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A3) $U$ has bounded third derivatives, i.e., $L_H = \sup \|\nabla^3 U\| < \infty$.
Madras and Sezer 2010 connected one-shot couplings for Markov chains to “upgraded” convergence bounds.
Theorem (Bou-Rabee and Eberle 2021)
Suppose $U$ satisfies (A1)-(A3) and $T > 0$ satisfies $LT^2 \le 1/6$. Then for all initial distributions $\nu \in \mathcal{P}(\mathbb{R}^d)$, and for all $m \ge 0$,
Conclusion and Prospects
[Figure: the five couplings: 1. synchronous; 2. free one-shot; 3. componentwise; 4. two-scale (high modes $i > n$, low modes $1 \le i \le n$); 5. one-shot.]
(i) These couplings for exact HMC offer significant flexibility for further extensions.
(ii) These results have been mostly extended to unadjusted HMC and partly extended to Metropolis-adjusted HMC.
(iii) As a byproduct of TV convergence bounds, we obtain geometric tail bounds for the corresponding coupling times, which are crucial for new coupling-based unbiased estimation techniques (see Heng, Jacob 2019; and Jacob, O'Leary, Atchadé JRSSB Discussion Paper 2020).
Mixing Time Guarantees
Unadjusted HMC (Bou-Rabee/Eberle 2021); Metropolized HMC (Chen/Dwivedi/Wainwright/Yu 2020 JMLR)
- non-strongly logconcave model
- mean-field model with weak interactions
- non-strongly logconcave model (warm start)
- strongly logconcave model (warm start)