
Coupling Methods for Hamiltonian Monte Carlo


Academic year: 2021



Nawaf Bou-Rabee

Joint work with Andreas Eberle (Bonn)

and Katharina Schuh (Bonn)

Supported in part by the NSF under Grant No. DMS-181637

and the Alexander von Humboldt Foundation


Coupling Methods for HMC

N. Bou-Rabee

Markov Chain Monte Carlo (MCMC)

The goal of MCMC is to approximate the integral

$\mu(f) = \int_S f(x)\, \mu(dx)$, $f: S \to \mathbb{R}$,

with respect to a probability measure µ (called the target measure) on a Polish space $(S, \mathcal{B})$. MCMC is used where exact sampling or numerical quadrature is impractical due to an unknown normalizing constant, non-log-concavity of µ, or high dimensionality of S.


Computational Vaccine Design

Lelièvre, Rousset, Stoltz 2010, Free Energy Computations: A Mathematical Perspective

(a) free energy computation: $S = \mathbb{R}^d$, $\mu(dx) \propto e^{-U(x)}\,dx$ and $f \equiv \mathbb{1}_A$.


Computational Materials Science

Prokhorenko, Kalka, Nahas, Bellaiche 2018 npj Computational Materials

[Figure: structural phases of BaTiO₃: rhombohedral, orthorhombic, tetragonal, cubic]


[Figure panels: n = 16, n = 32, n = 64 (discretization refinement)]

Quantum Statistical Mechanics

Korol, Rosa-Raíces, Bou-Rabee, Miller 2020 J Chem Phys

(c) path integral molecular dynamics:

$S = H = L^2([0, \beta], \mathbb{R}^d)$, $\mu(dx) \propto e^{-U(x)}\, \mathcal{N}(0, (-\Delta_P + aI)^{-1})(dx)$, where a > 0 is a parameter and $f: H \to \mathbb{R}$.


Statistical Inverse Problems

Stuart 2010: Inverse Problems: A Bayesian Perspective; Cotter, Dashti, Robinson, Stuart 2009: Bayesian Inverse Problems for Functions and Applications to Fluid Mechanics

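$y = F(x) + \eta$, with y the sparse observations, x the (unknown) initial condition, and η the observational noise.

2D Navier-Stokes: $\partial_t v - \Delta v + (v \cdot \nabla) v + \nabla p = f$, $\nabla \cdot v = 0$, $v|_{t=0} = x$.

(d) Eulerian data assimilation: $S = H = \left\{ u \in L^2(\mathbb{T}^2) : \int_{\mathbb{T}^2} u = 0,\ \nabla \cdot u = 0 \right\}$, $\mu(dx) \propto \exp\left(-\frac{1}{2}|y - F(x)|_{\Sigma}^2\right) \mathcal{N}(u_b, A^{-\alpha})(dx)$ for any $\alpha > 1$ and $u_b \in H^{\alpha} \subset H$; and $f(x) \equiv x$.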


MCMC

Question

How many MCMC steps m ensure that $\nu\pi^m$ is a good approximation of µ?

Some references: (i) Douc, Moulines, Priouret, Soulier 2018: Markov Chains; (ii) Eberle 2021: Markov Processes; (iii) Hairer 2010: Convergence of Markov Processes; (iv) Joulin, Ollivier 2010: Curvature, Concentration, and Error Estimates for Markov Chain Monte Carlo; (v) Levin, Peres, Wilmer 2009: Markov chains and mixing times; and (vi) Madras 2002: Lectures on Monte Carlo Methods.

The basic idea is to simulate a time-homogeneous Markov chain $(X_m)_{m \in \mathbb{N}}$ on $(S, \mathcal{B})$ with initial distribution ν and transition kernel π satisfying $\mu = \mu\pi$, and then to estimate µ(f) by

$\frac{1}{m} \sum_{i=1}^{m} f(X_{i+b})$,

where b is the "burn-in time."

Powerful tools and techniques have been developed to address this question, including:
• geometric (e.g., conductance and isoperimetric inequalities)
• analytic (e.g., spectral gaps, functional inequalities, and hypocoercivity)
• probabilistic (e.g., minorization/drift conditions, coupling methods)
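As a minimal sketch of the estimator $\frac{1}{m}\sum_{i=1}^m f(X_{i+b})$, the Python code below runs a toy chain and averages f over m states after discarding a burn-in of b steps. The kernel is an assumption chosen for illustration and is not HMC: a Gaussian AR(1) kernel $\pi(x, \cdot) = \mathcal{N}(\rho x, 1 - \rho^2)$, whose invariant measure $\mu = \mathcal{N}(0, 1)$ satisfies $\mu = \mu\pi$. The names `ar1_kernel` and `mcmc_estimate` are hypothetical.

```python
import math
import random


def ar1_kernel(x, rho, rng):
    """One step of the toy kernel pi(x, .) = N(rho * x, 1 - rho**2).

    Its invariant measure is mu = N(0, 1), i.e. mu = mu pi (assumed toy example, not HMC).
    """
    return rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)


def mcmc_estimate(f, x0, m, b, rho=0.9, seed=0):
    """Estimate mu(f) by (1/m) * sum_{i=1}^{m} f(X_{i+b}) with X_0 = x0."""
    rng = random.Random(seed)
    x = x0
    for _ in range(b):          # burn-in: X_1, ..., X_b are discarded
        x = ar1_kernel(x, rho, rng)
    total = 0.0
    for _ in range(m):          # record f(X_{b+1}), ..., f(X_{b+m})
        x = ar1_kernel(x, rho, rng)
        total += f(x)
    return total / m
```

For example, `mcmc_estimate(lambda x: x * x, x0=10.0, m=200_000, b=1_000)` should return a value close to $\mu(x^2) = 1$ even though the chain starts far in the tail; the burn-in discards the transient.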


Hamiltonian Monte Carlo (HMC)

HMC is an MCMC method for approximate sampling from target measures of the form $\mu(dx) \propto e^{-U(x)}\,dx$, where $U: \mathbb{R}^d \to \mathbb{R}_+$ is a twice differentiable potential energy function satisfying $\int_{\mathbb{R}^d} e^{-U(x)}\,dx < \infty$.

HMC uses a fictitious dynamics

$M \frac{d^2}{dt^2} q_t(x, v) = -\nabla U(q_t(x, v))$, $q_0(x, v) = x$, $\frac{d}{dt} q_0(x, v) = v$,

where M is a d × d symmetric, positive definite mass matrix. These are Newton's equations for a particle in $\mathbb{R}^d$ with potential energy U(x).
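The exact flow $q_t(x, v)$ is rarely available in closed form. A common way to approximate it, sketched below under the assumption of a diagonal mass matrix, is the velocity Verlet (leapfrog) integrator. This swaps in a numerical method and is not part of exact HMC; the function name and signature are hypothetical.

```python
def velocity_verlet(grad_U, masses, x, v, T, n_steps):
    """Approximate (q_T(x, v), dq_T/dt) for M q'' = -grad U(q), q_0 = x, dq_0/dt = v.

    Assumes a diagonal mass matrix M = diag(masses) and uses n_steps velocity
    Verlet steps of size h = T / n_steps.
    """
    h = T / n_steps
    q = list(x)
    vel = list(v)

    def accel(q):
        # M^{-1} (-grad U(q)) for a diagonal M
        return [-g / m for g, m in zip(grad_U(q), masses)]

    a = accel(q)
    for _ in range(n_steps):
        vel = [vi + 0.5 * h * ai for vi, ai in zip(vel, a)]  # half kick
        q = [qi + h * vi for qi, vi in zip(q, vel)]          # drift
        a = accel(q)
        vel = [vi + 0.5 * h * ai for vi, ai in zip(vel, a)]  # half kick
    return q, vel
```

Velocity Verlet is symplectic and time reversible, so for moderate step sizes the energy $\frac{1}{2} v^\top M v + U(q)$ is nearly conserved along the approximate trajectory.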

Some references: (i) Bou-Rabee, Sanz-Serna 2018: Geometric integrators and the Hamiltonian Monte Carlo method; (ii) Duane, Kennedy, Pendleton, Roweth 1987: Hybrid Monte Carlo; (iii) Lelièvre, Rousset, Stoltz 2010: Free energy computations: a mathematical perspective; (iv) Neal 2011: MCMC Using Hamiltonian Dynamics.


HMC Transition Step

Definition (Transition Kernel of Exact HMC)

For any $x \in \mathbb{R}^d$ and $A \in \mathcal{B}$, the transition kernel of exact HMC is defined by $\pi(x, A) = P[q_T(x, \xi) \in A]$, $\xi \sim \mathcal{N}(0, M^{-1})$.

A transition step of exact HMC inputs $x \in \mathbb{R}^d$ and a duration parameter T > 0, and outputs $q_T(x, \xi)$, where $\xi \sim \mathcal{N}(0, M^{-1})$.

[Figure: a trajectory from x to $q_T(x, \xi)$]
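For a quadratic potential the exact flow is explicit, so exact HMC can be run directly. The sketch below assumes $U(q) = \sum_i k_i q_i^2/2$ and M = I (both assumptions for illustration), so $\mu = \mathcal{N}(0, \mathrm{diag}(1/k_i))$ and each coordinate evolves as $x_i\cos(\sqrt{k_i}\,T) + \xi_i \sin(\sqrt{k_i}\,T)/\sqrt{k_i}$. The function names are hypothetical.

```python
import math
import random


def exact_flow_quadratic(x, v, k, T):
    """Exact q_T(x, v) for q'' = -grad U(q) with U(q) = sum_i k_i q_i^2 / 2 and M = I."""
    out = []
    for xi, vi, ki in zip(x, v, k):
        w = math.sqrt(ki)  # angular frequency of coordinate i
        out.append(xi * math.cos(w * T) + vi * math.sin(w * T) / w)
    return out


def exact_hmc_step(x, k, T, rng):
    """One exact HMC transition: draw xi ~ N(0, M^{-1}) = N(0, I), return q_T(x, xi)."""
    xi = [rng.gauss(0.0, 1.0) for _ in x]
    return exact_flow_quadratic(x, xi, k, T)
```

If $x \sim \mu$, each output coordinate has variance $\cos^2(\sqrt{k_i}T)/k_i + \sin^2(\sqrt{k_i}T)/k_i = 1/k_i$, which is the invariance $\mu\pi = \mu$ in this special case. Durations with $\sqrt{k_i}\,T$ a multiple of π should be avoided, since the flow then does not mix that coordinate.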


HMC

Question

How many HMC steps m ensure that $\nu\pi^m$ is a good approximation of µ?

Using drift/minorization conditions, geometric ergodicity was verified in: (i) Bou-Rabee, Sanz-Serna 2017: Randomized HMC; (ii) Durmus, Moulines, Saksman 2020: On the convergence of HMC; (iii) Livingstone, Betancourt, Byrne, Girolami 2019: On the geometric ergodicity of HMC.

Theorem

The transition kernel of HMC satisfies $\mu\pi = \mu$.

We answer this question by designing couplings tailored to HMC. These couplings are inspired by couplings for hypoelliptic diffusions. (i) Ben Arous, Cranston, Kendall 1995: Coupling Constructions for Hypoelliptic Diffusions: Two Examples; (ii) Eberle, Guillin, Zimmer 2019: Coupling and quantitative contraction rates for Langevin diffusions.


Probability Metrics and Couplings

Let $(S, \mathcal{B})$ be a Polish state space and let $\mathcal{P}(S)$ be the set of probability measures on S.

Definition (Coupling of Probability Measures)

A coupling of $\nu, \eta \in \mathcal{P}(S)$ is a $\gamma \in \mathcal{P}(S \times S)$ such that for any $A \in \mathcal{B}$,

$\gamma(A \times S) = \nu(A)$ and $\gamma(S \times A) = \eta(A)$.

Denote the set of all couplings of ν, η by Couplings(ν, η).

Definition (Wasserstein Distance)

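The $L^1$ Wasserstein distance with respect to the metric d on S of two probability measures $\nu, \eta \in \mathcal{P}(S)$ is defined by

$W^1_d(\nu, \eta) = \inf\left\{ \int_{S \times S} d(x, y)\, \gamma(dx\, dy) : \gamma \in \mathrm{Couplings}(\nu, \eta) \right\}$.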
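For empirical measures on S = ℝ with $d(x, y) = |x - y|$ and equally many atoms, the infimum over couplings is attained by the monotone coupling that pairs the sorted samples (a standard fact in one dimension). A minimal sketch, with a hypothetical function name:

```python
def w1_empirical_1d(xs, ys):
    """W^1 w.r.t. d(x, y) = |x - y| between the uniform empirical measures on xs and ys.

    Assumes len(xs) == len(ys); the sorted (monotone) pairing is an optimal coupling in 1D.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

Pairing the samples in any other order is also a coupling and therefore only gives an upper bound on $W^1_d$.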


Probability Metrics and Couplings

Definition (Total Variation Distance)

The total variation distance of probability measures $\nu, \eta \in \mathcal{P}(S)$ is defined by $\mathrm{TV}(\nu, \eta) = \sup\{ |\nu(A) - \eta(A)| : A \in \mathcal{B} \}$.

Some references: (i) den Hollander 2012: Probability Theory: Coupling Method; (ii) Lindvall 2002: Lectures on the Coupling Method; (iii) Villani 2009: Optimal Transport – Old and New.

Lemma (Coupling characterization of TV distance)
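The coupling characterization of TV states that $\mathrm{TV}(\nu, \eta) = \inf\{P[X \neq Y] : \mathrm{Law}(X) = \nu,\ \mathrm{Law}(Y) = \eta\}$, with the infimum attained by a maximal coupling. On a finite set this can be sketched as follows (function names are hypothetical): with probability $w = \sum_i \min(p_i, q_i)$ both coordinates are drawn from the overlap, and otherwise each is drawn from its normalized residual; the residuals have disjoint supports, so X ≠ Y on that branch.

```python
import random


def tv_distance(p, q):
    """sup_A |p(A) - q(A)| = (1/2) * sum_i |p_i - q_i| for probability vectors on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def maximal_coupling(p, q, rng):
    """Draw (X, Y) with Law(X) = p, Law(Y) = q and P[X != Y] = TV(p, q)."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    rp = [a - o for a, o in zip(p, overlap)]  # residual of p, supported where p > q
    rq = [b - o for b, o in zip(q, overlap)]  # residual of q, supported where q > p
    idx = range(len(p))
    # Guard: if p == q the residuals vanish and X = Y always.
    if not any(rp) or not any(rq) or rng.random() < sum(overlap):
        z = rng.choices(idx, weights=overlap)[0]
        return z, z
    return rng.choices(idx, weights=rp)[0], rng.choices(idx, weights=rq)[0]
```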


Coupling Construction for HMC

The coupling transition step inputs $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$ and outputs $(q_T(x, \xi), q_T(y, \eta))$, where $\xi \sim \mathcal{N}(0, M^{-1})$ and $\eta \sim \mathcal{N}(0, M^{-1})$ are defined on a common probability space.

Definition (Coupling of HMC)

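For any $x, y \in \mathbb{R}^d$ and $A, B \in \mathcal{B}$, define the coupling transition kernel by $\tilde{\pi}((x, y), A \times B) = P[q_T(x, \xi) \in A,\ q_T(y, \eta) \in B]$, where $(\xi, \eta)$ is a pair of random variables such that $\mathrm{Law}(\xi) = \mathrm{Law}(\eta) = \mathcal{N}(0, M^{-1})$ and $\eta = \Phi(\xi)$ with maximal probability, i.e.,

$P[\eta \neq \Phi(\xi)] = \mathrm{TV}(\mathrm{Law}(\xi), \mathrm{Law}(\Phi(\xi)))$,

where $\Phi: \mathbb{R}^d \to \mathbb{R}^d$ is a measurable, near-identity map.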
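In one dimension with M = 1, one standard way to realize η = Φ(ξ) with maximal probability, sketched below, takes the hypothetical near-identity map Φ(ξ) = ξ + h: accept η = ξ + h with probability min(1, φ(ξ + h)/φ(ξ)), where φ is the standard normal density, and otherwise set η = −ξ. The choice of Φ and the function names are assumptions for illustration; the slides do not specify Φ.

```python
import math
import random


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def shift_reflection_coupling(h, rng):
    """Return (xi, eta) with xi, eta ~ N(0, 1) and eta = xi + h with maximal probability."""
    xi = rng.gauss(0.0, 1.0)
    # Accept eta = Phi(xi) = xi + h with probability min(1, phi(xi + h) / phi(xi)).
    if rng.random() * std_normal_pdf(xi) <= std_normal_pdf(xi + h):
        return xi, xi + h
    # Otherwise reflect; by symmetry of phi this supplies the missing N(0, 1) mass for eta.
    return xi, -xi
```

Here $P[\eta \neq \Phi(\xi)] = \mathrm{TV}(\mathcal{N}(0,1), \mathcal{N}(h,1)) = \mathrm{erf}(|h|/(2\sqrt{2}))$, while η remains exactly N(0, 1) distributed.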


Synchronous Coupling

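[Figure: synchronous coupling: trajectories from x to $q_T(x, \xi)$ and from y to $q_T(y, \xi)$ with the same initial velocity ξ]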


Synchronous Coupling

U is a function in $C^2(\mathbb{R}^d)$ satisfying:

(A1) U has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A2) There exists $K \in (0, \infty)$ such that $(x - y) \cdot (\nabla U(x) - \nabla U(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$.

Theorem (Chen and Vempala 2019)

Suppose U satisfies (A1)-(A2) and T > 0 satisfies $LT^2 \le 1/4$. Then for all initial distributions $\nu, \eta \in \mathcal{P}(\mathbb{R}^d)$ and for all $m \ge 0$,

$W^1(\nu\pi^m, \eta\pi^m) \le e^{-cm} W^1(\nu, \eta)$, where $c = KT^2/10$.
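The mechanism behind the synchronous coupling can be checked on a quadratic example (an assumption, not the general setting of the theorem). For $U(q) = \sum_i k_i q_i^2/2$ with M = I, (A1)-(A2) hold with $L = \max_i k_i$ and $K = \min_i k_i$, and with a shared velocity ξ the difference evolves deterministically: $q_T(x,\xi)_i - q_T(y,\xi)_i = \cos(\sqrt{k_i}\,T)(x_i - y_i)$. The sketch below, with hypothetical names, tracks $|X_m - Y_m|$ along two synchronously coupled chains.

```python
import math
import random


def exact_flow_quadratic(x, v, k, T):
    """Exact q_T(x, v) for q'' = -grad U(q) with U(q) = sum_i k_i q_i^2 / 2 and M = I."""
    out = []
    for xi, vi, ki in zip(x, v, k):
        w = math.sqrt(ki)
        out.append(xi * math.cos(w * T) + vi * math.sin(w * T) / w)
    return out


def synchronous_coupling_distances(x, y, k, T, m, seed=0):
    """Run m synchronously coupled exact HMC steps; return [|X_j - Y_j| for j = 0..m]."""
    rng = random.Random(seed)
    dists = [math.dist(x, y)]
    for _ in range(m):
        xi = [rng.gauss(0.0, 1.0) for _ in x]  # one velocity shared by both copies
        x = exact_flow_quadratic(x, xi, k, T)
        y = exact_flow_quadratic(y, xi, k, T)
        dists.append(math.dist(x, y))
    return dists
```

With $LT^2 = 1/4$ every factor $\cos(\sqrt{k_i}\,T)$ lies in $[\cos(1/2), 1)$, and in this example the recorded distances stay below $e^{-cm}|x - y|$ with $c = KT^2/10$, consistent with the theorem.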


Free One-Shot Coupling

[Figure: free one-shot coupling: initial points x and y, and the free-flow position x + Tξ]


Free One-Shot Coupling

U is a function in $C^2(\mathbb{R}^d)$ satisfying:

(A1) U has a local minimum at 0, and U(0) = 0.
(A2) U has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A3) There exist constants $R \in [0, \infty)$ and $K \in (0, \infty)$ such that $(x - y) \cdot (\nabla U(x) - \nabla U(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$ with $|x - y| \ge R$.

Theorem (Bou-Rabee, Eberle, and Zimmer 2020)

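Suppose U satisfies (A1)-(A3) and T > 0 satisfies $LT^2 \le \min\left(\frac{K}{L}, \frac{1}{4}, \frac{1}{256 L R^2}\right)$. Then for all initial distributions $\nu, \eta \in \mathcal{P}(\mathbb{R}^d)$ and for all $m \ge 0$,

$W^1(\nu\pi^m, \eta\pi^m) \le M e^{-cm} W^1(\nu, \eta)$, where $M = e^{\frac{5}{2}(1 + R/T)}$ and $c = \frac{1}{10} \min\left(1, \frac{1}{2} K T^2 \left(1 + \frac{R}{T}\right) e^{-R/(2T)}\right) e^{-2R/T}$.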
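The following sketches the idea of the free one-shot picture under the simplifying assumption U ≡ 0 and M = I, so that $q_T(x, \xi) = x + T\xi$ (the free flow). Choosing the hypothetical map $\Phi(\xi) = \xi + (x - y)/T$ gives $y + T\Phi(\xi) = x + T\xi$, so the two copies meet after one step whenever η = Φ(ξ); on rejection, η is ξ reflected in the hyperplane orthogonal to $x - y$. This is an illustration only, not the construction used in the theorem for general U, and the names are hypothetical.

```python
import math
import random


def gaussian_shift_coupling(h, rng):
    """Sample xi, eta ~ N(0, I_d) with eta = xi + h with maximal probability.

    Returns (xi, eta, coupled), where coupled is True iff eta = xi + h.
    """
    xi = [rng.gauss(0.0, 1.0) for _ in h]
    # log of phi(xi + h) / phi(xi) for the standard Gaussian density phi
    log_ratio = 0.5 * sum(a * a for a in xi) - 0.5 * sum((a + b) ** 2 for a, b in zip(xi, h))
    if math.log(1.0 - rng.random()) <= log_ratio:  # 1 - U lies in (0, 1], so log is finite
        return xi, [a + b for a, b in zip(xi, h)], True
    norm_h = math.sqrt(sum(b * b for b in h))
    e = [b / norm_h for b in h]
    dot = sum(a * b for a, b in zip(xi, e))
    return xi, [a - 2.0 * dot * b for a, b in zip(xi, e)], False  # reflection


def free_one_shot_step(x, y, T, rng):
    """One coupled step of the free flow q_T(x, xi) = x + T * xi (assumes U = 0, M = I)."""
    h = [(a - b) / T for a, b in zip(x, y)]
    xi, eta, coupled = gaussian_shift_coupling(h, rng)
    x_new = [a + T * s for a, s in zip(x, xi)]
    y_new = [b + T * s for b, s in zip(y, eta)]
    return x_new, y_new, coupled
```

The meeting probability is $1 - \mathrm{TV}(\mathcal{N}(0, I), \mathcal{N}(h, I)) = 1 - \mathrm{erf}(|h|/(2\sqrt{2}))$ with $h = (x - y)/T$, so it is close to one when |x − y| is small relative to T.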


Mean-Field Model

Consider a high-dimensional mean-field model where $U: \mathbb{R}^{dn} \to \mathbb{R}$ is defined as

$U(x) = \sum_{i=1}^{n} \left( V(x_i) + \frac{\epsilon}{n} \sum_{j=1, j \neq i}^{n} W(x_i - x_j) \right)$, $x = (x_1, \ldots, x_n)$, $x_i \in \mathbb{R}^d$.

Here V, W are functions in $C^2(\mathbb{R}^d)$ satisfying:

(A1) V has a local minimum at 0, and V(0) = 0.
(A2) $L = \sup \|\nabla^2 V\| < \infty$ and $\tilde{L} = \sup \|\nabla^2 W\| < \infty$.
(A3) There exist constants $R \in [0, \infty)$ and $K \in (0, \infty)$ such that $(x - y) \cdot (\nabla V(x) - \nabla V(y)) \ge K |x - y|^2$ for all $x, y \in \mathbb{R}^d$ with $|x - y| \ge R$.

Mean-field models were introduced by Kac to understand the statistical properties of high-dimensional systems. (i) Guillin, Liu, Wu, Zhang 2019: The kinetic Fokker-Planck equation with mean-field interaction; (ii) Kac 1956: Foundations of Kinetic Theory; (iii) McKean 1966: A class of Markov processes associated with nonlinear parabolic equations; (iv) Méléard 1996: Asymptotic behavior of some interacting particle systems; McKean-Vlasov and Boltzmann models; (v) Mischler and Mouhot 2013: Kac's program in kinetic theory.
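A direct evaluation of the mean-field potential, as a minimal sketch: particles are tuples in $\mathbb{R}^d$, and V, W are caller-supplied functions (any concrete choice is hypothetical). The double sum costs O(n²) evaluations of W.

```python
def mean_field_potential(x, V, W, eps):
    """U(x) = sum_i [ V(x_i) + (eps / n) * sum_{j != i} W(x_i - x_j) ].

    x is a list of n particles, each a tuple in R^d; V and W map tuples to floats.
    """
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x):
        total += V(xi)  # confinement term
        for j, xj in enumerate(x):
            if j != i:
                # pairwise interaction with the 1/n mean-field scaling
                total += (eps / n) * W(tuple(a - b for a, b in zip(xi, xj)))
    return total
```

For instance, with the illustrative choice $V(z) = W(z) = |z|^2/2$, two particles at 1 and 3 on the real line, and ε = 0.5, the potential is 0.5 + 4.5 + 0.25 · (2 + 2) = 6.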


Componentwise Coupling

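[Figure: componentwise coupling of particle i: $x_i$, $y_i$, the free-flow position $x_i + T\xi_i$, and the velocity map $\Phi_i(\xi_i)$]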


Componentwise Coupling

Theorem (Bou-Rabee and Schuh 2020)

Suppose V , W satisfy (A1)-(A3). Suppose T > 0 and ✏ 0 satisfy LT2  35 min✓ 3K10L, 14, 256 · 5 · 23K6LR2(L + K ) ◆ |✏|˜L < min 0 @ K 6 , 1 2 ✓ K 36 · 149 ◆2 T + 8R r L + K K !2 exp 40R T r L + K K !1 A . Then for all initial distributions ⌫, ⌘ 2 P(Rdn), and for all m 0,

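$W^1_{\ell^1}(\nu\pi^m, \eta\pi^m) \le M e^{-cm} W^1_{\ell^1}(\nu, \eta)$, where $\ell^1(x, y) = \sum_{i=1}^{n} |x_i - y_i|$, $M = \exp\left(\frac{5}{2}\left(1 + \frac{4R}{T}\sqrt{\frac{L+K}{K}}\right)\right)$ and $c = \frac{KT^2}{156} \exp\left(-\frac{10R}{T}\sqrt{\frac{L+K}{K}}\right)$.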
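The metric $\ell^1$ sums the Euclidean distances of the individual particles rather than taking the Euclidean norm on $\mathbb{R}^{dn}$. A minimal sketch with a hypothetical function name:

```python
import math


def l1_product_distance(x, y):
    """l^1(x, y) = sum_i |x_i - y_i| with |.| the Euclidean norm of each particle in R^d."""
    return sum(math.dist(xi, yi) for xi, yi in zip(x, y))
```

By the triangle and Cauchy-Schwarz inequalities, $|x - y| \le \ell^1(x, y) \le \sqrt{n}\,|x - y|$, where |·| denotes the Euclidean norm on $\mathbb{R}^{dn}$.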


Perturbation of a Gaussian Measure in Infinite Dimensions

Consider a target measure on a Hilbert space $(H, \langle \cdot, \cdot \rangle)$: $\mu(dx) \propto \exp(-U(x))\, \mathcal{N}(0, C)(dx)$.

Here C is a positive, trace-class, symmetric linear operator with eigenfunctions $\{e_i\}$ and corresponding eigenvalues $\{\lambda_i\}$ arranged in descending order: $\lambda_1 \ge \lambda_2 \ge \cdots$.

U is a function in $C^2(H)$ satisfying:

(A1) U has a local minimum at 0.
(A2) The first and second derivatives of U are bounded, i.e., $K = \sup \|\nabla U\| < \infty$ and $L = \sup \|\nabla^2 U\| < \infty$.

In this setting, convergence bounds for preconditioned Crank-Nicolson and preconditioned MALA were developed in: (i) Eberle 2014: Error bounds for Metropolis-Hastings algorithms applied to perturbations of Gaussian measures in high dimensions; (ii) Hairer, Stuart, Vollmer 2014: Spectral gaps for a Metropolis-Hastings algorithm in infinite dimension.
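Samples from N(0, C) can be represented through the eigenbasis of C (the Karhunen-Loève expansion): $x = \sum_i \sqrt{\lambda_i}\, Z_i e_i$ with $Z_i$ i.i.d. N(0, 1), so $E|x|^2 = \mathrm{trace}(C) = \sum_i \lambda_i < \infty$. The sketch below works with the coefficients $\langle x, e_i \rangle$, truncated to finitely many modes; the function name is hypothetical.

```python
import math
import random


def sample_gaussian_kl(eigenvalues, rng):
    """Coefficients <x, e_i> of x ~ N(0, C), truncated to len(eigenvalues) modes.

    <x, e_i> = sqrt(lambda_i) * Z_i with Z_i iid N(0, 1).
    """
    return [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigenvalues]
```

For example, with the arbitrary choice $\lambda_i = i^{-2}$ and 50 modes, the average of $|x|^2$ over many draws approaches $\sum_{i \le 50} \lambda_i$; the trace-class condition is what keeps this quantity bounded as more modes are added.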


Exact Preconditioned HMC

Let $M = C^{-1}$. Then the dynamics simplifies to

$\frac{d^2}{dt^2} q_t(x, v) = -q_t(x, v) - C \nabla U(q_t(x, v))$, $q_0(x, v) = x$, $\frac{d}{dt} q_0(x, v) = v$.

Definition (Transition Kernel of Exact Preconditioned HMC)

For any $x \in H$ and $A \in \mathcal{B}$, the transition kernel of exact preconditioned HMC is defined by

$\pi(x, A) = P[q_T(x, \xi) \in A]$, $\xi \sim \mathcal{N}(0, C)$.

Preconditioned HMC was introduced in an infinite-dimensional setting by Beskos, Pinski, Sanz-Serna, Stuart 2011: Hybrid Monte Carlo on Hilbert spaces.
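When U ≡ 0 (an assumption that removes the perturbation), the preconditioned dynamics reduces to $q'' = -q$, whose solution $q_T = x\cos T + v\sin T$ acts mode by mode in the eigenbasis of C. The sketch below, with hypothetical names and a truncation to finitely many modes, implements this flow and the resulting exact transition with $\xi \sim \mathcal{N}(0, C)$.

```python
import math
import random


def preconditioned_flow_free(x, v, T):
    """Exact solution of q'' = -q (preconditioned dynamics with U = 0), coefficient-wise:
    q_T = x cos T + v sin T."""
    c, s = math.cos(T), math.sin(T)
    return [c * xi + s * vi for xi, vi in zip(x, v)]


def preconditioned_hmc_step_free(x, eigenvalues, T, rng):
    """One exact preconditioned HMC step for U = 0: xi ~ N(0, C) in the eigenbasis of C."""
    xi = [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigenvalues]
    return preconditioned_flow_free(x, xi, T)
```

Since $x\cos T + \xi\sin T \sim \mathcal{N}(0, C)$ whenever x and ξ are independent N(0, C) samples, this kernel leaves N(0, C) invariant for every T, regardless of how many modes are retained.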


Two-Scale Coupling

[Figure, high modes i > n: trajectories from $x_i$ to $q_T(x_i, \xi_i)$ and from $y_i$ to $q_T(y_i, \xi_i)$ with a shared velocity component $\xi_i$]

This splitting of the Hilbert space goes back to work on the stochastic Navier-Stokes equations (Mattingly PhD thesis 1998) and analogous deterministic results (Foias/Prodi 1967). See: (i) Mattingly 2003: On recent progress for the stochastic Navier–Stokes equations; (ii) Zimmer 2017: Explicit contraction rates for a class of degenerate and infinite-dimensional diffusions.

[Figure, low modes 1 ≤ i ≤ n: $x_i$, $y_i$, the free-flow position $x_i + T\xi_i$, and the velocity map $\Phi_i(\xi_i)$]


Two-Scale Coupling

Theorem (Bou-Rabee and Eberle 2020)

Suppose U satisfies (A1)-(A2). Set

R = 16p60 21K2/2 + trace(C) 1/2 (1 + 1L)pL . Suppose T > 0 satisfies (1 + 1L)T2  min✓ 196p 1 3 1L(1 + 1L), 1 256(1 + 1L)R2 ◆ . Then for all initial distributions ⌫, ⌘ 2 P(H), and for all m 0,

W1(µ, ⌫⇡m)  M e cm where c = min✓ 1 32T2, 1 128T max(R, T )e max(R,T )/T ◆ .


One-Shot Coupling

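[Figure: one-shot coupling: initial points x and y, the trajectory to $q_T(x, \xi)$, and the velocity map $\Phi(\xi)$]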

Lemma (Müller and Ortiz 2004)

Suppose (A2) holds. For any $a, b \in \mathbb{R}^d$ and for any T > 0 such that $LT^2 \le (2/5)\pi^2$, there exists a unique solution to the boundary value problem:

d2


One-Shot Coupling

U is a function in $C^3(\mathbb{R}^d)$ satisfying:

(A1) U has a global minimum at 0 and U(0) = 0.
(A2) U has bounded second derivatives, i.e., $L = \sup \|\nabla^2 U\| < \infty$.
(A3) U has bounded third derivatives, i.e., $L_H = \sup \|\nabla^3 U\| < \infty$.

Madras and Sezer 2010 connected one-shot couplings for Markov chains to “upgraded” convergence bounds.

Theorem (Bou-Rabee and Eberle 2021)

Suppose U satisfies (A1)-(A3) and T > 0 satisfies $LT^2 \le 1/6$. Then for all initial distributions $\nu \in \mathcal{P}(\mathbb{R}^d)$ and for all $m \ge 0$,


Conclusion and Prospects

[Figure: summary of the five couplings: 1. synchronous, 2. free one-shot, 3. componentwise, 4. two-scale, 5. one-shot]

(i) These couplings for exact HMC offer significant flexibility for further extensions.
(ii) These results have largely been extended to unadjusted HMC and partially extended to Metropolis-adjusted HMC.
(iii) As a byproduct of TV convergence bounds, we obtain geometric tail bounds for the corresponding coupling times, which are crucial for new coupling-based unbiased estimation techniques (see Heng, Jacob 2019; and Jacob, O'Leary, Atchadé JRSSB Discussion Paper 2020).


Mixing Time Guarantees

[Table: mixing time guarantees for Unadjusted HMC (Bou-Rabee/Eberle 2021) and Metropolized HMC (Chen/Dwivedi/Wainwright/Yu 2020 JMLR). Models: non-strongly logconcave model; mean-field model with weak interactions; non-strongly logconcave model (warm start); strongly logconcave model (warm start). Entries: $O(d^{4/3}\log(1/\varepsilon))$, $O(d^{11/12}\log(1/\varepsilon))$, $O(d^{3/4}\varepsilon^{-1/2}\log(d/\varepsilon))$ (three entries), $O(d^{1/2}\varepsilon^{-1/2}\log(d/\varepsilon))$.]
