Analytic elliptical slice sampling - Advances in Bayesian inference and stable optimization for

E(x, ν) =_{x0 :x0=xcos(θ) +νsin(θ) for some θ_∈[0,2π)_}

as in an ESS iteration as defined by (2.2). Let S(y;E) be the slice corresponding to the acceptable points in_E for a given slice heighty:

S(y;_E) =_{x0 _{∈ E} :_L(x0)> y_}.

If we can analytically characterize _S(y;_E) then we only need to sample a point uniformly from the slice to propagate the Markov Chain [Neal, 2003]. This has three advantages: (i) We eliminate expensive slice shrinkage steps which reduces the computational cost of our sampler; (ii)In standard slice sampling algorithms, shrinkage steps bias the next sample to be close to the current sample thereby introducing correlations. Since we uniformly sample over _S(y;_E), the resulting samples are less correlated as we are not biased towards the current point; (iii) We can easily incorporate the recycling idea here resulting in an extremely efficient algorithm, which we refer to as Analytic Elliptical Slice Sampling.

As in Recycled ESS, in Analytic ESS we sample J >1 points from each ellipse_E. To do so we first sample J different Y values, y1, y1, ..., yJ, which are evenly spaced in a Quasi Monte-Carlo

Algorithm 3:Analytic Slice Sampling

Input : Likelihood ˆ_L, prior ˆp, initial pointx(0)₁ , subroutineSample Ellipseto sample an ellipse, subroutine Characterize Slice to analytically characterizeS(·;E),

number of iterationsN, number of slices per iteration J

Output: Samples from Markov Chain ((x(1)₁ , ..., x(1)_J ), ...,(x(₁N), ..., x(_JN))) 1 for i= 1 to N do

2 E ←Sample Ellipse(x(₁i−1),pˆ)

3 S(·;E)←Characterize Slice(E) 4 u∼ Uniform [0,1]

5 forj= 1 to J do

6 Define slice height: y_j ←(j−u)/J·Lˆ(x(₁i−1))

7 Sample accepted point: x(_ji)← Uniform {x:x∈ S(yj;E)}

8 end

9 (x(₁i), ..., x(_Ji))←rand perm(ˆx(₁i), ...,xˆ(_Ji))

10 end

11 return ((x(1)₁ , ..., x(1)_J ), ...,(x(₁N), ..., x(_JN)))

fashion. Corresponding to eachyj value, we analytically solve for S(yj;E) (which has only a small amortized computational cost). One point is then uniformly sampled from each slice_S(yj;E). The pseudocode for Analytic ESS is given in Algorithm 3 and in Theorem 2.4.1 we prove its validity. Theorem 2.4.1. Each element in the Analytic ESS Markov chain has marginal stationary distribution p∗_.

Proof. The proof follows exactly the same argument as in Theorem 2.3.2.

Unfortunately solving for _S(y;_E) in closed form is not possible in general, although it can be done for the Truncated Multivariate Gaussian (TMG) as shown below.

CHAPTER 2. ELLIPTICAL SLICE SAMPLING WITH EXPECTATION PROPAGATION 19

2.4.1 Analytic EPESS for TMG

The (linear) TMG distribution is defined as:

p∗(X) _{∝ N}(X; 0, I) m

j=1

1(L>_j X_≥cj).

Using the EP approximation_N(X;µ,Σ) and (2.3), we can rewrite the density p∗ as:

p∗(X) _{∝ N}(X; 0, I) m Y j=1 1(L>_jX _≥cj) ∝ N(X;µ,Σ)_· N(X; 0, I) N(X;µ,Σ) m Y j=1 1(L>_jX _≥cj) =_N(Z; 0,Σ)_·N(Z;−µ, I) N(Z; 0,Σ) m Y j=1 1(L>_j (Z+µ)_≥cj) ≡ N(Z; 0,Σ)·Lˆ(Z) ≡p˜(Z)

where Z =X−µis a transformation with identity Jacobian. We are able to apply Analytic ESS to ˜p(Z) =_N(Z; 0,Σ)_·_Lˆ(Z) and can then recover samples for X by reversing the transformation via X=Z+µ.

A TMG slice is analytically characterized as:

S(y;_E) =_{θ_∈[0,2π) : ˆ_L(z0(θ))> y_}= m

j=1

Θj∩Θy,

wherez0₍_θ_{) =}_z_cos(_θ_{) +}_ν_sin(_θ_{) and}

Θj ={θ∈[0,2π) :Lj>z0(θ) +L>j µ≥cj} Θy =_{θ_∈[0,2π) : N(z

0₍_θ_);₋_{µ, I}₎ N(z0₍_θ_{); 0}_,_Σ) > y}.

The region Θj is the part of the ellipse that lies in the halfspace defined by Lj and cj. Since it is defined by a linear inequality of sin(θ) and cos(θ), it is easily characterized using basic trigonometry. The resulting region may be rewritten as Θj = [0,2π)−(lj, uj) for some 0 < lj ≤ uj < 2π. The intersection of all m regions can be computed asTm_j₌₁Θj = [0,2π)−Sj:lj6=uj(lj, uj), which takes

The inequality defining Θy can be simplified by taking logarithms on both sides, which reduces it to the form:

a0+a1cosθ+a2sinθ+a3cosθsinθ+a4cos2θ >0, (2.7) where ai = ai(µ,Σ, y, z, ν). The roots of this inequality can be obtained by solving a quartic equation. Finally, S(y;E) = Tm_j₌₁Θj ∩Θy can be computed by taking the intersection of the intervals defined by (2.7) and those defined byTm_j₌₁Θj = [0,2π)−

j:lj6=uj(lj, uj).

The most expensive operations in Analytic ESS are generatingE and computingTm_j₌₁Θj, after which sampling from_S(y;_E) is relatively cheap. To see this, letddenote the dimension of X and

m the number of linear truncations. To sample the ellipse E we must draw ν from the Gaussian prior which costsO(d2_{). Computing}Tm

j=1Θj involves takingminner products ofZ =zandνwith

the linear truncations Lj, which can be calculated in O(md), and taking the intersection of the intervals Θjat a cost ofO(mlogm). The total computational overhead is thusO(d2₊_md₊_m_log_m_).

Once this is done, sampling from S(y;E) is cheap. It involves sampling the slice height y, solving the quartic equation from (2.7) for Θy, taking the intersection of Θy and Tm_j₌₁Θj, and sampling uniformly from the resultant domain, all of which can be done inO(d+m) (here we have added in the expense of storing each sample at a costO(d)).

Since the upfront cost of generating _E and computingTm_j₌₁Θj isO(d2+md+mlogm), we can sampleO(d) points per ellipse_Ewithout significantly increasing the total computational complexity. This leads to a high effective sample size relative to the computational complexity.

The Exact-HMC algorithm for TMG [Pakman and Paninski, 2014] has an intimate relationship with Analytic ESS and inspired our analytic framework. We explain this connection in Section 2.5.2.

In document Advances in Bayesian inference and stable optimization for large-scale machine learning problems (Page 36-39)