## arXiv:2012.13829v1 [math.PR] 26 Dec 2020

### Hahn polynomials and the Burnside process

### Persi Diaconis

∗†### Chenyang Zhong

‡†Abstract

We study a natural Markov chain on{0,1,· · ·, n} with eigenvectors

the Hahn polynomials. This explicit diagonalization makes it possible to get sharp rates of convergence to stationarity. The process, the Burn-side process, is a special case of the celebrated ‘Swendsen-Wang’ or ‘data augmentation’ algorithm. The description involves the beta-binomial dis-tribution and Mallows model on permutations. It introduces a useful generalization of the Burnside process.

Dedicated to the memory of Richard Askey

### 1

### Introduction

Over the past 100 years, physicists, geneticists, statisticians, and probabilists have found positive symmetric operators (Markov chains) with orthogonal poly-nomial eigenfunctions. This allows precise asymptotic analysis of high powers of the operator. Orthogonal polynomials come in ‘families’ and an explicit di-agonalization suggests ‘deforming’ the operator so that the full family of eigen-functions appears. These deformations can turn out to be useful and natural. This paper gives an example with these characteristics.

### 1.1

### The Burnside process

Let X be a finite set. Let G be a finite group acting on X. This splits X

into disjoint orbits. Let Ox = {xg : g ∈ G} be the orbit containing x. The
Burnside process gives a practical way to choose an orbit uniformly at random.
This captures a familiar problem: enumerating unlabeled objects. For example,
there arenn−2_{labeled trees on}_{n}_{vertices. There is no formula for the number of}

unlabeled trees. Indeed, there is an emerging literature for choosing a random spanning tree of a graph. As a second example, let X =Gwith G acting on

∗_{Department of Statistics and Mathematics, Stanford University}
†_{Research supported by NSF grant DMS 1954042}

itself by conjugation. Now the orbits are conjugacy classes ofG. For instance, ifG=Sn, the symmetric group, the conjugacy classes are indexed by partitions ofnand the Burnside algorithm gives a novel way to choose a random partition.

Let

Xg={y∈ X :yg=y}, Gx={g∈G:xg=x}. The Burnside process is a Markov chain onX:

• Fromx, chooseg∈Gx uniformly at random

• Fromg, choosey ∈ Xg uniformly at random

One step of the chain goes fromxtoy. The transition matrix of this chain is

K(x, y) = X g∈Gx∩Gy

1

|Gx| 1

|Xg|

, an|X | × |X |matrix.

It is easy to see thatK has stationary distribution

π(x) = z−

1

|Ox|

(z= number of orbits).

This means, if the chain is run from any starting state x0, after ‘a long time’

the orbit containing the current state is close to uniformly distributed.

In this generality, useful quantitative analysis of the Burnside process is an open problem. Even for the symmetric group.

We study a special case where analysis can be pushed through. Let

• X =C2n — the binaryn-tuples

• G=Sn acting by permuting coordinates Ifx∈ X hasj ones (denoted|x|=j), the orbit

Ox={y∈ X :|y|=j}. (1.1)

Here 0≤ |x| ≤nand the Burnside process gives a complicated way of choosing uniformly from{0,1,· · ·, n}. It is instructive to see how the general algorithm specializes:

• Forx∈ X, the permutations fixing xform the subgroupSj×Sn−j (for

j=|x|) permuting the ones among themselves and the zeros among them-selves. It is easy to chooseσ∈Gx at random.

• Forσ∈Sn, writeσas a product of cycles. Label the entries of each cycle,

randomly, by a zero or a one (2c(σ)_{choices if}_{σ}_{has}_{c}_{(}_{σ}_{) cycles). Installing}

A first result of this paper shows that, for this example, a finite number of steps are necessary and sufficient for convergence no matter how large nis. To say this carefully, define

kKxl −πk= 1 2

X

y∈X

|Kl(x, y)−π(y)|.

Theorem 1.1. For X = Cn

2, G = Sn, the Burnside process defined above

satisfies, for all n≥2, starting statex0= (1,1,· · · ,1) and alll∈ {1,2,3,· · · },

1 4(

1 4)

l

≤ kKxl0−πk ≤4(

1 4)

l_{.} _{(1.2)}

The development thus far doesn’t seem to have much to do with orthogonal polynomials. Indeed, we do not really understand where they come from or just when they will appear in other versions of the Burnside process. Let us explain their appearance here.

LetX0=x, X1, X2,· · · be the realization of the Burnside chain of Theorem

1.1. There is enough symmetry around that |X0| =|x|,|X1|,|X2|,· · · forms a

Markov chain on {0,1,2,· · ·, n}. Let Q(i, j) be the transition matrix for this
chain. The explicit form ofQis derived in Proposition 5.2 below (originally it
was written out in [13, (3.1)-(3.3)]). TheQ chain turns out to be symmetric
with a uniform stationary distributionu(j) = _{n}_{+1}1 ,0≤j≤n.

Let Tn

j (x) be the discrete Chebyshev polynomials on {0,· · ·, n}—the or-thogonal polynomials for the uniform distribution ([9]).

Theorem 1.2. For the Markov chainQ(i, j)on{0,· · · , n}, the non-zero eigen-values are given by1and{(

2k k)

2

24k }

⌊n

2⌋

k=1. Moreover the eigenfunctions corresponding

to the zero eigenvalues are the discrete Chebyshev polynomials on{0,1,· · ·, n} with odd degree. The eigenfunction corresponding to the eigenvalue (

2k k)

2

24k is the

discrete Chebyshev polynomial on{0,1,· · ·, n} with degree 2k,1≤k≤ n2.

Theorem 1.2 and classical analysis yield Theorem 1.1. The proof appears in Section 3.

### 1.2

### Hahn polynomials

The discrete Chebyshev polynomials sit naturally in the family of Hahn poly-nomials: forα, β >0,

Qj(x) =3F2(−j, j+α+β−1,−x;α,−n|1) (1.3)

(Set α = β = 1 for discrete Chebyshev). Qj is a polynomial of degree j for 0≤j ≤n. The usual notation indicates withα, β and nsub and superscripts suppressed here. These polynomials are orthogonal with respect to the ‘beta-binomial distribution’ defined and discussed in Section 2.3 below.

It is natural to ask, is there some variant of the Burnside process havingQj as eigenfunctions? This is a mathematicians question, as per the first paragraph

of this introduction. To answer it, we found a useful abstract generalization of the general Burnside process which we hope has a life of its own. Specialized to Cn

2 and Sn it gives a Markov chain on {0,1,· · ·, n} with a ‘beta-binomial distribution’ having Hahn polynomial eigenfunction and allowing a parallel to Theorem 1.1.

Section 2 of this paper sets out needed background: on Markov chains, the Burnside process and orthogonal polynomials (it also gives a brief overview of the many other Markov chains with polynomial eigenfunctions—from Askey-Wilson to Macdonald!) Theorems 1.1 and 1.2 are proved in Section 3. Section 4 explains our ‘twisted Burnside process’ from first principles. The analogues of Theorems 1.1 and 1.2 are stated in Section 5.

### 1.3

### Acknowledgement

Richard Askey’s support and encouragement was crucial throughout our fledgling efforts to learn and apply the beautiful subject that he built. He took us in, pa-tiently fielded ‘stupid questions’, introduced us to his community and published our papers. We also thank Howard Cohl for his help.

### 2

### Background

This section contains background material on the Burnside process and related ‘auxiliary variables’ algorithms (Section 2.1). Needed theory from the analytic-geometric theory of Markov chains are in Section 2.2. Background on orthogonal polynomials and connections to Markov chains (Cannings method) are in Sec-tion 2.3. Since our readership has diverse backgrounds we attempt a friendly tutorial.

### 2.1

### The Burnside process and auxiliary variables

The Burnside process was introduced by Mark Jerrum ([25]) as a contribution to ‘computational Polya theory’. Polya theory is about enumeration under symmetry: “how many different ways can ten dice be colored with red, white, blue where the order of the dice and the symmetry of a die are neglected?” This is a standard part of combinatorics. The book by Polya and Read ([32]) gives a classical account. Jerrum, working with Leslie Ann Goldberg ([26],[20],[21]), managed to show that for carefully selected problems, these computations are #-P complete. Our work shows that for many natural problems, the Burnside process is highly efficient. This opens up a research area—when is it good? Applications of the Burnside process to practical problems in computer science ([23]) suggest it is worth study.

A different motivation comes because the Burnside process is a special case of a wonderful ‘unifying algorithm’ called variously ‘auxiliary variables’, ‘data augmentation’ or ‘hit and run’. Briefly letX be a finite or countable set,π(x)>

Auxiliary variables create ‘non-local’ Markov chains, able to move far away in a single step. They seem to mix much more rapidly than standard local algorithms. Introduce a setI of auxiliary variables. For eachi∈I andx∈ X, choose a proposal kernel wx(i) ≥ 0 (a way of moving from x to i) such that

P

iwx(i) = 1 and for each i, there is at least one x with wx(i) > 0. These ingredients define a joint probabilityf(x, i) =π(x)wx(i) onX ×I. To proceed, for eachi, a Markov chainKi(x, y) with stationary distributionf(x|i) must be specified. The auxiliary variables algorithm is

• Fromx, chooseiwith probabilitywx(i) • Fromi, choosey with probabilityKi(x, y)

The chain moves fromxtoy in one step. The transition matrix is

K(x, y) =X i

wx(i)Ki(x, y).

A direct calculation shows thatπ(x) is a stationary distribution:

X

x

π(x)K(x, y) = X x

π(x)X i

wx(i)Ki(x, y) =X i

X

x

wx(i)π(x)Ki(x, y)

= X

i

m(i)X x

f(x|i)Ki(x, y) =X i

m(i)f(y|i) =π(y) (m(i) =P

xf(x, i) is the marginal distribution). Of course connectedness and aperiodicity ofK must be checked so that the Perron-Frobenius theorem is in force.

A comprehensive review of this class of algorithms is in [2] which gives dozens of examples and special cases including the celebrated Swendsen-Wang algo-rithm for sampling from the Ising model.

The point for now is that essentially none of the algorithms have sharp running time analysis. The Burnside process falls into this class: take I=G,

wx(i) the uniform distribution onGx andKg(x, y) the uniform distribution on

Xg. The hope is, because of the extra group structure available, analysis will be easier for this special case. That was our initial motivation.

There has been some previous effort to study the Burnside process onX =
[k]n_{, with} _{G}_{=}_{S}

n—here [k] ={1,2,· · ·, k}.

Theorems 1.1 and 1.2 studyk= 2. Jerrum ([25]) showed that fork= 2 the total variation mixing time is at most of order√n; sharpening this, Diaconis proved that, fork= 2,kKl

x−πk ≤(1−c)l withc= π1. Theorem 1.1 sharpens this to get the exact rate and makes the connection to orthogonal polynomials (versus the complex, difficult argument of [13]). David Aldous ([1]) used a clever coupling argument to show that, for generalk, kKl

x−πk ≤ n(1−k1)
l_{.}
This impliesl of order lognsteps suffice fork= 2. Further results are in [8].

### 2.2

### Analysis of Markov chain convergence

For readers unfamiliar with Markov chains, the best introduction is the book by David Levin and Yuval Peres [30]. This contains all the bounds below and much more. The analytic-geometric theory of Markov chains is well developed in Laurent Saloff-Coste’s [33].

LetX be a finite set,π(x)>0,P

xπ(x) = 1 a probability distribution onX. A Markov chain onX is specified by a matrixK(x, y)≥0 withP

yK(x, y) = 1. Suppose that πis a stationary distribution forK: P

xπ(x)K(x, y) =π(y) (so

πis a left eigenvector forK with eigenvalue 1). Callπ, K reversible if the
de-tailed balance conditionπ(x)K(x, y) =π(y)K(y, x) is satisfied for allx, y∈ X.
All the Markov chains below satisfy detailed balance. LetL2_{(}_{π}_{) =}_{{}_{f} _{:} _{X →}

R_{:} P

f(x)2_{π}_{(}_{x}_{)} _{<} _{∞}}_{.} _{K} _{acts on} _{L}2_{(}_{π}_{) by} _{Kf}_{(}_{x}_{) =} P

yK(x, y)f(y).
De-tailed balance is equivalent to saying thathKf|gi=hf|KgisoKis a bounded,
self-adjoint operator on L2_{(}_{π}_{). The spectral theorem is in force: there exist}

eigenvaluesβi(1 =β0≥β1≥ · · · ≥β_{|X |−}1≥ −1) and eigenfunctionsψi(x) (so

Kψi(x) =βiψi(x)).

Throughout this section assumeK is connected (for everyx, y∈ X there is

l≥1 withKl_{(}_{x, y}_{)}_{>}_{0) and aperiodic (}_{β}

|X |−1>−1). It is useful to introduce

two distances

• Total variation distance—kKxl−πk=12

P

y|Kl(x, y)−π(y)|
= max_{k}fk∞≤1K

l_{(}_{f}_{)}_{−}_{π}_{(}_{f}_{). Here}_{k}_{f}_{k}

∞= maxx|f(x)|, π(f) =Pπ(x)f(x).

This equality is easy to prove (the maximizingfis 1 atyifKl_{(}_{x, y}_{)}_{≥}_{π}_{(}_{y}_{)}
and−1 otherwise).

• Chi-square distance—χ2x(l) =

P

y

(Kl_{(}_{x,y}_{)}

−π(y))2

π(y) .

The Cauchy-Schwarz inequality implies

4kKxl −πk2≤χ2x(l). (2.1)

This is a useful route to getting bounds on convergence: boundL1 byL2 and use eigenvalues to boundL2. Eigenvalues and eigenfunctions come in by

χ2x(l) =

|X |−1

X

i=1

β2l

i ψi2(x). (2.2)

The following lower bound on total variation and chi-square distance is proved in [15, Lemma 2.1]. It is applied in Section 3.

Proposition 2.1. Suppose thatψ is an eigenfunction of K with eigenvalueβ.
Assumekψk2_{= 1}_{. Then}

χ2x(l)≥ |ψ(x)|2|β|2l.

For ψwith kψk∞<∞,

kKl

x−πk ≥

|ψ(x)||β|l 2kψk∞

### 2.3

### Orthogonal polynomials and Markov chains

Background on orthogonal polynomials is wonderfully presented in the intro-ductory text by Chihara [9]. The recent encyclopedia by Ismail et al [24] brings in much further material.

Hahn polynomials These are the orthogonal polynomials on {0,1,· · ·, n}

with respect to the beta-binomial distribution. For reference and variations of the beta-binomial distribution see [35]. Fixα, β >0. Define

m(j) =

_{n}

j

_{(}_{α}_{)j(}_{β}_{)n}

−j

(α+β)n ,(α)x=α(α+ 1)· · ·(α+x−1),(α)0= 1. (2.3) This is a probability distribution on{0,1,· · ·, n} generated as a beta mixture of binomial distributions

m(j) =

Z 1

0

_{n}

j

xj_{(1}_{−}_{x}_{)}n−jΓ(α+β)
Γ(α)Γ(β)x

α−1_{(1}

−x)β−1_{dx.}

Whenα=β= 1,m(j) = 1

n+1–the uniform distribution.

The orthogonal polynomials form(j) are called Hahn polynomials. Detailed development is in [27]. They have the explicit form

Qj(x) =3F2(−j, j+α+β−1,−x;α,−n|1)

normalized soQj(0) = 1,Qj(n) =

(−β−j)j

(α+1)j . The standard form

rFs(a1,· · ·, ar;b1,· · ·, bs|x) =

∞

X

l=0

(a1· · ·ar)lxl

(b1· · ·bs)ll!

with (a1· · ·ar)l =Q_{i}r=1(ai)l, is used. When α=β = 1 these are the discrete

Chebyshev polynomials

Q0(x) = 1, Q1(x) =n−2x

n , Q2(x) =

6x2_{−}_{6}_{nx}_{+}_{n}_{(}_{n}_{−}_{1)}

n(n−1) .

In Section 3 the proof of Theorem 1.2 uses ‘Cannings argument’ ([7]), a frequently used tool for proving that a Markov chain has orthogonal polynomial eigenfunctions:

‘If the operator K sends polynomials of degree j to polynomials of degree

j for all 0≤j ≤n, then the operator has the orthogonal polynomials for the stationary distribution as its eigenfunctions.’

The following lemma is a formal statement and proof of Cannings lemma. A very similar argument works for multivariate polynomials.

Lemma 2.1. Suppose that K is the transition matrix of a reversible Markov chain with state spaceX ={0,1,· · ·, n} and stationary distributionπ. Suppose

that for any polynomial f on X of degree l ≤n, Kf is a polynomial on X of

degree less than or equal tol. Then we conclude that the eigenfunctions of K

Proof. We prove by induction that for any 0≤l ≤n, the orthogonal polyno-mial forπ of degreel is an eigenfunction ofK. We denote the lth orthogonal polynomial for π byφl. Forl = 0, we haveK~1 =~1, where~1 is the constant function taking value 1. Now forlsuch that 1≤l≤n, by induction hypothesis we suppose that for any 0≤i≤l−1,Kφi=λiφi. Nowφl is a polynomial of degreel, henceKφl is a polynomial of degree≤l. ExpandingKφl in terms of

{φi}li=0, we get

Kφl=λlφl+ l−1

X

i=0

aiφi

for someλl,{ai}l_{i}−=01. Now consider the inner product forπdefined as

hf, giπ := n

X

k=0

f(k)g(k)π(k).

K is self-adjoint with respect to this inner product, hence we have for any 0≤i≤l−1,

hKφl, φiiπ =hφl, Kφiiπ, which leads to

ai=

λihφl, φiiπ

hφi, φiiπ = 0.

Therefore we conclude thatKφl=λlφl.

The connections between orthogonal polynomials and Markov chains is long-standing. Indeed, perhaps the first Markov chain–the Bernoulli-Laplace urn is diagonalized by Hahn polynomials. To briefly recall, Bernoulli and Laplace con-sidered two urns, the left containing n red balls, the right containingn black balls. Each time, a ball is picked uniformly at random and the two balls are switched. The number of red balls in the left urn evolves as a Markov chain on {0,1,· · ·, n}. In [17] this chain is diagonalized and shown to have Hahn polynomial eigenfunctions. That same paper treats the Ehrenfest’s urn (with Krawtchouk polynomial eigenfunctions) and severalq-deformations of these two examples. These urn models, and the birth and death chains discussed below, are examples of ‘local Markov chains’. Most appearances of orthogonal poly-nomials have occurred for such local chains (analogs of the Laplacian). The present examples are much more vigorous.

Jacobi polynomials arise as eigenfunctions of a Markov chain constructed from the ‘Gibbs sampler’ [15]. Extensions of this construction need the full power of the Askey-Wilson polynomials ([6]).

An extensive connection between orthogonal polynomials and birth-death chains follows from the Karlin-McGregor theory. A textbook account with full details is in [3]. Multivariate orthogonal polynomials as in [19] arise in natural genetics problems. See [29],[14],[36] for multivariate Hahn and Krawtchouk polynomials and [16] for Macdonald polynomials. This is just a small sample, drawn from our work with colleagues. The list goes on and on.

### 3

### Proof of Theorems 1.1 and 1.2

The proof consists of three main parts. First we show that the operatorQof Section 1.1 sends polynomials to polynomials. Thus Cannings lemma (Lemma 2.1) shows that the discrete Chebyshev polynomials are the eigenfunctions of the Burnside process (lumped to orbits). Next, the eigenvalues are computed, proving Theorem 1.2. Finally, the eigen structure and analytic tools of Section 2.2 are used to prove Theorem 1.1.

Throughout this section α=β = 1 and the stationary distributionu(j) =

1

n+1 for 0≤j≤n.

### 3.1

### Proof of Theorem 1.2

We denote by Tn

j (x) the discrete Chebyshev polynomial on{0,1,· · ·, n} with degreej below. We also assume that n is even below (the proof for odd n is similar).

Before the proof of Theorem 1.2, we present some preparatory lemmas. The following lemma follows from Polya’s cycle index theorem ([34],[18, Sec-tion 5]). We recall the notaSec-tion. For σ in the symmetric group Sn, write

ai(σ) for the number of i-cycles when σis written in cycle notation. So a1(σ)

is the number of fixed points, a2(σ) is the number of transpositions,· · · Thus

0≤ai(σ)≤nandPn

i=1iai(σ) =n. Write the cycle index ofSn as

Cn(x1,· · ·, xn) = 1

n!

X

σ∈Sn

Y

xai_{i} (σ). (3.1)

The generating function of theseCn isC(t) =P∞n=0Cntn. Polya showed

C(t) =eP∞i=1xitii. (3.2)

Repeatedly differentiating in the xi, setting xi = 1 and comparing coefficient gives

Lemma 3.1. For n= 1,2,· · ·, any1 ≤k1<· · ·< kr ≤n andl1,· · ·, lr ≥1,

ifPn

i=1kili≤n, then for a random permutationσuniformly chosen in Sn, r

Y

i=1

(li)!E_{[}

r

Y

i=1

_{a}

ki(σ)

li

] = r

Y

i=1

1

k_{i}li; (3.3)

while ifPn

i=1kili> n, then

E_{[}

r

Y

i=1

_{a}

ki(σ)

li

] = 0. (3.4)

Below, by “even order terms” we mean monomials of the formαXl1

i1· · ·X

lk ik withi1<· · ·< ik,α∈Randl1,· · ·, lk even.

Lemma 3.2. Suppose thatN, a≥1. The “even order terms” in the expansion of(PN

i=1Xi)2a can be expressed as a linear combination of the following form

k

Y

q=1

( N

X

i=1

X_{i}2lq) (3.5)

for some k≥1,l1,· · ·, lk ≥1 and Pk_{q}=1lq =a. Moreover, the coefficients in

the linear combination only depend on a(and do not depend onN).

Proof. By adding finitely many indeterminates and taking them to be 0 in

the end, we can assume without loss of generality that N ≥ 2a. We note that the “even order terms” of (PN

i=1Xi)2a is a symmetric polynomial in the

indeterminates X2

1,· · ·, XN2. By the proof of [5, Proposition 2.9], it can be expressed as linear combination of the form (3.5), and the coefficients only depend ona(do not depend onN).

Proof of Theorem 1.2, eigenfunction part. First we prove by induction that for

0 ≤k ≤ n

2, T2nk only has even-order terms in (x− n2), and T2nk−1 (for k ≥1)

only has odd-order terms in (x−n

2). For the base case note thatT0n= 1. The

inductive step can be achieved by applying the Gram-Schmidt procedure, as the
uniform distribution on{0,1,· · ·, n}is symmetric around n_{2}. For any 0≤i≤k,
if we write (by the induction hypothesis)

T2i(x) =

i

X

t=0

a2i,2t(x−n

2)

2t_{,} _{(3.6)}

then

n

X

x=0

(x−n_{2})2k+1_{T}
2i(x) =

i

X

t=0

a2i,2t n

X

x=0

(x−n_{2})2t+2k+1 _{= 0}_{.} _{(3.7)}

ThereforeT2k+1only has odd-order terms in (x−n_{2}). Similarly, it can be shown

thatT2k+2 only has even-order terms in (x−n_{2}).

Now asQ(i, j) =Q(i, n−j), we conclude that Q[(x−n2)2k−1] = 0 for any

1 ≤k ≤ n

2. By the previous conclusion, QT

n

2k−1 = 0. Hence {T2nk−1}

n

2

k=1 are

eigenfunctions ofK corresponding to the zero eigenvalues.

In view of Lemma 2.1, in order to show that the rest of the eigenfunctions are also given by the discrete Chebyshev polynomials, it suffices to show that for anyasuch that 1≤a≤ n

2, Q[(x−n2)2a](j) is a polynomial in j of degree

≤2a(wherej∈ {0,1,· · ·, n}). Below we verify this fact.

Suppose that we start from the orbitj. For the first step in the Burnside pro-cess,σ1∈Sj andσ2∈Sn−j are drawn uniformly. Let {ai}ni=1,{bi}ni=1 denote

the number of cycles of lengthiinσ1, σ2, respectively. For the second step, we

la-bel the entries of each cycle by 0/1 independently. LetZi∼Binomial(ai+bi,1_{2}).
The outcome of one iteration of the Burnside process (number of coordinates

labeled by 1) can be represented byX =Pn

i=1iZi. For 1≤a≤ n2, we denote

by

W2a:=E[(X− 1 2n)

2a_{] =}_{E}_{[(}_{X}

−

n

X

i=1

1

2i(ai+bi))

2a_{]}_{.} _{(3.8)}

We note that

W2a =E[E[(X − n

X

i=1

1

2i(ai+bi))

2a_{|}_{σ}

1, σ2]]. (3.9)

Below we denote by ˜W2a(σ1, σ2) the conditional expectation inside the above

expression.

By the multinomial theorem and vanishing of odd moments, we obtain that ˜

W2a(σ1, σ2) = E[(

n

X

i=1

i(Zi− 1

2(ai+bi)))

2a

|σ1, σ2]

= X

j1+···+jn=a,

j1,···,jn≥0

_{2}_{a}

2j1,· · ·,2jn

×

n

Y

i=1

i2ji_{E}_{[(}_{Z}
i−

1

2(ai+bi))

2ji_{|}_{σ}

1, σ2]. (3.10)

By splittingZi into Bernoulli random variables and using the multinomial the-orem again we obtain

E_{[(}_{Z}_{i}_{−}1

2(ai+bi))

2ji_{|}_{σ}

1, σ2] =

1 22ji

X

pi,1+···+p_{i,ai}+_{bi}=ji

pi,1,···,pi,ai+bi≥0

_{2}_{j}

i 2pi,1,· · ·,2pi,ai+bi

.

(3.11) Plugging the above expression into (3.10), we get

˜

W2a(σ1, σ2) = 1

22a

X

n

P

i=1

ai_{P}+bi
q=1

pi,q=a pi,q≥0

n

Y

i=1

i2(pi,1+···+pi,ai+bi)

×

2a

2p1,1,· · ·,2p1,a1+b1,· · ·,2pn,1,· · · ,2pn,an+bn

.(3.12)

We note that by (3.12), 22a_{W}_{˜}

2a can be obtained from “even order terms” in (

n

X

i=1

ai+bi

X

q=1

Xi,q)2a (3.13)

by substitutingXi,q=ifor 1≤q≤ai+bi. Now by lemma 3.2, the “even order terms” of (Pn

i=1

Pai+bi

q=1 Xi,q)2a can be expressed in terms of the following form

k Y r=1 ( n X i=1

ai+bi

X

q=1

withk ≥1, l1,· · ·, lk ≥1 and Pkr=1lr =a. Moreover, the coefficients in the linear combination only depend on a. By substituting Xi,q = i and taking expectation in (3.14), we get

E_{[}

k

Y

r=1

( n

X

i=1

i2lr_{(}_{a}

i+bi))]. (3.15)

In order to make use of Lemma 3.1, we introduce the following subspaces.
We take the base field to beR_{. For every monomial}_{αX}d1

i1 · · ·X

dk

ik (i1 <· · · <

ik,α∈R_{), we define}

B(αXd1

i1 · · ·X

dk

ik) :=α(d1)!· · ·(dk)!

_{X}

i1

d1

· · ·

_{X}

ik

dk

, (3.16)

and extend linearly. We also denote by

X(l1,· · ·, lk) = k

Y

r=1

( n

X

i=1

i2lr_{X}_{i)}_{,} _{(3.17)}

andB(l1,· · · , lr) := B(X(l1,· · ·, lr)). For anyd≥1, we define X(d, a) to be

the set ofX(l1,· · · , lk) with 1≤k≤dandl1+· · ·+lk =a, andB(d, a) to be the set ofB(l1,· · ·, lk) with 1 ≤k≤ dand l1+· · ·+lk =a. We also define

˜

X(d, a) to be the linear span of X(d, a) and ˜B(d, a) to be the linear span of

B(d, a).

Now we prove that for l1 +· · ·+lk = a, ˜X(k, a) = ˜B(k, a), and that it is possible to express X(l1,· · · , lk) in terms of elements of B(k, a) so that

the coefficients do not depend on n. The method is to prove the follow-ing stronger claim by induction on k: for any l1+· · ·+lk = a, ˜X(k, a) =

˜

B(k, a),X(l1,· · · , lk)−B(l1,· · ·, lk)∈X˜(k−1, a), and it is possible to express

X(l1,· · ·, lk) in terms of elements ofB(k, a),B(l1,· · ·, lk) in terms of elements

ofX(k, a) andX(l1,· · · , lk)−B(l1,· · ·, lk) in terms of elements ofX(k−1, a)

so that the coefficients do not depend onn. Whenk= 1,X(l1) =B(l1) for any

l1, and the result holds. For the induction step, we assume that the claim holds

for≤k, and consider thek+ 1 case. We will make use of the following identity:

X(l1,· · ·, lk+1)−B(l1,· · ·, lk+1)

= 1

k+ 1 k+1

X

j=1

(X(l1,· · · ,lˆj,· · ·, lk+1)−B(l1,· · ·,lˆj,· · · , lk+1))

×( n

X

i=1

i2lj_{X}_{i)}

+ 1

k+ 1 k+1

X

j=1

X

j′

6

=j

B(l1,· · ·, lj′+l

j,· · ·,lˆj,· · ·, lk+1).

The identity can be proved by matching individual terms on both sides. Now by the induction hypothesis, whenl1+· · ·+lk+1=a,X(l1,· · · ,lˆj,· · ·, lk+1)−

B(l1,· · · ,lˆj,· · ·, lk+1) is in ˜X(k−1, a−lj). Therefore the first term of the

right hand side is in ˜X(k, a). The second term of the right hand side is in ˜

B(k, a), hence in ˜X(k, a). Thus left hand side is in ˜X(k, a) = ˜B(k, a). Hence ˜

X(k+ 1, a) = ˜B(k+ 1, a). The last conclusion can also be checked from the identity.

NowE_{[}Qk

r=1(

Pn

i=1i2lr(ai+bi))] can be expanded in terms of forms (coeffi-cients do not depend onn, j)

E_{[}Y

r∈Γ

( n

X

i=1

i2lrai)]E_{[} Y

r∈[k]\Γ

( n

X

i=1

i2lrbi)] (3.18)

where Γ⊆[k]. By the above result, the first factor can be expressed in terms of elements inB(|Γ|,P

r∈Γlr), and similarly for the second factor.

Now we denote byB′_{(}_{l}_{1}_{,}_{· · ·} _{, l}_{k) the result obtained by substituting}_{X}

i =ai inB(l1,· · ·, lk), andB′′(l1,· · · , lk) similar with Xi=bi. Note that by Lemma 3.1,

E_{[}_{B}′_{(}_{l}_{1}_{,}_{· · ·}_{, l}_{k)] =} X

i1+···+ik≤j

i2l1−1

1 · · ·i 2lk−1

k . (3.19)

It can be shown by induction onkthat the above expression is a polynomial in

j with degree 2(l1+· · ·+lk). Similarly, it can be shown thatE[B′′(l1,· · · , lk)]

is a polynomial in (n−j) of degree 2(l1+· · ·+lk), hence it is also a polynomial

of the same degree in j with coefficients depending on n (but the coefficient
of highest degree does not depend onn). Putting these together, we conclude
that E_{[}Qk

r=1(

Pn

i=1i2lr(ai+bi))] is a polynomial of degree≤2ain j, and the coefficient of its degree 2aterm does not depend onn.

We conclude thatW2a is a polynomial of degree≤2ainj, and its coefficient of degree 2adoes not depend onn. By the conclusion and proof of Lemma 2.1, the eigenfunctions ofK are given by the discrete Chebyshev polynomials, and the eigenvalue corresponding toTn

2k does not depend onnas long asn≥2k. Now we present the proof of the eigenvalues. We make use of three lemmas in the proof, which we also present below. The first gives needed inequalities for the gamma function. For many further references, see [22].

Lemma 3.3(Explicit Stirling approximation). For any x >0, √

π(x

e)

x_{(8}_{x}3_{+ 4}_{x}2_{+}_{x}_{+} 1

100)

1

6 <Γ(1 +x)<√π(x

e)

x_{(8}_{x}3_{+ 4}_{x}2_{+}_{x}_{+} 1

30)

1 6.

(3.20)

From this it can be easily derived that for anyx≥1, √

2π(x

e)

x√_{x <}_{Γ(1 +}_{x}_{)}_{<}√_{π}_{(}x

e)

x√_{2}_{x}_{(1 +} 1

x)

1

6 (3.21)

Lemma 3.4(Clausen [10], see also [31]).

3F2(2a,2b, a+b;a+b+

1

2,2a+ 2b|x) = (2F1(a, b;a+b+ 1 2|x))

Lemma 3.5(Gauss’s hypergeometric theorem, see Page 2 of [4]). Fora, b, c∈R

such thata+b < c, we have

2F1(a, b;c|1) =

Γ(c)Γ(c−a−b)

Γ(c−a)Γ(c−b). (3.23)

Proof of Theorem 1.2, eigenvalue part. We use the notation as in the statement of Theorem 1.2. Letλk,n be the eigenvalue corresponding to the eigenfunction

Tn

2k for the Markov chainQ(i, j) on{0,1,· · ·, n}. Note that from the determi-nation of eigenfunctions, λk,n is constant for all n≥ 2k, and we denote it by

λk. The strategy of the proof is to analyze the expression forλk in a system of sizenand letn→ ∞to get the desired result.

The eigenfunction that corresponds toλk has been proven previously to be the discrete Chebyshev polynomials Tn

2k. Thus by examining the first row of the eigenvalue-eigenfunction equation, we obtain that (see Proposition 5.2 and [13, (3.1)-(3.3)])

λk,n =

Pn

j=0αnjT2nk(j)

Tn

2k(0) =

n

X

j=0

αnjT2nk(j), (3.24)

whereαn j =

(2j j)(

2(n−j)

n−j )

22n . Plugging the expression

T2nk(j) =

2k

X

l=0

(−2k)l(2k+ 1)l(−j)l 1l(−n)l

1

l! (3.25)

into the expression above, we obtain that

λk =

2k

X

l=0

(−2k)l(2k+ 1)l 1ll!

n

X

j=0

(−j)lαn j

(−n)l . (3.26)

Below we prove that lim n→∞

n

X

j=0

(−j)lαn j (−n)l =

(1_{2})l

(1)l. (3.27)

We fix a smallc >0, and assume thatn is sufficiently large below. Note that by Lemma 3.3, we can derive that fornsufficiently large,

|X

j≤cn

(−j)lαn j (−n)l | ≤

X

j≤cn

2j j

2(n−j)

n−j

22n ≤

2

√

n+

X

1≤j≤cn 4

√

j√n−j ≤

10√c

√

1−c.

(3.28) Similarly,

| X

(1−c)n≤j≤n

(−j)lαn j (−n)l | ≤

10√c

√

Now note that fornsufficiently large, whencn < j <(1−c)n, we have (_{n}j)l_{(1}_{−}
l+1

cn)l≤

(−j)l

(−n)l ≤( j

n)l. Moreover, using Lemma 3.3, we obtain that

1

π

1

√

j√n−j(1+

6

cn)−

2 3 ≤αn

j ≤ π1

1

√

j√n−j(1 +

6

cn)

1

3 forcn < j <(1−c)n. Therefore we have X

cn<j<(1−c)n

(−j)lαn j

(−n)l ≤(1 + 6 cn) 1 31 π X

cn<j<(1−c)n 1

q

j n

q

1−nj (j

n)

l1

n, (3.30)

X

cn<j<(1−c)n

(−j)lαn j

(−n)l ≥(1+ 6

cn)

−2

3(1−l+ 1

cn )

l1

π

X

cn<j<(1−c)n 1

q

j n

q

1−nj (j

n)

l1

n.

(3.31) Now note that

lim n→∞

1

π

X

cn<j<(1−c)n 1

q

j n

q

1−nj (j n) l1 n = 1 π

Z 1−c

c

xl

√

x√1−x. (3.32)

Hence we have 1

π

Z 1−c

c

xl

√

x√1−x−

20√c

√

1−c ≤lim infn→∞

n

X

j=0

(−j)lαn j (−n)l

≤ lim sup

n→∞

n

X

j=0

(−j)lαn j (−n)l ≤

1

π

Z 1−c

c

xl

√

x√1−x+

20√c

√

1−c.

Sendingc→0 gives lim n→∞

n

X

j=0

(−j)lαn l (−n)l =

1 π Z 1 0 xl √

x√1−x=

(1 2)l

(1)l. (3.33)

Now in (3.26) we taken→ ∞and get

λk=

2k

X

l=0

(−2k)l(2k+ 1)l(1_{2})l

(1)l(1)ll! =3F2( 1

2,−2k,2k+ 1; 1,1|1). (3.34)
Takea=−kandb=k+1_{2} in Lemma 3.4, we obtain that

λk= (2F1(−k, k+

1 2; 1|1))

2_{.} _{(3.35)}

Finally, by Lemma 3.5,

2F1(−k, k+1

2; 1|1) =

Γ(1)Γ(1_{2})
Γ(k+ 1)Γ(1_{2}−k) =

(−1)k_{1}_{·}_{3}_{· · · · ·}_{(2}_{k}_{−}_{1)}

2·4· · · · ·(2k) . (3.36) Hence

λk=

2k k

2

### 3.2

### Proof of Theorem 1.1

In this part, we prove Theorem 1.1 based on Theorem 1.2.

Proof of Theorem 1.1. First note that for the starting state x0 = (1,1,· · ·,1),

bySn-invariance ofKl_{(}_{x, y}_{) and}_{π}_{(}_{x}_{) for any}_{x, y, l}_{, we have}

kKl

x0−πk=

1 2

X

y∈X

|Kl

x0(y)−π(y)|=

1 2

n

X

j=0

|Ql

n(j)−u(j)|=kQln−uk, whereu(j) = 1

n+1,0≤j≤n.

Without loss of generality suppose n is even. The analysis for odd n is similar. By Theorem 1.2,

λk=

2k k

2

24k . (3.38)

By Lemma 3.3, we obtain that

λk≤ 1

πk(1 +

1 2k)

1

3. (3.39)

First we show the lower bound. By Proposition 2.1, takingβ =λ1=1_{4} and

ψ=Tn

2, we obtain

kQln−uk ≥ 1 4(

1 4)

l_{.} _{(3.40)}

Now we show the upper bound. For l= 1 the upper bound follows by the fact that total variation distance is always upper bounded by 1. Forl≥2,

4kQln−uk2≤χ2n(l) = n

2 X

k=1

λ2klβ2k(4k+ 1), (3.41)

where β2k ≤ _{n}_{+2}n ≤1 (see [15, Section 2.5] for the estimates related to Hahn
polynomials that are used above). Therefore we obtain that

χ2n(l) ≤ 5 n

2 X

k=1

kλ2kl≤15( 1 16)

l_{+}
n

2 X

k=3

k( 2

πk)

2l

≤ 15(1 16)

l_{+ 27(} 2
3π)

2lX∞

k=1

1

k2 ≤60(

1 16)

l_{.}

Hence we conclude that

kQln−uk ≤4( 1 4)

l _{(3.42)}

for anyl≥2.

Therefore for anyl∈ {1,2,· · · }, 1

4( 1 4)

l

≤ kKxl0−πk ≤4(

1 4)

### 4

### The twisted Burnside process

In this section, we consider a generalization of the original Burnside process, which we will call “twisted Burnside process”. This will allow deformation of the examples above to give a natural process with a larger family of Hahn polynomials as eigenfunctions. Several further examples are presented. We believe that it gives many new examples of easy to run, rapidly converging Markov chains with tunable stationary distributions.

The setting is the same: we have a finite group Gacting on a finite setX; forx∈ X, letGx={g∈G:xg=x}; forg∈G, letXg={x∈ X :xg=x}.

We choose a positive weightwon the groupGand letW(x) be the sum of

w(g) forg∈Gx. We also choose a positive weight von the setX and letV(g)
be the sum ofv(x) forx∈ Xg. The new Markov chain is: fromx, chooseg∈Gx
with probability _{W}w(_{(}g_{x})_{)}; giveng, choosey ∈ Xg with probability _{V}v((yg)); the chain

goes fromxto y.

Proposition 4.1 below gives the stationary distribution of the twisted Burn-side process.

Proposition 4.1. The twisted Burnside process as discussed in the preceding is a reversible Markov chain with stationary distribution

π(x)∝W(x)v(x) (4.1)

forx∈ X.

Moreover, if w is constant on each conjugacy class of G andv is constant

on each orbit ofX (under the action of G), then the chain can be lumped onto orbits ofX. Forx∈ X, letOxdenote the orbit containingx. The lumped chain

is a reversible Markov chain with stationary distribution

˜

π(Ox)∝ W(x)v(x) |Gx|

. (4.2)

Proof. For anyx, y ∈ X, the transition probability from xto y of the twisted Burnside process is given by

Kx,y =

v(y)

W(x)

X

g∈Gx∩Gy

w(g)

V(g). (4.3)

Note that the factor P

g∈Gx∩Gy w(g)

V(g) is symmetric in x, y. Hence if π(x) =

W(x)v(x)

Z forx∈ X (whereZ is a normalizing constant), then we have

π(x)Kx,y=

v(x)v(y)

Z

X

g∈Gx∩Gy

w(g)

V(g) =π(y)Ky,x. (4.4)

This shows that the twisted Burnside process is reversible with stationary dis-tribution proportional toW(x)v(x).

For the second part, we assume thatwis constant on each conjugacy class ofGandvis constant on each orbit ofX. Below we show by Dynkin’s criterion (see [28, Page 133]) that in this case the twisted Burnside process can be lumped onto orbits. Suppose thatx, y ∈ X are in the same orbit. Then there exists

h∈Gsuch thaty=xh_{. Hence}

Gy =h−1Gxh. (4.5)

Aswis constant on each conjugcy class ofG, by (4.5) we have

W(y) = X g∈Gy

w(g) = X g∈Gx

w(h−1_{gh}_{) =} X

g∈Gx

w(g) =W(x). (4.6)

For anyz∈ X, we have

X

q∈Oz

v(q) X g∈Gy∩Gq

w(g)

V(g)

= X

q∈Oz

v(qh_{)} X

g∈Gy∩G_{qh}

w(g)

V(g)

= X

q∈Oz

v(q) X
g∈h−1_{(}_{Gx}_{∩}_{Gq}_{)}_{h}

w(g)

V(g)

= X

q∈Oz

v(q) X g∈Gx∩Gq

w(h−1_{gh}_{)}

V(h−1_{gh}_{)}, (4.7)

where the second equality follows from the fact thatvis constant on each orbit ofX. Now note that

Xh−1_{gh}={xh:x∈ X_{g}}. (4.8)

By (4.8) and the fact thatv is constant on each orbit ofX, we have

V(h−1gh) = X x∈Xh−1gh

v(x) = X x∈Xg

v(xh) = X x∈Xg

v(x) =V(g). (4.9)

By (4.7), (4.9) and the fact thatwis constant on each conjugacy class ofG, we have

X

q∈Oz

v(q) X g∈Gy∩Gq

w(g)

V(g) =

X

q∈Oz

v(q) X g∈Gx∩Gq

w(g)

V(g). (4.10)

Therefore, by (4.3), (4.6) and (4.10), we obtain that

X

q∈Oz

Kx,q=

X

q∈Oz

Ky,q. (4.11)

By Dynkin’s criterion, the chain can be lumped onto orbits of X. Note that by (4.6) and the fact thatv(x) is constant on each orbit, we haveW(x)v(x) is

constant on every orbit. Thus the lumped chain is reversible with stationary distribution

˜

π(Ox)∝ |Ox|W(x)v(x). (4.12)

By the orbit-stabilizer theorem, we have ˜

π(Ox)∝ W(x)v(x) |Gx|

. (4.13)

An example of the twisted version of the Burnside process considered in Theorem 1.1 is presented below. Specializingk= 2 andγ2 = 1 in the example

gives the chain with beta-binomial stationary distribution considered in Section 5 below.

Example 4.1. ConsiderX = [k]n_{and}_{G}_{=}_{S}

n withGacting onX by permuting

coordinates. Fixk positive parametersθ, γ2,· · ·, γk. We take

w(σ) =θc(σ), (4.14)

for every σ ∈ Sn, where c(σ) is the number of cycles of σ. For any ~x =

(x1,· · · , xn)∈ X andj ∈[k], we define

S(~x, j) := #{i∈[n] :xi =j}. (4.15)

We further take

v(~x) = k

Y

j=2

γ_{j}S(~x,j) (4.16)

for every~x∈ X. Below we letγ1:= 1 to simplify notations.

Now we discuss the twisted Markov chain. From~x∈ X, we choose σ∈Sn

fixing ~x with probability _{W}w(_{(}σ_{~}_{x})_{)}. This can be realized as follows: find the set
of indices Ij := {l ∈ [n] : xl = j} for each j ∈ [k]; for each Ij, sample a

permutationσj ∈SIj from the Ewens distribution with parameterθ(see Section

5 for details of the Ewens distribution);σ is the product ofσj for j∈[k].

Given σ, we choose ~y ∈ X fixed by σ with probability _{V}v(_{(}y~_{σ})_{)}. This can be
done as below: breakσ into cycles C1,· · ·, Cm, and denote by cd the length of

the cycle Cd for every d ∈ [m]; for every d ∈ [m], pick an integer rd ∈ [k]

with probability _{P}γcdrd
k
l=1γlcd

, and takeyi =rd for every i∈Cd. Note that for any

~y∈ Xσ, yi for i∈Cd takes the same value (assuming that it’s rd). Hence the

probability of generating ~y (where~y∈ Xσ) through this procedure is m

Y

d=1

γcd rd

Pk

l=1γlcd

∝

m

Y

d=1

γcd rd =

k

Y

j=2

γ_{j}S(~y,j). (4.17)

Note that w is constant on each conjugacy class ofG. Moreover, v(~x)only depends on S(~x, j) for j ∈ [k], hence v is constant on each orbit of X. By

Proposition 4.1, the twisted Markov chain can be lumped onto orbits ofX. For

t1,· · · , tk ∈ N such that Pkl=1tl = n, let~t:= (t1,· · ·, tk) denote the orbit of

X consisting of ~x= (x1,· · · , xn)such that S(~x, j) =tj for every j∈[k]. Note

that for any~xin the orbit~t, we have|G~x|=Qkj=1(tj)!and

W(~x) = X σ∈Sn:σfixes~x

θc(σ)_{=}

k

Y

j=1

(θ· · ·(θ+tj−1))∝ k

Y

j=1

Γ(tj+θ). (4.18)

Thus by Proposition 4.1, the stationary distribution of the lumped chain is

˜

π(~t)∝

k

Y

j=1

Γ(tj+θ) (tj)!

k

Y

j=2

γ_{j}tj. (4.19)

Note that the termQk

j=1 Γ(tj+θ)

(tj)! is proportional to the probability corresponding

to the symmetric Dirichlet-multinomial distribution of parameter(θ,· · · , θ).

We close with a final remark. There are many probability measures onSn
that are constant on conjugacy classes. One way to construct these is to
de-fine P(σ) as proportional to θd(σ) _{where} _{d}_{(}_{σ}_{) =} _{d}_{(}_{id, σ}_{) for} _{d} _{a bi-invariant}

metric onSn. In turn, such bi-invariant metrics can be constructed as follows:
Letρ :Sn → GL(V) be a faithful unitary representation ofSn. Let k.k be a
unitarily invariant norm onV. Then d(σ, τ) =kρ(σ)−ρ(τ)k is a bi-invariant
metric on Sn. In particular, the Cayley distance d(σ, τ) = n−C(στ−1_{) =}

min #transpositions required to bringσtoτis bi-invariant, giving the example used above. Similarly the Hamming distance #{i withσ(i) different fromτ(i)}

is bi-invariant. A host of other examples appear in [12, Chapter 6C]. This in-cludes von Neuman’s useful characterization of unitarily invariant matrix norms.

### 5

### The twisted Burnside process and Hahn

### poly-nomials

In this section, we add a parameter to the Burnside process on Cn

2 with the

group Sn so that more general Hahn polynomials appear as eigenfunctions. The idea is simple: replace the uniform distribution onSn, used in step one of the algorithm, by the Ewens distribution

P_{θ(}_{σ}_{) =} 1

(θ)nθ

c(σ)_{,}_{(}_{θ}_{)n} _{=}_{θ}_{(}_{θ}_{+ 1)}_{· · ·}_{(}_{θ}_{+}_{n}_{−}_{1)} _{(5.1)}

where c(σ) is the number of cycles inσ and 0 < θ <∞ is a parameter. This familiar distribution is studied in genetics and combinatorics [11]. It may be seen as ‘the Mallows model through the Cayley metric’ as discussed in the end of Section 4. This chain was discovered via the twisted Burnside construction of Section 4. In hindsight, the following simplified description is available.

OnCn

• IdentifyGx withS|x|×Sn−|x|

• Pickσ∈S|x|×Sn−|x| choosing the two components independently from

the Ewens measure (5.1)

• Breakσinto cycles and label the cycles 0/1 with probability 12. Put this

0/1 string intoy∈Cn

2.

The argument of Section 4 shows

Proposition 5.1. The twisted Burnside process given above, lumped to orbits, is a Markov chain on{0,1,· · ·, n} with a beta-binomial distribution having

pa-rametersα=β=θ (see (2.3)).

The transition matrix of this Markov chain, call it pn,θ_{ij} , can be written
explicitly.

Proposition 5.2. Consider the twisted Burnside process given above, lumped to orbits. The transition matrix is given by

pn,θ_{0}_{j} =

_{n}

j

θ

2· · ·(

θ

2+j−1)

θ

2· · ·(

θ

2 +n−j−1)

θ(θ+ 1)· · ·(θ+n−1) , (5.2)

pn,θ_{jk} = X

max{0,j+k−n}≤l≤min{j,k}

pj,θ_{0}_{l} pn_{0}_{,k}−_{−}j,θ_{l}. (5.3)

Proof. We prove this using Polya’s identity. Specifically, for indeterminates

x1, x2,· · · , xn, we define

pn(x1,· · ·, xn) =

1

n!

X

g∈Sn n

Y

i=1

xai_{i} (g), (5.4)

whereai(g) is the number of cycles of lengthiing. Then Polya’s identity states that

∞

X

n=0

tnpn =

∞

Y

i=1

eti xii . (5.5)

We also have

pn,θ0j =

n!

θ(θ+ 1)· · ·(θ+n−1) 1

n!

X

g∈Sn,λ⊢j j

Y

i=1

ai(g)

bi(λ)

(θ 2)

a1(g)+···+an(g)_{.} _{(5.6)}

Using Polya’s identity by taking derivatives and multiplying, we get

pn,θ0j =

n j

θ

2· · ·(

θ

2+j−1)

θ

2· · ·(

θ

2 +n−j−1)

θ(θ+ 1)· · ·(θ+n−1) . (5.7) Moreover, by the definition of the twisted Burnside process given above, we have

pn,θ_{jk} = X

max{0,j+k−n}≤l≤min{j,k}

Theorems 1.1 and 1.2 above deform in the following form.

Theorem 5.1.Consider the twisted Burnside process onCn

2 given above, lumped

to orbits. The non-zero eigenvalues of the Markov chain are given by1 and

λk =

2k

X

l=0

(θ_{2})l(−2k)l(2k+ 2θ−1)l
(θ)l(θ)l

1

l! (5.9)

for 1 ≤k ≤ n

2. Moreover, the eigenfunctions corresponding to the zero

eigen-values are the Hahn polynomials on{0,1,· · ·, n} with parameters α=β =θ of

odd degree. The eigenfunction corresponding to the eigenvalue λk is the Hahn

polynomial on{0,1,· · ·, n}with parametersα=β =θof degree2k,1≤k≤ n2.

Theorem 5.2. Consider the twisted Burnside process on Cn

2 given above with

θ∈[1,2]. Denote by πthe stationary distribution of the chain (see Proposition

5.1), and denote by Kl

x0 for x0 = (1,1,· · ·,1) the distribution after l steps starting from nones. Then there exist positive constants c(θ), C(θ) which only depend onθ, such that for all n≥2 and alll≥25,

c(θ)( 1 2(1 +θ))

l_{≤ k}_{K}l

x0−πk ≤C(θ)(

1 2(1 +θ))

l_{.} _{(5.10)}

The proofs of Theorems 5.1 and 5.2 are similar but quite a bit more involved, to the proofs of Theorems 1.1 and 1.2. The restriction thatθ∈[1,2] in Theorem 5.2 is due to a technical step in our proof (for showing monotonicity of a further lumping of the chain). We refer the interested reader to [38],[37]. This develops things fork≥2 and has other approaches to proof.

We have not (yet) succeeded in finding a two-parameter deformation of the Burnside process onCn

2 which gives the full set of Hahn polynomials as

eigen-functions. Similarly, we have not succeeded in diagonalizing the Burnside
pro-cess on [k]n _{for any}_{k}_{≥}_{3.}

The point of this paper was to show (a) that orthogonal polynomials ‘pop up’ everyplace (b) seeing orthogonal polynomials as belonging to families leads to useful extension of classical algorithms.

### References

[1] Aldous, D., and Fill, J. Reversible Markov chains and random walks on graphs, 1995.

[2] Andersen, H. C., and Diaconis, P. Hit and run as a unifying device.

Journal de la soci´et´e fran¸caise de statistique 148, 4 (2007), 5–28.

[3] Anderson, W. J. Continuous-time Markov chains: An applications-oriented approach. Springer Science & Business Media, 2012.

[4] Bailey, W. N. Generalized hypergeometric series. Stechert-Hafner, 1964. [5] Borodin, A., and Olshanski, G. Representations of the infinite

sym-metric group, vol. 160. Cambridge University Press, 2017.

[6] Bryc, W., and Weso lowski, J. Askey-Wilson polynomials, quadratic harnesses and martingales. The Annals of Probability (2010), 1221–1262. [7] Cannings, C.The latent roots of certain Markov chains arising in genetics:

a new approach, i. Haploid models. Advances in Applied Probability 6, 2 (1974), 260–290.

[8] Chen, W.-K. Mixing times for Burnside processes. Master’s thesis, Na-tional Chiao Tung University, 2006.

[9] Chihara, T. S. An introduction to orthogonal polynomials. Courier Cor-poration, 2011.

[10] Clausen, T. Ueber die f¨alle, wenn die reihe von der form y= etc. ein quadrat von der form z= etc. hat. Journal f¨ur die reine und angewandte Mathematik 1828, 3 (1828), 89–91.

[11] Crane, H. The ubiquitous Ewens sampling formula. Statistical science 31, 1 (2016), 1–19.

[12] Diaconis, P. Group representations in probability and statistics. Lecture

notes-monograph series 11 (1988), i–192.

[13] Diaconis, P. Analysis of a Bose–Einstein Markov chain. In Annales de l’Institut Henri Poincare (B) Probability and Statistics (2005), vol. 41, Elsevier, pp. 409–418.

[14] Diaconis, P., and Griffiths, R. An introduction to multivariate Krawtchouk polynomials and their applications.Journal of Statistical

Plan-ning and Inference 154 (2014), 39–53.

[15] Diaconis, P., Khare, K., and Saloff-Coste, L. Gibbs sampling, exponential families and orthogonal polynomials. Statistical Science 23, 2 (2008), 151–178.

[16] Diaconis, P., and Ram, A. A probabilistic interpretation of the Mac-donald polynomials. The Annals of Probability 40, 5 (2012), 1861–1896. [17] Diaconis, P., and Shahshahani, M. Time to reach stationarity in the

Bernoulli–Laplace diffusion model. SIAM Journal on Mathematical Anal-ysis 18, 1 (1987), 208–218.

[18] Diaconis, P., and Shahshahani, M. On the eigenvalues of random matrices. Journal of Applied Probability 31, A (1994), 49–62.

[19] Dunkl, C. F., and Xu, Y. Orthogonal polynomials of several variables. No. 155. Cambridge University Press, 2014.

[20] Goldberg, L. A. Automating P´olya theory: The computational com-plexity of the cycle index polynomial. Information and Computation 105, 2 (1993), 268–288.

[21] Goldberg, L. A. Computation in permutation groups: counting and

randomly sampling orbits. LONDON MATHEMATICAL SOCIETY

LEC-TURE NOTE SERIES (2001), 109–143.

[22] Gordon, L.A stochastic approach to the gamma function. The American

Mathematical Monthly 101, 9 (1994), 858–865.

[23] Holtzen, S., Millstein, T., and Van den Broeck, G. Generating and sampling orbits for lifted probabilistic inference. In Uncertainty in

Artificial Intelligence (2020), PMLR, pp. 985–994.

[24] Ismail, M., Ismail, M. E., and van Assche, W.Classical and quantum orthogonal polynomials in one variable, vol. 13. Cambridge university press, 2005.

[25] Jerrum, M. Uniform sampling modulo a group of symmetries using Markov chain simulation. Expanding Graphs 10 (1993), 37–47.

[26] Jerrum, M. Computational P´olya theory. Citeseer, 1995.

[27] Karlin, S., and McGregor, J. The Hahn polynomials, formulas and an application. Scripta Math. 26 (1961), 33–46.

[28] Kemeny, J. G., and Snell, J. L.Finite Markov chains. Springer-Verlag, New York, 1976.

[29] Khare, K., and Zhou, H. Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions. The Annals of Applied Probability 19, 2 (2009), 737–777.

[30] Levin, D. A., and Peres, Y. Markov chains and mixing times, vol. 107. American Mathematical Soc., 2017.

[31] Milla, L. A detailed proof of the Chudnovsky formula with means of ba-sic complex analysis–Ein ausf¨uhrlicher Beweis der Chudnovsky-Formel mit elementarer Funktionentheorie. arXiv preprint arXiv:1809.00533 (2018). [32] Polya, G., and Read, R. C. Combinatorial enumeration of groups,

graphs, and chemical compounds. Springer Science & Business Media, 2012. [33] Saloff-Coste, L. Lectures on finite Markov chains. In Lectures on

probability theory and statistics. Springer, 1997, pp. 301–413.

[34] Shepp, L., and Lloyd, S. Ordered cycle lengths in a random permu-tation. Transactions of the American Mathematical Society 121, 2 (1966), 340–357.

[35] Wilcox, R. R. A review of the beta-binomial model and its extensions.

Journal of Educational Statistics 6, 1 (1981), 3–32.

[36] Xu, Y. Tight frame with Hahn and Krawtchouk polynomials of several variables. SIGMA. Symmetry, Integrability and Geometry: Methods and Applications 10 (2014), 019.

[37] Zhong, C. A Ewens deformation of a Bose-Einstein Markov chain, in preparation.