2.6 Examples of weakly dependent spin systems
2.6.4 Models with exclusion: Random coloring and hard-core model 46
space in the random coloring model is the set of all proper colorings πΊ0 β πΆπ, i. e.
the set of all π β πΆπ such that {π£,π€} β πΈ β ππ£ ΜΈ= ππ€, and π = π(πΊ,πΆ) denotes the uniform distribution on πΊ0.
Proposition 2.20. Let πΊ= (π, πΈ) be a simple graph with maximum degree π₯ and π β₯ 2π₯+1. Then the conditions of Theorem 2.14 hold for the random coloring model ππΊ with πΌ1, πΌ2 depending on π₯ and π only.
Proof of Proposition 2.20. Let us first show that π½π£,π€ := π₯+11 1π£βΌπ€ can be used as an interdependence matrix. To see this, let π1, π2 β πΊ0 be two colorings that differ
2.6 Examples of weakly dependent spin systems 47
only in one vertex π£1, and π£2 ΜΈ= π£1. In the case π£1 βΌ π£2 the measures π(Β· | πππ£2) are uniform on πΆβ{πππ£π : π£π βΌ π£1} for π = 1,2, and hence
πTV(π β πΊ(Β· | π1π£2), ππΊ(Β· | π2π£2))
= 1 2
(οΈ 1
π β |{π1π£2 : π£2 βΌ π£1}| + 1
π β |{π2π£2 : π£2 βΌ π£1}|
)οΈ
β€ 1
π β π₯ β€ 1 π₯+ 1. On the other hand, if π£2 ΜΈβΌ π£1, then ππΊ(Β· | πππ£2) are equal and thus π½π£1,π£2 = 0. So, by the symmetry of π½ we have
|π½|2β2 β€ |π½|βββ β€max
π£βπ
βοΈ
π€βπ
π½π£,π€ β€ π₯
π₯+ 1 <1.
Moreover, we have to show ΜοΈπ½(ππΊ) β₯ π(π,π₯) . Let π ( π be a nonempty collection of vertices, π£1 β π/ and ππ β πΆπ be a proper coloring of the induced graph πΊ |π= (π, πΈπβ© π Γ π) and ππ£1 β πΆβ{ππ£2 : π£2 β π, π£2 βΌ π£1}. Using the definition πΊ0(π») for the set of all proper colorings of an arbitrary graph π» with a fixed number of colors π, we have
ππΊ(ππ£1 | ππ) = ππΊ(ππ£1, ππ)
ππΊ(ππ) = |πΊ0(πΊ |π)|
|πΊ0(πΊ |πβͺπ£1)|. (2.30) It is clear that |πΊ0(πΊ |π)| = πβ1|πΊ0(πΊΜοΈπ)|, where πΊΜοΈπ is obtained by adding an isolated vertex π£1 to π. Hence we fix the vertex set π βͺ π£1 and rewrite equation (2.30) as follows. Let π(π£1, π) = {π£2 β π : π£2 βΌ π£1}= {π1, . . . , ππ}be the neighbors of π£1 in π and for any π€1, . . . , π€πβ π(π£1, π) we let ππ = {π£1, π€π} and πΊπ1,...,ππ be the graph with edge set (πΈπβ© π Γ π) βͺ {π1, . . . , ππ}. This leads to
ππΊ(ππ£1 | ππ) = πβ1
π
βοΈ
π=1
|πΊ0(πΊπ1,...,ππβ1)|
|πΊ0(πΊπ1,...,ππ)| β₯ πβ1(οΈ π₯+ 1 π₯+ 2
)οΈπ
β₯ πβ1(οΈ π₯+ 1 π₯+ 2
)οΈπ₯
, where the inequality follows from [Jer95, equation (2)], stating that the ratios are bounded from below by a constant π(π₯).
Finally, the case π = β is much easier, as π(ππ) = πβ1 due to the invariance of the random coloring model induced by a relabeling of the colors πΆ.
Another model with hard constraints is the hard-core model with fugacity parameter π. Given a graph πΊ = (π,πΈ) and π > 0, the hard-core model π = ππΊ,π
is the spin system on π΄ = {0,1}π satisfying
π(π) =
{οΈπβ1βοΈ
ππππ π admissible
0 otherwise .
Here, an admissible configuration satisfies ππ£ππ€ = 0 for all {π£, π€} β πΈ.
Let us first consider the following example, in which the partition function π
can be easily calculated.
Example 2.21. Consider the star ππ with π neighbors and a hub vertex β. In this case, it is easy to calculate that the normalizing constant is given by
π = π +
π
βοΈ
π=0
(οΈπ π
)οΈ
ππ = π + (1 + π)π.
Here, the π corresponds to a particle being at the hub and the sum is over all possible configurations of particles at the leaves.
Proposition 2.22. Let πΊ = (π,πΈ) be any graph with maximum degree π₯ and π <(π₯ β 1)β1. The conditions of Theorem 2.14 hold for the hard core model ππ with fugacity π, and πΌ1, πΌ2 depend on π₯ and π.
Proof of Proposition 2.22. Since we are going to require hard-core models corre-sponding to various graphs, we write ππΊ to emphasize the graph under considera-tion. The fugacity π will not change.
Firstly, let us show that π½π£1,π£2 = 1+ππ 1π£1βΌπ£2 can be used as an interdependence matrix. Let π£1 β π be a site, π1, π2 β π΄ be two admissible configurations differing only at π£1, i. e. ππ£11 = 1, ππ£21 = 0, and π£2 β π be another site. If π£2 βΌ π£1, then ππΊ(1 | π1π£1) = 0, whereas ππΊ(1 | π1π£1) = 1+ππ . If π£2 ΜΈβΌ π£1 we have ππΊ(Β· | π1π£1) = ππΊ(Β· | π2π£1) and so π½π£1,π£2 = 0. Now, by the symmetry of π½ the inequality
|π½|2β2 β€ |π½|βββ β€ π₯1+ππ < 1 holds, where the last step is a consequence of π < π₯β11 .
Secondly, to see that there is a lower bound on the conditional probabilities, let us first consider the case π = β . Let π£ β π be arbitrary, write π(π£) for the neighborhood of π£ and π΄ for the complement of π£ βͺ π(π£), and observe that
ππΊ(ππ£ = 1) = ππΊ(ππ£ = 1, ππ (π£) = 0) = πβ1βοΈ
πΜοΈπ΄
π1+|ΜοΈππ΄|
= ππβ1βοΈ
ΜοΈππ΄
π|πΜοΈπ΄|=: ππβ1ππ΄,
where the summation is over all admissible (partial) configurations ΜοΈππ΄ such that the configuration (ΜοΈππ΄,1, 0, . . . ,0) is admissible. Note that due to ππ (π£) = 0 these are actually all admissible configurations of the graph πΊ |π΄ induced by π΄. The normalization constant π can be bounded from above and below by
(1 + π)ππ΄β€ π = βοΈ
ΜοΈππ΄ adm.
π|ΜοΈππ΄|βοΈ
πΜοΈπ΄π
π|ΜοΈππ΄π|1(ΜοΈππ΄,ΜοΈππ΄π) adm.β€(1 + 2π₯)ππ΄.
Here, the first inequality follows by considering the two configurations ππ£ = 1, ππ (π£) = 0 and ππ£βͺπ (π£)= 0 only, and the second one is a consequence of the fact that there can be at most 1 + 2π₯ admissible configurations for π£ βͺ π(π£). This
2.6 Examples of weakly dependent spin systems 49
leads to the bounds
π
1 + 2π₯ β€ ππΊ(ππ = 1) β€ π
1 + π. (2.31)
The case π ΜΈ= β follows by a reduction argument. Let ΜοΈππ be an admissible configuration of πΊ |π, π := {π€ β π : ππ€ = 1} β π be the occupied sites in π and π(π ) = βͺπ£βππ(π£) be the neighborhood of π . For any π£ β π(π ) we have ππΊ(ππ£ = 1 |πΜοΈπ) = 0, so that we only need to consider π£ /β π(π ).
However, it is an elementary calculation to show that conditional on ππ (π ) = 0, ππ£ and ππ are independent random variables (the easiest way to see this heuristically is to observe that π only excludes two particles at neighboring vertices, and π£ and π have distance at least 2). More precisely, we have
ππΊ(ππ£ = 1 | ππ =ΜοΈππ) = ππΊ(ππ£ = 1 | ππ (π ) = 0).
The probability on the right hand side is equal to the probability that π£ is occupied in the graph π induced by [π]βπ(π ), i. e.
ππΊ(ππ£ = 1 | ππ =ΜοΈππ) = ππΊ|π (ππ£ = 1).
From inequality (2.31) we obtain an upper and lower bound on the probability, leading to ΜοΈπ½(ππΊ) β₯ π(π₯, π).
CHAPTER 3
Bernstein-type inequalities
The goal of this chapter is to establish concentration bounds which are akin to Bernstein or HansonβWright inequalities. By this, we mean upper bounds of the type
π(οΈπ β₯ Eππ+π‘)οΈ β€ exp(οΈ
β π‘2
2(π + ππ‘))οΈ or π(οΈπ β₯ Eππ+π‘)οΈ β€ exp(οΈ
βmin(οΈπ‘2 π,π‘
π )οΈ)οΈ
. As both inequalities show two different levels of tail decay (the Gaussian one for π‘ β€ ππβ1 and an exponential one for π‘ > ππβ1), we use the terminology of Adamczak [ABW17; AKPS19] and call them two-level deviation inequalities (or concentration inequalities, if a lower bound holds as well).
Let us first provide some historical remarks on the term Bernstein inequality. As it was not possible to find the original publication of Bernstein [Ber24], we refer to the work of Bennett [Ben62] (however, note that Bennett also did not have access to the publication, but in turn relied on the two works of Craig [Cra33]
and Godwin [God55]). In its simplest form (for bounded random variables), it states that if π1, . . . , ππ are centered random variables satisfying |ππ| β€1 for all π β[π], then for ππ := βοΈππ=1ππ and all π‘ β₯ 0 we have
P(ππβ₯ π‘) β€ exp(οΈ
β π‘2
2Var(ππ) + 23π π‘ )οΈ
, so that in this case π = Var(ππ) and π = 13π.
It is striking that Bernstein inequalities appear in many different contexts.
Apart from the classical example above, they appear in the study of empirical processes, i. e. random variables of the form
π := sup
π ββ±
β
β
β
π
βοΈ
π=1
π(ππ)β
β
β, (3.1)
where ππ are independent random variables and β± is a (countable) class of functions. It was initiated in [Tal96b, Theorem 1.4] using the powerful induction method and convex distance-type inequalities (see [Tal96b, Theorem 4.2]). In
subsequent works [Mas00, Theorem 3], [Rio02, ThΓ©orΓ¨me 1.1], [Bou02, Theorem 2.3], [KR05, Theorems 1.1 and 1.2] [Ada08, Theorem 4] either the entropy or the martingale method was applied to prove similar inequalities. For example, it can be deduced from Bousquetβs result that for independent and identically distributed random variables π1, . . . , ππ and a class of functions β± such that E π (π1) = 0 for all π β β± it holds
P(π β E π β₯ π‘) β€ exp
(οΈβ πβ(οΈπ‘ π
)οΈ)οΈ
for π = π supπ ββ±Var(π(π1)) and β(π₯) = (1 + π₯) log(1 + π₯) β π₯. As β(π₯) β₯ π₯2/2(1 + π₯3), this especially implies
P(π β E π β₯ π‘) β€ exp
(οΈβ π‘2 2(π +13π‘)
)οΈ
,
so that the bound holds with π = π and π = 13. Note that in this case a stronger estimate, i. e. Bennettβs inequality, holds. If one drops the assumption of identical distributions, the result with the best constants was obtained by Klein and Rio, who prove a Bernstein-type inequality for the right tail. We omit the details.
Furthermore, van de Geer and Lederer [GL13] have defined a norm which can precisely capture such Gaussian and exponential behavior, which they termed BernsteinβOrlicz norm. It is defined in terms of the increasing, convex function πΉπΏ(π₯) := exp(οΈ
πΏβ1(β
1 + 2πΏπ§ β 1)2)οΈ
β1 for some fixed πΏ > 0, i. e. for any random variable π they set
βπβπΉπΏ = inf{π‘ > 0 : E πΉπΏ(|π|/π‘) β€ 1}.
Many of the results given below can be phrased in terms of these BernsteinβOrlicz norms, but for readabilityβs sake we choose not to do so.
Lastly, we want to mention that Bernstein inequalities are not restricted to real value random variables. Oliveira [Oli10] and Tropp [Tro12] independently developed Bernstein-type inequalities for a sum of random matrices, which give concentration inequalities for the largest eigenvalue of a sum of random Hermitian matrices. As we will not need these results, we refer the interested reader to the monograph [Tro15].
3.1 General results
Let (πΊ, β±, π) be a probability space and let π€ be a difference operator on π β πΏβ(π). We say that π satisfies a π€ βmLSI(π), if for all π β π we have
Entπ(ππ) β€ π
2Eππ€(π)2ππ. (3.2)
3.1 General results 53
We will suppress the dependence of this definition on the class π, as it should be clear from the context.
The first main result are the following Bernstein-type deviation and concentra-tion inequalities.
Theorem 3.1. Assume that π satisfies a π€ βmLSI(π) for some difference operator π€ and π > 0. Let π, π be two measurable functions such that π€ (π) β€ π, and π
Let us provide some remarks on Theorem 3.1.
Remark. 1) The fact that π€ βmLSI(π) provides sub-Gaussian concentration for functions π satisfying π€ (π) β€ 1 is well known. More precisely, [BG99] shows that in this case any differentiable function π’, see also Section 3.2), [BG99] also proves that the exponential moments of ππ2 can be controlled for π‘ β [0,1/(2π)), which is expressed in the inequality
In contrast, Theorem 3.1 does neither assume π€ to be a derivation, nor that there is a uniform bound on π€ (π).
2) The somewhat unsatisfactory factor 4/3 cannot be improved using our method.
It is possible to modify our proofs in order to apply [KZ18, Lemma 1.3], which leads to an inequality of the form
π(οΈπ β Eππ β₯ π‘)οΈ β€exp(οΈ
β πmin(οΈ π‘2
π(Eππ)2+ 2π2π2, π‘
β2ππ )οΈ)οΈ
for some absolute constant π (the same one as in [KZ18]). However, this is at
the cost of a weaker denominator in the Gaussian term as compared to (3.3), and so we choose to present it in the form (3.3).
3) It is easy to see that for any π,π β₯ 0 such that ππ ΜΈ= 0 the function ππ,π(π‘) = π‘2/(2π + 2ππ‘) is invertible with inverse πβ1π,π(π‘) = ππ‘ +β
π2π‘2+ 2ππ‘, so that the Bernstein-type inequality can equivalently be written as
π(οΈ
π β Eππ β₯ ππ‘+β
π2π‘2 + 2ππ‘)οΈ
β€exp(βπ‘) for π‘ β₯ 0.
In the same way, ππ,π(π‘) = min(π‘2/π2, π‘/π) has inverse max(β
π2π‘, ππ‘), which allows to rewrite the HansonβWright-type inequality in a similar fashion.
Next, we consider a special class of functions, for which analogue results can be obtained - the self-bounding functions. In our framework, given a difference operator π€ , we say that π β₯ 0 is a (π,π)βself-bounding function (with respect to π€) for some π, π β₯ 0, if
π€(π)2 β€ ππ+ π.
For a product measure π, there are various sources that provide deviation or concentration inequalities for self-bounding functions, see e. g. [BLM00, Theorem 2.1], [Rio01, ThΓ©orΓ¨me 3.1], [BLM03, Theorem 5], [BBLM05, Corollary 1], [Cha05, Theorem 3.9], [MR06, Theorem 1] and [BLM09, Theorem 1]. As many of the proofs rely on the entropy method, it is an easy task to generalize some of the results to the framework of difference operators and obtain Bernstein-type inequalities.
Proposition 3.2. Assume that π satisfies a π€ βmLSI(π) and let π β₯ 0 be a (π,π)βself-bounding function. Then we have for all π‘ β₯ 0
π(οΈπ β Eππ β₯ π‘)οΈ β€exp(οΈ
β π‘2
π(4π Eππ + 4π +23ππ‘) )οΈ
.
Furthermore, if π€(ππ) = |π|π€ (π) for all π β R, then for all π‘ β [0, E π] it holds π(οΈ
Eππ β π β₯ π‘)οΈ β€exp(οΈ
β π‘2
π(4π E π + 4π +23ππ‘) )οΈ
.
As we will show in Proposition 3.10, product measures always satisfy an mLSI with respect to the difference operator used in the works mentioned above. This is a well-known fact and was first proven in [Mas00].
We are also able to prove a version of Talagrandβs famous concentration in-equality for the convex distance for random permutations by similar means as used in the proofs of the upper results. To this end, recall that for any measurable space πΊ and any π = (π1, . . . , ππ) β πΊπ, we may define the convex distance of π to some measurable set π΄ β πΊπ by
ππ(π, π΄) := sup
πΌβRπ:|πΌ|2=1
ππΌ(π, π΄), ππΌ(π, π΄) := inf
πβ²βπ΄ππΌ(π, πβ²) := inf
πβ²βπ΄ π
βοΈ
π=1
|πΌπ|1ππ=πβ²π.
3.1 General results 55
Proposition 3.3. Let ππ be the symmetric group and ππ be the uniform distribu-tion on ππ. For any set π΄ β ππ with ππ(π΄) β₯ 1/2 and all π‘ β₯ 0 we have
ππ(ππ(Β·, π΄) β₯ π‘) β€ 2 exp(οΈ
β π‘2 32
)οΈ
. (3.5)
Talagrandβs celebrated convex distance inequality (see [Tal95, Theorem 5.1]) states that for any subset π΄ β ππ it holds
ππ(π΄) Eππexp(οΈ 1
16ππ(Β·, π΄)2)οΈ
β€1, (3.6)
which, in particular, easily implies (3.5) with a constant 16 instead of 32. An inequality similar to (3.6) was deduced for product measures in [Tal95], and reproven using the entropy method in [BLM09]. Furthermore, [Pau14] extended Talagrandβs inequality to weakly dependent random variables. However, it does not seem possible to adjust the method therein to the case of the symmetric group and so we are not aware of any proof of either of the inequalities using the entropy method. In [Sam17] the author has proven the convex distance inequality for the symmetric group using weak transport inequalities.
Lastly, we show Bernstein-type concentration inequalities for multilinear poly-nomials in independent random variables with values in [0,1]. We consider a π-homogeneous multilinear form π as follows. Let π» = (π, πΈ, (π€π)πβπΈ) be a weighted hypergraph, such that every π β πΈ consists of exactly π vertices, assume that (ππ£)π£βπ are independent, [0,1]-valued random variables, and set
π(π) = π((ππ£)π£βπ) = βοΈ
πβπΈ
π€πβοΈ
π βπ
ππ =βοΈ
πβπΈ
π€πππ. (3.7)
Define the maximum first order partial derivative ML(π) as ML(π) := sup
π£βπ
sup
π₯β[0,1]π
ππ£π(π₯). (3.8)
Proposition 3.4. Let(ππ£)π£βπ be independent, [0,1]-valued random variables and π : [0,1]π β R given as in (3.7) and assume that ππ£π(π₯) β₯ 0 for all π£ β π and π₯ β[0,1]π. We have for any π‘ β₯ 0
P(π (π) β E π (π) β₯ π‘) β€ exp
(οΈβ π‘2
2πML(π)(E π(π) + π‘/2) )οΈ
. (3.9) Furthermore, for π‘ β[0, E π] it holds
P(E π (π) β π (π) β₯ π‘) β€ exp
(οΈβ π‘2 2πML(π)
)οΈ
.
For example, the monotonicity in every variable condition ππ£π(π₯) β₯ 0 is fulfilled if the weights π€π are non-negative.
The following corollary can be easily deduced from Proposition 3.4
Corollary 3.5. Let (ππ£)π£βπ be independent, [0,1]-valued random variables and π = βοΈππ=1ππ for some π-homogeneous functions (ππ) of the form (3.7) with positive weights. For any π‘ β₯0 we have
P(π (π) β E π (π) β₯ π‘) β€ π exp
(οΈβ min
π=1,...,π
(οΈ π‘2
2πML(ππ)(π2E ππ(π) + ππ‘/2) )οΈ)οΈ
. A slight modification of the proof of Proposition 3.4 also allows for deviation inequalities for suprema of such homogeneous polynomials. For example, this can be used to prove concentration inequalities for maxima of linear forms.
Proposition 3.6. Let (ππ£)π£βπ be independent, [0,1]-valued random variables, β± β {π β Rπ : ππ β[0,1]π} and define π(π)β± := supπββ±
βοΈ
πβπ ππππ. For any π‘ β₯0 we have
P(πβ±(π) β E πβ±(π) β₯ π‘) β€ exp(οΈ
β π‘2
2 supπββ±βπββ(E πβ±(π) + π‘/2) )οΈ
. In particular, for any π β[1,β] it holds
P(βπβπβ Eβπβπ β₯ π‘) β€ exp(οΈ
β π‘2
2(Eβπβπ+ π‘/2) )οΈ
.