and π£πΌ = (1, . . . , 1), otherwise. In particular, we always have βπ£πΌββ . 1, and therefore, by Lemma 5.21,
β ΜοΈπ|k|βπ¦. βπ΄|k|β1πΏ(ΜοΈβ)βπ¦,
from where we easily arrive at (5.26) by applying Lemma 5.20.
Now, (5.24) follows by combining (5.25) and (5.26), which finishes the proof.
5.6 The general sub-exponential case: πΌ β (0,1]
Using slightly different techniques than in the proofs of Theorems 5.4 and 5.6, we may obtain concentration results for polynomials in independent random variables with bounded ππΌ-norms for any πΌ β (0,1]. Here, the key difference is that we will not compare their moments to products of Gaussians but to Weibull variables.
To this end, we need some further notations. Let π΄ = (πi)iβ[π]π be a π-tensor and πΌ β [π] a set of indices. Then, for any iπΌ := (ππ)πβπΌ, we denote by π΄iπΌπ = (πi)iπΌπ
the (π β |πΌ|)-tensor defined by fixing ππ, π β πΌ. For instance, if π = 4, πΌ = {1,3}
and π1 = 1, π3 = 2, then π΄iπΌπ = (π1π2π)ππ.
For πΌ = [π], i. e. we fix all indices of i, we interpret π΄iπΌπ = πi as the i-th entry of π΄. Moreover, in this case, we assume that there is a single element π₯ β π (πΌπ) (which we may call the βemptyβ partition), and βπ΄iπΌπβπ₯ = |πi|is just the Euclidean norm of πi. Finally, note that if πΌ = β , iπΌ does not indicate any specification, and π΄iπΌπ = π΄.
Using the characterization of the πΉπΌ norms in terms of the growth of πΏπ norms (see Appendix B for details), [KL15, Corollary 2] yields a result similar to Theorem
5.4 for all πΌ β (0,1]:
Corollary 5.22. Let π1, . . . , ππ be independent, centered random variables sat-isfying βπβπΉπΌ β€ π for some πΌ β(0,1], π΄ be a symmetric π-tensor with vanishing diagonal and consider ππ,π΄= ππ,π΄(π) as in (5.3). We have for any π‘ β₯ 0
P
(οΈ|ππ,π΄| β₯ π‘)οΈ
β€2 exp(οΈ
β 1
πΆπ,πΌ min
πΌβ[π] min
π₯ βπ (πΌπ)
(οΈ π‘
ππmaxiπΌβπ΄iπΌπβπ₯
)οΈ2|πΌ|+πΌ|π₯ |2πΌ )οΈ
. The main goal of this section is to generalize Corollary 5.22 to arbitrary polynomials similarly to Theorem 5.6, which is the following result.
Theorem 5.23. Let π1, . . . , ππ be independent random variables with βππβππΌ
β€ π for some πΌ β (0,1] and π > 0. Let π : Rπ β R be a polynomial of total degree π· β N. Then, for any π‘ β₯ 0, it holds
P(|π (π) β Eπ (π)| β₯ π‘)
β€2 exp(οΈ
β 1
πΆπ·,πΌ min
πβ[π·]min
πΌβ[π] min
π₯ βπ (πΌπ)
(οΈ π‘
ππmaxiπΌβ(Eπ(π)(π))iπΌπβπ₯
)οΈ2|πΌ|+πΌ|π₯ |2πΌ )οΈ
.
To prove Theorem 5.23, note that one particular example of centered random variables with βπβπΉπΌ β€ π is given by symmetric Weibull variables with shape parameter πΌ (and scale parameter 1), i. e. symmetric random variables π€ with P(|π€| β₯ π‘) = exp(βπ‘πΌ). In fact, [KL15, Example 3] especially implies the following analogue of Lemma 5.16. Here, we write π΄ βΌπ,πΌ π΅ for a two-sided inequality πΆπ,πΌβ1π΄ β€ π΅ β€ πΆπ,πΌπ΄.
Lemma 5.24. Let π΄= (πi)iβ[π]π be a π-tensor and(π€ππ)πβ[π],πβ[π], an array of i. i. d.
Weibull variables with shape parameter πΌ β(0,1]. Then, for every π β₯ 2,
ββ¨π΄, π€1β . . . β π€πβ©βπ βΌπ,πΌ βοΈ
Moreover, we need a replacement of Lemma 5.15. Here, we use Weibull instead of Gaussian random variables to compare the π-th moments:
Lemma 5.25. For any π β N, any πΌ > 0 and any π β₯ 1 the following holds.
For any set of independent, symmetric random variables π1, . . . , ππ satisfying
βππβππΌ
where π€ππ are symmetric i. i. d. Weibull random variables with shape parameter πΌ and ππΌ,π := (π/(1 β log(2)))π/πΌ.
where the second inequality requires the condition π‘ β₯ 1. Now the rest follows exactly as in [AW15, Proof of Lemma 5.4].
Alternatively, one can extend the inequality to all π‘ β₯ 0 by multiplying the left hand side by a constant. Indeed, it is easy to see (by observing P(π|π€π1Β· Β· Β· π€ππ| β₯ 1) β₯ 2/π) that for all π‘ β₯ 0 it holds
P(|ππ| β₯ π‘) β€ π
2P(π|π€π1Β· Β· Β· π€ππ| β₯ π‘).
5.6 The general sub-exponential case: πΌ β (0,1] 117
Thus, the contraction principle [Kwa87, Theorem 1] tells us that for any π β₯ 1 we have
β¦
β¦
β¦
π
βοΈ
π=1
ππππβ¦
β¦
β¦π
β€ π
2ππΌ,ππβ¦
β¦
β¦
π
βοΈ
π=1
πππ€π1Β· Β· Β· π€ππβ¦
β¦
β¦π
.
Our next goal is to adapt Lemmas 5.19, 5.20 and 5.21 to the βrestrictedβ tensors π΄iπΌπ. That is, we examine whether (a modification of) the inequality
β(π΄ β 1πΆ)iπΌπβπ₯ β€ βπ΄iπΌπβπ₯ (5.27) still holds in this situation, where π₯ is a partition of πΌπ.
Lemma 5.26. Let π΄= (πi)iβ[π]π be a π-tensor, πΌ β[π] and iπΌ β[π]πΌ fixed.
1. If πΆ = {i: ππ1 = π1, . . . , πππ = ππ} for some 1 β€ π1 < . . . < ππ β€ π (βgeneral-ized rowβ), then (5.27) holds.
2. If πΆ = {i: ππ = ππ βπ, π β πΎ} for some πΎ β [π] (βgeneralized diagonalβ), then (5.27) holds.
3. If πΆ1, πΆ2 β[π]π are such that (5.27) holds, then so is πΆ1β© πΆ2. 4. If π¦ β ππ, then β(π΄ β 1πΏ(π¦))iπΌπβπ₯ β€2|π¦|(|π¦|β1)/2βπ΄iπΌπβπ₯.
5. For any vectors π£1, . . . , π£πβ Rπ, β(π΄ β βππ=1π£π)iπΌπβπ₯ β€ βπ΄iπΌπβπ₯ βοΈπ
π=1βπ£πββ. Proof. To see (1), we may assume that {π1, . . . , ππ} β© πΌ = β (note that in the case {π1, . . . , ππ} β© πΌ ΜΈ= β , either the conditions are not compatible, in which case (π΄ β 1πΆ)ππΌπ = 0, or we can remove some of the conditions and obtain a subset with {π1, . . . , π
ΜοΈπ} β© πΌ = β ). In this case, if πΆ is a generalized row, then (π΄ β 1πΆ)iπΌπ = π΄iπΌπ β1πΆβ² for some generalized row πΆβ² in πΌπ. This proves (1).
If πΆ is a generalized diagonal, we have to consider two situations. Assuming πΎ β© πΌ = β , i. e. πΎ is subset of πΌπ, we immediately obtain (2). On the other hand, if πΎ β© πΌ ΜΈ= β , then (π΄ β 1πΆ)iπΌπ = π΄iπΌπ β1πΆβ² for some generalized row πΆβ² in πΌπ, readily leading to (2) again.
(3) is clear. To see (4), one may argue as in the proof of Lemma 5.20 (for π = 1), replacing Lemma 5.19 (2) and (3) by their analogues we just proved. Finally, an easy modification of the proof of Lemma 5.21 yields (5).
We are now ready to prove Theorem 5.23. Here, we recall the notation used in the proof of Theorem 5.6, with the only difference that now, by π . π we mean an inequality of the form π β€ πΆπ·,πΌπ.
Proof of Theorem 5.23. We will follow the proof of Theorem 5.6. In particular, let us assume π = 1.
Step 1. Recall the inequality from the proof of Theorem 5.6 Step 2. Applying Lemma 5.26, we arrive at
βπ(π) β Eπ(π)βπ .
Here, (π€(π)π,π) is an array of i. i. d. symmetric Weibull variables with shape parameter πΌ. Now we may define π-tensors π΄π as in the proof of Theorem 5.6. Similarly as in (5.21), rewriting and applying Lemma 5.24 and Lemma 5.26 (4) yields
βπ(π) β Eπ(π)βπ .
Step 3. In the proof of Theorem 5.6 we have decomposed
E
πππ
ππ₯π1. . . ππ₯ππ(π) = π!π1! Β· Β· Β· ππ!ππ1,...,ππ+ π (π)i
with a remainder tensor π (π)i corresponding to the set of indices k with k > l and π (π)i = 0 for π = π·. Again, for any πΌ β [π·] and any partition π₯ β π (πΌπ),
using Lemma 5.26 (4) in the last step. To complete the proof, we need to show that for any π = 1, . . . , π· β 1, any πΌ β [π] and any partitions β β π ([π]), π₯ β π ([π]βπΌ),
Actually, analyzing the proof one can see it is possible to restrict the second sum
5.6 The general sub-exponential case: πΌ β (0,1] 119
on the right-hand side to partitions π¦ with |π¦| β {|π₯ |, |π₯ | + 1}. Once having proven (5.28), it follows from reverse induction that
βοΈπ|πΌ|/π+|π₯ |/2max
iπΌ
β(π΄π)iπΌπβπ₯ .βοΈ
π|πΌ|/π+|π₯ |/2max
iπΌ
β(Eπ(π)(π))iπΌπβπ₯, where βοΈ = βοΈπβ[π·]βοΈ
πΌβ[π]
βοΈ
π₯ βπ (πΌπ). Here, we use that for any π β₯ 2 and any
|π¦| β₯ |π₯ | we have π|π₯ |/2 β€ π|π¦|/2. In view of Step 2 and Proposition 2.10, this finishes the proof.
Step 4. Recall the π-tensor πβ(π,k)= (π (π,πi 1,...,ππ))iβ[π]π = (π (π)i )iβ[π]π and for any k β πΌπ,β€π· with k > l the |k|-tensor πΜοΈ|k| from the proof of Theorem 5.6.
In the sequel, we fix the following items:
β’ π β [π· β 1] and k satisfying |k| > π,
β’ a set πΌ β [π] and iπΌ β[π]πΌ,
β’ an admissible partition β β π ([π]) and an associated extension πΌ β πΜοΈ ([|k|]).
The notion of admissibility was not relevant in Theorem 5.6, as we have not fixed any indices πΌ and values iπΌ β[π]πΌ. Here, it simply means that the level sets have to compatible with the fact that we have fixed some of the partial derivatives by πΌ and iπΌ. Also note that β is a partition of [π] and not of [π]βπΌ, since it arises from level sets of partial derivatives and includes the those from πΌ.
Our aim is to find a partition π¦ β π ([|k|]βπΌ) with |π¦| β {|π₯ |, |π₯ | + 1} such that
β(πβ(π,k))iπΌπβπ₯ . β(π΄|k|)iπΌπβπ¦. (5.29) First, set π¦ := π₯ = {π½1, . . . , π½π}, so that it remains to assign the elements π β {π+ 1, . . . , |k|}. This is done as follows: Select π β {π + 1, |k|}, and do the following steps:
β’ choose π β N such that π βπΌΜοΈπ,
β’ take the smallest element π‘ =: π(π) in πΌΜοΈπ (since πΌπβ ΜοΈπΌπ, we have π‘ β [π]),
β’ if π‘ β πΌπ, there is a set π½π β π‘ and we add π to π½π; otherwise, we assign π to an βextra setβ πΎπ+1
In particular, it may happen that πΎπ+1= β . In this case, we ignore π½ = π + 1 in the rest of the proof.
Now we claim that it holds
β(πβ(π,|k|))iπΌπβπ₯ β€ β(πΜοΈ|k|)iπΌπβπ¦. (5.30)
Figure 5.1: An illustration of the procedure of producing the partition π¦. Here, we start with π = 6, |k| = 8, πΌ = {2,3}. The partition βΜοΈ is indicated by colors, i. e. βΜοΈ= {{1,2,5}, {3,6,8}, {4,7}}. {8} belongs to πΎ4 = πΎπ+1
since {3} β πΌ. Changing its color to yellow would produce a partition π¦ with 3 subsets.
To see this, let π₯ = (π₯(π½))π½β[π] = ((π₯(π½)iπ½π½))π½β[π] be a unit vector with respect to π₯ , i. e. |π₯|π₯ = maxπ½β[π]|π₯(π½)| β€1. We embed it in the unit ball with respect to |π₯|π¦ by defining π¦ = (π¦(π½))π½β[π+1] via
π¦i(π½)
πΎπ½ ={οΈπ₯(π½)i
πΎπ½ β©[π]
βοΈ
πβπΎπ½β[π]1ππ=ππ(π) π½ β[π]
βοΈ
πβπΎπ+11ππ=ππ(π) π½ = π + 1.
As π¦(π+1) has a single non-zero entry, it is easy to see that |π¦|π¦ β€ 1. Moreover, by the definition of the matrix πΜοΈ|π| and the fact that if i β πΏ(βΜοΈ), then for π > π, ππ = ππ(π), which implies π¦(π½)iπΎπ½ = π₯(π½)iπΎπ½ β©[π] = π₯(π½)iπ½π½ as well as π¦i(π+1)πΎπ+1 = 1 we have
β¨(π(π,k))iπΌπ, βππ½=1π₯(π½)β©= β¨(πΜοΈ(|π|))iπΌπ, βπ+1π½=1π¦(π½)β©.
Hence, the supremum on the left hand side of (5.30) is taken over a subset of the unit ball with respect to |π₯|π¦. Finally, it remains to prove
β(πΜοΈ|k|)iπΌπβπ¦. β(π΄|k|)iπΌπβπ¦ (5.31) for any partition π¦ β π (πΌπ). This may be achieved as in the proof of Theorem 5.6, replacing Lemma 5.21 by Lemma 5.26 (5).
Combining (5.30) and (5.31) yields (5.29), which finishes the proof.
Finally, we prove Proposition 5.1 and Theorem 5.2, from which Corollary 5.3 follows immediately.
Proof of Proposition 5.1. The case πΌ β (0,1] follows from the π = 2 case of Corollary 5.22. πΌ = 2 corresponds to the HansonβWright inequality.
Proof of Theorem 5.2. Let πΌ β (0,1] and consider the bound given by Theorem 5.23. Fix any π β [π·], and observe that for any πΌ β [π], any iπΌ and any π₯ β π (πΌπ) we have by (5.13)
β(Eπ(π)(π))iπΌπβπ₯ β€ β(Eπ(π)(π))iπΌπβHSβ€ βEπ(π)(π)βHS
5.6 The general sub-exponential case: πΌ β (0,1] 121
as well as
πΌ
π β€ 2πΌ
2|πΌ| + πΌ|π₯ | β€2.
If π‘ β₯ ππβEπ(π)(π)βHS, this immediately yields the result. Otherwise, note that the tail bound given in Theorem 5.2 is trivial. (In fact, here one needs to ensure that πΆπ·,πΌ is sufficiently large, e. g. πΆπ·,πΌ β₯1.)
In a similar way, it is possible to derive the same results for πΌ = 2/π and any π β N from Theorem 5.6. From these results, the exponential moment bound follows by standard arguments, see for example [BGS19, Proof of Theorem 1.1].
APPENDIX A
Approximate tensorization of entropy in finite spaces
The concentration results for finite spin systems were based on an approximate tensorization property of the entropy. Here, we reformulate and provide a complete proof of a result of Marton [Mar15] and moreover rewrite it in the terms of entropy of functions instead of relative entropy of measures. Let π³ be a finite set, π³π its π-fold product and fix a probability measure ππ on π³π. Define the total variation distance, the relative entropy and the Wasserstein-2-type distance
ππ π(π,π) := sup
π΄βπ³π
|π(π΄) β π(π΄)| = 1 2
βοΈ
π₯βπ³π
|π({π₯}) β π({π₯})|,
π»(π || π) = Λ ππ
ππ log ππ
ππππ if π βͺ π π2(π, π) = inf
πβπΆ(π,π)
(οΈβοΈπ
π=1
π(π₯π ΜΈ= π¦π)2)οΈ1/2
,
where πΆ(π,π) is the set of all couplings of π and π, i. e. probability measures π on π³πΓ π³π with marginals π and π.
Remark. The infimum in the definition of π2 is always attained, since πΆ(π,π) is a compact subset of π«(π³πΓ π³π) equipped with the weak topology and the map π β¦β (βοΈππ=1π(π₯π ΜΈ= π¦π)2)1/2 is lower semicontinuous. This fact and the gluing lemma for measures with a common marginal [AG13, Theorem 2.1] can be used to prove that π2 is a distance function on π«(π³π), see for example [Vil09, Chapter 6] for a similar line of reasoning.
Denote by ππ, ππ the pushforward of π and π respectively of the projection onto the π-th coordinate. By the subadditivity of the square root (for the upper bound for π2) as well as the fact that every π β πΆ(π,π) induces a coupling of ππ, ππ via the projection onto the π₯π and π¦π coordinate (for the lower bound), we obtain
π
βοΈ
π=1
π2π π(ππ, ππ) β€ π22(π,π) β€ πππ π(π,π). (A.1)
We need the following lemma, which is a slight improvement of [Mar15, Lemma 2].
Lemma A.1. Let π be a probability measure on a finite space π³ and π½π :=
infπ₯βπ³+π(π₯), where π³+ := {π₯ β π³ : π(π₯) > 0}. For any probability measure π βͺ π we have
π»(π || π) β€ 2π½πβ1π2π π(π,π).
Proof. The shifted logarithm π(π₯) := log(1 + π₯) is a concave function on (β1,β), so that for any π₯ β₯ β1 the inequality π(π₯) β€ πβ²(0)π₯ = π₯ holds. Rewrite ππ = 1+πβππ to obtain
π»(π || π) = βοΈ
π₯βπ³+
π(π₯) (οΈ
1 + π(π₯) β π(π₯) π(π₯)
)οΈ
π(οΈ π(π₯) β π(π₯) π(π₯)
)οΈ
β€ βοΈ
π₯βπ³+
π(π₯) (οΈ
1 + π(π₯) β π(π₯) π(π₯)
)οΈ π(π₯) β π(π₯)
π(π₯) = βοΈ
π₯βπ³+
(π(π₯) β π(π₯))2 π(π₯)
β€ π½πβ1 βοΈ
π₯βπ³+
(π(π₯) β π(π₯))2 β€ π½πβ1ππ π(π,π)βοΈ
π₯βπ³
|π(π₯) β π(π₯)|
= 2π½πβ1π2π π(π,π).
Unfortunately, the factor 2π½πβ1cannot be removed. To see this, let π³ = {0,1} and π(0) = 1βπ(1) = πΌ for a fixed πΌ β (0,1/2). Now, considering the family of measures ππ(0) = πΌ + π, an easy calculation yields π2π π(ππ,π) = π2 and π»(ππ || π) βΌ 2πΌβ1π2.
A consequence of Lemma A.1 is that on finite spaces the relative entropy and the total variation distance are comparable with a constant depending on the (fixed) measure π, i. e. we have
π2π π(π,π) β€ 1
2π»(π || π) β€ π½πβ1π2π π(π,π).
Here, the first inequality is the well-known Pinsker inequality (also known as CsiszΓ‘rβKullbackβPinsker inequality), see for example [Tsy09, Lemma 2.5].
Theorem A.2. Let ππ be a probability measure on a finite space π³π with full support and set π½ := minπβ[π]minπ₯βπ³πππ(π₯π | π₯π).
(π) Let ππ be a probability measure on π³π. If for all subsets πΌ β[π] and all π¦πΌ
we have
π22(ππΌ(Β· | π¦πΌ), ππΌ(Β·, π¦πΌ)) β€ πΆβοΈ
πβπΌ
EππΌ(Β·|π¦πΌ)π2π π(ππ(Β· | π¦π), ππ(Β· | π¦π)), (A.2) then the approximate tensorization property holds:
π»(ππ || ππ) β€ πΆ π½
π
βοΈ
π=1
Eπππ»(ππ(Β· | π¦π) || ππ(Β· | π¦π)). (A.3)
125
In particular, if π denotes the density of ππ with respect to ππ, then this can be rewritten as
(ππ) Assume that an interdependence matrix π΄ of ππ satisfies βπ΄β2β2 <1. Then (A.2) holds with πΆ = (1 β βπ΄β2β2)β2. In particular, (A.3) and (A.4) hold with the same constant.
Proof. (π): We prove the theorem by induction. In the case π = 1 there is nothing to prove if one interprets π1(Β· | π¦1) = π. The disintegration theorem for the relative entropy (see for example [DZ10, Theorem D.13]) yields
π»(ππ || ππ) = 1
We bound the two terms separately. For the first term, using Lemma A.1, equations (A.1), (A.2) and Pinskerβs inequality gives
1
We apply the induction hypothesis to rewrite the second term. For each fixed π β[π] and π¦π β π³ we interpretπΜοΈ:= ππ(Β· | π¦π) as a measure on π³π satisfying
a generic vector. A short calculation shows that the conditional probability of ππ(Β· | π¦π) with respect to the projection πππ : π³π β π³ππ for some π ΜΈ= π is given
Integration with respect to ππ and summation over π yields 1
π
π
βοΈ
π=1
Λ
π»(ππ(Β· | π¦π) || ππ(Β· | π¦π))πππ(π¦π) β€ πΆ π½
π β1 π
π
βοΈ
π=1
Eπππ»(ππ(Β· | π¦π) || ππ(Β· | π¦π)), which combined with the first term proves the assertion.
(A.4) is a simple rewriting of (A.3), noting that as a consequence of the disintegration theorem (or in this case Bayesβ theorem) we have
πππ(Β· | π¦π)
πππ(Β· | π¦π)(π¦π) = Β΄ π(π¦π, π¦π) π(π¦π, π₯π)πππ(π₯π | π¦π) and πππππ
π(π₯π) = Β΄
π(π₯π, π₯π)πππ(π₯π | π₯π).
(ππ): See [Mar15, Theorem 2].
In [Mar15, Theorem 1] it is stated that using the quantity π½ := inf
π=1,...,π inf
π₯βπ³π:ππ(π₯)>0ππ(π₯π | π₯π)
one can deduce ππ(πππ(π₯) = π₯π) β₯ π½ for all π₯π such that the LHS is nonzero. This is possible only if ππ has full support. A counterexample is given by the push-forward of a random uniform permutation under the map π β¦β (π1, . . . , ππ), which satisfies π½ = 1.
Actually, we can modify the condition of Theorem A.2 to allow for measures without full support. This may be done by using the definition ΜοΈπ½ as in (2.24) instead of π½. As ΜοΈπ½ is a uniform control, it can be used in any step of the induction, so that the theorem remains valid. Clearly the inequality (A.3) only holds for ππβͺ ππ only, and (A.4) holds for positive functions vanishing outside the support of ππ.
APPENDIX B
Properties of Orlicz quasinorms
As mentioned in Chapter 5, the Orlicz norms defined in (5.2) satisfy the triangle inequality only for πΌ β₯ 1. Indeed, the function ππΌ : π₯ β¦β exp(π₯πΌ) is not convex around 0 for πΌ β (0,1), see the following figure. A short calculation yields that ππΌ
is convex in the region [(πΌβ1β1)πΌβ1, β) only, and concave around 0.
However, for any πΌ β (0,1) this is still a quasinorm, which for many purposes is sufficient. We shall collect some elementary results on Orlicz quasinorms in this appendix. The first result is a HΓΆlder-type inequality for the πΉπΌ norms.
Lemma B.1. Let π1, . . . , ππ be random variables such that βππβπΉπΌπ < β for some πΌπ β(0,1] and let π‘ := (βοΈππ=1πΌβ1π )β1. Then ββοΈπ
π=1ππβπΉπ‘ < β and
β¦
β¦
β¦
π
βοΈ
π=1
ππβ¦
β¦
β¦πΉπ‘
β€
π
βοΈ
π=1
βππβπΉπΌπ.
Out[ο]=
0.5 1.0 1.5 2.0 2.5 3.0
2 4 6 8 10
1 4 1 2 3 4 1
Figure B.1: The function ππΌ(π₯), π₯ β [0,3] for four different values of πΌ.
Especially for πΌπ = πΌ for all π β [π] this leads to ββοΈππ=1ππβπΉπΌ/π β€βοΈπ
π=1βππβπΉπΌ. The random variables π1, . . . , ππ need not be independent, i. e. we can consider a random vector π = (π1, . . . , ππ) with marginals having πΌ-sub-exponential tails.
Proof. By homogeneity we can assume βπβπΉπΌπ = 1 for all π β [π]. We need the general form of Youngβs inequality, i. e. for all π1, . . . , ππ >1 satisfying βοΈππ=1πβ1π = 1 and any π₯1, . . . , π₯πβ₯0 we have
π
βοΈ
π=1
π₯π β€
π
βοΈ
π=1
πβ1π π₯πππ,
which follows easily from the concavity of the logarithm. If we apply this to ππ := πΌππ‘β1 and use the convexity of the exponential function, we obtain
E exp (οΈβοΈπ
π=1
|ππ|π‘)οΈ
β€ E exp(οΈβοΈπ
π=1
πβ1π |ππ|πΌπ)οΈ
β€
π
βοΈ
π=1
πβ1π E exp
(οΈ|ππ|πΌπ)οΈ
β€2.
This is however equivalent to ββοΈππ=1ππβπΉπ‘ β€1.
To state the other lemmas, for any 0 < πΌ < 1 define
ππΌ := (πΌπ)1/πΌ/2 and π·πΌ := (2π)1/πΌ. Lemma B.2. For any 0 < πΌ < 1 we have
ππΌsup
πβ₯1
βπβπ
π1/πΌ β€ βπβπΉπΌ β€ π·πΌsup
πβ₯1
βπβπ
π1/πΌ . (B.1)
The statement of the lemma remains true for πΌ β₯ 1, with (πΌ-independent constants) ππΌ = 1/2 and π·πΌ = 2π, see [Bob10, Section 8]. We closely follow the proof therein, but keep track of the πΌ-dependent constants.
Proof. We begin with the first inequality. By homogeneity, we assume βπβπΉπΌ = 1.
First let us show that we have
π(π₯) := (πΌπ)β1/πΌππ₯πΌ β π₯ β₯0 for π₯ β₯ 0. (B.2) Note that π is continuous on [0,β) and differentiable on (0,β) with π(0) > 0 and π(π₯) β β as π₯ β β. Therefore, it suffices to find the critical points. We can rewrite the condition πβ²(π₯) = 0 as ππ¦π¦ = π¦1/πΌ(πΌπ)1/πΌ, setting π¦ := π₯πΌ. From this representation it can be seen that there can be at most two points π₯0 and π₯1 satisfying this condition. One of these points is π₯πΌ := πΌβ1/πΌ, which satisfies π(π₯πΌ) = 0. A short calculation shows that πβ²β²(π₯πΌ) = πΌ1/πΌ+1 > 0, so that π₯πΌ is a global minimum, from which π β₯ 0 follows.
129 of (B.2). Consequently, for any π β₯ 1 it holds
E|π|π β€(οΈ π For the second inequality, assume supπβ₯1
βπβπ
π1/πΌ = 1, again by homogeneity. First, we need to extend the the supremum to π β [πΌ, β), which can be done as follows. π₯,π¦ β₯0 and πΌ β [0,1], and the fourth one is an application of Youngβs inequality
ππ β€ π2/2 + π2/2 for all π,π β₯ 0.
Lemma B.4. Let 0 < πΌ < 1. For all random variables π we have
βE πβπΉπΌ β€ 1
ππΌ(log 2)1/πΌβπβπΉπΌ.
Proof. Assuming βπβπΉπΌ < β, an application of Lemma B.2 gives
βE πβπΉπΌ = |E π|
(log 2)1/πΌ β€ βπβ1
(log 2)1/πΌ β€ 1
ππΌ(log 2)1/πΌβπβπΉπΌ.
We can readily infer the following corollary from the last two lemmas.
Corollary B.5. For any 0 < πΌ < 1 and any random variable π it holds
βπ β E πβπΉπΌ β€21/πΌ(οΈ
1 + (ππΌlog 2)β1/πΌ)οΈ βπβπΉπΌ.
APPENDIX C
LSIs and difference operators
To conclude this thesis, we discuss the LSI property (2.3) for different choices of difference operators π€ . Here, we always assume that the probability measure πis defined on a product of Polish spaces π΄ = βππ=1π³π (which itself is a Polish space) and equipped with the product Borel π-algebra π = β¬(βππ=1π³π).
As stated in Chapter 2, we can use the disintegration theorem on Polish spaces to define the difference operators h and d. For finite spaces, π(Β· | π₯π) is just the ordinary conditional probability as used in the definition of the difference operator d. The setting of Polish spaces is clearly more general than the one of finite spin systems. However, the next proposition shows that the dβLSI property in fact enforces the underlying space to be finite. More precisely, in this case we say that πhas finite support if there is no sequence of sets π΄π β π with π(π΄π) > 0 for any π and π(π΄π) β 0.
Proposition C.1. Let π΄ = βππ=1π³π be a product of Polish spaces, and π be a probability measure on π΄. If π satisfies a dβLSI(π2) for some π2 < β, then π has finite support. Moreover, if π is a product probability measure, then π satisfies a dβLSI(π2) if and only if π has finite support.
Proof. First assume π does not have finite support, i. e. there exists a sequence π΄πβ π with π(π΄π) β 0. Choosing ππ := 1π΄π β πΏβ(π) and assuming a dβLSI(π2) holds, we obtain
π(π΄π) log(1/π(π΄π)) = Entπ(ππ2) β€ 2π2 Λ
(dππ)2ππ= 2π2π(π΄π)(1βπ(π΄π)). (C.1) This easily leads to a contradiction as π β β.
On the other hand, let π be a product probability measure with finite support.
By tensorization, it suffices to consider π = 1, and we may moreover assume π΄ to have finitely many elements only. Then, by [BT06, Remark 6.6], π satisfies a dβLSI(π2) with π2 β€ πΆlog(1/ minπ¦:π(π¦)>0π(π¦)), which finishes the proof.
In fact, Proposition C.1 can be adapted to the difference operator h+ as well.
To see this, note that (C.1) can easily be rewritten for the difference operator h+ (with only minor changes) and Β΄
|dπ |2ππ β€Β΄
|h+π |2ππ. In particular, the d- and h+-LSI properties are not essentially different.
The situation drastically changes if we consider hβLSIs instead. Here, a sufficient condition for the hβLSI property to hold is that the measure π satisfies an approximate tensorization (AT) property. As a consequence, for product probability measures, satisfying an h-LSI is in fact a universal property.
Theorem C.2. Let π΄ = βππ=1π³π be a product of Polish spaces, and π be a probability measure on π΄. If π satisfies an approximate tensorization property
Entπ(π2) β€ πΆ
π
βοΈ
π=1
Λ
Entπ(Β·|π₯π)(π2(π₯π, Β·))πππ(π₯π),
then π also satisfies an hβLSI(πΆ). In particular, any product probability measure satisfies an hβLSI(1).
For product measures, the theorem might be compared to the EfronβStein inequality (see e. g. [ES81; Ste86]) which establishes the tensorization property for the variance, and can be regarded as a universal PoincarΓ© inequality with respect to d (see [BGS19] for such an interpretation). However, Theorem C.2 does not imply the EfronβStein inequality due to the usage of h instead of d. Unfortunately, as Proposition C.1 demonstrates, there is no βentropy versionβ of the EfronβStein inequality of the form Entπ(π2) β€ πΆ Eπ|dπ |2 valid for product probability measure πand some universal constant πΆ.
Unfortunately, it seems impossible to use the entropy method based on hβLSI.
More precisely, Theorem C.2 cannot be used to estimate the growth of πΏπ norms as in the setting of a dβLSI(π2). Indeed, it is impossible to prove the required moment inequalities
βπ β E πβπ β€(π2π)1/2βhπ βπ (C.2) under an hβLSI(π2). For example, the measure ππ = ππΏ1 + (1 β π)πΏ0 satisfies hβLSI(π2π) with ππ2 βΌ π(1 β π) log(1/π) (for π β 0), so that (C.2) would imply for π(π₯) = π₯ an upper bound on the Orlicz norm associated to πΉ2(π₯) = ππ₯2 β1
βπ β E πβπΉ2 β€2π sup
πβ₯1
βπ β E πβπ
π1/2 β€4πππ. However, a simple calculation shows that E exp (οΈ(π βE π )16π2π22
π )οΈ β βas π β 0.
The approximate tensorization property in Theorem C.2 is interesting in its own right, but it is not yet well-studied. We have seen sufficient conditions in Appendix A. Similar results have been derived in [CMT15], which can be applied in discrete and continuous settings. For example, if one considers a measure of the form
π(π₯) = πβ1
π
βοΈ
π=1
π0,π(π₯π) exp(οΈ βοΈ
π,π
π½πππ€ππ(π₯π,π₯π))οΈ
for some countable spaces πΊπ, π₯π β πΊπ, measures π0,π on πΊπ and bounded functions
133
π€ππ, under certain technical conditions π satisfies an approximate tensorization property, which is independent of any functional inequality for π0,π.
On the other hand, the AT(πΆ) property requires some certain weak dependence conditions. For example, the push-forward of a random permutation π of [π] to Nπ cannot satisfy an approximate tensorization property. It is an interesting question to find necessary and sufficient conditions for the approximate tensorization property to hold.
Now we prove Theorem C.2. Here we include a simplified proof, for the original proof we refer to [GSS18, Theorem 5.2].
Proof of Theorem C.2. Our first step is to that for any bounded, measurable function π the inequality
Entπ(π2) β€ 2 sup
π₯,π¦(π(π₯) β π(π¦))2 (C.3)
holds. From log(π₯) + 1 β€ π₯ for all π₯ > 0 we can deduce
π₯2β π₯log π₯ β π₯ β₯ 0 for all π₯ β₯ 0. (C.4) Let π β₯ 0 be a measurable function satisfyingΒ΄
π ππ= 1. Integrating (C.4) with respect to π yields Λ
π2ππ β1 β₯ Λ
πlog πππ,
i. e. by using the definition and the homogeneity of the entropy we obtain for any measurable function π
Λ
π2ππEntπ(π2) β€ Varπ(π2).
Moreover, we have the upper bound Varπ(π2) = 1
2
Β¨
(π(π₯) β π(π¦))2(π(π₯) + π(π¦))2ππ(π₯)ππ(π¦)
β€2 sup
π₯,π¦(π(π₯) β π(π¦))2 Λ
π2ππ.
This proves (C.3) and the case π = 1.
For arbitrary π, the proof is now easily completed. Assume that π β πΏβ(π), i. e.
ππ(π₯π)-a.s. we have π(π₯π, Β·) β πΏβ(π(Β· | π₯π)). For these π₯π, the π = 1 case yields Entπ(Β·|π₯π)(π2(π₯π, Β·)) β€ 2 sup
π¦β²π,π¦πβ²β²
(οΈπ(π₯π, π¦πβ²) β π(π₯π, π¦β²β²π))οΈ2. Plugging this into the assumption leads to
Entπ(π2) β€ 2πΆ
π
βοΈ
π=1
Λ sup
π¦β²π,π¦πβ²β²
(οΈπ(π₯π, π¦πβ²) β π(π₯π, π¦β²β²π))οΈ2πππ(π₯π) = 2πΆ Λ
|hπ |2ππ.
As for the second part, it is a classical fact that product measures satisfy the tensorization property (i. e. AT(1)), see for example [Led01, Proposition 5.6], [BBLM05, Theorem 4.10] or [Han16, Theorem 3.14]. Moreover, in this case, the assumption that π΄ is a product of Polish spaces can be dropped by simply defining π(Β· | π₯π) = ππ.
Open questions
Question 1. The approximate tensorization of entropy property for weakly dependent spin systems has two major drawbacks. First off, it relies on Lemma A.1 and thus on the comparability of the relative entropy and the total variation distance. However, these can only be compared in finite spaces, severely limiting the applicability of Theorem A.2. There are two other works proving approximate tensorization of entropy:
β’ in [Mar13] the author has succeeded in proving it for measures of the form exp(βπ (π₯))ππ₯ in Rπ under technical conditions on π , but it relied on properties of Rπ,
β’ [CMT15] used weak dependence condition allowing for the approximate tensorization for countable spaces as well, but it yields sub-optimal conditions in the CurieβWeiss model on {β1,+1}π, whereas Theorem A.2 can be applied in the full regime.
The second problem arises from the fact that the interdependence matrix π½ always assumes the worst possible configuration π₯. However, this does not take into
The second problem arises from the fact that the interdependence matrix π½ always assumes the worst possible configuration π₯. However, this does not take into