• No results found

Concentration inequalities for order statistics Using the entropy method and Rényi s representation

N/A
N/A
Protected

Academic year: 2021

Share "Concentration inequalities for order statistics Using the entropy method and Rényi s representation"

Copied!
19
0
0

Loading.... (view fulltext now)

Full text

(1)

Concentration inequalities for order statistics

Using the entropy method and Rényi’s representation

Maud Thomas

1

in collaboration with Stéphane Boucheron1 1LPMA Université Paris-Diderot

High Dimensional Probability VII Cargèse, May 26 - 30, 2014

(2)

Background: order statistics

Sample: X1, . . . ,Xn„i.i.d.F.

Order statistics

Xp1qě. . .ěXpnq non-increasing rearrangement ofX1, . . . ,Xn. Xp1q: sample maximum. Xpn{2q: sample median. @k,PtXpkqďtu “ řn i“k `n i ˘ Fiptqp1 ´Fptqqn´i.

Classical statistic theoryandExtreme Value Theoryprovide: Asymptotic distributions.

Convergence of moments.

Goal

(3)

Background: concentration

Concentration of measure phenomenon

Any function of many independent random variables that does not

depend too much on any of them is concentrated around its mean

value.

Example: Gaussian concentration

X a standard Gaussian vector andZ “fpXq. Poincaré’s inequality: VarrZs ďE}∇f}2. Gross logarithmic Sobolev inequality: EntrZ2

s ď2E}∇f}2. Cirelson’s inequality: PtZ ěEZ `tu ďexpp´t2

{p2L2

qqif

(4)

Gaussian case and the Poincaré’s inequality

fpX1, . . . ,Xnq “Xpkq

the rankk order statistic of a sample is a simple function ofn independent random variables.

||∇f|| “1.

Xi are standard Gaussian.

Poincaré’s inequalityñVarrXpkqs ď1.

Extreme Value TheoryñVarrXp1qs “Op1{lognq.

Classical statistic theoryñVarrXpn{2qs “Op1{nq.

We do not understand (clearly)

(5)

Order statistics and spacings

Proposition (Boucheron, T. (2012))

For all0ăk ďn{2 VarrXpkqs ďkE ”` Xpkq´Xpk`1q ˘2ı “kEr∆k2s. For allλPR, Ent“eλXpkq‰ : λErX pkqe

λXpkqs ´EreλXpkqslogEreλXpkqs ď kE“eλXpk`1qψpλpX

pkq´Xpk`1qqq

“ kE“eλXpk`1qψpλkq‰ withψpxq “1` px´1qex.

(6)

Remarks

Vk :“k∆2kis called the Efron-Stein estimate of the variance ofXpkq.

Without any assumption such as:

F belongs to themax-domain of attraction of an extreme value distributionG,i.e

lim

nÑ`8F n

panx`bnq “Gpxq

for every continuity pointx ofG.

pXpkqqis a sequence of

extremeorder statistics, ifk fixed,nÑ 8;

centralorder statistics, ifk{nÑpP p0,1qwhile,nÑ 8;

(7)

Proof

Efron-Stein inequality (Efron, Stein (1981))

Letf:RnÑRbe measurable, and letZ “fpX1, . . . ,Xnq.

LetZi“fipX1, . . . ,Xi´1,Xi`1, . . . ,Xnqwhere fi:Rn´1ÑRis an arbitrary measurable function.

SupposeZ is square-integrable, then:

VarrZs ďE « n ÿ i“1 pZ´Ziq2 ff .

Modified logarithmic Sobolev inequality (Wu(2000); Massart

(2000))

Letτpxq “ex

´x´1. With the same notations, for any λPR,

Ent“eλZ‰ďE « n ÿ i“1 eλZτp´λpZ´Ziqq ff .

(8)

Graphical assessment

+ + + + + + + + + + 1 2 3 4 5

Ratio between the Efron-Stein estimate and the variance of the maximum ofn independent Gaussian random variables. n“2p for

p“1, . . . ,10. The asymptote is the line

(9)

Rényi’s representation

The order statistics of an exponential sample are distributed as partial sums of independentexponentially distributed random variables.

Rényi’s representation (Rényi (1953))

LetYp1qěYp2qě. . .ěYpnq be the order statistics of an independent

sample of the standard exponential distribution, then

` Ypnq, . . . ,Ypiq, . . . ,Yp1q ˘ „`Enn, . . . , n ÿ k“i Ek k, . . . , n ÿ k“1 Ek k ˘

(10)

Quantile transformation

Definition (Quantile function)

ppq “inftx:Fpxq ěpu,pP p0,1q .

Notation

Uptq “FÐp1

´1{tq,t P p1,8q.

Representation for order statistics

IfYp1qě. . .ěYpnqare the order statistics of an exponential sample, then

pU˝expqpYp1qq ě. . .ěpU˝expqpYpnqq

(11)

Hazard rate, spacings and order statistics

Definition (Hazard rate)

The hazard ratehof a differentiable distribution functionF is defined as:

h“F1{F

“F1{p1

´Fq .

Lemma

The distribution function F has non-decreasing hazard rate h, iffU˝exp

is concave. Indeed,

pU˝expq1“ 1 hpU˝expq .

If the distribution is log-concave, then the associated hazard rate is non-decreasing.

(12)

Variance bound for order statistics when the

hazard rate is non-decreasing

Recall: Vk “k∆2k.

Proposition (Boucheron, T. (2012))

If F hasnon-decreasing hazard rateh, then for1ďkďn{2,

Var“Xpkq ‰ ďEVk ď 2 kE „ ´ 1 hpXpk`1qq ¯2 .

(13)

Towards an exponential Efron-Stein

inequality

Definition (Exponential Efron-Stein inequality)

LetZ “fpX1, . . . ,XnqwhereX1, . . . ,Xn are independent random

variables andV its Efron-Stein estimate of the variance ofZ.

Z satisfies anexponential Efron-Stein inequalityif for allθ,λą0 such thatλθă1 andE“eλV{θ‰ă 8: logE ” eλpZ´EZqı ď λθ 1´λθlogE ” eλV{θı .

Problem

For an exponential sample,E“eλV{艓 8. ëFind another decoupling inequality.

(14)

Decoupling inequality: negative association

Negative association

X andY are negatively associated if for any non-decreasing functionsf,g

ErfpXqgpYqs ďErfpXqsErgpYqs.

Lemma

If the distribution function F has non-decreasing hazard rate, then Xpk`1q

(15)

Exponential Efron-Stein inequality for order

statistics

Proposition (Boucheron, T. (2012))

If F hasnon-decreasing hazard rateh, then forλě0, and1ďk ďn{2,

logEeλpXpkq´EXpkqq ď λk 2E “ ∆k ` eλ∆k´1˘‰ “ λk 2E «c Vk k ´ eλ?Vk{k´1 ¯ ff .

(16)

Gaussian hazard rate

Uptq “ΦÐp1´1{tqfortą1. 0 2 4 6 8 10 −1 0 1 2 3 4 U(e xp(x)) Gaussian distribution 0 2 4 6 8 10 0 1 2 3 4 U(2 e xp(x)) Absolute value of Gaussian random variable

(17)

Variance of absolute values of Gaussian

random variables

Proposition (Boucheron, T. (2012))

Let ně3, let Xpkqbe the rank k order statistic of absolute values of n

standard independent Gaussian random variables, VarrXpkqs ď 1 klog 2 8 log`2n k ˘

´logp1`k4log log`2n k

˘ q .

For the maximum (k “1), the bound becomes: 1

log 2

8

log 2n´logp1`4 log log 2nq .

Mn : maximum ofnstandard Gaussian r.v

Chatterjee (Talagrand L1-L2 inequality): VarrMns ď 1`log1 n.

(18)

Bernstein inequality for the maximum of

absolute values of Gaussian random

variables

Theorem (Boucheron, T. (2012))

For n such that the solution vn of equation

16{x`logp1`2{x`4 logp4{xqq “logp2nq

is smaller than 1, for all0ďλă?1 vn, logEeλpXp1q´EXp1qqď vnλ 2 2p1´?vnλq .

(19)

References

Related documents