Exam 2020-2021 - High-dimensional statistics

High-dimensional statistics

EXAM: duration 2h30

Documents, calculators, phones and smartphones are forbidden

Exercice 1

For n ∈ N^∗ and σ > 0, we consider the following statistical model X = β^∗+ σZ,

with X = (X₁, . . . , X_n)^T ∈ Rⁿ and Z = (Z₁, . . . , Z_n)^T ∈ Rⁿ so that the Z_i’s are i.i.d with common distribution N (0, 1). The goal is to estimate β^∗ = (β₁^∗, . . . , β_n^∗)^T ∈ Rⁿ by using the observation X. We denote k · k the classical `₂-norm.

1. (a) We consider ˆβ¹ the penalized estimate defined by βˆ¹ ∈ arg min

β∈RⁿkX − βk²+ 2λpen(β)

with λ > 0 and where the penalty pen : Rⁿ 7→ R+ is a function depending only on β. Write the penalty pen corresponding to the Lasso estimate.

Correction : pen(β) = kβk₁ =Pn i=1|β_i|.

(b) Establish that for any i ∈ {1, . . . , n}, ˆβ_i¹, the ith coordinate of ˆβ¹, is obtained by minimizing the function

Θ : t ∈ R 7→ t²− 2tX_i+ 2λ|t|.

Correction : We have:

kX − βk²+ 2λpen(β) =

i=1

β_i²− 2β_iX_i+ X_i²+ 2λ|β_i|i

and minimization is obtained coordinatewise.

βˆ_i¹ = sign(X_i)(|X_i| − λ)₊,

where sign(X_i) ∈ {+1, −1} denotes the sign of X_iand (|X_i|−λ)₊ = max(|X_i|−

λ; 0).

Correction : On R+, Θ(t) = t²− 2tX_i+ 2λt and arg min

t∈R+

Θ(t) = max(X_i − λ; 0) On R⁻, Θ(t) = t²− 2tX_i− 2λt and

arg min

t∈R−

Θ(t) = min(X_i+ λ; 0) - If X_i ≥ λ,

Θ(X_i−λ) = X_i²+λ²−2λX_i−2(X_i−λ)X_i+2λ(X_i−λ) = −X_i²−λ²+2λX_i = −(X_i−λ)² ≤ 0.

- If X_i ∈ [−λ; λ], Θ(0) = 0.

- If Xi ≤ −λ,

Θ(X_i+λ) = X_i²+λ²+2λX_i−2(X_i+λ)X_i−2λ(X_i+λ) = −X_i²−λ²−2λX_i = −(X_i+λ)² ≤ 0.

Finally,

βˆ_i¹ =







X_i− λ if X_i ≥ λ 0 if X_i ∈ [−λ; λ]

X_i + λ if X_i ≤ −λ







= sign(X_i)(|X_i| − λ)₊.

2. In the sequel, for F : Rⁿ 7→ Rⁿ a measurable function, we denote ˆβ = F (X) an estimate of β^∗. We denote (F₁, . . . , F_n) the R-valued components of F that are assumed to be C¹, so that F (X) = (F₁(X), . . . , F_n(X)). We consider g : R 7→ R assumed to be C¹ such that E[|g⁰(Z₁)|] < ∞. We denote φ the density of Z₁.

(a) Prove that for any t ∈ R,

φ⁰(t) = −tφ(t).

Correction : The density of Z₁ is φ(t) = ^√¹

2πexp(−t²/2). Therefore φ⁰(t) = −t × 1

√2πexp(−t²/2) = −tφ(t).

(b) Establish that Correction : Fubini’s theorem holds since

Z +∞

3. In the sequel, we fix i ∈ {1, . . . , n} and introduce G_i : Rⁿ 7→ R such that for any u ∈ Rⁿ,

Gi(u) = Fi(β^∗+ σu)

and X⁽⁻ⁱ⁾ the vector of Rⁿ⁻¹ built from X by removing its ith component:

X⁽⁻ⁱ⁾ = (X₁, . . . , X_i−1, X_i+1, . . . , X_n).

(a) Prove that the ith partial derivative of F_i satisfies

∂F_i

∂x_i(X) = 1 σ

∂G_i

∂x_i(Z).

Correction : We have, with u = (u₁, . . . , u_n)^T, Fi(u) = Gi

u₁− β₁^∗

σ , . . . ,u_n− β_n^∗ σ

, which implies

∂F_i

∂x_i(u) = 1 σ

∂G_i

∂x_i

u₁− β₁^∗

σ , . . . ,u_n− β_n^∗ σ

. With u = X, we obtain:

∂Fi

∂x_i(X) = 1 σ

∂Gi

∂x_i(Z).

(b) By using (A.8), deduce that

∂F_i

∂x_i(X) | X⁽⁻ⁱ⁾

= 1

σEZ_iG_i(Z) | X⁽⁻ⁱ⁾.

Correction : We have X⁽⁻ⁱ⁾ = β^∗(−i)+ σZ⁽⁻ⁱ⁾. So, by using (A.8),

∂F_i

∂x_i(X) | X⁽⁻ⁱ⁾

= 1 σE

∂G_i

∂x_i(Z) | Z⁽⁻ⁱ⁾

= 1

σEZ_iG_i(Z₁, . . . , Z_i−1, Z_i, Z_i+1, . . . , Z_n) | Z⁽⁻ⁱ⁾

= 1

σEZ_iG_i(Z) | X⁽⁻ⁱ⁾.

Correction : We have

E[C] = E[kX − F (X)k²] + 2σ²E Using the previous question, we have

E[C] = E[kF (X) − β^∗k²].

5. We now assume that the estimate F (X) depends on a hyperparameter λ; we write F (X) ≡ F^λ(X).

(a) Deduce a method to select the hyperparameter λ.

Correction : Since C is an unbiased estimate of E[kF (X) − β^∗k²], we naturally minimize the function

λ 7→ kX − F^λ(X)k²+ 2σ²

i=1

∂F_i^λ

∂x_i (X)

(b) We now assume that for all i ∈ {1, . . . , n}, F_i^λ(X) only depends on X_i and there exists H_i^λ such that for any x ∈ R,

F_i^λ(x) = Z x

H_i^λ(t)dt.

Prove that

C˜^λ = kX − F^λ(X)k²+ 2σ²

i=1

H_i^λ(X_i) − nσ² satisfies

E[C˜^λ] = E[kF^λ(X) − β^∗k²].

Correction : We assume that F_i^λ is absolutely continuous on R: there exists H_i^λ such that for any x ∈ R,

F_i^λ(x) = Z x

H_i^λ(t)dt.

We can check that all previous computations hold by replacing ^∂F_∂xⁱ^λ

i(X) by H_i^λ (see (A.9)).

(c) We consider the soft thresholding rule and estimate each coordinate β_i^∗ by F_i^λ(X) = sign(X_i)(|X_i| − λ)₊. Determine a good criterion to select λ.

Correction : We set

H_i^λ(x) = 1_{|x|>λ}, x ∈ R.

By distinguishing the cases x > 0 and x < 0, we obtain F_i^λ(x) =

Z x 0

H_i^λ(t)dt.

We obtain the SURE criterion (Stein Unbiased Risk Estimate criterion):

C˜^λ = kX − F^λ(X)k²+ 2σ²card{i : |X_i| > λ} − nσ².

Exercice 2

In the sequel, we denote

L2(R) :=

f : R 7→ C : kf k²₂ :=

|f (t)|²dt < ∞

endowed with the Euclidian scalar product:

hf, gi :=

f (t)g(t)dt.

For any f ∈ L²(R), we denote bf its Fourier transform:

f (ξ) :=b Z

e^−itξf (t)dt, ξ ∈ R.

We recall the inversion formula:

f (t) = 1 2π

e^itξf (ξ)dξ,b t ∈ R and the Plancherel formula:

f (t)g(t)dt = 1 2π

f (ξ)b bg(ξ)dξ, Z

|f (t)|²dt = 1 2π

| bf (ξ)|²dξ.

We recall that a multiresolution analysis V = (V_j)_j∈Z is a sequence of nested vector spaces satisfying

{0} ⊂ · · · ⊂ V_j+1⊂ V_j ⊂ V_j−1 ⊂ · · · ⊂ L2(R)

such that, for any j ∈ Z, if PVj is the orthogonal projection on V_j, for any f ∈ L2(R), 1. kP_V_jf − f k₂ ^j→−∞−→ 0

2. kP_V_jf k₂ ^j→+∞−→ 0

3. f ∈ V_j ⇐⇒ x 7→ f (x/2) ∈ V_j+1 for any j ∈ Z 4. f ∈ V_j ⇐⇒ x 7→ f (x + 2^jk) ∈ V_j for any (j, k) ∈ Z²

5. ∃ φ such that (φ_k)_k∈Z is an orthonormal basis of V₀ with for any x ∈ R, φk(x) = φ(x − k).

In the sequel, we set

φ(t) = sin(πt)

πt , t ∈ R.

The goal is to prove that the sequence of vector spaces (V_j)_j∈Z, defined by V_j =n

f ∈ L2(R) : supp( bf ) ⊂ [−2^−jπ, 2^−jπ)o

, j ∈ Z, is a multiresolution analysis associated with φ.

1. Establish that

φ(ξ) = 1b _[−π,π)(ξ), ξ ∈ R.

Indication : Use the Fourier inversion formula.

Correction : We have for any t ∈ R:

1 2π

e^itξ1_[−π,π)(ξ)dξ = 1

2πit(e^itπ− e^−itπ)

= sin(πt) πt . This provides the result.

2. We consider f ∈ V₀ and we set g such that bg is 2π-periodic and bg(ξ) := bf (ξ), ξ ∈ [−π, π).

(a) By using that any 2π-periodic function can be decomposed on (ξ 7→ e^ikξ)_k∈Z, prove that there exists (ak)_k∈Z ∈ `2(Z) such that for any ξ ∈ R,

bg(ξ) =X

k∈Z

a_ke^−ikξ.

Correction : Since bg is 2π-periodic, this is obvious.

(b) By computing bφk(ξ) for any k ∈ Z and any ξ ∈ R, deduce that f (t) =X

k∈Z

a_kφ_k(t), t ∈ R.

Correction : We have for any k ∈ Z and ξ ∈ R, φb_k(ξ) =

e^−itξφ(t − k)dt

= e^−ikξ Z

e^−itξφ(t)dt

= e^−ikξ1_[−π,π)(ξ).

Then, we have

f (ξ) =b bg(ξ)1_[−π,π)(ξ)

k∈Z

a_ke^−ikξ1_[−π,π)(ξ)

k∈Z

a_kφb_k(ξ).

This gives the result.

3. We study projections on the spaces (V_j)_j∈Z.

(a) Let j ∈ Z. Prove that for any f ∈ L2(R), the projection of f on Vj is given by the function f_j such that for any ξ ∈ R,

fb_j(ξ) := bf (ξ) × 1_[−2^−j_π,2^−j_π)(ξ).

Correction : We obviously have that f_j ∈ V_j. Furthermore, for any g_j ∈ V_j then bg_j is supported by [−2^−jπ, 2^−jπ), we have that

hf − fj, gji = Z

(f (t) − fj(t))gj(t)dt = 1 2π

( bf (ξ) − bfj(ξ))bgj(ξ)dξ = 0.

This gives the result.

(b) Let f ∈ L2(R). Prove that kPVjf − f k₂ ^j→−∞−→ 0.

Correction : We have:

kP_V_jf − f k²₂ = 1

2πk bf_j − bf k²₂ = 1 2π

ξ6∈[−2^−jπ,2^−jπ)

| bf (ξ)|²dξ.

Since bf ∈ L2(R), the right hand side goes to 0 when j → −∞.

Correction : We have:

kP_V_jf k²₂ = 1

2πk bf_jk²₂ = 1 2π

ξ∈[−2^−jπ,2^−jπ)

| bf (ξ)|²dξ.

Since bf ∈ L2(R), the right hand side goes to 0 when j → +∞.

4. Establish that (V_j)_j∈Z is a multiresolution analysis.

Correction :

- Previous questions give Conditions 1 and 2.

- We prove Condition 5. We have that for any (k, k⁰) ∈ Z², Z

φ_k(t)φ_k⁰(t)dt = 1 2π

φˆ_k(ξ) ˆφ_k⁰(ξ)dξ = 1 2π

Z π

−π

e^i(k⁰^−k)ξdξ = 1{k=k⁰}, by using the first part of Question 2b). This result, combined with the second part of Question 2b), shows that Condition 5 is satisfied.

- We prove Condition 3. Let f ∈ L2(R). We denote g : x 7→ f (x/2). We have for ξ ∈ R,

bg(ξ) = Z

f (x/2)e^−ixξdx = 2 Z

f (t)e^−2itξdt = 2 bf (2ξ) and for any j ∈ Z,

supp( bf ) ⊂ [−2^−jπ, 2^−jπ) ⇐⇒ supp(bg) ⊂ [−2^−j−1π, 2^−j−1π) meaning that

f ∈ V_j ⇐⇒ x 7→ f (x/2) ∈ V_j+1.

- We prove Condition 4. Let f ∈ L2(R) and (j, k) ∈ Z². We denote g : x 7→

f (x + 2^jk). We have for ξ ∈ R,

bg(ξ) = Z

f (x + 2^jk)e^−ixξdx = eⁱ²^j^kξ Z

f (t)e^−itξdt = eⁱ²^j^kξf (ξ)b and

supp( bf ) = supp(bg) meaning that

f ∈ V_j ⇐⇒ g ∈ V_j.

[1] Aza¨ıs, J.M. et Bardet, J.M. (2005) Le mod`ele lin´eaire par l’exemple. Dunod.

[2] Bickel, P.J. et Doksum, K.A. (2000) Mathematical Statistics: Basic Ideas and Selected Topics. Prentice Hall.

[3] B¨uhlmann, P. and van de Geer, S. (2000) Statistics for High-Dimensional Data.

Springer.

[4] Cornillon, P.A. et Matzner-Løber, E. (2007) R´egression - Th´eorie et applications.

Springer.

[5] Cornillon, P.A. et Matzner-Løber, E. (2011) R´egression avec R. Springer.

[6] Dacunha-Castelle, D. et Duflo, M. (1997) Probabilit´es et statistiques. Masson.

[7] Daubechies, I. (1992). Ten lectures on wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, 61. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA

[8] Dobson, A.J. et Barnett, A.G. (2008) An introduction to generalized linear models.

Chapman & Hall.

[9] Donoho, D. (2000) High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. American Math. Society ”Math Challenges of the 21st Century” . [10] Fourdriner, D. (2002) Statistique inf´erentielle. Dunod.

[11] Giraud, C. (2015) Introduction to High-Dimensional Statistics. Chapman & Hall.

[12] Guyon, X. (2001) Statistique et ´econom´etrie. Ellipse.

[13] H¨ardle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998) Wavelets, ap-proximation, and statistical applications. Lecture Notes in Statistics, 129. Springer-Verlag, New York.

109

[14] Hastie, T., Tibshirani, R. and Friedman J. (2009) The Elements of Statistical Learn-ing: Data Mining, Inference, and Prediction. Springer, New York.

[15] Mallat, S. (1998). A wavelet tour of signal processing. Academic Press, Inc., San Diego, CA.

[16] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Mathematics. Springer.

[17] Lafaye de Micheaux, P., Drouilhet R. et Liquet B. (2011) Le logiciel R - Matriser le langage Effectuer des analyses statistiques Springer

[18] McCullagh, P. et Nelder, J.A. (1989) Generalized linear models. Chapman & Hall.

[19] Rivoirard, V. et Stoltz, G. (2009) Statistique en action. Vuibert.

[20] Saporta, G. (1990) Probabilit´es, analyse des donn´ees et statistique. Technip.

[21] Tenenhaus, M. (1998) La r´egression PLS - Th´eorie et pratique. Technip.

[22] Tenenhaus, M. (2007) Statistique - Méthodes pour décrire, expliquer et prévoir.

Dunod.

[23] Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist.

Soc. Ser. B 58, no. 1, 267–288.

[24] Wasserman, L. (2005) All of statistics. A concise course in statistical inference.

Springer.

In document High-dimensional statistics (Page 98-110)