Regularization methods for Sliced Inverse Regression

(1)

Regularization methods for Sliced Inverse Regression

Caroline Bernard-Michel, Laurent Gardes, Stephane Girard

To cite this version:

Caroline Bernard-Michel, Laurent Gardes, Stephane Girard. Regularization methods for Sliced Inverse Regression. 8th International Conference on Operations Research,, Feb 2008, La Ha-vane, Cuba. <hal-00985822>

HAL Id: hal-00985822

https://hal.inria.fr/hal-00985822

Submitted on 30 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche fran¸cais ou étrangers, des laboratoires publics ou privés.

(2)

Regularization methods for Sliced Inverse Regression

Stéphane Girard

Team Mistis, INRIA Rhône-Alpes, France http ://mistis.inrialpes.fr/˜girard

February 2008

(3)

Outline

1 Sliced Inverse Regression (SIR)

2 Inverse regression without regularization

3 Inverse regression with regularization

4 Validation on simulations

(4)

Outline

(5)

SIR : Goal

[Li, 1991]

Infer the conditional distribution of a response r.v. Y ∈R

given a predictor X∈Rp.

Whenp is large, curse of dimensionality.

Sufficient dimension reduction aims at replacingX by its projection onto a subspace of smaller dimension without loss of information on the distribution of Y givenX.

The central subspaceis the smallest subspace S such that, conditionally on the projection of X onS,Y andX are independent.

(6)

SIR : Basic principle

Assumedim(S) = 1 for the sake of simplicity,i.e. S=span(b), withb∈Rp =⇒ Single index model :

Y =g(btX) +ξ whereξ is independent ofX.

Idea:

Find the directionb such thatbt_X _{best explains} _Y_.

Conversely, when Y is fixed,btX should not vary.

Find the directionb minimizing the variations ofbtX given Y.

In practice:

The range of Y is partitioned intoh slicesSj. Minimize the within slice variance ofbt_X _{under the}

normalization constraint var(btX) = 1.

Equivalent to maximizing the between slice varianceunder the same constraint.

(7)

(8)

SIR : Estimation procedure

Given a sample{(X1, Y1), . . . ,(Xn, Yn)}, the directionb is

estimated by

ˆ_b_{= argmax}

b

btΓˆb u.c. btΣˆb= 1. (1)

whereΣˆ is the estimated covariance matrix and Γˆ is the between slice covariance matrix defined by

ˆ Γ = h X j=1 nj n( ¯Xj−X¯)( ¯Xj−X¯) t_, _X_¯ j = 1 nj X Yi∈Sj Xi,

with nj is proportion of observations in sliceSj. The optimization

problem (1) has an explicit solution :ˆbis the eigenvector of Σˆ−1_Γˆ

(9)

SIR : Limitations

Problem :Σˆ can be singular, or at least ill-conditioned, in several situations.

Since rank( ˆΣ)≤min(n−1, p), ifn≤pthen Σˆ is singular. Even whenn andpare of the same order, Σˆ is ill-conditioned, and its inversion introduces numerical instabilities in the estimation of the central subspace.

Similar phenomena occur when the coordinates ofX are highly correlated.

(10)

SIR : Numerical experiment (1/2)

Experimental set-up.

A sample {(X1, Y1), . . . ,(Xn, Yn)}of size n= 100 where

Xi∈Rp with p= 50andYi∈R, fori= 1, . . . , n.

Xi∼ Np(0,Σ) with Σ =Q∆Qt where

∆ =diag(pθ_{, . . . ,}₂θ_,₁θ_),

Qis a matrix drawn from the uniform distribution on the set of orthogonal matrices.

=⇒ The condition number ofΣ ispθ.(Here,θ= 2).

Yi=g(btXi) +ξ where

g is the link functiong(t) = sin(πt/2),

bis the true directionb= 5−1/2Q(1,1,1,1,1,0, . . . ,0)t, ξ∼ N₁(0,9.10−4)

(11)

SIR : Numerical experiment (2/2)

Blue : Projectionsbt_X

i on the

true directionbversus Yi, Red : Projections ˆbtXi on

the estimated direction ˆb ver-susYi,

(12)

Outline

(13)

Single-index inverse regression model

Model introduced in[Cook, 2007].

X =µ+c(Y)V b+ε, (2)

where

µ andbare non-random Rp− vectors,

ε∼ Np(0, V), independent of Y,

c:R→R is a nonrandom coordinate function.

Consequence :The conditional expectation ofX−µgiven Y is a degenerated random vector located in the directionV b.

(14)

Maximum Likelihood estimation (1/3)

Projection estimator of the coordinate function. c(.) is expanded as a linear combination of h basis functions sj(.),

c(.) =

h

X

j=1

cjsj(.) =st(.)c,

wherec= (c1, . . . , ch)t is unknown and

s(.) = (s1(.), . . . , sh(.))t. Model (2) can be rewritten as

X =µ+st(Y)cV b+ε, ε∼ Np(0, V),

Definition :Signal to Noise Ratio in the direction b.

ρ= b

t_Σ_b₋_bt_{V b}

(15)

Maximum Likelihood estimation (2/3)

Notations

W : the h×hempirical covariance matrix ofs(Y) defined by

W = 1 n n X i=1 (s(Yi)−s¯)(s(Yi)−s¯)t with s¯= 1 n n X i=1 s(Yi).

M : theh×p matrix defined by

M = 1 n n X i=1 (s(Yi)−¯s)(Xi−X¯)t,

(16)

Maximum Likelihood estimation (3/3)

IfW andΣˆ are regular, then the ML estimators are :

Direction:ˆb is the eigenvector associated to the largest eigenvalue λˆ ofΣˆ−1_Mt_W−1_M_,

Coordinate:cˆ=W−1_Mˆ_b/ˆ_bt_Vˆˆ_b_,

Location parameter :µˆ= ¯X−s¯tcˆVˆˆb,

Covariance matrix :Vˆ = ˆΣ−ˆλΣˆˆbˆbtΣˆ/ˆbtΣˆˆb,

Signal to Noise Ratio :ρˆ= ˆλ/(1−λˆ). The inversion ofΣˆ is still necessary.

(17)

SIR : A particular case

In the particular case ofpiecewise constant basis functions

sj(.) =I{.∈Sj}, j= 1, . . . , h,

standard calculations show that

MtW−1

M = ˆΓ

and thus the ML estimatorˆb ofb is the eigenvector associated to the largest eigenvalue ofΣˆ−1_Γ.ˆ

(18)

Outline

(19)

Gaussian prior

Introduction of a prior information on the projection ofX onb

appearing in the inverse regression model

(1 +ρ)−1/2₍_s₍_Y₎₋_s_¯₎t_cb_{∼ N}₍₀_,_Ω)_.

(1 +ρ)−1/2 _{is introduced for normalization purposes,}

permitting to preserve the interpretation of the eigenvalue in terms of signal to noise ratio.

Ω describes which directions inRp are the most likely to contain b.

(20)

Gaussian regularized estimators

IfW andΩ ˆΣ +Ip are regular, the ML estimators are

Direction :ˆb is the eigenvector associated to the largest eigenvalue λˆ of(Ω ˆΣ +Ip)−1ΩMtW−1M,

Coordinate : ˆc=W−1_Mˆ_b/_{((1 +}_η_(ˆ_b_))ˆ_bt_Vˆˆ_b₎_{, with}

η(ˆb) = ˆbt_Ω−1ˆ_b/ˆ_bt_Σˆˆ_b_, ˆ

µ,Vˆ andρˆare unchanged.

=⇒The inversion ofΣˆ is replaced by the inversion ofΩ ˆΣ +Ip.

=⇒For a properly chosen prior matrix Ω, the numerical instabilities in the estimation ofbdisappear.

(21)

Gaussian regularized SIR (1/2)

GRSIR :In the particular case of piecewise constant basis

functions, the ML estimatorˆb ofb is the eigenvector associated to the largest eigenvalue of(Ω ˆΣ +Ip)−1ΩˆΓ.

Links with existing methods

Ridge [Zhong et al, 2005]:Ω =τ−1_I

p. No privileged direction

for bin Rp.τ >0 is the regularization parameter. PCA+SIR[Chiaromonte et al, 2002] :

Ω = d X j=1 1 ˆ δj ˆ qjqˆjt,

whered∈ {1, . . . , p} is fixed, ˆδ1 ≥ · · · ≥δˆd are the dlargest

(22)

Gaussian regularized SIR (2/2)

Three new methods

PCA+ridge : Ω = 1 τ d X j=1 ˆ qjqˆjt.

No privileged direction in thed-dimensional eigenspace. Tikhonov :Ω =τ−1_{Σ. Directions with large variance are most}ˆ likely. PCA+Tikhonov : Ω = 1 τ d X j=1 ˆ δjqˆjqˆjt.

In thed-dimensional eigenspace, directions with large variance are most likely.

(23)

Outline

(24)

Validation on simulations

Experimental set-up :Same as previously.

Proximity criterionbetween the true directionband the estimated onesˆb(r) onN = 100 replications :

PC= 1 N N X r=1 (btˆb(r))2 0≤PC≤1,

a value close to 0 implies a low proximity : Theˆb(r) are nearly orthogonal to b,

a value close to 1 implies a high proximity : Theˆb(r) _are

(25)

Influence of the regularization parameter

logτ versus PC. The “cut-off” dimension and the condition number are fixed (d= 20and θ= 2).

Ridge andTikhonov : significant improvement ifτ is large,

PCA+SIR: reasonable results compared to SIR,

(26)

Sensitivity with respect to the condition number of

the covariance matrix

θversus PC. The “cut-off” dimension is fixed tod= 20. The optimal regularization parameter is used for each value ofθ.

Only SIRis very sensitive to the ill-conditioning, ridge andTikhonov : similar results,

(27)

Sensitivity with respect to the “cut-off” dimension

dversus PC. The condition number is fixed (θ= 2) The optimal regularization parameter is used for each value ofd.

(28)

Outline

(29)

Estimation of Mars surface physical properties from

hyperspectral images

Context :

Observation of the south pole of Mars at the end of summer, collected during orbit 61 by the French imaging spectrometer OMEGA on board Mars Express Mission.

3D image : On each pixel, a spectra containing p= 184 wavelengths is recorded.

This portion of Mars mainly contains water ice, CO2 and dust. Goal :For each spectraX ∈Rp, estimate the corresponding physical parameterY ∈R(grain size of CO2).

(30)

An inverse problem

Forward problem.

Physical modeling of individual spectra with a surface reflectance model.

Starting from a physical parameter Y, simulateX=F(Y). Generation of n= 12,000synthetic spectra with the corresponding parameters.

=⇒Learning database.

Inverse problem.

Estimate the fonctional relationship Y =G(X). Dimension reduction assumption G(X) =g(btX).

b is estimated by SIR/GRSIR,g is estimated by a nonparametric one-dimensional regression.

(31)

Estimated functional relationship

(32)

Estimated CO2

maps

Grain size of CO2 estimated by SIR (left) and GRSIR (right) on an

(33)

References

[Li, 1991] Li, K.C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association,86, 316–327.

[Cook, 2007]. Cook, R.D. (2007). Fisher lecture : Dimension reduction in regression. Statistical Science,22(1), 1–26.

[Zhong et al, 2005]. Zhong, W., Zeng, P., Ma, P., Liu, J.S. and Zhu, Y. (2005). RSIR : Regularized Sliced Inverse Regression for motif discovery. Bioinformatics,21(22), 4169–4175.

[Chiaromonte et al, 2002]. Chiaromonte, F. and Martinelli, J. (2002). Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences,176, 123–144.

(34)

References

R. Coudret, S. Girard & J. Saracco. A new sliced inverse regression method for multivariate response,Computational Statistics and Data Analysis, to appear, 2014.

M. Chavent, S. Girard, V. Kuentz, B. Liquet, T.M.N. Nguyen & J. Saracco. A sliced inverse regression approach for data stream, Computational Statistics, to appear, 2014.

C. Bernard-Michel, S. Douté, M. Fauvel, L. Gardes & S. Girard. Retrieval of Mars surface physical properties from OMEGA hyperspectral images using Regularized Sliced Inverse Regression, Journal of Geophysical Research - Planets, 114, E06005, 2009. C. Bernard-Michel, L. Gardes & S. Girard. A Note on Sliced Inverse Regression with Regularizations,Biometrics, 64, 982–986, 2008. C. Bernard-Michel, L. Gardes & S. Girard. Gaussian Regularized Sliced Inverse Regression, Statistics and Computing, 19, 85–98, 2009.

A. Gannoun, S. Girard, C. Guinot & J. Saracco. Sliced Inverse Regression in reference curves estimation,