Regularization methods for Sliced Inverse Regression

Caroline Bernard-Michel, Laurent Gardes, Stephane Girard

To cite this version:

Caroline Bernard-Michel, Laurent Gardes, Stephane Girard. Regularization methods for Sliced Inverse Regression. 8th International Conference on Operations Research, Feb 2008, La Havane, Cuba. <hal-00985822>

HAL Id: hal-00985822

https://hal.inria.fr/hal-00985822

Submitted on 30 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Stéphane Girard

Team Mistis, INRIA Rhône-Alpes, France http://mistis.inrialpes.fr/~girard

February 2008

Outline

1 Sliced Inverse Regression (SIR)

2 Inverse regression without regularization

3 Inverse regression with regularization

4 Validation on simulations

SIR: Goal

[Li, 1991]

Infer the conditional distribution of a response random variable $Y \in \mathbb{R}$ given a predictor $X \in \mathbb{R}^p$.

When $p$ is large: curse of dimensionality.

Sufficient dimension reduction aims at replacing $X$ by its projection onto a subspace of smaller dimension without loss of information on the distribution of $Y$ given $X$.

The central subspace is the smallest subspace $S$ such that, conditionally on the projection of $X$ on $S$, $Y$ and $X$ are independent.

SIR: Basic principle

Assume $\dim(S) = 1$ for the sake of simplicity, i.e. $S = \mathrm{span}(b)$ with $b \in \mathbb{R}^p$ $\Longrightarrow$ single index model:

$Y = g(b^t X) + \xi$, where $\xi$ is independent of $X$.

Idea:

Find the direction $b$ such that $b^t X$ best explains $Y$.

Conversely, when $Y$ is fixed, $b^t X$ should not vary.

Find the direction $b$ minimizing the variations of $b^t X$ given $Y$.

In practice:

The range of $Y$ is partitioned into $h$ slices $S_j$.

Minimize the within-slice variance of $b^t X$ under the normalization constraint $\mathrm{var}(b^t X) = 1$.

Equivalent to maximizing the between-slice variance under the same constraint.
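
The equivalence between the two criteria follows from the law of total variance. As a short derivation, write $J$ for the index of the slice containing $Y$ (a notation introduced here for convenience, not taken from the slide):

```latex
\operatorname{var}(b^t X)
  = \underbrace{\mathbb{E}\left[\operatorname{var}(b^t X \mid J)\right]}_{\text{within-slice variance}}
  + \underbrace{\operatorname{var}\left(\mathbb{E}\left[b^t X \mid J\right]\right)}_{\text{between-slice variance}}
  = 1 .
```

Under the normalization constraint the two terms sum to one, so decreasing the first is the same as increasing the second.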

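SIR: Estimation procedure

Given a sample $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$, the direction $b$ is estimated by

$$\hat b = \arg\max_b \; b^t \hat\Gamma b \quad \text{under the constraint} \quad b^t \hat\Sigma b = 1, \qquad (1)$$

where $\hat\Sigma$ is the estimated covariance matrix and $\hat\Gamma$ is the between-slice covariance matrix defined by

$$\hat\Gamma = \sum_{j=1}^{h} \frac{n_j}{n} (\bar X_j - \bar X)(\bar X_j - \bar X)^t, \qquad \bar X_j = \frac{1}{n_j} \sum_{Y_i \in S_j} X_i,$$

where $n_j$ is the number of observations in slice $S_j$, so that $n_j/n$ is the proportion of observations in that slice. The optimization problem (1) has an explicit solution: $\hat b$ is the eigenvector of $\hat\Sigma^{-1}\hat\Gamma$ associated to the largest eigenvalue.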
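
The estimation procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `sir_direction` is hypothetical, slices are built with (roughly) equal counts from the sorted responses, and the covariance uses the $1/n$ normalization. The toy data at the end (identity covariance, cubic link) are also assumptions chosen so that the covariance is easy to invert.

```python
import numpy as np

def sir_direction(X, Y, h=10):
    """Sketch of SIR: top eigenvector of Sigma_hat^{-1} Gamma_hat."""
    n, p = X.shape
    X_bar = X.mean(axis=0)
    Sigma_hat = np.cov(X, rowvar=False, bias=True)   # empirical covariance (1/n)

    # Partition the observations into h slices along the sorted values of Y
    slices = np.array_split(np.argsort(Y), h)

    # Between-slice covariance: sum_j (n_j / n) (Xbar_j - Xbar)(Xbar_j - Xbar)^t
    Gamma_hat = np.zeros((p, p))
    for idx in slices:
        diff = X[idx].mean(axis=0) - X_bar
        Gamma_hat += (len(idx) / n) * np.outer(diff, diff)

    # Eigen-decomposition of Sigma_hat^{-1} Gamma_hat (solve avoids an explicit inverse)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sigma_hat, Gamma_hat))
    k = np.argmax(eigvals.real)
    b_hat = eigvecs[:, k].real

    # Enforce the normalization constraint b^t Sigma_hat b = 1
    b_hat = b_hat / np.sqrt(b_hat @ Sigma_hat @ b_hat)
    return b_hat, eigvals.real[k]

# Toy usage on a well-conditioned problem (assumed data, not the slides' experiment)
rng = np.random.default_rng(0)
n, p = 2000, 5
b = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
X = rng.standard_normal((n, p))
Y = (X @ b) ** 3 + 0.1 * rng.standard_normal(n)
b_hat, lam = sir_direction(X, Y, h=10)
cos2 = (b @ b_hat) ** 2 / (b_hat @ b_hat)   # squared cosine between b and b_hat
```

Solving the linear system still requires $\hat\Sigma$ to be invertible, so the sketch only makes sense when $n$ is comfortably larger than $p$.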

SIR: Limitations

Problem: $\hat\Sigma$ can be singular, or at least ill-conditioned, in several situations.

Since $\mathrm{rank}(\hat\Sigma) \le \min(n-1, p)$, if $n \le p$ then $\hat\Sigma$ is singular.

Even when $n$ and $p$ are of the same order, $\hat\Sigma$ is ill-conditioned, and its inversion introduces numerical instabilities in the estimation of the central subspace.

Similar phenomena occur when the coordinates of $X$ are highly correlated.
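
The rank bound is easy to check numerically. The snippet below is a small sketch with assumed sizes ($n = 30$, $p = 50$) and standard Gaussian data; centering the sample removes one degree of freedom, which is why the rank cannot exceed $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 50                      # fewer observations than dimensions
X = rng.standard_normal((n, p))

# Empirical covariance matrix (p x p), built from n centered observations
Sigma_hat = np.cov(X, rowvar=False, bias=True)

# Numerical rank: at most min(n - 1, p), hence Sigma_hat is singular here
rank = np.linalg.matrix_rank(Sigma_hat)
```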

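SIR: Numerical experiment (1/2)

Experimental set-up.

A sample $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$ of size $n = 100$, where $X_i \in \mathbb{R}^p$ with $p = 50$ and $Y_i \in \mathbb{R}$, for $i = 1, \dots, n$.

$X_i \sim \mathcal{N}_p(0, \Sigma)$ with $\Sigma = Q \Delta Q^t$, where

$\Delta = \mathrm{diag}(p^\theta, \dots, 2^\theta, 1^\theta)$,

$Q$ is a matrix drawn from the uniform distribution on the set of orthogonal matrices.

$\Longrightarrow$ The condition number of $\Sigma$ is $p^\theta$. (Here, $\theta = 2$.)

$Y_i = g(b^t X_i) + \xi$, where

$g$ is the link function $g(t) = \sin(\pi t/2)$,

$b$ is the true direction $b = 5^{-1/2} Q (1,1,1,1,1,0,\dots,0)^t$,

$\xi \sim \mathcal{N}_1(0, 9 \cdot 10^{-4})$.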
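
A possible way to generate data following this set-up is sketched below. The function names are hypothetical, the orthogonal matrix is drawn through a QR decomposition of a Gaussian matrix with a sign correction (a standard recipe, not one stated on the slide), and the noise variance defaults to $9 \cdot 10^{-4}$.

```python
import numpy as np

def random_orthogonal(p, rng):
    """Draw Q uniformly (Haar measure) on the set of p x p orthogonal matrices."""
    A = rng.standard_normal((p, p))
    Q, R = np.linalg.qr(A)
    # Flip column signs so that the diagonal of R is positive (needed for uniformity)
    return Q * np.sign(np.diag(R))

def simulate(n=100, p=50, theta=2.0, noise_var=9e-4, seed=0):
    rng = np.random.default_rng(seed)
    Q = random_orthogonal(p, rng)
    delta = np.arange(p, 0, -1, dtype=float) ** theta   # p^theta, ..., 2^theta, 1^theta
    Sigma = (Q * delta) @ Q.T                           # Q diag(delta) Q^t

    # True direction: normalized sum of the first five columns of Q
    e = np.zeros(p)
    e[:5] = 1.0
    b = Q @ e / np.sqrt(5)

    # X_i = Q diag(sqrt(delta)) Z_i with Z_i ~ N(0, I_p), so cov(X_i) = Sigma
    X = (rng.standard_normal((n, p)) * np.sqrt(delta)) @ Q.T
    Y = np.sin(np.pi * (X @ b) / 2) + np.sqrt(noise_var) * rng.standard_normal(n)
    return X, Y, b, Sigma

X, Y, b, Sigma = simulate()
cond_number = np.linalg.cond(Sigma)   # ratio of extreme eigenvalues, p^theta / 1^theta
```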

SIR: Numerical experiment (2/2)

Figure: Blue: projections $b^t X_i$ on the true direction $b$ versus $Y_i$. Red: projections $\hat b^t X_i$ on the estimated direction $\hat b$ versus $Y_i$.

Single-index inverse regression model

Model introduced in [Cook, 2007]:

$$X = \mu + c(Y)\, V b + \varepsilon, \qquad (2)$$

where

$\mu$ and $b$ are non-random vectors of $\mathbb{R}^p$,

$\varepsilon \sim \mathcal{N}_p(0, V)$, independent of $Y$,

$c : \mathbb{R} \to \mathbb{R}$ is a non-random coordinate function.

Consequence: the conditional expectation of $X - \mu$ given $Y$ is a degenerate random vector located in the direction $Vb$.

Maximum Likelihood estimation (1/3)

Projection estimator of the coordinate function. $c(\cdot)$ is expanded as a linear combination of $h$ basis functions $s_j(\cdot)$,

$$c(\cdot) = \sum_{j=1}^{h} c_j s_j(\cdot) = s^t(\cdot)\, c,$$

where $c = (c_1, \dots, c_h)^t$ is unknown and $s(\cdot) = (s_1(\cdot), \dots, s_h(\cdot))^t$. Model (2) can be rewritten as

$$X = \mu + s^t(Y)\, c\, V b + \varepsilon, \qquad \varepsilon \sim \mathcal{N}_p(0, V).$$

Definition: Signal to Noise Ratio in the direction $b$,

$$\rho = \frac{b^t \Sigma b - b^t V b}{b^t V b}.$$

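Maximum Likelihood estimation (2/3)

Notations

$W$: the $h \times h$ empirical covariance matrix of $s(Y)$ defined by

$$W = \frac{1}{n} \sum_{i=1}^{n} (s(Y_i) - \bar s)(s(Y_i) - \bar s)^t \quad \text{with} \quad \bar s = \frac{1}{n} \sum_{i=1}^{n} s(Y_i).$$

$M$: the $h \times p$ matrix defined by

$$M = \frac{1}{n} \sum_{i=1}^{n} (s(Y_i) - \bar s)(X_i - \bar X)^t.$$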

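Maximum Likelihood estimation (3/3)

If $W$ and $\hat\Sigma$ are regular, then the ML estimators are:

Direction: $\hat b$ is the eigenvector associated to the largest eigenvalue $\hat\lambda$ of $\hat\Sigma^{-1} M^t W^{-1} M$,

Coordinate: $\hat c = W^{-1} M \hat b / \hat b^t \hat V \hat b$,

Location parameter: $\hat\mu = \bar X - \bar s^t \hat c\, \hat V \hat b$,

Covariance matrix: $\hat V = \hat\Sigma - \hat\lambda\, \hat\Sigma \hat b \hat b^t \hat\Sigma / \hat b^t \hat\Sigma \hat b$,

Signal to Noise Ratio: $\hat\rho = \hat\lambda / (1 - \hat\lambda)$.

$\Longrightarrow$ The inversion of $\hat\Sigma$ is still necessary.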

SIR: A particular case

In the particular case of piecewise constant basis functions

$$s_j(\cdot) = \mathbb{I}\{\cdot \in S_j\}, \quad j = 1, \dots, h,$$

standard calculations show that

$$M^t W^{-1} M = \hat\Gamma$$

and thus the ML estimator $\hat b$ of $b$ is the eigenvector associated to the largest eigenvalue of $\hat\Sigma^{-1}\hat\Gamma$.
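
The identity can be checked numerically on toy data. One caveat of this sketch: when all $h$ slice indicators are used they sum to one, so the empirical $W$ is singular; the code therefore uses the Moore-Penrose pseudo-inverse in place of $W^{-1}$, which is an assumption of this illustration and not something the slides specify. The data and sizes are also assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, h = 500, 4, 5
X = rng.standard_normal((n, p))
Y = X[:, 0] + 0.5 * rng.standard_normal(n)

# Slice labels: h slices with equal counts along the sorted values of Y
labels = np.empty(n, dtype=int)
for j, idx in enumerate(np.array_split(np.argsort(Y), h)):
    labels[idx] = j

# Row i of S is s(Y_i), the vector of slice indicators
S = np.eye(h)[labels]

S_c = S - S.mean(axis=0)          # s(Y_i) - s_bar
X_c = X - X.mean(axis=0)          # X_i - X_bar
W = S_c.T @ S_c / n               # h x h empirical covariance of s(Y)
M = S_c.T @ X_c / n               # h x p cross-covariance

# Between-slice covariance matrix computed directly from the slice means
Gamma_hat = np.zeros((p, p))
for j in range(h):
    in_slice = labels == j
    diff = X[in_slice].mean(axis=0) - X.mean(axis=0)
    Gamma_hat += in_slice.mean() * np.outer(diff, diff)

# Pseudo-inverse stands in for W^{-1} because W has a zero eigenvalue here
lhs = M.T @ np.linalg.pinv(W, rcond=1e-10) @ M
max_gap = np.abs(lhs - Gamma_hat).max()
```

On this example `max_gap` is at the level of rounding error, in line with the identity stated on the slide.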

Gaussian prior

Introduction of prior information on the projection of $X$ on $b$ appearing in the inverse regression model:

$$(1+\rho)^{-1/2}\, (s(Y) - \bar s)^t c\, b \sim \mathcal{N}(0, \Omega).$$

$(1+\rho)^{-1/2}$ is introduced for normalization purposes: it preserves the interpretation of the eigenvalue in terms of signal to noise ratio.

$\Omega$ describes which directions in $\mathbb{R}^p$ are the most likely to contain $b$.

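Gaussian regularized estimators

If $W$ and $\Omega\hat\Sigma + I_p$ are regular, the ML estimators are:

Direction: $\hat b$ is the eigenvector associated to the largest eigenvalue $\hat\lambda$ of $(\Omega\hat\Sigma + I_p)^{-1}\, \Omega\, M^t W^{-1} M$,

Coordinate: $\hat c = W^{-1} M \hat b / \big( (1 + \eta(\hat b))\, \hat b^t \hat V \hat b \big)$, with $\eta(\hat b) = \hat b^t \Omega^{-1} \hat b / \hat b^t \hat\Sigma \hat b$,

$\hat\mu$, $\hat V$ and $\hat\rho$ are unchanged.

$\Longrightarrow$ The inversion of $\hat\Sigma$ is replaced by the inversion of $\Omega\hat\Sigma + I_p$.

$\Longrightarrow$ For a properly chosen prior matrix $\Omega$, the numerical instabilities in the estimation of $b$ disappear.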
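
To see how the prior changes the matrix being inverted, it may help to rewrite the eigenproblem in the special case where $\Omega$ itself is invertible. This is an assumption made here only for the derivation; it does not hold for low-rank priors.

```latex
(\Omega\hat\Sigma + I_p)^{-1}\,\Omega
  = \left[\Omega\,(\hat\Sigma + \Omega^{-1})\right]^{-1}\Omega
  = (\hat\Sigma + \Omega^{-1})^{-1}\,\Omega^{-1}\,\Omega
  = (\hat\Sigma + \Omega^{-1})^{-1},
\qquad\text{so that}\qquad
M^t W^{-1} M\,\hat b = \hat\lambda\,(\hat\Sigma + \Omega^{-1})\,\hat b .
```

In this reading, $\Omega^{-1}$ acts as a regularizing term added to $\hat\Sigma$. For instance, $\Omega = \tau^{-1} I_p$ gives $\hat\Sigma + \tau I_p$, the usual ridge form.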

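Gaussian regularized SIR (1/2)

GRSIR: In the particular case of piecewise constant basis functions, the ML estimator $\hat b$ of $b$ is the eigenvector associated to the largest eigenvalue of $(\Omega\hat\Sigma + I_p)^{-1}\, \Omega\, \hat\Gamma$.

Links with existing methods

Ridge [Zhong et al, 2005]: $\Omega = \tau^{-1} I_p$. No privileged direction for $b$ in $\mathbb{R}^p$. $\tau > 0$ is the regularization parameter.

PCA+SIR [Chiaromonte et al, 2002]:

$$\Omega = \sum_{j=1}^{d} \frac{1}{\hat\delta_j}\, \hat q_j \hat q_j^t,$$

where $d \in \{1, \dots, p\}$ is fixed, $\hat\delta_1 \ge \dots \ge \hat\delta_d$ are the $d$ largest eigenvalues of $\hat\Sigma$ and $\hat q_1, \dots, \hat q_d$ are the associated eigenvectors.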

Gaussian regularized SIR (2/2)

Three new methods

PCA+ridge:

$$\Omega = \frac{1}{\tau} \sum_{j=1}^{d} \hat q_j \hat q_j^t.$$

No privileged direction in the $d$-dimensional eigenspace.

Tikhonov: $\Omega = \tau^{-1} \hat\Sigma$. Directions with large variance are most likely.

PCA+Tikhonov:

$$\Omega = \frac{1}{\tau} \sum_{j=1}^{d} \hat\delta_j\, \hat q_j \hat q_j^t.$$

In the $d$-dimensional eigenspace, directions with large variance are most likely.
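
The five priors and the resulting GRSIR direction can be sketched as follows. The function names `prior_matrix` and `grsir_direction` and the method labels are hypothetical, the eigen-decomposition of $\hat\Sigma$ supplies $\hat\delta_j$ and $\hat q_j$, and the example matrices at the end are assumed toy inputs rather than the slides' experiment.

```python
import numpy as np

def prior_matrix(Sigma_hat, method, tau=1.0, d=None):
    """Build the prior covariance Omega for a given regularization method."""
    p = Sigma_hat.shape[0]
    delta, Q_hat = np.linalg.eigh(Sigma_hat)          # ascending eigenvalues
    delta, Q_hat = delta[::-1], Q_hat[:, ::-1]        # reorder: largest first
    if d is not None:
        delta, Q_hat = delta[:d], Q_hat[:, :d]        # keep the d leading eigenpairs
    if method == "ridge":
        return np.eye(p) / tau
    if method == "tikhonov":
        return Sigma_hat / tau
    if method == "pca+sir":                           # needs delta_1..delta_d > 0
        return (Q_hat / delta) @ Q_hat.T
    if method == "pca+ridge":
        return Q_hat @ Q_hat.T / tau
    if method == "pca+tikhonov":
        return (Q_hat * delta) @ Q_hat.T / tau
    raise ValueError(f"unknown method: {method}")

def grsir_direction(Sigma_hat, Gamma_hat, Omega):
    """Top eigenvector of (Omega Sigma_hat + I_p)^{-1} Omega Gamma_hat, unit norm."""
    p = Sigma_hat.shape[0]
    A = np.linalg.solve(Omega @ Sigma_hat + np.eye(p), Omega @ Gamma_hat)
    eigvals, eigvecs = np.linalg.eig(A)
    b_hat = eigvecs[:, np.argmax(eigvals.real)].real
    return b_hat / np.linalg.norm(b_hat)

# Toy inputs (assumed): a positive definite Sigma_hat and a rank-one Gamma_hat
rng = np.random.default_rng(0)
p = 6
A0 = rng.standard_normal((p, p))
Sigma_hat = A0 @ A0.T / p + 0.1 * np.eye(p)
v = rng.standard_normal(p)
Gamma_hat = np.outer(v, v)
tau = 0.5
b_ridge = grsir_direction(Sigma_hat, Gamma_hat, prior_matrix(Sigma_hat, "ridge", tau=tau))
```

Only $\Omega\hat\Sigma + I_p$ is inverted, so the function still runs when $\hat\Sigma$ is singular, provided the chosen prior does not itself require inverting small eigenvalues.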

Validation on simulations

Experimental set-up: same as previously.

Proximity criterion between the true direction $b$ and the estimated ones $\hat b^{(r)}$ on $N = 100$ replications:

$$PC = \frac{1}{N} \sum_{r=1}^{N} \big( b^t \hat b^{(r)} \big)^2, \qquad 0 \le PC \le 1,$$

a value close to 0 implies a low proximity: the $\hat b^{(r)}$ are nearly orthogonal to $b$,

a value close to 1 implies a high proximity: the $\hat b^{(r)}$ are nearly collinear with $b$.
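
A direct implementation of the criterion is short. The sketch below assumes that both the true direction and the estimates are rescaled to unit Euclidean norm before comparison, which is what keeps the value between 0 and 1; the function name is hypothetical.

```python
import numpy as np

def proximity_criterion(b, b_hats):
    """Mean squared cosine between b and each estimated direction (one per row)."""
    b = np.asarray(b, dtype=float)
    b = b / np.linalg.norm(b)
    b_hats = np.asarray(b_hats, dtype=float)
    b_hats = b_hats / np.linalg.norm(b_hats, axis=1, keepdims=True)
    return float(np.mean((b_hats @ b) ** 2))

b = [1.0, 0.0, 0.0]
pc_collinear = proximity_criterion(b, [[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
pc_orthogonal = proximity_criterion(b, [[0.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
pc_mixed = proximity_criterion(b, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Because the inner product is squared, the criterion does not depend on the sign of the estimated eigenvector.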

Influence of the regularization parameter

Figure: $\log\tau$ versus PC. The “cut-off” dimension and the condition number are fixed ($d = 20$ and $\theta = 2$).

Ridge and Tikhonov: significant improvement if $\tau$ is large,

PCA+SIR: reasonable results compared to SIR,

Sensitivity with respect to the condition number of the covariance matrix

Figure: $\theta$ versus PC. The “cut-off” dimension is fixed to $d = 20$. The optimal regularization parameter is used for each value of $\theta$.

Only SIR is very sensitive to the ill-conditioning,

ridge and Tikhonov: similar results,

Sensitivity with respect to the “cut-off” dimension

Figure: $d$ versus PC. The condition number is fixed ($\theta = 2$). The optimal regularization parameter is used for each value of $d$.

Estimation of Mars surface physical properties from hyperspectral images

Context:

Observation of the south pole of Mars at the end of summer, collected during orbit 61 by the French imaging spectrometer OMEGA on board the Mars Express mission.

3D image: on each pixel, a spectrum containing $p = 184$ wavelengths is recorded.

This portion of Mars mainly contains water ice, CO2 and dust.

Goal: for each spectrum $X \in \mathbb{R}^p$, estimate the corresponding physical parameter $Y \in \mathbb{R}$ (grain size of CO2).

An inverse problem

Forward problem.

Physical modeling of individual spectra with a surface reflectance model.

Starting from a physical parameter $Y$, simulate $X = F(Y)$.

Generation of $n = 12{,}000$ synthetic spectra with the corresponding parameters.

$\Longrightarrow$ Learning database.

Inverse problem.

Estimate the functional relationship $Y = G(X)$.

Dimension reduction assumption: $G(X) = g(b^t X)$.

$b$ is estimated by SIR/GRSIR, $g$ is estimated by a nonparametric one-dimensional regression.
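
Once a direction has been estimated, the second step reduces to a one-dimensional smoothing problem on the projected data. The slides do not say which nonparametric estimator is used; the sketch below swaps in a Nadaraya-Watson estimator with a Gaussian kernel purely as an example, with a hypothetical function name, a hand-picked bandwidth, and noiseless toy data standing in for the projections $\hat b^t X_i$.

```python
import numpy as np

def nadaraya_watson(t_train, y_train, t_new, bandwidth):
    """Kernel regression estimate of g at the points t_new (Gaussian kernel)."""
    t_train = np.asarray(t_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    t_new = np.atleast_1d(np.asarray(t_new, dtype=float))
    # Kernel weights between every new point and every training projection
    u = (t_new[:, None] - t_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # Weighted average of the responses
    return (weights @ y_train) / weights.sum(axis=1)

# Toy usage: t plays the role of the projections, y of the physical parameter
t = np.linspace(-1.0, 1.0, 201)
y = np.sin(np.pi * t / 2)
g_hat = nadaraya_watson(t, y, [0.0, 0.5], bandwidth=0.05)
```

In an application along the lines of the slide, `t_train` would be the projected learning spectra and `t_new` the projections of the observed spectra; the bandwidth would have to be tuned, for example by cross-validation.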

Estimated functional relationship

Estimated CO2 maps

Grain size of CO2 estimated by SIR (left) and GRSIR (right) on an

References

[Li, 1991] Li, K.C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316–327.

[Cook, 2007] Cook, R.D. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1), 1–26.

[Zhong et al, 2005] Zhong, W., Zeng, P., Ma, P., Liu, J.S. and Zhu, Y. (2005). RSIR: Regularized Sliced Inverse Regression for motif discovery. Bioinformatics, 21(22), 4169–4175.

[Chiaromonte et al, 2002] Chiaromonte, F. and Martinelli, J. (2002). Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences, 176, 123–144.

R. Coudret, S. Girard & J. Saracco. A new sliced inverse regression method for multivariate response, Computational Statistics and Data Analysis, to appear, 2014.

M. Chavent, S. Girard, V. Kuentz, B. Liquet, T.M.N. Nguyen & J. Saracco. A sliced inverse regression approach for data stream, Computational Statistics, to appear, 2014.

C. Bernard-Michel, S. Douté, M. Fauvel, L. Gardes & S. Girard. Retrieval of Mars surface physical properties from OMEGA hyperspectral images using Regularized Sliced Inverse Regression, Journal of Geophysical Research - Planets, 114, E06005, 2009.

C. Bernard-Michel, L. Gardes & S. Girard. A Note on Sliced Inverse Regression with Regularizations, Biometrics, 64, 982–986, 2008.

C. Bernard-Michel, L. Gardes & S. Girard. Gaussian Regularized Sliced Inverse Regression, Statistics and Computing, 19, 85–98, 2009.

A. Gannoun, S. Girard, C. Guinot & J. Saracco. Sliced Inverse Regression in reference curves estimation,
