There is no satisfactory approach in the literature to the analysis of influence in a partial least squares setting. This is probably due to the computational cost of naive implementations of full leave-one-out cross-validatory calculations for partial least squares regression, as well as to the algorithmic approach to partial least squares in most of the literature. Martens and Naes [MN89, page 285] extend the application of influence measures for ordinary least squares to the partial least squares setting. These measures have the advantage that closed form formulae exist which may be employed to reduce the computational cost. However, the validity of this approach in a partial least squares setting is doubtful. This is because it will typically ignore the contributions of partial least squares components which have not been incorporated in the selected regression equation. Furthermore, this method will not identify observations which are influential for the partial least squares decomposition itself.
The Stone-Brooks cross-validatory algebra provides an ideal opportunity to consider the problem of influence for partial least squares. Full leave-one-out cross-validation for partial least squares regression involves the computation of the cross-validated sums of squared projections
$$h_{k(i)} = \sum_{j \neq i} \left( x_j^T q_{k(i)} \right)^2,$$
for each partial least squares component and each observation in the cross-validatory calculations. Hence, as for principal component regression, we could consider the influence measures
$$h_k - h_{k(i)},$$
where $h_k$ is the sum of squared projections of the complete data on the vector of partial least squares component loadings $q_k$. However, in contrast to the principal component eigenvalue perturbations, these statistics will reflect the extent to which an observation is outlying in the response space as well as in the predictor space. More importantly, these statistics do not have the elegant properties of the corresponding principal component influence measures. First of all, they may no longer be interpreted as sums of squared projection downdates for the corresponding partial least squares components. The sum of squared projections for a partial least squares component may well increase when an observation is removed from the data. Also, these measures no longer sum to the total principal component eigenvalue downdate p and they are not central to the Stone-Brooks approach to partial least squares leave-one-out cross-validation. Hence, they are not appropriate as partial least squares influence measures. Likewise, their sum no longer appears suitable for a possible normalisation of the individual sums of squared partial least squares projection perturbations.
A more natural partial least squares influence measure would be the perturbation of the sample covariance vector d. Indeed, partial least squares decomposition optimizes the sample covariance and hence, we could consider the influence measure
This seems to be a summary influence measure that considers the influence of observations in both the predictor and response space simultaneously by weighting the summary principal component influence measure p with the square of the observed response for the removed datum.
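The weighting described above can be checked numerically. The following is a minimal sketch under stated assumptions, not the author's procedure: it takes the covariance vector to be d = X'y for data assumed to be already centred, it ignores the re-centring that a real leave-one-out step would perform, and the function names are hypothetical. Under these assumptions, deleting observation i changes d by exactly x_i y_i, so the squared perturbation equals the squared response times the squared length of the deleted predictor row.

```python
def cov_vector(X, y):
    """Return d = X'y for a list-of-rows matrix X and a response list y."""
    p = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]


def covariance_perturbation(X, y, i):
    """Squared norm of d - d_(i), where d_(i) omits observation i (no re-centring)."""
    d_full = cov_vector(X, y)
    d_del = cov_vector(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
    return sum((a - b) ** 2 for a, b in zip(d_full, d_del))


def weighted_row_norm(X, y, i):
    """Squared response for observation i times the squared length of its predictor row."""
    return y[i] ** 2 * sum(v * v for v in X[i])


# Small illustrative data set (values are arbitrary, columns sum to zero).
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, -1.0]]
y = [2.0, -1.0, -1.0]

for i in range(len(X)):
    print(i, covariance_perturbation(X, y, i), weighted_row_norm(X, y, i))
```

The two printed columns agree for every observation, which is the sense in which the response acts as a weight on a purely predictor-space quantity in this simplified setting.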
As always in these cross-validatory computations, we may easily generate the leave-one-out cross-validated residuals
yi{k) yi{i,k)'
3.3.3 Computational Cost and Performance
We would like to compare the computational cost of the Stone-Brooks cross-validatory computations for partial least squares with naive implementations of full leave-one-out partial least squares cross-validation. For the naive implementations, we will consider the orthogonalized and non-orthogonalized partial least squares algorithms of Martens and Naes [MN89, pages 119 to 125] [Den91, pages 62 and 64], as well as the partial least squares computations described by Helland [Hel88].
Simulations
The Stone-Brooks approach to full leave-one-out partial least squares regression cross-validation was implemented in the numerical analysis package GAUSS [Apt92] (appendix, page 146). The naive procedures for the cross-validation of partial least squares regression were also implemented in GAUSS (appendix, pages 141, 142 and 143 for the base procedures).
We used the native GAUSS procedure OLSQR [Apt92, volume 2, page 1352] for the computation of the least squares fits in both Martens's non-orthogonalized method and Helland's method for partial least squares decomposition. This procedure is based on the QR decomposition. The initial eigendecomposition necessary for the Stone-Brooks approach was computed with the GAUSS procedure EIGHV [Apt92, volume 2, page 1180].
The simulations were done in the same manner as explained for principal component regression cross-validation. For the regular case ($n - 1 > p$), data were generated with the GAUSS pseudo-random number generator RNDU [Apt92, volume 2, page 1452]. We generated predictor matrices with 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 predictor variables and 100 observations. For each simulated problem, a response vector of 100 numbers was generated and the four implementations of full leave-one-out cross-validation were compared. For each simulation, all partial least squares factors were derived and the PRESS statistic was computed for all factors. Figure 3.1 shows the computation times in seconds, with the number of predictor variables plotted on a log scale. There appears to be an order of magnitude difference in the computational cost between the non-orthogonal and the orthogonal methods.

Figure 3.1: Computation times in seconds for full leave-one-out cross-validation of partial least squares regression in the regular case with 100 observations. (Horizontal axis: number of predictors. Curves: Martens, orthogonalized algorithm; Martens, non-orthogonalized algorithm; Helland; Stone-Brooks.)
For the singular case ($n - 1 < p$), the same procedure was repeated, but with matrices of 12 observations and 15, 20, 25, 50, 75, 100, 500, 1000, 2500 and 3000 predictor variables. For each simulation, all ten factors were derived in the cross-validation. A small number of observations was chosen so that the results will mainly reflect the costs due to the number of predictor variables derived in the cross-validatory computations. Figure 3.2 shows the results plotted on the log scale for the number of predictor variables. As for principal components, the computational cost due to the number of predictor variables appears to cancel out for the Stone-Brooks method. There is a slight increase in the computational cost of this method when the number of predictor variables exceeds 500. This is probably due to the initial eigendecomposition. All other methods are an order of magnitude more expensive.
Let us again consider a theoretical study of the computational costs to analyse these results. To simplify matters, we will only concern ourselves with order of magnitude descriptions of the computational costs involved in these procedures. Also, we will pay special attention to the computational costs due to the number of predictor variables in the problem, as partial least squares computations are often employed in problems with many predictor variables. Therefore, we will describe costs for the case when all partial least squares factors are derived.
A naive implementation of full leave-one-out cross-validation with the orthogonal scores algorithm of Martens and Naes involves an order of $n^2p^2$ multiplications in the regular case and an order of $pn^3$ multiplications in the singular case. For the non-orthogonalized algorithm described in the same book, as well as for the algorithm due to Helland, we will consider implementations based on the QR decomposition for the ordinary least squares fitting in the partial least squares algorithms. A naive implementation of full
Figure 3.2: Computation times in seconds for full leave-one-out cross-validation of partial least squares regression in the singular case with 12 observations. (Horizontal axis: number of predictors. Curves: Martens, orthogonalized algorithm; Martens, non-orthogonalized algorithm; Helland; Stone-Brooks.)
Computation              Regular case          Singular case
w = xm'ym                np^2                  pn^2
w = w/sqrt(w'w)          p^2                   pn
t = xm*w                 np^2                  pn^2
tt = t't                 np                    n^2
p = xm't/tt              np^2                  pn^2
q = ym't/tt              np                    n^2
xm = xm - t*p'           np^2                  pn^2
ym = ym - t*q            np                    n^2
Subtotal                 np^2 + np + p^2       pn^2 + pn + n^2
PRESS                                          pn
Total                    np^2 + np + p^2       pn^2 + pn + n^2

Table 3.1: Contributions of a single observation to the order of the number of multiplications involved in a full leave-one-out cross-validatory analysis of partial least squares regression with a naive implementation based on Martens's orthogonalized algorithm, when all factors are derived in the cross-validation.
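The statements listed in Table 3.1 translate almost line by line into a naive leave-one-out procedure. The sketch below is an illustration in plain Python rather than the GAUSS code from the appendix: the function names, the centring step, the prediction of the left-out observation and the small data set are all assumptions made for the example. It refits the orthogonal scores algorithm from scratch for every deleted observation, which is exactly why the naive approach is expensive.

```python
def center(X, y):
    """Centre predictors and response; return centred data and the means."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    return Xc, yc, xbar, ybar


def pls_predictions(X, y, x_new, n_factors):
    """Predict x_new with 1..n_factors components (orthogonal scores algorithm)."""
    xm, ym, xbar, ybar = center(X, y)
    n, p = len(xm), len(xm[0])
    x0 = [x_new[j] - xbar[j] for j in range(p)]
    pred, preds = ybar, []
    for _ in range(n_factors):
        w = [sum(xm[r][j] * ym[r] for r in range(n)) for j in range(p)]   # w = xm'ym
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:            # no covariance left: further factors add nothing
            preds.append(pred)
            continue
        w = [v / norm for v in w]                                          # w = w/sqrt(w'w)
        t = [sum(xm[r][j] * w[j] for j in range(p)) for r in range(n)]     # t = xm*w
        tt = sum(v * v for v in t)                                         # tt = t't
        load = [sum(xm[r][j] * t[r] for r in range(n)) / tt for j in range(p)]  # p = xm't/tt
        q = sum(ym[r] * t[r] for r in range(n)) / tt                       # q = ym't/tt
        xm = [[xm[r][j] - t[r] * load[j] for j in range(p)] for r in range(n)]  # xm = xm - t*p'
        ym = [ym[r] - t[r] * q for r in range(n)]                          # ym = ym - t*q
        t0 = sum(x0[j] * w[j] for j in range(p))      # score of the left-out observation
        x0 = [x0[j] - t0 * load[j] for j in range(p)]  # deflate it in the same way
        pred += t0 * q
        preds.append(pred)
    return preds


def press(X, y, n_factors):
    """Naive PRESS: refit the whole decomposition once per deleted observation."""
    totals = [0.0] * n_factors
    for i in range(len(X)):
        preds = pls_predictions(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], n_factors)
        for k, yhat in enumerate(preds):
            totals[k] += (y[i] - yhat) ** 2
    return totals


# Illustrative data: y = 1 + 2*x1 - x2 exactly, so two factors reproduce least squares.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y = [1.0, 4.0, 2.0, 6.0, 3.0, 9.0]
press_values = press(X, y, 2)
print(press_values)
```

Because the response is an exact linear function of the two predictors, the PRESS value for two factors is zero up to rounding, while the one-factor value is not. Every pass through the outer loop repeats all the per-factor work counted in the table.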
Computation                      Regular case            Singular case
w = xm'ym                        np^2                    pn^2
w = w/sqrt(w'w)                  p^2                     pn
t = xm*w                         np^2                    pn^2
qq = olsqr(y0, tt[.,1:i])        np^3
xm = xm - t*w'                   np^2                    pn^2
ym = y0 - tt[.,1:i]*qq           np^2
Subtotal                         np^3 + np^2 + p^2       pn^2 + pn + n^4
PRESS                                                    pn
Total                            np^3 + np^2 + p^2       pn^2 + pn + n^4

Table 3.2: Contributions of a single observation to the order of the number of multiplications involved in a full leave-one-out cross-validatory analysis of partial least squares regression with a naive implementation based on Martens's non-orthogonalized algorithm, when all factors are derived in the cross-validation.

Computation                              Regular case            Singular case
w = s - xm'(xm*b)                        np^2                    pn^2
w = w/sqrt(w'w)                          p^2                     pn
xw[.,i] = xm*w                           np^2                    pn^2
b = ww[.,1:i]*olsqr(ym, xw[. ...         np^3 + p^3
Subtotal                                 np^3 + np^2 + p^3       pn^2 + pn + n^4
PRESS                                    p^2                     pn
Total                                    np^3 + np^2 + p^3       pn^2 + pn + n^4

Table 3.3: Contributions of a single observation to the order of the number of multiplications involved in a full leave-one-out cross-validatory analysis of partial least squares regression with a naive implementation based on Helland's algorithm, when all factors are derived in the cross-validation.
Computation                                  Regular case          Singular case
ai = e.*zi - nu*sumc(fi.*zi)*fi/ ...
miai = mi*ai                                 p^3
mi = mi - miai*miai'/ai'miai                 p^3
zi = mi*di
zi = zi/sqrt(zi'zi)
Subtotal
PRESS                                        np^2 + np
Total                                        np^2 + np + p^3

Table 3.4: Contributions of a single observation to the order of the number of multiplications involved in a full leave-one-out cross-validatory analysis of partial least squares regression with the Stone-Brooks method, when all factors are derived in the cross-validation.
leave-one-out cross-validation for both these algorithms will involve an order of $n^2p^3$ multiplications for the regular case, using the Householder method for the QR decomposition [GL89, page 219]. In the singular case, these algorithms require an order of $pn^3$ multiplications plus a term that depends on $n$ only. The Stone-Brooks cross-validatory computations are based on the eigendecomposition of the cross-products matrix $X^TX$, in the regular case, and of the matrix $XX^T$, in the singular case. However, as these calculations are only done once, we may largely ignore the computational cost of these decompositions. The remainder of the calculations is of order $n^2p^2 + n^2p$ in the regular case and of an order that depends on $n$ only in the singular case.
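The switch from one matrix to the other in the singular case rests on a standard linear algebra fact: $X^TX$ and $XX^T$ share their nonzero eigenvalues, and an eigenvector $u$ of $XX^T$ maps to the eigenvector $X^Tu$ of $X^TX$. The sketch below illustrates this for the largest eigenvalue only. It is not the thesis implementation: power iteration stands in for the GAUSS procedure EIGHV, and the helper names and the 3 by 5 matrix are assumptions chosen for the example.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def top_eigenpair(S, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    v = normalize([1.0 + 0.1 * k for k in range(len(S))])
    for _ in range(iters):
        v = normalize(matvec(S, v))
    lam = sum(a * b for a, b in zip(v, matvec(S, v)))   # Rayleigh quotient
    return lam, v


# "Singular" shape: 3 observations, 5 predictors (illustrative values).
X = [[2.0, 0.0, 1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, -1.0]]
Xt = transpose(X)

small = matmul(X, Xt)    # 3 x 3 matrix XX'
large = matmul(Xt, X)    # 5 x 5 matrix X'X

lam_small, u = top_eigenpair(small)
lam_large, v = top_eigenpair(large)

# Map the small-problem eigenvector into predictor space and compare directions.
mapped = normalize(matvec(Xt, u))
alignment = abs(sum(a * b for a, b in zip(mapped, v)))

print(lam_small, lam_large, alignment)
```

Both eigenvalues agree and the alignment is one up to rounding, so the decomposition of the $n \times n$ matrix carries the same information as that of the $p \times p$ matrix at a cost that does not grow with the number of predictors once $XX^T$ has been formed.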
As for principal components, we have summarized these results in tables 3.1 to 3.4, which consider the contributions of a single observation to the computational costs of cross-validating the regression equation, for each observation. Again, the relevant quantities must be multiplied by $n$ for comparison with the text.
A few conclusions may be drawn from this rudimentary analysis of computational cost. First of all, computations are considerably slower in the regular case, not only because the number of observations tends to be larger, but also because the computational costs due to the number of predictor variables increase. In the singular case, the number of partial least squares components that may be derived is effectively restricted by the number of observations. Furthermore, in the regular case, the computational cost of the non-orthogonal methods relates to that of the orthogonal approaches as $n^2p^3$ relates to $n^2p^2$. Thus, for the same number of observations, we observe in effect an order of magnitude increase in cost due to the number of predictor variables only. This is because both these methods compute correlated scores and hence, a numerically optimized least squares fitting procedure must be employed, like those based on the QR algorithm, to obtain reliable least squares fits. It is the cost of computing these QR decompositions that dominates the total computational cost for the non-orthogonal methods. Our implementation is based on the approach of Denham [Den91], who has implemented the algorithms of Martens, Naes and Helland in Splus [Sta92], using the native LS procedure for least squares fitting, which is based on the QR decomposition. Likewise, the tables on computational cost assume a QR decomposition for the non-orthogonal methods. Martens's orthogonalized method and the Stone-Brooks approach do not have these problems, since they work with uncorrelated scores.
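The ratio argument can be made concrete with a few numbers. The sketch below assumes leading orders of $n^2p^3$ for the non-orthogonal methods and $n^2p^2$ for the orthogonal ones in the regular case, and uses the 100 observations of the regular-case simulations. The function names are hypothetical, and the values are order-of-magnitude counts rather than measured timings.

```python
def orthogonal_order(n, p):
    """Assumed leading order of multiplications, orthogonal methods, regular case."""
    return n ** 2 * p ** 2


def non_orthogonal_order(n, p):
    """Assumed leading order of multiplications, QR-based non-orthogonal methods."""
    return n ** 2 * p ** 3


n = 100  # observations in the regular-case simulations
ratios = {
    p: non_orthogonal_order(n, p) / orthogonal_order(n, p)
    for p in (5, 10, 25, 50)
}

for p, ratio in ratios.items():
    print(f"p = {p:2d}: non-orthogonal / orthogonal = {ratio:.0f}")
```

Under these assumptions the ratio equals the number of predictor variables and does not depend on the number of observations, which is the sense in which the extra cost is due to the predictors only.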
In the singular case, the computational cost due to the number of predictor variables cancels out for the Stone-Brooks approach. This is in analogy to the efficient cross-validation of principal component regression. The reasons are similar. The cost of computing the partial least squares scores does not depend on the number of predictor variables and neither does the cost of computing the matrices M (q.