Naive implementations of leave-one-out principal component cross-validation are based on the recomputation of the cross-products matrices after the deletion of an observation. There is an order of $np^2$ multiplications involved in the computation of each cross-products matrix. Hence, the total number of multiplications due to the computation of the cross-products matrices is of order $n^2p^2$.
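The count above can be checked on a toy example. The following is a minimal sketch, assuming uncentered cross-products $X^TX$ and ignoring symmetry; the function names and the data are hypothetical and serve only as an illustration.

```python
def cross_products(rows):
    """Return (X^T X, number of scalar multiplications) for a list of rows."""
    p = len(rows[0])
    s = [[0.0] * p for _ in range(p)]
    mults = 0
    for x in rows:
        for j in range(p):
            for k in range(p):
                s[j][k] += x[j] * x[k]
                mults += 1
    return s, mults


def naive_loo_cross_products(rows):
    """Recompute the cross-products matrix after deleting each observation in turn."""
    matrices, total = [], 0
    for i in range(len(rows)):
        s_i, m = cross_products(rows[:i] + rows[i + 1:])
        matrices.append(s_i)
        total += m
    return matrices, total


# Hypothetical toy data: n = 4 observations, p = 2 variables.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0]]
matrices, total = naive_loo_cross_products(X)
print(total)  # n * (n - 1) * p^2 = 4 * 3 * 4 = 48
```

Each deletion costs $(n-1)p^2$ multiplications here, so the total grows as $n^2p^2$, in line with the order stated above.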
For each cross-products matrix, the principal component decomposition must then be computed from scratch. The number of multiplications involved in the computation of the principal component coefficients and the principal component scores depends on the method that is chosen for these computations. Golub and Van Loan [GL89, page 423], Wilkinson [Wil65] and Wilkinson and Reinsch [WR71, part 2] are the standard references on the computation of the symmetric eigenvalue problem. For the symmetric QR algorithm [GL89, page 423], the computations will be of order $p^3$. Calculating the principal component scores from the principal component coefficients will involve $np^2$ multiplications. Thus the total cost of the cross-validatory procedure will be of order $n^2p^2 + np^3$.
Several authors in the statistical literature refer to the singular value decomposition as a suitable method for the joint computation of the principal component coefficients and the principal component scores [Jol86, page 235] [Seb84, page 506]. An implementation of the naive approach with the Golub and Reinsch method [GL89, page 430] for the singular value decomposition will involve an order of $np^2 + p^3$ multiplications for each datum [GL89, page 239]. Performing this for all data requires an order of $n^2p^2 + np^3$ multiplications. Hence, such implementations will involve roughly the same computational costs as those based on the QR algorithm.
With the efficient procedures, the number of multiplications involved in downdating the principal component coefficients is of order $p^3$ for each datum, so that the total cost of downdating the coefficients is of order $np^3$. Therefore, when the number of observations is large compared to the number of predictor variables, this approach should be significantly faster than the naive implementations of principal component coefficient downdating. This is because we avoid a computation of order $n^2p^2$. Thus we achieve an order of magnitude improvement in the downdating of the principal component coefficients.
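To make the comparison concrete, the two orders can be evaluated for a few problem sizes. This is only a rough sketch: it assumes unit constants on every order term, which the text does not claim, and the function names are hypothetical.

```python
def naive_coefficient_cost(n, p):
    """Order of multiplications for the naive approach: n^2 p^2 + n p^3.

    Assumption: every order term is given a unit constant.
    """
    return n * n * p * p + n * p ** 3


def efficient_coefficient_cost(n, p):
    """Order of multiplications for downdating the coefficients: n p^3."""
    return n * p ** 3


# Hypothetical problem sizes with n large compared to p.
for n, p in [(100, 10), (1000, 10), (10000, 10)]:
    ratio = naive_coefficient_cost(n, p) / efficient_coefficient_cost(n, p)
    print(n, p, ratio)
```

Under this assumption the ratio equals $n/p + 1$, so the advantage grows with the number of observations for a fixed number of variables.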
Unfortunately, for applications to principal component regression, this gain is largely lost in the computation of the PRESS statistics. Indeed, the leave-one-out cross-validation of principal component regression involves the computation of the principal component scores, which requires an order of $n^2p^2$ multiplications. This does not mean that we no longer achieve the improvement in the calculation of the principal component coefficients, but rather that this improvement is drowned in a subsequent calculation which is of the same order of magnitude as for the naive procedures. Hence, we can only halve the number of multiplications in the regular case.
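The factor of two can be read off from the per-observation terms. As a rough worked count, assuming unit constants on each order term (an assumption, since only orders are given) and leaving out the lower-order PRESS term:

$$
\underbrace{np^2}_{S_{(i)}} + \underbrace{p^3}_{Q_{(i)}} + \underbrace{np^2}_{U_{(i)}} = 2np^2 + p^3 \quad \text{(naive)}, \qquad \underbrace{np^2}_{U_{(i)}} \quad \text{(efficient)},
$$

$$
\frac{2np^2 + p^3}{np^2} = 2 + \frac{p}{n} \longrightarrow 2 \quad \text{when } n \gg p .
$$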
                       Method
Object        Naive            Efficient
$S_{(i)}$     $np^2$           not applicable
$Q_{(i)}$     $p^3$            $p^3$
$U_{(i)}$     $np^2$           $np^2$
Total         $np^2 + p^3$     $np^2 + p^3$

Table 2.1: The contributions of a single observation to the order of the number of multiplications needed for the downdating of a principal component decomposition in the regular case with the leave-one-out approach.
                       Method
Object        Naive                 Efficient
$S_{(i)}$     $np^2$                not applicable
$Q_{(i)}$     $p^3$                 not applicable
$U_{(i)}$     $np^2$                $np^2$
Subtotal      $np^2 + p^3$          $np^2$
PRESS         $np$                  $np$
Total         $np^2 + np + p^3$     $np^2 + np$

Table 2.2: The contributions of a single observation to the order of the number of multiplications needed for the computation of the prediction error sum of squares for principal component regression in the regular case, using the leave-one-out approach.

The singular case

In the singular case, the naive approach is based on the computation of the matrices $X_{(i)}X_{(i)}^T$. There is an order of $pn^2$ multiplications involved in the computation of each such matrix. Thus, the total number of multiplications necessary for the computation of these matrices is of the order $pn^3$. The computation of the principal component decomposition of one such matrix is then of the order $n^3$. The principal component scores can be obtained from this decomposition, which requires an order of [...] multiplications. We must also compute the principal component coefficients, which involves an order of [...] multiplications. Thus, the total number of multiplications necessary to obtain the cross-validated principal component analysis is of the order $pn^3 + n^4$.
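The dependence of the naive approach on $p$ in the singular case can be seen by counting the multiplications needed to rebuild each matrix $X_{(i)}X_{(i)}^T$. The following is a minimal sketch, assuming uncentered data and ignoring symmetry; the function names and the data are hypothetical.

```python
def gram(rows):
    """Return (X X^T, number of scalar multiplications) for a list of rows."""
    n, p = len(rows), len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    mults = 0
    for a in range(n):
        for b in range(n):
            for j in range(p):
                g[a][b] += rows[a][j] * rows[b][j]
                mults += 1
    return g, mults


def naive_loo_gram(rows):
    """Recompute X_(i) X_(i)^T after deleting each observation in turn."""
    matrices, total = [], 0
    for i in range(len(rows)):
        g_i, m = gram(rows[:i] + rows[i + 1:])
        matrices.append(g_i)
        total += m
    return matrices, total


# Hypothetical toy data: n = 3 observations, p = 4 variables (p > n).
X = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, -1.0], [2.0, 1.0, 0.0, 3.0]]
matrices, total = naive_loo_gram(X)
print(total)  # n * (n - 1)^2 * p = 3 * 4 * 4 = 48
```

The count is linear in $p$ for a fixed number of observations, which is the term of order $pn^3$ in the total above.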
The efficient rank-one modification procedure involves an order of [...] multiplications for the computation of the downdated principal component scores for all observations and an order of [...] [BN78] multiplications for the calculation of the principal component coefficients. In principal component regression, we only need to consider the downdating of the scores. As can be seen, the computational cost of calculating these quantities does not depend on $p$. As a consequence, the computational cost of the efficient approach is constant when the number of observations is fixed, irrespective of the number of predictor variables that are considered.
In applications to principal component regression, the calculation of the PRESS statistic will involve an order of [...] multiplications and hence, this gain is maintained in the full leave-one-out cross-validation of principal component regression. Thus we achieve an order of magnitude reduction for the cross-validation of principal component regression in the singular case, and the computational cost due to the number of variables is eliminated completely.

                       Method
Object        Naive             Efficient
$S_{(i)}$     $pn^2$            not applicable
$Q_{(i)}$     $pn^2 + n^3$      $pn^2 + n^3$
$U_{(i)}$     [...]             [...]
Total         $pn^2 + n^3$      $pn^2 + n^3$

Table 2.3: The contributions of a single observation to the order of the number of multiplications needed for the downdating of a principal component decomposition in the singular case with the leave-one-out approach.
                       Method
Object        Naive             Efficient
$S_{(i)}$     $pn^2$            not applicable
$Q_{(i)}$     $pn^2 + n^3$      not applicable
$U_{(i)}$     [...]             [...]
Subtotal      $pn^2$ + [...]    [...]
PRESS         [...]             [...]
Total         $pn^2$ + [...]    [...]

Table 2.4: The contributions of a single observation to the order of the number of multiplications needed for the computation of the prediction error sum of squares for principal component regression in the singular case, using the leave-one-out approach.
Finally, Tables 2.1 to 2.4 summarize these results. The tables are concerned with costs induced by each single observation in the full leave-one-out cross-validation. Hence, the appropriate quantities must be multiplied