The regular case

Naive implementations of leave-one-out principal component cross-validation are based on the recomputation of the cross-products matrices after the deletion of an observation. There is an order of $np^2$ multiplications involved in the computation of each cross-products matrix. Hence, the total number of multiplications due to the computation of the cross-products matrices is of order $n^2p^2$.

For each cross-products matrix, the principal component decomposition must then be computed from scratch. The number of multiplications involved in the computation of the principal component coefficients and the principal component scores depends on the method that is chosen for these computations. Golub and Van Loan [GL89, page 423], Wilkinson [Wil65] and Wilkinson and Reinsch [WR71, part 2] are the standard references on the computation of the symmetric eigenvalue problem. For the symmetric QR algorithm [GL89, page 423], the computations will be of order $p^3$. Calculating the principal component scores from the principal component coefficients will involve $np^2$ multiplications. Thus the total cost of the cross-validatory procedure will be of order $n^2p^2 + np^3$.
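As a rough tally of the counts quoted above (constants are ignored, and charging one $np^2$ term each to the cross-products matrix and to the scores is our reading of the text), the naive cost over all $n$ deletions can be written as:

```latex
\begin{align*}
\text{cost per deleted observation}
  &\approx \underbrace{np^2}_{\text{cross-products}}
         + \underbrace{p^3}_{\text{symmetric QR}}
         + \underbrace{np^2}_{\text{scores}}
   = 2np^2 + p^3, \\
\text{cost over all } n \text{ deletions}
  &\approx n\,(2np^2 + p^3)
   = 2n^2p^2 + np^3
   = O(n^2p^2 + np^3).
\end{align*}
```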

Several authors in the statistical literature refer to the singular value decomposition as a suitable method for the joint computation of the principal component coefficients and the principal component scores [Jol86, page 235] [Seb84, page 506]. An implementation of the naive approach with the Golub and Reinsch method [GL89, page 430] for the singular value decomposition will involve an order of $np^2 + p^3$ multiplications for each datum [GL89, page 239]. Performing this for all data requires an order of $n^2p^2 + np^3$ multiplications. Hence, such implementations will involve roughly the same computational costs as those based on the QR algorithm.

With the efficient procedures, the number of multiplications involved in downdating the principal component coefficients is of order $p^3$ for each datum, so that the total cost of downdating the coefficients is of order $np^3$. Therefore, when the number of observations is large compared to the number of predictor variables, this approach should be significantly faster than the naive implementations of principal component coefficient downdating. This is because we avoid a computation of order $n^2p^2$. Thus we achieve an order of magnitude improvement in the downdating of the principal component coefficients.
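The reason a deletion can be treated as a rank-one modification is easiest to see on the cross-products matrix itself. The following is a minimal sketch, assuming uncentered data (a simplification; the centered case needs an extra correction term) and using the hypothetical helper names `cross_products` and `downdate`. It only illustrates the identity $X_{(i)}^T X_{(i)} = X^T X - x_i x_i^T$, not the eigenvalue downdating procedure itself.

```python
def cross_products(rows):
    """Return X^T X for a data matrix given as a list of rows (n * p^2 multiplications)."""
    p = len(rows[0])
    return [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]


def downdate(s, x):
    """Return S - x x^T, the cross-products matrix after deleting row x (p^2 multiplications)."""
    p = len(x)
    return [[s[j][k] - x[j] * x[k] for k in range(p)] for j in range(p)]


# Small illustrative data set (made up for this example): n = 4, p = 2.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
S = cross_products(X)

for i, x_i in enumerate(X):
    naive = cross_products(X[:i] + X[i + 1:])  # recompute from scratch
    fast = downdate(S, x_i)                    # rank-one downdate of the full matrix
    # Both routes must give the same leave-one-out cross-products matrix.
    assert all(abs(a - b) < 1e-9 for ra, rb in zip(naive, fast) for a, b in zip(ra, rb))
```

The downdate touches each of the $p^2$ entries once, whereas the recomputation revisits every remaining observation.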

Unfortunately, for applications to principal component regression, this gain is largely lost in the computation of the PRESS statistics. Indeed, the leave-one-out cross-validation of principal component regression involves the computation of the principal component scores, which requires an order of $n^2p^2$ multiplications. This does not mean that we no longer achieve the improvement in the calculation of the principal component coefficients, but rather that this improvement is drowned in a subsequent calculation which is of the same order of magnitude as for the naive procedures. Hence, we can only halve the number of multiplications in the regular case.
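To see where the factor of two comes from, the per-observation leading terms can simply be added up. This is a back-of-the-envelope sketch, assuming unit constants for every term (an assumption; the text gives orders only) and hypothetical function names. It charges the naive procedure $np^2$ for the cross-products matrix, $p^3$ for the eigendecomposition, $np^2$ for the scores and $np$ for PRESS, and the efficient procedure only the last two.

```python
def naive_cost(n, p):
    """Leading-order multiplications per deleted observation, naive approach."""
    cross_products = n * p**2
    eigendecomposition = p**3
    scores = n * p**2
    press = n * p
    return cross_products + eigendecomposition + scores + press


def efficient_cost(n, p):
    """Leading-order multiplications per deleted observation, efficient approach."""
    scores = n * p**2
    press = n * p
    return scores + press


# Illustrative problem sizes (made up); all have n larger than p.
for n, p in [(100, 10), (10_000, 10), (1_000_000, 50)]:
    ratio = naive_cost(n, p) / efficient_cost(n, p)
    print(f"n={n:>9}, p={p:>3}: naive/efficient = {ratio:.2f}")
```

For $n$ much larger than $p$ the ratio stays close to two, in line with the halving noted above.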

| Object | Naive method | Efficient method |
|---|---|---|
| $S_{(i)}$ | $np^2$ | not applicable |
| $Q_{(i)}$ | $p^3$ | $p^3$ |
| $U_{(i)}$ | $np^2$ | $np^2$ |
| Total | $np^2 + p^3$ | $np^2 + p^3$ |

Table 2.1: The contributions of a single observation to the order of the number of multiplications needed for the downdating of a principal component decomposition in the regular case with the leave-one-out approach.

| Object | Naive method | Efficient method |
|---|---|---|
| $S_{(i)}$ | $np^2$ | not applicable |
| $Q_{(i)}$ | $p^3$ | not applicable |
| $U_{(i)}$ | $np^2$ | $np^2$ |
| Subtotal | $np^2 + p^3$ | $np^2$ |
| PRESS | $np$ | $np$ |
| Total | $np^2 + np + p^3$ | $np^2 + np$ |

Table 2.2: The contributions of a single observation to the order of the number of multiplications needed for the computation of the prediction error sum of squares for principal component regression in the regular case, using the leave-one-out approach.

The singular case

In the singular case, the naive approach is based on the computation of the matrices $X_{(i)}X_{(i)}^{T}$. There is an order of $pn^2$ multiplications involved in the computation of each such matrix. Thus, the total number of multiplications necessary for the computation of these matrices is of the order $pn^3$. With the symmetric QR algorithm, the computation of the principal component decomposition of one such matrix is of the order $n^3$. The principal component scores can be obtained from this decomposition, which requires an order of [...] multiplications. We must also compute the principal component coefficients, which involves an order of [...] + [...] multiplications. Thus, the total number of multiplications necessary to obtain the cross-validated principal component analysis is of the order $pn^3 + n^4$.
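The role of the small $n \times n$ matrix can be made explicit through the singular value decomposition. The block below is a sketch in our own notation (the symbols $V$ and $D$, and the use of $U$ for the scores and $Q$ for the coefficients, are assumptions rather than definitions taken from the text), for a matrix $X_{(i)}$ of rank $r$ with thin decomposition $X_{(i)} = V D Q^{T}$:

```latex
\begin{align*}
X_{(i)} X_{(i)}^{T} &= V D^{2} V^{T}
  && \text{(eigendecomposition of the small matrix)} \\
U &= X_{(i)} Q = V D
  && \text{(principal component scores)} \\
Q &= X_{(i)}^{T} V D^{-1}
  && \text{(principal component coefficients)}
\end{align*}
```

In this form the number of variables $p$ enters only when $X_{(i)}X_{(i)}^{T}$ is formed and when the coefficients are recovered through the product $X_{(i)}^{T} V$; the scores follow from the small eigendecomposition alone.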

The efficient rank-one modification procedure involves an order of [...] multiplications for the computation of the downdated principal component scores for all observations and an order of $pn^2$ [...] multiplications [BN78] for the calculation of the principal component coefficients. In principal component regression, we only need to consider the downdating of the scores. As can be seen, the computational cost of calculating these quantities does not depend on $p$. As a consequence, the computational cost of the efficient approach is constant when the number of observations is fixed, irrespective of the number of predictor variables that are considered.

In applications to principal component regression, the calculation of the PRESS statistic will involve an order of [...] multiplications and hence, this gain is maintained in the full leave-one-out cross-validation of principal component regression. Thus we achieve an order of magnitude reduction for the cross-validation of principal component regression in the singular case and the computational cost due to the number of variables is eliminated completely.

| Object | Naive method | Efficient method |
|---|---|---|
| $S_{(i)}$ | $pn^2$ | not applicable |
| $Q_{(i)}$ | $pn^2$ + [...] | $pn^2$ + [...] |
| $U_{(i)}$ | [...] | [...] |
| Total | $pn^2$ + [...] | $pn^2$ + [...] |

Table 2.3: The contributions of a single observation to the order of the number of multiplications needed for the downdating of a principal component decomposition in the singular case with the leave-one-out approach.

| Object | Naive method | Efficient method |
|---|---|---|
| $S_{(i)}$ | $pn^2$ | not applicable |
| $Q_{(i)}$ | $pn^2$ + [...] | not applicable |
| $U_{(i)}$ | [...] | [...] |
| Subtotal | $pn^2$ + [...] | [...] |
| PRESS | [...] | [...] |
| Total | $pn^2$ + [...] | [...] |

Table 2.4: The contributions of a single observation to the order of the number of multiplications needed for the computation of the prediction error sum of squares for principal component regression in the singular case, using the leave-one-out approach.

Finally, Tables 2.1 to 2.4 summarize these results. The tables are concerned with the costs induced by each single observation in the full leave-one-out cross-validation. Hence, the appropriate quantities must be multiplied
