Simultaneous diagonalisation algorithms - The identification and application of common principal components

The classical Jacobi iteration algorithm (Jacobi, 1846) is the oldest known method for diagonalising a single symmetric matrix. It is a comparatively simple method that involves the systematic pre- and postmultiplication of a p × p symmetric matrix S by a sequence of orthogonal (plane rotation) matrices, so as to annihilate all off-diagonal elements of S. In each iteration of the procedure, the largest off-diagonal element of S (in absolute value) is set to zero by a rotation of S.

However, the rotation at each iteration may cause some of the previously annihilated off-diagonal elements to take non-zero values again. The rotation process is therefore continued until the absolute values of all of the off-diagonal elements are smaller than some suitably small constant.
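A minimal Python sketch of the classical procedure, assuming NumPy, is given below. It is an illustration rather than the implementation used in this study: the function names `plane_rotation` and `classical_jacobi`, the tolerance `tol`, the rotation cap and the choice of the smaller rotation angle (|θ| ≤ π/4, as in the symmetric Schur step of Golub and Van Loan) are assumptions made for the example. Each pass locates the largest off-diagonal element, annihilates it with a plane rotation and accumulates the rotations in V, whose columns converge to the eigenvectors of S.

```python
import numpy as np


def plane_rotation(A, j, h):
    """Return the p x p rotation R for which (R.T @ A @ R)[j, h] == 0.

    Uses the smaller of the two possible angles (|theta| <= pi/4).
    """
    p = A.shape[0]
    R = np.eye(p)
    if A[j, h] == 0.0:
        return R
    tau = (A[h, h] - A[j, j]) / (2.0 * A[j, h])
    # t = tan(theta) is the smaller root of t**2 + 2*tau*t - 1 = 0
    t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + np.sqrt(1.0 + tau * tau))
    c = 1.0 / np.sqrt(1.0 + t * t)
    s = t * c
    R[j, j] = R[h, h] = c
    R[j, h], R[h, j] = s, -s
    return R


def classical_jacobi(S, tol=1e-10, max_rotations=10_000):
    """Diagonalise symmetric S; returns (eigenvalues, eigenvector matrix V)."""
    A = np.array(S, dtype=float)
    V = np.eye(A.shape[0])
    for _ in range(max_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        j, h = np.unravel_index(np.argmax(off), off.shape)  # largest off-diagonal element
        if off[j, h] < tol:  # every off-diagonal element is now below tol
            break
        R = plane_rotation(A, j, h)
        A = R.T @ A @ R  # pre- and postmultiplication by the rotation
        V = V @ R        # accumulate the rotations
    return np.diag(A), V
```

Each rotation lowers the sum of squared off-diagonal elements (both triangles) by exactly $2a_{jh}^2$, where $a_{jh}$ is the element being annihilated. This is why the loop terminates even though annihilated elements can reappear.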

In a modification of the classical Jacobi algorithm, the cyclical Jacobi algorithm avoids the search for the largest off-diagonal element of S by choosing the index pairs (j, h) of the rotations cyclically, for example in the order (1, 2), (1, 3), ..., (1, p), (2, 3), ..., (2, p), ..., (p − 1, p) (Flury, 1988).
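The cyclical variant changes only the order in which the pairs are visited. In the sketch below (Python with NumPy; the name `cyclic_jacobi`, the tolerance and the sweep cap are illustrative assumptions), each sweep visits the pairs in the row order listed above, and each rotation is applied to rows and columns j and h only, instead of forming a full p × p rotation matrix.

```python
import numpy as np


def cyclic_jacobi(S, tol=1e-10, max_sweeps=50):
    """Cyclic Jacobi: one sweep rotates every pair (j, h), j < h, in row order."""
    A = np.array(S, dtype=float)
    p = A.shape[0]
    V = np.eye(p)
    for _ in range(max_sweeps):
        if np.max(np.abs(A - np.diag(np.diag(A)))) < tol:
            break
        for j in range(p - 1):          # (1,2), (1,3), ..., (p-1,p) in 0-based form
            for h in range(j + 1, p):
                if A[j, h] == 0.0:
                    continue
                tau = (A[h, h] - A[j, j]) / (2.0 * A[j, h])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + np.sqrt(1.0 + tau * tau))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = t * c
                G = np.array([[c, s], [-s, c]])  # rotation restricted to rows/cols (j, h)
                idx = [j, h]
                A[idx, :] = G.T @ A[idx, :]      # premultiply rows j and h
                A[:, idx] = A[:, idx] @ G        # postmultiply columns j and h
                V[:, idx] = V[:, idx] @ G        # accumulate the rotation
    return np.diag(A), V
```

Skipping the search saves O(p²) comparisons per rotation, at the cost of occasionally rotating pairs whose off-diagonal element is already small.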

The Flury-Gautschi (FG) algorithm was proposed by Flury and Gautschi (1986) as an extension of the cyclical Jacobi procedure to two or more p × p symmetric matrices. As a measure of the simultaneous deviation of the matrices $L_i = B'S_iB$ (i = 1, ..., k) from diagonality, they defined the measure

$$\phi(L_1, \ldots, L_k; n_1, \ldots, n_k) = \prod_{i=1}^{k} \left( \frac{\det(\operatorname{diag} L_i)}{\det(L_i)} \right)^{n_i}, \qquad (3.16)$$

which attains its minimum value of one when all k matrices are exactly diagonal (by Hadamard's inequality, each ratio in (3.16) is at least one for positive definite $L_i$). The FG algorithm estimates the modal matrix B such that (3.16) is minimised. Provided that the CPC model is appropriate, the columns of B are estimates of the common eigenvectors of the k groups and together simultaneously rotate all of the $S_i$ matrices to nearly diagonal form.
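A compact Python sketch of this idea follows, assuming NumPy. It is our own illustration, not a translation of Flury's (1988) FORTRAN routines or of the R code in Appendix B. The names `log_fg_measure`, `rotate_pair` and `fg_sketch`, the start B = I, the tolerances and the iteration caps are all assumptions. The measure (3.16) is evaluated on the log scale, and B is updated by cyclic sweeps over column pairs, as in the cyclical Jacobi procedure. Each pair is rotated by a 2 × 2 fixed-point iteration. Let $T_i$ be the 2 × 2 block of $B'S_iB$ for the pair, and let $\delta_{i1}, \delta_{i2}$ be the diagonal elements of $Q'T_iQ$. The new Q then holds the eigenvectors of $T = \sum_i n_i (\delta_{i1} - \delta_{i2})/(\delta_{i1}\delta_{i2})\, T_i$. Because $\log x \le \log \delta + x/\delta - 1$, this update minimises an upper bound on the pair's contribution to $\log \phi$ that touches it at the current Q, so no step increases (3.16).

```python
import numpy as np


def log_fg_measure(B, S_list, n):
    """log of (3.16) with L_i = B' S_i B; equals zero when every L_i is diagonal."""
    total = 0.0
    for S, ni in zip(S_list, n):
        L = B.T @ S @ B
        _, logdet = np.linalg.slogdet(L)
        total += ni * (np.sum(np.log(np.diag(L))) - logdet)
    return total


def rotate_pair(T_list, n, tol=1e-12, max_iter=100):
    """Inner iteration for one column pair: returns an orthogonal 2 x 2 matrix Q."""
    Q = np.eye(2)
    for _ in range(max_iter):
        T = np.zeros((2, 2))
        for Ti, ni in zip(T_list, n):
            d1 = Q[:, 0] @ Ti @ Q[:, 0]
            d2 = Q[:, 1] @ Ti @ Q[:, 1]
            # the weights vanish if d1 == d2 in every group; the pair is then left as is
            T += ni * (d1 - d2) / (d1 * d2) * Ti
        _, Q_new = np.linalg.eigh(T)
        # eigh may permute or flip eigenvectors: align Q_new with Q before comparing
        M = Q.T @ Q_new
        if abs(M[0, 0]) < abs(M[0, 1]):
            Q_new = Q_new[:, ::-1]
            M = Q.T @ Q_new
        Q_new = Q_new * np.sign(np.diag(M))
        done = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if done:
            break
    return Q


def fg_sketch(S_list, n, tol=1e-10, max_sweeps=100):
    """Cyclic sweeps over the column pairs of B, starting from B = I."""
    p = S_list[0].shape[0]
    B = np.eye(p)
    current = log_fg_measure(B, S_list, n)
    for _ in range(max_sweeps):
        for j in range(p - 1):
            for h in range(j + 1, p):
                cols = [j, h]
                T_list = [B[:, cols].T @ S @ B[:, cols] for S in S_list]
                B[:, cols] = B[:, cols] @ rotate_pair(T_list, n)
        new = log_fg_measure(B, S_list, n)
        if current - new < tol:  # stop once a full sweep no longer lowers (3.16)
            break
        current = new
    return B
```

With k = 1 the single weight factors out of T. Each pair rotation then becomes an exact 2 × 2 diagonalisation (provided the two current diagonal elements differ), and the sketch returns the eigenvectors of the single matrix.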

The known parametric methods for inference on the common eigenvector loadings in B require the assumption of multivariate normality in the populations. The FG algorithm itself, however, does not depend on this assumption, which justifies its use in finding the common eigenvectors of the covariance matrices of non-normal multivariate populations (see Flury, 1988, pp. 71 and 178–188).

For the special case k = 1, the FG algorithm may also be used to obtain the eigenvectors of a single group.

The FORTRAN routines given by Flury (1988) for computing the modal matrix B with the FG algorithm were translated to R and are given in Appendix B.

Krzanowski (2000) proposed a simple estimator for the matrix of common eigenvectors and the k sets of eigenvalues. Under the CPC hypothesis (3.1),

$$B'(\Sigma_1 + \cdots + \Sigma_k)B = \Lambda_1 + \cdots + \Lambda_k. \qquad (3.17)$$

Let $\Gamma = \Sigma_1 + \cdots + \Sigma_k$ and $\Lambda_{ALL} = \Lambda_1 + \cdots + \Lambda_k$. The spectral decomposition of $\Gamma$ is given by

$$\Gamma = B \Lambda_{ALL} B', \qquad (3.18)$$

with the columns of B containing the eigenvectors of $\Gamma$. Let $G = S_1 + \cdots + S_k$ be the unbiased sample estimator of $\Gamma$. A simple estimator of the common eigenvector matrix B is then given by the spectral decomposition of G, i.e.

$$G = B L_{ALL} B'. \qquad (3.19)$$

Comparing the results of the simple estimator in (3.19) with the B matrix estimated by the FG algorithm, Krzanowski (2000) found the common eigenvector loadings to be almost identical when the CPC hypothesis is tenable.
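Computationally, the simple estimator needs only one symmetric eigendecomposition. The sketch below (Python with NumPy; the function name `krzanowski_cpc` is hypothetical) orders the columns of B by decreasing eigenvalue of G. As an assumption on our part, rather than a statement of Krzanowski's (2000) procedure, it takes the diagonal of $B'S_iB$ as the ith set of eigenvalue estimates.

```python
import numpy as np


def krzanowski_cpc(S_list):
    """Simple CPC estimator: eigenvectors of G = S_1 + ... + S_k, as in (3.19).

    Returns B (columns ordered by decreasing eigenvalue of G) and a k x p array
    whose ith row is diag(B' S_i B), used here as the group eigenvalue estimates.
    """
    G = np.sum(np.asarray(S_list, dtype=float), axis=0)
    l_all, B = np.linalg.eigh(G)            # eigh returns ascending eigenvalues
    order = np.argsort(l_all)[::-1]
    B = B[:, order]
    L = np.array([np.diag(B.T @ S @ B) for S in S_list])
    return B, L
```

Because $B'GB = \sum_i B'S_iB$ is diagonal, the k rows of eigenvalue estimates always sum to the eigenvalues of G, whether or not the CPC hypothesis holds.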

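Cardoso and Souloumiac (1996) gave, in closed form, the optimal Jacobi rotation angles for nearly diagonalising k symmetric matrices simultaneously, that is, for each plane rotation, the angle that minimises the criterion

$$\sum_{i=1}^{k} \operatorname{off}(B'S_iB), \qquad (3.20)$$

where

$$\operatorname{off}(B'S_iB) = \operatorname{off}(L_i) = \sum_{1 \le j \ne h \le p} l_{jh}^2 \qquad (3.21)$$

is the sum of the squared off-diagonal elements of the nearly diagonalised matrix $L_i$, with $l_{jh}$ denoting the (j, h)th element of $L_i$.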

The algorithm for their method was implemented in the rjd function of the JADE package in R (Nordhausen et al., 2013) and will henceforth be referred to as the JADE algorithm.
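The closed-form rotation can be sketched for real symmetric matrices as follows (Python with NumPy). This is not the rjd source but our own illustration, and the names `off`, `jade_criterion` and `joint_diagonalise`, the threshold on sin θ and the sweep cap are assumptions. For a pair (j, h), a rotation through θ turns $l_{jj} - l_{hh}$ into $g_i'(\cos 2\theta, \sin 2\theta)'$, where $g_i = (l_{jj} - l_{hh},\ l_{jh} + l_{hj})'$ is taken from the current ith matrix. A rotation leaves the sum of squares of each matrix and the trace of the 2 × 2 block unchanged. Minimising (3.20) for that pair is therefore equivalent to maximising $\sum_i (g_i'v)^2$ over unit vectors v, so $(\cos 2\theta, \sin 2\theta)$ is the leading eigenvector of $\sum_i g_i g_i'$.

```python
import numpy as np


def off(L):
    """Sum of squared off-diagonal elements of L, as in (3.21)."""
    return float(np.sum(L ** 2) - np.sum(np.diag(L) ** 2))


def jade_criterion(B, S_list):
    """Criterion (3.20): the sum over groups of off(B' S_i B)."""
    return sum(off(B.T @ S @ B) for S in S_list)


def joint_diagonalise(S_list, tol=1e-10, max_sweeps=100):
    """Cyclic Jacobi sweeps using a closed-form rotation for each pair (j, h)."""
    A = np.array([np.asarray(S, dtype=float) for S in S_list])  # current B'S_iB, (k, p, p)
    p = A.shape[1]
    B = np.eye(p)
    for _ in range(max_sweeps):
        rotated = False
        for j in range(p - 1):
            for h in range(j + 1, p):
                # one row g_i = (l_jj - l_hh, l_jh + l_hj) per matrix
                g = np.column_stack([A[:, j, j] - A[:, h, h], A[:, j, h] + A[:, h, j]])
                if not np.any(g):
                    continue  # every 2 x 2 block is a multiple of I: nothing to gain
                _, U = np.linalg.eigh(g.T @ g)
                x, y = U[:, -1]          # leading eigenvector = (cos 2theta, sin 2theta)
                if x < 0:                # pick the representative with |theta| <= pi/4
                    x, y = -x, -y
                c = np.sqrt((1.0 + x) / 2.0)   # cos(theta)
                s = y / (2.0 * c)              # sin(theta)
                if abs(s) < tol:
                    continue
                rotated = True
                R = np.array([[c, -s], [s, c]])
                idx = [j, h]
                A[:, idx, :] = R.T @ A[:, idx, :]   # premultiply rows j and h
                A[:, :, idx] = A[:, :, idx] @ R     # postmultiply columns j and h
                B[:, idx] = B[:, idx] @ R
        if not rotated:
            break
    return B
```

Unlike the FG measure, (3.20) needs no determinants or logarithms, which keeps each pair update to a single 2 × 2 eigenproblem.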

Because the FG algorithm focuses on simultaneous diagonalisation, it finds the common eigenvectors in an arbitrary order. It is therefore not always useful when the purpose of the CPC analysis is dimensionality reduction, as the common eigenvectors it estimates do not necessarily have the same rank order in all of the populations with regard to the amount of variation accounted for (Trendafilov, 2010).

Furthermore, the FG algorithm estimates the common eigenvectors (if they exist) simultaneously through an iterative procedure. If, for the purpose of dimensionality reduction, only the first q < p common eigenvectors are to be retained, the computation involved in finding the last p − q eigenvectors is unnecessary.

Trendafilov (2010) proposed a stepwise CPC technique in which the common eigenvectors are estimated sequentially, analogous to PCA for a single group. The method almost always ensures that the common eigenvectors are ranked in the same order, with regard to the amount of variation accounted for, in all k groups.

For data sets with a large number of correlated variables, the stepwise CPC procedure may be stopped once q < p common eigenvectors have been found that account for some minimum proportion (for example, 90%) of the variation within each of the k groups. This may save valuable computing time while ensuring that the variation in every group is represented sufficiently well in the q-dimensional approximation.
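The stopping rule amounts to choosing the smallest q for which the first q columns of B account for at least a given share of $\operatorname{tr}(S_i)$ in every group. The sketch below (Python with NumPy) assumes that the variation accounted for by $b_j$ in group i is $b_j'S_ib_j$ and that B has orthonormal columns. The function name `smallest_q` and the 90% default are illustrative.

```python
import numpy as np


def smallest_q(B, S_list, threshold=0.9):
    """Smallest q such that b_1, ..., b_q explain >= threshold of tr(S_i) in every group."""
    props = np.array([np.cumsum(np.diag(B.T @ S @ B)) / np.trace(S) for S in S_list])
    ok = np.all(props >= threshold, axis=0)   # True where every group reaches the threshold
    if not ok.any():
        raise ValueError("threshold not reached with the columns of B supplied")
    return int(np.argmax(ok)) + 1             # first position where all groups qualify
```

The group that needs the most components determines q, so a single poorly summarised group can keep the procedure running.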

The stepwise CPC procedure is based on the standard power method (Golub and Van Loan, 1996) and estimates the jth common eigenvector by maximising the criterion

$$\sum_{i=1}^{k} (n_i - 1) \log\left(b_j' S_i b_j\right), \qquad (3.22)$$

subject to the usual orthonormality constraints $b_j'b_j = 1$ and $b_j'B_{[j-1]} = 0'$, where the columns of $B_{[j-1]}$ are the first j − 1 common eigenvectors.
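A minimal sketch of this sequential scheme in Python with NumPy is given below. It follows our reading of the power-method idea rather than the stepwisecpc function in Appendix B. The weights $(n_i - 1)$, normalised to sum to one, come from (3.22). The name `stepwise_cpc_sketch`, the starting vectors (eigenvectors of the weighted pooled matrix, largest first), the tolerance and the iteration cap are assumptions. Each step replaces $b_j$ by $\sum_i w_i S_i b_j / (b_j'S_ib_j)$, projects the result onto the orthogonal complement of the columns of $B_{[j-1]}$ and rescales it to unit length. The fixed points of this map are stationary points of (3.22) under the stated constraints.

```python
import numpy as np


def stepwise_cpc_sketch(S_list, n, tol=1e-10, max_iter=1000):
    """Estimate common eigenvectors one at a time; returns a p x p matrix B."""
    S = np.array([np.asarray(Si, dtype=float) for Si in S_list])  # shape (k, p, p)
    w = np.asarray(n, dtype=float) - 1.0    # weights (n_i - 1) from (3.22)
    w = w / w.sum()
    p = S.shape[1]
    _, vecs = np.linalg.eigh(np.tensordot(w, S, axes=1))
    start = vecs[:, ::-1]                   # pooled eigenvectors, largest eigenvalue first
    B = np.zeros((p, p))
    P = np.eye(p)                           # projector onto the complement of B_[j-1]
    for j in range(p):
        b = start[:, j]
        for _ in range(max_iter):
            d = np.einsum("a,kab,b->k", b, S, b)             # b' S_i b for every group
            v = P @ (np.tensordot(w / d, S, axes=1) @ b)      # power step, then project
            v /= np.linalg.norm(v)
            converged = np.linalg.norm(v - b) < tol
            b = v
            if converged:
                break
        B[:, j] = b
        P -= np.outer(b, b)                 # later vectors must be orthogonal to b
    return B
```

Suppose the common eigenvectors exist and $l_{i1} > \cdots > l_{ip}$ holds in every group. The pooled starting vectors are then already fixed points, so the sketch returns them after one step. Otherwise the iteration refines them one column at a time, and computation can stop after any q columns.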

Trendafilov (2010) has further shown, mathematically and by numerical examples, that when $l_{i1} \ge l_{i2} \ge \cdots \ge l_{ip}$ holds in all k groups simultaneously, the stepwise CPC solution coincides with the solution given by the FG algorithm. If this is not the case, the stepwise CPC solution still ensures that the successive eigenvalues are decreasing (or at least not increasing too much) within each of the groups. Stepwise CPC is thus better suited for dimensionality reduction than the FG algorithm, and the proportion of variation accounted for by the first q < p stepwise CPC eigenvectors is usually equal to or greater than that accounted for by the first q common eigenvectors estimated with the FG algorithm.

On the other hand, the stepwise CPC algorithm does not attempt to minimise the criteria in (3.16) or (3.20) and thus generally performs worse than the FG and JADE algorithms in the simultaneous diagonalisation of symmetric matrices.

The stepwise CPC algorithm supplied by Trendafilov (2010) was implemented in the R function stepwisecpc, which is given in Appendix B.

Two new algorithms for the estimation of common eigenvectors have recently been developed by Browne and McNicholas (2014b) and Browne and McNicholas (2014a), namely the accelerated line search (ALS) and majorisation-minimisation (MM) algorithms. They applied these algorithms in the context of mixture model-based clustering, showing that they surpass the FG algorithm in speed, particularly in higher-dimensional (p ≥ 20) settings.

For the k = 5, p = 100 case considered in the simulation study reported in Browne and McNicholas (2014a), the computation time of the FG algorithm became prohibitively large. However, in terms of the convergence criterion used, the FG algorithm appears slightly superior to the ALS and MM algorithms.

In document The identification and application of common principal components (Page 66-69)