Interpreting the eigenvectors - The identification and application of common principal components

Inspection of the eigenvectors of the sample covariance matrix occasionally reveals complex relationships between variables which were not previously suspected. Such relationships would ordinarily not be revealed by a mere bivariate analysis of the variables. In this way PCA provides a more comprehensive view of the structure in the data.

As discussed in Section 2.6, S is not scale invariant and the eigenvectors of S will usually differ from the eigenvectors of the correlation matrix, R. The interpretation of these two sets of eigenvectors will therefore also differ, and if the variables are incommensurable it is recommended to use the eigenvectors of R instead. If PCA is performed using S and one of the variables has a much larger variance than the others, the first eigenvector will be dominated by this variable.
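The effect of scale on the first eigenvector can be illustrated with a small simulation. The following is only a sketch on made-up data: the three variables, their scales and the helper `first_eigenvector` are assumptions introduced for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Three hypothetical variables; the third is recorded on a much larger scale.
z = rng.standard_normal((n, 3))
z[:, 1] += 0.5 * z[:, 0]               # mild correlation between variables 0 and 1
X = z * np.array([1.0, 1.0, 100.0])

S = np.cov(X, rowvar=False)            # sample covariance matrix
R = np.corrcoef(X, rowvar=False)       # sample correlation matrix


def first_eigenvector(M):
    """Eigenvector of the symmetric matrix M belonging to its largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)     # eigh returns eigenvalues in ascending order
    return vecs[:, -1]


e1_S = first_eigenvector(S)
e1_R = first_eigenvector(R)

print("first eigenvector of S (absolute loadings):", np.round(np.abs(e1_S), 3))
print("first eigenvector of R (absolute loadings):", np.round(np.abs(e1_R), 3))
```

Under these assumptions the first eigenvector of S loads almost entirely on the large-variance variable, whereas the first eigenvector of R reflects the correlation between the first two variables.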

In the case where all p variables in x are uncorrelated, E = I and the principal components will simply be the p original variables. For such uncorrelated variables the sample correlations will usually be small but not equal to zero, and each of the eigenvectors will be dominated by a single variable. The rank order of the eigenvectors will in this case correspond to the rank order of the variances of the original variables.
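A brief numerical check of this behaviour, assuming simulated independent variables with hypothetical standard deviations of 1, 3 and 2 (values chosen purely for illustration), might look as follows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent variables with clearly different (hypothetical) standard deviations.
sd = np.array([1.0, 3.0, 2.0])
X = rng.standard_normal((2000, 3)) * sd

S = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(S)

# Reorder so that the first column belongs to the largest eigenvalue.
order = np.argsort(vals)[::-1]
vals, E = vals[order], vecs[:, order]

# Index of the variable with the largest absolute loading in each eigenvector.
dominant = np.argmax(np.abs(E), axis=0)

# Variables ranked from largest to smallest sample variance.
variance_rank = np.argsort(np.diag(S))[::-1]

print("dominant variable per eigenvector:", dominant)
print("variables ranked by variance:     ", variance_rank)
```

In this sketch the two index vectors agree: the j-th eigenvector is dominated by the variable with the j-th largest variance.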

If all the loadings of the first eigenvector have the same sign, the associated principal component is a weighted average of the variables. According to the Perron-Frobenius theorem (Rencher 2002, p. 34), this will happen when all of the off-diagonal elements of S or R are positive. In the case where the measurements were made on physical objects, this first principal component will often be an indication of the overall size of the objects. Both positive and negative loadings in an eigenvector (contrasting the variables with one another) are often an indication of shape.
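The same-sign property can be verified numerically. The matrix below is a made-up positive definite covariance matrix with strictly positive entries, used only as an illustration.

```python
import numpy as np

# Hypothetical covariance matrix with all entries positive.
S = np.array([
    [4.0, 2.0, 1.0],
    [2.0, 3.0, 1.5],
    [1.0, 1.5, 2.0],
])

vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
e1 = vecs[:, -1]                       # eigenvector of the largest eigenvalue
e2 = vecs[:, -2]                       # eigenvector of the second largest eigenvalue

# The sign of an eigenvector is arbitrary; orient e1 so that its loadings sum to a positive value.
if e1.sum() < 0:
    e1 = -e1

print("first eigenvector: ", np.round(e1, 3))
print("second eigenvector:", np.round(e2, 3))
```

All loadings of the first eigenvector share one sign (a "size" component). The second eigenvector is orthogonal to the first and must therefore contain both positive and negative loadings (a "shape" contrast).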

Where the p variables in x include dimensional measurements on objects as well as other quantitative characteristics such as chemical properties, the eigenvectors associated with the largest eigenvalues will usually be dominated by the size and shape characteristics of the objects. The eigenvectors associated with the smaller eigenvalues can contain valuable information about the chemical characteristics, as the variances of these properties will generally be smaller than those of the size measurements.

A large absolute loading in any eigenvector means that the associated variable is highly correlated with the specific principal component (Krzanowski, 1979). With sufficient knowledge about the data under consideration, identification of the variables with large absolute loadings (relative to the rest of the loadings) in each of the principal components can aid the practitioner in labelling the different principal components as pertaining to properties such as size, shape, chemical characteristics, etc. In this way, the interpretation of the principal components is similar to the description of the unobservable factors in factor analysis.

Even so, for very high-dimensional data the interpretation of the eigenvectors by inspection of the loadings may still be problematic, as typically none of the loadings will be equal to zero. One solution is to rotate the individual eigenvectors further in order to find directions in which some of the loadings are equal to zero, which simplifies interpretation. However, as pointed out by Rencher (2002), further rotation of the individual eigenvectors will cause them to no longer be mutually orthogonal. The rotated solution will also not be optimal in the sense that the components successively account for the maximum of the remaining variance observed in the data.

Another proposal to aid with the interpretation of eigenvectors is to inspect the correlations between the original variables and the principal components. Variables showing high correlations with the first few principal components are deemed to be important in accounting for the observed variation in the data. Rencher (2002) has shown that using this method to rank the variables in order of importance does not necessarily provide the same rank order as would be obtained by ranking the variables according to the absolute magnitude of their loadings in a specific eigenvector. Furthermore, this method only provides univariate information about the variables and is therefore not very useful for interpretation in the multivariate context.
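For PCA based on S, the correlation between variable h and principal component j equals the loading multiplied by the square root of the j-th eigenvalue and divided by the standard deviation of variable h. The sketch below computes these correlations on simulated data (the data-generating model is an assumption made for illustration) so that they can be set alongside the raw loadings.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Hypothetical data: two variables share a common factor f, the third is pure noise.
f = rng.standard_normal(n)
X = np.column_stack([
    10.0 * f + 8.0 * rng.standard_normal(n),
    1.0 * f + 0.2 * rng.standard_normal(n),
    3.0 * rng.standard_normal(n),
])
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
l, E = vals[order], vecs[:, order]     # eigenvalues (descending) and eigenvectors

scores = Xc @ E                        # principal component scores
s = np.sqrt(np.diag(S))                # standard deviations of the variables

# corr(x_h, y_j) = e_hj * sqrt(l_j) / s_h; entry [h, j] refers to variable h, component j.
corr_formula = E * np.sqrt(l) / s[:, None]

# Direct empirical correlations for comparison.
corr_empirical = np.array([
    [np.corrcoef(Xc[:, h], scores[:, j])[0, 1] for j in range(3)]
    for h in range(3)
])

print("absolute loadings, first eigenvector:  ", np.round(np.abs(E[:, 0]), 3))
print("absolute correlations, first component:", np.round(np.abs(corr_formula[:, 0]), 3))
```

Because each loading is rescaled by the standard deviation of its own variable, a variable with a small variance can have a small loading and still be strongly correlated with the component, which is why the two rankings need not coincide.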

Jolliffe et al. (2003) and Zou et al. (2006) developed a technique called sparse principal component analysis (SPCA) to ease the interpretation of principal components. SPCA shrinks many of the smallest eigenvector loadings to zero by imposing a constraint on the sum of the absolute values of the loadings. The approach taken by Jolliffe et al. (2003) is to maximise $e_j'(X'X)e_j$ subject to

$$\sum_{h=1}^{p} |e_{jh}| \le t \quad \text{and} \quad e_j'e_j = 1, \tag{2.44}$$

where t is some predetermined positive value. However, because this approach is computationally difficult and there is no clear guidance on how to select an appropriate value for t, Zou et al. (2006) proposed to find the SPCA solution by reformulating it as a ridge regression problem and obtaining the eigenvector matrix, E, which minimises the criterion

$$\sum_{m=1}^{n} \|x_m - AE'x_m\|^2 + \theta \sum_{j=1}^{p} \|e_j\|^2 + \sum_{j=1}^{p} \theta_{1,j} \|e_j\|_1, \tag{2.45}$$

where

$$\|e_j\|_1 = \sum_{h=1}^{p} |e_{jh}|, \tag{2.46}$$

subject to the constraint $A'A = I_p$. The first penalisation factor, $\theta$, is kept constant, while the second penalisation factor, $\theta_{1,j}$, is allowed to vary from component to component. The SPCA algorithm provided by Zou et al. (2006) initially sets A equal to the eigenvectors of $X'X$ (the ordinary principal component loadings), whereafter A and E are updated iteratively until convergence. They also provide some guidelines for choosing appropriate values for the penalisation factors.
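To make the three terms of criterion (2.45) concrete, the following sketch evaluates it for given matrices A and E. It is not the iterative SPCA algorithm of Zou et al. (2006); the function name `spca_criterion`, the simulated data and the penalty values are assumptions introduced for illustration, and all p components are retained so that A and E are p x p.

```python
import numpy as np


def spca_criterion(X, A, E, theta, theta1):
    """Value of criterion (2.45) for the n x p data matrix X (rows are the x_m)."""
    # Row m of X @ E @ A.T is (A E' x_m)', so this is the reconstruction residual.
    residual = X - X @ E @ A.T
    fit = np.sum(residual ** 2)
    ridge = theta * np.sum(E ** 2)                   # theta * sum_j ||e_j||^2
    lasso = np.sum(theta1 * np.abs(E).sum(axis=0))   # sum_j theta_{1,j} * ||e_j||_1
    return fit + ridge + lasso


rng = np.random.default_rng(4)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)

# Ordinary PCA loadings: eigenvectors of X'X, used here as both A and E.
_, V = np.linalg.eigh(X.T @ X)

theta = 0.1                      # hypothetical ridge penalty
theta1 = np.full(4, 0.5)         # hypothetical L1 penalties, one per component

value = spca_criterion(X, V, V, theta, theta1)
print(round(float(value), 4))
```

With A and E both equal to the full set of ordinary eigenvectors, the reconstruction term vanishes and only the two penalty terms contribute. Any decrease in the criterion must then come from shrinking loadings at the cost of a small reconstruction error.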

Shen and Huang (2008) proposed a simpler and computationally cheaper SPCA algorithm named sparse PCA via regularised SVD (sPCA-rSVD). Their algorithm for the computation of the sparse principal components minimises a different objective function and proved to be considerably faster than the SPCA algorithm of Zou et al. (2006).
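The general idea of a regularised rank-one SVD can be sketched as an alternation between a soft-thresholded loading vector and a normalised score vector. The code below is a minimal sketch in that spirit, assuming a soft-thresholding penalty. It is not claimed to reproduce the published algorithm in detail, and the function names, the threshold `lam` and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each element of a towards zero by lam, setting small elements to zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_pc_rank_one(X, lam, n_iter=200, tol=1e-8):
    """First sparse loading vector via alternating rank-one updates (illustrative)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]                      # start from the ordinary rank-one SVD
    v = d[0] * Vt[0]
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, lam)
        norm_xv = np.linalg.norm(X @ v_new)
        if norm_xv == 0.0:           # every loading was thresholded to zero
            return np.zeros_like(v_new)
        u = X @ v_new / norm_xv
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return v / np.linalg.norm(v)     # rescale the loading vector to unit length


rng = np.random.default_rng(3)
n = 200
f = rng.standard_normal(n)
# Hypothetical data: two variables driven by a common factor, four small noise variables.
X = np.column_stack(
    [5.0 * f + rng.standard_normal(n), 5.0 * f + rng.standard_normal(n)]
    + [0.5 * rng.standard_normal(n) for _ in range(4)]
)
X = X - X.mean(axis=0)

v_dense = sparse_pc_rank_one(X, lam=0.0)    # no penalty: ordinary first loading vector
v_sparse = sparse_pc_rank_one(X, lam=5.0)   # penalised: small loadings set to zero

print(np.round(v_dense, 3))
print(np.round(v_sparse, 3))
```

With `lam=0.0` the procedure returns the ordinary first eigenvector up to sign. A positive threshold sets the small loadings of the noise variables exactly to zero, which is the simplification in interpretation that SPCA aims for.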

When many of the smaller eigenvector coefficients are shrunk to zero, the interpretation of the principal components is simplified, especially for data with a large number of variables. However, as pointed out by Shen and Huang (2008), the orthogonality property of the principal components is lost with most SPCA procedures.

In document The identification and application of common principal components (Page 47-49)