2.3 Principal components analysis for functional data

2.3.1 PCA for multivariate data

One of the problems with multivariate data is that there are simply too many variables to make the application of graphical techniques successful in providing an informative initial assessment of the data. Moreover, having too many variables can also cause problems, such as multicollinearity, for other multivariate techniques that the researcher may want to apply to the data. Principal components analysis is a multivariate technique with the central aim of reducing the dimensionality of a multivariate data set while accounting for as much of the original variation as possible. This aim is achieved by creating a new set of variables, the principal components, that are linear combinations of the original variables, which are uncorrelated and are ordered so that the first few of them account for most of the variation in all the original variables. Ideally, the result of a principal components analysis would be the creation of a small number of new variables that can be used as surrogates for the originally large number of variables and consequently provide a simpler basis for graphing or summarising the data, and for further multivariate analyses of the data.

Let the data matrix $X$ be of size $n \times p$, where $n$ is the number of samples and $p$ is the number of variables. Let us also assume that each variable is centered, i.e. the column means have been subtracted so that every column of $X$ has mean zero. Principal components analysis describes the variation in a set of correlated variables, $x = (x_1, \ldots, x_p)^T$, in terms of a new set of uncorrelated variables, $y = (y_1, \ldots, y_p)^T$, each of which is a linear combination of the $x$ variables. The new variables are derived in decreasing order of 'importance' in the sense that $y_1$ accounts for as much of the variation in the original data as possible amongst all linear combinations of $x$. Then $y_2$ is chosen to account for as much of the remaining variation as possible, subject to being uncorrelated with $y_1$, and so on. The new variables defined by this process, $y_1, \ldots, y_p$, are the orthogonal principal components. The general hope of principal components analysis is that the first few components will account for a substantial proportion of the variation in the original variables, $x_1, \ldots, x_p$, and can be used to provide a convenient lower-dimensional summary of these variables.
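The setup above can be illustrated with a small numerical sketch. It assumes NumPy and uses made-up data; the random seed, the matrix sizes, and the weight vector `psi` are purely illustrative and do not come from the text. The sketch centers the columns of a data matrix, forms one linear combination of the variables, and checks that its sample variance (with divisor $n$) equals the quadratic form of the weights with the sample covariance matrix `S`.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n, p = 200, 3

# Hypothetical correlated data: independent noise mixed by a random matrix,
# shifted so that the raw column means are clearly non-zero.
X_raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p)) + 5.0

# Center each variable (column) so that its mean is zero.
X = X_raw - X_raw.mean(axis=0)

# An arbitrary unit-norm weight vector defining one linear combination.
psi = np.array([1.0, 2.0, -1.0])
psi = psi / np.linalg.norm(psi)

# The new variable: one value per sample.
y = X @ psi

# Sample covariance matrix of the centered data (divisor n).
S = X.T @ X / n

# Because y has mean zero, its sample variance is the mean of y**2,
# which coincides with the quadratic form psi^T S psi.
var_y = np.mean(y ** 2)
quad_form = psi @ S @ psi
```

Multiplying `psi` by a constant `c` multiplies `var_y` by `c**2`, which is why some restriction on the size of the weights is needed before the variance of a linear combination can be maximized.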

Finding the principal components

The first principal component of the observations, $y_1$, is the linear combination $y_1 = \psi_1^T x$ whose sample variance is greatest among all such linear combinations. Since the variance of $y_1$ could be increased without limit simply by increasing the coefficients $\psi_1 = (\psi_{11}, \psi_{12}, \ldots, \psi_{1p})^T$, a normalization restriction must be placed on these coefficients: the sum of squares of the coefficients should take the value 1. Hence, to find the coefficients defining the first principal component, we need to choose the elements of the vector $\psi_1$ that maximize the variance of $y_1$, subject to the constraint $\psi_1^T \psi_1 = 1$. The second principal component, $y_2$, is defined to be the linear combination

$$y_2 = \psi_{21} x_1 + \psi_{22} x_2 + \cdots + \psi_{2p} x_p$$

that has the greatest variance subject to the following two conditions: $\psi_2^T \psi_2 = 1$ and $\psi_2^T \psi_1 = 0$, i.e. $y_1$ and $y_2$ are uncorrelated. Continuing in this fashion, the $j$th principal component is that linear combination $y_j = \psi_j^T x$ that has the greatest variance subject to the conditions $\psi_j^T \psi_j = 1$ and $\psi_j^T \psi_i = 0$ for all $i < j$. Maximizing the variance under these constraints, for example by the method of Lagrange multipliers, reduces to an eigenvalue problem: to calculate each $\psi_j$, we solve the eigenequation

$$\Sigma \psi_j = \lambda_j \psi_j,$$

where $\Sigma$ is the sample variance-covariance matrix, defined as $\Sigma = n^{-1} X^T X$ ($X$ is the centered data matrix), and $\psi_j$ is the $j$th eigenvector of $\Sigma$ with corresponding eigenvalue $\lambda_j$. Putting all eigenvectors as columns of a matrix $V$ and the corresponding eigenvalues as entries of a diagonal matrix $\Lambda$, the above equation can be extended to $\Sigma V = V \Lambda$, or $\Sigma = V \Lambda V^T$, the eigendecomposition of $\Sigma$. Here, the columns of $V$ are called the principal components (PCs), which are orthogonal with unit norm; $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_p\}$ is a diagonal matrix whose entries are non-negative and arranged in decreasing order. The entry $\lambda_k$, $k = 1, \ldots, p$, gives the variance of the data along the corresponding PC, and the proportion of variance explained by the $k$th PC is defined as $\lambda_k / \sum_{l=1}^{p} \lambda_l$. Finally, the projections of the data on the principal components are known as PC scores; these can be seen as new transformed variables. The $j$th principal component projection is given by the $j$th column of $XV$, and the coordinates of the $i$th data point in the new PC space are given by the $i$th row of $XV$.
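The procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca_eig`, the random test data, and the seed are hypothetical, and the covariance matrix uses the divisor $n$ as in the text. Since the covariance matrix is symmetric, `numpy.linalg.eigh` is used; it returns eigenvalues in ascending order, so they are reordered to match the decreasing convention above.

```python
import numpy as np

def pca_eig(X_raw):
    """Sketch of PCA via eigendecomposition of the covariance matrix (divisor n)."""
    X = X_raw - X_raw.mean(axis=0)            # center each column
    n = X.shape[0]
    Sigma = X.T @ X / n                       # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # symmetric solver, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # indices for decreasing eigenvalues
    lam = eigvals[order]                      # diagonal entries of Lambda
    V = eigvecs[:, order]                     # columns are the PCs (unit norm)
    scores = X @ V                            # PC scores: row i = sample i in PC space
    explained = lam / lam.sum()               # proportion of variance per PC
    return Sigma, lam, V, scores, explained

# Hypothetical example data: 100 samples of 4 correlated variables.
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

Sigma, lam, V, scores, explained = pca_eig(X_raw)
```

In this sketch, `scores.T @ scores / n` is numerically equal to the diagonal matrix of `lam`, which reflects that the scores are uncorrelated and that each eigenvalue is the variance along its PC. Each column of `V` is determined only up to sign, so two implementations may return PCs that differ by a factor of -1.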
