Chapter 3: General Methodology
3.4 Multivariate Statistical Analyses
3.4.1 Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate statistical method used to describe the variation within a set of variables. PCA expresses the data in a new coordinate system whose axes successively capture the maximum variance of the data set (Badesa et al., 2014; Dillmann et al., 2014; Wu et al., 2007; Yang et al., 2012). PCA can be calculated from either the covariance matrix or the correlation matrix. The choice depends on the nature of the data: if the variables under investigation share the same units, the covariance matrix should be used, whereas the correlation matrix should be used when the variables have different units. PCA was first applied to biomechanical data to derive a representation of signals instead of using the signals themselves (Wootten et al., 1990); others used it as a data reduction method (Olney et al., 1998), whilst different researchers used it to assess entire gait waveforms, retaining potentially valuable information (Deluzio et al., 1997). A visual example of PCA is shown in Figure 3.15. Suppose the spheres represent observations of two variables that make up a data set plotted in an 𝑥1-𝑥2 coordinate system (Figure 3.15 a). The direction in which most of the variance of these two variables occurs can be captured by the axis 𝑢 (Figure 3.15 b). A second axis 𝑣, perpendicular to 𝑢, represents the direction holding the second largest amount of variation in the data (Figure 3.15 c). The origin of the 𝑢-𝑣 coordinate system lies at the mean of the variables, and in this system the covariance between the 𝑢 and 𝑣 coordinates is zero. For a given data set, PCA therefore finds the axis system defined by the principal directions of variance, i.e. the 𝑢-𝑣 axes, where 𝑢 and 𝑣 are the principal components (PCs) (Figure 3.15 d). In a larger data set with a greater number of variables, there can be as many PCs as there are variables, creating a high-dimensional space.
Figure 3.15. Illustration of principal component analysis. The variance of the variables is captured using PCA and represented in a new set of variables, the PCs.
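The geometric picture in Figure 3.15 can be reproduced numerically. The sketch below is illustrative only. It assumes NumPy and a synthetic, correlated two-variable data set whose values, seed and variable names are invented rather than taken from this study. It finds the axes 𝑢 and 𝑣 as Eigenvectors of the covariance matrix and confirms that the data expressed in the 𝑢-𝑣 system have zero covariance, up to round-off.

```python
import numpy as np

# Synthetic, correlated two-variable data (illustrative only): 200 points in the x1-x2 plane
rng = np.random.default_rng(seed=0)
x1 = rng.normal(0.0, 1.0, size=200)
x2 = 0.8 * x1 + rng.normal(0.0, 0.3, size=200)
X = np.column_stack([x1, x2])           # n x 2 data matrix

# Move the origin to the mean of the variables
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix give the directions u and v
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns Eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder so that u carries the most variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
u_axis, v_axis = eigvecs[:, 0], eigvecs[:, 1]

# Coordinates of each point in the u-v system
uv = Xc @ eigvecs
cov_uv = np.cov(uv, rowvar=False)

print("u axis:", u_axis, "v axis:", v_axis)
print("variance along u and v:", np.diag(cov_uv))
print("covariance between u and v:", cov_uv[0, 1])  # zero up to round-off
```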
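To compute PCA using the covariance matrix, the following steps are used (Robertson et al., 2013). First, the data under investigation are arranged in an 𝑛 × 𝑝 matrix 𝑋, in which each row corresponds to a participant and each column to an instant of the temporal waveform: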
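$$ X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \qquad (3.7) $$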
To describe the structure of the data, the covariance between the columns of 𝑋 is calculated, giving the 𝑝 × 𝑝 matrix 𝑆:
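$$ S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix} \qquad (3.8) $$
Where:
𝑆 = covariance (or correlation) matrix of the columns of 𝑋
𝑠𝑗𝑗 = diagonal elements, which represent the variance at each instant of the temporal waveform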
The diagonal elements of the covariance matrix are computed as follows:
$$ s_{ii} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2}{n - 1} \qquad (3.9) $$
Where:
𝑖 = column
𝑛 = number of rows (participants)
𝑥̅𝑖 = mean value of column 𝑖
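Equation 3.9 can be checked directly. The sketch below assumes NumPy and a small hypothetical data matrix (4 participants × 3 time instants, values invented for illustration). It computes each 𝑠𝑖𝑖 from the formula and compares the result with NumPy's sample variance, where `ddof=1` selects the same 𝑛 − 1 denominator.

```python
import numpy as np

# Hypothetical data: n = 4 participants (rows) x p = 3 time instants (columns)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 8.0],
              [4.0, 7.0, 9.0]])
n, p = X.shape
x_bar = X.mean(axis=0)                  # column means

# Equation 3.9: s_ii = sum_k (x_ki - x_bar_i)^2 / (n - 1)
s_diag = np.array([((X[:, i] - x_bar[i]) ** 2).sum() / (n - 1) for i in range(p)])

print(s_diag)
print(np.var(X, axis=0, ddof=1))        # NumPy's sample variance, same denominator n - 1
```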
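The off-diagonal elements represent the covariance between each pair of time instants:
$$ \operatorname{cov}(i, j) = \sigma_{i,j} = s_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{n - 1} \qquad (3.10) $$
Where:
𝑖 and 𝑗 = two columns
𝑛 = number of rows (participants)
𝑥̅ = mean value of a column
𝜎𝑖,𝑗 = covariance between columns 𝑖 and 𝑗 (𝜎𝑖 = √𝑠𝑖𝑖 denotes the standard deviation of column 𝑖)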
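The full covariance matrix can be built element by element from Equation 3.10 or, equivalently, in matrix form as (𝑋 − 𝑋̅)ᵀ(𝑋 − 𝑋̅)/(𝑛 − 1). The sketch below assumes NumPy and a hypothetical 4 × 3 data matrix. It compares both forms with `np.cov`, which treats rows as variables by default and therefore needs `rowvar=False` for the participants × time layout of 𝑋.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (participants x time instants)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 8.0],
              [4.0, 7.0, 9.0]])
n, p = X.shape
x_bar = X.mean(axis=0)

# Equation 3.10, element by element: s_ij = sum_k (x_ki - x_bar_i)(x_kj - x_bar_j) / (n - 1)
S_loop = np.empty((p, p))
for i in range(p):
    for j in range(p):
        S_loop[i, j] = ((X[:, i] - x_bar[i]) * (X[:, j] - x_bar[j])).sum() / (n - 1)

# Equivalent matrix form: S = (X - X_bar)^T (X - X_bar) / (n - 1)
Xc = X - x_bar
S_matrix = Xc.T @ Xc / (n - 1)

# NumPy's covariance of the columns (rowvar=False: columns are the variables)
S_numpy = np.cov(X, rowvar=False)

print(S_loop)
print(np.allclose(S_loop, S_matrix), np.allclose(S_loop, S_numpy))
```

The result is a symmetric 𝑝 × 𝑝 matrix, whatever the number of participants.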
A covariance that is not equal to zero indicates a linear relationship between two variables. The strength of the linear relationship can be defined by the correlation coefficient:
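$$ \operatorname{corr}(i, j) = \rho_{i,j} = r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}} = \frac{\sigma_{i,j}}{\sigma_i \sigma_j} = \frac{\operatorname{cov}(i, j)}{\sigma_i \sigma_j} \qquad (3.11) $$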
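Equation 3.11 amounts to dividing each covariance by the product of the two standard deviations. The sketch below assumes NumPy and a hypothetical 4 × 3 data matrix. It also shows that the correlation matrix equals the covariance matrix of standardised (z-scored) columns. This is why correlation-based PCA is suited to variables with different units.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (participants x time instants), illustrative values
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 8.0],
              [4.0, 7.0, 9.0]])

S = np.cov(X, rowvar=False)             # covariance matrix (Equation 3.10)
sd = np.sqrt(np.diag(S))                # standard deviations sigma_i = sqrt(s_ii)

# Equation 3.11: r_ij = s_ij / sqrt(s_ii * s_jj)
R = S / np.outer(sd, sd)

# Covariance of standardised columns reproduces the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(np.round(R, 3))
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
print(np.allclose(R, np.cov(Z, rowvar=False)))
```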
The variance of the original data (matrix 𝑋) is represented by the covariance matrix 𝑆. Non-zero off-diagonal elements of 𝑆 indicate that the corresponding columns of 𝑋 are correlated. The principal components (PCs) are extracted from matrix 𝑆. Because the PCs are uncorrelated with each other, their covariance matrix has all off-diagonal elements equal to zero. Transforming the covariance matrix 𝑆 into such a diagonal covariance matrix 𝐷 is known as diagonalisation, also referred to as orthogonal decomposition, and is given by:
$$ U^{T} S U = D \qquad (3.12) $$
Where:
𝑆 = covariance matrix
𝑈 = orthogonal transformation matrix applied to the mean-centred data 𝑋; its columns are the Eigenvectors of 𝑆, known as loading vectors
𝑈𝑇 = transpose of 𝑈
𝐷 = diagonal covariance matrix obtained from 𝑆; its diagonal elements are the Eigenvalues of 𝑆, which give the variances of the PCs
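The diagonalisation in Equation 3.12 is an eigendecomposition of the symmetric matrix 𝑆. The sketch below assumes NumPy, whose `np.linalg.eigh` is intended for symmetric matrices and returns Eigenvalues in ascending order. It uses a hypothetical 4 × 3 data matrix, reorders the Eigenvectors by descending variance, and confirms that 𝑈ᵀ𝑆𝑈 is diagonal and that 𝑈 is orthogonal.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (participants x time instants)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 8.0],
              [4.0, 7.0, 9.0]])
S = np.cov(X, rowvar=False)

# S is symmetric, so eigh returns real Eigenvalues and orthonormal Eigenvectors
eigvals, U = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]       # eigh sorts ascending; reorder to descending variance
eigvals, U = eigvals[order], U[:, order]

# Equation 3.12: U^T S U = D
D = U.T @ S @ U

print(np.round(D, 10))                              # off-diagonal elements vanish up to round-off
print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # U is orthogonal
```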
When a covariance matrix is diagonal, so that all covariances are zero, its diagonal elements (the variances) are equal to its Eigenvalues. Matrix 𝑈 can therefore be seen as an orthogonal transformation that expresses the original data set in a new coordinate system. The new coordinates are the PCs, which are arranged in descending order of the variance they explain. The columns of 𝑈, the Eigenvectors of 𝑆, are known as loading vectors and define the directions of the PCs.
The diagonal covariance matrix 𝐷 has the elements 𝜆𝑖, which are the Eigenvalues of 𝑆. Each Eigenvalue measures the variance associated with one PC. The maximum number of PCs is given by the number of non-zero diagonal elements of 𝐷, which equals the rank 𝑟 of 𝑆. This rank cannot exceed the smaller of the number of participants 𝑛 and the length of the temporal waveform 𝑝; because the data are mean-centred, it is in fact at most the smaller of 𝑛 − 1 and 𝑝.
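The rank limit matters when there are fewer participants than time instants, as is common for waveform data. The sketch below assumes NumPy and hypothetical random data with 𝑛 = 5 and 𝑝 = 20. It counts the Eigenvalues that are non-zero beyond round-off. The relative tolerance of 1e-10 is an arbitrary illustrative choice, not part of the method.

```python
import numpy as np

# Hypothetical waveform data with fewer participants than time instants
rng = np.random.default_rng(seed=1)
n, p = 5, 20
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)                        # p x p covariance matrix
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]     # Eigenvalues in descending order

# Eigenvalues below a small relative tolerance are treated as zero (round-off);
# the tolerance 1e-10 is an arbitrary illustrative choice
r = int(np.sum(eigvals > 1e-10 * eigvals[0]))

print("non-zero Eigenvalues (rank r):", r)
print("min(n - 1, p):", min(n - 1, p))
```

Although 𝑆 is 20 × 20, only four PCs carry any variance, because mean-centring five observations leaves at most four independent directions.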
Matrix 𝑈 transforms the mean-centred original data set into new, uncorrelated principal components (𝑌):
$$ Y_{(n \times r)} = [X - \bar{X}]_{(n \times p)} \, U_{(p \times r)} \qquad (3.13) $$
In matrix 𝑌, each column is a PC and the elements of these columns are the PC scores. Once computed, the PCs are organised in descending order of variance. The first PC therefore explains the maximum amount of variance in the original data, the second PC, orthogonal to the first, explains the next largest amount, and so on. The Eigenvalues 𝜆𝑖, the diagonal elements of matrix 𝐷, give the variance of each PC.
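Equation 3.13 can be sketched as follows, assuming NumPy and hypothetical data of 30 participants × 10 time instants. Cumulative sums are used only to make neighbouring time instants correlated, loosely mimicking waveforms. The sketch projects the mean-centred data onto the first 𝑟 loading vectors and checks that the resulting PC scores are uncorrelated, each with a variance equal to its Eigenvalue.

```python
import numpy as np

# Hypothetical data: 30 participants x 10 time instants; cumulative sums make
# neighbouring time instants correlated (illustrative only)
rng = np.random.default_rng(seed=2)
n, p = 30, 10
X = rng.normal(size=(n, p)).cumsum(axis=1)

X_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
eigvals, U = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]       # descending order of variance
eigvals, U = eigvals[order], U[:, order]

r = min(n - 1, p)                       # equals the rank of S for this data (all 10 PCs kept)
U_r = U[:, :r]                          # p x r matrix of loading vectors
Y = (X - X_bar) @ U_r                   # Equation 3.13: n x r matrix of PC scores

# The PC scores are uncorrelated and the variance of each column equals its Eigenvalue
print(Y.shape)
print(np.allclose(np.cov(Y, rowvar=False), np.diag(eigvals[:r])))
```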
PCA therefore conserves the variance of the original raw data through the PCs. The total variation within the data can be measured as the sum of the variances, which corresponds to the sum of the diagonal elements of 𝑆. The sum of the diagonal elements of a matrix is referred to as its trace (𝑡𝑟), and because the orthogonal transformation does not change the trace:
$$ \operatorname{tr}(S) = \operatorname{tr}(D) \qquad (3.14) $$
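Equation 3.14 can be verified numerically. The sketch below assumes NumPy and hypothetical correlated data (25 participants × 6 variables, generated at random). It shows that the trace of 𝑆, the trace of the diagonalised matrix 𝐷 and the sum of the Eigenvalues coincide.

```python
import numpy as np

# Hypothetical correlated data: 25 participants x 6 variables
rng = np.random.default_rng(seed=3)
X = rng.normal(size=(25, 6)) @ rng.normal(size=(6, 6))

S = np.cov(X, rowvar=False)
eigvals, U = np.linalg.eigh(S)
D = U.T @ S @ U                          # diagonalised covariance matrix (Equation 3.12)

# Equation 3.14: the orthogonal transformation leaves the total variance unchanged
print(np.trace(S), np.trace(D), eigvals.sum())
```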
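The portion of the total variance explained by each principal component is then quantified as:
$$ \text{Variation explained by } PC_i = \frac{\lambda_i}{\operatorname{tr}(S)} = \frac{\lambda_i}{\sum_{j=1}^{r} \lambda_j} \qquad (3.15) $$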
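Equation 3.15 is usually reported together with the cumulative variance explained. The sketch below assumes NumPy and hypothetical waveform-like data. The helper name `variance_explained` is illustrative, and the 90 % retention threshold is only an example of a retention rule. Neither is part of the methodology described here.

```python
import numpy as np

def variance_explained(X):
    """Return Eigenvalues (descending) and the proportion of total variance per PC (Eq. 3.15)."""
    S = np.cov(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)   # remove tiny negative round-off values
    return eigvals, eigvals / np.trace(S)

# Hypothetical correlated waveform-like data: 40 participants x 8 time instants
rng = np.random.default_rng(seed=4)
X = rng.normal(size=(40, 8)).cumsum(axis=1)

eigvals, explained = variance_explained(X)
cumulative = np.cumsum(explained)

# Example retention rule (an assumption, not part of this methodology): keep PCs up to 90 % cumulative variance
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {100 * e:5.1f} % (cumulative {100 * c:5.1f} %)")
print("PCs needed for 90 %:", n_keep)
```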