
Chapter 2 Literature Review

2.4 Data Reduction

The collection of HMA data results in an extensive amount of temporal information. Generally, gait variables are time-normalised to 101 data points, expressed as a percentage of stance phase or of the entire gait cycle. To allow a meaningful statistical analysis to be performed, these temporal waveforms must be summarised using a smaller number of discrete variables. This has resulted in an extensive application of data reduction techniques to HMA data (Chau, 2001a). A common method of reducing data is to define discrete parameters of the waveform, as shown in Figure 2.7. For example, during the swing phase of gait, the knee must flex to achieve toe clearance as the limb progresses forward. A reduction in this flexion angle might therefore indicate an increased risk of trips or falls.
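As an illustration of the two operations described above, the following is a minimal sketch, assuming linear interpolation for the time normalisation and a swing phase spanning 60-100% of the gait cycle. The function names `time_normalise` and `peak_in_window`, the synthetic waveform and the swing window are assumptions made for illustration and are not taken from the cited studies.

```python
import numpy as np


def time_normalise(signal, n_points=101):
    """Resample a 1-D signal onto n_points equally spaced samples (0-100%)."""
    signal = np.asarray(signal, dtype=float)
    original_pct = np.linspace(0.0, 100.0, signal.size)
    target_pct = np.linspace(0.0, 100.0, n_points)
    return np.interp(target_pct, original_pct, signal)


def peak_in_window(waveform, start_pct, end_pct):
    """Return (peak value, % of cycle at which it occurs) within a window."""
    waveform = np.asarray(waveform, dtype=float)
    pct = np.linspace(0.0, 100.0, waveform.size)
    window = np.flatnonzero((pct >= start_pct) & (pct <= end_pct))
    idx = window[np.argmax(waveform[window])]
    return waveform[idx], pct[idx]


# Synthetic stand-in for a knee flexion angle recorded over 137 frames.
# This is NOT measured data: a small stance bump plus a larger swing peak.
raw_pct = np.linspace(0.0, 100.0, 137)
knee_angle = (15.0 * np.exp(-((raw_pct - 15.0) / 8.0) ** 2)
              + 60.0 * np.exp(-((raw_pct - 72.0) / 10.0) ** 2))

normalised = time_normalise(knee_angle)  # 101 points, 0-100% of the cycle

# Assumed swing window of 60-100%; the true window varies between subjects.
peak_value, peak_time = peak_in_window(normalised, 60.0, 100.0)
```

The whole 101-point waveform is reduced here to two numbers, the peak value and its timing, which is what makes discrete parameters easy to interpret and also what makes them lossy.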

Choosing which discrete parameters to calculate, however, is subjective and may discard valuable information. While consistent peaks and troughs may be identifiable in healthy subjects, the waveforms of pathological subjects often have completely different characteristics. Furthermore, by discarding the rest of the waveform, important information regarding inter-subject variability can be lost (Gaudreault et al., 2011).

Deluzio et al. (1999) demonstrated that PCA was a useful technique in the reduction of temporal biomechanical data. The study found that principal component scores were sensitive to gait changes associated with knee OA, as well as changes following a partial knee replacement. PCA has since been successfully applied at Cardiff University to help distinguish between OA and non-pathological (NP) subjects and hence objectively measure changes in gait parameters following TKR surgery (Jones et al., 2008, Whatling et al., 2008, Whatling, 2009, Metcalfe et al., 2013).

Principal Component Analysis is a multivariate data analysis technique which applies an orthogonal transformation to an n-dimensional dataset of potentially correlated variables in order to arrive at a new n-dimensional dataset of linearly uncorrelated variables. The first dimension of the new dataset represents the greatest amount of variance in the dataset, the second the next greatest, and so forth until the nth dimension, which will often represent an extremely small amount of the total variance. It then becomes possible to reduce the dimensionality of the dataset by only considering, for example, the first five dimensions.

2.4.1 Computing Principal Components

PCA is a relatively straightforward multivariate analysis technique. The steps are listed below but explained in much greater detail in Section 3.4.

Figure 2.7 A) An example of how a knee flexion/extension angle during gait might be reduced into discrete parameters which can be easily interpreted. B) An example of how the results of principal component analysis (PCA) might be interpreted. The three principal components which represent the greatest total variance have been selected. The areas highlighted by the dashed lines represent the proportion of the gait

1. Standardise the data such that it has zero mean and unit variance

2. Calculate the correlation coefficient matrix

3. Calculate the eigendecomposition of the correlation matrix to arrive at the eigenvectors and eigenvalues

4. Multiply the eigenvectors by the square root of the eigenvalues to arrive at the factor loadings

5. Multiply the eigenvectors by the standardised data points to arrive at the principal component (PC) scores for each subject

If, for example, 101 data points have been used to normalise a gait waveform to 0-100% of the gait cycle, this method will calculate 101 eigenvectors, each with 101 dimensions. Each eigenvector has a corresponding eigenvalue which represents how much of the total variance of the dataset that eigenvector represents. For example, if the first eigenvalue, expressed as a proportion of the sum of all eigenvalues, was 0.78, then reconstructing all the waveforms using just that first eigenvector/principal component would represent 78% of the initial variance between subjects.
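The five steps can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, not the implementation described in Section 3.4. The function name `waveform_pca`, the choice of 40 subjects and the synthetic waveforms are assumptions. The input is taken to be an array with one row per subject and one column per point of the gait cycle.

```python
import numpy as np


def waveform_pca(waveforms):
    """PCA of a (subjects x time points) array via the correlation matrix."""
    X = np.asarray(waveforms, dtype=float)

    # Step 1: standardise each time point to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 2: correlation coefficient matrix between time points.
    R = np.corrcoef(X, rowvar=False)

    # Step 3: eigendecomposition. eigh is intended for symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)  # remove round-off negatives
    eigenvectors = eigenvectors[:, order]

    # Step 4: factor loadings, column j scaled by sqrt(eigenvalue j).
    loadings = eigenvectors * np.sqrt(eigenvalues)

    # Step 5: PC scores, one row per subject and one column per PC.
    scores = Z @ eigenvectors

    # Proportion of total variance represented by each PC.
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, loadings, scores, explained


# Synthetic example (not measured data): 40 subjects, 101 time points.
rng = np.random.default_rng(0)
pct = np.linspace(0.0, 100.0, 101)
mean_curve = 60.0 * np.exp(-((pct - 72.0) / 10.0) ** 2)
amplitude = 1.0 + 0.2 * rng.standard_normal((40, 1))
waveforms = amplitude * mean_curve + rng.normal(scale=1.0, size=(40, 101))

eigenvalues, eigenvectors, loadings, scores, explained = waveform_pca(waveforms)
```

Because the correlation matrix is used, the eigenvalues sum to the number of time points (101 here), so it is the `explained` array, not the raw eigenvalues, that gives proportions such as 0.78. With fewer subjects than time points, as in this example, many of the trailing eigenvalues are effectively zero.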

The purpose in this instance is to reduce the dimensions of the dataset, and therefore not all 101 PCs will be retained. One objective criterion for PC selection is Kaiser's rule (Kaiser, 1960), which suggests that all principal components with an eigenvalue of less than one should be discarded. Another reasonable technique is to define a target proportion of the total variance to be represented. The minimum number of PCs required to meet that threshold can then be used for further analysis.
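Both selection criteria reduce to simple operations on the eigenvalues. The sketch below assumes eigenvalues obtained from a correlation matrix. The function names, the example eigenvalues and the 90% target are hypothetical values chosen for illustration only.

```python
import numpy as np


def kaiser_count(eigenvalues):
    """Number of PCs retained when those with an eigenvalue below one are discarded."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) >= 1.0))


def count_for_target_variance(eigenvalues, target=0.90):
    """Minimum number of PCs whose cumulative variance share reaches the target."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()
    # searchsorted gives the first index at which cumulative >= target.
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, ev.size)  # guard against round-off when target is near 1


# Hypothetical eigenvalues for illustration (not results from any study).
example_eigenvalues = [5.0, 2.5, 1.2, 0.8, 0.3, 0.2]

n_kaiser = kaiser_count(example_eigenvalues)
n_target = count_for_target_variance(example_eigenvalues, target=0.90)
```

The two criteria need not agree. In this example the target variance criterion retains one more PC than Kaiser's rule, which is why the choice of criterion should be stated alongside any results.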

A further potential selection technique is to use the factor loadings. The factor loadings can be thought of as the correlation coefficients between the original variables and the principal components. The correlation coefficient between two variables is often denoted as the r value, and the proportion of the variance that the correlation represents is generally denoted as the r² value. If a correlation is greater than 0.71 or less than -0.71, its r² value is greater than 0.5 and it therefore represents more than 50% of the variance. Each principal component has a factor loading for each point of the gait cycle, indicating how much of the variance at that point of the gait cycle the principal component represents. Comrey and Lee (2013) suggest that loadings greater than 0.71 or less than -0.71 be used as a threshold for consideration of PCs.
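One possible reading of this threshold is sketched below: a PC is considered if its loading exceeds 0.71 in magnitude at any point of the gait cycle, and the points at which it does so mark the portion of the cycle that the PC can be said to describe. This interpretation, the function names and the small loadings matrix are assumptions made for illustration. The loadings are taken to be arranged with one row per time point and one column per PC.

```python
import numpy as np


def high_loading_regions(loadings, threshold=0.71):
    """Boolean mask (time points x PCs), True where |loading| exceeds the threshold."""
    return np.abs(np.asarray(loadings, dtype=float)) > threshold


def pcs_meeting_threshold(loadings, threshold=0.71):
    """Indices of PCs with at least one loading beyond the threshold."""
    mask = high_loading_regions(loadings, threshold)
    return np.flatnonzero(mask.any(axis=0))


# Hypothetical loadings for 5 time points and 3 PCs (illustrative values only).
example_loadings = np.array([
    [0.90,  0.10, 0.05],
    [0.80,  0.30, 0.10],
    [0.40, -0.85, 0.20],
    [0.20, -0.75, 0.30],
    [0.10,  0.20, 0.60],
])

regions = high_loading_regions(example_loadings)
retained = pcs_meeting_threshold(example_loadings)
```

Taking the absolute value treats strongly negative loadings in the same way as strongly positive ones, since both correspond to an r² above 0.5.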

2.4.2 Further Techniques

The technique of individually computing PCs for waveforms takes advantage of the high degree of correlation that individual points of a single waveform have with each other. There is also a high degree of interdependency between individual gait variables, and there are therefore potential advantages to performing PCA on all waveforms in a single ‘state space’. For example, knee flexion is required during swing phase to achieve ground clearance, but hip circumduction, a combination of ab/adduction and internal/external rotation, can be adopted as a compensatory gait strategy to achieve toe clearance. In subjects adopting this strategy, changes in knee flexion/extension waveforms would likely be highly correlated with changes in hip flexion/extension, ab/adduction, and internal/external rotation. If these waveforms were all considered in a single PCA, a large amount of this correlated variation might be representable using a single principal component.
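The idea of a single state space can be sketched by concatenating the waveforms of each subject into one long row before the PCA is performed. The example below is a minimal illustration on synthetic data in which one hypothetical "compensation" factor drives all three waveforms. The function name `build_state_space`, the variable names and the data are assumptions and do not reproduce the method of any cited study.

```python
import numpy as np


def build_state_space(waveform_sets):
    """Concatenate several (subjects x time points) arrays column-wise."""
    arrays = [np.asarray(w, dtype=float) for w in waveform_sets]
    n_subjects = arrays[0].shape[0]
    if any(a.shape[0] != n_subjects for a in arrays):
        raise ValueError("every waveform set needs one row per subject")
    return np.hstack(arrays)


# Synthetic data (not measured): 30 subjects, 101 points per waveform.
rng = np.random.default_rng(1)
n_subjects, n_points = 30, 101
shape = np.sin(np.pi * np.linspace(0.0, 1.0, n_points))

# One shared factor per subject: less knee flexion, more hip motion.
compensation = rng.standard_normal((n_subjects, 1))


def noise():
    return rng.normal(scale=1.0, size=(n_subjects, n_points))


knee_flexion = -10.0 * compensation * shape + noise()
hip_abduction = 5.0 * compensation * shape + noise()
hip_rotation = 4.0 * compensation * shape + noise()

state = build_state_space([knee_flexion, hip_abduction, hip_rotation])

# PCA on the correlation matrix of the combined 303-dimensional state space.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(state, rowvar=False))[::-1]
first_pc_share = eigenvalues[0] / eigenvalues.sum()
```

Because the three synthetic waveforms share a common source of variation, the first PC of the combined state space accounts for a large share of the total variance, which mirrors the argument made above for compensatory strategies.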

The application of PCA to multiple joint angles within the same ‘state space’ has been reported in the literature (Boyer et al., 2012). Other researchers have employed a slightly different technique which applies PCA to the time-normalised marker coordinate data. The coordinates of the pelvis marker are generally subtracted from those of the other markers at each frame, such that each marker’s coordinates are expressed relative to the pelvis marker (von Tscharner et al., 2013, Federolf et al., 2013). These techniques often still include the whole GRF waveforms within the state space.