Multivariate Statistical Analysis and Machine Learning Algorithms in Gait

Chapter 2: Review of Literature

2.6 Characteristics of Gait Data

2.6.2 Multivariate Statistical Analysis and Machine Learning Algorithms in Gait

Automatic gait recognition tools are becoming increasingly popular in gait analysis. In a clinical setting, they can provide a quantitative, non-invasive diagnostic method, patient-specific treatment recommendations, and more effective evaluation of treatment outcomes (Alaqtash et al., 2011b; Lakany, 2008; Pogorelc et al., 2012). Current challenges in clinical settings are the discrimination of able-bodied gait from pathological gait and the evaluation of the progression of pathological gait (Figueiredo et al., 2018). Therefore, classification methods based on statistical analysis, mathematical transformation and machine learning algorithms have been assessed in the investigation of gait data (Alaqtash et al., 2011b). Statistical analysis has not resolved the persistent challenge of achieving an objective analysis, and it assumes that the data are normally distributed (Chau, 2001b). Mathematical transforms were limited to applications involving univariate signals and to guideline selection based on wavelets (Chau, 2001b). In contrast, machine learning algorithms used to develop automatic gait recognition tools were able to detect patterns and work with complex non-linear relationships between variables (Alaqtash et al., 2011b; Zheng et al., 2009). They provide an objective method for the analysis of large datasets, thus eliminating researcher bias (Alaqtash, Sarkodie-Gyan, et al., 2011), whilst providing a quick and cost-effective method of analysis (Alaqtash et al., 2011b; Lakany, 2008; Simon, 2004). Furthermore, these algorithms could handle high-dimensional data, and new data could easily be incorporated to improve the prediction performance (Alaqtash et al., 2011a; Begg & Kamruzzaman, 2005; Zheng et al., 2009). The ability to address non-linear and high-dimensional data such as gait data and the ability to properly process new data make machine learning algorithms a suitable method for gait analysis.

Research studies have implemented multivariate statistical analysis methods and machine learning algorithms to investigate Parkinson’s disease, cerebral palsy, spinal cord injury, osteoarthritis, running injuries and stroke. The application of these advanced statistical methods was initiated due to the lack of quantitative methods in the assessment of motor symptoms in Parkinsonian gait (Palmerini et al., 2011). In recent years, the use of machine learning algorithms has had many applications for the assessment of pathological gait, for example investigating the use of classifiers to detect cerebral palsy in infants (Rahmati et al., 2016) and children (Kamruzzaman and Begg, 2006), determine the severity of the condition (Rozumalski and Schwartz, 2009), characterise movement patterns of stroke patients (Kaczmarczyk et al., 2009) and diagnose osteoarthritis (Astephen et al., 2008). Other applications included determining the risk of developing a disease or predicting the outcome of an intervention (Wei et al., 2017).

In gait analysis, many studies focused on predictive tasks such as classification (80.6%) and regression (11.6%), while only a few investigated data mining tasks such as clustering (7.8%) (Halilaj et al., 2018). These two machine learning approaches, predictive modelling and data mining, serve different purposes compared to more traditional statistical approaches. Predictive modelling is used to find a function or model that maps input data, such as kinetic or kinematic waveforms, to a given output, such as severity of pathology, so that it can be used to make future predictions. An example of predictive modelling is the powered prosthesis, which uses myoelectric sensors embedded in the prosthesis socket to predict an individual's intention for the upcoming steps (e.g. Afzal et al., 2017). Predictive modelling was also used to develop diagnostic and prognostic models, for example for predicting falls (e.g. Wei et al., 2017) or activity during outpatient treatment (e.g. Biswas et al., 2013). Data mining, on the other hand, is used to discover new patterns in data. For example, clustering analysis could be used to identify gait patterns of subpopulations within types of pathological gait (e.g. Rozumalski and Schwartz, 2009).
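The distinction between predictive modelling and data mining described above can be sketched in a few lines. The following is a minimal illustration only, assuming hypothetical feature values (walking speed in m/s and stride length in m) and hypothetical function names. It is not taken from any of the cited studies. The nearest neighbour classifier uses known labels to predict the label of a new observation, whereas the k-means routine groups the same observations without using any labels.

```python
import math

def nearest_neighbour_predict(train_x, train_y, query):
    """Predictive modelling: return the label of the closest labelled sample."""
    distances = [math.dist(x, query) for x in train_x]
    return train_y[distances.index(min(distances))]

def k_means(points, initial_centres, n_iter=10):
    """Data mining: group points around centres without using any labels."""
    centres = [list(c) for c in initial_centres]  # copy so the input is untouched
    assignment = []
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        assignment = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        # Move every centre to the mean of the points assigned to it.
        for k in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres

# Hypothetical observations: [walking speed (m/s), stride length (m)].
features = [
    [1.30, 1.40], [1.25, 1.35], [1.35, 1.45],  # labelled able-bodied
    [0.70, 0.80], [0.65, 0.85], [0.75, 0.75],  # labelled pathological
]
labels = ["able-bodied"] * 3 + ["pathological"] * 3

# Supervised use: the labels guide the prediction for a new participant.
predicted = nearest_neighbour_predict(features, labels, [0.72, 0.82])

# Unsupervised use: the labels are ignored and two groups are discovered.
clusters, centres = k_means(features, [features[0], features[3]])
```

In this toy example the clusters happen to coincide with the labelled groups, but in general clustering may reveal subgroups that do not correspond to any predefined label.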

Recent investigations into the development of automatic gait recognition tools were performed on data extracted from wearable sensor systems such as footswitches and accelerometers (Taborri et al., 2016). Advances in technology have made these sensors smaller, more lightweight and easier to take on and off. These sensors also allow variables to be measured in free-living conditions, which can be advantageous, particularly in the advancement of robotic or powered therapies (e.g. Afzal et al., 2017). Hegde et al. (2018), for example, used shoe-based wearable sensors to monitor the activity and gait of children with cerebral palsy (CP). Machine learning models were used to automatically classify activities of daily living. The results showed that activities could be classified with an accuracy of 95.3% and 96.2% for children with and without CP, respectively. A disadvantage of wearable sensors, however, is that they only provide kinematic data. To overcome this issue, Wouda et al. (2018) used an artificial neural network (ANN) to estimate kinetic and kinematic parameters of runners using wearable sensors. Joint angles and vertical acceleration from the wearable sensors were used as input values to estimate the vertical ground reaction force (GRF). The outcome showed that sagittal knee kinematics and vertical GRF could be estimated using three inertial sensors with no significant difference to the reference data. Although wearable sensors have their advantages, non-ambulatory external sensors such as motion capture systems or force platforms can provide more detailed information. These systems operate in a controlled environment (Sabatini et al., 2005), which is occasionally considered a disadvantage since it can be challenging to acquire consecutive gait cycles for long-term applications in a natural environment (Alahakone et al., 2010; Azhar et al., 2014). However, the accuracy of these systems should not be underestimated, as they provide comprehensive and reliable biomechanical data such as temporal-spatial, kinematic and kinetic variables (Howell et al., 2012; Bamberg et al., 2008). Alaqtash et al. (2011a), for example, used the nearest neighbour classifier and ANN to classify GRF data of able-bodied individuals and of individuals with CP and multiple sclerosis. The classification yielded an accuracy of 95%, indicating that automatic gait recognition tools can be useful for clinicians in the diagnosis and identification of pathological gait. Ertelt et al. (2018) used a Gaussian distribution to classify the GRF patterns of athletes from different sports. The results showed an overall prediction accuracy of 94.29% across sports and athletes. Only three of the ten sports under investigation could not be correctly classified in all instances, whilst the other sports were allocated correctly in 100% of cases. These results can have important implications in both the medical and sports fields since they have the potential to be used for the identification of gait patterns at different points during an intervention.

In previous studies, the feature space, which represents the number of variables, was generally larger than the number of observations, which represents the number of participants (Alaqtash et al., 2011a; b; Begg and Kamruzzaman, 2005; Eskofier et al., 2013), since most studies had fewer participants (median = 40 participants) than variable data points (Halilaj et al., 2018). In general, the number of observations should be greater than the number of features when using machine learning; otherwise there is a risk of overfitting. Barrett and Kline (1981) recommend that the number of participants should be at least 50 for principal component analysis (PCA). However, during gait analysis of pathological groups, the characteristics and the location of the research site might impose constraints on the number of participants that can be recruited for a study.
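The overfitting risk mentioned above can be demonstrated with a small numerical sketch. The example below is an illustration under stated assumptions, not a method from the cited studies: the data are random numbers, the labels are deliberately unrelated to the features, and the helper names are hypothetical. When there are as many features as participants, a linear model can reproduce arbitrary training labels almost exactly, yet this fit carries no information about new participants.

```python
import random

def solve_square(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - known) / m[i][i]
    return w

def predict(rows, w):
    """Linear prediction: weighted sum of the features of each row."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in rows]

def mean_squared_error(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_participants = 8
n_features = 8  # assumption: as many features as participants

def random_features():
    return [[rng.gauss(0, 1) for _ in range(n_features)]
            for _ in range(n_participants)]

# Labels alternate 0, 1, 0, 1, ... and are unrelated to the random features.
arbitrary_labels = [float(i % 2) for i in range(n_participants)]

x_train = random_features()
w = solve_square(x_train, arbitrary_labels)
train_mse = mean_squared_error(predict(x_train, w), arbitrary_labels)

# New participants drawn the same way: the fitted weights do not generalise.
x_new = random_features()
test_mse = mean_squared_error(predict(x_new, w), arbitrary_labels)

print(f"training error: {train_mse:.2e}, error on new data: {test_mse:.2e}")
```

The near-zero training error here reflects memorisation of the training set rather than a learned relationship, which is why a favourable ratio of participants to features, or a prior dimensionality reduction step, is usually sought.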

In gait analysis, discrete descriptive parameters such as peak angles are commonly extracted from temporal waveforms. However, this approach requires a priori selection of features, which depends on the researcher's experience and knowledge. Consequently, a large part of the temporal waveform, which may hold important information, is discarded. Dimensionality reduction techniques could be used for feature selection and feature extraction to overcome this issue, so that full gait cycles could be included in the classification procedure. However, many investigations performed the machine learning procedure using discrete parameters (Begg and Kamruzzaman, 2005) and only a few have tried including entire gait waveforms (Phinyomark et al., 2015). Furthermore, some studies limited their investigation to specific variables, i.e. only kinetic, kinematic or EMG (Alaqtash et al., 2011a; Ertelt et al., 2018); however, investigations have shown that machine learning algorithms still perform well when using different variables.
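As an illustration of how an entire waveform can be reduced to a few features without a priori selection, the sketch below extracts the first principal component of a set of time-normalised waveforms using power iteration. It is a minimal example under stated assumptions: the six waveforms are synthetic sine curves sampled at 101 points (0 to 100% of the gait cycle), the amplitudes and offsets are hypothetical, and the function name is introduced here for illustration only.

```python
import math

def first_principal_component(waveforms, n_iter=100):
    """Return the first principal component, its scores and explained variance.

    Assumes the first waveform differs from the mean waveform, because the
    first centred waveform is used as the starting vector.
    """
    n, p = len(waveforms), len(waveforms[0])
    mean = [sum(w[j] for w in waveforms) / n for j in range(p)]
    centred = [[w[j] - mean[j] for j in range(p)] for w in waveforms]

    def normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def project(vec):
        return [sum(row[j] * vec[j] for j in range(p)) for row in centred]

    v = normalise(centred[0])
    for _ in range(n_iter):
        scores = project(v)
        # Multiply by X^T X without forming the p-by-p matrix explicitly.
        v = normalise([sum(centred[i][j] * scores[i] for i in range(n))
                       for j in range(p)])
    scores = project(v)
    total = sum(x * x for row in centred for x in row)
    explained = sum(s * s for s in scores) / total
    return v, scores, explained

# Synthetic waveforms: hypothetical amplitude (degrees) and offset per subject.
amplitudes = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
offsets = [0.0, 1.0, -1.0, 0.5, -0.5, 0.0]
waveforms = [
    [a * math.sin(2 * math.pi * t / 100) + b for t in range(101)]
    for a, b in zip(amplitudes, offsets)
]

pc1, pc1_scores, pc1_explained = first_principal_component(waveforms)
print(f"variance explained by the first component: {pc1_explained:.3f}")
```

Each participant is then described by a single score instead of 101 data points, and that score reflects the dominant mode of variation across the whole cycle (here, amplitude) rather than a value at one preselected instant.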

Although some models were built using a combination of kinetic, kinematic and EMG data, only a limited number of studies addressed the scaling of these data (Rahmati et al., 2016; Roy et al., 2013). Unscaled data could adversely affect the classification outcome due to the different units and weightings of these variables. Some studies report that variables from different planes have the potential to improve the classification results, thus providing a more comprehensive understanding of pathological gait (Schöllhorn et al., 2002), but the majority of studies focused on sagittal plane data only. However, the use of data from different planes should be approached with caution since ambiguous and erroneous data, such as those arising from soft tissue artefacts, can negatively affect the results (Phinyomark et al., 2018). Thus, more data does not necessarily lead to a more accurate classification outcome.
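The scaling issue raised above can be illustrated with z-score standardisation, one common way of placing variables with different units on a comparable scale. The sketch below is a minimal example with hypothetical values (peak knee flexion in degrees, peak vertical GRF in newtons, EMG amplitude in millivolts); it does not reproduce the scaling procedure of any cited study.

```python
import math
import statistics

def standardise_columns(rows):
    """Rescale every column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample SD (n - 1)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical participants: [knee flexion (deg), vertical GRF (N), EMG (mV)].
raw = [
    [62.0, 810.0, 0.12],
    [55.0, 640.0, 0.30],
    [70.0, 905.0, 0.08],
    [48.0, 720.0, 0.22],
]
scaled = standardise_columns(raw)

# Before scaling, the distance between two participants is dominated by GRF
# simply because newtons produce the largest numbers.
raw_distance = math.dist(raw[0], raw[1])
grf_difference = abs(raw[0][1] - raw[1][1])
scaled_distance = math.dist(scaled[0], scaled[1])

print(f"raw distance: {raw_distance:.1f} (GRF difference alone: {grf_difference:.1f})")
print(f"distance after standardisation: {scaled_distance:.2f}")
```

When such scaling is used inside a classification pipeline, the means and standard deviations would normally be computed from the training data only and then applied to the test data, so that no information from the test set leaks into the model.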

Machine learning algorithms are currently being trialled for a number of applications in gait analysis. Some recent studies investigated the use of machine learning in combination with modern technology to enhance medical practice. Zhan et al. (2018), for example, used machine learning and smartphones to quantify the severity of Parkinson's disease in individuals. Automatic gait recognition tools have proven to be effective in the analysis of pathological gait. However, a drawback of the methods developed thus far is that they do not include patient history (Bonnefoy-Mazure et al., 2013), which needs to be addressed.

2.6.3 Application of Multivariate Statistical Analyses and Machine Learning

In document Assessment and understanding of unilateral trans-tibial amputee gait using principal component analysis and discriminant function analysis (Page 76-79)