2.3 Classification of Activities from Accelerometer Output

2.3.5 Classification Algorithms

The classification algorithm is the engine that processes the training and test data, and produces predictions of which activities are present in the test dataset, as shown in Figure 5.

The classifier must first be trained to recognise activities before the test data can be processed. The training data that is supplied to the classification algorithm is a collection of features extracted from each data window, and labelled according to which activity the window represents. The test data contains accelerometer signals which represent an unknown set of activities for which predictions are to be made. The test data is divided into windows, and features are extracted before being passed to the classifier. The classifier generates an activity type prediction for each window in the test dataset based on the training data it has received.

Figure 5: Supervised machine learning
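The windowing and feature-extraction steps described above can be sketched as follows. This is a minimal illustration, not the method used in the studies cited: the function names, the non-overlapping windows, the choice of variance as the only feature, and the sample values are all assumptions made for the example.

```python
from statistics import pvariance


def split_into_windows(signal, window_size):
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Assumption for this sketch: a trailing partial window is dropped.
    """
    return [signal[i:i + window_size]
            for i in range(0, len(signal) - window_size + 1, window_size)]


def extract_features(vertical, anteroposterior, window_size):
    """Return one (vertical variance, A-P variance) pair per window."""
    v_windows = split_into_windows(vertical, window_size)
    ap_windows = split_into_windows(anteroposterior, window_size)
    return [(pvariance(v), pvariance(ap))
            for v, ap in zip(v_windows, ap_windows)]


# Made-up accelerometer samples, for illustration only.
vertical = [0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0, 5.0]
anteroposterior = [1.0, 1.0, 1.0, 1.0, 0.0, 4.0, 0.0, 4.0, 5.0]

# Two full windows of four samples; the ninth sample is discarded.
features = extract_features(vertical, anteroposterior, window_size=4)
```

Each feature pair would then be passed to the classifier, either with an activity label (training) or without one (prediction).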

There are several machine-learning algorithms that have been used effectively in the field of activity monitoring. A brief description of some relevant algorithms follows. The reader is directed to Preece et al. (159) for a more comprehensive summary.

Decision trees (84, 111) are structures built to model a particular decision-making process. The tree consists of a number of nodes arranged in a parent-child hierarchy. Parent nodes represent decisions, each of which has child nodes that are either further decisions or terminal nodes representing a conclusion to the decision process. The structure is traversed from the top, following the sequence of decisions, until a terminal node is reached. In the context of activity classification, each decision is based on features from a window of monitor output, and the terminal node reached is used as the class prediction. Figure 6 shows a decision tree based on the example accelerometer output in Figure 2. First, the variance of the vertical signal is compared with a threshold established through the analysis of training data. If the threshold is exceeded, the decision tree returns "cross-trainer" as its prediction. Otherwise, the anteroposterior variance is compared with a similarly obtained threshold, and "rowing" or "cycling" is returned accordingly. Decision trees may be defined manually, but there are also algorithms, such as the C4.5 algorithm, that construct decision trees automatically from a set of training data.

Figure 6: A simple decision tree. The circular nodes represent decisions, and the square terminal nodes represent an activity classification which depends on the truth of the criteria in the decision nodes. In this example, activity type is decided first on whether the variance of the vertical accelerometer signal exceeds a predetermined threshold and, if not, on whether the variance of the anteroposterior (A-P) signal exceeds its own threshold.
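The two-level tree in Figure 6 could be written as a pair of nested comparisons, as in the sketch below. The threshold values are hypothetical placeholders, and the source does not state which side of the A-P threshold corresponds to "rowing" and which to "cycling", so the mapping used here is an assumption.

```python
# Hypothetical thresholds; in practice these would be derived from training data.
VERTICAL_VARIANCE_THRESHOLD = 0.5
AP_VARIANCE_THRESHOLD = 0.2


def classify_window(vertical_variance, ap_variance):
    """Traverse the two-level decision tree for one window of features."""
    # Root decision: vertical signal variance.
    if vertical_variance > VERTICAL_VARIANCE_THRESHOLD:
        return "cross-trainer"
    # Second decision: anteroposterior (A-P) signal variance.
    # Assumption: exceeding the threshold means "rowing".
    if ap_variance > AP_VARIANCE_THRESHOLD:
        return "rowing"
    return "cycling"
```

An algorithm such as C4.5 would choose both the features to test and the threshold values from labelled training windows rather than having them fixed by hand.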

Another common machine-learning algorithm is k-Nearest Neighbours (kNN) (135). In this approach, features are first extracted from labelled training data and used to populate an n-dimensional feature space, such as the example in Figure 4. A window from an unknown dataset is mapped to the feature space, and its distance from each labelled point in the training set is determined. A classification is arrived at by taking the k nearest points, or neighbours, in the training set and choosing the most frequently occurring class label among them. The value of k varies: a small value can make the classifier more susceptible to noise in the data, whereas a larger value increases computation time. The kNN algorithm was first applied to activity recognition by Foerster et al. (160), who were able to classify nine activities, and a number of subsequent studies have successfully built on this approach (161-162).
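A minimal sketch of the kNN procedure is given below, assuming Euclidean distance and a simple majority vote; the function name, the two-dimensional feature points, and the labels are invented for illustration and are not taken from the cited studies.

```python
from collections import Counter
from math import dist


def knn_predict(training_points, training_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every labelled training point.
    distances = [(dist(point, query), label)
                 for point, label in zip(training_points, training_labels)]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Choose the most frequently occurring label among the neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up (vertical variance, A-P variance) training features.
training_points = [(1.0, 0.1), (1.2, 0.2), (0.1, 0.9), (0.2, 1.1), (0.1, 0.1)]
training_labels = ["cross-trainer", "cross-trainer", "rowing", "rowing", "cycling"]

prediction = knn_predict(training_points, training_labels, (1.1, 0.15), k=3)
```

Because every prediction computes a distance to every training point, the cost of this brute-force version grows with the size of the training set.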

Quadratic Discriminant Analysis and Linear Discriminant Analysis (QDA and LDA) have been applied to activity classification (140). Probabilities for each class of activity are defined by multivariate Gaussian distributions (163), and a discriminant function (which is a simplification of the distribution), when applied to a window of unknown activity, provides a likelihood value for each possible activity (164). The activity with the maximum likelihood is chosen as the prediction for the window. LDA is a special case of QDA in which the covariance matrices of all activity distributions are assumed to be equal, resulting in linear decision boundaries between activity classes (165).
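For reference, the standard textbook forms of the two discriminant functions are given below. The notation is an assumption introduced here and does not come from the cited studies: x is the feature vector of a window, \mu_k and \Sigma_k are the mean vector and covariance matrix of activity class k, \Sigma is the covariance matrix shared by all classes in LDA, and \pi_k is the prior probability of class k.

```latex
\begin{align*}
\delta_k^{\mathrm{QDA}}(x) &= -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \ln\pi_k \\
\delta_k^{\mathrm{LDA}}(x) &= x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \ln\pi_k
\end{align*}
```

The predicted activity is the class k with the largest \delta_k(x). When all \Sigma_k equal \Sigma, the terms \ln\lvert\Sigma\rvert and x^{\top}\Sigma^{-1}x are identical for every class and drop out of the comparison, which is why the LDA function is linear in x. With equal priors, the \ln\pi_k term also drops out and the rule reduces to choosing the class of maximum likelihood.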

Other common algorithms include the following: threshold-based classification (154, 166), a simple scheme that compares feature values with predefined thresholds to determine which activity is chosen; Naïve Bayes classification and Gaussian Mixture Models (GMM) (80, 111, 167), which are probabilistic schemes based on Bayes' theorem; artificial neural networks (aNN) (84, 117), which return predictions using a mathematical model designed to process information in a similar way to the human brain; Support Vector Machines (SVM) (144, 168), which aim to differentiate between two activities by finding the hyperplane that separates them with the greatest margin; fuzzy logic (169-170), a form of logic that assigns a measure of truth ranging between 0 and 1, allowing input data to have partial membership of fuzzy sets, and returns predictions from rules based on set membership; and Hidden Markov Models (HMM) (140, 146, 171), which return predictions based on the likelihood of transitions between states.
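As one illustration of the probabilistic schemes listed above, the following is a minimal sketch of a Naïve Bayes classifier. It assumes that each feature follows an independent Gaussian distribution within each class, which is a common choice but is not stated in the source; the function names, the variance floor, and the data are hypothetical.

```python
from math import log, pi
from statistics import mean, pvariance


def train_gaussian_nb(features, labels, min_variance=1e-9):
    """Estimate a prior and per-feature (mean, variance) pairs for each class."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        columns = list(zip(*rows))  # one tuple of values per feature
        # Floor the variance so a constant feature cannot cause division by zero.
        stats = [(mean(col), max(pvariance(col), min_variance)) for col in columns]
        model[label] = (len(rows) / len(labels), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior under Bayes' theorem."""
    def log_posterior(label):
        prior, stats = model[label]
        total = log(prior)
        # "Naive" assumption: features are independent, so log-likelihoods add.
        for value, (mu, var) in zip(x, stats):
            total += -0.5 * log(2 * pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Made-up (vertical variance, A-P variance) training features.
train_x = [(0.1, 0.9), (0.2, 1.1), (0.15, 1.0),
           (0.1, 0.1), (0.2, 0.2), (0.15, 0.12)]
train_y = ["rowing", "rowing", "rowing", "cycling", "cycling", "cycling"]
model = train_gaussian_nb(train_x, train_y)
```

Working with log-probabilities rather than raw probabilities avoids numerical underflow when many features are multiplied together.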

In document Measuring physical activity in obese populations using accelerometry (Page 34-38)