3.5 Design and Implementation of W8-Scope
3.5.4 The W8-Scope Classification Pipeline
Based on the insights gathered from the sensor data analysis, we develop the W8-Scope classification pipeline, which leverages specific features extracted from the accelerometer and magnetometer sensor data. Using these sensor-based features, we first identify the amount of weight lifted and then identify the exercise performed. Subsequently, in logically parallel steps, we detect incorrect execution of specific exercises and distinguish between users performing the same exercise.
Initially, we tested the performance of various classifiers (SVM, Decision Trees, Random Forest) in Weka [43] for classifying the different weights, using the window-based features extracted from data collected during the controlled study for the different-weights experiment (explained in Section 3.4.1). We first tuned the parameters of each model on our dataset and selected the parameters that gave the best performance for that model. We then evaluated the parameter-tuned models using 10-fold cross validation and found that the best classification performance was achieved by the Random Forest (RF) classifier (with 60 trees). Hence, we used the RF classifier throughout our multi-stage pipeline; this is consistent with prior works (e.g., [46, 127]) that also found RF classifiers to be more accurate for sensor-based exercise monitoring.
Figure 3.20: Pipeline for classifying the amount of weight lifted, the exercise performed, the user performing the exercise, and incorrect exercise executions
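The evaluation protocol described above (comparing tuned classifiers by 10-fold cross validation) can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: the thesis used Weka's Random Forest with 60 trees, whereas here a simple nearest-centroid classifier stands in so the example runs without external dependencies. All function names (`k_fold_indices`, `cross_val_accuracy`, etc.) are hypothetical.

```python
import random
from collections import defaultdict


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in learner: per-class mean feature vector (NOT a Random Forest)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(model, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((c - v) ** 2 for c, v in zip(model[lab], x)))


def cross_val_accuracy(X, y, k=10, fit=nearest_centroid_fit,
                       predict=nearest_centroid_predict):
    """Pooled accuracy over k folds: each fold is held out exactly once for testing."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Comparing classifiers then amounts to calling `cross_val_accuracy` once per candidate `(fit, predict)` pair and keeping the best one. Weka's own cross-validation additionally stratifies the folds by class, which this sketch omits for brevity.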
The key components of the classification pipeline (see Figure 3.20) are as follows:
• Amount of Weight Lifted Identification – We train a weight classifier using the parameter-tuned random forest classifier. For each instance, the weight classifier outputs the predicted weight together with the distribution of confidence values over the set of weights (i.e., the probability of each weight in [w1, w2, w3, w4, w5, w6]).
• Exercise Identification – For the exercise classifier, we follow a soft-decoding approach: the output of the preceding classifier is appended as new features to the existing feature vector. That is, we use the full probability distribution of the weight classification instead of only the ‘most likely’ weight label. Exercise classification is then performed on this augmented feature set with the parameter-tuned RF classifier.
• Detecting Mistakes in Exercise Execution – After identifying the exercise, we attempt to detect mistakes at a per-repetition level. (This is necessary because users may incorrectly execute only a subset of the repetitions in a set.) We first segment the acceleration and magnetic sensor signals corresponding to the upward and downward motion of the weight stack during a repetition, using the techniques described earlier in Section 3.5.1.1. We also obtain the velocity and displacement corresponding to each transition. Instead of using a fixed window size, we now extract statistical features on frames representing individual transitions for four signals (acceleration, velocity, displacement and magnetic). We also add the output of the exercise classifier as a new feature, taking the majority output label over the set. Note that, as shown in Figure 3.20, this implies that mistake classification is not real-time: it is performed only retrospectively, after the user has completed an entire set (usually lasting 30–40 s). On this new feature set, we again use an RF classifier to classify commonplace mistakes such as “pulling the weight stack too fast”, “releasing fast or slamming down the weight stack”, and “lifting the weights only half-way through”.
• User Identification – This component distinguishes between users performing the same exercise on the cable pulley weight machine. For this purpose, we take the initial set of features used for weight classification, split it into exercise-specific feature files, and build a per-exercise classifier that attempts to predict the exercising user, given an entire exercise set.
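The soft-decoding step used for exercise identification, and the set-level majority label fed to the mistake classifier, can be sketched as below. This is a minimal illustration, not the thesis implementation: the function names, weight labels and feature values are hypothetical, and the hard-decoding variant is included only for contrast. Soft decoding keeps the weight classifier's uncertainty (e.g., a window that is ambiguous between two adjacent weights) visible to the downstream exercise classifier.

```python
from collections import Counter


def soft_decode_features(base_features, class_probs, class_order):
    """Soft decoding: append the upstream classifier's full probability vector."""
    return list(base_features) + [class_probs.get(c, 0.0) for c in class_order]


def hard_decode_features(base_features, class_probs, class_order):
    """Hard decoding (for contrast): append only a one-hot of the most likely class."""
    best = max(class_order, key=lambda c: class_probs.get(c, 0.0))
    return list(base_features) + [1.0 if c == best else 0.0 for c in class_order]


def majority_label(window_labels):
    """Set-level label: the most frequent per-window prediction in the set."""
    return Counter(window_labels).most_common(1)[0][0]


weights = ["3.75kg", "6.25kg", "8.75kg"]              # hypothetical weight classes
probs = {"3.75kg": 0.1, "6.25kg": 0.7, "8.75kg": 0.2}  # hypothetical RF output
window = [0.5, 1.2]                                    # hypothetical window features
print(soft_decode_features(window, probs, weights))    # [0.5, 1.2, 0.1, 0.7, 0.2]
print(hard_decode_features(window, probs, weights))    # [0.5, 1.2, 0.0, 1.0, 0.0]
print(majority_label(["lats", "biceps", "lats"]))      # lats
```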
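The per-transition feature extraction for mistake detection can be sketched as follows, assuming the signals have already been segmented into a single upward or downward transition (the segmentation of Section 3.5.1.1 is not reproduced) and that the acceleration is the vertical component with gravity removed. Velocity and displacement are obtained here by trapezoidal integration starting from rest; this is an assumed technique, and the four statistics per signal are illustrative rather than the thesis's exact feature list. Because double integration accumulates drift, this approach is only reasonable over short frames such as single transitions. All names are hypothetical.

```python
import statistics


def cumulative_trapezoid(samples, dt):
    """Running trapezoidal integral of a uniformly sampled signal, starting at 0."""
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def frame_stats(signal):
    """Illustrative per-frame statistics: mean, population std, min, max."""
    return [statistics.fmean(signal), statistics.pstdev(signal), min(signal), max(signal)]


def transition_features(accel, mag, dt, exercise_label_code):
    """Feature vector for ONE upward or downward transition of the weight stack.

    accel: vertical acceleration with gravity removed (m/s^2), already segmented.
    mag:   magnetometer magnitude over the same frame.
    exercise_label_code: numeric code of the set-level (majority) exercise label.
    """
    velocity = cumulative_trapezoid(accel, dt)         # assumes v = 0 at frame start
    displacement = cumulative_trapezoid(velocity, dt)  # assumes d = 0 at frame start
    feats = []
    for signal in (accel, velocity, displacement, mag):
        feats.extend(frame_stats(signal))
    feats.append(exercise_label_code)
    return feats
```

For a constant 2 m/s² acceleration sampled every 0.1 s for 1 s, the integrated velocity ends at 2 m/s and the displacement at 1 m, matching d = ½at².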
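The per-exercise user identification can be sketched as one classifier per exercise, with the whole set labelled by aggregating window-level predictions. Two assumptions are made here: a 1-nearest-neighbour learner stands in for the tuned RF, and majority voting over windows is used to turn window predictions into a set-level user label (the text above does not specify the aggregation). The class and function names are hypothetical.

```python
from collections import Counter, defaultdict


def one_nn_fit(X, y):
    """Stand-in learner (1-nearest neighbour); the thesis uses a tuned RF instead."""
    def model(x):
        best = min(range(len(X)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], x)))
        return y[best]
    return model


def split_by_exercise(rows):
    """Partition (exercise, features, user) rows into per-exercise datasets."""
    per_exercise = defaultdict(list)
    for exercise, features, user in rows:
        per_exercise[exercise].append((features, user))
    return dict(per_exercise)


class PerExerciseUserIdentifier:
    """One user classifier per exercise; a whole set is labelled by majority vote."""

    def __init__(self, fit=one_nn_fit):
        self.fit = fit
        self.models = {}

    def train(self, rows):
        for exercise, data in split_by_exercise(rows).items():
            X = [features for features, _ in data]
            y = [user for _, user in data]
            self.models[exercise] = self.fit(X, y)

    def predict_set(self, exercise, set_windows):
        """Classify every window of one exercise set and return the majority user."""
        model = self.models[exercise]
        return Counter(model(x) for x in set_windows).most_common(1)[0][0]
```

At inference time, the `exercise` key would come from the exercise classifier's set-level output, so each set is routed to the user model trained for that exercise.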
3.5.4.1 Controlled Study Results
We now present summarized results for the different W8-Scope components, evaluated on controlled studies performed initially with a small set of explicitly instructed users (explained earlier in Section 3.4.1). As these studies do not capture natural gym activities (e.g., the weight variations across exercises, or the sequence and mix of exercises performed), the results here are meant primarily to quantitatively differentiate the capabilities of the magnetic vs. accelerometer sensors, and to establish the accuracy of several of the key W8-Scope features (rather than of the inferred outcomes).
Table 3.8: Average error (in cm) in displacement computation for varying heights to which the weight stack is lifted
Actual height | 6 cm | 12 cm | 18 cm | 24 cm
Average error | ±0.67 cm | ±0.87 cm | ±1.1 cm | ±1.96 cm
Table 3.9: Controlled study: summary of classification accuracy (using 10-fold cross validation) for each classifier, using individual sensors as well as the combination of both sensors
Sensor(s) | Weight | Exercise | Mistakes | User
Accelerometer only | 77.49% | 91.53% | 90.43% | 93.41%
Magnetometer only | 92.96% | 79.37% | 83.85% | 87.65%
Accelerometer and magnetometer | 99.41% | 98.74% | 97.34% | 99.12%
Repetition Counting: Based on the 94 sets (containing 940 repetitions) of data collected from the different {weight, exercise} combinations, we find that the repetition counting mechanism (Section 3.5.1.1) achieves an accuracy of 98% in counting the 10 repetitions in each set.
Weight Stack Displacement: We studied the accuracy of displacement estimation (i.e., how far did the weight stack move during a repetition?) using data collected from controlled lats exercises, in which the participant lifted the weight stack to four different heights (6 cm, 12 cm, 18 cm and 24 cm) for three different weights (3.75 kg, 8.75 kg and 13.75 kg). We observed an average estimation error of ±1.15 cm relative to the ground-truth height. Table 3.8 shows the breakdown of the average displacement error for each height.
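As a quick consistency check, the overall ±1.15 cm figure equals the unweighted mean of the four per-height errors in Table 3.8. This is what one would expect if each height contributed equally many repetitions; that is an assumption, since the per-height trial counts are not stated.

```python
# Per-height average displacement errors from Table 3.8, in cm.
errors_cm = {6: 0.67, 12: 0.87, 18: 1.1, 24: 1.96}

# Unweighted mean over heights (assumes equal trials per height).
overall = sum(errors_cm.values()) / len(errors_cm)
print(f"overall average error: ±{overall:.2f} cm")  # overall average error: ±1.15 cm
```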
Weight Amount: We utilized the data collected from 54 sets (from two subjects) for three exercises (biceps, triceps and lats), with weights varying from 3.75 to 23.75 kg. The RF classifier achieves an accuracy of 99.41% (with an average precision of 0.992 and recall of 0.994) in distinguishing between the 9 weight levels. In contrast, the accuracies of weight classification using only magnetic and only accelerometer features were 92.96% and 77.49% respectively, showing the importance of fusing multiple sensing modalities.
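The averaging scheme behind the reported precision and recall values is not stated here; Weka's summary row ("Weighted Avg.") weights each class by its number of instances. The sketch below, with a hypothetical function name, computes both the class-weighted and the plain (macro) averages from lists of true and predicted labels.

```python
from collections import Counter


def average_precision_recall(y_true, y_pred, weighted=True):
    """Average per-class precision and recall over all labels.

    weighted=True weights each class by its number of true instances;
    weighted=False gives the plain (macro) mean over classes.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred, n_true = Counter(y_pred), Counter(y_true)
    precision = {c: tp[c] / n_pred[c] if n_pred[c] else 0.0 for c in labels}
    recall = {c: tp[c] / n_true[c] if n_true[c] else 0.0 for c in labels}
    if weighted:
        w = {c: n_true[c] / len(y_true) for c in labels}
    else:
        w = {c: 1.0 / len(labels) for c in labels}
    return (sum(w[c] * precision[c] for c in labels),
            sum(w[c] * recall[c] for c in labels))
```

Under class-frequency weighting, the averaged recall reduces algebraically to overall accuracy, because each class's recall (true positives over class size) is multiplied by that class's share of the instances.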
Exercise Detection: Using the data collected for 2 sets each of 10 different exercises, we found that using only accelerometer or only magnetic sensor features results in exercise classification accuracies of 91.53% and 79.37% respectively, whereas the joint use of both feature sets achieves an overall accuracy of 98.74% (with an average precision of 0.988 and recall of 0.987) in distinguishing between exercises.
Identifying Mistakes: We used the W8-Scope pipeline (Section 3.5.4) to perform a multi-class classification {correct, incorrect-pull fast, incorrect-release fast} on data provided by 6 gym staff members, which included deliberate mistakes in exercise execution. The accuracies achieved using only the accelerometer, only the magnetometer, and the combination of both sensors were 90.43%, 83.85% and 97.34% respectively.
Distinguishing Users: Using data collected from 8 subjects (48 exercise sets), we found that W8-Scope can distinguish between the 8 users performing a specific exercise with an accuracy of 99.12% (precision of 0.991 and recall of 0.993) when using the combination of both sensors' features, with accuracy dropping to 93.41% and 87.65% when only accelerometer or only magnetometer features are used, respectively.
Summary: Table 3.9 summarizes the key numerical results. Our controlled studies show that W8-Scope is promising for realizing each of its attributes (accuracy above 97% under 10-fold cross validation), and that combining accelerometer and magnetic sensor based features increases system accuracy.