
Chapter 4. Multimodal Techniques for Frustration Detection

4.3. Modelling

4.4.2. Student-level nested cross validation

The data for 22 students, with a total of 9502 instances, were used for the student-level nested cross validation. In each iteration, we train our models on the data for p = 15 students and test them on the data for the remaining n – p = 7 students, where n is the total number of students and p is the number of students selected for training. Because no student contributes data to both the training and the test set, this technique lets us test whether our models generalize to new test subjects.

In the outer loop of the student-level nested cross validation, the data for a randomly selected 66.7% of the students (15 students) are used as the training set, and the data for the remaining 33.3% (7 students) are used as the test set. In the inner loop, a further 66.7% of the students' data within the outer training set (44.5% of the whole data set) is used for feature selection. The feature selection results are averaged over 10 runs of the inner loop, and the classification results for each of the models are averaged over 30 runs of the outer loop to derive the final classification results.
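The splitting scheme described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `student_level_splits`, the seed, and the choice to draw the inner 66.7% by student (10 of the 15 training students) rather than by instance are all assumptions made for the example.

```python
import random


def student_level_splits(student_ids, n_outer=30, n_inner=10, train_frac=2 / 3, seed=0):
    """Return one dict per outer run with train/test students and inner subsets.

    Hypothetical helper: splits are made on student IDs, so a student's
    instances never appear in both the training and the test set.
    """
    rng = random.Random(seed)
    students = sorted(set(student_ids))
    n_train = round(len(students) * train_frac)  # 22 students -> 15 for training
    splits = []
    for _ in range(n_outer):
        shuffled = rng.sample(students, len(students))
        outer_train, outer_test = shuffled[:n_train], shuffled[n_train:]
        # Assumption: the inner loop also samples whole students (15 -> 10).
        n_sub = round(len(outer_train) * train_frac)
        inner = [rng.sample(outer_train, n_sub) for _ in range(n_inner)]
        splits.append({"train": outer_train, "test": outer_test, "inner": inner})
    return splits


splits = student_level_splits(range(1, 23))  # 22 students with IDs 1..22
first = splits[0]
print(len(splits), len(first["train"]), len(first["test"]), len(first["inner"][0]))
```

Feature selection would then be run on each inner subset, and each classifier trained on the outer training students and scored on the held-out students.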

For feature selection, the RELIEF-F technique (Kononenko, 1994) is used to select the top 30, 40, 50 and 60 features. RELIEF-F is used here as it can deal with incomplete and noisy data sets. Each set of selected features is then fed into the Random Forest, Logistic Regression and Naïve Bayes classifiers, so that one model per classifier is trained for each of the top 30, 40, 50 and 60 features, as shown in Figure 4-4.

[Figure 4-4, flow diagram. Blocks: Training data; Feature Selection (Relief-F); Top n features, n = 30, 40, 50 and 60; Training of feature fusion models for Logistic Regression (LR), Naïve Bayes (NB) and Random Forests (RF) under student-level nested cross validation; Trained Models LR_30, LR_40, LR_50, LR_60, NB_30, NB_40, NB_50, NB_60, RF_30, RF_40, RF_50, RF_60; Test Data; AUCs]

Figure 4-4: Data pre-processing, feature selection and training of feature fusion classification models
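The step from inner-loop feature selection to the top n feature sets can be illustrated with a short sketch. It assumes that "averaging the feature selection results" means averaging per-feature weights across runs, which is one plausible reading rather than a documented detail. The weights below are made-up toy values, not RELIEF-F output, and the helper names are hypothetical.

```python
def average_weights(weight_runs):
    """Average per-feature weights over several inner-loop runs."""
    n_runs = len(weight_runs)
    names = weight_runs[0].keys()
    return {name: sum(run[name] for run in weight_runs) / n_runs for name in names}


def top_n_features(mean_weights, n):
    """Return the n feature names with the highest mean weight."""
    ranked = sorted(mean_weights, key=mean_weights.get, reverse=True)
    return ranked[:n]


# Toy weights for three features over two runs (illustrative values only).
runs = [
    {"f1": 0.30, "f2": 0.10, "f3": 0.20},
    {"f1": 0.20, "f2": 0.10, "f3": 0.40},
]
mean_w = average_weights(runs)
feature_sets = {n: top_n_features(mean_w, n) for n in (1, 2)}
print(feature_sets)
```

In the setting of this section, `n` would take the values 30, 40, 50 and 60, and each resulting feature set would be used to train one LR, one NB and one RF model.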

The AUCs for the unimodal models, namely the facial channel (FC), the head pose channel (HPC) and the keystrokes, mouse clicks and contextual channel (KMC), using the Random Forest, Logistic Regression and Naive Bayes classifiers are shown in Table 4-5.

Channel                                          Random Forest   Logistic Regression   Naive Bayes   No. of features
Facial (FC)                                      0.552           0.58                  0.555         25
Head Pose (HPC)                                  0.51            0.542                 0.553         5
Keystrokes, Mouse clicks and Contextual (KMC)    0.539           0.575                 0.5           20

Table 4-5: AUCs of unimodal models (Random Forests, Logistic Regression and Naive Bayes)

From the results, the facial channel offers the best performance (AUC=0.58), followed by the keystrokes, mouse clicks and contextual logs channel (AUC=0.575). Using the RELIEF-F feature selection algorithm, 25 facial features, 5 head pose features and 20 combined keystrokes, mouse clicks and contextual features are extracted for FC, HPC and KMC respectively. For each of the 3 channels, the best classifier scores above the random model (AUC=0.5), providing evidence that each channel can discriminate instances of frustration from non-frustration better than chance; the only model that does not exceed chance is Naive Bayes on KMC (AUC=0.5). It can also be seen from Table 4-5 that the logistic regression classifier offers the best performance among the 3 classifiers for FC and KMC.
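To make the comparison against the random model concrete, the AUC can be computed as the probability that a randomly chosen frustration instance receives a higher score than a randomly chosen non-frustration instance, with ties counted as one half. The sketch below is a plain pairwise implementation of that definition and is not the evaluation code used in this work; the labels and scores are invented for illustration.

```python
def auc(labels, scores):
    """Pairwise (rank-based) AUC for binary labels, where 1 is the positive class."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# A model that gives every instance the same score cannot rank them: AUC = 0.5.
constant_auc = auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
# A model that ranks three of the four positive/negative pairs correctly.
partial_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(constant_auc, partial_auc)
```

The pairwise loop is quadratic in the number of instances, which is acceptable for illustration; a rank-based formulation would be preferable for a data set of several thousand instances.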

The top 30, 40, 50 and 60 features are extracted using the RELIEF-F algorithm for the feature fusion models. The classification results for the various feature fusion models using the base classifiers of Logistic Regression, Random Forests and Naive Bayes are shown in Table 4-6. The best performing model is Logistic Regression with the top 30 selected features (AUC=0.636).

Random Forest is an ensemble classification technique and would normally be expected to give better performance. However, our dataset is imbalanced, with frustration making up only 8.34% of the entire dataset. This affects the performance of Random Forest, as it is constructed to minimize the overall error rate. It therefore tends to maximize accuracy on the majority class, resulting in poor accuracy for the minority class (Chen, Liaw, & Breiman, 2004).
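The effect of this imbalance on an error-rate objective can be seen with a small calculation. The sketch below takes the 9502 instances and the 8.34% frustration rate reported in this section and evaluates a trivial baseline that always predicts "not frustrated". The instance count for the frustration class is approximated from the percentage, and the function name is hypothetical.

```python
def majority_baseline(n_total, minority_rate):
    """Accuracy and minority-class recall when always predicting the majority class."""
    n_pos = round(n_total * minority_rate)  # approximate count of frustration instances
    n_neg = n_total - n_pos
    accuracy = n_neg / n_total              # every majority instance is correct
    recall = 0 / n_pos                      # no frustration instance is ever detected
    return {"n_pos": n_pos, "accuracy": accuracy, "recall": recall}


baseline = majority_baseline(9502, 0.0834)
print(baseline)
```

A classifier that optimizes overall accuracy has little incentive to move away from this baseline, which is consistent with the explanation given by Chen, Liaw, & Breiman (2004) and is one reason AUC, rather than accuracy, is reported here.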

Top selected features   AUC (Logistic Regression)   AUC (Random Forests)   AUC (Naive Bayes)
30                      0.636                       0.601                  0.6
40                      0.621                       0.608                  0.597
60                      0.568                       0.606                  0.6

Table 4-6: AUCs for feature fusion models across the classifiers

The 3-channel feature fusion model combines the features for the FC, HPC and KMC into one large feature vector for classification, while the 2-channel feature fusion model combines only the features for the FC and HPC. The AUCs for the 2-channel and 3-channel feature fusion models using Logistic Regression as the base classifier are shown in Table 4-7.

Fusion model                 No. of features   AUC (Logistic Regression)
3 channels feature fusion    30                0.636
2 channels feature fusion    30                0.631
3 channels feature fusion    40                0.621
2 channels feature fusion    40                0.607
3 channels feature fusion    50                0.582
2 channels feature fusion    50                0.58
3 channels feature fusion    60                0.568
2 channels feature fusion    60                0.568

Table 4-7: AUCs for 2-channels and 3-channels feature fusion models
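Feature-level fusion, as compared in Table 4-7, amounts to concatenating the per-channel feature vectors of an instance before classification. The following sketch shows the idea for the 2-channel and 3-channel cases; the function name, the dictionary layout and the feature values are illustrative assumptions, and real vectors would hold the features of each channel rather than the toy values shown.

```python
def fuse(channel_features, channels):
    """Concatenate the feature vectors of the named channels, in the given order."""
    fused = []
    for name in channels:
        fused.extend(channel_features[name])
    return fused


# One instance with toy per-channel features (values are illustrative only).
instance = {"FC": [0.2, 0.7], "HPC": [0.1], "KMC": [3.0, 1.5]}

three_channel = fuse(instance, ["FC", "HPC", "KMC"])
two_channel = fuse(instance, ["FC", "HPC"])
print(three_channel, two_channel)
```

Keeping the channel order fixed across instances matters, since the classifier treats each position in the fused vector as one feature.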

The results show that the 3-channel feature fusion model using the top 30 selected features has the best performance (AUC=0.636) among the fusion models. This AUC is 9.7% higher in relative terms, (0.636 – 0.58) / 0.58 ≈ 0.097, than that of the best unimodal channel (the facial channel, AUC=0.58), indicating that multimodal fusion leads to higher detection accuracy than a unimodal model. In general, the feature fusion models perform better than the decision fusion models (for those feature fusion models with 50 or fewer features).

Although the AUC of the 2-channel feature fusion model, which combines the facial and head pose channels, is only marginally lower than that of the 3-channel feature fusion model, it is still relevant to include keystrokes, mouse clicks and contextual features in the fusion model, as the facial and head pose features are unavailable for 17% of the total sessions across all students. In addition, the keystroke features of backspace latency and keystrokes wait interval are among the top 10 of the 30 features selected. Thus, the addition of keystrokes, mouse clicks and contextual channel features does complement the detection using facial and head pose channel features.

In document A conceptual framework for an affective tutoring system using unobtrusive affect sensing for enhanced tutoring outcomes (Page 105-109)