Toward Culturally Relevant Emotion Detection Using Physiological Features

(1)


Khadija Zanna

Department of Computer Science and Engineering

University of South Florida

March 13th, 2020

(2)

Problem Statement

● “Emotional distress is shown to have a statistically significant effect on grade point averages and the intent to drop out.” (Mary E. Pritchard, 2003)

● Large disparities in graduation rates between students of different cultural backgrounds (as high as 25%) (Dina C. Maramba, 2012 and Doug Shapiro, 2017).

● Some studies suggest similarities in emotional intelligence, stability, and motivation amongst demographically similar groups.

(3)

1. Introduction and Research Aims

2. Database Description

3. Feature Extraction

(4)

Introduction

● Two intersecting areas of interest

○ Culturally-relevant emotion recognition

○ Physiological measurements of emotion

● The use of physiological data for emotion detection has made significant gains in recent years (A. Sano 2015, 2016, Rui Wang 2018).

● Extensive exposure and interaction between cultural groups might distort findings, particularly those on in-group advantage. Such exposure is especially likely among students because college campuses are culturally diverse.

(5)

Research Aims

● We explore the use of physiological signals for culturally-relevant emotion recognition within the college student population.

● We explore several types of physiological signals captured while

(6)

Database

● Several measurements of physiological responses collected by Zhang et al. at Binghamton University.

(7)

Database

Each subject experienced 10 emotion-inducing tasks:

● Happiness (amusement)

● Surprise

● Sadness

● Startle (surprise)

● Skepticism

● Embarrassment

● Fear (nervousness)

● Physical pain

● Anger

● Disgust

During each task, physiological data was collected using a system that captures vital sign signals (blood pressure, respiration rate, heart rate, and electrodermal activity).

● Sampling rate of 1000 Hz.

(8)

Feature Extraction

● Each of the 8 signals was stored in its own text file for each emotional stimulus (8 files for each of the 10 stimuli).

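● Standardized all physiological data by removing the mean and dividing by the standard deviation (scaling to unit variance): $z = (x - \mu) / \sigma$, where $\mu$ is the signal mean and $\sigma$ its standard deviation.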

● Extracted 13 statistical features per text file to create a single sample: minimum (min), maximum (max), mean, variance, skewness, and kurtosis of the signal; min, max, mean, variance, skewness, and kurtosis of its derivative; and the root mean square (rms) of the derivative.
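The feature extraction step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that standardization means z-scoring, that the derivative is approximated by a first difference, and that skewness and kurtosis use biased moment estimators with excess kurtosis. The function names are hypothetical, and the input is a synthetic stand-in for one recording.

```python
import numpy as np

def standardize(signal):
    """Z-score a 1-D signal: subtract the mean, divide by the standard deviation."""
    signal = np.asarray(signal, dtype=float)
    std = signal.std()
    # Guard against constant signals, whose standard deviation is zero.
    return (signal - signal.mean()) / std if std > 0 else signal - signal.mean()

def _moments(x):
    """Return [min, max, mean, variance, skewness, excess kurtosis] of x."""
    mean, var = x.mean(), x.var()
    centered = x - mean
    # Biased (population) estimators; defined as 0 when the variance is 0.
    skew = (centered ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    kurt = (centered ** 4).mean() / var ** 2 - 3.0 if var > 0 else 0.0
    return [x.min(), x.max(), mean, var, skew, kurt]

def extract_features(signal):
    """Map one recording (one text file) to a 13-element feature vector."""
    z = standardize(signal)
    dz = np.diff(z)  # first difference as a discrete derivative (assumption)
    rms = np.sqrt(np.mean(dz ** 2))
    # 6 features of the signal + 6 of its derivative + rms of the derivative
    return np.array(_moments(z) + _moments(dz) + [rms])

rng = np.random.default_rng(0)
features = extract_features(rng.normal(size=5000))  # synthetic stand-in signal
print(features.shape)
```

In practice each of the text files would be loaded (for example with `np.loadtxt`) and passed through `extract_features` to yield one sample per signal and stimulus.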

(9)

Feature Extraction

● We annotated each sample with the respective physiological signal, race, emotion, gender, and user_id.

● In addition, we used class combinations: race-emotion, race-gender, and race-gender-emotion.
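One way to picture the annotation scheme is a small record type plus a helper that derives the combined labels. The sketch below is illustrative only: the `Sample` type, the `class_labels` helper, and the annotation values are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    signal: str   # which physiological signal the features came from
    race: str
    emotion: str
    gender: str
    user_id: str

def class_labels(s):
    """Return the label of sample s under each of the seven classes."""
    return {
        "race": s.race,
        "emotion": s.emotion,
        "gender": s.gender,
        "user_id": s.user_id,
        # Combined classes are formed by joining the individual labels.
        "race-emotion": f"{s.race}-{s.emotion}",
        "race-gender": f"{s.race}-{s.gender}",
        "race-gender-emotion": f"{s.race}-{s.gender}-{s.emotion}",
    }

# Hypothetical annotation values, for illustration only.
sample = Sample("BP_mmHg", "race_1", "happiness", "female", "user_001")
labels = class_labels(sample)
print(labels["race-gender-emotion"])
```

The four single attributes and the three combinations together give the seven classes evaluated later.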

(10)

(11)

Unsupervised Learning: Clustering

● DBSCAN - Samples near each other are clustered according to a distance metric and a minimum number of surrounding samples.

● Robust to outliers (noise points); supports non-globular cluster shapes, unlike partitioning methods and hierarchical clustering algorithms; separates high-density clusters from low-density ones; does not require the number of clusters to be specified in advance.

● EPS values - 0.25, 0.50, 0.75, and 1.00
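A minimal sketch of a DBSCAN sweep over these eps values, using scikit-learn's `DBSCAN`. The data are synthetic, and `min_samples=5` (the scikit-learn default) and the Euclidean metric are assumptions, since the slides do not state the values used. The `run_dbscan` helper is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def run_dbscan(X, eps, metric="euclidean", min_samples=5):
    """Cluster X and report the labels, cluster count, and noise count."""
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
    # DBSCAN marks noise points with the label -1; exclude it from the count.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise

# Synthetic stand-in: two tight blobs plus one isolated point.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 2)),
    rng.normal(3.0, 0.05, size=(50, 2)),
    [[10.0, 10.0]],
])

for eps in (0.25, 0.50, 0.75, 1.00):
    _, n_clusters, n_noise = run_dbscan(X, eps)
    print(f"eps={eps:.2f}: {n_clusters} clusters, {n_noise} noise points")
```

Because the number of clusters is an output rather than an input, it can be compared directly against the number of labels in each class.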

(12)

Unsupervised Learning: Clustering

● 96 runs per class (8 physiological signals * 4 eps values * 3 distance metrics), for 672 experiments in total across the 7 classes.

● Measured the performance of these experiments according to each class (e.g., whether a single partition consisted of data from one class).

● We evaluated each physiological signal separately during these experiments (e.g.,
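The experiment count can be checked by enumerating the grid. In this sketch the signal names are placeholders and the three distance metrics are assumed, since the slides do not name them.

```python
from itertools import product

classes = ["race", "emotion", "gender", "user_id",
           "race-emotion", "race-gender", "race-gender-emotion"]
signals = [f"signal_{i}" for i in range(1, 9)]   # placeholders for the 8 signals
eps_values = [0.25, 0.50, 0.75, 1.00]
metrics = ["euclidean", "manhattan", "cosine"]   # assumed; not named in the slides

# Every (signal, eps, metric) combination is one run for a given class.
runs_per_class = list(product(signals, eps_values, metrics))
experiments = list(product(classes, signals, eps_values, metrics))

print(len(runs_per_class), len(experiments))
```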

(13)

Clustering Results

Evaluated the resulting partitions by the number of clusters, the number of noise points, and several clustering metric scores:

● Homogeneity (HOM) - Quantifies the extent to which each cluster contains only members of a single class (1 - all clusters contain only members of a single class, 0 - cluster membership carries no information about class).

● Completeness (COM) - Quantifies the extent to which the data points belonging to a given class are elements of the same cluster.

● V-measure (VM) - Measures how successfully the criteria of homogeneity and completeness have been satisfied.

● Adjusted Rand Index (ARI) - Similarity measure between two clusterings that considers all pairs of samples and counts the pairs assigned to the same or different clusters in the predicted and true clusterings.

● Adjusted Mutual Information (AMI) - Measure of the similarity between two labelings of the same data, adjusted for chance.

● Silhouette Coefficient (SC) - Measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation).
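All six scores are available in `sklearn.metrics`. The sketch below computes them for a toy partition; the `score_clustering` helper and the data are hypothetical, and the slides do not say which implementation was used.

```python
import numpy as np
from sklearn import metrics

def score_clustering(X, true_labels, cluster_labels):
    """Compute the six clustering scores for one partition."""
    scores = {
        "HOM": metrics.homogeneity_score(true_labels, cluster_labels),
        "COM": metrics.completeness_score(true_labels, cluster_labels),
        "VM": metrics.v_measure_score(true_labels, cluster_labels),
        "ARI": metrics.adjusted_rand_score(true_labels, cluster_labels),
        "AMI": metrics.adjusted_mutual_info_score(true_labels, cluster_labels),
    }
    # The silhouette score is only defined for 2 <= n_clusters <= n_samples - 1.
    n_labels = len(set(cluster_labels))
    if 1 < n_labels < len(X):
        scores["SC"] = metrics.silhouette_score(X, cluster_labels)
    else:
        scores["SC"] = float("nan")
    return scores

# Toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
true_labels = [0, 0, 1, 1]
cluster_labels = [1, 1, 0, 0]  # same partition, different cluster ids

scores = score_clustering(X, true_labels, cluster_labels)
for name, value in scores.items():
    print(name, round(value, 3))
```

The label-based scores depend only on the partition, not on the cluster ids, so a relabeled but identical partition still scores as a perfect match. Note that with DBSCAN output, the noise label -1 would be treated by these functions as one more cluster unless noise points are filtered out first.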

(14)

Clustering Results

● Used multidimensional scaling (MDS), a distance-preserving dimensionality reduction technique, to reduce the data to three dimensions for visualizing clusters.

● Extracted the 90th percentile (top 10%) of every clustering metric score for each class evaluation.
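Both steps can be sketched with scikit-learn and NumPy. The data and scores below are random stand-ins, and the MDS settings other than `n_components=3` are assumptions rather than the authors' configuration.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

# Stand-in for 60 samples with 13 features each.
X = rng.normal(size=(60, 13))

# Reduce to three dimensions for plotting; n_init and random_state are assumed.
embedding = MDS(n_components=3, n_init=1, random_state=0).fit_transform(X)
print(embedding.shape)

# Stand-in for one metric's scores over the 96 runs of a single class.
scores = rng.uniform(size=96)

# Keep the runs at or above the 90th percentile, i.e. the top 10%.
threshold = np.percentile(scores, 90)
top_runs = np.flatnonzero(scores >= threshold)
print(len(top_runs))
```

The three columns of `embedding` can then be passed to a 3-D scatter plot, colored by cluster label.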

(15)

(16)

(17)

(18)

(19)

Race+Gender+Emotion and User ID

(20)

Clustering Results

● BP_mmHg and EDA_microseimens generally clustered race better than the other signals, producing the correct number of clusters and generally higher clustering metric scores.

● Respiration Rate_BPM clusters race-emotion better than the other signals and appears among the results for most of the classes, with a high completeness score.

● BP_mmHg does better than the other signals for gender according to clustering metric scores, but BP Dia_mmHg produced more visually distinct clusters.

(21)

Supervised Learning: Classification

● Random Forest Classifier and Support Vector Machine (SVM)

● Random Forest (n = 100) - High accuracy, handles large, high-dimensional data sets well, and has an effective method for estimating missing data that maintains accuracy.

● SVM - High performance with little tuning needed, memory efficient.

● Split data into training and testing data with test sizes of 25%, 33%, and 40%.

● Trained each classifier using all 13 features to classify each of the 7 classes.
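The classification setup can be sketched with scikit-learn as follows. The data are synthetic, "n = 100" is read here as 100 trees (`n_estimators=100`), and the stratified split, the fixed seeds, and the default RBF kernel for the SVM are assumptions not stated in the slides.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 200 samples x 13 features, two well-separated classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 13)) + 3.0 * y[:, None]

results = {}
for test_size in (0.25, 0.33, 0.40):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=0
    )
    classifiers = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "svm": SVC(),  # default RBF kernel (assumption)
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        # score() returns the mean accuracy on the held-out test split.
        results[(name, test_size)] = clf.score(X_test, y_test)

for key, accuracy in sorted(results.items()):
    print(key, round(accuracy, 3))
```

Repeating this loop once per class label (race, emotion, gender, and so on) would give one accuracy per classifier, test size, and class.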

(22)

(23)

Confusion Matrix for Gender

● 82 Females, 58 males

● 0 = Female

(24)

(25)

(26)

(27)

(28)

Conclusions

● Blood pressure and electrodermal activity generally yielded better clusters regarding race.

● Respiration rate and blood pressure provided better clusters for emotion and a combination of race and emotion.

● The results show that emotion and gender were classified better than the other classes.

● The confusion matrix for emotion gives us more insight into which emotions are better recognized using physiological data (sadness, startle), and which emotions are most often mistaken for each other (surprise and skepticism).

● Random forest is a better classifier than SVM for our set of physiological features.

(29)

Conclusions

● In some cases, the number of clusters exceeded the number of classes, indicating that the classes could be too generalized.

● We conclude that this occurred because the race groups were too broadly generalized.

(30)

Limitations and Future Work

● Unbalanced nature of most datasets in terms of race and/or gender diversity.

● Generalizations made when grouping subjects by race.

● A more balanced dataset, labelled with the races within larger regions taken into account, could yield much better results.

(31)
