II. Literature Review
2.6 Machine Learning
2.6.3 Artificial Neural Networks
An Artificial Neural Network (ANN) is another approach to machine learning tasks. These models were inspired by the biological structure of the brain, in which networks of neurons propagate signals and information [84]. ANNs are proficient at learning complex relationships in data and are capable of expressing these complex relationships in terms of simpler relationships.
2.6.3.1 Fully Connected Neural Networks
The first ANN that was developed is the fully connected neural network. In these networks, each neuron, or unit, is connected to every neuron in the subsequent layer. Each layer in an ANN receives inputs, multiplies these inputs by a set of weights, adds a bias, and then passes the weighted sum through an activation function. The output of the nth layer is:
$$x_n = f\left(W_n^{T} x_{n-1} + b_n\right)$$
$f$: non-linear activation function
$x_{n-1}$: the input to the nth layer
$W_n$: the matrix of weights that describes a mapping from $x_{n-1}$ to $x_n$
$b_n$: vector of biases.
An ANN learns by modifying the parameters of a model until the network can correctly map an input to the desired output. An example of an ANN can be seen in Figure 2.
Figure 2. A simple example of an ANN. This fully connected ANN has two inputs, two hidden layers, and a single output.
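The layer equation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any cited author's implementation: the hidden-layer widths, the tanh activation, the random weights, and the helper names `dense_layer` and `forward` are all assumptions chosen to mirror the two-input, two-hidden-layer, single-output network described in Figure 2.

```python
import numpy as np

def dense_layer(x_prev, W, b, f=np.tanh):
    """Compute x_n = f(W_n^T x_{n-1} + b_n) for one fully connected layer."""
    return f(W.T @ x_prev + b)

def forward(x, params):
    """Pass x through each (W, b) pair in order."""
    for W, b in params:
        x = dense_layer(x, W, b)
    return x

rng = np.random.default_rng(0)

# Layer widths: 2 inputs -> 3 -> 3 -> 1 output.
# The hidden widths (3 and 3) are assumed for illustration only.
sizes = [2, 3, 3, 1]

# Each W has shape (inputs, outputs) so that W.T maps x_{n-1} to x_n.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

y = forward(np.array([0.5, -1.0]), params)
print(y.shape)  # (1,)
```

Training would then adjust every `W` and `b` in `params` until `forward` maps inputs to the desired outputs; that step is omitted here.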
ANNs generally perform well when classifying EEG data. In an attempt to classify operator workload via EEG data, Wilson et al. used a single, 43-node, fully connected ANN [85]. In this task, eight participants performed NASA’s Multi-Attribute Task Battery (MATB) at one of three levels of workload: baseline, low, or high. EEG data was collected over three five-minute sessions during the course of a single day. Each session corresponded to one of the three levels of workload. Once collected, the EEG data was processed using a fast Fourier transform (FFT) to transform it into the frequency domain so that the average power could be computed. The EEG bands used included delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-42 Hz). The network achieved a mean classification accuracy of 85.0% on the baseline level, 82.0% on the low workload level, and 86.0% on the high workload level. Further work in the same area by Christensen et al. showed that ANNs outperformed both LDAs and Support Vector Machines when classifying workload level with EEG [86].
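The band-power features described above can be illustrated with a short NumPy sketch. It is not the preprocessing pipeline used in [85]: the sampling rate, the synthetic test signal, the normalization of the power spectrum, and the function name `band_powers` are assumptions. Only the band edges come from the text.

```python
import numpy as np

# Band edges in Hz, as listed in the text.
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 42)}

def band_powers(signal, fs):
    """Return the mean spectral power of `signal` in each EEG band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Squared FFT magnitude, scaled by the number of samples (one common choice).
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return {name: power[(freqs >= lo) & (freqs <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}

fs = 128                              # assumed sampling rate in Hz
t = np.arange(4 * fs) / fs            # four seconds of synthetic data
signal = np.sin(2 * np.pi * 10 * t)   # a pure 10 Hz tone, inside the alpha band

powers = band_powers(signal, fs)
print(max(powers, key=powers.get))  # alpha
```

In a workload classifier, one such dictionary per channel and time window would typically be flattened into the feature vector fed to the network.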
2.6.3.2 Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a neural network that is able to learn spatial patterns in the input data [84]. There are three main layer types used when building a CNN: the convolutional layer, the pooling layer, and the fully connected layer. In the first layer, the convolutional layer, the input is convolved with a set of kernels. An activation map is then produced by applying an element-wise activation function. The next layer, the pooling layer, downsamples the spatial dimensions of the activation map. The last layer, the fully connected layer, calculates the score of each class and classifies the input. An example of a CNN can be seen in Figure 3.
Figure 3. In this CNN, an image of an animal is taken as an input and the output is the type of animal [87].
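The three layer types can be traced on a toy input with the following NumPy sketch. The image size, kernel count, ReLU activation, 2x2 max pooling, random weights, and all function names are assumptions made for illustration, and the loop-based convolution is written for readability rather than speed.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in most CNNs)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise activation function."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))        # toy single-channel input
kernels = rng.normal(size=(4, 3, 3))   # four 3x3 kernels (assumed)

# Convolution: 8x8 -> 6x6; pooling: 6x6 -> 3x3; one map per kernel.
maps = np.stack([max_pool(relu(conv2d(image, k))) for k in kernels])

# Fully connected layer: flatten the maps and compute one score per class.
W = rng.normal(size=(maps.size, 3))    # three classes (assumed)
scores = maps.reshape(-1) @ W
print(maps.shape, scores.shape)  # (4, 3, 3) (3,)
```

The predicted class would be the index of the largest entry in `scores`.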
CNNs have been shown to outperform other machine learning methods when classifying emotions from EEG data. Tripathi et al. used a CNN to classify human emotion using EEG data from the DEAP dataset [88]. The DEAP dataset consists of 40-channel EEG data recorded from 32 participants as they watched 40 one-minute clips of music videos. The participants completed a self-assessment and scored themselves on arousal, valence, and dominance for each music video. The CNN model achieved accuracies of 81.4% and 73.4% for the binary classification of valence and arousal levels, respectively, and accuracies of 66.8% and 57.6% for the three-class classification of valence and arousal levels.
2.6.3.3 Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a form of neural network that is able to learn sequences and the dependencies within them by maintaining a state, or memory. This memory is maintained by a recurrent connection from the network to itself, which allows the model to process both the current input and previously seen inputs [84]. An example of a simple RNN can be seen in Figure 4. The major problem with simple RNNs is that they suffer from the vanishing gradient problem, in which the gradients computed towards the end of the model become extremely small as they are back-propagated to the beginning of the model. The effect is that the model is unable to retain information about inputs seen “a long time ago” and is therefore unable to learn long-term dependencies. The Long Short-Term Memory (LSTM) model was created in order to solve this problem. LSTMs contain a separate channel in which important information is stored so that it can be used in learning long-term dependencies [84]. An example of a simple LSTM can be seen in Figure 5.
Figure 5. This is an example of a single LSTM [89].
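The recurrent state update and the vanishing gradient effect described above can be demonstrated numerically. The sketch below uses a plain tanh RNN. The hidden size, sequence length, random inputs, and the scaling of the recurrent weights to a spectral norm of 0.9 are assumptions chosen so the effect is easy to see, and the function names are hypothetical.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence; return all states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.array(states)

def grad_norm_to_first_step(states, W_h):
    """Spectral norm of d h_T / d h_1, the product of the per-step Jacobians."""
    J = np.eye(W_h.shape[0])
    for h in states[1:]:
        # d h_t / d h_{t-1} = diag(1 - h_t^2) W_h for a tanh unit
        J = (np.diag(1.0 - h ** 2) @ W_h) @ J
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(0)
hidden, n_in, steps = 4, 3, 50         # assumed sizes
W_x = rng.normal(size=(hidden, n_in))
W_h = rng.normal(size=(hidden, hidden))
W_h *= 0.9 / np.linalg.norm(W_h, 2)    # assumed: spectral norm set to 0.9
b = np.zeros(hidden)

states = rnn_forward(rng.normal(size=(steps, n_in)), W_x, W_h, b)
short_norm = grad_norm_to_first_step(states[:5], W_h)   # gradient across 5 steps
long_norm = grad_norm_to_first_step(states, W_h)        # gradient across 50 steps
print(short_norm, long_norm)
```

Because each per-step Jacobian here has norm at most 0.9, the gradient reaching the first step shrinks at least geometrically with sequence length, which is why early inputs contribute almost nothing to learning in this setting.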
RNNs are well-suited for machine learning problems in which time-ordered information is important or can lend clues as to the current state. Because EEG data is temporally organized, RNNs have been shown to outperform other machine learning models in classifying EEG signals. In the classification of six different hand motions from a grasp-and-lift experiment, an RNN obtained an accuracy of 94.8% in classifying which motion was being performed [82]. This was an improvement of 23.5% over other machine learning methods. RNNs have also been shown to obtain the lowest test error in mental load classification; the addition of LSTM layers to a CNN reduced the test error of a four-class mental load classification by 21.5% [82].
Due to their ability to learn long-term dependencies, LSTMs are perhaps more powerful than simple RNNs when it comes to classifying EEG data. Hefron et al. created a multi-path convolutional recurrent neural network (MPCRNN) that consisted of CNNs and LSTMs [90]. This network achieved a cross-participant accuracy of 86.8% in classifying low and high workload from EEG data, exceeding the performance of CNNs and LSTMs by themselves. These results suggest that RNNs may perform well in classifying EEG signals.
2.6.3.4 Temporal Convolutional Networks
A Temporal Convolutional Network (TCN) is a specific type of CNN that has faster training times and longer memory than traditional RNNs when modeling a sequence [91]. The general structure of a TCN can be seen in Figure 6. The first layer of a TCN is the dilated causal convolutional layer, which is a standard convolutional layer with a dilated kernel. Stacking dilated convolutional layers with exponentially increasing dilation factors causes the receptive field of the model to grow exponentially while the number of parameters grows only linearly. Each residual block in a TCN consists of a dilated one-dimensional convolutional layer with ‘causal’ padding, a weight normalization layer, a rectified linear unit (ReLU) activation layer, and a spatial dropout layer; this sequence repeats for as many blocks as are present. The various layers of a residual block contained in a TCN can be seen in Figure 7.
Figure 6. The architecture of a TCN.
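The receptive-field growth described above can be checked with a small NumPy sketch. It assumes one causal convolution per dilation level with dilations 1, 2, 4, and so on, and it omits the weight normalization, ReLU, dropout, and residual connections of a full block; the kernel size, sequence length, and function names are also assumptions.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], with zeros before t = 0."""
    k = len(kernel)
    pad = (k - 1) * dilation
    # 'Causal' padding: zeros are added only on the left, so y[t] never sees x[t+1:].
    x_padded = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[i] * x_padded[t + pad - i * dilation] for i in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, num_layers):
    """Receptive field of stacked layers with dilations 1, 2, 4, ..., 2^(L-1)."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)

# Feed a unit impulse through three layers (kernel size 2, dilations 1, 2, 4).
x = np.zeros(32)
x[0] = 1.0
y = x
for level in range(3):
    y = causal_dilated_conv1d(y, np.ones(2), dilation=2 ** level)

# The impulse spreads over 8 time steps, matching the formula, while the
# stack holds only 3 * 2 = 6 kernel weights.
print(np.count_nonzero(y), receptive_field(2, 3))  # 8 8
```

Adding a fourth layer with dilation 8 would add two more weights but double the span to 16 time steps, which is the linear-versus-exponential trade-off noted in the text.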