2.7 Model Selection and Evaluation

In this section, we present the methods used to select and evaluate the models employed throughout this thesis. Some of these steps may differ between applications.

2.7.1 Training Set vs. Test Set Error

The training-set error is the number (or proportion) of records in the training data for which the predicted value differs from the actual value. To obtain the test-set error, the available data are divided into two parts: training data and test data. The training data are used to train the model and to calculate the training-set error, and the test data are used to evaluate the model as if they were future, unseen data, from which the test-set error rate is calculated.

The aim of training is to reduce the error rate so that the model is less error-prone when applied to future real data. However, driving the training-set error as low as possible risks fitting the noise in the data, i.e. over-fitting. A typical sign of over-fitting is a test-set error rate that is noticeably higher than the training-set error rate.
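As an illustration of the two error rates, the following minimal Python sketch holds out 30% of the records as test data and uses a 1-nearest-neighbour classifier. It assumes NumPy; the synthetic data and the helper names `holdout_split`, `one_nn_predict` and `error_rate` are hypothetical and not part of the thesis. Because a 1-nearest-neighbour model memorises its training data, its training-set error is zero, whereas its test-set error on the noisy labels is not, which is the over-fitting symptom described above.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of records whose predicted label differs from the actual label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Hypothetical helper: randomly divide the data into training and test parts."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

def one_nn_predict(X_train, y_train, X_query):
    """Hypothetical toy model: 1-nearest-neighbour classifier (squared Euclidean distance)."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d, axis=1)]

# Synthetic two-class data with label noise (a stand-in for real features)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.8 * rng.normal(size=200) > 0).astype(int)

X_tr, y_tr, X_te, y_te = holdout_split(X, y)
train_err = error_rate(y_tr, one_nn_predict(X_tr, y_tr, X_tr))  # each point is its own neighbour
test_err = error_rate(y_te, one_nn_predict(X_tr, y_tr, X_te))
print(f"training-set error: {train_err:.2f}, test-set error: {test_err:.2f}")
```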

2.7.2 Leave-one-out cross-validation (LOOCV)

In this method, one record is temporarily removed from the dataset and the model is trained on the remaining data. The error is then computed on the left-out data point. This procedure is repeated for every data point, and at the end of the loop the mean error rate is calculated. LOOCV does not waste data, since every record is used for both training and testing, but it is computationally expensive, because the model has to be trained once for every record.
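A minimal sketch of LOOCV, assuming NumPy and a generic `fit_predict(X_train, y_train, X_test)` callable (a hypothetical interface chosen for illustration). The baseline model, which always predicts the training mean, is used only because its LOOCV error can be checked by hand; note that the loop trains the model n times, once per record.

```python
import numpy as np

def loocv_error(fit_predict, X, y, loss):
    """Leave-one-out CV: train on n-1 records, test on the held-out one, repeat n times."""
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i                  # every record except record i
        pred = fit_predict(X[mask], y[mask], X[i:i + 1])
        errors.append(loss(y[i:i + 1], pred))
    return float(np.mean(errors))                 # mean error over the n held-out points

def mean_predictor(X_train, y_train, X_test):
    """Hypothetical baseline model: predicts the training-set mean for every test point."""
    return np.full(len(X_test), y_train.mean())

def squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

X = np.zeros((4, 1))                              # features are ignored by the baseline
y = np.array([1.0, 2.0, 3.0, 4.0])
print(loocv_error(mean_predictor, X, y, squared_error))  # 20/9, about 2.222
```

For y = (1, 2, 3, 4) the four held-out squared errors are 4, 4/9, 4/9 and 4, whose mean is 20/9.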

2.7.3 K-fold cross-validation

Cross-validation (CV) is an approach used to detect and avoid over-fitting. It works by estimating how accurately a model learnt from some training data will perform on future, unseen data. In k-fold CV, the dataset is randomly partitioned into k subsets (folds). For each fold, the model is trained on the data points that are not in that fold and tested on the points that are in it. This yields k models, and the mean of their test errors is the CV estimate of the error.
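The k-fold procedure can be sketched as follows, assuming NumPy; `k_fold_cv_error`, the nearest-centroid classifier and the synthetic two-cluster data are hypothetical illustrations rather than the models used in this thesis. Each record is tested exactly once, and the returned value is the mean of the k fold errors.

```python
import numpy as np

def k_fold_cv_error(fit_predict, X, y, loss, k=5, seed=0):
    """Randomly split the records into k folds, train k models,
    test each on its held-out fold, and return the mean fold error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # k disjoint index sets (assumes k <= n)
    fold_errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False                     # train on points not in this fold
        pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        fold_errors.append(loss(y[test_idx], pred))
    return float(np.mean(fold_errors))

def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical toy classifier: assign each test point to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def misclassification_rate(y_true, y_pred):
    return float(np.mean(y_true != y_pred))

# Two well-separated synthetic clusters, 30 records per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5, 1, size=(30, 2)), rng.normal(5, 1, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(k_fold_cv_error(nearest_centroid, X, y, misclassification_rate, k=5))  # 0.0 here
```

Setting k equal to the number of records puts a single record in each fold, which reduces k-fold CV to LOOCV.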

CV can be used for model selection: using k-fold CV, the best model among a number of candidates is selected, and that model is then retrained on all of the data. CV can also be used to choose the kernel parameter in kernel regression or locally weighted regression, and the Bayesian prior in Bayesian regression; these involve real-valued parameters. In a classification problem, CV can use the total number of misclassifications on the test folds instead of the sum of squared errors. CV is also used for feature selection, where the features most useful to the learning algorithm are picked using stochastic search (simulated annealing or genetic algorithms), hill climbing, backward elimination or forward selection.
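As a sketch of CV-based model selection, the example below chooses the bandwidth of a kernel regression model by 5-fold CV and then refits with the selected bandwidth on all of the data. It assumes NumPy, a Nadaraya-Watson estimator with a Gaussian kernel, synthetic data and a hypothetical candidate list; none of these choices come from the thesis.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (assumed kernel choice)."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def cv_sse(x, y, bandwidth, k=5, seed=0):
    """Sum of squared test-fold errors over k folds for one candidate bandwidth."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    sse = 0.0
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)  # records not in this fold
        pred = kernel_regression(x[train], y[train], x[test_idx], bandwidth)
        sse += float(np.sum((y[test_idx] - pred) ** 2))
    return sse

# Synthetic regression data
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=100)

candidates = [0.02, 0.05, 0.2, 1.0]                 # hypothetical bandwidth candidates
scores = {h: cv_sse(x, y, h) for h in candidates}
best = min(scores, key=scores.get)                   # candidate with the lowest CV error

# The selected bandwidth is then used with *all* of the data.
y_hat = kernel_regression(x, y, np.array([0.25]), best)
print(best, y_hat)
```

For a classifier, the sum of squared errors in `cv_sse` would be replaced by the number of misclassified test records.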

2.7.4 Evaluation Measures

There are several evaluation measures in machine learning that are used for evaluating both the training outcome and the test outcome. Some of the most notable are mean squared error and Accuracy. The choice of evaluation measure depends mainly on the measures commonly used in the relevant line of research, which eases the comparison of results across studies. The following evaluation criteria will be used throughout this report:

Accuracy

Accuracy is one of the most commonly used evaluation measures in machine learning research. It compares the predicted output ŷ against the target output y and calculates the percentage of output labels that were predicted correctly.

$$\text{Accuracy} = \frac{\text{TruePositives} + \text{TrueNegatives}}{\text{Positives} + \text{Negatives}} \tag{18}$$
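Equation (18) can be computed directly from lists of actual and predicted labels. In this minimal Python sketch the labels are hypothetical, with 1 denoting a seizure and 0 a non-seizure record.

```python
def accuracy(y_true, y_pred, positive=1):
    """Accuracy = (TruePositives + TrueNegatives) / (Positives + Negatives), eq. (18)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true)  # Positives + Negatives = all records

# Hypothetical labels: 1 = seizure, 0 = non-seizure
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(accuracy(y_true, y_pred))  # 0.75 (2 TP + 4 TN out of 8)
```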

Specificity (true negative rate)

In skewed datasets, as well as in particular lines of research such as seizure prediction, Accuracy alone is not representative of the performance of the model. In a life-critical application such as seizure prediction, it is also important to evaluate the true negative rate, also known as Specificity.

$$\text{Specificity} = \frac{\text{TrueNegatives}}{\text{FalsePositives} + \text{TrueNegatives}} \tag{19}$$

Specificity is the ratio of the true negatives (the correctly predicted negative target values) to the sum of all negative target values. In the case of seizure detection, Specificity measures the proportion of the actual non-seizure states that were correctly identified as non-seizure.
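The following sketch (hypothetical labels, 1 = seizure) computes equation (19). Note that the denominator is the set of actual non-seizure records, not the set of records predicted as non-seizure.

```python
def specificity(y_true, y_pred, positive=1):
    """Specificity = TrueNegatives / (FalsePositives + TrueNegatives), eq. (19)."""
    # Predictions for the actual non-seizure records only
    negatives = [p for t, p in zip(y_true, y_pred) if t != positive]
    tn = sum(p != positive for p in negatives)  # ...that were predicted as non-seizure
    return tn / len(negatives)                  # FP + TN = all actual negatives

y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(specificity(y_true, y_pred))  # 0.8
```

Of the five actual non-seizure records, four were predicted as non-seizure, giving a Specificity of 4/5.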

Sensitivity (Recall)

The true positive rate, also known as Sensitivity or recall, is the ratio of the true positives to the sum of the true positives and false negatives, i.e. to all positive target values.

$$\text{Sensitivity} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}} \tag{20}$$
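Equations (20) and (19) can be evaluated together for a classifier whose output score is thresholded to raise an alarm. This is an assumption made for illustration: the scores, labels and the function `sensitivity_specificity` are hypothetical. The sketch shows that moving the threshold trades one measure against the other.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Raise an alarm when score >= threshold; return (Sensitivity, Specificity)."""
    tp = fn = fp = tn = 0
    for t, s in zip(y_true, scores):
        alarm = s >= threshold
        if t == 1:          # actual seizure
            if alarm:
                tp += 1
            else:
                fn += 1
        elif alarm:         # actual non-seizure
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)  # eq. (20) and eq. (19)

# Hypothetical classifier scores (higher = more seizure-like)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
for th in (0.35, 0.65):
    print(th, sensitivity_specificity(y_true, scores, th))
```

With threshold 0.35 all three seizures are detected (Sensitivity 1.0) but two of the five non-seizure records raise false alarms (Specificity 0.6); with threshold 0.65 Sensitivity drops to 1/3 while Specificity rises to 0.8.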

In the context of seizure prediction, this measure gives the percentage of the actual seizure states that were correctly predicted. Whether a higher Sensitivity or a higher Specificity is preferable depends on the domain. Table 2.1 presents the confusion matrix for the seizure prediction scenario. A true positive occurs when an alarm was raised and a seizure followed; a true negative, on the other hand, occurs when no alarm was raised and no seizure occurred.

                        Seizure occurred    Seizure did not occur
Alarm raised            a (TP)              b (FP)
Alarm was not raised    c (FN)              d (TN)

Table 2.1 The confusion matrix for seizure prediction.
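The four cells of Table 2.1 (TP, FP, FN, TN) can be tallied from per-window outcomes. This sketch assumes, hypothetically, that each prediction window is summarised as a pair of booleans (alarm raised, seizure occurred); the function name and example data are illustrative only.

```python
from collections import Counter

def confusion_matrix(alarms, seizures):
    """Return (TP, FP, FN, TN) for paired alarm / seizure outcomes."""
    counts = Counter(zip(alarms, seizures))  # missing keys count as 0
    tp = counts[(True, True)]    # alarm raised, seizure occurred
    fp = counts[(True, False)]   # alarm raised, seizure did not occur
    fn = counts[(False, True)]   # alarm not raised, seizure occurred
    tn = counts[(False, False)]  # alarm not raised, seizure did not occur
    return tp, fp, fn, tn

alarms = [True, True, False, False, True, False]
seizures = [True, False, True, False, True, False]
print(confusion_matrix(alarms, seizures))  # (2, 1, 1, 2)
```

From these counts, Sensitivity is TP / (TP + FN) and Specificity is TN / (FP + TN).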

S1-Score

When Accuracy alone is not sufficient for evaluating the prediction outcome, precision and recall are often used as main or auxiliary measures. The trade-off between Sensitivity and Specificity is mainly domain specific and depends on the application requirements.

Sensitivity and Specificity are also important when dealing with skewed data. In a highly skewed dataset with 99% negative and 1% positive examples, a trivial classifier that always predicts the negative class achieves an Accuracy of 99% while detecting none of the positive examples. In such datasets, Accuracy is not a representative measure of the prediction, and performance should be verified using Sensitivity and Specificity, which are computed separately over the positive and the negative instances. So that a loss function can rely on a single measure instead of both Sensitivity and Specificity, and for ease of evaluation, we use the following measure, which combines Sensitivity and Specificity:
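The 99%/1% example can be reproduced with a short Python sketch; the dataset of 990 negative and 10 positive records is hypothetical.

```python
# Hypothetical skewed dataset: 990 negative (0) and 10 positive (1) records
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # trivial classifier that never predicts a seizure

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / y_true.count(1)   # TP / (TP + FN)
specificity = tn / y_true.count(0)   # TN / (FP + TN)
print(accuracy, sensitivity, specificity)  # 0.99 0.0 1.0
```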

$$S_1 = 2 \times \frac{\text{Sensitivity} \times \text{Specificity}}{\text{Sensitivity} + \text{Specificity}} \tag{21}$$

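The S1-Score is the harmonic mean of Sensitivity and Specificity and weights both measures equally. Because a harmonic mean is dominated by the smaller of its terms, a high S1-Score requires both Sensitivity and Specificity to be high. It is analogous to the F1-score, which is the harmonic mean of precision and recall.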
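Equation (21) as a small Python function. The zero-denominator guard is an assumption added for the case where both measures are zero. The examples show how the harmonic mean penalises an imbalance: a model with perfect Sensitivity and zero Specificity scores 0, whereas the arithmetic mean of the two would be 0.5.

```python
def s1_score(sensitivity, specificity):
    """Harmonic mean of Sensitivity and Specificity, eq. (21)."""
    if sensitivity + specificity == 0:
        return 0.0  # assumed convention when both measures are zero
    return 2 * sensitivity * specificity / (sensitivity + specificity)

print(s1_score(1.0, 0.0))  # 0.0, although the arithmetic mean would be 0.5
print(s1_score(0.5, 0.5))  # 0.5, equal measures give their common value
print(s1_score(0.9, 0.5))  # about 0.643, below the arithmetic mean of 0.7
```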

2.8 Summary

This chapter presented an introduction to some of the machine learning and signal processing concepts and algorithms used throughout this thesis. Some of the topics were presented at a very high level of abstraction, for two reasons: i) each of the presented concepts has a rich body of research attached to it and is a deep research field in its own right, so providing full details of these topics is beyond the scope of this thesis; ii) implementation details of some methods were omitted here and will be presented in later chapters.

In document Exploring machine learning techniques in epileptic seizure detection and prediction (Page 36-40)