TLE patient identification using Support Vector Machines


4.2 Methods and Materials

4.2.5 TLE patient identification using Support Vector Machines

To determine the applicability of the voxel-based asymmetry metric to the detection of TLE, a linear support vector machine (SVM) was employed. A linear SVM is a supervised binary classifier that computes the hyperplane (decision boundary) that maximizes the separation (margin) between two groups [8]. Because of the small sample size, the SVM was trained and tested using leave-one-out cross-validation (LOOCV). The SVM analysis was carried out using the scikit-learn library [9].
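As an illustration of the training and testing scheme, the following minimal sketch runs a linear SVM under LOOCV with scikit-learn. The function name `loocv_predictions`, the regularization value `C=1.0` and the synthetic data are assumptions made for the example; they are not taken from the study.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC


def loocv_predictions(X, y, C=1.0):
    """Return one held-out prediction per subject using a linear SVM.

    C=1.0 is an assumed default; the text does not report the value used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    y_pred = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear", C=C)
        clf.fit(X[train_idx], y[train_idx])          # train on all but one subject
        y_pred[test_idx] = clf.predict(X[test_idx])  # test on the held-out subject
    return y_pred


# Synthetic stand-in data (hypothetical): 10 patients (0) and 10 controls (1).
rng = np.random.default_rng(0)
y_demo = np.array([0] * 10 + [1] * 10)
X_demo = rng.normal(size=(20, 50)) + 3.0 * y_demo[:, None]
y_hat = loocv_predictions(X_demo, y_demo)
```

Each subject is predicted exactly once, by a model that never saw that subject during training.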

One classification experiment was performed for each region of interest (Figure 4.6). In each experiment, feature extraction and selection were restricted to that region. This allowed us to establish and compare the sensitivity and specificity of the proposed method for TLE patient identification using features belonging to the mesial temporal region (MSL), the lateral temporal region (NEO), and the anterior temporal lobe (ATL).

Feature selection

Feature vectors were constructed for each subject by concatenating the asymmetry maps in the temporal lobe regions as follows:

$$V_{s,\phi_R} = \left[\phi_R(A_{s,\mathrm{T1}}),\ \phi_R(A_{s,\mathrm{FA}}),\ \phi_R(A_{s,\mathrm{MD}}),\ \phi_R(A_{s,\mathrm{GM}^*})\right] \tag{4.4}$$

where $\phi_R(x)$ is a function that returns the voxels in region $R$ of a VBA map $x$, with $R \in \{\mathrm{MSL}, \mathrm{NEO}, \mathrm{ATL}\}$. Since all the VBA maps are registered to the symmetric template, the respective regions $R$ have the same number of voxels, and thus the feature vectors have the same length.
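The construction in Equation 4.4 can be sketched with NumPy as follows. This is an assumed illustration only: the dictionary layout of the maps, the boolean region mask and the names `phi` and `feature_vector` are hypothetical and do not describe the original implementation.

```python
import numpy as np

MODALITIES = ("T1", "FA", "MD", "GM*")


def phi(vba_map, region_mask):
    """Return the voxels of a VBA map that lie inside region R, as a 1-D array."""
    return vba_map[region_mask]


def feature_vector(vba_maps, region_mask):
    """Concatenate the region voxels of every modality, in the order of Eq. 4.4."""
    return np.concatenate([phi(vba_maps[m], region_mask) for m in MODALITIES])


# Toy example (hypothetical sizes): a 4x5x6 volume and a 16-voxel region.
rng = np.random.default_rng(0)
shape = (4, 5, 6)
mask = np.zeros(shape, dtype=bool)
mask[1:3, 2:4, 1:5] = True  # 2 * 2 * 4 = 16 voxels
maps = {m: rng.normal(size=shape) for m in MODALITIES}
v = feature_vector(maps, mask)  # length 4 * 16 = 64
```

Because the same mask is applied to every registered map, every subject yields a vector of identical length.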

A binary class label was assigned to each feature vector V (0 = patient, 1 = control). The classifier was then trained and tested using leave-one-out cross-validation (LOOCV), after which performance metrics (accuracy, sensitivity and specificity) were collected.
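The three performance metrics can be computed from the held-out predictions as in the sketch below. It assumes that patients (label 0) are treated as the positive class, which the text implies but does not state; the function name and the toy labels are hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred, patient_label=0):
    """Accuracy, sensitivity and specificity, with patients as the positive class.

    Treating label 0 (patient) as positive is an assumption of this sketch.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_patient = y_true == patient_label
    pred_patient = y_pred == patient_label
    tp = np.sum(is_patient & pred_patient)    # patients correctly identified
    fn = np.sum(is_patient & ~pred_patient)   # patients missed
    tn = np.sum(~is_patient & ~pred_patient)  # controls correctly identified
    fp = np.sum(~is_patient & pred_patient)   # controls flagged as patients
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels: 4 patients followed by 4 controls.
metrics = classification_metrics([0, 0, 0, 0, 1, 1, 1, 1],
                                 [0, 0, 0, 1, 1, 1, 0, 0])
# accuracy = 5/8, sensitivity = 3/4, specificity = 2/4
```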

Each training set generated by LOOCV was randomly sampled 5 times using stratified shuffling and splitting to generate feature selection folds. Each fold retained 90% of the subjects in the training set. In addition, the stratified shuffling and splitting keeps the ratio between patients and controls in each fold similar to that of the training set being sampled. This creates balanced folds and strengthens the search for discriminative features.
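One way to generate such folds is scikit-learn's `StratifiedShuffleSplit`, as in the sketch below. Whether the study used this exact class is an assumption; the helper name `feature_selection_folds`, the seed and the 30-subject training set are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def feature_selection_folds(y_train, n_folds=5, keep=0.9, seed=0):
    """Return one index array per fold, each keeping `keep` of the training subjects."""
    y_train = np.asarray(y_train)
    sss = StratifiedShuffleSplit(n_splits=n_folds, train_size=keep, random_state=seed)
    dummy_X = np.zeros((y_train.size, 1))  # only the labels matter for the split
    # The "train" side of each split is the retained 90%; the rest is discarded.
    return [kept for kept, _ in sss.split(dummy_X, y_train)]


# Hypothetical training set: 15 patients (0) and 15 controls (1).
y_train = np.array([0] * 15 + [1] * 15)
folds = feature_selection_folds(y_train)
```

Each fold is a random, class-proportional subset of the training subjects, so the left-out test subject never influences feature ranking.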

An ANOVA-based feature selection algorithm was executed on each fold, and the top K most discriminative features were retained. Features were ranked by the resulting F-value, with ties broken randomly. This procedure created one candidate feature set per fold. The final feature set was then assembled from the features that were selected in every fold (Figure 4.7).
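The ranking and intersection steps can be sketched as follows, using scikit-learn's `f_classif` for the ANOVA F-values. This is an assumed reconstruction, not the original code: the function names, the seed, the random permutation used for tie-breaking and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import StratifiedShuffleSplit


def top_k_anova(X, y, k, rng):
    """Indices of the k features with the largest ANOVA F-value (random tie-breaking)."""
    f_values, _ = f_classif(X, y)
    f_values = np.nan_to_num(f_values, nan=-np.inf)  # constant features rank last
    order = rng.permutation(f_values.size)           # shuffle so ties break randomly
    ranked = order[np.argsort(-f_values[order], kind="stable")]
    return set(ranked[:k].tolist())


def select_consistent_features(X_train, y_train, k, n_folds=5, keep=0.9, seed=0):
    """Keep only the features ranked in the top k of every feature selection fold."""
    rng = np.random.default_rng(seed)
    sss = StratifiedShuffleSplit(n_splits=n_folds, train_size=keep, random_state=seed)
    candidates = [top_k_anova(X_train[idx], y_train[idx], k, rng)
                  for idx, _ in sss.split(X_train, y_train)]
    return np.array(sorted(set.intersection(*candidates)), dtype=int)


# Synthetic data (hypothetical): only the first 5 of 200 features are discriminative.
rng_demo = np.random.default_rng(1)
y_demo = np.array([0] * 15 + [1] * 15)
X_demo = rng_demo.normal(size=(30, 200))
X_demo[:, :5] += 4.0 * y_demo[:, None]
selected = select_consistent_features(X_demo, y_demo, k=10)
```

The intersection can contain fewer than K features, which is why the final set has at most K entries.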

Once features were selected, the training dataset was transformed using principal component analysis (PCA) and the top 5 components were kept, yielding a compact data representation. The same features were selected in the testing set, which was subsequently projected onto the PCA space (Figure 4.1).
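A minimal sketch of this step with scikit-learn's `PCA` is given below. The important point is that the projection is fitted on the training subjects only and then applied unchanged to the held-out subject. The function name, the selected feature indices and the data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_with_pca(X_train, X_test, selected, n_components=5):
    """Fit PCA on the selected training features and project both sets onto it."""
    pca = PCA(n_components=n_components)
    Z_train = pca.fit_transform(X_train[:, selected])  # fitted on training data only
    Z_test = pca.transform(X_test[:, selected])        # same features, same projection
    return Z_train, Z_test


# Hypothetical LOOCV iteration: 19 training subjects, 1 test subject, 25 selected features.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(20, 100))
selected_idx = np.arange(0, 100, 4)
Z_train, Z_test = project_with_pca(X_all[:19], X_all[19:], selected_idx)
```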

[Figure 4.7 diagram: the training set is sampled into feature selection folds 1 to M; ANOVA-based ranking (FS) keeps the K best features of each fold, giving M candidate sets of K features; only the features common to all candidate sets are selected (at most K).]

Figure 4.7: Feature selection. The algorithm randomly samples the training set, creating feature selection folds (M=5). Then, the features in each fold are ranked using an ANOVA test and only the top K features are kept. Finally, only those features identified in all selection folds are chosen to constitute the definitive feature set that will be used in training and testing.

Robust features and informative features

A feature set is obtained on each iteration of the LOOCV loop. This feature set can differ from iteration to iteration because the training set changes every time and because of the random sampling mechanism. It is therefore important to determine which features recur over all classification iterations. These robust features are obtained by examining the feature sets from all iterations of the LOOCV loop and retaining only those features that appear in every one of them. This mechanism is similar to the one described for the construction of the final feature set on each iteration; in this case, however, features are selected across all LOOCV feature sets instead of across the feature selection folds of a particular training set.
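In code, the robust features reduce to a set intersection over the per-iteration feature sets. The short sketch below uses a hypothetical helper name and toy feature indices.

```python
def robust_features(feature_sets):
    """Return the features present in the feature set of every LOOCV iteration."""
    sets = [set(fs) for fs in feature_sets]
    return sorted(set.intersection(*sets)) if sets else []


# Toy example: feature indices selected in three LOOCV iterations.
per_iteration = [[1, 2, 3, 7], [2, 3, 7, 9], [0, 2, 7]]
robust = robust_features(per_iteration)  # [2, 7]
```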

Given that the feature selection algorithm requires setting the K parameter (cardinality) a priori, the performance of the classifier may vary depending on the size of the feature set used for classification. Therefore, the identification of robust features is limited by the choice of K. To account for this free parameter, robust features were evaluated for a range of K values, starting at 1000 features and incrementing by 1000 features each time, until reaching the maximum number of available features T. Then, informative features were identified as those features that were selected at least 80% of the time across all cases where the classifier performed well (above the measured average accuracy). An analysis of informative features is presented in the results section.
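The sketch below shows one possible reading of this rule: a "case" is one value of K, a case counts as performing well when its accuracy exceeds the mean accuracy over all K values, and a feature is informative when it is robust in at least 80% of those cases. This interpretation, the function name and all numbers are assumptions made for illustration.

```python
from collections import Counter


def informative_features(robust_by_k, accuracy_by_k, min_fraction=0.8):
    """Features that are robust in at least `min_fraction` of the well-performing K values."""
    mean_acc = sum(accuracy_by_k.values()) / len(accuracy_by_k)
    good_ks = [k for k, acc in accuracy_by_k.items() if acc > mean_acc]
    if not good_ks:
        return []
    counts = Counter(f for k in good_ks for f in set(robust_by_k[k]))
    return sorted(f for f, c in counts.items() if c / len(good_ks) >= min_fraction)


# Hypothetical sweep: K = 1000, 2000, ..., T with T = 6000.
T = 6000
k_values = list(range(1000, T + 1, 1000))
accuracy_by_k = dict(zip(k_values, [0.5, 0.9, 0.9, 0.9, 0.9, 0.9]))
robust_by_k = dict(zip(k_values, [[1], [1, 2, 3], [1, 2], [1, 2, 3], [1, 2, 3], [2, 3, 4]]))
informative = informative_features(robust_by_k, accuracy_by_k)  # [1, 2, 3]
```

In this toy sweep K = 1000 falls below the mean accuracy and is ignored; feature 4 is robust in only one of the five remaining cases and is therefore not informative.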

4.2.6 Baseline experiment

A baseline experiment was proposed to compare the effectiveness of TLE identification based on focal asymmetry features against classification based directly on the information contained in the T1, FA, MD and GM* images (no asymmetry calculation). The feature extraction and feature selection steps for this experiment were likewise performed on the MSL, NEO and ATL regions. In this case, however, both the left and right hemispheres were explored for features. A linear SVM, identical to the one used in the VBA classification, was trained and tested following the same methodology described above.

4.3 Results

In document Multivariate Analysis of MR Images in Temporal Lobe Epilepsy (Page 160-163)