Chapter 4 Comparison of two machine learning approaches and two
4.6 Application: GeneMSA data
4.6.3 Prediction Accuracies
We evaluate the predictive performance of all four classification models by computing their confusion matrices. Because the five MS subtypes differ greatly in size, the average accuracy is more representative of each method’s performance than the overall accuracy (see Subsection 2.4.1). Note also that the chance level for average accuracy is 20%.
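As a minimal sketch of the distinction between the two measures, the following Python snippet computes both from a confusion matrix of counts. It assumes that rows index the true class and columns the predicted class, and that average accuracy is the unweighted mean of the per-class accuracies, which is consistent with the values reported in Tables 4.3 and 4.4. The counts are hypothetical and are not taken from the GeneMSA data.

```python
def overall_accuracy(counts):
    """Fraction of all subjects classified correctly.

    counts[i][j] = number of subjects of true class i predicted as class j
    (orientation is an assumption of this sketch).
    """
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total


def average_accuracy(counts):
    """Unweighted mean of the per-class accuracies (diagonal / row sum)."""
    per_class = [row[i] / sum(row) for i, row in enumerate(counts)]
    return sum(per_class) / len(per_class)


# Hypothetical, strongly imbalanced three-class example (not GeneMSA data).
counts = [
    [90, 5, 5],  # large class, mostly correct
    [8, 1, 1],   # small class, mostly assigned to the large class
    [7, 1, 2],   # small class, mostly assigned to the large class
]

overall = overall_accuracy(counts)
average = average_accuracy(counts)
print(f"overall = {overall:.3f}, average = {average:.3f}")
```

In this toy example the overall accuracy is dominated by the large class, while the average accuracy exposes the poor performance on the two small classes.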
Table 4.3 and Table 4.4 show confusion matrices for all models based on T1 and T2 lesion data, respectively. In the case of the SVM classifier, results for two combinations of features from multiple imaging modalities are shown in the tables. A comparison of classification performance reveals the superiority of the spatially informed approaches, while the naïve Bayesian approach performs only slightly above chance level. The NBC results are based on applying a lesion mask (only voxels with at least two lesions) to the full MRI data. In contrast, when using all voxels, the NBC yields predictions where every single subject is classified into the largest subtype, RRMS, resulting in an average accuracy exactly equal to chance.
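The statement that a classifier assigning every subject to RRMS attains exactly chance-level average accuracy can be checked with a short sketch. It assumes, as above, that average accuracy is the mean of the diagonal of the row-normalised confusion matrix; the helper names are illustrative only.

```python
SUBTYPES = ["CIS", "RRMS", "PPMS", "SPMS", "PRMS"]


def constant_classifier_matrix(labels, predicted):
    """Row-normalised confusion matrix of a classifier that always
    predicts the same class, regardless of the true class."""
    k = labels.index(predicted)
    return [[1.0 if j == k else 0.0 for j in range(len(labels))]
            for _ in labels]


def average_accuracy(row_normalised):
    """Mean of the diagonal of a row-normalised confusion matrix."""
    n = len(row_normalised)
    return sum(row_normalised[i][i] for i in range(n)) / n


matrix = constant_classifier_matrix(SUBTYPES, "RRMS")
avg = average_accuracy(matrix)
print(avg)  # 0.2
```

Only the RRMS row contributes a non-zero diagonal entry, so the average is 1/5 irrespective of how many subjects belong to each subtype.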
The feature set for the SVM (M4) classifier (see Table 4.1) comprises GM volume by lobar ROIs, and T1 and T2 lesion count and lesion volume by WM ROIs, but excludes any demographic or clinical covariates. It reflects the performance of the SVM on lesion data when using only traditional measures such as lesion load and count combined with brain atrophy as measured by GM volume-to-whole-brain ratios. The achieved average classification accuracy of 39.4% is well above chance level and indicates that the information contained in the MRI data, rather than the covariates, predominantly drives the predictions.
The feature configuration for SVM (M7) is the one that showed the highest prediction accuracy among our selection of feature sets. The feature set includes GM volume, median T2 lesion volume and T2 EP characteristic, each split by WM ROIs; the following whole-brain summaries: standard deviation of T1 mean breadth, median of T2 mean breadth, and T1 and T1-Gd mean intra-lesion intensities; and all available demographic and clinical covariates. This configuration results in an average accuracy of 47.8%.
Although the SVM classifiers perform better than a naïve mass-univariate approach in terms of average classification accuracy, they struggle particularly with predicting the PPMS and PRMS subtypes. As with the NBC, the majority of misclassified subjects are categorised as belonging to the largest group, RRMS.
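The tendency to misclassify into RRMS can be read off the confusion matrix directly. The sketch below uses the SVM (M4) entries from Table 4.3 and, for each row, reports the column with the largest off-diagonal entry. It works on the row-normalised rates given in the table, not on subject counts, and the function name is illustrative.

```python
SUBTYPES = ["CIS", "RRMS", "PPMS", "SPMS", "PRMS"]

# SVM (M4) confusion matrix as reported in Table 4.3 (each row sums to one).
SVM_M4 = [
    [0.454, 0.364, 0.000, 0.182, 0.000],
    [0.139, 0.595, 0.064, 0.162, 0.041],
    [0.077, 0.385, 0.231, 0.308, 0.000],
    [0.070, 0.209, 0.116, 0.488, 0.116],
    [0.000, 0.500, 0.000, 0.300, 0.200],
]


def main_misclassification_target(matrix, labels):
    """For each row, return the label of the largest off-diagonal entry."""
    targets = []
    for i, row in enumerate(matrix):
        off_diagonal = [(rate, j) for j, rate in enumerate(row) if j != i]
        _, j_max = max(off_diagonal)
        targets.append(labels[j_max])
    return targets


targets = main_misclassification_target(SVM_M4, SUBTYPES)
for row_label, target in zip(SUBTYPES, targets):
    print(f"{row_label}: most often misclassified as {target}")
```

For every row other than RRMS itself, the largest off-diagonal entry lies in the RRMS column.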
The BSGLMM shows strong prediction results with an average accuracy of 78% based on T1-weighted and 82% based on T2-weighted data. The BSGLMM
confusion matrices show that misclassification predominantly occurs into the CIS subtype. Misclassified patients tend to have fewer and smaller lesions than those that are correctly classified, which is consistent with the clinical presentation of the CIS subtype. Note that classification results improve when using the empirical proportions of group-membership instead of an equal prior, achieving 81.8% (85.5%) average (overall) accuracy on T1 data and 83.7% (79.9%) on T2 data.
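The effect of the prior on the predicted subtype can be illustrated with Bayes' rule, under which the posterior class probability is proportional to the likelihood times the prior. The sketch below is generic and does not reproduce the predictive computation of the BSGLMM; the log-likelihood values and the class proportions are hypothetical and chosen only for illustration.

```python
import math

SUBTYPES = ["CIS", "RRMS", "PPMS", "SPMS", "PRMS"]


def posterior(log_likelihoods, priors):
    """Posterior class probabilities from per-class log-likelihoods and priors."""
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def predict(probabilities, labels):
    """Label with the highest posterior probability."""
    return labels[probabilities.index(max(probabilities))]


# Hypothetical log-likelihoods of one subject's data under each subtype.
log_lik = [-10.0, -10.5, -12.0, -11.0, -13.0]

equal_prior = [0.2] * 5
# Hypothetical empirical proportions (not the GeneMSA group sizes).
empirical_prior = [0.05, 0.68, 0.05, 0.18, 0.04]

pred_equal = predict(posterior(log_lik, equal_prior), SUBTYPES)
pred_empirical = predict(posterior(log_lik, empirical_prior), SUBTYPES)
print(pred_equal, pred_empirical)
```

With an equal prior the prediction follows the likelihood alone, whereas the empirical prior shifts borderline subjects towards the larger subtypes.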
The LGCP model does not consider any covariates, which one might expect to put it at a disadvantage in prediction. With respect to the T1 data, the classifier of the LGCP model performs well on the largest subtype (RRMS) but has difficulty with the two smallest groups (CIS, PPMS). This can at least in part be attributed to the small number of data points available for these subtypes; for example, there are only eight CIS patients with T1 lesions in the data set. A further difficulty arises from the fact that only about half as many lesions are visible on T1-weighted images as on T2-weighted scans. The considerably higher prediction accuracy in the case of T2 lesions indicates that additional data would likely increase model performance with respect to the T1 lesions.
Regarding the T2 data, the LGCP’s predictive accuracy reaches 84.7% overall and 74.7% when averaged across groups. Among the four models considered here, the LGCP is also the closest to a generative model for lesion data, i.e. when simulating new data, it would give much more realistic predictions than, for instance, the BSGLMM, which assumes independent lesion data conditional on (spatially regularised) coefficients.
Table 4.3: Confusion matrices and prediction accuracies for different classifiers based on T1 lesion data (except for SVM).
NBC: Overall & average accuracy: 0.580 & 0.219.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.000  0.400  0.400  0.000  0.200
RRMS    0.018  0.799  0.018  0.128  0.037
PPMS    0.000  0.923  0.077  0.000  0.000
SPMS    0.024  0.756  0.049  0.122  0.049
PRMS    0.200  0.700  0.000  0.000  0.100

SVM (M4): Overall & average accuracy: 0.536 & 0.394.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.454  0.364  0.000  0.182  0.000
RRMS    0.139  0.595  0.064  0.162  0.041
PPMS    0.077  0.385  0.231  0.308  0.000
SPMS    0.070  0.209  0.116  0.488  0.116
PRMS    0.000  0.500  0.000  0.300  0.200

BSGLMM: Overall & average accuracy: 0.654 & 0.783.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     1.000  0.000  0.000  0.000  0.000
RRMS    0.348  0.598  0.030  0.024  0.000
PPMS    0.083  0.000  0.917  0.000  0.000
SPMS    0.216  0.054  0.027  0.703  0.000
PRMS    0.100  0.100  0.100  0.000  0.700

LGCP: Overall & average accuracy: 0.753 & 0.510.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.250  0.375  0.125  0.125  0.125
RRMS    0.056  0.850  0.069  0.019  0.006
PPMS    0.167  0.333  0.333  0.083  0.083
SPMS    0.071  0.119  0.119  0.667  0.024
PRMS    0.111  0.222  0.111  0.111  0.445
Table 4.4: Confusion matrices and prediction accuracies for different classifiers based on T2 lesion data (except for SVM).
NBC: Overall & average accuracy: 0.592 & 0.280.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.000  0.500  0.200  0.000  0.300
RRMS    0.012  0.781  0.018  0.116  0.073
PPMS    0.000  0.769  0.000  0.154  0.077
SPMS    0.024  0.585  0.024  0.220  0.146
PRMS    0.000  0.600  0.000  0.000  0.400

SVM (M7): Overall & average accuracy: 0.560 & 0.478.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.818  0.182  0.000  0.000  0.000
RRMS    0.162  0.584  0.058  0.081  0.116
PPMS    0.000  0.231  0.308  0.231  0.231
SPMS    0.023  0.093  0.116  0.581  0.186
PRMS    0.000  0.400  0.200  0.300  0.100

BSGLMM: Overall & average accuracy: 0.748 & 0.823.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     1.000  0.000  0.000  0.000  0.000
RRMS    0.238  0.713  0.006  0.043  0.000
PPMS    0.083  0.000  0.917  0.000  0.000
SPMS    0.162  0.000  0.054  0.784  0.000
PRMS    0.200  0.000  0.000  0.100  0.700

LGCP: Overall & average accuracy: 0.847 & 0.747.
        CIS    RRMS   PPMS   SPMS   PRMS
CIS     0.600  0.100  0.300  0.000  0.000
RRMS    0.035  0.896  0.017  0.029  0.023
PPMS    0.154  0.154  0.692  0.000  0.000
SPMS    0.023  0.116  0.093  0.767  0.000
PRMS    0.111  0.111  0.000  0.000  0.778