Interpreting multivariate machine-learning-based lesion-deficit mappings

CHAPTER 1: General Introduction

1.8 Lesion-deficit mapping studies

1.8.2 Interpreting multivariate machine-learning-based lesion-deficit mappings

methods that (unlike univariate approaches) capture the multivariate patterns in which brain damage affects behaviour (Xu et al., 2018). This novel collection of techniques uses advanced machine learning algorithms to explicitly model (i) how the effect of damage in one voxel depends on that in another and (ii) the parasitic voxel-to-voxel associations that arise from the non-random distribution of vascular damage across the brain. For instance, Zhang et al. (2014) introduced a multivariate lesion-deficit mapping approach based on non-linear support vector regression (SVR) and showed that, in comparison with a standard univariate approach, it (a) yielded significantly higher accuracy (higher sensitivity and specificity) in detecting ground-truth lesion-deficit relationships generated from synthetic data and (b) was more sensitive to empirical lesion-deficit relationships present in real patient data. A five-fold cross-validation scheme iterated 40 times revealed, however, that the same SVR method only achieved a relatively modest prediction accuracy (i.e. the ability to predict functional outcomes in new patients): R² = ~0.10. Likewise, Pustina et al. (2018) demonstrated that a multivariate lesion-deficit mapping method based on sparse canonical correlations produced more accurate lesion-deficit mappings, in terms of smaller (although still far from perfect) localisation errors, than state-of-the-art univariate implementations in most of the simulated situations, which varied in (a) sample size, (b) correction for multiple comparisons and (c) multi-area combinations. Moreover, when applied to real patient data, it exhibited higher sensitivity than its univariate counterpart. Nonetheless, the correlation between predicted and observed behavioural scores, estimated with a 4-fold cross-validation scheme, was still lower than 0.60 (i.e. R² < 0.35).
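The repeated k-fold scheme referred to above (five folds, iterated 40 times, summarised as an out-of-sample R²) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline of Zhang et al. (2014) or Pustina et al. (2018): the model is a one-predictor least-squares line swapped in for SVR, the data are synthetic, and every function and variable name is hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_kfold_r2(xs, ys, k=5, repeats=40, seed=0):
    """Mean out-of-sample R^2 over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        predicted = [0.0] * len(xs)
        for test_idx in kfold_indices(len(xs), k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(xs)) if i not in held_out]
            # The model never sees the held-out patients during fitting.
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            for i in test_idx:
                predicted[i] = slope * xs[i] + intercept
        scores.append(r_squared(ys, predicted))
    return sum(scores) / len(scores)

# Synthetic example (assumption): 100 patients, one lesion-load predictor,
# and a noisy linear relationship with a behavioural score.
data_rng = random.Random(1)
lesion_load = [data_rng.random() for _ in range(100)]
behaviour = [50.0 - 20.0 * x + data_rng.gauss(0.0, 10.0) for x in lesion_load]
mean_r2 = repeated_kfold_r2(lesion_load, behaviour)
print(f"mean out-of-sample R^2 over 40 repeats: {mean_r2:.2f}")
```

Because every prediction is made for a patient excluded from model fitting, the resulting R² estimates how well the model predicts new cases, which is distinct from how well a lesion-deficit map localises the critical regions.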

Other efforts have placed a greater emphasis on exploiting the potential of machine learning algorithms to build data-driven predictive models that can learn the most important multivariate structure-function-recovery rules underlying accurate outcome predictions (e.g., Hope et al., 2015; Rehme et al., 2015a, b; Siegel et al., 2016). For example, Smith et al. (2013) trained and tested a non-linear support vector machine classifier on lesion data from a large sample of 140 right-hemisphere stroke patients to predict the presence or absence of spatial neglect. Using a leave-one-out cross-validation scheme, the authors found that, when the classifications examined the contribution of multiple voxels across the whole brain simultaneously, the predictive power of patches of voxels (i.e. multivariate classification) exceeded that of the best-performing single voxel (i.e. univariate classification). In addition, when the lesion analysis considered the degree of damage to 45 right-hemisphere regions, a significantly higher average prediction accuracy was observed for two-region combinations than for single regions (lesion-size-adjusted percentages: ~62% versus ~53%), and for three-region combinations than for two-region combinations (lesion-size-adjusted percentages: ~67% versus ~62%). Similarly, Hope et al. (2013) trained and tested a Gaussian process regression model on lesion and non-lesion data from a large sample of 270 stroke patients to predict speech production scores in new cases at the individual subject level. The best predictor configuration included time post-stroke, lesion size and lesion load (i.e. the proportion of damaged voxels among all voxels within discrete anatomical structures) in 35 grey/white matter regions, which were selected using a fully automated procedure. Nevertheless, the predicted scores (obtained from a leave-one-out cross-validation scheme) only accounted for 59% of the variance in the observed speech production scores.
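The definition of lesion load given above (the proportion of damaged voxels among all voxels within an anatomical structure) can be made concrete with a short sketch. It assumes, purely for illustration, that the binary lesion mask and the atlas label map have been flattened into equal-length lists and that label 0 marks voxels outside every region. The function name and the toy data are hypothetical and are not taken from Hope et al. (2013).

```python
def lesion_load(lesion_mask, region_labels, background=0):
    """Return {region: damaged voxels / total voxels} for each atlas region.

    lesion_mask   -- sequence of 0/1 values, 1 meaning the voxel is damaged
    region_labels -- sequence of integer atlas labels, one per voxel
    background    -- label for voxels outside all regions (assumption)
    """
    if len(lesion_mask) != len(region_labels):
        raise ValueError("mask and label map must cover the same voxels")
    total = {}
    damaged = {}
    for is_damaged, region in zip(lesion_mask, region_labels):
        if region == background:
            continue  # damage outside the atlas does not count towards any region
        total[region] = total.get(region, 0) + 1
        damaged[region] = damaged.get(region, 0) + (1 if is_damaged else 0)
    return {region: damaged[region] / total[region] for region in total}

# Toy example: ten voxels, three regions and one background voxel.
labels = [1, 1, 1, 1, 2, 2, 2, 0, 3, 3]
mask = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
loads = lesion_load(mask, labels)
for region in sorted(loads):
    print(f"region {region}: lesion load = {loads[region]:.2f}")
```

Encoding damage this way reduces each patient to one number per region, which is far fewer predictors than a voxel-wise encoding but ties the results to the chosen parcellation.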

The ever-increasing popularity of machine learning algorithms does not come free of its own problems and controversies (e.g., Arbabshirani et al., 2017; Varoquaux et al., 2017; Carlson et al., 2018; Janssen et al., 2018; Mateos-Pérez et al., 2018). In what follows I give seven examples. First, because the number of variables (or feature space) in a typical neuroimaging study is much greater than the number of observations (the so-called “curse of dimensionality”), machine learning techniques usually embed some form of dimensionality reduction in the learning process (Lemm et al., 2011; Klöppel et al., 2012), which could compromise the spatial specificity and interpretability of the results. Second, the choice of input lesion variables is user-dependent (as in univariate methods). However, the shape and size of the functional units of the brain continue to be a matter of debate (Eickhoff et al., 2018a, b; Genon et al., 2018) because cortical areas can be defined on the basis of their structure (e.g., Amunts et al., 1999; Caspers et al., 2006), function (e.g., Sereno et al., 1995; Formisano et al., 2003) and/or connectivity (e.g., Ruschel et al., 2014; Gordon et al., 2016), with distinct brain mapping approaches arriving at different solutions. Consider, for instance, two recent multi-modal parcellations of the human cerebral cortex: the Human Connectome Project atlas (Glasser et al., 2016) and the Brainnetome atlas (Fan et al., 2016). The former delineates 180 cortical areas per hemisphere, whereas the latter delineates 210 cortical subregions across both hemispheres. Critically, even small changes in how brain damage is encoded (e.g., at the level of single voxels versus atlas-based regions) lead to noticeable differences in prediction performance (Rondina et al., 2016; see also Abraham et al., 2017). Third, the tuning of the model hyper-parameters (including the regularisation term) is not a trivial issue and does not have a one-size-fits-all solution (Hastie et al., 2004; Lemm et al., 2011; Varoquaux et al., 2017).

A fourth challenge for machine learning users is the lack of consensus as to which type of algorithm to choose when tackling classification (for categorical outcomes) or regression (for continuous outcomes) problems (e.g., Cui and Gong, 2018; Hope et al., 2018). A fifth challenge is that multivariate methods are computationally more expensive than their univariate counterparts, especially if the lesion information is encoded at the voxel level (DeMarco and Turkeltaub, 2018), and can be harder to implement in practice given the technical knowledge involved. Sixth, due to the high dimensionality and non-linearity that arise when attempting to capture multivariate structure-function-recovery associations, the output of machine learning algorithms can be complex and may obscure precise neurobiological interpretations, thereby limiting the degree of scientific insight afforded by these methods (Haufe et al., 2014; Coveney et al., 2016; Huys et al., 2016; Bzdok and Yeo, 2017; Stephan et al., 2017). Seventh, although cross-validation (i.e. splitting the data into training and testing sets) limits overfitting, the estimated predictive performance of the model is still subject to the vagaries of sample size (Braga-Neto and Dougherty, 2004; Isaksson et al., 2008; Popovici et al., 2010; Cui and Gong, 2018; Varoquaux, 2018). In other words, the use of cross-validation does not preclude the need for proper validation, as indicated by studies that reported a substantial drop in effect size estimates after testing the generalisability of the cross-validation results with a full split-half analysis (Price et al., 2013; Pustina et al., 2017).
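One rough way to see why estimated predictive performance depends on sample size is to treat each held-out prediction as an independent Bernoulli trial, so that a classification accuracy measured on n test cases has a binomial standard error. This independence is a simplifying assumption: cross-validation folds share training data, so the sketch below is only an approximation and is not the analysis used in any of the studies cited above. The function names and the example accuracy of 0.70 are hypothetical.

```python
import math

def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy estimated from n_test cases."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)

def approx_95_interval(accuracy, n_test):
    """Normal-approximation 95% interval, clipped to the range [0, 1]."""
    half_width = 1.96 * accuracy_standard_error(accuracy, n_test)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

# The same observed accuracy is far less certain in a small sample.
for n_patients in (30, 100, 1000):
    low, high = approx_95_interval(0.70, n_patients)
    print(f"n = {n_patients:4d}: accuracy 0.70, "
          f"approx. 95% interval [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval narrows only with the square root of the number of test cases, which is one reason why a performance estimate from a small cohort may shrink when re-tested in independent data.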

In summary, while the capacity of multivariate lesion-deficit mapping techniques to model the spatial bias in vascular lesions and the distributed nature of human cognitive functions is certainly better than that of univariate lesion-deficit mapping techniques, that does not imply that they are perfect. Indeed, Pustina et al. (2018) showed that multivariate methods reduce, but do not completely correct, the displacement of critical areas relative to univariate methods. More research on this topic is thus warranted.

In document Using non-invasive stimulation of the undamaged brain to guide the identification of lesion sites that predict language outcome after stroke (Page 51-55)