2.4 Linear SVMs and other linear methods
2.4.3 Other linear methods
We mention here a few other linear methods that are well known for their good performance in terms of accuracy and interpretability.
Multiple kernel learning
Multiple Kernel Learning (MKL) was proposed by Bach et al. [2004] and Rakotomamonjy et al. [2008]. It consists in writing the kernel $K(x_i, x)$ as a linear combination of several kernels, i.e.
\[ K(x_i, x) = \sum_{i=1}^{M} d_i K_i(x_i, x), \tag{2.33} \]
with $d_i \geq 0$ and $\sum_{i=1}^{M} d_i = 1$. Each kernel $K_i$ is associated with a subset of features, which can correspond to different regions of the brain or to different imaging modalities, for instance. The solution of the optimization problem thus simultaneously attributes a weight $d_i$ to each kernel and, if the kernels $K_i$ are linear, a weight $w$ to each feature. The magnitude of each weight $d_i$ denotes the importance of the corresponding subset of features in the classification function. Sparsity in the kernel weights is enforced by the $L_1$ constraint on the $d_i$: some weights can be exactly zero, in which case the corresponding subset of features does not contribute to the model.
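As an illustration of Equation (2.33), the following minimal sketch evaluates a combined kernel from linear kernels restricted to feature subsets. It only shows how fixed kernel weights act; learning the weights is not shown. The function names, the feature grouping and the numerical values are assumptions made for this example and do not come from the cited works.

```python
def linear_kernel(u, v):
    """Plain dot product, i.e. a linear kernel."""
    return sum(a * b for a, b in zip(u, v))


def combined_kernel(u, v, groups, d):
    """Evaluate the weighted sum of per-subset linear kernels.

    groups[k] lists the feature indices seen by the k-th kernel
    (hypothetical layout); d[k] is the weight of that kernel.
    """
    # The weights must lie on the simplex: non-negative and summing to 1.
    if any(weight < 0 for weight in d) or abs(sum(d) - 1.0) > 1e-12:
        raise ValueError("kernel weights must be non-negative and sum to 1")
    total = 0.0
    for idx, weight in zip(groups, d):
        sub_u = [u[j] for j in idx]
        sub_v = [v[j] for j in idx]
        total += weight * linear_kernel(sub_u, sub_v)
    return total


# Two feature subsets, e.g. two brain regions (illustrative values only).
groups = [[0, 1], [2, 3, 4]]
x_i = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [0.5, -1.0, 2.0, 0.0, 1.0]

# Sparse weights: the second subset is switched off entirely.
k_sparse = combined_kernel(x_i, x, groups, [1.0, 0.0])
# Uniform weights: both subsets contribute equally.
k_uniform = combined_kernel(x_i, x, groups, [0.5, 0.5])
print(k_sparse, k_uniform)
```

With the sparse weights, the features of the second subset have no influence on the kernel value, which is the sense in which a zero kernel weight removes a subset of features from the model.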
Lasso
Tibshirani [1996] proposed a linear regression method with an $L_1$ penalization of the feature weights. The resulting model $f(x) = w^T x + b$ is the solution of the following optimization problem:
\[ \min_{w,b} \left\{ \frac{1}{2} \sum_{i=1}^{n} (y_i - w^T x_i - b)^2 + \lambda \sum_{i=1}^{m} |w_i| \right\}. \tag{2.34} \]
Through the $L_1$ penalization of the weights, this procedure enforces sparsity of the weight vector and thus embeds variable selection. For classification problems, the Lasso penalty can be applied to the Logistic Regression algorithm, giving rise to the following formulation:
\[ \max_{w,b} \left\{ \sum_{i=1}^{n} \left[ y_i (w^T x_i + b) - \log\left(1 + e^{w^T x_i + b}\right) \right] - \lambda \sum_{i=1}^{m} |w_i| \right\}. \tag{2.35} \]
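To make the sparsity mechanism of the regression problem (2.34) concrete, here is a minimal sketch of cyclic coordinate descent with soft-thresholding, one common way to solve this problem. It is not necessarily the solver used in the cited work, and the function names and the toy data are assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimise 0.5 * sum_i (y_i - w.x_i - b)^2 + lam * sum_j |w_j|."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    b = sum(y) / n
    for _ in range(n_sweeps):
        for j in range(m):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = b + sum(w[k] * X[i][k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form minimiser of the objective along coordinate j.
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # The intercept is not penalised: set it to the mean residual.
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(m)) for i in range(n)) / n
    return w, b


# Toy data (illustrative only): y = 3 * x0 + 0.2 * x1 + 1.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.2, -1.8, 3.8, 4.2]

w, b = lasso_coordinate_descent(X, y, lam=1.0)
print(w, b)
```

On this toy data the weakly informative second feature receives a weight of exactly zero, while the weight of the first feature is shrunk but kept. This is the embedded variable selection mentioned above.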
CHAPTER 2. PRINCIPLES OF MACHINE LEARNING
Group Lasso
The link between Lasso and Group Lasso is similar to the one between SVM and MKL, i.e. Group Lasso is a version of Lasso that takes groups of features into account. Such a method is convenient in situations where variables are organized in groups, such as genes in the same biological pathway or voxels in the same brain region. This method was first introduced in [Yuan and Lin, 2006]. The penalty is applied at the group level, such that the optimization problem in regression is as follows:
\[ \min_{w,b} \left\{ \frac{1}{2} \sum_{i=1}^{n} (y_i - w^T x_i - b)^2 + \lambda \sum_{g=1}^{G} \sqrt{m_g} \, \|w_g\|_2 \right\}. \tag{2.36} \]
In this formulation, there are $G$ groups, the cardinality of group $g$ is $m_g$ and its coefficient vector is $w_g$. As $\|w_g\|_2$ is zero only if all its components are zero, this penalization enforces sparsity between groups. A Logistic Regression adaptation for classification has been proposed by Meier et al. [2008]. We can also cite other adaptations of group lasso: sparse group lasso [Friedman et al., 2010], overlap and graph group lasso [Jacob
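The group-level sparsity induced by the penalty in (2.36) can be illustrated with block soft-thresholding, the proximal operator of the group penalty, which is a standard building block of proximal gradient solvers. This is a minimal sketch under assumptions: the function name, the grouping and the values are hypothetical, and the full optimization loop is not shown.

```python
import math


def group_soft_threshold(v, groups, lam):
    """Apply the proximal operator of lam * sum_g sqrt(m_g) * ||v_g||_2.

    groups[g] lists the indices of the coefficients in group g
    (hypothetical layout, groups assumed non-overlapping).
    """
    out = list(v)
    for idx in groups:
        norm = math.sqrt(sum(v[j] ** 2 for j in idx))
        threshold = lam * math.sqrt(len(idx))
        # The whole group is scaled by a common factor, which is zero
        # as soon as the group norm does not exceed the threshold.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * v[j]
    return out


# Two groups of two coefficients each (illustrative values only).
groups = [[0, 1], [2, 3]]
v = [3.0, 4.0, 0.3, -0.4]

shrunk = group_soft_threshold(v, groups, lam=1.0)
print(shrunk)
```

The first group has a large norm and is only shrunk, keeping the direction of its coefficient vector, whereas the second group has a small norm and is set to zero as a whole. Coefficients therefore enter or leave the model group by group rather than one by one.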
Chapter 3 Machine learning in neuroimaging
Chapter overview
This chapter provides the neuroimaging background of this thesis. Image preprocessing and the state of the art of machine learning in this field are discussed here. We first describe what brain imaging is and how this type of data is handled. In Section 3.2 we provide a non-exhaustive list of notable publications in the field of machine learning for neuroimaging, in particular on computer-aided diagnosis systems for Alzheimer's disease. We finish the chapter by describing the data used in our experiments.
3.1 Principles of neuroimaging
The aim of this section is to introduce the basic concepts of the neuroimaging field. In particular, we describe the different imaging modalities used later in the manuscript, as well as their pre-processing and the classical univariate statistical approach to inference.
Figure 3.1 illustrates three widely used types of modalities: structural MRI, functional MRI and PET imaging. The modality chosen for a study depends on the question of interest.
Structural imaging provides information about brain anatomy and can be helpful to localize a lesion, for instance. Volume measures of brain regions can also bring information about atrophy or hypertrophy linked to cognitive behaviours or disease. By comparison, functional imaging aims to characterize brain activity. Functional MRI (fMRI) data are generally used to study the link between brain activity and cognitive tasks through the "blood-oxygenation-level dependent" (BOLD) signal. Finally, PET images reflect metabolic processes that depend on the radiotracer used. The diagnosis of neurodegenerative diseases and the detection of brain tumours are often carried out with this imaging modality.
We describe these modalities in Subsection 3.1.1. In Subsection 3.1.2, we provide a short overview of the preprocessing techniques necessary for the analysis of neuroimages. This stage is mandatory in particular before any multi-subject analysis, because brain images have to be made comparable across the group of subjects.
Figure 3.1: (a) Structural MRI. (b) Functional MRI. (c) PET scan.