
2.4 Linear SVMs and other linear methods

2.4.3 Other linear methods

We briefly mention here a few other linear methods that are well known for their good performance in terms of both accuracy and interpretability.

Multiple kernel learning

Multiple Kernel Learning (MKL) has been proposed by Bach et al. [2004] and Rakotomamonjy et al. [2008]. It consists in considering the kernel $K(x_i, x)$ as a linear combination of $M$ kernels, i.e.

$$K(x_i, x) = \sum_{k=1}^{M} d_k K_k(x_i, x), \qquad d_k \ge 0, \quad \sum_{k=1}^{M} d_k = 1. \tag{2.33}$$

Each kernel $K_k$ is thus associated with a subset of features, which can correspond, for instance, to different brain regions or to different imaging modalities. When the kernels $K_k$ are linear, the solution of the optimization problem simultaneously attributes a weight $d_k$ to each kernel and a weight $w$ to each feature value. The weights $d_k$, which are nonnegative by construction, indicate the importance of each subset of features in the classification function. Sparsity in the kernel weights is enforced by the L1 constraint on $d$, i.e. some weights can be exactly zero, in which case the corresponding subset of features does not contribute to the model.
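To make the kernel combination of Eq. (2.33) concrete, here is a minimal sketch, assuming NumPy and linear base kernels, with the kernel weights ($d$ in Eq. 2.33) fixed by hand. It does not learn the weights: the actual MKL optimization of Bach et al. [2004] or Rakotomamonjy et al. [2008] is not shown. The function name, the feature grouping and the toy data are hypothetical. It also checks two claims from the text: with linear kernels, each feature still carries a weight (the combination is a linear kernel on features rescaled by $\sqrt{d_k}$), and a null kernel weight removes the corresponding features from the model.

```python
import numpy as np

def combined_linear_kernel(X, groups, d):
    """Weighted sum of linear kernels, one per feature subset (Eq. 2.33).

    X      : (n, p) data matrix
    groups : list of index arrays, one per kernel K_k (hypothetical grouping)
    d      : kernel weights, required to lie on the simplex (d_k >= 0, sum = 1)
    """
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0) and np.isclose(d.sum(), 1.0), "d must lie on the simplex"
    n = X.shape[0]
    K = np.zeros((n, n))
    for d_k, idx in zip(d, groups):
        X_k = X[:, idx]              # features of subset k (e.g. one brain region)
        K += d_k * (X_k @ X_k.T)     # linear kernel on subset k, weighted by d_k
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
d = [0.5, 0.5, 0.0]                  # third subset switched off by a null weight

K = combined_linear_kernel(X, groups, d)

# With linear kernels, the combination equals a plain linear kernel on features
# rescaled by sqrt(d_k), so every feature still receives its own weight.
X_scaled = np.hstack([np.sqrt(dk) * X[:, idx] for dk, idx in zip(d, groups)])
print(np.allclose(K, X_scaled @ X_scaled.T))   # True

# A null kernel weight means the corresponding features do not affect K at all.
X_perturbed = X.copy()
X_perturbed[:, 4:6] += 10.0
print(np.allclose(K, combined_linear_kernel(X_perturbed, groups, d)))  # True
```

In a full MKL solver the vector $d$ would be optimized jointly with the classifier; the simplex constraint checked here is what allows some $d_k$ to become exactly zero.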

Lasso

Tibshirani [1996] proposed a linear regression method with L1 penalization of the feature weights. The resulting model $f(x) = w^T x + b$ is the solution of the following optimization problem:

$$\min_{w,b} \left( \frac{1}{2} \sum_{i=1}^{n} \left(y_i - w^T x_i - b\right)^2 + \lambda \sum_{j=1}^{m} |w_j| \right). \tag{2.34}$$
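Problem (2.34) has no closed-form solution, but it can be solved by cyclic coordinate descent, in which each weight is updated by a soft-thresholding step. The following is a minimal sketch, assuming NumPy; coordinate descent is one common solver chosen here for illustration, not necessarily the one used in this thesis, and the function names `soft_threshold` and `lasso_cd` are hypothetical. The unpenalized intercept $b$ is eliminated by centering the data, since its optimal value given $w$ is $\bar{y} - \bar{x}^T w$.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrinks z towards 0, returns exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for Eq. (2.34); the intercept b is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering removes b from the problem
    n, m = Xc.shape
    w = np.zeros(m)
    col_sq = (Xc ** 2).sum(axis=0)           # squared norm of each feature column
    r = yc.copy()                            # current residual yc - Xc @ w
    for _ in range(n_iter):
        for j in range(m):
            r += Xc[:, j] * w[j]             # residual with feature j left out
            rho = Xc[:, j] @ r
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= Xc[:, j] * w[j]             # put the updated feature j back
    b = y_mean - x_mean @ w                  # optimal intercept given w
    return w, b

# Toy, noise-free data (hypothetical): y depends on features 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.5

w, b = lasso_cd(X, y, lam=0.0)          # no penalty: ordinary least squares fit
w_big, _ = lasso_cd(X, y, lam=1e4)      # strong penalty: every weight is exactly zero
```

The soft-thresholding step is where sparsity comes from: whenever the correlation `rho` of a feature with the current residual is smaller than $\lambda$ in absolute value, its weight is set exactly to zero rather than merely shrunk.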

Through the L1 penalization of the weights, the Lasso enforces sparsity of the weight vector and thus embeds variable selection. For classification problems, the Lasso penalty can be applied to Logistic Regression which, with labels encoded as $y_i \in \{0, 1\}$, gives rise to the following formulation:

$$\max_{w,b} \left( \sum_{i=1}^{n} \left[ y_i \left(w^T x_i + b\right) - \log\left(1 + e^{w^T x_i + b}\right) \right] - \lambda \sum_{j=1}^{m} |w_j| \right). \tag{2.35}$$
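Problem (2.35) combines a smooth log-likelihood with a non-smooth L1 term, which makes proximal gradient ascent (ISTA) a natural solver. The sketch below assumes NumPy and labels $y_i \in \{0, 1\}$; ISTA is swapped in here for illustration and is not claimed to be the solver used in this thesis, and the names `sigmoid` and `l1_logistic` are hypothetical. The step size $4 / \|[X, \mathbf{1}]\|_2^2$ is the inverse of a Lipschitz constant of the log-likelihood gradient, since the logistic variance is at most $1/4$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_logistic(X, y, lam, n_iter=2000):
    """Proximal gradient ascent on Eq. (2.35), labels y in {0, 1}; b is not penalized."""
    n, m = X.shape
    A = np.hstack([X, np.ones((n, 1))])       # design matrix augmented for the intercept
    step = 4.0 / np.linalg.norm(A, 2) ** 2    # 1/L with L = ||A||_2^2 / 4
    w, b = np.zeros(m), 0.0
    for _ in range(n_iter):
        resid = y - sigmoid(X @ w + b)        # log-likelihood gradient is X^T resid (and sum for b)
        b += step * resid.sum()               # plain ascent step on the unpenalized intercept
        v = w + step * (X.T @ resid)          # ascent step on the weights
        w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)   # soft-thresholding (L1 prox)
    return w, b

# Toy, separable data (hypothetical): the label depends on features 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (X @ np.array([3.0, 0.0, -3.0]) > 0).astype(float)

w, b = l1_logistic(X, y, lam=1.0)
acc = np.mean((X @ w + b > 0) == (y == 1))   # training accuracy of the sparse classifier
w0, _ = l1_logistic(X, y, lam=1e6)           # huge penalty: all weights stay exactly zero
```

Both $b$ and $w$ are updated from the same gradient `resid`, so each iteration is a single proximal gradient step on the full parameter vector.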

CHAPTER 2. PRINCIPLES OF MACHINE LEARNING

Group Lasso

The link between Lasso and Group Lasso is similar to the one between SVM and MKL: Group Lasso is a version of Lasso that takes groups of features into account. Such a method is convenient when variables are naturally organized in groups, such as genes in the same biological pathway or voxels in the same brain region. This method was first introduced in [Yuan and Lin, 2006]. The penalty is applied at the group level, so that the optimization problem for regression is as follows:

$$\min_{w,b} \left( \frac{1}{2} \sum_{i=1}^{n} \left(y_i - w^T x_i - b\right)^2 + \lambda \sum_{g=1}^{G} \sqrt{m_g} \, \|w_g\|_2 \right). \tag{2.36}$$

In this formulation, there are $G$ groups, the cardinality of group $g$ is $m_g$ and its coefficient vector is $w_g$. Since $\|w_g\|_2$ is zero only if all its components are zero, and since this norm is not differentiable at zero, the penalty sets entire groups of coefficients to zero at once: it enforces sparsity at the group level rather than among individual features. A Logistic Regression adaptation for classification was proposed by Meier et al. [2008]. Other adaptations of the group lasso include the sparse group lasso [Friedman et al., 2010] and the overlap and graph group lasso [Jacob
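The group-level sparsity of Eq. (2.36) can be seen in the proximal operator of the penalty, which shrinks each block $w_g$ as a whole and sets it entirely to zero when its norm is small enough. The following is a minimal sketch, assuming NumPy, of a proximal gradient (ISTA) solver for the regression problem; the solver choice, the function names `group_soft_threshold` and `group_lasso`, and the toy data are illustrative assumptions, and the groups are assumed to form a partition of the features (non-overlapping, as in Eq. 2.36).

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t*||.||_2: sets the whole block to 0 when ||v||_2 <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Proximal gradient (ISTA) for Eq. (2.36); b is unpenalized and handled by centering.

    groups is assumed to be a partition of the feature indices (hypothetical input).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(Xc, 2) ** 2      # 1/L for the squared-loss term
    for _ in range(n_iter):
        v = w + step * (Xc.T @ (yc - Xc @ w))    # gradient step on the smooth part
        for idx in groups:
            m_g = len(idx)                        # group cardinality m_g
            w[idx] = group_soft_threshold(v[idx], step * lam * np.sqrt(m_g))
    b = y_mean - x_mean @ w                       # optimal intercept given w
    return w, b

# Toy, noise-free data (hypothetical): only the first group is relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.0]) + 0.5
groups = [np.array([0, 1]), np.array([2, 3])]

w, b = group_lasso(X, y, groups, lam=0.0)         # no penalty: least squares fit
w_big, _ = group_lasso(X, y, groups, lam=1e4)     # strong penalty: every group is zero
```

The $\sqrt{m_g}$ factor in the threshold follows Eq. (2.36): it compensates for the fact that larger groups naturally have larger norms.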

Chapter 3

Machine learning in neuroimaging

Chapter overview

This chapter provides the neuroimaging background of this thesis. Image preprocessing and the state of the art of machine learning in this field are thus discussed here. We first describe what brain imaging is and how this type of data is handled. In Section 3.2, we provide a non-exhaustive list of relevant publications in the field of machine learning for neuroimaging, in particular on computer-aided diagnosis systems for Alzheimer’s disease. We conclude the chapter by describing the data used in our experiments.

3.1 Principles of neuroimaging

This section introduces the basic concepts of the neuroimaging field. In particular, we describe the different imaging modalities used later in the manuscript, their preprocessing, and the classical univariate statistical approach to inference.

Figure 3.1 illustrates three widely used modalities: structural MRI, functional MRI and PET imaging. The modality chosen for a study depends on the question being investigated.

Structural imaging provides information about brain anatomy and can help, for instance, to localize a lesion. Volume measurements of brain regions can also reveal atrophy or hypertrophy linked to cognitive behaviour or disease. Functional imaging, by contrast, aims at characterizing brain activity: functional MRI (fMRI) data are generally used to study the link between brain activity and cognitive tasks through the blood-oxygenation-level-dependent (BOLD) signal. Finally, PET images reflect metabolic processes, which depend on the radiotracer used. This modality is often used for the diagnosis of neurodegenerative diseases and the detection of brain tumours.

We describe these modalities in Subsection 3.1.1. In Subsection 3.1.2, we provide a brief overview of the preprocessing techniques required for the analysis of neuroimages. This stage is mandatory in particular before any multi-subject analysis, because brain images must be made comparable across subjects.


Figure 3.1 – Different brain modalities: (a) structural MRI, providing information about brain anatomy; (b) functional MRI, capturing variations in neuronal oxygen consumption and thus brain activation; (c) PET scan, reflecting metabolism.

Figure 3.2 – Matrix structure of a neuroimage. Image taken from [Prince and Links, 2006].

In document Characterization of neurodegenerative diseases with tree ensemble methods: the case of Alzheimer's disease (Page 36-39)