
5.4 Preprocessing methods

5.4.2 Neighbourhood based averaging

The main drawback of atlas based averaging is that it requires knowledge of the group structure. The alternative preprocessing technique proposed in this section tries to circumvent this issue. The approach only assumes that the group structure is consistent with some given spatial organization of the features.

The general idea of the approach, called neighbourhood based averaging, is very similar to the previous atlas based averaging approach, except that instead of using a pre-defined partition of the features into non-overlapping groups, it relies on a function G that generates a potentially large set of z groups of features. These groups are now expected to overlap and to be consistent with the neighbourhood relationship defined on the features, i.e., to be composed only of features that are spatially contiguous. From these groups, the algorithm computes a new learning sample with one new feature per group, obtained by averaging the values of the original features in the group. Importance scores are then derived for each group from a Random Forests model trained on the new input matrix, and these scores are mapped back to each original feature by averaging the importances of the groups to which that feature belongs. The procedure is described more formally in Algorithm 3.

CHAPTER 5. EXPLOITING SPATIAL AND GROUP STRUCTURE


Unlike atlas based averaging, the approach does not require any a priori information about the exact group structure. The idea is that if relevant groups are included among (or close to some of) the groups generated by the function G, the Random Forests algorithm will be able to identify them among all the other candidate groups. It does, however, require defining the group generator G, which will typically depend on some hyper-parameters. Below, we propose and experiment with two group generators, specifically designed for the artificial and real datasets respectively.

Algorithm 3 Generic neighbourhood based averaging algorithm

Require: Learning sample $LS$, algorithm $RF$ to obtain importance scores from a forest, group division generator $G$ of size $z$.
1: Generate groups $(G_1, \ldots, G_z) = G(LS)$.
2: for $j = 1 : z$ do
3:     $x_{new_j} = \frac{1}{\#G_j} \sum_{x_k \in G_j} x_k$
4: end for
5: Let $LS_{new}$ be $LS$ where original features are replaced by new features $(x_{new_1}, \ldots, x_{new_z})$.
6: Compute variable importance scores $(s_{g_1}, \ldots, s_{g_z}) = RF(LS_{new})$.
7: Compute importance $s_i$ of original feature $x_i$ (for $i = 1, \ldots, m$) as follows: $s_i = \sum_{k \in \{1, \ldots, z\} \mid x_i \in G_k} s_{g_k}$.
8: Divide $s_i$ by the number of different groups to which feature $x_i$ belongs, i.e., $|\{k \in \{1, \ldots, z\} \mid x_i \in G_k\}|$.
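The steps of Algorithm 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the names `neighbourhood_averaging` and `importance_fn` are hypothetical, groups are assumed to be lists of distinct feature indices, and `importance_fn` is a placeholder for the RF step, so any procedure returning one score per new feature can be plugged in.

```python
from statistics import fmean


def neighbourhood_averaging(X, y, groups, importance_fn):
    """Illustrative sketch of Algorithm 3 (hypothetical helper).

    X: list of samples, each a list of m original feature values.
    y: list of outputs, passed unchanged to importance_fn.
    groups: list of z groups, each a list of distinct feature indices.
    importance_fn: placeholder for RF; maps (X_new, y) to z group scores.
    """
    m = len(X[0])

    # Steps 2-5: build the new learning sample, one averaged feature per group.
    X_new = [[fmean(row[k] for k in group) for group in groups] for row in X]

    # Step 6: one importance score per group.
    group_scores = importance_fn(X_new, y)

    # Step 7: sum, for each original feature, the scores of its groups.
    totals = [0.0] * m
    counts = [0] * m
    for group, score in zip(groups, group_scores):
        for k in group:
            totals[k] += score
            counts[k] += 1

    # Step 8: divide by the number of groups containing the feature.
    # Assumption: a feature that belongs to no group gets a score of 0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

With scikit-learn, for instance, `importance_fn` could fit a `RandomForestClassifier` on `X_new` and return its `feature_importances_` attribute; this pairing is an assumption made for illustration, since the section does not name a particular implementation.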

Artificial datasets

In the artificial datasets, there is a linear organization of the features, since the groups correspond to blocks (of variable sizes) of features along their original ordering. The group generator function that we consider generates groups in the form of blocks of contiguous features of fixed size along this ordering. We set the block size to s + 1, with s even, so that each block can be considered as centred on one of the original features and contains this feature and its s closest neighbours in the ordering. If there are m features originally, then the number of generated groups is z = m − s.
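As a minimal sketch of such a linear group generator, assuming features are identified by their 0-based position in the ordering (the function name `linear_blocks` is hypothetical), one block is created for every feature that has s/2 neighbours on each side, which yields exactly m − s blocks of s + 1 features:

```python
def linear_blocks(m, s):
    """Return the z = m - s blocks of s + 1 contiguous feature indices."""
    if s % 2 != 0 or not 0 <= s < m:
        raise ValueError("s must be even and satisfy 0 <= s < m")
    half = s // 2
    # One block per centre that has s/2 neighbours on each side.
    return [list(range(c - half, c + half + 1)) for c in range(half, m - half)]
```

For m = 2000 and s = 20, this gives 1980 overlapping blocks of 21 features each; the first block covers indices 0 to 20 and the last one covers indices 1979 to 1999.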

We analyse different neighbourhood sizes s in Figure 5.7 for T = 200, with K = 1 in Figure 5.7(a) and Kd in Figure 5.7(b) (with Kd = √(m − s) in the case of neighbourhood based averaging). The tested sizes are 20, 40 and 80, where 40 corresponds to the average size of a true group by construction of the artificial datasets (50 groups for a total of 2000 features). Figure 5.8 illustrates the effect of the preprocessing on importance scores on the first artificial dataset (with Kd, 50 samples, and s = 20).

In Figure 5.7(a), we observe that, for K = 1, the procedure provides better AUPR values than the baseline in almost all settings. Notably, for s = 20, the preprocessing approach always beats the baseline, while it does not perform better than the baseline with s = 80 and 500 samples. The value of s has an observable impact on the efficiency of the method, with the best performance depending on the number of samples. For Kd in Figure 5.7(b), neighbourhood based averaging improves the results for all values of s only with 50 samples. It still improves them for 100 samples when s = 20 and s = 40, but it deteriorates performance in all other settings.

Figure 5.7 – T = 200 and z = m − s. Linear neighbourhood based averaging on the artificial datasets. AUPRs of Random Forests feature ranking. The AUPR values are averaged over 20 datasets in each case. (a) K = 1. (b) Kd.

Figure 5.8 – T = 200, Kd. Linear neighbourhood based averaging on the first artificial dataset for 50 samples and s = 20. Distribution of importance scores. The numbers on the x-axis represent the groups to which the features belong. They are placed at the end of the group.
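The AUPR values discussed here score how well a ranking of the features by importance recovers the truly relevant ones. The exact estimator is not restated in this section; the sketch below assumes the step-wise estimate of the area under the precision-recall curve (often called average precision), and the function name `average_precision` is hypothetical.

```python
def average_precision(scores, relevant):
    """Step-wise AUPR estimate of a feature ranking.

    scores: importance score of each feature (higher means more relevant).
    relevant: set of indices of the truly relevant features.
    """
    # Rank features by decreasing importance (ties keep index order).
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(ranking, start=1):
        if i in relevant:
            hits += 1
            total += hits / rank  # precision at each recall increment
    return total / len(relevant) if relevant else 0.0
```

A ranking that places all relevant features first obtains a value of 1, and the value decreases as irrelevant features are ranked ahead of relevant ones.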


Figure 5.9 – T = 1000. Spherical neighbourhood based averaging on the real dataset. AUPRs of Random Forests feature ranking. The AUPR values are averaged over 10 runs.

Real dataset

In the real dataset, the spatial organization of the features is three-dimensional. Therefore, groups should be generated in a different way. We take inspiration from the searchlight approach used in the field of fMRI [Kriegeskorte et al., 2006]. This multivariate pattern analysis method consists in evaluating the information contained in spherical volumes centred on every voxel of the brain [Kriegeskorte et al., 2006]. In particular, the method associates with each voxel the capability of its searchlight region to distinguish the studied conditions. This results in a statistical map in which each value is interpreted relative to a sphere of voxels instead of an individual voxel.

Inspired by this method, our group generator G creates groups of the atlas by associating with each feature a sphere of radius R centred on this feature. The amount of overlap between the groups directly depends on the value of R. In this configuration, the new input matrix has as many features as the original one. We propose to evaluate three distinct settings: R = 10 mm, R = 16 mm, and R = 20 mm. They respectively correspond to spheres composed of around 470, 1830, and 3480 features. These values have been chosen to explore neighbourhoods of sizes lower than, close to, and greater than the average group size in the AAL atlas (i.e., 1431 features; see Appendix C).
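A spherical group generator of this kind can be sketched as follows. This is an illustration under assumptions, not the code used for the experiments: the name `spherical_groups` is hypothetical, voxel positions are assumed to be given as (x, y, z) coordinates in millimetres, and a voxel is assumed to belong to a sphere when its distance to the centre is at most R.

```python
from math import dist


def spherical_groups(coords, radius):
    """Return one group per voxel: the indices of all voxels within `radius`.

    coords: list of (x, y, z) voxel positions, in the same unit as `radius`.
    """
    # Brute-force search over all pairs of voxels, for clarity only.
    return [
        [k for k, other in enumerate(coords) if dist(centre, other) <= radius]
        for centre in coords
    ]
```

The number of groups equals the number of voxels, so the new input matrix keeps its original width. The brute-force search is quadratic in the number of voxels; on a full brain volume, a spatial index or a precomputed offset mask would be a more practical choice.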

Results are shown in Figure 5.9. We observe that neighbourhood based averaging improves AUPR values compared to the baseline for any value of R. The case K = 1 shows higher variance than Kd. The value of R influences the performance of the approach, and the best AUPR value is obtained for R = 16 mm, both for K = 1 and Kd. It is worth noting that Kd corresponds to K = √m both for the baseline and for the alternative procedures, as each feature value is replaced by the average of the feature values in a sphere centred on it.

5.4.3 Discussion

Although results are very good for the atlas based averaging procedure, it is quite rare in general to know perfectly the groups to which the relevant and irrelevant features belong. Indeed, in neuroimaging, we often work with an atlas that divides the brain into several groups according to anatomical or functional considerations. This kind of feature separation links a particular voxel to a brain area and thus helps in interpreting the role of the variables highlighted by the methods. Although this type of atlas is available to interpret results, relevant groups of features will not necessarily match an entire brain area. Relevant groups of voxels can correspond to small fractions of brain areas or can be distributed over two adjacent regions.

We therefore proposed another approach, called neighbourhood based averaging. This approach does not depend on an a priori defined atlas. For both the artificial datasets and the real dataset, it provided good improvements compared to the baseline for both K values and for different atlas sizes. Although the choice of s or R can be difficult in practice, this approach seems promising. The approach is furthermore generic.