Select Hierarchical Information-Preserving (HIP) Features

The Select Hierarchical Information-Preserving (HIP) Features method focuses only on eliminating the hierarchical redundancy in the set of selected features, ignoring the relevance values of individual features. Recall that two features are hierarchically redundant, in a given instance, if they have the same value in that instance and are located in the same path from a root to a leaf node in the feature graph (for more details on hierarchical redundancy, see Chapter 2). The motivation for eliminating the hierarchical redundancy among selected features is that some types of classification algorithms, like Naïve Bayes, are particularly sensitive to redundancy among features, as discussed earlier.

The pseudocode of the HIP method is shown in Algorithm 4.1, where TrainSet and TestSet denote the training and testing datasets, each consisting of all input features; A(x_i) and D(x_i) denote the sets of ancestors and descendants (respectively) of the feature x_i; Status(x_i) denotes the selection status (“Selected” or “Removed”) of the feature x_i; Inst<w> denotes the current instance being classified in TestSet; Value(x_i, w) denotes the value of feature x_i (“1” or “0”) in that instance; A_ij denotes the jth ancestor of the feature x_i; D_ij denotes the jth descendant of the feature x_i; TrainSet_FS denotes the reduced version of the training dataset that contains only the features whose status is “Selected”; and Inst_FS<w> denotes the reduced version of instance w that contains only the features whose status is “Selected”.

In the first part of Algorithm 4.1 (lines 1–8), the algorithm constructs the DAG of features, finds all ancestors and descendants of each feature in the DAG, and initialises the status of each feature as “Selected”. While a testing instance is being processed, some features will have their status set to “Removed”, whilst other features will keep the status “Selected” throughout. When the selection loop for that instance finishes, the set of features with status “Selected” is returned as the set of selected features.
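The initialisation step can be illustrated with a minimal Python sketch. It assumes the DAG is supplied as a mapping from each feature to its set of parents; the function name `build_closures` and the four-node DAG are hypothetical and serve only as an illustration, not as the implementation used in this work.

```python
def build_closures(parents):
    """Return (ancestors, descendants) for a DAG given as child -> set of parents."""
    nodes = set(parents)
    for ps in parents.values():
        nodes |= set(ps)

    ancestors = {}

    def collect(node):
        # Memoised depth-first walk up the parent links.
        if node not in ancestors:
            found = set()
            for p in parents.get(node, ()):
                found.add(p)
                found |= collect(p)
            ancestors[node] = found
        return ancestors[node]

    for n in nodes:
        collect(n)

    # Descendants are the inverse relation of ancestors.
    descendants = {n: set() for n in nodes}
    for n, ancs in ancestors.items():
        for a in ancs:
            descendants[a].add(n)
    return ancestors, descendants


# Hypothetical 4-node DAG: r is the root, and c has two parents (a and b).
parents = {"a": {"r"}, "b": {"r"}, "c": {"a", "b"}}
ancestors, descendants = build_closures(parents)

# Every feature starts with status "Selected" (lines 4-8 of Algorithm 4.1).
status = {n: "Selected" for n in ancestors}

print(sorted(ancestors["c"]))    # ['a', 'b', 'r']
print(sorted(descendants["r"]))  # ['a', 'b', 'c']
```

Because the ancestor and descendant sets depend only on the DAG, they need to be computed once and can be reused for every testing instance.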

In the second part of Algorithm 4.1 (lines 9–27), the algorithm performs feature selection for each testing instance in turn, using a lazy learning approach. For each instance and each feature x_i, the algorithm checks the value of x_i in that instance. If x_i has value “1”, all its ancestors in the DAG have their status set to “Removed”, since the value “1” of each ancestor is redundant, being logically implied by the value “1” of x_i. If x_i has value “0”, all its descendants have their status set to “Removed”, since the value “0” of each descendant is redundant, being logically implied by the value “0” of x_i.

Chapter 4. Lazy Hierarchical Feature Selection Methods with Naïve Bayes 58

Figure 4.1: Example of a Small DAG of Features. [Figure: a DAG over the 12 features A–L, in which each node is annotated with its relevance value and its binary value in the example instance.]
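The removal rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `hip_select` and the three-node chain (root, mid, leaf) are hypothetical, and the ancestor and descendant sets are assumed to be precomputed.

```python
def hip_select(values, ancestors, descendants):
    """Return the features that keep status "Selected" for one instance."""
    status = {f: "Selected" for f in values}
    for f, v in values.items():
        # Value 1 makes every ancestor redundant; value 0 makes every descendant redundant.
        redundant = ancestors[f] if v == 1 else descendants[f]
        for g in redundant:
            status[g] = "Removed"
    return {f for f, s in status.items() if s == "Selected"}


# Hypothetical chain: root -> mid -> leaf.
ancestors = {"root": set(), "mid": {"root"}, "leaf": {"root", "mid"}}
descendants = {"root": {"mid", "leaf"}, "mid": {"leaf"}, "leaf": set()}

all_ones = hip_select({"root": 1, "mid": 1, "leaf": 1}, ancestors, descendants)
mixed = hip_select({"root": 1, "mid": 0, "leaf": 0}, ancestors, descendants)
all_zeros = hip_select({"root": 0, "mid": 0, "leaf": 0}, ancestors, descendants)

print(sorted(all_ones))   # ['leaf']
print(sorted(mixed))      # ['mid', 'root']
print(sorted(all_zeros))  # ['root']
```

In each case the surviving features are the most specific feature with value “1” and the most general feature with value “0” on the path.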

To show how the second part of Algorithm 4.1 works, we use as an example a hypothetical testing instance with just 12 features, denoted by the letters A–L. Figure 4.1 shows a small hypothetical DAG specifying the hierarchical relationships among the features of this instance. In Figure 4.1, the relevance and the value (“1” or “0”) of each feature are shown on the left (in bold) and on the right, respectively, of the node representing that feature. Note that the HIP feature selection method uses only information about the feature values and their hierarchical relationships; the features’ relevance values are used only by the two other feature selection methods described later.

With respect to the example DAG in Figure 4.1, lines 10–20 of Algorithm 4.1 work as follows. When feature A is processed, the selection status of its ancestor features D, J, C and K is set to “Removed” (lines 12–14), since the value “1” of A logically implies the value “1” of all of A’s ancestors. Analogously, when feature B is processed, the selection status of its descendant features G, I, F, L and E is set to “Removed” (lines 16–18), since the value “0” of B logically implies the value “0” of all of B’s descendants. When feature C (with value “1”) is processed, its ancestor K has its status set to “Removed”. The remaining features are processed in the same way, one at a time.

Note that the status of a feature may be set to “Removed” more than once, as happened for feature K in the example above. However, once the status of a feature has been set to “Removed”, it cannot be reset to “Selected” while the current instance is being processed. Hence, the result of Algorithm 4.1 does not depend on the order in which the features are processed.
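The order-independence property can be checked exhaustively on a small case. The sketch below is an illustration only: it assumes a hypothetical diamond-shaped DAG with four features and a hypothetical instance, and it applies the removal rule under every possible processing order. Since each step only adds features to the removed set, every order yields the same result.

```python
from itertools import permutations

# Hypothetical diamond-shaped DAG: r -> a, r -> b, a -> c, b -> c.
ancestors = {"r": set(), "a": {"r"}, "b": {"r"}, "c": {"r", "a", "b"}}
descendants = {"r": {"a", "b", "c"}, "a": {"c"}, "b": {"c"}, "c": set()}
values = {"r": 1, "a": 1, "b": 0, "c": 0}  # hypothetical instance


def select(order):
    """Apply the HIP removal rule, visiting the features in the given order."""
    removed = set()
    for f in order:
        # Removal is a set union, so the visiting order cannot change the outcome.
        removed |= ancestors[f] if values[f] == 1 else descendants[f]
    return frozenset(set(values) - removed)


# Collect the distinct results over all 24 processing orders.
results = {select(order) for order in permutations(values)}

print(len(results))                 # 1
print(sorted(next(iter(results))))  # ['a', 'b']
```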

After all features in the example DAG have been processed, the features selected by the loop in lines 10–20 are A, B and H. Note that these three core features contain the complete hierarchical information associated with all the features in the DAG of Figure 4.1, in the sense that the observed values of these three core features logically imply the values of all other features in that DAG.
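The worked example can be reproduced with a short Python sketch. The edge set below is an assumption: it was chosen only so that the ancestors of A are D, J, C and K, the descendants of B are G, I, F, L and E, and the only ancestor of C is K, as stated in the text. It should not be read as the actual topology of Figure 4.1. The feature values are likewise assumed so as to be consistent with the example.

```python
# Hypothetical edges (child -> parents), consistent with the ancestor and
# descendant sets stated in the text but not taken from Figure 4.1.
parents = {
    "C": {"K"}, "D": {"J"}, "A": {"C", "D"},
    "G": {"B"}, "F": {"B"}, "E": {"B", "H"},
    "I": {"G"}, "L": {"F"},
}
features = list("ABCDEFGHIJKL")
# Assumed values: A and its ancestors are 1, B and its descendants are 0, H is 1.
values = dict(zip(features, [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]))


def ancestors_of(node, cache):
    """Memoised walk up the parent links of the DAG."""
    if node not in cache:
        cache[node] = set()
        for p in parents.get(node, ()):
            cache[node] |= {p} | ancestors_of(p, cache)
    return cache[node]


cache = {}
ancestors = {f: ancestors_of(f, cache) for f in features}
descendants = {f: {g for g in features if f in ancestors[g]} for f in features}

# Lines 10-20 of Algorithm 4.1: accumulate the redundant features.
removed = set()
for f in features:
    removed |= ancestors[f] if values[f] == 1 else descendants[f]
selected = sorted(set(features) - removed)
print(selected)  # ['A', 'B', 'H']

# Check that the selected values logically imply every removed value.
implied = {}
for f in selected:
    for g in (ancestors[f] if values[f] == 1 else descendants[f]):
        implied[g] = values[f]
assert set(implied) == removed
assert all(implied[g] == values[g] for g in removed)
```

The final assertions correspond to the information-preserving property: the value of every removed feature can be recovered from the values of A, B and H alone.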

Next, the training dataset and the current testing instance are reduced to contain only the features whose status is “Selected” (lines 21–22), and the reduced instance is classified by Naïve Bayes (line 23). Finally, the status of every feature is reset to “Selected” (lines 24–26), in preparation for feature selection on the next testing instance.
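The reduction and classification steps can be sketched as follows. This is a minimal illustration, not the classifier implementation used in the experiments: it assumes a Bernoulli Naïve Bayes with Laplace smoothing written in plain Python, and the helper names, the four training rows and the class labels are all hypothetical.

```python
import math


def project(row, selected):
    """Keep only the features whose status is "Selected"."""
    return {f: row[f] for f in selected}


def naive_bayes_predict(train_rows, train_labels, instance):
    """Bernoulli Naive Bayes with Laplace smoothing over the instance's features."""
    n = len(train_rows)
    scores = {}
    for c in set(train_labels):
        rows_c = [r for r, y in zip(train_rows, train_labels) if y == c]
        log_p = math.log(len(rows_c) / n)  # class prior
        for f, v in instance.items():
            match = sum(1 for r in rows_c if r[f] == v)
            log_p += math.log((match + 1) / (len(rows_c) + 2))
        scores[c] = log_p
    return max(scores, key=scores.get)


# Hypothetical training data over four features; K is not selected for this instance.
train_set = [
    {"A": 1, "B": 0, "H": 1, "K": 1},
    {"A": 1, "B": 0, "H": 0, "K": 1},
    {"A": 0, "B": 1, "H": 1, "K": 1},
    {"A": 0, "B": 1, "H": 0, "K": 0},
]
labels = ["pos", "pos", "neg", "neg"]
instance = {"A": 1, "B": 0, "H": 1, "K": 1}
selected = ["A", "B", "H"]  # result of the selection loop for this instance

# Lines 21-22: reduce the training set and the instance to the selected features.
train_set_fs = [project(row, selected) for row in train_set]
inst_fs = project(instance, selected)

# Line 23: classify the reduced instance.
prediction = naive_bayes_predict(train_set_fs, labels, inst_fs)
print(prediction)  # pos
```

Because the selected feature set differs from one testing instance to the next, the reduced training set is rebuilt for every instance, which is what makes the approach lazy.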


Algorithm 4.1 Select Hierarchical Information-Preserving (HIP) Features

 1: Initialize DAG with all features in Dataset;
 2: Initialize TrainSet;
 3: Initialize TestSet;
 4: for each feature x_i do
 5:     Initialize A(x_i) in DAG;
 6:     Initialize D(x_i) in DAG;
 7:     Initialize Status(x_i) ← “Selected”;
 8: end for
 9: for each Inst<w> ∈ TestSet do
10:     for each feature x_i ∈ DAG do
11:         if Value(x_i, w) = 1 then
12:             for each ancestor A_ij ∈ A(x_i) do
13:                 Status(A_ij) ← “Removed”;
14:             end for
15:         else
16:             for each descendant D_ij ∈ D(x_i) do
17:                 Status(D_ij) ← “Removed”;
18:             end for
19:         end if
20:     end for
21:     Re-create TrainSet_FS with all features x_i where Status(x_i) = “Selected”;
22:     Re-create Inst_FS<w> with all features x_i where Status(x_i) = “Selected”;
23:     NaïveBayes(TrainSet_FS, Inst_FS<w>);
24:     for each feature x_i do
25:         Re-assign Status(x_i) ← “Selected”;
26:     end for
27: end for


In document Novel Hierarchical Feature Selection Methods for Classification and Their Application to Datasets of Ageing-Related Genes (Page 73-77)