Random Projection Forests - Automatic image annotation applied to habitat classification

The traditional implementation of Random Forests presents some limitations when its basic parameters, i.e. size, depth and number of randomly selected features in each node, increase. In particular, increasing the number of random features considered in each node can be quite time-consuming when the dimensionality of the feature vectors increases. To address these limitations, we have created Random Projection Forests (RPFs).

RPFs are the third contribution of this thesis. They were designed to be more efficient and accurate than traditional RFs. As will be shown in Section 7.6, RPFs require less execution time than RFs during both training and testing, particularly when two of their parameters increase: the size of the forest and the number of random features taken into consideration in the split nodes.

In Random Projection Forests, randomness is introduced in two ways. First, we use different random subsets of the training data to train different decision trees, which is referred to as bootstrapping [18]. Then, we use Random Projections [101] to reduce the dimensionality of the feature vectors. Random Projections have been used in conjunction with Random Forests in [103]. However, the authors of [103] follow a simple approach, projecting the input feature vectors once before training traditional Random Forests. This choice is not ideal, since it limits the randomness that Random Projections could infuse Random Forests with and, consequently, does not benefit from Random Projections as much as it could. In our case, we generate a random projection in each internal node and use it to project the samples that reach said node. Similarly to traditional random forests, RPFs are composed of two different types of nodes: split nodes and leaf nodes.
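The first source of randomness can be sketched as follows. This is only an illustration, assuming that each tree receives a sample of the training set drawn with replacement and of the same size as the training set, which is the usual meaning of bootstrapping; the function name `bootstrap_indices` and its parameters are hypothetical and do not come from the thesis.

```python
import numpy as np

def bootstrap_indices(n_samples, n_trees, seed=0):
    """Return one index array per tree, each drawn with replacement.

    Assumption: every bootstrap sample has the same size as the training set.
    """
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_trees)]

# Toy usage: 5 trees trained on a set of 100 samples.
per_tree_indices = bootstrap_indices(n_samples=100, n_trees=5)
```

Each tree would then be trained only on the rows selected by its own index array.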

As with Random Forests, the inputs to RPFs are the annotations of our database and the feature vectors extracted from the photographs themselves. Each forest is generated by training each binary decision tree breadth-wise until one of the stopping criteria is met. Similarly to the stopping criteria introduced in Section 7.2, a tree stops growing at a node when the number of samples that reach that node is 1 or when the tree has reached its maximum allowed depth.

• Split nodes: These nodes store a test function that splits the data. As mentioned in the previous section, the aim during training is to optimise the threshold of the split function in each node so that the trees are as accurate as possible [46]. Our approach is based on random projections [17], previously discussed in Chapter 2. Random Projections are a dimensionality reduction mechanism that enables us to project large feature vectors into scalar values using orthogonal vectors.

In our case, we use random projections to split incoming samples of an internal node to its two child nodes. Let $F = (f_1, f_2, \ldots, f_n)$ be the n-dimensional input feature vector of a node and $R = (r_1, r_2, \ldots, r_n)$ be an n-dimensional random vector, generated as follows:

\[
r_i =
\begin{cases}
-1 & \text{with probability } \tfrac{1}{3} \\
0 & \text{with probability } \tfrac{1}{3} \\
+1 & \text{with probability } \tfrac{1}{3}
\end{cases}
\tag{7.4}
\]

with $i = 1, 2, \ldots, n$.

We then project the input onto the random vector. This is done by calculating the inner product between the feature vector F and the random projection vector R as $p = F R^{T}$. Once projected, each feature vector is reduced to a single scalar value, and samples are distributed to the left or the right child node according to a threshold as:

\[
\begin{cases}
p \geq T & \text{go to left child} \\
\text{otherwise} & \text{go to right child}
\end{cases}
\tag{7.5}
\]

where T is a threshold value.
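The projection and routing steps of Equations 7.4 and 7.5 can be illustrated with a short NumPy sketch. The function names, the toy data and the threshold used here are assumptions made for illustration only; they are not taken from the thesis implementation.

```python
import numpy as np

def random_projection_vector(n_features, rng):
    # Each entry is -1, 0 or +1 with probability 1/3, as in Equation 7.4.
    return rng.choice(np.array([-1, 0, 1]), size=n_features)

def project(features, r):
    # Inner product of every row F with R (p = F R^T): one scalar per sample.
    return features @ r

def goes_left(p, threshold):
    # Equation 7.5: p >= T goes to the left child, everything else to the right.
    return p >= threshold

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
features = rng.random((6, 4))       # toy data: 6 samples, 4 features
r = random_projection_vector(4, rng)
p = project(features, r)
left_mask = goes_left(p, threshold=p.mean())  # arbitrary threshold for the demo
```

In this sketch a new vector `r` would be drawn for every internal node, so that the samples reaching different nodes are projected in different random directions.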

As can be seen, each feature vector, once projected, will be reduced to only one scalar value, the projection itself. This makes our RPFs much more efficient than traditional RFs. Since the projected feature vectors are simple scalar values, the calculation of the threshold value is quite simple. After projecting all the samples that have reached an internal node, we generate a user-specified number of equidistant thresholds, 10 by default, that range from the minimum projection to the maximum. Then, we select the threshold that maximises the Information Gain (IG), calculated as shown in Equation 7.2, so that the chosen split function is the one that produces the highest information gain in the final distributions [28].
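A minimal sketch of this threshold search is given below. It assumes that the IG of Equation 7.2 is the standard entropy-based information gain (parent entropy minus the size-weighted entropy of the two children) and that the candidate thresholds include both the minimum and the maximum projection; all function and parameter names are hypothetical.

```python
import numpy as np

def entropy(labels, n_classes):
    # Shannon entropy (in bits) of the class labels in a node.
    counts = np.bincount(labels, minlength=n_classes)
    probs = counts[counts > 0] / labels.size
    return -np.sum(probs * np.log2(probs))

def information_gain(labels, left_mask, n_classes):
    # Assumed form of the IG: parent entropy minus weighted child entropies.
    left, right = labels[left_mask], labels[~left_mask]
    if left.size == 0 or right.size == 0:
        return 0.0  # a split that sends every sample to one child gains nothing
    n = labels.size
    children = (left.size / n) * entropy(left, n_classes) \
             + (right.size / n) * entropy(right, n_classes)
    return entropy(labels, n_classes) - children

def best_threshold(projections, labels, n_classes, n_thresholds=10):
    # Equidistant candidates from the minimum to the maximum projection.
    candidates = np.linspace(projections.min(), projections.max(), n_thresholds)
    gains = [information_gain(labels, projections >= t, n_classes)
             for t in candidates]
    best = int(np.argmax(gains))  # first candidate with the highest IG
    return candidates[best], gains[best]
```

Because the samples have already been reduced to scalars, each candidate threshold only requires a comparison per sample and one IG evaluation.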

The computational requirements needed to train a Random Projection Forest are much smaller than those required to train a Random Forest. Instead of considering M splits in each internal node, each sample is projected into one scalar value, an operation that only requires multiplying the feature vector by the random vector. Moreover, in a Random Projection Forest of T random-projection decision trees, each with N split nodes, in which L thresholds are tested per node, the IG will be calculated L × T × N times (here T denotes the number of trees, not the threshold of Equation 7.5). Following the example introduced in Section 7.2, in a Random Projection Forest with 150 trees of depth 9 (512 nodes, 264 of those split nodes) in which 10 thresholds are considered, the IG will have to be calculated 10 × 150 × 264 = 396,000 times. That is 19,404,000 fewer IG calculations than in the corresponding scenario with Random Forests.

• Leaf nodes: At this stage, the leaf nodes are the same as those of traditional RFs.

In our case, they store a normalised probability distribution of the occurrence of all possible habitats. This probability is calculated as shown in Equation 7.3.
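As an illustration of what a leaf stores, the sketch below builds a normalised class histogram from the labels of the training samples that reach the leaf. It assumes that Equation 7.3 amounts to dividing the per-habitat counts by the number of samples in the leaf; the function name and the integer encoding of the habitats are hypothetical.

```python
import numpy as np

def leaf_distribution(labels, n_classes):
    # Count how often each habitat (encoded as 0 .. n_classes-1) occurs in the
    # leaf, then normalise so that the entries sum to 1.
    # Assumption: the leaf holds at least one sample, so the sum is non-zero.
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts / counts.sum()

# Toy usage: four samples reach the leaf, one of habitat 0 and three of habitat 2.
dist = leaf_distribution(np.array([0, 2, 2, 2]), n_classes=3)
```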

The whole procedure of building a random projection decision tree is summarised in Algorithm 1. The pseudocode describing how to build a Random Projection Forest is shown in Algorithm 2.
