Random Forest Classifier - Construction Trees

Chapter 4 Model Descriptions

4.2 Construction Trees

4.2.3 Random Forest Classifier

The Random Forest Classifier (RFC) is a high-order approach to machine learning that employs an ensemble of decision tree learners, in conjunction with feature bagging, to constitute a strong overall classifier. Such a composition strategy can be regarded as a meta-learning approach to problem-solving [143]. The RFC methodology was first proposed by Tin Kam Ho [144, 145] and then developed into its current form by Breiman [146]. Importantly, the individual decision tree base learners produced by the RFC procedure are trained independently of one another, which keeps the correlation between them low [147]. One reason RFC has become popular in the machine learning field is that the same algorithm can be used for both regression and classification.

RFC has become a prominent ensemble learning algorithm in recent decades, facilitating the learning of complex functions in numerous task domains [148]. The classifier produced is an intuitive model that provides a robust probabilistic structure for solving a number of learning tasks. Following a divide-and-conquer strategy, RFC efficiently partitions a high-dimensional attribute space and places a probability distribution over the resulting partitions. The algorithm therefore allows density estimation for arbitrary functions, and can be applied to clustering, regression or classification tasks. The methodology of RFC is described in Equations 4.6 and 4.7.


$$ f(x) = \frac{1}{m} \sum_{i=1}^{m} f(x, x_{ip}) \tag{4.6} $$

where $x$ is the variable for which partial dependence is required, and $x_{ip}$ denotes the other variables in the data.
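The averaging in Equation 4.6 can be illustrated with a minimal sketch. The names `partial_dependence`, `toy_model` and `others` are hypothetical and chosen only for illustration; the toy model stands in for a fitted forest and is not part of the original study.

```python
def partial_dependence(model, x, other_values):
    """Average model(x, x_ip) over the m observed values of the other variables."""
    m = len(other_values)
    return sum(model(x, x_ip) for x_ip in other_values) / m


# Hypothetical stand-in for a fitted model: depends on x and one other variable.
def toy_model(x, other):
    return 2.0 * x + other


# Observed values of the other variable in m = 4 training instances (made up).
others = [0.0, 1.0, 2.0, 3.0]

# Partial dependence at x = 1.5: the other variable is averaged out.
pd_value = partial_dependence(toy_model, 1.5, others)
print(pd_value)
```

Holding $x$ fixed while averaging over the observed values of the remaining variables isolates the effect of $x$ on the prediction.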

$$ f(x) = \log t_j - \frac{1}{J} \sum_{k=1}^{J} \log t_k(y) \tag{4.7} $$

where $J$ is the number of classes, $j$ refers to a class, and $t_k$ is the proportion of total votes for class $k$, so that $t_j$ is the proportion for class $j$.
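As a small worked illustration of Equation 4.7, the sketch below computes the log vote proportion of one class, centred by the mean log proportion over all classes. The function name `centred_log_votes` and the vote proportions are assumptions made for the example, and every proportion is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def centred_log_votes(vote_fractions, j):
    """log t_j minus the mean of log t_k over all J classes."""
    J = len(vote_fractions)
    mean_log = sum(math.log(t) for t in vote_fractions) / J
    return math.log(vote_fractions[j]) - mean_log


# Hypothetical vote proportions for J = 3 classes (they sum to 1).
votes = [0.7, 0.2, 0.1]
scores = [centred_log_votes(votes, j) for j in range(len(votes))]
print(scores)
```

By construction the centred values sum to zero across classes, so a positive value marks a class that receives more than the (geometric) average share of votes.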

Given a set of M features, each decision tree is built using m features selected at random from that set at each node [149]. The best split is computed over these m features, and this continues until the tree is fully grown, with no need for pruning. The task is repeated for every decision tree in the forest, each using a different bootstrap sample of the medical data [149]. New instances are then classified by majority vote. RFC thus combines decision tree classifiers with bagging (refer to Algorithm 4.2) [149]. In the bagging method, a number of decision trees are constructed on bootstrapped training datasets. However, when building these trees, each time a split is considered, a random sample of 𝑚 predictors is selected as split candidates from the complete set of 𝑝 predictors. The split is permitted to use only one of those 𝑚 predictors [136].
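The per-split feature sampling described above can be sketched as follows. The helper `candidate_features` is hypothetical, and setting m to roughly the square root of p is a commonly used heuristic for classification that is assumed here rather than taken from the text.

```python
import math
import random


def candidate_features(p, m, rng):
    """Draw m distinct predictor indices out of p to act as split candidates."""
    return rng.sample(range(p), m)


p = 16                           # total number of predictors (made-up value)
m = max(1, round(math.sqrt(p)))  # assumed heuristic, not from the source
rng = random.Random(42)

# A fresh random subset is drawn every time a split is considered.
node_candidates = [candidate_features(p, m, rng) for _ in range(3)]
print(node_candidates)
```

Because a new subset is drawn at every node, a single dominant predictor cannot be chosen at the top of every tree, which is what keeps the trees from resembling one another.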

Algorithm 4.2: Random Forest

1 Given a training set $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in C$, with $C$ the set of target classes, define the number of trees $B$ and the number $m$ of random features to select.

2 For $b = 1, \ldots, B$:

(a) Sample from the training set to produce a bootstrap instance of size $n$; some patterns in the training set will be replicated, while others will be omitted, and which ones differs from tree to tree.

(b) Fit a decision tree model $\eta_b(x)$ using the bootstrap instance as the training dataset, randomly selecting $m$ variables at each node of the tree to consider for splitting.

3 Assign $x_i$ to the target class most often predicted by the models $\eta_{b'}(x)$, where $b'$ ranges over the bootstrap instances that do not involve $x_i$.
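The steps of Algorithm 4.2 can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in this work: each base learner is a depth-1 tree (a stump) rather than a fully grown tree, the bootstrap sample has the same size as the training set, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, m, rng):
    """Step 2(b), simplified: best single split among m random features."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (impurity, feature, threshold, left label, right label)
    for f in rng.sample(range(len(X[0])), m):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no split possible: always predict the majority class
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, B, m, seed=0):
    """Steps 1 and 2: grow B trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # step 2(a)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx], m, rng)
        forest.append((tree, set(idx)))  # remember which rows were in the bag
    return forest


def oob_predict(forest, X, i):
    """Step 3: vote using only trees whose bootstrap sample excludes x_i."""
    votes = Counter(tree(X[i]) for tree, in_bag in forest if i not in in_bag)
    return votes.most_common(1)[0][0] if votes else None


def predict(forest, row):
    """Majority vote of all trees for a new instance."""
    return Counter(tree(row) for tree, _ in forest).most_common(1)[0][0]


# Made-up, well separated two-class data with d = 2 features.
X = [[0.1, 1.0], [0.2, 0.8], [0.3, 1.2], [0.4, 0.9],
     [2.1, 5.1], [2.2, 4.7], [2.3, 5.3], [2.4, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y, B=50, m=1, seed=0)
oob = [oob_predict(forest, X, i) for i in range(len(X))]
print(oob, predict(forest, [0.25, 1.0]), predict(forest, [2.2, 5.0]))
```

Comparing the out-of-bag predictions from step 3 with the true labels gives an error estimate without a separate validation set, since each instance is scored only by trees that never saw it.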

This approach generates a number of trees to create a large forest. Typically, a larger number of trees makes the algorithm more robust and tends to improve accuracy. Significant improvements have been demonstrated, both empirically and theoretically, from forming decision tree ensembles whose outputs are aggregated into a final decision through voting procedures. In order to grow such ensembles, RFC performs the additional step of feature bagging [146]. Hence, through the ensemble design, the RFC algorithm produces a strong learner from individually weaker decision trees. Moreover, the model is efficient to train and test over empirical datasets and has integrated mechanisms for predicting confidence and estimating test error.

Combining learning algorithms can increase classification accuracy and overall performance. RFC uses bagging over both training example subsets and feature subsets, producing a large collection of decorrelated models in the form of a series of decision trees [150]. Suppose M is a matrix of training samples that is used to train a classifier. In this context, $x_{A1}$ is feature A of the 1st instance, $x_{B1}$ is feature B of the 1st instance, $x_{C1}$ is feature C of the 1st instance, and so on for all samples up to $N$. $y_1$ to $y_N$ are the training classes. The matrix M therefore holds the features and training classes used to classify the SCD datasets.

$$ M = \begin{bmatrix} x_{A1} & x_{B1} & x_{C1} & y_1 \\ \vdots & \vdots & \vdots & \vdots \\ x_{AN} & x_{BN} & x_{CN} & y_N \end{bmatrix} $$

A number of subsets are then selected at random, as shown in M1 and Figure 4.2. For example, a subset may contain the features $x_{A14}$, $x_{A17}$, $x_{A20}$ and $x_{A38}$, as well as some other random elements from B and C. Another random subset with different values is then drawn, as shown in matrix M2. Eventually, any number of decision trees can be created, as illustrated in SM. The main idea of using different variations is to generate a ranking of classifiers. The process is repeated for each decision tree, each of which produces a class label. The class that receives the majority of votes among the decision trees is selected as the target value.
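The final voting step can be illustrated with a short sketch. The function `majority_vote` and the example labels are hypothetical; ties are resolved in favour of the label seen first, which is an arbitrary choice made here for simplicity.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most voted label and the share of trees that voted for it."""
    counts = Counter(tree_predictions)
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_predictions)


# Hypothetical class labels predicted by five trees for one instance.
label, share = majority_vote(["A", "B", "A", "A", "B"])
print(label, share)
```

The vote share can also serve as a rough indication of how confident the forest is in the selected class.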


Figure 4-2: Decision trees example

The RF classifier is trained by building an ensemble of $B$ trees, given the training set $X = x_1, \ldots, x_n$ and the target class labels (responses) $Y = y_1, \ldots, y_n$. For $b = 1, \ldots, B$, a training sample is drawn with replacement from $X, Y$; this sample is referred to as $X_b, Y_b$. The predicted class is usually selected through majority voting. On the theoretical side, the datasets selected for the training phase form the set $M = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, where $X_i$, $i = 1, \ldots, n$, is a vector of descriptors and $Y_i$ is either the activity of interest or the corresponding label [151].

Ma et al. [152] proposed a nonlinear random forest regression model and multiple linear regression to examine single nucleotide polymorphisms (SNPs) and the change in HbF level after 2 years of treatment with hydroxyurea. The study recruited 137 SCD patients who took a daily dose of hydroxyurea. The random forest comprised a number of trees, each of which acts as a regression function. The model showed significant outcomes in terms of HbF concentration for the vast majority of patients with sickle cell anaemia.

In document Machine Learning Approaches and Web-Based System to the Application of Disease Modifying Therapy for Sickle Cell (Page 64-67)