Ensemble
In this section, four methods (bagging, boosting, stacked generalisation and mixture of experts) are compared against six characteristics of ensemble learning to help
practitioners select the most suitable ensemble method for their specific research needs.
3.6.1
Predictive Performance - Accuracy
Predictive performance is considered to be the main criterion for selecting an algorithm (Rokach, 2009; Statnikov, Aliferis, Tsamardinos, Hardin, & Levy, 2005).
Predictive performance is measured by accuracy, i.e., the percentage of correctly classified samples, which can be used to benchmark algorithms. In this regard, the bagging method is considered to have
high accuracy because of its easy implementation and its ability to function on limited data sizes, i.e., fewer than 50 thousand samples (Zhao et al., 2007; Polikar, 2006).
The boosting method has low accuracy because it suffers from problems and fails to understand complex composite classifiers (Rokach, 2009). The stacked generalisation method has low accuracy, as combining
lower-level models into higher-level models is a complex task (Ting & Witten, 2011). A mixture of experts also results in low accuracy, because assigning weights
to the classifiers when passing the output of the Tier 1 classifiers to the Tier 2 classifiers is a complex task too (Polikar, 2006).
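The definition of accuracy used above (the percentage of correctly classified samples) can be sketched as follows. This is a minimal illustration; the function name and the labels are hypothetical and do not come from the cited works.

```python
def accuracy_percent(y_true, y_pred):
    """Return the percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Hypothetical labels, for illustration only: 4 of the 5 predictions are correct.
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham"]
print(accuracy_percent(y_true, y_pred))  # 80.0
```

The same function can be applied to the predictions of any of the four ensemble methods, which is what makes accuracy usable as a common benchmark.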
3.6.2
Scalability
Scalability refers to the ability of the method to function on large data sets (Rokach, 2010). The bagging method has low scalability, as it operates on limited data sizes
(Polikar, 2006). The boosting method operates on unlimited data sizes, consisting of more than one hundred thousand samples, and hence has high scalability (Polikar,
2006). The stacked generalisation method operates on medium-sized training data, consisting of 50 thousand to one hundred thousand samples, resulting in medium
scalability (Wolpert, 1992). The mixture of experts method functions on small data sizes and hence has low scalability (Nasrabadi, 2007).
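The data-size bands used in this comparison can be written down as a small helper. This is only a sketch: the thresholds are those stated in the text (fewer than 50 thousand, 50 thousand to one hundred thousand, more than one hundred thousand samples), while the function name and the handling of the exact boundary values are assumptions.

```python
def data_size_category(n_samples):
    """Map a sample count to the data-size bands used in this section.

    Boundary handling (exactly 50,000 or 100,000 samples) is an assumption.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if n_samples < 50_000:
        return "limited"      # fewer than 50 thousand samples
    if n_samples <= 100_000:
        return "medium"       # 50 thousand to one hundred thousand samples
    return "unlimited"        # more than one hundred thousand samples


print(data_size_category(20_000))   # limited
print(data_size_category(75_000))   # medium
print(data_size_category(250_000))  # unlimited
```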
3.6.3
Computational Cost
It is important to know the computational cost of a method, i.e., whether it produces results in a reasonable amount of time; this is often related to computational complexity
(Granitto, Verdes, & Ceccatto, 2005). In terms of computational complexity, the bagging and boosting methods are less complex. Both methods obtain
an ensemble of classifiers efficiently through robust training of data, resulting in lower computational cost (Freund et al., 2003; Polikar, 2006). The stacked generalisation
method requires more time for training data, and rectifying improper training at Tier 2 is also time-consuming and complex, hence resulting in high
computational cost (Wolpert, 1992). The mixture of experts method requires more time both for training the classifiers and for classifying the problem,
hence resulting in high computational complexity and, in turn, high computational cost (Polikar, 2006).
3.6.4
Usability
Machine learning is considered to be an iterative process (Ribeiro & Cardoso, 2008). To improve the performance of an ensemble system, practitioners change parameters
to generate better classifiers. The bagging and boosting methods have high usability, as the parameters
of both these algorithms are flexible for generating better classifiers (Polikar, 2006).
The stacked generalisation method has low usability, as the weights assigned to the Tier 1 classifiers are not flexible once they have been assigned (Polikar, 2006).
The parameters for assigning weights to classifiers in the mixture of experts method are partially flexible for generating better classifiers, hence resulting in medium usability
(Nasrabadi, 2007).
3.6.5
Compactness
Compactness can be measured by the ensemble size and the complexity of the classifiers in ensemble methods (Rokach, 2010). In this regard, the bagging method is highly
compact because it works only on limited training data sizes and its results are easy to understand (Zhao et al., 2007; Polikar, 2006). The boosting method, on the other
hand, has low compactness due to its functionality on unlimited data sizes; boosting of decision trees could result in thousands (or millions) of nodes, which are
difficult to visualise (Polikar, 2006). Both the stacked generalisation and mixture of experts methods have medium compactness, as they operate on low to medium sized
training data (Mitchell et al., 1986; Wolpert, 1992).
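To make the two ingredients of compactness concrete (ensemble size and classifier complexity), the sketch below counts the classifiers and the nodes in an ensemble of decision trees. The nested-tuple tree representation and all names are assumptions chosen for illustration; they are not taken from the cited works.

```python
def count_nodes(tree):
    """Count all nodes of a decision tree.

    Assumed representation: an internal node is a tuple
    (feature_index, threshold, left_subtree, right_subtree);
    anything that is not a tuple is a leaf holding a class label.
    """
    if not isinstance(tree, tuple):
        return 1  # leaf node
    _feature, _threshold, left, right = tree
    return 1 + count_nodes(left) + count_nodes(right)


def compactness_summary(trees):
    """Summarise ensemble size and complexity for a list of trees."""
    node_counts = [count_nodes(t) for t in trees]
    return {
        "n_classifiers": len(trees),
        "total_nodes": sum(node_counts),
        "largest_tree": max(node_counts, default=0),
    }


# Hypothetical ensemble: two stumps (3 nodes each) and one deeper tree (5 nodes).
stump = (0, 0.5, "A", "B")
deeper = (0, 0.5, (1, 2.0, "A", "B"), "B")
print(compactness_summary([stump, stump, deeper]))
# {'n_classifiers': 3, 'total_nodes': 11, 'largest_tree': 5}
```

An ensemble whose total node count runs into the thousands or millions would score poorly on this measure, which is the concern raised for boosted decision trees.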
3.6.6
Speed of Classification
Computational complexity plays an important role in the speed of classification. Speed of
classification indicates the ability of a method to perform the classification in a certain time frame (Pfahringer, Holmes, & Kirkby, 2001). The bagging method results
in fast classification because it operates on a limited data size of fewer than 50 thousand samples and its computational complexity is the lowest (Zhao et al.,
2007; Polikar, 2006). The speed of classification for the boosting method is moderate compared with bagging because it operates on unlimited data sizes consisting of
more than one hundred thousand samples (Polikar, 2006). The stacked generalisation and mixture of experts methods (Mitchell et al., 1986; Wolpert, 1992) are slow in
classification because they require more time for training data, which results in high computational complexity, and the end output of these
classifiers is an ensemble that makes the decisions.
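Speed of classification can be estimated empirically by timing how long a classifier takes to label a fixed batch of samples. The sketch below is an illustration under stated assumptions: the threshold classifier, the majority-vote ensemble and the batch are hypothetical stand-ins, not any of the four methods compared here.

```python
import time


def time_classification(classify, samples, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` passes over `samples`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in samples:
            classify(x)
        best = min(best, time.perf_counter() - start)
    return best


def threshold_classifier(x):
    """Hypothetical base classifier: positive class when x exceeds 0.5."""
    return x > 0.5


def make_voting_ensemble(members):
    """Build a majority-vote classifier from a list of base classifiers."""
    def classify(x):
        votes = sum(1 for member in members if member(x))
        return 2 * votes > len(members)
    return classify


samples = [i / 1000 for i in range(1000)]
ensemble = make_voting_ensemble([threshold_classifier] * 25)

t_single = time_classification(threshold_classifier, samples)
t_ensemble = time_classification(ensemble, samples)
print(f"single classifier: {t_single:.6f} s, 25-member ensemble: {t_ensemble:.6f} s")
```

Because every member must be evaluated for each sample, the ensemble would normally be expected to take longer than a single classifier, although the measured times depend on the machine.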
Table 3.1 provides a comparison of the ensemble learning methods. The table shows
that the bagging algorithm has high accuracy compared with boosting, stacked generalisation and mixture of experts. In terms of scalability, the boosting algorithm
has a high ability to handle more data compared with stacked generalisation, which has
medium scalability. On the other hand, bagging and mixture of experts have low scalability. The computational cost of bagging and boosting is low compared with
stacked generalisation and mixture of experts, which have a high computational cost to produce results in a reasonable amount of time. In terms of usability, bagging and
boosting have high usability, whereas stacked generalisation has low usability and mixture of experts has medium usability, as their parameters are less flexible for generating better results.
Bagging has high compactness, as it operates on limited data; its results are therefore easier to understand compared with boosting, stacked generalisation and
mixture of experts, which have lower compactness. Bagging also has a high speed of classification compared with boosting, stacked generalisation and mixture of experts when
performing the classification task in a certain time frame.
| Characteristics | Bagging | Boosting | Stacked Generalisation | Mixture of Experts |
|---|---|---|---|---|
| Accuracy | High | Low | Low | Low |
| Scalability | Low | High | Medium | Low |
| Computational Cost | Low | Low | High | High |
| Usability | High | High | Low | Medium |
| Compactness | High | Medium | Low | Low |
| Speed of Classification | High | Medium | Low | Low |
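For practitioners who want to filter the methods by requirement, the ratings of Table 3.1 can be encoded as a lookup structure. The sketch below transcribes the table as printed; the dictionary layout and the helper name `methods_matching` are assumptions made for illustration.

```python
# Ratings transcribed from Table 3.1; "Low" computational cost denotes the
# cheaper methods. The structure and names below are illustrative assumptions.
TABLE_3_1 = {
    "Bagging": {
        "Accuracy": "High", "Scalability": "Low", "Computational Cost": "Low",
        "Usability": "High", "Compactness": "High", "Speed of Classification": "High",
    },
    "Boosting": {
        "Accuracy": "Low", "Scalability": "High", "Computational Cost": "Low",
        "Usability": "High", "Compactness": "Medium", "Speed of Classification": "Medium",
    },
    "Stacked Generalisation": {
        "Accuracy": "Low", "Scalability": "Medium", "Computational Cost": "High",
        "Usability": "Low", "Compactness": "Low", "Speed of Classification": "Low",
    },
    "Mixture of Experts": {
        "Accuracy": "Low", "Scalability": "Low", "Computational Cost": "High",
        "Usability": "Medium", "Compactness": "Low", "Speed of Classification": "Low",
    },
}


def methods_matching(requirements, table=TABLE_3_1):
    """Return the methods whose ratings equal every required rating."""
    return [
        method
        for method, ratings in table.items()
        if all(ratings.get(characteristic) == rating
               for characteristic, rating in requirements.items())
    ]


print(methods_matching({"Scalability": "High"}))
# ['Boosting']
print(methods_matching({"Usability": "High", "Computational Cost": "Low"}))
# ['Bagging', 'Boosting']
```

Exact matching is the simplest choice; an ordered scale (Low < Medium < High) would be needed to express requirements such as "at least medium scalability".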