Ensemble Learning - Support Vector Machine (SVM) aggregation modelling for spatio-temporal air

3.2.1 Overview

The goal of a supervised learning algorithm is to search a hypothesis space for a hypothesis that yields good predictions for a particular problem (Sollich & Krogh, 1996). Even if the hypothesis space contains hypotheses that are well suited to a specific problem, it is still very difficult to select a good one (Sollich & Krogh, 1996). Ensembles address this difficulty by combining multiple hypotheses to form a better hypothesis for decision making.

In other words, an ensemble is a technique that combines weak learners to produce a strong learner. The term usually refers to methods that generate multiple hypotheses using the same base learner. More broadly, ensemble learning is a process in which multiple models, such as classifiers or experts, are strategically created and combined to solve a computational problem. Ensemble learning is mainly used to improve the classification or prediction performance of a model, or to reduce the likelihood of selecting a poor model (Y. Liu & Yao, 1999). Recent machine learning approaches such as deep learning have successfully combined multiple learners as an ensemble to tackle dimensionality issues with continuous data (Tirumala, Ali, & Ramesh, 2016).

Evaluating the prediction of an ensemble requires considerably more computation than evaluating the prediction of a single model. Ensembles may therefore be regarded as a way of compensating for poor learning algorithms by performing a lot of extra computation (Chandra & Yao, 2006). Fast learning algorithms, e.g. decision trees, are commonly used with ensembles; however, slower learning algorithms can take advantage of ensemble techniques as well (Chandra & Yao, 2006).

3.2.2 Ensemble Theory

An ensemble is itself a supervised learning algorithm: it can be trained and then used to make predictions. The trained ensemble represents a single hypothesis. However, this hypothesis is not necessarily contained in the hypothesis space of the models from which it is built. Ensembles therefore have more flexibility in the functions they can represent (L. I. Kuncheva & Whitaker, 2003). In theory, this flexibility could cause an ensemble to overfit the training data more than a single model would. In practice, however, ensemble techniques tend to reduce overfitting of the training data (Webb & Zheng, 2004).
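The point that an ensemble's hypothesis need not lie in the hypothesis space of its base models can be illustrated with a minimal sketch. It assumes one-dimensional inputs and decision stumps as base models; the names stump and weighted_vote, the thresholds and the weights are all chosen for illustration and are not taken from this work. A single stump changes sign only once, so it cannot represent an interval, whereas a weighted vote of three stumps can.

```python
def stump(threshold, sign):
    """1-D decision stump: predicts `sign` if x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign


def weighted_vote(models, weights, x):
    """Sign of the weighted sum of base predictions (a zero sum maps to -1)."""
    score = sum(w * m(x) for m, w in zip(models, weights))
    return 1 if score > 0 else -1


# Illustrative target: +1 for 0.3 < x <= 0.7 and -1 elsewhere.
# No single stump has this shape, because a stump switches sign only once.
models = [
    stump(0.3, +1),            # +1 to the right of 0.3
    stump(0.7, -1),            # +1 to the left of (and at) 0.7
    stump(float("inf"), +1),   # constant -1, acts as a bias term
]
weights = [1.0, 1.0, 1.0]

for x in (0.1, 0.5, 0.9):
    print(x, weighted_vote(models, weights, x))
```

The combined function is positive only where the first two stumps agree, which is a region that none of the individual stumps can isolate.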

In practice, ensembles tend to produce better results when there is significant diversity among the models (L. I. Kuncheva & Whitaker, 2003). Many ensemble methods therefore seek to promote diversity among the models they combine. However, combining a variety of strong learning algorithms to produce a strong ensemble is considered more effective in ensemble learning (L. I. Kuncheva & Whitaker, 2003).
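Diversity among ensemble members can be quantified in several ways. As a minimal sketch, the code below computes the pairwise disagreement measure, i.e. the fraction of examples on which exactly one of two classifiers is correct, averaged over all pairs. This is only one of the measures discussed in the diversity literature, and the text above does not state which measure, if any, is used in this work; the function names and the small correctness matrix are illustrative assumptions.

```python
from itertools import combinations


def disagreement(correct_a, correct_b):
    """Fraction of examples on which exactly one of two classifiers is correct.

    Each argument is a list of 0/1 flags (1 = classifier was correct).
    """
    return sum(a != b for a, b in zip(correct_a, correct_b)) / len(correct_a)


def mean_pairwise_disagreement(oracle):
    """Average disagreement over all pairs of classifiers in `oracle`."""
    pairs = list(combinations(oracle, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical correctness flags for three classifiers on four examples.
oracle = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
print(mean_pairwise_disagreement(oracle))
```

A value of 0 means all members make their errors on the same examples, so voting cannot correct them; larger values indicate that members err on different examples.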

3.2.3 Importance of Studying Ensembles

Studying ensemble learning methods is appealing in several ways (Parikh & Polikar, 2007). Firstly, ensemble methods are easy to implement, yet they have complex underlying dynamics that can take hundreds of hours to study. A key example is the bagging algorithm: bagging is considered one of the most widely used ensemble techniques, and it can be implemented in a few lines of code.
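To show how little code bagging needs, the following is a minimal sketch rather than the implementation used in this work. It assumes one-dimensional inputs, labels in {-1, +1} and a simple threshold rule as the base learner; fit_stump, bagging_fit, bagging_predict and the toy data are all hypothetical names and values introduced for illustration. Each base model is trained on a bootstrap resample of the training set, and predictions are combined by majority vote.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Hypothetical base learner: the single threshold on 1-D inputs with
    the fewest errors on `sample`, a list of (x, y) pairs with y in {-1, +1}."""
    best = None
    for t, _ in sample:
        for sign in (1, -1):
            errors = sum((sign if x > t else -sign) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x > t else -sign


def bagging_fit(train, fit_base, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample of train."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) examples with replacement.
        sample = [rng.choice(train) for _ in train]
        models.append(fit_base(sample))
    return models


def bagging_predict(models, x):
    """Return the label that receives the most votes from the base models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: label +1 when x > 0.5, otherwise -1.
train = [(i / 20, 1 if i / 20 > 0.5 else -1) for i in range(20)]
models = bagging_fit(train, fit_stump)
print(bagging_predict(models, 0.1), bagging_predict(models, 0.9))
```

An odd number of models is used so that a two-class vote cannot tie. Any other base learner, including an SVM, could be passed as fit_base as long as it returns a callable model.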

Secondly, the wide applicability of ensemble methods to research problems is also attractive (Barkia, Elghazel, & Aussem, 2011). Various ensemble methods developed in previous years are still relevant and will remain relevant to new learning techniques in the future. Although there are various approaches for constructing ensembles, in our research we focus primarily on the construction of SVM-based ensemble methods. Working at this level allows us to contribute knowledge to various disciplines such as artificial intelligence, machine learning, financial forecasting and statistics, to name but a few. As machine learning continues to develop, ensemble methods continue to be applied even to more advanced models (Barkia et al., 2011; Parikh & Polikar, 2007). With ensemble methods, researchers will be able to extract a little more valuable information.

However, the study of why ensemble methods perform well is of fundamental importance. If one understands why a method can be applied successfully to a certain problem, then developing a new tool for machine learning will not be a difficult task.

3.2.4 Purpose of Ensemble Based Systems

The output of a learning algorithm based on a single hypothesis suffers from three basic problems. Firstly, a statistical problem arises when the learning algorithm searches a hypothesis space that is too large for the available training data. In such a scenario there may be several hypotheses that achieve the same accuracy on the training data, and the learning algorithm has to select one of them as its output. There is a risk that the selected hypothesis will not predict future data points correctly. In this case, a simple vote among all equally good classifiers reduces the statistical problem.
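The benefit of a simple vote can be made concrete under an idealised assumption that is not made in the text above: if n classifiers err independently, each with probability p < 0.5, the majority vote is wrong only when more than half of them err at the same time, which is a binomial tail probability. The sketch below computes that probability; the function name majority_error and the example values are illustrative. Classifiers trained on the same data are rarely fully independent, so the figures are an optimistic illustration rather than a guarantee.

```python
from math import comb


def majority_error(n, p):
    """Probability that a majority of n independent classifiers is wrong,
    when each is wrong with probability p (n is assumed to be odd)."""
    k_min = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 21):
    print(n, majority_error(n, 0.3))
```

With p = 0.3, a single classifier errs 30% of the time, a vote of three errs about 21.6% of the time, and the error continues to shrink as more independent voters are added.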

Secondly, a computational problem arises when a learning algorithm fails to identify the best hypothesis within the hypothesis space. For example, for neural networks and decision trees, finding the hypothesis that best fits the training data is computationally difficult, so heuristic methods (D. Opitz & Maclin, 1999) need to be applied. However, these heuristic methods (such as gradient descent) can get stuck in local minima (Rokach, 2009) and hence fail to find the best hypothesis. This computational problem can be reduced by using a weighted combination of the various local minima, which reduces the risk of selecting the wrong local minimum as the output.

Finally, a representational problem arises when the hypothesis space does not contain any hypothesis that provides a good approximation to the true function f. This representational problem can be reduced by using a weighted vote of hypotheses (Rokach, 2009). This will enable the learning algorithm to form an accurate approximation.

3.3 A Brief History of SVM Based Ensemble

In document Support Vector Machine (SVM) aggregation modelling for spatio-temporal air pollution analysis (Page 58-61)