Combining Continuous Outputs - Support Vector Machine (SVM) aggregation modelling for spatio-te

A classifier that produces continuous outputs for a given class can be interpreted as expressing its degree of support for that class, and this output is accepted as an estimate of the posterior probability of the class (Z. Hu, Cai, Li, & Xu, 2005). Estimating the posterior probability requires sufficient data and requires the classifiers' outputs to be normalised so that they sum to 1 over all classes. Various continuous output combiner methods are discussed below:

3.9.1 Algebraic Combiners

Algebraic combiners are non-trainable combiners of the continuous outputs of classifiers (L. I. Kuncheva, Bezdek, & Duin, 2001). The outputs of the classifiers are combined by means of an algebraic expression: the overall support for each class is derived as a simple function of the supports given by the individual classifiers. Various algebraic rules are discussed below:

3.9.2 Mean Rule

The support $\mu_j(x)$ for class $\omega_j$ is calculated as the average of the $j$th outputs of all classifiers. Mathematically, it is given by (3.3) (Polikar, 2006).

$$\mu_j(x) = \frac{1}{T} \sum_{t=1}^{T} d_{t,j}(x) \tag{3.3}$$

Here $\mu_j(x)$ is the total support that the mean rule assigns to class $j$ for a given instance $x$ across the set of classifiers that constitute the ensemble, $T$ is the total number of classifiers, and $d_{t,j}(x)$ is the support given by the $t$th classifier to class $j$. The mean rule is considered equivalent to the sum rule, as highlighted in the literature (Araújo & New, 2007), because the two differ only by the constant factor $1/T$, which does not change the ranking of the classes. The ensemble decision is the class $\omega_j$ with the largest total support $\mu_j(x)$.
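
The mean rule in (3.3) can be illustrated with a minimal sketch in plain Python. The function name `mean_rule` and the example support values are hypothetical and chosen only for illustration; `supports[t][j]` plays the role of $d_{t,j}(x)$, and each row is assumed to be already normalised to sum to 1.

```python
def mean_rule(supports):
    """Combine supports with the mean rule.

    supports[t][j] is the support of classifier t for class j (hypothetical data layout).
    Returns the per-class mean supports and the index of the winning class.
    """
    n_classifiers = len(supports)
    n_classes = len(supports[0])
    # Average the j-th output over all T classifiers, as in (3.3).
    mu = [sum(row[j] for row in supports) / n_classifiers for j in range(n_classes)]
    # The ensemble decision is the class with the largest total support.
    winner = max(range(n_classes), key=lambda j: mu[j])
    return mu, winner


# Illustrative supports from T = 3 classifiers over C = 3 classes.
supports = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.6, 0.1],
]
mu, winner = mean_rule(supports)
print(mu, winner)  # class index 0 has the largest mean support
```

Dropping the division by `n_classifiers` would give the sum rule and leave `winner` unchanged, which is the sense in which the two rules are equivalent.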

3.9.3 Weighted Average

This rule combines the mean rule and weighted majority voting: the weights are applied not only to class labels but also to the actual continuous outputs. It can be either a trainable or a non-trainable combination rule, depending on how the weights are obtained. If the weights are obtained as part of the regular training during ensemble generation, as in AdaBoost, it is considered a non-trainable combination rule. If, however, separate training is needed to obtain the weights, as in the mixture of experts model, it is referred to as a trainable combination rule. There is either one weight per classifier or one weight per class for each classifier. Given $T$ weights $w_1, \ldots, w_T$, obtained through some measure of performance, the total support for class $\omega_j$ is given by (3.4) (Lemke & Gabrys, 2010):

$$\mu_j(x) = \sum_{t=1}^{T} w_t \, d_{t,j}(x) \tag{3.4}$$

where $w_t$ is the weight assigned to the $t$th classifier for classifying instances.
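
As a rough sketch of (3.4), the following Python function applies one weight per classifier. The name `weighted_average_rule`, the weights and the supports are illustrative assumptions; in practice the weights would come from some performance measure, which is not modelled here.

```python
def weighted_average_rule(supports, weights):
    """Combine supports with one weight per classifier, as in (3.4).

    supports[t][j] is the support of classifier t for class j;
    weights[t] is the (hypothetical) weight of classifier t.
    """
    if len(supports) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_classes = len(supports[0])
    # Weighted sum of the j-th outputs over all classifiers.
    mu = [
        sum(w * row[j] for w, row in zip(weights, supports))
        for j in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda j: mu[j])
    return mu, winner


supports = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.6, 0.1],
]
# Assumed weights that favour the third classifier; they sum to 1 here,
# although (3.4) itself does not require this.
weights = [0.2, 0.2, 0.6]
mu, winner = weighted_average_rule(supports, weights)
print(mu, winner)  # class index 1 wins once the third classifier dominates
```

With equal weights the same supports favour class index 0, so the example shows how the weighting can change the ensemble decision.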

3.9.4 Trimmed Mean

If a classifier gives unreliable support to a specific class, whether unusually low or unexpectedly high, it adversely affects the mean combiner (Lemke & Gabrys, 2010). To avoid this problem, the most optimistic and most pessimistic classifiers are removed from the ensemble before the mean is calculated, a procedure called the trimmed mean. For a $P\%$ trimmed mean, $P\%$ of the supports are removed from each end, and the mean is calculated from the remaining supports, so that extreme values of support are excluded.
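
The trimming procedure can be sketched as follows. This is one possible reading, assuming that the number of supports dropped from each end is the trimming fraction times the number of classifiers, rounded down; the function names and the data are hypothetical.

```python
def trimmed_mean(values, trim_fraction):
    """Mean of the values after dropping a fraction of them from each end."""
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    # Number of supports removed from each end (rounded down, an assumption).
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def trimmed_mean_rule(supports, trim_fraction):
    """Apply the trimmed mean per class and pick the best-supported class."""
    n_classes = len(supports[0])
    mu = [
        trimmed_mean([row[j] for row in supports], trim_fraction)
        for j in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda j: mu[j])
    return mu, winner


# Five classifiers, two classes; the first and last rows are extreme.
supports = [
    [0.05, 0.95],
    [0.60, 0.40],
    [0.65, 0.35],
    [0.70, 0.30],
    [0.99, 0.01],
]
mu, winner = trimmed_mean_rule(supports, 0.2)  # drops one support from each end
print(mu, winner)
```

The sketch excludes a trimming fraction of exactly 0.5, because nothing would remain to average under this rounding convention.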

3.9.5 Minimum/Maximum/Median Rule

As the names imply, these rules take the minimum, maximum or median of the individual classifiers' outputs for each class. The minimum rule is given by (3.5) (Duin, 2002).

$$\mu_j(x) = \min_{t=1,\ldots,T} \{ d_{t,j}(x) \} \tag{3.5}$$

Equation (3.5) gives the total support $\mu_j(x)$ for a given instance $x$ under the minimum rule, across the set of classifiers that constitute the ensemble. Here $\min$ denotes the smallest support $d_{t,j}(x)$ given by any classifier $t$ to class $j$, for $j = 1, \ldots, C$.

The maximum rule is given by (3.6) (Duin, 2002).

$$\mu_j(x) = \max_{t=1,\ldots,T} \{ d_{t,j}(x) \} \tag{3.6}$$

Equation (3.6) gives the total support $\mu_j(x)$ for a given instance $x$ under the maximum rule, across the set of classifiers that constitute the ensemble. Here $\max$ denotes the largest support $d_{t,j}(x)$ given by any classifier $t$ to class $j$, for $j = 1, \ldots, C$.

The median rule is given by (3.7) (Duin, 2002).

$$\mu_j(x) = \operatorname{median}_{t=1,\ldots,T} \{ d_{t,j}(x) \} \tag{3.7}$$

Equation (3.7) gives the total support $\mu_j(x)$ for a given instance $x$ under the median rule, across the set of classifiers that constitute the ensemble. Here the median is taken over the supports $d_{t,j}(x)$ given by the classifiers to class $j$, for $j = 1, \ldots, C$. In each case, the ensemble decision is the class with the largest total support. Under the minimum rule, for example, the support for a class is the smallest support that any classifier gives it, and the ensemble selects the class for which this minimum is largest. In the limit, a 50% trimmed mean is equivalent to the median rule.
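
The three rules in (3.5) to (3.7) differ only in the function applied to each class's column of supports, which the sketch below makes explicit. The helper name `combine` and the support values are hypothetical; `min` and `max` are Python built-ins and `median` comes from the standard `statistics` module.

```python
from statistics import median


def combine(supports, rule):
    """Apply `rule` (min, max or median) to each class's supports.

    supports[t][j] is the support of classifier t for class j.
    Returns the per-class total supports and the index of the winning class.
    """
    n_classes = len(supports[0])
    mu = [rule([row[j] for row in supports]) for j in range(n_classes)]
    # Whatever the rule, the decision is the class with the largest total support.
    winner = max(range(n_classes), key=lambda j: mu[j])
    return mu, winner


# Three classifiers, two classes; the third classifier disagrees strongly.
supports = [
    [0.60, 0.40],
    [0.55, 0.45],
    [0.10, 0.90],
]
for name, rule in (("min", min), ("max", max), ("median", median)):
    mu, winner = combine(supports, rule)
    print(name, mu, winner)
```

On this data the minimum and maximum rules both select class index 1, while the median rule selects class index 0, because the median is not affected by the single dissenting classifier.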

3.9.6 Product Rule

In this rule, the supports provided by the classifiers are multiplied. The rule is very sensitive to pessimistic classifiers: a low support (close to 0) given by any one classifier to a class effectively removes that class from consideration, so it has practically no chance of being selected. However, if the individual posterior probabilities are estimated correctly at the classifier outputs, this rule gives the best estimate of the overall posterior probability of the class selected by the ensemble. It is given by (3.8) (Alexandre, Campilho, & Kamel, 2001).

$$\mu_j(x) = \frac{1}{T} \prod_{t=1}^{T} d_{t,j}(x) \tag{3.8}$$

Equation (3.8) gives the total support $\mu_j(x)$ for a given instance $x$ for classes $j = 1, \ldots, C$, where $T$ is the number of classifiers and $d_{t,j}(x)$ is the support given by the $t$th classifier to class $j$.
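
A short sketch of (3.8) shows how a single pessimistic classifier can dominate the product. The function name `product_rule` and the supports are illustrative assumptions; `math.prod` requires Python 3.8 or later.

```python
import math


def product_rule(supports):
    """Combine supports with the product rule, as in (3.8).

    supports[t][j] is the support of classifier t for class j.
    """
    n_classifiers = len(supports)
    n_classes = len(supports[0])
    # Multiply the j-th outputs of all classifiers and scale by 1/T.
    mu = [
        math.prod(row[j] for row in supports) / n_classifiers
        for j in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda j: mu[j])
    return mu, winner


# Two classifiers favour class 0, but the third gives it almost no support.
supports = [
    [0.80, 0.20],
    [0.80, 0.20],
    [0.01, 0.99],
]
mu, winner = product_rule(supports)
print(mu, winner)  # class index 1 wins despite the two votes for class 0
```

Averaging the same supports would favour class index 0, so the example illustrates the sensitivity of the product rule to one very low support.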

In document Support Vector Machine (SVM) aggregation modelling for spatio-temporal air pollution analysis (Page 78-81)