Three Novel SVMs for Multi-Target Regression

Three novel models have been implemented for the purposes of multi-target regression. The base model is the SVR model, where m single-target soft-margin non-linear support

vector regressors (NL-SVR) are built for each target variable Yj.

For NL-SVR, the regularized soft-margin loss function given in equation (2.10) is mini- mized. This contribution involves solving the dual of this formulation given by (2.11). Using the dual formulation, the multi-target problem is solved by transforming it into m single- target problems, as shown in Algorithm 3.1 and Figure 3.1. This algorithm will output m single-target models, hj,∀j = 1, . . . , m, for a given datasetD. It first splits the dataset into

m separate ones, Dj, each with a single-target variable Yj, and then builds a distinct SVR

model for each of the datasets.

Buildingm ST models is a good base-line, but as mentioned previously, it does not cap- ture possible correlations between the target attributes during training. If these correlations

D₁: [X][Y1] h1 :D1 →Yˆ1

D₂: [X][Y2] h2 :D2 →Yˆ2

D: [X][Y] ...

Dm: [X][Ym] hm :Dm →Yˆm

Fig. 3.1.: SVR Flow Diagram. Firstly, the multi-target dataset is divided into m ST datasets, D1,D2, . . . ,Dm. Thenm models, h1, h2, . . . , hm, are independently trained for each ST dataset. Algorithm 3.1 Multi-Target Support Vector Regression (SVR)

Input: Training dataset D

Output: ST modelshj, j= 1, . . . , m

1: forj= 1 to m do

2: Dj ={X,Yj} . Get ST data

3: hj :X →R . Build ST model for thejth target

Algorithm 3.2 Build Chained Model Input: Training dataset D, random chainC Output: A chained modelhj, j ={1, . . . , m}

1: D1 ={X,YC1} . Initialize first dataset

2: forj= 1 to m do . For each target in chainC

3: hj :Dj →R . Train model on appended dataset

4: if j < mthen

5: Dj+1 =

Dj,YCj .Append new target in chain to dataset

6: end if 7: end for

are not exploited, this could retract from the model’s potential performance. Therefore, creating an ensemble model using a series of random chains was proposed, using the base-line

SVR method, named SVR Random Chains (SVRRC).

For SVRRC, ensembles of at most 10 m-sized random chains,C, are built from different and distinct permutations of the target variable indices. When chaining target values, there are two main options: using the predicted value as input for the following target, or using the true value of the target variable as input of the subsequent targets. The main problem with the former approach is that errors are propagated throughout the chained model, therefore SVRRC employs chaining of the true values.

For each random chain, a new model is trained by predicting the first target variable in the chain. Next, the first target’s true value, Yj, is appended to the training set. This

chaining process is repeated for all the target indices in the chains,{C1, . . . ,Cc} ∈ C, c ≤10 .

This process will be repeated for each random chain generated, returning an ensemble of chained SVRs. Algorithm 3.2 describes the process of building a chained model given chain C ∈ C, and Algorithm 3.3 shows the steps taken by SVRRC.

Given this ensemble of chained models, the predicted values for a given unseen instance are calculated by taking the mean of the multiple models generated using different random chains. Since the unseen input has no known target value, the predicted value at each step of the chain ˆYj is appended to the input at each step of the chain.

D: [X][Y1Y2Y3]

[1,2,3] [1,3,2] . . . [3,2,1]

h1 : [X]→Yˆ1 h1: [X]→Yˆ1 . . . h1 : [X]→Yˆ1

h2: [XY1]→Yˆ2 h2 : [XY1]→Yˆ3 . . . h2: [XY3]→Yˆ2

h3 : [XY1Y2]→Yˆ3 h3: [XY1Y3]→Yˆ2 . . . h3 : [XY3Y2]→Yˆ1

Fig. 3.2.: SVRRC Flow Diagram on a dataset with three targets. SVRRC first builds the six random chains of the target’s indices (three examples are shown). It then constructs a chained model by proceeding recursively over the chain, building a model, and appending the current target to the input space to predict the next target in the chain.

Algorithm 3.3 Multi-Target SVR with Random-Chains (SVRRC) Input: Training dataset D,crandom chains C

Output: An ensemble of chained modelshC

1: for each C ∈ C do . For each random chain

2: hC ←buildChainedModel(D,C) . build a chained model for chainC 3: end for

models, the number of ensembles and chains are limited to a maximum of 10. However, if the number of target variables is less than 3, i.e. m!≤10, allm! random chains are constructed. A disadvantage of building an ensemble of 10 random chains stems from the fact that: when the number of output variables increases, the number of possible chains increases factorially. Therefore, there is no guarantee that the 10 random chains generated will truly reflect the relationships among the target variables. Additionally, building an ensemble of regressors is computationally expensive. Finding a heuristic that allows the identification of a single, most appropriate chain, which fully reflects the output variable interrelations would improve the scalability of training the ensemble.

D: [X][Y1Y2Y3] [1,2,3]

h1: [X]→Yˆ1 h2 : [XY1]→Yˆ2 h3 : [XY1Y2]→Yˆ3 generate maximum correlation chain

E[(_Yi−_µi)(_Yj−_µj)]

E[(Yi−_µi)(_Yi−_µi)]E[(_Yj−_µj)(_Yj−_µj)]

Fig. 3.3.: SVRCC Flow Diagram on a sample dataset with three targets. SVRCC first finds the direction of maximum correlation among the targets and uses that order as the only chain. It then constructs the chained model, as done in SVRRC.

Algorithm 3.4 Multi-Target SVR with max-Correlation Chain (SVRCC)

1: P=corrcoef(Y) . Find correlation coefficient matrix for target variables

2: C =Pn_i₌₁|Pij|,∀j= 1, . . . , m . Sum row elements of the correlation coefficient matrix

3: C =sort(C,decreasing) . Sort sums in decreasing order 4: hC =buildChainedModel(D,C) . build a chained model for max correlation chainC

The third proposal was designed to remedy this issue. It builds a single chain based on the maximization of the correlations among the target variables. By calculating the correlation of the target variables and imposing it on the order of the chain, this ensures that each appended target provides some additional knowledge on the training of the next. With SVRRC, there is no reasoning behind the generation of these chains, and since the number of random chains generated is limited to 10, there is no way of ensuring that the 10 chains fully represent the targets’ dependencies. Calculating and using the correlations of the targets would break this uncertainty. Algorithm 3.4 presents the SVR maximum Correlation Chain

(SVRCC) method. The computational complexity and hardware constraints (memory size) are negligible during the construction of the targets’ correlation matrix, since the correlation matrix would be an (m×m) matrix, and the likelihood that the number of targets is large enough to cause a memory issue is minimal.

To calculate the correlation coefficients of the targets, the targets’ co-variance matrix,

Σ, is first calculated as shown in Equation 3.1:

Σij =cov(Yi,Yj) = E[(Yi−µi)(Yj −µj)], (3.1)

will show how the targets change together.

The correlation coefficients matrix, P, is then calculated as shown in Equation 3.2:

P=corrcoef(Y) = _pΣij

ΣiiΣjj

, ∀i, j ∈ {1, . . . , m}, (3.2)

which describes the linear relationship among the target variables. The coefficients are then sorted in decreasing order, creating the maximum correlation chain.

In document Novel Support Vector Machines for Diverse Learning Paradigms (Page 43-47)