Figure 4.5: The ratio of the % of label flipped examples that were selected as support vectors to the % of all label flipped examples at a particular functional margin. The rate at which label flipped examples are selected as support vectors drops as the functional margin decreases.

Table 4.1: Datasets used in the experiments

Dataset                                   # Examples   Feature dimension
UCI letter recognition (h vs b and r)     2171         16
UCI letter recognition (r vs h and b)     2171         16
MNIST Digit recognition (9 vs 4 and 7)    5000         784
acoustic                                  5000         50
ijcnn1                                    5000         22
seismic                                   5000         50
splice                                    1000         60

For the larger datasets, 5000 examples were randomly sampled for each experiment to reduce the computation time.

Both the linear and RBF kernels were tested. The SVM regularization parameter C was randomly sampled between $2^{-1}$ and $2^{6}$. The RBF kernel parameter γ was randomly sampled between $2^{-6}$ and $2^{6}$. The reported results were the average of 1000 experiments. Each experiment was a random combination of the datasets, kernels and SVM parameters (C and γ). All the experiments were
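The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the code used for the reported experiments: the function name `sample_experiment`, the log-uniform draw of the exponents, and the uniform choice of dataset are all assumptions, since the text only states that C and γ were randomly sampled within the given ranges.

```python
import random

# Dataset names as listed in Table 4.1.
DATASETS = [
    "UCI letter recognition (h vs b and r)",
    "UCI letter recognition (r vs h and b)",
    "MNIST Digit recognition (9 vs 4 and 7)",
    "acoustic",
    "ijcnn1",
    "seismic",
    "splice",
]


def sample_experiment(rng: random.Random, kernel: str) -> dict:
    """Draw one hypothetical experiment configuration for the given kernel."""
    if kernel not in ("linear", "rbf"):
        raise ValueError(f"unsupported kernel: {kernel}")
    config = {
        "dataset": rng.choice(DATASETS),     # assumption: uniform choice
        "kernel": kernel,
        "C": 2.0 ** rng.uniform(-1.0, 6.0),  # assumption: log-uniform in [2^-1, 2^6]
    }
    if kernel == "rbf":
        # assumption: log-uniform in [2^-6, 2^6]; gamma is unused by the linear kernel
        config["gamma"] = 2.0 ** rng.uniform(-6.0, 6.0)
    return config


rng = random.Random(0)  # fixed seed so the sketch is reproducible
linear_runs = [sample_experiment(rng, "linear") for _ in range(1000)]
rbf_runs = [sample_experiment(rng, "rbf") for _ in range(1000)]
```

Drawing the exponent uniformly, rather than the value itself, spreads the samples evenly across orders of magnitude; a uniform draw on the raw range would be an equally valid reading of the text.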

The labels were flipped for all the examples $x_k$ with slack value $\xi_k > 2$. With the linear kernel, labels were flipped for 147,245 examples (|S|) across the 1000 experiments, and 1595 (1.08%) of them were selected as support vectors (|P|), roughly 1 out of every 92 label flipped examples. For the RBF kernel, labels were flipped for 46,969 examples in the 1000 experiments and 89 (< 0.2%) of them were selected as support vectors. This result supports the hypothesis described earlier that only a small fraction of the label flipped examples will be selected as support vectors.

The left image in Figure 4.4 shows the probability density of all label flipped examples (the set S) with respect to their functional margin for the linear kernel experiment. Similarly, the right image shows the probability density of the label flipped examples that were selected as support vectors (the set P). In both cases the number of examples drops exponentially as the functional margin decreases, and the rate of decrease is higher for the set P than for the set S. This is more evident in Figure 4.5, which shows the ratio of the % of examples in the set P to the % of examples in the set S for a given value of the functional margin. The decreasing ratio indicates that the number of examples in the set P falls faster than the number of examples in the set S as the functional margin decreases. In the experiments, the chance of a label flipped example being selected as a support vector drops from 1.54% to 0.06% when the functional margin decreases from −1 to −1.5.
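The flipping rule and the reported fractions can be illustrated with a short sketch. It assumes the usual soft-margin hinge slack, ξ_k = max(0, 1 − y_k f(x_k)), with labels in {−1, +1}; the function names and the toy decision values are hypothetical and are not taken from the experiments.

```python
from typing import Iterable, List, Sequence, Tuple


def slack(decision_value: float, label: int) -> float:
    """Hinge slack of one example (assumed definition, labels in {-1, +1})."""
    return max(0.0, 1.0 - label * decision_value)


def flip_high_slack(
    decision_values: Sequence[float],
    labels: Sequence[int],
    threshold: float = 2.0,
) -> Tuple[List[int], List[int]]:
    """Flip the label of every example whose slack exceeds `threshold`.

    Returns the new label list and the indices of the flipped examples (the set S).
    """
    new_labels = list(labels)
    flipped = []
    for k, (f, y) in enumerate(zip(decision_values, labels)):
        if slack(f, y) > threshold:
            new_labels[k] = -y
            flipped.append(k)
    return new_labels, flipped


def fraction_selected(flipped: Sequence[int], support_vectors: Iterable[int]) -> float:
    """Fraction |P| / |S| of flipped examples that are support vectors after retraining."""
    sv = set(support_vectors)
    return sum(1 for k in flipped if k in sv) / len(flipped)


# Toy decision values f(x_k) and labels y_k (illustrative only).
toy_f = [2.3, -1.6, 0.4, -0.2, 1.2]
toy_y = [1, 1, -1, 1, -1]
toy_new_labels, toy_flipped = flip_high_slack(toy_f, toy_y)

# Fractions implied by the counts reported in the text.
linear_fraction = 1595 / 147245  # |P| / |S| for the linear kernel
rbf_fraction = 89 / 46969        # |P| / |S| for the RBF kernel
```

Only the two toy examples with functional margin below −1 (indices 1 and 4) are flipped, which matches the slack threshold of 2 used in the text.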

4.4 General Scenarios For Which AC_SVM Fails

AC_SVM creates an SVM classifier to find label noise examples. The label noise examples (identified mislabels) are found by manually reviewing the support vectors of the SVM classifier. This process is repeated until no label noise example is selected as a support vector. Sections 4.2 and 4.3 show that it is possible to create label noise examples that can evade the AC_SVM method. These label noise examples lie farther from their true margin boundary than the distance of the margin itself, with functional margin < −2. AC_SVM will fail to find these label noise examples, as they are not support vectors. These examples appear to be on the correct side of the decision boundary (and actually do not affect the boundary). We refer to this condition as the imposter criterion in the following discussion. Here, we assume that there is no difference in the parameters (C and kernel-dependent parameters, for example γ) before and after the label flip. We divide datasets into two types, 1) non-separable and 2) separable, and describe the general characteristics of the examples satisfying the imposter criterion. We do not know of any other characteristics of the examples that can be exploited to create label noise such that only a small fraction of them will get selected as support vectors. Our argument does not quantify the % of label flipped examples that will get selected as support vectors. Our experimental results with uniform random noise show that fewer than 5% of examples will escape detection by AC_SVM when it is applied iteratively.
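The iterative review loop, and the way an imposter example escapes it, can be sketched as below. This is an illustration under stated assumptions, not the AC_SVM implementation: `ac_svm_clean`, `toy_support_vectors`, and the toy data are hypothetical, the reviewer is assumed to correct each mislabel it finds, and the toy "trainer" is a simple stand-in (points within distance 1 of the midpoint between class means) rather than a real SVM.

```python
from typing import Callable, List, Sequence, Set


def ac_svm_clean(
    noisy_labels: Sequence[int],
    support_vectors_of: Callable[[Sequence[int]], Set[int]],
    review: Callable[[int], int],
    max_rounds: int = 100,
) -> List[int]:
    """Retrain and review support vectors until no mislabeled one is found."""
    labels = list(noisy_labels)
    for _ in range(max_rounds):
        corrected = 0
        for k in support_vectors_of(labels):  # retrain on the current labels
            true_label = review(k)            # manual review of one support vector
            if true_label != labels[k]:
                labels[k] = true_label        # assumption: the mislabel is corrected
                corrected += 1
        if corrected == 0:                    # no label noise among support vectors
            break
    return labels


# Toy 1-D data: indices 0 and 3 carry flipped labels.
X = [-5.0, -4.0, -3.0, -0.5, 0.5, 3.0, 4.0, 5.0]
TRUE = [-1, -1, -1, -1, 1, 1, 1, 1]
NOISY = [1, -1, -1, 1, 1, 1, 1, 1]


def toy_support_vectors(labels: Sequence[int]) -> Set[int]:
    """Stand-in for SVM training: points within 1.0 of the class-mean midpoint."""
    neg = [x for x, y in zip(X, labels) if y == -1]
    pos = [x for x, y in zip(X, labels) if y == 1]
    boundary = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0
    return {k for k, x in enumerate(X) if abs(x - boundary) <= 1.0}


cleaned = ac_svm_clean(NOISY, toy_support_vectors, lambda k: TRUE[k])
```

In this toy run the mislabel near the boundary (index 3) is reviewed and corrected, while the mislabel far from the boundary (index 0) is never presented for review, which mirrors the behaviour the imposter criterion describes.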

4.4.1 Imposter Criterion Dataset Characteristics

There are at least two characteristics of a dataset which can result in label noise examples that satisfy the imposter criterion: 1) non-separable data where some of the examples might appear closer to examples in the opposite class in feature space, 2) separable data where the probability distribution of the features from at least one of the classes is multi-modal and/or contains sparsely distributed regions.

4.4.1.1 Non-separable Data

Based on the results demonstrated in Section 4.3, flipping the labels of examples with functional margin < −1 gives them a low probability of being selected as support vectors. In general, we argue that flipping the labels of all the examples with functional margin < −∆, where ∆ ≥ 0, with respect to an optimal hyperplane $H_w$ will create a large number of undetected imposter examples. Flipping the labels of all the examples with slack value $\xi_k > 1 + \Delta$, where ∆ is a positive value, creates a space beyond $f_w(x_k) < -\Delta$ and $f_w(x_k) > \Delta$ in which all the examples have correct labels with respect to the decision boundary. If a hyperplane lay inside this space so as to include the examples in this region as support vectors, its cost function would be penalized: all the examples in this space that lie on the wrong side of the margin boundary would become support vectors and increase the cost function through their slack values. We therefore argue that the optimal hyperplane will not lie inside this region, and that only a small fraction of the label flipped examples, those that lie on the boundary of this space, will be selected as support vectors.
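The link between the slack threshold and the functional margin follows from the usual soft-margin definition of the slack. The short derivation below assumes $\xi_k = \max(0,\, 1 - y_k f_w(x_k))$ with labels $y_k \in \{-1, +1\}$; the symbol $y'_k$ for the flipped label is notation introduced here for illustration. Because $1 + \Delta > 0$, the maximum is attained by its second argument whenever the threshold is exceeded.

```latex
\begin{align*}
\xi_k > 1 + \Delta,\ \Delta \ge 0
  &\iff 1 - y_k f_w(x_k) > 1 + \Delta \\
  &\iff y_k f_w(x_k) < -\Delta . \\
\text{After the flip } y'_k = -y_k:\quad
  y'_k f_w(x_k) &> \Delta .
\end{align*}
```

With ∆ = 1 this recovers the experimental setting, where a slack value above 2 corresponds to a functional margin below −1. For ∆ ≥ 1 the flipped example has functional margin above 1 with respect to the unchanged hyperplane, so its slack there is zero.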

Table 4.2: The % of label noise examples that get selected as support vectors after flipping the labels for a given % of randomly chosen examples with functional margin < −0.5

Examples mislabeled    Mislabeled examples selected

In document Active Cleaning of Label Noise Using Support Vector Machines (Page 82-85)