Performance of augmentation techniques - Semi-supervised learning and fairness-aware learning u

3.6 Experiments

3.6.6 Performance of augmentation techniques

The performance of Co-Training and Self-Learning is evaluated using the area under precision-recall curve (AU-PRC) which better reflects classifier’s performance, comparing to e.g., accuracy, in case of class imbalance [SR15] and also provides more informative representations than AUC [HG09]. We perform holdout evaluation (67% training and 33% testing split) and report AU-PRC and the number of predicted instances per iteration. In addition, we show the ratio of the annotated corpus (in- cluding ground truth while in Table3.14 we show only the ratio of final predictions). Self-Learning-based augmented batch annotation

Figure3.15, demonstrates the performance of the original unmodified data same as in Section3.6.1. We observe the same behavior as before, e.g., performance is degrading whileδis increased, however forδ= 100%performance is maximized while the labeled instances are significantly reduced compared to other thresholds.

On the other hand, undersampling, oversampling, their combination and blankout methods have better performance for low δ values (Figures 3.16, 3.17, 3.18 and 3.19). Moreover, by comparing these methods to the original Self-Learning procedure in Figure 3.15, we observe that the performance is better when augmentation is em- ployed. Also, the class imbalance problem is tackled as in each iteration the minority (negative) class receives more instances compared to the original Self-Learning.

However, word embeddings do not perform equally good as the other augmentation methods. In Figure 3.20, we see that word embeddings have same behavior as original Self-Learning. Also, the negative instances are fewer in all the other methods. A probable reason of this behavior could be that the augmentation process generates pseudo-instances of the opposite class. Swapping terms with other semantically sim- ilar terms does not guarantee that the sentimentality of the terms will be the same.

(a) Au-PRC (b) Number of instances per iteration Figure 3.16: Self-Learning combined with under-sampling

(a) Au-PRC (b) Number of instances per iteration Figure 3.17: Self-Learning combined with over-sampling

3.6 Experiments 65

(a) Au-PRC (b) Number of instances per iteration Figure 3.18: Self-Learning combined with under-sampling and over-sampling

(a) Au-PRC (b) Number of instances per iteration Figure 3.19: Self-Learning combined with blankout

(a) Au-PRC (b) Number of instances per iteration Figure 3.20: Self-Learning combined with word embeddings

δ Original Blankout Glove Over. Under. Over.& Under.

65% 1:9 1:2 1:16 1:1 1:1 1:2 70% 1:9 1:1 1:18 1:1 1:1 1:2 75% 1:9 1:1 1:21 1:1 1:1 1:2 80% 1:9 1:1 1:24 1:1 1:1 1:2 85% 1:9 1:1 1:29 1:1 1:1 1:2 90% 1:10 1:1 1:36 1:1 1:1 1:2 95% 1:11 1:1 1:52 1:1 1:1 1:2 100% 1:10 1:7 1:11 1:7 1:7 1:7

Table 3.15: Self-Learning: Class Ratio (negative:positive) of the predictions for dif- ferent methods

Even though we employ SentiWordNet to tackle this problem, it is not enough due to the grammatical and syntactical structure of a tweet, therefore the performance is degrading.

In Table 3.15, we show the ratio of negative, positive predicted instances per method. In contrast to Table 3.14, this table includes the ground truth while the latter shows the overall predicted instances, thus when δ = 100% the diﬀerence is significantly reduced. Blankout, oversampling (Over.), undersampling (Under.) and the combination of the last two (Over. & Under.) methods tackle the problem of class imbalance. Word embeddings on the other hand, are enhancing the gap between the two classes.

Furthermore, we have performed two significant tests: paired t-test and McNe- mar’s test, to compare original Self-Learning with the other augmentation methods. For every method, we obtained highly significant results in both tests (forα= 0.01).

3.6 Experiments 67

(a) Au-PRC (b) Number of instances per iteration Figure 3.21: Original Co-Training

(a) Au-PRC (b) Number of instances per iteration Figure 3.22: Co-Training combined with oversampling

Co-Training-based augmented batch annotation

For Co-Training, we report on both models: bigrams and unigrams in Figures 3.21, 3.22, 3.23, 3.24, 3.25 and 3.26. In Figure 3.21, the original Co-Training method is shown for which unigrams are slightly better than bigrams. Nonetheless, by comparing the class labels and performance of original Co-Training with the augmented methods, we observe significant diﬀerences. However, same as in Self-Learning the word embeddings do not exhibit good performance compared to the other augmentation methods.

Nonetheless, by comparing Tables 3.16 and 3.17, we see that word embeddings reduce the gap between the two classes in the unigram model while the opposite is happening in the bigrams model. The latter model however is trained based on the predictions of the first model which implies that unigrams as feature space are aﬀected more than bigrams. Other augmentation methods such as oversampling and undersampling deal with class imbalance eﬃciently by balancing the two classes while maintaining high performance. In addition, same as in Self-Learning we performed the two significant tests, namely McNemar and pair t-test for which we compared the

(a) Au-PRC (b) Number of instances per iteration Figure 3.23: Co-Training combined with undersampling

(a) Au-PRC (b) Number of instances per iteration Figure 3.24: Co-Training combined with over and under sampling

(a) Au-PRC (b) Number of instances per iteration Figure 3.25: Co-Training combined with blankout

3.6 Experiments 69

(a) Au-PRC (b) Number of instances per iteration Figure 3.26: Co-Training combined with embeddings

δ Original Blankout Glove Over. Under. Over.& Under.

65% 1:4 1:2 1:3 1:2 1:1 1:1 70% 1:4 1:2 1:3 1:2 1:1 1:1 75% 1:4 1:2 1:3 1:2 1:1 1:2 80% 1:5 1:2 1:3 1:2 1:1 1:2 85% 1:6 1:2 1:4 1:2 1:1 1:2 90% 1:7 1:2 1:5 1:2 1:1 1:2 95% 1:9 1:2 1:6 1:2 1:1 1:2 100% 1:12 1:10 1:13 1:10 1:7 1:10

Table 3.16: Co-Training Unigrams: Class Ratio (negative:positive) of the predictions for diﬀerent methods

δ Original Blankout Glove Over. Under. Over.& Under.

65% 1:8 1:1 1:14 1:1 1:1 1:1 70% 1:9 1:2 1:14 1:1 1:1 1:1 75% 1:9 1:2 1:15 1:1 1:1 1:1 80% 1:9 1:2 1:16 1:1 1:1 1:2 85% 1:9 1:2 1:19 1:1 1:1 1:2 90% 1:9 1:2 1:21 1:1 1:1 1:2 95% 1:9 1:2 1:26 1:2 1:1 1:2 100% 1:9 1:8 1:11 1:8 1:7 1:8

Table 3.17: Co-Training Bigrams: Class Ratio (negative:positive) of the predictions for diﬀerent methods

augmentation methods with the original Co-Training. For all the methods and the δ

values we obtained highly significant results (forτ = 0.01).

In document Semi-supervised learning and fairness-aware learning under class imbalance (Page 81-88)