Pixel Labeling Hints - Problem Definition

5.2 Pentomino Dataset

5.2.1 Problem Definition

5.2.3.1 Pixel Labeling Hints

Recall that the Pentomino problem poses a binary classification task. Gülçehre and Bengio (2016) tested several machine learning algorithms on the Pentomino dataset, and most performed very poorly, generally no better than a coin flip. The best model learned without hints reached only 64% accuracy in a dataset with 40,000 test examples, but the accuracy increased to around 97% when layers 1-3 were pretrained using hints.

Figure 5.6: Confusion matrix of the pixel labeling on all transformations of Pentominoes. Columns are the predicted results of the pixel labeling model, and rows are the ground truth of ten Pentominoes with eight transformations plus background (10), where the numbers represent the ten Pentomino types (0-9) and the letters indicate the eight transformations (a-d, A-D). Lowercase letters (a-d) are small Pentominoes (scale 3) rotated by four angles, and uppercase letters (A-D) are the four rotated large Pentominoes (scale 4). The values in the confusion matrix are the normalized predicted probabilities of each transformed Pentomino being a particular Pentomino type, where white means low probability and black indicates high probability.

Table 5.1: Image classification accuracy of the two SMLP learning processes on the 20k dataset, with and without pixel hints.

                Without Pixel Hints    With Pixel Hints
SMLP-NoHints    53.836%                73.02%
SMLP-Hints      67.323%                73.44%

Before setting off to learn a pixel labeling model, I first examined how the original model would perform if it were provided the exact Pentomino probability at every pixel. Would this information be beneficial to P1NN for identifying the Pentomino in each block, or, even more, would it be useful to P2NN for classifying whether the Pentominoes in the image are of the same type? To test this hypothesis, I conducted a simple experiment. As described in 5.2.2.3, sliding a small 5×5 aperture window over all 10 Pentominoes yields 284 distinct patterns. Recall that the images are binary, so these 284 patterns are a subset of the $2^{25}$ possible patterns. The target vector for a patch pattern is obtained by examining which Pentomino types the pattern appears in. For example, if a patch pattern appears in Pentomino 2 and 3, its target vector is 0.5 on Pentomino 2 and 0.5 on Pentomino 3. I calculated the probabilities of each of the 11 Pentomino labels for each individual pattern and assigned the resulting Pentomino probabilities to every pixel in the P1NN input. The accuracy of the identification task in P1NN increased to 98.182%, and the accuracy of the final P2NN task increased to 91.516%. This simple experiment demonstrates that pixel labeling can be used as prior knowledge, guiding the model learning in the right direction.
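The per-pattern target vectors in the aperture experiment can be illustrated with a minimal sketch. It assumes each shape type is given as a list of small binary masks; the function name `patch_type_probabilities`, the input format, and the choice to skip all-background patches are illustrative assumptions rather than the code used in the experiment. The toy shapes are two dominoes viewed through a 2×2 window, so the 284-pattern count for 5×5 windows is not reproduced here.

```python
from collections import defaultdict

import numpy as np


def patch_type_probabilities(shapes, window=5):
    """Map each distinct window x window patch to a probability vector over types.

    shapes: dict {type_id: list of 2-D binary arrays} (hypothetical input format).
    Returns a dict {patch bytes: np.ndarray of length n_types}.
    """
    type_ids = sorted(shapes)
    seen = defaultdict(set)  # patch bytes -> indices of the types it occurs in
    pad = window - 1  # zero-pad so the window can hang over every shape border
    for idx, type_id in enumerate(type_ids):
        for mask in shapes[type_id]:
            padded = np.pad(np.asarray(mask, dtype=np.uint8), pad)
            rows, cols = padded.shape
            for r in range(rows - window + 1):
                for c in range(cols - window + 1):
                    patch = padded[r:r + window, c:c + window]
                    if patch.any():  # assumption: ignore all-background patches
                        seen[patch.tobytes()].add(idx)

    table = {}
    for key, types in seen.items():
        probs = np.zeros(len(type_ids))
        # A pattern seen in k types gets probability 1/k on each of them.
        probs[list(types)] = 1.0 / len(types)
        table[key] = probs
    return table


# Toy usage: a horizontal and a vertical domino, 2x2 window.
toy_shapes = {0: [np.array([[1, 1]])], 1: [np.array([[1], [1]])]}
toy_table = patch_type_probabilities(toy_shapes, window=2)
corner = np.array([[0, 0], [0, 1]], dtype=np.uint8).tobytes()
print(len(toy_table), toy_table[corner])
```

In the toy case a single-pixel corner patch occurs in both shapes and so receives 0.5 on each type, which mirrors the Pentomino 2 and 3 example in the text.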

I then attempted to learn a pixel labeling model with a standard CNN architecture, and applied it to P1NN on the modified Pentomino dataset. Using the original SMLP architecture, I obtained 79.533% accuracy on the P1NN intermediate task, block Pentomino identification, on the 20k dataset. After applying the pixel labeling to SMLP, the accuracy improved to 87.189%.

Figure 5.6 shows the confusion matrix of the pixel labeling results on the P1NN task. The rows in the confusion matrix are the normalized predicted probabilities of each Pentomino type given a Pentomino orientation and scale, where white means low probability and black indicates high probability. Half of the Pentomino shapes are mirror images of the other half. For example, Pentomino 5 is the mirror image of Pentomino 0, Pentomino 1 and 6 are a mirror-image pair, and so on. Because the feature representations of these mirror-image pairs are similar, the model often confuses mirrored versions. The transformed Pentominoes are nevertheless correctly identified in most examples, with 72.84% accuracy. With the pixel labeling hints, P1NN can classify the type of Pentomino even when the Pentomino appears under different transformations.
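As a rough sketch of how a row-normalized confusion matrix of this kind can be computed, assume integer ground-truth row indices (one per transformed Pentomino) and integer predicted type indices. The function name and argument layout are hypothetical, and the matrix is rectangular because rows and columns index different label sets.

```python
import numpy as np


def row_normalized_confusion(true_rows, pred_cols, n_rows, n_cols):
    """Count (true, predicted) pairs, then normalize each row to sum to 1.

    true_rows: ground-truth row index per example (e.g. type x transformation).
    pred_cols: predicted column index per example (e.g. Pentomino type).
    Rows with no examples are left as all zeros.
    """
    counts = np.zeros((n_rows, n_cols))
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (np.asarray(true_rows), np.asarray(pred_cols)), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Toy usage: 3 ground-truth rows, 2 predicted classes.
cm = row_normalized_confusion([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], n_rows=3, n_cols=2)
print(cm)
```

Each row then reads as the distribution of predictions for one ground-truth case, so confusion within a mirror-image pair would show up as mass in the column of the mirrored type.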

Table 5.1 shows the results of SMLP-NoHints and SMLP-Hints, with and without the pixel labeling hints, on the meta-task of deciding whether the Pentominoes in the image are of the same type, using the 20k dataset. When the model learns without any hints, the learning procedure appears to search the parameter space blindly, not knowing what to look for, and seems to fall into a local minimum that leads to an accuracy indistinguishable from random guessing. However, with a simple hint like the one proposed in (Gülçehre and Bengio, 2016), the meta-task accuracy improves.

Table 5.2: Sensitivity analysis of pixel labeling hints and P1NN hints on the image classification task. Columns represent the corruption rates applied to the pixel labeling results.

                                  0%        10%       20%
Corrupted pixel labeling hints    73.44%    72.56%    70.20%
Corrupted P1NN hints              73.44%    73.44%    71.80%
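The corruption procedure behind Table 5.2 is not spelled out in this section. Purely as an illustration, one plausible way to corrupt a label map at a given rate is to pick that fraction of positions at random and replace each with a different label; the function `corrupt_labels` and this replacement scheme are assumptions, not the procedure actually used.

```python
import numpy as np


def corrupt_labels(labels, rate, n_classes, rng):
    """Return a copy of `labels` with a fraction `rate` of entries changed.

    Assumption: corrupted entries are chosen uniformly without replacement and
    each is replaced by a different label drawn uniformly from the other classes.
    """
    corrupted = np.asarray(labels).copy()
    flat = corrupted.reshape(-1)  # view on the copy, so edits land in `corrupted`
    n_corrupt = int(round(rate * flat.size))
    idx = rng.choice(flat.size, size=n_corrupt, replace=False)
    # A non-zero offset modulo n_classes guarantees the label actually changes.
    offsets = rng.integers(1, n_classes, size=n_corrupt)
    flat[idx] = (flat[idx] + offsets) % n_classes
    return corrupted


# Toy usage: an 8x8 map of 11 labels, corrupted at the rates used in Table 5.2.
rng = np.random.default_rng(0)
clean = rng.integers(0, 11, size=(8, 8))
for rate in (0.0, 0.1, 0.2):
    noisy = corrupt_labels(clean, rate, n_classes=11, rng=rng)
    print(rate, int((noisy != clean).sum()))
```

Feeding such corrupted hints to the downstream model at each rate would give one accuracy per column of the table.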

Moreover, applying the pixel hints increases the accuracy of the SMLP-NoHints architecture by 19.184 percentage points (from 53.836% to 73.02%) and that of the SMLP-Hints architecture by 6.117 percentage points (from 67.323% to 73.44%). This suggests that a model given appropriate guidance learns better and makes fewer classification errors. Although I expected that including the pixel labeling hints in the SMLP-Hints architecture would improve the performance even more, the best result I achieved was 73.44%. Because Pentomino patterns are ambiguous when viewed through an aperture, the pixel labeling hints cannot always tell P1NN exactly which Pentomino type it is looking at. Overall, however, the experiments on both the P1NN and P2NN tasks demonstrate that the pixel labeling hints provide beneficial information to the downstream tasks and improve performance significantly.
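The two gains quoted here are the row-wise differences in Table 5.1. A quick check, with the table values copied into a hypothetical `table_5_1` dictionary:

```python
# Accuracies (%) from Table 5.1 as (without pixel hints, with pixel hints).
table_5_1 = {
    "SMLP-NoHints": (53.836, 73.02),
    "SMLP-Hints": (67.323, 73.44),
}

# Gain in percentage points for each architecture, rounded to three decimals
# to hide floating-point noise.
gains = {
    name: round(with_hints - without_hints, 3)
    for name, (without_hints, with_hints) in table_5_1.items()
}
print(gains)
```

The differences are absolute changes in accuracy, not relative improvements.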

In the original paper, SMLP-Hints had 69.3% accuracy on the 20k dataset and 96.9% on the 40k dataset. Compared with SMLP-Hints on the 20k dataset, the pixel labeling hints improved on their Pentomino-discrimination hints by around 5%, but did not outperform their hints on the 40k dataset. Although they claim that adding more training examples did not help either training or test error, my experiments with SMLP-Hints do not support that claim. Moreover, their model relies on each Pentomino falling within predefined image regions, which is hard to achieve in real classification problems without first applying segmentation or object detection; in contrast, the pixel labeling hints do not require objects to be segmented in advance. The hints provided by the pixel labeling classifier are therefore more feasible and practical in a general setting.
