4.4 Extended data set grouping: a compression approach
4.5.3 MSSL for multi-class classification problems
4.5.3.4 Comparing among proposed multi-class MSSL techniques
Finally, Figures 4.12 and 4.13 show the differences among the MMSSL approaches. MMSSL using labels clearly fails to classify large areas of contiguous pixels, while the remaining methods predict them correctly. This is because using the confidences for each sample, instead of only the most probable label, makes it possible to break ties between equiprobable classes. Differences between compressed and non-compressed methods are less obvious: there are few differences in Figure 4.12, but in Figure 4.13 the non-compressed methods lead to smoother results than the compressed methods. Compressed methods tend to fail to close some class regions, leaving spurious pixels inside them. Sometimes this behavior leads to a better classification when the region was misclassified, as happens in the last row of Figure 4.12. The second row of Figure 4.13 shows the learning capacity of the likelihood maps. In column (d), MMSSL without compression, we can see a boundary marked as unknown object (class 0, black label) in front of the building, and the region inside it is labeled as car. In the original image, column (a), a woman riding a bicycle can be seen in that area. Although these elements are omitted in the groundtruth image, column (b), our method using likelihood maps is capable of detecting them as an element different from road or building, and of assigning the inner region to the car label, given its visual appearance and position.
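One way to read the tie-breaking argument above is sketched below. The numbers are hypothetical first-stage outputs for four neighbouring pixels and two classes, and the plain neighbourhood mean is an assumption standing in for the multi-scale neighbourhood description used by MMSSL; this is an illustration, not the implementation evaluated in the figures.

```python
import numpy as np

# Hypothetical first-stage confidences: 4 neighbouring pixels x 2 classes.
confidences = np.array([
    [0.55, 0.45],
    [0.52, 0.48],
    [0.10, 0.90],
    [0.05, 0.95],
])

# Hard labels keep only the most probable class of each pixel.
labels = confidences.argmax(axis=1)
one_hot = np.eye(confidences.shape[1])[labels]

# Neighbourhood support per class, from labels and from confidences.
label_support = one_hot.mean(axis=0)
confidence_support = confidences.mean(axis=0)

print(label_support)       # both classes receive the same support: a tie
print(confidence_support)  # class 1 clearly dominates
```

With labels only, the two weakly decided pixels count as much as the two strongly decided ones, so the neighbourhood is tied. Keeping the confidences preserves how certain each prediction was, and the tie disappears.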
Figure 4.12: Comparison of multi-class multi-scale stacked sequential learning approaches on the ETRIMS 4 Classes HOG database. (a) shows the original image, (b) the groundtruth image, and (c), (d), (e), and (f) show the different MMSSL schemes: (c) MMSSL using only label predictions, (d) MMSSL using confidences, (e) MMSSL using binary compression, and (f) MMSSL using ternary compression.
Figure 4.13: Comparison of multi-class multi-scale stacked sequential learning approaches on the ETRIMS 8 Classes HOG database. (a) shows the original image, (b) the groundtruth image, and (c), (d), (e), and (f) show the different MMSSL schemes: (c) MMSSL using only label predictions, (d) MMSSL using confidences, (e) MMSSL using binary compression, and (f) MMSSL using ternary compression.
4.6 Conclusions
In this chapter we extended the multi-scale stacked sequential learning framework in different ways. First, we adapted J to work with likelihood values instead of predicted labels when the first classifier is able to produce them. Second, we adjusted the MSSL framework to classify objects of different sizes. To do this, we proposed the shifting technique at testing time, which allows objects to be classified correctly at sizes different from those seen during learning. Third, we adapted multi-scale stacked sequential learning (MSSL) to the multi-class case (MMSSL). To do this, we embedded the ECOC framework in the base classifiers and showed how to compute the confidence maps using the normalized margins obtained from the ECOC base classifiers. Finally, we defined a compression approach for reducing the number of features in the extended data set. The results show that, on the one hand, MMSSL achieves accurate classification performance in multi-class classification problems by taking advantage of sequential learning. On the other hand, the compression process is feasible, since the loss of information is negligible in terms of accuracy.
In the next chapter we present an example application of MSSL to human body segmentation, where sequential dependencies exist between instances. In this scenario MSSL is used with great success, improving results with respect to state-of-the-art methodologies.
Application of MSSL for human body segmentation
Human segmentation in RGB images is a challenging task due to the high variability of the human body, which includes a wide range of human poses, lighting conditions, cluttering, clothes, appearance, background, point of view, number of human body limbs, etc. In this particular problem, the goal is to provide a complete segmentation of the person or people appearing in an image. In the literature, human body segmentation is usually treated in a two-stage fashion. First, a human body part detection step is performed, obtaining a large set of candidate body parts. These parts are then used as prior knowledge by segmentation/inference optimization algorithms in order to obtain the final human body segmentation.
In the first stage, that is, the detection of body parts, weak classifiers are trained in order to obtain a soft prior of body parts, which is often noisy and unreliable. Most works in the literature have used edge detectors, convolutions with filters, linear SVM classifiers, AdaBoost or cascading classifiers (67). For example, (58) used a tubular edge template as a detector and convolved it with an image, defining locally maximal responses above a threshold as detections. In (57), the authors used quadratic logistic regression on RGB features as the part detectors. Other works have applied more robust part detectors such as SVM classifiers (15, 35) or AdaBoost (51) trained on HOG features (19). More recently, Dantone et al. used Random Forests as classifiers to learn body parts (20). Although robust classifiers have recently been used, part detectors still suffer from false positives and false negatives, given the similarity among body parts and the presence of background artifacts. Therefore, a second stage is usually required in order to provide an accurate segmentation.
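The template-based detection scheme mentioned above (filter response, then local maxima above a threshold) can be sketched as follows. This is a minimal illustration and not the implementation of the cited work: the helper names `correlate_valid` and `detect_local_maxima`, the 3x3 bar template, the toy image and the threshold are all assumptions. Correlation is used in place of convolution, which is equivalent here because the template is symmetric.

```python
import numpy as np

def correlate_valid(image, template):
    """Slide the template over the image (no padding) and sum the products."""
    th, tw = template.shape
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    out = np.zeros((out_h, out_w))
    for dy in range(th):
        for dx in range(tw):
            out += template[dy, dx] * image[dy:dy + out_h, dx:dx + out_w]
    return out

def detect_local_maxima(response, threshold):
    """Return (row, col) positions that are 8-neighbourhood maxima above threshold."""
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    neighbours = [
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    ]
    is_peak = np.all([response >= n for n in neighbours], axis=0)
    return np.argwhere(is_peak & (response > threshold))

# Hypothetical template responding to a thin vertical bar.
template = np.array([[-1.0, 2.0, -1.0]] * 3)

# Toy image: a 3-pixel vertical bar in column 4, rows 2 to 4.
image = np.zeros((8, 8))
image[2:5, 4] = 1.0

response = correlate_valid(image, template)
detections = detect_local_maxima(response, threshold=5.0)
print(detections)  # top-left corner of the best-matching window: [[2 3]]
```

Plateaus of equal response would yield several detections with this comparison, which is one reason such detectors produce noisy candidate sets that need a second stage.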
In the second stage, soft part detections are jointly optimized taking into account the nature of the human body. However, standard segmentation techniques (e.g. region-growing, thresholding, edge detection, etc.) are not applicable in this context due to the huge variability of environmental factors (e.g. lighting, clothing, cluttering, etc.) and the changing nature of body textures. In this sense, the best-known models for the optimization/inference of soft part priors are Poselets (9, 51) by Bourdev et al. and Pictorial Structures (4, 30, 62) by Felzenszwalb et al., both of which optimize the initial soft body part priors to obtain a more accurate estimation of the human pose and provide a multi-limb detection. In addition, some works in the literature tackle the problem of human body segmentation (segmenting the full body as one class) with satisfactory results. For instance, Vinet et al. (66) proposed to use Conditional Random Fields (CRF) based on body part detectors to obtain a complete person/background segmentation. Belief propagation, branch and bound, and Graph Cut optimization are common approaches used to perform inference on the graphical models defined by the human body (38, 39, 59). Finally, methods like structured SVM or mixture of parts (72, 73) can be used in order to take advantage of the contextual relations of body parts.
In this chapter, we present a novel two-stage human body segmentation method based on the Multi-Scale Stacked Sequential Learning (MSSL) framework. In the first stage of our method for human segmentation, a multi-class Error-Correcting Output Codes (ECOC) classifier is trained to detect body parts and to produce a soft likelihood map for each body part. In the second stage, a multi-scale decomposition of these maps and a neighborhood sampling are performed, resulting in a new set of features. The extended set of features encodes spatial, contextual and relational information among body parts. This extended set is then fed to the second classifier of MSSL, in this case a Random Forest binary classifier, which maps a multi-limb classification to a binary human classification problem. Finally, in order to obtain the resulting binary human segmentation, a post-processing step is performed by means of Graph Cuts optimization, which is applied to the output of the binary classifier.
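The construction of the extended feature set in the second stage can be sketched as follows. This is a rough illustration under stated assumptions, not the method as implemented in this work: the helper names `box_blur` and `extended_features`, the box filter used as the smoothing operator, the radii, and the 4-connected sampling offsets are all hypothetical choices.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with edge padding (assumed stand-in for the per-scale smoothing)."""
    img = img.astype(float)
    if radius == 0:
        return img
    h, w = img.shape
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def extended_features(likelihood_maps, radii=(0, 1, 2),
                      offsets=((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))):
    """Build one feature row per pixel from per-part likelihood maps.

    likelihood_maps has shape (n_parts, H, W). Each map is smoothed at every
    scale and sampled at every neighbourhood offset, giving an array of shape
    (H * W, n_parts * len(radii) * len(offsets)).
    """
    _, h, w = likelihood_maps.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    columns = []
    for part_map in likelihood_maps:
        for radius in radii:
            padded = np.pad(box_blur(part_map, radius), pad, mode="edge")
            for dy, dx in offsets:
                window = padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
                columns.append(window.ravel())
    return np.stack(columns, axis=1)

# Hypothetical likelihood maps for 3 body parts on a 6x5 image.
rng = np.random.default_rng(0)
maps = rng.random((3, 6, 5))
features = extended_features(maps)
print(features.shape)  # (30, 45): 30 pixels, 3 parts x 3 scales x 5 offsets
```

Each row of `features` describes one pixel through the part likelihoods around it at several scales, and such rows would be the input of the second, binary classifier. The number of columns grows with the number of parts, scales and offsets.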