
AdaBoost learning of shape and color features for object recognition

2. Related Work

Sung and Poggio (Sung & Poggio, 1998) and Rowley et al. (Rowley et al., 1998) present early trainable systems in the face detection domain. The former assume a mixture of Gaussians for both object and background classes, while the latter use a multilayer neural network. A number of methods follow with different learning algorithms. The major obstacle to a generic object detection system lies in the way these methods explore the training data. The performance of appearance-based methods, where all pixel values are used in classification, is likely to degrade when background is embedded in the object examples. In addition, the scalability of the learning techniques has to be examined because of the large size and/or high dimensionality of the training data for object classes other than human faces. Finally, the "bootstrap" method of collecting negative examples (Sung & Poggio, 1998) is not easy to automate for generic object classes; one difficulty, for example, is setting the accuracy of the system in each bootstrapping round.

Viola and Jones (Viola & Jones, 2004) present a fast object detection system that uses a cascade of classifiers. This type of classifier provides a viable approach to exploring negative examples. However, training an optimal classifier of this type is extremely difficult (Viola & Jones, 2004), so a heuristic approach is adopted. As a result, the generalization performance of a cascade of classifiers is not clear.

The works in (Burl & Perona, 1996; Ioffe & Forsyth, 2001; Agarwal & Roth, 2002) belong to the class of detection-by-part methods. In this approach the object parts are first detected, then grouped to form objects according to an explicit spatial relationship among parts. This approach is intuitive. However, under the presented formulation only translation of parts is dealt with. The difficult problem of learning the object model is addressed in (Weber et al., 2000; Ioffe & Forsyth, 2001; Agarwal & Roth, 2002) where the specific object class is handled.

Detection by part can be seen from a different perspective. Mohan et al. (Mohan et al., 2001) model the pedestrian object class by six components. The support vector machine learning method (Vapnik, 1998) is used to train a detector for each component and to train a combined classifier. The system shows robust detection even when partial occlusion occurs, which is a clear advantage of this approach. Their results also show that a combination of classifiers outperforms a single-classifier approach. The major drawback is that the model is constructed manually. This problem is in fact also present in their related work (Papageorgiou & Poggio, 2000), where a reduced subset of features is selected manually to improve the detection speed.

In summary, methods that assume a generative model are not suitable for a generic detection system, while distribution-free methods such as the support vector machine (SVM) (Vapnik, 1998) or the Sparse Network of Winnows (SNoW) (Yang et al., 2000) do not fully address the class imbalance problem in an automatic manner. In addition, appearance-based methods do not provide a viable solution to the problem where background is embedded in the object examples. The detection-by-part approach deals with this problem elegantly. However, the problem of learning a generic object model remains unsolved. Furthermore, current methods consider object parts at one scale and orientation only, and hence important discriminative features might not be used.

3. AdaBoost learning

Let us consider a standard two-class classification problem. Let there be a training set $\{(x_i, y_i)\}$ drawn from some fixed but unknown distribution $P(x, y)$ on $X \times Y$, where $X$ is the space of the data variable $x$ and $Y = \{-1, 1\}$ is the set of values of the class label $y$. In our context, $-1$ denotes the background class and $1$ denotes the object class. The task is to predict the label $y$ given $x$.

Among the various learning techniques, ensemble learning methods (Freund & Schapire, 1997; Breiman, 1998) are suited for our problem because they are efficient and robust with respect to training data while making no assumption about the underlying distribution. The fact that they work directly in the distribution space of the input data allows us to deal with the class imbalance problem in a simple manner. They are flexible in that prior knowledge can be incorporated via the class of base classifiers. This allows us to design a discriminative model combining both color and shape information.
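One simple way to act on class imbalance through the example distribution can be sketched as follows. This is an illustration only: the text does not state which initial weighting is used, and the function name `balanced_initial_weights` and the choice of giving each class half of the total weight are assumptions.

```python
def balanced_initial_weights(labels):
    """Return weights D_1(i) that give each class half of the total mass.

    Assumption for illustration: labels are +1 (object) or -1 (background),
    and both classes occur at least once.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]


# One object example against four background examples.
labels = [1, -1, -1, -1, -1]
weights = balanced_initial_weights(labels)
```

With this weighting the single object example carries as much mass as all four background examples together, so a base classifier cannot reach a low weighted error by always predicting the background class.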

In this paper we are interested in a class of ensemble methods which finds a sparse linear combination of base classifiers (Freund & Schapire, 1997). Specifically, suppose that there is a set of classifiers (weak hypotheses) $H = \{h_t : X \to Y\}$ and a learning algorithm (base learner) which returns a hypothesis $h_t \in H$ for any distribution over the inputs. The number of classifiers in $H$ could be infinite. A classifier ensemble is constructed by iteratively calling the base learner with an appropriate distribution, depending on the empirical performance of the hypotheses learned in the previous steps.
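To make the role of the base learner concrete, here is a minimal sketch in which the hypotheses are threshold classifiers (decision stumps) on a single scalar feature. The stump form and the name `learn_stump` are assumptions made for illustration; the base classifiers in this paper are built on color and shape features, which are not modeled here.

```python
def learn_stump(xs, ys, weights):
    """Return (error, threshold, polarity) of the stump with lowest weighted error.

    A stump predicts `polarity` when x > threshold and `-polarity` otherwise.
    `weights` is the distribution over the training examples.
    """
    values = sorted(set(xs))
    # Candidate thresholds: one below every value, then each observed value.
    candidates = [values[0] - 1.0] + values
    best = None
    for threshold in candidates:
        for polarity in (1, -1):
            error = sum(
                w
                for x, y, w in zip(xs, ys, weights)
                if (polarity if x > threshold else -polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, -1, 1, 1]
error, threshold, polarity = learn_stump(xs, ys, [0.25, 0.25, 0.25, 0.25])
```

The learner depends on the data only through the weighted error, so changing the distribution changes which hypothesis it returns. This is the property the ensemble construction relies on.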

The AdaBoost algorithm (Freund & Schapire, 1997) is a powerful ensemble learning method. Empirical studies in (Breiman, 1998) show that the performance of the AdaBoost algorithm is similar to or slightly better than that of related ensemble methods in terms of generalization. We choose the AdaBoost algorithm because of its ease of implementation. A summary of the algorithm is as follows.

The AdaBoost Algorithm

Input: $N$ examples $\{(x_i, y_i)\}$ and an initial distribution represented by a set of weights $D_1(i)$ over the examples.

Do for $t = 1, \ldots, T$

1. Learn a hypothesis $h_t \in H$ from the training examples with distribution $D_t$.

2. Calculate the empirical error of $h_t$

$$\epsilon_t = \Pr_{i \sim D_t}\left[h_t(x_i) \neq y_i\right] \tag{1}$$

3. Set

$$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$$

4. Update

$$D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Q_t}$$

where $Q_t$ is a normalization factor.

Output: The final classifier

$$f_H(x) = \operatorname{sign}\left(g_H(x)\right) \tag{2}$$

where

$$g_H(x) = \sum_{t=1}^{T} \alpha_t h_t(x) \tag{3}$$

$g_H(x)$ might be used to indicate the confidence of the classification.
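The algorithm summarized above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in this paper: the hypothesis set is a small finite pool of one-dimensional threshold stumps, the base learner is an exhaustive search over that pool, and the clamping of the error and the early stop at an error of 0.5 or more are choices made here to keep the code well defined.

```python
import math


def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity if x > threshold."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def weighted_error(stump, xs, ys, dist):
    """Eq. (1): total weight of the examples the stump misclassifies."""
    return sum(d for x, y, d in zip(xs, ys, dist) if stump_predict(stump, x) != y)


def adaboost(xs, ys, dist, stumps, rounds):
    """Return a list of (alpha_t, h_t) pairs following steps 1 to 4."""
    dist = list(dist)
    ensemble = []
    for _ in range(rounds):
        # Step 1: the base learner picks the stump with lowest weighted error.
        h = min(stumps, key=lambda s: weighted_error(s, xs, ys, dist))
        # Step 2: empirical error of h under the current distribution.
        eps = weighted_error(h, xs, ys, dist)
        if eps >= 0.5:
            break  # assumption of this sketch: stop when h is no better than chance
        eps = max(eps, 1e-10)  # assumption of this sketch: avoid division by zero
        # Step 3: weight of the hypothesis in the ensemble.
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        # Step 4: re-weight the examples and normalize by Q_t.
        unnormalized = [
            d * math.exp(-alpha * y * stump_predict(h, x))
            for x, y, d in zip(xs, ys, dist)
        ]
        q = sum(unnormalized)
        dist = [u / q for u in unnormalized]
        ensemble.append((alpha, h))
    return ensemble


def confidence(ensemble, x):
    """Eq. (3): weighted vote g_H(x)."""
    return sum(alpha * stump_predict(h, x) for alpha, h in ensemble)


def classify(ensemble, x):
    """Eq. (2): sign of g_H(x); a zero vote is mapped to +1 in this sketch."""
    return 1 if confidence(ensemble, x) >= 0 else -1


# Toy data that no single stump separates.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
stumps = [(t + 0.5, p) for t in range(-1, 10) for p in (1, -1)]
ensemble = adaboost(xs, ys, [0.1] * 10, stumps, rounds=3)
predictions = [classify(ensemble, x) for x in xs]
```

On this toy set the best single stump misclassifies three of the ten examples, while the three-round ensemble labels all ten correctly. The value returned by `confidence` grows in magnitude as more of the weighted stumps agree.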