
XINGHAN LUO

ALGORITHMS FOR FACE AND FACIAL FEATURE DETECTION

Master of Science Thesis

Examiners: Dr. Atanas Gotchev, Prof. Karen Egiazarian
Institute of Signal Processing

Tampere University of Technology (TUT), Finland

Prof. Joern Ostermann

Institut fuer Informationsverarbeitung (TNT) Leibniz University of Hanover (LUH),

Germany

Examiners and topic approved in the Information Technology Department Council meeting on 1st June 2006


Abstract

TAMPERE UNIVERSITY OF TECHNOLOGY

Master's Degree Programme in Information Technology

LUO, XINGHAN: Algorithms for Face and Facial Feature Detection
Master of Science (Technology) Thesis, 58 pages, 5 appendix pages
December 2007

Major: Image and Video Signal Processing

Examiners: Dr. Atanas Gotchev (TUT), Prof. Karen Egiazarian (TUT), Prof. Joern Ostermann (LUH)

Keywords: face detection, facial feature detection, AdaBoost, cascade, Active Appearance Models, landmarks, optimization, facial animation

Automatic face detection and facial feature localization are important computer vision problems, which have challenged a number of researchers to develop fast and accurate algorithms for their solution. This Master of Science thesis, structured in two parts, addresses two modern approaches for face and facial feature detection, namely cascade Adaptive Boosting (AdaBoost) face detection [4], [5] and Active Appearance Models (AAM) [1], [2] based facial feature detection. In the first part, the AdaBoost method and its extensions are overviewed and a Matlab-based test platform is described. To develop it, training and test face databases have been collected and generated. Matlab packages for face detection have been modified to include a multiple detection elimination module and a graphical user interface for easy manipulation and demonstration. The demo software has also been extended with an OpenCV-based implementation. In the second part, facial feature detection as a first step in a facial animation application is studied. An AAM based facial feature detection system, aimed at lip motion and face shape detection/tracking, has been targeted. Its development has included proper landmark definition for lip/face shape models, training/test set selection and marking, and modification of an existing AAM module built on the OpenCV library.

Specifically, the detection accuracy has been improved by landmarks optimization based on training set self-rebuilt convergent iterations. Detection and tracking results presented demonstrate significant improvements compared with previous implementations. As the main contribution of this thesis project, an accurate facial feature geometric database has been set up based on automatic accurate facial feature detection. It is expected to improve the performance of facial animation [53] techniques being developed for 3DTV-related applications.


Preface

This master thesis has been accomplished during the period September 2006 – August 2007. The research work has been split into two parts: study and development of face detection techniques, and study and development of facial feature detection techniques for the needs of facial animation systems. The former topic has been carried out at the Institute of Signal Processing, Tampere University of Technology (Finland), while the latter has been carried out at the Institut fuer Informationsverarbeitung (TNT), Leibniz University of Hanover (Germany), within the EC-funded 3DTV Network of Excellence. The work was funded by the same project.

I would like to express my sincere appreciation to my kind teacher and supervisor Dr. Atanas Gotchev at TUT, who offered me this interesting thesis topic, gave me valuable and helpful advice and instructions, and has always been supportive and patient. I would also like to thank Prof. Karen Egiazarian, who provided financial support for the research and gave important remarks and suggestions. I would like to extend my special thanks to my daily supervisor at LUH, PhD student Kang Liu, who has always been a nice friend and active colleague, leading the direction of the work and research, sharing his knowledge and experience with me, and guiding and helping me through constructive discussions. Thanks also to my German host Prof. Joern Ostermann, who warmly welcomed me and involved me in the ambitious research of his group; he has always paid close attention to the progress of the work and offered precious technical remarks and suggestions.

Xinghan Luo


Table of contents

ABSTRACT ... I
PREFACE ... II
ABBREVIATIONS ... V
LIST OF FIGURES ... VI
LIST OF TABLES ... VII

1. INTRODUCTION... 1

1.1 Problem statement ... 2

1.2 Thesis objectives... 3

1.3 Thesis organization ... 3

2. AN OVERVIEW OF TECHNIQUES FOR FACE AND FACIAL FEATURE DETECTION ... 4

2.1 Techniques for face detection ... 4

2.2 Facial feature detection algorithms... 7

2.3 Applications ... 10

3. ADABOOST AND CASCADE FACE DETECTION... 11

3.1 Algorithm basics ... 11

3.1.1 Features and integral image ... 12

3.1.2 AdaBoost core method ... 14

3.1.3 Cascade of classifiers ... 16

3.2 Extensions and software packages ... 17

3.2.1 Floatboost methods ... 17

3.2.2 OpenCV face detector ... 18

3.3 Implementations ... 19

3.3.1 Database collection ... 19

3.3.2 AdaBoost Matlab implementation and modification ... 21

3.3.3 Matlab GUI-based platform for FD ... 22

3.3.4 Multiple detection elimination ... 26

4. AAM-BASED FACIAL FEATURE DETECTION ... 29

4.1 AAM basics ... 29

4.1.1 Building active appearance models... 30

4.1.2 AAM search ... 31

4.2 AAM tools and AAM-API package... 33

4.2.1 AAM tools... 33

4.2.2 AAM-API package... 33

4.2.3 Object contour detection and tracking ... 34

4.3 Training set optimization ... 35


4.3.1 Training and test set ... 35

4.3.2 Landmark definition and marking... 35

4.3.3 Optimized training set selection... 37

4.3.4 Landmark set optimization... 39

4.3.5 Accuracy assessment... 43

4.4 Appearance model initial projection... 44

4.4.1 Exhaustive approximation... 44

4.4.2 OpenCV-based initialization for face tracking... 44

4.5 Experimental results and discussion... 46

4.5.1 Training set landmark optimization results ... 46

4.5.2 Lip and face contour detection and tracking results... 48

4.5.3 Facial animation application and results ... 50

5. CONCLUSION ... 53

BIBLIOGRAPHY ... 54

APPENDIX ... 59


Abbreviations

FD Face Detection
FFD Facial Feature Detection
AdaBoost Adaptive Boosting
AAM Active Appearance Models
NN Neural Network
PCA Principal Component Analysis
ATR Automatic Target Recognition
HCI Human-Computer Interaction
GST Generalized Symmetry Transform
VPF Variance Projection Function
PDM Point Distributed Models
SPDNN Self-Growing Probabilistic Decision Neural Network
SVM Support Vector Machine
GWN Gabor Wavelet Network
OpenCV Intel’s Open Source Computer Vision Library
FERET The Facial Recognition Technology
GPA Generalised Procrustes Analysis
LLE Locally Linear Embedding
VTTS Visual Text to Speech Synthesizer
TTS Text To Speech


List of figures

Figure 3.1 Four basis Haar-like features a, b, c, d ... 12

Figure 3.2 Integral image... 13

Figure 3.3 Basic scheme of AdaBoost and its main goals... 14

Figure 3.4 Schematic depiction of the detection cascade... 16

Figure 3.5 Extended set of Haar-like features ... 18

Figure 3.6 FERET and frontal face samples... 20

Figure 3.7 (a) Matlab GUI for FD ... 22

Figure 3.7 (b) Tool bar... 22

Figure 3.7 (c) (1)(2)(3) Single and batch processing... 23

Figure 3.7 (d) Real-time processing report... 23

Figure 3.7 (e) Statistics save and plot... 24

Figure 3.7 (f) Options and parameter setting... 24

Figure 3.8 (a)(b) Multiple detection elimination ... 26

Figure 3.9 Merge multiple detection ... 27

Figure 3.10 Eliminate low weight overlap detection... 28

Figure 3.11 Final result of multiple detection elimination ... 28

Figure 4.1 Example AAM texture and shape models of human face ... 30

Figure 4.2 Example appearance models of human face ... 31

Figure 4.3 AAM search on unseen image... 32

Figure 4.4 Typical AAM-API based object tracking system ... 34

Figure 4.5 Example training and test samples ... 35

Figure 4.6 Face and lip landmark and contour definition... 36

Figure 4.7 Flow chart for optimized training set selection... 37

Figure 4.8 Example of lip and face training samples by optimized selection ... 38

Figure 4.9 Flow chart for training set self-rebuild convergent iterations ... 39

Figure 4.10 (a)(b) Example 10-iteration convergence plot for x, y coordinates ... 40

Figure 4.11 Example correction for ‘train000’ for the ‘mund’ training set ... 41

Figure 4.12 Flow chart for accuracy measurement ... 43

Figure 4.13 Comparison of initial approximation for face model... 45

Figure 4.14 (a) Highlight regions ... 46

Figure 4.14 (b) (1) Manual landmarks (2) Optimized landmarks ... 46

Figure 4.14 (c) (1) Manual landmarks (2) Optimized landmarks... 47

Figure 4.15 (a)(b)(c) Example resulting frames ... 48

Figure 4.16 (a)(b)(c) Example of typical error mouth tracking... 49

Figure 4.17 System block diagram of VTTS facial animation ... 50
Figure 4.18 (a)(b)(c) Comparison of synthetic mouth motions in continuous frames ... 52


List of tables

Table 2.1 Main FD methods ... 4

Table 2.2 Main FFD methods ... 7

Table 2.3 FFD methods comparison and evaluation ... 9

Table 4.1 Video clip series for training and test samples ... 35

Table 4.2 91 training sample selection from 4 ‘mund’ video sequences ... 37

Table 4.3 Total number of selected training and test samples ... 38

Table 4.4 (a) x, y values of convergence iterations for No.10 landmark... 41

Table 4.4 (b) x, y values of convergence iterations for No.12 landmark ... 41

Table 4.5 Statistics of overall error corrections for 34 landmarks in train000 ... 41


1. Introduction

When we observe an image, numerous processes in our brain interpret and analyse it: they enable us to deconstruct the image into individual objects, build an understanding of what the image presents, and further interpret those objects. We then form an opinion of what is in the image and what is happening. This seems easy and natural, in some sense automatic, because we already possess a sophisticated and well-trained biological vision and signal processing system. This system is so efficient and accurate that it enables us to perceive, detect, recognize and analyse objects and scenes, and to be sensitive even to subtle differences. The field of computer vision is very much centred on replicating and simulating this innate human ability to learn and comprehend, but it remains largely academic and has yet to be fully achieved.

Among all computer vision research topics, human facial image processing is one of the most essential ones. It includes challenging sub-topics such as face detection, face tracking, face recognition, pose estimation, expression recognition and facial animation.

It is also essential for intelligent vision-based human-computer interaction and other applications. All face-specific techniques rely on the positions of faces and facial features, i.e. the faces and facial features in an image or image sequence must first be localized. Algorithms for face detection and facial feature detection are therefore designed to provide the initial information for any further processing of facial images. Due to the variation in appearance and shape of different faces under different conditions, detecting faces or facial features is considered a very demanding computer vision problem, not yet completely solved. However, recent years have witnessed several breakthroughs in the field.

In this master thesis, modern algorithms for face and facial feature detection are overviewed. Two state-of-the-art algorithms, namely AdaBoost-based face detection [4], [5] and AAM-based facial feature detection [1], [2], [56], [57], [58] are studied in detail, modified and implemented. Experiments with face databases demonstrate the effectiveness of these modified implementations. The aim of this first introductory chapter is to formulate the problems of face detection and facial feature detection, to specify the thesis objectives and to overview the thesis structure.


1.1 Problem statement

Face Detection (FD) denotes the general problem of determining the locations and sizes of human faces present in digital images [3], [19]. Based on the localized face, the problem of Facial Feature Detection (FFD) is to find the exact locations of facial features, such as the mouth and eye corners, the lip contour, the jaw contour, and the shape of the entire face [21].

Face and facial feature detection are difficult problems due to the large variations a face can have in a scene, caused by factors such as intra-subject variations in pose, scale, expression, colour and illumination, background clutter, the presence of accessories, occlusions, hair, hats, eyeglasses, beards, etc. [20]. The easiest case considers a single frontal face on an uncluttered background, and solutions for this case are available.

However, most realistic images generally contain multiple faces and faces with (some of) the following variations:

Pose: the camera angle can vary to the extreme of the image being rotated a full 180 degrees from an upright frontal view. The orientation of the face in the image can be e.g. frontal, rotated by 45 degrees, profile, or upside-down. The pose can also occlude some of the facial features of interest.

Presence or absence of facial components: facial components such as beard, moustache or glasses influence the detection precision.

Expression: facial expressions change the appearance of a face.

Occlusion: faces may be partially obscured behind objects, or other faces.

Image orientation: the rotation of the observed image can directly affect the possible locations of faces.

Imaging conditions: factors such as lighting and camera response characteristics affect the interpretation of the image in later processing stages.

Ideal FD / FFD systems should identify and locate human faces and facial features in any image with a cluttered background, regardless of their position, scale, in-plane rotation, orientation, pose (out-of-plane rotation) and illumination. Such systems have been the ultimate goal of many researchers. A large number of methods and algorithms have already been developed and implemented. Even though some of them have shown high performance, the ultimate goal is still far from being completely achieved. Recent research efforts have focused on increasing the accuracy, speed and robustness against facial variations of existing algorithms and implementations, either by utilizing better training sets, by extending current methods, or by seeking alternative approaches.


1.2 Thesis objectives

The objective of the work presented in this thesis is twofold. First, it aims at implementing a demo system capable of accommodating, comparing and visualizing different face detection algorithms working on various face image databases. Second, it aims at studying active appearance models for their applicability to facial feature detection, so as to implement an AAM based algorithm for the purposes of facial animation.

Experiments with databases containing face image sequences, including training and test sets selection and manual facial landmark marking are also within the scope of the thesis.

1.3 Thesis organization

The thesis is organized into five chapters. Chapter 1 introduces the topic and specifies the thesis objectives. Chapter 2 presents a brief overview of recent state-of-the-art methods for face and facial feature detection, as well as potential applications of these. Chapter 3 concentrates on Viola and Jones’ AdaBoost and cascade FD methods, different extensions and implementations. The same chapter describes the developed demo system and the experiments performed on selected face databases.

Chapter 4 overviews the basics of AAM and its applicability to FFD. Novel approaches for training set optimization, together with experimental results demonstrating their superiority, are presented as well. Chapter 5 summarizes the outcomes of the thesis project and draws conclusions and recommendations for future work.


2. An overview of techniques for face and facial feature detection

2.1 Techniques for face detection

First FD attempts date back to the 1970s. At that time, simple rule-based approaches were used [3]. These approaches made simplifying initial assumptions such as frontal view of the faces and an uncluttered background. Later, the simple rule-based methods evolved into more complex ones, e.g. methods based on features and employing neural networks.

Overall, the methods differ in the following aspects: representation, search strategy, post-processing, precision, scale invariance and computational complexity. They can be broadly classified into the four categories listed in Table 2.1 [3]. Note, however, that many FD methods combine different strategies and hence the categories might overlap.

Table 2.1 Main FD methods

Knowledge based methods: Encode human knowledge of what constitutes a typical face, usually the relationships between facial features. Mainly used for face localization.

Feature invariant methods: Locate facial features which are invariant under different conditions and use these to locate the face. Aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary.

Template matching methods: Several standard patterns of a face are stored, and their correlation with a test image is computed for detection. The patterns describe either the face as a whole or the facial features separately.

Appearance or image based methods: Models are learnt from face images by training, and the learnt models are used to perform detection.

In knowledge based FD [22], methods are developed from rules derived from the researcher’s knowledge of human faces. It is easy to come up with simple rules describing the features of a face and their relationships. Developing a knowledge-based FD system implies that a series of rules is defined prior to implementation, and the system is determined by the scope and accuracy of the rule set. The rules are generally either strict or loose. Strict rules yield a comparatively low detection rate; loosely defined rules lead to high false detection rates. The most prevalent characteristic of knowledge-based techniques is the absence of training: the system is bound by the rules that have been carefully defined, and for this reason there is an upper bound on its effectiveness in a detection task.
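As a toy illustration of the knowledge-based idea (the region splits, rules and tolerance below are invented for this sketch, not taken from [22]), a hand-written rule set might test whether a grayscale crop shows the typical intensity layout of an upright frontal face:

```python
def mean_intensity(img, r0, r1, c0, c1):
    """Average intensity over a rectangular region of a 2-D grayscale image."""
    vals = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(vals) / len(vals)

def rule_based_face_check(img):
    """Toy knowledge-based rules: in an upright frontal face crop, the eye band
    is typically darker than the cheek band below it, and the left and right
    eye regions have roughly symmetric intensity."""
    h, w = len(img), len(img[0])
    eye_band = mean_intensity(img, h // 4, h // 2, 0, w)
    cheek_band = mean_intensity(img, h // 2, 3 * h // 4, 0, w)
    left_eye = mean_intensity(img, h // 4, h // 2, 0, w // 2)
    right_eye = mean_intensity(img, h // 4, h // 2, w // 2, w)
    darker = eye_band < cheek_band
    symmetric = abs(left_eye - right_eye) < 30   # loose symmetry tolerance
    return darker and symmetric
```

Tightening the symmetry tolerance makes the rule set “strict” (lower detection rate), while loosening it makes it “loose” (more false detections), mirroring the trade-off described above.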

In contrast to the knowledge based top-down approach, researchers have been trying to find invariant features of faces to be used for detection [4], [5], [6], [7], [8].

The underlying assumption is based on the observation that humans can effortlessly detect faces and objects in different poses and lighting conditions, so there must exist properties or features which are invariant over these varying conditions. Numerous methods have been proposed to first detect facial features and then infer the presence of a face. Facial features such as eyebrows, eyes, nose, mouth, and hair-line are commonly extracted using edge detectors. Based on the extracted features, a statistical model is built to describe their relationships and to verify the existence of a face. One problem with these feature-based algorithms is that the image features can be severely corrupted by illumination, noise, and occlusion. Feature boundaries can be weakened, while shadows can cause numerous strong edges which together render perceptual grouping algorithms useless. Locating useful classification features in an image is a complex task; depending on the type of feature used, there can be an extremely large sample space. Feature invariant methods are designed so that only the critical features required for detection are used, usually through exhaustive feature evaluation with boosting or bagging algorithms. Depending on the learning function used, systems that evaluate localized facial features are generally quicker and more robust than their pixel-based counterparts.

In template matching FD [9], [10], a standard face pattern (usually frontal) is manually predefined or parameterised by a function. Given an input image, the correlation values with the standard patterns are computed for the face contour, eyes, nose, and mouth independently. The existence of a face is determined based on the correlation values. This approach has the advantage of being simple to implement.

However, it has proven inadequate for FD, since it cannot effectively deal with variation in scale, pose, and shape. Multi-resolution, multi-scale, sub-template and deformable template approaches have subsequently been proposed to achieve scale and shape invariance. In the largely probabilistic task of FD, the image must be examined at multiple resolutions. The human face, although it may change in colour, generally does not change greatly in shape, so it is feasible to model the characteristics of a human face and thus build a template; this template can then be used to speed up the evaluation of faces at multiple scales.

The application of a template is similar to knowledge-based methods [22]: knowledge of the characteristics of the human face can either be learnt, yielding a dynamic template, or be predefined. A template can be applied at many scales, owing to the simplistic nature of most common templates, which relate merely to the symmetry of facial features and their relative distances. However, template matching alone is generally weak, and it is often used in conjunction with other feature invariant techniques to strengthen the inference. The discussion here focuses only on techniques where template matching is the basis for face detection.
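The correlation step can be sketched minimally (a pure-Python illustration assuming equally sized patches, not the exact formulation of [9], [10]): the zero-mean normalized cross-correlation between an image patch and a stored face template is invariant to brightness and contrast shifts.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation between two equally sized
    grayscale patches (2-D lists); returns a score in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0
```

A detector would slide the template over the image at several scales and report locations where the score exceeds a threshold.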

Rather than using templates, which are generally built upon expert assumptions about the structure of the object being located, appearance based FD [11], [12] employs a machine learning paradigm which allows effective models/templates to be built from a training set of data. The obvious constraint of such systems is that they are particularly reliant on the learning techniques used and on the sample sets provided to them; yet they are able to provide results equivalent to, if not better than, the previously identified techniques.

In contrast to template matching methods, where templates are predefined by experts, the ‘templates’ in appearance-based methods are learned from example images. In general, appearance-based methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and non-face images. The learned characteristics take the form of distribution models or discriminant functions that are subsequently used for FD. Meanwhile, dimensionality reduction is usually carried out for the sake of computational and detection efficiency.
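As a deliberately simplified sketch of the appearance-based idea (the “model” here is just a pixel-wise mean plus a distance threshold, far cruder than the statistical models in [11], [12]), a template can be learnt from flattened training face vectors and used as a discriminant:

```python
def learn_mean_template(samples):
    """'Learn' the simplest possible appearance model: the pixel-wise mean
    of flattened training face vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def classify(vec, template, threshold):
    """Face / non-face decision based on mean squared distance to the
    learnt template."""
    mse = sum((v - t) ** 2 for v, t in zip(vec, template)) / len(vec)
    return mse < threshold
```

Real appearance-based detectors replace the mean with a learnt distribution or discriminant function, but the workflow — learn from examples, then threshold a score — is the same.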


2.2 Facial feature detection algorithms

Facial feature detection (FFD) aims at searching for the exact location of facial features, such as the eyes, nose, mouth, ears, etc., within a given region of an image or image sequence [21]. FFD algorithms output the coordinates of pre-defined facial feature points, which can be connected to draw the contours of facial features. FFD can be regarded as post-processing of a facial image within a region of interest defined by some preceding method, e.g. a manually localized facial region or the output of FD.
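A hypothetical sketch of this output format (landmark ordering and closure are assumptions for the illustration): landmarks are ordered (x, y) coordinates, connected into segments that draw a feature contour, closed for shapes such as a lip outline:

```python
def contour_segments(landmarks, closed=True):
    """Connect ordered facial-feature landmarks (x, y) into line segments
    that draw the feature contour; the polygon is closed for shapes such
    as lips or the jaw line."""
    pts = list(landmarks)
    pairs = list(zip(pts, pts[1:]))
    if closed and len(pts) > 2:
        pairs.append((pts[-1], pts[0]))   # close the contour
    return pairs
```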

FFD methods can be broadly classified into the following five categories listed in Table 2.2 below:

Table 2.2 Main FFD methods

Knowledge based: Summarize rules according to the characteristics of typical facial features; transform the input image to intensify the target features; find the candidates or region of interest.

Geometric based: Construct a geometric model with variable parameters according to the shape of facial features; define evaluation functions to measure the difference between regions of interest and the model. Continuously update the model parameters to minimize the evaluation function until the model converges on the facial feature.

Colour based: Build a colour model for facial features based on statistics. Exhaustively search the potential regions; select candidates by comparing the colour information of each region with the facial colour model.

Appearance based: Map the sub-windows within the facial feature region to points in a high-dimensional space. A set of such points can represent facial features of the same type, and their distribution models can be deduced statistically. Facial features can then be located by matching the potential regions against the models.

Association information based: Based on local information of individual facial features; the relative locations of facial features within a face can be used to narrow the search area.

In knowledge based FFD, the knowledge is an experiential description of typical facial features. A human facial image has some obvious characteristics: the facial region contains the two eyes, nose and mouth, which normally have a lower intensity level than the surrounding regions; the eyes are symmetric, and the nose and mouth lie roughly on this symmetry axis, etc. To utilize these basic features for FFD, the input image is transformed to emphasize the desired features and to filter out candidate points or regions. The difficulties of this approach are related to the accuracy, universality and adaptivity of the description used. Among the knowledge based methods, Yang and Huang [22] proposed the Mosaic Image approach, which divides the image into panes to localize facial features. In another approach, proposed by Kotropoulos [23], flexible rectangular units are used instead of panes to better fit the human facial shape. In the Geometric Mapping approach [24], [25], intensity sums along the X and Y directions are calculated; the points at which these projections change markedly are identified, and their locations are combined to determine the feature locations. Similarly, Feng and Yuen [26] proposed the Variance Projection Function (VPF) for the mapping. In a threshold-based approach proposed by Zhang [27], the pupils of the eyes are found via a thresholded image, and the remaining features are found by filtering and edge tracing. In a generalized symmetry approach, Reisfeld et al. [28], [29] defined the so-called Generalized Symmetry Transform (GST); it relies on the strong symmetry of the human eyes and on the geometric distribution of facial features, and is robust against rotation, different expressions, lighting conditions, etc.
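The projection idea can be sketched directly (a simplified reading of [24], [25], [26], not their exact formulations): integral projections sum intensities per row or column, so dark horizontal bands such as the eyes or mouth appear as valleys, while the VPF uses the per-row variance instead of the sum.

```python
def h_projection(img):
    """Horizontal integral projection: sum of intensities in each row.
    Valleys (low sums) hint at dark horizontal bands such as eyes or mouth."""
    return [sum(row) for row in img]

def v_projection(img):
    """Vertical integral projection: sum of intensities in each column."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]

def variance_projection(img):
    """Variance Projection Function: per-row intensity variance, which
    responds to rows mixing dark features with bright skin."""
    out = []
    for row in img:
        m = sum(row) / len(row)
        out.append(sum((v - m) ** 2 for v in row) / len(row))
    return out
```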

In geometric based FFD, a snake-based approach proposed by Kass [30] utilizes edge detection and image segmentation [31], [32]. In the variable template approach, Yuille [33] used parameterised variable templates to locate the eyes and mouth.

The Point Distributed Models (PDM) approach, proposed by Cootes [34], is a parameterised shape description model which applies a set of discrete control points to depict the object shape and uses PCA to set up kinetic models, with restrictions on each control point to keep the deformation within an acceptable range [46]. An application of PDM to FFD can also be found in Lanitis’s work, utilizing facial models with 152 control points [36]. Compared with snakes, the PDM approach adds more feature information to the models and reduces their sensitivity to noise and local deformation, but at the cost of higher computational complexity.
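The PDM mechanics can be sketched under the usual linear-shape assumption (the mode vectors and standard deviations here are placeholders, not trained values): a shape is the mean plus a weighted sum of deformation modes, and each weight is clamped to a plausible range to keep the deformation acceptable.

```python
def clamp_shape_params(b, stddevs, k=3.0):
    """Keep the PDM deformation plausible by limiting each shape parameter
    to k standard deviations of its training-set variation."""
    return [max(-k * s, min(k * s, v)) for v, s in zip(b, stddevs)]

def reconstruct_shape(mean_shape, modes, b):
    """x = mean + P.b : rebuild flattened landmark coordinates from the
    mean shape plus a linear combination of deformation modes."""
    x = list(mean_shape)
    for coeff, mode in zip(b, modes):
        x = [xi + coeff * mi for xi, mi in zip(x, mode)]
    return x
```

In a real PDM the modes and standard deviations come from PCA of the aligned training shapes; here they are supplied by hand purely to show the parameter constraint.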

In colour based FFD, Phung [37] used the so-called ‘cave seeking’ approach: image areas recognized as ‘skin’ by their colour are examined, and the ‘caves’ found within them are classified as facial features according to their size, shape, position, etc. In a skin colour modelling approach, Fu used a Self-Growing Probabilistic Decision Neural Network (SPDNN) [38] in the YES colour space to build models for the E and S weights of the eye pupils. Other researchers have applied the YCbCr space for skin colour modelling.
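A hedged sketch of chrominance-based skin classification (the box thresholds below are commonly quoted illustrative values for YCbCr skin modelling, not the models of [37] or [38]):

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b):
    """Box rule on the chrominance plane: luminance is ignored, which gives
    the rule some robustness to lighting changes."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173
```

Pixels passing the rule form candidate skin regions, within which darker ‘caves’ can then be sought as facial feature candidates.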

In appearance based FFD, Waite used a neural network approach to localize eyeballs [39]. The idea is based on the fact that, compared with the whole eye, the eye micro-features (e.g. the left and right eye corners, the upper and lower eye sockets, and nearby regions) are invariant; therefore, segments of the intensity images around the micro-features can be used to build separate NNs. Similarly, Reinders utilized gradient vectors as NN inputs [40]. The idea is then to search the target area with the different NNs, and to filter and combine the results to locate the features. In the classical PCA approach, the Karhunen-Loeve transform (KLT) is used to map high dimensional vectors representing the human face to a few so-called eigenface vectors in a lower dimensional sub-space, to optimally decompose and reconstruct the facial image [46]. Cootes proposed using multiple PCA models to assist the definition of the initial parameters of PDM [41]. In a Support Vector Machine (SVM) approach, Pan used a square scan window and considered the eyebrow and eye as a single object in order to reduce the interference of the eyebrows while identifying the eyes [42]. Li proposed a 2-layer SVM approach, in which candidate points are first filtered out by an SVM with a linear kernel, and an SVM with a polynomial kernel makes the final judgement [43].

In contrast to the above mentioned methods, association information based FFD reduces the number of candidate points by exploiting the relatively fixed positions of facial features. Kin and Cipolla used a probability network approach, with a 3-layer Bayesian probability network to build the facial model [44]. They first search for feature candidate points by combining a Gaussian filter with an edge detector, and then utilize the relations between adjacent points, matching vertical or horizontal pairs, leading to a more precise classification of four facial regions (up, down, left and right). In a Gabor Wavelet Network (GWN) based approach, Feris used a 2-layer GWN tree model, where the first layer represents the whole face and the second the individual facial features [45]. A GWN tree model is built for every training image, each facial feature is labeled, and the collection of training samples forms a facial database. When searching a new image, the most similar whole-face GWN model is selected from the database; the search starts from the labeled points within this model, and the accurate locations of the facial features are found by matching the corresponding labeled features in the model.

Table 2.3 below shows a comparison of different FFD methods in terms of computational complexity, accuracy and robustness.

Table 2.3 FFD methods comparison and evaluation
(columns: complexity / accuracy / image quality requirement / influence of pose, expression and lighting)

Knowledge based
  Mosaic Image:             complex            / relatively low  / high            / high
  Geometric Mapping:        simple             / relatively low  / high            / high
  Thresholding:             simple             / relatively low  / high            / high
  Generalized symmetry:     complex            / relatively high / high            / high

Geometric based
  Snake, Variable template: complex            / high            / relatively high / high
  ASM:                      relatively complex / high            / relatively high / high

Colour based:               simple             / low             / relatively high / low

Appearance based (Neural networks, PCA, SVM):
                            complex            / high            / low             / relatively low

Association information based
  Probability network:      low                / high            / relatively low  / relatively low
  Gabor Wavelet Network:    relatively complex / high            / relatively low  / relatively low


2.3 Applications

The application of FD and FFD techniques can be summarized as follows [3], [21]:

(1) First step in any fully automatic face or facial expression recognition system.

(2) First step in surveillance systems targeting face pose estimation and human body movement tracking.

(3) Automatic Target Recognition (ATR) applications or generic object detection or recognition applications.

(4) Human-machine interaction systems.

Furthermore, accurate localization of facial features would enable various further applications such as face recognition, gesture recognition, expression recognition, face image compression and reconstruction, facial animation.

FD and FFD play an important role in 3D applications, such as 3D visual communications and 3DTV [20]. Knowledge about the motion of the human face and body, and about the nature and limits of human motions, can be used to make the processing more efficient. Case-oriented algorithms can perform better than general purpose algorithms.

For 3D display systems, detection and tracking of the observer's eyes and viewpoint are necessary to render the correct view according to the observer's position. Robust face and facial feature localization and tracking are also important for improving Human-Computer Interaction (HCI) and facial animation applications.


3. AdaBoost and cascade face detection

There are two main factors which determine the effectiveness of an FD system: its detection accuracy and its processing speed. Although the detection accuracy has been improved through many novel approaches during the last ten years, the speed is still a problem impeding the wide use of FD systems in real-time applications. One of the biggest steps toward improving the processing speed and making real-time implementation possible has been the introduction of AdaBoost and cascade FD, proposed by Viola and Jones [4], [5]. In this chapter, the basics of the AdaBoost and cascade algorithms, their extensions and available implementations are briefly described.

Then, a particular implementation is described in details, focusing on topics such as training/test databases assembling; modification of an existing Matlab implementation to include multiple detection elimination module, and GUI-based FD demo. FD implementations in OpenCV are also investigated.

3.1 Algorithm basics

The Viola and Jones technique achieves fast and robust FD based on three key contributions, as listed below:

(1) A new image representation called “Integral Image” [16]. It allows a very fast computation of the features to be used by the detector.

(2) A simple and efficient classifier which is built using the AdaBoost learning algorithm [17], [18]. It allows selecting a small number of critical visual features from a very large set of potential features.

(3) A method for combining classifiers in a “cascade” [4]. Due to this cascade, background regions occupying most of the image areas are quickly discarded during first stages, while promising face-like regions are processed thoroughly during later stages.

Viola and Jones’ technique relies on simple rectangular features, reminiscent of Haar basis functions [13]. These features are equivalent to intensity difference readings and are quite easy to compute. Four features of three types are used, with varying numbers of sub-rectangles: two two-rectangle features, one three-rectangle feature and one four-rectangle feature (these are described in more detail in Subchapter 3.1.1). Using rectangular features in an image instead of pixels provides a number of benefits: a sort of ad-hoc domain knowledge is implied, and a speed increase is achieved over pixel-based systems. The calculation of the features is facilitated by the use of an image representation called the integral image. It allows calculating the sum over any rectangle in an image in only four references. The integral image itself can be calculated in one pass over the sample image, which also helps to speed up the algorithm. The integral image is similar to a summed area table, used in computer graphics, but here it is applied to pixel area evaluation.


Haar-like features of different scales form a large feature set. In order to restrict it to a small number of critical features, the training stage utilizes an adaptive boosting algorithm (AdaBoost) [18]. Inference is enhanced by the use of AdaBoost: a small set of features is selected from a large set, and in doing so a strong hypothesis is formed, in this case resulting in a strong classifier. The computational efficiency is improved not only by having a reduced set of features and training the corresponding classifiers, but also by the use of a degenerate tree of classifiers, leading to a cascade structure [4]. This degenerate tree, sometimes referred to as a decision stump, chains weak classifiers from general to more specific ones. That is, the first few classifiers are general enough to discount an image sub-window and save on the time of further observations by the more specific classifiers down the chain, which can save a large amount of computation.

3.1.1 Features and integral image

In the Viola and Jones system, a simple feature set is used, related to the feature sets described in the paper of Papageorgiou et al. [13]. Viola and Jones emphasize that the use of a feature-based instead of a pixel-based system is important, especially for FD, due to the benefit of ad-hoc domain encoding. Features can be used to represent and distinguish between both facial information and background in a sample image.

Figure 3.1 Four basis Haar-like features a, b, c, d and example overlaid on real facial image

Top row in Fig. 3.1 shows the first and second features selected by AdaBoost. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eyes region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose [5].

In their simplest form, the features can be thought of as pixel intensity set evaluations. The sum of the luminance of the pixels in the white region of the feature is subtracted from the sum of the luminance in the remaining gray section. This difference value is used as the feature value, and such values can be combined to form a weak hypothesis on regions of the image. Within the implementation, four types of Haar-like features are chosen: the first with a horizontal division, the second with a vertical one, the third containing two vertical divisions and the last containing both a horizontal and a vertical division. The features are called Haar-like because of their resemblance to Haar basis functions (Haar wavelets) [13].

Having the types of features chosen, what follows is to find a way of computing them fast. The integral image representation is such an efficient way. As described in [4] and [16], it is a form of summed area table and is constructed by simply taking the sum of the luminance values above and to the left of a pixel in an image. Thus, it is effectively the double integral of the sample image, first along the rows then along the columns, as illustrated by Fig. 3.2.

(a) (b) Figure 3.2 Integral image

(a) The integral image at location (x, y) contains the sum of the pixels above and to the left.

(b)The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B+C+D. The sum within D can be computed as 4+1-(2+3).

The brilliance in using an integral image to speed up the feature extraction lies in the fact that any rectangle in an image can be calculated from that image’s integral image, in only four indexes to the integral image while the calculation of the integral image itself is done in only one pass of the image.
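The one-pass construction and the four-reference rectangle sum can be sketched in a few lines of Python. This is an illustration only (the thesis implementations are in Matlab and C++), and `haar_two_rect` is a hypothetical helper showing how a two-rectangle feature value follows from `rect_sum`:

```python
def integral_image(img):
    """Build an (h+1) x (w+1) summed area table in one pass over img."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y): 4 references."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

The recurrence inside `integral_image` is exactly the "sum above and to the left" rule of Fig. 3.2 (a), and `rect_sum` is the 4+1-(2+3) computation of Fig. 3.2 (b).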


3.1.2 AdaBoost core method

Adaptive Boosting (AdaBoost) [60] is a machine learning algorithm, first formulated by Freund and Schapire [18]. It is a meta-algorithm and can be used in conjunction with many other learning algorithms to improve their performance.

AdaBoost is adaptive in the sense that subsequently built classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; otherwise, it is less susceptible to the overfitting problem than most learning algorithms. AdaBoost calls a weak classifier repeatedly in a series of rounds t = 1, 2, …, T. For each call, a distribution of weights Dt is updated to indicate the importance of examples in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples.

In the FD training algorithm, AdaBoost allows the designer to combine weak and simple learners to form an accurate and complex overall classifier, as shown in Fig. 3.3.

Figure 3.3 Basic scheme of AdaBoost and its main goals: a training set and a family of weak classifiers are combined by AdaBoost into one strong classifier. The main goals are: (1) selecting a few sets of features representing faces as well as possible; (2) training a strong final classifier as a linear combination of these best features.

The following description of AdaBoost is based on the works [17] and [18]. In this particular example the weighting is modified slightly in order to favor the classification of the face class: where the classical implementation sets the initial weights to one over the database size, here the weighting is set such that the face examples have a higher weight, or importance. This is shown as an initial step in the algorithm.

The core idea behind the use of AdaBoost is the application of a weight distribution to the sample set and the modification of this distribution during each iteration of the algorithm, where the weights are first normalized and then adjusted. At the beginning the weight distribution is flat, but after each iteration each of the weak learners returns a hypothesis and the weight distribution is modified.


AdaBoost algorithm for FD application [17], [18]:

Given example images (x_1, y_1), …, (x_n, y_n), where y_i = 0, 1 for negative and positive examples respectively.

Initialize weights ω_{1,i} = 1/(2m) for y_i = 0 and ω_{1,i} = 1/(2l) for y_i = 1, where m and l are the number of negatives and positives respectively.

For t = 1, …, T:

1. Normalize the weights, ω_{t,i} ← ω_{t,i} / Σ_{j=1..n} ω_{t,j} (3.3), so that ω_t is a probability distribution.

2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to ω_t: ε_j = Σ_i ω_i |h_j(x_i) − y_i|.

3. Choose the classifier h_t with the lowest error ε_t.

4. Update the weights: ω_{t+1,i} = ω_{t,i} β_t^{1−e_i} (3.4), where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).

The final strong classifier is:

C(x) = 1 if Σ_{t=1..T} α_t h_t(x) ≥ (1/2) Σ_{t=1..T} α_t, and 0 otherwise (3.5), where α_t = log(1/β_t).

T hypotheses are constructed each using a single feature. The final hypothesis is a weighted linear combination of the T hypotheses where the weights are inversely proportional to the training errors.
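The four steps above can be sketched as a self-contained Python toy, using simple threshold stumps on scalar feature values in place of the Haar-feature classifiers. This is an illustrative sketch, not the thesis Matlab code; the function names and the tiny data set are assumptions:

```python
import math

def train_adaboost(X, y, T):
    """Toy AdaBoost with threshold stumps on 1-D scalar feature values.
    X: feature values, y: 0/1 labels (0 = negative, 1 = positive)."""
    n = len(X)
    m = sum(1 for v in y if v == 0)          # number of negatives
    l = n - m                                # number of positives
    w = [1.0 / (2 * m) if v == 0 else 1.0 / (2 * l) for v in y]
    classifiers = []
    for _ in range(T):
        s = sum(w)
        w = [wi / s for wi in w]             # step 1: normalize weights
        best = None
        for thr in sorted(set(X)):           # step 2: single-feature stumps
            for pol in (1, -1):
                pred = [1 if pol * x < pol * thr else 0 for x in X]
                err = sum(wi for wi, p, v in zip(w, pred, y) if p != v)
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best           # step 3: lowest-error classifier
        beta = err / (1 - err) if err > 0 else 1e-10
        w = [wi * (beta if p == v else 1.0)  # step 4: beta^(1 - e_i) update
             for wi, p, v in zip(w, pred, y)]
        classifiers.append((thr, pol, math.log(1.0 / beta)))
    return classifiers

def strong_classify(classifiers, x):
    """Weighted vote of the T stumps against half the total alpha mass."""
    score = sum(a for thr, pol, a in classifiers if pol * x < pol * thr)
    return 1 if score >= 0.5 * sum(a for _, _, a in classifiers) else 0
```

Note how correctly classified examples are multiplied by β_t < 1, which is equivalent to relatively increasing the weights of the misclassified ones after the next normalization.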


3.1.3 Cascade of classifiers

FD is a “rare event detection” problem: within any single image, an overwhelming majority of sub-windows are negative, i.e. the target patterns occur at a much lower frequency than non-targets. Consider an image of size 320×240 containing a single face. While executing the detection algorithm, 166358 sub-windows of size 20×20 pixels need to be explored. Computationally, huge speed-ups are possible if the sparsity of faces in input sub-windows can be exploited. It is best to remove as many non-face sub-windows from consideration as possible at the very early stages.
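To get a feel for such numbers, a rough count of the sub-windows in a multi-scale sliding-window scan can be estimated as below. The scan parameters (window growth factor, shift) are hypothetical illustrations and will not reproduce the exact figure quoted above:

```python
def count_subwindows(W, H, win=20, shift=1, scale=1.25):
    """Rough count of sub-windows scanned over a W x H image with a
    square window grown by `scale` each pass (parameters hypothetical)."""
    total, w = 0, win
    while w <= min(W, H):
        step = max(1, int(round(shift * w / win)))  # shift grows with scale
        total += ((W - w) // step + 1) * ((H - w) // step + 1)
        w = int(round(w * scale))
    return total
```

Even at this coarse level of modelling, the count grows into the hundreds of thousands for a modest image, which motivates early rejection of non-face sub-windows.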

Viola and Jones used a series of classifiers to achieve this goal by using initially simple features, and selecting increasingly more complex features in later stages.

Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called to focus the subsequent processing on promising regions. The overall form of the detection process is that of a degenerate decision tree, which Viola and Jones call a cascade structure [4]; it implements a coarse-to-fine search strategy. Fig. 3.4 depicts the cascade for FD.

Figure 3.4 Schematic depiction of the detection cascade Green circle: sub-window potentially contains a face

Red circle: non-face sub-window

Stage: strong classifier trained by AdaBoost algorithm

A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve higher detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.

Viola and Jones’ final detector [5] is a 38-layer (stage) cascade of classifiers which includes a total of 6060 features. The speed of the cascaded detector is directly related to the number of sub-windows and the number of features evaluated per scanned sub-window. They reported that in practical tests, a large majority of the sub-windows are discarded by the first two stages of the cascade, and an average of 8 features out of 6060 is evaluated per sub-window, which enables the cascaded detector to process a 384×288 pixel image in about 0.067 seconds, i.e. roughly 15 frames per second, much faster than any previous method.
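The early-rejection logic described above can be sketched as follows; this is a minimal illustration in which `score_fn` and the per-stage thresholds stand in for the AdaBoost-trained strong classifiers:

```python
def cascade_classify(stages, window):
    """stages: (score_fn, threshold) pairs ordered simple-to-complex.
    A sub-window is rejected at the first stage whose strong classifier
    scores below its threshold; only face-like windows reach the end."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # rejected: non-face sub-window
    return True            # passed every stage: face candidate
```

Because most sub-windows fail an early stage, the average cost per sub-window stays close to the cost of the first, cheapest classifiers.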


3.2 Extensions and software packages

3.2.1 Floatboost methods

In a theoretical setting, AdaBoost can be regarded as a procedure minimizing an upper error bound which is an exponential function of the margin on the training set [18]. However, the ultimate goal in pattern classification applications is to achieve a minimum error rate, and a strong classifier learned by AdaBoost may not necessarily be best for this criterion. On the other hand, AdaBoost needs an effective procedure for learning the optimal weak classifier, such as the log posterior ratio, which requires estimation of densities in the input data space; when the dimensionality is high, this is a difficult problem. To overcome these problems, Li et al. [61], [63] proposed the so-called FloatBoost to be incorporated into AdaBoost. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms.

The idea of Floating Search was originally proposed in [62] for feature selection. A backtrack mechanism allows deletion of those weak classifiers that are non-effective or unfavourable in terms of the error rate. This leads to a strong classifier consisting of fewer weak classifiers. Because deletion in the backtrack is performed according to the error rate, an improvement in classification error is also obtained.

Floating Search [62] was originally aimed at dealing with the non-monotonicity of straight sequential feature selection; non-monotonicity means that adding an additional feature may lead to a drop in performance. When a new feature is added, backtracks are performed to delete those features that cause performance drops. Limitations of sequential feature selection are thus amended, and improvement is gained at the cost of increased computation due to the extended search.

In addition, a statistical model is provided for learning weak classifiers and effective feature selection in high dimensional feature space. A base set of weak classifiers, defined as the log posterior ratio, are derived based on an over-complete set of scalar features. The weak classifiers in FloatBoost are formed out of simple features.

In each stage, the weak classifier that reduces error the most is selected. If any previously selected classifiers contribute to error reduction less than the latest selected, these classifiers are removed.

In general, AdaBoost is very fast, accurate and simple to implement, but it performs a greedy search through a feature space of highly constrained features and needs a considerably long training time. In [61], [63] Li et al. showed that FloatBoost finds a more potent set of weak classifiers through a less greedy search, and yields a strong classifier consisting of fewer weak classifiers yet achieving lower error rates. Though it results in a faster and more accurate classifier at run-time, FloatBoost requires longer training times, reported as 5 times longer than traditional AdaBoost.
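The backtrack mechanism can be sketched as a floating search over weak-classifier subsets. This is a simplified illustration of the search strategy, not the FloatBoost implementation; `error_of` is an assumed callback returning the error rate of a strong classifier built from a given subset of candidates:

```python
def floatboost_select(candidates, error_of, max_T):
    """Floating-search sketch: after each forward (add) step, backward
    (delete) steps remove any chosen member whose deletion lowers the
    error rate, as in FloatBoost's backtrack mechanism."""
    chosen = []
    while len(chosen) < min(max_T, len(candidates)):
        # forward step: add the candidate that lowers the error most
        best = min((c for c in candidates if c not in chosen),
                   key=lambda c: error_of(chosen + [c]))
        chosen.append(best)
        # backward steps: delete non-effective members
        while len(chosen) > 1:
            worst = min(chosen,
                        key=lambda c: error_of([x for x in chosen if x != c]))
            if error_of([x for x in chosen if x != worst]) < error_of(chosen):
                chosen.remove(worst)
                if worst == best:
                    return chosen   # adding stopped helping: terminate
            else:
                break
    return chosen
```

The extra `error_of` evaluations in the backward steps are exactly where the reported longer training time comes from.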


3.2.2 OpenCV face detector

Another extension of Viola and Jones’ method has been developed by Lienhart and Maydt [6]. They introduced a novel feature set designed for detecting in-plane rotated faces (cf. Fig. 3.5). In addition, their work presented analyses of different boosting algorithms (i.e. Discrete, Real, and Gentle AdaBoost), compared the performance of stumps and regression trees, and also analyzed the effect of the training data size. The ideas of this publication have been implemented in a face detector software package in Intel's Open Source Computer Vision Library (OpenCV) [66]. OpenCV is a collection of C functions and C++ classes that implement popular algorithms of image processing and computer vision.

Figure 3.5 Extended set of Haar-like features

Lienhart and Maydt have made the distinction that feature-based systems, as opposed to raw pixel based ones, are important in reducing the in-class variability while increasing the out-of-class variability. They also identified that when feature pools are combined with a selection method, such as Gentle AdaBoost, the capacity of the learning algorithm can be increased. In their technique, the over-complete Haar-like features have been modified: based on the existing feature set, each of the features is rotated in both a positive and a negative direction by 45 degrees, and 14 of these rotated features were then selected. It was found that a decrease of approximately ten percent in the false alarm rate is achieved when compared to the original technique by Viola and Jones. The work also evaluated a number of boosting algorithms and implemented the so-called Gentle AdaBoost, which in comparison places less focus on examples that are generally outliers. This is quite different from the discrete boosting process of AdaBoost [17], [18], yet proved very effective in reducing the average number of features to be calculated.

In our experiments, the basic function of the OpenCV face detector [6], [66], [48] was tested under the VC++ 6.0 environment; the cascade was trained [48] with the face database assembled during the development of this thesis and the performance was evaluated. The OpenCV face detector was applied to FFD in a later stage of the project (see Subchapter 4.4).


3.3 Implementations

This section describes the experiments and implementations accomplished during the development of this thesis. They are mainly based on running and modification of two existing AdaBoost FD software packages:

(1) The Matlab implementation released by the University of California San Diego [14], which is a replicate of the original AdaBoost FD work.

(2) The FD implementation in OpenCV following the work of Lienhart and Maydt [6], as included in the OpenCV Beta 5 package. This is a full C++ implementation of all methods, with the extended feature set and a complete cascade structure for the classifiers.

3.3.1 Database collection

Because of its non-rigidity and complex three-dimensional (3D) structure, the appearance of a face is affected by a large number of factors including identity, pose, illumination, facial expression, age, occlusion, and facial hair. The development of algorithms robust to these variations requires databases of sufficient size that include carefully controlled variations of these factors. Furthermore, common databases are necessary to compare and evaluate different algorithms. Collecting a high quality database is a resource-intensive task. Fortunately, there are many public face databases [15] available for face-related research; these databases are also a good source for further database assembling for more application-specific experiments and implementations. In our work, the following face databases were used and/or assembled.

(1) FERET database

The FERET (The Facial Recognition Technology) image corpus was assembled to support government-monitored testing and evaluation of face recognition algorithms using standardized tests and procedures [64]. The final corpus consists of 14051 eight-bit grayscale images of human heads with views ranging from frontal to left and right profiles. See samples in Fig. 3.6 (a).

(2) Grayscale frontal face database cropped from FERET

In order to generate a training database for the frontal face detector, all the samples from the FERET database containing frontal faces were manually marked and the facial boundaries were selected using a Matlab image crop tool. The located faces were then cropped from the original images and saved as new ‘bmp’ format images of size 64 by 64 pixels. Altogether, this database consists of 3,157 frontal face images (no more than 3 images per person). In our experiments, this database was used for training both the Matlab AdaBoost face detector and the FD cascade in OpenCV. Examples of this dataset are shown in Fig. 3.6 (b).


(a) FERET sample images (b) The cropped frontal face samples Figure 3.6 FERET and frontal face samples

(3) CMU + MIT face data set

The combined MIT + CMU data set [65] with ground truth for frontal FD: the image dataset has been assembled by the CMU FD project and then provided for evaluating algorithms for detecting frontal views of human faces. This particular test set was originally assembled as part of work on neural network based FD. It combines images collected at CMU and MIT. The database has altogether 180 grayscale GIF-format real-life human facial images with various poses and expressions in complex backgrounds.

The selection of a diverse training facial image dataset is extremely important for the FD training process, since the outcome is strongly driven by the dataset's quality. For instance, Viola and Jones [4], [5] constructed a large image database from a series of web-crawls, cropping the images of sorted face and non-face samples to a base resolution of 24 by 24 pixels. The base resolution of 64 by 64 pixels in our experiments was decided upon because of the amount of extra information potentially stored within the larger pixel region, for example the hair and face profile information.


3.3.2 AdaBoost Matlab implementation and modification

Our Matlab implementation is based on the software package developed by Masnadi-Shirazi [14]. The original package contains 17 Matlab scripts (’.m’ files) providing functions for learning, classification, dataset pre-processing, feature verification, integral image calculation, etc. We have integrated the package into a purposely designed GUI. The commented GUI scripts are provided on the attached DVD.

In addition, the code was optimized to run faster and new algorithms for eliminating multiple detections were added. The modified package with detailed comments is provided on the attached DVD as well.

We have compared our implementation with the original one. In the original face detector, 739 frontal faces and 739 non-faces, each of size 24 by 24 pixels, had been used to train an AdaBoost classifier based on the 10 best Haar features. In our setting, 1000 grayscale frontal face images from the FERET database were cropped to a size of 64 by 64 pixels, and then resized to 24 by 24 pixels. These images, together with 1000 non-face samples, were used as the training set in our system. We set the number of best Haar features to 30 in order to achieve improved performance.

We tested the so-trained detector with the example images used for testing the original system. Our detector was able to handle more difficult cases, such as faces of different scales and sizes, due to the extended number of features used. This experiment has confirmed the correct implementation of the algorithms.


3.3.3 Matlab GUI-based platform for FD

In our Matlab-based FD experiments, a Matlab Graphical User Interface (GUI) was designed (under Matlab version 7.0.1) and utilized as a platform for FD result demonstration, parameter setting, real-time progress display and algorithm comparison.

The GUI has been designed to be an open platform for integrating different algorithm implementations. A GUI window is shown in Fig. 3.7(a).

Figure 3.7 (a) Matlab GUI for FD

On the top of the Matlab GUI window, the basic tool bar (Fig. 3.7 (b)) provides the following functionalities:

Figure 3.7 (b) Tool bar

‘File’: test image load/save, software exit.

‘Parameters’: reserve for algorithm parameter setting.

‘Databases’: load and analyze training data sets.

‘Batch processing’: process all images in one folder, see Fig. 3.7 (c)(2).

‘Help’: software manual, version and author.

On the left side, the input image display window sits in the middle, displaying both grayscale and color images for FD experiments. The ‘Image information’ text message on top of the display window provides the name, type and size/resolution of the current image once it is successfully loaded and displayed. The ‘Program status’ text message (Fig. 3.7 (c)(1)) shows the current processing stage of the program with messages such as ‘Input image is ready for face detection...’, ‘Start face detection...’, ‘Face detection complete...’ etc. On the bottom left of the GUI, the ‘Single processing’ push button is used for experiments on an individual test image, while the ‘Batch processing’ push button (Fig. 3.7 (c)(3)) is applied to a folder containing a set of images of the same format; JPG and BMP are supported. The text messages on both push buttons change to indicate the start and end of the detection algorithm. The ‘Pause’ push button is used to stop/resume the program.

(1) Program status, single and batch processing buttons

(2) Image folder selection (3) Batch processing finished Figure 3.7 (c) (1)(2)(3) Single and batch processing

On the upper right side, the Reports region (Fig. 3.7(d)) displays the real-time progress while running the algorithm: the processing information, processing time and the statistics of both intermediate and final results.

Figure 3.7 (d) Real-time processing report. Four kinds of information are mainly shown as text:

‘Detector information’: detector name, minimum scanning sub-window size and shift interval.

‘Pre-processing information’: input image conversion and processing, time consumption.

‘Scanning information’: real-time display of the number of scanned sub-windows and the current scanning window size, plus final processing statistics and time consumption.

‘Resulting information’: final FD results, number of faces detected, number of candidates found, face x, y center locations in the image and detection window sizes.


Below the real-time report (Fig. 3.7(e)), the ‘Save the processed image’ check box enables saving the image, with the detection window drawn on each detected face, either to the default folder where the Matlab code resides or to any folder selected by the ‘Browse’ push button on the right side. The ‘Save the report’ check box enables saving the information and statistics of the current experiment, e.g. the above four types of information, in a formatted Microsoft Word (doc) file, again either to the default folder or to any other folder selected by the corresponding ‘Browse’ push button; see the example report below Fig. 3.7(e). Reports can be saved to a chosen doc file, or a doc file is created automatically if no file is selected; further reports are appended automatically after previous reports in the current doc file. On the rightmost side is the ‘Show ROC plot’ button for displaying ROC curves generated in the experiments.

Figure 3.7 (e) Statistics save and plot

--- Face detection report 1 ---
*Image name: test8.JPG, *type: color image, *size: 350X311
Detector information
Detector name: Adaboosting cascade methods, minimum scanning sub-window size: 24X24, sub-window interval: 10 (pixels)
Preprocessing information
Input image is converted to gray-level image, processing time: 0.010000 (seconds)
Scanning information
17165 sub-windows have been scanned, minimum scanning sub-window size: 24X24, maximum scanning sub-window size: 306X306, sub-window interval: 10 (pixels), processing time: 18.857000 (seconds)
Resulting information
3 face(s) being detected, 9 candidate(s) being found, *face 1, center location, 89(r) 106(c), size: 66X66*, *face 2, center location, 82(r) 148(c), size: 78X78*, *face 3, center location, 195(r) 195(c), size: 150X150*
--- End of report 1 ---

On the lower right side, the Options region (Fig. 3.7(f)) provides options like algorithm selection, initial parameter settings and enable/disable special functionalities.

Figure 3.7 (f) Options and parameter setting


‘Detector type selection’: selection of FD algorithm, e.g. AdaBoost and cascade methods, neural network methods etc.

‘Minimum scanning size’: the initial scanning sub-window size.

‘Scanning shift option’: the number of pixels to shift between successive scanning windows in the same scanning iteration, horizontally or vertically.

There are two click boxes for the multiple detection elimination purpose.

‘Show all candidate’: enable/disable the display of all the detection windows on the same face.

‘Eliminate of overlap with low weight’: enable/disable the functionality which eliminates a detection window of low weight (a weak candidate) that overlaps with a detection window of high weight (a strong candidate). See more details in Subchapter 3.3.4.

The so-designed Matlab GUI allows for various FD experiments. Algorithm parameters can be easily changed and the performance of different techniques compared. The comparison includes visualization of the resulting images and statistical data, and generation of reports and experimental figures (curves) from a series of experiments. Any FD algorithm can be realized and tested easily and clearly through such a GUI, and its performance, e.g. accuracy and related statistics, can be assessed automatically.

Though Matlab is a good platform for image processing experiments, the running speed of Matlab scripts is much slower than that of code written in a lower-level programming language, e.g. C++. This prevents the Matlab GUI and the algorithms realized in Matlab scripts from many practical usages, and also makes it impossible to measure the actual real-time speed the algorithms would reach in commercial applications, unless they are reimplemented in e.g. C++. So the Matlab GUI for FD can only be used as an experimental platform.


3.3.4 Multiple detection elimination

In practical tests with the Matlab face detector, faces are normally detected at several scales, which results in multiple detection windows of different sizes. A typical example is shown in Fig. 3.8 (a), where the two close faces are detected by multiple overlapping windows; in addition, there is a false detection. The problem with this particular image is that faces are detected at multiple nearby positions and scales, and a false detection overlaps the true ones. For solving such problems, we have developed a post-processing method for merging the multiple detections and reducing the false detections. The developed module has been integrated into our final implementation.

Figure 3.8: (a) Multiple detection example; (b) Merged results.

To describe how the algorithm works, we will use the above image as an example.

As an initial step, a list of detection window candidates with their positions and scales is generated. Our goal is to find and merge multiple detections, and to estimate and eliminate possible false detections. In our experiments, a two-stage approach is used. Stage 1: assign a weight (the number of detections) to each group of neighbouring detections and merge the multiple detections. Stage 2: eliminate low-weight detections which overlap with high-weight detections.

In the first stage, for each location and scale, the number of detections within a specified neighbourhood of that location is counted as its weight. The neighbourhood condition is that the detection center (the centroid of the detection rectangle) lies inside a small rectangle which shares the centroid of the smallest existing detection rectangle and has one third of its edge lengths; see the example in Fig. 3.9 (a). A detection whose center falls outside the small rectangle is simply labelled as an overlap (in case it overlaps an existing candidate) or as a new detection.
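The neighbourhood condition can be expressed compactly: the center must lie within one sixth of the reference rectangle's width/height from its centroid. The following Python sketch illustrates the test (the thesis module itself is written in Matlab; the function name and the `(x, y, w, h)` rectangle format are assumptions):

```python
def in_neighbourhood(center, ref_rect):
    """Check whether a detection center lies inside the neighbourhood
    rectangle: a rectangle sharing the centroid of ref_rect with one
    third of its edge lengths. ref_rect = (x, y, w, h)."""
    x, y, w, h = ref_rect
    cx, cy = x + w / 2.0, y + h / 2.0   # centroid of the reference rectangle
    # a rectangle with edges w/3 and h/3 extends w/6 and h/6 from the centroid
    return (abs(center[0] - cx) <= w / 6.0 and
            abs(center[1] - cy) <= h / 6.0)
```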

Candidates are then classified as strong or weak: if the weight is above a threshold (set to 2 in our experiments), that location is labelled as a strong candidate. The centroid of the nearby detections defines the location of the detection result: by calculating the mean of all nearby detection centers, multiple detections are merged into a single detection window.
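The first-stage grouping and merging described above can be sketched as follows. This is an illustrative Python sketch under the same assumptions as before (`(x, y, w, h)` rectangles, neighbourhood of one third of the smallest rectangle's edge lengths); merging here averages all four rectangle parameters, which subsumes averaging the centers:

```python
def merge_detections(rects, threshold=2):
    """Group detections whose centers fall in the neighbourhood of the
    smallest rectangle in a group, keep groups with at least `threshold`
    members (strong candidates), and merge each group by averaging."""
    groups = []  # each group is a list of (x, y, w, h) rectangles
    for r in rects:
        cx, cy = r[0] + r[2] / 2.0, r[1] + r[3] / 2.0
        placed = False
        for g in groups:
            # compare against the smallest existing rectangle in the group
            ref = min(g, key=lambda q: q[2] * q[3])
            rx, ry = ref[0] + ref[2] / 2.0, ref[1] + ref[3] / 2.0
            if abs(cx - rx) <= ref[2] / 6.0 and abs(cy - ry) <= ref[3] / 6.0:
                g.append(r)
                placed = True
                break
        if not placed:
            groups.append([r])

    merged = []
    for g in groups:
        if len(g) >= threshold:          # weight = number of detections
            n = float(len(g))
            merged.append(tuple(sum(v) / n for v in zip(*g)))
    return merged
```

Groups with fewer than `threshold` members are discarded, which already removes many isolated false detections before the second stage runs.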
