Robust Real-Time Face Detection

(1)

International Journal of Computer Vision 57(2), 137–154, 2004

Paul Viola, Michael Jones

Instructor: Dr. 林信志    Presenter: 林宸宇

Presentation date: 96.12.18 (ROC calendar, i.e. 2007-12-18)

(2)

Outline

• Introduction

• The Boost algorithm for classifier learning

– Feature Selection

– Weak learner constructor

– The strong classifier

• Result

• Conclusion

Computer Graphics & Interactive Techniques Lab.

(3)

Introduction

• A machine learning approach for visual object detection

– Capable of processing images extremely rapidly

– Achieving high detection rates

• Three key contributions

– A new image representation: the Integral Image

– A learning algorithm (based on AdaBoost)

– A method for combining classifiers: the cascade of classifiers

(4)

Feature

• Papageorgiou et al (1998)

(5)

Integral Image

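• The sum of the pixels within rectangle D is computed from four array references: D = 4 + 1 - (2 + 3), where 1, 2, 3 and 4 are the integral image values at the corners of D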
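The four-reference rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `integral_image` and `rect_sum` and the zero-padded layout are assumptions made for the example.

```python
import numpy as np

def integral_image(img):
    """Return ii with a zero row and column prepended, so ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle from four array references: D = 4 + 1 - (2 + 3)."""
    p1 = ii[top, left]                    # location 1: above-left corner
    p2 = ii[top, left + width]            # location 2: above-right corner
    p3 = ii[top + height, left]           # location 3: below-left corner
    p4 = ii[top + height, left + width]   # location 4: below-right corner
    return p4 + p1 - (p2 + p3)

img = np.arange(1, 17).reshape(4, 4)      # toy 4x4 "image"
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())   # both values are the same sum
```

Once the integral image is built, the cost of a rectangle sum does not depend on the rectangle's size, which is what makes rectangle features cheap to evaluate at any scale.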

(6)

(7)

AdaBoost

• A supervised training process

(8)

(9)

AdaBoost

(10)

Attentional Cascade

• Rowley et al. (1998)

• Use two neural networks

(11)

Attentional Cascade

(12)

Attentional Cascade

(13)

Result

• A 38-layer cascaded classifier was trained to detect frontal upright faces

– Training set:

• Face: 4916 hand-labeled faces with resolution 24x24.

• Non-face: 9544 images containing no faces.

(350 million sub-windows within these non-face images)

– Features

• The first five layers of the detector: 1, 10, 25, 25 and 50 features

• Total number of features in all layers: 6061

(14)

Result

• Each classifier in the cascade was trained with:

– Faces: the 4916 faces plus their vertical mirror images, 9832 images in total

– Non-face sub-windows: 10,000 (size = 24x24)

(15)

Result-outline

• Speed of the Final Detector

• Image Processing

• Scanning the Detector

• Integration of Multiple Detections

• Experiments on a Real-World Test Set

(16)

Speed of the final Detector

• The speed is directly related to the number of features evaluated per scanned sub-window.

• MIT+CMU test set

– An average of 10 features out of a total of 6061 are evaluated per sub-window.

• On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about 0.067 seconds

(17)

Image Processing

• Minimize the effect of different lighting conditions

• Variance normalization is computed using the integral image

• σ is the standard deviation, m is the mean, and x is the pixel value
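As a rough sketch of how the mean and standard deviation of a sub-window can be obtained from integral images, the example below uses one integral image of the pixel values and one of their squares, together with the standard identity variance = (mean of x²) − m². The function name `window_mean_std` and the zero-padded layout are assumptions for illustration only.

```python
import numpy as np

def window_mean_std(img, top, left, size):
    """Mean and standard deviation of a square sub-window, from two integral images."""
    img = img.astype(np.float64)
    # Integral images of the pixel values and of their squares, zero-padded on top/left.
    ii = np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)), mode="constant")
    ii2 = np.pad((img ** 2).cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)), mode="constant")

    def rect_sum(table):
        bottom, right = top + size, left + size
        return table[bottom, right] + table[top, left] - table[top, right] - table[bottom, left]

    n = size * size
    m = rect_sum(ii) / n                 # mean pixel value
    var = rect_sum(ii2) / n - m * m      # mean of squares minus squared mean
    return m, float(np.sqrt(max(var, 0.0)))   # clamp tiny negative rounding errors

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(48, 48))
print(window_mean_std(image, 5, 7, 24))
```

In a real detector the two integral images would be built once per image rather than once per window; they are rebuilt inside the function here only to keep the sketch self-contained.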

(18)

Scanning the Detector

• The final detector is scanned across the image at multiple scales and locations

• Locations are obtained by shifting the window some number of pixels Δ

– If the current scale is s, the window is shifted by [sΔ], where [] is the rounding operation
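The scanning rule can be sketched as a generator of sub-window positions. This is an illustrative sketch only: the function name `scan_positions` and the default values for `scale_step` and `delta` are assumptions, and only the [sΔ] shift rule and the 24x24 base size come from the slides.

```python
def scan_positions(image_w, image_h, base=24, scale_step=1.25, delta=1.0):
    """Yield (x, y, size) for every sub-window; scale_step and delta are illustrative defaults."""
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > min(image_w, image_h):
            break                                     # window no longer fits in the image
        step = max(1, int(round(scale * delta)))      # shift by [s * delta], at least 1 pixel
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                yield x, y, size
        scale *= scale_step

windows = list(scan_positions(48, 48))
print(len(windows), windows[0])
```

Because the shift grows with the scale, larger windows are sampled more coarsely, which keeps the total number of sub-windows manageable.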

(19)

Integration of Multiple Detections

• Multiple detections will usually occur around each face and around some types of false positives.

• A post-processing step is applied to the detected sub-windows in order to combine overlapping detections into a single detection

– Two detections are in the same subset if their bounding regions overlap
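One way to sketch this post-processing step is to partition the detections into disjoint subsets with a union-find structure and then reduce each subset to a single box. Averaging the corners of each subset is the choice assumed here; the function names and the (x1, y1, x2, y2) box layout are hypothetical.

```python
def overlaps(a, b):
    """True if boxes given as (x1, y1, x2, y2) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_detections(boxes):
    """Group overlapping boxes into disjoint subsets and average each subset's corners."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)   # put both boxes in the same subset

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # One output box per subset: the mean of each corner coordinate.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

detections = [(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]
print(merge_detections(detections))
```

Because subsets are built transitively, two boxes that do not overlap each other can still end up merged if a third box overlaps both.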

(20)

Experiments on a Real-World Test Set

(21)

Result

(22)

Result

(23)

Conclusion

• The authors developed the fastest known face detector for grayscale images

• This paper brings together new algorithms, representations, and insights which are quite generic

• The dataset includes faces under a very wide range of conditions, including illumination, scale, pose, and camera variation

(24)

(25)

Thanks!

End of presentation

(26)

Introduction

• The attentional operator is trained to detect examples of a particular class: a supervised training process

• A face classifier is constructed

– In the domain of face detection, it is possible to achieve:

< 1% false negatives

< 40% false positives
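To illustrate why a stage with under 1% false negatives and under 40% false positives is useful, the sketch below chains such stages into a cascade that rejects a sub-window at the first failing stage. The function names are hypothetical, and `cascade_rates` assumes that the stages behave independently, which is a simplification.

```python
def cascade_classify(window, stages):
    """Return True only if every stage accepts the window; reject at the first failure."""
    for stage in stages:
        if not stage(window):
            return False    # rejected early: later, more expensive stages never run
    return True

def cascade_rates(per_stage_fp, per_stage_det):
    """Overall false positive and detection rates, assuming independent stages."""
    fp, det = 1.0, 1.0
    for f, d in zip(per_stage_fp, per_stage_det):
        fp *= f
        det *= d
    return fp, det

# Ten hypothetical stages, each passing 40% of non-faces and 99% of faces.
fp, det = cascade_rates([0.4] * 10, [0.99] * 10)
print(fp, det)
```

Under these assumed numbers the overall false positive rate falls to roughly 1 in 10,000 while the detection rate stays near 0.90, so most of the work is spent on the few sub-windows that survive the early stages.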

(27)
(28)

(29)

Example

• $x_1 = [1\ 1]$, $x_2 = [2\ 2]$, $x_3 = [2\ 1]$, $x_4 = [3\ 2]$

• $y_1 = 1$, $y_2 = 1$, $y_3 = 0$, $y_4 = 0$

• $t = 1, \dots, 3$ (rounds)

• Initial weights, $t = 1$ (round):

$w_{1,1} = 1/4$, $w_{1,2} = 1/4$, $w_{1,3} = 1/4$, $w_{1,4} = 1/4$

(30)

Normalize weight

• $t = 1$ (round)

• $w_{1,1} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4$

• $w_{1,2} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4$

• $w_{1,3} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4$

• $w_{1,4} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4$

(31)

• The error is evaluated with respect to $w_t$, $t = 1$

• $\epsilon_1 = 1/4\,|1-1| + 1/4\,|0-1| + 1/4\,|0-0| + 1/4\,|0-0| = 1/4$

• $\epsilon_2 = 1/4\,|0-1| + 1/4\,|1-1| + 1/4\,|0-0| + 1/4\,|1-0| = 1/2$

(32)

• Choose the classifier with the lowest error $\epsilon_j$: at $t = 1$ (round), choose $h_1$

• Update the weights

$\beta_1 = (1/4) / (1 - 1/4) = 1/3$

• $w_{2,1} = 1/4 \times \beta_1^{1-0} = 1/12$

• $w_{2,2} = 1/4 \times \beta_1^{1-1} = 1/4$

• $w_{2,3} = 1/4 \times \beta_1^{1-0} = 1/12$

• $w_{2,4} = 1/4 \times \beta_1^{1-0} = 1/12$

(33)

Normalize weight (when t=2)

• $w_{2,1} = (1/12) / (1/2) = 1/6$

• $w_{2,2} = (1/4) / (1/2) = 1/2$

• $w_{2,3} = (1/12) / (1/2) = 1/6$

• $w_{2,4} = (1/12) / (1/2) = 1/6$

(34)

• The error is evaluated with respect to $w_t$, $t = 2$

• $\epsilon_1 = 1/6\,|1-1| + 1/2\,|0-1| + 1/6\,|0-0| + 1/6\,|0-0| = 1/2$

• $\epsilon_2 = 1/6\,|0-1| + 1/2\,|1-1| + 1/6\,|0-0| + 1/6\,|1-0| = 1/3$

(35)

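• Choose the classifier with the lowest error $\epsilon_j$: at $t = 2$ (round), choose $h_2$

• Update the weights

$\beta_2 = (1/3) / (1 - 1/3) = 1/2$

• $w_{3,1} = 1/6 \times \beta_2^{1-1} = 1/6$

• $w_{3,2} = 1/2 \times \beta_2^{1-0} = 1/4$

• $w_{3,3} = 1/6 \times \beta_2^{1-0} = 1/12$

• $w_{3,4} = 1/6 \times \beta_2^{1-1} = 1/6$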

(36)

Normalize weight (when t=3)

• $w_{3,1} = (1/6) / (2/3) = 1/4$

• $w_{3,2} = (1/4) / (2/3) = 3/8$

• $w_{3,3} = (1/12) / (2/3) = 1/8$

• $w_{3,4} = (1/6) / (2/3) = 1/4$

(37)

• The error is evaluated with respect to $w_t$, $t = 3$

• $\epsilon_1 = 1/4\,|1-1| + 3/8\,|0-1| + 1/8\,|0-0| + 1/4\,|0-0| = 3/8$

• $\epsilon_2 = 1/4\,|0-1| + 3/8\,|1-1| + 1/8\,|0-0| + 1/4\,|1-0| = 1/2$

(38)

• Choose the classifier with the lowest error $\epsilon_j$: at $t = 3$ (round), choose $h_1$

• Update the weights

$\beta_3 = (3/8) / (1 - 3/8) = 3/5$

(39)

The final strong classifier

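• $\alpha_1 = \log 3$, $\alpha_2 = \log 2$, $\alpha_3 = \log(5/3)$

• $\log 3 \cdot h_1(x) + \log 2 \cdot h_2(x) + \log(5/3) \cdot h_1(x) \geq \frac{1}{2} \times 1$

h_1(x)   h_2(x)   h_1(x)   Result    Correct
0.4771   0.301    0.2218   (weights, base-10 logarithms)
1        0        1        class 1   T
0        0        0        class 0   T
0        1        0        class 0   F

• Test point (1,100): 1 1 1 => class 1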
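The three rounds of the example can be replayed with exact fractions. This is a sketch under stated assumptions: the outputs of the two weak classifiers on $x_1, \dots, x_4$ are read off the error sums in the example ($h_1$ gives 1, 0, 0, 0 and $h_2$ gives 0, 1, 0, 1), and the variable names are chosen for illustration.

```python
from fractions import Fraction as F
import math

y = [1, 1, 0, 0]                           # labels y_1..y_4
# Weak classifier outputs on x_1..x_4, as read off the error sums (an assumption).
h = {1: [1, 0, 0, 0], 2: [0, 1, 0, 1]}

w = [F(1, 4)] * 4                          # initial weights
chosen, betas = [], []
for t in range(3):
    total = sum(w)
    w = [wi / total for wi in w]           # normalize the weights
    errors = {j: sum(wi * abs(hj - yi) for wi, hj, yi in zip(w, out, y))
              for j, out in h.items()}
    j = min(errors, key=errors.get)        # classifier with the lowest weighted error
    beta = errors[j] / (1 - errors[j])
    # Correctly classified examples (e_i = 0) are scaled by beta; the rest keep their weight.
    w = [wi * (beta if hj == yi else 1) for wi, hj, yi in zip(w, h[j], y)]
    chosen.append(j)
    betas.append(beta)

# alpha_t = log(1 / beta_t); base 10 matches the decimal values on the slide.
alphas = [math.log10(float(1 / b)) for b in betas]

def strong(outputs):
    """outputs maps a weak-classifier index to its 0/1 output for one point."""
    score = sum(a * outputs[j] for a, j in zip(alphas, chosen))
    return int(score >= 0.5 * sum(alphas))

print(chosen, betas, strong({1: 1, 2: 0}))
```

The base of the logarithm does not change the decision, since both sides of the inequality scale by the same factor.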

(40)

[Figure: plot with labels "False positive rate", "Detection rate", and "Features"]
