Robust Real-Time Face Detection
International Journal of Computer Vision 57(2), 137–154, 2004
Paul Viola, Michael Jones
Instructor: Dr. 林信志  Presenter: 林宸宇
Presentation date: 96.12.18
Outline
• Introduction
• The Boost algorithm for classifier learning
Computer Graphics & Interactive Techniques Lab.
– Feature Selection
– Weak learner constructor
– The strong classifier
• Result
• Conclusion
Introduction
• A machine learning approach for visual object detection
– Capable of processing images extremely rapidly
– Achieving high detection rates
• Three key contributions
– A new image representation: the Integral Image
– A learning algorithm (based on AdaBoost)
– A method for combining classifiers in a cascade
Feature
• Papageorgiou et al. (1998)
Integral Image
• The sum of the pixels within rectangle D is computed from four array references: D = 4 + 1 - (2 + 3)
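The four-reference lookup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `integral_image` and `rect_sum` and the extra row and column of zero padding are assumptions made here to keep the indexing simple.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows < y and columns < x.

    The extra zero row/column is an assumption of this sketch; it avoids
    special cases at the image border.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # cumulative sum along the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in rectangle D using D = 4 + 1 - (2 + 3)."""
    p1 = ii[top][left]                    # point 1: top-left corner of D
    p2 = ii[top][left + width]            # point 2: top-right corner
    p3 = ii[top + height][left]           # point 3: bottom-left corner
    p4 = ii[top + height][left + width]   # point 4: bottom-right corner
    return p4 + p1 - (p2 + p3)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = rect_sum(ii, 1, 1, 2, 2)  # covers the pixels 5, 6, 8, 9
```

Once the table is built, any rectangle sum costs four lookups regardless of the rectangle's size.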
AdaBoost
• A supervised training process
AdaBoost
Attentional Cascade
• Rowley et al. (1998)
• Use two neural networks
Attentional Cascade
Attentional Cascade
Result
• A 38-layer cascaded classifier was trained to detect frontal upright faces
– Training set:
• Faces: 4916 hand-labeled faces at a resolution of 24x24
• Non-faces: 9544 images containing no faces
(350 million sub-windows within these non-face images)
– Features:
• The first five layers of the detector use 1, 10, 25, 25 and 50 features
• Total number of features across all layers: 6061
Result
• Each classifier in the cascade was trained with:
– Faces: the 4916 faces plus their vertical mirror images, 9832 images in total
– Non-face sub-windows: 10,000 (size = 24x24)
Result outline
• Speed of the final Detector
• Image Processing
• Scanning the Detector
• Integration of Multiple Detections
• Experiments on a Real-World Test Set
Speed of the Final Detector
• The speed is directly related to the number of features evaluated per scanned sub-window.
• MIT+CMU test set
– An average of 10 features out of a total of 6061 are evaluated per sub-window.
• On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about 0.067 seconds
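The low average feature count follows from the cascade rejecting most sub-windows in its first layers. The sketch below shows that early-exit behaviour under simplifying assumptions: each stage is a hypothetical list of (weight, feature function) pairs with a stage threshold, and `cascade_classify` is an illustrative name, not something taken from the paper.

```python
def cascade_classify(window, stages):
    """Run a sub-window through the stages; stop at the first rejection.

    stages is a list of (features, threshold), where features is a list of
    (weight, feature_fn) pairs. This layout is an assumption of the sketch.
    Returns (accepted, number_of_features_evaluated).
    """
    evaluated = 0
    for features, threshold in stages:
        score = 0.0
        for weight, feature_fn in features:
            score += weight * feature_fn(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated  # rejected: later stages are never run
    return True, evaluated


# Toy stages operating on a single number instead of an image window.
stages = [
    ([(1.0, lambda w: 1 if w > 0 else 0)], 0.5),
    ([(1.0, lambda w: 1 if w > 10 else 0),
      (1.0, lambda w: 1 if w % 2 == 0 else 0)], 1.5),
]
rejected_early = cascade_classify(-1, stages)
accepted = cascade_classify(12, stages)
```

A window rejected by the first stage costs one feature evaluation here, while an accepted window pays for every feature in every stage.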
Image Processing
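• Minimize the effect of different lighting conditions by variance-normalizing each sub-window
• Using the integral image: σ² = (1/N) Σ x² - m²
• σ is the standard deviation, m is the mean, x is the pixel value, and N is the number of pixels in the sub-window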
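The standard deviation of a sub-window needs only the sum of its pixels and the sum of its squared pixels, both of which can be read from integral images. The following is a minimal sketch of that computation; the function name `window_stats` and the clamping of small negative variances to zero are assumptions of this example.

```python
from math import sqrt


def window_stats(pixel_sum, squared_sum, n):
    """Mean and standard deviation from the pixel sum and squared-pixel sum.

    Uses variance = (1/N) * sum(x^2) - m^2. In a detector, both sums would
    come from integral images (one of x, one of x^2).
    """
    m = pixel_sum / n
    variance = squared_sum / n - m * m
    return m, sqrt(max(variance, 0.0))  # clamp guards against rounding error


pixels = [1, 2, 3, 4]
mean, sigma = window_stats(sum(pixels), sum(p * p for p in pixels), len(pixels))
```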
Scanning the Detector
• The final detector is scanned across the image at multiple scales and locations
• Locations are obtained by shifting the window by some number of pixels Δ
– If the current scale is s, the window is shifted by [sΔ], where [·] denotes rounding
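The scan over scales and locations can be sketched as a generator. This is an illustration only: the name `scan_positions`, the default scale step of 1.25 and the default Δ of 1.0 are assumptions of the sketch, and Python's built-in `round` (which rounds halves to the nearest even integer) stands in for the rounding operation.

```python
def scan_positions(img_w, img_h, base=24, scale_step=1.25, delta=1.0):
    """Yield (x, y, size) for every sub-window visited by the scan.

    base is the detector's native window size; the window is enlarged by
    scale_step per scale and shifted by round(s * delta) pixels at scale s.
    """
    s = 1.0
    while round(base * s) <= min(img_w, img_h):
        size = round(base * s)
        step = max(1, round(s * delta))  # shift of [s * delta], at least 1 pixel
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        s *= scale_step


windows = list(scan_positions(26, 24))
```

For a 26 x 24 image only the base scale fits, so the scan visits three horizontal positions.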
Integration of Multiple Detections
• Multiple detections will usually occur around each face and around some types of false positives.
• Detected sub-windows are post-processed to combine overlapping detections into a single detection
– Two detections are in the same subset if their bounding regions overlap
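One way to realise this post-process is to partition the detections with a union-find structure and emit one box per subset. The sketch below makes several assumptions: boxes are (x1, y1, x2, y2) tuples, the names `overlaps` and `merge_detections` are illustrative, and each subset is summarised by averaging the corners of its members.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Group overlapping boxes into disjoint subsets; return one box per subset."""
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the subset representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # union the two subsets

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Average each corner coordinate over the members of a subset.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


merged = merge_detections([(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)])
```

Because subset membership is transitive, two boxes that do not overlap directly still end up merged when a third box overlaps both.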
Experiments on a Real-World Test Set
Result
Result
Conclusion
• The authors developed the fastest known face detector for gray-scale images
• This paper brings together new algorithms, representations and insights which are quite generic
• The data set includes faces under a very wide range of conditions, including illumination, scale, pose, and camera variation
Thanks!
End of presentation
Introduction
• The attentional operator is trained to detect examples of a particular class: a supervised training process
• A face classifier is constructed
– In the domain of face detection, it is possible to achieve:
< 1% false negatives
< 40% false positives
Example
• x_1 = [1 1], x_2 = [2 2], x_3 = [2 1], x_4 = [3 2]
• y_1 = 1, y_2 = 1, y_3 = 0, y_4 = 0
• t = 1~3 (rounds)
• Initial weights at t = 1 (round): w_{1,1} = 1/4, w_{1,2} = 1/4, w_{1,3} = 1/4, w_{1,4} = 1/4
Normalize weight
• t = 1 (round)
• w_{1,1} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4
• w_{1,2} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4
• w_{1,3} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4
• w_{1,4} = (1/4) / (1/4 + 1/4 + 1/4 + 1/4) = 1/4
• The error is evaluated with respect to the weights w_t, at t = 1:
• ε_1 = 1/4|1-1| + 1/4|0-1| + 1/4|0-0| + 1/4|0-0| = 1/4
• ε_2 = 1/4|0-1| + 1/4|1-1| + 1/4|0-0| + 1/4|1-0| = 1/2
• Choose the lowest error ε_j: at t = 1 (round), choose h_1
• Update weights: w_{t+1,i} = w_{t,i} × β_t^(1-e_i), with β_t = ε_t / (1 - ε_t), where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise
β_1 = (1/4) / (1 - (1/4)) = 1/3
• w_{2,1} = 1/4 × β_1^(1-0) = 1/12
• w_{2,2} = 1/4 × β_1^(1-1) = 1/4
• w_{2,3} = 1/4 × β_1^(1-0) = 1/12
• w_{2,4} = 1/4 × β_1^(1-0) = 1/12
Normalize weight (when t = 2)
• w_{2,1} = (1/12) / (1/2) = 1/6
• w_{2,2} = (1/4) / (1/2) = 1/2
• w_{2,3} = (1/12) / (1/2) = 1/6
• w_{2,4} = (1/12) / (1/2) = 1/6
• The error is evaluated with respect to the weights w_t, at t = 2:
• ε_1 = 1/6|1-1| + 1/2|0-1| + 1/6|0-0| + 1/6|0-0| = 1/2
• ε_2 = 1/6|0-1| + 1/2|1-1| + 1/6|0-0| + 1/6|1-0| = 1/3
• Choose the lowest error ε_j: at t = 2 (round), choose h_2
• Update weights
β_2 = (1/3) / (1 - (1/3)) = 1/2
• w_{3,1} = 1/6 × β_2^(1-1) = 1/6
• w_{3,2} = 1/2 × β_2^(1-0) = 1/4
• w_{3,3} = 1/6 × β_2^(1-0) = 1/12
• w_{3,4} = 1/6 × β_2^(1-1) = 1/6
Normalize weight (when t = 3)
• w_{3,1} = (1/6) / (2/3) = 1/4
• w_{3,2} = (1/4) / (2/3) = 3/8
• w_{3,3} = (1/12) / (2/3) = 1/8
• w_{3,4} = (1/6) / (2/3) = 1/4
• The error is evaluated with respect to the weights w_t, at t = 3:
• ε_1 = 1/4|1-1| + 3/8|0-1| + 1/8|0-0| + 1/4|0-0| = 3/8
• ε_2 = 1/4|0-1| + 3/8|1-1| + 1/8|0-0| + 1/4|1-0| = 1/2
• Choose the lowest error ε_j: at t = 3 (round), choose h_1
• Update weights
β_3 = (3/8) / (1 - (3/8)) = 3/5
The final strong classifier
• α_t = log(1/β_t) (base-10 logarithm): α_1 = log 3, α_2 = log 2, α_3 = log(5/3)
• log 3 × h_1(x) + log 2 × h_2(x) + log(5/3) × h_1(x) ≥ 1/2 × 1, since α_1 + α_2 + α_3 = log 10 = 1

  h_1 (0.4771)   h_2 (0.3010)   h_1 (0.2218)   output    T/F
  1              0              1              class 1   T
  0              0              0              class 0   T
  0              1              0              class 0   F

• Test point (1,100): 1 1 1 => class 1
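The three rounds of the example can be checked with a short script that uses exact fractions. The slides do not state how h_1 and h_2 are defined, so the thresholds below are hypothetical, chosen only so that the weak classifiers reproduce the outputs appearing in the error terms (h_1 gives 1, 0, 0, 0 and h_2 gives 0, 1, 0, 1 on x_1..x_4). The function names are likewise illustrative.

```python
from fractions import Fraction
from math import log10

# Training data from the example.
X = [(1, 1), (2, 2), (2, 1), (3, 2)]
y = [1, 1, 0, 0]


# Hypothetical weak classifiers (assumed thresholds, see the note above).
def h1(x):
    return 1 if x[0] < 1.5 else 0


def h2(x):
    return 1 if x[1] > 1.5 else 0


weak = [h1, h2]


def adaboost(X, y, weak, rounds):
    """Return (chosen indices, errors, alphas) for the given number of rounds."""
    w = [Fraction(1, len(X))] * len(X)
    chosen, errors, alphas = [], [], []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize the weights
        errs = [sum(wi * abs(h(xi) - yi) for wi, xi, yi in zip(w, X, y))
                for h in weak]
        j = min(range(len(weak)), key=lambda k: errs[k])  # lowest error
        eps = errs[j]
        beta = eps / (1 - eps)
        # Correctly classified examples are scaled by beta, others are kept.
        w = [wi * (beta if weak[j](xi) == yi else 1)
             for wi, xi, yi in zip(w, X, y)]
        chosen.append(j)
        errors.append(eps)
        alphas.append(log10(float(1 / beta)))
    return chosen, errors, alphas


def strong(x, chosen, alphas, weak):
    """Weighted vote of the chosen weak classifiers against half the alpha sum."""
    score = sum(a * weak[j](x) for j, a in zip(chosen, alphas))
    return 1 if score >= 0.5 * sum(alphas) else 0


chosen, errors, alphas = adaboost(X, y, weak, rounds=3)
test_label = strong((1, 100), chosen, alphas, weak)
```

With these assumptions the script selects h_1, h_2, h_1 with errors 1/4, 1/3 and 3/8, matching the hand calculation, and labels the test point (1,100) as class 1.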
False positive rate