Face Identification by Human and by Computer:
Two Sides of the Same Coin, or Not?
Tsuhan Chen
Carnegie Mellon University
Pittsburgh, USA
TsuhanChen2004
What do you see?
What do you see?
What do you see?
[http://www.palmyra.demon.co.uk]
[Adam Finkelstein, “Mona”]
From Human to Computer
Face Identification: A Generalization Problem
“Single gallery image”
Pattern Recognition
“Single probe image”
Need to generalize for all variations, without observing those variations
Computer vs. Human
Feng-Shui (風水) as an example
Ancient Chinese room arrangement technique
Way 1:
Write down all the rules
Too many rules, and they do not generalize
Way 2:
Imagine how a dragon would move through the room to arrange it in a livable manner
Intuitive and creative
Done by Feng-Shui masters
“Biomimetics”?
Neural networks
Among the first biologically motivated pattern recognition (PR) approaches; some success in face detection and recognition
Lesson from Deep Blue
May 11, 1997: Deep Blue beat Kasparov (2 wins, 1 loss, 3 draws), the first time in history
Deep Blue was not designed to mimic humans. Instead, it was designed to take advantage of a computer’s strengths, i.e., speed and memory
Deep Blue beat Kasparov by memorizing a large amount of information and using table lookup
Kasparov said it best, “quantity is sometimes quality”
Quantity is not always quality
Unfortunately, most object recognition work is under the “Deep Blue” paradigm
Deeper search, more data, faster computers, etc.
Object recognition requires computers to have some more human-like attributes:
Generalization (intuition)
Adapting from the past
Bounds on knowledge
Face perception as example
Initial overall examination of external features, followed by a sequential analysis of internal features [Matthews, 1978] [Fraser and Parker, 1986]
“Generalization”
Banca Database (ICPR2004)
“Controlled”
- Studio lighting
- High quality camera
- Minimal pose variation
“Degraded”
- Varying lighting
- Low quality web camera
- Some pose variation
“Adverse”
- Varying lighting
- High quality camera
- Noticeable pose and other variations
DCT/Gabor Transform
“Parts” Representation Of The Face
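A minimal sketch of one way a block-based "parts" representation could be computed with the DCT. The slides name only the transform, so the block size, step, number of retained coefficients, and the function names `dct2` and `parts_features` are all assumptions made for illustration.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of equal-length rows)."""
    n = len(block)
    # 1-D DCT-II basis with orthonormal scaling: basis[k][x]
    basis = [
        [
            (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
            for x in range(n)
        ]
        for k in range(n)
    ]
    return [
        [
            sum(basis[u][y] * basis[v][x] * block[y][x]
                for y in range(n) for x in range(n))
            for v in range(n)
        ]
        for u in range(n)
    ]


def parts_features(image, block=8, step=4, keep=3):
    """Slide a block over the image and keep the keep x keep lowest-frequency
    DCT coefficients of each block as that part's feature vector.
    block, step and keep are illustrative values, not taken from the slides."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - block + 1, step):
        for left in range(0, width - block + 1, step):
            patch = [row[left:left + block] for row in image[top:top + block]]
            coeffs = dct2(patch)
            features.append([coeffs[u][v] for u in range(keep) for v in range(keep)])
    return features


# Toy 16x16 "image" (synthetic values, not face data).
image = [[float((x + y) % 16) for x in range(16)] for y in range(16)]
features = parts_features(image)
```

Each overlapping block yields one short feature vector, so a face becomes a set of local observations rather than a single monolithic vector.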
GMM
EM Learner
Estimating Parametric Model
DCT/Gabor Transform
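The "EM Learner" step can be sketched as fitting a Gaussian mixture model (GMM) to feature vectors with expectation-maximization. This is a generic diagonal-covariance EM under stated assumptions: the function name `fit_gmm_em`, the initialisation, the variance floor, and the synthetic two-cluster data are illustrative and are not taken from the slides.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm_em(data, k, iters=30, var_floor=1e-3):
    """Fit a k-component diagonal GMM to data (list of equal-length vectors)."""
    n, d = len(data), len(data[0])
    # Deterministic initialisation from evenly spaced samples (arbitrary choice).
    means = [list(data[(j * (n - 1)) // max(k - 1, 1)]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and variances.
        counts = [sum(r[j] for r in resp) + 1e-10 for j in range(k)]
        count_total = sum(counts)
        weights = [c / count_total for c in counts]
        for j in range(k):
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / counts[j]
                        for i in range(d)]
            variances[j] = [
                max(var_floor,
                    sum(r[j] * (x[i] - means[j][i]) ** 2
                        for r, x in zip(resp, data)) / counts[j])
                for i in range(d)
            ]
    return weights, means, variances


# Synthetic 2-D data standing in for transform features (not real face data).
rng = random.Random(0)
cluster_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(100)]
cluster_b = [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(100)]
weights, means, variances = fit_gmm_em(cluster_a + cluster_b, k=2)
```

In a real system the input would be transform features from face images, and the number of components would be chosen by validation.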
Combining Representations
Hypothesis
Monolithic representation: low-frequency information
Parts representation: both low- and high-frequency information
Weighting can be view-dependent
Combine the scores with sum rule
Input face → LDA-COS and FSC-GMM → + → COMB
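A minimal sketch of sum-rule score fusion for a holistic matcher and a parts matcher. The slides state only that scores are combined with the sum rule and that the weighting can be view-dependent. The min-max normalization, the function names, and the example scores below are assumptions for illustration, with higher scores taken to mean better matches.

```python
def min_max_normalize(scores):
    """Map scores to [0, 1] so matchers with different ranges are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def sum_rule(holistic_scores, parts_scores, w_holistic=0.5):
    """Weighted sum of normalized scores, one entry per gallery identity.
    w_holistic could be chosen per view; 0.5 gives the plain sum rule."""
    h = min_max_normalize(holistic_scores)
    p = min_max_normalize(parts_scores)
    return [w_holistic * a + (1.0 - w_holistic) * b for a, b in zip(h, p)]


def best_match(fused):
    """Index of the gallery identity with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Invented scores for three gallery identities (not results from the slides).
holistic = [0.2, 0.9, 0.4]          # e.g. cosine similarities
parts = [-120.0, -100.0, -90.0]     # e.g. GMM log-likelihoods

equal = sum_rule(holistic, parts)
parts_heavy = sum_rule(holistic, parts, w_holistic=0.2)
```

With equal weights the fused decision follows the holistic matcher here; shifting weight toward the parts matcher changes the decision, which is the effect a view-dependent weighting would exploit.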
Comparative Results
Algorithm/Protocol   Mc      Ud      Ua      P
LDA-NC [1]           4.93    15.99   20.24   14.79
ORG-SVM [1]          5.43    25.43   30.11   20.33
PCA-MAH              10.2    17.84   26.63   21.57
LDA-COS              6.46    10.99   20.39   14.96
FSC-GMM              2.14    24.78   17.06   21.97
COMB                 1.42    9.65    16.51   12.52
[1] M. Sadeghi, J. Kittler, A. Kostin, and K. Messer, “A comparative study of automatic face
Some Motivations
Holistic vs. Parts
‘Thatcher Illusion’
[Thompson, 1980]
‘Thatcher Illusion’
Holistic vs. Parts
Parts of faces are [Tanaka and Farah, 1993]
easily recognized in the typical whole-face configuration
less easily recognized in a new configuration
most poorly recognized in isolation
Chin differences are detected first [Sargent, 1984]
Not as obvious when faces are inverted
These suggest
Face perception is holistic and by parts
Orientation is important
“Bounds on Knowledge” and “Adapt from Past”
Bounds on Knowledge
Socrates (470-399 B.C.)
"The only true wisdom is in knowing you know nothing."
Computers are nowhere near this yet. A computer thinks it knows every conceivable variation (but in fact it is limited to what has been programmed into it)
Adapt from Past
Humans are good at adapting using past experience. Can computers do the same?
Yes, it is called relevance adaptation (RA)
Previously used in speech recognition
Obtains a subject-dependent model from a subject-independent average distribution (the past), using a small amount of adaptation data
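A minimal sketch of one common form of this idea: relevance (MAP) adaptation of GMM means, as used in the speaker-recognition literature. The slides do not give the update formula, so this particular form is an assumption. For component i with soft count n_i, the adapted mean is alpha_i * (data mean) + (1 - alpha_i) * (prior mean), with alpha_i = n_i / (n_i + r). The relevance factor r = 16 and all names and numbers below are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def relevance_adapt_means(weights, means, variances, data, relevance=16.0):
    """Adapt the means of a subject-independent GMM toward a subject's data.
    Components that see little data stay close to the prior mean."""
    k, d = len(means), len(means[0])
    counts = [0.0] * k                      # soft counts n_i
    sums = [[0.0] * d for _ in range(k)]    # responsibility-weighted sums
    for x in data:
        logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                for j in range(k)]
        top = max(logs)
        probs = [math.exp(v - top) for v in logs]
        total = sum(probs)
        for j in range(k):
            gamma = probs[j] / total
            counts[j] += gamma
            for i in range(d):
                sums[j][i] += gamma * x[i]
    # (sum + r * prior) / (n + r) equals alpha * data_mean + (1 - alpha) * prior
    return [
        [(sums[j][i] + relevance * means[j][i]) / (counts[j] + relevance)
         for i in range(d)]
        for j in range(k)
    ]


# Invented 1-D prior model and three adaptation samples (not from the slides).
prior_weights = [0.5, 0.5]
prior_means = [[0.0], [10.0]]
prior_vars = [[1.0], [1.0]]
subject_data = [[0.8], [1.0], [1.2]]

adapted = relevance_adapt_means(prior_weights, prior_means, prior_vars, subject_data)
```

With only three samples the first mean moves a small fraction of the way toward the data (about 0.16 rather than 1.0), and the component that saw no data keeps its prior mean. This is how a small amount of adaptation data is balanced against "the past".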
No Relevance Adaptation
Relevance Adaptation
Another Aspect: 3D/Video
3/4-View
Frontal/profile views result in poorer recognition by humans than the 3/4-view for unfamiliar faces [Baddeley & Woodhead, 1981; Bruce, 1982]
3/4-view looks good too!
frontal view, ¾ view, ¾ view
“Face Mosaic”
Face in Video
Moving faces are recognized significantly better by humans than still images
Movement provides 3D structure of the face and allows recognition of facial gestures [Knight and Johnston, 1997]
Under pixelation or blurring, moving images of faces are recognized better than still images [Lander et al., 1999] (perhaps “masking” or “super-resolution”)
Face Recognition from Video
Computers can use video too
More than simple majority voting or frame selection
Integration of temporal/motion/geometry information
Updating over time
Most variations are continuous (at 30Hz): pose, illumination, expression, registration, etc.
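A minimal sketch contrasting simple majority voting with "updating over time". The update shown is a recursive Bayesian posterior over identities that multiplies in each frame's likelihoods; treating frames as independent is a simplifying assumption. The slides do not specify a method, and the function names and per-frame likelihoods below are invented for illustration.

```python
from collections import Counter


def majority_vote(frame_likelihoods):
    """Pick the identity that wins the most individual frames."""
    votes = [max(range(len(f)), key=lambda i: f[i]) for f in frame_likelihoods]
    return Counter(votes).most_common(1)[0][0]


def recursive_update(frame_likelihoods, prior=None):
    """Return the posterior over identities after each frame.
    posterior_t is proportional to posterior_(t-1) * likelihood_t."""
    n_ids = len(frame_likelihoods[0])
    posterior = list(prior) if prior else [1.0 / n_ids] * n_ids
    history = []
    for likelihood in frame_likelihoods:
        unnormalized = [p * l for p, l in zip(posterior, likelihood)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
        history.append(posterior)
    return history


# Invented likelihoods for 3 identities over 4 frames: three ambiguous frames
# that slightly favour identity 0, and one clear frame that favours identity 1.
frames = [
    [0.40, 0.35, 0.25],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
    [0.38, 0.37, 0.25],
]

voted = majority_vote(frames)
final = recursive_update(frames)[-1]
integrated = max(range(len(final)), key=lambda i: final[i])
```

Voting discards how confident each frame was, so the three ambiguous frames outvote the single clear one. The accumulated posterior keeps that information and reaches a different decision.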
Face-in-Action (FiA) Database
Other aspects to be explored...
Stages of Face Identification
Face → Identity → Name
[Young et al., 1982]
Common situations:
Case 1: Cannot recognize the face
Case 2: “The face looks familiar” without identity
Case 3: Identify the face (e.g., occupation) but can’t recall the name
Own-Race Bias
We are better at identifying faces belonging to races with which we are familiar [Shapiro and Penrod, 1986]
Own-Race Bias
Independent Modules
Facial expression is identified independently of face identity [Bruce, 1986; Young et al., 1986]
Prosopagnosia patients can still identify facial emotion
Some patients cannot identify facial emotion, but can identify famous faces
Independent Modules
McGurk effect
A prosopagnosia patient can still experience the McGurk effect [Campbell et al., 1986], suggesting that holistic face recognition is affected, but by-part recognition is not
Audio + Visual → Perceived
ba + ga → da
pa + ga → ta
ma + ga → na
Internet Psychology Lab http://kahuna.psych.uiuc.edu//ipl [McGurk and MacDonald ‘76]
[Baker and Kanade, “Hallucinating Faces”]
How many samples for a face?
One Single Image
Number of all possible 16×12 images $= 2^{16 \times 12 \times 8}$
$\gg$ number of all possible face images
$\gg$ human history × world population × 365 × 24 × 60 × 60 × 30
Reconstruct → Power of prior; “adapt from past”
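The counting argument can be checked with exact integer arithmetic. The image size (16×12, 8 bits per pixel) and the factors 365 × 24 × 60 × 60 × 30 come from the slide. The values used for the length of human history and the world population are assumptions chosen only to show the order of magnitude.

```python
# Number of distinct 16x12 images with 8 bits per pixel: 2^(16*12*8).
WIDTH, HEIGHT, BITS_PER_PIXEL = 16, 12, 8
n_images = 2 ** (WIDTH * HEIGHT * BITS_PER_PIXEL)

# Video frames in one year at 30 frames per second.
FRAMES_PER_YEAR = 365 * 24 * 60 * 60 * 30

# Assumed values (not from the slides), for an order-of-magnitude bound.
HUMAN_HISTORY_YEARS = 10_000
WORLD_POPULATION = 6_000_000_000

# Upper bound on face images ever "seen": everyone watched for all of history.
n_seen = HUMAN_HISTORY_YEARS * WORLD_POPULATION * FRAMES_PER_YEAR

print("digits in number of possible images:", len(str(n_images)))
print("digits in number of observed frames:", len(str(n_seen)))
```

Under these assumptions the number of possible images has hundreds of decimal digits while the number of frames ever observed has only a few dozen. The gap is so large that faces can occupy only a tiny subset of image space, which is what makes a strong prior useful for reconstruction.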
Some Art Work
12 x 16 LEDs, 8-bit Grayscale [Jim Campbell, “Portrait of a Portrait of Harry Nyquist”]
More…
12 x 16 LEDs, 8-bit Grayscale [Jim Campbell, “Portrait of a Portrait of Claude Shannon”]
Finally…
“The most compelling shapes are those near to our hearts: people’s faces, a gracefully moving body, a natural scene with rustling leaves and flowing water.
Evolution has tuned us to these sights.
By combining vision and graphics, capturing and creating images of these scenes may soon be within reach. …”
Try this…
[http://www.palmyra.demon.co.uk]
Advanced Multimedia Processing Lab
Please visit us at: http://amp.ece.cmu.edu