2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018)
ISBN: 978-1-60595-065-5
Faces Annotation in News Images Based on Multimodal Information
Zha CHENG*, Gao CHAO and Yan-chuan WANG
National Digital Switching System Engineering Technological Research Center, Zhengzhou, Henan, China 450000
*Corresponding author

Keywords: Multimodal information, Face annotation, Information fusion.
Abstract. News images usually appear with descriptive captions. In an image-caption pair where several faces contained in the image are associated with a few names in the corresponding caption, the task of face annotation is to infer the correct name for each face. In this work, a novel face annotation framework called face annotation based on multimodal information (FAMI) is proposed. Unlike previous works, which rely mainly on facial similarity, FAMI infers names by fusing multimodal information extracted from images and captions. Specifically, we first extract multiple types of information that may contribute to annotating faces. After that, we annotate faces with an information fusion model. Finally, a correcting strategy is adopted to further improve the annotation results. As shown in our experiments, the proposed framework yields high-quality face annotation performance compared with several baseline approaches.

Introduction

News images usually appear with descriptive captions containing several names indicating who may be in the images. If faces in news images could be annotated automatically with the names contained in captions, it would benefit several research fields such as cross-media information retrieval, public intelligence mining and so on.
To annotate faces in news images, existing methods infer correct names for faces based on the following general constraints. 1) Non-redundancy constraint - in an image-caption pair, each detected face (including falsely detected faces) in an image can only be annotated with one of the names in the candidate name set or as Null, which indicates that the ground-truth name does not appear in the caption. 2) Uniqueness constraint - multiple faces of the same person cannot appear in an image, except for the Null class [1]. 3) Co-occurrence constraint - a face of a certain name is more likely to appear in images whose captions contain the name [2]. 4) Similarity constraint - two faces should belong to the same person if they are highly similar. Under these constraints, earlier works focus on exploiting face similarity information for face annotation. In [3], face annotation is treated as a candidate labelling set (CLS) problem. Following the maximum margin criterion, maximum margin set (MMS) learning is proposed to learn face classifiers from a candidate-labeled face set. In [4], considering that each candidate name should contribute differently to the learning process, an algorithm named confidence-rated discriminative partial label learning (CORD), based on boosting techniques, is proposed. In CORD, the ground-truth confidence of each candidate name is estimated and used to facilitate the learning procedure. In [5], a partial label learning algorithm named instance-based partial label learning (IPAL) is proposed to solve the face annotation problem through affinity relationship analysis and an iterative label propagation procedure over faces.
Nevertheless, in news images, faces of the same subject may have different appearances because of variations in pose, illumination and expression, which reduces the reliability of face similarity information and causes unsatisfactory annotation results. In this work, we propose a novel face annotation framework referred to as face annotation based on multimodal information (FAMI), which focuses on using multiple types of information extracted from images and captions rather than face similarity information alone. Our framework is motivated by the observation that there is some hidden consistency between faces and their ground-truth names in the candidate name set, which we express as the following constraints. 5) Importance consistence constraint - a face and its ground-truth name are expected to have similar importance. 6) Gender consistence constraint - a face and its ground-truth name should present the same gender bias. Under these additional constraints, more types of information can be introduced for face annotation in the proposed framework. As shown in Figure 1, the proposed framework contains three processing steps. First, multimodal information is extracted from images and captions, including face recognition results ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), face number in the image ($\mathit{Num}_f$), name position ($N_p$), name gender ($N_g$) and name number in the caption ($\mathit{Num}_c$). Second, a model is trained to fuse the extracted information to name faces. Third, according to the uniqueness constraint, the annotation results are further corrected by a proposed name correcting strategy.
Overall, the main contributions of this work can be summarized in three aspects. First, we propose two additional constraints about the consistency between faces and their ground-truth names. Under these constraints, multiple types of information besides face similarity can be used to infer names for faces. Second, up to nine types of information are extracted in reasonable ways and fused to annotate faces, which, to the best of our knowledge, is the most used in the face annotation field. Third, we give a method to obtain an optimal model to fuse multimodal information and infer names.
Figure 1. The illustration of the FAMI framework.
Face Annotation Based on Multimodal Information

Problem Definition and Notation
Given a collection of image-caption pairs, faces and names are first detected from the collection. In an image-caption pair, we denote the face set as $F = \{f_1, f_2, \ldots, f_p\}$, consisting of $p$ faces detected from the image. The names detected from a caption, together with an additional class Null, are denoted as $C = \{c_1, c_2, \ldots, c_q, \mathit{Null}\}$, which is the candidate name set for each face in the corresponding image. In contrast to the additional name Null, $C_r = \{c_1, c_2, \ldots, c_q\}$ is regarded as the real candidate name set for each face. The ground-truth names and predicted names of $F$ are represented by $N = \{n_1, n_2, \ldots, n_p\}$ and $Y = \{y_1, y_2, \ldots, y_p\}$ respectively, where $n_i$ represents the ground-truth name of face $f_i$ and $y_i$ represents the corresponding predicted name obtained by an annotation model. Following the description in [3,6], an image-caption pair can be regarded as a bag containing multiple instances. Each instance represents a face-candidate name pair $\{f, c\}$ ($f \in F$, $c \in C$), and the instance is labeled positive if $c$ is the ground-truth name of $f$, and negative otherwise. After multiple types of information are extracted, the instance $\{f, c\}$ can be represented as a vector $X(f, c)$. To infer the names of faces, a model $Z$ is sought such that, for a face $f$ with ground-truth name $n$,

\[ Z(X(f, n)) \ge Z(X(f, c)) \tag{1} \]

where $c \in C$, and equality holds when $n = c$. In Eq. 1, $Z(X(f, c))$ can be interpreted as the confidence of $f$ being named $c$. Consequently, the predicted name of $f$ can be inferred as follows:

\[ y = \arg\max_{c \in C} Z(X(f, c)) \tag{2} \]
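The decision rule of Eq. 2 can be illustrated with a minimal sketch. It assumes the confidences $Z(X(f, c))$ have already been computed and stored in a dictionary keyed by candidate name; the function name `infer_name` and the example scores are hypothetical and not taken from the paper.

```python
from typing import Dict


def infer_name(confidences: Dict[str, float]) -> str:
    """Return the candidate name with the highest confidence (Eq. 2).

    `confidences` maps each candidate in C (including "Null") to Z(X(f, c)).
    """
    if not confidences:
        raise ValueError("the candidate name set must not be empty")
    return max(confidences, key=confidences.get)


# Hypothetical confidences for one face and three candidates.
example = {"Alice": 0.7, "Bob": 0.2, "Null": -0.1}
predicted = infer_name(example)
```

When several candidates share the highest confidence, `max` returns the first one in dictionary order, a tie-breaking choice the paper does not specify.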
Multimodal Information Extraction

In this work, nine types of information are extracted from image-caption pairs, including face recognition result ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), name position ($N_p$), name gender ($N_g$), the number of faces in the image ($\mathit{Num}_f$) and the number of names in the caption ($\mathit{Num}_c$). Among these types of information, the extraction processes of the first seven are more complicated and are discussed in the following. Before the discussion, one point needs to be emphasized: we assume that the step of face and name detection has been finished.
Face Recognition Result

Although face similarity is not fully reliable in news images, it can still offer important clues for name inference. In this work, face similarity is exploited by a face recognizer based on a modified K-Nearest Neighbors algorithm (KNN) [7]. The main idea of a traditional KNN-based face recognizer is as follows: given a face $f$ and its real candidate name set $C_r = \{c_1, c_2, \ldots, c_q\}$, and a training set $T = \{(f_1, n_1), (f_2, n_2), \ldots, (f_t, n_t)\}$ consisting of multiple ground-truth faces for each candidate name, the distances between $f$ and each training sample can be calculated via a predefined distance metric. Denoting the set consisting of the top-$k$ samples nearest to $f$ as $N_k(f)$, the name of $f$ can be inferred by the majority voting strategy as follows:

\[ y = \arg\max_{c_j \in C_r} \sum_{f_i \in N_k(f)} I(n_i = c_j) \tag{3} \]

where $I$ is the indicator function, with $I(n_i = c_j) = 1$ when $n_i = c_j$ and $I(n_i = c_j) = 0$ otherwise.

However, when a KNN-based face recognizer is used to annotate faces in news images, the first difficulty is that it is not easy to find sufficient ground-truth faces for each candidate name to build the training set. To address this problem, we obtain the training set from the given image-caption pairs on the basis of an assumption proposed in [8]: in the related face set of candidate name $c$ (i.e. the set of faces detected from all images associated with candidate name $c$), the ground-truth faces of $c$ form the majority, and faces of any other person appear just a few times. Accordingly, after a clustering process is conducted on the related face set of $c$, the faces in the biggest cluster can be collected as the training samples of $c$. In this paper, the affinity propagation clustering algorithm (AP clustering algorithm) [9], which does not require the number of hidden clusters as an input parameter, is used to cluster faces.
After the training set is obtained, the name of $f$ can be predicted by Eq. 3. However, to obtain better annotation performance, the recognition results are expected to contain the prediction probability $F_r(f, c)$. Therefore, Eq. 3 is modified to output the recognition probability as follows:
\[ F_r(f, c) = \frac{2}{\pi} \arctan\Big( \sum_{f_i \in N_k(f)} \big(1 - d(f_i, f)\big) \cdot w_i \cdot I(n_i = c) \Big) \tag{4} \]
where $d(f_i, f)$ is the Euclidean distance between training sample $f_i$ and the face $f$. $w_i$ represents the weight of each distance and is defined as $1/d(f_i, f)$, so that samples closer to $f$ have more influence on the probability. To constrain $F_r(f, c)$ to $[0, 1]$, $2\arctan(\cdot)/\pi$ is applied as a nonlinear transformation to the value in brackets. As for the case $c = \mathit{Null}$, we follow the idea in [10] and model it as a problem of information uncertainty. The uncertainty is highest when the probabilities are uniformly distributed. Conversely, when the probability of $f$ being recognized as one name is noticeably higher than for the other names, the uncertainty becomes lower. Therefore, $F_r(f, \mathit{Null})$ is defined as:

\[ F_r(f, \mathit{Null}) = -\frac{\sum_{c \in C_r} F_r(f, c) \log_2 F_r(f, c)}{\log_2 q} - 1 \tag{5} \]
where $q$ is the number of names in the real candidate name set $C_r$ of $f$. The first term is the normalized entropy, and one is subtracted to make $F_r(f, \mathit{Null})$ negative so that it can be distinguished from $\{F_r(f, c), c \in C_r\}$. As $F_r(f, \mathit{Null})$ reflects the distribution of $\{F_r(f, c), c \in C_r\}$, it may facilitate face annotation, so we also introduce $F_r(f, \mathit{Null})$ into $X(f, c)$ as additional information.
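The Null score of Eq. 5 can be sketched as a normalized entropy shifted by one. The sketch assumes that the recognition probabilities over the real candidate names sum to one, which the paper does not state explicitly, and that at least two real candidates exist so that $\log_2 q$ is non-zero; the function name `null_score` is hypothetical.

```python
import math


def null_score(probabilities):
    """Normalized entropy of the recognition scores minus one (Eq. 5).

    Assumes `probabilities` holds F_r(f, c) for every c in C_r and sums to 1.
    """
    q = len(probabilities)
    if q < 2:
        raise ValueError("Eq. 5 needs at least two real candidate names")
    # Terms with zero probability contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(q) - 1.0


uniform = null_score([0.5, 0.5])   # maximal uncertainty
peaked = null_score([1.0, 0.0])    # no uncertainty
```

Under the stated assumption the result lies in $[-1, 0]$, with values near 0 for an undecided recognizer and values near -1 when one name clearly dominates.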
Face Size

In a news image, face sizes are considered to be related to the importance of faces. Generally, the importance of a face appears to be positively related to its size. In [11], $F_s(f)$ is defined as the percentage of the bounding box area of $f$ over the total area of all faces in the image. Nevertheless, this definition loses the information about how 'prominent' a face size is compared with the other faces in the same image. To include this information as well, we define $F_s(f)$ as follows:

\[ F_s(f) = F_a(f) + \sum_{f_i \in F} \big( F_a(f) - F_a(f_i) \big) \tag{6} \]

In Eq. 6, $F$ is the face set in the image, and $F_a$ is the normalized face area as defined in [11].
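Eq. 6 can be made concrete with a short sketch. It assumes that the normalized face area $F_a$ is each bounding box area divided by the total face area in the image, as the text attributes to [11]; the function name and the example areas are hypothetical.

```python
def face_size_feature(areas):
    """Return F_s for every face in one image (Eq. 6).

    `areas` holds the bounding box areas of all faces in the image.
    """
    total = sum(areas)
    if total <= 0:
        raise ValueError("face areas must be positive")
    normalized = [area / total for area in areas]
    # Own normalized area plus the summed margin over every face in the image.
    return [value + sum(value - other for other in normalized)
            for value in normalized]


# One large face and two small ones, in arbitrary pixel units.
sizes = face_size_feature([30.0, 10.0, 10.0])
```

For the areas above, the large face receives a feature of about 1.4 and each small face about -0.2, which shows how the summed margin separates a dominant face more strongly than the plain area ratio of 0.6 versus 0.2.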
Face Position

When photographers take pictures of someone, they tend to place the person's face in the center of the image. Therefore, we consider faces appearing in the central part of an image to be more important than faces in the corners. To extract the face position information for $f$, we first calculate the Euclidean distance $F_{rp}(f)$ from the center of the bounding box of $f$ to the center of the image. Then $F_{rp}(f)$ is normalized to $F_{np}(f)$ by dividing by the total distance of all faces in the image. After that, as with $F_s$, the information about how 'prominent' a distance is compared with the others is introduced as follows:

\[ F_p(f) = F_{np}(f) + \sum_{f_i \in F} \big( F_{np}(f) - F_{np}(f_i) \big) \tag{7} \]

where $F_{np}(f) = F_{rp}(f) / \sum_{f_i \in F} F_{rp}(f_i)$, and $F$ is the face set in the image.
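The position feature of Eq. 7 can be sketched from bounding boxes as follows. The box layout `(x, y, width, height)` with a top-left origin and the handling of the degenerate case where every face sits exactly at the image center are assumptions made for this illustration.

```python
import math


def face_position_feature(boxes, image_size):
    """Return F_p for every face in one image (Eq. 7).

    `boxes` are (x, y, width, height) tuples, an assumed layout;
    `image_size` is (width, height).
    """
    center_x, center_y = image_size[0] / 2, image_size[1] / 2
    raw = [math.hypot(x + w / 2 - center_x, y + h / 2 - center_y)
           for x, y, w, h in boxes]
    total = sum(raw)
    if total == 0:
        # Assumption: with all faces at the exact center, use zeros.
        return [0.0 for _ in boxes]
    normalized = [distance / total for distance in raw]
    return [value + sum(value - other for other in normalized)
            for value in normalized]


# One centered face and one face at the left edge of a 100x100 image.
positions = face_position_feature([(40, 40, 20, 20), (0, 40, 20, 20)], (100, 100))
```

With this definition a smaller value corresponds to a more central face, so the sign convention differs from the size feature, where larger means more prominent.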
Face Definition

When a photographer takes a picture, the camera tends to be focused on the faces of important persons. This makes the important faces in images sharper than other faces. To evaluate the definition of a face $f$, we first use the point sharpness algorithm to obtain the raw definition $F_{rd}(f)$. Then, we normalize $F_{rd}(f)$ to $F_{nd}(f)$ and introduce the 'prominent' information as follows:

\[ F_d(f) = F_{nd}(f) + \sum_{f_i \in F} \big( F_{nd}(f) - F_{nd}(f_i) \big) \tag{8} \]

where $F_{nd}(f) = F_{rd}(f) / \sum_{f_i \in F} F_{rd}(f_i)$.
Name Position

The literature [12] has shown that names closer to the beginning of a caption are more likely to be pictured in the corresponding image. Inspired by this, we believe that the positions of names in captions can reflect the importance of the names. In [12], the name position is defined as follows:

\[ N_p(c) = \frac{L(c)}{L(\mathit{caption})} \tag{9} \]

where $L(c)$ is the length of the caption from the beginning to the location of $c$, and $L(\mathit{caption})$ is the length of the whole caption. However, under this definition, $N_p(c)$ cannot reflect the order in which $c$ appears in the caption relative to the other candidate names. Therefore, we adopt two types of name position definition simultaneously. The first definition $N_{p1}(c)$ is the same as Eq. 9, and the second definition $N_{p2}(c)$ is the order in which $c$ appears in the caption relative to the other names. As for Null, as it would not appear in any caption, both $N_{p1}(\mathit{Null})$ and $N_{p2}(\mathit{Null})$ should be set to infinity. However, infinity is not suitable as an input to most learning algorithms, so we set both $N_{p1}(\mathit{Null})$ and $N_{p2}(\mathit{Null})$ to 20, which is much larger than the values of real names.
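The two name position features can be sketched as follows. Several points are assumptions rather than details confirmed by the paper: lengths are measured in characters, the second definition is taken to be the 1-based rank of a name's first appearance, and names are located by exact substring match. The constant 20 for Null is the value stated in the text.

```python
NULL_POSITION = 20.0  # value the paper assigns to Null for both definitions


def name_positions(caption, names):
    """Return {name: (N_p1, N_p2)} for the names found in `caption`.

    N_p1 is the relative character offset (Eq. 9, measured in characters
    by assumption); N_p2 is the 1-based order of first appearance.
    """
    found = sorted((caption.find(name), name) for name in names if name in caption)
    features = {}
    for order, (offset, name) in enumerate(found, start=1):
        features[name] = (offset / len(caption), float(order))
    features["Null"] = (NULL_POSITION, NULL_POSITION)
    return features


# Hypothetical caption and candidate names.
positions_by_name = name_positions("Alice meets Bob in Paris", ["Bob", "Alice"])
```

A production version would work on the output of a name detector rather than raw substring search, since the same name can be written in several forms.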
Name Gender

We first build a name dictionary that includes approximately 6000 names labeled with gender information. The names are collected from a number of countries and cultures and cover all the names used in our experiments. $N_g(c)$ is set to 1 when $c$ is a male name and to 0 otherwise. In addition, to determine the genders of faces, we use almost 7000 labeled faces to train a face gender classifier based on a support vector machine (SVM). The training faces cover all human races, and each gender accounts for half of the face set. $F_g(f)$ is defined as the classifier's output probability, indicating the confidence of $f$ being a male face. As for the additional name Null, its gender is set to -1 in order to distinguish it from real candidate names.
Name Inferring and Annotation Results Correcting

After the information extraction procedure, the instance $\{f, c\}$ can be represented as $X(f, c) = \big(F_r(f, c), F_r(f, \mathit{Null}), F_s(f), F_p(f), F_d(f), F_g(f), \mathit{Num}_f, N_{p1}(c), N_{p2}(c), N_g(c), \mathit{Num}_c\big)$. The next step is inferring the name of $f$. According to Eq. 2, the core of this process is building the model $Z: X(f, c) \to z$, $z \in R$, following the criterion in Eq. 1. In this work, setting the targets to 1 for positive instances and 0 for negative instances, we learn the model $Z$ by regression techniques. In this paper, Neural Net Regression (NNR), which has favorable nonlinear mapping performance and high parallel information processing capability, is used to build $Z$. Specifically, a two-layer feedforward network with sigmoid transfer functions in both the hidden and output layers is applied to fit the targets of the instances.
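The regression setup can be sketched in two parts: building the 0/1 targets for every face-candidate instance, and the forward pass of a two-layer network with sigmoid units in both layers. The weights below are placeholders rather than trained values, the training procedure itself is omitted, and all function names are hypothetical.

```python
import math


def build_targets(candidates, ground_truth):
    """One (face index, candidate, target) triple per instance {f, c}."""
    return [(index, name, 1.0 if name == truth else 0.0)
            for index, truth in enumerate(ground_truth)
            for name in candidates]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Two-layer feedforward pass with sigmoid hidden and output units."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + bias)
              for row, bias in zip(hidden_weights, hidden_biases)]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


targets = build_targets(["A", "Null"], ["A", "Null"])
# Placeholder weights: with all zeros the output is sigmoid(0) = 0.5.
score = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
```

Because the output unit is a sigmoid, the learned confidence stays in $(0, 1)$, which matches the choice of 1 and 0 as regression targets.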
So far, five of the six constraints mentioned in the introduction have been used in the proposed framework; the exception is the uniqueness constraint. As a result, multiple faces in an image may be annotated with the same name. To avoid this situation, a correcting strategy is adopted as follows:
Algorithm 1. The Correcting Strategy
Input: $\{Z(X(f_i, c_j)),\ 1 \le i \le p,\ 1 \le j \le q + 1\}$, $Y_r$ (the raw annotation results)
Output: $Y$ (the final annotation results)
For $i = 1$ to $p$ do
    For $k = 1$ to $p$ do
        If $(y_i = y_k)$ and $(i \ne k)$ then
            If $Z(X(f_i, y_i)) \ge Z(X(f_k, y_k))$ then
                $y_i = y_i$
                $y_k = \operatorname{argsecmax}_{c_j \in C} Z(X(f_k, c_j))$
            else
                $y_k = y_k$
                $y_i = \operatorname{argsecmax}_{c_j \in C} Z(X(f_i, c_j))$
            End if
        End if
    End for
End for
where $\operatorname{argsecmax}$ denotes the candidate name with the second-largest value of $Z$.
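The correcting strategy of Algorithm 1 can be sketched in Python as below. Two points are assumptions: faces labeled Null are left untouched, following the exception stated in the uniqueness constraint (the printed algorithm does not show this check), and ties in confidence keep the label of the face with the lower index. The function names are hypothetical.

```python
def second_best(confidences):
    """Candidate with the second-largest confidence for one face."""
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return ranked[1] if len(ranked) > 1 else ranked[0]


def correct_annotations(scores, raw_labels, null="Null"):
    """Single-pass resolution of duplicate names within one image.

    `scores[i]` maps every candidate name to Z(X(f_i, c)); `raw_labels`
    holds the argmax annotation of each face.
    """
    labels = list(raw_labels)
    for i in range(len(labels)):
        for k in range(len(labels)):
            if i == k or labels[i] != labels[k] or labels[i] == null:
                continue
            # The less confident face falls back to its second-best name.
            if scores[i][labels[i]] >= scores[k][labels[k]]:
                labels[k] = second_best(scores[k])
            else:
                labels[i] = second_best(scores[i])
    return labels


# Hypothetical confidences: both faces initially receive the name "A".
scores = [{"A": 0.9, "B": 0.3, "Null": -0.2},
          {"A": 0.6, "B": 0.5, "Null": -0.1}]
corrected = correct_annotations(scores, ["A", "A"])
```

As in the printed algorithm, this is a single pass, so a face reassigned to its second-best name may still collide with a third face in larger images.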
Experiments

The experiments include two parts. First, the dataset and performance metrics are introduced. Second, we investigate the effectiveness of each type of information in face annotation, and benchmark FAMI against several baseline approaches.
Dataset and Performance Metrics

The dataset used in our experiments is Label Yahoo! News [2]. Both faces and names have been detected, and faces are represented as 4992-dimensional feature vectors. Using principal component analysis (PCA), we reduce the dimension of the facial feature vectors to 300. Following [10], we retain the 214 names occurring at least 20 times in the captions and treat the others as the Null class. The images that do not contain any of the 214 names are removed. More details about the dataset are shown in Table 1. It should be noted that the item "Names" in Table 1 also counts names that are outside the 214 names but still appear in the captions. The ground-truth ratio is the percentage of faces whose ground-truth names are real names (rather than Null) over all the faces in the dataset. Three performance metrics are used in the experiments: accuracy, precision, and recall. The accuracy is the percentage of correctly annotated faces (including correctly annotated faces whose ground-truth name is Null) over all faces. The precision is the percentage of correctly annotated faces over the faces that are annotated with real names. The recall is the percentage of correctly annotated real faces over the real faces, i.e. those whose ground-truth names are not Null.
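The three metrics can be sketched directly from their definitions. The function name `evaluate`, the convention of returning 0.0 when a denominator is empty, and the example labels are assumptions made for illustration.

```python
def evaluate(predicted, truth, null="Null"):
    """Accuracy, precision and recall as defined for face annotation."""
    pairs = list(zip(predicted, truth))
    if not pairs:
        raise ValueError("at least one face is required")
    correct = sum(p == t for p, t in pairs)
    correct_real = sum(p == t and t != null for p, t in pairs)
    predicted_real = sum(p != null for p, _ in pairs)
    truth_real = sum(t != null for _, t in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Assumption: report 0.0 when no face was annotated with a real name.
        "precision": correct_real / predicted_real if predicted_real else 0.0,
        "recall": correct_real / truth_real if truth_real else 0.0,
    }


# Hypothetical annotations for four faces.
metrics = evaluate(["A", "B", "Null", "A"], ["A", "Null", "Null", "B"])
```

In this example two of four faces are correct, one of three real-name annotations is correct, and one of two real faces is recovered.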
Table 1. Details of the dataset.

Images   Faces   Name classes   Names   Ground-truth ratio
10126    15864   214            16947   0.56
Name Inferring Results

The experiments are performed over 5 different permutations, each randomly sampling 50% of the images and captions as the training set and using the rest for testing. The final performance is the average over the 5 permutations. During the experiments, the number of hidden neurons is set to 20 for NNR. To study the influence of a certain type of information on face annotation, a metric named contribution rate ($\mathit{Con}$) is defined in Eq. 10:

\[ \mathit{Con}(x_i) = \frac{M(X) - M(X_{\sim e}(x_i))}{\sum_{j=1}^{9} \big( M(X) - M(X_{\sim e}(x_j)) \big)} \tag{10} \]
where $x_i$ is the $i$-th type of information in $X(f, c)$, and $X_{\sim e}(x_i)$ represents $X(f, c)$ with $x_i$ excluded. $M$ is a certain performance metric (i.e. one of accuracy, precision, and recall). The performance of FAMI with different information inputs and the corresponding $\mathit{Con}$ are reported in Figure 2. According to Figure 2, $F_r$ makes the greatest contribution to the annotation result. This is easy to understand, because $F_r$ is helpful for face annotation in any image whereas other information may not be. For example, in an image containing real faces (not falsely detected results) whose ground-truth names are all Null, $F_r$ is the only useful information for annotating these faces with Null. Besides $F_r$, both $F_s$ and $F_g$ show outstanding contributions to face annotation. On the other hand, $N_{p2}$ degrades the performance on precision. We think this can be explained by the fact that names outside the selected 214 names are skipped when counting the appearance orders of names, which introduces considerable noise into $N_{p2}$. However, $N_{p2}$ improves the performance on recall, and the improvement is greater
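The contribution rate of Eq. 10 can be sketched as a normalized performance drop under ablation. The sketch assumes signed differences in the denominator, which is one reading of the printed equation, and the metric values in the example are made up for illustration and are not results from the paper.

```python
def contribution_rates(full_score, ablated_scores):
    """Share of the total performance drop caused by removing each input.

    `full_score` is M(X); `ablated_scores[name]` is M(X without that input).
    """
    drops = {name: full_score - score for name, score in ablated_scores.items()}
    total = sum(drops.values())
    if total == 0:
        raise ValueError("the summed performance drop is zero")
    return {name: drop / total for name, drop in drops.items()}


# Made-up metric values, used only to show the arithmetic.
rates = contribution_rates(0.8, {"Fr": 0.5, "Fs": 0.7})
```

An input whose removal improves the metric produces a negative rate, which is how a degrading feature would show up under this reading.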
Figure 2. The contribution rate of different information in FAMI.
Table 2. Performances of different approaches.

Approach   Accuracy   Precision   Recall
IPAL       0.6640     0.6158      0.8768
MMS        0.6366     0.8660      0.4207
CORD       0.6452     0.6217      0.8854
FAMI       0.8543     0.8503      0.8538
To further analyze the effectiveness of our framework, three baseline face annotation approaches are employed for comparative studies: MMS [3], CORD [4], and IPAL [5]. Each approach is configured with the parameters suggested in the original literature. We test each approach five times with the same test sets as FAMI, and the final performance is the average of each metric. The results are shown in Table 2. It can be observed that IPAL and CORD show better performance on recall than FAMI, but worse performance on accuracy and precision. This reflects that IPAL and CORD are not capable of annotating noise faces (faces whose ground-truth names are Null), which generally exist in news images. Compared with FAMI, MMS achieves comparable or better results on precision, but performs worse on the two other metrics, reflecting that MMS tends to label faces with Null.

In conclusion, multimodal information shows a favorable capability for inferring names for faces. Based on multimodal information fusion, FAMI is more robust to noise faces than the baseline approaches.
Conclusion and Future Work

In this paper, we have presented a novel framework for face annotation in the news image domain. To exploit the images and corresponding captions for face annotation as fully as possible, multiple types of information are extracted from news data and fused to infer names. In the experimental results, the proposed framework shows encouraging performance compared with several baseline approaches, indicating the effectiveness of multimodal information for face annotation. However, the information used in this work is selected and extracted in manually designed ways. With the huge amount of image-caption data available on the Internet, it may be more effective for a face annotation model to learn information selection and extraction automatically. With the recent advancements in the multimodal deep learning field, we plan to investigate how to incorporate recent deep learning techniques into face annotation.
Acknowledgement

This research was financially supported by the National Science Foundation.
References

[1] Zeng Z., Xiao S., Jia K., et al. Learning by Associating Ambiguously Labeled Images [C]// Computer Vision and Pattern Recognition. IEEE, 2013: 708-715.
[2] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.
[3] Luo J., Orabona F. Learning from Candidate Labeling Sets [C]// Nips Foundation-advances in Neural Information Processing Systems. Idiap, 2011: 1504-1512.
[4] C.-Z. Tang, M.-L. Zhang. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), San Francisco, CA, 2017, in press.
[5] Zhang M.L., Yu F. Solving the partial label learning problem: an instance-based approach [C]// International Conference on Artificial Intelligence. AAAI Press, 2015: 4048-4054.
[6] Yang J., Yan R., Hauptmann A.G. Multiple instance learning for labeling faces in broadcasting news video [C]// ACM International Conference on Multimedia, Singapore, November. DBLP, 2005: 31-40.
[7] Cover T., Hart P. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
[8] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.
[9] Frey B.J., Dueck D. Clustering by Passing Messages between Data Points [J]. Science, 2007, 315(5814): 972.
[10] Pang L., Ngo C.W. Unsupervised Celebrity Face Naming in Web Videos [J]. IEEE Transactions on Multimedia, 2015, 17(6): 1-1.
[11] Pham P.T., Moens M.F., Tuytelaars T. Cross-Media Alignment of Names and Faces [J]. IEEE Transactions on Multimedia, 2010, 12(1): 13-27.
[12]