Faces Annotation in News Images Based on Multimodal Information

2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018)
ISBN: 978-1-60595-065-5

Zha CHENG*, Gao CHAO and Yan-chuan WANG

National Digital Switching System Engineering Technological Research Center, Zhengzhou, Henan, China 450000

*Corresponding author

Keywords: Multimodal information, Face annotation, Information fusion.

Abstract. News images usually appear in the company of descriptive captions. In an image-caption pair where several faces contained in the image are associated with a few names in the corresponding caption, the task of face annotation is to infer the correct name for each face. In this work, a novel face annotation framework called face annotation based on multimodal information (FAMI) is proposed. Different from previous works that mainly rely on facial similarity, FAMI focuses on inferring names by fusing multimodal information extracted from images and captions. Specifically, we first extract multiple types of information that may contribute to annotating faces. After that, we attempt to annotate faces with an information fusion model. Finally, a correcting strategy is adopted to further improve the annotation results. As shown in our experiments, the proposed framework yields high-quality face annotation performance against several baseline approaches.

Introduction

News images usually appear with descriptive captions containing several names indicating who may be in the images. If faces in news images could be annotated automatically with the names contained in captions, it would benefit the development of several research fields such as cross-media information retrieval, public intelligence mining and so on.

To annotate faces in news images, existing methods infer the correct names for faces based on the following general constraints: 1) Non-redundancy constraint - in an image-caption pair, each detected face (including falsely detected faces) in an image can only be annotated by one of the names in the candidate name set or as Null, which indicates that the ground-truth name does not appear in the caption. 2) Uniqueness constraint - multiple faces of the same person cannot appear in an image, except for the Null class [1]. 3) Co-occurrence constraint - a face of a certain name is more likely to appear in images whose captions contain the name [2]. 4) Similarity constraint - two faces should belong to the same person if they are highly similar. Under these constraints, earlier works focus on exploiting face similarity information for face annotation. In [3], face annotation is treated as a candidate labelling set (CLS) problem. According to the maximum margin criterion, maximum margin set (MMS) learning is proposed to learn face classifiers from a candidate-labeled face set. In [4], considering that each candidate name should contribute differently to the learning process, an algorithm named confidence-rated discriminative partial label learning (CORD) based on boosting techniques is proposed. In CORD, the ground-truth confidence of each candidate name is estimated and utilized to facilitate the learning procedure. In [5], a partial label learning algorithm named instance-based partial label learning (IPAL) is proposed to solve the face annotation problem by affinity relationship analysis and an iterative label propagation procedure over faces.

Nevertheless, in news images, faces of the same subject may have different appearances because of variations in pose, illumination and expression, which reduces the reliability of face similarity information and causes unsatisfactory annotation results. In this work, we propose a novel face annotation framework referred to as face annotation based on multimodal information (FAMI), which focuses on utilizing multiple types of information extracted from images and captions rather than only face similarity information. Our framework is motivated by the observation that there is some hidden consistency between faces and their ground-truth names in the candidate name set, expressed by the following additional constraints: 5) Importance consistence constraint - a face and its ground-truth name are expected to have close importance. 6) Gender consistence constraint - a face and its ground-truth name should present the same gender bias. Under the additional constraints, more types of information can be introduced for face annotation in the proposed framework. As shown in Figure 1, the proposed framework contains three processing steps. First, multimodal information is extracted from images and captions, including the face recognition result ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), the number of faces in the image ($Num_f$), name position ($N_p$), name gender ($N_g$) and the number of names in the caption ($Num_c$). Second, a model is trained to fuse the extracted information to name faces. Third, according to the uniqueness constraint, the annotation results are further corrected by a proposed name correcting strategy.

Overall, the main contributions of this work can be summarized in three aspects. First, we propose two additional constraints on the consistency between faces and their ground-truth names. Under these constraints, multiple types of information besides face similarity can be used to infer names for faces. Second, up to nine types of information are extracted in reasonable ways and fused to annotate faces, which to the best of our knowledge is the most in the face annotation field. Third, we give a method to obtain an optimal model to fuse multimodal information and infer names.

Figure 1. The illustration of the FAMI framework.

Face Annotation Based on Multimodal Information

Problem Definition and Notation

Given a collection of image-caption pairs, faces and names are first detected from the collection. In an image-caption pair, we denote the face set as $F = \{f_1, f_2, \dots, f_p\}$, consisting of $p$ faces detected from the image. The names detected from the caption, together with an additional class Null, are denoted as $C = \{c_1, c_2, \dots, c_q, \mathrm{Null}\}$, which is the candidate name set for each face in the corresponding image. In contrast to the additional name Null, $C_r = \{c_1, c_2, \dots, c_q\}$ is regarded as the real candidate name set for each face. The ground-truth names and predictive names of $F$ are represented by $N = \{n_1, n_2, \dots, n_p\}$ and $Y = \{y_1, y_2, \dots, y_p\}$ respectively, where $n_i$ represents the ground-truth name of face $f_i$ and $y_i$ represents the corresponding predictive name obtained by an annotation model. Following the description in [3,6], an image-caption pair can be regarded as a bag containing multiple instances. Each instance represents a face-candidate name pair $\{f, c\}$ $(f \in F, c \in C)$, and the instance is labeled positive if $c$ is the ground-truth name of $f$, or negative otherwise. After multiple types of information are extracted, the instance $\{f, c\}$ can be represented as a vector $X(f,c)$. To infer the names of faces, a model $Z$ is sought that satisfies

$$Z(X(f,n)) \ge Z(X(f,c)) \tag{1}$$

where $n$ is the ground-truth name of $f$, $c \in C$, and the equality holds when $n = c$. In Eq. 1, $Z(X(f,c))$ can be interpreted as the confidence of $f$ being named by $c$. Consequently, the predictive name of $f$ can be inferred as follows:

$$y = \arg\max_{c \in C} Z(X(f,c)) \tag{2}$$
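The inference rule in Eq. 2 can be illustrated with a minimal Python sketch. The function name `infer_name`, the toy confidence table and the example names are illustrative assumptions, not part of the original work; any trained model $Z$ could be passed in as `score`.

```python
def infer_name(face, candidates, score):
    """Return the candidate name c that maximizes score(face, c), as in Eq. 2."""
    return max(candidates, key=lambda c: score(face, c))


# Hypothetical confidence values standing in for Z(X(f, c)).
toy_scores = {("f1", "Alice"): 0.9, ("f1", "Bob"): 0.2, ("f1", "Null"): 0.1}

predicted = infer_name("f1", ["Alice", "Bob", "Null"],
                       lambda f, c: toy_scores[(f, c)])
```

The candidate list includes Null, so a face whose true name is absent from the caption can still receive a valid label.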

Multimodal Information Extraction

In this work, nine types of information are extracted from image-caption pairs, including the face recognition result ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), name position ($N_p$), name gender ($N_g$), the number of faces in the image ($Num_f$) and the number of names in the caption ($Num_c$). Among these types of information, the extraction processes of the first seven are more complicated and are discussed in the following. Before the discussion, one point needs to be emphasized: we assume that the step of face and name detection has been finished.

Face Recognition Result

Although face similarity has unsatisfying reliability in news images, it can still offer important clues for name inferring. In this work, face similarity is exploited by a face recognizer based on a modified K-Nearest Neighbors algorithm (KNN) [7]. The main idea of the traditional face recognizer based on KNN is as follows: given a face $f$ and its real candidate name set $C_r = \{c_1, c_2, \dots, c_q\}$, and a training set $T = \{(f_1, n_1), (f_2, n_2), \dots, (f_t, n_t)\}$ consisting of multiple ground-truth faces of each candidate name, the distances between $f$ and each training sample can be calculated via the predefined distance metric. Denoting the set consisting of the top-$k$ samples nearest to $f$ as $N_k(f)$, the name of $f$ can be inferred by the majority voting strategy as follows:

$$y = \arg\max_{c_j \in C} \Big( \sum_{f_i \in N_k(f)} I(n_i = c_j) \Big) \tag{3}$$

where $I$ is the indicator function, $I(n_i = c_j) = 1$ when $n_i = c_j$ and $I(n_i = c_j) = 0$ otherwise.
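The majority vote of Eq. 3 can be sketched as follows. This is a minimal illustration under stated assumptions: faces are plain numeric tuples, the distance is Euclidean, and the function name `knn_vote` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_vote(face, training_set, candidates, k=3):
    """Majority vote among the k training samples nearest to `face` (Eq. 3).

    training_set is a list of (feature_vector, name) pairs.
    Ties are broken in favor of the candidate listed first.
    """
    nearest = sorted(training_set, key=lambda s: math.dist(face, s[0]))[:k]
    votes = Counter(name for _, name in nearest)
    return max(candidates, key=lambda c: votes[c])


# Hypothetical two-dimensional face features.
training = [((0, 0), "Alice"), ((0, 1), "Alice"),
            ((5, 5), "Bob"), ((5, 6), "Bob"), ((1, 0), "Bob")]
name = knn_vote((0.2, 0.1), training, ["Alice", "Bob"], k=3)
```

In this toy set the three nearest samples carry two votes for Alice and one for Bob, so the vote returns Alice.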

However, when a face recognizer based on KNN is used to annotate faces in news images, the first difficulty is that it is not easy to find sufficient ground-truth faces for each candidate name to build the training set. To address this problem, we attempt to obtain the training set from the given image-caption pairs on the basis of an assumption proposed in [8]: in the related face set of a candidate name $c$ (i.e. the set consisting of faces detected from all images associated with candidate name $c$), the ground-truth faces of $c$ occupy the majority, and faces of any other person appear just a few times. Accordingly, after a clustering process conducted on the related face set of $c$, the faces in the biggest cluster can be collected as the training samples of $c$. In this paper, the affinity propagation clustering algorithm (AP clustering algorithm) [9], which does not require the number of hidden clusters as a necessary input parameter, is introduced to cluster faces.
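Once cluster labels are available, selecting the training samples of a name reduces to keeping the biggest cluster. The sketch below assumes the labels have already been produced by some clustering routine (affinity propagation in the paper); the function name `largest_cluster_samples` and the toy inputs are hypothetical.

```python
from collections import Counter


def largest_cluster_samples(faces, labels):
    """Keep only the faces that fall in the most populated cluster.

    faces and labels are parallel lists; labels[i] is the cluster id of faces[i].
    """
    biggest, _ = Counter(labels).most_common(1)[0]
    return [face for face, label in zip(faces, labels) if label == biggest]


# Hypothetical related face set of one candidate name with its cluster ids.
related_faces = ["a", "b", "c", "d", "e"]
cluster_ids = [0, 1, 0, 0, 2]
samples = largest_cluster_samples(related_faces, cluster_ids)
```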

After the training set is obtained, the name of $f$ can be predicted by Eq. 3. However, in order to get better annotation performance, the recognition results are expected to contain the prediction probability $F_r(f,c)$. Therefore Eq. 3 is modified to output the recognition probability as follows:

$$F_r(f,c) = 2\arctan\Big( \sum_{f_i \in N_k(f)} (1 - d(f_i, f)) \cdot w_i \cdot I(n_i = c) \Big) / \pi \tag{4}$$

where $d(f_i, f)$ is the Euclidean distance between training sample $f_i$ and the face $f$. $w_i$ represents the weight of each distance and is defined as $1/d(f_i, f)$, so that the samples closer to $f$ have more influence on the probability. In order to constrain $F_r(f,c)$ to $[0,1]$, $2\arctan(\dots)/\pi$ is used to apply a nonlinear transformation to the value in the bracket. As for the case $c = \mathrm{Null}$, we follow the idea in [10] and model it as a problem of information uncertainty. The uncertainty reaches its highest value when the probabilities are uniformly distributed. Conversely, when the probability of $f$ being recognized as one name is noticeably higher than for the other names, the uncertainty becomes lower. Therefore, $F_r(f, \mathrm{Null})$ is defined as:

$$F_r(f, \mathrm{Null}) = -\frac{\sum_{c \in C_r} F_r(f,c) \log_2 F_r(f,c)}{\log_2 q} - 1 \tag{5}$$

where $q$ is the number of names in the real candidate name set $C_r$ of $f$. The first term is the normalized entropy, and the minus one is intended to make $F_r(f, \mathrm{Null})$ negative so that it can be distinguished from $\{F_r(f,c), c \in C_r\}$. As $F_r(f, \mathrm{Null})$ reflects the distribution of $\{F_r(f,c), c \in C_r\}$, it may facilitate face annotation, so we also introduce $F_r(f, \mathrm{Null})$ into $X(f,c)$ as additional information.
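The Null score of Eq. 5 is small enough to check by hand. The following sketch computes the normalized entropy of the recognition probabilities and subtracts one; the function name `null_recognition_score` and the fallback for a single candidate (where $\log_2 q = 0$) are assumptions added for illustration, since the text does not discuss that case.

```python
import math


def null_recognition_score(probs):
    """Normalized entropy of the recognition probabilities minus one (Eq. 5).

    probs holds F_r(f, c) for every real candidate name c.
    """
    q = len(probs)
    if q < 2:
        # log2(1) = 0 would divide by zero; this fallback is an assumption.
        return -1.0
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(q) - 1.0


uniform = null_recognition_score([0.5, 0.5])   # maximal uncertainty
peaked = null_recognition_score([1.0, 0.0])    # one name dominates
```

With a uniform distribution the score reaches its upper bound of 0, and with a single dominant name it drops to -1, matching the behavior described in the text.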

Face Size

In a news image, face sizes are considered to be related to the importance of faces. Generally, the importance of a face appears to be positively related to its size. In [11], $F_s(f)$ is defined as the percentage of the bounding box area of $f$ over the total area of all faces in the image. Nevertheless, this definition loses the information of how 'prominent' a face size is compared with the other faces in the same image. In order to include this information as well, we define $F_s(f)$ as follows:

$$F_s(f) = F_a(f) + \sum_{f_i \in F} \big( F_a(f) - F_a(f_i) \big) \tag{6}$$

In Eq. 6, $F$ is the face set in the image, and $F_a$ is the normalized face area as defined in [11].
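A small numeric sketch may help to see what the 'prominence' term adds. It assumes the reading of Eq. 6 in which the normalized area is added to the sum of its differences from every face in the image (the operators are damaged in the source, so this reading is an assumption); the helper names `normalize` and `prominence` are hypothetical.

```python
def normalize(raw_values):
    """Divide each raw value by the total over all faces in the image."""
    total = sum(raw_values)
    return [v / total for v in raw_values]


def prominence(values, index):
    """Value of one face plus the sum of its differences from all faces."""
    v = values[index]
    return v + sum(v - other for other in values)


# Hypothetical bounding box areas (in pixels) of three faces in one image.
areas = normalize([60, 30, 10])
largest = prominence(areas, 0)
smallest = prominence(areas, 2)
```

Under this reading, the largest face scores about 1.4 and the smallest about -0.6, so the feature separates faces more strongly than the normalized area alone. The same helper would apply to the normalized distance and definition values used for face position and face definition.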

Face Position

When photographers take pictures of someone, they tend to place the person's face in the center of the image. Therefore, compared to the faces in the corners, we consider the faces appearing in the central part of images to be more important. To extract the face position information for $f$, we first calculate the Euclidean distance $F_{rp}(f)$ from the center of the bounding box of $f$ to the center of the image. Then $F_{rp}(f)$ is normalized to $F_{np}(f)$ by dividing it by the total distance of all faces in the image. After that, as for $F_s$, the information of how 'prominent' a distance is compared with the others is introduced as follows:

$$F_p(f) = F_{np}(f) + \sum_{f_i \in F} \big( F_{np}(f) - F_{np}(f_i) \big) \tag{7}$$

where $F_{np}(f) = F_{rp}(f) / \sum_{f_i \in F} F_{rp}(f_i)$, and $F$ is the face set in the image.

Face Definition

When a photographer takes a picture, the camera tends to be focused on the faces of important persons. This makes the important faces in images clearer than other faces. To evaluate the definition of a face $f$, we first use the point sharpness algorithm to obtain the raw definition $F_{rd}(f)$. Then, we normalize $F_{rd}(f)$ to $F_{nd}(f)$ and introduce the 'prominent' information as follows:

$$F_d(f) = F_{nd}(f) + \sum_{f_i \in F} \big( F_{nd}(f) - F_{nd}(f_i) \big) \tag{8}$$

where $F_{nd}(f) = F_{rd}(f) / \sum_{f_i \in F} F_{rd}(f_i)$.

Name Position

The work in [12] has shown that names closer to the beginning of a caption are more likely to be pictured in the corresponding image. Inspired by this, we believe that the name positions in captions can reflect the importance of the names. In [12], the name position is defined as follows:

$$N_p(c) = \frac{L(c)}{L(\mathrm{caption})} \tag{9}$$

where $L(c)$ is the length of the caption from the beginning to the location of $c$, and $L(\mathrm{caption})$ is the length of the whole caption. However, under this definition, $N_p(c)$ cannot reflect the order in which $c$ appears in the caption relative to other candidate names. Therefore, we adopt two types of name position definition simultaneously. The first definition $N_{p1}(c)$ is the same as Eq. 9, and the second definition $N_{p2}(c)$ reflects the order in which $c$ appears in the caption relative to the other candidate names. As for the additional name Null, as it would not appear in any caption, both $N_{p1}(\mathrm{Null})$ and $N_{p2}(\mathrm{Null})$ should be set to infinity. However, infinity is not suitable as an input to most learning algorithms, thus we set both $N_{p1}(\mathrm{Null})$ and $N_{p2}(\mathrm{Null})$ to 20, which is much larger than the values of real names.
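The first name position definition (Eq. 9) can be sketched in a few lines. The sketch assumes that lengths are measured in characters, which the text does not specify, and the function name `name_position` and the example caption are hypothetical. The value 20 for Null is taken from the text.

```python
NULL_POSITION = 20.0  # value assigned to Null in the text


def name_position(caption, name):
    """Relative offset of `name` in `caption` (Eq. 9), measured in characters.

    Measuring in characters is an assumption; Null gets the fixed value 20.
    """
    if name == "Null":
        return NULL_POSITION
    return caption.index(name) / len(caption)


caption = "Alice meets Bob"
first = name_position(caption, "Alice")
second = name_position(caption, "Bob")
absent = name_position(caption, "Null")
```

Real names always fall in the range from 0 to 1, so the fixed value 20 keeps Null clearly separated from them.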

Name Gender

We first build a name dictionary which includes approximately 6000 names labeled with gender information, which are collected from a number of countries or cultures and cover all the names used in our experiments. $N_g(c)$ is set to 1 when $c$ is a male name and to 0 otherwise. Besides, in order to determine the genders of faces, we use almost 7000 labeled faces to train a face gender classifier based on a support vector machine (SVM). The training faces cover all human races, and the faces of each gender make up half of the face set. $F_g(f)$ is defined as the output probability of the classifier, indicating the confidence of $f$ being a male face. As for the additional name Null, its gender is set to -1 in order to distinguish it from real candidate names.

Name Inferring and Annotation Results Correcting

After the information extraction procedure, the instance $\{f, c\}$ can be represented as $X(f,c) = \{F_r(f,c), F_r(f,\mathrm{Null}), F_s(f), F_p(f), F_d(f), F_g(f), Num_f, N_{p1}(c), N_{p2}(c), N_g(c), Num_c\}$. The next step is inferring the name of $f$. According to Eq. 2, the core of this process is building the model $Z: X(f,c) \to z, z \in R$ following the criterion in Eq. 1. In this work, setting the targets to 1 for positive instances and 0 for negative instances, we attempt to learn the model $Z$ by regression techniques. In this paper, Neural Net Regression (NNR), which has favorable nonlinear mapping performance and high parallel information processing capability, is utilized to build $Z$. Specifically, a two-layer feedforward network with sigmoid transfer functions in both the hidden and output layers is applied to fit the targets of instances.
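The shape of the scoring model can be sketched as a plain forward pass. This is only an illustration of a two-layer sigmoid network: training is omitted, the weights are untrained placeholders, and the function names are hypothetical. The 11 inputs correspond to the entries of $X(f,c)$ listed above, and 20 hidden units is the setting reported in the experiments.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def confidence(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> sigmoid hidden layer -> single sigmoid output."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Untrained placeholder weights: 11 input features, 20 hidden units.
n_in, n_hidden = 11, 20
w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [0.0] * n_hidden
b_out = 0.0

z = confidence([0.5] * n_in, w_hidden, b_hidden, w_out, b_out)
```

Because the output unit is a sigmoid, the score always lies strictly between 0 and 1, which suits regression targets of 0 and 1.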

So far, five of the six constraints mentioned in the introduction have been used in the proposed framework; the exception is the uniqueness constraint. As a result, multiple faces in an image may be annotated with the same name. To avoid this situation, a correcting strategy is adopted as follows:

Algorithm 1 The Correcting Strategy

Input: $\{Z(X(f_i, c_j)), 1 \le i \le p, 1 \le j \le q+1\}$, $Y_r$ (the raw annotation results)
Output: $Y$ (the final annotation results)

For $i$ = 1 to $p$ do
    For $k$ = 1 to $p$ do
        If ($y_i = y_k$) and ($i \ne k$) then
            If $Z(X(f_i, y_i)) \ge Z(X(f_k, y_k))$ then
                $y_i = y_i$
                $y_k = \mathrm{argsecmax}_{c_j \in C} Z(X(f_k, c_j))$
            else
                $y_k = y_k$
                $y_i = \mathrm{argsecmax}_{c_j \in C} Z(X(f_i, c_j))$
            End if
        End if
    End for
End for

where argsecmax returns the candidate name with the second-largest value of $Z$.
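A direct Python rendering of the correcting strategy is sketched below. It follows the loop structure of Algorithm 1, with one labeled addition: faces annotated as Null are skipped, an assumption drawn from the exception stated in the uniqueness constraint rather than from the listing itself. The names `second_best` and `correct_annotations` and the toy scores are hypothetical.

```python
def second_best(score_row):
    """Candidate with the second-largest score (the argsecmax of Algorithm 1)."""
    ranked = sorted(score_row, key=score_row.get, reverse=True)
    return ranked[1]


def correct_annotations(scores, raw, null_label="Null"):
    """Resolve duplicate names within one image.

    scores[i] maps each candidate name to Z(X(f_i, c)); raw[i] is the raw label.
    Skipping null_label is an assumption based on the uniqueness constraint.
    """
    y = list(raw)
    for i in range(len(y)):
        for k in range(len(y)):
            if i != k and y[i] == y[k] and y[i] != null_label:
                if scores[i][y[i]] >= scores[k][y[k]]:
                    y[k] = second_best(scores[k])  # face k loses the name
                else:
                    y[i] = second_best(scores[i])  # face i loses the name
    return y


scores = [{"Alice": 0.9, "Bob": 0.3, "Null": 0.1},
          {"Alice": 0.7, "Bob": 0.6, "Null": 0.2}]
final = correct_annotations(scores, ["Alice", "Alice"])
```

In the toy input both faces are first labeled Alice; the face with the lower confidence falls back to its second-best candidate, Bob.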

Experiments

The experiments include two parts. First, the dataset and performance metrics are introduced. Second, we investigate the effectiveness of each type of information in face annotation, and benchmark FAMI against several baseline approaches.

Dataset and Performance Metrics

The dataset used in our experiments is Labeled Yahoo! News [2]. Both faces and names have been detected, and faces have been represented as 4992-dimensional feature vectors. By principal component analysis (PCA), we reduce the dimension of the facial feature vectors to 300. Following [10], we retain the 214 names occurring at least 20 times in the captions and treat the others as the Null class. The images that do not contain any of the 214 names are removed. More details about the dataset are shown in Table 1. It should be noted that the item "Names" in Table 1 also includes the names outside the 214 retained names that still appear in the captions. The ground-truth ratio represents the percentage of faces whose ground-truth names are real names (rather than Null) over all the faces in the dataset. Three performance metrics are utilized in the experiments: accuracy, precision, and recall. The accuracy is the percentage of correctly annotated faces (including the correctly annotated faces whose ground-truth name is Null) over all faces, the precision is the percentage of correctly annotated faces over the faces which are annotated with real names, and the recall is the percentage of correctly annotated real faces over the faces whose ground-truth names are not Null.
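The three metrics can be written down directly from these definitions. The sketch below is an illustration only: the function name `annotation_metrics`, the zero fallback for empty denominators and the toy labels are assumptions.

```python
def annotation_metrics(truth, predicted, null_label="Null"):
    """Accuracy, precision and recall as defined in the text."""
    pairs = list(zip(truth, predicted))
    # Faces annotated with a real name (denominator of precision).
    named = [(t, y) for t, y in pairs if y != null_label]
    # Faces whose ground-truth name is a real name (denominator of recall).
    real = [(t, y) for t, y in pairs if t != null_label]

    accuracy = sum(t == y for t, y in pairs) / len(pairs)
    precision = sum(t == y for t, y in named) / len(named) if named else 0.0
    recall = sum(t == y for t, y in real) / len(real) if real else 0.0
    return accuracy, precision, recall


truth = ["Alice", "Bob", "Null", "Null"]
predicted = ["Alice", "Null", "Null", "Bob"]
metrics = annotation_metrics(truth, predicted)
```

Note that a correctly predicted Null counts toward accuracy but enters neither precision nor recall, which is why a method that rarely predicts Null can reach high recall with low accuracy.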

Table 1. Details of the dataset.

Images   Faces   Name classes   Names   Ground-truth ratio
10126    15864   214            16947   0.56

Name Inferring Results

The experiments are performed over 5 different permutations, randomly sampling 50% of the images and captions as the training set and using the rest for testing. The final performance is the average of the performance over the 5 permutations. During the experiments, the number of hidden neurons is set to 20 for NNR. In order to study the influence of a certain type of information on face annotation, a metric named contribution rate ($Con$) is defined in Eq. 10:

$$Con(x_i) = \frac{M(X) - M(X_{\sim e}(x_i))}{\sum_{j=1}^{9} \big( M(X) - M(X_{\sim e}(x_j)) \big)} \tag{10}$$

where $x_i$ is the $i$-th type of information in $X(f,c)$, and $X_{\sim e}(x_i)$ represents $X(f,c)$ excluding $x_i$. $M$ is a certain performance metric (i.e. one of accuracy, precision, and recall). The performance of FAMI with different information inputs and the corresponding $Con$ are reported in Figure 2. According to Figure 2, $F_r$ makes the greatest contribution to the annotation result. This is easy to understand because $F_r$ is helpful for face annotation in any image, whereas other information may not be. For example, in an image containing real faces (not falsely detected results) whose ground-truth names are all Null, $F_r$ is the only useful information to annotate these faces with Null. Besides $F_r$, both $F_s$ and $F_g$ show outstanding contributions to face annotation. On the other hand, $N_{p2}$ degrades the performance on precision. We think this can be explained by the fact that names outside the selected 214 names are skipped when counting the appearance orders of names, which introduces considerable noise into $N_{p2}$. However, $N_{p2}$ improves the performance on recall, and the improvement is greater …
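The contribution rate of Eq. 10 is a normalized performance drop, which the following sketch makes concrete. The function name `contribution_rates` and the toy scores are hypothetical; the code assumes the total drop is non-zero.

```python
def contribution_rates(full_score, ablated_scores):
    """Share of the total performance drop caused by removing each input (Eq. 10).

    full_score is M(X); ablated_scores[i] is M(X without x_i).
    Assumes the drops do not sum to zero.
    """
    drops = [full_score - s for s in ablated_scores]
    total = sum(drops)
    return [d / total for d in drops]


# Hypothetical accuracies: full model, then the model without each of three inputs.
rates = contribution_rates(0.85, [0.65, 0.80, 0.80])
```

The rates sum to one by construction, and an input whose removal improves the metric receives a negative rate, as the text reports for $N_{p2}$ on precision.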

Figure 2. The contribution rate of different information in FAMI.

Table 2. Performances of different approaches.

Approach   Accuracy   Precision   Recall
IPAL       0.6640     0.6158      0.8768
MMS        0.6366     0.8660      0.4207
CORD       0.6452     0.6217      0.8854
FAMI       0.8543     0.8503      0.8538

In order to further analyze the effectiveness of our framework, three baseline face annotation approaches are employed for comparative studies: MMS [3], CORD [4], and IPAL [5]. Each approach is configured with the parameters suggested in the original literature. We test each approach five times with the same test sets as FAMI, and the final performance is the average of the values of each metric. The results are shown in Table 2. It can be observed that IPAL and CORD show better performance on recall than FAMI, but worse performance on accuracy and precision. This reflects that IPAL and CORD are not capable of annotating noise faces (faces whose ground-truth names are Null), which generally exist in news images. Compared with FAMI, MMS achieves comparable or better results on precision, but performs worse on the two other metrics, reflecting that MMS tends to label faces with Null.

In conclusion, multimodal information shows a favorable capability of inferring names for faces. Based on multimodal information fusion, FAMI is more robust to noise faces than the baseline approaches.

Conclusion and Future Work

In this paper, we have presented a novel framework for face annotation in the news image domain. In order to exploit the images and corresponding captions for face annotation as fully as possible, multiple types of information are extracted from news data and fused to infer names. In the experimental results, the proposed framework shows encouraging performance against several baseline approaches, indicating the effectiveness of multimodal information for face annotation. However, the information used in this work is selected and extracted in manually designed ways. With the huge amount of image-caption data available on the Internet, it may be more effective for a face annotation model to learn information selection and extraction automatically. With the recent advancements in the multimodal deep learning field, we plan to investigate how to incorporate recent deep learning techniques into face annotation.

Acknowledgement

This research was financially supported by the National Science Foundation.

References

[1] Zeng Z., Xiao S., Jia K., et al. Learning by Associating Ambiguously Labeled Images [C]// Computer Vision and Pattern Recognition. IEEE, 2013: 708-715.

[2] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.

[3] Luo J., Orabona F. Learning from Candidate Labeling Sets [C]// Nips Foundation-advances in Neural Information Processing Systems. Idiap, 2011: 1504-1512.

[4] C.-Z. Tang, M.-L. Zhang. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), San Francisco, CA, 2017, in press.

[5] Zhang M.L., Yu F. Solving the partial label learning problem: an instance-based approach [C]// International Conference on Artificial Intelligence. AAAI Press, 2015: 4048-4054.

[6] Yang J., Yan R., Hauptmann A.G. Multiple instance learning for labeling faces in broadcasting news video [C]// ACM International Conference on Multimedia, Singapore, November. DBLP, 2005: 31-40.

[7] Cover T., Hart P. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.

[8] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.

[9] Frey B.J., Dueck D. Clustering by Passing Messages between Data Points [J]. Science, 2007, 315(5814): 972.

[10] Pang L., Ngo C.W. Unsupervised Celebrity Face Naming in Web Videos [J]. IEEE Transactions on Multimedia, 2015, 17(6): 1-1.

[11] Pham P.T., Moens M.F., Tuytelaars T. Cross-Media Alignment of Names and Faces [J]. IEEE Transactions on Multimedia, 2010, 12(1): 13-27.

[12]