
Faces Annotation in News Images Based on Multimodal Information


2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018)
ISBN: 978-1-60595-065-5


Zah CHENG*, Gao CHAO and Yan-chuan WANG

National Digital Switching System Engineering Technological Research Center, Zhengzhou, Henan, China 450000

*Corresponding author

Keywords: Multimodal information, Face annotation, Information fusion.

Abstract. News images usually appear together with descriptive captions. In an image-caption pair where several faces contained in the image are associated with a few names in the corresponding caption, the task of face annotation is to infer the correct name for each face. In this work, a novel face annotation framework called face annotation based on multimodal information (FAMI) is proposed. Different from previous works that mainly rely on facial similarity, FAMI focuses on inferring names by fusing multimodal information extracted from images and captions. Specifically, we first extract multiple types of information that may contribute to annotating faces. After that, we annotate faces with an information fusion model. Finally, a correcting strategy is adopted to further improve the annotation results. As shown in our experiments, the proposed framework yields high-quality face annotation performance against several baseline approaches.

Introduction

News images usually appear with descriptive captions containing several names that indicate who may be in the images. If faces in news images could be annotated automatically by the names contained in the captions, it would benefit the development of several research fields such as cross-media information retrieval, public intelligence mining and so on.

To annotate faces in news images, existing methods infer the correct names for faces based on the following general constraints: 1) Non-redundancy constraint: in an image-caption pair, each detected face (including falsely detected faces) in an image can only be annotated by one of the names in the candidate name set or as Null, which indicates that the ground-truth name does not appear in the caption. 2) Uniqueness constraint: multiple faces of the same person cannot appear in an image, except for the Null class [1]. 3) Co-occurrence constraint: a face of a certain name is more likely to appear in images whose captions contain the name [2]. 4) Similarity constraint: two faces should belong to the same person if they are highly similar. Under these constraints, earlier works focus on exploiting face similarity information for face annotation. In [3], face annotation is treated as a candidate labelling set (CLS) problem. According to the maximum margin criterion, maximum margin set (MMS) learning is proposed to learn face classifiers from candidate-labeled face sets. In [4], considering that each candidate name should contribute differently to the learning process, an algorithm named confidence-rated discriminative partial label learning (CORD) based on boosting techniques is proposed. In CORD, the ground-truth confidence of each candidate name is estimated and utilized to facilitate the learning procedure. In [5], a partial label learning algorithm named instance-based partial label learning (IPAL) is proposed to solve the face annotation problem by affinity relationship analysis and an iterative label propagation procedure over faces.

Nevertheless, in news images, faces of the same subject may have different appearances because of variations in pose, illumination and expression, which reduces the reliability of face similarity information and causes unsatisfactory annotation results. In this work, we propose a novel face annotation framework referred to as face annotation based on multimodal information (FAMI), which focuses on utilizing multiple types of information extracted from images and captions rather than only face similarity information. Our framework is motivated by the observation that there are some hidden consistencies between faces and their ground-truth names in candidate name sets, which lead to the following additional constraints: 5) Importance consistence constraint: a face and its ground-truth name are expected to have similar importance. 6) Gender consistence constraint: a face and its ground-truth name should present the same gender bias. Under these additional constraints, more types of information can be introduced for face annotation in the proposed framework. As shown in Figure 1, the proposed framework contains three processing steps. First, multimodal information is extracted from images and captions, including face recognition results ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), face number in the image ($Num_f$), name position ($N_p$), name gender ($N_g$) and name number in the caption ($Num_c$). Second, a model is trained to fuse the extracted information to name faces. Third, according to the uniqueness constraint, the annotation results are further corrected by a proposed name correcting strategy.

Overall, the main contributions of this work can be summarized in three aspects. First, we propose two additional constraints about the consistency between faces and their ground-truth names. Under these constraints, multiple types of information besides face similarity can be used to infer names for faces. Second, up to nine types of information are extracted in reasonable ways and fused to annotate faces, which, to the best of our knowledge, is the most in the face annotation field. Third, we give a method to obtain an optimal model that fuses multimodal information and infers names.

Figure 1. Illustration of the FAMI framework.

Face Annotation Based on Multimodal Information

Problem Definition and Notation

Given a collection of image-caption pairs, faces and names are first detected from the collection. In an image-caption pair, we denote the face set as $F = \{f_1, f_2, \ldots, f_p\}$, consisting of the $p$ faces detected from the image. The names detected from the caption, together with an additional class Null, are denoted as $C = \{c_1, c_2, \ldots, c_q, Null\}$, which is the candidate name set for each face in the corresponding image. Compared with $C$, which contains the additional name Null, $C_r = \{c_1, c_2, \ldots, c_q\}$ is regarded as the real candidate name set for each face. The ground-truth names and predictive names of $F$ are represented by $N = \{n_1, n_2, \ldots, n_p\}$ and $Y = \{y_1, y_2, \ldots, y_p\}$ respectively, where $n_i$ represents the ground-truth name of face $f_i$ and $y_i$ represents the corresponding predictive name obtained by an annotation model. Following the description in [3,6], an image-caption pair can be regarded as a bag containing multiple instances. Each instance represents a face-candidate name pair $\{f, c\}$ with $f \in F$ and $c \in C$, and the instance is labeled as positive if $c$ is the ground-truth name of $f$, or as negative otherwise. After multiple types of information are extracted, the instance $\{f, c\}$ can be represented as a vector $X(f,c)$. To infer the names of faces, we seek a model $Z$ that satisfies

$$Z(X(f,n)) \ge Z(X(f,c)) \qquad (1)$$

where $n$ is the ground-truth name of $f$, $c \in C$, and equality holds when $n = c$. In Eq.1, $Z(X(f,c))$ can be interpreted as the confidence of $f$ being named $c$. Consequently, the predictive name of $f$ can be inferred as follows:

$$y = \arg\max_{c \in C} Z(X(f,c)) \qquad (2)$$
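As a minimal illustration of Eq.2, the sketch below picks, for one face, the candidate whose instance vector receives the highest score. It assumes the vectors $X(f,c)$ of a single face are already assembled in a dictionary keyed by candidate name, with `None` standing for Null; `infer_name` and the toy scorer are hypothetical, and any trained model $Z$ can be passed in as a callable.

```python
from typing import Callable, Dict, Optional

import numpy as np

Name = Optional[str]  # None stands for the Null class


def infer_name(features: Dict[Name, np.ndarray],
               Z: Callable[[np.ndarray], float]) -> Name:
    """Eq.2: return the candidate c in C (Null included) maximizing Z(X(f, c))."""
    return max(features, key=lambda c: Z(features[c]))


# Toy usage with a stand-in scorer; a real Z would be the trained fusion model.
example = {"A": np.array([0.2, 0.1]), "B": np.array([0.9, 0.3]), None: np.array([0.1, 0.0])}
print(infer_name(example, lambda x: float(x.sum())))
```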

Multimodal Information Extraction

In this work, nine types of information are extracted from image-caption pairs, including face recognition result ($F_r$), face size ($F_s$), face position ($F_p$), face definition ($F_d$), face gender ($F_g$), name position ($N_p$), name gender ($N_g$), the number of faces in the image ($Num_f$) and the number of names in the caption ($Num_c$). Among these types of information, the extraction processes of the first seven are more complicated and are discussed in the following. Before the discussion, one point needs to be emphasized: we assume that face and name detection has already been finished.

Face Recognition Result

Although face similarity has unsatisfying reliability in news images, it can still offer important clues for name inference. In this work, face similarity is exploited by a face recognizer based on a modified K-Nearest Neighbors algorithm (KNN) [7]. The main idea of the traditional KNN-based face recognizer is as follows: given a face $f$, its real candidate name set $C_r = \{c_1, c_2, \ldots, c_q\}$, and a training set $T = \{(f_1, n_1), (f_2, n_2), \ldots, (f_t, n_t)\}$ consisting of multiple ground-truth faces for each candidate name, the distances between $f$ and each training sample can be calculated via a predefined distance metric. Denoting the set of the top-$k$ samples nearest to $f$ as $N_k(f)$, the name of $f$ can be inferred by the majority voting strategy as follows:

$$y = \arg\max_{c_j \in C_r} \sum_{f_i \in N_k(f)} I(n_i = c_j) \qquad (3)$$

where $I$ is the indicator function, with $I(n_i = c_j) = 1$ when $n_i = c_j$ and $I(n_i = c_j) = 0$ otherwise.

However, when a KNN-based face recognizer is used to annotate faces in news images, the first difficulty is that it is not easy to find sufficient ground-truth faces for each candidate name to build the training set. To address this problem, we attempt to obtain the training set from the given image-caption pairs on the basis of an assumption proposed in [8]: in the related face set of candidate name $c$ (i.e. the set of faces detected from all images associated with $c$), the ground-truth faces of $c$ form the majority, and faces of any other person appear only a few times. Accordingly, after a clustering process is conducted on the related face set of $c$, the faces in the biggest cluster can be collected as the training samples of $c$. In this paper, the affinity propagation clustering algorithm (AP clustering algorithm) [9], which does not require the number of hidden clusters as an input parameter, is introduced to cluster faces.

After the training set is obtained, the name of $f$ can be predicted by Eq.3. However, to obtain better annotation performance, the recognition results are expected to contain the prediction probability $F_r(f,c)$. Therefore, Eq.3 is modified to output the recognition probability as follows:

$$F_r(f,c) = 2\arctan\Big(\sum_{f_i \in N_k(f)} \big(1 - d(f_i,f)\big) \cdot w_i \cdot I(n_i = c)\Big) / \pi \qquad (4)$$

where $d(f_i,f)$ is the Euclidean distance between training sample $f_i$ and face $f$, and $w_i$ is the weight of each distance, defined as $1/d(f_i,f)$, so that samples closer to $f$ have more influence on the probability. To constrain $F_r(f,c)$ to $[0,1]$, $2\arctan(\cdot)/\pi$ is used as a nonlinear transformation of the value in brackets. As for the case $c = Null$, we follow the idea in [10] and model it as a problem of information uncertainty. The uncertainty is highest when the probabilities are uniformly distributed. Conversely, when the probability of $f$ being recognized as one name is noticeably higher than for the other names, the uncertainty becomes lower. Therefore, $F_r(f,Null)$ is defined as:

$$F_r(f,Null) = -\frac{\sum_{c \in C_r} F_r(f,c)\log_2 F_r(f,c)}{\log_2 q} - 1 \qquad (5)$$

where $q$ is the number of names in the real candidate name set $C_r$ of $f$. The first term is the normalized entropy, and the minus one is intended to make $F_r(f,Null)$ negative so that it can be distinguished from $F_r(f,c)$, $c \in C_r$. As $F_r(f,Null)$ reflects the distribution of $\{F_r(f,c) \mid c \in C_r\}$, it may facilitate face annotation, so we also introduce $F_r(f,Null)$ into $X(f,c)$ as additional information.
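The following sketch implements the majority vote of Eq.3, the distance-weighted probability of Eq.4 (reading its bracketed term as $(1 - d(f_i,f)) \cdot w_i \cdot I(n_i = c)$ with $w_i = 1/d(f_i,f)$), and the entropy-based Null score of Eq.5. Function names are hypothetical and features are assumed to be NumPy vectors compared by Euclidean distance. Eq.4 only stays within [0,1] when the nearest distances do not exceed 1, so the sketch assumes features scaled accordingly; the value -1 for a single real candidate (where $\log_2 q = 0$) is also an assumption.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

import numpy as np


def knn_vote(f: np.ndarray, train_faces: np.ndarray, train_names: Sequence[Hashable],
             candidates: List[Hashable], k: int) -> Hashable:
    """Eq.3: majority vote of the k nearest training faces, restricted to C_r."""
    dist = np.linalg.norm(train_faces - f, axis=1)
    nearest = np.argsort(dist)[:k]
    votes = Counter(train_names[i] for i in nearest)
    return max(candidates, key=lambda c: votes[c])  # Counter returns 0 for absent names


def recognition_probs(f: np.ndarray, train_faces: np.ndarray, train_names: Sequence[Hashable],
                      candidates: List[Hashable], k: int, eps: float = 1e-12) -> Dict[Hashable, float]:
    """Eq.4: F_r(f,c) = 2*arctan(sum_i (1 - d_i) * w_i * I(n_i = c)) / pi, with w_i = 1/d_i."""
    dist = np.linalg.norm(train_faces - f, axis=1)
    nearest = np.argsort(dist)[:k]
    probs = {}
    for c in candidates:
        s = sum((1.0 - dist[i]) / max(dist[i], eps)  # (1 - d_i) * w_i; eps avoids 1/0
                for i in nearest if train_names[i] == c)
        probs[c] = 2.0 * math.atan(s) / math.pi
    return probs


def null_score(probs: Dict[Hashable, float]) -> float:
    """Eq.5: normalized entropy of {F_r(f,c) | c in C_r}, minus one."""
    q = len(probs)
    if q < 2:
        return -1.0  # assumption: log2(q) = 0, so the entropy term is taken as 0
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return entropy / math.log2(q) - 1.0
```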

Face Size

In a news image, face sizes are considered to be related to the importance of faces. Generally, the importance of faces seems to be positively related to their sizes. In [11], $F_s(f)$ is defined as the percentage of $f$'s bounding box area over the total area of all faces in the image. Nevertheless, this definition loses the information of how prominent a face size is compared with the other faces in the same image. To include this information as well, we define $F_s(f)$ as follows:

$$F_s(f) = F_a(f) + \sum_{f_i \in F}\big(F_a(f) - F_a(f_i)\big) \qquad (6)$$

In Eq.6, $F$ is the face set in the image, and $F_a$ is the normalized face area as defined in [11].
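A small sketch of the face-size feature, reading Eq.6 as $F_s(f) = F_a(f) + \sum_{f_i \in F}(F_a(f) - F_a(f_i))$ and taking $F_a$ as each bounding-box area divided by the total area of all faces, as described for [11]. The (x, y, width, height) box layout and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), hypothetical layout


def normalized_areas(boxes: Sequence[Box]) -> List[float]:
    """F_a: each face's bounding-box area divided by the total area of all faces."""
    areas = [w * h for (_, _, w, h) in boxes]
    total = sum(areas)
    return [a / total for a in areas]


def face_size_features(boxes: Sequence[Box]) -> List[float]:
    """F_s(f) = F_a(f) + sum over all faces f_i of (F_a(f) - F_a(f_i))."""
    fa = normalized_areas(boxes)
    return [a + sum(a - other for other in fa) for a in fa]
```

Because the prominence term sums differences against every face in the image, a face that is large relative to its neighbours gets a clearly higher score than its plain area share.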

Face Position

When photographers take pictures of someone, they tend to place the person's face in the center of the image. Therefore, compared with faces in the corners, we think that faces appearing in the center part of images are more important. To extract the face position information of $f$, we first calculate the Euclidean distance $F_{rp}(f)$ from the center of $f$'s bounding box to the center of the image. Then $F_{rp}(f)$ is normalized to $F_{np}(f)$ by dividing it by the sum of the distances of all faces in the image. After that, similarly to $F_s$, the information of how prominent a distance is compared with the others is introduced as follows:

$$F_p(f) = F_{np}(f) + \sum_{f_i \in F}\big(F_{np}(f) - F_{np}(f_i)\big) \qquad (7)$$

where $F_{np}(f) = F_{rp}(f) / \sum_{f_j \in F} F_{rp}(f_j)$, and $F$ is the face set in the image.
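The position feature of Eq.7 can be sketched the same way: raw center-to-center distances, normalization by their total, then the prominence term. The (x, y, width, height) box layout, the (width, height) image size, and returning zeros when every face sits exactly at the image center (where the normalization is undefined) are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), hypothetical layout


def face_position_features(boxes: Sequence[Box], image_size: Tuple[float, float]) -> List[float]:
    """F_p(f) = F_np(f) + sum_i (F_np(f) - F_np(f_i)), F_np = F_rp / sum_j F_rp(f_j)."""
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    # F_rp: distance from each box center to the image center.
    raw = [math.hypot(x + w / 2.0 - cx, y + h / 2.0 - cy) for (x, y, w, h) in boxes]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(boxes)  # assumption: all faces centered, no position cue
    fnp = [r / total for r in raw]
    return [v + sum(v - other for other in fnp) for v in fnp]
```

Note that a larger $F_p$ means a face lies farther from the center than its neighbours; the fusion model learns how to weigh that.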

Face Definition

When a photographer takes a picture, the camera tends to be focused on the faces of important persons. This makes the important faces in images clearer than the other faces. To evaluate the definition of a face $f$, we first use the point sharpness algorithm to obtain the raw definition $F_{rd}(f)$. Then, we normalize $F_{rd}(f)$ to $F_{nd}(f)$ and introduce the prominence information as follows:

$$F_d(f) = F_{nd}(f) + \sum_{f_i \in F}\big(F_{nd}(f) - F_{nd}(f_i)\big) \qquad (8)$$

where $F_{nd}(f) = F_{rd}(f) / \sum_{f_j \in F} F_{rd}(f_j)$.
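The text does not spell out the point sharpness algorithm. One common formulation, assumed here, averages over interior pixels the sum of absolute gray-level differences to the eight neighbours, each divided by the neighbour's distance (1 or √2). The sketch below computes that raw definition $F_{rd}(f)$ for a grayscale face crop; normalization and the prominence term of Eq.8 then follow exactly as for face size and position. The function name is hypothetical.

```python
import numpy as np


def point_sharpness(gray) -> float:
    """Assumed point-sharpness measure of a 2-D grayscale crop (at least 3x3 pixels)."""
    g = np.asarray(gray, dtype=float)
    rows, cols = g.shape
    center = g[1:-1, 1:-1]  # interior pixels that have all eight neighbours
    total = np.zeros_like(center)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = g[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
            # Gray-level change per unit distance (1 for edge neighbours, sqrt(2) for diagonals).
            total += np.abs(center - neighbour) / np.hypot(dy, dx)
    return float(total.mean())
```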

Name Position

Literature [12] has shown that names closer to the beginning of a caption are more likely to be pictured in the corresponding image. Inspired by this, we believe that the positions of names in captions could reflect the importance of the names. In [1,2], the name position is defined as follows:

$$N_p(c) = \frac{L_c}{L_{caption}} \qquad (9)$$

where $L_c$ is the length of the caption from the beginning to the location of $c$, and $L_{caption}$ is the length of the whole caption. However, under this definition, $N_p(c)$ cannot reflect the order in which $c$ appears in the caption relative to the other candidate names. Therefore, we adopt two types of name position definition simultaneously. The first definition $N_{p1}(c)$ is the same as Eq.9, and the second definition …

… as it would not appear in any caption, both $N_{p1}(Null)$ and $N_{p2}(Null)$ should be set to infinity. However, infinity is not suitable as an input to most learning algorithms; thus we set both $N_{p1}(Null)$ and $N_{p2}(Null)$ to 20, which is much larger than the values of real names.
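A sketch of $N_{p1}$ from Eq.9, assuming lengths are counted in characters and that a name's location is its first occurrence in the caption; the function name is hypothetical, and `None` stands for Null, which receives the constant 20 used above. Only $N_{p1}$ is covered.

```python
from typing import Optional

NULL_POSITION = 20.0  # constant assigned to N_p1(Null) and N_p2(Null)


def name_position(caption: str, name: Optional[str]) -> float:
    """N_p1 (Eq.9): characters before the first mention of `name` over caption length."""
    if name is None:
        return NULL_POSITION
    idx = caption.find(name)
    if idx < 0:
        raise ValueError(f"{name!r} does not occur in the caption")
    return idx / len(caption)
```

Counting words instead of characters would be an equally valid reading of "length"; the character version simply keeps the sketch short.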

Name Gender

We first build a name dictionary that includes approximately 6000 names labeled with gender information, which are collected from a number of countries and cultures and cover all the names used in our experiments. $N_g(c)$ is set to 1 when $c$ is a male name and to 0 otherwise. Besides, to determine the genders of faces, we use almost 7000 labeled faces to train a face gender classifier based on a support vector machine (SVM). The training faces cover all human races, and faces of each gender make up half of the face set. $F_g(f)$ is defined as the classifier's output probability indicating the confidence of $f$ being a male face. As for the additional name Null, its gender is set to -1 to distinguish it from real candidate names.
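A sketch of both gender features, using scikit-learn's `SVC` with `probability=True` so that `predict_proba` yields a male-class probability for $F_g(f)$. The RBF kernel, the label convention (1 = male), the dictionary format for $N_g(c)$ and all function names are assumptions; the text specifies only an SVM trained on about 7000 gender-balanced faces.

```python
from typing import Dict, Optional

import numpy as np
from sklearn.svm import SVC


def train_gender_svm(features: np.ndarray, is_male: np.ndarray, seed: int = 0) -> SVC:
    """Fit an SVM whose calibrated probabilities serve as F_g."""
    clf = SVC(kernel="rbf", probability=True, random_state=seed)
    return clf.fit(features, is_male)


def face_gender(clf: SVC, face: np.ndarray) -> float:
    """F_g(f): predicted probability that the face is male (class label 1)."""
    proba = clf.predict_proba(np.atleast_2d(face))[0]
    male_col = list(clf.classes_).index(1)
    return float(proba[male_col])


def name_gender(name: Optional[str], gender_dict: Dict[str, str]) -> int:
    """N_g(c): 1 for a male name, 0 otherwise, -1 for Null (name=None)."""
    if name is None:
        return -1
    return 1 if gender_dict[name] == "male" else 0
```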

Name Inferring and Annotation Results Correcting

After the information extraction procedure, the instance $\{f, c\}$ can be represented as

$$X(f,c) = [F_r(f,c), F_r(f,Null), F_s(f), F_p(f), F_d(f), F_g(f), Num_f, N_{p1}(c), N_{p2}(c), N_g(c), Num_c]$$

The next step is then inferring the name of $f$. According to Eq.2, the core of this process is building the model $Z: X(f,c) \rightarrow z$, $z \in \mathbb{R}$, following the criterion in Eq.1. In this work, by setting the targets to 1 for positive instances and 0 for negative instances, we learn the model $Z$ with regression techniques. In this paper, Neural Net Regression (NNR), which has favorable nonlinear mapping performance and high parallel information processing capability, is utilized to build $Z$. Specifically, a two-layer feedforward network with sigmoid transfer functions in both the hidden and output layers is applied to fit the targets of instances.
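A NumPy sketch of the model $Z$ described above: one hidden layer and one output unit, both with sigmoid transfer functions, fitted to 0/1 targets by full-batch gradient descent on the mean squared error. The learning rate, initialization, epoch count and class name are assumptions; the 20 hidden neurons match the setting reported in the experiments.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class TwoLayerSigmoidRegressor:
    """Hypothetical sketch of Z: X(f,c) -> z with sigmoid hidden and output layers."""

    def __init__(self, n_in: int, n_hidden: int = 20, lr: float = 2.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def predict(self, X: np.ndarray) -> np.ndarray:
        h = sigmoid(X @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2).ravel()

    def fit(self, X: np.ndarray, y: np.ndarray, epochs: int = 3000) -> "TwoLayerSigmoidRegressor":
        t = y.reshape(-1, 1).astype(float)
        n = len(X)
        for _ in range(epochs):
            h = sigmoid(X @ self.W1 + self.b1)    # hidden activations
            out = sigmoid(h @ self.W2 + self.b2)  # predictions in (0, 1)
            # Backpropagate 0.5 * mean squared error through both sigmoid layers.
            d_out = (out - t) * out * (1.0 - out) / n
            d_h = (d_out @ self.W2.T) * h * (1.0 - h)
            self.W2 -= self.lr * (h.T @ d_out)
            self.b2 -= self.lr * d_out.sum(axis=0)
            self.W1 -= self.lr * (X.T @ d_h)
            self.b1 -= self.lr * d_h.sum(axis=0)
        return self
```

With a trained model, Eq.2 is applied by scoring every candidate vector of a face with `predict` and taking the argmax.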

So far, five of the six constraints mentioned in the introduction have been used in the proposed framework; only the uniqueness constraint has not. As a result, multiple faces in an image may be annotated with the same name. To avoid this situation, a correcting strategy is adopted as follows:

Algorithm 1. The Correcting Strategy
Input: Z(X(f_i, c_j)) (1 ≤ i ≤ p, 1 ≤ j ≤ q+1); Y_r (the raw annotation results)
Output: Y (the final annotation results)
For i = 1 to p do
    For k = 1 to p do
        If (y_i = y_k) && (i ≠ k) then
            If Z(X(f_i, y_i)) ≥ Z(X(f_k, y_k)) then
                y_i = y_i
                y_k = argsecmax_{c_j ∈ C} Z(X(f_k, c_j))
            else
                y_k = y_k
                y_i = argsecmax_{c_j ∈ C} Z(X(f_i, c_j))
            End if
        End if
    End for
End for

where argsecmax returns the candidate that attains the second-largest value.
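Algorithm 1 can be sketched in Python as follows, with scores held in a p × (q+1) array whose column j is candidate $c_j$ and raw labels given as column indices. The listing itself does not exempt Null, while the uniqueness constraint in the introduction does; the sketch therefore takes an optional, hypothetical `null_col` argument that skips collisions on the Null column when given. Function names are hypothetical.

```python
from typing import List, Optional, Sequence

import numpy as np


def argsecmax(scores: np.ndarray) -> int:
    """Index of the second-largest score (needs at least two candidates)."""
    return int(np.argsort(scores)[::-1][1])


def correct_annotations(Z: np.ndarray, y_raw: Sequence[int],
                        null_col: Optional[int] = None) -> List[int]:
    """Algorithm 1: Z[i, j] = Z(X(f_i, c_j)); y_raw[i] is the raw label column of f_i."""
    y = [int(v) for v in y_raw]
    p = len(y)
    for i in range(p):
        for k in range(p):
            if i != k and y[i] == y[k] and y[i] != null_col:
                if Z[i, y[i]] >= Z[k, y[k]]:
                    y[k] = argsecmax(Z[k])  # f_k gives the shared name to the stronger f_i
                else:
                    y[i] = argsecmax(Z[i])
    return y
```

As in the listing, a face that gives up its name moves to its second-best candidate in a single pass; later collisions are only resolved if a subsequent loop iteration meets them.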

Experiments

The experiments include two parts. First, the dataset and performance metrics are introduced. Second, we investigate the effectiveness of each type of information in face annotation and benchmark FAMI against several baseline approaches.

Dataset and Performance Metrics

The dataset used in our experiments is Labeled Yahoo! News [2]. Both faces and names have been detected, and faces have been represented as 4992-dimensional feature vectors. By principal component analysis (PCA), we reduce the dimension of the facial feature vectors to 300. Following [10], we retain the 214 names occurring at least 20 times in the captions and treat the others as the Null class. Images that do not contain any of the 214 names are removed. More details about the dataset are shown in Table 1. It should be noted that the item "Names" in Table 1 also counts the names that are not among the 214 names but still appear in the captions. The ground-truth ratio is the percentage of faces whose ground-truth names are real names (rather than Null) over all the faces in the dataset. Three performance metrics are utilized in the experiments: accuracy, precision, and recall. Accuracy is the percentage of correctly annotated faces (including the correctly annotated faces whose ground-truth name is Null) over all faces; precision is the percentage of correctly annotated faces over the faces annotated with real names; and recall is the percentage of correctly annotated real faces over the real faces whose ground-truth names are not Null.
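The three metrics can be computed as below, with `None` standing for Null in both the ground truth and the predictions. The function name and input format are assumptions; returning 0.0 when a denominator is empty is a choice of this sketch.

```python
from typing import Optional, Sequence, Tuple


def annotation_metrics(truth: Sequence[Optional[str]],
                       pred: Sequence[Optional[str]]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) as defined for face annotation."""
    pairs = list(zip(truth, pred))
    # Accuracy: correct labels (Null included) over all faces.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Precision: correct labels over faces annotated with a real name.
    predicted_real = [(t, p) for t, p in pairs if p is not None]
    precision = (sum(t == p for t, p in predicted_real) / len(predicted_real)
                 if predicted_real else 0.0)
    # Recall: correct labels over faces whose ground-truth name is real.
    truly_real = [(t, p) for t, p in pairs if t is not None]
    recall = (sum(t == p for t, p in truly_real) / len(truly_real)
              if truly_real else 0.0)
    return accuracy, precision, recall
```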

Table 1. Details of the dataset.

Images | Faces | Name classes | Names | Ground-truth ratio
10126 | 15864 | 214 | 16947 | 0.56

Name Inferring Results

The experiments are performed over 5 different permutations, each randomly sampling 50% of the image-caption pairs as the training set and using the rest for testing. The final performance is the average over the 5 permutations. During the experiments, the number of hidden neurons is set to 20 for NNR. To study the influence of a certain type of information on face annotation, a metric named contribution rate ($Con$) is defined in Eq.10:

$$Con(x_i) = \frac{M(X) - M(X_{\sim e(x_i)})}{\sum_{j=1}^{9}\big(M(X) - M(X_{\sim e(x_j)})\big)} \qquad (10)$$

where $x_i$ is the $i$-th type of information in $X(f,c)$, and $X_{\sim e(x_i)}$ represents $X(f,c)$ excluding $x_i$. $M$ is a certain performance metric (i.e. one of accuracy, precision, and recall). The performance of FAMI with different information inputs and the corresponding $Con$ are reported in Figure 2. According to Figure 2, $F_r$ makes the greatest contribution to the annotation result. This is easy to understand, because $F_r$ is helpful for face annotation in any image, whereas the other information may not be. For example, in an image containing real faces (not falsely detected results) whose ground-truth names are all Null, $F_r$ is the only useful information for annotating these faces with Null. Besides $F_r$, both $F_s$ and $F_g$ show outstanding contributions to face annotation. On the other hand, $N_{p2}$ degrades the performance on precision. We think this could be because names outside the selected 214 names are skipped when counting the appearance orders of names, which introduces much noise into $N_{p2}$. However, $N_{p2}$ improves the performance on recall, and the improvement is greater …
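Given one metric value for the full feature vector and one for each ablated vector, Eq.10 reduces to normalizing the performance drops. A sketch, assuming the scores are already computed; the function name and dictionary keys are hypothetical labels, and raising an error when the drops sum to zero is a choice of this sketch.

```python
from typing import Dict


def contribution_rates(full_score: float, ablated: Dict[str, float]) -> Dict[str, float]:
    """Eq.10: each information's share of the total drop in metric M when it is removed.

    full_score: M(X) with all information; ablated[name]: M(X) without that information.
    """
    drops = {name: full_score - score for name, score in ablated.items()}
    total = sum(drops.values())
    if total == 0:
        raise ValueError("performance drops sum to zero; Con is undefined")
    return {name: drop / total for name, drop in drops.items()}
```

A negative rate then marks information whose removal improves the metric, which is how a degradation such as that of $N_{p2}$ on precision shows up.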

Figure 2. The contribution rate of different information in FAMI.

Table 2. Performances of different approaches.

Approach | Accuracy | Precision | Recall
IPAL | 0.6640 | 0.6158 | 0.8768
MMS | 0.6366 | 0.8660 | 0.4207
CORD | 0.6452 | 0.6217 | 0.8854
FAMI | 0.8543 | 0.8503 | 0.8538

To further analyze the effectiveness of our framework, three baseline face annotation approaches are employed for comparative studies: MMS [3], CORD [4], and IPAL [5]. Each approach is configured with the parameters suggested in the original literature. We test each approach five times with the same test sets as FAMI, and the final performance is the average value of each metric. The results are shown in Table 2. It can be observed that IPAL and CORD show better performance on recall than FAMI, but worse performance on accuracy and precision. This reflects that IPAL and CORD are not capable of annotating noise faces (faces whose ground-truth names are Null), which generally exist in news images. Compared with FAMI, MMS achieves comparable or better results on precision but performs worse on the two other metrics, reflecting that MMS tends to label faces with Null. In conclusion, multimodal information shows a favorable capability for inferring names for faces. Based on multimodal information fusion, FAMI is more robust to noise faces than the baseline approaches.

Conclusion and Future Work

In this paper, we have presented a novel face annotation framework for the news image domain. To exploit images and their captions for face annotation as fully as possible, multiple types of information are extracted from news data and fused to infer names. In the experiments, the proposed framework shows encouraging performance against several baseline approaches, indicating the remarkable effectiveness of multimodal information for face annotation. However, the information used in this work is selected and extracted in ways designed by manual effort. With the huge amount of image-caption data available on the Internet, it may be more effective for a face annotation model to learn information selection and extraction automatically. With the recent advancements in the multimodal deep learning field, we plan to investigate how to incorporate recent deep learning techniques into face annotation.

Acknowledgement

This research was financially supported by the National Science Foundation.

References

[1] Zeng Z., Xiao S., Jia K., et al. Learning by Associating Ambiguously Labeled Images [C]// Computer Vision and Pattern Recognition. IEEE, 2013: 708-715.
[2] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.
[3] Luo J., Orabona F. Learning from Candidate Labeling Sets [C]// NIPS Foundation - Advances in Neural Information Processing Systems. Idiap, 2011: 1504-1512.
[4] Tang C.-Z., Zhang M.-L. Confidence-rated discriminative partial label learning [C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), San Francisco, CA, 2017, in press.
[5] Zhang M.L., Yu F. Solving the partial label learning problem: an instance-based approach [C]// International Conference on Artificial Intelligence. AAAI Press, 2015: 4048-4054.
[6] Yang J., Yan R., Hauptmann A.G. Multiple instance learning for labeling faces in broadcasting news video [C]// ACM International Conference on Multimedia, Singapore, November. DBLP, 2005: 31-40.
[7] Cover T., Hart P. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
[8] Guillaumin M., Mensink T., Verbeek J., et al. Face recognition from caption-based supervision [J]. International Journal of Computer Vision, 2012, 96(1): 64-82.
[9] Frey B.J., Dueck D. Clustering by Passing Messages between Data Points [J]. Science, 2007, 315(5814): 972.
[10] Pang L., Ngo C.W. Unsupervised Celebrity Face Naming in Web Videos [J]. IEEE Transactions on Multimedia, 2015, 17(6): 1-1.
[11] Pham P.T., Moens M.F., Tuytelaars T. Cross-Media Alignment of Names and Faces [J]. IEEE Transactions on Multimedia, 2010, 12(1): 13-27.
[12]
