Banknote Recognizer: From Theory to Application

2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018)

ISBN: 978-1-60595-065-5


Long-yin YUNG1, Ming-hua XIA2 and Yik-chung WU1,*

1Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong

2School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China

*Corresponding author

Keywords: Deep learning, Image classification, Mobile application.

Abstract. In recent years, deep Convolutional Neural Networks (CNNs) have demonstrated robust performance and reached state-of-the-art results in many image processing tasks, such as object detection and image classification, as well as in some natural language processing tasks. However, most studies focus on model architecture design, evaluated on standard datasets such as MNIST or ImageNet, and only a few bring these advanced technologies into real-life applications [1]. In this study, we introduce well-designed models for image classification and demonstrate such a model deployed in a mobile environment. For data preparation, we evaluate different data collection methods and examine different ways of creating a dataset. We further investigate the advantages and restrictions of the mobile neural network model. The results illustrate that performance is consistent when the model is transferred from a computer environment to a mobile environment. In addition, a real-life product based on the theory and investigation was built in cooperation with a local society for the blind and a software development company: the “Hong Kong Banknote Recognizer”, the first real-time A.I. application for the visually impaired with high mobility and a built-in neural network model.

Introduction

Deep neural networks (deep learning), a branch of machine learning, have been well developed and have found a wide range of applications in recent years. In the early stage, they existed only in the form of the multi-layer perceptron (MLP) trained with backpropagation, until the breakthrough of AlexNet by Alex Krizhevsky and Geoffrey Hinton [2]. AlexNet demonstrated that neural networks are no longer restricted to small grey-scale image datasets such as MNIST but are able to solve classification tasks with thousands of classes in ImageNet [3]. The success of deep learning in large-scale image classification greatly increased its popularity and accelerated its development, including the community and software/hardware development, and attracted more people to contribute. At the same time, with the rapid development of the technologies and models, they have been successfully transferred and applied to similar tasks. In 2015, Google reached excellent performance in face recognition, with 99.63% accuracy on the Labeled Faces in the Wild (LFW) dataset and 95.12% on the YouTube Faces Database [4]. The high accuracy and solid performance show the robustness of neural networks and indicate the maturity of the technology. As a result, more products and applications have been built on it and have started to merge into our daily life, such as self-driving cars from Tesla and face verification in the Apple iPhone X.

At the same time, only a few applications aim to benefit the underprivileged groups in society, and the situation in Hong Kong is more serious. Different versions of banknotes circulate in the market, and only the versions issued after 2010 are embossed with braille and tactile lines. As time passes, these features of the banknotes diminish, and hence we attempt to make use of the recognition power of A.I. to enhance the living experience of the visually impaired.

In this paper, we introduce the modern architecture of convolutional neural networks in image processing. Methods for creating a dataset suitable for deep learning are studied. By applying transfer learning, we can take advantage of existing work and keep developing based on it. This effectively reduces the training time and maintains good performance.

The paper is organized as follows. Section II introduces the model architectures with their novel features, including the inception module [5] and the residual connection [6]. Section III describes the dataset created for this specific usage. Section IV discusses the methodology for creating the application, and Section V discusses and evaluates the results.

Related Work

In 2012, Alex Krizhevsky introduced a breakthrough technology using the concept of the Convolutional Neural Network for image processing and published the paper “ImageNet Classification with Deep Convolutional Neural Networks” [2]. The success of the model depends not only on convolution with kernel filters; it also applies the Rectified Linear Unit (ReLU) [7], normalisation, max-pooling, dropout and data preprocessing methods to achieve robust performance. Once the model was able to solve the 1000-class image classification task, a new era for deep learning was established.

Convolutional Layers

The motivation for developing CNNs was mainly related to the capacity and performance of the network. Consider the task of classifying an image of size 256x256. For a traditional MLP network, the weights of the first layer alone already amount to about 5,000,000 parameters because of the 65536 pixels of the 256x256 input. Such a high number of parameters not only consumes a large amount of computational power but also makes it harder for the weights to converge by back-propagation, and it greatly affects the generalization of the results. A CNN exploits the spatial correlation within the image and reduces the number of parameters, which sharply increases the computational efficiency during the training phase.
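
The parameter counts discussed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the hidden layer width of 76 units is an assumption chosen only because it lands close to the 5,000,000 figure, and the convolutional layer (64 filters of size 3x3 on a single-channel input) is likewise a hypothetical example.

```python
def dense_layer_params(num_inputs, num_units):
    """Weights plus one bias per unit of a fully connected layer."""
    return num_inputs * num_units + num_units


def conv_layer_params(kernel_size, in_channels, num_filters):
    """Weights plus one bias per filter of a 2D convolutional layer."""
    return kernel_size * kernel_size * in_channels * num_filters + num_filters


pixels = 256 * 256          # 65536 inputs for a single-channel 256x256 image
hidden_units = 76           # assumed width, chosen to land near 5,000,000 parameters
mlp_params = dense_layer_params(pixels, hidden_units)

# Assumed convolutional layer: 64 filters of size 3x3 on one input channel.
cnn_params = conv_layer_params(kernel_size=3, in_channels=1, num_filters=64)

print(pixels, mlp_params, cnn_params)
```

The convolutional layer needs only a few hundred weights because each filter is shared across all positions of the image, whereas the dense layer needs one weight per pixel per unit.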

Figure 1. Structure of a convolutional neural network.

Figure 1 demonstrates the working principle of a convolutional neural network. The representation of spatial information is squeezed out while the parameters that map to the content of the image remain. As shown in the figure, the height, width and depth of the image vary through the different layers. For a simple neural network, each neuron is normally a simple logistic regression (1),

$$y_i^l = g(W^T y_i^{l-1} + b) \qquad (1)$$

where $W$ represents the weights, $b$ is the bias, $g$ is the ReLU activation function $g(x) = \max(0, x)$ [7], and $y_i^l$ represents the input or output of neuron $i$ in layer $l$. For a neuron in a convolutional neural network, however, the spatial information of the inputs is preserved because the calculation is carried out in two or more dimensions (2):

$$y_{ij}^l = g\left(\sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \omega_{ab}\, y_{(i+a)(j+b)}^{l-1}\right) \qquad (2)$$

Here $a$ and $b$ index the rows and columns of the filter. A filter $\omega$ of size $m \times m$ generates an output of size $(n - m + 1) \times (n - m + 1)$ if the input has $n \times n$ dimensions, and the depth corresponds to the number of filters or the channels of the image. The output $y_{ij}^l$ is the sum of the values from the previous layer weighted by the filter, and hence captures the feature representations from the previous layers.

Apart from convolution, the pooling layer also greatly reduces the total number of parameters and improves performance by summarizing the features from the preceding layer. Finally, by flattening the outputs of the convolutional layers and connecting them to a fully connected layer, a simple convolutional neural network is constructed.
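
To make the convolution and pooling steps concrete, the sketch below applies an m x m filter to an n x n input without padding, passes the result through ReLU, and then summarizes it with 2x2 max pooling. It is an illustrative toy in plain Python; the image, the filter values and the function names are assumptions made for this example and do not come from the paper.

```python
def relu(x):
    return max(0.0, x)


def conv2d_valid(image, kernel):
    """Slide an m x m kernel over an n x n image (no padding, stride 1), then apply ReLU."""
    n, m = len(image), len(kernel)
    out_size = n - m + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for a in range(m):
                for b in range(m):
                    total += kernel[a][b] * image[i + a][j + b]
            row.append(relu(total))
        output.append(row)
    return output


def max_pool_2x2(feature_map):
    """Summarize each non-overlapping 2 x 2 window by its maximum."""
    size = len(feature_map) // 2
    return [[max(feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
                 feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1])
             for j in range(size)] for i in range(size)]


# Toy 5 x 5 image with a vertical edge between the second and third columns.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Toy 2 x 2 filter that responds to a left-to-right increase in intensity.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = conv2d_valid(image, kernel)   # 4 x 4 output, since 5 - 2 + 1 = 4
pooled = max_pool_2x2(features)          # 2 x 2 output
```

The feature map is non-zero only where the filter overlaps the edge, and pooling keeps that response while halving the height and width.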

Inception Module

For CNNs, one of the major concerns is the representational bottleneck in each layer, especially in the lower layers. Filters (kernels) of different sizes generate completely different feature maps, all of which are potentially valuable and may benefit the model. Information would be lost if a single filter size were fixed. As a result, it is difficult to compare the contributions of different filter sizes and decide on the best one. Normally, the best filter size is determined by experience and experiments. In light of this situation, inception modules (shown in Figure 2) provide a solid solution and a novel design principle for solving the problem.

Figure 2. Structure of an inception module, which acts as a layer inside a neural network.

The final output of each module is concatenated from the outputs of convolution filters of different sizes and of a pooling branch. The main feature of the module is that it provides paths that let the model choose the best feature representation by itself, which prevents the loss of information that occurs when one filter size is fixed. Another feature of the module is the 1x1 convolutional filter. It does not change the spatial size but changes the depth of the output, which gives control over the channel dimension. For example, two 1x1 convolutional filters applied to a 28x28 RGB image reduce the dimensions from 28x28x3 to 28x28x2, integrating information across channels while preserving the planar information of the image. Owing to the high flexibility of the module, it also provides higher computational efficiency. Although the structure looks more complicated, the computational time needed for training is unexpectedly low.
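
The 28x28x3 to 28x28x2 example can be reproduced directly. The sketch below is a plain Python illustration under stated assumptions: the pixel values and the two filter weight vectors are invented for the example, and `conv_1x1` is a hypothetical helper name.

```python
def conv_1x1(image, filters):
    """Apply 1x1 filters: each output channel is a weighted sum of the input channels at one pixel."""
    return [[[sum(w * v for w, v in zip(f, pixel)) for f in filters]
             for pixel in row] for row in image]


height, width = 28, 28
# Toy RGB image in which every pixel has the same three channel values.
image = [[[1.0, 2.0, 3.0] for _ in range(width)] for _ in range(height)]

filters = [[1.0, 0.0, -1.0],   # hypothetical filter 1: red minus blue
           [0.5, 0.5, 0.0]]    # hypothetical filter 2: mean of red and green

output = conv_1x1(image, filters)
shape = (len(output), len(output[0]), len(output[0][0]))
```

The height and width are untouched; only the channel dimension shrinks from 3 to 2, which is why 1x1 filters are a cheap way to control depth before larger convolutions.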

Residual Connection

The idea of residual modules (shown in Figure 3) came from the investigation of deep neural networks. Theoretically, the more layers a network has, the more features can be extracted from the input. In fact, a paper from Microsoft Research [6] indicated that a bottleneck appears when the model gets deeper. Experiments clearly showed a decrease in performance when extra layers were added to an existing model. When the number of layers is already saturated, a 56-layer model, for example, has a higher error rate in both training and inference than a 20-layer model on CIFAR-10. Both the training error and the testing error are higher for the deeper model. Ideally, the extra layers would simply pass on the results of the layers before them. However, experiments indicated that the deeper network could not simply find such an optimized solution. As a result, a residual module was suggested to solve the problem (3),

$$H(x) = F(x) + x \qquad (3)$$

where $H(x)$ is the output, $F(x)$ is the module between the input and the output, and $x$ is the input. One of the major problems of a deep neural network is the vanishing or exploding gradient with squashing activation functions. To solve these problems, the residual connection offers the network the option to skip the module if no features are successfully extracted from the input. Experiments showed that with the residual module, the model obtains better or equal performance as the number of layers in the network increases. Besides, by offering the identity function to the model, the training time is greatly decreased compared with a conventionally constructed neural network with the same number of layers.

Figure 3. Residual mapping.
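
The residual mapping H(x) = F(x) + x can be sketched in a few lines. This is an illustration only: `zero_module` and `scaling_module` are hypothetical stand-ins for the learned module F, used to show that the block falls back to the identity when F contributes nothing.

```python
def residual_block(x, module):
    """Return H(x) = F(x) + x, where `module` plays the role of F."""
    fx = module(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_module(x):
    # A module that has learned nothing: F(x) = 0, so the block reduces to the identity.
    return [0.0 for _ in x]


def scaling_module(x):
    # Hypothetical learned module used only for illustration: F(x) = 0.1 * x.
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_module)     # equals x
refined_out = residual_block(x, scaling_module)   # x plus a small correction
```

Because the skip path carries x unchanged, an extra block can never do worse than passing its input through, which matches the ideal case described for the additional layers.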

Inception-Residual Modules

With the success of both the inception and the residual modules, Google combined the benefits of the two and merged them to create the Inception-ResNet models [8]. The combined modules reached higher performance than the other models and became the state-of-the-art model and technology. Given its solid performance and low computational time for training, most tasks based on it can be expected to succeed.

Dataset

In this section, efficient methods for creating a suitable dataset are presented. For deep neural networks to succeed, the major limitations are the size and the diversity of the training data. As a deep neural network requires a huge amount of data, collecting a large amount of relevant, labelled data is inevitable. However, there is no explicit guidance on the minimum amount of data needed, and the performance varies with the quality of the dataset. Besides, one restriction of convolutional neural networks is that they are not spatially invariant to the input data. It is difficult for them to perform inference on images captured at different angles or under different light intensities. One possible solution is to use another architecture, such as Spatial Transformer Networks [9], to handle the spatial information of the image. Another possible way is to enhance the dataset so that it covers such scenarios.

In light of this situation, two datasets are created. One consists of normal training images created directly from recorded videos of both sides of the banknotes. The other is a special testing dataset that represents real-life situations, for example banknotes captured at different angles and light intensities.

Table I provides information on the banknote videos and the images created from them. To create a dataset efficiently, the proposed method is to export frames from the movies at 30 fps. Different [...] different angles. This greatly reduces the time needed for preparing the data and ensures the quantity of data generated for training a classifier. Besides, the imbalance of the distribution is mainly due to the issuers of the banknotes: the $10 banknote has only one type, while the $20 banknote has some old versions that are still circulating in the market.

Table 1. Number of images and represented proportion for each class in the training and testing datasets from the movies.

Training Dataset (Movie)

Banknote (HKD)   No. Videos   Duration (seconds)   No. Images   Proportion
$10              5            30                   930          9.8%
$20              31           155                  4870         51.3%
$50              15           75                   2338         24.6%
$100             11           43                   1355         14.3%
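
The proportions in Table 1 follow directly from the image counts. The short sketch below recomputes them and also reports the ratio between the largest and smallest class as a simple measure of the imbalance; the variable names are illustrative and the ratio is an added diagnostic, not a figure from the paper.

```python
# Number of training images per banknote class, as listed in Table 1.
image_counts = {"$10": 930, "$20": 4870, "$50": 2338, "$100": 1355}

total_images = sum(image_counts.values())
proportions = {note: round(100.0 * count / total_images, 1)
               for note, count in image_counts.items()}

# Ratio between the largest and the smallest class (added diagnostic).
imbalance_ratio = max(image_counts.values()) / min(image_counts.values())

print(total_images, proportions, round(imbalance_ratio, 2))
```

The $20 class contributes roughly five times as many images as the $10 class, which is the imbalance attributed in the text to the number of banknote versions in circulation.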

Table II provides the number of images used for training and testing. For the testing dataset, all banknotes in the images are captured one by one with a camera. To ensure that the measured performance of the model matches its performance in the real-life application, all testing images have features that reflect the use of mobile phones. First of all, instead of brand-new banknotes, the banknotes in the testing images have evidently been used for a long period of time. Some of them are folded, scratched or creased, and they are captured from different angles or placed just at the corner of the image. Besides, the images are captured under a variety of natural light intensities. For example, some of the images are dim because the light source is blocked, and some are under high light exposure.

Table 2. Number of images and represented proportion for each class in the training and testing datasets.

Banknote (HKD)   Training Dataset No. Images   Testing Dataset No. Images
$10              930                           48
$20              4870                          253
$50              2338                          87
$100             1355                          128

Methodology

In this section, the main design workflow of the banknote recogniser and a useful training technique called transfer learning [10] are presented. In our work, the banknote images are used to train four famous CNN models, VGG16, ResNet, Inception and InceptionResNet, and the best model is selected for implementation. Some of the ideas were presented and discussed in Section II. Details of the models are introduced in this section, and the results and performance of the models are discussed in the next section. By understanding and comparing different model architectures, it is possible to decide on the most suitable structure for this specific task. Besides, with the benefit of transfer learning, the difficulty of training such deep networks greatly decreases, and solid performance of the models is also maintained. In the end, the trained model is put into the mobile environment to form a real-time classifying application.

VGG16 [11]

The number 16 refers to the 16 weight layers of the convolutional neural network. It is one of the early CNN models that reached state-of-the-art results in large-scale image classification. The design principle of this model is simple: 13 convolutional layers with pooling layers, followed by 3 fully connected layers. To increase its ability to learn detailed features, it uses small 3x3 convolutional filters instead of larger filters such as the 11x11 filters in the first layer of AlexNet [2]. Besides, as the network goes deeper, it contains more filters to capture the detailed features of the inputs. However, as the number of parameters is more than 130 million, the hardware requirement is greatly increased.
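
The statement that VGG16 has more than 130 million parameters can be verified by counting. The sketch below assumes the standard VGG16 layout (configuration D of the original VGG paper: 3x3 filters throughout, a 224x224 RGB input and a 1000-class output layer); these details are assumptions taken from the published architecture, not from this paper.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(num_inputs, num_units):
    """Weights plus biases of one fully connected layer."""
    return num_inputs * num_units + num_units


# Output channels of the 13 convolutional layers in the standard VGG16.
conv_channels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

total = 0
in_channels = 3                      # RGB input
for out_channels in conv_channels:
    total += conv_params(in_channels, out_channels)
    in_channels = out_channels
conv_total = total

# Three fully connected layers; 7 x 7 x 512 is the feature map size for a 224 x 224 input.
for num_inputs, num_units in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    total += dense_params(num_inputs, num_units)

print(conv_total, total)
```

Most of the parameters sit in the fully connected layers rather than in the 13 convolutional layers, which is one reason later architectures reduce or remove large dense layers.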

ResNet [6]

As described in Section II, although VGG16 achieves good performance, the performance becomes worse as the network goes deeper. This is mainly due to the vanishing or exploding gradient problem during training [12], which ResNet addresses. Ideally, if one of the layers in the neural network is saturated, meaning that no extra features are learned or extracted, the input and output of that layer should be identical. However, experimental results show that a layer cannot find the ideal weights for this by itself, hence adding a residual mapping is one of the solutions. Similar to the design of VGG, 3 convolutional layers are combined to form a “Block”. The input and output of each block are linked by a residual mapping, which not only increases or maintains the performance when the network goes deeper but also increases the efficiency of back-propagation. ResNet is available with 50, 101 or 152 layers; in our work, we pick the 101-layer version for our proposed model.

Inception [8] and InceptionResNet [13]

Section II presented the core idea of the inception modules. In Inception and InceptionResNet, the modules differ from the concept of a block in ResNet. Inspired by [14], the modules are more like a sub-network inside a neural network. This strengthens the ability of the network to extract features, particularly for the local regions near the input, which play an important role because the network learns features layer by layer. The design of the modules in both models is similar. Besides having residual mappings, InceptionResNet uses more techniques to make the model more computationally efficient. One of the major differences is that it has more 1x1 convolutional layers for dimension reduction. In addition, after many hands-on experiments and tests, some special architectural choices were made, such as using 7x1 and 1x7 convolutional filters instead of a 7x7 filter directly, or removing batch normalisation. Some convolutional filters in InceptionResNet are not followed by an activation function. Because of these many hand-crafted adjustments, the inception modules and the architecture of InceptionResNet are comparatively complicated and unnatural, but they show better performance.
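
The saving from replacing a 7x7 filter with a 7x1 filter followed by a 1x7 filter can be estimated by counting weights. In this sketch the channel count of 64 is an assumption chosen for illustration, biases are ignored, and `kernel_weights` is a hypothetical helper.

```python
def kernel_weights(height, width, channels):
    """Number of weights in a convolution that maps `channels` inputs to `channels` outputs."""
    return height * width * channels * channels


channels = 64                                    # assumed channel count, for illustration only
full = kernel_weights(7, 7, channels)            # one 7x7 convolution
factorized = (kernel_weights(7, 1, channels)     # 7x1 convolution ...
              + kernel_weights(1, 7, channels))  # ... followed by a 1x7 convolution
ratio = factorized / full

print(full, factorized, round(ratio, 3))
```

The factorized pair covers the same 7x7 receptive field with 14 weights per channel pair instead of 49, so it needs about 29% of the weights regardless of the channel count.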

Transfer Learning

As mentioned in [10], CNNs for image processing tend to learn similar weights in the lower layers, and the features learned there are largely independent of the task and dataset. The investigation of this property promoted transfer learning, in which parameters are shared among different tasks. We can make use of this property to train our desired model efficiently, with satisfactory performance, based on previous work. Figure 4 demonstrates the concept of transfer learning.

Figure 4. Workflow for transfer learning.

For all our trained models, we keep all the weights in the middle layers (convolutional layers), as these filters are already well trained on a demanding dataset such as ImageNet or other large datasets. By removing the last fully connected layer and constructing a newly designed layer, the model is able to transfer the learned feature representations from the previous work to the new dataset, and hence the model can be optimized easily. However, transfer learning only works for similar tasks. The pre-trained image weights cannot be reused in other fields, such as sound or text classification, where they would become a barrier for the new tasks.

Besides, a common feature of all the model architectures above is that they are highly demanding in hardware configuration. The number of parameters to be trained is very large, for example more than 130 million in VGG16. Development is restricted by the lack of sufficient hardware resources such as multiple GPUs, so we are unable to train the models from scratch on our own. With a slower CPU, training could take months for the model to converge.
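
The idea of keeping the pre-trained layers fixed and training only a new last layer can be sketched without any deep learning library. In the toy below, `frozen_features` is a hypothetical stand-in for the pre-trained convolutional layers, the two-class data are invented, and the new layer is a softmax classifier trained by plain gradient descent. It illustrates the workflow only and is not the authors' training code.

```python
import math


def frozen_features(x):
    # Hypothetical stand-in for the pre-trained convolutional layers: fixed and never updated.
    return [x[0] + x[1], x[0] - x[1], 1.0]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    feats = frozen_features(x)
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in weights])


def train_head(data, num_classes, steps=200, lr=0.1):
    """Train only the new last layer with gradient descent on the softmax loss."""
    dim = len(frozen_features(data[0][0]))
    weights = [[0.0] * dim for _ in range(num_classes)]
    losses = []
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(num_classes)]
        loss = 0.0
        for x, label in data:
            feats = frozen_features(x)
            probs = predict(weights, x)
            loss -= math.log(probs[label])
            for k in range(num_classes):
                error = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    grads[k][j] += error * feats[j]
        losses.append(loss / len(data))
        for k in range(num_classes):
            for j in range(dim):
                weights[k][j] -= lr * grads[k][j] / len(data)
    return weights, losses


# Toy two-class data, invented for illustration.
data = [([1.0, 0.0], 0), ([2.0, 0.0], 0), ([0.0, 1.0], 1), ([0.0, 2.0], 1)]
head, losses = train_head(data, num_classes=2)
predictions = [max(range(2), key=lambda k: predict(head, x)[k]) for x, _ in data]
```

Only the small weight matrix of the new layer is updated, which is why this style of training is feasible on modest hardware while training the whole network from scratch is not.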

Mobile Environment

There are advantages to using TensorFlow for training the model. First, the TensorFlow community is large, which makes it easy for users to share resources such as pre-trained models. Second, TensorFlow is well developed: it offers functions for converting the trained model from Python to a form usable from C++, and for stripping or freezing the model, which greatly reduces the hardware requirement for running inference with the model.

Experiments and Discussion

In this section, we investigate the suitability of the different proposed model architectures for the banknote recognition task. By comparing the training loss and the accuracy on the testing dataset, we are able to decide on the most suitable model architecture for our application.

Table 3. Accuracy on the testing dataset.

Model              Accuracy on testing dataset
VGG16              53.52%
ResNet_v1_101      9.18%
Inception          81.64%
Inception_ResNet   85.74%

Table III shows the results and the performance of the different models. By adding more advanced techniques to the CNN, such as inception modules and residual mapping, the performance of the model proportionally [...] performance from the evaluation. Hence, InceptionResNet is selected as the final model for the application, as expected. The results obtained from VGG and ResNet can be explained with the training loss in Figure 5. As shown in the figure, the fluctuating training loss is mainly due to mini-batch gradient descent. The large differences between the gradients of individual batches indicate that the model fails to optimize on the data, as the gradients do not generally decrease over time. Increasing the batch size or changing the optimisation method would reduce the variation between batches, but these solutions are not covered in this work. For ResNet, the low loss values show that the model learns the training dataset well and reaches a high accuracy on it. On the other hand, this implies that the model lacks the ability to generalise to a new dataset, as it has reached a state of overfitting: the weights of the model are sensitive only to the data it has seen. This is a common phenomenon if a model is trained for a long time.
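
The remark that a larger batch size reduces the variation between batches can be illustrated with a small simulation. The sketch below draws synthetic per-example gradients for a single weight (invented data, not measurements from the paper) and compares how much the mini-batch average fluctuates for two batch sizes.

```python
import random
import statistics


def minibatch_mean_variance(per_example_gradients, batch_size, trials, rng):
    """Variance of the mini-batch mean gradient across randomly drawn batches."""
    means = [statistics.mean(rng.sample(per_example_gradients, batch_size))
             for _ in range(trials)]
    return statistics.pvariance(means)


rng = random.Random(0)
# Synthetic per-example gradients of a single weight, invented for illustration.
gradients = [rng.gauss(0.0, 1.0) for _ in range(2000)]

small_batch_var = minibatch_mean_variance(gradients, batch_size=4, trials=500, rng=rng)
large_batch_var = minibatch_mean_variance(gradients, batch_size=64, trials=500, rng=rng)

print(small_batch_var, large_batch_var)
```

The variance of the batch average shrinks roughly in proportion to the batch size, so updates computed from larger batches fluctuate less from step to step.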

Figure 5. Softmax loss over steps during the training phase.

Although the models differ from each other, to standardize the results the same learning rate, batch size and number of training steps (5000) were chosen after several experiments. As shown in Figure 5, the convergence of the softmax loss indicates the success of transfer learning on the new banknote dataset. However, these are not the final models for the application. After deciding on the model architecture, we need to optimize the model by manually tuning many hyper-parameters. For example, the training steps, learning rate, batch size and the distribution of the training and testing datasets are all essential elements. After successful fine-tuning, the model is able to correctly recognize images downloaded from the web as well as banknotes that are partly covered or folded (shown in Figure 6). After the model is finalized, we convert it to a C++ compatible format and plug it into the self-designed front-end platform.

Figure 6. Extra testing images.

Conclusions

In this paper, we demonstrate the whole process of making a neural network application, from dataset preparation and model design to application implementation. Although many applications apply new A.I. technology, such as autonomous driving systems and A.I. chatbots, applications seldom target the minorities of society. The Hong Kong Banknote Recognizer is an application specifically aiming [...] definitely compensate the disadvantages of the visually impaired. In fact, we have received positive feedback from users in the blind community, which shows that minorities also have the right to enjoy the rapid development of technology. The application is available in the Google Play Store [15] and the iTunes App Store (Figure 7).

Figure 7. HK Banknote in the iTunes App Store.

Acknowledgement

We would like to thank Axon-labs Limited for initiating this research project and assisting with the software development of the Hong Kong Banknote Recognizer.

References

[1] Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 253–256. IEEE, 2010.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.

[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.

[4] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. doi:10.1109/cvpr.2015.7298682.

[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[7] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proc. 27th International Conference on Machine Learning, 2010.

[8] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[9] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial Transformer Networks. arXiv preprint arXiv:1506.02025, 2015.

[10] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320–3328, 2014.

[11] VGG16 - K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition [...]

[12] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.

[13] C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.

[14] M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.

[15] NB Camera/HKBanknote, https://play.google.com/store/apps/details?id=com.axonlabs.banknoterec