2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018)
ISBN: 978-1-60595-065-5
Banknote Recognizer: From Theory to Application

Long-yin YUNG1, Ming-hua XIA2 and Yik-chung WU1,*
1Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong
2School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
*Corresponding author

Keywords: Deep learning, Image classification, Mobile application.
Abstract. In recent years, deep Convolutional Neural Networks (CNN) have demonstrated robust performance and reached state-of-the-art results in many image processing tasks, such as object detection and image classification, as well as in some natural language processing tasks. However, most studies focus on the design of the model architecture, especially on standard datasets such as MNIST or ImageNet, and only a few bring these advanced technologies into real-life applications [1]. In this study, we introduce well-designed models for image classification and demonstrate how such a model is applied in a mobile environment. For data preparation, different data collection methods and different ways of creating a dataset were evaluated. We further investigate the advantages and restrictions of the mobile neural network model. The results illustrate that performance stays consistent when the model is transferred from the computer environment to the mobile environment. In addition, a real-life product was built on the theory and investigation in co-operation with a local blind society and a software development company, forming the first real-time A.I. application for the visually impaired with high mobility and a built-in neural network model, called "Hong Kong Banknote Recognizer".
Introduction

Deep Neural Networks (Deep Learning), a branch of machine learning, have been well developed and have found a wide range of applications in recent years. In the early stage, they existed only in the form of the multi-layer Perceptron (MLP) network with backpropagation, until the breakthrough of AlexNet by Alex Krizhevsky and Geoffrey Hinton [2]. AlexNet demonstrated the high capability of neural networks, which were no longer restricted to the small, grey-scale image dataset of MNIST but were able to solve a classification task with thousands of classes in ImageNet [3]. The success of deep learning in large-scale image classification greatly increased its popularity and accelerated its development, including the community and the software and hardware around it, and attracted more people to contribute. At the same time, with the rapid development of the technologies and models, they have been successfully transferred and applied to similar tasks. In 2015, Google reached excellent performance in face recognition, with 99.63% accuracy on the Labeled Faces in the Wild (LFW) dataset and 95.12% on the YouTube Faces Database [4]. The high accuracy and solid performance show the robustness of neural networks and indicate the maturity of the technology. As a result, more products and applications have been built on it and have started to merge into our daily life, such as the self-driving car by Tesla and face verification in the Apple iPhone X.

At the same time, only a few applications aim to benefit the underprivileged groups in society, and the situation in Hong Kong is more serious. Different versions of banknotes circulate in the market, and only the versions issued after 2010 are embossed with braille and tactile lines. As time passes, these features of the banknotes diminish, and hence we attempt to make use of the recognition power of A.I. to enhance the living experience of the visually impaired.

In this paper, we introduce the modern architectures of convolutional neural networks in image processing. The methods for creating a dataset that can be applied to deep learning are studied. By applying transfer learning, we can take advantage of existing works and keep developing based on them. This effectively reduces the time for training and maintains good performance.

The paper is organized as follows. Section II introduces the model architectures with their novel features, including the inception module [5] and the residual connection [6]. Section III demonstrates the dataset created for this specific usage. Section IV discusses the methodology for creating the application, and Section V discusses and evaluates the results.
Related Work

In 2012, Alex Krizhevsky introduced a breakthrough technology using the concept of the Convolutional Neural Network for image processing and published a paper named "ImageNet Classification with Deep Convolutional Neural Networks" [2]. The success of the model depends not only on the kernel filters with convolution; it also implemented the Rectified Linear Unit (ReLU) [7], normalisation, max-pooling, dropout and data preprocessing methods to achieve robust performance. Once the model was able to solve image classification with 1000 classes, a new era of deep learning was established.
Convolutional Layers

The motivation for developing the CNN was mainly related to the capacity and performance of the network. Consider the task of classifying an image of size 256x256. For a traditional MLP network, the weights of the first layer alone already amount to 5,000,000 parameters because of the 65536 pixels in a 256x256 image. The high number of parameters not only consumes high computational power but also makes it more difficult for the weights to converge by back-propagation, which greatly affects the generalization of the results. The CNN exploits the spatial correlation within the image and reduces the number of parameters, which sharply increases the computational efficiency during the training phase.
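The parameter argument above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the hidden-layer width of 76 units is an assumption chosen so that the first-layer weight count lands near the 5,000,000 quoted in the text (the text does not state the width), and the 3x3 kernel with 64 filters is likewise a hypothetical convolutional layer.

```python
# Weight counts for the first layer of an MLP versus a convolutional layer.
# HIDDEN_UNITS, KERNEL and FILTERS are illustrative assumptions, not values from the paper.

HEIGHT, WIDTH = 256, 256          # grey-scale input image from the text
HIDDEN_UNITS = 76                 # assumed MLP width (gives roughly 5 million weights)
KERNEL, FILTERS = 3, 64           # assumed convolutional layer: 64 filters of size 3x3

pixels = HEIGHT * WIDTH                        # number of input values
mlp_weights = pixels * HIDDEN_UNITS            # every pixel connects to every hidden unit
conv_weights = KERNEL * KERNEL * 1 * FILTERS   # weights are shared across all positions

print(pixels)        # 65536
print(mlp_weights)   # 4980736
print(conv_weights)  # 576
```

Under these assumptions the convolutional layer needs several thousand times fewer weights, because each filter is reused at every image position instead of being tied to one pixel.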
Figure 1. Structure of Convolutional neural network.
Figure 1 demonstrates the working principle of a convolutional neural network. The spatial information is squeezed out through the layers, while the parameters that map to the content of the image remain. As shown in the figure, the height, width and depth of the image vary through the different layers. For a simple neural network, the equation of each neuron is normally a simple logistic regression (1),

$$y_i^{l} = g\left(W y^{l-1} + b\right) \qquad (1)$$

where $W = w_i^{T}$ represents the weights, $b$ is the bias, $g$ is the ReLU activation function $g(x) = \max(0, x)$ [7], and $y_i^{l}$ represents the input or output in layer $l$. For a neuron in a convolutional neural network, however, the spatial information of the inputs is preserved because the calculation is carried out in two or higher dimensions (2):

$$y_{ij}^{l} = g\left(\sum_{p=0}^{m-1} \sum_{q=0}^{m-1} w_{pq}\, y_{(i+p)(j+q)}^{l-1}\right) \qquad (2)$$

A filter $w$ of size $m \times m$ generates an output of size $(N - m + 1) \times (N - m + 1)$ if the input has $N \times N$ dimensions, and the depth corresponds to the number of filters or the channels of the image. The output $y_{ij}^{l}$ takes the summation of all values from the previous layer weighted by the filter and hence obtains the feature representations from the previous layers.

Apart from convolution, the pooling layer also greatly reduces the total number of parameters and increases the performance by summarizing the features from the layer above. Finally, by connecting all the flattened filters from the convolutional layers to a fully connected layer, a simple convolutional neural network is constructed.
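As a small illustration of the convolution described above, the sketch below slides an m x m filter over an N x N input, applies ReLU, and yields an (N - m + 1) x (N - m + 1) output. It is a minimal sketch with plain Python lists: the helper name conv2d_relu, the input values and the filter are made up for illustration, and no bias term or stride is modelled.

```python
# Valid 2-D convolution followed by ReLU, written with plain lists for clarity.
# conv2d_relu is a hypothetical helper name; no bias term is used here.

def relu(x):
    return max(0.0, x)

def conv2d_relu(image, kernel):
    """Slide an m x m kernel over an N x N image; the output is (N-m+1) x (N-m+1)."""
    n, m = len(image), len(kernel)
    out_size = n - m + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for p in range(m):
                for q in range(m):
                    total += kernel[p][q] * image[i + p][j + q]
            row.append(relu(total))
        output.append(row)
    return output

image = [[3, 1, 2, 0],
         [1, 0, 1, 4],
         [2, 5, 0, 1],
         [0, 1, 3, 2]]
kernel = [[1, 0],
          [0, -1]]   # responds to a drop in intensity along the diagonal

feature_map = conv2d_relu(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 3 3
print(feature_map)
```

With N = 4 and m = 2 the output is 3 x 3, and negative responses are clipped to zero by the ReLU.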
Inception Module

For a CNN, one of the major concerns is the representational bottleneck in each layer, especially in the lower layers. Filters or kernels of different sizes generate completely different feature maps, all of which are basically valuable and may benefit the model. Information is lost if only one size is fixed. As a result, it is difficult to compare the contribution of filters of different sizes and decide on the perfect one. Normally, the best filter size is determined by experience and experiments. In light of this situation, inception modules (shown in Figure 2) provide a solid solution and a novel design principle for solving the problem.
Figure 2. Structure of inception module, which acts as a layer inside a neural network.
The final output of each module is concatenated from convolution filters of different sizes or from filters followed by a pooling layer. The main feature of the module is that it provides paths for the model to choose the best feature representation by itself. This prevents the situation where information is lost because a certain filter size is fixed. Another feature of the module is the 1x1 convolutional filter. It does not change the spatial size but changes the depth of the output, which means it controls the dimensions in the channels of the image. For example, two 1x1 convolutional filters applied to a 28x28 RGB image reduce the dimension from 28x28x3 to 28x28x2, integrating the information across channels while keeping the plane information of the image. Owing to the high flexibility of the module, it also provides higher computational efficiency. Although the structure looks more complicated, the computational time needed for training is unexpectedly low.
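The 28x28x3 to 28x28x2 example above can be reproduced with a short sketch. A 1x1 convolution is simply a weighted sum over the channels of each pixel. The helper name conv1x1, the dummy image and the filter weights below are arbitrary illustrative choices, not values from the paper.

```python
# 1x1 convolution: mixes channels at each pixel without changing height or width.
# The weights below are arbitrary illustrative values.

def conv1x1(image, filters):
    """image: H x W x C_in nested lists; filters: C_out lists of C_in weights."""
    return [[[sum(w * c for w, c in zip(f, pixel)) for f in filters]
             for pixel in row]
            for row in image]

H, W = 28, 28
rgb_image = [[[1.0, 2.0, 3.0] for _ in range(W)] for _ in range(H)]  # dummy 28x28x3 input
filters = [[0.5, 0.5, 0.0],   # first 1x1 filter: average of R and G
           [0.0, 0.0, 1.0]]   # second 1x1 filter: copy of B

reduced = conv1x1(rgb_image, filters)
print(len(reduced), len(reduced[0]), len(reduced[0][0]))  # 28 28 2
```

The height and width stay at 28 while the depth drops from 3 to 2, one output channel per 1x1 filter.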
Residual Connection

For residual modules (shown in Figure 3), the idea came from the investigation of deep neural networks. Theoretically, the more layers exist in the network, the more features can be extracted from the input. In fact, a paper from Microsoft Research [6] indicated that a bottleneck exists when the model attempts to get deeper. Experiments clearly showed a decrease in performance when extra layers were added to an existing model. When the number of layers is already saturated, such as in a 56-layer model, the model gets a higher error rate in training and inference than a 20-layer model on CIFAR-10. Both the training error and the testing error are higher for the deeper model. Ideally, the extra layers would give the same results as the layers above them. However, experiments indicated that the network could not simply find such an optimized solution. As a result, a residual module was suggested to solve the problem (3),

$$H(x) = F(x) + x \qquad (3)$$
Figure 3. Residual mapping.
where $H(x)$ is the output, $F(x)$ is the module between input and output, and $x$ is the input. One of the major problems of a deep neural network is the vanishing or exploding gradient caused by squashing and activation functions. To solve this problem, the residual connection offers the network an option to skip the module if no features are successfully extracted from the input. Experiments showed that, with the residual module, better or equal performance is obtained when the number of layers in the network increases. Besides, by offering the identity function to the model, the time needed for training decreases greatly compared to a conventionally constructed neural network with the same number of layers.
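The residual mapping H(x) = F(x) + x can be illustrated on a plain vector. The sketch below is a toy under stated assumptions: residual_block, zero_layer and scale_layer are hypothetical names, and the two layers stand in for a learned module F; they are not the convolutional blocks of ResNet.

```python
# Residual mapping H(x) = F(x) + x on a plain vector.
# residual_block and the toy layers are hypothetical names for illustration.

def residual_block(x, F):
    """Return F(x) + x element-wise; F must keep the length of x."""
    fx = F(x)
    assert len(fx) == len(x), "F(x) and x need the same shape to be added"
    return [f + xi for f, xi in zip(fx, x)]

def zero_layer(x):
    # A layer that has extracted no features: F(x) = 0 for every input.
    return [0.0 for _ in x]

def scale_layer(x):
    # A toy layer that does extract something: F(x) = 0.1 * x.
    return [0.1 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_layer))   # the block reduces to the identity
print(residual_block(x, scale_layer))
```

When F contributes nothing, the block passes its input through unchanged, which is the "skip" behaviour described above.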
Inception-Residual Modules

With the success of both the inception and the residual modules, Google combined the benefits of the two modules and merged them to create the Inception-ResNet models [8]. The combined modules reached the highest performance among all the models and became the state-of-the-art model and technology. Given the solid performance and the low computational time for training, most tasks based on it can be expected to succeed.
Dataset

In this section, efficient methods for creating a suitable dataset are presented. The major limitations on the success of a deep neural network are the size of the training data and the diversity of the data. As a deep neural network requires a huge amount of data, collecting a large amount of relevant and labelled data is inevitable. However, there is no explicit guidance on the minimum amount of data needed, and the performance varies with the quality of the dataset. Besides, one of the restrictions of the convolutional neural network is that it lacks spatial invariance to the input data. It is difficult for it to infer from images captured at different angles or under different light intensity. One possible solution is to use another architecture, such as Spatial Transformer Networks [9], to handle the spatial information of the image. Another possible way is to enhance the dataset by covering such scenarios.

In light of this situation, two datasets were created. One consists of normal training images created directly from recorded videos of both sides of the banknotes. The other is a special testing dataset that represents real-life situations, for example banknotes captured at different angles and under different light intensity.
Table 1 provides information on the banknote videos and the images created from them. To create a dataset efficiently, the proposed method exports the frames of the movies at 30 fps, with the banknotes recorded from different angles. This greatly reduces the time needed to prepare the data and ensures the quantity of data generated for training a classifier. Besides, the imbalance of the distribution is mainly due to the issuers of the banknotes: the $10 banknote has only one type, while the $20 banknote has some old versions that are still circulating in the market.
Table 1. Number of images and represented proportion for each class in the training dataset created from the movies.

Training Dataset (Movie)
Banknote (HKD)   No. Videos   Duration (seconds)   No. Images   Proportion
$10              5            30                   930          9.8%
$20              31           155                  4870         51.3%
$50              15           75                   2338         24.6%
$100             11           43                   1355         14.3%
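The proportions in Table 1 follow directly from the image counts. The short check below recomputes them and makes the class imbalance explicit. It only uses the image counts listed in the table, and the dictionary name is an illustrative choice.

```python
# Class proportions recomputed from the image counts in Table 1.
images_per_class = {"$10": 930, "$20": 4870, "$50": 2338, "$100": 1355}

total = sum(images_per_class.values())
proportions = {note: round(100.0 * count / total, 1)
               for note, count in images_per_class.items()}

print(total)        # 9493
print(proportions)  # {'$10': 9.8, '$20': 51.3, '$50': 24.6, '$100': 14.3}
```

More than half of the training images belong to the $20 class, which is the imbalance discussed above.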
Table 2 provides the number of images used for training and testing. For the testing dataset, all the banknotes in the images were captured one by one with the camera. To ensure that the model performs in testing as it would in the real-life application, all the testing images have features that reflect how mobile phones are actually used. First of all, instead of brand-new banknotes, the banknotes in the testing images have visibly been in use for a long period of time. Some of them are folded, scratched or creased; they are captured from different angles, or the banknote is placed just at the corner of the image. Besides, the images are captured under a variety of natural light intensities. For example, some of the images are dim because the light source is blocked, and some are under high light exposure.
Table 2. Number of images for each class in the training and testing datasets.

Banknote (HKD)   Training Dataset (No. Images)   Testing Dataset (No. Images)
$10              930                             48
$20              4870                            253
$50              2338                            87
$100             1355                            128
Methodology

In this section, the main design workflow of the banknote recogniser and a useful training technique called transfer learning [10] are presented. In our work, the banknote images are used to train four famous CNN models, VGG16, ResNet, Inception and InceptionResNet, and the best model is finalized for implementation. Some of the ideas were presented and discussed in Section II. The details of the models are introduced in this section, and the results and performance of the models are discussed in the next section. By understanding and comparing different model architectures, it is possible to decide on the most suitable structure for this specific task. Besides, with the benefit of transfer learning, the difficulty of training such deep networks decreases greatly and solid performance of the models is also ensured. At the end, the trained model is put into the mobile environment to form a real-time classifying application.
VGG16 [11]

The number 16 represents the depth of 16 layers of the convolutional neural network. It is one of the early CNN models that reached state-of-the-art results in the large-scale image classification task. The design principle of this model is simple: 13 convolutional layers with pooling layers, followed by 3 fully connected layers. In order to increase the ability to learn detailed features, it uses small 3x3 convolutional filters instead of the larger filters (up to 11x11) used in AlexNet [2]. Besides, as the network goes deeper, it contains more filters to capture the detailed features of the inputs. However, as the number of parameters is more than 130 million, the hardware requirement increases greatly.
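The figure of more than 130 million parameters can be reproduced by counting weights and biases layer by layer. The sketch below assumes the standard VGG16 configuration described in [11] (224x224x3 input, 3x3 convolutions with biases, five 2x2 poolings, 1000 output classes). The paper does not list this configuration, so these values are assumptions, and vgg16_parameters is a hypothetical helper name.

```python
# Parameter count of VGG16, assuming the standard configuration from [11]:
# 224x224x3 input, 3x3 convolutions with biases, five 2x2 poolings, 1000 output classes.

CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                 512, 512, 512, 512, 512, 512]      # 13 convolutional layers
FC_UNITS = [4096, 4096, 1000]                       # 3 fully connected layers

def vgg16_parameters(in_channels=3, input_size=224, pools=5):
    conv_total = 0
    channels = in_channels
    for out_channels in CONV_CHANNELS:
        conv_total += 3 * 3 * channels * out_channels + out_channels  # weights + biases
        channels = out_channels

    spatial = input_size // (2 ** pools)        # 224 -> 7 after five 2x2 poolings
    features = spatial * spatial * channels     # flattened input to the first FC layer
    fc_total = 0
    for units in FC_UNITS:
        fc_total += features * units + units    # weights + biases
        features = units
    return conv_total, fc_total

conv_total, fc_total = vgg16_parameters()
print(conv_total)             # 14714688
print(fc_total)               # 123642856
print(conv_total + fc_total)  # 138357544
```

Under these assumptions most of the parameters sit in the fully connected layers rather than in the 13 convolutional layers.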
ResNet [6]

As described in Section II, although VGG16 achieves good performance, the performance becomes worse when the network goes deeper. This is mainly due to the vanishing or exploding gradient problem during training [12], which ResNet addresses and solves. Ideally, if one of the layers in the neural network is saturated, which implies that no extra features are learned or extracted, the input and output of such a layer should obviously be identical. However, experimental results show that a layer cannot explicitly find the perfect weights by itself, hence adding a residual mapping is one of the solutions. Similar to the design of VGG, 3 convolutional layers are combined to form a "Block". The input and output of each block are linked by a residual mapping, which not only increases or maintains the performance when going deeper but also increases the efficiency of back-propagation. ResNet is available with 50, 101 or 152 layers; in our work, we pick the 101-layer version for our proposed model.
Inception [8] and InceptionResNet [13]

Section II presented the core idea of the inception modules. In Inception and InceptionResNet, the modules are different from the concept of the block in ResNet. Inspired by [14], the modules are more like a sub-network inside a neural network. This strengthens the ability of the network to extract features, particularly for the local region near the input, which plays an important role as the network learns features layer by layer. The design of the modules in both models is similar. In addition to the residual mapping, InceptionResNet uses more techniques to make the model more computationally efficient. One of the major differences is that it has more 1x1 convolutional layers for dimension reduction. After many hands-on experiments and tests, some special architectures were adopted, such as using 1x7 and 7x1 convolutional filters instead of a 7x7 filter directly, or removing batch normalisation. Some convolutional filters in InceptionResNet are not followed by an activation function. Being based on many hand-crafted adjustments, the inception modules and the architecture of InceptionResNet are comparatively complicated and unnatural, but they show better performance.
Transfer Learning

As mentioned in [10], CNNs for image processing tend to share the same weights in the lower layers, and the features learned there are independent of the tasks and datasets. The investigation of this property promoted transfer learning, in which the parameters are shared among different tasks. We can therefore make use of this property to train our desired model efficiently with satisfactory performance based on previous work. Figure 4 demonstrates the concept of transfer learning.
Figure 4. Workflow for transfer learning.
For all our trained models, we keep all the weights in the middle layers (convolutional layers), as all the filters are well trained on a highly demanding dataset such as ImageNet or other large datasets. By removing the last fully connected layer and constructing a newly designed layer, the model is able to transfer the learned feature representation from the previous work to the new dataset, and hence the model can be optimized easily. However, transfer learning is only capable of handling similar tasks. It is impossible to make use of it in other fields, such as sound or text classification, where the pre-trained weights would become a barrier for the new tasks.

Besides, one common feature of all the model architectures proposed above is that they are highly demanding on the hardware configuration. The number of weights to be trained reaches about 130 million for VGG16 alone. This restricts development when there are not enough hardware resources, such as multiple GPUs. We are unable to train the models from scratch on our own; with a slower CPU, training could take months for a model to converge.
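The idea of keeping the pre-trained layers fixed and training only a new last layer can be sketched in a few lines. The example below is a toy stand-in under stated assumptions, not the training pipeline used in this work: FROZEN_W plays the role of pre-trained filters with hand-picked values, the two-class data are invented, and all function names are hypothetical.

```python
import math

# Frozen "pre-trained" layer: fixed weights standing in for convolutional filters
# learned on a large dataset. These values are hand-picked for illustration.
FROZEN_W = [[1.0, -1.0, 0.0, 0.0],
            [0.0, 1.0, -1.0, 0.0],
            [0.0, 0.0, 1.0, -1.0],
            [-1.0, 0.0, 0.0, 1.0]]
NUM_CLASSES = 2

def extract_features(x):
    """Frozen feature extractor (linear map + ReLU); never updated during training."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# New last layer, trained from scratch on the new task.
head_w = [[0.0] * len(FROZEN_W) for _ in range(NUM_CLASSES)]
head_b = [0.0] * NUM_CLASSES

def predict(x):
    feats = extract_features(x)
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(head_w, head_b)]
    return softmax(logits)

def train_head(data, epochs=100, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss, head parameters only."""
    for _ in range(epochs):
        for x, label in data:
            feats = extract_features(x)
            probs = predict(x)
            for k in range(NUM_CLASSES):
                grad = probs[k] - (1.0 if k == label else 0.0)  # d(loss)/d(logit_k)
                head_b[k] -= lr * grad
                for j, f in enumerate(feats):
                    head_w[k][j] -= lr * grad * f

# Invented two-class toy data: (input vector, class label).
toy_data = [([1.0, 0.0, 0.0, 0.0], 0), ([0.9, 0.1, 0.0, 0.0], 0),
            ([0.0, 0.0, 0.0, 1.0], 1), ([0.0, 0.1, 0.0, 0.9], 1)]
train_head(toy_data)
predicted = [max(range(NUM_CLASSES), key=lambda k: predict(x)[k]) for x, _ in toy_data]
print(predicted)
```

Only head_w and head_b change during training, which is why far fewer parameters have to be optimized than when training the whole network from scratch.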
Mobile Environment

There are advantages to using Tensorflow for training the model. First, the Tensorflow community is large, which makes it easy for users to share resources such as pre-trained models. Second, Tensorflow is well developed, offering functions for converting the trained model from its Python form to C++ and for stripping or freezing the model, which greatly reduces the hardware requirement for inference.
Experiments and Discussion

In this section, we investigate the suitability of the different proposed model architectures for banknote recognition tasks. By comparing the training loss and the accuracy on the testing dataset, we are able to decide on the most suitable model architecture for our application.
Table 3. Accuracy of the testing dataset.

Model              Accuracy on testing dataset
VGG16              53.52%
ResNet_v1_101      9.18%
Inception          81.64%
Inception_ResNet   85.74%
Table 3 shows the results and the performance of the different models. By adding more advanced techniques to the CNN, such as inception and residual mapping, the performance of the model improves proportionally in the evaluation. Hence InceptionResNet is the finalized model for the application, as expected. The results obtained from VGG and ResNet can be explained with the training loss in Figure 5. As shown in the figure, the fluctuating training loss is mainly due to the mini-batch based gradient descent. The large differences between the gradients of each batch imply that the model fails to optimize from the data, as the gradients do not generally decrease over time. Increasing the batch size or changing the optimisation method would reduce the diversity between batches, but these solutions are not covered in this work. For ResNet, the low loss values show that the model learns the training dataset well and has a high accuracy on the training dataset. On the other hand, this implies that the model lacks the ability to generalise to a new dataset, as it has reached the state of overfitting. The weights of the model are only sensitive to the seen data. This is a common phenomenon if the model is trained for a long time.
Figure 5. Softmax loss over steps during training phase.
Although the models differ, the same learning rate, batch size and 5000 training steps were chosen after several experiments in order to standardize the results. As shown in Figure 5, the convergence of the softmax loss indicates the success of transfer learning for the new banknote dataset. However, these are not the finalized models for the application. After deciding on the model architecture, we need to optimize the model through extensive hand-crafted hyper-parameter tuning. For example, the training steps, learning rate, batch size and the distribution of the training and testing datasets are all essential elements. After successful fine-tuning, the model is able to correctly recognize images downloaded from the web as well as banknotes that are covered or folded (shown in Figure 6). After the model is finalized, we are able to convert it to a C++ compatible format and plug it into the self-designed front-end platform.
Figure 6. Extra testing images.
Conclusions

In this paper, we demonstrate the whole process of making a neural network application, from dataset preparation and model design to application implementation. Although many applications apply the new A.I. technology, such as auto driving systems and A.I. chatbots, applications seldom target the minorities of society. Hong Kong Banknote Recognizer is an application aimed specifically at this group, and it can definitely compensate for the disadvantages of the visually impaired. In fact, we have received positive feedback from users in the blind society, which shows that minorities also have the right to enjoy the rapid development of technology. The application is available in the Google Play Store [15] and the iTunes App Store (Figure 7).
Figure 7. HK Banknote in iTunes App Store.
Acknowledgement

We would like to thank Axon-labs Limited for initiating this research project and assisting with the software development of the Hong Kong Banknote Recognizer.
References

[1] Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 253-256. IEEE, 2010.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
[4] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. doi:10.1109/cvpr.2015.7298682.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[7] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proc. 27th International Conference on Machine Learning, 2010.
[8] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial Transformer Networks. arXiv preprint arXiv:1506.02025, 2015.
[10] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320-3328, 2014.
[11] VGG16 - K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition.
[12] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
[13] C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
[14] M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.
[15] NB Camera/HK Banknote, https://play.google.com/store/apps/details?id=com.axonlabs.banknoterec