• No results found

Intelligent facial emotion recognition using moth-firefly optimization

N/A
N/A
Protected

Academic year: 2021

Share "Intelligent facial emotion recognition using moth-firefly optimization"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Contents lists available at ScienceDirect

Knowle

dge-Base

d

Systems

journal homepage: www.elsevier.com/locate/knosys

Intelligent

facial

emotion

recognition

using

moth-firefly

optimization

Li

Zhang

a,∗

,

Kamlesh

Mistry

a

,

Siew

Chin

Neoh

b

,

Chee

Peng

Lim

c

aComputationalIntelligenceResearchGroup,DepartmentofComputingScienceandDigitalTechnologies,FacultyofEngineeringandEnvironment, UniversityofNorthumbria,Newcastle,NE18ST,UK

bFacultyofEngineering,TechnologyandBuiltEnvironment,UCSIUniversity,Malaysia

cInstituteforIntelligentSystemsResearchandInnovation,DeakinUniversity,WaurnPonds,VIC3216,Australia

a

r

t

i

c

l

e

i

n

f

o

Articlehistory: Received18March2016 Revised18August2016 Accepted19August2016 Availableonline22August2016 Keywords:

Facialexpressionrecognition Featureselection

Evolutionaryalgorithm Ensembleclassifier

a

b

s

t

r

a

c

t

Inthisresearch, weproposeafacialexpression recognitionsystemwithavariantofevolutionary fire-flyalgorithmforfeatureoptimization.Firstofall,amodifiedLocalBinaryPatterndescriptorisproposed toproduceaninitialdiscriminativefacerepresentation.Avariantofthefireflyalgorithmisproposedto performfeatureoptimization.Theproposedevolutionaryfireflyalgorithmexploitsthespiralsearch be-haviourofmothsandattractivenesssearchactionsoffirefliestomitigateprematureconvergenceofthe Levy-flightfireflyalgorithm(LFA)andthemoth-flameoptimization(MFO)algorithm.Specifically,it em-ploysthelogarithmicspiralsearchcapabilityofthemothstoincreaselocalexploitationofthefireflies, whereasincomparisonwiththeflamesinMFO,thefirefliesnotonlyrepresentthebestsolutions iden-tifiedbythemothsbutalsoactas thesearchagents guidedbythe attractivenessfunctiontoincrease globalexploration.SimulatedAnnealingembeddedwithLevyflightsisalsousedtoincreaseexploitation ofthemostpromisingsolution.Diversesingleandensembleclassifiersareimplementedforthe recogni-tionofsevenexpressions.Evaluatedwithfrontal-viewimagesextractedfromCK+,JAFFE,andMMI,and 45-degreemulti-viewand90-degreeside-viewimagesfromBU-3DFEandMMI,respectively,oursystem achievesasuperiorperformance,and outperformsotherstate-of-the-artfeature optimizationmethods andrelatedfacialexpressionrecognitionmodelsbyasignificantmargin.

© 2016TheAuthors.PublishedbyElsevierB.V. ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/ ).

1. Introduction

Facial expression recognition, which plays an important role inpatternrecognition,computer vision,andhumancomputer in-teraction,iswidely usedinpersonalised healthcare,videogames, surveillancesystems,humanoidservicerobots,andmultimedia.In recentstudies,manyalgorithmsfocusingonfacerecognition, gen-derandageestimation,andfacialemotionclassificationhavebeen developed.However,highdimensionalityisstillachallengingissue for such applications. Although many feature dimensionality re-ductiontechniqueshavebeenproposed,itisstilldifficultto iden-tifythemostsignificantdiscriminatingfeaturesthatbestrepresent withinandbetweenclassvariancesforemotionalfacialexpression. In thisresearch, we propose a facial emotion recognition sys-temwithavariantofevolutionaryfireflyalgorithmforfeature op-timization. The main aim of the proposed system is to identify

Correspondingauthor.

E-mail addresses: [email protected] (L. Zhang), kamlesh.mistry@ northumbria.ac.uk (K. Mistry), [email protected] (S.C. Neoh), chee.lim@ deakin.edu.au(C.P.Lim).

the most significant discriminative characteristics for each emo-tion category, in an attempt to address the above challenges. It integratesthespiralsearch behaviourofthemoths,attractiveness search actions of the fireflies,and SimulatedAnnealing (SA) em-bedded with the Levy flights to increase local exploitation and global exploration and, at the same time, mitigate the prema-tureconvergenceproblemoftheLevy-flightfireflyalgorithm(LFA) [1]and the moth-flame optimization (MFO) algorithm[2].

The proposed system is composed of three key steps, as il-lustrated inFig. 1. Firstly, anovel texture descriptoris proposed, which incorporates the use of Local Binary Patterns (LBP), Local GaborBinaryPatterns(LGBP),andLBPvariance(LBPV)[3]to cap-turelocal spatialpatterns andcontrast measures of localtexture toretrieveaninitialdiscriminativefacialrepresentation.Secondly, theproposed variantofthe fireflyoptimizationalgorithm isused toidentifythemostsignificantanddiscriminativefeaturesofeach emotioncategory.Thirdly,singleandensembleclassifiersareused for recognizing seven expressions (happiness, fear, disgust, sur-prise, sadness, anger, and neutral) based on the derived optimal featuresubsets.Evaluatedwithfrontal-viewimagesfromCK+ [4], MMI [5]andJAFFE [6], and45-degree multi-view and90-degree http://dx.doi.org/10.1016/j.knosys.2016.08.018

(2)

Fig.1. Thesystemarchitecture. side-viewimagesfromBU-3DFE[7]andMMI,respectively,the

em-pirical results indicate that our system shows a superior perfor-mance,andoutperformsotherstate-of-the-artoptimization meth-ods andrelatedfacial expressionrecognition modelsby a signifi-cantmargin.

Thecontributionsofthisresearcharesummarizedasfollows. 1. A novel texture descriptor incorporating LBP, LGBP, and LBPV

isproposedtoderiveaninitialdiscriminativefacial representa-tion.Theproposed descriptorisabletoextractspatialpatterns andcontrastmeasures ofeach imageregionforfacialanalysis, inordertobetterdealwithilluminationchanges,rotations,and scalingdifferences. Theempirical findings indicatethatit out-performs conventionaltexturedescriptors forfacial expression recognition.

2. A LFA variant, known asM-LFA, isproposed forfeature opti-mization.It exploresthespiralsearch behaviour ofthemoths and attractiveness search actions of the fireflies to mitigate the premature convergence problem of the conventional LFA andMFOmodels.Specifically,itemploysthelogarithmicspiral search of the mothsto increase local exploitation of the fire-flies. In comparison with the flames in MFO, the fireflies not onlyrepresentthe best solutionsidentified by the moths,but alsoactasthesearchagentsguidedbytheattractiveness func-tiontocausesuddenmovementsofthefirefliesandassociated mothstoincreaseglobalexploration.Therefore,itincreases lo-calexploitation ofLFAandglobalexploration ofMFOto guide the search process towards global optima. SA-embedded Levy flights diversification isalsoused to furtherimprovelocal ex-ploitationoftheidentifiedcurrentglobalbestsolution.Overall, theproposedstrategiesworkcooperativelytoavoidpremature stagnationwhileguidingthesearchprocesstowardsglobal op-tima.

3. Evaluatedwithfrontal-view imagesextractedfromCK+,MMI, andJAFFEdatabasesandmulti-viewandside-viewimagesfrom BU-3DFE andMMI, respectively, the empiricalresults indicate that the proposed system showssuperior capabilities of find-ing local and global optima simultaneously, and outperforms a number of conventional and state-of-the-art metaheuristic searchmethodssuchasParticleSwarmOptimization(PSO), Ge-netic Algorithm(GA), MFO,LFA, and other PSOandfirefly al-gorithm (FA)variants, non-evolutionaryfeature selection algo-rithms, as well as other related facial expression recognition modelsreportedintheliteraturebyasignificantmargin. The paperisorganised inthefollowingway.Therelatedwork on facial expression recognitionandstate-of-the-art feature

opti-mizationtechniquesarediscussedinSection2.Section3describes the key stagesof the proposed system, which includefacial fea-tureextraction using the proposed LBP descriptor in Section 3.1, and the proposed M-LFA algorithm for feature optimization in Section 3.2. Section 4 presents the evaluation of the proposed systemusing frontal-view images extractedfromCK+,MMI, and JAFFE and multi-view and side-view images from BU-3DFE and MMI, respectively. Section 5 presents some concluding remarks ofthisresearch, andidentifiesa numberofdirections forfurther work.

2. Relatedwork

In this section, we review state-of-the-art research on evolu-tionaryfeatureoptimizationandfacialexpressionrecognition. 2.1. Facialexpressionrecognition

Manyfacialexpressionrecognitionapplicationshavebeen pro-posed recently. Zhang et al. [8] proposed a multimodal learning method to learn the joint representation fromtexture and land-markmodalitiesof facialimages.A structuredregularization (SR) incombinationwithanauto-encoderwasproposedintheirwork tolearn sparsityanddensityfromeach modality to generatethe jointrepresentation.Featureextractionandclassificationwerealso combinedtogetherintheirproposedmodel.Theproposedmethod wasalsocapableofdealingwithexpressionrecognitiontaskswith headposevariations.EvaluatedwithCK+andNVIEdatabases,the work showed superiority over other methods. Ali et al. [9] pro-posedafacial expressionrecognitionsystemwithempiricalmode decomposition(EMD) based feature extraction.The EDM method decomposed1-Dfacialsignalintoasetofintrinsicmodefunctions (IMFs), in which the first IMF was considered as facial features forexpressionclassification.Theirworkemployedthreefeature di-mensionalityreductiontechniques,i.e.,PrincipalComponent Anal-ysis(PCA)withLinearDiscriminantAnalysis(LDA),PCAwithLocal Fisher DiscriminantAnalysis (LFDA),andKernelLFDA (KLFDA), to performoptimizationoftheextractedEMD-basedfeatures.Several classifiersincludingtheExtremeLearningMachinewithRadial Ba-sis Function(ELM-RBF) were usedto classify seven facial expres-sions.JAFFEandCK+databaseswereemployedforsystem evalua-tion.Shojaeilangarietal.[10]proposedaspatio-temporal descrip-tornamed Histogram of Dominant Phase Congruency(HDPC) for facialexpressionrecognitionfromvideosequences.Thisproposed descriptorextendedthephase congruencyconceptto 3D,and in-corporatedhistogrambinningtodescribebothmotionand

(3)

appear-ance based features. The experimental results indicated that the proposedsystemwascapableofdealingwithexpression recogni-tiontasks withscaling variationsandillumination changes. Eval-uated with CK+ and AVEC 2011 video sub-challenge, the work showedimpressive performanceintermsofrobustnessand accu-racy.

Siddiqi etal.[11]proposed afacialexpression recognition sys-temthat applieda stepwise lineardiscriminantanalysis(SWLDA) for feature extraction and the hidden conditional random fields (HCRFs)modelforexpressionrecognition.Intheirwork,facial ex-pressionswere divided into three categories,i.e. lips-based, lips-eyes-based,andlips-eyes-forehead-based.Thefirststepwastouse SWLDAandHCRFtoclassifyan inputimageintooneofthethree categories.Secondly, SWLDAand HCRFtrained only fora partic-ular category were used to determine an emotion label. Evalu-atedwithCK+,JAFFE,MMIandExtendedYaleBFacedatasets,the work achieved significant improvement on recognition accuracy, butwithexpensivecomputationalcosts.Neohetal.[12]proposed afacialexpressionsystemwithdirectsimilarityandPareto-based featureoptimization.The formerwasintegratedwiththeconcept ofmicro GA to identify feature subsets that could representthe commonfeatures ofeach expression.The lattertook both within andbetweenclassvariationsintoaccount formulti-objective fea-ture optimization. Integrated with diverse ensemble classifiers, the work achieved impressive performances when tested with CK+andMMI. ThePareto-basedapproach wasalsoproven tobe moreefficientindealingwithchallengingfeatureoptimizationfor frontal and side-view images. Zhang et al. [13] conducted real-time 3D facial Action Unit (AU)intensity estimation and expres-sion recognition. Their work employed the minimal-redundancy-maximal-relevance(mRMR)criteriontoidentifyasetof16feature subsets among the initially extracted raw facial features, which were then used to estimate the intensities of 16 diagnostic AUs. Anovelensemble classifierwasintegratedwithaclustering algo-rithmforclassification ofsix universalfacial expressions and de-tectionofnewlyarrived, unseennovelemotion classes(thosenot includedin the trainingdataset). In their work, a distance-based clusteringmethodandtheuncertaintymeasuresofthebase clas-sifierswithin each ensemblewere used fornovelclass detection. EvaluatedusingtheBosphorus 3D databaseandrealhuman sub-jects,thesystemachievedimpressiveperformancesfor identifica-tionofsixemotionsandnovelunseenexpressions.

2.2.Featureselectionandoptimizationalgorithms

Evolutionary algorithms have been widely employed for fea-ture optimization because of their impressive search capabilities [14,15].Inthissection,wediscussstate-of-the-artevolutionary fea-tureoptimizationalgorithms,whichincludethemostpopular con-ventionalsearch algorithms such as PSO, GA, and SA, and other hybridmodels.

2.2.1. Geneticalgorithm

MotivatedbytheDarwinianprincipleof‘survivalofthefittest’, theGAwasdevelopedbyHolland[16].Itemploysthree evolution-aryoperators,i.e.crossover,mutation,andselection.Thecrossover operator generates two offspring by exchanging part of a chro-mosomewith thecorresponding part ofanother. We employthe single-pointcrossoveroperatorinthisresearch.The mutation op-erator randomly changes one or more bits of an offspring chro-mosome in order to produce new genetic characteristics. Selec-tion ensures the highest quality chromosomes will be selected and propagated to the next generation to enhance the conver-gence property of the algorithm. Crossover helps local exploita-tiontoenhance convergence,while mutationbrings search diver-sityandincreasesglobalexploration.Accordingtotheoretical

stud-ies, ahigher crossover probability inthe rangeof [0.6,0.95]and alowermutationprobabilityintherangeof[0.001,0.05]are usu-allyrecommended.Thesesettingsenableahigherleveloflocal ex-ploitationandalowerdegreeofglobalexplorationtoreachglobal optimality[14].

2.2.2. Particleswarmoptimization

Motivatedby swarm behaviourssuchasbird flockingandfish schooling, PSO was proposed by Kennedy and Eberhart [17]. In PSO,anumberofparticlesmoveinthesearchspacebyfollowing theswarmleader,inordertofindtheoptimalsolutions.Itrecords thebestpositioneverachievedbyaparticleasthepersonalbest, p_best, and the best position of the overall swarm asthe global best,g_best.PSOemploysthefollowingstrategiesforupdatingthe positionandvelocityofeachparticle.

xtid+1= xtid+

v

t+1 id (1)

v

t+1 id = w×

v

t id+ c1 ×r

pidxtid

+ c2 ×r

pgdxtid

(2) where xt+1 id and

v

t+1

id represent the position andvelocity of each

particleinthe(t+1)thiterationinthedth dimension,respectively. Aninertiaweight,w,isalsointroducedtoadjusttheeffectsofthe previous velocitytothecurrentone. Moreover,pidandpgd

repre-sent thepersonal best(p_best)andglobalbest(g_best) inthe dth

dimension,respectively.c1 andc2 denotethelearningparameters oraccelerationconstants,whereas r1 andr2 indicatetwo random

parametersbetween[0,1].

Overall,PSOisawidelyusedswarm-basedalgorithmowingto itssimplicityandflexibility.Howeverithaslimitedexploration ca-pability,andtendstobetrappedinlocaloptima[14].

2.2.3. Simulatedannealing

Proposedby Kirkpatricketal.[18],SA simulatestheannealing processinmaterialprocessingthatusually requiresa careful con-troloftemperatureanditscoolingrate.SAhasbeenwidelyusedin diverseoptimizationproblems.SAemploysarandomsearch tech-nique for global exploration. It acceptsnot only better solutions butalso those lessideal solutions witha probability,p,as defined inEq.(3). p= exp

f T

> r (3)

where

fdenotesthechangeofthefitnessfunctionbetweenthe newandprevioussolutions.Trepresentsthetemperaturefor con-trollingtheannealingprocesswithrasarandomvalueuniformly distributedin[0,1].As anexample,inaminimization problem,a newsolutionwitha higherfitnessvalue thanthat ofthecurrent solutionwillbeacceptedwithprobabilityp

InSA,theannealingschedule(i.e.thecoolingschedule),which controls the decreasing rateof the temperature, plays an impor-tantroleininfluencinglocalexploitationandglobalexploration.In thisresearch,weemployageometriccoolingschedule,i.e.T=

α

T, wherethetemperatureisdecreasedbyacoolingfactor

α

∈[0,1] [14].Inpractice, SAisable toattain globaloptimality, butatthe expenseofahighcomputationalcost[14].

2.2.4. Variantsorhybridoptimizationmethods

Xue et al. [19] proposed two PSO-based multi-objective fea-ture selection algorithms. The first algorithm incorporated non-dominated sorting intoPSO (NSPSO)while the second algorithm integrated the concepts of crowding, mutation, and dominance intoPSO(CMDPSO)toaddressfeatureoptimizationproblems.They comparedtheperformancesofbothalgorithmswiththoseof

(4)

non-dominated sorting genetic algorithm II (NSGAII), Strength Pareto EvolutionaryAlgorithm2(SPEA2),ParetoArchivedEvolutionary Al-gorithms (PAES) on twelve benchmark datasets. The experimen-tal results indicated that CMDPSO selected the smallest number offeatures,andoutperformedNSPSO,NSGAII,SPEA2, andPAESin terms of classification performance and computational efficiency. Zhang etal.[15]proposed a PSOvariant calledGM-PSOto iden-tifythe mostdiscriminativebodily characteristicsfromstatic and dynamic motion features for the regression of arousal and va-lence dimensions forbodily expression perception. GM-PSO inte-gratedPSOwithGAandmutationtechniquesofGaussian,Cauchy, and Levy distributions to overcome the premature convergence problemofconventionalPSO.It outperformedGA,PSO,andother PSO variants in selecting the discriminative features and finding the global optimum. Goodarzi and Coelho [20] proposed a FA-based variable selection method for application to spectroscopy. They employed FA, PSO, and GA integrated with partial least squares (PLS) for spectroscopic data selection respectively. Eval-uated withthree spectroscopic datasets, FA identified the small-est number of wavelengths while maintaining a similar predic-tion performance. AlweshahandAbdullah [21]proposed two hy-bridFAmethods tooptimizetheweightsofaprobabilistic neural network (NN) to improveits classification performance. The first method integrated FA with SA (SFA), where SA was used to im-prove thefinal solution ofFA. The second methodcombined SFA withLevyflights(LSFA)tofurtherimprovetheglobalbestsolution. Tested with eleven benchmark datasets, LSFA outperformed SFA and LFA, and achieved impressive classification accuracy. Verma etal.[22] proposeda modifiedFA incorporatingopposition-based and dimensional-based methodologies, known as ODFA, to deal withhighdimensionaloptimizationproblems.TheproposedODFA modelused opposition-basedlearningtoperform initializationof the candidatesolutions byincluding initializationofthe opposite position of each firefly. It also employed the dimensional-based method to update the position of the global best firefly along each dimension. Evaluatedwith multidimensional standard func-tions,OFDAoutperformedFA, PSO,andDifferentialEvolution(DE) significantly.

There are also other hybrid or modified FA algorithms that dealwithdiverseengineeringoptimizationproblems.Asan exam-ple, Coelho et al.[23] proposed a modifiedFA model combined with chaotic mapsto improve the convergencerate ofthe origi-nalFAmodelforsolvingreliability-redundancyoptimization prob-lems. Yang [1] proposed a Levy-flight Firefly Algorithm (i.e. LFA) toincreaseglobalexploration,whichoutperformedclassicalsearch algorithms such as PSOand GA. Abdullah et al.[24] proposed a hybrid FA model by combining FA withDE for highdimensional andnonlinearbiologicalparameteroptimization.Amulti-objective FA modelwasalsoproposed byArsuaga-RíosandVega-Rodríguez [25] by addingmulti-objectivepropertiesinto classicalFA todeal with workload scheduling problems for minimizing energy con-sumptioningridcomputing. Kazemetal.[26]proposed achaotic FAmodelthatcombinedchaostheorywithFAtoidentifyoptimal hyper-parameter settings of Support Vector Regression (SVR) for stockmarketpriceforecasting.Insteadofrandomlygeneratingthe initial population, thechaotic mapping operator (CMO)wasused toproducetheinitialswarmtoincreasepopulationdiversity. Fire-flies with a lower light intensity employed a chaotic movement to move towards those with a higher light intensity. Especially, thefireflywiththehighestlightintensitypurely usedthechaotic movement, rather than a random walk behaviour, for exploring the search space.In comparisonwith GA-basedSVR, chaotic GA-basedSVR,FA-basedSVR,NN,andtheadaptiveneuro-fuzzy infer-encesystem,chaoticFAoutperformedtheserelatedmethodsinthe evaluationofseveralmostchallengingstockmarketdatasets from NASDAQ.

3. Theproposedfacialexpressionrecognitionsystem

In this section, we introduce the proposed facial expression recognitionsystem, which includesa newLBP descriptorfor fea-tureextraction, a new M-LFA algorithm for feature optimization, aswellassingleandensembleclassifiersforfacialemotion recog-nition.

3.1. FeatureextractionusingtheproposedLBPdescriptor

We propose a new texture descriptor, which combines LBP, LGBP,andLBPV,forfeatureextraction,inordertobetterdealwith illumination changes, rotations, andscaling differences. The pro-posedLBPvariantdescriptorcombinesthediscriminative capabil-ities of LBP, LGBP, and LBPV, and depicts great efficiency in ex-tracting discriminative spatial patterns and contrast information fortextureclassification.

Proposedby Ojalaetal.[27], LBPisa well-knowntexture de-scriptor.Thebasicidea ofLBPistothresholdeach group of3×3 neighbouringpixelsagainstthecentrepixeltogenerateasequence of binary outputs. It has been further extended to use various numbers of circular neighbouring pixels. The LBP descriptor can be denoted as LBPs,r, where sis the number of samplingpoints intheneighbourhood andristheradius.TheadvantageofLBPis itsinvariance tomonotonic gray-scalechanges. LBPis efficientin extractingrotation invariant texture features from a local region. However,itsdrawbackislossofneighbourhoodcontrastandglobal informationfortexturedescription.LGBP[28]combinesGabor fil-terswithLBPtoimprovethediscriminativecapabilityofLBP.Ithas excellentrepresentationanddiscriminatingpowerofspatial infor-mationoftheface,andshowsgreatrobustnessindealingwith il-luminationchanges,misalignment,andscalingdifferences.

Introduced by Guo et al.[3], LBPV is a rotation invariant de-scriptor that focuses on exploiting local contrast information to further improvethe discriminative capability of LBP.It combines two aspectsof complementarytexture information,i.e.local spa-tialstructure extracted by LBP andthe contrast,and generatesa simplified joint representation. Specifically, the rotation invariant contrast measure (i.e. the variance of local image texture (VAR)) iscalculatedfroma localregion,andused asanadaptive weight to fine tune the impact of the LBP code to generate histograms. Therefore,enrichedwithcontrastmeasures, LBPVpossesses more robustnessanddiscriminativecapabilitythan thatofLBP for tex-tureclassification. It also has efficientcomputational complexity, andpossessesthesamefeaturedimensionsasthoseofLBP.When pairingwithglobalmatchingmechanisms,LBPVisableto outper-formmorecomplex,state-of-the-artjointLBPandcontrast distri-butiondescriptorssuchasLBP/VAR.

Inthisresearch,we combinethe abovethreewell-known tex-ture descriptors to gain additional discriminating power to im-provetextureclassification andbetterdeal withrotations, illumi-nation, andscaling differences. Moreover, the proposed LBP vari-antemploys a three-parentcrossover scheme forhistogram gen-eration. First of all, each of the three descriptors is applied to a gray-scale input image with a size of 75×75. The three tex-turedescriptors generate a binarypatternfor each sub-region of the test image, respectively. To combine the output patterns, a three-parentcrossover scheme isapplied, i.e., an offspring is de-rived from three parents. Because of the impressive discriminat-ingcapabilitiesofLBPVandLGBP incomparisonwiththatofLBP, LBPV andLGBP are selected asthe dominating parents, whereas LBPisuseda referenceparenttosupplybasicreference informa-tionwhenLBPVandLGBPdisagrees.Theproposeddescriptor com-pareseachbit ofthefirstparentpatterngeneratedby LBPVwith thecorresponding bit ofthesecond parent generatedby LGBP. If they are the same, this bit is inherited by the offspring.

(5)

Other-wise, thecorresponding referencebit fromthe third parent gen-eratedby LBPis inherited by the offspring. As an example, sup-posethe1stparent(generatedbyLBPV)=11,010,001,the2nd par-ent(producedbyLGBP)=0,110,0100,andthe3rdparent(obtained fromLBP)=11,011,010. After applying the three-parent crossover, thenewlygeneratedoffspring is11,010,000.Moreover,itisworth topointoutthatalthoughthedifferencesbetweentheresults ob-tainedbytheproposed LBPdescriptorandthethree baseline de-scriptors(e.g.LBPV)intheaboveexampleareminor,theyare ob-tainedfroma smalllocal region of3×3.These differencescould increase drastically to 625 (25×25) histograms when an image withasizeof75×75isused.Incomparisonwiththeparent pat-terns,the experimental resultsindicate that thenewly generated offspringpossessesmorediscriminativecapabilities.Evaluated us-ingimageswithilluminationandscalingdifferencesderived from theCK+ databaseandmulti-viewandside-viewimagesfrom BU-3DFEandMMI respectively,theproposedLBPoperatorshows su-periorperformance,andoutperformstheabovethreebaseline de-scriptorssignificantly in a number ofdiverse test cases.Detailed evaluationresultsareprovidedinSection4.1.Overall,theproposed descriptoris more capable of retrievingdiscriminative facial fea-turessuchasedges andcorners,andshowsgreatrobustnessand efficiencyindealingwithimageswithlowcontrastratios. 3.2.Backgroundandtheproposedfeatureoptimizationalgorithm

Inthissection,weintroducetheproposedM-LFAalgorithmfor feature optimization.To provide the necessary background infor-mation,theoriginalFA,LFA andMFOmodelsarefirstintroduced, asfollows.

3.2.1. Fireflyalgorithm

Proposed by Yang in 2008 [14], FA is a nature inspired population-based metaheuristic search algorithm. FA applies the followingthreeprinciplesinitssearch process.Firstly, allfireflies areunisex,withoneattractedtoallothers.Secondly,attractiveness isproportional tothe brightnessofa firefly.Therefore,thefirefly withlessbrightnessmovestowards theonewithstronger illumi-nations.Ifno brighterfireflies exist, a randomwalk behaviour is conducted. Thirdly,the light intensityof each firefly denotes the solutionquality. Studiesindicate that FA demonstratespromising superiorityoverotheralgorithms,suchasPSOandGA[14].

In FA, the light intensityvariation andattractiveness are two importantaspects.Theybothdecreaseaseitherthedistancetothe lightsource orthe distancebetweentwo fireflies increases.They alsovary withthe degree of light absorption. The light intensity variationwithrespectto thedistance,r,andmedia absorption is definedinEq.(4)[14].

I = I0eγr (4)

whereI0istheoriginallightintensitywhenr=0,while

γ

denotes afixedlightabsorptioncoefficient.

Theattractivenessfactor,

β

(r),ofafireflyisproportionaltothe lightintensity,asdefinedinEq.(5)[14].

β

(

r

)

=

β

0eγr 2

(5)

where

β

0istheinitialattractivenessatr=0.

Moreover,thedistancebetweentwofirefliesiandjiscomputed inaccordancewiththeCartesiandistance,asshowninEq.(6).

ri j =

xixj

=

d k=1

xi,kxj,k

2 (6)

wherexi andxjrepresentthepositionsoffirefliesiandj,

respec-tively.xi,k indicates the kth dimension of position xi for the ith

fireflywhileddenotesthedimensionsofagivenproblem.

In conventionalFA, randomization isconducted usinga Gaus-sian or uniform distribution. A Levy-flight Firefly Algorithm (i.e. LFA)wasalsoproposedbyYang[1].Levyflightsareusedto imple-ment randomization to further enhance performance. The move-ment offirefly i towards a brighter firefly j isdefined inEq. (7).

xi= xi+

β

0eγr 2 i j

x jxi

+

α

sign

rand−1 2 Levy (7)

wherethesecondtermindicates themovementduetoattraction, andthethirdtermdenotesrandomizationusingLevyflights.Note that

α

istherandomizationparameter,whilesign[rand−1

2] repre-sentsarandomdirectionwiththerandomsteplengthfollowinga Levy distribution,whererand generatesa randomnumber inthe rangeof[0, 1].

InFA andLFA,the lightabsorption coefficient,

γ

,playsa very importantroleincharacterisingattractivenessanddeterminingthe convergencespeedofthealgorithms. When

γ

0,attractiveness remainsconstantandthelightintensitydoesnotdecrease.Inthis case,thebehavioursofFAandLFAareverysimilartothatofPSO, wherethewholepopulationofindividualsisvisibleinthesearch space,andtheglobaloptimumcanbeeasilyidentified.When

γ

→ ∞,attractivenessbecomesnearlynon-existence,whereeachfirefly performstherandomwalkoperation(e.g.LevyflightsforLFAand GaussianmutationforFA)withoutinteractingwithother individu-alsinthesearchspace.Inthisway,FAandLFAareequivalenttoa randomsearchalgorithmsuchasSA.FAandLFAusuallywork be-tween thesetwo extremecases,where differentfireflies work in-dependentlytosearchfortheoptimalsolutions.Theexperimental resultsindicatethat FAandLFA arecapableoffinding globaland localoptima simultaneously. Studiesalsoindicate that they auto-matically divide the population into subgroups, show impressive capabilityofdealingwithmultimodaloptimizationproblems,and outperform PSO andGA in terms ofaccuracy and computational efficiency[1].

3.2.2. Moth-flameoptimization

Introduced by Mirjalili [2], MFO is inspired by the transverse orientation navigationbehavioursofmoths.Itpossesseslocaland globalsearch capabilities, and isefficient in solving optimization problems withconstrained and unknown search spaces.In MFO, both moths and flames are employed to representsolutions and their positions denote theproblemvariables in thesearch space. Specifically, moths are designated as the search agents to iden-tifyoptimalsolutionsinthesearchspace,whereasflamesare em-ployedtoindicatethebestpositionsofthemothsobtainedsofar.

MFOemploysalogarithmicspiralfunctiondefinedinEq.(8)to updatethepositionofamoth[2].

S

Mi, Fj

= Di. ebt. cos

(

2

π

t

)

+ Fj (8)

whereMirepresentsthe ithmoth whileFjdenotes thejthflame.

Di=

|

FjMi

|

is the distance between the ith moth and the jth

flame,while b indicates a constant that definesthe shape ofthe logarithmicspiral. t indicates how closethe next positionof the moth is to the flame, which is a value in the range of [-1, 1] where-1and1indicatetheclosestandfarthesttotheflame, re-spectively.Thespiralequation isequippedwithefficientlocaland globalsearchcapabilities. Itenablesamoth toexplorethe search space, but not necessarily in the space between a moth and a flame, and to balance well betweenlocal exploration andglobal exploration.Inotherwords,explorationisachievedwhenthe op-timal solutionis found outsidethe spacebetween amoth anda flame whereas exploitation occurs when the fitter solutions are foundbetweenthem.

Inaddition,flames are rankedbasedon their fitnessvaluesin eachiteration.Inordertoavoidlocaloptimumandincreaseglobal

(6)

exploration,eachmothupdatesitspositionwithrespecttoa spe-cificflame.Thebestflameisusedforupdatingthepositionofthe firstmoth,whereas theworstflameisemployedforupdatingthe positionofthelastmoth.Overall,theupdatedflamesareusedfor updatingthepositionsofthemothsineachgeneration,inorderto explorethesearchspacemoreeffectively.

To increase local exploitation of the optimal solution vectors, the number of flames, f_no, in MFOis decreased adaptively. Eq. (9)definestheoperation.

f_no. = round

Nm×N −1

T

(9)

wherem representsthe currentnumberofiterationswithN and T indicating the maximum number of iterations and flames, re-spectively.Overall,MFOemploysthelogarithmicspiralsearch de-fined in Eq.(8) andthe flame decrement strategy definedin Eq. (9) to balance between global exploration andlocal exploitation toachieveglobaloptimality.Itshowssuperiorcapabilitiesof deal-ingwithmultimodalandunimodaloptimizationfunctions,aswell asother challengingoptimizationproblemswithunknown search spaces[2].

3.2.3. TheproposedvariantoftheLFAalgorithm

We propose a variant of the LFA algorithm, which integrates LFA withtheconceptofMFOfordiscriminativefeature optimiza-tion.TheresultingalgorithmisknownasM-LFA.Theproposed M-LFA algorithm benefits from both spiral search behaviour of the moths and attractiveness search actions of the fireflies to miti-gate the premature convergenceproblemof the original LFA and MFO models. It employs the logarithmicspiral search process of the moths to increase local exploitation of the fireflies to avoid stagnation.IncomparisonwiththeflamesinMFO,thefirefliesnot onlyrepresentthebestsolutionsidentifiedbythemoths,butalso act as the search agents based on the attractiveness function to increase search diversity. Therefore,it increaseslocal exploitation ofLFA andglobalexplorationofMFOtoguidethesearch process towards the globaloptimum. The identified best solution is also furthermutatedbytheSAoperationwithLevyflights,inorderto generatean offspringfurtherawayfromitsparentandtoincrease exploitation.Overall,theabovemechanismsworkinacooperative mannertoovercomeprematureconvergenceandguidethesearch process towardsglobaloptimality. Algorithm1illustrates the pro-posedM-LFAalgorithm.Fig.2showstheflowchartoftheproposed algorithm.

As illustrated in Algorithm 1, the proposed M-LFA algorithm firstly initializes a population of fireflies and a swarm of moths, respectively.Then,thefitnessofeachmothandeachfireflyis eval-uated usingafitness orobjectivefunction,f(xj).Twoseparate ar-raysarecreatedtostoreandrankthecorrespondingfitnessvalues forthefireflies andmoths,respectively. Thisenablespreservation of thebest solutionsobtained bythe mothsandfireflies, respec-tively.Thenextstepistoinitializethelightintensityofeach fire-fly,whereIj= f

(

xj

)

,andtheconstant lightabsorptioncoefficient,

γ

.

As discussed earlier,the mainmechanism oftheproposed al-gorithm consists of two search strategies. The first strategy em-ploys the moth spiral concept to improve the exploitation capa-bilityofthefireflies,whilethesecondstrategyemploysthe attrac-tionandattractivenessbehaviourofLFAtoenablesuddenand op-timalmovementofthefirefliesinthesearchspacetodiversifythe search process.Thefirstsearch strategyissimilartothat ofMFO, where each moth is assigned to a specific firefly that is ranked basedonthefitnessvalue.Specifically,thefirstmothisassignedto thebestfirefly,whilethelastmothisassignedtotheworstfirefly. Thesequenceofthefirefliesisalsoupdatedbasedonthebest so-lutionfoundby themothsandtheattractivenessimpact between

Fig.2. FlowchartoftheproposedM-LFAfeatureoptimizationalgorithm.

thefirefliesineachgeneration.Therefore,thestrategyrequires dif-ferent moths to move around different fireflies, which enhances globalexploration andreducesthe probability ofpremature con-vergence.TheupdatedlogarithmicspiralmovementdefinedinEq. (10)isusedtoupdateamoth’spositionwithrespecttoafirefly.

S

Mi, xj

= Di. ebt. cos

(

2

π

t

)

+ x

j (10)

whereMi representsthe ithmoth while xj isthe positionof the jthfirefly, andDi=

|

xjMi

|

representsthedistance betweenthe

ithmothandthejthfirefly.Inthisway,eachmothperformsa spi-ral search aroundeach firefly toexploit its neighbourhood. Ifthe solution obtainedby themoth has a better fitness, it is used to replacethepositionofthecurrentfirefly.Overall,thefirefliesare

(7)

Algorithm1:Pseudo-codeofproposedmoth-fireflyalgorithm. 1.Start

2.Initializeapopulationofmothsandapopulationoffirefliesrandomly;

3.Generatetwoarraystostorethefitnessvaluesformothsandfirefliesrespectively; 4.Evaluateeachmothandeachfireflyusingthefitness/objectivefunctionf(x); 5.SetlightintensityI=f(x)andlightabsorptioncoefficientγ;

6.

7.Sortfirefliesbasedontheirfitnessvaluesinitiallyandassignmothsaccordingly;

8.While(Stoppingcriterionisnotsatisfied)//untilitfindstheoptimalsolutionorthemaximumnumberofiterationsismet. 9.{

10. Fori=1tondo//foreachmothattachedwitheachfirefly 11. {

12. //Usingspiralsearchofmothstoguidethesearch

13. Updateconvergenceconstantandtherandomnumbert;//tparametercontrolshowclosethenextpositionofthemothistothefirefly,e.g. -1istheclosestand1isthefarthest.

14. Calculatethedistancebetweentheithfirefly(xi)andtheithmoth(Mi)usingDi=|xiMi|; 15. UpdatethepositionoftheithmothwithrespecttotheithfireflyusingEq.(10);

16. If(themoth’ssolutionisbetterthanthefirefly)

17. {

18. Replacetheithfireflywiththeithmoth’ssolution; 19. }EndIf

20. //Usingattractivenessfunctionoffirefliestoguidethesearch 21. Forj=1tondo//forallfireflies

22. {

23. If(Ij > Ii)

24. {

25. MovefireflyitowardsfireflyjusingEq.(7); 26. }EndIF

27. Varyattractivenesswithdistancerviaexp[-γr2]; 28. Evaluatenewsolutionsandupdatethelightintensity; 29. }EndFor

30. }EndFor

31. Rankthefirefliesandfindthecurrentglobalbest;

32. ImprovetheglobalbestbyapplyingSimulatedAnnealingembeddedwithLevyflights; 33. Reassignthemothsbasedontheupdatedrankingoffireflies;

34.}EndWhile

35.Outputoptimalsolution(s); 36.End

updatedwiththe mostoptimal solutionsidentified bythe moths ineachiteration.

In the proposed LFA variant, both moths and fireflies are re-garded as the search agents. As such, they move around in the searchspacetosearchfortheoptimalsolutions.Asexplained ear-lier,eachfireflyisupdatednotonlybythefittersolutionsfoundby thecorrespondingmothsintheneighbourhood,butalsomoves to-wardsothermoreattractivefirefliesinthesearchspace.Therefore, the second search strategy employs attractiveness and attraction actiondefinedinEq.(7)to moveafireflytowards abrighterone, inordertoincreasesearchdiversity.Sincethemothsperform spi-ralmovement around thefireflies,andthefireflies movetowards moreattractiveonesineach iteration,the secondsearch strategy increasesglobalexploration of the mothsin MFO.The algorithm thenupdatestheattractivenessandlightintensitywithrespectto thedistance.

Importantly, theabove two search mechanismswork ina col-laborativemanner toenablethe algorithmtoescapefromthe lo-cal optimum. As an example, if the spiral search process of the mothsstagnatesanddoesnotfindfittersolutionsforsomeofthe fireflies,the attractivenessfunctionstill enableslessfitfirefliesto movetowardsbettersolutions,andreachmoreoptimalsearch re-gions,thereforeavoidingstagnation.Inaddition,ifthefirefliesfail tocommunicate or interactwitheach other because ofa similar lightintensityorfoggysituations,themothsconductspiralsearch around different fireflies to increase global exploration and fine tunetheoptimalsolutionvectorstoovercomethelocaloptimum. Thesestrategiesworkcooperativelytomitigatepremature conver-genceandguidethesearchprocesstoattainglobaloptimality.

Afterconductingtheabovementionedtwosearchprocesses,the setof fireflies is subsequently ranked based on their fitness

val-ues.Themostpromisingsolution(firefly)amongthepopulationis identifiedineach generation.Toimprovelocalexploitation ofthe currentbestsolution,the SAoperation definedinEq.(11)is em-ployedtoperformmutation.

xt+1 = xt +

ε

(11)

where xt+1 andxt represent the newly generated and the

origi-nally identified promising solutions, respectively, and ɛ indicates astandard randomwalkoperation suchasaGaussian,Cauchy,or Levydistribution.Inthisresearch,Levyflightsareusedto mutate theoriginal solutionandgeneratean offspring furtherawayfrom its parent.Ifthenewoffspring solution,Nsol,hasabetter fitness value,it isusedtoreplacethe currentglobalbest,Csol.However, if the new solution is worse than the current best solution, the SAacceptsthenewsolutionifitsatisfiesthefollowingprobability rule[14]. exp

f

(

Nsol

)

f

(

Csol

)

T

> random[0, 1] (12)

whereTrepresentsthecurrenttemperatureforcontrollingthe an-nealingprocess(seeSection2.2.3).Tisdecreasedineachiteration byacoolingfactor

α

[0,1]asdefinedinEq.(13).

T =

α

T (13)

Subsequently, the new globalbest solution is used to guide the searchprocessinthenextgeneration.Thealgorithmiteratesuntil theterminationcriterionismet,i.e.themaximumnumberof iter-ationisreachedortheoptimalsolutionisfound. Inthisway,the algorithm is able to benefit from the optimal solutions obtained frombothmothsandfirefliessimultaneously.

(8)

Intheproposed algorithm,thefollowingfitnessfunctionis ap-pliedtoevaluatethefitnessofeachmothandfirefly.Itconsistsof twocriteria, i.e.thenumberofselectedfeatures andclassification accuracy. Note that classification accuracy refers to the accuracy rateobtained foreach emotioncategory, rather than acombined accuracyscoreforallemotionclasses,inordertoavoidbias.

f

(

x

)

= wa×accx+ wf×

(

number_f eaturex

)

−1 (14)

wherewaandwf representtheweightsforclassificationaccuracy

andthenumberofselectedfeatures,respectively,withwf=1−wa.

Inthisresearch,weconsiderclassificationaccuracyismore impor-tantthanthenumberofselectedfeatures,thereforewaissettoa

highervalue(0.9)whilewfassumesasmallervalue(0.1).

Thesettingof0.9and0.1astheweightsforclassification accu-racyandthenumberofselectedfeatures,respectively,isobtained from the empirical studies ofthis research, although such a set-tingisalsorecommendedinotherstudies[12,15,19,29].Inthis re-search,wehavealsousedanempiricaldemonstrationbychanging wa:wffrom0.9:0.1,0.8:0.2,0.7:0.3,0.6:0.4to0.5:0.5.Thesettingof

0.9:0.1 has beenselected owing to its performance in producing the best trade-off between classification accuracy and the num-ber of selected features. This observation is also consistent with otherfindings,e.g.tofurtherincreaseclassificationaccuracy,more redundant features need to be removed. Indeed, the experimen-talresultsindicate thattheselectedfeaturesubsetsbyM-LFA are morediscriminativethan thoseobtainedby other state-of-the-art PSOandFA variants(e.g.GM-PSO[15],chaotic FA[26]), and non-evolutionaryfeatureselection methods[9,13] (seethe evaluation detailsinSections4.2and4.3).

Overall, the proposed M-LFA algorithm employs spiral search, attractivenessaction,andrandomwalk operationstodiversifythe search process and increase local exploitation and global explo-ration.The empiricalresults indicatethat ithassuperior capabil-ities infindingtheglobaloptimum,andoutperforms metaheuris-ticsearchmethodssuchasPSO,GA,MFO,LFA,GM-PSO[15],LSFA [21],ODFA[22]andchaoticFA[26],significantly.

Fortheexperimentalstudy,weemployNNandSupportVector Machine (SVM)forrecognitionofseven expressions. Optimal set-tings ofNN andSVM areidentified using thetrial-and-error and grid searchmethods,respectively. TheAdaBoost procedureisalso used toconstructtwo ensembleclassifiersforexpression classifi-cation,i.e.NN-basedandSVM-basedensemblemodels,wherethe formeremploysthreeNNsasthebaseclassifierswiththelatter us-ing threeSVMsasthebaseclassifiers.Eachensemble model em-ploys the weighted majority votingmethod to combine the out-puts fromthe three baseclassifiers to generatethe final classifi-cationresult[12,29].Theempiricalresultsindicatethatthe SVM-based ensemblemodeloutperformsboth theNN-basedensemble andsingleclassificationmodelsindiverseexperimentalsettings. 4. Evaluation

Weemploythefrontal-viewimagesfromCK+andimageswith 45-degree and 90-degree rotations from BU-3DFE and MMI, re-spectively, for evaluating the proposed LBP descriptor forfeature extraction.Moreover,thefrontal-viewimagesfromCK+,JAFFE,and MMI, and multi-view and side-view images from BU-3DFE and MMI arealso usedtoevaluate theproposed M-LFA algorithm for featureoptimization.

4.1. EvaluationoftheproposedLBPdescriptorforfeatureextraction ToevaluatetheproposedLBPdescriptor,threebaselineLBP de-scriptors, i.e.LBP, LGBP andLBPV, havebeen employed for com-parison. Distinctive sets of 250 and 175images representing the

sevenfacialexpressionsfromtheCK+databaseareusedfor train-ingandtest,respectively.Inthisexperiment,weuseeachLBP de-scriptorintegratedwithsingleandensembleclassifiersfor expres-sionrecognition without applyingany feature selection methods. Table1showstheresultsfromtheentiresetsofrawfeatures ex-tractedby the proposed LBP variantandthe original LBPV,LGBP andLBPdescriptors,respectively.

AsindicatedinTable1,thebestresultsareproducedusingthe SVM-basedensemblemodel,andtheproposedLBPdescriptor out-performsLBP,LGBP,andLBPVby14.80%,8.33%,and5.25%, respec-tively.The empirical results indicateefficiency andsuperiority of theproposedLBPdescriptoroverLBP,LGBP,andLBPV.

TofurtherevaluatetheefficiencyoftheproposedLBPdescriptor indealingwithrotations,illumination changesandscaling differ-ences,we havegeneratedfoursets ofimageswith45-degreeand 90-degreerotations,illumination changes,andscalingdifferences, respectively, for evaluation purposes. Firstly, all 175 test images fromCK+havebeenconvertedtothosewithilluminationchanges byusingthebrightnessandcontrastadjustmentfunctionprovided byOpenCV[30],asfollows.

g

(

i, j

)

=

α

f

(

i, j

)

+

β

(15)

whereg(i,j)denotestheoutputimagepixelsandf(i,j)denotesthe sourceimage pixels withi asthe rowofpixels andj asthe col-umnofthepixels.

α

representsthegainand

β

representsthebias, whichareusedtocontrolthecontrastandbrightness,respectively. Theoriginaltestimagesaresetintohighandlowbrightness alter-natively,usingtheaboveequationtogeneratethenewtestimages. Similarly, the original 175 test images have also been trans-formed toa set of175 imageswithscaling differencesusing the OpenCVresize()function[30].Thesametrainingsetof250images from CK+ used in the previous experiment has been employed fortrainingbefore newlygeneratedtest imageswithillumination changesandscalingdifferencesareusedforevaluation.

ToevaluatetheLBPdescriptorsusingrotatedimages,45-degree multi-viewimagesfromtheBU-3DFEdatabaseand90-degree side-viewimagesfromtheMMIdatabasearealsoextracted.Specifically, a totalof 140side-view imageswith 90-degreerotations are ex-tractedfromthe video sequences ofMMI,withhalf ofthem (i.e. 70) employed for trainingand the other half (i.e. 70) for test. A setofmulti-view imageswith45-degree rotations fromBU-3DFE isalsoemployedforevaluation,with500imagesfortrainingand another250imagesfortest. Ineachexperiment, the correspond-ingLBP-basedfeatureextractionmethodisappliedandintegrated withdiverseclassifierswithoutanyfeature selectionprocess.The detailedevaluationresultsforcasesofillumination changes, scal-ingdifferences,androtationsaresummarisedinTables2–5.

As indicated in Tables 2–5, the proposed LBP descriptor showsgreatrobustnessandefficiencyindealingwithillumination changes,scalingdifferences,androtations.Itoutperformsthethree baseline LBP descriptors in the above diverse test cases signifi-cantly.IntegratedwiththeSVM-basedensemblemodel,alltheLBP descriptorsachieve the highestaccuracyrateineach experiment. When the SVM-based ensemble model is applied, the proposed LBPdescriptoroutperformsthethreebaselinedescriptorsby10.17– 16.62%, 10.18–17.83%, 3.77–8%and 5.9–7.1%, for imageswith illu-mination changesand scaling differences, andmulti-view images with 45-degree rotations (BU-3DFE), and side-view images with 90-degree rotations (MMI), respectively. Moreover, the 90-degree side-viewimagesfromtheMMIdatabaseposethemost challeng-ingproblembecauseofthedramaticinformationlossinside-view expressions.OurLBPdescriptor,however,stillshowsmore discrim-inating capabilities as compared with those from other LBP de-scriptorsinhandlingsuchimages.Overall,theempiricalresults in-dicate thatthe three strategiesincorporated inthe proposed LBP descriptorare ableto better preservethe distinctivenessand

(9)

dif-Table1

ClassificationperformanceofallfeaturesextractedbytheproposedLBPdescriptorandthe orig-inalLBPV,LGBP,andLBPdescriptorsfortheCK+database.

FeatureExtractionMethods NN(%) SVM(%) NN-based Ensemble(%) SVM-based Ensemble(%) LBP 63.20 64.83 70.00 71.20 LGBP 68.12 68.98 75.00 77.67 LBPV 70.54 71.00 77.32 80.75 TheproposedLBPdescriptor 77.70 79.10 83.50 86.00

Table2

Performancecomparisonbetweenthe proposed LBPdescriptorand otherLBPdescriptors withoutanyfeatureselectionusing175imageswithilluminationchangesderivedfromCK+.

Featureextractionmodels NN% SVM% NN-based Ensemble% SVM-based Ensemble% LBP 59.45 62.75 65.00 66.05 LGBP 65.95 67.23 70.35 72.25 LBPV 68.35 69.15 71.88 72.50 TheproposedLBPdescriptor 75.50 77.07 80.33 82.67

Table3

Performancecomparisonbetweenthe proposed LBPdescriptorand otherLBPdescriptors withoutanyfeatureselectionusing175imageswithscalingdifferencesderivedfromCK+.

Featureextractionmodels NN% SVM% NN-based Ensemble% SVM-based Ensemble% LBP 60.10 61.88 64.35 65.50 LGBP 66.00 68.00 71.67 73.15 LBPV 67.25 68.45 70.85 71.55 TheproposedLBPdescriptor 76.66 78.81 81.24 83.33

Table4

Performancecomparisonbetweenthe proposed LBPdescriptorand otherLBPdescriptors withoutanyfeatureselectionusing70side-viewimageswith90-degreerotationsextracted fromMMI.

Featureextractionmodels NN% SVM% NN-based Ensemble% SVM-based Ensemble% LBP 50.70 51.25 53.44 53.90 LGBP 51.00 51.50 54.00 54.00 LBPV 51.67 52.05 54.60 55.10 TheproposedLBPdescriptor 55.35 57.15 60.50 61.00

Table5

Performancecomparisonbetweenthe proposed LBPdescriptorand otherLBPdescriptors withoutanyfeature selection using250multi-viewimages with 45-degreerotations ex-tractedfromBU-3DFE.

Featureextractionmodels NN% SVM% NN-based Ensemble% SVM-based Ensemble% LBP 70.00 72.33 74.00 74.00 LGBP 73.45 74.25 76.50 77.33 LBPV 74.10 74.75 76.00 78.23 TheproposedLBPdescriptor 79.00 79.00 81.50 82.00

ferentiate different local structures in the neighbouring pixels of an input image. Fig.3 showsthe exampleoutputs of all the LBP operatorsforimageswithilluminationchanges,scalingdifferences androtations.

4.2.ComparisonoftheproposedM-LFAfeatureoptimizationwith othermetaheuristicsearchmethods

ToevaluatetheproposedM-LFAalgorithmforfeatureselection, anumberofstate-of-the-artandconventionalsearchmethodsare usedforcomparisonpurposes,which includePSO, GA,LFA,MFO, GM-PSO[15],LSFA [21],ODFA [22],andchaotic FA [26]. We

em-ploy the frontal-view images from CK+, JAFFE and MMI, multi-view images with 45-degree rotations from BU-3DFE, and side-viewimageswith90-degreerotationsfromMMIinthe experimen-talstudy.Specifically,weemploy250frontal-viewimagesfromthe CK+databasefortraining,and175imagesextractedfromeachof the CK+, MMI, and JAFFE databases for test. Moreover, 500 and 250multi-viewimageswith45-degreerotationsfromBU-3DFEare alsousedfortrainingandtestrespectively.Anothersetof140 side-viewimageswith90-degreerotationsfromMMIisalsoemployed, with70imagesfortrainingandtheremaining70imagesfortest. Ineachexperiment,theproposedLBPdescriptorisusedtoextract the initial features. Then, each feature optimization algorithm is

(10)

Fig.3. ExampleoutputsofalltheLBPoperatorsforimageswithilluminationchanges(the2ndand3rdcolumnsderivedfromCK+),scalingdifferences(the4thand5th columnsderivedfromCK+)androtations(the6thand7thcolumnsfromBU-3DFEandthelastcolumnfromMMI).

employed, beforesingleandensembleclassifiers areused for ex-pressionrecognition.

Firstly, the optimal settings of the proposed M-LFA algorithm and all other optimization methods are identified. As an exam-ple, to achieve the best trade-off between classification accuracy and thecomputational efficiency,LFA employs thefollowing set-tings,i.e.populationsize=30,initialattractiveness=1.0, randomiza-tion parameter=0.2, absorption coefficient=1.0, Levy’s index=1.5, and maximum iterations=500. These settings of LFA have also been applied to other FA variants (i.e. LSFA, ODFA and chaotic FA), and the proposed M-LFA algorithm, except that M-LFA has a population size of 30 moths plus 30 fireflies. Moreover, the following optimal settings have also been applied to PSO based on published studies and our empirical results, i.e. maximum velocity=0.6,inertiaweight=0.78,populationsize=30,acceleration constantsc1=c2=1.2, andmaximumgenerations=500.The setting

of classical GA is as follows: crossover probability=0.6, mutation probability=0.05, andmaximum generations=500. Theabove PSO andGA parameters are also usedas theoptimal settingsof GM-PSO.

4.2.1. WithindatabaseevaluationusingFrontal-viewimagesfrom CK+

Since the proposed and other feature optimization algorithms are stochastic in nature, we perform 30 trials to find the most discriminative feature subsetsforeach algorithm.In the first ex-periment, we employ 250 and175 frontal-view images fromthe CK+ database for training andtest, respectively. Empirically, the proposed M-LFA algorithmis abletoconvergewithin 200 to300 iterationsinmostcaseswithasetof30to50features extracted. Moreover,we havecompared theproposedM-LFAalgorithm with otherstate-of-the-artandconventionalmetaheuristicsearch meth-ods.Table6showstheaverageclassificationaccuracyratesofeach methodintegratedwithdiverseclassifiersover30successiveruns, respectively.

Incomparisonwithallother methods,theproposedalgorithm is able to extract the lowest number of features in the range of [30–50]andachievethehighestaverageaccuracyrateswhen com-bined with all single and ensemble classifiers. When NN- and SVM-based ensemblemodels are applied, our algorithm achieves its best performance of 100% accuracy. Integrated with the NN-based ensemblemodel, our algorithm outperformsGA, PSO, LFA, MFO, GM-PSO, LSFA, ODFA, and chaotic FA by 21.12%, 18.67%, 14.27%,8.54%,10.7%,16.11%,9.1%,and7.67%,respectively.In com-binationwiththeSVM-basedensemblemodel,theproposed algo-rithm outperformsGA, PSO, LFA, MFO,GM-PSO,LSFA, ODFA,and chaoticFA by20%,17.5%,12.19%, 7%,10%,13.25%,7.35%,and5.5%, respectively.Moreover,theaboveaccuracyratesobtainedusingthe proposedM-LFAalgorithmsignificantlyoutperformthoseusingthe entireset ofrawfeatures withthe proposedLBPdescriptor with-outanyfeatureselection,asshowninTable1.

Fig.4showssomeexamplesofthegeneratedoptimizedfeature sub-regionsforeach expressionofthe imagesfromCK+. Overall, significant discriminativecharacteristics arerevealed foreach ex-pression, which correlate well with the emotional muscular ac-tivities defined in the Facial Action Coding System (FACS) [31]. As an example, the characteristicsassociated with the lip corner pullerandcheekraiserare observedintheoptimizedsub-regions for “happiness”, whereas mouth open, eye widened and the in-ner and outer brow raisers are explicitly illustrated in the facial sub-regionsfor“surprise” .TheM-LFA algorithmalsoreveals fea-turesthatarecloselyassociatedwiththenosewrinkler,upperlip raiser, chinraiser andlipspart for“disgust” . Thesignificance of thelip stretcher, widenedeyes, outer brow raiserand brow low-ererisobserved intheselectedfacial sub-regionsfor“fear” .The significance of the brow lowerer, lid and lip tightener is explic-itlydemonstratedintheselectedfeaturesubset for“anger”,while theinner brow raiser, brow lowerer andlip cornerdepressorare clearly indicated in the optimized facial regions for “sadness” . Overall, significant discriminative characteristics are revealed for eachexpression,whichmapcloselytotheAUsprovidedinFACS.

(11)

Table6

Averageclassificationresultsfordifferentoptimizationmodelsover30runsusingtheCK+database. Featureselectionmethods Numberoffeatures NN(30runs)% SVM(30runs)% NN-based

Ensemble (30runs)% SVM-based Ensemble (30runs)% GA 100–200 74.60 76.90 78.88 80.00 PSO 110–200 76.33 78.70 81.33 82.50 LFA 55–80 80.00 80.00 85.73 87.81 MFO 40–90 87.80 89.10 91.46 93.00 GM-PSO[15] 45–80 83.76 86.45 89.30 90.00 LSFA[21] 60–85 79.00 79.50 83.89 86.75 ODFA[22] 50–80 87.22 88.55 90.90 92.65 ChaoticFA[26] 45–90 88.65 89.00 92.33 94.50 TheproposedM-LFAalgorithm 30–50 95.66 96.50 100 100

Fig.4. Thesub-regionsandtheirdistributionsselectedbyM-LFAforeachexpressionoftheCK+images. Table7

Averageclassificationresultsover30runsforcross-databaseevaluationwithJAFFE.

Featureselectionmethods Numberoffeatures NN(30runs)% SVM(30runs)% NN-based Ensemble (30runs)% SVM-based Ensemble (30runs)% GA 100–200 72.21 73.65 77.00 78.30 PSO 110–200 73.13 75.90 78.72 79.89 LFA 55–80 79.21 79.67 81.62 84.55 MFO 40–90 84.85 86.31 88.21 89.63 GM-PSO[15] 45–80 84.25 85.96 88.50 89.00 LSFA[21] 60–85 78.40 79.00 83.55 85.95 ODFA[22] 50–80 84.79 85.00 88.55 88.78 ChaoticFA[26] 45–90 88.00 88.00 90.75 91.45 TheproposedM-LFAalgorithm 30–50 94.21 95.30 100 100

4.2.2. CrossdatabaseevaluationusingFrontal-viewimagesfrom JAFFEandMMI

Tofurther evaluatetheefficiencyoftheproposed algorithm,a cross-database evaluationis conducted. A setof 175frontal-view images is extracted from MMI and JAFFE, respectively, for test, whiletheabove250frontal-viewimagesfromCK+ areemployed asthetrainingset.Table7showstheaverageperformanceofeach algorithmincombinationwithsingleandensembleclassifiersfor evaluationof175images extractedfromthe JAFFE databaseover 30runs.

AsillustratedinTable7,incomparisonwithallothermethods, theproposed M-LFA algorithm shows great robustnessfor cross-databaseevaluationwithJAFFE.Integratedwithsingleand ensem-bleclassifiers,it achievesthebestaverageaccuracy over30runs. IntegratedwiththeNN-basedensemblemodel,theaverage accu-racy(i.e. 100%)of theproposed algorithm is23%, 21.28%, 18.38%, 11.79%, 11.5%, 16.45%, 11.45% and9.25% higher than those ofGA,

PSO,LFA, MFO,GM-PSO,LSFA,ODFA,andchaoticFA, respectively. Combined with the SVM-based ensemble model,the average ac-curacy(i.e.100%)oftheproposedalgorithmoutperformsthoseof GA,PSO,LFA,MFO,GM-PSO,LSFA,ODFA,andchaotic FAby21.7%, 20.11%,15.45%,10.37%,11%.14.05%,11.22%,and8.55%,respectively. Anothercross-database evaluationisalsoconductedusing175 frontal-view images from the MMI database. The average clas-sification results of each algorithm integrated with diverse clas-sifiers over 30 trials are provided in Table 8. As shown in Table 8, trained with 250 images from CK+ and tested with 175 images from MMI, the proposed algorithm achieves the highest average accuracy rates in combination with all classi-fiers over 30 runs. When the SVM-based ensemble model is used, it achieves an average accuracy rate of 94.86%, and out-performs GA, PSO, LFA, MFO, GM-PSO, LSFA, ODFA, and chaotic FA by 17.35%, 16.8%, 8.46%,5.07%, 6.88%,7.81%, 5.98%,and 5.91%, respectively.

(12)

Table8

Averageclassificationresultsover30runsforcross-databaseevaluationwithMMI.

Featureselectionmethods Numberoffeatures NN(30runs)% SVM(30runs)% NN-based Ensemble (30runs)% SVM-based Ensemble (30runs)% GA 100–200 69.60 69.79 75.34 77.51 PSO 110–200 71.03 73.11 77.51 78.06 LFA 55–80 79.90 81.76 83.33 86.40 MFO 40–90 84.83 85.30 87.16 89.79 GM-PSO[15] 45–80 83.24 83.77 86.45 87.98 LSFA[21] 60–85 79.33 81.45 85.75 87.05 ODFA[22] 50–80 82.13 83.06 86.95 88.88 ChaoticFA[26] 45–90 86.00 86.88 87.67 88.95 TheproposedM-LFAalgorithm 30–50 91.21 91.44 94.27 94.86

Fig.5. TheboxplotdiagramforalltheoptimizationmethodsintegratedwithSVM-basedensemblemodelover30runsusingtheMMIdatabase.

Fig. 5 showsthe boxplotdiagram forthe distribution of clas-sification results over 30 runs of all optimization algorithms in-tegrated with the SVM-based ensemble model using 175images fromMMI.AsshowninFig.5,theproposedalgorithmoutperforms all other methods significantly.Nearlyall theresults ofour algo-rithm are higher than the maximum accuracy rates of GA, PSO, LFA, MFO, GM-PSO, and ODFA with at least 75% of our results higherthanthehighestaccuracyrateofchaoticFA,and25%ofour results higherthan themaximum accuracyrateof LSFA.In com-parison with other algorithms, the proposed algorithm also has a better accuracy distribution with comparatively smaller varia-tions betweenthe 25% and75% percentiles.Themedianvalue of ouralgorithm(94.93%)ishigherthanthoseofGA,PSO, LFA,MFO, GM-PSO,LSFA,ODFA,andchaotic FAby17.72%,16.43%,8.29%,5%, 6.93%, 7.86%, 5.93%, and5.86% respectively. Fig. 6 also illustrates thedetailedperformancevariationsofeachemotioncategoryover 30runsforallalgorithmsusingtheMMIimages.

As illustrated inFig.6,foreach emotioncategory, the median value ofouralgorithm ishigherthan thoseof allother methods. For recognition of “sadness” and“fear” emotions, 25% of the re-sultsfromouralgorithmarehigherthanthe maximumresultsof allotheralgorithms.Forthe“disgust” emotion,exceptforLSFAand ODFA, 25% of ourresults are also higher than the maximum re-sults of the rest of the methods. For “happiness”, the minimum accuracy rate of our algorithm (with a lower whisker of 92%)is higherthan50%oftheresultsofMFO,ODFA,andchaotic FA,75% of the resultsof GA, PSO, GM-PSO, andLSFA, andall theresults of LFA. For“anger”,the minimumaccuracy rateof ouralgorithm (withalowerwhiskerof89%)ishigherthannearly50%ofthe re-sultsofMFO,ODFA,andchaoticFA,75%oftheresultsofLFA, GM-PSO,andLSFA,andalltheresultsofGAandPSO.Forthe“surprise” emotion,themedianvalueofouralgorithm(100%)is5%,5%,5.5%,

6.5%,7.5%,8%,15.5%,and16%higherthanthoseofchaoticFA,MFO, ODFA,LSFA,GM-PSO,LFA,PSO,andGA,respectively.Forthe “neu-tral” emotion,incomparisonwithallothermethods,ouralgorithm hasabetteraccuracydistributionwithcomparativelysmaller vari-ations between the 25% and 75% percentiles, and the minimum accuracy rateof our algorithm (with a lower whisker of91%) is higherthanatleast25%oftheresultsofMFOandchaoticFA,50% of the results of LSFA and ODFA, 75% of the results of LFA and GM-PSO,andallresultsofGAandPSO.Overall,theevaluation re-sultsindicatesuperiorityofouralgorithm.Itoutperformsallother methodsbyasignificantmargin.

Fig.7showssome examplesof thegeneratedoptimizedfacial sub-regionsforeachexpressionpertainingtotheimagesfromMMI andJAFFE usingthe proposed M-LFA algorithm. Similar observa-tionsasthosefromCK+canbeexplicitlyobservedintheexample outputs. Ingeneral, significant discriminativecharacteristics asso-ciatedwitheachexpressionarerevealed,whichindicateefficiency andsuperiorityoftheproposedM-LFAalgorithm.

4.2.3. EvaluationusingimageswithrotationsfromMMIand BU-3DFE

Asindicatedinourpreviousexperiments,the45-degree multi-view images from BU-3DFE and 90-degree side-view rotated fa-cialimagesfromMMIrevealsignificantinformationlossandpose great challenges to state-of-the-art facial expression recognition systems.Therefore,weemploysuchmulti-viewandside-view im-agesfromBU-3DFEandMMI,respectively,tofurtherascertainthe robustnessof our featureselection algorithm.In the experiment, we use the proposed LBP descriptorfor feature extraction.Then, each feature selection algorithm is used for feature optimization beforeemployingthesingleandensembleclassifiers.

(13)

Fig.6. Theboxplotdiagramsforeachemotioncategoryover30runsusingtheMMIdatabase. Firstly,weemployprevious140side-viewimagesfromMMIfor

evaluation.Atotalof70imagesareusedfortraining,withthe re-maining imagesfor test. Table 9shows theaverage classification results of each algorithm in combination with diverse classifiers over30runs.Someexamplesoftheselectedoptimizedfacial sub-regionsforeachexpressionoftheside-viewimagesareillustrated inFig.8.

As illustrated in Table9, theproposed algorithmachieves the bestaccuracyscoresincombinationwithdiverseclassifiers.When theSVM-basedensemblemodelisapplied,M-LFAachievesan av-erageaccuracyrateof86.35%,andoutperformsGA,PSO,LFA,MFO, GM-PSO, LSFA, ODFA, and chaotic FA by 21.23%, 20.45%, 15.6%, 11.35%, 10.25%,12.68%, 11.35%and9.9%, respectively.As indicated in Fig. 8, significant discriminative features associated with each

(14)

Fig.7. Thesub-regionsandtheirdistributionsselectedbytheproposedM-LFAforeachexpressionforimagesfromMMI(thefirstthreeimages)andJAFFE(thelastfour images).

Table9

Averageclassificationresultsover30runsforthe90-degreeside-viewimagesextractedfromMMI. Featureselectionmethods Numberoffeatures NN(30runs)% SVM(30runs)% NN-based

Ensemble (30runs)% SVM-based Ensemble (30runs)% GA 120–250 60.44 60.77 64.43 65.12 PSO 100–195 61.85 61.95 65.00 65.90 LFA 45–90 66.99 68.19 70.09 70.75 MFO 50–85 71.20 71.88 73.50 75.00 GM-PSO[15] 45–80 72.33 72.50 74.75 76.10 LSFA[21] 65–85 70.25 71.00 72.34 73.67 ODFA[22] 50–70 70.05 70.78 72.45 75.00 ChaoticFA[26] 50–80 74.55 75.00 76.05 76.45 TheproposedM-LFAalgorithm 40–65 78.00 80.40 85.99 86.35

Fig.8. Thesub-regionsandtheirdistributionsselectedbytheproposedM-LFAalgorithmforeachexpressionoftheside-viewimageswith90-degreerotationsextracted fromMMI.

expression are also revealed for the 90-degree side-view images to indicate efficiency of the proposed M-LFA algorithm. The ex-tractedfeaturesalsomapcloselywiththemuscularactivities rec-ommendedbyFACS.

Moreover,we havealsoemployedmulti-view imagesfromthe BU-3DFEdatabasetofurtherassessourfeatureselectionalgorithm. The original 750 multi-view images from BU-3DFE are used for evaluation, with500imagesfortrainingand250imagesfortest. Table10showstheaverageclassificationperformancesofeach al-gorithmincombinationwithdiverseclassifiersover30runsusing multi-viewimageswith45-degreerotations.AsshowninTable10,

theM-LFAalgorithmachievesthehighestaccuracyratesin combi-nationwithdiverseclassifiers.IntegratedwiththeSVM-based en-semble model,it obtains the best average accuracyrate of100%, and outperforms GA, PSO, LFA, MFO, GM-PSO, LSFA, ODFA, and chaoticFAby26.75%,23.89%,20%,9.55%,10.9%,12.05%,7.01%,and 3.3%,respectively.

The selected optimized facial sub-regions for each expression ofthemulti-viewimagesareshowninFig.9.Theseselected opti-malsub-regionsaroundthemouthandeyeareasassociatestrongly withtheexpressionofseven emotions.Theyarehighlycorrelated withtheAUsprovidedinFACStoo.

(15)

Table10

Averageclassificationresultsover30runsformulti-viewimageswith45-degreerotationsextractedfromBU-3DFE. Featureselectionmethods Numberoffeatures NN(30runs)% SVM(30runs)% NN-based

Ensemble (30runs)% SVM-based Ensemble (30runs)% GA 135–245 70.15 70.80 73.00 73.25 PSO 150–215 72.33 73.00 75.67 76.11 LFA 60–95 76.75 77.56 79.33 80.00 MFO 45–75 86.45 87.05 89.75 90.45 GM-PSO[15] 40–70 85.77 86.10 88.50 89.10 LSFA[21] 45–90 84.00 84.75 86.65 87.95 ODFA[22] 40–65 89.20 90.00 92.35 92.99 ChaoticFA[26] 40–70 92.50 92.89 94.30 96.70 TheproposedM-LFAalgorithm 25–55 95.87 96.25 100 100

Fig.9. Thesub-regionsandtheirdistributionsselectedbytheproposedM-LFAalgorithmforeachexpressionofmulti-viewimageswith45-degreerotationsextractedfrom BU-3DFE.

Some theoretical comparisonbetweenODFA [22] andthe pro-posedM-LFAalgorithmisconducted,asfollows.ODFAemploysan opposition-based method for population initialization, which in-cludes generating an opposite population of the original swarm. Then, a dimensional-based method is applied to generate the globalbestsolutionbyidentifyingthemostoptimalvalueforeach dimensionindividually. Thisglobalbest solution isthen used for updating the position of each firefly in each iteration. Although achieving an efficient computational cost, the search process of ODFAinheritsthePSOconcept,whichisguidedbytheglobalbest solutionineachgeneration,ratherthanevolvingthrougha setof optimal solutions based on the attractiveness and attraction be-havioursoftheoriginalFAmodel.Inaddition,ODFAdoesnot pro-videanymechanismtoconductlongandshortjumpsofthe iden-tifiedglobalbestsolutionto avoidlocaloptimum.Therefore,it is morelikelytoleadtopremature convergence.Another shortcom-ingofODFAisitslimitationindealingwithmultimodal optimiza-tionproblemsthat havemultiplebest solutions.GM-PSO [15] in-tegratesPSOwiththeGAandthreemutationtechniquesof Gaus-sian,CauchyandLevy distributionstofurtherenhancetheswarm leader to identify the discriminative features for bodily expres-sionregression.However, itssearch strategy reliesheavily onthe PSOmechanismwherethesearchprocess isguidedbytheglobal bestleaderratherthanmultiplepromisingsolutionsinthesearch space, therefore more likely to be trapped in local optima. LSFA [21]incorporates LFAwith SA,where thebest solutionidentified byLFAisfurtherenhancedbytheSA.However,SA-basedlocal

ex-ploitationisonlyappliedtotheglobalbestsolution.Thealgorithm doesnotincludeanyotherstrategytoincreaselocalexploitationof theoverallpopulation.Therefore,thesearchprocesslacksof diver-sity and showslimited capability in escaping from local optima. Moreover, chaotic FA [26] employs CMOforpopulation initializa-tion,inordertoincreaseswarmdiversity.Achaoticstrategyisalso employedbyfireflieswithalowerlightintensitytomovetowards those with a higher light intensity in the neighbourhood. Espe-cially, the firefly withthe highest light intensity purely executes thischaoticmovement,ratherthanarandombehaviour,toexploit thesearchspace.However,ifthischaoticmovementfailsto gener-ateafitteroffspringforthecurrentglobalbestleader,thereisno searchmechanismembeddedinthealgorithmtodrivethesearch outofthelocaloptimumandtoovercomestagnation.

Incomparisonwiththeabovemethods,theproposedM-LFA al-gorithm employs fireflies andmothsto followmultiple attractive solutions inthe neighbourhood (rather thanpurely following the globalbest leader asinODFA [22] andGM-PSO [15]) tomitigate premature convergence.It employs two search strategies, i.e.the spiralsearchofthemothstoincreaselocalexploitationofLFAand the attractiveness search actions of the fireflies to cause sudden movementoffireflies andtheir attachedmothstoincreaseglobal explorationofMFO.Ineachiteration,eachfireflyisguidedbyboth searchstrategiessimultaneouslytofindtheoptimalsolution.These twostrategiesworkcooperativelytoovercomethelocaloptimum. Whentherearenomoreattractivefirefliesinthesearchspace,the spiralsearchbehaviourofthemothsexploitstheneighbourhoodof

(16)

Table11

ComparisonbetweentheproposedM-LFAalgorithmandnon-evolutionaryfeatureselectionmethodsusingfrontal-viewimages fromCK+,JAFFEandMMI.

Featureselectionmethods Averagenumberof CK+ JAFFE MMI(Frontal-view) selectedfeatures SVM-based

Ensemble% SVM-based Ensemble% SVM-based Ensemble% mRMR[13] 72 84.00 83.56 79.50 PCA+LDA[9] 61 97.88 97.05 92.33 PCA+LFDA[9] 55 98.75 98.25 92.50 KLFDA[9] 54 99.00 98.88 93.00

TheproposedM-LFAalgorithm 43 100 100 94.86

Table12

ComparisonbetweentheproposedM-LFAalgorithmandnon-evolutionary featureselectionmethodsusingside-viewimageswith90-degreerotations fromMMI.

MMI(90-degreeside-view) Featureselection methods Averagenumberof selectedfeatures SVM-based Ensemble% mRMR[13] 164 70.85 PCA+LDA[9] 71 80.45 PCA+LFDA[9] 57 81.00 KLFDA[9] 57 83.20 TheproposedM-LFA

algorithm

50 86.35

thefireflies,inordertoavoidstagnation.Ontheotherhand,when thespiralsearchprocess ofthemothsfailsto identifya fitter so-lution,theattractiveness andattractionbehavioursofthefireflies are able to guidethe search process towards the optimalregion. Inaddition,SAembeddedwithLevyflightsisusedtoenable long-jumpofthemostoptimalsolutiontoavoidstagnation.Theabove search mechanismleadstosuperiority oftheproposed algorithm over other state-of-the-artPSOandFA variants,i.e.GM-PSO [15], LSFA [21],ODFA [22], andchaotic FA [26].Most importantly,our algorithm extendsthenaturalmultimodaloptimization character-istics of the original LFA, and showsefficient abilities in dealing withmultimodaloptimizationproblems.

4.3. ComparisonwithNon-evolutionaryfeatureselectionmethods Besidestheabovetheoreticalevaluationandempirical compar-ison against evolutionary feature optimization methods, we fur-thercompareM-LFAwithnon-evolutionaryfeatureselection tech-niques presented in [9] and [13]. As discussed earlier, Ali et al. [9] employed three feature dimensionality reduction techniques to identify the most discriminative features for each expression, i.e., PCA+LDA,PCA+LFDA,andKLFDA.Zhangetal.[13]developed a shape-based facial expression recognition system with mRMR-based feature selection. We have also implemented both non-evolutionary feature selectionmethods described in[9]and [13], i.e. mRMR, PCA+LDA, PCA+LFDA, and KLFDA, for comparison.In each experiment, we use the proposed LBP descriptor to extract initialfacialfeatures,andthenapplyeachoftheabovefeature se-lection methods forfeature optimization. Table 11showsthe re-sultsforeachset of175frontal-view imagesextractedfromCK+, MMI,andJAFFE respectively.Note that250imagesfromCK+ are used fortraining. We have alsocompared M-LFA withthe above methodsusingthe90-degreeside-viewimagesfromMMI,with70 imagesfortrainingandanother70imagesfortest.Multi-view im-agesfromBU-3DFEarealsousedforevaluation,with500and250 imagesfortrainingandtest,respectively.Theresultsfromthe side-view imagesfromMMIandmulti-viewimagesfromBU-3DFEare providedinTables12and13,respectively.

Table13

ComparisonbetweentheproposedM-LFAalgorithm,andnon-evolutionary featureselectionmethodsusingthemulti-viewimageswith45-degree ro-tationsfromBU-3DFE.

BU-3DFE(45-degreemulti-view) Featureselection methods Averagenumberof selectedfeatures SVM-based Ensemble% mRMR[13] 177 77.33 PCA+LDA[9] 90 96.00 PCA+LFDA[9] 84 96.35 KLFDA[9] 87 97.98 TheproposedM-LFA

algorithm

36 100

Theoretical comparison between M-LFA and the above non-evolutionary feature selection methods is conducted, as follows. Although mRMR is a popular feature optimization method, ac-cording to Zeng et al. [32], the incremental search scheme of mRMR only selects one feature at a time without considering the interaction betweengroupsof features. Therefore, the exper-imentsusingthemRMR-basedfeatureselection methodshownin Tables 11–13 yield the least promising performance. As indi-cated in Tables 11–13, the three related

Figure

Fig. 1. The system architecture.
Fig. 2. Flowchart of the proposed M-LFA feature optimization algorithm.
Fig. 3. Example outputs of all the LBP operators for images with illumination changes (the 2nd and 3rd columns derived from CK + ), scaling differences (the 4th and 5th  columns derived from CK + ) and rotations (the 6th and 7th columns from BU-3DFE and th
Fig. 4. The sub-regions and their distributions selected by M-LFA for each expression of the CK + images
+4

References

Related documents