• No results found

Cry-based infant pathology classification using GMMs

N/A
N/A
Protected

Academic year: 2021

Share "Cry-based infant pathology classification using GMMs"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

SpeechCommunication77(2016)28–52

www.elsevier.com/locate/specom

Cry-based

infant

pathology

classification

using

GMMs

Hesam

Farsaie

Alaie

1,∗

,

Lina

Abou-Abbas

,

Chakib

Tadj

MMSLab,DepartmentofElectricalEngineering,ÉcoledeTechnologieSupérieure,Université duQuébec,1100rueNotre-DameOuest,Montréal,QC,Canada, H3C1K3

Received1February2015;receivedinrevisedform13October2015;accepted2December2015 Availableonline11December2015

Abstract

Traditionalstudiesofinfantcrysignalsfocusmoreonnon-pathology-basedclassificationofinfants.Inthispaper,weintroduceanoninvasive healthcaresystemthatperformsacousticanalysisofuncleannoisyinfantcrysignalstoextractandmeasurecertaincrycharacteristicsquantitatively andclassifyhealthyandsicknewborninfantsaccordingtoonlytheircries.Intheconductofthisnewborncry-baseddiagnosticsystem,thedynamic MFCCfeaturesalongwithstaticMel-FrequencyCepstralCoefficients(MFCCs)areselectedandextractedforbothexpiratoryandinspiratory cryvocalizationstoproduceadiscriminativeandinformativefeaturevector.Next,wecreateauniquecrypatternforeachcryvocalizationtype andpathologicalconditionbyintroducinganovelideausingtheBoostingMixtureLearning(BML)methodtoderiveeitherhealthyorpathology subclassmodelsseparatelyfromtheGaussianMixtureModel-UniversalBackgroundModel(GMM-UBM).Ournewborncry-baseddiagnostic system(NCDS)hasahierarchicalschemethatisatreelikecombinationofindividualclassifiers.Moreover,ascore-levelfusionoftheproposed expiratoryandinspiratorycry-basedsubsystemsisperformedtomakeamorereliabledecision.Theexperimentalresultsindicatethattheadapted BMLmethodhaslowererrorratesthantheBayesianapproachorthemaximumaposterioriprobability(MAP)adaptationapproachwhenconsidered asareferencemethod.

© 2015TheAuthors.PublishedbyElsevierB.V.

ThisisanopenaccessarticleundertheCCBYlicense(http://creativecommons.org/licenses/by/4.0/).

Keywords:Gaussianmixturemodel;Universalbackgroundmodel;Mel-frequencyCepstralCoefficient;Likelihoodratioscores;Newborninfantcries;Expiratory andinspiratorycry.

1. Introduction

Crying isthefirstclearsignoflifethatisobservedshortly afterababy’slivebirth.Althoughtherehavebeensomebooks andproductsthatwerecreatedthroughtheyearstounlockthe secretlanguage of babies,their potential for usein the early diagnosisandtreatmentofnewbornsremainslargelyinanopen andundevelopedstate.Theresultsofthesestudieshighlightthe existenceof somecry attributesinsickinfantsthatare rarely observedin the cries of healthy infants(Wasz-Hockert et al. 1985,1968;BensonandHaith,2010).Instead,theseattributes occurfrequently inthe cries of sick infants whosuffer from differentmedicaldiseasesandconditions.Therefore,infantcry characteristicsreflecttheintegrityofthecentralnervoussystem.

Correspondingauthor.Tel:+15146419426.

E-mailaddress:[email protected], [email protected](H.FarsaieAlaie).

1

A2446,1100rueNotre-DameOuest,QC.

1.1. Earlystudiesandbirthdefects

Many early researchers defined several cry characteristics andpresented their common values,suchas the fundamental frequency, formants, cry modes, cry latency, phonation, hy-perphonation, and dysphonation (Wasz-Hockert et al. 1985; LaGasseetal.,2005;Lederman,2002;Newman,1985). Grad-ually,detailedacousticanalysis,whichmeasuresandcompares the acoustical characteristics of newborn infant cry signals, showshiddendiagnosticpotential ofcry signalsfor thebasic cry types and the cries of infants in pathological conditions such as brain damage, central nervous system diseases and Down’s syndrome (Wasz-Hockert et al.1985,1968; Michels-son,1971; MichelssonandMichelsson,1999;Partanenetal., 1967). It appearsthat some of the symptoms are not always recognizedorevendonotappearformonthsoryears;thus,it might betoo latefor treatment after clinicalsymptoms start, especiallyincountriesthatdonothavewell-establishedhealth services.

http://dx.doi.org/10.1016/j.specom.2015.12.001

(2)

Fig.1. Leadingcausesofinfantdeathsin193countriesin2010.

AsFig. 1 shows,congenital anomalies andpreterm births werethedominantcausesofapproximately2.7millioninfants deathsin193countriesin2010(Congenitalanomalies,2014). Statistical reports by the World Health Organization(WHO) (Congenitalanomalies,2014) andCenterfor Disease Control andprevention(CDC)(Anon,2008)statethatcongenital anoma-liesor birthdefects affectapproximately 1in33infantsborn every year. Moreover, in spite of the fact that the U.S. and Canada are highly developed countries, the results of an in-vestigation ofearlyinfantmortalityratesin176 countries in-dicatethattheU.S.andCanadahadthe1stand2ndworstrates inthe developed world,respectively, by 2.6and2.4first-day deaths per 1000births (SaveTheChildren, 2013).It is worth-whilementioning thatapproximately1 percentoftheworld’s infantdeathsoccurindevelopedcountries,andthesituationis worseformanydevelopingcountries.However,it iseasierto identifyababywhohasstructuralproblemssuchascleftlip;on theotherhand,symptomsofsomedefectsmightbeinvisibleand hiddenfromsight.Therefore,webelievethatbyprovidingan inexpensivehealthcaresystemthatdoesnothavecomplexand advancedtechnologyforpoormotherswithnewbornbabiesin low-incomecountries,morebabiescansurvivebeyondthefirst monthsoflife.

Thereareasubstantialnumberofmaternaland environmen-talissuesthatcanraisetherisksofseveralcomplicationsand associatedanomalies,suchasthegestationalage,birthweight, consanguinity,maternal age,multiplegestations, maternal in-fectionduringpregnancy,socioeconomicfactorsandmaternal nutritional status. Forexample, the gestationalage isa note-worthypredictorofinfanthealthconditionswithinthenormal rangeof37–41weeksforbabieswhoarefullydeveloped(full term).Prematurebirth,evenonlyafewweeksearly,increases thechanceofbirthdefectsorinfantdeathinsuchawaythatin theU.S.,the2010mortalityrateforveryearlypreterm(under32 weeks)birthswas74timesworsethanthatoffull-terminfants (MathewsandMacDorman,2013).

Theseofficialstatistics,whichiscompletelyindependentof theinformationintheinfantcries,canprovidemore informa-tionaboutthechancethataninfantisbornwithaspecific con-genitaldisease.Moreover,thereareotherindependentsources of information that are relatedto the physiological condition of the newborninfantsthatcan beusefulinasimilarwayin

multimodalbiometricsystems,whichusemultipleindependent sourcesofinformationandindeedprovideamorereliable sys-tem.However, inthispaper,we are curiousabout examining onlytheabilityofinformationthatisembeddedininfantcries todifferentiatebetweenseveralpathologiesinspiteoftheother sources.

Thisapproachencouragedustobeambitiousanddevelopa newborncry-baseddiagnosticsystemforthecareofbirthdefects afterbirthbyidentifyingsomepossiblephysiologicaldisorders andbirthdefects.Thisearlyinterventioncandefinitelysavethe livesofmanyinfantsandprotectthemfromsomephysical, intel-lectual,visualorauditoryimpairmentbeforeseveredisabilities mightbecaused.

1.2. Relatedstudies

Theleadingroleintheclassificationoftheinfantcryishow toscientificallydiscriminatebetweendifferentneonatalhealth statuses,onlyonthebasisoftheircrysignalsbesidesthehealth examinationofinfantsandotherpredictorsofchildhealth.In recentyears,severalmachinelearningandclassification algo-rithms,suchasartificialneuralnetworks(OrozcoandGarcia, 2003;Canoetal.,2006;Hariharanetal.,2011),radialbasis func-tion(RBF)networks(CanoOrtizetal.,2004),supportvector machines(SVMs)(Amaro-CamargoandReyes-García,2007), NaïveBayes (Amaro-CamargoandReyes-García,2007), Ge-neticSelectionofaFuzzyModel(GSFM)(Rosales-Pérezetal., 2012)havedemonstratedthe abilitytorecognizecrypatterns andmakeintelligentdecisionsbasedontheavailabletraining databases.Accordingtoourknowledge,casesubjectsofbinary classificationtasksinmostofthepreviousworks(Orozcoand Garcia,2003;Canoetal.,2006;Hariharanetal.,2011;Cano Or-tizetal.,2004;Amaro-CamargoandReyes-García,2007)were normalinfantsandthoseinfantssufferingfromhypoacoustic (deaf)orasphyxia.Onthecontrary,theprincipalaimofthe pro-posedsystemistobroadenthediagnosticsystemtoaddressthe mostlife-threateningillnessesanddefectsthatoccurinnewborn infantsintheearlydaysoftheirlife.Itisworthwhilementioning thatitissometimesnoteasytocollectalargenumberof sam-plestorepresentageneralcry pattern.The failuretoachieve thelowestpossibleerrorrateisthemaindrawbackofhavingno

(3)

Table1

DifferentunitsavailableintheCDB.

Labels Definitions

EXP Voicedexpirationsegmentduringaperiodofcry EXPN Unvoicedexpirationsegmentduringaperiodofcry INS Unvoicedinspirationsegmentduringaperiodofcry INSV Voicedinspirationsegmentduringaperiodofcry EXP2 Voicedexpirationsegmentduringaperiodofpseudo-cry INS2 Voicedinspirationsegmentduringaperiodofpseudo-cry PSEUDOCRY Anysoundgeneratedbythebabyanditisnotacry Speech Soundofthenurseorparentstalkingaround

Background Kindofnoisesolow,itischaracterizedbyaverylowpower-silenceaffectedwithlittlenoise. Noisycry Anysoundheardwiththecrye.g.machine’sbip,water,diaperetc….

Noisypseudo-cry Anysoundheardwiththepseudo-cry

Noise Likethesoundcausedbythemicmovedbysomeone,thediaper,adoorsound,speech+background,speech+bip. BIP soundofthemedicalinstrumentsnextthebaby

acceptablecrydatabase.Forthisreason,learningfromasmall, incompletesetofsamplesisofpracticalinterest.

Finitemixturemodelsareaflexibleandpowerful probabilis-tictoolformodelingunivariateandmultivariatedatatoperform modelingandclassificationtasks(McLachlanandPeel,2004). Inthispaper,weemploytheGaussianmixturemodel(GMM), whichisapowerfulmodelforrepresentingalmostany distribu-tion.TheGaussianmixturemodeliscomputationally inexpen-siveandisbasedonawell-understoodstatisticalmodelthatcan beviewedasahybridbetweenaparametricandnonparametric densitymodel. Moreover,thereare manyadvantagesthat are claimedwhenusingGMMasthelikelihoodfunction(Reynolds etal.,2000).Theexpectation-maximization(EM)algorithmis acommon methodfor maximumlikelihood learningof finite GMMs;thisapproachhassomeadvantagesoverotherlearning methods, such as agradient-basedapproach (Xu andJordan, 1996).

Inourpreviousstudies(FarsaieAlaieandTadj,2012;Alaie andTadj,2013),wehaveintroducedaworkingprototypetotrain aGMMinanincrementalandrecursivemanner;thismethodis calledthe adapted boosting mixturelearning Method(BML). The proposedmethod trains finitemixture models bya pool ofGaussiancomponents.Partialandglobalupdatingmethods areused inmodelparameterestimationprocessestospeedup thelearningprocessandconvergetoamorerobustandreliable estimationofanewmixturecomponent.Theselectedstrategy tostoptheaddingprocessisacriterion-basedapproachcalled BayesianInformationCriterion(BIC).Wehaveshownthatthe proposedmethod has better performance than the traditional EM-basedre-estimationalgorithmasareferencesystemforthe classificationoftheinfants’cries.Inabinaryclassificationtask, thesystemdiscriminatedatestinfant’scrysignalintooneoftwo classes,namely,healthyinfantsandsicknewbornbabieswith selecteddiseases.

In thispaper, we describethe developmentandevaluation of a Gaussian Mixture Model-Universal Background Model (GMM-UBM) system that is applied to an infant cry expi-ration and inspiration corpora for the enrolled health condi-tions.Thisnewborncry-baseddiagnosticsystemcanbereferred to as the GMM-UBM health-condition verification/detection system.

The remainder of this paper is organized as follows. In

Section2,wepresentrecordingprocedureandourcrydatabase. Next,inSection3,thefeatureextractionprocedureisexplained.

Section4 presentsour classification approach, andSection5

describesexperimentsandtheresultsofeachhealth-condition detectorusingourcrydatabase.Finally,conclusionsandfuture directionsarepresentedinSection6.

2. Recordingprocedureandcrydatabase

Therecordingsweremadeintheneonatologydepartments ofseveralhospitalsinCanadaandLebanon.Weperformedthe recordingprocessbyconvertingtheanalogcrysignaltoan un-compresseddigitalaudioformatthatissuitablefor storingan originalrecordinginawavfile.Eachinfant’scrywasrecorded byanOlympushand-helddigital2-channelrecorder,placed10– 30cmawayfromtheinfant’smouth,atasamplingfrequency of 44.1 kHz and a sample resolution of 16 bits. A neonatal intensive-careunit(NICU)isaspecialsystemofcarefor new-borninfantswhoaresickorprematureorgenerallyneedmore medicalattentionduetosufferingfromsomecongenital abnor-malities.Occasionally,thecryrecordingprocesswasperformed withbackgroundnoiseorevenwithaconstantnoisefromthe careunitandfrommedicalequipmentthatisconnectedtothe infantswho are inthe NICUdueto their prematurity or de-fects. Although the NICUshouldbe aquiet environment for sleepingbabies,inrealitythereisalargeamountofunwanted noise from infusion pumps,monitors, ventilators, telephones anddoors.Ourgoalistodevelopasystemthatdoesnotneed complexandadvancedtechnologytoprovidepoormothers hav-ingnewbornbabiesinlow-incomecountries.Thus,soundproof roomsorunitswerenotusedtorecordthecrysignaltoobtain thebestsignal-to-noiseratiothatcouldotherwisebeachieved. Eachrecordedinfant’scrysignal,evenahealthyinfantwhois notcompletely clean,is manuallysegmentedinto13unitsor classes,whicharedefinedandlabeledinTable1.

Thecasesubjectsforthisstudy wereinfantsselectedfrom 1to53daysold,comprisinghealthyandsickfull-termbabies. Foreachinfant,therearethreerecordingfiles,withtheaverage durationof 90 sfor each continuous file.Usefulinformation suchasthedateofbirthwererecordedalongwiththefollowing

(4)

Table2

Listofhealth-conditions.

Categories Description

Healthyinfants Full-terminfantswithoutanymajordisorderorsickness

Heartproblems Full-terminfantssufferingfromtetralogyoffallot,thrombus,complexcardioorcongenitalheartdiseases Neurologicaldisorders Full-terminfantssufferingfromsepsisormeningitis

Respiratorydiseases Full-terminfantssufferingfromrespiratorydistressorasphyxiadiseases Bloodabnormalities Full-terminfantssufferingfromhyperbilirubinemiaorhypoglycemiadiseases

Other Full-terminfantssufferingfromotherabnormalitiesorphysicalproblemswhicharenotinpriorityorderforoursystem

pertinentinformation:weight,gender,maturity,race,ethnicity, gestationalage, knownanddetecteddiseases,APGARresult, dateandtimeofeachcryrecordingandthereasonforthecrying (suchaspain,hunger,diaperchange,birthcry,medicalexam). Sofar,wehavedividedtheavailablehealthconditionsinourcry database(CDB)intodifferentcategorieslistedinTable2.The reasonforthecryisnotconsideredintheselectedsamples,in contrasttopreviousstudiesthatusedonlythepaincry(Partanen etal.,1967).

3. Featureextraction

Usually, in humanspeech signals, thereare low-level and high-levelcuesthatcanbeusedtorecognizedifferentspeakers; thesecuesarerelatedtotheacousticandsemanticor linguis-ticaspectsofspeech.Thehumanauditorysystemusesdifferent levels of information,incontrast toautomaticsystems which dependstillonlow-levelacousticinformation.Themajor chal-lengesof havingahigherlevelof informationderived froma crysignalaretofindandextractsomefeaturesinsuchaway that they convey distinctive information from the cry signal; thisapproachhasbeenunderstudyinrecentyears(Kheddache, 2014).Inthispaper,MFCCsareselectedtobeextractedas fea-turesinordertorepresentnewborninfantcrysignalssincethey haveagoodperformanceonvarioustypesofspeechapplications (Deller etal.,1993;Quatieri,2002).Notethat thesamebasic modelofspeechproduction(Delleretal.,1993)inadultsisused tofindthesemeasurements.Thusfar,ithasbeenshownthatthey arealsoeffectiveinclassifyinghealthyandsickinfantsbasedon ourprimaryresults(FarsaieAlaieandTadj,2012).Moreover, weincorporatecontextinformationbyaddingdynamicfeatures inthiswork,buttheyarenotnecessarilythemostinformative featuresfortheintendedpathologyclassificationtask.

3.1. Preprocessingstages

ToincreasetheaccuracyandreliabilityoftheMFCCfeature extraction process,cry signals are pre-processed in 3 simple steps:

3.1.1.Convertstereochanneltomonochannel

Inthedatacollectionstep,becausewehaveuseda2-channel recorder,wemustaveragethechannelsfirstandthenconvertthe signalintoasingle-channelsignalusingameanvaluefunction.

3.1.2.Pre-emphasization

Similartoinaspeechrecognitionsystem, weused a first-orderhigh-passFIRfiltertopre-emphasizethesignalduetothe highdynamicrange ofthedigitizedcrywaveform,suchasin aspeechwaveform.Themainreasonforusingthisfilteristo compensatethespectraleffectoftheglottalsourceby introduc-ingazeronearz=1(Delleretal.,1993).Therefore,thefilter P(z)=1−0.97z−1 (Youngetal.,2006;RabinerandSchafer, 1978)shouldbeappliedpriortoderivingthefeaturesor charac-teristicsthatcorrespondtothevocaltractonly.

3.1.3.EXP/INSVdetection

Generallyacryisdefinedastheexpiratoryphaseof respira-tionwithsoundorphonationbythelarynx,whichcontainsthe vocalcordsorfoldsandtheglottis(LaGasseetal.,2005).Here, theinputdata(featurevector)giventoourcry-baseddiagnostic system(crypathologyclassifier)representsaprocessedversion ofoneormorevoicedexpiration(EXP)orinspiration(INSV) segmentsofcryutterances. Therefore,after segmentationand labeling,onlytheEXPandINSVsegments ofthecrysignals areselectedforthe featureextractionprocedures.Thesystem hasbeen builtmanuallyby trainedexperts sofar,but we are workingonautomaticsegmentationofrecordedcrysignalsthat canactinstead ofvoiceactivation detection(VAD)inspeech recognitionsystems.

3.2. StaticanddynamicMFCCs

Briefly,thecrysignalsarepre-processedtobepreparedfor short-termprocessing,andthen,thefeatureextractionprocedure isapplied.Generally,alloftheconventionalanalysistechniques usedinthesignalprocessingapplicationworkwithshort-term framesofsignalswithnon-stationarydynamics,suchashuman speech.Therefore,eveninourcaseinpoint, itisourdutyto selectareasonableportionofthecrysignalinsuchawaythatit doesnotchangestatistically.Framesarecommonly10–30ms induration,tobestatisticallystationarywithagoodtradeoff be-tweenthefrequencyandtimeresolutionsinapplicationsthatuse speechsignals.Inthispaper,featureextractionwasperformed byusingtwodifferentframedurations,10and30,withthesame overlappercentage(30%)betweentwoconsecutivewindows,to findagoodtradeoffbetweenthefrequencyandtimeresolution ofthecrysignalsandtoassesswhatimprovementsitmighthave. Afterframingandpriortoanyfrequencydomainanalysis,the hammingwindowinghasbeenappliedtoreduceany discontinu-itiesattheedgesoftheselectedregion.Inhumanspeechsignals, thereisnotmuchinformationabove6.8kHz.Thecumulative

(5)

1000 2000 3000 4000 5000 6000 7000 8000 50 55 60 65 70 75 80 85 90 95 100

Cumulative Power Spectrum for EXP segments

P e rc e n ta g e (% ) Frequency (kHz) Hesam-Fullterm-Sick-RespiratoryDiseases Hesam-Fullterm-Sick-Others Hesam-Fullterm-Sick-NeurologicDiseases Hesam-Fullterm-Healthy 4 KHz 6.8 KHz 98% 94% 1000 2000 3000 4000 5000 6000 7000 8000 50 55 60 65 70 75 80 85 90 95 100

Cumulative Power Spectrum for INSV segments

P e rc ent age ( % ) Hesam-Fullterm-Sick-RespiratoryDiseases Hesam-Fullterm-Sick-Others Hesam-Fullterm-Sick-NeurologicDiseases Hesam-Fullterm-Healthy 4 KHz 6.8 KHz 98% 94% 1000 2000 3000 4000 5000 6000 7000 8000 50 55 60 65 70 75 80 85 90 95 100

Cumulative Power Spectrum for EXP segments

P e rc e n ta g e (% ) Hesam-Fullterm-Sick Hesam-Fullterm-Healthy 4 KHz 6.8 KHz 98% 94% 1000 2000 3000 4000 5000 6000 7000 8000 50 55 60 65 70 75 80 85 90 95 100

Cumulative Power Spectrum for INSV segments

P e rc ent age ( % ) Frequency (kHz) Hesam-Fullterm-Sick Hesam-Fullterm-Healthy 4 KHz 6.8 KHz 98% 94% a b c d

Fig.2. Cumulativepowerspectrumof(a)and(c)EXPunits,(b)and(d)INSVunitsforeachhealthcondition.

powerspectrumisusedheretodetecttheuppercutofffrequency oftheefficientfrequencybandofinfantcrysignals,wherethe poweralmoststopstoincrease.Theresultsofourexperiments depictedinFig.2demonstratethatthecrysignalsoffull-term healthyandsickinfantshavealmost94%and98%oftheir ener-giesbelow4kHzand6.8kHzrespectively.Theresultsdepicted inFig.2alsoindicatethattheenergiesoftheinspiration seg-mentsforsickinfantstendtoaccumulateataslowerratethan theenergiesoftheexpiration segments,especiallyincasesof RDSdisorders.Thusfar,informationuptoa4kHzbandwidth hasbeenused,butweplantoconductpioneeringresearchon theaforementionedupperfrequencyband.Inbrief,thefeature extractionphaseforeithertheINSVorEXP-labeledsegments canbeperformedintwostages:

1.ReducethedimensionalitybyCepstralanalysisandextract thefirst12MFCCscomputedfrom24filterbanksplusthe energyfeature.

MFCCs areintroduced byMermelsteininDavisand Mer-melstein (1980) as the DCT of the log-energy output of the triangularbandpassfilters.To extractthe MFCCs,first thefastFourierTransform(FFT)isperformedtoobtainthe magnitudefrequencyofeachwindowedframe,andthen,the MFCCs arecalculatedbyconvertingthelogMelspectrum

backtothetimedomainusingthediscretecosinetransform (DCT): Ci= K k=1 Skcos i k−1 2 π K , i=1,2,...,M (1) whereKisnumberofsubbands(filterbanks)(whichis24for ourselectedbandwidth(Reynolds,1995)),Misthedesired lengthofthecepstrum,andSkrepresentsthelog-energy

out-putofthekthtriangularbandpassfilter.Fig.3indicatesall ofthe3pre-processingstepsfollowedbytheaforementioned MFCCextractionprocedureinstages.

2. Adddynamicfeaturesbytakingthefirstandsecond deriva-tivesoftheobtained13-static features,calledthedeltaand delta–delta(acceleration)coefficients.

Thefirst timederivationof thebasicstaticparameters (re-ferredtoasdeltacoefficients)canbecalculatedoveralimited window,asfollows(Youngetal.,2006):

Dn=

θ=1θ(Cn+θCn−θ)

2θ=1θ2 (2)

where Dis a delta coefficient at the discrete time n, andCi

(6)

Fig.3. Pre-processingandMFCCfeatureextractionsteps.

boththepastandfuturestaticparametersCn±θ,toavoidhaving

aproblemwiththeregressionwindowatthebeginningandend ofthestaticparameters,usuallyreplicationofthefirstandlast parametersisrequired.Thesameformulaisappliedtothe calcu-lateddeltacoefficientstocomputethesecondderivationofthe staticparameters(referredtoasthe accelerationcoefficients). Afterappending the deltaandacceleration coefficients tothe staticMFCC parameters, the setof 39-length feature vectors extractedfromeach singlewindowedframecryisdenotedxt,

wheretshowsthesequenceindex.Therefore,acrysignalcanbe displayedbythesequenceoffeaturevectorsxtrunninguptothe

endofthesignal,withTfeaturevectorsX =(x1,...,xt,...,xT).

4. Statisticalmodelinganddescriptions

Thereisaminordifferencebetweendetectionand identifi-cationsystemsinadecisionprocess,whilebothusethe same baseofinformation.Ithasbeenshownthatidentificationoccurs insideofadetectiontaskinsomesenseanyway,andtheir per-formances changetogether inthesameway(Thomas, 1985). Weareseekingtointroduce acry-basedidentificationsystem toclassifythepresented infantas havingoneof thespecified healthconditions,butinthispaper,detectionismeasuredbythe abilityofourclassifiertodistinguishbetweenaninfantwiththe specifiedhealthconditionandaninfantwiththeotherconditions listedinTable2aspreliminarystages.

4.1. Likelihoodratiodetector

Inanidealcasewithwell-definedmodelsforcrysignalsof infantsinhealthandpathologicalconditions,thedefined clas-sificationproblemissimilartothecanonicallanguage recogni-tionproblem(Brummer,2010)withtheclosed-setofspecified languages.Similartospeakeridentificationsystemsthatare in-tendedfora1:Nmatch,thevoiceiscomparedagainstNspeaker models12,...,λN),whereλirepresentstheparametersof

theithithspeakermodel.Thissystemcanbepresentedbya maxi-mumlikelihoodclassifierwhoseobjectiveistoselectthespeaker modelthathasthemaximumaposterioriprobability(MAP)for the observation vectorsequenceX =(x1,...,xt,...,xT).The

decisioncanbepresentedbytheminimum-errorBayes’rule,as follows:

MatchedSpeakerIndex=argmax

1≤iNPr(λi|X) =argmax

1≤iN

p(X|λi)Pr(λi)

p(X) (3)

Ifitcanbeassumedthatthepriorprobabilityofeachspeaker isequal,thenthedecisionformulacanbesimplifiedasfollows: MatchedSpeakerIndex=argmax

1≤iNp(X|λi) =argmax 1≤iN T t=1 logp(xt|λi) (4)

Inverificationsystems,thetaskisa1:1match,whichisin markedcontrast tothe identificationsystem. Forexample, in speakerverificationsystems,theobjectiveistodetermineifthe observedinputXisfromthehypothesizedspeaker(hypothesis

H0)ornot(hypothesisH1).Thelikelihoodratiodetectorhasbeen

acceptedasageneralapproachinthespeakerverificationsystem (Reynoldsetal.,2000).AssumethatH0andH1hypothesesare

representedbymodelsλHypandλH yp,respectively;itcalculates

theratiooftheposteriorprobabilitiesofthetwohypotheses:

PrλH yp|X

PrλH yp|X

(5) Bayes’ruleprovidesashortcutforcalculatingthelikelihood ratiointhelogdomainbyignoringconstantsthatresultinthe log-likelihoodratiosofH0andH1,asfollows:

(X)=logpXλH yp −logp XλH yp acceptH0 ≥ < re jectH0 θ (6) where θ isathreshold thatadjuststhe trade-offbetweentwo typesof error, falseacceptance andfalse rejection.Although theclaimedspeakerhasawell-definedmodelinsuchasystem, thecorrespondingalternativemodelsareill-defined.Thisissue poses achallenge tocreate λH yp in such away that presents

the entire space of possible alternatives to the hypothesized speaker.Ingeneral,twomainapproacheshavebeendescribedin

Reynoldsetal.(2000)tomodelλH yp.Sincebothtechniquesare

evaluatedinourexperiments,wedescribebothofthembriefly below.

(1) Backgroundspeakermodels

Inthisapproach,thesetofspeakermodelsexcludingthe hy-pothesizedspeakerhavebeenselectedandcombinedtomodel the alternative hypothesis.There has been alarge amountof researchintobackgroundspeakers(Higginsetal.,1991; Mat-suiandFurui,1994;Reynolds,1995;Rosenbergetal.,1992). GivenBequallylikelybackgroundspeakers,whichare repre-sentedby12,...,λB),thelog-likelihoodsofthe

(7)

arecomputedas(Reynolds,1995) logpXλH yp = 1 T T t=1 logpxtλH yp (7)

The1/Tfactorisusedtonormalizethedurationeffectinthe log-likelihood.Notethatbyignoringthe1/Tfactor,the likeli-hoodofthebackgroundspeakerscanbeobservedasthejoint probabilitydensityoftheobservationXarisingfromoneofthe

Bbackgroundspeakers: logpXλH yp =log 1 B B b=1 p(X|λb) (8) wherep(X|λb)iscomputedas inEq.(7).The maindrawback

ofthisapproachispreparingabackgroundspeakersetforeach hypothesizedspeaker,whichcanbeaproblemforapplications thathavealargenumberofhypotheses.

(2)Speaker-independentmodel

This technique (Reynolds, 1997; Matsuiand Furui, 1995) attemptstopooltrainingsamplesfromalargenumberof speak-ers,torepresentthepopulationofspeakersbyasingle speaker-independent model; thismodel is currently known as a uni-versal background model (UBM). A Universal Background Model(UBM)isaworldmodelthatisusedmostlyin biomet-ricverificationsystemstorepresentgeneralfeature character-istics(Reynolds,2009).Specifically,theuniversal-background model-basedGMMorGMM-UBMhasalargeamountof suc-cessinstatisticalmodeling techniquesforspeakerrecognition andlanguagerecognitionsystems(Reynoldsetal.,2000),and incontrasttopreviousapproaches,atrainedUBMcanbeused forallhypothesizedspeakersinthetask.

4.2.Gaussianmixturemodels

The GMM modelingtechnique issimplebuteffective due toitsremarkableabilitytoformsmoothapproximationsfrom anyarbitrarilyshapeddatadistribution.Ithasbeenasuccessas astatistical modelindifferent applicationsandsystems,most notablyinspeaker recognition andspeaker identification sys-tems(ReynoldsandRose,1995)duetoitsabilitytomodelthe underlyingdataclassesordistributionsofacousticobservations fromaspeaker.ThelikelihoodfunctionofaGMMusedfora

D-dimensionalfeaturevector,x,isaweightedsumofK multi-variateGaussiancomponents,fi(x),eachparameterizedbyaD ×1meanvector(μi)andaD×Dcovariancematrix(i),as

givenbytheequation

F(x|λK)= K i=1 cifi(x)= K i=1 ciN(x|i) = K i=1 ciN(x|μi,i) (9)

where λK represents the GMM parametersandconsists of K

componentswiththerestrictionthatthemixtureweightsmust

satisfythefollowingtwoconstraints:ci≥0fori=1,...,Kand

K

i=1ci=1.Theithcomponentcanbewritteninthefollowing

notation: fi(x)=N(x|i)=N(x|μi,i)= 1 (2π )D2|i| 1 2 ×exp −1 2(xμi) Tr−1 i (xμi) (10) where i=(μi,i)are the parameters for the ithGaussian

density, and ATr represents the transpose of matrix A. Col-lectively,a GMM canbe denoted by its parametersas λK= (ci,i, i=1,...,K).

4.3. Systemdescription

The proposeddiagnostic system isbuilt around the likeli-hoodratiotestforverification,GMMsforlikelihoodfunctions andGMM-UBMmodelsforadaptingalternativehealth condi-tionsviatheadapted-BMLmethodinsteadofcommonBayesian adaptation(Reynoldsetal.,2000;DudaandHart,1973;Gauvain andChin-Hui,1994).Intherelatedliterature,thisapproachis alsoknownasBayesianlearningortheMAPestimationmethod. In ourwork, oneUBMis ahealth-independentGMM that is trainedwithcrysamplesfromtheavailabletrainingCDBthat contains full-term healthy and sick infants with specific dis-eases,torepresentthegeneralcryfeaturecharacteristics. An-other UBM is pathology-independent GMM that attemptsto modelcriesfromsickinfantsinavailablepathologicalclasses. Then,weemployedtheadaptedBMLmethodtoadaptthe re-latedUBMtoatargetorspecificclass.Wewillshowthatthis approachimprovestheperformanceofourclassifierin compar-isontoourreferencesystem,whichusesBayesianadaptation.

The cruxof the design ishow we fusesubsystems into a singleeffectivesystem.Ourcry-basedmulti-classrecognition systemhasahierarchicalschemethatisatreelikecombination ofindividualclassifiersinserialandparallelmodes.Weusedthe abilityoftheserialmodetonarrowdownthehealthcondition oftheinfantstooneoftwopossibilities,suchasthebiometric identificationsystemintroducedinHongandJain(1998).This approachmeansthatinthefirststep,thetwo-classpattern recog-nizershouldmakethedecisionastowhichpropositionshould beeliminated,healthyorsickinfants.Thispartofoursystem acts asaverification systemfor healthy infants,while distin-guishingbetweenhealthyinfantsandsickinfants.Thegoalhere istodeterminewhethertheinfantishealthyornot;inthecaseof unhealthy,thesecondpartshouldactasanidentificationsystem becausethecrysignalsofthesickinfantsareassumedtobefrom thepredefinedsetofknownsicknesses.Thewinnersicknessbest matchesthetestinfant’scrysignalmodelinaknowngroupof diseases.Thissicknessidentificationsysteminvolvesonlythe aforementionedenrolledsicknessesandnotallofthenewborn illnesses.Intheclosed-setcase, (N1,N2,...,NL)representsL

differentinfantsicknesses,whichhavewell-definedstatistical models.In the second scenario,whichis calledthe open-set, the same(N1,N2,...,NL−1)are specifiedsicknesses, andNL

(8)

Table3

NumberanddurationoftherecordedcrysignalsthatwereavailableinthetrainingCDBatthetime. Class Numberofinfants Numberofcrysignals

intrainingCDB

Overalllengthof trainingCDB

INSV EXP

Healthyinfants 58(4male) 142(11male) 12’4’’ 92’3’’

Sickinfants 25 66 3’53’’ 41’25’’

Heart 4(1male) 12(3male) 34’’ 5’4’’

Neurological 5(3male) 11(7male) 51’’ 8’2’’

Respiratory 10(5male) 27(15male) 1’08’’ 17’

Blood 3(2male) 9(5male) 36’’ 5’06’’

Others 3(nomale) 7(nomale) 43’’ 4’5’’

(a)Trainingcrydatabase(CDB)

Class Numberofinfants Numberofcrysignals inbalancedtraining CDB Overalllengthof balancedCDB INSV EXP Healthyinfants 39 53 2’40’’ 25’25’’ Sickinfants 22 54 3’03’’ 26’06’’ Heart 4 12 34’’ 5’40’’ Neurological 5 9 34’’ 5’30’’ Respiratory 7 17 35’’ 4’49’’ Blood 3 9 36’’ 5’06’’ Others 3 7 43’’ 4’50’’

(b)Trainingbalancedcrydatabase

createdaclasscalled“others” or“none-of-the-above” in the tar-getsetofdiseases.Thestate-of-the-artinfant’scry-basedhealth caresystemhasahierarchicalschemethatiscomposedoftwo subsystemsthatarebothbasedonanacousticapproach. Individ-ualscoresforexpiration(EXP)-basedandinspiration (INSV)-basedexpertsarefusedtogethertoexploitthecomplementary informationthatcanberepresentedbyourhealthcaresystem definedoverthefeaturesextractedfromtwodifferenttypesof corpora.

Forhealthyinfants,thedecisionprocessshouldbestopped atthisstagebeforeusingalloftheremainingclassifiersthatcan reducethe overallrecognition time.Then, inthecaseof sick infants, trained individual pathological detectorsystems, ina parallelmodeofoperation,shouldarriveatafinaldecisionon thepathologicalconditionofthetestinfant.Incasetheaccuracy ofthefinaldecisiononthemostlikelydiseasewascalledinto doubt,thereisanotherclasscalledothers thatcorresponds to infants that do not havethe considereddiseases or those for whichweneedmorerecordedcrysignalsformoreexamination. Theproposeddetectionsystemworksintwophases,namely, training andruntime test. In the first phase, labeled cry cor-poraareanalyzed andused totrainthecorresponding model. Eachmodelshouldrepresentsomehealth-dependent character-isticsof thetraining data.Inthe testphase,the presentedcry samplegoesthroughthesameprocessasinthetrainingphase (the preprocessingandfeature extractionsteps),andthen, for boththeexpiration andinspirationcorporaofthe sample,the log-likelihoodratiotothehypothesizedmodeliscalculated.In thispaper,theperformance ofeachEXPandINSV-based ex-pertisevaluatedandthen,atthedecisionstageofthesystem, theobtainedscoresfromtheseexpertsarefusedtoimprovethe performance.

4.4. ApplyingtheGMM-UBM

Inthispaper,wedefinedandtrainedtwoGMM-UBMsfor eachcorpora(EXPandINSV),aswementionedearlier;thefirst oneisa singlehealth-independent backgroundmodel trained torepresent the distributionof the extractedcry features, re-gardless of what condition the infant might have(healthy or sick), and the second one is a pathology-independent back-ground model. Because we focus our attention on full-term healthyand sick newborninfants that havespecific diseases, wetrain theUBMtobeused for theclassificationof healthy andsickinfantsusingonlycorrespondingdatathat are reflec-tiveof the expectedalternative cryto be encountered during recognition.Forexample,inthiscase,itisknownapriorithat thecrysignalbelongs toafull-terminfant,andthus,the full-termtestinfantwill onlybeclassifiedagainstfull-terminfant cries.Thisapproachappliestoboththegestationalageofthe infants and the types of diseases that are considered in our case.

Note that inthis researchproject the recording process is stillinprogress.Sofarthecollectionandmanualsegmentation ofdatahasbeendoneattwodifferentpointsintime.Thuswe havetwo separate CDB called training andtesting including completely different newborn infants. Table3(a)displays in-formationabouttrainingdata(includingavailabledataforboth training UBMandadaptation processatthe time).The num-berofinfants,thenumberofmaleandfemale, thenumberof usable cry signalsineach class andtotal durationof expira-tion/inspirationsegmentsineachclassareshown.Notethatthe classnamed“Sickinfants” containsallinfantsandtheirsamples fromthefivepathologyclasseslistedinTable2.Moreover,the durationofusableEXP/INSVcrytypesintotalwereshownin

(9)

Fig.4. BalanceddatapoolingapproachesfortwodefinedGMM-UBM. Table3(a)foreachclass.OurtestingCDBwillbedescribedin

detaillaterinSection5.2ofthispaper.

ForboththeEXPandINSV-based experts,the crysignals fromsubpopulations(healthyandsickinfantswithselected dis-eases)arepooledpriortotrainingthehealth-independentUBM. Sincewehavemoredatasamplesinhealthyinfantclass com-pareto sick class in total, there is a possibility of obtaining biasedUBMtowardthedominantsubpopulationwhichisthe healthyclass.Thereforeweexploitaportionofeach subpopula-tionwithintheavailabletrainingdata(Table3(a))insuchaway thatwecreateabalancedtrainingdatabaseoverthesubclasses forbothpredefinedUBMs(Fig.4)toavoidthisproblem.For thispurpose,firstwe triedtoselectcrysignalsinsuchaway that,eachpathologyclass (fivepathologyclasses)has almost the sameduration of EXP/INSV-labeled segments. Then, we selectedhealthycrysignalsinawaythattheirinspirationand expirationdurationsarealmostthesamelengthasthoseoftotal crysignalsinthesickclass(includingall5selecteddiseases).

Table3(b)providesanoverviewofthenumberofinfantsand recordedcrysignals,thedurationofusableEXP/INSVcrytype withinthecreatedbalancedCDBforeach healthyand patho-logicalclass.NotethattherestoftrainingCDBwhichwerenot selectedasamemberofthecreatedbalancedCDB(Table3(b)), wereemployedtoderivethehypothesizedmodelsbyadaptation oftheUBMs.Similartoinspeakerverification,thereisno ob-jectivemeasuretodeterminethecorrectnumberofinfantsorthe durationofcrysignaltotrainaUBM.Itisworthwhile recall-ingthattheprocedureforthedatacollectionisstillinprogress; thus,weusedallofthedatathatwasavailableatthetimefor trainingeachmodeland,then,theincomingdataforthetestand evaluationprocess.

Prior thiswork,we introducedthe AdaptedBML (Farsaie AlaieandTadj,2012)methodtoestimatemixturemodel param-eters;thisapproachhasbetterperformancethantheconventional EM-basedre-estimationalgorithmasareferencesystemforthe GMMtrainingstep.TheAdaptedBMLhasseveraladvantages overthementionedreferencesystem,butthedistinctadvantage isthat itestimatestheoptimumnumberofcomponentsby it-erativelyaddingnewcomponentsinthedirection thatlargely increasesthepredefinedobjectivefunction.Thereisno guaran-teethatincreasingthenumberofcomponentsinaGMMtrained byHTKprovidesbettersystemaccuracy(Dobrovicetal.,2012), althoughtheEMalgorithm(Dempsteretal.,1977)iteratively

re-estimatestheGMMparameterstomonotonicallyincreasethe likelihoodofthemodelforthevectorofobservations,incontrast totheadapted-BML,inwhicheachnewaddedcomponentbrings improvementinthepredefinedobjectivefunction.Despitethis option,inthepreliminarystagesofourcry-baseddiagnostic sys-tem,thebuildingortrainingphaseofthedefinedGMM-UBMs hasbeenperformedbasedontheHTK(HiddenMarkovModel Toolkit)software tool,whichisanestablished tool ofspeech recognitionsystems based onhiddenMarkovmodels (Young etal.,2006).Inthetrainingprocedure,we substitutediagonal covariancematricesfor thefullcovariancematricesduetoits computationalefficiencybecauseadiagonalcovarianceGMM withorderK>1canmodeldistributionsoffeaturevectorswith correlatedelements.Then,inthenextstep,weusedtheadapted versionoftheparameterupdatingproceduredescribedinFarsaie AlaieandTadj(2012)toadapt UBMtocreatespecifichealth conditionmodels.

4.5. BMLadaptationofsub-modelsorhealth-dependent-infant crymodels

Asmentioned earlier, there isacommon technique called Bayesianadaptation(DudaandHart,1973;Gauvainand Chin-Hui,1994)for derivingthehypothesizedspeaker modelfrom GMM-UBM. Here,we introduce anewway of updating the GMM-UBM parametersbasedon theinfantcry signalsfrom relatedsubclasses.In fact,apart ofthisadaptation technique wasintroducedearlierinFarsaieAlaieandTadj(2012)andJun etal.(2011)aspartialandglobalupdatinginBoostedmixture learning (BML) of GMM and HMM-based acoustic models. Specifically,weusetheconceptofboostingtorefinetheUBM parametersusingthetrainingcrysignalsofinfantswithaspecific healthcondition.

As mentioned earlier, the model parameters, λK = (ci,i, i=1,.....,K),ofeachUBMwithaknownnumberof

mixtures,K,canbecalculatedbyusingtheHTKsoftwaretool. ToadapttheobtainedUBM,thestatisticsandsampleweights,

W(xt),ofeachsubclasstrainingdataarecalculatedforeach

mix-ture,fk,intheUBM.Then,theyareusedtorefinethe

correspond-ingmixtureparameters,k,andmixtureweights,ck,iteratively,

whileFK−1areassumedtobeconstant.ByapplyingtheEM

al-gorithmtooptimizethelog-likelihoodofthemodelforthevector ofobservationsonlywithrespecttothemixturecomponentfk,

(10)

theiterativeformulacanbederivedtoadaptthemodel param-eters. The adapted parameters, ˆλK=(cˆi,ˆi, i=1,.....,K),

canbeestimatedinthe(n+1)ththequationasfollows: wn(Xt)= fk Xt(kn) cn kfk Xt(kn) + 1−cn k Fk−1 Xtλk−1 = fk Xt(kn) Fk Xtλk γt (n) k = wnXt T t=1wn Xt ˆ cnk+1= 1 T T t=1 ˆ cnkw n Xt ˆ μn+1 k = T t=1 γt (n) k .Xt ˆ n+1 k = T t=1 γt (n) k . Xt− ˆμnk+1 Xt− ˆμnk+1 Tr (11) inwhichtheUBMparametersareusedasaninitialpoint.The adaptationprocedureisperformedinsuchawaythatmixtures witha highcount of subclasstraining data concentratemore ontheseexamples,andviceversa.In otherwords,duetothe existenceoffkinthenumeratorandFK−1inthedenominatorof

theweightsamplesequation,theobservationsthathavelower probabilitiesbytheFK−1model aregivenlargerweightsthan

thosethathavehigherprobabilities.Itisworthwhilementioning thatthefirstpartofthedenominatorcanreducetheprobability ofthecaseinwhichfkisdominatedbyafewsamples.Moreover,

sampleweightsintheupdatingmixtureweightsformulaactas atuningparameter,whichhelpstorectifythemixtureweights iterativelybydeterminingtheabilityofeachmixturecomponent tomodelthesubclasstrainingsamples.

Incomparisontotheclearcouplingmethodpresentedinthe Bayesianadaptation(Reynoldsetal.,2000;Gauvainand Chin-Hui,1994),theBMLadaptationcanbeobservedasanindirect or hiddencouplingbetweenboththemixtureweightsandthe parametersof the adapted model and UBM.Note that inthe Bayesianmethod,therearerelevantfactorandadaptation coef-ficients(Reynoldsetal.,2000)thatcontrolthebalancebetween theoldandnewestimates.

5. Evaluationsandexperiments

5.1. DefiningGMM-UBMandadaptationmethods

Specifically, we created two health-independent UBMsby training 875 and92mixture GMMs withpooledhealthy and sick data, from the balanced database (Table 3(b)) for the EXPandINSV-labeledcrysegments,calledλH IUBME XPand λH IUBMIN SV,respectively.Then,twopathology-independent

UBMsthatincluded 443and51mixture GMMsweretrained byonlysickdatafortheEXPandINSV-labeledcrysegments, called λPIUBME XP and λPIUBMIN SV, respectively. We

se-lected thenumber of mixtures based onthe created UBM in

the1999NISTSRE,whichisacombinationof 1024mixture GMMsfrom using onehour of speechpergender(Reynolds etal.,2000).Inthefirstclassificationtask,healthyandsick mod-elsarederived fromthe health-independentUBMs,λH IUBM,

whileinthepathologydetectiontasktwopathologymodels (in-fantswithneurologicalandrespiratoryproblems)arederived fromthepathology-independentUBMs,λPIUBM.Notethat

dur-ingadaptationoftheUBMs,fourdifferentadaptationmethods havebeenexploitedinordertocompareasfollow:

1. MAPorBayesianadaptationthatadaptsonlythemean vec-tors– Thisapproachhasthebestperformanceamongallof thecombinationsofparameteradaptationsforaspeaker ver-ificationsystem (Reynolds etal., 2000). Moreover,it was mentionedthatadaptingtheweightsbyMAPforknown rea-sonsdegradestheoverallperformance.

2. BMLadaptationmethodforrefiningthemeanandvariance vectors.

3. CouplingoldandBMLadaptationestimatesoverthemean andvariancevectors– Wecomputenewstatisticsforthe pa-rametersbasedontheBMLmodelestimates,andweusea singleadaptationcoefficientforboththemeanandvariance parametersαi= nni

i+r withtherelevantfactorr=16to con-trolthebalance,whichisthesameasinBayesianadaptation. 4. BMLadaptationmethodforrefiningonlythemeanvectors.

5.2. Log-likelihoodscorescomputation

We applied the idea of the HNORM score normalization methoddescribedinReynolds(1997)fortheEXP/INSV-labeled crysegmentsseparately. Ineachhealth-conditiondetector,we used only non-hypothesized or non-target cry samples (im-poster) to estimate the normalization parameters. Therefore, thenon-targetlog-likelihoodratioscoredistributionshavebeen rescaled tohave amean of zero anda standarddeviation of one.DuetodifferentlengthsofextractedEXP/INSVsegments fromeachrecordedcrysignal,moreevidencemightbeneeded tomakeareliabledecisionforeachtestfile,especiallyforthe inspiratorycry whichhas a shorter duration than that of the expiratorycry.Therefore,eachcorpuswassplitintosmallcry unitsofapproximately3 sdurationtoinvestigatetheeffectof the EXP/INSVduration length ineach recorded file.The re-sultsindicatethatindependentoftheframelength,typeofcry units(EXP/INSV),adaptationmethodandtaskofthedetector, recordedfiles thathavemore3 s-length EXP/INSVcry units havemoreseparable LLRscores(Fig.5).In otherwords, the more information that is available (EXP/INSV length inside eachfile),themorelikelytheinformationleadstomorereliable decisionand less uncertainty about the detected pathological condition.

Earlier, we defined approximately 3-s duration of an EXP/INSV-labeledsegmentasanEXP/INSVcryunit.In gen-eral,crysignalsincludemoreexpiratorycrysegmentsthan in-spiratorycrysegments,butthesituationbecameworsebecause findingpureINSVsegmentswasnotlikely.Table4displaysthe informationaboutthetestingCDBinwhich89and101cry sig-nalsofhealthyandsickinfantrespectivelywerecollectedfrom

(11)

Fig.5. MeanoftheLLRscoresoverINSVcryunitsinsidethe(a)healthyand(b)sickinfantsforthehealthyinfantverificationsystem.

Table4

NumberofinfantsandrecordedcrysignalsavailableinthetestingCDBatthe time.

Class Numberofinfants Numberofcrysignals intestingCDB Healthyinfants 42(4male) 89(11male)

Sickinfants 40 101

Heart 2(2male) 3(3male)

Neurological 11(6male) 30(18male) Respiratory 18(12male) 49(35male)

Blood 4(4male) 9(9male)

Others 4(2male) 10(4male)

newinfantsafterthetimeofcollectingtrainingCDB(Table3). The testingCDB described inTable 4 was collected justfor evaluationandtestingprocess.Itisworthwhilementioningthat duringthetestprocess,inordertoevaluatetheeffectofduration ofavailableINSV/EXP-labeledsegmentsinatestcrysignalon theclassificationresult,theentireofonerecordingfileforababy wasconsideredasatestinputnotonlyonecryunitorashort segmentofthetestfile.Thus,basedontheavailabledurationof EXP/INSVsegmentsinsamples,threedifferenttestingdatasets canbedefinedfrom testingCDB(Table4)for eachcry type (EXPorINSV)asfollows:

Test dataset A: Including cry signals with any length of INSV/EXP-labeledsegments(Table5(a))

Test dataset B: Including crysignalswith atleast one3-s INSV/EXPcryunit(Table5(b))

TestdatasetC:Includingcrysignalswithatleast three3-s INSV/EXPcryunits(Table5(c))

Table5showsthenumberoftestcrysamplesmoreclear cor-responding for theaforementioned testdatasets. Moreover,it depictsalarge reduction intheamount of bothBandC test datasetsafterusing twocorresponding cryunitrestrictions.In

latersectionsofthispaperweevaluatedbothEXPand INSV-basedexpertswithallthesethreedatabasesbutinsome exper-imentsonlythebestobtainedresultswerepresented.

5.3. Healthy-conditioneddetectorsystemsbasedonexpiratory orinspiratorycryunits

Inthissection,wepresenttheresultsofbothhealthyandsick (withspecificdiseases)infantdetectionsystems.

5.3.1.Healthyinfantdetector

Wepresent the resultsof our healthyinfant detectorfor a testdatabaseusingbothtestEXPandINSV-labeled segments with two different frame lengths (10 ms and 30 ms). More-over, to describe the entire space of possible alternatives for thehealthyclass,twoexplainedapproaches,calledbackground speakermodesλH yp=12,...,λB)andUBMλUBM,have

beenusedtocomputetheLLRscores.Becausesomeofthetest datadonot have3-sINSV cryunits,toevaluatethe detector basedontheINSVmodels,weperformedourexperimentson twosetsoftestdatasets(Table5):1)containingINSV-labeled segmentswithanylength(testdatasetA)and2)containingat least one 3-s INSV unit (test dataset B). On the otherhand, almostallofthedatafromthetestdatabasecontainspure EXP-labeledsegmentsexceptforonesample,andthus,thereare88 healthyand101sicksamplesfortheevaluationEXP-based ex-pert.Here,weonlypresenttheresultsoftestdatasetBwhich aremoresatisfactorythanthoseoftestdatasetA,asexpected. Themiss(falsenegative)andfalsealarm(falsepositive)rates are,respectively,plottedonthex-andy-axes,whicharescaled non-linearly(normal deviatescale)as detectionerrortradeoff (DET)curves(Martinetal.,1997).TheDETplotsthatare de-pictedinFigs.6and7distinguishmoreclearlytheperformance ofthesystemsthathavedifferentadaptationmethods(defined

(12)

Table5

NumberofcrysamplesindefinedthreedifferenttestingdatasetsfromtestingCDBavailableinTable4. Healthycrysignals Sickcrysignals Heart Neuro Resp Blood Others

INSVmodel 86 93 1 29 45 9 9

EXPmodel 88 101 3 30 49 9 10

(a)TestdatasetA

Healthycrysignals Sickcrysignals Heart Neuro Resp Blood Others

INSVmodel 66 62 1 22 24 7 8

EXPmodel 88 101 3 30 49 9 10

(b)TestdatasetB

Healthycrysignals Sickcrysignals

INSVmodel 32 23

EXPmodel 87 99

(c)TestdatasetC

inSection5.1),framelengths,cryunittypesandrepresentatives forthealternativehealthconditions.

AllofthepointsontheDETcurveshavedifferentFAR(%) andFPR(%),andinpractice,theoperatingpoint(OP)should beselectedbasedonthetaskofthesysteminwhichallofthe applicationcriteriaaremet.Forexample,inbiometricsecurity systems,thepointmusthavealowFAR.Becausefindingthebest suitableOPinsuchadiagnosticsystemisnotourconcerninthis paper,theequalerrorrate(EER)pointsareplottedbyindividual circle-shapedpointsoncurveswhereFAR(t)=FPR(t), t ∈S, andSisthesetofthresholdsforcalculatingtheOPdistribution. NotethatanexactEERpointmightnotexist.Moreover,the op-timalROCoperatingpointsdescribedinMetz(1978)areshown bythesquare-shapedpointsonthecurves.Thedecision thresh-oldisselectedinawaythatminimizestheaveragecostatthis point.TheslopeoftheROCatthispointisgivenby

S=CFPCTN CFNCTP

×P(+D)

P(D) (12)

whereCFP,CTN,CFN,CTParethecosts,andP(∓D)isequalto

theprobabilitythatacasefromthedatabaseisan∓case.Here, ourpredefinedcostsforcomputingSareasfollows:

CFP=CFN=0.5, CTN =CTP=0

Theoverallaccuracyanderrorratesdependonthe chosen operatingpoint,whichisnotclearhere.Therefore,tocompare thesystemsfairlyandindependentlyofthecutpoint,theArea underthe ROCcurve(AUC)isused asameasureofthe per-formanceof theeachdetector,whileanidealclassifierhasan AUCequalto1.ThevalueofEER(%)andAUCforthesystems plottedinFigs.6and7arelistedinTable6.Itisapparentthat theexperimentsontheshorterframelengthhavebetterresults inmostofthecasesandtheEXP-labeledcryunitshaveamore distinctiveabilitythanthe INSV-labeledcry unitsin classify-ing healthyandsickinfantsindependent ofthe framelength, asweanticipated.Moreover,becauseingeneraleachcrysignal contains moreexpiratory cryunits thaninspiratory cryunits, theaverageLLRscorecomputedoverEXP-labeledcryunitsis morereliablethantheINSV-labeledcryunits.

To showtheimpactof thenumberofcryunitsonthe sys-temperformance,especiallyfortheinspiratorycry,weusedtest datasetC(Table5(c))includingonlytestfilesthathaveatleast 3 cryunitstoperformtheclassification.Almostallofthecry

testsampleswithEXP-labeledsegmentshavemorethan3cry units(exceptforonehealthyandtwosicksamples);therefore, theachieved results for the expiratorycry unitsare the same asthe resultsinTable6 again.However,thiscondition hasa largereffectontheinspiratorysegmentsoftherecordedinfant crysignalsandreducesthe numberoftestfiles to32and23 for healthyandsickinfantsrespectively. The resultsgivenin

Table7,whichareindependentoftheframelength,adaptation methodandbackgroundmodel,confirm that themore INSV-labeledcryunitsareinsideatestfile,themorechanceattheend oftheevaluationtodiagnoseitcorrectly.

Amongthefouradaptationmethodsdefinedearlier(Section 5.1),ourreferencesystemwiththeBayesianorMAPadaptation method(method1)hasthelowestAUC,andtheotherspecific methods, both2-3,whichusethe BML adaptationestimates, havelowererror-detectionrateswithahigherAUC.Thesystem withthehighestAUCforboththeEXPandINSV-labeledcry segmentsisthesystemthatusestheλH ypbackgroundmodeland

the2ndmethodofadaptation(definedinSection5.1).Therefore, theminimumachievedequalerrorratesare14.85%and25.8% forusingtheEXPandINSV-labeledcrysegments,respectively.

5.3.2.Sickinfantdetectorwithaspecificdisease

Alackofdata,especiallyinthetrainingdataforaspecific illness,causesdifficultyintraining andadapting well-defined models,suchasthemodelforinfantswhohaveblooddisorders. Evenattheevaluationtime,thereisnotasufficientnumberof testsamplesinallofthediseases(Table4).Thusfar,wehave onlyusedourdetectorforsickinfantswhosufferfrom neuro-logicalandrespiratorydisorders.Intheinterestofbrevity,only theresultsfor ashorterframelength(10ms)andbackground speakermodelsλHypovertestdatasetB,whicharemore satis-factorythanotherresults,willbediscussed.

Therearelimitednumbersofdistinctsickinfantsintraining CDB(Table3)whoarenotthesameastheinfantscollectedin thetestingCDB(Table4).In total,10and5infantsareused totraintherespiratoryandneurologicaldiseasemodels, respec-tively.In comparisontoprevious resultsinthe healthyinfant detectorsystem, very low training errorsplus testresults de-pictedinTable8fortheunseentestdatasetB(Table5(b))might beasignofmemorizingtrainingdataratherthanlearning.This findingisduetoanapparentlackofenoughdistinctinfantsin

(13)

30 40 50 60 70 80 5

10 20 30

False Alaram (False Positive) Rate

Miss (False Negative) Rate

DET (%) - INSV Cry Units

Optimum Point, ¸θ=0.91 (fp=40.32%, fn=7.58%) EER point, θ=1.16 (fp=27.42%, fn=27.27%) 1st method 2nd method 3th method 4th method 20 30 40 50 10 20 30 40

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - INSV Cry Units

Optimum Point, ¸θ=2.10 (fp=19.35%, fn=27.27%) EER point, θ=2.00 (fp=25.81%, fn=25.76%) 1st method 2nd method 3th method 4th method 20 30 40 50 60 5 10 20 30 40 50 60

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - EXP Cry Units

Optimum Point, ¸θ=0.89 (fp=22.77%, fn=23.86%) EER point, θ=0.87 (fp=23.76%, fn=23.86%) 1st method 2nd method 3th method 4th method 10 20 30 5 10 20

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - EXP Cry Units

Optimum Point, ¸θ=1.50 (fp=9.90%, fn=17.05%) EER point, θ=1.46 (fp=14.85%, fn=14.77%) 1st method 2nd method 3th method 4th method a b c d

Fig.6. DETcurvesfortwoalternativehypothesizedmodels,λPIU BM(a)and(c)andλH yp(b)and(d),andforINSV(a)and(b)andEXP(c)and(d)cryunitswith

a10msframelengthinthehealthyinfantverificationsystem.

eachcorrespondingclass,especiallyforneurologicaldisease.It isimportanttounderstandthatthedatacollectionprocess, train-ingandadaptingproceduresaretime-consuming,butthatalso, inspiteofthem,furthercorrespondingfull-termsickinfants(as withhealthyinfants)resultinbettergeneralizationbytraining well-definedmodels. Even using cross-validation, whichis a methodfor preventing overfitting,isnot aquick-fix solution. Therefore,collectingnewdatatoincreasethesizeofthe train-ingCDBtorebuildapathology-independentbackgroundmodel andsicknessmodelsisapracticalsolutiontoimprovingthe per-formance.

IthasbeenshowninTable8thatamongadaptionmethods (definedinSection5.1)thosewhichuseBML technique(the 2nd,3rdand4thmethods)havethehighestAUCthanour ref-erencesystemwiththeBayesianadaptation.Moreover, accord-ingtotheresultinTable8,itisclearthatEXPandINSVcry unit-basedmodelshavedifferentabilitytodistinguishavailable classes.Forexample,INSVmodelhasbetterperformancethan EXPmodelinRespiratorydisorderdetectorsystem,butonthe otherhandinneurologicaldisorderdetectorsystem,EXPmodel hasbetterperformancethanINSVmodel.Therefore,we tried totakeadvantage of thestrength of eachclassifier (EXPand

(14)

20 30 40 50 60 70 80 30

40 50

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - INSV Cry Units

Optimum Point, ¸θ=1.20 (fp=21.82%, fn=41.27%) EER point, θ=1.11 (fp=36.36%, fn=36.51%) 1st method 2nd method 3th method 4th method 20 30 40 50 60 70 20 30 40 50 60

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - INSV Cry Units

Optimum Point, ¸θ=1.82 (fp=18.18%, fn=34.92%) EER point, θ=1.58 (fp=30.91%, fn=30.16%) 1st method 2nd method 3th method 4th method 10 20 30 40 50 60 10 20 30 40

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - EXP Cry Units

Optimum Point, ¸θ=1.33 (fp=8.91%, fn=28.41%) EER point, θ=1.13 (fp=19.80%, fn=19.32%) 1st method 2nd method 3th method 4th method 20 30 40 50 60 10 20 30 40

False Alaram (False Positive) Rate

et a R ) e vit a g e N e sl a F( s si M

DET (%) - EXP Cry Units

Optimum Point, ¸θ=1.21 (fp=18.81%, fn=28.41%) EER point, θ=1.10 (fp=27.72%, fn=27.27%) 1st method 2nd method 3th method 4th method a b c d

Fig.7. DETcurvesforthetwoalternativehypothesizedmodels,λPIU BM(a)and(c)andλH yp(b)and(d),andforINSV(a)and(b)andEXP(c)and(d)cryunits

withthe30msframelengthinthehealthyinfantverificationsystem.

INSVcryunit-basedmodels)asaseparatesourceof informa-tion,knownasdecisionfusion.

5.4. Fusion,calibrationanddecision

Amoresophisticatedsystemcanbedevelopedby integrat-ingtheevidencepresentedbymultiplesourcesofinformation, similartoinmultimodalbiometricsystems.Suchamultimodal systemisexpectedtobemorereliableincontrasttoaunimodal system,whichreliesontheevidenceofasinglesource. Gen-erallyspeaking,thestrategyoffusion canbecategorizedinto

threelevels,whicharecalledthedataorfeaturelevel,matching scorelevel,anddecisionlevel(BlumandLiu,2005).Although thefeaturesetisricherindiscriminativeinformationthanthe matchingscoreortheoutputdecisionofaclassifier,fusionatthe matchscorelevelisusuallypreferredbecauseitisrelativelyeasy toobtainandthereisnoneedtoworryaboutthefeature com-patibilityatthescorelevelorrigidfusionatthedecisionlevel (RossandJain,2004).Here,thefusionofthetwoproposed sub-systems(expiratoryandinspiratorycryunit-basedGMM)canbe donebyintegrationatmatchingscorelevelsinceobtainedLLR

(15)

Table6

Comparisonofthedifferenthealthyinfantdetectorsystemsbasedontheequal errorrateandareaunderthecurveforallofthetestsamples.

INSV-λH IU BM 10ms 30ms Equalerror rate(%) AUC Equalerror rate(%) AUC ’method1 29.03 0.77 41.81 0.62 ’method2 27.41 0.815 36.36 0.69 ’method3 27.41 0.811 38.18 0.68 ’method4 29.03 0.78 40 0.64

INSV-λH yp Equalerror rate(%) Equalerror rate(%) AUC ’method1 27.41 0.806 38.18 0.68 ’method2 25.80 0.8350 30.90 0.77 ’method3 25.80 0.8355 29.09 0.81 ’method4 29.03 0.80 34.54 0.71

EXP-λH IU BM Equalerror

rate(%)

AUC Equalerror rate(%) AUC ’method1 27.72 0.81 20.79 0.87 ’method2 23.76 0.84 19.80 0.89 ’method3 24.75 0.826 22.77 0.86 ’method4 25.74 0.827 18.81 0.89 EXP-λH yp Equalerror

rate(%)

AUC Equalerror rate(%) AUC ’method1 15.8415 0.932 32.67 0.76 ’method2 14.8514 0.951 27.72 0.825 ’method3 15.8415 0.951 24.75 0.824 ’method4 15.8415 0.932 30.69 0.77 Table7

ComparisonofthedifferenthealthyinfantdetectorsystemsbasedontheEER andAUCforthetestsamplesthathavemorethan3INSVunits(32and23cry samplesofhealthyandsickinfantsrespectively).

10ms 30ms

INSV-λH IU BM Equalerrorrate(%) AUC Equalerrorrate(%) AUC

’method1 13.043 0.933 25 0.75

’method2 13.043 0.944 30 0.79

’method3 13.043 0.941 35 0.77

’method4 13.043 0.938 25 0.78

INSV-λH yp Equalerrorrate(%) AUC Equalerrorrate(%) AUC

’method1 26.08 0.872 25 0.81

’method2 17.39 0.888 25 0.82

’method3 21.73 0.887 25 0.85

’method4 26.08 0.877 20 0.82

scores(theoutputofEXPandINSV-cryunitbasedmodels)are thequalityofeachmatch.

Therearetwodifferentstrategiesforconsolidatingscores ob-tainedfromdifferentclassifiers.Thefirstapproachformulatesit asacombinationproblem.Thefinaldecisionismadebya sin-glescalarscorewhichisacombinationofmatching/individual scores(Ben-Yacoubetal.,1999;Dieckmannetal.,1997).Note thatbefore combination, the scoresmust be firsttransformed to a common domain. There are several techniques for ad-dressingthe combinationproblem like the sumrule, median rule,productrule,min/max/medianrules andmajorityvoting (Lietal.,2013;Snelicketal.,2005).Thesecondapproachtreats itasaclassificationproblemandconstructsafeaturevectorusing thematchingscoresoutputbytheindividualclassifiers/matchers

Table8

Resultsofthesickinfantdetectorsystemsforrespiratoryandneurological dis-orders. Respiratorydiseases INSV-λH yp EXP-λH yp 10ms Equalerror rate(%)

AUC Equalerrorrate(%) AUC

’method1 0.8028.94 34.61 0.75 ’method2 0.8223.68 32.69 0.74 ’method3 0.8126.31 36.53 0.73 ’method4 0.8421.05 30.76 0.77 Neurologicaldisorders 10ms Equalerror rate(%)

AUC Equalerrorrate(%) AUC

’method1 40 0.607 33.80 0.727

’method2 0.58747.5 28.16 0.777

’method3 0.57247.5 29.57 0.770

’method4 0.63142.5 30.98 0.746

(VerlindeandCholet,1999;Vatsa etal.,2007).The obtained featurevectoristhenclassifiedintooneoftwo“Accept” or “Re-ject” classes.In contrasttofirstapproach, thereisnoneed of preprocessingtohavehomogeneousindividualscores.

In thispaper, the classification approachhas beenused to informationfusionatmatchingscorelevel. Soweconsidered matchingscoresattheoutputofEXPandINSV-cryunitbased modelsasatwo-dimensionalfeaturevector.Inotherword,the LLRscoresobtainedfromtwoindividualclassifiers(EXP/INSV cry unit-based model) are concatenated to construct a two-dimensionalfeaturespace.Severalclassifiershavebeenusedto consolidatetheobtainedindividualscoresandarriveata deci-sion:multilayerperceptron(MLP)usingtheback-propagation algorithm,probabilisticneuralnetworks(PNN)andasupport vectormachine(SVM)(Table9).Theseclassifierswere evalu-atedusingthreetestdatasets(Table5)thatincludedcrysignals containingbothEXPandINSVlabeledsegments.

5.5. Resultsanddiscussion

Tocomparethegeneralizabilityofthesethreedifferent algo-rithmsandtofindoutthebestalgorithmfortheavailabledata, stratifiedK-foldcross-validationisusedinwhicheachfoldhas aroughlyequalsizeandcontainsthesamepercentageof sam-plesofeachtargetclassasinthewholedataset.Although10-fold cross-validationismorecommon,inpractice,usuallythechoice ofthenumberoffoldsdependsonthesizeofthedataset. Al-thoughtherearedifferentvariantsofcross-validatedestimates (Refaeilzadeh etal., 2009), stratified 10-fold cross-validation isrecommendedby Kohavi(1995)as the bestmodel.To ob-tainreliableperformanceestimation,multipleroundsof cross-validationareperformedtotestnewanddifferentrandomsplits that result in smaller variance in the results and reduce the variability.Then,thevalidationresultswiththreedifferent val-uesofK,depictedinTable10,areaveragedoverthe correspond-ingnumberofrounds.

(16)

Table9

TrainingparametersusedinSVM,PNNandMLP.

PNN Numberofneuronsinlayer1:161

Firstlayertransferfunction:Radialbasistransferfunction Spreadvalue:0.1

Numberofneuronsinlayer2:2

Secondlayertransferfunction:Competitivetransferfunction Performancefunction:mse

Learningalgorithm:Scaledconjugategradient MLP Numberofhiddenlayers:1

Hiddenlayerneurons:10

Hiddenactivationfunction:Hyperbolictangentsigmoid Outputlayerneurons:2

Outputactivationfunction:Softmax(normalizedexponential) Maxnumberofiteration:1000

Performancefunction:Crossentropy Learningalgorithm:Scaledconjugategradient SVM Numberofiterations:15,000 Kernelfunctions: 1-Linear 2-Quadratic 3-Polynomial(order3) 4-GaussianRBFKernel

5-MultilayerperceptronKernelwithSMOmethodtofindtheseparatinghyperplane

Table10

Numberoffoldsandroundsusedintheexperiments.

ValueofK 3 5 10

Numberofiterations 400 200 100

Theperformancesofthe classifiersthatusedifferent adap-tationmethods(definedinSection5.1)arecomparedbasedon somewidelyusedstatistical measures,namely,thefalse posi-tiverate,falsenegativerate,accuracy,sensitivityandspecificity. Thesestatisticalmeasurescanbecalculatedfromtheclassifier’s results,asdescribedinFawcett(2006).

Theerroroftheclassifierfollowsabinomialdistributionwith thefollowingmeanandstandarddeviation:

μerror =pcv, σerror =

pcv(1−pcv)

n

wherepcvisthemeanofKerrors,andnisthenumberofsamples.

Wecanapproximatethe100(1−α)%confidenceintervalfor the errorbythe Waldconfidenceinterval (AgrestiandCoull, 1998),asfollows:

pcvzα/2σerror

wherezβisthe1−βquantileofthestandardnormal distribu-tion.

BasedonTable5,thetestdatasetAconsistsof86and93cry samplesofhealthyandsickinfantswhoserecordedcrysignals containbothEXPandINSV-labeledsegments withany dura-tion.Thesedataareusedtodetecthealthyinfantsbyfusionof the obtained likelihoodratio scores fromtheir expiratory and inspiratorycryunits.Fig.8indicatesbothtypesof errors,the falsepositiverate(FPRortypeIerror)andthefalsenegativerate (FNRortypeIIerror),withan80%confidenceintervalafter ap-plyingstratifiedK-foldcross-validationforeachclassifier.Due

Table11

Averageaccuracy, sensitivityandspecificity overtheusedclassifiersinthe healthyinfantdetectiontask.

Method1 Method2 Method3 Method4 Averageaccuracy 85.21 89.13 89.76 85.98 Averagesensitivity 88.07 89.85 90.73 88.64 Averagespecificity 82.53 88.37 88.41 82.74

tospacelimitations,weonlyplotthetesterrorwhichisthe av-erageerrorthatresultsfromusingastatisticallearningmethod topredict theresponseonanobservation inthe testfold,one thatwasnotusedintrainingphaseofK-foldcrossvalidation. Foreach methodof adaptation(definedinSection5.1),there isnotmuchdifferenceintheerrorsbetweentheK-fold cross-validationswithdifferentvaluesoffolds.TheBayesian adapta-tionmethod(1stmethod),whichisourreferencemethod,has obviouslyhighererrorsthantheothermethods,especiallythe 2ndand3rdmethods.Toobtainabettercomparisonbetweenthe performancesoftheadaptationmethods(definedinSection5.1),

Table11depictstheaverageaccuracy,sensitivityandspecificity ratesoveralloftheusedclassifiersforeachadaptationmethod. Here,theaccuracyrateindicatestheproportionof true classi-fiedinfants(bothhealthyandsick)amongthetotalnumberof infantsinthetestdatasetA(Table5(a)).Thesensitivitymeasures theportionofhealthyinfantsthatarecorrectlyidentified,while thespecificitymeasurestheproportionofsickinfantsthatare correctlyverified.Asisclear,theBMLadaptationmethodfor refiningboththemeanandvariancevectors(2ndmethod)and thecoupledBML adaptationestimationswitholdestimations (3rdmethod)aresuperiortotheothers,andeventheBML adap-tationforrefiningonlythemeanvectors(4thmethod)performs betterthantheBayesianadaptationmethod(1stmethod).

Classifiersprovidedifferentfalsenegativeandfalsepositive rates,butasweexpectedfromthefusionapproach,eveninthe

(17)

(a) Multilayer Perceptron Neural Network

(b) Probabilistic Neural Network

(c) SVM with Linear Kernel Fucntion

(d) SVM with Quadratic Kernel Fucntion

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25

MLP - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

E s ti m a te d E rro r (% ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25 30

PNN - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

E s ti m a te d E rro r (% ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25 30

SVM with Linear Kernel Function - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

E s ti m a te d E rro r (% ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25

SVM with Quadratic Kernel Function - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

Es ti m a te d Er ro r ( % ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

(18)

(e) SVM with Polynomial Kernel Function order 3

(f) SVM with Gaussian radial Basis Function Kernel

(g) SVM with Multilayer Perceptron Kernel Function 1st Method 2nd Method 3rd Method 4th Method

0 5 10 15 20 25

SVM with Polynomial Kernel Function - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

E s ti m a te d E rro r (% ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25

SVM with (Gaussian) RBF Kernel - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

E s ti m a te d E rro r (% ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25

SVM with MLP Kernel Function - 80% Confidence Interval for Estimated Error Using N Repeated K-Fold CV

Es ti m a te d Er ro r ( % ) (FNR) N=100, K=10 (FPR) N=100, K=10 (FNR) N=200, K=5 (FPR) N=200, K=5 (FNR) N=400, K=3 (FPR) N=400, K=3 Fig.8. Continued

1st Method 2nd Method 3rd Method 4th Method

0 5 10 15 20 25 30

Comparison of SVM with different Kernel Functions - 80% Confidence Interval for Estimated Error Using 100 Repeated 10-Fold CV

E s ti m a te d E rro r (% ) (FNR) Linear (FPR) Linear (FNR) Quadratic (FPR) Quadratic (FNR) Polynomial (FPR) Polynomial (FNR) Gaussian RBF (FPR) Gaussian RBF (FNR) MLP (FPR) MLP

(19)

1st Method 2nd Method 3rd Method 4th Method 0 20 40 60 80 100

Accuracy Rates of the Tested Healthy Infants Detecor Systems for each Adaptation method Using 100 Repeated 10-Fold CV

A c c u ra c y (% ) MLP PNN SVM-Linear SVM-Quadratic SVM-Polynomial SVM-Gaussian RBF SVM-MLP

Fig.10. Comparisonoftheaccuracyratesofalloftheclassifiersinthehealthyinfantdetectiontask.

1st Method 2nd Method 3rd Method 4th Method 0 5 10 15 20 25 30 35

Error Rates Obtained by Different Classifiers Using Only Test Samples with at Least 3 EXP/INSV Units - 80% Confidence Interval for Estimated Error Using 400 Repeated 3-Fold CV

E s ti m a te d E rro r (% ) (FNR) MLP (FPR) MLP (FNR) PNN (FPR) PNN

(FNR) SVM with Linear Kernel (FPR) SVM with Linear Kernel (FNR) SVM with Quadratic Kernel (FPR) SVM with Quadratic Kernel (FNR)SVM with Polynomial Kernel (FPR) SVM with Polynomial Kernel (FNR) SVM with Gaussian RBF Kernel (FPR) SVM with Gaussian RBF Kernel (FNR) SVM with MLP Kernel (FPR) SVM with MLP Kernel

Fig.11. TypeIandtypeIIerrorsoftheclassifiersusedinthehealthyinfantdetectiontask,using400repeated3-foldCVsforthetestsamplesthatcontainmore than3EXPandINSV-labeledcryunits.

Table12

Accuracyratesfortheusedclassifiersinthehealthyinfantdetectiontask. Classifier Method1 Method2 Method3 Method4

MLP 85.03 89.95 91.68 84.99 PNN 82.11 88.79 89.93 83.82 SVM-linear 83.12 87.67 89.33 84.82 SVM-quadratic 86.11 88.98 88.98 87.25 SVM-polynomial 86.99 89.82 89.27 87.08 SVM-GaussianRBF 86.59 88.78 88.75 84.33 SVM-MLP 86.57 89.85 90.41 86.60 Averageaccuracy 85.21 89.13 89.76 85.98

worstcaseofeachclassifierusing the2ndand3rdadaptation methods,weobtainedasmallererrorratethanthelowestequal errorrateofthemodelusingEXPandINSV-labeledsegments separately,whichare14.85%and25.8%,respectively.Thetype IerrorandtypeIIerrorweredecreasedto8.84%forthefalse negativerateand11.49%forthefalsepositiverateusing SVM-MLPwiththe3rdadaptationmethod.Thesystemwithasmaller falsepositiveleadstomorefalsenegatives,andviceversa.In biometricsystems,thesystemwithanapproximatelyequalfalse rejectionrateandfalseacceptancerateiscalledatunedsystem (TiptonandKrause,2007),andthegoalistotunethissystemto obtainanequalerrorratethatisaslowaspossible.Amongallof theusedclassifiers,basedontheerrorratesandaccuracyrates depictedinFigs. 8–10, MLPandSVM withtheMLPkernel functionperformbetterthantheothersbyprovidingalmostthe sametypeIandIIerrors(closetotheEERpoint).Tables12–14

Table13

Sensitivityratesfortheusedclassifiersinthehealthyinfantdetectiontask. Classifier Method1 Method2 Method3 Method4

MLP 84.86 90.83 93.05 91.11 PNN 82.77 87.08 89.44 83.88 SVM-linear 89.58 89.44 91.80 90.55 SVM-quadratic 93.19 90.97 92.08 93.19 SVM-polynomial 85.69 90.27 90.27 88.19 SVM-GaussianRBF 88.47 89.72 90.83 86.25 SVM-MLP 91.94 90.69 90.69 87.36 Averagesensitivity 88.07 89.85 90.73 88.64 Table14

Specificityratesfortheusedclassifiersinthehealthyinfantdetectiontask. Classifier Method1 Method2 Method3 Method4

MLP 85 89 90.22 79.77 PNN 81.66 90.22 90.22 83.77 SVM-linear 77.11 85.88 87 79.33 SVM-quadratic 79.55 87.11 86.11 81.66 SVM-polynomial 88 89.22 88.11 85.88 SVM-GaussianRBF 84.88 88.11 87 82.77 SVM-MLP 81.55 89.11 90.22 86 Averagespecificity 82.53 88.37 88.41 82.74

indicateindetailtheaccuracyrate,sensitivityandspecificityof theclassifiersasstatisticalmeasuresoftheperformanceofthe classificationtest.

Earlier,weshowedthatusingtestsampleswithmoreINSV cryunitscanreducetheequalerrorratedownto13%(Table6).

References

Related documents