
Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data


Contents lists available at ScienceDirect

Biomedical Signal Processing and Control

journal homepage: www.elsevier.com/locate/bspc

Technical note


Liying Fang a,b,c,∗, Han Zhao a,b,c, Pu Wang a,b,c, Mingwei Yu d, Jianzhuo Yan a,b,c, Wenshuai Cheng a,b,c, Peiyu Chen a,b,c

a College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China

b Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China

c Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

d Hospital of Traditional Chinese Medicine, CPUMS, Beijing 100010, China

Article info

Article history:
Received 7 December 2014
Received in revised form 28 March 2015
Accepted 14 May 2015
Available online 15 June 2015

Keywords:
Multidimensional time series
Dimension reduction
Feature selection
Mutual information
Class separability

Abstract

In clinical medicine, multidimensional time series data can be used to find the rules of disease progression with data mining technology such as classification and prediction. However, in multidimensional time series data mining problems, the excessive data dimension makes the estimated probability density distribution inaccurate and increases the computational complexity. Besides, information redundancy and irrelevant features may lead to high computational complexity and over-fitting. The combination of these two factors can reduce classification performance. To reduce computational complexity and to eliminate information redundancies and irrelevant features, we improved upon a multidimensional time series feature selection method to achieve dimension reduction. The improved method selects features by combining the Kozachenko–Leonenko (K–L) information entropy estimation method, used for feature extraction based on mutual information, with a feature selection algorithm based on class separability. We performed experiments on the electroencephalogram (EEG) dataset for verification and on the non-small cell lung cancer (NSCLC) clinical dataset for application. The results show that, compared with CLeVer, Corona and AGV, the improved method can effectively reduce the dimensions of multidimensional time series for clinical data.

© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Time-series analysis is widely used in many application fields, including medical data, financial data, moving-object tracking and human–computer interaction interfaces [1,2]. Data mining for time series, such as the classification, clustering or prediction of data, is very valuable because it can help find the latent rules in time series data and provide support. Currently, most research focuses on univariate time series processing. However, with the development of data-collection technology, more and more multidimensional time series data have become available, and they contain a considerable amount of potentially valuable information. For example, diabetes clinical data, as a kind of time series data, contain abundant information including food intake, drug intake and daily activities. EEG data¹, which contain plentiful information on brain waves, reflect correlations with certain genetic predispositions and diseases. In Tanawongsuwan and Bobick [3], 22 markers are spread over the human body to measure the movements of body parts while walking. In medicine, EEG data from 64 electrodes placed on the scalp are monitored to examine the correlation of genetic predisposition to alcoholism [4]. Therefore, in recent years, multidimensional time series classification, dimension reduction and similarity search have become common concerns for researchers in the field of data mining [5–7].

∗ Corresponding author at: Beijing University of Technology, College of Electronic Information and Control Engineering, No. 100, Pingleyuan Street, Beijing 100124, China. Tel.: +86 13810101581.

E-mail address: [email protected] (L. Fang).

A time series is a series of observations

$$x_i(t), \quad i = 1,\dots,d;\; t = 1,\dots,n \tag{1}$$

made sequentially through time, where $i$ indexes the measurements made at each time point $t$. It is called a univariate time series when $d = 1$ and a multidimensional time series (MTS) when $d \ge 2$. Owing to the mass production of MTS data and the growing demand for classification, MTS classification techniques have been applied in many fields, such as RNA classification in bioinformatics, handwriting recognition and electrocardiogram (ECG) pattern matching. As MTS data are typical high-dimensional data [8], many features are either irrelevant or redundant. Moreover, the curse of dimensionality, caused by excessive dimensions, arises in the multidimensional feature space. Therefore, how to effectively select useful features for classification from raw MTS data has become an active and challenging research topic.

¹ http://archive.ics.uci.edu/ml/datasets/EEG+Database.

http://dx.doi.org/10.1016/j.bspc.2015.05.011

1746-8094/© 2015 The Authors. Published by Elsevier Ltd.

Fig. 1. MTS data dimension reduction process: data pre-processing (fill missing values with the KNN method) → feature extraction (transform the feature space with the K–L estimation method) → feature selection (grade the dimension attributes and choose the K optimal features) → vectorization (of the MI matrices) → classification (put the dimension-reduced feature vectors into the classifier).

Feature extraction and feature selection are the main methods of dimension reduction [9]. Not only can they reduce classification errors, but they can also improve classification efficiency. Currently, the feature selection methods widely used for MTS include CLeVer [10] and AGV [11], which are based on PCA, and Corona [12], which is based on a correlation coefficient method. However, these methods can only identify linear relationships among dimensions, and their calculations are better suited to MTS samples of equal length. Yet unequal-length data are the norm in clinical follow-up, because patients may die or otherwise be lost from the dataset. Mutual information (MI) is an important concept in information theory that can capture nonlinear relationships and higher-order statistics. Therefore, we use MI for feature extraction to transform samples of different lengths into representations of equal size. Meanwhile, by exploiting the nonlinear relationships in the multidimensional feature space, we can effectively reduce dimensions through feature selection. However, the probability density estimation method has a great influence on MI computation, determining whether the method can effectively and efficiently express the typical features and thereby improve the accuracy of feature selection. It is therefore important to choose an appropriate probability density estimation method for MI feature extraction in MTS. In addition, the feature subset evaluation criterion is the key issue in feature selection, and its quality directly affects the final result. The class separability criterion is one of the important evaluation criteria, and the between-class distance criterion is one of the commonly used methods: we obtain better class separability by minimizing within-class distance and maximizing between-class distance simultaneously. The purpose of feature selection is to choose feature subsets with larger class separability. However, redundant variables have an obvious effect on the classification result, and the between-class distance criterion cannot eliminate them; we therefore introduce a criterion with a redundancy variable to eliminate redundancies and irrelevant features. We then present the improved method, which can effectively choose the optimal features and reduce dimensions.

This paper aims to overcome the limitation that the correlation matrices in traditional MTS feature selection methods can only measure linear relationships between variables. We improve the feature selection method based on mutual information and class separability. We first compute MI values with a probability density estimation method to extract both linear and nonlinear relationships between variables through MI matrices. Considering the existence of redundancies, we next introduce a feature selection algorithm based on class separability to eliminate redundancies and to make the chosen feature subsets highly correlated with the target class. We then use the improved method for dimension reduction on MTS, as shown in Fig. 1. Finally, we verify whether the improved method can effectively reduce dimensions through comparison experiments based on classification accuracy with an SVM classifier.

The remainder of this paper is organized as follows. Section 2 introduces the feature extraction method based on MI. Section 3 introduces the feature selection algorithm based on class separability. The experiments and results of the improved method are presented in Section 4, followed by the conclusion in Section 5.

2. Feature extraction method based on MI

This section introduces the MI feature extraction method, which involves some basic concepts of entropy and MI, as described in Refs. [13–15].

In general, an MTS can be expressed as a $d\times n$ matrix $[x_{i,t}]_{d\times n}$, each matrix representing one sample. Assume that the research data include several samples, two of which are $[x_{i,t}]_{d\times n_1}$ and $[x_{i,t}]_{d\times n_2}$. Generally speaking, all variables within one sample share the same sampling-time length. However, the sampling-time lengths $n_1$ and $n_2$ of two different samples are not always the same. Therefore, each MTS sample is expressed by a $d\times t_j$ matrix $[x_{i,t}]_{d\times t_j}$:

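$$[x_{i,t}]_{d\times t_j} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1t_j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i1} & x_{i2} & \cdots & x_{it_j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d1} & x_{d2} & \cdots & x_{dt_j} \end{pmatrix}, \quad i = 1,2,\dots,d;\; t = 1,2,\dots,t_j \tag{2}$$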

where $x_{i,t}$ denotes the sampled value of the $i$th variable $x_i$ at time point $t$. $M_j$ stands for the $j$th sample matrix $[x_{i,t}]_{d\times t_j}$, as shown in Fig. 2, and $t_j$ denotes the sampling-time length of the $j$th sample. $X_i$ denotes the index sequence of the $i$th dimension. Because each sequence $X_i$ has a different degree of importance to classification, the sequences are drawn in different colors, with a deeper color meaning a higher degree of importance. However, because the degree of importance of each sequence is unknown under the initial condition, the colors are shown at random depths. Fig. 3 shows an MTS dataset with $n$ samples, each sample being a matrix with dimension $d$ and sampling-time length $t_j$. For any given sample, the degree of importance of each sequence is initially unknown.

By the definitions of information entropy and MI, computing MI normally requires approximately estimating the probability density distribution of the random variables first. A nearest-neighbor-based estimation method was introduced in [16] and has also been used with good effect in [17,18]. Its advantage is that it does not require explicitly estimating the probability density distribution function of any variable.


Fig. 2. The diagram of the sample matrix $M_j$.

Assume that $X$ and $Y$ are two random variables, where $X = \{x_i,\ i = 1,\dots,n\}$ and $Y = \{y_i,\ i = 1,\dots,n\}$. In [19], the K–L nearest-neighbor entropy estimate is defined as

$$\hat H(X) = -\psi(k) + \psi(n) + \log c_d + \frac{d}{n}\sum_{i=1}^{n}\log \varepsilon_X(i,k) \tag{3}$$

where $k$ is the number of nearest-neighbor points, $d$ is the dimension of the data, $c_d$ is the volume of the $d$-dimensional unit ball, $\varepsilon_X(i,k)$ is the distance between $x_i$ and its $k$th nearest neighbor, and $\psi$ is the digamma function.
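A minimal NumPy/SciPy sketch of the K–L estimator in eq. (3) follows. It assumes Euclidean distances, so that $\varepsilon_X(i,k)$ is the distance to the $k$th neighbor and $c_d = \pi^{d/2}/\Gamma(d/2+1)$ is the volume of the $d$-dimensional unit ball; other norms would change $c_d$ accordingly. The function name `kl_entropy` and the default `k=3` are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def kl_entropy(x, k=3):
    """Kozachenko-Leonenko entropy estimate in nats, following eq. (3).

    x: array of shape (n,) or (n, d). Assumes no duplicated points, since a
    zero neighbour distance would make log(eps) diverge.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                       # a 1-D series is n points in R^1
    n, d = x.shape
    # Ask for k + 1 neighbours because each point is returned as its own nearest one.
    dist, _ = cKDTree(x).query(x, k=k + 1)
    eps = dist[:, k]                         # Euclidean distance to the k-th neighbour
    # log of c_d, the volume of the d-dimensional Euclidean unit ball
    log_cd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)
    return -digamma(k) + digamma(n) + log_cd + d * np.mean(np.log(eps))


rng = np.random.default_rng(0)
sample = rng.standard_normal(2000)
print(kl_entropy(sample))   # true value for N(0, 1) is 0.5*log(2*pi*e), about 1.419
```

A Gaussian sample gives a quick sanity check because its entropy is known in closed form; tied values (zero distances) should be removed or jittered before calling the estimator.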

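Based on formula (3), Kraskov et al. proposed the following MI estimator in [16]:

$$\hat I(X;Y) = \psi(n) + \psi(k) - \frac{1}{k} - \frac{1}{n}\sum_{i=1}^{n}\left[\psi\big(n_x(i)\big) + \psi\big(n_y(i)\big)\right] \tag{4}$$

where $n_x(i)$ is the number of points of $X$ whose distance from $x_i$ is no more than $\varepsilon(i,k)$, and $n_y(i)$ is defined similarly for $Y$. Here, $\varepsilon(i,k) = \max\big(\varepsilon_X(i,k), \varepsilon_Y(i,k)\big)$.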
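The sketch below implements Kraskov et al.'s second estimator, the variant with the $-1/k$ term that eq. (4) follows, for two univariate sequences. It uses the published definition, in which the $k$ nearest neighbors are found in the joint space under the max-norm and $n_x(i)$, $n_y(i)$ count the points lying within the marginal extents of those neighbors; the function name `ksg_mi` and the default `k=3` are assumptions for illustration.

```python
import numpy as np
from scipy.special import digamma


def ksg_mi(x, y, k=3):
    """Kraskov-Stoegbauer-Grassberger MI estimate in nats (second variant, eq. 4).

    x, y: 1-D arrays of equal length n, e.g. two rows X_a, X_b of one sample M_j.
    Brute force, O(n^2) memory.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = np.abs(x[:, None] - x[None, :])        # marginal distances in X
    dy = np.abs(y[:, None] - y[None, :])        # marginal distances in Y
    dz = np.maximum(dx, dy)                      # max-norm distance in the joint space
    np.fill_diagonal(dz, np.inf)                 # a point is not its own neighbour
    knn = np.argsort(dz, axis=1)[:, :k]          # k nearest joint neighbours of each point
    rows = np.arange(n)[:, None]
    eps_x = dx[rows, knn].max(axis=1)            # X-extent of the k neighbours
    eps_y = dy[rows, knn].max(axis=1)            # Y-extent of the k neighbours
    # n_x(i), n_y(i): points j != i within those extents (inclusive); the -1 drops i itself
    n_x = (dx <= eps_x[:, None]).sum(axis=1) - 1
    n_y = (dy <= eps_y[:, None]).sum(axis=1) - 1
    return digamma(n) + digamma(k) - 1.0 / k - np.mean(digamma(n_x) + digamma(n_y))


rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = 0.9 * a + np.sqrt(1 - 0.81) * rng.standard_normal(1000)   # correlation 0.9
print(ksg_mi(a, b))   # Gaussian ground truth: -0.5*log(1 - 0.81), about 0.830
```

The brute-force distance matrices are harmless for clinical sequences of around ten time points; for long recordings a k-d tree would be the usual replacement.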

This paper uses formula (4) to compute MI, sequentially computing the MI between each sequence $X_i$ of $M_j$ and all sequences $(X_1, X_2, \dots, X_d)$. Hence, each sequence is transformed into an MI vector $V_i$, and each sample matrix is transformed into a $d\times d$ MI matrix $I_j$. The feature extraction method can thus be described as follows:

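Input: an MTS dataset with size $n$ and dimension $d$ (assume $d \ge t_j$). Output: the $d\times d$ MI matrix $I_j$:

$$I_j = \begin{pmatrix} I_j(X_1,X_1) & I_j(X_1,X_2) & \cdots & I_j(X_1,X_d) \\ I_j(X_2,X_1) & I_j(X_2,X_2) & \cdots & I_j(X_2,X_d) \\ \vdots & \vdots & \ddots & \vdots \\ I_j(X_d,X_1) & I_j(X_d,X_2) & \cdots & I_j(X_d,X_d) \end{pmatrix}, \quad j = 1,2,\dots,n \tag{5}$$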

The $i$th variable of the $j$th sample is represented by the MI vector

$$V_{ji} = \big(I_j(X_i,X_1),\ I_j(X_i,X_2),\ \dots,\ I_j(X_i,X_d)\big), \quad i = 1,2,\dots,d.$$

Therefore, each variable in $I_j$ can be described by a vector $V_{ji}$. Fig. 4 shows the data changes during the MI feature extraction phase on the MTS dataset: each sample is transformed into a square matrix of the same dimension, regardless of its original length $t_j$.
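To illustrate how samples of different lengths become equal-sized MI matrices, the hypothetical helper `mi_matrix` below fills row $i$ of $I_j$ with $V_{ji}$. As a stand-in for eq. (4) it calls scikit-learn's `mutual_info_regression`, a k-nearest-neighbor MI estimator from the same Kraskov family (it follows Kraskov's first variant and clips negative estimates to zero), so its values will differ somewhat from the paper's.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def mi_matrix(sample, k=3, seed=0):
    """Hypothetical helper: d x d MI matrix I_j (eq. 5) for one sample of shape (d, t_j).

    Row i holds V_ji = (I_j(X_i, X_1), ..., I_j(X_i, X_d)); the time points of the
    sample act as the observations, so t_j must exceed k.
    """
    sample = np.asarray(sample, dtype=float)
    d = sample.shape[0]
    features = sample.T                          # shape (t_j, d)
    I = np.empty((d, d))
    for i in range(d):
        # MI between sequence X_i and every sequence X_1..X_d (k-NN estimator)
        I[i] = mutual_info_regression(features, sample[i],
                                      n_neighbors=k, random_state=seed)
    return I


rng = np.random.default_rng(0)
short_sample = rng.standard_normal((3, 12))    # d = 3 sequences, t_j = 12
long_sample = rng.standard_normal((3, 20))     # same d, t_j = 20
print(mi_matrix(short_sample).shape, mi_matrix(long_sample).shape)   # (3, 3) (3, 3)
```

Both samples map to $3\times 3$ matrices even though their lengths differ, which is what makes the later class-separability and vectorization steps length-independent.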

3. Feature selection based on class separability

After MI-based feature extraction, the feature space of each sample matrix has been converted into an MI matrix, which lets the features express the data characteristics more clearly and work better in the feature selection method. First, we introduce the principle of the class separability criterion in Section 3.1. Then, in Section 3.2, we adopt a feature selection algorithm based on class separability to eliminate redundant variables, and we convert the MI matrices into vectors as inputs to an SVM classifier.

3.1. Class separability criterion

The class separability criterion is often used as the basis of feature selection. Several criteria are commonly used, such as class separability criteria based on geometric distance, on the probability density function and on the posterior probability. The latter two require the statistical characteristics of the samples, whereas among the geometric-distance criteria the between-class distance criterion is the most commonly used. Although the definitions of between-class criteria vary in the literature [20–22], they are essentially based on the concept of distance.

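Assume that there are $c$ classes. $\omega_j$ is the $j$th class, $x^{(j)}_k$ is the $k$th sample of $\omega_j$, $n_j$ is the number of samples in $\omega_j$, and $n$ is the total number of samples. $m_j$ is the sample mean vector of $\omega_j$, and $m$ is the mean vector of all samples:

$$m_j = \frac{1}{n_j}\sum_{k=1}^{n_j} x^{(j)}_k \tag{6}$$

$$m = \frac{1}{\sum_{j=1}^{c} n_j}\sum_{j=1}^{c}\sum_{k=1}^{n_j} x^{(j)}_k \tag{7}$$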

The within-class total mean squared distance $J_w$ is

$$J_w = \sum_{j=1}^{c} P_j J_j \tag{8}$$


Fig. 4. The data change diagram of the feature extraction phase.

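Here, $P_j$ $(j = 1,2,\dots,c)$ is the prior probability of $\omega_j$, which can be estimated as $n_j/n$, and $J_j$ is the within-class mean squared distance of $\omega_j$:

$$J_j = \frac{1}{n_j}\sum_{k=1}^{n_j}\big(x^{(j)}_k - m_j\big)^{T}\big(x^{(j)}_k - m_j\big) \tag{9}$$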

The between-class total mean squared distance $J_b$ is

$$J_b = \sum_{j=1}^{c} P_j\big(m_j - m\big)^{T}\big(m_j - m\big) \tag{10}$$

To minimize within-class distance and maximize between-class distance simultaneously, the class separability criterion $J_m$ is intuitively constructed [23] as follows:

$$J_m = \frac{J_b}{J_w} \tag{11}$$
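Eqs. (6)–(11) translate directly into a few lines of NumPy. This sketch assumes each row of `X` is one sample (for instance, the vectors $V_{ji}$ of a single variable $i$ stacked over all samples $j$) and `y` holds the class labels; the priors $P_j$ are estimated as $n_j/n$, and the function name is illustrative.

```python
import numpy as np


def class_separability(X, y):
    """Return (J_w, J_b, J_m) of eqs. (8)-(11) for features X (n, p) and labels y (n,)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    m = X.mean(axis=0)                                  # overall mean vector, eq. (7)
    J_w = J_b = 0.0
    for label in np.unique(y):
        Xc = X[y == label]
        P = len(Xc) / n                                 # prior P_j estimated as n_j / n
        m_c = Xc.mean(axis=0)                           # class mean m_j, eq. (6)
        J_c = np.mean(np.sum((Xc - m_c) ** 2, axis=1))  # J_j, eq. (9)
        J_w += P * J_c                                  # eq. (8)
        J_b += P * np.sum((m_c - m) ** 2)               # eq. (10)
    return J_w, J_b, J_b / J_w                          # J_m, eq. (11)


J_w, J_b, J_m = class_separability([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1])
```

For this toy input the two class means are 1 and 11 around an overall mean of 6, so the function returns $J_w = 1$, $J_b = 25$ and $J_m = 25$: a large ratio signals well-separated classes.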

3.2. Feature selection based on class separability

The ratio of between-class distance to within-class distance reflects a variable's contribution to classification, so the idea of class separability is used to choose the optimal feature subsets for classification. Because MTS are multidimensional data, redundant variables have an obvious effect on the classification result during feature selection. However, the between-class criterion function cannot eliminate these redundant variables, which reduce classification accuracy. Thus, we introduce a function [24] with a redundancy evaluation variable $J_f$ to improve the accuracy of feature selection:

$$J_f = \frac{1}{|S|}\sum_{i=1}^{|S|}\sum_{j=1}^{C} n_j\big(m_j - m_{ij}\big)\big(m_j - m_{ij}\big)^{T} \tag{12}$$

where $|S|$ is the number of variables already chosen in the feature subset $S$, and $m_{ij}$ denotes the $i$th mean of the $j$th class in $S$. The larger $J_f$ is, the smaller the between-class redundancy of the variable. Therefore, the criterion $J_r$ is defined as follows:

$$J_r = \frac{J_b + J_f}{J_w} \tag{13}$$

The following procedure is based on the general concept of the feature selection algorithm in [24].

Algorithm 1

MTS feature selection algorithm based on class separability.
Input: an MTS dataset ($d\times d$ MI matrices $I_j$, one per sample $j$).
Output: the optimal feature subset with $K$ sequences.

Step 1: Compute $J_{bi}$ and $J_{wi}$ for each variable in $I_j$. Because $J_{bi}$ and $J_{wi}$ are both products of a row vector and a column vector, their values are scalars. All variables can therefore be ranked by

$$J_{mi} = \frac{J_{bi}}{J_{wi}}, \quad i = 1,2,\dots,N \tag{14}$$

The larger $J_{mi}$ is, the more important the $i$th variable is to the classification result.

Step 2: Choose the variable with the largest $J_{mi}$ as the first element of $S$.

Step 3: Considering the redundancies between variables, introduce the redundancy evaluation variable $J_{fi}$ to score the remaining candidate variables:

$$J_{ri} = \frac{J_{bi} + J_{fi}}{J_{wi}}, \quad i = 1,2,\dots,N \tag{15}$$

The larger $J_{ri}$ is, the more important the $i$th variable is to the classification result. The variable with the largest $J_{ri}$ is added to $S$ as the next feature element.

Step 4: If $|S| = K$, the algorithm ends; otherwise, return to Step 3.

To find the attributes with the highest contribution to classification, we first evaluate each dimension of the MI matrices with the introduced class separability criterion. Next, we sort all attributes by the criterion and choose $k$ sequences to reduce the dimensions of the matrices. As shown in Fig. 5, the dimensions with deeper colors contribute more to classification and are placed in front. $V_{si}$ is the $i$th sequence after sorting.
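The greedy loop of Algorithm 1 can be sketched as follows. Because eq. (12) leaves the exact construction of $J_{fi}$ for a single candidate open, the sketch takes it as a caller-supplied function `redundancy(i, S)`; this hook, the function name and the input layout (per-variable arrays of $J_{bi}$ and $J_{wi}$) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def select_features(J_b, J_w, K, redundancy):
    """Greedy selection following Steps 1-4 of Algorithm 1.

    J_b, J_w: per-variable between/within-class distances (length N).
    redundancy(i, S): hypothetical hook returning J_fi for candidate i given
    the already selected list S (its exact form should follow eq. 12).
    """
    J_b = np.asarray(J_b, dtype=float)
    J_w = np.asarray(J_w, dtype=float)
    if not 1 <= K <= len(J_b):
        raise ValueError("K must be between 1 and the number of variables")
    S = [int(np.argmax(J_b / J_w))]                 # Steps 1-2: largest J_mi, eq. (14)
    remaining = [i for i in range(len(J_b)) if i != S[0]]
    while len(S) < K:                               # Step 4: stop once |S| = K
        # Step 3: score each remaining candidate with J_ri, eq. (15)
        scores = [(J_b[i] + redundancy(i, S)) / J_w[i] for i in remaining]
        best = remaining[int(np.argmax(scores))]
        S.append(best)
        remaining.remove(best)
    return S


print(select_features([1.0, 4.0, 2.0], [1.0, 1.0, 1.0], K=3,
                      redundancy=lambda i, S: 0.0))   # [1, 2, 0]
```

With a redundancy hook that always returns 0, Step 3 reduces to ranking by $J_{mi}$ alone, which is a convenient baseline against which to judge the effect of $J_f$.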


Fig. 5. The data change diagram of the feature selection phase.

To satisfy the input requirement of an SVM classifier, the MTS matrices need to be transformed into feature vectors, a process called vectorization. The specific vectorization algorithm is as follows:

Algorithm 2

MTS vectorization algorithm based on MI.
Input: an MTS sample after feature selection.
Output: the MI vector $I_v$.

Step 1: Compute the MI values between the variables in the MTS and obtain an MI matrix $I$;
Step 2: Initialize an empty vector $I_v = [\,]$;
Step 3: For $i = 1:d$;
Step 4: $I_v = [I_v,\ I(i, i{+}1{:}d)]$;
Step 5: End.
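In Python, Steps 2–5 of Algorithm 2 amount to concatenating, row by row, the entries to the right of the diagonal (0-based `range(d)` replaces MATLAB's `1:d`). The sketch assumes the MI matrix from Step 1 is already available; for a symmetric $d\times d$ matrix it keeps each pair once, giving $d(d-1)/2$ entries (2278 for $d = 68$). The function name and toy values are illustrative.

```python
import numpy as np


def vectorize_mi(I):
    """Algorithm 2, Steps 2-5: concatenate I[i, i+1:d] for every row i (upper triangle)."""
    I = np.asarray(I)
    d = I.shape[0]
    # Row-by-row concatenation; the diagonal and the lower triangle are skipped.
    return np.concatenate([I[i, i + 1:] for i in range(d)])


I = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.7],
              [0.1, 0.7, 1.0]])      # toy symmetric MI matrix
print(vectorize_mi(I))              # [0.4 0.1 0.7]
```

The result matches NumPy's `I[np.triu_indices(d, k=1)]`, which can serve as a vectorized alternative.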

Eventually, we obtain $n$ MI vectors $I_v$ built from the $k$ selected features, and each input vector $I_v$ carries a classification label, as shown in Fig. 6, where $V_{si}$ is a vector of length $d$. After vectorization, the MI vector of the $j$th sample is $I_{vj}$.

At this point, feature extraction and feature selection have completed the dimension selection for the MTS feature matrices. Using the chosen dimensions of the feature matrices, we create the feature vectors and finally obtain $I_v$ by vectorization as the input to an SVM classifier. Overall, the original MTS dimension is further reduced by choosing the top $k$ attributes.

Based on the description above, we call this improved dimension reduction method for MTS feature selection based on mutual information and class separability (FSMICS).

4. Experimental design and result analysis

To evaluate the effectiveness of FSMICS in terms of classification performance and overall processing time, we conducted a verification experiment on the EEG dataset and an application experiment on the NSCLC dataset. In addition, we compared the performance of FSMICS with that of three other methods: CLeVer, Corona and AGV. CLeVer and Corona reduce dimensions by transforming correlation matrices, while AGV extracts the average and variance of each variable for dimension reduction, using the method in Ref. [11].

For all data, we performed dimension reduction with each of the four feature selection methods and used the same SVM parameters for classification. We then obtained the baseline classification accuracy and processing time of each method. To increase the precision of the experimental results, we ran each method 10 times on the same dataset and computed the average classification accuracy and processing time. A classification method can serve as a tool for testing the effectiveness of a feature selection method. There are several classification methods, such as decision trees (DT), neural networks (NN) and SVM, each with different benefits and limitations, and the effectiveness of a classification method depends largely on the characteristics of the data. SVM is a popular classification tool originally presented by Vapnik and his co-workers. It is capable of nonlinear classification and handles high-dimensional data well, and it has thus been applied in many fields such as bioinformatics, cancer diagnosis, image classification, text mining and feature selection [25,26]. Therefore, FSMICS is compared with CLeVer, Corona and AGV using an SVM with a linear kernel. SVM classification is performed with LIBSVM [27] in MATLAB.

4.1. Public dataset comparison

This experiment uses EEG data as the research data for feature selection. The EEG data originate from a large study examining EEG correlates of genetic predisposition to alcoholism. They contain measurements from 64 electrodes placed on the scalp, sampled at 256 Hz. There are two groups of subjects: alcoholic and control. We selected 200 samples from each group, and each group is split into a training set and a testing set of 100 samples each. An SVM with a linear kernel is adopted as the classifier to evaluate the classification performance of FSMICS, with its penalty parameter $C$ set to 2. To guarantee experimental precision, we performed 10 experiments with each of the four methods and averaged the results for each. The resulting performance comparison of the four methods on the EEG data is shown in Fig. 7. The X axis shows the number of selected features, and the Y axis shows the classification accuracy of the SVM.

As can be seen from Fig. 7, the classification accuracy of CLeVer converges fastest as the number of selected features increases. FSMICS and Corona behave similarly, with decent convergence rates, and AGV is the slowest. However, when the classification accuracy converges, FSMICS requires the fewest features, and its average classification accuracy is higher than that of the other three methods. It is also observed that FSMICS, CLeVer and Corona have good stability after the classification accuracy converges. The poor stability of AGV may be due to the characteristics of the EEG data and the design concept of AGV. AGV is a filter feature subset selection method based on across-group variance that considers the group structure in the data. It was not originally designed for MTS data and cannot perform feature selection directly on MTS data; the MTS items must first be transformed into feature matrices [28]. We examine whether this conclusion also applies to our clinical data. In addition, as a verification experiment, the results show that the overall classification performance of the improved method is better than that of the original feature selection algorithm [24].

Fig. 6. The data change diagram of the feature ranking and selection phase.

4.2. NSCLC dataset feature selection analysis

The research data in this section are the treatment and follow-up medical records of middle- to late-stage NSCLC patients in a certain hospital. The data collected from the medical history sheets include four parts: TCM (Traditional Chinese Medicine) clinical symptoms, TCM syndromes, physical and chemical examinations of clinical significance, and the patients' self-administered FACT-L scores [29]. Altogether, there are 68 medical indexes in our experiment.

Fig. 7. Comparison of classification performance for four feature selection methods in EEG dataset.

We selected features for the NSCLC dataset with the improved MTS feature selection method. The data contain $n = 205$ samples, corresponding to 205 patients followed up for 2–3 years. Each sample includes 68 variables $X_i$, which are the medical indexes in the clinical data, and the average sampling length $t_j$ is 10. The patients were divided into two major classes: Class 1 (Deceased), containing the deceased patients, and Class 2 (Living), containing the living ones. The classes were assigned according to tumor progression and whether the patient died during the observation period. There are 94 patients in Class 1 and 111 patients in Class 2.

We first used the KNN method [30] to fill in the missing data and obtained complete matrices $M_j$, $j = 1,\dots,n$. Next, we computed the MI values between variables with the K–L estimation method to obtain a $68\times 68$ MI matrix $I_j$, $j = 1,\dots,n$. We then obtained the variable $V_{s1}$ with the greatest classification relevance using the class separability criterion, and the second variable $V_{s2}$ and subsequent ones using the introduced criterion. In this way, we obtain the sequences $V_{s1}, V_{s2}, \dots, V_{s68}$ ordered by relevance. Finally, we chose the top $k$ sequences as the result of feature selection and vectorized the feature subsets to obtain the vector $I_v$.
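The KNN method of [30] couples imputation with MI-based classification; as a simpler stand-in for illustration, scikit-learn's `KNNImputer` fills each missing entry with the mean of that column over the nearest rows that observe it. The toy matrix and its layout (rows as observations, columns as clinical indexes) are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy layout: rows = observations, columns = clinical indexes.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [np.nan, 6.0],
              [8.0, 8.0]])
imputer = KNNImputer(n_neighbors=2)   # uniform average over the 2 nearest rows
X_filled = imputer.fit_transform(X)
print(X_filled[2, 0])                 # 5.5: mean of rows 1 and 3, nearest on column 2
```

Distances are computed only on the observed coordinates (NaN-aware Euclidean distance), so a row with a gap can still find neighbors.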

To evaluate the classification accuracy, we use the confusion matrix shown in Table 1.

According to the confusion matrix, the per-class classification accuracies $P_1$, $P_2$ and the total classification accuracy $P$ are:

$$P_1 = \frac{P_{11}}{P_{11} + P_{12}}, \qquad P_2 = \frac{P_{22}}{P_{21} + P_{22}}, \qquad P = \frac{P_{11} + P_{22}}{\sum_{i=1}^{2}\sum_{j=1}^{2} P_{ij}}$$

To avoid bias in the experimental results, the data are divided into 10 groups: all samples are randomly divided into training and testing samples for 10-fold cross-validation (10-fold CV).

Table 1
Two-class confusion matrix.

| True ↓ / Predicted → | Class 1 | Class 2 | Accuracy |
|---|---|---|---|
| Class 1 | P11 | P12 | P1 |
| Class 2 | P21 | P22 | P2 |

Table 2
Comparison of classification performance parameters for four feature selection methods.

| Feature selection method | Chosen number | P1 (%) | P2 (%) | P (%) | Standard deviation | Time (s) |
|---|---|---|---|---|---|---|
| FSMICS | 39 | 80.9 | 83.8 | 82.4 | 0.5706 | 0.8939 |
| CLeVer | 45 | 77.7 | 81.1 | 79.5 | 0.5765 | 0.8558 |
| Corona | 43 | 76.6 | 78.4 | 77.6 | 0.5992 | 1.1089 |
| AGV | 40 | 69.1 | 73.0 | 71.2 | 0.8818 | 0.7132 |
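The accuracies defined from Table 1 can be checked with a small helper. The counts in the usage line are illustrative only and do not come from the NSCLC experiment; the function name is hypothetical.

```python
import numpy as np


def class_accuracies(C):
    """P1, P2 and P from a 2x2 confusion matrix C (rows = true class, cols = predicted)."""
    C = np.asarray(C, dtype=float)
    P1 = C[0, 0] / (C[0, 0] + C[0, 1])    # share of Class 1 predicted correctly
    P2 = C[1, 1] / (C[1, 0] + C[1, 1])    # share of Class 2 predicted correctly
    P = (C[0, 0] + C[1, 1]) / C.sum()     # overall accuracy over all samples
    return P1, P2, P


P1, P2, P = class_accuracies([[8, 2], [1, 9]])   # illustrative counts only
print(f"P1={P1:.2f}, P2={P2:.2f}, P={P:.2f}")    # P1=0.80, P2=0.90, P=0.85
```

Because $P$ pools both rows, it is the class-size-weighted average of $P_1$ and $P_2$, which is why it always lies between them in Table 2.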

In this paper, using the total classification accuracy as the performance evaluation index, we run SVM classification experiments with the number of selected variables ranging from 1 to 68. Fig. 8 shows how the classification accuracy on the dataset varies with the number of selected features after feature selection. To guarantee experimental precision, these curves are plotted from the averages of 10 experiments with different training and testing sets. Once the total classification accuracy of the model converges, the average classification accuracy, classification efficiency and standard deviation of each class after 10-fold CV are as shown in Table 2.
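A rough Python analogue of this evaluation protocol (linear SVM, 10-fold CV, repeated 10 times with different random splits) is sketched below with scikit-learn, whose `SVC` is built on LIBSVM. The penalty `C=2` mirrors the EEG setting; stratified folds, the function name and the synthetic data are assumptions, and the paper itself used LIBSVM from MATLAB.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def cv_accuracy(X, y, C=2.0, folds=10, repeats=10):
    """Mean and std over `repeats` runs of the 10-fold CV accuracy of a linear SVM."""
    run_means = []
    for r in range(repeats):
        # A different random split of the samples into folds for every run
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=r)
        scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=cv)
        run_means.append(scores.mean())
    return float(np.mean(run_means)), float(np.std(run_means))


rng = np.random.default_rng(0)
# Synthetic two-class data standing in for vectorized MI features I_v
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(3.0, 1.0, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
print(cv_accuracy(X, y))
```

Stratified folds keep the Deceased/Living proportions roughly equal in every fold, which matters when the classes are unbalanced (94 versus 111 patients).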

As can be seen from Fig. 8, on the one hand, the average classification accuracy of FSMICS is the highest once its classification accuracy converges. CLeVer and Corona achieve decent and similar average classification accuracy, while AGV has the lowest. On the other hand, the convergence rates of FSMICS and CLeVer are close, but the initial classification accuracy of CLeVer is lower than that of FSMICS. Other aspects of classification, such as stability and processing time, cannot be read precisely from Fig. 8. Therefore, the classification performance of the four methods is analyzed further using the data in Table 2.

In Table 2, the "chosen number" is the number of variables, taken in ranked order from the 68 variables $X_i$, at which each method's classification accuracy converges. The smaller the chosen number, the better the method's convergence performance, and this item is part of the reference standard for classification performance. The standard deviation reflects each method's classification stability after convergence, and the last column shows the classification processing time. All data in Table 2 are averages over 10 runs.

Fig. 8. Comparison of classification performance for four feature selection methods in NSCLC dataset.

According to the results in Fig. 8 and Table 2, we can draw the following conclusions:

(1) When classification accuracy converges:
(a) The average classification accuracy on the dataset after FSMICS processing is the highest, reaching 82.4%.
(b) FSMICS needs only 39 variables for the classification accuracy to converge, fewer than any of the other three methods, which in ascending order are AGV (40), Corona (43) and CLeVer (45).
(2) After classification accuracy converges, the standard deviations of FSMICS and CLeVer are close, and both are relatively stable. Relatively speaking, the stability of Corona is slightly worse, and AGV is not stable.
(3) The SVM classification time on the dataset after each of the four feature selection methods is approximately 1 s. The fastest is AGV, followed by CLeVer, FSMICS and Corona. Since the feature vectors after vectorization still have a high dimension, FSMICS is not the fastest of the four methods in classification time.
(4) From the analysis of the clinical experiment, we conclude that AGV is not applicable to our MTS data type. However, AGV shows high classification efficiency, which means that AGV is a good dimension reduction method to some extent.

As the classification results on the public and clinical datasets show, compared with the other three MTS feature selection methods, FSMICS obtains the highest average classification accuracy and a decent convergence rate when its classification accuracy converges. Moreover, it shows good stability after convergence. The experiments therefore indicate that FSMICS yields the highest classification accuracy, with acceptable classification efficiency.

From a statistical perspective on the medical data, we conclude that FSMICS can classify the patients in the clinical data into the corresponding classes with relatively high accuracy.

5. Conclusion

In MTS data mining problems, the excessive data dimension makes the estimated probability density distribution inaccurate and increases computational complexity, while information redundancies and irrelevant features may lead to high computational complexity and over-fitting. This paper therefore focuses on dimension reduction and improves an MTS feature selection method by combining a K–L information entropy estimation method for MI-based feature extraction with a feature selection algorithm. We first computed MI values with the K–L information entropy estimation method to extract the relationships between features. Next, we used the class separability criterion to evaluate the contribution of each variable. Moreover, to account for redundancies between variables, we extended the class separability criterion with a redundancy evaluation variable to eliminate information redundancies, so that the final variables have higher relevance and fewer redundancies. We then sorted the attributes by importance and chose the optimal features to reduce the number of dimensions. Finally, we vectorized the feature matrices to satisfy the input requirement of the classifier.


Through a verification experiment on a public dataset and an application experiment on a clinical dataset, the improved method is shown to effectively reduce the dimensions of MTS and to perform well in a TCM application. That is, FSMICS can effectively reduce computational complexity and eliminate information redundancies and irrelevant features in MTS, achieving the purpose of dimension reduction. Our experimental results suggest that FSMICS can handle both linear and nonlinear relationships between dimensions and is better suited to MTS data with unequal-length samples. However, the feature vectors after vectorization still have a high dimension, which can affect the time performance of a classifier. Therefore, how to further process and reduce the dimension of the feature attributes while preserving classification accuracy, so as to improve time performance, is one of the important directions of our future research.

Acknowledgements

We gratefully acknowledge the 2014 Beijing Municipal Education Commission plan on the scientific research project (KM201410005004) and the Beijing Laboratory for Urban Mass Transit for their support. We also thank the Hospital of Traditional Chinese Medicine, CPUMS, Beijing, referenced in this paper, for providing the clinical follow-up data and helpful advice on this issue.

References

[1] T. Fu, A review on time series data mining, Eng. Appl. Artif. Intell. 24 (1) (2011) 164–181.

[2] G. Ristanoski, J. Bailey, Distribution Based Data Filtering for Financial Time Series Forecasting, Springer-Verlag, Berlin, 2011, pp. 122–131.

[3] R. Tanawongsuwan, A. Bobick, Performance Analysis of Time-Distance Gait Parameters Under Different Speeds, Springer-Verlag, Berlin, 2003, pp. 715–724.

[4] X.L. Zhang, et al., Event-related potentials during object recognition tasks, Brain Res. Bull. 38 (6) (1995) 531–538.

[5] W.A. Chaovalitwongse, Y.J. Fan, R.C. Sachdeo, Support feature machine for classification of abnormal brain activity, in: KDD-2007 Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 113–122.

[6] M. Krawczak, G. Szkatula, An approach to dimensionality reduction in time series, Inf. Sci. 260 (2014) 15–36.

[7] J.M. Wang, et al., Multivariate time series similarity searching, Sci. World J. 2014 (2014) 1–8.

[8] K. Javed, H.A. Babri, M. Saeed, Feature selection based on class-dependent densities for high-dimensional binary data, IEEE Trans. Knowl. Data Eng. 24 (3) (2012) 465–477.

[9] C.H. Lu, Z.M. Zhu, X.F. Gu, An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method, J. Med. Syst. 38 (979) (2014).

[10] H. Yoon, K.Y. Yang, C. Shahabi, Feature subset selection and feature ranking for multivariate time series, IEEE Trans. Knowl. Data Eng. 17 (9) (2005) 1186–1198.

[11] F. Alimardani, et al., Presenting a new search strategy to select synchronization values for classifying bipolar mood disorders from schizophrenic patients, Eng. Appl. Artif. Intell. 26 (2) (2013) 913–923.

[12] V. Singh, K.P. Miyapuram, R.S. Bapi, Detection of cognitive states from fMRI data using machine learning techniques, in: 20th International Joint Conference on Artificial Intelligence, 2007, pp. 587–592.

[13] S.N. Li, Z.H. Zhang, J.Q. Duan, An ensemble multi-label feature selection algorithm based on information entropy, Int. Arab J. Inf. Technol. 11 (4) (2014) 379–386.

[14] M.A. Hossain, X.P. Jia, M. Pickering, Subspace detection using a mutual information measure for hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. 11 (2) (2014) 424–428.

[15] A. Mehri, A.H. Darooneh, The role of entropy in word ranking, Phys. A: Stat. Mech. Appl. 390 (18–19) (2011) 3157–3163.

[16] A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Phys. Rev. E: Stat. Nonlinear Soft Matter Phys. 69 (2004) 066138.

[17] F. Rossi, et al., Mutual information for the selection of relevant variables in spectrometric nonlinear modelling, Chemom. Intell. Lab. Syst. 80 (2) (2006) 215–226.

[18] F. Rossi, et al., Fast selection of spectral variables with B-spline compression, Chemom. Intell. Lab. Syst. 86 (2) (2007) 208–218.

[19] L.F. Kozachenko, N.N. Leonenko, Sample estimate of the entropy of a random vector, Prob. Inf. Transm. 23 (2) (1987) 95–101.

[20] A. Nazarpour, P. Adibi, Two-stage multiple kernel learning for supervised dimensionality reduction, Pattern Recognit. 48 (5) (2015) 1854–1862.

[21] M. Imani, H. Ghassemian, Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation, IEEE Geosci. Remote Sens. Lett. 11 (11) (2014) 1986–1990.

[22] Y. Xu, et al., From the idea of sparse representation to a representation-based transformation method for feature extraction, Neurocomputing 113 (2013) 168–176.

[23] J. Li, An Introduction to Pattern Recognition, Higher Education Press, Beijing, 1994.

[24] H. Min, L. Xiaoxin, Feature selection techniques with class separability for multivariate time series, Neurocomputing 110 (2013) 29–34.

[25] G. Zararsiz, F. Elmali, A. Ozturk, Bagging support vector machines for leukemia classification, Int. J. Comput. Sci. Issues (IJCSI) 9 (6) (2012) 355–358.

[26] S. Korkmaz, G. Zararsiz, D. Goksuluk, Drug/nondrug classification using support vector machines with various feature selection strategies, Comput. Methods Prog. Biomed. 117 (2) (2014) 51–60.

[27] C.C. Chang, C.J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 27.

[28] N.S. Dias, et al., Feature down-selection in brain–computer interfaces, in: 4th International IEEE/EMBS Conference on Neural Engineering (NER '09), Antalya, 2009.

[29] C.Z.Y.S. Chonghua Wan, The Chinese version of quality of life table FACT-L of lung cancer patients, China Cancer 9 (3) (2000) 109–110.

[30] P.J. Garcia-Laencina, et al., K nearest neighbours with mutual information for simultaneous classification and missing data imputation, Neurocomputing 72 (7–9) (2009) 1483–1493.
