Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data


Biomedical Signal Processing and Control

journal homepage: www.elsevier.com/locate/bspc

Technical note


Liying Fang a,b,c,∗, Han Zhao a,b,c, Pu Wang a,b,c, Mingwei Yu d, Jianzhuo Yan a,b,c, Wenshuai Cheng a,b,c, Peiyu Chen a,b,c

a College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China

b Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China

c Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

d Hospital of Traditional Chinese Medicine, CPUMS, Beijing 100010, China

Article info

Article history:
Received 7 December 2014
Received in revised form 28 March 2015
Accepted 14 May 2015
Available online 15 June 2015

Keywords:
Multidimensional time series
Dimension reduction
Feature selection
Mutual information
Class separability

Abstract

In clinical medicine, multidimensional time series data can be used to find the rules of disease progress through data mining techniques such as classification and prediction. However, in multidimensional time series data mining problems, excessive data dimension makes the estimated probability density distribution inaccurate and increases the computational complexity. In addition, information redundancy and irrelevant features may lead to high computational complexity and over-fitting. Together, these two factors can reduce classification performance. To reduce computational complexity and to eliminate information redundancies and irrelevant features, we improved upon a multidimensional time series feature selection method to achieve dimension reduction. The improved method selects features by combining the Kozachenko–Leonenko (K–L) information entropy estimation method for feature extraction based on mutual information with a feature selection algorithm based on class separability. We performed experiments on the electroencephalogram (EEG) dataset for verification and on a non-small cell lung cancer (NSCLC) clinical dataset for application. The results show that, in comparison with CLeVer, Corona and AGV, the improved method can effectively reduce the dimensions of multidimensional time series for clinical data.

1. Introduction

Time-series analysis is widely used in many application fields, including medical data, financial data, moving-object tracking and human-computer interaction interfaces [1,2]. Data mining for time series has very important value, for example in research on the classification, clustering or prediction of data, which can assist in finding the potential rules of time series data and provide support. Currently, most research focuses on univariate time series processing. However, with the development of data-collection technology, more and more multidimensional time series data are becoming available, and they contain a considerable amount of potentially valuable information. For example, diabetes clinical data, as a kind of time series data, contain abundant information including food intake, drug intake and daily activities. The EEG data,1 which contain plentiful information on brain waves, reflect correlations with certain genetic predispositions and diseases. In Tanawongsuwan and Bobick [3], 22 markers are spread over the human body to measure the movements of body parts while walking. In medicine, EEG data from 64 electrodes placed on the scalp are monitored to examine the correlation of genetic predisposition to alcoholism [4]. Therefore, in recent years, multidimensional time series classification, dimension reduction and similarity search technology have become common concerns for researchers in the field of data mining [5–7].

∗ Corresponding author at: Beijing University of Technology, College of Electronic Information and Control Engineering, No. 100, Pingleyuan Street, Beijing 100124, China. Tel.: +86 13810101581.

E-mail address: [email protected] (L. Fang).

A time series is a series of observations,

$$x_i(t); \quad i = 1, \ldots, d; \quad t = 1, \ldots, n \tag{1}$$

made sequentially through time, where i indexes the measurements made at each time point t. It is called a univariate time series when d is equal to 1 and a multidimensional time series (MTS) when d is equal to or greater than 2. Due to the mass production of MTS data and the growing demand for classification in various fields, MTS classification techniques have been applied in many fields, such as the classification of RNA in bioinformatics, handwriting recognition and electrocardiogram (ECG) pattern matching. As MTS data are typical high-dimensional data [8], many features are either irrelevant or redundant. Moreover, the curse of dimensionality, which is caused by excessive dimensions, exists in multidimensional feature space. Therefore, how to effectively select useful features for classification from the raw MTS data has become a current research hotspot with a high degree of difficulty.

Fig. 1. MTS data dimension reduction process: data pre-processing (using the KNN method to fill missing values), feature extraction (using feature extraction with the K–L estimation method for feature space transformation), feature selection (using the feature selection algorithm to grade the dimension attributes and choose the K optimal features), vectorization (vectorization of the MI matrices) and classification (putting the dimension-reduced feature vectors into the classifier).

1 http://archive.ics.uci.edu/ml/datasets/EEG+Database.

http://dx.doi.org/10.1016/j.bspc.2015.05.011

Feature extraction and feature selection are the main methods of dimension reduction [9]. Not only can they reduce classification errors, but they can also improve classification efficiency. Currently, feature selection methods widely used for MTS include CLeVer [10] and AGV [11], based on PCA, and Corona [12], based on a correlation coefficient method. However, they can only identify linear relationships among dimensions, and their calculations are better suited to MTS samples of equal length. Unequal-length data are the norm in clinical follow-up, because patients may die or otherwise be lost from the dataset. Mutual information (MI) is an important concept in information theory. MI can be applied to nonlinear transformation and the extraction of high-order statistics. Therefore, we consider using MI for feature extraction to transform samples of different lengths to equal length. Meanwhile, by exploiting the nonlinear relationships in multidimensional feature space, we can effectively reduce dimensions through feature selection. However, the probability density estimation method has a great influence on MI computation, and it determines whether the method can effectively and efficiently express the typical features so as to promote the accuracy of feature selection. Thus, it is important to choose an applicable probability density estimation method for MI feature extraction in MTS. In addition, the feature subset evaluation criterion is the key issue in feature selection, and its quality directly impacts the final result. The class separability criterion is one of the important evaluation criteria, and the between-class distance criterion is one of the commonly used methods. We get better class separability by minimizing within-class distance and maximizing between-class distance simultaneously. The purpose of feature selection is to choose the feature subsets with larger class separability. However, since redundant variables have an obvious effect on the result of classification, while the between-class distance criterion cannot eliminate them, we introduce a criterion with a redundancy variable to eliminate redundancies and irrelevant features. We then introduce the improved method, which can effectively choose the optimal features and reduce dimensions.

This paper aims to break the limitation that the correlation matrices in traditional MTS feature selection methods can only measure the linear relationships between variables. We improve the feature selection method based on mutual information and class separability. We first compute the MI value with a probability density estimation method to extract the linear and nonlinear relationships between variables through MI matrices. Considering the existence of redundancies, we next introduce the feature selection algorithm based on class separability to eliminate redundancies and obtain high correlation between the chosen feature subsets and the target class. We then use the improved method for dimension reduction processing on MTS, as shown in Fig. 1. Finally, we verify whether the improved method can effectively reduce dimensions through contrast experiments based on classification accuracy with an SVM classifier.

The remainder of this paper is organized as follows. Section 2 introduces the feature extraction method based on MI. Section 3 introduces the feature selection algorithm based on class separability. The experiments and results with the improved method are presented in Section 4, followed by the conclusion in Section 5.

2. Feature extraction method based on MI

This section introduces the MI feature extraction method, which involves some basic concepts of entropy and MI, as shown in Refs. [13–15].

In general, an MTS can be expressed as a d × n matrix $[x_{i,t}]_{d \times n}$. Each matrix expresses one sample. Assume that the research data include several samples and that two of the samples are $[x_{i,t}]_{d \times n_1}$ and $[x_{i,t}]_{d \times n_2}$. Generally speaking, within a sample all variables have the same sampling time length. However, the sampling time lengths $n_1$ and $n_2$ of two different samples are not always the same. Therefore, each MTS sample is expressed by a d × $t_j$ matrix $[x_{i,t}]_{d \times t_j}$:

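$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1 t_j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i1} & \cdots & \cdots & x_{i t_j} \\ x_{d1} & x_{d2} & \cdots & x_{d t_j} \end{bmatrix}, \quad i = 1, 2, \ldots, d; \quad t = 1, 2, \ldots, t_j \tag{2}$$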

where $x_{i,t}$ denotes the sampled value of the variable $x_i$ in the ith dimension at time point t. $M_j$ stands for the jth sample matrix $[x_{i,t}]_{d \times t_j}$, as shown in Fig. 2. $t_j$ denotes the sampling time length of the jth sample. $X_i$ denotes the index sequence of the ith dimension. Because each sequence $X_i$ has a different degree of importance to classification, $X_i$ is drawn in different colors, and a deeper color means a higher degree of importance. However, under the initial condition the degree of importance of each sequence is unknown, so the colors are shown in random depth. Fig. 3 shows an MTS dataset with n samples, where each sample is a matrix with dimension d and sampling time length $t_j$.

By the definition of information entropy and MI, the probability density distribution of the random variables must be approximately estimated before MI calculation. One kind of probability density estimation method based on nearest neighbors is introduced in [16], and it has also been used to good effect in [17,18]. The advantage of this method is that there is no need to estimate the probability density distribution function for any variable.

Fig. 2. The diagram of sample matrix $M_j$.

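Assume that X and Y are two random variables, where $X = \{x_i, i = 1, \ldots, n\}$ and $Y = \{y_i, i = 1, \ldots, n\}$. In [19], the K–L nearest neighbor entropy estimate is defined as

$$\hat{H}(X) = -\psi(k) + \psi(n) + \log(c_d) + \frac{d}{n} \sum_{i=1}^{n} \log\left(\varepsilon_X(i,k)\right) \tag{3}$$

where k is the number of nearest neighbor points; d is the dimension of the data; $c_d$ is the volume of the d-dimensional unit sphere; $\varepsilon_X(i,k)$ is the distance between $x_i$ and its kth nearest neighbor point; and $\psi$ is the digamma function.

Based on formula (3), to solve the regression problem, an MI estimation method is proposed by Kraskov in [16] as

$$\hat{I}(X;Y) = \psi(n) + \psi(k) - \frac{1}{k} - \frac{1}{n} \sum_{i=1}^{n} \left[\psi\left(\tau_x(i)\right) + \psi\left(\tau_y(i)\right)\right] \tag{4}$$

where $\tau_x(i)$ is the number of points of X whose distance from $x_i$ is no more than $\varepsilon(i,k)$; $\tau_y(i)$ is defined similarly. Here, $\varepsilon(i,k) = \max\left(\varepsilon_X(i,k), \varepsilon_Y(i,k)\right)$.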
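As a rough illustration of an estimator with the form of formula (4), the sketch below implements a nearest-neighbor MI estimate for two scalar sequences in plain Python. Several details are assumptions rather than statements from the text: the function names `digamma_int` and `ksg_mutual_information` are hypothetical, neighbors are found with the max norm in the joint space, and the marginal counts use the per-coordinate extents of the k-neighbor box as in the second estimator of Kraskov et al. [16], which differs in detail from a single shared ε(i,k).

```python
EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    if n < 1:
        raise ValueError("digamma_int expects a positive integer")
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """Nearest-neighbor MI estimate (in nats) between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("need two sequences of equal length greater than k")
    total = 0.0
    for i in range(n):
        # (joint max-norm distance, x-distance, y-distance) to every other point
        dists = sorted(
            (max(abs(x[j] - x[i]), abs(y[j] - y[i])),
             abs(x[j] - x[i]),
             abs(y[j] - y[i]))
            for j in range(n) if j != i
        )
        neighbours = dists[:k]
        # extents of the smallest box containing the k nearest neighbors
        eps_x = max(dx for _, dx, _ in neighbours)
        eps_y = max(dy for _, _, dy in neighbours)
        # marginal counts: points no farther than the extent along each axis
        tau_x = sum(1 for _, dx, _ in dists if dx <= eps_x)
        tau_y = sum(1 for _, _, dy in dists if dy <= eps_y)
        total += digamma_int(tau_x) + digamma_int(tau_y)
    return digamma_int(n) + digamma_int(k) - 1.0 / k - total / n
```

Because every argument of ψ here is a positive integer, the harmonic-sum form of the digamma function is sufficient and no external library is needed. The quadratic neighbor search is only reasonable for short sequences such as clinical follow-up records.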

This paper applies formula (4) for MI computation by sequentially computing the MI of each $X_i$ of each $M_j$ with all sequences $(X_1, X_2, \ldots, X_d)$. Hence, each sequence is transformed into an MI vector $V_i$ and each matrix is transformed into a d × d MI matrix $I_j$. Thus, the feature extraction method may be described as follows:

Input: an MTS dataset with size n and dimension d (assume $d \geq t_j$).
Output: d × d MI matrix $I_j$.

$$I_j = \begin{bmatrix} I_j(X_1,X_1) & I_j(X_1,X_2) & \cdots & I_j(X_1,X_d) \\ I_j(X_2,X_1) & I_j(X_2,X_2) & \cdots & I_j(X_2,X_d) \\ \vdots & \vdots & \ddots & \vdots \\ I_j(X_d,X_1) & I_j(X_d,X_2) & \cdots & I_j(X_d,X_d) \end{bmatrix}, \quad j = 1, 2, \ldots, n \tag{5}$$

The ith variable of the jth sample is represented by the MI vector $V_{ji} = \left(I_j(X_i,X_1), I_j(X_i,X_2), \ldots, I_j(X_i,X_d)\right)$, $i = 1, 2, \ldots, d$. Therefore, each variable in $I_j$ can be described by a vector $V_{ji}$. The data change process for the MI feature extraction phase on the MTS dataset is shown in Fig. 4; it transforms each sample into a square matrix of equal dimension.
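The transformation of samples with different lengths into equally sized d × d matrices can be sketched as follows. This is a minimal sketch under stated assumptions: `mi_matrix` and `gaussian_mi` are hypothetical names, and `gaussian_mi` is only a stand-in estimator (the closed-form MI of a bivariate Gaussian, −½ ln(1 − ρ²)) so that the example stays self-contained. It is not the K–L based estimator used in the paper; any pairwise MI estimator could be passed in its place.

```python
import math


def gaussian_mi(x, y):
    """Stand-in estimator: MI of a bivariate Gaussian, -0.5 * ln(1 - rho^2).

    Assumes both sequences are non-constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # clip so that the logarithm stays finite on the diagonal (rho = 1)
    rho_sq = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)
    return -0.5 * math.log(1.0 - rho_sq)


def mi_matrix(sample, estimator=gaussian_mi):
    """Map one d x t_j sample (list of d sequences) to a d x d MI matrix."""
    d = len(sample)
    matrix = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a, d):
            # MI is symmetric, so each pair is estimated once and mirrored
            value = estimator(sample[a], sample[b])
            matrix[a][b] = matrix[b][a] = value
    return matrix


# Two samples with d = 3 variables but different lengths (t_j = 4 and t_j = 6)
short_sample = [[1.0, 2.0, 4.0, 3.0], [2.0, 1.0, 0.5, 3.0], [0.0, 1.0, 0.0, 2.0]]
long_sample = [[1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
               [6.0, 5.0, 3.0, 4.0, 1.0, 2.0],
               [1.0, 0.0, 2.0, 0.0, 3.0, 1.0]]
matrices = [mi_matrix(short_sample), mi_matrix(long_sample)]
```

Both entries of `matrices` are 3 × 3 regardless of the original sequence lengths, which is the property the feature extraction phase relies on.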

3. Feature selection based on class separability

After feature extraction based on MI, the feature space of each sample matrix has been converted into an MI matrix, which allows the features to express the data characteristics more clearly and to achieve a better effect in the feature selection method. First, we introduce the principle of the class separability criterion in Section 3.1. Then, in Section 3.2, we adopt a feature selection algorithm based on class separability to eliminate redundant variables, and we convert the MI matrices into vectors as inputs to an SVM classifier.

3.1. Class separability criterion

The class separability criterion is often used as the basis of feature selection. Several criteria are commonly used, such as class separability criteria based on the geometric distance, the probability density function and the posterior probability. The latter two require the statistical characteristics of the samples, while the between-class distance criterion is the more commonly used form of class separability based on the geometric distance. Although the definitions of the between-class criteria vary in the literature [20–22], they are essentially based on the concept of distance.

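Fig. 4. The data change diagram for the feature extraction phase.

Assume that there are c classes. $\omega_j$ is the jth class and $x_k^{(j)}$ is the kth sample of $\omega_j$. Let $n_j$ be the number of samples in $\omega_j$ and n the total number of samples. $m_j$ is the sample mean vector of $\omega_j$ and m is the mean vector of all samples:

$$m_j = \frac{1}{n_j} \sum_{k=1}^{n_j} x_k^{(j)} \tag{6}$$

$$m = \frac{1}{\sum_{j=1}^{c} n_j} \sum_{j=1}^{c} \sum_{k=1}^{n_j} x_k^{(j)} \tag{7}$$

$J_w$ is the within-class total mean square distance,

$$J_w = \sum_{j=1}^{c} P_j J_j \tag{8}$$

where $P_j$ (j = 1, 2, ..., c) is the prior probability of $\omega_j$, which can be estimated from $n_j$ and n, and $J_j$ is the within-class mean square distance of $\omega_j$:

$$J_j = \frac{1}{n_j} \sum_{k=1}^{n_j} \left(x_k^{(j)} - m_j\right)^{T} \left(x_k^{(j)} - m_j\right) \tag{9}$$

$J_b$ is the between-class total mean square distance:

$$J_b = \sum_{j=1}^{c} P_j \left(m_j - m\right)^{T} \left(m_j - m\right) \tag{10}$$

To minimize the within-class distance and maximize the between-class distance simultaneously, the class separability criterion $J_m$ is constructed intuitively [23] as follows:

$$J_m = \frac{J_b}{J_w} \tag{11}$$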
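The quantities in Eqs. (6)–(11) can be computed directly from labeled feature vectors. The following is a minimal sketch, assuming that the priors are estimated as $P_j = n_j / n$ and that the within-class distance of each class is averaged over its own $n_j$ samples; the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    """(u - v)^T (u - v) for two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_separability(classes):
    """Return (J_b, J_w, J_m) for a list of classes, each a list of vectors."""
    n = sum(len(samples) for samples in classes)
    overall_mean = mean_vector([x for samples in classes for x in samples])
    j_w = 0.0
    j_b = 0.0
    for samples in classes:
        prior = len(samples) / n                      # P_j estimated as n_j / n
        m_j = mean_vector(samples)                    # class mean
        j_j = sum(squared_distance(x, m_j) for x in samples) / len(samples)
        j_w += prior * j_j                            # within-class term
        j_b += prior * squared_distance(m_j, overall_mean)  # between-class term
    return j_b, j_w, j_b / j_w


# Two well-separated one-dimensional classes
j_b, j_w, j_m = class_separability([[[0.0], [2.0]], [[10.0], [12.0]]])
```

For this toy input the class means are 1 and 11, the overall mean is 6, each class has a within-class mean square distance of 1, and so $J_b = 25$, $J_w = 1$ and $J_m = 25$.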

3.2. Feature selection based on class separability

The ratio of the between-class distance to the within-class distance measures the contribution of a variable to classification. Therefore, the idea of class separability is to choose the optimal feature subsets for classification. As MTS is a kind of multidimensional data, redundant variables have an obvious effect on the result of classification during the process of feature selection. However, the between-class criterion function cannot eliminate these redundant variables, which reduce the classification accuracy. Thus, we introduce a function [24] with a redundancy evaluation variable $J_f$ to promote the accuracy of feature selection:

$$J_f = \frac{1}{|S|} \sum_{i=1}^{|S|} \sum_{j=1}^{C} n_j \left(m_j - m_{ij}\right)^{T} \tag{12}$$

where |S| indicates the number of variables that have been chosen in the feature subset S, and $m_{ij}$ denotes the ith average of the jth sample in S. The bigger $J_f$ is, the smaller the between-class redundancy is. Therefore, the criterion $J_r$ is as follows:

$$J_r = \frac{J_b + J_f}{J_w} \tag{13}$$

The following process follows the general concept of the feature selection algorithm in [24].

Algorithm 1
MTS feature selection algorithm based on class separability.
Input: an MTS dataset (a d × d MI matrix $I_j$ for each of the samples).
Output: the optimal feature subset with K sequences.

Step 1: Compute each $J_{bi}$ and $J_{wi}$ in $I_j$. Because both $J_{bi}$ and $J_{wi}$ are products of a row vector and a column vector, their values are scalars. In this way, all variables can be sorted by the following formula:

$$J_{mi} = \frac{J_{bi}}{J_{wi}}, \quad i = 1, 2, \ldots, N \tag{14}$$

The larger $J_{mi}$ is, the more important the ith variable is to the classification result.

Step 2: Choose the variable with the largest $J_{mi}$ as the first element of S.

Step 3: Consider the existence of redundancies between variables and introduce the redundancy evaluation variable $J_{fi}$ to synthesize the selection variables:

$$J_{ri} = \frac{J_{bi} + J_{fi}}{J_{wi}}, \quad i = 1, 2, \ldots, N \tag{15}$$

The larger $J_{ri}$ is, the more important the ith variable is to the classification result. The variable with the largest $J_{ri}$ is chosen as the next feature element.

Step 4: If |S| is K, then the algorithm ends; otherwise, repeat Step 3.

To find the attributes with the highest contribution rate to classification, we first evaluate each dimension of the MI matrices with the introduced class separability criterion. Next, we sort all attributes using the criterion and choose k sequences to reduce the dimensions of the matrices. As shown in Fig. 5, the dimensions with the deeper colors, that is, the attributes with the higher contribution rate to classification, are placed at the front. $V_{si}$ is the ith sequence after sorting.
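The greedy structure of Algorithm 1 can be sketched as below. This is an illustration of the control flow only, under stated assumptions: the per-variable scores `j_b` and `j_w` are taken as precomputed, and the redundancy term is passed in as a hypothetical callable `redundancy(i, selected)` that returns $J_{fi}$ for candidate i given the variables already chosen, since its exact computation follows [24] and is not reproduced here.

```python
def select_features(j_b, j_w, redundancy, k):
    """Greedy forward selection of k variable indices.

    j_b, j_w   -- per-variable between-class and within-class scores
    redundancy -- hypothetical callable (i, selected) -> J_fi; larger means
                  less redundant with the variables already selected
    """
    candidates = list(range(len(j_b)))
    # Steps 1-2: rank by J_mi = J_bi / J_wi and take the best variable first
    first = max(candidates, key=lambda i: j_b[i] / j_w[i])
    selected = [first]
    candidates.remove(first)
    # Steps 3-4: repeatedly add the variable with the largest J_ri
    while len(selected) < k and candidates:
        best = max(
            candidates,
            key=lambda i: (j_b[i] + redundancy(i, selected)) / j_w[i],
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy scores: variable 1 looks strong on its own but is treated as redundant
# with variable 0, while variable 2 brings new information.
toy_jf = {1: 0.0, 2: 2.0}
chosen = select_features(
    j_b=[4.0, 3.0, 2.0],
    j_w=[1.0, 1.0, 1.0],
    redundancy=lambda i, selected: toy_jf.get(i, 0.0),
    k=2,
)
```

With these toy values the first pick is variable 0 ($J_m = 4$); in the second round variable 2 scores (2 + 2)/1 = 4 against 3 for variable 1, so `chosen` is `[0, 2]`.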


Fig. 5. The data change diagram for the feature selection phase.

To satisfy the input requirement of an SVM classifier, the MTS matrices need to be transformed into feature vectors, a process called vectorization. The specific vectorization algorithm is as follows:

Algorithm 2
MTS vectorization algorithm based on MI.
Input: MTS sample after feature selection.
Output: MI vector $I_v$.

Step 1: Compute the MI value between variables in the MTS and get an MI matrix I;
Step 2: Initialize an empty vector $I_v$ = [];
Step 3: For i = 1:d;
Step 4: $I_v$ = [$I_v$ I[i, i+1:d]];
Step 5: End.
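Steps 2–5 of Algorithm 2 concatenate, row by row, the entries strictly above the diagonal of the MI matrix. A minimal sketch of that loop, with the hypothetical function name `vectorize_upper_triangle` and zero-based indexing in place of the one-based indexing of the steps above:

```python
def vectorize_upper_triangle(mi):
    """Flatten the strict upper triangle of a square MI matrix, row by row."""
    d = len(mi)
    vector = []                          # Step 2: start from an empty vector
    for i in range(d):                   # Step 3: loop over the rows
        vector.extend(mi[i][i + 1:])     # Step 4: append I[i, i+1:d]
    return vector


example = [
    [1.0, 0.2, 0.3],
    [0.2, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
flattened = vectorize_upper_triangle(example)
```

For a d × d matrix the resulting vector has d(d − 1)/2 entries, because the matrix is symmetric and the diagonal is skipped; here `flattened` is `[0.2, 0.3, 0.5]`.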

Eventually we get n MI vectors $I_v$ with k feature subsets, and each input vector has a classification label, as shown in Fig. 6, where $V_{si}$ is a vector of length d. After vectorization, the MI vector of the jth sample is $I_{vj}$.

After feature extraction, we have so far completed the dimension selection for the MTS feature matrices. Using the chosen dimensions of the feature matrices, we create the feature vectors and finally obtain $I_v$ by vectorization as inputs to an SVM classifier. Overall, the original MTS dimension is further reduced by choosing the top k attributes.

According to the description above, we improve a dimension reduction method for MTS, termed feature selection based on mutual information and class separability (FSMICS).

4. Experimental design and result analysis

To evaluate the effectiveness of FSMICS in terms of classification performance and overall processing time, we conducted a verification experiment on the EEG dataset and an application experiment on the NSCLC dataset. In addition, we compared the performance of FSMICS with that of three other methods: CLeVer, Corona and AGV. Here, CLeVer and Corona utilize the transformation of correlation matrices for dimension reduction. AGV extracts the average and variance of each variable for dimension reduction using the method in Ref. [11].

For all data, we performed dimension reduction with each of the four feature selection methods and set the same SVM parameters for classification. Subsequently, we obtained the baseline classification accuracy and processing time of each method. To increase the precision of the experimental results, we performed the experiments on the same dataset with each method 10 times and then calculated the average classification accuracy and processing time for each one. A classification method can be used as a tool to test the effectiveness of a feature selection method. There are several classification methods, such as decision tree (DT), neural network (NN) and SVM, which have different benefits and limitations. The effectiveness of a classification method depends largely on the characteristics of the data. SVM is a popular classification tool, which was originally presented by Vapnik and his co-workers. It is also capable of nonlinear classification and handles high-dimensional data well, and it has thus been applied in many fields such as bioinformatics, cancer diagnosis, image classification, text mining and feature selection [25,26]. Therefore, FSMICS is compared with CLeVer, Corona and AGV via an SVM with a linear kernel. Here, SVM classification is performed with LIBSVM [27] in MATLAB.

4.1. Public datasets comparison

This experiment uses EEG data as the research data for feature selection. The EEG data originate from a large study to examine EEG correlates of genetic predisposition to alcoholism. The EEG data contain measurements from 64 electrodes placed on the scalp, which are sampled at a rate of 256 Hz. There are two groups of subjects: alcoholic and control. We selected 200 samples from each group to perform the experiment. Each group has two datasets, training and testing, with 100 samples each. An SVM with a linear kernel is adopted as the classifier to evaluate the classification performance of FSMICS, and the parameter c (the inertia factor) is set to 2. To guarantee experimental precision, we performed 10 experiments with each of the four methods and took the average for each one. The resulting comparison of performance for the four methods on the EEG data is shown in Fig. 7. The X axis shows the chosen number of feature subsets. The Y axis shows the classification accuracy of the SVM.

As can be seen from Fig. 7, the classification accuracy of CLeVer has the fastest convergence rate as the chosen number of feature subsets increases. FSMICS and Corona are similar and have a decent convergence rate. AGV is the slowest. However, when the classification accuracy converges, the chosen number of feature subsets for FSMICS is the minimum, and the average classification accuracy of FSMICS is larger than that of the other three methods. It is also observed that three methods, FSMICS, CLeVer and Corona, have good stability after the convergence of classification accuracy. The poor stability of the AGV method might be due to the characteristics of the EEG data and the design concept of AGV. AGV is a filter feature subset selection method based on the across-group variance, which considers group structure in the data. AGV was not originally designed for MTS data, and it cannot perform feature selection directly on MTS data: the MTS items should first be transformed into feature matrices before feature selection [28]. We can verify whether or not this conclusion is applicable to our clinical data. In addition, as a verification experiment, we can find from the results that the classification performance of the improved method is better overall than that of the original feature selection algorithm [24].

Fig. 6. The data change diagram for the feature ranking and selection phase.

4.2. NSCLC dataset feature selection analysis

The research data in this section are the treatment and follow-up medical information of middle- and late-stage NSCLC patients in a certain hospital. The data collected from the medical history sheets include four parts: TCM (Traditional Chinese Medicine) clinical symptoms, TCM syndrome, physical and chemical examinations of clinical significance and the patients' self-administered FACT-L score [29]. There are 68 medical indexes altogether for our experiment.

Fig. 7. Comparison of classification performance for four feature selection methods in the EEG dataset.

We selected the features for the NSCLC dataset with the improved feature selection method for MTS. There are n = 205 samples in these data, corresponding to 205 patients who were followed up for 2–3 years. Each sample includes 68 variables $X_i$, the medical indexes in the clinical data. The average length $t_j$ of each sample is 10. The patients were divided into two major classes: Class 1 (Deceased), comprising the deceased patients, and Class 2 (Living), comprising the living ones. They were separated according to their tumor progression and whether or not the patient died during the observation period. There are 94 patients in Class 1 and 111 patients in Class 2.

We first used the KNN method [30] to fill in the missing data and obtained a complete matrix $M_j$, j = 1, ..., n. Next, we computed the MI value between variables with the K–L estimation method to get a 68 × 68 MI matrix $I_j$, j = 1, ..., n. We then obtained the variable $V_{s1}$ with the greatest classification correlation using the class separability criterion, and we obtained the variable $V_{s2}$ with the second greatest correlation, and the others, using the introduced criterion. In the same way, we can get the sequences $V_{s1}, V_{s2}, \ldots, V_{s68}$ ordered by correlation. Finally, we chose the top k sequences as the result of feature selection and vectorized the feature subsets to get the vector $I_v$.
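The missing-value step can be illustrated with a plain nearest-neighbor imputation. This is a generic sketch, not the MI-weighted KNN method of [30]: the function name `knn_impute`, the use of `None` to mark missing entries and the root-mean-square distance over the columns that both rows have observed are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance between two rows is the root mean square difference over the
    columns observed in both; rows lacking the target column are skipped.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbor exists
                filled[i][c] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Rows are observations, columns are variables; one value is missing.
records = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 1.8]]
completed = knn_impute(records, k=2)
```

In the toy data the second row is closest to the first and fourth rows in the observed column, so its missing value is filled with the mean of 2.0 and 1.8.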

To validate the classification accuracy, we introduce the corresponding confusion matrix, as shown in Table 1.

According to the confusion matrix, the classification accuracies $P_1$ and $P_2$ of each class and the total sample classification accuracy P are as follows: $P_1 = P_{11}/(P_{11} + P_{12})$, $P_2 = P_{22}/(P_{21} + P_{22})$ and $P = (P_{11} + P_{22}) / \sum_{i=1,j=1}^{2} P_{ij}$. To avoid deviation of the experimental results, the data are divided into 10 groups. All samples are randomly divided into training and testing samples for 10-fold cross-validation (10-fold CV).

Table 1
Two-class confusion matrix.

| True ↓ / Predicted → | Class 1 | Class 2 | |
|---|---|---|---|
| Class 1 | P11 | P12 | P1 |
| Class 2 | P21 | P22 | P2 |

Table 2
Comparison of classification performance parameters for four feature selection methods.

| Feature selection method | Chosen number | P1 (%) | P2 (%) | P (%) | Standard deviation | Time (s) |
|---|---|---|---|---|---|---|
| FSMICS | 39 | 80.9 | 83.8 | 82.4 | 0.5706 | 0.8939 |
| CLeVer | 45 | 77.7 | 81.1 | 79.5 | 0.5765 | 0.8558 |
| Corona | 43 | 76.6 | 78.4 | 77.6 | 0.5992 | 1.1089 |
| AGV | 40 | 69.1 | 73.0 | 71.2 | 0.8818 | 0.7132 |
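The per-class accuracies $P_1$ and $P_2$ and the total accuracy P defined from the confusion matrix can be computed as in the sketch below. The function name `class_accuracies` is hypothetical, and the counts in the example are illustrative values chosen only so that the rows sum to the class sizes of 94 and 111; they are not taken from the experiments.

```python
def class_accuracies(confusion):
    """Return (P1, P2, P) from a 2 x 2 confusion matrix.

    confusion[i][j] is the number of samples of true class i+1 that were
    predicted as class j+1.
    """
    (p11, p12), (p21, p22) = confusion
    p1 = p11 / (p11 + p12)                          # accuracy on true Class 1
    p2 = p22 / (p21 + p22)                          # accuracy on true Class 2
    total = (p11 + p22) / (p11 + p12 + p21 + p22)   # overall accuracy
    return p1, p2, total


# Illustrative counts (hypothetical): 94 true Class 1 and 111 true Class 2
p1, p2, p_total = class_accuracies([[76, 18], [18, 93]])
```

Note that P is the accuracy over all samples rather than the mean of $P_1$ and $P_2$, so with unequal class sizes it lies closer to the accuracy of the larger class.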

In this paper, taking the total sample classification accuracy as the performance evaluation index, we perform the experiment with an SVM to measure the classification accuracy of samples whose number of chosen variables ranges from 1 to 68. The trend of the dataset classification accuracy with the number of feature subsets after feature selection is shown in Fig. 8. To guarantee experimental precision, these curves are plotted from the averages of 10 experiments using different training and testing datasets. When the total classification accuracy of the model converges, the average classification accuracy, classification efficiency and standard deviation of each class after 10-fold CV are as indicated in Table 2.
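The random division into training and testing samples for 10-fold CV can be sketched as follows. This is a generic illustration with a hypothetical function name `k_fold_indices` and a fixed seed for repeatability; the experiments themselves were run with LIBSVM in MATLAB, and the exact partitioning used there is not specified.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # deal the shuffled indices round-robin so fold sizes differ by at most one
    folds = [indices[f::n_folds] for f in range(n_folds)]
    splits = []
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, test))
    return splits


# 205 samples, as in the NSCLC dataset
splits = k_fold_indices(205, n_folds=10, seed=0)
```

Each sample appears in exactly one test fold, and with 205 samples the test folds contain either 20 or 21 samples. Changing `seed` gives a different partition, which is one way to obtain the repeated runs that are averaged.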

As can be seen from Fig. 8, on the one hand, the average classification accuracy of FSMICS is the maximum when its classification accuracy converges. CLeVer and Corona obtain decent and similar average classification accuracies, and AGV has the minimum. On the other hand, the convergence rates of FSMICS and CLeVer are close, but the initial classification accuracy of CLeVer is smaller than that of FSMICS. More information on classification, such as stability and processing time, cannot be precisely read from Fig. 8. Therefore, the classification performance of the four methods is further analyzed by combining the data in Table 2.

In Table 2, the "chosen number" is the number of variables chosen in order from the 68 variables $X_i$ by each method when its classification accuracy converges. The smaller the chosen number is, the better the convergence performance of the method. This item is part of the reference standard for classification performance. The standard deviation demonstrates the classification stability of each method after convergence, and the last column shows the processing time of classification. All data in Table 2 are averages over 10 runs.

According to the results in Fig. 8 and Table 2, the following conclusions can be drawn:

(1) When classification accuracy converges:

(a) The average classification accuracy of the dataset after FSMICS processing is the maximum, reaching 82.4%.

(b) Choosing 39 feature subsets for FSMICS makes the classification accuracy converge, which is fewer than for the other three methods. The others are, in ascending order, AGV (40), Corona (43) and CLeVer (45).

(2) After classification accuracy converges, the standard deviations of FSMICS and CLeVer are close, and both are relatively stable. Relatively speaking, the stability of Corona is somewhat poorer and AGV is not stable.

(3) The classification time with an SVM for the dataset after each of the four feature selection methods is approximately 1 s. The fastest is AGV, followed by CLeVer, FSMICS and Corona. Since the feature vectors after the vectorization process still have a high dimension, FSMICS is not the best of the four methods in classification time.

(4) After the analysis of the clinical experiment, we can conclude that AGV is not applicable to our MTS data type. However, AGV shows great classification efficiency, which means that AGV is a good dimension reduction method to some extent.

Fig. 8. Comparison of classification performance for four feature selection methods in the NSCLC dataset.

As can be seen from the classification results on the public and clinical datasets, in comparison with the other three MTS feature selection methods, FSMICS obtains the maximum average classification accuracy and a decent convergence rate when classification accuracy converges. Moreover, it shows good stability after convergence. Through the experiments, we can determine that FSMICS yields the highest selection accuracy, with relatively acceptable classification efficiency.

From the medical perspective of mathematical statistics, we can conclude that FSMICS can classify the patients into the corresponding classes with relatively high accuracy in the clinical data.

5. Conclusion

In MTS data mining problems, excessive data dimension causes inaccuracy in the estimated probability density distribution, which increases the computational complexity, and information redundancies and irrelevant features may lead to high computational complexity and over-fitting. Thus, this paper focuses on dimension reduction and improves an MTS feature selection method by combining a K–L information entropy estimation method for feature extraction based on MI with a feature selection algorithm. We first computed the MI value to extract the distinct relationships of features using the K–L information entropy estimation method. Next, we used the class separability criterion to evaluate the contribution of each variable. Considering the existence of redundancies between variables, we then extended the class separability criterion with redundancy evaluation variables to eliminate information redundancies and obtain the final variables, which yield more correlation and fewer redundancies. We then sorted the attributes in terms of their importance to choose the optimal features and reduce the number of dimensions. Finally, we vectorized the feature matrices to satisfy the input requirement of classification.

Through a verification experiment on a public dataset and an application experiment on a clinical dataset, the improved method is shown to effectively reduce the dimensions of MTS and to be well suited to application in TCM. That is to say, FSMICS can effectively reduce computational complexity and eliminate information redundancies and irrelevant features in MTS to achieve the purpose of dimension reduction. According to the results of our experiments, FSMICS can handle both linear and nonlinear relationships between dimensions and is better suited to MTS data with samples of unequal length. However, the feature vectors after vectorization still have a high dimension, which can affect the time performance of a classifier. Therefore, how to further process and reduce the dimension of the feature attributes while preserving classification accuracy, so as to improve time performance, is one of the important directions on which our future research will focus.

Acknowledgements

We gratefully acknowledge the 2014 Beijing Municipal Education Commission plan on the scientific research project (KM201410005004) and the Beijing Laboratory for Urban Mass Transit for their support. We also thank the Hospital of Traditional Chinese Medicine, CPUMS, of Beijing referenced in this paper for providing clinical follow-up data and helpful advice on this issue.

References

[1] T. Fu, A review on time series data mining, Eng. Appl. Artif. Intell. 24 (1) (2011) 164–181.

[2] G. Ristanoski, J. Bailey, Distribution Based Data Filtering for Financial Time Series Forecasting, Springer-Verlag, Berlin, 2011, pp. 122–131.

[3] R. Tanawongsuwan, A. Bobick, Performance Analysis of Time-Distance Gait Parameters Under Different Speeds, Springer-Verlag, Berlin, 2003, pp. 715–724.

[4] X.L. Zhang, et al., Event-related potentials during object recognition tasks, Brain Res. Bull. 38 (6) (1995) 531–538.

[5] W.A. Chaovalitwongse, Y.J. Fan, R.C. Sachdeo, Support feature machine for classification of abnormal brain activity, in: KDD-2007 Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 113–122.

[6] M. Krawczak, G. Szkatula, An approach to dimensionality reduction in time series, Inf. Sci. 260 (2014) 15–36.

[7] J.M. Wang, et al., Multivariate time series similarity searching, Sci. World J. 2014 (2014) 1–8.

[8] K. Javed, H.A. Babri, M. Saeed, Feature selection based on class-dependent densities for high-dimensional binary data, IEEE Trans. Knowl. Data Eng. 24 (3) (2012) 465–477.

[9] C.H. Lu, Z.M. Zhu, X.F. Gu, An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method, J. Med. Syst. 38 (979) (2014).

[10] H. Yoon, K.Y. Yang, C. Shahabi, Feature subset selection and feature ranking for multivariate time series, IEEE Trans. Knowl. Data Eng. 17 (9) (2005) 1186–1198.

[11] F. Alimardani, et al., Presenting a new search strategy to select synchronization values for classifying bipolar mood disorders from schizophrenic patients, Eng. Appl. Artif. Intell. 26 (2) (2013) 913–923.

[12] V. Singh, K.P. Miyapuram, R.S. Bapi, Detection of cognitive states from fMRI data using machine learning techniques, in: 20th International Joint Conference on Artificial Intelligence, 2007, pp. 587–592.

[13] S.N. Li, Z.H. Zhang, J.Q. Duan, An ensemble multi-label feature selection algorithm based on information entropy, Int. Arab J. Inf. Technol. 11 (4) (2014) 379–386.

[14] M.A. Hossain, X.P. Jia, M. Pickering, Subspace detection using a mutual information measure for hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. 11 (2) (2014) 424–428.

[15] A. Mehri, A.H. Darooneh, The role of entropy in word ranking, Phys. A: Stat. Mech. Appl. 390 (18–19) (2011) 3157–3163.

[16] A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Phys. Rev. E: Stat. Nonlinear Soft Matter Phys. 69 (2004) 066138.

[17] F. Rossi, et al., Mutual information for the selection of relevant variables in spectrometric nonlinear modelling, Chemom. Intell. Lab. Syst. 80 (2) (2006) 215–226.

[18] F. Rossi, et al., Fast selection of spectral variables with B-spline compression, Chemom. Intell. Lab. Syst. 86 (2) (2007) 208–218.

[19] L.F. Kozachenko, N.N. Leonenko, Sample estimate of the entropy of a random vector, Prob. Inf. Transm. 23 (2) (1987) 95–101.

[20] A. Nazarpour, P. Adibi, Two-stage multiple kernel learning for supervised dimensionality reduction, Pattern Recognit. 48 (5) (2015) 1854–1862.

[21] M. Imani, H. Ghassemian, Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation, IEEE Geosci. Remote Sens. Lett. 11 (11) (2014) 1986–1990.

[22] Y. Xu, et al., From the idea of sparse representation to a representation-based transformation method for feature extraction, Neurocomputing 113 (2013) 168–176.

[23] J. Li, An Introduction to Pattern Recognition, Higher Education Press, Beijing, 1994.

[24] H. Min, L. Xiaoxin, Feature selection techniques with class separability for multivariate time series, Neurocomputing 110 (2013) 29–34.

[25] G. Zararsiz, F. Elmali, A. Ozturk, Bagging support vector machines for leukemia classification, Int. J. Comput. Sci. Issues (IJCSI) 9 (6) (2012) 355–358.

[26] S. Korkmaz, G. Zararsiz, D. Goksuluk, Drug/nondrug classification using support vector machines with various feature selection strategies, Comput. Methods Prog. Biomed. 117 (2) (2014) 51–60.

[27] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 273SI.

[28] N.S. Dias, et al., Feature down-selection in brain-computer interfaces, in: Neural Engineering, 2009. NER '09, 4th International IEEE/EMBS Conference, Antalya, 2009.

[29] C.Z.Y.S. Chonghua Wan, The Chinese version of quality of life table FACT-L of lung cancer patients, China Cancer 9 (3) (2000) 109–110.

[30] P.J. Garcia-Laencina, et al., K nearest neighbours with mutual information for simultaneous classification and missing data imputation, Neurocomputing 72 (7–9) (2009) 1483–1493.
