Contents lists available at ScienceDirect

Biomedical Signal Processing and Control

journal homepage: www.elsevier.com/locate/bspc

Technical note

Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data
Liying Fang a,b,c,∗, Han Zhao a,b,c, Pu Wang a,b,c, Mingwei Yu d, Jianzhuo Yan a,b,c, Wenshuai Cheng a,b,c, Peiyu Chen a,b,c
a College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
b Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China
c Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
d Hospital of Traditional Chinese Medicine, CPUMS, Beijing 100010, China
Article info
Article history:
Received 7 December 2014
Received in revised form 28 March 2015
Accepted 14 May 2015
Available online 15 June 2015
Keywords:
Multidimensional time series
Dimension reduction
Feature selection
Mutual information
Class separability
Abstract
In clinical medicine, multidimensional time series data can be used to find the rules of disease progression through data mining techniques such as classification and prediction. However, in multidimensional time series data mining problems, an excessive data dimension makes the estimate of the probability density distribution inaccurate and increases the computational complexity. In addition, information redundancy and irrelevant features may lead to high computational complexity and over-fitting. Together, these two factors can reduce classification performance. To reduce computational complexity and to eliminate redundant information and irrelevant features, we improved upon a multidimensional time series feature selection method to achieve dimension reduction. The improved method selects features by combining the Kozachenko–Leonenko (K–L) information entropy estimation method, used for feature extraction based on mutual information, with a feature selection algorithm based on class separability. We performed experiments on the electroencephalogram (EEG) dataset for verification and on a non-small cell lung cancer (NSCLC) clinical dataset for application. The results show that, in comparison with CLeVer, Corona and AGV, the improved method can effectively reduce the dimensions of multidimensional time series for clinical data.
© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Time-series analysis is widely used in many application fields, including medical data, financial data, moving-object tracking and human-computer interaction interfaces [1,2]. Data mining for time series is of considerable value: research on the classification, clustering or prediction of data can help to find the potential rules of time series data and provide support. Currently, most research focuses on univariate time series processing. However, with the development of data-collection technology, more and more multidimensional time series data are becoming available, and these contain a considerable amount of potentially valuable information. For example, diabetes clinical data, as a kind of time series data, contain abundant information including food intake, drug intake and daily activities. The EEG data,¹ which contain plentiful information on brain waves, reflect correlations with certain genetic predispositions and diseases. In Tanawongsuwan and Bobick [3], 22 markers are spread over the human body to measure the movements of body parts while walking. In medicine, EEG data from 64 electrodes placed on the scalp are monitored to examine the correlation of genetic predisposition to alcoholism [4]. Therefore, in recent years, multidimensional time series classification, dimension reduction and similarity search have become common concerns for researchers in the field of data mining [5–7].

∗ Corresponding author at: Beijing University of Technology, College of Electronic Information and Control Engineering, No. 100, Pingleyuan Street, Beijing 100124, China. Tel.: +86 13810101581.
E-mail address: [email protected] (L. Fang).
A time series is a series of observations,

$$x_i(t); \quad [i = 1, \ldots, d;\ t = 1, \ldots, n] \tag{1}$$

made sequentially through time, where $i$ indexes the measurements made at each time point $t$. It is called a univariate time series when $d$ is equal to 1 and a multidimensional time series (MTS) when $d$ is equal to or greater than 2. Owing to the mass production of MTS data and the growing demand for classification in various fields, MTS classification techniques have been applied in many areas, such as the classification of RNA in bioinformatics, handwriting recognition and electrocardiogram (ECG) pattern matching. As MTS data are typical high-dimensional data [8], many features are either irrelevant or redundant. Moreover, the curse of dimensionality, caused by excessive dimensions, arises in a multidimensional feature space. Therefore, how to effectively select useful features for classification from raw MTS data has become a challenging research hotspot.

[Fig. 1 flowchart: Data pre-processing (KNN method for filling missing values) → Feature extraction (K–L estimation method for feature space transformation) → Feature selection (grading the dimension attributes and choosing K optimal features) → Vectorization (of the MI matrices) → Classification (the dimension-reduced feature vectors are put into the classifier).]
Fig. 1. MTS data dimension reduction process.

¹ http://archive.ics.uci.edu/ml/datasets/EEG+Database.
http://dx.doi.org/10.1016/j.bspc.2015.05.011
1746-8094/© 2015 The Authors. Published by Elsevier Ltd.
Feature extraction and feature selection are the main methods of dimension reduction [9]. They can not only reduce classification errors but also improve classification efficiency. Feature selection methods currently in wide use for MTS include CLeVer [10] and AGV [11], based on PCA, and Corona [12], based on a correlation coefficient method. However, these methods can only identify linear relationships among dimensions, and their calculations are better suited to MTS samples of equal length. Unequal-length data are nevertheless the norm in clinical follow-up, because patients may die or otherwise be lost from the dataset. Mutual information (MI) is an important concept in information theory and can be applied to nonlinear transformation and the extraction of high-order statistics. We therefore use MI for feature extraction to transform samples of different lengths into samples of equal length. Meanwhile, by exploiting the nonlinear relationships in the multidimensional feature space, we can effectively reduce dimensions through feature selection. The probability density estimation method, however, has a great influence on MI computation: it determines whether the method can effectively and efficiently express the typical features and so promote the accuracy of feature selection. Thus, it is important to choose an applicable probability density estimation method for MI feature extraction in MTS. In addition, the feature subset evaluation criterion is the key issue in feature selection, and its quality directly affects the final result. The class separability criterion is one of the important evaluation criteria, and the between-class distance criterion is one of its commonly used forms. Better class separability is obtained by minimizing the within-class distance and maximizing the between-class distance simultaneously. The purpose of feature selection is to choose the feature subsets with larger class separability. However, redundant variables have an obvious effect on the classification result, and the between-class distance criterion cannot eliminate them. We therefore introduce a criterion with a redundancy variable to eliminate redundancies and irrelevant features. The resulting improved method can effectively choose the optimal features and reduce dimensions.
This paper aims to overcome the limitation that the correlation matrices in traditional MTS feature selection methods can only measure linear relationships between variables. We improve the feature selection method based on mutual information and class separability. We first compute the MI value with a probability density estimation method to extract the linear and nonlinear relationships between variables through MI matrices. Considering the existence of redundancies, we next introduce a feature selection algorithm based on class separability to eliminate redundancies and to ensure a high correlation between the chosen feature subsets and the target class. We then use the improved method for dimension reduction on MTS, as shown in Fig. 1. Finally, we verify whether the improved method can effectively reduce dimensions through comparative experiments based on classification accuracy with an SVM classifier.
The remainder of this paper is organized as follows. Section 2 introduces the feature extraction method based on MI. Section 3 introduces the feature selection algorithm based on class separability. The experiments and results with the improved method are presented in Section 4, followed by the conclusion in Section 5.
2. Feature extraction method based on MI

This section introduces the MI feature extraction method, which involves some basic concepts of entropy and MI, as described in Refs. [13–15].

In general, an MTS can be expressed as a $d \times n$ matrix $[x_{i,t}]_{d \times n}$, and each matrix represents one sample. Assume that the research data include several samples and that two of them are $[x_{i,t}]_{d \times n_1}$ and $[x_{i,t}]_{d \times n_2}$. Within a sample, every variable generally has the same sampling length. Between samples, however, the sampling lengths $n_1$ and $n_2$ are not always the same. Therefore, each MTS sample is expressed by a $d \times t_j$ matrix $[x_{i,t}]_{d \times t_j}$:

$$
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1 t_j} \\
\vdots & \vdots & \ddots & \vdots \\
x_{i1} & \cdots & \cdots & x_{i t_j} \\
x_{d1} & x_{d2} & \cdots & x_{d t_j}
\end{bmatrix},
\quad i = 1, 2, \ldots, d;\ t = 1, 2, \ldots, t_j \tag{2}
$$
where $x_{i,t}$ denotes the sampled value of the variable $x_i$ in the $i$th dimension at time point $t$. $M_j$ denotes the $j$th sample matrix $[x_{i,t}]_{d \times t_j}$, as shown in Fig. 2; $t_j$ denotes the sampling time length of the $j$th sample, and $X_i$ denotes the index sequence of the $i$th dimension. Because each sequence $X_i$ has a different degree of importance to classification, the sequences are drawn in different colors, with a deeper color indicating a higher degree of importance. Under the initial condition, however, the degree of importance of each sequence is unknown, so the colors are shown in random depth. Fig. 3 shows an MTS dataset with $n$ samples, where each sample is a matrix with dimension $d$ and sampling time length $t_j$. For any given sample, the degree of importance of each sequence is initially unknown.

By the definitions of information entropy and MI, the probability density distribution of the random variables must be approximately estimated before MI is calculated. A probability density estimation method based on nearest neighbors is introduced in [16] and has also been used to good effect in [17,18]. The advantage of this method is that there is no need to estimate the probability density distribution function of any variable.

Fig. 2. The diagram of sample matrix $M_j$.
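Assume that $X$ and $Y$ are two random variables, where $X = \{x_i,\ i = 1, \ldots, n\}$ and $Y = \{y_i,\ i = 1, \ldots, n\}$. In [19], the K–L nearest-neighbor entropy estimate is defined as

$$\hat{H}(X) = -\psi(k) + \psi(n) + \log(c_d) + \frac{d}{n} \sum_{i=1}^{n} \log\left(\varepsilon_X(i,k)\right) \tag{3}$$

where $k$ is the number of nearest-neighbor points; $d$ is the dimension of the data; $c_d$ is the volume of the $d$-dimensional unit sphere; $\varepsilon_X(i,k)$ is the distance between $x_i$ and its $k$th nearest-neighbor point; and $\psi$ is the digamma function.

Based on formula (3), an MI estimation method for solving the regression problem is proposed by Kraskov in [16] as

$$\hat{I}(X;Y) = \psi(n) + \psi(k) - \frac{1}{k} - \frac{1}{n} \sum_{i=1}^{n} \left[\psi(\tau_x(i)) + \psi(\tau_y(i))\right] \tag{4}$$

where $\tau_x(i)$ is the number of points of $X$ whose distance from $x_i$ is no more than $\varepsilon(i,k)$, and $\tau_y(i)$ is defined analogously for $Y$. Here, $\varepsilon(i,k) = \max(\varepsilon_X(i,k), \varepsilon_Y(i,k))$.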
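As a rough illustration of formula (4), the following pure-Python sketch estimates the MI between two scalar sequences of equal length. It is not the authors' implementation. It assumes the variant of Kraskov et al. [16] in which the neighbourhood extent is measured separately along each axis from the $k$ nearest neighbours under the max-norm, which may differ in detail from the definition of $\varepsilon(i,k)$ given above. The function names are hypothetical, and the brute-force search costs $O(n^2 \log n)$.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma function psi(m) for a positive integer m (harmonic-number form)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def knn_mutual_information(x, y, k=3):
    """Nearest-neighbour MI estimate between two equal-length scalar sequences."""
    n = len(x)
    psi_sum = 0.0
    for i in range(n):
        # Max-norm distance from point i to every other point in the joint (x, y) space.
        by_distance = sorted(
            (max(abs(x[i] - x[j]), abs(y[i] - y[j])), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in by_distance[:k]]
        # Extent of the k nearest neighbours along each axis (assumption, see lead-in).
        eps_x = max(abs(x[i] - x[j]) for j in neighbours)
        eps_y = max(abs(y[i] - y[j]) for j in neighbours)
        # tau_x(i), tau_y(i): points other than i that lie within that extent.
        tau_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) <= eps_x)
        tau_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) <= eps_y)
        psi_sum += digamma_int(tau_x) + digamma_int(tau_y)
    # Formula (4): psi(n) + psi(k) - 1/k - mean of psi(tau_x) + psi(tau_y).
    return digamma_int(n) + digamma_int(k) - 1.0 / k - psi_sum / n


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
mi_dependent = knn_mutual_information(x, [2.0 * v + 1.0 for v in x])
mi_independent = knn_mutual_information(x, noise)
print(mi_dependent, mi_independent)  # the dependent pair scores much higher
```

Only integer arguments reach the digamma function here, so the harmonic-number form avoids a dependency on a numerical library.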
This paper applies formula (4) to MI computation by sequentially computing the MI between each sequence $X_i$ of each $M_j$ and all sequences $(X_1, X_2, \ldots, X_d)$. Hence, each sequence is transformed into an MI vector $V_i$ and each sample matrix is transformed into a $d \times d$ MI matrix $I_j$. The feature extraction method may thus be described as follows:
Input: an MTS dataset with size $n$ and dimension $d$ (assume $d \geq t_j$).
Output: the $d \times d$ MI matrix $I_j$.

$$
I_j = \begin{bmatrix}
I_j(X_1, X_1) & I_j(X_1, X_2) & \cdots & I_j(X_1, X_d) \\
I_j(X_2, X_1) & I_j(X_2, X_2) & \cdots & I_j(X_2, X_d) \\
\vdots & \vdots & \ddots & \vdots \\
I_j(X_d, X_1) & I_j(X_d, X_2) & \cdots & I_j(X_d, X_d)
\end{bmatrix},
\quad j = 1, 2, \ldots, n \tag{5}
$$

The $i$th variable of the $j$th sample is represented by the MI vector

$$V_{ji} = \left[ I_j(X_i, X_1), I_j(X_i, X_2), \ldots, I_j(X_i, X_d) \right], \quad i = 1, 2, \ldots, d.$$

Therefore, each variable in $I_j$ can be described by a vector $V_{ji}$. The data change process for the MI feature extraction phase in the MTS dataset is shown in Fig. 4; it transforms each sample into a square matrix of equal dimension.
3. Feature selection based on class separability

After MI-based feature extraction, the feature space of each sample matrix has been converted into an MI matrix, which allows the features to express the data characteristics more clearly and improves the effect of the feature selection method. First, we introduce the principle of the class separability criterion in Section 3.1. Then, in Section 3.2, we adopt a feature selection algorithm based on class separability to eliminate redundant variables, and we convert the MI matrices into vectors as inputs to an SVM classifier.

3.1. Class separability criterion

The class separability criterion is often used as the basis of feature selection. Several criteria are in common use, such as class separability criteria based on the geometric distance, on the probability density function and on the posterior probability. The latter two require the statistical characteristics of the samples, whereas the between-class distance criterion is the most commonly used of the criteria based on geometric distance. Although the definitions of the between-class criteria vary in the literature [20–22], they are all essentially based on the concept of distance.
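Assume that there are $c$ classes, where $\omega_j$ is the $j$th class and $x_k^{(j)}$ is the $k$th sample of $\omega_j$. Let $n_j$ be the number of samples in $\omega_j$ and $n$ the total number of samples. $m_j$ is the sample mean vector of $\omega_j$ and $m$ is the mean vector of all samples:

$$m_j = \frac{1}{n_j} \sum_{k=1}^{n_j} x_k^{(j)} \tag{6}$$

$$m = \frac{1}{\sum_{j=1}^{c} n_j} \sum_{j=1}^{c} \sum_{k=1}^{n_j} x_k^{(j)} \tag{7}$$

$J_w$ is the within-class total mean square distance,

$$J_w = \sum_{j=1}^{c} P_j J_j \tag{8}$$

where $P_j$ ($j = 1, 2, \ldots, c$) is the prior probability of $\omega_j$, which can be estimated from $n_j$ and $n$, and $J_j$ is the within-class mean square distance of $\omega_j$:

$$J_j = \frac{1}{n} \sum_{k=1}^{n_j} \left(x_k^{(j)} - m_j\right)^T \left(x_k^{(j)} - m_j\right) \tag{9}$$

$J_b$ is the between-class total mean square distance:

$$J_b = \sum_{j=1}^{c} P_j \left(m_j - m\right)^T \left(m_j - m\right) \tag{10}$$

To minimize the within-class distance and maximize the between-class distance simultaneously, the class separability criterion $J_m$ is constructed intuitively [23] as follows:

$$J_m = \frac{J_b}{J_w} \tag{11}$$

Fig. 4. The data change diagram on feature extraction phase.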
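Formulas (6) to (11) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the prior $P_j$ is estimated as $n_j / n$, the factor $1/n$ in formula (9) is taken exactly as printed, and the function names and toy data are hypothetical.

```python
def squared_distance(u, v):
    """(u - v)^T (u - v) for two vectors given as lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def class_separability(samples, labels):
    """Return (J_b, J_w, J_m) for labelled sample vectors."""
    n = len(samples)
    m = mean_vector(samples)                              # Eq. (7)
    j_w = 0.0
    j_b = 0.0
    for label in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == label]
        prior = len(members) / n                          # P_j estimated as n_j / n
        m_j = mean_vector(members)                        # Eq. (6)
        j_j = sum(squared_distance(s, m_j) for s in members) / n  # Eq. (9) as printed
        j_w += prior * j_j                                # Eq. (8)
        j_b += prior * squared_distance(m_j, m)           # Eq. (10)
    return j_b, j_w, j_b / j_w                            # Eq. (11)


# Toy data: two well-separated one-dimensional classes.
samples = [[0.0], [2.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]
j_b, j_w, j_m = class_separability(samples, labels)
print(j_b, j_w, j_m)  # 25.0 0.5 50.0
```

The ratio is undefined when every class has zero within-class scatter, so a real implementation would need to guard against $J_w = 0$.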
3.2. Feature selection based on class separability

The ratio of the between-class distance to the within-class distance measures the contribution of a variable to classification; the idea of class separability is therefore to choose the feature subsets that are optimal for classification. As MTS are multidimensional data, redundant variables have an obvious effect on the classification result during feature selection. However, the between-class criterion function cannot eliminate these redundant variables, which reduce the classification accuracy. Thus, we introduce a function [24] with a redundancy evaluation variable $J_f$ to promote the accuracy of feature selection:

$$J_f = \frac{1}{|S|} \sum_{i=1}^{|S|} \sum_{j=1}^{c} n_j \left(m_j - m_{ij}\right)\left(m_j - m_{ij}\right)^T \tag{12}$$

where $|S|$ is the number of variables that have already been chosen in the feature subset $S$, and $m_{ij}$ denotes the $i$th average of the $j$th sample in $S$. The larger $J_f$ is, the smaller the between-class redundancy is. Therefore, the criterion $J_r$ is as follows:

$$J_r = \frac{J_b + J_f}{J_w} \tag{13}$$

The following process follows the general concept of the feature selection algorithm in [24].
Algorithm 1
MTS feature selection algorithm based on class separability.
Input: an MTS dataset ($d \times d$ MI matrices $I_j$, one per sample).
Output: the optimal feature subset with $K$ sequences.
Step 1: Compute each $J_{bi}$ and $J_{wi}$ in $I_j$. Because both $J_{bi}$ and $J_{wi}$ are products of a row vector and a column vector, their values are scalars. In this way, all variables can be sorted by the following formula:

$$J_{mi} = \frac{J_{bi}}{J_{wi}}, \quad i = 1, 2, \ldots, N \tag{14}$$

The larger $J_{mi}$ is, the more important the $i$th variable is to the classification result.
Step 2: Choose the variable with the largest $J_{mi}$ as the first element of $S$.
Step 3: Consider the existence of redundancies between variables and introduce the redundancy evaluation variable $J_{fi}$ to synthesize the selection variables:

$$J_{ri} = \frac{J_{bi} + J_{fi}}{J_{wi}}, \quad i = 1, 2, \ldots, N \tag{15}$$

The larger $J_{ri}$ is, the more important the $i$th variable is to the classification result. The variable with the largest $J_{ri}$ is chosen as the next feature element.
Step 4: If $|S|$ equals $K$, the algorithm ends; otherwise, return to Step 3.

To find the attributes that contribute most to classification, we first evaluate each dimension of the MI matrices with the introduced class separability criterion. Next, we sort all attributes using the criterion and choose $k$ sequences to reduce the dimensions of the matrices. As shown in Fig. 5, the dimensions with deeper colors, which contribute more to classification, are placed in front. $V_{si}$ is the $i$th sequence after sorting.

Fig. 5. The data change diagram on feature selection phase.
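The control flow of Algorithm 1 can be sketched as a greedy loop. The per-variable scores $J_{bi}$ and $J_{wi}$ are assumed to be precomputed, and the computation of $J_{fi}$ is left to a caller-supplied function, because its exact per-candidate form is not spelled out here. The names `select_features`, `redundancy_term` and `toy_redundancy_term` are hypothetical, and the toy numbers are invented purely to exercise the loop.

```python
def select_features(j_b, j_w, redundancy_term, k):
    """Greedy selection: best J_mi first, then repeatedly the best J_ri."""
    n_vars = len(j_b)
    k = min(k, n_vars)
    # Steps 1-2: rank by J_mi = J_bi / J_wi and take the best variable first.
    selected = [max(range(n_vars), key=lambda i: j_b[i] / j_w[i])]
    # Steps 3-4: add the variable with the largest J_ri until |S| = K.
    while len(selected) < k:
        remaining = [i for i in range(n_vars) if i not in selected]
        best = max(
            remaining,
            key=lambda i: (j_b[i] + redundancy_term(i, selected)) / j_w[i],
        )
        selected.append(best)
    return selected


def toy_redundancy_term(i, selected):
    """Invented J_fi: variable 1 is treated as redundant with variable 0."""
    if i == 1 and 0 in selected:
        return 0.0
    return 3.5


print(select_features([4.0, 3.9, 1.0], [1.0, 1.0, 1.0], toy_redundancy_term, 2))  # [0, 2]
```

In this toy case variable 1 has the second-best $J_{mi}$ but is skipped in favour of variable 2, because the redundancy term gives it no bonus once variable 0 has been selected.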
To satisfy the input requirement of an SVM classifier, the MTS matrices need to be transformed into feature vectors, a process called vectorization. The specific vectorization algorithm is as follows:

Algorithm 2
MTS vectorization algorithm based on MI.
Input: MTS sample after feature selection.
Output: MI vector $I_v$.
Step 1: Compute the MI value between the variables in the MTS and obtain an MI matrix $I$;
Step 2: Initialize an empty vector $I_v = [\,]$;
Step 3: For $i = 1 : d$;
Step 4: $I_v = [I_v \;\; I[i, i+1 : d]]$;
Step 5: End.
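Steps 2 to 5 of Algorithm 2 concatenate the part of each row that lies to the right of the diagonal, so a $d \times d$ matrix yields a vector of length $d(d-1)/2$. A minimal Python sketch of that loop, using zero-based indexing and a hypothetical function name, is:

```python
def vectorize_mi_matrix(mi_matrix):
    """Concatenate the strictly upper-triangular part of a square matrix, row by row."""
    d = len(mi_matrix)
    vector = []
    for i in range(d):
        # Row i, columns i+1 .. d-1 (empty for the last row).
        vector.extend(mi_matrix[i][i + 1:d])
    return vector


example = [[1.0, 0.2, 0.3],
           [0.2, 1.0, 0.4],
           [0.3, 0.4, 1.0]]
print(vectorize_mi_matrix(example))  # [0.2, 0.3, 0.4]
```

Because an MI matrix is symmetric, the upper triangle already carries every pairwise value once.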
Eventually, we obtain $n$ MI vectors $I_v$ with $k$ feature subsets, and each input vector $I_v$ has a classification label, as shown in Fig. 6, where $V_{si}$ is a vector of length $d$. After vectorization, the MI vector of the $j$th sample is $I_{vj}$.

After feature extraction, the dimension selection for the MTS feature matrices is now complete. Using the chosen dimensions of the feature matrices, we create the feature vectors and finally obtain $I_v$ by vectorization as inputs to an SVM classifier. Overall, the original MTS dimension is further reduced by choosing the top $k$ attributes.

Following the description above, we term the improved dimension reduction method for MTS feature selection based on mutual information and class separability (FSMICS).
4. Experimental design and result analysis

To evaluate the effectiveness of FSMICS in terms of classification performance and overall processing time, we conducted a verification experiment on the EEG dataset and an application experiment on the NSCLC dataset. In addition, we compared the performance of FSMICS with that of three other methods: CLeVer, Corona and AGV. CLeVer and Corona use the transformation of correlation matrices for dimension reduction. AGV extracts the average and variance of each variable for dimension reduction using the method in Ref. [11].

For all data, we performed dimension reduction with each of the four feature selection methods and used the same SVM parameters for classification. We thereby obtained the baseline classification accuracy and processing time of each method. To increase the precision of the experimental results, we ran each method on the same dataset 10 times and calculated the average classification accuracy and processing time. A classification method can be used as a tool to test the effectiveness of a feature selection method. There are several classification methods, such as decision trees (DT), neural networks (NN) and SVM, each with different benefits and limitations, and the effectiveness of a classification method depends largely on the characteristics of the data. SVM is a popular classification tool that was originally presented by Vapnik and his co-workers. It is capable of nonlinear classification and handles high-dimensional data well, and it has therefore been applied in many fields, such as bioinformatics, cancer diagnosis, image classification, text mining and feature selection [25,26]. Therefore, FSMICS is compared with CLeVer, Corona and AGV via an SVM with a linear kernel. The SVM classification is performed with LIBSVM [27] in MATLAB.
4.1. Public datasets comparison

This experiment uses EEG data for feature selection. The EEG data originate from a large study examining EEG correlates of genetic predisposition to alcoholism. They contain measurements from 64 electrodes placed on the scalp, sampled at a rate of 256 Hz. There are two groups of subjects: alcoholic and control. We selected 200 samples from each group for the experiment. Each group is split into a training dataset and a testing dataset of 100 samples each. An SVM with a linear kernel is adopted as the classifier to evaluate the classification performance of FSMICS, and the parameter $c$ (the inertia factor) is set to 2. To guarantee experimental precision, we performed 10 experiments with each of the four methods and took the average for each. The resulting comparison of the four methods on the EEG data is shown in Fig. 7. The X axis shows the chosen number of feature subsets and the Y axis shows the classification accuracy of the SVM.

As can be seen from Fig. 7, the classification accuracy of CLeVer converges fastest as the chosen number of feature subsets increases. FSMICS and Corona are similar and converge at a decent rate, and AGV is the slowest. However, when the classification accuracy converges, the chosen number of feature subsets is smallest for FSMICS, and the average classification accuracy of FSMICS is higher than that of the other three methods. It is also observed that FSMICS, CLeVer and Corona have good stability after the classification accuracy converges. The poor stability of AGV might be due to the characteristics of the EEG data and the design concept of AGV. AGV is a filter feature subset selection method based on the across-group variance, which considers group structure in the data. AGV was not originally designed for MTS data and cannot perform feature selection directly on MTS data; the MTS items must first be transformed into feature matrices before feature selection [28]. Whether this conclusion also applies to our clinical data is examined in Section 4.2. In addition, as a verification experiment, the results show that the classification performance of the improved method is better overall than that of the original feature selection algorithm [24].

Fig. 6. The data change diagram on feature ranking and selection phase.
4.2. NSCLC dataset feature selection analysis

The research data in this section are the treatment and follow-up medical information of middle- and late-stage NSCLC patients in a certain hospital. The data collected from the medical history sheets include four parts: TCM (Traditional Chinese Medicine) clinical symptoms, TCM syndrome, physical and chemical examinations of clinical significance, and the patients' self-administered FACT-L score [29]. There are 68 medical indexes altogether in our experiment.

Fig. 7. Comparison of classification performance for four feature selection methods in EEG dataset.
We selected the features for the NSCLC dataset using the improved feature selection method for MTS. There are $n = 205$ samples in these data, corresponding to 205 patients followed up over 2–3 years. Each sample includes 68 variables $X_i$, the medical indexes in the clinical data. The average sample length $t_j$ is 10. The patients were divided into two major classes according to tumor progression and whether or not the patient died during the observation period: Class 1 (Deceased), comprising the deceased patients, and Class 2 (Living), comprising the surviving patients. There are 94 patients in Class 1 and 111 patients in Class 2.

We first used the KNN method [30] to fill in the missing data and obtained a complete matrix $M_j$, $j = 1, \ldots, n$. Next, we computed the MI values between variables with the K–L estimation method to obtain a $68 \times 68$ MI matrix $I_j$, $j = 1, \ldots, n$. We then obtained the variable $V_{s1}$ with the greatest classification correlation using the class separability criterion, and the variable $V_{s2}$ with the second-greatest correlation, and so on, using the introduced criterion. In this way, we obtained the sequences $V_{s1}, V_{s2}, \ldots, V_{s68}$ ordered by correlation. Finally, we chose the top $k$ sequences as the result of feature selection and vectorized the feature subsets to obtain the vector $I_v$.

To validate the classification accuracy, we introduce the corresponding confusion matrix, as shown in Table 1.
According to the confusion matrix, the classification accuracies $P_1$ and $P_2$ of each class and the total sample classification accuracy $P$ are as follows: $P_1 = P_{11}/(P_{11} + P_{12})$, $P_2 = P_{22}/(P_{21} + P_{22})$ and $P = (P_{11} + P_{22}) / \sum_{i=1,j=1}^{2} P_{ij}$. To avoid deviation in the experimental results, the data are divided into 10 groups. All samples are randomly divided into training and testing samples for 10-fold cross-validation (10-fold CV).

Table 1
Two-class classification confusion matrix.
True ↓ / Predicted →    Class 1    Class 2
Class 1                 P11        P12        P1
Class 2                 P21        P22        P2

Table 2
Comparison of classification performance parameters for four feature selection methods.
Feature selection method    Chosen number    P1 (%)    P2 (%)    P (%)    Standard deviation    Time (s)
FSMICS                      39               80.9      83.8      82.4     0.5706                0.8939
CLeVer                      45               77.7      81.1      79.5     0.5765                0.8558
Corona                      43               76.6      78.4      77.6     0.5992                1.1089
AGV                         40               69.1      73.0      71.2     0.8818                0.7132
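The accuracy formulas given with Table 1 can be sketched as a small Python function. The confusion-matrix counts below are hypothetical and are not reported in the paper. They were chosen only because, with the stated class sizes of 94 and 111, they reproduce the rounded percentages in the FSMICS row of Table 2.

```python
def accuracies(confusion):
    """Per-class and total accuracy from a 2 x 2 confusion matrix [[P11, P12], [P21, P22]]."""
    (p11, p12), (p21, p22) = confusion
    p1 = p11 / (p11 + p12)                      # accuracy on Class 1
    p2 = p22 / (p21 + p22)                      # accuracy on Class 2
    p = (p11 + p22) / (p11 + p12 + p21 + p22)   # total accuracy
    return p1, p2, p


# Hypothetical counts: 76 of 94 Class 1 and 93 of 111 Class 2 samples correct.
hypothetical_counts = [[76, 18], [18, 93]]
p1, p2, p = accuracies(hypothetical_counts)
print(round(100 * p1, 1), round(100 * p2, 1), round(100 * p, 1))  # 80.9 83.8 82.4
```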
In this paper, with the total sample classification accuracy as the performance evaluation index, we ran the SVM experiment to obtain the classification accuracy as the number of chosen variables ranges from 1 to 68. The trend of the classification accuracy against the number of feature subsets after feature selection is shown in Fig. 8. To guarantee experimental precision, the curves are plotted from the averages of 10 experiments using different training and testing datasets. For the point at which the total classification accuracy of the model converges, the average classification accuracy, classification efficiency and standard deviation of each class after 10-fold CV are given in Table 2.
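A random 10-fold split of the 205 samples might look like the following sketch. How the folds were actually drawn is not described beyond being random, so the shuffling, the seed, the absence of stratification by class and the function name are all assumptions.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for a random k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into folds whose sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test


splits = list(ten_fold_splits(205))
print(sorted(len(test) for _, test in splits))  # [20, 20, 20, 20, 20, 21, 21, 21, 21, 21]
```

Each sample appears in exactly one test fold, so averaging the ten test accuracies uses every patient once for testing.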
As can be seen from Fig. 8, on the one hand, the average classification accuracy of FSMICS is the highest when the classification accuracy converges. CLeVer and Corona achieve decent and similar average classification accuracies, and AGV has the lowest. On the other hand, the convergence rates of FSMICS and CLeVer are close, but the initial classification accuracy of CLeVer is lower than that of FSMICS. Further information, such as stability and processing time, cannot be read precisely from Fig. 8. Therefore, the classification performance of the four methods is further analyzed in combination with the data in Table 2.

In Table 2, the "chosen number" is the number of variables, taken in order from the 68 variables $X_i$, at which the classification accuracy of each method converges. The smaller the chosen number is, the better the convergence performance of the method is; this item is part of the reference standard for classification performance. The standard deviation indicates the classification stability of each method after convergence, and the last column shows the processing time of classification. All data in Table 2 are averages over 10 runs.
According to the results in Fig. 8 and Table 2, the following conclusions can be drawn:

(1) When the classification accuracy converges:
(a) The average classification accuracy of the dataset after FSMICS processing is the highest, reaching 82.4%.
(b) FSMICS needs only 39 feature subsets for the classification accuracy to converge, fewer than the other three methods. The others, in ascending order of chosen number, are AGV (40), Corona (43) and CLeVer (45).
(2) After the classification accuracy converges, the standard deviations of FSMICS and CLeVer are close, and both are relatively stable. By comparison, the stability of Corona is somewhat poorer and AGV is not stable.
(3) The SVM classification time for the dataset after each of the four feature selection methods is approximately 1 s. The fastest is AGV, followed by CLeVer, FSMICS and Corona. Since the feature vectors after vectorization still have a high dimension, FSMICS is not the best of the four methods in classification time.
(4) From the analysis of the clinical experiment, we conclude that AGV is not applicable to our MTS data type. However, AGV shows high classification efficiency, which means that AGV is a good dimension reduction method to some extent.

Fig. 8. Comparison of classification performance for four feature selection methods in NSCLC dataset.
As can be seen from the classification results on the public and clinical datasets, compared with the other three MTS feature selection methods, FSMICS achieves the highest average classification accuracy and a decent convergence rate when the classification accuracy converges. Moreover, it shows good stability after convergence. Through the experiments, we find that FSMICS yields the highest selection accuracy with relatively acceptable classification efficiency.

From the medical perspective of mathematical statistics, we can conclude that FSMICS can classify the patients in the clinical data into the corresponding classes with relatively high accuracy.
5. Conclusion
In MTS data mining problems, an excessive data dimension makes the estimate of the probability density distribution inaccurate, which increases the computational complexity, and information redundancies and irrelevant features may lead to high computational complexity and over-fitting. This paper therefore focuses on dimension reduction and improves an MTS feature selection method by combining a K–L information entropy estimation method for feature extraction based on MI with a feature selection algorithm. We first computed the MI values with the K–L information entropy estimation method to extract the distinct relationships between features. Next, we used the class separability criterion to evaluate the contribution of each variable. Moreover, considering the existence of redundancies between variables, we extended the class separability criterion with redundancy evaluation variables to eliminate information redundancies and obtain the final variables, which yield more correlation and fewer redundancies. We then sorted the attributes in terms of their importance to choose the optimal features and reduce the number of dimensions. Finally, we vectorized the feature matrices to satisfy the input requirement of classification.

Through a verification experiment on a public dataset and an application experiment on a clinical dataset, the improved method is shown to reduce the dimensions of MTS effectively and to be applicable in TCM. That is, FSMICS can effectively reduce computational complexity and eliminate information redundancies and irrelevant features in MTS to achieve dimension reduction. The results of our experiments show that FSMICS can handle both linear and nonlinear relationships between dimensions and is better suited to MTS data with samples of unequal length. However, the feature vectors after vectorization still have a high dimension, which can affect the time performance of a classifier. Therefore, how to further process and reduce the dimension of the feature attributes, while preserving classification accuracy, in order to improve time performance is one of the important directions for our future research.
Acknowledgements
We gratefully acknowledge the 2014 Beijing Municipal Education Commission plan on the scientific research project (KM201410005004) and the Beijing Laboratory for Urban Mass Transit for their support. We also thank the Hospital of Traditional Chinese Medicine, CPUMS, Beijing, referenced in this paper, for providing clinical follow-up data and helpful advice on this issue.
References
[1] T. Fu, A review on time series data mining, Eng. Appl. Artif. Intell. 24 (1) (2011) 164–181.
[2] G. Ristanoski, J. Bailey, Distribution Based Data Filtering for Financial Time Series Forecasting, Springer-Verlag, Berlin, 2011, pp. 122–131.
[3] R. Tanawongsuwan, A. Bobick, Performance Analysis of Time-Distance Gait Parameters Under Different Speeds, Springer-Verlag, Berlin, 2003, pp. 715–724.
[4] X.L. Zhang, et al., Event-related potentials during object recognition tasks, Brain Res. Bull. 38 (6) (1995) 531–538.
[5] W.A. Chaovalitwongse, Y.J. Fan, R.C. Sachdeo, Support feature machine for classification of abnormal brain activity, in: KDD-2007 Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 113–122.
[6] M. Krawczak, G. Szkatula, An approach to dimensionality reduction in time series, Inf. Sci. 260 (2014) 15–36.
[7] J.M. Wang, et al., Multivariate time series similarity searching, Sci. World J. 2014 (2014) 1–8.
[8] K. Javed, H.A. Babri, M. Saeed, Feature selection based on class-dependent densities for high-dimensional binary data, IEEE Trans. Knowl. Data Eng. 24 (3) (2012) 465–477.
[9] C.H. Lu, Z.M. Zhu, X.F. Gu, An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method, J. Med. Syst. 38 (979) (2014).
[10] H. Yoon, K.Y. Yang, C. Shahabi, Feature subset selection and feature ranking for multivariate time series, IEEE Trans. Knowl. Data Eng. 17 (9) (2005) 1186–1198.
[11] F. Alimardani, et al., Presenting a new search strategy to select synchronization values for classifying bipolar mood disorders from schizophrenic patients, Eng. Appl. Artif. Intell. 26 (2) (2013) 913–923.
[12] V. Singh, K.P. Miyapuram, R.S. Bapi, Detection of cognitive states from fMRI data using machine learning techniques, in: 20th International Joint Conference on Artificial Intelligence, 2007, pp. 587–592.
[13] S.N. Li, Z.H. Zhang, J.Q. Duan, An ensemble multi-label feature selection algorithm based on information entropy, Int. Arab J. Inf. Technol. 11 (4) (2014) 379–386.
[14] M.A. Hossain, X.P. Jia, M. Pickering, Subspace detection using a mutual information measure for hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. 11 (2) (2014) 424–428.
[15] A. Mehri, A.H. Darooneh, The role of entropy in word ranking, Phys. A: Stat. Mech. Appl. 390 (18–19) (2011) 3157–3163.
[16] A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Phys. Rev. E: Stat. Nonlinear Soft Matter Phys. 69 (2004) 066138.
[17] F. Rossi, et al., Mutual information for the selection of relevant variables in spectrometric nonlinear modelling, Chemom. Intell. Lab. Syst. 80 (2) (2006) 215–226.
[18] F. Rossi, et al., Fast selection of spectral variables with B-spline compression, Chemom. Intell. Lab. Syst. 86 (2) (2007) 208–218.
[19] L.F. Kozachenko, N.N. Leonenko, Sample estimate of the entropy of a random vector, Prob. Inf. Transm. 23 (2) (1987) 95–101.
[20] A. Nazarpour, P. Adibi, Two-stage multiple kernel learning for supervised dimensionality reduction, Pattern Recognit. 48 (5) (2015) 1854–1862.
[21] M. Imani, H. Ghassemian, Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation, IEEE Geosci. Remote Sens. Lett. 11 (11) (2014) 1986–1990.
[22] Y. Xu, et al., From the idea of sparse representation to a representation-based transformation method for feature extraction, Neurocomputing 113 (2013) 168–176.
[23] J. Li, An Introduction to Pattern Recognition, Higher Education Press, Beijing, 1994.
[24] H. Min, L. Xiaoxin, Feature selection techniques with class separability for multivariate time series, Neurocomputing 110 (2013) 29–34.
[25] G. Zararsiz, F. Elmali, A. Ozturk, Bagging support vector machines for leukemia classification, Int. J. Comput. Sci. Issues (IJCSI) 9 (6) (2012) 355–358.
[26] S. Korkmaz, G. Zararsiz, D. Goksuluk, Drug/nondrug classification using support vector machines with various feature selection strategies, Comput. Methods Prog. Biomed. 117 (2) (2014) 51–60.
[27] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 27.
[28] N.S. Dias, et al., Feature down-selection in brain-computer interfaces, in: Neural Engineering, 2009. NER '09, 4th International IEEE/EMBS Conference, Antalya, 2009.
[29] C.Z.Y.S. Chonghua Wan, The Chinese version of quality of life table FACT-L of lung cancer patients, China Cancer 9 (3) (2000) 109–110.
[30] P.J. Garcia-Laencina, et al., K nearest neighbours with mutual information for simultaneous classification and missing data imputation, Neurocomputing 72 (7–9) (2009) 1483–1493.