Machine learning algorithms and forced oscillation measurements applied to the automatic identification of chronic obstructive pulmonary disease


journal homepage: www.intl.elsevierhealth.com/journals/cmpb


Jorge L.M. Amaral a, Agnaldo J. Lopes b, José M. Jansen b, Alvaro C.D. Faria c, Pedro L. Melo c,∗

a Department of Electronics and Telecommunications Engineering, State University of Rio de Janeiro, Rio de Janeiro, Brazil
b Pulmonary Function Laboratory, Pedro Ernesto University Hospital, State University of Rio de Janeiro, Rio de Janeiro, Brazil
c Biomedical Instrumentation Laboratory, Institute of Biology Roberto Alcantara Gomes and Laboratory of Clinical and Experimental Research in Vascular Biology (BioVasc), State University of Rio de Janeiro, Rio de Janeiro, Brazil

Article info

Article history:
Received 1 April 2011
Received in revised form 15 August 2011
Accepted 22 September 2011

Keywords:
Clinical decision support
Artificial intelligence
Classification
Forced oscillation technique
Respiratory system
Chronic obstructive pulmonary disease

Abstract

The purpose of this study is to develop a clinical decision support system based on machine learning (ML) algorithms to aid the diagnosis of chronic obstructive pulmonary disease (COPD) using forced oscillation (FO) measurements. To this end, the performances of classification algorithms based on the Linear Bayes Normal Classifier, K nearest neighbor (KNN), decision trees, artificial neural networks (ANN) and support vector machines (SVM) were compared in order to search for the best classifier. Four feature selection methods were also used in order to identify a reduced set of the most relevant parameters. The available dataset consists of 7 possible input features (FO parameters) of 150 measurements made in 50 volunteers (COPD, n = 25; healthy, n = 25). The performance of the classifiers and reduced datasets was evaluated by the determination of sensitivity (Se), specificity (Sp) and area under the ROC curve (AUC). Among the studied classifiers, the KNN, SVM and ANN classifiers were the most adequate, reaching values that allow a very accurate clinical diagnosis (Se > 87%, Sp > 94%, and AUC > 0.95). The use of the analysis of correlation as a ranking index of the FOT parameters allowed us to simplify the analysis of the FOT parameters while still maintaining a high degree of accuracy. In conclusion, the results of this study indicate that the proposed classifiers may contribute to facilitating the diagnosis of COPD by using forced oscillation measurements.

1. Introduction

Chronic obstructive pulmonary disease (COPD) is a major cause of chronic morbidity and mortality throughout the world [1]. According to WHO estimates, 80 million people have moderate to severe COPD. More than 3 million people died of COPD in 2005, which corresponds to 5% of all deaths globally [2]. The chronic airflow limitation characteristic of COPD is caused by a mixture of small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema) [1]. There is agreement in the literature that new measurement technologies able to detect COPD in its early stages would contribute to decreasing medical and economic burdens [3].

∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (P.L. Melo).

Open access under the Elsevier OA license.


Submitting a physical system to forced oscillations is a very general approach to the investigation of its structure and/or properties [4]. Its application to respiratory mechanics was first proposed by DuBois et al. [5]. This method, known as the forced oscillation technique (FOT), consists of applying small sinusoidal pressure variations to stimulate the respiratory system at frequencies higher than the normal breathing frequency and measuring the flow response. This method characterizes the respiratory impedance and its two components, respiratory system resistance (Rrs) and reactance (Xrs). The method is simple and requires only passive cooperation and no forced expiratory maneuvers. Recently, this technique has been successfully applied in the detection of early respiratory changes in smokers [6].

Although obtaining respiratory impedance values is easy, the resulting values are difficult for clinicians to understand, as they are based on an electrical equivalent circuit model of the respiratory system. In the context of a diagnostic framework, the interpretation of resistance and reactance curves, as well as of the derived parameters measured by the FOT, requires training and experience, and is a difficult task for the untrained pulmonologist.

Methods based on machine learning (ML) have been widely used to develop classifiers. These systems can extract information from different classes of signals after having been trained to perform this specific task by learning from examples. In respiratory mechanics, ML has proved useful as a pattern recognition method to optimize alarms of anesthesia breathing circuits [7], and in the detection of upper airway obstruction [8] and esophageal intubation [9], the assessment of lung injury [10] and static compliance in animal models [11], and the evaluation of spirometric exams [12]. Recently, a severity classification for idiopathic pulmonary fibrosis using fuzzy logic was proposed [13].

2. Background

Previous works [14,15] have compared groups of controls and COPD patients, observing clear modifications in FOT parameters. However, categorizing pulmonary diseases by looking at the plotted curves of respiratory impedance or derived parameters can prove a difficult task for the untrained pulmonologist. This raises the question: can an ML-based approach to the analysis of FOT data provide an efficient method to recognize COPD? In fact, only two recent conference papers have addressed this question [16,17].

In the work of Baruá et al. [16], an artificial neural network (ANN) was used to recognize and classify diseases of the central and peripheral airways. The authors used IOS measurements and a feedforward ANN trained by the backpropagation algorithm. After supervised training, the classifier produced correct classification rates of 98.47% and 61.53% when the same data and a new set of unseen data were used, respectively. It was pointed out that the proposed classifier could be further improved with the inclusion of more training samples combined with fuzzy logic decision rules.

In a later work by the same group [17], a classifier based on ANN was capable of distinguishing between relatively constricted and nonconstricted airway conditions in asthmatic children. The performance of the classifier was evaluated by two methods: (1) using all of the patterns during training as well as in the feed-forward stage and (2) using only 60% of the dataset during training, with the remaining 40% as unseen patterns. The classification accuracies obtained were 95.01% and 98.61%, respectively. The authors concluded that ANNs can successfully be trained with impulse oscillation system (IOS) data, enabling them to generalize the IOS parameter relationships to classify previously unseen pulmonary patterns. The two cited studies used an IOS, which differs from the classical FOT in aspects that include data processing and the parameters used to interpret raw data [18,19]. In addition, from a system identification point of view, the impulse excitation signal used in IOS is a much worse excitation signal than the multisine used in FOT. This difference is associated with the worse crest factor of the impulse signal.

In this context, we observed that there was no data in the literature concerning the use of ML algorithms associated with classical FOT measurements to aid clinicians in the identification of COPD. To help elucidate this question, our group recently investigated this possibility using the classical FOT associated with a classifier based on ANN [20]. Two feature selection methods (the analysis of the linear correlation and forward search) were used in order to identify a reduced set of the most relevant parameters. Two different training strategies for the ANNs were used, and the performance of the resulting networks was evaluated by the determination of accuracy, sensitivity (Se), specificity (Sp) and AUC. The ANN classifiers presented high accuracy (Se > 0.9, Sp > 0.9 and AUC > 0.9) with both the complete and the reduced sets of FOT parameters. This indicates that ANN classifiers may contribute to facilitating the diagnosis of COPD using FOT measurements. Although these results were very promising, this initial work was limited to the investigation of an ANN-based classifier because we were interested in a direct comparison with the two previously cited works.

The purpose of the present study is to evaluate the performance of several ML algorithms in the development of an automatic classifier to aid the diagnosis of COPD using forced oscillation measurements.

The paper is organized as follows: a discussion of the design principles and implementation goals is presented in the next section. The healthy group and the COPD group are characterized in Section 4, along with a description of the measurement protocol. This section also presents the evaluated classifiers and describes the methods used for performance evaluation, comparisons among classifiers and feature selection. Section 5 presents the results and Section 6 discusses the results with respect to the search for the best classifier and parameters. Section 7 summarizes the main outcomes of this investigation and points to future steps in this research topic.

3. Design considerations

3.1. Classification system

The basic structure of a classification system is the input, the classifier and the output. In the present work, the inputs are the parameters provided by the FOT, the classifier is one of the pattern recognition algorithms chosen, and the output tells whether or not the input parameters indicate COPD.

The design process of a classification system involves several important aspects, such as the evaluation of the classifiers, the choice of the algorithms to be used, feature selection, the selection of the best parameters and the comparison of classifier performance. These aspects are briefly described in the following sections.

3.2. The studied classifiers

In this particular study, the following classification algorithms were evaluated:

• Linear Bayes Normal Classifier [21,22]

• K nearest neighbor [21]

• Decision trees [23,24]

• Artificial neural networks [25]

• Support vector machines [26]

These algorithms were chosen because they represent a wide variety of classifier algorithms, as seen in Lippmann's list of types of classifiers [21]. They are briefly described here; the complete description of the algorithms can be found in the references.

The Linear Bayes Normal Classifier (LBNC) achieves the minimum error, according to Bayesian decision theory, when the classes are normally distributed with equal covariance matrices. The Linear Bayes classifier is fast and simple to compute from the training data and provides a very straightforward interpretation, since its decision boundary is a hyperplane. In spite of its simplicity, it is reasonably robust, i.e., it can deliver surprisingly good results even when the classes do not follow normal distributions with equal covariance matrices [21].

The K nearest neighbor (KNN) classifier is one of the simplest and most elegant classification methods in pattern recognition [21]. It is a type of instance-based learning, or lazy learning, which means that in the learning stage it simply stores a set of labeled instances (the training set). When a new query instance has to be classified, the algorithm finds the K training instances closest to the query point, using a similarity function usually based on the Euclidean distance. The classification is made by majority vote among the class labels of these K instances. If K = 1, the object is simply assigned to the class of its nearest neighbor.
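The voting rule just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the prtools implementation used in the study; the function name `knn_predict` and the two-feature toy data are assumptions introduced for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training instance.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest instances (sorted() is stable, so ties keep input order).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature examples, for illustration only (not study data).
train_X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.2)]
train_y = ["healthy", "healthy", "COPD", "COPD"]
prediction = knn_predict(train_X, train_y, (0.8, 0.9), k=1)
```

With K = 1 the vote reduces to copying the label of the single closest stored instance, which is the configuration the study reports using.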

A decision tree (DTREE) is a hierarchical structure that consists of nodes and branches [23]. There are three types of nodes: the root, which has only outgoing branches; the internal nodes, which have one incoming and two or more outgoing branches; and the terminal (leaf) nodes, which have no outgoing branches. All terminal nodes have a class label assigned to them [23]. Each nonterminal node in the tree represents a test on one of the attributes, and each branch that comes out of the node represents one of the possible outcomes of the test performed. A query instance is classified by starting at the root node, testing the attribute specified by this node, and then moving down the tree branch corresponding to the outcome of the test for this attribute. This process is repeated until a terminal node is reached, where the class label is given to the query instance.
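The root-to-leaf walk described above can be illustrated with a small sketch. The dictionary layout, the binary threshold tests and the numeric thresholds are all hypothetical choices made for the example; they are not the tree learned in the study.

```python
def classify(node, instance):
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while "label" not in node:  # nonterminal node: apply its attribute test
        value = instance[node["feature"]]
        branch = "low" if value <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]  # terminal node: its class label is the answer


# Hypothetical tree with made-up thresholds, for illustration only.
tree = {
    "feature": "fr", "threshold": 15.0,
    "low": {"label": "healthy"},
    "high": {
        "feature": "R0", "threshold": 3.0,
        "low": {"label": "healthy"},
        "high": {"label": "COPD"},
    },
}
result = classify(tree, {"fr": 22.0, "R0": 4.1})
```

Because every decision is an explicit test on one attribute, the path taken for a given query can be read back directly from the tree.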

An ANN is a massively parallel system [25] composed of many simple processing elements (neurons) whose function is determined by the network architecture, the connection strengths (synaptic weights) and the processing performed at the neurons. Neural networks are capable of acquiring knowledge through a learning process and of storing that knowledge in the synaptic weights. One of the most successful neural network architectures is the multilayer perceptron (MLP). It has been successfully applied to a variety of pattern recognition problems in industry, business, science [27] and medical diagnosis [27,28]. One of the most important features of a neural network is the ability to generalize what it has learned from the training procedure. This allows the network to deal with noise in the input data and to provide the correct outputs for new data patterns, i.e., data that were not used to train the network.

Support vector machines (SVM) are learning systems based on statistical learning theory [26], and they have been successfully used in a variety of classification and regression problems. For a two-class classification problem, the basic form of SVM is a linear classifier that performs the classification by constructing a hyperplane that optimally separates the classes. The optimal hyperplane is the one that provides the maximal margin. (The margin is defined as the distance between the hyperplane and the closest training samples.) It can be proven that this particular solution has the highest generalization ability. This formulation can be generalized by applying a nonlinear mapping to the training set. The data are transformed to a new high-dimensional feature space where the classes are more easily separable and an optimal hyperplane can be found. The radial basis function kernel is frequently used to accomplish this nonlinear mapping, and it is frequently the first nonlinear mapping to consider. Although the decision surface (hyperplane) is linear in the high-dimensional space, it is no longer linear when seen in the original low-dimensional feature space, meaning that SVM can also be applied to data that are not linearly separable [29].
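For reference, the maximal-margin idea is usually written as the optimization problem below. This is the standard textbook soft-margin form, given here as an aid to the reader rather than as the exact formulation solved by the toolbox used in the study; the symbols $\mathbf{w}$, $b$, $\xi_i$, $\phi$ and $\sigma$ are notation introduced for this sketch, and the assumption that the kernel radius plays the role of $\sigma$ is ours.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
  y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i ,
  \qquad \xi_i \ge 0 , \quad i = 1, \dots, N

% One common parameterization of the radial basis function kernel:
K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^{\top} \phi(\mathbf{x}')
  = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}} \right)
```

Here $y_i \in \{-1, +1\}$ are the class labels, the geometric margin equals $1/\lVert \mathbf{w} \rVert$, and a larger $C$ penalizes margin violations $\xi_i$ more heavily.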

3.3. Feature selection

The purpose of input feature selection is to find the smallest number of relevant and informative features that can result in a satisfactory performance [30]. Other motivations for performing feature selection are general data reduction, limiting storage requirements, increasing the speed of the algorithm, gaining knowledge about the process that generates the data, and allowing data visualization (2D or 3D) [30]. It is also important because a large number of inputs implies the estimation of a large number of model parameters, which can be difficult with datasets of limited size [28].

Basically, there are three types of feature selection methods: filters, wrappers and embedded methods [30]. Filter methods provide a ranking order of the features using a relevance index such as correlation coefficients or classical statistical tests (T-test, F-test, Chi-squared, etc.). Wrappers normally apply an efficient search strategy to find the best features based on the performance of the machine learning algorithm, such as the classification accuracy. Embedded methods perform feature selection in the process of training and are usually specific to some given learning machines, such as decision trees [30].
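As an illustration of a wrapper, the sketch below implements a greedy forward search scored by the leave-one-out accuracy of a 1-nearest-neighbor rule, which is the performance index the Methods section reports for the wrapper searches. The stopping rule (stop when no candidate improves the score), the function names and the toy data are assumptions made for the example.

```python
import math


def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule on the chosen columns."""
    hits = 0
    for i, xi in enumerate(X):
        # Nearest other sample, measured only on the candidate feature subset.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist([xi[c] for c in cols], [X[j][c] for c in cols]),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def forward_selection(X, y):
    """Greedily add the feature that most improves the wrapper criterion."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(loo_1nn_accuracy(X, y, selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda item: item[0])
        if score <= best_score:  # assumed stopping rule: no further improvement
            break
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


# Toy data (not study data): column 0 separates the classes, 1 and 2 do not.
X = [(0.0, 5, 0.5), (0.1, 1, 0.5), (0.2, 3, 0.5),
     (1.0, 2, 0.5), (1.1, 6, 0.5), (1.2, 4, 0.5)]
y = [0, 0, 0, 1, 1, 1]
selected, best_score = forward_selection(X, y)
```

A backward search would start from the full set and remove features instead, and a floating search alternates additions with conditional removals.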


3.4. Performance evaluation

The evaluation of the classifiers plays a key role in classification system design. Its primary goal is to choose the best classifier and estimate its performance on future examples (generalization accuracy) [31]. The main components of this evaluation are the choice of the performance function, the evaluation structure and the comparison of different classifiers. There are several measures that can be used to assess the performance of the classifier, depending on the specific domain of application. Some of the commonly used measures are accuracy, sensitivity, specificity, True Positive Rate, False Positive Rate, Recall, Precision and the area under the Receiver Operating Characteristic (ROC) curve (AUC) [32].
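A minimal sketch of how three of these measures and the AUC can be computed is given below. It assumes a binary coding in which 1 denotes COPD and 0 denotes healthy, and it computes the AUC through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative); the function names and the example scores are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); 1 = COPD, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true COPD cases detected
    specificity = tn / (tn + fp)  # fraction of healthy cases correctly cleared
    return accuracy, sensitivity, specificity


def auc(y_true, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, se, sp = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen decision threshold (0.5 here), whereas the AUC summarizes performance over all thresholds.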

The evaluation structure is an important part of the design. In order to decide on the best classifier, one has to look into the generalization accuracy. This can be done using either Holdout or K-fold cross-validation procedures. In Holdout, the available data are divided into training and test datasets. The classifier is trained with the training dataset, and the performance of the trained classifier is evaluated on the test dataset to estimate the generalization accuracy. The problem with Holdout is that different Holdout sets (different splits) lead to different results. Also, depending on the available data, it is possible to end up with a very wide confidence interval for the accuracy [24]. In K-fold cross-validation, all the available data are partitioned into k equal (or approximately equal) datasets, or folds [33]. Each fold in turn is used for testing, and the remaining k − 1 folds are used for training a classifier. The performance of each learning algorithm on each fold can be tracked using some predetermined measure such as accuracy. Upon completion, k samples of the performance metric are available, and different methodologies such as averaging can be used to obtain an aggregate measure from these samples, or the samples can be used in a statistical hypothesis test to compare two or more machine learning algorithms.
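The K-fold procedure can be sketched as follows. The round-robin stratification, the fixed random seed and the `evaluate` callback are illustrative assumptions, and the 75/75 label vector in the example is only a guess at how the 150 measurements divide between the two groups.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with near-equal class counts per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)  # deal indices round-robin
    return folds


def cross_validate(labels, k, evaluate):
    """Call evaluate(train_idx, test_idx) once per fold and average the scores."""
    folds = stratified_folds(labels, k)
    scores = []
    for test_idx in folds:
        train_idx = [i for fold in folds if fold is not test_idx for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k, scores


# Assumed split of 150 measurements into 75 healthy (0) and 75 COPD (1).
labels = [0] * 75 + [1] * 75
folds = stratified_folds(labels, 10)
# Dummy evaluation: the fraction of the data held out in each fold.
mean_score, fold_scores = cross_validate(
    labels, 10, lambda train, test: len(test) / (len(train) + len(test))
)
```

In practice `evaluate` would train a classifier on `train_idx` and return its accuracy on `test_idx`; the k returned scores are the samples that can be averaged or passed to a hypothesis test.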

The use of K-fold cross-validation allows us to estimate the performance of the model learned from the available data using one algorithm. In other words, it is possible to estimate its performance on unseen examples (the generalization capability of the algorithm). It can also be used to compare the performance of two or more different algorithms and find the best algorithm for the available data or, alternatively, to help the designer choose the best set of parameters for a particular model.

The hypothesis test is another important element when one wishes to compare two or more machine learning algorithms. In the hypothesis test, we want to verify whether there is no difference in the performance of two classifiers (null hypothesis) at a certain confidence level (usually 95%). For a comparison on one dataset, one can use the Student's t-test or one of its variations, for example the corrected resampled t-test [24]. Dietterich [31] points out that the use of a t-test carries a high risk of a Type I error, i.e., a risk of finding a difference where none exists, and recommends the 5×2 cross-validation or McNemar's test. In the case of multiple datasets from different domains, Demsar [34] recommends Wilcoxon's signed ranks test, Friedman tests and post hoc tests.

It is also important to mention that classifiers are sometimes evaluated not only by their performance measures, but also by their speed and scalability, robustness and interpretability. When looking at speed and scalability, one is interested in knowing how long it takes to construct the classifier, how long it takes to use the classifier, and whether it is able to deal with datasets of several thousand points. If robustness is important, one tries to evaluate its capability of handling noise, missing values and irrelevant features. If interpretability is important, one tries to find out whether the classifier can give some explanation of how it reached the classification for a certain point of the dataset.

4. Methods

4.1. Subjects and spirometry

The objectives of the study were explained to all individuals, and their written consent was obtained before inclusion in the study. The study was approved by the Medical Research Ethics Committee of the State University of Rio de Janeiro. The study involved a group of 25 COPD patients and a control group of 25 never-smoking subjects. The control group consisted basically of students and employees of the State University of Rio de Janeiro, and was composed of healthy subjects who presented normal spirometry and no history of pulmonary or cardiac disease. The patients with COPD came from the COPD outpatient clinic of the Pneumology Service of our University Hospital. The patients were in stable clinical condition.

COPD patients presented mild (n = 8), moderate (n = 9) and severe (n = 8) airflow obstruction, which was evaluated using the following parameters [6,14,35]: Forced Expiratory Volume in the first second (FEV1), Forced Vital Capacity (FVC), the FEV1/FVC ratio, the Forced Expiratory Flow (FEF) between 25% and 75% of FVC, and the FEF/FVC ratio. These measurements were obtained for all patients in a sitting position, using a closed-circuit spirometer (Vitrace VT-139; Pro-médico, Rio de Janeiro, Brazil), and were presented as raw data and as a percentage of the predicted values (% pred).

4.2. Forced oscillation technique

The instrumentation used for the evaluation of respiratory impedance by FOT has been described in other studies [36,37]. Briefly, a pseudorandom sinusoidal signal with an amplitude of 2 cmH2O peak-to-peak, containing all harmonics of 2 Hz between 4 and 32 Hz, was applied by a loudspeaker. The pressure input was measured with a Honeywell 176 PC pressure transducer (Microswitch, Boston, MA, USA), and the airway flow with a screen pneumotachograph coupled to a similar transducer with a matched frequency response. The signals were digitized at a rate of 1024 Hz, for periods of 16 s, by a personal computer, and a fast Fourier transform was computed using blocks of 4096 points with 50% overlap. To perform the FOT analysis, the volunteer remained in a sitting position, keeping the head in a normal position and breathing spontaneously through a mouthpiece. During the measurements, the subjects firmly supported their cheeks and mouth floor using both hands, while a nose clip was worn. A minimal coherence function of 0.9 was considered adequate [6,38]. Whenever the computed coherence was less than this threshold at any of the studied frequencies, the maneuver was not considered valid and the exam was repeated. Three measurements were made, and the final result of the test was calculated as the mean of these three measurements.

To describe the resistive component of the FOT data, a linear regression analysis in the frequency range between 4 and 16 Hz was used to obtain the intercept resistance (R0) and the slope of the resistive component of the impedance (S). Using the same frequency range, a parameter commonly related to airway dimensions, the mean resistance (Rm), was also calculated [6,12,38]. The results associated with the reactance were interpreted using the mean reactance (Xm), the resonance frequency (fr) and the dynamic compliance of the respiratory system (Crs,dyn) [6,12,38]. Crs,dyn was estimated from the respiratory reactance at the oscillatory frequency of 4 Hz (Xrs4Hz) using the equation $X_{rs4Hz} = -1/(2\pi f C_{rs,dyn})$ [6,12,38]. The same frequency was used to evaluate the absolute value of the respiratory impedance (Z4Hz), which represents the total mechanical load of the respiratory system, including resistive and elastic effects [38].
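The derived parameters above follow from elementary computations, sketched below under stated assumptions: an ordinary least-squares line is fitted to the resistance samples, the compliance is obtained by inverting the reactance equation at f = 4 Hz, and the impedance modulus is taken as the Euclidean norm of resistance and reactance. The function names and the numeric inputs are hypothetical and are not measurements from the study.

```python
import math


def resistance_fit(freqs_hz, resistance):
    """Least-squares line R(f) = R0 + S*f; returns (R0, S, Rm)."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_r = sum(resistance) / n  # Rm: mean resistance over the range
    sxy = sum((f - mean_f) * (r - mean_r) for f, r in zip(freqs_hz, resistance))
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    slope = sxy / sxx
    intercept = mean_r - slope * mean_f
    return intercept, slope, mean_r


def dynamic_compliance(xrs_4hz, freq_hz=4.0):
    """Invert Xrs = -1 / (2*pi*f*Crs,dyn) to obtain Crs,dyn."""
    return -1.0 / (2.0 * math.pi * freq_hz * xrs_4hz)


def impedance_modulus(resistance, reactance):
    """Absolute value of the impedance from its real and imaginary parts."""
    return math.hypot(resistance, reactance)


# Made-up values: frequencies in Hz, resistance and reactance in cmH2O/L/s.
freqs = [4, 6, 8, 10, 12, 14, 16]
rrs = [3.0 - 0.05 * f for f in freqs]
r0, s, rm = resistance_fit(freqs, rrs)
crs_dyn = dynamic_compliance(-2.0)  # result in L/cmH2O
z4 = impedance_modulus(3.0, -4.0)
```

A negative slope S, as in this toy input, corresponds to resistance that falls with increasing frequency.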

4.3. Feature selection

In order to find the appropriate set of inputs, all three types of feature selection methods cited in the previous section were used. The chosen filter method used the correlation coefficients as a ranking index. The analysis of the linear correlation coefficients was done by calculating the matrix C of correlation coefficients. Each element of this matrix represents the correlation coefficient between two features, C(feature i, feature j), or between a feature and the output, C(feature i, Output).

The procedure to find the most relevant features started by looking for the feature that possesses the highest correlation coefficient with the output, C(feature i, Output). This feature is called HCCFO (Highest Correlation Coefficient Feature with Output). The next step was to eliminate features for which the following relation holds:

$$|C(\mathrm{HCCFO}, \mathrm{Feature})| > |C(\mathrm{Output}, \mathrm{HCCFO})| > |C(\mathrm{Output}, \mathrm{Feature})| \qquad (1)$$

This was done because, if relation (1) holds for a specific feature, the information it carries can be represented by the feature that has the highest correlation coefficient with the output (HCCFO). The process of feature selection using the correlation coefficients was performed using the cross-validation method [33]. The available dataset was divided into a fixed number of folds. Each fold had the same number of normal and COPD measurements. One of the folds was the test set and the remaining folds were used as the training set. The feature selection using the correlation coefficients was applied only to the training sets. Three search strategies (forward, backward, forward floating) were used in the wrapper methods, and the performance index was the 1-nearest neighbor leave-one-out classification performance. The embedded method was used only in the training of the decision tree.
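One possible reading of the correlation filter is sketched below. It assumes a single elimination pass around the HCCFO and uses Pearson's correlation coefficient; the text does not state whether the procedure was iterated, so the single pass, the function names and the three toy features are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_filter(features, output):
    """features: dict name -> values; output: class labels coded 0/1."""
    with_output = {name: abs(pearson(col, output)) for name, col in features.items()}
    hccfo = max(with_output, key=with_output.get)
    kept = [hccfo]
    for name, col in features.items():
        if name == hccfo:
            continue
        with_hccfo = abs(pearson(features[hccfo], col))
        # Relation (1): |C(HCCFO,F)| > |C(Output,HCCFO)| > |C(Output,F)|
        redundant = with_hccfo > with_output[hccfo] > with_output[name]
        if not redundant:
            kept.append(name)
    return hccfo, kept


# Toy data (not study data): f2 nearly duplicates f1, f3 is weakly related.
output = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [1.0, 2.0, 1.0, 2.0, 5.0, 6.0, 5.0, 6.0],
    "f2": [1.25, 1.75, 0.75, 2.25, 5.25, 5.75, 4.75, 6.25],
    "f3": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
}
hccfo, kept = correlation_filter(features, output)
```

In this toy case f2 is dropped because it is more strongly correlated with f1 than f1 is with the output, while f3 is retained even though its own correlation with the output is modest.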

4.4. Search for the best classifier parameters

The five classifiers (LBNC, KNN, DTREE, ANN and SVM) were implemented with a pattern recognition toolbox (prtools) for Matlab [39]. The LBNC was used with the default parameters, i.e., with no regularization. In the KNN, K was set to 1, giving the one-nearest-neighbor classifier. For all the other classifiers, the search for the best parameters was done with a 10-fold cross-validation, using the average classification accuracy in the test folds as the performance index.

For the decision tree, the parameters searched were the binary splitting criterion (information gain, purity, Fisher criterion) and the pruning type (Quinlan pruning, no pruning or the use of a tuning set for pruning) [39].

For the ANN classifier, the parameter to be searched is the number of neurons in the hidden layer. The SVM classifier with a radial basis function kernel, on the other hand, has only two parameters to be found: the regularization parameter C, which expresses the trade-off between a large margin with more training samples wrongly classified and a small margin with fewer classification errors, and the parameter r, which is the radius of the radial basis kernel. Since these parameters are not discrete values, a grid search was used. Various pairs of (C, r) values were tried and the one with the best cross-validation accuracy was picked. Since doing a complete grid search may still be time consuming, Hsu [40] recommended using a coarse grid first to find the "best region" and then using a finer grid to search this region. It is important to notice that this parameter search has to be done for each set of selected features, i.e., there is no guarantee that the same parameter setting will work for all sets of selected features. For all experiments, the features were normalized to have zero mean and unit standard deviation. This is necessary to remove scale effects caused by the use of features that have different measurement scales [41].
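The normalization and the coarse-to-fine search can be sketched as follows. The grid limits, the step sizes, the use of base-2 logarithmic grids and the `toy_score` function (a stand-in for the mean cross-validation accuracy) are all assumptions chosen for illustration; they are not the settings used in the study.

```python
import math
import statistics


def zscore(column):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population SD; the source does not say which
    return [(value - mean) / std for value in column]


def best_on_grid(score, log2_c_values, log2_r_values):
    """Return the (log2 C, log2 r) pair with the highest score on the grid."""
    pairs = [(c, r) for c in log2_c_values for r in log2_r_values]
    return max(pairs, key=lambda pair: score(2.0 ** pair[0], 2.0 ** pair[1]))


def coarse_to_fine(score):
    """Search a coarse grid first, then a finer grid around the best coarse point."""
    # Grid limits and step sizes are illustrative assumptions.
    coarse_c, coarse_r = best_on_grid(score, range(-5, 16, 2), range(-5, 6))
    fine_c = [coarse_c + 0.25 * step for step in range(-4, 5)]
    fine_r = [coarse_r + 0.25 * step for step in range(-4, 5)]
    log2_c, log2_r = best_on_grid(score, fine_c, fine_r)
    return 2.0 ** log2_c, 2.0 ** log2_r


def toy_score(c, r):
    """Hypothetical stand-in for cross-validation accuracy (not study data)."""
    return -((math.log2(c) - 3.0) ** 2 + (math.log2(r) + 0.5) ** 2)


best_c, best_r = coarse_to_fine(toy_score)
normalized = zscore([1.0, 2.0, 3.0])
```

In a real search, `score` would run the 10-fold cross-validation for the given (C, r) pair, and the whole search would be repeated for every selected feature set.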

All the classifiers were trained and evaluated with the same training and test sets generated by a 10-fold cross-validation on the available dataset. The accuracy, Se, Sp and AUC were calculated in the 10 test sets. In addition, each test example in the test sets was assigned one of two possible outcomes: 1, meaning that the classification provided by the classifier was correct, and 0 otherwise. This allowed us to apply Cochran's Q test [42] to determine whether significant differences existed in the classification results. Besides Cochran's test, McNemar's test [31] was applied between each pair of classifiers to find whether significant differences existed [43]. Differences were considered statistically significant at p < 0.05, and the tests were implemented in Matlab 7.4.0 using the Statistics Toolbox 6.0.
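Both tests operate on the 0/1 correctness outcomes described above. The sketch below computes the two test statistics in their standard textbook forms (McNemar with continuity correction); it is not the Matlab code used in the study, and the six-example outcome matrix is made up for the illustration.

```python
def mcnemar_statistic(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square statistic (1 degree of freedom)."""
    pairs = list(zip(correct_a, correct_b))
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)  # only B correct
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)  # only A correct
    if n01 + n10 == 0:
        return 0.0  # the classifiers never disagree
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


def cochran_q(outcomes):
    """outcomes[i][j] = 1 if classifier j was correct on test example i, else 0."""
    k = len(outcomes[0])
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    # The denominator is zero only if every example is all-correct or all-wrong.
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator


# Made-up outcomes for 6 test examples and 3 classifiers (not study data).
outcomes = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
q = cochran_q(outcomes)
first = [row[0] for row in outcomes]
third = [row[2] for row in outcomes]
chi2 = mcnemar_statistic(first, third)
```

Under the null hypothesis, Q follows approximately a chi-square distribution with k − 1 degrees of freedom and the McNemar statistic one with 1 degree of freedom, for which the 0.05 critical value is 3.841.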

5. Results

5.1. Characteristics of the subjects

The biometric and spirometric characteristics of the studied subjects are given in Table 1. The biometric characteristics of the two studied groups were well matched, and there were no significant differences between the groups. As can be seen in Table 1, patients with COPD presented significant reductions in the spirometric parameters (p < 0.0001).


Table 1 – Biometric and spirometric characteristics of the studied groups.

                 CG              COPD            p
Age (years)      55.2 ± 16.7     61.4 ± 9.7      ns (0.38)
Weight (kg)      65.4 ± 11.8     66.0 ± 8.4      ns (0.84)
Height (cm)      162.2 ± 8.9     163.5 ± 7.9     ns (0.58)
FEV1 (L)         2.8 ± 0.9       1.4 ± 0.7       <0.0001
FEV1 (% pred)    107.1 ± 20.3    57.0 ± 27.7     <0.0001
FEF/FVC (%)      100.3 ± 32.3    28.5 ± 18.3     <0.0001
FEV1/FVC (%)     87.9 ± 10.0     55.0 ± 16.7     <0.0001

Table 2 – Selected features using different strategies.

Search strategy     Selected features
Forward             fr, Xm, R0, Crs,dyn, |Zrs|
Backward            fr, Xm, R0, |Zrs|
Forward floating    fr, Xm, R0, |Zrs|

5.2. Feature selection

The most commonly selected features using correlation in the different training sets were (fr, R0, Crs,dyn) and (R0, Crs,dyn). The small number of selected features shows that the parameters are highly correlated. The results of the feature selection using the different search strategies (forward, backward, forward floating), with 1-nearest neighbor leave-one-out classification accuracy as the performance index, are shown in Table 2. The search strategies were configured to find the number of features that gives the highest performance.

5.3. Performance of the studied classifiers using different feature selection methods

5.3.1. Experiment 1 – use of all features (FOT parameters)

Fig. 1 shows the average ROC curve for each classifier, while Table 3 presents the average and the standard deviation of the derived parameters calculated in the 10 test folds, for all of the studied classifiers. The results presented were obtained with the best parameters found for each classifier.

LBNC presented the best average Sp (1.00), and KNN presented the best average Acc (0.97) and AUC (1.00). On the other hand, SVM presented the best average Se (0.97) and AUC (1.00). The application of the Cochran test showed a statistically significant difference among the classifiers, and McNemar's test applied to all pairs of classifiers indicated that there was a statistically significant difference between KNN and LBNC, and between KNN and DTREE.

Fig. 1 – Average ROC curve for experiment 1.

Fig. 2 – Average ROC curve for experiment 2.

5.3.2. Experiment 2 – forward selection search

The second experiment was carried out using the features chosen by the forward selection search strategy (fr, Xm, R0, Crs,dyn, |Zrs|). These results are described in Fig. 2, which shows the average ROC curve for each classifier, and Table 4. These results were obtained with the best parameters found for each classifier.

Table 3 – Results of experiment 1.

Classifier    Acc            Se             Sp             AUC
LBNC          0.89 ± 0.07    0.78 ± 0.14    1.00 ± 0.00    0.95 ± 0.05
KNN           0.97 ± 0.04    0.96 ± 0.09    0.99 ± 0.04    1.00 ± 0.00
DTREE         0.90 ± 0.05    0.90 ± 0.08    0.91 ± 0.07    0.95 ± 0.04
ANN           0.93 ± 0.06    0.89 ± 0.12    0.96 ± 0.06    0.97 ± 0.05
SVM           0.96 ± 0.05    0.97 ± 0.05    0.94 ± 0.05    1.00 ± 0.01

LBNC, Linear Bayes Normal Classifier; KNN, K nearest neighbor; DTREE, decision trees; ANN, artificial neural networks; SVM, support vector machines; Acc, accuracy; Se, sensitivity; Sp, specificity; AUC, area under the ROC curve.


Table 4 – Results of experiment 2.

Classifier    Acc            Se             Sp             AUC
LBNC          0.90 ± 0.07    0.80 ± 0.13    1.00 ± 0.00    0.96 ± 0.05
KNN           0.95 ± 0.04    0.93 ± 0.09    0.97 ± 0.05    1.00 ± 0.00
DTREE         0.89 ± 0.07    0.88 ± 0.13    0.91 ± 0.09    0.95 ± 0.04
ANN           0.94 ± 0.05    0.92 ± 0.09    0.96 ± 0.07    0.96 ± 0.06
SVM           0.95 ± 0.05    0.95 ± 0.09    0.95 ± 0.07    0.98 ± 0.03

Bold indicates the best values of accuracy, sensitivity, specificity and AUC.

Fig. 3 – Average ROC curve for experiment 3.

LBNC presented the best average Sp (1.00), while KNN presented the best Acc (0.95) and AUC (0.99). SVM presented the best average Acc (0.95) and Se (0.95). The application of the Cochran test showed a statistically significant difference among the classifiers, and McNemar's test applied to all pairs of classifiers indicated that there was a statistically significant difference between KNN and DTREE.

5.3.3. Experiment 3 – forward floating selection

The third experiment was carried out using the features chosen by the forward floating selection and the backward search strategies, since both chose the same features (fr, Xm, R0, |Zrs|). These results are described in Fig. 3, which shows the average ROC curve for each classifier, and Table 5.

According to the results, LBNC presents the best average Sp (1.00), and KNN presents the best average Acc (0.95) and AUC (0.99). On the other hand, SVM presents the best average Acc (0.95) and Se (0.94). The application of the Cochran test has not shown a statistically significant difference.

Fig. 4 – Average ROC curve for experiment 4.

5.3.4. Experiment 4 – analysis of correlation coefficients

The fourth experiment was carried out using the features (fr, R0, Crs,dyn) selected by the analysis of the correlation coefficients. These results are presented in Fig. 4, which shows the average ROC curve for each classifier, and Table 6.

Using these features, LBNC presented the best average Sp (1.00), and KNN presented the best average Acc (0.95), Se (0.93) and AUC (0.99). The application of the Cochran test has not shown a statistically significant difference between the classifier results.

5.3.5. Experiment 5 – correlation coefficients

The fifth experiment was also carried out using features (R0, Crs,dyn) selected by the analysis of the correlation coefficients. Fig. 5 shows the average ROC curve for each classifier, while Table 7 shows the associated parameters.

In these conditions, LBNC presented the best average Sp (1.00), KNN and ANN presented the best average Acc (0.93), KNN, DTREE and ANN presented the best average Se (0.91) and KNN presented the best AUC (0.99). The application of the Cochran test has not shown a statistically significant difference between the classifier results.

Table 5 – Results of experiment 3.

Classifier    Acc            Se             Sp             AUC
LBNC          0.89 ± 0.07    0.78 ± 0.14    1.00 ± 0.00    0.96 ± 0.04
KNN           0.95 ± 0.04    0.92 ± 0.07    0.97 ± 0.06    0.99 ± 0.03
DTREE         0.89 ± 0.07    0.89 ± 0.11    0.89 ± 0.09    0.95 ± 0.05
ANN           0.91 ± 0.08    0.87 ± 0.09    0.95 ± 0.10    0.97 ± 0.06
SVM           0.95 ± 0.05    0.94 ± 0.07    0.95 ± 0.09    0.96 ± 0.05

Table 6 – Results of experiment 4.

Classifier    Acc            Se             Sp             AUC
LBNC          0.90 ± 0.07    0.80 ± 0.13    1.00 ± 0.00    0.96 ± 0.05
KNN           0.95 ± 0.09    0.93 ± 0.10    0.97 ± 0.09    0.99 ± 0.03
DTREE         0.91 ± 0.08    0.89 ± 0.11    0.93 ± 0.11    0.96 ± 0.05
ANN           0.91 ± 0.05    0.89 ± 0.09    0.93 ± 0.09    0.95 ± 0.05
SVM           0.93 ± 0.11    0.91 ± 0.11    0.94 ± 0.14    0.96 ± 0.07

Table 7 – Results of experiment 5.

Classifier    Acc            Se             Sp             AUC
LBNC          0.90 ± 0.07    0.80 ± 0.13    1.00 ± 0.00    0.97 ± 0.05
KNN           0.93 ± 0.05    0.91 ± 0.09    0.96 ± 0.07    0.99 ± 0.01
DTREE         0.90 ± 0.07    0.91 ± 0.09    0.89 ± 0.15    0.95 ± 0.05
ANN           0.93 ± 0.05    0.91 ± 0.09    0.94 ± 0.07    0.97 ± 0.04
SVM           0.91 ± 0.07    0.82 ± 0.14    1.00 ± 0.00    0.97 ± 0.04

Fig. 5 – Average ROC curve for experiment 5.

5.4. Searchforthebestclassiﬁerparameters

Tables8and9showthebestparametersforeachclassiﬁerand theiraverageaccuracies.

5.5. Performance of the KNN classifiers using different feature selection methods

Table 10 lists the results achieved by KNN in all of the experiments.

6. Discussion

The purpose of the present study was to develop an ML classifier system that may help to ease the diagnosis of COPD using FOT measurements. Although previous conference papers have investigated the potential of ANN to ease the diagnosis of COPD using IOS [16,17] and FOT [20], to the authors' knowledge this is the first study dedicated to comparing the performance of several ML algorithms in the development of an automatic classifier to help the diagnosis of COPD using FOT measurements. More specifically, we investigated the performance of the LBNC, KNN, DTREE, ANN and SVM algorithms. We also performed an input feature selection in order to find the smallest number of relevant and informative features that can result in a satisfactory performance [30]. Finally, we compared the performance of the classifiers in order to identify the most adequate method to detect COPD. It has been shown that, in general, all of the studied algorithms were able to adequately detect COPD. However, it is important to point out that some classifiers perform this task better than others. Interestingly, the feature selection allowed a reduction of the features used without a significant reduction in performance. Furthermore, ROC analysis showed that three of the studied algorithms in particular presented great potential to contribute to the automatic detection of the respiratory effects of COPD in a clinical setting.

Table 8 – Selected parameters for different selected features.

| Selected features | Classifiers | Parameter | Value | Average accuracy |
|---|---|---|---|---|
| All features | DTREE | Splitting criterion; pruning type | Purity; None | 0.89 |
| All features | ANN | Number of hidden nodes | 8 | 0.95 |
| All features | SVM | Regularization parameter (C); radius (r) | 8; 0.707 | 0.96 |
| fr, Xm, R0, Crs,dyn, \|Zrs\| | DTREE | Splitting criterion; pruning type | Purity; None | 0.90 |

Table 9 – Selected parameters for different selected features.

| Selected features | Classifiers | Parameter | Value | Average accuracy |
|---|---|---|---|---|
| fr, Xm, R0, \|Zrs\| | DTREE | Splitting criterion; pruning type | Purity; None | 0.89 |
| fr, Xm, R0, \|Zrs\| | SVM | Regularization parameter (C); radius (r) | 22.627; 0.42 | 0.94 |
| fr, R0, Crs,dyn | | Pruning type | None | |
| fr, R0, Crs,dyn | SVM | Regularization parameter (C); radius (r) | 8; 0.5 | 0.95 |
| R0, Crs,dyn | | Pruning type | None | |
| R0, Crs,dyn | SVM | Regularization parameter (C); radius (r) | 0.25; 1 | 0.91 |

Table 10 – Comparison of the results achieved by KNN in all of the experiments.

| Experiment | Acc | Se | Sp | AUC |
|---|---|---|---|---|
| All features | 0.97 ± 0.04 | 0.96 ± 0.09 | 0.99 ± 0.04 | 1.00 ± 0.00 |
| fr, Xm, R0, Crs,dyn, \|Zrs\| | 0.95 ± 0.04 | 0.93 ± 0.09 | 0.97 ± 0.05 | 1.00 ± 0.00 |
| fr, Xm, R0, \|Zrs\| | 0.95 ± 0.04 | 0.92 ± 0.07 | 0.97 ± 0.06 | 0.99 ± 0.03 |
| fr, R0, Crs,dyn | 0.95 ± 0.09 | 0.93 ± 0.10 | 0.97 ± 0.09 | 0.99 ± 0.03 |
| R0, Crs,dyn | 0.93 ± 0.05 | 0.91 ± 0.09 | 0.96 ± 0.07 | 0.99 ± 0.01 |

The analysis of ROC curves is performed by plotting sensitivity versus 1 − specificity for each possible cut-off level. The larger the area under the curve (AUC), the more valid the diagnostic test is. This parameter has the clinically useful interpretation of representing the probability of correctly discriminating between two subjects in a randomly selected pair of abnormal and normal subjects [44,45]. According to the literature, ROC curves with AUCs between 0.50 and 0.70 indicate low diagnostic accuracy, AUCs between 0.70 and 0.90 indicate moderate accuracy, and AUCs between 0.90 and 1.00 indicate high accuracy [46,47].
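The probabilistic interpretation of the AUC can be made concrete by counting, over every abnormal/normal pair, how often the abnormal subject receives the higher classifier score. The sketch below does this directly and maps the result to the accuracy bands quoted above; the scores are synthetic, the function names are illustrative, and assigning a boundary value such as 0.90 to the higher band is an assumption, since the text does not say which band the boundaries belong to.

```python
def auc_pairwise(abnormal_scores, normal_scores):
    """Fraction of (abnormal, normal) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


def accuracy_band(auc):
    """Map an AUC to the diagnostic accuracy bands quoted in the text."""
    if auc >= 0.90:
        return "high"
    if auc >= 0.70:
        return "moderate"
    if auc >= 0.50:
        return "low"
    return "below chance"


# Synthetic classifier scores (higher means "more likely abnormal").
abnormal = [0.9, 0.8, 0.6, 0.55]
normal = [0.7, 0.4, 0.3, 0.2]
auc = auc_pairwise(abnormal, normal)
print(auc, accuracy_band(auc))
```

Here 14 of the 16 possible pairs are ordered correctly, so the AUC is 0.875 and falls in the moderate band.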

Taking these values into consideration, all of the studied classifiers reached high levels of accuracy when all features were used (experiment 1, Fig. 1 and Table 2). KNN was the most adequate algorithm to correctly identify COPD (AUC = 1.00), followed by SVM (AUC = 1.00) and ANN (AUC = 0.97). Statistical comparisons showed that KNN was significantly better than LBNC and DTREE.

The results obtained using the five features chosen by the forward selection search strategy, described in Fig. 2 and Table 4, were consistent with those obtained using all seven features, showing that KNN was the most adequate algorithm to correctly identify COPD (AUC = 1.00), followed by SVM (AUC = 0.98) and ANN (AUC = 0.96). Once again, statistical comparisons showed a better performance of KNN when compared with DTREE.

Although we could not observe statistically significant differences among the performance of the classifiers, the KNN algorithm also presented the highest value of AUC (0.99) considering the results obtained using the four features chosen by the forward floating selection and the backward search strategies. These results are described in Fig. 3 and Table 5. The performance of KNN was followed by ANN (AUC = 0.97) and by SVM and LBNC (AUC = 0.96). These results were similar to those observed when the number of features was reduced further (Fig. 4 and Table 6), using three features selected by the analysis of the correlation coefficients.

In the last experiment, conducted using only two features selected by the analysis of the correlation coefficients (Fig. 5 and Table 7), KNN was also the classifier with the highest AUC (0.99). However, in this experiment, one cannot say that a particular classifier dominates the others. Different sections of the curve are dominated by different classifiers (Fig. 5).

Recommendations for research in COPD [48] include the need for improved noninvasive mechanical tests of lung function. The present study was conducted as an effort to contribute in this direction, and showed that FOT measurements, integrated with machine learning algorithms, may constitute a very promising system able to noninvasively and accurately diagnose COPD. We observed high values of AUC for all of the classifiers and feature sets studied, and statistically significant differences in the first experiment between KNN and LBNC and between KNN and DTREE, and in the second experiment between KNN and DTREE. This means that in all other cases one can use any of the five classifiers. However, if one looks at the average values of the performance measures (Tables 3–7), the classifiers that performed best in all experiments were KNN and SVM. They were followed by ANN and then by LBNC and DTREE. In fact, KNN was the most adequate classifier to correctly identify the respiratory modifications in the studied COPD patients.

Although the FOT may be very useful in clinical practice, this technique has not been widely used in the medical community due to its lack of specificity, which is associated with the bias from the upper airway shunt. It is interesting to note that the use of machine learning algorithms produced very accurate results (Tables 3–7 and 10). We believe that these results may help to increase the acceptance of the FOT in the medical community.

The five experiments were made using different feature selection methods. None of them provided better results than the experiment that used all features, i.e., all FOT parameters. This means that all FOT parameters are relevant. However, by analyzing the experiments that selected specific features, one can observe a ranking among the FOT parameters. This can be seen in Table 10, which lists the results achieved by KNN in all the experiments. The small decrease in the performance measurements indicates that fr, R0 and Crs,dyn are the most important parameters. This agrees with the analysis of the correlation coefficients. The use of fewer parameters (fr, R0, Crs,dyn or R0, Crs,dyn) simplifies the analysis and still keeps a high degree of accuracy.
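A ranking of this kind can be illustrated by sorting features by the absolute value of their correlation with the class label. The sketch below assumes the Pearson correlation against a 0/1 label and uses synthetic values with hypothetical feature names (feat_a, feat_b, feat_c); the exact correlation analysis used in the study may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, labels):
    """Return feature names sorted by |correlation with labels|, highest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic data: three hypothetical features measured on six subjects,
# the first three labelled 0 (normal) and the last three labelled 1 (disease).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feat_c": [0.5, -0.2, 0.1, -0.4, 0.3, -0.1],
    "feat_a": [2.0, 2.2, 2.1, 4.0, 4.3, 3.9],
    "feat_b": [10, 12, 11, 18, 15, 20],
}
ranking = rank_features(features, labels)
print(ranking)
```

Keeping only the top entries of such a ranking gives a reduced feature set, in the same spirit as the three-feature and two-feature experiments discussed here.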

In relation to speed, it is known that KNN is very powerful and very fast to build, but it can take a long time to perform a classification if the training set is large [21]. Since the dataset in this case is small, KNN did not take long to perform a classification. The SVM classifier presents very good results. It is very fast to train and to perform the classification. The ANN takes a long time to build a classifier due to the training procedures, but it is very fast to perform a classification. It also does not provide any explanation of how it achieved the classification. The classifiers that present more interpretable results (LBNC and DTREE) have very similar performance. The DTREE suffered from the fact that the features are highly correlated. In the authors' opinion, considering the trade-off among accuracy, the time to build and train, and the time to perform a classification, KNN is the most appropriate choice if we use all the FOT parameters. It allows us to achieve a high degree of accuracy and an intuitive interpretation of the classification: the exam under test is classified as normal or COPD according to the training examples that are closest to it. This classifier is also a good choice if we want to use only two parameters, as described in Section 5.3.5.
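The nearest-neighbour rule, and the reason it is cheap to build but costly to apply on large training sets, can be sketched in a few lines. This is an illustration only: the value of k, the Euclidean distance, the two-dimensional points and the function name are assumptions made for the example, not the configuration used in the study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    There is no training step: the whole training set is stored, and every
    prediction scans it, which is why prediction slows down as the set grows.
    """
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Synthetic two-feature training set with two well-separated groups.
train_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["normal", "normal", "normal", "COPD", "COPD", "COPD"]

first = knn_predict(train_points, train_labels, (2.9, 3.1))
second = knn_predict(train_points, train_labels, (1.1, 1.0))
print(first, second)
```

The neighbours that produced a decision can be listed alongside it, which is what makes the classification easy to interpret.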

It is important to point out that the feature selection and associated results of this study are specific to COPD. Other diseases will result in different changes in the respiratory system and, thus, other parameters may be better suited to the identification of the respiratory changes. Even in pure emphysema, which is a disease associated with COPD, the authors recommend that a similar study be conducted and that the optimized conditions be obtained and used.

7. Conclusions

In this paper, we designed and evaluated several classifier systems and feature selection methods to develop a clinical decision support system to help the diagnosis of COPD using FOT measurements. KNN, SVM and ANN classifiers were the most adequate, reaching values that allow a very accurate clinical diagnosis. These classifiers allowed the identification of the respiratory modifications with a minimum sensitivity of 87% and a minimum specificity of 94%. The use of the analysis of correlation as a ranking index of the FOT parameters allowed us to simplify the analysis of the FOT parameters while still maintaining a high degree of accuracy.

8. Future plans

Based on these promising results, future work includes the following goals: (1) to add to the classification system the ability to identify the level of airflow obstruction in COPD (mild, moderate or severe); (2) to apply this methodology to the detection of early smoking-induced respiratory changes; and (3) to contribute to the diagnosis of airway obstruction in asthma.

Conflict of interest

None declared.

Acknowledgements

The authors would like to thank Josiel G. Santos for their technical assistance. The Brazilian Council for Scientific and Technological Development (CNPq) and the Rio de Janeiro State Research Supporting Foundation (FAPERJ) supported this study.

References

[1] The global initiative for chronic obstructive lung disease. Available from: <http://www.goldcopd.com> (accessed March 2011).

[2] World Health Organization. Available from: <http://www.who.int/respiratory/copd/burden/en/index.html> (accessed March 2011).

[3] P.L. Enright, R.M. Crapo, Controversies in the use of spirometry for early recognition and diagnosis of chronic obstructive pulmonary disease in cigarette smokers, Clin. Chest Med. 21 (4) (2000) 645–652.

[4] L. Ljung, System Identification: Theory for the User, Prentice-Hall Inc., London, 1987.

[5] A.B. Dubois, A.W. Brody, D.H. Lewis, B.F. Burges Jr., Oscillation mechanics of lungs and chest in man, J. Appl. Physiol. 8 (1956) 587–594.

[6] A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Evaluating the forced oscillation technique in the detection of early smoking-induced respiratory changes, Biomed. Eng. Online 25 (2009) 8–22.

[7] A.J. Orr, D.R. Westenskow, A breathing circuit alarm system based on neural networks, J. Clin. Monit. 10 (1994) 101–109.

[8] P. Bright, M.R. Miller, J.A. Franklyn, M.C. Sheppard, The use of a neural network to detect upper airway obstruction caused by goiter, Am. J. Respir. Crit. Care Med. 157 (1998) 1885–1891.

[9] M.A. Leon, J. Räsänen, D. Mangar, Neural network-based detection of esophageal intubation, Anesth. Analg. 78 (1994) 548–553.


[10] J. Räsänen, M.A. León, Detection of lung injury with conventional and neural network-based analysis of continuous data, J. Clin. Monit. 14 (1998) 433–439.

[11] G. Perchiazzi, M. Högman, C. Rylander, R. Giuliani, T. Fiore, G. Hedenstierna, Assessment of respiratory system mechanics by artificial neural networks: an exploratory study, J. Appl. Physiol. 90 (2001) 1817–1824.

[12] U. Uncü, Evaluation of pulmonary function tests by using fuzzy logic theory, J. Med. Syst. 34 (3) (2010) 241–250.

[13] A.J. Lopes, D. Capone, R. Mogami, R.S. Lanzillotti, P.L. Melo, J.M. Jansen, Severity classification for idiopathic pulmonary fibrosis by using fuzzy logic, Clinics 66 (6) (2011) 1015–1019.

[14] A.M.G.T. Di Mango, A.J. Lopes, J.M. Jansen, P.L. Melo, Changes in respiratory mechanics with degrees of airway obstruction in COPD: detection by forced oscillation technique, Respir. Med. 100 (3) (2006) 399–410.

[15] C. Ionescu, E. Derom, R. De Keyser, Assessment of respiratory mechanical properties with constant-phase models in healthy and COPD lungs, Comput. Methods Programs Biomed. 97 (1) (2010) 78–85.

[16] M. Barúa, H. Nazeran, P. Nava, V. Granda, B. Diong, Classification of pulmonary diseases based on impulse oscillometric measurements of lung function using neural networks, in: Conf. Proc. IEEE Eng. Med. Biol. Soc., 2004, pp. 3848–3851.

[17] M. Barúa, H. Nazeran, P. Nava, B. Diong, M. Goldman, Classification of impulse oscillometric patterns of lung function in asthmatic children using artificial neural networks, in: Conf. Proc. IEEE Eng. Med. Biol. Soc., 2005, pp. 327–331.

[18] D. Macleod, M. Birch, Respiratory input impedance measurements: forced oscillation methods, Med. Biol. Eng. Comput. 39 (2001) 505–516.

[19] J. Hellinckx, M. Cauberghs, K. De Boeck, M. Demedts, Evaluation of impulse oscillation system: comparison with forced oscillation technique and body plethysmography, Eur. Respir. J. 18 (2001) 564–570.

[20] J.L.M. Amaral, A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Automatic identification of chronic obstructive pulmonary disease based on forced oscillation measurements and artificial neural networks, in: 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Buenos Aires, Argentina, 2010.

[21] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.

[22] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley-Interscience, 2000.

[23] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, University of Minnesota Publisher, Addison-Wesley Copyright, 2006.

[24] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, 2005.

[25] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, Englewood Cliffs, 1994.

[26] V.N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer, New York, 2000.

[27] G.P. Zhang, Neural networks for classification: a survey, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 30 (2000) 451–462.

[28] C.E. Pedreira, L. Macrini, M.G. Land, E.S. Costa, New decision support tool for treatment intensity choice in childhood acute lymphoblastic leukemia, IEEE Trans. Inf. Technol. Biomed. 13 (2009) 284–290.

[29] M.H. Goldbaum, P.A. Sample, K. Chan, J. Williams, T.-W. Lee, E. Blumenthal, C.A. Girkin, L.M. Zangwill, C. Bowd, T. Sejnowski, R.N. Weinreb, Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry, Invest. Ophthalmol. Vis. Sci. 43 (1) (2002) 162–169.

[30] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.

[31] D.T. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Comput. 10 (1998) 1895–1923.

[32] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (8) (2006) 861–874.

[33] P. Refaeilzadeh, L. Tang, H. Liu, Cross Validation, Encyclopedia of Database Systems, Springer, 2009.

[34] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.

[35] J.V. Cavalcanti, A.J. Lopes, J.M. Jansen, P.L. Melo, Detection of changes in respiratory mechanics due to increasing degrees of airway obstruction in asthma by the forced oscillation technique, Respir. Med. 100 (12) (2006) 2207–2219.

[36] P.L. Melo, New impedance spectrometer for scientific and clinical studies on the respiratory system, Rev. Sci. Instrum. 71 (7) (2000) 2867–2872.

[37] P.L. Melo, M.M. Werneck, A. Giannella-Neto, Influence of the pressure generator non-linearities in the accuracy of respiratory input impedance measured by forced oscillation, Med. Biol. Eng. Comput. 38 (2000) 102–108.

[38] A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Assessment of respiratory mechanics in patients with sarcoidosis using forced oscillation: correlations with spirometric and volumetric measurements and diagnostic accuracy, Respiration 78 (1) (2009) 93–104.

[39] R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, D.M.J. Tax, S. Verzakov, PRTools 4.1, A Matlab Toolbox for Pattern Recognition, Delft University of Technology, 2007.

[40] C.W. Hsu, C.C. Chang, C.J. Lin, A practical guide to support vector classification. Available from: <www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf> (accessed October 2010).

[41] W.L. Martinez, A.R. Martinez, Exploratory Data Analysis with MATLAB, CRC Press, 2005.

[42] W.J. Conover, Practical Nonparametric Statistics, 3rd ed., Wiley, 1999.

[43] S.J. Delany, P. Cunningham, L. Coyle, An assessment of case-based reasoning for spam filtering, Artif. Intell. Rev. 24 (3) (2005) 359–378.

[44] J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1982) 29–36.

[45] J.A. Swets, R.M. Picket, Evaluation of diagnostic systems: methods from signal detection theory, Med. Phys. 10 (2) (1983) 266–267.

[46] J.A. Swets, Measuring the accuracy of diagnostic systems, Science 240 (1988) 1285–1293.

[47] R. Golpe, A. Jiménez, R. Carpizo, J.M. Cifrian, Utility of home oximetry as a screening test for patients with moderate and severe symptoms of obstructive sleep apnea, Sleep 22 (7) (1999) 932–937.

[48] T.L. Croxton, G.G. Weinmann, R.M. Senior, J.R. Hoidal, Future research directions in chronic obstructive pulmonary disease, Am. J. Respir. Crit. Care Med. 165 (2002) 838–844.


Elsevier OA license.
