Machine learning algorithms and forced oscillation measurements applied to the automatic identification of chronic obstructive pulmonary disease


journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Jorge L.M. Amaral a, Agnaldo J. Lopes b, José M. Jansen b, Alvaro C.D. Faria c, Pedro L. Melo c,∗

a Department of Electronics and Telecommunications Engineering, State University of Rio de Janeiro, Rio de Janeiro, Brazil
b Pulmonary Function Laboratory, Pedro Ernesto University Hospital, State University of Rio de Janeiro, Rio de Janeiro, Brazil
c Biomedical Instrumentation Laboratory, Institute of Biology Roberto Alcantara Gomes and Laboratory of Clinical and Experimental Research in Vascular Biology (BioVasc), State University of Rio de Janeiro, Rio de Janeiro, Brazil

Article info

Article history:
Received 1 April 2011
Received in revised form 15 August 2011
Accepted 22 September 2011

Keywords: Clinical decision support; Artificial intelligence; Classification; Forced oscillation technique; Respiratory system; Chronic obstructive pulmonary disease

Abstract

The purpose of this study is to develop a clinical decision support system based on machine learning (ML) algorithms to aid the diagnosis of chronic obstructive pulmonary disease (COPD) using forced oscillation (FO) measurements. To this end, the performances of classification algorithms based on the Linear Bayes Normal Classifier, K nearest neighbor (KNN), decision trees, artificial neural networks (ANN) and support vector machines (SVM) were compared in order to search for the best classifier. Four feature selection methods were also used in order to identify a reduced set of the most relevant parameters. The available dataset consists of 7 possible input features (FO parameters) of 150 measurements made in 50 volunteers (COPD, n = 25; healthy, n = 25). The performance of the classifiers and reduced datasets was evaluated by the determination of sensitivity (Se), specificity (Sp) and area under the ROC curve (AUC). Among the studied classifiers, KNN, SVM and ANN were the most adequate, reaching values that allow a very accurate clinical diagnosis (Se > 87%, Sp > 94%, and AUC > 0.95). The use of correlation analysis as a ranking index of the FOT parameters allowed us to simplify the analysis of the FOT parameters while still maintaining a high degree of accuracy. In conclusion, the results of this study indicate that the proposed classifiers may help ease the diagnosis of COPD using forced oscillation measurements.

© 2011 Elsevier Ireland Ltd.

1. Introduction

Chronic obstructive pulmonary disease (COPD) is a major cause of chronic morbidity and mortality throughout the world [1]. According to WHO estimates, 80 million people have moderate to severe COPD. More than 3 million people died of COPD in 2005, which corresponds to 5% of all deaths globally [2]. The chronic airflow limitation characteristic of COPD is caused by a mixture of small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema) [1]. There is agreement in the literature that new measurement technologies able to detect COPD in its early stages would contribute to decreasing medical and economic burdens [3].

∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (P.L. Melo).

0169-2607 © 2011 Elsevier Ireland Ltd. doi:10.1016/j.cmpb.2011.09.009

Open access under the Elsevier OA license.


Submitting a physical system to forced oscillations is a very general approach to the investigation of its structure and/or properties [4]. Its application to respiratory mechanics was first proposed by DuBois et al. [5]. This method, known as the forced oscillation technique (FOT), consists of applying small sinusoidal pressure variations to stimulate the respiratory system at frequencies higher than the normal breathing frequency and measuring the flow response. This method characterizes the respiratory impedance and its two components, respiratory system resistance (Rrs) and reactance (Xrs). The method is simple and requires only passive cooperation and no forced expiratory maneuvers. Recently, this technique has been successfully applied in the detection of early respiratory changes in smokers [6].

Although obtaining respiratory impedance values is easy, the resulting values are difficult for clinicians to understand, as they are based on an electrical equivalent circuit model of the respiratory system. In the context of a diagnostic framework, the interpretation of resistance and reactance curves, as well as of the derived parameters measured by the FOT, requires training and experience, and is a difficult task for the untrained pulmonologist.

Methods based on machine learning (ML) have been widely used to develop classifiers. These systems can extract information from different classes of signals after having been trained to perform this specific task by learning from examples. In respiratory mechanics, ML has proved useful as a pattern recognition method to optimize alarms of anesthesia breathing circuits [7] and in the detection of upper airway obstruction [8], esophageal intubation [9], assessment of lung injury [10], static compliance in animal models [11] and the evaluation of spirometric exams [12]. Recently, a severity classification for idiopathic pulmonary fibrosis using fuzzy logic was proposed [13].

2. Background

Previous works [14,15] have compared groups of controls and COPD patients, observing clear modifications in FOT parameters. However, categorization of pulmonary diseases by looking at the plotted curves of respiratory impedance or derived parameters can prove a difficult task for the untrained pulmonologist. This raises the question: can an ML-based approach to the analysis of FOT data provide an efficient method to recognize COPD? In fact, only two recent conference papers have addressed this question [16,17].

In the work of Baruá et al. [16], an artificial neural network (ANN) was used to recognize and classify the diseases of the central and peripheral airways. The authors used IOS measurements and a feedforward ANN that was trained by the backpropagation algorithm. After supervised training, the classifier produced a 98.47% and 61.53% correct classification rate when the same data and a new set of unseen data were used, respectively. It was pointed out that the proposed classifier could be further improved with the inclusion of more training samples combined with fuzzy logic decision rules.

In a later work by the same group [17], a classifier based on ANN was capable of distinguishing between relatively constricted and nonconstricted airway conditions in asthmatic children. The performance of the classifier was evaluated by two methods: (1) using all of the patterns during training as well as in the feed-forward stage and (2) using only 60% of the dataset during training, with the remaining 40% as unseen patterns. The classification accuracies obtained were 95.01% and 98.61%, respectively. The authors concluded that ANNs can successfully be trained with impulse oscillation system (IOS) data, enabling them to generalize the IOS parameter relationships to classify previously unseen pulmonary patterns. The two cited studies used an IOS, which differs from the classical FOT in aspects including data processing and the parameters used to interpret raw data [18,19]. In addition, from a system identification point of view, the impulse excitation signal used in IOS is a much worse excitation signal than the multisine used in FOT. This difference is associated with the worse crest factor of the impulse signal.

In this context, we observed that there were no data in the literature concerning the use of ML algorithms associated with classical FOT measurements to aid clinicians in the identification of COPD. To help elucidate this question, our group recently investigated this possibility using the classical FOT associated with a classifier based on ANN [20]. Two feature selection methods (the analysis of the linear correlation and forward search) were used in order to identify a reduced set of the most relevant parameters. Two different training strategies for the ANNs were used, and the performance of the resulting networks was evaluated by the determination of accuracy, sensitivity (Se), specificity (Sp) and AUC. The ANN classifiers presented high accuracy (Se > 0.9, Sp > 0.9 and AUC > 0.9) with both the complete and the reduced sets of FOT parameters. This indicates that ANN classifiers may help ease the diagnosis of COPD using FOT measurements. Although these results were very promising, this initial work was limited to the investigation of an ANN-based classifier because we were interested in a direct comparison with the two previously cited works.

The purpose of the present study is to evaluate the performance of several ML algorithms in the development of an automatic classifier to aid the diagnosis of COPD using forced oscillation measurements.

The paper is organized as follows: a discussion of the design principles and implementation goals is presented in the next section. The healthy group and the COPD group are characterized in Section 4, along with a description of the measurement protocol. This section also presents the evaluated classifiers and describes the methods used for performance evaluation, comparisons among classifiers and feature selection. Section 5 presents the results and Section 6 discusses the results with respect to the search for the best classifier and parameters. Section 7 summarizes the main outcomes of this investigation and points to future steps in this research topic.

3. Design considerations

3.1. Classification system

The basic structure of a classification system comprises the input, the classifier and the output. In the present work, the inputs are the parameters provided by the FOT, the classifier is one of the pattern recognition algorithms chosen, and the output tells whether or not the input parameters indicate COPD.

The design process of a classification system involves several important aspects, such as the evaluation of the classifiers, the choice of the algorithms to be used, feature selection, selection of the best parameters and comparison of classifier performance. These aspects are briefly described in the following sections.

3.2. The studied classifiers

In this particular study, the following classification algorithms were evaluated:

• Linear Bayes Normal Classifier [21,22]
• K nearest neighbor [21]
• Decision trees [23,24]
• Artificial neural networks [25]
• Support vector machines [26]

These algorithms were chosen because they represent a wide variety of classifier algorithms, as seen in Lippmann's list of types of classifiers [21]. They are briefly described below; complete descriptions of the algorithms can be found in the references.

The Linear Bayes Normal Classifier (LBNC) achieves the minimum error, according to Bayesian decision theory, when the classes are normally distributed with equal covariance matrices. The LBNC is fast and simple to compute from the training data and provides a very straightforward interpretation, since its decision boundary is a hyperplane. In spite of its simplicity, it is reasonably robust, i.e., it can deliver surprisingly good results even when the classes do not follow normal distributions with equal covariance matrices [21].

The K nearest neighbor (KNN) is one of the simplest and most elegant classification methods in pattern recognition [21]. It is a type of instance-based learning, or lazy learning, which means that in the learning stage it simply stores a set of labeled instances (the training set). When a new query instance has to be classified, the algorithm finds the K training instances closest to the query point, using a similarity function usually based on the Euclidean distance. The classification is done by majority vote among the class labels of these K instances. If K = 1, the object is simply assigned to the class of its nearest neighbor.
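The KNN rule described above can be sketched in a few lines. This is only a minimal illustration, not the prtools implementation used in the study; the function name `knn_predict` and the two-feature data points are hypothetical and are not study data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training instance.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the labels of the k closest instances.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature points (illustrative values only).
train_X = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.2), (3.1, 2.9)]
train_y = ["healthy", "healthy", "COPD", "COPD"]
prediction = knn_predict(train_X, train_y, (2.9, 3.0), k=1)
```

With K = 1 the vote reduces to copying the label of the single nearest stored instance, which is the configuration the lazy-learning description implies needs no training step at all.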

A decision tree (DTREE) is a hierarchical structure that consists of nodes and branches [23]. There are three types of nodes: the root, which has only outgoing branches; the internal nodes, which have one incoming and two or more outgoing branches; and the terminal (leaf) nodes, which have no outgoing branches. All terminal nodes have a class label assigned to them [23]. Each nonterminal node in the tree represents a test on one of the attributes, and each branch that comes out of the node represents one of the possible outcomes of the test performed. A query instance is classified by starting at the root node, testing the attribute specified by this node, and then moving down the tree branch corresponding to the outcome of the test for this attribute. This process is repeated until a terminal node is reached, where the class label is given to the query instance.

An ANN is a massively parallel system [25] composed of many simple processing elements (neurons) whose function is determined by the network architecture, the connection strengths (synaptic weights) and the processing performed at the neurons. Neural networks are capable of acquiring knowledge through a learning process and of storing that knowledge in the synaptic weights. One of the most successful neural network architectures is the multilayer perceptron (MLP). It has been successfully applied to a variety of pattern recognition problems in industry, business, science [27] and medical diagnosis [27,28]. One of the most important features of a neural network is the ability to generalize what it has learned from the training procedure. This allows the network to deal with noise in the input data and to provide the correct outputs for new data patterns, i.e., data that were not used to train the network.

Support vector machines (SVM) are learning systems based on statistical learning theory [26], and they have been successfully used in a variety of classification and regression problems. For a two-class classification problem, the basic form of the SVM is a linear classifier that performs the classification by constructing a hyperplane that optimally separates the classes. The optimal hyperplane is the one that provides the maximal margin. (The margin is defined as the distance between the hyperplane and the closest training samples.) It can be proven that this particular solution has the highest generalization ability. This formulation can be generalized by applying a non-linear mapping to the training set. The data are transformed to a new high-dimensional feature space where the classes are more easily separable and an optimal hyperplane can be found. The radial basis function kernel is frequently used to accomplish this nonlinear mapping, and it is usually the first nonlinear mapping to consider. Although the decision surface (hyperplane) is linear in the high-dimensional space, it is no longer linear when seen in the original low-dimensional feature space, meaning that SVM can also be applied to data that are not linearly separable [29].

3.3. Feature selection

The purpose of input feature selection is to find the smallest number of relevant and informative features that can result in a satisfactory performance [30]. Other motivations for performing feature selection are general data reduction, limiting storage requirements, increasing algorithm speed, gaining knowledge about the process that generates the data and allowing data visualization (2D or 3D) [30]. It is also important because a large number of inputs implies the estimation of a large number of model parameters, which can be difficult with datasets of limited size [28].

Basically, there are three types of feature selection methods: filters, wrappers and embedded methods [30]. Filter methods provide a ranking of the features using a relevance index such as correlation coefficients or classical statistical tests (t-test, F-test, chi-squared, etc.). Wrappers normally apply an efficient search strategy to find the best features based on the performance of the machine learning algorithm, such as the classification accuracy. Embedded methods perform feature selection in the process of training and are usually specific to given learning machines, such as decision trees [30].


3.4. Performance evaluation

The evaluation of the classifiers plays a key role in classification system design. Its primary goal is to choose the best classifier and estimate its performance on future examples (generalization accuracy) [31]. The main components of this evaluation are the choice of the performance function, the evaluation structure and the comparison of different classifiers. There are several measures that can be used to assess the performance of a classifier, depending on the specific domain of application. Some of the commonly used measures are accuracy, sensitivity, specificity, True Positive Rate, False Positive Rate, Recall, Precision and the area under the Receiver Operating Characteristic (ROC) curve (AUC) [32].
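As a brief illustration of three of the measures listed above, the sketch below computes accuracy, sensitivity and specificity from true and predicted labels. The function name `binary_metrics` and the label vectors are hypothetical; 1 is assumed to denote the positive (disease) class.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical labels: 4 positive and 6 negative examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Sensitivity and specificity are reported separately because accuracy alone can hide a classifier that misses most positive cases when the classes are unbalanced.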

The evaluation structure is an important part of the design. In order to decide on the best classifier, one has to look at the generalization accuracy. This can be done using either Holdout or K-fold cross-validation procedures. In Holdout, the available data are divided into training and test datasets. The classifier is trained with the training dataset, and the performance of the trained classifier is evaluated on the test dataset to estimate the generalization accuracy. The problem with Holdout is that different Holdout sets (different splits) lead to different results. Also, depending on the available data, it is possible to end up with a very wide confidence interval for the accuracy [24]. In a K-fold cross-validation, all the available data are partitioned into k equal (or approximately equal) datasets or folds [33]. Each fold in turn is used for testing, while the remaining k−1 folds are used for training a classifier. The performance of each learning algorithm on each fold can be tracked using some pre-determined measure such as accuracy. Upon completion, k samples of the performance metric will be available, and different methodologies such as averaging can be used to obtain an aggregate measure from these samples, or the samples can be used in a statistical hypothesis test to compare two or more machine learning algorithms.
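The K-fold partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the folds are built so that each one holds the same number of examples per class, which is an added design choice rather than a requirement of K-fold cross-validation in general.

```python
def stratified_folds(labels, k):
    """Distribute example indices over k folds, class by class."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Deal the indices of each class round-robin so folds stay class-balanced.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def cross_validation_splits(labels, k):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    for test_fold in stratified_folds(labels, k):
        held_out = set(test_fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test_fold)

# Hypothetical labels: 10 examples of each class, split into 5 folds.
labels = [0] * 10 + [1] * 10
splits = list(cross_validation_splits(labels, 5))
```

Every example appears in exactly one test fold, so the k per-fold scores together cover the whole dataset once.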

The use of K-fold cross-validation allows us to estimate the performance of the model learned from the available data using one algorithm. In other words, it is possible to estimate its performance on unseen examples (the generalization capability of the algorithm). It can also be used to compare the performance of two or more different algorithms and identify the best algorithm for the available data or, alternatively, to help the designer choose the best set of parameters for a particular model.

The hypothesis test is another important element when one desires to compare two or more machine learning algorithms. In the hypothesis test, we want to verify whether there is no difference in the performance of two classifiers (null hypothesis) under a certain confidence level (usually 95%). For a comparison on one dataset, one can use Student's test (t-test) or one of its variations, for example the corrected resampled t-test [24]. Dietterich [31] points out that the use of a t-test carries a high risk of a Type I error, i.e., a risk of finding a difference where none exists, and recommends the 5×2 cross-validation or McNemar's test. In the case of multiple datasets from different domains, Demsar [34] recommends Wilcoxon's signed-ranks test, the Friedman test and post hoc tests.
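For reference, McNemar's test for two classifiers evaluated on the same examples can be sketched as below. This is a minimal version using the common continuity-corrected chi-squared statistic with one degree of freedom; the function name and the correctness vectors are hypothetical, and the exact variant used in any particular toolbox may differ.

```python
import math

def mcnemar_test(correct_a, correct_b):
    """Return (statistic, p_value) from per-example correctness flags (1/0)."""
    # Only the discordant pairs matter: one classifier right, the other wrong.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if a_only + b_only == 0:
        return 0.0, 1.0  # no disagreement, no evidence of a difference
    statistic = (abs(a_only - b_only) - 1) ** 2 / (a_only + b_only)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical outcomes: 9 examples both right, 10 only A right, 1 only B right.
correct_a = [1] * 9 + [1] * 10 + [0] * 1
correct_b = [1] * 9 + [0] * 10 + [1] * 1
statistic, p_value = mcnemar_test(correct_a, correct_b)
```

Examples on which both classifiers agree do not enter the statistic, which is why the test suits paired comparisons on a single test set.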

It is also important to mention that classifiers are sometimes evaluated not only by their performance measures, but also by their speed and scalability, robustness and interpretability. When looking at speed and scalability, one is interested in knowing how long it takes to construct the classifier, how long it takes to use the classifier and whether it is able to deal with datasets with several thousand points. If robustness is important, one tries to evaluate its capability of handling noise, missing values and irrelevant features. If interpretability is important, one tries to find out whether the classifier can give some explanation of how it arrived at the classification of a certain point of the dataset.

4. Methods

4.1. Subjects and spirometry

The objectives of the study were explained to all individuals and their written consent was obtained before inclusion in the study. The study was approved by the Medical Research Ethics Committee of the State University of Rio de Janeiro. The study involved a group of 25 COPD patients and a control group formed by 25 never-smoking subjects. The control group was formed mainly by students and employees of the State University of Rio de Janeiro, and was composed of healthy subjects who presented normal spirometry and no history of pulmonary or cardiac disease. The patients with COPD were recruited from the COPD outpatient clinic of the Pneumology Service of our University Hospital. The patients were in stable clinical condition.

COPD patients presented mild (n = 8), moderate (n = 9) and severe (n = 8) airflow obstruction, which was evaluated using the following parameters [6,14,35]: forced expiratory volume in the first second (FEV1), forced vital capacity (FVC), the FEV1/FVC ratio and the ratio of the forced expiratory flow (FEF) between 25% and 75% of FVC to FVC (FEF/FVC). These measurements were obtained for all patients in a sitting position, using a closed-circuit spirometer (Vitrace VT-139; Pro-médico, Rio de Janeiro, Brazil), and are presented as raw data and as percentages of the predicted values (% pred).

4.2. Forced oscillation technique

The instrumentation used for evaluation of respiratory impedance by FOT has been described in other studies [36,37]. Briefly, a pseudorandom sinusoidal signal with 2 cmH2O peak-to-peak amplitude, containing all harmonics of 2 Hz between 4 and 32 Hz, was applied by a loudspeaker. The pressure input was measured with a Honeywell 176PC pressure transducer (Microswitch, Boston, MA, USA), and the airway flow with a screen pneumotachograph coupled to a similar transducer with a matched frequency response. The signals were digitized at a rate of 1024 Hz, for periods of 16 s, by a personal computer, and a fast Fourier transform was computed using blocks of 4096 points with 50% overlap. To perform the FOT analysis, the volunteer remained in a sitting position, keeping the head in a normal position and breathing spontaneously through a mouthpiece. During the measurements, the subjects firmly supported their cheeks and mouth floor using both hands, while a nose clip was worn. A minimal coherence function of 0.9 was considered adequate [6,38]. Whenever the coherence computed for any of the studied frequencies was less than this threshold, the maneuver was not considered valid and the exam was repeated. Three measurements were made and the final result of the test was calculated as the mean of these three measurements.

To describe the resistive component of the FOT data, a linear regression analysis in the frequency range between 4 and 16 Hz was used to obtain the intercept resistance (R0) and the slope of the resistive component of the impedance (S). Using the same frequency range, a parameter commonly related to airway dimensions, the mean resistance (Rm), was also calculated [6,12,38]. The results associated with the reactance were interpreted using the mean reactance (Xm), the resonance frequency (fr) and the dynamic compliance of the respiratory system (Crs,dyn) [6,12,38]. Crs,dyn was estimated from the respiratory reactance at the oscillatory frequency of 4 Hz (Xrs4Hz) using the equation $X_{rs4Hz} = -1/(2\pi f C_{rs,dyn})$ [6,12,38]. The same frequency was used to evaluate the absolute value of the respiratory impedance (Z4Hz), which represents the total mechanical load of the respiratory system, including resistive and elastic effects [38].
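The parameter definitions above can be illustrated with a short sketch. It assumes ordinary least-squares regression for R0 and S, the standard compliance reactance relation X = −1/(2πfC) for Crs,dyn, and the usual magnitude |Z| = sqrt(R² + X²); the function names and the resistance and reactance values are hypothetical, not study data.

```python
import math

def resistance_parameters(freqs_hz, resistance):
    """Intercept (R0), slope (S) and mean (Rm) of resistance versus frequency."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_r = sum(resistance) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (r - mean_r) for f, r in zip(freqs_hz, resistance))
    slope = sxy / sxx                       # S
    intercept = mean_r - slope * mean_f     # R0, extrapolated to 0 Hz
    return intercept, slope, mean_r

def dynamic_compliance(xrs, freq_hz=4.0):
    """Crs,dyn from the reactance at one frequency: X = -1 / (2*pi*f*C)."""
    return -1.0 / (2.0 * math.pi * freq_hz * xrs)

def impedance_magnitude(rrs, xrs):
    """Absolute value of the impedance from its real and imaginary parts."""
    return math.hypot(rrs, xrs)

# Hypothetical resistance curve (cmH2O/L/s) sampled every 2 Hz from 4 to 16 Hz.
freqs = [4, 6, 8, 10, 12, 14, 16]
resistance = [2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2]
r0, s, rm = resistance_parameters(freqs, resistance)
crs_dyn = dynamic_compliance(-2.0)          # hypothetical Xrs4Hz of -2 cmH2O/L/s
z4 = impedance_magnitude(2.8, -2.0)
```

A negative reactance at 4 Hz yields a positive compliance, consistent with the elastic behavior that dominates at low oscillation frequencies.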

4.3. Feature selection

In order to find the appropriate set of inputs, all three types of feature selection methods described in Section 3.3 were used. The chosen filter method used the correlation coefficients as a ranking index. The analysis of the linear correlation coefficients was done by calculating the matrix C of correlation coefficients. Each element of this matrix represents the correlation coefficient between two features, C(feature i, feature j), or between a feature and the output, C(feature i, Output). The procedure to find the most relevant features started by looking for the feature that possesses the highest correlation coefficient with the output, C(feature i, Output). This feature is called the HCCFO (Highest Correlation Coefficient Feature with Output). The next step was to eliminate the features for which the following relation holds:

$$|C(\mathrm{HCCFO}, \mathrm{Feature})| > |C(\mathrm{Output}, \mathrm{HCCFO})| > |C(\mathrm{Output}, \mathrm{Feature})| \qquad (1)$$

This was done because, if relation (1) holds for a specific feature, the information it carries can be represented by the feature that has the highest correlation coefficient with the output (HCCFO). The process of feature selection using the correlation coefficients was performed using the cross-validation method [33]. The available dataset was divided into a fixed number of folds. Each fold had the same number of normal and COPD measurements. One of the folds was the test set and the remaining folds were used as the training set. The feature selection using the correlation coefficients was applied only to the training sets. Three search strategies (forward, backward and forward floating) were used in the wrapper methods, and the performance index was the 1-nearest neighbor leave-one-out classification performance. The embedded method was used only in the training of the decision tree.
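One possible reading of the correlation filter built on relation (1) is sketched below. It is an interpretation under stated assumptions rather than the authors' code: a single elimination pass is assumed, Pearson correlation is computed directly, and the feature names `f1`, `f2`, `f3` and all values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_filter(features, output):
    """features: dict name -> values. Returns (hccfo, kept_feature_names)."""
    corr_out = {name: abs(pearson(v, output)) for name, v in features.items()}
    hccfo = max(corr_out, key=corr_out.get)   # highest correlation with output
    kept = [hccfo]
    for name, values in features.items():
        if name == hccfo:
            continue
        corr_with_hccfo = abs(pearson(values, features[hccfo]))
        # Relation (1): the feature is redundant given the HCCFO.
        redundant = corr_with_hccfo > corr_out[hccfo] > corr_out[name]
        if not redundant:
            kept.append(name)
    return hccfo, kept

# Hypothetical training data: f2 nearly duplicates f1, f3 is weakly related.
output = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [1, 2, 1, 2, 5, 6, 5, 6],
    "f2": [1, 2, 1, 2, 5, 6, 5, 7],
    "f3": [0, 1, 0, 1, 1, 0, 1, 1],
}
hccfo, kept = correlation_filter(features, output)
```

In this toy case `f2` is dropped because it is more strongly correlated with `f1` than `f1` is with the output, while `f3` survives because it is only weakly correlated with `f1`.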

4.4. Search for the best classifier parameters

The five classifiers (LBNC, KNN, DTREE, ANN and SVM) were implemented with a pattern recognition toolbox (prtools) for Matlab [39]. The LBNC was used with the default parameters, i.e., with no regularization. In the KNN, K was set to 1, giving the one-nearest-neighbor classifier. For all the other classifiers, the search for the best parameters was done with a 10-fold cross-validation using the average classification accuracy in the test folds as the performance index.

For the decision tree, the parameters searched were the binary splitting criterion (information gain, purity or Fisher criterion) and the pruning type (Quinlan pruning, no pruning or the use of a tuning set for pruning) [39].

In the ANN classifier, the parameter to be searched is the number of neurons in the hidden layer. The SVM classifier with a radial basis function kernel, on the other hand, has only two parameters to be found: the regularization parameter C, which expresses the trade-off between a large margin with more wrongly classified training samples and a small margin with fewer classification errors, and the parameter r, which is the radius of the radial basis kernel. Since these parameters do not take discrete values, a grid search was used. Various pairs of (C, r) values were tried and the one with the best cross-validation accuracy was picked. Since a complete grid search may still be time consuming, Hsu [40] recommended using a coarse grid first to find the "best region" and then a finer grid to search this region. It is important to notice that this parameter search has to be done for each set of selected features, i.e., there is no guarantee that the same parameter setting will work for all sets of selected features. For all experiments, the features were normalized to have zero mean and unit standard deviation. This is necessary to remove scale effects caused by the use of features that have different measurement scales [41].
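The coarse-to-fine grid search can be sketched as follows. This is an illustration only: the exponent ranges, the step size and `toy_score` (a stand-in for the cross-validation accuracy of an SVM trained with a given C and r) are all hypothetical assumptions, not settings reported in the study.

```python
import math

def grid_search(score, c_exponents, r_exponents):
    """Return (best_score, C, r) over a grid of powers of two."""
    best = None
    for c_exp in c_exponents:
        for r_exp in r_exponents:
            c, r = 2.0 ** c_exp, 2.0 ** r_exp
            candidate = (score(c, r), c, r)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best

def coarse_to_fine(score, coarse_c, coarse_r, step=0.25, half_width=4):
    """Coarse pass to locate the best region, then a finer pass around it."""
    _, c_best, r_best = grid_search(score, coarse_c, coarse_r)
    c_center, r_center = math.log2(c_best), math.log2(r_best)
    fine_c = [c_center + step * i for i in range(-half_width, half_width + 1)]
    fine_r = [r_center + step * i for i in range(-half_width, half_width + 1)]
    return grid_search(score, fine_c, fine_r)

def toy_score(c, r):
    # Hypothetical smooth surface standing in for cross-validation accuracy.
    return -((math.log2(c) - 3.0) ** 2 + (math.log2(r) + 0.5) ** 2)

best_score, best_c, best_r = coarse_to_fine(
    toy_score, range(-5, 16, 2), range(-15, 4, 2)
)
```

In practice `score` would retrain and cross-validate the classifier for every pair, so the coarse pass keeps the number of expensive evaluations small before the finer pass refines the region.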

All the classifiers were trained and evaluated with the same training and test sets, generated by a 10-fold cross-validation on the available dataset. The accuracy, Se, Sp and AUC were calculated in the 10 test sets. In addition, each test example in the test sets was assigned one of two possible outcomes: 1, meaning that the classification provided by the classifier was correct, and 0 otherwise. This allowed us to apply Cochran's Q test [42] to determine whether significant differences existed in the classification results. Besides Cochran's test, McNemar's test [31] was applied to each pair of classifiers to find whether significant differences existed [43]. Differences were considered statistically significant at p < 0.05, and the tests were implemented in Matlab 7.4.0 using the Statistics Toolbox 6.0.
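The 1/0 outcome coding described above is exactly the input that Cochran's Q statistic needs. The sketch below computes the statistic and its degrees of freedom using the standard textbook formula; the function name and the outcome matrix are hypothetical, and the p-value lookup against the chi-squared distribution is left out.

```python
def cochran_q(outcomes):
    """outcomes: one row per test example, one 0/1 entry per classifier.

    Returns (Q, degrees_of_freedom); Q is compared with a chi-squared
    distribution with k - 1 degrees of freedom.
    """
    k = len(outcomes[0])                               # number of classifiers
    column_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - grand_total ** 2)
    # The denominator is zero only if every example is unanimous across classifiers.
    denominator = k * grand_total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1

# Hypothetical outcomes for 6 test examples and 3 classifiers (1 = correct).
outcomes = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
q_statistic, dof = cochran_q(outcomes)
```

A significant Q only says that at least one classifier differs; pairwise tests such as McNemar's are then needed to locate the difference.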

5. Results

5.1. Characteristics of the subjects

The biometric and spirometric characteristics of the studied subjects are given in Table 1. The biometric characteristics of the two studied groups were well matched, and there were no significant differences between the groups. As can be seen in Table 1, patients with COPD presented significant reductions in the spirometric parameters (p < 0.0001).

Table 1 – Biometric and spirometric characteristics of the studied groups.

                  CG            COPD          p
Age (years)       55.2±16.7     61.4±9.7      ns (0.38)
Weight (kg)       65.4±11.8     66.0±8.4      ns (0.84)
Height (cm)       162.2±8.9     163.5±7.9     ns (0.58)
FEV1 (L)          2.8±0.9       1.4±0.7       <0.0001
FEV1 (% pred)     107.1±20.3    57.0±27.7     <0.0001
FEF/FVC (%)       100.3±32.3    28.5±18.3     <0.0001
FEV1/FVC (%)      87.9±10.0     55.0±16.7     <0.0001

Table 2 – Selected features using different strategies.

Search strategy      Selected features
Forward              fr, Xm, R0, Crs,dyn, |Zrs|
Backward             fr, Xm, R0, |Zrs|
Forward floating     fr, Xm, R0, |Zrs|

5.2. Feature selection

The most commonly selected features using correlation in the different training sets were (fr, R0, Crs,dyn) and (R0, Crs,dyn). The small number of selected features shows that the parameters are highly correlated. The results of the feature selection using the different search strategies (forward, backward and forward floating), with the 1-nearest neighbor leave-one-out classification accuracy as the performance index, are shown in Table 2. The search strategies were configured to find the number of features that gives the highest performance.

5.3. Performance of the studied classifiers using different feature selection methods

5.3.1. Experiment 1: use of all features (FOT parameters)

Fig. 1 shows the average ROC curve for each classifier, while Table 3 presents the average and the standard deviation of the derived parameters calculated in the 10 test folds for all of the studied classifiers. The results presented were obtained with the best parameters found for each classifier.

LBNC presented the best average Sp (1.00), and KNN presented the best average Acc (0.97) and AUC (1.00). On the other hand, SVM presented the best average Se (0.97) and AUC (1.00). The application of Cochran's test showed a statistically significant difference among the classifiers, and McNemar's test applied to all pairs of classifiers indicated that there was a statistically significant difference between KNN and LBNC, and between KNN and DTREE.

Fig. 1 – Average ROC curve for experiment 1.

Fig. 2 – Average ROC curve for experiment 2.

5.3.2. Experiment 2: forward selection search

The second experiment was carried out using the features chosen by the forward selection search strategy (fr, Xm, R0, Crs,dyn, |Zrs|). These results are described in Fig. 2, which shows the average ROC curve for each classifier, and Table 4. These results were obtained with the best parameters found for each classifier.

Table 3 – Results of experiment 1.

Classifier   Acc          Se           Sp           AUC
LBNC         0.89±0.07    0.78±0.14    1.00±0.00    0.95±0.05
KNN          0.97±0.04    0.96±0.09    0.99±0.04    1.00±0.00
DTREE        0.90±0.05    0.90±0.08    0.91±0.07    0.95±0.04
ANN          0.93±0.06    0.89±0.12    0.96±0.06    0.97±0.05
SVM          0.96±0.05    0.97±0.05    0.94±0.05    1.00±0.01

LBNC, Linear Bayes Normal Classifier; KNN, K nearest neighbor; DTREE, decision trees; ANN, artificial neural networks; SVM, support vector machines; Acc, accuracy; Se, sensitivity; Sp, specificity; AUC, area under the ROC curve.


Table 4 – Results of experiment 2.

Classifier   Acc          Se           Sp           AUC
LBNC         0.90±0.07    0.80±0.13    1.00±0.00    0.96±0.05
KNN          0.95±0.04    0.93±0.09    0.97±0.05    1.00±0.00
DTREE        0.89±0.07    0.88±0.13    0.91±0.09    0.95±0.04
ANN          0.94±0.05    0.92±0.09    0.96±0.07    0.96±0.06
SVM          0.95±0.05    0.95±0.09    0.95±0.07    0.98±0.03

Bold indicates the best values of accuracy, sensitivity, specificity and AUC.

Fig. 3 – Average ROC curve for experiment 3.

LBNC presented the best average Sp (1.00), while KNN presented the best Acc (0.95) and AUC (0.99). SVM presented the best average Acc (0.95) and Se (0.95). The application of Cochran's test showed a statistically significant difference among the classifiers, and McNemar's test applied to all pairs of classifiers indicated that there was a statistically significant difference between KNN and DTREE.

5.3.3. Experiment 3: forward floating selection

The third experiment was carried out using the features chosen by the forward floating selection and the backward search strategies, since both chose the same features (fr, Xm, R0, |Zrs|). These results are described in Fig. 3, which shows the average ROC curve for each classifier, and Table 5.

According to the results, LBNC presented the best average Sp (1.00), and KNN presented the best average Acc (0.95) and AUC (0.99). On the other hand, SVM presented the best average Acc (0.95) and Se (0.94). The application of Cochran's test did not show a statistically significant difference.

Fig. 4 – Average ROC curve for experiment 4.

5.3.4. Experiment 4: analysis of correlation coefficients

The fourth experiment was carried out using the features (fr, R0, Crs,dyn) selected by the analysis of the correlation coefficients. These results are presented in Fig. 4, which shows the average ROC curve for each classifier, and Table 6.

Using these features, LBNC presented the best average Sp (1.00), and KNN presented the best average Acc (0.95), Se (0.93) and AUC (0.99). The application of Cochran's test did not show a statistically significant difference between the classifier results.

5.3.5. Experiment 5: correlation coefficients

The fifth experiment was also carried out using features selected by the analysis of the correlation coefficients, in this case (R0, Crs,dyn). Fig. 5 shows the average ROC curve for each classifier, while Table 7 shows the associated parameters.

In these conditions, LBNC presented the best average Sp (1.00), KNN and ANN presented the best average Acc (0.93), KNN, DTREE and ANN presented the best average Se (0.91), and KNN presented the best AUC (0.99). The application of the Cochran test has not shown a statistically significant difference between the classifier results.

Table 5 – Results of experiment 3.

Classifier  Acc          Se           Sp           AUC
LBNC        0.89 ± 0.07  0.78 ± 0.14  1.00 ± 0.00  0.96 ± 0.04
KNN         0.95 ± 0.04  0.92 ± 0.07  0.97 ± 0.06  0.99 ± 0.03
DTREE       0.89 ± 0.07  0.89 ± 0.11  0.89 ± 0.09  0.95 ± 0.05
ANN         0.91 ± 0.08  0.87 ± 0.09  0.95 ± 0.10  0.97 ± 0.06
SVM         0.95 ± 0.05  0.94 ± 0.07  0.95 ± 0.09  0.96 ± 0.05

Table 6 – Results of experiment 4.

Classifier  Acc          Se           Sp           AUC
LBNC        0.90 ± 0.07  0.80 ± 0.13  1.00 ± 0.00  0.96 ± 0.05
KNN         0.95 ± 0.09  0.93 ± 0.10  0.97 ± 0.09  0.99 ± 0.03
DTREE       0.91 ± 0.08  0.89 ± 0.11  0.93 ± 0.11  0.96 ± 0.05
ANN         0.91 ± 0.05  0.89 ± 0.09  0.93 ± 0.09  0.95 ± 0.05
SVM         0.93 ± 0.11  0.91 ± 0.11  0.94 ± 0.14  0.96 ± 0.07

Bold indicates the best values of accuracy, sensitivity, specificity and AUC.

Table 7 – Results of experiment 5.

Classifier  Acc          Se           Sp           AUC
LBNC        0.90 ± 0.07  0.80 ± 0.13  1.00 ± 0.00  0.97 ± 0.05
KNN         0.93 ± 0.05  0.91 ± 0.09  0.96 ± 0.07  0.99 ± 0.01
DTREE       0.90 ± 0.07  0.91 ± 0.09  0.89 ± 0.15  0.95 ± 0.05
ANN         0.93 ± 0.05  0.91 ± 0.09  0.94 ± 0.07  0.97 ± 0.04
SVM         0.91 ± 0.07  0.82 ± 0.14  1.00 ± 0.00  0.97 ± 0.04

Bold indicates the best values of accuracy, sensitivity, specificity and AUC.

Fig. 5 – Average ROC curve for experiment 5.
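The tables report Acc, Se and Sp as an average with a spread. As a rough sketch of how such entries can be produced, the snippet below derives the three measures from per-fold confusion counts and summarises them. It assumes that the ± values are standard deviations over validation folds, which this section does not state. The helper `fold_metrics` and the counts are hypothetical.

```python
from statistics import mean, stdev


def fold_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from one fold's confusion counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    se = tp / (tp + fn)  # fraction of COPD exams detected
    sp = tn / (tn + fp)  # fraction of healthy exams correctly labelled normal
    return acc, se, sp


# Hypothetical (tp, fn, tn, fp) counts for three folds; not study data.
folds = [(7, 1, 7, 0), (8, 0, 6, 1), (6, 2, 7, 0)]
per_fold = [fold_metrics(*counts) for counts in folds]

# Transpose so that each measure gets its own sequence of per-fold values.
summary = {
    name: (mean(values), stdev(values))
    for name, values in zip(("Acc", "Se", "Sp"), zip(*per_fold))
}
for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.2f} ± {spread:.2f}")
```

Here `stdev` is the sample standard deviation. Whether the sample or the population form was used in the study is not known from the text.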

5.4. Search for the best classifier parameters

Tables 8 and 9 show the best parameters for each classifier and their average accuracies.

Table 8 – Selected parameters for different selected features.

Selected features            Classifiers  Parameter                     Value   Average accuracy
All features                 DTREE        Splitting criterion           Purity  0.89
                                          Pruning type                  None
                             ANN          Number of hidden nodes        8       0.95
                             SVM          Regularization parameter (C)  8       0.96
                                          Radius (r)                    0.707
fr, Xm, R0, Crs,dyn, |Zrs|   DTREE        Splitting criterion           Purity  0.90
                                          Pruning type                  None
                             ANN          Number of hidden nodes        7       0.92

Table 9 – Selected parameters for different selected features.

Selected features            Classifiers  Parameter                     Value   Average accuracy
fr, Xm, R0, |Zrs|            DTREE        Splitting criterion           Purity  0.89
                                          Pruning type                  None
                             ANN          Number of hidden nodes        3       0.90
                             SVM          Regularization parameter (C)  22.627  0.94
                                          Radius (r)                    0.42
fr, R0, Crs,dyn              DTREE        Splitting criterion           Purity  0.92
                                          Pruning type                  None
                             ANN          Number of hidden nodes        3       0.92
                             SVM          Regularization parameter (C)  8       0.95
                                          Radius (r)                    0.5
R0, Crs,dyn                  DTREE        Splitting criterion           Purity  0.89
                                          Pruning type                  None
                             ANN          Number of hidden nodes        3       0.91
                             SVM          Regularization parameter (C)  0.25    0.91
                                          Radius (r)                    1

5.5. Performance of the KNN classifiers using different feature selection methods

Table 10 lists the results achieved by KNN in all of the experiments.

Table 10 – Comparisons of the results achieved by KNN in all of the experiments.

Experiment                   Acc          Se           Sp           AUC
All features                 0.97 ± 0.04  0.96 ± 0.09  0.99 ± 0.04  1.00 ± 0.00
fr, Xm, R0, Crs,dyn, |Zrs|   0.95 ± 0.04  0.93 ± 0.09  0.97 ± 0.05  1.00 ± 0.00
fr, Xm, R0, |Zrs|            0.95 ± 0.04  0.92 ± 0.07  0.97 ± 0.06  0.99 ± 0.03
fr, R0, Crs,dyn              0.95 ± 0.09  0.93 ± 0.10  0.97 ± 0.09  0.99 ± 0.03
R0, Crs,dyn                  0.93 ± 0.05  0.91 ± 0.09  0.96 ± 0.07  0.99 ± 0.01

6. Discussion

The purpose of the present study was to develop an ML classifier system that may facilitate the diagnosis of COPD using FOT measurements. Although previous conference papers have investigated the potential of ANN to facilitate the diagnosis of COPD using IOS [16,17] and FOT [20], to the authors' knowledge, this is the first study dedicated to comparing the performance of several ML algorithms in the development of an automatic classifier to help the diagnosis of COPD using FOT measurements. More specifically, we investigated the performance of the LBNC, KNN, DTREE, ANN and SVM algorithms. We also performed an input feature selection in order to find the smallest number of relevant and informative features that can result in a satisfactory performance [30]. Finally, we compared the performance of classifiers in order to evaluate the most adequate method to detect COPD. It has been shown that, in general, all of the studied algorithms were able to adequately detect COPD. However, it is important to point out that some classifiers perform this task better than others. Interestingly, the feature selection allowed the reduction of the used features without a significant reduction in performance. Furthermore, ROC analysis showed that three of the studied algorithms in particular presented a great potential to contribute to the automatic detection of the respiratory effects of COPD in a clinical setting.

The analysis of ROC curves is performed by plotting sensitivity versus 1 − specificity for each possible cut-off level. This way, the larger the area under the curve (AUC), the more valid the diagnostic test is. This parameter has the clinically useful interpretation of representing the probability of correctly discriminating between two subjects in a randomly selected pair of abnormal and normal subjects [44,45]. According to the literature, ROC curves with AUCs between 0.50 and 0.70 indicate low diagnostic accuracy, AUCs between 0.70 and 0.90 indicate moderate accuracy, and AUCs between 0.90 and 1.00 indicate high accuracy [46,47].
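The pairwise interpretation of the AUC can be illustrated directly. The sketch below counts the fraction of (abnormal, normal) pairs in which the abnormal subject receives the higher classifier score, with ties counted as one half, and then maps the result to the accuracy bands quoted above. The scores are invented for illustration, and the handling of the exact band boundaries (0.70 and 0.90) is an assumption.

```python
def auc_pairwise(scores_abnormal, scores_normal):
    """AUC as the fraction of (abnormal, normal) pairs ranked correctly.

    A pair counts 1 when the abnormal subject scores higher, 0.5 on a tie.
    """
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abnormal) * len(scores_normal))


def accuracy_band(auc):
    """Bands from the text: 0.50-0.70 low, 0.70-0.90 moderate, 0.90-1.00 high."""
    if auc >= 0.90:
        return "high"
    if auc >= 0.70:
        return "moderate"
    if auc >= 0.50:
        return "low"
    return "below chance"


# Hypothetical classifier scores (higher means more likely COPD); not study data.
copd_scores = [0.9, 0.8, 0.7, 0.4]
healthy_scores = [0.6, 0.3, 0.2, 0.1]
auc = auc_pairwise(copd_scores, healthy_scores)
print(auc, accuracy_band(auc))
```

In this toy case 15 of the 16 pairs are ordered correctly, so the AUC is 0.9375 and falls in the high-accuracy band.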

Taking into consideration these values, all of the studied classifiers reached high levels of accuracy when all features were used (experiment 1, Fig. 1 and Table 2). KNN was the most adequate algorithm to correctly identify COPD (AUC = 1.00), followed by SVM (AUC = 1.00) and ANN (AUC = 0.97). Statistical comparisons showed that KNN was significantly better than LBNC and DTREE.

The results obtained using the five selected features chosen by the forward selection search strategy, described in Fig. 2 and Table 4, were coherent with those obtained using all seven features, showing that KNN was the most adequate algorithm to correctly identify COPD (AUC = 1.00), followed by SVM (AUC = 0.98) and ANN (AUC = 0.96). Once again, statistical comparisons showed a better performance of KNN when compared with DTREE.

Although we could not observe statistically significant differences among the performance of the classifiers, the KNN algorithm also presented the highest value of AUC (0.99) considering the results obtained using the four selected features chosen by the forward floating selection and the backward search strategies. These results are described in Fig. 3 and Table 5. The performance of KNN was followed by ANN (AUC = 0.97) and by SVM and LBNC (AUC = 0.96). These results were similar to those observed when further reducing the number of features (Fig. 4 and Table 6), an experiment conducted using three features selected by the analysis of the correlation coefficients.

In the last experiment, conducted using only two features selected by the analysis of the correlation coefficients (Fig. 5 and Table 7), KNN was also the classifier with the highest AUC (0.99). However, in this experiment, one cannot say that a particular classifier dominates the others. Different sections of the curve are dominated by different classifiers (Fig. 5).

Recommendations for research in COPD [48] include the need for improved noninvasive mechanical tests of lung function. The present study was conducted as an effort to contribute in this direction, and showed that FOT measurements, integrated with machine learning algorithms, may constitute a very promising system able to noninvasively and accurately diagnose COPD. We observed high values of AUC in all of the classifiers and features studied, and statistically significant differences in the first experiment between KNN and LBNC and between KNN and DTREE, and in the second experiment between KNN and DTREE. This means that in all other cases one can use any of the five classifiers. However, if one looks at the average values of the performance measures (Tables 3–7), the classifiers that performed best in all experiments were KNN and SVM. They were followed by ANN and then by LBNC and DTREE. In fact, KNN was the most adequate classifier to correctly identify the respiratory modifications in the studied COPD patients.

Although the FOT may be very useful in clinical practice, this technique has not been widely used in the medical community due to its lack of specificity, which is associated with the bias from the upper airway shunt. It is interesting to note that the use of machine learning algorithms produced very accurate results (Tables 3–7 and 10). We believe that these results may help to increase the acceptance of the FOT in the medical community.

The five experiments were made using different feature selection methods. None of them provided better results than the experiment that uses all features, i.e., all FOT parameters. This means that all FOT parameters are relevant. However, by analyzing the experiments that selected specific features, one can observe a ranking of the FOT parameters. This can be seen in Table 10, which lists the results achieved by KNN in all the experiments. The small decrease in the performance measurements indicates that fr, R0 and Crs,dyn are the most important parameters. This agrees with the analysis of the correlation coefficients. The use of fewer parameters (fr, R0, Crs,dyn or R0, Crs,dyn) simplifies the analysis and still keeps a high degree of accuracy.
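One way to use correlation as a ranking index can be sketched as follows. This sketch assumes that each parameter is scored by the absolute Pearson correlation between its values and a binary class label (0 for normal, 1 for COPD). The quantity the correlation was actually computed against is not restated in this section, so this choice is an assumption. The feature names and values are placeholders, not FOT data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_features(features, labels):
    """Feature names sorted by |correlation with the label|, strongest first."""
    strength = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(strength, key=strength.get, reverse=True)


# Placeholder data: six exams, three hypothetical features.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feat_a": [1, 2, 3, 4, 5, 6],  # increases with the label
    "feat_b": [3, 1, 2, 3, 1, 2],  # unrelated to the label
    "feat_c": [6, 5, 6, 2, 1, 2],  # decreases sharply with the label
}
ranking = rank_features(features, labels)
print(ranking)
```

A reduced feature set is then obtained by keeping the top entries of the ranking. The absolute value is used so that strongly negative correlations rank as high as strongly positive ones.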

In relation to speed, it is known that KNN is very powerful and very fast to build, but it can take a long time to perform a classification if the training set is large [21]. Since in this case the dataset is small, the KNN did not take long to perform a classification. The SVM classifier presents very good results. It is very fast to train and to perform the classification. The ANN takes a long time to build a classifier due to the training procedures, but it is very fast to perform a classification. It also does not provide any explanation of how it achieved the classification. The classifiers that present more interpretable results (LBNC and DTREE) have very similar performance. The DTREE suffered from the fact that the features are highly correlated. In the authors' opinion, considering the trade-off among accuracy, the time to build and train, and the time to perform a classification, if we use all the FOT parameters, the KNN is the most appropriate choice. It allows us to achieve a high degree of accuracy and an intuitive interpretation of the classification. In this case, the exam under test is classified as normal or COPD according to the training examples that are closest to it. This classifier is also a good choice if we want to use only two parameters, as described in Section 5.3.5.
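The nearest-neighbour decision described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance, majority voting and k = 3. The distance measure and the value of k used in the study are not given in this section. The two-feature training points are invented.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training examples."""
    # There is no training step: every prediction scans the whole training set,
    # which is why classification slows down as the training set grows.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already normalised two-feature exams; not study data.
train_x = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
           (0.80, 0.20), (0.90, 0.10), (0.85, 0.30)]
train_y = ["normal", "normal", "normal", "COPD", "COPD", "COPD"]

print(knn_predict(train_x, train_y, (0.75, 0.25)))
print(knn_predict(train_x, train_y, (0.20, 0.90)))
```

The neighbours that decide a prediction can be shown alongside the result, which is one reason the classification is easy to interpret.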

It is important to point out that the feature selection and associated results of this study are specific to COPD. Other diseases will result in different changes in the respiratory system and, thus, other parameters may be better suited to the identification of the respiratory changes. Even in pure emphysema, which is a disease associated with COPD, the authors recommend that a similar study be conducted so that the optimized conditions are obtained and used.

7. Conclusions

In this paper, we designed and evaluated several classifier systems and feature selection methods to develop a clinical decision support system to help the diagnosis of COPD using FOT measurements. KNN, SVM and ANN classifiers were the most adequate, reaching values that allow a very accurate clinical diagnosis. These classifiers allowed the identification of the respiratory modifications with a minimum sensitivity of 87% and a minimum specificity of 94%. The use of the analysis of correlation as a ranking index of the FOT parameters allowed us to simplify the analysis of the FOT parameters, while still maintaining a high degree of accuracy.

8. Future plans

Based on these promising results, future work includes the following goals: (1) to add to the classification system the ability to identify the level of airflow obstruction in COPD (mild, moderate or severe); (2) to apply this methodology in the detection of early smoking-induced respiratory changes; and (3) to contribute to the diagnosis of airway obstruction in asthma.

Conflict of interest

None declared.

Acknowledgements

The authors would like to thank Josiel G. Santos for their technical assistance. The Brazilian Council for Scientific and Technological Development (CNPq) and the Rio de Janeiro State Research Supporting Foundation (FAPERJ) supported this study.

References

[1] The global initiative for chronic obstructive lung disease. Available from: <http://www.goldcopd.com> (accessed March 2011).

[2] World Health Organization. Available from: <http://www.who.int/respiratory/copd/burden/en/index.html> (accessed March 2011).

[3] P.L. Enright, R.M. Crapo, Controversies in the use of spirometry for early recognition and diagnosis of chronic obstructive pulmonary disease in cigarette smokers, Clin. Chest Med. 21 (4) (2000) 645–652.

[4] L. Ljung, System Identification: Theory for the User, Prentice-Hall Inc., London, 1987.

[5] A.B. Dubois, A.W. Brody, D.H. Lewis, B.F. Burges Jr., Oscillation mechanics of lungs and chest in man, J. Appl. Physiol. 8 (1956) 587–594.

[6] A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Evaluating the forced oscillation technique in the detection of early smoking-induced respiratory changes, Biomed. Eng. Online 25 (2009) 8–22.

[7] A.J. Orr, D.R. Westenskow, A breathing circuit alarm system based on neural networks, J. Clin. Monit. 10 (1994) 101–109.

[8] P. Bright, M.R. Miller, J.A. Franklyn, M.C. Sheppard, The use of a neural network to detect upper airway obstruction caused by goiter, Am. J. Respir. Crit. Care Med. 157 (1998) 1885–1891.

[9] M.A. Leon, J. Räsänen, D. Mangar, Neural network-based detection of esophageal intubation, Anesth. Analg. 78 (1994) 548–553.


[10] J. Räsänen, M.A. León, Detection of lung injury with conventional and neural network-based analysis of continuous data, J. Clin. Monit. 14 (1998) 433–439.

[11] G. Perchiazzi, M. Högman, C. Rylander, R. Giuliani, T. Fiore, G. Hedenstierna, Assessment of respiratory system mechanics by artificial neural networks: an exploratory study, J. Appl. Physiol. 90 (2001) 1817–1824.

[12] U. Uncü, Evaluation of pulmonary function tests by using fuzzy logic theory, J. Med. Syst. 34 (3) (2010) 241–250.

[13] A.J. Lopes, D. Capone, R. Mogami, R.S. Lanzillotti, P.L. Melo, J.M. Jansen, Severity classification for idiopathic pulmonary fibrosis by using fuzzy logic, Clinics 66 (6) (2011) 1015–1019.

[14] A.M.G.T. Di Mango, A.J. Lopes, J.M. Jansen, P.L. Melo, Changes in respiratory mechanics with degrees of airway obstruction in COPD: detection by forced oscillation technique, Respir. Med. 100 (3) (2006) 399–410.

[15] C. Ionescu, E. Derom, R. De Keyser, Assessment of respiratory mechanical properties with constant-phase models in healthy and COPD lungs, Comput. Methods Programs Biomed. 97 (1) (2010) 78–85.

[16] M. Barúa, H. Nazeran, P. Nava, V. Granda, B. Diong, Classification of pulmonary diseases based on impulse oscillometric measurements of lung function using neural networks, in: Conf. Proc. IEEE Eng. Med. Biol. Soc., 2004, pp. 3848–3851.

[17] M. Barúa, H. Nazeran, P. Nava, B. Diong, M. Goldman, Classification of impulse oscillometric patterns of lung function in asthmatic children using artificial neural networks, in: Conf. Proc. IEEE Eng. Med. Biol. Soc., 2005, pp. 327–331.

[18] D. Macleod, M. Birch, Respiratory input impedance measurements: forced oscillation methods, Med. Biol. Eng. Comput. 39 (2001) 505–516.

[19] J. Hellinckx, M. Cauberghs, K. De Boeck, M. Demedts, Evaluation of impulse oscillation system: comparison with forced oscillation technique and body plethysmography, Eur. Respir. J. 18 (2001) 564–570.

[20] J.L.M. Amaral, A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Automatic identification of chronic obstructive pulmonary disease based on forced oscillation measurements and artificial neural networks, in: 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Buenos Aires, Argentina, 2010.

[21] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.

[22] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley-Interscience, 2000.

[23] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, University of Minnesota Publisher, Addison-Wesley Copyright, 2006.

[24] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, 2005.

[25] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, Englewood Cliffs, 1994.

[26] V.N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer, New York, 2000.

[27] G.P. Zhang, Neural networks for classification: a survey, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 30 (2000) 451–462.

[28] C.E. Pedreira, L. Macrini, M.G. Land, E.S. Costa, New decision support tool for treatment intensity choice in childhood acute lymphoblastic leukemia, IEEE Trans. Inf. Technol. Biomed. 13 (2009) 284–290.

[29] M.H. Goldbaum, P.A. Sample, K. Chan, J. Williams, T.-W. Lee, E. Blumenthal, C.A. Girkin, L.M. Zangwill, C. Bowd, T. Sejnowski, R.N. Weinreb, Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry, Invest. Ophthalmol. Vis. Sci. 43 (1) (2002) 162–169.

[30] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.

[31] D.T. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Comput. 10 (1998) 1895–1923.

[32] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (8) (2006) 861–874.

[33] P. Refaeilzadeh, L. Tang, H. Liu, Cross Validation, Encyclopedia of Database Systems, Springer, 2009.

[34] J. Demsar, Statistical comparisons of classifiers over multiple datasets, J. Mach. Learn. Res. 7 (2006) 1–30.

[35] J.V. Cavalcanti, A.J. Lopes, J.M. Jansen, P.L. Melo, Detection of changes in respiratory mechanics due to increasing degrees of airway obstruction in asthma by the forced oscillation technique, Respir. Med. 100 (12) (2006) 2207–2219.

[36] P.L. Melo, New impedance spectrometer for scientific and clinical studies on the respiratory system, Rev. Sci. Instrum. 71 (7) (2000) 2867–2872.

[37] P.L. Melo, M.M. Werneck, A. Giannella-Neto, Influence of the pressure generator non-linearities in the accuracy of respiratory input impedance measured by forced oscillation, Med. Biol. Eng. Comput. 38 (2000) 102–108.

[38] A.C.D. Faria, A.J. Lopes, J.M. Jansen, P.L. Melo, Assessment of respiratory mechanics in patients with sarcoidosis using forced oscillation: correlations with spirometric and volumetric measurements and diagnostic accuracy, Respiration 78 (1) (2009) 93–104.

[39] R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, D.M.J. Tax, S. Verzakov, PRTools 4.1, A Matlab Toolbox for Pattern Recognition, Delft University of Technology, 2007.

[40] C.W. Hsu, C.C. Chang, C.J. Lin, A practical guide to support vector classification. Available from: <www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf> (accessed October 2010).

[41] W.L. Martinez, A.R. Martinez, Exploratory Data Analysis with MATLAB, CRC Press, 2005.

[42] W.J. Conover, Practical Nonparametric Statistics, 3rd ed., Wiley, 1999.

[43] S.J. Delany, P. Cunningham, L. Coyle, An assessment of case-based reasoning for spam filtering, Artif. Intell. Rev. 24 (3) (2005) 359–378.

[44] J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1982) 29–36.

[45] J.A. Swets, R.M. Picket, Evaluation of diagnostic systems: methods from signal detection theory, Med. Phys. 10 (2) (1983) 266–267.

[46] J.A. Swets, Measuring the accuracy of diagnostic systems, Science 240 (1988) 1285–1293.

[47] R. Golpe, A. Jiménez, R. Carpizo, J.M. Cifrian, Utility of home oximetry as a screening test for patients with moderate and severe symptoms of obstructive sleep apnea, Sleep 22 (7) (1999) 932–937.

[48] T.L. Croxton, G.G. Weinmann, R.M. Senior, J.R. Hoidal, Future research directions in chronic obstructive pulmonary disease, Am. J. Respir. Crit. Care Med. 165 (2002) 838–844.

