
The impact of class imbalance in classification performance metrics based on the binary confusion matrix


Contents lists available at ScienceDirect

Pattern Recognition

journal homepage: www.elsevier.com/locate/patcog


Amalia Luque a,∗, Alejandro Carrasco b, Alejandro Martín a, Ana de las Heras a

a Dpto. Ingeniería del Diseño, Escuela Politécnica Superior, Universidad de Sevilla, Virgen de África, 7, 41011 Sevilla, Spain
b Dpto. Tecnología Electrónica, Escuela Politécnica Superior, Universidad de Sevilla, Virgen de África, 7, 41011 Sevilla, Spain

Article info

Article history:
Received 4 September 2018
Revised 22 December 2018
Accepted 22 February 2019
Available online 28 February 2019

Keywords:
Classification
Performance measures
Imbalanced datasets
Class Balance Metrics

Abstract

A major issue in the classification of class imbalanced datasets involves the determination of the most suitable performance metrics to be used. In previous work using several examples, it has been shown that imbalance can exert a major impact on the value and meaning of accuracy and on certain other well-known performance metrics. In this paper, our approach goes beyond simply studying case studies and develops a systematic analysis of this impact by simulating the results obtained using binary classifiers. A set of functions and numerical indicators is obtained which enables the comparison of the behaviour of several performance metrics based on the binary confusion matrix when they are faced with imbalanced datasets. Throughout the paper, a new way to measure the imbalance is defined which surpasses the Imbalance Ratio used in previous studies. From the simulation results, several clusters of performance metrics have been identified that involve the use of Geometric Mean or Bookmaker Informedness as the best null-biased metrics if their focus on classification successes (dismissing the errors) presents no limitation for the specific application where they are used. However, if classification errors must also be considered, then the Matthews Correlation Coefficient arises as the best choice. Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed which extends the concept of Class Balance Accuracy to other performance metrics.

© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)

1. Introduction

In recent years, the scientific community working on classification algorithms has shown an increasing interest in the challenges that arise when imbalanced datasets are considered. Several overviews of these issues can be found in [2,17,21,24]. In these analyses, the Synthetic Minority Over-sampling Technique (SMOTE) [9] and AdaBoost [39] are highlighted as general-purpose solutions, although algorithms of a more specific nature can also be found, either as general-purpose [3], as problem-oriented [15] or as classifier-oriented [33]. Algorithms that address the imbalance problem in multi-label classification [8] or that use advanced classifiers such as extreme learning machines [41] can also be found. An up-to-date comparison of these techniques addressing a specific problem can be found in [40].

Corresponding author.

E-mail addresses: [email protected] (A. Luque), [email protected] (A. Carrasco), [email protected] (A. Martín), [email protected] (A. de las Heras).

A key aspect of these methods involves the determination of the classification performance, not only in order to assess the final result, but also to obtain a figure which has to be optimized by tuning the classifier parameters. However, there is no single way to select the best algorithm, as any algorithm can obtain good results in one class but poor scores in other classes. For this reason, several metrics are usually considered, which permits the polyhedral characteristics of the classification performance to be viewed from different points of view.

The impact of class imbalance on classification performance metrics has therefore become a major issue. Several authors have addressed this topic by showing a few examples of this impact on accuracy [10,22] and on several other metrics [5,13,20]. To the best of our knowledge, only one systematic (albeit limited) study has been published that is not simply based on examples [25].

The quantitative research on classification performance metrics has traditionally been tackled either by using a collection of known and widely available datasets [1,4], or by randomly simulating the classifier results [27,34]. A mixture of random simulated variations on known datasets is employed in Jeni et al. [25].

https://doi.org/10.1016/j.patcog.2019.02.023


A. Luque, A. Carrasco and A. Martín et al. / Pattern Recognition 91 (2019) 216–231

In order to overcome the effect of imbalance on performance metrics, several solutions have been suggested, among which the most cited is the use of the Class Balanced Accuracy (CBA) [1,6,32]. Approaches such as relevance-based evaluation [5], the Normalized Precision Rate [13], the Index of Balanced Accuracy [18,28,29], and the multiclass performance score (MPS) [23] have also been proposed.

In this paper, an extensive and systematic study is undertaken of the impact of class imbalance on classification performance metrics. Several dozen performance metrics can be found in the scientific literature, some based on a threshold, others based on probabilities, while yet others are based on ranks [14]. However, the most widely employed metrics are those based on the confusion matrix [38], where the multi-class case is usually reduced to a set of binary cases using the One-versus-All or the One-versus-One approach [31]. For these reasons, our research is focused on classification performance metrics based on the binary confusion matrix.

The rest of the paper is organized as follows. Section 2 presents the methodology employed to measure the impact of imbalance in performance metrics, thereby formally defining the confusion matrix (Section 2.1) and the metrics based thereon (Section 2.2). Section 2.3 proposes a new figure for the quantification of the class imbalance, and several functions and indicators of the aforementioned impact of imbalance are defined in Section 2.4. The application to a set of classification performance metrics based on the binary confusion matrix is presented in Section 3. The discussion and conclusion of these results are addressed in Section 4.

2. Methodology

2.1. Definition of the confusion matrix

Consider a dataset $D = \{d_1, d_2, \cdots, d_m\}$ made up of $m$ elements, where $d_k$ represents the k-th element. Let $\Theta$ be a set of $C$ classes, $\Theta = \{\theta_1, \theta_2, \cdots, \theta_C\}$, where $\theta_i$ defines the i-th class. The classifier $\mathcal{C}$ operating on $d_k$ (the k-th element of the dataset $D$) assigns a label $\theta_j$ and estimates that this element belongs to the j-th class, that is, $d_k \xrightarrow{\mathcal{C}} \theta_j$ or $\mathcal{C}(d_k) = \theta_j$, while it really belongs to the i-th class $\theta_i$, thereby causing a misclassification (a confusion) when $i \neq j$.

Let $A = \{\alpha_1, \alpha_2, \cdots, \alpha_m\}$ be the set of actual classes corresponding to the dataset $D$, where $\alpha_k$ is the actual class of the element $d_k$. Furthermore, let $E = \{\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_m\}$ be the set of classes estimated by the classifier $\mathcal{C}$ for each element in $D$, where $\varepsilon_k$ is the estimated class of the element $d_k$. The performance of $\mathcal{C}$ can be assessed using a measuring function $\mathcal{M}$, which assigns a metric $\mu \in \mathbb{R}$ to the pair $(A, E)$, that is, $(A, E) \xrightarrow{\mathcal{M}} \mu$.

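In this paper, we will focus on metrics based on the confusion matrix, which represents one of the most common methods to present the results obtained by a classifier, and is defined as

$$
CM \equiv \begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1C} \\
m_{21} & m_{22} & \cdots & m_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
m_{C1} & m_{C2} & \cdots & m_{CC}
\end{pmatrix}. \tag{1}
$$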

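In this expression, $m_{ij}$ represents the number of elements actually belonging to the i-th class ($\theta_i$) but that are classified as members of the j-th class ($\theta_j$). In the context of our research, it is better to describe the term $m_{ij}$ in relation to the total number of elements $m_i$ belonging to the i-th class ($\theta_i$). By denoting $\lambda_{ij}$ as the ratio $m_{ij}/m_i$, the confusion matrix can be rewritten as

$$
CM = \begin{pmatrix}
\lambda_{11} m_1 & \lambda_{12} m_1 & \cdots & \lambda_{1C} m_1 \\
\lambda_{21} m_2 & \lambda_{22} m_2 & \cdots & \lambda_{2C} m_2 \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{C1} m_C & \lambda_{C2} m_C & \cdots & \lambda_{CC} m_C
\end{pmatrix}, \tag{2}
$$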

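which can be expressed as the Hadamard (element-wise) product of two matrices in the form

$$
CM = \begin{pmatrix}
\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1C} \\
\lambda_{21} & \lambda_{22} & \cdots & \lambda_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{C1} & \lambda_{C2} & \cdots & \lambda_{CC}
\end{pmatrix}
\circ
\begin{pmatrix}
m_1 & m_1 & \cdots & m_1 \\
m_2 & m_2 & \cdots & m_2 \\
\vdots & \vdots & \ddots & \vdots \\
m_C & m_C & \cdots & m_C
\end{pmatrix}. \tag{3}
$$

In the binary case, that is, when the number of classes is $C = 2$, the confusion matrix can be written as

$$
CM \equiv \begin{pmatrix}
m_{11} & m_{12} \\
m_{21} & m_{22}
\end{pmatrix}. \tag{4}
$$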

In general, one of the classes is called the “Positive” class and the other is named the “Negative” class. Therefore, the confusion matrix can be rewritten according to this new terminology as

$$
CM \equiv \begin{pmatrix}
m_{PP} & m_{PN} \\
m_{NP} & m_{NN}
\end{pmatrix}. \tag{5}
$$

The elements of this matrix are named with the following convention: $m_{PP}$, “True Positive” (TP); $m_{PN}$, “False Negative” (FN); $m_{NP}$, “False Positive” (FP); and $m_{NN}$, “True Negative” (TN).

The total numbers of positive ($m_P$) and negative ($m_N$) elements in $D$ sum to $m$, the total number of elements, that is, $m_P + m_N = m$. Furthermore, the number of elements of class P that are correctly classified ($m_{PP}$) and the number of elements of class P that are misclassified ($m_{PN}$) add up to the number of elements in the positive class ($m_P$), that is, $m_{PP} + m_{PN} = m_P$. Similarly, it can be stated that $m_{NP} + m_{NN} = m_N$. The confusion matrix can therefore be written as

$$
CM = \begin{pmatrix}
m_{PP} & m_P - m_{PP} \\
m_N - m_{NN} & m_{NN}
\end{pmatrix}, \tag{6}
$$

which can also be formulated in terms of the $\lambda_{ij}$ ratios as

$$
CM = \begin{pmatrix}
\lambda_{PP} m_P & \lambda_{PN} m_P \\
\lambda_{NP} m_N & \lambda_{NN} m_N
\end{pmatrix}
= \begin{pmatrix}
\lambda_{PP} m_P & (1 - \lambda_{PP}) m_P \\
(1 - \lambda_{NN}) m_N & \lambda_{NN} m_N
\end{pmatrix}. \tag{7}
$$

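Additionally, the total number of elements in $D$ estimated by $\mathcal{C}$ as positive (regardless of their actual class), $e_P$, and those estimated as negative, $e_N$, can be written as

$$
\begin{aligned}
e_P &= m_{PP} + m_{NP} = \lambda_{PP} m_P + (1 - \lambda_{NN}) m_N, \\
e_N &= m_{NN} + m_{PN} = \lambda_{NN} m_N + (1 - \lambda_{PP}) m_P.
\end{aligned} \tag{8}
$$

They also add up to the total number of elements, $e_P + e_N = m$. The definitions regarding the confusion matrix are summarized in Fig. 1.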
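The relations in Eqs. (7) and (8) can be illustrated with a short sketch. The function names and the example counts (80 positive and 20 negative elements) are illustrative assumptions and do not come from the paper.

```python
def confusion_from_ratios(lam_pp, lam_nn, m_p, m_n):
    """Build the binary confusion matrix entries of Eq. (7).

    lam_pp, lam_nn: per-class success ratios (lambda_PP, lambda_NN).
    m_p, m_n: number of actual positive and negative elements.
    Returns (TP, FN, FP, TN).
    """
    tp = lam_pp * m_p          # m_PP
    fn = (1.0 - lam_pp) * m_p  # m_PN
    fp = (1.0 - lam_nn) * m_n  # m_NP
    tn = lam_nn * m_n          # m_NN
    return tp, fn, fp, tn


def estimated_totals(tp, fn, fp, tn):
    """Return (e_P, e_N), the elements estimated as positive and negative, Eq. (8)."""
    return tp + fp, tn + fn


# Hypothetical example: 80 positives, 20 negatives.
tp, fn, fp, tn = confusion_from_ratios(0.75, 0.5, m_p=80, m_n=20)
e_p, e_n = estimated_totals(tp, fn, fp, tn)
```

In this example the estimated totals add up to the dataset size, as stated after Eq. (8).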

2.2. Metrics based on the binary confusion matrix

Based on the binary confusion matrix, numerous performance metrics have been proposed [19,27,30,34,37]. For our study, the focus is placed on 10 of these metrics, which are summarized in Table 1. All these metrics take values within the [0,1] range, except the last three (MCC, BM, and MK), whose ranges lie within the [−1,1] interval. For comparison purposes, these metrics are used herein in their normalized version (MCCn, BMn, and MKn). By naming a metric defined within the [−1,1] interval as $\mu$, it can be normalized within the [0,1] range by the expression

$$
\mu_n \equiv \frac{\mu + 1}{2}. \tag{9}
$$


Fig. 1. Confusion matrix for binary classification.

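Table 1
Definition of classification performance metrics.

| Symbol | Metric | Defined as |
|---|---|---|
| SNS | Sensitivity | $\frac{TP}{TP+FN}$ |
| SPC | Specificity | $\frac{TN}{TN+FP}$ |
| PRC | Precision | $\frac{TP}{TP+FP}$ |
| NPV | Negative Predictive Value | $\frac{TN}{TN+FN}$ |
| ACC | Accuracy | $\frac{TP+TN}{TP+FN+TN+FP}$ |
| F1 | F1 score | $2\,\frac{PRC \cdot SNS}{PRC+SNS}$ |
| GM | Geometric Mean | $\sqrt{SNS \cdot SPC}$ |
| MCC | Matthews Correlation Coefficient | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ |
| BM | Bookmaker Informedness | $SNS + SPC - 1$ |
| MK | Markedness | $PPV + NPV - 1$ (with $PPV = PRC$) |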
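As a minimal sketch, the definitions of Table 1 and the normalization of Eq. (9) can be written directly from the four counts. The function names and the example counts are assumptions made for illustration; degenerate matrices with an empty row or column are not handled.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute the ten metrics of Table 1 from the confusion-matrix counts.

    No guard is included for empty rows or columns: a zero denominator
    raises ZeroDivisionError.
    """
    sns = tp / (tp + fn)                      # sensitivity
    spc = tn / (tn + fp)                      # specificity
    prc = tp / (tp + fp)                      # precision (PPV)
    npv = tn / (tn + fn)                      # negative predictive value
    acc = (tp + tn) / (tp + fn + tn + fp)     # accuracy
    f1 = 2 * prc * sns / (prc + sns)          # F1 score
    gm = math.sqrt(sns * spc)                 # geometric mean
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    bm = sns + spc - 1                        # bookmaker informedness
    mk = prc + npv - 1                        # markedness
    return {"SNS": sns, "SPC": spc, "PRC": prc, "NPV": npv, "ACC": acc,
            "F1": f1, "GM": gm, "MCC": mcc, "BM": bm, "MK": mk}


def normalize(mu):
    """Map a metric defined on [-1, 1] to [0, 1], as in Eq. (9)."""
    return (mu + 1) / 2


# Hypothetical counts: TP=60, FN=20, FP=10, TN=10.
metrics = binary_metrics(tp=60, fn=20, fp=10, tn=10)
mcc_n = normalize(metrics["MCC"])
bm_n = normalize(metrics["BM"])
```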

It can easily be shown (see supplementary material in the electronic version of this paper) that all these metrics can be expressed as a function $\mu = \mu(\lambda_{PP}, \lambda_{NN}, \pi_P, \pi_N)$, where $\pi_P$ is the ratio of positive elements in the dataset ($m_P/m$) and, analogously, $\pi_N \equiv m_N/m$. Furthermore, for balanced classes, when $\pi_P = \pi_N = 0.5$, the metrics can be formulated as $\mu = \mu(\lambda_{PP}, \lambda_{NN})$. These functions are depicted in Fig. 2 as a heat map for each metric. The best classifier achieves a value of $\lambda_{PP} = 1$ (all the positive elements are correctly classified as positive) and also a value of $\lambda_{NN} = 1$ (all the negative elements are correctly classified as negative), corresponding to the upper right-hand-side corner in the graphic. Instead, the worst classifier ($\lambda_{PP} = 0$, $\lambda_{NN} = 0$) corresponds to the lower left-hand-side corner of the graphic.

Although only performance metrics based on the confusion matrix are considered, a marginal approach to Receiver Operating Characteristics (ROC) analysis [16] can also be carried out. In this analysis, the Area Under Curve (AUC) is commonly used as a performance metric. However, for classifiers offering only a label (and not a set of scores for each label), or when a single threshold is used on scores, the values of AUC and BMn are the same [36]. Therefore, in the forthcoming sections, whenever BMn is mentioned it could also be understood as AUC.
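The equality between AUC and BMn for a label-only classifier can be checked with a small sketch. It assumes that the ROC curve of such a classifier is the polyline through (0, 0), (FPR, TPR) and (1, 1), with the area computed by the trapezoidal rule; the function names are hypothetical.

```python
def single_point_auc(tpr, fpr):
    """Area under the ROC polyline (0,0) -> (fpr,tpr) -> (1,1)."""
    left = 0.5 * fpr * tpr                     # triangle under the first segment
    right = 0.5 * (1.0 - fpr) * (tpr + 1.0)    # trapezoid under the second segment
    return left + right


def bm_normalized(sns, spc):
    """Normalized Bookmaker Informedness: ((SNS + SPC - 1) + 1) / 2."""
    return (sns + spc) / 2.0


# Hypothetical classifier with SNS = 0.75 and SPC = 0.5 (so FPR = 1 - SPC = 0.5).
auc = single_point_auc(tpr=0.75, fpr=0.5)
bmn = bm_normalized(sns=0.75, spc=0.5)
```

Algebraically, the two areas add up to (TPR + 1 − FPR)/2, which is (SNS + SPC)/2.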

2.3. Defining class imbalance

The concept of class imbalance is relatively clear: it arises when the dataset has a different number of elements in the positive and negative classes. However, its formalization is far from being univocally accepted. For instance, in [18,29], the imbalance is characterized by the dominance (Dom) or prevalence relationship between the positive class and the negative class, and is defined as TPR − TNR. This value is later employed to compensate performance metrics affected by the imbalance problem. However, the dominance is not exactly a measure of the imbalance in the dataset because it considers imbalance in the outcomes of the classifier.

Other authors formalize this concept by using the entropy [12] or, more commonly, the proportion between positive and negative instances (formalized as 1:X) [13], which is similar to the imbalance ratio (IR) defined as $m_P/m_N$ [1], also called skew [25]. This value lies within the $[0, \infty]$ range, having a value IR = 1 in the balanced case.


In this paper, we prefer to characterize the imbalance with a value within the [−1,1] range, reserving the value 0 for when the classes are perfectly balanced (lack of imbalance). For this purpose, we propose the imbalance coefficient $\delta$ as

$$
\delta = \delta_P \equiv 2\pi_P - 1 = 2\,\frac{m_P}{m} - 1. \tag{10}
$$

For the negative class, the coefficient $\delta_N = 2\pi_N - 1$ is also defined. The sum of these coefficients is

$$
\delta_P + \delta_N = 2\pi_P - 1 + 2\pi_N - 1 = 2(\pi_P + \pi_N) - 2 = 0. \tag{11}
$$

Hence $\delta_N = -\delta_P = -\delta$. From (10), the value of $\pi_P$ can be obtained as

$$
\pi_P = \frac{1 + \delta}{2}. \tag{12}
$$

Moreover, the value of $\pi_N$ can be derived from (11)

$$
\pi_N = \frac{1 - \delta}{2}. \tag{13}
$$

Therefore, the metrics $\mu = \mu(\lambda_{PP}, \lambda_{NN}, \pi_P, \pi_N)$ can be redefined as $\mu = \mu(\lambda_{PP}, \lambda_{NN}, \delta)$. It is clear that the value of the metric $\mu$ depends not only on the classifier’s performance, but also on the imbalance $\delta$.

It can easily be derived that the relationship between the imbalance ratio (IR) and the imbalance coefficient ($\delta$) is

$$
IR = \frac{1 + \delta}{1 - \delta}. \tag{14}
$$
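Eqs. (10) and (12)–(14) translate into a few one-line conversions, sketched below. The function names and the 80/20 example are illustrative assumptions; the imbalance ratio is undefined in this sketch for $\delta = 1$ (no negative elements).

```python
def imbalance_coefficient(m_p, m_n):
    """Imbalance coefficient delta = 2 * m_P / m - 1, Eq. (10)."""
    return 2.0 * m_p / (m_p + m_n) - 1.0


def class_ratios(delta):
    """Return (pi_P, pi_N) from delta, Eqs. (12) and (13)."""
    return (1.0 + delta) / 2.0, (1.0 - delta) / 2.0


def imbalance_ratio(delta):
    """Imbalance ratio IR = m_P / m_N expressed through delta, Eq. (14)."""
    return (1.0 + delta) / (1.0 - delta)


# Hypothetical dataset with 80 positive and 20 negative elements.
delta = imbalance_coefficient(80, 20)
pi_p, pi_n = class_ratios(delta)
ir = imbalance_ratio(delta)
```

A balanced dataset gives $\delta = 0$ and IR = 1, while swapping the two classes only changes the sign of $\delta$.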

2.4. Assessing the impact of imbalance

In order to assess the impact of the imbalance in a certain metric, its value ($\mu_b$) for the balanced case (when $\delta = 0$) is first considered.

$$
\mu_b \equiv \mu\big|_{\delta=0} = \mu(\lambda_{PP}, \lambda_{NN}, 0) = \mu_b(\lambda_{PP}, \lambda_{NN}). \tag{15}
$$

This definition is later employed to propose a family of metrics where the effect of the imbalance is dismissed. In the scientific literature, a few examples of these metrics can be found, as in the case of Class Balance Accuracy (CBA) [35]. On generalizing this approach, the metrics $\mu_b$ are called Class Balance Metrics (CBM). Table 2 summarizes the equations for each metric, both in the class imbalance and balance cases. These results are derived in the supplementary material of the paper, available online.

With these definitions, it is now possible to quantify the impact of imbalance by using the bias of the metric, which is defined as

$$
B_\mu \equiv \mu - \mu_b = \mu(\lambda_{PP}, \lambda_{NN}, \delta) - \mu_b(\lambda_{PP}, \lambda_{NN}). \tag{16}
$$

Table 3 summarizes the definition of bias for each metric. These results are derived in the supplementary material of the paper, available online.
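The bias of Eq. (16) can be evaluated numerically without the closed forms, by building the confusion matrix in relative terms ($\pi_P$, $\pi_N$) and subtracting the balanced value. The sketch below covers only four metrics for brevity; the function names and the string keys are assumptions made for illustration.

```python
def metric_value(name, lam_pp, lam_nn, delta=0.0):
    """Evaluate a metric as mu(lambda_PP, lambda_NN, delta)."""
    pi_p, pi_n = (1.0 + delta) / 2.0, (1.0 - delta) / 2.0   # Eqs. (12), (13)
    tp, fn = lam_pp * pi_p, (1.0 - lam_pp) * pi_p           # relative TP, FN
    tn, fp = lam_nn * pi_n, (1.0 - lam_nn) * pi_n           # relative TN, FP
    if name == "ACC":
        return tp + tn
    if name == "PRC":
        return tp / (tp + fp)
    if name == "F1":
        return 2.0 * tp / (2.0 * tp + fp + fn)
    if name == "BMn":
        return (lam_pp + lam_nn) / 2.0
    raise ValueError(f"metric not covered by this sketch: {name}")


def bias(name, lam_pp, lam_nn, delta):
    """Bias B_mu = mu(., ., delta) - mu_b(., .), Eq. (16)."""
    return metric_value(name, lam_pp, lam_nn, delta) - metric_value(name, lam_pp, lam_nn, 0.0)


acc_bias = bias("ACC", 0.9, 0.6, 0.5)   # closed form: (delta / 2) * (lam_pp - lam_nn)
prc_bias = bias("PRC", 0.5, 0.5, 0.5)   # medium classifier: delta / 2
bmn_bias = bias("BMn", 0.9, 0.6, 0.5)   # null-biased metric
```

For accuracy, the numerical value agrees with the closed form $\frac{\delta}{2}(\lambda_{PP} - \lambda_{NN})$, which follows from writing ACC as $\lambda_{PP}\pi_P + \lambda_{NN}\pi_N$.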

As can be observed, bias depends on three variables: $B_\mu = B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)$. In order to study this function, a 4-dimensional space is required. Its representations are first tackled using heat volumes (3D), where each point in the 3-dimensional $(\lambda_{PP}, \lambda_{NN}, \delta)$ space has a bias-dependent colour.


Fig. 2. Heat maps for metrics with balanced classes.

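Table 2
Classification performance metrics as a function of imbalance.

| Metrics | Class imbalance metrics $\mu(\lambda_{PP}, \lambda_{NN}, \delta)$ | Class Balance metrics $\mu_b(\lambda_{PP}, \lambda_{NN})$ |
|---|---|---|
| SNS | $\lambda_{PP}$ | $\lambda_{PP}$ |
| SPC | $\lambda_{NN}$ | $\lambda_{NN}$ |
| PRC | $\frac{\lambda_{PP}(1+\delta)}{\lambda_{PP}(1+\delta)+(1-\lambda_{NN})(1-\delta)}$ | $\frac{\lambda_{PP}}{\lambda_{PP}+(1-\lambda_{NN})}$ |
| NPV | $\frac{\lambda_{NN}(1-\delta)}{\lambda_{NN}(1-\delta)+(1-\lambda_{PP})(1+\delta)}$ | $\frac{\lambda_{NN}}{\lambda_{NN}+(1-\lambda_{PP})}$ |
| ACC | $\lambda_{PP}\frac{1+\delta}{2}+\lambda_{NN}\frac{1-\delta}{2}$ | $\frac{\lambda_{PP}+\lambda_{NN}}{2}$ |
| F1 | $\frac{2\lambda_{PP}(1+\delta)}{(1+\lambda_{PP})(1+\delta)+(1-\lambda_{NN})(1-\delta)}$ | $\frac{2\lambda_{PP}}{2+\lambda_{PP}-\lambda_{NN}}$ |
| GM | $\sqrt{\lambda_{PP}\cdot\lambda_{NN}}$ | $\sqrt{\lambda_{PP}\cdot\lambda_{NN}}$ |
| MCCn | $\frac{1}{2}\left(\frac{\lambda_{PP}+\lambda_{NN}-1}{\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\frac{1-\delta}{1+\delta}\right]\left[\lambda_{NN}+(1-\lambda_{PP})\frac{1+\delta}{1-\delta}\right]}}+1\right)$ | $\frac{1}{2}\left(\frac{\lambda_{PP}+\lambda_{NN}-1}{\sqrt{[\lambda_{PP}+(1-\lambda_{NN})][\lambda_{NN}+(1-\lambda_{PP})]}}+1\right)$ |
| BMn | $\frac{\lambda_{PP}+\lambda_{NN}}{2}$ | $\frac{\lambda_{PP}+\lambda_{NN}}{2}$ |
| MKn | $\frac{1}{2}\left(\frac{1+\delta}{(1+\delta)+\frac{1-\lambda_{NN}}{\lambda_{PP}}(1-\delta)}+\frac{1-\delta}{(1-\delta)+\frac{1-\lambda_{PP}}{\lambda_{NN}}(1+\delta)}\right)$ | $\frac{1}{2}\left(\frac{1}{1+\frac{1-\lambda_{NN}}{\lambda_{PP}}}+\frac{1}{1+\frac{1-\lambda_{PP}}{\lambda_{NN}}}\right)$ |

Table 3
Bias of performance metrics due to class imbalance.

| Metrics | Bias $B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)$ |
|---|---|
| SNS | $0$ |
| SPC | $0$ |
| PRC | $\frac{1+\delta}{(1+\delta)+\frac{1-\lambda_{NN}}{\lambda_{PP}}(1-\delta)}-\frac{1}{1+\frac{1-\lambda_{NN}}{\lambda_{PP}}}$ |
| NPV | $\frac{1-\delta}{(1-\delta)+\frac{1-\lambda_{PP}}{\lambda_{NN}}(1+\delta)}-\frac{1}{1+\frac{1-\lambda_{PP}}{\lambda_{NN}}}$ |
| ACC | $\frac{\delta}{2}(\lambda_{PP}-\lambda_{NN})$ |
| F1 | $\frac{2\lambda_{PP}(1+\delta)}{(1+\lambda_{PP})(1+\delta)+(1-\lambda_{NN})(1-\delta)}-\frac{2\lambda_{PP}}{2+\lambda_{PP}-\lambda_{NN}}$ |
| GM | $0$ |
| MCCn | $\frac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\frac{1-\delta}{1+\delta}\right]\left[\lambda_{NN}+(1-\lambda_{PP})\frac{1+\delta}{1-\delta}\right]}}-\frac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{[\lambda_{PP}+(1-\lambda_{NN})][\lambda_{NN}+(1-\lambda_{PP})]}}$ |
| BMn | $0$ |
| MKn | $\frac{1}{2}\left(\frac{1+\delta}{(1+\delta)+\frac{1-\lambda_{NN}}{\lambda_{PP}}(1-\delta)}-\frac{1}{1+\frac{1-\lambda_{NN}}{\lambda_{PP}}}+\frac{1-\delta}{(1-\delta)+\frac{1-\lambda_{PP}}{\lambda_{NN}}(1+\delta)}-\frac{1}{1+\frac{1-\lambda_{PP}}{\lambda_{NN}}}\right)$ |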

Alternatively, $B_\mu$ is also represented as a set of heat maps (or contour graphs). Each heat map (2D) represents the metric bias for a certain fixed value of the imbalance (let us say $\delta_0$), thereby making bias dependent on two variables ($\lambda_{PP}$, $\lambda_{NN}$) and on one constant ($\delta_0$). Hence, $B_\mu = B_\mu(\lambda_{PP}, \lambda_{NN}, \delta_0)$.

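Nevertheless, drawing conclusions regarding a 4-dimensional space is, in most cases, a challenging task. Several partial views are therefore proposed that reduce the dimensionality of the bias function. In this respect, the first approach involves considering the metric bias for classifiers whose performance is singularly located on the $(\lambda_{PP}, \lambda_{NN})$ plane, thereby obtaining a set of bias indicators which are generically denoted as $\sigma B_\mu(\delta)$. The singular classifiers and their formulations are proposed in Table 4.

Table 4
Bias indicators for singular classifiers.

| Singular classifier | Symbol | $\sigma B_\mu(\delta)$ |
|---|---|---|
| Worst classifier | $wcB_\mu(\delta)$ | $\lim_{\varepsilon \to 0} B_\mu(\varepsilon, \varepsilon, \delta)$ |
| Best classifier | $bcB_\mu(\delta)$ | $\lim_{\varepsilon \to 0} B_\mu(1-\varepsilon, 1-\varepsilon, \delta)$ |
| Worst-positive classifier | $wpcB_\mu(\delta)$ | $\lim_{\varepsilon \to 0} B_\mu(\varepsilon, 1-\varepsilon, \delta)$ |
| Worst-negative classifier | $wncB_\mu(\delta)$ | $\lim_{\varepsilon \to 0} B_\mu(1-\varepsilon, \varepsilon, \delta)$ |
| Medium classifier | $mcB_\mu(\delta)$ | $B_\mu(0.5, 0.5, \delta)$ |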
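The singular-classifier indicators of Table 4 can be approximated numerically by replacing each limit with a small value of $\varepsilon$. The sketch below uses the F1 bias from Table 3; the function names and the choice $\varepsilon = 10^{-9}$ are assumptions, and the result is an approximation of the limit rather than its exact value.

```python
def f1_bias(lam_pp, lam_nn, delta):
    """F1 bias as listed in Table 3."""
    f1 = 2.0 * lam_pp * (1.0 + delta) / (
        (1.0 + lam_pp) * (1.0 + delta) + (1.0 - lam_nn) * (1.0 - delta)
    )
    f1_balanced = 2.0 * lam_pp / (2.0 + lam_pp - lam_nn)
    return f1 - f1_balanced


def singular_indicators(bias_fn, delta, eps=1e-9):
    """Approximate the five indicators of Table 4 for a given bias function."""
    return {
        "wc": bias_fn(eps, eps, delta),               # worst classifier
        "bc": bias_fn(1.0 - eps, 1.0 - eps, delta),   # best classifier
        "wpc": bias_fn(eps, 1.0 - eps, delta),        # worst-positive classifier
        "wnc": bias_fn(1.0 - eps, eps, delta),        # worst-negative classifier
        "mc": bias_fn(0.5, 0.5, delta),               # medium classifier
    }


indicators = singular_indicators(f1_bias, delta=0.5)
```

Substituting the medium classifier into the Table 3 expression gives $\frac{1+\delta}{2+\delta} - \frac{1}{2}$, and the worst-negative classifier gives $\frac{2(1+\delta)}{3+\delta} - \frac{2}{3}$, which the numerical values reproduce for $\delta = 0.5$.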

An alternative way to reduce the dimensionality of $B_\mu$ is through the consideration that $\lambda_{PP}$ and $\lambda_{NN}$ are randomly and uniformly distributed within the [0, 1] range. Bias can therefore be seen for each value of the imbalance coefficient $\delta$ as a random variable $B_\mu(\delta)$, which is characterized by its probability density function ($pdf[B_\mu(\delta)]$). Additionally, several local statistical indicators can be defined, which are generically denoted as $\psi B_\mu(\delta)$. The term local attributed to these indicators (summarized in Table 5) means that they are defined for each value of $\delta$.
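Under the uniform assumption on $\lambda_{PP}$ and $\lambda_{NN}$, the local statistical indicators are integrals over the unit square and can be approximated on a grid. The sketch below uses a midpoint rule and the accuracy bias $\frac{\delta}{2}(\lambda_{PP} - \lambda_{NN})$ from Table 3; the grid size and function names are assumptions, and the maximum is only approximated because the grid does not include the corners of the square.

```python
import math


def acc_bias(lam_pp, lam_nn, delta):
    """Accuracy bias from Table 3: (delta / 2) * (lam_pp - lam_nn)."""
    return 0.5 * delta * (lam_pp - lam_nn)


def local_indicators(bias_fn, delta, n=200):
    """Midpoint-rule estimates of mean, sd, rms and max |bias| at a fixed delta."""
    step = 1.0 / n
    grid = [(i + 0.5) * step for i in range(n)]          # cell midpoints in [0, 1]
    values = [bias_fn(p, q, delta) for p in grid for q in grid]
    count = len(values)
    mean = sum(values) / count
    rms = math.sqrt(sum(v * v for v in values) / count)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / count)
    maxa = max(abs(v) for v in values)
    return {"m": mean, "sd": sd, "rms": rms, "maxa": maxa}


stats = local_indicators(acc_bias, delta=1.0)
```

For accuracy the standard deviation has the closed form $|\delta| / (2\sqrt{6})$, since the difference of two independent uniform variables has variance 1/6. At $\delta = 1$ this is approximately 0.204, in agreement with the value reported for ACC in Table 11.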

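Table 5
Local statistical indicators on bias.

| Local statistical indicator | Symbol | $\psi B_\mu(\delta)$ |
|---|---|---|
| Mean | $mB_\mu(\delta)$ | $\int_0^1\!\!\int_0^1 B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)\, d\lambda_{PP}\, d\lambda_{NN}$ |
| Standard deviation | $sdB_\mu(\delta)$ | $\sqrt{\int_0^1\!\!\int_0^1 [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - mB_\mu(\delta)]^2\, d\lambda_{PP}\, d\lambda_{NN}}$ |
| Root-Mean-Square Bias | $rmsB_\mu(\delta)$ | $\sqrt{\int_0^1\!\!\int_0^1 B_\mu^2(\lambda_{PP}, \lambda_{NN}, \delta)\, d\lambda_{PP}\, d\lambda_{NN}}$ |
| Maximum absolute value | $maxaB_\mu(\delta)$ | $\max_{0 \le \lambda_{PP} \le 1,\; 0 \le \lambda_{NN} \le 1} \lvert B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) \rvert$ |
| Skewness | $skB_\mu(\delta)$ | $\frac{1}{sdB_\mu^3(\delta)} \int_0^1\!\!\int_0^1 [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - mB_\mu(\delta)]^3\, d\lambda_{PP}\, d\lambda_{NN}$ |
| Kurtosis | $kB_\mu(\delta)$ | $\frac{1}{sdB_\mu^4(\delta)} \int_0^1\!\!\int_0^1 [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - mB_\mu(\delta)]^4\, d\lambda_{PP}\, d\lambda_{NN}$ |

Definitions in Tables 4 and 5 have introduced several prospects of bias, all of which depend on the imbalance $\delta$. They are generically denoted as $xB_\mu(\delta) = \{\sigma B_\mu(\delta), \psi B_\mu(\delta)\}$. It should be observed that these are not numbers but functions. In order to obtain a single value derived from these functions, we can make the hypothesis that $\delta$ is randomly and uniformly distributed within the [−1,1] range. Mean values of each function can therefore be computed as

$$
\overline{xB_\mu} \equiv \frac{1}{2} \int_{-1}^{1} xB_\mu(\delta)\, d\delta. \tag{17}
$$

Another way to reduce the generic function $xB_\mu(\delta)$ to a single value is by focusing on its value for extremely positive-imbalanced datasets where $\delta \to 1$. A generic value can therefore be obtained through the expression

$$
xB_\mu^{\varepsilon P} \equiv \lim_{\varepsilon \to 0} xB_\mu(1 - \varepsilon). \tag{18}
$$

The corresponding negative counterpart is defined as

$$
xB_\mu^{\varepsilon N} \equiv \lim_{\varepsilon \to 0} xB_\mu(-1 + \varepsilon). \tag{19}
$$

A global regard of bias can also be undertaken by considering that $\lambda_{PP}$ and $\lambda_{NN}$ are randomly and uniformly distributed within the [0, 1] range, and also that $\delta$ lies within the [−1,1] range. Bias can now be seen as a random variable $B_\mu$ that is independent of the imbalance coefficient $\delta$. Bias can therefore be characterized by its probability density function ($pdf[B_\mu]$). Based on this overall pdf, several global statistical indicators can be defined, which are generically denoted as $\Psi B_\mu$ and are summarized in Table 6.

Table 6
Global statistical indicators on bias.

| Global statistical indicator | Symbol | $\Psi B_\mu$ |
|---|---|---|
| Mean | $MB_\mu$ | $\frac{1}{2} \iiint B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)\, d\lambda_{PP}\, d\lambda_{NN}\, d\delta$ |
| Standard deviation | $SDB_\mu$ | $\sqrt{\frac{1}{2} \iiint [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - MB_\mu]^2\, d\lambda_{PP}\, d\lambda_{NN}\, d\delta}$ |
| Root-Mean-Square Bias | $RMSB_\mu$ | $\sqrt{\frac{1}{2} \iiint B_\mu^2(\lambda_{PP}, \lambda_{NN}, \delta)\, d\lambda_{PP}\, d\lambda_{NN}\, d\delta}$ |
| Maximum absolute value | $MAXAB_\mu$ | $\max_{0 \le \lambda_{PP} \le 1,\; 0 \le \lambda_{NN} \le 1,\; 0 \le \delta \le 1} \lvert B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) \rvert$ |
| Skewness | $SKB_\mu$ | $\frac{1}{SDB_\mu^3}\, \frac{1}{2} \iiint [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - MB_\mu]^3\, d\lambda_{PP}\, d\lambda_{NN}\, d\delta$ |
| Kurtosis | $KB_\mu$ | $\frac{1}{SDB_\mu^4}\, \frac{1}{2} \iiint [B_\mu(\lambda_{PP}, \lambda_{NN}, \delta) - MB_\mu]^4\, d\lambda_{PP}\, d\lambda_{NN}\, d\delta$ |

In Table 6, the integrals extend over $0 \le \lambda_{PP} \le 1$, $0 \le \lambda_{NN} \le 1$ and $-1 \le \delta \le 1$.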
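The reductions of Eqs. (17)–(19) can be sketched for one indicator whose dependence on $\delta$ is known in closed form. The example below uses the standard deviation of the accuracy bias, $|\delta| / (2\sqrt{6})$, which follows from the Table 3 expression under the uniform assumption on $\lambda_{PP}$ and $\lambda_{NN}$; this closed form, the function names and the integration settings are assumptions of the sketch rather than formulas stated in the paper.

```python
import math


def sd_acc_bias(delta):
    """sd of the accuracy bias at a fixed delta, derived under the uniform assumption."""
    return abs(delta) / (2.0 * math.sqrt(6.0))


def mean_over_delta(fn, n=1000):
    """Eq. (17): (1/2) * integral of fn(delta) over [-1, 1], midpoint rule."""
    step = 2.0 / n
    total = sum(fn(-1.0 + (i + 0.5) * step) for i in range(n))
    return 0.5 * total * step


def extreme_positive(fn, eps=1e-9):
    """Eq. (18): value for extremely positive-imbalanced datasets (delta -> 1)."""
    return fn(1.0 - eps)


def extreme_negative(fn, eps=1e-9):
    """Eq. (19): value for extremely negative-imbalanced datasets (delta -> -1)."""
    return fn(-1.0 + eps)


mean_sd = mean_over_delta(sd_acc_bias)
sd_eps_p = extreme_positive(sd_acc_bias)
sd_eps_n = extreme_negative(sd_acc_bias)
```

The mean evaluates to $1/(4\sqrt{6}) \approx 0.102$ and both limits to $1/(2\sqrt{6}) \approx 0.204$, in agreement with the ACC entries for the standard deviation in Tables 9 and 11.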

Definitions in Eqs. (17)–(19) and in Table 6 have introduced several single-valued indicators regarding bias, which will generically be denoted as $XB_\mu = \{\overline{xB_\mu}, xB_\mu^{\varepsilon P}, xB_\mu^{\varepsilon N}, \Psi B_\mu\}$.

Throughout this subsection, several function and single-valued indicators have been introduced to assess the impact of dataset imbalance on classification performance metrics. A summary of these indicators is depicted in Fig. 3.

3. Results

3.1. Performance metric bias function

The methods described in Section 2 will now be applied to the ten metrics defined in Table 1. As explained above, bias depends on three variables, that is, $B_\mu = B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)$, and its formulation for the selected metrics is also shown in Table 3. The first approach for their representation is based on the heat volumes as depicted in Fig. 4, where each point in the 3-dimensional $(\lambda_{PP}, \lambda_{NN}, \delta)$ space has a bias-dependent colour.

In certain performance metrics (for instance in MCCn), bias has low values for many points in the $(\lambda_{PP}, \lambda_{NN}, \delta)$ space. In these cases, the expressive power of the whole range of colours is not completely exploited. It is therefore better to select the colour of each point not directly based on bias, but on the relative value of bias within the range of values for their corresponding metric. The colour-map is then rescaled to show the relative bias with the value −1 corresponding to the minimum bias, and the value +1 to the maximum. The result is depicted in Fig. 5, where the range of each metric is shown in its corresponding subplot title.

As explained above, $B_\mu$ can also be represented as a set of heat maps. Each heat map represents the metric bias for a certain fixed value of the imbalance, $B_\mu = B_\mu(\lambda_{PP}, \lambda_{NN}, \delta_0)$. The result for precision (PRC) is portrayed in Fig. 6. Similar graphics can be obtained for the remaining metrics.

Now let us suppose that the value of $\delta$ is known, for instance $\delta = \delta_0 = 0.95$. Therefore, $B_\mu = B_\mu(\lambda_{PP}, \lambda_{NN}, \delta_0)$ depends on only two variables ($\lambda_{PP}$ and $\lambda_{NN}$) and can be represented as a heat map. The results for each metric are shown in Fig. 7 where the colours represent the absolute value of bias.


Fig. 3. Functions and single-valued indicators assessing bias in performance metrics due to class imbalance.

Fig. 4. Heat volumes of bias for each performance metric.

This information can also be presented in the form of contour graphs as in Fig. 8 where the colours represent the absolute value of bias.

3.2. Bias indicators depending on $\delta$

In the previous section, several bias indicators depending only on the imbalance coefficient $\delta$ were defined. As mentioned therein, the first approach is to consider classifiers whose performance is singularly located on the $(\lambda_{PP}, \lambda_{NN})$ plane. The results obtained for each bias indicator and performance metric $\sigma B_\mu(\delta)$ are summarized in Table 7, where unbiased performance metrics (SNS, SPC, GM, BMn) have been omitted.

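Table 7
Bias indicators for singular classifiers: $\sigma B_\mu(\delta)$.

| Indicator | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|
| $wcB_\mu$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ |
| $bcB_\mu$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ |
| $wpcB_\mu$ | $\delta/2$ | $-\delta/2$ | $-\delta/2$ | $0$ | $0$ | $0$ |
| $wncB_\mu$ | $\delta/2$ | $-\delta/2$ | $\delta/2$ | $\frac{2(1+\delta)}{3+\delta}-\frac{2}{3}$ | $0$ | $0$ |
| $mcB_\mu$ | $\delta/2$ | $-\delta/2$ | $0$ | $\frac{1+\delta}{2+\delta}-\frac{1}{2}$ | $0$ | $0$ |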

It can be observed that only four types of non-null indicators appear. Their dependence on $\delta$ is plotted in Fig. 9.

Fig. 5. Heat volumes of relative bias for each performance metric.

Fig. 6. Set of heat maps of bias for precision.

Alternatively, it can be assumed that $\lambda_{PP}$ and $\lambda_{NN}$ are randomly and uniformly distributed within the [0, 1] range. $B_\mu(\delta)$ can then be statistically characterized. First, the probability density function ($pdf[B_\mu(\delta)]$) is derived for each performance metric. The results are shown in Fig. 10, where, for each $\delta$ and each bias $B_\mu$, the values of pdf are shown using different colours (dark blue corresponds to pdf = 0).

Additionally, several local statistical indicators have been defined (see Table 5) and these are generically denoted as $\psi B_\mu(\delta)$. The results obtained for each performance metric are depicted in Fig. 11. For easier reading, unbiased performance metrics (SNS, SPC, GM, BMn) have been omitted. Moreover, NPV presents symmetric behaviour to PRC and has also been disregarded from the graphs for the sake of simplicity.

3.3. Single-valued bias indicators

As has already been pointed out, there are various ways to obtain single-valued bias indicators. First, we consider bias functions $\sigma B_\mu(\delta)$ as summarized in Table 7. By assuming that $\delta$ is randomly and uniformly distributed within the [−1,1] range, the mean values of each measure can be computed. The results for each performance metric are shown in Table 8.

Now let us consider bias functions $\psi B_\mu(\delta)$ depicted in Fig. 11. By applying the same method, mean values of these measures can also be obtained. Results for each metric are shown in Table 9.


Fig. 7. Heat maps of bias for each performance metric ($\delta = 0.95$).

Fig. 8. Contour graphs of bias for each performance metric ($\delta = 0.95$).

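Table 8
Mean values of bias functions for singular classifiers $\sigma B_\mu(\delta)$.

| # | Symbol | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|---|
| 1 | $\overline{wcB_\mu}$ | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | $\overline{bcB_\mu}$ | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | $\overline{wpcB_\mu}$ | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | $\overline{wncB_\mu}$ | 0 | 0 | 0 | −0.053 | 0 | 0 |
| 5 | $\overline{mcB_\mu}$ | 0 | 0 | 0 | −0.049 | 0 | 0 |

Table 9
Mean values of bias of local statistical indicators $\psi B_\mu(\delta)$.

| # | Symbol | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|---|
| 6 | $\overline{mB_\mu}$ | 0 | 0 | 0 | −0.041 | 0 | 0 |
| 7 | $\overline{sdB_\mu}$ | 0.082 | 0.082 | 0.102 | 0.066 | 0.038 | 0.066 |
| 8 | $\overline{rmsB_\mu}$ | 0.228 | 0.228 | 0.102 | 0.135 | 0.038 | 0.066 |
| 9 | $\overline{maxaB_\mu}$ | 0.308 | 0.308 | 0.25 | 0.244 | 0.090 | 0.154 |
| 10 | $\overline{skB_\mu}$ | 0 | 0 | 0 | 0.129 | 0 | 0 |
| 11 | $\overline{kB_\mu}$ | 0.080 | 0.080 | −0.6 | −1.093 | 0.006 | 0.028 |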

Instead of computing the mean of a function, single-valued bias indicators can also be obtained by focusing on their value for extremely imbalanced datasets. By first considering the bias metrics of singular classifiers ($\sigma B_\mu(\delta)$), their results are shown in Table 10. Regarding the extremely imbalanced case for local statistical indicators ($\psi B_\mu(\delta)$), their results are shown in Table 11.

For a global view of bias, let us consider that $\lambda_{PP}$ and $\lambda_{NN}$ are randomly and uniformly distributed across the [0, 1] range while $\delta$ lies within the [−1,1] range, and then bias $B_\mu$ is statistically characterized. First, the probability density function ($pdf(B_\mu)$) is derived for each performance metric. The results are shown in Fig. 12, where NPV has been omitted from the graph because it shows symmetric behaviour to PRC and has the same pdf.

Finally, several global statistical indicators based on $pdf(B_\mu)$ have been defined (see Table 6), which are generically denoted as $\Psi B_\mu$. The results obtained for each performance metric are shown in Table 12.


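Table 10
Bias indicators on singular classifiers ($\sigma B_\mu(\delta)$) for extremely imbalanced datasets.

| # | Symbol | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|---|
| | Extremely positive-imbalanced | | | | | | |
| 12 | $wcB_\mu^{\varepsilon P}$ | 0.667 | 0 | 0 | 0 | 0.211 | 0.333 |
| 13 | $bcB_\mu^{\varepsilon P}$ | 0 | −0.667 | 0 | 0 | −0.211 | −0.333 |
| 14 | $wpcB_\mu^{\varepsilon P}$ | 0.5 | −0.5 | −0.5 | 0 | 0 | 0 |
| 15 | $wncB_\mu^{\varepsilon P}$ | 0.5 | −0.5 | 0.5 | 0.333 | 0 | 0 |
| 16 | $mcB_\mu^{\varepsilon P}$ | 0.5 | −0.5 | 0 | 0.167 | 0 | 0 |
| | Extremely negative-imbalanced | | | | | | |
| 17 | $wcB_\mu^{\varepsilon N}$ | 0 | 0.667 | 0 | 0 | −0.211 | 0.333 |
| 18 | $bcB_\mu^{\varepsilon N}$ | −0.667 | 0 | 0 | −0.5 | 0.211 | −0.333 |
| 19 | $wpcB_\mu^{\varepsilon N}$ | −0.5 | 0.5 | 0.5 | 0 | 0 | 0 |
| 20 | $wncB_\mu^{\varepsilon N}$ | −0.5 | 0.5 | −0.5 | −0.667 | 0 | 0 |
| 21 | $mcB_\mu^{\varepsilon N}$ | −0.5 | 0.5 | 0 | −0.5 | 0 | 0 |

Table 11
Bias measures on local statistical indicators ($\psi B_\mu(\delta)$) for extremely imbalanced datasets.

| # | Symbol | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|---|
| | Extremely positive-imbalanced | | | | | | |
| 22 | $mB_\mu^{\varepsilon P}$ | 0.5 | −0.5 | 0 | 0.137 | 0 | 0 |
| 23 | $sdB_\mu^{\varepsilon P}$ | 0.238 | 0.238 | 0.204 | 0.088 | 0.213 | 0.226 |
| 24 | $rmsB_\mu^{\varepsilon P}$ | 0.554 | 0.554 | 0.204 | 0.163 | 0.213 | 0.226 |
| 25 | $maxaB_\mu^{\varepsilon P}$ | 1 | 1 | 0.5 | 0.333 | 0.5 | 0.5 |
| 26 | $skB_\mu^{\varepsilon P}$ | 0 | 0 | 0 | 0.244 | 0 | 0 |
| 27 | $kB_\mu^{\varepsilon P}$ | −0.651 | −0.651 | −0.6 | −1.043 | −0.790 | −1.014 |
| | Extremely negative-imbalanced | | | | | | |
| 28 | $mB_\mu^{\varepsilon N}$ | −0.5 | 0.5 | 0 | −0.477 | 0 | 0 |
| 29 | $sdB_\mu^{\varepsilon N}$ | 0.238 | 0.238 | 0.204 | 0.241 | 0.213 | 0.226 |
| 30 | $rmsB_\mu^{\varepsilon N}$ | 0.554 | 0.554 | 0.204 | 0.534 | 0.213 | 0.226 |
| 31 | $maxaB_\mu^{\varepsilon N}$ | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 |
| 32 | $skB_\mu^{\varepsilon N}$ | 0 | 0 | 0 | 0.168 | 0 | 0 |
| 33 | $kB_\mu^{\varepsilon N}$ | −0.651 | −0.651 | −0.6 | −0.933 | −0.790 | −1.014 |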

Fig. 9. Four types of bias indicators for singular classifiers.

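Table 12
Global statistical indicators of bias for each performance metric.

| # | Symbol | PRC | NPV | ACC | F1 | MCCn | MKn |
|---|---|---|---|---|---|---|---|
| 34 | $MB_\mu$ | 0 | 0 | 0 | −0.041 | 0 | 0 |
| 35 | $SDB_\mu$ | 0.271 | 0.271 | 0.118 | 0.169 | 0.055 | 0.086 |
| 36 | $RMSB_\mu$ | 0.271 | 0.271 | 0.118 | 0.174 | 0.055 | 0.086 |
| 37 | $MAXAB_\mu$ | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 |
| 38 | $SKB_\mu$ | 0 | 0 | 0 | −1.269 | 0 | 0 |
| 39 | $KB_\mu$ | −0.046 | −0.046 | 1.32 | 2.043 | 6.9 | 3.061 |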

In this section, 39 single-valued bias indicators have been considered, which can also be presented in a graphical form, as shown in Fig. 13. Here, the bias indicators $\overline{\sigma B_\mu}$, detailed in Table 8, have been omitted, since all these indicators are null (except for F1 score) and therefore their comparison would be meaningless.

3.4. Symmetry of bias functions

In order to categorize the bias functions for each performance metric, it is convenient to study their symmetry. For this purpose, let us begin by considering the Matthews Correlation Coefficient (MCC), since this coefficient is particularly clear. Its heat map for an imbalance coefficient $\delta = 0.95$ is shown in the upper left-hand-side plot of Fig. 14. A first anti-clockwise 90° rotation on the $(\lambda_{PP}, \lambda_{NN})$ plane is performed, and the result is shown in the upper right-hand-side plot. The result of a second 90° rotation is shown in the lower right-hand-side graph. Finally, the sign of the bias values is changed, as shown in the lower left-hand-side plot. It can be observed that the result coincides with the original heat map.

The 180° rotation enables the bias to be defined on a new set of axes $(\Lambda_{PP}, \Lambda_{NN})$, which are related to the original $(\lambda_{PP}, \lambda_{NN})$ through the expressions $\Lambda_{PP} = 1 - \lambda_{PP}$; $\Lambda_{NN} = 1 - \lambda_{NN}$. This symmetry can therefore be formalized as

$$
B_{MCC}(\lambda_{PP}, \lambda_{NN}, \delta) = -B_{MCC}(\Lambda_{PP}, \Lambda_{NN}, \delta) = -B_{MCC}(1 - \lambda_{PP}, 1 - \lambda_{NN}, \delta). \tag{20}
$$

Hence, the MCC bias function shows an order-2 (180°) rotational odd symmetry (or anti-symmetry) on the $(\lambda_{PP}, \lambda_{NN})$ plane. Furthermore, this bias function shows symmetry with respect to the principal diagonal on the $(\lambda_{PP}, \lambda_{NN})$ plane if the sign of $\delta$ is inverted, that is,

$$
B_{MCC}(\lambda_{NN}, \lambda_{PP}, -\delta) = B_{MCC}(\lambda_{PP}, \lambda_{NN}, \delta). \tag{21}
$$

This dual behaviour is called Type I symmetry. Bias functions for ACC and MK also exhibit this type of symmetry.
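The Type I symmetry can be spot-checked numerically at a single point. The sketch below evaluates the normalized MCC from a relative confusion matrix and tests both the 180° rotational anti-symmetry and the diagonal symmetry with the sign of $\delta$ inverted; the function names and the test point are assumptions, and a check at one point illustrates the property without proving it.

```python
import math


def mccn(lam_pp, lam_nn, delta):
    """Normalized MCC as a function of (lambda_PP, lambda_NN, delta)."""
    pi_p, pi_n = (1.0 + delta) / 2.0, (1.0 - delta) / 2.0
    tp, fn = lam_pp * pi_p, (1.0 - lam_pp) * pi_p
    tn, fp = lam_nn * pi_n, (1.0 - lam_nn) * pi_n
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return (mcc + 1.0) / 2.0


def mccn_bias(lam_pp, lam_nn, delta):
    """Bias of MCCn: value at delta minus value in the balanced case."""
    return mccn(lam_pp, lam_nn, delta) - mccn(lam_pp, lam_nn, 0.0)


lam_pp, lam_nn, delta = 0.8, 0.3, 0.95
b = mccn_bias(lam_pp, lam_nn, delta)
rotated = mccn_bias(1.0 - lam_pp, 1.0 - lam_nn, delta)   # 180-degree rotation
mirrored = mccn_bias(lam_nn, lam_pp, -delta)             # diagonal swap, delta inverted
```

At this point `b` and `rotated` are opposite in sign, while `b` and `mirrored` coincide, up to floating-point rounding.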

When the bias of precision (PRC) is considered, no symmetry on the $(\lambda_{PP}, \lambda_{NN})$ plane can be found. However, it exhibits a symmetry in the $(\lambda_{PP}, \lambda_{NN}, \delta)$ space as can be observed in Fig. 15 where its heat volume for an imbalance coefficient $\delta = 0.95$ is shown in the upper left-hand-side plot. First, a mirror symmetry, with respect to the $\delta = 0$ plane, is performed and the result is shown in the upper right-hand-side plot. The sign of the bias values is then changed and the results are shown in the lower right-hand-side plot. Finally, a second mirror symmetry is performed, this time with respect to the anti-diagonal plane drawn in the third plot. The result is shown in the lower left-hand-side plot. It can be observed that the result coincides with the original heat volume.

Fig. 10. Probability density function $pdf[B_\mu(\delta)]$ for each performance metric.

Fig. 11. Local statistical indicators of bias $\psi B_\mu(\delta)$ for each performance metric.

Fig. 12. Probability density function $pdf(B_\mu)$ for each performance metric.

The double mirror symmetry enables the bias to be defined on a new set of axes (

PP,

NN,

), which are related to the

original set (

λ

PP,

λ

NN,

δ

) through the expressions

PP=1−

λ

NN;

NN=1−

λ

PP;

=−

δ

. This symmetry can hence be formalized

as

BPRC

(

λ

PP,

λ

NN,

δ

)

=−BPRC

(

PP,

NN,

)

=−BPRC

(

1−

λ

NN,1−

λ

PP,

δ

)

. (22)

Therefore, the PRC bias function shows a double mirror odd symmetry (or anti-symmetry) in the ($\lambda_{PP}$, $\lambda_{NN}$, $\delta$) space. This behaviour is called Type II symmetry. Bias functions for each metric (except F1 score) exhibit this type of symmetry.
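The Type II symmetry of precision can also be verified by hand. The short derivation below is a sketch under two assumptions that are not restated in this section: the positive class holds a fraction $(1+\delta)/2$ of the samples, and the bias is the metric minus its balanced value, $B_{PRC} = \mathrm{PRC} - \mathrm{PRC}_b$ with $\mathrm{PRC}_b$ the value at $\delta = 0$.

```latex
\begin{align*}
\mathrm{PRC}(\lambda_{PP},\lambda_{NN},\delta)
  &= \frac{\lambda_{PP}(1+\delta)}
          {\lambda_{PP}(1+\delta) + (1-\lambda_{NN})(1-\delta)}, \\
\mathrm{PRC}(1-\lambda_{NN},\,1-\lambda_{PP},\,-\delta)
  &= \frac{(1-\lambda_{NN})(1-\delta)}
          {(1-\lambda_{NN})(1-\delta) + \lambda_{PP}(1+\delta)}
   = 1 - \mathrm{PRC}(\lambda_{PP},\lambda_{NN},\delta), \\
B_{PRC}(1-\lambda_{NN},\,1-\lambda_{PP},\,-\delta)
  &= \bigl(1-\mathrm{PRC}\bigr) - \bigl(1-\mathrm{PRC}_b\bigr)
   = -B_{PRC}(\lambda_{PP},\lambda_{NN},\delta).
\end{align*}
```

The second line also holds at $\delta = 0$, which is what turns $\mathrm{PRC}_b$ into $1 - \mathrm{PRC}_b$ in the third line.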

Additionally, the symmetry of each pdf can be measured through the skewness statistic.


Fig. 13. Single-valued bias indicators for each performance metric.

Fig. 14. Study of symmetry for $B_{MCC}(\lambda_{NN}, \lambda_{PP})$ with $\delta = 0.95$.

3.5. Clustering performance metrics based on their bias

In previous sections, the impact of imbalance on ten classification performance metrics was studied. Based on their bias, the performance metrics can now be grouped into several clusters. In order to perform this clustering, the 39 single-valued bias indicators are considered. That is, each performance metric is to be featured by a point in an $\mathbb{R}^{39}$ space.

To tackle the issue of how to visualize such a high-dimensional vector, a reduction of dimensionality to a plane (2D) must be undertaken. The first approach (Fig. 16-A) involves the selection of 2 highly significant bias indicators and the projection of the points on that plane. Our selection is of $\mathrm{RMSB}_{\mu}$, which is an indicator of the mean global bias, and of $\mathrm{rmsB}_{\mu}^{\varepsilon P}$, which is a mean gauge of bias for extremely positive-imbalanced datasets. In this graphic, it can be observed that: bias for SNS, SPC, GM and BMn performance metrics are at the same point; bias for ACC, MCCn and MKn are very close to each other; bias for F1 is not far from these metrics; and, finally, bias for PRC and NPV are at the same point but distant from the other metrics.


Fig. 15. Study of symmetry for $B_{PRC}(\lambda_{NN}, \lambda_{PP}, \delta)$ with $\delta = 0.95$.

Following these criteria, either 3 or 4 clusters could be established.

However, a more in-depth insight shows that PRC and NPV have symmetric behaviour for many bias indicators. They have appeared together in the previous graph because the selected indicators ($\mathrm{RMSB}_{\mu}$ and $\mathrm{rmsB}_{\mu}^{\varepsilon P}$) compute a squared mean, which hides their symmetric characteristics. In order to overcome this issue, the dimensionality reduction can be made by selecting a different pair of bias indicators. In Fig. 16-B, $\mathrm{MAXAB}_{\mu}$ is a global indicator of the absolute maximum value of bias (and still hides the symmetry), and $\mathrm{mB}_{\mu}^{\varepsilon P}$ is another mean gauge of bias for extremely positive-imbalanced datasets that reveals the symmetry. Five clusters now clearly appear.

An alternative to the previous somewhat arbitrary and reductionist selection of the pair of bias indicators involves the consideration of the full set of indicators and the performance of some kind of bidimensional reduction. Principal Component Analysis (PCA), shown in Fig. 16-C [26], and Multi-dimensional Scaling (MDS), shown in Fig. 16-D [7], are employed as the techniques for this reduction.

Each panel in Fig. 16 represents a different bidimensional perspective of a highly dimensional ($\mathbb{R}^{39}$) set of points. Therefore, slightly different clustering may arise in any of them. But considering all the panels, it can be seen that the following 5 clusters appear:

I. Cluster comprised of SNS, SPC, GM and BMn, with metrics having null bias.

II. Cluster comprised of ACC, MCCn and MKn, with the following features:
   • Bias has Type I symmetry, that is, $B_{\mu}(\lambda_{PP}, \lambda_{NN}, \delta) = -B_{\mu}(1-\lambda_{PP}, 1-\lambda_{NN}, \delta)$ and $B_{\mu}(\lambda_{NN}, \lambda_{PP}, -\delta) = B_{\mu}(\lambda_{PP}, \lambda_{NN}, \delta)$.
   • Bias pdf has null skewness.
   • Bias values are moderate: 0.5 for maximum ($\mathrm{MAXAB}_{\mu}$, $\mathrm{maxaB}_{\mu}^{\varepsilon P}$ and $\mathrm{maxaB}_{\mu}^{\varepsilon N}$); and about 0.2 for average bias in extremely imbalanced datasets ($\mathrm{rmsB}_{\mu}^{\varepsilon P}$ and $\mathrm{rmsB}_{\mu}^{\varepsilon N}$).
   • Sign of bias does not depend on sign of imbalance.

III. Cluster (with 2 subclusters) comprised of PRC and NPV, with the following features:
   • Bias has Type II symmetry, that is, $B_{\mu}(\lambda_{PP}, \lambda_{NN}, \delta) = -B_{\mu}(1-\lambda_{NN}, 1-\lambda_{PP}, -\delta)$.
   • Bias pdf has null skewness.
   • Bias values are high: 1 for maximum ($\mathrm{MAXAB}_{\mu}$, $\mathrm{maxaB}_{\mu}^{\varepsilon P}$ and $\mathrm{maxaB}_{\mu}^{\varepsilon N}$); and about 0.5 for average bias in extremely imbalanced datasets ($\mathrm{rmsB}_{\mu}^{\varepsilon P}$ and $\mathrm{rmsB}_{\mu}^{\varepsilon N}$).
   • The relationship between the sign of bias and the sign of imbalance establishes 2 subclusters:
      A. PRC, with bias and imbalance having the same sign.
      B. NPV, with bias and imbalance having the opposite sign.

IV. Cluster comprised of F1, with the following features:
   • Bias has no symmetry.
   • Bias pdf has non-null skewness.
   • Bias values are low for positive imbalance: 0.33 for maximum ($\mathrm{maxaB}_{\mu}^{\varepsilon P}$); and approximately 0.15 for average bias in extremely imbalanced datasets ($\mathrm{rmsB}_{\mu}^{\varepsilon P}$).
   • Bias values are high for negative imbalance: 1 for maximum ($\mathrm{maxaB}_{\mu}^{\varepsilon N}$); and approximately 0.5 for average bias in extremely imbalanced datasets ($\mathrm{rmsB}_{\mu}^{\varepsilon N}$).
   • Sign of bias and sign of imbalance are the same.

Clustering information is summarized in Table 13.

Fig. 16. Bidimensional representation of performance metrics according to their bias indicators.

Table 13
Clusters of performance metrics attending to their bias.

Cluster                                              I                  II             III.A   III.B   IV
Metrics                                              SNS, SPC, GM, BM   ACC, MCC, MK   PRC     NPV     F1
Bias level                                           Null               Medium         High    High    Low (δ>0), High (δ<0)
δ>0   $\mathrm{maxaB}_{\mu}^{\varepsilon P}$         0                  0.5            1       1       0.333
      $\mathrm{rmsB}_{\mu}^{\varepsilon P}$          0                  ∼0.2           0.554   0.554   0.163
δ<0   $\mathrm{maxaB}_{\mu}^{\varepsilon N}$         0                  0.5            1       1       1
      $\mathrm{rmsB}_{\mu}^{\varepsilon N}$          0                  ∼0.2           0.554   0.554   0.534
Symmetry   Type I                                    Yes                Yes            No      No      No
           Type II                                   Yes                Yes            Yes     Yes     No
Skewness                                             0                  0              0       0       ≠0
sgn(B) vs. sgn(δ)                                    =                  Independent    =       ≠       =

Another way to represent how performance metrics are grouped according to the bias behaviour is by drawing a dendrogram. To this end, the full set of bias indicators is employed to feature each performance metric. The distances between the metrics are then computed in the $\mathbb{R}^{39}$ space of the bias indicators. These distances are employed to gauge how separated the metrics are, as shown in Fig. 17. Once again, 5 clusters clearly appear.
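The distance-based grouping behind a dendrogram can be illustrated on a small scale. The sketch below is not the authors' procedure: it uses only the four unsigned indicators listed in Table 13 rather than the full set of 39, enters the approximate "∼0.2" values as 0.2, and assumes Euclidean distance with single linkage, since the linkage rule is not stated in this section. All names are hypothetical.

```python
import math

# Unsigned indicators taken from Table 13, in the order
# (maxaB eP, rmsB eP, maxaB eN, rmsB eN). The "~0.2" entries are
# approximate in the table and are entered here as 0.2.
FEATURES = {
    "I (SNS, SPC, GM, BM)": (0.0, 0.0, 0.0, 0.0),
    "II (ACC, MCC, MK)": (0.5, 0.2, 0.5, 0.2),
    "III.A (PRC)": (1.0, 0.554, 1.0, 0.554),
    "III.B (NPV)": (1.0, 0.554, 1.0, 0.554),
    "IV (F1)": (0.333, 0.163, 1.0, 0.534),
}


def single_linkage(points):
    """Agglomerative clustering; returns [(members_a, members_b, distance), ...]."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


for left, right, dist in single_linkage(FEATURES):
    print(f"{dist:.3f}  {left} + {right}")
```

With these unsigned indicators PRC and NPV sit at exactly the same point and merge first at distance zero, which mirrors the remark that absolute and squared-mean indicators hide their opposite signs; separating them requires signed indicators such as those in the full set.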

4. Discussion

The first issue to be discussed is the comparison between the imbalance ratio (IR), defined by several authors to quantify imbalance, and the imbalance coefficient ($\delta$) as proposed in this paper. Although they are both valid indicators of the degree to which the datasets are imbalanced, we prefer $\delta$ because it is defined within the [−1, +1] bounded range with the balanced case ($\delta = 0$) in the middle of the segment (and hence it is symmetric); whilst IR is defined within the [0, +∞] unbounded range with the balanced case IR = 1, which is clearly asymmetric. In order to obtain symmetric behaviour based on IR, the logarithm of IR could be used (LIR), whose range is [−∞, +∞] with the balanced case (LIR = 0) in the middle; however, its range still remains unbounded. Fig. 18 shows an example of a local statistical indicator of bias (rmsB) as a function of the imbalance using $\delta$ (left-hand side) and IR in logarithmic scale (right-hand side).
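The three imbalance measures compared above can be put side by side on a few class counts. This is a minimal sketch under stated assumptions: δ is taken as 2·m_P/m − 1 and IR as m_P/m_N (positives over negatives), which match the ranges and balanced values quoted in the text but are not restated in this section; the natural logarithm is assumed for LIR. Function names are hypothetical.

```python
import math


def imbalance_coefficient(m_pos, m_neg):
    """delta in [-1, +1]; 0 means balanced (assumed: 2 * m_pos / m - 1)."""
    return 2 * m_pos / (m_pos + m_neg) - 1


def imbalance_ratio(m_pos, m_neg):
    """IR >= 0, unbounded above; 1 means balanced (assumed: m_pos / m_neg)."""
    return m_pos / m_neg


def log_imbalance_ratio(m_pos, m_neg):
    """LIR, unbounded on both sides; 0 means balanced."""
    return math.log(imbalance_ratio(m_pos, m_neg))


for m_pos, m_neg in [(500, 500), (900, 100), (100, 900), (990, 10)]:
    print(m_pos, m_neg,
          round(imbalance_coefficient(m_pos, m_neg), 3),
          round(imbalance_ratio(m_pos, m_neg), 3),
          round(log_imbalance_ratio(m_pos, m_neg), 3))
```

Swapping the two classes flips the sign of δ and of LIR but maps IR = 9 to IR ≈ 0.111, which is the asymmetry referred to in the text. Under these definitions IR = (1 + δ)/(1 − δ), so LIR = 2·artanh(δ).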

A practical application of the above results is that the mean value of the bias of every performance metric (and other related statistics) can be computed using the equations in Table 6, and their results for the ten studied metrics are shown in Table 12.


Fig. 17. Dendrogram of performance metrics according to their bias measures.

In the cases where the dataset is known, the expected bias of every performance metric can be computed. Indeed, the dataset determines the value of the imbalance coefficient $\delta$ and, for that value, the bias (its root-mean-square value) can be obtained as shown in Fig. 18. The full set of bias statistics can be computed using the equations in Table 5 and their results are depicted in Fig. 11.

Additionally, if the classifier results on that dataset are also known (that is, the values of $\lambda_{PP}$ and $\lambda_{NN}$), the exact value of the bias of every performance metric can be computed using the equations in Table 3.
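The exact bias for one experiment can also be recomputed directly from standard metric definitions. The following sketch does not reproduce the closed-form expressions of Table 3; it assumes that the positive class holds a fraction (1 + δ)/2 of the data and that the bias is the metric minus its value at δ = 0. The helper names and the example values are hypothetical.

```python
def cells(lam_pp, lam_nn, delta):
    """Confusion-matrix proportions (tp, fn, fp, tn).

    Assumes the positive class holds a fraction (1 + delta) / 2 of the data.
    """
    p, n = (1 + delta) / 2, (1 - delta) / 2
    return lam_pp * p, (1 - lam_pp) * p, (1 - lam_nn) * n, lam_nn * n


def sensitivity(tp, fn, fp, tn):
    return tp / (tp + fn)


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fn, fp, tn):
    return tp / (tp + fp)


def f1_score(tp, fn, fp, tn):
    return 2 * tp / (2 * tp + fp + fn)


def bias(metric, lam_pp, lam_nn, delta):
    """Metric on the imbalanced data minus the same metric at delta = 0."""
    imbalanced = metric(*cells(lam_pp, lam_nn, delta))
    balanced = metric(*cells(lam_pp, lam_nn, 0.0))
    return imbalanced - balanced


# Hypothetical experiment: 90% of positives and 80% of negatives are
# classified correctly on a dataset with 95% negatives (delta = -0.9).
for metric in (sensitivity, accuracy, precision, f1_score):
    print(metric.__name__, round(bias(metric, 0.9, 0.8, -0.9), 3))
```

In this hypothetical case sensitivity shows no bias, accuracy a small one, and precision and F1 a large negative one, which agrees in sign with the cluster descriptions (bias and imbalance of the same sign for PRC and F1).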

Let us now focus on the bias functions. From the above results it is clear that the best performance metrics, that is, those with no bias due to imbalance, are those of sensitivity (SNS) and specificity (SPC). These metrics can be considered one-dimensional (or partial) performance metrics, however, since they take into account only the results on either the positive (SNS) or the negative (SPC) class, but not both.

Table 14
Behaviour of performance metrics with imbalanced datasets.

Cluster   Metric   Bias     $\mathrm{RMSB}_{\mu}$   Focus on classes        Focus on results
                                                    (Positive, Negative)    (Successes, Errors)
I         SNS      Null     0                       P                       S
          SPC      Null     0                       N                       S
          GM       Null     0                       P&N                     S
          BM       Null     0                       P&N                     S
II        ACC      Medium   0.118                   P&N                     S
          MCC      Medium   0.055                   P&N                     S&E
          MK       Medium   0.086                   P&N                     S&E
III       PRC      High     0.271                   P&N                     S&E
          NPV      High     0.271                   P&N                     S&E
IV        F1       High     0.174                   P&N                     S&E

Null bias is also shown by two metrics directly depending on SNS and SPC: geometric mean (GM) and bookmaker informedness (BM or single-threshold AUC). These solve the one-dimensionality problem of the SNS and SPC metrics by considering either their arithmetic (BM) or geometric (GM) mean. Although these two metrics constitute good alternatives to be used with imbalanced datasets, they have the drawback of focusing on only the classification successes ($\lambda_{PP}$ and $\lambda_{NN}$), and fail to directly consider the classification errors ($\lambda_{PN}$ and $\lambda_{NP}$).

The second best (lowest biased) cluster of metrics is that which is comprised of accuracy (ACC), Matthews correlation coefficient (MCC), and markedness (MK). These all have a global (not partial) perspective, since classification results on both positive and negative classes are considered. From among these 3 metrics, ACC focuses only on the classification successes, which is a drawback and, additionally, has the highest bias (except when extreme balanced datasets are used). In this cluster, the lowest bias is shown by MCC with moderate values (lower than 0.2 in the normalized version) for almost every value of the imbalance coefficient.

Finally, the metrics in the third and fourth clusters, precision (PRC), negative predictive value (NPV), and F1 score (F1), are highly biased and should be avoided for use in imbalanced datasets. Table 14 summarizes the behaviour of performance metrics with imbalanced datasets.

As a practical conclusion, when dealing with imbalanced datasets, GM and BM are the best performance metrics if their focus on successes (dismissing the errors) presents no limitation for the specific application where they are used. However, if classification errors must also be considered, then MCC arises as the best choice.


Concordant results have been obtained in other studies, although most of these previous results were only shown and not exhaustively explored and quantified. The weakness of ACC in imbalanced problems has been signalled by many authors [10,22]. Furthermore, the use of PRC has been extensively discouraged [5,13,20]. F1 score, which depends on PRC, is also indirectly dismissed by those authors and directly rejected by Jeni, Cohn & De La Torre [25]. Most of the literature on this issue does not select any performance metrics, thereby limiting their study to a mere indication that they are biased. A few authors also suggest that the best choice is the MCC metric [4,11].

Probably the most cited solution to overcome the effect of imbalance on performance metrics is the use of the Class Balanced Accuracy (CBA). In the terminology used throughout this paper, this is the accuracy for the classifier operating on a balanced dataset ($\mathrm{ACC}_b$). Nevertheless, according to our study, this idea can be extended to the remaining performance metrics, by obtaining their balanced counterpart ($\mu_b$), generally called Class Balance Metrics (CBM), as formulated in the last column of Table 2. These extensions permit different null-bias perspectives to be used in the assessment of the results obtained by a classifier in the imbalanced case.
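One way to obtain a balanced counterpart from raw counts is to rescale each true class to the same total mass before applying the metric. The sketch below illustrates that idea and is not the formulation of Table 2, which is not reproduced here; it is assumed to be equivalent to evaluating the metric at δ = 0. The function names and the example counts are hypothetical.

```python
def class_balance(metric, tp, fn, fp, tn):
    """Evaluate `metric` after rescaling each true class to unit mass.

    `metric` takes (tp, fn, fp, tn). The rescaling keeps the per-class
    rates tp / (tp + fn) and tn / (tn + fp) and removes only the imbalance.
    """
    pos, neg = tp + fn, tn + fp
    return metric(tp / pos, fn / pos, fp / neg, tn / neg)


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fn, fp, tn):
    return tp / (tp + fp)


# Hypothetical confusion matrix: 50 positives and 950 negatives.
counts = dict(tp=45, fn=5, fp=190, tn=760)
print("ACC", accuracy(**counts), "->", class_balance(accuracy, **counts))
print("PRC", precision(**counts), "->", class_balance(precision, **counts))
```

For accuracy the rescaled value is the mean of the two per-class success rates, that is, the usual balanced accuracy; the same wrapper applies to any metric written as a function of the four confusion-matrix cells.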

The Class Balance version of the ten studied metrics ($\mu_b$) will show a null bias and, therefore, a value of $\mathrm{RMSB}_{\mu} = 0$ (columns 3 and 4 in Table 14). As every metric is now unbiased, choosing any of them should be based on other criteria, for instance, on their symmetry (column 1), their focus on classes (column 5) or their focus on results (last column). These values remain unaltered with respect to their biased counterpart.

The behaviour of each Class Balance Metric is shown in Fig. 2, which can also be used as a guide for the selection of the metric.

5. Conclusions

In this paper, an extensive and systematic study of the impact of class imbalance on classification performance metrics is undertaken. To the best of our knowledge, no quantitative and complete study of this issue has been previously published.

To characterize the disparity between classes, the Imbalance Coefficient has been defined: a new measure which surpasses the Imbalance Ratio or the Entropy used in previous studies.

Throughout our analysis, several practical procedures to determine the quantitative value of the bias of a metric have been derived, either for a general case, for a certain dataset or for a given experiment (pair of classifier and dataset).

From the simulation results, a quantitatively justified guide to select performance metrics in the presence of imbalanced classes has been developed. In our analysis, several clusters of performance metrics have been identified that involve the use of Geometric Mean or Bookmaker Informedness as the best null-biased metrics if their focus on classification successes (dismissing the errors) presents no limitation for the specific application where they are used. However, if classification errors must also be considered, then the Matthews Correlation Coefficient arises as the best choice.

Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed which extends the concept of Class Balance Accuracy to other performance metrics.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.patcog.2019.02.023.

References

[1] A. Amin, S. Anwar, A. Adnan, M. Nawaz, N. Howard, J. Qadir, ... A. Hussain, Comparing oversampling techniques to handle the class imbalance problem: a customer churn prediction case study, IEEE Access 4 (2016) 7940–7957.

[2] G.E. Batista, R.C. Prati, M.C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explor. Newslett. 6 (1) (2004) 20–29.

[3] C. Beyan, R. Fisher, Classifying imbalanced data sets using similarity based hierarchical decomposition, Pattern Recognit. 48 (5) (2015) 1653–1672.

[4] S. Boughorbel, F. Jarray, M. El-Anbari, Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric, PloS One 12 (6) (2017) e0177678.

[5] P. Branco, L. Torgo, R.P. Ribeiro, Relevance-based evaluation metrics for multi-class imbalanced domains, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Cham, Springer, 2017, May, pp. 698–710.

[6] K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann, The balanced accuracy and its posterior distribution, in: Pattern Recognition (ICPR), 2010 20th International Conference on, IEEE, 2010, August, pp. 3121–3124.

[7] R. Caruana, A. Niculescu-Mizil, Data mining in metric space: an empirical analysis of supervised learning performance criteria, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, August, pp. 69–78.

[8] F. Charte, A.J. Rivera, M.J. del Jesus, F. Herrera, Addressing imbalance in multi-label classification: measures and random resampling algorithms, Neurocomputing 163 (2015) 3–16.

[9] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.

[10] N.V. Chawla, Data mining for imbalanced datasets: An Overview, in: Data Mining and Knowledge Discovery Handbook, Springer US, 2005, pp. 853–867.

[11] D. Chicco, Ten quick tips for machine learning in computational biology, BioData Min. 10 (1) (2017) 35.

[12] R.A. Dara, M.S. Kamel, N. Wanas, Data dependency in multiple classifier systems, Pattern Recognit. 42 (7) (2009) 1260–1273.

[13] S. Daskalaki, I. Kopanas, N. Avouris, Evaluation of classifiers for an uneven class distribution problem, Appl. Artif. Intell. 20 (5) (2006) 381–417.

[14] C. Ferri, J. Hernández-Orallo, R. Modroiu, An experimental comparison of performance measures for classification, Pattern Recognit. Lett. 30 (1) (2009) 27–38.

[15] A. Fernández, S. del Río, N.V. Chawla, F. Herrera, An insight into imbalanced Big Data classification: outcomes and challenges, Complex Intell. Syst. 3 (2) (2017) 105–120.

[16] P.A. Flach, The geometry of ROC space: understanding machine learning metrics through ROC isometrics, in: Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 194–201.

[17] V. Ganganwar, An overview of classification algorithms for imbalanced datasets, Int. J. Emerg. Technol. Adv. Eng. 2 (4) (2012) 42–47.

[18] V. García, R.A. Mollineda, J.S. Sánchez, Index of balanced accuracy: a performance measure for skewed class distributions, in: Iberian Conference on Pattern Recognition and Image Analysis, Berlin, Heidelberg, Springer, 2009, June, pp. 441–448.

[19] J. Gorodkin, Comparing two K-category assignments by a K-category correlation coefficient, Comput. Biol. Chem. 28 (5–6) (2004) 367–374.

[20] Q. Gu, L. Zhu, Z. Cai, Evaluation measures of the classification performance of imbalanced data sets, in: International Symposium on Intelligence Computation and Applications, Berlin, Heidelberg, Springer, 2009, October, pp. 461–471.

[21] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284.

[22] M. Hossin, M.N. Sulaiman, A review on evaluation metrics for data classification evaluations, Int. J. Data Min. Knowl. Manage. Process 5 (2) (2015) 1.

[23] T. Kautz, B.M. Eskofier, C.F. Pasluosta, Generic performance measure for multiclass-classifiers, Pattern Recognit. 68 (2017) 111–125.

[24] B. Krawczyk, Learning from imbalanced data: open challenges and future directions, Prog. Artif. Intell. 5 (4) (2016) 221–232.

[25] L.A. Jeni, J.F. Cohn, F. De La Torre, Facing imbalanced data–recommendations for the use of performance metrics, in: Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference, IEEE, 2013, September, pp. 245–251.

[26] I. Jolliffe, Principal component analysis, in: International Encyclopedia of Statistical Science, Springer, Berlin, Heidelberg, 2011, pp. 1094–1096.

[27] G. Jurman, S. Riccadonna, C. Furlanello, A comparison of MCC and CEN error measures in multi-class prediction, PLoS One 7 (8) (2012) e41882.

[28] M.S. Kraiem, M.N. Moreno, Effectiveness of basic and advanced sampling strategies on the classification of imbalanced data. A comparative study using classical and novel metrics, in: International Conference on Hybrid Artificial Intelligence Systems, Cham, Springer, 2017, June, pp. 233–245.

[29] V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics, Inf. Sci. 250 (2013) 113–141.

[30] B.W. Matthews, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA) 405 (2) (1975) 442–451.

[31] N. Mehra, S. Gupta, Survey on multiclass classification methods, Int. J. Comput. Sci. Inf. Technol. 4 (4) (2013) 572–576.

[32] L. Mosley, A Balanced Approach to the Multi-Class Imbalance Problem, Iowa State University, 2013.

[33] H. Núñez, L. Gonzalez-Abril, C. Angulo, Improving SVM classification on imbalanced datasets by introducing a new bias, J. Classification 34 (3) (2017) 427–443.

[34] D.M. Powers, Evaluation: from precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation, School of Informatics and Engineering, Flinders University, Adelaide, Australia, 2011, Technical Report SIE-07-001.

[35] B. Raman, T.R. Ioerger, Enhancing Learning Using Feature and Example Selection, Texas A&M University, College Station, TX, USA, 2003.

[36] M. Sokolova, N. Japkowicz, S. Szpakowicz, Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation, in: Australasian Joint Conference on Artificial Intelligence, Berlin, Heidelberg, Springer, 2006, December, pp. 1015–1021.

[37] M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks, Inf. Process. Manage. 45 (4) (2009) 427–437.

[38] S.V. Stehman, Selecting and interpreting measures of thematic classification accuracy, Remote Sens. Environ. 62 (1) (1997) 77–89.

[39] Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378.

[40] X. Yuan, L. Xie, M. Abouelenien, A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data, Pattern Recognit. 77 (2018) 160–172.

[41] W. Zong, G.B. Huang, Y. Chen, Weighted extreme learning machine for imbalance learning, Neurocomputing 101 (2013) 229–242.

Amalia Luque received her Industrial Engineering degree in 2007, Master in Automation in 2010 and her Doctoral Degree in 2014. She has been involved in teaching related to Project Engineering at the University of Seville since 2015. Her main areas of research are business intelligence, data mining and feature extraction.

Alejandro Carrasco received his Computer Engineering degree in 1998 and his Ph.D. in 2003. He has been a Lecturer and Researcher at the University of Seville since 1998, has founded a NTBF and directed several R&D projects. His main areas of research are data mining, industrial computing and network security.

Alejandro M. Martín-Gómez, B. Industrial Electronics and Automation Engineer, MS Product and facilities design and development, MS Security and health and PhD Student (manufacturing engineering). He has worked across a wide variety of projects in the industrial sector. He is an assistant professor at the University of Seville (Field of Knowledge: Engineering Project).

Ana de las Heras García de Vinuesa is a BSc. Eng Industrial Design, MSc. Ecodesign, and PhD. in Manufacturing and Environmental Engineering. She works as an assistant professor in the Department of Design Engineering, Engineering Project area, at the University of Seville, Spain.
