The impact of class imbalance in classification performance metrics based on the binary confusion matrix

(1)

Contents lists available at ScienceDirect

Pattern

Recognition

journal homepage: www.elsevier.com/locate/patcog

The

impact

of

class

imbalance

in

classiﬁcation

performance

metrics

based

on

the

binary

confusion

matrix

Amalia

Luque

a,∗

_,

_Alejandro

_Carrasco

b

_,

_Alejandro

_Martín

a

_,

_Ana

_de

_las

_Heras

a a _Dpto._Ingeniería_del_Diseño._Escuela_Politécnica_Superior._Universidad_de_Sevilla._Virgen_de_África,_7.₄₁₀₁₁_Sevilla,_Spain b _Dpto._Tecnología_{Electrónica.}_Escuela_Politécnica_Superior._Universidad_de_Sevilla._Virgen_de_África,_7._41011,_Sevilla,_Spain

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received4September2018 Revised22December2018 Accepted22February2019 Availableonline28February2019

Keywords:

Classiﬁcation Performancemeasures Imbalanceddatasets ClassBalanceMetrics

a

b

s

t

r

a

c

t

Amajorissueintheclassificationofclassimbalanceddatasetsinvolvesthedeterminationofthe most suitableperformance metricstobeused.Inpreviousworkusingseveralexamples, ithasbeenshown thatimbalancecanexertamajorimpactonthevalueandmeaningofaccuracyandoncertainother well-knownperformancemetrics.Inthispaper,ourapproachgoesbeyondsimplystudyingcasestudiesand developsasystematicanalysisofthisimpactbysimulatingtheresultsobtainedusingbinaryclassifiers.A setoffunctionsandnumericalindicatorsareattainedwhichenablesthecomparisonofthebehaviourof severalperformancemetricsbasedonthebinaryconfusionmatrixwhentheyarefacedwithimbalanced datasets. Throughoutthe paper,anew wayto measurethe imbalanceis definedwhichsurpassesthe Imbalance Ratioused inpreviousstudies.Fromthe simulationresults,severalclustersofperformance metrics have beenidentified that involvethe use ofGeometricMean orBookmakerInformedness as thebestnull-biasedmetricsiftheirfocusonclassificationsuccesses(dismissingtheerrors)presentsno limitation forthe specificapplication where theyareused. However,ifclassification errorsmust also beconsidered,thentheMatthewsCorrelationCoefficientarisesasthebestchoice.Finally,asetof null-biasedmulti-perspectiveClassBalanceMetricsisproposedwhichextendstheconceptofClassBalance Accuracytootherperformancemetrics.

1. Introduction

In recent years, the scientific community workingon classifi-cation algorithms has shown an increasing interest in the chal-lenges that arise when imbalanced datasets are considered. Sev-eraloverviewsontheseissueshavebeenaddressedin[2,17,21,24]. Intheseanalyses,theSyntheticMinorityOver-samplingTechnique (SMOTE) [9] and the AdaBoost [39] are highlighted as general-purposesolutions, although algorithms ofa more specific nature can also be found, either as general-purpose [3], as problem-oriented[15]orasclassifier-oriented[33].Algorithmsthataddress theimbalanceprobleminmulti-labelclassification[8]orthat use advanced classifiers such as extreme learning machines[41] can alsobe found.An up-to-datecomparison ofthesetechniques ad-dressingaspecificproblemcanbefoundin[40].

∗ _{Corresponding}_author.

E-mailaddresses:[email protected](A.Luque), [email protected](A.Carrasco), [email protected](A.Martín),[email protected](A.delasHeras).

A key aspect of thesemethods involves the determination of the classificationperformance, not only inorder to assessthe fi-nalresult,butalsotoobtainafigurewhichhastobeoptimizedby tuningtheclassifierparameters.However,thereisnosinglewayto selectthebest algorithmasanyalgorithmcanobtaingoodresults inoneclassbutpoorscoresinotherclasses.Forthisreason, sev-eralmetricsare usuallyconsidered, whichpermitsthepolyhedral characteristicsoftheclassificationperformancetobeviewedfrom differentpointsofviews.

The impact of class imbalance on classiﬁcation performance metricshasthereforebecome amajor issue.Severalauthorshave addressedthistopicbyshowingafewexamplesofthisimpacton accuracy[10,22]andonseveralothermetrics[5,13,20].Tothebest of ourknowledge, only one systematic(albeit limited) studyhas beenpublishedthatisnotsimplybasedonexamples,[25].

Thequantitativeresearchonclassiﬁcationperformance metrics hastraditionallybeentackledeitherbyusingacollectionofknown andwidelyavailable datasets[1,4],orbyrandomlysimulatingthe classiﬁerresults[27,34].Amixtureofrandomsimulatedvariations onknowndatasetsisemployedinJenietal.[25].

https://doi.org/10.1016/j.patcog.2019.02.023

(2)

A.Luque,A.CarrascoandA.Martínetal./PatternRecognition91(2019)216–231 217 In order to overcomethe effect ofimbalance on performance

metrics, severalsolutions havebeensuggested, among whichthe mostcitedistheuseoftheClassBalancedAccuracy(CBA)[1,6,32]. Approachessuchasrelevance-basedevaluation[5],theNormalized PrecisionRate[13],theIndexofBalancedAccuracy[18,28,29],and the multiclassperformance score (MPS)[23] have alsobeen pro-posed.

Inthispaper,an extensiveandsystematicstudyisundertaken of the impact of class imbalance on classification performance metrics. Several dozen performance metrics can be found in the scientific literature, some based on a threshold, others based on probabilities, while yet othersare based on ranks [14]. However, themostwidelyemployed metricsare thosebasedonthe confu-sionmatrix[38],wherethemulti-classcaseisusuallyreducedto a setofbinarycasesusingtheOne-versus-All orthe One-versus-One approach [31].Forthese reasons,ourresearch isfocused on classification performance metrics basedon the binaryconfusion matrix.

Therestofthepaperisorganizedasfollows.Section2presents the methodology employed to measure the impact of imbalance in performance metrics, thereby formally defining the confusion matrix (Section 2.1) and themetrics based thereon (Section 2.2). Section2.3proposesanewfigureforthequantificationoftheclass imbalance,and severalfunctionsandindicators ofthe aforemen-tionedimpact ofimbalancearedefinedin Section2.4.The appli-cationto asetofclassification performancemetrics basedonthe binaryconfusionmatrix ispresentedinSection 3.The discussion andconclusionoftheseresultsareaddressedinSection4.

2. Methodology

2.1. Deﬁnitionoftheconfusionmatrix

ConsideradatasetD=

{

d₁,d₂,· · ·,dm

}

madeupofmelements

whered_krepresentsthek-thelement.Let

be asetofCclasses

=

{

θ

1 ,

θ

2 ,· · ·,

θ

C

}

where

θ

ideﬁnesthei-thclass.TheclassiﬁerC

operatingondk(thek-thelementofthedatasetD)assignsalabel

θ

j andestimatesthat thiselement belongsto the j-thclass, that

is,d_k→C

θ

jorC(dk

)

=

θ

j,whileitreallybelongstothei-thclass

θ

i,

therebycausingamisclassiﬁcation(aconfusion)wheni=j. Let A=

{

α

1 ,

α

2 ,· · ·,

α

m

}

be the set of actual classes

corre-spondingtothedataset_D,where

α

_kistheactualclassofthe ele-mentdk.Furthermore,letE=

{

ε

1 ,

ε

2 ,· · ·,

ε

m

}

bethesetofclasses

estimated by the classiﬁer C foreach element in D, where ɛk is

the estimatedclass ofthe element d_k.The performance of _C can beassessedusingameasuringfunctionM,whichassignsametric

μ

∈Rtothepair

(

_A_,_E

)

,thatis,

(

_A_,_E

)

→M

μ

.

In this paper, we will focus on metrics based on the confu-sion matrix,whichrepresentsone ofthe mostcommonmethods topresenttheresultsobtainedbyaclassiﬁer,andisdeﬁnedas

CM≡

⎡

⎢

⎣

m11 m12 ... m1 C m₂₁ m₂₂ ... m₂C . . . ... ... ... mC1 mC2 ... mCC

⎤

⎥

⎦

. (1)

In thisexpression, mij representsthe number ofelements

ac-tually belonging to the i-th class (

θ

i) but that are classiﬁed as

membersofthej-thclass(

θ

j).Inthecontextofourresearch,itis

better todescribe thetermmij inrelation tothetotalnumber of

elementsmibelongingtothei-thclass(

θ

i).Bydenoting

λ

ijasthe

ratiomij/mi,theconfusionmatrixcanberewrittenas

CM=

⎡

⎢

⎣

λ

11m1

λ

12m1 ...

λ

1Cm1

λ

21 m2

λ

22 m2 ...

λ

2 Cm2 . . . ... ... ...

λ

C1 mC

λ

C2 mC ...

λ

CCmC

⎤

⎥

⎦

, (2)

whichcanbeexpressed astheHadamard(element-wise)product oftwomatricesintheform

CM=

⎡

⎢

⎣

λ

11

λ

12 ...

λ

1 C

λ

21

λ

22 ...

λ

2 C . . . ... ... ...

λ

C1

λ

C2 ...

λ

CC

⎤

⎥

⎦

◦

⎡

⎢

⎣

m₁ m₁ ... m₁ m2 m2 ... m2 . . . ... ... ... mC mC ... mC

⎤

⎥

⎦

. (3) Inthebinarycase,thatis,whenthenumberofclassesisC₌2, thentheconfusionmatrixcanbewrittenas

CM≡

m11 m12

m21 m22 . (4)

Ingeneral, oneofthe classesiscalledthe“Positive” class and theother is namedthe “Negative” class.Therefore,the confusion matrixcanberewrittenaccordingtothisnewterminologyas CM≡

mPP mPN

mNP mNN . (5)

Theelementsofthismatrixarenamedwiththefollowing con-vention:mPP,“TruePositive” (TP);mPN,“FalseNegative” (FN);mNP,

“FalsePositive” (FP);andmNN,“TrueNegative” (TN).

Thetotal numberofpositive mP andnegativemN elementsin

D meet that they summ,the total number of elements, that is, mP+mN=m.Furthermore,itisalsotruethatthenumberof

ele-mentscorrectlyclassiﬁedinclassP(mPP),andthenumberof

ele-mentsmisclassiﬁedinthatclassP(mPN),addsuptothenumberof

elementsinthepositiveclass(mP),thatis,mPP+mPN=mP.

Simi-larly,itcan bestatedthatmNP+mNN=mN.Theconfusionmatrix

canthereforebewrittenas CM=

mPP mP−mPP

mN−mNN mNN , (6)

whichcanalsobeformulatedintermsofthe

λ

ijratiosas

CM=

λ

PPmP

λ

PNmP

λ

NPmN

λ

NNmN =

λ

PPmP

(

1−

λ

PP

)

mP

(

1−

λ

NN

)

mN

λ

NNmN . (7)

Additionally,thetotalnumberofelementsinDestimatedbyC aspositive(despitetheir actual class),eP,andthoseestimatedas

negative,eN,canbewrittenas

eP=mPP+mNP=

λ

PPmP+

(

1−

λ

NN

)

mN.

eN=mNN+mPN=

λ

NNmN+

(

1−

λ

PP

)

mP. (8)

Theyalsoadduptothetotalnumberofelements,eP+eN=m.

Thedeﬁnitionsregardingtheconfusionmatrixaresummarizedin Fig.1.

2.2.Metricsbasedonthebinaryconfusionmatrix

Based onthe binaryconfusionmatrix, numerousperformance metricshavebeenproposed[19,27,30,34,37].Forourstudy,the fo-cus is placed on 10 of these metrics, which are summarized in Table1.Allthesemetrics,take valueswithinthe[0,1] range, ex-cept the last three (MCC, BM,and MK), whose ranges lie within the [−1,1] interval. For comparison purposes, these metrics are usedhereinintheirnormalizedversion(MCCn,BMn,andMKn).By namingametricdeﬁnedwithinthe[−1,1]intervalas

μ

,itcanbe normalizedwithinthe[0,1]rangebytheexpression

μ

n≡

μ

+

1

(3)

Fig.1. Confusionmatrixforbinaryclassiﬁcation.

Table1

Deﬁnitionofclassiﬁcationperformancemetrics.

Symbol Metric Deﬁnedas

SNS Sensitivity T P T P+F N SPC Speciﬁcity T N T N+F P PRC Precision T P T P+F P NPV NegativePredictiveValue T N

T N+F N

ACC Accuracy TP+TN

TP+F N+TN+F P F1 F1score 2PRCPRC+·SNSSNS

GM GeometricMean √SNS·SPC

MCC MatthewsCorrelationCoeﬃcient _√ TP·TN−FP·FN

( T P+F P)( T P+F N)( T N+F P)( T N+F N) BM BookmakerInformedness SNS+SPC−1

MK Markedness PPV+NPV−1

Itcaneasilybeshown(seesupplementarymaterialinthe elec-tronicversionofthispaper)thatallthesemetricscanbeexpressed as a function

μ

=

μ

(

λ

PP,

λ

NN,

π

P,

π

N

)

, where

π

P is the ratio of

positive elements in the dataset (mP/m) and, analogously, where

π

N≡mN/m. Furthermore, for balanced classes, when

π

P=

π

N=

0_.5,themthemetricscanbeformulatedas

μ

₌

μ

(

λ

PP,

λ

NN

)

.These

functionsaredepictedinFig.2asaheatmapforeachmetric. The bestclassiﬁer achievesa value of

λ

PP=1(allthepositive

elementsare correctly classiﬁedas positive)and, also a value of

λ

NN=1(allthenegativeelementsarecorrectlyclassiﬁedas

neg-ative), corresponding to the upper right-hand-side corner in the graphic.Instead,theworstclassiﬁer(

λ

PP=0,

λ

NN=0)corresponds

tothelowerleft-hand-sidecornerofthegraphic.

Althoughonlyperformancemetricsbasedontheconfusion ma-trix are considered, a marginal approach to Receiver Operating Characteristics(ROC)analysis[16]canalsobe carriedout. Inthis analysis,theArea UnderCurve(AUC) iscommonlyusedasa per-formancemetric.However,forclassiﬁersofferingonlyalabel(and not a set of scores for each label), or when a single threshold isused on scores, the value of AUC andBMn are the same[36]. Therefore, in the forthcoming sections, whenever BMn is men-tioneditcouldalsobeunderstoodasAUC.

2.3.Deﬁningclassimbalance

Theconceptofclassimbalanceisrelativelyclear:itariseswhen the dataset has a different number of elements in positive and negativeclasses.However, itsformalization isfarfrombeing uni-vocallyaccepted. For instance, in [18,29], the imbalance is char-acterizedbythe dominance (Dom) orprevalence relationship be-tween the positive class and the negative class, and is deﬁned asTPR−TNR.Thisvalue islateremployed tocompensate perfor-mancemetrics affected by the imbalance problem. However, the dominanceisnotexactlyameasureoftheimbalanceinthedataset becauseitconsidersimbalanceintheoutcomesoftheclassiﬁer.

Other authors formalize this concept by using the entropy [12]or,morecommonly,theproportionbetweenpositiveand neg-ative instances (formalized as1: X) [13] which is similar to the imbalance ratio(IR) deﬁned asmP/mN [1], alsocalled skew [25].

This value lies within the [0, ∞] range,having a value IR=1in thebalancedcase.

Other authors formalize this concept by using the propor-tionbetweenpositive andnegativeinstances (formalizedas1:X) [13]or,whichissimilar,theimbalanceratio(IR)deﬁnedasmP/mN

[1],which isalso calledskew [25].This value lieswithin the [0,

∞]range,havingavalueIR=1inthebalancedcase.

In this paper, it is preferred to feature the imbalance with a value within the [−1,1] range, while reserving the 0 value for when the classes are perfectly balanced (lack of imbalance). For thispurpose,weproposetheimbalancecoeﬃcient

δ

as

δ

=

δ

P≡2

π

P−1=2

mP

m −1. (10)

Forthe negativeclass,the coeﬃcient

δ

N=2

π

N−1 isalso

de-ﬁned.Thesumofthesecoeﬃcientsis

δ

P+

δ

N=2

π

P−1+2

π

N−1=2

(

π

P+

π

N

)

−2=0. (11)

Hence

δ

N=−

δ

P=−

δ

. From (10),the value of

π

P can be

ob-tainedas

π

P=

1+

δ

2 . (12)

Moreover,thevalueof

π

N canbederivedfrom(11)

π

N=

1−

δ

2 . (13)

Therefore,themetrics

μ

=

μ

(

λ

PP,

λ

NN,

π

P,

π

N

)

canberedeﬁned

as

μ

=

μ

(

λ

PP,

λ

NN,

δ

)

. It is clear that the value of the metric

μ

dependsnot onlyonthe classiﬁer’sperformance,butalso onthe imbalance

δ

.

Itcan easily be derivedthat the relationshipbetweenthe im-balanceratio(IR)andtheimbalancecoeﬃcient(

δ

)is

IR=1+

δ

1−

δ

. (14)

2.4. Assessingtheimpactofimbalance

Inordertoassesstheimpactoftheimbalanceinacertain met-ric,its value(

μ

b) forthebalanced case(when

δ

=0)isﬁrst

con-sidered.

μ

b≡

μ

|

_δ₌0=

μ

(

λ

PP,

λ

NN,0

)

=

μ

b

(

λ

PP,

λ

NN

)

. (15)

This deﬁnition is later employed to propose a family of met-ricswherethe effectoftheimbalance isdismissed.Inthe scien-tiﬁcliterature,afewexamplesofthesemetricscanbefound,asin thecaseofClassBalance Accuracy(CBA)[35].Ongeneralizingthis approach, themetrics

μ

b are calledClass Balance Metrics (CBM).

Table2summarizestheequationsforeachmetric,bothintheclass imbalanceandbalancecases.Theseresultsarederivedinthe sup-plementarymaterialofthepaper,availableonline.

Withthesedeﬁnitions,itisnowpossibletoquantifytheimpact ofimbalance,byusingthe biasofthemetricwhichis deﬁnedas B_μ≡

μ

−

μ

b=

μ

(

λ

PP,

λ

NN,

δ

)

−

μ

b

(

λ

PP,

λ

NN

)

. (16)

Table3summarizesthedeﬁnitionofbiasforeachmetric.These results are derived in the supplementary material of the paper, availableonline.

As can be observed, bias depends on three variables: B_μ=

B_μ

(

λ

PP,

λ

NN,

δ

)

. In order to study this function, a 4-dimensional

space is required. Its representations are ﬁrst tackled using heat volumes(3D),whereeachpointinthe3-dimensional(

λ

PP,

λ

NN,

δ

)

(4)

A.Luque,A.CarrascoandA.Martínetal./PatternRecognition91(2019)216–231 219

Fig.2. Heatmapsformetricswithbalancedclasses.

Table2

Classiﬁcationperformancemetricsasafunctionofimbalance.

Metrics Classimbalancemetricsμ(λPP,λNN,δ) ClassBalancemetricsμb(λPP,λNN)

SNS λPP λPP SPC λNN λNN PRC λPP( 1+δ) λPP( 1+δ)+( 1−λNN)( 1−δ) λPP λPP+( 1−λNN) NPV λNN( 1 −δ) λNN( 1 −δ)+( 1 −λPP)( 1+δ) λNN λNN+( 1 −λPP) ACC λPP1+2δ+λNN1 −2δ λPP+2λNN F1 ₍1+λPP)( 12+λδPP)+( 1( 1 +−δ) λNN)( 1 −δ) 2λPP 2+λPP−λNN GM λPP·λNN λPP·λNN MCCn 1 ₂( λPP+λNN−1 [ λPP+( 1 −λNN) 11−+δδ][ λNN+( 1 −λPP) 11+−δδ] +1) 1 ₂(√ λPP+λNN−1 [ λPP+( 1 −λNN) ][ λNN+( 1 −λPP) ] +1) BMn λPP+λNN 2 λPP+λNN 2 MKn 1 ₂( 1+δ ( 1+δ)+1−λNN λPP ( 1 −δ) + 1 −δ ( 1 −δ)+1−λPP λNN ( 1+δ) ) 1 2 (1+11 −λNN λPP + 1 1+1−λPP λNN ) Table3

Biasofperformancemetricsduetoclassimbalance. Metrics BiasBμ(λPP,λNN,δ) SNS 0 SPC 0 PRC 1+δ ( 1+δ)+1−λNN λPP ( 1 −δ) − 1 1+1−λNN λPP NPV 1 −δ ( 1 −δ)+1−λPP λNN( 1+δ) − 1 1+1−λPP λNN ACC δ₂(λPP−λNN) F1 2 λPP( 1+δ) ( 1+λPP)( 1+δ)+( 1 −λNN)( 1 −δ) − 2 λPP 2+λPP−λNN GM 0 MCCn λPP+λNN−1 2 [λPP+( 1−λNN) 11−+δδ][λNN+( 1−λPP) 11+−δδ] − λPP+λNN−1 2√[λPP+( 1−λNN) ][λNN+( 1−λPP) ] BMn 0 MKn 1 ₂( 1+δ ( 1+δ)+1−λNN λPP ( 1 −δ) − 1 1+1−λNN λPP + 1 −δ ( 1 −δ)+1−λPP λNN( 1+δ) − 1 1+1−λPP λNN )

Alternatively, B_μ isalso representedasa setofheat maps(or contourgraphs).Eachheatmap(2D)representsthemetricbiasfor acertainﬁxedvalueoftheimbalance(letussay

δ

₀),thereby mak-ing bias dependent on two variables (

λ

PP,

λ

NN) and on one

con-stant(

δ

₀).Hence,B_μ=B_μ

(

λ

PP,

λ

NN,

δ

0

)

.

Nevertheless, drawing conclusions regarding a 4-dimensional spaceis,inmostcases,achallengingtask.Severalpartialprospects are therefore proposed that reduce the bias function dimension-ality. In this respect, the ﬁrst approach involves considering the

Table4

Biasindicatorsforsingularclassiﬁers.

Singularclassiﬁer Symbol σBμ(δ)

Worstclassiﬁer wcBμ(δ) lim

ε→0 Bμ(ε,ε,δ)

Bestclassiﬁer bcBμ(δ) lim

ε→0 Bμ(1−ε,1−ε,δ)

Worst-positiveclassiﬁer wpcBμ(δ) lim

ε→0Bμ(ε,1−ε,δ) Worst-negativeclassiﬁer wncBμ(δ) lim

ε→0Bμ(1−ε,ε,δ) Mediumclassiﬁer mcBμ(δ) Bμ(0.5,0.5,δ)

metricbiasforclassiﬁerswhoseperformanceissingularlylocated onthe(

λ

PP,

λ

NN)plane, therebyobtaining asetofbiasindicators

which are generically denoted as

σ

B_μ(

δ

). The singularclassiﬁers andtheirformulationsareproposedinTable4.

An alternative way to reduce the dimensionality of B_μ is throughtheconsiderationthat

λ

PPand

λ

NNarerandomlyand

uni-formlydistributed within the [0, 1] range. Bias can thereforebe seen for each value of the imbalance coeﬃcient

δ

as a random variable B_μ(

δ

), which is characterized by its probability density function (pdf[B_μ(

δ

)]). Additionally, several local statistical indica-torscanbedeﬁned,whicharegenericallydenotedas

ψ

B_μ(

δ

).The termlocalattributedto theseindicators(summarized in Table5) meansthattheyaredeﬁnedforeachvalueof

δ

.

DeﬁnitionsinTables4and5haveintroducedseveralprospects ofbias,all ofwhichdepend onthe imbalance

δ

.Theyare

(5)

gener-Table5

Localstatisticalindicatorsonbias.

Localstatisticalindicator Symbol ψBμ(δ)

Mean mBμ(δ) ∫∫Bμ(λPP,λNN,δ)dλPPdλNN

Standarddeviation sdBμ(δ)

[Bμ(λPP,λNN,δ)−mBμ(δ)]2 dλPPdλNN

Root-Mean-SquareBias rmsBμ(δ) B2 _μ(λPP,λNN,δ)dλPPdλNN

Maximumabsolutevalue maxaBμ(δ) max 0≤λPP≤1 0≤λNN≤1 |Bμ(λPP,λNN,δ)| Skewness skBμ(δ) _sdB1 3_μ₍_δ) [Bμ(λPP,λNN,δ)−mBμ(δ)]3dλPPdλNN Kurtosis kBμ(δ) _sdB1 4_μ₍_δ) [Bμ(λPP,λNN,δ)−mBμ(δ)]4 dλPPdλNN Table6

Globalstatisticalindicatorsonbias.

Globalstatisticalindicator Symbol Bμ(δ)

Mean MBμ 1 2 Bμ(λPP,λNN,δ)dλPPdλNNdδ Standarddeviation SDBμ 1 2 [Bμ(λPP,λNN,δ)−MBμ]2 dλPPdλNNdδ Root-Mean-SquareBias RMSBμ 1 2 B2 μ(λPP,λNN,δ)dλPPdλNNdδ

Maximumabsolutevalue MAXABμ max 0≤λPP≤1 0≤λNN≤1 0≤δ≤1 |Bμ(λPP,λNN,δ)| Skewness SKBμ SDB13_μ 1 2 [Bμ(λPP,λNN,δ)−MBμ]3dλPPdλNNdδ Kurtosis KBμ SDB1 4_μ 1 2 [Bμ(λPP,λNN,δ)−MBμ(δ)]4dλPPdλNNdδ

ically denoted as xB_μ

(

δ

)

=

{

σ

B_μ

(

δ

)

,

ψ

B_μ

(

δ

)

}

. It should be ob-servedthatthesearenotnumbersbutfunctions.Inordertoobtain asinglevalue derivedfromthesefunctions, wecanmakethe hy-pothesisthat

δ

israndomly anduniformlydistributed withinthe [−1,1]range.Meanvaluesofeachfunctioncanthereforebe com-putedas

xB_μ≡1

2 ₁

−1 xBμ

(

δ

)

d

δ

. (17) Anotherwaytoreduce thegeneric functionxB_μ(

δ

)to asingle valueisbyfocusingonitsvalueforextremelypositive-imbalanced datasets where

δ

→1.A generic value can thereforebe obtained throughtheexpression

xBεP

μ ≡lim_ε_→₀xBμ

(

1−

ε

)

. (18) Thecorrespondingnegativecounterpartisdeﬁnedas

xBεN

μ ≡_εlim_→₀xBμ

(

−1+

ε

)

. (19) A globalregard ofbiascan alsobe undertakenby considering that

λ

PP and

λ

NN are randomly anduniformlydistributed within

the [0, 1] range, and also that

δ

lies within the [₋1_,1] range. Bias can now be seen asa random variableB_μ that is indepen-dent of the imbalance coeﬃcient

δ

. Bias can therefore be char-acterized by its probability density function (pdf[B_μ]). Based on this overall pdf, several global statistical indicators can be de-ﬁned,whichare genericallydenotedas

B_μ andare summarized inTable6.

DeﬁnitionsinEqs.(17)–(19)andinTable6haveintroduced sev-eralsingle-valuedindicatorsregardingbias whichwillgenerically bedenotedasXB_μ

(

δ

)

=

{

xB_μ,xBεP

μ,xBεN

μ ,

B_μ

(

δ

)

}

.

Throughout thissubsection,severalfunctionandsingle-valued indicators have been introduced to assess the impact of dataset

imbalance on classiﬁcation performance metrics. A summary of theseindicatorsisdepictedinFig.3.

3. Results

3.1. Performancemetricbiasfunction

ThemethodsdescribedinSection2willnowbeappliedtothe tenmetrics deﬁnedin Table1. As explainedabove, biasdepends onthreevariables,thatis,B_μ=B_μ

(

λ

PP,

λ

NN,

δ

)

anditsformulation

fortheselectedmetricsisalsoshowninTable3.Theﬁrstapproach fortheirrepresentationisbasedonthe heatvolumesasdepicted inFig.4,whereeachpointinthe3-dimensional(

λ

PP,

λ

NN,

δ

)space

hasabias-dependentcolour.

Incertainperformancemetrics(forinstanceinMCCn),biashas low values for many points in the (

λ

PP,

λ

NN,

δ

) space. In these

cases,the expressive power ofthe whole range ofcolours is not completelyexploited. It isthereforebetter toselectthe colourof each point not directly based on bias, but on the relative value of bias within the range of values for their corresponding met-ric. The colour-map is then rescaled to show the relative bias with the value −1 corresponding to the minimum bias, and the value+1tothemaximum.The resultisdepictedinFig.5,where the range of each metric is shown in its corresponding subplot title.

Asexplainedabove,B_μ canalsoberepresentedasasetofheat maps.Eachheatmaprepresentsthemetricbiasforacertainﬁxed valueoftheimbalanceB_μ=B_μ

(

λ

PP,

λ

NN,

δ

0

)

.Theresultfor preci-sion(PRC)isportrayedinFig.6.Similargraphics canbeobtained fortheremainingmetrics.

Now let us suppose that the value of

δ

is known, for in-stance

δ

=

δ

0 =0.95. ThereforeBμ=B_μ

(

λ

PP,

λ

NN,

δ

0

)

dependson onlytwovariables(

λ

PPand

λ

NN)andcanberepresentedasaheat

map. The resultsfor each metric are shownin Fig. 7 where the coloursrepresenttheabsolutevalueofbias.

(6)

Fig.3. Functionsandsingle-valuedindicatorsassessingbiasinperformancemetricsduetoclassimbalance.

Fig.4. Heatvolumesofbiasforeachperformancemetric.

Thisinformationcan alsobepresentedintheformofcontour graphsasinFig.8wherethecoloursrepresenttheabsolutevalue ofbias.

3.2. Biasindicatorsdependingon

δ

Inthe previoussection, severalbiasindicatorsdependingonly ontheimbalancecoeﬃcient

δ

weredefined.Asmentionedtherein, the firstapproach isto consider classifierswhoseperformance is singularlylocatedonthe(

λ

PP,

λ

NN)plane.Theresultsobtainedfor

each bias indicator and performance metric

σ

B_μ(

δ

) are summa-rized in Table 7, whereunbiased performance metrics (SNS,SPC, GM,BMn)havebeenomitted.

Table7

Biasindicatorsforsingularclassiﬁers:σBμ(δ).

PRC NPV ACC F1 MCCn MKn wcBμ 0 0 0 0 0 0 bcBμ 0 0 0 0 0 0 wpcBμ δ/2 −δ/2 −δ/2 0 0 0 wncBμ δ/2 −δ/2 δ/2 2( 1+δ) 3+δ −2 3 0 0 mcBμ δ/2 −δ/2 0 1+δ 2+δ−12 0 0

Itcan be observed that onlyfour typesof non-nullindicators appear.Theirdependenceon

δ

isplottedinFig.9.

Alternatively,itcanbeassumedthat

λ

PPand

λ

NNarerandomly

(7)

Fig.5. Heatvolumesofrelativebiasforeachperformancemetric.

Fig.6. Setofheatmapsofbiasforprecision.

B_μ(

δ

) canthen be statisticallycharacterized.First, theprobability densityfunction(pdf[B_μ(

δ

)])isderivedforeachperformance met-ric.The resultsare shownin Fig.10, where,foreach

δ

andeach biasB_μ,thevaluesofpdfare shownusingdifferentcolours(dark bluecorrespondstopdf=0).

Additionally, several local statistical indicators have been de-ﬁned(see Table5) and theseare generically denotedas

ψ

B_μ(

δ

). Theresultsobtainedforeach performance metricare depictedin Fig.11.Foreasierreading,unbiasedperformancemetrics(SNS,SPC, GM,BMn) havebeen omitted.Moreover,NPV presents symmetric behaviourto PRC andhasalso beendisregarded fromthegraphs forthesakeofsimplicity.

3.3. Single-valuedbiasindicators

Ashasalreadybeenpointedout,therearevariouswaysto ob-tainsingle-valuedbiasindicators.First,weconsiderbiasfunctions

σ

B_μ(

δ

)assummarizedinTable7.Byassumingthat

δ

israndomly anduniformlydistributedwithin the[−1,1] range,themean val-uesofeachmeasurecanbecomputed.Theresultsforeach perfor-mancemetricareshowninTable8.

Now letusconsiderbiasfunctions

ψ

B_μ(

δ

) depictedinFig.11. By applying the same method, mean values of these measures can also be obtained. Results for each metric are shown in Table9.

(8)

Fig.7. Heatmapsofbiasforeachperformancemetric(δ=0.95).

Fig.8. Contourgraphsofbiasforeachperformancemetric(δ=0.95).

Table8

MeanvaluesofbiasfunctionsforsingularclassiﬁersσBμ(δ).

Symbol PRC NPV ACC F1 MCCn MKn 1 wcBμ 0 0 0 0 0 0 2 bcBμ 0 0 0 0 0 0 3 wpcB_μ 0 0 0 0 0 0 4 wncBμ 0 0 0 −0.053 0 0 5 mcBμ 0 0 0 −0.049 0 0 Table9

MeanvaluesofbiasoflocalstatisticalindicatorsψBμ(δ).

Symbol PRC NPV ACC F1 MCCn MKn 6 mBμ 0 0 0 −0.041 0 0 7 sdBμ 0.082 0.082 0.102 0.066 0.038 0.066 8 rmsBμ 0.228 0.228 0.102 0.135 0.038 0.066 9 maxaBμ 0.308 0.308 0.25 0.244 0.090 0.154 10 skBμ 0 0 0 0.129 0 0 11 kBμ 0.080 0.080 −0.6 −1.093 0.006 0.028

Insteadofcomputingthemeanofafunction,single-valuedbias indicatorscanalsobeobtainedby focusingontheir valuefor ex-tremelyimbalanceddatasets.Byﬁrstconsideringthebiasmetrics ofsingularclassiﬁers(

σ

B_μ(

δ

)),theirresultsareshowninTable10. Regardingtheextremelyimbalancedcaseforlocalstatistical in-dicators(

ψ

B_μ(

δ

)),theirresultsareshowninTable11.

For a global view of bias, let us consider that

λ

PP and

λ

NN are randomly and uniformly distributed across the [0, 1]

range while

δ

lies within the [₋1_,1] range, and then bias B_μ is statistically characterized. First, the probability density func-tion (pdf(B_μ)) is derived for each performance metric. The re-sultsareshowninFig.12,whereNPV hasbeenomittedfromthe graphbecause itshowssymmetric behaviour toPRC andhas the samepdf.

Finally, several global statistical indicators based on pdf(B_μ) havebeendeﬁned(seeTable6),whicharegenericallydenotedas

B_μ.Theresultsobtainedforeachperformancemetricareshown inTable12.

(9)

Table10

Biasindicatorsonsingularclassiﬁers(σBμ(δ))forextremelyimbalanceddatasets.

Symbol PRC NPV ACC F1 MCCn MKn Extremely positive-imbalanced 12 wcBεP μ 0.667 0 0 0 0.211 0.333 13 bcBεP μ 0 −0.667 0 0 −0.211 −0.333 14 wpcBεP μ 0.5 −0.5 −0.5 0 0 0 15 wncBεP μ 0.5 −0.5 0.5 0.333 0 0 16 mcBεP μ 0.5 −0.5 0 0.167 0 0 Extremely negative-imbalanced 17 wcBεN μ 0 0.667 0 0 −0.211 0.333 18 bcBεN μ −0.667 0 0 −0.5 0.211 −0.333 19 wpcBεN μ −0.5 0.5 0.5 0 0 0 20 wncBεN μ −0.5 0.5 −0.5 −0.667 0 0 21 mcBεN μ −0.5 0.5 0 −0.5 0 0 Table11

Biasmeasuresonlocalstatisticalindicators(ψBμ(δ))forextremelyimbalanceddatasets.

Symbol PRC NPV ACC F1 MCCn MKn Extremely positive-imbalanced 22 mBεP μ 0.5 −0.5 0 0.137 0 0 23 sdBεP μ 0.238 0.238 0.204 0.088 0.213 0.226 24 rmsBεP μ 0.554 0.554 0.204 0.163 0.213 0.226 25 maxaBεP μ 1 1 0.5 0.333 0.5 0.5 26 skBεP μ 0 0 0 0.244 0 0 27 kBεP μ −0.651 −0.651 −0.6 −1.043 −0.790 −1.014 Extremely negative-imbalanced 28 mBεN μ −0.5 0.5 0 −0.477 0 0 29 sdBεN μ 0.238 0.238 0.204 0.241 0.213 0.226 30 rmsBεN μ 0.554 0.554 0.204 0.534 0.213 0.226 31 maxaBεN μ 1 1 0.5 1 0.5 0.5 32 skBεN μ 0 0 0 0.168 0 0 33 kBεN μ −0.651 −0.651 −0.6 −0.933 −0.790 −1.014

Fig.9. Fourtypesofbiasindicatorsforsingularclassiﬁers.

Table12

GlobalstatisticalindicatorsofbiasBμforeachperformancemetric.

Symbol PRC NPV ACC F1 MCCn MKn 34 MBμ 0 0 0 −0.041 0 0 35 SDBμ 0.271 0.271 0.118 0.169 0.055 0.086 36 RMSBμ 0.271 0.271 0.118 0.174 0.055 0.086 37 MAXABμ 1 1 0.5 1 0.5 0.5 38 SKBμ 0 0 0 −1.269 0 0 39 KBμ −0.046 −0.046 1.32 2.043 6.9 3.061

Inthissection,39single-valuedbiasindicatorshavebeen con-sidered, which can also be presented in a graphical form, as shown in Fig. 13. Here, the bias indicators

σ

B_μ

(

δ

)

, detailed in Table 8, have been omitted, since all these indicators are null (except for F₁score) and therefore their comparison would be meaningless.

3.4. Symmetryofbiasfunctions

Inorderto categorizethe biasfunctionsforeachperformance metric, it is convenient to study their symmetry. For this pur-pose, let us begin by considering the Matthews Correlation Co-efficient(MCC), since thiscoefficient is particularlyclear. Its heat map foran imbalancecoefficient

δ

=0.95 isshownin theupper left-hand-side plot of Fig. 14. A ﬁrst anti-clockwise 90° rotation on the(

λ

PP,

λ

NN) plane is performed, andtheresult isshownin

theupperright-hand-sideplot.Theresultofasecond90ºrotation is shown inthe lower right-hand-sidegraph. Finally, the signof the bias valuesis changed,as shownin thelower left-hand-side plot.Itcanbeobservedthattheresultcoincides withtheoriginal heatmap.

The180ºrotation enablesthebiasto bedeﬁnedon anewset of axes (

PP,

NN), which are related to the original (

λ

PP,

λ

NN)

throughtheexpressions

PP=1−

λ

PP;

NN=1−

λ

NN.This

sym-metrycanthereforebeformalizedas BMCC

(

λ

PP,

λ

NN,

δ

)

=−BMCC

(

PP,

NN,

δ

)

=−BMCC

(

1−

λ

PP,1−

λ

NN,

δ

)

. (20)

Hence, the MCC bias function shows an order-2 (180º) rota-tionalodd symmetry(or anti-symmetry)on the(

λ

PP,

λ

NN) plane.

Furthermore, thisbias function shows symmetry with respect to the principal diagonalon the (

λ

PP,

λ

NN) plane ifthe signof

δ

is

inverted,thatis,

BMCC

(

λ

NN,

λ

PP,−

δ

)

=BMCC

(

λ

PP,

λ

NN,

δ

)

. (21)

This dualbehaviour iscalled TypeI symmetry. Bias functions forACCandMKalsoexhibitthistypeofsymmetry.

When the bias of precision (PRC) is considered, no symmetry onthe(

λ

PP,

λ

NN) planecanbefound.However, itexhibitsa

sym-metry in the (

λ

PP,

λ

NN,

δ

) space as can be observed in Fig. 15

where its heat volume for an imbalance coeﬃcient

δ

=0.95 is shown in the upper left-hand-side plot. First, a mirror symme-try,withrespecttothe

δ

=0plane,isperformedandtheresultis

(10)

Fig.10. Probabilitydensityfunctionpdf[Bμ(δ)]foreachperformancemetric.

Fig.11. LocalstatisticalindicatorsofbiasψBμ(δ)foreachperformancemetric.

Fig.12. Probabilitydensityfunctionpdf(Bμ)foreachperformancemetric.

shownintheupperright-hand-sideplot.Thesignofthebias val-uesisthenchangedandtheresultsareshowninthelower

right-hand-side plot. Finally, a second mirror symmetry is performed, this time with respect to the anti-diagonal plane drawn in the thirdplot.Theresultisshowninthelowerleft-hand-side plot.It can be observed that the result coincides with the original heat volume.

The double mirror symmetry enables the bias to be deﬁned on a new set of axes (

PP,

NN,

), which are related to the

original set (

λ

PP,

λ

NN,

δ

) through the expressions

PP=1−

λ

NN;

NN=1−

λ

PP;

=−

δ

. This symmetry can hence be formalized

as

BPRC

(

λ

PP,

λ

NN,

δ

)

=−BPRC

(

PP,

NN,

)

=−BPRC

(

1−

λ

NN,1−

λ

PP,−

δ

)

. (22)

Therefore, the PRC bias function shows a double mirror odd symmetry(or anti-symmetry)inthe (

λ

PP,

λ

NN,

δ

)space.This

be-haviouriscalledTypeIIsymmetry. Biasfunctionsforeach metric (exceptF1 score)exhibitthistypeofsymmetry.

Additionally, the symmetry of each pdf can be measured throughtheskewnessstatistics.

(11)

Fig.13. Single-valuedbiasindicatorsforeachperformancemetric.

Fig.14. StudyofsymmetryforBMCC(λNN,λPP)withδ=0.95.

3.5.Clusteringperformancemetricsbasedontheirbias

Inprevious sections,theimpactofimbalanceonten classiﬁca-tionperformancemetrics isstudied.Based ontheir bias,the per-formancemetrics cannowbegroupedintoseveralclusters.In or-dertoperformthisclustering,the39single-valuedbiasindicators areconsidered.Thatis,eachperformancemetricistobefeatured byapointinanR39 _space.

Totackletheissueofhowtovisualizesuchahigh-dimensional vector, a reduction of dimensionality to a plane (2D) must be

undertaken. The ﬁrst approach (Fig. 16-A) involves the selection of 2 highly signiﬁcant bias indicators and the projection of the points on that plane. Our selection is of RMSB_μ, which is an indicator of the mean global bias, and of rmsBεP

μ, which is a mean gauge of bias for extremely positive-imbalanced datasets. In this graphic, it can be observed that: bias for SNS, SPC, GM and BMn performance metrics are at the same point; bias for ACC, MCCn and MKn are very close to each other; bias for F₁ is not far from these metrics; and, ﬁnally, bias for PRC and NPV are at the same point but distant from the other metrics.

(12)

Fig.15. StudyofsymmetryforBPRC(λNN,λPP,δ)withδ=0.95.

Following these criteria, either 3 or 4 clusters could be esta-blished.

However, a more in-depth insight shows that PRC and NPV havesymmetricbehaviourformanybiasindicators.Theyhave ap-peared together inthe previous graph becausethe selected indi-cators(RMSB_μ andrmsBεP

μ)computeasquaredmean,whichhides theirsymmetriccharacteristics.Inordertoovercomethisissue,the dimensionalityreductioncanbemadebyselectingadifferentpair ofbias indicators.InFig. 16-B,MAXABisa globalindicatorofthe absolute maximum value of bias (and still hides the symmetry), and mBεP

μis another mean gauge of bias for extremely positive-imbalanceddatasets thatrevealsthesymmetry. Fiveclustersnow clearlyappear.

An alternative to theprevious somewhat arbitraryand reduc-tionist selection of the pair of bias indicators involves the con-sideration of the full set of indicators and the performance of some kindofbidimensionalreduction.PrincipalComponent Anal-ysis(PCA),showninFig.16-C[26],andMulti-dimensionalScaling (MDS),showninFig.16-D[7],areemployedasthetechniquesfor thisreduction.

Each panel on Fig. 16 represents a different bidimensional perspective of highly dimensional (R39 ₎ _set _of _points. _Therefore, slightlydifferentclusteringmayarise inanyofthem.But consid-ering all the panels,it can be seen that the following 5 clusters appear:

I. ClustercomprisedofSNS,SPC,GMandBMnwithmetricshaving nullbias.

II. Cluster comprised of ACC, MCCn andMKn with the following features:

• Bias has Type I symmetry, that is, B_μ

(

λ

PP,

λ

NN,

δ

)

= −B_μ

(

1−

λ

PP,1−

λ

NN,

δ

)

and Bμ

(

λ

NN,

λ

PP,−

δ

)

=Bμ

(

λ

PP,

λ

NN,

δ

)

.

• Biaspdfhasnullskewness.

• Bias values are moderate: 0.5 for maximum (MAXAB_μ, maxaBεP

μ and maxaBεμN); and about 0.2 foraverage bias in extremelyimbalanceddatasets(rmsBεP

μ andrmsBεN

μ). • Signofbiasdoesnotdependonsignofimbalance. III. Cluster(with2subclusters)comprisedofPRCandNPVwiththe

followingfeatures:

• Bias has Type II symmetry, that is, B_μ

(

λ

PP,

λ

NN,

δ

)

= −B_μ

(

1₋

λ

NN,1−

λ

PP,−

δ

)

.

• Biaspdfhasnullskewness.

• Biasvaluesarehigh:1formaximum(MAXAB_μ,maxaBεP

μ and maxaBεN

μ );andabout0.5foraverage biasinextremely im-balanceddatasets(rmsBεP

μ andrmsBεμN).

• The relationship betweenthe sign of bias andthe sign of imbalanceestablishes2subclusters.

A. PRC,withbiasandimbalancehavingthesamesign. B. NPV,withbiasandimbalancehavingtheoppositesign. IV. ClustercomprisedbyF₁withthefollowingfeatures:

• Biashasnosymmetry.

• Biaspdfhasnon-nullskewness.

• Bias values are low forpositive imbalance: 0.33 for maxi-mum(maxaBεP

μ);andapproximately0.15foraveragebiasin extremelyimbalanceddatasets(rmsBεP

μ).

• Biasvaluesarehighfornegativeimbalance:1formaximum (maxaBεN

μ); and approximately 0.5 for average bias in ex-tremelyimbalanceddatasets(rmsBεN

μ ).

• Signofbiasandsignofimbalancearethesame.

ClusteringinformationissummarizedinTable13.

Another way to represent how performance metrics are groupedaccordingtothebias behaviour isbydrawing a dendro-gram.Tothisend,thefullsetofbiasindicatorsisemployedto fea-tureeachperformance metric.Thedistancesbetweenthemetrics arethen computedinthe spaceoftheR39 _bias _indicators._These

(13)

Fig.16. Bidimensionalrepresentationofperformancemetricsaccordingtotheirbiasindicators.

Table13

Clustersofperformancemetricsattendingtotheirbias.

I II III.A III.B IV

SNSSPCGMBM ACCMCCMK PRC NPV F1

δ>0 maxaBεP

μ Null 0 Medium 0.5 High 1 High 1 Low 0.333

rmsBεP

μ 0 ∼0.2 0.554 0.554 0.163

δ<0 maxaBεN

μ 0 0.5 1 1 High 1

rmsBε_μN 0 ∼0.2 0.554 0.554 0.534

Symmetry TypeI Yes Yes Yes Yes No

TypeII Yes Yes No No No

Skewness 0 0 0 0 =0

sgn(B)vs.sgn(δ) = Independent = = =

distancesareemployedtogaugehowseparatedthemetricsare,as showninFig.17.Onceagain,5clustersclearlyappear.

4. Discussion

The first issuetobe discussed isthecomparisonbetweenthe imbalanceratio(IR),definedbyseveralauthorstoquantify imbal-ance,andtheimbalancecoefficient (

δ

)asproposed inthispaper. Althoughtheyarebothvalidindicatorsofthedegreetowhichthe datasetsareimbalanced,we prefer

δ

becauseitisdeﬁnedwithin the[−1,+1]boundedrangewiththebalancedcase(

δ

=0)inthe middleof the segment (and hence it is symmetric); whilst IR is

deﬁned within the [0,+∞] unbounded range with the balanced caseIR₌1,which isclearlyasymmetric.In orderto obtain sym-metric behaviour basedon IR,the logarithm of IRcould be used (LIR), whoserange is[−∞,+∞] withthe balanced case(LIR=0) inthemiddle;however,itsrangestillremainsunbounded.Fig.18 showsanexampleofa localstatisticalindicatorofbias (rmsB) as afunctionoftheimbalanceusing

δ

(left-hand-side)andIRin log-arithmicscale(right-hand-side).

A practical application of the above results is that the bias’s meanvalue ofeveryperformancemetric(andother related statis-tics) can be computed using the equations in Table 6, and their resultsforthetenstudiedmetricsareshowninTable12.

(14)

Fig.17. Dendrogramofperformancemetricsaccordingtotheirbiasmeasures.

Inthe caseswherethedataset isknown,theexpectedbiasof every performance metric can be computed. Indeed, the dataset determine the value of the imbalance coeﬃcient

δ

and, for that value, thebias(itsroot-mean-squarevalue)can beobtainedasit isshowninFig.18.Thefullsetofbias’sstatisticscanbecomputed using the equations in Table 5 and their results are depicted in Fig.11.

Additionally if the classiﬁer results on that dataset are also known (thatis,the valuesof

λ

PP and

λ

NN) the bias’sexact value

ofeveryperformancemetriccanbecomputedusingtheequations inTable3.

Letusnowfocusonthebiasfunctions.Fromtheaboveresults it is clear that the best performance metrics, that is, those with nobiasduetoimbalance,arethoseofsensitivity(SNS)and speci-ﬁcity(SPC). Thesemetrics canbe consideredone-dimensional (or partial)performancemetricshowever,sincetheytakeintoaccount onlythe resultsoneitherthepositive(SNS) orthe negative(SPC) class,butnotboth.

Null biasis alsoshown by two metrics directlydependingon SNSandSPC: geometricmean(GM) andbookmakerinformedness (BMorsingle-thresholdAUC). Thesesolve theone-dimensionality

Table14

Behaviourofperformancemetricswithimbalanceddatasets.

Cluster Metric Bias RMSBμ Focusonclasses Focusonresults (Positive,Negative) (Successes,Errors)

I SNS Null 0 P S

SPC Null 0 N S

GM Null 0 P&N S

BM Null 0 P&N S

II ACC Medium 0.118 P&N S

MCC Medium 0.055 P&N S&E

MK Medium 0.086 P&N S&E III PRC High 0.271 P&N S&E

NPV High 0.271 P&N S&E

IV F1 High 0.174 P&N S&E

problem of the SNS and SPC metrics by considering either their arithmetic (BM) or geometric (GM) mean. Although these two metrics constitute good alternatives to be used with imbalanced datasets, they have the drawback of focusing on only the classi-ﬁcation successes(

λ

PP and

λ

NN), andfail to directlyconsider the

classiﬁcationerrors(

λ

PN and

λ

NP).

The second best (lowest biased) cluster of metrics is that which is comprised of accuracy (ACC), Matthews correlation co-efficient (MCC), and markedness (MK). These all have a global (notpartial) perspective, since classification results on both pos-itive and negative classes are considered. From among these 3 metrics,ACCfocusesonly ontheclassificationsuccesses, whichis a drawback and, additionally, has the highest bias (except when extreme balanced datasets are used). In this cluster, the lowest bias is shown by MCC with moderate values (lower than 0.2 in the normalizedversion) for almost every value of the imbalance coefficient.

Finally,the metrics in the third andfourth clusters, precision (PRC),negativepredictivevalue(NPV),andF1 score(F1),arehighly biased and should be avoided for use in imbalanced datasets. Table 14summarizes the behaviour of performance metrics with imbalanceddatasets.

As a practical conclusion, when dealing with imbalanced datasets,GMandBMarethebestperformancemetricsiftheir fo-cusonsuccesses(dismissingtheerrors) presentsnolimitationfor thespeciﬁc applicationwhere theyare used. However,if classiﬁ-cationerrorsmustalsobeconsidered,thenMCCarisesasthebest choice.

(15)

Concordant results have been obtained in other studies, al-though mostof theseprevious resultswere only shown andnot exhaustivelyexplored andquantiﬁed.Theweakness ofACCin im-balanced problems has been signalled by many authors [10,22]. Furthermore, the use of PRC has been extensively discouraged [5,13,20]. F₁score,which dependson PRC, is also indirectly dis-missedbythoseauthors anddirectlyrejectedby Jeni,Cohn & De LaTorre[25].Mostofthe literature onthisissuedoesnot select anyperformance metrics, thereby limiting their study to a mere indicationthattheyarebiased.Afewauthorsalsosuggestthatthe bestchoiceistheMCCmetric[4,11].

Probably themostcitedsolutiontoovercometheeffectof im-balanceon performance metrics is the useof theClass Balanced Accuracy(CBA).Intheterminologyusedthroughoutthispaper,this isthe accuracy forthe classiﬁer operating on abalanced dataset (ACCb). Nevertheless,accordingto ourstudy, thisideacan be

ex-tended to theremaining performance metrics, by obtaining their balanced counterpart (

μ

b), generallycalledClass Balance Metrics

(CBM), asformulated in thelast column ofTable2. These exten-sionspermitdifferentnull-biasperspectivesto beusedinthe as-sessmentoftheresultsobtainedby aclassiﬁerintheimbalanced case.

The ClassBalance version oftheten studiedmetrics (

μ

_b) will showanullbiasand,therefore,a valueofRMSB_μ=0(columns 3 and4inTable14).Aseverymetricisnowunbiased,choosingany ofthem should be based on other criteria, for instance,on their symmetry (column1), their focuson classes (column5) ortheir focusonresults(lastcolumn).Thesevaluesremainunalteredwith respecttotheirbiasedcounterpart.

The behaviour of each Class Balance Metric is shown in Fig.2which canalso be usedasa guideforthe selectionof the metric.

5. Conclusions

In thispaper,an extensiveandsystematicstudyoftheimpact ofclass imbalanceonclassiﬁcation performancemetrics is under-taken.Tothebestofourknowledge,noquantitativeandcomplete studyofthisissuehasbeenpreviouslypublished.

TocharacterizethedisparitybetweenclassestheImbalance Co-eﬃcienthasbeendeﬁned:anewmeasurewhichsurpassesthe Im-balanceRatioortheEntropyusedinpreviousstudies.

Throughout our analysisseveralpractical procedures to deter-minethe bias’squantitative value ofa metric havebeenderived, eitherforageneralcase,foracertaindatasetorforagiven exper-iment(pairofclassiﬁeranddataset).

From the simulation results,a quantitativelyjustified guideto selectperformance metrics in the presence of imbalance classes has been developed. In our analysis, several clusters of perfor-mancemetrics havebeen identified that involvethe use of Geo-metricMean orBookmaker Informedness asthe bestnull-biased metricsiftheirfocusonclassificationsuccesses(dismissingthe er-rors)presentnolimitationforthespecific applicationwherethey are used. However, if classification errors must also be consid-ered,thenthe MatthewsCorrelationCoefficientarisesasthebest choice.

Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed which extends the concept of Class Balance Accuracytootherperformancemetrics.

Supplementarymaterial

Supplementary material associated with this article can be found,intheonlineversion,atdoi:10.1016/j.patcog.2019.02.023.

References

[1]A.Amin,S.Anwar,A.Adnan,M.Nawaz,N.Howard,J.Qadir,...A.Hussain, Comparingoversamplingtechniquestohandletheclassimbalanceproblem:a customerchurnpredictioncasestudy,IEEEAccess4(2016)7940–7957. [2]G.E.Batista,R.C.Prati,M.C.Monard,Astudyofthebehaviorofseveralmethods

forbalancingmachinelearningtrainingdata,ACMSIGKDDExplor.Newslett.6 (1)(2004)20–29.

[3]C.Beyan,R.Fisher,Classifyingimbalanceddatasetsusingsimilaritybased hi-erarchicaldecomposition,PatternRecognit.48(5)(2015)1653–1672. [4]S. Boughorbel, F. Jarray, M. El-Anbari, Optimal classiﬁer for imbalanced

data usingMatthewsCorrelationCoeﬃcient metric,PloSOne12(6)(2017) e0177678.

[5]P.Branco,L.Torgo,R.P.Ribeiro,Relevance-basedevaluationmetricsformulti– classimbalanceddomains,in:Paciﬁc-AsiaConferenceonKnowledgeDiscovery andDataMining,Cham,Springer,2017,May,pp.698–710.

[6]K.H.Brodersen,C.S.Ong,K.E.Stephan,J.M.Buhmann,Thebalancedaccuracy anditsposteriordistribution,in:PatternRecognition(ICPR),201020th Inter-nationalConferenceon,IEEE,2010,August,pp.3121–3124.

[7]R.Caruana,A.Niculescu-Mizil,Datamininginmetricspace:anempirical anal-ysisofsupervisedlearningperformancecriteria,in:ProceedingsoftheTenth ACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandData Min-ing,ACM,2004,August,pp.69–78.

[8]F.Charte,A.J.Rivera,M.J.delJesus,F.Herrera,Addressingimbalancein multi-labelclassiﬁcation:measuresandrandomresamplingalgorithms, Neurocom-puting163(2015)3–16.

[9]N.V.Chawla,K.W.Bowyer,L.O.Hall,W.P.Kegelmeyer,SMOTE:synthetic mi-norityover-samplingtechnique,J.Artif.Intell.Res.16(2002)321–357. [10]N.V.Chawla,Dataminingforimbalanceddatasets:AnOverview,in:Data

Min-ingandKnowledgeDiscoveryHandbook,SpringerUS,2005,pp.853–867. [11]D.Chicco,Tenquicktipsformachinelearningincomputationalbiology,

Bio-DataMin.10(1)(2017)35.

[12] R.A.Dara,M.S.Kamel,N.Wanas,Datadependencyinmultipleclassiﬁer sys-tems,PatternRecognit.42(7)(2009)1260–1273.

[13]S.Daskalaki,I.Kopanas,N.Avouris,Evaluationofclassiﬁersforanunevenclass distributionproblem,Appl.Artif.Intell.20(5)(2006)381–417.

[14]C.Ferri,J.Hernández-Orallo,R.Modroiu,Anexperimentalcomparisonof per-formance measures for classiﬁcation, Pattern Recognit. Lett. 30 (1)(2009) 27–38.

[15]A.Fernández,S.delRío,N.V.Chawla,F.Herrera,Aninsightintoimbalanced BigDataclassiﬁcation:outcomesand challenges,ComplexIntell.Syst.3(2) (2017)105–120.

[16]P.A.Flach,ThegeometryofROCspace:understandingmachinelearning met-ricsthroughROCisometrics,in:Proceedingsofthe20thInternational Confer-enceonMachineLearning(ICML-03),2003,pp.194–201.

[17]V. Ganganwar, An overview of classiﬁcation algorithms for imbalanced datasets,Int.J.Emerg.Technol.Adv.Eng.2(4)(2012)42–47.

[18]V.García,R.A.Mollineda,J.S.Sánchez,Indexofbalancedaccuracy:a perfor-mancemeasureforskewedclassdistributions,in:IberianConferenceon Pat-ternRecognitionandImageAnalysis,Berlin,Heidelberg,Springer,2009,June, pp.441–448.

[19]J.Gorodkin,ComparingtwoK-categoryassignmentsbyaK-category correla-tioncoeﬃcient,Comput.Biol.Chem.28(5–6)(2004)367–374.

[20]Q.Gu,L.Zhu,Z.Cai,Evaluationmeasuresoftheclassiﬁcationperformanceof imbalanceddatasets,in:InternationalSymposiumonIntelligence Computa-tionandApplications,Berlin,Heidelberg,Springer,2009,October,pp.461–471. [21]H. He,E.A. Garcia, Learningfromimbalanced data,IEEETrans.Knowl. Data

Eng.21(9)(2009)1263–1284.

[22]M.Hossin,M.N.Sulaiman,Areviewonevaluationmetricsfordata classiﬁca-tionevaluations,Int.J.DataMin.Knowl.Manage.Process5(2)(2015)1. [23]T.Kautz,B.M.Eskoﬁer,C.F.Pasluosta,Genericperformancemeasurefor

multi-class-classiﬁers,PatternRecognit.68(2017)111–125.

[24]B.Krawczyk,Learningfromimbalanceddata:openchallengesandfuture di-rections,Prog.Artif.Intell.5(4)(2016)221–232.

[25]L.A.Jeni,J.F.Cohn,F.DeLaTorre,Facingimbalanceddata–recommendations fortheuseofperformancemetrics,in:AffectiveComputingandIntelligent In-teraction(ACII),2013HumaineAssociationConference,IEEE,2013,September, pp.245–251.

[26]I.Jolliffe,Principalcomponentanalysis,in:InternationalEncyclopediaof Sta-tisticalScience,Springer,Berlin,Heidelberg,2011,pp.1094–1096.

[27]G.Jurman,S.Riccadonna,C.Furlanello,AcomparisonofMCCandCENerror measuresinmulti-classprediction,PLoSOne7(8)(2012)e41882.

[28]M.S. Kraiem, M.N. Moreno, Effectiveness of basic and advanced sampling strategiesontheclassiﬁcationofimbalanceddata.acomparativestudyusing classicaland novelmetrics,in:International ConferenceonHybridArtiﬁcial IntelligenceSystems,Cham,Springer,2017,June,pp.233–245.

[29]V.López,A.Fernández,S.García,V.Palade,F.Herrera,Aninsightinto classi-ﬁcationwithimbalanceddata:empiricalresultsandcurrenttrendsonusing dataintrinsiccharacteristics,Inf.Sci.250(2013)113–141.

[30]B.W.Matthews,Comparisonofthepredictedand observedsecondary struc-tureofT4phagelysozyme,BiochimicaetBiophysicaActa(BBA)405(2)(1975) 442–451.

[31]N.Mehra,S.Gupta,Surveyonmulticlassclassiﬁcationmethods,Int.J.Comput. Sci.Inf.Technol.4(4)(2013)572–576.

[32]L.Mosley,ABalancedApproachtotheMulti-ClassImbalanceProblem,Iowa StateUniversity,2013.

(16)

A.Luque,A.CarrascoandA.Martínetal./PatternRecognition91(2019)216–231 231 [33]H. Núñez,L.Gonzalez-Abril,C.Angulo,ImprovingSVMclassiﬁcationon

im-balanced datasets byintroducing anewbias,J. Classiﬁcation34(3)(2017) 427–443.

[34]D.M. Powers, Evaluation:from precision,Recalland F-Measureto ROC, In-formedness,MarkednessandCorrelation,SchoolofInformaticsand Engineer-ing,FlindersUniversity,Adelaide,Australia,2011TechnicalReportSIE-07-001. [35]B.Raman,T.R.Ioerger,EnhancingLearningUsingFeatureandExample

Selec-tion,TexasA&MUniversity,CollegeStation,TX,USA,2003.

[36]M.Sokolova,N.Japkowicz,S.Szpakowicz,Beyondaccuracy,F-scoreandROC: afamilyofdiscriminantmeasuresforperformanceevaluation,in:Australasian JointConferenceonArtiﬁcialIntelligence,Berlin,Heidelberg,Springer,2006, December,pp.1015–1021.

[37]M.Sokolova,G.Lapalme,Asystematicanalysisofperformancemeasuresfor classiﬁcationtasks,Inf.Process.Manage.45(4)(2009)427–437.

[38]S.V.Stehman, Selectingandinterpretingmeasuresofthematicclassiﬁcation accuracy,RemoteSens.Environ.62(1)(1997)77–89.

[39]Y.Sun,M.S.Kamel,A.K.Wong,Y.Wang,Cost-sensitiveboostingfor classiﬁca-tionofimbalanceddata,PatternRecognit.40(12)(2007)3358–3378. [40]X.Yuan,L.Xie,M.Abouelenien,Aregularizedensembleframeworkofdeep

learningforcancerdetectionfrommulti-class,imbalancedtrainingdata, Pat-ternRecognit.77(2018)160–172.

[41]W.Zong,G.B.Huang,Y.Chen,Weightedextremelearningmachinefor imbal-ancelearning,Neurocomputing101(2013)229–242.

AmaliaLuquereceivedherIndustrialEngineeringdegreein2007,Master in Au-tomationin2010andherDoctoralDegreein2014.Shehasbeeninvolvedin teach-ingrelatedtoProjectEngineeringattheUniversityofSevillesince2015.Hermain areasofresearcharebusinessintelligence,dataminingandfeatureextraction.

Alejandro Carrascoreceivedhis ComputerEngineering degreein 1998and his Ph.D.in2003.HeisaLecturerand ResearcherattheUniversityofSevillesince 1998,hasfoundedaNTBFanddirectedseveralR&Dprojects.Hismainareasof re-searcharedatamining,industrialcomputingandnetworksecurity.

AlejandroM.Martín-Gómez,B.IndustrialElectronicsandAutomationEngineer,MS Productandfacilitiesdesignanddevelopment,MSSecurityandhealthandPhD Stu-dent(manufacturingengineering).Heworkedacrossawidevarietyofprojectsof industrialsector.HeisassistantprofessoratUniversityofSeville(Fieldof Knowl-edge:EngineeringProject).

AnadelasHerasGarcíadeVinuesaisaBSc.EngIndustrialDesign,MSc.Ecodesign, andPhD.inManufacturingandEnvironmentalEngineering.Sheworksasan assis-tantprofessorintheDepartmentofDesignEngineering,EngineeringProjectarea, atUniversityofSeville,Spain.