Profit-driven classification with machine learning: an application in credit scoring for micro-entrepreneurs

(1)


Sebastián Maldonado, J. Pérez (UAndes, Chile), J. López (UDP, Chile), C. Bravo (Western U, Canada)

Business School, University of Chile, Santiago, Chile. Email: [email protected]

(2)

(3)

(4)

(5)

Motivation

Nate Silver’s “The signal and the noise”:

(6)

The “wet bias”

(7)

The “wet bias”

(8)

The “wet bias”

(9)

Cost-sensitive and causal classification

Is it OK to be sinners?

Statistical measures for model evaluation, such as accuracy, are not adequate (D.J. Hand, 2012).

Focus on the primary goal of the predictive task!

(10)

The Micro-entrepreneurs challenge

Micro-entrepreneurs in Chile: (very) small firms with annual sales below €60,000.

(11)

The Micro-entrepreneurs challenge (2)

Should we cover this segment? (Yes!) According to Bruhn and Love, 2009:

There’s a 7% increase in income in the long term for ME with access to loans.

There’s a 1.4 increase in the number of employees at funded MEs.

The number of firms in the country is increased by 7.6% when there’s a healthy system.

(12)

(13)

Feature Selection

Recently, databases have grown in size in all areas of knowledge, in both the number of instances and the number of attributes.

Feature selection is an important topic in business analytics applications.

A low-dimensional representation:

Reduces the risk of overfitting, improving model generalization.

Requires less computational effort.

(14)

Feature Selection - Motivation II

Example - Irrelevancy

(15)

Feature Selection - Motivation III

Example - Assessing Relevancy

(16)

Feature Selection Strategies

(17)

Wrapper Methods II

Figure: $N$ attributes, $2^N$ subset combinations!
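The combinatorial growth behind this figure can be illustrated with a minimal sketch. The helper name `all_subsets` and the toy feature names are hypothetical, and the count includes the empty subset.

```python
from itertools import combinations


def all_subsets(features):
    """Yield every subset of `features`, including the empty one."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


# Toy example with three hypothetical attributes.
features = ["f1", "f2", "f3"]
subsets = list(all_subsets(features))
n_subsets = len(subsets)  # 2**3 candidate subsets

# For N = 94 attributes, exhaustive wrapper search would need 2**94 evaluations.
n_subsets_94 = 2 ** 94
```

Even a moderate number of attributes makes exhaustive enumeration impractical, which is why wrapper methods rely on heuristic search.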

(18)

Support Vector Machines - SVM

SVM: Geometrical Interpretation

Among existing classification methods, Support Vector Machines provide several advantages, such as adequate generalization to new objects due to the Structural Risk Minimization principle, absence of local minima via convex optimization, and a representation that depends on only a few data points (the support vectors).

(19)

Support Vector Machines - SVM

Standard l2-SVM

Linear SVM constructs an optimal hyperplane $f(x) = w^\top x + b$ that tries to correctly separate one class from the other. To achieve this optimal hyperplane, SVM aims to maximize the margin. Given that a perfect separation between the two classes is not always possible, slack variables $\xi_i$ are introduced for each training vector $x_i$, $i = 1, \dots, N$, whereby $C$ is used as a penalization parameter.
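For reference, the textbook soft-margin formulation that matches this description can be written as follows. The class labels $y_i \in \{-1, +1\}$ are an assumption here, chosen to agree with the labels used in the profit computation later in the deck.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{N}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad i = 1,\dots,N, \\
& \xi_i \ge 0, \quad i = 1,\dots,N.
\end{aligned}
```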

(20)

SVM - Norm-based Regularizers

The Euclidean norm can be replaced to encourage sparsity:

l2 squared norm: $\|w\|_2^2 = \sum_{j=1}^{J} w_j^2$

l1 norm (LASSO): $\|w\|_1 = \sum_{j=1}^{J} |w_j|$

lp norm: $\|w\|_p = \left( \sum_{j=1}^{J} |w_j|^p \right)^{1/p}$
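As a small illustration of these regularizers, the sketch below evaluates them on a made-up weight vector. The function name `lp_norm` and the values in `w` are illustrative assumptions, and the lp norm is taken over absolute values.

```python
def lp_norm(w, p):
    """Return (sum_j |w_j|**p) ** (1/p) for p >= 1."""
    return sum(abs(wj) ** p for wj in w) ** (1.0 / p)


# Hypothetical weight vector with one feature already switched off.
w = [3.0, 0.0, -4.0]

l1 = lp_norm(w, 1)               # sum of absolute values
l2_squared = lp_norm(w, 2) ** 2  # squared Euclidean norm
```

The l1 norm grows linearly with each weight, which is what pushes small weights to exactly zero, whereas the squared l2 norm mostly shrinks large weights.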

(21)

Business-driven research I

Two datasets from a Chilean bank.

Information about the benefits and losses for granting credit, and variable acquisition costs were available.

Benefit of accepting a good borrower: Return On Investment (ROI) of the loan.

(Broverman, 2010) Considering a principal $A$ repaid over $M$ terms (maturity) at an interest rate $r$, the total interest $I$ is:

$I = \frac{A M r}{1 - (1 + r)^{-M}} - A = A \left( \frac{M r}{1 - (1 + r)^{-M}} - 1 \right)$

The ROI of a loan is simply the total interest $I$ divided by the principal $A$, i.e. $\mathrm{ROI} = \frac{I}{A} = \frac{M r}{1 - (1 + r)^{-M}} - 1$.
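A minimal sketch of this computation, reading the interest formula as that of a loan repaid in equal installments over the given number of terms. The function names and the loan values are illustrative assumptions, not figures from the study.

```python
def total_interest(principal, rate, terms):
    """Total interest paid on a loan with equal installments.

    principal: amount lent (A); rate: per-term interest rate (r > 0);
    terms: number of installments.
    """
    installment = principal * rate / (1.0 - (1.0 + rate) ** (-terms))
    return terms * installment - principal


def roi(rate, terms):
    """Return on investment: total interest divided by the principal."""
    return terms * rate / (1.0 - (1.0 + rate) ** (-terms)) - 1.0


# Hypothetical loan: 1000 lent at 2% per term over 12 terms.
A, r, terms = 1000.0, 0.02, 12
interest = total_interest(A, r, terms)
loan_roi = roi(r, terms)
```

Note that the ROI depends only on the rate and the number of terms, so the benefit of a good borrower scales linearly with the amount lent.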

(22)

Business-driven research II

Cost of accepting a bad applicant: loss that is incurred when the borrower defaults.

The Basel II Banking Regulation Accords define the expected loss of a borrower as

L = PD · LGD · EAD

EAD: amount that is outstanding when default occurs; LGD: percentage of the EAD that cannot be recovered after all collection actions; PD: probability of default. For a borrower who has already defaulted, PD = 1, and then L = LGD · EAD.
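The loss for a defaulted borrower follows directly from L = PD · LGD · EAD with PD = 1. The sketch below uses made-up values for the exposure and the loss given default, and the function name `expected_loss` is hypothetical.

```python
def expected_loss(pd, lgd, ead):
    """Basel II expected loss: PD * LGD * EAD."""
    return pd * lgd * ead


# Hypothetical defaulter: 800 outstanding at default, 60% not recovered.
loss_if_default = expected_loss(pd=1.0, lgd=0.6, ead=800.0)
```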

(23)

Business-driven research III

Five groups of variables with different acquisition costs.

Credit evaluation: The form that each applicant fills out (sunk cost). €5 per borrower of an executive’s time.

In-depth interview: Interview of the applicant after the evaluation process during a visit to his/her place of work. €20 per borrower of an executive’s time.

Financial analysis: Estimation of the cash flow of the micro-enterprise. €20 per borrower of a specialist’s time.

System-level information: The bank acquired the information of the standing debts of the borrowers in the financial system (new customers only). €1000 for all borrowers.

(24)

Group penalty functions

A regularization strategy designed to penalize the use of a group of related variables together, instead of removing weights independently (Yuan and Lin, 2006).

Strategy used with categorical attributes with multiple levels (related dummy variables), or multiclass classification.

We propose using the l∞-norm regularizer (Zou and Yuan, 2008):

$\Gamma(w) = \sum_{j=1}^{J} \|w^{(j)}\|_\infty$,

where $\|w^{(j)}\|_\infty = \max_{l \in I_j} \{|w_l|\}$, i.e. the highest weight (in magnitude) for each source of variables $j = 1, \dots, J$ is minimized, with $I_j$ being the set of variables that belong to source $j$.
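The group penalty can be sketched in a few lines. The weight vector, the source names, and the function name `group_linf_penalty` below are illustrative assumptions; only the definition of the penalty (sum over sources of the largest absolute weight) comes from the slide.

```python
def group_linf_penalty(w, groups):
    """Sum over sources of the largest absolute weight within each source.

    w: list of weights; groups: dict mapping a source name to its index set.
    """
    return sum(max(abs(w[l]) for l in idx) for idx in groups.values())


# Hypothetical weights for five variables grouped into three sources.
w = [0.5, -2.0, 0.0, 0.0, 1.5]
groups = {"credit_evaluation": [0, 1], "interview": [2, 3], "financial": [4]}

penalty = group_linf_penalty(w, groups)

# A source is still "acquired" if any of its variables has a non-zero weight.
active_sources = [name for name, idx in groups.items()
                  if any(w[l] != 0.0 for l in idx)]
```

A source contributes nothing to the penalty only when all of its weights are zero, so minimizing this term tends to discard whole sources rather than individual variables.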

(25)

Proposed formulations

The l2l∞-SVM method

It combines three objectives: the Euclidean norm minimization (also known as Tikhonov regularization), the l∞-norm penalization for grouped feature selection, and the hinge loss minimization to guarantee an adequate model fit. $C, \lambda > 0$ are parameters that are tuned via cross-validation.

(26)

Proposed formulations

The l2l∞-SVM method II

To avoid using a non-smooth function in the previous problem, we introduce a set of auxiliary variables $z_j \ge 0$ and add new constraints $|w_l| \le z_j$ for each $l \in I_j$ and $j = 1, \dots, J$. This results in a quadratic programming problem (QPP).
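One plausible way to write the resulting problem is sketched below. It assumes that the objective simply adds the three terms described for the l2l∞-SVM, with $\lambda$ weighting the group penalty and $C$ the hinge loss, and that labels are $y_i \in \{-1, +1\}$; the exact scaling used by the authors is not shown in the slides.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,z}\quad & \frac{1}{2}\|w\|_2^2 + \lambda\sum_{j=1}^{J} z_j + C\sum_{i=1}^{N}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,N, \\
& -z_j \le w_l \le z_j, \quad l \in I_j,\ j = 1,\dots,J, \\
& z_j \ge 0, \quad j = 1,\dots,J.
\end{aligned}
```

At the optimum each $z_j$ equals $\max_{l \in I_j} |w_l|$, so the linear term in $z$ reproduces the l∞ group penalty while keeping the objective quadratic and the constraints linear.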

(27)

Proposed formulations

$\|w\|_1 = \sum_{i=1}^{n} |w_i|$ denotes the l1-norm of $w$.

(28)

Proposed formulations

The same trick is used to avoid a non-smooth function in the previous problem, leading to a linear programming problem (LPP).
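A hedged sketch of such a linear program follows. It assumes the l1-norm replaces the Euclidean term in the objective, uses the same weighting by $\lambda$ and $C$ as an assumption, and introduces auxiliary variables $u_l$ (notation not in the slides) to bound $|w_l|$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,z,\,u}\quad & \sum_{l=1}^{n} u_l + \lambda\sum_{j=1}^{J} z_j + C\sum_{i=1}^{N}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,N, \\
& -u_l \le w_l \le u_l, \quad l = 1,\dots,n, \\
& -z_j \le w_l \le z_j, \quad l \in I_j,\ j = 1,\dots,J.
\end{aligned}
```

Both the objective and the constraints are linear in the decision variables, so a standard LP solver can be applied.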

(29)

Profit Computation

Solution tuple $\Lambda = \{w, b\}$ for an SVM problem. Validation subset $V$ with $x^v_l$ and $y_l \in \{-1, +1\}$, for $l = 1, \dots, |V|$.

The decision rule is $\mathrm{sgn}(w^\top x^v_l + b)$.

The total profit is: Benefits − Losses − Variable acquisition costs.
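The profit evaluation can be sketched as follows. Several details are assumptions made only for illustration: label +1 denotes a good borrower, applicants with a positive decision value are granted the loan, rejected applicants contribute nothing, and acquisition costs are a per-borrower amount plus an optional fixed amount. The function name, arguments, and data are hypothetical.

```python
def total_profit(w, b, X, y, benefit, loss,
                 cost_per_borrower=0.0, fixed_cost=0.0):
    """Benefits - losses - variable acquisition costs on a validation set.

    Assumption: y = +1 is a good borrower, and a loan is granted
    whenever the decision value w.x + b is positive.
    """
    benefits = 0.0
    losses = 0.0
    for x_l, y_l, gain_l, loss_l in zip(X, y, benefit, loss):
        score = sum(wk * xk for wk, xk in zip(w, x_l)) + b
        if score > 0:            # loan granted
            if y_l == 1:
                benefits += gain_l   # good borrower: earn the interest
            else:
                losses += loss_l     # defaulter: lose LGD * EAD
    costs = cost_per_borrower * len(X) + fixed_cost
    return benefits - losses - costs


# Hypothetical validation set with three applicants and two variables.
w, b = [1.0, -1.0], 0.0
X = [[2.0, 1.0], [0.0, 1.0], [3.0, 0.0]]
y = [1, 1, -1]
benefit = [100.0, 120.0, 90.0]   # interest earned if the borrower is good
loss = [300.0, 350.0, 400.0]     # amount lost if the borrower defaults
profit = total_profit(w, b, X, y, benefit, loss, cost_per_borrower=5.0)
```

In this toy case one good borrower and one defaulter are accepted, so a single bad loan outweighs the interest earned, which shows why accuracy alone can be misleading.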

(30)

Experimental Settings

New customers (1510 applicants, 629 defaulters, 94 variables) and returning customers (5799 applicants, 872 defaulters, 46 attributes).

Validation procedure: 10-fold cross-validation.

Values for C and λ: $\{2^{-7}, 2^{-6}, \dots, 2^{-1}, 2^{0}, 2^{1}, \dots, 2^{6}, 2^{7}\}$.

A combination of random undersampling and SMOTE oversampling was used for the returning customers.

AUC and Profit as main performance metrics.
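The candidate values for C and λ form a logarithmic grid of powers of two from -7 to 7. A minimal sketch of how the grid and all parameter pairs could be enumerated, assuming an exhaustive search over both parameters (the variable names are illustrative):

```python
from itertools import product

# Powers of two from 2**-7 up to 2**7 (15 values).
grid = [2.0 ** k for k in range(-7, 8)]

# Every (C, lambda) combination to be evaluated with 10-fold cross-validation.
pairs = list(product(grid, grid))
```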

(31)

Performance summary

New customers, profit as performance metric.

Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 150 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the cutoff is optimized to maximize profit.

Metric | Logit | Fisher+SVM | RFE-SVM | l2l∞-SVM | l1l∞-SVM
AUC | 69.6 | 50.0 | 60.1 | 66.6 | 66.6
Accuracy | 70.4 | 58.3 | 64.2 | 68.2 | 68.2
Vars. selected | 28.7 | 5 | 5 | 35.4 | 26.4
Sources selected | 4.8 | 2.9 | 2.5 | 1.1 | 1
Profit | 36 | 1107 | 910 | 4449 | 4699
Benefits | 8769 | 9965 | 8528 | 8342 | 8370
Losses | 3235 | 5933 | 3745 | 3045 | 3078

(32)

Performance summary

Returning customers, profit as performance metric.

Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 580 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the cutoff is optimized to maximize profit.

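Metric | Logit | Fisher+SVM | RFE-SVM | l2l∞-SVM | l1l∞-SVM
AUC | 67.7 | 62.3 | 56.9 | 63.2 | 63.2
Accuracy | 84.9 | 56.6 | 56.6 | 54.6 | 57.4
Vars. selected | 28.8 | 20 | 5 | 31 | 31
Sources selected | 2.9 | 2.2 | 1.9 | 1 | 1
Profit | 35913 | 24354 | 22972 | 36581 | 38618
Benefits | 67282 | 42881 | 39193 | 42419 | 44874
Losses | 10372 | 4427 | 5077 | 3564 | 3982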

(33)

Performance summary

New customers, AUC as performance metric.

Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 150 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the maximum likelihood cutoff is used (0.5).

Metric | Logit | Fisher+SVM | RFE-SVM | l2l∞-SVM | l1l∞-SVM
AUC | 69.6 | 69.6 | 70.4 | 70.7 | 70.7
Accuracy | 70.6 | 70.0 | 70.9 | 71.3 | 71.5
Vars. selected | 28.7 | 80 | 90 | 90.5 | 58.5
Sources selected | 4.8 | 5 | 5 | 4.3 | 3.9
Profit | -5 | -305 | -40 | 1806 | 2541
Benefits | 8018 | 7738 | 8003 | 8107 | 8114
Losses | 2525 | 2287 | 2288 | 2342 | 2386

(34)

Performance summary

Returning customers, AUC as performance metric.

Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 580 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the maximum likelihood cutoff is used (0.5).

Metric | Logit | Fisher+SVM | RFE-SVM | l2l∞-SVM | l1l∞-SVM
AUC | 67.7 | 67.8 | 65.0 | 67.7 | 67.4
Accuracy | 65.1 | 64.0 | 61.7 | 63.8 | 63.8
Vars. selected | 28.8 | 40 | 40 | 45.6 | 44.2
Sources selected | 2.9 | 3 | 3 | 2.8 | 2.1
Profit | 23579 | 21519 | 19740 | 23263 | 30399
Benefits | 48436 | 47218 | 45718 | 47008 | 47315
Losses | 3860 | 3716 | 3995 | 3734 | 3802

(35)

Feature selection performance

New customers

Performance for an increasing number of features. Various performance metrics.

[Figure, panel (a): AUC vs. No. of Selected Features. Methods: RFE-SVM, Fisher+SVM, Twin l1l∞-SVM, Twin l2l∞-SVM.]

(36)

Feature selection performance

Returning customers

Performance for an increasing number of features. Various performance metrics.

[Figure, panel (c): AUC. Methods: RFE-SVM, Fisher+SVM, Twin l1l∞-SVM, Twin l2l∞-SVM.]

(37)

Concluding remarks (Methodology)

We presented two SVM-based strategies for simultaneous classification and feature selection.

Relevant attributes were identified using a group penalty function.

The proposals achieved similar classification power compared to well-known classification strategies, but outperformed them in terms of profit.

Best results with few variables (importance of model understanding)

(38)

Concluding remarks (Applied)

Traditional Credit Scoring variables may not be adequate for micro entrepreneurs.

Income is not relevant!!

Accountability is of major relevance

Importance of variables that we are able to corroborate (e.g. utility service bills of the last three months paid on time).

(39)

Collaboration Company-University

As a University, it is of utmost importance for us to carry out proper technology transfer. The model is discussed and improved in conjunction with our counterpart.

(40)

Some extensions I

Lopez, J., Maldonado, S. (2019): Profit-based Credit Scoring based on Robust Optimization and Feature Selection. Information Sciences 500, 190-202.

(41)

Some extensions II

Maldonado, S., Peters, G., Weber, R. (2020): Credit Scoring using Three-Way Decisions with Probabilistic Rough Sets. Information Sciences 507, 700-714.

(42)

Some extensions III

Maldonado, S., Lopez, J., Vairetti, C. (2021): Time-weighted Fuzzy Support Vector Machines for classification in changing environments. Information Sciences 559, 97-110.

This study addresses the issue of credit risk modelling in changing environments.

In credit scoring, granting policies change over time due to risk-related internal decisions or macroeconomic factors. We propose constructing models that are robust in the presence of small changes in the data distribution.

(43)
