Profit-driven classification with machine learning:
an application in credit scoring for
micro-entrepreneurs
Sebastián Maldonado - J. Pérez (UAndes, Chile) - J. López (UDP, Chile) - C. Bravo (Western U, Canada)
Business School, University of Chile, Santiago, Chile. Email: [email protected]
Motivation
Nate Silver’s “The signal and the noise”:
The “wet bias”
Cost-sensitive and causal classification
Is it OK to be sinners?
Statistical measures for model evaluation such as accuracy are not adequate (D.J. Hand, 2012).
Focus on the primary goal of the predictive task!
The Micro-entrepreneurs challenge
Micro-entrepreneurs in Chile: (very) small firms with annual sales below 60,000 €.
The Micro-entrepreneurs challenge (2)
Should we cover this segment? (Yes!) According to Bruhn and Love, 2009:
There’s a 7% increase in income in the long term for MEs with access to loans.
There’s a 1.4 increase in the number of employees that funded MEs have.
The number of firms in the country is increased by 7.6% when there’s a healthy system.
Feature Selection
Recently, databases have grown in size in all areas of knowledge, in both the number of instances and the number of attributes.
Feature selection is an important topic in business analytics applications.
A low-dimensional representation:
Reduces the risk of overfitting, improving model generalization.
Requires less computational effort.
Feature Selection - Motivation II
Example Irrelevancy
Feature Selection - Motivation III
Example - Assessing Relevancy
Feature Selection Strategies
Wrapper Methods II
Figure: N attributes, 2^N subset combinations!
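To make the combinatorial growth concrete, here is a minimal sketch that enumerates every subset (including the empty one) for a tiny feature set and then prints 2^N for larger N. The attribute names are hypothetical; the values 46 and 94 are the attribute counts of the two datasets described later in the talk.

```python
from itertools import combinations

def enumerate_subsets(features):
    """Yield every subset of the feature list, from the empty set to the full set."""
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            yield subset

features = ["age", "sales", "debt", "tenure"]  # hypothetical attribute names
n_subsets = sum(1 for _ in enumerate_subsets(features))
print(n_subsets, 2 ** len(features))

# Exhaustive wrapper search quickly becomes infeasible as N grows.
for n in (10, 46, 94):
    print(n, 2 ** n)
```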
Support Vector Machines - SVM
SVM: Geometrical Interpretation
Among existing classification methods, Support Vector Machine provides several advantages such as adequate generalization to new objects due to the Structural Risk Minimization principle, absence of local minima via convex optimization, and representation that depends on only a few data points (the support vectors).
Support Vector Machines - SVM
Standard l2-SVM
Linear SVM constructs an optimal hyperplane $f(x) = w^\top x + b$ that tries to correctly separate one class from the other. To achieve this optimal hyperplane, SVM aims to maximize the margin. Given that a perfect separation between the two classes is not always possible, slack variables $\xi_i$ are introduced for each training vector $x_i$, $i = 1, \ldots, N$, whereby $C$ is used as a penalization parameter.
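The optimization problem itself is not written out in the text. The textbook soft-margin form, which matches the description (margin maximization, slack variables $\xi_i$, penalization parameter $C$), is given here as a reference sketch, assuming labels $y_i \in \{-1, +1\}$:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \geq 1 - \xi_i, \quad i = 1,\ldots,N, \\
& \xi_i \geq 0, \quad i = 1,\ldots,N.
\end{aligned}
```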
SVM - Norm-based Regularizers
The Euclidean norm can be replaced to encourage sparsity:
l2 squared norm: $\|w\|_2^2 = \sum_{j=1}^{J} w_j^2$
l1 norm (LASSO): $\|w\|_1 = \sum_{j=1}^{J} |w_j|$
lp norm: $\|w\|_p = \left( \sum_{j=1}^{J} |w_j|^p \right)^{1/p}$
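As a quick numerical illustration of these regularizers, the following sketch evaluates them on a small, made-up weight vector. The function names are illustrative only and do not come from the talk.

```python
def squared_l2_norm(w):
    """Sum of squared weights."""
    return sum(wj * wj for wj in w)

def l1_norm(w):
    """Sum of absolute weights (the LASSO penalty)."""
    return sum(abs(wj) for wj in w)

def lp_norm(w, p):
    """General lp norm for p >= 1."""
    return sum(abs(wj) ** p for wj in w) ** (1.0 / p)

w = [3.0, -4.0, 0.0]  # hypothetical weight vector
print(squared_l2_norm(w))  # 9 + 16 + 0
print(l1_norm(w))          # 3 + 4 + 0
print(lp_norm(w, 2))       # square root of the squared l2 norm
```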
Business-driven research I
Two datasets from a Chilean bank.
Information about the benefits and losses for granting credit, and variable acquisition costs were available.
Benefit of accepting a good borrower: Return On Investment (ROI) of the loan.
(Broverman, 2010) Considering a principal $A$ requested with a maturity of $M$ terms at an interest rate given by $r$, the total interest $I$ follows:
$I = \frac{A M r}{1 - (1 + r)^{-M}} - A = A \left( \frac{M r}{1 - (1 + r)^{-M}} - 1 \right)$
The ROI of a loan $i$ will simply be the total interest $I$ divided by the principal $A$, i.e. $\mathrm{ROI}_i = \frac{I}{A} = \frac{M r}{1 - (1 + r)^{-M}} - 1$.
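The interest formula can be checked numerically. The sketch below assumes a strictly positive rate r applied once per term and M equal installments; the function names and the example loan are illustrative, not taken from the talk.

```python
def loan_roi(rate, n_terms):
    """ROI = M*r / (1 - (1 + r)^(-M)) - 1, valid for rate > 0."""
    return n_terms * rate / (1.0 - (1.0 + rate) ** (-n_terms)) - 1.0

def total_interest(principal, rate, n_terms):
    """Total interest I paid on a principal A over M terms."""
    return principal * loan_roi(rate, n_terms)

# Hypothetical loan: 1000 repaid in 12 terms at 1% per term.
print(round(loan_roi(0.01, 12), 4))
print(round(total_interest(1000.0, 0.01, 12), 2))
```

With a single term the formula collapses to ROI = r, which is a convenient sanity check.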
Business-driven research II
Cost of accepting a bad applicant: loss that is incurred when the borrower defaults.
The Basel II Banking Regulation Accords define the expected loss of a borrower as
L = PD · LGD · EAD
EAD: amount that is outstanding when default occurs.
LGD: percentage of the EAD that cannot be recovered after all collection actions.
PD = 1. Then L = LGD · EAD.
Business-driven research III
Five groups of variables with different acquisition costs.
Credit evaluation: The form that each applicant fills out (sunk cost). € 5 per borrower of an executive’s time.
In-depth interview: Interview of the applicant after the evaluation process during a visit to his/her place of work. € 20 per borrower of an executive’s time.
Financial analysis: Estimation of the cash flow of the micro-enterprise. € 20 per borrower of a specialist’s time.
System-level information: The bank acquired the information of the standing debts of the borrowers in the financial system (new customers only). € 1000 for all borrowers.
Group penalty functions
A regularization strategy designed to penalize the use of a group of related variables together, instead of removing weights independently (Yuan and Lin, 2006).
Strategy used with categorical attributes with multiple levels (related dummy variables), or multiclass classification.
We propose using the l∞-norm regularizer (Zou and Yuan, 2008): $\Gamma(w) = \sum_{j=1}^{J} \|w^{(j)}\|_\infty$,
where $\|w^{(j)}\|_\infty = \max_{l \in I_j} |w_l|$, i.e. the highest weight (in magnitude) for each source of variables $j = 1, \ldots, J$ is minimized, with $I_j$ being the set of variables that belong to source $j$.
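A minimal sketch of the group penalty, assuming each source is given as a list of feature indices. The weight vector and the grouping are made up for illustration, and the helper names are not from the talk.

```python
def group_linf_penalty(w, groups):
    """Sum over sources of the largest absolute weight within each source."""
    return sum(max(abs(w[l]) for l in indices) for indices in groups)

def active_sources(w, groups, tol=1e-8):
    """Sources with at least one weight above the tolerance, i.e. sources still in use."""
    return [j for j, indices in enumerate(groups)
            if max(abs(w[l]) for l in indices) > tol]

w = [0.5, -2.0, 0.0, 0.0, 1.5]   # hypothetical weights
groups = [[0, 1], [2, 3], [4]]   # hypothetical sources I_1, I_2, I_3
penalty = group_linf_penalty(w, groups)
print(penalty)                    # 2.0 + 0.0 + 1.5
print(active_sources(w, groups))  # the middle source is fully discarded
```

Because only the largest weight in a source is penalized, a source costs the same whether one or all of its variables are used, which pushes the model to drop whole sources rather than single variables.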
Proposed formulations
The l2l∞-SVM method
It combines three objectives: the Euclidean norm minimization (also known as Tikhonov regularization), the l∞-norm penalization for grouped feature selection, and the hinge loss minimization to guarantee an adequate model fit. $C, \lambda > 0$ are parameters that will be tuned via cross-validation.
Proposed formulations
The l2l∞-SVM method II
To avoid the non-smooth function in the previous problem, we introduce a set of auxiliary variables $z_j \geq 0$ and add new constraints $|w_l| \leq z_j$ for each $l \in I_j$ and $j = 1, \ldots, J$. The result is a quadratic programming problem (QPP).
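The reformulated problem is not written out in the text. A sketch of what it could look like, assuming that λ multiplies the group penalty and C multiplies the hinge loss (the original slide may use different weights or per-source cost factors), is:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,z}\quad & \frac{1}{2}\|w\|_2^2 + \lambda \sum_{j=1}^{J} z_j + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1,\ldots,N, \\
& -z_j \leq w_l \leq z_j, \quad l \in I_j,\; j = 1,\ldots,J, \\
& z_j \geq 0, \quad j = 1,\ldots,J.
\end{aligned}
```

At the optimum each $z_j$ equals $\max_{l \in I_j} |w_l|$, so the objective matches the l∞ group penalty while all constraints stay linear.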
Proposed formulations
The l1l∞-SVM method
$\|w\|_1 = \sum_{i=1}^{n} |w_i|$ denotes the l1-norm of $w$.
Proposed formulations
The l1l∞-SVM method
The same trick avoids the non-smooth function in the previous problem, leading to a linear programming problem (LPP).
Profit Computation
Solution tuple $\Lambda = \{w, b\}$ for an SVM problem.
Validation subset $V$ with samples $x^v_l$ and labels $y_l \in \{-1, +1\}$, for $l = 1, \ldots, |V|$.
The decision rule is $\mathrm{sgn}(w^\top x^v_l + b)$.
The total profit is: Benefits - Losses - Variable acquisition costs.
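The profit computation can be sketched as follows. Several points are assumptions not stated in the talk: label +1 marks a good borrower, an applicant is accepted when the score is non-negative, rejected applicants contribute nothing, and the acquisition cost is passed in as a single total. All data values are made up.

```python
def total_profit(w, b, X_val, y_val, benefits, losses, acquisition_cost):
    """Benefits minus losses minus variable acquisition costs on a validation set.

    Assumes +1 = good borrower and acceptance when w.x + b >= 0 (illustrative choices).
    """
    earned, lost = 0.0, 0.0
    for x, y, benefit, loss in zip(X_val, y_val, benefits, losses):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        if score >= 0:             # predicted +1: the loan is granted
            if y == 1:
                earned += benefit  # e.g. ROI times principal
            else:
                lost += loss       # e.g. LGD times EAD
    return earned - lost - acquisition_cost

# Hypothetical validation data: three applicants, two features.
w, b = [1.0, -1.0], 0.0
X_val = [[2.0, 1.0], [0.0, 1.0], [3.0, 0.0]]
y_val = [1, 1, -1]
benefits = [100.0, 80.0, 0.0]
losses = [0.0, 0.0, 250.0]
profit = total_profit(w, b, X_val, y_val, benefits, losses, acquisition_cost=30.0)
print(profit)
```

In this toy case the first applicant is accepted and repays, the second is rejected, and the third is accepted and defaults, so a single bad acceptance outweighs the benefit earned.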
Experimental Settings
New customers (1510 applicants, 629 defaulters, 94 variables) and returning customers (5799 applicants, 872 defaulters, 46 attributes).
Validation procedure: 10-fold cross-validation.
Values for C and λ: $\{2^{-7}, 2^{-6}, \ldots, 2^{-1}, 2^{0}, 2^{1}, \ldots, 2^{6}, 2^{7}\}$.
A combination of random undersampling and SMOTE oversampling was used for the returning customers.
AUC and Profit as main performance metrics.
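The parameter grid can be generated in a couple of lines. Searching over every (C, λ) pair is an assumption here; the talk only lists the candidate values.

```python
from itertools import product

# Candidate values 2^-7, ..., 2^7 for both C and lambda.
grid = [2.0 ** k for k in range(-7, 8)]

# Assumed full grid search: one cross-validation run per (C, lambda) pair.
pairs = list(product(grid, grid))
print(len(grid), len(pairs))
print(grid[0], grid[-1])
```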
Performance summary
New customers, profit as performance metric.
Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 150 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the cutoff is optimized to maximize profit.
                         Logit  Fisher+SVM     RFE-SVM    l2l∞-SVM    l1l∞-SVM
AUC                       69.6        50.0        60.1        66.6        66.6
Accuracy                  70.4        58.3        64.2        68.2        68.2
Vars. selected            28.7           5           5        35.4        26.4
Sources selected           4.8         2.9         2.5         1.1           1
Profit                      36        1107         910        4449        4699
Benefits                  8769        9965        8528        8342        8370
Losses                    3235        5933        3745        3045        3078
Performance summary
Returning customers, profit as performance metric.
Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 580 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the cutoff is optimized to maximize profit.
                         Logit  Fisher+SVM     RFE-SVM    l2l∞-SVM    l1l∞-SVM
AUC                       67.7        62.3        56.9        63.2        63.2
Accuracy                  84.9        56.6        56.6        54.6        57.4
Vars. selected            28.8          20           5          31          31
Sources selected           2.9         2.2         1.9           1           1
Profit                   35913       24354       22972       36581       38618
Benefits                 67282       42881       39193       42419       44874
Losses                   10372        4427        5077        3564        3982
Performance summary
New customers, AUC as performance metric.
Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 150 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the maximum likelihood cutoff is used (0.5).
                         Logit  Fisher+SVM     RFE-SVM    l2l∞-SVM    l1l∞-SVM
AUC                       69.6        69.6        70.4        70.7        70.7
Accuracy                  70.6        70.0        70.9        71.3        71.5
Vars. selected            28.7          80          90        90.5        58.5
Sources selected           4.8           5           5         4.3         3.9
Profit                      -5        -305         -40        1806        2541
Benefits                  8018        7738        8003        8107        8114
Losses                    2525        2287        2288        2342        2386
Performance summary
Returning customers, AUC as performance metric.
Predictive performance for all feature selection approaches. All monetary metrics are expressed in Euros for a group of approx. 580 applicants (one tenth of the full sample). For the logistic regression with the backward elimination process, the maximum likelihood cutoff is used (0.5).
                         Logit  Fisher+SVM     RFE-SVM    l2l∞-SVM    l1l∞-SVM
AUC                       67.7        67.8        65.0        67.7        67.4
Accuracy                  65.1        64.0        61.7        63.8        63.8
Vars. selected            28.8          40          40        45.6        44.2
Sources selected           2.9           3           3         2.8         2.1
Profit                   23579       21519       19740       23263       30399
Benefits                 48436       47218       45718       47008       47315
Losses                    3860        3716        3995        3734        3802
Feature selection performance
New customers
Performance for an increasing number of features. Various performance metrics.
[Figure, panel (a): AUC vs. No. of Selected Features for RFE-SVM, Fisher+SVM, Twin l1l∞-SVM, and Twin l2l∞-SVM. Second panel: y-axis from 0 to 4000 vs. No. of Selected Features.]
Feature selection performance
Returning customers
Performance for an increasing number of features. Various performance metrics.
[Figure, panel (c): AUC vs. No. of Selected Features for RFE-SVM, Fisher+SVM, Twin l1l∞-SVM, and Twin l2l∞-SVM. Second panel: y-axis from 20000 to 40000 vs. No. of Selected Features.]
Concluding remarks (Methodology)
We presented two SVM-based strategies for simultaneous classification and feature selection.
Relevant attributes were identified using a group penalty function.
The proposals achieved similar classification power compared to well-known classification strategies, but outperformed them in terms of profit.
Best results with few variables (importance of model understanding)
Concluding remarks (Applied)
Traditional Credit Scoring variables may not be adequate for micro entrepreneurs.
Income is not relevant!!
Accountability is of major relevance
Importance of variables we are able to corroborate (e.g., utility service bills of the last three months paid on time).
Collaboration Company-University
As a university, we consider proper technology transfer to be of utmost importance. The model is discussed and improved in conjunction with our counterpart.
Some extensions I
Lopez, J., Maldonado, S. (2019): Profit-based Credit Scoring based on Robust Optimization and Feature Selection. Information Sciences 500, 190-202.
Some extensions II
Maldonado, S., Peters, G., Weber, R. (2020): Credit Scoring using Three-Way Decisions with Probabilistic Rough Sets. Information Sciences 507, 700-714.
Some extensions III
Maldonado, S., Lopez, J., Vairetti, C. (2021): Time-weighted Fuzzy Support Vector Machines for classification in changing environments. Information Sciences 559, 97-110.
This study addresses the issue of credit risk modelling in changing environments.
In credit scoring, granting policies change over time due to risk-related internal decisions or macroeconomic factors. We propose constructing models that are robust in the presence of small changes in the data distribution.