3 The mechanics of credit scoring
3.2.1 Process and strategy
Credit scores are developed as tools for managing the business, which broadly speaking has two aspects:
Selection: How many cases enter the system, and what is the immediate result?
Outcome: What happens to those cases subsequently?
The selection aspect only applies to selection processes, such as application scoring for new-business origination. In these instances, volumes, reject rates, accept rates, and take-up rates have to be measured. They have a major bearing on the profitability and growth (or shrinkage) of the portfolio, and will be a function of the lenders’ marketing strategies, business model, competition, and customers’ circumstances.
Once in the system, the focus shifts to the outcome, or subsequent account performance. In credit scoring, just as in gambling, the term ‘odds’ is used; the casino comes to the workplace, but instead of monetary ‘winnings to wager’ odds, the term is used in the context of good/bad odds, bad rates, default probabilities, or probability of good (P(Good)). These are all a function of the bad definition, and will vary according to the product, process, and company. The usual interpretation of ‘bad’ is, ‘If I knew then what I know now, I would not have done the business!’ and vice versa for good. Lewis (1992) indicated that for the bulk of risk developments he was involved with, the good/bad odds mostly fell between 10/1 and 20/1, but ranged from 1/1 to 125/1. In high-risk markets, the odds are usually lower, but are offset by higher profits on good accounts.
An example of these calculations is provided in Table 3.2. The good/bad odds and P(Good) calculations include only goods and bads, as they are the only accounts included in the modelling process (see Section 15.2, Good/Bad Definition). In the example, there are 60,000 goods and 3,000 bads, giving good/bad odds of 20 to 1, and a P(Good) of 0.952 (or 95.2%). The bad rate calculation also includes the 9,000 indeterminates, providing a total of 72,000 accounts, in which instance the 3,000 bads provide a bad rate of 4.2%.
64 Module A : Setting the scene
Table 3.2. Odds and bad rate calculations

Description      All accounts   Good/bad odds   Bad rate
Good                   60,000          60,000     60,000
Indeterminate           9,000                      9,000
Bad                     3,000           3,000      3,000
Exclude                   900
Total                  72,900                     72,000
Good/bad odds                            20.0
P(Good)                                 0.952
Bad rate                                            4.2%
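The arithmetic behind Table 3.2 can be reproduced with a short Python sketch. The counts come from the table; the variable names are chosen here purely for illustration.

```python
# Counts taken from Table 3.2.
goods = 60_000
bads = 3_000
indeterminates = 9_000

# Good/bad odds and P(Good) use goods and bads only.
good_bad_odds = goods / bads
p_good = goods / (goods + bads)

# The bad rate in the example also counts indeterminates in the denominator.
bad_rate = bads / (goods + indeterminates + bads)

print(f"Good/bad odds: {good_bad_odds:.1f}")
print(f"P(Good):       {p_good:.3f}")
print(f"Bad rate:      {bad_rate:.1%}")
```

Note that the excluded accounts play no part in any of the three measures.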
Reject inference
In selection processes, both ‘accepts’ and ‘rejects’ must be gauged. Unfortunately though, there is usually no performance data for rejects. If only accept performance is used, there will be a sample selection bias, because a significant subgroup to which the scorecard will be applied in practice has been ignored. Reject inference attempts to address this potential bias, resulting in two sets of performance measures: known performance, for accepted applicants where performance is readily available; and inferred performance, informed guesses provided by the reject-inference process.
If the existing selection process is providing any value, then known performance will always be better than any inferred performance, usually by a factor of two or more. Scorecard developers have some discretion in setting the multiple, and it is considered good practice to be a bit harsh on past rejects, in order to reduce the size of the ‘swap set’, which is the set of cases that might receive a different decision from the new model. Care should, however, always be taken, especially where there are large numbers of rejects, as reject inference is fallible, and the inferred performance might distort the results. Indeed, many scorecard developers and other commentators dispute the value that can be added by reject inference. Over time though, lenders are learning how to use cohort performance (meaning outcome-performance data on other loans held by the customer) to enhance the estimates.
The combined set of known and inferred performance (‘all’) is then used for the scorecard development. An illustration is provided in Table 3.3, where 80% of the 15,000 through-the-door applicants were accepted, with known good/bad odds of 3 to 1 (which, for interest, would be an extremely high-risk portfolio). For the 3,000 rejects, the inferred odds are 0.5 to 1, six times worse than the known performance group. When combined, the population of 15,000 has odds of 2 to 1, which provides 5,000 bads for the development.
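As a quick check on the figures quoted from Table 3.3, the following Python sketch recomputes the known, inferred, and combined measures. The helper name `performance` is hypothetical; the counts are those in the table.

```python
def performance(goods, bads):
    """Summary measures for a group made up of goods and bads only."""
    total = goods + bads
    return {
        "total": total,
        "odds": goods / bads,
        "p_good": goods / total,
        "bad_rate": bads / total,
    }

known = performance(goods=9_000, bads=3_000)      # accepted applicants
inferred = performance(goods=1_000, bads=2_000)   # rejects, via reject inference
combined = performance(goods=9_000 + 1_000, bads=3_000 + 2_000)

for name, group in [("Known", known), ("Inferred", inferred), ("All", combined)]:
    print(f"{name:9s} odds={group['odds']:.1f}  "
          f"P(Good)={group['p_good']:.3f}  bad rate={group['bad_rate']:.0%}")

# How much worse the inferred group is than the known group, in odds terms.
odds_multiple = known["odds"] / inferred["odds"]
```

The combined odds are not an average of the two groups' odds; they come from adding the underlying good and bad counts.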
Strategy
Table 3.3. Inferred performance for rejects

             Known   Inferred      All
Good         9,000      1,000   10,000
Bad          3,000      2,000    5,000
Total       12,000      3,000   15,000
Odds           3.0        0.5      2.0
P(Good)      0.750      0.333    0.667
Bad rate       25%        67%      33%
% Row          80%        20%     100%

All of the above examples are at portfolio level. In practice, however, these measures are applied to different segments, especially those defined by score, and especially for setting strategy. Figure 3.2 shows the good/bad distributions and marginal bad rates (inclusive of inferred performance) for a hypothetical scorecard. When comparing different models applied to the same set of data: (i) the greater the distance between the two distributions, the better; and (ii) the steeper the bad rate graph’s slope, the better. This is not sufficient by itself, as the scores provide the greatest value when used in strategies, the choice of which will depend upon the lender’s goals.

Figure 3.2. Bad rate by score. [Chart: number of goods and bads per five points of score, and marginal bad rate, plotted against score.]

Figure 3.3. [Chart: current and cumulative bad rates and reject rates plotted against score, with markers ‘Same reject rate @ 460’ and ‘Same bad rate @ 425’.]

Figure 3.3 shows what the cut-offs would be under two traditional (hypothetical) scenarios:
Same reject rate: Used if the lender wishes to reduce bad debts. A score cut-off of 465 will match the historical reject rate of 25.5 per cent, and reduce the bad rate from 10.3 to 9.1 per cent (an 11.6 per cent relative reduction).
Same bad rate: Used if the lender wishes to gain market share and grow the business. A score cut-off of 425 will match the historical bad rate of 10.2 per cent, and reduce the reject rate from 25.5 to 18.4 per cent (a 9.5 per cent increase in the accept rate, from 74.5 to 81.6 per cent).
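The two scenarios can be sketched in Python as a search over candidate cut-offs. The score bands, counts, and target rates below are invented for illustration and are not the data behind Figure 3.3; the function names are hypothetical, and the bad rate is computed on goods and bads only.

```python
def cutoff_table(bands):
    """bands: list of (score, goods, bads), ascending by score.

    For each candidate cut-off, returns (score, reject rate, bad rate of
    the accepted population), where accepts are all bands at or above it.
    """
    total = sum(g + b for _, g, b in bands)
    rows = []
    for i, (score, _, _) in enumerate(bands):
        accepted = bands[i:]
        acc_goods = sum(g for _, g, _ in accepted)
        acc_bads = sum(b for _, _, b in accepted)
        n_accepted = acc_goods + acc_bads
        rows.append((score, 1 - n_accepted / total, acc_bads / n_accepted))
    return rows

def same_reject_rate_cutoff(bands, historical_reject_rate):
    # Highest cut-off that rejects no more than the historical share.
    return max(s for s, rej, _ in cutoff_table(bands)
               if rej <= historical_reject_rate)

def same_bad_rate_cutoff(bands, historical_bad_rate):
    # Lowest cut-off whose accepted bad rate is no worse than the historical one.
    return min(s for s, _, bad in cutoff_table(bands)
               if bad <= historical_bad_rate)

# Hypothetical score bands: (lower score bound, goods, bads).
bands = [(400, 60, 40), (425, 120, 30), (450, 200, 25),
         (475, 250, 15), (500, 250, 10)]

reduce_bad_debt = same_reject_rate_cutoff(bands, historical_reject_rate=0.26)
grow_business = same_bad_rate_cutoff(bands, historical_bad_rate=0.09)
print(reduce_bad_debt, grow_business)
```

With these made-up numbers, the same-reject-rate cut-off sits higher than the same-bad-rate cut-off, which mirrors the ordering of the two cut-offs in the text.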
Such approaches are simple, and are favoured when application scoring is first implemented. Lenders can achieve core objectives without massive changes to the structure of the business. There are also many choices in between these two points, and when circumstances are very favourable (especially where lenders have invested heavily in their back-end processes), lenders may risk even lower cut-offs.
Ideally, lenders should try to maximise profit. Lewis (1992) was the first to highlight the obvious approach of setting the cut-off to the lowest score with a contribution greater than or equal to zero, which implies accepting any account that provides a profit. As a highly simplified example, if each good account results in profit of $1 and each bad in a loss of $19, then the optimal cut-off is where the marginal good/bad odds are 19 to 1. The task then becomes one of coming up with reliable profit and loss figures at account level, which presents its own challenges.
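The break-even logic can be illustrated with a small Python sketch. The $1 profit and $19 loss are the figures from the simplified example; the marginal odds by score are hypothetical, as are the function names, and the sketch assumes that marginal odds rise with score.

```python
def break_even_odds(profit_per_good, loss_per_bad):
    """Marginal good/bad odds at which expected contribution is zero."""
    return loss_per_bad / profit_per_good

def profit_cutoff(marginal_odds_by_score, profit_per_good, loss_per_bad):
    """Lowest score whose marginal contribution is greater than or equal to zero.

    With marginal odds of `odds` goods per bad, the contribution per bad
    account is odds * profit_per_good - loss_per_bad.
    """
    for score in sorted(marginal_odds_by_score):
        odds = marginal_odds_by_score[score]
        if odds * profit_per_good >= loss_per_bad:
            return score
    return None  # no score band breaks even

# Hypothetical marginal good/bad odds for each score band.
marginal_odds = {400: 4.0, 420: 8.0, 440: 12.0, 460: 19.0, 480: 30.0}

required_odds = break_even_odds(profit_per_good=1, loss_per_bad=19)
cutoff = profit_cutoff(marginal_odds, profit_per_good=1, loss_per_bad=19)
print(required_odds, cutoff)
```

The comparison is done on odds rather than probabilities so that the exact break-even band is not lost to floating-point rounding.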
These are the Model T versions of strategy-setting using credit scores. Traditional approaches assume that the same offer is made to each customer, and that risk is the only factor considered in the decisions. Over time, lenders have become more comfortable with credit scoring, and have learnt how to: (i) take potential profitability into consideration; (ii) incorporate other aspects of customer behaviour (response, retention, and revenue); (iii) use it to adjust loan terms, especially for risk-based pricing; (iv) use it at other stages of the risk management process (marketing, account management, recoveries); (v) apply scientific approaches to better achieve business goals (champion/challenger, experimentation, optimisation, simulation); and (vi) use it for other purposes, such as forecasting and portfolio valuation.
3.2.2 Scorecard performance
Credit scoring provides an extremely valuable tool for measuring risk, but at the same time, the results need to be measured. The particular aspects of interest are power and accuracy, both of which are subject to drift. Power refers to a score’s ranking ability, or the extent to which it discriminates between good and bad. It is the primary attribute that lenders require of scorecards; the greater the power, the greater the value they can provide in business processes. In contrast, accuracy refers to how closely the odds or bad rate estimates approximate what happens in practice. Because it is so dependent upon economic, operational, marketing, and other exogenous factors that cannot be captured by credit scores, it is secondary to power, and can really only be achieved through calibration, based on newer data, long-term averages (or ‘central tendencies’), supplementary economic modelling, or even judgmental overlays. It is of primary interest in finance functions, especially where the scores are being used for pricing, forecasting, capital reserving, or other calculations. And finally, drift is the extent to which things have changed over time, which has implications for power, accuracy, and the overall effectiveness of the scorecards within the business. These changes are illustrated in the accompanying figures: Figure 3.4 shows possible changes in the account distribution, while Figure 3.5 shows changes in the model’s power and accuracy.
Power loss   Accuracy loss   Account distribution
No           No              Changes in ‘all’ are accompanied by proportional changes in good and bad, across the entire range.
No           Yes             A constant change in the good/bad odds along the full range of possible scores.
Yes          No              Slope of the score to odds curve reduces, without a change in the overall good/bad odds.
Yes          Yes             Both the slope and the overall good/bad odds change.

Figure 3.4. Score distributions. [Chart: number of accounts (goods, bads, and all) by score, from low to high.]

Figure 3.5. [Chart: Ln(good/bad odds) by score, from low to high, showing baseline, power loss, and accuracy loss curves.]
While a loss of accuracy can be corrected by recalibrating the scorecard or modifying strategies (cut-off, limit, etc.), the only way to correct for a loss of power is by modifying or redeveloping the scorecards. In any event, there must be strict procedures in place to determine when drift moves beyond acceptable boundaries. There are only a few measures used to assess model accuracy. Most lenders will focus on relative changes to the outcome measures at the portfolio level (like changes in the overall bad rate), but it is possible to use binomial probabilities, the Hosmer–Lemeshow statistic, or the log-likelihood measure. The latter splits the error into its power and accuracy components, using naïve models as reference points.

In contrast, there are many tools available to measure power and drift, the generic terms for which are measures of separation, measures of divergence, or power-divergence measures. When used to measure scorecard power, they are gauges of the graph’s slope in Figure 3.5. The most commonly used measures are rank-order correlation coefficients that provide values between 1 and -1, where 1 means the ranking is always right, -1 means it is always wrong, and 0 means there is no relationship. Such measures include the Gini coefficient (also called Somers’ D) and the Spearman rank-order correlation coefficient, while the Receiver Operating Characteristic is similar. Other measures include: the Kolmogorov–Smirnov statistic, which provides the maximum difference between the cumulative percentage of goods and bads, across the range of possible scores; the chi-square statistic, which measures the difference between observed and expected values, where the expected values for each score range assume the average odds; and the Information Value (Kullback divergence measure), which measures the difference between two distributions. Most of these same measures can also be used to measure drift in the score distribution, in particular the chi-square statistic, Kolmogorov–Smirnov statistic, and Stability Index (Kullback divergence measure, applied to changes in a distribution over time).
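Several of these measures can be computed directly from counts of goods and bads per score band. The Python sketch below uses common textbook formulations, which are assumptions here rather than definitions taken from this section: the Kolmogorov–Smirnov statistic as the largest gap between the cumulative distributions, the Gini coefficient from the trapezium rule, and the Information Value and Stability Index as the symmetric Kullback divergence. The counts and function names are hypothetical, and every band must have a non-zero count for the logarithms to be defined.

```python
import math

def shares(counts):
    total = sum(counts)
    return [c / total for c in counts]

def cumulative(values):
    running, out = 0.0, []
    for v in values:
        running += v
        out.append(running)
    return out

def ks_statistic(goods, bads):
    """Maximum gap between cumulative % of bads and cumulative % of goods."""
    cum_g = cumulative(shares(goods))
    cum_b = cumulative(shares(bads))
    return max(abs(b - g) for g, b in zip(cum_g, cum_b))

def gini_coefficient(goods, bads):
    """Gini via the trapezium rule; bands must be ordered from low to high score."""
    cum_g = [0.0] + cumulative(shares(goods))
    cum_b = [0.0] + cumulative(shares(bads))
    area = sum((cum_g[i] - cum_g[i - 1]) * (cum_b[i] + cum_b[i - 1]) / 2
               for i in range(1, len(cum_g)))
    return 2 * area - 1

def kullback_divergence(counts_p, counts_q):
    """Symmetric divergence between two distributions over the same bands."""
    return sum((p - q) * math.log(p / q)
               for p, q in zip(shares(counts_p), shares(counts_q)))

def information_value(goods, bads):
    return kullback_divergence(goods, bads)      # goods versus bads

def stability_index(recent, baseline):
    return kullback_divergence(recent, baseline)  # same population over time

# Hypothetical counts per score band, lowest scores first.
goods = [100, 200, 300, 400]
bads = [40, 30, 20, 10]

ks = ks_statistic(goods, bads)
gini = gini_coefficient(goods, bads)
iv = information_value(goods, bads)
psi = stability_index([120, 210, 290, 380], [100, 200, 300, 400])
print(f"KS={ks:.3f}  Gini={gini:.3f}  IV={iv:.3f}  Stability index={psi:.4f}")
```

Using one divergence function for both the Information Value and the Stability Index reflects the text's point that the same measure serves for power (goods against bads) and for drift (recent against baseline).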
3.2.3 Default probability and loss severity
Credit scoring’s primary strength is its ability to rank risk. Increasingly, however, lenders have to estimate expected losses (ELs), and even profits, whether for risk-based pricing or portfolio valuation. The EL is the amount that the lender expects to lose, based upon available data. It is made up of two parts: probability-of-default (PD), which is the risk of non-payment according to some definition; and loss severity, the extent of the loss in the event of default, which is affected by the exposure-at-default (EAD), loss-given-default (LGD), and maturity of the loan (M).
Equation 3.4. Expected loss: $\$EL = PD\% \times \$EAD \times LGD\% \times f(M)$
Probability-of-default (PD%): An obligor (borrower) risk rating, which is related to individual economic and environmental circumstances.
Exposure-at-default ($EAD): A monetary value related to the outstanding balance, agreed loan limit, the lender’s shadow/target limits, and loan product characteristics.
Loss-given-default (LGD%): Proportion of the EAD that the lender expects to lose if the borrower defaults.
Maturity (f(M)): An adjustment that is a function of the remaining loan term or repayment schedule, which applies in the wholesale market for maturities of longer than one year.

Care must be taken here, as there will always be a positive correlation between default probability and loss severity, which is not captured in many models. This is best illustrated by considering an economic downturn, when both increase: (i) as asset values reduce, counterparties are more likely to walk away from them, which results in an increase in both LGD and PD;³ (ii) LGDs increase, because the time frames required to collect, if at all, become longer; (iii) because the number of defaults is higher, the LGD values will be dominated by those occurring during downturns, which will result in conservative results for capital allocation and pricing calculations; and (iv) EADs may be higher, because borrowers are more likely to: (a) take greater advantage of any credit lines currently available; (b) request increases; and/or (c) abuse the facilities. On this last point, there are also contrary tendencies, because lenders relax and tighten their lending policies according to the perceived risk, both for individual borrowers, and during the cycle. This leads to higher EADs during upturns, and for companies perceived as low risk.⁴
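The expected-loss calculation in Equation 3.4 can be sketched in a few lines of Python. The input values are hypothetical, the function name is chosen for illustration, and the maturity adjustment defaults to 1.0 (no adjustment), which is an assumption of this sketch.

```python
def expected_loss(pd, ead, lgd, maturity_adjustment=1.0):
    """EL = PD x EAD x LGD x f(M).

    pd and lgd are proportions (0 to 1), ead is a monetary amount, and
    maturity_adjustment stands in for f(M).
    """
    return pd * ead * lgd * maturity_adjustment

# Hypothetical loan: 2% default probability, $10,000 exposure, 45% loss severity.
el = expected_loss(pd=0.02, ead=10_000, lgd=0.45)
print(f"Expected loss: ${el:,.2f}")

# The same loan with a hypothetical 5% maturity uplift.
el_adjusted = expected_loss(pd=0.02, ead=10_000, lgd=0.45, maturity_adjustment=1.05)
```

Because the formula simply multiplies the components, it treats PD and loss severity as independent, which is the simplification that the paragraph above warns about.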
Some further comments can be made with respect to some of the individual elements. First, according to Miu and Ozdemir (2005:30), it is common practice to split EAD into drawn ($EAD_d$) and undrawn ($EAD_u$) components, and the ‘forward-looking dollar amount’ is calculated as $EAD = d \times EAD_d + u \times EAD_u$, where $d$ and $u$ are the drawn and undrawn amounts. The components are calculated for defaulters using aggregated drawn and undrawn values at time of default ($T$) and one year prior ($T-1$):
$EAD_d = \min(d_T, d_{T-1}) / d_{T-1}$, and
$EAD_u = \min(1, \max(0, d_T - d_{T-1}) / u_{T-1})$.
Note that these formulae assume defaulters have—on average—been managed at or within their limits.
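A minimal Python sketch of the drawn and undrawn factors follows. It assumes the formulae read EAD_d = min(d_T, d_{T-1}) / d_{T-1} and EAD_u = min(1, max(0, d_T - d_{T-1}) / u_{T-1}), that the prior drawn and undrawn amounts are both positive, and that the forward-looking amount is d x EAD_d + u x EAD_u. All balances and function names are hypothetical.

```python
def ead_factors(drawn_at_default, drawn_prior, undrawn_prior):
    """Drawn and undrawn EAD factors from aggregated defaulter balances.

    drawn_at_default: drawn balance at time of default (d_T)
    drawn_prior:      drawn balance one year earlier (d_{T-1}), must be > 0
    undrawn_prior:    undrawn limit one year earlier (u_{T-1}), must be > 0
    """
    ead_d = min(drawn_at_default, drawn_prior) / drawn_prior
    ead_u = min(1.0, max(0.0, drawn_at_default - drawn_prior) / undrawn_prior)
    return ead_d, ead_u

def forward_looking_ead(drawn_now, undrawn_now, ead_d, ead_u):
    """Apply the factors to an account's current drawn and undrawn amounts."""
    return drawn_now * ead_d + undrawn_now * ead_u

# Hypothetical aggregates for past defaulters (in $ millions).
ead_d, ead_u = ead_factors(drawn_at_default=900, drawn_prior=800, undrawn_prior=200)

# Hypothetical current account: $6,000 drawn, $4,000 undrawn.
ead = forward_looking_ead(drawn_now=6_000, undrawn_now=4_000, ead_d=ead_d, ead_u=ead_u)
print(ead_d, ead_u, ead)
```

In this example defaulters drew down half of their unused limit in the year before default, so half of the current account's undrawn amount is added to its exposure.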
Second, there are two primary approaches for LGD estimation: (i) workout, which discounts post-default cash flows; and (ii) market, which uses the market value of a security at time of default. The latter is infeasible for retail portfolios. Third, the final term, f(M), is used to recognise the higher risk of longer maturities, and applies mostly to corporate, inter-bank, and sovereign lending. It is usually dropped, because in most cases: (i) its impact is negligible; (ii) lenders annualise the PD, EAD, and LGD values; and (iii) at least part of it may be already reflected in the EAD and LGD values. For loans with known repayment schedules, M is calculated as the weighted-average time-to-maturity, using the scheduled cash flows. If M cannot be derived, then the termination date of the agreement should be used, as a conservative estimate. M is not used directly in the formula, but is instead used to calculate an adjustment, the ‘f(M)’ (function of maturity) shown in Equation 3.4, which is usually only slightly greater than 100 per cent.
3 This can be especially difficult for real estate markets, where asset correlations are high. Property owners are prone to jump ship simultaneously, especially when their wage or rental incomes fail to cover loan repayments.
Fourth, losses can be split into: (i) the loss of principal; (ii) collections and recovery costs (workout and legal); and (iii) the cost of funds (Schuermann 2004). Discounting the post-default cash flows usually captures the last of these.
According to Miu and Ozdemir (2005), if the post-default cash flow volatility has already been captured elsewhere, then a risk-free discount rate should be used. Otherwise, there should be a risk premium.
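The workout approach can be illustrated with a short Python sketch. It uses one common formulation, assumed here rather than taken from the sources cited: LGD equals one minus the present value of net post-default cash flows (recoveries less costs) divided by the EAD. The cash flows, discount rate, and function name are all hypothetical.

```python
def workout_lgd(ead, net_cash_flows, annual_discount_rate):
    """Workout LGD from discounted post-default cash flows.

    net_cash_flows: list of (years_after_default, recovery_net_of_costs).
    annual_discount_rate: risk-free rate, or risk-free plus a premium,
    depending on whether cash-flow volatility is captured elsewhere.
    """
    present_value = sum(amount / (1 + annual_discount_rate) ** years
                        for years, amount in net_cash_flows)
    return 1 - present_value / ead

# Hypothetical default: $1,000 exposure, a single $550 net recovery after one year.
lgd = workout_lgd(ead=1_000, net_cash_flows=[(1.0, 550.0)], annual_discount_rate=0.10)
print(f"Workout LGD: {lgd:.1%}")

# The same recovery discounted at a higher rate gives a higher LGD.
lgd_with_premium = workout_lgd(1_000, [(1.0, 550.0)], annual_discount_rate=0.15)
```

The choice of discount rate matters because the same nominal recovery is worth less, and the LGD is correspondingly higher, when a risk premium is added.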
EAD and LGD could theoretically be modelled using statistical methods, but the low numbers of defaults may make it infeasible. LGD is especially problematic due to problems obtaining data on the amount and timing of post-default cash flows, unless appropriate systems are in place. Regardless, any bank hoping to use the advanced approach under Basel II needs to come up with these estimates, whether using its own or pooled data.
And finally, in general, the possible post-default outcomes are cure (rehabilitation), restructure (renegotiation), and liquidation. In the case of liquidation, funds may be recovered by realising the collateral’s value, calling upon guarantees, claiming any residual remaining after all senior debt has been settled, or from other sources.⁵ Studies have shown that the LGD tends to be a function of: type of debt (bank loan, bond, store credit); contractual terms (seniority, collateral); market segment (higher in sectors with greater assets); and economic conditions (better when times are good). The LGD will also vary depending upon the lender’s bargaining power, experience in managing distressed borrowers, and ability to realise the collateral’s value.
Finance calculations
Credit scoring was originally used for accept/reject decisions in fixed-offer scenarios, but is increasingly being used in more innovative ways. It has become the basis for the expected-loss calculation, which is used for risk-based decisioning and ‘value at risk’ (VaR) models. Risk-based decisioning includes: risk-based pricing, where prices and loan terms are adjusted according to the level of risk, which is especially common where the resulting portfolios will be securitised; and risk-based processing, where other actions are adjusted, such as the level of documentation, or number of security checks when processing applications.
In contrast, VaR models do not affect decisions on a deal-by-deal level, but instead focus on the portfolio. They are used to provide estimates of worst-case losses that can arise from market fluctuations, assuming a given time frame and confidence interval ... the greater the EL and loss volatility, the greater the possible unexpected loss. At the extreme, the loss may be catastrophic, resulting from events that might occur once in a millennium. VaR models have become the basis for determining banks’ capital requirements, and have been adopted as part of the Basel II regulatory framework. The formula in Equation 3.4 is still quite simplistic, as it does not recognise the potential variation that can occur in the underlying values.⁶ Regulators will thus increase capital requirements, to ensure that there is sufficient capital to handle unexpected losses (see Chapter 36, Capital Adequacy).

5 The concepts relating to post-default outcomes and treatment presented in these two paragraphs were influenced by presentations by, and discussions with, Christian Endter and Evren Üçok of Mercer Oliver Wyman, during early 2006.
Bad versus default definition
When dealing with corporate bonds, the definition of good and bad is clear-cut—either the obligor defaults, or does not. When dealing with loan accounts however, the situation is different—and there are usually differences between the default definition and the good/bad definition used for a scorecard development. The scorecard good/bad definition focuses upon providing the best possible risk ranking; whereas the default definition is used for finance