3 The mechanics of credit scoring
3.3 What is the scorecard development process?
3.3.3 Scorecard modelling
Once data has been assembled, the development of a predictive model can be started. The first step is to choose a modelling technique. According to Siddiqi (2006), the issues that should be considered include: data quality, as missing data could force the use of decision trees; type of target variable, as linear regression is best suited for continuous outcomes, and logistic or probit regression for binary; sample size, as decision trees require more data; implementation platform, because the final model has to be implemented in a business system; model transparency, which may be required both by regulation and the business, and often forces the use of traditional scorecards; and monitoring capabilities, as the lender has to track performance over time. Most lenders will have greater familiarity with certain techniques, and may favour them in spite of potential problems.
Thereafter, there are a number of different stages, which are covered in greater detail in Module E (Scorecard Development Process):
Transformation—Conversion of data into a form that can be used!
Characteristic selection—Which characteristics can provide value?
Reject inference—How would the rejects have performed if accepted?
Segmentation—Do certain subgroups require their own separate scorecards?
Training—What weight should be allocated to each variable?
The first step is to transform the data into a usable form. Even though there is plenty of data, it is often inappropriate for use within the model. There are a number of different transformation techniques available, but most retail credit scoring systems (consumer and small-business credit) have been developed to handle traditional scorecards. As a result, the most common
78 Module A : Setting the scene
transformation technique is to: (i) create fine classes for analysis; (ii) group these further into coarse classes of similar risk; and (iii) either convert the coarse classes into dummy variables, or calculate a new characteristic containing a relative risk measure (like the weight of evidence) for each.
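As an illustration of step (iii), the weight of evidence for each coarse class can be sketched as below. This is a minimal sketch only: it assumes the common definition of the natural log of the class's share of goods over its share of bads (some practitioners use the opposite sign), and the function name and the counts are hypothetical.

```python
import math

def weight_of_evidence(goods, bads):
    """Return the weight of evidence per coarse class.

    Assumed definition: ln((goods_i / total goods) / (bads_i / total bads)).
    Classes with zero goods or zero bads are not handled in this sketch.
    """
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for cls in goods:
        good_share = goods[cls] / total_good
        bad_share = bads[cls] / total_bad
        woe[cls] = math.log(good_share / bad_share)
    return woe

# Hypothetical counts for an "age" characteristic after coarse classing.
goods = {"18-25": 200, "26-40": 500, "41+": 300}
bads = {"18-25": 50, "26-40": 40, "41+": 10}
woe = weight_of_evidence(goods, bads)
```

Under this sign convention, a negative value marks a class that is riskier than the population as a whole, and a positive value a class that is less risky.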
Another task often performed is characteristic selection, which limits the number of characteristics initially considered in the model development. Some scorecard developers will focus on finding characteristics that are correlated with the target variable but not with each other, in order to minimise multicollinearity, especially where sample sizes are small. This can be aided by using factor analysis to group the characteristics, and possibly even using these factors in the scorecard development. Others will use common sense to select the characteristics, and ignore the multicollinearity, instead relying upon the sample size, and ensuring that the point allocations make logical sense, to keep the standard error in check.
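The first approach, keeping characteristics that are correlated with the target but not with each other, might be sketched as a simple greedy filter. This is one possible reading, not a prescribed method: the Pearson correlation, the 0.7 threshold, the function names, and the data are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def select_characteristics(data, target, max_pairwise=0.7):
    """Greedy filter: rank by |correlation with target|, then drop any
    characteristic too strongly correlated with one already kept."""
    ranked = sorted(data, key=lambda name: abs(pearson(data[name], target)),
                    reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(data[name], data[kept])) <= max_pairwise
               for kept in selected):
            selected.append(name)
    return selected

# Hypothetical sample: target is 1 for bad, 0 for good.
target = [0, 0, 0, 1, 1, 1]
data = {
    "utilisation": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    "balance": [1, 2, 3, 7, 8, 9.5],        # nearly a copy of utilisation
    "age": [30, 50, 40, 25, 45, 35],
}
selected = select_characteristics(data, target)
```

Here "balance" is dropped because it adds little beyond "utilisation", while "age" is retained despite its weaker relationship with the target.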
For selection processes, there is no historical performance available for rejects, and reject inference is used to make educated guesses. In the early days, no reject inference was done, but over the past few years, lenders have become more sophisticated, and many new terms have entered the credit scoring lexicon. Distinctions should be made between: (i) performance manipulation techniques, including reweighting, reclassification (rule- or score-based), and parcelling (polarised, random, or fuzzy); (ii) reject inference techniques, including random supplementation, augmentation, extrapolation, cohort performance, and bivariate two-step; and (iii) model types, including known good/bad, accept/reject, and all good/bad. Today, the most sophisticated approaches involve: (i) stratified-fuzzy parcelling; (ii) extrapolation; (iii) known good/bad and accept/reject models in a two-step approach; and (iv) use of cohort performance. Special care must be taken where the number of rejects is very large, as the inferred performance may severely distort the results.
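To make the idea of fuzzy parcelling more concrete, the sketch below splits each reject into a weighted "bad" record and a weighted "good" record. It is an illustration under stated assumptions rather than a definitive procedure: the bad probabilities are taken to come from a known good/bad model, the optional inflation factor reflects the common view that rejects perform worse than accepts with the same score, and all names and values are hypothetical.

```python
def fuzzy_parcel(rejects, bad_inflation=1.0):
    """Split each reject into a 'bad' and a 'good' record whose weights
    sum to one, based on an (optionally inflated) estimated bad probability."""
    inferred = []
    for reject in rejects:
        p_bad = min(1.0, reject["p_bad"] * bad_inflation)
        inferred.append({"id": reject["id"], "outcome": "bad", "weight": p_bad})
        inferred.append({"id": reject["id"], "outcome": "good", "weight": 1.0 - p_bad})
    return inferred

# Hypothetical rejects, scored with a known good/bad model.
rejects = [{"id": "R1", "p_bad": 0.30}, {"id": "R2", "p_bad": 0.10}]
inferred = fuzzy_parcel(rejects, bad_inflation=1.5)
total_bad_weight = sum(r["weight"] for r in inferred if r["outcome"] == "bad")
```

The weighted records would then be combined with the known goods and bads to train an all good/bad model. Where rejects greatly outnumber accepts, these inferred weights dominate the sample, which is the distortion warned of above.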
When developing credit scoring models, the cases included must be similar enough to be treated together, but different enough for models to distinguish between them. Different scorecards may be required for a single portfolio, and the segmentation, or scorecard splits, may be affected by five types of factors (see footnote 7):
(i) Marketing, where the lender wishes to apply different strategies going forward, and requires greater confidence in one area (usually much higher risk) than another.
(ii) Customer, where certain characteristics do not apply, often because of a lack of credit-related data.
(iii) Data, where there are differences in what data is available, or when and how it becomes available, especially for different channels or application forms.
(iv) Process, where the cases receive different treatment, whether because of operational, technological, legal, or other factors.
(v) Model-fit, all of the above, and others, where the relative importance of the predictors varies between groups.
7 The starting point for this was a framework provided by Thomas et al. (2001), who suggested strategic (marketing), operational (customer), and interactional (model-fit).
Care must be taken in all of these cases, as there must be sufficient data to develop each scorecard, and bads are often in short supply.
Once the segmentation has been decided upon, model training can begin. This is the glory aspect of scorecard development, where a parametric technique (like logistic regression or discriminant analysis, DA), or non-parametric technique (like a neural network, NN) is applied. For a traditional scorecard, it is where the points (a combination of variable transformation and regression coefficients) are allocated. It is an iterative process, as the scorecard developer may have to generate many models, and/or make many cosmetic changes.
The key factor is to ensure that the points correspond to the relative risk of each group, and to avoid overfitting, especially where the predictors are correlated and sample sizes are small.
Scorecard developers will guard against: (i) gaps where no points are allocated; (ii) wrong-sign problems, where the points are the opposite of what is expected; (iii) point allocations that decrease where an increase is expected, and vice versa; and (iv) t-statistics, or other measures, indicating that variables’ relationships with the response function are insignificant. There will also be issues relating to: (i) controlling for certain factors, like company strategies; and (ii) staging, determining the order in which characteristics will be considered for possible inclusion.
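A rough sketch of how points might be derived, and how a reversal in the point allocation might be detected, is given below. It assumes that the characteristic was transformed into weights of evidence, that a single logistic regression coefficient applies to it, and that points are scaled so that 20 points double the odds. The scaling, the omission of the intercept, the function names, and the figures are all assumptions for illustration.

```python
import math

def allocate_points(woe_by_class, coefficient, pdo=20):
    """Points per class: scaled product of coefficient and weight of evidence.
    The intercept and any base score are ignored in this sketch."""
    factor = pdo / math.log(2)
    return [(cls, round(factor * coefficient * woe)) for cls, woe in woe_by_class]

def find_reversals(ordered_points, expect_increasing=True):
    """Return adjacent class pairs whose points move against the expected trend."""
    reversals = []
    for (prev_cls, prev_pts), (next_cls, next_pts) in zip(ordered_points,
                                                          ordered_points[1:]):
        step = next_pts - prev_pts
        wrong_direction = step < 0 if expect_increasing else step > 0
        if wrong_direction:
            reversals.append((prev_cls, next_cls))
    return reversals

# Hypothetical "years at address" classes, in order, with their weights of evidence.
woe_by_class = [("<1", -0.60), ("1-3", -0.10), ("3-5", -0.25), ("5+", 0.70)]
points = allocate_points(woe_by_class, coefficient=0.8)
reversals = find_reversals(points, expect_increasing=True)
```

A reversal such as the one between "1-3" and "3-5" would normally prompt the developer to revisit the coarse classing, for example by merging the two classes, before retraining.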
3.3.4 Finalisation
After training has been finished, the next step is to finalise the model, and get it into production. This is covered in both Module E (Scorecard Development Process) and Module F (Implementation and Use), including:
Validation—Will the scorecard work in practice?
Calibration—Can the scores be used to provide estimates?
Strategy setting—How are the scores to be used?
Loading—Physical implementation of the scorecards, in whatever form!
Testing—Is the system working according to design?
Monitoring—Are the scorecards providing the expected results?
After the scorecard has been developed, the next step is validation, to ensure that the model will work on the intended population. Checks will be made to test: (i) the loss of predictive power, when applied to a validation sample; and (ii) score drift, when applied to a recent sample. Lenders may also benchmark the results against other models, especially those developed by external agencies (credit bureau or rating agency). Where an existing model is to be replaced, the size and composition of the swap-set should be considered. In all cases, and throughout the development, documentation must be kept of what assumptions were made.
Calibration is used to: (i) ensure that the scores provided by different scorecards have the same meaning; and possibly (ii) determine or refine the probability estimates to be associated with each score. The easiest way is to create grades by banding score ranges, but many lenders want flexibility at score level. This can be done by doing score transformations, or alternatively by mapping the scores onto probabilities.
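Both options, banding scores into grades and mapping scores onto probabilities, can be sketched as follows. The mapping assumes that the score is linear in the natural log of the good/bad odds, with hypothetical anchor values of 600 points at odds of 50 to 1 and 20 points to double the odds; the grade boundaries are likewise invented for illustration.

```python
import math

def score_to_bad_probability(score, base_score=600, base_odds=50, pdo=20):
    """Map a score onto a bad probability, assuming
    score = offset + factor * ln(good/bad odds)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = math.exp((score - offset) / factor)
    return 1 / (1 + odds)

def score_to_grade(score, lower_bounds=((660, "A"), (620, "B"), (580, "C"))):
    """Band a score into a grade; anything below the last bound is grade D."""
    for bound, grade in lower_bounds:
        if score >= bound:
            return grade
    return "D"

p_bad_at_600 = score_to_bad_probability(600)   # odds of 50 to 1
p_bad_at_620 = score_to_bad_probability(620)   # odds of 100 to 1
grade = score_to_grade(630)
```

If every scorecard in a portfolio is aligned to the same anchors, a given score carries the same odds regardless of which scorecard produced it, which is the first purpose of calibration noted above.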
Scores provide little value without associated strategies. In its simplest form, strategy setting may involve a simple one-dimensional cut-off for an accept/reject decision using an application-risk score. In more complex forms, there may be: (i) multiple cut-offs for different application-risk grades; or (ii) combination with other factors, including bureau scores or response, retention, and/or revenue dimensions. The strategy may be chosen to cause minimal upset to the existing business process; or alternatively, to take best advantage of the new or updated tool.
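A simple strategy of this kind might be expressed as below. The sketch combines a cut-off on the application-risk score with a minimum bureau score; the intermediate "refer" outcome, the function name, and every threshold are hypothetical and are included only to show how more than one cut-off and a second dimension can interact.

```python
def decide(application_score, bureau_score=None,
           cutoff=600, refer_band=20, bureau_floor=550):
    """Return 'accept', 'refer', or 'reject' for one application.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # A poor bureau score overrides the application score.
    if bureau_score is not None and bureau_score < bureau_floor:
        return "reject"
    if application_score >= cutoff:
        return "accept"
    # Scores just below the cut-off are passed to an underwriter.
    if application_score >= cutoff - refer_band:
        return "refer"
    return "reject"

decisions = [
    decide(640, bureau_score=700),
    decide(590, bureau_score=700),
    decide(640, bureau_score=500),
    decide(560),
]
```

With `refer_band=0` and no bureau score supplied, the function reduces to the one-dimensional accept/reject cut-off described as the simplest form.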
Once the scorecard has been completed, the next stage is loading it into the system where it is to be applied. In modern environments, this involves setting up the scorecard details within a parameterised system, along with any changes to strategies that will accompany the new scorecards. In other environments, however, it could involve creating, and possibly distributing: (i) paper-based score sheets; (ii) electronic calculators or spreadsheets; or (iii) computer program code, whether on PC, network, or mainframe.
Once the scorecard has been loaded, especially where the calculations will be done electronically, the next step is testing (also referred to as verification). This ensures that the system is working according to design, as opposed to other validations, which determine whether the design is correct. All stages of the decision process should be tested, including data, scores, and strategy. Initial loading and testing is best done in a separate test environment, but this is not always possible.
Validation and testing are performed not only at implementation, but also as part of ongoing monitoring thereafter. This includes: (i) drift reporting, to measure how much the data and score distributions have changed; (ii) back-testing, to ensure that the scorecards have the expected predictive power and accuracy, once performance data is available; (iii) decision process, to measure the scores’ impact on the business; (iv) adherence, to ensure that scores and policies are being applied as intended; and (v) portfolio analysis, to measure how well the business is doing generally.