Once the data are clean and the appropriate variables have been selected, two samples need to be generated: one is used to develop the model and the other to test it. The former is known as the development sample and the latter as the holdout sample. The clean dataset is usually split into an 80% development sample and a 20% holdout sample, although the exact split varies from one financial institution to another.
Stratified sampling is usually applied because it ensures not only that the samples are drawn randomly but also that they reflect the population with respect to specific characteristics. For example, if the clean dataset contains 10% ‘bad’ and 90% ‘good’ instances, stratified sampling ensures that both the development and holdout samples also contain 10% ‘bad’ and 90% ‘good’ cases. Compared with other sampling methods, stratified sampling is efficient and improves the accuracy of estimation. As part of our scorecard development process, an 80% development sample and a 20% holdout sample were obtained, each containing 6.7% ‘bad’ and 93.3% ‘good’ applicants. The 80-20 split followed the recommendations of the credit risk practitioners from whom the dataset was obtained.
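A minimal sketch of an 80-20 stratified split, using only the Python standard library. The function names `stratified_split` and `bad_rate` and the record layout (a dict with a `class` key) are hypothetical. The synthetic 1,000-record dataset has a 6.7% bad rate to mirror the proportions above and is not the data used in this research. Each class is shuffled and cut separately, so both samples keep the overall bad rate up to rounding.

```python
import random
from collections import defaultdict


def stratified_split(records, label_of, dev_fraction=0.8, seed=0):
    """Split records into development and holdout samples, stratified by class.

    Each class is shuffled and cut separately, so both samples keep
    (up to rounding) the class proportions of the full dataset.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    development, holdout = [], []
    for label in sorted(by_class):  # deterministic class order
        group = list(by_class[label])
        rng.shuffle(group)
        cut = round(len(group) * dev_fraction)
        development.extend(group[:cut])
        holdout.extend(group[cut:])
    return development, holdout


def bad_rate(sample, label_of):
    """Proportion of 'bad' cases in a sample."""
    return sum(1 for r in sample if label_of(r) == "bad") / len(sample)


def label(rec):
    return rec["class"]


# Synthetic example (not the thesis data): 1,000 applicants, 67 of them 'bad'.
records = [{"id": i, "class": "bad" if i < 67 else "good"} for i in range(1000)]
dev, hold = stratified_split(records, label)
print(len(dev), len(hold))                          # 800 200
print(bad_rate(dev, label), bad_rate(hold, label))  # 0.0675 0.065
```

With 67 bad cases, the development sample receives round(53.6) = 54 of them and the holdout sample the remaining 13, so both bad rates stay close to 6.7%.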
It should be noted that while the generated samples are built from accepted applicants, who are known to be either ‘good’ or ‘bad’, no information is available on applicants who were denied credit. Figure 3.1 shows the accepted and rejected applicants drawn from a population of credit applications. While the proportions of ‘good’ and ‘bad’ are known for the accepted applicants, this is not the case for rejected applicants. Consequently, this introduces bias into the samples. Reject inference has been suggested to address this problem: it is the process of deducing how a rejected applicant would have behaved had he or she been granted credit.

[Figure: a Population of credit applications split into Accepted applicants (known class proportions, x% and y%) and Rejected/Excluded applicants (unknown class proportions, ?%)]
Figure 3.1: Reject inference
Several approaches to reject inference have been proposed. One of the best ways to deal with reject bias is to grant credit to all applicants for a certain period of time, so as to obtain a complete picture of the population applying for credit. However, because of the considerable amount of money that could be lost by granting credit to all applicants, most financial organisations are not keen to take such a risk. Nevertheless, Banasik, Crook & Thomas (2003) had an exceptional opportunity to observe the repayment behaviour of applicants who would normally have been rejected, and they found that including reject inference in scorecard development resulted in only a modest improvement in scorecard performance.
Another popular method is to build a scorecard using the accepted applicants and to use this scorecard to score and classify the rejected applicants as either ‘good’ or ‘bad’. A new scorecard is then built using both the accepted applicants (with their observed classes) and the rejected applicants (with their predicted classes). Readers are referred to Steenackers & Goovaerts (1989) for more information about this technique.
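The two-stage procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact technique of Steenackers & Goovaerts (1989): it assumes a logistic-regression scorecard fitted by plain batch gradient descent, a 0.5 cut-off on the predicted probability of being ‘bad’, and hard (unweighted) class assignment for the rejects. The names `fit_logistic`, `augment_with_rejects` and the one-feature toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_bad_proba(model, x):
    """Predicted probability that applicant x is 'bad' (label 1)."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic-regression scorecard by batch gradient descent.

    X is a list of feature vectors, y a list of labels (1 = 'bad', 0 = 'good').
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_bad_proba((w, b), xi) - yi  # log-loss gradient term
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def augment_with_rejects(X_acc, y_acc, X_rej, cutoff=0.5):
    """Reject inference by hard reclassification of the rejected applicants."""
    # 1. Build a scorecard on the accepted applicants (observed classes).
    accepted_model = fit_logistic(X_acc, y_acc)
    # 2. Score the rejects and assign each a predicted class.
    y_rej = [1 if predict_bad_proba(accepted_model, x) >= cutoff else 0
             for x in X_rej]
    # 3. Rebuild the scorecard on accepted + rejected applicants.
    final_model = fit_logistic(X_acc + X_rej, y_acc + y_rej)
    return y_rej, final_model


# Toy data with one hypothetical feature (e.g. a scaled debt-to-income ratio).
X_acc = [[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]]
y_acc = [0, 0, 0, 0, 1, 1, 1, 1]
X_rej = [[0.05], [0.85], [0.95]]
y_rej, final_model = augment_with_rejects(X_acc, y_acc, X_rej)
print(y_rej)  # [0, 1, 1]
```

Because the rejects receive labels predicted by the accepted-only scorecard, the rebuilt scorecard can only reinforce what that first model already believes; this is one reason why the benefit of such methods is debated.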
A different approach is to look for approved applications that are similar to each rejected application and to assign the class label of the former to the latter. The main problem with this approach is that defining ‘similar’ proves to be quite difficult in practice (Baesens 2003).
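One way to make ‘similar’ concrete is a k-nearest-neighbour rule. The sketch below assumes, purely for illustration, that similarity means the smallest Euclidean distance between already-scaled feature vectors and that each reject takes the majority class of its k = 3 nearest accepted applicants. Choosing the features, their scaling and the metric is exactly the difficulty noted above. The function `infer_by_similarity` and the data are hypothetical.

```python
import math
from collections import Counter


def infer_by_similarity(accepted, labels, rejected, k=3):
    """Give each rejected applicant the majority class of its k most similar
    accepted applicants. 'Similar' is assumed here to mean the smallest
    Euclidean distance between feature vectors."""
    inferred = []
    for r in rejected:
        # Indices of accepted applicants, nearest first.
        nearest = sorted(range(len(accepted)),
                         key=lambda i: math.dist(r, accepted[i]))[:k]
        votes = Counter(labels[i] for i in nearest)
        inferred.append(votes.most_common(1)[0][0])
    return inferred


# Toy data: two hypothetical features per applicant, already scaled to [0, 1].
accepted = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
            (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
labels = ["good", "good", "good", "bad", "bad", "bad"]
rejected = [(0.12, 0.18), (0.88, 0.82)]
print(infer_by_similarity(accepted, labels, rejected))  # ['good', 'bad']
```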
A final approach to reject inference is to classify applicants into three groups: ‘good’, ‘bad’ or ‘reject’. This approach, called the three-group approach, was proposed by Reichert et al. (1983). Its main drawback is that a typical scorecard classifies applicants as either ‘good’ or ‘bad’, so it is not clear what should be done with the ‘reject’ group.
In this research, reject inference is not carried out because no information on rejected applicants is available. Moreover, the lack of consensus on whether reject inference is necessary and how it should be tackled (Baesens 2003, Kelly 1998) further supports our decision not to perform it.
A diagram of the scorecard development process, from the data cleaning step to the sample generation step, is shown in Figure 3.2. The figures in round brackets give the details of the data at each step of the process. We began with a raw set of credit scoring data consisting of 38,766 records and 138 variables. Through data cleaning and discretisation, the dataset was reduced to 15,576 records and 50 variables. In addition, the values of each variable were converted into their corresponding weight of evidence (refer to Equation 3.1). Stepwise regression analysis further reduced the number of variables to 20. The resulting dataset was then divided into development and holdout samples.

[Figure: Raw Data (38,766 records, 138 variables) → Data cleaning → Clean Data → Data discretisation → Discretised Clean Data (15,576 records, 50 variables; variable values = WOE) → Variable selection → Sample Data (15,576 records, 20 variables; 6.7% Bad, 93.3% Good) → Development Sample (80%; 6.7% Bad, 93.3% Good) and Holdout Sample (20%; 6.7% Bad, 93.3% Good)]
Figure 3.2: Samples generation
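The conversion of a discretised variable into weight of evidence can be sketched as below. Equation 3.1 is not reproduced in this section, so the sketch assumes the commonly used form WOE = ln((goods in bin / total goods) / (bads in bin / total bads)). Some practitioners use the inverse ratio or scale by 100, which changes the sign or magnitude. The function `weight_of_evidence` and the ‘age’ bins and counts are hypothetical.

```python
import math
from collections import Counter


def weight_of_evidence(bins, labels):
    """Map each bin to WOE = ln((goods in bin / all goods) / (bads in bin / all bads))."""
    goods = Counter(b for b, y in zip(bins, labels) if y == "good")
    bads = Counter(b for b, y in zip(bins, labels) if y == "bad")
    total_good, total_bad = sum(goods.values()), sum(bads.values())
    woe = {}
    for b in sorted(set(bins)):
        if goods[b] == 0 or bads[b] == 0:
            # The log is undefined here; such bins are normally merged with a
            # neighbouring bin during discretisation before WOE is computed.
            raise ValueError(f"bin {b!r} has no goods or no bads")
        woe[b] = math.log((goods[b] / total_good) / (bads[b] / total_bad))
    return woe


# Hypothetical discretised 'age' variable: (bin, goods, bads) counts.
counts = [("18-25", 40, 10), ("26-40", 90, 10), ("41+", 70, 5)]
bins, labels = [], []
for b, n_good, n_bad in counts:
    bins += [b] * (n_good + n_bad)
    labels += ["good"] * n_good + ["bad"] * n_bad

woe = weight_of_evidence(bins, labels)
encoded = [woe[b] for b in bins]  # variable values replaced by their WOE
print({b: round(v, 3) for b, v in woe.items()})
# {'18-25': -0.693, '26-40': 0.118, '41+': 0.56}
```

Under this convention, a negative WOE marks a bin with a higher-than-average share of ‘bad’ applicants, and a positive WOE marks a bin with a lower-than-average share.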