Chapter 3: Methodology
3.3. Research Methodology
3.3.2. Solution Design
Learning from the previous results (see section 2.3), it is apparent that credit scoring performance will never be perfect, since the classification process is itself subject to errors (Nayak & Turvey, 1997). Moreover, the best performance achieved in a study today may not remain optimal in the future, as new customers with new behaviours arrive. Therefore, continuous improvement processes must be designed as part of the solution itself: the credit scoring mechanism should change dynamically as the data changes. Such dynamic changes are possible if feedback is provided to the current credit scoring process. By reviewing the current credit scoring process on the basis of this feedback, we expect its performance to improve significantly.
Dynamic Credit Scoring Using Payment Prediction
The feedback will be given once the quality of the current credit scoring system is known. This quality will be reviewed on the basis of the number of bad customers: the more bad customers there are, the worse the performance of the credit scoring system. Therefore, we will base our study on the overdue payment report issued by the lender.
Accounts receivable performance deteriorates significantly because of late payments from customers. As Parisi (2006) discusses, the collection process begins only after late payments have been identified from accounts receivable reports. This reactive approach is ineffective because the overdue payments have already occurred. Therefore, it is necessary to build models that support proactive rather than reactive actions.
Proactive action is possible if there is a prediction that identifies the customers who will not make their payments on time. Therefore, this dissertation focuses on building advance payment predictions in order to pre-empt overdue payments. A payment prediction will be made for each payment period by learning from all credit scoring parameters and all available payment histories.
Data arising from a credit scoring system represents an imbalanced data mining problem, as there are many more good customers than bad customers. Consequently, good payment records far outnumber bad payment records, and bad payments form the minority class. Because imbalanced datasets over-emphasize the majority class, predicting bad payments is a difficult task. The proposed solution is to restructure the data so that the minority class becomes the majority class within particular segments. The original data will therefore be divided into two segments: the first will contain more bad payment records than good payment records, while the second will contain the rest. By learning from segments in which bad payment records are the majority, we expect prediction performance to improve.
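The two-way split described above can be sketched in Python. This is a minimal illustration, assuming each payment record has already been labelled good or bad and tagged with a segment key (here a hypothetical string; in practice it would be a combination of credit scoring parameter values). The function names and toy data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict

def split_by_majority_bad(records):
    """Split (segment_key, label) records into two partitions.

    The first partition holds every record whose segment has more bad
    than good records; the second partition holds the rest.
    """
    counts = defaultdict(Counter)
    for key, label in records:
        counts[key][label] += 1
    # Segment keys in which bad payments are the majority class.
    majority_bad_keys = {k for k, c in counts.items() if c["bad"] > c["good"]}
    majority_bad = [r for r in records if r[0] in majority_bad_keys]
    rest = [r for r in records if r[0] not in majority_bad_keys]
    return majority_bad, rest

def bad_ratio(records):
    """Fraction of records labelled 'bad' (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for _, label in records if label == "bad") / len(records)

# Toy data (not from the study): segment "A" is mostly good, segment "B" mostly bad.
records = [("A", "good")] * 8 + [("A", "bad")] + [("B", "bad")] * 3 + [("B", "good")]
majority_bad, rest = split_by_majority_bad(records)
print(round(bad_ratio(records), 2), bad_ratio(majority_bad), round(bad_ratio(rest), 2))
# prints 0.31 0.75 0.11
```

In the toy data bad payments are only about 31% of all records, but they make up 75% of the first partition, which is the effect the segmentation aims for.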
In data mining, it is well known that, in general, no single algorithm performs best in all situations (Witten & Frank, 2005, p. 35). Therefore, a number of different algorithms will be used in the payment prediction process, and appropriate metrics will be applied to test the efficacy of the different schemes.
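The metrics are not named at this point, so the following is only a sketch under an assumption: it scores each scheme by precision, recall and F-measure on the bad-payment class, which are common choices for imbalanced data, and picks the scheme with the highest F-measure. The scheme names and predictions are hypothetical.

```python
def evaluate(actual, predicted, positive="bad"):
    """Accuracy plus precision, recall and F-measure for the 'bad' class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def best_scheme(actual, predictions_by_scheme):
    """Return the scheme name with the highest bad-class F-measure, plus all scores."""
    scores = {name: evaluate(actual, preds) for name, preds in predictions_by_scheme.items()}
    best = max(scores, key=lambda name: scores[name]["f1"])
    return best, scores

# Hypothetical predictions from two schemes on five payments.
actual = ["good", "good", "good", "bad", "bad"]
schemes = {
    "always_good": ["good"] * 5,
    "scheme_a": ["good", "bad", "good", "bad", "good"],
}
best, scores = best_scheme(actual, schemes)
print(best, scores["always_good"]["accuracy"], scores["scheme_a"]["accuracy"])
# prints scheme_a 0.6 0.6
```

Both schemes reach the same accuracy, yet only `scheme_a` detects any bad payment, which illustrates why accuracy alone can be misleading when bad payments are the minority class.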
Payment Prediction Design
The payment prediction model is built on information from customers' payments. Since information about customers is readily available in the form of credit scoring parameters, these parameters are used in conjunction with payment histories to produce the payment prediction model.
Historical payment data is divided into two categories: good payments and bad payments. Good payments are those paid in advance or within seven days of their due date; all other payments are categorized as bad. A seven-day tolerance is allowed because some payments may be late for operational reasons. For example, data transfers from banks take a number of working days, and some inter-branch transactions take several days to complete. Delays of more than a week, however, are attributed to the customer's failure to initiate payment on time.
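The labelling rule can be expressed as a small function. This sketch assumes that a payment made exactly seven days after its due date still counts as good, and it does not handle payments that are still outstanding; `label_payment` and `GRACE_DAYS` are hypothetical names.

```python
from datetime import date

# Tolerance for operational delays (bank transfers, inter-branch transactions).
GRACE_DAYS = 7

def label_payment(due_date, paid_date, grace_days=GRACE_DAYS):
    """Label a payment 'good' if paid in advance or within grace_days of its due date, else 'bad'."""
    days_late = (paid_date - due_date).days  # negative when paid in advance
    return "good" if days_late <= grace_days else "bad"

due = date(2007, 3, 1)
print(label_payment(due, date(2007, 2, 25)))  # paid early -> good
print(label_payment(due, date(2007, 3, 8)))   # exactly 7 days late -> good
print(label_payment(due, date(2007, 3, 9)))   # 8 days late -> bad
```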
Characteristics of bad payments are reflected in different combinations of credit scoring parameter values and payment history data. Since there is no payment history for the first payment, its bad payment characteristics can only be determined from a combination of credit scoring parameters. From the second payment to the last, both credit scoring parameters and previous payment histories will be used to learn bad payment characteristics. This process continues up to the seventh payment, as data is available for only seven payments. For the seventh payment, a combination of credit scoring parameters and all payments made up to that point will be used to characterize bad payments.
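The way the feature set grows from one payment to the next can be sketched as follows. It assumes each customer is stored as a dictionary with hypothetical keys `params` (credit scoring parameters) and `labels` (good/bad labels for payments 1 to 7, in order); the parameter names in the example are invented for illustration.

```python
MAX_PAYMENTS = 7  # the data covers seven payments per customer

def training_rows(customers, k):
    """Build (features, target) rows for predicting payment k (1-based).

    Features are the customer's credit scoring parameters plus the labels
    of payments 1..k-1; for k == 1 only the credit scoring parameters are used.
    """
    if not 1 <= k <= MAX_PAYMENTS:
        raise ValueError(f"k must be between 1 and {MAX_PAYMENTS}")
    rows = []
    for customer in customers:
        labels = customer["labels"]
        if len(labels) < k:
            continue  # payment k not observed for this customer
        features = dict(customer["params"])
        for i in range(1, k):
            features[f"pay{i}"] = labels[i - 1]
        rows.append((features, labels[k - 1]))
    return rows

# Hypothetical customer with three observed payments.
customer = {"params": {"income_band": "low", "region": "north"},
            "labels": ["good", "bad", "good"]}
print(training_rows([customer], 1))  # credit scoring parameters only
print(training_rows([customer], 3))  # parameters plus pay1 and pay2
```

One model per payment position (k = 1 to 7) would then be trained on the corresponding rows.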
Different combinations of credit scoring parameters and payment histories will be used to segment the data. A data segment contains both bad and good payments, and the number of bad payments may be less than, equal to, or greater than the number of good payments. A segment that contains more bad payments than good payments will be called a Majority Bad Payment Segment (MBPS). Since an MBPS contains more bad payment data, we would expect it to be an effective vehicle for studying bad payment characteristics. This expectation is borne out by the experimental results presented in Chapter 4. It is therefore important to identify which segments are MBPS.
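One way to identify MBPS is to enumerate attribute combinations, group the records by their values on each combination, and keep the groups in which bad payments outnumber good ones. The text does not specify how combinations are chosen, so exhaustive enumeration up to `max_size` attributes and the `min_count` threshold below are assumptions of this sketch; `find_mbps` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def find_mbps(rows, attributes, max_size=2, min_count=1):
    """Return {(attrs, values): (bad, good)} for every segment with more bad than good rows.

    A segment is the set of rows sharing the same values on a combination of
    up to max_size attributes; segments with fewer than min_count rows are skipped.
    """
    mbps = {}
    for size in range(1, max_size + 1):
        for attrs in combinations(attributes, size):
            counts = {}
            for features, label in rows:
                values = tuple(features[a] for a in attrs)
                counts.setdefault(values, Counter())[label] += 1
            for values, c in counts.items():
                if c["bad"] + c["good"] >= min_count and c["bad"] > c["good"]:
                    mbps[(attrs, values)] = (c["bad"], c["good"])
    return mbps

# Toy rows (not from the study): features are credit scoring parameters and earlier payment labels.
rows = [
    ({"region": "north", "pay1": "bad"}, "bad"),
    ({"region": "north", "pay1": "bad"}, "bad"),
    ({"region": "north", "pay1": "good"}, "good"),
    ({"region": "south", "pay1": "good"}, "good"),
    ({"region": "south", "pay1": "bad"}, "good"),
]
for segment, (bad, good) in find_mbps(rows, ["region", "pay1"]).items():
    print(segment, bad, good)
```

Very small segments (for example, one containing a single bad record) are trivially MBPS, which is why a minimum segment size such as `min_count` may be worth considering in practice.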
Payment Prediction Algorithms
Algorithms are an important part of payment prediction modelling. However, as discussed in the literature review chapter, no single algorithm is universally the best across all data domains. It is therefore appropriate to apply multiple algorithms and compare them to find the best performer for the data domain under study.
Galindo and Tamayo (as cited in Servigny & Renault, 2004, p. 75) specify two requirements for algorithm selection: accuracy (low error rates arising from assumptions) and interpretability (the ability to understand the output of a model).
Interpretability issues are important considerations if the algorithms are to be useful in practice (Gurka, Edwards, & French, 2007).
Previous studies show that algorithms such as the C4.5 decision tree (Kauderer, Nakhaeizadeh, Artiles, & Jeromin, 1999), Logistic Regression (Sohn & Kim, 2007; Xu & Wang, 2007), Neural Networks (Yang, Li, Ji, & Xu, 2001) and Bayesian Networks (Hu, 2004) achieve high levels of accuracy in the domain of payment prediction.
However, with the exception of Neural Networks, these algorithms produce interpretable models. Logistic Regression models are interpretable because their coefficients show how the odds of experiencing an event change with each predictor (Pampel, 2000, pp. 18-20). The same holds true for Bayesian Networks, with Santana et al. (2006) observing that “they are one of the most prominent techniques when considering the ease of knowledge interpretation achieved”. Likewise, Cano et al. (2007) explain that decision trees are highly interpretable. However, Cano et al. warn that the degree of interpretability of a decision tree depends heavily on its size. Large decision trees generally exhibit overfitting, and their generalization ability suffers as a consequence.
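The interpretability of Logistic Regression can be illustrated with a short calculation. Because the log-odds of the event is a linear function of the predictors, increasing one predictor by one unit (with the others held fixed) multiplies the odds by exp(b) for that predictor's coefficient b. The coefficient values below are hypothetical and are not fitted results from this study.

```python
import math

def predicted_probability(intercept, coefficients, x):
    """Logistic regression: P(event) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical model: one predictor, e.g. the number of previous bad payments.
b0, b1 = -2.0, 0.7
p0 = predicted_probability(b0, [b1], [0])
p1 = predicted_probability(b0, [b1], [1])
odds_ratio = odds(p1) / odds(p0)
print(round(odds_ratio, 4), round(math.exp(b1), 4))
```

Both printed values agree: under this hypothetical model, each additional unit of the predictor multiplies the odds of a bad payment by roughly 2, a statement a credit analyst can read directly from the coefficient.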