A learning curve is a graphical tool that shows how the performance of a series of classification models changes. The trend is drawn as a curve.
Factors that can be varied step by step, and thereby change the performance of the models, include:
§ The size of the training sample. It can be given as the number or the percentage of cases in the sample. For example, 100, 250, 500, 1000, 1500 or 2000 cases, or 10%, 25%, 50%, 80% or 100% of all cases, are used as the training sample.
§ The model parameters. Some model parameters determine the structure of a model, for example the size of a decision tree or the number of hidden nodes in a neural network.
§ The number of input variables. Sometimes a feature selection algorithm ranks the independent variables according to a particular criterion. They are then added one by one, in rank order, to the subset of input variables, and a series of models is trained on these subsets with an increasing number of predictors.
In a learning curve, the performance of the models is usually measured by the total error rate. A learning curve makes the trend in prediction accuracy caused by a particular factor easy to see.
The following three sections show how learning curves are used to analyze the change in classification accuracy caused by changes in training sample size and model structure.
4.3.2 Incremental case analysis
Incremental case analysis shows how the error rates change across a series of classification models generated from training samples of increasing size (cf. WeIn98, P. 173).
The goal of incremental case analysis is to find the smallest sample size needed to train the model. If models of the same quality can be generated from fewer cases, the smaller training sample is used. A smaller training sample speeds up learning, and for computation-intensive methods such as neural networks the speedup is dramatic. In some real-world applications a large number of examples is available, which is usually the case in credit scoring because of the large number of customers. It may then be unnecessary to use all cases to find the best solution, provided a model trained on fewer cases already captures the concepts in the data set.
Incremental case analysis uses learning curves to monitor the incremental change in errors and to illustrate the potential for further gains in performance when the sample size is increased. Usually the test error rate decreases as the number of training cases increases. However, the size and speed of the decrease vary greatly between data sets. For data sets that embody more complex concepts, the error rate may keep decreasing significantly as more training cases are used; for data sets with simple concepts, the error rate may decrease at first and stop decreasing once a certain number of training cases is exceeded.
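The procedure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the cited sources: the data are synthetic (two overlapping one-dimensional Gaussian classes), the classifier is a simple nearest-centroid rule standing in for any fixed-size model, and the function names are hypothetical. The training samples are nested prefixes of one pool, and every model is scored on the same held-out test set.

```python
import random

def make_cases(n, rng):
    """Draw n labelled cases; class 0 ~ N(0, 1), class 1 ~ N(2, 1).
    Labels alternate so that every sample of 2+ cases holds both classes."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def train_centroids(cases):
    """Fixed-size model: one mean value per class."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in cases if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def error_rate(means, cases):
    """Total error rate: share of cases assigned to the wrong class."""
    wrong = 0
    for x, y in cases:
        predicted = min(means, key=lambda c: abs(x - means[c]))
        wrong += predicted != y
    return wrong / len(cases)

def incremental_case_analysis(sizes, seed=0):
    """Return (sample size, test error rate) pairs for nested training samples."""
    rng = random.Random(seed)
    pool = make_cases(max(sizes), rng)   # training pool
    test = make_cases(2000, rng)         # held-out test sample
    curve = []
    for n in sizes:
        model = train_centroids(pool[:n])
        curve.append((n, error_rate(model, test)))
    return curve

curve = incremental_case_analysis([10, 25, 50, 100, 250, 500, 1000])
for n, err in curve:
    print(f"{n:5d} cases -> test error rate {err:.3f}")
```

Plotting the printed pairs gives the learning curve. With real data, the classifier and the sample sizes would be replaced by those of the application, and repeating the run with several random samples per size would reduce the noise in the curve.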
4.3.3 Incremental complexity analysis
Incremental complexity analysis shows how the error rates change across a series of classification models generated with different model sizes (cf. WeIn98, P. 176). Model size is used here as a synonym for the complexity or capacity of a model. For example, the larger a decision tree, the more complex it is; the more hidden nodes a neural network has, the more complex it is. Model complexity can be adjusted by tuning model parameters.
The goal of incremental complexity analysis is to find the right-sized solution. As shown in Figure 4.3.3/1, the optimal model size is the one that neither under-fits nor over-fits the data. If the model is too small, it will not fit the data well enough because of its limited capacity. If the model is too large, it will over-fit the data and lose generalization power by incorporating too many accidents or outliers of the training data that are not shared by other data sets (cf. GaTa97, P. 11).
Therefore, as the model size increases, the test error rate decreases at first; at some point the model becomes too large and over-fits, and the error rate increases again.
[Figure: three panels, each showing data with a fitted model and the train error and the generalization error Eg (error on test data) plotted against model size. Panel titles: Model is too simple; Model is optimal; Model is too large.]
Figure 4.3.3/1: Test error tradeoff in terms of model size (cf. GaTa97, P. 10)
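The trade-off in the figure can be reproduced with a small sketch. It rests on assumptions that are not part of the cited sources: the data are synthetic with 20% label noise, and the model is a hypothetical histogram classifier whose size is its number of equal-width bins (a rough analogue of the number of leaves of a decision tree). The sample size is held fixed while the model size is doubled step by step.

```python
import random

def make_cases(n, rng, noise=0.2):
    """x is uniform on [0, 1); the true class is 1 when x >= 0.5.
    Each label is flipped with probability `noise`."""
    cases = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < noise:
            y = 1 - y
        cases.append((x, y))
    return cases

def bin_index(x, n_bins):
    return min(int(x * n_bins), n_bins - 1)

def train_bins(cases, n_bins):
    """Model of size n_bins: majority class per bin (ties and empty bins -> 0)."""
    votes = [[0, 0] for _ in range(n_bins)]
    for x, y in cases:
        votes[bin_index(x, n_bins)][y] += 1
    return [int(v[1] > v[0]) for v in votes]

def error_rate(model, cases):
    wrong = sum(model[bin_index(x, len(model))] != y for x, y in cases)
    return wrong / len(cases)

def incremental_complexity_analysis(bin_counts, n_train=200, seed=0):
    """Return (model size, train error, test error) for a fixed training sample."""
    rng = random.Random(seed)
    train = make_cases(n_train, rng)
    test = make_cases(5000, rng)
    rows = []
    for b in bin_counts:
        model = train_bins(train, b)
        rows.append((b, error_rate(model, train), error_rate(model, test)))
    return rows

rows = incremental_complexity_analysis([1, 2, 4, 8, 16, 32, 64, 128])
for b, train_err, test_err in rows:
    print(f"{b:4d} bins: train error {train_err:.3f}, test error {test_err:.3f}")
```

Because each doubling refines the previous bins, the train error cannot increase with model size, whereas the test error is expected to rise again once the bins become small enough to follow the label noise.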
4.3.4 The basic phenomenology of incremental analysis
For some algorithms, sample size and model size are related: the larger the sample, the more complex the model found, i.e. as the number of training cases increases, the model size tends to increase. Therefore, incremental case analysis should assume a fixed model size, and incremental complexity analysis should assume a fixed sample size. Figure 4.3.4/1 and Figure 4.3.4/2 describe the basic phenomenology of incremental analysis with learning curves. For a fixed model size, as the training data increase, the test error rate decreases and approaches an asymptotic value once the sample size is large enough. For a given sample size, as the model size increases, the test error rate decreases at first and begins to increase after the optimal model size is reached.
[Plot, presumably Figure 4.3.4/1: test error rate against sample size; one panel for a fixed model size, one panel with curves for increasing model size.]
[Plot: test error rate against model size; one panel for a fixed sample size, one panel with curves for increasing sample size.]
Figure 4.3.4/2: Learning curve for incremental complexity analysis
The following phenomena may occur when an incremental analysis is carried out. Although they do not always hold, they are observed in many applications (cf. WeIn98, P. 180):
§ When the test error rate decreases significantly as the training sample grows and shows no sign of reaching an asymptotic value, the concept in the underlying population may not have been fully extracted, and the model can be improved by using more cases.
§ When the test error rate stops decreasing for a fixed model size as the training sample grows, the model size may need to be increased to raise the learning ability of the model.
§ When the test error rate stops decreasing for every increased model size as the training sample grows, significant gains in performance are unlikely to be achieved by increasing either the training sample size or the model size. The model has reached its full ability to extract knowledge from the data set at hand.
Incremental analysis helps answer a key question in mining large data sets: how many useful concepts can be extracted from the data at the lowest learning cost in terms of computation time? If two solutions are close in classification accuracy, the less complex solution learned from a smaller sample has several advantages: it is easier to explain, its performance on new data is less variable, and it is faster to train and to apply to new data. Through incremental analysis, simpler, faster and more robust classification models can be found.
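The preference for the simpler of two nearly equal solutions can be sketched as a selection rule. The record type `Solution`, the function `simplest_adequate`, the tolerance `tol` and the example values are hypothetical and serve only as an illustration: among all solutions whose test error rate lies within `tol` of the best one, the rule picks the smallest model and, among equal model sizes, the smallest sample.

```python
from typing import NamedTuple

class Solution(NamedTuple):
    model_size: int      # e.g. number of tree leaves or hidden nodes
    sample_size: int     # number of training cases
    test_error: float    # total error rate on the test sample

def simplest_adequate(solutions, tol=0.01):
    """Pick the least complex solution that is within tol of the best error."""
    best = min(s.test_error for s in solutions)
    adequate = [s for s in solutions if s.test_error - best <= tol]
    return min(adequate, key=lambda s: (s.model_size, s.sample_size))

# Invented results of an incremental analysis.
candidates = [
    Solution(model_size=4, sample_size=500, test_error=0.260),
    Solution(model_size=8, sample_size=500, test_error=0.205),
    Solution(model_size=8, sample_size=2000, test_error=0.203),
    Solution(model_size=32, sample_size=2000, test_error=0.200),
]
choice = simplest_adequate(candidates)
print(choice)
```

Here the solution with model size 8 trained on 500 cases is chosen, because it is within one percentage point of the best error rate while being smaller and cheaper to train.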
5 An example of the credit scoring model evaluation
In this chapter, different methods are used to develop scoring models for a real world credit risk assessment problem. Five classification algorithms are evaluated and compared:
§ Linear discriminant analysis (LDA)
§ Logistic regression (LR)
§ K-nearest-neighbors (k-NN)
§ Model tree algorithm (M5)
§ Multi-layer perceptron neural network (MLP)
The details of these classification algorithms are given in Liuy02, Chapter 4. They represent three professional backgrounds: statistics, machine learning and neural networks. The process of evaluating the generated classification models is described; it uses the model evaluation criteria and methods introduced earlier in this paper.