Assessing the fit of the model - One year prior to failure model using logistic regression

Chapter 5 Empirical findings for single-period models using logistic regression

5.1 One year prior to failure model using logistic regression

5.1.3 Assessing the fit of the model

regression model: a logistic regression model is not estimated to minimise variance, so the ordinary least squares approach to goodness-of-fit does not apply. Nevertheless, pseudo R² indices allow logistic regression models to be compared, which matters here because models for up to five years prior to failure will be developed using logistic regression. Since the purpose of a financial distress prediction model is to predict the status of a firm, failed or active, it is intuitively appealing to assess the fit of the one year prior to failure model via a classification matrix like Table 5.3:

Table 5.3 Classification matrix for a binary outcome variable

            Predicted
Observed    Active    Failed    Total
Active      c         f         c + f
Failed      m         h         m + h
Total       c + m     f + h     n

Table 5.3 cross-classifies the outcome variable with a dichotomous variable whose values are derived by comparing each estimated probability to a cut-off point: if the estimated probability exceeds the cut-off point, the derived variable is equal to 1; otherwise it is equal to 0. Out of the total number of observations n, c and h refer to the number of correctly predicted active firms (the correct rejections) and the number of correctly predicted failed firms (the hits) respectively. The letters m and f refer to the number of failed firms incorrectly predicted as active (the misses) and the number of active firms incorrectly predicted as failed (the false alarms) respectively. A Type I error rate (also called miss rate) and a Type II error rate (also called false alarm rate) can be

Table 5.4 One year prior to failure logistic regression model’s classification of firms in the estimation sample

The estimation sample comprises 561,304 active firms and 42,861 failed firms. Using the optimal cut-off point of 0.068, the model correctly predicts 416,163 active firms and 31,778 failed firms.

            Predicted
Observed    Active     Failed     Total
Active      416,163    145,141    561,304
Failed      11,083     31,778     42,861
Total       427,246    176,919    604,165

Using the optimal cut-off point of 0.068, the one year prior to failure model estimated in Table 5.1 correctly classifies 74.1% of the firms in the estimation sample in Table 5.4, with the Type I and Type II error rates both being 25.9%. The optimal cut-off point of 0.068 lies at the point where the sensitivity (the hit rate) and specificity (the correct rejection rate) curves cross in Figure 5.1.
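The percentages quoted above follow directly from the four cells of Table 5.4. The short Python sketch below recomputes them using the c, f, m, h notation of Table 5.3; the function name `classification_rates` is an illustrative choice and is not taken from the thesis.

```python
# Counts copied from Table 5.4 (estimation sample, cut-off point 0.068).
c = 416_163  # active firms predicted active (correct rejections)
f = 145_141  # active firms predicted failed (false alarms)
m = 11_083   # failed firms predicted active (misses)
h = 31_778   # failed firms predicted failed (hits)


def classification_rates(c, f, m, h):
    """Return accuracy and both error rates from a 2x2 classification matrix."""
    n = c + f + m + h
    return {
        "accuracy": (c + h) / n,
        "type_i_error": m / (m + h),   # miss rate = 1 - sensitivity
        "type_ii_error": f / (c + f),  # false alarm rate = 1 - specificity
    }


rates = classification_rates(c, f, m, h)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
# accuracy: 74.1%
# type_i_error: 25.9%
# type_ii_error: 25.9%
```

The two error rates coincide because the cut-off point was chosen where sensitivity and specificity are equal.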

Figure 5.1 Plot of one year prior to failure logistic regression model’s sensitivity and specificity versus all possible cut-off points in the estimation sample
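One way to locate a crossing point like the one plotted in Figure 5.1 is to scan every observed probability as a candidate cut-off and keep the one where sensitivity and specificity are closest. The sketch below does this on synthetic data; the data, the helper names `sensitivity_specificity` and `crossing_cutoff`, and the scanning approach itself are assumptions for illustration, since the text does not describe the procedure actually used.

```python
import random


def sensitivity_specificity(probs, labels, cutoff):
    """Hit rate and correct rejection rate when p > cutoff is classified as failed."""
    hits = misses = rejections = alarms = 0
    for p, failed in zip(probs, labels):
        predicted_failed = p > cutoff
        if failed and predicted_failed:
            hits += 1
        elif failed:
            misses += 1
        elif predicted_failed:
            alarms += 1
        else:
            rejections += 1
    return hits / (hits + misses), rejections / (rejections + alarms)


def crossing_cutoff(probs, labels):
    """Observed probability at which |sensitivity - specificity| is smallest."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(probs, labels, cutoff)
        return abs(sens - spec)
    return min(sorted(set(probs)), key=gap)


# Synthetic, made-up data: 400 active firms (label 0) and 100 failed firms (label 1).
random.seed(0)
probs = [random.betavariate(1, 12) for _ in range(400)]
probs += [random.betavariate(3, 12) for _ in range(100)]
labels = [0] * 400 + [1] * 100

cutoff = crossing_cutoff(probs, labels)
sens, spec = sensitivity_specificity(probs, labels, cutoff)
print(f"cut-off {cutoff:.3f}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Because failed firms are rare in samples of this kind, the crossing point tends to sit far below 0.5, which is consistent with the cut-off point of 0.068 reported for the estimation sample.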

point, an alternative way to assess the forecasting accuracy of a predictive model is a ROC curve, which is a plot of the hit rate (sensitivity) versus the false alarm rate (1 - specificity) across all possible cut-off levels. The steeper the ROC curve at the left and the larger the area under the curve (AUC), the higher the predictive accuracy of a model. The AUC represents the probability that a randomly selected failed firm is rated with greater suspicion of failure than a randomly selected active firm. As a general rule, a model with an AUC of 0.5 has no discrimination, 0.7 to 0.8 acceptable discrimination, 0.8 to 0.9 excellent discrimination and 0.9 or above outstanding discrimination (Hosmer & Lemeshow 2000, p. 162). When a model provides perfect discriminatory power, the AUC it achieves will be 1, with the hit rate being 1 and the false alarm rate being 0.
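The two descriptions of the AUC given above, as an area under the ROC curve and as the probability that a failed firm is ranked above an active firm, yield the same number. The sketch below checks this on a small made-up set of scores; the scores and the function names `auc_pairwise` and `auc_trapezoid` are illustrative assumptions, not values or methods from the thesis.

```python
def auc_pairwise(failed_scores, active_scores):
    """Share of (failed, active) pairs where the failed firm scores higher.

    Tied pairs count as one half.
    """
    wins = 0.0
    for s_failed in failed_scores:
        for s_active in active_scores:
            if s_failed > s_active:
                wins += 1.0
            elif s_failed == s_active:
                wins += 0.5
    return wins / (len(failed_scores) * len(active_scores))


def auc_trapezoid(failed_scores, active_scores):
    """Area under the ROC curve over all cut-offs, by the trapezoidal rule."""
    cutoffs = sorted(set(failed_scores) | set(active_scores), reverse=True)
    # Start at (false alarm rate, hit rate) = (0, 0): a cut-off above every score.
    points = [(0.0, 0.0)]
    for t in cutoffs:
        hit_rate = sum(s >= t for s in failed_scores) / len(failed_scores)
        false_alarm_rate = sum(s >= t for s in active_scores) / len(active_scores)
        points.append((false_alarm_rate, hit_rate))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up predicted probabilities of failure for 4 failed and 6 active firms.
failed = [0.30, 0.12, 0.08, 0.05]
active = [0.20, 0.06, 0.04, 0.03, 0.02, 0.01]

print(round(auc_pairwise(failed, active), 3))   # 0.833
print(round(auc_trapezoid(failed, active), 3))  # 0.833
```

In this toy sample 20 of the 24 failed-active pairs are ordered correctly, so both calculations give 20/24.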

To check the predictive capacity of the one year prior to failure model using logistic regression, I graph its ROC curve across all possible cut-off points in Figure 5.2. The AUC provides a measure of discrimination, which is the likelihood that a failed firm has a higher predicted probability of failure than an active firm. The AUC in Figure 5.2 is 0.818, suggesting that for a randomly selected failed firm and a randomly selected active firm, there is a 0.818 probability that the model-predicted probability of failure will be higher for the failed firm than for the active firm. The 95% confidence interval for the AUC is 0.815 to 0.820. According to Hosmer and Lemeshow’s (2000) general rule, the one year prior to failure model provides excellent in-

Figure 5.2 One year prior to failure logistic regression model’s ROC curve in the estimation sample

Since the one year prior to failure model is estimated on 70% of the observations, its goodness-of-fit can be assessed on the remaining 30% of the observations, referred to as the holdout sample. The reason for this type of assessment is that a fitted model usually performs optimistically on its estimation sample. Using the values of the coefficients reported in Table 5.1, the one year prior to failure model achieves a similar one-year-ahead out-of-sample predictive accuracy: correct classification of 74.0% of the firms using the optimal cut-off point of 0.067 in Table 5.5, and an AUC of 0.816, as small as 0.813

Table 5.5 One year prior to failure logistic regression model’s classification of firms in the holdout sample

The holdout sample comprises 241,475 active firms and 18,598 failed firms. Using the optimal cut-off point of 0.067, the model correctly predicts 178,571 active firms and 13,767 failed firms.

            Predicted
Observed    Active     Failed    Total
Active      178,571    62,904    241,475
Failed      4,831      13,767    18,598
Total       183,402    76,671    260,073
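As a check on the holdout figures, the rates implied by the matrix above can be worked out from its cells using the c, f, m, h notation of Table 5.3. The two error rates below are computed here from the table and are not quoted from the text.

```latex
\begin{align*}
\text{Accuracy} &= \frac{c + h}{n} = \frac{178{,}571 + 13{,}767}{260{,}073} \approx 0.740 \\
\text{Type I error rate} &= \frac{m}{m + h} = \frac{4{,}831}{18{,}598} \approx 0.260 \\
\text{Type II error rate} &= \frac{f}{c + f} = \frac{62{,}904}{241{,}475} \approx 0.260
\end{align*}
```

The accuracy agrees with the 74.0% reported for the holdout sample, and the two error rates are again nearly equal, as expected at a cut-off point where sensitivity and specificity cross.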

In summary, the one year prior to failure model using logistic regression describes the outcome variable effectively, which serves as important evidence of the predictive accuracy of the independent variables in the model. To address the question of how far in advance one can predict financial distress for privately held firms with acceptable discrimination, Sections 5.2 to 5.5 will develop

In document Predicting financial distress for privately held firms in the European Union (Page 137-143)