Chapter 5 Empirical findings for single-period models using logistic regression
5.1 One year prior to failure model using logistic regression
5.1.3 Assessing the fit of the model
regression model: a logistic regression model is not estimated by minimising variance, so the
ordinary least squares approach to goodness-of-fit does not apply. Nevertheless, pseudo R²
indices allow logistic regression models to be compared, which matters here because models for up to
five years prior to failure will be developed using logistic regression. Since the purpose of a financial distress
prediction model is to predict the status of a firm, failed or active, it is intuitively appealing to
assess the fit of the one year prior to failure model via a classification matrix like Table 5.3:
Table 5.3 Classification matrix for a binary outcome variable
              Predicted
Observed    Active     Failed     Total
Active      c          f          c + f
Failed      m          h          m + h
Total       c + m      f + h      n
Table 5.3 cross-classifies the outcome variable with a dichotomous variable
whose values are derived by comparing each estimated probability to a cut-off point: if the
estimated probability exceeds the cut-off point, the derived variable equals 1; otherwise
it equals 0. Out of the total number of observations n, c and h are the number of correctly predicted active firms (correct rejections) and the number of correctly
predicted failed firms (hits) respectively. The letters m and f are the number of failed firms incorrectly predicted as active (misses) and the number of
active firms incorrectly predicted as failed (false alarms) respectively. A Type I error
rate (also called miss rate) and a Type II error rate (also called false alarm rate) can be
Table 5.4 One year prior to failure logistic regression model’s classification of firms in the estimation sample
The estimation sample comprises 561,304 active firms and 42,861 failed firms. Using the optimal cut-off point of 0.068, the model correctly predicts 416,163 active firms and 31,778 failed firms.
              Predicted
Observed    Active     Failed     Total
Active      416,163    145,141    561,304
Failed      11,083     31,778     42,861
Total       427,246    176,919    604,165
Using the optimal cut-off point of 0.068, the one year prior to failure model estimated in Table
5.1 correctly classifies 74.1% of the firms in the estimation sample (Table 5.4), with the Type
I and Type II error rates both equal to 25.9%. The optimal cut-off point of 0.068 lies where the
sensitivity (hit rate) and specificity (correct rejection rate) curves cross in Figure
5.1.
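The percentages quoted above can be checked directly from the counts in Table 5.4. The sketch below is illustrative only: the function name `classification_rates` is hypothetical, and it assumes that the Type I error rate is the miss rate m / (m + h) and the Type II error rate is the false alarm rate f / (c + f), which is consistent with the labels used for Table 5.3 and with both reported rates being 25.9%.

```python
def classification_rates(c, f, m, h):
    """Summary rates for a classification matrix in the notation of Table 5.3.

    c: correct rejections, f: false alarms, m: misses, h: hits.
    """
    n = c + f + m + h
    return {
        "accuracy": (c + h) / n,            # share of firms classified correctly
        "sensitivity": h / (m + h),         # hit rate
        "specificity": c / (c + f),         # correct rejection rate
        "type_i_error": m / (m + h),        # miss rate (assumed definition)
        "type_ii_error": f / (c + f),       # false alarm rate (assumed definition)
    }


# Counts taken from Table 5.4 (estimation sample, cut-off point 0.068).
table_5_4 = classification_rates(c=416_163, f=145_141, m=11_083, h=31_778)

for name, value in table_5_4.items():
    print(f"{name}: {value:.1%}")
```

Because the cut-off point is chosen where sensitivity and specificity are equal, the two error rates coincide as well, which is why a single figure is reported for both.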
Figure 5.1 Plot of one year prior to failure logistic regression model’s sensitivity and specificity versus all possible cut-off points in the estimation sample
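The idea behind Figure 5.1, choosing the cut-off point at which the sensitivity and specificity curves cross, can be sketched as a simple search over candidate cut-off points. This is a minimal illustration and not the procedure used to produce the figure: the function names and the ten-firm toy data are hypothetical, and the search simply picks the observed probability at which the gap between sensitivity and specificity is smallest.

```python
def classification_matrix(y_true, p_hat, cutoff):
    """Return (c, f, m, h) in the notation of Table 5.3 for a given cut-off."""
    c = f = m = h = 0
    for y, p in zip(y_true, p_hat):
        predicted_failed = p > cutoff   # derived variable equals 1
        if y == 1 and predicted_failed:
            h += 1                      # hit
        elif y == 1:
            m += 1                      # miss
        elif predicted_failed:
            f += 1                      # false alarm
        else:
            c += 1                      # correct rejection
    return c, f, m, h


def crossing_cutoff(y_true, p_hat):
    """Cut-off (among observed probabilities) minimising |sensitivity - specificity|."""
    best = None
    for cutoff in sorted(set(p_hat)):
        c, f, m, h = classification_matrix(y_true, p_hat, cutoff)
        sensitivity = h / (h + m)
        specificity = c / (c + f)
        gap = abs(sensitivity - specificity)
        if best is None or gap < best[0]:
            best = (gap, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical toy data: 1 = failed, 0 = active, with made-up probabilities.
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.06, 0.09, 0.12, 0.30]

cutoff, sens, spec = crossing_cutoff(y, p)
print(cutoff, round(sens, 3), round(spec, 3))
```

With a sample as large as the one in Table 5.4 the two curves can be made to cross almost exactly; in a small sample like this toy one, the closest available cut-off leaves a visible gap between the two rates.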
point, an alternative way to assess the forecasting accuracy of a predictive model is a ROC
curve, which is a plot of the hit rate (sensitivity) versus the false alarm rate (1 - specificity)
across all possible cut-off levels. The steeper the ROC curve at the left and the larger the area under the curve (AUC),
the higher the predictive accuracy of a model. The AUC represents the probability that a
randomly selected failed firm is rated with greater suspicion of failure than a randomly
selected active firm. As a general rule, a model with an AUC of 0.5 has no discrimination, 0.7
to 0.8 acceptable discrimination, 0.8 to 0.9 excellent discrimination and 0.9 or above
outstanding discrimination (Hosmer & Lemeshow 2000, p. 162). When a model provides
perfect discriminatory power, the AUC it achieves will be 1, with the hit rate being 1 and the false
alarm rate being 0.
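The two readings of the AUC given above, as an area under the ROC curve and as the probability that a randomly selected failed firm receives a higher predicted probability than a randomly selected active firm, can be shown to agree on a small example. The sketch below uses hypothetical function names and made-up data; it counts concordant pairs directly (ties counted as one half) and compares the result with the trapezoidal area under the ROC points.

```python
def auc_by_pairs(y_true, p_hat):
    """Share of (failed, active) pairs in which the failed firm scores higher."""
    failed = [p for y, p in zip(y_true, p_hat) if y == 1]
    active = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = 0.0
    for pf in failed:
        for pa in active:
            if pf > pa:
                wins += 1.0
            elif pf == pa:
                wins += 0.5             # ties count as half a win
    return wins / (len(failed) * len(active))


def roc_points(y_true, p_hat):
    """(false alarm rate, hit rate) pairs, sweeping the cut-off from high to low."""
    n_failed = sum(y_true)
    n_active = len(y_true) - n_failed
    # A firm is predicted failed when its probability exceeds the cut-off;
    # the final cut-off of -inf classifies every firm as failed, giving (1, 1).
    cutoffs = sorted(set(p_hat), reverse=True) + [float("-inf")]
    points = []
    for cutoff in cutoffs:
        hits = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p > cutoff)
        false_alarms = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p > cutoff)
        points.append((false_alarms / n_active, hits / n_failed))
    return points


def auc_by_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: 1 = failed, 0 = active.
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.06, 0.09, 0.12, 0.30]

pairwise_auc = auc_by_pairs(y, p)
trapezoid_auc = auc_by_trapezoid(roc_points(y, p))
print(round(pairwise_auc, 4), round(trapezoid_auc, 4))
```

The pairwise count compares every failed firm with every active firm, so it is only practical for small samples; for samples of the size used in this chapter, a rank-based or library implementation would be the sensible choice.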
To check the predictive capacity of the one year prior to failure model using logistic regression,
I graph its ROC curve across all possible cut-off points in Figure 5.2. The AUC provides a
measure of discrimination: the likelihood that a failed firm has a higher predicted probability of
failure than an active firm. The AUC in Figure 5.2 is 0.818, meaning that for a randomly
selected failed firm and a randomly selected active firm, there is a 0.818 probability that the
model's predicted probability of failure will be higher for the failed firm than for the active firm.
The 95% confidence interval for the AUC is 0.815 to 0.820. According to Hosmer and
Lemeshow's (2000) general rule, the one year prior to failure model provides excellent in-
Figure 5.2 One year prior to failure logistic regression model’s ROC curve in the estimation sample
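The text does not state how the 95% confidence interval for the AUC was obtained. As a rough plausibility check only, the sketch below applies one common large-sample approximation, the Hanley and McNeil (1982) standard error, to the reported AUC of 0.818 and the sample sizes in Table 5.4. The choice of this approximation and the function name `hanley_mcneil_ci` are assumptions made for illustration, not the method used in this chapter.

```python
import math


def hanley_mcneil_ci(auc, n_failed, n_active, z=1.96):
    """Approximate confidence interval for an AUC (Hanley & McNeil standard error)."""
    q1 = auc / (2.0 - auc)                  # two failed firms both outrank one active firm
    q2 = 2.0 * auc ** 2 / (1.0 + auc)       # one failed firm outranks two active firms
    variance = (
        auc * (1.0 - auc)
        + (n_failed - 1) * (q1 - auc ** 2)
        + (n_active - 1) * (q2 - auc ** 2)
    ) / (n_failed * n_active)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


# AUC from Figure 5.2 and sample sizes from Table 5.4.
lower, upper = hanley_mcneil_ci(0.818, n_failed=42_861, n_active=561_304)
print(f"{lower:.4f} to {upper:.4f}")
```

Under these assumptions the approximate bounds fall within about 0.001 of the reported interval, which mainly reflects how narrow the interval becomes with more than 600,000 observations.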
Since the one year prior to failure model is estimated on 70% of the observations, its
goodness-of-fit can be assessed on the remaining 30%, referred to as the holdout
sample. The reason for this type of assessment is that a fitted model
usually performs optimistically on its estimation sample. Using the
coefficient values reported in Table 5.1, the one year prior to failure model achieves a similar
one-year-ahead out-of-sample predictive accuracy: it correctly classifies 74.0% of the firms
using the optimal cut-off point of 0.067 in Table 5.5, and achieves an AUC of 0.816, as small as 0.813
Table 5.5 One year prior to failure logistic regression model’s classification of firms in the holdout sample
The holdout sample comprises 241,475 active firms and 18,598 failed firms. Using the optimal cut-off point of 0.067, the model correctly predicts 178,571 active firms and 13,767 failed firms.
              Predicted
Observed    Active     Failed     Total
Active      178,571    62,904     241,475
Failed      4,831      13,767     18,598
Total       183,402    76,671     260,073
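The comparison between in-sample and out-of-sample performance can be made concrete by recomputing the correct classification rate from the counts in Tables 5.4 and 5.5. The short sketch below is illustrative; the helper name `accuracy` and the variable names are hypothetical, and the counts are copied from the two tables.

```python
def accuracy(c, f, m, h):
    """Share of firms classified correctly, in the notation of Table 5.3."""
    return (c + h) / (c + f + m + h)


# Estimation sample (Table 5.4) and holdout sample (Table 5.5).
estimation_accuracy = accuracy(c=416_163, f=145_141, m=11_083, h=31_778)
holdout_accuracy = accuracy(c=178_571, f=62_904, m=4_831, h=13_767)

# A positive difference means the model looks slightly better in-sample.
accuracy_drop = estimation_accuracy - holdout_accuracy

print(f"estimation: {estimation_accuracy:.1%}")
print(f"holdout:    {holdout_accuracy:.1%}")
print(f"drop:       {accuracy_drop:.2%}")
```

The drop from the estimation sample to the holdout sample is well under one percentage point, in line with the statement that the out-of-sample accuracy is similar to the in-sample accuracy.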
In summary, the one year prior to failure model using logistic regression effectively describes
the outcome variable, which serves as important evidence of the predictive accuracy of the
independent variables in the model. To address the question of how far ahead one can predict financial
distress for privately held firms with acceptable discrimination, Sections 5.2 to 5.5 will develop