Loss given default estimation - Advanced Methods for Loss Given Default Estimation

3.3.2 Loss given default estimation

In Table 3.5 we compare our two-step models based on their explanatory power of LGD variation according to Equation (3.2). The LGD estimation results are in line with the classification accuracy in Table 3.4. In particular, RF mostly outperforms the other models in all testing methods for companies D and E. It also confirms the difficulties in handling company F’s dataset. As before, boosting has a slight advantage over the single versions of J4.8 and C5.0.

Negative R² values indicate that the model cannot explain the variation of the LGD. [...] such R². This result demonstrates that the available information at the contract’s execution is insufficient to adequately forecast LGD. Again, the estimation results improve with the additional information at the default of the contracts.

| Information | Method | D Is | D Oos | D Oot | E Is | E Oos | E Oot | F Is | F Oos | F Oot |
|---|---|---|---|---|---|---|---|---|---|---|
| At execution | Direct OLS | 0.1807 | 0.1045 | −0.0445 | 0.1727 | 0.1166 | −0.0487 | 0.0154 | 0.0153 | 0.0157 |
| At execution | Direct RF | 0.7672 | 0.2202 | −0.0943 | 0.7127 | 0.1316 | −0.0876 | 0.0154 | 0.0153 | 0.0157 |
| At execution | Logit | 0.2937 | 0.0524 | −0.0742 | 0.4246 | 0.1420 | −0.0760 | 0.1881 | 0.1821 | 0.1118 |
| At execution | C5.0 | 0.7311 | 0.2284 | −0.0791 | 0.6300 | 0.1743 | −0.0721 | 0.2565 | 0.2484 | 0.1120 |
| At execution | C5.0 Boosted | 0.6311 | 0.1929 | −0.0474 | 0.8023 | 0.2834 | −0.0036 | 0.1489 | 0.1657 | 0.1119 |
| At execution | J4.8 | 0.7677 | 0.2354 | −0.0735 | 0.6004 | 0.2250 | −0.0073 | 0.2584 | 0.2493 | 0.1120 |
| At execution | J4.8 Boosted | 0.8430 | 0.2282 | −0.0721 | 0.7844 | 0.1950 | −0.0641 | 0.2574 | 0.2483 | 0.1120 |
| At execution | RF | 0.8131 | 0.2939 | −0.0575 | 0.8182 | 0.2938 | 0.0047 | 0.1334 | 0.1191 | 0.1120 |
| At default | Direct OLS | 0.2095 | 0.1251 | −0.0500 | 0.2177 | 0.1794 | 0.0086 | 0.0154 | 0.0148 | 0.0151 |
| At default | Direct RF | 0.8587 | 0.3180 | 0.0541 | 0.7718 | 0.2362 | 0.0369 | 0.0154 | 0.0148 | 0.0151 |
| At default | Logit | 0.2281 | 0.0106 | 0.0455 | 0.4104 | 0.1729 | 0.0498 | 0.2007 | 0.1843 | 0.1116 |
| At default | C5.0 | 0.8419 | 0.3591 | 0.0476 | 0.8238 | 0.3111 | 0.0451 | 0.2757 | 0.2498 | 0.1117 |
| At default | C5.0 Boosted | 0.8233 | 0.3466 | 0.0531 | 0.8217 | 0.3373 | 0.0486 | 0.1797 | 0.1394 | 0.1118 |
| At default | J4.8 | 0.8383 | 0.3633 | 0.0621 | 0.7687 | 0.2539 | 0.0479 | 0.2870 | 0.2501 | 0.1117 |
| At default | J4.8 Boosted | 0.8871 | 0.3688 | 0.0666 | 0.8234 | 0.2527 | 0.0509 | 0.2779 | 0.2507 | 0.1117 |
| At default | RF | 0.8808 | 0.4185 | 0.0732 | 0.8459 | 0.3570 | 0.0752 | 0.2014 | 0.1443 | 0.1119 |

Table 3.5: Coefficient of determination R² of the one-step and two-step models for companies D–F (columns D, E, and F). The table lists the used classification methods (step (1)). Random forest (RF) regression produces the estimation of LGD (step (2)) described in Section 3.2.2. The listed coefficients are calculated according to Equation (3.2). We validate the estimates in-sample (Is), out-of-sample (Oos), and out-of-time (Oot). Direct OLS is the direct ordinary least squares regression of the LGD, and direct RF is the direct regression of the LGD with RF. Both are one-step models. Logit is the logistic regression. The tree classifiers C5.0 and J4.8 are performed in single and boosted version. In all cases, higher outcomes are preferable. We underline the best results for each testing method, both points in time, and each company.
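To see how a negative coefficient of determination can arise, consider the following minimal sketch. It assumes that Equation (3.2) is the usual definition R² = 1 − SS_res / SS_tot, with the total sum of squares taken around the mean of the evaluated sample. The function name and the toy LGD values are illustrative and are not taken from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared forecast errors.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviations from the sample mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variation")
    return 1.0 - ss_res / ss_tot


# Invented LGD observations and three hypothetical forecasts.
lgd = [0.0, 0.1, 0.5, 1.0]
good_forecast = [0.1, 0.1, 0.4, 0.9]                 # close to the observations
mean_forecast = [sum(lgd) / len(lgd)] * len(lgd)     # always predicts the mean
poor_forecast = [0.9, 0.8, 0.2, 0.1]                 # worse than the mean

for name, forecast in [("good", good_forecast),
                       ("mean", mean_forecast),
                       ("poor", poor_forecast)]:
    print(name, round(r_squared(lgd, forecast), 4))
```

Under this definition, a forecast that always predicts the sample mean scores exactly zero, so a negative value means the model's squared errors exceed those of that naive benchmark on the evaluated data.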

Along with this study’s two-step models, we fit two benchmark models for LGD estimation: direct ordinary least squares (OLS) regression and direct RF regression. In most cases, and most importantly in the out-of-sample and out-of-time estimation, we find that the two-step models exceed the explanatory power of direct estimation.

The in-sample results are remarkably good throughout all methods. Particularly the tree methods generate determination coefficients as high as 88%. The distinction between recovered and written off contracts seems negligible because direct RF regression yields similarly high R². Although R² in general decreases out-of-sample, the gap between R² of the one-step and two-step model increases significantly. This finding rewards the consideration of recovery and write-off of the contracts. The coefficient of determination in the out-of-time estimation is indeed negative in most cases at the execution of the contracts. However, accounting for the additional information at default, the two-step models can explain up to 11% of the variation of LGD. Again, the difference in R² between one-step and two-step models is significant.
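The three testing methods compared above can be sketched as a data split. This is an illustration only: the record layout, the cutoff year, and the deterministic holdout rule (a stand-in for random sampling) are hypothetical, and the study's actual splitting procedure is not described in this section.

```python
def split_contracts(contracts, cutoff_year, holdout_every=4):
    """Split contracts into (train, out_of_sample, out_of_time).

    contracts     -- list of dicts with a 'default_year' key (hypothetical layout)
    cutoff_year   -- contracts defaulting in or after this year are out-of-time
    holdout_every -- every n-th earlier contract is held out as out-of-sample
    In-sample validation would re-use the training set itself.
    """
    earlier = [c for c in contracts if c["default_year"] < cutoff_year]
    out_of_time = [c for c in contracts if c["default_year"] >= cutoff_year]
    train = [c for i, c in enumerate(earlier) if i % holdout_every != 0]
    out_of_sample = [c for i, c in enumerate(earlier) if i % holdout_every == 0]
    return train, out_of_sample, out_of_time


# Invented contracts with default years 2005 to 2010.
contracts = [{"id": i, "default_year": 2005 + i % 6} for i in range(24)]
train, oos, oot = split_contracts(contracts, cutoff_year=2010)
print(len(train), len(oos), len(oot))
```

The point of the out-of-time set is that the model never sees contracts from the later period during fitting, which is why it is the hardest of the three tests.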

The reason for the large deviation between direct and multi-step estimation of LGD is the control for the different default ends. Any regression missing this information is biased because recovery is a key driver of LGD, as we show in Section 3.3.3. Table 3.3 makes clear that recovered and written off contracts yield significantly different LGD values and distributions. Accounting for this single piece of information, the distributions in Figure 3.1 can already explain the variation of the respective LGD to a large extent.
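A toy calculation illustrates how much of the variation the default end alone can carry. The sketch below uses invented LGD values, not the data of companies D–F, and assumes that the default end of every contract is known. It compares a forecast that ignores the default end (the pooled mean) with one that uses the mean LGD of each group. The R² definition is the usual 1 − SS_res / SS_tot, which is assumed to match Equation (3.2).

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical contracts as (default_end, lgd) pairs.
contracts = [
    ("recovered", 0.00), ("recovered", 0.05),
    ("recovered", 0.10), ("recovered", -0.05),
    ("written_off", 0.70), ("written_off", 0.85),
    ("written_off", 0.95), ("written_off", 0.60),
]
lgd = [value for _, value in contracts]

# Forecast 1: ignore the default end and predict the pooled mean.
pooled_forecast = [mean(lgd)] * len(lgd)

# Forecast 2: predict the mean LGD of the contract's own default end.
group_mean = {
    end: mean([value for e, value in contracts if e == end])
    for end in ("recovered", "written_off")
}
grouped_forecast = [group_mean[end] for end, _ in contracts]

r2_pooled = r_squared(lgd, pooled_forecast)
r2_grouped = r_squared(lgd, grouped_forecast)
print(round(r2_pooled, 4), round(r2_grouped, 4))
```

On these invented numbers the group means alone explain most of the variation, whereas the pooled mean explains none. In practice the default end is not known in advance, which is why the two-step models first have to classify it.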

Due to the large set of available information in the data of companies D and E, our methods perform similarly well on both. Company F’s results can hardly compete with those of companies D and E in the in-sample and out-of-sample estimation. However, company F’s out-of-time R² is surprisingly high. The out-of-time coefficient of determination for company F is higher than most of its counterparts for companies D and E. It is also positive throughout the methods at the execution of the contracts. We attribute this effect to the very different datasets in terms of contract numbers and LGD distribution. In particular, the LGD of company F is very dense around 0, with 66% of the observations lying in the small interval (−0.3, 0.3) (see Figure 3.1c). This density facilitates the forecasting of LGD in the out-of-time estimation because here a less volatile response is particularly beneficial to the estimation accuracy. In the case of companies D and E, the disadvantage in contract numbers can be compensated by additional significant variables in the in-sample and out-of-sample estimation.

The results of the two-step model with Logit yield R² values that are comparable to those of Gürtler and Hibbeln (2013). Since our Logit R² values are only average compared to other classification models used in this study, we see a large opportunity for improvement by choosing advanced classification techniques.

Although R² is about equal throughout all models for company F, we recognize the large gap between the performances of one-step and two-step models. This difference is a direct result of the consideration of recovered and written off contracts. Still, RF is slightly superior in the out-of-time testing.

In document Advanced Methods for Loss Given Default Estimation (Page 77-80)