Forecasting time-varying value-at-risk


Forecasting Time-Varying Value-at-Risk

Nedda Cecchinato
Doctor of Philosophy in Statistics
Bachelor's Degree in Statistical, Demographic and Social Sciences

School of Mathematical Sciences
Queensland University of Technology
Master of Applied Science, 2010

Keywords: value-at-risk, generalised lambda distributions, non-parametric volatility estimation, local linear regression.

Abstract

In this thesis we are interested in financial risk, and the instrument we want to use is Value-at-Risk (VaR). VaR is the maximum loss over a given period of time at a given confidence level. Many definitions of VaR exist, and some will be introduced throughout this thesis. There are two main ways to measure risk and VaR: through volatility and through percentiles. Large volatility in financial returns implies a greater probability of large losses, but also a greater probability of large profits. Percentiles describe tail behaviour. The estimation of VaR is a complex task. It is important to know the main characteristics of financial data to choose the best model. The existing literature is very wide, and at times controversial, but helpful in drawing a picture of the problem. It is commonly recognised that financial data are characterised by heavy tails, time-varying volatility, asymmetric response to bad and good news, and skewness. Ignoring any of these features can lead to underestimating VaR, with a possible ultimate consequence being the default of the protagonist (firm, bank or investor).

In recent years, skewness has attracted special attention. An open problem is the detection and modelling of time-varying skewness. Is skewness constant, or is there some significant variability which in turn can affect the estimation of VaR? This thesis aims to answer this question and to open the way to a new approach to model simultaneously time-varying volatility (conditional variance) and skewness. The new tools are modifications of the Generalised Lambda Distributions (GLDs). They are four-parameter distributions, which allow the first four moments to be modelled nearly independently: in particular we are interested in what we will call para-moments, i.e., mean, variance, skewness and kurtosis. The GLDs will be used in two different ways. Firstly, semi-parametrically, we consider a moving window to estimate the parameters and calculate the percentiles of the GLDs. Secondly, parametrically, we attempt to extend the GLDs to include time-varying dependence in the parameters.

We use local linear regression to estimate semi-parametrically the conditional mean and conditional variance. The method is not efficient enough to capture all the dependence structure in the three indices ASX 200, S&P 500 and FT 30; however, it provides an idea of the data generating process (DGP) underlying the series and helps in choosing a good technique to model the data.

We find that the GLDs suggest that moments up to the fourth order do not always exist, and that their existence appears to vary over time. This is a very important finding, considering that past papers (see for example Bali et al., 2008; Hashmi and Tay, 2007; Lanne and Pentti, 2007) modelled time-varying skewness, implicitly assuming the existence of the third moment. However, the GLDs suggest that the mean, the variance, the skewness and, in general, the conditional distribution vary over time, as already suggested by the existing literature. The GLDs give good results in estimating VaR on three real indices, ASX 200, S&P 500 and FT 30, with results very similar to those provided by historical simulation.

Contents

List of Figures
List of Tables
Glossary
Statement of Original Authorship
Acknowledgment
1 Introduction
2 Modelling Volatility and Estimating Value-at-Risk
  2.1 A definition of VaR
  2.2 RiskMetrics
  2.3 Taylor-based approximation methods
  2.4 Historical simulation method
  2.5 Monte Carlo simulation method
  2.6 GARCH processes
  2.7 Comparing techniques
  2.8 Modelling skewness
3 Quantile Distributions
  3.1 Origins of generalised lambda distributions
  3.2 Moment-based methods
  3.3 Percentile-based methods
  3.4 Starship estimation
  3.5 Estimation based on L-moments
  3.6 Maximum likelihood estimation
  3.7 Conclusions
4 Estimating Volatility with Local Linear Regression
  4.1 The data
  4.2 Skewness and conditional skewness
  4.3 Local linear regression estimation
  4.4 Local linear regression exploration
  4.5 Conclusion
5 The Time-Varying GLD
  5.1 The model
    5.1.1 Estimation algorithm
  5.2 Generalised Lambda Distribution
  5.3 Estimating Value-at-Risk
6 Conclusions
A Likelihood equalities for the GLD estimation
Bibliography

List of Figures

3.1 GLDrs regions as defined in Karian et al. (1996)
4.1 ASX 200 index and returns
4.2 S&P 500 index and returns
4.3 FT 30 index and returns
4.4 Standard deviations of overlapping moving windows of length k = 30, 50
4.5 Skewness and its confidence interval
4.6 Autocorrelations of index returns
4.7 Returns against conditional mean for ASX 200
4.8 Returns against conditional mean for S&P 500
4.9 Returns against conditional mean for FT 30
4.10 Returns against conditional variance for ASX 200
4.11 Returns against conditional variance for S&P 500
4.12 Returns against conditional variance for FT 30
4.13 Returns against conditional skewness for ASX 200 index
4.14 Returns against conditional skewness for S&P 500
4.15 Returns against conditional skewness for FT 30
4.16 Series of conditional mean and conditional variances for ASX 200 index
4.17 Series of conditional mean and conditional variances for S&P 500
4.18 Series of conditional mean and conditional variances for FT 30
4.19 ASX 200 returns filtered with local linear regression
4.20 S&P 500 returns filtered with local linear regression
4.21 FT 30 returns filtered with local linear regression
5.1, 5.2 Q-Q plots of the λ1, λ2, λ3, λ4 bootstrap distribution for the index ASX 200
5.3, 5.4 Q-Q plots of the λ1, λ2, λ3, λ4 bootstrap distribution for the index S&P 500
5.5, 5.6 Q-Q plots of the λ1, λ2, λ3, λ4 bootstrap distribution for the index FT 30
5.7 Q-Q plots for ASX 200 index
5.8 Q-Q plots for S&P 500 index
5.9 Q-Q plots for FT 30 index
5.10 Moving windows GLDrs distribution for ASX 200
5.11 Moving windows GLDfmkl distribution for ASX 200
5.12 Moving windows GLDrs distribution for S&P 500
5.13 Moving windows GLDfmkl distribution for S&P 500
5.14 Moving windows GLDrs distribution for FT 30
5.15 Moving windows GLDfmkl distribution for FT 30
5.16 Series of lambda estimates λ1, λ2, λ3, λ4 for ASX 200 with RS parameterisation, w = 30
5.17 Series of para-moments estimates µ, σ², γ, κ for ASX 200 with RS parameterisation, w = 30
5.18 Series of lambda estimates λ1, λ2, λ3, λ4 for ASX 200 with FMKL parameterisation, w = 30
5.19 Series of para-moments estimates µ, σ², γ, κ for ASX 200 with FMKL parameterisation, w = 30
5.20–5.63 Series of lambda estimates λ1, λ2, λ3, λ4 and of para-moments estimates µ, σ², γ, κ, with RS and FMKL parameterisations, for ASX 200, S&P 500 and FT 30 with moving windows of length w = 50, 100 and 200

List of Tables

4.1 P-values of the χ² test for whiteness
4.2 P-values of the χ² test for whiteness after filtering
5.1 Maximum likelihood estimates of λ
5.2 Kolmogorov-Smirnov goodness of fit test
5.3 VaR coverage for ASX 200 at one day
5.4 VaR coverage for S&P 500 at one day
5.5 VaR coverage for FT 30 at one day

Glossary

ARCH: AutoRegressive Conditional Heteroskedastic processes
ARCH-M: ARCH in mean
ARMA-GARCH: AutoRegressive Moving Average processes with GARCH errors
ASX 200: Australian Securities Exchange 200 index
DGP: Data Generating Process
EGARCH: Exponential GARCH processes
FIGARCH: Fractionally Integrated GARCH processes
FT 30: Financial Times 30 index
GARCH: Generalised ARCH processes
GLDs: Generalised Lambda Distributions
GLDfmkl, FMKL: parameterisation for GLDs introduced by Freimer et al. (1988)
GLDrs, RS: parameterisation for GLDs introduced by Ramberg and Schmeiser (1974)
IGARCH: Integrated GARCH processes
para-moments: mean, variance, skewness and kurtosis
S&P 500: Standard & Poor's 500 index
VaR: Value-at-Risk

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where reference is made.

To say of what is, that it is, or of what is not, that it is not, is true.
(Aristotle, Metaphysics)

If you would be a real seeker after truth, you must at least once in your life doubt, as far as possible, all things.
(René Descartes, Discours de la Méthode)

I would like to thank my mum, who has always supported me. I thank Professor Rodney Wolff, who taught me so much: "Writing a thesis is like telling a story." I thank Lorenzo for sharing the last months with me. I thank the Derbarians: John, Stef, Mehdi, Laura and Florian. A special mention to Alison, Madda and Erica.

Chapter 1 Introduction

According to the Oxford English Dictionary¹, the word risk means "Hazard, danger; exposure to mischance or peril." Risk is everything uncertain about the future. Every day of our lives, we are exposed to risk: the risk of an accident when riding a bicycle, the risk in crossing a street, the risk of catching the flu. These are just a few examples of risk for human beings. However, firms, banks and investors are also prone to risk; specifically, some types of risk are credit risk, market or financial risk, liquidity risk, operational risk and legal risk (for details, see Jorion, 2001). It is very important for firms and investors to be aware of the risk they are undertaking and to be able to measure it. If the risk is too large, they can change strategies to reduce it. Naturally, risk is not constant over time. For example, the risk of death from some contagious diseases has changed with the improvement of medicine. The recent financial crisis increased the risk of default, and many firms went bankrupt in the last year.

¹ www.oed.com, accessed 23 June 2009.

In this thesis we are interested in financial risk, and the instrument we want to use is Value-at-Risk (VaR). VaR is the maximum loss over a given period of time at a given confidence level. Many definitions of VaR exist, and some will be introduced throughout this thesis. There are two main ways to measure risk and VaR: through volatility and through percentiles. Large volatility in financial returns implies a greater probability of large losses, but also a greater probability of large profits. Percentiles describe tail behaviour.

The estimation of VaR is a complex task. It is important to know the main characteristics of financial data to choose the best model. The existing literature is very wide, and at times controversial, but helpful in drawing a picture of the problem. It is commonly recognised that financial data are characterised by heavy tails, time-varying volatility, asymmetric response to bad and good news, and skewness. Ignoring any of these features can lead to underestimating VaR, with a possible ultimate consequence being the default of the protagonist (firm, bank or investor).

In recent years, skewness has attracted special attention; Section 2.8 provides a more detailed introduction to skewness. An open problem is the detection and modelling of time-varying skewness. Is skewness constant, or is there some significant variability which in turn can affect the estimation of VaR? This thesis aims to answer this question and to open the way to a new approach to model simultaneously time-varying volatility (conditional variance) and skewness. The new tools are modifications of the Generalised Lambda Distributions (GLDs, see Chapter 3). They are four-parameter distributions, which allow the first four moments to be modelled nearly independently: in particular we are interested in what we shall call para-moments, i.e., mean, variance, skewness and kurtosis. The GLDs will be used in two different ways.

Firstly, semi-parametrically, we consider a moving window to estimate the parameters and calculate the percentiles of the GLDs. Secondly, parametrically, we attempt to extend the GLDs to include time-varying dependence in the parameters.

The thesis is organised as follows. In Chapter 2, VaR is introduced with a review of the most common estimation methods. Chapter 3 presents a review of the literature on GLDs. In Chapter 4 we model the conditional mean, variance and skewness of three indices with local linear regression. We introduce the time-varying GLDs in Chapter 5 and compare the new method with other existing methods in estimating VaR. Chapter 6 presents the conclusions and future developments of the thesis.

Chapter 2 Modelling Volatility and Estimating Value-at-Risk

In this chapter we introduce the concept of Value-at-Risk (VaR). The aim of this thesis is to offer a new method for estimating VaR. This chapter describes VaR, its advantages and drawbacks, the existing estimation techniques, and how to choose the best model.

VaR is a measure of risk introduced by Morgan (1996), and since then it has become very important and widely used by financial entities such as firms, banks and investors. Even though VaR is widely accepted as a measure of risk, it has been criticised (Beder, 1995; Chen, 2008) because it does not draw a complete picture of the risk and it can give misleading impressions of safety. Beder (1995) showed that there are too many different ways to calculate VaR, and that the results for the same portfolio can differ by a factor of more than 14. Chen (2008) noticed that VaR lacks the sub-additivity property, i.e., the property that the risk of the sum of two risky positions is not greater than the sum of their individual risks: the VaR of a combined position can exceed the sum of the VaRs of its components.

The literature is very rich with papers proposing estimation techniques
for VaR, but we consider only the most popular of them to provide a picture of the problem. As often happens, there is no overall best choice for estimating VaR, but depending on the characteristics of the situation (time line, time horizon, linear/non-linear behaviour, dimension of the portfolio, presence of skewness, fat tails and structural breaks) one of the methods is sometimes known to perform better. Thus, the more information we have about the portfolio under study, the better we can choose a proper technique to forecast VaR.

Another characteristic that has gained attention in the past decade is skewness. There is a growing literature on the problem of modelling skewness and time-varying skewness. It is still an open problem, and we aim to contribute to it in the second part of this thesis.

This chapter is organised as follows. In Section 2.1, we give a definition of VaR. The following sections describe some of the most common estimation methods: RiskMetrics (Section 2.2), Taylor-based approximation methods (Section 2.3), historical simulation (Section 2.4), Monte Carlo techniques (Section 2.5), and GARCH models (Section 2.6). All of these methods are currently used by banks, and we highlight the advantages and drawbacks of each. In Section 2.7, we deal with the problem of comparing different methodologies, and Section 2.8 is dedicated to the importance of skewness in estimating VaR.

2.1 A definition of VaR

Let us see why it is so important to estimate VaR accurately. Firms, banks and investors holding portfolios are exposed to the risk of incurring large shortfalls, i.e., large negative returns or losses. The risk of a portfolio depends
on each of its assets, and each asset is affected by different risk factors. Thus, to measure the risk of a portfolio, it is necessary to analyse all the different risk factors and their effects on the returns of the assets. Given a good measure of this risk, firms, banks and investors can decide whether this level of risk is bearable or whether they have to change to a less risky strategy. VaR is the most widely used measure of this risk, and when it is underestimated it can lead to catastrophic consequences. Unfortunately, it can be rather complicated to get a good estimate of VaR, especially, as is often the case, if the portfolio has large dimensions, i.e., it is composed of a large number of assets. A portfolio with many assets requires more information or more data, and possible dependences between different assets must be considered in order to estimate its riskiness; two types of problems can be identified: the amount and quality of information, and the computational burden.

The Bank for International Settlements (BIS) sets adequate bank capital at three times the ten-day 1% VaR (Beder, 1995; Duffie and Pan, 1997). However, we must keep in mind that there are many different ways to estimate VaR, and this capital can vary considerably from one firm to another according to the estimation technique chosen.

VaR is defined over a time horizon or holding period $\tau$ and a coverage probability or confidence level $1-\alpha$. For example, one-day VaR at the $1-\alpha = 95\%$ level gives the maximum loss over the next day that should be exceeded no more than $5\%$ of the time (Chang et al., 2003; Duffie and Pan, 1997).

Many different ways exist to define and to estimate VaR. It is defined as the quantile of the loss in portfolio value during a defined period. Let $V(t)$ be the value of a portfolio at time $t$; then we can define the loss $L$ after a period
$\tau$ as
$$L = V(t+\tau) - V(t).$$
The VaR, $x_\alpha$, is the quantile of the distribution of the loss function $L$ for a given probability $\alpha$:
$$P(L \le x_\alpha) = \alpha.$$
Usually the holding period $\tau$ is one or ten working days, and typical values for the confidence level $1-\alpha$ are $90\%$, $95\%$ and $99\%$. This definition is given by Glasserman et al. (2000).

Equivalently, we can define a portfolio as a collection of weights $w = (w_1, w_2, \ldots, w_n)'$ of $n$ assets with values $v = (v_{1,t}, v_{2,t}, \ldots, v_{n,t})'$ at time $t$. The change in portfolio value over a time horizon $\tau$ is given by
$$L = \Delta V = \sum_{i=1}^{n} w_i (v_{i,t+\tau} - v_{i,t}) = \sum_{i=1}^{n} w_i \Delta v_i,$$
where $\Delta v_i = v_{i,t+\tau} - v_{i,t}$ is the change in value of asset $i$ over the time interval $(t, t+\tau)$. The VaR for some probability level $1-\alpha$ is defined as the level of loss $\Delta V^*(\alpha)$ such that
$$P\{\Delta V \le \Delta V^*(\alpha)\} = \alpha.$$

An interesting (but usually flawed) scenario assumes that the change in value of an asset is Normally distributed. Then the calculation is quite straightforward:
$$\mathrm{VaR}(\alpha)_{t+\tau|t} = \Phi^{-1}(\alpha)\,\hat\sigma_{t+\tau|t},$$
where $\mathrm{VaR}(\alpha)_{t+\tau|t}$ is the VaR of level $1-\alpha$ at time $t+\tau$ estimated at time
$t$; $\Phi^{-1}(\cdot)$ is the quantile function of the standard Normal distribution, e.g., $\Phi^{-1}(\alpha) = -1.645$ if $1-\alpha = 95\%$; and
$$\hat\sigma^2_{t+\tau|t} = \tau \sum_{i,j=1}^{n} w_i w_j \hat\gamma_{ij}$$
is the variance of the portfolio over the horizon $\tau$, with $n$ the number of assets in the portfolio, $w_i$ the weights as defined above, and $\hat\gamma_{ij}$ the estimate of the one-period covariance between assets $i$ and $j$ (the factor $\tau$ assumes independent, identically distributed one-period changes, so that the variance grows linearly with the horizon). However, the calculations can become burdensome and complicated as the portfolio dimension increases: the variance-covariance matrix with general entry $\gamma_{ij}$ has $n(n+1)/2$ distinct entries ($n$ variances and $n(n-1)/2$ covariances) to be estimated.

Moreover, not all payoffs are linear. A long straddle position (long call and long put) makes a profit if the price of the underlying asset moves, whereas a short straddle position (short call and short put) makes a profit if the price does not move at all. These are only a couple of examples where we cannot assume that $\Delta v_i$ is Normally distributed. To overcome the problem of non-linearity it is possible to use approximation methods based on Taylor expansions of different orders: the delta method (linear approximation) and the delta-gamma method (quadratic approximation), introduced in Section 2.3 (Britten-Jones and Schaefer, 1999).

Chen (2008) highlights some drawbacks of VaR. Even if it remains an important quantity, it is not perfect, and care is needed when dealing with it. In particular, the author highlights that VaR gives no information on the size of the losses that exceed it. As mentioned, the Bank for International Settlements sets the adequate level of bank capital to be three times $\mathrm{VaR}(0.01)_{t+10|t}$ (Duffie and Pan, 1997). Beder (1995) found that this multiplicative constant can be too large or too small depending on the technique used to estimate VaR. For these reasons it is worth mentioning that, recently,
another measure has been considered: the Expected Shortfall (ES), which is defined as the expected return given that the return falls below the VaR, i.e., below the $\alpha$ quantile. ES is increasingly becoming an attractive alternative to VaR and is defined by
$$\mu_\alpha = E\{r_{t+1} \mid r_{t+1} < \mathrm{VaR}(\alpha)_{t+1|t}\},$$
where $r_t$ is the return of the observed price $P_t$ at time $t$,
$$r_t = \log\frac{P_t}{P_{t-1}}, \qquad R_{t,\tau} = \log\frac{P_{t+\tau}}{P_t}, \tag{2.1}$$
where $r_t$ are the daily returns and $R_{t,\tau}$ are the aggregate returns over the period $(t, t+\tau)$.

2.2 RiskMetrics

The RiskMetrics method provides a very simple and intuitive way to calculate VaR. In spite of its simplicity and ease of use, RiskMetrics is still a good benchmark, and it is not uncommon for it to outperform more sophisticated methods. The model, proposed by Morgan (1996), is
$$\mathrm{VaR}(\alpha)_{t|t-1} = \hat\mu_{t-1} + \hat\sigma_t\,\Phi^{-1}(\alpha), \tag{2.2}$$
where $\mathrm{VaR}(\alpha)_{t|t-1}$ has been defined in the previous section; $\hat\mu_{t-1} = \frac{1}{t-1}\sum_{j=1}^{t-1} r_j$ is the sample mean of the returns; $\Phi^{-1}(\cdot)$ is the inverse of the standard Normal distribution function; and $\hat\sigma_t$ is the forecast of the standard deviation
at time $t$, given by
$$\hat\sigma_t^2 = (1-\lambda)(r_{t-1} - \hat\mu_{t-1})^2 + \lambda\,\hat\sigma_{t-1}^2,$$
with $0 < \lambda < 1$ being the decay factor. Even though it would be possible to estimate the optimal $\lambda$ through a likelihood function, it is common practice to follow the suggestion of Morgan (1996), which recommends $\lambda = 0.94$ for forecasting one-period volatility and $\lambda = 0.97$ for forecasting monthly volatility of aggregate returns. A positive aspect of this technique is that it gives more importance to recent observations, whereas past observations receive exponentially decreasing weights. Barone-Adesi et al. (2002) noticed that, thanks to its simplicity and ease of use, RiskMetrics has had the merit of helping VaR to be accepted by banks and imposed by regulators. However, the method can create large biases, especially for longer horizons, since it does not take expiring contracts into account.

RiskMetrics is also known under the name of exponentially weighted moving average (EWMA) and is a special case of a GARCH process. More precisely, it is equivalent to an IGARCH process (GARCH processes and their variations are explained in Section 2.6).

2.3 Taylor-based approximation methods

Taylor-based approximation methods include, in particular, two well-known methods: the delta method and the delta-gamma method. The principle is very simple. Consider a portfolio with $n$ assets, whose value at time $t$ is $V_t = \sum_{i=1}^{n} x_{i,t} v_{i,t}$; for the sake of simplicity we assume that there is only one factor, $f$, influencing the values of the assets of the portfolio. We approximate the
This expansion is

$$dV_t = V_t - V_{t-1} = \frac{\partial V_t}{\partial f}\,df + \frac{1}{2}\frac{\partial^2 V_t}{\partial f^2}\,df^2 + \frac{\partial V_t}{\partial t}\,dt + \dots = \delta\,df + \frac{1}{2}\gamma\,df^2 + \theta\,dt + \dots, \tag{2.3}$$

where $\delta = \partial V_t/\partial f$ and $\gamma = \partial^2 V_t/\partial f^2$ are the first and the second derivatives, respectively, of the portfolio value with respect to the factor $f$, and $\theta = \partial V_t/\partial t$ is the time drift, which is deterministic (Jorion, 2001). According to Hull (1997), $\theta$ is the rate of change of the value of the portfolio as time passes, with everything else remaining the same. For example, the value of an option changes with time, even when everything else remains constant, because the option gets closer to its maturity.

The delta method is a local valuation method, simple and fast for evaluating the VaR of a portfolio. The name indicates that we approximate the variations of the portfolio with the first derivative, i.e., we stop the Taylor expansion of Equation (2.3) at the first-order term. Following Britten-Jones and Schaefer (1999), the delta method approximates the change of the portfolio value by

$$dV_t = \mu\,dt + \delta\,df.$$

If we can assume that $\Delta f \sim N(\mu_f, \sigma_f^2)$ and that the mean terms are negligible, then, for $\delta > 0$, the VaR can be calculated explicitly as

$$\mathrm{VaR}(\alpha)_{t+\tau|t} = \delta\,\sigma_f\,\Phi^{-1}(\alpha).$$

The delta-gamma method stops the approximation at the second-order term of the Taylor expansion in Equation (2.3).

This gives

$$\Delta_\gamma V_t = \mu\,dt + \delta\,df + \frac{1}{2}\gamma\,(df)^2,$$

where $\gamma$ is the gamma defined in Equation (2.3). When the factor change is Normally distributed, $df \sim N(\mu_f, \sigma_f^2)$, Britten-Jones and Schaefer (1999) showed that the distribution of $\Delta_\gamma V$ can be derived exactly as a translated non-central $\chi^2$. In this case too, an explicit expression for the VaR can be obtained:

$$\Delta_\gamma V^*(\alpha) = \mu^* + \frac{1}{2}\gamma\,\sigma_f^2\,\omega^*(\alpha),$$

where $\mu^* = \mu\,dt - \frac{\delta^2}{2\gamma}$ and $\omega^*(\alpha)$ is the $\alpha$-quantile of a non-central $\chi^2$ random variable with one degree of freedom and non-centrality parameter $\left(\frac{\delta + \gamma\mu_f}{\gamma\sigma_f}\right)^2$. This holds for $\gamma > 0$; when $\gamma < 0$, the $(1-\alpha)$-quantile is used in place of $\omega^*(\alpha)$.

The error of the delta-gamma method is of a lower order than that of the delta method; however, the former method is not always preferable. Second-order approximations of densities can produce distorted tails, as observed by Duffie and Pan (1997, Exhibit 21). The best choice between the delta and delta-gamma methods depends on the function linking the price of the underlying asset and the price of its derivative, i.e., on the relationship between the delta and the gamma. Apart from its simplicity, the delta method works well with a linear payoff, or with a non-linear payoff and a short time horizon. With a non-linear payoff and a long horizon, the approximation departs from the true function linking the spot price and the value of the derivative, and the VaR is underestimated. In this case, the delta-gamma method might be necessary to improve the approximation. It works particularly well when the link function is either concave or convex, i.e., when the function is locally parabolic.
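Both closed-form expressions can be sketched in Python with SciPy's non-central $\chi^2$ distribution (`scipy.stats.ncx2`). The sketch assumes a single factor with $df \sim N(\mu_f, \sigma_f^2)$, and the function names are illustrative. `delta_var` uses $|\delta|$ so that a short position does not flip the quantile (the delta formula assumes $\delta > 0$), and `delta_gamma_var` rewrites $\delta\,df + \frac{1}{2}\gamma\,df^2$ as $\frac{1}{2}\gamma(df + \delta/\gamma)^2 - \delta^2/(2\gamma)$, which is where the translated non-central $\chi^2$ comes from.

```python
import numpy as np
from scipy.stats import ncx2, norm


def delta_var(delta, sigma_f, alpha):
    """Delta-method VaR: alpha-quantile of delta * df with df ~ N(0, sigma_f^2)."""
    return abs(delta) * sigma_f * norm.ppf(alpha)


def delta_gamma_var(delta, gamma, sigma_f, alpha, mu_f=0.0, mu_dt=0.0):
    """Delta-gamma VaR via the translated non-central chi-square.

    dV = mu_dt + delta*df + gamma*df^2/2 with df ~ N(mu_f, sigma_f^2)
       = mu_star + (gamma * sigma_f^2 / 2) * W, where W ~ ncx2(1, nc).
    """
    if gamma == 0:
        raise ValueError("gamma must be non-zero; use delta_var instead")
    nc = ((delta + gamma * mu_f) / (gamma * sigma_f)) ** 2
    mu_star = mu_dt - 0.5 * delta**2 / gamma
    # dV increases with W when gamma > 0 and decreases with W when gamma < 0.
    q = alpha if gamma > 0 else 1.0 - alpha
    omega = ncx2.ppf(q, 1, nc)
    return mu_star + 0.5 * gamma * sigma_f**2 * omega


print(delta_var(1.0, 1.0, 0.01))             # about -2.33
print(delta_gamma_var(1.0, 0.1, 1.0, 0.01))  # about -2.06
```

With $\delta = 1$, $\sigma_f = 1$ and $\alpha = 0.01$, a positive gamma of 0.1 moves the loss quantile from about −2.33 to about −2.06, because the convex position gains from large moves in either direction. Simulating $df$ and taking the empirical quantile of $\Delta_\gamma V$ is a simple way to check the closed form.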

The delta and the delta-gamma methods can be quite burdensome and not always satisfactory, especially in some extreme cases. If we want to estimate the delta and the gamma of a portfolio of $n$ assets with the variance-covariance matrix, $n(n-1)/2$ values have to be calculated, and the computational task becomes huge as $n$ increases.

2.4 Historical simulation method

The historical simulation method is a non-parametric full valuation method that approximates the distribution of the returns of a given portfolio by the past observations: it takes the $\alpha$ quantile of the past $k$ observations. The method is powerful when the distribution of the returns is stationary over time, and VaR is estimated by taking the corresponding quantile of the past, or historical, distribution, i.e.,

$$\mathrm{VaR}(\alpha)_t = X_{\alpha,(t-k,t-1)},$$

where $X_{\alpha,(t-k,t-1)}$ indicates the $\alpha$ percentile of the observed distribution of $X_{t-k}, \dots, X_{t-1}$, and $k$ is the number of past values considered.

The major challenge lies in the choice of the window length. On the one hand, a large window is needed to have enough observations to estimate larger risks. For example, with 100 observations, when estimating the VaR at 99% confidence we expect only one such extreme observation; as a consequence, the estimate will not be precise, as its variance cannot be estimated. However, a large window is not always available. On the other hand, a very large window takes into consideration very remote observations, and it is likely that these returns are no longer representative of the current distribution of the portfolio, unless the process is stationary.
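A rolling version of the historical simulation estimator can be sketched as follows. It is a minimal illustration that assumes a plain array of returns; the function name and the default window `k = 250` (roughly one trading year) are our assumptions, and `np.quantile`'s default linear interpolation is only one of several conventions for the empirical $\alpha$ percentile.

```python
import numpy as np


def historical_var(returns, alpha=0.01, k=250):
    """Rolling historical-simulation VaR: VaR(alpha)_t is the alpha-quantile
    of the k returns X_{t-k}, ..., X_{t-1}.

    Entry t (t >= k) uses returns[t-k:t]; earlier entries are NaN.
    """
    x = np.asarray(returns, dtype=float)
    out = np.full(x.size, np.nan)
    for t in range(k, x.size):
        # Linear interpolation between order statistics (NumPy default).
        out[t] = np.quantile(x[t - k:t], alpha)
    return out


# Usage: the same heavy-tailed series with a short and a long window.
rng = np.random.default_rng(1)
returns = 0.01 * rng.standard_t(df=4, size=1000)
var_short = historical_var(returns, alpha=0.01, k=100)
var_long = historical_var(returns, alpha=0.01, k=500)
```

Comparing `var_short` with `var_long` makes the trade-off discussed above visible: the short-window estimate jumps whenever a large return enters or leaves the window.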

If the process is not stationary, short windows might be better because they adapt quickly to new regimes. Vlaar (2000) observes that short windows are sensitive to large outcomes from the recent past. Large observations can create artificial gaps between subsequent windows, being in the sample at time $t$ and excluded at time $t+1$, or vice versa. From this point of view, EWMA (see Section 2.2) provides a solution to this problem, assigning decreasing weights to remote observations and removing the gap. Historical simulation takes into consideration distributional characteristics that cannot be modelled through the Normal approximation, e.g., fat tails, asymmetries and non-linearities (Jorion, 2001).

There exist variants of this method, such as the proposal of Barone-Adesi et al. (2002). Their method is semi-parametric and consists of two phases: they first fit a GARCH(1,1) process and calculate the residuals of the returns based on the model, and then apply historical simulation to the residuals.

2.5 Monte Carlo simulation method

As the name suggests, the Monte Carlo simulation method samples from a distribution to forecast VaR. It consists of three sub-elements (Vlaar, 2000): the expectation, the variance-covariance matrix, and the distribution. The expectation is the average future value we can expect. It is common practice to assume the future expectation to be zero; however, it has been shown that modelling the conditional mean improves the results. In the case of a portfolio, an estimate of the variance-covariance matrix is needed, i.e., of the mutual linear dependence structure of the assets constituting the portfolio under study. Finally, a distribution has to be chosen.

Barone-Adesi et al. (2002) highlighted that estimating the historical variance-covariance matrix is a very delicate task: in periods of crisis, all assets tend to exhibit the same behaviour, which increases their correlation and, as an ultimate consequence, leads to underestimating possible losses. The Normal distribution is a common candidate, even though Barone-Adesi et al. (2002) severely criticised this choice because, in practice, it allows crashes below the mean of only three or four standard deviations. Overall, Monte Carlo simulation methods are very versatile and allow the model to reflect the characteristics of the data; e.g., in the case of very fat tails, a distribution with a larger kurtosis than the Normal can be chosen. The main drawback of this method is the computational burden of generating many sample paths, especially for large portfolios (Jorion, 2001).

2.6 GARCH processes

Engle (1982) was the first to model conditional variance with autoregressive conditional heteroscedastic (ARCH) processes. A few years later, Bollerslev (1986) extended Engle's result and proposed the well-known generalised ARCH (GARCH) process.

Definition 2.1 A generalised autoregressive conditional heteroscedastic (GARCH) model with orders $p \ge 1$ and $q \ge 0$ is defined as

$$X_t = \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}\beta_i X_{t-i}^2 + \sum_{j=1}^{q}\alpha_j\sigma_{t-j}^2, \tag{2.4}$$

where $\omega \ge 0$, $\beta_i \ge 0$, $i = 1, \dots, p$, and $\alpha_j \ge 0$, $j = 1, \dots, q$, are constants.

Moreover, $\varepsilon_t \sim \mathrm{IID}(0,1)$ and $\varepsilon_t$ is independent of $X_{t-k}$, $k \ge 1$, for all $t$.

GARCH processes can model heavy tails and volatility clustering, typical features of financial data. They do not, however, allow for modelling skewness or an asymmetric response to large negative and positive returns. In the past twenty years, the ARCH framework has grown to include new models that incorporate asymmetries, long-range dependence and other features present in financial data. GARCH processes and their variations are used in VaR estimation to give a more accurate estimate of the conditional standard deviation $\hat\sigma_t$ in Equation (2.2). The glossary of Bollerslev (2008) is a valuable source, providing a brief definition of each model created up to 2008 and the corresponding literature reference. We introduce some of the best-known models below, in alphabetical order.

ARCH-M

The ARCH-M (ARCH in mean) model was first proposed by Engle et al. (1987) and allows the variance to affect directly the expected returns on a portfolio: the conditional variance becomes a determinant of the current risk premium. The model is given by

$$X_t = g(\sigma_t^2, \theta) + \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \omega + \beta_1\{X_{t-1} - g(\sigma_{t-1}^2, \theta)\}^2,$$

where commonly $g(y, \theta) = \theta_0 + \theta_1 y$ (Fan and Gu, 2003).

ARMA-GARCH

Many authors (Meitz and Saikkonen, 2008, and references therein) model data with ARMA models with GARCH errors, which have become a powerful modelling tool.

Such models describe data that are conditionally dependent in both the mean and the variance. The ARMA-GARCH model is defined in Francq and Zakoïan (2004) as

$$\Phi(B)(X_t - \mu) = \Theta(B)\varepsilon_t,$$

where $\mu$ is the mean of the process, $B$ is the backward shift operator, $BX_t = X_{t-1}$, $\Phi(z) = 1 - \phi_1 z - \dots - \phi_P z^P$, $\Theta(z) = 1 + \theta_1 z + \dots + \theta_Q z^Q$, and $\varepsilon_t$ is a GARCH$(p, q)$ process as defined in Equation (2.4). Other papers on the topic are by Ling and McAleer (2003), Lange et al. (2006) and Ling (2007).

EGARCH

The exponential GARCH was developed by Nelson (1991). Not only the size but also the sign of the shock is important in determining the conditional variance: the model responds asymmetrically to random shocks, which allows the skewness of financial data to be modelled. The model is given by

$$X_t = \varepsilon_t\exp(h_t/2), \qquad h_t = \gamma_0 + \gamma_1 h_{t-1} + g(\varepsilon_{t-1}),$$

where $g(x) = \omega x + \lambda(|x| - E|x|)$.

FIGARCH

The fractionally integrated GARCH (FIGARCH or FIARCH) was introduced by Baillie (1996) and extends GARCH models to account for long memory, in a similar way to ARFIMA (autoregressive fractionally integrated moving average) processes.

For ARFIMA processes, see Granger and Joyeux (1980) and Hosking (1981). The FIGARCH model is given by

$$X_t = \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \omega + \{1 - \alpha(L) - (1-L)^d\beta(L)\}X_t^2 + \alpha(L)\sigma_t^2,$$

where $\beta(L) = \sum_{i=1}^{p}\beta_i L^i$, $\alpha(L) = \sum_{j=1}^{p}\alpha_j L^j$, $(1-L)^d$ is the fractional differencing operator, and $L$ is the backward shift operator, $LX_t = X_{t-1}$. The authors' aim was to propose a model more flexible than the traditional GARCH model.

IGARCH

Engle and Bollerslev (1986) considered the integrated GARCH (IGARCH). In the case of orders $p = q = 1$, they constrain $\alpha_1 + \beta_1 = 1$. This model has been mentioned in Section 2.2 because the EWMA recursion of RiskMetrics is an IGARCH(1,1) model with $\omega = 0$ (see also Bao et al., 2006); RiskMetrics can therefore be considered a special case of an IGARCH model.

2.7 Comparing techniques

In this chapter we have given a short introduction to VaR, its importance for banks, firms and investors, and the most common estimation techniques. However, the literature is much wider and many other methods have been developed. As usual, there is no single best method: depending on the characteristics of the data and on the scenario, a particular technique may perform better than the others. For this reason, we think it is important to illustrate some evaluation criteria that help in choosing the best technique. Three main characteristics should be considered when choosing a good model.

These are unbiasedness, the ability to forecast out-of-sample, and randomness.

Unbiasedness means that we expect the proportion of spillovers, i.e., of observations exceeding the VaR, to be close to the tail probability $\alpha$ chosen to estimate the VaR (e.g., 1% for a VaR at 99% confidence). Duffie and Pan (1997) suggested a simple test of the correctness of the model: under the null hypothesis, the fraction of exceedances converges to the chosen level. The test is very simple but does not work very well if the model is poor.

Secondly, following a suggestion of Poon and Granger (2003), the success of a model lies in its out-of-sample forecasting power. Given a series of returns $r_t$, $t = 1, \dots, n$, we split it into an in-sample part, $t = 1, \dots, t_1$, and an out-of-sample part, $t = t_1 + 1, \dots, n$. The model is estimated on the first part of the sample and tested by forecasting on the second part. We count the number of observations not exceeding the estimated VaR: the closer their proportion is to the confidence level, the better the method works.

Finally, even though we know that some spillovers will occur, they should occur randomly. If some kind of pattern drives the timing of the spillovers, then the model is not satisfactory and could be improved by taking this dependence into account.

Christoffersen (1998) developed three different tests: the first tests for correct coverage, the second for independence, and the third for both independence and coverage. They are all likelihood ratio tests, asymptotically distributed as $\chi^2$ random variables. For reasons that will be clear in Section 5.3, we report only the test for independence.

The independence test statistic is asymptotically distributed as a $\chi^2$ random variable with one degree of freedom and is given by

$$-2\ln\left[(1-p)^{n-t}p^{t}\right] + 2\ln\left[(1-\pi_{01})^{n_{00}}\pi_{01}^{n_{01}}(1-\pi_{11})^{n_{10}}\pi_{11}^{n_{11}}\right], \tag{2.5}$$

where $n$ is the sample size, $t$ is the number of spillovers, $p = t/n$ is the proportion of spillovers, $n_{ij}$ is the number of observations with value $i$ followed by $j$, $i, j = 0, 1$ (with $X_t = 1$ if a spillover occurs at time $t$ and $X_t = 0$ otherwise), and

$$\pi_{ij} = P(X_t = j \mid X_{t-1} = i) = \frac{n_{ij}}{\sum_j n_{ij}}$$

is the corresponding probability.

It is worth mentioning that some methods perform well at some levels but do not approximate the whole distribution very well. If we are interested in estimating VaR at the 99% confidence level, we want a method able to model the tails of the distribution accurately.

2.8 Modelling skewness

The first characteristic to be modelled in a series was the conditional mean, and the dominant method was the autoregressive moving average (ARMA) model. Later, Engle (1982) and Bollerslev (1986) shifted the focus to the problem of modelling conditional variance. Finally, in the past decade or so, it has been noticed that financial data are characterised by non-constant skewness, among other features, and the literature has been enriched with works describing it.

Skewness has been attributed to different factors. Firstly, it is driven by the different effects of good and bad news. In the presence of bad news, the market becomes more unstable and volatility increases.

Good news, by contrast, has a smaller impact. In particular, Lanne and Pentti (2007) showed that the maxim "no news is good news" does not really hold for financial data: small volatility is observed with a little good news, whereas no news has a negative influence. Chen et al. (2001) identify volatility feedback as one of the causes of skewness, i.e., good news is partially offset by an increase in the risk premium, whereas bad news and the risk premium move in the same direction. They also looked for additional explanations of the asymmetries observed in the market and identified two other causes besides volatility feedback: leverage effects and stochastic bubbles (see also Bekaert and Wu, 2000; Blanchard and Watson, 1982). They define stochastic bubbles as low-probability events producing large negative returns.

Very recently, time-varying skewness was introduced in financial models (Bali et al., 2008; Brännäs and Nordman, 2003; Harvey and Siddique, 1999, 2000; Hashmi and Tay, 2007; Lanne and Pentti, 2007). Different authors addressed the existence of skewness in conditional distributions. They included time-varying skewness in different ways, and most of them share the result that forecasts and VaR estimates are more accurate if a skewness parameter is included in the model.

Harvey and Siddique (1999) proposed two different models. In the first, they modelled the conditional second and third moments jointly using a non-central Student $t$-distribution with a shape parameter $\delta_t$, the asymmetry parameter, which they assumed to vary over time. In the second model, they tried to understand the relation between variance and skewness: they used a GARCH model with different responses to innovations according to the sign of the innovation, and found that conditional variance and innovations have an inverse relation.

Lanne and Pentti (2007) model the time-varying third moment through a GARCH-M model.

Their model uses a $z$ distribution, which is a variance-mean mixture of Normal distributions allowing for heavy tails and asymmetry. They noticed that the conditional second and third moments move together. Hashmi and Tay (2007) used an AR-GARCH model where the conditional variance has an asymmetric response to the sign of the innovation and the conditional distribution of the standardised residuals is a modified skewed Student $t$-distribution.

The existing literature highlights how important the identification and modelling of time-varying skewness are when forecasting VaR. A negative skewness implies a larger probability of large negative returns, and ignoring this feature of financial data can lead to underestimating the VaR, with all the consequences that follow. To get an idea of the size of the error we could incur, let us assume that we estimate VaR based on the Normal approximation when the true distribution is an exponential, reflected so that its long tail lies on the side of losses. After standardising both distributions to zero mean and unit variance, the VaR at 1% is −2.33 for the Normal distribution and −3.61 for the exponential distribution, and the VaR at 5% is −1.64 and −2.00, respectively. Harvey and Siddique (1999) observed that persistence in skewness can also affect the first two conditional moments, i.e., the conditional mean and the conditional variance.
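The Normal-versus-exponential comparison can be reproduced in a few lines. The sketch assumes that both distributions are standardised to zero mean and unit variance and that the exponential is reflected, $R = 1 - X$ with $X \sim \mathrm{Exp}(1)$, so that its long tail lies on the loss side; without the reflection the exponential's left tail would be bounded at $-1$. The function names are illustrative.

```python
from scipy.stats import expon, norm


def normal_var(alpha):
    """alpha-quantile of a standard Normal return."""
    return norm.ppf(alpha)


def reflected_exponential_var(alpha):
    """alpha-quantile of R = 1 - X with X ~ Exp(1).

    R has mean 0, variance 1 and a long left (loss) tail, so
    P(R <= r) = P(X >= 1 - r) and the quantile is 1 - F_X^{-1}(1 - alpha).
    """
    return 1.0 - expon.ppf(1.0 - alpha)


for a in (0.01, 0.05):
    print(f"{a:.0%}: Normal {normal_var(a):.2f}, exponential {reflected_exponential_var(a):.2f}")
# 1%: Normal -2.33, exponential -3.61
# 5%: Normal -1.64, exponential -2.00
```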


Chapter 3. Quantile Distributions.

In the previous chapter we introduced VaR and the most common methods to estimate it. We now dedicate a chapter to quantile distributions, a new method to estimate VaR. It is new in the sense that, to our knowledge, quantile distributions have not been used in financial applications until now. However, we believe that the great flexibility of these distributions could produce good results.

Rayner and MacGillivray (2002) identified a family of quantile distributions, i.e., distributions defined through their inverse distribution functions, most of which have no closed form for the cumulative distribution function. In this chapter we introduce two of them, the generalised lambda distributions with RS and FMKL parameterisations (named after the initials of the authors who first proposed the two parameterisations, Ramberg and Schmeiser, 1974, and Freimer et al., 1988, respectively). Both depend on four parameters, $\lambda = (\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, which allow the first four para-moments to be modelled.

3.1 Origins of generalised lambda distributions

Tukey's $\lambda$ distribution was first suggested in a work of Hasting et al. (1947). It depends on one parameter, $\lambda$, and is defined through the inverse of its distribution function,

$$x_p = F^{-1}(p) = \frac{p^\lambda - (1-p)^\lambda}{\lambda}, \tag{3.1}$$

where $p$ is uniformly distributed on $(0,1)$, $x_p$ indicates the quantile, and $\lambda \ne 0$ is a parameter to be estimated. Later, Ramberg and Schmeiser (1974) introduced a more general distribution than Tukey's version, known as the generalised lambda distribution (GLDrs), which depends on four parameters:

$$x_p = F^{-1}(p) = \lambda_1 + \frac{p^{\lambda_3} - (1-p)^{\lambda_4}}{\lambda_2}, \tag{3.2}$$

where $\lambda_1$ is the location parameter, $\lambda_2$ is the scale parameter, and the parameters $\lambda_3$ and $\lambda_4$ govern skewness and kurtosis (Fournier et al., 2007). Like its predecessor, the GLDrs can be expressed only through its inverse distribution function, since its distribution function has no closed form. Karian et al. (1996) found that Equation (3.2) generates a proper distribution if and only if

$$\frac{\lambda_2}{\lambda_3 p^{\lambda_3-1} + \lambda_4(1-p)^{\lambda_4-1}} \ge 0 \quad \text{for all } p \in [0,1]. \tag{3.3}$$

Figure 3.1 shows the regions where the GLDrs is defined. The curved boundary in the second quadrant is determined by the equality

$$\frac{(1-\lambda_3)^{1-\lambda_3}(\lambda_4-1)^{\lambda_4-1}}{(\lambda_4-\lambda_3)^{\lambda_4-\lambda_3}} = -\frac{\lambda_3}{\lambda_4}.$$

In the fourth quadrant, the roles of $\lambda_3$ and $\lambda_4$ in this equality are interchanged.

Figure 3.1: GLDrs regions in the $(\lambda_3, \lambda_4)$ plane, for values between −3 and 3, as defined in Karian et al. (1996); the regions where the distribution is not valid are marked. The distributions are bounded on the right in regions 1 and 5, bounded on the left in regions 2 and 6, bounded in region 3 and unbounded in region 4.

The support of the GLD can be finite ($\lambda_3, \lambda_4 > 0$), infinite ($\lambda_3, \lambda_4 < 0$) or half-infinite ($\lambda_3 < 0, \lambda_4 > 1$ or $\lambda_3 > 1, \lambda_4 < 0$), according to the values of the parameters $\lambda_3$ and $\lambda_4$. The distribution is symmetric whenever $\lambda_3 = \lambda_4$; unfortunately, there is no clear association between these parameters and the shape. When either $\lambda_3$ or $\lambda_4$ is equal to zero, the L-skewness $\gamma$ and the L-kurtosis $\kappa$ (the L-moment ratios of Section 3.5) can be written as functions of $\lambda_4$ and $\lambda_3$, respectively:

$$\lambda_3 = 0:\quad \gamma(\lambda_4) = \frac{1-\lambda_4}{\lambda_4+3}, \qquad \kappa(\lambda_4) = \frac{\lambda_4^2 - 3\lambda_4 + 2}{\lambda_4^2 + 7\lambda_4 + 12};$$

$$\lambda_4 = 0:\quad \gamma(\lambda_3) = \frac{\lambda_3-1}{\lambda_3+3}, \qquad \kappa(\lambda_3) = \frac{\lambda_3^2 - 3\lambda_3 + 2}{\lambda_3^2 + 7\lambda_3 + 12}.$$
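The quantile function (3.2) and condition (3.3) translate directly into code. In this sketch (the function names are ours), validity is checked through the sign of $dx_p/dp = (\lambda_3 p^{\lambda_3-1} + \lambda_4(1-p)^{\lambda_4-1})/\lambda_2$ on a fine grid of interior points, which is equivalent to (3.3) wherever both quantities are finite; a grid check is a numerical approximation, not an exact test of the region boundaries. Sampling then follows by inverse transform, $X = F^{-1}(U)$ with $U \sim U(0,1)$.

```python
import numpy as np


def rs_quantile(p, lam1, lam2, lam3, lam4):
    """GLDrs quantile function, Equation (3.2)."""
    p = np.asarray(p, dtype=float)
    return lam1 + (p**lam3 - (1.0 - p) ** lam4) / lam2


def rs_is_valid(lam2, lam3, lam4, n_grid=10_001):
    """Grid check of condition (3.3).

    (3.3) requires a non-negative density, i.e. a non-decreasing quantile
    function; the derivative of (3.2) is tested on interior grid points only.
    """
    p = np.linspace(0.0, 1.0, n_grid)[1:-1]
    dq = (lam3 * p ** (lam3 - 1.0) + lam4 * (1.0 - p) ** (lam4 - 1.0)) / lam2
    return bool(np.all(dq >= 0.0))


# Inverse-transform sampling from a valid GLDrs with support [-5, 5].
rng = np.random.default_rng(1)
sample = rs_quantile(rng.random(1000), 0.0, 0.2, 0.1, 0.5)
```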

As emphasised in the literature, the GLDrs is very flexible and its four parameters allow a wide variety of shapes (for more detail on the shape of the GLD, see King and MacGillivray, 1999). This flexibility comes at the cost of high complexity and difficulty in fitting. The GLDrs has been used in diverse fields: corrosion, meteorology, fatigue of materials, independent component analysis, statistical process control, simulation of queueing systems and random number generation (for the complete list of references, see Fournier et al., 2007).

The main difficulty in using the GLDrs concerns the limitation on the values of $\lambda_3$ and $\lambda_4$: not all combinations generate a valid distribution. For this reason, Freimer et al. (1988) proposed another distribution similar to Equation (3.2), given by

$$x_p = F^{-1}(p) = \lambda_1 + \frac{(p^{\lambda_3} - 1)/\lambda_3 - ((1-p)^{\lambda_4} - 1)/\lambda_4}{\lambda_2}, \tag{3.4}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ have the same interpretation as in the former parameterisation. For $\lambda_2 > 0$, this parameterisation is well defined for all values of $(\lambda_3, \lambda_4) \in \mathbb{R}^2$.

In the following we briefly describe some estimation methods: the method of moments, the starship method, the percentile method, the L-moments method and a method based on maximum likelihood. We do not describe the method proposed by Öztürk and Dale (1982), because it is not commonly used and most methods have been shown to outperform it.

The literature is rich with methods to fit the GLD. Among them, we cite the method of percentiles (Karian and Dudewicz, 1999), the method of L-moments (Karvanen and Nuutinen, 2008), numerical maximum likelihood (Su, 2007), the least squares method (Öztürk and Dale, 1986) and the starship method (King and MacGillivray, 1999).
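A sketch of the FMKL quantile function (3.4) follows (the names are ours). The limit $\lambda \to 0$ of $(u^\lambda - 1)/\lambda$ is $\log u$, so the code uses `np.expm1` for accuracy near zero and switches to the logarithm at exactly zero; with $\lambda_3 = \lambda_4 = 0$ the FMKL reduces to the logistic distribution, whose quantile function is $\log\{p/(1-p)\}$. The pair $(\lambda_3, \lambda_4) = (-0.5, 0.5)$ used below gives a valid FMKL distribution although it violates (3.3) in the RS parameterisation.

```python
import numpy as np


def _fmkl_term(u, lam):
    """(u**lam - 1) / lam, with its limit log(u) at lam = 0 (computed stably)."""
    log_u = np.log(u)
    if lam == 0.0:
        return log_u
    return np.expm1(lam * log_u) / lam


def fmkl_quantile(p, lam1, lam2, lam3, lam4):
    """FMKL quantile function, Equation (3.4); assumes lam2 > 0 and 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    return lam1 + (_fmkl_term(p, lam3) - _fmkl_term(1.0 - p, lam4)) / lam2


grid = np.linspace(0.001, 0.999, 999)
q_fmkl = fmkl_quantile(grid, 0.0, 1.0, -0.5, 0.5)  # increasing in p
```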

3.2 Moment-based methods

The method of moments was developed by Ramberg et al. (1979) for the RS parameterisation, and by Lakhany and Mausser (2000) for the FMKL parameterisation. It consists of calculating the first four sample moments from the data and matching them with the theoretical moments. A numerical procedure is needed to solve the resulting system of four equations. The method has two main drawbacks. Firstly, it does not cover all the possible solutions, because for some combinations of parameters not all moments exist: Ramberg et al. (1979) artificially restricted the search to two regions where moments up to the fourth exist, the first with $0 \le \lambda_3, \lambda_4 \le 1$ and the second with $-1/4 \le \lambda_3, \lambda_4 \le 0$. Secondly, distributions generated from different values of $\lambda_3$ and $\lambda_4$ can share the same third and fourth moments.

The first four moments, namely the mean, variance, skewness and kurtosis, are given by Karian and Dudewicz (2003) as

$$\begin{aligned}
\mu &= \lambda_1 + A/\lambda_2,\\
\sigma^2 &= (B - A^2)/\lambda_2^2,\\
\gamma &= (C - 3AB + 2A^3)/(\lambda_2^3\sigma^3),\\
\kappa &= (D - 4AC + 6A^2B - 3A^4)/(\lambda_2^4\sigma^4).
\end{aligned}$$

Here

$$\begin{aligned}
A &= \frac{1}{1+\lambda_3} - \frac{1}{1+\lambda_4},\\
B &= \frac{1}{1+2\lambda_3} + \frac{1}{1+2\lambda_4} - 2\beta(1+\lambda_3, 1+\lambda_4),\\
C &= \frac{1}{1+3\lambda_3} - \frac{1}{1+3\lambda_4} - 3\beta(1+2\lambda_3, 1+\lambda_4) + 3\beta(1+\lambda_3, 1+2\lambda_4),\\
D &= \frac{1}{1+4\lambda_3} + \frac{1}{1+4\lambda_4} - 4\beta(1+3\lambda_3, 1+\lambda_4) + 6\beta(1+2\lambda_3, 1+2\lambda_4) - 4\beta(1+\lambda_3, 1+3\lambda_4),
\end{aligned}$$

and $\beta(u, v)$ is the beta function,

$$\beta(u, v) = \int_0^1 x^{u-1}(1-x)^{v-1}\,dx \quad \text{for } u, v > 0. \tag{3.5}$$

Mykytka and Ramberg (1979) proposed, as an alternative, to use robust estimators of the sample moments. In this way only the existence of the first two moments is required. However, this method is not free from problems, and other methods have been preferred to it.

The method of moments for the FMKL parameterisation defined in Equation (3.4) requires the calculation of the theoretical moments, which can be found in the work of Lakhany and Mausser (2000).
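The RS moment formulas can be coded with `scipy.special.beta` and checked against direct numerical integration, since the moments of $X = F^{-1}(U)$ are integrals of powers of the quantile function over $(0,1)$. The sketch below (the names are ours) assumes $\lambda_3, \lambda_4 > -1/4$; the helper `numeric_moments` integrates the central moments of $F^{-1}(U)$ with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta


def rs_moments(lam1, lam2, lam3, lam4):
    """Mean, variance, skewness, kurtosis of the GLDrs (needs lam3, lam4 > -1/4)."""
    a = 1 / (1 + lam3) - 1 / (1 + lam4)
    b = 1 / (1 + 2 * lam3) + 1 / (1 + 2 * lam4) - 2 * beta(1 + lam3, 1 + lam4)
    c = (1 / (1 + 3 * lam3) - 1 / (1 + 3 * lam4)
         - 3 * beta(1 + 2 * lam3, 1 + lam4) + 3 * beta(1 + lam3, 1 + 2 * lam4))
    d = (1 / (1 + 4 * lam3) + 1 / (1 + 4 * lam4)
         - 4 * beta(1 + 3 * lam3, 1 + lam4) + 6 * beta(1 + 2 * lam3, 1 + 2 * lam4)
         - 4 * beta(1 + lam3, 1 + 3 * lam4))
    mean = lam1 + a / lam2
    var = (b - a**2) / lam2**2
    sd = np.sqrt(var)
    skew = (c - 3 * a * b + 2 * a**3) / (lam2**3 * sd**3)
    kurt = (d - 4 * a * c + 6 * a**2 * b - 3 * a**4) / (lam2**4 * sd**4)
    return mean, var, skew, kurt


def numeric_moments(qfun):
    """Cross-check: moments of X = Q(U), U ~ U(0, 1), by direct integration."""
    mean = quad(qfun, 0.0, 1.0)[0]
    c2, c3, c4 = (quad(lambda p, k=k: (qfun(p) - mean) ** k, 0.0, 1.0)[0] for k in (2, 3, 4))
    return mean, c2, c3 / c2**1.5, c4 / c2**2


params = (1.0, 2.0, 0.3, 0.2)
closed_form = rs_moments(*params)
numerical = numeric_moments(lambda p: params[0] + (p ** params[2] - (1 - p) ** params[3]) / params[1])
```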

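Firstly, we define the following quantities:

$$\begin{aligned}
v_1 &= \frac{1}{\lambda_3(\lambda_3+1)} - \frac{1}{\lambda_4(\lambda_4+1)},\\
v_2 &= \frac{1}{\lambda_3^2(2\lambda_3+1)} + \frac{1}{\lambda_4^2(2\lambda_4+1)} - \frac{2}{\lambda_3\lambda_4}\beta(\lambda_3+1, \lambda_4+1),\\
v_3 &= \frac{1}{\lambda_3^3(3\lambda_3+1)} - \frac{1}{\lambda_4^3(3\lambda_4+1)} - \frac{3}{\lambda_3^2\lambda_4}\beta(2\lambda_3+1, \lambda_4+1) + \frac{3}{\lambda_3\lambda_4^2}\beta(\lambda_3+1, 2\lambda_4+1),\\
v_4 &= \frac{1}{\lambda_3^4(4\lambda_3+1)} + \frac{1}{\lambda_4^4(4\lambda_4+1)} + \frac{6}{\lambda_3^2\lambda_4^2}\beta(2\lambda_3+1, 2\lambda_4+1) - \frac{4}{\lambda_3^3\lambda_4}\beta(3\lambda_3+1, \lambda_4+1) - \frac{4}{\lambda_3\lambda_4^3}\beta(\lambda_3+1, 3\lambda_4+1),
\end{aligned}$$

where $\beta(u, v)$ is the beta function defined in Equation (3.5). Given that $\min(\lambda_3, \lambda_4) > -1/k$ is required for the $k$-th moment to exist, the four moments are given by

$$\begin{aligned}
\mu &= \lambda_1 - \frac{1}{\lambda_2}\left(\frac{1}{\lambda_3+1} - \frac{1}{\lambda_4+1}\right),\\
\sigma^2 &= \frac{v_2 - v_1^2}{\lambda_2^2},\\
\gamma &= \frac{v_3 - 3v_1v_2 + 2v_1^3}{(v_2 - v_1^2)^{3/2}},\\
\kappa &= \frac{v_4 - 4v_1v_3 + 6v_1^2v_2 - 3v_1^4}{(v_2 - v_1^2)^2}.
\end{aligned}$$

To find the moment estimates, with $\hat\gamma$ and $\hat\kappa$ the sample skewness and kurtosis, we minimise the function

$$\min_{\lambda_3, \lambda_4}\sqrt{(\gamma - \hat\gamma)^2 + (\kappa - \hat\kappa)^2} \tag{3.6}$$

in the region $(-1/4, \infty) \times (-1/4, \infty)$, which guarantees the existence of the moments up to the fourth.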
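The same numerical cross-check works for the FMKL moments. In this sketch (the names are ours), $v_1, \dots, v_4$ are coded as written above, assuming $\lambda_3, \lambda_4 > -1/4$ and both non-zero, because the formulas divide by $\lambda_3$ and $\lambda_4$; the limiting cases $\lambda \to 0$ are not handled.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta


def fmkl_moments(lam1, lam2, lam3, lam4):
    """Mean, variance, skewness and kurtosis of the FMKL GLD via v_1, ..., v_4."""
    l3, l4 = lam3, lam4
    v1 = 1 / (l3 * (l3 + 1)) - 1 / (l4 * (l4 + 1))
    v2 = (1 / (l3**2 * (2 * l3 + 1)) + 1 / (l4**2 * (2 * l4 + 1))
          - 2 / (l3 * l4) * beta(l3 + 1, l4 + 1))
    v3 = (1 / (l3**3 * (3 * l3 + 1)) - 1 / (l4**3 * (3 * l4 + 1))
          - 3 / (l3**2 * l4) * beta(2 * l3 + 1, l4 + 1)
          + 3 / (l3 * l4**2) * beta(l3 + 1, 2 * l4 + 1))
    v4 = (1 / (l3**4 * (4 * l3 + 1)) + 1 / (l4**4 * (4 * l4 + 1))
          + 6 / (l3**2 * l4**2) * beta(2 * l3 + 1, 2 * l4 + 1)
          - 4 / (l3**3 * l4) * beta(3 * l3 + 1, l4 + 1)
          - 4 / (l3 * l4**3) * beta(l3 + 1, 3 * l4 + 1))
    mean = lam1 - (1 / (l3 + 1) - 1 / (l4 + 1)) / lam2
    s2 = v2 - v1**2
    skew = (v3 - 3 * v1 * v2 + 2 * v1**3) / s2**1.5
    kurt = (v4 - 4 * v1 * v3 + 6 * v1**2 * v2 - 3 * v1**4) / s2**2
    return mean, s2 / lam2**2, skew, kurt


def fmkl_q(p, l1, l2, l3, l4):
    """FMKL quantile function (3.4) for non-zero l3, l4."""
    return l1 + ((p**l3 - 1) / l3 - ((1 - p) ** l4 - 1) / l4) / l2


def numeric_moments(qfun):
    """Cross-check: moments of X = Q(U), U ~ U(0, 1), by direct integration."""
    mean = quad(qfun, 0.0, 1.0)[0]
    c2, c3, c4 = (quad(lambda p, k=k: (qfun(p) - mean) ** k, 0.0, 1.0)[0] for k in (2, 3, 4))
    return mean, c2, c3 / c2**1.5, c4 / c2**2
```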

The other two parameters are then calculated as

$$\hat\lambda_2 = \frac{\sqrt{v_2 - v_1^2}}{\hat\sigma}, \tag{3.7}$$

$$\hat\lambda_1 = \hat\mu + \frac{1}{\hat\lambda_2}\left(\frac{1}{\lambda_3+1} - \frac{1}{\lambda_4+1}\right), \tag{3.8}$$

where $\hat\mu$ and $\hat\sigma$ are the sample mean and sample standard deviation, respectively.

3.3 Percentile-based methods

The method of percentiles was introduced by Karian and Dudewicz (1999) as an alternative to the method of moments: the parameters are estimated from the percentiles of the observed distribution. As highlighted at the beginning of this chapter, the GLD is a very flexible distribution and can be an interesting tool to model fat tails. For some values of the parameters ($\lambda_3 < -1/4$ or $\lambda_4 < -1/4$), the GLD does not have the first four moments, so the method of moments is limited. To avoid this problem, Karian and Dudewicz (1999) estimated $\lambda$ through the percentiles. Even though $\lambda_3$ and $\lambda_4$ still have to be determined numerically, the equations are simpler and the estimates are more accurate.

They chose four statistics based on the quantiles: the median, $\rho_1$; an inter-percentile range, $\rho_2$; the left-right tail-weight ratio, $\rho_3$; and the tail-weight factor, $\rho_4$:

$$\rho_1 = F^{-1}(0.5) = \lambda_1 + \frac{0.5^{\lambda_3} - 0.5^{\lambda_4}}{\lambda_2}, \tag{3.9}$$

$$\rho_2 = F^{-1}(1-p) - F^{-1}(p) = \frac{(1-p)^{\lambda_3} - p^{\lambda_4} + (1-p)^{\lambda_4} - p^{\lambda_3}}{\lambda_2}, \tag{3.10}$$

$$\rho_3 = \frac{F^{-1}(0.5) - F^{-1}(p)}{F^{-1}(1-p) - F^{-1}(0.5)} = \frac{(1-p)^{\lambda_4} - p^{\lambda_3} + 0.5^{\lambda_3} - 0.5^{\lambda_4}}{(1-p)^{\lambda_3} - p^{\lambda_4} + 0.5^{\lambda_4} - 0.5^{\lambda_3}}, \tag{3.11}$$

$$\rho_4 = \frac{F^{-1}(0.75) - F^{-1}(0.25)}{\rho_2} = \frac{0.75^{\lambda_3} - 0.25^{\lambda_4} + 0.75^{\lambda_4} - 0.25^{\lambda_3}}{(1-p)^{\lambda_3} - p^{\lambda_4} + (1-p)^{\lambda_4} - p^{\lambda_3}}, \tag{3.12}$$

where $F^{-1}(\cdot)$ is the inverse distribution function defined in Equation (3.2).

Equating these quantile statistics to the corresponding sample statistics gives a system of four equations that has to be solved partly numerically. One of the main problems is choosing the value of $p$, with $0 < \frac{1}{n+1} \le p < 0.25$. Fournier et al. (2007) quantified the influence of $p$ on the estimates and their corresponding confidence intervals. They found that there is no best choice of $p$; rather, it depends on the scope of the analysis. However, very small values of $p$ result in large standard errors for $x_p$ and $x_{1-p}$, less accurate results and larger bias in estimating the first four moments. In the literature, $p = 0.1$ is commonly chosen, a value that allows samples as small as 10 observations while still providing a good tail-weight measure.

First, we need to find $\hat\lambda_3$ and $\hat\lambda_4$ numerically, given the sample statistics $\hat\rho_3$ and $\hat\rho_4$. Dudewicz and Karian (1999) compiled a wide tabulation of values of $\hat\lambda_3$ and $\hat\lambda_4$ for this purpose. Without using their table, it is possible to solve the corresponding equations numerically.
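The percentile statistics can be sketched as follows (the names are ours). `rs_rho` evaluates (3.11) and (3.12), which depend only on $\lambda_3$ and $\lambda_4$; `sample_rho` computes the sample analogues with NumPy's default quantile rule (an assumption, since the text does not fix one); and `rs_lambda12` inverts (3.10) and (3.9) for $\lambda_2$ and $\lambda_1$ once $\lambda_3$ and $\lambda_4$ are known. The default $p = 0.1$ follows the common choice mentioned above.

```python
import numpy as np


def rs_quantile(p, lam1, lam2, lam3, lam4):
    """GLDrs quantile function, Equation (3.2)."""
    p = np.asarray(p, dtype=float)
    return lam1 + (p**lam3 - (1.0 - p) ** lam4) / lam2


def rs_rho(lam3, lam4, p=0.1):
    """Theoretical rho_3 and rho_4, Equations (3.11) and (3.12)."""
    num3 = (1 - p) ** lam4 - p**lam3 + 0.5**lam3 - 0.5**lam4
    den3 = (1 - p) ** lam3 - p**lam4 + 0.5**lam4 - 0.5**lam3
    s = (1 - p) ** lam3 - p**lam4 + (1 - p) ** lam4 - p**lam3
    rho4 = (0.75**lam3 - 0.25**lam4 + 0.75**lam4 - 0.25**lam3) / s
    return num3 / den3, rho4


def sample_rho(x, p=0.1):
    """Sample versions of rho_1, ..., rho_4."""
    lo, med, hi = np.quantile(x, [p, 0.5, 1 - p])
    q25, q75 = np.quantile(x, [0.25, 0.75])
    return med, hi - lo, (med - lo) / (hi - med), (q75 - q25) / (hi - lo)


def rs_lambda12(rho1, rho2, lam3, lam4, p=0.1):
    """Given lam3 and lam4, solve (3.10) for lam2 and then (3.9) for lam1."""
    lam2 = ((1 - p) ** lam3 - p**lam4 + (1 - p) ** lam4 - p**lam3) / rho2
    lam1 = rho1 - (0.5**lam3 - 0.5**lam4) / lam2
    return lam1, lam2


rng = np.random.default_rng(3)
sample = rs_quantile(rng.random(200_000), 0.5, 2.0, 0.1, 0.4)
rho1_hat, rho2_hat, rho3_hat, rho4_hat = sample_rho(sample)
```

Once $\hat\lambda_3$ and $\hat\lambda_4$ have been found numerically from `rho3_hat` and `rho4_hat`, $\hat\lambda_1$ and $\hat\lambda_2$ follow from `rs_lambda12(rho1_hat, rho2_hat, ...)`.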

The equations $\rho_3 = \hat\rho_3$ and $\rho_4 = \hat\rho_4$ are solved by minimising the objective function

$$\min_{\lambda_3, \lambda_4}\sqrt{(\rho_3 - \hat\rho_3)^2 + (\rho_4 - \hat\rho_4)^2}. \tag{3.13}$$

The method is preferable to moment estimation; however, the solution might not be unique and depends on the starting values of the numerical minimisation. So far, there are no results on the asymptotic behaviour of the estimates.

3.4 Starship estimation

Percentile and moment-based methods are quite simple, but they require an a posteriori goodness-of-fit test to confirm the validity of the numerical solution. King and MacGillivray (1999) proposed a method incorporating a measure of goodness-of-fit. They adapted the starship method introduced by Owen (1988) to estimate the parameters of the generalised lambda distribution. The method is computationally intensive, since it seeks the best set of parameters by performing a grid search in four dimensions. For each set of values, the cumulative distribution function $F(x)$ has to be calculated numerically. The estimation procedure consists of three steps:

1. given a sample $X_1, \dots, X_n$ and a range of $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$, calculate numerically the cumulative distribution function $F(X)$;

2. choose a goodness-of-fit measure to test the closeness of $F(X)$ to a uniform $(0,1)$ distribution;

3. choose the combination of parameters $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ that minimises the goodness-of-fit measure of the previous step.
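The three steps can be sketched as a brute-force grid search (the names are ours). $F(X)$ is computed by bisection on the quantile function; the Kolmogorov-Smirnov statistic against $U(0,1)$ from `scipy.stats.kstest` is used as the goodness-of-fit measure, which is our assumption rather than necessarily the measure used by King and MacGillivray (1999); and, to avoid checking (3.3), the search is restricted to $\lambda_2 > 0$ and $\lambda_3, \lambda_4 \ge 0$, where the GLDrs is always valid. Real applications would use much finer grids, which is where the computational burden comes from.

```python
import itertools

import numpy as np
from scipy.stats import kstest


def rs_quantile(p, l1, l2, l3, l4):
    """GLDrs quantile function, Equation (3.2)."""
    return l1 + (p**l3 - (1.0 - p) ** l4) / l2


def rs_cdf(x, l1, l2, l3, l4, n_iter=60):
    """Step 1: F(x) by bisection on p in (0, 1), assuming an increasing quantile
    function; points outside the support end up with p close to 0 or 1."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.zeros_like(x), np.ones_like(x)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        below = rs_quantile(mid, l1, l2, l3, l4) < x
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def starship_grid(x, grid1, grid2, grid3, grid4):
    """Steps 2-3: minimise the KS distance of F(X) from U(0, 1) over the grid."""
    best, best_stat = None, np.inf
    for l1, l2, l3, l4 in itertools.product(grid1, grid2, grid3, grid4):
        if l2 <= 0 or l3 < 0 or l4 < 0:
            continue  # outside the always-valid region searched here
        stat = kstest(rs_cdf(x, l1, l2, l3, l4), "uniform").statistic
        if stat < best_stat:
            best, best_stat = (l1, l2, l3, l4), stat
    return best, best_stat


rng = np.random.default_rng(7)
data = rs_quantile(rng.random(2000), 0.0, 2.0, 0.1, 0.5)
params, stat = starship_grid(data, [-0.2, 0.0, 0.2], [1.0, 2.0, 3.0],
                             [0.1, 0.3, 0.5], [0.1, 0.3, 0.5])
```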

To restrict the search region, King and MacGillivray (1999) give some suggestions. They showed, by means of simulation, that the starship method is at least as good as the method of moments and the method proposed by Öztürk and Dale (1982), and often outperforms them. To overcome the computational burden, Fournier et al. (2007) suggest a method to build a more efficient grid: their strategy reduces the grid from a four-dimensional space to a two-dimensional space. Even in two dimensions the task is challenging, because we are trying to minimise a highly non-linear bivariate function with several minima.

3.5 Estimation based on L-moments

Hosking (1990) developed L-moments as an alternative to traditional moments. The only condition for the L-moments to exist is a finite mean, so they can characterise a wider range of distributions than traditional moments; nevertheless, they provide similar information about the distribution in terms of location, scale, skewness and kurtosis. Since L-moments are linear combinations of order statistics, they are less biased and less sensitive to outliers. Finally, estimation through L-moments can match or even outperform maximum-likelihood based estimation.

Unfortunately, in the case of the GLD, the L-moments exist if and only if $\lambda_3, \lambda_4 > -1$, so they do not cover all the possible combinations of parameters. Karvanen and Nuutinen (2008) provide the first four L-moments of the GLD.
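Sample L-moments are usually computed from probability-weighted moments $b_r$, which makes their dependence on the order statistics explicit. The sketch below (the name is ours) follows the unbiased estimator of Hosking (1990) and returns $\hat L_1$, $\hat L_2$ and the ratios $\hat\tau_3 = \hat L_3/\hat L_2$ and $\hat\tau_4 = \hat L_4/\hat L_2$, which are the sample statistics used to fit the GLD.

```python
import numpy as np
from scipy.special import comb


def sample_lmoments(x):
    """Unbiased sample L-moments via probability-weighted moments.

    b_r = n^{-1} sum_j [C(j-1, r) / C(n-1, r)] x_(j), with x_(1) <= ... <= x_(n);
    each l_k is a linear combination of the b_r, hence of the order statistics.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    j = np.arange(1, n + 1)
    b = [np.mean(comb(j - 1, r) / comb(n - 1, r) * xs) for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2  # L_1, L_2, tau_3, tau_4


rng = np.random.default_rng(0)
l1_hat, l2_hat, tau3_hat, tau4_hat = sample_lmoments(rng.exponential(size=200_000))
```

For an exponential sample, $\hat\tau_3$ and $\hat\tau_4$ should be close to the population values $1/3$ and $1/6$.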

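For the RS parameterisation, the first four L-moments of the GLD are

$$L_1 = \lambda_1 - \frac{1}{\lambda_2}\left(\frac{1}{1+\lambda_4} - \frac{1}{1+\lambda_3}\right) \tag{3.14}$$

$$L_2\lambda_2 = -\frac{1}{1+\lambda_3} + \frac{2}{2+\lambda_3} - \frac{1}{1+\lambda_4} + \frac{2}{2+\lambda_4} \tag{3.15}$$

$$L_3\lambda_2 = \frac{1}{1+\lambda_3} - \frac{6}{2+\lambda_3} + \frac{6}{3+\lambda_3} - \frac{1}{1+\lambda_4} + \frac{6}{2+\lambda_4} - \frac{6}{3+\lambda_4} \tag{3.16}$$

$$L_4\lambda_2 = -\frac{1}{1+\lambda_3} + \frac{12}{2+\lambda_3} - \frac{30}{3+\lambda_3} + \frac{20}{4+\lambda_3} - \frac{1}{1+\lambda_4} + \frac{12}{2+\lambda_4} - \frac{30}{3+\lambda_4} + \frac{20}{4+\lambda_4}. \tag{3.17}$$

Estimation of the GLD's parameters through L-moments follows the same rule as estimation through moments or percentiles.

1. Calculate the sample statistics $\hat L_1$, $\hat L_2$, $\hat\tau_3 = \hat L_3/\hat L_2$ and $\hat\tau_4 = \hat L_4/\hat L_2$.
2. Find $\hat\lambda_3$ and $\hat\lambda_4$ numerically by minimising an objective function of the form (3.13), with the L-moment ratios $\tau_3$ and $\tau_4$ in place of $\rho_3$ and $\rho_4$. This is possible because, by Equations (3.15) to (3.17), $\tau_3$ and $\tau_4$ depend on $\lambda_3$ and $\lambda_4$ only.
3. Finally, $\hat\lambda_1$ and $\hat\lambda_2$ can be found by solving Equations (3.14) and (3.15).

This method is computationally very efficient compared with the starship method and gives similar results in terms of variance and bias.

3.6 Maximum likelihood estimation

Equation (3.2) defines the distribution through its percentile function, and in general there is no closed form for the corresponding density as a function of $x$. However, with a little manipulation, Su (2007) derived a log-likelihood that can be maximised numerically. He used the chain rule to differentiate Equation (3.2) and obtained the log-likelihood function for the RS parameterisation

$$l(x_i;\lambda) = \sum_{i=1}^{n} \log\left(\frac{\lambda_2}{\lambda_3 p_i^{\lambda_3-1} + \lambda_4 (1-p_i)^{\lambda_4-1}}\right), \tag{3.18}$$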

where $p_i = F(x_i;\lambda)$ is the percentile level (probability) corresponding to $x_i$ for a given set of parameters $\lambda$, obtained numerically by solving Equation (3.2) for $p_i$. The log-likelihood function for the FMKL parameterisation is given by

$$l_{FMKL}(x_i;\lambda) = \sum_{i=1}^{n} \log\left(\frac{\lambda_2}{p_i^{\lambda_3-1} + (1-p_i)^{\lambda_4-1}}\right). \tag{3.19}$$

The procedure to maximise Equation (3.18) (or (3.19)) involves five steps.

1. Specify a range of values for $\lambda_3$ and $\lambda_4$. Experience suggests that a good range is $(-1.5, 1.5)$ for the RS GLD and $(-0.25, 1.5)$ for the FMKL GLD. Generate 1000 values for each parameter from a uniform distribution over the range specified above.
2. Calculate $\lambda_1$ and $\lambda_2$ for each pair $(\lambda_3, \lambda_4)$ using Equations (3.9) and (3.10) for the RS parameterisation and Equations (3.7) and (3.8) for the FMKL parameterisation. Remove the vectors that are not a legitimate parameterisation of the RS GLD (see Equation (3.3)) or whose support does not span the entire range of the data set.
3. Calculate numerically, through Equations (3.2) and (3.4), the percentiles for each legitimate vector from step 2, and select the set of values that minimises Equations (3.13) and (3.6).
4. Evaluate the log-likelihood given by Equations (3.18) and (3.19).
5. Obtain the final estimates with a numerical optimisation algorithm, e.g., the Nelder-Mead simplex.

Su (2007) showed by simulation that this is by far the best estimation method in terms of variance and is at least as good in terms of bias, its only drawback being the computational burden of the estimation.
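The RS log-likelihood (3.18) can be evaluated and maximised as in the minimal sketch below. It is a simplification of Su's procedure, not a reproduction of it: steps 1 to 3 (random initial values and percentile-based pre-selection) are replaced by user-supplied starting values, $p_i$ is obtained by bisection on $Q(p_i) = x_i$, legitimacy is checked numerically rather than through Equation (3.3), and parameter vectors whose support does not contain all the data receive a log-likelihood of $-\infty$. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rs_quantile(p, lam):
    """RS GLD percentile function Q(p) = l1 + (p**l3 - (1 - p)**l4) / l2."""
    l1, l2, l3, l4 = lam
    return l1 + (p ** l3 - (1 - p) ** l4) / l2


def rs_cdf(x, lam, tol=1e-12, max_iter=100):
    """p_i = F(x_i; lambda), obtained by bisection on Q(p) = x_i."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.zeros_like(x), np.ones_like(x)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        below = rs_quantile(mid, lam) < x
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
        if np.max(hi - lo) < tol:
            break
    return 0.5 * (lo + hi)


def rs_loglik(lam, x):
    """Log-likelihood (3.18); -inf for illegitimate lambda or data outside the support."""
    _, l2, l3, l4 = lam
    grid = np.linspace(0.001, 0.999, 999)
    if not np.all(np.diff(rs_quantile(grid, lam)) > 0):
        return -np.inf  # Q is not increasing: not a legitimate parameter vector
    x = np.asarray(x, dtype=float)
    p = rs_cdf(x, lam)
    if np.max(np.abs(rs_quantile(p, lam) - x)) > 1e-6 * (1 + np.max(np.abs(x))):
        return -np.inf  # at least one x_i lies outside the support
    dens = l2 / (l3 * p ** (l3 - 1) + l4 * (1 - p) ** (l4 - 1))
    if not np.all(np.isfinite(dens) & (dens > 0)):
        return -np.inf
    return float(np.sum(np.log(dens)))


def fit_rs_ml(x, start):
    """Step 5: Nelder-Mead maximisation of (3.18) from the given starting values."""
    res = minimize(lambda lam: -rs_loglik(lam, x), np.asarray(start, dtype=float),
                   method="Nelder-Mead", options={"maxiter": 4000})
    return res.x, -res.fun


# Usage: starting values would normally come from steps 1 to 3 (or from L-moments).
rng = np.random.default_rng(2)
x = rs_quantile(rng.uniform(size=400), (0.0, 0.2, 0.15, 0.15))
start = (0.05, 0.18, 0.12, 0.18)  # support [-5.5, 5.6] contains all the data
lam_hat, loglik_hat = fit_rs_ml(x, start)
```

The support check matters: without it, bisection would silently map out-of-range observations to $p_i \approx 0$ or $1$ and the likelihood would not penalise them.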

3.7 Conclusions

The past two chapters have been dedicated to a literature review of VaR and GLDs. The next chapters of this thesis try to answer the following research questions.

• When dealing with financial returns, it is important to have an idea of the underlying DGP in order to choose a good model. What is a good strategy for conducting exploratory analyses? We have seen that the estimation of VaR is still an open problem, even though a wide literature exists on the topic. Each scenario has its own characteristics, which are fundamental for choosing the best technique to estimate VaR. We believe that exploratory techniques are important for gaining insight into the underlying DGP. We try to answer this question in Chapter 4 through the use of local linear regression.

• Is time-varying skewness relevant in the estimation of VaR? A wide literature has tried to answer this question. The GLDs can model skewness, and also kurtosis, thanks to their four parameters. One of the most interesting aspects of these distributions is that they do not assume the existence of moments a priori, and this could open the way to a new approach to the estimation of time-varying moments.

• To our knowledge, GLDs have not yet been used to estimate VaR. Do these distributions improve the estimation of VaR? We believe that their adaptability could be beneficial to the estimation of VaR.

Chapter 4. Estimating Volatility with Local Linear Regression

In Chapter 2, we saw the important role played by volatility as a measure of risk and in the estimation of VaR. Volatility alone is not always enough to provide a precise forecast of VaR, and one of the main reasons is the presence of skewness in the data: the topic has been addressed in many papers in the last decade, with conflicting results. In particular, the existence of skewness is widely recognised, whereas it is still an open problem whether skewness is constant or varies over time (see Section 2.8 for more detail on skewness), given that the mean and variance are usually also modelled. In this chapter, we propose to deal with time-varying mean, variance and skewness using local linear regression. The models proposed until now are mostly parametric, whereas local linear regression is semi-parametric and does not require a model to be specified a priori. We argue that local linear regression could be useful for exploratory analyses, to give an idea of the underlying data generating process (DGP). A good exploratory analysis is fundamental for choosing, in a second step, the best model to fit the data.
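To illustrate the idea, a minimal local linear regression sketch follows. It is a generic kernel estimator, not necessarily the one used in this chapter: it assumes a Gaussian kernel, fixed bandwidths, the lagged return $r_{t-1}$ as the regressor, and a two-step estimate of the conditional variance obtained by smoothing squared residuals. At each point the estimate is the intercept of a weighted least-squares line, so no parametric form for the conditional mean is imposed. Function names and bandwidth values are hypothetical.

```python
import numpy as np


def local_linear(x, y, x0, h):
    """Local linear estimate of m(x0) = E[Y | X = x0], Gaussian kernel, bandwidth h."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    est = np.empty(x0.size)
    for k, point in enumerate(x0):
        d = x - point
        w = np.exp(-0.5 * (d / h) ** 2)  # observations near `point` get more weight
        # Weighted least squares of y on (1, d); the intercept estimates m(point).
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        est[k] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return est


def conditional_mean_and_variance(r, x0, h_mean, h_var):
    """Regress r_t on r_{t-1}, then smooth the squared residuals (two-step variance)."""
    r = np.asarray(r, dtype=float)
    x, y = r[:-1], r[1:]
    resid2 = (y - local_linear(x, y, x, h_mean)) ** 2
    return local_linear(x, y, x0, h_mean), local_linear(x, resid2, x0, h_var)


# Usage on simulated i.i.d. returns (real data would be, e.g., daily index returns).
rng = np.random.default_rng(3)
r = 0.01 * rng.standard_normal(1000)
mean_hat, var_hat = conditional_mean_and_variance(r, np.array([-0.01, 0.0, 0.01]), 0.01, 0.01)
```

Because the local fit is a straight line, the estimator reproduces any linear conditional mean exactly; note that smoothing squared residuals is not guaranteed to give positive variances where the data are sparse.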
