Extreme Value Theory with Applications in Quantitative Risk Management

Henrik Skaarup Andersen

David Sloth Pedersen

Master’s Thesis

Master of Science in Finance

Supervisor: David Skovmand

Department of Business Studies

2010

Aarhus School of Business Aarhus University


...or how we learned to stop worrying

and love ’em fat tails


Abstract

In this thesis we investigate extreme value theory and its potential in financial risk management. In the first part of the thesis, we provide a thorough and rigorous exposition of extreme value theory (EVT). We describe the theoretical foundation of the theory covering the fundamental theorems and results. In relation to this, we explicitly emphasize the statistical issues and limitations of the theory with applications in financial risk management in mind. Moreover, we discuss how the theory may be applied to financial data and the specific issues that may arise in such applications. Also, we approach the issue of working with multivariate risk factors using copula theory and discuss some copula results in multivariate extreme value theory.

In the second part of the thesis, we conduct an empirical study of the performance of EVT-based risk measurement methods based on an equally-weighted portfolio composed of three Danish stocks. The performance of the methods is evaluated by their ability to accurately estimate well-known risk measures such as Value at Risk (VaR) and Expected Shortfall (ES).

Treating the portfolio value as a single risk factor, we consider a univariate EVT-method, HS-CONDEVT, which combines GARCH-type modeling of volatility and fitting of a generalized Pareto distribution to the tails of the underlying distribution. The empirical results demonstrate that HS-CONDEVT outperforms alternative univariate risk measurement methods such as historical simulation (HS) and HS combined with a GARCH-type model assuming normally distributed innovations. Moreover, HS-CONDEVT is found to be a viable alternative to filtered HS and to HS combined with a GARCH-type model assuming t distributed innovations.

Treating the three stocks in the portfolio as risk factors, we consider a multivariate EVT-based method, MCONDEVT, which combines a copula with margins based on GARCH-type modeling and GPD fitting. MCONDEVT is implemented in three variants using three different copulas: a Gaussian, a t, and a Gumbel copula. Comparatively, we find that the variants of the MCONDEVT method outperform other multivariate methods such as variance-covariance (VC), VC combined with a multivariate EWMA model, multivariate GARCH based on a constant conditional correlation structure, and multivariate GARCH based on a dynamic conditional correlation structure.

Finally, comparing the performance of the univariate and multivariate methods altogether, we find that the implemented variants of the MCONDEVT method are among the top performing methods. In particular, MCONDEVT based on a t copula appears to have the best overall performance among the competing methods.


List of Tables

4.1 Descriptive statistics for the portfolio losses
4.2 Parameter estimates and descriptive statistics for the standardized residuals
4.3 1-day VaR-based backtest results: Right tail
4.4 1-day VaR-based backtest results: Left tail
4.5 10-day VaR-based backtest results
4.6 1-day ES-based backtest results
4.8 Sensitivity analysis: VaR-based backtest results
4.9 Sensitivity analysis: ES-based backtest result
4.10 Descriptive statistics for the risk factor return series
4.11 Parameter estimates and descriptive statistics for the standardized residuals
4.12 1-day VaR-based backtest results: Right tail
4.13 1-day VaR-based backtest results: Left tail
4.14 10-day VaR-based backtest results
4.17 Sensitivity analysis: VaR-based backtest results
4.18 Sensitivity analysis: ES-based backtest result
B.1 Information criteria values for the fitted dynamic models
B.2 1-day VaR-based backtest results: Right tail
B.3 1-day VaR-based backtest results: Left tail
B.4 1-day ES-based backtest results
B.5 10-day VaR-based backtest results


List of Figures

4.1 Time series of portfolio losses.
4.2 Correlograms for the in-sample raw portfolio losses (A) and their squared values (B) as well as for the total sample raw portfolio losses (C) and their squared values (D).
4.3 Information criteria values for the fitted models. AR1 denotes a first order autoregressive mean specification, t and n denote the distribution assumption, and GPQ denotes a GARCH-type variance specification with P and Q order numbers.
4.4 Correlograms for the in-sample raw standardized residuals (A) and their squared values (B) as well as for the total sample raw standardized residuals (C) and their squared values (D) extracted from the AR(1)-GJR(1,1) model fitted with the QML estimation method.
4.5 Correlograms for the in-sample raw standardized residuals (A) and their squared values (B) as well as for the total sample raw standardized residuals (C) and their squared values (D) extracted from the AR(1)-GJR(1,1) model fitted with ML under the assumption of t distributed innovations.
4.6 QQ-plots for the in-sample standardized residuals versus a normal distribution (A) and a t distribution (B) as well as for the total sample standardized residuals versus a normal distribution (C) and a t distribution (D).
4.7 1-day $\mathrm{VaR}_{\alpha=99\%}$ estimates plotted against the portfolio losses.
4.8 10-day $\mathrm{VaR}_{\alpha=99\%}$ estimates plotted against the 10-day portfolio losses.
4.9 Time series of risk-factor returns. These are log-returns on (A) NOVO B, (B) CARLS B, and (C) DANSKE.
4.10 1-day $\mathrm{VaR}_{\alpha=99\%}$ estimates plotted against the portfolio losses.
4.11 10-day $\mathrm{VaR}_{\alpha=99\%}$ estimates plotted against the 10-day


Chapter 1

Introduction

“In the last fifty years, the ten most extreme days in the financial markets represent half the returns.” [Taleb, 2007, p. 275].

Lately, the financial crisis has exposed major shortcomings of the traditional risk assessment methodologies in terms of capturing the risk of rare but damaging events, which has made the search for better approaches to risk modeling and measurement more crucial than ever. The above quote by Taleb from his famous book The Black Swan captures the essence of what we are up against.

By its very nature, the risk of extreme events (e.g. very large losses) is related to the tails of the distribution of the underlying data generating process. Thus, a crucial challenge in getting good risk measure estimates is to be able to estimate the tail of the underlying distribution as accurately as possible.

Since the pioneering works of Mandelbrot [1963] and Fama [1965], several studies have documented that financial return series have more mass in the tail areas than what would be predicted by a normal distribution. In other words, the return distributions have fat tails causing the probability of extreme values to be higher than in a normal distribution. To capture this phenomenon, early studies tried to model the distribution of financial returns using stable distributions like the Cauchy distribution. However, because financial theory almost always requires finite second moments of returns, and often higher moments as well, these distributions have lost their popularity [Campbell et al., 1997]. Instead, more recent studies have resorted to some kind of mixture distribution, e.g. the Normal-Inverse-Gaussian or the Variance-Gamma distributions, which are more tractable as moments of all orders exist.

For the purpose of measuring financial risk, however, our practical interest is concentrated on the tails. So, instead of forcing a single distribution for the entire return series, one might just investigate the tails of the returns using some kind of limit laws. This is where extreme value theory may become the star of the show by providing statistically well-grounded results on the limiting behavior of the underlying distribution.

Pioneered by Fisher and Tippett [1928], Gnedenko [1943], and Gumbel [1958] and later by Balkema and de Haan [1974] and Pickands [1975], extreme value theory has been around for quite some time as a discipline within probability and statistics. Applications of the theory have since appeared in diverse areas such as hydrology or wind engineering. Only recently, though, has extreme value theory seen the light of day within the realms of finance, the first comprehensive volume on the theory completely devoted to finance and insurance being Embrechts et al. [1997]. However, since its introduction to finance, the body of research on financial applications of extreme value theory has grown considerably.

With respect to tail estimation and risk measurement, two crucial properties make extreme value theory particularly attractive. First, it is based on well-established and sound statistical theory. Secondly, it offers a parametric form for the tail allowing us to model rare and extreme phenomena that lie outside the range of available observations. Thus, extreme value theory may provide the means to obtain more accurate risk measure estimates that are true to the extremal or fat-tailed behavior of the underlying distribution. In this thesis, we wish to investigate this possibility further.

1.1 Purpose and Research Questions

The purpose of this thesis is to investigate extreme value theory and its potential in financial risk management. We will provide a thorough and rigorous exposition of the theoretical foundation of extreme value theory. In relation to this, we will explicitly emphasize the statistical issues and limitations of the theory with applications in financial risk management in mind. Moreover, we will discuss how the theory may be applied to financial data and the specific issues that may arise in such applications.

Using return data on three Danish stocks, we conduct an empirical study of the performance of risk measurement methods based on extreme value theory. Compared to the performance of various alternative methods for risk measurement, the methods are evaluated by their ability to accurately estimate well-known risk measures such as Value at Risk (VaR) and Expected Shortfall (ES). Furthermore, most studies on extreme value theory in finance have focused solely on a univariate setting where only one risk factor is accounted for; this thesis will also investigate the performance of extreme value theory based methods in a more realistic, multivariate setting where we deal with more than one risk factor.

In conclusion, the thesis seeks to provide answers to the following three research questions:

1. What is the theoretical foundation of extreme value theory?

2. How can extreme value theory be applied to financial risk measurement and what kind of issues arise?

3. Compared to alternative risk measurement methods, how do methods based on extreme value theory perform with respect to estimating Value at Risk and Expected Shortfall?

1.2 Delimitations

In our discussions of financial risk measurement as well as in the empirical study, we concentrate on market risk. Market risk is the risk of a movement in the value of a financial position due to changes in the value of the underlying components on which the position depends, such as stock, bond, and commodity prices, exchange rates, etc. However, banks and other financial institutions are exposed to other categories of risks. One such category is credit risk, which is the risk of not receiving promised repayments on outstanding investments such as loans and bonds, because of the default of the borrower. Another category is operational risk, the risk of losses resulting from inadequate or failed internal processes, people and systems, or from external events [McNeil et al., 2005]. Even though we consider these kinds of risks equally important, they will not be covered further in this thesis. Furthermore, in the empirical study, we only consider the market risk associated with a financial position in stocks, specifically three stocks listed on the Danish C20 index (OMXC20).

We also refrain from discussing or applying risk measures other than Value at Risk and Expected Shortfall. We acknowledge that other risk measures may have merit. However, since Value at Risk was sanctioned by the Basel Committee in 1996 for market risk capital requirements, it has become the standard measure for financial market risk [Wong, 2007]. Expected Shortfall is closely related to Value at Risk but so far less used in practice. Nonetheless, it addresses some of the deficiencies of Value at Risk that critics have pointed out. For these reasons, we concentrate on these two measures of risk.

Further delimitations will be made throughout the thesis when appropriate.


1.3 Structure

The main part of the thesis is structured in three chapters covering the theoretical framework of the thesis, the methodology of the empirical study, and ultimately a presentation of the empirical results. This is followed by a short chapter on the implications and limitations of our study, leading up to a conclusion in the final chapter.

Thus, the structure of the thesis is as follows:

• Chapter 2 − The chapter presents the theoretical framework of the thesis. We first review some essential concepts and methods within quantitative risk management. Following this, we turn to the primary topic of the thesis, extreme value theory. We provide a thorough exposition of the theory and its main results with applications in financial risk management in mind. Finally, we approach the issue of working with multivariate risk factors using copula theory and discuss some copula results in multivariate extreme value theory.

• Chapter 3 − The chapter outlines the methodology of the empirical study. We first discuss the selection of financial data for our empirical investigation. We then turn to the selection and implementation of the different risk measurement methods, giving special emphasis to the statistical issues involved. Lastly, we describe the methodology used for backtesting and performance evaluation of the implemented risk measurement methods.

• Chapter 4 − The chapter presents the results of our empirical study. We evaluate the relative performance of the risk measurement methods, those based on extreme value theory as well as alternative methods, with respect to estimating Value at Risk and Expected Shortfall. To this end, we use both statistical tests and more qualitative assessments.

• Chapter 5 − The chapter discusses the implications and limitations of our study and main results. In this connection, ideas for further research within extreme value theory and its applications in financial risk management are proposed.

• Chapter 6 − The final chapter summarizes our main findings and concludes on our study.


Chapter 2

Theoretical Framework

In this chapter we present the theoretical framework of the thesis. The concepts, theories, and results discussed in the following sections constitute the foundation of the empirical study. In Section 2.1, we discuss some essential concepts and methods within quantitative risk modeling and measurement, which will be used throughout the thesis. After this, we turn to the primary topic of the thesis in Section 2.2, namely extreme value theory, giving a thorough and rigorous account and discussion of the theory and its central results with applications in financial risk management in mind. Finally, in Section 2.3, we approach the issue of working with multivariate risk factors using copula theory and we discuss some copula results in multivariate extreme value theory.

2.1 Quantitative Risk Modeling and Measurement: Essential Concepts and Methods

In this section we introduce a series of concepts and definitions within the discipline of quantitative finance, which will be used throughout the thesis. We start by describing some general empirical properties of financial return series data in Section 2.1.1. This is followed by a discussion of the concept of financial loss distributions and risk factors in Section 2.1.2. Here we especially dwell on the difference between unconditional and conditional loss distributions. In Section 2.1.3 we discuss well-known measures of financial risk such as Value at Risk and Expected Shortfall. Finally, we outline the three main quantitative methods for modeling financial risk and their limitations in Section 2.1.4.

2.1.1 Empirical Properties of Financial Return Data

Since stock prices are mostly non-stationary (usually, integrated of order 1), it is common to model relative changes of prices, i.e. the log-return series [Cont, 2001]. In this section, we give an overview of some typical properties of daily financial return data, which have become known as stylized facts. These properties often also extend to series with both longer (weekly or monthly) and shorter (intra-day) time intervals [McNeil et al., 2005].

Financial return series are not independently and identically distributed (iid). They tend to exhibit temporal dependence in the second moment. In other words, while return series seem to show little serial correlation, absolute or squared returns seem to be highly serially correlated, causing time-varying volatility and volatility clustering. Volatility clustering is the tendency for large returns (of either sign) to be followed by more large returns (of either sign) [Campbell et al., 1997, McNeil et al., 2005]. Also, Black [1976] found that negative innovations to stock returns tend to increase volatility more than positive innovations of similar magnitudes. This phenomenon has become known as the leverage effect.
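The contrast between nearly uncorrelated returns and serially correlated squared returns can be illustrated with a small simulation. The sketch below is not part of the empirical study: it assumes a GARCH(1,1) process with normal innovations, and the parameter values (`omega`, `alpha`, `beta`) and function names are hypothetical choices made only for illustration.

```python
import random

def simulate_garch(n, omega=0.05, alpha=0.10, beta=0.85, seed=1):
    """Simulate n returns from a GARCH(1,1) process with normal innovations."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = (var ** 0.5) * rng.gauss(0.0, 1.0)
        returns.append(r)
        var = omega + alpha * r * r + beta * var  # variance for the next period
    return returns

def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return num / denom

returns = simulate_garch(5000)
squared = [r * r for r in returns]
acf_returns = autocorr(returns, 1)
acf_squared = autocorr(squared, 1)
print(f"lag-1 ACF of returns:         {acf_returns:.3f}")
print(f"lag-1 ACF of squared returns: {acf_squared:.3f}")
```

Under these assumptions the raw returns should show a lag-1 autocorrelation close to zero, while the squared returns should show a clearly positive one, which is the signature of volatility clustering.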

Furthermore, Fama [1965] found that financial returns appear to have heavy-tailed or leptokurtic distributions. Compared to the normal or Gaussian distribution, return distributions tend to exhibit excess kurtosis (i.e. kurtosis larger than 3) indicating that returns have more mass in the tail areas than predicted by the normal distribution [Campbell et al., 1997]. In mathematical terms, the tails seem to display a slow, power-law type of decay different from the faster, exponential type of decay displayed by the normal distribution [Cont, 2001].

When we deal with multivariate return series we have similar stylized facts. While multivariate return series show little evidence of cross correlation (except for contemporaneous returns), absolute returns of such series tend to exhibit cross correlation. In addition, the contemporaneous correlation between returns appears to be time varying. Also, multivariate return series tend to exhibit tail or extremal dependence, i.e. extreme returns in different return series often tend to coincide. Moreover, extremal dependence seems to be asymmetric; joint negative returns show more tail dependence than joint positive returns [McNeil et al., 2005]. The last two stylized facts correspond to the phenomenon that correlations observed in calm periods differ from correlations observed during financial turmoil.

2.1.2 Risk Factors and Loss Distribution

Consider a portfolio of financial assets and let $V_t$ denote its current value. The portfolio value is assumed to be observable at time $t$. The portfolio loss over the time interval from $t$ to $t+1$ is written as

$$L_{t+1} = -(V_{t+1} - V_t). \tag{2.1}$$

Because $V_{t+1}$ is unknown to us, $L_{t+1}$ is random from the perspective of time $t$. The distribution of $L_{t+1}$ will be referred to as the loss distribution. Note that the definition of a loss presented here implicitly assumes that the portfolio composition is constant over the considered time interval. The portfolio value $V_t$ will be modeled by a function of time and a set of $d$ underlying risk factors. We write

$$V_t = f(t, \mathbf{Z}_t), \tag{2.2}$$

for some measurable function $f : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$, where $\mathbf{Z}_t = (Z_{t,1}, \ldots, Z_{t,d})'$ denotes a $d$-dimensional vector of risk factors. We define the time series process of risk factor changes $\{\mathbf{X}_t\}_{t \in \mathbb{N}}$, where $\mathbf{X}_t := \mathbf{Z}_t - \mathbf{Z}_{t-1}$. Using the function $f$ we can relate the risk factor changes to the changes in the portfolio value as

$$L_{t+1} = -\bigl(f(t+1, \mathbf{Z}_t + \mathbf{X}_{t+1}) - f(t, \mathbf{Z}_t)\bigr). \tag{2.3}$$

Given realizations $\mathbf{z}_t$ of $\mathbf{Z}_t$, we define the loss operator $l_{[t]} : \mathbb{R}^d \to \mathbb{R}$ at time $t$ as

$$l_{[t]}(\mathbf{x}) := -\bigl(f(t+1, \mathbf{z}_t + \mathbf{x}) - f(t, \mathbf{z}_t)\bigr), \quad \forall \mathbf{x} \in \mathbb{R}^d, \tag{2.4}$$

and we can write $L_{t+1} = l_{[t]}(\mathbf{X}_{t+1})$ as shorthand notation for the portfolio loss.

In practice, it is often convenient to work with the so-called delta approximation. Assuming that the mapping $f$ is differentiable, we may use a first-order approximation of the loss operator instead of the true operator. We define the linearized loss operator as

$$l^{\Delta}_{[t]}(\mathbf{x}) := -\left( f_t(t, \mathbf{z}_t) + \sum_{i=1}^{d} f_{z_i}(t, \mathbf{z}_t)\, x_i \right), \tag{2.5}$$

where the terms $f_t$ and $f_{z_i}$ are the partial derivatives of $f$ with respect to time and risk factor $i$. The linear approximation makes the problem of modeling $l_{[t]}$ simpler to handle analytically by representing it as a linear function of the risk-factor changes. The quality of the approximation is influenced by the length of the time interval and the size of the second order derivatives. It works best for short time horizons and if the portfolio value is approximately linear in the risk factors.
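To see how the quality of the delta approximation depends on the size of the risk-factor changes, the following sketch compares the full and the linearized loss operator for a simple stock portfolio. It rests on assumptions not stated in the text: the risk factors are taken to be log-prices, so that $f(t, \mathbf{z}) = \sum_i n_i e^{z_i}$ with no explicit time dependence ($f_t = 0$), and the holdings, prices, and function names are hypothetical.

```python
import math

def portfolio_value(z, shares):
    """V = f(z): value of a stock portfolio whose risk factors are log-prices."""
    return sum(n * math.exp(zi) for n, zi in zip(shares, z))

def loss(x, z, shares):
    """Full loss operator l_[t](x) = -(f(z + x) - f(z))."""
    shifted = [zi + xi for zi, xi in zip(z, x)]
    return -(portfolio_value(shifted, shares) - portfolio_value(z, shares))

def linearized_loss(x, z, shares):
    """Delta approximation: f_t = 0 here and f_{z_i} = n_i * exp(z_i)."""
    return -sum(n * math.exp(zi) * xi for n, zi, xi in zip(shares, z, x))

shares = [10, 5, 20]                               # hypothetical holdings
z = [math.log(p) for p in (100.0, 250.0, 40.0)]    # hypothetical log-prices

small = [-0.01, 0.005, -0.02]   # small risk-factor changes
large = [-0.10, 0.05, -0.20]    # ten times larger changes
for x in (small, large):
    full, lin = loss(x, z, shares), linearized_loss(x, z, shares)
    print(f"full loss {full:8.2f}   linearized {lin:8.2f}   gap {lin - full:6.2f}")
```

In this setup the gap between the two operators grows roughly with the square of the risk-factor changes, which is why the approximation is best suited to short horizons.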

As $\mathbf{Z}_t$ is observable from the perspective of time $t$, the loss distribution is determined entirely by the distribution of $\mathbf{X}_{t+1}$. If we assume that $\{\mathbf{X}_t\}_{t \in \mathbb{N}}$ follows a stationary time series, we have to make a distinction between the conditional and unconditional loss distribution. If we assume instead that $\{\mathbf{X}_t\}_{t \in \mathbb{N}}$ is an iid series, the two distributions coincide. Let $\mathcal{F}_t = \sigma(\{\mathbf{X}_s : s \le t\})$ be the Borel σ-field representing all information on the risk factor developments up to the present. This leads us to the formal definitions below.

Definition 1 (Unconditional Loss Distribution). The unconditional loss distribution $F_{L_{t+1}}$ is the distribution of $l_{[t]}(\cdot)$ under the stationary distribution $F_{\mathbf{X}}$, where $F_{\mathbf{X}}$ denotes the unconditional distribution of $\mathbf{X}$ assuming stationarity.

Definition 2 (Conditional Loss Distribution). The conditional loss distribution $F_{L_{t+1} \mid \mathcal{F}_t}$ is the distribution of $l_{[t]}(\cdot)$ under $F_{\mathbf{X}_{t+1} \mid \mathcal{F}_t}$, where $F_{\mathbf{X}_{t+1} \mid \mathcal{F}_t}$ denotes the conditional distribution of $\mathbf{X}_{t+1}$ given $\mathcal{F}_t$.

Conditional risk measurement focuses on modeling the dynamics of $\{\mathbf{X}_t\}_{t \in \mathbb{N}}$ in order to make risk forecasts. If we do not model any dynamics, we basically assume that $\mathbf{X}$ forms a stationary time series with a stationary distribution $F_{\mathbf{X}}$ on $\mathbb{R}^d$. We will mainly consider conditional risk measurement methods as they appear most suitable for market risk management and shorter time intervals. The worries of market risk managers center on the possible size of the short-term (e.g. one-day or two-week) loss caused by unfavorable shifts in market values. Thus, they are concerned about the tails of the conditional loss distribution, given the current volatility background [McNeil and Frey, 2000]. The unconditional loss distribution is more relevant when interest centers on possible worst case scenarios over longer periods (e.g. one year or more) and is more frequently used in credit risk management [McNeil et al., 2005].

2.1.3 Risk Measures

In this section we discuss statistical summaries of the loss distribution that quantify the portfolio risk. We call these summaries risk measures. First, we introduce the so-called axioms of coherence, which are properties deemed desirable for measures of risk. Hereafter, we discuss two widely used measures of financial risk: Value at Risk and Expected Shortfall. Both risk measures consider only the downside risk, i.e. the right tail of the loss distribution.

Artzner et al. [1999] argue that a good measure of risk should satisfy a set of properties termed the axioms of coherence. Let financial risks be represented by a set $\mathcal{M}$ interpreted here as portfolio losses, i.e. $L \in \mathcal{M}$. Risk measures are real-valued functions $\varrho : \mathcal{M} \to \mathbb{R}$. The amount $\varrho(L)$ represents the capital required to cover a position facing a loss $L$. The risk measure $\varrho$ is coherent if it satisfies the following four axioms:

1. Monotonicity: $L_1 \le L_2 \implies \varrho(L_1) \le \varrho(L_2)$.

2. Positive homogeneity: $\varrho(\lambda L) = \lambda \varrho(L), \ \forall \lambda > 0$.

3. Translation invariance: $\varrho(L + l) = \varrho(L) + l, \ \forall l \in \mathbb{R}$.

4. Subadditivity: $\varrho(L_1 + L_2) \le \varrho(L_1) + \varrho(L_2)$.

Monotonicity states that positions which lead to higher losses in every state of the world require more risk capital. Positive homogeneity implies that the capital required to cover a position is proportional to the size of that position. Translation invariance states that if a deterministic amount $l$ is added to the position, the capital needed to cover $L$ is changed by precisely that amount. Subadditivity reflects the intuitive property that risk should be reduced or at least not increased by diversification, i.e. the amount of capital needed to cover two combined portfolios should not be greater than the capital needed to cover the portfolios evaluated separately.

In the following discussion of Value at Risk and Expected Shortfall, we put aside the distinction between $l_{[t]}$ and $l^{\Delta}_{[t]}$ and also between unconditional and conditional loss distributions, assuming that the choice of focus has been made from the outset of the analysis. Also, we denote the distribution function of the loss $L := L_{t+1}$ by $F_L$, so that $F_L(x) = P(L \le x), \ \forall x \in \mathbb{R}$.

Value at Risk

Value at Risk (VaR) is the maximum loss over a given period that is not exceeded with a high probability. We begin with a formal definition of the concept.

Definition 3 (Value at Risk). The Value at Risk (VaR) at confidence level $\alpha \in (0,1)$ is defined as the smallest value $x$ such that the probability of $L$ exceeding $x$ is no larger than $(1-\alpha)$:

$$\mathrm{VaR}_\alpha := \inf\{x \in \mathbb{R} : P(L > x) \le 1-\alpha\} = \inf\{x \in \mathbb{R} : F_L(x) \ge \alpha\}. \tag{2.6}$$

Using the concepts of generalized inverse and quantile functions given in Definition 4, it is clear that VaR is simply the quantile of the loss distribution $F_L$. Consequently, (2.6) can be written as

$$\mathrm{VaR}_\alpha := q_\alpha(F_L) = F_L^{\leftarrow}(\alpha). \tag{2.7}$$

Definition 4 (Generalized Inverse and Quantile Function).

1. Given some increasing function $F : \mathbb{R} \to \mathbb{R}$, the generalized inverse of $F$ is defined as $F^{\leftarrow}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}$, where we set $\inf\{\emptyset\} = \infty$.

2. At any confidence level $\alpha \in (0,1)$, the quantile of a distribution function $F$ is defined as $q_\alpha(F) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\} = F^{\leftarrow}(\alpha)$.

$\mathrm{VaR}_\alpha$ has been adopted into the regulatory Basel framework for banks as the major determinant of the risk capital required for covering potential losses arising from market risks [Basel Committee on Banking Supervision, 2004]. A major advantage of $\mathrm{VaR}_\alpha$ is that it does not depend on a specific kind of distribution and therefore, in theory, can be applied to any kind of financial asset [Danielsson, 2007]. In addition, $\mathrm{VaR}_\alpha$ is intuitively appealing because of its ability to describe the financial risk of a portfolio in a single figure. Its simplicity makes it an attractive risk measure because, compared to other risk measures, it is easily comprehended and communicated to those without quantitative training.


However, by definition $\mathrm{VaR}_\alpha$ gives no information about the size of the losses which occur with probability smaller than $1-\alpha$, i.e. the measure does not tell how bad it gets if things go wrong.

Moreover, Artzner et al. [1999] make the observation that $\mathrm{VaR}_\alpha$ does not satisfy the axiom of subadditivity in all cases, implying that the $\mathrm{VaR}_\alpha$ of a portfolio is not necessarily bounded above by the sum of the $\mathrm{VaR}_\alpha$ of the individual portfolio components.¹ This is very unfortunate as non-subadditive risk measures can lead to misleading conclusions and wrong incentives, e.g. to avoid portfolio diversification and to split entire companies up into separate legal entities to reduce regulatory capital requirements. This conceptual deficiency has led to much debate and criticism of $\mathrm{VaR}_\alpha$ as a risk measure.

Given these problems with $\mathrm{VaR}_\alpha$, we seek an alternative measure which satisfies the axioms of coherence.
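A small numerical example makes the failure of subadditivity concrete. The sketch below is an illustration under hypothetical assumptions rather than a result from the thesis: it considers two independent positions that each lose 100 with probability 0.04 and nothing otherwise, and computes $\mathrm{VaR}_{0.95}$ as the generalized inverse of the discrete loss distribution. The function name `var_discrete` is introduced only for this example.

```python
def var_discrete(dist, alpha):
    """VaR as generalized inverse: smallest x with F_L(x) >= alpha.

    dist maps each possible loss to its probability.
    """
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= alpha - 1e-12:   # tolerance for floating-point sums
            return x
    return max(dist)

p_default = 0.04   # hypothetical probability of losing 100
single = {0.0: 1 - p_default, 100.0: p_default}

# Loss distribution of the sum of two independent copies of the position.
combined = {}
for x1, p1 in single.items():
    for x2, p2 in single.items():
        combined[x1 + x2] = combined.get(x1 + x2, 0.0) + p1 * p2

alpha = 0.95
var_single = var_discrete(single, alpha)
var_combined = var_discrete(combined, alpha)
print(f"VaR of one position:      {var_single}")
print(f"VaR of combined position: {var_combined}")
```

Each position on its own has a loss probability below 5%, so its $\mathrm{VaR}_{0.95}$ is zero, whereas the combined position suffers at least one loss with probability $1 - 0.96^2 = 0.0784$, so its $\mathrm{VaR}_{0.95}$ is 100 and exceeds the sum of the stand-alone figures.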

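Expected Shortfall

The second risk measure we consider is Expected Shortfall (ES). Again, we begin with a formal definition of the concept.

Definition 5 (Expected Shortfall). For a loss $L$ with $E(|L|) < \infty$ and distribution function $F_L$, the Expected Shortfall (ES) at confidence level $\alpha \in (0,1)$ is defined as

$$\mathrm{ES}_\alpha := \frac{1}{1-\alpha} \int_\alpha^1 q_\varphi(F_L)\, d\varphi = \frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_\varphi(L)\, d\varphi, \tag{2.8}$$

where $q_\varphi(F_L) = F_L^{\leftarrow}(\varphi)$ is the quantile function of $F_L$.

If the loss distribution $F_L$ is continuous, $\mathrm{ES}_\alpha$ can be thought of as the average loss given that $\mathrm{VaR}_\alpha$ is exceeded. That is,

$$\mathrm{ES}_\alpha = \frac{E\bigl(L\, I_{\{L \ge \mathrm{VaR}_\alpha\}}\bigr)}{1-\alpha} = E(L \mid L \ge \mathrm{VaR}_\alpha), \tag{2.9}$$

where $I_{\{L \ge \mathrm{VaR}_\alpha\}}$ is a binary violation indicator.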

$\mathrm{ES}_\alpha$ may be considered superior to $\mathrm{VaR}_\alpha$ for two reasons. First, in contrast to $\mathrm{VaR}_\alpha$, $\mathrm{ES}_\alpha$ gives an idea of how bad things can get, i.e. it informs about the probable size of the worst losses, which occur with probability $1-\alpha$. Second, Artzner et al. [1999] find that $\mathrm{ES}_\alpha$ satisfies the axioms of coherence, including subadditivity (for a formal proof of ES being a coherent risk measure, see McNeil et al. [2005], p. 243).
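The two risk measures have simple empirical counterparts, sketched below. The code is a minimal illustration, not the estimation procedure of the empirical study: VaR is estimated by the empirical quantile in the sense of the generalized inverse, ES by the average of the observations at or above that quantile, and the loss sample is simulated from a standard normal distribution purely as a stand-in for real data. The function names are hypothetical.

```python
import math
import random

def empirical_var(losses, alpha):
    """Empirical quantile: smallest order statistic x with F_n(x) >= alpha."""
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered))   # F_n(ordered[k-1]) = k/n >= alpha
    return ordered[max(k, 1) - 1]

def empirical_es(losses, alpha):
    """Average of the losses at or above the empirical VaR."""
    var = empirical_var(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)

rng = random.Random(42)
losses = [rng.gauss(0.0, 1.0) for _ in range(100_000)]   # simulated, not real data

for alpha in (0.95, 0.99):
    var_hat = empirical_var(losses, alpha)
    es_hat = empirical_es(losses, alpha)
    print(f"alpha = {alpha}: VaR = {var_hat:.3f}, ES = {es_hat:.3f}")
```

By construction the ES estimate is never smaller than the VaR estimate at the same confidence level, reflecting that ES averages over the tail beyond VaR.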

2.1.4 Quantitative Methods for Risk Modeling

Statistical methods for modeling the distribution of a loss $L_{t+1} = l_{[t]}(\mathbf{X}_{t+1})$ can be divided into three main methods: Variance-Covariance, Historical Simulation, and Monte Carlo Simulation. We present the basics of each method, discuss its limitations and suggest possible extensions.

¹ McNeil et al. [2005] demonstrate that the non-subadditivity of $\mathrm{VaR}_\alpha$ can occur when the dependence structure is of a highly asymmetric form or when the portfolio components have highly asymmetric loss distributions.

Variance-Covariance Method

We begin by presenting the unconditional version of the variance-covariance (VC) method. In contrast to historical simulation and Monte Carlo methods, VC provides an analytical solution to the risk measure estimation problem which requires no simulation. The method is based on the following two assumptions:

1. The vector of risk factor changes $\mathbf{X}_{t+1}$ has an (unconditional) multivariate normal distribution denoted by $\mathbf{X}_{t+1} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix.

2. The linearized loss in terms of risk factors $L^{\Delta}_{t+1} := l^{\Delta}_{[t]}(\mathbf{X}_{t+1})$ is a sufficiently accurate approximation of $L_{t+1}$.

The second assumption allows us to estimate risk measures based on the distribution of $L^{\Delta}_{t+1}$ instead of $L_{t+1}$, which makes the estimation problem analytically tractable. Taken together, the assumptions ensure that the linearized loss is linear in the risk factor changes and has a univariate normal distribution. Specifically, the linearized loss operator has the form $l^{\Delta}_{[t]}(\mathbf{x}) = -(c_t + \mathbf{b}_t'\mathbf{x})$ and since the multivariate normal distribution is stable under affine transformations, we have

$$l^{\Delta}_{[t]}(\mathbf{X}_{t+1}) \sim \mathcal{N}(-c_t - \mathbf{b}_t'\boldsymbol{\mu},\ \mathbf{b}_t'\Sigma\,\mathbf{b}_t), \tag{2.10}$$

where $c_t$ and $\mathbf{b}_t$ denote some constant and constant vector known at time $t$, respectively. The mean vector $\boldsymbol{\mu}$ and the covariance matrix $\Sigma$ are estimated from the risk factor change data $\mathbf{X}_{t-n+1}, \ldots, \mathbf{X}_t$. Estimates of the risk measures VaR and ES are calculated from the estimated moments of the distribution.
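Once the moments in (2.10) are available, VaR and ES follow in closed form. The sketch below illustrates this under stated assumptions: it uses the standard results for a normally distributed loss with mean $m$ and standard deviation $s$, namely $\mathrm{VaR}_\alpha = m + s\,\Phi^{-1}(\alpha)$ and $\mathrm{ES}_\alpha = m + s\,\phi(\Phi^{-1}(\alpha))/(1-\alpha)$, and all numerical inputs (`b`, `mu`, `sigma`) as well as the function name are hypothetical.

```python
from statistics import NormalDist

def vc_risk_measures(c, b, mu, sigma, alpha):
    """VaR and ES of the linearized loss -(c + b'X) with X ~ N_d(mu, sigma)."""
    d = len(b)
    mean = -(c + sum(b[i] * mu[i] for i in range(d)))             # -c - b'mu
    variance = sum(b[i] * sigma[i][j] * b[j]
                   for i in range(d) for j in range(d))           # b' Sigma b
    sd = variance ** 0.5
    std_normal = NormalDist()
    z = std_normal.inv_cdf(alpha)                                 # normal quantile
    var = mean + sd * z
    es = mean + sd * std_normal.pdf(z) / (1.0 - alpha)
    return var, es

# Hypothetical inputs: sensitivities b and estimated moments of X.
b = [1000.0, 1250.0, 800.0]
mu = [0.0005, 0.0003, 0.0004]
sigma = [[0.00040, 0.00010, 0.00012],
         [0.00010, 0.00025, 0.00008],
         [0.00012, 0.00008, 0.00050]]

var99, es99 = vc_risk_measures(0.0, b, mu, sigma, 0.99)
print(f"VaR_0.99 = {var99:.2f}, ES_0.99 = {es99:.2f}")
```

In practice `mu` and `sigma` would be replaced by the sample mean vector and sample covariance matrix of the observed risk factor changes.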

The assumptions underlying the method may have some undesirable consequences. The linearized loss can be a poor approximation for portfolios with nonlinear instruments such as options or if risk is measured over long time horizons as the first-order approximation only works well with small risk factor changes. However, the most serious disadvantage is the unconditional normality assumption which may lead to underestimation of the risk exposure due to the small probability assigned to large losses [Hull and White, 1998].

A conditional version of the VC method is obtained if we alter the first assumption and instead assume that the vector $\mathbf{X}_{t+1}$ follows a conditional multivariate normal distribution, i.e. $\mathbf{X}_{t+1} \mid \mathcal{F}_t \sim \mathcal{N}_d(\boldsymbol{\mu}_{t+1}, \Sigma_{t+1})$. In consequence, the conditional loss distribution has conditional mean $E(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = -(c_t + \mathbf{b}_t'\boldsymbol{\mu}_{t+1})$ and conditional variance $\mathrm{Var}(L^{\Delta}_{t+1} \mid \mathcal{F}_t) = \mathbf{b}_t'\Sigma_{t+1}\mathbf{b}_t$. The conditional moments can be estimated based on a multivariate dynamic model, and risk measure estimates can then be calculated from these estimated moments.


Historical Simulation In the Historical Simulation (HS) method the loss distribution of $L_{t+1}$ is estimated under the empirical distribution of historical data $X_{t-n+1}, \ldots, X_t$. Thus, the method does not rely on any parametric model assumptions. It does, however, rely on stationarity of $X$ to ensure convergence of the empirical loss distribution to the true loss distribution. The historically simulated loss series is generated by applying the loss operator to recent historical risk factor changes. The univariate dataset of historically simulated losses is given by

$$\{\tilde{L}_s = l_{[t]}(X_s) : s = t-n+1, \ldots, t\}, \tag{2.11}$$

where the values $\tilde{L}_s$ represent the losses that would occur if the historical risk-factor returns on day $s$ reoccurred at time $t+1$. Statistical inference about the loss distribution and risk measures can be made using the historically simulated data $\tilde{L}_{t-n+1}, \ldots, \tilde{L}_t$.
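The construction in (2.11) can be sketched in a few lines of Python. The loss operator below assumes a portfolio of stocks whose risk factor changes are log-returns, and the ES estimator averages the losses at or above the empirical VaR; both are illustrative choices (one of several common conventions), and all function names are hypothetical.

```python
import math

def portfolio_loss(x, value=1.0, weights=None):
    """Assumed loss operator: -V_t * sum_i w_i * (exp(x_i) - 1) for log-returns x."""
    d = len(x)
    w = weights if weights is not None else [1.0 / d] * d
    return -value * sum(wi * (math.exp(xi) - 1.0) for wi, xi in zip(w, x))

def historical_losses(factor_changes, loss_operator=portfolio_loss):
    """Apply the loss operator to every historical risk factor change, cf. (2.11)."""
    return [loss_operator(x) for x in factor_changes]

def empirical_var_es(losses, alpha=0.99):
    """Empirical alpha-quantile (VaR) and mean of the losses at or above it (ES)."""
    ordered = sorted(losses)
    n = len(ordered)
    # Smallest order statistic whose empirical distribution value is >= alpha.
    k = max(0, math.ceil(alpha * n) - 1)
    tail = ordered[k:]
    return ordered[k], sum(tail) / len(tail)
```

With only a few hundred observations, the tail sample `ordered[k:]` contains very few points at high confidence levels, which illustrates the precision problem discussed next.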

To ensure sufficient estimation precision, HS requires large amounts of relevant and synchronized data for all risk factors. However, it is not always practically feasible to obtain such large appropriate samples of data. Even if data is available, the history of appropriate data may only contain a few if any extreme observations. Additionally, the unconditional nature of the method makes it likely to miss periods of temporarily elevated volatility which can result in clusters of VaR violations [Jorion, 2001].

We can combine HS with a univariate time series model calibrated to the historical simulation data and thereby estimate a conditional loss distribution. In principle, we are not estimating the conditional loss distribution defined in Section 2.1.2; we are estimating the conditional loss distribution $F_{L_{t+1} \mid \mathcal{G}_t}$, where $\mathcal{G}_t = \sigma(\{\tilde{L}_s : s \le t\})$. Even though we are working with a reduced information set, this simple method may work well in practice [McNeil et al., 2005].

Monte Carlo Simulation The main idea of the Monte Carlo (MC) method is to estimate the distribution of $L_{t+1} = l_{[t]}(X_{t+1})$ under some explicit parametric model for $X_{t+1}$. Unlike VC, we make no use of the linearized loss operator to make the estimation problem analytically tractable. Instead, we make inference about the loss distribution by simulating new risk factor change data.

MC is essentially a three-step method. First, we set up a data generating process (DGP) by calibrating the parameters of a suitable parametric model to historical risk factor change data $X_{t-n+1}, \ldots, X_t$. Second, we simulate a large set of $m$ independent future realizations of $\{X_t\}_{t \in \mathbb{N}}$, denoted by $\tilde{X}^{(1)}_{t+1}, \ldots, \tilde{X}^{(m)}_{t+1}$. Third, we construct Monte Carlo simulated loss data by applying the loss operator to each realization:

$$\tilde{L}^{(i)}_{t+1} = l_{[t]}(\tilde{X}^{(i)}_{t+1}), \quad i = 1, \ldots, m. \tag{2.12}$$

Statistical inference about the loss distribution and risk measures is made using the simulated losses $\tilde{L}^{(1)}_{t+1}, \ldots, \tilde{L}^{(m)}_{t+1}$. In contrast to HS, the method avoids the problem of having insufficient synchronized historical data. Also, we can address heavy tails and extreme scenarios through the pre-specified stochastic risk factor change process.
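The three steps can be sketched as follows for a single risk factor. The univariate normal model is purely an illustrative assumption (any suitable parametric model could take its place), and the function name `mc_losses` is hypothetical.

```python
import random
from statistics import mean, stdev

def mc_losses(history, loss_operator, m=10000, seed=0):
    """Monte Carlo simulated losses under an assumed univariate normal model."""
    rng = random.Random(seed)
    # Step 1: calibrate the data generating process to historical changes.
    mu_hat, sd_hat = mean(history), stdev(history)
    # Step 2: simulate m independent future risk factor changes.
    simulated = [rng.gauss(mu_hat, sd_hat) for _ in range(m)]
    # Step 3: apply the loss operator to each simulated realization.
    return [loss_operator(x) for x in simulated]

# Hypothetical usage: one log-return risk factor and a linear loss operator.
history = [0.012, -0.020, 0.005, -0.011, 0.015, -0.003, 0.007]
losses = mc_losses(history, lambda x: -x, m=5000, seed=42)
```

Fixing the seed makes the simulated loss sample reproducible, which is convenient when comparing risk measure estimates across methods.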

For large portfolios the computational costs of using MC can be large, especially if the loss operator is difficult to evaluate. This is the case when the portfolio holds complex instruments, e.g. derivatives for which no closed-form price solution is available. The same critique applies to HS but to a smaller degree, since the sample size $n$ representing the number of historical simulations is usually smaller than the number of simulations $m$ in MC.

An alternative or supplement to MC is to use bootstrapping. Where MC simulates new data by setting up a DGP and generating random numbers from a hypothetical distribution, bootstrapping simulates new data by vector-wise random sampling from $X = (X_{t-n+1}, \ldots, X_t)$ with replacement as many times as needed [Efron and Tibshirani, 1993]. However, a large sample size is needed to ensure that the bootstrapped distribution is a good approximation of the true one. A further drawback is that any pattern of time variation in $X$ is broken by the random sampling. This can be circumvented by combining MC and bootstrapping. In this case, we would set up a DGP without assuming a theoretical innovation distribution but instead applying bootstrapping to the standardized residuals.
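Vector-wise sampling with replacement can be sketched as below. Each draw picks a whole observation vector, so the cross-sectional dependence between risk factors on a given day is preserved while the time ordering is lost. The function name and data are hypothetical.

```python
import random

def bootstrap_sample(observations, size, seed=0):
    """Draw `size` observation vectors from the history, with replacement."""
    rng = random.Random(seed)
    n = len(observations)
    # Sampling whole vectors keeps the same-day dependence between risk factors.
    return [observations[rng.randrange(n)] for _ in range(size)]

# Hypothetical history of three risk factor changes per day.
X = [(0.01, -0.02, 0.00), (0.00, 0.01, -0.01), (-0.03, -0.02, -0.04)]
resampled = bootstrap_sample(X, size=1000, seed=7)
```

The same function could be applied to standardized residuals of a fitted dynamic model, which is the combination of MC and bootstrapping described above.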

2.2 Extreme Value Theory

The purpose of this section is to give a thorough and self-contained account and discussion of extreme value theory (EVT) with applications in financial risk management in mind, but without losing mathematical rigor.

Within the context of EVT, there are roughly two approaches to modeling extremal events. One of them is the direct modeling of the distribution of maximum (or minimum) realizations. These kinds of models are known as block maxima models. The other approach is the modeling of exceedances of a particular threshold. Models based on this approach are known as peaks over threshold models. Today, it is generally acknowledged that the latter approach uses data more efficiently, and it is therefore considered the most useful for practical applications [McNeil et al., 2005].

EVT rests on the assumption of independently and identically distributed (iid) data. In this thesis, however, we are concerned with financial time series data, and a stylized fact of financial log-returns is that they tend to exhibit dependence in the second moment, i.e. while they are seemingly uncorrelated, the autocorrelation of the squared (or absolute) log-returns is significant. Consequently, when EVT is applied to financial time series data we need to take temporal dependence into account. If not, we will produce estimators with non-optimal performance [Brodin and Klüppelberg, 2006].

Sections 2.2.1 and 2.2.2 describe the block maxima and the peaks over threshold models, respectively, and are organized as follows. First, we present the mathematical concepts and results that constitute the theoretical foundation of the two extreme value modeling approaches. Second, we present the models, their assumptions, limitations, and statistical estimation based on maximum likelihood (ML). Third, we discuss how the models can be generalized and applied to financial time series data and the subtleties this involves. Finally, we discuss how to estimate quantiles and risk measures.

2.2.1 Modeling of Extremal Events I: Block Maxima Models

Suppose that $\{X_i\}_{i \in \mathbb{N}}$ is a sequence of iid non-degenerate random variables representing financial losses with common distribution function $F(x) = P(X_i \le x)$.

The Generalized Extreme Value Distribution Let $M_n = \bigvee_{i=1}^{n} X_i = \max(X_1, \ldots, X_n)$ define the sample maxima of the iid random variables. In classical EVT we are interested in the limiting distribution of affinely transformed (normalized) maxima. The mathematical foundation is the class of extreme value limit laws originally derived by Fisher and Tippett [1928] and summarized in Theorem 1.

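Theorem 1 (Fisher and Tippett [1928], Gnedenko [1943]). If there exist norming constants $c_n > 0$ and $d_n \in \mathbb{R}$ such that

$$c_n^{-1}(M_n - d_n) \xrightarrow{d} H \tag{2.13}$$

for some non-degenerate³ distribution function $H$, then $H$ belongs to one of the following three families of distributions:

• Gumbel: $\Lambda(x) = \exp\{-e^{-x}\}$, $x \in \mathbb{R}$.

• Fréchet: $\Phi_\alpha(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0, \end{cases} \quad \alpha > 0.$

• Weibull: $\Psi_\alpha(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0, \\ 1, & x > 0, \end{cases} \quad \alpha > 0.$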

A rigorous proof of the theorem can be found in Gnedenko [1943].

³ A non-degenerate distribution function is a limiting distribution function that is not concentrated on a single point [McNeil et al., 2005].


The $\Lambda$, $\Phi_\alpha$ and $\Psi_\alpha$ distribution functions are called standard extreme value distributions. In accordance with von Mises [1936] and Jenkinson [1955], we can obtain a one-parameter representation of the three standard distributions. This representation is known as the standard generalized extreme value (GEV) distribution.

Definition 6 (Generalized Extreme Value Distribution). The distribution function of the standard GEV distribution is given by

$$H_\xi(x) = \begin{cases} \exp\{-(1 + \xi x)^{-1/\xi}\}, & \xi \neq 0, \\ \exp\{-e^{-x}\}, & \xi = 0, \end{cases} \tag{2.14}$$

where $1 + \xi x > 0$ and $\xi$ is the shape parameter.

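The related location-scale family $H_{\xi,\mu,\sigma}$ can be introduced by replacing the argument $x$ above by $(x-\mu)/\sigma$ for $\mu \in \mathbb{R}$, $\sigma > 0$; that is, $H_{\xi,\mu,\sigma}(x) := H_\xi\left(\frac{x-\mu}{\sigma}\right)$. The support has to be adjusted accordingly. Moreover, due to its crucial role in determining the likelihood function when fitting the GEV distribution, we calculate the density function of the three-parameter GEV distribution, obtained by differentiating $H_{\xi,\mu,\sigma}(x)$ with respect to $x$:

$$h_{\xi,\mu,\sigma}(x) = \begin{cases} \dfrac{1}{\sigma}\left(1 + \xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi - 1} \exp\left\{-\left(1 + \xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right\}, & \xi \neq 0, \\[2ex] \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\mu}{\sigma}\right)\exp\left\{-e^{-(x-\mu)/\sigma}\right\}, & \xi = 0. \end{cases} \tag{2.15}$$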
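The distribution and density functions of the three-parameter GEV distribution translate directly into code. The sketch below is an illustrative implementation with hypothetical function names; the tolerance used to switch to the Gumbel case and the values returned outside the support are implementation choices, not part of the definitions.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """Distribution function of the three-parameter GEV, cf. (2.14)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel case, xi = 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_pdf(x, xi, mu=0.0, sigma=1.0):
    """Density function of the three-parameter GEV, cf. (2.15)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-z) * math.exp(-math.exp(-z)) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0  # the density vanishes outside the support
    return t ** (-1.0 / xi - 1.0) * math.exp(-t ** (-1.0 / xi)) / sigma
```

A quick sanity check is that a numerical derivative of `gev_cdf` agrees with `gev_pdf` at interior points of the support.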

Theorem 1 shows that affinely transformed maxima converge in distribution to the GEV distribution $H_\xi$, and convergence of type (see Embrechts et al. [1997], p. 121 and p. 554) ensures that the limiting distribution is uniquely determined up to affine transformations.⁴

Under the iid assumption, the exact distribution function of the maxima $M_n$ is

$$P(M_n \le x) = P(X_1 \le x, \ldots, X_n \le x) = F^n(x), \quad x \in \mathbb{R},\ n \in \mathbb{N}. \tag{2.16}$$

As a result of (2.16) and the fact that the extreme value distribution functions are continuous on $\mathbb{R}$, $c_n^{-1}(M_n - d_n) \xrightarrow{d} H$ is equivalent to

$$\lim_{n\to\infty} P(M_n \le c_n x + d_n) = \lim_{n\to\infty} F^n(c_n x + d_n) = H(x), \tag{2.17}$$

or equivalently, by taking logarithms and using $\ln(1-y) \sim -y$ as $y \to 0$, we have

$$\lim_{n\to\infty} n(1 - F(c_n x + d_n)) = \lim_{n\to\infty} n\bar{F}(c_n x + d_n) = -\ln H(x). \tag{2.18}$$
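To make (2.17) concrete, consider the standard exponential distribution $F(x) = 1 - e^{-x}$, for which the norming constants $c_n = 1$ and $d_n = \ln n$ are a standard textbook choice (this example is an illustration and is not part of the thesis). Then $F^n(x + \ln n) = (1 - e^{-x}/n)^n$, which converges to the Gumbel distribution function $\exp\{-e^{-x}\}$. A small numerical sketch:

```python
import math

def exp_max_cdf(x, n):
    """Exact P(M_n <= x + ln n) for n iid standard exponential variables."""
    if x + math.log(n) < 0.0:
        return 0.0  # the exponential distribution has no mass below zero
    return (1.0 - math.exp(-x) / n) ** n

def gumbel_cdf(x):
    """Gumbel limit, i.e. H_xi with xi = 0."""
    return math.exp(-math.exp(-x))

# The approximation error at x = 1 shrinks as the block size n grows.
errors = [abs(exp_max_cdf(1.0, n) - gumbel_cdf(1.0)) for n in (10, 100, 1000)]
```

The exponential distribution therefore lies in the maximum domain of attraction of the Gumbel distribution, with $\xi = 0$.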

⁴ Using the identity $\min(X_1, \ldots, X_n) = -\max(-X_1, \ldots, -X_n)$, it can be shown that the appropriate limits of minima are distributions of type $1 - H_\xi(-x)$.


In fact, we have the more general equivalence. For $0 \le \tau \le \infty$ and any sequence $u_n$ of real numbers,

$$\lim_{n\to\infty} n\bar{F}(u_n) = \tau \iff \lim_{n\to\infty} P(M_n \le u_n) = e^{-\tau}, \tag{2.19}$$

where $\bar{F}$ is defined by $\bar{F} = 1 - F$ and denotes the tail of $F$.

Definition 7 (Maximum Domain of Attraction). If (2.17) holds for some norming constants $c_n > 0$, $d_n \in \mathbb{R}$ and non-degenerate distribution function $H$, we say that the distribution function $F$ belongs to the maximum domain of attraction of the extreme value distribution $H$, and we write $F \in MDA(H)$.

Consequently, we can restate Theorem 1:

Theorem 2 (Fisher-Tippett-Gnedenko Theorem Restated). If $F$ is in the maximum domain of attraction of some non-degenerate distribution function $H$ ($F \in MDA(H)$), then $H$ must be a GEV distribution, i.e. $H$ belongs to the distribution family $H_\xi$.

The Fisher-Tippett-Gnedenko Theorem essentially says that the GEV distribution is the only possible limiting distribution for normalized maxima. If $\xi = \alpha^{-1} > 0$, $F$ is said to be in the maximum domain of attraction of the Fréchet distribution $\Phi_\alpha$. Distributions in this class include the Pareto, $t$, Burr, log-gamma, and Cauchy distributions. If $\xi = 0$, $F$ is in the maximum domain of attraction of the Gumbel distribution $\Lambda$, which includes distributions such as the normal, log-normal, and gamma distributions. Finally, if $\xi = -\alpha^{-1} < 0$, $F$ is in the maximum domain of attraction of the Weibull distribution $\Psi_\alpha$, which includes distributions such as the uniform and beta distributions [McNeil et al., 2005, Embrechts et al., 1997].

Maximum Domain of Attraction In the following we investigate which underlying distributions $F$ give rise to which limit laws by characterizing the maximum domain of attraction of each of the three extreme value distributions.

For the Fréchet distribution the maximum domain of attraction consists of distribution functions $F$ whose tails are regularly varying⁵ with negative index of variation. Regularly varying functions are functions which can be represented by power functions multiplied by slowly varying⁶ functions.

⁵ A Lebesgue-measurable function $\psi: \mathbb{R}_+ \to \mathbb{R}_+$ is regularly varying at $\infty$ with index $\rho \in \mathbb{R}$ if $\lim_{x\to\infty} \psi(tx)/\psi(x) = t^{\rho}$, $t > 0$, and we write $\psi \in RV_\rho$. See Resnick [2007] ch. 2.

⁶ A Lebesgue-measurable function $L: \mathbb{R}_+ \to \mathbb{R}_+$ is slowly varying at $\infty$ if $\lim_{x\to\infty} L(tx)/L(x) = 1$, $t > 0$.


Thus, the distribution function $F$ belongs to the maximum domain of attraction of $\Phi_\alpha$, $\alpha > 0$, if and only if $\bar{F}(x) = x^{-\alpha}L(x)$ for some slowly varying function $L$. That is,

$$F \in MDA(\Phi_\alpha) \iff \bar{F} \in RV_{-\alpha},$$

where $\alpha = 1/\xi$ is called the tail index of the distribution. This class of distribution functions contains very heavy-tailed distributions in the sense that $E[X^k] = \infty$ for $k > \alpha$ for any non-negative random variable $X$ with distribution function $F \in MDA(\Phi_\alpha)$, which makes the distributions particularly attractive for modeling large fluctuations in log-returns and other financial applications.
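As a short worked example of this characterization (added for illustration, using a Pareto distribution with an assumed scale parameter $\kappa > 0$), take the tail

$$\bar{F}(x) = \left(\frac{\kappa}{\kappa + x}\right)^{\alpha}, \quad x \ge 0,\ \alpha > 0.$$

For any fixed $t > 0$,

$$\frac{\bar{F}(tx)}{\bar{F}(x)} = \left(\frac{\kappa + x}{\kappa + tx}\right)^{\alpha} \longrightarrow t^{-\alpha}, \quad x \to \infty,$$

so that $\bar{F} \in RV_{-\alpha}$ and hence $F \in MDA(\Phi_\alpha)$ with shape parameter $\xi = 1/\alpha$. A tail index of $\alpha = 4$, for instance, corresponds to $\xi = 0.25$ and implies that moments of order $k > 4$ are infinite.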

The maximum domain of attraction of the Weibull distribution consists of distribution functions $F$ with support bounded to the right, i.e. they have a finite right endpoint, $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\} < \infty$. The distribution function $F$ belongs to the maximum domain of attraction of $\Psi_\alpha$, $\alpha > 0$, if and only if $x_F < \infty$ and $\bar{F}(x_F - x^{-1}) = x^{-\alpha}L(x)$ for some slowly varying function $L$. That is,

$$F \in MDA(\Psi_\alpha) \iff x_F < \infty,\ \bar{F}(x_F - x^{-1}) \in RV_{-\alpha}.$$

The fact that $x_F < \infty$ renders this class of distributions the least appropriate for modeling extremal events in finance. In practice, financial losses clearly have an upper limit, but distributions with $x_F = \infty$ are often favored since they allow for arbitrarily large losses in a sample.

The maximum domain of attraction of the Gumbel distribution consists of the so-called von Mises distribution functions and their tail-equivalent distribution functions. A distribution function $F$ is called a von Mises function if there exists $z < x_F$ such that $F$ has the representation

$$\bar{F}(x) = c\exp\left\{-\int_z^x \frac{1}{a(t)}\,dt\right\}, \quad z < x < x_F,$$

where $c$ is a positive constant, $a(\cdot)$ is a positive and absolutely continuous function with density $a'$ and $\lim_{x\to x_F} a'(x) = 0$. Furthermore, two distribution functions $F$ and $G$ are called tail-equivalent if they have the same right endpoint, i.e. $x_F = x_G$, and $\lim_{x\to x_F} \bar{F}(x)/\bar{G}(x) = c$ for some constant $0 < c < \infty$ [Embrechts et al., 1997].

$MDA(\Lambda)$ contains a large variety of distributions with very different tails, ranging from light-tailed distributions (e.g. the exponential or Gaussian distributions) to moderately heavy-tailed distributions (e.g. the log-normal or heavy-tailed Weibull distributions), which makes the Gumbel class interesting for financial applications alongside the Fréchet class [McNeil et al., 2005]. However, the tails of the distributions in the Gumbel class decrease to zero much faster than any power law, and thereby faster than the regularly varying power tails of the Fréchet class.


A non-negative random variable $X$ with distribution function $F \in MDA(\Lambda)$ has finite moments of any positive order, i.e. $E[X^k] < \infty$ for every $k > 0$. Also, the distributions in the Gumbel class can have both finite and infinite right endpoints, $x_F \le \infty$ [Embrechts et al., 1997].

Method for Block Maxima Modeling Based on the theoretical results presented in the previous sections, we are now ready to present the practical and statistical application of the block maxima model.

Assume that we have data from an underlying distribution with distribution function $F \in MDA(H_\xi)$ and that these data are iid. We know from the previous sections, and Theorem 1 in particular, that the true distribution of the maxima $M_n$ can be approximated by a GEV distribution for large $n$. In practice, we do not know the true distribution of losses and can therefore not determine the norming constants $c_n$ and $d_n$; thus we use the three-parameter specification $H_{\xi,\mu,\sigma}$, where we have replaced $c_n$ and $d_n$ by $\sigma > 0$ and $\mu$ [McNeil et al., 2005].

The implementation of the method is relatively straightforward. First, we divide the data into $m$ blocks of size $n$ and collect the maximum value in each block, denoting the block maximum of the $j$th block by $M_n^{(j)}$. This, of course, requires that the data can be divided in some natural way. Assuming that we are dealing with daily return data (or similarly, daily losses), we could e.g. divide the data into monthly, quarterly or yearly blocks. However, to avoid seasonality, it might be preferable to choose yearly periods [Gilli and Kellezi, 2006].
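The blocking step can be sketched as follows, assuming equally sized blocks and discarding any incomplete block at the end of the sample. Calendar-based blocks (months, quarters, years) would require date information and are not covered by this simplified, hypothetical helper.

```python
def block_maxima(losses, n):
    """Split the loss series into consecutive blocks of size n and return each block's maximum."""
    m = len(losses) // n  # number of complete blocks; a trailing partial block is dropped
    return [max(losses[j * n:(j + 1) * n]) for j in range(m)]

# Hypothetical daily losses divided into blocks of three observations.
maxima = block_maxima([1.0, 5.0, 2.0, 7.0, 3.0, 4.0, 9.0], n=3)
```

With daily data and yearly blocks one would use roughly `n = 252`, so that ten years of data yield only ten block maxima, which illustrates how wasteful of data the approach can be.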

Next, we fit the three-parameter GEV distribution to the $m$ block maximum observations $M_n^{(1)}, \ldots, M_n^{(m)}$. One estimation procedure is the theoretically well-established maximum likelihood (ML) method [Prescott and Walden, 1983, Hosking, 1985], which allows us to give estimates of statistical error for the parameter estimates. However, alternative methods do exist, e.g. Hosking et al. [1985] propose the method of probability-weighted moments (PWM), but the theoretical justification for this method is less well-established [Embrechts et al., 1997].

Assuming that the block size $n$ is large enough so that the $m$ block maximum observations can be assumed independent, regardless of whether the underlying data are dependent, the likelihood function based on the data $M_n^{(1)}, \ldots, M_n^{(m)}$ is given by

$$L(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) = \prod_{i=1}^{m} h_{\xi,\mu,\sigma}(M_n^{(i)}),$$

where $h_{\xi,\mu,\sigma}$ is the density function of the GEV distribution given in (2.15).


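By taking logarithms we obtain the log-likelihood function

$$l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) = \sum_{i=1}^{m} \ln h_{\xi,\mu,\sigma}(M_n^{(i)}) = -m\ln\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{m} \ln\left(1 + \xi\frac{M_n^{(i)} - \mu}{\sigma}\right) - \sum_{i=1}^{m}\left(1 + \xi\frac{M_n^{(i)} - \mu}{\sigma}\right)^{-1/\xi}.$$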

The maximum likelihood estimators (MLE) of $\xi$, $\mu$ and $\sigma$ are given by

$$(\hat{\xi}, \hat{\mu}, \hat{\sigma}) = \arg\max_{\xi,\mu,\sigma}\ l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)}) \tag{2.20}$$

subject to $\sigma > 0$ and $1 + \xi(M_n^{(i)} - \mu)/\sigma > 0$ for all $i$. That is, $\hat{\xi}$, $\hat{\mu}$ and $\hat{\sigma}$ maximize the log-likelihood function $l(\xi, \mu, \sigma; M_n^{(1)}, \ldots, M_n^{(m)})$ over the appropriate parameter space. In the so-called regular cases maximum likelihood estimation yields consistent, efficient, and asymptotically normal estimators [Heij et al., 2004]. However, the maximum likelihood problem (2.20) poses a non-regular case because the parameter space depends on the values of the data, or put equivalently, the support of the underlying distribution function depends on the unknown parameters. Fortunately, Smith [1985] shows that even in the non-regular case the resulting MLEs are consistent and asymptotically efficient whenever $\xi > -1/2$.
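The objective function of problem (2.20) can be coded directly. In the sketch below, parameter values that violate the constraints are assigned a log-likelihood of minus infinity, which is one simple way (an implementation choice, not the thesis' procedure) of keeping a generic numerical optimizer inside the admissible parameter space. The function name is hypothetical.

```python
import math

def gev_loglik(xi, mu, sigma, maxima):
    """GEV log-likelihood of the block maxima; -inf outside the parameter space."""
    if sigma <= 0.0:
        return -math.inf
    m = len(maxima)
    if abs(xi) < 1e-12:  # Gumbel case, using the xi = 0 branch of the density (2.15)
        z = [(x - mu) / sigma for x in maxima]
        return -m * math.log(sigma) - sum(z) - sum(math.exp(-v) for v in z)
    t = [1.0 + xi * (x - mu) / sigma for x in maxima]
    if min(t) <= 0.0:
        return -math.inf  # some observation falls outside the support
    return (-m * math.log(sigma)
            - (1.0 + 1.0 / xi) * sum(math.log(v) for v in t)
            - sum(v ** (-1.0 / xi) for v in t))
```

Maximizing this function over $(\xi, \mu, \sigma)$ with any general-purpose optimizer then gives the estimates in (2.20); the choice of optimizer and starting values is left open here.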

Generalization to Financial Time Series Data In the previous sections we have restricted ourselves to iid series. However, extremal events often tend to occur in clusters caused by local dependence in financial data. If a large value occurs in a financial time series, we can usually observe a cluster of large values over a short period afterwards. In this section we will give the conditions on the stationary process $\{X_i\}_{i\in\mathbb{N}}$ which ensure that its sample maxima $M_n$ and the corresponding maxima $\tilde{M}_n$ of an iid sequence $\{\tilde{X}_i\}_{i\in\mathbb{N}}$ with common distribution function $F$ exhibit similar limit behavior, i.e. the same type of limiting distribution applies.

Leadbetter et al. [1983] show that under two technical conditions the classes of limit laws for the normalized sequences $M_n$ and $\tilde{M}_n$ are exactly the same. The first condition is a distributional mixing condition under which the stationary series shows only weak long-range dependence. The second condition is an anti-clustering condition under which the stationary series shows no tendency to form clusters of large values (for details of these results, please refer to Leadbetter et al. [1983] or Embrechts et al. [1997]). The two conditions ensure that the stationary sequence $\{X_i\}_{i\in\mathbb{N}}$ has the same limiting extremal behavior as the associated iid sequence.


Unfortunately, while the first condition is often a tenable assumption for financial time series, the anti-clustering condition is not. Financial time series often exhibit volatility clustering, which in turn causes clusters of extremal observations [McNeil, 1998]. The standard measure for describing clustering of extreme values of a process is the so-called extremal index of the process. The extremal index allows one to characterize the relationship between the dependence structure of the data and their extremal behavior.

Formally, let $\{X_i\}_{i\in\mathbb{N}}$ be a stationary process and let $\theta \in (0,1]$. If for every $\tau > 0$ there exists a sequence $u_n$ such that

$$\lim_{n\to\infty} n\bar{F}(u_n) = \tau \tag{2.21}$$

and

$$\lim_{n\to\infty} P(M_n \le u_n) = e^{-\theta\tau}, \tag{2.22}$$

then the process $\{X_i\}_{i\in\mathbb{N}}$ has extremal index $\theta$. Observe that we have the same equivalence as in (2.19) except for the extremal index introduced in the limit of (2.22). For $\theta < 1$ there is a tendency of extreme values to cluster, while for $\theta = 1$ there is no such tendency.⁸

Now, for any sequence of real numbers $u_n$, it can be shown that (2.21), (2.22) and $P(\tilde{M}_n \le u_n) \to \exp\{-\tau\}$ (cf. relation (2.19)) are equivalent [Leadbetter, 1983]. From this, we infer

$$P(M_n \le u_n) \approx P^{\theta}(\tilde{M}_n \le u_n) = F^{n\theta}(u_n), \tag{2.23}$$

for large enough $n$. Thus, in the limit, the maximum of $n$ observations from a stationary series with extremal index $\theta$ behaves like the maximum of $n\theta$ observations from the associated iid series.
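The approximation (2.23) is easy to illustrate numerically. The sketch below takes the value $F(u)$ of the marginal distribution function at a threshold as given; the numbers used are hypothetical and the function name is not from the thesis.

```python
def stationary_max_cdf(F_u, n, theta=1.0):
    """Approximate P(M_n <= u) for a stationary series with extremal index theta, cf. (2.23)."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("the extremal index must lie in (0, 1]")
    # Clustering reduces the effective number of independent observations to n * theta.
    return F_u ** (n * theta)

# Hypothetical example: F(u) = 0.999 and one year of daily data.
p_iid = stationary_max_cdf(0.999, 252, theta=1.0)
p_clustered = stationary_max_cdf(0.999, 252, theta=0.5)
```

Because extremes arrive in clusters when $\theta < 1$, the yearly maximum stays below the threshold with higher probability than in the iid case, even though the marginal distribution is the same.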

Consequently, we have the following result for stationary time series.

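Theorem 3. If $\{X_i\}_{i\in\mathbb{N}}$ is strictly stationary with extremal index $\theta \in (0,1]$, then

$$\lim_{n\to\infty} P\{(\tilde{M}_n - d_n)/c_n \le x\} = H(x) \tag{2.24}$$

for a non-degenerate $H(x)$ if and only if

$$\lim_{n\to\infty} P\{(M_n - d_n)/c_n \le x\} = H^{\theta}(x), \tag{2.25}$$

with $H^{\theta}(x)$ also non-degenerate.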

Since the extreme value distribution $H$ is max-stable⁹, $H^{\theta}$ is of the same type as $H$, which means there exist constants $c > 0$ and $d \in \mathbb{R}$ such that $H^{\theta}(x) = H(cx + d)$. This implies that the limits in (2.24) and (2.25) can be chosen to be identical after a single change of norming constants; raising the distribution function to the power $\theta$ only affects location and scaling parameters [Embrechts et al., 1997].

⁸ Strict white noise processes have extremal index $\theta = 1$.

⁹ A non-degenerate random variable $X$ (and its distribution) is called max-stable if it satisfies $\max(X_1, \ldots, X_n) \overset{d}{=} c_n X + d_n$ for appropriate constants $c_n > 0$, $d_n \in \mathbb{R}$ and every $n \ge 2$.

Thus, provided $F \in MDA(H_\xi)$ for some $\xi$, the asymptotic distribution of normalized maxima of the stationary series $\{X_i\}_{i\in\mathbb{N}}$ with extremal index $\theta$ is also an extreme value distribution with exactly the same shape parameter $\xi$ as in the iid case. However, the dependence in $\{X_i\}_{i\in\mathbb{N}}$ has the effect that the convergence to the GEV distribution is slower because the effective sample size $n\theta$ is smaller than $n$, which means that we have to choose larger blocks when fitting a GEV distribution to their maxima than in the iid case [McNeil, 1998].

See Embrechts et al. [1997] pp. 418-425 for approaches to estimating the extremal index.

Quantiles and Measures of Risk Despite the fact that the block maxima model is considered less useful than the threshold models discussed in the next section, the method is not without practical relevance and could be used to provide estimates of stress losses [McNeil, 1998, McNeil et al., 2005]. The fitted GEV distribution of the block maxima allows for the determination of the so-called return level, which can be considered a kind of unconditional quantile estimate for the unknown underlying distribution. Assuming that the maxima in blocks of length $n$ follow the GEV distribution with distribution function $H_{\xi,\mu,\sigma}$, the $k$ $n$-block return level is defined as the $(1-1/k)$-quantile of $H$:

$$R_{n,k} = q_{1-1/k}(H).$$

This is the level we expect to be exceeded in one $n$-block out of every $k$ $n$-blocks, on average. For example, assuming a model for annual maxima (252 trading days per year), the 15-year return level $R_{252,15}$ is on average only exceeded in one year out of every 15 years. We shall call the $n$-block in which the return level is exceeded a return or stress period. Using the parameter estimates of the fitted GEV distribution, we can estimate the return level as

$$\hat{R}_{n,k} = H^{-1}_{\hat{\xi},\hat{\mu},\hat{\sigma}}\left(1 - \frac{1}{k}\right) = \hat{\mu} + \frac{\hat{\sigma}}{\hat{\xi}}\left(\left(-\ln\left(1 - \frac{1}{k}\right)\right)^{-\hat{\xi}} - 1\right). \tag{2.26}$$

The derivation of this expression can be found in Appendix A.1.
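The return level formula (2.26) can be evaluated as in the following sketch. The Gumbel branch uses the corresponding quantile of the $\xi = 0$ distribution in (2.14); the function name and the parameter values in the usage line are hypothetical.

```python
import math

def return_level(k, xi, mu, sigma):
    """k n-block return level: the (1 - 1/k)-quantile of the fitted GEV, cf. (2.26)."""
    if k <= 1:
        raise ValueError("k must exceed 1")
    y = -math.log(1.0 - 1.0 / k)
    if abs(xi) < 1e-12:  # Gumbel case: invert exp{-exp(-(x - mu) / sigma)}
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical fitted parameters for annual maxima of daily percentage losses.
r_15 = return_level(15, xi=0.25, mu=2.0, sigma=1.5)
```

Plugging the result back into the GEV distribution function should recover the probability $1 - 1/k$, which gives a simple check of an implementation.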

2.2.2 Modeling of Extremal Events II: Peaks over Threshold Models

The peaks over threshold (POT) models are concerned with modeling exceedances over a certain threshold, where each exceedance is regarded as an extreme event.


The block maxima models that we discussed in the previous section are quite wasteful of data due to the trade-off between the size of the blocks and the number of blocks to be constructed from a given dataset. In contrast, POT models are more efficient in their use of the (often limited) data on extreme values as they retain all observations that are extreme in the sense that they exceed some defined threshold. Consequently, these models are generally considered the most useful for practical applications [McNeil et al., 2005]. Also, the block maxima models do not allow for the estimation of popular risk measures such as VaR and ES.

In this section we present the theory and statistical aspects of the POT models. Within the class of threshold exceedance models one can further distinguish between two competing analysis approaches. First, we have the semi-parametric models based on upper order statistics such as the Hill estimator [Hill, 1975], the Pickands estimator [Pickands, 1975], and the DEdH or moment estimator [Dekkers et al., 1989]. Embrechts et al. [1997], ch. 4, and Resnick [2007], ch. 4, provide excellent overviews of the theoretical foundation and the statistical properties of these estimators. For empirical applications in a financial context, consult Koedijk et al. [1990], Jansen and De Vries [1991], Lux [1996], Longin [1996], and Danielsson and De Vries [1997]. We refrain from considering these semi-parametric models in this thesis. Secondly, we have the fully parametric models based on the approximation of excess losses by the generalized Pareto distribution. These models provide parametric estimation of the tails and will be the focus of this section.

In the following we concentrate on the case of iid data, but we note that the results also carry over to dependent processes with extremal index $\theta = 1$, which are processes that show no tendency to cluster.

Generalized Pareto Distribution The generalized Pareto distribution (GPD) is the pivotal distribution for modeling the magnitudes of exceedances over a high threshold, i.e. excess amounts [Embrechts et al., 1997, McNeil et al., 2005]. In an EVT context, the GPD is usually expressed as a two-parameter distribution.

Definition 8 (Generalized Pareto Distribution). The distribution function of the GPD is given by

$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp\{-x/\beta\}, & \xi = 0, \end{cases} \tag{2.27}$$

where $\xi \in \mathbb{R}$, $\beta > 0$, and the support is $x \ge 0$ when $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ when $\xi < 0$. $\xi$ is the shape parameter of the distribution and $\beta$ is an additional scaling parameter.


The GPD subsumes a number of specific distributions under its parameterization. When $\xi > 0$, $G_{\xi,\beta}$ is a re-parameterized version of a heavy-tailed, ordinary Pareto distribution and the $k$th moment $E[X^k]$ is infinite for $k \ge 1/\xi$; when $\xi = 0$ we have a light-tailed, exponential distribution; and, finally, $\xi < 0$ corresponds to a bounded (i.e. short-tailed) Pareto type II distribution. Moreover, $G_{\xi,\beta} \in MDA(H_\xi)$ for any $\xi \in \mathbb{R}$. Finally, we can extend the distribution family by adding a location parameter $\mu \in \mathbb{R}$, i.e. $G_{\xi,\mu,\beta}(x) := G_{\xi,\beta}(x - \mu)$. The support has to be adjusted accordingly. When $\mu = 0$ and $\beta = 1$, the representation is known as the standard GPD.

Because of its important role in the likelihood functions in the following sections, we take the first derivative of the distribution function (2.27) to get the density function

$$g_{\xi,\beta}(x) = \begin{cases} \beta^{-1}(1 + \xi x/\beta)^{-1/\xi - 1}, & \xi \neq 0, \\ \beta^{-1}\exp\{-x/\beta\}, & \xi = 0. \end{cases} \tag{2.28}$$
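For reference, the GPD distribution function (2.27) and the density obtained by differentiating it can be coded as in the sketch below. The function names, the numerical tolerance for the exponential case, and the handling of points outside the support are illustrative choices.

```python
import math

def gpd_cdf(x, xi, beta):
    """Distribution function of the two-parameter GPD, cf. (2.27)."""
    if x < 0.0:
        return 0.0
    if abs(xi) < 1e-12:  # exponential case, xi = 0
        return 1.0 - math.exp(-x / beta)
    t = 1.0 + xi * x / beta
    if t <= 0.0:
        return 1.0  # only possible for xi < 0: x is at or beyond the endpoint -beta/xi
    return 1.0 - t ** (-1.0 / xi)

def gpd_pdf(x, xi, beta):
    """Density of the two-parameter GPD: the derivative of (2.27) with respect to x."""
    if x < 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return math.exp(-x / beta) / beta
    t = 1.0 + xi * x / beta
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) / beta
```

Note that the shape parameter enters inside the bracket of the density, exactly as in the distribution function.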

Now, consider a series $\{X_i\}_{i\in\mathbb{N}}$ of iid random variables representing financial losses with common distribution function $F \in MDA(H_\xi)$ and upper endpoint $x_F \le \infty$. Let $u$ be a certain threshold and denote by

$$N_u = \mathrm{card}\{i : X_i > u,\ i = 1, \ldots, n\}$$

the (random) number of exceedances of $u$ by $X_1, \ldots, X_n$. We denote the losses exceeding $u$ by $\tilde{X}_1, \ldots, \tilde{X}_{N_u}$ and the corresponding excess amounts by $Y_1, \ldots, Y_{N_u}$, where $Y_j := \tilde{X}_j - u$. The conditional excess distribution of $X$ over threshold $u$ is given by

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, \tag{2.29}$$

for $0 \le y < x_F - u$. The excess distribution represents the probability that a loss exceeds the threshold $u$ by at most an amount $y$, given that the loss exceeds the threshold.
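The quantities $N_u$, $Y_1, \ldots, Y_{N_u}$ and an empirical counterpart of the excess distribution (2.29) can be computed from a loss sample as sketched below. The helper names and the data are hypothetical.

```python
def excesses_over_threshold(losses, u):
    """Excess amounts Y_j = X_j - u for all losses strictly above the threshold u."""
    return [x - u for x in losses if x > u]

def empirical_excess_cdf(losses, u, y):
    """Empirical counterpart of F_u(y) = P(X - u <= y | X > u)."""
    excesses = excesses_over_threshold(losses, u)
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(1 for v in excesses if v <= y) / len(excesses)

# Hypothetical loss sample and threshold.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
excesses = excesses_over_threshold(sample, u=2.5)
n_u = len(excesses)  # number of exceedances N_u
```

Raising the threshold makes the GPD approximation of the excess distribution more accurate but leaves fewer excess amounts to estimate it from.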

A famous limit result by Pickands [1975] and Balkema and de Haan [1974], captured in Theorem 4, shows that the GPD is the natural limiting distribution for excesses over a high threshold.

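Theorem 4 (Pickands [1975], Balkema and de Haan [1974]). For every $\xi \in \mathbb{R}$ and some positive measurable function $\beta(\cdot)$,

$$\lim_{u \to x_F}\ \sup_{0 \le y \le x_F - u} \left|F_u(y) - G_{\xi,\beta(u)}(y)\right| = 0, \tag{2.30}$$

if and only if $F \in MDA(H_\xi)$, $\xi \in \mathbb{R}$.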

Thus, for any distribution $F$ belonging to the maximum domain of attraction of an extreme value distribution, the excess distribution $F_u$ converges (uniformly) to a generalized Pareto distribution (GPD) as the threshold $u$ is raised. In other words, distributions for which the affinely transformed maxima converge to a GEV distribution constitute the set of distributions for which the excess distribution over a threshold converges to a GPD. In addition, the shape parameter $\xi$ of the limiting GPD of excesses is exactly the same as that of the limiting GEV distribution of normalized maxima.

Consequently, in the limit, the distribution of excess amounts is generalized Pareto. Hence, utilizing (2.27) and (2.30), the excess distribution function $F_u$ may be approximated by the GPD for large enough $u$:

$$F_u(y) \approx G_{\xi,\beta(u)}(y), \quad 0 \le y < x_F - u. \tag{2.31}$$

Point Process of Exceedances In the previous section we solely discussed the limit behavior of excess amounts, and we found that the distribution of excess amounts can be approximated by a generalized Pareto distribution. In contrast, this section focuses on the occurrence of exceedances in the limit. To this end, we need to clarify a few concepts and results from the theory of point processes.

Definition 9 (Point Process). Assume the state space $E$, i.e. the space where points live, is a complete separable metric space¹⁰ (c.s.m.s.) equipped with a σ-field $\mathcal{E}$ of Borel subsets of $E$ generated by open sets. Then a point process $N(\cdot)$ on the state space $E$ is a measurable map from the probability space $(\Omega, \mathcal{F}, P)$ into the Borel-measurable space $(M_p(E), \mathcal{M}_p(E))$, where $M_p(E)$ is the space of all point measures on $E$ and $\mathcal{M}_p(E)$ is the σ-algebra of Borel subsets of $M_p(E)$ generated by open sets.

In the following, we downplay the measure-theoretical notation somewhat to avoid disturbing the focus of the text. However, interested readers might want to consult Embrechts et al. [1997] Chapter 5 or Resnick [2007] Chapter 5 for short, accessible introductions to the notion of point processes with an explicit focus on extreme value theory. A more advanced and rigorous treatment can be found in Daley and Vere-Jones [2003].

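Roughly, one can think of a point process $N(\cdot)$ on $E$ simply as a random distribution of points $W_i$ in the state space $E$. Consider a sequence $W_1, \ldots, W_n$ of random variables or vectors taking values in $E$ and define for any set $A \subset E$

$$N(A) = \mathrm{card}\{i : W_i \in A\} = \sum_{i=1}^{n} \varepsilon_{W_i}(A),$$

where $\varepsilon_{W_i}$ is the Dirac measure for $W_i \in E$ defined by

$$\varepsilon_{W_i}(A) = \begin{cases} 1, & W_i \in A, \\ 0, & W_i \notin A. \end{cases}$$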

¹⁰ For our purposes, however, it is safe to assume a finite-dimensional Euclidean space [Embrechts et al., 1997].


The point process $N(A)$ counts the number of points $W_i$ falling into the subset $A$ of the state space $E$. One point process closely related to extreme value theory is the Poisson point process.

Definition 10 (Poisson Point Process). $N(\cdot)$ is called a Poisson point process on $E$ with mean measure $\mu$, or synonymously, a Poisson random measure PRM($\mu$), under the following two conditions:

1. For $A \in \mathcal{E}$ and $k \ge 0$,

$$P(N(A) = k) = \begin{cases} e^{-\mu(A)}\dfrac{(\mu(A))^k}{k!}, & \mu(A) < \infty, \\ 0, & \mu(A) = \infty. \end{cases}$$

2. For any $m \ge 1$, if $A_1, \ldots, A_m$ are mutually disjoint sets in $\mathcal{E}$, then $N(A_1), \ldots, N(A_m)$ are independent random variables.

After this short detour into the realms of point processes, let us return to the issue of modeling the occurrence of exceedances. As in the previous section, we are still considering the series $\{X_i\}_{i\in\mathbb{N}}$ of iid random variables representing financial losses with common distribution function $F \in MDA(H_\xi)$ and right endpoint $x_F \le \infty$.

If we let $u_n$ denote a sequence of real thresholds, then for $n \in \mathbb{N}$ and $1 \leq i \leq n$ the point process of exceedances $N_n(\cdot)$ with state space $E = (0,1]$,

$$
N_n(A) = \sum_{i=1}^{n} \varepsilon_{n^{-1}i}(A)\, I_{\{X_i > u_n\}}, \qquad n = 1, 2, \dots \tag{2.32}
$$

counts the number of exceedances of the threshold $u_n$ by the sequence $X_1, \dots, X_n$ with time of occurrence in the set $A$, where $A \subset E$. Note that the point process of exceedances is time-normalized, i.e. an observation $X_i$ exceeding threshold $u_n$ is plotted at $n^{-1}i$ on $(0,1]$ and not at $i$ on $(0,n]$. Also, note that (2.32) is considered an element in a sequence of point processes indexed by $n$.
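The time-normalized counting in (2.32) can be sketched in a few lines of Python. The function name `exceedance_count`, the loss values and the threshold are hypothetical, and the set $A$ is assumed to be an interval $(a, b] \subset (0,1]$.

```python
def exceedance_count(losses, threshold, a, b):
    """N_n(A) for A = (a, b]: exceedances of the threshold at normalized times i/n."""
    n = len(losses)
    return sum(
        1
        for i, x in enumerate(losses, start=1)
        if x > threshold and a < i / n <= b
    )


# Hypothetical losses X_1, ..., X_8 and threshold u_n.
losses = [0.4, 2.3, 1.1, 3.0, 0.2, 2.8, 0.9, 1.7]
u_n = 2.0
# Exceedances occur at i = 2, 4, 6, i.e. at normalized times 0.25, 0.5, 0.75.
first_half = exceedance_count(losses, u_n, 0.0, 0.5)
whole_period = exceedance_count(losses, u_n, 0.0, 1.0)
```

In this toy sample two of the three exceedances fall in the first half of the normalized time axis.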

Recall from (2.19) that the relation $P(M_n \leq u_n) \to \exp\{-\tau\}$ holds if and only if

$$
n\bar{F}(u_n) = E\left( \sum_{i=1}^{n} I_{\{X_i > u_n\}} \right) \to \tau, \qquad n \to \infty, \tag{2.33}
$$

for any $\tau \in [0, \infty)$.

This implies that the sequence of point processes of exceedances $N_n$ converges in distribution to a homogeneous Poisson process $N$ on $E = (0,1]$ with intensity $\tau$ (see Embrechts et al. [1997], Section 5.3.1 and Theorem 5.3.2 in particular). In fact, letting the sequence of thresholds $u_n$ be defined by $u_n(y) := c_n y + d_n$ for some fixed value $y$ and combining relations (2.18) and (2.33), we can write the intensity as $\tau(y) = -\ln H_\xi(y) = -\ln H_\xi((u - d_n)/c_n)$.
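A quick numerical check of this relation is possible in a case where everything is available in closed form. The sketch below assumes standard exponential losses, for which the norming constants $c_n = 1$ and $d_n = \ln n$ lead to the Gumbel limit $H_0(y) = \exp\{-e^{-y}\}$; this choice of distribution, the value of $y$ and the function name `max_cdf_at_threshold` are our own assumptions.

```python
import math


def max_cdf_at_threshold(n, y):
    """P(M_n <= u_n(y)) for n iid standard exponential losses, u_n(y) = y + ln(n)."""
    u_n = y + math.log(n)
    return (1.0 - math.exp(-u_n)) ** n


y = 0.5                      # arbitrary fixed value
tau = math.exp(-y)           # n * (1 - F(u_n)) equals exp(-y) for every n here
limit = math.exp(-tau)       # Gumbel limit H_0(y) = exp(-exp(-y))
errors = [abs(max_cdf_at_threshold(n, y) - limit) for n in (10, 100, 1000, 10000)]
```

In this example $n\bar{F}(u_n) = e^{-y}$ holds exactly, so $\tau(y) = -\ln H_0(y)$, and the entries of `errors` shrink as $n$ grows.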


Furthermore, replacing the norming constants $d_n$ and $c_n$ by $\mu$ and $\sigma > 0$, respectively, it is clear that, in the limit, exceedances of the level $x > u$ occur according to a homogeneous Poisson process with intensity $\tau(x) = -\ln H_{\xi,\mu,\sigma}(x)$. We can understand the intensity as expressing the instantaneous risk of a new exceedance of the level $x$ at time $t$. Clearly, this intensity does not depend on time and takes the constant value $\tau := \tau(x)$.
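For reference, the intensity $\tau(x) = -\ln H_{\xi,\mu,\sigma}(x)$ can be evaluated directly from the generalized extreme value distribution function. The following sketch assumes the standard parametrization $H_{\xi,\mu,\sigma}(x) = \exp\{-(1 + \xi(x-\mu)/\sigma)^{-1/\xi}\}$, with the Gumbel form for $\xi = 0$; the function name `exceedance_intensity` and the parameter values in the usage lines are hypothetical.

```python
import math


def exceedance_intensity(x, xi, mu, sigma):
    """tau(x) = -ln H_{xi,mu,sigma}(x) for the GEV distribution function H."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z)            # Gumbel case
    base = 1.0 + xi * z
    if base <= 0.0:
        # Outside the support: H(x) = 0 if xi > 0, H(x) = 1 if xi < 0.
        return math.inf if xi > 0 else 0.0
    return base ** (-1.0 / xi)


# Hypothetical parameter values; a higher level x gives a lower intensity.
tau_low = exceedance_intensity(2.0, 0.25, 1.0, 0.5)
tau_high = exceedance_intensity(3.0, 0.25, 1.0, 0.5)
```

The intensity decreases in $x$: the higher the level, the rarer its exceedances.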

Method for Peaks over Threshold Modeling

Based on the theoretical exposition of the asymptotic behavior of threshold exceedances in the previous two sections, we can now formulate the peaks-over-threshold (POT) model for iid data. The model rests on the following two assumptions.

Assumptions

1. Exceedances occur in time according to a homogeneous Poisson process with constant intensity.

2. Excess amounts are independently and identically distributed according to the generalized Pareto distribution; in particular, they are independent of their location in time.
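The two assumptions translate directly into a simulation recipe. The sketch below draws exceedance times from a homogeneous Poisson process on $(0,1]$ and, independently, excess amounts from a generalized Pareto distribution by inverse transform. The function name `simulate_pot`, the symbol `beta` for the GPD scale, the restriction to $\xi \neq 0$ and all parameter values are assumptions made for the illustration.

```python
import random


def simulate_pot(rate, xi, beta, rng):
    """One path of the POT model on (0, 1]: (time, excess) pairs.

    Times follow a homogeneous Poisson process with the given rate;
    excesses are iid GPD(xi, beta) with xi != 0, drawn by inverse transform.
    """
    events, t = [], rng.expovariate(rate)
    while t <= 1.0:
        u = rng.random()
        excess = beta / xi * ((1.0 - u) ** (-xi) - 1.0)
        events.append((t, excess))
        t += rng.expovariate(rate)
    return events


rng = random.Random(42)                      # arbitrary seed
paths = [simulate_pot(10.0, 0.2, 1.0, rng) for _ in range(5000)]
mean_count = sum(len(p) for p in paths) / len(paths)
excesses = [e for p in paths for _, e in p]
mean_excess = sum(excesses) / len(excesses)  # GPD mean is beta / (1 - xi) for xi < 1
```

Across the simulated paths, the average number of exceedances should be close to the chosen rate and the average excess close to the GPD mean, up to simulation error.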

Under these assumptions, we can model extreme events as either a marked Poisson point process [Chavez-Demoulin et al., 2005] or a two-dimensional Poisson point process [McNeil et al., 2005]. In the marked Poisson process, the marks represent the excess losses over the threshold $u$ and the times of exceedance constitute the points. However, in this section, we use the bivariate representation, which is a Poisson point process on two-dimensional space with points $(t, x)$ representing times and magnitudes of extreme events, i.e. losses exceeding threshold $u$.

Bivariate Poisson point process representation. Let $X_1, \dots, X_n$ be an iid series of random variables representing financial losses. Assuming the POT assumptions are satisfied, the point process given by

$$
N_u(\cdot) = \sum_{i=1}^{n} \varepsilon_{(n^{-1}i,\, X_i)}(\cdot) \tag{2.34}
$$

is a (non-homogeneous) Poisson process on the two-dimensional state space $E = (0,1] \times (u, \infty)$ with intensity

$$
\lambda(t, x) =
\begin{cases}
\sigma^{-1} \left(1 + \xi (x - \mu)/\sigma\right)^{-1/\xi - 1}, & \text{if } 1 + \xi (x - \mu)/\sigma > 0, \\
0, & \text{otherwise,}
\end{cases} \tag{2.35}
$$

at point $(t, x)$. The Poisson process is non-homogeneous because the intensity is not constant over the state space: it depends on the loss magnitude $x$, although not on the exceedance time $t$.


Obtained through backward engineering, this representation ensures that the tails are generalized Pareto distributed, i.e. the excess amounts are iid GPD, and that exceedances occur in time according to a homogeneous Poisson process with constant intensity $\tau$. To see this, we first calculate the mean measure of the process (2.34) for any subset $\Omega = (t, T) \times (x, \infty)$ of the state space $E$. We get

$$
\mu(\Omega) = \iint_{\Omega} \lambda(\varphi, \omega)\, d\omega\, d\varphi = \int_t^T \tau(x)\, d\varphi = -(T - t) \ln H_{\xi,\mu,\sigma}(x). \tag{2.36}
$$

See the calculations in Appendix A.2. From this, we see that for $x > u$, the implied one-dimensional point process of exceedances of the level $x$ is a homogeneous Poisson process with intensity $\tau(x) = -\ln H_{\xi,\mu,\sigma}(x)$, i.e. exceedances of the threshold level $x$ follow a Poisson process in time and the instantaneous risk of incurring a loss exceeding the level $x$ at any point in time $t$ is the constant rate $\tau := \tau(x)$.
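The mean measure in (2.36) can also be checked numerically. The sketch below integrates the intensity (2.35) over a rectangle with a simple midpoint rule and compares the result with the closed form $(T-t)(1 + \xi(x-\mu)/\sigma)^{-1/\xi}$. It assumes $\xi > 0$, truncates the infinite upper limit at a large value, and uses hypothetical parameter values and function names (`intensity`, `mean_measure_numeric`).

```python
import math


def intensity(x, xi, mu, sigma):
    """Two-dimensional intensity lambda(t, x) of (2.35); it does not depend on t."""
    base = 1.0 + xi * (x - mu) / sigma
    if base <= 0.0:
        return 0.0
    return base ** (-1.0 / xi - 1.0) / sigma


def mean_measure_numeric(t, T, x, xi, mu, sigma, upper=1000.0, steps=200000):
    """Midpoint-rule approximation of the integral of lambda over (t, T) x (x, upper)."""
    h = (upper - x) / steps
    inner = sum(intensity(x + (j + 0.5) * h, xi, mu, sigma) for j in range(steps)) * h
    return (T - t) * inner  # the time integral only contributes the factor T - t


xi, mu, sigma = 0.25, 1.0, 0.5      # hypothetical parameter values, xi > 0
t, T, x = 0.2, 0.7, 2.0
closed_form = (T - t) * (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)
numeric = mean_measure_numeric(t, T, x, xi, mu, sigma)
```

Because the intensity does not depend on time, the time integral reduces to the factor $T - t$, which is why the implied process of exceedances of the level $x$ is homogeneous in time.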

Moreover, calculating the tail of the excess distribution over thresholdu as the ratio of the intensities of excee