(EVT) in risk man-

(1)

Alexander J. McNeil

Departement Mathematik

ETH Zentrum

CH-8092 Zurich

Tel: +41 1 632 61 62

Fax: +41 1 632 15 23

[email protected]

May 17, 1999

Abstract

We providean overview oftherole of extremevalue theory(EVT)inrisk

man-agement (RM), as a method formodellingand measuring extreme risks. We

con-centrate onthepeaks-over-threshold(POT) modeland emphasizethegeneralityof

this approach. Wherever the tail of a loss distribution is of interest, whether for

market, credit, operationalor insurance risks, the POT method provides a simple

toolforestimatingmeasuresoftailrisk. InparticularweshowhowthePOTmethod

maybeembeddedinastochasticvolatilityframeworktodeliverusefulestimates of

Value-at-Risk (VaR) and expected shortfall,a coherent alternative to theVaR,for

marketrisks. Furthertopicsofinterest,includingmultivariateextremes,modelsfor

stress losses and software forEVT,arealso discussed.

1 A General Introduction to Extreme Risk

Extreme event riskispresent inallareasof riskmanagement. Whether weare concerned

with market, credit, operational or insurance risk, one of the greatest challenges to the

riskmanageristoimplementriskmanagementmodelswhichallowforrarebut damaging

events,and permit the measurement of their consequences.

This paper may be motivated by any numberof concrete risk management problems.

Inmarketrisk,wemightbeconcernedwiththe daytodaydeterminationofthe V

alue-at-Risk (VaR)for the losses we incur ona tradingbook due to adverse market movements.

In credit or operational risk management our goal might be the determination of the

risk capitalwerequire asacushionagainstirregularlosses fromcreditdowngradings and

defaults or unforeseen operationalproblems.

Alongside these nancial risks, it is also worth considering insurance risks; the

in-surance world has considerable experience inthe management of extreme risk and many

methodswhichwemightnowrecognize asbelongingtoextremevalue theory have along

AlexanderMcNeilisSwissReResearchFellowatETHZurichandgratefullyacknowledgesthe

(2)

reservesfor productswhichoerprotectionagainst catastrophiclosses,suchas

excess-of-loss (XL)reinsurance treaties concluded withprimary insurers.

Whatever the typeof riskwe are considering our approach toitsmanagementwillbe

similar in this paper. We will attempt to model it in such away that the possibility of

an extreme outcome is addressed. Using our modelwe willattempt to measure the risk

with a measurement which provides information about the extreme outcome. In these

activities extreme value theory (EVT) willprovidethe toolswe require.

1.1 Modelling Extreme Risks

The standard mathematicalapproach tomodellingrisks uses the languageof probability

theory. Risks are random variables, mapping unforeseen future states of the world into

valuesrepresentingprotsand losses. Theserisksmaybeconsideredindividually,orseen

aspartofastochasticprocesswherepresentrisksdependonpreviousrisks. Thepotential

valuesofariskhaveaprobabilitydistributionwhichwewillneverobserveexactlyalthough

past losses due to similar risks, where available, may provide partial information about

that distribution. Extreme events occur when a risk takes values from the tail of its

distribution.

We develop a model for a risk by selecting a particular probability distribution. We

mayhaveestimatedthisdistributionthrough statisticalanalysisofempiricaldata. Inthis

case EVT is a tool which attempts to provide us with the best possible estimate of the

tailarea of the distribution. However, even in the absence of useful historicaldata, EVT

provides guidance on the kind of distribution we should select so that extreme risks are

handled conservatively.

1.2 Measuring Extreme Risks

For our purposes, measuring a risk means summarising its distribution with a number

known asa risk measure. At the simplestlevel,we mightcalculatethe mean orvariance

of a risk. These measure aspects of the risk but do not provide much information about

the extreme risk. In this paper we will concentrate on two measures which attempt to

describe the tail of a loss distribution - VaR and expected shortfall. We shall adopt the

convention that a loss is a positive number and a prot is a negative number. EVT is

most naturallydevelopedasatheory oflargelosses, ratherthanatheory ofsmallprots.

VaRisahighquantileofthedistributionoflosses,typicallythe95thor99thpercentile.

It providesa kind of upper bound for a loss that is only exceeded ona smallproportion

of occasions. Itissometimesreferredtoasacondencelevel,althoughthis isamisnomer

whichis at odds with standard statisticalusage.

In recent papers Artzner, Delbaen, Eber & Heath (1997) have criticized VaR as a

measure of riskon two grounds. Firstthey showthat VaR isnot necessarilysubadditive

so that, in their terminology, VaR is not a coherent risk measure. There are cases where

a portfolio can be split into sub-portfolios such that the sum of the VaR corresponding

to the sub-portfolios is smaller than the VaR of the total portfolio. This may cause

problems if the risk-management system of anancial institution isbased on VaR-limits

for individual books. Moreover, VaRtells usnothing about the potentialsize of the loss

that exceeds it.

(3)

and is coherent according totheir denition.

1.3 Extreme Value Theory

The approach to EVT in this paper follows most closely Embrechts, Kluppelberg &

Mikosch (1997); other recent texts on EVT include Reiss & Thomas (1997) and

Beir-lant, Teugels & Vynckier (1996). All of these texts emphasize applications of the theory

in insurance and nance although much of the original impetus for the development of

the methodscame from hydrology.

Broadlyspeaking,therearetwoprincipalkindsofmodelforextremevalues. Theoldest

groupofmodelsaretheblockmaximamodels;thesearemodelsforthelargestobservations

collected from large samples of identically distributed observations. For example, if we

record dailyorhourly losses and protsfrom tradingaparticular instrumentorgroup of

instruments, the block maxima method provides a modelwhich may be appropriate for

the quarterly or annual maximum of such values. We see a possible role for this method

inthedenition andanalysisofstresslosses(McNeil1998)and willreturntothis subject

in Section4.1.

Amoremoderngroupofmodelsare thepeaks-over-threshold(POT)models;theseare

models for all large observations which exceed a high threshold. The POT models are

generally considered to be the most useful for practical applications, due to their more

eÆcient useof the(often limited)dataonextremevalues. This paperwillconcentrateon

such models.

Within the POT class of models one may further distinguish two styles of

analy-sis. There are the semi-parametric models built around the Hill estimator and its

rela-tives(Beirlantetal.1996,Danielsson,Hartmann&deVries 1998)andthefully

paramet-ric models based on the generalized Pareto distribution or GPD (Embrechts, Resnick &

Samorodnitsky1998). There islittleto pickand choose between these approaches -both

are theoreticallyjustiedand empiricallyusefulwhenusedcorrectly. Wefavour thelatter

style of analysis for reasons of simplicity - both of exposition and implementation. One

obtains simpleparametric formulae for measures ofextreme risk forwhichit isrelatively

easy to give estimates of statistical error using the techniques of maximum likelihood

inference.

The GPD will thus be the main tool we describe in this paper. It is simplyanother

probability distribution but for purposes of risk management it should be considered as

equally important as (if not more importantthan) the Normal distribution. The tails of

the Normal distribution are toothin to addressthe extremeloss.

We willnot describe the Hill estimatorapproach inthis paper; we referthe readerto

the references above and alsoto Danielsson& de Vries (1997).

2 General Theory

LetX

1 ;X

2

;::: be identicallydistributedrandomvariableswith unknown underlying

dis-tribution function F(x) = PfX

i

xg. (We work with distribution functions and not

densities.) The interpretationof these randomrisks isleft tothe reader. They mightbe:

Daily(negative)returns onnancialasset or portfolio{ losses and prots

(4)

Catastrophic insurance claims

Credit losses

Moreover, they might represent risks which we can directly observe or they might also

represent risks which we are forced tosimulate in some Monte Carlo procedure, because

of the impracticality of obtaining data. There are situations where, despite simulating

fromaknownstochasticmodel,thecomplexityofthesystem issuchthatwedonotknow

exactly what the lossdistribution F is.

Weavoidassumingindependence, whichfor certainof theabove interpretations

(par-ticularly marketreturns) is well-known tobeunrealistic.

2.1 Measures of Extreme Risk

Mathematically we dene our measures of extreme risk in terms of the loss distribution

F. Let1>q0:95 say. Value-at-Risk (VaR) isthe qthquantile of the distribution F

VaR

q =F

1

(q);

where F 1

isthe inverse of F,and expectedshortfall isthe expected losssize, given that

VaRis exceeded

ES

q

=E[X jX >VaR

q ]:

Like F itself these are theoretical quantities which we will never know. Our goalin risk

measurement isestimates d

VaR

q and

c

ES

q

of thesemeasures and in this chapterweobtain

twoexplicit formulae (6)and (10). The reader whowishes toavoid mathematicaltheory

may skim through the remaining sections of this chapter, pausing only to observe that

the formulae inquestion are simple.

2.2 Generalized Pareto Distribution

The GPD is a two parameter distributionwith distribution function

G

; (x)=

8

<

:

1 (1+x=) 1=

6=0;

1 exp ( x=) =0;

where >0,and where x0when 0and 0x = when <0.

Thisdistributionisgeneralizedinthesensethatitsubsumescertainotherdistributions

underacommonparametricform. isthe importantshapeparameterofthe distribution

and is an additionalscaling parameter. If >0 then G

;

is a reparametrized version

of the ordinaryParetodistribution, which has along history inactuarialmathematicsas

a modelfor large losses; = 0 corresponds to the exponential distribution and < 0 is

known asa Pareto type II distribution.

The rst case is the most relevant for risk management purposes since the GPD is

heavy-tailed when > 0. Whereas the normal distribution has moments of all orders, a

heavy-tailed distributiondoes not possess a complete set of moments. In the case of the

GPD with>0wendthat E[X k

(5)

fourth moment.

Certain types of large claims data in insurance typically suggest an innite second

moment; similarly econometricians might claim that certain market returns indicate a

distribution with innite fourth moment. The normal distribution cannot model these

phenomena but the GPD is used to capture precisely this kind of behaviour, as we shall

explain in the next sections. To make matters concrete we take a popular insurance

example. Our data consist of 2156 large industrial re insurance claims from Denmark

covering the years 1980 to 1990. The reader may visualize these losses in any way he or

she wishes.

2.3 Estimating Excess Distributions

The distribution of excesseslosses over ahigh threshold uis dened tobe

F

u

(y)=P fX uyjX >ug; (1)

for 0y<x

0

u where x

0

1 is the right endpoint of F, tobeexplained below. The

excess distribution represents the probability that a loss exceeds the threshold u by at

most anamount y, given the informationthat it exceeds the threshold. It is very useful

to observe that it can be writtenin terms of the underlyingF as

F

u (y)=

F(y+u) F(u)

1 F(u)

: (2)

Mostlywewould assumeour underlyingF isa distributionwithaninniteright

end-point,i.e.itallows thepossibilityofarbitrarilylargelosses, even ifitattributes negligible

probability tounreasonably large outcomes, e.g. the normal ort{distributions. But itis

also conceivable, in certain applications, that F could have a nite right endpoint. An

example is the beta distribution on the interval [0;1] which attributes zero probability

to outcomes larger than 1 and which might be used, for example, as the distribution of

credit losses expressed asa proportionof exposure.

Thefollowinglimittheorem isakey resultinEVTandexplainsthe importanceof the

GPD.

Theorem 1 Foralargeclassof underlyingdistributionswecanndafunction(u)such

that

lim

u!x0

sup

0y<x

0 u

jF

u

(y) G

;(u)

(y)j=0:

That is,foralarge class ofunderlyingdistributionsF, asthethreshold uis progressively

raised, the excess distribution F

u

converges to a generalized Pareto. The theorem is of

course not mathematically complete, because we fail to say exactly what we mean by a

large class of underlying distributions. For this paper it is suÆcient to know that the

class contains allthe commoncontinuous distributions of statistics and actuarialscience

(normal, lognormal, 2

, t, F, gamma,exponential,uniform, beta, etc.).

In the sense of the above theorem, the GPD is the natural model for the unknown

excess distributionabove suÆcientlyhigh thresholds,and this factisthe essential insight

on which our entire method is built. Our model for a risk X

i

(6)

to be exactlyGPD for some and

F

u

(y)=G

;

(y): (3)

AssumingwehaverealisationsofX

1 ;X

2

;::: weuse statisticstomakethe modelmore

precise by choosing a sensible u and estimating and . Supposing that N

u

out of a

total ofn data pointsexceed thethreshold, the GPDis ttedtothe N

u

excesses by some

statistical tting method to obtain estimates ^

and ^

. We favour maximum likelihood

estimation(MLE)oftheseparameters,wheretheparametervaluesarechosentomaximize

the joint probabilitydensity of the observations. This is the most generaltting method

instatisticsanditalsoallowsustogiveestimates ofstatisticalerror (standarderrors)for

the parameter estimates.

Choiceofthe threshold isbasicallyacompromisebetween choosingasuÆciently high

threshold so that the asymptotic theorem can be considered to be essentially exact and

choosing a suÆciently low threshold so that we have suÆcient material for estimation of

the parameters. For further informationonthis data-analytic issue see McNeil(1997).

Forour demonstrationdata, we take a threshold at10 (millionKrone). This reduces

our n=2156 losses to N

u

=109 threshold exceedances. On the basis of these data and

are estimated to be 0.50 and 7.0; the value of shows the heavy-tailedness of the

data and suggests a good explanatory model may have an innite variance. In Figure 1

the estimated GPD model for the excess distribution is shown as a smooth curve. The

empiricaldistributionofthe109 extremevaluesisshownby points;itisevidentthe GPD

modelts these excess losses well.

2.4 Estimating Tails of Distributions

By setting x = u+y and combining expressions (2) and (3) we see that our model can

also bewritten as

F(x)=(1 F(u))G

;

(x u)+F(u); (4)

for x>u. Thisformulashows that wemay moveeasilytoaninterpretationof the model

in terms of the tailof the underlying distributionF(x) for x>u.

Ouraimistouse (4)toconstruct atail estimatorand the onlyadditionalelementwe

require to dothis is anestimate of F(u). Forthis purpose we take the obvious empirical

estimator(n N

u

)=n. That is,we use the methodof historicalsimulation (HS).

An immediate question is, why do we not use the HS method to estimate the whole

tail of F(x) (i.e. for all x u)? This is because historical simulation is a poor method

in the tail of the distribution where data become sparse. In setting a threshold atu we

are judging that we have suÆcient observations exceeding u to enable a reasonable HS

estimate of F(u), but for higher levels the historicalmethod would be too unreliable.

Putting our HS estimate of F(u) and our maximum likelihood estimates of the

pa-rameters of the GPD together we arrive atthe tail estimator

^

F(x)=1 N

u

n

1+ ^

x u

^

1= ^

; (5)

itisimportanttoobservethatthisestimatorisonlyvalidforx>u. Thisestimatecanbe

(7)

bestunderstoodinthe situationwhenthe datamayalsobeassumed independent oronly

weakly dependent.

ForourdemonstrationdatatheHSestimateofF(u)is0.95((2156 109)=2156)sothat

ourthresholdispositioned(approximately)atthe95thsamplepercentile. Combiningthis

with our parametric model for the excess distribution we obtain the tail estimateshown

inFigure2. Inthis gurethey-axisactuallyindicatesthetailprobabilities1 F(x). The

topleftcornerofthegraphshowsthatthethresholdof10correspondstoatailprobability

of 0.05(asestimatedby HS).The pointsagainrepresentthe109 largelossesandthe solid

curve shows howthe tailestimationformulaallowsextrapolationintothe areawhere the

data becomea sparse and unreliable guide totheir unknown parent distribution.

2.5 Estimating VaR

For a given probability q > F(u) the VaR estimate is calculated by inverting the tail

estimation formula(5) toget

d

VaR

q

=u+ ^

^

n

N

u

(1 q)

^

1 !

: (6)

Instandardstatisticallanguagethisisaquantileestimate,wherethequantileisan

un-knownparameterofanunknownunderlyingdistribution. Itispossibletogiveacondence

interval for d

VaR

q

using a method known as prole likelihood; this yields an asymptotic

interval in which we have condence that VaR lies. The asymmetric interval reects a

fundamental asymmetry in the problem of estimating a high quantile for heavy-tailed

data: it iseasier to bound the intervalbelowthan to bound itabove.

In Figure 3 we estimate VaR

0:99

to be 27.3. The vertical dotted line intersects with

the tailestimateat the point(27:3;0:01) and allows the VaR estimateto beread o the

x-axis. The dotted curve is a tool to enable the calculation of a condence interval for

the VaR.The secondy-axison theright ofthe graphis acondence scale(not aquantile

scale). The horizontal dotted line corresponds to 95% condence; the x-coordinates of

the two points where the dotted curve intersects the horizontal line are the boundaries

of the 95% condence interval (23:3;33:1). Two things should be observed: we obtain a

wider 99% condence interval by dropping the horizontal line down to the value 99 on

the condence axis; the interval isasymmetric asdesired.

2.6 Estimating ES

Expected shortfallis related toVaRby

ES

q

=VaR

q

+E[X VaR

q

jX >VaR

q

]; (7)

where the second term is simply the mean of the excess distribution F

VaR

q

(y) over the

threshold VaR

q

. Our model for the excess distribution above the threshold u (3) has

a nice stability property. If we take any higher threshold, such as VaR

q

for q > F(u),

then the excess distribution abovethe higherthreshold isalsoGPD with the sameshape

parameter, but adierentscaling. Itis easilyshown that aconsequence of themodel(3)

is that

F

VaRq

(y)=G

;+(VaRq u)

(8)

VaR.Withthis modelwecan calculatemanycharacteristicsofthe lossesbeyond VaR.By

notingthat(provided<1)themeanofthedistributionin(8)is(+(VaR

q

u))=(1 ),

we can calculatethe expected shortfall. We nd that

ES

q

VaR

q =

1

+

u

(1 )VaR

q

: (9)

It isworth examining this ratio a littlemoreclosely inthe case where the underlying

distribution has aninniterightendpoint. In this case the ratio islargely determinedby

thefactor1=(1 ). Thesecondtermontherighthandsideof(9)becomesnegligiblysmall

as the probabilityq gets nearer and nearer to 1. This asymptoticobservation underlines

the importance of the shape parameter in tail estimation. It determines how our two

risk measures dier inthe extreme regions of the loss distribution.

Expected shortfall is estimated by substituting data-based estimates for everything

whichis unknown in (9)to obtain

c

ES

q =

d

VaR

q

1 ^

+

^

u

1 ^

: (10)

For our demonstration data 1=(1 ^

) 2:0 and ( ^

^

u)=(1 ^

) 4:0. Essentially

c

ES

q

is obtained from d

VaR

q

by doublingit. Our estimateis58.2 and wehavemarked this

withasecondverticallineinFigure4. Againusingtheprolelikelihoodmethodweshow

how anestimate of the 95% condence intervalfor c

ES

q

can beadded (41.6,154). Clearly

the uncertainty about the value of our coherent risk measure is large, but this is to be

expected with such heavy-taileddata. The prudent riskmanager should be aware of the

magnitude of his uncertainty about extreme phenomena.

3 Extreme Market Risk

In the marketrisk interpretationof our randomvariables

X

t

= (logS

t

logS

t 1 ) (S

t 1 S

t )=S

t 1

; (11)

representsthe lossonaportfoliooftradedassetsondayt, whereS

t

istheclosingvalue of

the portfolioonthatday. Wechangetosubscript ttoemphasizethe temporalindexingof

our risks. As shown above,the lossmaybedened asarelativeorlogarithmicdierence,

both denitions givingvery similarvalues.

In calculating daily VaR estimates for such risks, there is now a general recognition

thatthecalculationshouldtakeintoaccountvolatilityofmarketinstruments. Anextreme

value in aperiodof high volatility appears less extreme than the same value ina period

of lowvolatility. Various authorshave acknowledged the need to scale VaR estimates by

current volatility in some way (see, for example, Hull & White (1998)). Any approach

which achieves this we will call a dynamic risk measurement procedure. In this chapter

we focusonhow thedynamic measurementof marketriskscan befurther enhanced with

EVT totakeintoaccount the extreme risk overand abovethe volatility risk.

Most market return series show a great deal of common structure. This suggests

that more sophisticated modelling is both possible and necessary; it is not suÆcient

(9)

serialcorrelationofabsoluteorsquaredreturnsishigh;returnsshowvolatilityclustering{

the tendency oflargevaluestobefollowedbyotherlarge values,althoughnotnecessarily

of the same sign.

3.1 Stochastic Volatility Models

The most popularmodels for this phenomenon are the stochasticvolatility(SV) models,

whichtake the form

X

t =

t +

t Z

t

; (12)

where

t

is the volatility of the return on day t and

t

is the expected return. These

values are considered to depend in a deterministic way on the past history of returns.

The randomness in the model comes through the random variables Z

t

, which are the

noise variablesor the innovationsof the process.

Weassumethat thenoise variablesZ

t

are independentwithanidenticalunknown

dis-tributionF

Z

(z). (By convention we assumethis distributionhas mean zero and variance

1, so that

t

is directly interpretable as the volatility of X

t

.) Although the structure of

the model causes the X

t

to be dependent we assume that the modelis such that the X

t

are identicallydistributed withunknown distribution functionF

X

(x). In the languageof

time series we assume that X

t

isa stationary process.

Models which t into this framework include the ARCH/GARCH family. A simple

example is

t

= X

t 1

; (13)

2

t

=

0 +

1 (X

t 1

t 1 )

2

+

2

t 1 ;

with

0 ,

1

, > 0, +

1

< 1 and jj < 1. This is an autoregressive process with

GARCH(1,1) errors and with a suitably chosen noise distribution this is a model which

mimics many features of real nancialreturn series.

3.2 Dynamic Risk Management

Suppose we have followed daily market movements over a period of time and we nd

ourselves at the close of day t. In dynamic risk management we are interested in the

conditional return distribution

F

Xt+1+:::+X

t+k jFt

(x); (14)

where the symbol F

t

represents the history of the process X

t

up to and includingday t.

In looking at this distribution we ask, what is the distribution of returns over the next

k 1days, given the presentmarketbackground? This isthe issue indailyVaR(orES)

calculation.

This view can be contrasted with static risk management where we are interested in

the unconditional or stationary distribution F

X

(x) (or F

X

1 +:::+X

k

(x) for a k-day return).

Here we takea complementaryviewand askquestions like,howlargeisa 100 daylossin

general? What is the magnitude of a 5-year loss?

We redene our risk measures slightly tobe quantiles and expected shortfalls for the

distribution (14) and we introduce the notation VaR t

q

(k) and ES t

q

(k). The subscript t

shows that these are dynamic measures designed for calculation at the close of day t; k

(10)

The structure of the model(12) means that the dynamic measures take simple formsfor

a 1 day horizon.

VaR t

q

=

t+1 +

t+1

VaR(Z)

q

(15)

ES t

q

=

t+1 +

t+1 ES(Z)

q ;

where VaR(Z)

q

denotes the qth quantile of a noise variable Z

i

and ES(Z)

q

is the

corre-sponding expected shortfall.

The simplest approaches to estimating a dynamic VaR make the assumption that

F

Z

(z) is a known standard distribution, typically the normal distribution. In this case

VaR(Z)

q

is easily calculated. To estimate the dynamic measure a procedure is required

to estimate tomorrow's expected return

t+1

and tomorrow's volatility

t+1

. Several

approaches are available for forecasting the mean and volatility of SV models. Two

possibilities are the exponentially weighted moving average model (EWMA) as used in

Riskmetrics or GARCH modelling. The problem with the assumption of conditional

normality is that this tends to lead to an underestimation of the dynamic measures.

Empiricalanalyses suggestthe conditionaldistributionof appropriateSV modelsfor real

data is oftenheavier-tailed than the normaldistribution.

The trick as far as augmenting the dynamic procedure with EVT is concerned, is to

apply it to the random variables Z

t

rather than X

t

. In the EVT approach (McNeil &

Frey 1998) we avoid assumingany particular form for F

Z

(z); instead we apply the GPD

tailestimationproceduretothis distribution. Weassumethatabovesomehigh threshold

u the excess distribution is exactlyGPD. The problemwith the statistical estimation of

this modelis that the Z

t

variablescannot be directly observed, but this is solved by the

followingtwo stage approach.

Suppose atthe close ofday t weconsider a timewindow containing thelastn returns

X

t n+1

;:::;X

t .

1. AGARCH-typestochasticvolatilitymodel,typicallyanARmodelwithGARCH

er-rors,isttedtothehistoricaldatabypseudomaximumlikelihood(PML).Fromthis

model the so-calledresidualsare extracted. Ifthe modelistenablethese can be

re-gardedasrealisationsoftheunobserved,independentnoisevariablesZ

t n+1

;::: ;Z

t .

The GARCH-type modelis used tocalculate 1-steppredictions of

t+1 and

t+1 .

2. EVT is applied to the residuals. Forsome choice of threshold the GPD method is

used toestimateVaR(Z)

q

and ES(Z)

q

as outlinedinthe previouschapter. The risk

measures are calculated using equations(15).

3.4 Backtesting

The procedureabove, whichwe termdynamic orconditional EVT, isa successfulway of

adaptingEVTtothe special taskof dailymarketrisk measurement. This can beveried

by backtesting the method onhistoricalreturn series.

Figure5 shows a dynamicVaR estimateusing the conditional EVT method fordaily

losses on the DAX index. At the close of every day the method is applied to the last

1000 data points using a threshold u set at the 90th sample percentile of the residuals.

Volatilityand expected returnforecasts arebased onanAR(1)modelwith GARCH(1,1)

(11)

static EVT as inChapter 2. This changes onlygradually (as extreme observations drop

occasionallyfromthe back of the movingdata window).

A VaR estimation method is backtested by comparing the estimates with the actual

losses observed on the next day. A VaR violation occurs when the actual loss exceeds

the estimate. Various dynamic(and static)methodsofVaR estimationcan be compared

by counting violations; tests of the violation counts based on the binomial distribution

can show when a systematicunderestimation or overestimation of VaR seems to be

tak-ing place. It is also possible to devise backtests which compare dynamic ES estimates

with actual incurred losses exceeding the VaR on days when VaR violation takes place

(see McNeil&Frey(1998) for details).

S&P DAX

Lengthof Test 7414 5146

VaR

0:95

Expected violations 371 257

DynamicEVT violations 366 (0.41) 258 (0.49)

Dynamicnormal violations 384 (0.25) 238 (0.11)

StaticEVT violations 402 (0.05) 266 (0.30)

VaR

0:99

DynamicEVT violations 73(0.48) 55 (0.33)

Dynamicnormal violations 104 (0.00) 74 (0.00)

StaticEVT violations 86(0.10) 59 (0.16)

VaR

0:995

DynamicEVT violations 43(0.18) 24 (0.42)

Dynamicnormal violations 63(0.00) 44 (0.00)

StaticEVT violations 50(0.02) 36 (0.03)

Table 1: Some VaR backtesting results for two major indices. Values in brackets are

p-valuesforastatisticaltest ofthe success of themethod;values smallerthan 0.05 indicate

failure.

McNeiland Frey compare conditional EVTwith other dynamic approaches whichdo

not explicitly model the tail risk associated with the innovation distribution. In

par-ticular, they compare the approach with methods which assume normally distributed or

t{distributedinnovations(whichtheylabelthe dynamicnormalanddynamictmethods).

They alsocompare dynamic with static EVT.

TheVaRviolationsrelatingtotheDAXdatainFigure5are showninFigure6forthe

dynamicEVT,dynamicnormalandstaticEVTmethods,thesebeingdenotedrespectively

by the circular,triangular and square plottingsymbols. It is apparentthat although the

dynamic normal estimate reacts to volatility, itis violated more oftenthan the dynamic

EVT estimate; it is alsoclear that the static EVT estimate tends to be violated several

times inarowinperiodsof high volatilitybecauseitis unabletoreact swiftlyenoughto

the changingvolatility. Theseobservations areborneout by theresults inTable1, which

is a sampleof the backtesting results in McNeil&Frey(1998).

(12)

Dynamic EVT is in general the best method for estimating VaR

q

for q 0:95.

(Dynamic t is an eective simple alternative, if returns are not too asymmetric.)

For q0:99, the dynamic normalmethodis not goodenough.

The dynamic normalmethod isuseless for estimatingES t

q

, even when q=0:95. To

estimate expected shortfall adynamic procedurehas tobe enhanced with EVT.

Itis worth understanding inmoredetail why EVTisparticularlynecessary for

calcu-lating expected shortfall estimates. In a stochastic volatility modelthe ratio ES t

q =VaR

t

q

is essentially given by ES(Z)

q

=VaR(Z)

q

, the equivalent ratio for the noise distribution.

We have already observed in (9) that this ratio is largely determined by the weight of

the tail of the distribution F

Z

(z) as summarized by the parameter of a suitable GPD

approximation.

WehavetabulatedsomevaluesforthisratioinTable2inthecasewhenF

Z

(z)admitsa

GPDtailapproximationwith=0:22(thethresholdbeingsetatu=1:2with =0:57).

The ratio is compared with the equivalent ratio fora normal innovationdistribution.

q 0.95 0.99 0.995 q !1

GPD tail 1.52 1.42 1.39 1.29

Normal 1.25 1.15 1.12 1.00

Table 2: ESto VaRratios under two models for the noise distribution.

Clearly the ratios are smaller for the normal distribution. If we erroneously assume

conditional normality in our models, not only dowe tend to underestimate VaR, but we

also underestimate the ES/VaR ratio. Our error for ES is magnied due to this double

underestimation.

3.5 Multiple day horizons

Formultipleday horizons(k>1)wedonot havethesimple expressions fordynamicrisk

measures which we had in (15). Explicit estimation of the risk measures is diÆcult and

it is attractive to want to use a simple scaling rule, like the famous square root of time

rule, toturn one dayVaRintok-dayVaR.Unfortunately,square rootof timeisdesigned

for the case when returns are normally distributed and is not appropriate for the kind

of SV model driven by heavy-tailednoise that we consider realistic. Nor is itnecessarily

appropriateforscalingdynamicriskmeasures,whereonemightimaginecurrentvolatility

should be taken intoaccount.

It is possible to adopt a Monte Carlo approach to estimating dynamic risk measures

for longer time horizons. Possible future paths for the SV model of the returns may be

simulated and possible k-day losses calculated.

To calculate a single future path on day t we could proceed as follows. The noise

distribution is modelled with a composite model consisting of GPD tail estimates for

both tails and a simple empirical(i.e. historical simulation)estimate based onthe model

residuals in the centre. k independent values Z

t+1

;::: ;Z

t+k

are simulated from this

model(for detailsof the necessary randomnumbergeneratorsee McNeil&Frey (1998)).

Using the noise values and the current estimated volatilityfrom the tted GARCH-type

model, future values of the return process X

t+1

;::: ;X

t+k

(13)

distribution of the k-day loss(14).

By repeating this process many times to obtain a sample of values (perhaps 1000)

from the target distribution and then applying the GPD tail estimation procedure to

these simulated data, reasonable estimates of the risk measures may beobtained.

Such simulationresultscan thenbeused toexamine the natureofthe impliedscaling

law. McNeilandFreyconduct suchanexperimentand suggestthat forhorizonsupto50

days VaR t

q

(k) typically obeysa powerscaling law of the form

VaR t

q

(k)=VaR t

q k

t

;

where

t

depends on the current volatility. Their results are summarized in Table 3.

In their experiment square root of time scaling (

t

= 0:5) is appropriate on days when

estimated volatility ishigh; otherwise a a largerscaling exponent issuggested.

q 0.95 0.99

low volatility 0.65 0.65

average volatility 0.60 0.59

high volatility 0.48 0.47

Table 3: Typical scaling exponents for multiple day horizons. Low, average and high

volatilities are taken to be the 5th, 50th and 95th percentiles of estimated historical

volatilitiesrespectively.

4 Other Issues

In this sectionwe providebriefer noteson some other relevant topics inEVT.

4.1 Block Maxima Models for Stress Losses

For a more complete understanding of EVT we should be aware of the block maxima

models. Although less useful than the threshold models, these models are not without

practicalrelevance and could be used to provide estimates of stress losses.

Theorem1isnotreallyamathematicalresultasitpresentlystands. Wecouldmakeit

mathematically complete by saying that distributions which admit the asymptotic GPD

model for their excess distribution are precisely those distributions in the maximum

do-main of attraction of an extreme value distribution. To understand this statement we

must rst dene the generalized extremevalue distribution (GEV).

The distributionfunction of the GEV is given by

H

(x)=

exp ( (1+x) 1=

) 6=0;

exp ( e x

) =0;

where 1 +x > 0 and is the shape parameter. As in the case of the GPD, this

parametric form subsumes distributions which are known by other names. When > 0

the distribution isknown as Frechet; when =0 itis aGumbeldistribution;when <0

(14)

1

X

2

, ::: are independent identically distributed losses with distribution function F as

earlieranddenethemaximumofablockofnobservationstobeM

n

=max(X

1

;:::;X

n ).

Supposeitispossibletondsequencesofnumbersa

n

>0andb

n

suchthatthedistribution

of (M

n b

n )=a

n

, converges tosome limitingdistributionH asthe block size increases. If

this occurs F is said to be inthe maximumdomain of attraction of H. Tobeabsolutely

technically correctwe should assumethis limitisa non-degenerate (reasonably behaved)

distribution. We also note that the assumption of independent losses is by no means

important for the result that now follows and can be dropped if some additional minor

technical conditions are fullled.

Theorem 2 If F is in the maximum domain of attraction of a non-degenerate H then

this limit must be an extreme value distribution of the form H(x) = H

((x )=), for

some , and >0.

This result is known as the Fisher-Tippett Theorem and occupies ananalogous position

with respect to the study of maxima as the famous central limit theorem holds for the

study of sums or averages. Fisher-Tippett essentially says that the GEV is the only

possiblelimiting distributionfor (normalized) block maxima.

If the of the limiting GEV is strictly positive, F is said to be in the maximum

domain of attraction of the Frechet. Distributions in this class include the Pareto, t,

Burr, loggammaandCauchy distributions. If=0then F isinthe maximumdomainof

attraction of the Gumbel;examples are the normal,lognormaland gammadistributions.

If < 0 then F is in the maximum domain of attraction of the Weibull; examples are

the uniform and beta distributions. Distributions inthese three classes are precisely the

distributions for which excess distributions converge toa GPD limit.

Toimplementananalysisofstresslossesbasedonthislimitingmodelforblockmaxima

werequirealotofdata,sincewemustdeneblocksandreducethesedatatoblockmaxima

only. Suppose, forthesakeofillustration,thatwehavedaily(negative)returndatawhich

we divide into k large blocks of essentially equalsize; for example, we might take yearly

or semesterly blocks. Let M (j)

n

= max(X (j)

1 ;X

(j)

2

;::: ;X (j)

n

) be the maximum of the n

observations in block j. Usingthe methodof maximum likelihoodwetthe GEVto the

block maxima data M (1)

n

;::: ;M (k)

n

. That is we assume that our block size is suÆciently

large so that the limitingresult of Theorem 2may be taken asapproximatelyexact.

Suppose that we t a GEV model H

^

;^;^

to semesterly maxima of daily negative

re-turns. Thenaquantileofthisdistributionisastressloss. H 1

^

;^;^

(0:95)givesthemagnitude

ofdailylosslevelwemightexpect toreachevery20semestersor10years. Thisstressloss

is known asthe 20semester return leveland can beconsidered asakindof unconditional

quantileestimateforthe unknown underlyingdistributionF. In Figure7weshowthe20

semester return level for daily negative returns on the DAX index; the return level itself

is marked by asolidlineand anasymmetric95%condence intervalismarked by dotted

lines. In 23 years of data 4 observations exceed the point estimate; these 4 observations

occurin3dierentsemesters. Inafullanalysiswewould ofcoursetryaseriesofdierent

block sizes and compare results. See Embrechts etal. (1997)for amore detailed

descrip-tion of both the theoryand practice ofblock maximamodellingusing theFisher-Tippett

(15)

So far we have been concerned with univariate EVT. We have modelled the tails of

univariate distributions and estimated associated risk measures. In fact, there is also a

multivariate extreme value theory (MEVT) and this can be used to model the tails of

multivariate distributions in a theoretically supported way. In a sense MEVT is about

studying the dependence structure of extreme events, as we shall nowexplain.

Consider the random vector X = (X

1

;:::;X

d )

0

which represents losses of d dierent

kindsmeasured atthe same pointintime. Weassumetheselosses have jointdistribution

F(x

1

;:::;x

d

) = PfX

1 x

1

;::: ;X

d x

d

g and that individual losses have continuous

marginal distributions F

i

(x) = PfX

i

xg. It has been shown by Sklar (see Nelsen

(1999)) that every jointdistribution can be writtenas

F(x

1

;:::;x

d

)=C(F

1 (x

1

);::: ;F

d (x

d ));

for a unique function C that is known as the copula of F. A copula may be thought

of in two equivalent ways: as a function (with some technical restrictions) that maps

values inthe unit hypercube tovalues inthe unit interval;as a multivariatedistribution

function with standard uniform marginal distributions. The copula C does not change

under (strictly)increasingtransformationsof thelosses X

1

;::: ;X

d

and itmakessenseto

interpret C asthe dependence structure ofX orF, asthe followingsimple illustrationin

d=2 dimensions shows.

We take the marginal distributions to be standard univariate normal distributions

F

1 = F

2

= . We can then choose any copula C (i.e. any bivariate distribution with

uniform marginals)and apply ittothese marginalstoobtainbivariate distributionswith

normalmarginals. Forone particular choice ofC,whichwe calltheGaussiancopula and

denoteC Ga

,weobtainthestandard bivariatenormaldistributionwithcorrelation. The

Gaussian copula does not have a simple closed form and must be written as a double

integral - consult Embrechts, McNeil & Straumann (1999) for more details. Another

interesting copulais the Gumbel copulawhich doeshave asimple closed form,

C Gu

(v

1 ;v

2

)=exp

n

( logv

1 )

1=

+( logv

2 )

1= o

; 0< 1: (16)

Figure8shows thebivariatedistributionswhicharisewhenweapplythetwocopulasC Ga

0:7

and C Gu

0:5

to standard normal marginals. The left-hand picture is the standard bivariate

normal with correlation 70%; the right-hand picture is a bivariate distribution with

ap-proximatelyequal correlationbut the tendency togenerate extreme values of X

1

and X

2

simultaneously. It is,in this sense, a more dangerous distribution for risk managers. On

the basis of correlation, these distributions cannot be dierentiated but they obviously

have entirely dierent dependence structures. The bivariate normal has rather weak tail

dependence; the normal-Gumbeldistribution has pronounced tail dependence. For more

examples of parametric copulasconsult Nelsen (1999) orJoe(1997).

OnewayofunderstandingMEVTisasthe studyofcopulaswhichariseinthelimiting

multivariate distribution of componentwise block maxima. What do we mean by this?

SupposewehaveafamilyofrandomvectorsX

1 ;X

2

;::: representingd-dimensionallosses

at dierent points in time, where X

i = (X

i1

;::: ;X

id )

0

. A simple interpretation might

be that they represent daily (negative) returns for d instruments. As for the univariate

discussion ofblockmaxima, weassumethat lossesatdierentpointsintime are

indepen-dent. Thisassumption simpliesthe statement of the result,but can again be relaxedto

(16)

We denethe vectorof componentwiseblockmaxima tobe=(M

1n

;::: ;M

dn

) where

M

jn

= max(X

1j

;:::;X

nj

) is the block maximum of the jth component for a block of

size n observations. Now consider the vector of normalized block maxima given by

((M

1n b

1n )=a

1n

;::: ;(M

dn b

dn )=a

dn )

0

,wherea

jn

>0andb

jn

arenormalizingsequences

asinSection4.1. Ifthisvector converges indistribution toanon-degenerate limiting

dis-tributionthen this limitmust have the form

C

H

1

x

1

;::: ;H

d

x

d

;

for some values of the parameters

j ,

j

and

j

and some copula C. It must have this

form because of univariate EVT. Each marginaldistribution of the limitingmultivariate

distribution must be aGEV, aswe learnedin Theorem 2.

MEVTcharacterizes the copulasC which may arise inthis limit-the so-calledMEV

copulas. ItturnsoutthatthelimitingcopulasmustsatisfyC(u t

1

;::: ;u t

d )=C

t

(u

1

;::: ;u

d )

for t >0. There is nosingle parametric family which contains allthe MEV copulas, but

certain parametric copulas are consistent with the above condition and might therefore

beregarded as natural models for the dependence structure of extreme observations.

In two dimensions the Gumbel copula (16) is an example of an MEV copula; it is

moreover a versatile copula. If the parameter is 1 then C Gu

1 (v

1 ;v

2 ) = v

1 v

2

and this

copulamodelsindependenceofthecomponentsofarandomvector(X

1 ;X

2 )

0

. If 2(0;1)

then the Gumbel copula models dependence between X

1

and X

2

. As decreases the

dependence becomesstrongeruntilavalue =0correspondstoperfect dependenceofX

1

and X

2

; this means X

2

=T(X

1

) for some strictly increasing function T. For < 1 the

Gumbelcopula shows taildependence - thetendency ofextreme values tooccurtogether

as observed in Figure8. For more detailssee Embrechts etal.(1999).

The Gumbel copula can be used to build tail models in two dimensions as follows.

Suppose two risk factors (X

1 ;X

2 )

0

have an unknown joint distribution F and marginals

F

1

and F

2

so that, for some copula C, F(x

1 ;x

2

) = C(F

1 (x

1 );F

2 (x

2

)). Assume that we

have n pairs of data points fromthis distribution. Usingthe univariate POT methodwe

model the tails of the two marginal distributions by picking high thresholds u

1

and u

2

and using tail estimators of the form(5) toobtain

^

F

i

(x)=1 N

u

i

n

1+ ^

i

x u

i

^

i

1= ^

i

;x>u

i

; i=1;2:

Wemodelthe dependence structure of observations exceeding these thresholds using the

Gumbel copulaC Gu

^

for someestimated value ^

of the dependence parameter . We put

tail models and dependence structure together to obtaina modelfor the jointtail of F.

^

F(x

1 ;x

2 )=C

Gu

^

F

1 (x

1 );

^

F

2 (x

2 )

;x

1 >u

1 ; x

2 >u

2 :

The estimateof the dependence parameter can bedeterminedbymaximum likelihood,

either in a second stage after the parameters of the tail estimators have been estimated

or in a single stage estimation procedure where all parameters are estimated together.

Forfurther detailsofthese statisticalmatterssee Smith(1994). Forfurtherdetailsof the

theory consultJoe (1997).

This is perhaps the simplest bivariate POT model one can devise and it could be

(17)

a small number of dimensions. If we are interested in only a few risk factors and are

particularly concerned that joint extreme values may occur, we can use such models to

get useful descriptions of the joint tail. In very high dimensions there are simply too

manyparameters toestimateandtoomanydierenttailsof themultivariatedistribution

to worry about - the so-called curse of dimensionality. In such situations collapsing the

problemtoaunivariateproblemby consideringawholeportfolioofassets asasinglerisk

and collecting data ona portfoliolevel seems more realistic.

4.3 Software for EVT

We are aware of two software systems for EVT. EVIS (Extreme Values In S-Plus) is a

suite of free S-Plus functions for EVT developed at ETHZurich. To use these functions

it is necessary to have S-Plus, either for UNIXor Windows.

The functions provide assistance with four activities: exploring data to get a feel

for the heaviness of tails; implementing the POT method as described in Section 2;

im-plementing analyses of block maxima as described in Section 4.1; implementing a more

advanced form of the POT method known as the point process approach. The EVIS

functions provide simple templates which an S-Plus user could develop and incorporate

into a customized risk management system. In particular EVIS combines easily with

the extensive S-Plus time series functions orwith the S+GARCH module. This permits

dynamic risk measurement asdescribed inSection 3.

XTREMES is commercial software developed by Rolf Reiss and Michael Thomas at

the University of Siegen in Germany. It is designed to run as a self-contained program

under Windows (NT, 95,3.1). Fordidacticpurposes this programisverysuccessful; itis

particularlyhelpful for understanding the dierentsorts of extreme value modelling that

are possible and seeing how the models relate to each other. However, a user wanting to

adapt XTREMES for risk management purposes will need to learn and use the

Pascal-likeintegratedprogramminglanguageXPLthatcomeswithXTREMES.Thestand-alone

nature of XTREMES means that the user does not have access to the extensive libraries

of pre-programmed functions that packages likeS-Plus oer.

EVIS maybedownloaded overthe internetathttp://www.math.ethz.ch/ mcn eil.

InformationonXTREMEScanbefoundathttp://www.xtremes.math.un i-si egen .de.

5 Conclusion

EVT is here to stay as a technique in the risk manager's toolkit. We have argued in

this paperthat whenever tailsof probability distributions are of interest, it is naturalto

consider applying the theoretically supported methods of EVT. Methods based around

assumptions of normal distributionsare likely to underestimate tailrisk. Methods based

onhistoricalsimulationcan onlyprovideveryimprecise estimatesof tailrisk. EVTisthe

most scientic approach to an inherently diÆcult problem- predicting the size of a rare

event.

Wehavegivenoneverygeneralandeasilyimplementablemethod,theparametricPOT

methodof Section2,and indicatedhow this methodmay be adapted tomorespecialised

risk management problems such as the management of market risks. The reader who

wishes to learn more is encouraged to turn to textbooks like Embrechts et al. (1997)

(18)

& Stroughair (1999) for some commonsense caveats to uncritical use of EVT. We hope,

however, that all risk managers are persuaded that EVT will have an important role to

play in the development of sound risk management systems for the future.

References

Artzner, P., Delbaen, F., Eber, J. & Heath, D. (1997), `Thinking coherently', RISK

10(11), 68{71.

Beirlant,J.,Teugels,J.&Vynckier,P.(1996),Practicalanalysisofextremevalues,Leuven

University Press, Leuven.

Danielsson, J. & de Vries, C. (1997), `Tail index and quantile estimation with very high

frequency data', Journal of EmpiricalFinance 4,241{257.

Danielsson, J., Hartmann, P. & de Vries, C. (1998), `The cost of conservatism', RISK

11(1), 101{103.

Diebold, F., Schuermann, T. & Stroughair, J. (1999), Pitfalls and opportunities in the

use of extreme value theory in risk management, in `Advances in Computational

Finance', Kluwer AcademicPublishers, Amsterdam. To appear.

Embrechts, P., Kluppelberg, C. & Mikosch, T. (1997), Modelling extremal events for

insurance and nance, Springer, Berlin.

Embrechts, P., McNeil,A. & Straumann,D. (1999),`Correlation and dependency inrisk

management: properties and pitfalls', preprint,ETH Zurich.

Embrechts, P., Resnick, S. & Samorodnitsky, G. (1998), `Living on the edge', RISK

Magazine 11(1), 96{100.

Hull,J. &White, A.(1998),`Incorporatingvolatility updatingintothe historical

simula-tion methodfor value atrisk', Journal of Risk 1(1).

Joe,H.(1997),Multivariate ModelsandDependenceConcepts,Chapman&Hall,London.

McNeil,A. (1997),`Estimating the tailsof lossseverity distributionsusing extremevalue

theory', ASTIN Bulletin27, 117{137.

McNeil, A. (1998),`Historyrepeating', Risk 11(1),99.

McNeil,A.&Frey,R.(1998),`Estimation oftail-relatedriskmeasures forheteroscedastic

nancial time series: an extreme value approach', preprint,ETH Zurich.

Nelsen, R. B. (1999), An Introductionto Copulas, Springer, New York.

Reiss, R.&Thomas, M.(1997),StatisticalAnalysisof ExtremeValues,Birkhauser,Basel.

Smith, R. (1994), Multivariate threshold methods, in J. Galambos, ed., `Extreme Value

(19)

•••

• ••

•••

• ••

• • •

• ••

• • •

•

10

50

100

0.0

0.2

0.4

0.6

0.8

1.0 x (on log scale)

Fu(x-u)

Figure1: Modellingthe excess distribution.

•••••••••••••••••••••••••

_{•••••••••••••••••••}

•••••••••••••••

_{•••••••••••}

•••••••••

_{••• ••••}

•••••

_{••• •}

•••

• •

••

_•

•

10

50

100 0.00005

0.00050

0.00500

0.05000

x (on log scale)

1-F(x) (on log scale)

(20)

•••••••••••••••••••••••••

_{•••••••••••••••••••}

•••••••••••••••

_{•••••••••••}

•••••••••

_{••• ••••}

•••••

_{••• •}

•••

• •

_••

•

10

50

100 0.00005

0.00050

0.00500

0.05000

x (on log scale)

1-F(x) (on log scale)

99

95

Figure3: EstimatingVaR.

•••••••••••••••••••••••••

_{•••••••••••••••••••}

•••••••••••••••

_{•••••••••••}

•••••••••

_{••• ••••}

•••••

••• •

_•••

• •

••

_•

•

10

50

100 0.00005

0.00050

0.00500

0.05000

x (on log scale)

1-F(x) (on log scale)

99

95

99

95

(21)

Time

-0.05

0.0

0.05

0.10

0.15

01.10.87

01.04.88

01.10.88

01.04.89

01.10.89

01.04.90

Figure 5: Dynamic VaR using EVTfor the DAX index.

Time

-0.05

0.0

0.05

0.10

0.15

01.10.87

01.04.88

01.10.88

01.04.89

01.10.89

01.04.90

(22)

Time

-0.05

0.0

0.05

0.10

02.01.73

02.01.77

02.01.81

02.01.85

02.01.89

02.01.93

Figure 7: 20 semester return level for the DAX index (negative returns) is marked by

solid line; dotted linegives95% condence interval.

•

Figure 8: 10000 simulated data from two bivariate distributions with standard normal