## The Hidden Dangers of Historical Simulation

### Matthew Pritsker

*∗*

### June 19, 2001

**Abstract**

Many large financial institutions compute the Value-at-Risk (VaR) of their trading portfolios using historical simulation based methods, but the methods’ properties are not well understood. This paper theoretically and empirically examines the historical simulation method, a variant of historical simulation introduced by Boudoukh, Richard-son and Whitelaw (1998) (BRW), and the Filtered Historical Simulation method (FHS) of Barone-Adesi, Giannopoulos, and Vosper (1999). The Historical Simulation and BRW methods are both under-responsive to changes in conditional risk; and respond to changes in risk in an asymmetric fashion: measured risk increases when the portfolio experiences large losses, but not when it earns large gains. The FHS method appears promising, but requires additional refinement to account for time-varying correlations; and to choose the appropriate length of historical sample period. Preliminary anal-ysis suggests that 2 years of daily data may not contain enough extreme outliers to accurately compute 1% VaR at a 10-day horizon using the FHS method.

**1**

**Introduction**

The growth of the OTC derivatives market has created a need to measure and manage the

risk of portfolios whose value fluctuates in a nonlinear way with changes in the risk factors.

One of the most widely used of the new risk measures is Value-at-Risk, or VaR.1 A portfolio’s VaR is the most that the portfolio is likely to lose over a given time horizon except in a small

percentage of circumstances. This percentage is commonly referred to as the VaR confidence

level. For example, if a portfolio is expected to lose no more than $10,000,000 over the next day, except in 1% of circumstances, then its VaR at the 1% confidence level, over a one-day

VaR horizon is $10,000,000. Alternatively, a porfolio’s VaR at the*k*% confidence level is the

*k*’th percentile of the distribution of the change in the portfolio’s value over the VaR time
horizon.

The main advantage of VaR as a risk measure is that it is very simple: it can be used

to summarize the risk of individual positions, or of large multinational financial institutions,

such as the large dealer-banks in the OTC derivatives markets. Because of VaR’s

simplic-ity, it has been adopted for regulatory purposes. More specifically, the 1996 Market Risk

Amendment to the Basle Accord stipulates that banks’ and broker-dealers’ minimum capital

requirements for market risk should be set based on the ten-day 1-percent VaR of their trad-ing portfolios. The amendment allows ten-day 1-percent VaR to be measured as a multiple

of one-day 1-percent VaR.

Although VaR is a conceptually simple measure of risk, computing VaR in practice can

be very difficult because VaR depends on the joint distribution of all of the instruments in

the portfolio. For large financial firms which have tens of thousands of instruments in their

portfolios, simplifying steps are usually employed as part of the VaR computation. Three

steps are commonly used. First the dimension of the problem is reduced by modeling the

change in the value of the instruments in the portfolio as depending on a smaller (but still

large) set of risk factors *f*. Second the relationship between *f* and the value of instruments

which are nonlinear functions of*f* is approximated where necessary.2 Finally, an assumption
about the distribution of *f* is required.

The errors in VaR estimation depend on the reasonableness of the simplifying

assump-tions. One of the most important assumptions is the choice of distribution for the risk factors.

Many large banks currently use or plan to use a method known as historical simulation to

model the distribution of their risk factors. The distinguishing feature of the historical

sim-ulation method and its variants is that they make minimal parametric assumptions about

1_{For a review of the early literature on VaR, see Duffie and Pan (1997).}

2_{For instruments that require large amounts of time to value, it will typically be necessary to approximate}

the distribution of *f*, beyond assuming that the distribution of changes in value of today’s

portfolio can be simulated by making draws from the historical time series of past changes

in *f*.

The purpose of this paper is to conduct an in-depth examination of the properties of

historical simulation based methods for computing VaR. Because of the increasing use of these methods among large banks, it is very important that market practitioners, and

regu-lators understand the properties of these methods and ways that they can be improved. The

empirical performance of these methods has been examined by Hendricks (1996), and Beder

(1995) among others. In this paper I study the performance of the historical simulation

method, as well as two variants: the BRW method [Boudoukh, Richardson, and Whitelaw

(1998)] and the Filtered Historical Simulation Method (Barone-Adesi, Giannopoulos, and

Vosper, 1999). In a related paper, Hull and White (1998) empirically examine the

perfor-mance of the historical simulation method, the BRW method, and the Hull-White method,

which is a variant of the Filtered Historical Simulation method.3 The analysis here departs from related work on the empirical properties of the VaR methods in two ways. First, I ana-lyze the historical simulation based estimators of VaR from a theoretical as well as empirical

perspective. The theoretical insights aid in understanding the deficiencies of the historical

simulation and BRW methods. Second, the earlier empirical analysis of these methods was

based on how the method performed with real data. A disadvantage of using real data to

examine the methods is that since true VaR is not known, the quality of the VaR methods,

as measured by how well they track true VaR, can only be measured indirectly. As a result

it is very difficult to quantify the errors associated with a particular method of measuring

VaR when using real data. In my empirical analysis, I analyze the properties of the his-torical simulation method’s estimates of VaR with artificial data. The artificial data are

generated based on empirical time series models that were fit to real data. The advantage of

working with the artificial data is that true VaR is known. This makes it possible to much

more closely examine the properties of the errors made when estimating VaR using historical

simulation.

Because my main focus in this paper is on the distributional assumptions used in historical

simulation methods, in all of my analysis, I abstract from other sources of error in VaR

estimates. More specifically, I only examine VaR for simple spot positions in underlying

stock indices or exchange rates. For all of these positions, there is no possibility of choosing

incorrect risk factors, and there is no possibility of approximating the nonlinear relationship between instrument prices and the factors incorrectly. The only sources of error in the VaR

3_{The Filtered Historical Simulation and Hull-White methods are the same when the time horizon over}

estimates is the error associated with the distributional assumptions.

Before presenting my results on historical simulation based methods, it is useful to

illus-trate the problems with the distributional assumptions associated with historical simulation.

The distributional assumptions used in VaR, as well as the other assumptions used in a VaR

measurement methodology, are judged in practice by whether the VaR measures provide the correct conditional and unconditional coverage for risk [Christofferson (1998), Diebold,

Gun-ther, and Tay (1998), Berkowitz (1999)]. A VaR measure achieves the correct unconditional

coverage if the portfolio’s losses exceed the *k* percent VaR measures k% percent of the time

in very large samples. Because losses are predicted to exceed k-percent VaR k-percent of the

time, a VaR measure which achieves correct unconditional coverage is correct on-average. A

more stringent criterion is that the VaR measure provides the correct conditional coverage.

This means that if the risk, and hence the VaR of the portfolio changes from day to day,

then the VaR estimate needs to adjust so that it provides the correct VaR on every day, and

not just on average.

It is probably unrealistic to expect that a VaR measure will provide exactly correct conditional coverage. But, one would at least hope that the VaR estimate would increase

when risk appears to increase. In this regard, it is useful to examine an event where risk

seems to have clearly increased, and then examine how different measures of VaR respond.

The simplest event to focus on is the stock market crash of October 19, 1987. The crash

itself seemed indicative of a general increase in the riskiness of stocks, and this should be

reflected in VaR estimates.

Figure 1 provides information on how three historical simulation based VaR methods

performed during the period of the crash for a portfolio which is long the S&P 500. All three VaR measures use a one-day holding period and a one-percent confidence level.

The first VaR measure uses the historical simulation method. This method involves

computing a simulated time series of the daily P & L that today’s portfolio would have

earned if it was held on each of *N* days in the recent past. VaR is then computed from the

empirical CDF of the historically simulated portfolio returns.

The principle advantage of the historical simulation method is that it is in some sense

nonparametric because it does not make any assumptions about the shape of the distribution

of the risk factors that affect the portfolio’s value. Because the distribution of risk factors,

such as asset returns, is often fat-tailed, historical simulation might be an improvement over

other VaR methods which assume that the risk factors are normally distributed.

The principle disadvantage of historical simulation method is that it computes the

em-pirical CDF of the portfolios returns by assigning an equal probability weight of 1*/N* to each

simulated returns are independently and identically distributed (i.i.d.) through time. This

assumption is unrealistic because it is known that the volatility of asset returns tends to

change through time, and that periods of high and low volatility tend to cluster together

[Bollerslev (1986)].

When returns are not i.i.d., it might be reasonable to believe that simulated returns

from the recent past better represent today portfolio’s risk than returns from the distant past. Boudoukh, Richardson, and Whitelaw (1998), BRW hereafter, used this idea to

intro-duce a generalization of the historical simulation method in a way that assigns a relatively

high amount of probability weight to returns from the recent past. More specifically, BRW

assigned probability weights that sum to 1, but decay exponentially. For example, if *λ*,

a number between zero and 1, is the exponential decay factor, and *w*(1) is the

probabil-ity weight of the most recent historical return of the portfolio, then the next most recent

return receives probability weight *w*(2) = *λ∗* *w*(1), and the next most recent receives

weight *λ*2*∗w*(1), and so on. After the probability weights are assigned, VaR is calculated
based on the empirical CDF of returns with the modified probability weights. The historical

simulation method is a special case of the BRW method in which *λ* is set equal to 1.

The analysis in figure 1 provides results for the historical simulation method when VaR

is computed using the most recent 250 days of returns. The figure also presents results for

the BRW method when the most recent 250 days of returns are used to compute VaR and

the exponential decay factors are either *λ* = 0*.*99, or *λ* = 0*.*97. The size of the sample of

returns and the weighting functions are the same as those used by BRW. The VaR estimates

in the figure are presented as negative numbers because they represent amounts of loss in

portfolio value. A larger VaR amount means that the amount of loss associated with the

VaR estimate has increased.

The main focus of attention is how the VaR measures respond to the crash on October

19th. The answer is that for the historical simulation method the VaR estimate has almost

no response to the crash at all (Figure 1 panel A). More specifically, on October 20th, the

VaR measure is at essentially the same level as it was on the day of the crash. To understand

why, recall that the historical simulation method assigns equal probability weight of 1/250

to each observation. This means that the historical simulation estimate of VaR at the 1%

confidence level corresponds to the 3rd lowest return in the 250 day rolling sample. Because

the crash is the lowest return in the 250 day sample, the third lowest return after the crash turns out to be the second lowest return before the crash. Because the second and third

lowest returns happen to be very close in magnitude, the crash actually has almost no impact

on the historical simulation estimate of VaR for the long portfolio.

However, the modification makes a large difference. On the day after the crash, the VaR

estimates for both BRW methods increase very substantially, in fact, VaR rises in magnitude

to the size of the crash itself (Figure 1, panels B and C). The reason that this occurs is simple.

The most recent P & L change in the BRW methods receive probability weights of just over

1% for *λ* = 0*.*99 and of just over 3% for *λ* = 0*.*97. In both cases, this means that if the
most recent observation is the worst loss of the 250 days, then it will be the VaR estimate

at the 1% confidence level. Hence, the BRW methods appear to remedy the main problems

with the historical simulation methods because very large losses are immediately reflected

in VaR.

Unfortunately, the BRW method does not behave nearly as well as the example suggests.

To see the problem, instead of considering a portfolio which is long the S&P 500, consider

a portfolio which is short the S&P 500. Because the long and short equity positions both

involve a “naked” equity exposure, the risk of the two positions should be similar, and should

respond similarly to events like a crash. Instead, the crash has very different effects on the

BRW estimates of VaR: following the crash the estimated risk of the long portfolio increases very significantly (Figure 1, panels B and C), but the estimated VaR of the short portfolio

does not increase at all (Figure 2, panels B and C). The estimated risk of the short portfolio

did not increase until the short portfolio experienced significant losses in response to the

markets partial recovery in the two days following the crash.4

The reason that the BRW method fails to “see” the short portfolio’s increase in risk

after the crash is that the BRW method and the historical simulation method are both

completely focused on nonparametrically estimating the lower tail of the *P*&*L*distribution.

Both methods implicitly assume that whatever happens in the upper tail of the distribution,
such as a large increase in *P*&*L*, contains no information on the lower tail of *P*&*L*. This

means that large profits are never associated with an increase in the perceived dispersion of

returns using either method. In the case of the crash, the short portfolio happened to make

a huge amount of money on the day of the crash. As a consequence, the VaR estimates using

the BRW and historical simulation methods did not increase.

The BRW method’s inability to associate increases in P&L with increases in risk is

disturbing because large positive returns and large negative returns are both potentially

indicative of an increase in overall portfolio riskiness. That said, the GARCH literature

suggests that the relationship between conditional volatility and equity index returns is

asymmetric: conditional volatility increases more when index returns fall than when they
rise. Because the BRW method updates risk based on movement in the portfolio’s*P*&*L*, and

4_{The short portfolio’s losses on October 20 exceeded the VaR estimate for that day. As a result, the VaR}

not on the price of the assets, it can respond to this asymmetry in precisely the wrong way.

For example, the short portfolio registers larger increases in risk when prices rise, than when

they fall. This is just the opposite of the relationship suggested by the GARCH literature.

The sluggish adjustment of the BRW and historical simulation methods to changes in

risk at the 1% level are much worse at the 5% level; and in this case the BRW method
with *λ* = 0*.*97 and *λ* = 0*.*99 provide very little improvement above and beyond that of the

historical simulation method. The strongest evidence for the problem is the number of days

in October where losses exceed the 5% VaR limits. For example, for the long portfolio losses

exceed the VaR limits on 7 of 21 days in October using historical simulation or BRW with

*λ*= 0*.*99, and losses exceed the VaR limits on 5 days using the BRW method with*λ*= 0*.*97
(Figure 3). Losses for the short-portfolio exceed their limits as well, but the total number of

times is fewer (Figure 4).

Sections 2 and 3 explore the properties of the historical simulation and BRW methods

from a theoretical and empirical viewpoint. Section 4 examines a promising variant of

the historical simulation method introduced by Barone-Adesi, Giannopoulous, and Vosper. Section 5 concludes.

**2**

**Theoretical Properties of Historical Simulation **

**Meth-ods**

The goal of this section is to derive the properties of historical simulation methods from a

theoretical perspective. Because historical simulation is a special case of BRW’s approach, all of the results here are derived for the BRW method; and hence they generalize to the

historical simulation approach.

The simplest way to implement BRW’s approach *without using their precise method* is
to construct a history of *N* hypothetical returns that the portfolio would have earned if

held for each of the previous *N* days, *r _{t}_{−}*

_{1}

*, . . . , r*, and then assign exponentially declining probability weights

_{t}_{−}_{N}*w*

_{t}_{−}_{1}

*, . . . , w*to the return series.5 Given the probability weights, VaR at the

_{t}_{−}_{N}*C*percent confidence level can be approximated from

*G*(

*.*;

*t, N*), the empirical

cumulative distribution function of *r* based on return observations *r _{t}_{−}*

_{1}

*, ...r*.

_{t}_{−}_{N}5_{The weights sum to 1 and are exponentially declining at rate}_{λ}_{( 0}_{< λ}_{≤}_{1):}

*N*

X

*i*=1

*w _{t}_{−}_{i}*= 1

*G*(*x*;*t, N*) =

*N*

X

*i*=1

1_{{}_{r}_{t}_{−}_{i}_{≤}_{x}_{}}w_{t}_{−}_{i}

Because the empirical cumulative distribution function (unless smoothed) is discrete,

the solution for VaR at the *C* percent confidence level will typically not correspond to

a particular return from the return history. Instead, the BRW solution for VaR at the *C*

percent confidence level will typically be sandwiched between a return which has a cumulative

distribution which is slightly less than *C*, and one which has a cumulative distribution that
is slightly more than *C*. These returns can be used as estimates of the BRW method’s VaR

estimates at confidence level *C*. The estimate which slightly understates the BRW estimate

of VaR at the *C* percent confidence level is given by:

*BRWu*_{(}_{t|λ, N, C}_{) = inf(}_{r}_{∈ {r}

*t−*1*, . . . rt−N}|G*(*r*;*t, N*)*≥C*)*,*

and the estimator which tends to slightly overstate losses is given by:

*BRWo*_{(}_{t|λ, N, C}_{) = sup(}_{r}_{∈ {r}

*t−*1*, . . . rt−N}|G*(*r*;*t, N*)*≤C*)*.*

where *λ* is the exponential weight factor, *N* is the length of the history of returns used

to compute VaR, and *C* is the VaR confidence level.

In words, *BRWu*(*t|λ, N, C*) is the lowest return of the *N* observations whose empirical

cumulative probability is greater than *C*, and *BRWo*(*t|λ, N, C*) is the highest return whose

empirical cumulative probability is less than *C*.

The *BRWu*(*t|λ, N, C*) method is not precisely identical to BRW’s method. The main

difference is that BRW smooths the discrete distribution in the above approaches to create a

continuous probability distribution. VaR is then computed using the continuous distribution.

For expositional purposes, the main analytical results will be proven for the*BRWu*(*t|λ, N, C*)

estimator of value at risk. The properties of this estimator are essentially the same as those

for the estimator used by BRW, but it is much easier to prove results for this estimator.

The main issue that I examine in this section is the extent to which estimates of VaR based on the BRW method respond to changes in the underlying riskiness of the environment.

In this regard, it is important to know under what circumstances risk estimates increase (i.e.

reflect more risk) when using the *BRWu*(*t|λ, N, C*) estimator. The result is provided in the

following proposition:

The proposition basically verifies my main claim in the introduction to the paper.

Specif-ically, the proposition shows that when losses at time*t* are bounded below by the BRW VaR

estimate at time*t*, then the BRW VaR estimate for time*t*+ 1 will indicate that risk at time

*t*+ 1 is no greater than it was at time *t*. The example of a portfolio which was short the
S&P 500 at the time of the crash is simply an extreme example of this general result.

To get a feel for the importance of this proposition, suppose that today’s VaR estimate

for tomorrow’s return is conditionally correct, but that risk changes with returns, so that

tomorrow’s return will influence risk for the day after tomorrow. Under these circumstances,

one might ask what is the probability that a VaR estimate which is correct today will increase

tomorrow. The answer provided by the proposition is that tomorrow’s VaR estimate will

not increase with probability 1*−c*. So, for example, if*c*is equal to 1%, then a VaR estimate

which is correct today, will not increase tomorrow with probability 99%.

The question is how often should the VaR estimate increase the next day. The answer

depends on the true process which is determining both returns and volatility. The easiest

case to consider is when returns follow a GARCH(1,1). This is a useful case to consider for two reasons. First, it is a reasonable first approximation to the pattern of conditional

heteroskedasticity in a number of financial time series. Second, it is very tractable.6 I will assume that returns are normally distributed, have mean 0, and follow a GARCH(1,1)

process:

*rt* = *h.t*5*ut* (1)

*ht* = *a*0+*a*1*rt*2*−*1+*b*1*ht−*1 (2)

where, *u _{t}* is distributed standard normal for all

*t*;

*a*

_{0},

*a*

_{1}, and

*b*

_{1}are all greater than zero; and

*a*

_{1}+

*b*

_{1}

*<*1.

Under these conditions, it is straightforward to work out the probability that a VaR

estimate should increase tomorrow given that it is conditionally correct today. The answer

turns out to have a very simple form when *h _{t}* is at its long run mean. The probability that

the VaR estimate should increase tomorrow given that *h _{t}* is at its long run mean is given in

the follow proposition.

6_{Although deriving analytical results may be difficult, all of the simulation analysis that I perform when}

**Proposition 2** *When returns follow a GARCH(1,1) process as in equations* (1) *and* (2)
*and* *h _{t}*

*is at its long run mean, then*

*Prob(V aR _{t}*

_{+1}

*> V aR*) = 2

_{t}*∗*Φ(

*−*1)

*≈.*3173

*where* Φ(*x*) *is the probability that a standard normal random variable is less than* *x.*
*Proof: See the appendix.*

Propositions 1 and 2 taken together suggest that roughly speaking, when a VaR estimate

is near the long-run average value of VaR using the BRW methods, then VaR should increase

about 32 percent of the time when in fact it will only increase about*C* percent of the time,

i.e. at the 1% confidence level, 31% of the time VaR should have increased, but didn’t, or

at the 5% confidence level, 27% of the time VaR should have increased, but did not.

The quantitive importance of the historical simulation and BRW methods not responding

to certain increases in VaR depend on how much VaR is likely to have increased over a single

time period (such as a day) without being detected. This is simple to work out if returns

follow a GARCH(1,1) process.

**Proposition 3** *When returns follow a GARCH(1,1) process as in equations* (1) *and* (2),
*then when* *h _{t}*

*is at its long run mean, and*

*y*(

*c, t*), the VaR estimate for confidence level

*c,*

*at time*

*t, using VaR method*

*y, is correct, and*

*y*

*is either the BRW or historical simulation*

*methods, then the probability that VaR at timet*+ 1

*is at leastx*%

*greater than at timet, but*

*the increase is not detected at time*

*t*+ 1

*using the historical simulation or BRW methods is*

*given by:*

Prob(∆*V aR > x* %*,* no detect) =

2 Φ

*−*q1 + *x*2* _{a}*+2

*x*

1

*−c* 0*< x < k*(*a*_{1}*, c*)
Φ

*−*q1 + *x*2* _{a}*+2

*x*

1

*x≥k*(*a*_{1}*, c*)*,*

(3)

*where* *k*(*a*_{1}*, c*) = *−*1 +p1*−a*_{1}+*a*_{1}[Φ*−*1(*c*)]2*.*
*Proof: See the appendix.*

To get a feel for how much of a change in VaR might actually be missed, I considered

VaR for 10 different spot foreign exchange positions. Each involves selling U.S. currency

and purchasing the foreign currency of a different country. To evaluate the VaR for these

positions and to study historical simulation based estimates of VaR, I fit GARCH(1,1) models

data was for the time period from 1973 through 1997.7 The results of the estimation are presented in Table 1. The restrictions of proposition 2 are satisfied for most, but not all

of the exchange rates. The paramater estimates for the French franc and Italian lira, do
not satisfy the restriction that *a*_{1}+*b*_{1} *<*1. Instead their parameter estimates indicate that
their variances are explosive and hence their variances do not have a long-run mean. As a

consequence, some of the theoretical results are not strictly correct for these two exchange

rates, but they are correct for processes with slightly smaller values of*b*_{1}.

When the variance of exchange rate returns has a long-run mean, equation (3) shows that

when variance is near its long run mean, then of the three parameters of the GARCH model,

only*a*_{1} determines how much of the increase in true VaR is not detected. For the 10 exchange
rates that I consider, *a*_{1} ranges from a low of about 0*.*05 for the yen, to about 0*.*20 for the
lira. When VaR is computed at the 1% confidence level using the historical simulation

or BRW methods, the probability that VaR could increase by at least *x*% without being
detected is presented in figure 5 for the low, high, and average values of *a*_{1}.8 The figure
shows that there is a substantial probability (about 31 percent) that increases in VaR will go

undetected. Many of the increases in VaR that go undetected are modest. However, there is

a 4% probability that fairly “large” increases in VaR will also go undetected. For example,

for the largest value of *a*_{1}, with 4% probability (i.e. 4% of the time) VaR could increase by
25% or more, but not be detected using the historical simulation or BRW methods. For the

average value of*a*_{1}, there is 4% chance VaR could increase by 15% with being detected, and
for the low value of*a*_{1}, there is a 4% chance that a 7% increase in VaR would go undetected.
A slightly different view of these results is provided in Table 2. Unlike the figure, which

presents probabilities that VaR will actually increase, the table computes the expected size

of the increase in VaR conditional on it increasing, but not being detected. For example,

the results for the British pound show that conditional on VaR increasing but not being

detected (an event that occurs with 31% probability), the expected increase in VaR is about

5-1/2 percent with a standard deviation of about the same amount. Taken as a whole, the

table and figure suggests that conditional on VaR being understated for these currencies,

the expected understatement will probably be about 7 percent, but because the conditional

distribution is skewed right, there is a nontrivial chance that the actual increase in VaR

could be much higher.

It is important to emphasize that proposition 3, table 2, and figure 5 quantify the

proba-bility that a VaR increase of a given size will not be detected on the day that it occurs. It is

7_{The precise dates for the returns are Jan 2, 1973, through November 6, 1997. The currencies are the}

British pound, the Belgian franc, the Canadian dollar, the French franc, the Deutschemark, the Yen, the Dutch guilder, the Swedish kronor, the Swiss franc, and the Italian lira.

8_{The average value of}_{a}

possible that VaR could increase for many days in a row without being detected. This allows

VaR errors to accumulate through time and occasionally become large. But, the proposition

does not quantify how large the VaR errors typically become. Only simulation can answer

that question. This is done in the next section.9

**3**

**Simulated Performance of Historical Simulation **

**Meth-ods**

**3.1**

**Simulation Design**

This section examines the performance of the BRW method using simulation in order to

provide a more complete description of how the method performs. Results for simulation of

the BRW and historical simulation methods are presented in Tables 3 and 5. For purposes

of comparison, analogous results are presented in Tables 4 and 6 for when VAR is computed using a Variance-Covariance method in which the variance-covariance matrix of returns is

estimated using an exponentially declining weighted sum of past squared returns.10

All simulation results were computed by generating 200 years of daily data for each

exchange rate when the process followed by the exchange rates is the same as those used to

generate the theoretical results in Table 2. The simulation results are analyzed by examining

how well each of the VaR estimation methods perform along each 200 year sample path.

Simulation results are not presented for the Italian lira because for its estimated GARCH

parameters, its conditional volatility process was explosive.

**3.2**

**Simulation Results**

The main difference between the simulations and the theory is that the simulations compute

how the methods perform on average over time. The theoretical results, by contrast, con-dition on volatility starting from its long run mean. Because of this difference, one would

expect the simulated results to differ from the theoretical results. In fact, the theoretically

predicted probability that VaR increases will not be detected, and the theoretically

pre-dicted conditional distribution of the nondetected VaR increases (Table 2) appear to closely

match the results from simulation. In this respect, table 3 provides no new information

9_{An additional reason to perform simulations is that the analytical results on VaR increasing are derived}

under the special circumstances that the variance of returns are at their long-run mean, and the VaR estimate using the BRW or historical simulation method at this value is correct.

beyond knowledge that the predictions from the relatively restrictive theory are surprisingly

accurate in the special case of the GARCH(1,1) model.

The more interesting simulation results are presented in Table 5. The table shows that

the correlation of the VaR estimates with true VaR is fairly high for the BRW methods, and

somewhat lower, for the Historical Simulation methods. This confirms that the methods move with true VaR in the long run. However, the correlations of changes in the VaR

estimates with changes in true VaR are quite low. This shows that the VaR methods are

slow to respond changes in risk. As a result, the VaR estimates are not very accurate: The

average Root Mean Squared Error (RMSE) across the different currencies is approximately

25% of true VaR (Table 5, panels A and B.). The errors as a percent of true VaR turn out

not to be symmetrically distributed, but instead are positively skewed. For example in the

case of Historical Simulation estimates of 1-day 1-percent VaR for the British pound, VaR is

slightly more likely to be overstated than understated; and the errors when VaR is overstated

are much larger than when it is understated (Figure 6). On this basis, it appears that the

BRW and historical simulation methods are conservative. However, the risks when VaR is understated are substantial: for example, there is a 10% probability that VaR estimates for

a spot position in the British pound/dollar exchange rate will be understate true VaR by

more than 25% (Figure 6); the same error, expressed as a percent of the value of the spot

position is about 1/2 % (Figure 7).

A more powerful method for illustrating the poor performance of the methods involves

directly examining how the VaR estimates track true VaR. For the sake of brevity, this is only

examined for the British pound over a period of 2 years. The figures for the British pound

tell a consistent story: true VaR and VaR estimated using historical simulation or the two BRW methods tend to move together over the long-run, but true VaR changes more often

than the estimates, and all three VaR methods respond slowly to the changes(Figures 8, 9,

and 10). The result is that true VaR can sometimes exceed estimated VaR by large amounts

and for long periods of time. For example, over the two-year period depicted in Figure

11, there is a 0.2 year episode during which VaR estimated using the historical simulation

method understates true VaR by amounts that range from a low of 40% to a high of 100%.

Over the same 2 years, even with the best BRW method (*λ* = 0*.*97) there are four different

episodes which last at least 0.1 years during which VaR is understated by 20% or more; and

for one of these episodes, VaR builds up over the period until true VaR exceeds estimated

VaR by 70% or more before the VaR estimate adjusts (Figure 12).

The problems with the BRW and historical simulation methods are striking when one

compares true VaR against the VaR estimates. In particular, the errors seem to persist for

it is important that the methods that regulators and risk practitioners use to detect errors

in VaR methods are capable of detecting these errors. These detection methods are briefly

examined in the next subsection.

**3.3**

**Can Back-testing Detect The Problems with Historical **

**Sim-ulation?**

VaR methods are often evaluated by backtesting to determine whether the VaR methods

provide correct unconditional coverage, and to examine whether they provide correct

condi-tional coverage. The standard test of uncondicondi-tional coverage is whether losses exceed VaR

at the *k* percent confidence level more frequently than *k* percent of the time. A finding

that they do would be interpreted as evidence that the VaR procedure understates VaR

unconditionally.

Based on standard tests, both BRW methods and the historical simulation method

ap-pear to perform well when measured by the percentage of times that losses are worse than

predicted by the VaR estimate. Losses exceed VaR 1.5% of the time. This is only slightly

more than is predicted. Given that the VaR estimates are actually persistently poor, my

results here reconfirm earlier results that unconditional coverage tests have very low power

to detect poor VaR methodologies (Kupiec, 1995).

The second way to examine the quality of VaR estimates is to test whether they are

conditionally correct. If the VaR estimates are conditionally correct, then the fact that

losses exceeded today’s VaR estimate should have no predictive power for whether losses will

exceed VaR in the future. If we denote a VaR exceedance by the event that losses exceeded VaR, then correct conditional coverage is often tested by examining whether the time series

of VaR exceedances is autocorrelated. To provide a sort of baseline feel for the power of

this approach, for the 200 years of simulated data for the British pound, I computed the

autocorrelation of the actual VaR errors, and of the series of VaR exceedances. Results are

presented for 1-day 1% VaR, and 1-day 5% VaR for both the BRW method and for the

historical simulation method.

The autocorrelation of the true VaR errors reinforces my earlier results that these VaR

methods are slow to adjust to changes in risk. The autocorrelation at a 1-day lag is about 0.95 for all three methods. For the best of the three methods, the autocorrelation of the

VaR errors dies off very slowly: it remains about 0.1 after 50 days (Figure 13). The errors of

the historical simulation method die off much more slowly. The 50th order autocorrelation

Given the high autocorrelations of the actual VaR errors, it is useful to examine the

autocorrelations of the exceedances. Unfortunately, the autocorrelation of the exceedances

is generally much smaller than the autocorrelation of the VaR errors. For example, in the

case of the BRW method with *λ*= 0*.*97, the autocorrelation of the VaR exceedances for 1%

VaR is only about 0.015 for autocorrelations 1 - 6, and it drops towards 0 after that.11 For the historical simulation method, the first six autocorrelations are 0.02 - 0.03 for 1% VaR,

and 0.05 for 5% VaR. Because all of the autocorrelations of the exceedances are generally

very small, the power of tests for correct conditional coverage, when based on exceedances

is very low.12

The low power of tests based on exceedances suggests that alternative approaches for

examining the performance of VaR measures are needed.13 The alternative that I advocate is the one I use here: evaluate a VaR method by comparing its estimates of VaR against

true VaR in situations where true VaR is known or knowable.

**3.4**

**Comparison with VaR estimates based on variance-covariance**

**methods**

To put the results on the BRW and historical simulation methods in perspective, it is useful

to contrast the results with a variance- covariance method with equally weighted

observa-tions and with variance- covariance methods which use exponentially declining weights. The

performance of the variance-covariance method with equal weighting is about as good as

the historical simulation methods. Neither method does a good job of capturing conditional

volatility; and this shows up in the performance of the methods. The variance-covariance

11_{The first six autocorrelations of the exceedances for}_{λ}_{= 0}_{.}_{99 are about 0.02 for the 1% VaR estimates}

about 0.02 - 0.03 for the 5% VaR estimates.

12_{An informal illustration of the power of the tests involves calculating the number of time-series }

observa-tions that would be necessary to generate a rejection of the null if the correlaobserva-tions that were measured for
the test are the true correlations. Let *ρ _{i}* represent the

*i0th*autocorrelation. Consider a test based on the first six autocorrelations of the exceedances. Under the null that all autocorrelations are zero,

*N*X6

*i*=1

*ρ*2

*i* *∼χ*2(6)*.*

If instead all six measured autocorrelations are about 0.05, then about 839 observations (3.36 years of daily data) are required for the test statistic to reject the null of no autocorrelation at the 0.05 percent confidence level. If instead all six measured autocorrelations are about 0.015, then 37.3 years of daily data are required to reject the null using this test.

13_{Despite the low power of tests based on VaR exceedances, in Berkowitz and O’Brien’s (2001) study}

methods with exponentially declining weights are unambiguously better than historical

sim-ulation, and also perform better than the BRW methods: the probability that increases

in VaR are not detected is with one exception, less than 10%, the mean and standard

de-viation of undetected increases in VaR is generally low (Table 4), and the correlation of

these measures with true VaR and with changes in VaR is high. There are two reasons why these methods perform better than the BRW method in the simulations. The first is that

the variance-covariance methods recognize changes in conditional risk whether the portfolio

makes or loses money; the BRW method only recognizes changes in risk when the

portfo-lio experiences a loss. The second reason is that computing variance-covariance matrices

using exponential weighting is similar to updating estimates of variance in a GARCH(1,1)

model. This simililarity helps the variance-covariance method capture changes in conditional

volatility when the true model is GARCH(1,1). Moreover, the same exponential weighting

methods perform well for all of the GARCH(1,1) parameterizations.

Given that these simulations suggest that the exponential weighting method of computing

VaR appears to be better than the BRW method with the same weights, the empirical results in Boudoukh, Richardson, and Whitelaw (1997) are puzzling because they show that their

method appears to perform better when using real data. The reason for the difference is

almost surely that returns in the real world are both heteroskedastic and leptokurtic but

the exponential smoothing variance-covariance methods ignore leptokurtosis and instead

assume that returns are normally distributed. It turns out that the normality assumption is

a first-order important error; it is this error which makes the BRW and historical simulation

methods appear to perform well by comparison.

Although the BRW method appears to be better than exponential smoothing when using real data, it is far from an ideal distributional assumption. The BRW method’s inability

to associate large profits with risk, and its inability to respond to changes in conditional

volatility are disturbing. More importantly, there is not a strong theoretical basis for using

the BRW method. More specifically, except for the case of *λ* = 1, one cannot point to any

process for asset returns and say to compute VaR for that process, the BRW method is the

theoretically correct approach. Because of the disturbing features of the BRW and historical

simulation methods, it is desirable to pursue other approaches for modeling the distribution

of the risk factors. Ideally, the methodology which is adopted should model conditional

heteroskedasticity and non-normality in a theoretically coherent fashion. There are many

possible ways that this could be done. A relatively new VaR methodology introduced by Barone-Adesi, Giannopoulos, and Vosper combines historical simulation with conditional

volatility models in a way which has the potential to achieve this objective. This new

the filtered historical simulation method are discussed in the next section.

**4**

**Filtered Historical Simulation**

In a recent paper Barone-Adesi, Giannopoulos, and Vosper, introduced a variant of the

his-torical simulation methodology which they refer to as filtered hishis-torical simulation (FHS).

The motivation behind using their method is that the two standard approaches for

comput-ing VaR make tradeoffs over whether to capture the conditional heteroskedasticity or the

non-normality of the distribution of the risk factors. Most implementations of

Variance-Covariance methods attempt to capture conditional heteroskedasticity of the risk factors,

but they also assume multivariate normality; by contrast most implementations of the

his-torical simulation method are nonparametric in their assumptions about the distribution of

the risk factors, but they typically do not capture conditional heteroskedasticity.

The innovation of the filtered historical simulation methodololgy is that it captures both the conditional heteroskedasticity and non-normality of the risk factors. Because it

cap-tures both, it has the potential to very significantly improve on the variance-covariance and

historical simulation methods that are currently in use.14

**4.1**

**Method details**

Filtered historical simulation is a Monte Carlo based approach which is very similar to

computing VaR using fully parametric Monte Carlo. The best way to illustrate the method

is to illustrate its use as a substitute for the fully parametric Monte Carlo. I do this first for

the case of a single-factor GARCH(1,1) model (equations (1) and (2)), and then discuss the

general case.

**Single Risk Factor**

To begin, suppose that the time-series process for the risk factor *r _{t}* is described by the

GARCH(1,1) model in equations (1) and (2), and that the conditional volatility of returns

tomorrow is *h _{t}*

_{+1}. Given, these conditions, VaR at a 10-day horizon (the horizon required by the 1996 Market Risk Amendment to the Basle Accord) can be computed by simulating

10-day return paths using fully parametric monte carlo. Generating a single path involves

14_{Engle and Manganelli (1999) propose an alternative approach in which the quantiles of portfolio value}

drawing the innovation_{t}_{+1}from its distribution (which is*N*(0*,*1)). Applying this innovation
in equation (1) generates *r _{t}*

_{+1}. Given

*h*

_{t}_{+1}and

*r*

_{t}_{+1}, equation (2) is then used to generate

*ht*+2. Given *ht*+2, the rest of the day path can be generated similarly. Repeating

10-day path generation thousands of times provides a simulated distribution of 10-10-day returns

conditional on *h _{t}*.

The difference between the above methodology and FHS is that the innovations are drawn from a different distribution. Like the monte-carlo method, the FHS method assumes that

the distribution of * _{t}* has mean 0, variance 1, and is i.i.d., but it relaxes the assumption of

normality in favor of the much weaker assumption that the distribution of * _{t}* is such that

the parameters of the GARCH(1,1) model can be consistently estimated. For the moment,

suppose that the parameters can be consistently estimated, and in fact have been estimated

correctly. If they are correct, then the estimates of *h _{t}*at each point in time are correct. This

means that since *r _{t}* is observable, equation (1) can be used to identify the past realizations

of * _{t}* in the data. Barone-Adesi, Giannopoulous, and Vosper (1999) refer to the series of

*that is identified as the time series of filtered shocks. Because these past realizations are*

_{t}i.i.d., one can make draws from their empirical distribution to generate paths of *r _{t}*.

The main insight of the FHS method is that it is possible to capture conditional

het-eroskedasticity in the data and still be somewhat unrestrictive about the shape of the

dis-tribution of the factors returns. Thus the method appears to combine the best elements of

conditional volatility models with the best elements of the historical simulation method.

**Multiple Risk Factors**

There are many ways the methodology can be extended to multiple risk factors. The simplest

extension is to assume that there are N risk factors which each follow a GARCH process in

which each factor’s conditional volatility is a function of lags of the factor and of lags of the

factor’s conditional volatility.15 To complete this simplest extension, an assumption about
the distribution of * _{t}*, the N-vector of the innovations, is needed. The simplest assumption

is that * _{t}* is distributed i.i.d. through time. Under this assumption, the implementation of

FHS in the multifactor case is a simple extension of the method in the single factor case.

As in the single-factor case, the elements of the vector * _{t}* are identified by estimating the

GARCH models for each risk factor. Draws from the empirical distribution of * _{t}* are made
by randomly drawing a date and using the realization of for that date.

This simple multivariate extension is the main focus of Barone-Adesi and Giannopoulos

(1999). This extension has two convenient properties. First, the volatility models are very

15_{The specification in equations (1) and (2) is a special case of a more general specification which has}

simple. One does not need to estimate a multivariate GARCH model to implement them.

The second advantage is that the method does not involve estimation of the correlation

matrix of the factors. Instead, the correlation of the factors is implicitly modelled through

the assumption that * _{t}* is i.i.d.

Although the simplest multivariate extension of FHS is convenient, the assumptions that it uses are not necessarily innocuous. The assumption that volatility depends only on a risk

factor’s own past lags, and its own past lagged volatility can be unrealistic whether there is

a single risk factor, or many. For example, if the risk factors are the returns of the FTSE

Index and that of the S & P 500, then if the S & P 500 is highly volatile today, then it may

influence the volatility of the FTSE tomorrow. A separate issue is the assumption that _{t}

is i.i.d. This assumption implies that the conditional correlation of the risk factors is fixed

through time. This assumption is also likely to be violated in practice.

Although the assumptions of the simplest extension of FHS may be violated in practice,

these problems can be fixed by complicating the modelling where necessary. For example,

the volatility modelling may be improved by conditioning on lagged values for other assets. Similarly, time-varying correlations can be modelled within the framework of multivariate

GARCH models.

To show the potential for improving upon simple implementations of the FHS method,

suppose that the conditional mean and variance-covariance matrix of the factors depends on

the past history of the factors.16 More specifically, let *r _{t}* be the factors at time

*t*; let

*h*be the history of the risk factors prior to time

_{r}_{t}_{−}*t*; let

*θ*be parameters of the data generting

process; let *µ*(*h _{r}_{t}_{−}, θ*) be the mean of the factors at time

*t*conditional on this history and

*θ*,

and let Σ(*h _{r}_{t}_{−1}, θ*) be the variance-covariance matrix of

*r*conditional on this history, and

_{t}*θ*. Given this notation, suppose that

*r*is generated according to

_{t}*rt*=*µ*(*hrt−, θ*) + Σ(*hrt−, θ*)*.*5*t* (4)

where *θ* are the parameters of the conditional mean and volatility model, and * _{t}* is i.i.d.

through time with mean 0 and variance *I*.17 If equation (4) is the data generating process,
then under appropriate regularity conditions (Bollerslev and Wooldridge (1992)), the *θ*
pa-rameters can be estimated by quasi-maximum likelihood.18 Therefore, * _{t}* can be identified,

16_{GARCH models are a special case of the general formulation.}
17_{Since} _{}

*t* is i.i.d., assumings its variance is *I* is without loss of generality since this assumption simply

normalizes Σ(*h _{r}_{t}_{−}, θ*).

18_{In quasi-maximum likelihood estimation (QMLE), the parameters,}_{θ}_{, are estimated by maximum }

and the FHS method can be implemented in this more general case.19

It may be possible to improve on this general case even further. One of the issues in

im-plementing the FHS method, or any historical simulation method is whether the filtered data series contain a sufficient number of shocks to fill the probability space of what could happen;

i.e. are there important shocks, or combinations of shocks that are under-represented in the

historical data series. If there are, then one possible fix is to create a set of filtered shocks

which are uncorrelated. This could be done by using a multivariate GARCH model with

time-varying covariances, as above. The marginal distribution of the filtered shocks could

then be fit using a semi-parametric or nonparametric method. These marginal distributions

could then be used in the same way as draws of random days in the FHS method. In this

case, VaR estimation would proceed by making i.i.d. draws from the marginal distributions

of the shocks, and then these would be applied to the GARCH model in the same way that

they are applid with Filtered Historical Simulation.

**4.2**

**Preliminary Analysis of Filtered Historical Simulation**

In this section, I conduct some preliminary analysis of the simplest Filtered Historical

Simu-lation Methodology. To analyze the method, I estimated GARCH(1,1) models for the same

exchange rates as in Table 1, but to reserve data for out of sample analysis, I only used data

from January 1973 through June 1986 in the estimation.

As noted above, an important assumption in the simple FHS approach is that the

cor-relation of the filtered data sets are constant through time. To investigate whether this

is satisfied, I split my estimation sample into two subsamples—one for the first half of

the estimation sample—and one for the second. I then used the distribution of Fisher’s

*z* transformation of the correlation coefficient to individually test whether each correlation
coefficient is different in the two time periods.20 The tests for each correlation coefficient
are not independendent, but the null hypothesis is overwheming rejected for 86 out of 90 of

the correlation coefficients. This suggests that the changes in the correlations across these

19_{Let ˆ}_{θ}_{of}_{θ}_{be a consistent estimate of}_{θ}_{. Then since} _{h}_{r}

*t−* is observable, from (4), a consistent estimate
of*t*is

* _{t}*= Σ(

*hrt−, θ*)

*−.*5

*r _{t}*

*−*

*µ*(

*hrt−, θ*)

20_{Formally, I tested the null hypotheses that the correlation coefficients in the two time periods were the}

same. Following Kendall and Stewart (1963), let *n* be the sample size and let *r* and *ρ* be the estimated
and true correlation coefficients; and define*z* and *ξ* by *z* =*.*5*log*_{1}1+* _{−}r_{r}*, and let

*ξ*=

*.*5

*log*

_{1}1+

*. Kendall and Stewart show that*

_{−}ρ_{ρ}*z−ξ*is approximately normal with approximate mean

_{2(}

_{n}ρ_{−}_{1)}and approximate variance

1

*n−*3. Let*z*1 and*z*2 be the estimates of*z*1in the two subsamples and assume that both subsamples have the

periods are statistically signicant. Examination of the differences in the coefficients in the

two periods suggests that they are economically significant as well (Table 7).

Perhaps, the fact that the correlations appear to have changed over a 13 year period of

time is not surprising. But the fact that they have shows that using historical simulation

with simple filtering could be problematic. The difficulty is that one may be making draws from time periods where the correlation of the shocks is different than it is today. This could

have the effect of making a risky position appear hedged, or a hedged position appear risky.

Even if correlations appear to have changed over a period of 13 years, if they appear to

remain fixed over reasonably long periods of times, then a simple solution to the time-varying

correlation problem is to only use recent data when doing filtered historical simulation. To

investigate this possibility, I examined the stability of the correlations over a much shorter

period of time—adjacent one-year intervals. Over this shortened period, there is less

evi-dence that correlations change through time, but there still appear to be economically and

statistically significant changes in many correlations (Table 8). One reason that the

correla-tions vary through time is that the univariate GARCH(1,1) models that are used here may be improperly specified. However, it appears that better-specified univariate GARCH

mod-els will not fix the problem because there is substantial evidence in the GARCH literature

which suggests that foreign exchange rates, do not have constant conditional correlations.

The only other quick fix to the problem with the correlations is to shorten the sample that

is used for filtered historical simulation by even more, but I am hesitant to do this for fear

of losing important nonparametric information about the tails of the risk factors.

The second exercise that I conducted examines the ability of the FHS method to

accu-rately estimate VaR at a 1% confidence level at a 10-day horizon. The accuracy of these VaR estimates is of particular interest because the BIS capital requirements for market risk

are based on VaR at this horizon and confidence level. If the VaR estimates are excellent at

this horizon, then a strong case could be made for moving toward using the FHS method to

compute capital for market risk.

Barone-Adesi, Giannopoulos, and Vosper, have written a number of papers in which they

examine the performance of the FHS method using real data. Because the FHS method

makes strong assumptions about how the data is generated, it is necessary to examine the

method using real data. But, examining the method using real data is not sufficient to

understand how the method performs. There are two weaknesses with only relying on real

data when analyzing a method. The first is that the typical methods for analyzing model performance, such as backtesting, have low power to detect poor models because detection

is based on losses exceeding VaR estimates of loss, and this is a rare event even if the VaR

ability to understand the properties of the VaR model is obfuscated by the simultaneous

occurrence of other types of model errors including errors in pricing, errors in GARCH

models, and other potential flaws in the VaR methodology.

To focus exclusively on potential difficulties with the FHS approach, while abstracting

from other sources of model error, the simulations in the second exercise are conducted under ”ideal” conditions for using the FHS methodology. In particular, I assume that the

parameters of the GARCH processes are estimated exactly, the filtered innovations that are

used in the simulation are the true filtered innovations, and that today’s conditional volatility

is known, and that all pricing models are correct. Because these conditions eliminate most

forms of model misspecification, the simulations help to examine how well the FHS method

works under nearly perfect conditions.

To examine the performance of FHS at a 10-day horizon, for each exchange rate’s

esti-mated GARCH process, I generated 2 years of random data, and then used the simulated

data with the true GARCH parameters to estimate VAR at a 10-day horizon using filtered

historical simulation. Each filtered historical simulation estimate of VaR was computed us-ing 10,000 sample paths. The VAR estimates usus-ing the FHS method were then evaluated by

comparing them with VAR estimates based on a full Monte-Carlo simulation (which used

100,000 sample paths). The errors of any particular FHS estimate will depend on the initial

conditional volatilities that are used to generate the sample paths and on any idiosyncracies

in a particular sample of filtered innovations. To date, I have only examined the behavior

of the method from initial conditional volatilities that were selected to be the same as those

at two different points in time. Therefore, my results should be viewed as conditional on

a particular set of beginning volatilities. That said, to keep the results from depending on any particular set of filtered innovations, for each set of beginning conditional volatilities the

results that I report for each exchange rate are based on 100 independent simulations of the

FHS method.21 Additional details on these simulations are provided in the appendix. My assessment of the FHS method is designed to answer two questions. The first is

whether, for a fixed span of historical data, VaR estimates using filtered historical simulation

do a good job of approximating VaR at a 10-day horizon. The second question is if filtered

historical simulation does not do well, then what is the source of the errors? Two sources are

examined. The first is that the number of sample paths in the filtered historical simulation

may have been too small. To address this issue, in the appendix I derive confidence bounds

for how big the errors in the VaR estimates would be if for a given set of historical data, VaR was computed using an infinite number of sample paths. If the errors with an infinite number

21_{Although the filtered innovations in each FHS estimate are independent, to economize on computation}

of sample paths remain large, then the number of sample paths is not the source of the error.

The second potential source of error is that the size of the historical sample that is used for

the bootstrap is too small. In other words, 2 years of historical data may not have enough

extreme observations to generate good estimates of 10-day VaR at the 1% confidence level.

Logically, this second source of error is the only other potential source because virtually all other sources of error have been eliminated from the simulation by design. Below, I provide

intuition for why I believe this second source of error is important.

**4.3**

**Results of Analysis of FHS**

The analysis here is still preliminary. The first issue I address is whether the VaR estimates

using filtered historical are downward biased; i.e. do they appear to understate risk. To

address this question, I use order-statistics from the monte carlo simulation of VaR to

cre-ate 95% confidence intervals for the percentage errors in the filtered historical simulation

VaR estimates.22 When the confidence interval for the percentage errors did not contain 0 percent error, the VaR estimate was classified as under- or over- stating true VaR based on

whether the confidence interval was bounded above or below by 0. The amount of under- or

overstatement was measured as the shortest distance from the edge of the confidence interval to 0. This is probably a conservative estimate of the true amount of under- or

overstate-ment. My main finding is that the filtered historical simulation method is biased towards

understating risk. At a 10-day horizon and 1% confidence interval the method was found

to understate risk about 2/3rds of the time, and when VaR was understated the average

amount of understatement was about 10% (Table 9). VaR was overstated about 28 percent

of the time; and conditional on VaR being overstated, the average amount of overstatement

was about 9%.

To investigate the source of the downward bias in the VaR estimates I first investigated whether additional monte-carlo draws using filtered historical simulation would produce

bet-ter results. I performed this analysis by creating 90% confidence inbet-tervals for the percentage

difference between the infinite sample (population) estimate of VaR using filtered historical

simulation and the infinite sample (population) estimate of VaR using pseudo monte carlo.

The difference between this analysis, and my analysis of the VaR percentage errors above is

that this analysis uses statistics to test whether the FHS VaR estimates could be improved

by increasing the number of sample paths to infinity. These confidence intervals are not as

precise as those used to analyze whether the VaR estimates were over or under stated, but

they are suggestive, and seem to indicate that even in infinite samples the VaR estimates

using filtered historical simulation in this context would be downward biased (Table 10).

The most likely explanation for the downward bias is a lack of extreme observations in

a short filtered data set. For example, when making draws from a 500 observation

histor-ical sample of filtered data, the extremes of the filtered sample are the lowest and highest observation in the data, which corresponds to roughly the 0.2’d and 99.8th percentile of the

probability distribution. Draws greater than or less than these percentiles cannot happen

in the filtered historical simulation. The question is how likely are draws outside of these

extremes when simulating returns over a 10-day period. The answer is that on a 10 day

sample path, the probability of at least one draw outside of these extremes is 1*−*(*.*996)10,
which is about 4%. Since these sample paths are also those where GARCH volatilities will

be higher (through GARCH-style amplification of shocks to volatility) these 4% of sample

paths are probably very likely to be important for risk at the 1% confidence level, but they

cannot be accounted for by the filtered historical simulation method when only a two year

sample of daily historical data is used in the simulation. A longer span of historical data may improve the filtered historical simulation method by allowing for more extreme observations,

but longer data spans mean it is necessary to address the time-varying correlation issue.

Before closing, it is useful to contrast my results on filtered historical simulation with

those of Barone-Adesi, Giannopoulos, and Vosper (2000). When working with real data,

they found that at long horizons (their longest horizon was 10 days) the FHS method tended

to overstate the risk of interest rate swap positions, and of portfolios which consist of interest

rate swaps, futures, and options. The finding that risk is overstated at these long horizons

stands in contrast to my results. If both sets of results are correct (remember my results are still preliminary), they suggest that several sources of bias are present. To better understand

the properties of filtered historical simulation in the real data, it would be useful to attempt

to separate the effects of model risk from errors due to the filtered historical simulation

procedure. This remains a task for future research.

**5**

**Conclusions**

Historical simulation based methods for computing VaR are becoming popular because these

methods are easy to implement. However, the properties of the methods are not well un-derstood. In this paper, I explore the properties of these methods and show that a number

of the methods are responsive to changes in conditional volatility. Despite the

under-responsiveness of these methods to changes in risk, there is strong reason to believe that

methods. I also investigated the properties of the filtered historical simulation method that

was recently introduced by Barone-Adesi, Giannopoulous, and Vosper (1999). The advantage

of the method is it allows for time-varying volatility but the distribution of the risk factor

innovations is modeled nonparametrically. While I think this new method has a great deal

of promise, the paper illustrates two areas in which the method needs further development. The first is modeling time-varying correlations. The evidence in this paper suggests that

time-varying correlations may be important. Filtered historical simulation methods need

to be improved in ways that account for these correlations. The second area which needs

further development is establishing the number of years of historical data which are needed

to produce accurate VaR estimates. Results in this paper suggest that 500 days of daily

data may not be enough to accurately compute VaR at a 10-day horizon because a sample

period this short may not contain enough extreme observations. Methods for choosing the

length of the historical data series, or methods to augment historical data with additional

extreme observations (perhaps with a parametric model fit to the historical data) are topics