Variance reduction - Monte Carlo Simulation

Monte Carlo Simulation

4.3 Variance reduction

There are two major avenues for reducing variance in an MC simulation: one takes advantage of specific features of the problem domain to adjust or correct the simulation outputs, the other directly reduces the variability of the simulation inputs. In this section we’ll introduce control variates, antithetic variates, the absorbing technique, full truncation, and the case of quasi-random simulation.

4.3.1 Control variate technique

In order to improve the accuracy of the estimated variate, it is essential to use as much information as possible. We can obtain valuable information about the estimated variate if we can define another variate, one whose expected value is not difficult to calculate, that is highly correlated with the estimated one. We can then use this new variate as a control mechanism to improve the accuracy of the simulation by minimising the errors between the two variates.

Suppose that, as in Section 4.2, we wish to estimate the call option price C := E[Y], where Y := h(X) is for instance the discounted payoff of a call option for which we have no closed-form evaluation. We do however know of another variate Z, which we can easily generate and whose expected value E[Z] we can also quickly calculate. Now suppose that with each replication Y_i we also generate a value Z_i, and that the pairs (Y_i, Z_i) are i.i.d.; then for any fixed c we can calculate, from the i-th path,

$$Y_i^c = Y_i + c\,(Z_i - E[Z]),$$

and thus calculate the sample mean over N replications,

$$\hat{Y}^c = \hat{Y} + c\,(\hat{Z} - E[Z]) = \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i + c\,(Z_i - E[Z])\bigr).$$

This is an unbiased estimator² of E[Y] that can act as a control variate estimator; that is, the observed residual Ẑ − E[Z] acts as a control when estimating E[Y].

² Since E[Y_i^c] = E[Y_i + c(Z_i − E[Z])] = E[Ŷ] = E[Y].

If we now compute the variance of each replication Y_i^c we get,

$$\operatorname{Var}[Y_i^c] = \operatorname{Var}[Y_i + c\,(Z_i - E[Z])] = \sigma_Y^2 + 2c\,\sigma_Z\sigma_Y\rho_{ZY} + c^2\sigma_Z^2 \equiv \sigma_c^2, \qquad (4.3.1)$$

where σ_Z² = Var[Z], σ_Y² = Var[Y], and ρ_ZY is the correlation coefficient between Z and Y; the sample mean Ŷ^c over N replications then has variance σ_c²/N. In order to minimise the variance of the replication we choose the c* that minimises Equation 4.3.1, which is expressed as,

$$c^* = -\frac{\sigma_Y}{\sigma_Z}\,\rho_{ZY} = -\frac{\operatorname{Cov}[Z, Y]}{\operatorname{Var}[Z]}. \qquad (4.3.2)$$

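Substituting Equation 4.3.2 into 4.3.1 we get,

$$\operatorname{Var}[Y_i^{c^*}] = \sigma_Y^2 - \frac{\operatorname{Cov}[Z, Y]^2}{\operatorname{Var}[Z]} = \sigma_Y^2\,(1 - \rho_{ZY}^2), \qquad (4.3.3)$$

therefore, as long as Cov[Z, Y] ≠ 0 we can achieve a reduction in the variance of the estimated variate. In fact, the higher the absolute correlation |ρ_ZY| is, the greater the reduction in variance will be.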

All that remains now is to find an appropriate control variate that has non-zero correlation with the estimated variate. Since we are trying to find the discounted payoff of a call option, we could use the Black & Scholes SDE as the control variate Z of the above example.

Remark. If we were to examine the ratio of the variance of the optimally controlled estimator to that of the uncontrolled estimator we could derive that,

$$\frac{\operatorname{Var}[\hat{Y} + c^*(\hat{Z} - E[Z])]}{\operatorname{Var}[\hat{Y}]} = 1 - \rho_{ZY}^2, \qquad (4.3.4)$$

which implies that the stronger the correlation between the estimated variate Y and the control variate Z, the more effective the control variate is. N.B. this effect is independent of the sign of the correlation, since the sign cancels in the squared term of Equation 4.3.4.

The question remains, however, of how to estimate the correlation between the two variates. This can be achieved by doing t training simulation runs in which the correlation coefficient is calculated, first by calculating the sample covariance of the two variates.

By selecting an appropriate control variate, we are able to calculate its expected value as well as the sample variance of the replications. For the variance we get,

$$\widehat{\operatorname{Var}}[Z] = \frac{\sum_{i=1}^{t}(Z_i - E[Z])^2}{t - 1},$$

and hence we can derive from Equation 4.3.2 the optimal constant ĉ* to use for the control variate simulation. Once we have this constant we can use it to perform a simulation and reduce the variance with a control variate.

A simple algorithm to simulate the variate V with a control variate Z would look like Algorithm 2.

Algorithm 2 Monte Carlo Simulation with control variate(t, N)
Require: t > 1, N > 0.
Ensure: E[Y] = V̂_N.
1: for i = 1 to t do
2: generate (Y_i, Z_i) {training run to calculate the constant factor c*}
3: end for
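To make the procedure of Section 4.3.1 concrete, the following is a minimal Python sketch of a control variate simulation; it is an illustration under stated assumptions and not a transcription of Algorithm 2. We assume, purely for the example, that the underlying follows a geometric Brownian motion and that the control variate Z is the discounted terminal price, whose expectation E[Z] = S₀ is known. The function names and parameter values are hypothetical. A training run of t replications estimates c* as in Equation 4.3.2, and a main run of N replications then applies it.

```python
import math
import random


def sample_mean_var(xs):
    """Return the sample mean and the unbiased sample variance of xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


def call_with_control_variate(s0, k, r, sigma, T, t_train, n, seed=0):
    """Estimate a call price with and without a control variate (illustrative)."""
    rng = random.Random(seed)
    disc = math.exp(-r * T)
    ez = s0  # assumed E[Z]: the discounted terminal price is a martingale under GBM

    def draw_pair():
        # One replication (Y_i, Z_i): discounted payoff and discounted S_T.
        w = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * w)
        return disc * max(s_T - k, 0.0), disc * s_T

    # Training run of t replications to estimate c* = -Cov[Z, Y] / Var[Z].
    train = [draw_pair() for _ in range(t_train)]
    y_bar = sum(y for y, _ in train) / t_train
    cov_zy = sum((z - ez) * (y - y_bar) for y, z in train) / (t_train - 1)
    var_z = sum((z - ez) ** 2 for _, z in train) / (t_train - 1)
    c_star = -cov_zy / var_z

    # Main run of N replications: Y_i and Y_i^c = Y_i + c*(Z_i - E[Z]).
    plain, controlled = [], []
    for _ in range(n):
        y, z = draw_pair()
        plain.append(y)
        controlled.append(y + c_star * (z - ez))
    return sample_mean_var(plain), sample_mean_var(controlled), c_star
```

In a setting like this one the controlled sample variance should come out well below the plain one, in line with the factor 1 − ρ²_ZY of Equation 4.3.4.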

The antithetic variates method reduces variance by introducing a negative dependence between pairs of replications. We will present the case where the replications are sampled from a Uniform distribution, although this method can take various forms. As [Gla04, p.205] mentions, the method can be extended to other distributions via the inverse transform method, where F⁻¹(U) and F⁻¹(1 − U) both have distribution F but are antithetic to each other because F⁻¹ is monotonic. As an example, it is possible to use pairs of replications from the normal distribution, by pairing a sequence Z₁, Z₂, ··· of i.i.d. N(0,1) random variables with the antithetic sequence −Z₁, −Z₂, ··· of i.i.d. N(0,1) random variables.

Let us now extend the paradigm we presented in Section 4.3.1 for the price of a call option. In this case we generate pairs of antithetic replications from a Uniform distribution and use their average as the unbiased estimator,

$$Z_i = \frac{Y_i + \tilde{Y}_i}{2}, \qquad (4.3.5)$$

where Y_i = h(U_i) is the payoff of the call option sampled on a Uniform variate U_i ∼ U(0,1), and Ỹ_i = h(1 − U_i) is the payoff of the call option sampled on the antithetic Uniform variate 1 − U_i ∼ U(0,1).

Since E[Y_i] = E[Ỹ_i] = E[Y], we deduce from Equation (4.3.5) that Z_i is an unbiased estimator of E[Y], and because the U_i’s are i.i.d., so are the Z_i’s, and we can use them to construct confidence intervals. Algorithm 3 explains how the MC simulation would implement antithetic variates.

Algorithm 3 Monte Carlo Simulation with antithetic variates(N)
Require: N > 0.
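As a hedged illustration of the antithetic estimator of Equation (4.3.5), the sketch below pairs each uniform draw U_i with 1 − U_i and maps both through the inverse normal distribution function. It is not a transcription of Algorithm 3: the geometric Brownian motion payoff h, the function name and the parameters are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def antithetic_call(s0, k, r, sigma, T, n, seed=0):
    """Price a call with n antithetic pairs; returns (mean, Var[Z_i], Var[Y_i])."""
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf
    disc = math.exp(-r * T)

    def h(u):
        # Discounted call payoff driven by a single uniform variate (assumed GBM).
        w = inv_cdf(u)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * w)
        return disc * max(s_T - k, 0.0)

    ys, zs = [], []
    for _ in range(n):
        # Keep u strictly inside (0, 1) so that both u and 1 - u can be inverted.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        y, y_anti = h(u), h(1.0 - u)
        ys.append(y)
        zs.append(0.5 * (y + y_anti))  # Z_i = (Y_i + Y~_i) / 2

    z_mean = sum(zs) / n
    z_var = sum((z - z_mean) ** 2 for z in zs) / (n - 1)
    y_mean = sum(ys) / n
    y_var = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    return z_mean, z_var, y_var
```

Because the Z_i are i.i.d., an approximate 95% confidence interval follows as z_mean ± 1.96·sqrt(z_var / n). The method pays off when Var[Z_i] falls below Var[Y_i]/2, which is the variance of the average of two independent replications.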

One more procedure to reduce the variance of the simulation is to sample variates of lower variance. Such numbers can be sampled from the so-called "low discrepancy" sequences. A sequence’s discrepancy is a measure of its uniformity and is defined by the following definition (see [Lev02]).

Definition 4.1. Given a set of points x¹, x², ···, x^N ∈ I^S and a subset G ⊂ I^S, define the counting function S_N(G) as the number of points xⁱ ∈ G. For each x = (x₁, x₂, ···, x_S) ∈ I^S, let G_x be the rectangular S-dimensional region,

The discrepancy value compares the number of sample points found in a volume of the multi-dimensional space against the number of points that should be in that volume if the points were uniformly distributed.

There are a few sequences that are used to generate quasi-random variates. The Numerical Algorithms Group (NAG) libraries provide three sequence generators: the Niederreiter [Nid92], the Sobol [Sob67], and the Faure [FAU81] sequences, which are implemented in MATLAB with the functions g05yl and g05ym.
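To illustrate what "low discrepancy" means in practice, the sketch below uses the one-dimensional van der Corput sequence, chosen here only because it fits in a few lines; it is a swapped-in example and not one of the Niederreiter, Sobol or Faure generators named above. The discrepancy is computed with the closed-form expression for the star discrepancy of a sorted one-dimensional point set, and all function names are hypothetical.

```python
import random


def van_der_corput(i, base=2):
    """Return the i-th element of the van der Corput sequence in the given base."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom  # mirror the digits of i about the radix point
    return x


def star_discrepancy_1d(points):
    """Exact star discrepancy of a one-dimensional point set in [0, 1]."""
    xs = sorted(points)
    n = len(xs)
    return 1.0 / (2 * n) + max(
        abs(x - (2 * i - 1) / (2 * n)) for i, x in enumerate(xs, start=1)
    )


# Compare a quasi-random and a pseudo-random point set of the same size.
n_points = 1024
quasi = [van_der_corput(i) for i in range(1, n_points + 1)]
rng = random.Random(0)
pseudo = [rng.random() for _ in range(n_points)]
d_quasi = star_discrepancy_1d(quasi)
d_pseudo = star_discrepancy_1d(pseudo)
```

The quasi-random set fills the unit interval far more evenly than the pseudo-random one, so d_quasi comes out smaller than d_pseudo.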

4.4 Discretisation Schemes for Stochastic Differential Equations

In §4.3 we described methods of reducing the variance of the Monte Carlo simulation and thus increasing the precision of the estimated value. However, there is one more source of error that needs to be taken into account and addressed: the simulation bias due to the discretisation of the SDE. One way to think of this is as shooting arrows at a bullseye target; high-precision shots form a tight cluster of arrows, but the cluster could lie completely outside the circles due to high bias. We will first present the Euler scheme, which is the simplest and most common form of discretisation, before we proceed to refine and extend it with alternative schemes.

4.4.1 Euler-Maruyama scheme

Let us consider the case of the following SDE,

$$dX(t) = \alpha(X(t))\,dt + \beta(X(t))\,dW(t). \qquad (4.4.1)$$

Also let X̂ be the discretised approximation of X. The Euler-Maruyama [Mar55] approximation on the temporal granulation 0 = t₀ < t₁ < ··· < t_m is,

$$\hat{X}(t_{i+1}) = \hat{X}(t_i) + \alpha(\hat{X}(t_i))\,\Delta t + \beta(\hat{X}(t_i))\sqrt{\Delta t}\,Z_{i+1}, \qquad (4.4.2)$$

for i = 0, ···, m − 1, with Z_i i.i.d. standard normal variates and ∆t = t_{i+1} − t_i.
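The recursion of Equation (4.4.2) translates almost line by line into code. The following Python sketch assumes an equally spaced time grid and takes the drift α and diffusion β as plain functions; the function name and signature are illustrative choices rather than anything prescribed by the text.

```python
import math
import random


def euler_maruyama(alpha, beta, x0, T, m, rng):
    """Simulate one Euler-Maruyama path of dX = alpha(X)dt + beta(X)dW on m equal steps."""
    dt = T / m
    path = [x0]
    for _ in range(m):
        x = path[-1]
        z = rng.gauss(0.0, 1.0)  # Z_{i+1}, a standard normal variate
        path.append(x + alpha(x) * dt + beta(x) * math.sqrt(dt) * z)
    return path


# Usage example (assumed parameters): geometric Brownian motion with
# alpha(x) = mu * x and beta(x) = sigma * x.
mu, sigma = 0.05, 0.2
example_path = euler_maruyama(lambda x: mu * x, lambda x: sigma * x,
                              1.0, 1.0, 50, random.Random(0))
```

With β ≡ 0 the scheme reduces to the explicit Euler method for the ordinary differential equation dX = α(X)dt, which provides a simple sanity check.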

Since the discretisation is an approximation process, it is imperative to measure how accurate this approximation ultimately is. To do this we need to evaluate the discrepancy between the SDE and its discrete approximation as a function of the size of ∆t. In essence we want to evaluate whether the error

$$\|X(t) - \hat{X}(t)\| \to 0$$

when ∆t → 0. There are two accepted metrics to measure this discrepancy: weak convergence, which measures the error of the mean, and strong convergence, which measures the mean of the error.

A typical weak convergence error has the form,

$$\epsilon^{weak}_{\Delta t} := \sup_{0 \le i \le m} \left| E[f(X(t_i))] - E[f(\hat{X}(t_i))] \right|,$$

where f is a polynomial of some order k. We say that the discretised approximation X̂(t) converges weakly if ε^weak_∆t → 0 when ∆t → 0. The order of the weak convergence is γ > 0 when

$$\epsilon^{weak}_{\Delta t} \le C\,\Delta t^{\gamma},$$

for some scalar C and for all sufficiently small ∆t.

Similarly, the discretised approximation X̂(t) converges strongly if the strong error ε^strong_∆t, the mean of the absolute error between X and X̂, tends to 0 when ∆t → 0. The order of the strong convergence is γ > 0 when

$$\epsilon^{strong}_{\Delta t} \le C\,\Delta t^{\gamma},$$

for some scalar C and for all sufficiently small ∆t. According to [Gla04, p.345], the Euler scheme typically has a strong order of 1/2, but often achieves a weak order of 1.
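The two error measures can be estimated numerically when an exact solution is available. The sketch below is a simplified experiment under our own assumptions: it uses geometric Brownian motion, measures both errors at the terminal time only (rather than taking a supremum over the grid), takes f as the identity, and reuses the same Brownian increments for every step size. All names and parameter values are hypothetical.

```python
import math
import random


def euler_errors(mu, sigma, x0, T, fine_steps, coarsenings, n_paths, seed=0):
    """Return [(dt, strong_error, weak_error)] of the Euler scheme for GBM at time T.

    Each entry R of coarsenings must divide fine_steps; the step size is R * T / fine_steps.
    """
    rng = random.Random(seed)
    dt_fine = T / fine_steps
    strong = {R: 0.0 for R in coarsenings}
    mean_hat = {R: 0.0 for R in coarsenings}
    for _ in range(n_paths):
        # Brownian increments on the finest grid, shared by all step sizes.
        dW = [rng.gauss(0.0, math.sqrt(dt_fine)) for _ in range(fine_steps)]
        x_exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * sum(dW))
        for R in coarsenings:
            dt = R * dt_fine
            x = x0
            for j in range(0, fine_steps, R):
                x += mu * x * dt + sigma * x * sum(dW[j:j + R])
            strong[R] += abs(x_exact - x) / n_paths  # mean of the error
            mean_hat[R] += x / n_paths
    exact_mean = x0 * math.exp(mu * T)
    # Weak error: error of the mean, using the known E[X(T)] of GBM.
    return [(R * dt_fine, strong[R], abs(exact_mean - mean_hat[R])) for R in coarsenings]
```

Plotting the two errors against ∆t on a log-log scale would give slopes that approximate the strong and weak orders quoted above, subject to Monte Carlo noise in the weak estimate.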

4.4.2 Milstein scheme

This scheme was first proposed by Milstein [Mil95], and is explained in detail by Glasserman [Gla04, p.340-344], by Klöden and Platen [KPS94] for more general processes, and by Kahl and Jäckel [KJ06] for stochastic volatility processes. The scheme works for SDEs whose drift and diffusion terms do not depend directly on time. Let us take the case of a stochastic process as defined in Equation (4.4.1). The Milstein discretisation scheme can then be expressed as

$$\hat{X}(i+1) = \hat{X}(i) + \alpha(\hat{X}(i))\,\Delta t + \beta(\hat{X}(i))\sqrt{\Delta t}\,Z_{i+1} + \frac{1}{2}\,\beta'(\hat{X}(i))\,\beta(\hat{X}(i))\,\Delta t\,(Z_{i+1}^2 - 1). \qquad (4.4.3)$$

The discretisation in Equation 4.4.3 is composed of a deterministic part defined by the α term, a stochastic part defined by the β term, and an Itô correction term.
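A single Milstein step differs from an Euler step only by the Itô correction term of Equation (4.4.3). The sketch below implements both steps and, as an illustrative experiment under our own assumptions, compares their terminal strong errors on geometric Brownian motion, for which β(x) = σx and β′(x) = σ. The function names and parameters are hypothetical.

```python
import math
import random


def euler_step(x, alpha, beta, dt, z):
    """One Euler-Maruyama step."""
    return x + alpha(x) * dt + beta(x) * math.sqrt(dt) * z


def milstein_step(x, alpha, beta, dbeta, dt, z):
    """One Milstein step; dbeta is the derivative of the diffusion term beta."""
    return (x + alpha(x) * dt + beta(x) * math.sqrt(dt) * z
            + 0.5 * dbeta(x) * beta(x) * dt * (z * z - 1.0))


def gbm_strong_errors(mu, sigma, x0, T, m, n_paths, seed=0):
    """Return the terminal strong errors (Euler, Milstein) for GBM on m steps."""
    rng = random.Random(seed)
    dt = T / m
    alpha = lambda x: mu * x
    beta = lambda x: sigma * x
    dbeta = lambda x: sigma
    err_euler = err_milstein = 0.0
    for _ in range(n_paths):
        x_e = x_m = x0
        w = 0.0  # Brownian motion accumulated along the path
        for _ in range(m):
            z = rng.gauss(0.0, 1.0)
            w += math.sqrt(dt) * z
            x_e = euler_step(x_e, alpha, beta, dt, z)
            x_m = milstein_step(x_m, alpha, beta, dbeta, dt, z)
        exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
        err_euler += abs(exact - x_e) / n_paths
        err_milstein += abs(exact - x_m) / n_paths
    return err_euler, err_milstein
```

Note that the correction vanishes whenever Z² = 1 or β′ = 0, in which case the two steps coincide.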

4.4.3 Kahl-Jäckel scheme

Kahl and Jäckel [KJ06, p.24] propose an implicit Milstein scheme for the variance in combination with an alternative discretisation for the underlying’s price process. Specifically, they refer to this stochastic volatility scheme as IJK and define it as

where θ is the equilibrium level supported by the fundamentals, κ is the rate at which the shocks dissipate and the variance returns to θ, ξ is the degree of volatility around it caused by shocks, and Z_X, Z_V are Normal variates.

By finding the minimum of the variance function and forcing its value to be positive we can easily derive that the variance is guaranteed to be strictly positive if 4κθ > ξ². As Andersen [And07] mentions, however, it is unrealistic to uphold this constraint in practice, hence this scheme can and will produce negative variance. Andersen therefore proposes a full truncation to 0 whenever a value is negative. This means that Equation (4.4.4) will substitute V̂(···)⁺ = max(V̂(···), 0) wherever V̂(···) appears.
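The substitution V̂⁺ = max(V̂, 0) is easiest to see on a plain Euler step of the variance process, so the sketch below applies full truncation there rather than to the IJK scheme of Equation (4.4.4). This is our simplification, and the function name and parameter values are hypothetical. The untruncated state is carried forward, while only V̂⁺ enters the drift and the square root.

```python
import math
import random


def full_truncation_variance_path(v0, kappa, theta, xi, dt, n_steps, seed=0):
    """Euler path of the variance with full truncation; returns (raw, truncated) paths."""
    rng = random.Random(seed)
    v = v0
    raw, truncated = [v0], [max(v0, 0.0)]
    for _ in range(n_steps):
        v_plus = max(v, 0.0)  # V^+ replaces V inside the drift and diffusion terms
        z = rng.gauss(0.0, 1.0)
        v = v + kappa * (theta - v_plus) * dt + xi * math.sqrt(v_plus) * math.sqrt(dt) * z
        raw.append(v)  # the raw state is allowed to go negative
        truncated.append(max(v, 0.0))
    return raw, truncated


# Assumed parameters that violate 4 * kappa * theta > xi ** 2.
raw_path, truncated_path = full_truncation_variance_path(
    0.04, 0.5, 0.04, 1.0, 1.0 / 32, 1000, seed=5)
```

When the raw state is negative the diffusion term switches off and the drift κθ∆t pulls the state back towards positive values, so the square root never receives a negative argument.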

4.4.4 Broadie-Kaya exact calculation scheme

Broadie and Kaya [BK06] presented a discretisation process that is completely bias-free. This exact scheme, though, has limitations³ and sub-optimal performance even against the "simple" Euler scheme, as Lord et al. [LKvD06] showed in their numerical comparisons⁴.

³ Due to lack of speed and high complexity of implementation.

⁴ Most likely due to the reliance on acceptance-rejection sampling, which induces a performance penalty.

To obtain the bias-free scheme, begin with the consecutive application of Itô’s Lemma, first to get the explicit form and then to pass on to a Cholesky decomposition (see [And07, p.7] for a detailed derivation). What is finally obtained is,

V (t + ∆t) = V (t) +

The distribution of ln X(t+∆t) is clearly Normal, and after sampling V(t+∆t) from a non-central χ² distribution, with degrees of freedom determined by a Poisson sample, we can draw a sample of $\int_t^{t+\Delta t} V(u)\,du \mid V(t+\Delta t)$ and calculate the next log value of X. Since this last sampled distribution is conditional on the next value of the variance, we can identify it as a Brownian Bridge.
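Only the first step of the exact scheme, sampling V(t+∆t) given V(t), is sketched below; the conditional integral of the variance is not attempted. The sketch relies on the standard result for a square-root variance process, which we state as an assumption since it is not derived here: V(t+∆t) is distributed as c·χ′²_d(λ) with c = ξ²(1 − e^{−κ∆t})/(4κ), d = 4κθ/ξ² and λ = V(t)e^{−κ∆t}/c. The non-central χ² is drawn as a central χ² with d + 2N degrees of freedom, where N is Poisson with mean λ/2. The helper names are hypothetical.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Poisson sample by Knuth's multiplication method; adequate for moderate lam only."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_variance_exact(v_t, kappa, theta, xi, dt, rng):
    """Draw V(t + dt) given V(t) from the scaled non-central chi-square law."""
    e = math.exp(-kappa * dt)
    c = xi ** 2 * (1.0 - e) / (4.0 * kappa)  # scale factor
    d = 4.0 * kappa * theta / xi ** 2        # base degrees of freedom
    lam = v_t * e / c                        # non-centrality parameter
    n_pois = poisson_knuth(0.5 * lam, rng)   # Poisson mixing variable
    # A chi-square with d + 2N degrees of freedom is 2 * Gamma(d / 2 + N, 1).
    chi2 = 2.0 * rng.gammavariate(0.5 * d + n_pois, 1.0)
    return c * chi2
```

A quick consistency check is that the sample mean approaches θ + (V(t) − θ)e^{−κ∆t}, the conditional mean of the process.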

4.4.5 Quadratic Exponential (QE) scheme

In 2005 Andersen [And07] proposed a new scheme to discretise the stochastic volatility and the price of an underlying asset. This scheme takes advantage of the fact that a non-central χ² variate can be approximated by a related distribution that is moment-matched to the conditional first and second moments of the non-central χ² distribution.

As Andersen points out, although a cubic transformation of the Normal RV is a more accurate representation of the distribution close to 0, it introduces negative values of variance. Thus the quadratic representation is adopted, with a special case for low values of V(t). Therefore, when V(t) is sufficiently large, we get,

$$\hat{V}(t + \Delta t) = a\,(b + Z_V)^2, \qquad (4.4.8)$$

where Z_V is an N(0,1) Gaussian RV, and a, b are scalars that will be determined by moment-matching. For the complementary low values of V(t) the distribution can, asymptotically, be approximated by,

$$P(\hat{V}(t + \Delta t) \in [x, x + dx]) \approx \left(p\,\delta(0) + \beta(1 - p)e^{-\beta x}\right)dx, \quad x \ge 0, \qquad (4.4.9)$$

where δ is the Dirac delta-function, strongly reflective at 0, and p and β are positive scalars to be calculated. The scalars a, b, p, β depend on the parameters of the Heston model and the time granulation ∆t, and will be calculated by moment-matching the exact distribution.

To sample from these distributions there are two cases to take into account:

◦ Sample from the N(0,1) Gaussian RV and calculate V̂(t + ∆t) from Equation (4.4.8).

◦ To sample the small values of V, the inverse of the distribution function of Equation (4.4.9) is used. The inverse of the distribution function is,

$$\Psi^{-1}(u) = \Psi^{-1}(u; p, \beta) = \begin{cases} 0 & \text{if } 0 \le u \le p, \\ \beta^{-1}\ln\left(\dfrac{1-p}{1-u}\right) & \text{if } p < u \le 1. \end{cases} \qquad (4.4.10)$$

The value of V can then be sampled from

$$\hat{V}(t + \Delta t) = \Psi^{-1}(U_V; p, \beta), \qquad (4.4.11)$$

where U_V is a uniform RV.

The rule for deciding which discretisation of V to use depends on the non-centrality of the distribution, and can be triaged based on the value of ψ = s²/m², where m and s² are the conditional mean and variance of the exact distribution we are matching. What Andersen showed was that the quadratic scheme of Equation 4.4.8 can only be moment-matched for ψ ≤ 2 and, similarly, the exponential scheme of Equation 4.4.11 can only be moment-matched for ψ ≥ 1. It emerges then that there is an interval ψ ∈ [1, 2] where the two schemes overlap.

Intuitively, Andersen chooses the midpoint of this interval as the cut-off point between the schemes; thus the cut-off is ψ_c = 1.5.

Since we’ve defined the discretisation process for the QE scheme, with Equations 4.4.8 and 4.4.11, and the cut-off discriminator, what is left is to calculate the remaining parameters a, b, p, β for each case. The algorithm for this process is detailed in Algorithm 4.

This algorithm is implemented in MATLAB [see Listing A.4 for details], and is used for numerical comparisons of acceleration.
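For orientation, a compact Python sketch of one QE variance step follows. It is not the MATLAB listing referred to above and not a transcription of Algorithm 4: the moment-matching formulas for m, s², a, b, p and β are recalled from Andersen's paper and should be treated as assumptions to be checked against [And07], and the function names are hypothetical.

```python
import math
import random


def psi_inverse(u, p, beta):
    """Inverse distribution function of Equation (4.4.10)."""
    if u <= p:
        return 0.0
    return math.log((1.0 - p) / (1.0 - u)) / beta


def qe_variance_step(v_t, kappa, theta, xi, dt, rng, psi_c=1.5):
    """Draw V(t + dt) with the quadratic or the exponential branch of the QE scheme."""
    e = math.exp(-kappa * dt)
    # Conditional mean and variance of the exact distribution (assumed formulas).
    m = theta + (v_t - theta) * e
    s2 = (v_t * xi ** 2 * e * (1.0 - e) / kappa
          + theta * xi ** 2 * (1.0 - e) ** 2 / (2.0 * kappa))
    psi = s2 / (m * m)
    if psi <= psi_c:
        # Quadratic branch, Equation (4.4.8): V = a (b + Z_V)^2.
        b2 = 2.0 / psi - 1.0 + math.sqrt(2.0 / psi) * math.sqrt(2.0 / psi - 1.0)
        a = m / (1.0 + b2)
        z = rng.gauss(0.0, 1.0)
        return a * (math.sqrt(b2) + z) ** 2
    # Exponential branch, Equation (4.4.11): V = Psi^{-1}(U_V; p, beta).
    p = (psi - 1.0) / (psi + 1.0)
    beta = (1.0 - p) / m
    return psi_inverse(rng.random(), p, beta)
```

Both branches are constructed so that the sampled variance has mean m, which gives a simple check on an implementation: the sample mean over many draws should approach θ + (V(t) − θ)e^{−κ∆t} regardless of which branch is taken.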

In document Case Studies in Acceleration of Heston's Stochastic Volatility Financial Engineering Model: GPU, Cloud and FPGA Implementations (Page 33-41)