Case Studies in Acceleration of Heston’s Stochastic Volatility Financial Engineering Model: GPU, Cloud and FPGA Implementations


Case Studies in Acceleration of

Heston’s Stochastic Volatility

Financial Engineering Model:

GPU, Cloud and FPGA Implementations

by

Christos Delivorias

Supervised by

Dr. Peter Richtárik

and Martin Takáč

Dissertation presented for the Degree of

MSc in Operational Research

The School of Mathematics

August 2012


I want to thank first and foremost Mr. Erik Vynckier of Scottish Widows Investment Partnership (SWIP) in Edinburgh, both for giving me the opportunity to work on this project and for his support and guidance throughout it. He has been an invaluable mentor to me. I would also like to thank my academic supervisor Dr. Peter Richtárik for his help, for believing in me, and for suggesting me for this project. I especially want to thank Mr. Martin Takáč in his role as co-supervisor. His experience with aspects of GPU parallelisation has been invaluable, as has his instantaneous availability to answer my questions. Many aspects of this thesis might have been left uncompleted without his assistance and guidance. I would also like to thank Messrs Robin Bruce and Steven Hutt at Maxeler Technologies for their hospitality in London and their support with getting to grips with Dataflow processing. On the same note, I’d like to acknowledge the help of Messrs Tuomas Eerola and Rainer Wehkamp at Techila Technologies for extending to myself and SWIP the use of their grid processing platform. Acknowledgments also go out to Mr. John Holden for providing a trial license for the Numerical Algorithms Group (NAG) libraries for Matlab, to use in this project. Similarly, to Mr. John Ashley at NVIDIA Corp for his input regarding the physical setup of the GPU server, to Mathworks for providing a trial license for Matlab to run on the GPU server, and to Boston Limited for providing the GPU test platform.

Finally, and most importantly for myself, I want to thank my girlfriend and partner in life Eugenia Metzaki. My love, I would not have made it thus far without your help, your support, and your faith in me. Thank you.


Declaration

I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text.


Abstract

Here we present a comparative study of the performance of the Heston stochastic volatility model on different acceleration platforms. The model was tested on a MacBook’s CPU, a Techila grid server hosted on Microsoft’s Azure cloud, a GPU node hosted by Boston Ltd, and an FPGA node hosted by Maxeler Technologies Ltd. Timing data was collected and compared against the CPU baseline to quantify the acceleration benefit of each platform.


List of Tables

6.1 GPGPU specifications for the dual Kepler cards
6.2 GPGPU specifications from NVIDIA’s website

List of Figures

3.1 Prices of 3 indices from 1990 to 2012
3.2 GARCH volatility of 3 indices from 1990 to 2012
3.3 A plotting of the Cox-Ingersoll-Ross process
4.1 Monte Carlo (MC) simulation procedure
4.2 Sobol sequence for quasi-random variates
4.3 platforms sequence for quasi-random variates
4.4 Faure sequence for quasi-random variates
5.1 Central Processing Unit (CPU) Architecture
5.2 The FPGA architecture
5.3 Code use with the Maxeler Technologies
6.1 Memory space allocation on a GPU
6.2 Schematic of the multiple cores in the NVIDIA Fermi GPU
7.1 Parallel problem architecture
7.2 Embarrassingly parallel problem architecture
7.3 Techila Architecture
8.1 Matlab figures for the Monte Carlo simulation of the Heston model
8.2 Call option prices with multiple models
8.3 Monte Carlo standard error

List of Algorithms

1 MC Simulation(N)
2 Monte Carlo Simulation with control variate(t, N)
3 Monte Carlo Simulation with antithetic variates(N)
4 Quadratic Exponential Variance Reduction

Acronyms and Abbreviations

ALU Arithmetic Logic Unit
AJD Affine Jump Diffusion
ATM At the Money
BLAST Basic Local Alignment Search Tool
B&S Black & Scholes
CIR Cox-Ingersoll-Ross
CPU Central Processing Unit
KS Kolmogorov-Smirnov
CUDA Compute Unified Device Architecture
CI Confidence Interval
DoF Degrees of Freedom
GPU Graphical Processing Unit
GPGPU General Purpose computing on GPU
HW Half-Width
HDL Hardware Description Language
HFT High Frequency Trading
i.i.d. independent and identically distributed
MC Monte Carlo
MCMC Markov Chain Monte Carlo
MT Mersenne Twister
NAG Numerical Algorithms Group
FPGA Field Programmable Gate Array
GARCH Generalized Auto-Regressive Conditional Heteroskedasticity
GBM Geometric Brownian Motion
OS Operating System
PCT Parallel Computing Toolbox
PCIe Peripheral Component Interconnect express
PDE Partial Differential Equation
PDF Probability Density Function
PRNG Pseudo-Random Number Generator
QE Quadratic Exponential
RV Random Variable
SDE Stochastic Differential Equation

Chapter 1 Introduction

The purpose of this thesis is to explore different ways to accelerate a particular financial model, specifically the Heston stochastic volatility model. The platforms examined are, at the time of writing, the cutting edge in the parallelisation of algorithms. For me this was completely novel and uncharted territory. Coming from a background in Computer Science, and having taken a risk-management approach within the MSc in Operational Research, I found that this project combined aspects of financial mathematics that I had to come to grips with quickly. It has been a steep learning curve, but I’ve thoroughly enjoyed mastering this new domain of knowledge.

1.1 Motivation

For a long time now, even before the 2008 financial crisis, the derivatives sector has been in need of a fast and efficient way to calculate the prices of options given certain market data. The use of this is twofold. First, the ability to price an option on a given underlying quickly is important in a fast-paced market environment. The second use is in a relatively new sector of the market that deals with High Frequency Trading (HFT). This sector relies on extremely fast computations in order to make algorithmic decisions based on the current market status. This thesis will focus on the pricing of options rather than on aspects of HFT. The interested reader may wish to look into some of the negative aspects of this practice as explained by Easley et al. [ELO].

This joint project between the University of Edinburgh and Scottish Widows Investment Partnership aims to explore the possibilities of accelerating financial engineering models with real-life applications in the markets. The goal of this thesis was to test the same model on different platforms and assess the acceleration benefits that each platform provided.


1.2 Outline of the Thesis

Chapter 2 introduces the pricing theory behind derivative options and states some of the shortcomings of the previous models. This chapter links the Stochastic Differential Equations (SDEs) with the Partial Differential Equations (PDEs). The introduction of stochastic volatility to combat the issues raised in Chapter 2 is addressed in Chapter 3. Chapter 4 explains the theory and the implementation behind evaluating an SDE using a Monte Carlo simulation. Chapters 5, 6, and 7 go into more detail on the acceleration platforms of Field Programmable Gate Arrays (FPGAs), Graphical Processor Units (GPUs), and the Techila Cloud, respectively. Chapter 8 details the efficiency and the accuracy of the implementation of the Heston model in Matlab, and finally Chapter 9 presents the experimental results and concludes the thesis.


Chapter 2 Option Pricing Theory

This chapter will touch upon the fundamentals of arbitrage theory in pricing the fair value of an option. An option provides the bearer with the opportunity, but not the obligation, to sell (put) or buy (call) the underlying at a fixed price (strike). If the option is not exercised nothing happens, aside from the loss of the premium paid for the option. There are currently two ways to achieve this pricing. One is via the option’s characteristic PDE and the other is via Monte Carlo simulation. The PDE approach will be presented in Section 2.4 and will be used to analytically evaluate the price of the Heston model in Section 3.2. The risk-neutral approach in Section 2.2 will be used to numerically simulate the price of the underlying with the Monte Carlo simulation, as presented in Chapter 4.

2.1 Arbitrage-free and complete market

Arbitrage is the practice of identifying and taking advantage of mis-priced assets that are trading on multiple exchanges. In effect this is taking advantage of someone selling an asset for a lower price than another party is willing to buy it for, and vice versa. By acting within the price spread, arbitrageurs can make a profit. The act of arbitrage itself changes the underlying’s value and thus pushes it towards an equilibrium. An arbitrage-free market is one where such imbalances do not exist. This idea is crucial when deriving the risk-neutral valuation of an option.

Given a portfolio with M underlying assets, excluding the risk-free asset, and R the number of sources of randomness, it is possible to define a "meta-theorem" to determine if the portfolio represents a complete and arbitrage-free market. As stated in [Bj9, p. 122], we can define the following rule of thumb.

Theorem 2.1. Let M denote the number of underlying traded assets in the model excluding the risk free asset, and let R denote the number of random sources. Generically we then have the following relations:

1. The model is arbitrage free if and only if M ≤ R.

2. The model is complete if and only if M ≥ R.

3. The model is complete and arbitrage free if and only if M = R.

Remark. The practical implication of a complete market is that options are hedgeable and can therefore be priced. On the other hand an incomplete market is one where derivative securities cannot be perfectly hedged, and the price of an option cannot be derived from the prices of its underlying assets.
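The rule of thumb in Theorem 2.1 can be written as a small check. The Python sketch below is illustrative only: the thesis listings are in Matlab, and the function name `classify_market` is an assumption made for this example. It applies the rule to the B&S setting (one asset, one random source) and to the Heston setting discussed in Chapter 3 (one asset, two random sources).

```python
def classify_market(num_assets: int, num_random_sources: int) -> dict:
    """Apply the rule of thumb of Theorem 2.1.

    num_assets is M (traded assets excluding the risk-free asset),
    num_random_sources is R (independent sources of randomness).
    """
    return {
        "arbitrage_free": num_assets <= num_random_sources,
        "complete": num_assets >= num_random_sources,
    }


# B&S: one stock driven by one Brownian motion.
print(classify_market(1, 1))
# Heston with only the underlying traded: one asset, two Brownian motions.
print(classify_market(1, 2))
```

Adding a second traded asset to the Heston case, i.e. `classify_market(2, 2)`, restores completeness under this rule.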

For the purposes of this thesis we will assume a frictionless market, with no taxes or transaction costs and premiums. For simplicity of the modelling we will also assume a non-dividend paying underlying. We will proceed to examine the relationship between two assets; the option and the underlying.

2.2 Risk-neutral valuation

The idea of risk-neutral valuation states that a derivative’s price is calculated as the expectation of its discounted price at maturity, under a risk-neutral measure. What this implies is that the discounted option price processes are martingales under the risk-neutral measure in an arbitrage-free market. In this case the value of the derivative is the discounted expected value of the payoff at maturity, and the expected return of the underlying asset equals the risk-free rate.

2.3 Feynman-Kac theorem

The Feynman-Kac theorem is fundamental to financial modelling as it links the PDE to the SDE domain. What follows is a brief presentation of the theorem. The theorem is explained in detail in [Kle05].

Theorem 2.2. Let $x_t$ follow the SDE

$$dx_t = \mu(x_t, t)\,dt + \sigma(x_t, t)\,dW_t^{Q}, \tag{2.3.1}$$

where $W_t^{Q}$ is a Brownian motion (Wiener process) under the probability measure $Q$. Now let $f(x_t, t)$ be a differentiable function of $x_t$ and $t$ that follows the PDE

$$\frac{\partial f}{\partial t} + \mu(x_t, t)\frac{\partial f}{\partial x} + \frac{1}{2}\sigma(x_t, t)^2\frac{\partial^2 f}{\partial x^2} - r(x_t, t)\,f(x_t, t) = 0,$$

subject to the boundary condition $f(x_T, T)$ at time $T$. Under the provisions of the theorem, $f$ has the solution

$$f(x_t, t) = E^{Q}\!\left[ e^{-\int_t^T r(x_u, u)\,du}\, f(x_T, T) \,\middle|\, \mathcal{F}_t \right]. \tag{2.3.2}$$
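The link stated by Theorem 2.2 can be checked numerically in a simple case. The following Python sketch is an illustration under assumed conditions, not part of the thesis code: it takes constant coefficients µ, σ and r and the terminal condition f(x, T) = x², for which the PDE has the closed-form solution e^{−r(T−t)}((x + µ(T−t))² + σ²(T−t)), and compares it with the discounted expectation of Equation (2.3.2) estimated by simulation. All function names and parameter values are hypothetical.

```python
import math
import random


def feynman_kac_mc(x0, mu, sigma, r, tau, n_paths, seed=0):
    """Monte Carlo estimate of E[exp(-r*tau) * x_T**2 | x_t = x0]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # With constant coefficients the SDE can be sampled exactly at maturity.
        x_T = x0 + mu * tau + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)
        total += x_T ** 2
    return math.exp(-r * tau) * total / n_paths


def feynman_kac_pde_solution(x0, mu, sigma, r, tau):
    """Closed-form PDE solution for the terminal condition f(x, T) = x**2."""
    return math.exp(-r * tau) * ((x0 + mu * tau) ** 2 + sigma ** 2 * tau)


if __name__ == "__main__":
    params = dict(x0=1.0, mu=0.05, sigma=0.2, r=0.03, tau=1.0)
    print("PDE solution:", feynman_kac_pde_solution(**params))
    print("MC estimate :", feynman_kac_mc(n_paths=50_000, **params))
```

The two printed values should agree up to the Monte Carlo sampling error, which shrinks as the number of paths grows.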


2.4 Black & Scholes (B&S) model

The B&S model formulates the assumption that the price of an asset follows a GBM,

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \tag{2.4.1}$$

where the drift $\mu$ and the volatility $\sigma$ are assumed to be constant. This means that the asset prices are log-normally distributed. This, however, is at odds with recent empirical observations, which indicate that the volatility is non-Gaussian and in fact stochastic.

The price $C_T$ for a vanilla call option according to B&S is

$$C_T = e^{-rT}\,E\!\left[\left(S_0\, e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N_{(0,1)}} - K\right)^{+}\right], \tag{2.4.2}$$

where $r$ is the risk-free interest rate, $T$ is the time to maturity, $K$ is the strike price, $S_0$ is the spot price at $t = 0$, $\sigma$ is the volatility, and $N_{(0,1)}$ is a Gaussian Random Variable (RV). In this example the payoff $f$ of a call option is $f = \left(S_0\, e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N_{(0,1)}} - K\right)^{+}$.
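For the B&S model the expectation in Equation (2.4.2) can be evaluated in closed form, which gives a convenient reference value for the simulations in later chapters. The sketch below uses the standard B&S call formula, S₀Φ(d₁) − Ke^{−rT}Φ(d₂), written in Python for illustration; the function names and the example parameters are assumptions, not taken from the thesis listings.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """Closed-form B&S price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


# Hypothetical at-the-money example: S0 = K = 100, r = 5%, sigma = 20%, T = 1 year.
print(round(bs_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))
```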


Chapter 3 Underlying Price Volatility

For many years the B&S model was the de facto method for pricing derivative securities. The purpose of the B&S model was to create a risk-neutral asset that could be perfectly hedged against market volatility. In their seminal paper Fischer Black and Myron Scholes [BS73] described the partial differential equation that governs the price of a derivative over time. The key insight of this approach was to hedge the security by buying or selling the underlying at a certain ratio ∆, which was calculated by the B&S equation. This kind of strategy is also referred to as dynamic delta hedging.

However, recent criticism [HT09] has highlighted the weaknesses and limitations of the model. One of the most prominent remarks is the incapacity of the Gaussian distribution to properly address the probability of events in the tails of the distribution. Suggestions have been made to use the Student’s t-distribution, which is a fat-tailed distribution for low Degrees of Freedom (DoF).

3.1 Stochastic Volatility

As is suggested in [Kwo08] there is an apparent clustering of volatility with regard to an asset’s value. The clustering is characterised by a mean-reverting behaviour of the volatility. The assumption made by B&S was that of a constant volatility. However, empirical evidence has shown that the volatility of an asset price is not described very well by the Normal distribution; in fact the volatility follows a leptokurtic distribution¹. Figure 3.1 shows the prices of 3 indices (FTSE100, S&P500, and NIKKEI225) from 1990 until 2012. Figure 3.2 shows the volatility of each index modelled with a GARCH(1,1) model² since 1990. It is apparent that the volatility is not normally distributed.

¹ Essentially the distribution has higher peaks around the mean and fatter tails than that of a Gaussian distribution.

² The Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) model aims to [...] kurtosis. Volatility clustering is the effect that large changes in prices tend to be followed by large changes, of either sign, and vice versa.

Figure 3.1: The FTSE100, S&P500, and NIKKEI225 index values from 1990 to 2012 (one panel per index; adjusted close price in $ against date). The Matlab code for these plots can be found in Listing A.3.

Since volatility drives the asset price, it became apparent that a more realistic model was required to represent volatility. Heston [Hes93] showed that the B&S model could be extended by taking into account a stochastically moving volatility.

Figure 3.2: The corresponding conditional variance (in red) calculated with GARCH(1,1), overlaid on the daily returns of each index (panels: FTSE100, NIKKEI225, and SP500 daily returns with the GARCH(1,1) conditional SD). The Matlab code for these plots can be found in Listing A.3.
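The conditional variance shown in Figure 3.2 comes from a GARCH(1,1) model. As a rough illustration of how such a series is produced, the Python sketch below applies the standard GARCH(1,1) recursion σ²ₜ = ω + α r²ₜ₋₁ + β σ²ₜ₋₁ to a list of returns. It is not the code of Listing A.3: the parameter values are hypothetical rather than fitted, the function name is an assumption, and starting the recursion from the unconditional variance ω/(1 − α − β) is one common convention among several.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model for the given return series.

    Element t of the result is the variance forecast for period t, made with
    the returns observed up to period t - 1.
    """
    variance = omega / (1.0 - alpha - beta)  # unconditional variance as start value
    variances = [variance]
    for r in returns:
        variance = omega + alpha * r * r + beta * variance
        variances.append(variance)
    return variances


# Hypothetical daily returns: a large move raises the next conditional variance.
daily_returns = [0.001, -0.002, 0.03, -0.025, 0.001]
for v in garch11_variance(daily_returns, omega=1e-6, alpha=0.1, beta=0.85):
    print(f"{v:.8f}")
```

Because the squared return enters with weight α and the previous variance with weight β, a large move of either sign keeps the variance elevated for several periods, which is the clustering effect described in the text.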

3.2 Heston’s Model

The Heston model extends the B&S model by taking into account a stochastic volatility, that is mean-reverting and is also correlated with the asset price. It assumes that both the asset price and its variance are determined by a joint stochastic process.

Definition 3.1. The Heston model for the price of the underlying $S$ and its stochastic variance $v$ is defined as

$$\begin{aligned}
dS_t &= \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S},\\
dv_t &= \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{v},\\
\mathrm{Cov}\left[dW_t^{S}, dW_t^{v}\right] &= \rho\,dt,
\end{aligned} \tag{3.2.1}$$

where the two Brownian motions $(W^{S}, W^{v})$ are correlated with coefficient $\rho$, $\kappa$ is the rate of reversion of the variance to $\theta$ (the long-term variance), $\xi$ is the volatility of variance, and $\mu$ is the rate of return of the asset.

The Heston model is a superior choice to the B&S model since it provides a dynamic stochastic volatility, as described in Equation (3.2.1). The model has a semi-analytic formula that can be exploited to derive an integral solution, which is described in §3.2.2. Additionally, if the Feller condition [Fel51] is upheld, this process will produce strictly positive variance with probability 1; this is described in Lemma 3.2.

Lemma 3.2. If the parameters of the Heston model obey the condition $2\kappa\theta \ge \xi^2$, then the stochastic process will produce a variance such that $\Pr(v_t > 0) = 1$, since the upward drift is large enough to strongly reflect the process away from the origin.
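To make Definition 3.1 and Lemma 3.2 concrete, the following Python sketch checks the Feller condition and simulates one terminal value of the Heston model with an Euler scheme. It is a minimal illustration under stated assumptions rather than the thesis implementation: the price is stepped in log form, negative variances produced by the discretisation are floored at zero inside the drift and diffusion terms (a full-truncation style fix), and all names and parameter values are hypothetical.

```python
import math
import random


def feller_condition(kappa: float, theta: float, xi: float) -> bool:
    """Condition of Lemma 3.2: 2 * kappa * theta >= xi**2."""
    return 2.0 * kappa * theta >= xi ** 2


def simulate_heston_terminal(s0, v0, mu, kappa, theta, xi, rho, t, n_steps, rng):
    """Return (S_T, v_T) for one Euler path of Equation (3.2.1)."""
    dt = t / n_steps
    s, v = s0, v0
    for _ in range(n_steps):
        z_s = rng.gauss(0.0, 1.0)
        # Build a second normal draw with correlation rho to the first.
        z_v = rho * z_s + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # the discretised variance can dip below zero
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z_s)
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z_v
    return s, v


if __name__ == "__main__":
    print("Feller condition holds:", feller_condition(2.0, 0.04, 0.3))
    rng = random.Random(1)
    paths = [simulate_heston_terminal(100.0, 0.04, 0.05, 2.0, 0.04, 0.3, -0.7,
                                      1.0, 50, rng)[0] for _ in range(1000)]
    print("mean terminal price:", sum(paths) / len(paths))
```

Note that the Feller condition concerns the continuous-time process; a discretised path can still produce negative variance values, which is why the floor at zero is applied.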

3.2.1 Cox-Ingersoll-Ross Process

Cox et al. [CIR85] described a Markov process with continuous paths defined by the SDE

$$dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{v}, \tag{3.2.2}$$

where $\theta$ is the equilibrium supported by the fundamentals, $\kappa$ is the rate at which the shocks dissipate and the variance returns to $\theta$, $\xi$ is the degree of volatility around it caused by shocks, and $W_t^{v}$ is a Brownian motion. Figure 3.3 shows the plot of a Cox-Ingersoll-Ross (CIR) process.

This process is mean-reverting, ergodic³, and can be thought of as an Ornstein-Uhlenbeck⁴ process [see [Gil96] for more on implementation, and the Matlab code in Listing A.2].

³ An ergodic process is one whose statistical properties, e.g. mean and variance, can be deduced from a single, sufficiently long realisation of the process.

⁴ Describes the velocity of a Brownian particle under friction. It is stationary, Gaussian, [...]

Figure 3.3: Two CIR processes for 5 periods, with different distributions, where dt = 1, mean reversion rate λ = 3.0, long term mean µ = 1, and a volatility of ξ = 0.6 (panels: "Euler Discretisation, sampling from N(0,1) for 1 year(s)" and "Euler Discretisation, sampling from the terminal non-central χ² distribution for 1 year(s)"; axes: time in business days against S(t)). The process starts at S(0) = 5 and reverts to a long term mean of 1 (the mean is marked in the figure with a horizontal black dashed line). Note how the top process breaches the lower barrier of 0 and becomes negative, while the second one is highly reflective at 0.

The probability density function of the variance at time $s$ in the future, conditional on its value at the current time $t \le s$, as described in [CIR85, p. 391], is

$$f(v_s, s; v_t, t) = c\,e^{-u-\nu}\left(\frac{\nu}{u}\right)^{q/2} I_q\!\left(2(u\nu)^{1/2}\right),$$

where

$$c = \frac{2\kappa}{\xi^2\left(1 - e^{-\kappa(s-t)}\right)}, \quad u = c\,v_t\,e^{-\kappa(s-t)}, \quad \nu = c\,v_s, \quad q = \frac{2\kappa\theta}{\xi^2} - 1,$$

and $I_q(\cdot)$ is the modified Bessel function of the first kind of order $q$. This conditional distribution is a non-central $\chi^2$, i.e. $2c\,v_s \sim \chi^2(2q + 2, 2u)$, with $2q + 2$ degrees of freedom and non-centrality parameter $2u$. In the limit, as $s \to \infty$, the variance is distributed as a gamma distribution, i.e. $f \sim \Gamma\!\left(\frac{2\kappa\theta}{\xi^2}, \frac{\xi^2}{2\kappa}\right)$⁵. The CIR process itself is a special case of the basic affine diffusion functions, which are described in §3.3.1.
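The non-central χ² transition law above allows the CIR process to be sampled without discretisation error, which is the approach behind the second panel of Figure 3.3. The Python sketch below is an illustration, not Listing A.2: it draws the non-central χ² variate as a Poisson mixture of central χ² (gamma) variates, a standard representation, and uses Knuth's simple Poisson sampler, which is only suitable for small means. Function names and the example values (taken from the caption of Figure 3.3) are assumptions for the example.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means only."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_cir_exact(v_t, kappa, theta, xi, dt, rng):
    """Draw v_s given v_t, with s = t + dt, from the exact CIR transition law."""
    decay = math.exp(-kappa * dt)
    c = 2.0 * kappa / (xi ** 2 * (1.0 - decay))
    u = c * v_t * decay
    dof = 4.0 * kappa * theta / xi ** 2          # equals 2q + 2
    # Non-central chi-square with non-centrality 2u: Poisson(u) mixture of
    # central chi-squares with dof + 2n degrees of freedom.
    n = sample_poisson(u, rng)
    chi_square = rng.gammavariate(0.5 * dof + n, 2.0)
    return chi_square / (2.0 * c)                # since 2 c v_s is chi-square


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_cir_exact(5.0, 3.0, 1.0, 0.6, 1.0, rng) for _ in range(10_000)]
    print("sample mean:", sum(draws) / len(draws))
    print("theoretical mean:", 1.0 + (5.0 - 1.0) * math.exp(-3.0))
```

Every draw is strictly positive by construction, in contrast with the Euler scheme of the first panel, which can cross zero.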

3.2.2 Heston PDE

The process followed by the variance, as shown in Equation (3.2.1), is a CIR process. These kinds of processes, as presented in §3.2.1, are a subset of Affine Jump Diffusion (AJD) processes, in that they present no jumps. The following derivation of the Heston model’s PDE is a special case of the PDE for general stochastic volatility models as described in [Gat06]. In the Heston model as we defined it in §3.2 there are two random sources, the price and variance Wiener processes, and only one traded asset, the underlying (since the volatility is not a traded asset⁶). According to [Bj9, p. 118] this model is therefore arbitrage-free but incomplete. In order to make it complete we need to add another traded asset with stochastic risk associated to it, so that the portfolio can be hedged.

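Theorem 3.3. Let a portfolio $\Pi(t)$ consist of an option $V(S, v, t; T)$ with maturity $T$, $\Delta$ units of the underlying stock $S$, and $\Delta_1$ units of another hedging option $U(S, v, t; T)$ with maturity $T$. The PDE is then described by the equation

$$0 = \frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \xi\rho vS\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 V}{\partial v^2} - rV + rS\frac{\partial V}{\partial S} + \left[\kappa(\theta - v) - \lambda(S, v, t)\right]\frac{\partial V}{\partial v},$$

where all the variables are as defined in §3.2, and $\lambda$ is the market price of volatility risk.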

Proof. Let $\Pi(t)$ be a self-financing⁷ hedged portfolio, consisting of an option $V$ on an underlying asset $S$, the asset $S$ itself, and another option $U$,

$$\Pi(t) = \Delta S + V(S, v, t; T) + \Delta_1 U(S, v, t; T), \tag{3.2.3}$$

where $t \in [0, T]$, and $v$ is the variance as defined in the Heston model. Dropping the arguments for brevity, the change in the value of the portfolio during a quantum of time $dt$ is

⁵ A Gamma distribution Γ(κ, θ) is defined by two numerical parameters: κ, the shape parameter, which dictates the overall shape of the distribution, and θ, the scale parameter, which defines by how much the distribution is stretched or shrunk.

⁶ However, recent instruments have emerged that treat volatility as an asset class [see ??].

⁷ i.e. there is no inward or outward cash flow.

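$$d\Pi = \Delta\,dS + dV + \Delta_1\,dU. \tag{3.2.4}$$

From Itô’s lemma we have that the changes in the prices of the options $V$ and $U$ are

$$dV = \left(\frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \xi\rho Sv\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 V}{\partial v^2}\right)dt + \frac{\partial V}{\partial S}\,dS + \frac{\partial V}{\partial v}\,dv, \tag{3.2.5}$$

$$dU = \left(\frac{\partial U}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 U}{\partial S^2} + \xi\rho Sv\frac{\partial^2 U}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 U}{\partial v^2}\right)dt + \frac{\partial U}{\partial S}\,dS + \frac{\partial U}{\partial v}\,dv. \tag{3.2.6}$$

We substitute Equations 3.2.5 and 3.2.6 into 3.2.4 to get

$$d\Pi = \alpha\,dS + \beta\,dv + \gamma\,dt, \tag{3.2.7}$$

where

$$\begin{aligned}
\alpha &= \frac{\partial V}{\partial S} + \Delta + \Delta_1\frac{\partial U}{\partial S},\\
\beta &= \frac{\partial V}{\partial v} + \Delta_1\frac{\partial U}{\partial v},\\
\gamma &= \frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \xi\rho vS\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 V}{\partial v^2} + \Delta_1\left(\frac{\partial U}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 U}{\partial S^2} + \xi\rho Sv\frac{\partial^2 U}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 U}{\partial v^2}\right),
\end{aligned}$$

and where $\xi$ and $\rho$ above are as defined in Equation (3.2.1).

In order to provide a risk-neutral valuation it is necessary to eliminate the stochastic components of Equation (3.2.7). Hence we set $\alpha = \beta = 0$ and derive the following hedging parameters

$$\Delta_1 = -\frac{\partial V / \partial v}{\partial U / \partial v}, \qquad \Delta = -\frac{\partial V}{\partial S} - \Delta_1\frac{\partial U}{\partial S}. \tag{3.2.9}$$

If no arbitrage is to exist, then the rate of return on the portfolio’s value must be equal to the risk-free interest rate $r$ (which we will assume is deterministic for this thesis), leaving us with

$$d\Pi = \gamma\,dt = r\Pi\,dt = r(\Delta S + V + \Delta_1 U)\,dt. \tag{3.2.10}$$

Therefore $\gamma = r\Pi$, and hence

$$r(\Delta S + V + \Delta_1 U) = \frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \xi\rho vS\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 V}{\partial v^2} + \Delta_1\left(\frac{\partial U}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 U}{\partial S^2} + \xi\rho Sv\frac{\partial^2 U}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 U}{\partial v^2}\right),$$

which we can abbreviate by writing

$$A + \Delta_1 B = r(\Delta S + V + \Delta_1 U),$$

where $A$ collects the terms in $V$ and $B$ the bracketed terms in $U$. When we substitute $\Delta$ and $\Delta_1$ from Equation 3.2.9 and re-arrange we get

$$\frac{A - rV + rS\frac{\partial V}{\partial S}}{\frac{\partial V}{\partial v}} = \frac{B - rU + rS\frac{\partial U}{\partial S}}{\frac{\partial U}{\partial v}}. \tag{3.2.11}$$

The LHS of Equation 3.2.11 is a function of $V$ alone, and the RHS is a function of $U$ alone. Hence both sides must be equal to some function $f(S, v, t)$ of the independent variables only. Following the proposal by [Hes93], we define this function as $f(S, v, t) = -\kappa(\theta - v) + \lambda(S, v, t)$, where $\lambda$ is the market price of volatility risk. Thus Equation 3.2.11 can be written as

$$\frac{A - rV + rS\frac{\partial V}{\partial S}}{\frac{\partial V}{\partial v}} = -\kappa(\theta - v) + \lambda(S, v, t).$$

By re-arrangement and substitution of $A$ we get

$$0 = \frac{\partial V}{\partial t} + \frac{1}{2}vS^2\frac{\partial^2 V}{\partial S^2} + \xi\rho vS\frac{\partial^2 V}{\partial S\,\partial v} + \frac{1}{2}\xi^2 v\frac{\partial^2 V}{\partial v^2} - rV + rS\frac{\partial V}{\partial S} + \left[\kappa(\theta - v) - \lambda(S, v, t)\right]\frac{\partial V}{\partial v}.$$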

3.2.3 Heston’s closed-form analytical solution

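The Heston model’s closed-form price $C$, as described in [Hes93], for a European vanilla call option under the model of Definition 3.1, with strike price $K$, spot price $S$, and time to maturity $T$, on a non-dividend-paying underlying satisfies Equation (3.2.12),

$$C(S, v, t) = S\,P_1 - K e^{-r(T-t)}\,P_2, \tag{3.2.12}$$

where each $P_j$, conditional on the log of the strike price $K$, is obtained by integrating over $\phi$,

$$P_j(x, v, T; \ln[K]) = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty \mathrm{Re}\left[\frac{e^{-i\phi\ln[K]}\,f_j(x, v, t; \phi)}{i\phi}\right] d\phi,$$

$$x = \ln[S_t],$$

$$f_j(x, v, t; \phi) = e^{C(T-t;\phi) + D(T-t;\phi)\,v + i\phi x},$$

$$C(\tau; \phi) = r\phi i\tau + \frac{\kappa\theta}{\xi^2}\left[(b_j - \rho\xi\phi i + d)\tau - 2\ln\left(\frac{1 - g e^{d\tau}}{1 - g}\right)\right],$$

$$D(\tau; \phi) = \frac{b_j - \rho\xi\phi i + d}{\xi^2}\left[\frac{1 - e^{d\tau}}{1 - g e^{d\tau}}\right],$$

and

$$g = \frac{b_j - \rho\xi\phi i + d}{b_j - \rho\xi\phi i - d}, \qquad d = \sqrt{(\rho\xi\phi i - b_j)^2 - \xi^2(2u_j\phi i - \phi^2)},$$

where $j = \{1, 2\}$, $\tau = T - t$, $\phi$ is the integration parameter, $i = \sqrt{-1}$, $\lambda$ is the price of volatility risk⁸,

$$u_1 = \frac{1}{2}, \quad u_2 = -\frac{1}{2}, \quad b_1 = \kappa + \lambda - \rho\xi, \quad b_2 = \kappa + \lambda,$$

and the rest of the parameters are as defined in Definition 3.1. The Octave/Matlab code to implement Equation (3.2.12) is shown in Listing A.1 in Appendix A.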
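The formulas of this section translate fairly directly into code. The Python sketch below is an illustration and not Listing A.1: it evaluates the functions C, D, g and d exactly as written above, integrates P_j with a plain midpoint rule on a truncated range, and combines the two probabilities as S·P₁ − K·e^{−rτ}·P₂. The truncation limit, the number of integration steps, the function names and the example parameters are all assumptions. This original form of the characteristic function is known to be numerically delicate for long maturities because of the complex logarithm, so the sketch should only be trusted for moderate parameter values.

```python
import cmath
import math


def heston_probability(j, s, v, k, r, tau, kappa, theta, xi, rho, lam=0.0,
                       phi_max=200.0, n_steps=20_000):
    """P_j of Section 3.2.3, integrated with a midpoint rule on (0, phi_max)."""
    u = 0.5 if j == 1 else -0.5
    b = kappa + lam - rho * xi if j == 1 else kappa + lam
    x = math.log(s)
    log_k = math.log(k)

    def integrand(phi):
        i_phi = 1j * phi
        beta = b - rho * xi * i_phi
        d = cmath.sqrt(beta * beta - xi * xi * (2.0 * u * i_phi - phi * phi))
        g = (beta + d) / (beta - d)
        growth = cmath.exp(d * tau)
        big_c = r * i_phi * tau + kappa * theta / xi ** 2 * (
            (beta + d) * tau - 2.0 * cmath.log((1.0 - g * growth) / (1.0 - g)))
        big_d = (beta + d) / xi ** 2 * (1.0 - growth) / (1.0 - g * growth)
        f = cmath.exp(big_c + big_d * v + i_phi * x)
        return (cmath.exp(-i_phi * log_k) * f / i_phi).real

    h = phi_max / n_steps
    # Midpoints avoid evaluating the integrand at phi = 0.
    total = sum(integrand((m + 0.5) * h) for m in range(n_steps))
    return 0.5 + total * h / math.pi


def heston_call(s, v, k, r, tau, kappa, theta, xi, rho, lam=0.0):
    """European call price as S * P1 - K * exp(-r * tau) * P2."""
    p1 = heston_probability(1, s, v, k, r, tau, kappa, theta, xi, rho, lam)
    p2 = heston_probability(2, s, v, k, r, tau, kappa, theta, xi, rho, lam)
    return s * p1 - k * math.exp(-r * tau) * p2


if __name__ == "__main__":
    # Hypothetical parameters; v is the current variance, not the volatility.
    print(heston_call(s=100.0, v=0.04, k=100.0, r=0.05, tau=1.0,
                      kappa=2.0, theta=0.04, xi=0.3, rho=-0.7))
```

A useful sanity check is the limit of a very small volatility of variance ξ with v = θ, where the result should approach the B&S price with volatility √θ.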

3.2.4 Double Heston Model

Another approach to the Heston model is to add a second variance process [GP09], independent of the first one,

$$dS_t = \mu S_t\,dt + \sum_{i=1}^{2} S_t\sqrt{v_t^{i}}\,dW_t^{S_i}, \tag{3.2.13a}$$

$$dv_t^{i} = \kappa_i(\theta_i - v_t^{i})\,dt + \xi_i\sqrt{v_t^{i}}\,dW_t^{v_i}, \quad i = \{1, 2\}, \tag{3.2.13b}$$

$$\begin{array}{c|cccc}
d & W^{S_1} & W^{v_1} & W^{S_2} & W^{v_2}\\ \hline
W^{S_1} & 1 & \rho_1 & 0 & 0\\
W^{v_1} & \rho_1 & 1 & 0 & 0\\
W^{S_2} & 0 & 0 & 1 & \rho_2\\
W^{v_2} & 0 & 0 & \rho_2 & 1
\end{array} \tag{3.2.13c}$$

⁸ The choice of λ(S, v, t) can be assumed as λ(S, v, t) = γ Cov[dv, dC/C], where C(t) is the [...]

This approach provides a more flexible model by increasing the DoF in the stochastic process. [CH09] showed that the introduction of a second, uncorrelated, variance process to the Heston model provided a more flexible modelling of the time variation in the smirk⁹, leading to a better fit quality with comparable computational time. Equation (3.2.13c) shows the correlation structure between the Brownian motions.

3.3 Bates’ Model

Critical information regarding a company’s performance arrives at random points in time and has an acute impact on the company’s stock price. Because the information arrives at a random time, it can be treated and modelled as a random variable. To account for this behaviour Merton [Mer76] introduced a model with discontinuous paths for asset prices. The behaviour of this model is very similar to the AJD functions described in §3.3.1.

The Merton and Heston models were combined by Bates [Bat96], who merged the stochastic volatility with the jump diffusion augmentation. The stock price follows the SDE

$$\begin{aligned}
dS_t &= \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S} + \left(e^{\alpha + \delta\epsilon} - 1\right)S_t\,dq_t,\\
dv_t &= \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{v},\\
\mathrm{Cov}\left[dW_t^{S}, dW_t^{v}\right] &= \rho\,dt,\\
dq_t &= \begin{cases} 0 & \text{with probability } 1 - \lambda\,dt,\\ 1 & \text{with probability } \lambda\,dt, \end{cases}
\end{aligned}$$

where $dq$ is the increment of a compound Poisson process with constant intensity $\lambda$, and the jump size is lognormally distributed with mean log-jump $\alpha$ and standard deviation $\delta$, where $\epsilon \sim N(0,1)$. The jump size distribution is not correlated with the Wiener processes.

3.3.1 Affine Jump Diffusion Functions

The basic AJD functions are of the following form,

$$\begin{aligned}
dx(t) &= \mu(x, t)\,dt + \sigma(x, t)\,dW(t) + dZ(t, \lambda, \nu),\\
\mu(x, t) &= \mu_0 + \mu_1 x(t),\\
\sigma^2(x, t) &= H_0 + H_1 x(t),
\end{aligned}$$

where $\mu(x, t)$ and $\sigma^2(x, t)$ are affine functions, $W$ is a Brownian motion, and $Z(t)$ is a compound Poisson process with intensity $\lambda$ and independent jumps $\nu$, which is independent of $W$. The log-jump sizes $Y_i \sim N(\mu, \delta^2)$ are independent and identically distributed (i.i.d.) random variables with mean $\mu$ and variance $\delta^2$, which are uncorrelated with both $Z(t)$ and $W(t)$. Jumps add mass to the tails of the returns distribution, and thus address the lack of fat tails that the B&S model suffers from. By increasing $\delta$ we can add mass to both tails, while a negative or positive $\mu$ adds mass to the left or right tail alone, respectively.

⁹ A smirk is defined as a skewed volatility smile, which in turn is the long-observed pattern in which at-the-money options tend to have lower implied volatilities than in-the-money or out-of-the-money options.
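The jump term of the AJD form can be illustrated in isolation. The Python sketch below, which is an assumption-laden example rather than thesis code, samples the total log-jump accumulated over a horizon T: the number of jumps is Poisson with mean λT and each log-jump size is drawn from N(µ, δ²). The function names and parameter values are hypothetical, and the simple Poisson sampler is only adequate for small means.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means only."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_jump_sum(intensity, horizon, jump_mean, jump_sd, rng):
    """Sum of the i.i.d. normal log-jumps arriving in [0, horizon]."""
    n_jumps = sample_poisson(intensity * horizon, rng)
    return sum(rng.gauss(jump_mean, jump_sd) for _ in range(n_jumps))


if __name__ == "__main__":
    rng = random.Random(0)
    sums = [sample_jump_sum(2.0, 1.0, -0.1, 0.15, rng) for _ in range(20_000)]
    mean = sum(sums) / len(sums)
    # The mean of a compound Poisson sum is intensity * horizon * jump_mean.
    print("sample mean:", mean, "theory:", 2.0 * 1.0 * -0.1)
```

With a negative jump mean the accumulated jumps shift mass towards the left tail of the return distribution, in line with the remark above.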

More recent explorations into jump diffusions can be seen in [Lip02], where the log-exponential jumps are investigated to price exotic options.


Chapter 4 Monte Carlo Simulation

As mentioned in Chapter 2, the numerical method used to evaluate the solution of a PDE is a Monte Carlo (MC) simulation. In this chapter we will set out the basics of this method and define the inputs and the intermediate processes required for this simulation to work.

4.1 Introduction

The reference to Monte Carlo is due to the city of the same name and its association with games of chance. The premise of chance is utilised within the simulation in order to provide a random sample from the overall probability space. If the sample is as truly random as possible, then the subset of random samples is taken as an estimate of the volume of the entire probability space. The law of large numbers guarantees [RW] that this estimate will converge to the true value as the number of random draws increases. Given a certain number of random draws, the likely magnitude of the error can be derived from the central limit theorem.

As stated in §2.3 the Feynman-Kac theorem is the connective tissue between the PDE form of the stochastic model and its approximation by a Monte Carlo simulation. By this theorem it is possible to approximate certain forms of PDEs by simulating random paths and deriving their expectation as the solution of the original PDE.

As an example we can return to the B&S Equation (2.4.2), where the price of the option depends on the expected value of the payoff. In order to calculate the expected value of $f$ it is possible to run an MC simulation with $N$ paths¹ in order to approximate the actual price of the call option $C$ with the simulated price $\hat{C}_N$,

$$\hat{C}_N = \frac{e^{-rT}}{N}\sum_{i=1}^{N}\left(S_0\,e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,N^{i}_{(0,1)}} - K\right)^{+}, \tag{4.1.1}$$

where $r$ is the risk-free interest rate, $T$ is the time to maturity of the option, $\sigma$ is the volatility, $K$ is the strike price at maturity date $T$, $S_0$ is the spot price at $t = 0$, and $N^{i}_{(0,1)}$ are Gaussian RVs. By the "strong law of large numbers" we have

$$\Pr\left(\hat{C}_N \to C\right) = 1, \quad \text{as } N \to \infty. \tag{4.1.2}$$

¹ A path in this context is a discrete time-series of prices for the option, one for each discrete [...]

4.2 Simple MC algorithm

Suppose we wish to estimate the call option price C := E[h(X)], then the algorithm to simulate the expected value is,

Algorithm 1 MC Simulation(N)
Require: Generate X_1, ..., X_N
Ensure: E[C] = Ĉ_N
  S_0 ← 0
  for i = 1 to N do
    Y_i := h(X_i)
    S_i ← S_{i−1} + Y_i
  end for
  return Ĉ_N ← S_N / N
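Algorithm 1 can be sketched in a few lines of Python. This is an illustration only, with the thesis listings being in Matlab: the draws X_i are standard normal variates, h is taken to be the discounted B&S call payoff of Equation (4.1.1), and the function names and parameter values are assumptions made for the example.

```python
import math
import random


def mc_simulation(h, draws):
    """Algorithm 1: average h(X_i) over the supplied draws X_1, ..., X_N."""
    running_sum = 0.0  # S_0
    count = 0
    for x in draws:
        running_sum += h(x)  # S_i = S_{i-1} + Y_i with Y_i = h(X_i)
        count += 1
    return running_sum / count


def discounted_call_payoff(z, s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0):
    """Discounted payoff of a call for one standard normal draw z."""
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - k, 0.0)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = (rng.gauss(0.0, 1.0) for _ in range(100_000))
    print(mc_simulation(discounted_call_payoff, draws))
```

For these example parameters the estimate should lie close to the closed-form B&S price of roughly 10.45, with a sampling error that depends on the number of draws.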

In order to determine the efficiency of the simulation, we need to calculate the confidence interval for this simulation. The approximate $100(1 - \alpha)\%$ confidence interval for Algorithm 1 is defined as

$$\left[\hat{C}_N - z_{1-\alpha/2}\frac{\hat{\sigma}_N}{\sqrt{N}},\; \hat{C}_N + z_{1-\alpha/2}\frac{\hat{\sigma}_N}{\sqrt{N}}\right],$$

where $\hat{\sigma}_N^2$ is the estimate of $Var(Y)$ based on $Y_1, \dots, Y_N$. We can use the Half-Width (HW) of the confidence interval to measure the quality of the estimator $\hat{C}_N$,

$$HW = z_{1-\alpha/2}\sqrt{\frac{Var(Y)}{N}}. \tag{4.2.1}$$
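The confidence interval and its half-width can be computed directly from the simulated values Y_1, ..., Y_N. The following Python sketch is a minimal illustration using only the standard library; the function name and the sample data are assumptions for the example.

```python
import math
import statistics


def confidence_interval(samples, alpha=0.05):
    """Return (lower, upper, half_width) of the approximate 100(1 - alpha)% CI."""
    n = len(samples)
    mean = statistics.fmean(samples)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width


# Hypothetical simulated payoffs.
lower, upper, hw = confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"CI = [{lower:.4f}, {upper:.4f}], HW = {hw:.4f}")
```

Since the half-width scales with 1/√N, halving it by brute force requires four times as many paths, which motivates reducing Var(Y) instead.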

The smaller the HW, the more accurate the simulation of the expected value will be. There are two ways that this can be achieved: either increase N or reduce Var(Y). The first option is limited by the computational cost of calculating the Y_i in a reasonable amount of time. We can, however, reduce the variance of the generated Y_i values. There are multiple techniques to do this.

4.3 Variance reduction

There are two major avenues to take in order to reduce variance in an MC simulation: one is to take advantage of specific features of the problem domain to adjust or correct simulation outputs, the other is to directly reduce the variability of the simulation inputs. In this section we’ll introduce control variates, antithetic variates, the absorbing technique, full truncation, and the case of the quasi-random simulation.

4.3.1 Control variate technique

In order to improve the accuracy of the estimated variate, it is essential to use as much information as possible. We can derive very important information about the estimated variate if we can define another variate, one whose expected value is not difficult to calculate and that is highly correlated with the estimated one. We could then use this new variate as a control mechanism to improve the accuracy of the simulation by minimising the errors between the two variates.

Suppose that, as in Section 4.2, we wish to estimate the call option price $C := E[Y]$, where $Y := h(X)$ is for instance the discounted payoff of a call option for which we have no closed-form evaluation. We do however know of another variate $Z$, which we can easily generate, and whose expected value $E[Z]$ we can also quickly calculate. Now suppose that for each replication of $Y_i$ we also generate a value for $Z_i$, and that the pairs $(Y_i, Z_i)$ are i.i.d.; then for any fixed $c$ we can calculate,

$$Y_i^c = Y_i + c(Z_i - E[Z]),$$

from the $i$th path and thus calculate the sample mean,

$$\hat{Y}^c = \hat{Y} + c(\hat{Z} - E[Z]) = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i + c(Z_i - E[Z]) \right).$$

This is an unbiased estimator of $E[Y]$² that can act as a control variate estimator; that is, the observed residual $Z - E[Z]$ acts as a control when estimating $E[Y]$. If we now compute the variance of $\hat{Y}^c_i$ we get,

$$Var\left[\hat{Y}^c_i\right] = Var\left[Y_i + c(Z_i - E[Z])\right] = \sigma_Y^2 + 2c\sigma_Z\sigma_Y\rho_{ZY} + c^2\sigma_Z^2 \equiv \sigma_c^2, \tag{4.3.1}$$

where $\sigma_Z^2 = Var[Z]$, $\sigma_Y^2 = Var[Y]$, and $\rho_{ZY}$ is the correlation factor between $Z$ and $Y$. In order to minimise the variance of the replication we choose a $c^*$ that minimises Equation 4.3.1 and is expressed as,

$$c^* = -\frac{\sigma_Y}{\sigma_Z}\rho_{ZY} = -\frac{Cov[Z, Y]}{Var[Z]}. \tag{4.3.2}$$

Substituting Equation 4.3.2 into 4.3.1 we get,

$$Var\left[\hat{Y}^c\right] = Var[\hat{Y}] - 2\frac{Cov[Z, Y]}{Var[Z]}Cov[Z, Y] + \frac{Cov[Z, Y]^2}{Var[Z]^2}Var[Z] = Var[\hat{Y}] - \frac{Cov[Z, Y]^2}{Var[Z]}, \tag{4.3.3}$$

therefore, as long as $Cov[Z, Y] \neq 0$ we can achieve a reduction in variance of the estimated variate. In fact, the higher the correlation $\rho_{ZY}$ is, the higher the reduction of the variance will be.

² Since $E[Y_i^c] = E[Y_i + c(Z_i - E[Z])] = E[\hat{Y}]$.

All that remains now is to find an appropriate control variate that has a non-zero correlation with the estimated variate. Since we are trying to find the discounted payoff of a call option, we could use the Black & Scholes SDE as the control variate $Z$ of the above example.

Remark. If we were to examine the ratio of the variance of the controlled estimator to that of the uncontrolled estimator, we could derive that, for $c = c^*$,

$$\frac{Var\left[\hat{Y} + c(\hat{Z} - E[Z])\right]}{Var[\hat{Y}]} = 1 - \rho_{ZY}^2, \tag{4.3.4}$$

which implies that the stronger the correlation between the estimated variate $Y$ and the control variate $Z$, the more effective the control variate is. N.B. this effect is independent of the sign of the correlation, since the sign is cancelled by the square in Equation 4.3.4.

The question remains, however, of how to estimate the correlation of the two variates. This can be achieved by doing $t$ training simulation runs, in which we first calculate the covariance of the two variates as,

$$\widehat{Cov}[Y, Z] = \frac{\sum_{i=1}^{t} (Y_i - \hat{Y}_t)(Z_i - E[Z])}{t - 1}.$$

By selecting an appropriate control variate, we are able to calculate its expected value as well as the sample variance of the replications. For the variance we get,

$$\widehat{Var}[Z] = \frac{\sum_{i=1}^{t} (Z_i - E[Z])^2}{t - 1},$$

and hence we can derive from Equation 4.3.2 the optimal constant $\hat{c}^*$ to use for the control variate simulation. Once we have this constant we can use it to perform a simulation and reduce the variance with a control variate.

A simple algorithm to simulate the variate V with a control variate Z would look like Algorithm 2.

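Algorithm 2 Monte Carlo Simulation with control variate($t$, $N$)
Require: $t > 1$, $N > 0$.
Ensure: $E[Y] = \hat{V}_N$.
1: for $i = 1$ to $t$ do
2: generate $(Y_i, Z_i)$ {training run to calculate the constant factor $c^*$}
3: end for
4: $\hat{c}^* \leftarrow$ calculate $c^*$
5: for $i = 1$ to $N$ do
6: generate $(Y_i, Z_i)$ {actual simulation}
7: $V_i \leftarrow Y_i + \hat{c}^*(Z_i - E[Z])$
8: end for
9: $\hat{V}_N \leftarrow \frac{\sum_{i=1}^{N} V_i}{N}$
10: $\widehat{Var}[V]_N \leftarrow \frac{\sum_{i=1}^{N} (V_i - \hat{V}_N)^2}{N - 1}$
11: $100(1-\alpha)\%$ CI $\leftarrow \left[ \hat{V}_N - z_{1-\frac{\alpha}{2}} \frac{\sqrt{\widehat{Var}[V]_N}}{\sqrt{N}},\ \hat{V}_N + z_{1-\frac{\alpha}{2}} \frac{\sqrt{\widehat{Var}[V]_N}}{\sqrt{N}} \right]$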
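The two phases of Algorithm 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the control variate chosen here is the discounted terminal price $Z = e^{-rT}S_T$ of a geometric Brownian motion, whose expectation $E[Z] = S_0$ is known under the risk-neutral measure. It is a simpler stand-in than the Black & Scholes control suggested above, and the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_pair(rng, s0, k, r, sigma, t):
    """One replication (Y, Z): discounted call payoff and discounted terminal price."""
    z = rng.gauss(0.0, 1.0)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    disc = math.exp(-r * t)
    return disc * max(s_t - k, 0.0), disc * s_t


def control_variate_mc(s0, k, r, sigma, t, n_train, n, seed=0):
    rng = random.Random(seed)
    e_z = s0  # assumption: E[e^{-rT} S_T] = S_0 under the risk-neutral measure

    # Training run: estimate c* = -Cov[Y, Z] / Var[Z] (Equation 4.3.2)
    train = [simulate_pair(rng, s0, k, r, sigma, t) for _ in range(n_train)]
    y_bar = sum(y for y, _ in train) / n_train
    cov_yz = sum((y - y_bar) * (z - e_z) for y, z in train) / (n_train - 1)
    var_z = sum((z - e_z) ** 2 for _, z in train) / (n_train - 1)
    c_star = -cov_yz / var_z

    # Actual simulation: V_i = Y_i + c*(Z_i - E[Z])
    ys, vs = [], []
    for _ in range(n):
        y, z = simulate_pair(rng, s0, k, r, sigma, t)
        ys.append(y)
        vs.append(y + c_star * (z - e_z))
    v_mean = sum(vs) / n
    var_v = sum((v - v_mean) ** 2 for v in vs) / (n - 1)
    y_mean = sum(ys) / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    return v_mean, var_v, var_y


estimate, var_controlled, var_plain = control_variate_mc(
    100.0, 100.0, 0.05, 0.2, 1.0, n_train=1_000, n=20_000)
print(estimate, var_controlled / var_plain)
```

The printed ratio is the empirical counterpart of $1 - \rho_{ZY}^2$ in Equation (4.3.4); the closer it is to zero, the more effective the chosen control.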

4.3.2 Antithetic variates technique

This method reduces variance by introducing a negative dependence between pairs of replications. We will present the case where the replications are sampled from a Uniform distribution; however, this method can take various forms. As [Gla04, p.205] mentions, this method can be extended to various distributions via the inverse transform method, where $F^{-1}(U)$ and $F^{-1}(1 - U)$ both have distribution $F$, but are antithetic to each other because $F^{-1}$ is monotonic. As an example, it is possible to use pairs of replications from the normal distribution, by pairing a sequence $Z_1, Z_2, \dots$ of i.i.d. $N(0,1)$ random variables with the antithetic sequence $-Z_1, -Z_2, \dots$ of i.i.d. $N(0,1)$ random variables.

Let us now extend the paradigm we presented in Section 4.3.1 for the price of a call option. In this case we generate pairs of antithetic replications from a Uniform distribution and use their average as the unbiased estimator,

$$Z_i = \frac{Y_i + \tilde{Y}_i}{2}, \tag{4.3.5}$$

where $Y_i = h(U_i)$ is the payoff of the call option sampled on a Uniform distribution $U_i \sim U(0,1)$, and $\tilde{Y}_i = h(1 - U_i)$ is the payoff sampled on the antithetic Uniform distribution $1 - U_i \sim U(0,1)$.

Since $E[Y] = E[\tilde{Y}] = \hat{Y}$, then from Equation (4.3.5) we deduce that $Z_i$ is an unbiased estimator of $\hat{Y}$, and because the $U_i$'s are i.i.d. we can use $Z_i$ to construct confidence intervals. Algorithm 3 explains how the MC simulation would implement antithetic variates.

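Algorithm 3 Monte Carlo Simulation with antithetic variates($N$)
Require: $N > 0$.
Ensure: $E[Y] = \hat{Z}_N$.
1: for $i = 1$ to $N$ do
2: generate $U_i$
3: $Y_i \leftarrow h(U_i)$
4: $\tilde{Y}_i \leftarrow h(1 - U_i)$
5: $Z_i \leftarrow \frac{Y_i + \tilde{Y}_i}{2}$
6: end for
7: $\hat{Z}_N \leftarrow \frac{\sum_{i=1}^{N} Z_i}{N}$
8: $\widehat{Var}[Z]_N \leftarrow \frac{\sum_{i=1}^{N} (Z_i - \hat{Z}_N)^2}{N - 1}$
9: $100(1-\alpha)\%$ CI $\leftarrow \left[ \hat{Z}_N - z_{1-\frac{\alpha}{2}} \frac{\sqrt{\widehat{Var}[Z]_N}}{\sqrt{N}},\ \hat{Z}_N + z_{1-\frac{\alpha}{2}} \frac{\sqrt{\widehat{Var}[Z]_N}}{\sqrt{N}} \right]$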
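A minimal Python sketch of Algorithm 3 is given below. It assumes that $h$ maps a uniform draw to a discounted call payoff through the inverse Normal CDF and geometric Brownian motion dynamics; the function name `antithetic_mc` and the parameter values are illustrative assumptions rather than part of the original implementation.

```python
import math
import random
from statistics import NormalDist


def antithetic_mc(s0, k, r, sigma, t, n, seed=0):
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf
    disc = math.exp(-r * t)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)

    def h(u):
        # Payoff as a monotone function of the uniform draw u in (0, 1)
        return disc * max(s0 * math.exp(drift + vol * inv_cdf(u)) - k, 0.0)

    ys, zs = [], []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:          # inv_cdf is only defined on the open interval (0, 1)
            u = rng.random()
        y, y_tilde = h(u), h(1.0 - u)
        ys.append(y)
        zs.append(0.5 * (y + y_tilde))   # Equation (4.3.5)

    z_mean = sum(zs) / n
    var_z = sum((z - z_mean) ** 2 for z in zs) / (n - 1)
    y_mean = sum(ys) / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    return z_mean, var_z, var_y


estimate, var_pair, var_single = antithetic_mc(100.0, 100.0, 0.05, 0.2, 1.0, 20_000)
print(estimate, var_pair, var_single)
```

Each $Z_i$ costs two payoff evaluations, so the technique only pays off when `var_pair` is below half of `var_single`. This holds whenever $Y_i$ and $\tilde{Y}_i$ are negatively correlated, as is the case for a monotone payoff.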

4.3.3 Quasi-random simulation

One more procedure to reduce the variance of the simulation is to sample variates from sequences that are more evenly distributed than pseudo-random numbers. Such numbers can be sampled from the so-called "low discrepancy" sequences. A sequence's discrepancy is a measure of its uniformity and is defined as follows (see [Lev02]).

Definition 4.1. Given a set of points $x^1, x^2, \dots, x^N \in I^S$ and a subset $G \subset I^S$, define the counting function $S_N(G)$ as the number of points $x^i \in G$. For each $x = (x_1, x_2, \dots, x_S) \in I^S$, let $G_x$ be the rectangular $S$-dimensional region,

$$G_x = [0, x_1) \times [0, x_2) \times \dots \times [0, x_S),$$

with volume $x_1 x_2 \cdots x_S$. Then the discrepancy of the points $x^1, x^2, \dots, x^N$ is given by,

$$D^*_N(x^1, x^2, \dots, x^N) = \sup_{x \in I^S} \left| S_N(G_x) - N \cdot x_1 x_2 \cdots x_S \right|.$$

The discrepancy compares the number of sample points found in a given volume of the multi-dimensional space against the number of points that should be in that volume if the points were uniformly distributed.
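To make the definition concrete, the following Python sketch evaluates the discrepancy in one dimension ($S = 1$), where the supremum is attained at the sample points themselves. The quasi-random points come from a base-2 van der Corput sequence. This sequence is chosen here only because it fits in a few lines; it is not one of the NAG generators discussed in this work, and the function names are hypothetical.

```python
import random


def van_der_corput(n, base=2):
    """First n points (indices 1..n) of the van der Corput sequence in the given base."""
    points = []
    for i in range(1, n + 1):
        x, denom, k = 0.0, 1.0, i
        while k > 0:
            k, digit = divmod(k, base)
            denom *= base
            x += digit / denom       # mirror the digits of i around the radix point
        points.append(x)
    return points


def star_discrepancy_1d(points):
    """sup over x of |S_N([0, x)) - N x| for points in [0, 1), as in Definition 4.1."""
    pts = sorted(points)
    n = len(pts)
    # Just after the i-th sorted point the count is i, just before it the count is i - 1
    return max(max(i - n * x, n * x - (i - 1)) for i, x in enumerate(pts, start=1))


n = 1024
rng = random.Random(0)
d_quasi = star_discrepancy_1d(van_der_corput(n))
d_pseudo = star_discrepancy_1d([rng.random() for _ in range(n)])
print(d_quasi, d_pseudo)
```

The quasi-random value stays close to its lower limit while the pseudo-random value grows roughly like $\sqrt{N}$, which is the numerical counterpart of the visual difference between quasi-random and pseudo-random scatter plots.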


There are a few sequences that are used to generate quasi-random variates. The Numerical Algorithms Group (NAG) libraries provide three sequence generators: the Niederreiter [Nid92], the Sobol [Sob67], and the Faure [FAU81] sequences, which are implemented in MATLAB with the functions g05yl and g05ym.

4.4 Discretisation Schemes for Stochastic Differential Equations

In §4.3 we described methods of reducing the variance of the Monte Carlo simulation and thus increasing the precision of the estimated value. However, there is one more source of error that needs to be taken into account and addressed: the simulation bias due to the discretisation of the SDE. One way to think of this is by shooting arrows at a bullseye target; high precision shots form a tight cluster of arrows, but the cluster could be completely outside of the circles due to high bias. We will first present the Euler scheme, which is the simplest and most common form of discretisation, before we proceed to refine and extend it to alternative schemes.

4.4.1 Euler-Maruyama scheme

Let us consider the case of the following SDE,

$$dX(t) = \alpha(X(t))dt + \beta(X(t))dW(t). \tag{4.4.1}$$

Also let $\hat{X}$ be the discretised approximation of $X$. The Euler-Maruyama [Mar55] approximation on the time grid $0 = t_0 < t_1 < \dots < t_m$ is,

$$\hat{X}(t_{i+1}) = \hat{X}(t_i) + \alpha(\hat{X}(t_i))\Delta t + \beta(\hat{X}(t_i))\sqrt{\Delta t}\,Z_{i+1}, \tag{4.4.2}$$

for $i = 0, \dots, m - 1$, with $Z_i$ i.i.d. standard normal variates and $\Delta t = t_{i+1} - t_i$.
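Equation (4.4.2) translates almost line by line into code. The Python sketch below takes the drift and diffusion as callables; the function name `euler_maruyama` and the geometric Brownian motion example (drift $rX$, diffusion $\sigma X$) are illustrative assumptions.

```python
import math
import random


def euler_maruyama(alpha, beta, x0, t_end, m, rng):
    """Simulate one path of dX = alpha(X) dt + beta(X) dW on a uniform grid of m steps."""
    dt = t_end / m
    sqrt_dt = math.sqrt(dt)
    path = [x0]
    for _ in range(m):
        x = path[-1]
        z = rng.gauss(0.0, 1.0)                      # Z_{i+1} ~ N(0, 1)
        path.append(x + alpha(x) * dt + beta(x) * sqrt_dt * z)
    return path


# Example: geometric Brownian motion with assumed parameters
r, sigma = 0.05, 0.2
rng = random.Random(0)
path = euler_maruyama(lambda x: r * x, lambda x: sigma * x, 100.0, 1.0, 250, rng)
print(len(path), path[-1])
```

With the diffusion switched off the recursion reduces to $\hat{X}(t_m) = X_0(1 + r\Delta t)^m$, which differs from the exact $X_0 e^{rT}$ and gives a first glimpse of the discretisation bias discussed next.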

Since the discretisation is an approximation process, it is imperative to measure how accurate this approximation ultimately is. To do this we need to evaluate the discrepancy between the SDE and its discrete approximation, conditional on the size of $\Delta t$. In essence we want to evaluate whether the error vanishes when $\Delta t \to 0$. There are two accepted metrics to measure this discrepancy: the weak convergence, which shows the error of the mean, and the strong convergence, which shows the mean of the error.

A typical weak convergence error has the form,

$$\epsilon^{weak}_{\Delta t} := \sup_{0 \le t_i \le T} \left| E[f(X(t_i))] - E\left[f(\hat{X}(t_i))\right] \right|,$$

where $f$ is a smooth polynomial of some order $k$. We say that the discretised approximation $\hat{X}(t)$ converges weakly if $\epsilon^{weak}_{\Delta t} \to 0$ when $\Delta t \to 0$. The order of the weak convergence is $\gamma > 0$ when

$$\epsilon^{weak}_{\Delta t} \le C\Delta t^{\gamma},$$

for some scalar $C$ and for all sufficiently small $\Delta t$.

Conversely, the discretised approximation $\hat{X}(t)$ converges strongly if for the strong error

$$\epsilon^{strong}_{\Delta t} := \sup_{0 \le t_i \le T} E\left[ \left| X(t_i) - \hat{X}(t_i) \right| \right],$$

we have that $\epsilon^{strong}_{\Delta t} \to 0$ when $\Delta t \to 0$. Similarly to the above, the order of the strong convergence is $\gamma > 0$ when

$$\epsilon^{strong}_{\Delta t} \le C\Delta t^{\gamma},$$

for some scalar $C$ and for all sufficiently small $\Delta t$. According to [Gla04, p.345], the Euler scheme typically has a strong order of $\frac{1}{2}$, but often achieves a weak order of 1.

4.4.2 Milstein scheme

This scheme was first proposed by Milstein [Mil95], and is explained in detail by Glasserman [Gla04, p.340-344], by Klöden and Platen [KPS94] for more general processes, and by Kahl and Jäckel [KJ06] for stochastic volatility processes. The scheme works for SDEs for which the drift and diffusion terms are not dependent on time directly. Let us take the case of a stochastic process as defined in Equation (4.4.1). The Milstein discretisation scheme can then be expressed as

$$\hat{X}(i+1) = \hat{X}(i) + \alpha(\hat{X}(i))\Delta t + \beta(\hat{X}(i))\sqrt{\Delta t}\,Z_{i+1} + \frac{1}{2}\beta'(\hat{X}(i))\beta(\hat{X}(i))\Delta t\left(Z_{i+1}^2 - 1\right). \tag{4.4.3}$$

The discretisation Equation 4.4.3 is composed of a deterministic part, which is defined by the $\alpha$ term, a stochastic part, which is defined by the $\beta$ term, and an Itô term.
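The effect of the extra Itô term can be checked numerically. The Python sketch below is an illustration under assumed parameters, not the thesis implementation: it drives an Euler path, a Milstein path and the exact solution of a geometric Brownian motion ($\alpha(x) = rx$, $\beta(x) = \sigma x$, hence $\beta'(x) = \sigma$) with the same normal draws, and averages the absolute terminal errors as a proxy for the strong error.

```python
import math
import random


def strong_errors_gbm(x0, r, sigma, t_end, m, n_paths, seed=0):
    """Mean absolute terminal error of the Euler and Milstein schemes for GBM."""
    rng = random.Random(seed)
    dt = t_end / m
    sqrt_dt = math.sqrt(dt)
    err_euler = 0.0
    err_milstein = 0.0
    for _path in range(n_paths):
        x_e = x_m = x0
        w = 0.0                                   # Brownian motion W(t) driving all three
        for _step in range(m):
            z = rng.gauss(0.0, 1.0)
            w += sqrt_dt * z
            x_e += r * x_e * dt + sigma * x_e * sqrt_dt * z
            # Milstein adds 1/2 * beta'(x) * beta(x) * dt * (z^2 - 1), Equation (4.4.3)
            x_m += (r * x_m * dt + sigma * x_m * sqrt_dt * z
                    + 0.5 * sigma * sigma * x_m * dt * (z * z - 1.0))
        exact = x0 * math.exp((r - 0.5 * sigma ** 2) * t_end + sigma * w)
        err_euler += abs(exact - x_e)
        err_milstein += abs(exact - x_m)
    return err_euler / n_paths, err_milstein / n_paths


e_euler, e_milstein = strong_errors_gbm(1.0, 0.05, 0.4, 1.0, 50, 1_000)
print(e_euler, e_milstein)
```

Repeating the experiment with a larger `m` gives a rough empirical estimate of the strong order of each scheme, since the errors should shrink like $\Delta t^{\gamma}$.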


4.4.3 Kahl-Jäckel scheme

Kahl and Jäckel [KJ06, p.24] propose an implicit Milstein scheme for the variance in combination with an alternative discretisation for the underlying's price process. Specifically they refer to this stochastic volatility scheme as IJK and define it as

$$\hat{V}(t+\Delta t) = \frac{\hat{V}(t) + \kappa\theta\Delta t + \xi\sqrt{\hat{V}(t)}\,Z_V\sqrt{\Delta t} + \frac{1}{4}\xi^2\Delta t\left(Z_V^2 - 1\right)}{1 + \kappa\Delta t}, \tag{4.4.4}$$

$$\begin{aligned}
\ln\hat{X}(t+\Delta t) = {} & \ln\hat{X}(t) - \frac{\Delta t}{4}\left(\hat{V}(t+\Delta t) + \hat{V}(t)\right) + \rho\sqrt{\hat{V}(t)}\,Z_V\sqrt{\Delta t} \\
& + \frac{1}{2}\left(\sqrt{\hat{V}(t+\Delta t)} + \sqrt{\hat{V}(t)}\right)\left(Z_X\sqrt{\Delta t} - \rho Z_V\sqrt{\Delta t}\right) + \frac{1}{4}\xi\rho\Delta t\left(Z_V^2 - 1\right),
\end{aligned} \tag{4.4.5}$$

where $\theta$ is the equilibrium supported by the fundamentals, $\kappa$ is the rate at which the shocks dissipate and the variance returns to $\theta$, $\xi$ is the degree of volatility around it caused by shocks, and $Z_X$, $Z_V$ are Normal variates.

By finding the minimum of the variance function and forcing its value to be positive, we can easily derive that the variance is guaranteed to be strictly positive if $4\kappa\theta > \xi^2$. As Andersen [And07] mentions, it is unrealistic to uphold this constraint in practical situations, hence this scheme can and will produce negative variance. Andersen then proposes a full truncation to 0 when a value is negative. This means that Equation (4.4.4) substitutes $\hat{V}(\cdots)^{+} = \max(\hat{V}(\cdots), 0)$ wherever $\hat{V}(\cdots)$ appears.
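The positivity condition follows from completing the square in the numerator of Equation (4.4.4), which equals $\left(\sqrt{\hat{V}(t)} + \frac{1}{2}\xi\sqrt{\Delta t}\,Z_V\right)^2 + \left(\kappa\theta - \frac{1}{4}\xi^2\right)\Delta t$. A minimal Python sketch of the variance step with the truncation applied to the incoming value is given below; the function name `ijk_variance_step` and all parameter values are assumptions for illustration only.

```python
import math
import random


def ijk_variance_step(v, kappa, theta, xi, dt, z_v):
    """One implicit Milstein step for the variance, Equation (4.4.4), with truncation."""
    v_pos = max(v, 0.0)   # full truncation: use V^+ = max(V, 0) wherever V(t) appears
    numerator = (v_pos + kappa * theta * dt
                 + xi * math.sqrt(v_pos) * z_v * math.sqrt(dt)
                 + 0.25 * xi ** 2 * dt * (z_v ** 2 - 1.0))
    return numerator / (1.0 + kappa * dt)


# Short illustrative path with assumed parameters that violate 4*kappa*theta > xi^2
rng = random.Random(0)
v = 0.04
for _ in range(5):
    v = ijk_variance_step(v, kappa=0.5, theta=0.04, xi=1.0, dt=0.01, z_v=rng.gauss(0.0, 1.0))
    print(v)
```

When $4\kappa\theta \le \xi^2$ the returned value can still be negative (for example for $\hat{V}(t) = 0$ and $Z_V = 0$), which is why the truncation is applied again at the start of the next step.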

4.4.4 Broadie-Kaya exact calculation scheme

Broadie and Kaya [BK06] presented a discretisation process that is completely bias-free. This exact scheme, though, has limitations³ and sub-optimal performance even against the "simple" Euler scheme, as Lord et al. [LKvD06] showed in their numerical comparisons⁴.

To obtain the bias-free scheme, begin with the consecutive application of Itô's Lemma, first to get the explicit form and then to pass on to a Cholesky decomposition (see [And07, p.7] for a detailed derivation). What is finally obtained is,

$$V(t+\Delta t) = V(t) + \int_t^{t+\Delta t} \kappa(\theta - V(u))du + \xi\int_t^{t+\Delta t} \sqrt{V(u)}\,dW_V(u), \tag{4.4.6}$$

$$\begin{aligned}
\ln X(t+\Delta t) = {} & \ln X(t) + \frac{\rho}{\xi}\left(V(t+\Delta t) - V(t) - \kappa\theta\Delta t\right) + \left(\frac{\kappa\rho}{\xi} - \frac{1}{2}\right)\int_t^{t+\Delta t} V(u)du \\
& + \sqrt{1 - \rho^2}\int_t^{t+\Delta t} \sqrt{V(u)}\,dW(u).
\end{aligned} \tag{4.4.7}$$

³ Due to lack of speed and high complexity of implementation.

⁴ Most likely due to the reliance on the acceptance-rejection sampling that induces a

The distribution of $\ln X(t+\Delta t)$ is clearly Normal, and after sampling $V(t+\Delta t)$ from a non-central $\chi^2$ distribution, with degrees of freedom defined from a Poisson sampling, we can draw a sample of $\int_t^{t+\Delta t} V(u)du \mid V(t+\Delta t)$, and calculate the next log value of $X$. Since this last sampled distribution is conditional on the next value of the variance, we can identify it as a Brownian Bridge.

4.4.5 Quadratic Exponential (QE) scheme

In 2005 Andersen [And07] proposed a new scheme to discretise the stochastic volatility and the price of an underlying asset. This scheme takes advantage of the fact that a non-central $\chi^2$ sampled variate can be approximated by a related distribution that is moment-matched to the conditional first and second moments of the non-central $\chi^2$ distribution.

As Andersen points out, although the cubic transformation of the Normal RV is a more accurate representation of the distribution closer to 0, it introduces negative values of variance. Thus the quadratic representation is adopted, with a special case for low values of $V(t)$. Therefore, when $V(t)$ is sufficiently large, we get,

$$\hat{V}(t+\Delta t) = a(b + Z_V)^2, \tag{4.4.8}$$

where $Z_V$ is an $N(0,1)$ Gaussian RV, and $a$, $b$ are scalars that will be determined by moment-matching. Now for the complementary low values of $V(t)$ the distribution can (asymptotically) be approximated by,

$$P\left(\hat{V}(t+\Delta t) \in [x, x+dx]\right) \approx \left(p\delta(0) + \beta(1-p)e^{-\beta x}\right)dx, \quad x \ge 0, \tag{4.4.9}$$

where $\delta$ is the Dirac delta-function, strongly reflective at 0, and $p$ and $\beta$ are positive scalars to be calculated. The scalars $a, b, p, \beta$ depend on the parameters of the Heston model and the time granulation $\Delta t$, and will be calculated by moment-matching the exact distribution.


To sample from these distributions there are two cases to take into account:

◦ For sufficiently large values of $V$, sample a normal $N(0,1)$ Gaussian RV and calculate $\hat{V}(t+\Delta t)$ from Equation (4.4.8).

◦ To sample for the small values of $V$, the inverse of the distribution function of Equation (4.4.9) will be used. The inverse of the distribution function is,

$$\Psi^{-1}(u) = \Psi^{-1}(u; p, \beta) = \begin{cases} 0 & \text{if } 0 \le u \le p, \\ \beta^{-1}\ln\left(\frac{1-p}{1-u}\right) & \text{if } p < u \le 1. \end{cases} \tag{4.4.10}$$

The value of $V$ can then be sampled from

$$\hat{V}(t+\Delta t) = \Psi^{-1}(U_V; p, \beta), \tag{4.4.11}$$

where $U_V$ is a uniform RV.

The rule for deciding which discretisation of $V$ to use depends on the non-centrality of the distribution, and can be triaged based on the value of $\psi$. The value of $\psi$ is,

$$\psi := \frac{s^2}{m^2} = \frac{\frac{\hat{V}(t)\xi^2 e^{-\kappa\Delta t}}{\kappa}\left(1 - e^{-\kappa\Delta t}\right) + \frac{\theta\xi^2}{2\kappa}\left(1 - e^{-\kappa\Delta t}\right)^2}{\left(\theta + (\hat{V}(t) - \theta)e^{-\kappa\Delta t}\right)^2}, \tag{4.4.12}$$

where $m$, $s^2$ are the conditional mean and variance of the exact distribution we are matching. What Andersen showed was that the quadratic scheme of Equation 4.4.8 can only be moment-matched for $\psi \le 2$ and, similarly, the exponential scheme of Equation 4.4.11 can only be moment-matched for $\psi \ge 1$. It emerges then that there is an interval $\psi \in [1, 2]$ where the two schemes overlap. Andersen chooses the midpoint of this interval as the cut-off point between the schemes; thus the cut-off is $\psi_c = 1.5$.

Since we've defined the discretisation process for the QE scheme, with Equations 4.4.8 and 4.4.11, and the cut-off discriminator, what is left is to calculate the remaining parameters $a, b, p, \beta$ for each case. The algorithm for this process is detailed in Algorithm 4.

This algorithm is implemented in MATLAB [see Listing A.4 for details], and is used for numerical comparisons of acceleration.
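For readers without access to the MATLAB listing, the QE variance step can be sketched in Python as follows. This is an illustrative sketch and not a transcription of Listing A.4; the function name `qe_variance_step` and the parameter values in the example are assumptions, and it presumes $\xi > 0$ and $m > 0$ so that $\psi$ is well defined.

```python
import math
import random


def qe_variance_step(v, kappa, theta, xi, dt, rng, psi_c=1.5):
    """One step of the Quadratic Exponential scheme for the variance process."""
    e = math.exp(-kappa * dt)
    m = theta + (v - theta) * e                                   # conditional mean
    s2 = (v * xi ** 2 * e / kappa * (1.0 - e)
          + theta * xi ** 2 / (2.0 * kappa) * (1.0 - e) ** 2)     # conditional variance
    psi = s2 / m ** 2                                             # Equation (4.4.12)
    if psi <= psi_c:
        # Quadratic branch, Equation (4.4.8): b^2 must be computed before a
        inv = 2.0 / psi
        b2 = inv - 1.0 + math.sqrt(inv) * math.sqrt(inv - 1.0)
        a = m / (1.0 + b2)
        z_v = rng.gauss(0.0, 1.0)
        return a * (math.sqrt(b2) + z_v) ** 2
    # Exponential branch, Equations (4.4.10) and (4.4.11)
    p = (psi - 1.0) / (psi + 1.0)
    beta = (1.0 - p) / m
    u_v = rng.random()
    if u_v <= p:
        return 0.0
    return math.log((1.0 - p) / (1.0 - u_v)) / beta


rng = random.Random(0)
print(qe_variance_step(0.04, 2.0, 0.04, 0.3, 0.1, rng))    # lands in the quadratic branch
print(qe_variance_step(0.0001, 0.5, 0.04, 1.0, 0.1, rng))  # lands in the exponential branch
```

Both branches are constructed so that the sampled value has mean $m$, which gives a simple sanity check: the average of many draws from a fixed $\hat{V}(t)$ should approach $\theta + (\hat{V}(t) - \theta)e^{-\kappa\Delta t}$.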

4.5 Generating random realisations of variates

4.5.1 General considerations

Algorithm 4 Quadratic Exponential Variance Reduction
Require: The present value for the variance, $\hat{V}(t)$
Ensure: The value for the variance in the subsequent time-step, $\hat{V}(t+\Delta t)$
1: Compute $m \leftarrow \theta + (\hat{V}(t) - \theta)e^{-\kappa\Delta t}$
2: Compute $s^2 \leftarrow \frac{\hat{V}(t)\xi^2 e^{-\kappa\Delta t}}{\kappa}\left(1 - e^{-\kappa\Delta t}\right) + \frac{\theta\xi^2}{2\kappa}\left(1 - e^{-\kappa\Delta t}\right)^2$
3: Compute $\psi \leftarrow \frac{s^2}{m^2}$
4: if $\psi \le \psi_c$ then
5: Compute $b^2 \leftarrow 2\psi^{-1} - 1 + \sqrt{2\psi^{-1}}\sqrt{2\psi^{-1} - 1}$
6: Compute $a \leftarrow \frac{m}{1 + b^2}$
7: Generate Normal random variate $Z_V$
8: return $\hat{V}(t+\Delta t) \leftarrow a(b + Z_V)^2$
9: else
10: Compute $p \leftarrow \frac{\psi - 1}{\psi + 1} \in [0, 1)$
11: Compute $\beta \leftarrow \frac{1 - p}{m}$
12: Generate Uniform random variate $U_V$
13: if $0 \le U_V \le p$ then
14: return $\hat{V}(t+\Delta t) \leftarrow 0$
15: else
16: return $\hat{V}(t+\Delta t) \leftarrow \beta^{-1}\ln\left(\frac{1 - p}{1 - U_V}\right)$
17: end if
18: end if

A pseudo-random number sequence is one that tries to approximate a truly random process by using a process that relies on a specific initial condition, called a seed, and a deterministic algorithm for generating continuous variates. Since the process is deterministic the generated sequence cannot be truly random; however, it will not fail all tests of randomness. It is not possible to create a Pseudo-Random Number Generator (PRNG) that passes all empirical statistical tests (see [Knu97] for more on this matter); however, we can accept empirically that good PRNGs will pass all simple tests, while probably failing more complex ones.

Another property of PRNGs is that the generated sequences are periodic, meaning that after a certain number of variates have been generated, the sequence repeats from the very first variate. However, since the deterministic generation relies on a specific mathematical function to generate the random variate, when implemented on a CPU it is significantly more efficient at generating very large sequences than True-Random Number Generators (TRNGs)⁵.

There are two qualities that can qualify a sequence's randomness: its uniformity and its independence. The most common tests of uniformity are the Kolmogorov-Smirnov (KS) test and the $\chi^2$ test. The autocorrelation test would test for independence, by identifying periodicity in the sequences.

⁵ The simplest example of a truly random number would be a sequence of numbers recorded by throwing a fair die a large enough number of times. More optimised generators might use natural phenomena or chaotic sources that, once observed, can produce true randomness (e.g. lava lamps [http://www.lavarnd.org], atmospheric white noise, CCD chip white noise, etc.).

There are quite a number of algorithms to generate pseudo-random numbers. Some of the most commonly used are mentioned in the sections below.

4.5.2 Sampling methods

4.5.2.1 Inversion method

The inverse transform sampling relies on the following theorem,

Theorem 4.2. Let $X$ be a random variate from a continuous distribution with a strictly increasing cumulative distribution function $F(x)$. Let $U$ be a uniform $(0, 1)$ random variate; then $F^{-1}(U) \sim F(x)$.

Recall that for $y \in [0, 1]$, $P(U \le y) = y$. Then,

$$P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x) = P(X \le x).$$

Then the inversion process is

1. Generate $U \sim U(0, 1)$.

2. Output $X = F^{-1}(U)$.
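As a worked instance of the two steps above, consider the exponential distribution with rate $\lambda$, for which $F(x) = 1 - e^{-\lambda x}$ and therefore $F^{-1}(u) = -\ln(1 - u)/\lambda$. The Python sketch below uses this distribution purely as an example; it is not one of the distributions sampled in this work, and the function name is hypothetical.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse transform sampling for F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                      # step 1: U ~ U[0, 1)
    return -math.log(1.0 - u) / rate      # step 2: X = F^{-1}(U)


rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(20_000)]
print(sum(samples) / len(samples))        # should be close to the mean 1 / rate
```

The same two steps underlie Equation (4.4.11) of the QE scheme, where the inverse $\Psi^{-1}$ is available in closed form.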

4.5.2.2 Acceptance-rejection method

Consider a random variate $X$ with a Probability Density Function (PDF) $f(x)$ and support $(a, b)$, i.e. $f(x) = 0$ when $x \notin [a, b]$. The function also has an upper bound $K$ for which $f(x) \le K$, $\forall x$. The acceptance-rejection process is

1. Generate uniform random variates $U_1 \sim U(a, b)$ and $U_2 \sim U(0, K)$.

2. If $U_2 \le f(U_1)$, then output $X = U_1$, else go to step 1.
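The two steps above can be sketched in Python as follows. The target density $f(x) = 6x(1 - x)$ on $(0, 1)$, with bound $K = 1.5$, is an assumed example chosen because its mean ($0.5$) and variance ($0.05$) are easy to verify; the function name is hypothetical.

```python
import random


def acceptance_rejection(f, a, b, k, rng):
    """Sample from a density f supported on (a, b) and bounded above by k."""
    while True:
        u1 = rng.uniform(a, b)       # candidate point
        u2 = rng.uniform(0.0, k)     # height under the bounding box
        if u2 <= f(u1):
            return u1                # accept, otherwise draw again


def density(x):
    return 6.0 * x * (1.0 - x)       # assumed example density, maximum 1.5 at x = 0.5


rng = random.Random(0)
draws = [acceptance_rejection(density, 0.0, 1.0, 1.5, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))
```

On average a fraction $1/(K(b - a))$ of the candidates is accepted, here two out of three, so a loose bound $K$ directly inflates the number of uniform draws required.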

4.5.3 Uniform random samples

The continuous uniform distribution is part of a family of distributions where all bins (intervals) of the same length are equally probable. What follows is a presentation of three procedures to sample from a Uniform distribution: the Linear Congruential Generator (LCG), the Mersenne twister, and L'Ecuyer's methods.


4.5.3.1 Linear Congruential Generator (LCG)

The LCG method is one of the oldest and best-known methods to generate Uniform random numbers. This is a recursive algorithm that takes advantage of some properties of prime numbers. Like most PRNGs, it relies on an initial value to seed it. This becomes the starting value of the pseudo-random sequence. The next value is deterministically derived from

$$Z_{i+1} = (\alpha Z_i + c) \pmod{m},$$

where $m$ is the modulus, $\alpha$ is the multiplier, and $c$ is the increment. Since the next variate in the sequence is reduced modulo $m$, it is not possible to get a number equal to or higher than $m$. Thus we can derive uniform numbers $U_i \in [0, 1)$ by $U_i = \frac{Z_i}{m}$.
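The recursion is short enough to write out in full. The Python sketch below uses the Park and Miller "minimal standard" parameters ($\alpha = 16807$, $c = 0$, $m = 2^{31} - 1$) as an assumed example; the text above does not prescribe specific constants, and the function name is hypothetical.

```python
from itertools import islice


def lcg_states(seed, a=16807, c=0, m=2 ** 31 - 1):
    """Yield the LCG states Z_1, Z_2, ... generated from Z_0 = seed."""
    z = seed
    while True:
        z = (a * z + c) % m
        yield z


m = 2 ** 31 - 1
states = list(islice(lcg_states(1), 5))
uniforms = [z / m for z in states]      # U_i = Z_i / m
print(states)
print(uniforms)
```

Because the state is a single integer below $m$, the period can never exceed $m$, which is one reason generators with a much larger state, such as the Mersenne twister, are preferred for long simulations.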

4.5.3.2 Mersenne twister

The Mersenne Twister (MT) is an exceptionally high quality, fast random number generator. It was developed by Makoto Matsumoto and Takuji Nishimura in 1996 and further extended in 2002; the current implementation is MT19937, with 32-bit word length. This algorithm has a period length of $2^{19937} - 1$ and has been shown to be uniformly distributed in 623 dimensions (see [MN98]). The function g05kg(genid, subid) from the NAG library was used to generate the uniform random variates between 0 and 1. When genid=3 the MT algorithm is used to generate them.

4.5.4 Normal random samples

The normal or Gaussian distribution has a characteristic bell-like shape, which gives it the empirical name of the bell curve. It is considered the most prominent distribution in statistics, and arises in a multitude of natural phenomena. The Gaussian function that describes the bell curve is,

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2},$$

where $\mu$ is the expected value of the distribution and $\sigma^2$ is its variance; for the standard normal distribution, $\mu = 0$ and $\sigma^2 = 1$.

A common way to generate normal variates $N(0,1)$ is the Box-Muller transformation. The method relies on first sampling two Uniform RVs and then performing a polar transformation with them.

Lemma 4.3. Let $U_1, U_2$ be independent random variates that are uniformly distributed in the interval $(0, 1)$. If we generate the two random variates,

$$Z_0 = \sqrt{-2\ln U_1}\cos(2\pi U_2),$$

$$Z_1 = \sqrt{-2\ln U_1}\sin(2\pi U_2),$$

then $Z_0, Z_1$ are independent random variates, distributed normally with $\mu = 0$ and $\sigma = 1$.
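Lemma 4.3 can be checked empirically with a few lines of Python. The sketch below is independent of the NAG routines used in this work; the function name `box_muller` is an assumption, and the first uniform is shifted to $(0, 1]$ only to avoid evaluating $\ln 0$.

```python
import math
import random


def box_muller(rng):
    """Return two independent N(0, 1) variates built from two uniform variates."""
    u1 = 1.0 - rng.random()            # in (0, 1], so log(u1) is always finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(0)
pairs = [box_muller(rng) for _ in range(20_000)]
mean_z0 = sum(z0 for z0, _ in pairs) / len(pairs)
var_z0 = sum((z0 - mean_z0) ** 2 for z0, _ in pairs) / (len(pairs) - 1)
cross = sum(z0 * z1 for z0, z1 in pairs) / len(pairs)
print(mean_z0, var_z0, cross)          # expect values near 0, 1 and 0
```

A variate with mean $\mu$ and standard deviation $\sigma$ is then obtained as $\mu + \sigma Z_0$.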

In the case of this implementation the NAG library was used to generate the random variates. An example of how to generate them is given in Listing 4.1.

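function [x, iseedOut, ifail] = GenerateNormalNAG(mu, var, n, gen)
% Draw n Normal variates with mean mu and variance var using the NAG toolbox.
n = int64(n);                        % integer arguments are passed as int64
igen = int64(gen);                   % identifier of the base generator
[igenOut, iseed] = g05kc(igen);      % initialise the generator seeds
[igen, iseed] = g05kb(igen, iseed);  % initialise the generator from these seeds
[x, iseedOut, ifail] = g05la(mu, var, n, igen, iseed);  % generate the variates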

Listing 4.1


Figure 4.1

[Flowchart; the layout was lost in extraction. Box labels, Static MC: Uniform Generator, Generated r.v. U, Transformation Rules, Simulated r.v. X, Simulation algorithm (loop), Variance Reduction, Sample paths, Law of large numbers, Monte Carlo Estimation. Box labels, Dynamic MC: Arbitrage-free model, Risk-neutral SDE, Discretization Scheme, Approxim. random walk, Series Expansion, Approxim. random path.]


Figure 4.2

The Sobol quasi-random uniform variates on the left, versus the pseudo-random variates on the right. The NAG library was used for the quasi-random variates. [Two scatter plots on the unit square, titled "Sobol Quasi Random" and "Pseudo Random".]

Figure 4.3

The Niederreiter quasi-random uniform variates on the left, versus the pseudo-random variates on the right. The NAG library was used for the quasi-random variates. [Two scatter plots on the unit square, titled "Niederreiter Quasi Random" and "Pseudo Random".]


Figure 4.4

The Faure quasi-random uniform variates on the left, versus the pseudo-random variates on the right. The NAG library was used for the quasi-random variates. [Two scatter plots on the unit square, titled "Faure Quasi Random" and "Pseudo Random".]


Chapter 5 Dataflow programming on FPGAs

This approach to accelerating code takes advantage of the fact that, most of the time, the Central Processing Unit (CPU) is busy figuring out the scheduling of the instructions and the branch prediction of the program. The purpose of an FPGA is to provide a customisable “field-programmable” chip that can be optimised to perform calculations for a specific problem domain. This is achieved by allowing the logic blocks on the chip to be re-wirable. This way, even after a board has been shipped, it can be re-wired and re-purposed.

This re-wiring is achieved via a Hardware Description Language (HDL). The language offers the ability to interconnect the logic blocks into different combinations, to cater for complex combinatorial functions, and also to manage the on-chip memory¹.

5.1 Applications of FPGAs in computational finance

One company that has made breakthroughs in accelerating financial models on FPGAs is Maxeler Technologies. Their paradigm shift, from the Von Neumann control-flow architecture to the Dataflow architecture, allows for much higher computational specialisation and acceleration. The control-flow architecture can be likened to a mechanic's workshop, where one person does all stages of constructing a product, e.g. a motorcycle. This work doesn't have to be sequential in any order. While building the motorcycle, the mechanic can also side-step and work on a part of a car before returning to the motorcycle according to a schedule. The opposite of this paradigm is a motorcycle production line, where each station on the line is optimised to perform one action. The overall process is as quick as the flow. What the FPGAs provide is a way to create workers (kernels, as defined by Maxeler) that are highly specialised and extremely quick at conducting a specialised operation. The kernels are large synchronous dataflow pipelines that implement the mathematics and the control of the problem. They are asynchronously coupled to other kernels and I/O sources and sinks (DRAM, Peripheral Component Interconnect express (PCIe), inter-chip links, etc.) by the manager.

An additional benefit of this technology is its low power footprint when compared against a standard CPU. Since the clock frequency of a CPU is much higher than that of an FPGA, the electrical resistance of the transistors causes more energy to be released in the form of heat. This heat accumulates in a server room and needs to be dissipated. In large server clusters, the cooling of the servers can amount to a significant cost. Typically the power consumption of a cluster is split roughly half and half between the servers themselves and the cooling systems of those servers. FPGA chips tend to offer a significant reduction in the electricity costs of maintenance.

5.2 FPGA versus Intel multi-core

Thus far, CPU developments have adhered to Moore's law². This prediction has been followed up to this point; however, limitations on the scale that transistors can achieve, coupled with the issue of power consumption increasing the more transistors are fitted in a chip, cast doubt on the continued relevance of Moore's law. What might actually happen instead is that the transistors will double their numbers every 18 months, but mainly because the number of cores in each chip would double. What this means is that Operating Systems (OSs) will be able to take advantage of multiple cores within a CPU and, via efficient scheduling, maximise performance while minimising power consumption.

The benefit of the Intel multi-core approach is that the current programming paradigm can be retained, and most existing code could be ported to the many-core architecture much more quickly than to more exotic implementations on GPUs and FPGAs.

On the one hand, the FPGA can leverage two advantages over the CPU approach. First, it has more silicon dedicated to calculations compared to the CPU. Second, it relies on the DataFlow architecture to do away with the taxing aspects of instruction scheduling and branch prediction. This way the calculation pipeline is always full and a result is calculated every clock cycle [see Figure 5.2 for more details]. On the other hand the CPU, as shown in Figure 5.1, needs to handle concurrent threads vying for their turn on the Arithmetic Logic Unit (ALU) in order to progress their calculation status.

² Moore stated in 1965 that the transistor density of semiconductor chips would double

Figure 5.1

The process architecture on a CPU, where the ALU is referred to as the Function Unit. Data has to be moved into the Function Unit from memory and then moved back into memory for storage. (Photo used by permission of Maxeler Technologies).

5.3 Scope for FPGAs

The FPGA lends itself more aptly to problems of a difference engine nature. For instance, it has been successfully used in the Seismic Acquisition Industry to perform finite difference modelling of the geophysical models, to perform reverse time migrations, and to do CRS stackings. Lately, implementations in Credit Derivatives Pricing have been appearing as well [WSRM12]. Basically any mathematical process that can be decomposed into distinguishable, self-sufficient sequential calculations can achieve high acceleration on the FPGA architecture.


Figure 5.2

The FPGA architecture as implemented by Maxeler Technologies. The MaxCompiler constructs the DataFlow tree which defines the circuit architecture on the FPGA chip. From then on data from memory gets piped into the different DataFlow cores until it exits the calculation pipe and is committed to memory. (Photo used by permission of Maxeler Technologies).

5.4 Application to the Heston model

The implementation of Heston’s stochastic volatility model has two aspects to it. First the code that is run on the host, and second the code that defines the circuit architecture of the FPGA and performs the necessary calculations.

Since only repetitive calculations can benefit from the DataFlow architecture, there are certain elements that need to run on the host and others on the FPGA card. Maxeler Technologies use the nomenclature of a kernel and a manager. The kernel comprises a set of calculations that produce a distinct result, e.g. a 3-value moving average. The manager's responsibility is to instantiate and administer the life cycle and functions of each kernel that is assigned to it. For this implementation the manager creates numerous pipes within a given MaxCard³ to handle different operations. The more pipes that can be fitted into the available silicon, the better the overall performance of the FPGA. The manager is responsible for creating and populating the pipes with kernels that generate random variates from the Gamma distribution, and also kernels that calculate the next values for the variance and the price of the underlying. Once all the prices of the underlying have been generated for every timestep, the results are aggregated back on the host's CPU.

³ Latest models of Maxeler's FPGA cards provide an ever increasing number of resources

Figure 5.3

This figure illustrates how code interacts between the host CPU and the FPGA kernels. (Photo used by permission of Maxeler Technologies)
