Estimating a Volatility Surface - Optimization Methods in Finance

The discussion in this section is largely based on the work of Coleman, Kim, Li, and Verma, see [21, 20].

The BSM equation for pricing European options is based on a geometric Brownian motion model for the movements of the underlying security. Namely, one assumes that the underlying security price $S_t$ at time $t$ satisfies

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{6.7}$$

where $\mu$ is the drift, $\sigma$ is the (constant) volatility, and $W_t$ is a standard Brownian motion. Using this equation and some standard assumptions about the absence of frictions and arbitrage opportunities, one can derive the BSM partial differential equation for the value of a European option on this underlying security. Using the boundary conditions resulting from the payoff structure of the particular option, one determines the value function for the option. Recall from Exercise 5.3 that the price of a European call option with strike $K$ and maturity $T$ is given by:

$$C(K,T) = S_0\,\Phi(d_1) - K e^{-rT}\,\Phi(d_2), \tag{6.8}$$

where

$$d_1 = \frac{\log(S_0/K) + \left(r + \frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},$$

and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Here $r$ represents the continuously compounded risk-free interest rate, assumed constant, and $\sigma$ is the volatility of the underlying security, which is also assumed to be constant. Similarly, the European put option price is given by

$$P(K,T) = K e^{-rT}\,\Phi(-d_2) - S_0\,\Phi(-d_1). \tag{6.9}$$
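As a quick numerical check of (6.8) and (6.9), the following minimal Python sketch evaluates both formulas using only the standard library. `norm_cdf` expresses $\Phi$ through the error function. The function names and sample parameters are illustrative choices, not taken from the text. The two prices should satisfy put-call parity, $C - P = S_0 - Ke^{-rT}$.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_d1_d2(S0: float, K: float, T: float, r: float, sigma: float):
    """Return (d1, d2) as defined below (6.8)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return d1, d2


def bsm_call(S0: float, K: float, T: float, r: float, sigma: float) -> float:
    """European call price, formula (6.8)."""
    d1, d2 = bsm_d1_d2(S0, K, T, r, sigma)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def bsm_put(S0: float, K: float, T: float, r: float, sigma: float) -> float:
    """European put price, formula (6.9)."""
    d1, d2 = bsm_d1_d2(S0, K, T, r, sigma)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S0 * norm_cdf(-d1)


if __name__ == "__main__":
    call = bsm_call(100.0, 100.0, 1.0, 0.05, 0.2)
    put = bsm_put(100.0, 100.0, 1.0, 0.05, 0.2)
    print(f"call = {call:.4f}, put = {put:.4f}")  # call = 10.4506, put = 5.5735
```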

The risk-free interest rate $r$, or a reasonably close approximation to it, is often available, for example from Treasury bill prices in US markets. Therefore, all one needs to determine the call or put price using these formulas is a reliable estimate of the volatility parameter $\sigma$. Conversely, given the market price of a particular European call or put, one can uniquely determine the volatility of the underlying asset implied by this price, called its implied volatility, by solving the equations above for the unknown $\sigma$. The solution is unique because the BSM price is strictly increasing in $\sigma$. Any one of the univariate equation solving techniques we discussed in Section 5.3 can be used for this purpose.
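To illustrate this inversion, here is a minimal sketch that recovers the implied volatility of a call by bisection, a standard univariate root-finding method. The bracket $[10^{-6}, 5]$, the tolerance, and all names are assumptions chosen for illustration. Bisection applies because the call price (6.8) is continuous and strictly increasing in $\sigma$, so a root is bracketed whenever the market price lies strictly between the prices at the two ends of the bracket.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(S0: float, K: float, T: float, r: float, sigma: float) -> float:
    """European call price from formula (6.8)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def implied_vol_call(price, S0, K, T, r, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Solve bsm_call(S0, K, T, r, sigma) = price for sigma by bisection."""
    if not bsm_call(S0, K, T, r, lo) < price < bsm_call(S0, K, T, r, hi):
        raise ValueError("price is not bracketed by sigma in [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # The price increases with sigma: a model price above the target means sigma is too large.
        if bsm_call(S0, K, T, r, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    market_price = bsm_call(100.0, 110.0, 0.5, 0.03, 0.27)
    print(implied_vol_call(market_price, 100.0, 110.0, 0.5, 0.03))  # approximately 0.27
```

Newton's method would usually converge faster. Bisection is shown because it needs only a bracket and cannot diverge.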

Empirical evidence against the appropriateness of (6.7) as a model for the movements of most securities is abundant. Most studies refute the assumption of a volatility that does not depend on time or on the underlying price level. Indeed, studying the prices of options with the same maturity but different strikes, researchers observed that the implied volatilities of such options often exhibit a "smile" structure: implied volatilities are higher away from the money in both directions and decrease to a minimum level as one approaches the at-the-money option from in-the-money or out-of-the-money strikes. This is clearly in contrast with the constant (flat) implied volatilities one would expect if (6.7) were an appropriate model for the underlying price process.

There are many models that try to capture the volatility smile, including stochastic volatility models and jump diffusions. Since these models introduce non-traded sources of risk, perfect replication via dynamic hedging as in the BSM approach becomes impossible, and the pricing problem is more complicated. An alternative that is explored in [21] is the one-factor continuous diffusion model:

$$\frac{dS_t}{S_t} = \mu(S_t,t)\,dt + \sigma(S_t,t)\,dW_t, \quad t \in [0,T], \tag{6.10}$$

where the constant parameters $\mu$ and $\sigma$ of (6.7) are replaced by continuous and differentiable functions $\mu(S_t,t)$ and $\sigma(S_t,t)$ of the underlying price $S_t$ and time $t$. $T$ denotes the end of the fixed time horizon. If the instantaneous risk-free interest rate $r$ and the dividend rate are both assumed constant, then, given a function $\sigma(S,t)$, a European call option with maturity $T$ and strike $K$ has a unique price. Let us denote this price by $C(\sigma(S,t), K, T)$.

While a closed-form expression for the price function $C(\sigma(S,t), K, T)$ like (6.8) is no longer available, the resulting pricing problem can be solved efficiently via numerical techniques. Since $\mu(S,t)$ does not appear in the generalized BSM partial differential equation, all one needs is the specification of the function $\sigma(S,t)$ and a good numerical scheme to determine the option prices in this generalized framework.
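As one possible numerical scheme, the sketch below prices a European call under the local volatility model. It is our choice for illustration and not necessarily the one used in [21]. The sketch solves the generalized BSM equation $\partial C/\partial t + \tfrac12\sigma(S,t)^2S^2\,\partial^2 C/\partial S^2 + rS\,\partial C/\partial S - rC = 0$ backward from the payoff with a fully implicit finite-difference scheme. It assumes:

- no dividends;
- a truncated price grid $[0, S_{\max}]$;
- boundary values $C(0,t) = 0$ and the approximation $C(S_{\max},t) \approx S_{\max} - Ke^{-r(T-t)}$;
- a user-supplied function `sigma_fn(S, t)`.

Grid sizes and names are illustrative. As the text notes, only $\sigma(S,t)$ enters the computation, not $\mu(S,t)$.

```python
import numpy as np
from scipy.linalg import solve_banded


def local_vol_call_price(sigma_fn, S0, K, T, r, S_max=None, M=300, N=500):
    """European call price when volatility is a function sigma_fn(S, t).

    Solves dC/dt + 0.5 sigma(S,t)^2 S^2 C_SS + r S C_S - r C = 0 backward from
    the payoff max(S - K, 0) with a fully implicit finite-difference scheme on
    a uniform grid of M + 1 price nodes and N time steps (no dividends).
    """
    if S_max is None:
        S_max = 3.0 * max(S0, K)
    S = np.linspace(0.0, S_max, M + 1)
    dS = S[1] - S[0]
    dt = T / N
    C = np.maximum(S - K, 0.0)            # payoff at maturity t = T
    Si = S[1:-1]                           # interior nodes
    for n in range(N - 1, -1, -1):        # step from t_{n+1} back to t_n
        t = n * dt
        sig = sigma_fn(Si, t)
        diff = 0.5 * sig**2 * Si**2 / dS**2
        drift = r * Si / (2.0 * dS)
        a = diff - drift                   # coefficient of C[i-1]
        b = -2.0 * diff - r                # coefficient of C[i]
        c = diff + drift                   # coefficient of C[i+1]
        # Banded storage of (I - dt * L): super-, main and sub-diagonal.
        ab = np.zeros((3, M - 1))
        ab[0, 1:] = -dt * c[:-1]
        ab[1, :] = 1.0 - dt * b
        ab[2, :-1] = -dt * a[1:]
        upper = S_max - K * np.exp(-r * (T - t))   # boundary value at S_max
        rhs = C[1:-1].copy()
        rhs[-1] += dt * c[-1] * upper              # C(0, t) = 0 adds nothing
        C = np.concatenate(([0.0], solve_banded((1, 1), ab, rhs), [upper]))
    return float(np.interp(S0, S, C))


if __name__ == "__main__":
    flat = lambda S, t: np.full_like(S, 0.2)
    print(local_vol_call_price(flat, 100.0, 100.0, 1.0, 0.05))  # close to 10.4506 from (6.8)
```

With a constant `sigma_fn` the result should be close to the closed-form price (6.8). The implicit scheme is only first-order accurate in time, so tighter accuracy requires finer grids or a higher-order scheme such as Crank-Nicolson.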

So, how does one specify the function $\sigma(S,t)$? First of all, this function should be consistent with the observed prices of currently or recently traded options on the same underlying security. If we are given market prices of $n$ call options with strikes $K_j$ and maturities $T_j$ in the form of bid-ask pairs $(\beta_j, \alpha_j)$ for $j = 1, \dots, n$, it is reasonable to require that the volatility function $\sigma(S,t)$ be chosen so that

$$\beta_j \le C(\sigma(S,t), K_j, T_j) \le \alpha_j, \quad j = 1, \dots, n. \tag{6.11}$$

To ensure that (6.11) is satisfied as closely as possible, one strategy is to minimize the violations of the inequalities in (6.11):

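$$\min_{\sigma(S,t) \in \mathcal{H}} \sum_{j=1}^{n} \left( \left[\beta_j - C(\sigma(S,t), K_j, T_j)\right]^+ + \left[C(\sigma(S,t), K_j, T_j) - \alpha_j\right]^+ \right). \tag{6.12}$$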

Above, $\mathcal{H}$ denotes the space of measurable functions $\sigma(S,t)$ with domain $\mathbb{R}_+ \times [0,T]$, and $u^+ = \max\{0, u\}$. Alternatively, using the closing prices $C_j$ for the options under consideration, or choosing the mid-market prices $C_j = (\beta_j + \alpha_j)/2$, we can solve the following nonlinear least-squares problem:

$$\min_{\sigma(S,t) \in \mathcal{H}} \sum_{j=1}^{n} \left(C(\sigma(S,t), K_j, T_j) - C_j\right)^2. \tag{6.13}$$

This is a nonlinear least-squares problem because the function $C(\sigma(S,t), K_j, T_j)$ depends nonlinearly on the decision variable, namely the local volatility function $\sigma(S,t)$.
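Once model prices $C(\sigma(S,t), K_j, T_j)$ are available from some pricer, both calibration objectives are cheap to evaluate. The sketch below uses hypothetical function names and computes the bid-ask violation objective (6.12) and the least-squares objective (6.13) against mid-market prices. The call to a pricer is deliberately left out, so both functions take the vector of model prices as input.

```python
import numpy as np


def bid_ask_violation(model_prices, bids, asks):
    """Objective (6.12): sum_j [beta_j - C_j]^+ + [C_j - alpha_j]^+."""
    model_prices, bids, asks = (np.asarray(v, dtype=float) for v in (model_prices, bids, asks))
    below_bid = np.maximum(bids - model_prices, 0.0)   # [beta_j - C_j]^+
    above_ask = np.maximum(model_prices - asks, 0.0)   # [C_j - alpha_j]^+
    return float(np.sum(below_bid + above_ask))


def mid_prices(bids, asks):
    """Mid-market prices C_j = (beta_j + alpha_j) / 2."""
    return 0.5 * (np.asarray(bids, dtype=float) + np.asarray(asks, dtype=float))


def least_squares_objective(model_prices, target_prices):
    """Objective (6.13): sum_j (C(sigma, K_j, T_j) - C_j)^2."""
    resid = np.asarray(model_prices, dtype=float) - np.asarray(target_prices, dtype=float)
    return float(resid @ resid)


if __name__ == "__main__":
    bids, asks = [0.9, 2.6, 3.5], [1.1, 2.8, 3.8]
    model = [1.0, 2.5, 4.0]
    print(bid_ask_violation(model, bids, asks))          # 0.1 + 0.2, up to rounding
    print(least_squares_objective(model, mid_prices(bids, asks)))
```

The two objectives behave differently:

- (6.12) is zero for any model price inside the bid-ask spread, but it is not differentiable where a model price equals a bid or an ask.
- (6.13) is smooth, but it penalizes any deviation from the mid price, even inside the spread.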

While calibrating the local volatility function to the observed prices using the objective functions in (6.12) and (6.13) is important, the local volatility function should have additional properties as well. Arguably, the most common feature sought in existing models is smoothness. For example, in [46] the authors try to achieve a smooth volatility function by augmenting the objective function in (6.13) as follows:

$$\min_{\sigma(S,t) \in \mathcal{H}} \sum_{j=1}^{n} \left(C(\sigma(S,t), K_j, T_j) - C_j\right)^2 + \lambda \left\|\nabla \sigma(S,t)\right\|_2. \tag{6.14}$$

Here, $\lambda$ is a positive trade-off parameter and $\|\cdot\|_2$ represents the $L_2$-norm.

Large variations in the volatility function result in a large norm of its gradient, and by penalizing such variations the formulation above encourages a smoother solution. The most appropriate value for the trade-off parameter $\lambda$ must be determined experimentally. To solve the resulting problem numerically, one must discretize the volatility function on a grid of underlying prices and times. Even for a relatively coarse discretization of the $S_t$ and $t$ spaces, one can easily end up with an optimization problem with many variables.

An alternative strategy is to build the smoothness into the volatility function by modeling it with spline functions. To define a spline function, the domain of the function is partitioned into smaller subregions, and the spline function is chosen to be a polynomial in each subregion. Since polynomials are smooth functions, spline functions are smooth within each subregion by construction, and the only possible sources of nonsmoothness are the boundaries between subregions. When the polynomial is of a high enough degree, the continuity and differentiability of the spline function at these boundaries can be ensured by properly choosing the polynomial coefficients. This strategy is similar to the model we consider in more detail in Section 8.4, except that here we model the volatility function rather than the risk-neutral density, and we generate a function that varies over time rather than an estimate at a single point in time. We defer a more detailed discussion of spline functions to Section 8.4. The use of spline functions not only guarantees the smoothness of the resulting volatility function estimates but also reduces the degrees of freedom in the problem. As a consequence, the optimization problem to be solved has far fewer variables and is easier to solve. This strategy is proposed in [21], and we review it below.

We start by assuming that $\sigma(S,t)$ is a bicubic spline. While higher-order splines can also be used, cubic splines often offer a good balance between flexibility and complexity. Next we choose a set of spline knots at points $(\bar S_j, \bar t_j)$ for $j = 1, \dots, k$. If the value of the volatility function at these points is given by $\bar\sigma_j := \sigma(\bar S_j, \bar t_j)$, the interpolating cubic spline that goes through these knots and satisfies a particular end condition is uniquely determined. For example, in Section 8.4 we use the natural spline end condition, which sets the second derivative of the function at the boundary knots of the domain to zero, to obtain our cubic spline approximations uniquely. Therefore, to completely determine the volatility function as a natural bicubic spline, and hence the resulting call option prices, we have $k$ degrees of freedom represented by the choices $\bar\sigma = (\bar\sigma_1, \dots, \bar\sigma_k)$.
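When the knots form a rectangular grid $\{\bar S_i\} \times \{\bar t_j\}$, a natural bicubic spline can be assembled from one-dimensional natural cubic splines, and $k$ is then the product of the two grid sizes. This grid assumption is ours, since the text allows general knot locations. The sketch below uses SciPy's `CubicSpline` with `bc_type="natural"`, interpolating first in $S$ and then in $t$. The helper name `natural_bicubic` is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def natural_bicubic(S_knots, t_knots, sigma_bar):
    """Return Sigma(S, t) interpolating sigma_bar[i, j] at (S_knots[i], t_knots[j]).

    Tensor product of natural cubic splines (second derivative zero at the
    boundary knots) in S and in t.
    """
    S_knots = np.asarray(S_knots, dtype=float)
    t_knots = np.asarray(t_knots, dtype=float)
    sigma_bar = np.asarray(sigma_bar, dtype=float)
    # One natural spline in S for each time knot.
    splines_in_S = [CubicSpline(S_knots, sigma_bar[:, j], bc_type="natural")
                    for j in range(t_knots.size)]

    def Sigma(S, t):
        # Values at the requested S for every time knot, shape (len(t_knots), ...).
        vals = np.array([spline(S) for spline in splines_in_S])
        # Natural spline in t through those values, evaluated at t.
        return CubicSpline(t_knots, vals, axis=0, bc_type="natural")(t)

    return Sigma


if __name__ == "__main__":
    S_knots = [80.0, 100.0, 120.0, 140.0]
    t_knots = [0.25, 0.5, 1.0]
    sigma_bar = [[0.30, 0.28, 0.26],
                 [0.24, 0.23, 0.22],
                 [0.21, 0.21, 0.21],
                 [0.20, 0.20, 0.21]]
    Sigma = natural_bicubic(S_knots, t_knots, sigma_bar)
    print(Sigma(100.0, 0.5))          # reproduces the knot value 0.23
    print(Sigma(np.array([90.0, 110.0]), 0.75))
```

Here the $4 \times 3$ knot grid gives $k = 12$ degrees of freedom. The whole surface is then a function of these twelve numbers.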

Let $\Sigma(S,t,\bar\sigma)$ denote the bicubic spline local volatility function obtained by setting $\sigma(\bar S_j, \bar t_j) = \bar\sigma_j$, and let $C(\Sigma(S,t,\bar\sigma), K, T)$ denote the resulting call price function. The analog of the objective function (6.13) is then

$$\min_{\bar\sigma \in \mathbb{R}^k} \sum_{j=1}^{n} \left(C(\Sigma(S,t,\bar\sigma), K_j, T_j) - C_j\right)^2. \tag{6.15}$$

One can introduce positive weights $w_j$ for each of the terms in the objective function above to reflect different accuracies of, or confidence in, the call prices $C_j$. We can also introduce lower and upper bounds $l_i$ and $u_i$ for the volatilities at each knot to incorporate additional information that may be available, for example from historical data. This way, we form the following nonlinear least-squares problem with $k$ variables:

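$$\begin{aligned} \min_{\bar\sigma \in \mathbb{R}^k}\; & f(\bar\sigma) := \sum_{j=1}^{n} w_j \left(C(\Sigma(S,t,\bar\sigma), K_j, T_j) - C_j\right)^2 \\ \text{s.t.}\; & l \le \bar\sigma \le u. \end{aligned} \tag{6.16}$$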

Note that the formulation above is not appropriate if there are many more knots than prices, that is, if $k$ is much larger than $n$. In this case the problem is underdetermined, and its solutions may overfit the data. It is better to use fewer knots than available option prices.

The problem (6.16) is a standard nonlinear optimization problem, except that the term $C(\Sigma(S,t,\bar\sigma), K_j, T_j)$ in the objective function depends on the decision variables $\bar\sigma$ in a complicated and non-explicit manner. Most of the nonlinear optimization methods we discussed in the previous chapter require at least the gradient of the objective function, and sometimes its Hessian matrix as well, so this is potentially troublesome. Without an explicit expression for $f$, its gradient must be estimated either with a finite-difference scheme or with automatic differentiation. Coleman et al. implement both alternatives and report that local volatility functions can be estimated very accurately using these strategies. They also test the hedging accuracy of two delta-hedging strategies, one using a constant volatility estimate and the other using the local volatility function produced by the strategy above. These tests indicate that the hedges obtained from the local volatility function are significantly more accurate.
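The sketch below sets up (6.16) with SciPy's `least_squares`. That routine handles the box constraints $l \le \bar\sigma \le u$, and with `jac="2-point"` it estimates derivatives by finite differences, as discussed above.

To stay short and self-contained, the sketch makes a strong simplifying assumption. The hypothetical `model_prices` function stands in for $C(\Sigma(S,t,\bar\sigma), K_j, T_j)$ in two ways:

- it interpolates knot volatilities in the strike only, at a single maturity;
- it plugs the interpolated volatility into the BSM formula (6.8).

This is not the local volatility pricer of [21]. In practice one would replace it with a numerical solver of the generalized BSM equation.

The weights enter through the residuals $\sqrt{w_j}\,(\cdot)$. `least_squares` minimizes half the sum of squared residuals, which has the same minimizer as $f$.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import least_squares
from scipy.stats import norm


def bsm_call(S0, K, T, r, sigma):
    """Vectorized European call price (6.8)."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)


def model_prices(sigma_bar, K_knots, strikes, S0, T, r):
    """STAND-IN for C(Sigma(S, t, sigma_bar), K_j, T_j).

    Interpolates the knot volatilities with a natural cubic spline in the strike
    and prices each option with (6.8) at the interpolated volatility. A real
    implementation would solve the local volatility pricing problem instead.
    """
    vols = CubicSpline(K_knots, sigma_bar, bc_type="natural")(strikes)
    return bsm_call(S0, strikes, T, r, vols)


def calibrate(market, weights, K_knots, strikes, S0, T, r, lower, upper, x0):
    """Solve (6.16): min sum_j w_j (model_j - market_j)^2 s.t. lower <= sigma_bar <= upper."""
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))

    def residuals(sigma_bar):
        return sqrt_w * (model_prices(sigma_bar, K_knots, strikes, S0, T, r) - market)

    # jac="2-point": the Jacobian is estimated by forward finite differences.
    result = least_squares(residuals, x0, jac="2-point", bounds=(lower, upper))
    return result.x


if __name__ == "__main__":
    S0, T, r = 100.0, 0.5, 0.02
    K_knots = np.array([80.0, 95.0, 105.0, 120.0])     # k = 4 knots
    strikes = np.linspace(80.0, 120.0, 9)               # n = 9 options
    true_sigma = np.array([0.28, 0.24, 0.21, 0.20])
    market = model_prices(true_sigma, K_knots, strikes, S0, T, r)  # synthetic prices
    est = calibrate(market, np.ones(strikes.size), K_knots, strikes, S0, T, r,
                    lower=0.05, upper=1.0, x0=np.full(4, 0.25))
    print(np.round(est, 4))
```

The synthetic example uses $k = 4$ knots and $n = 9$ option prices, following the advice to keep $k$ smaller than $n$.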

Exercise 6.4 The partial derivative $\partial f(x)/\partial x_i$ of the function $f(x)$ with respect to the $i$-th coordinate of the vector $x$ can be estimated as

$$\frac{\partial f(x)}{\partial x_i} \approx \frac{f(x + h e_i) - f(x)}{h},$$

where $e_i$ denotes the $i$-th unit vector. Assuming that $f$ is continuously differentiable, provide an upper bound on the estimation error of this finite-difference approximation using a Taylor series expansion of the function $f$ around $x$. Next, compute a similar bound for the alternative finite-difference formula given by

$$\frac{\partial f(x)}{\partial x_i} \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}.$$

Comment on the potential advantages and disadvantages of these two approaches.
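The following Python sketch compares the two formulas numerically on an arbitrarily chosen test function, $f(x) = e^{x_1}\sin x_2$, coded with zero-based indices. The example is ours. Shrinking $h$ by a factor of 10 should reduce the forward-difference error by roughly a factor of 10 and the central-difference error by roughly a factor of 100. For very small $h$, rounding error eventually dominates both.

```python
import numpy as np


def forward_diff(f, x, i, h):
    """Forward-difference estimate of df/dx_i: (f(x + h e_i) - f(x)) / h."""
    e = np.zeros_like(x)
    e[i] = 1.0
    return (f(x + h * e) - f(x)) / h


def central_diff(f, x, i, h):
    """Central-difference estimate of df/dx_i: (f(x + h e_i) - f(x - h e_i)) / (2h)."""
    e = np.zeros_like(x)
    e[i] = 1.0
    return (f(x + h * e) - f(x - h * e)) / (2.0 * h)


def f(x):
    return np.exp(x[0]) * np.sin(x[1])


if __name__ == "__main__":
    x = np.array([0.3, 1.1])
    exact = np.exp(0.3) * np.sin(1.1)   # df/dx at coordinate 0
    for h in (1e-1, 1e-2, 1e-3):
        err_fwd = abs(forward_diff(f, x, 0, h) - exact)
        err_ctr = abs(central_diff(f, x, 0, h) - exact)
        print(f"h={h:.0e}  forward error={err_fwd:.2e}  central error={err_ctr:.2e}")
```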

Quadratic Programming: Theory and Algorithms

7.1 The Quadratic Programming Problem

As we discussed in the introductory chapter, quadratic programming (QP) refers to the problem of minimizing a quadratic function subject to linear equality and inequality constraints. In its standard form, this problem is represented as follows:

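$$\begin{aligned} \min_x\;& \tfrac{1}{2}x^TQx + c^Tx \\ & Ax = b \\ & x \ge 0, \end{aligned} \tag{7.1}$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, and $Q \in \mathbb{R}^{n \times n}$ are given, and $x \in \mathbb{R}^n$. QPs are a special class of nonlinear optimization problems and contain linear programming problems as special cases.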

Quadratic programming structures are encountered frequently in optimization models. For example, ordinary least-squares problems, which are often used in data fitting, are QPs with no constraints. Mean-variance optimization problems developed by Markowitz for the selection of efficient portfolios are QP problems. In addition, QP problems are solved as subproblems in the solution of general nonlinear optimization problems via sequential quadratic programming (SQP) approaches; see Section 5.5.2.
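To make the least-squares example concrete, expand $\|Ax - b\|_2^2 = x^TA^TAx - 2b^TAx + b^Tb$. This is an unconstrained QP with $Q = 2A^TA$ and $c = -2A^Tb$; the constant $b^Tb$ does not affect the minimizer. The short sketch below uses randomly generated data as an assumption. It checks that solving the stationarity condition $Qx + c = 0$ gives the same answer as NumPy's least-squares routine. Here $A$ is a data matrix, not the constraint matrix of (7.1).

```python
import numpy as np


def ols_as_qp(A, b):
    """Write min ||Ax - b||^2 as min 1/2 x^T Q x + c^T x and solve Qx = -c."""
    Q = 2.0 * A.T @ A
    c = -2.0 * A.T @ b
    # Q is positive definite when A has full column rank, so Qx = -c has a unique solution.
    return np.linalg.solve(Q, -c), Q, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 3))
    b = rng.normal(size=20)
    x_qp, Q, c = ols_as_qp(A, b)
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(x_qp, x_ls))  # True
```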

Recall that when $Q$ is a positive semidefinite matrix, i.e., when $y^TQy \ge 0$ for all $y$, the objective function of problem (7.1) is a convex function of $x$. Since the feasible set is a polyhedral set (i.e., a set defined by linear constraints), it is a convex set. Therefore, when $Q$ is positive semidefinite, the QP (7.1) is a convex optimization problem. As such, its local optimal solutions are also global optimal solutions. This property is illustrated in Figure 7.1, where the contours of a quadratic function with a positive semidefinite $Q$ are contrasted with those of an indefinite $Q$.
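Positive semidefiniteness can be checked numerically through eigenvalues. Since $x^TQx$ depends only on the symmetric part $(Q + Q^T)/2$, $Q$ is positive semidefinite exactly when the smallest eigenvalue of that symmetric part is nonnegative. The minimal sketch below allows a small tolerance for rounding, which is our choice. It separates the convex and indefinite cases contrasted in Figure 7.1.

```python
import numpy as np


def is_convex_quadratic(Q, tol=1e-10):
    """True if x^T Q x >= 0 for all x, i.e. the quadratic objective is convex.

    Only the symmetric part (Q + Q^T)/2 matters in x^T Q x, so its smallest
    eigenvalue decides positive semidefiniteness.
    """
    Q = np.asarray(Q, dtype=float)
    sym = 0.5 * (Q + Q.T)
    return bool(np.linalg.eigvalsh(sym).min() >= -tol)


if __name__ == "__main__":
    print(is_convex_quadratic([[2.0, 0.0], [0.0, 1.0]]))   # True: positive definite
    print(is_convex_quadratic([[1.0, 1.0], [1.0, 1.0]]))   # True: semidefinite (eigenvalues 0, 2)
    print(is_convex_quadratic([[1.0, 2.0], [2.0, 1.0]]))   # False: eigenvalues 3 and -1
```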

Exercise 7.1 Consider the quadratic function $f(x) = c^Tx + \tfrac{1}{2}x^TQx$, where the matrix $Q$ is $n$ by $n$ and symmetric.

[Figure 7.1: Contours of positive semidefinite and indefinite quadratic functions. Left panel: contours of a convex function; right panel: contours of a nonconvex function; axes $x_1$ and $x_2$, both ranging from −4 to 4.]

a. Prove that if $x^TQx < 0$ for some $x$, then $f$ is unbounded below.

b. Prove that if $Q$ is positive semidefinite (but not positive definite), then either $f$ is unbounded below or it has an infinite number of solutions.

c. True or false: $f$ has a unique minimizer if and only if $Q$ is positive definite.

As in linear programming, we can develop a dual of quadratic programming problems. The dual of the problem (7.1) is given below:

$$\begin{aligned} \max_{x,y,s}\;& b^Ty - \tfrac{1}{2}x^TQx \\ & A^Ty - Qx + s = c \\ & x,\ s \ge 0. \end{aligned} \tag{7.2}$$

Note that, unlike the case of linear programming, the variables of the primal quadratic programming problem also appear in the dual QP.
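A short calculation shows how the primal and dual objectives are related. Suppose $x$ is feasible for (7.1) and $(x, y, s)$ is feasible for (7.2) with the same $x$. Substituting $c = A^Ty - Qx + s$ and $Ax = b$ into the primal objective gives

$$\begin{aligned} \tfrac{1}{2}x^TQx + c^Tx &= \tfrac{1}{2}x^TQx + y^TAx - x^TQx + s^Tx \\ &= b^Ty - \tfrac{1}{2}x^TQx + x^Ts. \end{aligned}$$

So the primal objective exceeds the dual objective by $x^Ts \ge 0$. The two objectives are equal exactly when $x_is_i = 0$ for all $i$, which is the complementary slackness condition (7.5) below.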

7.2 Optimality Conditions

One of the fundamental tools in the study of optimization problems is the Karush-Kuhn-Tucker theorem, which gives a list of conditions that are necessarily satisfied at any (local) optimal solution of a problem, provided that some mild regularity assumptions are satisfied. These conditions are commonly called KKT conditions and were already discussed in the context of general nonlinear optimization problems in Section 5.5.

Applying the KKT theorem to the QP problem (7.1), we obtain the following set of necessary conditions for optimality:

Theorem 7.1 Suppose that $x$ is a local optimal solution of the QP given in (7.1), so that it satisfies $Ax = b$, $x \ge 0$, and assume that $Q$ is a positive semidefinite matrix. Then there exist vectors $y$ and $s$ such that the following conditions hold:

$$A^Ty - Qx + s = c \tag{7.3}$$

$$s \ge 0 \tag{7.4}$$

$$x_is_i = 0, \quad \forall i. \tag{7.5}$$

Furthermore, x is a global optimal solution.

Note that the positive semidefiniteness condition related to the Hessian of the Lagrangian function in the KKT theorem is automatically satisfied for convex quadratic programming problems, and therefore is not included in Theorem 7.1.

Exercise 7.2 Show that in the case of a positive definite Q, the objective function of (7.1) is strictly convex, and therefore, must have a unique minimizer.

Conversely, if vectors $x$, $y$, and $s$ satisfy conditions (7.3)-(7.5) as well as the primal feasibility conditions

$$Ax = b \tag{7.6}$$

$$x \ge 0, \tag{7.7}$$

then $x$ is a global optimal solution of (7.1). In other words, conditions (7.3)-(7.7) are both necessary and sufficient for $x$, $y$, and $s$ to describe a global optimal solution of the QP problem.
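Conditions (7.3)-(7.7) translate directly into a numerical optimality test. The following Python sketch reports how far a candidate triple $(x, y, s)$ is from satisfying each condition; the function name and output format are our own. For a convex QP, all residuals being zero certifies global optimality. The example uses the small QP $\min \tfrac{1}{2}(x_1^2 + x_2^2) - x_1 - x_2$ subject to $x_1 + x_2 = 1$, $x \ge 0$. Its solution $x = (\tfrac12, \tfrac12)$ comes with $y = -\tfrac12$ and $s = (0, 0)$.

```python
import numpy as np


def kkt_residuals(Q, A, b, c, x, y, s):
    """Violations of conditions (7.3)-(7.7) for the QP (7.1)."""
    Q, A, b, c, x, y, s = (np.asarray(v, dtype=float) for v in (Q, A, b, c, x, y, s))
    return {
        "dual_equation (7.3)": float(np.linalg.norm(A.T @ y - Q @ x + s - c)),
        "dual_sign (7.4)": float(max(0.0, -s.min())),
        "complementarity (7.5)": float(np.abs(x * s).max()),
        "primal_equation (7.6)": float(np.linalg.norm(A @ x - b)),
        "primal_sign (7.7)": float(max(0.0, -x.min())),
    }


if __name__ == "__main__":
    Q = np.eye(2)
    c = np.array([-1.0, -1.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])
    # Optimal point and multipliers: every residual is zero.
    print(kkt_residuals(Q, A, b, c, x=[0.5, 0.5], y=[-0.5], s=[0.0, 0.0]))
    # Feasible but non-optimal x = (1, 0): with this y, s from (7.3) has a negative entry.
    x = np.array([1.0, 0.0])
    y = np.array([-0.5])
    s = c + Q @ x - A.T @ y
    print(kkt_residuals(Q, A, b, c, x, y, s))
```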

In a manner similar to linear programming, optimality conditions (7.3)-(7.7) can be seen as a collection of conditions for

1. primal feasibility: $Ax = b$, $x \ge 0$,

2. dual feasibility: $A^Ty - Qx + s = c$, $s \ge 0$, and

3. complementary slackness: for each $i = 1, \dots, n$ we have $x_is_i = 0$.

Using this interpretation, one can develop modifications of the simplex method that can also solve convex quadratic programming problems (Wolfe's method). We do not present this approach here. Instead, we describe an alternative algorithm that is based on Newton's method; see Section 5.4.2.

Exercise 7.3 Consider the following quadratic program

$$\begin{aligned} \min\;& x_1x_2 + x_1^2 + \tfrac{3}{2}x_2^2 + 2x_3^2 + 2x_1 + x_2 + 3x_3 \\ \text{subject to}\;& x_1 + x_2 + x_3 = 1 \\ & x_1 - x_2 = 0 \\ & x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0. \end{aligned}$$

Is the quadratic objective function convex? Show that $x^* = (\tfrac{1}{2}, \tfrac{1}{2}, 0)$ is an optimal solution to this problem by finding vectors $y$ and $s$ that satisfy the optimality conditions jointly with $x^*$.
