
(1)

Estimating Term Structure with Penalized Splines

David Ruppert

Operations Research and Industrial Engineering, Cornell University

Joint work with Robert Jarrow and Yan Yu

(2)

Outline

• Bond prices, forward rates, yields

• Empirical forward rate – noisy

• Modelling the forward rate

• Penalized least-squares

• Inadequacy of cross-validation

• Residual analysis – checking the noise assumptions

• Corporate term structure and credit spreads

(3)

Discount Function, Forward Rates, and Yields

• D(0, t) = D(t) is the discount function, the value at time 0 (now) of a zero-coupon bond that pays $1 at time t.

\[ \frac{\text{Price}(t)}{\text{PAR}} = D(t) \]

• f(t) is the current forward rate defined by

\[ D(t) = \exp\left\{ -\int_0^t f(s)\,ds \right\} \quad \text{for all } t \]

• The yield is the average forward rate, i.e.,

\[ y(t) = \frac{1}{t}\int_0^t f(s)\,ds = -\frac{1}{t}\log\{D(t)\} \]
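The relations among the forward rate, the discount function, and the yield can be checked numerically. The following is a minimal sketch, assuming an illustrative linear forward curve and a simple trapezoid rule; the function names and the curve are hypothetical and not part of the talk.

```python
import numpy as np

def cum_forward(f, t, n_grid=2001):
    """Trapezoid-rule approximation of the integral of f(s) over [0, t]."""
    s = np.linspace(0.0, t, n_grid)
    v = f(s)
    return float(np.sum((v[1:] + v[:-1]) * np.diff(s)) / 2.0)

def discount(f, t):
    """D(t) = exp{-integral_0^t f(s) ds}."""
    return float(np.exp(-cum_forward(f, t)))

def yield_to(f, t):
    """y(t) = (1/t) * integral_0^t f(s) ds, the average forward rate."""
    return cum_forward(f, t) / t

# Illustrative (assumed) forward curve: 4% rising by 0.2% per year.
f = lambda s: 0.04 + 0.002 * s

d10 = discount(f, 10.0)
y10 = yield_to(f, 10.0)
```

Here the integrated forward rate at 10 years is 0.5, so the yield is 5% and the discount factor is exp(−0.5).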

(4)

Discount Function, Forward Rates, and Yields

[Figure: price versus maturity.]

(5)

Prices, Forward Rates, and Yields

[Figure: log(price) versus maturity.]

(6)

Discount Function, Forward Rates, and Yields

[Figure: −log(price)/t versus maturity.]

(7)

Empirical Forward Rate

\[ D(t) = \exp\left\{ -\int_0^t f(s)\,ds \right\} \quad \text{for all } t \]

\[ f(t) = -\frac{d}{dt}\log\{D(t)\} \]

\[ \text{empirical forward} = -\frac{\log\{P(t_{i+1})\} - \log\{P(t_i)\}}{t_{i+1} - t_i} \]
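The empirical forward rate is a finite-difference approximation to the derivative of −log D(t). A minimal sketch, assuming zero-coupon prices sorted by maturity; the function name and the sample prices are illustrative only.

```python
import numpy as np

def empirical_forward(maturities, prices):
    """Finite-difference forward rate between adjacent maturities."""
    t = np.asarray(maturities, dtype=float)
    log_p = np.log(np.asarray(prices, dtype=float))
    # -(log P(t_{i+1}) - log P(t_i)) / (t_{i+1} - t_i)
    return -np.diff(log_p) / np.diff(t)

# Hypothetical zero-coupon prices generated from a flat 5% forward rate.
t = np.array([0.5, 1.0, 2.0, 5.0])
p = np.exp(-0.05 * t)
fwd = empirical_forward(t, p)
```

With noise-free prices the flat rate is recovered exactly. With real prices, differencing amplifies the noise, which is why the empirical forward rate looks rough.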

(8)

Empirical Forward Rate

[Figure: empirical forward rate versus maturity.]

(9)

Modelling Coupon Bonds

• \(P_1, \ldots, P_n\) denote the observed market prices of \(n\) bonds (coupon or zero-coupon)

• Bond \(i\) has fixed payments \(C_i(t_{i,j})\) due on dates \(t_{i,j}\), \(j = 1, \ldots, N_i\) (\(N_i = 1\) for zero-coupon bonds)

• Model price for the ith coupon bond:

\[ \hat P_i(\delta) = \sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{ -\int_0^{t_{i,j}} f(s,\delta)\,ds \right\} \]
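The model price is a sum of payments, each discounted by the integrated forward rate up to its payment date. A minimal sketch, assuming the integrated forward rate is supplied as a callable; the bond, the flat curve, and the function names are hypothetical.

```python
import numpy as np

def model_price(cash_flows, pay_times, cum_forward):
    """Sum of C_i(t_ij) * exp(-F(t_ij)), where F(t) is the integrated forward rate."""
    c = np.asarray(cash_flows, dtype=float)
    t = np.asarray(pay_times, dtype=float)
    return float(np.sum(c * np.exp(-cum_forward(t))))

# Illustrative flat forward curve f(s) = 0.05, so F(t) = 0.05 * t.
flat = lambda t: 0.05 * t

# Hypothetical 2-year bond: annual coupon of 6 on a par value of 100.
coupon_bond = model_price([6.0, 106.0], [1.0, 2.0], flat)

# Zero-coupon case (N_i = 1): a single payment of 1 at t = 3.
zero = model_price([1.0], [3.0], flat)
```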

(10)

Spline Model of Forward Rate

• \(f(s,\delta) = \delta^{T} B(s)\)

– B(s) is a vector of spline basis functions

– δ is a vector of spline coefficients

• \(\therefore\; F(t,\delta) := \int_0^t f(s,\delta)\,ds = t\,y(t,\delta) = \delta^{T} B^{I}(t)\), where \(B^{I}(t) := \int_0^t B(s)\,ds\)

(11)

Example: Quadratic Splines

\[ B(s) = \left( 1,\; s,\; s^2,\; (s-\kappa_1)_+^2,\; \ldots,\; (s-\kappa_K)_+^2 \right)^{T} \]


(12)

Linear Spline – 2 Knots

[Figure: log ratio versus range.]

(13)

Linear Spline – 4 Knots

[Figure: log ratio versus range.]

(14)

Linear Spline – 24 Knots

[Figure: log ratio versus range.]

(15)

Lidar Data — Carefully Chosen Knots

[Figure: log ratio versus range.]

(16)

Modelling the Forward Rate

From before:

\[ B(t) = \left( 1,\; t,\; \cdots,\; t^{p},\; (t-\kappa_1)_+^{p},\; \cdots,\; (t-\kappa_K)_+^{p} \right)^{T} \]

Therefore:

\[ B^{I}(t) := \int_0^t B(s)\,ds = \left( t,\; \cdots,\; \frac{t^{p+1}}{p+1},\; \frac{(t-\kappa_1)_+^{p+1}}{p+1},\; \cdots,\; \frac{(t-\kappa_K)_+^{p+1}}{p+1} \right)^{T} \]
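Both bases are easy to evaluate directly. The sketch below is one possible implementation, assuming degree p ≥ 1 and non-negative knots (so that each truncated term integrates from 0 as shown); the function names are hypothetical.

```python
import numpy as np

def basis(t, knots, p):
    """Truncated power basis B(t): 1, t, ..., t^p, (t-k_1)_+^p, ..., (t-k_K)_+^p."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    poly = t[:, None] ** np.arange(p + 1)
    trunc = np.maximum(t[:, None] - np.asarray(knots, dtype=float), 0.0) ** p
    return np.hstack([poly, trunc])

def integrated_basis(t, knots, p):
    """B^I(t): the integral of B(s) from 0 to t, taken term by term."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    powers = np.arange(1, p + 2)
    poly = t[:, None] ** powers / powers
    trunc = np.maximum(t[:, None] - np.asarray(knots, dtype=float), 0.0) ** (p + 1) / (p + 1)
    return np.hstack([poly, trunc])

# One row per maturity, one column per basis function (quadratic spline, 2 knots).
B = basis([0.5, 1.5, 2.5], knots=[1.0, 2.0], p=2)
BI = integrated_basis([0.5, 1.5, 2.5], knots=[1.0, 2.0], p=2)
```

Given coefficients δ, the forward rate at each maturity is `B @ delta`, and the integrated forward rate F(t, δ) is `BI @ delta`.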

(17)

Penalized Least-Squares

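\[ Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^{n} \left[ h(P_i) - h\{\hat P_i(\delta)\} \right]^2 + \lambda\,\delta^{T} G\,\delta \]

or equivalently

\[ Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^{n} \left( h(P_i) - h\left[ \sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{ -\delta^{T} B^{I}(t_{i,j}) \right\} \right] \right)^2 + \lambda\,\delta^{T} G\,\delta. \]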

• h is a monotonic transformation: “transform-both-sides” model

• \(\lambda\,\delta^{T} G\,\delta\) is a “roughness” penalty

– λ ≥ 0

(18)

Penalized Least-Squares

From previous slide:

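\[ Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^{n} \left( h(P_i) - h\left[ \sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{ -\delta^{T} B^{I}(t_{i,j}) \right\} \right] \right)^2 + \lambda\,\delta^{T} G\,\delta. \]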

Several sensible choices for G

1. G is a diagonal matrix

• last K diagonal elements equal to one

• all others zero.

• penalizes jumps at the knots in the pth derivative of the spline.

2. quadratic penalty on the dth derivative: \(\int \{f^{(d)}(s)\}^2\,ds\)
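The first choice of G is simple to construct. A minimal sketch, assuming the coefficient vector is ordered as in the truncated power basis (p + 1 polynomial coefficients followed by K knot coefficients); the function names and coefficient values are illustrative.

```python
import numpy as np

def penalty_matrix(p, n_knots):
    """Diagonal G: zeros for the p+1 polynomial terms, ones for the K knot terms."""
    return np.diag(np.r_[np.zeros(p + 1), np.ones(n_knots)])

def roughness_penalty(delta, G, lam):
    """lambda * delta^T G delta."""
    delta = np.asarray(delta, dtype=float)
    return lam * float(delta @ G @ delta)

G = penalty_matrix(p=2, n_knots=3)

# Illustrative coefficients: only the last three (knot) entries are penalized.
delta = np.array([0.05, 0.01, -0.001, 0.2, -0.1, 0.3])
pen = roughness_penalty(delta, G, lam=0.5)
```

With this G the penalty is λ times the sum of squared knot coefficients, here 0.5 × (0.04 + 0.01 + 0.09) = 0.07, so the polynomial part of the fit is left unpenalized.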

(19)

Linear Spline with 24 Knots Fit by Penalized Least Squares

[Figure: log ratio versus range.]

(20)

Using Zero Coupon Bonds

• Now assume we are using zeros, e.g., STRIPS

• Bond \(i\) has a single payment of $1 at time \(t_i\)

• Therefore,

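\[ Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^{n} \left( h(P_i) - h\left[ \exp\left\{ -\delta^{T} B^{I}(t_i) \right\} \right] \right)^2 + \lambda\,\delta^{T} G\,\delta \]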
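When h is the log transformation, the zero-coupon criterion becomes a penalized linear least-squares problem in δ, because −log P_i is modelled by δᵀBᴵ(t_i). The sketch below assumes h = log, the diagonal G, and synthetic noise-free prices; the function names, knots, and forward curve are illustrative, not the authors' implementation.

```python
import numpy as np

def integrated_basis(t, knots, p):
    """B^I(t) for the truncated power basis of degree p (knots assumed >= 0)."""
    t = np.asarray(t, dtype=float)[:, None]
    powers = np.arange(1, p + 2)
    trunc = np.maximum(t - np.asarray(knots, dtype=float), 0.0) ** (p + 1) / (p + 1)
    return np.hstack([t ** powers / powers, trunc])

def fit_zeros_log(maturities, prices, knots, p, lam):
    """Minimise (1/n) sum (log P_i + delta^T B^I(t_i))^2 + lam * delta^T G delta."""
    X = integrated_basis(maturities, knots, p)
    y = -np.log(np.asarray(prices, dtype=float))     # -log P_i = delta^T B^I(t_i) + noise
    n = y.size
    g = np.r_[np.zeros(p + 1), np.ones(len(knots))]  # diagonal of G
    # Stack the penalty as extra rows so one least-squares solve handles both terms.
    A = np.vstack([X / np.sqrt(n), np.sqrt(lam) * np.diag(np.sqrt(g))])
    b = np.r_[y / np.sqrt(n), np.zeros(g.size)]
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Synthetic STRIPS-like prices from an assumed forward curve f(s) = 0.04 + 0.002 s.
t = np.linspace(0.25, 10.0, 40)
prices = np.exp(-(0.04 * t + 0.001 * t ** 2))
delta_hat = fit_zeros_log(t, prices, knots=[2.5, 5.0, 7.5], p=2, lam=1e-3)
```

Because the assumed curve lies in the unpenalized polynomial part of the basis, the fit recovers δ = (0.04, 0.002, 0, …, 0) for any λ. For other choices of h, or for coupon bonds, the criterion is nonlinear in δ and needs an iterative optimizer.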

(21)

Choosing the Knots

• \(\kappa_k\) is the \(\frac{k}{K+1}\)th sample quantile of \(\{t_i\}_{i=1}^{n}\)
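Quantile-based knots can be computed in one call. A minimal sketch; note that `np.quantile` interpolates linearly between order statistics by default, which is only one of several sample-quantile conventions, and the function name and maturities are illustrative.

```python
import numpy as np

def quantile_knots(maturities, n_knots):
    """kappa_k = the k/(K+1) sample quantile of the maturities, k = 1, ..., K."""
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    return np.quantile(np.asarray(maturities, dtype=float), probs)

# Hypothetical maturities, evenly spaced over 30 years.
t = np.linspace(0.0, 30.0, 121)
knots = quantile_knots(t, 3)
```

Placing knots at quantiles puts more knots where maturities are dense, so every knot interval contains roughly the same number of bonds.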

(22)

Effective Number of Parameters of a Fit

There exists a matrix \(S(\lambda)\) such that

\[ \begin{pmatrix} \hat P_1 \\ \vdots \\ \hat P_n \end{pmatrix} \approx S(\lambda) \begin{pmatrix} P_1 \\ \vdots \\ P_n \end{pmatrix} \]

• S(λ) is called the smoother matrix or hat matrix

• DF(λ) := trace{S(λ)} is called the degrees of freedom of the fit
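For a penalized fit that is linear in the response, the smoother matrix has a closed form and DF(λ) is its trace. The sketch below assumes such a linear setting, with a criterion of the form (1/n)‖y − Xδ‖² + λδᵀGδ, and uses a small linear-spline design purely for illustration; in the general bond-pricing model S(λ) is only an approximation.

```python
import numpy as np

def smoother_matrix(X, G, lam):
    """S(lam) = X (X^T X + n*lam*G)^{-1} X^T for the (1/n)-scaled criterion."""
    n = X.shape[0]
    return X @ np.linalg.solve(X.T @ X + n * lam * G, X.T)

def degrees_of_freedom(X, G, lam):
    """DF(lam) = trace{S(lam)}."""
    return float(np.trace(smoother_matrix(X, G, lam)))

# Illustrative linear-spline design on [0, 1] with 5 interior knots.
x = np.linspace(0.0, 1.0, 50)
knots = np.linspace(0.0, 1.0, 7)[1:-1]
X = np.column_stack([np.ones_like(x), x, np.maximum(x[:, None] - knots, 0.0)])
G = np.diag(np.r_[0.0, 0.0, np.ones(len(knots))])

df_small = degrees_of_freedom(X, G, 1e-8)   # almost unpenalized
df_large = degrees_of_freedom(X, G, 1e4)    # heavily penalized
```

As λ → 0, DF approaches the number of basis functions (7 here); as λ → ∞, it approaches the number of unpenalized coefficients (2 here).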

(23)

Generalized Cross-Validation

\[ \mathrm{GCV}(\lambda) = \frac{ n^{-1} \sum_{i=1}^{n} \left[ h(P_i) - h\{\hat P_i(\delta)\} \right]^2 }{ \left\{ 1 - n^{-1}\,\theta\,\mathrm{DF}(\lambda) \right\}^2 } \]

• one chooses λ to minimize GCV(λ)

• θ is a user-specified tuning parameter

• θ = 1 is ordinary GCV

• Fisher, Nychka, and Zervos used θ = 2

– this causes more smoothing
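The GCV criterion needs only the transformed residuals and DF(λ). A minimal sketch, assuming both have already been computed for a given λ; the function name and numbers are illustrative.

```python
import numpy as np

def gcv(residuals, df, theta=1.0):
    """Mean squared residual divided by {1 - theta * DF / n}^2."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    return float(np.mean(r ** 2) / (1.0 - theta * df / n) ** 2)

# Hypothetical transformed residuals h(P_i) - h(P_i_hat) from a fit with DF = 2.
res = [0.1, -0.1, 0.2, -0.2]
gcv_ordinary = gcv(res, df=2.0, theta=1.0)
gcv_inflated = gcv(res, df=2.0, theta=1.5)
```

Raising θ inflates the cost of each degree of freedom, so the minimizing λ shifts toward smoother fits.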

(24)

EBBS

• To estimate MSE add together:

– estimated squared bias

– estimated variance

• Gives \(\mathrm{MSE}(\hat f; t, \lambda)\), the estimated MSE of \(\hat f\) at \(t\) and \(\lambda\).

– then \(\sum_{i=1}^{n} \mathrm{MSE}(\hat f; t_i, \lambda)\) is minimized over \(\lambda\)

• EBBS estimates bias at any fixed t by

– computing the fit at t for a range of values of the smoothing parameter

(25)

EBBS – Estimating Bias

• to the first order, the bias is γ(t)λ for some γ(t)

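• Let \(\hat f(t,\lambda)\) be \(\hat f\) as a function of maturity \(t\) and \(\lambda\)

• Compute \(\{\lambda_\ell, \hat f(t,\lambda_\ell)\}\), \(\ell = 1, \ldots, L\)

– \(\lambda_1 < \cdots < \lambda_L\) is the grid of values of \(\lambda\)

– we used \(L = 50\) values of \(\lambda\)

– \(\log_{10}(\lambda_\ell)\) were equally spaced between −7 and 1

– DF(10) = 4.8 and DF(\(10^{-7}\)) = 28.9 for a 40-knot cubic spline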

(26)

EBBS – Estimating Bias

• For any fixed t, fit a straight line to the data

\(\{(\lambda_i, \hat f(t,\lambda_i)) : i = 1, \ldots, L\}\)

• slope of the line is \(\hat\gamma(t)\)
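The bias step amounts to a straight-line regression of the fitted value on λ at each maturity. The sketch below assumes the fits over the λ grid are already available and, for simplicity, uses the whole grid in the regression; the function name and the synthetic fits are hypothetical.

```python
import numpy as np

def ebbs_bias(lambdas, fits_at_t, lam):
    """Estimated bias at one maturity: slope of f_hat(t, lambda) in lambda, times lam."""
    x = np.asarray(lambdas, dtype=float)
    y = np.asarray(fits_at_t, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)   # least-squares straight line
    return slope * lam, slope

# Grid as described above: L = 50 values, log10(lambda) equally spaced on [-7, 1].
lambdas = np.logspace(-7, 1, 50)

# Synthetic fits that are exactly linear in lambda: f0 + gamma(t) * lambda.
fits = 0.05 + 0.003 * lambdas
bias, gamma_hat = ebbs_bias(lambdas, fits, lam=2.0)
```

On this synthetic input the slope 0.003 is recovered, giving an estimated bias of 0.006 at λ = 2. Squaring the bias and adding the estimated variance gives the estimated MSE at that maturity.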

(27)

EBBS versus GCV

[Figure: forward rate versus time to maturity; curves for EBBS, GCV θ=3, and GCV θ=1.]

(28)

EBBS Fit: Residual Analysis

[Figure, four panels: residual (log transform) versus maturity; sample autocorrelation function (ACF) versus lag; normal probability plot of the residuals; a panel with log(fitted value) on the horizontal axis.]

(29)

Geometry of Transformations

[Figure: log(Y) versus Y.]

(30)

Strength of a Transformation

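• Suppose \(y_1 < y_2\)

• strength of a transformation h:

\[ \text{strength} = \frac{h'(y_2)}{h'(y_1)} - 1 \]

• Example:

\[ h(y;\alpha) = \begin{cases} \dfrac{y^{\alpha} - 1}{\alpha} & \text{if } \alpha \neq 0 \\[1ex] \log(y) & \text{if } \alpha = 0 \end{cases} \]

• For this family:

\[ \text{strength} := \left( \frac{y_2}{y_1} \right)^{\alpha - 1} - 1 \]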
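For the power family, h′(y) = y^(α−1), so the strength depends only on the ratio y₂/y₁ and on α. A minimal sketch; the function name is hypothetical.

```python
def power_family_strength(y1, y2, alpha):
    """Strength h'(y2)/h'(y1) - 1 for h(y; alpha) = (y**alpha - 1)/alpha (log if alpha = 0)."""
    if not 0 < y1 < y2:
        raise ValueError("requires 0 < y1 < y2")
    return (y2 / y1) ** (alpha - 1.0) - 1.0

identity = power_family_strength(1.0, 2.0, alpha=1.0)    # no transformation
log_case = power_family_strength(1.0, 2.0, alpha=0.0)    # log transformation
square = power_family_strength(1.0, 2.0, alpha=2.0)      # convex transformation
```

The strength is zero for α = 1, negative for α < 1 (concave transformations compress large values), and positive for α > 1.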

(31)

Strength of a Transformation

[Figure: strength versus alpha.]

(32)

Transformation and Weighting

• log is the linearizing transformation – convenient

– induces some heteroscedasticity, but not enough to cause a problem

• log{P(t)}/t = − yield

– causes severe heteroscedasticity; avoid

Transformation and weighting should be done primarily to induce the assumed noise distribution, which is:

• normal

(33)

Transformation and Weighting

[Figure: absolute residual versus fitted value.]

(34)

Transformation and Weighting

[Figure: absolute residual plot.]

(35)

Transformation and Weighting

[Figure: absolute residual versus fitted value.]

(36)

Transformation and Weighting

[Figure: absolute residual versus fitted value.]

(37)

Modelling the Correlation

• open problem

• probably not stationary

(38)

Modelling Corporate Term Structure

\[ f_C(t) = f_{Tr}(t) + \alpha_0 + \alpha_1 t + \alpha_2 t^2 \]

• \(\alpha_0 + \alpha_1 t + \alpha_2 t^2\) is the credit spread
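Since the spread is a polynomial, its integral is available in closed form and adds directly to the integrated Treasury forward rate. A minimal sketch, assuming the integrated Treasury forward rate is supplied as a callable; the flat Treasury curve, the spread coefficients, and the function names are illustrative.

```python
import numpy as np

def corporate_cum_forward(t, treasury_cum_forward, alpha):
    """Integral of f_C from 0 to t: Treasury part plus the integrated quadratic spread."""
    a0, a1, a2 = alpha
    t = np.asarray(t, dtype=float)
    spread_integral = a0 * t + a1 * t ** 2 / 2.0 + a2 * t ** 3 / 3.0
    return treasury_cum_forward(t) + spread_integral

def corporate_zero_price(t, treasury_cum_forward, alpha):
    """Price of a corporate zero paying 1 at time t under the model."""
    return np.exp(-corporate_cum_forward(t, treasury_cum_forward, alpha))

# Assumed flat 5% Treasury forward curve and a constant 1% credit spread.
treasury = lambda t: 0.05 * t
price_2y = float(corporate_zero_price(2.0, treasury, alpha=(0.01, 0.0, 0.0)))
```

Only the three spread coefficients need to be estimated from corporate bond prices, with the Treasury curve taken from the much larger Treasury data set.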

(39)

Modelling Corporate Term Structure

[Figure: AT&T, Apr 94 − Dec 95; horizontal axis: years to maturity.]

(40)

Modelling Corporate Term Structure

(41)

Asymptotics

The PLS estimator is the solution to

\[ \sum_{i=1}^{n} \psi_i(\delta, \lambda, G) = 0 \]

(42)

Asymptotics: λ → 0

Theorem 1

• let \(\{\hat\delta_{n,\lambda_n}\}\) be a sequence of penalized least-squares estimators

• assume typical “regularity” assumptions

• suppose \(\lambda_n\) is \(o(1)\)

• then \(\hat\delta_{n,\lambda_n}\) is (strongly) consistent for \(\delta_0\)

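• if \(\lambda_n\) is \(o(n^{-1/2})\), then

\[ \sqrt{n}\left( \hat\delta_{n,\lambda_n} - \delta_0 \right) \xrightarrow{D} \cdots, \]

where

\[ \Omega(\delta_0) := \lim \Sigma_n, \qquad \Sigma_n = \sigma^{-2} n^{-1} \sum^{n} \cdots \]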

(43)

Asymptotics: λ fixed

• assume \(\lambda_n \equiv \lambda\)

• the bias does not shrink to 0

– the limit of \(\hat\delta_{n,\lambda}\) solves

\[ \lim_{n\to\infty} E\left\{ n^{-1} \sum_{i=1}^{n} \psi_i(\delta, \lambda, G) \right\} = 0 \]

• the large sample variance formula is

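\[ \widehat{\mathrm{Var}}\{\hat\delta(\lambda)\} = \frac{\sigma^2}{n} \left[ \{\Sigma_n + \lambda G\}^{-1}\, \Sigma_n\, \{\Sigma_n + \lambda G\}^{-1} \right] \]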
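The variance formula has the familiar sandwich form and is straightforward to evaluate once Σₙ, G, and an estimate of σ² are available. A minimal sketch with made-up inputs; the function name and all numerical values are hypothetical.

```python
import numpy as np

def sandwich_variance(Sigma_n, G, lam, sigma2, n):
    """(sigma^2 / n) * (Sigma_n + lam G)^{-1} Sigma_n (Sigma_n + lam G)^{-1}."""
    A_inv = np.linalg.inv(Sigma_n + lam * G)
    return (sigma2 / n) * A_inv @ Sigma_n @ A_inv

# Made-up 2x2 example: the second coefficient is the penalized one.
Sigma_n = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
G = np.diag([0.0, 1.0])

V0 = sandwich_variance(Sigma_n, G, lam=0.0, sigma2=1.0, n=100)
V1 = sandwich_variance(Sigma_n, G, lam=1.0, sigma2=1.0, n=100)
```

With λ = 0 the formula reduces to (σ²/n)Σₙ⁻¹. With λ > 0 the variance of the penalized coefficient shrinks, at the cost of the bias noted above.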

(44)

Summary

• splines are convenient for estimating term structure

• penalization is better, or at least easier, than knot selection

• EBBS provides a reasonable amount of smoothing

• GCV undersmooths because

– noise is correlated

– target function is a derivative

• corporate term structure can be estimated by “borrowing strength” from treasury bonds

• a constant credit spread fits the data reasonably well

(45)

References

Jarrow, R., Ruppert, D., and Yu, Y. (2004) Estimating the interest rate term structure of corporate debt with a semiparametric penalized spline model, JASA, to appear.

Available at:

http://www.orie.cornell.edu/~davidr

• see “Recent Papers”

(46)

References

Ruppert, D. (2004) Statistics and Finance: An Introduction, Springer, New York – splines, term structure, transformations

Ruppert, D., Wand, M.P., and Carroll, R.J. (2003) Semiparametric Regression, Cambridge University Press, New York – splines