Computing a Nearest Correlation Matrix with Factor Structure

Nick Higham, School of Mathematics, The University of Manchester

http://www.ma.man.ac.uk/~higham/

Joint work with Rüdiger Borsdorf and Marcos Raydan

NAG Quant Event, London, October 2009

Outline

◮ Properties of structured correlation matrices.
◮ Nearness problem for factor-structured correlation matrices.
◮ Selection of optimization method.
◮ Numerical analysis issues.

Correlation Matrix

An $n \times n$ symmetric positive semidefinite matrix $A$ with $a_{ii} \equiv 1$, i.e.,

◮ symmetric,
◮ 1s on the diagonal,
◮ eigenvalues nonnegative (equivalently, all principal minors nonnegative).

Properties:

◮ off-diagonal elements lie between $-1$ and $1$,
◮ the set of correlation matrices is convex.
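The definition can be checked numerically. The sketch below assumes NumPy. It tests symmetry and the unit diagonal directly, and tests positive semidefiniteness through the smallest eigenvalue. The function name `is_correlation_matrix` and the tolerance `tol` are our own choices. A small negative tolerance is needed in floating point, because a singular correlation matrix can have a computed eigenvalue of order $-10^{-16}$.

```python
import numpy as np

def is_correlation_matrix(A, tol=1e-12):
    """True if A is symmetric, has unit diagonal and no eigenvalue below -tol."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.allclose(A, A.T) or not np.allclose(np.diag(A), 1.0):
        return False
    # eigvalsh exploits symmetry and returns real eigenvalues in ascending order
    return bool(np.linalg.eigvalsh(A)[0] >= -tol)

quiz = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.0, 1.0, 1.0]])
print(np.round(np.linalg.eigvalsh(quiz), 4))   # spectrum of this test matrix
print(is_correlation_matrix(quiz))
```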

Quiz

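Is this a correlation matrix?

$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$

Spectrum: $-0.4142$, $1.0000$, $2.4142$. No: the matrix has a negative eigenvalue, $1 - \sqrt{2} \approx -0.4142$.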

Quiz

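For what $w$ is this a correlation matrix?

$$\begin{bmatrix} 1 & w & w \\ w & 1 & w \\ w & w & 1 \end{bmatrix}$$

For the $n \times n$ analogue: $-\frac{1}{n-1} \le w \le 1$ (here $n = 3$, so $-\frac{1}{2} \le w \le 1$).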
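The bounds come from the spectrum. The $n \times n$ matrix $(1-w)I + wee^T$ has eigenvalue $1 + (n-1)w$, with eigenvector $e$, and eigenvalue $1 - w$ with multiplicity $n - 1$. Both are nonnegative exactly when $-1/(n-1) \le w \le 1$. The following NumPy sketch checks this numerically. The helper name is ours.

```python
import numpy as np

def one_parameter_matrix(n, w):
    """n x n matrix with unit diagonal and every off-diagonal entry equal to w."""
    return (1.0 - w) * np.eye(n) + w * np.ones((n, n))

n = 3
lo = -1.0 / (n - 1)
for w in [lo - 0.01, lo, 0.5, 1.0, 1.01]:
    lam_min = np.linalg.eigvalsh(one_parameter_matrix(n, w))[0]
    print(f"w = {w:+.3f}: smallest eigenvalue {lam_min:+.4f}")
```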

Structured Correlation Matrices

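Nonnegative:
$$\begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & 1 & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & 1 \end{bmatrix}.$$

Low rank:
$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

Factor structure:
$$\begin{bmatrix} 1 & x_1x_2 & x_1x_3 \\ x_1x_2 & 1 & x_2x_3 \\ x_1x_3 & x_2x_3 & 1 \end{bmatrix}.$$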

Approximate Correlation Matrices

Empirical correlation matrices are often not true correlation matrices, due to

◮ asynchronous data,
◮ missing data,
◮ limited precision,
◮ stress testing.

http://www.movielens.org

Nearest Correlation Matrix

Find $X$ achieving

$$\min\{\, \|A - X\|_F : X \text{ is a correlation matrix} \,\},$$

where $\|A\|_F^2 = \sum_{i,j} a_{ij}^2$.

⋆ The constraint set is closed and convex, so the minimizer is unique.

Optimization Problems—Issues

◮ Existence and uniqueness of solution.
◮ Convexity?
◮ Explicit, closed-form solution?
◮ Choice of algorithm.
◮ Availability of derivatives.
◮ Starting matrix and convergence criterion.
◮ Practical behaviour.

Quick and “Dirty” Differentiation

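Analytic function $f: \mathbb{R} \to \mathbb{R}$, $i = \sqrt{-1}$. Complex step approximation:

$$f'(x) \approx \frac{\operatorname{Im} f(x + ih)}{h}, \qquad \text{e.g., } h = 10^{-100}.$$

$$f'(x) = \frac{\operatorname{Im} f(x + ih)}{h} + O(h^2), \qquad f(x) = \operatorname{Re} f(x + ih) + O(h^2).$$

No difference of function values is formed, so there is no cancellation and $h$ can be taken tiny. See MIMS EPrint 2009.31, April 2009.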
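The following is a minimal sketch in plain Python using `cmath`. It assumes that the function is written with operations that accept complex arguments. The test function $e^x \sin x$ is our own choice. Its exact derivative, $e^x(\sin x + \cos x)$, lets us measure the error. A forward difference is also computed for contrast.

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-100):
    """Approximate f'(x) by Im f(x + ih) / h.

    f must be analytic and accept complex input (real-valued for real x).
    No two nearby function values are subtracted, so there is no
    cancellation error and h can be taken tiny.
    """
    return f(complex(x, h)).imag / h

def f(z):
    # Test function e^z sin z, written with cmath so it accepts complex z.
    return cmath.exp(z) * cmath.sin(z)

x = 1.5
exact = math.exp(x) * (math.sin(x) + math.cos(x))
cs = complex_step_derivative(f, x)
fd = (f(x + 1e-8).real - f(x).real) / 1e-8   # forward difference, for contrast
print(f"complex step error:       {abs(cs - exact):.1e}")
print(f"forward difference error: {abs(fd - exact):.1e}")
```

The complex step result typically agrees with the exact derivative to about machine precision. The forward difference loses roughly half the digits to cancellation.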

Alternating Projections Method

Higham (2002): repeatedly project onto the positive semidefinite matrices, then onto the unit diagonal matrices.

[Figure: iterates alternating between the sets S₁ and S₂.]

◮ Easy to implement.
◮ Guaranteed convergence, at a linear rate.
◮ Can add further constraints/projections.
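Here is a NumPy sketch of the method for a dense symmetric $A$. One assumption should be flagged. Plain alternating projections converges to some matrix in the intersection, but not necessarily the nearest one. This sketch therefore applies Dykstra's correction to the positive semidefinite step, as in Higham (2002). The stopping test and the iteration cap are illustrative choices.

```python
import numpy as np

def proj_psd(B):
    """Nearest symmetric positive semidefinite matrix in the Frobenius norm."""
    B = (B + B.T) / 2
    lam, Q = np.linalg.eigh(B)
    return (Q * np.maximum(lam, 0.0)) @ Q.T      # Q diag(max(lam, 0)) Q^T

def proj_unit_diag(B):
    """Nearest matrix with unit diagonal: reset the diagonal to 1."""
    C = B.copy()
    np.fill_diagonal(C, 1.0)
    return C

def nearcorr_ap(A, tol=1e-10, max_iter=10000):
    """Alternating projections with Dykstra's correction on the PSD step."""
    Y = np.array(A, dtype=float)
    dS = np.zeros_like(Y)                          # Dykstra correction
    for _ in range(max_iter):
        R = Y - dS
        X = proj_psd(R)
        dS = X - R
        Y_new = proj_unit_diag(X)
        if np.linalg.norm(Y_new - Y, "fro") <= tol * np.linalg.norm(Y_new, "fro"):
            return Y_new
        Y = Y_new
    return Y

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
X = nearcorr_ap(A)
print(np.round(X, 4))
```

For this $A$, the limit has off-diagonal entries of about $0.7607$ (in positions $(1,2)$ and $(2,3)$) and $0.1573$ (in position $(1,3)$).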

Newton Method

Qi & Sun (2006): Newton method based on the theory of strongly semismooth matrix functions.

Applies Newton to the dual (unconstrained) of the $\min \frac{1}{2}\|A - X\|_F^2$ problem.

Globally and quadratically convergent.

Higham & Borsdorf (2007) improve efficiency and reliability by

◮ using an appropriate iterative method and eigensolver,
◮ preconditioning the Newton direction solve.

This is the basis of G02AAF (nearest correlation matrix) in NAG Library Mark 22.

Factor Model

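$$\xi = \underbrace{X}_{n\times k}\,\underbrace{\eta}_{k\times 1} + \underbrace{\operatorname{diag}(f_{ii})}_{n\times n}\,\underbrace{\varepsilon}_{n\times 1}, \qquad \eta, \varepsilon \in N(0, 1),$$

with $\eta$ and $\varepsilon$ independent and $F = \operatorname{diag}(f_{ii})$. Since $E(\xi) = 0$, $\operatorname{cov}(\xi) = E(\xi\xi^T) = XX^T + F^2$. Assume $\operatorname{var}(\xi_i) \equiv 1$. Then $\sum_{j=1}^k x_{ij}^2 + f_{ii}^2 = 1$, so

$$\sum_{j=1}^k x_{ij}^2 \le 1, \qquad i = 1\colon n.$$

Applications: collateralized debt obligations (CDOs), multivariate time series.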
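Here is a small Monte Carlo check of $\operatorname{cov}(\xi) = XX^T + F^2$, assuming NumPy. The sizes, the seed and the way $X$ is generated are arbitrary illustrative choices. Each $f_{ii}$ is set to $\sqrt{1 - \sum_j x_{ij}^2}$ so that $\operatorname{var}(\xi_i) = 1$. This is exactly where the constraint $\sum_j x_{ij}^2 \le 1$ comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 2, 200_000           # variables, factors, samples (illustrative)

# Loadings: random, then rows scaled so that every row norm is at most 0.9.
X = rng.uniform(-1.0, 1.0, size=(n, k))
X *= 0.9 / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 0.9)
f_diag = np.sqrt(1.0 - np.sum(X**2, axis=1))   # f_ii chosen so var(xi_i) = 1

eta = rng.standard_normal((k, m))              # common factors
eps = rng.standard_normal((n, m))              # idiosyncratic noise, independent of eta
xi = X @ eta + f_diag[:, None] * eps           # m samples of xi, one per column

sample_cov = xi @ xi.T / m                     # E(xi) = 0, so no centring needed
model_cov = X @ X.T + np.diag(f_diag**2)       # XX^T + F^2
print(np.max(np.abs(sample_cov - model_cov)))
```

The sampling error shrinks like $m^{-1/2}$.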

Structured Correlation Matrix

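This yields a correlation matrix of the form

$$C(X) = D + XX^T = D + \sum_{j=1}^k x_j x_j^T, \qquad D = \operatorname{diag}(I - XX^T), \qquad X = [x_1, \dots, x_k].$$

$C(X)$ has $k$-factor correlation matrix structure:

$$C(X) = \begin{bmatrix} 1 & y_1^T y_2 & \cdots & y_1^T y_n \\ y_1^T y_2 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & y_{n-1}^T y_n \\ y_1^T y_n & \cdots & y_{n-1}^T y_n & 1 \end{bmatrix}, \qquad y_i \in \mathbb{R}^k,$$

where $y_i^T$ is the $i$th row of $X$.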

Aims

For $k$-factor correlation matrices, investigate mathematical properties.

1-Parameter Correlation Matrix

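$$X(w) = \begin{bmatrix} 1 & w & w \\ w & 1 & w \\ w & w & 1 \end{bmatrix}, \qquad w \in \mathbb{R}.$$

Theorem. $\min\{\, \|A - X(w)\|_F : X(w) \text{ a correlation matrix} \,\}$ has a unique solution: the projection of

$$w = \frac{e^T A e - \operatorname{trace}(A)}{n^2 - n}, \qquad e = [1, 1, \dots, 1]^T,$$

onto $[-1/(n-1), 1]$.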
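The theorem is easy to apply. The objective is $\sum_{i \ne j}(a_{ij} - w)^2$ plus a constant. This is a convex quadratic in $w$, minimized by the mean of the off-diagonal entries. For a convex quadratic in one variable, the minimizer over an interval is the projection of the unconstrained minimizer onto that interval. A sketch assuming NumPy follows (the function name is ours):

```python
import numpy as np

def nearest_one_parameter(A):
    """Nearest X(w) (unit diagonal, all off-diagonal entries w) to A.

    w is the mean of the off-diagonal entries, (e^T A e - trace(A)) / (n^2 - n),
    projected onto [-1/(n-1), 1], where X(w) is a correlation matrix.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    w = (A.sum() - np.trace(A)) / (n**2 - n)
    w = min(max(w, -1.0 / (n - 1)), 1.0)
    X = np.full((n, n), w)
    np.fill_diagonal(X, 1.0)
    return w, X

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
w, X = nearest_one_parameter(A)
print(w)   # mean of the six off-diagonal entries, 4/6
```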

Block Structured Correlation Matrix

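$$\begin{bmatrix} 1 & \gamma_{11} & \gamma_{12} & \gamma_{12} \\ \gamma_{11} & 1 & \gamma_{12} & \gamma_{12} \\ \gamma_{12} & \gamma_{12} & 1 & \gamma_{22} \\ \gamma_{12} & \gamma_{12} & \gamma_{22} & 1 \end{bmatrix}, \qquad C_{ij} = \begin{cases} C(\gamma_{ii}) \in \mathbb{R}^{n_i \times n_i}, & i = j, \\ \gamma_{ij}\, ee^T \in \mathbb{R}^{n_i \times n_j}, & i \ne j. \end{cases}$$

Objective function:

$$f(\Gamma) = \|A - C(\Gamma)\|_F^2 = \sum_{i=1}^m \|A_{ii} - C(\gamma_{ii})\|_F^2 + \sum_{i \ne j} \|A_{ij} - \gamma_{ij}\, ee^T\|_F^2.$$

Convex constraint set $\Rightarrow$ unique minimizer. Alternating projections converge.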

1-Factor Correlation Matrix

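$$C(x) = \operatorname{diag}(1 - x_i^2) + xx^T, \qquad x \in \mathbb{R}^n,$$

i.e., $c_{ij} = x_i x_j$ for $i \ne j$.

Lemma.
$$\det(C(x)) = \prod_{i=1}^n (1 - x_i^2) + \sum_{i=1}^n x_i^2 \prod_{\substack{j=1 \\ j \ne i}}^n (1 - x_j^2).$$

Corollary. If $|x| \le e$ with $|x_i| = 1$ for at most one $i$, then $C(x)$ is nonsingular.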
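The following sketch checks the lemma numerically against `numpy.linalg.det`, using a random $x$ in $[-1, 1]^n$ (an illustrative choice). The formula is the expansion of $\det(D + xx^T)$ with $D = \operatorname{diag}(1 - x_i^2)$. It also shows the corollary directly. If $|x_1| = 1$ and $|x_j| < 1$ for $j \ne 1$, then only the $i = 1$ term of the sum survives, and that term is positive.

```python
import numpy as np

def C1(x):
    """One-factor correlation matrix diag(1 - x_i^2) + x x^T."""
    x = np.asarray(x, dtype=float)
    return np.diag(1.0 - x**2) + np.outer(x, x)

def det_formula(x):
    """det(C(x)) from the lemma: prod(1 - x_i^2) + sum_i x_i^2 prod_{j != i}(1 - x_j^2)."""
    x = np.asarray(x, dtype=float)
    d = 1.0 - x**2
    total = np.prod(d)
    for i in range(len(x)):
        total += x[i] ** 2 * np.prod(np.delete(d, i))
    return total

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=6)
print(det_formula(x), np.linalg.det(C1(x)))
```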

Rank Result

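$$C(x) = \operatorname{diag}(1 - x_i^2) + xx^T.$$

Theorem. Let $x \in \mathbb{R}^n$ with $|x| \le e$ (componentwise). Then $\operatorname{rank}(C(x)) = \min(p + 1, n)$, where $p$ is the number of $x_i$ for which $|x_i| < 1$.

$$x = [1, 1, 1, x_4, x_5]^T \;\Rightarrow\; C(x) = \begin{bmatrix} 1 & 1 & 1 & x_4 & x_5 \\ 1 & 1 & 1 & x_4 & x_5 \\ 1 & 1 & 1 & x_4 & x_5 \\ x_4 & x_4 & x_4 & 1 & x_4x_5 \\ x_5 & x_5 & x_5 & x_4x_5 & 1 \end{bmatrix}.$$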
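Here is a quick check of the rank formula with `numpy.linalg.matrix_rank`, on three vectors of our choosing. One of them has the form $x = [1, 1, 1, x_4, x_5]^T$.

```python
import numpy as np

def C1(x):
    """One-factor correlation matrix diag(1 - x_i^2) + x x^T."""
    x = np.asarray(x, dtype=float)
    return np.diag(1.0 - x**2) + np.outer(x, x)

def predicted_rank(x):
    """min(p + 1, n), where p counts the entries with |x_i| < 1."""
    p = int(np.sum(np.abs(np.asarray(x, dtype=float)) < 1.0))
    return min(p + 1, len(x))

cases = [[1, 1, 1, 0.4, -0.7],   # p = 2: the first three rows coincide
         [1, -1, 1, 1],          # p = 0: C(x) = x x^T has rank 1
         [0.3, 0.5, -0.2]]       # p = n: C(x) is nonsingular
for x in cases:
    print(x, np.linalg.matrix_rank(C1(x)), predicted_rank(x))
```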

One-Factor Problem

$$\min_{x \in \mathbb{R}^n} f(x) := \|A - C(x)\|_F^2 \quad \text{subject to} \quad -e \le x \le e.$$

The objective function is nonconvex.

One-Factor Problem: Derivatives

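Objective (for $A$ symmetric with unit diagonal):
$$f(x) = \langle A - I, A - I \rangle_F - 2x^T(A - I)x + (x^Tx)^2 - \sum_{i=1}^n x_i^4.$$

Gradient:
$$\nabla f(x) = 4\bigl((x^Tx)x - (A - I)x - \operatorname{diag}(x_i^2)x\bigr).$$

Hessian:
$$\nabla^2 f(x) = 4\bigl(2xx^T + (x^Tx + 1)I - A - 3\operatorname{diag}(x_i^2)\bigr).$$

Both $\nabla f(x)$ and $\nabla^2 f(x)$ are cheap to compute.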
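The formulas can be checked with the complex step approximation from the differentiation slide. Since $f$ and $\nabla f$ are polynomials in $x$, evaluating them at $x + ihe_j$ gives column $j$ of the gradient and Hessian to about machine precision. This sketch assumes NumPy and a random symmetric $A$ with unit diagonal; the expansion of $f$ relies on the unit diagonal. Products are written without complex conjugation, which keeps the functions analytic.

```python
import numpy as np

def f(x, A):
    """f(x) = ||A - C(x)||_F^2 via the expansion (A symmetric, unit diagonal).

    Uses only non-conjugating products, so it is analytic and accepts complex x.
    """
    AI = A - np.eye(len(A))
    return np.sum(AI * AI) - 2 * (x @ AI @ x) + (x @ x) ** 2 - np.sum(x**4)

def grad(x, A):
    return 4 * ((x @ x) * x - (A - np.eye(len(A))) @ x - x**3)

def hess(x, A):
    n = len(x)
    return 4 * (2 * np.outer(x, x) + (x @ x + 1) * np.eye(n) - A - 3 * np.diag(x**2))

rng = np.random.default_rng(2)
n = 5
B = rng.uniform(-1, 1, (n, n))
A = (B + B.T) / 2
np.fill_diagonal(A, 1.0)
x = rng.uniform(-1, 1, n)

h = 1e-100
E = np.eye(n)
# Complex-step estimates: entry/column j uses the perturbation i*h*e_j.
g_cs = np.array([f(x + 1j * h * E[j], A).imag / h for j in range(n)])
H_cs = np.column_stack([grad(x + 1j * h * E[j], A).imag / h for j in range(n)])
print(np.max(np.abs(g_cs - grad(x, A))), np.max(np.abs(H_cs - hess(x, A))))
```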

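Case 1: $f(x_*) = 0$

If $f(x_*) = 0$ then $\frac{1}{4}\nabla^2 f(x_*) = H_n(x_*)$, where

$$H_n(x) = (x^Tx)I + xx^T - 2D^2, \qquad D = \operatorname{diag}(x_i), \qquad x \in \mathbb{R}^n.$$

For example, for $n = 4$:

$$H_4(x) = \begin{bmatrix} x_2^2 + x_3^2 + x_4^2 & x_1x_2 & x_1x_3 & x_1x_4 \\ x_2x_1 & x_1^2 + x_3^2 + x_4^2 & x_2x_3 & x_2x_4 \\ x_3x_1 & x_3x_2 & x_1^2 + x_2^2 + x_4^2 & x_3x_4 \\ x_4x_1 & x_4x_2 & x_4x_3 & x_1^2 + x_2^2 + x_3^2 \end{bmatrix}.$$

Theorem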
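Since $f \ge 0$ on $\mathbb{R}^n$ and $f(x_*) = 0$, the point $x_*$ is an unconstrained global minimizer, so $H_n(x_*)$ must be positive semidefinite. The NumPy sketch below uses a random $x_*$, which is an illustrative choice. It builds $A = C(x_*)$, confirms that the Hessian formula reduces to $4H_n(x_*)$, and checks the smallest eigenvalue.

```python
import numpy as np

def H_n(x):
    """H_n(x) = (x^T x) I + x x^T - 2 D^2 with D = diag(x_i)."""
    return (x @ x) * np.eye(len(x)) + np.outer(x, x) - 2 * np.diag(x**2)

rng = np.random.default_rng(3)
x_star = rng.uniform(-1, 1, 6)
A = np.diag(1 - x_star**2) + np.outer(x_star, x_star)   # A = C(x_*), so f(x_*) = 0

# Hessian formula for the one-factor problem, evaluated at x_*.
n = len(x_star)
hess = 4 * (2 * np.outer(x_star, x_star) + (x_star @ x_star + 1) * np.eye(n)
            - A - 3 * np.diag(x_star**2))
print(np.max(np.abs(hess / 4 - H_n(x_star))))
print(np.linalg.eigvalsh(H_n(x_star))[0])   # nonnegative up to rounding
```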

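Case 2: $f(x_*) > 0$

We can write

$$\tfrac{1}{4}\nabla^2 f(x) = H_n(x) + E_n(x),$$

where $E_n$ has the form

$$E_4 = \begin{bmatrix} 0 & x_1x_2 - a_{12} & x_1x_3 - a_{13} & x_1x_4 - a_{14} \\ x_2x_1 - a_{21} & 0 & x_2x_3 - a_{23} & x_2x_4 - a_{24} \\ x_3x_1 - a_{31} & x_3x_2 - a_{32} & 0 & x_3x_4 - a_{34} \\ x_4x_1 - a_{41} & x_4x_2 - a_{42} & x_4x_3 - a_{43} & 0 \end{bmatrix}.$$

Hence, if $|x_ix_j - a_{ij}|$ is sufficiently small and $H_n$ positive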

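$k$-Factor Problem

$C(X) := I - \operatorname{diag}(XX^T) + XX^T$ with $X \in \mathbb{R}^{n\times k}$. The representation is not unique: $C(XQ) = C(X)$ for any orthogonal $Q \in \mathbb{R}^{k\times k}$.

$$\sum_{j=1}^k x_{ij}^2 \le 1, \quad i = 1\colon n \;\Longrightarrow\; C(X) \text{ is a correlation matrix.}$$

The $k$-factor problem is

$$\min_{X \in \mathbb{R}^{n\times k}} f(X) := \|A - C(X)\|_F^2 \quad \text{subject to} \quad \sum_{j=1}^k x_{ij}^2 \le 1, \quad i = 1\colon n.$$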
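The non-uniqueness comes from $(XQ)(XQ)^T = XX^T$ for orthogonal $Q$. Moreover, $XQ$ has the same row norms as $X$, so it is also feasible. The NumPy sketch below uses arbitrary sizes and seed, and a random orthogonal $Q$ from a QR factorization. It confirms that $C(XQ) = C(X)$ while $XQ \ne X$.

```python
import numpy as np

def C(X):
    """k-factor correlation matrix I - diag(XX^T) + XX^T."""
    XXt = X @ X.T
    return np.eye(X.shape[0]) - np.diag(np.diag(XXt)) + XXt

rng = np.random.default_rng(4)
n, k = 6, 3
X = rng.uniform(-1, 1, (n, k))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # enforce sum_j x_ij^2 <= 1
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))                 # random orthogonal Q
print(np.max(np.abs(C(X @ Q) - C(X))))
```

Minimizers of the $k$-factor problem therefore come in families $\{XQ\}$. For $k = 1$, this reduces to a sign change.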

$k$-Factor Problem: Derivatives

Gradient:

$$\nabla f(X) = 4\bigl(X(X^TX) - AX + X - \operatorname{diag}(XX^T)X\bigr).$$

The Hessian is given implicitly; it can be viewed as a matrix representation of the Fréchet derivative of $\nabla f(X)$.
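The gradient formula can be verified entry by entry with the complex step, in the same way as for $k = 1$. The sketch makes three assumptions. It uses NumPy. It takes $A$ symmetric with unit diagonal, since the term $+X$ in the formula comes from $\operatorname{diag}(A) = I$. It writes $f$ with `.T` and elementwise products, which do not conjugate, so $f$ is analytic in the entries of $X$.

```python
import numpy as np

def f(X, A):
    """f(X) = ||A - C(X)||_F^2 with C(X) = I - diag(XX^T) + XX^T.

    Written with non-conjugating operations so that complex X is allowed.
    """
    XXt = X @ X.T
    R = A - (np.eye(len(A)) - np.diag(np.diag(XXt)) + XXt)
    return np.sum(R * R)

def grad(X, A):
    """4(X(X^T X) - AX + X - diag(XX^T) X)."""
    return 4 * (X @ (X.T @ X) - A @ X + X - np.diag(np.diag(X @ X.T)) @ X)

rng = np.random.default_rng(5)
n, k = 5, 2
B = rng.uniform(-1, 1, (n, n))
A = (B + B.T) / 2
np.fill_diagonal(A, 1.0)                 # the formula assumes unit diagonal
X = rng.uniform(-0.7, 0.7, (n, k))

h = 1e-100
G_cs = np.zeros((n, k))
for i in range(n):
    for j in range(k):
        Xp = X.astype(complex)           # copy, then perturb one entry by i*h
        Xp[i, j] += 1j * h
        G_cs[i, j] = f(Xp, A).imag / h
print(np.max(np.abs(G_cs - grad(X, A))))
```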

Choice of Optimization Method

◮ Derivatives available.
◮ Ignore the constraints?
◮ Starting matrix, convergence test?
◮ Rich set of solvers: NAG Library, Mark 22 (E04 - Minimizing or Maximizing a Function; E05 - Global Optimization of a Function); MATLAB Optimization Toolbox.

Alternating Directions

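As a function of the single entry $x_{ij}$,

$$f(x_{ij}) = \text{const.} + 2\sum_{q \ne i}\Bigl(a_{iq} - \sum_{s=1}^k x_{is}x_{qs}\Bigr)^2.$$

Hence $f'(x_{ij}) = 0$ if

$$x_{ij} = \frac{\sum_{q \ne i} x_{qj}\bigl(a_{iq} - \sum_{s \ne j} x_{is}x_{qs}\bigr)}{\sum_{q \ne i} x_{qj}^2}.$$

Project $x_{ij}$ onto $[-1, 1]$. Convergence guaranteed.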
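For $k = 1$ the update reduces to $x_i = \sum_{q \ne i} x_q a_{iq} / \sum_{q \ne i} x_q^2$, projected onto $[-1, 1]$. Because $f$ is a convex quadratic in each single $x_i$, each projected update is an exact constrained coordinate minimization, so $f$ never increases. Below is a NumPy sketch for $k = 1$ only. It uses an exactly one-factor $A$ and a starting vector, both of our choosing.

```python
import numpy as np

def f1(x, A):
    """One-factor objective ||A - C(x)||_F^2, C(x) = diag(1 - x_i^2) + x x^T."""
    C = np.diag(1 - x**2) + np.outer(x, x)
    return np.linalg.norm(A - C, "fro") ** 2

def alternating_directions_k1(A, x0, sweeps=200):
    """Cyclic coordinate minimization (k = 1) with projection onto [-1, 1]."""
    x = np.array(x0, dtype=float)
    n = len(x)
    history = [f1(x, A)]
    for _ in range(sweeps):
        for i in range(n):
            others = np.arange(n) != i
            denom = np.sum(x[others] ** 2)
            if denom == 0.0:
                continue       # f does not depend on x_i here; leave it
            # k = 1: the inner sum over s != j is empty
            x[i] = np.clip(x[others] @ A[i, others] / denom, -1.0, 1.0)
        history.append(f1(x, A))
    return x, history

x_true = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
A = np.diag(1 - x_true**2) + np.outer(x_true, x_true)   # exactly one-factor
x, hist = alternating_directions_k1(A, 0.5 * np.ones(5))
print(hist[0], hist[-1])
```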

Principal Factors Method

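Anderson, Sidenius & Basu (2003): with $F(X) = I - \operatorname{diag}(XX^T)$,

$$X_i = \arg\min_{X \in \mathbb{R}^{n\times k}} \|A - F(X_{i-1}) - XX^T\|_F.$$

The minimum is obtained from an eigendecomposition of $A - F(X_{i-1})$.

Equivalent to the alternating projections method for

◮ $U := \{W \in \mathbb{R}^{n\times n} : w_{ij} = a_{ij} \text{ for } i \ne j\}$, which is convex,
◮ $S := \{W \in \mathbb{R}^{n\times n} : W = XX^T \text{ for some } X \in \mathbb{R}^{n\times k}\}$, which is nonconvex!

Alternating projections theory gives no guarantee of convergence! The constraints are ignored, so the final iterate is projected onto them.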
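Here is a NumPy sketch of the iteration. For symmetric $B$, the minimizer of $\|B - XX^T\|_F$ over $X \in \mathbb{R}^{n \times k}$ is built from the $k$ largest eigenvalues of $B$ (with negative ones replaced by zero) and their eigenvectors. Several choices are our assumptions rather than details taken from Anderson, Sidenius & Basu (2003): the starting matrix $X_0 = 0$, the fixed iteration count, and the row scaling used as the final projection onto the constraints.

```python
import numpy as np

def principal_factors(A, k, iters=100):
    """Principal factors iteration (sketch): X_i = argmin ||A - F(X_{i-1}) - XX^T||_F.

    F(X) = I - diag(XX^T). For symmetric B, the minimizer of ||B - XX^T||_F
    uses the k largest eigenpairs of B, with negative eigenvalues set to 0.
    Constraints are ignored in the loop; the final X is projected onto them
    by scaling each row of norm > 1 back to norm 1.
    """
    n = A.shape[0]
    X = np.zeros((n, k))                                    # assumed starting point
    for _ in range(iters):
        B = A - np.eye(n) + np.diag(np.sum(X**2, axis=1))   # A - F(X)
        lam, V = np.linalg.eigh((B + B.T) / 2)              # ascending eigenvalues
        X = V[:, -k:] * np.sqrt(np.maximum(lam[-k:], 0.0))
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

rng = np.random.default_rng(7)
X_true = rng.uniform(-0.6, 0.6, (8, 2))
A = np.eye(8) - np.diag(np.sum(X_true**2, axis=1)) + X_true @ X_true.T
X = principal_factors(A, 2, iters=500)
C = np.eye(8) - np.diag(np.sum(X**2, axis=1)) + X @ X.T
print(np.linalg.norm(A - C), np.linalg.norm(A - np.eye(8)))
```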

Spectral Projected Gradient Method

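Birgin, Martínez & Raydan (2000).

To minimize $f: \mathbb{R}^n \to \mathbb{R}$ over a convex set $\Omega$:

$$x_{k+1} = x_k + \alpha_k d_k.$$

◮ $d_k = P_\Omega(x_k - t_k \nabla f(x_k)) - x_k$ is a descent direction, where $P_\Omega$ is the projection onto $\Omega$ and $t_k > 0$ is the spectral step length;
◮ $\alpha_k \in (0, 1]$ is chosen through a nonmonotone line search strategy.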
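Below is a simplified NumPy sketch of SPG, applied to the one-factor problem. Here $\Omega$ is the box $[-1, 1]^n$ and $P_\Omega$ is componentwise clipping. Two simplifications should be flagged. First, the spectral step is the Barzilai–Borwein quotient $s^Ts/s^Ty$, safeguarded to $[t_{\min}, t_{\max}]$. Second, the nonmonotone line search backtracks by halving, rather than by the safeguarded interpolation of Birgin, Martínez & Raydan. All parameter values are illustrative.

```python
import numpy as np

def spg(f, grad, proj, x0, max_iter=5000, tol=1e-9, M=10, gamma=1e-4,
        t_min=1e-10, t_max=1e10):
    """Simplified spectral projected gradient method.

    t_k: Barzilai-Borwein (spectral) step s^T s / s^T y, safeguarded.
    alpha_k: nonmonotone Armijo test against the largest of the last M
    values of f, with backtracking by halving (a simplification).
    """
    x = proj(np.asarray(x0, dtype=float))
    g = grad(x)
    t = 1.0
    f_hist = [f(x)]
    for _ in range(max_iter):
        if np.max(np.abs(proj(x - g) - x)) < tol:   # projected gradient small
            break
        d = proj(x - t * g) - x                       # descent direction d_k
        f_ref, gd, alpha = max(f_hist[-M:]), g @ d, 1.0
        while True:
            x_new = x + alpha * d
            f_new = f(x_new)
            if f_new <= f_ref + gamma * alpha * gd or alpha < 1e-12:
                break
            alpha /= 2
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        t = t_max if sy <= 0 else min(max((s @ s) / sy, t_min), t_max)
        x, g = x_new, g_new
        f_hist.append(f_new)
    return x, f_hist

# One-factor problem with A = C(x_true): Omega = [-1, 1]^n, P_Omega = clipping.
x_true = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
A = np.diag(1 - x_true**2) + np.outer(x_true, x_true)
AI = A - np.eye(len(A))

def fun(x):
    return np.sum(AI * AI) - 2 * (x @ AI @ x) + (x @ x) ** 2 - np.sum(x**4)

def gr(x):
    return 4 * ((x @ x) * x - AI @ x - x**3)

def box(x):
    return np.clip(x, -1.0, 1.0)

x, hist = spg(fun, gr, box, 0.5 * np.ones(5))
print(len(hist) - 1, hist[-1])
```

The constraints enter only through `box`. This is why incorporating them costs little per iteration.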

Test Examples

corr: gallery('randcorr',n) (MATLAB).

nrand: $\frac{1}{2}(B + B^T) + \operatorname{diag}(I - B)$ with $B \in [-1, 1]^{n\times n}$ such that $\lambda_{\min}(B) < 0$.

Results averaged over 10 instances.

AD: alternating directions.
PFM: principal factors method.
Nwt: e04lb of NAG Toolbox for MATLAB (modified Newton), bound constraints.
SPGM: spectral projected gradient method.
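Here is a NumPy sketch of the nrand generator. Because $B$ is nonsymmetric, we read "$\lambda_{\min}(B) < 0$" as a condition on the symmetric test matrix itself, and resample until it holds. This interpretation and the function name are assumptions. No equivalent of MATLAB's `gallery('randcorr',n)` is attempted here.

```python
import numpy as np

def nrand(n, rng):
    """Hypothetical generator for 'nrand' test matrices.

    (B + B^T)/2 + diag(I - B) with B uniform on [-1, 1]^{n x n}: the result is
    symmetric with unit diagonal. Resample until it has a negative eigenvalue
    (our reading of lambda_min < 0).
    """
    while True:
        B = rng.uniform(-1.0, 1.0, (n, n))
        A = (B + B.T) / 2 + np.diag(1.0 - np.diag(B))
        if np.linalg.eigvalsh(A)[0] < 0:
            return A

A = nrand(100, np.random.default_rng(6))
print(np.linalg.eigvalsh(A)[0])
```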

Comparison: $k = 1$, $n = 2000$

                 tol = 10⁻³                   tol = 10⁻⁶
          t (sec.)    #its   √f(X∗)    t (sec.)    #its   √f(X∗)
corr, f(X₀) = 26.0
  AD          3.3      5.2     26.0        3938    7282     26.0
  PFM          68      1.1     26.0         827      18     26.0
  Nwt          23      1.8     26.0          36     5.0     26.0
  SPGM        9.8      5.2     26.0         638     760     26.0
nrand, f(X₀) = 825.13
  AD          3.8      7.2   815.79         3.4    10.0   815.79
  PFM          22      3.0   815.81        19.0     4.0   815.81
  Nwt        4167     1222   815.79        4312    1229   815.79
  SPGM        9.4      7.2   815.79          11     9.6   815.79

Comparison: $k = 6$, $n = 1000$

                 tol = 10⁻³                   tol = 10⁻⁶
          t (sec.)    #its   √f(X∗)    t (sec.)    #its   √f(X∗)
corr, f(X₀) = 18.5
  AD          704      836    18.38        5060    5955    18.38
  PFM          10      4.1    18.38          95    28.1    18.38
  Nwt         167       52    18.38         280    68.2    18.38
  SPGM         24      235    18.38         108     892    18.38
nrand, f(X₀) = 415
  AD         8694     9816      421      1.13e4  1.28e4      414
  PFM        10.1      6.0      421         9.8      10      420
  Nwt         146     40.8      421         109      56      420
  SPGM        122     1263      407         276    2925      407

Conclusions

Performance of the methods depends on

◮ the problem type,
◮ the required tolerance,
◮ the problem size.

◮ Alternating directions: good for $k = 1$ and low accuracy.
◮ Principal factors method: generally fast, but may not converge to a feasible point.
◮ Spectral projected gradient method wins overall.
◮ Incorporating the constraints need not hurt performance.

References

L. Anderson, J. Sidenius, and S. Basu. All your hedges in one basket. Risk, pages 67–72, Nov. 2003. www.risk.net.

J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8:141–148, 1988.

E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets.

E. G. Birgin, J. M. Martínez, and M. Raydan. Algorithm 813: SPG—Software for convex-constrained optimization. ACM Trans. Math. Software, 27(3):340–349, 2001.

E. G. Birgin, J. M. Martínez, and M. Raydan. Spectral projected gradient methods. In C. A. Floudas and P. M. Pardalos, editors, Encyclopedia of Optimization, pages 3652–3659. Springer-Verlag, Berlin, second edition, 2009.

P. Glasserman and S. Suchintabandid. Correlation expansions for CDO pricing.

N. J. Higham. Computing the nearest correlation matrix—A problem from finance. IMA J. Numer. Anal., 22(3):329–343, 2002.

C. Lucas. Computing nearest covariance and correlation matrices. M.Sc. Thesis, University of Manchester, Manchester, England, Oct. 2001.

H.-D. Qi and D. Sun. A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl., 28(2):360–385, 2006.

P. Sonneveld, J. J. I. M. van Kan, X. Huang, and C. W. Oosterlee. Nonnegative matrix factorization of a correlation matrix. Linear Algebra Appl., 431:334–349, 2009.

A. Vandendorpe, N.-D. Ho, S. Vanduffel, and P. Van Dooren. On the parameterization of the CreditRisk+ model for estimating credit portfolio risk. Insurance: Mathematics and Economics, 42(2):736–745, 2008.