Dating multiple change points in the correlation matrix

SFB 823 Discussion Paper

Pedro Galeano, Dominik Wied∗

Universidad Carlos III Madrid, TU Dortmund

This Version: May 10, 2014

Abstract

We propose a nonparametric procedure for detecting and dating multiple change points in the correlation matrix of a sequence of random variables. The procedure is based on a test for changes in correlation matrices at an unknown point in time recently proposed by Wied (2014). Although the procedure requires constant expectations and variances, only mild assumptions on the serial dependence structure are imposed. We show the validity of the procedure, including the convergence rate of the change point estimators. Moreover, we illustrate its performance in finite samples by means of a simulation study and the analysis of a real data example with financial returns. These examples show that the proposed algorithm has high power in finite samples.

Keywords: Binary segmentation algorithm; Correlation matrix; CUSUM statistics; Financial returns; Multiple change point detection; Nonparametric estimation.

∗Universidad Carlos III de Madrid, Departamento de Estadística, E-28903 Getafe, Madrid, Spain. Email: [email protected], Phone: +34 91 624 8901 (P. Galeano); TU Dortmund, Fakultät Statistik, 44221 Dortmund, Germany. Email: [email protected], Phone: +49 231 755 5419 (D. Wied). Financial support by Ministerio de Ciencia e Innovación grant ECO2012-38442 and Deutsche Forschungsgemeinschaft (SFB 823, project A1) is gratefully acknowledged.


1. Introduction

The detection of change points in sequences of random variables is an important problem with a wide range of applications. Briefly, the problem can be stated as follows: a sequence of random variables has a set of characteristics, such as the mean and/or the variance, that follow a piecewise constant structure. We then want to detect the number of times that the characteristics change from one set of values to another and the location of the changes. Additionally, we want to estimate the characteristics in each constant period. The piecewise constant structure can then be taken into account to construct an appropriate model that can be used, for instance, to forecast future values of the sequence. Many different strategies have been proposed to solve specific change point problems, such as penalized likelihood methods or binary segmentation procedures, among others. The literature on change point detection is vast; two recent and comprehensive references on the topic are Aue and Horváth (2013) and Jandhyala et al. (2013).

In particular, binary segmentation is one of the most flexible methodologies for change point detection. The idea of the procedure is the following: first, search for a single change point in the whole sequence using, for instance, a likelihood ratio or a cumulative sum (CUSUM) statistic. If a change point is detected, then the sequence is split into two subsequences that are used to search for new change points. This procedure was first proposed by Vostrikova (1981) and subsequently applied to various problems by Inclán and Tiao (1994), Bai (1997), Bai and Perron (1998), Andreou and Ghysels (2002), Gooijer (2006), Galeano (2007) and Galeano and Tsay (2010), among many others. See also Fryzlewicz and Rao (2014) and Fryzlewicz (2014) for two recent references on binary segmentation.

Change point problems have mainly focused on changes in the mean and/or the variance of univariate sequences and in the mean and/or the covariance matrix of multivariate sequences. However, the case of changes in the correlation between sequences of multiple random variables has not been extensively analyzed. In particular, Wied et al. (2012) propose a nonparametric CUSUM statistic to formally test whether correlations between two random variables remain constant over time. This approach allows the practitioner to determine whether there is a change or not, but not where a possible change occurs or how many changes there are. Galeano and Wied (2013) fill this gap by proposing an algorithm based on the correlation constancy test to estimate both the number and the timing of possible change points. However, the previous papers only consider bivariate correlations, which restricts the applicability of these procedures when more than two variables are of interest. For instance, in portfolio management, the interest is typically in more than two assets and in the constancy of the whole correlation matrix. Recently, Wied (2014) has proposed a CUSUM statistic that extends the test proposed by Wied et al. (2012) to higher dimensions, while keeping its nonparametric and model-free approach. Wied (2014) shows that, in some situations, the matrix-based test outperforms a method based on performing several pairwise tests combined with a level correction such as Bonferroni-Holm. Moreover, Berens et al. (2013) show that the test is useful for Value at Risk (VaR) forecasting.

The main aim of this paper is to propose a nonparametric procedure for detecting and dating multiple change points in the correlation matrix of sequences of random variables based on the test proposed by Wied (2014). The proposed procedure is a binary segmentation type procedure with a refinement step to avoid the presence of false change points due to multiple testing. The method is an extension of the method proposed by Galeano and Wied (2013) for detecting multiple change points in the correlation between two random variables. The procedure proceeds as follows: first, we determine the “dominating” change point and decide if this point is statistically significant. Then, we split the series into two parts and again test for possible change points in each part of the series. The procedure stops when no new change point is found. Finally, a refinement step is added to delete all possible false change points and to better estimate their location. We analytically show that the proposed procedure gives the correct number of change points and that, assuming a finite number of change points, these are consistently estimated. In addition, we derive the convergence rate of the change point estimator. Furthermore, we show that the algorithm gives good results in simulated samples and in an empirical application.

The rest of the paper is organized as follows. Section 2 introduces the proposed procedure for detecting multiple change points in the correlation matrix of a multivariate random variable. Section 3 derives the asymptotic properties of the procedure (in particular its validity). Sections 4 and 5 present some simulation studies and a real data application that show the good behavior of the procedure in finite samples. Finally, Section 6 provides some conclusions. All proofs are presented in the Appendix.

2. The procedure

In this section, we present the algorithm for detection of change points in the correlation matrix of a sequence of $p$-dimensional random vectors. First of all, we introduce some notation. Throughout the paper, $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{p,t})$, $t \in \mathbb{Z}$, denotes a sequence of $p$-variate random vectors on a probability space $(\Omega, \mathcal{A}, P)$ with finite 4-th moments and (unconditional) correlation matrix $R_t = (\rho^{ij}_t)_{1 \le i,j \le p}$, where

$$\rho^{ij}_t = \frac{\mathrm{Cov}(X_{i,t}, X_{j,t})}{\sqrt{\mathrm{Var}(X_{i,t})\,\mathrm{Var}(X_{j,t})}}.$$

We call $\|\cdot\|_r$ the $L_r$-norm, where $r > 0$. Additionally, we write $A \sim (m, n)$ for a matrix $A$ with $m$ rows and $n$ columns. We denote by $\to_d$ and $\to_p$ convergence in distribution and probability, respectively, of random variables or vectors. The convergence symbols as well as all moment operators like Var are used with respect to $P$ unless stated otherwise. Moreover, let $\vee$ and $\wedge$ denote maximum and minimum, respectively.

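Given an observed time series $X_1, \ldots, X_T$, Wied (2014) proposed a statistic to test for the null hypothesis $H_0 : R_1 = \ldots = R_T$ versus $H_1 : \neg H_0$. The statistic is given by:

$$A_{1,T} := \max_{2 \le k \le T} \frac{k}{\sqrt{T}} \left\| \hat{E}_{1,T}^{-1/2} P_{k,1,T} \right\|_1, \qquad (1)$$

where $P_{k,1,T} = \left( \hat\rho^{ij}_{1,k} - \hat\rho^{ij}_{1,T} \right)_{1 \le i,j \le p,\, i<j} \in \mathbb{R}^{\frac{p(p-1)}{2}}$,

$$\hat\rho^{ij}_{1,k} = \frac{\sum_{t=1}^{k} (X_{i,t} - \bar{X}_{i,1,k})(X_{j,t} - \bar{X}_{j,1,k})}{\sqrt{\sum_{t=1}^{k} (X_{i,t} - \bar{X}_{i,1,k})^2}\,\sqrt{\sum_{t=1}^{k} (X_{j,t} - \bar{X}_{j,1,k})^2}},$$

$\bar{X}_{i,1,k} = \frac{1}{k} \sum_{t=1}^{k} X_{i,t}$, $\bar{X}_{j,1,k} = \frac{1}{k} \sum_{t=1}^{k} X_{j,t}$, and $\hat{E}_{1,T}$ is a bootstrap estimate of

$$E = \lim_{T \to \infty} \mathrm{Cov}\left( \sqrt{T} \left( \hat\rho^{ij}_{1,T} \right)_{1 \le i,j \le p,\, i<j} \right) \sim \left( \frac{p(p-1)}{2} \times \frac{p(p-1)}{2} \right). \qquad (2)$$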

The bootstrap estimate $\hat{E}_{1,T}$ of $E$ is computed as follows. First, we divide the observed time series $X_1, \ldots, X_T$ into $T - l_T - 1$ overlapping blocks $B_i$, $i = 1, \ldots, T - l_T - 1$, where $l_T$ is a block length, such that $B_1 = (X_1, \ldots, X_{l_T})$, $B_2 = (X_2, \ldots, X_{l_T+1})$, and so on. Then, for some large $B$, we sample $\left[\frac{T}{l_T}\right]$ times with replacement one of the $T - l_T - 1$ blocks and concatenate the sampled blocks, obtaining $B$ $p$-dimensional time series of length $\left[\frac{T}{l_T}\right] \cdot l_T$. Now, we calculate the vector $v_b := \sqrt{T} \left( \hat\rho^{ij}_{b,1,T} \right)_{1 \le i,j \le p,\, i<j}$, where $\hat\rho^{ij}_{b,1,T}$ is the sample correlation of the bootstrapped time series $b = 1, \ldots, B$. Finally, the estimator $\hat{E}_{1,T}$ is the empirical covariance matrix of these $B$ vectors, i.e.,

$$\hat{E}_{1,T} = \frac{1}{B} \sum_{b=1}^{B} (v_b - \bar{v})(v_b - \bar{v})' \qquad (3)$$

where $\bar{v} = \frac{1}{B} \sum_{b=1}^{B} v_b$.
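The computation of (1) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`corr_vector`, `bootstrap_cov`, `inv_sqrt`, `cusum_statistic`), the use of NumPy, the default of 200 bootstrap replications and the eigenvalue floor `eps` are all choices made here for illustration.

```python
import numpy as np

def corr_vector(x):
    """Stack the sample correlations rho_ij, i < j, of a (T, p) array."""
    r = np.corrcoef(x, rowvar=False)
    return r[np.triu_indices(x.shape[1], k=1)]

def bootstrap_cov(x, block_len, n_boot=200, rng=None):
    """Moving-block bootstrap estimate of E, following (3)."""
    rng = np.random.default_rng(rng)
    T, p = x.shape
    n_blocks = T - block_len - 1           # number of blocks as stated in the text
    n_draws = T // block_len               # [T / l_T] blocks per bootstrap series
    offsets = np.arange(block_len)
    v = np.empty((n_boot, p * (p - 1) // 2))
    for b in range(n_boot):
        starts = rng.integers(0, n_blocks, size=n_draws)
        idx = (starts[:, None] + offsets[None, :]).ravel()
        v[b] = np.sqrt(T) * corr_vector(x[idx])
    centered = v - v.mean(axis=0)
    return centered.T @ centered / n_boot  # divisor B, as in (3)

def inv_sqrt(m, eps=1e-10):
    """Symmetric inverse square root; eigenvalues are floored at eps
    (an assumed stand-in for a slight perturbation of a singular matrix)."""
    w, q = np.linalg.eigh(m)
    return (q / np.sqrt(np.maximum(w, eps))) @ q.T

def cusum_statistic(x, e_hat, k_min=2):
    """Statistic (1): max over k of k / sqrt(T) * ||E^{-1/2} P_{k,1,T}||_1."""
    T = x.shape[0]
    full = corr_vector(x)
    e_inv_sqrt = inv_sqrt(e_hat)
    best = 0.0
    for k in range(k_min, T + 1):
        diff = corr_vector(x[:k]) - full
        best = max(best, k / np.sqrt(T) * np.abs(e_inv_sqrt @ diff).sum())
    return best
```

A call such as `cusum_statistic(x, bootstrap_cov(x, block_len=4))` on a `(T, p)` array `x` returns the value to be compared with the critical value. The loop recomputes each partial correlation from scratch, which is quadratic in $T$; running sums would be faster but less readable.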

To derive the limiting null distribution of (1) and to obtain local power results, Wied (2014) considered some additional assumptions that are also required for our later developments. As in Galeano and Wied (2013), Assumption 1 is slightly extended by considering an arbitrary subinterval of [0, 1].

Assumption 1. For an arbitrary interval $[l_1, l_2] \subseteq [0, 1]$ with $l_1 < l_2$,

$$U_t := \begin{pmatrix} X_{1,t}^2 - E(X_{1,t}^2) \\ \vdots \\ X_{p,t}^2 - E(X_{p,t}^2) \\ X_{1,t} - E(X_{1,t}) \\ \vdots \\ X_{p,t} - E(X_{p,t}) \\ X_{1,t}X_{2,t} - E(X_{1,t}X_{2,t}) \\ X_{1,t}X_{3,t} - E(X_{1,t}X_{3,t}) \\ \vdots \\ X_{p-1,t}X_{p,t} - E(X_{p-1,t}X_{p,t}) \end{pmatrix}$$

and $S_j := \sum_{t = 1 \vee [l_1 T]}^{j} U_t$, we have

$$\lim_{T \to \infty} E\left( \frac{1}{(l_2 - l_1)T} S_{[l_2 T]} S_{[l_2 T]}' \right) =: D_1 \sim \left( 2p + \frac{p(p-1)}{2},\ 2p + \frac{p(p-1)}{2} \right),$$

where $D_1$ is a finite and positive definite matrix.

Assumption 2. For some $r > 2$, the $r$-th absolute moments of the components of $U_t$ are uniformly bounded, that means, $\sup_{t \in \mathbb{Z}} E\|U_t\|_r < \infty$.

Assumption 3. For $r$ from Assumption 2, the vector $(X_{1,t}, \ldots, X_{p,t})$ is $L_2$-NED (near-epoch dependent) with size $-\frac{r-1}{r-2}$ and constants $(c_t)$, $t \in \mathbb{Z}$, on a sequence $(V_t)$, $t \in \mathbb{Z}$, which is $\alpha$-mixing of size $\phi^* := -\frac{r}{r-2}$, i.e.

$$\left\| (X_{1,t}, \ldots, X_{p,t}) - E\left( (X_{1,t}, \ldots, X_{p,t}) \mid \sigma(V_{t-l}, \ldots, V_{t+l}) \right) \right\|_2 \le c_t v_l$$

with $\lim_{l \to \infty} v_l = 0$. The constants $(c_t)$, $t \in \mathbb{Z}$, fulfill $c_t \le 2\|U_t\|_2$ with $U_t$ from Assumption 1.

Assumption 4. $(X_{1,t}, \ldots, X_{p,t})$, $t \in \mathbb{Z}$, has constant expectation and variances, that means, $E(X_{i,t})$ and $E(X_{i,t}^2) > 0$, for $i = 1, \ldots, p$, do not depend on $t$.

Assumption 5. For $T \to \infty$, $l_T \to \infty$ and $l_T \sim T^{\alpha}$ for $\alpha \in (0, 1)$.

Assumptions 1, 2 and 3 concern moments and serial dependencies of the components of $X_t$. In particular, Assumption 1 is a regularity condition which holds in many models, except perhaps in the case of trending random variables. This situation is, however, not relevant in the case of, for instance, financial returns. Assumption 2 requires finite $(4+\gamma)$-th moments of $X_t$ with $\gamma > 0$ arbitrary. Although this condition may be more critical, our simulation evidence shows that the proposed procedure still works in models not fulfilling this assumption (see Section 4 below). Assumption 3 is a very general serial dependence assumption which holds, for instance, in most relevant econometric models such as GARCH models under certain conditions (cf. Carrasco and Chen, 2002). More precisely, Assumption 3 guarantees that the vector

$$(X_{1,t}^2, \ldots, X_{p,t}^2, X_{1,t}, \ldots, X_{p,t}, X_{1,t}X_{2,t}, X_{1,t}X_{3,t}, \ldots, X_{p-1,t}X_{p,t})$$

is $L_2$-NED (near-epoch dependent) with size $-\frac{1}{2}$, cf. Davidson (1994), p. 273.

Assumption 4 is a stationarity condition which is in line with Aue et al. (2009) and which can be slightly relaxed to allow for some fluctuations in the first and second moments. However, we do not consider this situation for ease of exposition and because the procedure would remain exactly the same. Nevertheless, we investigate in our simulation study in Section 4 how the procedure behaves in finite samples in the presence of GARCH effects (volatility clustering). Finally, Assumption 5 is similar to the one in Calhoun (2013), Corollary 2, and guarantees that the block length becomes large but not too large compared to $T$.

Under $H_0$ and Assumptions 1, 2, 3, 4 and 5, Corollary 1.b in Wied (2014) shows that the statistic $A_{1,T}$ converges in distribution to the supremum of the sum of the absolute values of independent Brownian bridges. More precisely, $H_0$ is rejected whenever $A_{1,T}$ is larger than the $1 - \alpha$ quantile of

$$A := \sup_{0 \le s \le 1} \left\| B^{\frac{p(p-1)}{2}}(s) \right\|_1,$$

where $B^{\frac{p(p-1)}{2}}$ is a vector of $\frac{p(p-1)}{2}$ independent Brownian bridges. The quantiles can be estimated by Monte Carlo simulations by approximating the paths of the Brownian bridges on fine grids, as seen in Section 4.

The proposed procedure sequentially employs the test statistic in (1) to estimate the timing and the number of multiple change points. In particular, we assume that there is a finite number of change points. However, the number, location and size of the change points are unknown. In Assumption 4 we assumed for ease of exposition that expectations and variances are constant. Additionally, we assume that under the alternative hypothesis, we have a piecewise constant second-order cross moment matrix $E(X_t X_t')$. The formal assumption is:

Assumption 6. Under the alternative, expectations and variances of $X_{1,t}, \ldots, X_{p,t}$ are constant and equal to finite numbers, $\mu_i$ and $\sigma_i^2$, respectively, for $i = 1, \ldots, p$, while the second-order cross moment matrix changes from $E(X_t X_t') = m_{XX'}$ to $E(X_t X_t') = m_{XX'} + g\left(\frac{t}{T}\right)$. The function $g(z)$, $z \in [0, 1]$, is a step function with a finite number of steps $\ell$, i.e. there is a partition $0 = z_0 < z_1 < \ldots < z_\ell < z_{\ell+1} = 1$ and there are second cross moment level matrices $a_0, \ldots, a_\ell$ such that

$$g(z) = \sum_{i=0}^{\ell} a_i 1_{\{z \in [z_i, z_{i+1})\}}$$

and $g(1) = a_\ell$. The quantities $\ell$, $z_1, \ldots, z_\ell$ and $a_0, \ldots, a_\ell$ do not depend on $T$.

The function g specifies the timing and the size of the changes in the correlation matrix. Since this is a step function, we consider sudden changes in the correlation and do not consider smooth changes.

Next, we present the proposed procedure with the goal of estimating $\ell$, $z_1, \ldots, z_\ell$ and $a_0, \ldots, a_\ell$. The procedure is divided into four steps. The first and second steps of the algorithm are basically the steps taken by the usual binary segmentation algorithm. The idea of these steps is to isolate each change point in a different time interval by splitting the series into two parts once a change point is found. Then, the search for a new change point is repeated in both sections. These two steps are iterated until no more change points are detected. The third step of the algorithm is a refinement step in which the detected change points are tested in subintervals containing a single change point. Additionally, this step allows for a better estimate of the true locations of the change points. The final step computes the correlation matrix in subintervals with constant (unconditional) correlation.

To establish the asymptotic results, it is more convenient to present the procedure in terms of the estimator of the change point fraction. To that purpose, we rewrite the test statistic (1) as

$$A_{1,T} := \sup_{z \in [0,1]} \frac{\tau(z)}{\sqrt{T}} \left\| \hat{E}_{1,T}^{-1/2} P_{\tau(z),1,T} \right\|_1,$$

with $\tau(z) = [2 + z(T - 2)]$ (where $[\cdot]$ is the floor function), $P_{\tau(z),1,T} = \left( \hat\rho^{ij}_{1,\tau(z)} - \hat\rho^{ij}_{1,T} \right)_{1 \le i,j \le p,\, i<j} \in \mathbb{R}^{\frac{p(p-1)}{2}}$, and estimate the timing of the break by $\hat{z} := \tau(\hat{z}^*)/T$ with $\hat{z}^* := \arg\max_z B_{1,T}(z)$ and $B_{1,T}(z) := \frac{\tau(z)}{T} \left\| P_{\tau(z),1,T} \right\|_1$. Note that we exclude the bootstrap estimator from the argmax estimator as it would disturb the information about the location of the change points given by the correlation differences. In fact, we will see later on that $B_{1,T}(z)$ converges to a function that essentially depends only on the function $g$. Here and in the following, we restrict the values $z$ for which the argmax is calculated to multiples of $1/T$. In case of multiple solutions, we choose the smallest one. Note that in the first step of the procedure, $B_{1,T}(z)$ is calculated from all observations. In subsequent iterations, if needed, we only consider the observations in the relevant part of the sample and call the corresponding “target function”

$$A_{\eta(l_1),\xi(l_2)}(z) := \frac{\xi(z) - \eta(l_1) + 1}{\sqrt{\xi(l_2) - \eta(l_1) + 1}} \left\| \hat{E}_{\eta(l_1),\xi(l_2)}^{-1/2} P_{\tau(z),\eta(l_1),\xi(l_2)} \right\|_1, \qquad (4)$$

where $z \in [l_1, l_2]$ for $0 \le l_1 < l_2 \le 1$, $\eta(z) = ([zT] \vee 1) \wedge (T - 1)$, $\xi(z) = \eta(z) \vee (\eta(l_1) + 1)$, $\hat{E}_{\eta(l_1),\xi(l_2)}$ denotes the bootstrap estimate of $E$ in (2) using data from $\eta(l_1)$ to $\xi(l_2)$ and $P_{\tau(z),\eta(l_1),\xi(l_2)} = \left( \hat\rho^{ij}_{\eta(l_1),\tau(z)} - \hat\rho^{ij}_{\eta(l_1),\xi(l_2)} \right)_{1 \le i,j \le p,\, i<j} \in \mathbb{R}^{\frac{p(p-1)}{2}}$. Then the timing of the break is estimated by the estimator $\hat{z}$ of (5), with $\hat{z}^* = \arg\max_{l_1 \le z \le l_2} B_{\eta(l_1),\xi(l_2)}(z)$ and

$$B_{\eta(l_1),\xi(l_2)}(z) := \frac{\xi(z) - \eta(l_1) + 1}{\xi(l_2) - \eta(l_1) + 1} \left\| P_{\tau(z),\eta(l_1),\xi(l_2)} \right\|_1. \qquad (6)$$

Basically, this means that we always look for the time point at which the test statistic (4) (calculated from data in a particular interval) takes its maximum and divide it by $T$.
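The argmax step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names `corr_vector` and `locate_change` are hypothetical, the segment is given by 0-based array indices `start` and `end` rather than by the index functions $\eta$, $\xi$ and $\tau$, and the weight is the number of observations used divided by the segment length, mirroring the structure of (6) without reproducing its exact indexing.

```python
import numpy as np

def corr_vector(x):
    """Stack the sample correlations rho_ij, i < j, of a (T, p) array."""
    r = np.corrcoef(x, rowvar=False)
    return r[np.triu_indices(x.shape[1], k=1)]

def locate_change(x, start=0, end=None, k_min=2):
    """Locate the candidate change point in the rows x[start:end].

    For k = k_min, ..., n (n = segment length) the target value is
    (k / n) * || rho_hat(first k rows) - rho_hat(whole segment) ||_1,
    i.e. the bootstrap matrix is deliberately left out, as in the text.
    Returns the row index (in x) of the first observation after the
    estimated change, together with the array of target values.
    """
    end = x.shape[0] if end is None else end
    seg = x[start:end]
    n = seg.shape[0]
    full = corr_vector(seg)
    values = np.array([
        k / n * np.abs(corr_vector(seg[:k]) - full).sum()
        for k in range(k_min, n + 1)
    ])
    # np.argmax returns the first maximizer, matching the rule that the
    # smallest solution is chosen when the maximum is not unique.
    k_hat = k_min + int(np.argmax(values))
    return start + k_hat, values
```

Dividing the returned index by $T$ gives an estimate of the change point fraction. Note that the last target value is zero by construction, since the partial and full-segment correlations coincide at $k = n$.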

Under the null hypothesis of no correlation change, $\hat{E}_{1,T}$ converges to a positive definite matrix with respect to the product measure $P^{\times}$ which combines randomness from the data as well as from the bootstrap (see the proof of Corollary 1.b in Wied, 2014). So, it is reasonable to assume that the matrix $\hat{E}_{\eta(l_1),\xi(l_2)}$ is invertible. It is unclear what happens with this matrix under fixed alternatives as considered below. However, in order to ensure consistency of the test statistic, we impose the convention that $\hat{E}_{\eta(l_1),\xi(l_2)}$, if not invertible, is perturbed slightly such that it becomes invertible. This affects the asymptotic properties neither under the null hypothesis nor under alternatives. The only potential drawback is that there might be some overrejections in finite samples if $p$ is large compared to $T$. Nevertheless, in the following, we always assume without loss of generality that $\hat{E}_{\eta(l_1),\xi(l_2)}^{-1/2}$ exists.

Formally, the algorithm proceeds as follows:

1. Let $X_1, \ldots, X_T$ be the observed series. Obtain the test statistic $A_{1,T}$. There are two possibilities:

(a) If the test statistic is statistically significant, i.e., if $A_{1,T} > c_{T,\alpha}$, where $c_{T,\alpha}$ is the asymptotic critical value for a given upper tail probability, then a change in the correlation matrix is announced. Let $z_1$ be the break point estimator from (5) and go to step 2.

(b) If the test statistic is not statistically significant, the algorithm stops, and no change points are detected.

2. Let $z_1, \ldots, z_\ell$ be the $\ell$ change points in increasing order already found in previous iterations. If

$$\max_k \left\{ A_{\eta(z_{k-1} + \frac{1}{T}),\xi(z_k)},\ k = 1, \ldots, \ell + 1 \right\} > c_{T,\alpha},$$

where $A_{\eta(z_{k-1} + \frac{1}{T}),\xi(z_k)}$ is the value of the statistic calculated from the data from $\eta(z_{k-1} + \frac{1}{T})$ to $\xi(z_k)$, for $k = 1, \ldots, \ell + 1$, taking $z_0 = 0$ and $z_{\ell+1} = 1$, then a new change point is detected at the point fraction at which the value $A_{\eta(z_{k_{\max}-1} + \frac{1}{T}),\xi(z_{k_{\max}})}$ is attained, where:

$$k_{\max} = \arg\max_k \left\{ A_{\eta(z_{k-1} + \frac{1}{T}),\xi(z_k)},\ k = 1, \ldots, \ell + 1 \right\}.$$

Repeat this step until no more change points are found.

3. Let $(z_1 < \ldots < z_\ell)$ be the detected change points. If $\ell > 1$, refine the estimates of the change point locations by calculating the statistic from the data from $\eta(z_{k-1} + \frac{1}{T})$ to $\xi(z_{k+1})$, for $k = 1, \ldots, \ell$, where $z_0 = 0$ and $z_{\ell+1} = 1$. If any of the change points is not statistically significant, delete it from the list, and repeat this step.

4. Finally, estimate the correlation matrix of $X_1, \ldots, X_T$ in each segment separately with the usual sample correlation matrix, where individual correlations are computed using the Bravais-Pearson correlation coefficient.
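The control flow of steps 1 to 3 can be sketched in Python as follows. This is only a skeleton under stated assumptions: `test_segment` and `critical_value` are hypothetical callables standing in for the statistic (4) with its argmax location and for the critical values, segments are half-open index ranges rather than the $\eta$/$\xi$ ranges of the text, and using the threshold indexed by the current number of change points in step 3 is an assumption made here.

```python
def binary_segmentation(n, test_segment, critical_value):
    """Steps 1-3 on the index range [0, n).

    test_segment(lo, hi) -> (statistic, location) with lo < location < hi;
    critical_value(k)    -> threshold to use once k change points are known.
    Both callables are placeholders supplied by the caller.
    """
    cps = []
    # Steps 1 and 2: test every current segment and split at the location
    # belonging to the largest statistic, as long as it is significant.
    while True:
        bounds = [0] + cps + [n]
        candidates = [test_segment(lo, hi)
                      for lo, hi in zip(bounds[:-1], bounds[1:])
                      if hi - lo >= 2]
        if not candidates:
            break
        stat, loc = max(candidates)
        if stat <= critical_value(len(cps)):
            break
        cps = sorted(cps + [loc])
    # Step 3: re-test each change point on the interval spanned by its two
    # neighbours; drop insignificant ones and repeat until nothing is dropped.
    while len(cps) > 1:
        bounds = [0] + cps + [n]
        threshold = critical_value(len(cps))
        results = [test_segment(bounds[k - 1], bounds[k + 1])
                   for k in range(1, len(cps) + 1)]
        kept = sorted({loc for stat, loc in results if stat > threshold})
        nothing_dropped = len(kept) == len(cps)
        cps = kept
        if nothing_dropped:
            break
    return cps
```

The second loop terminates because every repetition removes at least one change point. Step 4 then amounts to computing the sample correlation matrix on each of the resulting segments.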

As shown in Section 3, the proposed procedure consistently detects the true change points. Steps 1 and 2 are, essentially, the steps taken by the usual binary segmentation procedure. Step 3 is meant to refine the estimation of the change points as in this step the algorithm computes the value of the statistic in intervals only affected by the presence of a single change point, which is not guaranteed in step 2.

A key issue in applying the procedure to real data is the selection of the critical level used in the algorithm. One possibility is to always use the same critical value in each step of the procedure. However, the use of the same critical level in steps 2 and 3 may lead to over-estimation of the number of change points, because the larger the number of detected change points, the larger the accumulation of type I errors. Although we later prove that we can consistently estimate the correct number of change points even if this is the selected strategy, in practice, we require that the type I errors used depend on the number of change points already detected by the algorithm. More precisely, let $\alpha_0$ be the type I error for step 1. Then, we use the critical value $c_{T,\alpha_k}$ after detecting the $k$-th change point, where $\alpha_k$ is such that $1 - \alpha_0 = (1 - \alpha_k)^{k+1}$, leading to $\alpha_k = 1 - (1 - \alpha_0)^{\frac{1}{k+1}}$, which keeps the same significance level constant for all tests. For instance, if $\alpha_0 = 0.05$, then $\alpha_1 \approx 0.025$, $\alpha_2 \approx 0.017$ and so on. In fact, the initial type I error in the asymptotic result concerning the number of break points (Theorem 3) would have to converge to zero. However, in finite samples, $\alpha_0 = 0.05$ seems to be an acceptable choice. As noted before, we use the quantiles of the distribution of the supremum of the sum of the absolute values of independent Brownian bridges, estimated by Monte Carlo simulations as explained in Section 4.
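The adjustment of the type I error can be checked numerically with a few lines of Python. The function name `adjusted_level` is hypothetical; the formula is the one given above.

```python
def adjusted_level(alpha0, k):
    """Type I error alpha_k used after k detected change points,
    i.e. the solution of 1 - alpha0 = (1 - alpha_k) ** (k + 1)."""
    return 1.0 - (1.0 - alpha0) ** (1.0 / (k + 1))

# Levels for k = 0, 1, 2 with an initial level of 5%.
levels = [adjusted_level(0.05, k) for k in range(3)]
```

Rounded to three decimals, `levels[1]` and `levels[2]` reproduce the values 0.025 and 0.017 quoted in the text.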

The proposed procedure works reasonably well even in small samples in terms of detection of the true number of changes, as shown in the Monte Carlo experiments of Section 4. However, note that if the number of change points detected is large compared to the sample size, then a piecewise constant correlation matrix may not be a good description of the true correlation of the series.

3. Analytic results

As in Galeano and Wied (2013), we assume that there are dominating change points (if any) in order to obtain analytic results.

Assumption 7. For any $0 \le l_1 < l_2 \le 1$, the function

$$P_{l_1,l_2}(z) := \left\| P^*_{l_1,l_2}(z) \right\|_1$$

with

$$P^*_{l_1,l_2}(z) := \int_{l_1}^{z} g(t)\,dt - \frac{z - l_1}{l_2 - l_1} \int_{l_1}^{l_2} g(t)\,dt$$

is either constant or has a unique maximum for a $z \in [l_1, l_2]$.

This is fulfilled for example if there is one dominating break in one component, whereas the correlations in the other components stay constant. The condition is also fulfilled if there is a dominating break in all components at the same time point.

Based on this assumption, we can show consistency of the estimator, validity of the algorithm (in the sense that the number of change points is detected correctly asymptotically) and a local power result. The ideas of the proofs are similar to those in Galeano and Wied (2013) and Wied (2014), whereas we additionally have to ensure that the bootstrap variance estimator $\hat{E}_{\eta(l_1),\xi(l_2)}^{-1/2}$ is positive definite if there is a dominating change point between $l_1$ and $l_2$.

Theorem 1. Let Assumptions 2, 3, 4, 6 and 7 be true and let there be at least one break point in a given interval $[l_1, l_2] \subseteq [0, 1]$ with $l_1 < l_2$. Then the change point estimator (5) is consistent for the dominating change point.

Note that, for Theorem 1, we need not apply a functional central limit theorem so that we do not need Assumption 1. Moreover, as in Galeano and Wied (2013), one could relax Assumption 2 by only assuming the existence of finite q-th moments for a q > 1. Finally, we do not need Assumption 5 on a block length as the bootstrap estimator is excluded from the argmax estimator.

In addition to a consistency result, a statistician is also interested in the convergence rate of the change point estimator. Such a result is given in Theorem 2.

Theorem 2. Let the assumptions from Theorem 1 be true. Then, for every $i = 1, \ldots, \ell$ and $\epsilon > 0$, there is an $M > 0$ such that

Interestingly, the convergence rate is not $\sqrt{T}$, the rate that would hold, e.g., under the central limit theorem. This is so despite the fact that the change point estimator is based on an argmax, as is the usual maximum likelihood estimator. An intuition behind this can be obtained in a simple bivariate model with independent, bivariate normally distributed random variables and one change point, in which the change point fraction $z$ is the only parameter. The likelihood in this model is given by

$$l(z \mid (X_1, Y_1), \ldots, (X_T, Y_T)) = \prod_{i=1}^{T} \left( f_{1,i}((X_i, Y_i)) 1_{i \le [Tz]} + f_{2,i}((X_i, Y_i)) 1_{i > [Tz]} \right),$$

where $f_{1,i}$ and $f_{2,i}$ are the densities of $(X_i, Y_i)$ before and after the break, respectively. Here, one directly sees that the likelihood is not differentiable in the change point, so that standard regularity conditions for $\sqrt{T}$-asymptotics do not hold.

The preceding theorem is of independent interest, but is also needed in order to ensure that the asymptotic behavior of the test statistic calculated from $\tau(\hat{z}_i)$ to $\tau(\hat{z}_{i+1})$ is similar to that calculated from $\tau(z_i)$ to $\tau(z_{i+1})$. This will be important in the proof of Theorem 3, especially for the fact that the number of change points is not overestimated asymptotically. In fact, one also needs such a condition in the proof of Theorem 2 in Galeano and Wied (2013) (although it is not explicitly stated there).

While the convergence results above are important, our main interest lies in consistently estimating the number of change points. For this, we need the assumptions for applying a functional central limit theorem and an additional assumption on the critical values.

Assumption 8. The critical values $c_{T,\alpha_k}$ used in the algorithm obey the condition $\lim_{T \to \infty} c_{T,\alpha_k} = \infty$ and $c_{T,\alpha_k} = o(\sqrt{T})$ for $k \in \mathbb{N}_0$.

Theorem 3. Under Assumptions 1, 2, 3, 4, 5, 6, 7 and 8, the change point algorithm asymptotically gives the correct number of change points $\ell$ and the change points are consistently estimated.

Finally, in this section, we want to address the case in which the correlation shifts tend to zero with rate $\frac{1}{\sqrt{T}}$, that is, we replace $E(X_t X_t') = m_{XX'} + g\left(\frac{t}{T}\right)$ by $E(X_t X_t') = m_{XX'} + \frac{1}{\sqrt{T}} g\left(\frac{t}{T}\right)$. In this setting, Wied (2014) provides local power results (compare his Corollary 2). We no longer have consistency for the true break point, but the change point estimator converges to a non-degenerate random variable, as the next theorem shows.

Theorem 4. Let Assumptions 2, 3, 4, 6 (with $E(X_t X_t') = m_{XX'} + g\left(\frac{t}{T}\right)$ replaced by $E(X_t X_t') = m_{XX'} + \frac{1}{\sqrt{T}} g\left(\frac{t}{T}\right)$) and 7 be true and let there be at least one break point in a given interval $[l_1, l_2] \subseteq [0, 1]$ with $l_1 < l_2$. Then it holds for the change point estimator (5) that

$$\hat{z} \to_d \arg\max_{l_1 \le z \le l_2} \left\| E \left( W^{\frac{p(p-1)}{2}}(z) - W^{\frac{p(p-1)}{2}}(l_1) - \frac{z - l_1}{l_2 - l_1} \left( W^{\frac{p(p-1)}{2}}(l_2) - W^{\frac{p(p-1)}{2}}(l_1) \right) \right) + P^*_{l_1,l_2}(z) \right\|_1,$$

where $P^*_{l_1,l_2}(z)$ is from Assumption 7 and $W^{\frac{p(p-1)}{2}}(z)$ is a $\frac{p(p-1)}{2}$-dimensional standard Brownian motion.

4. Simulation evidence

In this section, we present several Monte Carlo experiments to illustrate the performance of the proposed algorithm in finite samples. In particular, we focus our attention on three important aspects: first, the empirical size of the procedure; second, its power in correctly detecting changes; and third, its ability to accurately identify the location of the change points.

In all the Monte Carlo experiments in this section and the real data example in Section 5, the critical values used by the proposed procedure are the estimated quantiles of the distribution of the supremum of the sum of the absolute values of independent Brownian bridges. In particular, as the dimension of the series in the simulations of this section and the real data example of Section 5 is p = 4, we obtain the estimated quantiles by generating 100000 sets of 6 independent Brownian bridges on a fine grid of 1000 points in the interval [0, 1]. Then, for each set, we take the absolute values of the observed Brownian bridges, add the six of them and, finally, obtain the maximum of the sums over the generated 1000 points. In this way, we obtain a sample of 100000 random values of the required distribution, from which we can easily estimate the quantiles. For instance, the first five quantiles used in steps 1 and 2 of the procedure, if needed, are given by 4.4366, 4.6890, 4.8298, 4.9230 and 4.9907, respectively. A histogram of the 100000 random values and a kernel estimate of their density function are shown in Figure 1. The plot suggests that the asymptotic distribution is slightly positively skewed.
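The Monte Carlo scheme just described can be sketched in Python. The function name `sup_abs_bridge_sample`, the use of NumPy and the reduced number of 2000 replications (instead of 100000, to keep the run short) are assumptions of this sketch; the bridge is built from a discretized Brownian motion via $B(s) = W(s) - sW(1)$.

```python
import numpy as np

def sup_abs_bridge_sample(dim, n_rep=2000, n_grid=1000, rng=None):
    """Draw n_rep values of max_s sum_i |B_i(s)| for dim independent
    Brownian bridges approximated on a grid of n_grid points in (0, 1]."""
    rng = np.random.default_rng(rng)
    s = np.arange(1, n_grid + 1) / n_grid
    out = np.empty(n_rep)
    for r in range(n_rep):
        steps = rng.standard_normal((n_grid, dim)) / np.sqrt(n_grid)
        w = np.cumsum(steps, axis=0)           # Brownian motion on the grid
        bridge = w - s[:, None] * w[-1]        # B(s) = W(s) - s * W(1)
        out[r] = np.abs(bridge).sum(axis=1).max()
    return out

# p = 4 gives p(p-1)/2 = 6 bridges; estimate the 95% quantile.
sample = sup_abs_bridge_sample(dim=6, n_rep=2000, rng=0)
c_95 = np.quantile(sample, 0.95)
```

With so few replications the estimate `c_95` is noisy, so it should only be expected to lie in the neighbourhood of the first value reported above; increasing `n_rep` tightens it at the cost of run time.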

Figure 1 about here

In the simulations that we report below, we consider several variants of the scalar BEKK model proposed by Ding and Engle (2001). There are two main reasons to use this model. First, our main fields of application of the proposed procedure are financial returns, and BEKK models are among the most widely used models for analyzing this kind of time series. Second, unlike in many other multivariate GARCH models, it is possible to derive the unconditional covariance and correlation matrices of the series, which allows us to easily simulate series with a changing unconditional correlation matrix.

We initially focus on the size of the procedure, i.e., the accuracy of the procedure in estimating the number of change points if the true value is zero. For that, we consider the scalar BEKK model given by:

$$X_t = H_t^{1/2} E_t$$

$$H_t = (1 - \alpha - \beta) H + \alpha X_{t-1} X_{t-1}' + \beta H_{t-1}$$

where $H_t$ is the conditional covariance matrix of $X_t$, $E_t$ are iid random vectors with mean $0_4$ and covariance matrix $I_4$, and $\alpha$ and $\beta$ are positive numbers such that $\alpha + \beta < 1$, to ensure covariance stationarity. Under these assumptions, it is not difficult to show that $H$ is the unconditional covariance matrix of $X_t$. Therefore, the unconditional correlation matrix of $X_t$ can be written as $R = D^{-1/2} H D^{-1/2}$, where $D$ is a diagonal matrix with the main diagonal of $H$. In particular, we take $\alpha = 0.14$ and $\beta = 0.85$ to reflect volatility persistence, and $D = I_4$, so that $H = R$, with:

$$R = \begin{pmatrix} 1 & 0.5 & 0.6 & 0.7 \\ 0.5 & 1 & 0.5 & 0.6 \\ 0.6 & 0.5 & 1 & 0.5 \\ 0.7 & 0.6 & 0.5 & 1 \end{pmatrix}. \qquad (7)$$
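A minimal Python sketch of how such a series could be simulated is given below. It is an illustration under stated assumptions, not the code used for the reported experiments: the function name `simulate_scalar_bekk`, the Gaussian errors, the Cholesky factor as the choice of $H_t^{1/2}$, the initialization $H_1 = H$ and the burn-in of 500 observations are all choices made here.

```python
import numpy as np

def simulate_scalar_bekk(T, H, alpha=0.14, beta=0.85, burn_in=500, rng=None):
    """Simulate T observations of X_t = H_t^{1/2} E_t with
    H_t = (1 - alpha - beta) H + alpha X_{t-1} X_{t-1}' + beta H_{t-1}."""
    rng = np.random.default_rng(rng)
    p = H.shape[0]
    h_t = H.copy()                       # assumed starting value H_1 = H
    x = np.empty((T + burn_in, p))
    for t in range(T + burn_in):
        chol = np.linalg.cholesky(h_t)   # one valid square root of H_t
        x[t] = chol @ rng.standard_normal(p)
        h_t = (1 - alpha - beta) * H + alpha * np.outer(x[t], x[t]) + beta * h_t
    return x[burn_in:]                   # drop the burn-in period

R = np.array([[1.0, 0.5, 0.6, 0.7],
              [0.5, 1.0, 0.5, 0.6],
              [0.6, 0.5, 1.0, 0.5],
              [0.7, 0.6, 0.5, 1.0]])
x = simulate_scalar_bekk(1000, R, rng=0)
```

A series with a change point could be obtained in the same way by switching the matrix `H` inside the loop at the desired time index.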

The random errors, $E_t$, are assumed to follow, first, a four-dimensional standard Gaussian distribution and, second, a four-dimensional standardized Student-t distribution with 3 degrees of freedom. The latter distribution represents a rather extreme situation not covered by our assumptions, but we analyze this case to see the performance of the procedure in settings which can occur in financial applications. The sample sizes considered are T = 500, 1000, 2000, 3000 and 4000, which are usual sample sizes of financial returns, while the block lengths are $l_T = T^{1/4}$ rounded down to the nearest integer, i.e., $l_{500} = 4$, $l_{1000} = 5$, $l_{2000} = 6$, $l_{3000} = 7$ and $l_{4000} = 7$, respectively. The number of bootstrap replications is B = 1000. Table 1 gives the results based on 1000 replications and an initial nominal significance level of $\alpha_0 = 0.05$. From this table, it seems that the type I error of the proposed procedure is very close to the initial nominal level for the Gaussian distribution even with the smallest sample size, while there are some small size distortions for the standardized Student-t with 3 degrees of freedom, although the level seems to converge to the initial nominal significance level of $\alpha_0 = 0.05$ for higher $T$. Therefore, overestimation does not appear to be an issue for the proposed procedure if there are no changes in the correlation.

Table 1 about here

Next, we analyze the power of our procedure when there is a single change point in the series. The Monte Carlo setup is similar to the one described above, but the series are generated with a single change point in the unconditional correlation matrix. Three locations of the change point are considered, z1 = 0.25, 0.50 and 0.75, respectively. The change is such that R is initially as in (7) and then changes at z1 to:

$$
R_1 = \begin{pmatrix}
1 & 0.7 & 0.6 & 0.5 \\
0.5 & 1 & 0.7 & 0.6 \\
0.6 & 0.5 & 1 & 0.7 \\
0.7 & 0.6 & 0.5 & 1
\end{pmatrix}. \tag{8}
$$

Note that the largest correlation change is of magnitude 0.2, while two of the correlations do not change at all. We believe that this setting is quite reasonable in practice. Table 2 shows the relative frequency of detection of zero, one and more than one change points. First, the procedure performs quite well in detecting a single change point, with correct detection in over 90% of the replications in many cases. Second, as expected, the procedure works better as the sample size increases. Third, when the sample size is small, the probability of under-detection may be large, but only if the errors are Student-t with 3 degrees of freedom. However, in practice, one does not expect to have many change points if the length of the series is small. Fourth, the location of the change point does not strongly affect the detection frequency of the procedure unless the sample size is small. In this latter case, the procedure detects the change point more frequently when it lies in the middle of the series. This is consistent with other procedures relying on CUSUM statistics such as the one used here. Finally, in most cases, the frequency of over-detection is small and quite close to the nominal 5%, especially in the Gaussian case. Regarding estimation of the location of the change point, Table 3 shows the median and mean absolute deviation of the change point estimators in each case. The table shows that the medians of the estimates are reasonably close to the true change point locations. Moreover, the larger the sample size, the smaller the empirical mean absolute deviation. In particular, the bias appears to be smaller in the Gaussian case.
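The statement about the size of the correlation changes can be checked directly from (7) and (8). The sketch below reads the six pairwise correlations of (8) from the entries above the diagonal; this reading is an assumption, since the entries printed below the diagonal of (8) do not mirror them.

```python
import numpy as np

# Pairwise correlations (i < j), ordered (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
r_before = np.array([0.5, 0.6, 0.7, 0.5, 0.6, 0.5])  # from (7)
r_after = np.array([0.7, 0.6, 0.5, 0.7, 0.6, 0.7])   # entries above the diagonal of (8), an assumption

change = r_after - r_before
largest_change = np.max(np.abs(change))               # magnitude of the largest change
n_unchanged = int(np.sum(np.isclose(change, 0.0)))    # number of correlations that stay fixed
print(np.round(change, 2), round(float(largest_change), 2), n_unchanged)
```

Under this reading, four of the six correlations move by 0.2 in absolute value and two stay fixed, in line with the description above.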


Table 3 about here

Next, we conduct another Monte Carlo experiment to study the power of the proposed procedure for detecting two change points. In this case, the locations of the change points are z1 = 0.35 and z2 = 0.7, respectively. The changes are such that the correlation matrix of the series before the first change point is the correlation matrix in (7), then changes to the correlation matrix in (8), and, finally, changes back to the correlation matrix in (7) at the second change point. Note that in this scenario there is no dominant change point, but we consider this situation to show that the procedure also works well in this case. Table 4 shows the relative frequency of detection of zero, one, two and more than two changes. As in the case of a single change point, the proposed procedure works reasonably well, especially when the sample size gets larger. In addition, the frequency of over-detection is small. The procedure may, however, underestimate the number of change points in the case of Student-t errors. On the other hand, Table 5 shows the median and mean absolute deviation of the estimates of the two change point locations. As expected in view of the results in Section 3, the medians of the estimates are reasonably close to the true ones. Again, it appears that the larger the sample size, the better the locations are estimated.
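To make the idea behind the location estimates more concrete, the following sketch computes a simplified CUSUM-type statistic on the vector of pairwise correlations and returns its maximizer. It is not the procedure studied in the paper: the variance normalization and the bootstrap critical values are omitted, and the function names, the minimum sample size `k_min` and the toy data are assumptions made for illustration.

```python
import numpy as np

def pairwise_correlations(X):
    """Vector of the p(p-1)/2 sample correlations with i < j."""
    C = np.corrcoef(X, rowvar=False)
    return C[np.triu_indices(C.shape[0], k=1)]

def cusum_change_point(X, k_min=10):
    """Return (k_hat, statistic) maximizing (k / sqrt(T)) * ||rho_k - rho_T||_1.

    Simplified sketch: no variance normalization and no bootstrap critical value.
    """
    T = X.shape[0]
    rho_T = pairwise_correlations(X)
    best_k, best_stat = k_min, -np.inf
    for k in range(k_min, T + 1):
        rho_k = pairwise_correlations(X[:k])      # correlations from the first k observations
        stat = k / np.sqrt(T) * np.sum(np.abs(rho_k - rho_T))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Toy data (hypothetical): the correlation jumps from 0 to 0.9 at t = 300 of T = 600.
rng = np.random.default_rng(1)
x = rng.standard_normal(600)
e = rng.standard_normal(600)
y = np.where(np.arange(600) < 300, e, 0.9 * x + np.sqrt(1 - 0.9**2) * e)
k_hat, stat = cusum_change_point(np.column_stack([x, y]))
print(k_hat / 600, stat)
```

Both series keep zero mean and unit variance throughout, so only the correlation changes, which mirrors the constant-expectation and constant-variance setting of the paper.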

Table 4 about here

Table 5 about here

5. Empirical application

In this section, we illustrate the performance of the proposed procedure with a financial time series. For this, we look for changes in the correlation structure of the daily simple return series of four stocks. Specifically, as in Wied (2014), we consider four European companies, Total, Sanofi, Siemens and BASF, from January 1, 2007 to June 1, 2012, consisting of T = 1414 data points. The data were obtained from the database Datastream. The four return series are plotted in Figure 2, which shows very similar patterns. The autocorrelation functions of the simple returns show some minor serial dependence, while the autocorrelation functions of the squared simple returns reveal considerable serial dependence, as usual in stock market returns.

The empirical full sample correlation matrix is given by:

$$
R = \begin{pmatrix}
1 & 0.5483 & 0.6460 & 0.6734 \\
0.5483 & 1 & 0.4821 & 0.4998 \\
0.6460 & 0.4821 & 1 & 0.7208 \\
0.6734 & 0.4998 & 0.7208 & 1
\end{pmatrix}.
$$

Figure 3 shows rolling correlations for the six pairs of simple return series with a window length of 120 observations, which roughly corresponds to a trading time of about half a year. The plots show time-varying correlations. It is interesting to see several correlation ups and downs.
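A rolling correlation of this kind can be computed as in the following sketch. The function name and the synthetic stand-in series are assumptions; the actual return data are not reproduced here.

```python
import numpy as np

def rolling_correlation(x, y, window=120):
    """Correlation of x and y over each run of `window` consecutive observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_windows = len(x) - window + 1
    out = np.empty(n_windows)
    for start in range(n_windows):
        stop = start + window
        out[start] = np.corrcoef(x[start:stop], y[start:stop])[0, 1]
    return out

# Hypothetical stand-in for two return series of length T = 1414.
rng = np.random.default_rng(0)
common = rng.standard_normal(1414)
ret_a = 0.01 * (common + rng.standard_normal(1414))
ret_b = 0.01 * (common + rng.standard_normal(1414))
roll = rolling_correlation(ret_a, ret_b, window=120)
print(len(roll), roll.min(), roll.max())
```

With T = 1414 and a window of 120 observations, each pair of series yields 1295 rolling estimates.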

Figure 2 about here

Figure 3 about here

Next, we apply the proposed segmentation procedure of Section 2 to detect correlation changes for the simple returns of the Total, Sanofi, Siemens and BASF stock assets. Table 6 shows the iterations taken by the procedure. In the first step, we start with the asymptotic critical value at the 5% significance level and the procedure detects a change in the correlation at time point t = 443 (September 11, 2008). The value of the test statistic (1) is 6.3280, which is statistically significant at the 5% level. Then, we split the series into two subperiods and look for changes in the subintervals [1, 443] and [444, 1414], respectively. In the first subinterval, the procedure detects a change point at


time point t = 134 (July 6, 2007). The value of the test statistic is 4.8159. Then, we split the subinterval [1, 443] into two subintervals, [1, 134] and [135, 443], respectively, and look again for new change points. No more changes were found in the three subintervals [1, 134], [135, 443], and [444, 1414]. Then, we pass to step 3 (the refinement step) and compute the statistic in the subintervals [1, 443], and [135, 1414], respectively. In the first subinterval, the procedure detects a change point at time point t = 134 (July 6, 2007) and the value of the test statistic is 4.7438. In the second subinterval, the procedure detects a change point at time point t = 443 (September 11, 2008) and the value of the test statistic is 5.5399. As the detected change points are the same as in the previous iterations, the algorithm stops and the time points located at t = 134 (July 6, 2007) and 443 (September 11, 2008) are the final detected change points.
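The bookkeeping of the intervals in this walk-through can be sketched as follows. Only the interval logic is shown, not the test itself; the helper names `split_interval` and `refinement_intervals` are hypothetical.

```python
def split_interval(interval, change_point):
    """Split [a, b] at a detected change point k into [a, k] and [k + 1, b]."""
    a, b = interval
    return (a, change_point), (change_point + 1, b)

def refinement_intervals(change_points, T):
    """For each change point, the interval delimited by its two neighbours.

    The left end is the previous change point plus one (or 1 for the first one),
    the right end is the next change point (or T for the last one).
    """
    cps = sorted(change_points)
    bounds = [0] + cps + [T]
    return [(bounds[i - 1] + 1, bounds[i + 1]) for i in range(1, len(bounds) - 1)]

first, second = split_interval((1, 1414), 443)
left, right = split_interval(first, 134)
print(first, second)                               # (1, 443) (444, 1414)
print(left, right)                                 # (1, 134) (135, 443)
print(refinement_intervals([134, 443], T=1414))    # [(1, 443), (135, 1414)]
```

The last line reproduces the two subintervals examined in the refinement step described above.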

Table 6 about here

It is interesting to see that the dates of the detected change points agree well with well-known financial events. The first estimated change point corresponds to the beginning of the Global Financial Crisis around the middle of 2007. The reduction of interest rates led to a chain of consequences: easy access to credit, which led to sub-prime lending, an increased debt burden and, finally, a liquidity shortfall in the banking system. This shortfall resulted in the collapse of important financial institutions, such as Lehman Brothers and Merrill Lynch, among others, and in the bailout by national governments of banks such as Bear Stearns, Bank of America and Bankia, among others. Specifically, the bankruptcy of Lehman Brothers was formally announced on September 15, 2008, after a week of rumours, which is very close to the second estimated change point.

Next, Table 7 shows the correlation matrices of the four simple return series of the Total, Sanofi, Siemens and BASF stocks for the three periods of constant unconditional correlation provided by the procedure. Note how all the pairwise correlations increase after each detected change point. For instance, the correlation between Total and Sanofi rises from 0.1564 to 0.3907 at the first change point, and then from 0.3907 to 0.5990 at the second change point. This is in accordance with the phenomenon known as “correlation meltdown”, according to which the correlations between financial returns often increase in times of crisis.
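The claim that every pairwise correlation increases at both change points can be verified from the entries of Table 7. In the sketch below, the ordering of the pairs assumes that the rows and columns of the matrices follow the order Total, Sanofi, Siemens, BASF, which is consistent with the Total and Sanofi values quoted above.

```python
import numpy as np

# Pairwise correlations (i < j) per period from Table 7, in the assumed order
# Total-Sanofi, Total-Siemens, Total-BASF, Sanofi-Siemens, Sanofi-BASF, Siemens-BASF.
periods = np.array([
    [0.1564, 0.3275, 0.4917, 0.0708, 0.2261, 0.3604],  # first period
    [0.3907, 0.5148, 0.5677, 0.4352, 0.4924, 0.5110],  # second period
    [0.5990, 0.6924, 0.6986, 0.5159, 0.5168, 0.7857],  # third period
])

# Row 0: change from first to second period; row 1: from second to third.
increments = np.diff(periods, axis=0)
all_increase = bool(np.all(increments > 0))
print(np.round(increments, 4))
print(all_increase)
```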

Table 7 about here

6. Conclusions

This paper has proposed a procedure for detecting change points in the correlation matrix of a sequence of multiple random variables. The procedure is based on a CUSUM test statistic proposed by Wied (2014). The asymptotic distribution of the test statistic is that of the supremum of the maximum of the absolute value of independent Brownian bridges. We have shown that, under certain circumstances, the procedure consistently detects the true number and location of the change points. The finite sample behavior of the procedure has been analyzed via several simulation studies and illustrated with the analysis of a four-dimensional time series of simple returns of four European companies. The real data example suggests that the procedure detects changes at points that agree well with external events affecting the financial markets.

An alternative method to detect changes in the correlation matrix of a sequence of random variables is to consider each entry in a higher dimensional correlation matrix separately to determine whether there have been changes in the individual correlations. However, if the dimension is large enough, the number of comparisons between correlations can be huge. Consequently, we think that the use of the whole correlation matrix is a more elegant and compact way to detect this kind of change.

Finally, it might be interesting to consider a more sophisticated algorithm that includes modifications of the standard binary segmentation procedure such as the ones introduced


in Fryzlewicz (2014) to increase the power of the procedure in small samples. However, the modification made in Fryzlewicz (2014) is not straightforward in the case of changes in the correlation matrix.

References

Andreou, E. and E. Ghysels (2002): “Detecting multiple breaks in financial market volatility dynamics,” Journal of Applied Econometrics, 17(5), 579–600.

Aue, A., S. Hörmann, L. Horváth, and M. Reimherr (2009): “Break detection in the covariance structure of multivariate time series models,” Annals of Statistics, 37(6B), 4046–4087.

Aue, A. and L. Horváth (2013): “Structural breaks in time series,” Journal of Time Series Analysis, 34(1), 1–16.

Bai, J. (1997): “Estimating multiple breaks one at a time,” Econometric Theory, 13(3), 315–352.

Bai, J. and P. Perron (1998): “Estimating and testing linear models with multiple structural changes,” Econometrica, 66(1), 47–78.

Berens, T., G. Weiß, and D. Wied (2013): “Testing for structural breaks in correlations: Does it improve Value-at-Risk forecasting?” SSRN working paper, online: http://ssrn.com/abstract=2265488.

Calhoun, G. (2013): “Block bootstrap consistency under weak assumptions,” Iowa State Working Paper, version 25.03.2013, http://www.econ.iastate.edu/sites/default/files/publications/papers/p14313-2011-09-23.pdf.

Carrasco, M. and X. Chen (2002): “Mixing and moment properties of various GARCH and stochastic volatility models,” Econometric Theory, 18(1), 17–39.


Davidson, J. (1994): Stochastic limit theory: An introduction for econometricians, Oxford University Press.

Ding, Z. and R. Engle (2001): “Large Scale Conditional Covariance Matrix Modeling, Estimation and Testing,” Academia Economic Papers, 29(2), 157–184.

Fryzlewicz, P. (2014): “Wild binary segmentation for multiple change-point detection,” Working paper, London School of Economics.

Fryzlewicz, P. and S. S. Rao (2014): “Multiple-change-point detection for auto-regressive conditional heteroscedastic processes,” to appear in: Journal of the Royal Statistical Society, Series B, doi: 10.1111/rssb.12054.

Galeano, P. (2007): “The use of cumulative sums for detection of changepoints in the rate parameter of a Poisson Process,” Computational Statistics and Data Analysis, 51(12), 6151–6165.

Galeano, P. and R. Tsay (2010): “Shifts in individual parameters of a GARCH model,” Journal of Financial Econometrics, 8(1), 122–153.

Galeano, P. and D. Wied (2013): “Multiple break detection in the correlation structure of random variables,” to appear in: Computational Statistics and Data Analysis, online: http://dx.doi.org/10.1016/j.csda.2013.02.031.

Gooijer, J. D. (2006): “Detecting change-points in multidimensional stochastic processes,” Computational Statistics and Data Analysis, 51(3), 1892–1903.

Inclán, C. and G. Tiao (1994): “Use of cumulative sums of squares for retrospective detection of changes of variance,” Journal of the American Statistical Association, 89(427), 913–923.

Jandhyala, V., S. Fotopoulos, I. McNeill, and P. Liu (2013): “Inference for single and multiple change-points in time series,” to appear in: Journal of Time Series Analysis, doi: 10.1111/jtsa12035.


Vostrikova, L. (1981): “Detecting disorder in multidimensional random processes,” Soviet Mathematics Doklady, 24, 55–59.

Wied, D. (2014): “A nonparametric test for a constant correlation matrix,” SSRN working paper, online: http://ssrn.com/abstract=2372748.

Wied, D., W. Krämer, and H. Dehling (2012): “Testing for a change in correlation at an unknown point in time using an extended functional delta method,” Econometric Theory, 28(3), 570–589.

A. Appendix

A.1. Proofs

Proof of Theorem 1

Applying the proof of Theorem 1 in Galeano and Wied (2013) component by component and making use of the fact that a variance estimator like $\hat{E}^{-1/2}_{\eta(l_1),\xi(l_2)}$ is excluded, one shows that, uniformly on $[l_1, l_2]$,

$$
B_{\eta(l_1),\xi(l_2)}(z) \to_{a.s.} C\,P_{l_1,l_2}(z)
$$

for a certain constant $C$. Then, the theorem follows by applying the argmax continuous mapping theorem.

Proof of Theorem 2

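We assume without loss of generality that there is only one change point in $k_0 = [T z_0]$ and that $[l_1, l_2] = [0, 1]$. Denote $P_{0,1}(z) =: P(z)$. Then $P(z)$ has a unique maximum in $z_0$. Similarly as in the proof of Proposition 2 in Bai (1997), we show that

$$
P_{M,T} := P\left( \max_{|k-k_0|>M} B_{1,T}\left(\frac{k}{T}\right) - B_{1,T}\left(\frac{k_0}{T}\right) \ge 0 \right)
$$

becomes small for large $M$ and $T$. That means that, for every $\varepsilon > 0$, there is an $M > 0$

Now, $B_{1,T}\left(\frac{k}{T}\right) - B_{1,T}\left(\frac{k_0}{T}\right) \ge 0$ is equivalent to

$$
B_{1,T}\left(\frac{k}{T}\right) - P\left(\frac{k}{T}\right) - \left( B_{1,T}\left(\frac{k_0}{T}\right) - P\left(\frac{k_0}{T}\right) \right) + P\left(\frac{k}{T}\right) - P\left(\frac{k_0}{T}\right) \ge 0.
$$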

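Assume for the moment $k > k_0$ and that the standard deviations of all random variables are equal to 1. (Divide each component of $P(\cdot)$ by the standard deviations if the latter assumption is not fulfilled.) We multiply the whole equation with $T/(k - k_0)$ and denote

$$
A_1(k, k_0, T) = \frac{T}{k-k_0}\left[ B_{1,T}\left(\frac{k}{T}\right) - P\left(\frac{k}{T}\right) - \left( B_{1,T}\left(\frac{k_0}{T}\right) - P\left(\frac{k_0}{T}\right) \right) \right],
$$

$$
A_2(k, k_0, T) = P\left(\frac{k}{T}\right) - P\left(\frac{k_0}{T}\right).
$$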

Now, we use several observations in order to argue that the asymptotic behavior of $A_1(k, k_0, T)$ can be reduced to the behavior of

$$
\sum_{1 \le i < j \le p} \frac{1}{k-k_0} \sum_{t=k_0}^{k} \left( X_{t,i}X_{t,j} - E(X_{t,i}X_{t,j}) \right).
$$

This quantity is then arbitrarily small by the law of large numbers for sufficiently large $M$. The observations are the following:

1. $A_1(k, k_0, T)$ can be regarded as the sum of $p(p-1)/2$ components and each component can be treated separately.

2. For large $T$, with high probability and uniformly in $z \in \{k/T, k_0/T\}$, all components of $T P_{\tau(z),1,T}$ and $T P^{*}_{0,1}(z)$ have the same sign so that we do not need the absolute values.

3. For large $T$, with high probability and uniformly in $z \in \{k/T, k_0/T\}$, the successive variances in the denominators of the components of $T P_{\tau(z),1,T}$ are equal to their

4. In the numerators, we have expressions like

$$
\frac{1}{k-k_0}\left( \left( \sum_{t=k_0}^{k} X_{t,i}X_{t,j} \right) - (k-k_0)\left( \sum_{t=1}^{T} X_{t,i}X_{t,j} \right) - \left( P_{i,j}\left(\frac{k}{T}\right) - P_{i,j}\left(\frac{k_0}{T}\right) \right) - \left( \frac{1}{k}\sum_{t=1}^{k} X_{t,i} \sum_{t=1}^{k} X_{t,j} - \frac{1}{k_0}\sum_{t=1}^{k_0} X_{t,i} \sum_{t=1}^{k_0} X_{t,j} - (k-k_0)\,\frac{1}{T}\sum_{t=1}^{T} X_{t,i}\,\frac{1}{T}\sum_{t=1}^{T} X_{t,j} \right) \right).
$$

Here, $P_{i,j}(\cdot)$ are the components of $P(\cdot)$.

5. One can uniformly approximate $\int_0^z g(t)\,dt$, that means,

$$
\lim_{T \to \infty} \sup_{z \in [0,1]} \left\| \int_0^z g(t)\,dt - \frac{1}{T}\sum_{t=1}^{[zT]} X_t X_t' \right\|_1 = 0.
$$

Then, after some tedious calculations, one sees that $A_1(k, k_0, T)$ is a random variable such that, for all $\varepsilon > 0$ and all $\eta > 0$, there is an $M > 0$ such that $P(|A_1(k, k_0, T)| > \varepsilon) < \eta$ for $k > k_0 + M$ and $T > T_0$. This means that $A_1(k, k_0, T)$ is arbitrarily small whenever $T$ and $M$ are large. On the other hand, $A_2(k, k_0, T)$ does not converge to zero: $P\left(\frac{k}{T}\right) - P\left(\frac{k_0}{T}\right)$ is a finite sum of linear functions in $k$ with negative slope (see Figure 1 in Galeano and Wied (2013)) so that it is equal to $C\left(\frac{k}{T} - \frac{k_0}{T}\right)$ for a $C < 0$ by Taylor's formula. Multiplied with $T/(k - k_0)$, the expression is equal to $C$. Then, with large probability, $A_1(k, k_0, T) + A_2(k, k_0, T)$ is strictly negative. For $k < k_0$, the argument is similar and the theorem is proven.

Proof of Theorem 3

Denote by $Q_T^{l_1,l_2} := \sup_{z \in [l_1,l_2]} A_{\eta(l_1),\xi(l_2)}(z)$ the test statistic calculated from data from $\eta(l_1)$ to $\xi(l_2)$. Now, by Theorem 1, it holds

$$
\frac{1}{a_T^k}\, B_{\eta(l_1),\xi(l_2)}(z) \to_p \infty
$$

for any sequence $a_T^k = o(\sqrt{T})$ if there is a change point in the interval $[l_1, l_2]$. Moreover,

Consequently, the eigenvalues of $\hat{E}^{-1/2}_{\eta(l_1),\xi(l_2)}$ (remember that we assume its existence) are bounded away from zero and the matrix is positive definite. Therefore,

$$
\frac{1}{a_T^k}\, Q_T^{l_1,l_2} \to_p \infty
$$

(with respect to the measure $P^{\times}$) for any sequence $a_T^k = o(\sqrt{T})$ if there is a change point in the interval $[l_1, l_2]$. By Theorem 2, we moreover have

$$
\frac{1}{a_T^k}\, Q_T^{\hat{z}_i,\hat{z}_{i+1}} \to_p \infty
$$

(with respect to the measure $P^{\times}$), where $\hat{z}_i$ and $\hat{z}_{i+1}$ for $i \in \mathbb{N}_0$ are two estimated change points in one of the iterations of the algorithm, as long as there is a change point in the interval $[z_i, z_{i+1}]$. This follows from the fact that, by Theorem 2,

$$
Q_T^{\hat{z}_i,\hat{z}_{i+1}} - Q_T^{z_i,z_{i+1}} = o_{P^{\times}}\left( \frac{1}{\sqrt{T}} \right).
$$

Moreover, with the same argument,

$$
Q_T^{\hat{z}_i,\hat{z}_{i+1}} = O_{P^{\times}}(1)
$$

if there is no change point in the interval $[z_i, z_{i+1}]$.

With this result, we can proceed as in the proof of Theorem 2 in Galeano and Wied (2013).

Proof of Theorem 4

This follows similarly to the proof of Theorem 3 in Galeano and Wied (2013), again making use of the fact that a variance estimator like $\hat{E}^{-1/2}_{\eta(l_1),\xi(l_2)}$ is excluded.

Table 1: Relative frequency of detection of 0 and of at least 1 change point with the scalar BEKK model, with an initial nominal significance level of α0 = 0.05.

         Gaussian         Student-t3
T        0       ≥ 1      0       ≥ 1
500      .944    .056     .927    .073
1000     .952    .048     .955    .045
2000     .943    .057     .941    .059
3000     .952    .048     .940    .060
4000     .957    .043     .954    .046

Table 2: Relative frequency of detection of 0, 1 and more than 1 change points with the scalar BEKK model with a single change point, with an initial nominal significance level of α0 = 0.05.

                       z1 = .25              z1 = .50              z1 = .75
Errors       T         0      1      ≥ 2     0      1      ≥ 2     0      1      ≥ 2
Gaussian     500       .044   .913   .021    .001   .961   .038    .045   .915   .040
             1000      .004   .935   .044    .000   .949   .051    .000   .944   .056
             2000      .000   .939   .041    .000   .937   .063    .000   .934   .066
             3000      .002   .941   .057    .000   .939   .061    .001   .934   .065
             4000      .002   .945   .053    .000   .949   .051    .000   .946   .054
Student-t3   500       .558   .426   .016    .346   .630   .024    .629   .365   .006
             1000      .255   .715   .030    .051   .909   .040    .271   .705   .024
             2000      .020   .922   .058    .002   .936   .062    .018   .925   .057
             3000      .001   .949   .050    .001   .940   .059    .003   .942   .055
             4000      .000   .941   .059    .001   .944   .057    .003   .941   .056

Table 3: Median and MAD of the change point estimators for the results in Table 2. Each cell reports Median (MAD).

         z1 = .25                          z1 = .50                          z1 = .75
T        Gaussian        Student-t3        Gaussian        Student-t3        Gaussian        Student-t3
500      .282 (.0593)    .342 (.1660)      .502 (.0326)    .504 (.1097)      .724 (.0593)    .628 (.1957)
1000     .257 (.0237)    .303 (.1008)      .500 (.0163)    .502 (.0607)      .746 (.0222)    .711 (.0830)
2000     .253 (.0118)    .270 (.0500)      .500 (.0088)    .503 (.0300)      .748 (.0107)    .740 (.0429)
3000     .252 (.0074)    .259 (.0286)      .500 (.0064)    .501 (.0227)      .748 (.0074)    .745 (.0289)
4000     .251 (.0063)    .254 (.0203)      .500 (.0048)    .501 (.0187)      .749 (.0063)    .746 (.0196)


Table 4: Relative frequency of detection of 0, 1, 2 and more than 2 change points with the scalar BEKK model with two change points, with an initial nominal significance level of α0 = 0.05.

(z1, z2) = (.35, .7)

         Gaussian                        Student-t3
T        0      1      2      ≥ 3       0      1      2      ≥ 3
500      .604   .076   .305   .015      .856   .125   .019   .000
1000     .095   .006   .836   .063      .730   .144   .121   .005
2000     .000   .000   .927   .069      .369   .053   .527   .051
3000     .000   .004   .937   .063      .145   .014   .779   .062
4000     .000   .001   .942   .057      .031   .004   .903   .062

Table 5: Median and MAD of the change point estimators for the results in Table 4.

(z1, z2) = (.35, .7). Each cell reports Mean (MAD), as labelled in the column headers.

         Gaussian                          Student-t3
T        ẑ1              ẑ2                ẑ1              ẑ2
500      .348 (.0296)    .700 (.0326)      .348 (.0652)    .728 (.0593)
1000     .351 (.0222)    .699 (.0177)      .351 (.0489)    .702 (.0429)
2000     .350 (.0096)    .699 (.0096)      .353 (.0303)    .701 (.0363)
3000     .350 (.0069)    .700 (.0064)      .351 (.0247)    .700 (.0207)
4000     .350 (.0044)    .700 (.0048)      .351 (.0177)    .700 (.0170)

Table 6: Iterations taken by the procedure in the real data example; (*) means statistically significant change point. The initial nominal significance level is α0 = 0.05.

Interval        A             Change point   Time point   Date

Step 1
[1, 1414]       6.3280 (*)    0.3132         443          September 11, 2008

Step 2
[1, 443]        4.8159 (*)    0.0947         134          July 6, 2007
[444, 1414]     2.1415        0.8437         1193         July 28, 2011
[1, 134]        4.2863        0.0827         117          July 13, 2007
[135, 443]      3.6897        0.2220         314          March 4, 2008
[444, 1414]     2.1415        0.8437         1193         July 28, 2011

Step 3
[1, 443]        4.7438 (*)    0.0947         134          July 6, 2007


Table 7: Correlation matrices in each period.

Period    Empirical correlation matrix

First     1        0.1564   0.3275   0.4917
          0.1564   1        0.0708   0.2261
          0.3275   0.0708   1        0.3604
          0.4917   0.2261   0.3604   1

Second    1        0.3907   0.5148   0.5677
          0.3907   1        0.4352   0.4924
          0.5148   0.4352   1        0.5110
          0.5677   0.4924   0.5110   1

Third     1        0.5990   0.6924   0.6986
          0.5990   1        0.5159   0.5168
          0.6924   0.5159   1        0.7857
          0.6986   0.5168   0.7857   1


Figure 1: Histogram of 100000 generated values from the asymptotic distribution of the A_{1,T} statistic with a kernel density estimate. (Vertical axis: Density.)

Figure 2: Daily simple returns of Total, Sanofi, Siemens and BASF. (Four panels: Total, Sanofi, Siemens, BASF. Horizontal axis: Time, 2007 to 2012. Vertical axis: Returns.)

Figure 3: Rolling correlations for the daily simple returns of Total, Sanofi, Siemens and BASF. (Six panels: Total and Sanofi, Total and Siemens, Total and BASF, Sanofi and Siemens, Sanofi and BASF, Siemens and BASF. Horizontal axis: Time, 2007 to 2012. Vertical axis: Correlations, 0.0 to 1.0.)
