Graphical Models in Time Series Analysis



Michael Eichler

INAUGURAL-DISSERTATION

submitted for the doctoral degree (Doktorwürde) of the Naturwissenschaftlich-Mathematische Gesamtfakultät of the Ruprecht-Karls-Universität Heidelberg, 1999


Contents

1 Introduction
1.1 Outline of the thesis

2 Graphical models for time series
2.1 Conditional correlation graphs
2.2 Causality graphs
2.2.1 Definition
2.2.2 Markov properties
2.2.3 Concluding remarks

3 Nonparametric analysis
3.1 Introduction
3.2 Testing for interrelation
3.2.1 Asymptotic null distribution
3.2.2 Asymptotic local power of the test
3.2.3 Finite sample performance
3.3 Time domain analysis
3.3.1 Partial correlation functions
3.3.2 Empirical partial spectral processes
3.3.3 Interrelation analysis in the time domain: An example

4 Selection of graphical interaction models
4.1 Model fitting
4.2 Asymptotic efficiency of a model selection
4.3 Asymptotically efficient model selection

5 Selection of causal graphical models
5.1 Introduction
5.2 Asymptotic properties of the final prediction error
5.3 Asymptotically efficient model selection
5.3.1 Asymptotic efficiency of $C_T(p, G)$
5.3.2 Other model selection criteria
5.3.3 Model selection with estimated $\Sigma$
5.4 Proofs and auxiliary results

Appendix
A Properties of L-Functions
B Matrices and Norms


1 Introduction

The origins of graphical models can be traced back to the beginning of the century, when Gibbs (1902) used local neighbourhood relationships to describe the interactions in large systems of particles. Another source has been genetics, where Wright (1921, 1934) developed the so-called path analysis for the study of hereditary properties, linking parents and children graphically by arrows. In statistics, Bartlett (1935) studied the notion of interaction in three-way contingency tables and arrived at a description of interaction similar to that in statistical physics. But it has been only within the last decades that the similarities between these different methods and also newer developments have been recognized (e.g. Darroch et al., 1980; Wermuth, 1976), which has led to the unified theory of graphical models for multivariate data. Since then, graphical models have become increasingly popular.

Graphical models allow one to describe and manipulate conditional independence relations between variables in multivariate data. These relations are typically visualized by undirected graphs where the vertices represent the variables and edges between the vertices indicate conditional dependence. In situations where one variable is regarded as a response and the other as an explanatory variable, directed acyclic graphs and chain graphs can be used to incorporate such hypotheses into the statistical model (e.g. Wermuth and Lauritzen, 1990). Recently, directed acyclic graphs have also been used for the interpretation, conjecture, and discovery of causal relations between variables (e.g. Pearl, 1995; Spirtes et al., 1993).

In the analysis of stationary time series, Brillinger (1996) and independently Dahlhaus (1996) have introduced graphical models as a tool for visualizing the interaction structure between components of a multivariate process. Their approach is based on partialized frequency domain statistics like the partial spectral coherence, which are obtained by removing from two fixed component processes the linear effects of all other components. If the partial spectral coherence vanishes at all frequencies, the components are conditionally uncorrelated given all other components. This leads to the undirected conditional correlation graph, where each component of the process is represented by one vertex in the graph. In Dahlhaus et al. (1997), conditional correlation graphs for stochastic point processes are used to detect synaptic connections in neural networks, which also includes the identification of the direction of connections.

When concerned with data, the true conditional correlation graph of the process must be estimated. Dahlhaus (1996) and Dahlhaus et al. (1997) suggest estimating the partial spectral coherence nonparametrically, e.g. by use of spectral kernel estimates. The missing edges in the graph are then determined by a series of tests. Alternatively, we could also think of fitting a parametric model to the data. The estimation of the conditional correlation graph then becomes a problem of model selection, in which the best approximating graph is determined by minimizing an appropriate model selection criterion.
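As a rough illustration of this estimation route (not the procedure of Dahlhaus (1996) in detail; the uniform smoothing kernel, the bandwidth, and all function names below are our own illustrative choices), one can smooth the periodogram and read the partial spectral coherences off the inverse of the estimated spectral matrix, a relation made precise in Section 2.1:

```python
import numpy as np

def smoothed_periodogram(x, h=10):
    """Kernel estimate of the spectral matrix of a T x d series x:
    the periodogram averaged over 2h+1 neighbouring Fourier frequencies."""
    T, d = x.shape
    dft = np.fft.fft(x - x.mean(axis=0), axis=0)
    # periodogram matrices I(lambda_k) = d(lambda_k) d(lambda_k)^* / (2 pi T)
    I = np.einsum('ka,kb->kab', dft, dft.conj()) / (2 * np.pi * T)
    # uniform (Daniell-type) smoothing over frequencies, wrapping around
    return sum(np.roll(I, s, axis=0) for s in range(-h, h + 1)) / (2 * h + 1)

def partial_coherences(f):
    """|R_{ab|C}|^2 at every frequency, via the inverse spectral matrix
    g = f^{-1} and the relation R_{ab|C} = -g_ab / sqrt(g_aa g_bb)."""
    g = np.linalg.inv(f)
    diag = np.abs(np.einsum('kaa->ka', g))   # g_aa are real and positive
    return np.abs(g) ** 2 / (diag[:, :, None] * diag[:, None, :])

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 3))           # three independent series
R2 = partial_coherences(smoothed_periodogram(x))
# for independent components all estimated partial coherences stay small;
# an edge test would compare them against a critical value
```

For series with a missing edge in the true graph, the corresponding curve of estimated partial coherences should fluctuate around zero at all frequencies.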

While the extension of undirected conditional independence (or correlation) graphs for multivariate data to the time series case is straightforward, an appropriate definition of directed graphs for time series seems to be much harder to obtain. The reason for this is that time should play the key role in the definition of directed edges when concerned with time series. One possible approach has been presented by Lynggaard and Walther (1993), who have used chain graphs for conditional Gaussian distributions to model the dependence structure of a time series. This approach leads to graphs where each vertex represents only one component at a fixed time.

The major problem in the definition of directed graphs is the meaning of the direction. In the case of directed acyclic graphs or chain graphs for multivariate data, the directions of the edges are normally determined by prior information or research hypotheses. This concept can also be applied to time series, but it does not take advantage of the fact that the data are measured over time. Instead, we can base our notion of direction on one of the several definitions of causality that have been proposed for time series (Granger, 1969; Sims, 1972; Pierce and Haugh, 1977). Here, the approach of Lynggaard and Walther has the disadvantage that it does not correspond to any of these concepts of causality.

1.1 Outline of the thesis

In this thesis, we will discuss both undirected and directed graphs for time series. First, in Section 2.2, we will introduce a new class of directed graphs, which are based on the concept of Granger-causality. In these causality graphs, each vertex represents one component process, as in conditional correlation graphs. Unlike the approach by Lynggaard and Walther, these graphs do not fit into the usual framework of directed acyclic graphs or chain graphs. In particular, causality graphs allow more than one edge between two vertices. We investigate the properties of these new graphs and show that they are related to conditional correlation graphs.

For the prediction of a process from sampled values it seems natural to use only those variables which have a substantial influence or, in other words, are causal for a component. We therefore consider in Chapter 5 the problem of selecting the best fitting autoregressive model under constraints due to some causality graph, where we measure the approximation by the final prediction error (Akaike, 1969, 1970). This leads to the concept of asymptotically efficient model selection as it has been considered by Shibata (1980), Taniguchi (1980), and several other authors. As in the paper by Shibata, we derive an asymptotic lower bound for the final prediction error and prove the asymptotic efficiency of the AIC.

(7)

In Chapters 3 and 4, we discuss the estimation of the conditional correlation graph of a process, starting with the nonparametric approach by Dahlhaus (1996) and Dahlhaus et al. (1997). In Section 3.2 we consider the problem of testing for the presence of an edge in the graph and present a new test based on the integrated partial spectral coherence. In simulations we show that the new test performs better than the one suggested by Dahlhaus et al. (1997).

In the second part of Chapter 3, we briefly discuss the problem of identification of synaptic connections in neural networks. In the analysis of neurophysiological data, correlation analysis is still an important, widely used tool (e.g. Melssen and Epping, 1987). The advantage of such time domain based methods compared with the frequency domain approach is the interpretability of the curves, which yield information about the direction and the type of connections. We propose a new time domain statistic which combines the advantages of time domain analysis and the interrelation analysis suggested by Rosenberg et al. (1989) and Dahlhaus et al. (1997). We prove a functional central limit theorem for the new statistic.

Finally, in Chapter 4, we consider the model selection problem for conditional correlation graphs. Again we restrict ourselves to the case of autoregressive models, which we parametrize in terms of the inverse covariances of the process. For a given graph, this parametrization leads to simple constraints on the parameters. We show that the Whittle estimate (cf. Whittle, 1953, 1954) is then determined by equations similar to those for the maximum likelihood estimates in Gaussian graphical models. We derive an asymptotic lower bound for the Kullback-Leibler distance and prove the asymptotic efficiency of the corresponding version of the AIC.

The Appendix summarizes a few results about L-functions and matrix norms. We note that throughout this thesis, the standard norms for matrices and vectors will be the operator norm and the Euclidean norm, respectively.

In conclusion I would like to thank all those who generously contributed to this work: My supervisor Professor Rainer Dahlhaus for his suggestions and his support, Wolfgang, Michael, and Martin for many helpful discussions and careful proof-reading, and Karin for her kindness and constant encouragement.


2 Graphical models for time series

Graphical models for stationary time series were first discussed by Brillinger (1996) and Dahlhaus (1996) as a tool for visualizing the interaction structure between the components of a multivariate process. It has been shown that undirected graphs for multivariate data can be generalized to the time series case, leading to the concept of conditional correlation graphs, where each component of the process is represented by one vertex in the graph. For directed graphs, no similar theory exists for time series since, unlike in the case of multivariate data, time should play a key role in the definition of such directed graphs; therefore new concepts have to be developed.

This chapter splits into two parts. In the first section, we introduce the basic concepts of conditional correlation graphs for weakly stationary time series. As an example of a graphical model, we consider graphical autoregressive models with a given conditional correlation graph.

In the second part of the chapter, we use the concept of Granger-causality to define semi-directed graphs for time series, where arrows between the vertices indicate causation and lines represent conditional contemporaneous correlation. We investigate the properties of these causality graphs and their relation to conditional correlation graphs.

2.1 Conditional correlation graphs

Let $L_2(\Omega,\mathcal{A},P)$ be the Hilbert space of all square integrable random variables on some probability space $(\Omega,\mathcal{A},P)$. We consider countable sets $\{X_i\}_{i\in I}$, $\{Y_j\}_{j\in J}$, and $\{Z_k\}_{k\in K}$ of random variables in $L_2(\Omega,\mathcal{A},P)$. Let $\mathrm{sp}\{X_i,\, i\in I\}$ be the closed linear subspace in $L_2(\Omega,\mathcal{A},P)$ generated by $\{X_i\}$. Then, putting $M = \mathrm{sp}\{1, Z_k,\, k\in K\}$, we say that $\{X_i\}$ and $\{Y_j\}$ are conditionally orthogonal given $\{Z_k\}$ if the projections of $\{X_i\}$ and $\{Y_j\}$ onto the orthogonal complement of $M$ are orthogonal. In the following we adapt the notation of Dawid (1979) for conditional independence to denote conditional orthogonality. With this notation we have

$$X \perp Y \mid Z \quad\Leftrightarrow\quad X - P_M X \,\perp\, Y - P_M Y,$$

where $P_M$ is the orthogonal projection onto $M$. Since $P_M X$ is the best linear predictor of $X$ in $M$, this implies that $X_i$ and $Y_j$ are uncorrelated for all $i\in I$ and $j\in J$ after removing the linear effects of $\{Z_k\}$.

The relation $X \perp Y \mid Z$ has the following properties, where $f(X)$ is a linear functional:

(C1) if $X \perp Y \mid Z$ then $Y \perp X \mid Z$;

(C2) if $X \perp Y \mid Z$ then $f(X) \perp Y \mid Z$;

(C3) if $X \perp Y \mid Z$ then $X \perp Y \mid (Z, f(X))$;

(C4) if $X \perp Y \mid Z$ and $X \perp W \mid (Y, Z)$ then $X \perp (W, Y) \mid Z$;

(C5) if $X \perp Y_1 \mid (Z, Y_2)$ and $X \perp Y_2 \mid (Z, Y_1)$ then $X \perp (Y_1, Y_2) \mid Z$.

Property (C5), however, does not hold universally, but only under additional assumptions. For multivariate processes, such a condition can be formulated in terms of the spectral matrix of the process. Suppose that $\{X_a(t)\}$, $a = 1,\ldots,d$, is a weakly stationary process with spectral matrix $f(\lambda)$. Then (C5) holds if all eigenvalues of $f(\lambda)$ are positive and bounded, i.e. there exist constants $c_1, c_2$ with $0 < c_1 \le c_2 < \infty$ such that $f(\lambda)$ satisfies the following boundedness condition:

$$c_1 \mathbf{1}_d \le f(\lambda) \le c_2 \mathbf{1}_d \quad \forall\, \lambda\in[-\pi,\pi]. \tag{2.1.1}$$

Here, the matrix inequality $A \le B$ means that $B - A$ is non-negative definite.

In the following we consider simple undirected graphs $G = (V, E)$, where $V$ denotes the set of vertices and $E \subseteq \{(i,j)\in V\times V \mid i\ne j\}$ the set of edges. For simplicity we assume that $(i,j)\in E$ if $(j,i)\in E$. We can now define the conditional correlation graph of a weakly stationary multivariate process $\{X(t), t\in\mathbb{Z}\}$.

Definition 2.1.1 (Conditional correlation graph) Let $\{X(t), t\in\mathbb{Z}\}$ be a $d$-vector-valued weakly stationary stochastic process. Then the conditional correlation graph of $\{X(t)\}$ is the simple undirected graph $G = (V, E)$ with vertices $V = \{1,\ldots,d\}$ and edges $E$ such that

$$(i,j)\notin E \;\Leftrightarrow\; \{X_j(t)\} \perp \{X_i(t)\} \mid \{X_{V\setminus\{i,j\}}(t)\}$$

for all $i\ne j\in V$.

Using (C1) to (C5), we can now derive properties similar to those of conditional independence graphs for multivariate data. In particular, we obtain the following important separation theorem (Dahlhaus, 1996). Consider a fixed graph $G=(V,E)$. For distinct subsets $A$, $B$, and $S$ of $V$ we say that $S$ separates $A$ and $B$ if there exists no sequence of edges $(v_{i-1}, v_i)\in E$, $i = 1,\ldots,k$, such that $v_0\in A$, $v_k\in B$, and $v_i\notin S$ for all $i = 1,\ldots,k-1$; that is, there is no path from $A$ to $B$ which does not contain at least one element of $S$.

Proposition 2.1.2 (Separation theorem) Let $\{X(t), t\in\mathbb{Z}\}$ be a vector-valued weakly stationary process such that condition (2.1.1) holds. Further, let $G=(V,E)$ be the conditional correlation graph of $\{X(t)\}$ and suppose that $A$, $B$, and $S$ are distinct subsets of $V$. Then $S$ separates $A$ and $B$ in $G$ if and only if

$$\{X_A(t)\} \perp \{X_B(t)\} \mid \{X_S(t)\}.$$
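The graph-theoretic side of the separation condition is easy to check mechanically. The following sketch (function name and edge representation are our own choices) decides by breadth-first search whether every path from $A$ to $B$ meets $S$:

```python
from collections import deque

def separates(E, A, B, S):
    """True iff S separates A and B in the undirected graph with edge set E
    (E contains both orientations of each edge, as in the text)."""
    A, B, S = set(A), set(B), set(S)
    seen, queue = set(A), deque(A)
    while queue:
        v = queue.popleft()
        for (u, w) in E:
            if u == v and w not in seen and w not in S:
                if w in B:
                    return False   # found a path from A to B avoiding S
                seen.add(w)
                queue.append(w)
    return True

# path graph 1 -- 2 -- 3 -- 4
E = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)}
print(separates(E, {1}, {4}, {2}))      # True
print(separates(E, {1}, {4}, set()))    # False
```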

Consider now fixed components $\{X_a(t)\}$ and $\{X_b(t)\}$ and let $C_{ab} = V\setminus\{a,b\}$. Then the orthogonal projection of $X_a(t)$ onto $\mathrm{sp}\{1, \{X_{C_{ab}}(t)\}_{t\in\mathbb{Z}}\}$ is obtained by minimizing

$$\mathbb{E}\Big(X_a(t) - \mu_a - \sum_{s\in\mathbb{Z}} \sum_{c\in C_{ab}} \phi_c(t-s) X_c(s)\Big)^2.$$

This leads to the partial error process

$$\varepsilon_{a|C_{ab}}(t) = X_a(t) - \mu_a^* - \sum_{s\in\mathbb{Z}} \sum_{c\in C_{ab}} \phi_c^*(t-s) X_c(s),$$

where $\hat\phi^*_{C_{ab}}(\lambda) = f_{aC_{ab}}(\lambda) f_{C_{ab}C_{ab}}(\lambda)^{-1}$ and $\mu_a^* = \mathbb{E}X_a(t) - \hat\phi^*_{C_{ab}}(0)\,\mathbb{E}X_{C_{ab}}(t)$ are the optimal values, and $\hat\phi^*_{C_{ab}}$ is the Fourier transform of the filter $\phi^*_{C_{ab}}$ (cf. Brillinger, 1981). Then, if the processes $\{X_a(t)\}$ and $\{X_b(t)\}$ are conditionally orthogonal given all other components $\{X_{C_{ab}}(t)\}$, it follows that

$$\mathrm{corr}\big(\varepsilon_{a|C_{ab}}(t), \varepsilon_{b|C_{ab}}(t+u)\big) = 0$$

for all $t, u\in\mathbb{Z}$. Equivalently, we obtain in the frequency domain

$$f_{ab|C_{ab}}(\lambda) = f_{\varepsilon_{a|C_{ab}}\varepsilon_{b|C_{ab}}}(\lambda) = 0$$

for all frequencies $\lambda\in[-\pi,\pi]$, where $f_{\varepsilon_{a|C_{ab}}\varepsilon_{b|C_{ab}}}(\lambda)$ denotes the cross-spectrum of the two partial error processes. We call $f_{ab|C_{ab}}(\lambda)$ the partial spectrum of $\{X_a(t)\}$ and $\{X_b(t)\}$ given $\{X_{C_{ab}}(t)\}$. It follows from the form of $\hat\phi^*_{C_{ab}}$ that the partial spectrum is given by

$$f_{ab|C_{ab}}(\lambda) = f_{ab}(\lambda) - f_{aC_{ab}}(\lambda) f_{C_{ab}C_{ab}}(\lambda)^{-1} f_{C_{ab}b}(\lambda). \tag{2.1.2}$$

For an analysis of the interaction structure, the partial spectrum is typically standardized, which leads to the partial spectral coherence

$$R_{ab|C_{ab}}(\lambda) = \frac{f_{ab|C_{ab}}(\lambda)}{\big(f_{aa|C_{ab}}(\lambda)\, f_{bb|C_{ab}}(\lambda)\big)^{1/2}}. \tag{2.1.3}$$

The quantity $|R_{ab|C_{ab}}(\lambda)|^2$ is bounded between 0 and 1; it is 0 if $\{X_a(t)\}$ and $\{X_b(t)\}$ are conditionally orthogonal given $\{X_{C_{ab}}(t)\}$, while the value 1 indicates a perfect linear relation between the partialized variables.

The next lemma, due to Dahlhaus (1996), states an important relation between the partial spectral coherences and the inverse of the spectral matrix. It allows an efficient computation of all frequency domain statistics needed for estimating the conditional correlation graph.

Proposition 2.1.3 Suppose that $\{X(t), t\in\mathbb{Z}\}$ is a vector-valued weakly stationary process such that condition (2.1.1) holds. Then, if $g(\lambda)$ denotes the inverse spectral matrix, we have

(i) $f_{aa|C_a}(\lambda) = \dfrac{1}{g_{aa}(\lambda)}$,

(ii) $R_{ab|C_{ab}}(\lambda) = -\dfrac{g_{ab}(\lambda)}{\sqrt{g_{aa}(\lambda)\, g_{bb}(\lambda)}}$,

(iii) $f_{aa|C_{ab}}(\lambda) = \dfrac{f_{aa|C_a}(\lambda)}{1 - |R_{ab|C_{ab}}(\lambda)|^2}$,

(iv) $f_{ab|C_{ab}}(\lambda) = \dfrac{R_{ab|C_{ab}}(\lambda)\,\sqrt{f_{aa|C_a}(\lambda)\, f_{bb|C_b}(\lambda)}}{1 - |R_{ab|C_{ab}}(\lambda)|^2}$.

Proof. (i) and (ii) have been proved by Dahlhaus (1996). From the inverse variance lemma (e.g. Whittaker, 1990, Proposition 5.7.3) it further follows that

$$g_{aa}(\lambda) = \frac{f_{bb|C_{ab}}(\lambda)}{f_{aa|C_{ab}}(\lambda)\, f_{bb|C_{ab}}(\lambda) - f_{ab|C_{ab}}(\lambda)\, f_{ba|C_{ab}}(\lambda)},$$

which together with (i) and the definition of $R_{ab|C_{ab}}(\lambda)$ completes the proof.
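Identity (ii) can also be checked numerically: partializing via the Schur complement as in (2.1.2)-(2.1.3) and reading the coherence off the inverse matrix give the same value. A small self-contained check (our own sketch, with an arbitrary Hermitian positive definite matrix standing in for $f(\lambda)$ at one fixed frequency):

```python
import numpy as np

rng = np.random.default_rng(1)

# random Hermitian positive definite "spectral matrix" at one frequency
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
f = B @ B.conj().T + 4 * np.eye(4)

a, b = 0, 1
C = [2, 3]                                   # C_ab = V \ {a, b}

# partial spectrum via (2.1.2): Schur complement on the block {a, b}
fCC_inv = np.linalg.inv(f[np.ix_(C, C)])
def partial(u, v):
    return f[u, v] - f[u, C] @ fCC_inv @ f[C, v]

R_schur = partial(a, b) / np.sqrt(partial(a, a).real * partial(b, b).real)

# the same coherence via (ii): g = f^{-1}, R = -g_ab / sqrt(g_aa g_bb)
g = np.linalg.inv(f)
R_inverse = -g[a, b] / np.sqrt(g[a, a].real * g[b, b].real)

assert np.allclose(R_schur, R_inverse)
```

The agreement is exact (up to rounding), since (ii) is an algebraic identity for any Hermitian positive definite matrix.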

The proposition not only provides an efficient method of computing the partialized frequency domain statistics, but also allows a new characterization of conditional orthogonality and thus of the missing edges of the conditional correlation graph. For this, let $R = \big(R(u-v)\big)_{u,v\in\mathbb{Z}}$ with $R(u) = \mathbb{E}\big(X(t) X(t+u)'\big)$ be the infinite dimensional covariance matrix of $\{X(t)\}$. Then we denote by $R^{(i)}$ the inverse of the covariance matrix. It then follows that $R^{(i)}(u)$ is the Fourier transform of the inverse spectral matrix (e.g. Shaman, 1975, 1976), that is,

$$R^{(i)}_{ab}(u) = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} f^{-1}_{ab}(\lambda) \exp(i\lambda u)\, d\lambda. \tag{2.1.4}$$

Proposition 2.1.4 Let $\{X_V(t), t\in\mathbb{Z}\}$ be a vector-valued weakly stationary process such that condition (2.1.1) holds. Then the following two statements are equivalent:

(i) $X_a(s) \perp X_b(s') \mid \{X_{V\setminus\{a,b\}}(t), t\in\mathbb{Z}\}$ for all $s, s'\in\mathbb{Z}$;

(ii) $X_a(s) \perp X_b(s') \mid \{X_V(t), t\in\mathbb{Z}\}\setminus\{X_a(s), X_b(s')\}$ for all $s, s'\in\mathbb{Z}$.

Proof. It follows directly from Proposition 2.1.3 that $f_{ab|C_{ab}}(\lambda) = 0$ if and only if $f^{-1}_{ab}(\lambda) = 0$ for all frequencies $\lambda\in[-\pi,\pi]$. Since by (2.1.4) the latter is equivalent to $R^{(i)}_{ab}(u) = 0$ for all $u\in\mathbb{Z}$, the assertion now follows by application of the variance-covariance lemma (e.g. Whittaker, 1990, Chapter 5).

As an example for graphical models for time series we consider autoregressive pro-cesses under the restrictions of a conditional correlation graph. Here, we can use the last proposition to reparametrize the model in order to obtain feasible parameter constraints.

Example 2.1.5 Let $\{X(t), t\in\mathbb{Z}\}$ be a $d$-vector-valued stationary autoregressive process of order $p$,

$$X(t) = \sum_{h=1}^{p} A(h) X(t-h) + \varepsilon(t),$$

where the $A(h)$ are $d\times d$ matrices and the $\varepsilon(t)$ are independent and identically normally distributed with mean $\mathbb{E}(\varepsilon(t)) = 0$ and covariance matrix $\mathbb{E}\big(\varepsilon(t)\varepsilon(t)'\big) = \Sigma$ of full rank $d$. We further assume that the spectral matrix of $\{X(t)\}$ exists and satisfies the boundedness condition (2.1.1). Then the inverse spectral matrix $f^{-1}(\lambda)$ of $\{X(t)\}$ is (cf. Dahlhaus, 1996)

$$f^{-1}(\lambda) = 2\pi\, A(e^{i\lambda})'\, \Psi\, A(e^{-i\lambda}), \tag{2.1.5}$$

where $\Psi = \Sigma^{-1}$ and $A(z) = \mathbf{1}_d - A(1)z - \ldots - A(p)z^p$ is the characteristic polynomial of the process.

Now suppose that $\{X(t)\}$ has conditional correlation graph $G = (V, E)$. Then for all $(i,j)\notin E$ it follows that

$$\sum_{k,l=1}^{d} \sum_{u,v=0}^{p} \Psi_{kl}\, A_{ki}(u)\, A_{lj}(v)\, \exp(i\lambda(v-u)) = 0.$$

In the special case where $\Sigma = \sigma^2 \mathbf{1}_d$, we still have

$$\frac{2\pi}{\sigma^2} \sum_{k=1}^{d} \sum_{u,v=0}^{p} A_{ki}(u)\, A_{kj}(v)\, \exp(i\lambda(v-u)) = 0,$$

which yields the following $2p+1$ restrictions on the parameters:

$$\sum_{k=1}^{d} \sum_{u=0}^{p} A_{ki}(u)\, A_{kj}(u+h) = 0 \quad \forall\, h\in\{-p,\ldots,p\},$$

where $A(0) = -\mathbf{1}_d$ and $A(u) = 0$ if $u < 0$ or $u > p$. It is clear from the above expressions that it is difficult to work with these restrictions, especially if more than one edge is missing from the graph.

We suggest another approach to the problem. It is well known that for autoregressive models of order $p$ the inverse covariances $R^{(i)}(u)$ vanish for $|u| > p$ (e.g. Bhansali, 1980; Battaglia, 1984). Because of the uniqueness of the factorization in (2.1.5) (cf. Masani, 1966), an AR($p$) process is also determined by the set of inverse covariances

$$\theta = \Big(\mathrm{vech}\big(R^{(i)}(0)\big)',\, \mathrm{vec}\big(R^{(i)}(1)\big)',\, \ldots,\, \mathrm{vec}\big(R^{(i)}(p)\big)'\Big)',$$

where the vech operator stacks only the elements contained in the lower triangular submatrix. Thus $\theta$ again consists of $pd^2 + d(d+1)/2$ parameters.

The restrictions imposed on the parameters by the conditional correlation graph $G$ now have a simple formulation in terms of the inverse covariances $R^{(i)}(u)$:

$$R^{(i)}_{ij}(u) = R^{(i)}_{ji}(u) = 0 \quad \forall\, (i,j)\notin E.$$

Thus, we can parametrize a graphical autoregressive model of order $p$ with graph $G$ by a parameter vector $\theta\in\mathbb{R}^{k(p,G)}$ with $k(p,G) = (2p+1)|E| + (p+1)d$, where $|E|$ denotes the number of (undirected) edges of $G$.
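Relation (2.1.5) itself is easy to verify numerically against the direct expression $f(\lambda) = (2\pi)^{-1} A(e^{-i\lambda})^{-1}\,\Sigma\, A(e^{-i\lambda})^{-*}$ for the spectral matrix of an AR process. A sketch (the parameter values are arbitrary illustrative choices):

```python
import numpy as np

def A_poly(A_list, z):
    """Characteristic polynomial A(z) = 1_d - A(1) z - ... - A(p) z^p."""
    out = np.eye(A_list[0].shape[0], dtype=complex)
    for h, Ah in enumerate(A_list, start=1):
        out -= Ah * z ** h
    return out

def spectral_matrix(A_list, Sigma, lam):
    """f(lam) = (2 pi)^{-1} A(e^{-i lam})^{-1} Sigma A(e^{-i lam})^{-*}."""
    Ainv = np.linalg.inv(A_poly(A_list, np.exp(-1j * lam)))
    return Ainv @ Sigma @ Ainv.conj().T / (2 * np.pi)

def inverse_spectral_matrix(A_list, Sigma, lam):
    """Right-hand side of (2.1.5): 2 pi A(e^{i lam})' Psi A(e^{-i lam})."""
    Psi = np.linalg.inv(Sigma)
    return (2 * np.pi * A_poly(A_list, np.exp(1j * lam)).T
            @ Psi @ A_poly(A_list, np.exp(-1j * lam)))

# illustrative stable AR(1) parameters
A1 = np.array([[0.5, 0.2], [0.0, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
for lam in (0.0, 0.7, 2.0):
    lhs = np.linalg.inv(spectral_matrix([A1], Sigma, lam))
    rhs = inverse_spectral_matrix([A1], Sigma, lam)
    assert np.allclose(lhs, rhs)
```

Since the coefficient matrices are real, $A(e^{-i\lambda})^* = A(e^{i\lambda})'$, which is why the two expressions agree exactly.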


Remark 2.1.6 The definition of conditional correlation graphs can also be generalized to the case of time-continuous stochastic processes such as point processes, which we will briefly discuss in Section 3.3. Consider a random (signed) measure $\mu$ on $\mathbb{R}$. Then we replace in the definition of conditional orthogonality the space $\mathrm{sp}\{X_i,\, i\in I\}$ by the closed subspace $M_\mu$ generated by the set

$$\Big\{ \int_{\mathbb{R}} \phi(t)\, d\mu(t) \;\Big|\; \phi:\mathbb{R}\to\mathbb{R} \text{ is continuous with bounded support} \Big\}.$$

Then two random measures $\mu_1$ and $\mu_2$ are conditionally orthogonal given $\{\nu_1,\ldots,\nu_d\}$ if

$$X - P_{M_{\{1,\nu_1,\ldots,\nu_d\}}} X \,\perp\, Y - P_{M_{\{1,\nu_1,\ldots,\nu_d\}}} Y$$

for all $X\in M_{\mu_1}$ and $Y\in M_{\mu_2}$.

2.2 Causality graphs

In the analysis of multivariate data, graphical models based on directed acyclic graphs or chain graphs have been used to model and detect causal effects between the variables under study (e.g. Wermuth and Lauritzen, 1990; Pearl, 1995). This approach for detecting causation relies on research hypotheses which define an ordering of the variables; thus it is assumed that we already know the possible directions of causation. Without such prior information it becomes unclear how causation can be inferred from patterns of association. In the case of stochastic time series, the ordering of the variables in time provides a natural basis for the definition of causality. The most frequently used concept of causality has been introduced by Granger (1969). Here, one process $\{X(t)\}$ is said to be causal for another process $\{Y(t)\}$ if the prediction of $Y(t)$ using all relevant information available at time $t-1$ apart from $\{X(t)\}$ can be improved by adding the available information about $\{X(t)\}$.

In this section, we introduce a new class of graphs which visualize the causal relationships between the components of multivariate stationary time series. In these graphs, the vertices are connected by arrows and lines corresponding to the presence of Granger-causality and instantaneous Granger-causality, respectively. We investigate the Markov properties of these graphs and show their relation to the conditional correlation graphs as defined in the previous section.

2.2.1 Definition

The original definition of causality by Granger (1969) has been formulated in terms of the mean squared prediction error. Here, we will adopt a slightly weaker definition in terms of conditional orthogonality in the Hilbert space of square integrable random variables, which is due to Hosoya (1977) and Florens and Mouchart (1985).

Consider two weakly stationary stochastic processes $\{X(t)\}$ and $\{Y(t)\}$ on a probability space $(\Omega,\mathcal{A},P)$. In this section we denote by $\bar X(t) = \{X(s), s<t\}$ the set of all past values of $\{X(t)\}$ at time $t$. Further, we set $\bar{\bar X}(t) = \{X(s), s\le t\}$ and define $A(t)$ as the set of all relevant information accumulated up to time $t$.


Definition 2.2.1 (Linear causality) $\{Y(t)\}$ is not (linearly) causal for $\{X(t)\}$ (relative to $\{A(t)\}$) if and only if

$$X(t) \perp \bar Y(t) \mid \bar A(t)\setminus\bar Y(t).$$

Further, there is no instantaneous (linear) causality between $\{X(t)\}$ and $\{Y(t)\}$ (relative to $\{A(t)\}$) if and only if

$$X(t) \perp Y(t) \mid \bar{\bar A}(t)\setminus\{X(t), Y(t)\}.$$

We note that in the literature instantaneous causality has been defined without conditioning on the present of $\{A(t)\}$. However, in the framework of graphical models the above definition, where instantaneous noncausality corresponds to conditional noncorrelation of the increments of the process, is more appropriate.

In a multivariate setting, different causality patterns such as direct and indirect causality, feedback, or spurious causality are possible (cf. Hsiao, 1982). These patterns can be visualized by graphs where arrows between vertices correspond to causality and lines correspond to instantaneous causality. We therefore consider mixed graphs (graphs with both directed and undirected edges) $G = (V, E_d, E_u)$, where $V$ is the set of vertices, $E_d \subseteq \{(u,v)\in V\times V \mid u\ne v\}$ is the set of directed edges, and $E_u \subseteq \{(u,v)\in V\times V \mid u\ne v\}$ is the set of undirected edges. For simplicity we assume that if $(i,j)\in E_u$ then also $(j,i)\in E_u$.

Definition 2.2.2 (Causality graph) Let $\{X(t), t\in\mathbb{Z}\}$ be a $d$-vector-valued weakly stationary stochastic process. Then the (linear) causality graph of $\{X(t)\}$ (relative to $\{A(t)\}$) is the graph $G = (V, E_d, E_u)$ with vertices $V = \{1,\ldots,d\}$ which satisfies for all $i\ne j\in V$ the conditions

$$(i,j)\notin E_d \;\Leftrightarrow\; X_j(t) \perp \bar X_i(t) \mid \bar A(t)\setminus\bar X_i(t) \tag{2.2.1}$$

and

$$(i,j)\notin E_u \;\Leftrightarrow\; X_j(t) \perp X_i(t) \mid \bar{\bar A}(t)\setminus\{X_i(t), X_j(t)\}. \tag{2.2.2}$$

The definition of a causality graph clearly depends on the information set $A(t)$. Here, we will be concerned only with multivariate processes and always use $A(t) = X(t)$. However, it has been noted before (Granger, 1969; Hsiao, 1982) that the omission of confounding variables can lead to spurious causality. If the causal relations between the measured variables and the confounding variables are known, it is possible to determine the set of variables that need to be included in the analysis in order to be able to assess the direct causal relation between two specific variables (cf. Pearl, 1995). In general, when the causality graph is unknown and constructed from the data, such methods are not available and spurious causality cannot be ruled out. We will not discuss this any further.

Example 2.2.3 (AR-processes) Let $\{X(t), t\in\mathbb{Z}\}$ be a $d$-vector-valued stationary autoregressive process of order $p$,

$$X(t) = \sum_{j=1}^{p} A(j) X(t-j) + \varepsilon(t),$$

Figure 2.2.1: Causality graph for the autoregressive process in Example 2.2.3.

where the $A(i)$ are $d\times d$ matrices and the $\varepsilon(t)$ are independent and identically distributed errors with mean zero and covariance matrix $\Sigma$. Then it is well known (cf. Tjøstheim, 1981; Hsiao, 1982) that $\{X_i(t)\}$ does not cause $\{X_j(t)\}$ if and only if the corresponding components $A_{ji}(h)$ vanish for all $h$, that is,

$$X_j(t) \perp \bar X_i(t) \mid \bar X_{V\setminus\{i\}}(t) \;\Leftrightarrow\; A_{ji}(h) = 0 \quad \forall\, h\in\{1,\ldots,p\}.$$

Further, $\{X_i(t)\}$ and $\{X_j(t)\}$ are not instantaneously causal if and only if the corresponding error components $\varepsilon_i(t)$ and $\varepsilon_j(t)$ are conditionally orthogonal given $\varepsilon_{V\setminus\{i,j\}}(t)$. Thus, instantaneous causality can be expressed in terms of the inverse covariance matrix $\Psi = \Sigma^{-1}$. We have the characterization

$$X_i(t) \perp X_j(t) \mid \bar X_V(t), X_{V\setminus\{i,j\}}(t) \;\Leftrightarrow\; \varepsilon_i(t) \perp \varepsilon_j(t) \mid \varepsilon_{V\setminus\{i,j\}}(t) \;\Leftrightarrow\; \psi_{ij} = \psi_{ji} = 0.$$

As an example, we consider the following AR(1)-process

$$X(t) = A\, X(t-1) + \varepsilon(t)$$

with parameters

$$A = \begin{pmatrix} a_{1,1} & 0 & 0 & 0 & 0 \\ 0 & a_{2,2} & a_{2,3} & 0 & 0 \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} & 0 \\ 0 & 0 & 0 & a_{4,4} & a_{4,5} \\ 0 & 0 & a_{5,3} & 0 & a_{5,5} \end{pmatrix} \quad\text{and}\quad \Sigma^{-1} = \begin{pmatrix} \psi_{1,1} & \psi_{1,2} & \psi_{1,3} & 0 & 0 \\ \psi_{2,1} & \psi_{2,2} & \psi_{2,3} & 0 & 0 \\ \psi_{3,1} & \psi_{3,2} & \psi_{3,3} & \psi_{3,4} & 0 \\ 0 & 0 & \psi_{4,3} & \psi_{4,4} & 0 \\ 0 & 0 & 0 & 0 & \psi_{5,5} \end{pmatrix}.$$
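The characterizations of Example 2.2.3 make reading the edges off the parameters purely mechanical. The following sketch (our own helper; 1-based vertex labels as in the text, nonzero entries replaced by ones for illustration) recovers the edge sets from the zero patterns of $A$ and $\Sigma^{-1}$:

```python
import numpy as np

def causality_edges(A_list, Psi, tol=1e-12):
    """Directed edge i -> j unless A(h)[j-1, i-1] = 0 for all h;
    undirected edge i -- j unless Psi[i-1, j-1] = 0 (1-based vertices)."""
    d = Psi.shape[0]
    Ed = {(i, j) for i in range(1, d + 1) for j in range(1, d + 1)
          if i != j and any(abs(Ah[j - 1, i - 1]) > tol for Ah in A_list)}
    Eu = {(i, j) for i in range(1, d + 1) for j in range(1, d + 1)
          if i != j and abs(Psi[i - 1, j - 1]) > tol}
    return Ed, Eu

# zero patterns of the AR(1) example above
A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 0, 1]], dtype=float)
Psi = np.array([[1, 1, 1, 0, 0],
                [1, 1, 1, 0, 0],
                [1, 1, 1, 1, 0],
                [0, 0, 1, 1, 0],
                [0, 0, 0, 0, 1]], dtype=float)
Ed, Eu = causality_edges([A], Psi)
# e.g. 1 -> 3 is present, 1 -> 5 is absent, and 3 -> 5 is present
assert (1, 3) in Ed and (1, 5) not in Ed and (3, 5) in Ed
assert (1, 2) in Eu and (4, 5) not in Eu
```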

The causality graph for this process is shown in Figure 2.2.1. From this graph, we can see that, e.g., $\{X_1(t)\}$ does not cause $\{X_5(t)\}$ relative to the full process. A more intuitive interpretation of the graph, however, suggests that $\{X_1(t)\}$ causes $\{X_5(t)\}$ only indirectly via $\{X_3(t)\}$.


In the following section we investigate the properties of causality graphs and show that such an intuitive interpretation is indeed possible within the framework of causality graphs.

2.2.2 Markov properties

In this section, we investigate the properties of causality graphs. We start with a version of the block independence lemma (cf. Whittaker, 1990) which allows us to consider blocks of variables and derive causality relations between these blocks.

Lemma 2.2.4 Let $\{X(t), t\in\mathbb{Z}\}$ be a vector-valued weakly stationary process which satisfies condition (2.1.1). Then we have for $I_1, I_2, J\subseteq V$ and $I = I_1\cup I_2$:

(i) $X_J(t) \perp \bar X_{I_r}(t) \mid \bar X_{V\setminus I_r}(t)$, $r=1,2$ $\;\Leftrightarrow\;$ $X_J(t) \perp \bar X_I(t) \mid \bar X_{V\setminus I}(t)$;

(ii) $X_{I_r}(t) \perp \bar X_J(t) \mid \bar X_{V\setminus J}(t)$, $r=1,2$ $\;\Leftrightarrow\;$ $X_I(t) \perp \bar X_J(t) \mid \bar X_{V\setminus J}(t)$;

(iii) $X_{I_r}(t) \perp X_J(t) \mid \bar X(t), X_{V\setminus(I_r\cup J)}(t)$, $r=1,2$ $\;\Leftrightarrow\;$ $X_I(t) \perp X_J(t) \mid \bar X(t), X_{V\setminus(I\cup J)}(t)$.

Proof. For (i) and (iii) this is an immediate consequence of (C5). For (ii) it follows from the linearity of the covariance in each argument.

The next theorem focuses on the relation between causality and conditional orthogonality of the processes as used in the previous section. For this, we first need some more notation from graph theory. Let $G = (V, E_d, E_u)$ be a mixed graph. If two vertices $v$ and $u$ are connected by an edge in $E_u$, we call $v$ and $u$ neighbours. The set of neighbours of $v$ will be denoted by $\mathrm{ne}(v)$. Further, if there exists an edge $(v,u)\in E_d$, then $v$ is a parent of $u$ and $u$ is a child of $v$. The corresponding sets of children and parents of a vertex $v$ are denoted by $\mathrm{ch}(v)$ and $\mathrm{pa}(v)$, respectively. Additionally, we define $\overline{\mathrm{pa}}(v) = \mathrm{pa}(v)\cup\{v\}$ and similarly $\overline{\mathrm{ne}}(v)$ and $\overline{\mathrm{ch}}(v)$.

Theorem 2.2.5 Let $\{X(t), t\in\mathbb{Z}\}$ be a vector-valued weakly stationary stochastic process with causality graph $G = (V, E_d, E_u)$. Further suppose that $\{X(t)\}$ satisfies condition (2.1.1). If for $i\ne j\in V$ the causality graph satisfies the following conditions:

(i) $i\notin\mathrm{ch}(j)$ and $j\notin\mathrm{ch}(i)$,

(ii) $i\notin\mathrm{ne}(j)$,

(iii) $\mathrm{ne}(i)\cap\mathrm{ch}(j) = \emptyset$ and $\mathrm{ne}(j)\cap\mathrm{ch}(i) = \emptyset$,

(iv) $\overline{\mathrm{ne}}(v)\cap\mathrm{ch}(j) = \emptyset$ for all $v\in\mathrm{ch}(i)$,

then the processes $\{X_i(t)\}$ and $\{X_j(t)\}$ satisfy

$$\{X_i(t), t\in\mathbb{Z}\} \perp \{X_j(t), t\in\mathbb{Z}\} \mid \{X_{V\setminus\{i,j\}}(t), t\in\mathbb{Z}\},$$

that is, the processes $\{X_i(t)\}$ and $\{X_j(t)\}$ are conditionally orthogonal given all other components $\{X_{V\setminus\{i,j\}}(t)\}$.


Proof. According to Proposition 2.1.4 it is sufficient to show that for all $t, h\in\mathbb{Z}$

$$X_i(t) \perp X_j(t+h) \mid \{X_V(t), t\in\mathbb{Z}\}\setminus\{X_i(t), X_j(t+h)\}.$$

We can assume that $h\ge 0$ (otherwise we swap the indices $i$ and $j$). First, consider $h>0$. Since by (i) and (iv) the vertices in $\mathrm{ne}(j)$ are not children of $i$, it follows from Lemma 2.2.4 (ii) that

$$X_{\mathrm{ne}(j)}(t+h) \perp X_i(t) \mid \bar X_V(t+h)\setminus\{X_i(t)\},$$

which implies

$$X_j(t+h) \perp X_i(t) \mid \bar X_V(t+h)\setminus\{X_i(t)\},\ X_{\mathrm{ne}(j)}(t+h).$$

Since we further have by Lemma 2.2.4

$$X_j(t+h) \perp X_{V\setminus\overline{\mathrm{ne}}(j)}(t+h) \mid \bar X_V(t+h),\ X_{\mathrm{ne}(j)}(t+h),$$

we obtain by (C4)

$$X_j(t+h) \perp X_i(t) \mid \bar X_V(t+h+1)\setminus\{X_i(t), X_j(t+h)\}.$$

In the case $h=0$, this follows directly from (ii). Now assume that

$$X_j(t+h) \perp X_i(t) \mid \bar X_V(s)\setminus\{X_i(t), X_j(t+h)\} \tag{2.2.3}$$

for some $s > t+h$. Let $K = V\setminus(\mathrm{ch}(i)\cup\mathrm{ch}(j))$. By condition (iv), $\{X_k(t)\}$ and $\{X_l(t)\}$ are not instantaneously causal for all $k\in\mathrm{ch}(i)$ and $l\in\mathrm{ch}(j)$, and we get by Lemma 2.2.4 (iii)

$$X_{\mathrm{ch}(i)}(s) \perp X_{\mathrm{ch}(j)}(s) \mid \bar X_V(s),\ X_K(s). \tag{2.2.4}$$

Since $\mathrm{ch}(j)$ and $K\cup\mathrm{ch}(i)$ are disjoint, we have

$$X_j(t+h) \perp X_{\mathrm{ch}(i)\cup K}(s) \mid \bar X_V(s)\setminus\{X_j(t+h)\}$$

and thus by (2.2.3)

$$X_j(t+h) \perp X_i(t) \mid \bar X_V(s)\setminus\{X_i(t), X_j(t+h)\},\ X_{\mathrm{ch}(i)\cup K}(s). \tag{2.2.5}$$

On the other hand, we similarly obtain

$$X_i(t) \perp X_{\mathrm{ch}(j)}(s) \mid \bar X_V(s)\setminus\{X_i(t)\},\ X_K(s)$$

and by (2.2.4)

$$X_i(t) \perp X_{\mathrm{ch}(j)}(s) \mid \bar X_V(s)\setminus\{X_i(t)\},\ X_{\mathrm{ch}(i)\cup K}(s).$$

Together with (2.2.5) this implies

$$X_j(t+h) \perp X_i(t) \mid \bar X_V(s+1)\setminus\{X_i(t), X_j(t+h)\}.$$

The assertion of the theorem now follows by induction over $s$.

The theorem motivates the following definition of a moral graph which differs from the definition in the case of multivariate data.


Definition 2.2.6 Let $G = (V, E_d, E_u)$ be the causality graph of a weakly stationary process $\{X_V(t)\}$. Then the moral graph of $G$ is the undirected simple graph $G^m = (V, E_m)$ such that $(i,j)\notin E_m$ if and only if conditions (i) to (iv) in Theorem 2.2.5 hold.
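This definition is directly algorithmic. The sketch below (our own code; we read the set in condition (iv) as $\mathrm{ne}(v)\cup\{v\}$, an assumption recovered from the proof of Theorem 2.2.5) computes $E_m$ and reproduces the familiar "marrying of parents" at a common child:

```python
def moral_graph(V, Ed, Eu):
    """Undirected moral graph G^m: i -- j unless (i)-(iv) of Theorem 2.2.5
    all hold.  Ed is a set of directed edges (u, v) meaning u -> v; Eu
    contains both orientations of each line."""
    ch = {v: {w for (u, w) in Ed if u == v} for v in V}
    ne = {v: {w for (u, w) in Eu if u == v} for v in V}
    Em = set()
    for i in V:
        for j in V:
            if i == j:
                continue
            noedge = (i not in ch[j] and j not in ch[i]               # (i)
                      and i not in ne[j]                              # (ii)
                      and not (ne[i] & ch[j])
                      and not (ne[j] & ch[i])                         # (iii)
                      and all(not ((ne[v] | {v}) & ch[j])
                              for v in ch[i]))                        # (iv)
            if not noedge:
                Em |= {(i, j), (j, i)}
    return Em

# two parents pointing at a common child become "married" in the moral graph
V = {1, 2, 3}
assert (1, 3) in moral_graph(V, {(1, 2), (3, 2)}, set())
assert (1, 3) not in moral_graph(V, {(1, 2)}, set())
```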

It follows now immediately from Theorem 2.2.5 that the causality graph of a process $\{X(t)\}$ is related to the conditional correlation graph of $\{X(t)\}$.

Corollary 2.2.7 Let $G = (V, E_d, E_u)$ be the causality graph of a weakly stationary process $\{X(t)\}$ satisfying (2.1.1) and $G^m = (V, E_m)$ the corresponding moral graph. Then if $G_i = (V, E_i)$ is the conditional correlation graph of the process $\{X(t)\}$, we have $E_i\subseteq E_m$.

For a subset $A$ of $V$ we define the set of ancestors $\mathrm{an}(A)$ of $A$ as the smallest set $B\subseteq V$ such that $A\subseteq B$ and $\mathrm{pa}(B)\subseteq B$. Further, we denote by $G_{\mathrm{an}(A)}$ the causality graph of the subprocess $\{X_{\mathrm{an}(A)}(t)\}$.
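The ancestral set can be computed as the least fixed point of repeatedly adding parents; a short sketch (function name is ours):

```python
def ancestors(Ed, A):
    """an(A): smallest B with A ⊆ B and pa(B) ⊆ B, by fixpoint iteration.
    Ed is a set of directed edges (u, v) meaning u -> v."""
    B = set(A)
    while True:
        parents = {u for (u, w) in Ed if w in B}
        if parents <= B:
            return B
        B |= parents

Ed = {(1, 2), (2, 3)}                 # chain 1 -> 2 -> 3
assert ancestors(Ed, {3}) == {1, 2, 3}
assert ancestors(Ed, {1}) == {1}
```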

Definition 2.2.8 (Markov properties) Let $\{X(t), t\in\mathbb{Z}\}$ be a vector-valued weakly stationary stochastic process. Further, let $G = (V, E_d, E_u)$ be a mixed graph with directed edges $E_d$ and undirected edges $E_u$.

(i) $\{X(t)\}$ satisfies the causal pairwise Markov property with respect to $G$ if for all $(i,j)\notin E_d$

$$X_j(t) \perp \bar X_i(t) \mid \bar X_{V\setminus\{i\}}(t)$$

and for all $(i,j)\notin E_u$

$$X_j(t) \perp X_i(t) \mid \bar X(t),\ X_{V\setminus\{i,j\}}(t).$$

(ii) $\{X(t)\}$ satisfies the causal local Markov property with respect to $G$ if for all $i\in V$

$$X_i(t) \perp \bar X_{V\setminus\overline{\mathrm{pa}}(i)}(t) \mid \bar X_{\overline{\mathrm{pa}}(i)}(t)$$

and

$$X_i(t) \perp X_{V\setminus\overline{\mathrm{ne}}(i)}(t) \mid \bar X(t),\ X_{\mathrm{ne}(i)}(t).$$

(iii) $\{X(t)\}$ satisfies the causal global Markov property with respect to $G$ if for all distinct subsets $A, B, S\subseteq V$ such that $S$ separates $A$ and $B$ in $(G_{\mathrm{an}(A\cup B\cup S)})^m$, that is, in the moral graph of the ancestral set of $A\cup B\cup S$, we have

$$X_B(t) \perp \bar X_A(t) \mid \bar X_{B\cup S}(t)$$

and

$$X_B(t) \perp X_A(t) \mid \bar X_{A\cup B\cup S}(t),\ X_S(t).$$

The causality graphs have been defined such that the causal pairwise Markov property is satisfied. However, it is the global Markov property which allows the intuitive interpretation of the graphs. The next theorem states that under the boundedness condition on the spectral matrix the causal pairwise Markov property already implies the global one.


Theorem 2.2.9 Let $\{X(t), t \in \mathbb{Z}\}$ be a vector-valued weakly stationary stochastic process. Further, let $G = (V, E_d, E_u)$ be the causality graph for $\{X(t)\}$. Then if condition (2.1.1) holds, $\{X(t)\}$ satisfies the causal global Markov property with respect to $G$.

Proof. Assume that $A$, $B$, and $S$ are disjoint subsets such that $S$ separates $A$ and $B$ in the moral graph $G^m_{\mathrm{an}(A\cup B\cup S)}$. Let $V^* = \mathrm{an}(A\cup B\cup S)$. By definition of the ancestral set, we have

$$X_{V^*}(t) \perp \bar X_{V\setminus V^*}(t) \mid \bar X_{V^*}(t). \qquad (2.2.6)$$

Therefore, we get for all $(i,j) \notin E_d^* = \{(i,j) \in E_d \mid i,j \in V^*\}$

$$X_j(t) \perp \bar X_i(t) \mid \bar X_{V^*\setminus\{i\}}(t).$$

Further, let $E_u^*$ be the set of undirected edges $(i,j)$ with $i,j \in V^*$ such that $i$ and $j$ are not separated by $V^*\setminus\{i,j\}$ in the undirected graph $(V, E_u)$. It then follows for $(i,j) \notin E_u^*$ from the global Markov property for ordinary graphical models that

$$X_i(t) \perp X_j(t) \mid \bar X(t),\, X_{V^*\setminus\{i,j\}}(t)$$

and further by (2.2.6)

$$X_i(t) \perp X_j(t) \mid \bar X_{V^*}(t),\, X_{V^*\setminus\{i,j\}}(t).$$

Therefore $G_{V^*} = (V^*, E_d^*, E_u^*)$ is the causality graph of the subprocess $\{X_{V^*}(t)\}$.

Let $E_m$ be the set of edges in the moral graph $G^m_{V^*}$. Analogously to the proof of Theorem 2.2.5 we now find that if $(i,j) \notin E_m$ then

$$X_i(t_1) \perp X_j(t_2) \mid \bar X_{V^*}(t) \setminus \{X_i(t_1), X_j(t_2)\}$$

for any $t_1, t_2 \le t$. Since $S$ separates $A$ and $B$ in $E_m$, there exists a partition $\{A, B, S, V_1, V_2\}$ of $V^*$ such that

$$\bar X_{B\cup V_1}(t) \perp \bar X_{A\cup V_2}(t) \mid \bar X_S(t). \qquad (2.2.7)$$

This implies the second part of the causal global Markov property. Further there exists a partition $\{S_1, S_2\}$ of $S$ such that

$$\bar X_{B\cup V_1}(t) \perp X_{S_1}(t) \mid \bar X_{A\cup V_2\cup S_2}(t),\, \bar X_{S_1}(t) \quad\text{and}\quad \bar X_{A\cup V_2}(t) \perp X_{S_2}(t) \mid \bar X_{B\cup V_1\cup S}(t),$$

since otherwise there would exist vertices $b \in B\cup V_1$ and $a \in A\cup V_2$ which were married and thus connected in $E_m$, in contradiction to (2.2.7). The first relation implies

$$\bar X_{B\cup V_1}(t) \perp \bar X_{A\cup V_2}(t),\, X_{S_1}(t) \mid \bar X_S(t).$$


Figure 2.2.2: Process with direct and indirect feedback: (a) causality graph and (b) moral graph.

Together with the second relation, we obtain from this

$$\bar X_{B\cup V_1}(t),\, X_{S_2}(t) \perp \bar X_{A\cup V_2}(t) \mid \bar X_S(t)$$

and finally

$$X_B(t) \perp \bar X_A(t) \mid \bar X_{S\cup B}(t).$$

This completes the proof. $\Box$

The separation criterion for noncausality is symmetric in the sets $A$ and $B$. Therefore it can only be used to detect noncausality in both directions, that is, $\{X_A(t)\}$ does not cause $\{X_B(t)\}$ relative to $\{X_{A\cup B\cup S}(t)\}$ and vice versa. For directed acyclic graphs or chain graphs this criterion is sufficient, as two variables can only be connected in one direction. In the case of causality graphs, we cannot assume such an ordering of the variables without loss of generality.

As an example, we consider a process $\{X(t)\}$ with the causality graph shown in Figure 2.2.2. The graph suggests that $\{X_A(t)\}$ causes $\{X_B(t)\}$ or $\{X_C(t)\}$ only via $\{X_S(t)\}$. On the other hand, $\{X_B(t)\}$ and $\{X_C(t)\}$ both have a causal effect on $\{X_A(t)\}$ which is not mediated by $\{X_S(t)\}$. Consequently, $S$ does not separate $A$ from $B$ and $C$ in the moral graph.

Moral graphs as defined above basically visualize conditional orthogonality of the component processes for the set of ancestors of the variables under study. For the investigation whether $\{X_A(t)\}$ causes $\{X_B(t)\}$, however, we are more interested in the conditional orthogonality between the present of $\{X_B(t)\}$ and the past of $\{X_A(t)\}$. The problem can be solved by inserting a new vertex $B^*$ into the graph, which represents $X_B(t)$, while all other vertices stand for the past ($A$ for $\bar X_A(t)$, etc.). By moralizing this extended graph, we obtain an extended moral graph $G^m_{\mathrm{an}(A\cup B\cup B^*\cup S)}$. Figure 2.2.3 shows the extended moral graphs $G^m_{\mathrm{an}(A\cup B\cup B^*\cup S)}$ and $G^m_{\mathrm{an}(A\cup C\cup C^*\cup S)}$ for the example in Figure 2.2.2. Since $S\cup B$ separates $B^*$ and $A$, we find that $\bar X_A(t)$ and $X_B(t)$ are conditionally orthogonal given $\bar X_{B\cup S}(t)$ and thus that the process $\{X_A(t)\}$ does not cause $\{X_B(t)\}$ relative to $\{X_{A\cup B\cup S}(t)\}$. For the vertices $A$ and $C$ we obtain a similar result.
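Separation statements of this kind can be checked mechanically by a breadth-first search that is forbidden to enter the separating set: the set separates if no search started in the first vertex set reaches the second. A minimal sketch (all names hypothetical; the moral graph is given as a dictionary of neighbour sets, and the three-vertex example is only schematic, not the graph of Figure 2.2.2):

```python
from collections import deque

def separates(graph, S, A, B):
    """True iff every path from A to B in the undirected graph meets S."""
    blocked = set(S)
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in B:
            return False          # reached B without crossing S
        for w in graph.get(v, ()):
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Schematic example: A - S - B plus a direct edge A - B
g = {'A': {'S', 'B'}, 'S': {'A', 'B'}, 'B': {'A', 'S'}}
print(separates(g, {'S'}, {'A'}, {'B'}))  # False: the edge A-B bypasses S
```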

The next theorem shows that the extension of causality graphs just described can indeed be used for the identification of unidirectional noncausality. For simplicity we consider only block graphs, where the target variables, $X_{B_1}(t), \ldots, X_{B_k}(t)$ say, have been combined to one vertex $B$. Otherwise the vertices $B_1^*$ to $B_k^*$ need to be connected in line with the rules for obtaining the reduced graph $G_{\mathrm{an}(A\cup B\cup C)}$ applied to the full extended graph with


Figure 2.2.3: Extended moral graphs for the causality graph in Fig. 2.2.2: (a) $\{X_A(t)\}$ does not cause $\{X_B(t)\}$ relative to $\{X_S(t)\}$; (b) $\{X_A(t)\}$ does not cause $\{X_C(t)\}$ relative to $\{X_S(t)\}$.

Figure 2.2.4: Extended moral graph for the causality graph in Figure 2.2.1.

Theorem 2.2.10 Let $\{X(t), t \in \mathbb{Z}\}$ be a vector-valued weakly stationary stochastic process such that condition (2.1.1) holds. Suppose that $A$, $B$, and $S$ are disjoint subsets of $V$ such that $S\cup B$ separates $B^*$ and $A$ in the extended moral graph $G^m_{\mathrm{an}(A\cup B\cup B^*\cup S)}$. Then the process $\{X_A(t)\}$ does not cause $\{X_B(t)\}$ relative to $\{X_{A\cup B\cup S}(t)\}$, that is,

$$X_B(t) \perp \bar X_A(t) \mid \bar X_{S\cup B}(t).$$

Proof. Let $V^* = \mathrm{an}(A\cup B\cup S)$. Clearly the moral graph $G^m_{\mathrm{an}(A\cup B\cup S)}$ is a subgraph of the extended moral graph. Therefore if two vertices $i, j \neq B^*$ are not connected, then it follows as in the proof of Theorem 2.2.5 that

$$\bar X_i(t) \perp \bar X_j(t) \mid \bar X_{V^*\setminus\{i,j\}}(t).$$

Since $S\cup B$ separates $B^*$ from $A$, there exists a partition $\{A, B, S, V_1, V_2\}$ of $V^*$ such that $S\cup B$ separates $V_1$ and $A\cup V_2$ and $\mathrm{pa}(B) \subseteq S\cup V_1$. Then

$$\bar X_{V_1}(t) \perp \bar X_{A\cup V_2}(t) \mid \bar X_{S\cup B}(t).$$

Since $S\cup B$ now also separates $B^*$ from $A\cup V_2$, we further get

$$X_B(t) \perp \bar X_{A\cup V_2}(t) \mid \bar X_{S\cup B}(t),$$

from which the assertion of the theorem follows. $\Box$

It now follows from this result that the intuitive interpretation of the graph in Figure 2.2.1 has been correct. As we can see in Figure 2.2.4, the set $\{3,5\}$ separates 1 and $5^*$, which leads to

$$X_5(t) \perp \bar X_1(t) \mid \bar X_{\{3,5\}}(t).$$


2.2.3 Concluding remarks

In this section we have considered linear causality graphs for weakly stationary processes. The question arises whether it is possible to generalize the definition such that nonlinear causal relationships between the processes can be handled.

Remark 2.2.11 Florens and Mouchart (1982) and Bouissou et al. (1986) have also defined Granger-causality in terms of conditional independence. While properties (C1) to (C4) still hold if we replace conditional orthogonality by conditional independence, stronger assumptions on the process are needed to guarantee also property (C5). Even with such assumptions, Lemma 2.2.4 (ii) additionally requires that noncausality for single component processes $\{X_i(t)\}$, $i \in I$,

$$X_i(t) \perp \bar X_J(t) \mid \bar X_{V\setminus J}(t) \quad \forall\, i \in I,$$

implies noncausality for the joint vector process $\{X_I(t)\}$,

$$X_I(t) \perp \bar X_J(t) \mid \bar X_{V\setminus J}(t).$$

We list three examples for which this condition holds.

(i) $\{X(t)\}$ is a Gaussian process. Then conditional independence corresponds to conditional orthogonality, for which we have proved Lemma 2.2.4.

(ii) There is no instantaneous causality present between the components of $\{X(t)\}$. Then we have trivially

$$X_{I_r}(t) \perp X_{I\setminus I_r}(t) \mid \bar X(t)$$

for all sets $I_1, I_2 \subseteq V$ and $I = I_1 \cup I_2$. Together with the left-hand side in Lemma 2.2.4 (ii), this implies

$$X_{I_r}(t) \perp \bar X_J(t) \mid \bar X_{V\setminus J}(t),\, X_{I\setminus I_r}(t),$$

from which the block independence follows by (C5).

(iii) $\{X(t)\}$ is an autoregressive process of the form

$$X_i(t) = f_i\big(\bar X(t), \varepsilon_i(t)\big) \quad \forall\, i = 1, \ldots, K \qquad (2.2.8)$$

where the $f_i$ are measurable functions monotone in $\varepsilon_i(t)$ and $\varepsilon(t) \perp \bar X(t)$. Then if

$$X_i(t) \perp \bar X_j(t) \mid \bar X_{V\setminus\{j\}}(t),$$

the function $f_i(\bar X(t), \varepsilon_i(t))$ is constant in $\bar X_j(t)$ almost surely.

To show this we write $f_{\bar x_j}(y) = f_i(\bar x, y)$ for any $\bar X(t) = \bar x$ to denote that we leave all components of $\bar x$ except $\bar x_j$ fixed. Since $\varepsilon_i(t)$ is independent of the past of $\{X(t)\}$, we have

$$P\big(X_i(t) \le y \mid \bar X(t) = \bar x\big) = P\big(f_{\bar x_j}(\varepsilon_i(t)) \le y \mid \bar X(t) = \bar x\big) = F_{\varepsilon_i(t)}\big(f_{\bar x_j}^{-1}(y)\big),$$

where $F_{\varepsilon_i(t)}$ is the distribution of $\varepsilon_i(t)$. The conditional independence then implies that the left-hand side does not depend on $\bar x_j$. Consequently

$$f_{\bar x_j}^{-1}(y) = f_{\bar x_j'}^{-1}(y) \quad P\text{-a.s.}$$

for all $\bar x_j, \bar x_j'$ and therefore $f_{\bar x_j}(y) = f_{\bar x_j'}(y)$.

Since the dependence of $X_i(t)$ on the past $\bar X(t)$ is defined for each component separately, the pairwise independence implies the mutual independence.

In this section we have shown that Granger-causality can be used to define directed graphs for weakly stationary time series. These causality graphs can be interpreted in terms of causality and instantaneous causality (or conditional contemporaneous correlation), which gives an intuitive meaning to the directed edges in the graph. Another advantage of causality graphs is their relation to autoregressive models, which are an important tool in time series analysis. On the other hand, the derivation of noncausality relations from the graph seems to be more difficult than, e.g., for chain graphs. We note, however, that in the absence of instantaneous causality the causality graph of a process can be obtained from the chain graph as defined by Lynggaard and Walther (1993) by forming blocks for each component process.


Nonparametric analysis

Partial spectral coherences are a well known tool for the nonparametric investigation of functional relationships between the components of multivariate stochastic processes (e.g. Brillinger, 1981; Rosenberg et al., 1989). The results of such an interrelation analysis can be visualized by conditional correlation graphs, which allow an intuitive interpretation of the obtained dependence structure.

As we have seen in the last chapter, the conditional correlation graph of a stochastic process is constructed by determining the pairs of components for which the partial spectral coherence is zero at all frequencies. However, when concerned with data, this decision must be based on estimates of the partial spectral coherence, which are only approximately zero even if there is no direct interrelation. Therefore tests need to be employed for building the conditional correlation graph from the data.

The first part of this chapter deals with the problem of testing for the presence of an edge in the otherwise complete conditional correlation graph. In Section 3.2, we consider the integrated partial spectral coherence as a test statistic and prove its asymptotic normality under the null hypothesis that the edge is missing. We also derive its asymptotic distribution under a sequence of contiguous alternatives. A simulation study shows that the new test has good power against small deviations from the null hypothesis and performs better than existing global tests.

The frequency domain methods discussed in this chapter have also been used for the analysis of stationary point processes such as neuronal spike trains (e.g. Rosenberg et al., 1989; Dahlhaus et al., 1997). For the identification of synaptic connections in a neuronal net, we are interested not only in the strength of a connection, which is measured by the partial spectral coherence, but also in information about the direction and the type of connection (excitatory or inhibitory). Although the direction can be identified with the help of spectral phase curves (cf. Dahlhaus et al., 1997), frequency domain methods do not allow excitatory and inhibitory connections to be distinguished. In Section 3.3, we suggest an extended analysis using partialized time domain statistics. We prove a functional central limit theorem for the new statistics. Examples with simulated data show that the new statistics allow the correct identification of the type and direction of a connection while retaining information about direct and indirect association between components.


3.1 Introduction

In this section, we set down the basic definitions concerning the estimation of frequency domain statistics. We consider a $d$-vector-valued stationary time series $\{X(t), t \in \mathbb{Z}\}$ such that $E|X_a(t)|^k < \infty$ for all $a = 1, \ldots, d$ and $k \in \mathbb{N}$. Then

$$c_{a_1,\ldots,a_k}(u_1, \ldots, u_{k-1}) = \operatorname{cum}\big(X_{a_1}(t+u_1), \ldots, X_{a_{k-1}}(t+u_{k-1}), X_{a_k}(t)\big)$$

is the $k$th order cumulant of the process. If a function $f_{a_1,\ldots,a_k} : \Pi^{k-1} \to \mathbb{C}$ with the property

$$c_{a_1,\ldots,a_k}(u_1, \ldots, u_{k-1}) = \int_{\Pi^{k-1}} f_{a_1,\ldots,a_k}(\lambda_1, \ldots, \lambda_{k-1}) \exp\Big(i \sum_{j=1}^{k-1} u_j \lambda_j\Big)\, d\lambda_1 \cdots d\lambda_{k-1}$$

exists, where $\Pi = [-\pi, \pi]$, we call it the $k$th order cumulant spectrum. We make the following assumptions on $\{X(t)\}$.

Assumption 3.1.1 $\{X(t), t \in \mathbb{Z}\}$ is a $d$-vector-valued stationary stochastic process defined on a probability space $(\Omega, \mathcal{A}, P)$.

(i) $\{X(t)\}$ has mean zero and spectral density matrix $f(\lambda) = \big(f_{ij}(\lambda)\big)_{i,j=1,\ldots,d}$ which satisfies the boundedness condition

$$a_1 1_d \le f(\lambda) \le a_2 1_d \quad \forall\, \lambda \in [-\pi, \pi]$$

for constants $a_1$ and $a_2$ with $0 < a_1 \le a_2 < \infty$.

(ii) The $k$th order cumulants of $\{X(t)\}$ satisfy the mixing conditions

$$\sum_{u_1,\ldots,u_{k-1} \in \mathbb{Z}} \big(1 + |u_j|^2\big)\, \big|c_{a_1,\ldots,a_k}(u_1, \ldots, u_{k-1})\big| < \infty$$

for all $j = 1, \ldots, k-1$.

The nonparametric estimation of the spectral densities $f_{ab}(\lambda)$ is usually based on the periodogram, which has the form

$$I_{ab}^{(T)}(\lambda) = \big(2\pi H_2^{(T)}(0)\big)^{-1}\, d_a^{(T)}(\lambda)\, \overline{d_b^{(T)}(\lambda)},$$

where

$$d_a^{(T)}(\lambda) = \sum_{t=1}^{T} h^{(T)}(t)\, X_a(t)\, \exp(-i\lambda t)$$

is the finite Fourier transform of the $a$th component of the process and

$$H_k^{(T)}(\lambda) = \sum_{t=1}^{T} h^{(T)}(t)^k \exp(-i\lambda t)$$

are the Fourier transforms of the data taper $h^{(T)}(t) = h\big((t - \tfrac12)/T\big)$. The taper function should be smooth with $h(0) = h(1) = 0$ in order to improve the small sample properties of the estimates.

For the estimation of the spectral densities $f_{ij}(\lambda)$ we consider kernel estimates of the form

$$\hat f_{ij}^{(T)}(\lambda) = \int_\Pi w^{(T)}(\lambda - \alpha)\, I_{ij}^{(T)}(\alpha)\, d\alpha$$

for a kernel $w^{(T)}(\lambda) = M_T\, w(M_T \lambda)$. We need the following assumptions.

Assumption 3.1.2 The kernel function $w(\lambda)$ is bounded, symmetric, nonnegative, and Lipschitz continuous with

$$\int_{\mathbb{R}} w(\lambda)\, d\lambda = 1 \quad\text{and}\quad \int_{\mathbb{R}} \lambda^2 w(\lambda)\, d\lambda < \infty.$$

Further, $w(\lambda)$ has continuous Fourier transform $\hat w(\alpha)$ such that

$$\int_{\mathbb{R}} \hat w(\alpha)^2\, d\alpha < \infty \quad\text{and}\quad \int_{\mathbb{R}} \hat w(\alpha)^4\, d\alpha < \infty.$$

Assumption 3.1.3 The sequence $(M_T)_{T\in\mathbb{N}}$ satisfies $M_T = O(T^\beta)$ with $\tfrac14 < \beta < \tfrac12$.

It follows from these assumptions that

$$\hat f_{ab}^{(T)}(\lambda) - f_{ab}(\lambda) = O_P\Big(\sqrt{\tfrac{M_T}{T}}\Big) \qquad (3.1.1)$$

and

$$E \hat f_{ab}^{(T)}(\lambda) = \int_\Pi f_{ab}(\alpha)\, w^{(T)}(\lambda - \alpha)\, d\alpha = f_{ab}(\lambda) + O\Big(\frac{1}{M_T^2}\Big) \qquad (3.1.2)$$

uniformly in $\lambda \in [-\pi, \pi]$ (e.g. Brillinger, 1981, Theorems 7.4.2 and 7.4.4). The conditions on $M_T$ guarantee that both the stochastic variation in (3.1.1) and the bias in (3.1.2) tend to zero sufficiently fast.

Substituting the kernel estimates $\hat f_{ab}^{(T)}(\lambda)$ for the true spectra $f_{ab}(\lambda)$ in (2.1.2) and (2.1.3), we now obtain consistent estimates for the partial spectra $f_{ab|C_{ab}}(\lambda)$ and the partial spectral coherences $R_{ab|C_{ab}}(\lambda)$. However, the identification of the conditional correlation graph requires the computation of these statistics for all pairs $a, b \in \{1, \ldots, d\}$, which is computationally impracticable for large $d$. Here, we can use Lemma 2.1.3, which allows the same estimates to be computed efficiently by an inversion of the estimated spectral matrix $\hat f^{(T)}(\lambda)$.
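The estimation route just described (taper, Fourier transform, periodogram smoothing, inversion of the estimated spectral matrix) can be sketched in a few lines. The following illustrative implementation (all names hypothetical) simplifies in two respects: it uses a cosine taper, and it smooths by a plain moving average over neighbouring Fourier frequencies instead of a continuous kernel. The final rescaling of the inverse matrix, $\hat R_{ab|C_{ab}}(\lambda) = -\hat g_{ab}(\lambda)/\sqrt{\hat g_{aa}(\lambda)\, \hat g_{bb}(\lambda)}$ with $\hat g = (\hat f^{(T)})^{-1}$, is the standard inversion formula behind Lemma 2.1.3:

```python
import numpy as np

def spectral_matrix(X, M):
    """Smoothed estimate of the (d x d) spectral matrix at all Fourier frequencies.

    X : (T, d) data array;  M : half-width of the smoothing window.
    """
    T, d = X.shape
    t = np.arange(1, T + 1)
    h = 0.5 * (1 - np.cos(2 * np.pi * (t - 0.5) / T))   # cosine taper h((t-1/2)/T)
    H2 = np.sum(h ** 2)
    D = np.fft.fft(h[:, None] * X, axis=0)              # tapered Fourier transform
    # periodogram matrix I(lambda_k) = D(lambda_k) D(lambda_k)* / (2 pi H2)
    I = np.einsum('ka,kb->kab', D, D.conj()) / (2 * np.pi * H2)
    # smooth over 2M+1 neighbouring Fourier frequencies (circularly extended)
    kernel = np.ones(2 * M + 1) / (2 * M + 1)
    fhat = np.empty_like(I)
    for a in range(d):
        for b in range(d):
            fhat[:, a, b] = np.convolve(
                np.concatenate([I[-M:, a, b], I[:, a, b], I[:M, a, b]]),
                kernel, mode='valid')
    return fhat

def partial_coherence(fhat):
    """Partial coherences for all pairs at once, via inversion of fhat."""
    g = np.linalg.inv(fhat)                             # inverse at each frequency
    scale = np.sqrt(np.abs(np.einsum('kaa->ka', g)))    # sqrt of the diagonal
    return -g / (scale[:, :, None] * scale[:, None, :])
```

All pairwise partial coherences are obtained from a single matrix inversion per frequency, which is the computational advantage referred to in the text.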

3.2 Testing for interrelation

Having estimated the partial spectral coherences as described above, the conditional correlation graph can be identified by testing for the presence of each single edge. Since the


partial spectral coherence is identically zero if the corresponding edge in the graph is missing, we are interested in testing the null hypothesis

$$H_0 : R_{ab|C_{ab}}(\lambda) \equiv 0 \quad\text{against}\quad H_1 : R_{ab|C_{ab}}(\lambda) \not\equiv 0. \qquad (3.2.1)$$

Tests for this problem have been used quite frequently for interrelation analysis in multivariate processes. The conditional correlation graph now summarizes and visualizes the results of such an analysis. However, the intuitive interpretation of the conditional correlation graph uses the global Markov property, which deals with the presence or absence of sets of edges. Therefore, the correct approach for the identification would be to consider the test problem under additional constraints due to edges which have already been deleted from the graph. Another problem which clearly arises is that of multiple testing. These problems will not be discussed in this work.

Dahlhaus et al. (1997) suggested rejecting the null hypothesis $H_0$ if the partial spectral coherence $\big|\hat R^{(T)}_{ab|C_{ab}}(\lambda)\big|^2$ exceeds an appropriate threshold. Thus the test statistic has the form

$$S^{(T)} = \sup_{\lambda\in[0,\pi]} \frac{2T}{\mu M_T} \big|\hat R^{(T)}_{ab|C_{ab}}(\lambda)\big|^2 \quad\text{where}\quad \mu = \frac{2\pi H_4}{H_2^2} \int_{\mathbb{R}} w(\alpha)^2\, d\alpha.$$

The partial spectral coherence rescaled as above is asymptotically $\chi^2_2$-distributed, but the exact distribution of the supremum is difficult to obtain. For an approximation we assume that the kernel function has compact support, $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ say. Since the partial coherences at frequencies $\lambda_1$ and $\lambda_2$ which are separated widely enough for the corresponding smoothing intervals to be non-overlapping are approximately independent, $S^{(T)}$ can then be approximated by the maximum over $M_T$ independent $\chi^2_2$-distributed random variables. Thus, the null hypothesis is rejected at significance level $\alpha$ if $S^{(T)} \ge \chi^2_{2,(1-\alpha)^{1/M_T}}$, where $\chi^2_{2,p}$ denotes the $p$-quantile of the $\chi^2_2$-distribution.
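On a frequency grid the threshold rule takes only a few lines. The sketch below is an assumed implementation (names hypothetical; the scaling constant $\mu$ is taken as precomputed from the taper and the kernel); it uses the closed form $\chi^2_{2,p} = -2\log(1-p)$ for the $\chi^2_2$ quantile:

```python
import numpy as np

def sup_coherence_test(R2, T, M, mu, alpha=0.05):
    """Threshold test of H0: R_ab|C == 0 (sketch following the sup-statistic).

    R2 : grid values of |R_hat_ab|C(lambda)|^2 on [0, pi].
    Returns (reject?, statistic).
    """
    S = np.max(2 * T / (mu * M) * R2)
    # quantile of chi^2 with 2 df in closed form: chi2_{2,p} = -2 log(1 - p)
    p = (1 - alpha) ** (1 / M)
    threshold = -2 * np.log(1 - p)
    return S >= threshold, S
```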

Taniguchi et al. (1996) considered instead the integrated partial spectral coherence

$$S^{(T)} = \frac{1}{2\pi} \int_\Pi \big|\hat R^{(T)}_{ab|C_{ab}}(\lambda)\big|^2\, d\lambda \qquad (3.2.2)$$

as a test statistic for the test problem in (3.2.1). In a more general setting, asymptotic normality is established for test statistics of the form

$$S^{(T)} = \sqrt{T}\Big(\int_\Pi K\big(\hat f^{(T)}(\lambda)\big)\, d\lambda - c\Big),$$

where $K(\cdot)$ is a holomorphic function. The proof is based on a Taylor expansion of first order. Thus with (3.1.1) and setting $K(f(\lambda)) = (2\pi)^{-1}\big|R_{ab|C_{ab}}(\lambda)\big|^2$, we get for the integrated partial spectral coherence

$$S^{(T)} = \int_\Pi K\big(f(\lambda)\big)\, d\lambda + \sum_{i,j=1}^d \int_\Pi \frac{\partial K\big(f(\lambda)\big)}{\partial f_{ij}} \big(\hat f^{(T)}_{ij}(\lambda) - f_{ij}(\lambda)\big)\, d\lambda + O_P\Big(\frac{M_T}{T}\Big). \qquad (3.2.3)$$


Under the null hypothesis $H_0$ the first term vanishes since $K(f(\lambda)) \equiv 0$. Further, we can rewrite the first derivatives as

$$\frac{\partial K(f(\lambda))}{\partial f_{ij}} = \sum_{k,l\in\{a,b\}} \frac{\partial K(f(\lambda))}{\partial f_{kl|C_{ab}}}\, \frac{\partial f_{kl|C_{ab}}}{\partial f_{ij}}(\lambda).$$

With the abbreviation $D_{ab}(\lambda) = 1/\big(2\pi f_{aa|C_{ab}}(\lambda)\, f_{bb|C_{ab}}(\lambda)\big)$, we have for $k, l \in \{a, b\}$ such that $k \neq l$

$$\frac{\partial K(f(\lambda))}{\partial f_{kl|C_{ab}}} = f_{lk|C_{ab}}(\lambda)\, D_{ab}(\lambda) \quad\text{and}\quad \frac{\partial K(f(\lambda))}{\partial f_{kk|C_{ab}}} = -\frac{\big|R_{ab|C_{ab}}(\lambda)\big|^2}{2\pi f_{kk|C_{ab}}(\lambda)}.$$

It follows from these expressions that under the null hypothesis $H_0$ the second term in (3.2.3) is also zero, and accordingly together with Assumption 3.1.3 we get $S^{(T)} = o_P\big(T^{-\frac12}\big)$. Thus the central limit theorem in Taniguchi et al. (1996) does not hold for the integrated partial spectral coherence if the null hypothesis in (3.2.1) is considered.

In the following, we derive the correct asymptotic distribution of $S^{(T)}$ under the null hypothesis $H_0$ and under a class of local alternatives.

3.2.1 Asymptotic null distribution

The derivation of the asymptotic distribution of $S^{(T)}$ is based on the following quadratic approximation:

$$S_2^{(T)} = \int_\Pi K\big(f(\lambda)\big)\, d\lambda + \sum_{i,j=1}^d \int_\Pi \frac{\partial K(f(\lambda))}{\partial f_{ij}} \big(\hat f^{(T)}_{ij}(\lambda) - f_{ij}(\lambda)\big)\, d\lambda + \frac{1}{2} \sum_{i,j,k,l=1}^d \int_\Pi \frac{\partial^2 K(f(\lambda))}{\partial f_{ij}\,\partial f_{kl}} \big(\hat f^{(T)}_{ij}(\lambda) - f_{ij}(\lambda)\big)\big(\hat f^{(T)}_{kl}(\lambda) - f_{kl}(\lambda)\big)\, d\lambda. \qquad (3.2.4)$$

It follows from Assumption 3.1.1 (i) that the third derivative of $K(\cdot)$ is bounded in a neighbourhood of $f$. Therefore we find by (3.1.1) and Assumption 3.1.3 that

$$S^{(T)} = S_2^{(T)} + o_P\big(\sqrt{M_T/T}\big). \qquad (3.2.5)$$

Under the null hypothesis $H_0$, the first two terms in (3.2.4) vanish. Furthermore, we obtain

$$\frac{\partial^2 K(f(\lambda))}{\partial f_{ab|C_{ab}}\,\partial f_{ba|C_{ab}}} = D_{ab}(\lambda),$$


while all other second derivatives are zero. Further it follows from the following Lemma 3.2.1 that

$$\sum_{i,j,k,l=1}^d \int_\Pi D_{ab}(\lambda)\, \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda)\, f_{ij}(\lambda) \big(\hat f^{(T)}_{kl}(\lambda) - f_{kl}(\lambda)\big)\, d\lambda = 0.$$

Therefore under the null hypothesis $H_0$, $S_2^{(T)}$ takes the form

$$S_2^{(T)} = \sum_{i,j,k,l=1}^d \int_\Pi D_{ab}(\lambda)\, \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda)\, \hat f^{(T)}_{ij}(\lambda)\, \hat f^{(T)}_{kl}(\lambda)\, d\lambda.$$

Lemma 3.2.1 Under Assumption 3.1.1 we have the following identities:

(i) $\displaystyle\sum_{i,j=1}^d \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, f_{ij}(\lambda) = f_{ab|C_{ab}}(\lambda)$;

(ii) $\displaystyle\sum_{i,j,k,l=1}^d \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda)\, f_{ik}(\lambda)\, f_{lj}(\lambda) = f_{ab|C_{ab}}(\lambda)\, f_{ba|C_{ab}}(\lambda)$;

(iii) $\displaystyle\sum_{i,j,k,l=1}^d \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda)\, f_{il}(\lambda)\, f_{kj}(\lambda) = f_{aa|C_{ab}}(\lambda)\, f_{bb|C_{ab}}(\lambda)$.

Proof. From the definition of the partial spectral density we get the derivatives

$$\frac{\partial f_{ab|C_{ab}}}{\partial f_{ab}}(\lambda) = 1,$$

$$\frac{\partial f_{ab|C_{ab}}}{\partial f_{as}}(\lambda) = -\sum_{j\neq a,b} g_{sj}(\lambda)\, f_{jb}(\lambda) \quad \forall\, s \neq a,b,$$

$$\frac{\partial f_{ab|C_{ab}}}{\partial f_{rb}}(\lambda) = -\sum_{j\neq a,b} f_{aj}(\lambda)\, g_{jr}(\lambda) \quad \forall\, r \neq a,b,$$

$$\frac{\partial f_{ab|C_{ab}}}{\partial f_{rs}}(\lambda) = \frac{\partial f_{ab|C_{ab}}}{\partial f_{as}}(\lambda) \cdot \frac{\partial f_{ab|C_{ab}}}{\partial f_{rb}}(\lambda) \quad \forall\, r, s \neq a,b,$$

where $g(\lambda)$ is the inverse of the spectral matrix $f_{C_{ab}C_{ab}}(\lambda)$. All other derivatives are equal to zero. Substituting these expressions for the derivatives in (i) to (iii), the sums yield the terms on the right side. $\Box$

We now state the main theorem of this section.

Theorem 3.2.2 Suppose that Assumptions 3.1.1 to 3.1.3 hold. Then under the null hypothesis $H_0$,

$$\frac{T S^{(T)} - M_T\, \mu}{\sqrt{M_T}\, \sigma} \xrightarrow{\;D\;} \mathcal{N}(0, 1),$$

where

$$\mu = \frac{2\pi H_4}{H_2^2} \int_{\mathbb{R}} w(\alpha)^2\, d\alpha \quad\text{and}\quad \sigma^2 = \frac{4\pi H_4^2}{H_2^4} \int_{\mathbb{R}} \hat w(\alpha)^4\, d\alpha.$$


Proof. In Lemmas 3.2.3, 3.2.7, and 3.2.9 we prove the convergence of the cumulants of first, second, and higher order of $S_2^{(T)}$ to the corresponding cumulants of the limit distribution. The result then follows from (3.2.5). $\Box$
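Theorem 3.2.2 translates directly into a one-sided test: standardize $T S^{(T)}$ by $M_T \mu$ and $\sqrt{M_T}\,\sigma$ and compare with a normal quantile. A hedged sketch (names hypothetical; $\mu$ and $\sigma$ are assumed precomputed from the taper $h$ and kernel $w$, and the integral is approximated by the grid mean):

```python
import numpy as np
from scipy.stats import norm

def integrated_coherence_test(R2, T, M, mu, sigma, alpha=0.05):
    """One-sided test of H0: R_ab|C == 0 based on the CLT of Theorem 3.2.2.

    R2 : |R_hat_ab|C(lambda)|^2 on an equispaced grid covering [-pi, pi].
    Returns (reject?, standardized statistic).
    """
    S = np.mean(R2)                              # approximates (2 pi)^-1 * integral of R2
    Z = (T * S - M * mu) / (np.sqrt(M) * sigma)  # asymptotically N(0,1) under H0
    return Z > norm.ppf(1 - alpha), Z
```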

For the proofs we need the following function, which has been introduced by Dahlhaus (1983). Let $L^{(T)} : \mathbb{R} \to \mathbb{R}$ be the periodic extension (with period $2\pi$) of

$$L^{(T)}(\lambda) = \begin{cases} T, & |\lambda| \le 1/T, \\ 1/|\lambda|, & 1/T < |\lambda| \le \pi. \end{cases} \qquad (3.2.6)$$

The properties of this function are summarized in the appendix. Under the stated assumptions on the taper function we then have

$$\big|H_k^{(T)}(\lambda)\big| \le C\, L^{(T)}(\lambda)$$

with a constant $C \in \mathbb{R}$ independent of $T$ and $\lambda$. Similarly, we obtain by Assumption 3.1.2 for the kernel function

$$w^{(T)}(\lambda) \le C\, \frac{L^{(M_T)}(\lambda)^2}{M_T}.$$

Further, we define the sequence $\{\Phi_2^{(T)}\}_{T\in\mathbb{N}}$ of functions

$$\Phi_2^{(T)}(\lambda) = \frac{\big|H_2^{(T)}(\lambda)\big|^2}{2\pi H_4^{(T)}(0)}, \qquad (3.2.7)$$

which is an approximate identity (cf. Dahlhaus, 1983).

Lemma 3.2.3 Suppose that Assumptions 3.1.1 to 3.1.3 hold. Under the null hypothesis $H_0$ we have

$$E\big(S_2^{(T)}\big) = \frac{M_T}{T}\, \frac{2\pi H_4}{H_2^2} \int_{\mathbb{R}} w(\alpha)^2\, d\alpha + o\Big(\frac{\sqrt{M_T}}{T}\Big). \qquad (3.2.8)$$

Proof. For fixed $i, j, k, l$ let

$$g(\lambda) = D_{ab}(\lambda)\, \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda).$$


Then it follows (cf. Brillinger, 1981, Theorem 2.3.2) that

$$\begin{aligned}
E\Big[\int_\Pi g(\lambda)\, \hat f^{(T)}_{ij}(\lambda)\, \hat f^{(T)}_{kl}(\lambda)\, d\lambda\Big]
&= \frac{1}{\big(2\pi H_2^{(T)}(0)\big)^2} \int_{\Pi^3} g(\lambda)\, w^{(T)}(\lambda-\alpha_1)\, w^{(T)}(\lambda-\alpha_2) \\
&\qquad\cdot \operatorname{cum}\big\{d_i^{(T)}(\alpha_1)\, d_j^{(T)}(-\alpha_1)\, d_k^{(T)}(\alpha_2)\, d_l^{(T)}(-\alpha_2)\big\}\, d\alpha_1\, d\alpha_2\, d\lambda \\
&= \frac{1}{\big(2\pi H_2^{(T)}(0)\big)^2} \int_{\Pi^3} g(\lambda)\, w^{(T)}(\lambda-\alpha_1)\, w^{(T)}(\lambda-\alpha_2) \\
&\qquad\cdot \Big[\operatorname{cum}\{d_i^{(T)}(\alpha_1), d_j^{(T)}(-\alpha_1), d_k^{(T)}(\alpha_2), d_l^{(T)}(-\alpha_2)\} \\
&\qquad\quad + \operatorname{cum}\{d_i^{(T)}(\alpha_1), d_j^{(T)}(-\alpha_1)\}\, \operatorname{cum}\{d_k^{(T)}(\alpha_2), d_l^{(T)}(-\alpha_2)\} \\
&\qquad\quad + \operatorname{cum}\{d_i^{(T)}(\alpha_1), d_k^{(T)}(\alpha_2)\}\, \operatorname{cum}\{d_j^{(T)}(-\alpha_1), d_l^{(T)}(-\alpha_2)\} \\
&\qquad\quad + \operatorname{cum}\{d_i^{(T)}(\alpha_1), d_l^{(T)}(-\alpha_2)\}\, \operatorname{cum}\{d_j^{(T)}(-\alpha_1), d_k^{(T)}(\alpha_2)\}\Big]\, d\alpha_1\, d\alpha_2\, d\lambda. \qquad (3.2.9)
\end{aligned}$$

By Theorem 4.3.2 of Brillinger (1981) we have

$$\operatorname{cum}\big\{d_{a_1}^{(T)}(\alpha_1), \ldots, d_{a_k}^{(T)}(\alpha_k)\big\} = (2\pi)^{k-1} H_k^{(T)}(\alpha_1 + \ldots + \alpha_k)\, f_{a_1\ldots a_k}(\alpha_1, \ldots, \alpha_{k-1}) + O(1) \qquad (3.2.10)$$

uniformly in $\alpha_1, \ldots, \alpha_k \in \Pi$. Substituting into (3.2.9), the first term becomes

$$\frac{C}{T} \int_{\Pi^3} g(\lambda)\, w^{(T)}(\lambda-\alpha_1)\, w^{(T)}(\lambda-\alpha_2)\, f_{ijkl}(\alpha_1, -\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2\, d\lambda + O\Big(\frac{1}{T^2}\Big) = O\Big(\frac{1}{T}\Big).$$

Further it follows from (3.1.2) and Lemma 3.2.1 that

$$\int_\Pi \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, f_{ij}(\alpha-\lambda)\, w^{(T)}(\alpha)\, d\alpha = O\Big(\frac{1}{M_T^2}\Big), \qquad (3.2.11)$$

and thus the leading term of the second term becomes

$$\int_{\Pi^3} g(\lambda)\, w^{(T)}(\lambda-\alpha_1)\, w^{(T)}(\lambda-\alpha_2)\, f_{ij}(\alpha_1)\, f_{kl}(\alpha_2)\, d\alpha_1\, d\alpha_2\, d\lambda = O\Big(\frac{1}{M_T^4}\Big). \qquad (3.2.12)$$

With the above bounds for $H_2^{(T)}(\lambda)$ and $w^{(T)}(\lambda)$ we obtain for the third term

$$\begin{aligned}
&\frac{C}{T^2} \int_{\Pi^3} g(\lambda)\, w^{(T)}(\lambda-\alpha_1)\, w^{(T)}(\lambda-\alpha_2)\, \big|H_2^{(T)}(\alpha_1+\alpha_2)\big|^2\, f_{ik}(\alpha_1)\, f_{jl}(-\alpha_1)\, d\alpha_1\, d\alpha_2\, d\lambda \\
&\le \frac{C}{T^2 M_T^2} \int_{\Pi^3} L^{(M_T)}(\lambda-\alpha_1)^2\, L^{(M_T)}(\lambda-\alpha_2)^2\, L^{(T)}(\alpha_1+\alpha_2)^2\, d\alpha_1\, d\alpha_2\, d\lambda \\
&\le \frac{C}{T M_T^2} \int_{\Pi^2} L^{(M_T)}(\lambda+\alpha)^2\, L^{(M_T)}(\lambda-\alpha)^2\, d\alpha\, d\lambda = O\Big(\frac{1}{T}\Big).
\end{aligned}$$

The last term can now be rewritten as

$$\frac{2\pi H_4}{T H_2^2} \int_{\Pi^3} g(\lambda+\alpha)\, w^{(T)}(\lambda)\, w^{(T)}(\lambda+\beta)\, \Phi_2^{(T)}(\beta)\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\beta\, d\lambda.$$


In order to prove the convergence to the term on the right side in (3.2.8), we first show that the differences

$$\int_{\Pi^3} g(\lambda+\alpha)\, w^{(T)}(\lambda)\, w^{(T)}(\lambda+\beta)\, \Phi_2^{(T)}(\beta)\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\beta\, d\lambda - \int_{\Pi^2} g(\lambda+\alpha)\, w^{(T)}(\lambda)^2\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\lambda$$

and

$$\int_{\Pi^2} g(\lambda+\alpha)\, w^{(T)}(\lambda)^2\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\lambda - \int_{\Pi^2} w^{(T)}(\lambda)^2\, g(\alpha)\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\lambda$$

are both of order $o\big(\sqrt{M_T}\big)$. Since $w$ is Lipschitz continuous, the first difference is bounded by

$$\begin{aligned}
&C \int_{\Pi^3} w^{(T)}(\lambda)\, \big|w^{(T)}(\lambda+\beta) - w^{(T)}(\lambda)\big|\, \Phi_2^{(T)}(\beta)\, d\alpha\, d\beta\, d\lambda \\
&\le C M_T \int_{\Pi^2} w(\lambda)\, \big|w(\lambda + M_T\beta) - w(\lambda)\big|\, \Phi_2^{(T)}(\beta)\, d\beta\, d\lambda \\
&\le C M_T^2 \int_\Pi |\beta|\, \Phi_2^{(T)}(\beta)\, d\beta \le \frac{C M_T^2}{T} \int_\Pi L^{(T)}(\beta)\, d\beta \le \frac{C M_T^2 \log(T)}{T},
\end{aligned}$$

which is of the desired order since by Assumption 3.1.3 $\sqrt{M_T} = O\big(T^{\beta/2}\big)$. For the second difference we note that by Assumption 3.1.1 (ii) the spectral densities and, together with (i), also the inverse spectra are continuous. Thus $g$ is Lipschitz continuous and we get

$$\begin{aligned}
&\Big|\int_{\Pi^2} g(\lambda+\alpha)\, w^{(T)}(\lambda)^2\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\alpha\, d\lambda - \int_{\Pi^2} w^{(T)}(\lambda)^2\, g(\alpha)\, f_{il}(\alpha)\, f_{kj}(\alpha)\, d\lambda\, d\alpha\Big| \\
&\le C \int_{\Pi^2} \big|g(\lambda+\alpha) - g(\alpha)\big|\, w^{(T)}(\lambda)^2\, d\alpha\, d\lambda \le \frac{C}{M_T^2} \int_\Pi |\lambda|\, L^{(M_T)}(\lambda)^4\, d\lambda \le C.
\end{aligned}$$

Thus we have shown

$$E\big(S_2^{(T)}\big) = \frac{M_T}{T}\, \mu \int_\Pi \sum_{i,j,k,l=1}^d D_{ab}(\lambda)\, \frac{\partial f_{ab|C_{ab}}}{\partial f_{ij}}(\lambda)\, \frac{\partial f_{ba|C_{ab}}}{\partial f_{kl}}(\lambda)\, f_{il}(\lambda)\, f_{kj}(\lambda)\, d\lambda + o\Big(\frac{\sqrt{M_T}}{T}\Big).$$

The assertion of the lemma now follows from Lemma 3.2.1. $\Box$

For the derivation of the covariance of $S^{(T)}$ we define the sequence $\{\Psi^{(T)}\}_{T\in\mathbb{N}}$ of functions

$$\Psi^{(T)}(\alpha_1, \ldots, \alpha_5) = \frac{1}{C_\Psi^{(T)}}\, w^{(T)}(\alpha_1) \cdots w^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, \Phi_2^{(T)}(\alpha_1+\alpha_2-\alpha_3-\alpha_4+\alpha_5)$$

with

$$C_\Psi^{(T)} = \int_{\Pi^5} w^{(T)}(\alpha_1) \cdots w^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, \Phi_2^{(T)}(\alpha_1+\alpha_2-\alpha_3-\alpha_4+\alpha_5)\, d\alpha_1 \cdots d\alpha_5.$$

Lemma 3.2.4 Let $w$ satisfy Assumption 3.1.2. Then

(i) $\Psi^{(T)}$ is an approximate identity, and

(ii) $\displaystyle\lim_{T\to\infty} \frac{1}{M_T}\, C_\Psi^{(T)} = \int_\Pi \hat w(\alpha)^4\, d\alpha$.

Proof. (i) It follows immediately from the definition of $\Psi^{(T)}$ that

$$\int_{\Pi^5} \Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\, d\alpha_1 \cdots d\alpha_5 = 1 \quad\text{and}\quad \int_{\Pi^5} \big|\Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\big|\, d\alpha_1 \cdots d\alpha_5 \le K < \infty.$$

Further, we have for any $\delta > 0$

$$\int_{U_\delta(0)^C} \big|\Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\big|\, d\alpha_1 \cdots d\alpha_5 \le \sum_{i=1}^5 \int_{|\alpha_i|>\delta} \int_{\Pi^4} \big|\Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\big|\, d\alpha_1 \cdots d\alpha_5, \qquad (3.2.13)$$

where $U_\delta(0) = \{\alpha \in \mathbb{R}^5 \mid \|\alpha\|_\infty \le \delta\}$. For $i = 1$ we obtain

$$\int_{|\alpha_1|>\delta} \int_{\Pi^4} \big|\Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\big|\, d\alpha_1 \cdots d\alpha_5 \le \frac{C}{M_T \delta^2} \int_{\Pi^5} w^{(T)}(\alpha_2) \cdots w^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, \Phi_2^{(T)}(\alpha_1+\alpha_2-\alpha_3-\alpha_4+\alpha_5)\, d\alpha_1 \cdots d\alpha_5 \le \frac{C}{M_T \delta^2} \to 0 \quad\text{as } T \to \infty.$$

The cases $i = 2, \ldots, 5$ can be treated similarly. Therefore (3.2.13) tends asymptotically to zero.

(ii) To prove the second part of the lemma, we first note that

$$\int_\Pi \hat w(\alpha)^4\, d\alpha = \int_{\Pi^3} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, w(\alpha_1+\alpha_2-\alpha_3)\, d\alpha_1\, d\alpha_2\, d\alpha_3.$$

Thus we get for $\delta_T > 0$

$$\begin{aligned}
&\Big|\frac{1}{M_T}\, C_\Psi^{(T)} - \int_{\Pi^3} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, w(\alpha_1+\alpha_2-\alpha_3)\, d\alpha_1\, d\alpha_2\, d\alpha_3\Big| \\
&\le \int_{\Pi^5} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, \big|w\big(\alpha_1+\alpha_2-\alpha_3+M_T(\alpha_5-\alpha_4)\big) - w(\alpha_1+\alpha_2-\alpha_3)\big|\, \Phi_2^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, d\alpha_1 \cdots d\alpha_5 \\
&\le 2\|w\|_\infty \int_{\Pi^3} \int_{|\alpha_4|>\delta_T \vee |\alpha_5|>\delta_T} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, \Phi_2^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, d\alpha_1 \cdots d\alpha_5 \\
&\quad + C M_T \int_{\Pi^3} \int_{|\alpha_4|\le\delta_T \wedge |\alpha_5|\le\delta_T} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, |\alpha_5-\alpha_4|\, \Phi_2^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, d\alpha_1 \cdots d\alpha_5.
\end{aligned}$$

For the second term we find for each $\varepsilon > 0$ some $\delta > 0$ such that for $\delta_T = \delta/M_T$ the term is bounded by

$$C M_T \delta_T \int_{\Pi^5} w(\alpha_1)\, w(\alpha_2)\, w(\alpha_3)\, \Phi_2^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, d\alpha_1 \cdots d\alpha_5 = C\delta \le \frac{\varepsilon}{2}.$$

Then there exists $T_0 > 0$ such that for all $T > T_0$ the first term is bounded by

$$C \int_\Pi \int_{|\alpha_4|>\delta_T} \Phi_2^{(T)}(\alpha_4)\, \Phi_2^{(T)}(\alpha_5)\, d\alpha_4\, d\alpha_5 \le \frac{C}{T} \int_{|\alpha|>\delta_T} L^{(T)}(\alpha)^2\, d\alpha \le \frac{C M_T^2}{T \delta^2} \le \frac{\varepsilon}{2},$$

since $M_T^2/T \to 0$ for $T \to \infty$. This proves (ii). $\Box$

Lemma 3.2.5 Let $g : \mathbb{R} \to \mathbb{R}$ and $h : \mathbb{R}^3 \to \mathbb{R}$ be integrable functions. Then

$$\int_{\Pi^6} g(\lambda)\, h(\alpha_1+\alpha_2+\alpha_3-\lambda,\, \lambda-\alpha_1,\, \lambda-\alpha_2)\, \Psi^{(T)}(\alpha_1, \ldots, \alpha_5)\, d\alpha_1 \cdots d\alpha_5\, d\lambda - \int_\Pi g(\lambda)\, h(\lambda, \lambda, \lambda)\, d\lambda = o(1).$$

Proof. This follows e.g. from Theorem 2.10 in Alt (1992). $\Box$

For the derivation of the cumulants of second and higher order we have to consider cumulants of the form

$$\operatorname{cum}\big\{d_{i_{j,1}}^{(T)}(\alpha_{j,1})\, d_{i_{j,2}}^{(T)}(-\alpha_{j,1})\, d_{i_{j,3}}^{(T)}(\alpha_{j,2})\, d_{i_{j,4}}^{(T)}(-\alpha_{j,2}) \,\big|\, j = 1, \ldots, k\big\}. \qquad (3.2.14)$$

Let $\sum_{\mathrm{i.p.}}$ denote the sum over all indecomposable partitions $\{P_1, \ldots, P_m\}$ of the table

$$\begin{matrix} \alpha_{1,1} & -\alpha_{1,1} & \alpha_{1,2} & -\alpha_{1,2} \\ \vdots & \vdots & \vdots & \vdots \\ \alpha_{k,1} & -\alpha_{k,1} & \alpha_{k,2} & -\alpha_{k,2} \end{matrix} \qquad (3.2.15)$$

with $p_j = |P_j|$, $P_j = \{\gamma_{j,1}, \ldots, \gamma_{j,p_j}\}$ and $\bar\gamma_j = \gamma_{j,1} + \ldots + \gamma_{j,p_j}$. We call two sets $P_i$ and $P_j$ hooked if there exist an index $l \in \{1, \ldots, k\}$ and variables $\gamma_{i_r} \in P_i$ and $\gamma_{i_{r'}} \in P_j$ such that $\gamma_{i_r}$ and $\gamma_{i_{r'}}$ are both contained in the $l$th row $\{\alpha_{l,1}, -\alpha_{l,1}, \alpha_{l,2}, -\alpha_{l,2}\}$. Thus, in an indecomposable partition every set $P_j$ is hooked to at least one other set $P_i$. Each partition $\{P_1, \ldots, P_m\}$ of the table (3.2.15) can also be interpreted as a partition of the table

$$\begin{matrix} \alpha_{1,1} & -\alpha_{1,1} \\ \alpha_{1,2} & -\alpha_{1,2} \\ \vdots & \vdots \\ \alpha_{k,2} & -\alpha_{k,2} \end{matrix}. \qquad (3.2.16)$$
