
3.3 Time domain analysis

3.3.3 Interrelation analysis in the time domain: An example

In this section, we illustrate the properties of the proposed correlation function as a tool for the identification of connectivity in multivariate point processes. For this purpose, we have generated data from a mutually exciting nonlinear point process which allows excitatory and inhibitory connections between the components.

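For a multivariate point process $N = (N_1, \dots, N_d)'$ on $\mathbb{R}$, let $\mathcal{H}_t$ denote the σ-algebra generated by the process up to time $t$. We then consider processes whose conditional intensity functions satisfy

$$\mathbb{P}\big(dN_a(t) = 1 \,\big|\, \mathcal{H}_t\big) = \exp\Big(\mu_a + \sum_{b=1}^{d} \mu_{ab}(t)\Big)\,dt, \qquad \text{where } \mu_{ab}(t) = \int_{\mathbb{R}} \gamma_{ab}(t-u)\,dN_b(u).$$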

The constant $\mu_a$ characterizes the spontaneous activity of process $N_a$, while $\mu_{ab}(t)$ signifies the change of activity induced by process $N_b$. The link function $\gamma_{ab}$ determines the nature of the connection from $N_b$ to $N_a$. In particular, $\gamma_{ab}(u) < 0$ represents inhibition, while positive values indicate excitation. We suppose that $\gamma_{ab}(u) = 0$ for all $u < 0$, which expresses that only events from the past can have an influence. Apart from the fact that such processes do not allow for the modeling of refractory periods, this seems to be a reasonable model for synaptic interactions.

Figure 3.3.1: (a) Connectivity of the simulated multivariate point process with vertices 1 to 5 (legend: excitatory connection, inhibitory connection). (b) Estimated conditional correlation graph.

Figure 3.3.2 (axes: frequency [Hz]; partial coherence, partial phase; components 1 to 5): Below the diagonal: Estimated partial spectral coherences $|\hat R^{(T)}_{ab|C_{ab}}(\lambda)|^2$ (solid) and spectral coherences $|\hat R^{(T)}_{ab}(\lambda)|^2$ (dotted) for the simulated data. The horizontal dashed lines represent critical thresholds at significance level α = 5% for the maximum partial spectral coherence $S^{(T)}_\circ$ under the hypothesis that $R_{ab|C_{ab}}(\lambda) \equiv 0$. Above the diagonal: Estimated partial phase spectra $\hat\phi^{(T)}_{ab}(\lambda)$ with 95% confidence bands.

We have generated a five-dimensional process with the connectivity structure shown in Figure 3.3.1 (a) and sample length $T = 30$. For the simulation we have used exponential link functions of the form

$$\gamma_{ab}(u) = \alpha_{ab}\exp\big(-\beta_{ab}(u - u_{ab})\big),$$

where $\alpha_{ab} = 1.2$ and $\alpha_{ab} = -2.0$ for excitatory and inhibitory connections, respectively, $\beta_{ab} = 100.0$, and $u_{ab} = 0.025$, which determines the time delay of the connection.
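The model above can be simulated approximately on a fine time grid. The following Python sketch is not the simulation scheme used for the figures; it is a simple discrete-time (Bernoulli) approximation, assuming that component $a$ fires in a bin of width `dt` with probability $\exp(\mu_a + \sum_b \mu_{ab}(t))\,dt$, and that the exponential link is set to zero for $u < u_{ab}$ (an assumption, since the formula above does not state the truncation). The names `exp_link` and `simulate`, the bin width, the memory cutoff and any baseline rates passed in are hypothetical choices for illustration.

```python
import numpy as np

def exp_link(u, alpha, beta, delay):
    """gamma(u) = alpha * exp(-beta * (u - delay)) for u >= delay, else 0.
    The truncation below `delay` is an assumption of this sketch."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= delay, alpha * np.exp(-beta * (u - delay)), 0.0)

def simulate(mu, alpha, T, dt=0.001, beta=100.0, delay=0.025, memory=0.2, seed=0):
    """Discrete-time (Bernoulli) approximation of the exp-link point process.

    mu:    (d,) spontaneous log-rates mu_a
    alpha: (d, d) link amplitudes alpha_ab (0 means no connection from b to a)
    Returns a (d, n_bins) 0/1 array; spikes[a, t] = 1 if N_a jumps in bin t.
    """
    rng = np.random.default_rng(seed)
    d, n_bins = len(mu), int(round(T / dt))
    lags = dt * np.arange(1, int(round(memory / dt)) + 1)  # strictly positive lags
    kernel = alpha[:, :, None] * exp_link(lags, 1.0, beta, delay)[None, None, :]
    spikes = np.zeros((d, n_bins), dtype=np.int8)
    for t in range(n_bins):
        # most recent bin first, so past[:, j] sits at lag (j + 1) * dt
        past = spikes[:, max(0, t - lags.size):t][:, ::-1].astype(float)
        m = past.shape[1]
        drive = np.einsum("abk,bk->a", kernel[:, :, :m], past)  # sum_b mu_ab(t)
        rate = np.exp(mu + drive)                              # conditional intensity
        spikes[:, t] = rng.random(d) < np.minimum(1.0, rate * dt)
    return spikes
```

The approximation is only reasonable when the firing probability per bin stays well below one; with $\beta_{ab} = 100$ the link has decayed by a factor of about $e^{-17.5}$ after 0.2 s, so the memory cutoff discards a negligible tail.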

Figure 3.3.2 shows the estimated partial spectral coherences for the simulated data. From these we obtain the conditional correlation graph in Figure 3.3.1 (b) by connecting two vertices with an edge whenever the corresponding partial spectral coherence exceeds the threshold. Apart from one additional edge between vertices 2 and 3, this graph corresponds to the connectivity structure of the process. A better understanding of the underlying structure can be gained by applying the identification procedure suggested by Dahlhaus et al. (1997). Noting that the slope of the partial phase curve $\phi_{ab|C_{ab}}(\lambda)$ indicates the time delay of the connection, we can identify the directions of all edges except (2, 3). From these directions we can identify the edges that might be due to a marrying parents effect; these are then reexamined with all successors deleted from the graph. For the edge (2, 3) we therefore have to compute $|\hat R^{(T)}_{23|1}(\lambda)|^2$ (Fig. 3.3.3), from which it follows that the additional edge is indeed due to a marrying parents effect. Thus, we have correctly identified the connections and their directions. However, this frequency domain analysis provides no information about the type of the connections.

Figure 3.3.3: Estimated partial spectral coherence $|\hat R^{(T)}_{23|1}(\lambda)|^2$ (axes: frequency [Hz], partial coherence).
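The partial spectral quantities used in this identification step can be sketched as follows, assuming the standard identity that expresses the partial coherency of components $a$ and $b$ given all others through the inverse spectral matrix $g(\lambda) = f(\lambda)^{-1}$, namely $R_{ab|C_{ab}}(\lambda) = -g_{ab}(\lambda)/\sqrt{g_{aa}(\lambda)g_{bb}(\lambda)}$. The hypothetical helper `delay_from_phase` fits a straight line to the unwrapped partial phase; the magnitude of the slope divided by $2\pi$ is the delay in seconds when frequency is in Hz, while its sign depends on the Fourier convention. The sketch works on a given spectral matrix and omits spectral estimation, smoothing and the significance thresholds.

```python
import numpy as np

def partial_coherency(f):
    """R_{ab|rest} = -g_ab / sqrt(g_aa * g_bb) with g = f^{-1}, for one
    Hermitian positive-definite d x d spectral matrix f(lambda)."""
    g = np.linalg.inv(f)
    scale = np.sqrt(np.real(np.diag(g)))
    r = -g / np.outer(scale, scale)
    np.fill_diagonal(r, 1.0)
    return r

def partial_coherence_and_phase(f):
    """Squared partial coherence |R_{ab|rest}|^2 and partial phase arg R_{ab|rest}."""
    r = partial_coherency(f)
    return np.abs(r) ** 2, np.angle(r)

def delay_from_phase(freqs_hz, phase):
    """Least-squares slope of the unwrapped phase divided by 2*pi (seconds);
    the sign depends on the Fourier convention in use."""
    slope = np.polyfit(freqs_hz, np.unwrap(phase), 1)[0]
    return slope / (2 * np.pi)
```

With only two components there is nothing to partial out, so the partial coherency reduces to the ordinary coherency $f_{ab}/\sqrt{f_{aa}f_{bb}}$, which is a convenient sanity check.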

Next, we analyse the same data using scaled partial covariance densities. The estimates of the scaled partial covariance densities are given in Figure 3.3.4. The curves, which have peaks or troughs at specific times while being otherwise flat, are typical of the kind of plots obtained for neurophysiological data sets. Here, direct association between the components of the process is marked by significant deviations of the scaled partial covariance density from zero. In the plots, the horizontal dotted lines represent the pointwise critical values centered around zero. These are calculated from the limiting distribution of $\hat\rho^{(T)}_{ab|C_{ab}}(u)$ under the null hypothesis $\rho_{ab|C_{ab}}(u) = 0$. Since these are not simultaneous critical values, we have to make allowances for occasional exceedances of these thresholds. Simultaneous confidence bands can be obtained from Theorem 3.3.9, similarly to Eichler (1995, Example 3.2).

Figure 3.3.4: Estimated scaled partial covariance densities $\hat\rho^{(T)}_{ab|C_{ab}}(u)$ and estimated scaled covariance densities $\hat\rho^{(T)}_{ab}(u)$ for the simulated data (axes: time [sec]; partial correlation, correlation). The horizontal dotted lines represent pointwise critical values for the hypothesis $\rho_{ab|C_{ab}}(u) = 0$ at significance level α = 5%.

Figure 3.3.5: Estimated scaled partial covariance densities (a) $\hat\rho^{(T)}_{23|1}(u)$ and (b) $\hat\rho^{(T)}_{34|12}(u)$ (axes: time [sec], partial correlation).

Examining the plots, we identify the same edges in the conditional correlation graph as with the frequency domain approach above. Further, we can estimate the time delay, and thus the direction of a connection, from the distance of the peak or trough from the origin. For all edges except (2, 3) we obtain approximately the correct time delay of 25 milliseconds, although one curve, $\hat\rho^{(T)}_{34|125}(u)$, has a flat second peak at the origin. For the remaining edge (2, 3) we find a significant trough at the origin. Since synaptic connections between neurons cannot have zero delay, such significant deviations at the origin typically indicate a common input or, for partialized statistics, a marrying parents effect if this is suggested by the directions of the other edges. To decide between these two possibilities, we can apply the same identification procedure as in the frequency domain. Thus, we have to compute $\hat\rho^{(T)}_{23|1}(u)$ and $\hat\rho^{(T)}_{34|12}(u)$. As the plots in Figure 3.3.5 show, there is no common input, and the peak and the trough were only due to a marrying parents effect.
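The choice of conditioning sets in the reexamination step can be made mechanical. The following sketch, with hypothetical helper names, assumes that the edge directions found so far are given as (parent, child) pairs; an undirected edge {a, b} is reexamined only if a and b share a child, and the new conditioning set consists of all vertices except a, b and their successors.

```python
def descendants(children, v):
    """All vertices reachable from v along directed edges."""
    seen, stack = set(), list(children.get(v, ()))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(children.get(w, ()))
    return seen

def reexamination_set(vertices, directed, a, b):
    """Conditioning set for re-testing the edge {a, b}, or None if a and b have
    no common child (then the edge cannot be a marrying parents artefact)."""
    children = {}
    for parent, child in directed:
        children.setdefault(parent, set()).add(child)
    if not (children.get(a, set()) & children.get(b, set())):
        return None
    successors = descendants(children, a) | descendants(children, b)
    return set(vertices) - {a, b} - successors
```

For instance, with directed edges 1 → 2, 2 → 4 and 3 → 4 on the vertices {1, 2, 3, 4}, the edge {2, 3} is reexamined given {1}. This small graph is illustrative only and is not the graph of Figure 3.3.1.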

Finally, the type of a connection (excitation or inhibition) is indicated by peaks and troughs, respectively. Examining the plots, we detect two inhibitory connections, between $N_1$ and $N_2$ and between $N_2$ and $N_5$, while all other connections are excitatory.
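The reading rules of the last two paragraphs can be condensed into a small decision function. This is a sketch under simplifying assumptions: only the single largest deviation of the curve is used, the pointwise critical value `crit` is taken as given, and lags within the hypothetical tolerance `zero_tol` of the origin are reported as zero-lag effects. Which sign of the lag corresponds to which direction depends on how $\rho_{ab}(u)$ is defined and is not decided here.

```python
import numpy as np

def classify_connection(lags, rho, crit, zero_tol):
    """Read type and delay from the largest deviation of an estimated scaled
    partial covariance density rho(u) evaluated on the grid `lags`."""
    lags, rho = np.asarray(lags, float), np.asarray(rho, float)
    k = int(np.argmax(np.abs(rho)))
    if abs(rho[k]) <= crit:
        return None                      # no significant deviation: no edge
    if abs(lags[k]) <= zero_tol:
        return ("zero lag", 0.0)         # common input or marrying parents effect
    kind = "excitatory" if rho[k] > 0 else "inhibitory"  # peak vs. trough
    return (kind, float(lags[k]))
```

A zero-lag result is exactly the case in which the reexamination with reduced conditioning sets described above is needed.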

Therefore, the connectivity structure of the process as given in Figure 3.3.1 has been completely identified.

In Figure 3.3.4, we have also included the scaled covariance densities without partialization. These curves are equivalent to the cross-correlation histogram, which is a widely used tool in the analysis of neurophysiological data. More precisely, the cross-correlation histogram is an estimate for the renewal density of the process, which can be obtained from the covariance density by a linear transformation. As expected, these curves do not provide sufficient information for the identification of the connectivity structure of the process. Particularly interesting is the scaled covariance density between $N_3$ and $N_5$, which exhibits both a peak and a trough. Consequently, it could not be used for inference about the type of the connection even if the frequency domain method were employed to identify the connections and their directions.
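For comparison, a raw cross-correlation histogram can be sketched in a few lines. This version simply counts the differences $t_b - t_a$ between the spike times of two trains within a window of ±`window` seconds; it omits the normalization that relates the histogram to the covariance density, and the function name and parameters are hypothetical.

```python
import numpy as np

def cross_correlation_histogram(times_a, times_b, window, bin_width):
    """Counts of spike-time differences t_b - t_a in [-window, window)."""
    times_a = np.sort(np.asarray(times_a, float))
    times_b = np.sort(np.asarray(times_b, float))
    diffs = []
    for t in times_a:
        # spikes of train b within the window around t (sorted search)
        lo, hi = np.searchsorted(times_b, [t - window, t + window])
        diffs.append(times_b[lo:hi] - t)
    diffs = np.concatenate(diffs) if diffs else np.empty(0)
    n_bins = int(round(2 * window / bin_width))
    edges = np.linspace(-window, window, n_bins + 1)
    counts, _ = np.histogram(diffs, bins=edges)
    return edges, counts
```

Because every pair of spikes contributes regardless of intermediate components, such a histogram mixes direct and indirect effects, which is why it cannot separate them in the way the partialized statistic does.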

In summary, the example has shown that the newly proposed scaled partial covariance density can be used to identify the connectivity structure of multivariate point processes. The new statistic provides information about the type and the direction of a connection and allows one to distinguish direct connections from indirect connections and common inputs. It can be interpreted in the same way as the widely used cross-correlation histogram and might therefore lead to a better acceptance of partialization methods in neurophysiology. The statistics can be computed efficiently by fast Fourier transforms for point processes (cf. Rigas, 1991, 1992) and can therefore be used as an extension of the partialization analysis in the frequency domain.

Selection of graphical interaction models

In the previous chapter, we have seen that conditional independence graphs can be used to summarize and visualize the findings of nonparametric interrelation analysis in a concise way. However, it is unclear in which way the estimated dependence structure is related to the probability distribution of the observed process and whether it is (in some sense) the best estimate.

An alternative to the nonparametric approach is the fitting of parametric graphical models where the parameters are constrained with respect to conditional correlation graphs. The problem of estimating the dependence structure of the process now becomes a problem of model selection where the best approximating model minimizes some chosen model distance such as the Kullback-Leibler information divergence.

In this chapter we discuss the problem of model selection for the class of graphical autoregressive models introduced in Example 2.1.5. The aim is to derive the asymptotic efficiency of a version of the AIC criterion with respect to the Kullback-Leibler information divergence. In the first section we derive implicit equations for the Whittle estimate which are similar to the equations for the maximum likelihood estimate in the case of ordinary Gaussian graphical models. In Section 4.2 we investigate the asymptotic behaviour of the Kullback-Leibler information and show that it can be approximated by a deterministic function. This is exploited in Section 4.3 to derive the asymptotic efficiency of the proposed model selection criterion.

4.1 Model fitting

A fundamental, information-theoretic measure for the separation or distance between two probability distributions is the Kullback-Leibler information (Kullback and Leibler, 1951), which gives the mean information per observation for discriminating between the true and a fitted distribution. Let $X(1), \dots, X(T)$ be observations from a multivariate Gaussian stationary process specified by some infinite-dimensional parameter $\theta_0$. Then, for density functions $p_{\theta_0}$ and $p_\theta$ and spectral matrices $f_{\theta_0}$ and $f_\theta$, the Kullback-Leibler information between the process and a fitted model specified by the parameter $\theta$ is given by

$$I(\theta, \theta_0) = \lim_{T\to\infty} -\frac{1}{T}\,\mathbb{E}_{\theta_0}\log\frac{p_\theta(X_1, \dots, X_T)}{p_{\theta_0}(X_1, \dots, X_T)} = \frac{1}{4\pi}\int_\Pi \Big\{\log\frac{\det f_\theta(\lambda)}{\det f_{\theta_0}(\lambda)} + \operatorname{tr}\big(f_{\theta_0}(\lambda) f_\theta(\lambda)^{-1} - \mathbf{1}_d\big)\Big\}\,d\lambda$$

(cf. Parzen, 1983), where $\Pi = [-\pi, \pi]$. Minimization of $I(\theta, \theta_0)$ with respect to $\theta$ is equivalent to minimizing

$$\mathcal{L}(\theta) = \frac{1}{4\pi}\int_\Pi \Big\{\log\det f_\theta(\lambda) + \operatorname{tr}\big(f_{\theta_0}(\lambda) f_\theta(\lambda)^{-1}\big)\Big\}\,d\lambda.$$
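The integral representation of $I(\theta, \theta_0)$ can be evaluated numerically once both spectral matrices are available as functions of $\lambda$. The sketch below uses the rectangle rule on a uniform grid over $\Pi$, which is very accurate for smooth periodic integrands; the function name and the grid size are assumptions. For scalar white noise with variances $\sigma^2$ and $\sigma_0^2$, the integral reduces to $\tfrac12(\log r + 1/r - 1)$ with $r = \sigma^2/\sigma_0^2$, which gives a simple check.

```python
import numpy as np

def kl_information(f_theta, f_theta0, n_grid=2048):
    """(1/4pi) int_Pi {log det f_theta - log det f_theta0
    + tr(f_theta0 f_theta^{-1}) - d} dlambda, by the rectangle rule.
    f_theta, f_theta0 map lambda to a Hermitian positive-definite (d, d) array."""
    total = 0.0
    for k in range(n_grid):
        lam = -np.pi + 2 * np.pi * k / n_grid
        f1, f0 = f_theta(lam), f_theta0(lam)
        _, logdet1 = np.linalg.slogdet(f1)  # log det of a positive-definite matrix
        _, logdet0 = np.linalg.slogdet(f0)
        trace = np.trace(f0 @ np.linalg.inv(f1)).real
        total += logdet1 - logdet0 + trace - f1.shape[0]
    return total * (2 * np.pi / n_grid) / (4 * np.pi)
```

Dropping the terms that do not depend on $\theta$ (here $\log\det f_{\theta_0}$ and $d$) gives $\mathcal{L}(\theta)$ up to an additive constant.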

In the following we assume that $\{X(t), t \in \mathbb{Z}\}$ is an autoregressive process of infinite order, to which we will fit graphical autoregressive models of finite order $p$. If the order $p$ is allowed to diverge to infinity with increasing sample size, the process can asymptotically be fitted by the correct model; this is crucial in our investigation of the asymptotic properties of the Kullback-Leibler information.

Assumption 4.1.1 $\{X(t), t \in \mathbb{Z}\}$ is a $d$-vector-valued stochastic process defined on a probability space $(\Omega, \mathcal{A}, P)$ such that the following conditions hold.

(i) $\{X(t)\}$ is a stationary Gaussian autoregressive process,

$$X(t) = \sum_{h=1}^{\infty} A_h X(t-h) + \varepsilon(t), \qquad (4.1.1)$$

with $d \times d$ coefficient matrices $A_h$ such that $\|A_h\| \neq 0$ for infinitely many $h \in \mathbb{N}$ for any matrix norm $\|\cdot\|$. The innovations $\varepsilon(t)$, $t \in \mathbb{Z}$, are independent and normally distributed with mean $\mathbb{E}\,\varepsilon(t) = 0$ and regular covariance matrix $\mathbb{E}\,\varepsilon(t)\varepsilon(t)' = \Sigma$.

(ii) The spectral matrix $f(\lambda)$ of $\{X(t)\}$ exists and satisfies the boundedness condition

$$a_1\mathbf{1}_d \le f(\lambda) \le a_2\mathbf{1}_d \quad \forall \lambda \in [-\pi, \pi]$$

for constants $a_1$ and $a_2$ such that $0 < a_1 \le a_2 < \infty$.

(iii) There exists $\beta > 1$ such that the covariances $R(u)$ of $\{X(t)\}$ satisfy

$$\sum_{u\in\mathbb{Z}} |u|^\beta \|R(u)\| < \infty.$$

(iv) $\{X(t)\}$ has conditional independence graph $G_0 = (V, E_0)$.

As in Example 2.1.5, we parametrize graphical autoregressive models by the inverse covariances $R^{(i)}_{ij}(u)$. Thus we get infinite-dimensional parameter vectors

$$\theta = \big(\operatorname{vech}(R^{(i)}(0))', \operatorname{vec}(R^{(i)}(1))', \operatorname{vec}(R^{(i)}(2))', \dots\big)',$$

where the vech operator stacks only the elements contained in the lower triangular submatrix. We denote the spectral matrices, covariances, and inverse covariances specified by the parameter $\theta$ by $f_\theta(\lambda)$, $R_\theta(u)$, and $R^{(i)}_\theta(u)$.
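The stacking of the parameter vector can be sketched with NumPy for a model truncated at lag $p$. Here vech is implemented as column-wise stacking of the lower triangle, matching the usual convention (an assumption, since the ordering is not spelled out above); negative lags are not stored because $R^{(i)}(-u) = R^{(i)}(u)'$ for a real stationary process. The function names are hypothetical.

```python
import numpy as np

def vech(m):
    """Stack the lower-triangular elements of m column by column."""
    return m.T[np.triu_indices(m.shape[0])]

def vec(m):
    """Stack all columns of m."""
    return m.reshape(-1, order="F")

def theta_from_inverse_covariances(r_inv):
    """r_inv: array of shape (p + 1, d, d) holding R^(i)(0), ..., R^(i)(p).
    Returns (vech(R^(i)(0))', vec(R^(i)(1))', ..., vec(R^(i)(p))')'."""
    return np.concatenate([vech(r_inv[0])] + [vec(r) for r in r_inv[1:]])
```

The truncated vector has length $d(d+1)/2 + p\,d^2$, since $R^{(i)}(0)$ is symmetric while the higher lags are not.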

Assumption 4.1.2 $\Theta$ is a subset of $\ell^2(\mathbb{R})$ such that the following conditions hold.

(i) The spectral matrices $f_\theta$ satisfy for all $\theta \in \Theta$ the boundedness condition

$$b_1\mathbf{1}_d \le f_\theta(\lambda) \le b_2\mathbf{1}_d \quad \forall \lambda \in [-\pi, \pi]$$

for constants $b_1$ and $b_2$ such that $0 < b_1 \le b_2 < \infty$.

(ii) There exists a constant $C > 0$ such that the covariances $R_\theta(u)$ satisfy

$$\sum_{u\in\mathbb{Z}} |u|^\beta \|R_\theta(u)\| < C$$

for all $\theta \in \Theta(p, G)$, where $\beta$ is the same as in Assumption 4.1.1.

(iii) There exists $\theta_0 \in \Theta$ such that $f_{\theta_0}(\lambda) = f(\lambda)$ for all $\lambda \in [-\pi, \pi]$, and $\theta_0$ belongs to the interior of $\Theta$.

Next, let $\mathcal{G}$ denote the set of all graphs $G = (V, E)$ such that $V = \{1, \dots, d\}$ and $E \subseteq \{(i, j) \in V^2 \mid i \neq j\}$. For $p \in \mathbb{N}$ and $G \in \mathcal{G}$, the AR(p, G) model is now given by the parameter space

$$\Theta(p, G) = \big\{\theta \in \Theta \;\big|\; R^{(i)}_{ij,\theta}(u) = 0 \text{ if } (i, j) \notin E \text{ or } |u| > p\big\}.$$

Let $I_{p,G}$ denote the set of indices for which $\Theta(p, G)$ is not constrained to zero, and $\pi_{p,G}$ the projection of $\ell^2(\mathbb{R})$ onto the subspace spanned by $\Theta(p, G)$.
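The index set $I_{p,G}$ can be represented as a Boolean mask over the inverse covariance matrices $R^{(i)}(0), \dots, R^{(i)}(p)$. In this sketch the graph is treated as undirected, so $(i, j)$ and $(j, i)$ are free together, and the diagonal entries are always left free (an assumption: the definition above only speaks about pairs $(i, j) \notin E$ with $i \neq j$ in mind). Vertices are numbered from 0, and the function names are hypothetical. The parameter count follows the stacking of $\theta$, so at lag 0 only the lower triangle is counted.

```python
import numpy as np

def free_parameter_mask(d, edges, p):
    """Boolean mask of shape (p + 1, d, d): True where R^(i)_ij(u), u = 0..p,
    is not constrained to zero in AR(p, G). Diagonal entries are assumed free."""
    adj = np.eye(d, dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True  # undirected edge
    return np.broadcast_to(adj, (p + 1, d, d)).copy()

def n_free_parameters(mask):
    """Size of I_{p,G} under vech/vec stacking: lower triangle only at lag 0."""
    return int(np.tril(mask[0]).sum() + mask[1:].sum())
```

For $d = 3$, a single edge between the first two vertices and $p = 1$, this gives $4 + 5 = 9$ free parameters; such counts are what a penalized criterion of AIC type would use.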

Minimization of the Kullback-Leibler information $I(\theta, \theta_0)$, or equivalently $\mathcal{L}(\theta)$, with respect to $\theta \in \Theta(p, G)$ yields the best AR(p, G) approximation of $\{X(t)\}$, which we denote by the parameter

$$\theta_0(p, G) = \operatorname*{argmin}_{\theta \in \Theta(p, G)} \mathcal{L}(\theta).$$

We require that $\theta_0(p, G)$ exists and is uniquely defined.

Assumption 4.1.3 The best approximation $\theta_0(p, G)$ in $\Theta(p, G)$ with respect to the Kullback-Leibler information $I(\theta, \theta_0)$ is unique and belongs to the relative interior of $\Theta(p, G)$ with respect to the seminorm $\|\cdot\|_{\pi_{p,G}}$.

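From Lemma B.3 we obtain the following derivatives of $\mathcal{L}(\theta)$:

$$\frac{\partial\mathcal{L}(\theta)}{\partial\theta_k} = \frac{1}{4\pi}\int_\Pi \operatorname{tr}\Big[\big(f_{\theta_0}(\lambda) - f_\theta(\lambda)\big)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_k}\Big]\,d\lambda. \qquad (4.1.2)$$

Since the inverse spectral matrix is linear in the parameters, we get an explicit formula for its derivatives. Let $\theta_k$ correspond to $R^{(i)}_{ab}(u)$. Then

$$\frac{\partial f^{-1}_{ij,\theta}(\lambda)}{\partial\theta_k} = \begin{cases} 2\pi\,\delta_{ia}\delta_{ja} & \text{if } a = b \text{ and } u = 0,\\ 2\pi\big(\delta_{ia}\delta_{jb}\exp(-i\lambda u) + \delta_{ib}\delta_{ja}\exp(i\lambda u)\big) & \text{otherwise.}\end{cases}$$

Substituting this into (4.1.2), we therefore get

$$\frac{\partial\mathcal{L}(\theta)}{\partial\theta_k} = 0 \iff \int_\Pi \big(f_{ab,\theta_0}(\lambda) - f_{ab,\theta}(\lambda)\big)\exp(i\lambda u)\,d\lambda = 0. \qquad (4.1.3)$$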

This leads to the following set of equations, which characterize the best AR(p, G) approximation $\theta_0(p, G)$:

$$R_{ij,\theta_0(p,G)}(u) = R_{ij,\theta_0}(u) \quad \forall (i, j) \in E,\ \forall u \in \{-p, \dots, p\},$$

$$R^{(i)}_{ij,\theta_0(p,G)}(u) = 0 \quad \forall (i, j) \notin E,\ \forall u \in \{-p, \dots, p\}, \qquad (4.1.4)$$

and additionally $R^{(i)}_{\theta_0(p,G)}(u) = 0$ for all $|u| > p$.

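In the following we will also need the second and third derivatives of $\mathcal{L}(\theta)$. By Lemma B.3 and the linearity of $f_\theta^{-1}(\lambda)$ in the parameters, we obtain for the second derivatives

$$\frac{\partial^2\mathcal{L}(\theta)}{\partial\theta_i\partial\theta_j} = \frac{1}{4\pi}\int_\Pi \operatorname{tr}\Big[\big(f_{\theta_0}(\lambda) - f_\theta(\lambda)\big)\frac{\partial^2 f_\theta^{-1}(\lambda)}{\partial\theta_i\partial\theta_j}\Big]d\lambda - \frac{1}{4\pi}\int_\Pi \operatorname{tr}\Big[\frac{\partial f_\theta(\lambda)}{\partial\theta_i}\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_j}\Big]d\lambda = \frac{1}{4\pi}\int_\Pi \operatorname{tr}\Big[f_\theta(\lambda)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_i} f_\theta(\lambda)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_j}\Big]d\lambda,$$

where the first term vanishes because $\partial^2 f_\theta^{-1}/\partial\theta_i\partial\theta_j = 0$, and the second uses $\partial f_\theta/\partial\theta_i = -f_\theta\,(\partial f_\theta^{-1}/\partial\theta_i)\,f_\theta$. Similarly, for the third derivatives,

$$\frac{\partial^3\mathcal{L}(\theta)}{\partial\theta_i\partial\theta_j\partial\theta_k} = -\frac{1}{2\pi}\int_\Pi \operatorname{tr}\Big[f_\theta(\lambda)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_i} f_\theta(\lambda)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_j} f_\theta(\lambda)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_k}\Big]d\lambda.$$

We will denote the vector of first derivatives by

$$\nabla\mathcal{L}(\theta) = \Big(\frac{\partial\mathcal{L}(\theta)}{\partial\theta_i}\Big)_{i\in\mathbb{N}}$$

and the matrix of second derivatives by

$$\nabla^2\mathcal{L}(\theta) = \Big(\frac{\partial^2\mathcal{L}(\theta)}{\partial\theta_i\partial\theta_j}\Big)_{i,j\in\mathbb{N}}.$$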

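The linearity of $f_\theta^{-1}(\lambda)$ in $\theta$ also implies that

$$\theta_1'\,\nabla^2\mathcal{L}(\theta)\,\theta_2 = \frac{1}{4\pi}\int_\Pi \operatorname{tr}\big[f_\theta(\lambda) f_{\theta_1}^{-1}(\lambda) f_\theta(\lambda) f_{\theta_2}^{-1}(\lambda)\big]\,d\lambda. \qquad (4.1.5)$$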
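As a numerical check of (4.1.5), the quadratic form can be evaluated on a frequency grid. The sketch assumes $f_\theta^{-1}(\lambda) = 2\pi\sum_{|u|\le p} R^{(i)}(u)\exp(-i\lambda u)$ with $R^{(i)}(-u) = R^{(i)}(u)'$, which is the representation implied by the derivative formula for $\partial f_\theta^{-1}/\partial\theta_k$ given earlier; the directions $\theta_1$ and $\theta_2$ are passed in the same $(p+1, d, d)$ format as the inverse covariances, and the function names are hypothetical.

```python
import numpy as np

def inverse_spectral(r_inv, lam):
    """f^{-1}(lam) = 2 pi sum_{|u|<=p} R^(i)(u) exp(-i lam u), R^(i)(-u) = R^(i)(u)'."""
    out = r_inv[0].astype(complex)
    for u in range(1, len(r_inv)):
        out = out + r_inv[u] * np.exp(-1j * lam * u) + r_inv[u].T * np.exp(1j * lam * u)
    return 2 * np.pi * out

def hessian_form(r_inv, r1, r2, n_grid=1024):
    """theta1' (nabla^2 L)(theta) theta2 = (1/4pi) int tr(f f1^{-1} f f2^{-1}) dlambda,
    with f = f_theta built from r_inv and f_k^{-1} built linearly from r_k."""
    total = 0.0
    for k in range(n_grid):
        lam = -np.pi + 2 * np.pi * k / n_grid
        f = np.linalg.inv(inverse_spectral(r_inv, lam))
        g1, g2 = inverse_spectral(r1, lam), inverse_spectral(r2, lam)
        total += np.trace(f @ g1 @ f @ g2).real
    return total * (2 * np.pi / n_grid) / (4 * np.pi)
```

For scalar white noise with $R^{(i)}(0) = r_0$ one has $\mathcal{L} = \tfrac12\{-\log(2\pi r_0) + 2\pi f_{\theta_0} r_0\}$, so the second derivative in the direction of $r_0$ is $1/(2r_0^2)$; with $r_0 = 0.5$ and unit directions the sketch returns 2.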

In practice, model distances such as the Kullback-Leibler information need to be estimated, as they depend on the unknown parameter $\theta_0$. Akaike (1973) pointed out that the Kullback-Leibler information is related to the method of maximum likelihood. Therefore, given observations $X(1), \dots, X(T)$ from the process $\{X(t)\}$, minimum distance estimates can be obtained by maximizing the Gaussian likelihood function or, equivalently, by minimizing the $-\frac{1}{T}$ log-likelihood function

$$\mathcal{L}^*_T(\theta) = \frac{1}{2}\log(2\pi) + \frac{1}{2T}\log\det R_{\theta,T} + \frac{1}{2T}X_T' R_{\theta,T}^{-1} X_T, \qquad (4.1.6)$$

where $R_{\theta,T} = \big(R_\theta(u - v)\big)_{u,v=1,\dots,T}$.

A more favourable choice for fitting graphical autoregressive models is the likelihood approximation suggested by Whittle (1953, 1954). Approximating the matrix $R_{\theta,T}^{-1}$ by the corresponding matrix of inverse covariances (cf. Shaman, 1975, 1976), together with the Szegő identity (cf. Grenander and Szegő, 1958), leads to the Whittle likelihood

$$\mathcal{L}_T(\theta) = \frac{1}{4\pi}\int_\Pi \Big\{\log\det f_\theta(\lambda) + \operatorname{tr}\big(I^{(T)}(\lambda) f_\theta(\lambda)^{-1}\big)\Big\}\,d\lambda,$$

which estimates $\mathcal{L}(\theta)$ consistently. Thus we obtain as a minimum distance estimate the Whittle estimate

$$\hat\theta_T(p, G) = \operatorname*{argmin}_{\theta \in \Theta(p, G)} \mathcal{L}_T(\theta).$$
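For a scalar series, the exact criterion (4.1.6) and a discretized Whittle likelihood can be compared directly. The sketch replaces the integral in $\mathcal{L}_T(\theta)$ by an average over the Fourier frequencies $2\pi k/T$ and uses the untapered periodogram $I^{(T)}(\lambda) = |\sum_t X(t)e^{-i\lambda t}|^2/(2\pi T)$; both choices, and the function names, are assumptions of the sketch. For white noise the two criteria differ only by the constant $\log(2\pi)$, because $R_{\theta,T}$ is then diagonal and there are no edge effects.

```python
import numpy as np

def exact_neg_loglik(x, acov):
    """(4.1.6) for a scalar series x of length T; acov[u] = R_theta(u), u = 0..T-1."""
    T = len(x)
    idx = np.arange(T)
    R = acov[np.abs(idx[:, None] - idx[None, :])]  # Toeplitz matrix R_{theta,T}
    _, logdet = np.linalg.slogdet(R)
    quad = x @ np.linalg.solve(R, x)
    return 0.5 * np.log(2 * np.pi) + logdet / (2 * T) + quad / (2 * T)

def periodogram(x):
    """Untapered periodogram at the Fourier frequencies 2*pi*k/T, k = 0..T-1."""
    T = len(x)
    return np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T), 2 * np.pi * np.arange(T) / T

def whittle(x, spec):
    """Riemann-sum version of L_T(theta) for a scalar spectral density spec(lambda)."""
    I, lams = periodogram(x)
    f = spec(lams)
    return np.mean(np.log(f) + I / f) / 2
```

The Whittle version needs only one FFT and no matrix inversion, which is what makes it attractive for fitting many candidate AR(p, G) models.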

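The first derivative of the Whittle likelihood is

$$\frac{\partial\mathcal{L}_T(\theta)}{\partial\theta_i} = \frac{1}{4\pi}\int_\Pi \operatorname{tr}\Big[\big(I^{(T)}(\lambda) - f_\theta(\lambda)\big)\frac{\partial f_\theta^{-1}(\lambda)}{\partial\theta_i}\Big]\,d\lambda. \qquad (4.1.7)$$

Since $f_\theta^{-1}(\lambda)$ is linear in $\theta$, the data-dependent term vanishes in the second derivative, and we find for all $\theta \in \Theta$

$$\nabla^2\mathcal{L}_T(\theta) = \nabla^2\mathcal{L}(\theta). \qquad (4.1.8)$$

Consequently, the third derivatives of $\mathcal{L}_T(\theta)$ and $\mathcal{L}(\theta)$ are also equal. Setting the first derivative to zero leads to the following characterization of the Whittle estimates in the AR(p, G) model.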

Theorem 4.1.4 Suppose that Assumptions 4.1.1 and 4.1.2 hold. Then the Whittle estimate $\hat\theta_T(p, G)$ in the graphical autoregressive model AR(p, G) is given by the equations

$$R_{ij,\hat\theta_T(p,G)}(u) = \hat R_{ij}(u) \quad \forall (i, j) \in E,\ \forall u \in \{-p, \dots, p\},$$

$$R^{(i)}_{ij,\hat\theta_T(p,G)}(u) = 0 \quad \forall (i, j) \notin E,\ \forall u \in \{-p, \dots, p\},$$

and $R^{(i)}_{\hat\theta_T(p,G)}(u) = 0$ for all $|u| > p$, where $\hat R_{ij}(u)$ is defined as

$$\hat R_{ij}(u) = \int_\Pi I^{(T)}_{ij}(\lambda)\exp(i\lambda u)\,d\lambda.$$

Proof. The result follows from the arguments leading to (4.1.4), applied to the first derivative in (4.1.7).
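The quantity $\hat R_{ij}(u)$ in Theorem 4.1.4 can be computed without numerical quadrature error. The sketch below evaluates the integral exactly by zero-padding the discrete Fourier transform to length $2T$, which works because the periodogram is a trigonometric polynomial of degree $T - 1$; it assumes the untapered periodogram $I^{(T)}(\lambda) = d(\lambda)d(\lambda)^*/(2\pi T)$ with $d(\lambda) = \sum_t X(t)\exp(-i\lambda t)$. Under these assumptions $\hat R(u) = T^{-1}\sum_t X(t+u)X(t)'$, the usual (biased) sample cross-covariance. The function name is hypothetical.

```python
import numpy as np

def rhat_from_periodogram(x, max_lag):
    """R_hat(u) = int_Pi I^(T)(lambda) exp(i lambda u) dlambda, u = 0..max_lag,
    for a (T, d) array x, via the rectangle rule on 2T frequencies (exact here)."""
    T, d = x.shape
    M = 2 * T
    dft = np.fft.fft(x, n=M, axis=0)                     # d(lambda_k), zero-padded
    cross = dft[:, :, None] * np.conj(dft[:, None, :])   # d(lambda_k) d(lambda_k)^*
    # (2 pi / M) sum_k I(lambda_k) exp(i lambda_k u) = ifft(cross)[u] / T
    r = np.fft.ifft(cross, axis=0).real / T
    return r[: max_lag + 1]
```

So the first set of equations in Theorem 4.1.4 matches the fitted covariances to the sample covariances on the edges of $G$ for lags up to $p$.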

These equations are similar to the equations for the maximum likelihood estimates in ordinary Gaussian graphical models (cf. Lauritzen, 1996). More precisely, they are the restrictions for a Gaussian graphical model in which the set of vertices consists of the entire process $\{X(t), t \in \mathbb{Z}\}$. This is not surprising given the way the Whittle likelihood approximates the likelihood function in (4.1.6): the Whittle likelihood essentially neglects the edge effects that arise from observing only a finite horizon, by substituting asymptotic approximations for the finite-sample quantities $\det R_{\theta,T}$ and $R_{\theta,T}^{-1}$.

The asymptotic properties of the Whittle estimate are well known in general (e.g. Dzhaparidze and Yaglom, 1983). For example, we have the following central limit theorem.

Theorem 4.1.5 Under Assumptions 4.1.1 to 4.1.3 we have

$$\sqrt{T}\big(\hat\theta_T(p, G) - \theta_0(p, G)\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}\big(0,\; c_h\,\Gamma(p, G)^{-1}\Gamma_0(p, G)\Gamma(p, G)^{-1}\big),$$

where $c_h = H_4/H_2^2$, $\Gamma_0(p, G) = \pi_{p,G}\nabla^2\mathcal{L}(\theta_0)\pi_{p,G}$ and $\Gamma(p, G) = \pi_{p,G}\nabla^2\mathcal{L}(\theta_0(p, G))\pi_{p,G}$, with $\Gamma(p, G)^{-1} = \pi_{p,G}\Gamma(p, G)^-\pi_{p,G}$ for any generalized inverse $\Gamma(p, G)^-$.

Proof. See e.g. Dzhaparidze and Yaglom (1983, Section 5.6).

From the Whittle estimate $\hat\theta_T(p, G)$ we can finally compute estimates for the parameters $A_1, \dots, A_p$ and $\Sigma$ in (4.1.1).

(a) From the estimates $R^{(i)}_{\hat\theta_T(p,G)}(u)$ for the inverse covariances we can obtain the covariances $R_{\hat\theta_T(p,G)}(u)$ via computation of $f^{-1}_{\hat\theta_T(p,G)}$ and $f_{\hat\theta_T(p,G)}$. Estimates for the matrices $A_1, \dots, A_p$ and $\Sigma$ can then be determined by solving the Yule-Walker equations

$$\sum_{u=0}^{p} A_u R_{\hat\theta_T(p,G)}(u - v) = -\delta_{v0}\,\Sigma, \qquad v = 0, \dots, p,$$

where $A_0 = -\mathbf{1}_d$.

(b) The parameters $A_1, \dots, A_p$ and $\Sigma$ are related to the inverse covariances by the equation system

$$R^{(i)}_{\hat\theta_T(p,G)}(v) = \sum_{u=0}^{p-v} A_u'\,\Sigma^{-1} A_{u+v}, \qquad v = 0, \dots, p,$$

where again $A_0 = -\mathbf{1}_d$. This problem is equivalent to the estimation of moving average parameters from the covariances of a process. An iterative algorithm for solving such an equation system has been suggested, e.g., by Tunnicliffe Wilson (1972).
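Steps (a) and (b) can be sketched together in NumPy. The sketch assumes $f^{-1}(\lambda) = 2\pi\sum_{|u|\le p} R^{(i)}(u)\exp(-i\lambda u)$, as implied by the derivative formula for $f_\theta^{-1}$ given earlier, and the covariance convention $R(u) = \mathbb{E}\,X(t+u)X(t)'$; under that convention the Yule-Walker equations take the equivalent form $R(v) = \sum_h A_h R(v-h)$ for $v = 1, \dots, p$ and $\Sigma = R(0) - \sum_h A_h R(h)'$. Covariances are obtained from the inverse covariances by the rectangle rule on a frequency grid. The function `inverse_covariances_from_ar` only evaluates $R^{(i)}(v) = \sum_{u=0}^{p-v} A_u'\Sigma^{-1}A_{u+v}$ in the forward direction; it is not the iterative algorithm of Tunnicliffe Wilson (1972). All function names are hypothetical.

```python
import numpy as np

def inverse_spectral(r_inv, lam):
    """f^{-1}(lam) = 2 pi sum_{|u|<=p} R^(i)(u) exp(-i lam u), R^(i)(-u) = R^(i)(u)'."""
    out = r_inv[0].astype(complex)
    for u in range(1, len(r_inv)):
        out = out + r_inv[u] * np.exp(-1j * lam * u) + r_inv[u].T * np.exp(1j * lam * u)
    return 2 * np.pi * out

def covariances_from_inverse(r_inv, n_grid=512):
    """Step (a), first part: R(u) = int f(lam) exp(i lam u) dlam for u = 0..p."""
    p = len(r_inv) - 1
    lams = 2 * np.pi * np.arange(n_grid) / n_grid
    f = np.array([np.linalg.inv(inverse_spectral(r_inv, lam)) for lam in lams])
    return np.array([2 * np.pi * (f * np.exp(1j * lams * u)[:, None, None]).mean(axis=0).real
                     for u in range(p + 1)])

def yule_walker(acov):
    """Step (a), second part: solve R(v) = sum_h A_h R(v - h), v = 1..p, and
    Sigma = R(0) - sum_h A_h R(h)'. acov has shape (p + 1, d, d)."""
    p, d = len(acov) - 1, acov.shape[1]

    def R(u):  # R(-u) = R(u)'
        return acov[u] if u >= 0 else acov[-u].T

    gamma = np.block([[R(v - h) for v in range(1, p + 1)] for h in range(1, p + 1)])
    rhs = np.hstack([R(v) for v in range(1, p + 1)])
    A = np.linalg.solve(gamma.T, rhs.T).T          # [A_1 ... A_p] @ gamma = rhs
    A = A.reshape(d, p, d).transpose(1, 0, 2)       # A[h - 1] = A_h
    sigma = acov[0] - sum(A[h - 1] @ R(h).T for h in range(1, p + 1))
    return A, sigma

def inverse_covariances_from_ar(A, sigma):
    """Relation (b): R^(i)(v) = sum_{u=0}^{p-v} A_u' Sigma^{-1} A_{u+v}, A_0 = -1_d."""
    d, p = sigma.shape[0], len(A)
    coef = np.concatenate([-np.eye(d)[None], A])    # coef[u] = A_u, u = 0..p
    s_inv = np.linalg.inv(sigma)
    return np.array([sum(coef[u].T @ s_inv @ coef[u + v] for u in range(p - v + 1))
                     for v in range(p + 1)])
```

Feeding the output of `inverse_covariances_from_ar` for a known VAR(1) model into `covariances_from_inverse` and then `yule_walker` should return the original $A_1$ and $\Sigma$ up to a negligible discretization error; for the scalar AR(1) model with $a = 0.5$ and $\sigma^2 = 1$ it gives $R^{(i)}(0) = 1.25$ and $R^{(i)}(1) = -0.5$.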

We are now interested in the AR(p, G) model, where $G \in \mathcal{G}$ and $p$ is selected from a

In document Graphical Models in Time Series Analysis (Page 56-67)