Parallel Factor Analysis (PARAFAC) - Studies of Local Functional Network Connectivity in Cat Vi

In order to get a first grasp of the effects of the thermal pMS deactivation on correlated activity in area 18, a screening method was required. It was decided to use Parallel Factor Analysis (PARAFAC) (Hitchcock, 1927; Harshman, 1970; Carroll and Chang, 1970), a method that is frequently used in chemometrics, but has so far not been well developed for electrophysiological data.

PARAFAC provides a clear and quickly assessable visualisation of the changes in correlation, with the advantage that a whole cycle of warm-cool-rewarm conditions is accessible at a glance. The results can be used to assess the correlation level during the baseline condition (i.e. no deactivation), physiological effects caused by the deactivation, and also to identify sessions with an unusual correlation profile. PARAFAC should then ideally be followed by further analysis methods, possibly also using several measures for correlated activity, to thoroughly describe the dataset. This is why, for the first coarse look at the data, the definition of correlation was kept fairly general (see the definition of $\kappa_{\mathrm{sync}}$ in equation 2.6), and was refined in later analysis steps.

An article about the application of PARAFAC to neurophysiological data by the author of this thesis and colleagues was published in Frontiers in Neuroinformatics (Schmitz et al., 2015). Parts of the publication are presented in the next chapter, followed by some additional comments and analyses. The following passages in italics are taken from the publication by Schmitz et al. (2015)3.

3 Contributions to the publication "Application of Parallel Factor Analysis (PARAFAC) to electrophysiological data" by Schmitz et al. (2015):

Sarah Katharina Schmitz wrote the paper, designed the analysis, contributed to the implementation in MATLAB, performed the analysis, selected the results to be presented, and developed the discussion.

Philipp P. Hasselbach developed the program code (with support by Marcelo Nakano Daniel and Stephan Wirsing), assisted in the design of the analysis and in writing the methods section.

Ralf A. W. Galuske and Boris Ebisch performed the experiments.

Ralf A. W. Galuske, Gordon Pipa, and Anja Klein helped with the design of the analysis and the interpretation of the results.

"The PARAFAC Model

PARAFAC is based on a mathematical model that represents the interactions of the dimensions in which the input data is to be analyzed. In order to carry out PARAFAC, the analysis dimensions have to be defined first. Each input value can then be related to an index for each of the dimensions. Assuming N = 3 dimensions, for example, $x_{ijk}$ identifies the measured value for index i in the first dimension, j in the second dimension and k in the third dimension. In our case, the three dimensions are constituted by three experimental variables: electrode pair, stimulus, and repetition/trial. The correlation values $\kappa_{\mathrm{sync}}$ are obtained by the procedure explained [below] and used as an input into the three-dimensional array. They are placed at the location corresponding to the experimental condition they were obtained for (electrode pair × stimulus × repetition/trial).

PARAFAC is now used to model this input array. Let F denote the number of so-called components and define so-called loading matrices A, B, and C of dimensions I × F, J × F and K × F and with elements $a_{if}$, $b_{jf}$ and $c_{kf}$, respectively, and the modelling error $\varepsilon_{ijk}$. The general model used by PARAFAC to represent the input data is then given by (Bro, 1997)

$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + \varepsilon_{ijk}. \qquad (2.4)$$

A graphical illustration of the model is given in Fig. 2.7.

Figure 2.7.: Illustration of the PARAFAC model. The three-dimensional array containing the cross correlation information for each electrode pair, stimulus, and repetition, is decomposed into a sum of products of three factors, called loadings, which build up the loading matrices A, B, and C. Modified with permission from Miwakeichi et al. (2004).
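The structure of the model in equation 2.4 can be illustrated with a short NumPy sketch. The array sizes and the random loadings below are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Return the array with entries sum_f A[i, f] * B[j, f] * C[k, f]."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

# Hypothetical sizes: electrode pairs, stimuli, trials, components.
I, J, K, F = 4, 3, 5, 2
rng = np.random.default_rng(0)
A = rng.normal(size=(I, F))
B = rng.normal(size=(J, F))
C = rng.normal(size=(K, F))

# Noise-free model part of equation 2.4 (the error term is omitted here).
X_model = parafac_reconstruct(A, B, C)
```

Each component contributes one outer product of a column of A, a column of B and a column of C; the model array is the sum of these F contributions.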

PARAFAC thus constrains the interactions between the different dimensions to the complete multiplicative interaction. The loading vectors are determined by minimizing the modelling error $\varepsilon_{ijk}$. This minimization can be carried out using the alternating least squares (ALS) approach, for example. ALS iteratively determines the loading matrices A, B, and C by the following algorithm (Bro, 1997):

1. Choose the number of components, F (on the choice of F see next paragraph)

2. Initialize B and C

3. Estimate A from X, B, and C by least squares regression to minimize the square of the model error

4. Estimate B likewise

5. Estimate C likewise

6. Repeat from 3) until the algorithm converges (indicated by only little changes in fit or loadings)
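The six steps can be sketched in NumPy as follows. This is a minimal illustration and not the implementation used in the study, which relied on MATLAB code. The random initialisation, the stopping tolerance, and the names `khatri_rao` and `parafac_als` are assumptions made for this sketch.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product; rows are ordered with Q's index fastest."""
    return np.einsum("pf,qf->pqf", P, Q).reshape(-1, P.shape[1])

def parafac_als(X, F, n_iter=2000, tol=1e-12, seed=0):
    """Fit x_ijk ~ sum_f a_if b_jf c_kf by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(J, F))          # step 2: random start (an assumption)
    C = rng.normal(size=(K, F))
    # Unfold X along each mode so that every update is an ordinary regression.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    total = np.sum(X ** 2)
    prev_err = np.inf
    for _ in range(n_iter):
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T  # step 3
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T  # step 4
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T  # step 5
        err = np.sum((X - np.einsum("if,jf,kf->ijk", A, B, C)) ** 2)
        if abs(prev_err - err) <= tol * total:   # step 6: fit barely changes
            break
        prev_err = err
    return A, B, C, err

# Usage on a synthetic, noise-free array with two components (hypothetical data).
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(n, 2)) for n in (8, 6, 10))
X = np.einsum("if,jf,kf->ijk", A0, B0, C0)
A, B, C, err = parafac_als(X, F=2)
rel_err = np.sqrt(err / np.sum(X ** 2))
```

Because each update solves a linear least squares problem with the other two loading matrices held fixed, the squared model error cannot increase from one step to the next.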

The choice of the number F of components is difficult and no technique giving clear values has been identified yet. If F is chosen too small, not all effects in the input data can be identified. If F is chosen too large, however, noise is modeled increasingly and the existing effects in the data will be modeled by correlated components. Different approaches for estimation of the best value for F exist (Bro, 1997). The approach taken in this work was to increase the number F of components until the decrease in the residual error decayed significantly. The model with the optimal number of components was then determined to be the one which was able to explain the highest amount of variance without any of the components being correlated.

The quality of the analysis results of multilinear variation strongly depends on the preprocessing of the input data. Possible preprocessing strategies are centering and scaling (see Bro, 1997): Centering removes a non-zero mean from the data. A complete centering of the input data in all dimensions can be achieved by taking the result of the previous centering and centering it in the next dimension. Scaling adjusts the variations in each of the dimensions to comparable magnitudes. Note that in contrast to centering, subsequent scaling of several modes is problematic since scaling one mode affects the scaling of other modes as well as the centering of the same mode. For this reason, centering should be carried out after scaling. Iterative approaches that can achieve a scaling of all modes are available (Bro, 1997). In this study, the data were centered in all dimensions. No scaling of the data was carried out.
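The sequential centering described above can be sketched as follows, assuming the data are held in a three-dimensional NumPy array. The function name `center_all_modes` and the example array are hypothetical.

```python
import numpy as np

def center_all_modes(X):
    """Center across each mode in turn, reusing the previous result."""
    Xc = np.asarray(X, dtype=float).copy()
    for axis in range(Xc.ndim):
        Xc = Xc - Xc.mean(axis=axis, keepdims=True)
    return Xc

# Hypothetical array: electrode pairs x stimuli x trials.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, size=(4, 3, 5))
Xc = center_all_modes(X)
```

Centering across a later mode does not reintroduce a mean across an earlier one, so after the loop the mean along every axis is zero up to rounding.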

Synchrony and oscillatory synchrony

Two different metrics were established in order to quantify the degree of synchrony and oscillatory synchrony. Since both phenomena are identified using the correlogram, both metrics are based on the correlogram.

Spike trains were stored as binary vectors with sampling frequency resolution. Ones occurred where a sample value exceeded the chosen threshold and the sample before did not. For the calculation of cross correlations, a binning of 2 ms was introduced.
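A minimal sketch of this representation is given below. The source does not state how the first sample or a trailing partial bin were handled, nor whether bins hold counts or binary values. The sketch therefore assumes that the first sample counts as a crossing if it exceeds the threshold, that an incomplete last bin is dropped, and that each bin holds a spike count. The function names are hypothetical.

```python
import numpy as np

def detect_spikes(signal, threshold):
    """1 where a sample exceeds the threshold and the previous sample does not."""
    above = np.asarray(signal) > threshold
    onsets = above.copy()                    # assumption: first sample may be a crossing
    onsets[1:] = above[1:] & ~above[:-1]
    return onsets.astype(np.uint8)

def bin_spike_train(spikes, fs_hz, bin_ms=2.0):
    """Spike counts in consecutive bins of bin_ms; an incomplete last bin is dropped."""
    per_bin = int(round(fs_hz * bin_ms / 1000.0))
    n_bins = len(spikes) // per_bin
    return spikes[: n_bins * per_bin].reshape(n_bins, per_bin).sum(axis=1)

# Toy signal sampled at 1 kHz (hypothetical values), threshold 1.
signal = np.array([0, 2, 3, 0, 0, 4, 0, 5])
spikes = detect_spikes(signal, threshold=1)      # -> [0, 1, 0, 0, 0, 1, 0, 1]
binned = bin_spike_train(spikes, fs_hz=1000.0)   # -> [1, 0, 1, 1]
```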

For the synchrony measure, the cross correlogram $\lambda^{\mathrm{raw}}_{xy}$ of the spike trains x and y of two multi-units in the analysis window T was computed:

$$\lambda^{\mathrm{raw}}_{xy}(\tau) := \sum_{t \in T} x(t)\, y(t + \tau). \qquad (2.5)$$
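Equation 2.5 can be written out directly for a range of integer lags. The sketch below assumes two binned spike trains of equal length and treats samples outside the analysis window as zero, which the source does not specify. The function name is hypothetical.

```python
import numpy as np

def cross_correlogram(x, y, max_lag):
    """Raw correlogram sum_t x(t) * y(t + tau) for tau = -max_lag .. max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    ccg = np.empty(len(lags))
    for idx, tau in enumerate(lags):
        if tau >= 0:
            ccg[idx] = np.dot(x[: n - tau], y[tau:])
        else:
            ccg[idx] = np.dot(x[-tau:], y[: n + tau])
    return lags, ccg

# y repeats the spikes of x one bin later, so the peak sits at lag +1.
x = [1, 0, 0, 1, 0, 0]
y = [0, 1, 0, 0, 1, 0]
lags, ccg = cross_correlogram(x, y, max_lag=2)   # ccg -> [1, 0, 0, 2, 0]
```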

To correct for rate-induced chance coincidences, we normalized the correlograms to the firing rate. To this end, the spikes in the spike trains were convolved with a 'jitter kernel' and the resulting cross correlogram was subtracted from the raw correlogram $\lambda^{\mathrm{raw}}_{xy}$. To implement a computationally efficient jitter correction, we rely on convolution with a homogeneous filter K with entries $T_{\mathrm{sample}} / T_{\mathrm{jitter}}$, with $T_{\mathrm{sample}}$ the time interval between two samples, which corresponds to jittering the spike train randomly within a time interval of $T_{\mathrm{jitter}}$ = 6 ms. The jittered spike train $\tilde{x}$ was thus obtained from spike train x according to $\tilde{x} := K * x$, where ∗ denotes the convolution. A correlogram was then computed from the two convolved spike trains and subtracted from the original correlogram, to obtain the normalized correlogram $\lambda_{xy} := \lambda^{\mathrm{raw}}_{xy} - \lambda^{\mathrm{jitter}}_{xy}$. The convolution approach is equivalent to the random drawing of jitter times from a uniform distribution (following the Wiener-Khinchin theorem, see also Pipa et al., 2008). The degree of synchrony was measured by the largest positive peak in the normalized correlogram within a window of size $T_{\mathrm{sync}}$ = 5 ms around a lag of zero. The synchrony metric $\kappa_{\mathrm{sync}}$ is thus defined as follows. Let $\kappa = \max_{l \in [-5\,\mathrm{ms},\, 5\,\mathrm{ms}]} \lambda_{xy}(l)$. Then,

$$\kappa_{\mathrm{sync}} := \begin{cases} \kappa, & \text{if } \kappa \geq 0 \\ 0, & \text{else.} \end{cases} \qquad (2.6)$$
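The jitter correction and equation 2.6 can be combined into one short sketch. It is not the original MATLAB implementation. The parameter `dt_ms` (time step of the input vectors), the boundary handling of `np.convolve(..., mode="same")`, and the function names are assumptions made for illustration.

```python
import numpy as np

def cross_correlogram(x, y, max_lag):
    """sum_t x(t) * y(t + tau) for tau = -max_lag .. max_lag (equal-length inputs)."""
    n = len(x)
    full = np.correlate(y, x, mode="full")   # index n - 1 corresponds to lag 0
    return full[n - 1 - max_lag : n + max_lag]

def kappa_sync(x, y, dt_ms, t_jitter_ms=6.0, t_sync_ms=5.0):
    """Largest peak of the jitter-corrected correlogram near lag 0, floored at 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_kernel = max(1, int(round(t_jitter_ms / dt_ms)))
    kernel = np.full(n_kernel, 1.0 / n_kernel)   # entries T_sample / T_jitter
    x_jit = np.convolve(x, kernel, mode="same")
    y_jit = np.convolve(y, kernel, mode="same")
    max_lag = int(t_sync_ms // dt_ms)
    lam = cross_correlogram(x, y, max_lag) - cross_correlogram(x_jit, y_jit, max_lag)
    return max(float(lam.max()), 0.0)

# Two identical toy spike trains at 1 ms resolution (hypothetical data).
x = np.zeros(200)
x[[20, 60, 100, 140]] = 1.0
k_same = kappa_sync(x, x, dt_ms=1.0)
k_empty = kappa_sync(np.zeros(50), np.zeros(50), dt_ms=1.0)
```

For the identical trains the raw correlogram has four coincidences at lag zero, from which the smoothed, rate-related part is subtracted; for the empty trains the metric is zero.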

In addition to the synchrony measure, we looked at the oscillatory synchrony between the spike trains. The degree of oscillatory synchrony can be determined by considering how much energy is contained in the oscillations of the correlogram in a chosen frequency range. To this end, the correlogram $\lambda_{xy}$ was subjected to an N-point Discrete Fourier Transform (DFT) (Oppenheim et al., 1997), in order to extract the frequencies of interest. The oscillatory synchrony $\kappa_{\mathrm{oscsync}}$ was then set as the relative power of the signal in a chosen frequency range between $f_{\min}$ and $f_{\max}$:

$$\kappa_{\mathrm{oscsync}} := \frac{\sum_{m=f_{\min}}^{f_{\max}} \mathrm{DFT}^{2}_{\lambda_{xy}}(m)}{\sum_{n=0}^{N-1} \mathrm{DFT}^{2}_{\lambda_{xy}}(n)}. \qquad (2.7)$$

Validation

The verification of the results obtained with the PARAFAC model of [equation 2.4] was achieved by carrying out so-called split-half experiments (Harshman and Lundy, 1994). To this end, the set of input data is split into two halves and PARAFAC is carried out for both halves independently. The model is considered to be applicable if the results gained from both halves are similar. In this work, several split-half experiments were carried out, splitting the input data set into odd and even trials.
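The odd/even split can be sketched as below, assuming that trials lie along the third axis of the array. The source does not state how similarity between the two halves was quantified; the cosine-based congruence coefficient used here is one common choice and is an assumption of this sketch, as are the function names.

```python
import numpy as np

def split_odd_even(X, trial_axis=2):
    """Split an array into odd- and even-numbered trials (1-based numbering)."""
    idx = np.arange(X.shape[trial_axis])
    odd = np.take(X, idx[0::2], axis=trial_axis)    # trials 1, 3, 5, ...
    even = np.take(X, idx[1::2], axis=trial_axis)   # trials 2, 4, 6, ...
    return odd, even

def congruence(U, V):
    """Cosine similarity between corresponding columns of two loading matrices."""
    num = np.sum(U * V, axis=0)
    return num / (np.linalg.norm(U, axis=0) * np.linalg.norm(V, axis=0))

# Hypothetical array: 2 electrode pairs x 3 stimuli x 6 trials.
X = np.arange(36, dtype=float).reshape(2, 3, 6)
X_odd, X_even = split_odd_even(X)

# Identical loadings up to scale give a congruence of 1 in every column.
U = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])
phi = congruence(U, 3.0 * U)
```

In such a comparison, each half would be fitted separately and the loadings of the modes that were not split (here electrode pair and stimulus) compared component by component, after matching the order and sign of the components.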

PCA

We use the trilinear PARAFAC model because we assume the data to be (at least) trilinear. To show that a bilinear model, such as principal component analysis (PCA), is not adequate in this context, we also decomposed the correlation matrices using PCA and compared the results. PCA is a widely used technique. An introduction can be found in Jolliffe (2002). In order to make the data array accessible for PCA, it was unfolded into a two-dimensional structure."
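Unfolding followed by PCA can be sketched as follows. The source does not state which mode was kept as the row dimension. The sketch assumes that rows correspond to electrode pairs, that columns run over all stimulus/trial combinations, and that PCA is computed from the singular value decomposition of the column-centered matrix. The function names are hypothetical.

```python
import numpy as np

def unfold_first_mode(X):
    """Rearrange an I x J x K array into an I x (J*K) matrix."""
    return X.reshape(X.shape[0], -1)

def pca(M, n_components):
    """PCA via SVD of the column-centered matrix."""
    Mc = M - M.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical array: 4 electrode pairs x 3 stimuli x 5 trials.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 5))
M = unfold_first_mode(X)
scores, loadings, explained = pca(M, n_components=2)
```

After unfolding, stimulus and trial share a single axis, so a bilinear decomposition cannot return separate loadings for these two modes.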

There are several aspects in validating a model, comprising computational, statistical, and explanatory validation (see Bro, 1998, p. 99). Split-half experiments (see section "Validation" in Schmitz et al., 2015) were applied and residuals and the core consistency diagnostic (CORCONDIA) (Bro and Kiers, 2003) were evaluated. Explanatory validation, meaning the appropriateness of the model for the data at hand and the research question, will be covered in the discussion section (see section 4.1.1).

Finding the optimal number of components for the model is non-trivial and can be facilitated by experience and prior knowledge about the data (Bro, 1998, p. 110).

While for PCA the loading vectors are always orthogonal to each other, this is not the case for PARAFAC. As a consequence, correlated loadings can occur. If this correlation is "high" (positive or negative correlation with an absolute value > 0.8 (Field and Graupe, 1991) or > 0.9 (Kruskal, 1989)), degenerate solutions emerge. In general, correlations lower than this seem to be common in real data and do not present the PARAFAC algorithm with problems. Degenerate solutions are thus present if there are "high correlations (...) in all three modes between the loadings of two or more components" (Field and Graupe, 1991). If this is the case, the N-way Toolbox (Andersson and Bro, 2000) issues a warning, which was used as a stopping criterion in the search for the optimal number of components and thus to avoid degenerate solutions.
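A check of this kind can be sketched as follows. It is not the test performed by the N-way Toolbox; it is a hypothetical helper that flags pairs of components whose loadings have an absolute Pearson correlation above a threshold in all three modes.

```python
import numpy as np

def degenerate_pairs(A, B, C, threshold=0.8):
    """Component pairs with |correlation| > threshold in all three modes."""
    F = A.shape[1]
    corr = [np.corrcoef(M, rowvar=False).reshape(F, F) for M in (A, B, C)]
    pairs = []
    for f in range(F):
        for g in range(f + 1, F):
            if all(abs(c[f, g]) > threshold for c in corr):
                pairs.append((f, g))
    return pairs

# Uncorrelated loadings: nothing is flagged.
U = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
ok = degenerate_pairs(U, U, U)

# Second component mirrors the first in every mode: the pair (0, 1) is flagged.
V = np.column_stack([U[:, 0], -U[:, 0]])
flagged = degenerate_pairs(V, V, V)
```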

As a further criterion in the search for the optimal PARAFAC solution, the core consistency diagnostic, or CORCONDIA (Bro and Kiers, 2003) was applied. It is defined as

$$\text{core consistency} = 100 \left( 1 - \frac{\sum_{d=1}^{F} \sum_{e=1}^{F} \sum_{f=1}^{F} \left( g_{def} - t_{def} \right)^{2}}{F} \right), \qquad (2.8)$$

where F is the number of components in the PARAFAC model, G is the Tucker3 core array, and T is a binary array with ones on the superdiagonal and zeros otherwise (for more details see Bro, 1998; Bro and Kiers, 2003). The CORCONDIA thus assesses how well the Tucker3/PARAFAC core fits the assumptions of the PARAFAC model. CORCONDIA values above 90 indicate that the data is indeed trilinear, whereas values around 50 and below indicate that the model includes both trilinear and non-trilinear variation and is therefore not appropriate (Bro and Kiers, 2003).
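Equation 2.8 can be evaluated with a few lines of NumPy. The sketch assumes that the Tucker3 core G is obtained as the least squares core for the fixed PARAFAC loadings, computed by applying the pseudoinverses of A, B and C to the data array. It is an illustration and not the routine of the N-way Toolbox; the function name is hypothetical.

```python
import numpy as np

def corcondia(X, A, B, C):
    """Core consistency (in percent) of the PARAFAC loadings A, B, C for X."""
    F = A.shape[1]
    # Least squares Tucker3 core for fixed loadings (an assumption of this sketch).
    G = np.einsum("di,ej,fk,ijk->def",
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0   # superdiagonal of ones
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)

# Exactly trilinear, noise-free toy data (hypothetical sizes).
rng = np.random.default_rng(0)
A, B, C = (rng.normal(size=(n, 2)) for n in (6, 5, 7))
X = np.einsum("if,jf,kf->ijk", A, B, C)
cc = corcondia(X, A, B, C)
```

For exactly trilinear data fitted with the true loadings, the estimated core coincides with T and the core consistency is 100; off-superdiagonal core elements lower the value.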

CORCONDIA cannot find the most appropriate model, but it can tell the researcher that the model does not overfit. However, "by assuming that noise is not trilinear (...), it follows that the valid model with the highest number of components must be the one to choose" (Bro, 1998, p. 120). According to this rule, the model with the highest number of components that reached a CORCONDIA value larger than 70-75 was selected. Note that models with only one component always have a CORCONDIA of 100.
