• No results found

Uncovering cortical MEG responses to listened audiobook stories

N/A
N/A
Protected

Academic year: 2021

Share "Uncovering cortical MEG responses to listened audiobook stories"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Uncovering cortical MEG responses to listened audiobook stories

M. Koskinen

, M. Seppä

Brain Research Unit at AMI Centre, and MEG Core, O.V. Lounasmaa Laboratory, Aalto University School of Science, P.O. Box 13000, FI-00076 Aalto, Finland

a b s t r a c t

a r t i c l e i n f o

Article history: Accepted 6 June 2014 Available online 17 June 2014 Keywords:

Canonical correlation analysis (CCA) Forward modeling

MEG

Single-trial analysis Spatialfiltering Wavelet transform

Naturalistic stimuli, such as normal speech and narratives, are opening up intriguing prospects in neuroscience, especially when merging neuroimaging with machine learning methodology. Here we propose a task-optimized spatialfiltering strategy for uncovering individual magnetoencephalographic (MEG) responses to audiobook stories. Ten subjects listened to 1-h-long recording once, as well as to 48 repetitions of a 1-min-long speech passage. Employing response replicability as statistical validity and utilizing unsupervised learning methods, we trained spatialfilters that were able to generalize over datasets of an individual. For this blind-signal-separation (BSS) task, we derived a version of multi-set similarity-constrained canonical correlation analysis (SimCCA) that theoretically provides maximal signal-to-noise ratio (SNR) in this setting. Irrespective of signifi-cant noise in unaveraged MEG traces, the method successfully uncovered feasible time courses up to ~120 Hz, with the most prominent signals below 20 Hz. Individual trial-to-trial correlations of such time courses reached the level of 0.55 (median 0.33 in the group) at ~0.5 Hz, with considerable variation between subjects. By this filtering, the SNR increased up to 20 times. In comparison, independent component analysis (ICA) or principal component analysis (PCA) did not improve SNR notably. The validity of the extracted brain signals was further assessed by inspecting their associations with the stimulus, as well as by mapping the contributing cortical signal sources. The results indicate that the proposed methodology effectively reduces noise in MEG recordings to that extent that brain responses can be seen to nonrecurring audiobook stories. The study paves the way for applica-tions aiming at accurately modeling the stimulus–response-relaapplica-tionship by tackling the response variability, as well as for real-time monitoring of brain signals of individuals in naturalistic experimental conditions.

© 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

Introduction

Good storytelling can be very captivating for the listener. Stories en-gage brain functions from perception, attention shifting, and compre-hension to emotional immersion in the story, providing a versatile and modern approach for neuroscience research. However, natural speech and narratives have yet gained only modest attention in neuroscience experimentation. As speech is a relatively unpredictable one-time-only phenomenon, it offers a great challenge for reliably detecting related acti-vation patterns from noisy functional brain-imaging data. Nonrecurring audio streams are in stark contrast to typical strictly controlled event-related analysis designs with simplified brief and repeated stimuli. There-fore, novel methodological approaches are required for the analysis.

Recent work has indicated that listened narratives may engage brain regions beyond those activated by non-speech sounds, single words, or short isolated sentences. These areas include, for example, medial and inferior frontal cortices, precuneus, cingulate cortex, and amygdala (Ben-Yakov et al., 2012; Boldt et al., 2013; Lerner et al., 2011; Regev et al., 2013; Wilson et al., 2008; Zion Golumbic et al., 2013). Moreover,

in functional magnetic resonance imaging (fMRI), listened stories have been used for mapping of brain areas linked to syntactic processing (Brennan et al., 2012) or responding to transitions in narration (Whitney et al., 2009). Recently, growing interest has emerged for elec-trophysiological signals that can provide superior temporal resolution sufficient for detecting activity patterns at syllable rates (~8 Hz) and above. This approach has, for example, enabled identifying segments of listened news with magnetoencephalography (MEG; Koskinen et al., 2013),finding brain correlates for sentence comprehension (Peelle et al., 2013), and resolving attention mechanisms in a“cocktail party” setup with electroencephalograpy (EEG;O'Sullivan et al., 2014), and with electrocorticography (ECoG;Zion Golumbic et al., 2013).

In the current study, we aim at uncovering cortical MEG responses to listened 1-h-long nonrecurring audiobook recording. The work has three key focuses. First, our main emphasis is to uncover physiologically feasi-ble time series from the raw MEG data. Topographic mapping is also con-sidered, but only for validation of the main analysis. Second, as the listened stories have different effects on different brain regions, we aim for data-driven analysis and search for statistical regularities in brain re-sponses, without constraining the search space by stimulus features that might or might not be reflected in the raw data. Third, our purpose is to pave the way for applications aiming at real-time signal analysis of

⁎ Corresponding author. Fax: +358 9 470 22969. E-mail address:miika.koskinen@aalto.fi(M. Koskinen).

Contents lists available atScienceDirect

NeuroImage

j o u r n a l h o m e p a g e : w w w . e l s e v i e r . c o m / l o c a t e / y n i m g

http://dx.doi.org/10.1016/j.neuroimage.2014.06.018

(2)

individual subjects' MEG data in naturalistic experimental conditions. Thus, we focus on subject-wise task-optimized analysis models.

The signal-to-noise ratio (SNR) of single-trial MEG recordings is very low (≪1). Thus to obtain reliable signals without averaging, we con-centrate on signal replicability and require that a successful analysis model, reliably uncovering brain responses to nonrecurring speech, produces replicable MEG signatures for a repeated stimulus irrespective of the signal variability and prominent noise in the recorded data. Therefore, in addition to the nonrecurring speech stimulus, a supporting dataset with repeated trials (1-min-long independent stimulus) was recorded for model training. Notably, the analysis model should gener-alize over trials and over datasets.

We utilize a linear mapping model, representing spatialfiltering of MEG signals in sensor space, combined with parameter estimation based on canonical correlation analysis (CCA). In its standard version, CCAfinds linear projections for two distinct multivariate datasets so that the resulting random variables become maximally correlated be-tween the sets. For the replicability requirement, however, our aim is tofind a single weighting matrix (canonical basis vectors, i.e. spatial fil-ters) that provides maximal correlations between projections of N trials. This requirement relates to similarity-constrained CCA problem (SimCCA;Lahti et al., 2009). A similar idea has been utilized with EEG in computation of intra- and intersubject correlations (Dmochowski et al., 2012). Here, we derive a robust singular-value-decomposition (SVD) based solution that is practical for large datasets, with the aim of uncovering reliable brain responses from the recorded data, i.e. blind signal separation (BSS). Importantly, maximizing trial-to-trial cor-relations by SimCCA corresponds to maximizing the SNR (e.g.Bershad and Rockmore, 1974). Such maximization is expected to reduce noise sensitivity of the estimates, to prevent overfitting, and to improve the model performance with novel test datasets. Thus, the proposed approach differs from the well-known spatialfiltering schemes, such as beamforming (van Veen et al., 1997) that is based on anatomical models, or independent-component (ICA) and principal-component analyses (PCA) that do not intrinsically utilize sample ordering (e.g.Karhunen and Hao, 2011). We show that our method succeeds in revealing feasible brain responses to nonrecurring audiobook stimuli in individual subjects.

Materials and methods

Subjects and recordings

Ten native Finnish-speaking and normally hearing participants gave their written informed consent for the study. The experimental setup was approved by the ethics committee of the Helsinki and Uusimaa Hospital district.

MEG was recorded with a 306-channel Elekta Neuromag™ system (Elekta Oy, Helsinki, Finland). The passband was 0.03–330 Hz and sam-pling frequency 1000 Hz. The auditory speech stimulus was presented through a non-magnetic open-field audio speaker (Panphonics Ltd., Tampere, Finland) ~2.5 m in front of the subject (~40° upwards). The loudness was not standardized but adjusted instead to a comfortable listening level in the beginning of both recording sessions. With one subject, however, the loudness was reduced also between the parts of the recording at the will of the subject. For convenience and to reduce gaze wandering, the map of Europe was placed 1 m in front of the subject. The map had some relevance with respect to the story heard.

The stimulus comprised passages from the classic Finnish novel Välskärin kertomuksia by Z. Topelius. The audiobook (provided by Yleisradio Oy, Finland), originally intended for radio broadcasting, was professionally read by a male actor. The MEG recordings were arranged in two ~1 h sessions on separate days. Thefirst half of the sessions was spent listening to a 1-min-long passage repeatedly 24 times, with at least 5-s spacing between the trials. After every eight trials, the subjects' alertness was checked by short conversation and sitting position was

adjusted if needed. The second half was reserved for the ongoing story, listened in two pieces with a break in between. Altogether, the re-cordings comprised 48 repeated trials (recorded in 2–6 pieces) and a 59.5-min nonrecurring single-trial (recorded in four pieces). To exclude possible trivial explanation for results by spurious magneticfields or by other artifacts, we also collected 48 trials of empty-room data in a similar manner.

MEG preprocessing

The recordings were preprocessed with the signal-space-separation technique (SSS;Taulu et al., 2004; Taulu and Kajola, 2005) as imple-mented in the MaxFilter program (Elekta Oy, Helsinki, Finland) with the default parameter settings to reduce noise and to convert eachfile into the standard head position. All 204 gradiometer channels of the de-vice were selected for further processing. Principal component analysis (PCA) was applied subject-wise to reduce data dimensionality from 204 down to the maximum degrees of freedom (dof) remaining after the SSS procedure, the value provided by the MaxFilter program. All subject-wise recordings were used for PCA computation and the minimum dof of these (~ 68) determined the number of selected PCA basis vectors (Nc× dof matrix A, Nc= 204 i.e. the number of channels).

Of the 48 trials with the recurrent stimuli, 33 were randomly assigned to training and 15 to test sets. All data were wavelet-transformed with the Mexican-hat wavelet into 17 scales with exponential spacing corresponding to center frequencies approximately between 0.5 and 120 Hz (Torrence and Compo, 1998).

Spatialfiltering with CCA

Generally, the spatialfiltering corresponds to a linear mapping Z¼ WT

X;

where the Nc× Ntmatrix X is the multi-channel MEG time series, Ncis

the number of channels, and Ntis the number of time points. Nc× n

ma-trix W contains a set of n spatialfilters w1… wnthat set weights to the

different MEG sensors (or to PCA mappings in this case with NCreplaced

by dof, see the MEG preprocessing section) resulting in afiltered n × Nt

signal matrix Z = (z1… zn)Twherezi= (wiTX)T. Using CCA

nomencla-ture, we call the spatialfilters withe CCA basis vectors and the

respec-tive resultszithe canonical variates. Our aim is to estimate weights W

that provide variateszicorrelating maximally between the single-trial

responses. Here, the CCA (Hardoon et al., 2004; Hotelling, 1936) is utilized with the constraint that the matrix W should be same for all trial datasets (i.e. similarity-constrained CCA, see e.g.Dmochowski et al., 2012; Lahti et al., 2009). In the following, we briefly summarize our implementation based on data whitening and singular value de-composition (SVD) (Karhunen and Hao, 2011; Hyvärinen et al., 2001).

As usual, X is a zero-mean multi-dimensional dataset, E = (e1… en)

the matrix containing the unit-norm eigenvectors of the covariance matrix Cxx= E[xxT], and D = diag(d1… dn) is the diagonal matrix of

the eigenvalues of Cxx. The whitened data eX can be expressed as

eX ¼ D−1=2ET

X:

In a standard (i.e. not similarity-constrained) CCA, data whitening is performed separately for a second dataset (in a similar fashion) and the between-sets covariance matrix Css of the whitened data is

factorized by SVD as Css¼ UΣV T ¼XL i¼1 ρiuivTi:

The singular column vectors uiand viin matrices U and V are the basis

(3)

singular valuesρiin the diagonal ofΣ represent the canonical correlations.

For the constrained solution, wefirst (i) used the training trials for esti-mating a common whitening matrix from the average covariance matrix Cave= (∑E[XiXiT])/N whereXiis the data-set for trial i and N is the

num-ber of training trials. Then, (ii) as Cssneeds to be symmetric for U to equal

V and thereby provide common linear transform for all trials, we formed the between-sets covariance matrix Cssby averaging over all the

combi-nations of two different training data sets. Denoting the whitened version of data-set Xiby eXi, we have Css¼ 1 N Nð −1Þ Nð t−1Þ XN i¼1 XN j¼ 1; j≠i eXieX T j ¼ NXXT−I N−1 ð Þ Nð t−1Þ

where X is the whitened time series averaged over trials and I is the iden-tity matrix. It is useful to note that the condition j≠ i in the summations above (the cause of matrix I) makes our method seemingly different from the principal component analysis (PCA) method applied to X, i.e. our method considers only between-trials covariance of the whitened data. However, the identity matrix only shifts the singular values by a constant, and does not change the singular vectors or their ordering. Thus, our method is essentially equivalent to performing PCA on the averaged whitened data. Finally, the matrix W in the original signal space is

W¼ ED−1=2U:

In our implementation, the model performance was assessed subject-wise by applying the trained model to the data of the 15 test trials and by calculating their pair-wise correlations separately for n CCA projections in each wavelet scale. The t-test (two-tailed) was employed tofind statistically significant deviations from zero correlation (p b 0.01/n). Signal-to-noise ratio (SNR)

Notably, maximizing trial-to-trial correlationsρ corresponds to maximizing the SNR, given as (Bershad and Rockmore, 1974)

SNR¼ J1−ρρ þ K:

Here, the coefficients for the unbiased estimate are J ¼ exp − 2 Nt−3

 

≈ 1 and K¼ −1

2ð1−JÞ≈0 with large Nt. SNR was quantified for test trials in

consecutive analysis stages after SSS, after PCA, after wavelet transform, and after SimCCA. Prior work has shown that CCA as a preprocessing stage before ICA may improve SNR in some cases (Karhunen and Hao, 2011; Karhunen et al., 2012). Thus, for comparison, SNR computations with ICA were included.

Stimulus–response relationship

We assessed the applicability of trained spatialfilters in revealing plausible brain responses to 1-h-long nonrecurring audiobook stories by computing stimulus–response correlations. The analysis was limited to thefirst canonical variates in each scale, corresponding to the highest trial-to-trial correlations with training data. The stimulus signal was rectified, downsampled to 1000 Hz, wavelet-transformed to corre-sponding scale and split into 1-min epochs with 5-s spacing to reduce the temporal dependency between the consecutive epochs. Similar splitting was done with MEG time courses. The Pearson's correlation co-efficient was computed in each epoch. The delay in the brain signals with respect to stimulus was considered by shifting MEG data 0–249 ms in 1-ms steps to find the lag for maximal correlation. Two-tailed t-test with Bonferroni correction was used for evaluating the sta-tistical significance of the correlations deviating from zero (with 250 time shifts pb 0.01/250). The procedure was repeated for each of 17

wavelet scales separately. To verify the validity of t-test results, we formed control-data null distribution by computing correlations between the same signals but taken from different epochs in random.

Current sources contributing to canonical variates

We constructed neuroanatomical sensitivity maps to visualize the influence of the source currents on the amplitude of each canonical variate. Let A represent Nc× dof matrix of PCA basis vectors earlier

used for preprocessing, W is dof × Nsmatrix holding the CCA basis

vectors, and G is Nc× 3Nsourcegain matrix relating the source currents

to the sensor signals, i.e. the forward solution. Here, Nsis the number

of statistically significant canonical basis vectors, Nsourceis the number

of source locations, and the multiplier 3 comes from three geometric dipole orientations (x, y, z). We compute

S¼WT

ATG;

where Ns× 3Nsourcematrix S represents the sensitivity of the different

basis vectors (rows) to unit current dipoles in three orthogonal orienta-tions x, y, and z, for each location. For visualization, these orientaorienta-tions are combined into single sensitivity values vi; j¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2

i; j;xþ s2i; j;yþ s2i; j;z

q

for each basis vector i and location j, and are further normalized into range 0… 1 asVi¼ vi;1…vi;Nsource

 T

= max

j vi; j

 

. In other words, G intro-duces anatomical mapping extending the idea of plotting the spatial filter weights alone (e.g.O'Sullivan et al., 2014). The approach is a sim-plified alternative, e.g. to visualizing correlations between the canonical variates and the time series at source points (Lankinen et al., 2014).

T1-weighted anatomical MR images of seven subjects out of ten were available from prior studies. With the permission of subjects, the images were reanalyzed to estimate matrix G by using FreeSurfer soft-ware (http://surfer.nmr.mgh.harvard.edu) and MNE Suite software package (http://www.martinos.org/mne/). For settings, we used 7 mm grid spacing and smoothing constant 5 for graphics. The registration of MEG coordinates with MRI coordinate system was done separately for each separate MEG recording (training data) with our custom made program. The subject'sfinal forward matrix was formed as an average of the subject's forward solutions, weighted according to the length of the specific MEG recordings. During the process, anatomical MRI data of two subjects were excluded due to unsuccessful boundary-element modeling (BEM).

Results

All MEG data were used in the analysis, except with one subject ~ 11-min piece of the ongoing story was discarded due to recording artifacts.

SimCCA

The model training and primary performance testing was carried out with 33 and 15 1-min-long repeated trials, respectively.Fig. 1A repre-sents the maximal trial-to-trial correlations (i.e. between the 1st canon-ical variates) for each subject and the wavelet scale. Notably, statistcanon-ically significant correlations (p b 0.01, t-test with Bonferroni correction) could be found throughout the 0.5–120 Hz frequency range, most prom-inent ones residing below ~20 Hz. The correlations varied considerably between subjects, e.g. from 0.11 up to 0.55 (median 0.33) at 0.5 Hz. For comparison, the same procedure did not reveal any statistically signi fi-cant correlations with empty-room data, discounting the possibility of spuriously induced magnetic fields. Fig. 2demonstrates canonical variate time series.

(4)

SNR

Closer inspections of SNR in different processing stages revealed that spatialfiltering with SimCCA was the most influential step in the pipe-line (Fig. 3). With the test trials at 0.5 Hz, SNR increased 9.1–20.0 fold (median 12.5 in the group), from 0.021–0.074 (median 0.047) to 0.29–1.21 (median 0.49). These numbers refer to the first canonical variates at 0.5 Hz providing maximal effects. Notably, PCA or ICA them-selves did not improve SNR considerably in this context and the ICA after SimCCA decreased the SNR of that with SimCCA alone.

Stimulus–response correlations

Spatiallyfiltered MEG data (the 1st canonical variates), computed for the nonrecurring 1-h-long audiobook stimuli, correlated signi ficant-ly with the speech envelope (pb 0.01, t-test with Bonferroni correction for 250 tests, i.e. lags;Fig. 4A). More specifically, out of 40,250 t-tests (number of statistically significant canonical basis vectors in the groups × 250 lags), 24,049 (60%, i.e. over relatively wide range of lags) were statistically significant. The corresponding number with the randomized control data was 2 with an expected value of 1.6. The peak correlations were typically distributed around ~ 100 ms lag for the MEG signal, limiting the analysis to lags between 0 and 249 ms (Fig. 4B). The results were significant with wavelet scales up to 31 Hz with all subjects. However, the most pronounced correlations resided below 3 Hz with considerable inter-individual differences (up to 0.61, median 0.45 at 0.5 Hz). Practically equal correlations were gained

with 1-min training trials data analyzed with the same lags for refer-ence. For interpretation, it is useful to note that the individual signals are not necessarily consistent in the group or may not represent activity in primary auditory regions. Instead, the results merely indicate that the tested time series were stimulus-related at the scales up to 31 Hz.

Anatomical mapping

We computed topographic maps for qualitative feasibility evalua-tion withfive subjects' data. These maps represent the sensitivity of in-dividually optimized spatialfilters to source amplitude at different locations on the cortex.Fig. 5demonstrates selected results; complete data are presented in the Supplementary material. Characteristically, the spatialfilters showed selectivity with different cortical regions and hemispheres, and the influence of auditory regions is obvious. Sources in superior temporal sulcus/gyrus, inferior frontal regions, motor/ somatosensory, and parietal cortices consistently contribute to filters across subjects. Heterogeneity of subject-wise optimized models is considerable.

Discussion

We utilized a task-optimized spatialfiltering strategy for uncovering single-trial brain responses to naturalistic stimuli. The method succeeded in revealing relevant individual MEG time-courses to an audiobook story. The feasibility of the analysis was demonstrated in three ways. First, the spatialfilters were capable of uncovering replicable brain signals with

0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 61 86 120 0 0.1 0.2 0.3 0.4 0.5 0.6 Correlation 0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 61 86 120 6 8 10 Subjects 0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 61 86 120 0 2 4 6 8

Wavelet center frequency (Hz)

Basis vectors

A

B

C

Fig. 1. (A) Intra-subject correlations at the seventeen wavelet scales with central frequencies between ~0.5 and 120 Hz (note the logarithmic axis). The results correspond to thefirst ca-nonical variates. The circles present the average correlation between the test trials for each subject who exceeded the threshold for statistical significance. The horizontal line shows their median, the vertical thick lines represent the quartiles, and the thin line represents the extent of their distribution. (B) The number of subjects with statistically significant findings at each scale for thefirst canonical variate. (C) The number of the statistically significant canonical variates for each individual. The lines indicate the minimum, the median and the maximum in the group.

(5)

0.54 Hz 5 s 1.1 Hz 2 s 5.7 Hz 500 ms 200 ms 11 Hz 20 ms 120 Hz

Fig. 2. An example of spatialfiltering (i.e. canonical variates) at wavelet scales corresponding to central frequencies of 0.54 Hz, 1.1 Hz, 5.7 Hz, 11 Hz, and 120 Hz in one subject. In the first three bands, thefirst and the second canonical variates are shown in the same time window. All 48 trials are superimposed. Note different time scales for the canonical variates.

0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 61 86 120 0 0.1 0.2 0.3 0.4 0.5 0.6

Wavelet center frequency (Hz)

Median SNR over subjects

simCCA (test trials) simCCA (training trials)

B A

C

wavelet (all trials) ICA without simCCA (all trials)

ICA after simCCA (training trials) ICA after simCCA (test trials)

Fig. 3. Median SNR over the group in selected stages of the analysis chain (SSS-PCA-wavelet-SimCCA-ICA). Letters in the left corner represent group medians of individually maximal SNR (A) over 204 channels, (B) over ICA components after PCA, and (C) over PCA components, SSS signals as a starting point.

(6)

repeated 1-min-long test trials. With this test data, consistent brain re-sponses were detected up to ~ 120 Hz in 6/10 subjects, with the most prominent signals residing below ~20 Hz. Statistically significant trial-to-trial correlations, especially at the upper end of the spectrum, demon-strate the efficiency of the method in poor SNR conditions. The detected frequency range far exceeded the frequencies typically observed in MEG/ EEG studies with narrative speech stimuli (b10 Hz; e.g.O'Sullivan et al., 2014; Gross et al., 2013). Second, with 1-h-long nonrecurring recordings, signal components with the largest trial-to-trial correlations (i.e. the 1st canonical variates) were shown to be stimulus-related up to ~ 30 Hz. Thus our trainedfilters were able to generalize over independent datasets and to reveal feasible brain responses to relatively long-lasting stream of nonrecurring speech stimuli. Third, the spatialfilters were found to be sensitive to source currents in cortical regions previously known to participate in speech processing.

The proposed approach has several distinctions compared with the well-known spatialfiltering schemes. First, compared with beamforming (van Veen et al., 1997), the analysis can be done solely in sensor-space signals avoiding errors in anatomical source models. This procedure may broaden the applicability of the method, for example, to EEG record-ings. However, the mapping of the signal sources in post-hoc analysis can reveal clusters of mutually correlated signal sources that are important in e.g. functional connectivity analysis. Second, SimCCA has some simi-larities, for example, with ICA or with PCA. Although replicability re-quirement favors the proposed methodology, thereby preventing fair comparison with the other methods, we were somewhat surprised to find out that standard ICA or PCA did not noticeably improve SNR, and thus could not generalize over training and testing sets, or to reveal con-sistent trial-to-trial responses. This result, however, does not exclude the possibility that e.g. independent components estimated for a particular dataset could be stimulus-related. Possibly, much of the difficulty stems from the low SNR of the unaveraged single-trial MEG data. These methods, however, do not intrinsically utilize sample ordering (see e.g.

Karhunen and Hao, 2011), and they are thus principally suboptimal in revealing consistent temporal structures between trials. As a solution, previous studies have pointed out advantages of combining ICA and CCA (Karhunen et al., 2012; Karhunen and Hao, 2011). In-depth analysis with ICA is out of the scope of this paper.

Our approach aimed at uncovering both instantaneous changes, and longer temporal patterns of brain signals in detail, in contrast to typical topographic mapping per se. In this line of research, previous studies indicate that sound onsets (Lakatos et al., 2005) or other characteristics in natural sound stimuli, for example in rock music (Szymanski et al., 2011), in environmental sounds, or in animal vocalizations (Kayser

et al., 2009) often cause replicable transient brain responses or enforce abrupt phase resetting of oscillations (Lakatos et al., 2008). Moreover, the temporal structure of the stimulus, such as syllabic variations in the heard speech (4–8 Hz; e.g.Luo and Poeppel, 2007; Cogan and Poeppel, 2011), is reflected in the brain oscillation patterns. On this basis, we selected the Mexican-hat wavelet transform for its capability to react both to the oscillatory and to the aperiodic waveforms with minimal temporal blurring. The downside is that the effective passbands broaden, even to tens of Hz in the gamma range. Here, the redundancy between adjacent scales was reduced by exponential wavelet scaling.

Founding the model training on replicable brain responses has note-worthy physiological implications. The analysis is inherently steered to-wards revealing activity in brain areas where the responses remain relatively similar from trial to trial. Evidently, the extrinsic networks (Golland et al., 2007; Boldt et al., 2013) most directly influenced by the stimuli, are involved. In this respect, association between the spa-tiallyfiltered signals and the speech stimulus was expected. Notably, stimulus–response relation was not explicitly included in models (in contrast, for example, toDing and Simon, 2013; Koskinen et al., 2013; O'Sullivan et al., 2014; Zion Golumbic et al., 2013). Thus, the parameter search space was not constrained by an incomplete and partly unknown set of stimulus features that could lead to the exclusion of unknown contributing sources.

In prior studies, the most common speech-signal feature correlating with MEG or EEG has been signal power or amplitude envelope (Abrams et al., 2008; Ahissar et al., 2001; Nourski et al., 2009; Ding and Simon, 2013; Gross et al., 2013), as used here for validation. Inter-estingly, envelope was found to provide only partial explanation for the signal components, as suggested by statistically significant but weak stimulus–response correlations above 3 Hz. Specifying a more complete set of contributing speech features is out of the scope of this paper as the aim was merely to demonstrate the dependency. Consider-ing the stimulus–response-relationship, it is useful to note that the pro-posed methodology may help tackling the response variability that inherently reduces the mutual dependency between the stimulus and the response, and thus, the performance of stimulus–response models (see e.g.Belitski et al., 2010; Cogan and Poeppel, 2011; Kayser et al., 2009; Schyns et al., 2011; Szymanski et al., 2011).

We make two remarks on the used experimental setup. First, in ex-periments where the same stimuli are repeated, both brain responses and the subject's attentiveness tend to decrease with stimulus repeti-tion. At the same time, changes occur in response time courses and in the brain areas that react to the stimulus. Second, with narratives, a

0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 0 0.1 0.2 0.3 0.4 0.5 0.6

Wavelet center frequency (Hz)

Correlation 0.54 0.75 1.1 1.5 2.1 2.9 4.1 5.7 8 11 16 22 31 44 0 50 100 150 200 250

Wavelet center frequency (Hz)

Delay (ms)

A

B

Fig. 4. (A) Correlation between the speech envelope and the spatiallyfiltered MEG data (the 1st canonical variates) recorded during listening to nonrecurring audiobook stories for an hour. The dashed line represents the median correlation with the training data. (B) Estimated lags in MEG signals at peak correlations. In both panels, the circles present the average value between the epochs for each subject (who exceeded the threshold for statistical significance). The horizontal line shows their median, the vertical thick lines represent the quartiles, and the thin line represents the extent of their distribution.

(7)

relation exists between the extent of consistently responding brain re-gions and the lengths of temporal receptive windows (TRWs), i.e. time-spans that accumulate contextual information (Hari et al., 2010; Lerner et al., 2011). Generally, early auditory regions tend to process

momentary features while high-level perceptual and cognitive areas more likely favor longer coherent structures in the story (e.g. 38 ± 17 s inLerner et al., 2011). In our work, the SimCCA models were trained with repeated stimuli that was relatively long, 1-min. Our data provide

0.54 Hz

(#2)

2.9 Hz

(#5)

2.1 Hz

(#5)

2.1 Hz

(#2)

16 Hz

(#1)

A

B

C

D

E

Fig. 5. Selected sensitivity maps of Subject #2. Sensitivity values starting from 0.5 are colored red, turning to orange and yellow at ~0.9. (A) This specific filter is most sensitive to sources at auditory and superior temporal regions. (B and C) Clearly lateralized, nonspecific activity across regions typically associated with speech processing. (D and E) These two spatial filters at the same wavelet scale pick mutually uncorrelated signals from motor/sensory regions.

(8)

some indications that the detected brain signals may reflect activity not only in early auditory regions, but also in more extensively temporal and posterior cortices, and in inferior frontal regions, in addition to motor/somatosensory areas. These results are concordant with previous electrophysiologicalfindings with narrative speech (Gross et al., 2013; Zion Golumbic et al., 2013).

In conclusion, the proposed spatialfiltering strategy offers apparent methodological advantages. Spatialfilter parameter estimation based on replicability constraint effectively reduces noise, thereby uncovering consistent time courses hidden in multivariate MEG datasets. In this framework, the method provides the maximum SNR given the model-ing constraints. Importantly, the spatialfilters were robust enough to generalize over independent datasets, revealing plausible single-trial time courses up to ~ 120 Hz range. The very topics of replicability of analysis results, noise resiliency of models, and the ability to reveal stimulus-related brain activity all play crucial roles in attempts to accu-rately and unambiguously link brain responses with stimulus features.

Acknowledgments

The authors thank Profs. Riitta Hari and Aapo Hyvärinen for valuable suggestions and comments, MSc. Mia Illman for assistance in MEG re-cordings, Prof. Matti Hämäläinen, Prof. Lauri Parkkonen, MSc. Kaisu Lankinen, and MSc. Jukka Saari for discussions. The study was supported by the Academy of Finland (post-doctoral project #134655) and aivoAALTO project of the Aalto University.

Appendix A. Supplementary data

Supplementary data to this article can be found online athttp://dx. doi.org/10.1016/j.neuroimage.2014.06.018.

References

Abrams, D.A., Nicol, T., Zecker, S., Kraus, N., 2008.Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965.

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., Merzenich, M.M., 2001.

Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 98, 13367–13372.

Belitski, A., Panzeri, S., Magri, C., Logothetis, N.K., Kayser, C., 2010.Sensory information in localfield potentials and spikes from visual and auditory cortices: time scales and frequency bands. J. Comput. Neurosci. 29, 533–545.

Ben-Yakov, A., Honey, C.J., Lerner, Y., Hasson, U., 2012.Loss of reliable temporal structure in event-related averaging of naturalistic stimuli. NeuroImage 63, 501–506.

Bershad, N.J., Rockmore, A.J., 1974.On estimating signal-to-noise ratio using the sample correlation coefficient. IEEE Trans. Inf. Theory 20, 112–113.

Boldt, R., Malinen, S., Seppä, M., Tikka, P., Savolainen, P., Hari, R., Carlson, S., 2013. Listen-ing to an audio drama activates two processListen-ing networks, one for all sounds, another exclusively for speech. PLoS One 8, e64489.

Brennan, J., Nir, Y., Hasson, U., Malach, R., 2012.Syntactic structure building in the anterior temporal lobe during natural story listening. Brain Lang. 120, 163–173.

Cogan, G.B., Poeppel, D., 2011.A mutual information analysis of neural coding of speech by low-frequency MEG phase information. J. Neurophysiol. 106, 554–563.

Ding, N., Simon, J.Z., 2013.Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J. Neurosci. 33, 5728–5735.

Dmochowski, J.P., Sajda, P., Dias, J., Parra, L.C., 2012.Correlated components of ongoing EEG point to emotionally laden attention— a possible marker of engagement? Front. Hum. Neurosci. 6, 112.

Golland, Y., Bentin, S., Gelbard, H., Benjamini, Y., Heller, R., Nir, Y., Hasson, U., Malach, R., 2007.Extrinsic and intrinsic systems in the posterior cortex of the human brain revealed during natural sensory stimulation. Cereb. Cortex 17, 766–777.

Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., Garrod, S., 2013.Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 11, e1001752.

Hardoon, D.R., Szedmak, S., Shawe-Taylor, J., 2004.Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16, 2639–2664.

Hari, R., Parkkonen, L., Nangini, C., 2010.The brain in time: insights from neuromagnetic recordings. Ann. N. Y. Acad. Sci. 1191, 89–109.

Hotelling, H., 1936.Relations between two sets of variants. Biometrika 28, 321–377.

Hyvärinen, A., Karhunen, J., Oja, E., 2001.Independent Component Analysis. John Wiley & Sons.

Karhunen, J., Hao, T., 2011.Finding dependent and independent components from two related data sets. Proc. of Int. Joint Conf. on Neural Networks (IJCNN 2011), San Jose, California, USA, July 31–August 5, 2011, pp. 457–466.

Karhunen, J., Hao, T., Ylipaavalniemi, J., 2012.A canonical correlation analysis based method for improving BSS of two related data sets. Proc. of the 10th Int. Conf. on Latent Variable Analysis and Signal Separation. Lecture Notes in Computer Science, vol. 7191. Springer, pp. 91–98.

Kayser, C., Montemurro, M.A., Logothetis, N.K., Panzeri, S., 2009.Spike-phase coding boosts and stabilizes information carried by spatial and temporal spike patterns. Neuron 61, 597–608.

Koskinen, M., Viinikanoja, J., Kurimo, M., Klami, A., Kaski, S., Hari, R., 2013.Identifying fragments of natural speech from the listener's MEG signals. Hum. Brain Mapp. 34, 1477–1489.

Lahti, L., Myllykangas, S., Knuutila, S., Kaski, S., 2009. Dependency detection with similar-ity constraints. Machine Learning for Signal Processing, 2009. MLSP 2009. IEEE International Workshop.http://dx.doi.org/10.1109/MLSP.2009.5306192.

Lakatos, P., Shah, A.S., Knuth, K.H., Ulbert, I., Karmos, G., Schroeder, C.E., 2005.An oscilla-tory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. J. Neurophysiol. 94, 1904–1911.

Lakatos, P., Karmos, G., Mehta, A.D., Ulbert, I., Schroeder, C.E., 2008.Entrainment of neuro-nal oscillations as a mechanism of attentioneuro-nal selection. Science 320, 110–113.

Lankinen, K., Saari, J., Hari, R., Koskinen, M., 2014.Intersubject consistency of cortical MEG signals during movie viewing. NeuroImage 92C, 217–224.

Lerner, Y., Honey, C.J., Silbert, L.J., Hasson, U., 2011.Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915.

Luo, H., Poeppel, D., 2007.Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010.

Nourski, K.V., Reale, R.A., Oya, H., Kawasaki, H., Kovach, C.K., Chen, H., Howard, M.A., Brugge, J.F., 2009.Temporal envelope of time-compressed speech represented in the human auditory cortex. J. Neurosci. 29, 15564–15574.

O'Sullivan, J.A., Power, A.J., Mesgarani, N., Rajaram, S., Foxe, J.J., Shinn-Cunningham, B.G., Slaney, M., Shamma, S.A., Lalor, E.C., 2014. Attentional selection in a cocktail party en-vironment can be decoded from single-trial EEG. Cereb. Cortex.http://dx.doi.org/10. 1093/cercor/bht355.

Peelle, J.E., Gross, J., Davis, M.H., 2013.Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cereb. Cortex 23, 1378–1387.

Regev, M., Honey, C.J., Simony, E., Hasson, U., 2013.Selective and invariant neural responses to spoken and written narratives. J. Neurosci. 33 (40), 15978–15988.

Schyns, P.G., Thut, G., Gross, J., 2011.Cracking the code of oscillatory activity. PLoS Biol. 9, e1001064.

Szymanski, F.D., Rabinowitz, N.C., Magri, C., Panzeri, S., Schnupp, J.W.H., 2011.The laminar and temporal structure of stimulus information in the phase offield potentials of auditory cortex. J. Neurosci. 31, 15787–15801.

Taulu, S., Kajola, M., 2005.Presentation of electromagnetic multichannel data: the signal space separation method. J. Appl. Phys. 97, 124905.

Taulu, S., Kajola, M., Simola, J., 2004.Suppression of interference and artifacts by the signal space separation method. Brain Topogr. 16, 269–275.

Torrence, C., Compo, G.P., 1998.A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 79, 61–78.

van Veen, B.D., van Drongelen, W., Yuchtman, M., Suzuki, A., 1997.Localization of brain electrical activity via linearly constrained minimum variance spatialfiltering. IEEE Trans. Biomed. Eng. 44, 867–880.

Whitney, C., Huber, W., Klann, J., Weis, S., Krach, S., Kircher, T., 2009.Neural correlates of narrative shifts during auditory story comprehension. NeuroImage 47 (1), 360–366.

Wilson, S.M., Molnar-Szakacs, I., Iacoboni, M., 2008.Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cereb. Cortex 18 (1), 230–242.

Zion Golumbic, E.M., Ding, N., Bickel, S., Lakatos, P., Schevon, C.A., McKhann, G.M., Goodman, R.R., Emerson, R., Mehta, A.D., Simon, J.Z., Poeppel, D., Schroeder, C.E., 2013.Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron 77, 980–991.

References

Related documents

Log in to your Account Management area at www.bluecamroo.com using your workspace login details and choose Select Email Templates to view the available templates and copy

A set of coordination relations were identified with terms containing conjunctive phrases, "and" or "or". Furhter, statistical analysis was performed on the list

As such, this paper provides the first examination of the factors which may be linked to osteopathic workforce distribution with a specific focus on the relationship between

This study provides a useful perspective regarding the manner in which the curriculum commission has implemented the subject of natural hazards in German curricula. It can be

To assess the impact of governance mechanisms on bank valuations, we regress bank valuation on the legal protection of minority shareholders, bank supervision and regulation

To identify SNPs and candidate genes associated with natural variation in lipid and diterpene contents in Arabica beans, we performed GWAS using four methods:

We employed three feature systems for the experiment: 12 th order Linear Frequency Cepstral Coefficients (LFCC) [7], Perceptual Linear Prediction (PLP) [9] and Mel

The results from the isolation and identification of fungi in pure culture have indicated the contamination of pigeon pea collected in southern Benin by fungi, with a