4 Variability in fMRI: The Generality of Single Session Results
4.3 Methods Experimental Setup and Scanning
4.3.1 Subject and Scanning Details
The subject was a healthy 23-year-old right-handed male. The data were acquired on a Siemens MAGNETOM Vision (Siemens, Erlangen, Germany) at 2T. Each BOLD-EPI volume scan consisted of 48 transverse slices (in-plane matrix 64x64; voxel size 3x3x3mm; TE=40ms; TR=4.1s). A T1-weighted high-resolution MRI of the subject (1 x 1 x 1.5mm resolution) was acquired to facilitate anatomical localisation of the functional data.
4.3.2 Experimental Setup
As the experiment was designed to examine the generality of a single session, each session was conducted as if it were the first time the subject had been examined: in effect, as if only one session was to be obtained. This was done to control for obvious and artefactual between-session differences whilst ensuring that sources of typical between-session variability (scanner hardware and subject physiology) would be sampled in an unbiased manner. The following precautions were taken: the same operators always controlled the scanner; ambient light and sound levels were similar between sessions; and spoken instructions to the subject were always exactly the same. Day-to-day quality control measurements of scanner characteristics were not acquired. It was impossible to control for the fact that the subject was always aware that he had performed the task before in the scanner, only under slightly different circumstances. I refer to this effect as the
‘Groundhog Day’ effect.
Thirty-three individual sessions were acquired from the subject over a period of two months, collected at 12pm and 6pm. Each scanning session consisted of one run of each of a motor, cognitive and visual paradigm, presented in random order. Session paradigms were designed to reduce the effects of variable task performance. For example, the subject was familiarised with both the random number generation and finger-tapping tasks before performing them in the scanner, in an attempt to eliminate performance effects. In addition, the rates at which both tasks were performed were chosen to ensure that subject performance would be stable across sessions. These decisions were informed by studies that used similar paradigms (motor paradigm - Blinkenberg et al., 1996; cognitive paradigm - Jahanshahi et al., manuscript submitted), demonstrating that the rate at which the subject tapped his finger, and the degree of randomness of the sequence generated, were both stable over time. It is worth noting, however, that an empirical demonstration of these would have been desirable.
4.3.2.1 Motor Paradigm
The subject tapped his right index finger, paced by an auditory tone (1.5Hz). The subject’s hand was restrained within a custom-built thermoplastic splint, which ensured that the amplitude of the finger movement was consistent both across and within sessions. Each activation epoch was alternated with a rest epoch, in which the pacing tone was delivered to control for auditory activation. Thirteen blocks were collected per session (seven rest and six active). Each block was 24s/6 scans long, making 78 scans in total for each session. The subject maintained fixation on a cross that was backprojected onto a transparent screen by an LCD video projector, as in previous experiments. The projector was similarly employed to deliver visual instructions to the subject before each block (either ‘Move’ or ‘Rest’).
4.3.2.2 Cognitive Paradigm
The subject generated random numbers from 1 to 9, paced by an auditory tone (0.66Hz). In the rest condition the subject counted from 1 to 9, similarly paced by the auditory tone. The subject fixated in the same fashion as in the motor paradigm. Thirteen epochs were collected in total (seven rest and six active). Again, block length was 6 scans, and 78 volumes were collected in each session.
4.3.2.3 Visual Paradigm
A reversing black and white checkerboard flickering at 8Hz (Fox and Raichle, 1985) was presented to the subject. The subject focused on a central fixation spot that was constant across both activation (reversing checkerboard stimulation) and rest (fixation spot only) blocks. Six epochs were acquired in total (three activation and three rest). Blocks were 6 scans long, and 36 scans in total were acquired in each session.
4.3.3 Image Preprocessing
Data preprocessing was carried out using SPM99 (Wellcome Department of Cognitive Neurology, London, UK; http://www.fil.ion.ucl.ac.uk/spm). All functional volumes, independent of session or paradigm, were realigned to the first volume acquired (Friston et al., 1995) and a mean realigned volume created. Sessions containing obvious movement artifacts were discarded at this stage: three motor sessions, two visual sessions and three cognitive sessions were excluded in this manner. The subject’s T1-weighted structural scan was co-registered to the mean functional volume, and the mean volume used to determine the parameters applied to all volumes during spatial normalisation and resampling (Ashburner et al., 1997; Ashburner and Friston, 1999) to a standard template (Evans et al., 1993). As the volume of brain sampled in each study was affected by the position of the subject within the scanner’s field of view, we found that the extreme superior and inferior portions of the subject’s brain were sparsely sampled. To address this, voxels not sampled in every session were eliminated during normalisation. All functional volumes were then smoothed with a 6mm FWHM Gaussian kernel. Global changes in fMRI response from scan to scan were removed by proportionally scaling each scan to have a common global mean voxel value.
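The proportional scaling step can be illustrated with a minimal sketch. This is not the SPM99 implementation: the function name `proportional_scaling`, the target value of 100 and the toy voxel values are all assumptions made for illustration, and each scan is represented as a flat list of in-brain voxel values.

```python
from statistics import fmean

def proportional_scaling(scans, target=100.0):
    """Scale each scan so that its global mean voxel value equals `target`.

    `scans` is a list of scans, each a flat list of voxel values.
    The target of 100 is an illustrative choice, not taken from the text.
    """
    scaled = []
    for scan in scans:
        global_mean = fmean(scan)        # global mean of this scan
        factor = target / global_mean    # one multiplicative factor per scan
        scaled.append([v * factor for v in scan])
    return scaled

# Two toy "scans" of four voxels each, with different global means
scans = [[90.0, 110.0, 100.0, 100.0], [180.0, 220.0, 200.0, 200.0]]
scaled = proportional_scaling(scans)
```

After scaling, both toy scans share the same global mean, so a scan-to-scan change in overall signal level no longer appears as a difference between them.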
4.3.4 Data Analysis
Statistical analysis was carried out using the general linear framework described by Worsley & Friston (1995). As the analyses presented in this chapter are more elaborate than previous chapters, it is necessary to revisit the implementation of the general linear model (GLM) in SPM.
The sessions for each paradigm were modeled with a simple linear model for the data at each voxel:
$$Y_{ij} = \gamma_i + \alpha_i f(j) + \sum_{k=1}^{K} \beta_{ik}\, g_k(j) + \epsilon_{ij} \qquad (1)$$
Here $Y_{ij}$ denotes the value of the voxel at scan $j$ of session $i$, and $\gamma_i$ is the mean (block) effect for session $i$. $f(j)$ is a reference waveform, a function of the scan index $j$ within session that has the same form for all sessions. This is simply one possible expansion of eqn. 7 from chapter 2. As in previous analyses, a simple “convolved box-car” reference waveform [CBC] was used as $f(j)$, consisting of a box-car function of zeros and ones representing the experimental timecourse, convolved with the expected haemodynamic response function (HRF). The parameter $\alpha_i$ is the amplitude of the CBC response for session $i$. Differences in the session response amplitudes constitute session by condition interactions. The additional reference functions $g_k(j)$ are a set of discrete cosine basis functions, effecting a simple “high pass” filter, as described by Holmes et al. (1997), with cut-off (specified by $K$) set at twice the experimental period. Unlike previous analyses, the high-pass filter was explicitly included in the design matrix. Under the assumption that this model fits, any residual errors ($\epsilon_{ij}$) will have zero mean and exhibit only short term auto-correlation within session. In the following I shall refer to the CBC amplitudes $\alpha_i$ simply as the response for session $i$.
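As a rough illustration of how the columns of such a single-session design matrix could be assembled for the motor or cognitive paradigm, consider the sketch below. Several parts are assumptions rather than details from the text: the gamma-shaped `toy_hrf` stands in for the HRF actually used by SPM99, the rule for choosing K from the cut-off is one common convention, and all function names are hypothetical.

```python
import math

TR = 4.1                         # seconds per scan (from the text)
BLOCK_LEN = 6                    # scans per block (from the text)
N_BLOCKS = 13                    # alternating blocks, starting and ending with rest
N_SCANS = BLOCK_LEN * N_BLOCKS   # 78 scans per session

def boxcar():
    """0 during rest blocks, 1 during active blocks."""
    return [float((j // BLOCK_LEN) % 2) for j in range(N_SCANS)]

def toy_hrf(length_s=30.0):
    """Hypothetical gamma-shaped HRF sampled every TR (an assumption)."""
    n = int(length_s / TR) + 1
    h = [(j * TR) ** 5 * math.exp(-(j * TR)) for j in range(n)]  # peaks near 5 s
    total = sum(h)
    return [v / total for v in h]

def convolve(x, h):
    """Causal discrete convolution, truncated to len(x)."""
    return [sum(h[m] * x[j - m] for m in range(min(j + 1, len(h))))
            for j in range(len(x))]

def dct_basis(n_scans, cutoff_s):
    """Discrete cosine regressors with periods longer than the cut-off."""
    K = int(2 * n_scans * TR / cutoff_s)   # assumed convention for choosing K
    return [[math.sqrt(2.0 / n_scans) *
             math.cos(math.pi * k * (2 * j + 1) / (2 * n_scans))
             for j in range(n_scans)] for k in range(1, K + 1)]

cbc = convolve(boxcar(), toy_hrf())            # regressor of interest f(j)
constant = [1.0] * N_SCANS                     # session mean column
cutoff = 2 * (2 * BLOCK_LEN * TR)              # twice the rest+active period
drift = dct_basis(N_SCANS, cutoff)             # g_k(j), k = 1..K
columns = [cbc, constant] + drift
design = [list(row) for row in zip(*columns)]  # N_SCANS rows, 2 + K columns
```

Under these assumptions the cut-off is 98.4s and K works out to six cosine functions, giving eight columns in total for a 78-scan session.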
4.3.4.1 Individual Session Analyses
Each session was first analyzed as a single fMRI session (as though it were the only session acquired) using a ‘standard’ SPM analysis. The ‘Groundhog Day’ effects aside, this enables a comparison of how the results of a single session experiment can vary. It also illustrates why drawing conclusions about a subject from a single session can be dangerous.
The model used is that of Eqn. 1, evaluating a single session ($i$) at a time. The residual errors are assumed to be Normally distributed with variance $\sigma_{\epsilon,i}^2$, estimated individually for each session. The design matrix for each session is illustrated in figure 4.1A. A t-statistic assessing the null hypothesis of zero response ($\alpha_i = 0$) was constructed for each voxel, giving an SPM{t} for each session indicating the significance of the response for that session. For display, each session specific SPM{t} was transformed to an equivalent SPM{Z} by probability integral transform. This was effected by replacing each t-value with the standard Normal ordinate with the same upper tail probability. Temporal autocorrelation was dealt with using the method of Worsley and Friston (1995), by temporally smoothing the session time series with a Gaussian kernel of 6s FWHM.
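The probability integral transform described above can be written compactly. The symbols here are introduced for illustration only: $\nu$ is the (effective) residual degrees of freedom of the session's t-statistic, $F_\nu$ is the cumulative distribution function of Student's t with $\nu$ degrees of freedom, and $\Phi$ is the standard Normal cumulative distribution function.

```latex
% Matching upper tail probabilities: P(Z' > Z) = P(T_nu > t)
Z = \Phi^{-1}\bigl(1 - P(T_\nu > t)\bigr) = \Phi^{-1}\bigl(F_\nu(t)\bigr)
```

Because the transform is monotonic, the ordering of voxels within a session is preserved; only the scale changes, which makes maps with different degrees of freedom comparable for display.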
[Figure 4.1: Motor Session Design Matrix]
Figure 4.1. Design matrices used in analysis. A is a single session design matrix with the regressor of interest (the CBC, column 1), the session mean effect (column 2) and the set of discrete cosine basis functions used to effect high-pass filtering (columns 3-8). B is a multi-session design matrix, constructed from n single session design matrices, where n is the number of sessions analysed at the multi-session level. The design matrix in A was used for single session analyses, whereas B was used for both fixed and random effects multi-session analyses.
4.3.4.2 Multiple Session Analyses - Session by Condition Interactions
To assess whether there were significant session by condition interactions, the model of Eqn. 1 for all $I$ sessions (design matrix shown in figure 4.1B) was contrasted with a reduced model where the response was identical for all sessions ($\alpha_i = \alpha$, $i = 1,\dots,I$). Here, it is assumed that residual variance is identical across sessions, such that the residuals are normally distributed with zero mean and variance $\sigma_\epsilon^2$. The additional variance modeled by the full model (including session by condition interactions) was compared with the residual variance using an extra sum-of-squares F-test (Draper and Smith, 1981), modified to account for temporally auto-correlated residuals using the method of Worsley and Friston (1995). The resulting SPM{F} identifies voxels that display significant session by condition interactions.
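The logic of the extra sum-of-squares comparison can be sketched for a single voxel under strong simplifying assumptions that do not hold for the analysis reported here: the session means and low-frequency terms are taken to have been removed already, so that only the reference waveform remains, and the residuals are treated as independent (no correction for temporal autocorrelation). The function name `ess_f` and the toy data are hypothetical.

```python
def ess_f(sessions, f):
    """Extra sum-of-squares F: one amplitude per session vs one common amplitude.

    `sessions` is a list of time series (one per session) at a single voxel;
    `f` is the shared reference waveform. Assumes independent residuals.
    """
    n_sess, n_scans = len(sessions), len(f)
    ff = sum(v * v for v in f)

    def rss(y, amp):
        return sum((yj - amp * fj) ** 2 for yj, fj in zip(y, f))

    # Full model: a separate least-squares amplitude for each session
    amps = [sum(yj * fj for yj, fj in zip(y, f)) / ff for y in sessions]
    rss_full = sum(rss(y, a) for y, a in zip(sessions, amps))

    # Reduced model: a common amplitude (the mean of amps, since f is shared)
    common = sum(amps) / n_sess
    rss_reduced = sum(rss(y, common) for y in sessions)

    df_extra = n_sess - 1                  # extra parameters in the full model
    df_resid = n_sess * n_scans - n_sess   # residual df of the full model
    F = ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)
    return F, amps

# Toy data: two sessions with true amplitudes 1 and 3 plus a small fixed error
f = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
e = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
sessions = [[1.0 * fj + ej for fj, ej in zip(f, e)],
            [3.0 * fj + ej for fj, ej in zip(f, e)]]
F, amps = ess_f(sessions, f)
```

A large F indicates that allowing the amplitude to differ between sessions explains substantially more variance than a single common amplitude, which is what a session by condition interaction means in this model.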
4.3.4.3 Multiple Session Analyses - Fixed Effects Model
If there are substantial differences in response from session to session, a single session experiment is inadequate if one wishes to examine a subject’s response to experimental stimuli in general, and so a multiple session experiment is necessitated. Given a multiple session data set, modeled with Eqn. 1 (design matrix shown in figure 4.1B), a fixed effects analysis proceeds by assuming that the session specific responses $\alpha_i$ themselves are of interest - in other words, these discrete levels of the factor are those over which we wish to extend inference. The residual errors $\epsilon_{ij}$ are assumed normally distributed with zero mean, and constant variance $\sigma_\epsilon^2$. Evidence of a response across sessions can be tested by examining $\alpha_\bullet$, the average of the $I$ session specific responses:
$$\alpha_\bullet = \frac{1}{I} \sum_{i=1}^{I} \alpha_i$$
Again, short term temporal auto-correlation in the errors was handled using the method of Worsley and Friston (1995), temporally smoothing each session time series with a Gaussian kernel of 6s FWHM. However, since the session specific responses are considered fixed, only one component of variance is accounted for (the residual error variance $\sigma_\epsilon^2$), and inference from the resulting SPM{t} is limited to the average response for the observed sessions. As such, this analysis is sensitive to large effects in a small number of sessions.
4.3.4.4 Multiple Session Analyses - Random Effects Model
To extend inference beyond the sessions acquired, a different approach is needed. The sessions acquired are treated as a sample of all possible sessions, each with its own response $\alpha_i$. Thus, the $\alpha_i$ of Eqn. 1 is treated as a random effect. In other words, the response amplitudes $\alpha_i$ for the sessions under consideration are merely one sample from the (hypothetical) distribution of response amplitudes for a session chosen at random. A simple second level (between-session) model would be:
$$\alpha_i = \alpha + \varepsilon_i \qquad (2)$$
where the $\alpha_i$ are from Eqn. 1 (the within-session model), and the between-session errors $\varepsilon_i$ have zero mean, variance $\sigma_\alpha^2$, and can be considered independent. Thus, the random effects model has two components of variance, between-session, $\sigma_\alpha^2$, and within-session (residual), $\sigma_\epsilon^2$. Using this model, inference can be extended to $\alpha$, the underlying average response across all possible sessions.
In general, analysis of such random effects models can be difficult (Searle & Casella, 1992). However, the simple models considered here are balanced (the models for each session are exactly the same), and separable (the only common parameter across sessions is the intra-session (residual) variance $\sigma_\epsilon^2$, assumed constant for all sessions). This permits a simple “summary statistic” approach (Frison & Pocock, 1992). Such an approach was first described for neuroimaging data by Worsley et al. (1992), and its importance subsequently highlighted by Holmes et al. (1998), who describe the implementation (in SPM) used here. In essence, the model of Eqn. 1 is fitted to yield estimates $\hat{\alpha}_i$ of the response amplitude $\alpha_i$ at each voxel for each session. The variance of the estimated response amplitudes $\hat{\alpha}_i$ across sessions incorporates both within-session ($\sigma_\epsilon^2$) and between-session ($\sigma_\alpha^2$) variability in the appropriate proportions to assess the significance of the overall subject activation effect $\alpha$ (Frison & Pocock, 1992).
Thus, each session data set is summarised by a single contrast image whose voxel values are the fitted response amplitudes. These contrast images can then be assessed at the inter-session level for a significant average effect, with inference extending to the subject in general (under similar experimental conditions) rather than just the particular sessions acquired.
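One way to see why the across-session variance of the fitted amplitudes carries both variance components is the short derivation below. It is a sketch under stated assumptions: $e_i$ denotes the within-session estimation error of the fitted amplitude, taken to be independent of the between-session error $\varepsilon_i$, and $w$ is a hypothetical design-dependent constant that is the same for every session because the design is balanced.

```latex
\hat{\alpha}_i = \alpha_i + e_i = \alpha + \varepsilon_i + e_i,
\qquad \operatorname{Var}(e_i) = w\,\sigma_\epsilon^2
% hence, with \varepsilon_i and e_i independent:
\operatorname{Var}(\hat{\alpha}_i) = \sigma_\alpha^2 + w\,\sigma_\epsilon^2
% and the sample variance of the fitted amplitudes estimates this sum:
S^2 = \frac{1}{I-1}\sum_{i=1}^{I}\bigl(\hat{\alpha}_i - \bar{\hat{\alpha}}\bigr)^2,
\qquad \mathrm{E}\bigl[S^2\bigr] = \sigma_\alpha^2 + w\,\sigma_\epsilon^2
```

The two components therefore never need to be estimated separately: the spread of the contrast image values across sessions already reflects their sum.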
To conduct a parametric analysis, a specific form must be used for the between-session errors $\varepsilon_i$. In the absence of any evidence (yet) to suggest otherwise, consider a simple Normal model:
$$\alpha_i = \alpha + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\alpha^2) \qquad (3)$$
My approach here is pragmatic: nothing is known about the distribution of $\varepsilon_i$. The assumption of Normality allows random effects analyses to be introduced simply and logically as an extension of the parametric statistical tests used by SPM. The validity of this assumption will be explored in the Discussion.
With the models of Eqn. 1 and Eqn. 3, the random effects analysis can be effected as a simple one-sample t-test on the contrast images, yielding an SPM{t}.
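For a single voxel, the one-sample test on the fitted amplitudes amounts to the calculation sketched below. The function name `one_sample_t` and the five amplitude values are hypothetical; in practice the same calculation is carried out at every voxel of the contrast images.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(amplitudes):
    """One-sample t-statistic for H0: mean amplitude is zero.

    `amplitudes` holds the fitted response amplitude at one voxel,
    one value per session. Returns (t, degrees of freedom).
    """
    n = len(amplitudes)
    standard_error = stdev(amplitudes) / sqrt(n)   # uses the n - 1 denominator
    return mean(amplitudes) / standard_error, n - 1

# Hypothetical fitted amplitudes at one voxel from five sessions
alpha_hat = [1.0, 2.0, 3.0, 4.0, 5.0]
t, df = one_sample_t(alpha_hat)
```

The degrees of freedom depend on the number of sessions rather than the number of scans, which is why a random effects analysis of few sessions is less sensitive than the corresponding fixed effects analysis.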
Figure 4.2 A and B. Single session sagittal Maximum Intensity Projections (MIPs) for the motor paradigm. The number of each session is displayed below it. Although thirty-three sessions were collected, only thirty are shown here (sessions 17, 23 and 24 were rejected due to movement artifacts). All results are thresholded at p<0.05 corrected for multiple comparisons unless otherwise stated.
4.4 Results
4.4.1 Individual Session Results
Figures 4.2, 4.3 and 4.4 show sagittal maximum intensity projections (MIPs) per session for the motor, cognitive and visual tasks, respectively. Each SPM{Z} MIP shows voxels that survive a threshold of p<0.05, corrected for multiple comparisons. It is immediately obvious that the pattern of activated voxels varies widely between repeated single sessions in our subject. While a grossly homogeneous pattern is evident across single session MIPs of the same paradigm, the spatial distribution of voxels in each MIP is highly variable. Even though striking similarity is evident between certain data sets (e.g. visual sessions 10 and 12, figure 4.4), a large number of sessions from all three paradigms display no significantly activated voxels (e.g. visual sessions 4 and 30). The differences are best exemplified by comparing the SPM{Z} of motor session 1 (figure 4.2), which contains 1076 voxels above threshold, and motor session 33, which contains only 5. Results from the cognitive paradigm (figure 4.3) are broadly similar: while the spatial distribution of voxels between MIPs is more comparable than in the motor and visual paradigms, a large number of sessions contain no significantly activated voxels at the chosen threshold.
MIPs are binary statistical images, in which voxels are classified as ‘active’ or ‘inactive’ according to accepted but arbitrary statistical thresholds (for discussions of this issue, see Poline et al., 1996; Genovese et al., 1997; Noll et al., 1997; Cohen and DuBois, 1999; Tegeler et al., 1999). In any of the MIPs of figures 4.2, 4.3 and 4.4, a given voxel could have very different response amplitudes ($\alpha$’s) between sessions, yet still pass the threshold and appear to be consistently activated.
Figure 4.3 A & B. Single session sagittal MIPs for the cognitive paradigm. Similar to figure 4.2, although thirty-three sessions were collected, only thirty are displayed. Sessions marked with ‘*’ contain no significant voxels.
Figure 4.4 A & B. Single session sagittal MIPs for the visual paradigm. As with figures 4.2 and 4.3, not all sessions are shown: only thirty-one sessions are displayed. Sessions marked with ‘*’ contain no significant voxels.
4.4.2 Multiple Session Analyses
Figures 4.5, 4.6 and 4.7 show the results of the motor, cognitive and visual multiple session analyses, respectively. As noted above, merely examining thresholded statistical maps is perhaps not the best way to examine similarities between sessions. The use of the extra-sum-of-squares (ESS) F-test enables