2.2 Morphometry
2.2.7 Voxel and Tensor-Based Morphometry
Following GWR into the space of an average image, TBM uses the resulting deformation fields to measure local volume differences between groups. In VBM, segmentations are derived from the intensity information in each voxel and used to determine changes in local tissue volume or concentration between groups or over time. Segmented tissues are aligned into the GWR average space and compared at every voxel.
These techniques are non-invasive and hypothesis-free. They cover the entire brain, which reduces the need for laborious delineation of regions of interest. In an early VBM paper, Wright et al. (1995) registered GM and WM maps from 15 patients with schizophrenia. The authors found that GM density (in the temporal lobe) and WM density (in the corpus callosum) were correlated with patients’ scores in tests of schizophrenia syndromes. In a well-cited work, Maguire et al. (2000) showed specific hippocampal volume increases in London taxi drivers compared with controls, suggesting a structural correlate of spatial navigation skills. These studies highlighted the sensitivity of morphometry to group differences arising both from disease and from plastic, learning- and memory-related brain changes. They also presaged the techniques’ widespread use in the clinical literature.
In animal studies, these techniques ease the burden of exploratory histology, which is both destructive and time-consuming. Other quantities can also be tested statistically in the GWR average space. For example, Teipel et al. (2011) compared brain T2 relaxation times voxel-wise between an AD mouse model and controls. Lebenberg et al. (2011) used a GWR approach to register autoradiographic data of glucose uptake and metabolism from mouse brain tissue slices reconstructed into a 3D volume.
Tensor-Based Morphometry
In TBM, for each source image, the 3D transformation $\vec{u}(\vec{x}) = (u_1(\vec{x}), u_2(\vec{x}), u_3(\vec{x}))$ maps a voxel at position $\vec{x} = (x, y, z)$ in the final GWR average (reference) space to its corresponding position in the original image. For 3D deformation fields, the Jacobian matrix at each voxel is a 3 × 3 tensor of displacement gradients (Chung et al., 2001). Its entries are the first-order partial derivatives of each component of the transformation. Its determinant is:

$$J(\vec{x}) = \begin{vmatrix} \dfrac{\partial u_1}{\partial x} & \dfrac{\partial u_1}{\partial y} & \dfrac{\partial u_1}{\partial z} \\[4pt] \dfrac{\partial u_2}{\partial x} & \dfrac{\partial u_2}{\partial y} & \dfrac{\partial u_2}{\partial z} \\[4pt] \dfrac{\partial u_3}{\partial x} & \dfrac{\partial u_3}{\partial y} & \dfrac{\partial u_3}{\partial z} \end{vmatrix} \tag{E2.1}$$

Here, $u_1(\vec{x})$ is the first component of the deformation field, which transforms voxel position $\vec{x}$ in the reference image to the equivalent position in the floating image; $u_2$ and $u_3$ are defined likewise. The Jacobian matrix contains information about elongation and contraction in each direction (Ashburner & Friston, 2003). Its determinant $J$ is a scalar. It represents the relative expansion ($J > 1$) or contraction ($J < 1$) that a unit voxel of the reference space must undergo to encompass the equivalent region in the original image (Fig 2.10). If $J = 1$, there is no volume change. If $J < 0$, as previously stated, there is “folding” of the voxel grid (Ashburner & Friston, 2000).
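E2.1 can be evaluated numerically. The Python sketch below makes two assumptions. First, the transformation is supplied as a NumPy array of mapped positions $\vec{u}(\vec{x})$ sampled on the reference grid. This layout is hypothetical; registration packages such as NiftyReg store deformation fields in their own formats and provide their own tools. Second, the partial derivatives are estimated by central finite differences with `np.gradient`. The function name `jacobian_determinant` is illustrative only.

```python
import numpy as np

def jacobian_determinant(u, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant J of a 3D transformation (cf. E2.1).

    u: array of shape (3, nx, ny, nz); u[i] holds component u_(i+1)(x) of the
       transformation, in physical units, at every reference-space voxel.
    spacing: voxel size along x, y and z, in the same units as u.
    """
    jac = np.empty((3, 3) + u.shape[1:])
    for i in range(3):
        # np.gradient returns [du_i/dx, du_i/dy, du_i/dz] via central differences
        grads = np.gradient(u[i], *spacing)
        for j in range(3):
            jac[i, j] = grads[j]
    # Move the 3 x 3 matrix axes last so np.linalg.det evaluates every voxel at once
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))

# Usage: the identity transformation, and a uniform 10% stretch along each axis
r = np.arange(8.0)
coords = np.stack(np.meshgrid(r, r, r, indexing="ij"))  # shape (3, 8, 8, 8)
J_identity = jacobian_determinant(coords)        # 1 everywhere: no volume change
J_stretch = jacobian_determinant(1.1 * coords)   # 1.1**3 = 1.331 everywhere: expansion
```

Because $\vec{u}$ maps the average onto the subject, $J > 1$ here means that the subject is locally larger than the average.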
Figure 2.10 Local expansion and contraction in a deformation field.
A regular grid overlaid on an average mouse brain image (after several iterations of GWR) and warped according to a deformation field that maps the average to an individual image. The inset illustrates some regions expanding (blue; $J > 1$), while others contract (red; $J < 1$).
The NiftyReg implementation of GWR includes the transformations of both the final affine (global) and NRR (local) iterations in $J$. If the affine component were omitted, $J$ could simply be multiplied by the determinant of the affine matrix to give the overall scaling factor. In deformation-based morphometry (DBM), the lengths of the deformation vectors themselves are compared between groups. Sometimes the directional components of $\vec{u}$, which describe shape change, are compared instead. In TBM, $J$ values from each group are compared at every voxel, often using a t-test. t-tests assume that values are normally distributed. Because $J$ is always positive (for a transformation without folding), its distribution is skewed and is assumed to be lognormal. $J$ values are therefore usually log-transformed before statistical testing, to render them more normally distributed (Chung, 2012). The log-transform also makes expansions and contractions of equal magnitude symmetric about zero (for example, $\log 2 = -\log 0.5$), so that they are treated as equally likely (Leow et al., 2007).
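A short numerical illustration of why $J$ is log-transformed, and of the affine correction described above. The $J$ values and the diagonal affine matrix `A` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical Jacobian determinants from the non-rigid part of a registration:
# a doubling, a halving and no change in local volume
J_nonrigid = np.array([2.0, 0.5, 1.0])

# On the raw scale the mean suggests net expansion, although the doubling and
# the halving are changes of equal magnitude in opposite directions
raw_mean = J_nonrigid.mean()              # 3.5 / 3 = 7/6, greater than 1

# After log-transformation they are symmetric about zero and cancel out
log_J = np.log(J_nonrigid)                # [log 2, -log 2, 0]
log_mean = log_J.mean()                   # 0, up to rounding

# If the affine step was left out of J, multiply by det(A), where A is the
# 3 x 3 linear part of the affine transformation (here a pure scaling)
A = np.diag([1.2, 1.0, 0.9])
J_total = J_nonrigid * np.linalg.det(A)   # det(A) = 1.08
```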
Smoothing, with a Gaussian kernel of a given full width at half maximum (FWHM), does three things. It also supports this assumption of normality¹³, it compensates for imprecise registration, and it reduces the influence of noise. The FWHM is chosen to correspond to, and hence enhance, the expected spatial scale of structural differences between groups. Volume changes in structures smaller than the kernel are likely to be smoothed out¹⁴ (Ashburner & Friston, 2001). In the human brain, the FWHM is generally of the order of 8 to 12 mm (e.g. Good et al., 2001; Carducci et al., 2013).
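Smoothing software is usually parameterised by the Gaussian's standard deviation in voxels rather than by its FWHM in mm. The sketch below converts between them using the standard relation $\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma$. It then applies SciPy's `scipy.ndimage.gaussian_filter`. It assumes isotropic voxels, and the 1 mm resolution in the usage example is hypothetical. The helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# For a Gaussian, FWHM = 2 * sqrt(2 ln 2) * sigma (approximately 2.355 * sigma)
FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))

def fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm):
    """Convert a kernel FWHM in mm to a standard deviation in voxels (isotropic voxels assumed)."""
    return fwhm_mm / FWHM_PER_SIGMA / voxel_size_mm

def smooth(img, fwhm_mm, voxel_size_mm):
    """Gaussian smoothing of a 3D map, e.g. of log(J) or of a tissue segmentation."""
    return gaussian_filter(img, sigma=fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm))

# Usage: an 8 mm FWHM kernel at a hypothetical 1 mm isotropic resolution
sigma = fwhm_to_sigma_voxels(8.0, 1.0)   # about 3.4 voxels
impulse = np.zeros((41, 41, 41))
impulse[20, 20, 20] = 1.0
blurred = smooth(impulse, 8.0, 1.0)      # spreads the single voxel over its neighbours
```

For anisotropic voxels, `gaussian_filter` also accepts one `sigma` per axis.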
Performing these mass-univariate, voxel-wise t-tests produces a statistical parametric map (SPM). After multiple testing correction, the map indicates the significance or non-significance of the mean local volume difference between groups at each voxel. These maps are useful for exploratory or naïve morphometry, without prior assumptions about the location of volume differences. For example, regional growth or atrophy can be estimated between time-points (Brambati et al., 2007; Maheswaran et al., 2009a), or local volumes compared between transgenic and wild-type groups (Ellegood et al., 2010). Alternatively, the $J$ values themselves can be integrated over parcellated regions of interest in the final average image, giving structural volumes for each subject (Jacobian integration; Boyes et al., 2006).

13: Via the Central Limit Theorem: given enough independent observations from a distribution with a finite mean and variance, the distribution of their mean tends towards a normal distribution.
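Jacobian integration can be sketched in a few lines. Each average-space voxel corresponds to a region of volume $J \times$ (voxel volume) in the subject. Summing $J$ over a region of interest drawn once on the average image therefore gives that subject's structure volume. This simplified version assumes a binary ROI mask with no partial-voxel weighting. The function name and the example data are hypothetical.

```python
import numpy as np

def jacobian_integrated_volume(J, roi_mask, voxel_volume):
    """Volume of a structure in one subject's native space.

    J: Jacobian determinants in the average space (average -> subject mapping).
    roi_mask: boolean array marking the structure, delineated once on the average image.
    voxel_volume: volume of one average-space voxel (e.g. in mm^3).
    """
    # Each average-space voxel corresponds to J * voxel_volume in the subject
    return float(J[roi_mask].sum() * voxel_volume)

# Usage with hypothetical data: a 64-voxel ROI that is 10% larger in this subject
J = np.ones((10, 10, 10))
roi = np.zeros_like(J, dtype=bool)
roi[2:6, 2:6, 2:6] = True
J[roi] = 1.1
volume = jacobian_integrated_volume(J, roi, voxel_volume=0.5)  # 64 * 1.1 * 0.5 = 35.2
```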
Voxel-Based Morphometry
In VBM, images are registered and aligned to GWR space, as above. Segmentation happens at one of two stages. If images are segmented beforehand, the segmented tissues are warped and interpolated into the final average space using their images’ respective transformations. If they are segmented after alignment, it is assumed that interpolation, and the consequent PV, does not corrupt the subsequent segmentation (Ashburner & Friston, 2000).
The aligned segmentation values for each tissue class of interest are then compared at each voxel. This assumes that the values accurately represent the underlying tissue. It also assumes there are no systematic differences between image acquisitions, such as different noise levels between scanners, that might bias segmentations. VBM was developed before highly accurate but computationally intensive NRR algorithms became commonly available. Transformations were therefore not expected to achieve precise alignment between images (Ashburner & Friston, 2001). A smoothing step (as above) is applied to compensate for possible misalignments. Smoothing also introduces a PV continuum. By incorporating neighbouring voxels, it constitutes a spatial averaging of segmented values. This is equivalent to assessing GM “density” or “concentration” within an ROI, i.e. the proportion of GM relative to other tissue types, since after segmentation the proportions of all tissues sum to 1 at every voxel (Ashburner & Friston, 2000).
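A one-dimensional toy example shows how smoothing turns segmented values into a local tissue concentration. The hard tissue labels are hypothetical. The example also shows why the smoothed proportions still sum to 1: the Gaussian filter is linear and its weights sum to 1. This check assumes SciPy's default `'reflect'` boundary mode.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Hypothetical 1D profile crossing CSF, GM and WM; each voxel holds the
# proportion of each tissue, so the three arrays sum to 1 at every voxel
csf = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
gm = np.array([0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
wm = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Smoothing averages each voxel with its neighbours, so the hard labels become
# a local GM "concentration" that varies continuously between 0 and 1
csf_s, gm_s, wm_s = (gaussian_filter1d(t, sigma=1.0) for t in (csf, gm, wm))

# The filter is linear with weights summing to 1 (default 'reflect' boundary
# mode), so the smoothed tissue proportions still sum to 1 at every voxel
total = csf_s + gm_s + wm_s
```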
Smoothed segmentations represent the amount of tissue within a region, and registration alters the size of that region. For this reason, Ashburner & Friston (2000; 2001) and Good et al. (2001) suggested applying a “modulation” step. Modulation compensates for the change in the total volume of tissue within structures caused by the registration deformations (Fig 2.11). Each voxel's concentration value is multiplied by its Jacobian determinant $J$ from TBM. This preserves tissue volume, so the statistical test compares regional absolute tissue volumes. Modulation thus allows this comparison even after larger volume differences have been absorbed by the registration. Without modulation, the comparison is not between local volumes (of GM or otherwise). It is directly between tissue concentrations from the original images’ space, i.e. GM probability or proportion. Hence Ashburner & Friston (2001) placed modulated VBM on a “continuum” with TBM: both examine local volume.
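The volume-preserving effect of modulation can be checked with a hypothetical one-dimensional example. The warped GM values and the $J$ values are written down directly rather than computed by interpolating a real deformation. This keeps the bookkeeping visible: the three average-space GM voxels each cover two native voxels.

```python
import numpy as np

voxel_volume = 1.0  # hypothetical, arbitrary units

# Native-space GM proportions for one subject: a GM region 6 voxels long
native_gm = np.array([0, 1, 1, 1, 1, 1, 1, 0], dtype=float)

# Suppose registration compresses that region 2:1 into the average space, so
# each of the three average-space GM voxels covers two native voxels (J = 2)
warped_gm = np.array([0, 1, 1, 1, 0], dtype=float)  # GM proportion after warping
J = np.array([1, 2, 2, 2, 1], dtype=float)          # Jacobian determinant per voxel

modulated_gm = warped_gm * J  # modulation: multiply by J

native_volume = native_gm.sum() * voxel_volume        # 6
unmodulated_volume = warped_gm.sum() * voxel_volume   # 3: half the GM volume is lost
modulated_volume = modulated_gm.sum() * voxel_volume  # 6: GM volume is preserved
```

Modulated values are local tissue volumes rather than proportions, so they can exceed 1.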
Figure 2.11 Explanation of modulation in VBM.
If initial segmentations seg1 and seg2 of structural images i1 and i2 represent the relative volume of a particular tissue at each voxel, then modulation is a volume-preserving step. Here, greyscale represents intensity, i.e. the voxel values after segmentation. Following segmentation of i1 and i2, i1 is registered to i2. The resulting transformation T is used to warp seg1 into the space of i2. Here, region A shrinks and region B retains its volume. To preserve tissue volume at each voxel, transformed seg1 is modulated (M) by dividing voxel values by the relative local volume, (transformed i1)/i1 (equivalent to multiplying by $J$). Intuitively, in this illustration the density increases (modulated A is darker), thereby preserving volume. Region B does not change size, so its intensities remain constant. Voxel-wise tests will show differences in B, but none in region A. If modulation were not performed, they would show differences at all voxels.
Contemporary fine-scale NRR methods, with high degrees of freedom, likely account for the majority of group volume differences within the deformations (as predicted by Ashburner & Friston, 2000). Modulated VBM may therefore closely approximate TBM.
Unmodulated VBM may still be used to test for voxel-wise differences in the underlying tissue concentration. Each method is valuable and likely to reveal group effects in different regions due to distinct aspects of neuroanatomy (Keller et al., 2004; Mechelli et al., 2005).
Criticisms of VBM
VBM originally incorporated allowances for some uncertainty in the alignment achieved between brains. Registration would model “macroscopic” volume and shape, while segmented values could reveal smaller-scale, regional volume differences. Several criticisms of the formalised VBM methodology proposed by Ashburner & Friston (2000) arose. Bookstein (2001), adopting the “continuum” analogy, noted the difficulty of quantifying volume differences (from registration) on the same scale as intensities (from segmentation). However, this is accounted for by modulation (as above). The same paper indicated, and Ashburner & Friston (2001) agreed, that significant apparent group differences may arise at tissue boundaries where there is misregistration. This is particularly concerning in two situations. The first is when the registration is known to be imprecise, such as when low-dimensional warps are used to align images (as in the original VBM implementations). The second is when there is no true one-to-one correspondence between brains, as in the highly variable human cortex (Crum et al., 2003). Systematic misregistrations may also be caused by biological differences between groups (Teipel et al., 2013). Contemporary NRR algorithms, with many degrees of freedom, achieve better alignment, especially between the relatively simple structures of mouse brains, and this alignment can be checked visually. Using VBM to examine residual volume differences may therefore be redundant (Crum et al., 2003).
Keller et al. (2004) used ‘optimised’ VBM (Good et al., 2001), which improves alignment using customised GM templates. They noted that unmodulated VBM appeared less sensitive than modulated VBM for detecting hippocampal GM atrophy. Both techniques revealed bilateral patterns of hippocampal volume loss. However, the greater sensitivity of modulated VBM suggests that most of the group differences were volumetric rather than differences in tissue concentration.
The physical meaning of TBM results (volume differences) is unambiguous. By contrast, the effects underlying statistical maps from VBM require interpretation, because rejecting a null hypothesis does not reveal the underlying cause. In the early 2000s, the technique’s clinical usefulness was unclear. VBM tests either local segmented tissue concentration or, after modulation, absolute volume. The past decade has provided accumulating evidence that VBM measures are reproducible (Ewers et al., 2006). They also correlate well with pathology, such as atrophy in Alzheimer’s disease (Karas et al., 2003; Chételat et al., 2005; Whitwell et al., 2008; Teipel et al., 2013). Several hypotheses exist regarding the tissue changes underlying VBM findings. These include atrophy, altered neurogenesis, and cell-specific changes such as in cell size and density (Keifer et al., 2015). Such changes would affect tissue T1 and T2 relaxation times, and hence the MRI signal and its segmentation (Zatorre et al., 2012).
A further criticism concerns segmentation methods that produce variable or unpredictable estimates in the presence of PV, which may yield unreliable VBM results in such regions (Thacker, 2003). This may be an important concern in the mouse brain, where the degree of PV is high and may vary regionally between strains.
Statistical tests
In the final GWR average space, groups can be compared statistically at each brain voxel, and thus at anatomically equivalent locations across all images. The compared values are either tissue segmentations or log-transformed $J$ values. The aim is to identify the regions of most significant difference, and hence of most scientific interest, which are visualised using a statistical parametric map. A common method is to compare group means with t-tests, using the general linear model (GLM) implemented in SPM, FSL, or other analysis software. The null hypothesis H0 is that the means are equal. It is assumed that every voxel can be tested independently, and that after fitting the model the residuals are normally distributed (Ashburner & Friston, 2000). At each voxel, the response variable $Y$ is modelled as a weighted sum of independent variables $x_1, \dots, x_p$. Their weights $\beta_1, \dots, \beta_p$ represent each variable's contribution to $Y$:
$$Y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon \tag{E2.2}$$
Here, $\beta_0$ is an intercept term and $\varepsilon$ is the residual error. In general (matrix) form, E2.2 can be expressed as:

$$Y = X\beta + \varepsilon \tag{E2.3}$$

Here, the terms are as follows:
- $Y$ is a matrix whose rows correspond to images and whose columns correspond to aligned voxels.
- $X$ is a design matrix coding group memberships (such as WT or transgenic) and covariates describing the independent variables (such as TIV, age, or sex).
- $\beta$ is a vector of parameters to be estimated, one such vector for each voxel (i.e. each column of $Y$).
- $X\beta$ is their matrix product.
- $\varepsilon$ is a vector of errors, such as noise, assumed to be normally distributed.

$\beta$ is estimated ($\hat{\beta}$) so as to minimise the sum of squares of the residuals, $\sum_i e_i^2$. In SPM this is done using the pseudoinverse of $X$, written $X^{+}$ (Kiebel & Holmes, 2006):
$$\hat{\beta} = X^{+}Y \tag{E2.4}$$
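The model predictions (fitted values $\hat{Y}$) and the residuals $e$, whose sum of squares is the residual sum of squared errors, are then:

$$\hat{Y} = X\hat{\beta}, \qquad e = Y - \hat{Y} \tag{E2.5}$$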
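To isolate the contribution of certain parameters to $Y$, $\hat{\beta}$ may be multiplied by a vector of appropriate contrasts, $c$. For example, genotype may be of interest while a confounding covariate such as age is not. The following equations show the calculation of the t-statistic using the mean residual sum of squares, $\hat{\sigma}^2$, and the standard error, $\mathrm{SE}$. Here $\nu$ is the number of degrees of freedom:

$$\hat{\sigma}^2 = \frac{e^{\mathsf{T}}e}{\nu}, \qquad t = \frac{c^{\mathsf{T}}\hat{\beta}}{\mathrm{SE}}$$

$$\mathrm{SE} = \sqrt{\hat{\sigma}^2 \times \left(c^{\mathsf{T}}\left(X^{\mathsf{T}}X\right)^{-1}c\right)}$$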
The degrees of freedom (DOF, $\nu$) are the number of images minus the number of linearly independent regressors (columns) in $X$, i.e. the rank of $X$. The model is fitted separately at every voxel, so mass-univariate tests are performed over the entire brain.
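E2.3 to E2.5 and the t-statistic can be sketched with NumPy for all voxels at once. This is a minimal illustration under the stated Gaussian-error assumptions, not SPM's or FSL's implementation, which handle further details such as non-sphericity. The function name `glm_t`, the simulated data and the ages are hypothetical.

```python
import numpy as np

def glm_t(Y, X, c):
    """Voxel-wise GLM t-statistics for one contrast.

    Y: (n_images, n_voxels) data, e.g. log(J) or smoothed segmentations.
    X: (n_images, p) design matrix; c: (p,) contrast vector.
    """
    beta_hat = np.linalg.pinv(X) @ Y                # E2.4: one column per voxel
    resid = Y - X @ beta_hat                        # E2.5: e = Y - Y_hat
    dof = X.shape[0] - np.linalg.matrix_rank(X)     # images minus rank of X
    sigma2 = (resid ** 2).sum(axis=0) / dof         # mean residual sum of squares
    # pinv(X^T X) equals (X^T X)^{-1} whenever X has full column rank
    se = np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    return (c @ beta_hat) / se, dof

# Usage with simulated (hypothetical) data: 5 WT and 5 transgenic mice,
# age as a nuisance covariate, and 1000 voxels of log(J) values
rng = np.random.default_rng(0)
group = np.r_[np.zeros(5), np.ones(5)]           # 0 = WT, 1 = transgenic
age = rng.uniform(8.0, 12.0, size=10)            # hypothetical ages
X = np.column_stack([np.ones(10), group, age])   # intercept, genotype, age
Y = rng.normal(size=(10, 1000))
c = np.array([0.0, 1.0, 0.0])                    # genotype effect, adjusted for age
t, dof = glm_t(Y, X, c)                          # dof = 10 - 3 = 7
```

With only an intercept and a group column, this reduces to the classical pooled-variance two-sample t-test.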
Non-parametric permutation tests (as implemented in FSL’s Randomise) may be more robust to non-normally distributed data (Ashburner & Friston, 2000). However, they are computationally expensive. The test must be repeated at every voxel for every possible permutation of group labels, or for a large enough random subset of permutations to estimate the null distribution (Groppe et al., 2011). The procedure assumes there is no effect of, for example, genotype (H0). Genotype labels are then exchanged amongst images, and a test statistic comparing groups is calculated at every voxel. Repeating this over many label permutations builds up a null distribution. A voxel’s uncorrected p-value is the proportion of permutations giving a test statistic at least as extreme as the one observed. Comparing instead against the distribution of the maximum statistic over the whole image gives p-values corrected for the FWER (Nichols & Holmes, 2001).
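A minimal sketch of a two-group, voxel-wise permutation test. It makes three simplifying assumptions. First, the test statistic is the absolute difference in group means, rather than the t-statistic commonly used. Second, it draws a random subset of relabellings rather than enumerating all of them. Third, it records the image-wide maximum of each permutation to obtain FWER-corrected p-values. Function names and data are hypothetical; this is not Randomise's implementation.

```python
import numpy as np

def mean_difference(Y, labels):
    """Difference in group means at every voxel (a simple stand-in for a t-statistic)."""
    return Y[labels == 1].mean(axis=0) - Y[labels == 0].mean(axis=0)

def permutation_test(Y, labels, n_perm=1000, seed=0):
    """Two-sided voxel-wise permutation test using random relabellings.

    Y: (n_images, n_voxels); labels: (n_images,) array of 0/1 group labels.
    Returns uncorrected p-values and max-statistic (FWER-corrected) p-values.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(mean_difference(Y, labels))
    # The unpermuted labelling counts as one member of the null distribution
    exceed_voxel = np.ones(observed.shape)
    exceed_max = np.ones(observed.shape)
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)       # one relabelling, applied to every voxel
        null = np.abs(mean_difference(Y, shuffled))
        exceed_voxel += null >= observed         # this voxel's own null distribution
        exceed_max += null.max() >= observed     # null distribution of the image-wide maximum
    return exceed_voxel / (n_perm + 1), exceed_max / (n_perm + 1)

# Usage with simulated data: 8 mice per group, 50 voxels, a true effect at voxel 0
rng = np.random.default_rng(2)
labels = np.r_[np.zeros(8, dtype=int), np.ones(8, dtype=int)]
Y = rng.normal(size=(16, 50))
Y[labels == 1, 0] += 5.0
p_unc, p_fwer = permutation_test(Y, labels, n_perm=200)
```

Because the same relabelling is applied to all voxels in each iteration, the spatial correlation between voxels is preserved in the null distribution.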
The multiple testing problem and false discovery rate
A 0.5 ml mouse brain with (40 µm)³ voxels requires $7.8 \times 10^{6}$ tests of H0 when using conventional mass-univariate t-tests. Suppose H0 held at every voxel and each test used a Type I error rate $\alpha = 0.05$. There would then be an expected 390,625 false positives over the whole image. The familywise error rate (FWER), the probability of at least one false positive, would be effectively 1. This is the multiple testing problem (Nichols & Hayasaka, 2003). There are various methods of compensating.
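The arithmetic in the paragraph above can be checked directly, working in mm. The FWER line additionally assumes independent tests, which smoothed data do not satisfy. It is included only to show that the FWER is indistinguishable from 1 at this scale.

```python
# Worked arithmetic for the 0.5 ml mouse brain example, in mm units
brain_volume = 500.0                   # 0.5 ml = 500 mm^3
voxel_volume = 0.040 ** 3              # (40 µm)^3 = 6.4e-5 mm^3
m = brain_volume / voxel_volume        # about 7.8 million voxel-wise tests
alpha = 0.05
expected_false_positives = alpha * m   # about 390,625 if H0 held at every voxel

# FWER = P(at least one false positive); assuming independent tests this is
# 1 - (1 - alpha)^m, which is indistinguishable from 1 here
fwer = 1.0 - (1.0 - alpha) ** m
```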
Decreasing $\alpha$ at every voxel increases specificity ($1 - \alpha$), lowering the probability of any false positive. However, it simultaneously increases the false negative (Type II error) rate, $\beta$, and so lowers sensitivity, or power ($1 - \beta$). This is why the highly conservative Bonferroni correction ($\alpha' = \alpha/m$, where $m$ is the number of tests) is avoided in morphometry, which involves very large numbers of voxels.
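A brief sketch of the Bonferroni threshold for the voxel count above. It shows how small the per-voxel $\alpha'$ becomes, and that the FWER is then held just below the nominal 0.05. The `fwer_independent` helper assumes independent tests. Spatial smoothing makes neighbouring tests positively correlated, so in practice Bonferroni is more conservative still. Function names are illustrative.

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level that bounds the FWER at alpha."""
    return alpha / m

def fwer_independent(alpha_per_test, m):
    """FWER for m independent tests, each performed at alpha_per_test."""
    return 1.0 - (1.0 - alpha_per_test) ** m

m = 7_812_500                                 # voxel count from the mouse example
alpha_bonf = bonferroni_alpha(0.05, m)        # 6.4e-9 per voxel
fwer_bonf = fwer_independent(alpha_bonf, m)   # just under 0.05
```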
Given an SPM with some voxels declared significant, cluster-based approaches use the size and “mass” of regions of significance to distinguish them from less significant “noise”, such as single, isolated significant voxels (Smith & Nichols, 2009). Clusters can be compared with those expected from a Gaussian random field to estimate which might arise by chance. This is helpful because smoothing introduces spatial correlation between neighbouring voxels, reducing the number of independent observations (Brett et al., 2003). There are difficulties in applying this to structural data, however. Variance is greater at tissue boundaries, where smaller clusters might be expected than in large, homogeneous regions (Thompson et al., 2003).
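The clustering step itself can be sketched with `scipy.ndimage.label`, which finds connected components of suprathreshold voxels. The cluster-forming threshold and the minimum cluster size are hypothetical inputs. In practice the extent threshold would be derived from random field theory or permutation rather than fixed by hand. The sketch does not perform that inference.

```python
import numpy as np
from scipy import ndimage

def cluster_extent_filter(p_map, p_thresh, min_voxels):
    """Keep only suprathreshold clusters containing at least min_voxels voxels."""
    supra = p_map < p_thresh                  # cluster-forming threshold
    labels, _ = ndimage.label(supra)          # connected components (face connectivity by default)
    sizes = np.bincount(labels.ravel())       # sizes[k] = number of voxels with label k
    keep = sizes >= min_voxels
    keep[0] = False                           # label 0 is the background
    return keep[labels]

# Usage: a 3 x 3 cluster survives, an isolated significant voxel does not
p_map = np.ones((10, 10))
p_map[1:4, 1:4] = 0.001
p_map[7, 7] = 0.001
kept = cluster_extent_filter(p_map, p_thresh=0.01, min_voxels=5)
```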
The false discovery rate (FDR) is a popular and simple alternative means of controlling false positives in functional and structural experiments. Suppose $R$ null hypotheses are rejected in total, of which $V$ are rejected falsely. Controlling the FDR bounds the expected value of $V/R$ (Groppe et al., 2011). If the chosen FDR level is $q = 0.05$, then on average at most 5% of the H0 rejections will be false. The procedure assumes that the tests are independent and is as follows (Benjamini & Hochberg, 1995; Genovese et al., 2002):
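1. Sort the $m$ p-values in ascending order, such that $p_{(i)}$ is the $i$th smallest.
2. Let $k$ be the greatest $i$ for which $p_{(i)} \le \frac{i}{m}q$, where $q$ is the chosen FDR level. If no such $k$ exists, stop; no null hypotheses are rejected.
3. The null hypotheses $H_{(1)}, \dots, H_{(k)}$ are rejected; the remainder are not.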
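The three steps above translate directly into NumPy. The sketch below is a straightforward implementation of the Benjamini–Hochberg step-up procedure, with a hypothetical function name and example p-values. The usage shows the "step-up" behaviour. A p-value can be rejected even when it exceeds its own threshold, provided that a larger p-value passes its threshold.

```python
import numpy as np

def benjamini_hochberg(p, q=0.05):
    """Boolean mask of rejected null hypotheses (Benjamini-Hochberg step-up)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)                       # step 1: sort ascending
    thresholds = q * np.arange(1, m + 1) / m    # (i / m) * q for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # step 2: largest i passing (0-based index)
        reject[order[:k + 1]] = True            # step 3: reject H_(1) .. H_(k)
    return reject

# Usage: 0.03 exceeds its own threshold (2/4 * 0.05 = 0.025) but is still
# rejected, because 0.035 and 0.04 pass theirs (0.0375 and 0.05)
reject = benjamini_hochberg([0.001, 0.03, 0.035, 0.04], q=0.05)   # all True
```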
Between-subject variance is higher in some structures (such as the ventricles). Controlling the overall $\alpha$ at an arbitrary fixed level can therefore lower sensitivity. Controlling the proportion of H0 rejections that are false is more powerful.