Chapter 2 Background on Magnetic Resonance Imaging and Herit-
2.5 Inference on Heritability
2.5.2 Permutation Test and FWE Correction
The permutation test is a non-parametric technique that makes minimal assumptions about the data. Its main assumption is that the observed data are exchangeable under the null hypothesis, and the test is conceptually simple and theoretically intuitive (Nichols and Hayasaka, 2003; Nichols and Holmes, 2001). When the null hypothesis is true, the data exhibit a form of exchangeability: the observations can be permuted (relabelled), the model re-fitted, and the test statistic recomputed. Repeating this over many permutations constructs an empirical null distribution, from which critical thresholds and p-values can be computed. The approach has become widespread as inexpensive and powerful computers have made the required repeated model fitting practical.
Applying the variance component inference approach voxel by voxel yields a test statistic image. At each voxel, if the null hypothesis of no heritability, $H_0: h^2 = 0$, is true, the MZ and DZ twin pairs become exchangeable, allowing

$$P = \binom{(n_{MZ} + n_{DZ})/2}{n_{MZ}/2}$$

possible permutations in total, where $n_{MZ}$ and $n_{DZ}$ are the numbers of MZ and DZ twins, so that $(n_{MZ} + n_{DZ})/2$ is the number of twin pairs and $n_{MZ}/2$ the number of MZ pairs. Even a small-to-medium sample gives a very large $P$ (for example, 20 MZ and 20 DZ pairs already give $\binom{40}{20} \approx 1.4 \times 10^{11}$), so an approximate, or Monte Carlo, permutation test can be used instead, based on a smaller number $N$ of permutations drawn as a random subsample of all possible permutations (Nichols and Holmes, 2001).
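The relabeling scheme at a single voxel can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it counts the $P$ possible relabelings with `math.comb` and then draws a Monte Carlo subsample of $N$ relabelings, keeping the original labelling as the first member. The variance-component statistic itself is not reproduced; as a hypothetical stand-in, the sketch uses Falconer's estimate $2(r_{MZ} - r_{DZ})$ from double-entry twin correlations. All function names and the simulated data are illustrative assumptions.

```python
import math

import numpy as np


def n_relabelings(n_mz: int, n_dz: int) -> int:
    """Number P of distinct MZ/DZ relabelings of the twin pairs.

    n_mz and n_dz count individual twins, so there are (n_mz + n_dz) / 2 pairs,
    of which n_mz / 2 carry the MZ label.
    """
    return math.comb((n_mz + n_dz) // 2, n_mz // 2)


def pair_correlation(pairs: np.ndarray) -> float:
    """Double-entry (intraclass) correlation of an (n_pairs, 2) array."""
    x = np.concatenate([pairs[:, 0], pairs[:, 1]])
    y = np.concatenate([pairs[:, 1], pairs[:, 0]])
    return float(np.corrcoef(x, y)[0, 1])


def stand_in_statistic(pairs: np.ndarray, is_mz: np.ndarray) -> float:
    """Hypothetical stand-in for the variance-component statistic:
    Falconer's estimate 2 * (r_MZ - r_DZ)."""
    return 2.0 * (pair_correlation(pairs[is_mz]) - pair_correlation(pairs[~is_mz]))


def monte_carlo_null(pairs, is_mz, n_perm, seed=0):
    """Statistic for the original labels followed by n_perm - 1 random relabelings."""
    rng = np.random.default_rng(seed)
    stats = [stand_in_statistic(pairs, is_mz)]  # index 0: unpermuted data
    for _ in range(n_perm - 1):
        # shuffle which pairs are labelled MZ; the number of MZ pairs is unchanged
        stats.append(stand_in_statistic(pairs, rng.permutation(is_mz)))
    return np.array(stats)


# Illustrative use on simulated noise, for which H0 (no heritability) holds.
rng = np.random.default_rng(1)
pairs = rng.normal(size=(40, 2))   # 40 twin pairs, one value per twin
is_mz = np.arange(40) < 20         # first 20 pairs labelled MZ
print(n_relabelings(40, 40))       # 137846528820 possible relabelings
null = monte_carlo_null(pairs, is_mz, n_perm=1000)
p_uncorrected = float(np.mean(null >= null[0]))
```

Including the unpermuted statistic as the first of the $N$ values guarantees that the resulting p-value is never smaller than $1/N$.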
To resolve the multiple comparisons problem and control false positives over the whole volume of the ROIs simultaneously, the permutation test can be used to implement family-wise error (FWE) correction. The FWE is controlled in the strong sense over all comparisons, and FWE-corrected p-values are computed from the distribution of the maximum test statistic (Nichols and Holmes, 2001). With a permutation test, we obtain FWE-corrected p-values on peak height (the voxel-wise test statistic value) for voxel-wise inference, and on cluster size (the number of voxels in a cluster after thresholding) and cluster mass (the sum of the voxel-wise test statistic values over all voxels in a cluster after thresholding) for cluster-level inference. The permutation approach therefore divides into two parts: the voxel-wise single threshold test and the cluster-wise suprathreshold tests.
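For each relabeled data set, the three quantities above reduce to one maximum per statistic. The Python sketch below shows this reduction for a single statistic image, under stated assumptions: the image is 2-D and voxels are grouped by 4-connectivity (3-D analyses typically use 6-, 18- or 26-connectivity), and the function names are hypothetical. Cluster mass is taken, as in the text, as the plain sum of the voxel-wise statistic values within the cluster.

```python
from collections import deque

import numpy as np


def suprathreshold_clusters(stat: np.ndarray, threshold: float):
    """List of clusters, each a list of (row, col) indices of voxels with stat > threshold.

    Voxels are grouped by 4-connectivity (an assumption of this sketch).
    """
    above = stat > threshold
    seen = np.zeros_like(above, dtype=bool)
    n_rows, n_cols = stat.shape
    clusters = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not above[r, c] or seen[r, c]:
                continue
            # breadth-first search collects every voxel connected to (r, c)
            queue, members = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                i, j = queue.popleft()
                members.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n_rows and 0 <= b < n_cols and above[a, b] and not seen[a, b]:
                        seen[a, b] = True
                        queue.append((a, b))
            clusters.append(members)
    return clusters


def max_statistics(stat: np.ndarray, threshold: float):
    """Peak height, largest cluster size and largest cluster mass of one image."""
    clusters = suprathreshold_clusters(stat, threshold)
    sizes = [len(m) for m in clusters]
    masses = [sum(stat[i, j] for i, j in m) for m in clusters]
    return float(stat.max()), max(sizes, default=0), float(max(masses, default=0.0))


img = np.array([[0.0, 3.0, 3.5, 0.0],
                [0.0, 0.0, 2.5, 0.0],
                [4.0, 0.0, 0.0, 0.0]])
print(max_statistics(img, threshold=2.0))  # (4.0, 3, 9.0)
```

In the toy image the highest voxel (4.0) forms a cluster of its own, while the largest and heaviest cluster is the three connected voxels 3.0, 3.5 and 2.5, which illustrates why peak height and cluster statistics can point to different regions.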
Voxel-wise Single Threshold Test
In the single threshold test, the omnibus null hypothesis of no heritability at any voxel in the ROIs is rejected if any voxel-wise test statistic value exceeds a given critical threshold or, equivalently, if the maximum voxel-wise test statistic value exceeds this threshold. The critical threshold is determined by the significance level $\alpha$. By permuting the MZ and DZ labels of the twin pairs and computing the test statistic image for each permutation, the empirical distribution of the maximum test statistic is constructed from the maxima obtained over all permutations; the critical threshold is the $(c+1)$-th largest value of this distribution, where $c = \lfloor \alpha N \rfloor$ (Nichols and Holmes, 2001).
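The threshold rule can be written in a few lines of Python. This is a sketch with a hypothetical function name and illustrative numbers, assuming the unpermuted maximum is stored first among the $N$ maxima. A small tolerance is added before taking the floor so that floating-point round-off in $\alpha N$ (for instance $0.29 \times 100$) does not lower $c$ by one.

```python
import math

import numpy as np


def critical_threshold(max_stats, alpha=0.05):
    """alpha-level critical threshold from N permutation maxima.

    max_stats[0] is assumed to be the unpermuted maximum; the threshold is the
    (c + 1)-th largest of all N values, with c = floor(alpha * N).
    """
    max_stats = np.asarray(max_stats, dtype=float)
    n = max_stats.size
    c = math.floor(alpha * n + 1e-9)    # tolerance guards against round-off
    ordered = np.sort(max_stats)[::-1]  # descending order
    return float(ordered[c])            # index c is the (c + 1)-th largest


# Illustrative maxima from N = 20 relabelings, the unpermuted one first.
max_stats = [9.1, 3.2, 4.8, 2.7, 5.5, 3.9, 4.1, 2.2, 6.0, 3.3,
             2.9, 4.4, 3.0, 5.1, 2.5, 3.6, 4.0, 2.8, 3.1, 4.6]
t_alpha = critical_threshold(max_stats, alpha=0.05)  # c = 1, so 2nd largest: 6.0
reject = max_stats[0] > t_alpha                      # 9.1 > 6.0, so True
```

Since at most $c$ of the $N$ maxima can exceed the $(c+1)$-th largest value, rejecting when the unpermuted maximum lies above $T_\alpha$ keeps the FWE at or below $\alpha$.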
The single threshold test proceeds as follows. For permutation $p$ ($p = 1, \ldots, N$), denote the maximum test statistic over the ROIs by $T^{\max}_p$. The original (unpermuted) data and the other $N-1$ relabelings together provide a total of $N$ maximum test statistic values, which form the empirical distribution of the maximum test statistic; the $\alpha$-level critical threshold, denoted $T_\alpha$, is its $(c+1)$-th largest value, with $c = \lfloor \alpha N \rfloor$. If the unpermuted maximum test statistic value is greater than $T_\alpha$, the omnibus hypothesis is rejected. The FWE-corrected p-value at each voxel, written $p_{\mathrm{FWE}}$, is the proportion of these $N$ maximum test statistic values that are greater than or equal to the original voxel-wise test statistic value $T_0$:
$$p_{\mathrm{FWE}} = \frac{\#\{T^{\max}_p \ge T_0\}}{N}.$$
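Evaluated at every voxel, this formula needs one comparison of $T_0$ against the sorted maxima. The vectorised Python sketch below (hypothetical function name, toy numbers) uses binary search via NumPy's `searchsorted` to count the maxima strictly below each $T_0$, so the count of maxima at or above $T_0$ is $N$ minus that number.

```python
import numpy as np


def voxelwise_fwe_p(t0_image, max_stats):
    """FWE-corrected p-value at each voxel: #{T^max_p >= T0} / N.

    t0_image:  unpermuted statistic image (any shape).
    max_stats: the N permutation maxima, including the unpermuted one.
    """
    max_sorted = np.sort(np.asarray(max_stats, dtype=float))
    n = max_sorted.size
    # side="left" returns how many maxima are strictly smaller than each T0
    n_below = np.searchsorted(max_sorted, np.asarray(t0_image, dtype=float), side="left")
    return (n - n_below) / n


max_stats = np.array([5.0, 3.0, 4.0, 2.0])  # N = 4, unpermuted maximum first
t0 = np.array([[5.0, 1.0],
               [3.5, 4.0]])
p = voxelwise_fwe_p(t0, max_stats)          # [[0.25, 1.0], [0.5, 0.5]]
```

The voxel carrying the unpermuted maximum (5.0) receives the smallest attainable p-value, $1/N$, and ties such as $T_0 = 4.0$ are counted as "not smaller", matching the $\ge$ in the formula.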
Cluster-wise Suprathreshold Tests
Suprathreshold cluster tests assess significance using spatially informed cluster statistics, such as cluster size and cluster mass. A cluster-forming threshold is selected in advance; it can be expressed as a p-value through the sampling distribution of the test statistic. Applying this threshold to the test statistic image yields suprathreshold clusters, that is, brain regions of connected voxels whose test statistic values exceed the cluster-forming threshold. The suprathreshold cluster size and suprathreshold cluster mass are, respectively, the number of voxels in a suprathreshold cluster and the sum of the voxel-wise test statistic values within it. As in the single threshold test, suprathreshold cluster tests construct the empirical distribution of the maximum suprathreshold cluster statistic by permutation, and the pre-determined significance level $\alpha$ sets the critical threshold at the $(c+1)$-th largest member of that distribution, with $c = \lfloor \alpha N \rfloor$.
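Converting a cluster-forming p-value into a statistic threshold depends on the assumed sampling distribution. The Python sketch below assumes, which the text does not specify, that the voxel-wise statistic is a one-parameter likelihood-ratio statistic. Because $h^2 = 0$ lies on the boundary of the parameter space, its null distribution is then commonly taken to be a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$, for which $P(T > t) = 1 - \Phi(\sqrt{t})$ when $t > 0$; a plain $\chi^2_1$ option is included for comparison. The function names are hypothetical; only the standard-library `statistics.NormalDist` is used.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def cluster_forming_threshold(p: float, boundary_mixture: bool = True) -> float:
    """Statistic value whose assumed null upper-tail probability equals p.

    boundary_mixture=True:  50:50 mixture of chi2_0 and chi2_1,
                            P(T > t) = 1 - Phi(sqrt(t)) for t > 0.
    boundary_mixture=False: plain chi2_1, P(T > t) = 2 * (1 - Phi(sqrt(t))).
    """
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie in (0, 0.5)")
    tail = p if boundary_mixture else p / 2.0
    z = _STD_NORMAL.inv_cdf(1.0 - tail)
    return z * z


def null_tail(t: float, boundary_mixture: bool = True) -> float:
    """Upper-tail probability P(T > t), t > 0, under the same assumed null."""
    one_sided = 1.0 - _STD_NORMAL.cdf(t ** 0.5)
    return one_sided if boundary_mixture else 2.0 * one_sided


t_cluster = cluster_forming_threshold(0.001)  # roughly 9.55 under the mixture
```

Voxels whose statistic exceeds `t_cluster` would then be grouped into suprathreshold clusters. The same p-value gives a lower statistic threshold under the mixture than under a plain $\chi^2_1$, since half of the null mass sits at zero.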
The cluster-wise suprathreshold tests proceed as follows. For permutation $p$ ($p = 1, \ldots, N$), denote the maximum suprathreshold cluster size (or cluster mass) by $K^{\max}_p$ (or $M^{\max}_p$). The original data and the other $N-1$ relabelings (permuted data vectors) are analyzed, and the resulting $N$ maximum suprathreshold cluster sizes (or masses) are sorted to form the empirical distribution of this cluster statistic. The critical threshold at level $\alpha$, denoted $K_\alpha$ (or $M_\alpha$), is the $(c+1)$-th largest member of this distribution, with $c = \lfloor \alpha N \rfloor$. The test is significant if the original maximum suprathreshold cluster size (or mass) is greater than $K_\alpha$ (or $M_\alpha$). The FWE-corrected p-value, denoted $p_{\mathrm{FWE}}$, of each individual suprathreshold cluster in the original test statistic image is the proportion of the $N$ maximum suprathreshold cluster sizes (or masses) in the empirical distribution that are greater than or equal to the observed size $K_0$ (or mass $M_0$) of that cluster (Nichols and Holmes, 2001), i.e.,
$$p_{\mathrm{FWE}} = \frac{\#\{K^{\max}_p \ge K_0\}}{N}, \qquad p_{\mathrm{FWE}} = \frac{\#\{M^{\max}_p \ge M_0\}}{N},$$

for the cluster size and cluster mass statistics, respectively.
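The cluster-wise procedure mirrors the voxel-wise one, with the maximum cluster statistic in place of the maximum voxel value. The Python sketch below handles either cluster size or cluster mass, under the assumptions that the unpermuted maximum is listed first and that the observed cluster statistics come from the unpermuted image. The function name and the numbers are illustrative only.

```python
import math

import numpy as np


def cluster_fwe(observed, max_null, alpha=0.05):
    """Cluster-wise FWE inference for one cluster statistic (size or mass).

    observed: statistic of each suprathreshold cluster in the unpermuted image
              (K_0 or M_0 for each cluster).
    max_null: the N permutation maxima (K^max_p or M^max_p), unpermuted first.
    Returns the critical threshold, whether the omnibus test rejects, and the
    FWE-corrected p-value of every observed cluster.
    """
    max_null = np.asarray(max_null, dtype=float)
    n = max_null.size
    c = math.floor(alpha * n + 1e-9)          # tolerance guards against round-off
    crit = float(np.sort(max_null)[::-1][c])  # (c + 1)-th largest
    reject = bool(max_null[0] > crit)
    obs = np.asarray(observed, dtype=float)
    # for each observed cluster, count the maxima that are >= its statistic
    p = (max_null[None, :] >= obs[:, None]).sum(axis=1) / n
    return crit, reject, p


# Illustrative maximum cluster sizes from N = 10 relabelings (original first).
k_max = [42, 7, 12, 5, 9, 15, 6, 8, 11, 10]
k_alpha, reject, p = cluster_fwe(observed=[42, 9, 3], max_null=k_max, alpha=0.1)
# k_alpha = 15.0 (c = 1, second largest), reject = True, p = [0.1, 0.6, 1.0]
```

Passing masses instead of sizes yields $M_\alpha$ and the mass-based p-values; only the statistic recorded per permutation changes.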