Chapter 2. The impact of registration accuracy on imaging validation study
2.5 Discussion
2.5.4 Case study
The application of these power calculations can be illustrated through a sample size calculation for a case study modeled closely on a prostate cancer imaging validation study currently taking place at our institution. In this study, patients scheduled for radical prostatectomy (surgical removal of the prostate) undergo in vivo imaging before surgery using multi-parametric magnetic resonance (MR) imaging (structural, diffusion weighted and dynamic contrast-enhanced (DCE) sequences). These images can be processed to yield derived images (T2 maps from structural imaging, the apparent diffusion coefficient (ADC) from diffusion weighted MRI, and both the contrast transfer coefficient and the contrast leakage from DCE MRI). Following surgery, the prostate specimens are processed for whole-mount histology [17], digitized, annotated by a pathologist to identify cancer of different grades [18], and reconstructed and registered to the in vivo images. The contemporary per-patient cost for this study underway in our center is approximately $10,000 USD, so correctly estimating the required number of subjects is important. Although these data can be applied to answer many questions, in this illustration we calculate the number of subjects needed to test the hypothesis that these modalities show significant signal differences between cancer and benign regions in the prostatic peripheral zone, which is known to harbor the majority of prostate cancer [19].
To predict the required number of subjects for such a study, we can use Equation 2.7. To use this equation, we need to specify three study design parameters: (1) α, the acceptable false positive error rate (chosen as 0.05 in our case study); (2) β, the acceptable false negative error rate (chosen as 0.2 in our case study); and (3) the minimum magnitude of signal difference we need to distinguish with these error rates. For our case study, we estimated the differences in mean intensities from the literature: Langer et al. [20] reported medians and ranges (from which means and variances can be estimated [21]) for the intensity in tumor and benign tissue in the prostatic peripheral zone for T2, ADC, and the two DCE-derived parameter images. We also need to estimate 6 model parameters:
I. The average number of samples per tumor, which can be calculated as the average tumor volume divided by the volume of an image voxel. For this case study, we assume an even distribution of tumors with 0.5 cc, 1 cc and 2 cc volumes.
II. The variance of tumor voxel intensities. For this case study, we used the variance estimated from statistics reported by Langer et al. [20] for tumor peripheral zone tissue for T2, ADC, and the two DCE-derived parameters.
III. The variance of background voxel intensities. For this case study, we used the variance estimated from statistics reported by Langer et al. [20] for benign peripheral zone tissue for T2, ADC, and the two DCE-derived parameters.
IV. The intraclass correlation coefficient, relating the relative contributions of intra- and interregion variances to the total variance of voxel intensities. This parameter is typically assumed to be equal for both classes, and so can represent either the tumor class or the background class. It can be estimated from the literature, or calculated from pilot data as the ratio of the interregion variance to the total variance, where the interregion variance is the variance of the mean intensities of tumor regions and the total variance is the sum of the interregion variance and the variance of tumor voxels after subtracting the mean intensity for each tumor. For this case study, without pilot data or information from the literature specifying this coefficient, we used a conservative estimate of 1.
V. The mean fractional overlap of our tumor sampling regions with the underlying tumor for our registration algorithm. For this case study, we estimated the mean fractional overlap using the first approach described in Section 2.5.3, i.e., estimating it from a TRE estimate under the assumptions of spherical tumors and 3D Gaussian translational error, and using the same tumor volumes as used for estimating the average number of samples per tumor. For this case study, we explored multiple levels of TRE.
VI. The variance of fractional overlap of our registration algorithm. For the case study, this was also estimated using the first approach described in Section 2.5.3.
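The geometric parameters in the list above (I, V and VI) can be sketched numerically. The following is a minimal Monte Carlo illustration, not necessarily the procedure of Section 2.5.3: it assumes spherical tumors sampled by an equal-sized sphere displaced by an isotropic 3D Gaussian translation, and it assumes that the TRE is the root-mean-square magnitude of that translation (so the per-axis standard deviation is TRE/√3). The function names and the voxel volume are hypothetical.

```python
import math
import random


def sphere_radius_mm(volume_cc):
    """Radius in mm of a sphere with the given volume in cc (1 cc = 1000 mm^3)."""
    return (3.0 * volume_cc * 1000.0 / (4.0 * math.pi)) ** (1.0 / 3.0)


def overlap_fraction(radius, distance):
    """Fraction of a sphere's volume shared with an equal sphere offset by `distance`."""
    if distance >= 2.0 * radius:
        return 0.0
    # Volume of the lens formed by two intersecting spheres of equal radius.
    lens = math.pi * (4.0 * radius + distance) * (2.0 * radius - distance) ** 2 / 12.0
    return lens / (4.0 / 3.0 * math.pi * radius ** 3)


def overlap_stats(tre_mm, volumes_cc=(0.5, 1.0, 2.0), draws=20000, seed=0):
    """Monte Carlo mean and sample variance of fractional overlap for a given TRE."""
    rng = random.Random(seed)
    sigma = tre_mm / math.sqrt(3.0)  # assumption: TRE is the RMS 3D displacement
    samples = []
    for _ in range(draws):
        radius = sphere_radius_mm(rng.choice(volumes_cc))  # even mix of tumor sizes
        shift = math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(3)))
        samples.append(overlap_fraction(radius, shift))
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return mean, variance


def mean_samples_per_tumor(volumes_cc, voxel_volume_mm3):
    """Average tumor volume divided by the voxel volume (parameter I)."""
    return sum(v * 1000.0 / voxel_volume_mm3 for v in volumes_cc) / len(volumes_cc)


if __name__ == "__main__":
    for tre in (1.0, 2.5):
        print(tre, overlap_stats(tre))
    # Hypothetical 1 mm^3 voxel; the actual voxel size depends on the MR sequence.
    print(mean_samples_per_tumor((0.5, 1.0, 2.0), voxel_volume_mm3=1.0))
```

Under these assumptions, the mean overlap falls and its spread changes as the TRE grows, which is the mechanism by which registration error enters the sample size estimate.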
From these parameters, we can calculate the 4 derived parameters used in Equation 2.7. The number of subjects can then be calculated by substituting these values into Equation 2.7 and solving numerically for the fixed point of the number of subjects, which appears on both sides of the equation [22]. For the case study, this was performed by expressing the right hand side of Equation 2.7 as a function of the number of subjects and finding the zero-crossing of the difference between this function and the number of subjects, using an iterative solver in Matlab R2011b (The Mathworks Inc., Natick, MA).
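The fixed-point step can be illustrated with a short sketch. Equation 2.7 is not reproduced here; as a stand-in, the right hand side below is a classical two-group sample size formula whose quantiles depend on the number of subjects through the degrees of freedom, which is what makes a fixed-point solution necessary. The Student-t quantile is approximated by the normal quantile plus the first Cornish-Fisher correction term, and bisection replaces the Matlab solver. All function names and default values are hypothetical.

```python
from statistics import NormalDist


def t_quantile_approx(p, dof):
    """Approximate Student-t quantile: normal quantile plus a first-order correction."""
    z = NormalDist().inv_cdf(p)
    return z + (z ** 3 + z) / (4.0 * dof)


def rhs(n, alpha=0.05, beta=0.2, sigma=1.0, delta=1.0):
    """Stand-in right hand side f(N); NOT Equation 2.7 itself."""
    dof = 2.0 * n - 2.0
    t_sum = t_quantile_approx(1.0 - alpha / 2.0, dof) + t_quantile_approx(1.0 - beta, dof)
    return 2.0 * (t_sum * sigma / delta) ** 2


def solve_fixed_point(f, lo=2.0, hi=1e6, tol=1e-9):
    """Find N with f(N) = N by bisection on g(N) = f(N) - N."""
    def g(n):
        return f(n) - n

    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("zero crossing is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid  # f(mid) still exceeds mid, so the crossing lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n_star = solve_fixed_point(rhs)
    print(n_star)  # round up to obtain a whole number of subjects
```

Because the stand-in right hand side decreases as the number of subjects grows while the left hand side increases, the zero crossing is unique and bisection is sufficient; a different right hand side may need a different bracketing interval.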
Because it may be necessary to estimate the correlation coefficient in the absence of pilot data or reported values, an intuitive understanding of its effect is important. This effect can be understood by comparing the variance of the mean of cluster-randomized samples in the absence of correlation with the same variance in the presence of correlation: the correlation has an effect equivalent to changing the number of independent samples that can be collected from each region. Since the power depends on the total number of independent samples, the number of subjects needed is inversely proportional to the number of independent samples per region. Assuming a correlation coefficient of 1 (i.e. effectively 1 sample per region) would yield a conservative estimate of the necessary number of subjects. As initial data are collected during the study and better estimates of the correlation coefficient can be made, the required number of subjects may be reassessed. For example, a correlation coefficient of 0.5 (i.e. effectively ~2 samples per region) would halve the required number of subjects.
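The comparison described above can be written out using the standard design-effect result for cluster sampling. The notation here is assumed for illustration and may differ from the symbols used elsewhere in this chapter: k regions, n voxels per region, total voxel variance σ², and within-region correlation coefficient ρ.

```latex
\begin{align*}
\operatorname{Var}(\bar{x})\big|_{\rho = 0} &= \frac{\sigma^2}{k n}, \\
\operatorname{Var}(\bar{x})\big|_{\rho} &= \frac{\sigma^2}{k n}\,\bigl[1 + (n - 1)\rho\bigr]
  = \frac{\sigma^2}{k\, n_{\mathrm{eff}}},
  \qquad n_{\mathrm{eff}} = \frac{n}{1 + (n - 1)\rho}.
\end{align*}
```

With ρ = 1, the effective number of samples per region is 1 for any n. With ρ = 0.5, it is 2n/(n + 1), which approaches 2 when regions contain many voxels, so the required number of subjects is roughly halved relative to the ρ = 1 case.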
If we had an ideal zero-error registration, we could apply the classical power calculation in Equation 2.2 by solving for the number of subjects, substituting in this study's parameter estimates, and numerically solving for the fixed point. This would yield an estimated number of subjects for each modality: 28 for T2, 13 for ADC, and 188 and 772 for the two DCE-derived parameters. In practice, since histology-in vivo MR image registrations have error, Equation 2.7 predicts the functional relationship between number of subjects and registration error using this study's parameter estimates (shown in Figure 2.4). In this relationship, the x-axis is target registration error, from which we calculated the mean and variance of fractional overlap [11]. For a target registration error of 1 mm, we would predict a required number of subjects of 36 for T2, 15 for ADC, and 208 and 889 for the two DCE-derived parameters. For a target registration error of 2.5 mm, we would predict a required number of subjects of 48 for T2, 19 for ADC, and 243 and 1090 for the two DCE-derived parameters; in this case, the classical power calculation underestimates the number of subjects by as much as 40%.
Figure 2.4: Relationship between estimated required number of subjects and target registration error (TRE) for 4 imaging modalities (note the differing y-axis scales illustrating the variable sensitivity of the required number of subjects to TRE for the different modalities). An estimate of the required number of subjects using the classical power formula (marked with circles) will underestimate the required number of subjects when there is registration error.
This relationship also illustrates the high potential impact of reducing registration error in the context of imaging validation studies. The contemporary per-patient cost for a study underway in our center is more than $10,000 USD, so reducing the required number of subjects can have a substantial impact on the overall study cost.
In practice, several methods have been proposed for reconstructing prostate histology and registering it to in vivo MR imaging. The required number of subjects (and thus the study cost) varies as a function of the registration error. Figure 2.4 shows this relationship, demonstrating the value of improving registration accuracy.
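The cost implication can be made concrete with the subject counts quoted in this section. The sketch below is illustrative arithmetic only: it takes the per-patient cost as a round $10,000 USD, treats the classical estimate as the zero-error (TRE = 0) case, and labels the two DCE-derived parameters generically as "DCE-1" and "DCE-2". The function names are hypothetical.

```python
# Required subjects quoted in the text, keyed by TRE in mm (0.0 = classical estimate).
REQUIRED = {
    "T2": {0.0: 28, 1.0: 36, 2.5: 48},
    "ADC": {0.0: 13, 1.0: 15, 2.5: 19},
    "DCE-1": {0.0: 188, 1.0: 208, 2.5: 243},  # generic label for first DCE parameter
    "DCE-2": {0.0: 772, 1.0: 889, 2.5: 1090},  # generic label for second DCE parameter
}
COST_PER_PATIENT_USD = 10_000  # assumed round figure


def underestimate_fraction(modality, tre_mm):
    """Shortfall of the classical estimate, relative to the requirement at this TRE."""
    needed = REQUIRED[modality][tre_mm]
    return (needed - REQUIRED[modality][0.0]) / needed


def cost_saving_usd(modality, tre_from_mm, tre_to_mm):
    """Study cost saved by improving registration from one TRE level to another."""
    fewer = REQUIRED[modality][tre_from_mm] - REQUIRED[modality][tre_to_mm]
    return fewer * COST_PER_PATIENT_USD


if __name__ == "__main__":
    for name in REQUIRED:
        print(name, round(underestimate_fraction(name, 2.5), 3),
              cost_saving_usd(name, 2.5, 1.0))
```

For T2, for example, improving the TRE from 2.5 mm to 1 mm reduces the requirement from 48 to 36 subjects, a saving of 12 subjects or about $120,000 USD at the assumed per-patient cost. The classical estimate of 28 falls short of the 48 required at 2.5 mm by 20 subjects, roughly 42% of the requirement, which is one reading of the 40% figure quoted above.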