In this section, I describe the materials used during the application of this method to detect non-credible regions in m-rep-based bladder and prostate segmentations. The image data, m-reps, and RIQF-based image match used during this experiment are specified.
The data set used in the experiments reported here consists of 80 CT images of the pelvic region of 5 patients receiving ART to treat prostate cancer. Each patient’s treatment was fractionated over a series of dates, and a new image was acquired prior to many treatment sessions. The bladder and prostate in each image were segmented by Bayesian optimization over m-rep deformable models [64]. These segmentations required us to train a geometric prior distribution and an image match function. Let $\underline{I}^{p}_{t}$ denote the $t$th image of patient $p$. Let $\underline{\underline{m}}^{p,k}_{t}$ denote the m-rep segmentation of the bladder ($k = 0$) or the prostate ($k = 1$) in $\underline{I}^{p}_{t}$. A manual segmentation of each image was performed. The manual segmentations of patient $p$ on all days other than the target day were used to train the shape prior and the image match. Let $\underline{\underline{m}}^{p,0,\mathrm{fit}}_{t}$ denote a training m-rep fit to the manual segmentation of the bladder in $\underline{I}^{p}_{t}$. Each $\underline{\underline{m}}^{p,0,\mathrm{fit}}_{t}$ was constructed, via the method of Han and Merck [36, 57], so that the m-rep coordinate system preserved anatomical correspondences. Each bladder segmentation $\underline{\underline{m}}^{p,0}_{t}$ was produced using a shape prior learned by Principal Geodesic Analysis [26] over the set of m-reps $\{\underline{\underline{m}}^{p,0,\mathrm{fit}}_{j} : j \neq t\}$, holding $p$ constant, and using an image match trained on object-scale RIQFs for the same set of models. Each prostate segmentation $\underline{\underline{m}}^{p,1}_{t}$ was produced by applying the same procedure to the prostate training data.
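The leave-one-day-out training protocol described above can be sketched as follows. This is only an illustration of the index set $\{j : j \neq t\}$ for a single patient; the function name `leave_one_day_out` and the day labels are hypothetical and do not come from the original implementation.

```python
def leave_one_day_out(days):
    """For one patient, map each target day t to its training days {j : j != t}."""
    return {t: [j for j in days if j != t] for t in days}


# Hypothetical example: one patient imaged on five treatment days.
days = [1, 2, 3, 4, 5]
training_days = leave_one_day_out(days)

# The shape prior and image match for day 3 are trained on the other four days.
print(training_days[3])
```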
Although the bladder segmentation methodology used in these experiments is similar to the naive method reported in Section 3.3, the probability distributions used were not precisely equivalent. The training m-reps $\{\underline{\underline{m}}^{p,0,\mathrm{fit}}_{j}\}$ were produced by a preliminary implementation of the method of Han and Merck [36, 57] that assigned different values to the weights in (2.78) and (2.79) than were used in the training for the experiments reported in Section 3.3. After the training used in this experiment was completed, the distance function $d_I$ was improved to better estimate the distance from the boundary of the m-rep to the reference segmentation in cases when that distance was likely to differ significantly from the distance in the other direction, i.e., from the reference segmentation to the m-rep. These changes in the training tools led to differences between the training m-reps, which in turn affected SDSM training and RIQF region definitions. Furthermore, although the segmentations used in these experiments were produced using object-scale RIQFs, the test of non-credibility uses smaller-scale RIQFs that are described below.
The local image match functions used in the non-credibility test were trained on regions defined by the m-rep coordinate system. Each image region $\underline{I}_{i}$ is defined to be the neighborhood of a sampled m-rep spoke end, which serves as the anchor point for the region and is identified by its object-relative coordinates $(u, v, \phi)$. Each neighborhood is defined to extend 1 cm along the surface normal, and each voxel makes a Gaussian-weighted contribution to the QF based on its distance from the object boundary. The peak of this Gaussian is at the object boundary, and its standard deviation is defined to be 1/3 cm. Each region is further constrained so that only voxels for which the nearest surface point is within 2.5 cm of the region’s anchor point are allowed to contribute to the QF.
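The region definition above can be illustrated with a minimal sketch of the voxel weighting and a weighted quantile function. The constants come from the text; everything else is an assumption made for illustration: the function names, the reading of "extends 1 cm along the surface normal" as a cutoff on the absolute distance to the boundary, and the choice of evenly spaced quantile levels are not taken from the original implementation.

```python
import math

SIGMA_CM = 1.0 / 3.0       # standard deviation of the Gaussian weight (from the text)
NORMAL_EXTENT_CM = 1.0     # extent of the region along the surface normal (from the text)
ANCHOR_RADIUS_CM = 2.5     # limit on nearest-surface-point distance to the anchor (from the text)


def voxel_weight(dist_to_boundary_cm, surface_dist_to_anchor_cm):
    """Gaussian weight peaked at the boundary; zero outside the region (assumed cutoffs)."""
    if abs(dist_to_boundary_cm) > NORMAL_EXTENT_CM:
        return 0.0
    if surface_dist_to_anchor_cm > ANCHOR_RADIUS_CM:
        return 0.0
    return math.exp(-0.5 * (dist_to_boundary_cm / SIGMA_CM) ** 2)


def weighted_quantile_function(samples, n_quantiles=4):
    """samples: list of (intensity, weight) pairs.

    Returns the intensities at evenly spaced quantile levels of the
    weighted intensity distribution (an assumed discretization of the QF).
    """
    kept = sorted((v, w) for v, w in samples if w > 0.0)
    total = sum(w for _, w in kept)
    if total == 0.0:
        raise ValueError("no voxel contributes to this region")
    qf, cum, idx = [], 0.0, 0
    for k in range(n_quantiles):
        target = (k + 0.5) / n_quantiles * total
        # Advance to the first voxel whose cumulative weight reaches the target.
        while idx < len(kept) - 1 and cum + kept[idx][1] < target:
            cum += kept[idx][1]
            idx += 1
        qf.append(kept[idx][0])
    return qf


# Hypothetical voxels: (intensity, distance to boundary, surface distance to anchor), in cm.
voxels = [(10, 0.0, 0.5), (20, 0.2, 1.0), (30, 0.9, 2.0), (40, 0.1, 3.0)]
samples = [(v, voxel_weight(d, a)) for v, d, a in voxels]
print(weighted_quantile_function(samples, n_quantiles=2))
```

The last voxel in the example receives zero weight because its nearest surface point lies more than 2.5 cm from the anchor, so it does not influence the quantile function.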
Because the training m-reps preserve anatomical correspondence, local RIQF statistics on a corresponding region across the training population for an object can be learned. PCA on the training cases yields a mean $\mu_i$ and $n$ eigenmodes of variation $\{\lambda^{i}_{j}, v^{i}_{j}\}$ for each RIQF. The local image match function used to assess the local credibility of this region in each target case $\underline{\underline{m}}^{p,k}_{t}$ can be understood as a Mahalanobis distance of an RIQF $Q_i$ sampled from the target image, according to the probability distribution learned from the PCA:
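$$f_i\left(\underline{\underline{m}}, \underline{I}_{i}\right) = \sum_{j=1}^{n} \frac{\left((Q_i - \mu_i) \cdot v^{i}_{j}\right)^{2}}{\lambda^{i}_{j}} + \frac{\|r\|^{2}}{\lambda^{i}_{r}} \qquad (4.2)$$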
The first term is the Mahalanobis distance to the intensity quantiles observed in the target case, in the PCA space truncated to $n$ eigenmodes of variation. This term can be understood as a $\chi^2$ random variable with $n$ degrees of freedom. The second term accounts for the residue outside of this PCA space; this residue is assumed to follow an isotropic Gaussian probability distribution.
$$r = (Q_i - \mu_i) - \sum_{j=1}^{n} \left((Q_i - \mu_i) \cdot v^{i}_{j}\right) v^{i}_{j} \qquad (4.3)$$
The residue $r$ is weighted by the standard deviation $\sqrt{\lambda^{i}_{r}}$ in the training cases that is unaccounted for in the truncated PCA space, so the second term in (4.2) can be assumed to be the square of a standard normal random variable, or equivalently a $\chi^2$ random variable with one degree of freedom. Because the sum of independent $\chi^2$ random variables is also distributed as $\chi^2$, $f_i\left(\underline{\underline{m}}, \underline{I}_{i}\right)$ can be understood as a $\chi^2_{n+1}$ random variable.
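The two terms of the image match in (4.2) and the residue in (4.3) can be written out as a short sketch. It assumes the PCA results (mean, orthonormal eigenmodes, eigenvalues, and residual variance) are already available as plain lists; the function name `local_image_match` and the numerical values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def local_image_match(q, mean, modes, eigenvalues, residual_variance):
    """Evaluate (4.2): Mahalanobis term in the truncated PCA space plus a residual term.

    q                 -- RIQF sampled from the target image (Q_i)
    mean              -- PCA mean (mu_i)
    modes             -- n orthonormal eigenmodes (v_j^i)
    eigenvalues       -- n variances along the eigenmodes (lambda_j^i)
    residual_variance -- training variance outside the PCA space (lambda_r^i)
    """
    d = [qk - mk for qk, mk in zip(q, mean)]
    coeffs = [dot(d, v) for v in modes]

    # First term of (4.2): squared projections scaled by the eigenvalues.
    mahalanobis = sum(c * c / lam for c, lam in zip(coeffs, eigenvalues))

    # Residue (4.3): what remains of d after removing its PCA-space component.
    residual = list(d)
    for c, v in zip(coeffs, modes):
        residual = [rk - c * vk for rk, vk in zip(residual, v)]

    # Second term of (4.2): squared residual norm scaled by the residual variance.
    return mahalanobis + dot(residual, residual) / residual_variance


# Hypothetical three-quantile RIQF with two eigenmodes.
match = local_image_match(
    q=[2.0, 1.0, 0.5],
    mean=[0.0, 0.0, 0.0],
    modes=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    eigenvalues=[4.0, 1.0],
    residual_variance=0.25,
)
print(match)  # 2^2/4 + 1^2/1 + 0.5^2/0.25 = 3.0
```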
This image match has several properties that are desirable for detecting non-credible regions in image segmentations. I have observed positive correlation between image match value and segmentation error distance, suggesting that the outliers of local image match are likely to be regions where a localized failure has occurred. This image match models the variability in image intensity for a population of objects and is appropriate for evaluating the segmentation of a new image of the same class. Because the value of the image match function follows the $\chi^2$ distribution, a critical value of that distribution can be used as the threshold $\rho$ in (4.1). The local RIQFs used to train the credibility test for bladder segmentations use three eigenmodes of variation to describe the image intensity quantiles from the exterior of the object and two eigenmodes to describe the interior of the object. Two additional degrees of freedom, one each for the exterior and interior, are needed to account for the residue outside of these eigenmodes. Voxels that were segmented as gas or bone via intensity thresholding were excluded from the RIQFs and did not contribute to the match function. Thus, the distribution of local image match values can be approximated as $f_i(\cdot,\cdot) \sim \chi^2_7$. The choice of a threshold value $f$ such that $P\left(f_i(\cdot,\cdot) > f\right) < \rho$ for any value $\rho$ can be made via the known CDF for the $\chi^2_7$ distribution. Any region of a bladder segmentation where the image match exceeds $f$ is considered to be non-credible.
Figure 4.1: An axial slice of a CT image illustrating why the exterior image match is used to detect non-credibility in prostate segmentations. The two segmentations shown in the image are quite different, yet have similar interior intensity patterns. It is the local exterior intensity pattern that allows one to distinguish between the acceptable segmentation (dark contour) and the erroneous segmentation (bright contour).
Only the exterior RIQFs were used to evaluate the local credibility of prostate segmentations because of the lack of image contrast between the bladder and prostate. Figure 4.1 shows two possible segmentations of a prostate on an axial slice of a CT image. One of these segmentations is correct; the other is the result of a gross shift towards the bladder. Because the image intensity patterns for bladder and prostate are similar, the interior histograms for these two segmentations are roughly equivalent, as are their values of interior image match. A trained expert’s ability to detect the segmentation error is based on the intensity patterns at the exterior of the prostate, in the region away from the bladder. These regional intensity distributions are quite different, and the exterior RIQF image match is able to distinguish between them. Exterior RIQFs with three eigenmodes of variability are used to detect non-credible regions of the prostate. The additional degree of freedom for residue outside of the PCA space allows the distribution of image match values in a region to be approximated by $f_i(\cdot,\cdot) \sim \chi^2_4$. A threshold value can be chosen from the CDF of that probability distribution, and any image match above that threshold can be interpreted as evidence that the segmentation in that region is non-credible.
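Choosing a threshold from the $\chi^2$ CDF, as described for both the bladder ($\chi^2_7$) and the prostate ($\chi^2_4$), can be sketched with the standard library alone. The tail probability of 0.05 used below is an assumed value for illustration only; the text does not state which value was used. The helper names are hypothetical, and in practice a library routine such as `scipy.stats.chi2.ppf` would serve the same purpose.

```python
import math


def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical_value(dof, tail_prob):
    """Smallest f (to bisection accuracy) with P(X > f) <= tail_prob for X ~ chi-square(dof)."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, dof) > tail_prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, dof) > tail_prob:
            lo = mid
        else:
            hi = mid
    return hi


TAIL_PROB = 0.05  # assumed for illustration; not specified in the text

bladder_threshold = chi2_critical_value(7, TAIL_PROB)   # 3 exterior + 2 interior modes + 2 residues
prostate_threshold = chi2_critical_value(4, TAIL_PROB)  # 3 exterior modes + 1 residue
print(round(bladder_threshold, 3), round(prostate_threshold, 3))
```

Under this assumed tail probability, a region whose image match exceeds the corresponding threshold would be flagged as non-credible.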