4.1.2 Training and Segmentation for Bayesian Methods
Bayesian deformable-model segmentation frameworks must learn the two components of their objective function: the shape prior p(m) and the image likelihood p(I|m). This section focuses on the shape prior; the image likelihood is discussed in depth in the next section. Training the shape prior requires three steps: fitting, alignment, and statistical learning. Segmentation requires two main steps: initialization and optimization. Each of these steps of the Bayesian segmentation pipeline is discussed below.
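The two learned components combine through Bayes' rule. Because p(I) does not depend on m, maximizing the posterior is equivalent to minimizing the sum of the negative log-likelihood and the negative log-prior. The second line below additionally assumes, for illustration, a Gaussian prior over the coefficients c_k of the learned modes of variation with variances λ_k. Under that assumption the negative log-prior is half the squared Mahalanobis distance from the mean shape:

```latex
\hat{m} = \arg\max_{m} p(m \mid I)
        = \arg\max_{m} \frac{p(I \mid m)\, p(m)}{p(I)}
        = \arg\min_{m} \bigl[ -\log p(I \mid m) - \log p(m) \bigr]

-\log p(m) = \frac{1}{2} \sum_{k} \frac{c_k^{2}}{\lambda_k} + \text{const}
```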
The shape prior is estimated from training images that have been manually segmented by a human expert. Manual segmentation is challenging, and different experts produce different segmentations of the same image; this effect is called rater bias. The challenges of accounting for this variability are not discussed in this chapter. Instead, the segmentations of a single expert are used, and the goal is automatic segmentations that mimic this specific expert.
Manual segmentation typically yields a segmentation in a format equivalent to voxel labels. To train the shape prior, one must find the parameters of the shape model that match these voxel labels. This is itself a segmentation task: I use the term “fitting” for the segmentation of a label image by a shape model. Fitting has the same requirements as segmentation, namely an objective function and an optimization method. However, fitting is simpler than segmentation because the appearance of the object in a label image is well defined. For fitting, f_appear(m, I) compares the voxel labeling implied by m to the voxel labeling of I. Comparison measures are either boundary based or region based. Region-based comparisons typically use a volume overlap measure between the two voxel labelings. A popular boundary-based measure computes the sum of squared distances from many points on the object boundary given by m to the closest point on the object boundary implied by I, and vice versa. However, computing distances from the boundary given by m is computationally expensive when m changes at every step of the optimization, so this term is often either omitted or approximated. Such an approximation is used in the f_appear function for fitting in Section 4.3 [MTS+08]. The f_shape functions used for fitting are identical to those used for non-Bayesian segmentation: f_shape is composed of soft geometric constraints designed to yield shapes that do not self-interpenetrate and good model-to-model correspondences.
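Both families of comparison measure for fitting can be sketched in a few lines. This is a minimal illustration, not the f_appear of [MTS+08]. It assumes that a voxel labeling is given as a collection of foreground voxel coordinates and a boundary as a list of points. It uses the Dice coefficient as one example of a volume overlap measure. The function names are hypothetical.

```python
def dice_overlap(labels_a, labels_b):
    """Region-based comparison: Dice volume overlap of two binary labelings,
    each given as a collection of foreground voxel coordinates."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 1.0  # two empty labelings agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def symmetric_sq_distance(boundary_a, boundary_b):
    """Boundary-based comparison: for every point of each boundary, the squared
    distance to the closest point of the other boundary, summed both ways."""
    def one_way(src, dst):
        # Brute-force closest-point search, O(len(src) * len(dst)).
        return sum(min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
                   for s in src)
    return one_way(boundary_a, boundary_b) + one_way(boundary_b, boundary_a)


# Two 2x2 squares of voxels shifted by one voxel: half of each overlaps.
square_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
square_b = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice_overlap(square_a, square_b))                              # 0.5
print(symmetric_sq_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)]))     # 4
```

The brute-force search makes the cost asymmetry in the text concrete. Distances to the fixed label image's boundary can be precomputed once, for example with a distance transform. Distances from the model's boundary, however, must be recomputed whenever m changes.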
The first training step above produces a set of models {m_i}, one fit to each training image. As Section 4.2 discusses in more detail, {m_i} and their corresponding training images are all that is required to train the image likelihood. Computing the shape prior, however, often additionally requires alignment. Here, I consider alignment to include the removal of any variation within {m_i} that one does not wish to model statistically. For example, the global coordinate system often differs from image to image, and this difference should not be modeled. Alignment produces a modified set of models {m'_i} that are either expressed in a new coordinate system or have had some of their variation subtracted out. The segmentation tasks examined in Section 4.3 use organ-specific alignments, which are discussed further in that section. Alignment is also linked to the initialization and optimization performed during segmentation, as discussed below.
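As a concrete case of removing unwanted variation, the sketch below factors global translation and scale out of a model before statistical learning. It makes two simplifying assumptions: each fitted model is represented by corresponding boundary points rather than an m-rep, and rotation is left in place. The organ-specific alignments of Section 4.3 are not reproduced here. Both function names are hypothetical. The removed centroid and scale are returned so that the same kind of pose can later be re-applied, which mirrors the requirement that initialization match alignment.

```python
import math


def remove_translation_and_scale(points):
    """Align one point-set model: move its centroid to the origin and rescale
    it to unit root-mean-square radius. Returns the aligned points together
    with the removed centroid and scale."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - centroid[k] for k in range(dim)] for p in points]
    scale = math.sqrt(sum(sum(x * x for x in p) for p in centered) / n)
    if scale == 0.0:
        raise ValueError("degenerate model: all points coincide")
    return [[x / scale for x in p] for p in centered], centroid, scale


def apply_translation_and_scale(points, centroid, scale):
    """Inverse of the alignment: place an aligned model at a given pose.
    An initialization in a target image would supply centroid and scale."""
    return [[x * scale + c for x, c in zip(p, centroid)] for p in points]


model = [(2.0, 3.0), (4.0, 3.0), (2.0, 5.0), (4.0, 5.0)]
aligned, centroid, scale = remove_translation_and_scale(model)
print(centroid)  # [3.0, 4.0]
restored = apply_translation_and_scale(aligned, centroid, scale)
```

A set of training models aligned this way differs only in shape and rotation. A prior learned from them therefore says nothing about position or size, and those must be supplied by initialization.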
Given the aligned training models {m'_i}, the shape prior can finally be estimated. This is typically done using principal component analysis (PCA). The segmentation framework used in Section 4.3 uses principal geodesic analysis (PGA), a generalization of PCA, to compute a multi-scale f_shape function on m-reps [Fle04]. First, the Fréchet mean m-rep model m_µ of {m'_i}, the model that minimizes the sum of squared distances to the training models, is computed with respect to a distance metric. Then an appropriately scaled, linear tangent plane to the m-rep shape space is computed at m_µ, and the training models are projected onto this tangent plane. PCA is applied to the projections to compute several global modes of variation and several local, per-atom residual modes of variation. These modes are used for optimization, and their corresponding Mahalanobis distance functions define f_shape.
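The PGA steps (Fréchet mean, projection onto the tangent plane, PCA, and a Mahalanobis distance) can be illustrated on the simplest curved space where they differ from plain PCA: a 2-torus of angle pairs. The torus is an assumed stand-in for the m-rep shape space of [Fle04], chosen because its log map is just a wrapped angle difference. This sketch is single-scale (no global/local split) and uses unit weights. All names are hypothetical.

```python
import math


def wrap(angle):
    """Map an angle (or angle difference) into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def frechet_mean_circle(angles, max_iter=100, tol=1e-12):
    """Frechet (intrinsic) mean of angles: gradient descent on the sum of squared
    geodesic distances. Assumes the data lie within an open half circle."""
    mu = angles[0]
    for _ in range(max_iter):
        step = sum(wrap(a - mu) for a in angles) / len(angles)
        mu = wrap(mu + step)
        if abs(step) < tol:
            break
    return mu


def log_map(point, mean):
    """Project a torus point onto the tangent plane at `mean`."""
    return [wrap(p - m) for p, m in zip(point, mean)]


def fit_pga_torus(samples):
    """Mean, mode variances, and mode directions for pairs of angles."""
    mean = [frechet_mean_circle([s[k] for s in samples]) for k in range(2)]
    v = [log_map(s, mean) for s in samples]
    n = len(v)
    # Sample covariance of tangent vectors (they average to ~0 at the Frechet mean).
    a = sum(t[0] * t[0] for t in v) / (n - 1)
    b = sum(t[0] * t[1] for t in v) / (n - 1)
    c = sum(t[1] * t[1] for t in v) / (n - 1)
    # Closed-form eigendecomposition of the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
    variances = [half_trace + disc, half_trace - disc]
    if abs(b) > 1e-15:
        modes = []
        for lam in variances:
            x, y = lam - c, b  # satisfies (A - lam * I) (x, y) = 0
            norm = math.hypot(x, y)
            modes.append((x / norm, y / norm))
    else:
        modes = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    return mean, variances, modes


def mahalanobis_sq(point, model):
    """Squared Mahalanobis distance in the tangent plane: sum_k (v . u_k)^2 / lambda_k."""
    mean, variances, modes = model
    v = log_map(point, mean)
    return sum((v[0] * u[0] + v[1] * u[1]) ** 2 / lam
               for lam, u in zip(variances, modes))


samples = [(0.0, 0.0), (0.2, 0.1), (-0.2, -0.1), (0.05, -0.1), (-0.05, 0.1)]
model = fit_pga_torus(samples)
print(model[1])                        # mode variances, largest first
print(mahalanobis_sq((0.2, 0.1), model))
```

Averaging raw angles would fail for data straddling ±π (for example, 3.1 and -3.1 average to 0 rather than to π). The Fréchet mean avoids this by averaging wrapped residuals in the tangent plane. The same reasoning motivates the intrinsic treatment of the orientation components of m-rep atoms.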
Segmentation begins by placing m_µ at an initial position in the target image. This starting object is the most likely object as determined solely by the shape prior. The maximum of the posterior is then found by optimizing over the coefficients of the model’s learned modes of variation. The initial position or deformation of m_µ for each target image is termed its initialization. The initialization used for each target image should be identical to the alignment used for each training image. Otherwise, the learned variation that is optimized over will not match the variation needed to segment the target images; that is, the prior will be inappropriate. Section 4.3 further discusses specific initializations and alignments.
The segmentation framework used in Section 4.3 performs a multi-scale, conjugate-gradient optimization. It is multi-scale because the optimization is first constrained to the learned global modes of variation; the local residues are then optimized independently, with f_shape functions that are independent of each other and of the global prior. Conjugate-gradient optimization finds a local maximum of the objective function [PFF+03]. It proceeds by first numerically estimating the derivative of the objective function. It then computes the gradient direction and the gradient’s first conjugate direction. Next, for each direction in turn, the optimum of the objective function along that line is found using Brent’s line search. This process is repeated until convergence.
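The optimizer described above can be sketched as a standard nonlinear conjugate-gradient loop built from three parts: a numerically estimated gradient, a one-dimensional line search, and conjugate directions. Three substitutions should be noted. First, the loop uses the common Polak-Ribière (PR+) update, which may differ in detail from the framework's own scheme of searching along the gradient and its first conjugate direction in series. Second, golden-section search stands in for Brent's method; Brent's method combines golden-section steps with parabolic interpolation. Third, the line search is restricted to an assumed fixed interval. All function names and the toy objective are hypothetical.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def golden_section_max(g, lo, hi, tol=1e-10):
    """Maximize a unimodal 1-D function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc > gd:  # maximum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:  # maximum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return (a + b) / 2


def conjugate_gradient_max(f, x0, max_iter=50, grad_tol=1e-6, radius=10.0):
    """Polak-Ribiere (PR+) nonlinear conjugate gradient for maximizing f."""
    x = list(x0)
    g = numerical_gradient(f, x)
    d = list(g)  # first search direction: the gradient (ascent) direction
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < grad_tol:
            break
        norm_d = math.sqrt(sum(di * di for di in d))
        if norm_d == 0.0:
            break
        u = [di / norm_d for di in d]
        # Line search for the best step t along x + t * u.
        t = golden_section_max(
            lambda s: f([xi + s * ui for xi, ui in zip(x, u)]), -radius, radius)
        x = [xi + t * ui for xi, ui in zip(x, u)]
        g_new = numerical_gradient(f, x)
        beta = sum(gn * (gn - go) for gn, go in zip(g_new, g)) / sum(go * go for go in g)
        d = [gn + max(beta, 0.0) * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


def log_posterior(c):
    """Hypothetical concave objective over two mode coefficients."""
    return -(c[0] - 1.0) ** 2 - 2.0 * (c[1] + 0.5) ** 2


print(conjugate_gradient_max(log_posterior, [0.0, 0.0]))  # close to [1.0, -0.5]
```

Starting from the zero coefficient vector corresponds to starting at the mean shape m_µ. On this quadratic toy objective the loop reaches the maximum after two line searches, one per dimension, as conjugate gradient does with exact line searches. On a real posterior it returns only a local maximum.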
The entire training and segmentation pipeline for Bayesian deformable-model frameworks has now been described, except for appearance models. Section 4.1.3 next discusses this remaining piece of the segmentation pipeline, and Section 4.2 then presents a novel appearance model based on quantile functions.