2.5 Popular Object and Atlas Based Segmentation Methods
2.5.1 Snakes, Geodesic Active Contours
Kass et al. introduce snakes as energy minimizing spline [42] that is attracted to salient edges and contours in an image. In their notationv(s) = (x(s), y(s)) represents a curvev in 2D that is parameterized by its fractional arc lengths. The energy functional that they seek
to minimize is:
Esnake∗ =
Z 1
0
Eint (v(s)) +Eimage (v(s)) +Econ (v(s))ds . (2.65)
The three terms in this energy functional Eint, Eimage, and Econ represent energies due
to the internal bending of the spline, due to image forces, and due to external constraints respectively. The spline is evolved to a configuration that minimizes the energy functional for a given image.
The external constraint energyEcon in (2.65) provides a way for a user to steer the snake. A spring force can be used to penalize the snake for moving a specified point away from a designated location, and a repulsion force can be used to penalize the snake for moving towards a designated location. When these external constraints are specified, the snake produces a semi-automatic segmentation. The other two energies in (2.65), the internal spline state Eint
and the image energyEimage, require no human input, and when they are used withoutEcon
the snake produces a fully automatic segmentation.
During the 20 years since [42] was published, the snakes methodology has evolved into a widely used technology known as geodesic active contours [92, 12]. A change in the shape representation is one of the most important improvements of these newer methods over the original snakes.
In the geodesic active contours framework the shape model m=Lused to segment L in an
n-dimensional image is implicitly represented by the zero level set of a scalar-valued function on the imaging domain:
φ: Ω → R, (2.66)
rather than by the explicit parameterization used in the original snakes. Typically the signed distance map (SDM) is used asφ[75]. Let
−
x∈Ω and let d −x
denote the distance from
the nearest point on the boundary implied by φ. φ − x = −d −x , −
x is on the interior of the object
d −x , −
x is on the exterior of the object
(2.67)
Negative distances in this map are associated with the interior of the object, positive distances are associated with the exterior, and the zero level set implies the boundary of the object. An example of this shape representation can be seen in Figure 2.5.
(a) The binary map for an object inR2
−5 0 5
(b) The signed distance map for this object
(c) The signed distance map interpreted as a height function inR3 Figure 2.5: Implicit shape representations.
This implicit shape representation has several useful properties. It allows the topology of
=
mLto change during the segmentation process. Such a change would not be possible with the original snake, or indeed with the ASM [19] or m-rep [36] shape representations discussed in Sections 2.5.2 and 2.5.3. This change in topology is desirable when the segmentation target is likely to have a different topology than the shape template, for example, when the segmentation target is in the presence of pathology. However, a change in topology is undesirable when the target has a known, fixed topology. Bai et. al [3] have demonstrated an extension of this method that is constrained to a fixed topology, but their constraint is computationally expensive.
Another useful property of the implicit shape representation is that it is computationally efficient to determine whether any given voxel lies on the interior or the exterior of the object.
All one needs to do is query the sign ofφ −x . SL = n −x:φ − x≤0o (2.68)
This allows one to design (2.65) so that the Eimage term encourages different intensity dis-
tributions for the interior and exterior of the object. Let µ and ν denote the average image intensity in the interior and exterior of the object respectively.
µ = −x∈SL mean −I −x (2.69) ν = −x6∈SL mean −I −x (2.70)
The functional popularized by Chan and Vese [14],
Eimage = Z −x∈SL −I − x−µ2+ Z −x3SL −I − x−ν2, (2.71)
encourages this separation of regions.
During the segmentation process, (2.65) is minimized, typically by following a gradient descent procedure for which the gradient formulas are knowna priori. This allows the entire segmentation to be performed quickly, and these level-set methods are frequently used in real-time applications driven by sonography or fluoroscopy.
As the shape deforms during the segmentation process, the evolution ofφcan be limited to translation along its normals. To understand this, it is useful to consider that any translation in a tangent plane to the surface would constitute a reparameterization of surface. Because the correspondences produced by this shape representation are based on voxels in Ω , rather than some anatomical property of the target object or some other geometric property of φ, allowing such a reparameterization would have no effect on the implied shape.
This optimization typically can only be guaranteed to produce a segmentation result that is a local minimizer of the energy function. In general the desired segmentation result, i.e.,
the global minimum of (2.65), the segmentation needs to be initialized suitably close to that answer.
The tunneling descent method by Tao and Tagare [83] is able to relax that requirement by imposing new restrictions on the image data. They make the assumption that the image is such that one can tell when the segmentation boundary is entirely within the object and when it is entirely outside of the object. They seed the optimization with an initial shape on the interior of the target object. During the each iteration of the optimization the object is required to grow. They escape from local minima by growing the object in the way that causes the smallest increase in the cost function. This optimization procedure runs until the entire segmentation boundary is outside of the target object. At that point they choose the object with the lowest cost from the set of local minima that they had encountered, and they return that object as the segmentation of the image.
Two relative weaknesses of the level-set methods are exposed when they are used to study populations of objects. As was mentioned earlier, the implicit shape representation produces voxel-based correspondences. This is a weakness with regard to morphometric studies that require anatomical or geometric correspondences. Although an explicit shape representation may provide these correspondences directly, further processing, e.g., by the method proposed by Cateset al. [13], is needed to build these correspondences for implicitly represented objects. The other major weakness of the level-set methods is that there does not exist a principled means to learn a probability distribution on the shapes themselves.
Several research groups [86, 46, 69] have proposed using PCA on a vector formed by the concatenation of all values from the SDM. A major problem is that SDMs lie in a nonlinear space. This problem is illustrated in Figure 2.6. The figure shows the boundaries and the SDMs from two example objects inR2. The average of those two SDMs is not another SDM.
In fact, because the two example objects are non-overlapping and well separated, this average function does not even have a zero level-set. In general new vectors produced by (2.44) will not correspond to new SDMs, but they will however describe functions Ω→R2. In practice,
the training SDMs could be aligned so that a new instance produced by (2.44) will have a zero level-set that would be interpreted as the new object boundary.
Another problem with using PCA on SDMs is that it is a memory-intensive task. In particular, when working with objects inR3, the number of voxelsnin each SDM can be quite
(a) Example SDM with a ball in the upper left corner
(b) Example SDM with a ball in the lower right corner
(c) The average of these two SDMs
Figure 2.6: The linear average of two SDMs need not be a SDM itself. The SDMs for two balls are shown in Figure 2.6(a) and Figure 2.6(b) with a dashed line through the zero level-set. The two balls are non-overlapping. The linear average of these two SDMs is shown in Figure 2.6(c). This SDM does not imply a shape: it has no zero level-set.
large. For example, an SDM that occupies 2563 voxels, each containing a double-precision (8 byte) distance, requires 128 megabytes of storage. When working with a population of
k objects, keeping all the SDMs in memory becomes problematic for relatively small k. A 2563×2563 covariance matrix, with 8 byte elements requires 2 petabytes of storage, making it completely impractical with commodity hardware that is available today.
There is a trick that allows the eigenvalues and eigenvectors of a large covariance matrix to be computed without ever constructing the covariance matrix itself. Here I will use the notation of Section 2.3.1, assuming=x0 is the n×k matrix formed by repeated application of (2.23) fori∈ {1..k} and Σ is the unknownn×ncovariance matrix. We can compute ak×k
symmetric positive semi-definite matrix Σ0:
Σ0 = 1 k−1 =x 0T = x0 (2.72)
and its eigenvalues Λ0 ={λ0i} and eigenvectorsV0. It follows that fori∈ {1..k}
Σ=x0Vi0 = =x0Σ0Vi0 (2.73)
Thus, each eigenvector of Σ can be found by premultiplying an eigenvector of Σ0by=x0 and then normalizing the result to a unit-vector. Its eigenvalue will be the corresponding eigenvalue of Σ0.
This method allows us to perform PCA on a set of SDMs without having to compute the
n×n covariance matrix. However, doing so remains a computationally difficult task due to the requirements that we store then×k matrix=x0 and multiply vectors by it. Moreover, our concern than the SDMs do not lie in a vector space has not been addressed.
Each of the next two segmentation methods that I present, ASMs and m-reps, use a shape representation that leads to a more principled SDSM. Both shape representations are compact: each shape is typically represented by a small number of samples. Even when working with objects from high resolution three dimensional images, storing ann×nmatrix in memory and computing its eigenvalues can be done on commodity hardware. In Section 2.5.2, I will show that ASMs use a linear shape representation to which PCA can be directly applied. M-reps lie in a non-linear space, but as I will explain in Section 2.5.3, they can be mapped into a linear space where PCA is applicable.