Pyramid Tools - Multiresolution image segmentation

3.3.1 Definition

Some early work used the term pyramids to describe what multiresolution analysis means. In [Ros84a] the author describes the pyramids as data structures that provide successively condensed representations of the information in the input image. The successive levels of the pyramid are reduced-resolution versions of the input image, so that they represent increasingly coarse approximations of the features of the image.

The following sections introduce early tools used to create the successive approximations. Some of these tools do not require the basis functions to be orthogonal. Therefore, they were not used for complete representation of the signals.

3.3.2 Average-based Pyramid

The simplest type of image pyramid is constructed by repeated averaging of the image intensities in non-overlapping 2 × 2 blocks of pixels. Given an input image of size $2^n \times 2^n$, applying this process yields a reduced image of size $2^{n-1} \times 2^{n-1}$. This image is called the parent image. Applying the process again to the parent image yields a still smaller image of size $2^{n-2} \times 2^{n-2}$, etc. Fig. 3.2 illustrates this process.

As the images are stacked on top of one another, they constitute an exponentially tapering "pyramid" of images. In this simple method each node in the pyramid, say k levels above the base, represents the average of a square block of the base of size $2^k \times 2^k$ [Ros84a].
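The repeated 2 × 2 averaging described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from [Ros84a]: the function name `average_pyramid` and the 4 × 4 test image are assumptions made for the example, and the input is assumed to be square with a side length that is a power of two.

```python
import numpy as np

def average_pyramid(image):
    """Return [level 0, level 1, ...] built by repeated 2 x 2 block averaging.

    Assumes a square image whose side length is a power of two.
    """
    levels = [np.asarray(image, dtype=float)]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        h, w = a.shape
        # Group the pixels into non-overlapping 2 x 2 blocks and average each block.
        parent = a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(parent)
    return levels

# Hypothetical 4 x 4 test image (n = 2) with values 0..15.
image = np.arange(16, dtype=float).reshape(4, 4)
pyramid = average_pyramid(image)
shapes = [level.shape for level in pyramid]
```

For this input the levels have shapes 4 × 4, 2 × 2 and 1 × 1, and the single node at the top equals the mean of the whole base image, which matches the statement that a node k levels above the base averages a block of $2^k \times 2^k$ base pixels.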

3.3.3 Weighted Average-based Pyramid

The previous method of creating lower-resolution images is simple and easy to compute, but the sharp cut-off characteristic of unweighted averaging can be undesirable. Overlapping weighted averages would be preferable. However, the next logical step in constructing the lower-resolution images of the higher analysis levels is to use non-overlapping weighted averaging, peaked at the centre of the non-overlapping averaging regions and falling off to zero at their borders.

3.3 Pyramid Tools 51

Figure 3.2: Simplest type of an image pyramid. Level j=2 (coarsest scale): each pixel is a parent to four pixels at level j=1 and a grandparent to 16 pixels at level j=0. Level j=1: each pixel is a parent to four pixels at level j=0. Level j=0: original resolution.

In [STMG03], Salem et al. used the Gaussian function as the weighting function for the average, applied in a manner similar to a moving window. The original image is divided into parts, each of the same size as the filter, and the filter is applied to each part separately. This can be interpreted as a windowed convolution, which also agrees with the concept of a distinct block operation [GW05]. As shown in Fig. 3.3, a distinct block operation processes one block of the input image at a time; here the operation is Gaussian filtering. Each time the filter is applied to a part of the image, the result is placed as a pixel value at the corresponding location in a new image. For the next images in the pyramid, the process is repeated using larger filters. For instance, if the parent image at level j = 1 was created with a Gaussian filter of size 3 × 3, then the grandparent image at level j = 2 should be created from the original with a Gaussian filter of size 5 × 5. In general, the distinct block operation may require image padding, because the blocks into which the image is divided will not always fit exactly over the image. In Fig. 3.4 the Gaussian window pyramid is applied to a traffic scene.

Figure 3.3: Gaussian window for constructing a weighted average pyramid.
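The distinct block operation with a Gaussian window can be sketched as follows. This is an illustrative reading of the description above, not the implementation of [STMG03]: the function names, the default width `sigma = size / 4`, and the use of edge replication for padding are all assumptions, since the text does not specify them.

```python
import numpy as np

def gaussian_window(size, sigma):
    """Normalised 2-D Gaussian weights on a size x size grid, peaked at the centre."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def block_weighted_average(image, size, sigma=None):
    """Produce one output pixel per distinct size x size block of the input."""
    image = np.asarray(image, dtype=float)
    if sigma is None:
        sigma = size / 4.0  # assumed width; not given in the text
    h, w = image.shape
    # Pad by edge replication (an assumption) so the blocks tile the image exactly.
    pad_h, pad_w = (-h) % size, (-w) % size
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="edge")
    H, W = padded.shape
    blocks = padded.reshape(H // size, size, W // size, size)
    weights = gaussian_window(size, sigma)
    # Weighted sum over the two within-block axes gives one value per block.
    return np.einsum("iajb,ab->ij", blocks, weights)

# Hypothetical 7 x 7 constant image; both levels are computed from the original.
image = np.full((7, 7), 5.0)
level1 = block_weighted_average(image, 3)  # 3 x 3 window for level j = 1
level2 = block_weighted_average(image, 5)  # 5 x 5 window for level j = 2
```

Because 7 is not a multiple of 3 or 5, both calls exercise the padding step. The weights are normalised to sum to one, so a constant image stays constant at every level.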

In [SAHU04] the authors used B-spline functions to compute local weighted geometric moments, with a sliding window at dyadic scales. B-splines are well-suited window functions because, in addition to being refinable, they are positive, symmetric, separable, and very nearly isotropic. The algorithm is used in many applications, e.g., as a feature-extraction method for detecting and characterizing elongated structures in images and as a multiscale optical-flow algorithm extending the well-known optical-flow method.

3.3.4 Gaussian Pyramid

The next logical step is to use weighted and overlapping averaging. Here the functions used to generate approximated versions of the signal are non-orthogonal. Some redundancy in this method of information representation may be useful. The Gaussian pyramid is a sequence of images, each of which is a low-pass filtered copy of its predecessor [Bur84]. It is called a Gaussian pyramid because the low-pass filter used has a Gaussian characteristic.

Let $G_0$ be the original image. It becomes the bottom or zero level of the Gaussian pyramid. Each pixel of the next pyramid level, image $G_1$, is obtained as a weighted average of the pixels in image $G_0$ within an n × n window.


Figure 3.4: Image in multiresolution representation. (a) Original resolution. (b) Approximation in one lower resolution level by a 3 × 3 Gaussian window. (c) Approximation in three lower resolution levels by a 7 × 7 Gaussian window.

Each pixel of $G_2$ is then obtained from $G_1$ by applying the same pattern of weights. The window moves horizontally or vertically in steps of two pixels, so that its centre lands on every second pixel, i.e., the sample distance in each level is double that in the previous level. As a result, each image in the sequence is represented by an array that is half as large as its predecessor in each dimension. The first row of Fig. 3.5 shows an application of the Gaussian pyramid tool to a traffic scene.
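The construction above (weighted average in a window, then keeping every second sample) can be sketched as follows. The text only says "n × n window", so the 5-tap weights `[1, 4, 6, 4, 1] / 16`, the reflected border handling, and the function names are assumptions chosen for illustration; this is not necessarily the filter used in [Bur84] or in the figures.

```python
import numpy as np

# Assumed 5-tap generating kernel; the outer product gives a 5 x 5 window.
KERNEL_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(image):
    """Separable weighted average; borders are handled by reflection (assumed)."""
    r = len(KERNEL_1D) // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="reflect")
    # Filter along each row, then along each column.
    rows = np.apply_along_axis(np.convolve, 1, padded, KERNEL_1D, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, KERNEL_1D, mode="valid")

def reduce_level(image):
    """Low-pass filter, then keep every second sample in each direction."""
    return blur(image)[::2, ::2]

def gaussian_pyramid(image, levels):
    """Return [G_0, G_1, ..., G_levels]."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid

# Hypothetical 9 x 9 constant image.
G = gaussian_pyramid(np.ones((9, 9)), 2)
sizes = [g.shape for g in G]
```

With a 9 × 9 input the levels have sizes 9 × 9, 5 × 5 and 3 × 3. The kernel sums to one, so a constant image remains constant at every level.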

3.3.5 Laplacian Pyramid

A set of band-pass filtered images $L_0, L_1, \ldots, L_{N-1}$ may be defined simply as the differences between the low-pass images at successive levels of the Gaussian pyramid:

$$L_j = G_j - G_{j+1} \tag{3.13}$$

and

$$L_N = G_N \tag{3.14}$$

The image $G_{j+1}$ must be expanded to the size of $G_j$ before the difference is computed. The expansion of an image of size $(M_1 + 1) \times (M_2 + 1)$ is done by interpolating between each two given values to obtain an image of size $(2M_1 + 1) \times (2M_2 + 1)$. Just as each image in the Gaussian pyramid represents the result of applying Gaussian filtering to the image of the previous lower level, each image of the set $L_0, L_1, \ldots, L_{N-1}$ represents the difference between two successive levels of the Gaussian pyramid.

Figure 3.5: Application of the Gaussian (first row) and the Laplacian pyramid tools to a traffic scene. First row: the original image and the lower resolution images G1 and G2. Second row: the first Laplacian image L1 and the second Laplacian image L2, each obtained as a difference (-) of images in the first row.

These differences resemble the Laplacian operator, which is used in image processing, e.g., to extract image features such as edges. Therefore, this type of pyramid is called the Laplacian pyramid [Bur84]. The second row of Fig. 3.5 shows two levels of a Laplacian pyramid and their relation to the Gaussian pyramid.
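A minimal sketch of Eqs. (3.13) and (3.14), including the expansion from $(M_1 + 1) \times (M_2 + 1)$ to $(2M_1 + 1) \times (2M_2 + 1)$ samples, is given below. The 5-tap kernel, the reflected borders, the choice of linear interpolation, and all function names are assumptions made for illustration. The image side is assumed to be of the form $2^N m + 1$ so that every expanded level matches the size of the level below it.

```python
import numpy as np

KERNEL_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed weights

def reduce_level(image):
    """Separable low-pass filter (reflected borders), then every second sample."""
    padded = np.pad(np.asarray(image, dtype=float), 2, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, KERNEL_1D, mode="valid")
    both = np.apply_along_axis(np.convolve, 0, rows, KERNEL_1D, mode="valid")
    return both[::2, ::2]

def expand(image):
    """Go from (M1+1) x (M2+1) to (2*M1+1) x (2*M2+1) by linear interpolation."""
    a = np.asarray(image, dtype=float)
    # Rows: original samples at even positions, midpoints at odd positions.
    rows = np.empty((2 * a.shape[0] - 1, a.shape[1]))
    rows[0::2] = a
    rows[1::2] = 0.5 * (a[:-1] + a[1:])
    # Columns: same scheme applied along the second axis.
    out = np.empty((rows.shape[0], 2 * a.shape[1] - 1))
    out[:, 0::2] = rows
    out[:, 1::2] = 0.5 * (rows[:, :-1] + rows[:, 1:])
    return out

def laplacian_pyramid(image, levels):
    """Return (Gaussian levels, Laplacian levels) with N = levels."""
    gauss = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        gauss.append(reduce_level(gauss[-1]))
    # L_j = G_j - expand(G_{j+1}) for j < N, and L_N = G_N.
    lap = [g - expand(g_next) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])
    return gauss, lap

def reconstruct(lap):
    """Invert the definition: G_j = L_j + expand(G_{j+1}), starting from L_N."""
    image = lap[-1]
    for band in reversed(lap[:-1]):
        image = band + expand(image)
    return image

# Hypothetical 9 x 9 test image (9 = 2**2 * 2 + 1).
image = np.arange(81, dtype=float).reshape(9, 9)
gauss, lap = laplacian_pyramid(image, 2)
restored = reconstruct(lap)
```

Because each $L_j$ stores exactly what the expanded $G_{j+1}$ lacks relative to $G_j$, adding the levels back from the top recovers the original image up to floating-point rounding, whatever interpolation is used in `expand`.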
