Region-based Representation

Once an image has been decomposed, either the individual region represented by each node, or the decomposition as a whole, needs to be translated into a representation compatible with classifier generation. Whole image-based representation methods are discussed in the following section. This section deals with previous work that lends itself to individual region representation (although clearly these representations could equally well be applied in the context of whole, non-decomposed, image representation). From the literature, two categories of region-based representation can be identified: (i) statistical-based techniques and (ii) histogram-based techniques.

Two types of statistical-based techniques can also be identified: (i) first-order and (ii) second-order [134]. In first-order methods, images are described using statistical functions of the image’s intensity values, such as the mean, variance, energy and standard deviation. In second-order methods, the relationship between the intensity value of each pixel and those of its neighbours is taken into consideration [134]; in other words, relative location information is used.

One example of a second-order method uses a co-occurrence matrix [44, 49] to count the number of times two intensity values appear in an image within a certain distance and direction of each other. A Voxel Co-occurrence Matrix (VCM) is used in the same manner as a pixel co-occurrence matrix, but with respect to 3D images [44]. In a VCM, the rows and the columns represent intensity values, and each field holds the number of times that the intensity value of the ith row was adjacent to the intensity value of the jth column. Adjacency is defined by a displacement distance d and an angle. Once the VCM has been computed, various statistical functions can be applied to it, such as angular second moment, contrast, correlation and variance.

Another example of a second-order method uses run-length matrices. These hold information about runs, that is, sets of consecutive pixels/voxels that have the same intensity value [43, 132]. A Voxel Run-Length Matrix (VRLM) is the 3D form of a pixel run-length matrix. In a VRLM, the rows represent intensity values, the columns represent run lengths, and each field holds the number of runs of a specific intensity value and length in a specific direction. As with the VCM, different functions may be applied to the VRLM, such as short/long run emphasis, run-length non-uniformity and run percentage.
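The construction of a voxel co-occurrence matrix can be illustrated with a minimal sketch. It is not taken from the cited work: the function names, the use of a single offset vector `(dz, dy, dx)` in place of a distance and angle, and the assumption that intensities are already quantised to integers in `0..levels-1` are all choices made here for illustration.

```python
from itertools import product


def voxel_cooccurrence(volume, offset, levels):
    """Count how often intensity i occurs at a voxel while intensity j
    occurs at the voxel displaced by offset = (dz, dy, dx).

    Assumes volume is a nested list indexed [z][y][x] whose values are
    integers in the range 0..levels-1 (an illustrative assumption).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    dz, dy, dx = offset
    vcm = [[0] * levels for _ in range(levels)]
    for z, y, x in product(range(depth), range(height), range(width)):
        z2, y2, x2 = z + dz, y + dy, x + dx
        # Skip pairs whose second voxel falls outside the volume.
        if 0 <= z2 < depth and 0 <= y2 < height and 0 <= x2 < width:
            vcm[volume[z][y][x]][volume[z2][y2][x2]] += 1
    return vcm


def contrast(vcm):
    """Sum of (i - j)^2 weighted by the normalised co-occurrence counts."""
    total = sum(map(sum, vcm))
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(vcm)
               for j, count in enumerate(row))


def angular_second_moment(vcm):
    """Sum of the squared normalised co-occurrence counts."""
    total = sum(map(sum, vcm))
    return sum((count / total) ** 2 for row in vcm for count in row)


# A 2x2x2 volume with two intensity levels; intensity changes only along x.
volume = [[[0, 1],
           [0, 1]],
          [[0, 1],
           [0, 1]]]

vcm_x = voxel_cooccurrence(volume, offset=(0, 0, 1), levels=2)
vcm_z = voxel_cooccurrence(volume, offset=(1, 0, 0), levels=2)
print(vcm_x, contrast(vcm_x))  # [[0, 4], [0, 0]] 1.0
print(vcm_z, contrast(vcm_z))  # [[2, 0], [0, 2]] 0.0
```

The same volume yields different matrices, and hence different statistics, depending on the displacement direction, which is how relative location information enters the representation.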

Regardless of whether first-order or second-order statistical methods are used, the generated statistics describe individual features which in turn can be used to define a feature space from which feature vectors can be extracted.
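As a rough illustration of how such statistics form a feature vector, the sketch below computes four first-order features for a region given as a flat list of intensity values. The function name is hypothetical, and energy is assumed here to mean the sum of squared intensities; other definitions (for example, over a normalised histogram) are also in use.

```python
from statistics import fmean, pvariance


def first_order_features(region):
    """Return [mean, variance, standard deviation, energy] for a flat
    sequence of intensity values.

    Energy is taken to be the sum of squared intensities, which is an
    assumption made for this sketch.
    """
    values = list(region)
    mean = fmean(values)
    variance = pvariance(values, mu=mean)  # population variance
    energy = sum(v * v for v in values)
    return [mean, variance, variance ** 0.5, energy]


region = [2, 4, 4, 4, 5, 5, 7, 9]
features = first_order_features(region)
print(features)
```

Each region then corresponds to one point in a four-dimensional feature space; second-order statistics can be appended to the same vector.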

In the case of the histogram-based methods, there are a number of techniques that can be adapted: (i) simple histograms, (ii) Histograms of Oriented Gradients (or HOGs), (iii) histograms of Local Binary Patterns (LBPs) and (iv) histograms of Local Phase Quantisation (LPQ). In the case of simple histograms, the x-axis represents the values of some image feature and the y-axis a count of the number of times that each feature value occurs. Often the feature values are grouped into sub-ranges referred to as “bins”. The simplest form of histogram image representation is where the x-axis represents intensity values. The histogram thus represents the number of times each intensity value, or group of intensity values, appears. The disadvantages of such simple histograms are: (i) significant information, such as spatial information, is lost because only the frequency of the intensity values is considered; and (ii) a lack of invariance, especially when two images have similar content but different resolutions (in which case different histograms will be produced).
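The loss of spatial information can be seen in a small sketch. The function name, the equal-width binning and the 0 to 255 intensity range are assumptions made for illustration only.

```python
def intensity_histogram(pixels, num_bins, max_value=255):
    """Count intensity values in num_bins equal-width bins over
    the range 0..max_value (both choices are illustrative)."""
    bin_width = (max_value + 1) / num_bins
    histogram = [0] * num_bins
    for row in pixels:
        for value in row:
            histogram[int(value // bin_width)] += 1
    return histogram


# Two images with the same intensities arranged differently.
image_a = [[0, 0, 255, 255],
           [0, 0, 255, 255]]
image_b = [[0, 255, 0, 255],
           [255, 0, 255, 0]]

hist_a = intensity_histogram(image_a, num_bins=4)
hist_b = intensity_histogram(image_b, num_bins=4)
print(hist_a, hist_b)  # [4, 0, 0, 4] [4, 0, 0, 4]
```

A block pattern and a checkerboard pattern produce identical histograms, so a classifier given only these histograms could not tell them apart.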

A more advanced histogram-based method is the use of Histograms of Oriented Gradients (or HOGs) [24]. Using HOGs, the changes in the intensity values of the region, with respect to the azimuth and/or zenith direction, are computed and referred to as gradients. In order to compute a gradient at each location, the difference between the “left” and “right” neighbouring intensity values, in a given direction, is calculated. Following this, the orientations (angles) of the image gradients are computed and assigned to what are called “orientation” bins. The gradient magnitudes in each orientation bin are accumulated. In the generated histogram, the x-axis represents directions and the y-axis the sum of the gradient magnitudes.
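The accumulation step can be sketched for the simpler 2D case, where a single orientation angle replaces the azimuth and zenith angles. This is a simplified illustration, not the cited method: the function name, the unsigned 0 to 180 degree orientation range, and the omission of cells, blocks and normalisation are all assumptions.

```python
import math


def orientation_histogram(image, num_bins=8):
    """Accumulate gradient magnitudes into unsigned-orientation bins
    (0 to 180 degrees) over the interior pixels of a 2D image."""
    histogram = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            # Difference between the two neighbours in each direction.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against an angle rounding up to 180.
            histogram[int(angle // bin_width) % num_bins] += magnitude
    return histogram


# Intensity increases from left to right, so every gradient is horizontal.
image = [[0, 10, 20, 30],
         [0, 10, 20, 30],
         [0, 10, 20, 30]]

hog = orientation_histogram(image, num_bins=4)
print(hog)  # [40.0, 0.0, 0.0, 0.0]
```

All of the gradient magnitude falls into the first orientation bin, reflecting the single dominant edge direction in the example image.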

In order to generate LBPs, each pixel/voxel is compared to its immediate neighbours. For each comparison a one is stored if the intensity value of the pixel/voxel is greater than the neighbour, otherwise a zero is stored. The generated binary number from the sequence of neighbours then describes an integer value. In the generated histogram, the x-axis represents the computed integer values and the y-axis the frequency with which they occur. In order to generate a robust representation, it is desirable to compute rotation invariant LBPs. With respect to 2D images it is straightforward to calculate rotation invariant LBPs because each location has only eight immediate neighbours. With respect to 3D images the generation of 3D rotation invariant LBPs (26 neighbours in contrast to 8 neighbours) is computationally expensive. To address this issue Zhao and Pietikainen [157] proposed the use of Three Orthogonal Plane LBPs (LBP-TOP). The LBP-TOP representation considers the calculation of LBPs only with respect to neighbouring voxels located in the XY, XZ and YZ planes. A combination of HOG and LBP (HOG-LBP) has also been proposed and found to be a robust representation [143].
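A minimal 2D sketch of the LBP computation follows. It uses the comparison as worded above (a one where the centre is greater than the neighbour); the neighbour ordering, the function names and the use of the minimum over circular bit rotations as the rotation invariant code are assumptions made for illustration, the last being one common way of obtaining rotation invariance.

```python
from collections import Counter

# Offsets (dy, dx) of the eight immediate neighbours, clockwise from
# the top-left corner. The ordering is an arbitrary choice.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """Set bit k to 1 where the centre pixel is greater than neighbour k."""
    centre = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        if centre > image[y + dy][x + dx]:
            code |= 1 << bit
    return code


def rotation_invariant(code, bits=8):
    """Smallest value among all circular bit rotations of the code."""
    mask = (1 << bits) - 1
    return min(((code >> r) | (code << (bits - r))) & mask
               for r in range(bits))


def lbp_histogram(image):
    """Histogram of LBP codes over the interior pixels of a 2D image."""
    return Counter(lbp_code(image, y, x)
                   for y in range(1, len(image) - 1)
                   for x in range(1, len(image[0]) - 1))


image = [[5, 5, 5],
         [1, 4, 9],
         [1, 1, 9]]

code = lbp_code(image, 1, 1)
print(code, rotation_invariant(code))  # 224 7
```

In the example the centre value 4 exceeds only its bottom, bottom-left and left neighbours, giving the pattern 11100000 (224); rotating the image would shift these bits, but the rotation invariant code stays at 7.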

The concept of histograms of Local Phase Quantisation (LPQ) was proposed in [101]. LPQ uses low frequency local Fourier transforms whereby a histogram of the quantised Fourier transform can be generated [99]. At each image location, a Short-Term Fourier Transform (STFT) is applied with respect to the immediate neighbours. The resulting values are then quantised (a value of one is used if the value is greater than or equal to zero, otherwise a value of zero is used). In this manner a binary encoding is computed for each image location, which can then be interpreted as an integer value between 0 and 255 ($b = \sum_{i=1}^{8} q_i 2^{i-1}$, where $q_i$ is the quantised value of a neighbouring pixel/voxel). Histograms describing the number of times that each integer value occurs are then computed, one per image.
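The quantisation and encoding steps can be sketched as follows, assuming eight quantised values indexed from 1 so that the resulting code fits in eight bits. The STFT itself is not computed here; the input values are made up for illustration and the function names are hypothetical.

```python
def quantise(values):
    """Return 1 where a value is greater than or equal to zero, else 0."""
    return [1 if v >= 0 else 0 for v in values]


def encode(bits):
    """Interpret the bits as an integer: the first bit has weight 2^0,
    the second 2^1, and so on."""
    return sum(q << i for i, q in enumerate(bits))


# Hypothetical transform outputs at one image location.
values = [0.7, -1.2, 0.0, 3.1, -0.4, -2.0, -0.1, 5.5]

bits = quantise(values)
code = encode(bits)
print(bits, code)  # [1, 0, 1, 1, 0, 0, 0, 1] 141
```

With eight binary values the code ranges from 0 (all zeros) to 255 (all ones), so the resulting histogram has at most 256 bins.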

In document Three-dimensional image classification using hierarchical spatial decomposition: A study using retinal data (Page 37-40)