4.2 Methodology
4.2.5 Feature Extraction
Feature extraction algorithms for medical image segmentation are categorised into intensity-based, texture and shape features. Most of these features are considered hand-designed, since their extraction parameters are manually optimised for a specific task. Different types of features, including intensity statistics, textons and curvature features, will be considered to train a robust classifier for the detection and segmentation of brain tumours.
Intensity statistical features
First-order intensity statistics (Jain, 1989) are referred to as pixel-intensity-based features. They express the distribution of grey levels within the selected regions of interest (ROIs), which in the present work are superpixels. For each superpixel, 16 features are calculated, as explained in the following.
The average intensity feature of a superpixel, SP, is calculated using
$$\mathrm{Average}(SP) = \frac{1}{N_P}\sum_{i=1}^{N_P} I_{SP,i}, \tag{4-4}$$
where $I_{SP,i}$ is the intensity value of pixel $i$ in the superpixel SP, and $N_P$ is the total number of pixels within the superpixel.
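The standard deviation (STD) of the intensities within the superpixel is calculated using
$$\mathrm{STD}(SP) = \sqrt{\frac{1}{N_P-1}\sum_{i=1}^{N_P}\left|I_{SP,i}-\mathrm{Average}(SP)\right|^2}. \tag{4-5}$$
The variance of the intensities is calculated using
$$\mathrm{Var}(SP) = \frac{1}{N_P-1}\sum_{i=1}^{N_P}\left|I_{SP,i}-\mathrm{Average}(SP)\right|^2. \tag{4-6}$$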
The coefficient of variation of the pixel intensities is calculated using
$$\mathrm{CoV}(SP) = \frac{\mathrm{STD}(SP)}{\mathrm{Average}(SP)}. \tag{4-7}$$
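A minimal NumPy sketch of Equations (4-4) to (4-7), assuming the intensities of one superpixel are given as a flat array; the function name `basic_stats` is hypothetical. Note that STD and variance use the $N_P-1$ normalisation, which corresponds to `ddof=1` in NumPy.

```python
import numpy as np

def basic_stats(intensities):
    """Average, STD, variance and CoV of one superpixel (Eqs. 4-4 to 4-7)."""
    x = np.asarray(intensities, dtype=float)  # I_{SP,i}, i = 1..N_P
    average = x.mean()                        # Eq. (4-4)
    var = x.var(ddof=1)                       # Eq. (4-6): divides by N_P - 1
    std = np.sqrt(var)                        # Eq. (4-5)
    cov = std / average                       # Eq. (4-7); undefined if average == 0
    return average, std, var, cov

avg, std, var, cov = basic_stats([2, 4, 4, 4, 5, 5, 7, 9])
```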
Skewness is a measure of the asymmetry of the distribution of intensities around the mean value of the superpixel. For data with mean μ and standard deviation σ, skewness is defined as
$$\mathrm{Skewness} = \frac{E\left[(x-\mu)^3\right]}{\sigma^3}, \tag{4-8}$$
where $E$ is the expectation operator. For the intensities of the pixels within a superpixel, skewness is calculated using
$$\mathrm{Skewness}(SP) = \frac{\frac{1}{N_P}\sum_{i=1}^{N_P}\left(I_{SP,i}-\mathrm{Average}(SP)\right)^3}{\mathrm{STD}(SP)^3}. \tag{4-9}$$
Kurtosis is a descriptor of the shape of a distribution and a measure of the tailedness of the intensity distribution. In general, kurtosis is defined as
$$\mathrm{Kurtosis} = \frac{E\left[(x-\mu)^4\right]}{\sigma^4}. \tag{4-10}$$
For the intensities of the pixels within a superpixel, kurtosis is calculated using
$$\mathrm{Kurtosis}(SP) = \frac{\frac{1}{N_P}\sum_{i=1}^{N_P}\left(I_{SP,i}-\mathrm{Average}(SP)\right)^4}{\mathrm{STD}(SP)^4}. \tag{4-11}$$
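Equations (4-9) and (4-11) mix a $1/N_P$ central moment in the numerator with the $N_P-1$ STD of Equation (4-5) in the denominator, so library routines such as `scipy.stats.skew` (which use the biased STD) give slightly different values. A minimal sketch following the equations literally, with the hypothetical name `skewness_kurtosis`:

```python
import numpy as np

def skewness_kurtosis(intensities):
    """Skewness (Eq. 4-9) and kurtosis (Eq. 4-11) of one superpixel.

    Central moments use 1/N_P; STD uses N_P - 1 as in Eq. (4-5).
    Both are undefined when all intensities are equal (STD = 0).
    """
    x = np.asarray(intensities, dtype=float)
    d = x - x.mean()
    std = x.std(ddof=1)
    skew = np.mean(d ** 3) / std ** 3
    kurt = np.mean(d ** 4) / std ** 4
    return skew, kurt
```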
The maximum and minimum of the intensities within a superpixel, denoted Max(SP) and Min(SP), respectively, are included in the feature vector of that superpixel. The range is calculated using
$$\mathrm{Range}(SP) = \mathrm{Max}(SP) - \mathrm{Min}(SP). \tag{4-12}$$
The median and mode of the intensities of the pixels inside the superpixel, denoted Median(SP) and Mode(SP), respectively, are also included in the feature vector.
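The order statistics above can be sketched as follows (hypothetical function name). The text does not say how ties in the mode are broken, so this sketch assumes the smallest of the most frequent values is returned: `np.unique` sorts the values and `np.argmax` picks the first maximum.

```python
import numpy as np

def order_stats(intensities):
    """Max, Min, Range (Eq. 4-12), Median and Mode of one superpixel."""
    x = np.asarray(intensities)
    mx, mn = x.max(), x.min()
    rng = mx - mn                                   # Eq. (4-12)
    med = np.median(x)
    values, counts = np.unique(x, return_counts=True)
    mode = values[np.argmax(counts)]                # assumed tie-break: smallest value
    return mx, mn, rng, med, mode
```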
The mean absolute deviation is calculated using
$$\mathrm{MeanAD}(SP) = \frac{1}{N_P}\sum_{i=1}^{N_P}\left|I_{SP,i}-\mathrm{Average}(SP)\right|. \tag{4-13}$$
The median absolute deviation is calculated using
$$\mathrm{MedAD}(SP) = \operatorname*{median}_{i=1,\dots,N_P}\left|I_{SP,i}-\mathrm{Median}(SP)\right|. \tag{4-14}$$
The third central moment is calculated using
$$\mathrm{Moment3}(SP) = E\left[(x-\mu)^3\right] = \frac{1}{N_P}\sum_{i=1}^{N_P}\left(I_{SP,i}-\mathrm{Average}(SP)\right)^3. \tag{4-15}$$
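A short sketch of Equations (4-13) to (4-15) (hypothetical function name). The example data contain one outlier, which shows why MedAD is included alongside MeanAD: the median-based deviation is barely affected by it, while the mean-based deviation is dominated by it.

```python
import numpy as np

def deviation_stats(intensities):
    """MeanAD (Eq. 4-13), MedAD (Eq. 4-14) and Moment3 (Eq. 4-15)."""
    x = np.asarray(intensities, dtype=float)
    mean_ad = np.mean(np.abs(x - x.mean()))        # Eq. (4-13)
    med_ad = np.median(np.abs(x - np.median(x)))   # Eq. (4-14)
    moment3 = np.mean((x - x.mean()) ** 3)         # Eq. (4-15)
    return mean_ad, med_ad, moment3

mean_ad, med_ad, moment3 = deviation_stats([1, 2, 3, 4, 100])
```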
The interquartile range is calculated using
$$\mathrm{IQR}(SP) = Q_3 - Q_1, \tag{4-16}$$
where $Q_3$ is the upper quartile, i.e. the median of the upper half of the intensities, and $Q_1$ is the lower quartile, i.e. the median of the lower half of the data. Entropy is calculated using
$$\mathrm{Entropy}(SP) = -\sum_{i=1}^{N_b} p_i \log_2(p_i), \tag{4-17}$$
where $p_i$ is the normalised histogram count of the superpixel intensity values in bin $i$ and $N_b$ is the number of histogram bins.
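A sketch of Equations (4-16) and (4-17) under two stated assumptions: the quartiles use NumPy's default linear interpolation (other quartile conventions give slightly different values on small samples), and `n_bins=256` is a hypothetical default for $N_b$. Empty bins are dropped, following the usual convention $0 \log_2 0 = 0$.

```python
import numpy as np

def iqr_entropy(intensities, n_bins=256):
    """IQR (Eq. 4-16) and histogram entropy (Eq. 4-17) of one superpixel."""
    x = np.asarray(intensities, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])       # quartile convention is an assumption
    iqr = q3 - q1                             # Eq. (4-16)
    counts, _ = np.histogram(x, bins=n_bins)  # N_b bins over the superpixel range
    p = counts[counts > 0] / counts.sum()     # normalised counts; empty bins give 0
    entropy = -np.sum(p * np.log2(p))         # Eq. (4-17)
    return iqr, entropy
```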
Texton Feature
Brain tissues have complex structures that include both normal and tumorous tissues. Therefore, intensity features alone are not sufficient to accurately detect and segment the tumour. To tackle this problem, texture features, which characterise local intensity patterns through a higher-dimensional set of filter responses, are used to improve the accuracy of segmentation. Textons (Leung and Malik, 2001) are among the most powerful texture descriptors (Arbelaez et al., 2011) and are able to distinguish various patterns in the image. Textons are small elements of the image, generated by convolving the image $I$ with a specific filter bank $(F_1, F_2, \dots, F_{N_F})$, i.e.
$$R = \left[F_1 * I,\; F_2 * I,\; \dots,\; F_{N_F} * I\right], \tag{4-18}$$
where $N_F$ is the number of filters in the filter bank, $*$ denotes convolution and $R$ is the set of filter responses. Selecting the filter type and designing the filter bank is an important stage of texton analysis (Zhang et al., 2016). Gabor filters provide strong textural descriptors by capturing local dependencies in both the spatial and frequency domains (Grigorescu et al., 2002). Therefore, the Gabor filter (Henriksen, 2007), defined below, will be used in this work for texton feature extraction:
$$G(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right), \tag{4-19}$$
where σ is the standard deviation of the Gaussian envelope, γ is the spatial aspect ratio, λ is the wavelength of the sinusoid and ψ is the phase shift. In Equation (4-19), $x'$ and $y'$ are obtained from the spatial orientation of the filter, θ, as
$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta. \tag{4-20}$$
The values that are set for these parameters will be discussed in Section 4.3.3.
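Equations (4-19) and (4-20) can be sampled on a pixel grid as in the following sketch. It assumes an odd kernel size centred at the origin; the function name and the default values `gamma=0.5` and `psi=0.0` are hypothetical placeholders, not the settings chosen in Section 4.3.3.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Complex Gabor kernel of Eq. (4-19) on a size x size grid (size odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)     # x' in Eq. (4-20)
    y_r = -x * np.sin(theta) + y * np.cos(theta)    # y' in Eq. (4-20)
    envelope = np.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (2 * np.pi * x_r / wavelength + psi))
    return envelope * carrier
```

A filter bank is then a list of such kernels over a grid of sizes, orientations θ and wavelengths λ.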
Figure 4-9 shows a set of Gabor filters with different sizes, directions and sinusoid wavelengths. For clarity, the kernels in the filter bank are grouped by parameter configuration.
Denoting the number of filters in the filter bank by $N_{FB}$ (i.e. $N_F$ in Equation (4-18)), the FLAIR image is convolved with all the filters, so that a response vector of length $N_{FB}$ is generated for each pixel.
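The per-pixel response vectors of Equation (4-18) can be sketched with `scipy.signal.fftconvolve`, which accepts complex kernels. Taking the magnitude of each complex Gabor response is an assumption of this sketch (the text does not state how complex responses are turned into features), and `filter_responses` is a hypothetical name.

```python
import numpy as np
from scipy.signal import fftconvolve

def filter_responses(image, kernels):
    """Stack |F_j * I| for every kernel, giving an (H, W, N_FB) array,
    i.e. one N_FB-long response vector per pixel (Eq. 4-18)."""
    image = np.asarray(image, dtype=float)
    responses = [np.abs(fftconvolve(image, k, mode="same")) for k in kernels]
    return np.stack(responses, axis=-1)

img = np.arange(20.0).reshape(4, 5)
R = filter_responses(img, [np.ones((1, 1)), np.ones((3, 3)) / 9])
vectors = R.reshape(-1, R.shape[-1])   # one row per pixel, ready for clustering
```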
Figure 4-10 shows the filter responses generated by convolving the FLAIR image with Gabor filters of different parameters. The parameters (i.e. size, direction and sinusoid wavelength) are varied separately to better illustrate the effect of each parameter on the response.
Figure 4-9 Set of Gabor filters used for texton feature extraction with different parameters: a) similar sinusoid wavelengths and different sizes and directions, b) similar directions and different sizes and sinusoid wavelengths, c) similar sizes and different sinusoid wavelengths and directions, d) 3D representation of Gabor kernels with different sinusoid wavelengths.
There is one filter response vector for each pixel in the image. The texton map is created by applying k-means clustering to these $N_{FB}$-dimensional response vectors. The number of clusters, $k_{texton}$, is chosen empirically based on the number of tissue types. The major tissues in a brain MR image with tumour are WM, GM, CSF, tumour core and oedema, so $k_{texton} = 5$. Each pixel is assigned a texton ID according to its cluster number, i.e. $k \in \{1, 2, \dots, 5\}$, so the texton map is a greyscale image with values in $\{1, 2, \dots, 5\}$. The cluster IDs are sorted in ascending order of the average FLAIR intensity of the pixels within each cluster. The texton feature of a superpixel is defined as the histogram of the texton IDs within that superpixel. Figure 4-11 shows the process of texton map extraction, together with an example texton map and the corresponding texton histograms. As can be seen, the texton ID histograms of tumour superpixels differ from those of the normal brain.
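The texton pipeline described above (k-means on the response vectors, renumbering clusters by mean FLAIR intensity, then one histogram per superpixel) can be sketched as follows. This is a minimal illustration, not the author's implementation: it uses a plain Lloyd's k-means written in NumPy, initialised with $k$ distinct response vectors (an assumption made so that the sketch is deterministic and needs no extra library), and assumes a superpixel label image is already available. All function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index 0..k-1 for each row of X.
    Requires at least k distinct rows (initial centres are distinct rows)."""
    rng = np.random.default_rng(seed)
    uniq = np.unique(X, axis=0)
    centres = uniq[rng.choice(len(uniq), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):        # keep old centre if a cluster empties
                centres[j] = X[labels == j].mean(0)
    return labels

def texton_histograms(responses, flair, superpixels, k=5):
    """Texton map with IDs 1..k sorted by mean FLAIR intensity, and the
    k-bin texton histogram of every superpixel."""
    H, W, nfb = responses.shape
    labels = kmeans(responses.reshape(-1, nfb).astype(float), k)
    means = [flair.ravel()[labels == j].mean() for j in range(k)]
    rank = np.empty(k, dtype=int)
    rank[np.argsort(means)] = np.arange(1, k + 1)   # darkest cluster -> ID 1
    texton_map = rank[labels].reshape(H, W)
    sp, tm = superpixels.ravel(), texton_map.ravel()
    hists = {s: np.bincount(tm[sp == s], minlength=k + 1)[1:]
             for s in np.unique(sp)}
    return texton_map, hists

flair = np.repeat([10.0, 20.0, 30.0, 40.0, 50.0], 20).reshape(10, 10)
sp = np.arange(100).reshape(10, 10) // 50       # two superpixels: top and bottom half
tmap, hists = texton_histograms(flair[..., None], flair, sp)
```

In this toy example the single "response" is the intensity itself, so the top superpixel has histogram `[20, 20, 10, 0, 0]` and the bottom one `[0, 0, 10, 20, 20]`.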
Figure 4-10 Filter responses obtained by convolving the image with the Gabor kernels in the filter bank separately for different size, direction and sinusoid wavelengths.
Figure 4-11 Example of calculating texton IDs for normal brain and tumour. The plots present the average texton histogram of the superpixels inside each region, i.e. tumour and normal brain. Note that here the IDs are ordered according to the initial k-means cluster centres. This is an illustrative example; later, the clusters are sorted in ascending order of their average intensity.
Fractal Features
Fractal features are calculated using the segmentation-based fractal texture analysis (SFTA) method (Costa et al., 2012). In this method, the image is decomposed into a set of binary images using multi-level thresholds computed with the Otsu algorithm (Liao et al., 2001). The number of thresholds, $N_{threshold}$, is user-defined and is the tuneable parameter of the fractal analysis. For single-modality MRI data, $N_{threshold} = 3$ is selected, as discussed in Section 4.3.3. Thereafter, all the image boundaries are extracted from each binary channel using edge detection (Canny, 1986). Three fractal features are calculated from these binary edge channels: area, intensity and fractal dimension. The area feature is the number of edge pixels in a superpixel. The intensity feature is the mean intensity of the image pixels corresponding to the edge pixels in a superpixel. The fractal dimension represents the complexity of the image structure and is calculated from the image boundaries using
$$D_0 = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log \varepsilon^{-1}}, \tag{4-21}$$
where $N(\varepsilon)$ denotes the number of hyper-cubes (squares in 2D space) of dimension $E$ and side length $\varepsilon$ needed to cover the boundary. An approximation of the fractal dimension is obtained from the binary images using the box-counting algorithm (Schroeder, 2009).
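The following sketch illustrates the superpixel fractal features under several loudly stated assumptions. The thresholds are taken as given (they could, for example, be obtained with scikit-image's `threshold_multiotsu`); each binary image is assumed to mark pixels brighter than one threshold; boundaries are extracted with a simple morphological border (a 1-pixel with at least one 4-neighbour equal to 0) as a stand-in for Canny; and the fractal dimension of a superpixel is estimated by box counting on its edge pixels, fitting the slope of $\log N(\varepsilon)$ against $\log \varepsilon^{-1}$ for box sizes 1, 2, 4, and so on. All function names are hypothetical.

```python
import numpy as np

def binary_decomposition(image, thresholds):
    """One binary image per threshold: pixels brighter than t (assumed scheme)."""
    return [np.asarray(image) > t for t in thresholds]

def border(binary):
    """1-pixels with at least one 4-neighbour equal to 0 (stand-in for Canny)."""
    b = np.pad(np.asarray(binary, dtype=bool), 1, constant_values=False)
    interior = (b[1:-1, 1:-1] & b[:-2, 1:-1] & b[2:, 1:-1]
                & b[1:-1, :-2] & b[1:-1, 2:])
    return b[1:-1, 1:-1] & ~interior

def box_count_dimension(edges):
    """Box-counting estimate of Eq. (4-21) over box sizes 1, 2, 4, ..."""
    edges = np.asarray(edges, dtype=bool)
    sizes, counts = [], []
    s = 1
    while s <= min(edges.shape):
        h = -(-edges.shape[0] // s) * s          # pad up to a multiple of s
        w = -(-edges.shape[1] // s) * s
        padded = np.zeros((h, w), dtype=bool)
        padded[:edges.shape[0], :edges.shape[1]] = edges
        n = padded.reshape(h // s, s, w // s, s).any(axis=(1, 3)).sum()
        if n > 0:
            sizes.append(s)
            counts.append(n)
        s *= 2
    if len(sizes) < 2:
        return 0.0
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

def superpixel_fractal_features(image, edges, superpixels):
    """(area, mean intensity, fractal dimension) of the edge pixels per superpixel."""
    feats = {}
    for s in np.unique(superpixels):
        on_edge = (superpixels == s) & edges
        area = int(on_edge.sum())
        if area == 0:                            # assumed value when no boundary
            feats[s] = (0, 0.0, 0.0)
            continue
        rows, cols = np.nonzero(on_edge)
        crop = on_edge[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        feats[s] = (area, float(image[on_edge].mean()), float(box_count_dimension(crop)))
    return feats
```

As a sanity check, a filled square gives a box-counting dimension of 2 and a straight line gives 1.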
The flowchart of the fractal analysis is depicted in Figure 4-12. Figure 4-13 shows the fractal features: area, mean intensity and fractal dimension. Figure 4-14 shows an example of the fractal dimension and mean intensity features calculated from healthy and tumour superpixels in the data of one patient with a Grade IV glioma. The two classes are well separated in the (mean intensity, fractal dimension) feature space for FLAIR images.
Figure 4-13 An example of fractal analysis applied to a Grade III glioma to generate superpixel based fractal feature maps: a) FLAIR image with the ground truth of oedema, b) area, c) mean intensity, d) fractal dimension.
Figure 4-14 Fractal dimension vs. mean intensity for healthy and tumour superpixels calculated from one FLAIR MRI scan with a Grade IV glioma.
Curvature Feature
Image curvature is a shape-based feature computed from the first derivatives of the image along the x and y directions, $f_x$ and $f_y$. The image normal at pixel $(x, y)$ is calculated using (Arridge, …)
$$\hat{N}(x, y) = \frac{1}{\left(f_x^2 + f_y^2\right)^{1/2}}\begin{pmatrix} f_x \\ f_y \end{pmatrix}. \tag{4-22}$$
The two-dimensional curvature of the image is the divergence of the normal in Equation (4-22) and is calculated using
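$$\mathrm{Curv} = \frac{f_{xx} f_y^2 + f_{yy} f_x^2 - 2 f_{xy} f_x f_y}{\left(f_x^2 + f_y^2\right)^{3/2}}, \tag{4-23}$$
where $f_{xx}$ and $f_{yy}$ are the second derivatives of the image intensity $I(x, y)$ with respect to x and y, and $f_{xy}$ is the mixed second derivative. The curvature feature of each superpixel is the average of the curvature values of all the pixels in the superpixel. Where $f_x = f_y = 0$, the denominator of Equation (4-23) vanishes, and a null value is assigned to the curvature feature.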
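Equation (4-23) can be sketched with central differences from `np.gradient`. Two choices here are assumptions, since the text does not specify them: the derivative scheme, and the handling of null values, where pixels with $f_x = f_y = 0$ are marked as NaN and excluded from the superpixel average (a superpixel with no valid pixel gets NaN). Function names are hypothetical.

```python
import numpy as np

def curvature_map(f):
    """Pixel-wise curvature of Eq. (4-23); NaN where f_x = f_y = 0."""
    f = np.asarray(f, dtype=float)
    fy, fx = np.gradient(f)            # axis 0 is y (rows), axis 1 is x (columns)
    fxx = np.gradient(fx, axis=1)
    fyy = np.gradient(fy, axis=0)
    fxy = np.gradient(fx, axis=0)
    g2 = fx ** 2 + fy ** 2
    num = fxx * fy ** 2 + fyy * fx ** 2 - 2 * fxy * fx * fy
    curv = np.full_like(f, np.nan)     # null value where the gradient vanishes
    ok = g2 > 0
    curv[ok] = num[ok] / g2[ok] ** 1.5
    return curv

def superpixel_curvature(f, superpixels):
    """Average curvature of the valid (non-null) pixels of each superpixel."""
    curv = curvature_map(f)
    out = {}
    for s in np.unique(superpixels):
        vals = curv[superpixels == s]
        vals = vals[~np.isnan(vals)]
        out[s] = vals.mean() if vals.size else np.nan
    return out
```

For the paraboloid $f = x^2 + y^2$, whose level sets are circles, the curvature at radius $r$ is $1/r$; the sketch reproduces this at interior pixels.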
Table 4-1 Total number of features calculated from an MRI FLAIR image

| Feature type | Number of features |
|---|---|
| First-order intensity statistics | 16 |
| Texton histogram | 5 |
| Fractal | 6 |
| Curvature | 1 |
| Total | 28 |
In total, 28 features are calculated for each superpixel, as listed in Table 4-1. The feature vector includes 5 texton histogram features from the 5 clusters and 6 fractal features obtained from 3 thresholded binary images (each binary image provides 3 fractal features). All the features except the 5 texton histogram features are normalised to the range [0, 30]. The value 30 is chosen because the average number of pixels in a superpixel is approximately 30, which is also the maximum value of the texton histogram counts. This ensures that all the features have similar dynamic ranges, close to that of the texton histogram values. The details of parameter setting in feature calculation will be discussed in Section 4.3.3.
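A minimal sketch of the normalisation step, assuming per-feature min-max scaling over all superpixels in a feature matrix (the text does not state whether the minimum and maximum are taken per image or over the whole training set) and assuming a constant column is mapped to 0. The column indices of the texton histogram features are passed in explicitly; `normalise_features` is a hypothetical name.

```python
import numpy as np

def normalise_features(F, texton_cols, upper=30.0):
    """Min-max scale each column of F (superpixels x features) to [0, upper],
    leaving the texton-histogram columns unchanged."""
    F = np.asarray(F, dtype=float).copy()
    for j in range(F.shape[1]):
        if j in texton_cols:
            continue
        col = F[:, j]
        lo, hi = np.nanmin(col), np.nanmax(col)   # ignore null curvature values
        F[:, j] = 0.0 if hi == lo else (col - lo) / (hi - lo) * upper
    return F

F = np.array([[1.0, 5.0, 10.0], [3.0, 7.0, 20.0], [2.0, 6.0, 30.0]])
N = normalise_features(F, texton_cols={2})
```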