
2.2 Basic Image Classification Categories

2.2.1 Pixel-based Approaches

Pixel-based approaches were the first to emerge in remote sensing and were common for low resolution data. Each pixel is analyzed and classified individually, one after another. Especially in low resolution data, one pixel can cover a large area (e.g. 1.1 km in the case of the Advanced Very High Resolution Radiometer (AVHRR) [57]), so a single pixel may contain a mixture of land cover classes. In high resolution data, by contrast, one pixel may contain only part of a shadow area at the edge of a tree crown.

In [58], the potential of very high resolution satellite data for tree species identification was studied based on IKONOS images with a resolution of 1 m panchromatic and 4 m multispectral. A small percentage of isolated pixels, often located at the boundaries between two distinct land cover zones, was observed, much like the salt and pepper noise described in [59]. To reduce this salt and pepper effect, a modal filter, which assigns the most frequent value within the filter mask to each pixel, was applied to the result images using a 3x3 pixel window as the filter mask. The authors concluded that the study reveals the limitations of pixel-based multispectral classification of very high resolution images and suggested that a region-based approach seems promising.
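The modal filtering step can be illustrated with a minimal sketch. It is not the implementation used in [58]: the border handling (a window truncated at the image edge), the tie-breaking rule and the small label image are assumptions made for illustration only.

```python
from collections import Counter


def modal_filter(labels, size=3):
    """Replace each class label by the most frequent label in a size x size window."""
    rows, cols = len(labels), len(labels[0])
    half = size // 2
    out = [row[:] for row in labels]  # work on a copy, the input stays untouched
    for r in range(rows):
        for c in range(cols):
            # Assumption: the window is simply truncated at the image border.
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            modes = [label for label, n in counts.items() if n == best]
            # Assumption: on a tie the original label is kept if it is one of the modes.
            out[r][c] = labels[r][c] if labels[r][c] in modes else min(modes)
    return out


# Hypothetical classification result with one isolated pixel of class 2 inside class 1.
classified = [
    [1, 1, 1, 2, 2],
    [1, 2, 1, 2, 2],
    [1, 1, 1, 2, 2],
]
smoothed = modal_filter(classified)
```

In this toy example the isolated class 2 pixel in the middle of the class 1 region is relabeled, while the homogeneous class 2 region on the right is left unchanged.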

Airborne hyperspectral imagery was used in [60] to compare a multiple endmember spectral angle mapper (SAM) to a conventional SAM on a pixel basis. A spectral angle mapper treats the spectra extracted from the input data sources as vectors and calculates the angle between the spectrum of the current pixel and a reference spectrum or endmember. Endmembers are pure signature spectra of the land cover classes present in the scene [61]. Seven common species in South Africa were used for the analysis in [60]. The sensor operated in the spectral region between 384.8 and 1054.3 nm at a spectral resolution of 9.23 nm, which results in 72 bands, and at a spatial resolution of 1.12 m. The Environment for Visualizing Images (ENVI) was used as image processing software. The multiple endmember SAM is similar to the k-nearest-neighbor classifier, but it uses the angle between the feature vectors containing the spectral values as the measure of similarity. A high intraclass spectral variability was observed for all considered species. A bootstrapping approach was used for sample selection, and an overall accuracy of 64.1 % was achieved for the multiple endmember SAM in combination with the optimal band combination, which contained 31 of the 72 bands. The most important region for discrimination was reported to be the red edge (RE) region, which was influenced by chlorophyll amounts and by leaf mass or stacking. The authors also reported that including short wavelength infrared bands might improve classification, particularly where interspecies differences in leaf moisture regimes exist.
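The angle computation behind a SAM can be sketched as follows. This is an illustrative sketch, not the ENVI implementation or the exact procedure of [60]: the spectra and class names are hypothetical, and assigning the class of the single closest reference spectrum is an assumed simplification of the multiple endmember variant.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating point round-off before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def classify_sam(pixel, endmembers):
    """Return (class, angle) of the reference spectrum with the smallest angle.

    endmembers maps a class name to a list of reference spectra: one spectrum
    per class corresponds to a conventional SAM, several per class to a
    multiple endmember SAM (assumed nearest-reference rule).
    """
    best_class, best_angle = None, math.inf
    for name, spectra in endmembers.items():
        for ref in spectra:
            angle = spectral_angle(pixel, ref)
            if angle < best_angle:
                best_class, best_angle = name, angle
    return best_class, best_angle


# Hypothetical three-band reflectance spectra.
endmembers = {
    "species_a": [[0.10, 0.20, 0.60], [0.12, 0.25, 0.50]],
    "species_b": [[0.30, 0.30, 0.30]],
}
pixel = [0.20, 0.40, 1.20]  # a brighter, scaled copy of the first species_a spectrum
label, angle = classify_sam(pixel, endmembers)
```

Because only the direction of the spectral vector matters, the scaled pixel spectrum is matched to its reference at an angle of practically zero, which is why the angle is insensitive to overall brightness differences.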

A concept of decision fusion in a pixel-based context was proposed in [18]. Several preliminary classifications were performed on individual data sources and the results were fused. When the classifiers disagreed, the conflict was resolved by modeling the global reliability of each algorithm and estimating the point-wise accuracy. IKONOS images of urban areas of Reykjavik were used as test images. A fuzzy classifier and a conjugate gradient neural network were compared to the proposed fusion of these two classifiers and to fusion methods using other combination rules. The complementary use of classifiers was reported to improve global classification accuracies significantly. Six classes were discriminated, and the overall accuracy achieved on the first image was 59.1 %, compared to 40.3 % for the neural network and 52.1 % for the fuzzy logic classifier. On the second image, the fusion method achieved 75.7 % overall accuracy, while the neural network achieved 57.0 % and the fuzzy logic classifier 43.2 %.
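The general idea of weighting disagreeing classifiers by a global and a point-wise reliability can be sketched as below. This is a deliberately simplified stand-in and not the fusion rule of [18]: using a fixed score per classifier as global reliability, using the margin between the two highest memberships as point-wise reliability, and all numbers and class names are assumptions.

```python
def fuse_pixel(memberships, reliability):
    """Fuse the class memberships of several classifiers for a single pixel.

    memberships: classifier name -> {class name: membership in [0, 1]}
    reliability: classifier name -> global reliability (assumed to be given,
                 e.g. an accuracy estimated on validation data)
    """
    labels = {name: max(m, key=m.get) for name, m in memberships.items()}
    if len(set(labels.values())) == 1:
        # All classifiers agree, so there is no conflict to resolve.
        return next(iter(labels.values()))

    scores = {}
    for name, m in memberships.items():
        ranked = sorted(m.values(), reverse=True)
        # Assumed point-wise reliability: margin between best and second-best class.
        pointwise = ranked[0] - ranked[1]
        weight = reliability[name] * pointwise
        for cls, value in m.items():
            scores[cls] = scores.get(cls, 0.0) + weight * value
    return max(scores, key=scores.get)


# Hypothetical outputs of two classifiers for one pixel.
memberships = {
    "neural_net": {"road": 0.55, "roof": 0.45},
    "fuzzy": {"road": 0.10, "roof": 0.90},
}
reliability = {"neural_net": 0.6, "fuzzy": 0.5}
fused = fuse_pixel(memberships, reliability)
```

Here the neural network is slightly more reliable globally but almost undecided for this pixel, so the confident fuzzy classifier dominates the fused decision.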

A multiclass SVM was compared to a decision tree (DT), a multilayer perceptron neural network and a discriminant analysis classification in [19]. Classification performance was reported as overall accuracy. The SVM classifier was reported to be significantly more accurate than the DT and the discriminant analysis. It was also noted, however, that for the SVM it was critical that the training set include useful support vectors, which was more likely with a larger training set. The authors further concluded that, because the classifiers differ, they may be useful candidates for a consensus or ensemble approach. The study was performed on imagery acquired by an airborne thematic mapper over an agricultural region, with 11 spectral bands and a ground resolution of 5 m. Only three of the 11 spectral bands were used. Six classes were discriminated, and the highest test set accuracy of 93.75 % was achieved by the SVM.
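Since overall accuracy is the measure reported throughout this section, a short sketch of how it follows from a confusion matrix may be helpful. The class names and labels below are hypothetical and are not data from [19].

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of correctly classified samples: diagonal sum divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical test set of eight samples and three classes.
classes = ["wheat", "barley", "grass"]
reference = ["wheat", "wheat", "barley", "barley", "grass", "grass", "grass", "wheat"]
predicted = ["wheat", "barley", "barley", "barley", "grass", "grass", "wheat", "wheat"]

matrix = confusion_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)
```

Six of the eight hypothetical samples lie on the diagonal of the matrix, so the overall accuracy in this example is 0.75. Note that overall accuracy does not show how the errors are distributed over the individual classes.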

Two more pixel-based approaches, both using SVMs, were presented in [16] and [21]. The study in [16] analyzed the sensitivity of SVMs to random feature selection and compared the approach to a multiple classifier system based on SVMs. It was conducted on hyperspectral data. In the experiments, the proposed ensemble strategy improved the classification accuracy significantly, by up to 15 %. Feature subset size and ensemble size had a significant impact on the accuracy and the stability of the SVM ensembles. The highest accuracy achieved with 157 bands from an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), with a resolution of 20 m, an image size of 2048x614 pixels and 22 classes, was 97.7 %. The result for the Reflective Optics System Imaging Spectrometer (ROSIS-3), using 103 channels at a resolution of 1.3 m per pixel, an image size of 610x340 pixels and nine classes, was 81.6 %. Both accuracies were achieved with the SVM ensemble.

In [21], an ensemble approach was employed that was based on the random forest approach, which will be described in section 2.5.5. Three multispectral bands acquired by the IKONOS MS sensor at a resolution of 4 m per pixel in an urban/suburban area were used. The SVM approach was chosen due to its "remarkable generalization and robustness capabilities", with [62] given as reference. To add spatial information to the spectral features, the grey-level co-occurrence matrix (GLCM) was used; in particular, the mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation were calculated for each of the spectral bands. Eight classes were defined in a 700x400 pixel area, and an overall accuracy of 97.86 % was reported.
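How the eight texture measures are derived from a GLCM can be sketched as follows. The sketch uses commonly used Haralick-style definitions, which are an assumption because the exact formulas, window size, pixel offset and grey-level quantization of [21] are not given here. The 4x4 patch is a hypothetical example with four grey levels.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized grey-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = image[r][c], image[rr][cc]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions (symmetric)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Eight texture measures from a symmetric, normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean = sum(i * v for i, j, v in cells)
    variance = sum((i - mean) ** 2 * v for i, j, v in cells)
    covariance = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    return {
        "mean": mean,
        "variance": variance,
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for i, j, v in cells if v > 0),
        "second_moment": sum(v * v for i, j, v in cells),
        # Undefined for a constant patch; 1.0 is used here as a convention (assumption).
        "correlation": covariance / variance if variance > 0 else 1.0,
    }


# Hypothetical patch of one spectral band, quantized to four grey levels.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)
features = texture_features(p)
```

In a per-pixel setting, such features would be computed in a moving window around every pixel of each spectral band and appended to the spectral values of that pixel as additional feature dimensions.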

In [63], hyperspectral data collected directly above the canopy from a bucket truck was used, and only the sunlit portions of the crowns were considered. Overall, 280 samples of six species were used. A stepwise discriminant analysis was performed to identify the variables that maximize between-species variability. The authors stated that the visible and NIR regions were most important for species recognition, together with some distinct bands in the SWIR region. The spectral discrimination between deciduous and coniferous species was concluded to be likely feasible in an operational context. The discrimination of hardwood species was encouraging, but external influences may worsen classification accuracy. It was noted that the limited species selection should be kept in mind when interpreting that study.

In [64], a LIDAR-based, object-oriented discriminant classification approach was used. An accuracy of 89 % was achieved for the discrimination between deciduous and coniferous species based on height and intensity per return.

A comparison of multispectral and multitemporal data in high spatial resolution imagery for the classification of individual tree species was presented in [49]. Airborne images and false-color infrared images were acquired on nine dates from May to October. Four deciduous species were discriminated using an ML classifier for an initial pixel-based classification. Each delineated tree crown was then assigned the species that was most frequently identified among the pixels within it. Training pixels were selected in an iterative process to find the samples that best represent the desired classes. The maximum accuracy reported was 76 %. Of all spectral bands, the blue band was reported to be the best single band for classification. Multitemporal multispectral data yielded additional information, but for a smaller number of dates (fewer than five), multispectral information was more valuable than multitemporal information. The infrared band was found to be the least valuable band, which is not in accordance with other results, e.g. those presented in [65].
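The crown-level assignment by majority vote described for [49] can be sketched as follows. The species names, the crown identifiers and the handling of ties are assumptions for illustration, as the source does not specify them.

```python
from collections import Counter


def assign_crown_species(pixel_species, crown_ids):
    """Assign each delineated crown the species most frequent among its pixels.

    pixel_species: 2D list with the per-pixel classification result
    crown_ids:     2D list of the same shape with a crown identifier per pixel,
                   or None for pixels outside any delineated crown
    """
    votes = {}
    for species_row, crown_row in zip(pixel_species, crown_ids):
        for species, crown in zip(species_row, crown_row):
            if crown is None:
                continue  # background pixels do not vote
            votes.setdefault(crown, Counter())[species] += 1
    # Assumption: ties are resolved in favor of the species encountered first.
    return {crown: counter.most_common(1)[0][0] for crown, counter in votes.items()}


# Hypothetical per-pixel result and crown delineation.
pixel_species = [
    ["oak", "oak", "maple"],
    ["maple", "maple", "oak"],
    ["birch", "maple", "maple"],
]
crown_ids = [
    [1, 1, 2],
    [1, 2, 2],
    [None, 2, 2],
]
crown_species = assign_crown_species(pixel_species, crown_ids)
```

In this example, single misclassified pixels inside a crown are outvoted by the majority, which removes isolated errors at the crown level.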

Fig. 2.6a shows a pixel-based classification. Fig. 2.6b shows an object-based classification of the same area, but based on another data set that was recorded after storm damage had occurred. The salt and pepper effect is clearly visible in the pixel-based classification, while it does not occur in the object-based classification.

Figure 2.6: Examples of (a) pixel-based and (b) object-based classification performed on the same area but at different times, with storm damage in between.
