Placing additional samples - Machine learning based classification for semantic world modeling

In some cases, insufficient reference data is available, as for example in Glindfeld. The area is located east of the test area Schmallenberg and has a size of 4.7 km². Only 25 samples were available in the sample plot inventory data: two beech, 22 spruce and one Douglas fir sample. The classification based solely on these samples is shown in Fig. 4.19.

In the reliability estimates in Fig. 4.19b, white denotes high reliability. Because the number of samples is insufficient, the algorithm reports high reliability although the classification it produces is actually extremely poor. As the area was well known, some additional samples could be placed using the 4D-GIS. The result is shown in Fig. 4.20.

The classification is already much better than the original classification based on the sample plot inventory data. Due to the additionally placed samples, the algorithm can now also classify Douglas fir and oak. There are still some errors, however: in the upper right part of the image, some deciduous trees are misclassified as spruce, and in the upper left part, some spruces are misclassified as beech. The reliability also shows rather low values for the upper half of the image. In the iteration loop, additional samples were then placed in the regions that are misclassified in Fig. 4.20. Samples placed in these regions define the borders between the classes more clearly, because the objects there have characteristics that resemble those of other classes, which is what led to the misclassification in the first place. Choosing some of these objects as samples refines the characteristics of the classes. The samples that were placed in addition to the sample plot inventory data are shown in Fig. 4.22.

Figure 4.19: Classification based on only 25 samples. (a) Classification, (b) Reliability

Figure 4.20: Classification based on additionally placed samples. (a) Classification, (b) Reliability

(a) Classification (b) Reliability

Figure 4.22: Additionally placed samples

fication of spruce as beech. Oak samples in the right part of the image allow the classification of oak, as oak was not present in the sample plot inventory data at all. Furthermore, they prevent oak from being misclassified as Douglas fir or spruce. In the first classification, Douglas fir was not classified at all, as only one sample was available in the sample plot inventory data. The additional samples allow Douglas fir to be classified. Fig. 4.21 shows the classification result for all samples, including the ones placed in the iteration loop.

The samples led to a very good classification result, which is considered to be quite reliable. In the uppermost part of the image to the right, there is an area that is still considered unreliable. This is an accurate estimate, as the area contains a significant proportion of larch, which is not represented by any of the samples and therefore cannot be classified. However, the exact location of the larch trees in this mixed stand is unknown, and therefore no samples could be placed.

Developed Classification Approaches

The first experiments were performed using pixel based analysis, which proved difficult for several reasons. As noted in [58, 77, 81], a salt and pepper effect, as shown in Fig. 5.1, was observed in the resulting images. The salt and pepper effect is undesired, as the single pixels do not denote single trees but result from under- or overexposure, e.g. shadow areas between the trees or small, very light parts within a tree crown.

Figure 5.1: Pixel based classification with salt and pepper effect

5.1 Decision Tree

As a proof of concept for tree species classification based on distinct spectral high resolution bands, a decision tree was manually induced, based on the exploratory data analysis in section 4.2. The region images, as described in section 3.1.3, were calculated from the normalized digital surface model (nDSM) and used as image objects in the classification approach.

5.1.1 Tree Structure

After filtering shadow and non-forest areas using the nDSM, the overall brightness of the color infrared (CIR) image and simple user-defined thresholds, the objects are classified according to the tree in Fig. 5.2. For this proof of concept, the other broadleaved species with short rotation time (OBS.) were omitted, as this group is a mixture of several different species.
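The filtering step can be illustrated with a minimal sketch. It assumes that each image object carries a mean nDSM height and a mean CIR brightness. The field names `ndsm_height_m` and `cir_brightness` and both default thresholds are hypothetical placeholders, not values from this work.

```python
def filter_objects(objects, min_height_m=3.0, min_brightness=40.0):
    """Keep only objects that are tall enough to be forest and bright
    enough not to be shadow.

    Both thresholds are hypothetical, user-defined placeholders.
    """
    kept = []
    for obj in objects:
        is_forest = obj["ndsm_height_m"] >= min_height_m    # nDSM test
        is_lit = obj["cir_brightness"] >= min_brightness    # shadow test
        if is_forest and is_lit:
            kept.append(obj)
    return kept


objects = [
    {"id": 1, "ndsm_height_m": 22.0, "cir_brightness": 95.0},  # tree crown
    {"id": 2, "ndsm_height_m": 0.4, "cir_brightness": 120.0},  # meadow
    {"id": 3, "ndsm_height_m": 18.0, "cir_brightness": 12.0},  # shadow
]
remaining = filter_objects(objects)
```

Only the objects that pass both tests are handed to the decision tree.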

Figure 5.2: Manually induced decision tree. Node 1 (SWIR-R, SWIR-G, SWIR-NIR, NIR-G) separates {Douglas fir, spruce} from {beech, oak, larch}. Node 2 (AISA25 / AISA146, SWIR-NIR, SWIR-R) separates Douglas fir from spruce. Node 3 (NIR-B, NIR-G, NIR-R, SWIR-NIR, B-G, SWIR-G) separates beech from {oak, larch}. Node 4 (AISA25 / AISA151, NIR-G, SWIR-NIR, G-R) separates oak from larch.

The decision tree was induced using features with a good ratio of interspecies to intraspecies variability, as described in section 4.2. In the first step, Douglas fir and spruce are separated from beech, oak and larch using four difference bands, namely SWIR − R, SWIR − G, SWIR − NIR and NIR − G. In the second level of the decision tree, Douglas fir is separated from spruce using the ratio band AISA25/AISA146 and the difference bands SWIR − NIR and SWIR − R. In the second branch, beech is first separated from oak and larch using six difference bands, namely NIR − B, NIR − G, NIR − R, SWIR − NIR, B − G and SWIR − G. In the last step, oak is separated from larch using one ratio band and three difference bands, namely AISA25/AISA151, NIR − G, SWIR − NIR and G − R. For each of these bands a user-defined threshold is used, which can also be extracted automatically from available training data. AISA25, AISA146 and AISA151 are the 25th, the 146th and the 151st AISA++ hyperspectral bands respectively, with wavelengths of 1.126 µm, 1.888 µm and 1.920 µm.
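The structure described above can be sketched as a small data structure with a traversal function. Several points are assumptions of this sketch rather than part of the described method: the text does not state how the per-band decisions at one node are combined, so a simple majority vote is used; the side of the threshold assigned to each branch is arbitrary; and all threshold values (0.0 for difference bands, 1.0 for ratio bands) are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    thresholds: Dict[str, float]  # band name -> threshold (placeholder values)
    low: Union["Node", str]       # taken if most bands lie below their threshold
    high: Union["Node", str]      # taken otherwise (assumption: ties go here)


def classify(node, features):
    """Walk the tree until a leaf (a species name) is reached."""
    while isinstance(node, Node):
        votes_low = sum(features[name] < t for name, t in node.thresholds.items())
        # Assumption: majority vote over the bands used at this node.
        node = node.low if 2 * votes_low > len(node.thresholds) else node.high
    return node


# Branch orientation (which side is "low") is chosen arbitrarily here.
OAK_LARCH = Node(
    {"AISA25/AISA151": 1.0, "NIR-G": 0.0, "SWIR-NIR": 0.0, "G-R": 0.0},
    low="oak", high="larch")
BROADLEAF_LARCH = Node(
    {"NIR-B": 0.0, "NIR-G": 0.0, "NIR-R": 0.0,
     "SWIR-NIR": 0.0, "B-G": 0.0, "SWIR-G": 0.0},
    low="beech", high=OAK_LARCH)
DOUGLAS_SPRUCE = Node(
    {"AISA25/AISA146": 1.0, "SWIR-NIR": 0.0, "SWIR-R": 0.0},
    low="Douglas fir", high="spruce")
TREE = Node(
    {"SWIR-R": 0.0, "SWIR-G": 0.0, "SWIR-NIR": 0.0, "NIR-G": 0.0},
    low=DOUGLAS_SPRUCE, high=BROADLEAF_LARCH)
```

Calling `classify(TREE, features)` with a dictionary of band values for one image object returns a species name. Only the tree topology and the band names per node are taken from the text.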

At each node, the appropriate branch or species is chosen and a reliability estimate is derived from the classification procedure. The reliability depends on the distance between the features used in the current decision and their thresholds, and on the variabilities and spectral overlaps that were estimated during training along with the thresholds. These reliability images are a valuable source of information, as they point to areas where the results need to be confirmed by an expert. The confirmed or corrected data points can then be used as additional training data in a second run to refine the algorithm.
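The exact reliability formula is not given here, so the following is only an illustrative sketch of the idea that reliability grows with the distance from the threshold relative to the class variability. The linear scaling, the clipping to [0, 1], the `scale` parameter and the averaging over bands are all assumptions of this sketch.

```python
def feature_reliability(value, threshold, sigma, scale=2.0):
    """Hypothetical per-band reliability in [0, 1].

    The distance to the threshold is expressed in units of `scale * sigma`,
    where `sigma` stands for the variability estimated during training.
    """
    if sigma <= 0:
        return 0.0 if value == threshold else 1.0
    return min(1.0, abs(value - threshold) / (scale * sigma))


def node_reliability(values, thresholds, sigmas):
    """Average the per-band reliabilities of one decision (assumption)."""
    scores = [
        feature_reliability(values[name], thresholds[name], sigmas[name])
        for name in thresholds
    ]
    return sum(scores) / len(scores)
```

With this choice, an object whose band values lie exactly on the thresholds receives a reliability of 0, and an object at least `scale` standard deviations away from every threshold receives 1.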

5.1.2 Automatic Threshold Selection

After the structure of the tree is determined, the thresholds can be user-defined or calculated automatically from a training data set. For each feature at each node, the mean and the standard deviation of the two class groupings corresponding to the two branches are calculated. The difference between the two mean values, the interspecies variability, is then divided into two parts such that the quotient of the two parts equals the quotient of the two standard deviations. Each of these two parts of the distance between the mean values is called a partial weighted distance. The threshold is then calculated as the sum of the mean of the class grouping with the lower value and the corresponding partial weighted distance. Assuming that the class grouping g1 has the lower mean value in the currently assessed feature, the corresponding formula is given in (5.1).

$$\text{threshold} = \mu_{g1} + \frac{\sigma_{g1}}{\sigma_{g1} + \sigma_{g2}} \cdot \left(\mu_{g2} - \mu_{g1}\right) \tag{5.1}$$

Equation (5.1) takes the standard deviations, and therefore the width of the distributions, into account. Although the distributions are not Gaussian, they are similar enough to Gaussian distributions for the standard deviations to serve as an approximation. More sophisticated approaches exist, such as expectation maximization, but the described approach is faster to execute and is sufficient for the manual decision tree as a proof of concept for very high resolution classification based on multispectral bands.

In document Machine learning based classification for semantic world modeling : support vector machine based decision tree for single tree level forest species mapping (Page 148-154)