
4.2 Land Cover and Crop Type Classification

4.2.3 Visual inspection

To check whether what was classified in the map corresponds to reality (and vice versa), three examples were selected for visual inspection, using as base map the 2018 orthophotos available from DGT as a Web Map Service (WMS), in false color (RGB: b8, b4, b3, i.e. NIR, Red, Green). The aim is to determine whether the pre-processing rules selected the samples correctly, so that each class is predicted accurately. In addition, the tile transitions are inspected closely to see whether using information from the whole biogeographic region ensured continuity in the classification.

Class correctly predicted on the map

The first example corresponds to the case where the reference class “Open Maritime Pine” (PA=78) is predicted as “Open Maritime Pine”, and it illustrates how the labels allow the class to be classified correctly. In Figure 25, the COS 2018 polygon with CODE 3121 “Florestas de Pinheiro bravo” (Maritime Pine forests) was reclassified as Maritime Pine. According to the COS guidelines [7], maritime pine accounts for 75% or more of the total area covered by forest in such a polygon. However, the polygon also contains a regular network of service roads, whose pixels would have the same probability of being selected as “Maritime Pine” samples. Nevertheless, after the pre-processing steps in section 3.3.2 were applied, the area was classified as “Open Maritime Pine Forest” because, according to the HRL, its coniferous tree cover is more than 10% and less than 60%. When the samples for this class are included, neither the service roads nor the logged areas are used for feature extraction, which reduces possible misclassifications. As Figure 25 shows, the final classification coincides with the reference class; however, the service road network is classified as Baresoil in some areas and as Urban in others, Baresoil being the more appropriate label.
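The HRL density rule described above can be sketched as follows. This is a minimal illustration, assuming the HRL Tree Cover Density (TCD) layer is available as a NumPy array of percentages; the function name and stratum codes are hypothetical, and only the 10% and 60% thresholds come from the text (assigning exactly 60% to the dense stratum is an assumption, since the text does not cover that value). Pixels of service roads, whose tree cover is close to zero, fall outside both strata and are therefore never sampled.

```python
import numpy as np

# Hypothetical stratum codes (not taken from the thesis).
EXCLUDED, OPEN, DENSE = 0, 1, 2

def forest_density_stratum(tcd: np.ndarray) -> np.ndarray:
    """Assign each pixel to a forest-density stratum from HRL Tree Cover Density (%)."""
    stratum = np.full(tcd.shape, EXCLUDED, dtype=np.uint8)
    # Open forest: more than 10% and less than 60% tree cover (thresholds from the text).
    stratum[(tcd > 10) & (tcd < 60)] = OPEN
    # Dense forest: 60% or more (the inclusive boundary is an assumption).
    stratum[tcd >= 60] = DENSE
    return stratum

# Illustrative TCD values: the last column mimics a service road (~0% tree cover).
tcd = np.array([[45, 50, 0],
                [55, 30, 5],
                [72, 40, 0]])
print(forest_density_stratum(tcd))
# [[1 1 0]
#  [1 1 0]
#  [2 1 0]]
```

Road-like pixels end up in the excluded stratum, so they never contribute to the spectral signature of the forest class.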

Figure 25 COS 2018 polygon (OBJECTID: 491011) before and after pre-processing, compared with the predictions for the class Open Maritime Pine Forest. Scale: 1:30,000

The second example, shown in Figure 26, is the case where the reference class “Holm Oak” (PA=94) is predicted as “Holm Oak”. The COS 2018 polygon is labeled as a pure Holm Oak forest, and the HRL mask restricted training and testing points to areas where the tree cover density was higher than 60%. However, because the IFAP 2018 dataset is given priority over COS when the two are overlaid, some Holm Oak areas were replaced by agricultural grasslands. As a result, the final classification of the polygon is a combination of Holm Oak and agricultural grasslands.
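The priority rule that removed part of the Holm Oak polygon can be illustrated with a small sketch. It assumes both reference datasets have been rasterized to the same grid, with 0 meaning "no label"; the class codes and the function name are hypothetical. Wherever an IFAP 2018 parcel exists, its label replaces the COS 2018 label, which is how Holm Oak pixels can end up labeled as agricultural grassland.

```python
import numpy as np

NO_LABEL = 0      # hypothetical "no reference label" value
HOLM_OAK = 5      # hypothetical class codes, for illustration only
AGRI_GRASS = 12

def merge_reference_labels(ifap: np.ndarray, cos: np.ndarray) -> np.ndarray:
    """Overlay two rasterized reference layers, giving IFAP 2018 priority over COS 2018."""
    return np.where(ifap != NO_LABEL, ifap, cos)

# A COS polygon labeled entirely as Holm Oak...
cos = np.full((2, 3), HOLM_OAK)
# ...partly overlapped by IFAP parcels declared as agricultural grassland.
ifap = np.array([[NO_LABEL, AGRI_GRASS, AGRI_GRASS],
                 [NO_LABEL, NO_LABEL, AGRI_GRASS]])
merged = merge_reference_labels(ifap, cos)
print(merged)
# [[ 5 12 12]
#  [ 5  5 12]]
```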

Figure 26 COS 2018 polygon (OBJECTID: 382944) before and after pre-processing, compared with the predictions for the class Holm Oak. Scale: 1:10,000

Class incorrectly predicted on the map

In the confusion matrix, Orchards is one of the classes with the lowest accuracy. This example, where the reference class “Orchards” (PA=35) is predicted as “Natural grassland”, illustrates a classification issue related to plantations: 201 Orchards samples were assigned to natural grassland, a commission error for that class. One of the difficulties in classifying Orchards is that the class contains 17 tree types, ranging from citrus to almonds. Moreover, orchard trees are usually planted with 2 m spacing, so the surface reflectance captured by Sentinel-2, whose finest spatial resolution is 10 m, is a mixture of crop and soil. In the first square of Figure 27, half of the polygon shows more vegetation at soil level within the rows, which may correspond to creeping vegetation (i.e., growing close to the ground). In the final classification, the polygon was labeled as olive trees and natural grassland; none of it was assigned to the class it actually belongs to.
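The producer's accuracy (PA) quoted in these examples, and the commission error mentioned above, can be computed from a confusion matrix as sketched below. The sketch assumes rows hold the reference classes and columns the predicted classes; the function name is hypothetical and the matrix is a toy example, not the results of this thesis. Under this convention, Orchards samples predicted as natural grassland lower the PA of Orchards (omission) and add to the commission error of natural grassland.

```python
import numpy as np

def accuracy_metrics(cm):
    """Overall, producer's and user's accuracy from a confusion matrix.

    Assumes rows = reference classes and columns = predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    pa = correct / cm.sum(axis=1)  # producer's accuracy, per reference class
    ua = correct / cm.sum(axis=0)  # user's accuracy, per predicted class
    return {
        "OA": correct.sum() / cm.sum(),
        "PA": pa,
        "UA": ua,
        "omission": 1 - pa,
        "commission": 1 - ua,
    }

# Toy matrix with classes [Orchards, Natural grassland, Olive trees].
cm = [[6, 3, 1],
      [1, 8, 1],
      [0, 1, 9]]
m = accuracy_metrics(cm)
print(round(m["OA"], 3), m["PA"].round(2), m["commission"].round(2))
```

Here the three Orchards samples predicted as natural grassland lower the Orchards PA to 0.6 and raise the commission error of natural grassland.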

Figure 27 IFAP 2018 polygon (OSAID: 4410598) before and after pre-processing, compared with the predictions for the class Orchards. Scale: 1:6,000

Tile transitions

The aim was to classify the biogeographic region corresponding to stratum 214 in Figure 2. This area is covered by two separate Sentinel-2 tiles, which can introduce discontinuities along their boundaries [3]. To avoid this, the samples were extracted automatically for the whole study area and their features were retrieved from both tiles. The classifier was therefore trained with information from the entire stratum, so that adjacent pixels on either side of the tile border are assigned to the same class. Figure 28 illustrates this with the Land Cover and Crop Type map (a) and three locations along the tile border. The first location (b) is the Tejo estuary, where the continuity of the river and wetlands is preserved. The other two locations, on the border between Santarém and Setúbal (c) and near Évora (d), maintain the continuity of classes such as forests and rice fields across the tiles.

Figure 28 (a) Land Cover and Crop Type map in raster format with three locations on the border of the Sentinel-2 tiles 29SND (upper) and 29SNC (lower), (b) Tejo estuary, (c) border between Santarém and Setúbal, (d) near Évora

5 CONCLUSIONS

Up-to-date land cover and crop type information plays an essential role in commercial and environmental monitoring and planning, and its updating has benefited from remote sensing imagery at national, continental and global levels. However, many challenges remain in producing accurate and timely land cover and crop type maps. This thesis focused on the use of intra-annual composites of Sentinel-2, supervised classification with random forest, and automatic sample extraction based on a set of pre-processing rules. An overall accuracy of 76% was achieved for 31 land cover and crop type classes.

The use of monthly composites of L2 Sentinel-2 data made it possible to obtain cloud-free data, in contrast to single acquisitions, which have missing values due to cloud cover or cloud shadows. Since the classification is done at the pixel level, missing data would affect the extraction of spectral signatures and leave the affected classes incompletely characterized. The composites also reduce dimensionality, because the number of features is fixed at 10 bands per month, whereas with single acquisitions each acquisition contributes 10 bands and the number of features grows with the number of acquisitions in the period.
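The feature arithmetic in this paragraph can be made concrete with a sketch of monthly compositing. It is a minimal example under stated assumptions: the median as compositing statistic, a boolean cloud/shadow mask per acquisition, and the function name are hypothetical, since the text does not state how the composites were computed. Whatever the number of acquisitions, the output always has 10 bands per month, i.e. 120 features if all twelve months are used.

```python
import warnings
import numpy as np

def monthly_composites(stack, months, cloud_mask):
    """Build per-month median composites from a time series of acquisitions.

    stack:      (T, B, H, W) surface reflectance, T acquisitions x B bands
    months:     length-T sequence with the month (1-12) of each acquisition
    cloud_mask: (T, H, W) boolean, True where cloud or cloud shadow
    Returns a (12 * B, H, W) feature stack ordered month by month.
    """
    months = np.asarray(months)
    # Cloudy pixels become NaN so that the median ignores them.
    data = np.where(cloud_mask[:, None, :, :], np.nan, stack.astype(float))
    features = []
    for m in range(1, 13):
        sel = data[months == m]
        with warnings.catch_warnings():
            # Pixels cloudy in every acquisition of a month simply stay NaN.
            warnings.simplefilter("ignore", category=RuntimeWarning)
            if sel.shape[0] == 0:
                features.append(np.full(stack.shape[1:], np.nan))
            else:
                features.append(np.nanmedian(sel, axis=0))
    return np.concatenate(features, axis=0)

# Four synthetic acquisitions of 10 bands over a 2 x 2 window.
stack = np.ones((4, 10, 2, 2))
stack[1] = 3.0
clouds = np.zeros((4, 2, 2), dtype=bool)
clouds[1, 0, 0] = True  # one cloudy pixel in the second May scene
feats = monthly_composites(stack, [5, 5, 6, 7], clouds)
print(feats.shape)  # (120, 2, 2)
```

Adding more acquisitions to a month changes the composite values but not the number of features, which is the dimensionality advantage described above.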

The Random Forest classifier required tuning fewer hyperparameters than other classifiers and proved computationally efficient, as it could be parallelized (multi-core processing) to classify the whole area. It also made it possible to extract the most important features during the classification. As expected, the most relevant features of the time series correspond to the spring and summer months and to the Red Edge (b5, b6, b7), NIR (b8a) and SWIR (b11 and b12) bands. Including spectral indices slightly improved the accuracy, but the indices were not among the predominant variables.
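The two properties highlighted here, multi-core training and built-in feature importance, can be illustrated with scikit-learn's `RandomForestClassifier`. This is a stand-in chosen for illustration; the text does not say which implementation was used. The band list, the feature naming scheme, the number of trees and the synthetic data are assumptions, and the importance shown is scikit-learn's impurity-based measure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed set of ten Sentinel-2 bands per monthly composite.
BANDS = ["b2", "b3", "b4", "b5", "b6", "b7", "b8", "b8a", "b11", "b12"]
FEATURES = [f"{b}_m{m:02d}" for m in range(1, 13) for b in BANDS]

def train_random_forest(X, y, n_trees=200, seed=0):
    """Fit a random forest using all available CPU cores (n_jobs=-1)."""
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1, random_state=seed)
    return rf.fit(X, y)

def top_features(rf, names, k=5):
    """Rank features by impurity-based importance (mean decrease in impurity)."""
    order = np.argsort(rf.feature_importances_)[::-1][:k]
    return [(names[i], round(float(rf.feature_importances_[i]), 3)) for i in order]

# Synthetic data: the label depends only on the hypothetical June red-edge feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(FEATURES)))
y = (X[:, FEATURES.index("b5_m06")] > 0).astype(int)
rf = train_random_forest(X, y)
print(top_features(rf, FEATURES, k=3))
```

On real composites the ranking would be read the same way, for example to check which months and bands dominate.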

One of the purposes of this research was to test whether a predefined set of rules could remove possible sources of misclassification, allowing samples for training and testing to be extracted automatically. This would allow the classifier to characterize the spectral signature of each class adequately and make accurate predictions. The data sources (IFAP 2018 and COS 2018) are themselves products of visual interpretation of high-resolution imagery; in the case of the LPIS (IFAP 2018), the yearly update of the product and its parcel-level MMU made it possible to characterize crop types. Nevertheless, the coverage of the agricultural grassland class was overestimated in this dataset and sometimes masked out forest areas, causing confusion between classes.

Regarding the filtering rules, applying the burned area mask removed from the dataset the areas affected by wildfires. However, in some cases the burned area mask includes built-up areas and water, leading to confusion between classes. The burned area mask could therefore benefit from its own pre-processing rules before sample extraction, for example ensuring that neither built-up areas nor water can be part of it. The next filter was the NDVI alerts, which were produced for 2015-2018 from Landsat 8 images at 30 m resolution and have an omission error of 33% [51]; this means that some changes are not detected. In addition, the difference in pixel size between the two satellites reduces the precision with which these areas are detected; applying the same approach to Sentinel-2 imagery might improve the identification of clear-cuts in this study. Lastly, the HRL rules dramatically reduced the number of samples available per class. Because many forest pixels were removed, the spectral signatures were not precisely characterized, and the model could not classify the whole area accurately. Forests in the Portuguese landscape are often not dense, as trees are sparsely distributed; lowering the tree cover density threshold is therefore recommended for detecting the forest types.
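The filtering sequence discussed in this paragraph, including the suggested refinement of the burned area mask and the recommended lower tree cover threshold, can be sketched with boolean rasters. This is a simplified illustration: the function, its arguments and the single `tcd_min` threshold are hypothetical (the text uses separate HRL thresholds for open and dense forest), and all layers are assumed to be resampled to one common grid.

```python
import numpy as np

def sample_mask(burned, built_up, water, ndvi_alert, is_forest, tcd, tcd_min=10.0):
    """Return True where a pixel may be used as a training/testing sample.

    All inputs are boolean arrays on a common grid, except tcd (HRL tree
    cover density, %). Hypothetical, simplified version of the filters.
    """
    # Suggested refinement: built-up and water pixels are taken out of the burned area mask.
    burned_clean = burned & ~built_up & ~water
    keep = ~burned_clean                  # drop areas affected by wildfires
    keep &= ~ndvi_alert                   # drop changes flagged by NDVI alerts (e.g. clear-cuts)
    keep &= ~is_forest | (tcd > tcd_min)  # tree cover rule applied to forest labels only
    return keep

burned     = np.array([True,  True,  False, False, False])
built_up   = np.array([False, True,  False, False, False])
water      = np.zeros(5, dtype=bool)
ndvi_alert = np.array([False, False, True,  False, False])
is_forest  = np.array([True,  False, True,  True,  True])
tcd        = np.array([50,    0,     70,    30,    8])

strict = sample_mask(burned, built_up, water, ndvi_alert, is_forest, tcd)
relaxed = sample_mask(burned, built_up, water, ndvi_alert, is_forest, tcd, tcd_min=5)
print(strict, relaxed)
```

The built-up pixel inside the burned area survives the refined mask, and lowering `tcd_min` keeps the sparse forest pixel (8% tree cover) that the stricter rule discards.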

In document Landcover and crop type classification with intra-annual times series of sentinel-2 and machine learning at central Portugal (Page 63-69)