5. Mapping sub-Antarctic cushion plants: using random forests to combine very high resolution
5.4.1. Testing random forest classification
5.4.1.1. Testing the training method
Using a combination of spectral and terrain variables, we tested the effect of the training method on the accuracy of the image classifications, using two measures of accuracy: the out-of-bag (OOB) accuracy estimate and independent validation of the present class. Both measures showed that the multiple pixel approach provided the highest accuracy, though it provided only a small improvement over the single pixel sampling method (Fig. 5). The object-based method was the least accurate by both measures. The OOB estimate tended to over-estimate the accuracy of the multiple pixel training method, producing accuracies near 100% for every subset of input variables we examined, and so appeared to be vulnerable to the spatial autocorrelation introduced by the sampling method. In contrast, the independent validation estimate was not affected in this way.
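One plausible mechanism for the inflated OOB estimate can be sketched in a few lines. The example below is an illustration only, not the analysis used in this chapter: the patch layout (50 patches of 9 pixels) and the function name `oob_leak_fraction` are assumptions made for the sketch. It draws one bootstrap sample, as for a single tree, and reports how many out-of-bag pixels have a neighbour from the same training patch inside the bag.

```python
import random

def oob_leak_fraction(patch_ids, seed=0):
    """Fraction of out-of-bag pixels that share a patch with an in-bag pixel.

    patch_ids[i] is the (hypothetical) training patch that pixel i came from.
    One bootstrap sample is drawn, as for a single tree in a random forest.
    """
    rng = random.Random(seed)
    n = len(patch_ids)
    # Bootstrap: n draws with replacement; pixels never drawn are out-of-bag.
    in_bag = {rng.randrange(n) for _ in range(n)}
    in_bag_patches = {patch_ids[i] for i in in_bag}
    oob = [i for i in range(n) if i not in in_bag]
    if not oob:
        return 0.0
    leaked = sum(1 for i in oob if patch_ids[i] in in_bag_patches)
    return leaked / len(oob)

# Single pixel sampling: one pixel per patch, so an out-of-bag pixel
# never has a same-patch neighbour in the bag.
single_pixel = list(range(450))
# Multiple pixel sampling: 50 patches of 9 adjacent pixels (illustrative numbers).
multiple_pixel = [patch for patch in range(50) for _ in range(9)]

single_leak = oob_leak_fraction(single_pixel)
multi_leak = oob_leak_fraction(multiple_pixel)
print(single_leak, multi_leak)
```

If neighbouring pixels are spectrally and topographically similar, an out-of-bag pixel whose neighbours were used to grow the tree is not an independent test case, which is consistent with OOB accuracies near 100% for the multiple pixel method.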
The random forest classifications based on a combination of terrain and spectral input variables predicted the presence of Azorella with very high accuracy. Independent validation of the present class showed that the single pixel (89.8%) and multiple pixel (90.9%) approaches had very similar accuracy, with both performing better than the object-based classification (81.8%). The OOB accuracy estimate could not separate these three models.
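For readers unfamiliar with class-specific validation, the sketch below shows one way such a figure can be computed. It assumes that "accuracy of the present class" means the share of independently surveyed presence sites that the map also labels as present; this section does not state whether that or the converse measure was used. The labels are toy values, not the validation data.

```python
def present_class_accuracy(observed, predicted, present="present"):
    """Share of validation sites observed as `present` that are also mapped as `present`."""
    hits = 0
    total = 0
    for obs, pred in zip(observed, predicted):
        if obs == present:
            total += 1
            hits += pred == present
    if total == 0:
        raise ValueError("no validation sites in the present class")
    return hits / total

# Toy validation set: 10 presence sites (9 mapped correctly) and 5 absence sites.
observed = ["present"] * 10 + ["absent"] * 5
predicted = ["present"] * 9 + ["absent"] * 1 + ["absent"] * 5

acc = present_class_accuracy(observed, predicted)
print(f"{acc:.1%}")  # 90.0%
```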
After inspection of the variable importance plots, the same input variables were selected for both the single pixel- and object-based classifications: elevation, solar radiation, coast distance, NIR 1, NIR 2, red edge, NDVI and the GLCM Mean texture measure. These variables were also selected for the subsets used in the terrain and spectral data classifications. The multiple pixel classification required most of the same variables, but replaced the two NIR spectral bands with ridgeness.
The partial dependence plots (Fig. 6) showed that Azorella presence typically occurs at elevations above 200 m, with NIR 1 reflectance values below 550, NDVI values below 0.55, GLCM Mean values less than 50, distances from the coast greater than 600 m, the highest values for solar radiation (> 5.6 MWh/m2), red edge reflectance below 650, and NIR 2 reflectance less than 850 (out of 2048 DN values).
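The idea behind a partial dependence curve can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for a fitted forest, the four rows are invented, and the curve is an average probability, whereas the y-axis in Fig. 6 is a relative measure. For each grid value, the chosen variable is forced to that value in every row and the predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average predicted probability when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the variable of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_predict(row):
    """Hypothetical model: probability rises above 200 m and below NDVI 0.55."""
    p = 0.1
    if row["elevation"] > 200:
        p += 0.5
    if row["ndvi"] < 0.55:
        p += 0.3
    return p

# Invented sample rows; the other variable keeps its observed value.
rows = [
    {"elevation": 50, "ndvi": 0.7},
    {"elevation": 150, "ndvi": 0.5},
    {"elevation": 250, "ndvi": 0.4},
    {"elevation": 320, "ndvi": 0.6},
]

curve = partial_dependence(toy_predict, rows, "elevation", [100, 200, 300])
print(curve)
```

The curve is flat up to 200 m and steps up above it, which is the kind of threshold behaviour that the elevation panel is read for.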
On Macquarie Island, Azorella typically grows on the highest parts of the island (i.e. areas with high elevation, solar radiation and distance from the coast); in areas with low to medium reflectance in the NIR and red edge portions of the spectrum, which indicates open vegetation; and in areas with low values of NDVI and of the GLCM Mean texture measure derived from the NDVI layer (Fig. 6). The slope, shape of the terrain, topographic position and wetness index had little effect on the classifications. This was shown by their low variable importance values, and by the fact that excluding them from the reduced classification models did not decrease the accuracy of the classifications, and often increased it marginally. The maps produced by the three classifications appeared very similar, though the multiple pixel training method produced a slightly more fragmented pattern of Azorella distribution than either of the other two hybrid classifications. In conjunction with the high validation accuracy, this indicates that the multiple pixel approach was better able to capture the variability in the dataset.
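The reasoning that a variable with low importance can be dropped without loss of accuracy can be illustrated with a permutation-style importance measure. The sketch below rests on assumptions: this section does not state which importance measure was used, and `toy_predict`, the rows and the labels are hypothetical. Importance is taken as the drop in accuracy after the values of one variable are shuffled across the rows.

```python
import random

def permutation_importance(predict, rows, labels, feature, seed=0):
    """Drop in accuracy after shuffling one feature's values across the rows."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value; all other variables are unchanged.
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return accuracy(rows) - accuracy(permuted)

def toy_predict(row):
    """Hypothetical classifier that uses elevation only and ignores slope."""
    return row["elevation"] > 200

# Invented (elevation, slope) pairs; labels are defined by the toy rule itself.
pairs = [(50, 5), (120, 30), (180, 12), (210, 8), (260, 25), (300, 3), (340, 18), (90, 40)]
rows = [{"elevation": e, "slope": s} for e, s in pairs]
labels = [toy_predict(r) for r in rows]

slope_importance = permutation_importance(toy_predict, rows, labels, "slope")
elevation_importance = permutation_importance(toy_predict, rows, labels, "elevation")
print(slope_importance, elevation_importance)
```

A variable the model never relies on, such as slope in this toy case, has an importance of zero, so removing it cannot reduce accuracy.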
The variable importance measures used to select variables for the reduced model showed that terrain variables were most important to the classification. Spearman rank correlation coefficients showed that the spectral variables incorporated into these models were strongly correlated with NDVI (Table 4), indicating that the role of spectral data in these classifications was largely confined to locating areas with sparse vegetation.
Table 4: Spearman rank correlation coefficients between NDVI and other spectral variables selected in the hybrid classifications based on terrain, spectral and hybrid sets of input variables. Strong correlations between input variables need not result in the variables being excluded from the RF models, but they do complicate the interpretation.
Spectral variable    Correlation with NDVI
GLCM Mean            0.94
Red Edge             0.82
NIR 1                0.89
NIR 2                0.87
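As a reminder of how a Spearman rank correlation is obtained, the following minimal sketch ranks both variables (with tied values sharing their average rank) and then takes the Pearson correlation of the ranks. The `ndvi` and `red_edge` values are invented for illustration and are not the data behind Table 4.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented pixel values for two spectral variables.
ndvi = [0.10, 0.25, 0.40, 0.55, 0.70]
red_edge = [310, 420, 400, 610, 690]

rho = spearman(ndvi, red_edge)
print(round(rho, 2))  # 0.9
```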
Fig. 5: Predicted Azorella presence on northern Macquarie Island, from single pixel, object and multiple pixel based classifications of the hybrid variables. The main map shows the hard classes for Azorella presence, with the individual prediction layers made partially transparent to demonstrate the overlaps in the predicted distributions. The inset maps show the probability of Azorella presence based on single-pixel (A); multiple pixel (B); and object-based (C) classifications. The differences among the three image training methods were subtle.
[Fig. 5 map legend: Single Pixel, Multiple Pixel, Object, Cloud, Lake; inset panels (A), (B), (C); scale bar 0 to 3 km]
Fig. 6: Partial dependence plots for the variables selected for the reduced hybrid multiple pixel-based classification of Azorella presence/absence. The variables included were chosen on the basis of the variable importance measures. The y-axis shows a relative measure of the marginal effect on the probability of Azorella presence. Azorella is associated with high values for elevation, distance from coast, ridgeness, and solar radiation; and with low values for GLCM Mean, NDVI and red edge reflectance.
To further explore the respective roles of terrain and spectral variables in predicting the presence of Azorella, we produced classifications using the terrain and spectral variables in isolation.