5.3 Mapping of the flood extent
5.3.5 Extraction of the training dataset
Before the SAR training samples can be extracted, the pre-flood SAR image must be collocated with the Sentinel-2 optical image, as shown in Figure 5.13. The training dataset is then generated to train the model that will predict the classes of unlabelled data. For the supervised classification of the flood, the training dataset is selected from products derived from the optical image and the pre-flood SAR image. First, the NDWI is calculated from the green and NIR bands of the Sentinel-2 image. This index is used by the remote sensing community to highlight the presence of water.
The NDWI was initially presented in McFeeters (1996) to detect water bodies from multispectral optical images. The NDWI mathematical expression is given by:
\[
\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} \tag{5.30}
\]
where, similarly to Du et al. (2016):
Green: Band 3 in the Sentinel-2 product,
NIR: Band 8 in the Sentinel-2 product.
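Equation (5.30) can be illustrated with a minimal NumPy sketch. The function name `ndwi`, the toy reflectance values, and the choice to return NaN where Green + NIR equals zero are assumptions made for illustration only; they are not taken from the processing chain used in this study.

```python
import numpy as np

def ndwi(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR), as in Equation (5.30)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = green + nir
    # Assumption: pixels with a zero denominator (e.g. no-data) are set to NaN.
    out = np.full(denom.shape, np.nan)
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out

# Made-up reflectances: water-like, vegetation-like, no-data, and neutral pixels.
green = np.array([[0.08, 0.05],
                  [0.00, 0.10]])
nir = np.array([[0.02, 0.40],
                [0.00, 0.10]])
index = ndwi(green, nir)
```

In this toy example the water-like pixel, with high green and low NIR reflectance, gets a clearly positive index, while the vegetation-like pixel gets a negative one.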
The NDWI is based on the idea that water has both a high reflectance in the green band and a low reflectance in the NIR band, while the other land cover types to be disregarded (soil and vegetation) appear brighter in the NIR band (Xu, 2006). The NDWI was originally calculated from multispectral images captured by Landsat’s Multispectral Scanner (MSS) (McFeeters, 1996), but was computed in later works from Landsat’s ETM+ bands (Xu, 2006), high-resolution QuickBird bands (McFeeters, 2013), and even
Figure 5.13: Flowchart of the extraction of the training dataset from the collocated pre-flood SAR image and the optical bands (Green, NIR).
from Sentinel-2 images (Du et al., 2016). In rural areas, pixels with a positive NDWI are expected to correspond to water. However, this can lead to false alarms in urban areas, where house rooftops, for instance, were found to have positive NDWI values, although lower than those of water. Consequently, a higher threshold (Equation 5.31) was determined in McFeeters (2013), by the same author who proposed the NDWI, in order to identify the water surfaces of swimming pools, which are a common place for mosquitoes to lay their eggs. The NDWI threshold can nevertheless fail to detect water surfaces that are concealed by protruding vegetation or shadowed by trees or buildings. On a positive note, the shadows appearing in the optical images tested were not included in the water mask produced from the NDWI, which prevents them from being wrongly assigned to the water class in the SAR training dataset.
\[
\mathrm{Class} =
\begin{cases}
\mathrm{Water}, & \text{if } \mathrm{NDWI} \geq 0.3 \\
\mathrm{Land}, & \text{otherwise}
\end{cases}
\tag{5.31}
\]
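The decision rule of Equation (5.31) amounts to a single comparison per pixel. The sketch below assumes the NDWI is available as a NumPy array; the names `NDWI_THRESHOLD` and `water_land_masks` and the sample values are hypothetical.

```python
import numpy as np

NDWI_THRESHOLD = 0.3  # threshold of Equation (5.31)

def water_land_masks(ndwi, threshold=NDWI_THRESHOLD):
    """Return boolean (water, land) masks from an NDWI array."""
    ndwi = np.asarray(ndwi, dtype=np.float64)
    water = ndwi >= threshold  # Water if NDWI >= 0.3
    land = ~water              # Land otherwise
    return water, land

# Made-up NDWI values: open water, vegetation, a pixel exactly at the
# threshold, and a rooftop-like pixel with a small positive NDWI.
ndwi_values = np.array([[0.6, -0.78],
                        [0.3, 0.1]])
water, land = water_land_masks(ndwi_values)
```

The rooftop-like pixel (NDWI = 0.1) would be labelled as water with a zero threshold, but falls into the land class with the 0.3 threshold.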
Other water indices have also been proposed in the literature, such as the MNDWI (Xu, 2006), which is calculated by replacing the NIR band in Equation (5.30) with the shortwave infrared (SWIR) band. The central wavelengths of the green, NIR, and SWIR bands in Sentinel-2 products are shown in Table 5.1. The MNDWI was proposed to prevent the resulting water mask from including false positives from urban areas, based on the observation that the spectral response of built-up land is higher in the SWIR band than in the green band. Built-up areas are therefore expected to yield negative MNDWI values, in contrast to their NDWI values. However, the SWIR band used to calculate the MNDWI has a 20 m spatial resolution in Sentinel-2 products, as opposed to the 10 m spatial resolution of the NIR and visible bands. As a result, the calculation of the MNDWI from Sentinel-2 bands should be preceded either by a downsampling of the green band to 20 m resolution, or by a sharpening of the SWIR band to 10 m resolution (Du et al., 2016). Moreover, according to a few experiments carried out on Sentinel-2 datasets, the MNDWI still required a threshold greater than zero to identify water, although this threshold should be lower than the NDWI one according to Xu (2006). Lastly, most
Table 5.1: Wavelengths of the green (Band 3), NIR (Band 8), and SWIR (Band 11) bands in Sentinel-2 products (Du et al., 2016).
Band Central wavelength (nm)
Green 560
NIR 842
SWIR 1610
high-resolution optical satellite sensors currently in orbit have a NIR band but lack a SWIR band.
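The resolution mismatch discussed above can be illustrated with the first of the two options, namely bringing the green band to 20 m before computing the MNDWI. The sketch below is only one possible approach: it assumes that the 10 m and 20 m grids are aligned, that the 10 m band has even dimensions, and that averaging each 2 × 2 block is an acceptable way to downsample. The function names and sample values are hypothetical, and the SWIR sharpening option of Du et al. (2016) is not covered.

```python
import numpy as np

def downsample_2x(band):
    """Average each 2x2 block, e.g. to bring a 10 m band to a 20 m grid."""
    band = np.asarray(band, dtype=np.float64)
    rows, cols = band.shape
    if rows % 2 or cols % 2:
        raise ValueError("band dimensions must be even")
    # Element [i, a, j, b] of the reshaped array is band[2*i + a, 2*j + b].
    return band.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def mndwi(green, swir):
    """MNDWI = (Green - SWIR) / (Green + SWIR); NaN where the sum is zero."""
    green = np.asarray(green, dtype=np.float64)
    swir = np.asarray(swir, dtype=np.float64)
    denom = green + swir
    out = np.full(denom.shape, np.nan)
    np.divide(green - swir, denom, out=out, where=denom != 0)
    return out

# Made-up reflectances: a 4x4 green band at 10 m and a 2x2 SWIR band at 20 m.
green_10m = np.array([[0.06, 0.10, 0.04, 0.04],
                      [0.08, 0.08, 0.04, 0.04],
                      [0.05, 0.05, 0.10, 0.10],
                      [0.05, 0.05, 0.10, 0.10]])
swir_20m = np.array([[0.02, 0.12],
                     [0.05, 0.30]])

green_20m = downsample_2x(green_10m)
index = mndwi(green_20m, swir_20m)
```

Downsampling keeps the computation simple, but the resulting water mask is at 20 m and would have to be resampled again before being combined with 10 m products.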
In the current study, the NDWI is calculated using Equation (5.30) from the green and NIR bands of the Sentinel-2 BOA Level-2A optical image, which was collocated with the preprocessed pre-flood SAR image in the previous step. A water mask is then produced by applying the threshold in Equation (5.31) to the NDWI. The land mask is simply the logical negation of the water mask. The NDWI-derived water and land masks are then separately multiplied with the pre-flood SAR image present in the same stacked product, in order to extract from that image the pixels belonging to the water and land classes, respectively. This step depends on an accurate collocation of the optical and pre-flood SAR images, so that the location of each class (water or land) in the former matches its location in the latter.
Both binary masks have zero-valued pixels for areas to filter out and one-valued pixels for those to keep. Ultimately, these water and land binary masks split the pre-flood SAR image pixels into the two classes that form the training dataset, through a simple pixel-by-pixel multiplication of the SAR image with each mask:
\[
\mathrm{TrainingWater} = \mathrm{PreSAR} \cdot \mathrm{WaterMask}, \qquad
\mathrm{TrainingLand} = \mathrm{PreSAR} \cdot \mathrm{LandMask}
\]
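The pixel-by-pixel multiplication described above can be sketched as follows. The sketch assumes that the collocated pre-flood SAR band and the water mask are NumPy arrays of identical shape; the function name `split_training_pixels` and the backscatter values (in linear scale) are hypothetical.

```python
import numpy as np

def split_training_pixels(pre_sar, water_mask):
    """Split a pre-flood SAR band into water and land training images."""
    pre_sar = np.asarray(pre_sar, dtype=np.float64)
    water_mask = np.asarray(water_mask, dtype=bool)
    if pre_sar.shape != water_mask.shape:
        raise ValueError("SAR band and mask must be collocated on the same grid")
    land_mask = ~water_mask                # logical negation of the water mask
    training_water = pre_sar * water_mask  # land pixels become zero
    training_land = pre_sar * land_mask    # water pixels become zero
    return training_water, training_land

# Made-up backscatter values (linear scale) and an NDWI-derived water mask.
pre_sar = np.array([[0.006, 0.150],
                    [0.009, 0.110]])
water_mask = np.array([[True, False],
                       [True, False]])
training_water, training_land = split_training_pixels(pre_sar, water_mask)
```

Because the two masks are complementary, each SAR pixel ends up in exactly one of the two training images, and their sum restores the original band.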