Inter-comparison

Chapter 5. Sensitivity to Data & Parameters

5.1 Precipitation

5.1.2 Inter-comparison

Model performance when using MSWEP, E-OBS, ERA-Interim and ERA5 rainfall was measured against the performance of the models driven using MIDAS data from Chapter 4. Performance was assessed in terms of flood peak, timing and extent metrics. All models generally under-estimated flood peaks at the 12 selected gauges, as seen in Figure 5-1. Overall, models using MIDAS data tended to outperform those using the more widely available products. One explanation is the higher spatial and temporal resolution of MIDAS compared to the other datasets. Of the two ECMWF re-analysis datasets, models driven by ERA5 outperformed those using its predecessor, ERA-Interim. Again, a large contributor to this improvement is the increased spatial and temporal resolution of ERA5 compared with ERA-Interim.

Model performance was most variable across gauges when using MSWEP rainfall, which could be explained by the wide range of sources drawn on to create the dataset. Although E-OBS is derived directly from precipitation gauge observations, models using E-OBS data performed to a similar standard as those using ERA5. This may have been caused by the low temporal resolution of E-OBS, which is limited to daily time steps. E-OBS also only interpolates observations, with no representation of physical processes, as opposed to the more dynamic data assimilation and modelling in ERA5. However, as the information describing which observations were used in creating ERA5 is currently not easily accessible, it is difficult to suggest what may be causing this.

Figure 5-1 – Peak and timing error using HydroSHEDS 90m DEM and a range of rainfall products.

Notably, models using ERA-Interim data perform worse here than when the same data were used to simulate large European basins in Section 4.6. This is likely to be a consequence of either the greater number of 80 km ERA-Interim cells falling within the larger basins or a better representation of atmospheric dynamics over the more continental European areas. The storm events causing large peaks in rivers such as the Po and Rhone are also undoubtedly larger than Storm Desmond and therefore more likely to be realistically captured by the re-analysis data.

Figure 5-1 (right) shows the timing error between observed and simulated peaks at the selected gauges for models using each precipitation (P) dataset. There is greater variation in performance here than in the peak error results. Again, models using MIDAS data demonstrate the highest level of accuracy, closely followed by those driven with ERA5. This is consistent with MIDAS and ERA5 having the joint-highest temporal resolution, with hourly time steps.

Unfortunately, some error measurements are spurious: where an earlier simulated peak exceeds the simulated peak that corresponds to the main peak in the observed series, the timing error is over-represented, as seen in Figure 5-2. This false peak problem clearly affects simulations using MSWEP rainfall, as seen in the timing errors. It is difficult to draw any meaningful conclusions about model timing performance when the wrong peak is used to generate a timing error. The problem is caused by an earlier burst of rainfall being over-estimated and the more intense, main period being under-estimated.

Figure 5-2 - Example of a spurious early peak from a simulation using MSWEP rainfall causing incorrect timing error.

In the case of MSWEP, it is difficult to speculate as to the direct reason for a spurious early peak, as it is not clear what proportion of gauged, radar and re-analysis data has been used in a given area. Some attempts were made at using peak identification algorithms to find the simulated high point nearest to the observed peak and use this to calculate both peak and timing error; however, this is also subject to misinterpretation. If the main peak is delayed and preceded by a smaller peak, timing error will be underestimated and peak error overestimated. Therefore, the standard approach of comparing the highest peaks in the series had to be used.
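The two ways of pairing peaks discussed above can be compared in a minimal sketch. This is illustrative only and not the code used in this work: the function names, the simple local-maximum test and the hourly series are all assumptions, and the flow values are invented.

```python
import numpy as np

def highest_peak_errors(obs, sim, dt_hours=1.0):
    """Pair the highest value in each series (the approach retained in the text)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    i_obs, i_sim = int(np.argmax(obs)), int(np.argmax(sim))
    peak_error = float(sim[i_sim] - obs[i_obs])    # negative = under-estimate
    timing_error = (i_sim - i_obs) * dt_hours      # negative = simulated peak is early
    return peak_error, timing_error

def nearest_peak_errors(obs, sim, dt_hours=1.0):
    """Alternative: pair the observed peak with the nearest simulated local maximum."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    i_obs = int(np.argmax(obs))
    # Interior local maxima: above the left neighbour, not below the right one.
    interior = np.arange(1, len(sim) - 1)
    is_peak = (sim[1:-1] > sim[:-2]) & (sim[1:-1] >= sim[2:])
    candidates = interior[is_peak]
    if candidates.size == 0:
        return highest_peak_errors(obs, sim, dt_hours)
    i_sim = int(candidates[np.argmin(np.abs(candidates - i_obs))])
    return float(sim[i_sim] - obs[i_obs]), (i_sim - i_obs) * dt_hours

# Invented hourly series: the simulation has an over-estimated early peak
# (index 2) and an under-estimated main peak (index 7).
obs = [10, 20, 30, 40, 60, 80, 100, 70, 40, 20]
sim = [10, 50, 90, 60, 50, 60, 70, 80, 50, 20]

print(highest_peak_errors(obs, sim))  # (-10.0, -4.0): timing error inflated by the early peak
print(nearest_peak_errors(obs, sim))  # (-20.0, 1.0): pairs with the later, smaller peak
```

In this invented case the nearest-peak pairing gives the more plausible timing error, but it selects whichever local maximum happens to lie closest in time, which is why it can misrepresent both errors when a smaller peak precedes a delayed main peak.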

Flood peaks from models using E-OBS data are approximately 12 hours behind the observed peaks, on average, which could be explained by the daily values being positioned at midnight (00:00) rather than at midday (12:00), the middle of the accumulation period. Some difficulty was found in ascertaining the original observing period of the data used when creating E-OBS, which may explain this offset (Haylock et al., 2008). The dramatic improvement in model performance seen when upgrading from ERA-Interim rainfall to ERA5 is partly due to an increase in temporal resolution from 3-hourly to hourly, but a better representation of extreme values has also contributed to ensuring that the correct peak is designated as the largest in the series.
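The suggested explanation for the E-OBS offset can be illustrated with a small sketch that moves each daily timestamp to the midpoint of its accumulation window. The function name is hypothetical, and the sketch assumes that each stamp marks the start of a 24-hour window, which, as noted above, could not be confirmed for E-OBS.

```python
from datetime import datetime, timedelta

def centre_daily_stamps(stamps, accumulation=timedelta(hours=24)):
    """Shift each timestamp by half the accumulation period.

    Assumes (hypothetically) that each stamp marks the start of its window.
    """
    return [t + accumulation / 2 for t in stamps]

# Illustrative dates only: two daily totals stamped at midnight.
daily = [datetime(2015, 12, 5), datetime(2015, 12, 6)]
centred = centre_daily_stamps(daily)
print(centred[0])  # 2015-12-05 12:00:00
```

Whether such a shift is appropriate depends on the true observing period of the underlying gauge data, so it is a diagnostic sketch and not a recommended correction.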

Hit Rate (HR), False Alarm Ratio (FAR) and Critical Success Index (CSI) were used to compare the extents produced by models using each P dataset with remotely sensed extents. Further detail about the metrics and methods used can be found in Section 4.4.2. Metrics were calculated for each basin and the results were averaged to produce overall performance statistics. When compared with extents from Sentinel 1, there is minimal variation in the performance of models using different rainfall datasets. Figure 5-3 (right) shows some internal variation in hit rate and false alarm ratio; however, this does not correspond to improved CSI because FAR is consistently high, indicating that CityCAT floods more area than is observed by Sentinel 1. For example, considering hit rate alone, models driven by MSWEP appear to perform best, but the same simulations also produced a high FAR, leading to a very similar CSI to the other datasets. Models using ERA-Interim produced the lowest HR results, but their CSI was not significantly lower than that of models using MSWEP.
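The way a high hit rate can be offset by a high false alarm ratio is easiest to see from the contingency counts. The sketch below uses the standard forecast-verification definitions of HR, FAR and CSI, which are assumed here; the exact formulation used in this work is the one given in Section 4.4.2. The function name and the example masks are invented for illustration.

```python
import numpy as np

def _ratio(num, den):
    # Return NaN when the metric is undefined (empty denominator).
    return float(num) / float(den) if den > 0 else float("nan")

def extent_metrics(simulated, observed):
    """Return (HR, FAR, CSI) for two binary wet/dry masks of equal shape."""
    sim = np.asarray(simulated, dtype=bool)
    obs = np.asarray(observed, dtype=bool)
    hits = np.sum(sim & obs)            # wet in both
    misses = np.sum(~sim & obs)         # observed wet, simulated dry
    false_alarms = np.sum(sim & ~obs)   # simulated wet, observed dry
    hr = _ratio(hits, hits + misses)
    far = _ratio(false_alarms, hits + false_alarms)
    csi = _ratio(hits, hits + misses + false_alarms)
    return hr, far, csi

# Invented 10-cell example: model A floods more cells than model B.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a  = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]   # 3 hits, 1 miss, 3 false alarms
model_b  = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # 2 hits, 2 misses, 1 false alarm

print(extent_metrics(model_a, observed))  # HR 0.75, FAR 0.50, CSI ~0.43
print(extent_metrics(model_b, observed))  # HR 0.50, FAR ~0.33, CSI 0.40
```

Model A has a much higher hit rate than model B, yet its CSI is only marginally better because the extra flooded cells are mostly false alarms, which mirrors the pattern described for MSWEP above.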

Figure 5-3 - Metrics describing the accuracy of maximum extent when compared with outputs from Sentinel 1 for the same period and polygons from EA Flood Zone 2.

Again, comparison against the EA data did not produce a large amount of variation in CSI. Models using ERA-Interim and E-OBS produced the lowest CSI values, of 0.27. When ERA5, MIDAS and MSWEP were used, scores improved to 0.28, 0.29 and 0.30 respectively. Overall performance remains relatively low; however, these scores ignore internal variations within basins, which are shown in Figure 4-18. Despite the lack of significant variability in CSI, the ranking does approximately correspond to the spatial resolution of the datasets, as shown in Table 12. ERA-Interim has a grid spacing of ~80 km, which equates to cells more than seven times the area, (80/30)² ≈ 7.1, of those in ERA5, MSWEP and E-OBS, each with ~30 km resolution. Although MIDAS has the highest resolution grid at 1 km, models using MIDAS data did not produce the highest CSI. This may have been caused by the sparsity of rain gauges, or could indicate that the accuracy of rainfall input is no longer the limiting factor and that DEM quality is more significant.

As neither the data from Sentinel 1 nor that from the EA distinguishes depths, there is no convenient way to verify if this threshold is suitable. Any reflective surface may also be classified as flooded in the change-detection algorithm, creating incorrectly flooded areas.

In document Broad-scale flood modelling in the cloud : validation and sensitivities from hazard to impact (Page 120-125)
