Understanding differences in performance - performance on individual disasters

6.1 performance on individual disasters

6.1.4 Understanding differences in performance

In all the experiments covered in the previous sections, performance differed clearly between disasters. Understanding where those differences originate is important for practical purposes: a better estimate of the model's performance before or while using it in the field can make a large difference in usefulness, communication, and acceptance. There are many possible causes for the differences, both quantitative and qualitative. In this section, we explore some of them.

One influence we noted before is the difference in class distributions. Figure 6.8 displays the percentage of data points belonging to a class versus the recall of that class, for each disaster and each of the four classes. The figure shows a monotonically increasing, roughly logarithmic relation between the class distribution and the recall: recall generally increases when a larger percentage of data points belongs to that class. At the same time, the figure shows that far from all data points follow this trend.

Figure 6.8: Scatter plot of the percentage of data points belonging to a class versus the recall of that class for each of the 13 tested disasters. The blue line shows the best polynomial fit and the blue area the 95% confidence interval.
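The quantities plotted in Figure 6.8 can be sketched as follows. This is a minimal illustration, not the code used in this work: the function names `share_and_recall` and `fit_log_trend` are hypothetical, the integer class labels are placeholders, and the logarithmic least-squares fit only illustrates the "roughly logarithmic" description in the text (the figure itself uses a polynomial fit).

```python
import math
from collections import Counter


def share_and_recall(y_true, y_pred):
    """Return {class: (share_in_percent, recall)} for one disaster.

    share  = percentage of data points whose true label is that class
    recall = fraction of those data points that were predicted correctly
    """
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return {c: (100.0 * totals[c] / n, hits[c] / totals[c]) for c in totals}


def fit_log_trend(points):
    """Least-squares fit of recall = a + b * ln(share).

    `points` is a list of (share, recall) tuples with share > 0.
    Assumption: a logarithmic model, chosen here only for illustration.
    """
    xs = [math.log(share) for share, _ in points]
    ys = [recall for _, recall in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical labels for one disaster (classes 0..3 are placeholders).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 3]
per_class = share_and_recall(y_true, y_pred)
```

Collecting the `(share, recall)` pairs of all classes and all disasters and passing them to `fit_log_trend` gives a slope `b`; a positive slope corresponds to the increasing trend described above.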


Figure 6.9: Distribution plot of the building footprint for buildings up to 700 m², i.e. 95% of the buildings. The green area corresponds to the distribution over correctly classified samples, and the red area corresponds to the distribution over incorrectly classified samples.

Another possible explanation for the difference in performance could be the building footprint. A hypothesis is that buildings with smaller footprints are harder to classify because they offer fewer optical cues. Figure 6.9 shows the distributions of the building footprint for the correctly and incorrectly classified buildings, restricted to the 95% of buildings with a footprint of up to 700 m². The two distributions largely overlap. This contradicts the hypothesis: in this dataset, a larger building size does not appear to predict whether a building is correctly classified.
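The overlap seen in Figure 6.9 could also be expressed as a single number. The sketch below computes a histogram overlap coefficient for the footprints of correctly and incorrectly classified buildings. It is an assumption-laden illustration: this measure is not used in this work, the names `histogram` and `overlap_coefficient` are hypothetical, and the 50 m² bin width is an arbitrary choice; only the 700 m² cut-off comes from the figure.

```python
def histogram(values, bin_width, max_value):
    """Normalized histogram of the values that fall in [0, max_value).

    Values outside the range are ignored, mirroring the 700 m² cut-off.
    At least one value must fall inside the range.
    """
    n_bins = int(max_value // bin_width)
    counts = [0] * n_bins
    for v in values:
        if 0 <= v < n_bins * bin_width:
            counts[int(v // bin_width)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def overlap_coefficient(a, b, bin_width=50.0, max_value=700.0):
    """Probability mass shared by the two histograms.

    1.0 means identical histograms, 0.0 means no shared bins.
    bin_width=50.0 is an assumed, arbitrary choice.
    """
    hist_a = histogram(a, bin_width, max_value)
    hist_b = histogram(b, bin_width, max_value)
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))


# Hypothetical footprints in m² for correct and incorrect predictions.
correct = [40.0, 90.0, 120.0, 310.0]
incorrect = [35.0, 95.0, 140.0, 650.0]
shared_mass = overlap_coefficient(correct, incorrect)
```

A value close to 1 would be in line with the visual impression that footprint does not separate correct from incorrect classifications; the result depends on the chosen bin width.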

Disaster-specific parameters could also influence the performance. The quantified parameters in this category are the number of satellite image pairs covering a disaster, the number of buildings, the type of disaster, and the geographical region where the disaster struck. One way to understand the relation between these parameters and the performance is to make a scatter plot of the parameter value versus the performance. As performance measure, the AUC over the binarized labels destroyed and not destroyed was chosen, since this is the measure least influenced by class imbalance. Figure 6.10 shows the scatter plots for the four parameters. These plots show no correlation between the AUC and any of the four parameters. The fact that these parameters do not explain the performance is itself an interesting observation. For example, a negative relation between the number of image pairs covering a disaster and the performance could have been expected, since more image pairs result in more variety in the data, so that the model has to learn a wider variety of features. Another interesting observation is the absence of a correlation between the number of buildings and the AUC. A smaller number of buildings means less training data, so a lower AUC could have been expected.
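The performance measure used here can be sketched in a few lines. The example below assumes that "AUC" refers to the area under the ROC curve and computes it as the probability that a randomly chosen destroyed building receives a higher score than a randomly chosen non-destroyed one. The function names and the label strings are hypothetical, and the pairwise formulation is chosen for readability rather than speed.

```python
def binarize(labels, positive="destroyed"):
    """Map multi-class damage labels to 1 (positive class) or 0 (all others)."""
    return [1 if label == positive else 0 for label in labels]


def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Counts how often a positive sample scores higher than a negative one;
    ties count as half. Needs at least one sample of each class.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted "destroyed" scores for four buildings.
labels = ["no damage", "minor", "destroyed", "destroyed"]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(binarize(labels), scores)  # 3 of 4 pairs are ordered correctly
```

Because the value depends only on how positive and negative samples are ranked relative to each other, it does not change when the ratio of destroyed to non-destroyed buildings changes, which is the property motivating its use in this comparison.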

Lastly, we can inspect parameters that are specific to a pair of pre- and post-disaster satellite images. These include the off-nadir angle, panchromatic resolution, Sun azimuth, Sun elevation, and target azimuth. Figure 6.11 shows scatter plots of these parameters versus the AUC. For each parameter, one scatter plot is made for the sum of the parameter over the pre and post image, and one for the absolute difference in the parameter between the pre and post image. Again, none of the parameters shows a very distinct relationship with the AUC. The largest influence seems to come from the sum of the Sun azimuth: the larger this sum, the smaller the AUC. The same relation is somewhat visible in the difference in Sun elevation. Both parameters relate to lighting, so this can be interpreted as too little or too much light, or larger differences in lighting, degrading performance. Surprisingly, the off-nadir angle sum and difference do not show a relationship with the AUC, even though a large off-nadir angle clearly changes the appearance of the imagery and is often mentioned as a difficulty in the literature [80,41].
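The two quantities on the x-axes of the image-pair scatter plots, and one possible way to quantify their relation with the AUC, can be sketched as follows. This work inspects the relation visually; the Spearman rank correlation below is a swapped-in technique added for illustration, and all function names and example values are hypothetical.

```python
def pair_features(pre, post):
    """Return (sum, absolute difference) of one parameter over an image pair."""
    return pre + post, abs(pre - post)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; both inputs must have non-zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical Sun elevation (degrees) for three image pairs, plus their AUC.
pairs = [(55.0, 60.0), (40.0, 65.0), (30.0, 70.0)]
aucs = [0.90, 0.85, 0.80]
diffs = [pair_features(pre, post)[1] for pre, post in pairs]
rho = spearman(diffs, aucs)  # negative: larger difference, lower AUC
```

A rank correlation is used in this sketch because it only assumes a monotonic relation, not a linear one. With as few points as in these plots, any such coefficient should be read with caution.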

Figure 6.10: Scatter plots of the value of the parameter versus the AUC. One dot belongs to one disaster.

Section summary

• The model is able to learn damage-related features

• The performance differs per disaster

• Intermediate damage classes can be detected from aerial imagery, contrary to previous arguments

• It is preferable to train a model on a higher granularity of damage labels, even if solely a binary distinction is needed for the purpose

• There is a monotonically increasing relation between the percentage of buildings belonging to a class and the recall of that class

• Specific disaster and image parameters, such as the geographical region and the off-nadir angle, do not explain the difference in performance of the used method in this dataset


Figure 6.11: Scatter plots of the value of the parameter versus the AUC. One dot belongs to one pre-post satellite image pair. The x-axis of the left column represents the sum of the parameter over the pre and post image. In the right column, the x-axis represents the absolute difference in the parameter value between the pre and post image.


6.2 testing on a disaster without labeled data

In document The Practical Applicability of a CNN for Automated Building Damage Assessment (Page 61-65)