Evaluation of spatial scale
3.3.3 Model selection: spatial robustness between areas
To explore spatial robustness between areas, the models were developed on one geographical area (training data) and tested on their ability to predict distributions in the other geographical area (the test data).
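The train-on-one-area, test-on-the-other scheme described above can be sketched as follows. This is a minimal illustration only: the binned-depth occurrence model is a simple stand-in for the GAMs used in this chapter, and the function names, bin edges and toy segments are hypothetical rather than taken from the survey data.

```python
from bisect import bisect_right

def fit_binned(depths, occ, edges):
    """Stand-in for a smooth of depth: mean occurrence per depth bin."""
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for d, y in zip(depths, occ):
        b = bisect_right(edges, d)
        sums[b] += y
        counts[b] += 1
    overall = sum(occ) / len(occ)
    # Bins with no training segments fall back to the overall occurrence rate.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def predict_binned(model, depths, edges):
    """Look up the fitted occurrence rate for each depth."""
    return [model[bisect_right(edges, d)] for d in depths]

def roc_auc(scores, labels):
    """Probability that a random presence outranks a random absence (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_area_auc(train, test, edges):
    """Fit on one area and score on the other; each area is (depths, occurrences)."""
    model = fit_binned(train[0], train[1], edges)
    return roc_auc(predict_binned(model, test[0], edges), test[1])

# Toy segments (hypothetical values, not survey data).
edges = [200, 600, 1200]  # depth bin edges in metres
area_a = ([50, 150, 400, 700, 900, 1500], [0, 0, 0, 1, 1, 1])
area_b = ([100, 300, 650, 800, 1300, 1800], [0, 0, 1, 1, 0, 1])

auc_a_to_b = cross_area_auc(area_a, area_b, edges)  # trained on A, tested on B
auc_b_to_a = cross_area_auc(area_b, area_a, edges)  # trained on B, tested on A
```

Swapping the roles of the two areas gives one transfer score in each direction, which is the comparison made between the two survey areas in the sections that follow.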
3.3.3.1 Faroe-Shetland Channel survey data
The Faroe-Shetland Channel surveys were mainly confined to the hydrographic lines crossing the northern and southern parts of this area, but also included areas surrounding the Wyville-Thompson Ridge and some shelf waters. Compared with the Ellet Line surveys, these areas had a narrower range of depths and halocline strengths, a lower range of SST values, and a wider range of thermocline depths and strengths and of current speeds (Table 3.2).
Of the survey variables added to the model, only vessel speed showed the same relationship in both models (Figure 3.3), with reduced detections of sperm whales at higher vessel speeds. Forward step-wise selection of environmental variables resulted in a model that included (in order of importance): depth, thermocline strength, halocline depth, halocline strength and sea surface salinity (Figure 3.7). Sperm whale clicks were most likely to be heard in: off-shelf waters deeper than 350 m, with preference peaking between 500 and 1100 m and decreasing at depths greater than 1100 m; waters with a shallow (< 100 m) or deep (> 450 m) halocline; waters with stronger negative haloclines (deep water fresher than surface water); waters with weak thermoclines (> -2°C difference between surface and bottom water temperature); and waters with either fresh (< 35.17 psu) or salty (> 35.36 psu) surface salinity.
There is considerable uncertainty around some of the predicted relationships. In particular, the smooth of sperm whale occurrence against halocline depth has very wide confidence intervals, which include the zero preference line over the whole range of the predictor variable.
Figure 3.7 – Sperm whale occurrence per 9 km segment (n = 918) modelled as a GAM smooth function of (a) depth (d.f. = 3.6), (b) halocline depth (d.f. = 2.2), (c) halocline strength (d.f. = 1), (d) thermocline strength (d.f. = 2.7), and (e) sea surface salinity, SSS (d.f. = 2.5), for Faroe-Shetland Channel survey data collected off the west coast of Scotland between September 2003 and October 2005. Tick marks above the x-axis indicate the distribution of observations in all segments. Dotted lines show 95% confidence intervals.
The overall model explained 36.6% of the deviance, of which 33.9% was explained by environmental variables (Table B2, Appendix B):
SpWhOccFSC ~ s(Speed) + s(Depth) + s(HaloDepth) + s(HaloStrength) + s(ThermoStrength) + s(SurfaceSalinity)
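For reference, the proportion of deviance explained can be computed as one minus the ratio of the residual deviance to the null (intercept-only) deviance. The sketch below assumes a binomial presence/absence response, which is implied by the occurrence response but not stated in this section; the function names and the example values are hypothetical.

```python
import math

def binomial_deviance(y, p, eps=1e-12):
    """-2 * log-likelihood for 0/1 observations y and fitted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # guard against log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * total

def deviance_explained(y, p):
    """Proportion of the null deviance accounted for by the fitted probabilities."""
    p_null = sum(y) / len(y)  # intercept-only model predicts the mean occurrence
    null_dev = binomial_deviance(y, [p_null] * len(y))
    return 1.0 - binomial_deviance(y, p) / null_dev

# Hypothetical presence/absence and fitted values, for illustration only.
y = [0, 0, 0, 1, 1, 0, 1, 0]
p = [0.1, 0.2, 0.1, 0.7, 0.6, 0.3, 0.8, 0.2]
print(f"deviance explained: {deviance_explained(y, p):.1%}")
```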
The Wald-Wolfowitz test statistic was Wz = -14.3, indicating that the residuals were not randomly distributed (p < 0.001) and that unmodelled autocorrelation remains in the data. Nevertheless, the model predicted the occurrence of sperm whales in this data set well, with a very high ROC AUC score of 0.914. The model predicts sperm whale occurrence throughout the deep-water areas surveyed (Figure 3.9). Predicted probability of occurrence is high in the deepest parts of the Faroe-Shetland Channel and in a band just south of the Wyville-Thompson Ridge between the shelf edge and Rosemary Bank. Low probability of occurrence is predicted for on-shelf or shallow waters such as the Wyville-Thompson Ridge and the Faroe Bank.
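The Wald-Wolfowitz statistic reported above can be illustrated with a short sketch of the runs test applied to the signs of residuals taken in survey order. This uses the standard large-sample normal approximation and is not necessarily the exact implementation used for this analysis; the function name and the example residuals are hypothetical. A strongly negative value indicates fewer runs than expected by chance, that is, residuals of the same sign clustering together.

```python
import math

def runs_test_z(residuals):
    """Wald-Wolfowitz runs test z-score for the signs of ordered residuals."""
    signs = [r > 0 for r in residuals if r != 0]  # zero residuals are dropped
    n1 = sum(signs)                # number of positive residuals
    n2 = len(signs) - n1           # number of negative residuals
    if n1 == 0 or n2 == 0:
        raise ValueError("need residuals of both signs")
    # A new run starts wherever the sign changes between neighbours.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)

clustered = [1.0] * 10 + [-1.0] * 10  # hypothetical residuals forming 2 runs
alternating = [1.0, -1.0] * 10        # hypothetical residuals forming 20 runs
z_clustered = runs_test_z(clustered)
z_alternating = runs_test_z(alternating)
```

In this toy example the clustered sequence gives a negative z-score and the alternating sequence a positive one of equal magnitude, so the sign of the statistic distinguishes positive autocorrelation from over-dispersion of runs.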
3.3.3.2 Ellet Line survey data
The Ellet Line surveys were based on hydrographic surveys of the Ellet Line, a straight line that runs from shallow waters near Oban, across and off the shelf edge, across the Rockall Trough over the Anton Dohrn seamount, and onto the Rockall Bank (Figure 2.1). However, the surveys also included some tracks in adjacent areas of the Rockall Trough, north to the Wyville-Thompson Ridge and further west beyond Bill Bailey’s Bank. The surveys therefore covered a much wider range of depths than the Faroe-Shetland Channel surveys, as well as warmer SST (probably due to the Gulf Stream), a wider range of halocline strengths, and a smaller range of thermocline depths and strengths and of current speeds.
None of the survey effect variables were significant in modelling the occurrence of sperm whales in this data set. The environmental variables selected by forward model selection were (in order of importance): depth, SST, sea surface salinity, and chlorophyll concentration (Figure 3.8). Sperm whale clicks were most likely to be heard in off-shelf waters deeper than 600 m, in water cooler than 12.7°C, in water saltier than 35.22 psu, and in areas of high primary productivity.
Figure 3.8 – Sperm whale occurrence per 9 km segment (n = 321) modelled as a GAM smooth function of (a) depth (d.f. = 3.6), (b) SST (d.f. = 1), (c) sea surface salinity, SSS (d.f. = 3.0), and (d) surface chlorophyll (d.f. = 1) for Ellet Line survey data collected off the west coast of Scotland between July 2003 and October 2005. Tick marks above the x-axis indicate the distribution of observations in all segments. Dotted lines show 95% confidence intervals.
However, the confidence intervals around the linear relationship between sperm whale occurrence and chlorophyll are very wide, with the lower confidence limit only just above the zero preference line, suggesting a relatively weak positive relationship between the two.
Figure 3.9 - Spatial prediction of sperm whale occurrence per 20 km grid cell using environmental data from 14 October 2004 for (a) the best Faroe-Shetland Channel (FSC) model, (b) the best Ellet Line (EL) model, and (c) the most spatially robust model based on the variables selected by the FSC and EL models. Based on data collected between July 2003 and October 2005 off the west coast of Scotland. Overlaid on the maps are the effort segments (white dots) and the sperm whale detections (black dots) for each model. Also overlaid are GEBCO depth contours (dark grey lines). Colours represent probability of sperm whale click detection from low (blue) to high (red), ranging from 0 to 1, with colour gradation based on 20 levels using quantile classification.
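The quantile classification mentioned in the caption places class breaks so that each colour level holds roughly the same number of grid cells, rather than spanning equal probability intervals. A minimal sketch is given below; the function name and the example probabilities are hypothetical, and the mapping software used for the figure may place its breaks slightly differently.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_classes(values, n_levels=20):
    """Assign each value a class 0..n_levels-1 holding roughly equal counts."""
    # quantiles() returns n_levels - 1 cut points between the classes.
    breaks = quantiles(values, n=n_levels, method="inclusive")
    return [bisect_right(breaks, v) for v in values], breaks

# Hypothetical predicted probabilities for 100 grid cells, skewed towards low values.
probs = [(i / 99) ** 3 for i in range(100)]
classes, breaks = quantile_classes(probs)
```

Because the breaks follow the data, a map classified in this way shows contrast even where most predicted probabilities are concentrated in a narrow part of the 0 to 1 range.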
Overall, the resultant model explained 25.6% of the deviance, all of which was attributable to environmental variables because no survey variables were retained (Table B3, Appendix B):
SpWhOccEL ~ s(Depth) + s(SST) + s(SurfaceSalinity) + s(Chlorophyll)
Less non-randomness was observed in the residuals than in the previous model (Wz = -8.4 compared with -14.3), and the model performed well, with a high ROC AUC score of 0.848. However, the Wald-Wolfowitz test statistic was still significant (p < 0.001), indicating that the model did not fully account for the autocorrelation.
The model predicted the highest probability of occurrence of sperm whales either in the northern part of the Rockall Trough, around the Wyville-Thompson Ridge and west of Bill Bailey’s Bank, or in the deep southern Rockall Trough to the east of Anton Dohrn seamount. Predicted probability of occurrence was relatively low in the mid-depth Rockall Trough (between the 1500 m and 2000 m isobaths).
3.3.3.3 Comparison of Faroe-Shetland Channel & Ellet Line models
When each model was tested on its ability to predict the other data set, the Faroe-Shetland Channel model performed worse, with a maximum ROC AUC score of only 0.695, achieved when depth was the only variable in the model (Figure 3.10). The Ellet Line model performed better, again with the best performance when depth was the only variable in the model, and predicted the Scotia data (ROC AUC 0.822) better than the non-Scotia data (ROC AUC 0.759).
These results suggest that the most robust model for predicting distribution over space, based on the two different geographical areas, includes only depth, with the smooth based on the Ellet Line survey data:
SpWhOccSpatialRobust ~ s(Depth)EL
This spatially robust model predicts the highest probability of sperm whale occurrence in the southern parts of the Rockall Trough to the east of Anton Dohrn seamount, with high probability of occurrence also predicted along the shelf edges, in the deep Faroe-Shetland Channel, on both deep sides of the Wyville-Thompson Ridge, and to the west of Bill Bailey’s Bank.
Figure 3.10 - The performance (ROC AUC) of the sperm whale occurrence model based on (a) Faroe-Shetland Channel survey data and (b) Ellet Line survey data, as applied to the original data (black line) and the opposite data set (blue line) as each variable is added to the model. Variables in (a): 1 = s(Speed); 2 = s(Depth); 3 = s(HaloDepth); 4 = s(HaloStrength); 5 = s(ThermoStrength); 6 = s(SurfaceSalinity). Variables in (b): 1 = s(Depth); 2 = s(SST); 3 = s(SurfaceSalinity); 4 = s(Chlorophyll).