
4. Why do models predict differently for the same species and/or locations?

4.3.2 Multivariate analysis

4.3.2.1 Major trends

The density plots for model AUC, Kappa, sensitivity, specificity, precision and cross-validation error are given in Figure 4.7. The Kappa, AUC, sensitivity, specificity and precision scores were concentrated toward the upper end of the scale, showing that the majority of the models performed well above the level expected from a random prediction. The cross-validation error scores were concentrated toward the lower end of the scale, showing that most of the models had low cross-validation error. However, not all of the models with high AUC scores also had low cross-validation error, as shown by the CV error density curve being less leptokurtic than the AUC density curve. In this study, the Kappa statistic was found to provide the best discriminatory measure. In Figure 4.7, the scores other than Kappa and cross-validation error are strongly leptokurtic with thin tails, showing that almost all models performed very well on those measures, whereas the Kappa plot shows a more spread-out distribution, which allows better model ranking. Further statistical support for the choice of Kappa and CV error as model selection indices is given in section 4.3.2.2.
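As an illustration of why Kappa can spread models out more than the other threshold-based scores, the following minimal sketch computes sensitivity, specificity, precision and Cohen's Kappa from a single confusion matrix. The function name and the counts are hypothetical and are not taken from the study's models.

```python
def binary_scores(tp, fn, fp, tn):
    """Return presence/absence scores from confusion-matrix counts."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n  # proportion of correct predictions
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for one model (not data from this study).
scores = binary_scores(tp=40, fn=10, fp=5, tn=45)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts, sensitivity is 0.80 and specificity is 0.90, while Kappa is 0.70, because Kappa discounts the agreement expected by chance.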

The MANOVA results (Table 4.4) showed that all the modelling components and their interactions had a significant effect on the linear combination of the five model performance scores, with the exception of predictor choice (P), which did not have a significant effect (Pillai’s trace = 0.11, F = 1.50, η² = 0.05). However, it is important to note that the levels of the predictor choice (P) factor were not completely distinct from one another, as more variables were added from P1 to P2 and on to P3. The change in the variables selected as a result of the newly added variables is reported separately in section 4.3.2.1.

A follow-up canonical correlation analysis was undertaken, and the first canonical variable accounted for 52.4% of the model variance. The corresponding canonical correlation for the first variable was 0.903 (Wilks’ λ = 0.015, F = 3.53), showing that 81.5% (0.903² × 100) of the variance in the canonically derived scores was accounted for by the model component factors tested (species type, predictor choice, dimension reduction and model type) (Figure 4.8).
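The percentage quoted above is the squared canonical correlation expressed as a percentage. A short check of that arithmetic, using only the value reported in the text (the variable names are illustrative):

```python
canonical_r = 0.903  # first canonical correlation reported in the text

# Squared canonical correlation: share of variance in one canonical
# variate accounted for by the other, expressed as a percentage.
shared_variance_pct = canonical_r ** 2 * 100
print(f"{shared_variance_pct:.1f}%")
```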


Figure 4.7: Density plots of Kappa, AUC, Sensitivity and Specificity scores for the total of 180 models. The red line at 0.5 on the x-axis for AUC, cross-validation error, Sensitivity and Specificity shows the score expected from a random prediction; for Kappa, the red line at 0.4 marks the score below which models are expected to perform worse than a “medium performing model” on the Kappa scale. The blue line marks 0.8 for Kappa, above which models are expected to be excellent; 0.7 for AUC, a conventional threshold above which models are expected to be good; 0.9 for Sensitivity and Specificity, an arbitrarily assigned high-performance threshold; and 0.1 for cross-validation error, a threshold set as an acceptable training error margin for this study.


Table 4.4: MANOVA results: modelling component effects on model performance.

Modelling components               Pillai’s trace   η² × 100*   F      Df   p
Model type (MT)                    0.79             26.22       9.24   3    <0.0001
Dimension reduction (DR)           0.42             21.01       6.86   2    <0.0001
Species (SP)                       0.81             20.32       6.68   4    <0.0001
Predictor (P)                      0.11             5.50        1.50   2    0.138
Species x Predictor                0.68             13.51       2.58   8    <0.0001
Species x Dimension reduction      0.58             11.65       2.18   8    <0.0001
Predictor x Dimension reduction    0.49             12.37       3.70   4    <0.0001
Species x Predictor x Dim. Red.    0.95             18.98       1.93   16   <0.0001

Residuals 26.22 132

* The effect size (eta square) is multiplied by a factor of 100 for easy reporting
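The effect sizes in Table 4.4 appear to be consistent with a common multivariate effect size for Pillai’s trace, η² = V / s, where V is Pillai’s trace and s is the smaller of the number of response variables and the effect degrees of freedom. The sketch below checks this against the tabulated values. That this is the formula actually used is an assumption inferred from the numbers, as is the count of five response variables (taken from the text); the function and variable names are illustrative.

```python
N_RESPONSES = 5  # assumption: five performance scores, as stated in the text

# (Pillai's trace, effect Df, reported eta squared x 100) from Table 4.4
TABLE_4_4 = {
    "MT": (0.79, 3, 26.22),
    "DR": (0.42, 2, 21.01),
    "SP": (0.81, 4, 20.32),
    "P": (0.11, 2, 5.50),
    "SP x P": (0.68, 8, 13.51),
    "SP x DR": (0.58, 8, 11.65),
    "P x DR": (0.49, 4, 12.37),
    "SP x P x DR": (0.95, 16, 18.98),
}


def multivariate_eta_sq(pillai, df_effect, n_responses=N_RESPONSES):
    """Effect size from Pillai's trace: V / s, with s = min(p, df_effect)."""
    s = min(n_responses, df_effect)
    return pillai / s


for effect, (pillai, df_effect, reported) in TABLE_4_4.items():
    computed = 100 * multivariate_eta_sq(pillai, df_effect)
    print(f"{effect:12s} computed {computed:6.2f}   reported {reported:6.2f}")
```

The small differences between computed and reported values are of the size expected from Pillai’s trace being rounded to two decimals in the table.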

Figure 4.8: Structure correlations (canonical factor loadings) for the first canonical dimension. Arrows show the vector direction of the variables that correspond to the canonical component on the y-axis. The corresponding variables for the x-axis (combinations of modelling components) were not labelled so as not to overcrowd the graph.

4.3.2.2 Model performance measure selection

The canonical correlation analysis was used to determine the model performance measures that best described the effects of the modelling components. The standardized coefficients of the canonical correlation analysis showed that the Kappa score contributed most of the variance of the first canonical variable (79.9%) and that cross-validation error contributed the most to the second canonical variable (62.7%). The strong negative correlation between Kappa and cross-validation error was a further indication that the multivariate analysis was supported by appropriate dependent variables, as recommended by Tabachnick and Fidell (2001). Therefore, the Kappa score and cross-validation error were used to further investigate the significant model component interactions using individual ANOVAs and Tukey’s honestly significant difference (Tukey’s HSD) post-hoc analysis.

4.3.2.3 Quantifying variance contribution of modelling factors

Individual follow-up ANOVAs were performed for Kappa and cross-validation error scores, and the results largely agree with the MANOVA analysis. Even though smaller residuals were obtained for the ANOVA based on cross-validation error scores, the general ANOVA statistics for Kappa and CV error scores were similar. Therefore, the statistics for Kappa scores are presented below. All main effects were significant (ANOVA test, SS > 0.24, η² > 0.12, p < 0.0001) with the exception of predictor choice (SS = 0.007, η² = 0.003, p = 0.82). All interactions were also significant (ANOVA test, SS between 0.17 and 0.52, η² between 0.09 and 0.22, p between 0.0001 and 0.013).
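For readers unfamiliar with the η² values quoted above, the following minimal sketch shows how an effect sum of squares (SS) and η² = SS_effect / SS_total are obtained for a single factor. It is a one-way simplification of the factorial ANOVA used in the study, and the Kappa values are hypothetical, not results from this chapter.

```python
def eta_squared(groups):
    """Return (SS_between, SS_total, eta squared) for a one-way layout."""
    values = [v for group in groups.values() for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Between-group SS: group size times squared deviation of the group mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return ss_between, ss_total, ss_between / ss_total


# Hypothetical Kappa scores per model type (not data from this study).
kappa_by_model_type = {
    "QDA": [0.62, 0.58, 0.66],
    "LOG": [0.55, 0.60, 0.57],
    "CART": [0.70, 0.74, 0.69],
    "SVM": [0.80, 0.77, 0.83],
}

ss_between, ss_total, eta_sq = eta_squared(kappa_by_model_type)
print(f"SS = {ss_between:.3f}, eta squared = {eta_sq:.3f}")
```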

Hierarchical partitioning (Chevan & Sutherland, 1991; MacNally, 2000) was carried out to quantify the independent contributions of the modelling factors, species data (SP), predictor choice (P), dimension reduction (DR) and model type (MT), to mean Kappa and cross-validation error scores. Species data (SP) was identified as the source of the largest variation in both Kappa scores and model cross-validation errors (54.8% and 47.5% respectively), followed by model type (MT), which accounted for 38.1% and 43.8% of the variation in Kappa and CV error scores respectively. Dimension reduction (DR) accounted for 6.8% of the variation in Kappa scores and 8.6% of the variation in cross-validation error, and predictor choice (P) accounted for 0.2% and 0.1% of the variation in Kappa and cross-validation error scores respectively. The overall trend largely conforms to the results reported by Dormann et al. (2008) in their factorial study to quantify modelling uncertainties involving similar modelling components (not including the species data (SP) factor, as one species was used in their study). The importance of model type as a major source of variation in predictions is also reported by similar studies (Elith et al., 2006; Pearson et al., 2006; Buisson et al., 2010).
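The idea behind hierarchical partitioning can be sketched as follows: the independent contribution of a factor is its average improvement in goodness of fit when it is added to models containing the other factors. This sketch averages over all orders of entry, which is one common way of computing such contributions and is not the code used in the study. The `fit` function and all of its values are hypothetical stand-ins for the goodness of fit of models fitted to each subset of factors.

```python
from itertools import permutations

FACTORS = ("SP", "MT", "DR", "P")


def fit(subset):
    """Hypothetical goodness of fit (e.g. R squared) for a set of factors."""
    main = {"SP": 0.40, "MT": 0.30, "DR": 0.05, "P": 0.00}
    r2 = sum(main[f] for f in subset)
    if "SP" in subset and "MT" in subset:
        r2 += 0.10  # hypothetical gain that SP and MT explain only jointly
    return r2


def independent_contributions(factors, fit_fn):
    """Average each factor's gain in fit over all orders of entry."""
    totals = {f: 0.0 for f in factors}
    orderings = list(permutations(factors))
    for order in orderings:
        included = set()
        for f in order:
            before = fit_fn(included)
            included.add(f)
            totals[f] += fit_fn(included) - before
    return {f: total / len(orderings) for f, total in totals.items()}


contrib = independent_contributions(FACTORS, fit)
full_fit = fit(set(FACTORS))
for f in FACTORS:
    print(f"{f}: {contrib[f]:.3f} ({100 * contrib[f] / full_fit:.1f}% of total)")
```

In this hypothetical example the jointly explained 0.10 is split equally between SP and MT, and the contributions sum to the fit of the full model, which is what allows them to be reported as percentages of the total.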


Figure 4.9 Model mean Kappa scores compared over the four modelling components. Error bars indicate the standard deviation over replicate runs. Bars with different letters within a graph indicate statistically significant differences (Tukey’s HSD test, α = 0.025 for SP & DR, α = 0.05 for P & MT). Key to factor levels: Species data [SP], A. a = SP1, A. g = SP2, D. v. v = SP3, T. p = SP4, V. v = SP5. Predictor choice [P], BIO35+T4 = P3, BIO19 = P1 and BIO35 = P2. Dimension reduction [DR], RF = DR1, PCA = DR2, NLPCA = DR3. Model type [MT], QDA = MT1, LOG = MT2, CART = MT3, SVM = MT4. The comparison of mean CV errors showed the same pattern, except for a slightly higher CV error for BIO35+T4 (P3) than for BIO19 (P1), which is the opposite of the trend for Kappa scores. However, because the differences within the predictor choice (P) group were not significant, this was not investigated further.

Figure 4.10 Model mean CV error scores compared over the four modelling components. Error bars indicate the standard deviation over replicate runs. Bars with different letters within a graph indicate statistically significant differences (Tukey’s HSD test, α = 0.025 for SP & DR, α = 0.05 for P & MT).
