Materials and Methods

Presence absence data

Location records were acquired for 12 species of New Zealand’s endemic alpine grasshoppers and four mitochondrial lineages: Alpinacris crassicauda, Alpinacris tumidicauda, Brachaspis collinus, Brachaspis nivalis (SA), Brachaspis nivalis (FP), Paprides dugdali, Paprides nitidus, Sigaus australis, Sigaus campestris, Sigaus childi, Sigaus minutus, Sigaus piliferus (LW), Sigaus piliferus (MH) and Sigaus villosus. Brachaspis nivalis (SA) represents the northern clade of B. nivalis, while B. nivalis (FP) represents the southern clade. The central and northern clade of S. piliferus is represented by S. piliferus (LW) and the southern clade by S. piliferus (MH). Journal articles, books and Crown Pastoral Lease Tenure Reviews (CPLTR) were reviewed for location information. CPLTR are produced by Land Information New Zealand (LINZ) and contain conservation reports and ecological surveys carried out by the Department of Conservation on stations throughout the South Island of New Zealand. Further records were retrieved from the Phoenix Lab collection at Massey University. Longitude and latitude coordinates for each location were obtained from NZ Topo Map (www.topomap.co.nz).

Location records for two endemic lowland grasshopper species, Phaulacridium marginale and Phaulacridium otagoense, were also included in the data set (Sivyer et al. 2018). Because both species have a lowland distribution, their location data points provide additional absence information for the alpine species, which increases the accuracy of the niche models. All records were pooled into a binary table in which the presence or absence of each species was recorded for each location. The presence points of each species were mapped in turn using QGIS v.2.16.1 (QGIS Development Team 2017), both to show what information was provided to the models (in the form of presence points for each species) and to visualise the distribution of each species. The base map of New Zealand onto which these distributions were projected was created from the “Vegetative Cover Map of New Zealand” file obtained from the LRIS Portal (https://lris.scinfo.org.nz/) (Leathwick et al. 2012, Newsome 1987). The files were edited in QGIS, where broad vegetation groupings were regrouped and coloured by category: native forest, agricultural land, tussock grasslands, urban, wetlands, sand dunes, and ice and waterways.
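The pooling step described above can be illustrated with a minimal sketch. The site identifiers and the function name `build_presence_absence` are hypothetical, and the sketch assumes that a species not recorded at a location is scored as absent there; it is not the script used in this study.

```python
from collections import defaultdict

def build_presence_absence(records, species):
    """Pool (location, species) presence records into a binary table.

    Returns {location: [0/1 per species]}, with columns ordered as in `species`.
    """
    seen = defaultdict(set)
    for location, name in records:
        seen[location].add(name)
    # A species not recorded at a location is scored 0 (assumed absent).
    return {
        location: [1 if name in found else 0 for name in species]
        for location, found in sorted(seen.items())
    }

# Hypothetical records for illustration only.
species = ["Sigaus australis", "Brachaspis nivalis", "Paprides nitidus"]
records = [
    ("site_A", "Sigaus australis"),
    ("site_A", "Brachaspis nivalis"),
    ("site_B", "Paprides nitidus"),
]
table = build_presence_absence(records, species)
for location, row in table.items():
    print(location, row)
```

Each additional location, including those where only a lowland species was found, adds a row of zeros for the alpine species not recorded there, which is how the extra records contribute absence information.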

Predictor variable layers

To define and project the potential niche of each grasshopper, 19 bioclimatic variables were obtained from the Worldclim website (http://www.worldclim.org/) for three time periods: the Last Glacial Maximum (LGM) (~15,000 years ago), ‘current’ (i.e. data averaged from 1960-1990) and ‘future’ (i.e. data predicted and averaged from 2061-2080) (Table 4.2). These were at a resolution of 2.5 arc-minutes (approx. 5 km²) for the LGM and 30 arc-seconds (approx. 1 km²) for the current and future layers. Two predicted climate layers were acquired for the ‘future’ time period, RCP2.6 and RCP8.5, representing potential low and high greenhouse gas concentration scenarios, respectively. All climate layers acquired were produced using the MIROC-ESM global climate model (Watanabe et al. 2011). The Worldclim files were cropped to the extent of New Zealand (Latitudes: -49,-32; Longitudes: 165,180) using QGIS. A Variance Inflation Factor (VIF) analysis, carried out with the R package ‘VIF’ (Lin et al. 2011), refined the list of 19 climate variables down to seven. Variance Inflation Factor analyses use stepwise selection to identify and remove collinear variables. This increases parsimony and minimises over-fitting during the modelling process (Fletcher et al. 2016).
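The logic of a stepwise VIF screen can be sketched as follows. This is the generic threshold-based procedure, not the internals of the R package ‘VIF’; the threshold of 10, the function names and the toy values are assumptions made for illustration. The sketch uses the identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix, so it requires that no variable is an exact linear combination of the others.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix must be non-singular)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_scores(data):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[k] for k in names]))
    return {k: inverse[i][i] for i, k in enumerate(names)}

def stepwise_vif(data, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF until all are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif_scores(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)

# Toy values at six hypothetical sites; bio10 is deliberately almost collinear with bio1.
layers = {
    "bio1": [10.0, 12.0, 9.0, 14.0, 11.0, 8.0],
    "bio10": [15.1, 16.9, 14.0, 19.2, 15.9, 13.1],
    "bio12": [1200.0, 800.0, 1500.0, 950.0, 700.0, 1100.0],
}
scores = vif_scores(layers)
kept = stepwise_vif(layers)
print(scores)
print(kept)
```

In this toy case one of the two temperature variables is removed and the precipitation variable is retained, which mirrors how the 19 bioclimatic variables were reduced to a smaller, less collinear set.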

Soil and vegetation (hereafter ‘vege’) layers were also acquired, as these two variables are known to influence the distribution of some grasshopper species (Nattier et al. 2013; Weiss et al. 2013). The soil and vege data layers were rasterized, clipped and re-scaled from their original files, “Fundamental Soil Layers New Zealand Soil Classification” and “Vegetative Cover Map of New Zealand”, obtained through the Land Resource Information System (LRIS) portal (Landcare Research New Zealand Limited 2010, Leathwick et al. 2012). No past (e.g. LGM) or future (e.g. 2070) GIS files exist for soil and vegetation cover in New Zealand. These layers represent the current approximate state of vegetation and soil in New Zealand, and were therefore used as static layers throughout the modelling process. Ecological Niche Models that include such static layers are known to perform as well as, or better than, models where only dynamic variables are included (Stanton et al. 2012).

Table 4.2 The 21 predictor variables initially investigated for use in the Ecological Niche Models. This list was refined to nine variables following a Variance Inflation Factor (VIF) analysis, with variables retained for modelling shaded in grey.

Variable Code    Variable Description
Bio1             Annual mean temperature
Bio2             Mean diurnal range (mean of monthly (max temp - min temp))
Bio3             Isothermality (BIO2/BIO7) (× 100)
Bio4             Temperature seasonality (standard deviation × 100)
Bio5             Max temperature of warmest month
Bio6             Min temperature of coldest month
Bio7             Temperature annual range (BIO5-BIO6)
Bio8             Mean temperature of wettest quarter
Bio9             Mean temperature of driest quarter
Bio10            Mean temperature of warmest quarter
Bio11            Mean temperature of coldest quarter
Bio12            Annual precipitation
Bio13            Precipitation of wettest month
Bio14            Precipitation of driest month
Bio15            Precipitation seasonality (coefficient of variation)
Bio16            Precipitation of wettest quarter
Bio17            Precipitation of driest quarter
Bio18            Precipitation of warmest quarter
Bio19            Precipitation of coldest quarter
Soil             Fundamental soil layers of New Zealand

Ecological niche models

The R package ‘biomod2’ v. 3.3-7 (Thuiller et al. 2016) was used to model the environmental niche of each species. Ecological Niche Models were produced at two taxonomic resolutions. The first was the species level, modelling the ecological niche of each of the 12 alpine grasshopper species. In this instance, Brachaspis nivalis (SA) and Brachaspis nivalis (FP) were combined into Brachaspis nivalis (All), and Sigaus piliferus (LW) and Sigaus piliferus (MH) were combined into Sigaus piliferus (All). The second was the mitochondrial lineage level, where the ecological niches of Brachaspis nivalis (SA) and Brachaspis nivalis (FP), and of Sigaus piliferus (LW) and Sigaus piliferus (MH), were compared with those of Brachaspis nivalis (All) and Sigaus piliferus (All), respectively. This analysis was undertaken to reveal whether these distinct mitochondrial lineages are diverging ecologically.

Modelling methods

Ten different modelling methods were selected to analyse the presence and absence data of the grasshoppers against ‘current’ predictor variables: Generalised Linear Model (GLM), Generalised Boosting Model/Boosted Regression Tree (GBM), Generalised Additive Model (GAM), Classification Tree Analysis (CTA), Artificial Neural Network (ANN), Surface Range Envelope (SRE), Flexible Discriminant Analysis (FDA), Multiple Adaptive Regression Splines (MARS), Random Forest (RF) and Maximum Entropy (MAXENT). Each model and its use within ‘biomod2’ are described in depth in Diniz-Filho et al. (2009). All modelling parameters were kept at default values. For each model, 80% of the data was used for calibration and the remaining 20% for testing; each model was repeated three times, resulting in a total of 30 models per species. Output values of variable importance were calculated as 1 - the mean correlation score of each variable, with scores closest to 1 indicating a variable is of high importance.
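The run design and the variable importance score can be illustrated with a short sketch. It assumes that importance is obtained by shuffling one predictor at a time and taking 1 minus the correlation between the original and the shuffled predictions; the toy predictor `toy_model`, the site values and the random seed are hypothetical, and no ‘biomod2’ code is reproduced here.

```python
import random

METHODS = ["GLM", "GBM", "GAM", "CTA", "ANN", "SRE", "FDA", "MARS", "RF", "MAXENT"]
REPEATS = 3

def split_80_20(n_records, rng):
    """Randomly assign 80% of record indices to calibration and 20% to testing."""
    indices = list(range(n_records))
    rng.shuffle(indices)
    cut = int(round(0.8 * n_records))
    return indices[:cut], indices[cut:]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def variable_importance(predict, rows, variable, rng):
    """1 - correlation between predictions before and after shuffling `variable`."""
    baseline = [predict(row) for row in rows]
    values = [row[variable] for row in rows]
    rng.shuffle(values)
    permuted = [predict({**row, variable: v}) for row, v in zip(rows, values)]
    return 1.0 - pearson(baseline, permuted)

def toy_model(row):
    # Hypothetical fitted model: suitability falls as annual mean temperature rises
    # and ignores precipitation entirely.
    return max(0.0, 1.0 - row["bio1"] / 30.0)

rng = random.Random(42)
runs = [(method, repeat) for method in METHODS for repeat in range(REPEATS)]
train, test = split_80_20(50, rng)

rows = [{"bio1": 5.0 + i, "bio12": 2000.0 - 37.0 * ((i * 7) % 20)} for i in range(20)]
importance = {v: variable_importance(toy_model, rows, v, rng) for v in ("bio1", "bio12")}
print(len(runs), len(train), len(test))
print(importance)
```

A predictor that the model ignores leaves the predictions unchanged when shuffled, so its correlation is 1 and its importance is 0, whereas shuffling an influential predictor lowers the correlation and moves the score towards 1.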

Model accuracy was assessed using two evaluation methods: Receiver Operating Characteristic (ROC) (i.e. Area Under the Curve (AUC)) and True Skill Statistic (TSS). Models with ROC values of 0.9–1 and TSS values of 0.8–1 are considered to have ‘excellent’ predictive power (accuracy) (Thuiller et al. 2009). An ensemble model was then created from the subset of these models with ROC values >0.9. Using the ensemble model, ensemble forecasts were projected for the LGM and the 2061-2080 datasets. Plots were produced for each time period using the Ensemble Model mean weights model (EMmw). EMmw variable importance was calculated by applying the weights produced in the ensemble model to the associated models in the 30 model data set. As models were run three times for each predictor variable, these were then averaged, and EMmw variable importance was calculated by summing these averages for each predictor variable and dividing by the number of modelling methods used (i.e. 10) (Fletcher et al. 2016). Final scores of variable importance were converted into percentages of total variable importance for each modelling method.
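The two evaluation statistics and the weighted ensemble can be written out in a few lines. This is a sketch under stated assumptions: AUC is computed as the probability that a presence outscores an absence, TSS as sensitivity + specificity - 1, and ensemble weights are taken as proportional to each retained model's ROC score. The function names and toy values are hypothetical and the weighting scheme may differ from the one applied by ‘biomod2’.

```python
def auc(observed, scores):
    """Probability that a random presence scores above a random absence (ties count half)."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in presences for a in absences)
    return wins / (len(presences) * len(absences))

def tss(observed, predicted):
    """True Skill Statistic = sensitivity + specificity - 1, from binary predictions."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def weighted_mean_ensemble(predictions, roc_scores, min_roc=0.9):
    """Average per-pixel predictions of models with ROC > min_roc, weighted by ROC."""
    kept = [(p, w) for p, w in zip(predictions, roc_scores) if w > min_roc]
    total = sum(w for _, w in kept)
    n_pixels = len(kept[0][0])
    return [sum(p[i] * w for p, w in kept) / total for i in range(n_pixels)]

# Toy test data: three presences and three absences.
observed = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
binary = [1 if s > 0.45 else 0 for s in scores]  # 0.45 is an arbitrary cutoff
model_auc = auc(observed, scores)
model_tss = tss(observed, binary)

# Three hypothetical models predicting two pixels; the third falls below ROC 0.9.
ensemble = weighted_mean_ensemble(
    [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]],
    [0.95, 0.92, 0.70],
)
print(model_auc, model_tss, ensemble)
```

Filtering on ROC before weighting means poorly performing models contribute nothing to the ensemble, while the retained models contribute in proportion to their evaluation score.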

Binary vectors of each ENM were generated for range change and fragmentation statistic analyses. These binary vectors were generated from the 30 niche model dataset: each pixel (1 m²) was scored 1 if it exceeded the predetermined cutoff value in >50% of models, and 0 otherwise. When comparing binary vectors between current and future models, biomod2 ranked pixels as: ‘Never occupied’ (i.e. pixels were unoccupied and remain unoccupied between models), ‘Always occupied’ (i.e. pixels were occupied and remain occupied between models), ‘Lost’ (i.e. pixels were occupied but become unoccupied in the future RCP scenario) or ‘Gained’ (i.e. pixels that were unoccupied in the ‘current’ model become occupied in the RCP model), from which range change statistics were then calculated. Fragstats, implemented in the R package ‘SDMTools’ v 1.1-221 (VanDerWal et al. 2014), also used these binary files to estimate various fragmentation statistics (e.g. patch size, number of patches, core area of patches), which were compared between the ‘current’ niche models and each RCP scenario. Pixels that were connected within each of the binary files were given unique patch identities, and the area and number of these patches were calculated for each scenario. In this study, all patch fragments smaller than 100 m² were excluded from the fragmentation statistics analysis. A flowchart of the methods implemented in this study can be seen in Figure 4.2.
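The three steps described above (majority-vote binarisation, pixel-by-pixel range change, and patch labelling) can be sketched as follows. The function names, toy grids and the choice of 8-neighbour connectivity are assumptions made for illustration; percentages are expressed relative to the currently occupied range, which may differ in detail from the statistics reported by ‘biomod2’ and ‘SDMTools’.

```python
def committee_binary(model_scores, cutoffs):
    """Score a pixel 1 if more than 50% of models exceed their own cutoff, else 0."""
    n_models = len(model_scores)
    binary = []
    for pixel_scores in zip(*model_scores):
        votes = sum(s > c for s, c in zip(pixel_scores, cutoffs))
        binary.append(1 if votes > n_models / 2 else 0)
    return binary

def range_change(current, future):
    """Count never/always/lost/gained pixels between two binary vectors."""
    names = {(0, 0): "never", (1, 1): "always", (1, 0): "lost", (0, 1): "gained"}
    counts = {name: 0 for name in names.values()}
    for pair in zip(current, future):
        counts[names[pair]] += 1
    occupied_now = counts["always"] + counts["lost"]
    counts["pct_lost"] = 100.0 * counts["lost"] / occupied_now
    counts["pct_gained"] = 100.0 * counts["gained"] / occupied_now
    return counts

def patch_sizes(grid):
    """Sizes (in pixels) of connected patches of 1s, using 8-neighbour connectivity."""
    rows, cols = len(grid), len(grid[0])
    seen, sizes = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:  # flood fill from the first unvisited occupied pixel
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            sizes.append(size)
    return sizes

# Three hypothetical models scoring three pixels, each with a cutoff of 0.5.
binary = committee_binary(
    [[0.9, 0.2, 0.7], [0.8, 0.6, 0.1], [0.3, 0.7, 0.2]],
    [0.5, 0.5, 0.5],
)
change = range_change([1, 1, 0, 0, 1], [1, 0, 0, 1, 0])
sizes = patch_sizes([[1, 1, 0, 0],
                     [0, 0, 0, 1],
                     [1, 0, 1, 0]])
large_patches = [s for s in sizes if s >= 2]  # drop patches below a minimum size
print(binary, change, sizes, large_patches)
```

Multiplying each patch size by the pixel area converts the counts into areas, after which a minimum-area filter such as the one used in this study can be applied.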

[Figure 4.2: flowchart of the methods implemented in this study. Recoverable panels: Presence/Absence Data (80% train, 20% test); Predictor Variables (Bioclim data ×7; Soil and Vegetation layers); Species Distribution Models.]

In document The ecology and evolution of New Zealand's endemic alpine grasshoppers: a thesis presented in partial fulfilment of the requirement for the degree of Doctor of Philosophy in Zoology, Massey University, New Zealand (Page 163-168)