Simulation 5: The importance of including a random effect

Depending on how the trial surveys are conducted, it is foreseeable that repeat trials may be conducted on the same individual, in which case the detection function needs to be modelled using, e.g., a mixed effects model to take account of the non-independence of the trial observations (see Section 5.3.1 on page 79). The inclusion of random effects in the model increases statistical complexity, possibly rendering the analysis of data collected using the trapping point transect field methods arduous for field biologists without input from a statistician. In addition, the inclusion of a random effect increases computational time, especially integrating out the random effect from the detection function, which is the most time-consuming aspect of this method³ (although more efficient integration methods could be implemented, e.g. numerical integration). The consequences of omitting the random effect from the detection function in the analysis are investigated below.
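To make "integrating out the random effect" concrete, the following is a minimal sketch of numerical integration over an individual random intercept. It assumes a logistic detection function with a normally distributed random intercept, which is an assumption made here for illustration rather than the model specified in this section. The names `marginal_detection`, `beta0`, `beta1` and `re_sd`, and all parameter values, are hypothetical.

```python
import math


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normal_pdf(b, sd):
    """Density of N(0, sd^2) evaluated at b."""
    return math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def marginal_detection(distance, beta0, beta1, re_sd, n_steps=400, width=8.0):
    """Average the conditional detection probability over b ~ N(0, re_sd^2).

    Uses composite Simpson's rule on [-width * re_sd, width * re_sd];
    n_steps must be even. The logistic form is an illustrative assumption.
    """
    if re_sd == 0.0:
        return expit(beta0 + beta1 * distance)
    lower = -width * re_sd
    h = (2.0 * width * re_sd) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        b = lower + k * h
        if k == 0 or k == n_steps:
            weight = 1.0
        elif k % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * expit(beta0 + beta1 * distance + b) * normal_pdf(b, re_sd)
    return total * h / 3.0


# Hypothetical parameter values, chosen only to illustrate the calculation.
conditional_at_zero = expit(2.0)  # detection probability for an "average" individual
marginal_small_var = marginal_detection(0.0, 2.0, -0.15, math.sqrt(0.1))
marginal_large_var = marginal_detection(0.0, 2.0, -0.15, math.sqrt(0.3))
```

With these hypothetical values, the marginal (population-averaged) detection probability moves further from the conditional probability at b = 0 as the random effect variance grows, which is one way to see why ignoring a large random effect variance matters more than ignoring a small one.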

6.7.1 Methods

Using the three detection function scenarios and the “Adaptive” method to select trial distances, I simulated trial survey data that included a random effect component. Instead of fitting a detection function that modelled the random effect variance, as in Simulation 1, the random effect was ignored. Total survey effort was fixed at 360 trap nights, 24 trials were conducted per individual, and there were 15 individuals in the survey. Results were compared to those obtained when the random effect component was included (i.e., Simulation 1).

I also ran these two simulations excluding the group covariate and compared the results to when the group covariate was excluded but the random effect variance was included (i.e., Simulation 2), for the same level of survey effort.

³ Note: conducting 999 simulations did not typically take more than 16 hours of computer time; run time depended on abundance in the main survey, underlying detectability, complexity of the fitted detection function, survey effort and the processing power of the computer.
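The design described in the Methods above (trial data generated with an individual random effect, then analysed as if trials were independent) can be sketched as follows. This is a simplified illustration, not the simulation code used for these results. The logistic detection function, the parameter values, the uniform draw of trial distances (standing in for the “Adaptive” method) and the function names are all assumptions. The random effect variance of 0.3 is taken from the “Low” scenario.

```python
import math
import random


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_trials(n_individuals=15, trials_per_individual=24,
                    beta0=2.0, beta1=-0.15, re_var=0.3,
                    max_distance=40.0, seed=1):
    """Simulate repeat trials with one random intercept per individual.

    Returns a list of (individual, distance, detected) tuples.
    beta0, beta1 and max_distance are hypothetical values.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_individuals):
        b_i = rng.gauss(0.0, math.sqrt(re_var))  # individual random effect
        for _ in range(trials_per_individual):
            d = rng.uniform(0.0, max_distance)   # stand-in for adaptive distances
            p = expit(beta0 + beta1 * d + b_i)
            data.append((i, d, 1 if rng.random() < p else 0))
    return data


def fit_logistic_ignoring_re(data, n_iter=50, tol=1e-10):
    """Fit detected ~ distance by Newton-Raphson, treating trials as independent."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for _, d, y in data:
            p = expit(b0 + b1 * d)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * d      # score for the slope
            h00 += w
            h01 += w * d
            h11 += w * d * d
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


trial_data = simulate_trials()            # 15 individuals x 24 trials = 360 trials
naive_b0, naive_b1 = fit_logistic_ignoring_re(trial_data)
```

Repeating this over many simulated data sets, and comparing against a fit that estimates the random effect variance, would mirror the "No RE" versus "With RE" comparison reported in the Results.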

6.7.2 Results

When the group covariate was included in the analysis (Table 6.4), omitting the random effect component of the model decreased the percentage bias of ˆN2 under the “High” and “Medium” detectability scenarios (from 4.26% with the random effect to 3.34% without it for the “High” scenario, and from 7.60% to 6.21% for the “Medium” scenario). However, under the “Low” detectability scenario, omitting the random effect increased the percentage bias in ˆN2 (from 23.12% to 24.40%). The percentile intervals were typically narrower when the random effect was omitted⁴.

When the group covariate was excluded from the analysis (Table 6.5), the consequence of omitting the random effect was extremely noticeable for the abundance estimator ˆN3 (less so for ˆN2, which was already found to be extremely negatively biased when underlying heterogeneity was ignored; cf. Simulation 2 in Section 6.4). When the random effect was omitted, ˆN3 became more negatively biased, regardless of which detection function scenario was used (Table 6.5). For example, bias in ˆN3 changed from -3.34% when the random effects component was included in the model to -31.88% when it was not, for the “Low” detection function scenario.

Excluding the random effect component from the detection function when heterogeneity was not accounted for increased bias. It seems the probability of detection for the low group present in the population was severely overestimated, leading to the abundance estimate being severely underestimated (Figure 6.22; N.B. the same negative bias was found in Simulation 2 when heterogeneity was ignored).

⁴ Note: with no random effects variance or group covariate, ˆN2 = ˆN3.

6.7.3 Conclusion of Simulation 5

Uncertainty in ˆN2 and ˆN3 was typically larger when the random effect was included in the analysis, regardless of detection probability and of whether underlying heterogeneity in detection probability was accounted for. This occurred because the random effect adds an additional level of uncertainty regarding individual variability that a standard GLM does not. When the random effect was omitted for the “High” and “Medium” detection function scenarios, the percentage bias in ˆN2 decreased when underlying heterogeneity was accounted for, as in both these scenarios the individual random variance was small (b_{ig} ∼ N(0, 0.1), g = H or M). When the random effect variance was larger (i.e., the “Low” scenario, b_{iL} ∼ N(0, 0.3)), omitting the random effect increased bias in ˆN2. Similarly, when heterogeneity was excluded, omitting the random effect when the true individual random effect variance was very high (i.e., the “Low” scenario) caused large bias in ˆN3. When the repeated measures nature of the data (i.e., the random effect) was ignored, there was less bias in ˆN3 provided the important covariates (such as group in this case) were known and included. When underlying heterogeneity is unknown, ˆN3 is extremely biased if the random effect, which does at least allow for between-individual variation, is excluded.

Table 6.4: ˆN2 and ˆN3 abundance estimates when the random effects component of the data structure is ignored (i.e., individual trials are assumed independent of each other, “No RE”) for the “High”, “Medium” and “Low” detection function scenarios (24 trials on each of 15 individuals). Models which include the random effects component (“With RE”) from Simulation 1 are provided for comparison. Note, with no random effects variance or group covariate ˆN2 = ˆN3. True abundance in the main survey was 2,000 individuals.

| Model | Scenario | mean(ˆN2) | sd(ˆN2) | 95% PI(ˆN2) | median(ˆN2) | % bias(ˆN2) | RMSE(ˆN2) |
|---|---|---|---|---|---|---|---|
| With RE | High | 2085.171 | 577.90 | (1145.67, 3462.51) | 2014.222 | 4.26 | 18.47 |
| No RE | High | 2066.75 | 543.25 | (1165.40, 3334.32) | 2023.50 | 3.34 | 17.31 |
| With RE | Med | 2151.91 | 699.43 | (1101.35, 3728.31) | 2062.29 | 7.60 | 22.63 |
| No RE | Med | 2124.26 | 678.65 | (966.98, 3672.16) | 2049.00 | 6.21 | 21.82 |
| With RE | Low | 2462.44 | 1607.33 | (737.92, 6751.62) | 2026.52 | 23.12 | 53.19 |
| No RE | Low | 2487.99 | 1753.04 | (797.75, 7513.65) | 2012.53 | 24.40 | 57.98 |

| Model | Scenario | mean(ˆN3) | sd(ˆN3) | 95% PI(ˆN3) | median(ˆN3) | % bias(ˆN3) | RMSE(ˆN3) |
|---|---|---|---|---|---|---|---|
| With RE | High | 2158.23 | 614.81 | (1159.66, 3577.30) | 2077.06 | 7.91 | 20.08 |
| No RE | High | 2066.75 | 543.25 | (1165.40, 3334.32) | 2023.50 | 3.34 | 17.31 |
| With RE | Med | 2241.93 | 750.30 | (1114.82, 4024.02) | 2129.12 | 12.10 | 24.93 |
| No RE | Med | 2124.26 | 678.65 | (966.98, 3672.16) | 2049.00 | 6.21 | 21.82 |
| With RE | Low | 2908.67 | 2101.58 | (793.05, 8667.91) | 2306.99 | 45.43 | 72.81 |
| No RE | Low | 2487.99 | 1753.04 | (797.75, 7513.65) | 2012.53 | 24.40 | 57.98 |
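As a quick check on how the % bias column in Table 6.4 relates to the mean estimates, the sketch below recomputes it as 100 × (mean − true abundance) / true abundance, with true abundance of 2,000. This definition is inferred from the tabulated values rather than stated in this section, so it should be read as an assumption. The function and variable names are illustrative.

```python
TRUE_ABUNDANCE = 2000.0


def percent_bias(mean_estimate, true_value=TRUE_ABUNDANCE):
    """Percentage bias of a mean estimate relative to the true value."""
    return 100.0 * (mean_estimate - true_value) / true_value


# (model, scenario) -> (mean of N2 estimates, % bias as reported in Table 6.4)
table_6_4_n2 = {
    ("With RE", "High"): (2085.171, 4.26),
    ("No RE", "High"): (2066.75, 3.34),
    ("With RE", "Med"): (2151.91, 7.60),
    ("No RE", "Med"): (2124.26, 6.21),
    ("With RE", "Low"): (2462.44, 23.12),
    ("No RE", "Low"): (2487.99, 24.40),
}

# Recompute the bias from each mean so it can be compared with the reported value.
recomputed = {key: percent_bias(mean) for key, (mean, _) in table_6_4_n2.items()}

for key, (mean, reported) in table_6_4_n2.items():
    print(key, f"recomputed={recomputed[key]:.2f}", f"reported={reported:.2f}")
```

The same function can be applied to the ˆN3 means, or to the means in any other table that uses the same true abundance.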

Table 6.5: Results for the ˆN2 and ˆN3 abundance estimators when the random effects component of the data structure is ignored (i.e., individual trials are assumed independent of each other) and the group covariate is also ignored, for the “Medium” and “Low” detection function scenarios. A high survey effort was used (24 trials on each of 15 individuals). Models which include the random effects component from Simulation 2 are provided for comparison. Note, with no random effects variance or group covariate ˆN2 = ˆN3. True abundance in the main survey was 2,000 individuals.

| Model | Scenario | mean(ˆN2) | sd(ˆN2) | 95% PI(ˆN2) | median(ˆN2) | % bias(ˆN2) | RMSE(ˆN2) |
|---|---|---|---|---|---|---|---|
| With RE | Med | 1883.09 | 626.05 | (910.59, 3340.79) | 1790.546 | -5.85 | 20.14 |
| No RE | Med | 1846.84 | 519.66 | (960.48, 3102.09) | 1787.38 | -7.66 | 17.13 |
| With RE | Low | 1404.67 | 477.15 | (616.05, 2511.76) | 1357.165 | -29.77 | 24.13 |
| No RE | Low | 1362.40 | 416.14 | (666.07, 2344.83) | 1323.83 | -31.88 | 24.09 |

| Model | Scenario | mean(ˆN3) | sd(ˆN3) | 95% PI(ˆN3) | median(ˆN3) | % bias(ˆN3) | RMSE(ˆN3) |
|---|---|---|---|---|---|---|---|
| With RE | Med | 2147.98 | 750.96 | (1020.72, 3979.09) | 2011.39 | 7.40 | 24.20 |
| No RE | Med | 1846.84 | 519.66 | (960.48, 3102.09) | 1787.38 | -7.66 | 17.13 |
| With RE | Low | 1933.12 | 765.94 | (816.56, 3797.42) | 1817.94 | -3.34 | 24.31 |
| No RE | Low | 1362.40 | 416.14 | (666.07, 2344.83) | 1323.83 | -31.88 | 24.09 |

ˆN2 is biased low (Panel C). The probability of detection of the high group is slightly biased high (Panels D and E), and hence the abundance estimate is slightly biased high (Panel F). The solid, large-dashed and small-dashed lines in Panels B and C, and E and F, are the true, mean and median simulated probabilities and abundance estimates for the high and low groups, respectively.

In document Estimating abundance of rare, small mammals: A case study of the Key Largo woodrat (Neotoma floridana smalli) (Page 157-162)