4.1 Introduction
Distance sampling (DS) is an established framework for estimating animal abundance (Buckland et al. 2001, 2004, 2015, Borchers et al. 2002). It allows for imperfect detection by assuming detection probability is a function of the distance between objects (e.g., animals or their sign), and specified locations (lines or points) from which objects may be observed. Careful modelling of this function is required to obtain accurate abundance estimates (Buckland et al. 2001, 2004, 2015). Exploratory analyses, goodness-of-fit (GOF) testing, and model selection are therefore critical components of DS analyses (Buckland et al. 2001, 2004, 2015, Marques et al. 2007, Thomas et al. 2010). GOF tests evaluate the null hypothesis that a model adequately fits the data; GOF tests for continuous and binned DS data were described by Buckland et al. (2001, 2015) and are implemented in DS software (Thomas et al. 2010, Laake et al. 2017). Rejection may indicate problems in the data or the structure of the model being tested, or violations of model assumptions. The purpose of model selection is the identification of a model or models that optimize the trade-off between bias and
precision of the parameters estimated from a given data set (Burnham and Anderson 2002, Johnson and Omland 2004).
DS methods assume that observations are independent (Buckland et al. 2001), but some DS surveys violate this assumption. For example, some animals travel in groups. Violation of the independence assumption can be avoided by treating the group as the unit of observation, measuring or estimating distances to the centre of detected
groups, and estimating animal density as the product of group density and mean group size (Buckland et al. 2001). However, this is only effective if the size and central location of the group are measured accurately (Buckland et al. 2001, 2010). When they cannot be, for example, because groups are widely spread or in motion, the recourse is to treat the individual as the unit of observation, and to record distances to all group members detected, in which case the data include non-independent observations. Furthermore, some animals, such as cetaceans that are often submerged, or songbirds that perch concealed in trees, are only available to be observed intermittently. However, if they give discrete cues of their presence and location, such as whale blows or bursts of birdsong, density of cues can be estimated using DS methods, and converted to estimates of animal density by dividing by the cue production rate (Buckland 2006, Buckland et al. 2001). During cue counting surveys, distances to all cues are recorded, so the data may include observations of distances to multiple cues given by the same animal(s), which again violates the independence assumption (Buckland 2006). Finally, Chapter 3 and Howe et al. (2017a) extended DS methods to accommodate data from camera traps (CTs). Distances to animals when first detected by CTs are expected to be positively biased, so the authors recommended programming cameras to record video, or multiple still images, each time the sensor is triggered, and measuring distances to each detected animal multiple times at predetermined “snapshot moments” during an independent encounter with a CT. The authors acknowledged that these observations would not be independent of each other.
Violations of the independence assumption do not bias point estimates of model parameters, but introduce overdispersion (Buckland et al. 2001), a situation where the data are more variable than expected under a given statistical model. When distance
data are overdispersed: (1) GOF tests, and likelihood ratio tests to compare the fit of nested models, are invalid and prone to type I error; (2) analytic variance estimators underestimate the actual uncertainty associated with the estimates, though modern empirical design-based estimators are robust to some violations (Fewster et al. 2009); (3) model selection criteria that have not been adjusted for overdispersion favour overly complex models with more than the optimal number of parameters (Cox and Snell 1989, McCullagh and Nelder 1989, Anderson et al. 1994, Burnham and Anderson 2002, Buckland 1984, Buckland et al. 2001, 2010, 2015, Richards 2008, Fewster et al. 2009). Akaike’s Information Criterion (AIC; Akaike 1973) is usually recommended for
selecting among candidate models of the detection function (Buckland et al. 2001, 2004, 2015, Marques and Buckland 2003, Marques et al. 2007, Thomas et al. 2010); however, if the data are overdispersed, AIC is likely to favour unnecessarily complex models (Buckland 2006, Buckland et al. 2001, 2010). This additional complexity reduces precision, and can cause bias if it affects the slope of the detection function near the line or the point. Criteria adjusted to account for overdispersion in observed distances have not been developed previously.
Detectability may vary in response to multiple factors other than distance. DS methods are pooling robust, so the total or average density estimated from the entire data set will generally be unbiased even when variation in detectability is ignored (in the case of differences between distinct spatial subsets of the greater study area, sampling effort should be proportional to the areas of the subsets; Burnham et al. 2004). However, density estimates specific to different population strata among which detectability varies, which might be different species, treatments, habitat types, time periods, etc., are expected to be biased if estimated from a common detection function
(Marques and Buckland 2003, Marques et al. 2007, Buckland et al. 2004, 2015). Observations within different strata can be analyzed separately to avoid this bias, but this can reduce sample sizes to the point where densities of some strata may not be estimable, or estimates may be too imprecise to be useful. The multiple covariate approach to DS analysis (MCDS) allows variation in detectability to be modelled using covariates, and support for differences among strata to be evaluated using model selection criteria (Marques and Buckland 2003, Buckland et al. 2004, Marques et al. 2007); if models with constant detectability are supported over those with differences among strata, data can be pooled across levels of those covariates when estimating detectability. MCDS can therefore improve precision and allow density to be estimated for strata with few detections without relying on an assumption of constant detectability across strata. It also casts decisions about how much stratification is necessary as a model selection problem, but in this case the quality of inferences about strata-specific densities is affected by the reliability of the model selection criterion. When the independence assumption is suspected or known to have been violated, it has been recommended that analysts constrain the complexity of the detection function and the number of covariates to avoid overfitting (Buckland et al. 2004, 2010, 2015, Marques et al. 2007). However, limiting the candidate set to simple models may not be desirable if there are multiple potential covariates of the detection function. Model selection criteria unadjusted for overdispersion will tend to select models that subdivide the data more than necessary, with adverse effects on precision. Conversely, “underfitting”, that is, failure to include significant sources of variation in the estimating model, would cause stratum-specific densities to be underestimated if true detection probabilities in that stratum tend to be lower than the average across strata, and vice versa. Adjusted criteria
could underfit if they overcompensated for overdispersion (e.g., if the magnitude of overdispersion was overestimated).
Although explicitly modeling the sources of overdispersion would be ideal, this is not always possible or practical (Cox and Snell 1989, McCullagh and Nelder 1989, Lebreton et al. 1992, Anderson et al. 1994, Burnham and Anderson 2002, Richards 2008). In other contexts, the total overdispersion (c) has been estimated as the χ² GOF statistic of the global model (i.e., the most highly parameterized or most general model) divided by its degrees of freedom (df); the result was denoted 𝑐̂ and included in the calculation of information criteria adjusted for overdispersion for all models in the candidate set, and this was sufficient to avoid overfitting (Cox and Snell 1989, Lebreton et al. 1992, Liang and McCullagh 1993, Anderson et al. 1994, Burnham and Anderson 2001, 2002). The adjusted version of AIC (QAIC) is calculated as
$$\mathrm{QAIC} = -2\left\{\frac{\log \mathcal{L}(\hat{\theta})}{\hat{c}}\right\} + 2K$$
where 𝜃̂ is a vector of maximum likelihood parameter estimates, and K is the number of parameters in the current model (Lebreton et al. 1992). Burnham and Anderson (2001, 2002) clarified that the parameter 𝑐̂ should be included as one of the K parameters to estimate. Technically, “QAIC” is a misnomer because no quasilikelihood theory is involved; I used this term to refer to an information criterion adjusted for overdispersion because it may be familiar to ecologists from other contexts, such as capture–recapture (Lebreton et al. 1992).
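The calculation can be illustrated with a minimal Python sketch. The function name `qaic` and the flag `count_c_hat` are hypothetical and used only for illustration; the flag reflects the convention of counting 𝑐̂ as one of the K parameters.

```python
def qaic(log_lik: float, c_hat: float, k: int, count_c_hat: bool = True) -> float:
    """Return QAIC = -2 * log_lik / c_hat + 2 * K.

    log_lik     : maximised log-likelihood of the fitted model
    c_hat       : estimated overdispersion factor (must be positive)
    k           : number of parameters in the detection function model
    count_c_hat : if True, c_hat is counted as one additional parameter
                  (hypothetical flag, added for this illustration)
    """
    if c_hat <= 0:
        raise ValueError("c_hat must be positive")
    n_params = k + 1 if count_c_hat else k
    return -2.0 * log_lik / c_hat + 2.0 * n_params


# With c_hat = 1 and c_hat not counted, QAIC reduces to AIC.
aic_value = qaic(-100.0, 1.0, 3, count_c_hat=False)
# With c_hat = 2 the likelihood term is halved, so extra parameters cost relatively more.
qaic_value = qaic(-100.0, 2.0, 3, count_c_hat=False)
```

Because the same 𝑐̂ is applied to every model being compared, counting it as a parameter adds a constant to each QAIC value and does not change the ranking of those models.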
Given an estimator of c (𝑐̂), the same approach could be used to calculate QAIC for models of the DS detection function. However, candidate sets usually include models with different general forms (termed “key functions”; e.g., half-normal, hazard rate, and uniform; Buckland et al. 2001) and numbers of adjustment terms, as well as
different covariate combinations (Buckland et al. 2004, 2015, Marques et al. 2007). Models with different key functions are not nested within any one model (i.e., not all models can be defined as simplifications of the most highly-parameterized model), hence it may not always be straightforward to identify a single “global” model from which to estimate c. For example, Buckland (2006) considered an unadjusted hazard rate model, a half-normal model with a maximum of one adjustment term, and a uniform model with a maximum of two adjustment terms in his analysis of cue counting data from four songbird species. Thus three different models included two parameters, and no single model was the most general or most highly-parameterized. Below we propose and evaluate two estimators of c, and a two-step model selection procedure that does not require that a single global model is identifiable, for use with overdispersed DS data.
4.2 Methods
4.2.1 Model selection criteria and procedures
We suggest the χ² GOF statistic for binned distance data (Buckland et al. 2001, p. 71, eqn. 3.57) divided by its degrees of freedom as one estimator of c (𝑐̂1). Johnson et al. (2010) proposed a one-stage, model-based approach for simultaneously estimating detectability and spatially variable abundance from DS data, and also evaluated the effectiveness of an overdispersion factor calculated from a χ² test performed on transect-specific counts for inflating model-based variances around abundance estimates to account for overdispersion introduced by fine-scale variation in local abundance. They found that variances were still underestimated except where there were many transects, and suggested the χ² GOF test for binned distance data divided by its degrees of
freedom as an alternative estimator. However, it is not clear to us how a statistic
derived from the observed distances would quantify overdispersion induced by variation in local abundance, so we use this statistic to adjust for overdispersion only when
modeling the detection function.
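A minimal sketch of this estimator is given below, assuming the usual Pearson form of the χ² statistic for binned data, with df equal to the number of bins minus the number of estimated parameters minus one. The function name `c_hat1` and the example counts and bin probabilities are hypothetical.

```python
def c_hat1(observed, expected_probs, n_params):
    """Chi-square GOF statistic for binned distances divided by its df.

    observed       : detections counted in each distance bin
    expected_probs : fitted probability that a detection falls in each bin
                     (should sum to 1 under the fitted detection function)
    n_params       : number of estimated detection function parameters
    """
    n = sum(observed)
    chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs))
    df = len(observed) - n_params - 1  # assumed form: bins - parameters - 1
    if df <= 0:
        raise ValueError("too few bins for the number of parameters")
    return chi2 / df


# Hypothetical example: three bins, one-parameter model.
probs = [0.5, 0.3, 0.2]
independent = c_hat1([30, 20, 10], probs, n_params=1)
# Recording every observation six times multiplies each count, and hence
# the statistic, by six, while the fitted bin probabilities are unchanged.
replicated = c_hat1([180, 120, 60], probs, n_params=1)
```

In this toy example the six-fold replication inflates the statistic by exactly six, which is the behaviour that makes the statistic a candidate estimator of c.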
To allow for the possibility that multiple models may include the maximum number of parameters, and the fact that DS models have different general forms, we propose the following “two-step” model selection procedure. In step one we use QAIC to identify the best-supported model within each key function, and in step two we compare the relative goodness-of-fit of the best-supported models with different key functions. More specifically, in step one, we obtain 𝑐̂1 from the most highly-parameterized model within each key function (rather than from the most highly-parameterized model overall), use those values of 𝑐̂1 to calculate QAIC for all models with the same key function, and identify the QAIC-minimizing model within each key function. In this step, the same value of 𝑐̂1 is used to calculate QAIC for all models with the same key function, but different values of 𝑐̂1 are used to calculate QAIC for different key functions. In step two, we compare values of the χ² GOF statistic divided by its df across QAIC-minimizing models (one from each key function), and select the model with the smallest value for estimation. Hence, the final decision is based on the values of 𝑐̂1, not QAIC (a different metric is used for decision making in each of the two steps). If continuous distances are recorded in the field, distance observations will first need to be grouped into categories so that the GOF test for binned distance data can be performed. See Buckland et al. (2001) for advice regarding binning continuous observations.
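The two-step procedure can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the `Fit` record, the function `two_step_select`, and all numerical values are hypothetical, and when several models within a key function share the maximum number of parameters the sketch simply takes the first.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    name: str       # label for the fitted model
    key: str        # key function, e.g. "hn" or "hr"
    log_lik: float  # maximised log-likelihood
    k: int          # number of parameters
    chi2: float     # chi-square GOF statistic for binned distances
    df: int         # degrees of freedom of that statistic


def two_step_select(fits):
    """Return (selected model, list of QAIC-minimising models per key function)."""
    by_key = {}
    for f in fits:
        by_key.setdefault(f.key, []).append(f)

    winners = []
    for group in by_key.values():
        # Step one: c-hat from the most highly parameterised model in this key function.
        richest = max(group, key=lambda f: f.k)
        c_hat = richest.chi2 / richest.df
        # Same c-hat for every model sharing the key function.
        winners.append(min(group, key=lambda f: -2.0 * f.log_lik / c_hat + 2.0 * f.k))

    # Step two: compare chi2/df across the per-key winners; the smallest is selected.
    return min(winners, key=lambda f: f.chi2 / f.df), winners


# Made-up fits for two key functions.
fits = [
    Fit("hn0", "hn", -1000.0, 1, 40.0, 8),
    Fit("hn1", "hn", -997.0, 2, 35.0, 7),
    Fit("hr0", "hr", -998.0, 2, 42.0, 7),
    Fit("hr1", "hr", -996.0, 3, 36.0, 6),
]
selected, winners = two_step_select(fits)
```

With these made-up values, unadjusted AIC would prefer `hn1` over `hn0` (1998 versus 2002), whereas QAIC with 𝑐̂1 = 5 prefers the simpler `hn0` (402 versus 402.8), illustrating how the adjustment penalises additional parameters more heavily.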
In CT surveys or surveys of groups of animals, the number of distance observations recorded per independent encounter (e.g., a single pass by an animal in front of a CT, or a single observation of a group of animals) between an animal and an observer describes the frequency of violations directly, and so provides an alternative measure of the magnitude of overdispersion (𝑐̂2). 𝑐̂2 can be calculated from the raw data, and will be the same for all models in the candidate set. In CT surveys of solitary animals, 𝑐̂2 would be the mean number of distance observations recorded during a single pass by an animal in front of a CT. In surveys of social animals employing human observers, 𝑐̂2 would be the mean number of detected animals per detected group, and in CT surveys of social animals 𝑐̂2 would be the mean number of distance observations recorded per triggering event. In cue counting surveys, 𝑐̂2 would be unknown because multiple cues from the same animal cannot be identified as such. Where calculable, 𝑐̂2 could be used instead of multiple values of 𝑐̂1 to calculate QAIC values as in step one above. QAIC values would still be compared only within key functions, and the χ² GOF statistic divided by its df would still be used in step two to select among QAIC-minimizing models with different key functions, i.e., the two-step procedure would remain the same, but the definition of 𝑐̂ and therefore the values of QAIC used in step one would differ. Hereafter, we will refer to QAIC calculated from 𝑐̂1 as QAIC1, and from 𝑐̂2 as QAIC2.
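As a brief illustration, 𝑐̂2 can be computed directly from the raw data when each distance observation carries an identifier for the independent encounter it belongs to. The function name `c_hat2` and the identifiers below are hypothetical.

```python
from collections import Counter


def c_hat2(encounter_ids):
    """Mean number of distance observations per independent encounter.

    encounter_ids : one identifier per recorded distance, shared by all
                    observations from the same pass, group, or triggering event
    """
    counts = Counter(encounter_ids)
    if not counts:
        raise ValueError("no observations supplied")
    return len(encounter_ids) / len(counts)


# Hypothetical data: three encounters yielding 3, 1, and 2 distances.
example = c_hat2(["pass-1", "pass-1", "pass-1", "pass-2", "pass-3", "pass-3"])
```

Because the value depends only on the data, it would be computed once and reused for every model in the candidate set.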
4.2.2 Simulations
We conducted simulations where non-independent observations were all at the same distance so that we could evaluate performance where the true magnitude of overdispersion (c), and the true underlying model were known, but we would not expect this scenario to arise in practice. When non-independent observations during a single independent encounter are at different distances (e.g. to different members of a group, different cues from a moving animal, or as an animal moves past a CT), true c is
unknown because the different distance observations contribute information about the shape of the detection function. We therefore also simulated camera-trapping (CT) surveys of moving animals where cameras recorded video and distance was recorded every 2 seconds as animals moved through the field of view. These simulations mimic real surveys where animals move and c is unknown. Furthermore, the distribution of observed distances differed from the expected distribution of independent detections (see Results), so the true underlying model was also unknown.
For the simulations with known c, we sampled distance to animals within a circular point transect with radius 20 m, where the true density was 2.00 per m². To generate independent DS data, we simulated detections via random trials where detection probability declined according to a half-normal function with scale parameter (σ) of 7. Each observation was arbitrarily assigned one of three levels of a categorical covariate that had no effect on detectability, which we will refer to as “observer”. We then replicated each observation five times to generate overdispersed data with c = 6. We fitted eight point transect DS models to each dataset, including the half-normal model used to generate the data, and overparameterised models.
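The data-generating step for the known-c scenario can be sketched as below. Several details are assumptions made for illustration only: the number of animals is fixed at the rounded product of density and area, animals are placed uniformly within the circle, observer labels are "A", "B", and "C", and "replicated five times" is read as keeping the original observation plus five copies (six records per detection, consistent with c = 6).

```python
import math
import random


def simulate_known_c(radius=20.0, density=2.0, sigma=7.0, n_copies=6, seed=1):
    """Simulate point transect detections, then duplicate them to induce overdispersion."""
    rng = random.Random(seed)
    n_animals = round(density * math.pi * radius ** 2)  # assumed fixed, not random
    independent = []
    for _ in range(n_animals):
        # Uniform location in a circle: radial distance is radius * sqrt(U).
        r = radius * math.sqrt(rng.random())
        # Half-normal detection probability g(r) = exp(-r^2 / (2 sigma^2)).
        if rng.random() < math.exp(-r * r / (2.0 * sigma * sigma)):
            observer = rng.choice(("A", "B", "C"))  # covariate with no effect
            independent.append((r, observer))
    # Each detection appears n_copies times at exactly the same distance.
    overdispersed = [obs for obs in independent for _ in range(n_copies)]
    return independent, overdispersed


independent, overdispersed = simulate_known_c()
```

Under these settings the average detection probability within the circle is about 0.24, so roughly 600 independent detections are expected per simulated dataset before replication.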
For the CT surveys, we simulated sampling of ungulates inhabiting old growth forests, recently logged forests, and previously logged but regrowing forests.
Simulation parameters were based on the survey of Maxwell’s duikers described in Chapter 3 and by Howe et al. (2017a), but were also selected to ensure that data were overdispersed, not sparse, and included multiple potential covariates of detectability. We assumed that the density of understorey vegetation increased immediately after logging and decreased gradually as forests regrew, such that food supply and therefore animal density was highest, but detection probability as a function of distance was
lowest, in recently logged forests; we further assumed a larger difference in detection probability between old growth and logged forests than between recently logged and regrowing forests (Table 4.1).
Table 4.1. Animal densities (D) and scale parameters (σ) of a half normal detection probability function in different habitat types used to generate simulated distance sampling data.

Forest type        D        σ
Old growth         10       7.0
Regrowing          12       5.5
Recently logged    15       5.0
Mean               12.33
We simulated movements of 10, 12, and 15 animals within 1 km by 1 km square study areas in old growth, regrowing, and recently logged habitats, respectively. Each animal started with a random initial location and heading, after which new locations were generated every two seconds for 12 hours. Step lengths were drawn from an exponential distribution with a rate parameter of 2 m⁻¹ (mean step length 0.5 m), and turn angles were drawn from a normal distribution with mean of 0 and standard deviation of 0.05 radians (we used the exponential distribution rather than the lognormal distribution used in Chapter 3 to ensure there would be many small step lengths and therefore many observations at the same distance so that data would be severely overdispersed). Animals that moved beyond the boundaries of the study areas reappeared immediately on the opposite side of the same study area at the same heading. We simulated sampling at a grid of 36 CTs at 150 m spacing within each study area. We defined the zone of potential detection by a CT as a sector with a central angle of 0.733 radians and a radius of 25 m, and recorded distances between CTs and animal locations that fell within these sectors. We initially conducted random trials assuming detection probability declined according to a half-normal function with scale parameters as in Table 4.1 to determine whether animals
were detected at each time step. However, we assumed that cameras were programmed to record video when triggered, so once an animal had been detected within a sector we set the probability of subsequent detection to 1.0 for as long as the animal remained within the sector. Therefore, the observed distances were those recorded within the sector defined by the location and angle of view of the CT, at predetermined snapshot moments after initial detection, following Howe et al. (2017a). With this movement model and sampling scenario, each animal was expected to travel 10.8 km per 12-hour day (21,600 two-second steps with a mean length of 0.5 m). Most step lengths were between 0 and 0.5 m, which ensured that animals would be observed multiple times, including at similar distances, during a single independent encounter with a CT, and hence distance data would be severely overdispersed. Density remained constant, and the expected distribution of animal locations was uniform within the study areas.
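The movement model and the detection sector can be sketched as follows. This is a simplified illustration under stated assumptions: the function names `simulate_track` and `in_sector` are hypothetical, the camera's facing direction is a free parameter because it is not specified above, and the random detection trials are omitted.

```python
import math
import random


def simulate_track(n_steps=21600, rate=2.0, turn_sd=0.05, size=1000.0, seed=1):
    """Correlated random walk on a square area with wrap-around boundaries.

    n_steps : 12 hours of 2-second steps = 21,600 steps
    rate    : exponential rate for step length (per metre), so the mean step is 0.5 m
    turn_sd : standard deviation of the normal turn angle (radians)
    """
    rng = random.Random(seed)
    x, y = rng.uniform(0.0, size), rng.uniform(0.0, size)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    track, total = [(x, y)], 0.0
    for _ in range(n_steps):
        heading += rng.gauss(0.0, turn_sd)
        step = rng.expovariate(rate)
        # The modulo re-enters the animal on the opposite side, heading unchanged.
        x = (x + step * math.cos(heading)) % size
        y = (y + step * math.sin(heading)) % size
        total += step
        track.append((x, y))
    return track, total


def in_sector(x, y, cam_x, cam_y, facing, central_angle=0.733, radius=25.0):
    """True if (x, y) lies inside the camera's sector of potential detection."""
    dx, dy = x - cam_x, y - cam_y
    if math.hypot(dx, dy) > radius:
        return False
    # Smallest signed angle between the bearing to the animal and the facing direction.
    diff = (math.atan2(dy, dx) - facing + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff) <= central_angle / 2.0


track, total_distance = simulate_track()
```

The total distance travelled in one simulated 12-hour period should fall close to 21,600 × 0.5 m = 10.8 km, which provides a simple check on the movement parameters.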
We conducted conventional distance sampling (CDS) analyses of data from recently logged forests, where only the key function and number of adjustment terms varied among six candidate models.
We also analysed data from all three habitat types simultaneously using multiple covariate distance sampling. Different habitat types were treated as different strata, with the potential to estimate a common detection function across all strata, or to model