A simulation example - Correcting the effect of measurement error: the PDF approach

3.3 Correcting the effect of measurement error: the PDF approach

3.3.1.4 A simulation example

The performance of the proposed method was assessed by simulation. A simulated population of known size was surveyed, the true distances to detected animals were contaminated with errors of known structure, and the expected bias (and the corresponding correction factor) was obtained. Then, using a subset of the data, the correction was estimated for each data set, and the corrected estimates were compared with those that ignored the errors.

A population of 10000 animals was randomly generated (uniform coordinates in both dimensions) on a square with side 1000 meters (D = 100 animals/hectare). The study area was divided into 25 non-overlapping squares, and in each of these squares a transect of 200 meters was randomly selected. In each row of squares a transect was randomly generated for the first square, and in the subsequent squares it was systematically placed with respect to the first one (see Figure 3.3). At the analysis stage, a truncation distance of 10 meters was used. To avoid edge effects, no transects were placed at less than 10 meters from the edge of the square. For every animal, a rejection method was used to decide whether it was detected, based on a half-normal detection function (σ = 5). This process was repeated 100 times, resulting in 100 independent data sets. The average number of animals detected in each realization of the process was 593 (standard deviation 20.1). The data generated were considered to be error free.
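The data-generating process described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `simulate_survey`, the choice of transects running parallel to the y axis across the full height of each square, and σ being in meters are all assumptions made for the example.

```python
import numpy as np

def simulate_survey(rng, n_animals=10_000, side=1000.0, n_cells=5,
                    w=10.0, sigma=5.0):
    """Return a list of 25 arrays of detected perpendicular distances."""
    cell = side / n_cells                      # 200 m squares
    xy = rng.uniform(0.0, side, size=(n_animals, 2))
    distances = []
    for row in range(n_cells):
        # Assumption: one random offset per row, reused systematically in
        # every square of that row, kept at least w from the square edges.
        offset = rng.uniform(w, cell - w)
        in_row = (xy[:, 1] >= row * cell) & (xy[:, 1] < (row + 1) * cell)
        for col in range(n_cells):
            # Perpendicular distance to the transect at x = col*cell + offset
            d = np.abs(xy[in_row, 0] - (col * cell + offset))
            d = d[d <= w]                      # truncation at w = 10 m
            # Rejection step: keep with half-normal detection probability
            p = np.exp(-d**2 / (2.0 * sigma**2))
            distances.append(d[rng.uniform(size=d.size) < p])
    return distances

rng = np.random.default_rng(1)
per_transect = simulate_survey(rng)
n_detected = sum(len(d) for d in per_transect)
```

Under these assumptions the expected number of detections is about 598 per realization, in line with the average of 593 reported above.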

Figure 3.3: Schematic of the study area considered in the simulation study to assess the performance of the PDF approach, with an example realization of the transects and animals (axes: x coordinate and y coordinate, each from 0 to 1000). The survey design consisted of random placement of a transect in the first square of each row followed by systematic placement (with respect to the first square) of the remaining transects in each row.

I then generated errors with the following distributions: beta(1,1), beta(3,2) and beta(5,5). The choice of these particular models was arbitrary, but had a rationale. The beta(1,1) was used as an extreme case; beta(5,5) as a case in which estimation of distances is unbiased but density estimation is biased; and beta(3,2) as a case in which estimation of distances is biased but estimation of density should nonetheless be unbiased. For each error-free distance in these data sets, errors were independently generated and introduced as postulated above (for models I and II), leading to five contaminated sets. (The case of beta(1,1) for model II was not implemented, since Kl would be infinite.)
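The definitions of models I and II are given earlier in the source and are not reproduced in this section. As a hedged illustration only, the sketch below assumes a multiplicative error R with Kl = E[1/R], where R = B + 0.5 under model I and R = 2B under model II, with B following the stated beta distribution. This parameterization is an assumption inferred for the example; the helper names are hypothetical. Under it, the numerical values agree with the true Kl column (TK) of Table 3.1, and the beta(1,1), model II case diverges, as noted above.

```python
import math
import numpy as np

def beta_pdf(x, a, b):
    """Density of a beta(a, b) distribution on (0, 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1.0 - x) ** (b - 1)

def expected_inverse_r(a, b, shift, scale, n=200_000):
    """Midpoint-rule approximation of E[1/R] for R = shift + scale * B."""
    x = (np.arange(n) + 0.5) / n
    r = shift + scale * x
    return float(np.mean(beta_pdf(x, a, b) / r))

# Assumed parameterizations (not stated in this section)
MODEL_I = dict(shift=0.5, scale=1.0)    # R = B + 0.5
MODEL_II = dict(shift=0.0, scale=2.0)   # R = 2B

k_values = {
    "beta(1,1), I": expected_inverse_r(1, 1, **MODEL_I),
    "beta(5,5), I": expected_inverse_r(5, 5, **MODEL_I),
    "beta(5,5), II": expected_inverse_r(5, 5, **MODEL_II),
    "beta(3,2), I": expected_inverse_r(3, 2, **MODEL_I),
    "beta(3,2), II": expected_inverse_r(3, 2, **MODEL_II),
}
```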

To preclude analyst influence, the data sets were analyzed using a conventional analysis in Distance 4 (Thomas et al., 2002), as follows. The models considered for the detection function were half-normal + cosine (HNcos), uniform + cosine (UNIcos) and hazard rate + simple polynomial (HRpol), and the one with the lowest AIC was selected. The variance for encounter rate was calculated analytically based on replicate lines. In the analysis of contaminated data, the largest 5% of distances were truncated, as otherwise some models required several adjustment terms to provide an adequate fit to the data.

The analysis of the error-free data led to an average estimated density of 98.6 animals per hectare, with a standard error of 0.65. The actual coverage for the 95% CI was 93%.
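Coverage here is the proportion of simulated data sets whose 95% confidence interval contains the true density. The sketch below shows one way to compute it, assuming the log-normal interval commonly used in distance sampling; the interval type is not stated in this section, and the function names are hypothetical.

```python
import math

def lognormal_ci(d_hat, cv, z=1.96):
    """Log-normal interval (d_hat / c, d_hat * c) for a density estimate."""
    c = math.exp(z * math.sqrt(math.log(1.0 + cv**2)))
    return d_hat / c, d_hat * c

def coverage(estimates, cvs, truth):
    """Fraction of intervals that contain the true value."""
    hits = 0
    for d_hat, cv in zip(estimates, cvs):
        lo, hi = lognormal_ci(d_hat, cv)
        hits += lo <= truth <= hi
    return hits / len(estimates)
```

For example, `coverage(estimates, cvs, truth=100.0)` applied to the 100 density estimates and their coefficients of variation would give the empirical coverage for the true density of 100 animals/ha.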

For the contaminated data sets, only 23 transects were used to estimate density, and the remaining 2 were used as a separate experiment in which both true and contaminated distances were evaluated. This resulted on average in 516.2 (standard deviation 19.4) observations to estimate density and 49.6 (standard deviation 7.1) observations to estimate Kl. Kl was estimated using the harmonic mean estimator on the sample of R's resulting from the 2 transects, as well as based on the true beta model used to generate the errors, by maximizing the appropriate likelihood and then evaluating equations 3.16 or 3.19 by substitution of the true parameter values with their corresponding maximum likelihood estimates. The variance of the Kl estimates, var(Kl), was obtained by bootstrap (999 resamples). The variance of the corrected estimator, considering the nonparametric estimator for Kl, was then obtained by combining the variances of D̂y and K̂l with the delta method, using expression 3.14.
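The estimation steps above can be sketched as follows. Expression 3.14 and the exact form of the estimators are not reproduced in this section, so the sketch rests on labeled assumptions: the harmonic mean estimator is taken to be the sample mean of 1/R, the corrected density is taken to be D̂y / K̂l, and the delta method is applied in its usual first-order form for a ratio of independent estimators. All function names are hypothetical.

```python
import numpy as np

def estimate_k(r):
    """Assumed nonparametric estimator: sample mean of 1/R."""
    return float(np.mean(1.0 / np.asarray(r, dtype=float)))

def bootstrap_var_k(r, rng, n_boot=999):
    """Bootstrap variance of the estimator (999 resamples by default)."""
    r = np.asarray(r, dtype=float)
    idx = rng.integers(0, r.size, size=(n_boot, r.size))
    return float(np.var(np.mean(1.0 / r[idx], axis=1), ddof=1))

def corrected_density(d_y, var_d_y, k_hat, var_k):
    """Assumed correction d_y / k_hat with a first-order delta-method variance."""
    d_c = d_y / k_hat
    cv2 = var_d_y / d_y**2 + var_k / k_hat**2   # squared CVs add
    return d_c, d_c**2 * cv2
```

As a rough check against Table 3.1, dividing the mean uncorrected density for beta(1,1), model I (106.4) by the mean harmonic-mean estimate (1.105) gives about 96.3, close to the reported mean corrected density of 96.5; the two need not match exactly because a mean of ratios differs from a ratio of means.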

Figure 3.4: Results of the simulation exercise to evaluate the PDF approach to deal with measurement error. Error-based density estimates (lighter histograms) and the corrected estimates (darker histograms) using the harmonic mean estimator. a) True distances; b) Error beta(1,1), model I; c) Error beta(5,5), model I; d) Error beta(5,5), model II; e) Error beta(3,2), model I; f) Error beta(3,2), model II. Dashed line - mean value of estimates based on true distances. Dashed-dotted line - mean value of estimates based on error distances. Long dashed line - mean value of estimates based

Table 3.1: Results of the simulation exercise to evaluate the PDF approach to deal with measurement error. True Kl (TK), mean estimated Kl using the harmonic mean (EHK) or the true beta model (EBK), and mean observed Kl (OK), under the combinations of errors and models (I - model I, II - model II) considered. Mean estimated density (D), density coefficient of variation (DCV) and coverage of 95% CI for density (95%CIC), respectively for the corrected and uncorrected estimator. Mean estimated density based on true distances is 98.6 animals/ha. True density is 100 animals/ha.

| Error, model  | TK    | EHK   | EBK   | OK    | D¹         | DCV¹       | 95%CIC¹ |
|---------------|-------|-------|-------|-------|------------|------------|---------|
| beta(1,1), I  | 1.099 | 1.105 | 1.104 | 1.078 | 96.5/106.4 | 7.46/5.83  | 89/81   |
| beta(5,5), I  | 1.024 | 1.021 | 1.021 | 1.030 | 99.4/101.5 | 6.76/6.76  | 95/94   |
| beta(5,5), II | 1.125 | 1.126 | 1.125 | 1.096 | 96.4/108.0 | 8.69/6.17  | 89/78   |
| beta(3,2), I  | 0.944 | 0.943 | 0.943 | 0.941 | 98.5/92.8  | 6.83/6.13  | 94/75   |
| beta(3,2), II | 1.000 | 1.008 | 1.007 | 0.952 | 94.2/93.9  | 11.04/6.68 | 90/75   |

¹ Corrected analysis/uncorrected analysis

and error, along with the corresponding corrected and uncorrected mean density estimates, are presented in table 3.1. Also shown is the coverage of the 95% confidence interval, based both on corrected and uncorrected analysis.

The nonparametric estimator for Kl and the parametric beta-based estimator gave virtually identical mean estimates (EHK and EBK in Table 3.1), justifying the use of the nonparametric estimator when the true model is unknown. Coverage increased with the use of the proposed correction in all cases. Figure 3.4 shows the uncorrected (i.e. error-based) density estimates and the corrected density estimates using the harmonic mean estimator, showing that the correction reduced the bias in most cases. The results were very close to the expected ones, validating the predictions of the effects of errors and the proposed correction. However, in some cases, true Kl and observed Kl were slightly different. In particular, in the case of beta(3,2) for model II, a Kl of 1 was expected but an average Kl of 0.952 was obtained. These unexpected results will be considered in the discussion. Note that even in this case coverage was increased, due to an increased

In document Incorporating measurement error and density gradients in distance sampling surveys (Page 78-83)