Quantifying observer heterogeneity in bird counts

Published in Australian Journal of Ecology (1999) 24, 270-277

Authors: Cunningham, R.B., Lindenmayer, D.B., Nix, H.A. and Lindenmayer, B.D.

27

INTRODUCTION

A substantial literature shows that counts of birds can vary considerably as a result of the observation methods employed (e.g. Loyn 1980; Recher 1984, 1988; Ralph et al.1997) as well as variations in the area and/or time searched using a given method (Er et al.

1995). It is also well known that counts of birds can be strongly affected by levels of observer skill (e.g. Morin & Conant 1994; van der Meer & Camphuysen 1996). For example, sea-bird surveys in the North Sea that did and did not account for observer effects resulted in estimated population sizes of fulmars (Fulmarus glacialis) of 3.5 million and 1.8 million, respectively (van der Meer & Camphuysen 1996). Failure to quantify and incorporate observer effects can lead to controversy over the interpretation of data gathered by multiple observers (Link & Saur 1997 cf. McCulloch et al.1997) and potentially important findings can be masked by increased variance. This can limit the value of such data (Link & Sauer 1997). For example, ignoring observer effects may lead to ques- tionable conclusions about the magnitude of changes

in populations of birds (James et al.1996; Link & Sauer 1997) resulting in poor management decisions and the inappropriate allocation of scarce conservation resources.

In studies where multiple observers are required, it is often not possible to account for observer differences by statistical modelling. This introduces an additional source of variability into observations. It follows that knowledge of the magnitude of observer effects is important. It provides a basis for designing field studies so that a reduction in error, and hence an increase in power, can be achieved. Indeed, such work may become increasingly important as contributions from multiple observers (e.g. various atlases of distribution records: Blakers et al.1984; Saunders & Ingram 1995) are being used more frequently in assessing large-scale effects and long-term trends in bird populations (e.g. James et al.1996).

Relatively few Australian studies have attempted to quantify the magnitude of observer effects (but see Kavanagh & Recher 1983; Pyke & Recher 1985). In this study we present the findings of a field-based experiment designed to quantify observer effects on bird counts in native eucalypt and exotic softwood forests of southeastern Australia. Rather than examine Australian Journal of Ecology(1999) 24, 270–277

Quantifying observer heterogeneity in bird counts

R.B. CUNNINGHAM,

_{* D. B. LINDENMAYER,}

_{H. A. NIX}

_{AND B. D. LINDENMAYER}

_{Statistical Consulting Unit of The Graduate School,}

_{Centre for Resource and Environmental Studies}

and Department of Geography,

_{Centre for Resource and Environmental Studies, The Australian}

National University, Canberra, ACT 0200,

17 Monkman Street, Chapman, ACT 2611, Australia

Abstract An essential pilot study was designed to quantify observer heterogeneity and to compare observation methods for the detectability of forest birds in stands of Eucalyptusand Pinus radiataforest as a basis for a major research project on habitat fragmentation near Tumut, southern New South Wales. Twelve experienced observers participated in the investigation. Point interval counts, zig-zag walks and strip transects were used to count birds in both eucalypt and pine forests. The 65 species of birds recorded in the study were assigned to one of nine groups classified by a set of attributes that characterized bird detection by field observers (e.g. body size, colour and calling patterns). Observer heterogeneity varied between groups of birds and was most apparent for small birds foraging in low shrubs (species such as the white-browed scrub wren, assigned to group 2), frequent calling, active birds (species such as the golden whistler, assigned to group 7), and midstorey, undercanopy foragers with distinctive behaviour (species such as the grey fantail assigned to group 4). For bird groups 2, 4 and 7, additional variability due to observer differences resulted in an average increase of ∼40% in the width of a 95% confidence interval for the logarithm of bird abundance generated from a 20 minute count. Our analysis shows that taking the average of counts obtained by two or more observers would negate the increase in variance of counts due to observer heterogeneity. Few differences between methods of field observation were found. However, for frequent calling, active birds (group 7) there was evidence that more birds were heard using the point interval count method. Our study clearly demonstrated a need to either control for observer differences or to assign at least two observers to individual sites when designing bird surveys for comparative studies. Failure to do so will result in a decrease in precision of bird counts.

Key words: bird detectability, counting methods, forest birds, observer heterogeneity, southeastern Australia.

*Corresponding author.

Accepted for publication October 1998.

particular species, our focus is on broad groups of birds that share similar attributes relating to detectability (e.g. size, colour and voice characteristics). Thus, we identify those groups for which observer heterogeneity is important and the extent of such effects. In addition, we contrast counts using different observation methods [point interval count (sensuPyke & Recher 1983), zig-zag walk and straight transect] in stands of Eucalyptus and Pinus radiata forest. Some implications of our findings for bird surveys are briefly discussed.

METHODS

Study area

The field experiment was completed in late October 1996 in the Buccleuch State Forest near Tumut, southern New South Wales (148°409E, 35°109S). Twelve sites were selected; six in a radiata pine (Pinus radiata) plantation and six in Eucalyptusforest domi- nated by 30–40 m tall manna gum (Eucalyptus vimi- nalis) and narrow-leaved peppermint (Eucalyptus radiata). The areas of eucalypt forest were typically a mixture of large old trees and younger (30–50 years old) regrowth stems with an understorey comprised of

Bedfordia arborescens, Acaciaspp. trees, Cassiniaspp., and a range of other shrub species. The six sites in

P. radiataforest were 10–15 years old and had yet to be thinned. The dominant trees in these stands were ,20 m tall and supported little or no ground or shrub- storey cover except for occasional Cassiniaspp. stems. Detailed vegetation surveys completed as part of another major investigation in the study area (Lindenmayer et al. 1999) indicated there was considerable similarity in vegetation structure and plant species composition across the different eucalypt sites, and across the six P. radiatasites. Relatively uniform vegetation on the P. radiataand eucalypt sites, respectively, helped reduce between-site variations in the range of bird species likely to occur in the two forest types. Logistics dictated that all 12 sites used in our experiments be located within an area of ,9 km2_{. This}

facilitated the movement of observers between study sites and minimized asynchrony of sampling times.

Site preparation and field counting protocol

Each of the 12 sites commenced at the edge of a gravel road and followed a set compass bearing into the forest. The sites were prepared for counting by setting out a 400 m line of coloured flagging tape which guided observers along the strip transect. A total of four transect counts of equal length was completed during each counting event; 0–100 m, 100–200 m, 200–300 m, and 300–400 m. Five minutes were allocated for counting birds along each of the four 100 m segments. Flagging

tape of a different colour was used to mark the 50 m, 100 m, 150 m, 200 m, 250 m, 300 m, 350 m and 400 m points along each flagged transect. The 100 m, 200 m, 300 m and 400 m stations were the locations for the point interval counts. Birds were counted for five minutes at each location for the point interval counts. Two additional 400 m long lines of flagging tape were set out 50 m either side of the 400 m long main transect line. These additional flagged lines marked the boundaries of each survey site and the limits of the area within which the zig-zag walk was employed. A total of four counts using the zig-zag walk method was used to record birds on a given site. These were of equal length and took place around the 0–100 m, 100– 200 m, 200–300 m, and 300–400 m sections of the flagged line. For each 100 m segment of the zig-zag walk, observers tracked a 100 m long path from the 0 m point on one of the flagged boundary lines to the 100 m point on the opposite boundary line passing through the marked 50 m station on the main transect. This procedure was repeated as observers passed through the 150 m point on the main transect as they traversed from the 100 m point to the 200 m point on the opposite boundary line. Five minutes was allocated to count birds within each 100 m segment of the zig- zag walk. Thus, the time spent counting birds (20 min) was identical for the strip transect, point interval count and the zig-zag walk.

Experimental design

Twelve experienced observers from the Canberra Ornithologists Group participated in this study. Each observer surveyed three of the six sites in P. radiata

forest and three of the six sites in Eucalyptusforest. A given observer completed three surveys in a given forest type before switching to the other type of vegetation. Each observer used a different observation method for each of the three P. radiatasites and each of the three Eucalyptus sites. Thus, each observer employed each bird observation method twice; once in

P. radiata forest and once in Eucalyptusforest. Each observer completed a survey of one site on any given day, requiring six days to complete data collection. Assignment of observers to sites, day and method was achieved by use of a series of Graeco–Latin squares (Cochran & Cox 1956). This ensured that within the groups of three observers and three sites over three days, all effects were balanced with respect to each other. All surveys were completed between 1430 and 1600 h and were restricted to warm, humid afternoons characterized by sunshine and intermittent cloud cover. Afternoons with similar climate conditions were surveyed to reduce random effects between sampling days due to large differences in weather conditions (Slater 1994). Bad weather forced cancellation of field- work scheduled for Day 5. Because the same group of

O B S E RV E R H E T E RO G E N E I T Y I N B I R D C O U N T S 2 7 1

30

observers were available for a limited time, this meant the design was incomplete resulting in some confounding between observers, sites and method effects. However, this incomplete design still has high statistical efficiency and confounding between fixed and random factors was low (< 10%). Statistical efficiency is measured by comparing the variance of the incomplete design with that of the full design (Mead 1988).

Observers recorded the numbers of each species of bird detected by sight and/or hearing for their assigned site and method.

Preliminary data analysis and statistical methods

A total of 65 species of birds was recorded in the experiment. In this study we focused on the ‘detectability’ of birds, rather than the more traditional functional or guild-type attributes. The seven attributes that we considered likely to influence the chances of a bird being detected by sight or sound were: body size, the qual- ity and distinctiveness of calling, loudness of calling, frequency of calling, colour of plumage, behavioural patterns, and foraging height. Expert assessment rated each of these detectability attributes on a scale 1, 2 or 3. A final classification comprized of nine groups of birds was produced by examination of the two-dimen- sional configuration of points representing the 65 species derived by a Principal Component Analysis of the seven detectability scores together with expert opinion. General attributes of each of these groups, as well as a few of the taxa representative of each one, are set out in Table 1.

As multiplicative effects seem more plausible than additive effects in our experiment, the response vari- able for statistical analyses was the logarithm of the aggregate abundance of birds counted in each of the nine groups. Observer effects were considered to have been randomly drawn from a pool of expert observers. That is, observer differences were treated as contributing to the variance of counts rather than as a bias in the mean count. Assessment of the significance or otherwise of observer randomeffects involved estimat- ing the observer component of variance and calculat- ing a likelihood-ratio-based test statistic (Robinson 1991). An analysis of variance table showing the structure of variance decomposition (i.e. the model) for group 7 birds detected by hearing is given in Table 2. Inferences relating to forest type effects are based on estimates of site variation within the two broad types of forest in our experiment so caution is needed in inter- preting the results. However, this lack of effective repli- cation is not so important here because an assessment of these effects is incidental in this study. Estimates of variance components and fixed and random effects were obtained by a general method of estimation known as restricted maximum likelihood (REML) (Robinson 1991). Patterns of observer variability across groups were explored by Principal Components Analysis of observer effects.

Examination of residuals following an initial analysis provided a check of the compatibility of the model and data. Counts were low for some bird groups, and in these cases the results have not been reported as the usual distributional assumptions

2 7 2 R . B. C U N N I N G H A M E T A L.

Table 1.Groups of birds and their distinctive features Group

number Description Examples

1 Small colourful birds with few calls Scarlet robin, flame robin.

2 Small plain coloured birds foraging in the low shrubs layer Brown thornbill, white-browed scrub-wren, silvereye. 3 Birds with more cryptic behaviour and/or faint infrequent Striated thornbill, spotted quail-thrush, red-browed

calls that may be difficult to recognise treecreeper, olive whistler.

4 Birds with distinctive behaviour that are mid-storey to Grey fantail, rufous fantail, satin flycatcher,

undercanopy foragers leaden flycatcher.

5 Large, very active and visible birds with a loud call Australian magpie, pied currawong, crimson rosella, yellow-tailed black cockatoo, sulphur-crested cockatoo.

6 Frequent calling birds with loud and distinctive calls Willie wagtail, grey shrike thrush, blackbird, noisy friarbird.

7 Frequent calling birds with active movements Spotted pardalote, striated pardalote, white-throated treecreeper, golden whistler, rufous whistler. 8 Birds with penetrating calls but cryptic and/or hard to see Brush cuckoo, fan-tailed cuckoo, shining bronze

cuckoo, eastern whipbird, eastern yellow robin. 9 Large birds with distinctive calls Red wattlebird, black-faced cuckoo-shrike, grey

butcherbird, satin bowerbird, laughing kookaburra.

31

necessary for valid statistical inference are unlikely to be met.

RESULTS

Hearing data

Considerable information was gathered for detections of birds by call but sight records were substantially more limited. Therefore, most of the results outlined below relate to detection by call.

Observer components of variance (a measure of heterogeneity) were significantly (P< 0.05) different from zero for groups 2, 4, 7 and 9, but not for groups 3, 5 and 8. For groups 1 and 6, there was some evidence of observer heterogeneity although this was not statistically significant (P< 0.10). Variance components and associated change-in-deviance statistics are given in Table 3.

For a given site on a given day the variance of the logarithm of the count of the number of birds heard in 20 min is: s2₁_s2

0, where s2 and s20are estimates of the

residual ‘observation’ variance and the observer variance, respectively. Under the assumed model that

observer effects are Normal with variance s2 0, an

approximate 95% confidence interval (CI) is given by:

623√(s2₁_s2 0).

With the exception of group 9, for bird groups for which there was significant observer heterogeneity, the observer component of variance was approximately the same magnitude as the inherent observation variance (Table 3). Thus, observer heterogeneity increased the width of the confidence interval of the logarithm of the number of birds heard by ,40%. For group 9 this will be conservative. It follows that if two observers were to count birds on a given site, the variance of the average of the log counts of the two observers will be roughly equivalent to that obtained by a single observer, if there were no observer differences.

It should be noted that observation error is ,45% of the mean for groups 1, 4 and 7 and around 85% of the mean for groups 2, 5, 6 and 8 (Table 3). Thus, relative error remains reasonably large even if two observers are used. To increase the precision of counts, more than two observers are required. Day and site components of variance are consistently small relative to observation variance (Table 3).

A scatterplot matrix in Fig. 1 contains pairwise

O B S E RV E R H E T E RO G E N E I T Y I N B I R D C O U N T S 2 7 3

Table 2.Analysis of variance table for birds allocated to group 7 and detected by hearing

Source of variation d.f. Sum of squares Mean square Variance ratio

Site stratum

Forest type 1 7.599 7.599 5.55

Residual 10 13.695 1.369 7.69

Day stratum 4 1.284 0.321 1.81

Observer stratum 11 6.019 0.547 3.07

Site3day3observer stratum

Method 2 4.037 2.019 11.41 (P< 0.001)

Forest type3method 2 1.600 0.780 4.41 (P= 0.022)

Residual 28(1) 4.952 0.178

Table 3.Estimates of observer variance components, associated change-in-deviance statistics and P-values, based on a χ2 (1 d.f.) distribution for log bird counts

Observer variance Change-in-deviance

Bird group component statistic P-value Residual Day Site Mean [log(counts + 1)]

1 0.030 3.1 0.078 0.118 0.003 0.000 0.22 2 0.44 14.0 <0.001 0.338 0.000 0.000 1.72 3 0.0* 0.0 1.0 0.250 0.042 0.036 0.26 4 0.156 7.7 0.006 0.164 0.022 0.117 0.9 5 0.0* 0.4 0.53 0.341 0.009 0.027 1.46 6 0.144 2.7 0.10 0.402 0.000 0.000 1.29 7 0.123 8.7 0.003 0.166 0.013 0.111 2.01 8 0.0* 0.0 1.00 0.388 0.000 0.004 1.07 9 0.087 6.7 0.01 0.170 0.000 0.062 0.56

*Groups for which there were a large number of zeros and the distribution of the data may breach assumptions needed for valid inferences from statistical analyses.

Also included are estimates of variance components for the other random terms in the model and means of log (bird abundance).

graphs of observer effects for all groups except groups 3, 5 and 8 for which observer heterogeneity was not statistically significant. A high value indicated that a given observer heard more birds than the average, while a low value corresponded to a lower than average count by an observer for a given group. These graphs facilitate an analysis of our sample of observers. For example, observer 6 produced low counts for all groups, but particularly bird groups 6, 7 and 9. Conversely, observer 9 produced high counts for all groups. However, such consistent patterns were not obtained for all observers; 11 typically recorded average counts for group 9 birds but low ones for group 7. Observers 1, 3 and 7 produced consistently similar counts for all groups (Fig. 1).

The above analysis allows assessment of pairwise patterns only. Given this, observer effects were subject to Principal Components Analysis of observer effects. This examined patterns of observer heterogeneity by considering all groups simultaneously. The first principal component accounted for 78.1% of the variation between the observers and the second 9.3%. The bird groups contributing most to the first vector were 2, 4, 6 and 7. A contrast between a high count on group 4 (and to a lesser extent group 6) and a low count on

In document Design and analysis of quasi-experiments in landscape ecology: responses of fauna to landscape vegetation transformation in South-Eastern NSW (Page 37-47)