Quantifying observer heterogeneity in bird counts
Published in Australian Journal of Ecology (1999) 24, 270-277
Authors: Cunningham, R.B., Lindenmayer, D.B., Nix, H.A. and Lindenmayer, B.D.
27
INTRODUCTION
A substantial literature shows that counts of birds can vary considerably as a result of the observation methods employed (e.g. Loyn 1980; Recher 1984, 1988; Ralph et al.1997) as well as variations in the area and/or time searched using a given method (Er et al.
1995). It is also well known that counts of birds can be strongly affected by levels of observer skill (e.g. Morin & Conant 1994; van der Meer & Camphuysen 1996). For example, sea-bird surveys in the North Sea that did and did not account for observer effects resulted in estimated population sizes of fulmars (Fulmarus glacialis) of 3.5 million and 1.8 million, respectively (van der Meer & Camphuysen 1996). Failure to quantify and incorporate observer effects can lead to controversy over the interpretation of data gathered by multiple observers (Link & Saur 1997 cf. McCulloch et al.1997) and potentially important find- ings can be masked by increased variance. This can limit the value of such data (Link & Sauer 1997). For example, ignoring observer effects may lead to ques- tionable conclusions about the magnitude of changes
in populations of birds (James et al.1996; Link & Sauer 1997) resulting in poor management decisions and the inappropriate allocation of scarce conservation resources.
In studies where multiple observers are required, it is often not possible to account for observer differences by statistical modelling. This introduces an additional source of variability into observations. It follows that knowledge of the magnitude of observer effects is important. It provides a basis for designing field studies so that a reduction in error, and hence an increase in power, can be achieved. Indeed, such work may become increasingly important as contributions from multiple observers (e.g. various atlases of distribu- tion records: Blakers et al.1984; Saunders & Ingram 1995) are being used more frequently in assessing large-scale effects and long-term trends in bird popu- lations (e.g. James et al.1996).
Relatively few Australian studies have attempted to quantify the magnitude of observer effects (but see Kavanagh & Recher 1983; Pyke & Recher 1985). In this study we present the findings of a field-based experiment designed to quantify observer effects on bird counts in native eucalypt and exotic softwood forests of southeastern Australia. Rather than examine Australian Journal of Ecology(1999) 24, 270–277
Quantifying observer heterogeneity in bird counts
R.B. CUNNINGHAM,
1* D. B. LINDENMAYER,
2H. A. NIX
3AND B. D. LINDENMAYER
41
Statistical Consulting Unit of The Graduate School,
2Centre for Resource and Environmental Studies
and Department of Geography,
3Centre for Resource and Environmental Studies, The Australian
National University, Canberra, ACT 0200,
417 Monkman Street, Chapman, ACT 2611, Australia
Abstract An essential pilot study was designed to quantify observer heterogeneity and to compare observation methods for the detectability of forest birds in stands of Eucalyptusand Pinus radiataforest as a basis for a major research project on habitat fragmentation near Tumut, southern New South Wales. Twelve experienced observers participated in the investigation. Point interval counts, zig-zag walks and strip transects were used to count birds in both eucalypt and pine forests. The 65 species of birds recorded in the study were assigned to one of nine groups classified by a set of attributes that characterized bird detection by field observers (e.g. body size, colour and call- ing patterns). Observer heterogeneity varied between groups of birds and was most apparent for small birds fora- ging in low shrubs (species such as the white-browed scrub wren, assigned to group 2), frequent calling, active birds (species such as the golden whistler, assigned to group 7), and midstorey, undercanopy foragers with dis- tinctive behaviour (species such as the grey fantail assigned to group 4). For bird groups 2, 4 and 7, additional variability due to observer differences resulted in an average increase of ∼40% in the width of a 95% confidence interval for the logarithm of bird abundance generated from a 20 minute count. Our analysis shows that taking the average of counts obtained by two or more observers would negate the increase in variance of counts due to observer heterogeneity. Few differences between methods of field observation were found. However, for frequent calling, active birds (group 7) there was evidence that more birds were heard using the point interval count method. Our study clearly demonstrated a need to either control for observer differences or to assign at least two observers to individual sites when designing bird surveys for comparative studies. Failure to do so will result in a decrease in precision of bird counts.
Key words: bird detectability, counting methods, forest birds, observer heterogeneity, southeastern Australia.
*Corresponding author.
Accepted for publication October 1998.
particular species, our focus is on broad groups of birds that share similar attributes relating to detect- ability (e.g. size, colour and voice characteristics). Thus, we identify those groups for which observer heterogeneity is important and the extent of such effects. In addition, we contrast counts using different observation methods [point interval count (sensuPyke & Recher 1983), zig-zag walk and straight transect] in stands of Eucalyptus and Pinus radiata forest. Some implications of our findings for bird surveys are briefly discussed.
METHODS
Study area
The field experiment was completed in late October 1996 in the Buccleuch State Forest near Tumut, southern New South Wales (148°409E, 35°109S). Twelve sites were selected; six in a radiata pine (Pinus radiata) plantation and six in Eucalyptusforest domi- nated by 30–40 m tall manna gum (Eucalyptus vimi- nalis) and narrow-leaved peppermint (Eucalyptus radiata). The areas of eucalypt forest were typically a mixture of large old trees and younger (30–50 years old) regrowth stems with an understorey comprised of
Bedfordia arborescens, Acaciaspp. trees, Cassiniaspp., and a range of other shrub species. The six sites in
P. radiataforest were 10–15 years old and had yet to be thinned. The dominant trees in these stands were ,20 m tall and supported little or no ground or shrub- storey cover except for occasional Cassiniaspp. stems. Detailed vegetation surveys completed as part of another major investigation in the study area (Lindenmayer et al. 1999) indicated there was con- siderable similarity in vegetation structure and plant species composition across the different eucalypt sites, and across the six P. radiatasites. Relatively uniform vegetation on the P. radiataand eucalypt sites, respec- tively, helped reduce between-site variations in the range of bird species likely to occur in the two forest types. Logistics dictated that all 12 sites used in our experiments be located within an area of ,9 km2. This
facilitated the movement of observers between study sites and minimized asynchrony of sampling times.
Site preparation and field counting protocol
Each of the 12 sites commenced at the edge of a gravel road and followed a set compass bearing into the forest. The sites were prepared for counting by setting out a 400 m line of coloured flagging tape which guided observers along the strip transect. A total of four tran- sect counts of equal length was completed during each counting event; 0–100 m, 100–200 m, 200–300 m, and 300–400 m. Five minutes were allocated for counting birds along each of the four 100 m segments. Flagging
tape of a different colour was used to mark the 50 m, 100 m, 150 m, 200 m, 250 m, 300 m, 350 m and 400 m points along each flagged transect. The 100 m, 200 m, 300 m and 400 m stations were the locations for the point interval counts. Birds were counted for five minutes at each location for the point interval counts. Two additional 400 m long lines of flagging tape were set out 50 m either side of the 400 m long main tran- sect line. These additional flagged lines marked the boundaries of each survey site and the limits of the area within which the zig-zag walk was employed. A total of four counts using the zig-zag walk method was used to record birds on a given site. These were of equal length and took place around the 0–100 m, 100– 200 m, 200–300 m, and 300–400 m sections of the flagged line. For each 100 m segment of the zig-zag walk, observers tracked a 100 m long path from the 0 m point on one of the flagged boundary lines to the 100 m point on the opposite boundary line passing through the marked 50 m station on the main transect. This procedure was repeated as observers passed through the 150 m point on the main transect as they traversed from the 100 m point to the 200 m point on the opposite boundary line. Five minutes was allocated to count birds within each 100 m segment of the zig- zag walk. Thus, the time spent counting birds (20 min) was identical for the strip transect, point interval count and the zig-zag walk.
Experimental design
Twelve experienced observers from the Canberra Ornithologists Group participated in this study. Each observer surveyed three of the six sites in P. radiata
forest and three of the six sites in Eucalyptusforest. A given observer completed three surveys in a given forest type before switching to the other type of vege- tation. Each observer used a different observation method for each of the three P. radiatasites and each of the three Eucalyptus sites. Thus, each observer employed each bird observation method twice; once in
P. radiata forest and once in Eucalyptusforest. Each observer completed a survey of one site on any given day, requiring six days to complete data collection. Assignment of observers to sites, day and method was achieved by use of a series of Graeco–Latin squares (Cochran & Cox 1956). This ensured that within the groups of three observers and three sites over three days, all effects were balanced with respect to each other. All surveys were completed between 1430 and 1600 h and were restricted to warm, humid afternoons characterized by sunshine and intermittent cloud cover. Afternoons with similar climate conditions were surveyed to reduce random effects between sampling days due to large differences in weather conditions (Slater 1994). Bad weather forced cancellation of field- work scheduled for Day 5. Because the same group of
O B S E RV E R H E T E RO G E N E I T Y I N B I R D C O U N T S 2 7 1
30
observers were available for a limited time, this meant the design was incomplete resulting in some con- founding between observers, sites and method effects. However, this incomplete design still has high statisti- cal efficiency and confounding between fixed and ran- dom factors was low (< 10%). Statistical efficiency is measured by comparing the variance of the incomplete design with that of the full design (Mead 1988).
Observers recorded the numbers of each species of bird detected by sight and/or hearing for their assigned site and method.
Preliminary data analysis and statistical methods
A total of 65 species of birds was recorded in the experi- ment. In this study we focused on the ‘detectability’ of birds, rather than the more traditional functional or guild-type attributes. The seven attributes that we con- sidered likely to influence the chances of a bird being detected by sight or sound were: body size, the qual- ity and distinctiveness of calling, loudness of calling, frequency of calling, colour of plumage, behavioural patterns, and foraging height. Expert assessment rated each of these detectability attributes on a scale 1, 2 or 3. A final classification comprized of nine groups of birds was produced by examination of the two-dimen- sional configuration of points representing the 65 species derived by a Principal Component Analysis of the seven detectability scores together with expert opinion. General attributes of each of these groups, as well as a few of the taxa representative of each one, are set out in Table 1.
As multiplicative effects seem more plausible than additive effects in our experiment, the response vari- able for statistical analyses was the logarithm of the aggregate abundance of birds counted in each of the nine groups. Observer effects were considered to have been randomly drawn from a pool of expert observers. That is, observer differences were treated as con- tributing to the variance of counts rather than as a bias in the mean count. Assessment of the significance or otherwise of observer randomeffects involved estimat- ing the observer component of variance and calculat- ing a likelihood-ratio-based test statistic (Robinson 1991). An analysis of variance table showing the struc- ture of variance decomposition (i.e. the model) for group 7 birds detected by hearing is given in Table 2. Inferences relating to forest type effects are based on estimates of site variation within the two broad types of forest in our experiment so caution is needed in inter- preting the results. However, this lack of effective repli- cation is not so important here because an assessment of these effects is incidental in this study. Estimates of variance components and fixed and random effects were obtained by a general method of estimation known as restricted maximum likelihood (REML) (Robinson 1991). Patterns of observer variability across groups were explored by Principal Components Analysis of observer effects.
Examination of residuals following an initial analy- sis provided a check of the compatibility of the model and data. Counts were low for some bird groups, and in these cases the results have not been reported as the usual distributional assumptions
2 7 2 R . B. C U N N I N G H A M E T A L.
Table 1.Groups of birds and their distinctive features Group
number Description Examples
1 Small colourful birds with few calls Scarlet robin, flame robin.
2 Small plain coloured birds foraging in the low shrubs layer Brown thornbill, white-browed scrub-wren, silvereye. 3 Birds with more cryptic behaviour and/or faint infrequent Striated thornbill, spotted quail-thrush, red-browed
calls that may be difficult to recognise treecreeper, olive whistler.
4 Birds with distinctive behaviour that are mid-storey to Grey fantail, rufous fantail, satin flycatcher,
undercanopy foragers leaden flycatcher.
5 Large, very active and visible birds with a loud call Australian magpie, pied currawong, crimson rosella, yellow-tailed black cockatoo, sulphur-crested cockatoo.
6 Frequent calling birds with loud and distinctive calls Willie wagtail, grey shrike thrush, blackbird, noisy friarbird.
7 Frequent calling birds with active movements Spotted pardalote, striated pardalote, white-throated treecreeper, golden whistler, rufous whistler. 8 Birds with penetrating calls but cryptic and/or hard to see Brush cuckoo, fan-tailed cuckoo, shining bronze
cuckoo, eastern whipbird, eastern yellow robin. 9 Large birds with distinctive calls Red wattlebird, black-faced cuckoo-shrike, grey
butcherbird, satin bowerbird, laughing kookaburra.
31
necessary for valid statistical inference are unlikely to be met.
RESULTS
Hearing data
Considerable information was gathered for detections of birds by call but sight records were substantially more limited. Therefore, most of the results outlined below relate to detection by call.
Observer components of variance (a measure of heterogeneity) were significantly (P< 0.05) different from zero for groups 2, 4, 7 and 9, but not for groups 3, 5 and 8. For groups 1 and 6, there was some evi- dence of observer heterogeneity although this was not statistically significant (P< 0.10). Variance components and associated change-in-deviance statistics are given in Table 3.
For a given site on a given day the variance of the logarithm of the count of the number of birds heard in 20 min is: s21s2
0, where s2 and s20are estimates of the
residual ‘observation’ variance and the observer vari- ance, respectively. Under the assumed model that
observer effects are Normal with variance s2 0, an
approximate 95% confidence interval (CI) is given by:
623√(s21s2 0).
With the exception of group 9, for bird groups for which there was significant observer heterogeneity, the observer component of variance was approximately the same magnitude as the inherent observation variance (Table 3). Thus, observer heterogeneity increased the width of the confidence interval of the logarithm of the number of birds heard by ,40%. For group 9 this will be conservative. It follows that if two observers were to count birds on a given site, the variance of the aver- age of the log counts of the two observers will be roughly equivalent to that obtained by a single observer, if there were no observer differences.
It should be noted that observation error is ,45% of the mean for groups 1, 4 and 7 and around 85% of the mean for groups 2, 5, 6 and 8 (Table 3). Thus, relative error remains reasonably large even if two observers are used. To increase the precision of counts, more than two observers are required. Day and site components of variance are consistently small relative to observation variance (Table 3).
A scatterplot matrix in Fig. 1 contains pairwise
O B S E RV E R H E T E RO G E N E I T Y I N B I R D C O U N T S 2 7 3
Table 2.Analysis of variance table for birds allocated to group 7 and detected by hearing
Source of variation d.f. Sum of squares Mean square Variance ratio
Site stratum
Forest type 1 7.599 7.599 5.55
Residual 10 13.695 1.369 7.69
Day stratum 4 1.284 0.321 1.81
Observer stratum 11 6.019 0.547 3.07
Site3day3observer stratum
Method 2 4.037 2.019 11.41 (P< 0.001)
Forest type3method 2 1.600 0.780 4.41 (P= 0.022)
Residual 28(1) 4.952 0.178
Table 3.Estimates of observer variance components, associated change-in-deviance statistics and P-values, based on a χ2 (1 d.f.) distribution for log bird counts
Observer variance Change-in-deviance
Bird group component statistic P-value Residual Day Site Mean [log(counts + 1)]
1 0.030 3.1 0.078 0.118 0.003 0.000 0.22 2 0.44 14.0 <0.001 0.338 0.000 0.000 1.72 3 0.0* 0.0 1.0 0.250 0.042 0.036 0.26 4 0.156 7.7 0.006 0.164 0.022 0.117 0.9 5 0.0* 0.4 0.53 0.341 0.009 0.027 1.46 6 0.144 2.7 0.10 0.402 0.000 0.000 1.29 7 0.123 8.7 0.003 0.166 0.013 0.111 2.01 8 0.0* 0.0 1.00 0.388 0.000 0.004 1.07 9 0.087 6.7 0.01 0.170 0.000 0.062 0.56
*Groups for which there were a large number of zeros and the distribution of the data may breach assumptions needed for valid inferences from statistical analyses.
Also included are estimates of variance components for the other random terms in the model and means of log (bird abundance).
graphs of observer effects for all groups except groups 3, 5 and 8 for which observer heterogeneity was not statistically significant. A high value indicated that a given observer heard more birds than the average, while a low value corresponded to a lower than aver- age count by an observer for a given group. These graphs facilitate an analysis of our sample of observers. For example, observer 6 produced low counts for all groups, but particularly bird groups 6, 7 and 9. Conversely, observer 9 produced high counts for all groups. However, such consistent patterns were not obtained for all observers; 11 typically recorded aver- age counts for group 9 birds but low ones for group 7. Observers 1, 3 and 7 produced consistently similar counts for all groups (Fig. 1).
The above analysis allows assessment of pairwise pat- terns only. Given this, observer effects were subject to Principal Components Analysis of observer effects. This examined patterns of observer heterogeneity by considering all groups simultaneously. The first prin- cipal component accounted for 78.1% of the variation between the observers and the second 9.3%. The bird groups contributing most to the first vector were 2, 4, 6 and 7. A contrast between a high count on group 4 (and to a lesser extent group 6) and a low count on