The final part of this meta-analysis provides an assessment of the sensitivity of the bidirectional control method. It has already been shown that demonstrator-consistent responding is statistically significant. Thus, in principle, such an effect is detectable. However, the low proportion of reliable demonstrator-consistent responding effects in the experiments under review might indicate that bidirectional control tests are not especially sensitive: a demonstrator-consistent responding effect may tend not to be measurable in practice, at least when samples of a modest size are employed. In order to examine the sensitivity of bidirectional control methodology, the following steps were taken. First, a homogeneous sample of study effects was isolated from the initial sample. It was intended that, as well as being more homogeneous than the overall sample, this subsample should be representative of the paradigmatic procedure conventionally used in attempts to demonstrate imitation in rats through the use of a bidirectional control method. Second, the true effect size for this conventional procedure was estimated by averaging the effect sizes for the subsample. Third, a power analysis was conducted to assess the sensitivity of this procedure.
Table 5.4: Permitted procedural manipulations and excluded testing conditions for the homogeneous subsample.

Permitted manipulations
    Amount of magazine training (number of sessions, number of pellets per session)
    Previous experimental experience of observers
    Housing conditions of observers (with demonstrators, without demonstrators)
    Habituation of observers to observation context
    Size of joystick deflection necessary to register response
    Requirement for demonstrator to hold the joystick in a deflected position to register a response
    Type of joystick (weights, spring loaded, elastic bands, magnetic; various lengths)
    Type of demonstrator training
    Demonstrated response - reinforcer intervals of .25 sec or less

Excluded conditions
    Joystick-transfer test (3 locations)
    Depositing demonstrator responded in opposite direction to observed demonstrator
    Demonstration - test delay (1 hr, 24 hr)
    Multiple demonstration sessions
    Shortened demonstration session (20 reinforced responses)
    Lengthened demonstration session (100 / 150 reinforced responses)
    Demonstrated response - reinforcer interval 1 sec or greater
    Demonstrated response followed by food alone (tone absent)
    Demonstrated response followed by tone alone (food absent)
    Different orientation of animals during demonstration (observer watched over demonstrator's shoulder)
    Female demonstrators
Table 5.5: Distribution of effect sizes (g) for the homogeneous subsample of study effects.

Stem      Leaf
Extremes  -10.7
-1        7
-1*       1 2 4
-0        5 5 5 6 7 9
-0*       0 0 1 1 1 1 1 1 1 2 2 2 2 3 3 4 4
 0*       0 0 0 0 0 1 2 2 2 2 3 3 3 3 3 4 4 4 4 4
 0        5 6 7 7 7 7 8 8 9 9
 1*       1 2 2 3 3 4 4 4 4
 1        9
Extremes  2.4, 2.6, 3.0, 5.2, 5.5
Table 5.6: Statistics for the homogeneous subsample of effect sizes (g)

Maximum                                      5.50
Quartile 3 (Q3)                               .82
Median (Q2)                                   .27
Quartile 1 (Q1)                               .21
Minimum                                    -10.70
.75 (Q3 - Q1)                                 .80
SD                                           1.80
Unweighted mean                               .30  (95% CI = -.11, .71)
Weighted mean (a)                             .35
k (number of studies)                       73.00
N (total number of observers)              942.00
n (median number of observers per study)    13.00
Proportion positive sign                      .62
Z of proportion positive                     1.90  (p = .061, two-tailed)
Combined Stouffer Z                          4.40  (p = .00003, two-tailed)

(a) Weighted by total number of animals per study effect.
The subsample of study effects was constrained as follows, so that it would be representative of paradigmatic testing conditions. In the conventional bidirectional control procedure (e.g., Heyes & Dawson, 1990), magazine-trained observer rats are exposed to a single demonstration session in which a demonstrator makes 50 responses in a single direction, before being exposed to the joystick and subjected to an NDR test. Each demonstrated response is followed by the presentation of a tone and the delivery of food to the demonstrator. Only studies which employed these procedural features were selected. In addition, study effects were excluded from the new sample if they utilised some of the more rarely used testing conditions, such as the joystick-transfer test. However, some minor procedural variations were permitted. Table 5.4 details the permitted procedural manipulations, and excluded testing conditions, for the homogeneous subsample.
A total of 73 study effects were found to satisfy the above criteria, and their effect sizes are summarised in Tables 5.5 and 5.6 in a similar manner to those for the complete sample. Although the unweighted mean, weighted mean and median of these effect sizes all provided different estimations of the population treatment effect for a conventional NDR experiment, all three are in agreement that this is approximately .3, slightly higher than the equivalent estimations for the entire sample of study effects. The weighted mean (M_w = .35) estimates this to be of greater magnitude than either of the other two measures of central tendency presented in Table 5.6 (M_u = .30, Mdn = .27), and is probably the preferable measure because it takes size of study into account. However, the large variability in this sample of effect sizes limits the certainty with which the weighted mean, or any other of these measures of central tendency, can be taken to represent the size of the population effect. Again, this can be illustrated by the 95% confidence interval around the unweighted mean (-.11 to .71). This interval not only indicates that a small demonstrator-inconsistent responding bias cannot be ruled out, it also indicates that a population effect size as large as .71 cannot be taken to be unlikely (at the 5% level). The upper limit of the confidence interval is important in an evaluation of the sensitivity of the conventional NDR procedure because it effectively specifies the largest effect size which might be attributed to a demonstrator-consistent responding tendency measured in this way, and, thus, the most favourable state of the world for this procedure which is, statistically, tenable.
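The interval quoted above can be checked from the summary statistics alone. The following is a minimal sketch, assuming the interval was formed as the mean plus or minus a normal critical value times SD / sqrt(k); the source does not state the formula used, and the function names are illustrative only. The weighted mean is shown with invented toy values, since the individual study effects and sample sizes are not listed here.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, k, level=0.95):
    """Normal-theory confidence interval for a mean of k effect sizes.

    Assumption: a z critical value is used (about 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(k)
    return mean - half_width, mean + half_width


def weighted_mean(effects, weights):
    """Mean of effect sizes weighted by, e.g., animals per study effect."""
    return sum(w * g for g, w in zip(effects, weights)) / sum(weights)


# Summary values reported for the homogeneous subsample.
low, high = mean_ci(mean=0.30, sd=1.80, k=73)
print(f"95% CI: {low:.2f} to {high:.2f}")

# Hypothetical toy data (not from the meta-analysis): the larger study
# pulls the weighted mean towards its own effect size.
toy_effects = [0.2, 0.5, -0.1]
toy_sizes = [10, 20, 10]
print(f"weighted mean of toy data: {weighted_mean(toy_effects, toy_sizes):.3f}")
```

Under these assumptions the reported mean, SD and k give limits that round to the published -.11 and .71, which suggests a normal rather than a t critical value was used.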
Table 5.7: Results of power analysis for the conventional bidirectional control procedure.

Estimate of effect size                                   δ     Power (16 per group)   n per group (for 80% power)
Best (weighted mean)                                      .35   16%                    130
Most favourable, yet plausible (upper confidence limit)   .71   approx. 50%            33
Both the best estimate (weighted mean) and the most favourable yet plausible estimate (upper confidence limit) of the population effect size were subjected to power analysis. The results of these power analyses, which indicate the sensitivity of the conventional NDR procedure, are summarised in Table 5.7. Two measures of sensitivity are provided in Table 5.7. The first of these is the power when the sample size for a study (number of observers) was fixed to the highest commonly used value (16 per group). With this modest sample size, the best estimate of the power associated with the conventional NDR procedure was found to be 16%; an experimenter would be expected to be able to reject the null hypothesis, correctly, once in every six or so experiments conducted with this sample size. In the most favourable of situations, this expected hit rate rises to one in two. The second measure of sensitivity provided in Table 5.7 is the number of subjects which a power analysis indicated would be required in order to detect a given effect size with a reasonable level of power (conventionally set to 80%). When the upper confidence limit was used as an estimate of true effect size, power analysis was found to recommend the use of 33 observers per treatment group in order to detect demonstrator-consistent responding with this level of sensitivity using the conventional NDR procedure. This is equivalent to a recommendation to run studies twice as large as the largest studies conducted to date. However, for the weighted mean, the best estimate of population effect size, 130 observers per treatment group was the recommended sample size. In order to detect demonstrator-consistent responding with a reasonable level of sensitivity, it would appear that the conventional NDR bidirectional control procedure requires unmanageably large sample sizes.
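The two measures of sensitivity can be approximated with a short calculation. This is a sketch only, assuming a two-tailed comparison of two independent groups at alpha = .05 and using a normal approximation to the t distribution; the source does not state which test or software was used, and the function names are hypothetical. Because of the approximation, the results come out close to, but slightly more optimistic than, the figures in Table 5.7.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power to detect standardised effect d.

    Assumption: two independent groups of equal size, normal approximation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Effect size estimates discussed in the text: weighted mean and upper limit.
for label, d in [("best", 0.35), ("most favourable", 0.71)]:
    print(label, d,
          f"power at n=16: {power_two_group(d, 16):.2f}",
          f"n for 80% power: {n_per_group_for_power(d)}")
```

An exact calculation based on the noncentral t distribution would typically add a subject or so per group to the sample sizes returned by this approximation.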
The findings of this meta-analysis of the results of the first seven years of experiments using the bidirectional control procedure may be summarised as follows. Initially, the statistical reliability of a demonstrator-consistent responding tendency was established; published demonstrator-consistent responding effects do not appear to be merely a sampling artefact. However, power analyses based upon estimations of the population effect size for the conventional bidirectional control procedure revealed that this is not an especially sensitive research tool. Furthermore, no evidence was found that the strength of a demonstrator-consistent responding tendency was influenced by either the type of bidirectional control design or the person conducting the experiment. On the basis of these findings, it would seem that the bidirectional control method is not sufficiently sensitive to be of practical use in the investigation of imitation.