CHAPTER 2: GENERAL METHODOLOGY
2.7 Inter-observer reliability
2.7.1 Secondary observer
Inter-observer reliability for identification of gesture form, content, and intentionality was measured by having a second rater code a subset of the video clips using a modified coding spreadsheet. This rater, Cat Hobaiter, was familiar with the gestural communication of gorillas and had coded many videos of captive gorilla interactions. She therefore provided an ideal second rater as she was used to the theory and methodology but was unfamiliar with the species used in the study and thus had no preconceptions of orangutan communication.
2.7.2 Design
The coding sheet used for reliability tests was modified from that used for the original coding of gestures to accommodate the rater’s unfamiliarity with both the individual orangutans included in the study and the behaviour of the species. The reliability coding aimed to focus on the elements of the videotaped actions essential to determining their status as gestures as well as their supposed meaning. The coding of the second rater, therefore, consisted primarily of variables used to determine intentionality, the supposed goal of the action, and the response of the recipient. A full list of variables coded is given in Table 4 and an example of an entry in the spreadsheet is included in Appendix III.
Appendices
Table 4: LIST OF CODED VARIABLES USED IN ANALYSIS OF INTER-OBSERVER RELIABILITY.
Variable | Type | Possible values
Mechanical effectiveness | Scale 1-4 | 1) Effective; 2) Possibly effective; 3) Likely non-effective; 4) Definitely non-effective
Directedness | Scale 1-4 | 1) No recipient; 2) Several potential recipients; 3) Several potential recipients but directed to one; 4) One definite recipient
Goal | Categorical | Unknown; Affiliation; Attention; Play; Share food/object (acquire object or info); Look at object/body part (direct attention); Stop behaviour ("no"); Move back; Leave; Follow; Climb on; Pick up; Mate
Signaller’s visual attention | Out of view & Scale 1-4 | Out of view; 1) Can’t see recipient; 2) Can potentially see; 3) Looking towards; 4) Looking at the face or eyes
Recipient’s visual attention | Out of view & Scale 1-4 | Out of view; 1) Can’t see signaller; 2) Can potentially see; 3) Looking towards; 4) Looking at the face
Modality match | Categorical | Not detectable; Detectable but not necessary; Detectable and necessary
Response waiting | Scale 1-4 | 1) None; 2) Pause; 3) Wait until response; 4) Wait >2 sec
Response | Categorical | No response; Negative (look away, move away, aggression); Acknowledge but carry on with prior behaviour; Pay attention (look or move towards); Positive interaction (affiliate, play, give)
Goal met? | Categorical | No; Yes; Unclear
Persistence | Categorical | None; Repeat/elaborate; Same modality; Change modality
Sequence goal | Categorical | Different; Same; Unclear
Outcome | Categorical | None; Affiliation; Attention; Play; Share food or object; Look at object or body part; Stop behaviour; Move back; Leave; Follow; Climb on; Pick up; Mate
Intentionality rating | Scale 1-4 | 1) Not intentional; 2) Unclear/needs more evidence; 3) Consistent with intentional interpretation; 4) Support for intentional interpretation
2.7.3 Procedure
The second rater (CH) was trained to use the spreadsheet on a set of 15 pre-selected video clips. The primary rater (EC) analysed the 15 clips alongside the second rater, discussing why each judgment was made and working on each clip until both agreed on all of its ratings. The second rater was then given free access to all video clips and asked to code as many clips as possible within a limited period of time (two afternoon sessions). She was given no other instructions or limits, except that she should include some video clips from each of the three zoos.
2.7.4 Analysis
Tests for reliability between the observations of the two observers were done using Cohen’s Kappa. This test measures the agreement between independent observers, taking into account the possibility of chance agreement.
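As a minimal sketch of how this statistic corrects for chance agreement, the standard formula kappa = (p_o - p_e) / (1 - p_e) can be computed as follows, where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. The function name and the example codes are hypothetical illustrations and are not data from this study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: proportion of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten clips (illustration only, not study data).
primary = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
second = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
kappa = cohens_kappa(primary, second)
```

In this illustration the raters agree on 8 of 10 clips (p_o = 0.8) and each uses both codes equally often (p_e = 0.5), so kappa is about 0.6 even though raw agreement is 80%.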
Some of the variables were combined into more general categories to reflect the overall nature of the interactions rather than highly specific distinctions between contexts, which may require either familiarity with orangutan behaviour or the ability to contextualise the subset of clips within all clips in the dataset. Thus reactions that involved non-aggressive social interactions were grouped together as “positive” responses, and responses that involved leaving or actively rejecting the signaller (e.g. pushing away) were combined into “negative” responses. I grouped the values of variables coded on a scale into two categories, high and low, for analysis. The combining of specific values for each variable is reported below.
Values for the variable mechanical effectiveness were combined into either “effective” (previously, “effective” and “probably effective”) or “non-effective” (previously “likely non-effective” and “definitely non-effective”). I condensed the category directedness by combining “one potential recipient” and “one certain recipient” into “one recipient.”
The variable goal was condensed into more general categories that reflected either attraction or repulsion of the recipient. The values “leave,” “stop,” and “move back” were combined into “stop/move away.” The values that reflected the goal of positive interaction (“affiliation,” “attention,” “play,” and “look at body part”) were combined into “attention/play.”
The measures of gaze direction for both the signaller and recipient (signaller visual attention and recipient visual attention) were collapsed within each variable so that all values that indicated one individual could see the other became “looking towards.” Thus measures of visual attention had values of either “looking” or “not looking.”
Response waiting was initially divided into four categories in order to obtain a finer-grained measure of whether the signaller was demonstrating her expectation of a response from the recipient. For the purposes of this analysis, only the most extreme measure of waiting for a response (waiting for more than 2 seconds) was counted as response waiting. All values indicative of pauses shorter than 2 seconds were condensed into “no response waiting.”
For analysis of the variable response, the value “acknowledge but carry on with prior behaviour” was merged into the value “pay attention to.” The variable outcome was condensed using the same combination of categories as was used for goal.
The rating for intentionality was condensed so that both of the values that suggested intentionality (“consistent with intentional interpretation” and “support for intentional interpretation”) were merged into the single value “likely intentional.” This was done to reflect the inclusion of both values in building the dataset of intentional gestures.
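The collapsing of the 1-4 scales described above can be sketched as a simple recoding step applied to each rater's codes before agreement is computed. The mappings follow the groupings stated in this section. The dictionary names, the `collapse` helper, and the label “not likely intentional” for the two lower intentionality values are assumptions made for illustration, since the text names only the merged upper value.

```python
# Recoding maps from the original 1-4 scale values to the collapsed categories.
MECHANICAL_EFFECTIVENESS = {
    1: "effective",
    2: "effective",
    3: "non-effective",
    4: "non-effective",
}

# Only the most extreme value (waiting more than 2 seconds) counts as waiting.
RESPONSE_WAITING = {
    1: "no response waiting",
    2: "no response waiting",
    3: "no response waiting",
    4: "response waiting",
}

# The label for values 1 and 2 is hypothetical; the text names only the
# merged value "likely intentional" for values 3 and 4.
INTENTIONALITY = {
    1: "not likely intentional",
    2: "not likely intentional",
    3: "likely intentional",
    4: "likely intentional",
}


def collapse(codes, mapping):
    """Replace each original scale value with its collapsed category."""
    return [mapping[code] for code in codes]


# Hypothetical ratings for five potential gestures (illustration only).
example_ratings = [4, 3, 2, 4, 1]
collapsed = collapse(example_ratings, INTENTIONALITY)
```

Each rater's collapsed codes would then be compared variable by variable, so that disagreements within a merged category (for example a 3 versus a 4 on intentionality) no longer count against agreement.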
2.7.5 Results
The second rater coded 64 video clips, yielding a total of 108 potential gestures. Nineteen of the potential gestures had to be discarded due to incomplete coding. This left 89 potential gestures (5.8% of all gestures) to use for comparison of the two raters. The kappa values for concordance between the two raters are reported in Table 5.
Table 5: MEASURES OF AGREEMENT (COHEN’S KAPPA) BETWEEN THE TWO OBSERVERS FOR EACH OF THE 13 VARIABLES MEASURED.
Also listed is the type (scalar or categorical) for each variable. The strength of agreement signified by each kappa value (Landis and Koch 1977) is given in the right hand column.
Variable | Type of variable | Kappa value | Strength of agreement
Mechanical effectiveness | Scalar | .88 | Almost perfect
Directedness | Scalar | .94 | Almost perfect
Goal | Categorical | .63 | Substantial
Signaller’s visual attention | Scalar | .91 | Almost perfect
Recipient’s visual attention | Scalar | .89 | Almost perfect
Modality match | Categorical | .78 | Substantial
Response waiting | Scalar | .79 | Substantial
Response | Categorical | .64 | Substantial
Goal met? | Categorical | .48 | Moderate
Persistence | Categorical | .80 | Substantial
Sequence goal | Categorical | .83 | Almost perfect
Outcome | Categorical | .56 | Moderate
Intentionality rating | Scalar | .68 | Substantial
Though the values for two variables indicated only “moderate” agreement, the mean kappa value across all 13 variables was 0.75, signifying a “substantial” strength of agreement between the two raters.
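The strength-of-agreement labels and the mean reported here can be checked with a short sketch. It uses the kappa values from Table 5 and the Landis and Koch (1977) bands (below 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect). The variable and function names are hypothetical and chosen for illustration only.

```python
from statistics import mean

# Kappa values as reported in Table 5.
KAPPAS = {
    "Mechanical effectiveness": 0.88,
    "Directedness": 0.94,
    "Goal": 0.63,
    "Signaller's visual attention": 0.91,
    "Recipient's visual attention": 0.89,
    "Modality match": 0.78,
    "Response waiting": 0.79,
    "Response": 0.64,
    "Goal met?": 0.48,
    "Persistence": 0.80,
    "Sequence goal": 0.83,
    "Outcome": 0.56,
    "Intentionality rating": 0.68,
}

# Upper bounds of the Landis and Koch (1977) bands for non-negative kappa.
BANDS = [
    (0.20, "Slight"),
    (0.40, "Fair"),
    (0.60, "Moderate"),
    (0.80, "Substantial"),
    (1.00, "Almost perfect"),
]


def strength_of_agreement(kappa):
    """Return the Landis and Koch label for a kappa value."""
    if kappa < 0:
        return "Poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


labels = {name: strength_of_agreement(k) for name, k in KAPPAS.items()}
mean_kappa = round(mean(list(KAPPAS.values())), 2)
moderate = sorted(name for name, label in labels.items() if label == "Moderate")
```

Run on the Table 5 values, this gives a mean kappa of 0.75 and identifies “Goal met?” and “Outcome” as the two variables in the moderate band.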