5.5.5 Coding and Reliability Testing

I chose to assess reliability using both percent agreement and Cohen’s kappa.

I chose Cohen’s kappa as a measure that takes chance into account for two reasons: there is extensive precedent for its use, and its calculation of expected agreement, which takes each coder’s individual distribution into account, is more convincing to me than that of Scott’s pi. I chose percent agreement as an accompanying measure because it compensates, to some degree at least, for some of the limitations of Cohen’s kappa, which I will discuss later in this section.
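The difference between the two measures can be illustrated with a minimal sketch. It assumes the standard textbook definitions of both statistics; the function names and the four-item example data are hypothetical and are not drawn from this study.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of units on which the two coders chose the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def expected_agreement_kappa(a, b):
    """Cohen's kappa: multiply each coder's own marginal proportions."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))

def expected_agreement_pi(a, b):
    """Scott's pi: square the proportions pooled across both coders."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    return sum((count / (2 * n)) ** 2 for count in pooled.values())

def chance_corrected(p_o, p_e):
    """Shared form of both statistics: (p_o - p_e) / (1 - p_e)."""
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for four units (illustration only).
coder_a = [1, 1, 1, 2]
coder_b = [1, 1, 2, 2]

p_o = observed_agreement(coder_a, coder_b)
kappa = chance_corrected(p_o, expected_agreement_kappa(coder_a, coder_b))
pi = chance_corrected(p_o, expected_agreement_pi(coder_a, coder_b))
print(f"observed={p_o:.3f} kappa={kappa:.3f} pi={pi:.3f}")
```

In this sketch the two statistics differ only in how expected agreement is obtained, which is the point of comparison made above.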

The second coder and I initially coded one third of the articles (72) for the purposes of reliability testing. I randomly selected these articles using the website www.random.org. However, while testing the reliability of the 72 jointly coded articles, it became apparent that the reliability sample contained very few instances of certain category values. To strengthen my reliability data, I therefore increased the reliability sample from one third (72 articles) to one half (108 articles).
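A two-stage selection of this kind could be sketched as follows. This is not the procedure used in the study, which relied on www.random.org; Python's `random` module is swapped in as a stand-in. The population size of 216 is inferred from the fractions given above, the seed is hypothetical, and the sketch assumes that the 36 additional articles were also drawn at random from those not yet coded.

```python
import random

TOTAL_ARTICLES = 216    # inferred: 72 is one third and 108 is one half
INITIAL_SIZE = 72
EXPANDED_SIZE = 108

rng = random.Random(2024)   # hypothetical seed, used only for repeatability
article_ids = list(range(1, TOTAL_ARTICLES + 1))

# Stage 1: draw the initial reliability sample without replacement.
initial_sample = rng.sample(article_ids, INITIAL_SIZE)

# Stage 2: top up the sample from the articles not selected in stage 1.
already_chosen = set(initial_sample)
remaining = [i for i in article_ids if i not in already_chosen]
extra_sample = rng.sample(remaining, EXPANDED_SIZE - INITIAL_SIZE)

reliability_sample = sorted(initial_sample + extra_sample)
print(len(initial_sample), len(reliability_sample))
```

Drawing the top-up only from the remaining articles keeps the original 72 in the expanded sample, so coding already completed is not discarded.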

I then conducted reliability testing on the 108 articles. The results of these tests are outlined for each variable individually in the table below. The final three columns of the table indicate how the agreement figure achieved would be interpreted by the commentators identified (Krippendorff, 2004; Landis and Koch, 1977; Banerjee et al., 1999). I have chosen Krippendorff’s assessments as examples of relatively conservative guidelines and those of Landis and Koch and of Banerjee et al. as examples of more liberal guidelines.

Three of the variables in this study (year of article, month of article and number of words in article) were not formulated as multiple-choice questions and hence were not amenable to testing using Cohen’s kappa. In each case percent agreement was 100%. Of the remaining 20 variables, 18 achieved either Cohen’s kappa reliability scores >0.80 or percent agreement scores >90%. I decided to allow a high percent agreement rate as an alternative to a high kappa score because the data set yielded a significant incidence of units with very limited variation in the variables being studied. For example, in the case of variable 20, the kappa score was 0 despite 99.07% agreement being achieved between coders. In this instance the second coder and I coded the articles to indicate that in 107 of the 108 articles being studied the content did not include a reference to an NGO claiming to be legitimate. Hence we both coded 107 of the 108 articles with the value 1. The reason the kappa was so low in this instance is that variation is a requirement for reliability to be demonstrated. Without variation, coders could simply have agreed to code everything in the same way or could have habitually coded articles in the same way due to boredom or inertia. Of course, it is also possible that the coders simply agreed on what the appropriate values were in 107 of the 108 cases.

I suggest that to exclude variables that achieved low kappa scores as a result of insufficient variation would be inappropriate for two reasons. Firstly, for a high number of variables in the study the coding produced both greater variation and higher kappa figures. This casts doubt on the assumption of careless or duplicitous coding, which presumably would not have been confined to a small number of variables. Secondly, if, in the case discussed, one more article had been coded 1 by both coders, this would have yielded a kappa of 1 to indicate perfect agreement. Although in such a case variation would have been even less and there would have been more reason to suspect unthinking allocation of the same value, no theorist would suggest that such a result indicated anything other than perfect reliability. In such cases the “benefit of the doubt” is applied and coding is assumed to have resulted from genuine agreement. I suggest that it is internally inconsistent to apply such a logic in cases of 100% agreement but not in cases of 99% agreement.
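The variable 20 result can be reproduced with a short sketch. The coder vectors below are an assumed reconstruction from the figures reported above: one coder assigns the value 1 to all 108 articles and the other assigns it to 107. The value 2 for the single disagreeing article is hypothetical, and the actual calculation tables are those in Appendix I.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of units coded identically, expressed as a percentage."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa from two equal-length lists of category codes."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement from each coder's own marginal distribution.
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Assumed reconstruction of variable 20 (not the original coding sheets).
coder_a = [1] * 108          # value 1 for every article
coder_b = [1] * 107 + [2]    # value 1 for 107 articles, another value once

agreement = percent_agreement(coder_a, coder_b)
kappa = cohen_kappa(coder_a, coder_b)
print(f"percent agreement = {agreement:.2f}%, kappa = {kappa:.3f}")
```

Because one coder shows no variation at all, expected agreement equals observed agreement (both 107/108), so the numerator of kappa is zero even though the coders disagreed on only one article.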

High percent agreement is therefore being used as an alternative to high kappa scores in instances in which insufficient variation arose. To allow readers to confirm independently that insufficient variation was the problem leading to the low kappa scores, Appendix I presents the kappa calculation tables for the variables in question.

Finally, it is worth mentioning that, although it was not justified there on the grounds of insufficient variation, there is precedent for the use of percent agreement as an alternative to high kappa scores in a study by Lombard et al. (2002), which considered acceptable those variables with an alpha of 0.7 or higher or a percent agreement rate of 90% or higher.

As outlined in the table below, two variables (10 and 11) did not achieve the applied threshold of kappa >0.80 or percent agreement >90%. Although the kappa score for variable 10 would not generally be considered acceptable by Krippendorff, it has been included in the results for the study as it would be considered acceptable by others, including Landis and Koch and Banerjee et al. Variable 11 would be considered acceptable by all three commentators, although Krippendorff would consider it acceptable only for tentative conclusions.
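The inclusion rule and the three sets of interpretations can be expressed as a small sketch. The cut-off values below are assumptions based on the commonly cited versions of each guideline (Krippendorff: 0.800 and 0.667; Landis and Koch: bands of 0.20; Banerjee et al.: 0.75 and 0.40). They reproduce the labels shown in Table 5.1 but should be checked against the original sources. All function names are hypothetical.

```python
def krippendorff_reading(k):
    """Assumed cut-offs: 0.800 for reliance, 0.667 for tentative conclusions."""
    if k >= 0.800:
        return "can be relied on"
    if k >= 0.667:
        return "suitable for tentative conclusions"
    return "not usually acceptable"

def landis_koch_reading(k):
    """Assumed bands: 0.81-1.00 almost perfect, 0.61-0.80 substantial, etc."""
    if k > 0.80:
        return "almost perfect"
    if k > 0.60:
        return "substantial"
    if k > 0.40:
        return "moderate"
    if k > 0.20:
        return "fair"
    return "slight" if k >= 0 else "poor"

def banerjee_reading(k):
    """Assumed cut-offs: above 0.75 excellent, 0.40 to 0.75 fair to good."""
    if k > 0.75:
        return "excellent"
    return "fair to good" if k >= 0.40 else "poor"

def meets_threshold(kappa, percent_agreement):
    """Rule applied in this section: kappa >0.80 or percent agreement >90%."""
    return kappa > 0.80 or percent_agreement > 90.0

# Figures for variables 10, 11 and 18 as reported in Table 5.1.
for number, percent, kappa in [(10, 70.37, 0.620), (11, 77.78, 0.690), (18, 90.74, 0.523)]:
    print(number, meets_threshold(kappa, percent), krippendorff_reading(kappa),
          landis_koch_reading(kappa), banerjee_reading(kappa))
```

Keeping the threshold check separate from the interpretive labels mirrors the text: the threshold decides which variables meet the study's criterion, while the labels only describe how each commentator would read the kappa figure.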


Table 5.1 Kappa reliability results and selected interpretations thereof

| No. | Title of Variable | % agreement | Cohen’s kappa | Krippendorff (2004) | Landis and Koch (1977) | Banerjee et al. (1999) |
|---|---|---|---|---|---|---|
| 5 | References to NGO accountability | 97.22% | 0.927 | can be relied on | almost perfect | excellent |
| 6 | NGOs questioning or disputing the accountability of NGOs or other actors | 93.53% | 0.880 | can be relied on | almost perfect | excellent |
| 7 | Other actors questioning or disputing the accountability of NGOs | | | | | |
| 9 | Claims of NGO accountability by other actors | 100% | 1 | can be relied on | almost perfect | excellent |
| 10 | Definitions of accountability applied by NGOs and other actors | 70.37% | 0.620 | not usually acceptable | substantial | fair to good |
| 11 | Accountability to whom | 77.78% | 0.690 | suitable for tentative conclusions | substantial | fair to good |
| 12 | Other NGO references to accountability | 95.37% | 0.912 | can be relied on | almost perfect | excellent |
| 13 | Other references by other actors to NGOs and accountability | 96.30% | 0.885 | can be relied on | almost perfect | excellent |
| 14 | References to NGO administration costs | 99.07% | 0.896 | can be relied on | almost perfect | excellent |
| 15 | References made by NGOs to NGO administration costs | 99.07% | 0.852 | can be relied on | almost perfect | excellent |
| 16 | References made by other actors to NGO administration costs | | | | | |
| 18 | NGOs questioning or disputing the legitimacy of NGOs or other actors | 90.74% | 0.523 | not usually acceptable | moderate | fair to good |
| 19 | Other actors questioning or disputing the legitimacy of NGOs | | | | | |
| 21 | Claims of NGO legitimacy by other actors | 97.22% | 0.613 | not usually acceptable | substantial | fair to good |
| 22 | Other NGO references to legitimacy | 90.74% | 0.768 | suitable for tentative conclusions | substantial | excellent |
| 23 | Other references by other actors to NGOs and legitimacy | 98.15% | 0.865 | can be relied on | almost perfect | excellent |


In document Irish Times coverage of Irish relief and development nongovernmental organisations, legitimacy and accountability, 1994-2009: analysis and implications for the role of nongovernmental organisations (Page 133-137)