Philosophy and Methods
5.6 Criticisms and Limitations of Quantitative Content Analysis
5.6.3 Additional Challenges posed by the method .1 Introduction
Three additional categories of criticism, which have received far less popular attention than those already mentioned, but in my view are substantive, are
acknowledged here. These relate to the construction of categories, coder training, and issues of reliability.
5.6.3.2 Category Development
Despite the assertion that categories are potent entities and that “each
category valorizes some point of view and silences another” (Bowker and Star, 2000, p. 5), it has been claimed that “classical content analysis has few answers to the question from where the categories come, how the system of categories is
developed” (Mayring, 2000, p. 3). Certainly it would appear true that theorists of quantitative content analysis have paid relatively little attention to this issue and this trend is repeated in individual published content analyses, with one meta-analysis reporting insufficient information on categorical reliability to enable them to evaluate the issue (Kolbe and Burnett, 1991).
Category development may proceed in different ways including from a literature review, a single piece of previous research or a group of researchers devising specific new measures. I suggest that, as a minimum, all content analyses should state the process that led to the adoption of particular categories and their definition in particular ways. While it is common for published content analyses to
132
state that the coding protocol will be made available to any interested parties, constraints on space make it impossible for coding protocols to be attached to most published content analyses. I suggest that all published content analyses should include, as a minimum, the full variable definitions, and that, where possible, a website address where the protocol is available for download should be included.
5.6.3.3 Coder Training
Coder training is seen as an integral part of protocol development with discussion among coders on disagreements typically leading to the refinement of categories and alteration of instructions until acceptable levels of reliability can be achieved. Coder training is, however, also problematic, as when coders participate in this way it becomes difficult to determine whether they have merely become more careful or have instead developed a new, group-specific unwritten consensus
concerning what is expected of them (Krippendorff, 2004). Although empirical studies on coder training are virtually non-existent, one small study (Hak and Bernts, 1996) found that coder training worked not only through the communication of the protocol to coders, but also through socialization of coders into practical rules. This is a problem, because it suggests that instead of the protocol governing coding (as is the received wisdom), in fact coding is partially determined by informal rules, which will not be known to future coders who may participate in replications of individual content analyses. Krippendorf’s (2004) recommendations to mitigate against this problem involve incorporating everything that transpired during coder training into the protocol and testing the finalized instructions with a fresh set of coders.
Although both of these steps are possible, they may prove difficult as the more information that a protocol contains the more difficult it is for coders to remember it, and a new set of coders would also require at least some training to ensure their familiarity with the instructions.
I acknowledge that coder training is problematic and, as a minimum, I recommend that details of the number, duration and content of training sessions should be documented. Although the absence of significant research on coder training outcomes makes more specific recommendations problematic, I suggest that while a small number of training sessions involving the joint coding of test articles may be unavoidable, if several training sessions involving lengthy discussions on
133
individual categories are required before acceptable reliability levels are obtained, then new coders should be used to avoid the problem of socialization.
5.6.3.4 Reliability
Reliability poses various challenges in the context of content analysis. While the absence of consensus on a coefficient or an appropriate level of agreement have been widely discussed in previous literature and have been mentioned already in this chapter, two other methodological challenges that became obvious in the course of this study will be discussed here.
Firstly, Section 5.5.5 discusses in detail the problem of insufficient variation.
This arose as a particular challenge in this study and was dealt with by the substitution of high percent agreement figures for kappa scores in the case of the affected variables. Secondly, the sample sizes required for reliability testing
presented a challenge in this study. While it is generally recommended that random sampling be used for the selection of units for reliability testing, and the sample sizes recommended by theorists are generally relatively small, small samples may be insufficient to allow for the testing of all distinctions. Using a larger sample for testing can be a problem, particularly in the context of the pilot reliability test, when either there may be an insufficient number of articles that are relevant but not part of the actual sample/universe, or if actual units are being used, the sample size required to test all critical distinctions may amount to a significant proportion of the actual universe. This is potentially problematic because if reliability levels are not sufficiently high and category definitions end up being revised following the pilot test, this could result in an insufficient number of actual unseen articles left to use for subsequent reliability testing. Although this was not required in this study, a
potential means of alleviating this problem would be to thoroughly test the protocol with one set of coders first before transferring to a second team for a second round of testing. Presumably if high agreement levels were achieved with the first set of coders then very few additional changes would be required to the protocol following testing with the second team. Hence this would maximise the chance that there would be an adequate number of units to allow for unseen reliability testing and subsequent individual coding.
134
5.7 Conclusion
In this chapter I have described the philosophical approach underpinning this study. I have also outlined the form of quantitative analysis I used and described in detail how I applied quantitative content analysis from the point of the identification of content to the analysis of the data. While acknowledging that there are limitations associated both with quantitative content analysis in general and my application of quantitative content analysis in particular, I have suggested that this method provided a suitable means of investigating the research questions and testing the hypotheses which guide this research and are outlined in Chapter 4.
As described in Chapters 1-4, my research questions and hypotheses were themselves guided by a three-pronged background argument: that Irish relief and development NGOs should reorient their activities towards the promotion of
development literacy and global solidarity; that the ways in which NGOs refer to the concepts of legitimacy and accountability may indicate the extent to which NGOs are promoting development literacy and global solidarity and the ways in which the public refer to legitimacy and accountability in relation to NGOs may indicate the extent to which the public already exhibit development literacy and global solidarity;
and that Irish Times newspaper coverage may serve as a reflection of NGO and public views in relation to legitimacy and accountability and hence may serve as an indicator both of the extent to which NGOs are already promoting development literacy and global solidarity and the extent to which the Irish public already exhibits development literacy and global solidarity.
135
Chapter 6
Results
136