Philosophy and Methods
5.4 Introduction to Content Analysis and Justification for the Use of Quantitative Content Analysis
5.4.2 Introduction to Quantitative Content Analysis
5.4.2.1 Definitions of Quantitative Content Analysis
Following a review of earlier definitions, Neuendorf (2002, p. 10) defined content analysis as “a summarizing, quantitative analysis of messages that relies on the scientific method (including attention to objectivity, intersubjectivity, a priori design, reliability, validity, generalizability, replicability, and hypothesis testing) and is not limited as to the types of variables that may be measured or the context in which the messages are created or presented”.
5.4.2.2 Reliability and Validity in Quantitative Content Analysis
Reliability and validity are key concerns in quantitative content analysis and hence merit discussion here. Reliability, firstly, is generally understood as agreement among coders about the categorization of data.
When testing reliability, researchers begin by choosing a reliability coefficient. Although 39 such coefficients have been identified (Popping, 1998, cited in Lombard et al., 2002), only a small number are widely known. In particular, percent agreement (often described as Holsti’s method in the context of content analysis), Scott’s pi and Cohen’s kappa warrant discussion, because meta-analyses have shown them to be consistently among the most commonly reported measures (Hughes and Garrett, 1990, Perreault and Leigh, 1989, Riffe and Freitag, 1997).
Percent agreement in the context of content analysis reliability refers simply to the number of units that coders assign to the same category divided by the total number of units they code. Holsti’s (1969) method is identical to percent agreement when two coders code the same units. Assessments of the merits of percent agreement vary considerably. Banerjee et al. (1999, p. 5) declare the measure to be “clearly inadequate”. Similarly, Krippendorff (2004, p. 245) describes it as an “uninterpretable agreement measure”. Neuendorf (2002), on the other hand, while acknowledging some drawbacks to percent agreement, does not reject its use outright. Riffe et al. (2005) and Lombard et al. (2002) go further and recommend that agreement figures be reported.
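As a minimal sketch of the two measures just described, the Python below computes percent agreement for two coders and Holsti’s (1969) coefficient of reliability, CR = 2M / (N1 + N2), where M is the number of coding decisions on which the coders agree and N1 and N2 are the numbers of decisions made by each coder. The function names and the five-unit dataset are hypothetical, for illustration only.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of units to which both coders assigned the same category."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same, non-empty set of units")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)


def holsti_cr(agreements: int, n_coder_1: int, n_coder_2: int) -> float:
    """Holsti's coefficient of reliability: 2M / (N1 + N2)."""
    return 2 * agreements / (n_coder_1 + n_coder_2)


# Hypothetical codes for five units (illustrative data only)
coder_a = ["pos", "neg", "neg", "neu", "pos"]
coder_b = ["pos", "neg", "pos", "neu", "pos"]

pa = percent_agreement(coder_a, coder_b)
m = sum(a == b for a, b in zip(coder_a, coder_b))  # 4 agreements
cr = holsti_cr(m, len(coder_a), len(coder_b))
print(pa, cr)  # 0.8 0.8
```

When both coders code the same N units, N1 = N2 = N and the formula reduces to M / N, which is why the two figures coincide here.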
The main objection to percent agreement is that it fails to take chance agreement into account and may therefore inflate reliability. For example, if there were two coding options and two coders each picking codes at random, they would have a 50% chance of choosing the same code for any unit, even though neither had ever looked at the material being coded. Consequently, theorists generally agree that coefficients that take chance into account should be used either on their own or in addition to percent agreement (Neuendorf, 2002, Krippendorff, 2009, Riffe et al., 2005, Lombard et al., 2002).
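The 50% figure in the example above can be checked with a short sketch. It assumes, for illustration only, that each “blind” coder picks every category with equal probability (real coders rarely use categories uniformly). Under that assumption the probability that two coders choose the same code is the sum over the k categories of (1/k)², which is 1/k; a seeded simulation of two coders who never look at the material lands close to the same value. The function names are hypothetical.

```python
import random


def uniform_chance_agreement(k: int) -> float:
    """Expected agreement when two coders pick among k categories uniformly at random."""
    # Both coders land on a given category with probability (1/k) * (1/k);
    # summing over the k categories gives 1/k.
    return sum((1 / k) ** 2 for _ in range(k))


def simulated_blind_agreement(k: int, n_units: int, seed: int = 0) -> float:
    """Percent agreement between two simulated coders who ignore the material."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    coder_a = [rng.randrange(k) for _ in range(n_units)]
    coder_b = [rng.randrange(k) for _ in range(n_units)]
    return sum(a == b for a, b in zip(coder_a, coder_b)) / n_units


print(uniform_chance_agreement(2))  # 0.5
print(simulated_blind_agreement(2, 10_000))  # close to 0.5, with no real coding at all
```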
Scott’s pi (1955) is one such coefficient. Scott’s pi computes the agreement expected by chance from how often each category value is used across both coders in a given study: the pooled proportion of codes falling into each category is squared, and these squares are summed. Values normally range from .00 (agreement at chance level) to 1.00 (perfect agreement), although negative values, indicating agreement below chance, are possible. Scott’s pi, along with Cohen’s kappa, has been criticised as overly conservative because it gives credit only to agreement beyond chance. In other words, it contains a built-in assumption that a certain proportion of coding decisions are due to chance even though this may not be the case. Scott’s pi is calculated using the formula
$$\pi = \frac{P_o - P_e}{1 - P_e}$$
where $P_o$ is the percent agreement observed and $P_e$ is the percent agreement expected by chance, both expressed as proportions.
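The formula can be sketched in Python as follows. This is a minimal illustration, not a validated reliability package: it assumes exactly two coders, nominal categories and no missing codes, and the function name and sample data are hypothetical. Expected agreement is computed from the pooled use of each category by both coders: each category’s share of all 2N codes is squared and the squares are summed.

```python
from collections import Counter
from typing import Sequence


def scotts_pi(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Scott's pi for two coders, nominal categories, no missing codes."""
    n = len(coder_a)
    if n == 0 or n != len(coder_b):
        raise ValueError("both coders must code the same, non-empty set of units")
    # Observed agreement: share of units given identical codes.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: pool both coders' codes, so it does not matter
    # which coder used a category; square each category's share and sum.
    pooled = Counter(coder_a) + Counter(coder_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_e == 1:
        raise ValueError("pi is undefined when a single category is used throughout")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for five units (illustrative data only)
coder_a = ["pos", "neg", "neg", "neu", "pos"]
coder_b = ["pos", "neg", "pos", "neu", "pos"]
print(round(scotts_pi(coder_a, coder_b), 2))  # 0.68
```

Here percent agreement is .80, but because the pooled category shares (.5, .3 and .2) imply a chance agreement of .38, pi falls to about .68.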
Cohen’s kappa (1960) is calculated using the same formula as Scott’s pi; the two measures differ only in how expected agreement is calculated. Whereas Scott’s pi disregards which of two coders has allocated a particular code, Cohen’s kappa checks for systematic biases by accounting for differences in how individual coders allocate their values across the coding categories. Much has been written about which approach is preferable. Whereas Krippendorff (1978), for example, rejects outright the validity of Cohen’s kappa because of its method of calculating expected agreement, Fleiss (1978) has identified its approach to expected agreement as a strength compared with that of Scott’s pi. Most commentators, however, have not taken a stance on the matter (e.g. Riffe et al., 2005, Neuendorf, 2002, Lombard et al., 2002).
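To make the difference concrete, the sketch below computes both coefficients with the same formula and swaps only the expected-agreement step, as the paragraph above describes: Scott’s version pools both coders’ codes, whereas Cohen’s multiplies each coder’s own proportion for a category and sums those products. It assumes two coders, nominal codes and no missing data; the function names and the five-unit dataset are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Codes = Sequence[str]


def expected_scott(a: Codes, b: Codes) -> float:
    # Pooled category proportions: which coder used a code is ignored.
    n = len(a)
    pooled = Counter(a) + Counter(b)
    return sum((count / (2 * n)) ** 2 for count in pooled.values())


def expected_cohen(a: Codes, b: Codes) -> float:
    # Each coder's own proportions, so systematic differences between
    # coders' use of categories change the chance baseline.
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    return sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)


def chance_corrected(a: Codes, b: Codes, expected: Callable[[Codes, Codes], float]) -> float:
    """(P_o - P_e) / (1 - P_e) with a pluggable expected-agreement function."""
    if not a or len(a) != len(b):
        raise ValueError("both coders must code the same, non-empty set of units")
    p_o = sum(x == y for x, y in zip(a, b)) / len(a)
    p_e = expected(a, b)
    if p_e == 1:
        raise ValueError("undefined when a single category is used throughout")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for five units (illustrative data only)
coder_a = ["pos", "neg", "neg", "neu", "pos"]
coder_b = ["pos", "neg", "pos", "neu", "pos"]
pi = chance_corrected(coder_a, coder_b, expected_scott)
kappa = chance_corrected(coder_a, coder_b, expected_cohen)
print(round(pi, 2), round(kappa, 2))  # 0.68 0.69
```

In this hypothetical data coder B uses “pos” more often than coder A (three units against two), so Cohen’s expected agreement (.36) is lower than Scott’s (.38) and kappa comes out slightly higher than pi.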
Once an agreement measure has been chosen, content analysts must decide how much content to test for reliability. Again, there is no firm consensus on this. Neuendorf (2002), following a review of commentary on the issue, has recommended that at least 10% of the full sample or a minimum of 50 units be tested.
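One way of operationalising this guideline is sketched below. It assumes, as an interpretation rather than Neuendorf’s own wording, that the reliability subsample should be the larger of 10% of the full sample and 50 units, capped at the full sample size, and that units are drawn by simple random sampling with a fixed seed so the draw can be reproduced. The function names and the seed value are hypothetical.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reliability_subsample_size(n_units: int, percent: int = 10, minimum: int = 50) -> int:
    """Larger of `percent`% of the sample and `minimum` units, capped at the sample size.

    ASSUMPTION: one reading of "at least 10% ... or a minimum of 50 units".
    """
    return min(n_units, max(math.ceil(n_units * percent / 100), minimum))


def draw_reliability_subsample(units: Sequence[T], seed: int = 2024) -> List[T]:
    """Simple random sample (without replacement) of units for double coding."""
    k = reliability_subsample_size(len(units))
    return random.Random(seed).sample(list(units), k)


print(reliability_subsample_size(300))   # 50  (10% would be only 30)
print(reliability_subsample_size(2000))  # 200
print(reliability_subsample_size(40))    # 40  (the whole sample)
```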
The very notion of reliability testing raises the question of what an acceptable level of reliability is. Recommendations differ widely. Landis and Koch (1977) suggested that a kappa score below 0.00 indicates poor agreement, a score between 0.00 and 0.20 indicates slight agreement, a score between 0.21 and 0.40 indicates fair agreement, a score between 0.41 and 0.60 indicates moderate agreement, a score between 0.61 and 0.80 indicates substantial agreement and a score between 0.81 and 1.00 indicates almost perfect agreement. Banerjee et al. (1999) have suggested that a kappa score of .75 or above indicates excellent agreement and a score between .40 and .75 indicates fair to good agreement.
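These verbal benchmarks amount to simple lookup tables, sketched below as hypothetical helper functions. The Landis and Koch bands follow their 1977 scale, which labels scores below 0.00 “poor” and scores from 0.00 to 0.20 “slight”. Treating each band’s upper bound as inclusive, and labelling scores below .40 “poor” on the Banerjee et al. benchmarks, are assumptions made here for illustration.

```python
from typing import List, Tuple

# (inclusive upper bound, label); inclusive boundaries are an assumption.
LANDIS_KOCH: List[Tuple[float, str]] = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch_label(kappa: float) -> str:
    """Verbal label for a kappa score on the Landis and Koch (1977) scale."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    for upper, label in LANDIS_KOCH:
        if kappa <= upper:
            return label
    raise AssertionError("unreachable: kappa <= 1.0 always matches the last band")


def banerjee_label(kappa: float) -> str:
    """Verbal label following the Banerjee et al. (1999) benchmarks."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"  # ASSUMPTION: label used here for scores below .40


for score in (0.15, 0.55, 0.78):
    print(score, landis_koch_label(score), "|", banerjee_label(score))
```

A score of .78, for instance, is only “substantial” on the Landis and Koch scale but “excellent” by the Banerjee et al. benchmarks, which illustrates how widely the recommendations differ.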
Krippendorff (2004) has recommended an alpha (his own chance-corrected coefficient, which for two coders and nominal data closely approximates Scott’s pi) of .80 or higher, although he allows more tentative conclusions to be drawn about variables with reliabilities between .67 and .80.
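For the simplest case relevant here (two coders, nominal categories, no missing codes), Krippendorff’s alpha can be sketched as one minus the ratio of observed to expected disagreement, both computed from a coincidence matrix in which each unit contributes the ordered pairs (A, B) and (B, A). The function names and data are hypothetical, the .80 and .67 cut-offs come from the recommendation above, and a full implementation would also need to handle more coders, missing data and other levels of measurement.

```python
from collections import Counter
from typing import Sequence


def krippendorff_alpha_nominal(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same, non-empty set of units")
    # Coincidence matrix: every unit contributes the pairs (a, b) and (b, a).
    coincidences: Counter = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n = 2 * len(coder_a)  # total number of pairable values
    # Marginal totals n_c: how often each category occurs among all values.
    totals: Counter = Counter()
    for (c, _), count in coincidences.items():
        totals[c] += count
    observed = sum(v for (c, k), v in coincidences.items() if c != k) / n
    expected_pairs = n * n - sum(t * t for t in totals.values())
    if expected_pairs == 0:
        raise ValueError("alpha is undefined when a single category is used throughout")
    expected = expected_pairs / (n * (n - 1))
    return 1 - observed / expected


def krippendorff_verdict(alpha: float) -> str:
    """Apply the .80 / .67 cut-offs recommended by Krippendorff (2004)."""
    if alpha >= 0.80:
        return "acceptable"
    if alpha >= 0.67:
        return "tentative conclusions only"
    return "insufficient"  # ASSUMPTION: wording for scores below .67


# Hypothetical codes for five units (illustrative data only)
coder_a = ["pos", "neg", "neg", "neu", "pos"]
coder_b = ["pos", "neg", "pos", "neu", "pos"]
alpha = krippendorff_alpha_nominal(coder_a, coder_b)
print(round(alpha, 2), krippendorff_verdict(alpha))  # 0.71 tentative conclusions only
```

With N units coded by two coders, this alpha equals 1 - (2N - 1)/(2N) x (1 - pi), so on small samples it sits slightly above Scott’s pi (about .71 against .68 for these five units) and the two converge as the sample grows.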
That opinions differ on how reliability results should be interpreted draws attention to the fact that reliability is itself a construct.
While critics might suggest that this undermines the scientific claims of content analysis, I suggest that differing interpretations of reliability scores are consistent with the moderate approach inherent in a postpositivist outlook.
Validity refers to the extent to which an instrument measures what it claims to measure. Validity (and in particular empirical validity) is widely acknowledged to be problematic in content analysis (Krippendorff, 2004, Potter and Levine-Donnerstein, 1999, Janis, 1965). Krippendorff (2004) has usefully distinguished three types of validity that are relevant to content analysis: face, social and empirical. We appeal to face validity when we accept research findings because they appear intuitively to “make sense”. Research has social validity when its findings are sought out by, and meaningful to, a particular constituency. Empirical validity is “the degree to which available evidence and established theory supports various stages of a research process, the degree to which specific inferences withstand the challenges of additional data, of the findings of other research efforts, of evidence encountered in the domain of the researcher’s research questions, or of criticisms based on observations, experiments, or measurements as opposed to logic or process” (Krippendorff, 2004, p. 315). Each content analyst should be able to identify what, in their view, makes their analysis valid.
5.4.2.3 Key Debates in Quantitative Content Analysis
While quantitative content analysts largely agree on what quantitative content analysis entails, important points of contention do exist between them. Two of these, discussed here, relate to the purpose of content analysis and the distinction between latent and manifest content.
Three potential purposes of content analysis are to describe communication, to draw inferences about the context in which communication is produced, and to draw inferences about the context in which it is consumed. Early textual content analysis tended to focus solely on describing trends in communication content, and some commentators continue to identify a role for purely descriptive content analysis (Riffe et al., 2005); this application, however, is now routinely criticised for being disconnected from social life (Shapiro and Markoff, 1997). Although modern content analyses are commonly explicitly concerned with making inferences, therefore, whether inference is an essential or an optional element of quantitative content analysis remains contested.
Another area of contention between content analysts concerns whether analysis may, or must, go beyond manifest content to consider latent content. Shapiro and Markoff (1997) point out that positions on this question range from the view that only manifest content may be analysed to the opposite extreme, which implies that only latent content is of genuine interest. The meanings of manifest and latent content warrant interrogation. Holsti (1969, p. 12) defined manifest content simply as “the surface meaning of a text”, in contrast to latent content, which he defined as “the deeper layers of meaning embedded in the document”. The notion of manifest content implies that content is inherent in texts although, as Krippendorff (2004) notes, alternative definitions suggest that content can be a property of the source of a text, or may only emerge as a researcher analyses a text relative to a particular context. A key question is how manifest content (if it exists) can be identified and distinguished from latent content (if it exists). Most texts that deal with this issue suggest that manifest content should be equated with the existence of widespread agreement on what a text means (e.g. Riffe et al., 2005). Although this definition of manifest content is common, it is not universal. George (1959), for example, argued that experts may well achieve high reliability in coding latent meanings.
Given the contention surrounding the concepts of manifest and latent content, it is not clear to what extent labelling particular elements of content as manifest or latent aids clarity, and a number of researchers have criticised the application of the dichotomy on the basis that no clear-cut distinction exists (Shapiro and Markoff, 1997). Neuendorf (2002), for example, has suggested a continuum approach, in which content is considered along a range from highly manifest to highly latent.
5.4.3 Justification for the Use of Quantitative Content Analysis