Full-Text Coding for Small Dyads - Validity of the SAID Data Set


4 Research Design and Methods

4.4 Validity of the SAID Data Set

4.4.4 Full-Text Coding for Small Dyads

For densely reported dyads such as U.S.-South African relations, the coding of lead sentences provides thorough coverage of political events. The lead is a declarative sentence at the beginning of a news story that summarizes the whole story with a rather simple syntax and is therefore easy for the machine to code. Poorly covered dyads, so-called small dyads,³⁹ by contrast yield only low frequencies of reported events when only leads are coded.

Event data scholars therefore started to experiment with full-story coding, taking the whole news text as a potential source for the coding of events. Huxtable (1985), for example, demonstrated that news reports on West Africa, while difficult to code and subject to unreliable and inconsistent coverage, would nonetheless produce meaningful event data when coded from the whole body of the available news stories. Specific tests by Schrodt and Gerner (1997: 1/33) comparing lead-sentence and full-story coding on a data set dealing with the Persian Gulf region (which receives more sporadic coverage than the highly covered Middle East, for example) are consistent with Huxtable’s findings: states perceived as peripheral by the international media are more likely to be discussed only in the body of a news story and not in the head. Furthermore, the full story is also more likely to present secondary events that occurred but which did not by themselves justify a separate story (and, thus, did not produce a lead sentence by themselves).

39 A small dyad is understood as a dyad that includes one or two so-called “small states” (according to Sommer and Scarritt 1999).

On the other hand, full-story coding involves a significant potential problem: it is subject to more false positives than lead-sentence coding, since the body of a news story may return to events that have already been reported earlier in the same story (and, thus, have already been coded). Filter programs mitigate this problem to a certain degree. I applied a one-a-day filter⁴⁰ to the full-story coded data sets, which enforces the rule that each dyad can have only one event per coding category per day. In this way, the algorithm eliminates most of the false positives at the expense of a few correctly coded events.
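The one-a-day rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the KEDS filter utility itself; the `Event` record, its field names, and the sample events are hypothetical.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; field names are assumptions for illustration."""
    date: str      # ISO date of the report, e.g. "1986-06-12"
    source: str    # acting party, e.g. "USA"
    target: str    # receiving party, e.g. "ZAF"
    category: str  # two-digit cue category, e.g. "04"


def one_a_day(events: Iterable[Event]) -> list[Event]:
    """Keep only the first event per (date, source, target, category)."""
    seen: set[tuple[str, str, str, str]] = set()
    kept: list[Event] = []
    for event in events:
        key = (event.date, event.source, event.target, event.category)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Made-up sample: the second record repeats the first one further down the story.
raw = [
    Event("1986-06-12", "USA", "ZAF", "04"),
    Event("1986-06-12", "USA", "ZAF", "04"),  # dropped: same dyad, category, day
    Event("1986-06-12", "USA", "ZAF", "01"),  # kept: different category
    Event("1986-06-13", "USA", "ZAF", "04"),  # kept: different day
]
filtered = one_a_day(raw)
```

Note that the sketch cannot distinguish a repeated mention from a genuine second event of the same category on the same day; both are dropped, which is the cost in correctly coded events mentioned above.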

To validate the full-story coded events obtained from AP, I compare in the following pages different full-story coded event series with event series coded from AP lead sentences.

First, I will consider the U.S.-South African dyad (USA>ZAF), for which lead-sentence coding produces a decent number of events per year, so that a small-n situation hardly ever occurs. U.S.-South African relations therefore provide a suitable dyad to test how adequately full-story coding reproduces the general patterns found in highly reliable lead-sentence coding.

Then, I turn to the more critical dyads including Germany (DEU-ZAF), Sweden (SWE-ZAF), and Switzerland (CHE-ZAF).

U.S.-South African Relations

It was shown earlier (Table 7, p. 98, and Figure 15, p. 119) that the U.S.-South Africa dyad is comparatively well covered by international media. It can therefore be assumed that events coded from lead sentences adequately describe the events occurring in this particular dyad. Full-text coding of the relevant news stories is expected merely to deliver a higher frequency of events, whereas the relative frequencies of the reported event types should not differ from the lead-sentence coded data.

40 Available from the KEDS-Website: http://www.ku.edu/~keds/software.dir/utilities.html. Accessed 18 August 2006.

Figure 17 below confirms that the USA>ZAF dyad features significantly more monthly events when the data are coded from the full stories (monthly mean of 17.4, standard deviation 1.32) than from lead sentences (monthly mean of 5.65, standard deviation 0.44). But the full-story and the lead-sentence event series are highly correlated over time (r = 0.93, significant at the 0.01 level), indicating that full-story and lead-sentence coded event data for the USA>ZAF dyad differ significantly in level but not in their variation over time.

Figure 17: Total Monthly Event Counts in USA>ZAF Dyad Coded from AP Lead and Full-text

Whether full-story and lead-sentence coded data also display similar behavior patterns within the dyad will now be tested by a comparison of event counts across the SAID/CAMEO cue categories.

Figure 18: Event Counts per SAID/CAMEO Category for the USA>ZAF Dyad, 1977-1996 (AP)

Figure 18 above shows that, for the USA>ZAF dyad, events coded from AP full stories and from lead sentences are in fact distributed similarly across the event categories. Event frequencies coded from full stories exceed those obtained from lead sentences in every cue category, but the relative distributions across the twenty event categories are highly correlated (r = 0.99, significant at the 0.01 level). It can therefore be assumed that events coded from full stories describe very similar behavioral patterns, at least for the U.S.-South African dyad.
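Why a large difference in level is compatible with a correlation close to 1 can be illustrated with a small sketch. The counts below are invented for five categories and are not taken from the SAID data; the function is a plain implementation of the Pearson coefficient, which I assume is the correlation measure reported here.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical event counts per category (illustration only).
lead = [40, 25, 5, 60, 10]
# Full-story counts that are about three times higher in every category.
full = [3 * count + 2 for count in lead]

r = pearson_r(lead, full)  # close to 1 despite the much higher level
```

Because the coefficient is unaffected by a linear rescaling of either series, it captures only whether the two coding modes agree on the pattern across categories, not on the number of events.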

Considering individual years, however, minor differences between full-story and lead-sentence coded data may occur (Figure 19).

Figure 19: Event Counts per SAID/CAMEO Category and Year for the USA>ZAF Dyad (AP)

Note: Graphs differ in scale.

In two of the four selected years displayed in Figure 19, namely 1982 and 1986, the correlation between the full-story and the lead-sentence coded event distributions is marginally, though significantly, lower. In both years, the lower correlation is caused by one particular event category that is coded differently under the two coding modes: in 1982, the full-story coded series displays significantly more events in SAID/CAMEO category 01 (Make Public Statement) than the lead-sentence coded series; in 1986, the differences are significant in SAID/CAMEO category 04 (Consult).

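Table 16 demonstrates that excluding the respective outlying category increases the correlation between the full-story and lead-sentence coded event series: from 0.89 to 0.97 for 1982 when category 01 is excluded, and from 0.95 to 0.96 for 1986 when category 04 is excluded.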

Table 16: Correlations of AP Full-story and Lead-sentence Coded Event Counts per CAMEO Category

                        USA>ZAF 1982   USA>ZAF 1986   USA>ZAF 1990   USA>ZAF 1994
Correlation full-lead   0.89***        0.95***        0.96***        0.98***
CAMEO 01 excl.          0.97***        0.87***        0.95***        0.97***
CAMEO 04 excl.          0.81***        0.96***        0.94***        0.96***

***significant at the 0.01 level
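The logic behind the exclusion rows of Table 16 can be sketched as a leave-one-category-out check: the correlation is recomputed with each category dropped in turn. The category counts below are invented to show the effect of a single inflated category and do not reproduce the SAID data; the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def leave_one_out(lead: dict[str, float], full: dict[str, float]) -> dict[str, float]:
    """Correlation between lead and full counts with each category excluded in turn."""
    categories = sorted(lead)
    result = {}
    for dropped in categories:
        kept = [c for c in categories if c != dropped]
        result[dropped] = pearson_r([lead[c] for c in kept], [full[c] for c in kept])
    return result


# Hypothetical counts: category "01" is heavily inflated in the full-story series.
lead = {"01": 10, "02": 4, "03": 8, "04": 20, "05": 2}
full = {"01": 90, "02": 12, "03": 25, "04": 58, "05": 7}

baseline = pearson_r([lead[c] for c in sorted(lead)], [full[c] for c in sorted(lead)])
without = leave_one_out(lead, full)
outlier = max(without, key=without.get)  # category whose removal raises r the most
```

In this made-up example, dropping the inflated category restores a correlation close to 1, while dropping any other category leaves it well below that.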

Differences between full-story and lead-sentence coding also appear in sub-dyads. Sub-dyads represent a subset of the relations of a superordinate dyad. In U.S.-South African relations (the superordinate dyad), the following sub-dyads are distinguished: U.S. relations towards the South African government (USA>ZAFGOV), the South African opposition (USA>ZAFOPP), and South African civil society (USA>ZAFCVS). Table 17 indicates that full-story coding increases the total number of coded events quite uniformly across the distinguished sub-dyads of U.S.-South African relations. Compared to lead-sentence coding, full-text coding raises the event count to roughly 300% of the lead-sentence level, that is, to about three times as many events, in each sub-dyad.

Table 17: Total Number of Events per Dyad and Coding Mode, 1977-1996 (AP)

Coding Mode     USA>ZAFx   USA>ZAFGOV   USA>ZAFOPP   USA>ZAFCVS
Lead-sentence   1356       348          94           60
Full-story      4179       987          328          203
Improvement     308%       284%         349%         338%
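The values in the Improvement row of Table 17 can be reproduced if they are read as the full-story count expressed as a percentage of the lead-sentence count. That reading is an inference from the numbers, not something the text states explicitly. A short check using the counts from the table:

```python
# Event totals from Table 17 (AP, 1977-1996).
lead_counts = {"USA>ZAFx": 1356, "USA>ZAFGOV": 348, "USA>ZAFOPP": 94, "USA>ZAFCVS": 60}
full_counts = {"USA>ZAFx": 4179, "USA>ZAFGOV": 987, "USA>ZAFOPP": 328, "USA>ZAFCVS": 203}


def ratio_percent(full: int, lead: int) -> int:
    """Full-story count as a rounded percentage of the lead-sentence count."""
    return round(100 * full / lead)


improvement = {
    dyad: ratio_percent(full_counts[dyad], lead_counts[dyad]) for dyad in lead_counts
}
# Net gain over the lead-sentence level, in percentage points.
net_increase = {dyad: pct - 100 for dyad, pct in improvement.items()}
```

Under this reading, a value of 308% means about three times as many events, which corresponds to a net increase of roughly 208% over the lead-sentence count.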

Differences between the sub-dyads can be located by comparing, for each sub-dyad, how closely the event counts from the two coding modes agree. The correlation coefficients in Table 18 below show that the two coding modes diverge significantly more in the sub-dyad of U.S. relations towards South African civil society (USA>ZAFCVS) than in the other three analyzed sub-dyads (USA>ZAFx, USA>ZAFGOV, USA>ZAFOPP).

Table 18: Correlations of Event Counts per Event Category in the USA>ZAF Dyad

                        USA>ZAFx   USA>ZAF   USA>ZAFGOV   USA>ZAFOPP   USA>ZAFCVS
Correlation Full-Lead   0.99***    0.98***   0.97***      0.98***      0.84***

***significant at the 0.01 level

The USA>ZAFCVS sub-dyad, which shows a lower correlation between full-text and lead-sentence coded event counts (Table 18), is the smallest of the four analyzed sub-dyads in U.S.-South African relations. It can therefore be assumed that full-text coding first and foremost improves event coding for such small dyads.

German, Swedish and Swiss South African Relations

The question remains whether such an improvement can also be achieved for small-state dyads such as Swedish and Swiss relations towards South Africa. Table 19 indicates that full-story coding indeed increases the number of events significantly for these dyads. Rather surprisingly, however, the biggest improvement was achieved not in the small dyads of Swedish (SWE-ZAF) and Swiss (CHE-ZAF) relations with South Africa but in the German-South African dyad (DEU-ZAF). Given the small number of events (39) picked up in the DEU-ZAF dyad by lead-sentence coding in the first place, DEU-ZAF must actually be considered a small dyad too.

Table 19: Total Number of Events per Dyad and Coding Mode, 1977-1996 (AP)

Coding Mode     DEU-ZAF   SWE-ZAF   CHE-ZAF
Lead-sentence   39        24        19
Full-story      277       105       91
Improvement     710%      438%      479%

Over the whole period from 1977 to 1996, the correlation between full-story and lead-sentence coded event counts remains rather low for these three dyads (Table 20). With coefficients between 0.43 and 0.69, the two series share at most about half of their variance (r² between roughly 0.18 and 0.48).

Table 20: Correlations of Event Counts per Event Category in the DEU-ZAF, SWE-ZAF, and CHE-ZAF Dyads

                        DEU-ZAF   SWE-ZAF   CHE-ZAF
Correlation Full-Lead   0.51***   0.43***   0.69***

***significant at the 0.01 level

These differences between full-story and lead-sentence coding for the small dyads DEU-ZAF, SWE-ZAF, and CHE-ZAF, both in level and in variance, suggest that the two coding modes produce data carrying different information. I therefore test in the following how the coded events are distributed across the SAID/CAMEO event categories.

The event distribution across the SAID/CAMEO categories for the three dyads of DEU-ZAF, SWE-ZAF and CHE-ZAF (Figure 20) in fact confirms that full-story coding not only generates higher numbers but also different types of events. Figure 20 below also shows that full-story coding is able to pick up specific event types that are not covered when lead sentences are coded.

Figure 20: Yearly Event Counts per Category for DEU-ZAF, SWE-ZAF, CHE-ZAF 1977-1996 (AP)

Note: Graphs differ in scale.

As Table 21 indicates, the overall pattern of the distribution of the coded events across the SAID/CAMEO event categories does not differ strongly from full-story to lead sentence coding. But the correlation coefficients are significantly lower for the DEU-ZAF, SWE-ZAF, and CHE-ZAF dyads (Table 21) than is the case for the more densely covered USA>ZAF dyad (Table 18 above).

Table 21: Correlations of Event Category Distribution in the DEU-ZAF, SWE-ZAF, and CHE-ZAF Dyads

                        DEUx-ZAFx   SWEx-ZAFx   CHEx-ZAFx
Correlation Full-Lead   0.79***     0.69***     0.84***

***significant at the 0.01 level

Conclusion

This brief analysis indicates that full-story and lead-sentence coding do not result in significantly different data when the analyzed dyad is well covered by the international media, as is the U.S.-South Africa dyad (USA>ZAF). In this particular case, full-story coding picks up more events than lead-sentence coding, but the event series generated by the two coding modes differ only in level, not in variance. Moreover, the reported behavioral patterns are very similar.

When it comes to small dyads, that is, dyads less densely covered by the international media, full-story coding is able to pick up events that lead-sentence coding misses. Event data series generated by full-story and lead-sentence coding differ not only in level but also in variance, indicating that the two coding modes in fact cover behavioral patterns within small-state dyads differently. For the analysis of small dyads, full-story coding is therefore preferable to coding lead sentences alone, since broader information about interactions in such dyads can be tapped, at the expense of a few more false positives.

The better performance of full-story coding on small dyads is partly due to the newswires’ style of reporting on such small dyads. As Schrodt and Gerner (2000; 1998: 1/33) and Huxtable (2000) noted, international wire services sometimes append a series of event reports to the end of a story on a particular region. These “side-events” are often unrelated to the main story except for their regional focus. The underlying editorial model appeared to be, as Schrodt and Gerner (1997: 17) put it, “Hey, if you’re sufficiently interested in this out-of-the-way region to read this far, you’ll probably enjoy this other stuff as well.”

In document Intervening against apartheid : the South Africa policy of the United States, West Germany, Sweden and Switzerland, 1977-1996 (Page 125-133)