
CHAPTER 4: METHODOLOGY: QUANTIFYING RHETORIC AND REALITY

4.3 CODIFYING SOCIETAL BEHAVIOR

4.3.2 TABARI AND CAMEO

One of the most widely used event codification systems today is the TABARI software program (formerly known as KEDS), developed by Philip Schrodt at Pennsylvania State University. TABARI is an open-source program that accepts a collection of news articles as input and automatically processes them with a grammar-based parsing system to identify more than 300 categories of societal behavior.

For each event, it outputs a dyadic record stating that Actor1 performed a given action toward Actor2, along with the date and location of the action (Schrodt & Yonamine, 2012). When multiple actors are involved, as in a multiparty peace summit, multiple entries are created to record all of the dyadic connections. TABARI is a self-contained, fully autonomous coding system that operates in unattended batch mode, accepting a collection of news articles as input and producing a tab-delimited file that lists the extracted events.
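The expansion of a multi-actor event into dyadic records can be illustrated with a minimal sketch. The field names, actor codes, date, and location below are hypothetical and chosen only for illustration; they are not TABARI's actual output schema. The sketch also assumes that each dyad is recorded in both directions, which the text above does not specify.

```python
import csv
import io
from itertools import permutations

# Hypothetical column layout; TABARI's real output format may differ.
FIELDS = ["date", "actor1", "actor2", "event_code", "location"]


def dyadic_records(actors, event_code, date, location):
    """Expand one multi-actor event into directed (actor1, actor2) records.

    Assumption: every ordered pair of participants yields its own record.
    """
    return [
        {
            "date": date,
            "actor1": a1,
            "actor2": a2,
            "event_code": event_code,
            "location": location,
        }
        for a1, a2 in permutations(actors, 2)
    ]


def to_tsv(records):
    """Serialize records as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Illustrative three-party meeting (made-up values).
summit = dyadic_records(["EGY", "ISR", "USA"], "040", "2011-02-01", "Cairo")
print(to_tsv(summit))
```

Three participants produce six directed records under this assumption; if dyads were treated as undirected, `itertools.combinations` would yield three instead.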

TABARI has traditionally been applied only to the lead sentence of each news article, recording just the most “important” event in each story. However, a growing body of literature has demonstrated the need for full-story coding, in which the entire article text is coded and all events are identified, to adequately cover many regions, and that approach is taken here (Schrodt, Simpson & Gerner, 2001; Huxtable, 1997).

TABARI ordinarily records events at the level of the “country-day,” in which more precise spatial and temporal information is discarded. This is problematic during periods of intense conflict, such as the 2011 Egyptian revolution, when there were protests throughout the country. Thus, in keeping with Schrodt & Yonamine (2012), the fulltext geocoding process of Leetaru (2012) is used to associate each record with the closest geographic location in context. This preserves the geospatial resolution of high-intensity conflicts, ensuring they are properly recorded. Multiple mentions of the same event in the same or different articles are collapsed during a deduplication process to ensure that high-profile events attracting significant media attention are not counted multiple times.
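A minimal sketch of such a deduplication step follows. It assumes, purely for illustration, that two mentions refer to the same event when their date, actors, event code, and resolved location all match; the actual matching criteria used in this study are not specified above, and the field names and sample values are hypothetical.

```python
def deduplicate(events):
    """Collapse repeated mentions of an event, keeping the first one seen.

    Assumption: identity is defined by the tuple
    (date, actor1, actor2, event_code, location).
    """
    seen = set()
    unique = []
    for ev in events:
        key = (
            ev["date"],
            ev["actor1"],
            ev["actor2"],
            ev["event_code"],
            ev["location"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(ev)
    return unique


# Made-up mentions: the first two describe the same event in different articles.
mentions = [
    {"date": "2011-01-28", "actor1": "EGYOPP", "actor2": "EGYGOV",
     "event_code": "141", "location": "Cairo", "article": "a1"},
    {"date": "2011-01-28", "actor1": "EGYOPP", "actor2": "EGYGOV",
     "event_code": "141", "location": "Cairo", "article": "a2"},
    {"date": "2011-01-28", "actor1": "EGYOPP", "actor2": "EGYGOV",
     "event_code": "141", "location": "Alexandria", "article": "a3"},
]
unique_events = deduplicate(mentions)
```

Because location is part of the key, simultaneous protests in different cities remain separate records, while repeated coverage of a single protest is counted once.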

While the news media typically report on events that took place on the current or previous day, they occasionally report on future events or on events in the more distant past. TABARI has a dedicated date resolution system that automatically recognizes references like “two months ago” or “next week” and uses a calendaring system to resolve these to the appropriate date. To prevent forward-looking references from skewing the forecasting results (e.g., an article published on a Monday stating that a peace summit will be held the following Friday), all events with forward offsets (i.e., those dated after the article itself) are excluded from the forecasts in the next chapter. Thus, only events occurring on the day of the news article or earlier are included.
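The exclusion of forward-looking events can be sketched as follows. This is an illustrative simplification, not TABARI's date resolver: it assumes each extracted mention already carries a signed offset in days relative to the article's publication date (a hypothetical representation), and the sample mentions are invented.

```python
from datetime import date, timedelta


def resolve_event_date(article_date, offset_days):
    """Turn a relative offset (negative = past, positive = future) into a date."""
    return article_date + timedelta(days=offset_days)


def keep_past_and_present(mentions):
    """Drop mentions dated after the article; resolve the rest to calendar dates.

    Each mention is a tuple (article_date, offset_days, description).
    """
    kept = []
    for article_date, offset_days, description in mentions:
        if offset_days > 0:
            # Forward offset: the event has not happened yet, so exclude it.
            continue
        kept.append((resolve_event_date(article_date, offset_days), description))
    return kept


monday = date(2011, 1, 24)  # a Monday
mentions = [
    (monday, 0, "protest reported today"),
    (monday, 4, "peace summit planned for Friday"),
    (monday, -7, "talks held a week ago"),
]
kept = keep_past_and_present(mentions)
```

Here the Friday summit is discarded, while the same-day protest and the talks from the previous week are retained with their resolved dates.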

TABARI places its events into the Conflict And Mediation Event Observations (CAMEO) event taxonomy, which has its roots in a succession of event taxonomies stretching back several decades and is one of the most widely used taxonomies (Gerner et al., 2002). The version of the CAMEO event taxonomy used here (version 1.1b3) consists of 310 distinct event categories, such as Code 1661 “Expel Or Withdraw Peacekeepers”, Code 1832 “Carry Out Car Bombing”, or Code 0311 “Express Intent to Cooperate Economically.” These fall under the following 20 root categories:

• 01: Make Public Statement
• 02: Appeal
• 03: Express Intent to Cooperate
• 04: Consult
• 05: Engage in Diplomatic Cooperation
• 06: Engage in Material Cooperation
• 07: Provide Aid
• 08: Yield
• 09: Investigate
• 10: Demand
• 11: Disapprove
• 12: Reject
• 13: Threaten
• 14: Protest
• 15: Exhibit Force Posture
• 16: Reduce Relations
• 17: Coerce
• 18: Assault
• 19: Fight
• 20: Use Unconventional Mass Violence

Since many of these categories, especially Unconventional Mass Violence, will contain relatively few events for most countries, and to simplify the study of political dynamics, the CAMEO framework offers the concept of “Quad Classes” (Gerner et al., 2002) that aggregate the individual categories to the “general behavioral level.” Categories 01 through 05 are grouped under the heading of “Verbal Cooperation”, categories 06 to 09 under “Material Cooperation”, 10 to 14 under “Verbal Conflict”, and 15 to 20 under “Material Conflict.” This offers considerable simplification when exploring the kinds of broader trends of interest here.
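The aggregation from individual event codes to Quad Classes can be sketched as a simple lookup. The sketch assumes that CAMEO codes are handled as zero-padded strings whose first two digits give the root category, which is consistent with the example codes cited above; the function names are hypothetical, and the ranges are those stated in this section.

```python
from collections import Counter

# Root-category ranges for each Quad Class, as given in the text above.
QUAD_CLASSES = [
    (range(1, 6), "Verbal Cooperation"),
    (range(6, 10), "Material Cooperation"),
    (range(10, 15), "Verbal Conflict"),
    (range(15, 21), "Material Conflict"),
]


def root_category(cameo_code):
    """Return the root category (1 to 20) from a zero-padded CAMEO code string."""
    return int(cameo_code[:2])


def quad_class(cameo_code):
    """Map a CAMEO event code to its Quad Class label."""
    root = root_category(cameo_code)
    for roots, label in QUAD_CLASSES:
        if root in roots:
            return label
    raise ValueError(f"Unrecognized CAMEO code: {cameo_code!r}")


# The three example codes cited earlier in this section.
codes = ["0311", "1661", "1832"]
counts = Counter(quad_class(code) for code in codes)
```

Counting events per Quad Class in this way yields four series per country rather than 310, which is the simplification the Quad Classes are intended to provide.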

A significant benefit of the TABARI system is that it allows the same news content to be used for both the analysis of narrative and the construction of the event database. This is important, as it ensures that any forecasting error is attributable to the model itself rather than to a disconnect between the narrative and event sources. Take the example of Agence France-Presse news coverage of Egypt being used to forecast ACLED’s Egyptian event records. If the resulting accuracy of the models were low, it would be difficult to determine whether this was because Agence France-Presse coverage does not contain strong predictive narrative indicators, or because the ACLED database listed event types that the news agency did not cover in detail. Using the same news source for the entire processing pipeline avoids this potential source of error.

While any coding system, machine- or human-based, will suffer from a certain level of error in codifying essentially qualitative occurrences into precise quantitative records, the combined TABARI + CAMEO system was the only system to pass the rigorous tests of the DARPA ICEWS competition (O’Brien, 2010).

In this competition, a wide array of state-of-the-art event recognition systems were applied to the same corpus of mainstream news articles and required to automatically recognize and codify all events within it. The TABARI + CAMEO system performed so well that it is now the coding system used in the ICEWS United States Department of Defense operational watchboard, which compiles a daily list of political events worldwide to help US military and intelligence analysts monitor global stability.