
5.4 Inter-annotator agreement and corpus analysis

5.4.1 Disagreement case study

Since several annotators underwent training but did not go on to annotate other articles, we use their redundant annotations of the training documents to illustrate variability. The caveat is that some annotators may not have sufficiently grasped the task, although others are likely to have been highly attentive to the annotation guidelines.

One training document contains the following background to a news story about Australian Federal Lower House preselection in the electorate of Dickson:

(47) Mr Dutton [won]a Dickson from Labor’s Cheryl Kernot in 2001. Ms Kernot [won]b the seat for Labor in 1998 after [defecting]c from the Democrats.

Of the 14 annotations of this text,¹² 13 marked a linkable event for a and b,¹³ while only 8 marked c, reflecting either c’s lower salience owing to its position in a subordinate clause, or annotators deeming the event not newsworthy.

The linked article for c was identified with the highest consistency: six annotators agreed on the news report, one chose an analysis article published the same day, and the last selected ‘not found’.¹⁴ Both selected articles mention Kernot’s defection, but the canonical report is readily identifiable, opening with language similar to the 2009 article: ‘In a stunning political coup, the Leader of the Australian Democrats, Senator Cheryl Kernot, has defected to the Labor Party…’; the analysis article presupposes that fact: ‘Ms Cheryl Kernot’s departure from the Democrats is a devastating blow to the party…’. Similarly for a, apart from one annotator who targeted a story four days later,¹⁵ all chose the same day’s news: eight selected an article directly reporting Dutton’s win against Kernot, three an article on changed voting patterns in the broader region, and one an article focusing on Kernot’s departure. Thus, where multiple stories report different perspectives on an event, annotators may disagree on which of them is the target, or may not make the effort to distinguish between them.

While a and b appear similar on the surface, linking b is more ambiguous and more prone to technical error. Table 5.3 shows the distribution of annotations among six articles: three annotators chose articles published before the election, another two selected targets well after it, and most were split between an article published shortly before Kernot announced her win and one published shortly after.

¹² This includes six annotations performed with the pilot schema and eight with the final schema. We do not consider the changes to substantively affect this example.

¹³ For an unknown reason, despite the textual similarity, the annotator who did not mark a did mark b, and vice versa. Neither of these annotators contributed to the final corpus.

¹⁴ This seems to be a spurious error: the annotator in question made a number of searches with the incorrect constraint that the event was reported in 1998. After viewing two other candidates from 1997, the annotator viewed the canonical target before selecting ‘not found’.

Freq.  Date        Target headline or event
1      1997-12-06  Lees rids herself of Kernot’s style
2      1998-09-19  Where is Kernot? In winner’s tent
       1998-10-03  Polling for election
–      1998-10-05  Outburst by Kernot ‘intemperate’
–      1998-10-06  Kernot needs miracle to save political career
–      1998-10-07  Kernot says sorry, but she’ll quit if she loses
–      1998-10-09  Kernot edges towards Parliament
–      1998-10-10  Kernot edges further ahead
–      1998-10-12  Kernot lags, but pins hope on a recount
–      1998-10-13  Women the losers on ALP front bench
–      1998-10-14  Kernot almost home, but just where is it?
–      1998-10-15  Lib claims muckraking, says Kernot
4      1998-10-16  Accusations continue as Kernot firms
       1998-10-17  Kernot announces electoral victory
3      1998-10-19  Opponents join to take shine off Kernot’s win
1      1998-10-21  Playing the diplomat may be Kernot’s hardest task in her new mega-job
1      1998-10-31  Poll proves high profiles count
1                  Target not found
1                  Event unmarked

Table 5.3: Linking ‘Kernot won the seat’: the distribution of fourteen annotations and the timeline from election to declared victory, with a representative article per day where

Unlike Dutton, Kernot is mentioned many times both before and after her electoral win. The sheer frequency of Kernot’s appearances in 1998 news makes the annotation more technically involved: it requires either manually examining many candidates, or narrowing them down through fine-grained keyword searches or date constraints. The date constraint explicit in the article (‘in 1998’) does not provide a tight bound. If the annotator appreciates that the target cannot precede the Australian federal election, they can use its date, easily found on Wikipedia, as a constraint, but still need to look through many articles between that date and Kernot’s victory. Of the three annotators who did not ensure that the link target followed the election, two were misled by the headline ‘Where is Kernot? In the winner’s tent’, which elides the fact that the article is merely a pre-election prediction. Thus a naïve perusal of search results may not suffice to select the correct target either. Alternatively, adding keywords may vastly reduce the number of candidates, but miss the canonical target. Consider, for example, including ‘won’ as well as ‘Kernot’ and ‘Dickson’ in a query: of the four articles annotators chose following the election, all contain the term ‘won’ except for ‘Opponents join to take shine off Kernot’s win’, which we argue is the correct target. In this particular instance, the problem is also one of insufficient term normalisation on the part of our search engine:¹⁶ annotators might expect the level of query processing applied in mainstream web search engines, which would match ‘win’ (and perhaps even ‘victory’, used in the article body) for a query containing ‘won’. These technical problems result in part from the random-access nature of the task: news as a genre is designed to be read shortly after publication, and to some extent each day’s news supplants the previous day’s. For annotators divorced from that synchronous, knowledge-building experience, a lot of work may be necessary to pinpoint a particular story accurately.
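The two search strategies discussed above, a date constraint and keyword matching, can be sketched in Python. This is a minimal illustration under stated assumptions: the dates and headlines come from Table 5.3, but matching is done on headlines only (the article bodies are not available here), and the tokeniser, the plural-stripping rule and the one-entry `IRREGULAR` lemma table are hypothetical stand-ins, not Porter2 or Solr’s actual analysis chain.

```python
import re
from datetime import date

# Date of the 1998 federal election ("Polling for election" in Table 5.3).
ELECTION_DAY = date(1998, 10, 3)

# Dates and headlines of the six articles chosen for b in Table 5.3.
# Only headlines are matched: the article bodies are not reproduced here.
HEADLINES = [
    (date(1997, 12, 6), "Lees rids herself of Kernot's style"),
    (date(1998, 9, 19), "Where is Kernot? In winner's tent"),
    (date(1998, 10, 16), "Accusations continue as Kernot firms"),
    (date(1998, 10, 19), "Opponents join to take shine off Kernot's win"),
    (date(1998, 10, 21),
     "Playing the diplomat may be Kernot's hardest task in her new mega-job"),
    (date(1998, 10, 31), "Poll proves high profiles count"),
]

# Hypothetical lemma table for an irregular past tense: a suffix-stripping
# stemmer has no rule that relates "won" to "win".
IRREGULAR = {"won": "win"}


def normalise(token, use_lemmas):
    """Toy normaliser: drop quotes and a possessive 's, optionally map
    irregular forms, then strip a plural -s. This is NOT Porter2."""
    token = token.strip("'")
    if token.endswith("'s"):
        token = token[:-2]
    if use_lemmas and token in IRREGULAR:
        return IRREGULAR[token]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def terms(text, use_lemmas):
    """Set of normalised lower-case terms in a piece of text."""
    return {normalise(t, use_lemmas) for t in re.findall(r"[a-z']+", text.lower())}


def search(query, not_before, use_lemmas):
    """Headlines dated on or after `not_before` containing every query term."""
    wanted = terms(query, use_lemmas)
    return [headline for day, headline in HEADLINES
            if day >= not_before and wanted <= terms(headline, use_lemmas)]


# The date constraint removes the misleading pre-election headlines,
# but still leaves several candidates ...
print(search("Kernot", ELECTION_DAY, use_lemmas=False))
# ... adding "won" without lemmatisation excludes the correct target ...
print(search("Kernot won", ELECTION_DAY, use_lemmas=False))
# ... whereas mapping "won" to "win" recovers it.
print(search("Kernot won", ELECTION_DAY, use_lemmas=True))
```

Because Porter2, like the toy rule here, works by stripping suffixes, it leaves ‘won’ unchanged and cannot relate it to ‘win’; a lemma lookup of the kind assumed above is one way a search engine could bridge such irregular forms.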

Yet the prevalence of error must also be explained by a lack of clarity in the event reference or by misunderstandings of the task. The certainty of Kernot’s win fluctuates in the two weeks following the election. In the report of 16 October, ‘Ms Cheryl Kernot appears to have won the seat of Dickson, with further recounting yesterday’, although she does not claim victory until the weekend of the 17th, which is reported on the 19th. The ambiguity between these two articles may arise from the semantics of ‘Kernot won’: is the winning an automatic result of the poll, or is it subject to her claiming victory? Is it then also subject to the absence of later court rulings invalidating (and then upholding on appeal, etc.) that victory? Had the source article instead used the paraphrase ‘Kernot was elected to the seat’, would that change the link target? We are again struck by the effect of lexical choice on identifying a precise referent, and by the possibility of reading many event references with either narrow or wide interpretations.¹⁷ Alternatively, the ambiguity may stem from how annotators interpret our schema’s requirement to link to the article first reporting the event as having happened or begun (appendix Section B.2): the article on the 16th reports that the event in question appears to have happened, while the later article is more assertive.¹⁸ The requirement to identify the first article reporting the event often conflicts with the desire to find an article in which the event’s occurrence is asserted as certain. However, the article of the 16th cannot be rejected merely because a better candidate exists on the 19th: had the publication not chosen to report the claimed victory, a consistent schema would still have to reject the earlier article and prescribe ‘not found’ as the correct annotation. This indicates that our schema needs further explication to ensure such consistency.

¹⁶ Since the Solr search engine we employ applies Porter2 stemming, the problem also lies in the morphological irregularity of ‘won’; had the article used ‘elected’, the problem would have been a different one.

¹⁷ One might argue that ‘elected’, or even ‘won’, represents a compound event in some uses, and that it is ambiguous in the context of Example 47.

Annotators (at least at this stage in training) also violate the schema by choosing articles that do not report the event: an article predicting Kernot’s win (1998-09-19) and later analyses that presume the event but do not report it (1998-10-21 and 1998-10-31). Indeed, this example illustrates the necessity of linking only to an article that describes the event as having happened. Several articles report that Kernot will win before she claims victory, yet identifying one of them as a canonical representative of the event for linking is problematic; notably, many future-tense references to Kernot’s win suggest that the event will not happen.
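The consistency argument about the 16 and 19 October articles can be made concrete with a small sketch. It assumes, hypothetically, that each candidate is labelled with how it presents Kernot’s win (predicted, hedged, asserted, or merely presupposed by later analysis), and that the 16 October report quoted above is the article listed for that day in Table 5.3. The labels are our reading of the discussion, not corpus annotations, and `select_target` is an illustrative rule, not the schema itself.

```python
from datetime import date

# How each candidate presents Kernot's win. These labels are a hypothetical
# reading of the discussion in the text, not annotations from the corpus.
PREDICTED, HEDGED, ASSERTED, PRESUPPOSED = (
    "predicted", "hedged", "asserted", "presupposed")

CANDIDATES = [
    (date(1998, 9, 19), "Where is Kernot? In winner's tent", PREDICTED),
    (date(1998, 10, 16), "Accusations continue as Kernot firms", HEDGED),
    (date(1998, 10, 19), "Opponents join to take shine off Kernot's win", ASSERTED),
    (date(1998, 10, 21),
     "Playing the diplomat may be Kernot's hardest task in her new mega-job",
     PRESUPPOSED),
    (date(1998, 10, 31), "Poll proves high profiles count", PRESUPPOSED),
]


def select_target(candidates, accepted):
    """Return the earliest candidate whose status is in `accepted`,
    or 'not found' if there is none.

    Predictions and presupposing analyses do not report the event,
    so a sensible `accepted` set excludes them.
    """
    for _day, headline, status in sorted(candidates):
        if status in accepted:
            return headline
    return "not found"


# A permissive reading accepts hedged reports; a strict one requires assertion.
print(select_target(CANDIDATES, {HEDGED, ASSERTED}))
print(select_target(CANDIDATES, {ASSERTED}))

# Under the strict reading, removing the 19 October article must yield
# 'not found' rather than falling back to the hedged report of the 16th.
without_19th = [c for c in CANDIDATES if c[0] != date(1998, 10, 19)]
print(select_target(without_19th, {ASSERTED}))
```

Whichever set of statuses the schema accepts, the rule is applied without reference to which later articles happen to exist, which is the property the text argues a consistent schema needs.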

The three event references in the Dickson example illustrate a number of sources of annotator disagreement, including the difficulty of identifying non-salient references, misunderstandings and underspecification of the annotation schema, technical limitations in searching archival news, and divergent readings of referential semantics. All of these are reflected in the aggregate agreement scores reported below.
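As a rough illustration of how this disagreement translates into numbers, the sketch below computes simple observed pairwise agreement (the fraction of annotator pairs choosing the same target) for a, b and c, using the counts given in this section and in Table 5.3. The choice of measure is an assumption for illustration: it does not correct for chance agreement and is not necessarily the measure behind the aggregate scores. Annotators who left an event unmarked are excluded, and ‘not found’ counts as one more choice.

```python
from fractions import Fraction
from math import comb


def pairwise_agreement(counts):
    """Observed pairwise agreement: the fraction of annotator pairs that chose
    the same target. `counts[i]` is how many annotators made the i-th choice
    (an article, or 'not found'). Chance agreement is not corrected for."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two annotations")
    agreeing = sum(comb(c, 2) for c in counts)  # comb(1, 2) == 0
    return Fraction(agreeing, comb(n, 2))


# Choices among the annotators who marked each reference in Example 47.
CHOICES = {
    "a": [8, 3, 1, 1],           # three same-day articles, one later story
    "b": [1, 2, 4, 3, 1, 1, 1],  # six articles in Table 5.3 plus 'not found'
    "c": [6, 1, 1],              # news report, analysis article, 'not found'
}

for ref, counts in CHOICES.items():
    score = pairwise_agreement(counts)
    print(f"{ref}: {score} = {float(score):.3f}")
```

This gives 31/78 (about 0.40) for a, 5/39 (about 0.13) for b and 15/28 (about 0.54) for c, matching the qualitative ordering above, with c the most consistent and b the least.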

In document Grounding event references in news (Page 125-128)