
EVENTSIMPLIFY: EVENT-BASED ATS SYSTEM

6.2 Event-Based Text Simplification

6.2.2 Simplification Schemes

The EventSimplify ATS system offers two different simplification schemes: (1) sentence-wise simplification; and (2) event-wise simplification.

Sentence-wise simplification eliminates all tokens in the original sentence that do not belong to any of the extracted factual event mentions. This means that only tokens recognised as part of event anchors or event arguments are preserved in the simplified text. A single sentence of the input text is transformed into a single sentence of the simplified text, provided that it contains at least one factual event mention.

Descriptive sentences that do not contain any factual event mentions (e.g. “Oh what a shame!”) are eliminated from the simplified text. Algorithm 1 summarises the sentence-wise simplification scheme.

Algorithm 1. Sentence-wise simplification
input: sentence s
input: set of event mentions E
// initialise the simplified sentence (list of tokens)
S = []
// list of original sentence tokens
T = tokenize(s)
foreach token t in T do
    foreach event mention e in E do
        // set of event tokens
        A = anchorAndArgumentTokens(e)
        // if the sentence token belongs to an event
        if t in A then
            // include the token in the simplified sentence (preserving token order)
            append t to S
            // stop checking further events for this token
            break
output: S
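The following Python sketch mirrors Algorithm 1 under one assumption the text does not state: each event mention is given as a set of token indices covering its anchor and arguments. This representation is hypothetical, because the extractor's actual output format is not described here. Testing indices rather than token strings avoids conflating repeated words. The function names are illustrative.

```python
from typing import Iterable, List, Sequence, Set, Tuple

def sentence_wise(tokens: List[str], events: Iterable[Set[int]]) -> List[str]:
    """Keep only tokens that belong to at least one event mention (Algorithm 1).

    `events` holds one set of token indices (anchor + arguments) per factual
    event mention; this representation is an assumption of the sketch.
    """
    events = list(events)
    kept = []
    for i, token in enumerate(tokens):
        for event_tokens in events:
            if i in event_tokens:
                kept.append(token)  # preserve the original token order
                break               # keep a token once, however many events share it
    return kept

def simplify_document(sentences: Sequence[Tuple[List[str], List[Set[int]]]]) -> List[str]:
    """Apply sentence-wise simplification and drop sentences without factual events."""
    output = []
    for tokens, events in sentences:
        kept = sentence_wise(tokens, events)
        if kept:  # descriptive sentences (no event tokens) are eliminated
            output.append(" ".join(kept))
    return output
```

Take “China sent in its fleet and provoked Philippines” with one index set per event ({0, 1, 2, 3, 4} and {0, 6, 7}). The sketch keeps every token except “and” and yields “China sent in its fleet provoked Philippines”. The dropped conjunction is exactly the kind of grammaticality problem that affects sentence-wise outputs.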

Event-wise simplification transforms each event extracted from the input sentence into a separate sentence of the output. Since a single phrase can be an argument of more than one event mention, a single token from the input sentence may appear in several output sentences. For example, the input “China sent in its fleet and provoked Philippines” is transformed into the output “China sent in its fleet. China provoked Philippines.” Here, “China” is the agent of both events (“sent” and “provoked”) and thus occurs in both output sentences.

In order to retain the grammaticality of the output, we made three additional adjustments to the event-wise simplification:

• Events of the Reporting type (e.g. “said”) were ignored, as they often cannot form grammatically correct sentences on their own (e.g. “Obama said.”).

• Events with nominal anchors were not transformed into separate sentences, as such events tend to have very few arguments, if any. Nominal events are also very often arguments of verbal events. For example, in “China and Philippines resolved a naval standoff”, the mention “standoff” is the target of the mention “resolved” and has no arguments of its own.

• Gerundive events that head the open clausal complement of the main sentence event were converted into the past simple form in the output. For example, the input “Philippines disputed China’s territorial claims, triggering the naval confrontation” is transformed into “Philippines disputed China’s territorial claims. Philippines triggered the naval confrontation.” The gerundive anchor “triggering” is transformed into “triggered” since it heads the open clausal complement of the anchor “disputed”.

Algorithm 2 summarises the event-wise simplification scheme.

Algorithm 2. Event-wise simplification
input: sentence s
input: set of event mentions E
// initialise the map from each event to its list of output tokens
S = {}
foreach e in E do S[e] = []
// list of original sentence tokens
T = tokenize(s)
foreach token t in T do
    foreach event mention e in E do
        // anchor and set of event tokens
        a = anchor(e)
        A = anchorAndArgumentTokens(e)
        // if the token is part of a verbal, non-reporting event
        if t in A and PoS(a) ≠ N and type(e) ≠ Rep then
            // a gerundive anchor is converted into the past simple tense
            if t = a and gerund(a) then
                append pastSimple(a) to S[e]
            else
                append t to S[e]
output: S
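Below is a runnable Python sketch of Algorithm 2. Several details are assumptions rather than the thesis's implementation:

- Event mentions are hypothetical `EventMention` records holding token indices and a Penn Treebank tag for the anchor.
- A nominal anchor is any tag starting with "NN", and a gerund is "VBG".
- The past simple form comes from a toy lookup table (`PAST_SIMPLE`), which stands in for a real morphological generator.
- Every gerundive anchor is converted. The text restricts this to gerunds heading the main event's clausal complement, so the sketch is simpler than the described method.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical stand-in for a morphological generator (not part of EventSimplify).
PAST_SIMPLE = {"triggering": "triggered"}

@dataclass(frozen=True)
class EventMention:
    anchor: int               # index of the anchor token
    tokens: FrozenSet[int]    # indices of anchor + argument tokens
    anchor_pos: str           # assumed Penn Treebank tag of the anchor, e.g. "VBD", "VBG", "NN"
    event_type: str = "Other" # e.g. "Reporting"

def past_simple(word: str) -> str:
    return PAST_SIMPLE.get(word, word)

def event_wise(tokens: List[str], events: List[EventMention]) -> Dict[EventMention, List[str]]:
    """Collect the output tokens of each event, following Algorithm 2."""
    output: Dict[EventMention, List[str]] = {e: [] for e in events}
    for i, token in enumerate(tokens):
        for e in events:
            verbal = not e.anchor_pos.startswith("NN")  # skip nominal anchors
            if i in e.tokens and verbal and e.event_type != "Reporting":
                if i == e.anchor and e.anchor_pos == "VBG":  # gerundive anchor
                    output[e].append(past_simple(token))
                else:
                    output[e].append(token)
            # no break: one token may belong to several events
    return output

def to_sentences(output: Dict[EventMention, List[str]]) -> List[str]:
    """Render each non-empty event as its own output sentence."""
    return [" ".join(toks) + "." for toks in output.values() if toks]
```

Suppose “China” is included in the token sets of both “sent” and “provoked”. The sketch then reproduces the two-sentence output from the example above. Reporting and nominal events yield empty token lists and are skipped when the sentences are rendered.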

Additionally, pronominal anaphora resolution was employed on top of the event-wise simplification scheme, as anaphoric mentions have been shown to cause difficulties for people with cognitive disabilities (Ehrlich et al., 1999; Shapiro and Milkes, 2004). Anaphoric pronouns were resolved using the coreference resolution tool from Stanford CoreNLP (Lee et al., 2011).
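The substitution step that follows coreference resolution can be sketched as below. The sketch does not model Stanford CoreNLP's actual output. It assumes, hypothetically, that coreference chains arrive as lists of (sentence, start, end) token spans. Each single-token pronoun in a chain is replaced by the chain's first non-pronominal mention. The pronoun list is illustrative and deliberately excludes possessives such as "his", whose replacement would need a possessive form.

```python
from typing import Dict, List, Tuple

# A mention is (sentence index, start token, end token), with end exclusive (assumed format).
Mention = Tuple[int, int, int]

# Illustrative pronoun list; a real system would use the coreference tool's mention types.
PRONOUNS = {"he", "she", "it", "they", "him", "them"}

def is_pronoun(doc: List[List[str]], m: Mention) -> bool:
    s, b, e = m
    return e - b == 1 and doc[s][b].lower() in PRONOUNS

def resolve_pronouns(doc: List[List[str]], chains: List[List[Mention]]) -> List[List[str]]:
    """Replace pronominal mentions with the first non-pronominal mention of their chain."""
    replacements: Dict[Tuple[int, int], List[str]] = {}
    for chain in chains:
        antecedents = [m for m in chain if not is_pronoun(doc, m)]
        if not antecedents:
            continue  # no full noun phrase to substitute
        s, b, e = antecedents[0]
        for m in chain:
            if is_pronoun(doc, m):
                replacements[(m[0], m[1])] = doc[s][b:e]
    # Rebuild each sentence so multi-token antecedents do not shift other indices.
    return [
        [t for i, tok in enumerate(sent) for t in replacements.get((si, i), [tok])]
        for si, sent in enumerate(doc)
    ]
```

Applied to event-wise output such as “Joel Elliott was charged on May 3 with murder. He appeared at Lewes Crown Court on May 8.”, a chain linking “Joel Elliott” and “He” turns the second sentence into “Joel Elliott appeared at Lewes Crown Court on May 8.”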

An example of an original text snippet accompanied by its sentence-wise simplification, event-wise simplification, and event-wise simplification with anaphoric pronoun resolution is given in Figure 6.3. Event anchors and event arguments are presented in bold: the event anchor in gray, the agent in red, the time in orange, and the location in green. The example in Figure 6.3 also illustrates imperfections of the current system that can be addressed in future work. The sentence-wise simplification is not always grammatically correct (“Baset al-Megrahi was convicted in the 1988 Lockerbie bombing has died at his home...”). In its current state, the event-wise simplification does not always preserve temporal relations between the sentences (e.g. that Baset al-Megrahi died after he was released from prison).

Figure 6.3: An example of event-based text simplification

6.3 Evaluation

The output of the EventSimplify system was evaluated automatically for readability, and by human judges for grammaticality and information relevance. Instead of the commonly used human evaluation of simplicity and meaning preservation, we propose an information relevance score, which is more appropriate for ATS systems that perform significant content reduction.

6.3.1 Readability

The readability of the system’s output was evaluated on 100 news stories collected from EMM NewsBrief². For each original story and its simplified versions, we computed three frequently used readability scores – Kincaid-Flesch Grade Level (KFGL) (Kincaid et al., 1975), Automated Readability Index (ARI) (Smith and Senter, 1967), and SMOG Index (McLaughlin, 1969) – as well as three common-sense indicators of readability: average sentence length (ASL), average document length (ADL), and average number of sentences per document (ANS). As a baseline, we used a syntactically motivated simplification strategy that retains only the main clause of a sentence and discards all subordinate clauses. The main and subordinate clauses were identified using the Stanford constituency parser (Klein and Manning, 2003a).
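For reference, the sketch below computes the three formulas with their standard published coefficients, plus the three common-sense indicators, for one tokenised document. The text does not say which implementation was used. The syllable counter here is a crude vowel-group heuristic (an assumption), so absolute scores will differ somewhat from dictionary-based tools. `relative_change` shows one plausible way to obtain the per-document percentages of the kind reported in Tables 6.1 and 6.2. Whether those tables average per-document changes in exactly this way is also an assumption.

```python
import math
import re
from typing import Dict, List

def syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, discounting a final silent 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(1, n)

def readability(sentences: List[List[str]]) -> Dict[str, float]:
    """Readability formulae and common-sense indicators for one document.

    `sentences` is a list of word-token lists with punctuation already removed.
    """
    words = [w for s in sentences for w in s]
    n_sent, n_words = len(sentences), len(words)
    n_syll = sum(syllables(w) for w in words)
    n_chars = sum(len(w) for w in words)
    n_poly = sum(1 for w in words if syllables(w) >= 3)  # words of 3+ syllables
    return {
        "KFGL": 0.39 * n_words / n_sent + 11.8 * n_syll / n_words - 15.59,
        "ARI": 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43,
        "SMOG": 1.0430 * math.sqrt(n_poly * 30 / n_sent) + 3.1291,
        "ASL": n_words / n_sent,  # average sentence length in words
        "ADL": n_words,           # document length in words
        "ANS": n_sent,            # number of sentences
    }

def relative_change(original: float, simplified: float) -> float:
    """Percentage change of a simplified document's score relative to the original."""
    return 100.0 * (simplified - original) / original
```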

The results (Tables 6.1 and 6.2) indicate that the event-wise simplification significantly (p < 0.01)³ increases readability on all measures except the average number of sentences (ANS). The large variation in ANS for event-wise simplification is caused by the large variation in the number of factual events per news story. Descriptive news stories (e.g. political overviews) contain more sentences without any factual events. In contrast, sentences from factual stories (e.g. murders, protests) often contain several factual events, which form multiple sentences in the simplified text. Event-wise simplified texts are also significantly more readable than sentence-wise simplified texts (p < 0.01) on all measures except ANS. Absolute values of the Kincaid-Flesch Grade Level (KFGL), average sentence length (ASL), average document length in words (ADL), and average number of sentences (ANS) for each simplification scheme are presented in Table 6.3.

² http://emm.newsbrief.eu/NewsBrief/clusteredition/en/latest.html

³ Two-tailed t-test if both samples are approximately normally distributed; Wilcoxon signed-rank test otherwise.

Table 6.1: Readability evaluation (readability formulae)

Original vs. KFGL ARI SMOG
Baseline -27.70% ± 12.51% -31.03% ± 12.78% -13.95% ± 7.93%
Sentence-wise -30.12% ± 13.93% -30.73% ± 14.20% -16.26% ± 9.24%
Event-wise -50.25% ± 12.59% -50.89% ± 13.43% -30.77% ± 10.46%
Pronom. anaphora -47.76% ± 13.91% -48.14% ± 14.38% -29.41% ± 10.56%

Table 6.2: Readability evaluation (common-sense indicators)

Original vs. ASL ADL ANS
Baseline -38.52% ± 12.13% -38.52% ± 12.13% 0.00% ± 0.00%
Sentence-wise -44.34% ± 11.06% -49.76% ± 11.50% -9.94% ± 8.72%
Event-wise -65.48% ± 9.31% -63.36% ± 12.56% -9.99% ± 39.70%
Pronom. anaphora -63.60% ± 10.25% -61.20% ± 14.37% -9.99% ± 39.70%

Table 6.3: Absolute values of the readability measures for each simplification scheme

Simplification KFGL ASL (words) ADL (words) ANS
Original 11.0 ± 3.6 23.8 ± 5.3 315.9 ± 181.6 13.6 ± 8.1
Baseline 7.8 ± 2.0 14.4 ± 3.3 192.1 ± 115.0 13.6 ± 8.1
Sentence-wise 7.5 ± 2.0 13.1 ± 3.3 153.5 ± 84.3 12.1 ± 6.9
Event-wise 5.3 ± 1.4 7.8 ± 1.1 110.1 ± 61.4 14.2 ± 8.3
Pronom. anaphora 5.5 ± 1.5 8.3 ± 1.5 115.7 ± 63.7 14.2 ± 8.3

6.3.2 Human Evaluation

In line with previous work on text simplification (Knight and Marcu, 2002; Woodsend and Lapata, 2011a; Wubben et al., 2012; Drndarević et al., 2013), the grammaticality of the simplified text was evaluated by human judges. Due to the cognitive effort required for the annotation, the evaluators were asked to compare text snippets (consisting of a single sentence or two adjacent sentences) instead of whole news stories. Our event-based ATS system differs from the previously proposed ATS systems (Knight and Marcu, 2002; Woodsend and Lapata, 2011a; Wubben et al., 2012; Drndarević et al., 2013). We therefore propose a measure of information relevance (Relevance) instead of the commonly used scores for simplicity and meaning preservation. Relevance is calculated as the harmonic mean of the Relevant Information score (RI) and the Irrelevant Information score (II). The meaning preservation score is defined in a way that penalises any change in meaning between the original sentence and its corresponding simplification, including any loss of information. Given that the main goal of our ATS system is to eliminate all irrelevant information and to retain and simplify only the relevant information, the loss of irrelevant information is actually desirable and should not be penalised. Therefore, we propose a different kind of human evaluation, which is more appropriate for ATS systems that are expected to perform significant content reduction in addition to simplification.

Evaluators were instructed to compare each simplified text snippet with the respective original and to assign three different scores:

1. Grammaticality score (G);

2. Relevant Information score (RI);

3. Irrelevant Information score (II).

Grammaticality score (G) denotes the grammatical well-formedness of text on a 1–3 scale, where: 1 denotes significant ungrammaticalities (e.g. missing subject or object as in “Was prevented by the Chinese surveillance craft.”), 2 indicates smaller grammatical inconsistencies (e.g. missing conjunctions or prepositions, as in “Vessels blocked the arrest Chinese fishermen in disputed waters”), and 3 indicates grammatical correctness.

Relevant Information score (RI) denotes the degree to which relevant information from the original text is preserved semantically unchanged in the simplified text on a 1–3 scale, where: 1 indicates that the most relevant information has not been preserved in its original meaning (e.g. “Russians are tiring of Putin” → “Russians are tiring Putin”), 2 denotes that relevant information is partially missing from the simplified text (e.g. “Their daughter has been murdered and another daughter seriously injured.” → “Their daughter has been murdered.”), and 3 means that all relevant information has been fully preserved.

Irrelevant Information score (II) indicates the degree to which irrelevant information has been eliminated from the simplified text on a 1–3 scale, where: 1 means that a lot of irrelevant information has been retained in the simplified text (e.g. “The president, acting as commander in chief, landed in Afghanistan on Tuesday afternoon for an unannounced visit to the war zone.”), 2 denotes that some of the irrelevant information has been eliminated, but not all of it (e.g. “The president landed in Afghanistan on Tuesday afternoon for an unannounced visit.”), and 3 indicates that only the most relevant information has been retained in the simplified text (e.g. “The president landed in Afghanistan on Tuesday.”).

A few examples of original sentences and their automatic simplifications produced by the EventSimplify system, together with the assigned human evaluation scores, are presented in Table 6.4. Note that the relevant information score (RI) and the irrelevant information score (II) can be interpreted as the recall and the precision of information relevance, respectively. The more relevant information is lost (i.e. false negatives), the lower the RI score. Similarly, the more irrelevant information is retained (i.e. false positives), the lower the II score. A well-performing simplification method should simultaneously preserve relevant information and eliminate irrelevant information. Therefore, for each simplified text we computed the Relevance score (Relevance) as the harmonic mean of its relevant information score (RI) and irrelevant information score (II).
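The Relevance score works like an F1 score over the two 1–3 ratings. The short sketch below shows why the harmonic mean was a sensible choice. Example (e) in Table 6.4 has RI = 1 and II = 3. Its arithmetic mean would be 2.0, but the harmonic mean is 1.5, so a simplification that discards relevant content cannot be rescued by a high II. `mean_relevance` is a hypothetical aggregate over snippets; the text does not specify how per-annotator scores were combined.

```python
from typing import Iterable, Tuple

def relevance(ri: float, ii: float) -> float:
    """Harmonic mean of RI (recall-like) and II (precision-like); both lie in 1..3."""
    return 2 * ri * ii / (ri + ii)

def mean_relevance(scores: Iterable[Tuple[float, float]]) -> float:
    """Hypothetical aggregate: average per-snippet Relevance over (RI, II) pairs."""
    values = [relevance(ri, ii) for ri, ii in scores]
    return sum(values) / len(values)
```

With the Table 6.4 ratings, example (a) gives 3.0, examples (e) and (f) each give 1.5, and example (h) gives 2.0.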

The evaluation dataset encompassed 70 original newswire text snippets, each consisting of one or two sentences.⁴ These 70 snippets were simplified using the two proposed simplification schemes (plus the additional scheme with pronominal anaphora resolution) and the baseline, thus obtaining four different simplifications per snippet:

1. Baseline;

2. Sentence-wise simplification;

3. Event-wise simplification;

4. Pronominal anaphora (event-wise simplification with pronominal anaphora resolution).

⁴ The dataset is freely available at http://takelab.fer.hr/evsimplify

Table 6.4: Human evaluation examples

(a) Original: “It is understood the dead girl had been living at her family home, in a neighbouring housing estate, and was visiting her older sister at the time of the shooting.”
    Simplified: “The dead girl had been living at her family home, in a neighbouring housing estate and was visiting her older sister.”
    G = 3, RI = 3, II = 3, SM = S

(b) Original: “On Facebook, more than 10,000 people signed up to a page announcing an opposition rally for Saturday.”
    Simplified: “On Facebook, more than 10,000 people signed to a page announcing an opposition rally for Saturday.”
    G = 2, RI = 3, II = 3, SM = S

(c) Original: “Joel Elliott, also 22, of North Road, Brighton, was charged on May 3 with murder. He appeared at Lewes Crown Court on May 8 but did not enter a plea.”
    Simplified: “Joel Elliott was charged on May 3 with murder. He appeared at Lewes Crown Court on May 8.”
    G = 3, RI = 2, II = 3, SM = S

(d) Original: “For years the former Bosnia Serb army commander Ratko Mladic had evaded capture and was one of the world’s most wanted men, but his time on the run finally ended last year when he was arrested near Belgrade.”
    Simplified: “For years the former Bosnia Serb army commander Ratko Mladic had evaded but his time the run ended last year he was arrested near Belgrade.”
    G = 1, RI = 2, II = 3, SM = S

(e) Original: “Police have examined the scene at a house at William Court in Bellaghy, near Magherafelt for clues to the incident which has stunned the community.”
    Simplified: “Police have examined the scene at William Court near Magherafelt. The incident has stunned the community.”
    G = 3, RI = 1, II = 3, SM = P

(f) Original: “But opposition parties and international observers said the vote del Rosario was seeking a diplomatic solution with Chinese Ambassador Ma Keqing, the TV network said.”
    Simplified: “Foreign Affairs Secretary Albert del Rosario was seeking a diplomatic solution with Chinese Ambassador Ma Keqing, the TV network said.”
    G = 3, RI = 3, II = 1, SM = B

(h) Original: “On Wednesday, two video journalists working for the state-owned RIA Novosti news agency were briefly detained outside the Election Commission building where Putin was handing in his application to run.”
    Simplified: “… agency. Putin was handing in his application.”
    G = 3, RI = 2, II = 2, SM = E

G denotes the grammaticality score, RI the relevant information score, and II the irrelevant information score; SM denotes the simplification method used: B – baseline, S – sentence-wise, E – event-wise, P – pronominal anaphora.

This resulted in a total of 280 pairs of original and simplified text snippets. The inter-annotator agreement (IAA) was measured on 40 pairs of text snippets independently evaluated by each of the three annotators. Since moderate agreement was observed⁵, the evaluators proceeded to annotate the remaining 240 pairs of text snippets (80 each). Pairwise averaged IAA in terms of three complementary metrics – weighted Cohen’s kappa (κ) (Cohen, 1968), Pearson’s correlation, and mean absolute error (MAE) – is given in Table 6.5.

Table 6.5: IAA for human evaluation

Aspect Weighted κ Pearson MAE

Grammaticality (G) 0.68 0.77 0.18

Relevant Information (RI) 0.53 0.67 0.37

Irrelevant Information (II) 0.54 0.60 0.28
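The three agreement metrics in Table 6.5, averaged over annotator pairs, can be sketched in plain Python as follows. The text does not say whether linear or quadratic weights were used for κ. Linear disagreement weights are assumed here, and switching to quadratic weights would change the values. scikit-learn's `cohen_kappa_score` with `weights="linear"` or `weights="quadratic"` offers a library alternative. `pairwise_average` is an illustrative helper for the pairwise averaging over the three annotators.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, List, Sequence

def weighted_kappa(a: Sequence[int], b: Sequence[int], labels=(1, 2, 3)) -> float:
    """Cohen's weighted kappa with linear disagreement weights (assumed)."""
    k, n = len(labels), len(a)
    idx = {label: i for i, label in enumerate(labels)}
    obs = [[0.0] * k for _ in range(k)]  # joint distribution of the two raters
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def mae(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pairwise_average(ratings: List[Sequence[int]], metric: Callable) -> float:
    """Average a two-rater metric over all pairs of annotators."""
    pairs = list(combinations(ratings, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)
```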

As expected, IAA shows that grammaticality is less susceptible to individual interpretation than information (ir)relevance (i.e. RI and II). Nonetheless, moderate agreement is observed for RI and II as well (κ > 0.5). Finally, the performance of the proposed simplification schemes on the 70 text snippets was evaluated in terms of Grammaticality and Relevance. The results are shown in Table 6.6.

⁵ Landis and Koch (1977) describe a moderate agreement as 0.4 < κ < 0.6, whereas 0.6 < κ < 0.8 indicates a substantial agreement.

Table 6.6: Grammaticality and Relevance

Scheme Grammaticality (1–3) Relevance (1–3)

Baseline 2.57 ± 0.79 1.90 ± 0.64

Sentence-wise 1.98 ± 0.80 2.12 ± 0.61

Event-wise 2.70 ± 0.52 2.30 ± 0.54

Pronominal anaphora 2.68 ± 0.56 2.39 ± 0.57

All the simplification schemes produce text which is significantly more relevant than the baseline simplification (p < 0.05 for the sentence-wise scheme; p < 0.01 for the event-wise and pronominal anaphora schemes). However, sentence-wise simplification produces text which is significantly less grammatical than the baseline simplification.

This is because conjunctions and prepositions are often missing from sentence-wise simplifications, as they are not part of any event mention. The same issue does not arise in event-wise simplifications, where each mention is converted into its own sentence; in that case, eliminating conjunctions is grammatically desirable. The event-wise and pronominal anaphora schemes significantly outperform the sentence-wise simplification (p < 0.01) in both grammaticality and information relevance. Most mistakes in event-wise simplifications stem from a change of meaning caused by incorrectly extracted event arguments (e.g. “Nearly 3,000 soldiers have been killed in Afghanistan since the Talibans were ousted in 2001.” → “Nearly 3,000 soldiers have been killed in Afghanistan in 2001.”).

Overall, the event-wise scheme increases readability and produces grammatical text, while preserving relevant content and reducing irrelevant content. Combined, the experimental results for readability, grammaticality, and information relevance suggest that the proposed event-wise scheme is well suited for text simplification.

In document NEW DATA-DRIVEN APPROACHES (Page 169-181)