
4.2.1.2 Algorithms

Two types of algorithms for sentence reduction were tested: BLIND REMOVAL and SCORE-BASED REMOVAL. Both algorithms take the original sentence and its previously identified structures, and return a reduced sentence.

After reduction, the score of the original sentence and the score of the reduced sentence are compared to check whether the reduced sentence is better than the original one. If it is, the reduced version is chosen. If not, the original sentence is kept and the reduced version is discarded.

Blind removal is an algorithm that takes all the structures detailed in Section 4.2.1.1 and removes them from the original sentence.

Consider the sentence in Example 4.7, whose removable passages are underlined.

Example 4.7

Também hoje, na conferência de líderes, o ministro dos Assuntos Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

Today also, at the leadership conference, the Minister for Parliamentary Affairs, Jorge Lacão, said to have discovered that the office of the prime minister had been excluded.

With this algorithm, all these passages are removed from the original sentence, building the reduced sentence illustrated in Example 4.8.

Example 4.8

Também hoje, o ministro dos Assuntos Parlamentares afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

Today also, the Minister for Parliamentary Affairs said to have discovered that the office of the prime minister had been excluded.
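The blind removal step, together with the score comparison described earlier, can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names `blind_removal` and `keep_better` are hypothetical, the candidate passages are assumed to be given as exact substrings that include their adjoining punctuation, the scoring function is supplied by the caller, and the strict comparison (ties keep the original) is an assumption.

```python
from typing import Callable, Sequence


def blind_removal(sentence: str, passages: Sequence[str]) -> str:
    """Remove every candidate passage from the sentence."""
    reduced = sentence
    for passage in passages:
        # Assumption: each passage occurs verbatim, punctuation included.
        reduced = reduced.replace(passage, "", 1)
    return reduced


def keep_better(original: str, reduced: str,
                score: Callable[[str], float]) -> str:
    """Keep the reduced sentence only if it scores strictly higher."""
    return reduced if score(reduced) > score(original) else original


# Sentence from Example 4.7 with its two removable passages.
sentence = (
    "Também hoje, na conferência de líderes, o ministro dos Assuntos "
    "Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete "
    "do primeiro-ministro tinha ficado de fora."
)
passages = [" na conferência de líderes,", ", Jorge Lacão,"]

reduced = blind_removal(sentence, passages)
print(reduced)
```

With these passages, `blind_removal` yields the sentence shown in Example 4.8. Passing the result to `keep_better` with a relevance scorer would then decide which version enters the summary.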

Score-based removal is an algorithm that approximates the notion of power set. In mathematics, a power set is the set of all subsets of a given set. Here, the "reduced power set" of a given sentence refers to the set of all the reduced sentences that can be obtained by combining the removal of its individual candidate structures.

Example 4.9 shows the sentence from Example 4.7 and its score. Example 4.10 describes its power set and the respective scores of each member sentence.

Example 4.9

Também hoje, na conferência de líderes, o ministro dos Assuntos Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

1.7200

Example 4.10

Também hoje, na conferência de líderes, o ministro dos Assuntos Parlamentares afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

1.8175 (AP removed)

Today also, at the leadership conference, the Minister for Parliamentary Affairs said to have discovered that the office of the prime minister had been excluded.

Também hoje, na conferência de líderes, o ministro dos Assuntos Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

1.7200 (original sentence)

Today also, at the leadership conference, the Minister for Parliamentary Affairs, Jorge Lacão, said to have discovered that the office of the prime minister had been excluded.

Também hoje o ministro dos Assuntos Parlamentares afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

1.7053 (AP and PP removed)

Today also the Minister for Parliamentary Affairs said to have discovered that the office of the prime minister had been excluded.

Também hoje o ministro dos Assuntos Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete do primeiro-ministro tinha ficado de fora.

1.6000 (PP removed)

Today also the Minister for Parliamentary Affairs, Jorge Lacão, said to have discovered that the office of the prime minister had been excluded.

After the "reduced power set" has been determined, the different sentences are ordered by their relevance score. As illustrated in Example 4.10, the score of the reduced sentence differs depending on the passage, or combination of passages, that has been removed. This means that some expressions may be seen as containing more key information than others, as the relevance score is an indication of informativeness. Note that, for instance, the original sentence, from which the reduced ones were built, ranks second in the list. Note also that the third sentence in the list was obtained by removing all the targeted passages, and is therefore the same sentence that the BLIND REMOVAL algorithm would create (cf. Example 4.8). Its relevance score is lower than both the best relevance score in the list and the relevance score of the original sentence.

The reduced sentence to be kept will then be the sentence in the "reduced power set" that has the maximum relevance score.
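The enumeration and selection just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's actual code: the function names are hypothetical, passages are assumed to be exact substrings with their adjoining punctuation, and `toy_score` (keyword density over a made-up keyword set) merely stands in for the relevance score, so its values do not match those in Example 4.10. Note that a sentence with n candidate passages produces 2^n candidates, the original sentence included.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def reduced_power_set(sentence: str, passages: Sequence[str]) -> List[str]:
    """One candidate per subset of passages; the empty subset is the original."""
    candidates = []
    for size in range(len(passages) + 1):
        for subset in combinations(passages, size):
            candidate = sentence
            for passage in subset:
                # Assumption: each passage occurs verbatim in the sentence.
                candidate = candidate.replace(passage, "", 1)
            candidates.append(candidate)
    return candidates


def score_based_removal(sentence: str, passages: Sequence[str],
                        score: Callable[[str], float]) -> str:
    """Return the candidate with the maximum score."""
    return max(reduced_power_set(sentence, passages), key=score)


# Hypothetical keyword set, chosen only for illustration.
KEYWORDS = {"conferência", "ministro", "gabinete"}


def toy_score(sentence: str) -> float:
    """Toy stand-in for the relevance score: share of keyword tokens."""
    tokens = [t.strip(",.").lower() for t in sentence.split()]
    return sum(t in KEYWORDS for t in tokens) / len(tokens)


sentence = (
    "Também hoje, na conferência de líderes, o ministro dos Assuntos "
    "Parlamentares, Jorge Lacão, afirmou ter-se descoberto que o gabinete "
    "do primeiro-ministro tinha ficado de fora."
)
passages = [" na conferência de líderes,", ", Jorge Lacão,"]

candidates = reduced_power_set(sentence, passages)
best = score_based_removal(sentence, passages, toy_score)
print(best)
```

Because the empty subset is part of the enumeration, the original sentence competes with its reduced versions, so taking the maximum already covers the case in which no reduction improves the score.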

Discussion

The main assumption of a reduction process is that the identified syntactic structures are candidates to be removed because they typically detail accessory information with respect to the key information conveyed by that sentence.

Taking this into account, the two algorithms presented above were tested. BLIND REMOVAL assumes that all the targeted information is dispensable. Thus, all the candidate stretches are "blindly" removed from the sentences that go through this process. Considering the nature of these stretches, their removal could make room for more information to be included in the summary. However, closer scrutiny of the output of this algorithm shows that some stretches, if removed, would compromise the comprehensiveness of the text. Too much information may be removed, including information that is relevant to the summary. In that case, the reduction procedure would not have the required impact, which is to identify and remove dispensable information.

SCORE-BASED REMOVAL, in turn, aims to avoid this pitfall. By removing the structures while taking into account the impact of that removal on the score of the sentence, the informativeness of the summary is also expected to improve. The relevance score, being an indicator of sentence informativeness, determines which of the reduced sentences created is the best, that is, the one with the highest density of key information.

SCORE-BASED REMOVAL was therefore the algorithm selected, since it satisfies two important conditions: (1) it produces the best combination of truncated stretches to obtain a reduced sentence, and (2) it takes into account the relevance score of the resulting reduced sentences.

Nevertheless, being a naïve approach to sentence reduction, BLIND REMOVAL will be used in the evaluation procedure (cf. Chapter 5) as a baseline to test the efficiency of the SCORE-BASED REMOVAL algorithm.

Recall the texts in Examples 3.1 and 3.2 from Section 3.1. The sentences in Table C.8 (cf. Annex C) define a summary that could be delivered to the end-user. However, its textual quality can be improved, and the first step towards this goal is the current sentence reduction procedure. The sentences illustrated in Table C.12 have already been reduced. Take, for instance, sentence #2. It contains an apposition phrase (Trasept Congo), which contains a keyword (Congo) (cf. Table C.5). This apposition could have been removed. However, the candidate reduced sentence has a lower relevance score than the original sentence, which is why the original sentence was kept (cf. Table C.10). The inverse occurred with sentence #3. The original sentence contained a parenthetical phrase (prejudicado pelo mau tempo, "hampered by the bad weather"). The relevance score of the reduced sentence, produced by removing this phrase, is higher than that of the original sentence. Thus, the original sentence was replaced by this reduced sentence (sentence #3) (cf. Table C.11).

It is worth taking a final look at the differences between Tables C.9 and C.12. Not only is sentence #3 newly added to the summary, but it has also been reduced. In fact, by removing dispensable stretches from the sentences, it is possible to include more information in the summary, through the compression review step.

In document Enhancing extractive summarization with automatic post-processing (Page 133-136)