As the World Wide Web (WWW) grows and people publish more information on it, users of the WWW have access to, and are overwhelmed with, ever more information. Considering the volume of relevant information, document summarization has become a must. Document summarization aims at filtering out the less informative pieces of documents and presenting only the most relevant parts of the document(s). Summarizing a vast amount of information is very challenging and, more importantly, time-consuming, and thus automatic summarization comes as a pragmatic solution. Automatic text summarization is one of the oldest problems investigated over the past half-century by the Natural Language Processing (NLP) and Information Retrieval (IR) communities. Text summarization is “the process of distilling the most important information from the source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)” (Mani and Maybury, 1999). A summary can be generated either by selecting important sentences of the original text(s) or by understanding and rewriting the main idea of the original text(s). It can also be either comprehensive or query-specific. In general, summarization techniques are categorized into different classes based on different criteria, as described below:
In this paper we present an extractive summarization system POLY 2 based on a linear programming model. We represent the document as a set of intersecting hyperplanes. Every possible summary of a document is represented as the intersection of two or more hyperplanes. We consider the summary to be the best if the optimal value of the objective function is preserved during summarization, and translate the summarization problem into a problem of finding a point on a convex polytope which is the closest to the hyperplane describing the "ideal" summary. We introduce multiple objective functions describing the distance between a summary (a point on a convex polytope) and the best summary (the hyperplane). Since the introduced objectives behave differently on different languages, only two of them were indicated as primary systems and evaluated by MultiLing 2013 organizers, OBJ 1 P OS F and OBJ 3, denoted
In order to improve the quality of ranking aggregation, some supervised learning methods have also been proposed. The work in (Liu et al., 2007) incorporated labeled data into a supervised rank aggregation method to minimize disagreements between ranking results and labeled data. (Chen et al., 2011) proposed a semi-supervised rank aggregation approach that minimizes the weighted disagreements of different rankers to learn the aggregation function. In the semi-supervised case, preference constraints on several item pairs were incorporated and the intrinsic manifold structure of the items was also taken into account. In (Hoi and Jin, 2008), a different semi-supervised method was proposed, which learns query-dependent weights by exploring the underlying distribution of the items to be ranked and assigns similar ranking scores to two similar retrieved items.
In this paper, we focus on the task of producing extraction-based query-focused multi-document summaries given a collection of documents, which is usually considered as a sentence ranking problem. Typically, ranking methods calculate the combinational effects of various features which are designed to identify the different aspects of sentences and/or their relevance to queries. Yet so far not much attention has been paid to how these features are combined. Most commonly, the features are simply combined by a linear function in which the weights are assigned manually or tuned experimentally. In the past, machine learning approaches have been successfully applied in extractive summarization (Ouyang et al., 2007; Shen and Li, 2011), and a new research branch named “learning to rank” has emerged. Its objective is to explore how the optimal weights can be obtained automatically by developing learning strategies. However, previous work mainly considered the content relevance of sentences with respect to a certain query while ignoring the relationships among sentences. In this paper, we study how to use sentence relatedness to improve the performance of a ranking model. Further, we notice that many learning to rank algorithms have been proposed in recent literature, and these algorithms can be categorized into three types: pointwise, pairwise, and listwise approaches. The pointwise and pairwise approaches transform the ranking problem into regression or classification on single objects and object pairs respectively, while neglecting the fact that ranking is a prediction task on a list of objects. In the listwise approach, object lists instead of object pairs are used as instances in learning, and the major task is how to construct a listwise loss function representing the difference between the ranking list output by a ranking model and the ranking list given as ground truth.
Experimental results showed that the listwise approach usually outperforms the pointwise and pairwise approaches (Cao et al., 2007; Qin et al., 2008). Accordingly, we mainly concentrate on developing listwise learning to rank in our summarization task. More exactly, taking into account the specific scenario of summarization, it is better to base our work on a variant of the basic listwise training model: ListMLE Top-K presented in (Xia et al., 2009). This is because we usually only need to select a small number of sentences to construct a summary. ListMLE Top-K, a modification of the basic listwise algorithm that is better suited to the many real ranking problems where the correct ranking of the entire permutation is not needed, could help us to improve the ranking accuracy of the top-K sentences. Based on that, our novel RelationListwise function, which absorbs sentence affinity information, is formed to learn the optimal feature weights.
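To make the top-K idea concrete, the following is a minimal sketch of a top-K listwise likelihood loss, assuming the usual Plackett-Luce form in which only the first K positions of the ground-truth ranking contribute. The function name `listmle_top_k_loss` and its arguments are illustrative assumptions, and the sketch does not include the sentence affinity term of the RelationListwise function, which is not specified in this passage.

```python
import math

def listmle_top_k_loss(scores, ranking, k):
    """Negative log-likelihood of the top-k prefix of `ranking`.

    scores  -- model scores, one per sentence (higher means ranked earlier)
    ranking -- sentence indices in ground-truth order, best first
    k       -- number of top positions that contribute to the loss
    Assumes a Plackett-Luce model over permutations (an assumption of this sketch).
    """
    ordered = [scores[i] for i in ranking]
    loss = 0.0
    for j in range(min(k, len(ordered))):
        tail = ordered[j:]  # items still unplaced at position j
        m = max(tail)       # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_norm - ordered[j]
    return loss

# Scores that agree with the ground truth give a lower loss than reversed scores.
good = listmle_top_k_loss([3.0, 2.0, 1.0], [0, 1, 2], k=2)
bad = listmle_top_k_loss([1.0, 2.0, 3.0], [0, 1, 2], k=2)
```

Because positions after K are ignored, errors in the tail of the ranking do not affect the loss, which matches the setting where only a few sentences are selected for the summary.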
We report on a Mechanical Turk evaluation that directly compares G-FLOW to state-of-the-art MDS systems. Using DUC’04 as our test set, we compare G-FLOW against a combination of an extractive summarization system with state-of-the-art ROUGE scores (Lin and Bilmes, 2011) followed by a state-of-the-art sentence reordering scheme (Li et al., 2011a). We also compare G-FLOW to a combination of an extractive system with state-of-the-art coherence scores (Nobata and Sekine, 2004) followed by the reordering system. In both cases participants substantially preferred G-FLOW. Participants chose G-FLOW 54% of the time when compared to Lin, and chose Lin’s system 22% of the time. When compared to Nobata, participants chose G-FLOW 60%
In this paper we propose SubSum, a subjective logic framework for sentence-based extractive multi-document summarization. Document summaries perceived by humans are subjective in nature, as human judgements of sentence relevancy are inconsistent and laden with uncertainty. SubSum captures this uncertainty and extracts significant sentences from a document cluster to generate extractive summaries. In particular, SubSum represents the sentences of a document cluster as propositions and computes opinions, a probability measure containing secondary uncertainty, for these propositions. Sentences with stronger opinions are considered more significant and used as candidate sentences. The key advantage of SubSum over other techniques is its ability to quantify uncertainty. In addition, SubSum is a completely unsupervised approach and is highly portable across different domains and languages.
Summarization can be extractive or abstractive. Extractive summaries are summaries in which all the text comes from the source document(s). Abstractive summaries are summaries in which some sentences are paraphrased to convey what the document says, so the sentences do not appear exactly as in the document(s). Summarization can also be single-document or multi-document. Single-document summarization generates a summary from one document, while multi-document summarization generates a summary from two or more related documents. Different approaches are employed depending on the type of summary. For instance, an extractive summary does not need a paraphrasing method and may not even need a semantic method in the summarization process. To achieve a good summary, approaches such as topic identification, word frequency, sentence position, graph-based methods, machine learning, and semantic methods, to mention a few, are employed. Even with all these approaches, automatic summarization (AS) still falls short of human summaries. Most summaries, especially multi-document summaries, face the challenge of redundancy: similar sentences are present in different documents, and these sentences are sometimes rated highly due to their popularity. Another reason why automatic summaries are still less effective than manual summaries is coherence: sentences are rearranged for summarization and therefore lose their chronological order. Automatic summarization systems are evaluated for relevancy, precision and recall, length, expert evaluation, etc. With new ideas and improvements on existing approaches, these summarization system issues can be addressed.
Two of the key components of effective summarization are the ability to identify important points in the text and to adequately reword the original text in order to convey these points. Automatic text summarization approaches have offered reasonably well-performing approximations for identifying important sentences (Lin and Hovy, 2002; Schiffman et al., 2002; Erkan and Radev, 2004; Mihalcea and Tarau, 2004; Daumé III and Marcu, 2006) but, not surprisingly, text (re)generation has been a major challenge despite some work on sub-sentential modification (Jing and McKeown, 2000; Knight and Marcu, 2000; Barzilay and McKeown, 2005). An additional drawback of extractive approaches is that estimates for the importance of larger text units such
made available in 2014. The systems we include can all be regarded, at least broadly, as extractive summarization systems, which directly select sentences from the original input. The six state-of-the-art systems were developed between 2009 and 2014, and have all demonstrated their superiority in the original papers. The input in our repository comes from the DUC 2004 workshop, the latest year in which generic summarization was addressed in a shared task. This dataset is the most popular one for evaluation in generic summarization (Takamura and Okumura, 2009; Lin and Bilmes, 2011; Kulesza and Taskar, 2012; Hong and Nenkova, 2014a). Apart from DUC 2004, systems were also evaluated on earlier years of the DUC data or on other data. Notably, many generic summarization systems evaluate their performance on benchmarks designed for query-focused or guided summarization, where these systems simply ignore the specifications (Berg-Kirkpatrick et al., 2011; Woodsend and Lapata, 2012; Almeida and Martins, 2013). However, it is not advisable to do this, since the human summaries used for comparison are designed to answer the query or address the guidance.
The task of automatic document summarization aims at finding the most relevant information in a text and presenting it in a condensed form. A good summary should retain the most important contents of the original document or cluster of documents, while being coherent, non-redundant and grammatically readable. There are two types of summarization: abstractive summarization and extractive summarization. Abstractive methods, which are still a growing field, are highly complex as they need extensive natural language generation to rewrite the sentences. Therefore, the research community is focusing more on extractive summaries, which select salient (important) sentences from the source document without any modification to create a summary. Summarization is classified as single-document or multi-document based upon the number of source documents. The information overlap between documents on the same topic makes multi-document summarization more challenging than the task of summarizing single documents.
Extractive text summarization extracts sentences from the original text to create the summary. This is usually done using some statistical analysis to score and rank sentences. The sentences that score highest become part of the summary. Abstractive text summarization, on the other hand, may not include words from the parent text: it models the language and context in order to generate new sentences. The main difference when building the two kinds of summarization tools is that abstractive summarization does not necessarily reuse pre-written text, but it does need a large amount of training data.
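The extractive score-and-rank step described above can be sketched as follows. This is only an illustration under simple assumptions: sentences are split on end punctuation, and a sentence's score is the average corpus frequency of its words. The function name `summarize_by_frequency` and the scoring rule are hypothetical choices, not a specific published system.

```python
import re
from collections import Counter

def summarize_by_frequency(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    # Word frequencies over the whole input text.
    freq = Counter(w for toks in tokenized for w in toks)

    def score(toks):
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]

example = "Cats chase mice. Cats sleep. Dogs bark loudly today."
summary = summarize_by_frequency(example, n_sentences=1)
```

Restoring the original sentence order after ranking is a small design choice that tends to keep the extracted summary readable.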
Traditionally, features for summarization were studied separately. Radev et al. (2004) reported that position and length are useful surface features. They observed that sentences located at the document head most likely contained important information. Recently, content features were also well studied, including centroid (Radev et al., 2004), signature terms (Lin and Hovy, 2000) and high frequency words (Nenkova et al., 2006). Radev et al. (2004) defined centroid words as those whose average tf*idf scores were higher than a threshold. Lin and Hovy (2000) identified signature terms that were strongly associated with documents based on statistical measures. Nenkova et al. (2006) later reported that high frequency words were crucial in reflecting the focus of the document.
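As an illustration of the centroid feature, the sketch below keeps the words whose average tf*idf over a document cluster exceeds a threshold and scores a sentence by summing the centroid values of its words. It assumes that idf values are supplied from some background corpus; the names `centroid_words`, `sentence_score`, and `default_idf` are hypothetical, and the details may differ from the implementation of Radev et al. (2004).

```python
from collections import Counter

def centroid_words(cluster, idf, threshold, default_idf=1.0):
    """Map each centroid word to its average tf*idf over the cluster.

    cluster   -- list of documents, each a list of tokens
    idf       -- dict of idf values, assumed precomputed on a background corpus
    threshold -- words with an average tf*idf at or below this value are dropped
    """
    totals = Counter()
    for doc in cluster:
        for word, count in Counter(doc).items():
            totals[word] += count * idf.get(word, default_idf)
    n = len(cluster)
    return {w: t / n for w, t in totals.items() if t / n > threshold}

def sentence_score(sentence_tokens, centroid):
    """Sum of the centroid values of the words in a sentence."""
    return sum(centroid.get(w, 0.0) for w in sentence_tokens)

cluster = [["quake", "hits", "city"], ["quake", "relief", "city"]]
idf = {"quake": 3.0, "city": 1.0, "hits": 2.0, "relief": 2.0}
centroid = centroid_words(cluster, idf, threshold=1.5)
```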
Human: The Canon G3 was received exceedingly well. Consumer reviews from novice photographers to semi-professional all listed an impressive number of attributes, they claim makes this camera superior in the market. Customers are pleased with the many features the camera offers, and state that the camera is easy to use and universally accessible. Picture quality, long lasting battery life, size and style were all highlighted in glowing reviews. One flaw in the camera frequently mentioned was the lens which partially obstructs the view through the view finder, however most claimed it was only a minor annoyance since they used the LCD screen.
As discussed in Section 1, a collection of documents often involves different topics related to a specific event. The basic idea of our summarization approach is to discover the latent topics and cluster sentences according to the topics. Inspired by (Chemudugunta et al, 2006) and (Li et al, 2011), we distinguish four types of words in the text: (1) Stop words that occur frequently in the text. (2) Background words that describe the general information about an event, such as "Quebec" and "independence". (3) Aspect words talking about topics across the corpus. (4) Document-specific words that are local to a single document and do not appear across the corpus. Similar ideas can also be found in many LDA-based summarization techniques (Haghighi and Vanderwende, 2009; Li et al, 2011; Delort and Alfonseca, 2012).
Inspired by the PageRank and HITS algorithms, much focus has been put on adapting graph-based ranking algorithms like LexRank (Erkan and Radev, 2004) and TextRank (Mihalcea and Tarau, 2004) to multi-document summarization. These algorithms generally employ the global information described by a passage affinity graph and recursively calculate each passage’s significance based on link structure analysis, stability-based random walk, global consistency or smoothness-based label propagation on the graph. Topic-sensitive LexRank (Haveliwala, 2003) extended the traditional LexRank algorithm by integrating the similarity between sentences and the given query. Wan et al. (2007) adopted a manifold-ranking algorithm to rank sentences by considering global information and emphasizing the high biased information richness in a score propagation process.
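The recursive significance computation on a sentence affinity graph can be sketched with a PageRank-style power iteration. This is a simplified illustration in the spirit of LexRank and TextRank, not a faithful reimplementation of either: the bag-of-words cosine similarity, the damping value, and the function names `cosine` and `graph_rank` are assumptions of the sketch.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def graph_rank(sentences, damping=0.85, iterations=50):
    """Score sentences by a damped random walk over their similarity graph."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    # Weighted affinity matrix without self-loops.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            # Isolated sentences (row sum 0) pass nothing on in this sketch.
            incoming = sum(scores[j] * sim[j][i] / row_sums[j]
                           for j in range(n) if row_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

ranked = graph_rank(["the storm hit the coast",
                     "the storm damaged the coast",
                     "stocks rose sharply"])
```

In this toy input the two storm sentences reinforce each other, while the unrelated third sentence only receives the baseline share of the score.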
Impressive progress has been made on neural abstractive summarization using encoder-decoder models (Rush et al., 2015; See et al., 2017; Paulus et al., 2017; Chen and Bansal, 2018). These models, nonetheless, are data-hungry and learn poorly from small datasets, as is often the case with multi-document summarization. To date, studies have primarily focused on single-document summarization (See et al., 2017; Celikyilmaz et al., 2018; Kryscinski et al., 2018) and sentence summarization (Nallapati et al., 2016; Zhou et al., 2017; Cao et al., 2018; Song et al., 2018) in part because parallel training data are abundant and they can be conveniently acquired from the Web. Further, a notable issue with abstractive summarization is reliability. These models are equipped with the capability of generating new words not present in the source. With greater freedom of lexical choices, the system summaries can contain inaccurate factual details and falsified content that prevent them from staying “true-to-original.”
A multi-document summarization system aims to generate a single summary from an input set of documents. The input documents may have been obtained, for example, by submitting a query to an information retrieval engine and retaining the most highly ranked documents, or by clustering the documents of a large collection and then using each cluster as a set of documents to be summarized. Although evaluations with human judges also examine the coherence, referential clarity, grammaticality, and readability of the summaries (Dang, 2005, 2006; Dang and Owczarzak, 2008), and some of these factors have also been considered in recent summarization algorithms (Nishikawa et al., 2010b; Woodsend and Lapata, 2012), most current multi-document summarization systems consider only the importance of the summary’s sentences, their non-redundancy (also called diversity), and the summary length (McDonald, 2007; Berg-Kirkpatrick et al., 2011; Lin and Bilmes, 2011).
sentences labeled as summary-worthy. To overcome this, several studies have used artificial reference summaries (Sun et al., 2005; Svore et al., 2007; Woodsend and Lapata, 2010; Cheng and Lapata, 2016) compiled by collecting documents and corresponding highlights from other sources. However, preparing such a parallel corpus often requires domain-specific or expert knowledge depending on the domain (Filippova et al., 2009; Parveen et al., 2016). Our summarization uses document-associated information as pseudo rough reference summaries, which enables us to learn feature representations for both document classification and sentence identification with smaller amounts of actual reference summaries.
Results Table 3 shows the results. The proposed optimization significantly and systematically improves TF*IDF performance as we expected from our analysis in the previous section. This result suggests that using only a frequency signal in source documents is enough to get high-scoring summaries, which supports the common belief that frequency is one of the most useful features for generic news summarization. It also aligns well with the strong performance of ICSI, which combines an ILP step with frequency information as well.
We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
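The greedy extraction step mentioned above can be illustrated with a short sketch. It assumes that salience scores have already been produced by some model, and it uses Jaccard word overlap with a fixed cutoff as the redundancy test; both the similarity measure and the names `greedy_select`, `max_sentences`, and `max_similarity` are assumptions of this sketch rather than details taken from the system described here.

```python
def greedy_select(sentences, salience, max_sentences=3, max_similarity=0.5):
    """Pick sentences in order of salience, skipping near-duplicates."""

    def tokens(sentence):
        return set(sentence.lower().split())

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Visit candidates from most to least salient.
    order = sorted(range(len(sentences)), key=lambda i: salience[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) >= max_sentences:
            break
        candidate = tokens(sentences[i])
        # Keep the candidate only if it is not too similar to anything chosen so far.
        if all(jaccard(candidate, tokens(sentences[j])) < max_similarity
               for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]

picked = greedy_select(["a b c d", "a b c e", "x y z"],
                       salience=[0.9, 0.8, 0.1],
                       max_sentences=2)
```

Here the second sentence is skipped despite its high salience because it overlaps heavily with the first, so the lower-scoring but novel third sentence is selected instead.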