The main properties of the Opinosis-Graph that allow it to be used to generate abstractive summaries are redundancy capture, gapped subsequence capture and collapsible structure. The graph captures redundant information through its sub-graphs: in the example above, the phrase ‘a great device’ is mentioned in both sentences 1 and 3. Sentence structure introduces lexical links that make it easy to discover new sentences or reinforce existing ones. For illustration, in the example above ‘drop frequently’ is expressed in different ways, which introduces a lexical link between ‘drop’ and ‘frequently’, so ‘too’ can be ignored in sentence 3. This is similar to capturing a repetitive gapped subsequence, in which similar sequences of words recur with minor variations [2]. In the graph, a node that connects to many other nodes can act as a hub that is possibly collapsible. In the example above, the sub-graph ‘the iPhone is’ is fairly heavy and the node ‘is’ acts as a hub, which makes it a good candidate for compression to generate a summary such as ‘The iPhone is a great device and is worth the price’ [2].
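The redundancy and gapped-subsequence properties described above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the construction from [2]: nodes here are plain lowercased words rather than word and POS pairs, the three sentences are made up to mirror the example, and the names build_word_graph, path_redundancy and max_gap are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentences chosen to mirror the example discussed in the text.
SENTENCES = [
    "the iphone is a great device",
    "calls drop frequently with the iphone",
    "a great device but calls drop too frequently",
]

def build_word_graph(sentences):
    """Build a simplified word graph.

    Each node is a lowercased word that stores (sentence_id, position)
    references; a directed edge links two words that appear consecutively.
    """
    positions = defaultdict(list)  # word -> [(sentence_id, position), ...]
    edges = defaultdict(set)       # word -> set of words that follow it
    for sid, sentence in enumerate(sentences, start=1):
        words = sentence.lower().split()
        for pos, word in enumerate(words):
            positions[word].append((sid, pos))
            if pos + 1 < len(words):
                edges[word].add(words[pos + 1])
    return positions, edges

def path_redundancy(positions, path, max_gap=2):
    """Count the sentences that contain the words of `path` in order,
    allowing consecutive path words to be at most `max_gap` positions apart."""
    current = set(positions.get(path[0], []))
    for word in path[1:]:
        reachable = set()
        for sid, pos in positions.get(word, []):
            if any(s == sid and 0 < pos - p <= max_gap for s, p in current):
                reachable.add((sid, pos))
        current = reachable
    return len({sid for sid, _ in current})

positions, edges = build_word_graph(SENTENCES)
print(path_redundancy(positions, ["a", "great", "device"]))
print(path_redundancy(positions, ["drop", "frequently"], max_gap=2))
print(path_redundancy(positions, ["drop", "frequently"], max_gap=1))
```

With a gap of 2, ‘drop frequently’ is supported by both sentence 2 and sentence 3 even though ‘too’ intervenes in sentence 3; with a gap of 1 only sentence 2 supports it.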
The summarization approach they follow repeatedly searches the Opinosis graph for sub-graphs that encode a valid sentence and have high redundancy scores; together these form an abstractive summary. The summary is a shallow form of abstraction, since it can only contain words that occur in the text; however, it has elements of fusion and compression, so the generated sentences are generally not the same as the original ones. They define a valid path as one that intuitively corresponds to a meaningful sentence. Formally, a path W={vq…vs} is valid if it is connected by a set of directed edges such that vq is a valid start node, vs is a valid end node and W satisfies a set of well-formedness POS constraints [2]. A valid start node is a node that is a natural starting point of a sentence, while a valid end node is a node that completes a sentence; it could be a punctuation mark such as a period or comma, or a coordinating conjunction like ‘but’ or ‘yet’ [2]. They used the following POS constraints to ensure that a valid path is a well-formed sentence. These rules apply to comparative sentences, which makes them application specific. Paths are filtered so that they match at least one of the rules; for example, under the first rule a sentence must contain a noun followed by a verb and then an adjective, with other words allowed in between and at the end [2]:
1. .*(/nn)+.*(/vb)+.*(/jj)+.*
2. .*(/jj)+.*(/to)+.*(/vb).*
3. .*(/rb)*.*(/jj)+.*(/nn)+.*
4. .*(/rb)+.*(/in)+.*(/nn)+.*
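One way to read these four rules is as regular expressions over a path written as a string of word/tag tokens. The sketch below makes that reading concrete. It assumes lowercased Penn Treebank style tags that are supplied by hand, and the names POS_RULES and is_well_formed are hypothetical; it is not the implementation used in [2].

```python
import re

# The four rules above, applied to a "word/tag word/tag ..." string.
# Assumption: tags are lowercased, so "/vb" also matches "/vbz", "/vbd", etc.
POS_RULES = [
    r".*(/nn)+.*(/vb)+.*(/jj)+.*",
    r".*(/jj)+.*(/to)+.*(/vb).*",
    r".*(/rb)*.*(/jj)+.*(/nn)+.*",
    r".*(/rb)+.*(/in)+.*(/nn)+.*",
]
COMPILED_RULES = [re.compile(rule) for rule in POS_RULES]

def is_well_formed(tagged_path):
    """Return True if the tagged path matches at least one POS rule.

    `tagged_path` is a list of (word, tag) pairs, tagged by hand here.
    """
    text = " ".join(f"{word}/{tag}" for word, tag in tagged_path).lower()
    return any(rule.fullmatch(text) for rule in COMPILED_RULES)

# Noun, then verb, then adjective: satisfies the first rule.
print(is_well_formed([("iphone", "NN"), ("is", "VBZ"), ("great", "JJ")]))
# Determiner plus noun only: matches none of the four rules.
print(is_well_formed([("the", "DT"), ("iphone", "NN")]))
```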
They give a path a redundancy score equal to the number of overlapping sentences covered by that path. They also score redundancy weighted by path length, since intuitively a longer path is of more value. In addition, some paths are collapsible; such a path is collapsed first and then scored. Their summarization algorithm is as follows. The algorithm starts by constructing the graph. A depth-first traversal of the graph then locates valid paths: for each node, it checks whether the node is a valid start node and, if so, invokes the traverse procedure, which finds valid paths and accumulates their scores. Once all paths are explored, duplicate paths are eliminated using a similarity measure, namely Jaccard, and the remaining paths are ranked in descending order of their scores. The top few paths are then picked as the final summary, with the number of paths controlled by a parameter that represents the summary size [2].
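The final selection step (rank by score, drop near-duplicates with Jaccard similarity, keep the top few) can be sketched as follows. This is a greedy approximation under stated assumptions rather than the procedure from [2]: the candidate sentences, their scores, the max_similarity threshold and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def select_summary(scored_paths, summary_size=2, max_similarity=0.5):
    """Pick the highest-scoring paths, skipping near-duplicates.

    `scored_paths` is a list of (sentence, score) pairs. A path is skipped
    when its Jaccard similarity to an already selected path exceeds
    `max_similarity` (an assumed threshold).
    """
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)
    summary = []
    for sentence, _score in ranked:
        if len(summary) >= summary_size:
            break
        if all(jaccard(sentence, kept) <= max_similarity for kept in summary):
            summary.append(sentence)
    return summary

# Hypothetical candidate paths with made-up scores.
CANDIDATES = [
    ("the iphone is a great device", 4.2),
    ("the iphone is a great device and is worth the price", 5.1),
    ("calls drop frequently", 3.0),
    ("the battery life is short", 1.5),
]
print(select_summary(CANDIDATES, summary_size=2))
```

Here the shorter ‘the iphone is a great device’ path is dropped because it overlaps heavily with the higher-scoring collapsed path, so the second summary slot goes to ‘calls drop frequently’.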
For evaluation, they collected reviews from Tripadvisor, Amazon and Edmunds about hotels, cars and various products. Based on these reviews, two human evaluators were asked to construct opinion-seeking queries, each consisting of an entity name and a topic of interest, for example “Amazon Kindle:buttons” [2]. They compiled a list of 51 queries and created one document per query by collecting all review sentences that contain the query words for the entity name. Each document consisted of approximately 100 unordered, redundant review sentences. They used the ROUGE system for evaluation, which is based on n-gram co-occurrence between machine summaries and human summaries, and reported ROUGE-1, ROUGE-2 and ROUGE-N scores. They asked five different human workers to summarize each review document and, after manually reviewing the summaries, retained around four reference summaries for each document. Because little prior work on abstractive summarization was available for comparison, they used the MEAD extractive summarizer as a baseline. They also devised a readability test to evaluate the generated summaries. They mixed sentences from human summaries with system-generated sentences and asked human judges to pick the N least readable sentences; if the judges do not often pick the system-generated ones, the generated summary is considered good. The following tables compare Opinosis with the human summaries and the baseline (MEAD). They show that Opinosis performs closer to the human summaries than the baseline does [2]. In addition, the readability tests showed that 60% of the generated sentences are indistinguishable from the human sentences [2].
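To make the n-gram co-occurrence idea behind ROUGE concrete, the following is a minimal sketch of ROUGE-N recall against multiple reference summaries. It is a simplification that assumes whitespace tokenization with no stemming or stop-word handling, it is not the official ROUGE toolkit, and the example sentences are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall: clipped n-gram matches between the
    candidate and each reference, divided by the total number of
    reference n-grams."""
    candidate_counts = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for reference in references:
        reference_counts = ngrams(reference.lower().split(), n)
        matched += sum(min(count, candidate_counts[gram])
                       for gram, count in reference_counts.items())
        total += sum(reference_counts.values())
    return matched / total if total else 0.0

# Hypothetical machine summary and human reference summaries.
CANDIDATE = "the iphone is a great device"
REFERENCES = ["the iphone is great", "a great device"]
print(rouge_n_recall(CANDIDATE, REFERENCES, n=1))
print(rouge_n_recall(CANDIDATE, REFERENCES, n=2))
```

In this toy case every reference unigram appears in the candidate, while one reference bigram (‘is great’) does not, so the bigram score is lower than the unigram score.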