I present here some of the most common commercial and academic summarization systems and give a brief description of each. I also highlight the target application of each system and the languages it can handle. Table 2.1 summarizes the main public automatic summarizers and highlights their features.
System | Type and languages | Availability
MEAD | Multi-document, multilingual | Downloadable from http://www.clsp.jhu.edu/ws2001/groups/asmd/
NewsInEssence | Online news summarization version of MEAD, English only | Online version: http://www.newsinessence.com/
Newsblaster | Multi-source, multilingual | Online version: http://newsblaster.cs.columbia.edu/
Condensr | Multi-source, English only; restaurant review summarization and sentiment analysis | Online version: http://www.condensr.com
Open Text Summarizer | Multi-document, multilingual | Downloadable from http://libots.sourceforge.net/
Copernic’s Summarizer | Multi-document, multilingual, commercial | Trial version available at http://www.copernic.com/en/products/summarizer/index.html
Intellexer’s Summarizer | Multi-document, multilingual, commercial | Trial version available at http://summarizer.intellexer.com/
Essential Summarizer | Multi-document, multilingual, commercial | Trial version available at https://essential-mining.com/es/produits.jsp?ui.lang=en
SSSearch | Multi-document, multilingual, commercial | Trial version available at http://www.kryltech.com/summarizer.htm
Microsoft Word Auto Summarize | Multilingual, single-document summarizer | Available as a function in the Word application
Centrifuser | Domain- and genre-specific multi-document summarizer | Available from http://www1.cs.columbia.edu/nlp/tools.cgi
QCS | Query, Cluster and Summarize; multi-document | Online demo with limited access: http://stiefel.cs.umd.edu:8080/qcs/index.html
2.4.1 MEAD
MEAD [102] is a multi-document extractive summarizer that scores sentences according to a linear combination of features, including centroid, position and first-sentence overlap. These scores are then refined to account for cross-sentence dependencies, chronological order and user-supplied parameters. Initially, documents are segmented into clusters, each covering a distinctive theme. Then, all input documents are represented as TF-IDF vectors. Other features are factored in at subsequent stages to help assign a score to each sentence. The overall score Si of a sentence i is computed as the weighted sum of the considered features as follows:
\[ S_i = w_1 C_i + w_2 F_i + w_3 L_i \tag{2.1} \]
where Ci is the similarity score between sentence i and the theme of the cluster it belongs to, Fi is the similarity score between sentence i and the first sentence of the document it belongs to, and Li is the position score of sentence i. w1, w2 and w3 are the weights assigned to each feature. After Si is initially computed, sentence i is re-scored to take into account redundancy in the summary. Because all documents are modelled as bags of words (BOW), the summarizer is multilingual and domain independent. Further details about the summarizer and its implementation can be found in [102], [103] and [104]. It should be noted, though, that at least two of its features give higher weight to sentences at the beginning of a document. This makes the summarizer most suitable for news articles, where authors tend to place the most important sentences at the beginning.
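The weighted sum in Equation 2.1 can be illustrated with a minimal Python sketch. It is not MEAD's implementation: the tokenizer, the use of raw term counts instead of TF-IDF, the linear position decay used for Li, the equal default weights and all function names are assumptions made for illustration, and the redundancy re-scoring step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(documents, w1=1.0, w2=1.0, w3=1.0):
    """Return (score, doc_index, sentence_index, sentence) tuples, best first.

    documents: list of documents, each a list of sentence strings,
    all assumed to belong to the same cluster.
    """
    # Cluster theme (centroid): summed term counts over the whole cluster.
    centroid = Counter()
    for doc in documents:
        for sentence in doc:
            centroid.update(tokenize(sentence))

    scored = []
    for d, doc in enumerate(documents):
        if not doc:
            continue
        first = Counter(tokenize(doc[0]))
        n = len(doc)
        for k, sentence in enumerate(doc):
            bow = Counter(tokenize(sentence))
            c = cosine(bow, centroid)  # Ci: similarity to the cluster theme
            f = cosine(bow, first)     # Fi: overlap with the first sentence
            pos = (n - k) / n          # Li: assumed linear position decay
            scored.append((w1 * c + w2 * f + w3 * pos, d, k, sentence))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

With equal weights, the first sentence of a document receives the maximum value for both Fi and Li, which makes the bias towards document-initial sentences discussed above directly visible in the ranking.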
2.4.2 Newsblaster
Newsblaster is a multi-source, multilingual summarization system [105]. It has an online demo that helps users find the news they are most interested in. It crawls the web to read news articles from different sources, clusters and categorizes these articles, and provides an update for each cluster. It applies different types of summarizers to the collected article clusters, depending on the detected article types. For example, single-event articles are summarized by integrating machine learning and statistical techniques to identify similar sentences across the processed articles [106]. It also uses a cut-and-paste method for extracting important phrases from sentences and adding them to the summary [105].
2.4.3 QCS
Given a query, the Query, Cluster and Summarize (QCS) system [107] separates the retrieved documents into topic clusters and creates a summary for each cluster. LSA is used for document retrieval, spherical k-means for clustering and an HMM-based module for extractive summarization. The system has an online demo whose access is limited to the DUC collection dataset and MEDLINE documents.
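The clustering stage can be illustrated with a minimal sketch of spherical k-means, which groups unit-length vectors by cosine similarity. This is not the QCS code: the function names, the seeding with the first k vectors and the fixed iteration limit are assumptions made for illustration, and the LSA retrieval and HMM summarization stages are not shown.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster vectors by cosine similarity; returns one cluster label per vector.

    Assumption: centroids are seeded with the first k vectors, which keeps
    the sketch deterministic.
    """
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _step in range(iterations):
        # Assignment step: on the unit sphere the dot product is the cosine.
        new_labels = [
            max(range(len(centroids)), key=lambda c: dot(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: sum the members of each cluster and re-normalize.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                summed = [sum(col) for col in zip(*members)]
                centroids[c] = normalize(summed)
    return labels if labels is not None else []
```

Because every vector is projected onto the unit sphere first, document length does not influence the clustering; only the direction of the term vector does.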
2.4.4 MASC
MASC [108] is a multi-document summarizer that generates Multiple Alternative Sentence Compressions (MASC), instead of unaltered source sentences, as candidate summary components. It uses weighted features of the candidates to select among them and construct summaries. MASC differs from MEAD and many other summarizers in that multiple variants of a single source sentence are available to the sentence selector for inclusion in the summary. The system was built on top of two other systems that use different techniques for compressing sentences. One is
sentence. The second is called Trimmer [24] and uses syntactic rules to compress sentences. The summarizer is monolingual: it can only be applied to English, because of the syntactic rules it applies and the language-dependent models built for compressing sentences.
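The idea of offering several compressed variants of one source sentence to the selector can be sketched as follows. This is only an illustration under stated assumptions, not the MASC selector: the feature names, the greedy strategy, the word budget and the rule that at most one variant per source sentence enters the summary are all hypothetical.

```python
def select_candidates(candidates, weights, max_words):
    """Greedily pick high-scoring candidates within a word budget.

    candidates: list of (source_id, text, features) tuples; several
    candidates may share a source_id because they are alternative
    compressions of the same source sentence.
    weights: dict mapping feature names to their weights.
    """
    def score(features):
        # Weighted sum of the candidate's feature values.
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    ranked = sorted(candidates, key=lambda cand: score(cand[2]), reverse=True)
    summary, used_sources, words = [], set(), 0
    for source_id, text, _features in ranked:
        length = len(text.split())
        # Assumption: at most one variant of each source sentence is used.
        if source_id in used_sources or words + length > max_words:
            continue
        summary.append(text)
        used_sources.add(source_id)
        words += length
    return summary
```

A shorter variant can win over the original sentence when the (hypothetical) compression feature is rewarded, which leaves room in the budget for material from other sentences.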
2.4.5 Condensr
This system provides extractive multi-document summaries and sentiment analysis [109]. It leverages document structure, cue words and phrases, and contextual information to build an HMM-based model that also aids summarization. It is designed primarily to handle reviews and has an online demo for summarizing restaurant reviews and viewing the reviewers’ sentiments.
2.4.6 Open Text Summarizer
The Open Text Summarizer is an open-source tool that analyzes texts in various languages, tries to identify the most important parts of the text and presents them in a summary. It works by first removing stop words from the text and stemming all terms. Then, a weight is assigned to each word based on its frequency, and the sentences with the highest-weighted terms are chosen for the summary. It has a downloadable version in addition to an online one. It also ships with several Linux distributions, such as Ubuntu and Fedora.
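The frequency-based procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Open Text Summarizer's code: the stop-word list, the crude suffix-stripping stemmer, the sentence splitter and the scoring by summed term frequency are assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not the tool's dictionary.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(sentence):
    """Lowercase, tokenize, drop stop words and stem the remaining words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word weight: frequency of the stemmed term over the whole text.
    weights = Counter(t for s in sentences for t in terms(s))
    # Sentence score: sum of the weights of the terms it contains.
    scores = [sum(weights[t] for t in terms(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(top[:max_sentences])
    return [sentences[i] for i in keep]
```

Summing term weights favours longer sentences; dividing each score by the number of terms in the sentence would be one way to offset this, at the cost of favouring very short sentences instead.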
2.4.7 Commercial Summarizers
Copernic’s summarization software is multilingual and commercially available on the company’s website. It claims to use statistical and linguistic algorithms to provide
claims to create theme-oriented, structure-oriented and concept-oriented summaries.
Microsoft’s Auto Summarize feature in its widely used Word application provides multilingual single-document summarization. The Essential Summarizer, provided by Essential Mining, is a cross-lingual multi-document summarizer. It covers 20 languages and can provide translated summaries in a language different from that of the input documents. The system renders the sentences of a summary in different font sizes to indicate their importance: the larger the font, the more essential the sentence. Subject Search Summarizer (SSSearch) is another multilingual, multi-document commercial summarizer.