3.3 A generic framework for aggregated search
3.4.2 Natural Language Generation
Given an information need (usually a question), the goal of NLG is to generate an answer with the right information in an appropriate linguistic form [130]. This can demand merging content from different documents. So far, natural language generation addresses a limited range of information needs, and research focuses on application-specific needs.
The NLG perspective addresses issues such as the comprehensibility of the answer [68], its coherence, and its structure. McKeown [116] recognizes the need for a prototypical structure for certain queries. Returned facts can be organized based on the relationships among them. For instance, they can be ordered chronologically, they can be linked by a cause-effect relation, or background information can be shown first [134].
Paris et al. [132, 133, 131] show the benefits and applications of natural language generation approaches across different domains. In [132], they use discourse planning to automatically generate surveillance reports tailored to various contexts and tasks. Document generation is treated as a goal that can be decomposed into sub-goals represented through a discourse tree. However, the information need in this application is implicit rather than expressed as an explicit query. In [133], the authors use a similar approach for a travel application. In this work, they incorporate query-based search: the user can indicate a target destination as well as budget constraints. The third application is called Scifly and it produces brochures on demand. The user's query is the name of an organisation and the result is a generated brochure containing several text passages. The effectiveness of this approach is evaluated through user studies.
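The idea of treating document generation as a goal decomposed into sub-goals can be sketched with a small tree structure. The following is only an illustration under assumed names, not the planner used in [132]: the class `Goal`, the function `realize`, the goal labels, and the passage strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    """A node of a (hypothetical) discourse tree."""
    name: str
    content: Optional[str] = None  # text realized at a leaf goal
    subgoals: List["Goal"] = field(default_factory=list)


def realize(goal: Goal) -> List[str]:
    """Traverse the tree depth-first, left to right.

    The order of the sub-goals fixes the order of the text passages
    in the generated document.
    """
    if not goal.subgoals:
        return [goal.content] if goal.content is not None else []
    passages: List[str] = []
    for sub in goal.subgoals:
        passages.extend(realize(sub))
    return passages


# Illustrative report goal decomposed into sub-goals (made-up labels).
report = Goal("produce report", subgoals=[
    Goal("give background", content="Background passage."),
    Goal("describe events", subgoals=[
        Goal("event 1", content="First event passage."),
        Goal("event 2", content="Second event passage."),
    ]),
    Goal("conclude", content="Summary passage."),
])

document = realize(report)
```

In this sketch the tree is fixed by hand. In a discourse planner, the decomposition itself would depend on the context and task, which is what allows the same top-level goal to yield different documents.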
Instead of summaries, Szlávik et al. [169] aggregate content from different documents, which they show preceded by a table of contents. This approach facilitates navigation, but it does not guarantee any coherence among the assembled content.
Sauper et al. [155] propose an approach to automatically generate Wikipedia-like documents. They study the document structure for specific classes of information, such as diseases and actors, from Wikipedia. The learned document structure is then used to automatically build missing documents for other instances of the class. For instance, documents on diseases commonly have sections on causes, treatment, and symptoms. For a disease that is not present in Wikipedia, a document with the same structure is generated by extracting information from the Web.
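The two steps described above (learn a section structure from existing documents of a class, then fill it for a new instance) can be sketched as follows. This is a deliberately simplified illustration: a plain frequency count stands in for the structure learning of [155], and the function names `learn_template` and `build_document`, the `min_support` parameter, the toy section lists, and the stub retrieval function are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Dict, List


def learn_template(documents: List[List[str]], min_support: float = 0.5) -> List[str]:
    """Keep section titles found in at least `min_support` of the documents.

    Titles are ordered by decreasing frequency, with ties broken alphabetically.
    """
    counts = Counter(title for doc in documents for title in set(doc))
    threshold = min_support * len(documents)
    frequent = [(title, n) for title, n in counts.items() if n >= threshold]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [title for title, _ in frequent]


def build_document(
    entity: str,
    template: List[str],
    retrieve: Callable[[str, str], str],
) -> Dict[str, str]:
    """Fill each template section with content retrieved for the entity."""
    return {section: retrieve(entity, section) for section in template}


# Toy section lists for existing documents of the class "disease".
disease_docs = [
    ["causes", "symptoms", "treatment"],
    ["symptoms", "treatment", "history"],
    ["causes", "symptoms", "treatment", "prognosis"],
]


def fake_retrieve(entity: str, section: str) -> str:
    """Stub standing in for Web retrieval of a passage."""
    return f"[passage about {section} of {entity}]"


template = learn_template(disease_docs)
doc = build_document("example disease", template, fake_retrieve)
```

The retrieval step is passed in as a function so that the template logic stays independent of how passages are actually found and selected.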
To summarize, natural language generation approaches also fall within the general aggregated search framework. We can observe that queries can be dispatched to different solutions, and that these approaches involve various forms of nugget retrieval and result aggregation.
3.4.3 Relational aggregated search
In this class of approaches, we place a generalization of relational search [35] and entity-oriented search [19]. We consider it a search paradigm that relies on relations between different information nuggets. In addition to nugget retrieval, relational aggregated search retrieves relations. The latter can be valuable for result aggregation.
To better understand this new paradigm, we will first present entity-oriented search and relational search. The first one is more about retrieving entities, while the second is more about retrieving their relations. The combination of both enables what we call relational aggregated search.
∙ Entity-Oriented search
Named entities are common concepts which belong to categories such as locations, person names, organisations, etc. They are also called class instances [9, 93]. They are particularly common in text and queries. In a recent work, Kato et al. [87] found that about 71% of Web search queries contain named entities. Another recent study [25] on query logs found that about 73% to 87% of the queries contain named entities and that about 18% to 39% of the queries are themselves named entities. Given the importance of named entities and their frequent occurrence in queries, there is an increasing interest in retrieving them as well as retrieving content for them.
Instead of a list of documents, in entity-oriented search the result is a list of entities [19]. This is useful when we cannot name some entity (e.g. actor playing in Pretty Woman) or when we want to find related entities (e.g. entities similar to Nokia e72).
From a broader perspective, we can consider that information is aggregated around entities. When we query about an entity, we can then return a lot of information concerning it. In the literature, there are many approaches that take an entity as a query and return related content such as the Wikipedia page of the entity [19, 20], images [170], the social network profile of a person [195], etc.
In [24], the authors define the notion of a composite item as the conjunction of an entity and related compatible entities. For example, a user shopping for an iPhone can be presented with a composite item containing the iPhone and a list of gadgets that match the iPhone, all within the user’s budget. The approach is interesting, but the authors focus less on the retrieval process than on aggregation under given constraints.
Another name for entity-oriented search is object-level search [126, 125]. In the latter, the main target is to extract and assemble information around an entity. The result of this aggregation is referred to as an object. This enables returning pre-built objects instead of documents.
∙ Relational search: In Information Extraction (IE), it is common to extract and relate content from documents. Existing approaches can extract named entities such as person names, locations, organisations, etc., but also their relations such as “John” works for “Motorola”. Information retrieval based on these extracts is also known as relational search.
In [35], the authors identify different types of queries that can be answered with relational search. To illustrate, consider examples such as “French wines”, “capital of France”, and “features of iPhone” [93]. The first query can be answered with a list of instances (named entities). The second query can be answered with an attribute value, while the third can be answered with many attributes (names and values).
Relational search is enabled by information extraction techniques [9] and mining within semi-structured data [36]. Existing techniques can discover many information extracts and their relations. Nevertheless, their use for information retrieval remains limited.
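The three query types mentioned for relational search (a list of instances, a single attribute value, and several attributes of one instance) can be illustrated over a tiny set of extracted relations. This is a minimal sketch under stated assumptions, not the method of [35] or [93]: the triple representation, the relation names such as `instance_of`, the function names, and the handful of hand-written triples are all chosen for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hand-written stand-in for relations produced by information extraction.
TRIPLES: List[Triple] = [
    ("Bordeaux", "instance_of", "French wine"),
    ("Champagne", "instance_of", "French wine"),
    ("France", "capital", "Paris"),
    ("iPhone", "manufacturer", "Apple"),
    ("iPhone", "operating_system", "iOS"),
]


def instances_of(cls: str) -> List[str]:
    """Answer a class query such as "French wines" with a list of instances."""
    return [s for s, r, o in TRIPLES if r == "instance_of" and o == cls]


def attribute_value(entity: str, attribute: str) -> List[str]:
    """Answer an attribute query such as "capital of France"."""
    return [o for s, r, o in TRIPLES if s == entity and r == attribute]


def attributes_of(entity: str) -> Dict[str, str]:
    """Answer a query such as "features of iPhone" with name/value pairs."""
    return {r: o for s, r, o in TRIPLES if s == entity and r != "instance_of"}


wines = instances_of("French wine")
capital = attribute_value("France", "capital")
features = attributes_of("iPhone")
```

Mapping a free-text query onto one of these three functions is itself a nontrivial step that the sketch leaves out.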
Because retrieving information nuggets and their relations is possible and because this can enable flexible result aggregation, we consider relational aggregated search as one of the most promising research directions of aggregated search. We will study relational aggregated search in detail in Chapter 4.