3.7 Experimental Evaluation
3.7.3 Experimental Results
We evaluate S3 by comparing its performance with that of the original version of the SS algorithm. In particular, for each set of queries in the dataset described above (short, medium, and long), we computed the average relatedness and the average diversity.
Figure 3.4 shows the average relatedness computed over the queries q belonging to each set. The results confirm the validity of our intuition: for all three sets, the scores obtained by S3 are higher than those obtained with the SS suggestions. It is worth observing that the entities suggested by S3 are potentially completely different from the entities annotated in the suggestions of SS. In fact, while in SS we exploit only the entities in the titles, in S3 we leverage all the entities in the whole virtual document, using the virtual document relevance to boost
⁷ Interested readers can download it from:
[Figure 3.4: bar chart of Relatedness (y-axis) by Query Set (x-axis: short, medium, long) for S3 and SS. Bar labels in extraction order: 0.32, 0.25, 0.23, 0.31, 0.19, 0.16.]
Figure 3.4: Per-set average relatedness computed between the list of suggestions and the given query.
[Figure 3.5: bar chart of Diversity (y-axis) by Query Set (x-axis: short, medium, long) for S3 and SS. Bar labels in extraction order: 0.62, 0.63, 0.69, 0.33, 0.38, 0.59.]
Figure 3.5: Per-set average diversity computed between the list of suggestions and the given query.
the most important entities. Furthermore, the longer the query, the more difficult it is to suggest related queries. This happens because long queries occur less frequently in the log, and thus we have less information from which to generate the suggestions. Looking at the individual sets, the highest gain of S3 in terms of average relatedness is obtained for medium and long queries: this means that relying on entities makes it possible to mitigate the sparsity of user data.
Figure 3.5 reports the average diversity of the suggestions over the queries of each set. Here we observe the opposite trend: the longer the queries, the more terms and entities they contain, and the more the suggestions differ from one another. Furthermore, we observe that, for the most frequent queries, SS performs much worse than S3. This happens because, for frequent queries, SS tends to retrieve popular reformulations of the original query and thus does not diversify the returned suggestions. S3 does not suffer from this problem since it works with entities, which naturally diversifies the list of suggestions. We leave as future work the study of a strategy for suggesting entities that aims at maximizing the diversity of a list of suggestions.
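The per-set gain in relatedness can be checked with simple arithmetic. The sketch below assumes that, of the six bar labels extracted from Figure 3.4, the first three (0.32, 0.25, 0.23) belong to S3 and the last three (0.31, 0.19, 0.16) to SS; this pairing is inferred from the statement that S3 always scores higher and is not given explicitly in the extracted figure data. The function name `gains` is introduced here for illustration only.

```python
# Average relatedness per query set, read from the bar labels of Figure 3.4.
# ASSUMPTION: the first three labels are S3 and the last three are SS.
RELATEDNESS = {
    "short": {"S3": 0.32, "SS": 0.31},
    "medium": {"S3": 0.25, "SS": 0.19},
    "long": {"S3": 0.23, "SS": 0.16},
}


def gains(scores):
    """Absolute gain of S3 over SS for each query set, rounded to 2 decimals."""
    return {name: round(v["S3"] - v["SS"], 2) for name, v in scores.items()}


if __name__ == "__main__":
    for name, gain in gains(RELATEDNESS).items():
        print(f"{name}: {gain:+.2f}")
```

Under this assumed pairing, the gain is smallest for short queries and largest for medium and long ones, which is consistent with the observation above.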
Let us clarify with the example in Table 3.2 how the two techniques behave differently. Given the query “dante”, SS returns the following suggestions: dante banquet, dante boska komedia, dante paradiso, dante kupferstich, dante’s divina, dante divine comedy, dante ali, dante alle. Note the previously highlighted behavior of SS: the suggestions it produces are often reformulations of the same query. S3, instead, is able to expand the set of suggestions to the entities Divine Comedy, Dante Falconeri, Italian battleship Dante Alighieri, Inferno (Dante), Ludovico Ariosto, Sándor Petőfi, Petrarch, and Convivio, with an average relatedness of 0.48 (SS: 0.43) and a diversity of 0.40 (SS: 0.10).
SS Suggestions         S3 Suggestions
dante banquet          Divine Comedy
dante boska komedia    Dante Falconeri
dante paradiso         Italian battleship Dante Alighieri
dante kupferstich      Inferno (Dante)
dante’s divina         Ludovico Ariosto
dante divine comedy    Sándor Petőfi
dante ali              Petrarch
dante alle             Convivio
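To make the "reformulations of the same query" effect concrete, the following sketch computes a purely lexical diversity score over the two suggestion lists for the query "dante". This is not the diversity metric used in this chapter: as an assumption made only for illustration, it takes the mean pairwise Jaccard dissimilarity between the token sets of the suggestions, so its values are not comparable with the diversity figures reported above. The helper names `tokens` and `lexical_diversity` are hypothetical.

```python
import re
from itertools import combinations

# Suggestion lists for the query "dante", as listed in the text.
SS = ["dante banquet", "dante boska komedia", "dante paradiso",
      "dante kupferstich", "dante’s divina", "dante divine comedy",
      "dante ali", "dante alle"]
S3 = ["Divine Comedy", "Dante Falconeri",
      "Italian battleship Dante Alighieri", "Inferno (Dante)",
      "Ludovico Ariosto", "Sándor Petőfi", "Petrarch", "Convivio"]


def tokens(text):
    """Lower-cased set of word tokens of a suggestion."""
    return set(re.findall(r"\w+", text.lower()))


def lexical_diversity(suggestions):
    """Mean pairwise Jaccard dissimilarity (illustrative stand-in metric)."""
    pairs = list(combinations(suggestions, 2))
    total = 0.0
    for a, b in pairs:
        ta, tb = tokens(a), tokens(b)
        total += 1.0 - len(ta & tb) / len(ta | tb)
    return total / len(pairs)


if __name__ == "__main__":
    print("SS:", round(lexical_diversity(SS), 3))
    print("S3:", round(lexical_diversity(S3), 3))
```

Every SS suggestion shares the token "dante" with every other, so no pair is fully dissimilar, whereas most pairs of S3 entities share no token at all.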
3.8 Summary
In this Chapter we presented an analysis of a large query log coming from a digital library. We reused the concepts of session identification, time series analysis, query chains, and task-based search when analyzing the Europeana logs. To the best of our knowledge, this is the first analysis of user interaction with a cultural heritage retrieval system.
Our analysis highlights some significant differences between the Europeana query log and the historical data collected in general-purpose Web search engine logs. In particular, we find that both the query and the search session distributions behave differently. This phenomenon could be explained by the characteristics of Europeana users, who are typically more skilled than generic Web users and are thus capable of taking advantage of the Europeana portal features to conduct more complex search sessions.
For this reason, we believe that interesting knowledge can be extracted from the Europeana query log in order to build advanced assistance functionalities, such as query recommendation. In fact, we investigated the integration of a state-of-the-art algorithm into the Europeana portal.
We then explored the use of entities extracted from a query log to enhance query recommendations. We presented our technique and assessed its performance by using a manually annotated dataset, which has been made available for download to favor the repeatability of experiments. The quality of the generated suggestions has been measured by means of two novel evaluation metrics that capture semantic relatedness and diversity.
Chapter 4
Learning Relatedness Measures for Entity Linking
4.1 Introduction
Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of these features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes disambiguation errors.
The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this Chapter we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. Second, we propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
A typical entity linking system performs this task in two steps: spotting and disambiguation. The spotting process identifies a set of candidate spots in the input document and produces a list of candidate entities for each spot. Then, the disambiguation process selects the most relevant spots and the most likely entities among the candidates. The spotting step exploits a given catalog of named entities, or some knowledge base, to detect the possible mentions of entities occurring in the input.
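As a rough illustration of what casting relatedness as a learning-to-rank problem can look like, the sketch below trains a linear scoring function with pairwise perceptron updates. It is a deliberately simple stand-in and not the learner used in this Chapter. Everything in it is an assumption made for illustration: each training group holds the candidate entities for one source entity, each candidate is described by two hypothetical features (for example, a link-overlap and a category-overlap score), and a label of 1 marks a candidate that should be ranked above those labeled 0.

```python
def score(weights, features):
    """Linear relatedness score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(groups, n_features, epochs=20, lr=0.1):
    """Pairwise perceptron: push each relevant candidate above each
    non-relevant candidate of the same group (hypothetical setup)."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for candidates in groups:
            for xi, yi in candidates:
                for xj, yj in candidates:
                    # Update only on mis-ordered (or tied) preference pairs.
                    if yi > yj and score(weights, xi) <= score(weights, xj):
                        weights = [w + lr * (a - b)
                                   for w, a, b in zip(weights, xi, xj)]
    return weights


def rank(weights, candidates):
    """Candidates sorted by decreasing learned score."""
    return sorted(candidates, key=lambda c: score(weights, c[0]), reverse=True)


# Toy training data: (feature_vector, relevance_label); values are made up.
TRAIN = [
    [([0.9, 0.2], 1), ([0.1, 0.3], 0), ([0.2, 0.8], 0)],
    [([0.7, 0.5], 1), ([0.3, 0.6], 0)],
]

if __name__ == "__main__":
    w = train_pairwise(TRAIN, n_features=2)
    for group in TRAIN:
        print([label for _, label in rank(w, group)])
```

The design choice worth noting is that the learner never sees absolute relatedness values, only preferences between candidates of the same group, which is what distinguishes a ranking formulation from plain regression.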
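The spotting step can be sketched as a dictionary lookup over word n-grams. The code below is a minimal illustration under stated assumptions: the catalog, its surface forms, and the candidate entity labels are all made up, and matching is done on lower-cased word tokens with a fixed maximum spot length. A real spotter would draw its surface forms from the knowledge base.

```python
import re

# Hypothetical catalog: surface form -> candidate entities (illustrative only).
CATALOG = {
    "paris": ["Paris (city)", "Paris (mythology)"],
    "new york": ["New York City", "New York (state)"],
}


def spot(text, catalog, max_len=3):
    """Return (token_index, surface_form, candidates) for each catalog match.

    Every n-gram of up to max_len tokens is looked up; overlapping matches
    are all kept, since choosing among them is left to disambiguation.
    """
    words = re.findall(r"\w+", text.lower())
    spots = []
    for start in range(len(words)):
        for length in range(max_len, 0, -1):
            if start + length > len(words):
                continue
            surface = " ".join(words[start:start + length])
            if surface in catalog:
                spots.append((start, surface, catalog[surface]))
    return spots


if __name__ == "__main__":
    for position, surface, candidates in spot("I flew from Paris to New York.", CATALOG):
        print(position, surface, candidates)
```

The output of this step is deliberately ambiguous: each spot carries all of its candidates, and selecting one of them is the job of the disambiguation step.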
works:
On July 20, 1969, the Apollo 11 astronauts - Neil Armstrong, Michael Collins, and Edwin “Buzz” Aldrin Jr. - realized President Kennedy’s dream.
The text “President Kennedy” can be easily spotted and linked to John F. Kennedy, since in Wikipedia there are 98 anchors exactly matching this fragment of text and linking to the page of the U.S. president. In addition, the text “Apollo 11” may refer to two distinct candidates: the famous spaceflight mission, or a 1996 film directed by Norberto Barba. Similarly, the text “Michael Collins” may refer either to the well-known astronaut or to the Irish leader and president of the Irish provisional government in 1922. Indeed, mentions of the latter (408) are much more frequent than those of the former (141).¹
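The anchor counts above show why frequency alone is not a sufficient signal. The sketch below turns them into a prior probability for each candidate (often called commonness in the entity-linking literature). It assumes, for illustration only, that the two candidates mentioned in the text are the only ones for the spot “Michael Collins”; the entity labels used as dictionary keys are hypothetical names.

```python
# Anchor counts for the spot "Michael Collins", as reported in the text.
# ASSUMPTION: these two entities are the only candidates for the spot.
ANCHOR_COUNTS = {
    "Michael Collins (Irish leader)": 408,
    "Michael Collins (astronaut)": 141,
}


def commonness(counts):
    """Prior probability of each candidate given the spot."""
    total = sum(counts.values())
    return {entity: count / total for entity, count in counts.items()}


def most_common(counts):
    """Candidate that a frequency-only baseline would select."""
    return max(counts, key=counts.get)


if __name__ == "__main__":
    for entity, prior in commonness(ANCHOR_COUNTS).items():
        print(f"{entity}: {prior:.3f}")
    print("frequency-only choice:", most_common(ANCHOR_COUNTS))
```

Under this assumption the astronaut receives a prior of 141/549, so a frequency-only baseline would link the spot to the Irish leader, which is the wrong choice for the example sentence.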
The above spots and their candidate entities are further processed during the disambiguation step. The goal of disambiguation is twofold. First, only the relevant spots have to be kept. For instance, the word “the” may refer to the entity associated with the definite article, but this link might be relevant only for documents discussing English grammar. Second, the best candidate entity for each spot has to be selected. This is usually done by considering the context of nearby mentions and by maximizing some measure of relatedness among the linked entities [57, 50, 69, 110, 127]. In our example, the astronaut “Michael Collins” and the “Apollo 11” spaceflight mission entities are preferred, since they are clearly strongly related to each other and to the other entities found in the document, i.e., Buzz Aldrin and John F. Kennedy.
The effectiveness of the entity relatedness function adopted is thus key to the accuracy of any entity-linking algorithm. In this work we investigate to what extent a machine learning approach can be exploited to devise a high-quality entity relatedness function. The main contributions presented in this Chapter are:
• a formalization of the problem of devising high-quality entity relatedness functions as a learning-to-rank problem;
• a novel technique to build benchmark datasets for learning and testing entity relatedness functions;
• an extensive experimentation showing that our automatically learned function outperforms state-of-the-art relatedness functions. More importantly, our approach can improve the performance of a whole class of entity-linking algorithms;
¹ Throughout this chapter, we use the 04/03/2013 dump, available at http://dumps.