
Data-Centric Systems and Applications

Series Editors

M.J. Carey

S. Ceri

Editorial Board

P. Bernstein

U. Dayal

C. Faloutsos

J.C. Freytag

G. Gardarin

W. Jonker

V. Krishnamurthy

M.-A. Neimat

P. Valduriez

G. Weikum

K.-Y. Whang

J. Widom

For further volumes:


Roberto De Virgilio

Francesco Guerra

Yannis Velegrakis

Editors

Semantic Search over the Web


Editors

Roberto De Virgilio
Department of Informatics and Automation, University Roma Tre, Rome, Italy

Francesco Guerra
University of Modena and Reggio Emilia, Modena, Italy

Yannis Velegrakis
University of Trento, Trento, Italy

ISBN 978-3-642-25007-1 ISBN 978-3-642-25008-8 (eBook) DOI 10.1007/978-3-642-25008-8

Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2012943692

ACM Computing Classification: H.3, I.2

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper


Introduction

The Web has become the world’s largest database, with search being the main tool that enables organizations and individuals to exploit the huge amount of information it freely offers. Thus, having a successful mechanism for finding and retrieving the information most relevant to the task at hand is of major importance. Traditionally, Web search has been based on textual and structural similarity. Given the set of keywords that comprise a query, the goal is to identify the documents containing all these keywords (or as many as possible). Additional information, such as log data, references from authorities, popularity, and personalization, has been extensively used to further improve accuracy. However, one dimension that has not been captured to its full extent is semantics, that is, fully understanding the meaning of the words in a query and in a document. Combining search and semantics gives birth to the idea of semantic search. In a sentence, semantic search can be described as the effort to improve the accuracy of the search process by understanding the context and limiting ambiguity.
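To make the traditional keyword-matching model concrete, here is a minimal Python sketch, assuming naive whitespace tokenization and an in-memory inverted index; the names `build_index` and `keyword_search` are illustrative only. Documents are ranked by how many distinct query keywords they contain, so documents with all keywords come first and partial matches ("as many as possible") follow.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Rank documents by the number of distinct query keywords they contain.

    Documents containing every keyword rank first; documents containing only
    some keywords follow. Ties are broken by document id for stable output.
    """
    terms = set(query.lower().split())
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):  # .get avoids inserting new keys
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "semantic search over the web",
    "d2": "web search engines",
    "d3": "linked data",
}
index = build_index(docs)
print(keyword_search(index, "semantic web search"))
```

Note that this ranking sees only the surface form of the words: a document that says "automobile" never matches a query for "car", which is exactly the gap semantic search aims to close.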

The idea of the semantic Web is based on this goal and aims at making the semantics of Web content machine-understandable. To do so, a number of technologies have been introduced that allow richer modeling of Web resources, alongside annotations describing their semantics. Furthermore, the semantic Web went on to create associations between different representations of the same real-world entity. These associations are either explicitly specified or derived off-line, and then remain static. They allow data from many different sources to be interlinked, giving birth to the so-called linked open data cloud. Nevertheless, semantics have yet to fully penetrate existing data management solutions and become an integral part of information retrieval, analysis, integration, and data exchange techniques.
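The off-line derivation of such associations can be pictured with a small sketch. Assuming the associations are published as `owl:sameAs` pairs (a property that OWL defines as symmetric and transitive), all representations of one entity form a connected component, which union-find computes in a single pass. The function name and the example.org URIs below are hypothetical.

```python
def same_as_clusters(links):
    """Group URIs connected by (possibly chained) owl:sameAs links.

    links: iterable of (uri_a, uri_b) pairs, each asserting that both URIs
    denote the same real-world entity. Returns a list of sets of URIs,
    sorted by their smallest members for deterministic output.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(clusters.values(), key=sorted)


links = [
    ("http://example.org/a/Rome", "http://example.org/b/Roma"),
    ("http://example.org/b/Roma", "http://example.org/c/Rome_Italy"),
    ("http://example.org/a/Trento", "http://example.org/b/Trento"),
]
for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

Because the clusters are computed once and stored, they reflect the static, precomputed character of the links described above; new links require recomputing (or incrementally updating) the components.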

Unfortunately, the generic idea of semantic search has remained in its infancy. Existing solutions are either search engines that simply index semantic Web data, like Sindice, or traditional search engines enhanced with some basic form of synonym exploitation, as supported by Google and Bing. Semantic search is about using the semantics of the query terms instead of the terms themselves. This means using synonyms and related terms, providing additional material in the answer that may be related to elements already in the result, searching not only in the content but also in the semantic annotations of the data, exploiting ontological knowledge through advanced reasoning techniques, treating the query as a natural language expression, clustering the results, offering faceted browsing, etc.
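As a rough illustration of the first item in this list, the sketch below expands each query keyword with synonyms from a small hand-written thesaurus and then requires every keyword group to be matched by at least one alternative. The `THESAURUS` contents and the function names are hypothetical; a real system would draw synonyms and related terms from a lexical resource or an ontology.

```python
# Hypothetical single-word thesaurus used only for illustration.
THESAURUS = {
    "car": {"automobile", "auto"},
    "film": {"movie"},
}


def expand_query(query, thesaurus=THESAURUS):
    """Return one set of alternatives per query keyword.

    Each keyword is kept together with its synonyms, so a matcher can treat
    the alternatives inside a group as OR and the groups themselves as AND.
    """
    return [{term} | thesaurus.get(term, set()) for term in query.lower().split()]


def matches(text, expanded):
    """True if the text contains at least one alternative from every group."""
    tokens = set(text.lower().split())
    return all(group & tokens for group in expanded)


expanded = expand_query("vintage car")
print(expanded)
print(matches("A vintage automobile show", expanded))
print(matches("vintage film festival", expanded))
```

The OR-within-group, AND-across-groups design keeps the original intent of a conjunctive keyword query while letting a document that uses a different word for the same concept still qualify.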

All of the above means that there are currently numerous opportunities to exploit in the area of semantic search on the Web. In this work, we try to give a generic overview of the work that has been done in the field and in related areas. However, the book should not be considered a survey. It is simply intended to give the reader a taste of the many different aspects of the problem and to go into depth on some specific technologies and solutions.

The book is divided into three parts. The first part introduces the notion of the Web of Data. It describes the different types of data that exist, their topology, and techniques for storing and indexing them. It also shows how semantic links between the data can be derived automatically.
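One widely used idea for indexing RDF, employed for instance by Hexastore and RDF-3X, is to keep several permutations of every triple so that a triple pattern with any bound position becomes an index lookup rather than a full scan. The sketch below keeps three of those permutations (SPO, POS, OSP) in memory; it is an illustration under that assumption, not the technique of any particular chapter, and the class name, method names, and `ex:` identifiers are hypothetical.

```python
from collections import defaultdict


class TripleIndex:
    """In-memory RDF triple store with three permutation indexes.

    SPO, POS and OSP nested dictionaries let a pattern with at least one
    bound position start from an index lookup instead of scanning all triples.
    """

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None is a wildcard."""
        if s is not None:  # subject bound: start from SPO
            for p2, objs in self.spo.get(s, {}).items():
                if p is None or p2 == p:
                    for o2 in objs:
                        if o is None or o2 == o:
                            yield (s, p2, o2)
        elif p is not None:  # predicate bound: start from POS
            for o2, subjs in self.pos.get(p, {}).items():
                if o is None or o2 == o:
                    for s2 in subjs:
                        yield (s2, p, o2)
        elif o is not None:  # only object bound: start from OSP
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:  # no position bound: full scan
            for s2, by_pred in self.spo.items():
                for p2, objs in by_pred.items():
                    for o2 in objs:
                        yield (s2, p2, o2)


idx = TripleIndex()
idx.add("ex:Rome", "ex:locatedIn", "ex:Italy")
idx.add("ex:Trento", "ex:locatedIn", "ex:Italy")
idx.add("ex:Rome", "rdfs:label", "Roma")
print(sorted(idx.match(p="ex:locatedIn", o="ex:Italy")))
```

Storing each triple three times trades memory for lookup speed; systems designed for massive datasets add compression and disk-based structures that this sketch omits.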

The second part is dedicated specifically to Web search. It presents different kinds of search, such as exploratory and path-oriented search, alongside methods for implementing them efficiently. It discusses the problem of interactive query construction and the understanding of keyword query semantics. Other topics include the use of uncertainty in query answering and the exploitation of ontologies. The second part concludes with a discussion of mashup technologies and the way they are affected by semantics.

The theme of the third part of the book is Linked Data and, more specifically, how ideas from recommender systems can be applied to linked data management, alongside techniques for efficient query answering.

Rome, Italy Roberto De Virgilio

Modena, Italy Francesco Guerra


Contents

Part I Introduction to Web of Data

1 Topology of the Web of Data

Christian Bizer, Pablo N. Mendes, and Anja Jentzsch

2 Storing and Indexing Massive RDF Datasets

Yongming Luo, François Picalausa, George H.L. Fletcher, Jan Hidders, and Stijn Vansummeren

3 Designing Exploratory Search Applications upon Web Data Sources

Marco Brambilla and Stefano Ceri

Part II Search over the Web

4 Path-Oriented Keyword Search Query over RDF

Roberto De Virgilio, Paolo Cappellari, Antonio Maccioni, and Riccardo Torlone

5 Interactive Query Construction for Keyword Search on the Semantic Web

Gideon Zenz, Xuan Zhou, Enrico Minack, Wolf Siberski, and Wolfgang Nejdl

6 Understanding the Semantics of Keyword Queries on Relational Data Without Accessing the Instance

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Silvia Rota, Raquel Trillo Lado, and Yannis Velegrakis

7 Keyword-Based Search over Semantic Data

Klara Weiand, Andreas Hartl, Steffen Hausmann, Tim Furche, and François Bry

8 Semantic Link Discovery over Relational Data

Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J. Miller, and Min Wang

9 Embracing Uncertainty in Entity Linking

Ekaterini Ioannou, Wolfgang Nejdl, Claudia Niederée, and Yannis Velegrakis

10 The Return of the Entity-Relationship Model: Ontological Query Answering

Andrea Calì, Georg Gottlob, and Andreas Pieris

11 Linked Data Services and Semantics-Enabled Mashup

Devis Bianchini and Valeria De Antonellis

Part III Linked Data Search Engines

12 A Recommender System for Linked Data

Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, and Eugenio Di Sciascio

13 Flint: From Web Pages to Probabilistic Semantic Data

Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti

14 Searching and Browsing Linked Data with SWSE

Andreas Harth, Aidan Hogan, Jürgen Umbrich, Sheila Kinsella, Axel Polleres, and Stefan Decker


Contributors

Sonia Bergamaschi Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy

Devis Bianchini Department of Electronics for Automation, University of Brescia, Brescia, Italy

Christian Bizer Web-based Systems Group, Freie Universität Berlin, Berlin, Germany

Lorenzo Blanco Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy

Marco Brambilla Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy

Mirko Bronzi Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy

François Bry Institute for Informatics, University of Munich, München, Germany

Andrea Calì Department of Computer Science and Information Systems, Birkbeck University of London, London, UK

Paolo Cappellari Interoperable System Group, Dublin City University, Dublin, Ireland

Stefano Ceri Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy

Valter Crescenzi Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy

Valeria De Antonellis Department of Electronics for Automation, University of Brescia, Brescia, Italy

Roberto De Virgilio University Roma Tre, Rome, Italy


Tommaso Di Noia Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy

Eugenio Di Sciascio Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy

Stefan Decker Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

Elton Domnori Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy

George H.L. Fletcher Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands

Tim Furche Department of Computer Science and Institute for the Future of Computing, Oxford University, Oxford, UK

Georg Gottlob Computing Laboratory, University of Oxford, Oxford, UK

Oxford-Man Institute of Quantitative Finance, University of Oxford, Oxford, UK

Francesco Guerra Dipartimento di Economia Aziendale, Università di Modena e Reggio Emilia, Modena, Italy

Andreas Harth Karlsruhe Institute of Technology, Institute AIFB, Karlsruhe, Germany

Andreas Hartl Institute for Informatics, University of Munich, München, Germany

Oktie Hassanzadeh University of Toronto, Toronto, Ontario, Canada

Steffen Hausmann Institute for Informatics, University of Munich, München, Germany

Jan Hidders Faculty of Electrical Engineering Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands

Aidan Hogan Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

Ekaterini Ioannou University Campus – Kounoupidiana, Technical University of Crete, Chania, Greece

Anja Jentzsch Web-based Systems Group, Freie Universität Berlin, Berlin, Germany

Anastasios Kementsietsidis IBM T.J. Watson Research Center, Hawthorne, NY, USA

Sheila Kinsella Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland


Lipyeow Lim University of Hawaii at Manoa, Honolulu, HI, USA

Yongming Luo Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands

Antonio Maccioni University Roma Tre, Rome, Italy

Pablo N. Mendes Web-based Systems Group, Freie Universität Berlin, Berlin, Germany

Paolo Merialdo Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy

Renée J. Miller University of Toronto, Toronto, Ontario, Canada

Enrico Minack L3S Research Center, Hannover, Germany

Roberto Mirizzi Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy

Wolfgang Nejdl L3S Research Center, Hannover, Germany

Claudia Niederée L3S Research Center, Hannover, Germany

Paolo Papotti Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy

François Picalausa Université Libre de Bruxelles, Brussels, Belgium

Andreas Pieris Department of Computer Science, University of Oxford, Oxford, UK

Axel Polleres Siemens AG Österreich, Vienna, Austria

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

Azzurra Ragone Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy

Exprivia S.p.A., Molfetta, BA, Italy

Silvia Rota Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy

Wolf Siberski L3S Research Center, Hannover, Germany

Riccardo Torlone University Roma Tre, Rome, Italy

Raquel Trillo Informatica e Ing. Sistemas, Zaragoza, Spain

Jürgen Umbrich Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

Yannis Velegrakis University of Trento, Trento, Italy

Min Wang HP Labs China, Beijing, China

Klara Weiand Institute for Informatics, University of Munich, München, Germany

Gideon Zenz L3S Research Center, Hannover, Germany

Xuan Zhou Renmin University of China, Beijing, China
