Chapter 1. Introduction
6. Knowledge Exploration: We propose two different systems to explore RDF knowledge bases, in order to learn more about query results, an entity
2.1. Resource Description Framework
RDF is a standard model endorsed by the World Wide Web Consortium (W3C) to represent information on the Web [71]. It was originally designed to represent metadata about Web resources, such as the author, title, date of creation, etc. However, RDF is not restricted to "Web resources" only, and today it is used to represent various information about many different types of resources including people, organizations, products, books, movies and so on.
Using RDF, information about the described resources can be represented in a common standard format so that this information can be easily exchanged among different applications without a loss of meaning. In RDF, there are three types of identifiers that can be used to describe information: URIs, literals and blank nodes. We explain each one separately.
URIs. A URI (Uniform Resource Identifier) is a string of characters used to identify a resource on the Internet. In RDF terminology, a URI is a unique identifier used to identify a single resource. For example, the director Woody Allen can be identified using the URI
http://en.wikipedia.org/wiki/Woody_Allen
Note that there might exist more than one URI to identify the same resource. For example, Woody Allen can also be identified using the URI
http://www.imdb.com/name/nm0000095/
Also note that URIs in RDF do not necessarily correspond to a Web address or URL. For example, the URI
http://www.knowledgebase.com/Woody_Allen
which does not physically exist on the Internet, can also be used to identify the director Woody Allen.
To simplify reference to resources, RDF is equipped with namespaces. A namespace is an abbreviation for the prefix of a URI. For example, the namespace w can be used to abbreviate the URI prefix
http://en.wikipedia.org/wiki/
In such a case, Woody Allen can be identified by the identifier w:Woody_Allen, which is a shorthand for
http://en.wikipedia.org/wiki/Woody_Allen
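The namespace mechanism amounts to a simple string substitution. The following is a minimal sketch of that expansion, not part of any RDF library; the function name `expand` and the table `NAMESPACES` are hypothetical, and only the prefix w and its URI come from the text.

```python
# Hypothetical prefix table; only the "w" entry is taken from the text.
NAMESPACES = {"w": "http://en.wikipedia.org/wiki/"}


def expand(prefixed_name: str, namespaces: dict = NAMESPACES) -> str:
    """Expand an abbreviated identifier such as 'w:Woody_Allen' to a full URI."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown or missing namespace prefix in {prefixed_name!r}")
    # The full URI is the namespace's URI prefix followed by the local name.
    return namespaces[prefix] + local_name


print(expand("w:Woody_Allen"))
# prints http://en.wikipedia.org/wiki/Woody_Allen
```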
For the sake of readability, we will omit the namespaces when referring to a resource and we assume that given a URI suffix, the full URI can be uniquely resolved.
Literals. A literal is a string representation of a certain value. For example, the string "09.03.1981" is a string representation of the 9th day of March of the year 1981. RDF distinguishes two types of literals: plain literals and typed literals. Plain literals have a lexical form and optionally a language tag, whereas typed literals have a lexical form and a data type that describes the type of the value they represent. For example, the literal "09.03.1981" has type date, the literal "9" has type Integer, and so on.
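The distinction between the lexical form of a literal and the value it denotes can be sketched as follows. The class names, the data type labels "date" and "integer", the language tag "en" and the helper `to_value` are assumptions made for illustration; the day.month.year reading of "09.03.1981" follows the example in the text.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class PlainLiteral:
    lexical_form: str
    language: Optional[str] = None  # optional language tag (assumed example: "en")


@dataclass(frozen=True)
class TypedLiteral:
    lexical_form: str
    datatype: str  # illustrative type label, not an official datatype URI


def to_value(literal: TypedLiteral):
    """Map the lexical form of a typed literal to the value it represents."""
    if literal.datatype == "date":
        # Day.month.year, as in the "09.03.1981" example.
        return datetime.strptime(literal.lexical_form, "%d.%m.%Y").date()
    if literal.datatype == "integer":
        return int(literal.lexical_form)
    raise ValueError(f"unsupported data type: {literal.datatype}")


birth_date = TypedLiteral("09.03.1981", "date")
nine = TypedLiteral("9", "integer")
first_name = PlainLiteral("Woody", language="en")
```

Here `to_value(birth_date)` yields the date 9 March 1981 and `to_value(nine)` yields the integer 9, whereas a plain literal such as `first_name` carries only its string and an optional language tag.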
Blank Nodes. A blank node represents a resource whose URI is not known or is irrelevant. The resource represented by a blank node is also called an anonymous resource.
SPO Triples. Given these three types of identifiers, URIs, literals and blank nodes, RDF can be used to encode all sorts of information on the Web. In RDF, information is represented in the form of statements. An RDF statement is a triple consisting of three fields: a subject, a predicate and an object. These triples are typically referred to as SPO triples.
Definition 2.1 : Triple
Let U be the infinite set of all possible URIs, L be the infinite set of all possible literals and B be the infinite set of all possible blank nodes. Furthermore, assume that the sets U, L and B are pairwise disjoint. An SPO triple t = (s, p, o) is a 3-tuple such that s ∈ U ∪ B, p ∈ U and o ∈ U ∪ L ∪ B.
In other words, an SPO triple consists of three components:
• the subject, which is an RDF URI or a blank node,
• the predicate, which is an RDF URI, and
• the object, which is an RDF URI, a literal or a blank node.
An SPO triple is conventionally written in the order subject, predicate, object. The predicate is also known as the property of the triple.
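The membership conditions of Definition 2.1 can be checked mechanically. The sketch below is one possible encoding, not a prescribed one: it models U, L and B as three separate Python classes so that the sets are disjoint by construction. The class names, the function `is_valid_triple` and the example identifiers are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


@dataclass(frozen=True)
class BlankNode:
    label: str


def is_valid_triple(s, p, o) -> bool:
    """Check s in U ∪ B, p in U and o in U ∪ L ∪ B."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, Literal, BlankNode))
    )


# A URI subject with a URI object is allowed.
ok = is_valid_triple(URI("Woody_Allen"), URI("directed"), URI("Annie_Hall"))
# A literal may appear as an object ...
ok_literal = is_valid_triple(BlankNode("b1"), URI("hasFirstName"), Literal("Woody"))
# ... but not as a subject or a predicate.
bad_subject = is_valid_triple(Literal("Woody"), URI("directed"), URI("Annie_Hall"))
bad_predicate = is_valid_triple(URI("Woody_Allen"), Literal("directed"), URI("Annie_Hall"))
```

Under this encoding, `ok` and `ok_literal` are true, while `bad_subject` and `bad_predicate` are false.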
RDF uses SPO triples to represent information about resources. For example, assume we want to represent the information that Woody Allen is the director of the movie Annie Hall. This information can be expressed using the following SPO triple:
Woody Allen directed Annie Hall
where the subject Woody Allen is the URI of the director Woody Allen, the object Annie Hall is the URI of the movie Annie Hall and the predicate directed is the URI of the relation "is the director of".
As mentioned earlier, subjects and objects are not restricted to URIs. For instance, the information that Woody Allen's first name is "Woody" and last name is "Allen" can be expressed using the following two SPO triples:
Woody Allen hasFirstName "Woody"
Woody Allen hasLastName "Allen"
where the URIs hasFirstName and hasLastName reference the properties "first name" and "last name" respectively, and "Woody" and "Allen" are two literals.
Subject (S)                 Predicate (P)   Object (O)
Woody Allen                 directed        Match Point
Woody Allen                 directed        Hollywood Ending
Woody Allen                 actedIn         Hollywood Ending
Woody Allen                 hasWonPrize     Academy Award
Woody Allen                 hasWonPrize     BAFTA Award
Scarlett Johansson          actedIn         Match Point
Tea Leoni                   actedIn         Hollywood Ending
Match Point                 type            English Movie
Hollywood Ending            type            English Movie
Vicky Cristina Barcelona    type            English Movie
Table 2.1.: A small RDF knowledge base
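For illustration, the triples of Table 2.1 can be held in memory as a set of 3-tuples and filtered by fixing some of the components. This is a rough sketch under simplifying assumptions: resources are written as plain strings instead of full URIs, and the helper `match` with its None wildcard is hypothetical and not a query language defined in this chapter.

```python
# The ten triples of Table 2.1, with resources abbreviated to plain strings.
KB = {
    ("Woody Allen", "directed", "Match Point"),
    ("Woody Allen", "directed", "Hollywood Ending"),
    ("Woody Allen", "actedIn", "Hollywood Ending"),
    ("Woody Allen", "hasWonPrize", "Academy Award"),
    ("Woody Allen", "hasWonPrize", "BAFTA Award"),
    ("Scarlett Johansson", "actedIn", "Match Point"),
    ("Tea Leoni", "actedIn", "Hollywood Ending"),
    ("Match Point", "type", "English Movie"),
    ("Hollywood Ending", "type", "English Movie"),
    ("Vicky Cristina Barcelona", "type", "English Movie"),
}


def match(kb, s=None, p=None, o=None):
    """Return, in sorted order, the triples that agree with every given component.

    A component left as None acts as a wildcard.
    """
    return sorted(
        t for t in kb
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


directed_by_allen = [o for _, _, o in match(KB, s="Woody Allen", p="directed")]
actors_in_match_point = [s for s, _, _ in match(KB, p="actedIn", o="Match Point")]
```

With the data above, `directed_by_allen` is `["Hollywood Ending", "Match Point"]` and `actors_in_match_point` is `["Scarlett Johansson"]`.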
Figure 2.1.: An RDF graph corresponding to the knowledge base in Table 2.1
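A graph view of a set of triples can be derived by treating every subject and object as a node and every triple as a directed edge labeled with its predicate. The sketch below assumes this standard reading of an RDF graph and uses only three triples from Table 2.1; the function `build_graph` and the adjacency-list layout are hypothetical choices, not a reconstruction of the figure itself.

```python
from collections import defaultdict

# A three-triple excerpt of Table 2.1, with resources as plain strings.
TRIPLES = [
    ("Woody Allen", "directed", "Match Point"),
    ("Scarlett Johansson", "actedIn", "Match Point"),
    ("Match Point", "type", "English Movie"),
]


def build_graph(triples):
    """Return (nodes, edges), where edges maps a subject to its (predicate, object) pairs."""
    nodes = set()
    edges = defaultdict(list)
    for s, p, o in triples:
        nodes.update((s, o))       # subjects and objects become nodes
        edges[s].append((p, o))    # each triple becomes one labeled, directed edge
    return nodes, dict(edges)


nodes, edges = build_graph(TRIPLES)
```

In this excerpt the node Match Point has two incoming edges (directed, actedIn) and one outgoing edge (type), which shows how statements about the same resource connect in the graph.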
2.2. RDF Knowledge Bases
Definition 2.2 : RDF Knowledge Base