
Resource Description Framework


Chapter 1. Introduction

6. Knowledge Exploration: We propose two different systems to explore RDF knowledge bases in order to learn more about query results, an entity

2.1. Resource Description Framework

RDF is a standard model endorsed by the W3C to represent information on the Web [71]. It was originally designed to represent metadata about Web resources, such as their author, title and creation date. However, RDF is not restricted to “Web resources”, and today it is used to represent information about many different types of resources, including people, organizations, products, books and movies.

Using RDF, information about the described resources can be represented in a common standard format so that this information can be easily exchanged among different applications without a loss of meaning. In RDF, there are three types of identifiers that can be used to describe information: URIs, literals and blank nodes. We explain each one separately.

URIs. A URI (Uniform Resource Identifier) is a string of characters used to identify a resource. In RDF terminology, a URI is a unique identifier that denotes a single resource. For example, the director Woody Allen can be identified using the URI

http://en.wikipedia.org/wiki/Woody_Allen

Note that there might exist more than one URI to identify the same resource. For example, Woody Allen can also be identified using the URI

http://www.imdb.com/name/nm0000095/

Also note that URIs in RDF do not necessarily correspond to a Web address or URL. For example, the URI

http://www.knowledgebase.com/Woody_Allen

which does not correspond to any existing page on the Web, can also be used to identify the director Woody Allen.

To simplify references to resources, RDF supports namespaces. A namespace is an abbreviation for the prefix of a URI. For example, the namespace w can be used to abbreviate the URI prefix

http://en.wikipedia.org/wiki/

In this case, Woody Allen can be identified by the prefixed name w:Woody_Allen, which is shorthand for

http://en.wikipedia.org/wiki/Woody_Allen

For the sake of readability, we will omit the namespaces when referring to a resource and we assume that given a URI suffix, the full URI can be uniquely resolved.
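To make the abbreviation mechanism concrete, here is a minimal Python sketch of expanding a prefixed name into a full URI and compacting a full URI back. The prefix table (holding only the w namespace from above) and the helper names expand and compact are illustrative assumptions, not part of any RDF library.

```python
from typing import Dict

# Hypothetical prefix table: only the "w" namespace comes from the text.
PREFIXES: Dict[str, str] = {
    "w": "http://en.wikipedia.org/wiki/",
}

def expand(prefixed_name: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'w:Woody_Allen' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local

def compact(uri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Abbreviate a full URI with the longest matching namespace, if any."""
    candidates = [(p, ns) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not candidates:
        return uri  # no namespace applies; keep the full URI
    prefix, ns = max(candidates, key=lambda item: len(item[1]))
    return f"{prefix}:{uri[len(ns):]}"

print(expand("w:Woody_Allen"))  # http://en.wikipedia.org/wiki/Woody_Allen
print(compact("http://www.imdb.com/name/nm0000095/"))  # returned unchanged
```

Choosing the longest matching namespace in compact keeps the result deterministic when one namespace is a prefix of another.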

Literals. A literal is a string representation of a certain value. For example, the string "09.03.1981" is a string representation of the 9th of March 1981. RDF distinguishes two types of literals: plain literals and typed literals. Plain literals have a lexical form and, optionally, a language tag, whereas typed literals have a lexical form and a data type that describes the type of the value they represent. For example, the literal "09.03.1981" has type date, the literal "9" has type integer, and so on.
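The distinction between plain and typed literals can be sketched in Python as follows. The class names, the short datatype names "date" and "integer", and the value_of helper are hypothetical; standard RDF identifies datatypes by URIs such as xsd:date, whose lexical form is the ISO 8601 string "1981-03-09", so the DD.MM.YYYY parsing below only mirrors the example in the text.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

@dataclass(frozen=True)
class PlainLiteral:
    lexical: str
    lang: Optional[str] = None  # optional language tag, e.g. "en"

@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str  # hypothetical short names: "date" or "integer"

def value_of(lit: TypedLiteral):
    """Map a typed literal to a Python value (illustrative datatypes only)."""
    if lit.datatype == "date":
        # Assumes the DD.MM.YYYY form used in the text, not ISO 8601.
        return datetime.strptime(lit.lexical, "%d.%m.%Y").date()
    if lit.datatype == "integer":
        return int(lit.lexical)
    raise ValueError(f"unsupported datatype {lit.datatype!r}")

birthday = TypedLiteral("09.03.1981", "date")
nine = TypedLiteral("9", "integer")
first_name = PlainLiteral("Woody", lang="en")

print(value_of(birthday))  # 1981-03-09
print(value_of(nine))      # 9
```

The plain literal carries no data type, so value_of is deliberately defined only for typed literals.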

Blank Nodes. A blank node represents a resource whose URI is not known or is irrelevant. The resource represented by a blank node is also called an anonymous resource.


SPO Triples. Using these three types of identifiers (URIs, literals and blank nodes), RDF can encode all sorts of information on the Web. In RDF, information is represented in the form of statements. An RDF statement is a triple consisting of three fields: a subject, a predicate and an object. Such triples are commonly referred to as SPO triples.

Definition 2.1 : Triple

Let U be the infinite set of all possible URIs, L be the infinite set of all possible literals and B be the infinite set of all possible blank nodes. Furthermore, assume that the sets U, L and B are pairwise disjoint. An SPO triple t = (s, p, o) is a 3-tuple such that s ∈ U ∪ B, p ∈ U and o ∈ U ∪ L ∪ B.

In other words, an SPO triple consists of three components:

• the subject, which is an RDF URI or a blank node,
• the predicate, which is an RDF URI, and
• the object, which is an RDF URI, a literal or a blank node.
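The constraints of Definition 2.1 translate directly into a type check. In this minimal Python sketch, the hypothetical classes URI, Literal and BlankNode stand for the pairwise disjoint sets U, L and B; because they are three distinct classes, no term can belong to two of the sets at once.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical term classes mirroring the disjoint sets U, L and B.
@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str

@dataclass(frozen=True)
class BlankNode:
    label: str

Term = Union[URI, Literal, BlankNode]

def is_spo_triple(s: Term, p: Term, o: Term) -> bool:
    """Check the position constraints of Definition 2.1."""
    return (
        isinstance(s, (URI, BlankNode))                   # s in U ∪ B
        and isinstance(p, URI)                            # p in U
        and isinstance(o, (URI, Literal, BlankNode))      # o in U ∪ L ∪ B
    )

woody, directed = URI("Woody_Allen"), URI("directed")
print(is_spo_triple(woody, directed, URI("Annie_Hall")))          # True
print(is_spo_triple(Literal("Woody"), directed, URI("Annie_Hall")))  # False: literal subject
```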

An SPO triple is conventionally written in the order subject, predicate, object. The predicate is also known as the property of the triple.

RDF uses SPO triples to represent information about resources. For example, assume we want to represent the information that Woody Allen is the director of the movie Annie Hall. This information can be expressed using the following SPO triple:

Woody Allen directed Annie Hall

where the subject Woody Allen is the URI of the director Woody Allen, the object Annie Hall is the URI of the movie Annie Hall and the predicate directed is the URI of the relation “is the director of”.

As mentioned earlier, subjects and objects are not restricted to URIs. For instance, the information that Woody Allen’s first name is “Woody” and last name is “Allen” can be expressed using the following two SPO triples:

Woody Allen hasFirstName "Woody"
Woody Allen hasLastName "Allen"

where the URIs hasFirstName and hasLastName reference the properties “first name” and “last name” respectively, and "Woody" and "Allen" are two literals.

Subject (S)                 Predicate (P)   Object (O)
Woody Allen                 directed        Match Point
Woody Allen                 directed        Hollywood Ending
Woody Allen                 actedIn         Hollywood Ending
Woody Allen                 hasWonPrize     Academy Award
Woody Allen                 hasWonPrize     BAFTA Award
Scarlett Johansson          actedIn         Match Point
Tea Leoni                   actedIn         Hollywood Ending
Match Point                 type            English Movie
Hollywood Ending            type            English Movie
Vicky Cristina Barcelona    type            English Movie

Table 2.1.: A small RDF knowledge base
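A knowledge base such as Table 2.1 can be held in memory as a set of tuples. The following Python sketch, with the hypothetical helper match, retrieves the triples that agree with a pattern in which None acts as a wildcard; it only illustrates lookup over triples and is not a SPARQL implementation.

```python
from typing import Optional

# Table 2.1 as a set of (subject, predicate, object) tuples.
# URIs are shortened to their suffixes, as in the text.
KB = {
    ("Woody Allen", "directed", "Match Point"),
    ("Woody Allen", "directed", "Hollywood Ending"),
    ("Woody Allen", "actedIn", "Hollywood Ending"),
    ("Woody Allen", "hasWonPrize", "Academy Award"),
    ("Woody Allen", "hasWonPrize", "BAFTA Award"),
    ("Scarlett Johansson", "actedIn", "Match Point"),
    ("Tea Leoni", "actedIn", "Hollywood Ending"),
    ("Match Point", "type", "English Movie"),
    ("Hollywood Ending", "type", "English Movie"),
    ("Vicky Cristina Barcelona", "type", "English Movie"),
}

def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return, sorted, the triples agreeing with every field that is not None."""
    return sorted(
        t for t in KB
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Movies directed by Woody Allen.
print([o for _, _, o in match(s="Woody Allen", p="directed")])
# ['Hollywood Ending', 'Match Point']
```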

Figure 2.1.: An RDF graph corresponding to the knowledge base in Table 2.1
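Under the usual reading of an RDF graph, every distinct subject or object becomes a node and every triple (s, p, o) becomes a directed edge from s to o labeled p. A minimal Python sketch of this construction over an excerpt of Table 2.1 (the function name to_graph is an assumption) is:

```python
from collections import defaultdict

# An excerpt of Table 2.1; the full table is handled the same way.
TRIPLES = [
    ("Woody Allen", "directed", "Match Point"),
    ("Woody Allen", "actedIn", "Hollywood Ending"),
    ("Scarlett Johansson", "actedIn", "Match Point"),
    ("Match Point", "type", "English Movie"),
]

def to_graph(triples):
    """Build a labeled directed graph: node -> list of (edge label, target)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))  # each triple becomes one edge s --p--> o
        graph.setdefault(o, [])  # objects without outgoing edges are still nodes
    return dict(graph)

for node, edges in to_graph(TRIPLES).items():
    print(node, edges)
```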

2.2. RDF Knowledge Bases

Definition 2.2 : RDF Knowledge Base