4.3 Implementation

4.3.2 Generation Layer

This layer discovers resources related to a given resource through semantic relationships. Given an initial resource (or a set of initial resources), it generates a set of candidate resources located at a predefined distance. For this layer, three generators were implemented based on the semantic relationships found in Linked Data: (i) a transversal generator, which studies direct and indirect relationships between resources (Resource-Resource) while avoiding hierarchical relationships; (ii) a hierarchical generator, which finds indirect relationships between resources through direct relationships between resources and categories (Resource-Category) and between categories (Category-Category); and (iii) a dynamic generator, which dynamically combines both types of relationships, giving priority to the existing interlinking between resources. These generators use SPARQL [15] queries to navigate the dataset.

4 http://www.mpi-inf.mpg.de/yago-naga/yago/
5 https://wordnet.princeton.edu
6 http://wiki.dbpedia.org/Datasets#h434-7
7 http://dublincore.org/documents/dcmi-terms/

Transversal Generator

The transversal generator looks for resources that are directly related to a given initial resource and those found through a third resource (indirect relationships). Its implementation is inspired by dbrec [66].

SELECT DISTINCT ?cr WHERE {
  { <inURI> ?p ?cr . }
  UNION
  { ?cr ?p <inURI> . }
  FILTER(isURI(?cr)
    && ?p != <forbiddenLinkURI1>
    && ?p != <forbiddenLinkURI2>
    && ...
    && ?p != <forbiddenLinkURIn>) .
}

Listing 4.1 The SPARQL query to retrieve resources directly linked to the resource <inURI>.

The SPARQL query used to retrieve the resources directly connected with the initial resource is presented in Listing 4.1. In this query, <inURI> is the URI of the initial resource, p is the link, and cr is each of the candidate resources to be retrieved. A set of forbidden links can be defined to prevent the algorithm from obtaining resources over links that point to blank nodes (i.e. resources without a URI), to literals that identify values such as numbers and dates, or to nodes that are not desired for the recommendation. In other words, it is a way to limit the results of the algorithm. For example, the resource dbr:Turin contains the link <dbpprop:populationTotal>, which points to the integer value 911823. Optionally, a set of allowed links may be added to restrict the retrieved resources to those connected through specific links only. In the query of Listing 4.1, each forbidden link is excluded by adding the expression && ?p != <forbiddenLinkURI> to the filter.

SELECT DISTINCT ?cr WHERE {
  { <inURI> ?p ?o . ?o ?p ?cr . }
  UNION
  { <inURI> ?p ?o . ?cr ?p ?o . }
  UNION
  { ?o ?p <inURI> . ?o ?p ?cr . }
  UNION
  { ?o ?p <inURI> . ?cr ?p ?o . }
  FILTER(isURI(?cr) && isURI(?o)
    && ?p != <forbiddenLinkURI1>
    && ?p != <forbiddenLinkURI2>
    && ...
    && ?p != <forbiddenLinkURIn>) .
}

Listing 4.2 The query to retrieve resources indirectly linked to the resource <inURI>.

Listing 4.2 shows the SPARQL query that retrieves resources indirectly connected to the resource <inURI> through a third resource (o).
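The forbidden-link filter described for Listing 4.1 can be assembled programmatically. The following Python fragment is a minimal sketch, not the implementation used in this work: the function name build_direct_query and its parameters are hypothetical, the expanded DBpedia URIs in the usage line are illustrative, and expressing the optional allowed links with the SPARQL 1.1 IN operator is an assumption, since the text does not state how they are encoded.

```python
from typing import Iterable, Optional


def build_direct_query(in_uri: str,
                       forbidden: Iterable[str] = (),
                       allowed: Optional[Iterable[str]] = None) -> str:
    """Assemble a query shaped like Listing 4.1 for one initial resource."""
    conditions = ["isURI(?cr)"]
    # One "?p != <link>" term per forbidden link, as described in the text.
    conditions += [f"?p != <{link}>" for link in forbidden]
    allowed_links = list(allowed) if allowed is not None else []
    if allowed_links:
        # Assumption: allowed links are expressed with the SPARQL 1.1 IN operator.
        joined = ", ".join(f"<{link}>" for link in allowed_links)
        conditions.append(f"?p IN ({joined})")
    filter_expr = " && ".join(conditions)
    return (
        "SELECT DISTINCT ?cr WHERE {\n"
        f"  {{ <{in_uri}> ?p ?cr . }}\n"
        "  UNION\n"
        f"  {{ ?cr ?p <{in_uri}> . }}\n"
        f"  FILTER({filter_expr})\n"
        "}"
    )


# Illustrative call: exclude the population link mentioned for dbr:Turin.
query = build_direct_query(
    "http://dbpedia.org/resource/Turin",
    forbidden=["http://dbpedia.org/property/populationTotal"],
)
print(query)
```

The same approach would extend to the indirect query of Listing 4.2 by adding the isURI(?o) condition and the four UNION branches.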

Hierarchical Generator

The hierarchical generator generates a set of candidate resources located at a specified distance in a hierarchy of categories taken from a category tree described in a dataset. The implementation of this module is inspired by the work of Damljanovic et al. [71], which obtains candidate resources by navigating a category tree of the Wikipedia categories.

The hierarchical generator first extracts the base categories of an initial resource (<inURI>) and then looks for broader categories until a maximum distance (which may be user-defined) is reached. This maximum distance is the hierarchical distance of a broader category from the base categories. It is inversely related to the level of specificity of a category (i.e. a higher distance means that a category is less specific). Listing 4.3 presents the SPARQL query used by the hierarchical generator to obtain the base categories of an initial resource (<inURI>). As noted earlier, dcterms:subject is used in Allied because skos:isSubjectOf and skos:subject are deprecated and not employed in DBpedia.

PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?cat WHERE {
  <inURI> dcterms:subject ?cat .
}

Listing 4.3 The SPARQL query to retrieve base categories of the resource <inURI>.

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?broaderCat WHERE {
  { <catURI> skos:broader ?broaderCat . }
  UNION
  { ?broaderCat skos:narrower <catURI> . }
  ?broaderCat rdfs:label ?categoryName .
  FILTER(lang(?categoryName) = "en") .
}

Listing 4.4 The SPARQL query to retrieve broader categories of the category <catURI>.
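The repeated application of this query up to a maximum distance can be pictured as a level-by-level climb of the category tree. The Python fragment below is only a sketch under stated assumptions: climb_categories and fetch_broader are hypothetical names, fetch_broader stands in for executing the query of Listing 4.4 against the dataset, base categories are assigned distance 0, and the toy category identifiers are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Set


def climb_categories(base_categories: Iterable[str],
                     fetch_broader: Callable[[str], Iterable[str]],
                     max_distance: int) -> Dict[str, int]:
    """Map each reached category to its distance from the base categories."""
    # Assumption: base categories sit at distance 0.
    distances: Dict[str, int] = {cat: 0 for cat in base_categories}
    frontier: Set[str] = set(distances)
    for distance in range(1, max_distance + 1):
        next_frontier: Set[str] = set()
        for cat in frontier:
            # fetch_broader is a stand-in for running the Listing 4.4 query.
            for broader in fetch_broader(cat):
                if broader not in distances:  # keep the smallest distance seen
                    distances[broader] = distance
                    next_frontier.add(broader)
        frontier = next_frontier
    return distances


# Toy hierarchy with invented category names (child -> broader categories).
toy_broader = {
    "cat:Towers_in_Turin": ["cat:Buildings_in_Turin"],
    "cat:Buildings_in_Turin": ["cat:Turin"],
    "cat:Turin": ["cat:Cities_in_Piedmont"],
}
levels = climb_categories(
    ["cat:Towers_in_Turin"],
    lambda cat: toy_broader.get(cat, []),
    max_distance=2,
)
print(levels)
```

With a maximum distance of 2, the climb stops at cat:Turin and never reaches cat:Cities_in_Piedmont, which illustrates how the distance bounds the generality of the categories considered.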

Listing 4.4 shows the SPARQL query used to recursively extract broader categories for each base category, starting from a distance of 1 until the maximum distance is reached. In this query, <catURI> is the URI of the subcategory, and FILTER restricts the search to categories with an English label. After extracting the broader categories, this module extracts the subcategories of all broader categories at the maximum distance (i.e. it descends one level into the category tree) to increase the chance of finding more candidate resources. Finally, the algorithm obtains candidate resources for each category (including subcategories). Listing 4.5 presents the SPARQL query that extracts the subcategories of each broader category obtained by recursive application of the query shown in Listing 4.4. Listing 4.6 shows the query that obtains candidate resources for each category. In this SPARQL query, <catURI> denotes the URI of one of the categories retrieved in the previous steps. As a result, the module creates a “category graph” that includes the initial resource, its category tree, and the candidate resources retrieved for each category. For example, Figure 4.4 shows the category graph for the resource Mole Antonelliana.
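The final steps described above (descending one level from the categories at the maximum distance, then collecting candidates per category) can be sketched as follows. This is an illustrative Python fragment, not the module's actual code: collect_candidates is a hypothetical name, the narrower and members dictionaries stand in for the results of the subcategory and candidate queries, all identifiers in the sample data are invented, and excluding the initial resource from its own candidate set is an assumption.

```python
from typing import Dict, List, Set


def collect_candidates(in_uri: str,
                       distances: Dict[str, int],
                       max_distance: int,
                       narrower: Dict[str, List[str]],
                       members: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return, for every category considered, its candidate resources."""
    categories = set(distances)
    # Descend one level below the categories located at the maximum distance.
    for cat, distance in distances.items():
        if distance == max_distance:
            categories.update(narrower.get(cat, []))
    graph: Dict[str, Set[str]] = {}
    for cat in categories:
        # Assumption: the initial resource is not a candidate for itself.
        graph[cat] = {res for res in members.get(cat, []) if res != in_uri}
    return graph


# Invented sample data: one base category and one broader category.
sample_distances = {"cat:Towers": 0, "cat:Buildings": 1}
sample_narrower = {"cat:Buildings": ["cat:Towers", "cat:Museums"]}
sample_members = {
    "cat:Towers": ["res:Mole_Antonelliana", "res:Tower_B"],
    "cat:Museums": ["res:Museum_A"],
}
category_graph = collect_candidates(
    "res:Mole_Antonelliana", sample_distances, 1, sample_narrower, sample_members
)
print(category_graph)
```

In this toy run, cat:Museums enters the graph only because of the one-level descent, which is how the extra step can surface candidates that the upward climb alone would miss.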

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subCat WHERE {
  { <catURI> skos:narrower ?subCat . }
  UNION
  { ?subCat skos:broader <catURI> . }
  ?subCat rdfs:label ?categoryName .
  FILTER(lang(?categoryName) = "en") .
}

Listing 4.5 The SPARQL query to retrieve subcategories of the category <catURI>.

PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?cr WHERE {
  ?cr dcterms:subject <catURI> .
}

Listing 4.6 The SPARQL query to obtain candidate resources of the category <catURI>.

Dynamic Generator

The dynamic generator is a “hybrid” generator that takes advantage of both the transversal and the hierarchical approaches, giving priority to the existing interlinking between resources, which is one of the four principles of Linked Data [37]. The algorithm of this generator is explained in Section 5.3.