Architecture of the AlLied framework - Recommender Systems based on Linked Data

The design of the AlLied framework was based firstly on the main conceptual ar- chitectures for developing Semantic Web applications, and secondly on the main steps of the recommendation process identified on the systematic literature review conducted in this thesis.

• Conceptual architectures for developing Semantic Web applications: provide a conceptual foundation to create Semantic Web applications as well as other

architectures on the top of them. They were proposed because of the lack of simplifying the development and deployment of Semantic Web applications, which has been an obstacle for real-world adoption of Semantic Web technologies and the Web of Data [88]. Two of the most known conceptual architectures for the Semantic Web are the Semantic Web stack and the conceptual architecture for applications on the Web of Data.

The Semantic Web stack [89] is a layered model representing the architecture of the Semantic Web, which defines relationships among the technologies and languages essential for the Semantic Web. This model may be divided into three layers, a bottom layer that provides the base for the Semantic Web for writing structured data with user-defined vocabularies; a middle layer for the implementation of the Semantic Web core techniques such as ontology languages, and query languages; and a top layer that provides a user interface for applications as well as enhancements to the lower layers though proof validation and trusting operations.

The conceptual architecture for applications on the Web of Data [88] is a component-based, Conceptual architecture for Semantic Web applications, which describes the high-level components that implements the functional- ity that differentiates Semantic Web applications. This architecture contains of seven components: 1) graph access layer: the interface for the application logic to access local or remote data sources; 2) RDF store: the persistent storage or RDF and other graph-based data; 3) Data homogenization service: address the structural, syntactic and semantic heterogeneity of data; 4) data discovery service: implements automatic discovery and retrieval of external data; 5) graph query language: performs graph-based queries on the data in addition to search on unstructured data; 6)graph-based navigation interface: provides a human accessible interface to navigate the graph-based data; and 7)structured data authoring interface: useful to enter new data, edit existing data, and import or export data.

Both conceptual architectures may be divided into three common layers: 1) knowledge base management represented by the bottom layer in the first architecture, and by component 2 in the second architecture; 2) Recommender System Management, represented by the intermediate layer in the first architecture, and by components 1,3,4, and 5 of the second architecture; and 3) user interface and applications layer represented by the top layer of the first architecture and components 6 and 7 of the second architecture.

• The recommendation process: The recommendation process is the set of steps followed by most of the RS studied on the state of the art in order to produce recommendations. In this thesis, the recommendation process was generalized based on the common steps of the RS studied in the SLR presented in 2. This

recommendation process may be decomposed in a sequence of four steps as shown in Figure 3.1.

Figure 3.1. Steps of the recommendation process

These steps represent the tasks that a RS executes to produce recommenda- tions. The first step is intended to generate a set of candidate resources CR that maintain semantic relationships with an initial resource ir. The initial resource may be any object or resource identified with a URI. The semantic relationships may be seen as direct or indirect links between two resources in a Linked Data dataset. The second step sorts the candidate resources generated in the previous step from highest to lowest semantic similarity with the initial resource. In this step, different semantic similarity measures can be used to calculate the semantic similarity between pairs of resources. Up to this point the candidate resources are ranked but a ranked list is too general and does not provide a distinction between the results according to contexts or application domains. For this reason, the third step groups the ranked resources into meaningful clusters or contexts based on hierarchical relationships inferred from the Linked Data. This step may be also swapped with the ranking step, i.e. firstly grouping the candidate resources, and then ranking the candidate resources for each group separately. Finally, the last step presents the results through different interfaces that allow the end-users to visualize the recommendations.

The architecture AlLied framework was designed taking into account the con- ceptual architectures for Semantic Web applications, and the main steps of the recommendation process. The proposed architecture, as shown in Figure 3.2(a), contains the following components: knowledge base, resource generation, resource ranking, results grouping, and presentation. Additionally, these components are located into three main layers according to the layers of the conceptual architectures described before (Figure 3.2(b)): knowledge base, search and discovery, and user interface and applications.

Figure 3.2. Proposed architecture for the AlLied framework (a) and its rela- tionships with the common layers of the conceptual architectures for semantic web applications (b).

3.1.1 Knowledge base management layer

This layer is the “data layer” for the Allied framework, that provides the interfaces needed to access to local or remote Linked Data datasets. It is traversal to the other layers as it is the main data source containing the knowledge about resources and their structural relationships. Additionally, this layer can access to local copies (dumps) of the Linked Data datasets as well as to remote datasets via their end- points. Other types of datasets may be derived from more general datasets for example, matrixes with a sub-set of the data extracted from main datasets (e.g., a matrix containing data about films extracted from DBpedia).

The preferred mechanism for describing Linked Data is the RDF language [88] and the most used Linked Data dataset, as demonstrated in Chapter 2, is DBpedia. DBpedia is one of the most general and complete datasets which was even considered as the hub of the Linked Open Data [90].

3.1.2 Recommender System Management layer

This layer provides the mechanisms for retrieving, searching, discovering and ranking resources based on the data extracted from the knowledge bases. This layer contains four main components: dataset manager, resource generation, results ranking, and results grouping.

Resource generation

This layer aims to generate resources related to an initial resource (resources gen- erally are related to a real-world item e.g., a film, a song, a place etc). This layer receives an initial resource (or a set of initial resources) and generates a set of candidate resources located at a predefined distance from the initial resource. Algorithms in this component may or not rank the candidate resources. The most simple algo- rithm in this component may be a keyword-based search, which extract resources based on the similarity of their names.

Results ranking

This layer mainly ranks (defines which resources are more similar to the initial resource) the candidate resources obtained in the previous layer, based on semantic similarity functions, i.e. the candidate resources generated in the previous layer are sorted according to their semantic distance values with the initial resource.

Results grouping

The Allied framework is based on the DBpedia dataset which is a general purpose source of data. For this reason the results obtained contains an inherent ambiguity due the generality of the data used to produce the recommendations. Moreover, a ranked list of recommendations not always is a good way to show the results, because users may require results arranged according to their subjective needs.

Under these circumstances, the results groping component provides mechanisms to categorize the results obtained from the ranking layer arranging the candidate resources into meaningful clusters or contexts elucidating the wide range of categories they belong to.

3.1.3 User interface and applications layer

This layer provides different interfaces to give access to external applications to the recommendation results. In this way the framework Allied can easily be integrated into other applications consuming resource recommendations.

It may be noted that each layer may contains multiple algorithms (plugins) that can be used alone or integrated with other algorithms to produce recommendations that may be suitable for different requirements of domains and contexts. For example, Figure 3.2 shows the generation component with a set of algorithms {Rc1, Rc2, . . . , Rcn} that generate resources located at a predefined semantic dis-

tance with respect to an initial resource. These algorithms can be integrated with other algorithms of the same layer or the other layers.

Likewise, the algorithms of the ranking component {Rk1, Rk2, . . . , Rkm} may be

integrated with the generation component algorithms in order to produce ranked lists based on the semantic relatedness between each tuple (ic, ci), where ic is the

initial resource and ci is each one of the candidate resources generated by one of

the {Rc1, Rc2, . . . , Rcn} algorithms. In this way, it is possible to produce recom-

mendations based on semantic relationships and to study the application of these algorithms under different contexts and conditions.

In document Recommender Systems based on Linked Data (Page 49-54)