Chapter 2 Related Work
2.5 Generating Paths in Data Graphs
In this section, we review approaches for generating paths in information spaces in order to position the contribution of our approach, which generates exploration paths in data graphs for knowledge expansion using knowledge anchors.
The problem of generating exploration paths in information spaces has been tackled in several research fields, such as education, recommender systems, e-Learning, and data graphs. In its simplest definition, a path is an alternating sequence of nodes and links in an information space, often represented as a sequence of just nodes [119]. An information space can be seen as a heterogeneous graph containing different types of edges and vertices [120]. Approaches for generating paths in information spaces can be grouped into two categories. The first category concerns generating paths from a given start entity (e.g. a search keyword), and the second category focuses on generating paths using two or more entities in the information space (e.g. finding the relationships between two actors in the movie domain, or finding relationships between combination drug therapy regimens commonly used to treat a particular disease). Most attention has been paid to the latter category, which aims to discover connections (“associations”) between entities by exploring possible paths that link the two entities.
In virtual environments, an approach for generating navigation paths for virtual tour guides has been proposed in [121]. The hypothesis was that virtual navigation paths can help users to familiarise themselves with the virtual environment and understand the meanings of its virtual objects. A virtual navigation path is created by linking several navigation landmarks (i.e. objects in the virtual environment) in a well-defined order. The landmarks can be selected in a freehand mode, where they are freely chosen by the path designer, or in a grid mode, where the path designer specifies the navigation landmarks by selecting a specific sphere of a grid layout. The transition from one landmark to the next follows the shortest path between them.
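As a rough illustration of the idea described above (an ordered list of landmarks joined by shortest-path transitions), the following sketch chains breadth-first-search shortest paths between consecutive landmarks. It is not taken from [121]: the environment is assumed to be an unweighted graph, and the graph `ENVIRONMENT`, the landmark names and the functions `shortest_path` and `navigation_path` are hypothetical names introduced only for this example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted graph.

    Returns the list of nodes from start to goal, or None if goal is unreachable.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def navigation_path(graph, landmarks):
    """Visit the landmarks in the given order, joining each pair by a shortest path."""
    if not landmarks:
        return []
    tour = [landmarks[0]]
    for current, following in zip(landmarks, landmarks[1:]):
        leg = shortest_path(graph, current, following)
        if leg is None:
            raise ValueError(f"no route from {current} to {following}")
        tour.extend(leg[1:])  # skip the first node, it is already in the tour
    return tour


# Hypothetical virtual environment (assumption, for illustration only).
ENVIRONMENT = {
    "entrance": ["hall"],
    "hall": ["entrance", "gallery", "library"],
    "gallery": ["hall", "garden"],
    "library": ["hall", "garden"],
    "garden": ["gallery", "library"],
}

tour = navigation_path(ENVIRONMENT, ["entrance", "garden", "library"])
print(tour)
```

In this sketch the path designer only supplies the ordered landmarks; the intermediate nodes of the tour are filled in automatically by the shortest-path step.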
In recommender systems, path construction has been used to provide serendipitous recommendations between two entities. For example, the work in [122] aimed to generate serendipitous recommendations for users based on the mobile applications installed on the user’s phone. The main intuition behind the method is that, if there exists a path connecting two applications on a user’s phone, then the applications along this path that are not already installed on the user’s phone are good candidates for serendipitous recommendations.
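The intuition above can be sketched as follows. This is only an illustrative reading of the idea, not the method of [122]: it assumes an undirected graph of related applications, uses a breadth-first shortest path as the connecting path, and all names (`APP_GRAPH`, `find_path`, `serendipitous_candidates`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def serendipitous_candidates(graph, installed):
    """Collect applications lying on a path between two installed applications."""
    candidates = []
    for first, second in combinations(sorted(installed), 2):
        path = find_path(graph, first, second)
        if path is None:
            continue
        for app in path[1:-1]:  # intermediate nodes only
            if app not in installed and app not in candidates:
                candidates.append(app)
    return candidates


# Hypothetical graph of related applications (assumption, for illustration only).
APP_GRAPH = {
    "maps": ["transit", "weather"],
    "transit": ["maps", "tickets"],
    "tickets": ["transit", "music"],
    "music": ["tickets", "podcasts"],
    "podcasts": ["music"],
    "weather": ["maps"],
}

print(serendipitous_candidates(APP_GRAPH, {"maps", "music"}))
```

Here the two installed applications are connected through two applications that the user does not have, and these intermediate applications are returned as candidates.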
In e-Learning, several definitions of learning paths have been proposed. In [123], a learning path (also called a curriculum sequence) comprises steps for guiding a student to effectively build up knowledge and skills in an online environment. A learning path of good quality is a sequence of course modules arranged in such a way that it satisfies most or all of the knowledge requirements of the course modules involved [124]. According to [125], a learning path in online learning systems refers to a sequence of learning objects designed to help students improve their knowledge or skills in particular subjects or degree courses. The work in [126] aimed to construct learning paths that can help individual learners reduce cognitive overload and disorientation. The approach used ontologies as structured knowledge representations to construct personalised learning paths. It generates an ontology-based concept map which clusters sequences of courses, and then uses the concept map to generate a learning path, taking into account the order of prior and posterior courses. The experimental results indicated that the proposed approach can produce high-quality learning paths that are likely to reduce learners’ cognitive overload during the learning process.
In education, course selection has been discussed in [127] as a decision problem for students who want to make suitable selections of their future courses. The authors presented CourseNavigator, a course exploration system that identifies possible course selection options, referred to as learning paths, that allow students to meet their educational goals. For this, the authors use a learning graph, a directed graph which encodes constraints such as class scheduling and course requirements. As a result, the learning graph offers many options for students to follow. To address this challenge, CourseNavigator identifies all possible learning paths for a student, and then allows the student to control his/her learning path through a ranking function (e.g. find the shortest path, find the most reliable path).
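The two steps described above (enumerating all learning paths, then ordering them with a ranking function chosen by the student) can be sketched as below. This is a simplified illustration and not the CourseNavigator implementation: the learning graph is assumed to be a plain directed graph without scheduling constraints, and the course codes, the `WORKLOAD` values and the function names are hypothetical.

```python
def all_learning_paths(graph, start, goal, _prefix=None):
    """Enumerate every simple path from start to goal by depth-first search."""
    prefix = (_prefix or []) + [start]
    if start == goal:
        return [prefix]
    paths = []
    for following in graph.get(start, []):
        if following not in prefix:  # avoid revisiting a course
            paths.extend(all_learning_paths(graph, following, goal, prefix))
    return paths


def rank_paths(paths, score):
    """Order paths by a student-chosen scoring function (lower score first)."""
    return sorted(paths, key=score)


# Hypothetical learning graph: an edge A -> B means B can be taken after A.
LEARNING_GRAPH = {
    "CS101": ["CS201", "MA101"],
    "MA101": ["CS201"],
    "CS201": ["CS301", "CS310"],
    "CS301": ["CS401"],
    "CS310": ["CS401"],
}

# Hypothetical weekly workload per course, used as a second ranking criterion.
WORKLOAD = {"CS101": 5, "MA101": 3, "CS201": 6, "CS301": 8, "CS310": 4, "CS401": 7}


def total_workload(path):
    """Sum of the (assumed) workloads of the courses on a path."""
    return sum(WORKLOAD[course] for course in path)


paths = all_learning_paths(LEARNING_GRAPH, "CS101", "CS401")
shortest_first = rank_paths(paths, len)
lightest_first = rank_paths(paths, total_workload)
print(shortest_first[0])
print(lightest_first[0])
```

Passing the ranking criterion as a function keeps the enumeration step independent of the student's preference, so other criteria can be plugged in without changing how the paths are generated.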
In data graphs, the notion of path queries has been used in [128] for constructing paths in graph databases. The work used queries based on regular expressions (i.e. regular words) to indicate the start and end entities of paths in the graph. For example, consider a geographical graph database that represents neighbourhoods (i.e. places) as entities and transport facilities (e.g. Bus, Tram) as edges. A user who writes a simple query such as “I need to go from Place a to Place b” will be provided with the different transport facilities that follow different routes (paths) from Place a to the destination Place b.
Another notion that is widely used in the context of data graphs is the property path. The notion of a property path (also called a property sequence) was used by the W3C to define possible routes between two entities in a data graph. A trivial case of a property path is a triple pattern (i.e. a property path of length 1). Property paths have been used in [129] to capture paths in an RDF data graph as sequences of directed edges (i.e. properties). These paths have been used to identify associations between entities in data graphs. An association from entity a to entity b comprises the labels of the entities and edges [7]. However, in data graphs there are usually high numbers of associations (i.e. possible paths) between entities, and hence ways to refine and filter the available paths are needed. The authors in [7] aimed to address this challenge and presented a system called Explass (http://ws.nju.edu.cn/explass/) for recommending patterns (i.e. paths) between two entities. Explass identifies patterns as sequences of classes and relationships (edges) used for recommending exploration paths. To suggest suitable patterns, the authors use the frequency of a pattern to reflect its relevance to the query, and use the informativeness of the classes and relationships in the pattern to indicate the informativeness of the pattern (i.e. the informativeness of a pattern is obtained by adding the informativeness of all classes and relationships in the pattern). The informativeness of a class is based on the assumption that a class having fewer instances is more specific and thus more informative. The idea is similar for relationships (i.e. edges), except that informativeness is considered with respect to the start (subject) and target (object) entities of a relationship. Furthermore, Explass considers the overlap between patterns: patterns that overlap heavily are redundant and thus will not be recommended together (i.e. overlap is used to filter the recommended patterns).
RelFinder (http://relfinder.dbpedia.org) [130] is another approach for helping users to get an overview of how two entities are associated. It reveals all possible paths between the two entities in the RDF graph, which causes a high cognitive load for users. Furthermore, one of its usage restrictions for lay users is that the user must supply valid entry points, a SPARQL query endpoint and the repositories to query.
Although a significant amount of work has tried to address the problem of supporting users’ exploration through paths in data graphs, a solution that guides lay users through paths that expand their domain knowledge is still missing. None of the approaches outlined above investigates the user’s familiarity with the domain, which is the main focus of the approach we present in this research for generating exploration paths, where familiarity is related to understanding and knowledge expansion in the domain. The uniqueness of our approach is the explicit consideration of the knowledge utility of exploration paths in data graphs. This is crucial for the usability of semantic exploration applications, especially when the users are not domain experts. Knowledge utility approximates the extent to which a user expands his/her domain knowledge while exploring a path in a data graph. Furthermore, the above approaches do not apply the subsumption theory for meaningful learning [21] in the context of a data graph. In this research, we operationalise the notion of basic level objects to identify entities familiar to the user, which can be used as knowledge bridges to direct the user from familiar to less familiar entities in the graph. We apply the subsumption theory for meaningful learning and offer a unique use of knowledge anchors in data graphs to generate exploration paths for knowledge expansion. This can facilitate the adoption of data graph exploration in the learning domain. It can also be useful in other applications to facilitate exploration by users who are not familiar with the domain presented in the graph.
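To make the pattern ranking attributed to Explass [7] earlier in this section more concrete, the sketch below scores patterns by summed informativeness and filters out heavily overlapping ones. Several details are assumptions made for illustration and are not taken from [7]: informativeness is computed as the negative logarithm of a relative usage count (so that rarer classes and relationships score higher), overlap is measured with the Jaccard coefficient, patterns are selected greedily, and all counts and names are hypothetical.

```python
import math


def element_informativeness(count, total):
    """Assumed measure: rarer elements (fewer instances) are more informative."""
    return -math.log(count / total)


def pattern_informativeness(pattern, counts, total):
    """Sum the informativeness of all classes and relationships in a pattern."""
    return sum(element_informativeness(counts[element], total) for element in pattern)


def overlap(first, second):
    """Assumed overlap measure: Jaccard coefficient of the two patterns' elements."""
    a, b = set(first), set(second)
    return len(a & b) / len(a | b)


def recommend(patterns, counts, total, max_overlap=0.4):
    """Greedily keep the most informative patterns that do not overlap too much."""
    ranked = sorted(
        patterns,
        key=lambda p: pattern_informativeness(p, counts, total),
        reverse=True,
    )
    selected = []
    for pattern in ranked:
        if all(overlap(pattern, kept) <= max_overlap for kept in selected):
            selected.append(pattern)
    return selected


# Hypothetical usage counts for classes and relationships in a movie data graph.
COUNTS = {
    "Person": 1000, "Actor": 100, "Director": 20, "Film": 200,
    "starring": 400, "director": 50,
}
TOTAL = 1000

P1 = ("Actor", "starring", "Film")
P2 = ("Person", "starring", "Film")
P3 = ("Director", "director", "Film")

print(recommend([P1, P2, P3], COUNTS, TOTAL))
```

In this example the pattern that uses the general class Person is less informative than the one that uses Actor, and it is also dropped from the recommendation because it shares two of its three elements with an already selected pattern.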