
Chapter 2 Background and Literature Review

2.2 Linked Data (LD), Ontology and the Semantic Web (SW)

The conventional Web has met its objectives as a global document repository that can be easily accessed and consumed by human beings. However, the goal of constructing a document repository that is processed by not only humans but also machines has not been realised. Hence, the SW was introduced as an extension of the current Web, providing the technical capabilities to publish data in a machine-understandable format [34]–[37].

The ultimate vision of the SW is to provide a common framework that intertwines data and its semantics in one package. Furthermore, it facilitates data sharing and reuse across different applications. Thus, the essential step that must be materialised to meet the SW vision is the development of a technique for publishing and connecting data over the Web. This technique is the backbone of the SW and lies at the heart of its architecture. Accordingly, the concept of linked data (LD) was introduced in response to this need [38]–[40]. Figure 2-4 (below) depicts the architecture of the SW.


Figure 2-4 The SW reference architecture [41]

Linked data (LD) was introduced by Sir Tim Berners-Lee in 2006 and defined as the set of best practices for publishing and connecting structured data on the Web. These practices are currently known as LD principles. They can be summarised in the following points [42].

a. Use uniform resource identifiers (URIs) to name things.

b. Use Hypertext Transfer Protocol (HTTP) URIs, so that people can look up those names.

c. When someone looks up a URI, provide useful information, using standards such as the Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL).

d. Include links to other URIs, so that people can discover more things.

The technical foundation of the LD concept is the use of HTTP and URIs, not only to access Web documents but also to define and access real-world entities. Additionally, RDF is used to represent the defined entities [43]. In fact, RDF is the basic building block for information representation in the SW.

An RDF statement is structured as a triple, that is, subject–predicate–object, and a set of such triples forms a graph-based data model. Furthermore, RDF is enriched by the RDF Schema (RDFS for short). RDFS is used to define the vocabularies employed in a specific RDF data model. It uses sub-class and sub-property relationships, in addition to domain and range restrictions, to describe the relationships between objects and to identify which property applies to which object. Beyond RDF and RDFS, the Web Ontology Language (OWL) adds further vocabulary for describing properties and classes, leading to better machine readability of Web content [42], [44], [45].
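The triple model and the role of RDFS can be illustrated with a minimal Python sketch. It is not a standards-compliant reasoner: it only shows how domain, range and sub-class statements allow new facts to be derived from a single triple. All names with the `ex:` prefix, the `Triple` type and the `infer_types` function are assumptions introduced for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical RDFS vocabulary; the "ex:" names are illustrative only.
SCHEMA = {
    Triple("ex:Article", "rdfs:subClassOf", "ex:Publication"),
    Triple("ex:Publication", "rdfs:subClassOf", "ex:Document"),
    Triple("ex:hasAuthor", "rdfs:domain", "ex:Publication"),
    Triple("ex:hasAuthor", "rdfs:range", "ex:Person"),
}

# Hypothetical instance data: a single statement about an article.
DATA = {
    Triple("ex:article001", "ex:hasAuthor", "ex:person002"),
}


def infer_types(schema, data):
    """Derive rdf:type triples from domain, range and sub-class statements."""
    inferred = set()
    for t in data:
        for s in schema:
            if s.subject != t.predicate:
                continue
            if s.predicate == "rdfs:domain":
                # The subject of the property belongs to the domain class.
                inferred.add(Triple(t.subject, "rdf:type", s.obj))
            elif s.predicate == "rdfs:range":
                # The object of the property belongs to the range class.
                inferred.add(Triple(t.obj, "rdf:type", s.obj))
    # Propagate types up the sub-class hierarchy until nothing new appears.
    changed = True
    while changed:
        changed = False
        for t in list(inferred):
            for s in schema:
                if s.predicate == "rdfs:subClassOf" and s.subject == t.obj:
                    new = Triple(t.subject, "rdf:type", s.obj)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


TYPES = infer_types(SCHEMA, DATA)
```

In this sketch, the single `ex:hasAuthor` statement is enough to conclude that `ex:article001` is a publication (and therefore a document) and that `ex:person002` is a person, which is the kind of machine-interpretable enrichment that RDFS is intended to provide.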

The word “ontology” is rooted in philosophy, where it refers to the study of the nature of existence. The term has since been adopted by the computer science field and used in a different sense. In computer science literature, an ontology provides a shared understanding of domain knowledge and is treated as a special kind of information object or computational artefact. It has been defined as a “formal, explicit specification of shared conceptualization”. The terms used in this definition can be further explained as follows [9], [36], [37], [46]:

a. Formal indicates that ontology must be machine-readable.

b. Explicit refers to the fact that the types of concept, and the constraints on their use, are explicitly defined.

c. Shared means that the notions captured by an ontology must reflect consensual knowledge that is accepted by a group, and not the private view of certain individuals.

d. Conceptualisation refers to an abstract model of some phenomenon in the world, having identified the relevant concepts of the phenomenon.

A number of researchers hold the view that a computational ontology formally represents the conceptual structure of a domain: the entities in the targeted domain and the relations between them can be encoded in the ontology [9], [47], [48]. Figure 2-5 (below) shows an example of how an ontology is used to represent a publication domain.


Figure 2-5 Ontology RDF triple relation [44]

According to [44], the structure of an ontology is defined as a tuple S = (C, R, H, rel, A), where:

• C is the set of entities describing the conceptual structure of the targeted system;

• R is the set of relation types;

• H is the set of taxonomy relationships over C;

• rel is the set of relationships over C with relations of type R, where rel ⊆ C × C;

• A is the set of description logic sentences.

Furthermore, each element of rel is defined as a three-element tuple (s, r, o), which represents the subject–relation–object relationship, where:

• s is the subject, which is an element of C;

• r is the relation, which is an element of R;

• o is the object, which is an element of C.
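The tuple above can be sketched as a small Python data structure. This is an illustrative reading of the definition rather than the formalisation used in [44]: the class name `OntologyStructure`, its field names, and the example concepts are assumptions, and the axioms in A are kept as opaque strings.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class OntologyStructure:
    """Sketch of S = (C, R, H, rel, A); field names are illustrative."""
    concepts: Set[str]                  # C: entities of the domain
    relation_types: Set[str]            # R: relation types
    taxonomy: Set[Tuple[str, str]]      # H: (sub-concept, super-concept) pairs
    rel: Set[Tuple[str, str, str]] = field(default_factory=set)  # (s, r, o)
    axioms: List[str] = field(default_factory=list)  # A: opaque sentences

    def add_relation(self, s: str, r: str, o: str) -> None:
        """Add (s, r, o) only if s and o are in C and r is in R."""
        if s not in self.concepts or o not in self.concepts:
            raise ValueError("subject and object must be elements of C")
        if r not in self.relation_types:
            raise ValueError("relation must be an element of R")
        self.rel.add((s, r, o))


# Hypothetical publication-domain ontology, for illustration only.
onto = OntologyStructure(
    concepts={"Publication", "Article", "Person"},
    relation_types={"hasAuthor"},
    taxonomy={("Article", "Publication")},
)
onto.add_relation("Article", "hasAuthor", "Person")
```

The membership checks in `add_relation` mirror the constraints stated in the definition: a triple is accepted only when its subject and object are drawn from C and its relation from R.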

To further illustrate the structure of ontology and to gain a better understanding of its basic components, the following example converts the ontology model given in Figure 2-5 (above) into its equivalent RDF statements.

Example: the RDF triples for the ontology model of the publication system (Figure 2-5).

• hasAuthor(article001, person002).

• hasTitle(article001, "Science of nature").

• hasName(person002, "John Ken").

The above RDF statements can be serialised in RDF/XML syntax, as shown in Figure 2-6 (below).
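As a rough illustration of such a serialisation (not a reproduction of Figure 2-6), the following Python sketch builds an RDF/XML document for the three statements using only the standard library. The namespace `http://example.org/publication#` and the function name `build_rdf_xml` are assumptions made for this example; the RDF namespace URI is the standard W3C one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/publication#"  # hypothetical vocabulary namespace

# Ask ElementTree to emit readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def build_rdf_xml() -> str:
    """Serialise the three example statements as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")

    # hasTitle(article001, "Science of nature") and
    # hasAuthor(article001, person002)
    article = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": EX_NS + "article001"},
    )
    title = ET.SubElement(article, f"{{{EX_NS}}}hasTitle")
    title.text = "Science of nature"
    ET.SubElement(
        article,
        f"{{{EX_NS}}}hasAuthor",
        {f"{{{RDF_NS}}}resource": EX_NS + "person002"},
    )

    # hasName(person002, "John Ken")
    person = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": EX_NS + "person002"},
    )
    name = ET.SubElement(person, f"{{{EX_NS}}}hasName")
    name.text = "John Ken"

    return ET.tostring(root, encoding="unicode")


RDF_XML = build_rdf_xml()
```

Literal values (the title and the name) are written as element text, whereas the link from the article to its author is written as an `rdf:resource` attribute, so that the object of that statement is itself an identifiable resource.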

Ontologies can be broadly classified into two categories, namely, lightweight and heavyweight ontologies. A lightweight ontology is mainly concerned with taxonomies and includes concepts, concept taxonomies, relationships between concepts, and properties that describe concepts. A heavyweight ontology, on the other hand, models the domain in greater depth and imposes more restrictions by adding axioms and constraints to the lightweight ontology. Figure 2-7 (below) categorises ontologies by their structural complexity, along a line that runs from lightweight to heavyweight [46], [49].

Figure 2-7 McGuinness Ontology Spectrum Classification [49]

Ontology development and management is not a trivial process. The objectives of the ontology and the development methodology must be critically assessed and closely followed by the organisation or individual commissioning it. The field that studies the methods and tools for ontology development and maintenance is known as ontology engineering. It covers the set of activities concerned with the ontology development process, including the ontology life cycle, methodologies, tools and languages for building ontologies. The following activities are recommended during the ontology development process [9], [46].

a. Ontology management activities include scheduling, control and quality assurance.

b. Ontology-development-oriented activities are grouped into pre-development, development and post-development.

c. Ontology support activities include knowledge acquisition, evaluation, integration, merging, alignment, documentation and configuration management.


Figure 2-8 Ontology development process [46]

Having defined what an ontology is and explained its structure and how it can be used to represent domain knowledge, the next section discusses the integration of ontology into DM techniques.