The Semantic Web was introduced by Berners-Lee et al. in [134], and a revised definition followed a few years later [135]. One fundamental idea of the Semantic Web, knowledge representation, is key to this work. The core idea is to represent all the knowledge about Big Data analytics with Semantic Web technology (through an ontology), so that reasoners or SPARQL queries can be used to retrieve this information or to deduce new knowledge from it.
We use Semantic Web models to define the concepts related to Big Data analytics. The models and tools used in this work are described below.
• Ontology. Following the definitions in [136] and [137], an ontology provides a formal representation of the real world. It gives an explicit description of the concepts in a domain of discourse (classes), the properties of each concept, which describe its features and attributes, and the restrictions on those properties. In other words, ontologies define data models in terms of classes, subclasses, and properties. Ontologies are part of the W3C standard stack of the Semantic Web¹. An ontology together with a set of individual instances of its classes constitutes a knowledge base and offers services that facilitate interoperability across multiple heterogeneous systems and databases.
• RDF. The Resource Description Framework [138] is a W3C recommendation that defines a language for describing resources on the web. Resources are described in terms of properties and property values using RDF statements. Statements are represented as triples consisting of a subject, a predicate, and an object. RDF Schema (RDFS) [139] describes the vocabularies used in RDF descriptions.
• OWL. The Web Ontology Language is used to define ontologies on the Web. It extends RDF and RDFS with additional vocabulary. Formally, OWL is equivalent to a very expressive description logic (DL), where an ontology corresponds to a TBox [140]. In this sense, OWL-DL is the sublanguage that gives maximum expressiveness while retaining computational completeness and decidability [141]. In this work, we use the OWL-DL syntax summarized in Table 2.1 to formalize the proposed ontology.
• SPARQL. SPARQL is the query language recommended by the W3C [142] for working with RDF graphs [143]. It supports queries over RDF stores and web data sources identified by URIs. Essentially, SPARQL is a graph-matching query language that can be used to extract knowledge from the model.
• SWRL. The Semantic Web Rule Language provides OWL-based ontologies with procedural knowledge, which compensates for some of the limitations of ontology inference, particularly in identifying semantic relationships between individuals [144]. SWRL uses the typical logic expression “Antecedent ⇒ Consequent” to represent semantic rules. Both the antecedent (rule body) and the consequent (rule head) can be conjunctions of one or more atoms, written as “atom1 ∧ atom2 ∧ · · · ∧ atomn”. Each atom takes one or more arguments, and variables are written with a leading question mark (e.g., ?x). The most common uses of SWRL include transferring characteristics and inferring the existence of new individuals [145]².
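To make the roles of RDF, SPARQL, and SWRL more concrete, the following is a minimal sketch in plain Python rather than an actual Semantic Web toolkit. It assumes a toy set of triples with a hypothetical `ex:` prefix, implements a simplified form of graph-pattern matching (the idea behind SPARQL queries), and applies one SWRL-style rule, hasParent(?x, ?y) ∧ hasBrother(?y, ?z) ⇒ hasUncle(?x, ?z). None of the names below come from the ontology used in this thesis.

```python
# Assumed toy data: each RDF statement is a (subject, predicate, object) triple.
TRIPLES = {
    ("ex:Mary", "ex:hasParent", "ex:John"),
    ("ex:John", "ex:hasBrother", "ex:Paul"),
    ("ex:Mary", "rdf:type", "ex:Person"),
}


def is_var(term):
    """Variables are written with a leading question mark, e.g. ?x."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def apply_rule(body, head, triples):
    """Forward-chain one rule (body => head) until no new triple is derived."""
    triples = set(triples)
    while True:
        derived = {
            tuple(binding.get(term, term) for term in head)
            for binding in query(body, triples)
        }
        if derived <= triples:
            return triples
        triples |= derived


# SWRL-style rule: hasParent(?x, ?y) ∧ hasBrother(?y, ?z) ⇒ hasUncle(?x, ?z)
body = [("?x", "ex:hasParent", "?y"), ("?y", "ex:hasBrother", "?z")]
head = ("?x", "ex:hasUncle", "?z")
inferred = apply_rule(body, head, TRIPLES)

# Graph-pattern query over the enriched graph: who is Mary's uncle?
uncles = query([("ex:Mary", "ex:hasUncle", "?who")], inferred)
print(uncles)  # [{'?who': 'ex:Paul'}]
```

The rule adds a triple that was never asserted explicitly, and the query then retrieves it like any other fact. A real SPARQL engine or SWRL reasoner offers far more (filters, typed literals, OWL semantics), so this should be read only as an illustration of the underlying matching idea.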
In this scenario, the concept of Smart Data emerges. It is defined as the result of the process of analysis performed to extract relevant information and knowledge from Big Data, including
¹ https://www.w3.org/standards/semanticweb/
² https://www.w3.org/Submission/SWRL/
Table 2.1: Basic OWL-DL semantic syntax used to formally define the proposed ontology

Descriptions    Abstract Syntax                      DL Syntax
Operators       intersection(C1, C2, · · ·, Cn)      C1 ⊓ C2 ⊓ · · · ⊓ Cn
Operators       union(C1, C2, · · ·, Cn)             C1 ⊔ C2 ⊔ · · · ⊔ Cn
Restrictions    for at least 1 value V from C        ∃V.C
Restrictions    for all values V from C              ∀V.C
Restrictions    R is Symmetric                       R ≡ R⁻
Class Axioms    A partial(C1, C2, · · ·, Cn)         A ⊑ C1 ⊓ C2 ⊓ · · · ⊓ Cn
Class Axioms    A complete(C1, C2, · · ·, Cn)        A ≡ C1 ⊓ C2 ⊓ · · · ⊓ Cn
context information and using a standardized format. By context we mean all the (meta)information needed to interpret the analysis results. This leads to the enforceability of these results, thus facilitating their interpretation, their integration with other structured data, the integration of the Big Data analysis system with other systems, the interconnection (in a standardized way, at lower cost, and with higher accuracy and reliability) of third-party algorithms and services, etc.
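The constructors listed in Table 2.1 have a standard set-theoretic reading in description logics: a class denotes a set of individuals and a property denotes a set of pairs. The following sketch, under that reading, evaluates each row of the table over a small interpretation. The domain, the classes C1 and C2, and the property V are assumed toy data chosen only for illustration.

```python
# Assumed toy interpretation: a domain of individuals, class extensions as
# sets, and a property V as a set of (subject, object) pairs.
DOMAIN = {"a", "b", "c", "d"}
C1 = {"a", "b"}
C2 = {"b", "c"}
V = {("a", "b"), ("a", "c"), ("b", "b"), ("c", "d")}


def intersection(*classes):
    """intersection(C1, ..., Cn): individuals that belong to every class."""
    return set.intersection(*classes)


def union(*classes):
    """union(C1, ..., Cn): individuals that belong to at least one class."""
    return set.union(*classes)


def values_of(prop, individual):
    """All objects related to `individual` through the property `prop`."""
    return {o for s, o in prop if s == individual}


def some_values_from(prop, cls, domain):
    """∃V.C: individuals with at least one V-value in C."""
    return {x for x in domain if values_of(prop, x) & cls}


def all_values_from(prop, cls, domain):
    """∀V.C: individuals whose V-values all lie in C (vacuously true if none)."""
    return {x for x in domain if values_of(prop, x) <= cls}


def is_symmetric(prop):
    """R ≡ R⁻: the property equals its own inverse."""
    return prop == {(o, s) for s, o in prop}


def is_partial(a, *classes):
    """A ⊑ C1 ⊓ ... ⊓ Cn: necessary conditions only (subclass)."""
    return a <= intersection(*classes)


def is_complete(a, *classes):
    """A ≡ C1 ⊓ ... ⊓ Cn: necessary and sufficient conditions (equivalence)."""
    return a == intersection(*classes)


print(sorted(intersection(C1, C2)))                 # ['b']
print(sorted(union(C1, C2)))                        # ['a', 'b', 'c']
print(sorted(some_values_from(V, C2, DOMAIN)))      # ['a', 'b']
print(sorted(all_values_from(V, C2, DOMAIN)))       # ['a', 'b', 'd']
print(is_symmetric(V))                              # False
print(is_partial(set(), C1, C2), is_complete(set(), C1, C2))  # True False
```

Note that the universal restriction also holds for individuals with no V-values at all (here, "d"), which is a frequent source of surprise when writing ∀V.C restrictions. The difference between partial and complete axioms shows up in the last line: the empty class is a subclass of the intersection but is not equivalent to it.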
In this thesis, we use the Semantic Web as the technology that acts as the glue binding the components of a workflow together. Furthermore, we have defined an ontology called BIGOWL [18], which describes the semantic model for designing Big Data workflows. In addition, we have defined SWRL rules and SPARQL queries for accessing the information and identifying semantic relationships. The ontology instances are stored in RDF triple format in a Stardog³ repository, which is a commercial version of the Pellet OWL 2 reasoner [146] enhanced with persistence capabilities in addition to its reasoning tasks.
Chapter 3
Methods and Materials
3.1 Introduction
This chapter presents all the material developed in this thesis as well as the methods used to that end. It is worth pointing out that this thesis contributes not only new algorithmic proposals but also software tools. First, we present our Big Data optimization framework (jMetalSP), its class architecture, and its design features. Second, we present the semantic model, the BIGOWL ontology, and the RDF repository. Third, we describe the architecture of the execution platform used in this thesis. We then describe the experimental methodology followed and, finally, the software repositories that host all the software developed.