Knowledge Engineering - Knowledge Components and Methods for Policy Propagation in Data Flows

Knowledge Engineering (KE) is a computer science discipline that studies methods and tools for the representation and acquisition of knowledge. In general, KE is concerned with technological and representational aspects but also with human factors, which places it in a middle ground between computer science, cognitive studies and human-computer studies [Aussenac-Gilles and Gandon (2013)]. KE is at the core of our approach to analysing the problem of policy propagation, and it motivates our focus on the knowledge components required and on the way they are acquired, developed and managed (see the research questions in Section 1.2). KE developed as a discipline from Allen Newell’s 1982 paper “The Knowledge Level” [Newell (1982)]. The paper argued that systems can (and should) be designed and modelled independently of their implementation in concrete software artefacts. It did so by distinguishing the representation layer, the so-called symbol level, from a more abstract layer concerned with knowledge, the knowledge level. The main characteristic of a Knowledge System is that it incorporates a large quantity of information, whether facts (data) or statements (rules), in order to perform knowledge-intensive tasks [Stefik (2014)]. From this perspective, the general problem is one of knowledge transfer: incorporating human knowledge into an intelligent system capable of reasoning over a specific family of problems with the accuracy of an expert in the field. Approaches to the development of such systems aim at collecting relevant information and organising it in a way that is both cognitively relevant and machine processable. Therefore, somewhat as Software Engineering did for software development [Studer et al. (1998)], the field of Knowledge Engineering emerged as an attempt to formalise and make reproducible the processes, methods, and tools needed to design and build Knowledge Systems at scale.

The initial strategies for the development of knowledge systems were essentially based on a plain representation of information in the symbolism of an inference engine, for example, production rules, Prolog, or LISP. These first-generation systems (and related approaches) were successful in incorporating human expertise in specific areas, but they lacked adaptability to different domains. To tackle this problem, further research centred on reuse, and second-generation methodologies focused on the development of reusable abstract components to be instantiated in specific domains. In [Clancey (1985)], Clancey observed that many first-generation expert systems could be classified under a common model, called "Heuristic Classification", even though they had been developed using different representation formalisms. This shifted the focus from encoding expertise to identifying the knowledge components to be used in the development of these systems (compare [Breuker and Wielinga (1987); Bylander and Chandrasekaran (1987); Motta (1997); Steels (1990)]). By extension, it also led to the definition of generic methodologies, based on such components, for the acquisition and design of knowledge systems. These methodologies rested on the hypothesis that the development of knowledge systems was essentially a model construction problem, and therefore required methods and tools for the definition, formalisation and reuse of such models [Studer et al. (1998)]. For example, the well-known KADS methodology [Breuker and Wielinga (1987)], later developed into CommonKADS [Schreiber (2000)], identified three distinct knowledge layers:

- Domain: the capacity of the system to capture the facts relevant to the problem.
- Inference: the ability of the system to reason upon the domain.
- Task: its ability to solve problems and make decisions, for instance by planning or searching.

In this context, the notion of Problem Solving Method (PSM) was introduced. Its challenge was to identify the generic inference actions and knowledge roles from which domain-independent expert systems could be developed. Various PSMs can be found in the literature: Propose-and-Revise targeted problems of parametric design, Cover-and-Differentiate targeted diagnostic tasks, and so forth [McDermott (1988)].
PSMs are based on the strong interaction problem hypothesis, which states that the properties of domain knowledge are entirely determined by its use [Bylander and Chandrasekaran (1987)]. PSMs were developed as abstractions to be reused across different domains that are equivalent in terms of task and goal. However, the actual models resulting from their instantiation could not be transferred to another system working in the same domain but towards a different type of goal. With the emergence of the World Wide Web (WWW), the need for transferable models became more evident. For the first time, knowledge acquisition could be performed at a previously impossible scale, and elements such as interdependencies between systems and procedures became constituents of the Web of Data. In other words, a new requirement for interoperability emerged. As a result, the KE research community, and particularly the research focusing on Knowledge Acquisition, centred its efforts on the development of Ontologies as domain-relevant but transferable knowledge models. The term Ontology was borrowed from Philosophy, where it denotes the study of being and essence. Gruber [Gruber (1991)] defines an Ontology as a common model with two properties: it is an accurate and shared representation of a domain, and it is reusable in various applications (also compare [Gómez-Pérez and Benjamins (1999)]). Researchers in KE largely espoused the vision of the Semantic Web [Berners-Lee et al. (1998)], where the WWW as a whole is conceived as a large, distributed knowledge system [Schreiber (2013)]. The WWW provided the infrastructure for building large-scale distributed systems. KE research, in turn, contributed the background in knowledge modelling needed to develop representation languages such as the Web Ontology Language (OWL) and the Simple Knowledge Organization System (SKOS) [Schreiber (2013)]. Research in the Semantic Web faced most of the crucial problems of KE, in particular Knowledge Acquisition. It made evident that knowledge models were necessary for designing and operationalising systems, and also that they had inherent value as a means to access, browse and query information at scale [Aussenac-Gilles and Gandon (2013)]. Ontology Engineering in the Semantic Web inherited the research tradition of KE and developed sub-areas such as Ontology Evolution and Ontology Alignment. In particular, it developed the notion of reusable components, for instance Ontology Design Patterns (ODP) [Gangemi (2005)]. The associated Extreme Design (XD) methodology [Blomqvist et al. (2010); Daga et al. (2010)] supports the collaborative development of ontologies based on knowledge reuse (also compare [Presutti et al. (2012); Suárez-Figueroa et al. (2012)]). In the present work we make extensive use of Semantic Web technologies and specifications; therefore, Section 2.4 is dedicated to the basics of RDF, OWL, Linked Data and their associated inference capabilities.

An important consequence of the evolution of the WWW is the many resources it made available as background information for knowledge acquisition. Early Knowledge Acquisition (KA) focused primarily on eliciting information from human experts, with solutions similar to those used for requirements collection in Software Engineering, for example Competency Questions (CQ) [Grüninger and Fox (1995); Uschold and Gruninger (1996)]. Recent approaches, by contrast, focus on how non-ontological resources can be exploited to support ontology design [Villazón-Terrazas (2012)]. The focus on abstraction and reuse did not prevent the development of bottom-up approaches to model construction from data [Van Der Vet and Mars (1998)]. Nor did it prevent semi-automatic and supervised techniques that generate ontologies from resources such as documents, media objects, folksonomies, Web sites or microblogs (for example compare [Alani et al. (2003); Biemann (2005); Brewster et al. (2002); Fortuna et al. (2006); Maedche and Staab (2000); Navigli et al. (2011)]). This rather difficult task requires addressing several problems, including terminology extraction [Pazienza et al. (2005)], concept extraction, named entity recognition and relation extraction, and it draws on techniques of Knowledge Discovery (KD) [Jicheng et al. (1999); Mladenić et al. (2009)]. In the present work, we employ several techniques from the literature to support the activity of model construction. In the next section, we introduce

