Knowledge Engineering - Related Research Efforts

Related Research Efforts

2.2 Knowledge Engineering

Many scientific disciplines, including Cognitive Sciences (CS)[122] and Artificial Intelligence (AI), have been concerned with defining the notion of knowledge. However, there is no single agreed definition of knowledge today. A number of definitions have been formulated, such as:

• Knowledge is understanding of a subject area[37]. It includes concepts and facts about that subject area, as well as relations among them and mechanisms for how to combine them to solve problems in that area;

• Knowledge is a fluid mix of data, experience, practice, values, beliefs, standards, context, and expert insight that provides a conceptual arrangement for evaluating and incorporating new data, information and experiences[29];

• Knowledge is a richer, more structured, and more contextual form of information. It is required to perform complex tasks such as problem-solving, and encompasses such things as experience and expertise[74].

Instead of defining the term knowledge precisely, some researchers [58, 5] focus on knowledge cues. A knowledge cue can be considered as any kind of symbol, pattern or artifact that evokes some knowledge in a person’s mind when viewed or used.

Knowledge cues can be stored on a computer, whereas knowledge itself may not.

Knowledge engineering [5, 37, 46] is a field within AI that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise[42]. Currently, it refers to the building, maintenance, and development of knowledge-based intelligent systems[73]. The central component of any knowledge-based intelligent system is its knowledge base. In order to develop a practical knowledge base, it is necessary to acquire human knowledge (e.g., from human experts or other sources), to understand it properly, to transform it into a form suitable for applying various knowledge representation formalisms, and to encode it in the knowledge base using appropriate representation techniques, languages, and tools.

This process is also known as knowledge acquisition.

It has been frequently stated that the problem of knowledge acquisition is ‘the critical bottleneck’ of knowledge-based system development [5]. There are many knowledge acquisition (KA) techniques, which can be classified as manual or (semi)automated.

Usually, expert knowledge is acquired through common social science methods such as interviews, questionnaires, and discourse analysis. However, in many cases when a system requires a large knowledge base that must be constantly augmented with new knowledge, the manual techniques are not applicable. Therefore, the trend in knowledge acquisition has turned towards the use of (semi)automated knowledge acquisition techniques based on machine learning and qualitative modeling. Recently, Web 2.0 and social network services (e.g., Facebook, MySpace and LinkedIn) have opened the way for the acquisition of the so-called ‘collective knowledge’ through the collaborative social tagging of web resources[56].

Once a knowledge base is populated, knowledge can be utilized. Knowledge retrieval, the inverse process of knowledge acquisition, means finding knowledge when it is needed. Retrieved knowledge can serve both humans and intelligent software systems.

Later one can perform reasoning by using knowledge and problem solving strategies to obtain conclusions, inferences and explanations. In the rest of the section I present common knowledge representation techniques and languages.

2.2.1 Knowledge Representation Techniques

Natural languages can express almost everything related to human experience, and hence they are the most powerful knowledge representation technique. However, the use of natural languages for knowledge representation in AI is very restricted, owing to the fact that they are extremely complex for machine processing. Even more important and more difficult is the problem of machine understanding of the meaning of natural languages.

Knowledge representation is the notation or formalism used for encoding knowledge for storage in a knowledge-based system. Different mental representations of the human mind, as proposed by cognitive theories, such as logical propositions, rules, concepts, images and analogies, constitute the basis of different knowledge representation techniques[66]. The field of AI has not produced fully intelligent machines, but one of its major achievements is the development of a range of techniques for representing knowledge, which can be classified into four categories: ladders, semantic networks, tabular representations, and rules.

Ladders are hierarchical (tree-like) diagrams. Important types of ladders are: i) concept ladders, which show classes of concepts and their sub-types and model 'is-a' relationships; ii) composition ladders, which show the way a knowledge object is composed and model 'has-part' or 'part-of' relationships; iii) decision ladders, which show the alternative courses of action for a particular decision; iv) attribute ladders, which show attributes and values; and v) process ladders, which show processes (tasks) and the sub-processes (sub-tasks) of which they are composed.
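As an illustration only, a concept ladder can be sketched as a mapping from each concept to its direct super-concept. The concepts and the names `IS_A`, `ancestors` and `is_a` below are hypothetical and do not come from the cited literature.

```python
# Hypothetical concept ladder: each key is a concept, each value its direct parent ("is-a").
IS_A = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": None,  # root of the ladder
}


def ancestors(concept):
    """Return the chain of super-concepts from the direct parent up to the root."""
    chain = []
    parent = IS_A[concept]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


def is_a(concept, candidate):
    """True if `candidate` is the concept itself or one of its super-concepts."""
    return concept == candidate or candidate in ancestors(concept)
```

A composition ladder could be encoded in the same way, with the mapping read as 'part-of' instead of 'is-a'.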

Semantic networks are graphs made up of objects, concepts, and situations in some specific domain of knowledge (the nodes in the graph), connected by some type of relationship (the links/arcs). All semantic networks can be represented as collections of Object-Attribute-Value (O-A-V) triplets. O-A-V triplets are a technique used to represent facts about objects/concepts and their attributes; the triplet serves as a basic building block of any kind of semantic network. Examples of semantic networks include concept maps, process maps and state transition networks. Designed after the psychological model of human associative memory, concept maps[5] are graphs made up of concepts from specific domain knowledge, connected by some type of relationship. A process map is a way of representing information about how and when processes and tasks are performed. Process maps show the inputs, outputs, resources, roles and decisions associated with each process or task in a domain. The third important type of semantic network is the state transition network. A state transition network comprises two elements: i) nodes that represent the states that a concept can be in, and ii) arrows between the nodes showing all the events and processes/tasks that can cause transitions from one state to another.
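The O-A-V view of a semantic network can be sketched in a few lines. This is a minimal illustration, assuming made-up facts; the names `TRIPLES`, `build_network` and `values_of` are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts stored as (object, attribute, value) triplets.
TRIPLES = [
    ("canary", "is_a", "bird"),
    ("bird", "has_part", "wings"),
    ("canary", "color", "yellow"),
]


def build_network(triples):
    """Index triplets as a labelled directed graph: node -> list of (link, node)."""
    graph = defaultdict(list)
    for obj, attr, value in triples:
        graph[obj].append((attr, value))
    return graph


def values_of(graph, obj, attr):
    """Return every value linked to `obj` by the relationship `attr`."""
    return [value for link, value in graph.get(obj, []) if link == attr]
```

Here the objects and values play the role of nodes and the attributes play the role of labelled arcs, so the same list of triplets can be read either as a fact table or as a graph.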

Tabular representations make use of tables or grids for knowledge representation.

The most common form of this representation technique is the frame. A frame is a structure for representing stereotypical knowledge of some concept or object. Frames are similar to classes and objects in object-oriented programming.

Each frame is easy to visualize using a matrix representation. The left-hand column represents the attributes associated with the concept (class) and the right-hand column represents the appropriate values.
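The analogy with object-oriented programming can be made concrete with a small sketch. The `Frame` class, its slot names and the inheritance of slot values from a parent frame are assumptions made for illustration, not a reconstruction of any particular frame system.

```python
class Frame:
    """A frame: a named set of slots (attributes) with values, plus an optional parent frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then fall back to the parent frame if it is missing."""
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.get(slot)
        raise KeyError(slot)

    def as_table(self):
        """Return the two-column (attribute, value) view of the local slots."""
        return sorted(self.slots.items())


# Hypothetical frames: a stereotype and a more specific frame that overrides one slot.
bird = Frame("bird", legs=2, can_fly=True)
penguin = Frame("penguin", parent=bird, can_fly=False)
```

Calling `bird.as_table()` yields the matrix form described above, with attributes in the left-hand column and values in the right-hand column.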

Rules are a knowledge representation technique and a structure that relates one or more premises (conditions) or situations to one or more conclusions (consequents) or actions. The premises are contained in the IF part of the rule, and the conclusions are contained in the THEN part, so that the conclusions may be inferred from the premises when the premises are true. Some rules may include a certainty factor, a numeric value assigned to both premises and conclusion that represents the degree of belief in them.

The knowledge of a particular knowledge based system may be represented using a number of rules. In such a case, the rules are usually grouped into a hierarchy of rule sets, each set containing rules related to the same topic.
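A rough sketch of IF-THEN rules with certainty factors follows. The rules and facts are invented, and the way certainties are combined (weakest premise multiplied by the rule's factor) is one common convention that is assumed here rather than taken from the text.

```python
# Each hypothetical rule: (set of premises, conclusion, certainty factor of the rule).
RULES = [
    ({"has_fever", "has_cough"}, "has_flu", 0.8),
    ({"has_flu"}, "needs_rest", 0.9),
]


def forward_chain(facts, rules):
    """Fire rules repeatedly until no new conclusion can be added.

    `facts` maps each known fact to a certainty in [0, 1]. A fired rule gives its
    conclusion the weakest premise certainty multiplied by the rule's own factor
    (an assumed combination scheme).
    """
    known = dict(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion, cf in rules:
            if conclusion not in known and all(p in known for p in premises):
                known[conclusion] = min(known[p] for p in premises) * cf
                changed = True
    return known
```

For example, `forward_chain({"has_fever": 1.0, "has_cough": 0.5}, RULES)` first derives `has_flu` and then uses it to derive `needs_rest`, which is how a conclusion of one rule becomes a premise of another.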

2.2.2 Knowledge Representation Languages

The knowledge base contains a set of sentences - the units of the knowledge represented using one or more knowledge representation techniques, i.e., assertions about the world[113]. The sentences are expressed in a knowledge representation language.

Knowledge representation languages should be capable of both syntactic and semantic representation of entities, events, actions, processes, and time. Formal notation for knowledge representation allows inference and problem solving. Moreover, queries can be made to the knowledge base to obtain what the system currently knows about the world. In accordance with the knowledge representation techniques described above, AI researchers have developed a number of knowledge representation languages.

Logic-Based Representation Languages: The popularity of formal logics as the basis of the knowledge representation languages arises for practical reasons. They are all formally well founded and are suitable for machine implementation. Also, every formal logic has a clearly defined syntax that determines how sentences are built in the language, a semantics that determines the meanings of sentences, and an inference procedure that determines the sentences that can be derived from other sentences.

Propositional logic is a form of symbolic reasoning that assigns a symbolic variable to a proposition. A proposition is a logical statement that is either true or false. The truth-value of the variable represents the truth of the corresponding statement (the proposition). Propositions can be linked by logical operators, namely AND (∧), OR (∨), NOT (¬), IMPLIES (⇒), and EQUIVALENCE (⇔), to form more complex statements and rules. Propositional logic allows formal and symbolic reasoning with rules, by deriving truth-values of propositions using logical operators and variables.
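Deriving truth-values with logical operators can be illustrated by brute-force enumeration of truth assignments. This is a minimal sketch; the helper names `implies`, `is_tautology`, `modus_ponens` and `affirming_consequent` are introduced here for illustration.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def is_tautology(formula, n_vars):
    """Check a formula (a function of n_vars booleans) under every truth assignment."""
    return all(formula(*values) for values in product([False, True], repeat=n_vars))


def modus_ponens(p, q):
    """((p => q) AND p) => q, a valid rule of inference."""
    return implies(implies(p, q) and p, q)


def affirming_consequent(p, q):
    """((p => q) AND q) => p, a well-known fallacy."""
    return implies(implies(p, q) and q, p)
```

`is_tautology(modus_ponens, 2)` holds, whereas `is_tautology(affirming_consequent, 2)` does not, because the assignment p = false, q = true falsifies it.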

First-order logic extends propositional logic by introducing the universal quantifier ∀ and the existential quantifier ∃. It also uses symbols to represent knowledge and logical operators to construct statements. Its symbols may represent constants, variables, predicates, and functions. Using predicates, functions, and logical operators, it is possible to specify rules. Reasoning with first-order logic is performed using predicates, rules, and general rules of inference to derive conclusions. First-order logic is like an assembly language for knowledge representation[37]. Higher-order logic, modal logic, fuzzy logic, and even neural networks can all be defined in first-order logic.
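Over a finite domain, the two quantifiers can be illustrated with Python's built-in `all` and `any`. The domain and predicates below are hypothetical, and the sketch only evaluates formulas in one given interpretation; it is not a general first-order inference procedure.

```python
# Hypothetical finite domain of constants and the extensions of two predicates.
DOMAIN = ["socrates", "plato", "zeus"]
HUMAN = {"socrates", "plato"}
MORTAL = {"socrates", "plato"}


def human(x):
    """Predicate human(x)."""
    return x in HUMAN


def mortal(x):
    """Predicate mortal(x)."""
    return x in MORTAL


# Universal quantifier: for all x, human(x) => mortal(x)
all_humans_mortal = all((not human(x)) or mortal(x) for x in DOMAIN)

# Existential quantifier: there exists x such that NOT mortal(x)
some_immortal = any(not mortal(x) for x in DOMAIN)
```

Both statements evaluate to true in this interpretation: every human in the domain is mortal, and `zeus` witnesses the existential statement.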

Description logic is based on two components: the TBox and the ABox. Developing a knowledge base using a description logic language means setting up terminology (the vocabulary of the application domain) in a part of the knowledge base called the TBox, and assertions about named individuals (using the vocabulary from the TBox) in a part of the knowledge base called the ABox. The vocabulary consists of concepts and roles. Concepts denote sets of individuals. Roles are binary relationships between individuals.
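The TBox/ABox split can be sketched as two plain data structures. The vocabulary, individuals and function names below are hypothetical, the TBox is assumed to be acyclic, and real description logic reasoners support far richer constructors than the simple concept inclusions shown here.

```python
# TBox (terminology): hypothetical concept inclusions, sub-concept -> direct super-concepts.
TBOX = {
    "Professor": {"Person"},
    "Student": {"Person"},
    "Person": set(),
}

# ABox (assertions): concept and role assertions about named individuals.
CONCEPT_ASSERTIONS = {("alice", "Professor"), ("bob", "Student")}
ROLE_ASSERTIONS = {("alice", "teaches", "bob")}


def subsumers(concept):
    """Return the concept together with every concept that subsumes it in the TBox."""
    result = {concept}
    for parent in TBOX.get(concept, set()):
        result |= subsumers(parent)
    return result


def instances_of(concept):
    """Individuals asserted to belong to `concept` or to any of its sub-concepts."""
    return {ind for ind, asserted in CONCEPT_ASSERTIONS if concept in subsumers(asserted)}


def role_fillers(individual, role):
    """Individuals related to `individual` through the binary relationship `role`."""
    return {obj for subj, r, obj in ROLE_ASSERTIONS if subj == individual and r == role}
```

Querying `instances_of("Person")` returns both individuals even though neither is asserted to be a `Person` directly, which shows how the terminology in the TBox is used to interpret the assertions in the ABox.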

Frame-Based Representation Languages: In all frame-based representation languages, the central principle is a notation based on the specification of frames (concepts and classes), their instances (objects and individuals), their properties, and their relationships to each other [134]. Frame-based languages are suitable for expressing generalization/specialization, i.e., organizing concepts into hierarchies. They also enable reasoning, by making it possible to state in a formal way that the existence of some piece of knowledge implies the existence of some other, previously unknown piece of knowledge. With frame-based languages, it is possible to make classifications, that is, concepts are defined in an abstract way and objects can be tested to see whether they fit such abstract descriptions.

Rule-Based Representation Languages: Rule-based representation languages are popular in commercial AI applications, such as expert systems [37]. Every rule-based language has an appropriate syntax for representing the If-Then structure of rules. Vianu [129] notes that there are two broad categories of rule-based languages: declarative languages, which attempt to provide declarative semantics for programs, and production system languages, which provide procedural semantics based on forward chaining of rules. The rule-based representation formalism is recognized as an important topic not only in AI, but also in many other branches of computing. This is especially true for Web engineering. Rules are one of the core design issues for future Web development, and are considered central to the task of document generation from a central XML repository. In response to such practical demands from the world of the Web, the Rule Markup Initiative (RMI) has taken steps towards defining RuleML, a shared Rule Markup Language[111]. RuleML enables the encoding of various kinds of rules in XML for deduction, rewriting, and further inferential-transformational tasks. The Rule Markup Initiative now covers a number of new developments, including Java-based rule engines, an RDF-only version of RuleML, and MOF-RuleML.

In document Semantic document architecture for desktop data integration and management (Page 38-42)