
2.1 Philosophy of Knowledge applied to Artificial Intelligence

2.1.4 Knowledge Based Systems

Numerous methodologies and tools have been developed to aid in the creation of KBSs, each attempting to address the KA problem discussed above. Many books, papers and reviews (such as Gennari et al. 2002; Leake 1996; Lenat and Guha 1990; Schreiber 1993; Schreiber et al. 1993) investigate the associated issues in detail. This section therefore only briefly introduces the most widely used or relevant approaches, allowing readers to place the work in this thesis within the overall field of KBS research. The approaches discussed are Knowledge Acquisition and Design Structuring (KADS), Protégé, Cyc, Case-Based Reasoning (CBR) and Data Mining.

2.1.4.1 Knowledge Acquisition and Design Structuring (KADS)

KADS (Knowledge Acquisition and Design Structuring) is the outcome of the European research project ESPRIT-I P1098, initiated in 1983 with the aim of developing a comprehensive, commercially viable methodology for knowledge-based systems (Wielinga et al. 1992). KADS is recognised as one of the first true methodologies developed specifically for building KBSs and is still widely used today. It was developed in particular to address the knowledge acquisition bottleneck. The researchers’ view was that knowledge acquisition failed in traditional systems because of an inability to reach the deep knowledge of the experts, primarily due to a lack of structural constraints on the experts during knowledge extraction.

Rather than viewing KA as filling a container with knowledge, KADS views it as the construction of an operational model that displays a form of observed behaviour, which is “…specified in terms of real world phenomena” (Wielinga et al. 1992, p 6). KADS provides an array of modelling techniques with which the expert and engineer work together to build up a set of tasks, so that knowledge acquisition can be approached systematically. This ensures, as far as possible, that nothing is left out and that the people concerned know how far they have progressed in the process (Compton et al. 1993). This divide-and-conquer approach to KA underpins all task-orientated methodologies.

Approaches such as KADS work reasonably effectively during development because the expert personally models each individual task. This forces the expert to step out of their current context and into the global knowledge domain. Because the expert now provides more globalised knowledge, the engineer is not required to convert the knowledge extensively. However, the resulting knowledge base is still global in context and static in representation; Wielinga et al. (1992) even define domain knowledge in KADS as static knowledge.

This static representation, however, leads to one of KADS’ greatest shortcomings: knowledge maintenance. Nowhere does the KADS methodology address knowledge maintenance directly; instead, it assumes that the spiral life cycle model will continue ad infinitum. This omission has led to alterations of the basic methodology, such as structure-preserving design (Schreiber 1993), which preserves the information content and structure of the knowledge-level model within the final artefact. The system therefore provides not only the static domain knowledge but also the relationships between the artefact and the original knowledge sources and/or meta-classes. Development thus becomes a process of adding implementation detail to a knowledge-level model, which makes it easier to trace omissions or inconsistencies in an artefact back to the relevant part of the model, considerably simplifying maintenance (Killin 1993; Schreiber 1993). Essentially, although only in a limited form, this can be seen as an attempt to include some form of context within the symbols.

2.1.4.2 Protégé

Protégé, a generalisation of the OPAL and ONCOCIN systems, is a knowledge-based systems development environment that has been evolving since the mid-1980s (Gennari et al. 2002; Musen 1987). Initially, it was a simple program designed for a single medical task, protocol-based therapy planning, but it has since been reimplemented many times, becoming a much more general-purpose set of tools. The original goal of Protégé, like that of most new methodologies at the time, was to reduce the knowledge acquisition bottleneck. This was accomplished by reducing the role of the KE in the construction of KBs.

Musen’s (1987) information-partitioning hypothesis, which asserts that there is a qualitative division among the types of information a KBS requires, is the primary basis of the Protégé methodology. The hypothesis proposes that the knowledge acquired in one stage also serves as meta-knowledge for the subsequent stage, and can thus be used to determine which KA tools should be used in that next stage (Grosso et al. 1999). It is Protégé’s use of this hypothesis, together with its use of task-specific knowledge to generate customised KA tools, that simplifies KA (Gennari et al. 2002; Grosso et al. 1999).

Later incarnations of Protégé worked towards making knowledge bases more reusable. For instance:

• Protégé-II removed knowledge concerning the problem-solving method (PSM) from the KB, by formally modelling the PSMs and then using the method ontologies to define mappings. This converted Protégé’s original informal model to a formal one (Grosso et al. 1999; Puerta et al. 1992).

• Protégé/Win allowed for modularity of knowledge bases through the use of components (Gennari et al. 2002; Grosso et al. 1999).

• Protégé-2000 adopted the Open Knowledge Base Connectivity (OKBC) knowledge protocol (Chaudhri et al. 1998; Fikes and Farquhar 1997), allowing greater expressivity, a clean model-theoretic semantics and greater potential for maintenance and reuse (Gennari et al. 2002; Grosso et al. 1999). It was also rebuilt around a three-layer model with fully replaceable and interchangeable components (Grosso et al. 1999).

Protégé’s underlying methodology of building separate ontologies and then constructing knowledge bases from these components is not unique; it can also be seen, for example, in LOOM (MacGregor 1991) and GKB (Karp et al. 1999). Like KADS, Protégé’s knowledge-level modelling of framework ontologies can be effective and can aid knowledge reuse. These ontologies, however, are constructed prior to the knowledge base, and the knowledge within them is therefore still global to the component. In theory, individual ontologies could be created in context; however, this usually conflicts with the components’ potential for reuse.

2.1.4.3 Cyc

Cyc, short for encyclopaedia, did not start out as an attempt to develop new methodologies for knowledge acquisition. Rather, its primary intention was to solve the problem of brittleness, where a small amount of missing knowledge can be significantly detrimental to the system. As a result, however, it has had implications for both KA and KR methods. A KB’s brittleness comes from its concentration of specialised knowledge within a single narrow domain; when knowledge is required from just beyond this domain, the KBS collapses. Fundamentally, this is part of the same problem the previous methodologies were attempting to fix. Their unarticulated view had been that if deeper knowledge could be extracted, the system would be less brittle. Lenat et al. (1990), however, argue that brittleness is the result of insufficient commonsense knowledge within the KB, and it is this lack of commonsense knowledge that Cyc was developed to address. The Cyc system is a universal schema with millions of directly entered and inferred commonsense axioms that make up hundreds of thousands of general concepts (Lenat 1995).

While methodological development was not the driving force behind Cyc, it was one of its results. During the system’s development it became obvious that existing methodologies were woefully inadequate, both at scaling to the required size and at representing particular concepts. Cyc therefore incrementally developed its own representation language to address the repetition issues that arose, periodically smoothing out the resulting structure (Lenat et al. 1990). It uses a frame-based language embedded in a first-order predicate calculus framework, with a series of second-order extensions that allow for the representation of defaults, reification and reflection (Guha and Lenat 1994; Lenat 1995; Lenat et al. 1990; Pittman and Lenat 1993). The inference engine used for Cyc was also incrementally constructed, using more traditional computer science data structures and algorithms.

Lenat (1995) argues that most assertions could not be made correctly without some form of context. For instance, the statement ‘you cannot see a person’s heart’ assumes that the person is not currently undergoing open-heart surgery; alternatively, the assertion could carry a metaphorical meaning. Cyc’s solution was to place each assertion into one or more explicit contexts through the use of microtheories. Within each context, the assertion is given its default conclusion, and each context is itself an individual KB. Cyc also provides for importing assertions from other contexts, allowing contexts to be combined and contradictory assertions to be resolved (Lenat 1995; Lenat et al. 1990).
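The microtheory mechanism described above can be illustrated with a deliberately minimal sketch. It assumes that each context is simply a mapping from propositions to truth values, that a context may import other contexts, and that a context’s own assertions override imported ones as a simple conflict-resolution policy. The class, method and proposition names are hypothetical; this is not CycL, nor Cyc’s actual inference machinery.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Microtheory:
    """Hypothetical, much-simplified stand-in for a Cyc context."""

    name: str
    assertions: dict[str, bool] = field(default_factory=dict)
    imports: list[Microtheory] = field(default_factory=list)

    def assert_fact(self, proposition: str, value: bool) -> None:
        self.assertions[proposition] = value

    def lookup(self, proposition: str) -> Optional[bool]:
        # A context's own assertion takes precedence, so it can override
        # a default conclusion inherited from an imported context.
        if proposition in self.assertions:
            return self.assertions[proposition]
        # Otherwise consult imported contexts in order; the first one with
        # an answer wins. Assumes the import graph contains no cycles.
        for imported in self.imports:
            value = imported.lookup(proposition)
            if value is not None:
                return value
        return None  # the proposition is unknown in this context


# Default conclusions in an everyday context.
everyday = Microtheory("EverydayLife")
everyday.assert_fact("can_see_persons_heart", False)
everyday.assert_fact("heart_pumps_blood", True)

# A surgical context imports everyday knowledge but overrides one assertion.
surgery = Microtheory("OpenHeartSurgery", imports=[everyday])
surgery.assert_fact("can_see_persons_heart", True)

print(everyday.lookup("can_see_persons_heart"))  # False
print(surgery.lookup("can_see_persons_heart"))   # True
print(surgery.lookup("heart_pumps_blood"))       # True (imported)
```

The same proposition thus receives different conclusions in different contexts, while knowledge that is not contradicted is shared through imports rather than restated.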

The Cyc project’s brute-force approach shows potential as a foundation on which domain-specific KBs could be built. However, Cyc’s ad hoc development methodology requires explicitly stated knowledge, which can often be outdated or invalidated by the time it is eventually used in a real-world system. This thesis asserts that commonsense knowledge is one of the most a posteriori, contextually dependent forms of knowledge, due to its high dependence on culture and time. For instance, individual axioms not only change their conclusions between contexts, but can also change within the same context between different, culturally distinct minority groups. Furthermore, the contexts themselves within these groups also change over time. One of Lenat et al.’s (1990) own examples of knowledge in the Cyc system is:

Payments of less than ten dollars are usually made with cash; those over fifty dollars are usually made via check or credit card (Lenat et al. 1990, p 43).

Such an assertion is highly culture-, location- and time-specific and is therefore very susceptible to failure in a contextually dynamic environment. Given such knowledge, one must ask ‘how relevant is this?’ with respect to the following: a peasant farmer in central China who has no concept of what a dollar is worth; people in a war zone who have no access to secure financial institutions; or someone living in a 2010 in which all transactions are made with smart cards. Therefore, even though various static contexts are identified in Cyc, the absence of dynamic contexts renders its approach far too simplistic, and too susceptible to obsolescence, for the development of a (near-)complete KB of commonsense knowledge. Clancey (1991), for instance, likens commonsense knowledge to chaos theory, with projects such as Cyc attempting to collect it like “…so many butterflies” (Clancey 1991, p 245).

2.1.4.4 Case-Based Reasoning (CBR)

Case-based reasoning (CBR) is not a single methodology but a field of research as diverse as KBSs. It originated as an attempt to solve the KA bottleneck by capturing the context of knowledge through the use of cases. The idea was to represent a concept through extension; that is, a concept is defined by a set of instances (cases). The basic idea behind CBR is twofold. Firstly, a CBR system attempts to solve a particular problem, in the form of a case, by searching for a similar, previously seen case (or cases) and reusing that earlier case’s solution, or a modified version of it. Secondly, the CBR system attempts to learn incrementally from the success or failure of a solution (Aamodt and Plaza 1994). For instance, if a solution is correct, the new case is added with the solution given so that it can be used in future similar situations. If the solution is wrong, however, the reason for the failure is ascertained, as far as possible, and stored to avoid the error in the future.

Most research in CBR addresses the problems of:

• Knowledge representation – must allow for effective and efficient searching and the inclusion of new case knowledge.

• Retrieval – using a partial problem description the CBR system must find the closest matching previous case(s) using an efficient method for case comparison.

• Reuse – investigates which aspects of a case are useful for future problem solving, as well as identifying the differences between the current and previous cases.

• Revision/Adaptation – if the solution was wrong then use domain-specific knowledge or user input to revise the solution.

• Retainment – if the solution was correct, use an aspect of the case to expand the system’s solution space by integrating the new case into the memory structure (Aamodt and Plaza 1994).
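The retrieve, reuse, revise and retain steps listed above can be sketched as a minimal loop. The sketch assumes numeric feature values, Euclidean nearest-neighbour retrieval, reuse without any adaptation, and an expert modelled as a callback that either accepts the proposed solution or supplies a corrected one. All names and data are hypothetical, and, unlike the description above, it stores only the corrected solution rather than the reason for the failure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Case:
    features: dict[str, float]
    solution: str


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    # Euclidean distance over the features both descriptions share, so a
    # partial problem description can still be compared with full cases.
    shared = a.keys() & b.keys()
    return sum((a[k] - b[k]) ** 2 for k in shared) ** 0.5


class CaseBase:
    def __init__(self, cases: list[Case]) -> None:
        self.cases = list(cases)

    def retrieve(self, problem: dict[str, float]) -> Case:
        # Retrieve: the single nearest previous case.
        return min(self.cases, key=lambda c: distance(c.features, problem))

    def solve(
        self,
        problem: dict[str, float],
        check: Callable[[dict[str, float], str], Optional[str]],
    ) -> str:
        # Reuse: copy the retrieved solution unchanged (no adaptation step).
        proposed = self.retrieve(problem).solution
        # Revise: `check` stands in for an expert; it returns None to accept
        # the proposal, or a corrected solution to replace it.
        correction = check(problem, proposed)
        final = proposed if correction is None else correction
        # Retain: store the new case with its confirmed or revised solution.
        self.cases.append(Case(dict(problem), final))
        return final


def accept(problem: dict[str, float], proposed: str) -> Optional[str]:
    return None  # the hypothetical expert accepts every proposal


cb = CaseBase([
    Case({"temp": 39.5, "cough": 1.0}, "flu"),
    Case({"temp": 36.8, "cough": 0.0}, "healthy"),
])
first = cb.solve({"temp": 39.0, "cough": 1.0}, accept)               # "flu"
second = cb.solve({"temp": 37.0, "cough": 1.0}, lambda p, s: "cold")  # revised
third = cb.solve({"temp": 37.1, "cough": 1.0}, accept)               # "cold"
```

The third problem is solved correctly only because the revised second case was retained, which is the incremental learning the CBR cycle relies on.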

CBR methods are effective to some extent because experts are significantly more open to discussing the details of a case and its associated solution than abstract general rules (Leake 1996). Their effectiveness also stems from the contextually relevant information provided with each solution, which eases maintenance and significantly reduces the KA bottleneck (Leake 1996). However, the development of effective CBR systems is highly dependent on effective retrieval and adaptation functions (Khan 2003), which generally require domain-dependent knowledge.

Fundamentally, CBR fits within a more situated cognitive view of knowledge (Section 2.2). However, most current research has overlooked the fact that dynamic knowledge relates not only to the domain knowledge held within individual cases but also to the control and problem-solving knowledge of the domain. The problem is that most CBR systems rely on static, predefined rules or procedures for their retrieval and adaptation functions, yet CBR system developers struggle to anticipate, during development, all the difficulties that may be encountered in a domain, causing a major bottleneck (Khan and Hoffmann 2003). The nature of learning in CBR, however, is implicitly incremental (Aamodt and Plaza 1994; Leake 1996), and CBR should thus be capable of handling such difficulties. Khan (2003) proposes that both case-specific knowledge and general domain knowledge should be acquired and used within the context of the problem-solving process, monitored by an expert.

2.1.4.5 Data Mining

The world is becoming more like the infinite library from The Library of Babel (Borges 1956) every day, overflowing with data from which people are unable to extract meaningful information. Knowledge discovery in databases (KDD) is a field of research attempting to solve this dilemma and is seen as the process of extracting “…implicit, previously unknown and potentially useful knowledge from data” (Frawley et al. 1992).

Like CBR, data mining is not a single methodology but a field of research as diverse as KBSs. Nor is it restricted to machine learning techniques: it also crosses into statistical analysis and database systems. One area of research within KDD is concerned with finding meaningful classifications and predictions of values for the tuples contained within a database.

Expert systems in data mining generally use knowledge extraction methods to form a classifier or predictor. These have the advantage of producing high-quality results because they include expert knowledge. The problem, however, is that they do not allow for autonomous knowledge discovery: such systems will only find results that the expert is capable of giving examples of, and will not find unknown patterns. KDD therefore generally relies on other machine learning tools, forgoing the advantage of expert knowledge.

Decision trees are one method that is highly effective at automatically generating rule bases capable of classifying and predicting values from smaller data sets. However, while in theory decision trees could scale effectively to larger databases, given their O(n log n) complexity, they require the training set to be resident in memory (Han and Kamber 2001). More recent systems such as SLIQ and SPRINT have attempted to address this issue; however, they require pre-sorting of the data sets as well as complex and expensive data structures, which reduce their effectiveness with large training sets (Han and Kamber 2001).
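To illustrate how a decision tree yields a rule base, the following is a minimal sketch of ID3-style induction using information gain, one common splitting criterion. The data set and function names are hypothetical, and real systems add pruning, numeric splits and handling of unseen values. Note that every recursive call scans the in-memory `rows` and `labels` lists, which is exactly the memory-residency requirement noted above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    # Expected reduction in entropy from partitioning the rows on `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    # Leaf: all labels agree, or no attributes remain (majority vote).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [lab for _, lab in subset]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return {best: branches}


def to_rules(tree, conditions=()):
    # Each root-to-leaf path becomes one IF ... THEN rule.
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    attr = next(iter(tree))
    rules = []
    for value, subtree in tree[attr].items():
        rules.extend(to_rules(subtree, conditions + ((attr, value),)))
    return rules


def classify(tree, row):
    # Assumes every attribute value in `row` was seen during training.
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["play", "stay", "play", "stay", "play"]

tree = build_tree(rows, labels, ["outlook", "windy"])
for conditions, conclusion in sorted(to_rules(tree)):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conditions), "THEN", conclusion)
# IF windy = no THEN play
# IF windy = yes THEN stay
```

Here `windy` separates the classes perfectly, so it has the highest information gain and the induced rule base consists of two single-condition rules.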

Rule-based approaches are also applied in data mining in combination with other classifiers. In these methods a number of classifiers can be trained or built separately, and their individual predictions then combined to reach the final classification. Alternatively, classifiers can be chained, as in stacking (Wolpert 1992) and cascading (Gama 1998) methods. These methods have multiple layers of classifiers, where the results from the first-layer classifiers are fed as input to the second layer of classifiers (Estruch et al. 2003). It should be noted that the method developed in this thesis is a form of stacked classifier, with the exception that the input to the second layer comes solely from the output of a single first-layer classifier, rather than from multiple classifiers.
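The layered structure of stacking can be illustrated with a small sketch. Here the first-layer classifiers are two hypothetical threshold rules, and the second layer is a lookup table mapping each tuple of first-layer outputs to the most frequent training label for that tuple. This is a simplification: in Wolpert’s formulation the second layer is itself a trained learner, typically fitted to cross-validated first-layer outputs, an issue this sketch sidesteps because its base rules are fixed rather than trained. Passing a single first-layer classifier gives the structural variant described above, although the sketch is not the method developed in this thesis.

```python
from collections import Counter, defaultdict


def fit_stacked(base_classifiers, rows, labels):
    """Build a second layer over the outputs of the first-layer classifiers."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        # The second layer never sees the original attributes, only the
        # first layer's conclusions.
        key = tuple(clf(row) for clf in base_classifiers)
        votes[key][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    fallback = Counter(labels).most_common(1)[0][0]  # for unseen output tuples

    def predict(row):
        key = tuple(clf(row) for clf in base_classifiers)
        return table.get(key, fallback)

    return predict


# Hypothetical first-layer classifiers, each looking at only one attribute.
def by_x(row):
    return "pos" if row["x"] > 5 else "neg"


def by_y(row):
    return "pos" if row["y"] > 5 else "neg"


rows = [{"x": 1, "y": 1}, {"x": 1, "y": 9}, {"x": 9, "y": 1}, {"x": 9, "y": 9}]
labels = ["neg", "pos", "pos", "pos"]

stacked = fit_stacked([by_x, by_y], rows, labels)
print([stacked(r) for r in rows])  # ['neg', 'pos', 'pos', 'pos']

single = fit_stacked([by_x], rows, labels)  # one first-layer classifier
print(single({"x": 9, "y": 1}))  # pos
```

Each first-layer rule misclassifies one training row on its own, while the stacked combination classifies all four correctly. Its learned table, however, is expressed only in terms of the first layer’s outputs, which is the loss of comprehensibility discussed next.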

One of the primary drawbacks of stacking and cascading is that comprehensibility is generally lost, because subsequent layers receive only attributes expressed in terms of the previous layer’s conclusions (Estruch et al. 2003). This is potentially not the case in the method developed in this thesis, as there is direct continuity between layers through the use of only a single classifier in the first layer. However, to keep the scope of this thesis manageable, this is not explored further.