Chapter 2 Literature Review
2.2 Knowledge Acquisition
2.2.2 Knowledge Acquisition Methods
2.2.2.1 Classification Rules
As the first successful expert system, DENDRAL provided the standard template for expert systems until the 1980s. DENDRAL’s approach to knowledge acquisition involved a programmer, later known as a knowledge engineer, interviewing experts and encoding their expertise as rules (B. G. Buchanan & Sutherland, 1968). These rules were described as heuristic rules by the developers to indicate that they are not absolute laws or complete definitions: rather they are guidelines or suggestions that, through inferencing, typically lead to correct results (B. G. Buchanan & Sutherland, 1968).
These heuristic rules are an example of what are more commonly known as classification rules, and they are one of the first, and in many ways the simplest, of the data modelling and knowledge acquisition techniques. The term classification refers to the grouping of data cases by some measure: all the cases in a group are said to have that classification, or belong to that class. The term classification is used in many of the methods described here in exactly the same way; it is a common way of using knowledge about a case to add extra information, which may be used to derive new information or to make deriving further information more efficient. A classification rule is a rule which defines which cases should belong to a given class, and why (Witten & Frank, 2005).
A classification rule consists of one or more conditions and a classification, or conclusion. The classification is typically a name or number used to denote a given group of cases; in the DENDRAL system, for example, each classification is an interpretation, or label, of what the case represents. Each condition in the rule is a statement describing a type of case, usually in the form [attribute name] [operator] [value], for example: Author (attribute) is (operator) "C. Dickens" (value).
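The structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the operator table, and the label "Victorian fiction" are assumptions introduced here and do not come from DENDRAL or any cited system.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical operator table: maps the operator text used in a condition
# to a Python comparison function.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "is": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


@dataclass(frozen=True)
class Condition:
    """One [attribute name] [operator] [value] statement."""
    attribute: str
    op: str
    value: Any

    def matches(self, case: Dict[str, Any]) -> bool:
        # A case that lacks the attribute cannot satisfy the condition.
        if self.attribute not in case:
            return False
        return OPERATORS[self.op](case[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    """One or more conditions plus a classification (conclusion)."""
    conditions: Tuple[Condition, ...]
    classification: str

    def matches(self, case: Dict[str, Any]) -> bool:
        # Every condition must hold for the rule to apply to the case.
        return all(c.matches(case) for c in self.conditions)


# The example condition from the text; the classification label is made up.
dickens_rule = Rule(
    conditions=(Condition("Author", "is", "C. Dickens"),),
    classification="Victorian fiction",
)
```

A case is represented here as a plain dictionary of attribute values, so `dickens_rule.matches({"Author": "C. Dickens"})` holds while a case with a different or missing author does not match.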
It should be noted that many other terms are used to describe different implementations of rules, for example production rules or inference rules; in practice, however, the rules take the same form. The differentiation comes in their application: an inference rule is one which is used to derive new information, which can then cause another rule to activate, and so on until no more rules activate, at which point some of the additional information provided by the rules is presented as the classification (Witten & Frank, 2005). Although the term classification rule is used here, it is intended to cover each of these differently named rules, all of which follow a very similar pattern.
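The chaining behaviour of inference rules described above can be illustrated with a minimal sketch. The function name, the representation of a rule as a (premises, conclusion) pair, and the single-letter facts are all assumptions chosen for brevity; real systems use richer conditions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: a rule fires when all of its premises are
# known facts, and firing adds its conclusion as a new fact.
InferenceRule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: List[InferenceRule]) -> Set[str]:
    """Apply rules repeatedly until no rule can add a new fact."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule activates only if it contributes something new.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Placeholder rules: "a" and "b" together give "c", and "c" gives "d".
example_rules: List[InferenceRule] = [
    (frozenset({"c"}), "d"),
    (frozenset({"a", "b"}), "c"),
]
example_result = forward_chain({"a", "b"}, example_rules)
```

The second rule activates first and adds "c", which in turn lets the first rule activate on the next pass; the loop stops once a full pass adds nothing.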
The most common source of classification rules is human experts. Rules are usually easy to create and understand, often being close to literal natural language statements; as such they were very popular early in knowledge acquisition research and are still commonly used today (Davis, et al., 1977; Stansfield, 2009). The expert examines the dataset and creates rules which classify cases based on the values of the set attributes. For example, all cases with a sufficient value for attribute A, and where attribute B is negative, should have conclusion 1 (if A>30 AND B<0 then 1).
One of the major advantages of classification rule systems is that the structure of the knowledge learned is readable by the expert: if the expert wants to know why a classification was made, they can simply examine the conditions of the rule that "fired" (the rule that provided the final classification) (Clancey, 1984). This is a critical component in creating usable, verifiable expert systems (B. G. Buchanan, 1986; Davis, et al., 1977). It is generally easy to view the compiled knowledge and see what conclusions are being made, and on which knowledge they are based, providing a simple means to review effectiveness and progress.
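As a rough illustration of this readability, the sketch below classifies a case and reports which rule fired and why. It reuses the example rule "if A>30 AND B<0 then 1"; the second rule, the rule names, and the first-match strategy are assumptions added only to make the example complete.

```python
import operator

# Comparison functions for the operators used in the conditions below.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

RULES = [
    # The example rule from the text: if A>30 AND B<0 then 1.
    {"name": "rule_1", "conditions": [("A", ">", 30), ("B", "<", 0)], "conclusion": 1},
    # A hypothetical second rule, included only for contrast.
    {"name": "rule_2", "conditions": [("A", "<", 10)], "conclusion": 2},
]


def classify_with_explanation(case, rules):
    """Return (conclusion, rule name, reasons) for the first rule that fires."""
    for rule in rules:
        satisfied = all(
            attr in case and OPS[op](case[attr], value)
            for attr, op, value in rule["conditions"]
        )
        if satisfied:
            # The explanation is simply the fired rule's own conditions,
            # shown alongside the values found in the case.
            reasons = [
                f"{attr} {op} {value} (case has {attr}={case[attr]})"
                for attr, op, value in rule["conditions"]
            ]
            return rule["conclusion"], rule["name"], reasons
    return None, None, []


conclusion, fired, reasons = classify_with_explanation({"A": 42, "B": -3}, RULES)
```

Because the explanation is built directly from the fired rule's conditions, no separate justification mechanism is needed in this simple setting.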
Another major advantage of using these rules is that they represent discrete pieces of knowledge. New knowledge can therefore be added simply by adding more rules, rather than by rewriting significant portions of code, and existing rules can be modified individually (B. G. Buchanan & Sutherland, 1968; Davis, et al., 1977).
A criticism of rule-based systems was that they lacked flexibility in their conclusions and could not be applied to many real-world problems, especially reasoning problems, because facts and data are rarely certain (Zadeh, 1979). This led to further development by adding Bayesian probabilities (discussed in section 2.3.3) and fuzzy logic (in which a classification is assigned a confidence between 0 and 1, rather than simply being present or not present). Each method gave an alternative means of adding likelihoods or certainty factors to knowledge and to classifications, greatly improving the applicability and usefulness of results; however, this came at the cost of more complex inferencing and knowledge acquisition (Duda, Hart, & Nilsson, 1976; Zadeh, 1979).
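To make the idea of a graded classification concrete, the following sketch gives a fuzzy version of the earlier example rule (A large AND B negative). The membership functions, their breakpoints, and the function names are assumptions made for illustration; the conjunction is taken as the minimum of the two degrees, which is one common convention for a fuzzy AND and not necessarily the one used in the cited systems.

```python
def ramp_up(x: float, low: float, high: float) -> float:
    """Degree of membership: 0 at or below low, 1 at or above high, linear between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_and(*degrees: float) -> float:
    """Minimum of the degrees, a common choice for fuzzy conjunction."""
    return min(degrees)


def rule_confidence(case: dict) -> float:
    """Confidence in the conclusion for 'A is large AND B is negative'."""
    # Assumed breakpoints: A starts to count as large at 20, fully at 40.
    a_is_large = ramp_up(case["A"], 20.0, 40.0)
    # Assumed breakpoints: B is fully negative at -10 or below.
    b_is_negative = ramp_up(-case["B"], 0.0, 10.0)
    return fuzzy_and(a_is_large, b_is_negative)


partial = rule_confidence({"A": 30, "B": -10})
```

Instead of the conclusion being simply present or absent, a borderline case such as A=30 receives an intermediate confidence, while a case with a positive B receives 0.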
As work progressed on implementing more complex rule-based systems, and as existing systems were extended, it became apparent that thousands of rules might be needed to solve complex tasks (Walser & McCormick, 1977). Because each rule condition and classification was typically reused many times, strategies such as the inference net were developed to reduce the storage requirements of the rules (Duda, et al., 1976). These structures made it possible to process and use very large numbers of rules, but did not solve other aspects of the problem. The acquisition of these rules from human experts was very time-consuming and risky, as the translation between expert, knowledge engineer, and rule was not exact. Anything that was missed or entered incorrectly added to the increasingly difficult maintenance of a system in which it became very hard to predict the effect of adding a new rule or changing an existing one. It also became apparent that no matter how much effort was made to be thorough in the knowledge acquisition, the knowledge bases were never complete: due to changing knowledge, new discoveries, and fallible human memory, there would always be new rules to add (Davis, et al., 1977; Duda & Shortliffe, 1983). The maintenance of such a system therefore became an ongoing and very difficult task: each new rule needed to be extensively checked and modified by a knowledge engineer to ensure that it would not change the result of any other rule. This process was facilitated by the cornerstone case system, whereby any case that caused the inclusion of a new rule would be stored; whenever a new rule was to be tested, it had to be compared to every cornerstone case to ensure that it did not match and change the results for the previously reviewed cornerstone cases, an exhausting process (Compton & Jansen, 1989). As a result of these acquisition and maintenance difficulties, the knowledge acquisition component became accepted as the bottleneck in expert systems development (B. G. Buchanan, et al., 1983; Lenat, et al., 1985; Walser & McCormick, 1977), and alternatives to rule-based system knowledge acquisition began to be explored.
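The cornerstone case check described above can be sketched as follows. This is a simplified illustration, not the procedure of any cited system: it assumes each cornerstone case is stored together with its accepted classification, and that a new rule would override the existing result for any case it matches. The function name and the sample cases are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Case = Dict[str, Any]
Cornerstone = Tuple[Case, Any]  # (stored case, accepted classification)


def conflicting_cornerstones(
    rule_matches: Callable[[Case], bool],
    rule_conclusion: Any,
    cornerstones: List[Cornerstone],
) -> List[Cornerstone]:
    """Return the cornerstone cases whose accepted result the new rule would change."""
    return [
        (case, accepted)
        for case, accepted in cornerstones
        # A conflict arises when the new rule matches a stored case but
        # would give it a different conclusion from the accepted one.
        if rule_matches(case) and accepted != rule_conclusion
    ]


# Hypothetical cornerstone cases with their accepted classifications.
stored = [
    ({"A": 42, "B": -3}, 1),
    ({"A": 35, "B": 4}, 2),
    ({"A": 5, "B": -1}, 2),
]

# A candidate rule that is too general: "if A>30 then 1".
conflicts = conflicting_cornerstones(lambda c: c["A"] > 30, 1, stored)
```

Every stored case must be examined for each candidate rule, so the cost of checking grows with the number of cornerstone cases, and each conflict found still has to be resolved by refining the rule by hand.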