2.4 Multiple Classification Ripple Down Rules
2.4.2 Applications
Since its inception MCRDR has been applied to an interesting variety of domains, including but not limited to pathology, text/web document classification, medication review and help desk information retrieval (Kang, Yoshida et al. 1997; Park, Kim et al. 2004; Bindoff, Tenni et al. 2007).
Some of the many MCRDR applications that have been developed or trialled are better documented than others. A representative sample of the better-documented applications is discussed in a little more detail below.
Figure 2-11 The general MCRDR knowledge acquisition process. [Flowchart steps: Load case; Determine and show classifications; Any wrong or missing? (No / Yes); Wrong: exception rule to incorrect rule; Missing: new rule at root; Get rule conditions from expert; Fetch valid cornerstone cases; Cornerstones remaining? (Yes / No); No: Add rule.]
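The acquisition loop in Figure 2-11 can be sketched in code. The following is a minimal sketch under stated assumptions, not the implementation of any published MCRDR system: a case is modelled as a set of attribute tokens, a rule fires when all of its tokens are present, and only cornerstone cases stored at or below the parent rule are checked. The names `Rule`, `infer`, `classify`, `clashing_cornerstones`, `add_rule` and `simulated_expert`, along with the pathology-flavoured tokens, are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Iterable, List, Optional, Set

Case = FrozenSet[str]  # assumption: a case is a set of attribute tokens


@dataclass
class Rule:
    conditions: Case                  # all tokens must be present for the rule to fire
    conclusion: Optional[str]         # None for the root (no classification)
    cornerstones: List[Case] = field(default_factory=list)
    children: List["Rule"] = field(default_factory=list)


def infer(rule: Rule, case: Case) -> List[Rule]:
    """Return the last satisfied rule on each path below `rule`."""
    fired = [c for c in rule.children if c.conditions <= case]
    if not fired:
        return [rule]
    return [r for child in fired for r in infer(child, case)]


def classify(root: Rule, case: Case) -> Set[str]:
    return {r.conclusion for r in infer(root, case) if r.conclusion is not None}


def clashing_cornerstones(parent: Rule, conditions: Case) -> List[Case]:
    """Cornerstone cases stored at or below `parent` that also satisfy `conditions`."""
    clashes, stack = [], [parent]
    while stack:
        rule = stack.pop()
        clashes.extend(c for c in rule.cornerstones if conditions <= c)
        stack.extend(rule.children)
    return clashes


def add_rule(parent: Rule, case: Case, conclusion: str,
             ask_expert: Callable[[Case, List[Case]], Iterable[str]]) -> Rule:
    """Ask for conditions until no cornerstone case satisfies the new rule."""
    conditions: Case = frozenset()
    clashes: List[Case] = []
    while True:
        chosen = frozenset(ask_expert(case, clashes)) & case
        if chosen <= conditions:
            raise ValueError("no new distinguishing condition supplied")
        conditions |= chosen
        clashes = clashing_cornerstones(parent, conditions)
        if not clashes:
            break
    rule = Rule(conditions, conclusion, cornerstones=[case])
    parent.children.append(rule)
    return rule


def simulated_expert(case: Case, clashes: List[Case]) -> Set[str]:
    """Hypothetical stand-in for the expert: pick one token absent from every clash."""
    for token in sorted(case):
        if all(token not in c for c in clashes):
            return {token}
    return set()


# Invented cases and conclusions, for illustration only.
root = Rule(frozenset(), None)
case1 = frozenset({"t4_low", "tsh_high"})
case2 = frozenset({"t4_low", "on_thyroxine"})
case3 = frozenset({"t4_low", "tsh_high", "weight_gain"})

rule1 = add_rule(root, case1, "hypothyroid", simulated_expert)         # missing: new rule at root
rule2 = add_rule(rule1, case2, "under_replacement", simulated_expert)  # wrong: exception to rule1
rule3 = add_rule(root, case3, "weight_review", simulated_expert)       # missing: new rule at root

print(sorted(classify(root, case3)))  # ['hypothyroid', 'weight_review']
```

When `rule3` is created, the first condition offered (`t4_low`) is also satisfied by both stored cornerstone cases, so the loop asks again and the rule is only added once `weight_gain` separates the new case from them. A fuller treatment would also let the expert accept that a cornerstone case should receive the new classification, which this sketch omits.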
Pathology
As previously discussed, the RDR method was tested on pathology domains via both the GARVAN-ES1 dataset and the PEIRS project. It has also been noted that these domains included situations where multiple classifications were desired, and that these situations were handled by compound classifications. In light of this, after the PEIRS system was taken offline in 1994 it was redeveloped as a commercial MCRDR application, LabWizard, by Pacific Knowledge Solutions. This system is understood to have since grown to over 16000 rules at one site and is reported to have achieved a 99.5% rate of accurate classification on 80% of the entire pathology domain, with the only parts remaining unexplored being those that require the interpretation of complex data such as images (Compton, Peters et al. 2006; PKS 2009). Unfortunately, since this is a commercial application, further implementation details are unavailable, but it is understood to be largely a traditional MCRDR application (Compton, Peters et al. 2006).
Perhaps the spiritual successor to this system is the system developed by this author, in which MCRDR was applied to the domain of medication review. This domain includes a pathology interpretation component, but also draws on medications, medical history and patient biography (Bindoff 2005). That work has been extended in the present work, so more information can be found in the relevant Medication Review chapter.
Help Desk Document Retrieval
MCRDR was applied to indexing of cases in a help desk environment for UNIX users. This system used a string describing the user’s problem as a case, and then applied an MCRDR knowledge base and inference process to find appropriate help documents (Kang, Yoshida et al. 1997).
Document Classification and Web Monitoring
A web monitoring system was developed using MCRDR, with the intention of allowing users to index particular web portals and classify the documents contained within them. A particular challenge of this kind of document classification is that the documents are not static, and may change at any time. MCRDR was found to be of particular value in this environment, since the user could relatively easily re-classify documents as the context shifted (Park, Kim et al. 2004; Park, Kim et al. 2004).
2.4.3 Variations
Much as MCRDR is an extension of the RDR method, several extensions and variants of MCRDR have themselves been developed since its inception. These are outlined below.
Rated/Weighted Multiple Classification Ripple Down Rules
This extension to the MCRDR method was undertaken after it was observed that although an MCRDR system can correctly identify a series of classifications for a particular case, it is not able to model or provide any knowledge about the relationships between those classifications. Dazeley and Kang suggest that because the expert has given a particular case a particular set of classifications, there must be a relationship between these classifications, even if it has not been expressed (Dazeley and Kang 2003). The example provided is an email classification system, such as the one developed by Deards (Deards 2001). Suppose the user has created rules to classify work and spam emails, with the former requiring immediate attention and the latter being safely ignored. It is conceivable that they might receive a spam email selling something work related. The system will be capable of correctly classifying this case as being both work and spam, but Dazeley argues that it should also be capable of suggesting the middle ground in this situation, perhaps marking the email as worth reading but with a low importance (Dazeley and Kang 2003).
In principle, the approach Dazeley suggests seeks to capture the relationships between these classifications. In practice it produces a value which attempts to model “worth” based on the particular classifications that are provided and/or the rule path that is followed (Dazeley and Kang 2003). To do this, the outputs of the MCRDR inference process and the current knowledge base are fed into an artificial neural network (these are discussed briefly in the Simulation Studies chapter).
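As a rough picture of the idea, the work and spam example can be reduced to a single sigmoid unit that takes one input per classification. This is only an illustrative sketch and not Dazeley and Kang's network: the input encoding, the names `CLASSES`, `encode` and `worth`, and the hand-picked weights are all assumptions, and a real network would learn its weights from rated examples.

```python
import math
from typing import Iterable, List

# Hypothetical: one network input per classification the knowledge base can give.
CLASSES = ["work", "spam"]
# Invented weights; a real network would learn these from rated examples.
WEIGHTS = [2.0, -3.0]
BIAS = 0.0


def encode(classifications: Iterable[str]) -> List[float]:
    """Turn the set of classifications given to a case into a 0/1 input vector."""
    given = set(classifications)
    return [1.0 if c in given else 0.0 for c in CLASSES]


def worth(classifications: Iterable[str]) -> float:
    """Single sigmoid unit mapping the encoded classifications to a value in (0, 1)."""
    inputs = encode(classifications)
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, inputs))
    return 1.0 / (1.0 + math.exp(-z))


for labels in (["work"], ["work", "spam"], ["spam"]):
    print(labels, round(worth(labels), 3))
```

With these weights an email classified as both work and spam is rated between a pure work email and a pure spam email, which is the kind of middle ground described above.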
MCRDR with Formal Concept Analysis
Richards proposed an extension of MCRDR which included the use of Formal Concept Analysis (FCA) (Wille 1982; Richards 2000; Richards 2003). This extension was intended to allow knowledge reuse, by providing a visual insight into the knowledge (Richards 2000). This concept was then extended to allow the transfer of tacit knowledge between knowledge bases, which effectively allowed separate knowledge bases to be combined (Richards 2003).
Automatic Compression of MCRDR
An approach was outlined to automatically compress an MCRDR knowledge base into a more compact knowledge base by removing redundant rules. The authors were ultimately disappointed with the results of this approach when applied to a test domain, “Ultra3700”, achieving a compression rate of only 10% without allowing negative conditions and only 25% when allowing negative conditions (Suryanto, Richards et al. 1999). Since their approach to compressing the knowledge base was fairly comprehensive, this study suggests, as have similar attempts with other exception-based knowledge representation structures, that these knowledge bases are inherently quite compact by design, despite the potential for repetition.
Discovering Ontologies from Knowledge Bases
As a continuation of the FCA work mentioned above, Suryanto noted that many KBSs are not built around well-defined ontologies. An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts, which can be used to reason about the properties of the domain. In particular, there is often little constraint on how the expert expresses their classifications, especially in RDR knowledge bases where the knowledge is input directly by the expert rather than carefully defined by a knowledge engineer (Suryanto and Compton 2000). This work looked to allow the definition of relationships between the various classes, including subsumption, mutual exclusivity and similarity, from which an ontology could be derived (Suryanto and Compton 2000). In essence this approach is the opposite of the conventional approach, which would ask that an ontology be defined first and the KBS then designed to suit it, although it targets only the classifications rather than defining the domain as a whole (Suryanto and Compton 2001).
Bayesian Threshold with MCRDR (BayesTH-MCRDR)
Cho and Richards proposed an extension to the naive Bayesian method (discussed briefly in the Simulation Studies chapter) which included MCRDR. Their essential strategy was to use a machine learning method, naive Bayes with a threshold, to learn a classification task. Where the difference between two or more of the highest probability classes was low, or the highest probability class was itself still very low, the user was asked to intervene. This approach did demonstrate an improvement in accuracy on a web document classification task over the other methods tested (Cho and Richards 2004).
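The decision of when to hand a case to the user can be sketched as a simple test on the class probabilities. This is a minimal sketch of the thresholding idea only, not Cho and Richards's implementation: the function name `needs_expert`, the threshold values `min_top` and `min_margin`, and the example probabilities are assumptions, and the probabilities are assumed to come from an already trained naive Bayes classifier.

```python
from typing import Dict


def needs_expert(posteriors: Dict[str, float],
                 min_top: float = 0.5, min_margin: float = 0.2) -> bool:
    """True when the classifier output is too uncertain to accept automatically.

    `posteriors` maps each class to its probability and must be non-empty.
    `min_top` and `min_margin` are hypothetical threshold values.
    """
    ranked = sorted(posteriors.values(), reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return top < min_top or (top - runner_up) < min_margin


# Invented probabilities for three documents.
confident = {"sport": 0.90, "finance": 0.06, "travel": 0.04}
close_call = {"sport": 0.55, "finance": 0.40, "travel": 0.05}
weak_top = {"sport": 0.40, "finance": 0.30, "travel": 0.15, "health": 0.15}

for name, probs in [("confident", confident), ("close_call", close_call), ("weak_top", weak_top)]:
    print(name, "ask user" if needs_expert(probs) else "accept")
```

Only the first document is accepted automatically. The second is referred because its two best classes are too close, and the third because even its best class is improbable.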
Exposed MCRDR
This variation of the MCRDR approach was trialled as a tool to allow experts in lung function research to discover new knowledge about their domain, and is considered a knowledge discovery method. With this approach the expert is effectively allowed to create and destroy rules at will in order to see how these rules perform. The essential idea is that the expert will trial rules that they suspect, for whatever reason, might successfully classify a particular feature of the data. The system then provides feedback on how much the new rule’s results overlap with those of the existing methods of classifying that feature. Through this process the expert is able to discover new ways of reaching classifications. This is important in a domain of this kind, since every additional test the lung function expert must order costs a considerable amount of money, so being able to reach classifications with fewer tests is highly desirable (Ling 2006).
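The overlap feedback could be computed in several ways, and the source does not say which measure was used. The sketch below assumes a Jaccard overlap between the cases selected by a trial rule and the cases selected by the existing method. The function name `overlap`, the attribute names `cheap` and `costly`, and the data are all invented for illustration.

```python
from typing import Callable, Dict, List

Case = Dict[str, float]


def overlap(cases: List[Case],
            trial_rule: Callable[[Case], bool],
            existing: Callable[[Case], bool]) -> float:
    """Jaccard overlap between cases covered by the trial rule and by the existing method."""
    by_rule = {i for i, c in enumerate(cases) if trial_rule(c)}
    by_existing = {i for i, c in enumerate(cases) if existing(c)}
    union = by_rule | by_existing
    if not union:
        return 1.0  # neither selects anything, so they trivially agree
    return len(by_rule & by_existing) / len(union)


# Invented data: "cheap" is an inexpensive measurement, "costly" an expensive test result.
cases = [
    {"cheap": 0.90, "costly": 0.0},
    {"cheap": 0.60, "costly": 1.0},
    {"cheap": 0.50, "costly": 1.0},
    {"cheap": 0.80, "costly": 1.0},
    {"cheap": 0.95, "costly": 0.0},
]

score = overlap(cases, lambda c: c["cheap"] < 0.7, lambda c: c["costly"] == 1.0)
print(f"overlap with existing method: {score:.2f}")
```

Here the trial rule on the inexpensive measurement selects two of the three cases that the expensive test identifies and no others, so the expert might refine the rule and try again.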
Interactive Recursive RDR
This method was designed for a particular implementation of a high-volume help desk system using MCRDR. It is an amalgam of approaches, including Mulholland’s Recursive RDR (Mulholland, Preston et al. 1993), which is described in more detail in the Multiple Classification Ripple Round Rules chapter, and Interactive RDR, which extends RDR so that it can prompt the user for more information (attributes) as required, giving RDR backward-chaining style features. Because this approach is of particular relevance to the Multiple Classification Ripple Round Rules chapter, it is discussed more thoroughly there.
Resource Allocation RDR
In 1999 Richards and Compton published a document describing an adaptation of the MCRDR approach to a specific resource allocation task, Sisyphus-I (Richards and Compton 1999). The Sisyphus-I problem is a room allocation task whereby employees of a research facility must be allocated to rooms which they will find acceptable. Complications are added by giving various employees preferences about the types of employees they share rooms with or are located near to, as well as requirements on office sizes. The system proposed handled the problem quite elegantly, but was let down in two areas. First, although the task appears complex, there are solutions which require little or no staff shuffling, so a robust approach to recursive solving of configurations was not required. Second, the system proposed was not very generally applicable, as it was designed very much with the Sisyphus-I task in mind (Richards and Compton 1999). However, this work encouraged more thought as to how the RDR approach might be more generally applied to configuration tasks, which contributed strongly to the Repeat Inference MCRDR proposal outlined below.
Repeat Inference MCRDR
Few working systems can really come under the banner of Repeat Inference MCRDR. The term is poorly defined, but appears to have been derived from publications which proposed extensions to RDR in light of experiences with the Ion Chromatography and Sisyphus-I systems described earlier (Mulholland 1995; Compton and Richards 1999; Compton and Richards 2000). The approach proposed is in many ways similar to the Nested RDR approach if it were applied to MCRDR knowledge bases, although there are some key differences. These proposals were then refined in the later proposal for Generalised RDR, which is discussed briefly below, although again a more thorough discussion of these topics can be found in the Multiple Classification Ripple Round Rules chapter.
Generalised RDR
Following the proposals for extensions to RDR in 1999 and 2000, Compton et al. proposed further extensions, which have been dubbed Generalised RDR. This proposal largely connected work to combine knowledge bases and work to add intermediate conclusions to RDR into one unified approach (Compton, Cao et al. 2004; Singh and Compton 2005). This approach is again discussed comprehensively in the Multiple Classification Ripple Round Rules chapter.
2.4.4 Shortcomings
The simple fact that those in the community saw fit to create the variations of MCRDR above makes it clear that the method has some shortcomings. The particular trends seen above include projects that seek to get more information out of the knowledge base, and projects that seek to get more information out of the particular group of classifications.
In fact, the shortcomings of MCRDR are largely the same as those of RDR:
- Potentially repeated knowledge.
  o Except the knowledge is in a different context, so is it really repeated? And the level of repetition has been shown to be generally slight (Kang 1995; Suryanto, Richards et al. 1999).
- It encourages the expert to produce poorly defined domains, particularly with regard to the classifications.
  o Although this seems inevitable in many situations, as one only has to look at work on ontologies to know that it is very difficult to fully define a domain beforehand (Van Heijst, Schreiber et al. 1997).
- It is impossible to truly infer knowledge. That is, one cannot define a rule that fires in the presence (or lack thereof) of another classification or classifications.
  o As seen in the Variations section, a fair body of work exists that deals with this issue, although this is discussed in far greater detail in the Multiple Classification Ripple Round Rules chapter, where a new approach to this problem is described and evaluated.
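The third shortcoming can be made concrete with a small sketch. It contrasts a single inference pass, where conditions can only test attributes of the case, with a naive repeated pass in which each new conclusion is added to the case as a fact. This is an illustration of the general idea only, not any of the published proposals and not the approach described later in this work. It uses flat rules without exceptions, handles only the presence of a classification (not its absence), and the names `FlatRule`, `single_pass` and `repeat_inference` are invented.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FlatRule = Tuple[FrozenSet[str], str]  # (conditions, conclusion); exceptions are omitted


def single_pass(rules: List[FlatRule], case: Iterable[str]) -> Set[str]:
    """Conventional inference: conditions may only test attributes of the case."""
    facts = set(case)
    return {conclusion for conditions, conclusion in rules if conditions <= facts}


def repeat_inference(rules: List[FlatRule], case: Iterable[str]) -> Set[str]:
    """Re-run inference, treating each new conclusion as a fact, until nothing changes."""
    facts = set(case)
    concluded: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in concluded and conditions <= facts:
                concluded.add(conclusion)
                facts.add(conclusion)
                changed = True
    return concluded


# Hypothetical labels: "Y" depends on the classification "X" having been given.
rules = [
    (frozenset({"X", "b"}), "Y"),
    (frozenset({"a"}), "X"),
]
case = {"a", "b"}

print(sorted(single_pass(rules, case)))
print(sorted(repeat_inference(rules, case)))
```

The single pass reaches only `X`, while the repeated pass also reaches `Y`. The sketch terminates because conclusions are only ever added. Once exception rules can retract earlier conclusions this guarantee no longer holds, which is part of what makes the problem difficult.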
One shortcoming that is not shared by single classification RDR is the lack of a robust machine learning approach that produces MCRDR knowledge bases from sets of pre-classified case data. This has been addressed to some extent with simulated experts by both Dazeley and Kang (Kang 1995; Dazeley and Kang 2003); however, these approaches were both very much approximations which were never comprehensively evaluated on their own merits, being concerned only with evaluating the particular MCRDR method in question. To the best of this author’s knowledge, no true machine learning approach for MCRDR has been described and evaluated. This is discussed in greater detail in the Simulation Studies chapter, where a more robust (in terms of errors) machine learning approach for MCRDR is described and evaluated.
Of course, when one seeks to apply the method to a broader range of problems, new shortcomings become apparent. For instance, an MCRDR-based approach for cleaning bathrooms would present its own challenges. However, in the contexts to which it has been applied to date, its shortcomings have been, for the most part, successfully worked around. This is far from saying that further improvements to the method cannot be made.