Multiple Classification Ripple Round Rules:
Classifications as Conditions
A dissertation submitted to the Faculty of Science, Engineering and Technology, University of Tasmania in fulfilment of the requirements for the Degree of Doctor of Philosophy.
Ivan Karl Bindoff
BComp (Hons, First Class)
Statement of Originality and Access Authority
This dissertation contains no material which has been accepted for the award of any degree or diploma by the University of Tasmania or any other tertiary institution, except by way of background information and duly acknowledged in this dissertation, and to the best of the candidate’s knowledge and belief, this dissertation contains no material previously published or written by another person, except where due acknowledgement is made in the text of the dissertation.
This dissertation may be made available for loan and/or limited copying in accordance with the Copyright Act 1968.
Ivan Karl Bindoff June 2010
Statement of Ethical Conduct
The research associated with this thesis abides by the international and Australian codes on human and animal experimentation, the guidelines by the Australian Government’s Office of the Gene Technology Regulator and the rulings of the Safety, Ethics and Institutional Biosafety Committees of the University.
Abstract
The Ripple Down Rules (RDR) approach was developed by Compton and Jansen (Compton and Jansen 1989; Compton and Jansen 1992) to remove the maintainability concerns of expert systems. This method was used to create an advanced expert system to assist in the performance of medication reviews. However, work in this area, although very successful, led to the realisation that the RDR method had a drawback: it was no longer possible to define rules that depended on the presence or absence of one or more classifications.
Previous attempts were made to address this, with Recursive RDR (Mulholland 1995), Nested RDR (Beydoun and Hoffmann 1997) and Repeat Inference MCRDR (Compton and Richards 1999) all deserving acknowledgement in this regard. However, each of these approaches had its own shortcomings. Recursive RDR suffered problems with cyclic rule definitions, and was very domain specific (Mulholland 1995). Nested RDR was concerned more with the idea of intermediate classifications than with the more general problem of being able to define a rule based on the presence or absence of a classification or classifications (Beydoun and Hoffmann 1997; Beydoun and Hoffmann 2001). Repeat Inference MCRDR tackled the general problem, but its approach to preventing cycles, which is to disallow the retraction of assertions, fundamentally limits the scope of rules which can use classifications as conditions. In addition, there are some minor concerns as to the efficiency of its inference strategy, which simply re-runs inference over the knowledge base until no further changes to the outputs are detected (Compton and Richards 1999; Finlayson 2008).
Acknowledgements
To my supervisor, Byeong Ho Kang: To you I am thankful for many wonderful things, such as guidance, direction and support. You seem to believe very strongly in my abilities, and that belief, in turn, gives me reassurance which is sometimes sorely needed. However, I am also thankful for a number of not-so-nice things. You are perhaps the best player of the role of devil’s advocate that I know, even if it drove me into fits of intense frustration at times. I am forced to acknowledge that it did very effectively arm me with the tools I need to defend my ideas – probably even against a barbarian horde, were it necessary. A warning to any future PhD candidate of Byeong’s – he will rip your ideas apart mercilessly and with an infuriating feigned ignorance, but it’s for your own good.
To my supervisor, Gregory Peterson: You took a fairly hands-off role throughout my candidature, which was probably a good thing considering how much Byeong was already nagging me, and considering how small a part medication review ended up playing in this thesis. However, when push came to shove, you really came through for me in a big way. I am very grateful for this support, and I hope that you feel you have been suitably rewarded for placing your trust in my abilities, and have no (or at least few) regrets. I assure you that I intend to capitalise on the opportunities you have laid out before me.
To my partner, Vanessa Wronski: You are an endless source of amusement and encouragement. You balance me. When completing a thesis it is normally the role of your partner to be proud of you, but I am instead proud of you. On balance you worked harder than me during these past four years, yet together we managed to stay quite within the acceptable bounds of sanity. We’ve lived together for almost 4 years now, and next week we move into our first home together. I can only hope that this adventure will be as good as our previous ones.
To my friend, Tristan Ling: I’m singling you out because you chose a similar path to me, and as such were a source of valuable discussion and debate. Talking through my ideas with you helped me flesh them out, and learn how to explain them better. I’m sorry I haven’t been able to help you more with yours yet, but trust that I will make myself available to do that when you need me to. Thank you also for proofreading this thesis; the favour will be repaid.
Contents
1 Introduction ... 20
1.1 Thesis Outline ... 21
2 Literature Review ... 24
2.1 Artificial Intelligence ... 24
2.1.1 Knowledge Representation ... 24
2.1.2 Common Fields of Artificial Intelligence ... 26
2.2 Knowledge Based Systems ... 29
2.2.1 Applications ... 30
2.2.2 History ... 30
2.2.3 Design ... 31
2.2.4 Extensions ... 33
2.2.5 Optimisations ... 34
2.2.6 Flaws ... 34
2.3 Ripple Down Rules ... 36
2.3.1 Origins & Philosophy ... 36
2.3.2 Design ... 41
2.3.3 Procedure ... 44
2.3.4 Variations ... 44
2.3.5 Shortcomings ... 45
2.4 Multiple Classification Ripple Down Rules ... 46
2.4.1 Design ... 47
2.4.2 Applications ... 53
2.4.3 Variations ... 55
2.4.4 Shortcomings ... 59
3 Medication Review ... 61
3.1.1 Performing ... 65
3.1.2 Existing Software ... 65
3.2 Method ... 67
3.2.1 Existing Prototype ... 69
3.2.2 New Prototype ... 76
3.3 Experimental Design ... 82
3.3.1 Cases ... 82
3.3.2 Experts ... 83
3.4 Results & Discussion ... 84
3.4.1 Growth of the Knowledge Base – Rules per Case ... 85
3.4.2 Specificity of the rules – Conditions per Rule ... 86
3.4.3 Accuracy of the system – Correct classifications provided ... 88
3.4.4 Classifications ... 89
3.4.5 Classifications per case ... 90
3.4.6 Cornerstone cases ... 90
3.4.7 Time per rule ... 93
3.4.8 Time per case ... 94
3.4.9 Expert error ... 95
3.5 Conclusions ... 99
3.6 Further work ... 100
4 Multiple Classification Ripple Round Rules ... 102
4.1 Motivations ... 102
4.2 Literature Review ... 104
4.2.1 Single Classification Approaches ... 104
4.2.2 Multiple Classification Approaches ... 109
4.3 Method ... 111
4.3.2 Inference ... 116
4.3.3 Knowledge Acquisition ... 119
4.3.4 Summary ... 132
4.4 Traditional Classification Task – Pizza Suggestions ... 133
4.4.1 Results and Discussion ... 134
4.4.2 Conclusions ... 141
4.4.3 Further Work ... 141
4.5 Configuration Task – Blocks Placement ... 142
4.5.1 Results and Discussion ... 148
4.5.2 Conclusions ... 160
4.5.3 Further Work ... 161
4.6 Summary ... 162
5 Simulation Studies ... 164
5.1 Multiple Classification Simulated Experts ... 165
5.1.1 Literature Review ... 165
5.1.2 Method ... 173
5.1.3 Datasets ... 182
5.1.4 Results & Discussion ... 184
5.1.5 Conclusions & Further Work ... 227
5.2 Stress Testing ... 229
5.2.1 Method ... 230
5.2.2 Results & Discussion ... 233
5.2.3 Conclusions & Further Work ... 241
Summary of Contributions ... 246
Closing Words ... 251
References ... 252
6.1 Simulation Stress Test ... 258
6.1.1 Scene Dataset ... 258
6.1.2 Enron Dataset ... 261
Figures
Figure 2-1 A simple set of rules. ... 32
Figure 2-2 A complete set of facts. ... 32
Figure 2-3 A simple fuzzy rule set. ... 33
Figure 2-4 The difference between knowledge expressed by the expert, and the knowledge as it must be represented in a standard knowledge base (Compton and Jansen 1989). ... 35
Figure 2-5 The case based reasoning cycle (Aamodt and Plaza 1994). ... 40
Figure 2-6 A simple RDR knowledge base, where arrows pointing upwards indicate the TRUE path while arrows heading downwards indicate FALSE paths. ... 42
Figure 2-7 For a case [X=5, Y=5, Z=10] the emphasised rules are those which were evaluated, while the highlighted rule is the one which ultimately fired. ... 42
Figure 2-8 An example of a compound classification in RDR. ... 47
Figure 2-9 The previous example of a compound classification RDR knowledge base converted to MCRDR. ... 48
Figure 2-10 The difference list approach (Kang 1995). ... 52
Figure 2-11 The general MCRDR knowledge acquisition process. ... 53
Figure 3-1 An ATC code, example shown being Furosemide (Frusemide in Australia). We can also determine from this code that it is a high-ceiling diuretic in the Sulfonamides group. ... 80
Figure 3-2 The growth charts of both knowledge bases. ... 86
Figure 3-3 Conditions per rule. ... 87
Figure 3-4 Accuracy of the provided classifications. ... 89
Figure 3-5 The total number of cornerstone cases found for each rule. ... 92
Figure 3-6 The number of conditions added per rule in order to eliminate all cornerstone cases. ... 93
Figure 3-7 Time per rule. ... 94
Figure 3-8 Time per case. ... 95
Figure 3-9 The deviation from the original number of classifications found by the expert and the number found by the system after training was completed. ... 98
Figure 4-1 An example of an exception which uses a classification as a condition of its rule. This rule could not be represented with the RIMCRDR knowledge representation scheme. ... 112
Figure 4-2 An example representation of a simple MCRRR knowledge base. ... 115
Figure 4-3 The MCRDR inference algorithm (Kang 1995). ... 116
Figure 4-4 The MCRRR inference algorithm. ... 117
Figure 4-5 A simple example of a cyclic knowledge base. ... 121
Figure 4-6 Pseudo-code for a topological sort of a directed acyclic graph (Kahn 1962). ... 122
Figure 4-7 The cycle detection algorithm used in this study. ... 123
Figure 4-8 The simplest example of a cycle. ... 124
Figure 4-9 A third example of a cycle. ... 125
Figure 4-10 A fourth example of a cycle. Inclusive of class not present conditions. ... 125
Figure 4-11 The growth of the pizza suggestions knowledge base. ... 135
Figure 4-12 The number of conditions per rule for the pizza suggestions knowledge base. ... 136
Figure 4-13 The percentage of correct classifications provided by the system for each case. ... 137
Figure 4-14 How many times each classification was used. ... 139
Figure 4-15 Time taken to create each rule. ... 140
Figure 4-16 The blocks which must be placed in each grid (case). Each block has an identification number 1-8 from left to right. ... 144
Figure 4-17 A fully loaded grid with 4 unavailable cells. One block remains correctly unplaced. ... 145
Figure 4-18 An example of a solution suggested by the system which has shown an overlap. ... 147
Figure 4-19 The growth rate of the blocks knowledge base. ... 150
Figure 4-20 The number of conditions per rule in the blocks experiment. ... 150
Figure 4-21 Number of classifications used as conditions per rule. ... 152
Figure 4-22 Correct classifications provided by system. Shown with a moving average with a period of 100. ... 153
Figure 4-23 The number of uses of each classification. ... 154
Figure 4-25 The number of cycles detected per rule. ... 157
Figure 4-26 The total number of alternate solutions suggested by the MCRRR method for 2000 cases after varying amounts of training. ... 159
Figure 4-27 Instances where forced additions resulted in the system finding alternate solutions. ... 160
Figure 4-28 Instances where forced removals resulted in the system finding an alternate solution. ... 160
Figure 5-1 C4.5 Algorithm (Kotsiantis 2007) ... 167
Figure 5-2 A simple example of a (bad) grouping rule. ... 179
Figure 5-3 The hindsight algorithm to “convert” MCRDR knowledge bases to MCRRR knowledge bases. ... 181
Figure 5-4 Growth of the knowledge base for the bibtex dataset. ... 186
Figure 5-5 Growth of the knowledge bases for the emotions dataset. ... 187
Figure 5-6 Growth of the knowledge base for the enron dataset. ... 188
Figure 5-7 Growth of the knowledge base for the genbase dataset. ... 189
Figure 5-8 Growth of the knowledge base for the medical dataset. ... 190
Figure 5-9 Growth of the knowledge base for the scene dataset. ... 191
Figure 5-10 The growth of the knowledge base for the yeast dataset. ... 192
Figure 5-11 The accuracy of the system relative to the simulated experts with the bibtex dataset. ... 193
Figure 5-12 The accuracy of the system relative to the simulated experts with the emotions dataset. ... 194
Figure 5-13 The accuracy of the system relative to the simulated experts with the enron dataset. ... 195
Figure 5-14 The accuracy of the system relative to the simulated experts with the genbase dataset. ... 196
Figure 5-15 The accuracy of the system relative to the simulated experts with the medical dataset. ... 197
Figure 5-16 The accuracy of the system relative to the simulated experts with the scene dataset. ... 198
Figure 5-17 The accuracy of the system relative to the simulated experts with the yeast dataset. ... 199
Figure 5-19 The average number of conditions for every 10 rule cluster in the emotions dataset. ... 201
Figure 5-20 The average number of conditions for every 10 rule cluster in the enron dataset. ... 202
Figure 5-21 The average number of conditions for every 10 rule cluster in the genbase dataset. ... 203
Figure 5-22 The average number of conditions for every 10 rule cluster in the medical dataset. ... 204
Figure 5-23 The average number of conditions for every 10 rule cluster in the scene dataset. ... 205
Figure 5-24 The average number of conditions for every 10 rule cluster in the yeast dataset. ... 205
Figure 5-25 The average depth for every cluster of 10 rules for the bibtex dataset. ... 206
Figure 5-26 The average depth for every cluster of 10 rules for the emotions dataset. ... 207
Figure 5-27 The average depth for every cluster of 10 rules for the enron dataset. ... 208
Figure 5-28 The average depth for every cluster of 10 cases for the genbase dataset. ... 208
Figure 5-29 The average depth for every cluster of 10 cases for the medical dataset. ... 209
Figure 5-30 The average depth for every cluster of 10 cases for the scene dataset. ... 210
Figure 5-31 The average depth for every cluster of 10 cases for the yeast dataset. ... 210
Figure 5-32 The total number of cornerstone cases found for each rule in the bibtex dataset. ... 212
Figure 5-33 The number of conditions added to remove each cornerstone case for the bibtex dataset. ... 212
Figure 5-34 Total cornerstone cases found for each rule in the emotions dataset. ... 213
Figure 5-36 The total number of cornerstone cases found for each rule in the enron dataset. ... 215
Figure 5-37 The number of conditions added to eliminate all cornerstone cases for each rule in the enron dataset. ... 215
Figure 5-38 The total number of cornerstone cases found for each rule in the genbase dataset. ... 216
Figure 5-39 The number of conditions added to eliminate all cornerstone cases for each rule in the genbase dataset. ... 217
Figure 5-40 The total number of cornerstone cases found for each rule in the medical dataset. ... 218
Figure 5-41 The number of conditions added to eliminate all cornerstone cases for each rule in the medical dataset. ... 218
Figure 5-42 The total number of cornerstone cases found for each rule in the scene dataset. ... 219
Figure 5-43 The number of conditions added to eliminate all cornerstone cases for each rule in the scene dataset. ... 220
Figure 5-44 The total number of cornerstone cases found for each rule in the yeast dataset. ... 220
Figure 5-45 The number of conditions added to eliminate all cornerstone cases for each rule in the yeast dataset. ... 221
Figure 5-46 The number of grouping rules and the reduction of conditions for the bibtex dataset. ... 222
Figure 5-47 The number of grouping rules and the reduction of conditions for the emotions dataset. ... 223
Figure 5-48 The number of grouping rules and the reduction of conditions for the enron dataset. ... 224
Figure 5-49 The number of grouping rules and reduction of conditions for the medical dataset. ... 225
Figure 5-50 The number of grouping rules and the reduction of conditions for the scene dataset. ... 226
Figure 5-52 A benchmark simulation, with 20% chance of exceptions and no rules based on classifications. A pure MCRDR simulation of this type would be expected to show a linear growth. ... 235
Figure 5-53 The same benchmark simulation, allowing Excel to use a higher order polynomial than necessary. ... 235
Figure 5-54 A simulation with a 10% chance of exceptions and a 10% chance of rules based on classifications. ... 237
Figure 5-55 A simulation with a 10% chance of exceptions and a 20% chance of rules based on classifications. ... 237
Figure 5-56 A simulation with a 10% chance of exceptions and a 40% chance of rules based on classifications. ... 238
Figure 5-57 A simulation with a 10% chance of exceptions and an 80% chance of rules based on classifications. ... 238
Figure 5-58 A simulation with a 40% chance of exceptions and a 10% chance of rules based on classifications. ... 239
Figure 5-59 A simulation with a 40% chance of exceptions and an 80% chance of rules based on classifications. ... 239
Figure 5-60 A benchmark simulation with a 20% chance of exceptions and a 50% chance of rules based on classifications. ... 240
Figure 5-61 The time taken to perform 100 inferences at various evenly distributed points through the 10% exception and 10% rules based on classification experiment with the scene dataset. ... 241
Figure 6-1 A simulation with a 20% chance of exceptions and a 10% chance of rules based on classifications. ... 258
Figure 6-2 A simulation with a 20% chance of exceptions and a 20% chance of rules based on classifications. ... 258
Figure 6-3 A simulation with a 20% chance of exceptions and a 40% chance of rules based on classifications. ... 259
Figure 6-4 A simulation with a 20% chance of exceptions and an 80% chance of rules based on classifications. ... 259
Figure 6-5 A simulation with a 40% chance of exceptions and a 20% chance of rules based on classifications. ... 260
Figure 6-7 A benchmark simulation, with 20% chance of exceptions and no rules based on classifications. ... 261
Figure 6-8 A simulation with a 10% chance of exceptions and a 10% chance of rules based on classifications. ... 261
Figure 6-9 A simulation with a 10% chance of exceptions and a 20% chance of rules based on classifications. ... 262
Figure 6-10 A simulation with a 10% chance of exceptions and a 40% chance of rules based on classifications. ... 262
Figure 6-11 A simulation with a 10% chance of exceptions and an 80% chance of rules based on classifications. ... 263
Figure 6-12 A simulation with a 20% chance of exceptions and a 10% chance of rules based on classifications. ... 263
Figure 6-13 A simulation with a 20% chance of exceptions and a 20% chance of rules based on classifications. ... 264
Figure 6-14 A simulation with a 20% chance of exceptions and a 40% chance of rules based on classifications. ... 264
Figure 6-15 A simulation with a 20% chance of exceptions and an 80% chance of rules based on classifications. ... 265
Figure 6-16 A simulation with a 40% chance of exceptions and a 10% chance of rules based on classifications. ... 265
Figure 6-17 A simulation with a 40% chance of exceptions and a 20% chance of rules based on classifications. ... 266
Figure 6-18 A simulation with a 40% chance of exceptions and a 40% chance of rules based on classifications. ... 266
Figure 6-19 A simulation with a 40% chance of exceptions and an 80% chance of rules based on classifications. ... 267
Figure 6-20 A benchmark simulation with a 20% chance of exceptions and a 50% chance of rules based on classifications. ... 267
Figure 6-21 The number of rules in the system and the time taken to perform 100 inferences at 9 separate points during the simulated stress test runs for 10% exceptions and 20% classifications. ... 268
Figure 6-23 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 10% exceptions and 80% classifications. ... 269
Figure 6-24 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 20% exceptions and 10% classifications. ... 270
Figure 6-25 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 20% exceptions and 20% classifications. ... 270
Figure 6-26 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 20% exceptions and 40% classifications. ... 271
Figure 6-27 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 20% exceptions and 80% classifications. ... 271
Figure 6-28 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 40% exceptions and 10% classifications. ... 272
Figure 6-29 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 40% exceptions and 20% classifications. ... 272
Figure 6-30 The number of rules in the system and the time taken to perform 1000 inferences at 9 separate points during the simulated stress test runs for 40% exceptions and 40% classifications. ... 273
Tables
Table 1 ICPC-2 PLUS terms for keyword 'vascular'. ... 79
Table 2 The number of rules which had 0-4 classifications as conditions. ... 151
Table 3 The number of rules at each depth. ... 156
Table 4 The multi class datasets used, and their relevant statistics. ... 183