Autonomous Learning by Incremental Induction and Revision

A thesis submitted for the degree of

Doctor of Philosophy

Kerry Lea Taylor

February 1996


Acknowledgements

I first acknowledge the contribution to this thesis made by my triumvirate of supervisors. To Dr Claude Sammut of the University of New South Wales goes the credit for the initial inspiration for the work. He sparked my interest in machine learning as an area of study, and suggested the framework for incremental theory revision incorporating inverse resolution and declarative diagnosis. Claude directed the early stages of my research and focused my attention on experimental results in the final stages. In between, his deep knowledge of the area enabled him to suggest fruitful lines of enquiry for problems that arose.

I thank Professor Robin Stanton, Dean of the Faculty of Engineering and Information Technology at the Australian National University, for my education in the wider fields of artificial intelligence and computational logic. His broad and extremely perceptive view of my work and its context was challenging, thought-provoking and ultimately beneficial to its quality.

Dr Graham Williams, of the Division of Information Technology, CSIRO, was an invaluable source of practical assistance. Graham encouraged a regular dialogue on a wide range of subjects, from the very particular to the very general. He also read and re-read interminable drafts, advised on detailed questions of style and expression, and frequently drew my attention to helpful literature and research tools. I am especially grateful to Graham for making himself so readily available when I needed assistance.


Most of all, I am grateful for the unfailing support and encouragement given freely by my husband, Trevor Vickers. Trevor was also a great help with matters of written expression, organisation, direction and equanimity.

I thank the youngest participant in the project: my son, Byron Vickers. Byron’s remarkable capacity to learn was a source of wonder and inspiration. His demand for teaching was a source of frequent but welcome distraction.

Abstract

An inductive learning capability is generally acknowledged to be crucial to artificial intelligence in machines. This work investigates learning by an autonomous agent that continuously interacts with the environment in which it is embedded. A novel domain-independent learning algorithm is proposed.

Learning is viewed as the incremental revision of a developing theory of multiple interdependent concept definitions. When an observation of the environment reveals an error in the present theory, the reason for the error is located by diagnosis and experimentation. To correct an error, alternative generalizing or specializing revisions are investigated. Revision hypotheses are evaluated by experimentation, leading to further theory improvement by discovery.

The theory itself is the source of domain-dependent background knowledge directing hypothesis formulation. The environment is the source of observations of concepts and the testbed for hypothesis experimentation. The environment also generates unexpected interruptions imposing arbitrary time limits on learning.

The learning model is developed in the Inductive Logic Programming framework, in which logic programs are used to represent theories. New theoretical results based on logic programming concepts underpin the key components of generalization by absorption and specialization by generalization of exceptions. The investigation of alternative revisions proceeds by a staged heuristic search through partially developed hypotheses, enabling graceful performance degradation when time is short.

The algorithm is implemented as a software agent called Minerva. The environment is modelled by a parameterised random sampling tool called Sampler. Empirical results demonstrating the learning performance of Minerva in several domains modelled by Sampler are reported.

Contents

Acknowledgements

Abstract

Contents

Figures

1 Introduction

1.1 The Problem

1.2 Research Framework

1.3 Role of the Learner

1.3.1 The learner in the environment

1.4 Key Features

1.4.1 First order predicate calculus knowledge representation

1.4.2 Domain independence

1.4.3 Incremental revision in response to incremental input

1.4.4 Learning from background knowledge

1.4.5 Multiple interspersed concept learning

1.4.6 Simple input language

1.4.7 Active experimentation

1.4.8 Interruptibility for anytime learning

1.4.9 Theoretical foundation

1.5 Structure of the Thesis

2 Learning Illustrated

2.1 Introduction

2.2 Discovering New Facts

2.3 Learning from Background Knowledge

2.4 Returning to an Earlier Concept

2.5 Interruption

2.6 Learning a Symmetric Relation

2.7 Learning Exceptions

2.8 Removing Redundant Clauses

2.9 Diagnosis

2.10 Generalizing an Exception

2.11 Missing Answer Diagnosis

2.12 Replacing a Redundant Rule

2.13 Simplifying Exceptions

2.14 Summary

3 Foundation Concepts

3.1 Introduction

3.2 Inductive Concept Learning

3.2.1 Learning in the Inductive Logic Programming framework

3.3 Notation

3.4 Preliminary Concepts of Logic Programming

3.4.1 Terms

3.4.2 Clauses and programs

3.4.3 Substitutions

3.5.2 Generalization

3.5.3 Predicate invention

3.6 Interpreters

3.6.1 Negation as failure

3.6.2 Interpreter deficiencies

3.7 Declarative Diagnosis

3.7.1 Diagnostic tools

3.7.2 Contradiction backtracing

3.7.3 Missing answers

3.8 Summary

4 Incremental Learners and Theory Revision Systems

4.1 Introduction

4.2 Incremental ILP Learners

4.2.1 Marvin

4.2.2 MIS

4.2.3 CIGOL

4.2.4 CLINT

4.3 Batch ILP Learners

4.4 Other Revising Learners

4.4.1 Knowledge base refinement

4.4.2 Operator planning

4.4.3 Science

4.5 Approaches to Revision Problems

4.5.1 Diagnosis

4.5.2 Generalization

4.5.3 Specialization

5.1 Introduction

5.2 Design Principles

5.2.1 Strategy outline

5.2.2 Redundancy

5.2.3 The top-level algorithm

5.3 Revision

5.3.1 Specialization by exception

5.3.2 Generalization

5.4 Interpreting a Program

5.4.1 The language of programs

5.4.2 Interpreters

5.5 Diagnosis

5.5.1 Questions

5.5.2 Search order

5.5.3 Invented predicates

5.5.4 Non-terminating diagnosis

5.5.5 Interrupted diagnosis

5.6 Experiments

5.6.1 Experiments about self-recursive predicates

5.6.2 Experiments about invented predicates

5.7 Redundancy

5.7.1 Sleep

5.8 Summary

6 Heuristic Search for a Revision

6.1 Introduction

6.2 Heuristic Evaluation of a Revision

6.4.1 Choosing a specializing revision

6.4.2 Assimilating a specializing revision

6.5 The Search for a Generalization

6.5.1 Partial hypotheses

6.5.2 Development of a partial hypothesis

6.5.3 Hypothesis syntax checking

6.5.4 Hypothesis testing

6.6 Evaluating a Partial Hypothesis

6.6.1 Value of a partial hypothesis

6.6.2 Best background clause

6.7 Hypothesis Assimilation

6.7.1 Adequacy

6.7.2 Assimilating a partial hypothesis

6.8 Minimum Description Length

6.9 Summary

7 Modelling the Environment

7.1 Introduction

7.2 The Target

7.3 Sample Strategies

7.3.1 Probability distributions

7.3.2 Deterministic strategies

7.3.3 Complexity-probability strategies

7.3.4 Arity-probability strategies

7.3.5 Positive strategies

7.3.6 Alternating strategies

7.3.7 User-control of example selection

7.4.2 Generating ground atoms

7.5 Time Limits

7.6 Other Sampler Features

7.6.1 Sleep control

7.6.2 Statistics

7.7 Summary

8 Performance Evaluation

8.1 Introduction

8.1.1 Experimental design

8.2 Learning by Absorption

8.2.1 Results

8.3 Incremental Learning

8.3.1 Results

8.4 Theory Refinement

8.4.1 Experimental method

8.4.2 Induction experiment

8.4.3 Refinement experiment

8.5 Learning Multiple Predicate Theories

8.5.1 Results

8.6 Learning with Minerva

8.6.1 The graph domain

8.6.2 Learning interleaved interdependent predicates

8.6.3 Varying the learning time

8.6.4 Varying the sleep pattern

8.6.5 Tolerating noise

8.6.6 Learning from positive examples

8.7 Summary

9 Generalization by Absorption

9.1 Introduction

9.2 Absorption

9.2.1 Features of absorption

9.3 Completeness of Absorption

9.3.1 Preliminary definitions

9.3.2 Inversion of SLD-resolution

9.3.3 Completeness of absorption for definite clauses

9.4 Eliminating Redundancy

9.4.1 Connex clauses

9.4.2 Duplicate clauses

9.4.3 Least general absorption

9.5 Reducing the Search Space Further

9.5.1 Free variables in a background clause

9.5.2 Choosing the root

9.6 Changing the Representation

9.6.1 Flattening for absorption

9.6.2 Limiting unit absorption

9.7 A Strategy for Generalization by k-unit Absorption

9.7.1 Free variables in a background clause

9.7.2 Choosing the root

9.7.3 Avoiding inverse substitution

9.8 Generalizing Normal Clauses

9.8.1 Normal subsumption

9.8.2 Normal absorption

9.9.1 Implementing normal absorption

9.9.2 Implementing k-unit absorption

9.10 Summary

10 Conclusions

10.1 Introduction

10.2 Summary of the Thesis

10.3 Improvements to Minerva

10.4 Expanding the Environment

10.5 Concluding Remarks

Figures

1.1 Architecture of an intelligent agent

1.2 Modelling the learning environment

2.1 Example families

5.1 Top level algorithm

5.2 Procedure best-revision

5.3 Missing answer diagnosis

5.4 Missing answer diagnosis (continued)

5.5 Contradiction backtracing diagnosis

6.1 Choosing a specializing revision

6.2 Processing partial hypotheses

6.3 Stages of development of a partial hypothesis

6.4 Developing a partial hypothesis

6.5 Applying absorption

6.6 Hypothesis testing

8.1 Experimental results for induction

8.2 Experimental results for refinement

8.4 Graph relations program

8.5 Learning three graph relations

8.6 Varying the learning time

8.7 Varying the sleep pattern

8.8 Tolerating noise

8.9 Learning from positive examples

8.10 Confusing the background knowledge

8.11 Learning five graph relations

8.12 Learning a single extra graph relation

1 Introduction

1.1 The Problem

How do intelligent beings learn? How is it possible for an agent to use observations of an environment to reason about events that are not observed, to predict the consequences of actions, and to form an internal model of its world?

These questions have been asked many times in many fields of human endeavour. Developmental psychologists such as Piaget have studied childhood learning [Flavell 1987, Ginsburg and Opper 1979]. Philosophers including Hume, Popper and Peirce have attempted to explain the concepts they have called induction and, later, abduction, as well as scientific theory formation [Gregory 1987]. Computer scientists, recognizing the importance of learning to intelligent behaviour in man-made systems, have studied machine learning.

The study of machine learning is usually focused on one or more of three major goals. The cognitive science approach aims to discover and reconstruct models of human behaviour as a way of gaining greater insight into the workings of the human brain [VanLehn 1990]. The second goal, encompassing most machine learning research, is “knowledge discovery”: the formation of predictive rules from a large but finite number of empirical observations. The third major goal is to equip robots with the ability to learn from their environment as they work in it. This would enable them to function effectively and adaptively in environments that are neither highly restricted nor well understood in advance by their human designers. Thus an off-the-shelf robot could adapt to different working environments, or a special-purpose robot could work in dangerous or remote locations about which little is known.

The primary aim of this thesis is to contribute to this latter goal, although it is motivated by a fascination with the remarkable learning ability of humans. Popper’s account of induction, which provides a clue to the human ability to learn from a very small number of observations, is particularly influential:

Without waiting, passively, for repetitions to impress or impose regularities upon us, we actively try to impose regularities upon the world. We try to discover similarities in it, and to interpret it in terms of laws invented by us. Without waiting for premises we jump to conclusions. These may be discarded later should observation show that they are wrong. [Popper 1969b]

The thesis describes an algorithm that might underpin the learning component of an “intelligent” active learning agent. When executed, the algorithm interacts in a restricted but well-defined manner with a simulated environment, and demonstrates the ability to acquire a predictive theory of that environment. In the course of developing the theory, the algorithm performs experiments, thus actively seeking learning experiences. It is interruptible: it always aims to learn as much as possible as soon as possible, and assimilates the best of what has been learnt when interrupted. When observations indicate errors in the developing theory, the algorithm revises the theory to correct them. It can develop useful, if sometimes incorrect, inductive hypotheses from very few observations, enabling it to continue to work in pursuit of other goals while waiting for new experiences from which to learn. This algorithm is called Minerva. An extended example of its behaviour is given in chapter 2. Comprehensive experimental results are reported in chapter 8.

1.2 Research Framework

The thesis is developed within the framework of Inductive Logic Programming (ILP). In this framework, knowledge acquired by machine learning is represented as a logic program. The name was coined by Muggleton [1991] to refer to the study of the automatic construction of logic programs from examples of the behaviour of the (unknown) programs. He describes the framework as lying at the “intersection” of the fields of machine learning and logic programming. ILP research draws heavily on the results of logic programming research, which is concerned with understanding the representation and interpretation properties of logic programs. It benefits from the combination of the formal foundations of logic programming and the experimental approach of machine learning. Chapter 3 includes an introduction to the basic principles and techniques of ILP and logic programming, and chapter 4 describes several ILP and other learners.

1.3 Role of the Learner

[Figure 1.1: Architecture of an intelligent agent. Diagram labels: Minerva; Agent (Minerva’s environment); External world; examples; interruptions; questions; answers; planning, goal generation, natural language ...; perception and action.]

Minerva includes an interpreter which translates from the internal representation of a theory as a program.


natural events and the actions of the agent. They will be interpreted through the agent’s sensors and reasoning components before being presented to Minerva in the formal language. Similarly, the experiments that Minerva produces in the formal language will be interpreted by the agent’s reasoning components to formulate real-world experiments, observations, and occasional natural-language questions.

Goals for learning are generated externally, by the environment or by another component of the agent, and communicated to Minerva by way of observations and interruptions. These other reasoning components of the agent could have access to the theory developed by Minerva to aid their reasoning.

1.3.1 The learner in the environment

Provided that the environment presents occasional examples of concepts, learning proceeds autonomously through experiments designed by Minerva. These experiments are self-directed observations of the environment and, subject to time constraints, Minerva will perform as many as necessary to aid the search for inductive hypotheses. The experimental results (answers to questions) are assumed to be inexpensive to obtain, as they are provided by observations of the environment in which Minerva is embedded.

In this work the environment is modelled by a novel software tool called Sampler, suitable for interaction with an arbitrary co-operative learner. Sampler is primarily responsible for drawing facts from a target theory to present as examples to a learner; for providing the observations that correspond to the results of a learner’s experiments; and for enforcing time limits on learning. These actions can be modulated by user-controlled parameters.

Figure 1.2 illustrates the role of Sampler in modelling the environment for Minerva, and chapter 7 describes Sampler in more detail. The empirical evaluation of Minerva’s learning performance in chapter 8 uses Sampler for environmental modelling.
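To make this division of labour concrete, here is a minimal Python sketch of the two services an environment model offers a learner: drawing labelled examples from a target theory, and answering experiments about single facts. It assumes a toy target given directly as a set of ground facts over one predicate, and every name in it (`TARGET`, `sample_example`, `answer`, the `positive_only` switch) is hypothetical. Sampler’s actual sample strategies and its enforcement of time limits are described in chapter 7 and are not reproduced here.

```python
import random
from itertools import product

# Hypothetical target theory, given directly as the set of ground facts true
# in its intended model (a real target would be a logic program).
TARGET = {("parent", ("ann", "bob")), ("parent", ("bob", "cal"))}
PREDICATES = {"parent": 2}          # predicate symbol -> arity (assumed)
CONSTANTS = ("ann", "bob", "cal")   # finite domain of constants (assumed)


def ground_atoms(predicates, constants):
    """Enumerate every ground atom over the given predicates and constants."""
    for name, arity in sorted(predicates.items()):
        for args in product(constants, repeat=arity):
            yield (name, args)


def sample_example(rng, positive_only=False):
    """Draw one example: a ground atom labelled True if it is in the target.

    positive_only stands in for a user-controlled sampling parameter.
    """
    if positive_only:
        pool = sorted(TARGET)
    else:
        pool = list(ground_atoms(PREDICATES, CONSTANTS))
    atom = rng.choice(pool)
    return atom, atom in TARGET


def answer(atom):
    """Answer a learner's experiment: is this ground atom true in the target?"""
    return atom in TARGET


rng = random.Random(0)
examples = [sample_example(rng) for _ in range(5)]
```

Because examples and experiment answers are both read from the same target, they can never contradict one another, which is the consistency property the learner relies on.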

1.4 Key Features

The major contribution of this thesis is a strategy for revision, incorporating both generalization and specialization, that is suitable for a learner having the following key features. Taken together, these features distinguish Minerva from other learning algorithms.

1. First order predicate calculus knowledge representation

2. Domain independence

3. Incremental revision prompted by incremental input

4. Learning from background knowledge

5. Multiple interspersed concept learning

6. Simple input language

7. Active experimentation

8. Interruptibility for anytime learning

9. Theoretical foundation

[Figure 1.2: Modelling the learning environment. Diagram label: sampling parameters.]

1.4.1 First order predicate calculus knowledge representation


and indirect recursion. There are some syntactic restrictions on full normal programs: there may be at most one negative literal in the body of a clause, and the predicate symbol of that literal must occur in no other clause body; and every variable in the head of a clause must also occur in the body. Nevertheless, the language of Minerva is considerably more complex (and expressive) than is customary. The expressive power of Minerva’s representation is the source of three problems addressed in the thesis: an increased space of possible inductive hypotheses; undecidability of logical entailment; and the logical and computational difficulties associated with negation-as-failure.

A precise characterisation of Minerva’s knowledge representation language is given in chapter 5.
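The syntactic restrictions above are easy to check mechanically. The following Python sketch is one reading of them, not Minerva’s own checker: a clause is a `(head, body)` pair, each body literal carries a negation flag, variables follow the Prolog convention of an upper-case initial letter, and “no other clause body” is taken to mean the body of any other clause in the program. The function `check_program` and this clause representation are assumptions made for illustration.

```python
def is_variable(term):
    """Prolog convention (assumed): variables start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def check_program(clauses):
    """Check the stated syntactic restrictions; return a list of violations.

    Each clause is (head, body): head is (predicate, args) and each body
    literal is (negated, predicate, args). An empty list means no violations.
    """
    problems = []
    for i, ((_, head_args), body) in enumerate(clauses):
        negatives = [lit for lit in body if lit[0]]
        # At most one negative literal per clause body.
        if len(negatives) > 1:
            problems.append(f"clause {i}: more than one negative literal")
        # A negated predicate symbol may occur in no other clause body.
        for _, pred, _ in negatives:
            for j, (_, other_body) in enumerate(clauses):
                if j != i and any(p == pred for _, p, _ in other_body):
                    problems.append(f"clause {i}: negated {pred} also occurs in clause {j}")
        # Every head variable must also occur in the body.
        head_vars = {t for t in head_args if is_variable(t)}
        body_vars = {t for _, _, args in body for t in args if is_variable(t)}
        missing = sorted(head_vars - body_vars)
        if missing:
            problems.append(f"clause {i}: head variables {missing} not in body")
    return problems


program = [
    (("grandparent", ("X", "Z")), [(False, "parent", ("X", "Y")), (False, "parent", ("Y", "Z"))]),
    (("orphan", ("X",)), [(False, "person", ("X",)), (True, "has_parent", ("X",))]),
    (("has_parent", ("X",)), [(False, "parent", ("Y", "X"))]),
]
print(check_program(program))  # → []
```

The second clause of `program` uses negation as failure on `has_parent`, which is permitted under this reading because no other clause body mentions `has_parent`.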

1.4.2 Domain independence

Like many other learners, Minerva is domain-independent. There are no prior assumptions about the terms of the language of observation: it is generated entirely by the environment, and all terms have equivalent status. The environment is only assumed to be able to provide input of the syntactic form required by the learner, and to answer truthfully questions presented in the same syntactic form.

1.4.3 Incremental revision in response to incremental input

Most experimental work in machine learning, and particularly in the ILP framework, assumes that the input is presented as a batch. That is, the input to a learner is a finite set of observations or examples. This enables a learner to make learning decisions with simultaneous consideration of their effect on the full input.

This model is appropriate when the goal of the learning algorithm is to discover patterns in the input, but it is inappropriate for a long-lived learner operating in a complex environment. We would like Minerva to be able to jump to inductive hypotheses from partial information, even a single example, so that the agent can use the partial theory to make decisions and to guide further learning. This requires modelling input as a (possibly infinite) sequence of examples, allowing repetitions.

The term incremental is usually applied to this notion of input being a sequence rather than a set. However, a batch learner can clearly be made incremental simply by allowing it access to one example at a time, having it keep a record of every example, and having it re-learn a theory from the complete example set each time a new example is supplied. Similarly, an incremental learner can be viewed as a batch learner by giving it all the input at one time and allowing it to order the input internally.
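The two conversions described above can be sketched directly. In this hypothetical Python wrapper, `IncrementalFromBatch` stores every example (repetitions included) and re-runs an arbitrary batch learner on the whole record after each one, while `batch_from_incremental` goes the other way by letting the incremental learner choose its own ordering of the input. The toy learner `memorise_positives` is an assumption chosen only to keep the example runnable.

```python
from typing import Callable, Hashable, List, Tuple

Example = Tuple[Hashable, bool]   # (ground fact, is it a positive example?)


class IncrementalFromBatch:
    """Wrap a batch learner so that it accepts one example at a time."""

    def __init__(self, batch_learn: Callable[[List[Example]], object]):
        self.batch_learn = batch_learn
        self.seen: List[Example] = []         # record of every example, repeats included
        self.theory = batch_learn([])

    def observe(self, example: Example):
        self.seen.append(example)
        self.theory = self.batch_learn(list(self.seen))  # re-learn from scratch
        return self.theory


def batch_from_incremental(make_learner, examples, order=sorted):
    """Use an incremental learner as a batch learner; it chooses the order."""
    learner = make_learner()
    for example in order(examples):
        learner.observe(example)
    return learner.theory


def memorise_positives(examples: List[Example]):
    """A deliberately trivial batch learner: the theory is the set of positive facts."""
    return frozenset(fact for fact, positive in examples if positive)


inc = IncrementalFromBatch(memorise_positives)
for ex in [("parent(ann,bob)", True), ("parent(bob,ann)", False), ("parent(ann,bob)", True)]:
    inc.observe(ex)
```

The cost of the first wrapper is the point of the argument: each new example triggers a pass over the whole history, so the work per example grows with the length of the input sequence.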


adjust that theory to account for the observed errors.

It is a tenet of this thesis that an incremental learner working under time limitations must incrementally revise a partial theory when an error is exposed. This requires diagnosis to blame a part of the theory for causing the error, followed by revision to correct the error. A suitable strategy for revision, incorporating both generalization and specialization, is a major contribution of this thesis and is described in chapter 5.
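The diagnose-then-revise cycle can be illustrated with a deliberately simple propositional toy. In the Python sketch below, which is not Minerva’s algorithm, a rule covers an example when its conditions hold and the example is not recorded as an exception. A missing answer is corrected by generalizing (adding the example’s attributes as a new rule), and a wrong answer by diagnosing the rules that cover the example and specializing each with an exception. Minerva’s own operators work on first-order clauses and are developed in chapters 5, 6 and 9; all names here are hypothetical.

```python
def covers(rule, name, attrs):
    """A rule covers an example if its conditions hold and it is not an exception."""
    return rule["if"] <= attrs and name not in rule["unless"]


def predict(theory, name, attrs):
    return any(covers(rule, name, attrs) for rule in theory)


def diagnose(theory, name, attrs):
    """Blame the rules responsible for a wrong answer: those covering the example."""
    return [rule for rule in theory if covers(rule, name, attrs)]


def revise(theory, name, attrs, label):
    """Revise the theory only when the new observation exposes an error."""
    attrs = frozenset(attrs)
    if predict(theory, name, attrs) == label:
        return theory                                  # no error exposed
    if label:                                          # missing answer: generalize
        theory.append({"if": attrs, "unless": set()})  # by adding a new rule
    else:                                              # wrong answer: specialize
        for rule in diagnose(theory, name, attrs):     # each blamed rule
            rule["unless"].add(name)                   # by recording an exception
    return theory


theory = []
revise(theory, "tweety", {"bird"}, True)
revise(theory, "opus", {"bird", "penguin"}, False)
print(predict(theory, "polly", {"bird", "parrot"}))  # → True
print(predict(theory, "opus", {"bird", "penguin"}))  # → False
```

Here each exception is a single named instance, so the theory would still wrongly cover a second penguin; Minerva instead generalizes its exceptions.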

1.4.4 Learning from background knowledge

The term background knowledge refers to information about the learning problem that is not directly represented in the input. The need for learning mechanisms to take account of background knowledge has been apparent for a long time. This recognition leads immediately to the desire to empower a learning agent to learn its own background knowledge [Sammut 1981b, Stepp, Whitehall and Holder 1988, Russell 1989].

Minerva implements this notion by having a uniform knowledge representation for any initially provided background knowledge and for the knowledge it acquires by learning. Inductive hypotheses constructed by Minerva are assimilated into its theory and become available as background knowledge for the next learning task. Minerva uses ILP techniques, particularly inverse resolution, to support learning from background knowledge. Chapter 9 presents theoretical results, especially regarding soundness and completeness properties, that underpin the inverse resolution operator used in Minerva.

1.4.5 Multiple interspersed concept learning

Unlike most learners, Minerva can learn a theory about more than one concept. Examples of each concept may be interspersed in the input, although some sequence orders will enable better results than others. Minerva constructs concept descriptions in terms of other concepts represented by input examples.

Some of the special difficulties of multiple concept learning are discussed by De Raedt, Lavrač and Džeroski [1993].

1.4.6 Simple input language


Minerva does require that the environment is consistent at all times: the truth status of any fact must never change. A consequence of this is that noise in the input is not supported. Minerva makes the perfect data assumption [Brazdil and Clark 1990] in order to enable deeper investigation of other aspects of the revision problem. Despite this, the design of Minerva lends itself naturally to the incorporation of noise-handling mechanisms.

1.4.7 Active experimentation

Rather than passively waiting for examples of concepts to come along, we would like a learner to experiment actively. A learner endowed with this capability is known as active.

The role of experimentation in childhood learning was identified by Piaget [Ginsburg and Opper 1979]. Experimentation is also fundamental to scientific theory formation [Klahr 1994]. Indeed, Popper’s [1969a] philosophy of science demands that scientific inquiry proceed by attempting to discover counter-examples to unlikely but explanatory hypotheses.

Experiments can be used to distinguish between alternative hypotheses, each of which is satisfactory with respect to experience. They can also be used to diagnose and correct mistakes in what has already been learnt. Established techniques for debugging logic programs are useful for this purpose.

Many learning systems assume the existence of an oracle or a teacher that can answer questions asked by the learner. Alternatively, these questions can be seen to represent the carrying out of an experiment or the making of a particular observation by an autonomous learner, in which case the answer is assumed to be given by the environment.

Minerva asks questions to aid diagnosis and revision, as described in chapter 5. We prefer the experimental/environmental view of the questions because, for a learner embedded in an environment, such directed observations could be answered inexpensively. In practice they could be answered by a combination of a teacher and the environment: the capabilities required of the question-answerer and the example-giver are the same. The questions asked by Minerva are of a very simple form: is some fact true or false? The fact is expressed in the same language in which the environment has expressed examples. The environment must give either a yes or a no response, and that response must be consistent with earlier examples and answers.

However, experimentation cannot normally be exhaustive. The environment naturally imposes restrictions on the time and material resources available. Some difficulties with constructing experiments, including the availability of materials and the suitability of the current state of the environment, are beyond the scope of the environmental model used here. These issues are addressed by Cheng [1991] and Hume and Sammut [1991a].
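The question protocol is small enough to state as code. This Python sketch (the class `ConsistentOracle` and the `truth` callback are hypothetical) serves examples and experiment answers through one `ask` channel, since the same capabilities are required of both, and it records every answer so that a later answer can never contradict an earlier one. The record is only a safeguard for the illustration; Minerva simply assumes that the environment is consistent.

```python
import itertools


class ConsistentOracle:
    """Answers yes/no questions about facts and never contradicts an earlier answer."""

    def __init__(self, truth):
        self._truth = truth        # stands in for the environment (assumed callable)
        self._given = {}           # fact -> answer already given

    def ask(self, fact):
        """Is `fact` true? Examples and experiments share this one channel."""
        if fact not in self._given:
            self._given[fact] = bool(self._truth(fact))
        return self._given[fact]


flips = itertools.count()


def flaky_truth(fact):
    """A hypothetical environment whose raw answers alternate between calls."""
    return next(flips) % 2 == 0


oracle = ConsistentOracle(flaky_truth)
print(oracle.ask("parent(ann,bob)"))  # → True
print(oracle.ask("parent(ann,bob)"))  # → True (recorded answer reused)
print(oracle.ask("parent(bob,ann)"))  # → False
```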


questions tend to be more demanding of the environment. The questions of Minerva are at least as easy to answer as those of any other active learner. This means that the information gained from answers is limited, and the learning task is harder for Minerva than for other learners that ask more expressive questions.

Angluin [1990] shows that, in theory, allowing a learner to ask questions improves its performance in terms of learning time and example complexity. Indeed, if questions for which the learner can correctly guess the answer are regarded as free of cost, example complexity is remarkably improved [Goldman and Sloan 1994]. As explained by Shapiro [1981], asking questions need not affect the theoretical completeness of the learner, provided that the environment is assumed to supply all possible examples eventually (or, more precisely, an enumeration of the environment).

1.4.8 Interruptibility for anytime learning

Minerva is designed to be the learning component of an agent that usually functions to satisfy goals unrelated to learning. When the agent is unable to satisfy a goal because of an observed failing of its knowledge, Minerva is invoked. Minerva considers methods to correct the error in a best-first manner, allowing it to be interrupted at any moment without warning. It is unimportant whether the interruption is generated externally, by an environmental occurrence or a teacher’s command, or whether the agent’s own goals require a shift of attention. The interruption is assumed only to happen unexpectedly and beyond the control of Minerva.

Minerva’s support for interruption in this way is unique amongst ILP learners. Generally, intelligent agent components designed to improve their results with increased time availability are called anytime algorithms [Dean and Boddy 1988, Poole 1993].

The major challenge of anytime learning is the allocation of time in the best possible manner. Minerva’s consideration of alternative revisions supports incremental improvement: easily evaluated revisions are investigated before more complex alternatives, and information gained during the evaluation directs the subsequent investigation. The best revision found by the time of the interruption is made, and Minerva then awaits further input. Chapter 6 describes how this is done.
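The control regime described above follows the general pattern of an anytime best-first search. The Python sketch below shows that pattern under stated assumptions: each candidate revision carries an estimated evaluation cost, the cheapest is evaluated first, refinements of evaluated candidates join the frontier, and the `should_stop` callback stands in for an interruption from the environment. Interruptions are polled between evaluations rather than pre-empting one, and the heuristic Minerva actually uses is the subject of chapter 6; all names here are hypothetical.

```python
import heapq
import itertools


def anytime_best_first(initial, expand, score, should_stop):
    """Best-first search that returns the best candidate found when stopped.

    initial:     iterable of (estimated_cost, candidate) pairs
    expand:      candidate -> iterable of (estimated_cost, refinement) pairs
    score:       candidate -> quality of the evaluated candidate (higher is better)
    should_stop: () -> True once an interruption has arrived
    """
    tie = itertools.count()                    # avoids comparing candidates on equal cost
    frontier = [(cost, next(tie), cand) for cost, cand in initial]
    heapq.heapify(frontier)
    best, best_score = None, float("-inf")
    while frontier and not should_stop():
        _, _, cand = heapq.heappop(frontier)   # cheapest estimated evaluation first
        value = score(cand)
        if value > best_score:
            best, best_score = cand, value
        for cost, child in expand(cand):
            heapq.heappush(frontier, (cost, next(tie), child))
    return best                                # None if nothing was evaluated


def make_budget(steps):
    """Hypothetical interruption: stop after a fixed number of polls."""
    polls = itertools.count()
    return lambda: next(polls) >= steps


# Toy domain: strings over {a, b}; longer strings cost more to evaluate.
initial = [(1, "a"), (1, "b")]


def expand(s):
    return [(len(s) + 1, s + "a"), (len(s) + 1, s + "b")] if len(s) < 3 else []


def score(s):
    return s.count("a")


print(anytime_best_first(initial, expand, score, make_budget(2)))     # → a
print(anytime_best_first(initial, expand, score, make_budget(1000)))  # → aaa
```

An early interruption still yields a usable answer, while a longer budget lets the search reach better but costlier candidates.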

1.4.9 Theoretical foundation


1.5 Structure of the Thesis

The succeeding chapter illustrates the behaviour of Minerva by way of an extended annotated example of a learning sequence.

Basic concepts from machine learning and logic programming, needed for discussing the learning problem in depth, are introduced in chapter 3. Chapter 4 surveys other relevant research to provide a context for the work of this thesis.

Chapter 5 outlines the architecture of Minerva and deals in depth with interpretation, diagnosis, experimentation and redundancy. Chapter 6 describes the integration of these techniques under a heuristic search control strategy in Minerva.

A tool for environmental modelling called Sampler is described in chapter 7. This tool is used in chapter 8 to evaluate the performance of Minerva empirically: the evaluation procedure is described and the results of empirical tests reported and analysed.

Chapter 9 provides a supplementary theoretical definition and analysis of key components of Minerva underlying the generalization strategy.


2.1 Introduction

In this chapter the learning behaviour of Minerva is illustrated by an extended example based on family relationship concepts. The example represents a complete, unedited learning sequence from a starting point devoid of background knowledge. It is intended to give the reader an idea of the power and range of the capabilities of Minerva before presenting them in more depth. Some formal terminology is used informally in this chapter before its definition in the succeeding chapter. Some readers might prefer to read chapter 3 before this one.

Figure 2.1 describes the family tree for the three families of the example. A generally descending line connects a mother or a father to each of his or her offspring. The families are not conventional, but they have been chosen to demonstrate particular features of Minerva in a small domain.

The order of presentation and timing of the examples is also deliberate. Normally these would be controlled by the environment, but the user-guided presentation given here enables a demonstration of Minerva’s major features. Normally, too, the hypotheses would be verified by experiments in the environment rather than by questions posed to a user; all experiments are indicated here to aid the explanation.

Notation

When Minerva is idle, awaiting input, the prompt mark “> ” is indicated. Examples presented to Minerva as observations are written as an atom prefixed by “+” for a positive example (true atom) or “-” for a negative example (false atom), and terminated with a full stop. When Minerva asks a question it is written as an atom preceded by a question mark (“?”) and followed by a full stop. The answer provided by the environment appears as “y” for yes or true, or “n” for no or false. Sometimes a hypothesis being considered by Minerva is noted for illustrative purposes within the dialogue as a clause preceded by the mark “...”, although it would not normally be displayed by Minerva. When the deliberations of Minerva are not permitted to

Figure 2.1: Example Families (family tree of the three example families)

2.2 Discovering New Facts

Initially the program representing Minerva’s theory is empty. The first few examples observed provide no opportunity for learning anything beyond the examples themselves, which are simply adopted as unit clauses in the program.

> +person(john).

> +person(mary).

> +person(peter).

> +person(james).

> +person(simon).

> +person(cathy).

> +father(john, mary).

The last example does provide an opportunity for the first tentative induction step.

Minerva already knows something about both john and mary, and this background knowledge provides the opportunity for generalization of those people to others like them. Knowing that john is one of several instances of person, the feature john is replaced by the more general feature person, and Minerva asks a question to test the hypothesis father(X, mary) ← person(X).

? father(cathy, mary). n

Because the answer was negative, the alternative hypothesis which replaces mary by any person, father(john, X) ← person(X), seems immediately more interesting and is tested next.

? father(john, cathy). n

With another negative response, Minerva returns attention to the first hypothesis, testing it further.

? father(james, mary). n

Again, attention is turned to the second hypothesis. This time there is a positive response, so work on this hypothesis continues further. The encouragement of the positive response is not outweighed by the succeeding two negative responses, and the testing of the hypothesis continues to completion.

? father(john, james). y

? father(john, john). n

? father(john, peter). n

? father(john, simon). y

There is nothing further to be done with that hypothesis although time remains, so MINERVA tests the other hypothesis further in case there is more to be discovered.
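The alternation of attention between the two competing hypotheses can be imitated by a simple rule: always test next the hypothesis whose answers so far give the best estimate of its accuracy. The Python sketch below assumes a Laplace-corrected estimate (positives + 1) / (answers + 2) with ties going to the hypothesis considered first; this measure and the function names are assumptions for illustration, not the evaluation Minerva uses (chapter 6). The third reply listed for the first hypothesis is hypothetical, included only so that it stays a live competitor. With these inputs the rule happens to reproduce the order of questions in the dialogue above.

```python
from typing import Dict, List, Sequence, Tuple


def laplace(pos: int, neg: int) -> float:
    """Laplace-corrected estimate of a hypothesis's accuracy (an assumed measure)."""
    return (pos + 1) / (pos + neg + 2)


def question_order(answers: Dict[str, Sequence[bool]],
                   order: List[str]) -> List[Tuple[str, bool]]:
    """Repeatedly test the hypothesis with the best estimate so far.

    answers[h] lists the environment's replies to h's test questions, in the
    order they would be asked; ties go to the hypothesis earlier in `order`.
    """
    counts = {h: [0, 0] for h in order}              # [positives, negatives]
    pending = {h: list(answers[h]) for h in order}
    trace = []
    while any(pending[h] for h in order):
        live = [h for h in order if pending[h]]
        h = max(live, key=lambda k: (laplace(*counts[k]), -order.index(k)))
        reply = pending[h].pop(0)
        counts[h][0 if reply else 1] += 1
        trace.append((h, reply))
    return trace


# h1: father(X, mary) <- person(X);  h2: father(john, X) <- person(X)
# (h1's third reply is hypothetical; the dialogue does not show it)
replies = {"h1": [False, False, False],
           "h2": [False, True, False, False, True]}
print([h for h, _ in question_order(replies, ["h1", "h2"])])
# ['h1', 'h2', 'h1', 'h2', 'h2', 'h2', 'h2', 'h1']
```

After the positive answer, the estimate for h2 (0.5, then 0.4, then about 0.33) stays above that of h1 (0.25) despite the two negative answers that follow, which is the effect described in the text.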


Now all possible generalizations of the example are fully tested and Minerva chooses the best one to assimilate into the program. In addition to the best hypothesis, Minerva adopts unit clauses for each of the newly discovered facts which are not covered by the preferred hypothesis. In this case Minerva chooses simply to adopt the example itself as well as the newly discovered facts. Here is the program at this point.

person(john) ←
person(mary) ←
person(peter) ←
person(james) ←
person(simon) ←
person(cathy) ←
father(john, mary) ←
father(john, simon) ←
father(john, james) ←
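The first induction step of this section can be illustrated with a deliberately naive operator: replace one constant of the example by a variable X, taking as the clause body a background fact that mentions that constant, and then list the head instances the body supports, which are the facts that would need confirming by questions. This Python sketch is an assumed simplification for illustration only; the operator Minerva actually uses is developed later in the thesis, the function names are hypothetical, and the order in which Minerva selects its questions is not modelled.

```python
from typing import List, Optional, Tuple

Atom = Tuple[str, ...]      # (predicate, arg1, arg2, ...); "X" marks the variable
Clause = Tuple[Atom, Atom]  # (head, single body literal)


def one_step_generalizations(example: Atom, background: List[Atom]) -> List[Clause]:
    """Replace one constant of the example by the variable X, using each
    background fact that mentions that constant as the clause body."""
    clauses: List[Clause] = []
    for pos in range(1, len(example)):
        const = example[pos]
        for fact in background:
            if const not in fact[1:]:
                continue
            head = example[:pos] + ("X",) + example[pos + 1:]
            body = (fact[0],) + tuple("X" if a == const else a for a in fact[1:])
            if (head, body) not in clauses:
                clauses.append((head, body))
    return clauses


def bind_x(pattern: Atom, fact: Atom) -> Optional[str]:
    """Value of X if `pattern` matches `fact`, else None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    value: Optional[str] = None
    for p, f in zip(pattern[1:], fact[1:]):
        if p == "X":
            if value is not None and value != f:
                return None
            value = f
        elif p != f:
            return None
    return value


def candidate_questions(clause: Clause, background: List[Atom], example: Atom) -> List[Atom]:
    """Instances of the head, other than the example, supported by the body:
    the facts that would have to be confirmed with questions."""
    head, body = clause
    out: List[Atom] = []
    for fact in background:
        value = bind_x(body, fact)
        if value is None:
            continue
        instance = (head[0],) + tuple(value if a == "X" else a for a in head[1:])
        if instance != example and instance not in out:
            out.append(instance)
    return out


BACKGROUND = [("person", p) for p in ["john", "mary", "peter", "james", "simon", "cathy"]]
EXAMPLE = ("father", "john", "mary")

for clause in one_step_generalizations(EXAMPLE, BACKGROUND):
    print(clause, candidate_questions(clause, BACKGROUND, EXAMPLE))
```

The two clauses produced are father(X, mary) ← person(X) and father(john, X) ← person(X), and every question in the dialogue above (for example father(cathy, mary) and father(john, simon)) appears among their candidate instances.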

2.3 Learning from Background Knowledge

Now we’ll allow Minerva to build up its theory quickly with some more facts. Although Minerva can and does develop inductive hypotheses from these examples, we do not allow the time for developing the hypotheses, and so the facts themselves are adopted as unit clauses.

> +father(mark, john). !

> +father(mark, andrew). !

> +father(andrew, emily). !

> +father(andrew, alison).

Now Minerva uses the background knowledge it has built up to learn a clause describing a new concept, grandfather. Here, as before, a negative answer for the first hypothesis test causes the focus of attention to shift to an alternative hypothesis.

> +grandfather(mark, alison).

... grandfather(X, alison) ← father(X,Y)

? grandfather(andrew, alison). n

... grandfather(mark, X) ← father(X,Y)

? grandfather(mark, andrew). n

The next hypothesis tried is a specialization of the first one which has been rejected:


... grandfather(mark, X) ← father(andrew, X)

? grandfather(mark, emily). y

The positive answer here confirms every fact covered by this hypothesis, so the hypothesis is generalized further, using the background knowledge father(mark, andrew) ←.

... grandfather(X,Y) ← father(X,Z), father(Z,Y)

? grandfather(mark, james). y

? grandfather(mark, mary). y

? grandfather(mark, simon). y

These answers confirm all the facts covered by that hypothesis. Further hypotheses are investigated by Minerva, but no further questions are required to evaluate them. Eventually Minerva chooses to adopt the latter hypothesis.
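The set of facts covered by grandfather(X,Y) ← father(X,Z), father(Z,Y) is a relational join of father with itself on Z, and only the covered facts not already known need to be confirmed by questions. The Python sketch below assumes, for illustration, that the known grandfather facts are exactly the observed example and the confirmed answer about emily; Minerva’s fact memory may hold more, and the names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# father/2 facts in the theory at this point in the dialogue
FATHER: Set[Pair] = {
    ("john", "mary"), ("john", "simon"), ("john", "james"),
    ("mark", "john"), ("mark", "andrew"),
    ("andrew", "emily"), ("andrew", "alison"),
}


def covered_grandfather(father: Set[Pair]) -> Set[Pair]:
    """Facts derived by grandfather(X,Y) <- father(X,Z), father(Z,Y): a join on Z."""
    return {(x, y) for (x, z) in father for (z2, y) in father if z == z2}


# grandfather facts already known: the observed example and the confirmed answer
KNOWN = {("mark", "alison"), ("mark", "emily")}

to_confirm = sorted(covered_grandfather(FATHER) - KNOWN)
print(to_confirm)  # [('mark', 'james'), ('mark', 'mary'), ('mark', 'simon')]
```

The three facts left to confirm are exactly those asked about in the dialogue.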

2.4 Returning to an Earlier Concept

Earlier, Minerva learnt some facts about the concept person. Here Minerva observes some more examples of persons, and finds a way to describe them in terms of other concepts learnt in the intervening period. Several hypotheses are investigated by Minerva, but shown here is the only one which gives rise to some questions and which is eventually adopted.

> +person(emily).

... person(X) ← father(Y,X)

? person(alison). y

? person(andrew). y

2.5 Interruption

Now we aim to build up some more of the theory to prepare for demonstration of some more interesting behaviour. The search by Minerva for hypotheses to cover each of the following examples is prematurely terminated by external interruption, so each example is simply added as a fact to the program. We can assume that the interruption corresponds to an inability or unwillingness of the environment to answer the question, or to the need of the agent in which Minerva is embedded to focus resources elsewhere.

> +mother(sally, james). !

> +mother(sally, mary). !

> +married(john, sally). !

> +married(andrew, alice). !

> +mother(alice, andrea).

2.6 Learning a Symmetric Relation

The next example prompts a simple symmetry hypothesis to be generated using the background clause married(andrew, alice) ←.

> +married(alice, andrew).

... married(X,Y) ← married(Y,X)

? married(sally, john). y

Minerva would continue to generate and test other hypotheses if not for the interruption here. Instead, the hypothesis married(X,Y) ← married(Y,X) is adopted. This hypothesis is an example of a clause which can contribute considerably to the conciseness of the program representation of a theory: for every married couple henceforth only one side of the relationship need be presented as an example, and Minerva immediately recognizes the dual fact. Such a clause can, however, create difficulties for interpretation of the program by standard logic program interpreters, since a naive top-down interpreter may apply it to its own conclusion indefinitely. Minerva’s interpreter has no difficulty with it.
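This chapter does not say how Minerva’s interpreter avoids the loop. One standard technique that does avoid it is bottom-up (forward-chaining) evaluation to a fixpoint, sketched below in Python as an illustration only; the function names are hypothetical. Because the rules are function-free, only finitely many facts can be built from the constants present, so the iteration must stop.

```python
from typing import Callable, Iterable, Set, Tuple

Fact = Tuple[str, ...]
Rule = Callable[[Set[Fact]], Set[Fact]]


def symmetric_married(known: Set[Fact]) -> Set[Fact]:
    """married(X,Y) <- married(Y,X), applied once to every known married fact."""
    return {("married", f[2], f[1]) for f in known if f[0] == "married"}


def least_model(facts: Iterable[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    """Apply all rules until no new fact appears (terminates for function-free
    rules, unlike a naive top-down prover that keeps re-entering married(Y,X))."""
    model = set(facts)
    rules = list(rules)
    while True:
        new = set().union(*(r(model) for r in rules)) - model
        if not new:
            return model
        model |= new


model = least_model({("married", "john", "sally"),
                     ("married", "andrew", "alice")}, [symmetric_married])
print(("married", "sally", "john") in model)  # True
print(len(model))  # 4
```

Tabled resolution in some Prolog systems achieves the same termination while keeping a top-down reading.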

The following sequence gives the first example of error diagnosis in action. In this case missing answer diagnosis is employed, but we will suspend the explanation of diagnosis until we have a more complex program and hence a more comprehensive example. For the present the reader should note that the symmetric relation is easily handled by the diagnoser, although conventional declarative diagnosis techniques would not terminate in diagnosing this missing answer. Again, we interrupt learning prematurely here to keep it brief. In the following, the questions asked before the first inductive hypothesis (indicated by “...”) are employed in the diagnosis stage of learning.

> +mother(robyn, kim).

> +mother(robyn, peter). !

> +married(richard, robyn).

? married(robyn, richard). y

... some uninteresting hypotheses

!

> +married(cathy, mark).

? married(mark, cathy). y

... some uninteresting hypotheses

!


2.7 Learning Exceptions

So far, negative answers to questions have caused inductive hypotheses to be rejected outright. In the following sequences we see two ways that Minerva can modify a hypothesis to exclude counter-examples discovered during experimentation (or even previously observed). First, the second-last hypothesis in the next sequence is specialized to generate the last one.

> +father(richard, peter).

... father(richard, X) ← person(X)

? father(richard, cathy). n

... father(richard, X) ← mother(Y,X)

? father(richard, andrea). n

... father(richard, X) ← mother(robyn, X)

? father(richard, kim). y

Next we see how counter-examples can be handled by inventing an exception predicate. This is done in preference to rejecting a hypothesis when the next best hypothesis is so poor that working around the exceptions results in a better hypothesis. The ability to adopt a hypothesis even in the presence of counter-examples gives Minerva some resilience to noise in the environment and to errors in its background knowledge. In the next sequence the previous hypothesis, having been fully tested, is generalized further using the background clause married(richard, robyn) ←.

... father(X,Y) ← married(X,Z), mother(Z,Y)

? father(andrew, andrea). y

? father(andrew, mia). n

A negative answer to this last question causes attention to be turned to other hypotheses. After investigating others, without need for further questions, Minerva returns to test this one further:

? father(mark, matthew). y

Now it is fully tested and one counter-example was discovered. But there are still some other less promising hypotheses to be investigated in the available time.

... father(richard, X) ← mother(Y,Z), mother(Y,X)

? father(richard, james). n

!

At interruption, Minerva has not exhausted all possible inductive hypotheses, but the best one so far, comprising two clauses, is adopted.

father(X,Y) ← married(X,Z), mother(Z,Y), ~father0(X,Y)
father0(andrew, mia) ←
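Read with negation as failure, the adopted rule derives father(X,Y) unless father0(X,Y) is listed as an exception. The Python sketch below evaluates it over a subset of the theory’s married and mother facts (with married first closed under symmetry); the function name is hypothetical, and the set-based evaluation is an illustration, not Minerva’s interpreter.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def fathers(married: Set[Pair], mother: Set[Pair], father0: Set[Pair]) -> Set[Pair]:
    """father(X,Y) <- married(X,Z), mother(Z,Y), ~father0(X,Y).

    married is closed under symmetry first (married(X,Y) <- married(Y,X));
    ~ is negation as failure: an exception blocks the rule only if it is listed.
    """
    spouses = married | {(b, a) for (a, b) in married}
    return {(x, y)
            for (x, z) in spouses
            for (z2, y) in mother
            if z == z2 and (x, y) not in father0}


MARRIED = {("andrew", "alice"), ("richard", "robyn")}
MOTHER = {("alice", "andrea"), ("alice", "mia"),
          ("robyn", "kim"), ("robyn", "peter")}
FATHER0 = {("andrew", "mia")}   # the exception adopted above

print(sorted(fathers(MARRIED, MOTHER, FATHER0)))
# [('andrew', 'andrea'), ('richard', 'kim'), ('richard', 'peter')]
print(("andrew", "mia") in fathers(MARRIED, MOTHER, set()))  # True
```

The single exception clause removes the counter-example father(andrew, mia) while the rule continues to account for the other three facts.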


2.8 Removing Redundant Clauses

Two more examples will enhance the illustration of redundancy detection.

> +mother(sally, simon). !

> +mother(cathy, john). !

Now Minerva has constructed the following program to represent a theory of families. The clauses marked by “*” are redundant in the sense that every fact they contribute to the theory is also covered by another clause.

* person(john) ←
* person(mary) ←
* person(peter) ←
* person(james) ←
* person(simon) ←
person(cathy) ←
* father(john, mary) ←
* father(john, simon) ←
* father(john, james) ←
* father(mark, john) ←
father(mark, andrew) ←
father(andrew, emily) ←
father(andrew, alison) ←
grandfather(X,Y) ← father(X,Z), father(Z,Y)
person(X) ← father(Y,X)
mother(sally, james) ←
mother(sally, mary) ←
married(john, sally) ←
married(andrew, alice) ←
mother(alice, mia) ←
mother(alice, andrea) ←
married(X,Y) ← married(Y,X)
mother(robyn, kim) ←
mother(robyn, peter) ←
married(richard, robyn) ←
married(cathy, mark) ←
mother(cathy, matthew) ←
father(X,Y) ← married(X,Z), mother(Z,Y), ~father0(X,Y)
father0(andrew, mia) ←
mother(sally, simon) ←
mother(cathy,

as time allows; sleeping may be interrupted at any time.

> sleep

The program representing the theory of families is simplified by removing each redundant clause from the program. These comprise several unit clauses about person and father, as the facts they describe are also covered by other clauses in the program.
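A clause is redundant when removing it leaves the set of implied facts unchanged. The Python sketch below applies that test greedily, one clause at a time, recomputing the implied facts after each removal so that two clauses covering the same fact are not both discarded. The greedy order, the representation of rules as Python functions, and all names are assumptions for illustration; Minerva’s redundancy detection is treated in chapter 5.

```python
from typing import Callable, List, Set, Tuple, Union

Fact = Tuple[str, ...]
Rule = Callable[[Set[Fact]], Set[Fact]]
Clause = Union[Fact, Rule]   # a unit clause (fact) or a rule


def least_model(program: List[Clause]) -> Set[Fact]:
    """All facts implied by the program, by forward chaining to a fixpoint."""
    model = {c for c in program if isinstance(c, tuple)}
    rules = [c for c in program if not isinstance(c, tuple)]
    while True:
        new = set().union(*(r(model) for r in rules)) - model
        if not new:
            return model
        model |= new


def remove_redundant(program: List[Clause]) -> List[Clause]:
    """Greedily drop each clause whose removal leaves the implied facts unchanged."""
    target = least_model(program)
    kept = list(program)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if least_model(trial) == target:
            kept = trial          # clause i contributed nothing new
        else:
            i += 1
    return kept


def person_of_child(model: Set[Fact]) -> Set[Fact]:
    """person(X) <- father(Y,X)"""
    return {("person", f[2]) for f in model if f[0] == "father"}


PROGRAM: List[Clause] = [
    ("person", "john"),
    ("person", "cathy"),
    ("father", "mark", "john"),
    person_of_child,
]
print(remove_redundant(PROGRAM)[:2])  # [('person', 'cathy'), ('father', 'mark', 'john')]
```

As in the program above, person(john) is dropped because the rule person(X) ← father(Y,X) already implies it, while person(cathy) is kept.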

2.9 Diagnosis

Minerva has a short-term memory for facts distinct from the memory for a program. When examples are observed or questions are answered, the facts and their validity are stored in the finite fact memory in a first-in-first-out allocation scheme. These facts are typically (although not always) consistent with the theory when Minerva is idle. The purpose of the fact memory is simply to reduce the number of questions required in diagnosis and experimentation: a question is not asked if the answer is found in the fact memory. The fact memory is only short-term because the facts are also represented in the theory, usually in a more compact form, and so they do not contribute to the knowledge of the learner. The size of the fact memory affects the number of questions asked of the environment by Minerva, but otherwise does not affect learning performance.
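The text specifies only that the fact memory is finite and allocated first-in-first-out. The Python sketch below assumes a capacity parameter, a hypothetical class name, and a policy in which re-recording a fact counts as a new arrival; it shows how consulting the memory before asking saves questions, and how an evicted fact must be asked about again.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class FactMemory:
    """Bounded first-in-first-out store of atoms and their truth values (a sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._facts: "OrderedDict[Hashable, bool]" = OrderedDict()

    def record(self, atom: Hashable, value: bool) -> None:
        self._facts.pop(atom, None)          # re-recording counts as a new arrival
        self._facts[atom] = value
        if len(self._facts) > self.capacity:
            self._facts.popitem(last=False)  # evict the oldest fact

    def ask(self, atom: Hashable, oracle: Callable[[Hashable], bool]) -> bool:
        """Answer from memory if possible; otherwise put a question to the environment."""
        if atom in self._facts:
            return self._facts[atom]
        value = oracle(atom)
        self.record(atom, value)
        return value


questions = []


def environment(atom):
    questions.append(atom)                   # count each question actually asked
    return atom == ("father", "mark", "andrew")


memory = FactMemory(capacity=2)
for atom in [("father", "mark", "andrew"), ("father", "mark", "andrew"),
             ("father", "andrew", "michael"), ("mother", "alice", "michael"),
             ("father", "mark", "andrew")]:
    memory.ask(atom, environment)
print(len(questions))  # 4: the repeat is answered from memory, the last was evicted
```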

Here we illustrate diagnosis when an error in the theory is not due to an error in the definition of the concept of the example but rather to an error in a concept on which the definition depends. This example particularly demonstrates contradiction backtracing diagnosis, invoked for an error in a negative example. First, two more examples prepare for the diagnosis example.

> +mother(alice, michael). !

> +stepfather(andrew, michael).

At this point grandfather(mark, michael) is true in the theory, using some unit clauses and the rules

grandfather(X,Y) ← father(X,Z), father(Z,Y)

father(X,Y) ← married(X,Z), mother(Z,Y), ~father0(X,Y)

> -grandfather(mark, michael).

Contradiction backtracing diagnosis commences by verifying each of the antecedent facts which contribute to the false conclusion of the first rule: father(mark, andrew) is confirmed to be true in the fact memory and father(andrew, michael) is checked with a question.


But the second rule implied that the answer should be “y”. So now the diagnosis procedure checks the antecedents of that rule: married(andrew, alice) and mother(alice, michael) are each confirmed true in the fact memory, and father0 cannot be asked about because it is an internally invented concept and has no meaning in the environment.

Minerva must conclude that either the rule is wrong or there should be another exception to the rule: father0(andrew, michael).
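The backtracing just described can be sketched as a walk down the proof of the false atom: ask about each askable antecedent, descend into the first one found false, and blame the clause used at a node whose askable antecedents are all true. The proof-tree representation, the clause labels and the function names below are hypothetical, and the sketch stops at naming the rule; as the text explains, Minerva then has two remedies, revising the rule or adding an exception.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Atom = Tuple[str, ...]


@dataclass
class ProofNode:
    atom: Atom
    clause: str                          # label of the clause used to derive atom
    children: List["ProofNode"] = field(default_factory=list)


def backtrace(node: ProofNode, oracle: Callable[[Atom], bool],
              askable: Callable[[Atom], bool]) -> str:
    """node.atom is known to be false. Follow a false antecedent downward;
    if every askable antecedent is true, blame the clause used at this node."""
    for child in node.children:
        if askable(child.atom) and not oracle(child.atom):
            return backtrace(child, oracle, askable)
    return node.clause


# Proof of the false example grandfather(mark, michael), as in the text.
proof = ProofNode(("grandfather", "mark", "michael"), "grandfather rule", [
    ProofNode(("father", "mark", "andrew"), "unit clause"),
    ProofNode(("father", "andrew", "michael"), "father rule", [
        ProofNode(("married", "andrew", "alice"), "unit clause"),
        ProofNode(("mother", "alice", "michael"), "unit clause"),
        ProofNode(("~father0", "andrew", "michael"), "negation as failure"),
    ]),
])

TRUE_IN_ENVIRONMENT = {("father", "mark", "andrew"),
                       ("married", "andrew", "alice"),
                       ("mother", "alice", "michael")}
INTERNAL = {"father0", "~father0"}       # invented predicates cannot be asked about

blamed = backtrace(proof, TRUE_IN_ENVIRONMENT.__contains__,
                   lambda a: a[0] not in INTERNAL)
print(blamed)  # father rule
```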

2.10 Generalizing an Exception

Because the rule in question is useful in accounting for several true facts in the theory, in this case Minerva chooses to retain the rule but to record another exception to it. So father0(andrew, michael) is assumed missing from the theory and Minerva goes on to generalize the concept as for any learning example. Indeed, Minerva finds and adopts a good generalization without any need to ask further questions:

father0(X,Y) ← stepfather(X,Y).

Now Minerva will learn some more about stepfathers, firstly by an example of a stepfather.

> +stepfather(andrew, mia).

At this point the unit clause corresponding to the example is adopted, and as a consequence father(andrew, mia) is false in the theory. Minerva may also be prompted to generalize stepfather by a negative example of a father. In this case only one question of the diagnosis phase is apparent because the other answers are available in the fact memory.

> +mother(kim, kate). !

> +mother(kim, kris). !

> +married(bill, kim).

? married(kim, bill). y

!

> -father(bill, kris).

? stepfather(bill, kris). y

... stepfather(bill, X) ← mother(Y,X)

? stepfather(bill, andrea). n

... stepfather(X, kris) ← married(X,Y)

? stepfather(andrew, kris). n

... stepfather(bill, X) ← mother(kim, X)

? stepfather(bill, kate). y

? stepfather(john, james). n

The best hypothesis found before interruption is stepfather(bill, X) ← mother(kim, X). This hypothesis is adopted because it is useful in the present environment; if Minerva later learns of other children of kim and bill, it may be revised or removed.

2.11 Missing Answer Diagnosis

Until now the examples of diagnosis have focused on the identification of a false rule. Here missing answer diagnosis aims to find whether the missing atom of grandfather is due to a missing atom in a concept on which it depends. Only one clause of the program, grandfather(X,Y) ← father(X,Z), father(Z,Y), could account for the missing atom through missing atoms in sub-concepts.

> +grandfather(richard, kris).

? father(kim, kris). n

? father(peter, kris). n

There is no substitution for the variables in the clause that would make the antecedents true facts in the theory and imply the desired consequent. There are, however, two possibilities for the first antecedent: father(richard, kim) and father(richard, peter) are true in the theory. But questions show that the second antecedent cannot be satisfied in either case. Minerva is unable to ask an existential question of the form “Is richard the father of anyone else?” and so is forced to assume that this rule is not appropriate for concluding grandfather(richard, kris), and thus that atom is diagnosed as uncovered. Minerva proceeds to generalize it.
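For this single rule the diagnosis step can be sketched directly: for each Z with father(richard, Z) true in the theory, ask whether father(Z, kris) holds; a “y” would locate the missing fact in the sub-concept father, while all “n” answers leave the atom diagnosed as uncovered. Because no existential question is available, only bindings already present in the theory are tried. The function below is hypothetical and specialized to this one clause; it is not Minerva’s general procedure.

```python
from typing import Callable, List, Set, Tuple

Atom = Tuple[str, ...]
Pair = Tuple[str, str]


def diagnose_missing_grandfather(x: str, y: str, father_in_theory: Set[Pair],
                                 oracle: Callable[[Atom], bool]) -> Tuple[str, Atom]:
    """Missing-answer diagnosis for grandfather(x, y), whose only rule is
    grandfather(X,Y) <- father(X,Z), father(Z,Y).

    For each Z with father(x, Z) true in the theory, ask whether father(Z, y)
    holds. A 'yes' locates the missing fact in the sub-concept father; if every
    answer is 'no', the atom itself is diagnosed as uncovered.
    """
    for (a, z) in sorted(father_in_theory):          # sorted only for determinism
        if a == x and oracle(("father", z, y)):
            return ("missing sub-fact", ("father", z, y))
    return ("uncovered", ("grandfather", x, y))


asked: List[Atom] = []


def environment(atom: Atom) -> bool:
    asked.append(atom)
    return False                       # both questions are answered "n" in the text


FATHER_IN_THEORY = {("richard", "kim"), ("richard", "peter"), ("mark", "andrew")}
print(diagnose_missing_grandfather("richard", "kris", FATHER_IN_THEORY, environment))
# ('uncovered', ('grandfather', 'richard', 'kris'))
print(asked)  # [('father', 'kim', 'kris'), ('father', 'peter', 'kris')]
```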

... grandfather(X, kris) ← married(X,Y)

? grandfather(andrew, kris). n

... grandfather(richard, X) ← mother(Y,X)

? grandfather(richard, andrea). n

... grandfather(richard, X) ← mother(kim, X)

? grandfather(richard, kate). y

!

The clause grandfather(X,Y) ← father(X,Z), mother(Z,Y) would be considered, and eventually adopted, if Minerva were permitted to continue further, but at this time the less general clause grandfather(richard, X) ← mother(kim, X) is the best found and is adopted upon interruption.

2.12 Replacing a Redundant Rule


Minerva at the time of its adoption. Later, when more is known, a more general hypothesis may be more appropriate. Here Minerva demonstrates its ability to replace the true hypothesis about grandfather just learnt by a more general one at a later time. First we introduce another family to Minerva, noting the ease with which Minerva is able to learn more about person, interspersed with other concepts. Indeed, Minerva finds a hypothesis about person expressed in terms of another concept which has only become available in the meantime.

> +father(paul, paula).

> +mother(paula, philip). !

> +mother(paula, betty). !

> +mother(paula, karen). !

> +person(karen).

... person(X) ← mother(Y,X)

? person(betty). y

? person(kate). y

? person(kris). y

? person(mia). y

? person(michael). y

? person(philip). y

!

Now we give another grandfather example. The question in the diagnosis stage indicates an examination of the grandfather clause in terms of a father of a father.

> +grandfather(paul, karen).

? father(paula, karen). n

... grandfather(X, karen) ← father(X,Y)

? grandfather(andrew, karen). n

... grandfather(paul, X) ← mother(Y,X)

? grandfather(paul, andrea). n

... grandfather(paul, X) ← mother(paula, X)

? grandfather(paul, betty). y

? grandfather(paul, philip). y

... grandfather(X,Y) ← father(X,Z), mother(Z,Y)

... grandfather(X,Y) ← person(Y), father(X,Z)

? grandfather(andrew, cathy). n

!

person(cathy) ←
father(mark, andrew) ←
father(andrew, emily) ←
father(andrew, alison) ←
grandfather(X,Y) ← father(X,Z), father(Z,Y)
person(X) ← father(Y,X)
mother(sally, james) ←
mother(sally, mary) ←
married(john, sally) ←
married(andrew, alice) ←
mother(alice, mia) ←
mother(alice, andrea) ←
married(X,Y) ← married(Y,X)
mother(robyn, kim) ←
mother(robyn, peter) ←
married(richard, robyn) ←
married(cathy, mark) ←
mother(cathy, matthew) ←
father(X,Y) ← married(X,Z), mother(Z,Y), ~father0(X,Y)
* father0(andrew, mia) ←
mother(sally, simon) ←
mother(cathy, john) ←
stepfather(andrew, michael) ←
father0(X,Y) ← stepfather(X,Y)
mother(alice, michael) ←
stepfather(andrew, mia) ←
mother(kim, kate) ←
mother(kim, kris) ←
married(bill, kim) ←
stepfather(bill, X) ← mother(kim, X)
* grandfather(richard, X) ← mother(kim, X)
father(paul, paula) ←
mother(paula, philip) ←
mother(paula, betty) ←
mother(paula, karen) ←
person(X) ← mother(Y,X)
grandfather(X,Y) ← father(X,Z), mother(Z,Y)

The redundant clauses (those that are marked, including the earlier clause for grandfather) may be removed.


2.13 Simplifying Exceptions

There is one more feature of Minerva to highlight here. We have seen that exceptions to a clause may be recognized and collected as they are discovered. Sometimes the exceptions may become so numerous and complex that the rule combined with the exceptions becomes more complex than the observational facts it accounts for!
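This chapter does not state the complexity measure Minerva applies. As a purely illustrative assumption, the balance can be pictured by counting literals: a rule plus its exception clauses is compared with the unit clauses that would replace it, each unit clause and each exception costing one literal. The function names and the numbers in the usage lines are hypothetical.

```python
from typing import List


def program_size(clause_lengths: List[int]) -> int:
    """Size of a set of clauses as total literal count (an assumed measure)."""
    return sum(clause_lengths)


def prefer_unit_clauses(rule_lengths: List[int], n_exceptions: int,
                        n_facts_only_rule_covers: int) -> bool:
    """True when the rule plus its exception clauses is larger than the unit
    clauses that would replace it (each unit clause and exception is 1 literal)."""
    with_rule = program_size(rule_lengths + [1] * n_exceptions)
    without_rule = program_size([1] * n_facts_only_rule_covers)
    return without_rule < with_rule


# A 4-literal rule with a 2-literal exception rule (toy numbers only):
print(prefer_unit_clauses([4, 2], n_exceptions=1, n_facts_only_rule_covers=10))  # False
print(prefer_unit_clauses([4, 2], n_exceptions=4, n_facts_only_rule_covers=8))   # True
```

Under such a measure each new exception adds cost, so a run of counter-examples can eventually tip the balance towards replacing the rule by unit clauses.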

In the next sequence Minerva observes many counter-examples to the father rule until it is eventually replaced. If the counter-examples are themselves stepfathers, this will not add complexity, because they are simply represented as previously unknown instances of stepfather. Instead, they must be counter-examples to father which are not instances of stepfather. Perhaps they are adult children of paula who do not care to acknowledge a relationship with paula’s present husband, tom. In order to clarify the procedure here, assume that the short-term fact memory is empty at this point. This enables us to see the reasoning of Minerva by the questions asked.

> +married(paula, tom).

? married(tom, paula). y

!

> -father(tom, philip).

? mother(paula, philip). y

? stepfather(tom, philip). n

? father(john, james). y

? father(john, mary). y

? father(john, simon). y

... father0(tom, X) ← mother(X,Y)

In this dialogue, Minerva diagnosed a missing exception to father but then checked some of the other father facts covered by the excepted clause. Those facts were true, so without checking all of them, it seems better to add another exception and then to attempt to generalize the exception. In this case no good generalization was found, and the exception hypothesis father0(tom, philip) ← is adopted. Now we define some more children of paula so that there will be a sufficient number of exceptions. For each, the unit clause hypothesis is adopted.

> +mother(paula, rowena). !

> +mother(paula, sam). !

> +mother(paula, tim).

Here, each counter-example causes Minerva to check one more fact the rule accounts for, before deciding to add yet another exception.

> -father(tom, sam).

? stepfather(tom, sam). n

? father(mark, matthew). y

!

> -father(tom, tim).

? stepfather(tom, tim). n

? father(mark, john). y

The next counter-example, together with some more counter-examples found while checking more covered facts, is enough to tip the balance.

> -father(tom, karen).

? mother(paula, karen). y

? stepfather(tom, karen). n

? father(tom, betty). n

? father(richard, kim). y

? father(richard, peter). y

>

Minerva decides to remove the father clause that has been so troublesome and each of its exceptions:

father(X,Y) ← married(X,Z), mother(Z,Y), ~father0(X,Y)
father0(X,Y) ← stepfather(X,Y)
father0(tom, philip) ←
father0(tom, rowena) ←
father0(tom, sam) ←
father0(tom, tim) ←

They are replaced by unit clauses covering each of the true facts that only this clause included in the theory. That is, each clause defining father0 is removed and the following clauses are added.
