• No results found

A corpus based approach to generalise a chatbot system

N/A
N/A
Protected

Academic year: 2021

Share "A corpus based approach to generalise a chatbot system"

Copied!
179
0
0

Loading.... (view fulltext now)

Full text

(1)

Bayan Aref Abu Shawar

Submitted in accordance with the requirements for the degree of Doctor of Philosophy

University of Leeds School of Computing

April 2005

The candidate confirms that the work submitted is his/her own and that appropriate credit has been given where reference has been made to the work

of others.

This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper

(2)

12345647 466897 37 647 6397 227 4347 427 997 97 34387 83397 297 426867 94347 7 427 97 6947 8697 5847 427 994973746766879974234742737438 74273872397!746767 6947 23456497 97 397 37 46687 467 397 997 42627 23447 427 37 65647 "6#7427 6875399767386947388723456497374733887227 944979974679783397376397 7 2974299792697423472345647426867687579773747397 73467467573746687672377$7234564768757973973746687467 83767467947377833%7374668746739973763467994#73746687467 9389742764497673769%737374668746773997467&94697737 97 637 '9437 67 57 9447 467 37 97 637 67 447 833#737234564768757437427374(477378337)6767427 9754738723769346973723*23456473869737 947 7 $7+33763723975786746737374(476737327335874(47 ,69-73764747467$.'1/7234564763478337,$'.-727637 39758474675738#742738477429794789#76794469767 97833#763#767944704783397749417/892#7 $35#7 $ 339#7 22#7 37 )3927 $47 427 937 47 47 6637 94477917386#76686#73794474(477 7 1234779737464674674299765349737944712347479497 2345647 2946#7 37 69564346387 89497 12347 67 99997 427 6637977437427$.'1/72345647994#7372347779497427 69*437 23456497 12347 87 3897 427 383467 37 397 67 47 6 7379477234777

(3)

12345678429792942828449432292942294249222643422

923728762732228292292942724426432

2942748269432462 2

1. Abu Shawar B. and Atwell E. 2002. A comparison between ALICE and Elizabeth chatbot systems. School of Computing research report 2002.19, University of Leeds.

2. Abu Shawar B. and Atwell E. 2003. Machine Learning from dialogue corpora to generate chatbots. Expert Update journal, Vol. 6, No 3, pp 25-30

3. Abu Shawar B. and Atwell E. 2003. Using dialogue corpora to retrain a chatbot system. In Archer, D., Rayson, P., Wilson, A. and McEnery, T. (eds.) Proceedings of the Corpus Linguistics 2003 conference. Lancaster University, UK, pp. 681-690.

4. Abu Shawar B. and Atwell E. 2003. Using the Corpus of Spoken Afrikaans to generate an Afrikaans chatbot. SALALS Journal: Southern African Linguistics and Applied Language Studies. Vol. 21, pp. 283-294. 5. Abu Shawar B. 2003. Review of CL2003, the International Conference on

Corpus Linguistics”, ELRA newsletter. Vol. 8, No. 2, pp. 4-6.

6. Abu Shawar B. and Atwell E. 2004. An Arabic chatbot giving answers from the Qur'an. In Bel, B & Marlien, I (eds.) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles. Vol. 2, pp. 197-202.

7. Atwell E.; Al-Sulaiti L.; Al-Osaimi S.; Abu Shawar B. 2004. A review of Arabic corpus analysis tools. In Bel, B & Marlien, I (eds.) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles. Vol. 2, pp. 229-234.

8. Abu Shawar B. 2004. The NWCL Research Training Programme. ELSnews: The Newsletter of the European Network in Human Language Technologies. Vol. 13.1, pp. 14-15.

9. Abu Shawar B. and Atwell E. 2004. A chatbot as a novel corpus visualisation tool. In Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R., Pereira, C., Caevalho, F., Lopes, M., Catarino, M. and Barros, S. (eds.) Proceedings of Fourth International Conference on Language Resources and Evaluation (LREC’04). Vol. VI, pp. 2057-2060.

(4)

Proceedings of Eight Maghrebian Conference on Software Engineering and Artificial Intelligence.

11. Abu Shawar B. and Atwell E. 2004. Accessing an information system by chatting. In Meziane, F. and Metais, E. (eds.) Proceedings of 9th International conference on the Application of Natural Language to Information Systems, NLDB 2004 Salford, UK. LNCS Lecture Notes in Computer Science, Springer, pp. 396-401

12. Abu Shawar B. and Atwell E. 2005. Modelling turn-taking in a corpus-trained chatbot. In Fisseni B., Schmitz H., Schröder B. and Wagner P. (eds.), Sprachtechnologie, Mobile Communication und Linguistische Ressourcen (GLDV05). Peter Lang Verlag, Bonn, pp. 549-561.

13. Abu Shawar B., Atwell E. and Roberts, A. 2005. FAQchat as an information retrieval system. In: Zygmunt V. (ed.), Human Language Technologies as a Challenge for Computer Science and Linguistics: Proceedings of the 2ndLanguage and Technology Conference, Wydawnictwo Poznanskie, Poznan, pp. 274-278.

14. Abu Shawar B and Atwell E. 2005. A chatbot system as a tool to animate a corpus. ICAME Journal. Vol. 29, pp. 5-23.

15. Abu Shawar B. and Atwell E. 2005. Using corpora in machine-learning chatbot systems. International Journal of Corpus Linguistics [forthcoming].

(5)

“Do not consider even the smallest good deed as insignificant; even meeting your brother with a cheerful face (is a good deed)."

Prophet Mohammed (pbuh)

1234563278639393456389437934563894386122324567827925292 292 2 582 52 59892 2 49227228292 698922562772289967925229245282572912399772 2528996725722324282492429258252292 912324229277282324269252425252324567822 7252498222529224252297227922 982 82 2 512 582 2 52 892 542 672 32 2 52 582 52 59892 2 492 772 2 562 32 882 52 2 72 52 82 752 82 52 5989224927722758257245232228245222772 2282752328812 2 2 2 582 32 456782 792 52 2 2 82 9682 52 2 12 2 562 52 562 872 752 52 562 2 82 52 92 492 2 92 2 227729123!25262525622528252249277252 5629562229232982558123!25288252772252 562724598256228692225812 9229229252562 49232524977225622825682525628612 25622 92"25277227572#5282256228925222 622562529224232825612$2252252 $582%282$5252926528256128252 2792452292562282925212322522 522925426723225256277122 2 &692 2 92 52 52 92 92 2 72 82 "52 2 52 2 82 958682 972 82 982 2 562 6952 42 52 57982 82 92 32 82 9252529252291232425292529282924229292 322912'922922222925245232625242 32822(6952522297257282929252572622 92 52 9872 62 752 62 76972 92 82 42 982 912 2 562 82 52 92 2 982 52 92 2 56562 2 2 12 $2

(6)

12921673473677 7

1671719712345737784674712737347123456789731837317297

4937719767897171971746137987611474789417987127 !7 897 947 9414967 29677 37 697 7 3417 197 123457 " 37 #$37 %9237 &3$7 327 '12347 &3(7 &37 %93127 )3$37 347

53737897477912673478971279*34747+37*37 7629749178917197123457397 9447347 1*37)2378971276**917 137 297 3367 697 7 !7*967 %347 1234567 897 127 2997 987 !9*147897371278316717*96734789737*9*76174723*17,7 2972*7127127331947 7 *317 897 127 337 957 7 57 197 123457 3327 %9237 "237 %3297 347 127567897 127 978373196*279734737127 *97 %7 *617 3117 897 7 327 897 37 127 23**4667 347 284667 627 9217 197 7 2317-23457 97 7 897 37 127 416147 167723719127%3471234567197337.37897477847347 297 9*47 27297 197 7247 127 367 497*3719797%3471234567897 &337/7 897 47 7 6617 347 897 62347 127 637 8317 127 7 897 197 367277623723**4667347634667997347379416719127 7 +4373471276170173617123457977376*6971717897 97 46*31947 197 97 1267 6327 17 367 347 29497 9547 127 97 347 2347 97 367 37 6*697 7 1267 367 4917 197 61317 97 197 84627 12917 97 6**91712917127374176*694734712764834173797 *97-2345797897972*7347976**917717761747 31947 197 97 337 47 7 97 57 197 123457 97 897 97 544667 97 9447 347 897 47 47 7 197 327 1267 227 37 17 9127 91314734741614731712763717776973187897371271677 6667 367 347 37 127 167 97 17 7 47 97 9887 12917 347 3**9414174712927736761314797%76473**31947897127 974$9371677237 7 +97 37 7 9734747 12967297 72374917 35497 37297 6679734734179712737127666734723**466797677

(7)

A86494 7 59423 7 325553467 123453467 647274856 7 64727956 7 64727889542367379236 7 1. Introduction ... 1 1.1 Overview ... 1

1.2 Chatbots and A.I. disciplines... 2

1.2.1 Corpus and computational linguistics ... 2

1.2.2 Information seeking and knowledge acquisition... 3

1.3 Chatbot evaluation... 4

1.4 Thesis objectives ... 4

1.5 Thesis structure... 6

2. AI Disciplines and Chatbot History... 7

(8)

viii

2.2.1 Computational linguistics and language engineering... 9

2.3 Chatbot history ... 12

2.4 Recent chatbots... 13

2.5 Chatbot communication language ... 18

2.6 Chatbots and pattern-matching techniques... 20

2.6.1 Information retrieval... 21

2.6.2 Information extraction... 22

2.6.3 Question answering systems ... 25

2.6.4 Summary of pattern-matching techniques... 28

2.7 Machine learning techniques... 28

2.8 ALICE chatbot system ... 30

2.8.1 Overview of AIML... 30

2.8.2 Types of categories... 32

2.8.3 The recursive categories... 32

2.8.4 Preparation for pattern-matching in ALICE... 35

2.8.5 The ALICE pattern-matching algorithm ... 37

2.9 Elizabeth system... 41

2.9.1 Elizabeth’s script file format ... 41

2.9.2 Dynamic processing in Elizabeth ... 42

2.9.3 Elizabeth’s pattern-matching algorithm ... 43

2.9.4 Implementing grammatical rules in Elizabeth... 45

2.10 A comparison between ALICE and Elizabeth... 46

2.11 Evaluation methodologies ... 47

2.11.1 Evaluating spoken language dialogue systems ... 48

2.11.2 The Loebner Prize ... 49

(9)

3.1 Introduction ... 55

3.1 The Dialog Diversity Corpus ... 56

3.1.1 MICAS corpus problems... 57

3.1.2 CIRCLE corpus problems ... 57

3.1.3 CSPA corpus problems ... 59

3.1.4 The TRAINS dialog corpus problems... 60

3.1.5 ICE-Singapore problems ... 61

3.1.6 Mishler Book... 62

3.2 The Korpus Gesprooke Afrikaans corpus ... 62

3.2.1 Afrikaans corpus problems... 63

3.2.2 Principles adopted to solve the KGA problems ... 64

3.3 The British National Corpus... 65

3.3.1 The BNC problems... 66

3.3.2 The principles adopted to solve the BNC spoken text problems ... 70

3.4 The Qur’an corpus... 71

3.4.1 Problems of learning from the Qur’an text ... 72

3.4.2 The adopted principles to solve the Qur’an problems... 74

3.5 The SoC Frequently Asked Questions (FAQ) corpus ... 74

3.5.1 The FAQ problems ... 75

3.6 Summary ... 78

4. Retraining ALICE ... 80

4.1 Introduction ... 80

4.2 Software developments ... 81

4.2.1 Software requirements specification ... 82

4.2.2 Software design and implementation ... 83

(10)

x

4.4 The KGA prototype... 86

4.4.1 Learning techniques ... 86

4.4.2 Elaborating the system architecture... 88

4.5 The BNC prototype ... 90

4.5.1 The BNC frequency list... 90

4.5.2 The alteration method... 91

4.6 The Qur’an prototype ... 93

4.7 The SoC FAQ prototype (FAQchat) ... 96

4.8 Summary ... 99

5. Evaluation ... 100

5.1 Introduction ... 100

5.2 Human-to-human versus human-to-chatbot dialogues... 101

5.2.1 Semantic comparison ... 102

5.2.2 Part-of-Speech comparison ... 103

5.2.3 Lexical comparison ... 104

5.3 Evaluating the Afrikaans version ... 105

5.3.1 Dialogue efficiency metrics... 106

5.3.2 Dialogue quality metrics ... 107

5.3.3 User satisfaction ... 109

5.3.4 The methodology drawbacks... 109

5.4 Evaluating the BNC versions ... 110

5.4.1 Results and discussion... 110

5.5 Evaluating the Qur’an versions ... 116

5.5.1 Results and discussion of the Arabic version... 116

5.5.2 Results and discussion of the English/Arabic version... 117

(11)

5.6.1 Comparing the FAQchat with Google... 124

5.6.2 The methodology... 125

5.6.3 Results and discussions ... 127

5.6.4 Testing reliability ... 132

5.6.5 Samples of chatting ... 135

5.7 Summary ... 136

6. Conclusions ... 139

6.1 Summary of the work ... 139

6.2 Results ... 141 6.3 System drawbacks ... 143 6.4 Future work ... 144 6.4.1 Project aims ... 145 6.4.2 Project plans ... 145 6.5 Overall conclusions ... 146 7. References ... 147

(12)

xii Table 1. 1 Summary of system prototypes, corpora used, and the goals of

implementations ... 5

Table 2. 1 Some recent chatbots... 17

Table 2. 2 The normalisation process in ALICE pattern-matching processing ... 36

Table 2. 3 Producing the input path by the AIML interpreter... 37

Table 2. 4 Tracing an Elizabeth output ... 45

Table 2. 5 A list of the Loebner prize winners from 1991-2004... 53

Table 4. 1 The trainable corpora... 82

Table 5. 1 Response type frequencies for “Afrikaana”, Afrikaans chatbot... 107

Table 5. 2 Response type frequencies: subjective analysis of “Afrikaana” response .... ... 108

Table 5. 3 The BNC London teenager and loudmouth chatbots ... 112

Table 5. 4 Different BNC versions... 113

Table 5. 5 Characteristics of the English/Arabic Qur’an chatbots ... 117

Table 5. 6 Proportions of answers generated from the Qur’an... 121

Table 5. 7 Can you find the answer? Analysed per question ... 127

Table 5. 8 Proportion of users finding answers by FAQchat and Google... 128

Table 5. 9 Which tool do you prefer? Analysed per question ... 129

Table 5. 10 The proportion of users’ preference tool, analysed per question ... 129

Table 5. 11 Number of matches found per user ... 130

Table 5. 12 The mean for giving answers and preference each user found ... 131

Table 5. 13 Paired t-Test for finding answers analysed per question and per user .. 133

Table 5. 14 Paired t-Test for users preferred tool analysed per question and per user ... 134

(13)

Figure 2. 1 The AIML format... 31

Figure 2. 2 Samples of AIML categories, illustrates the usage of the recursive categories... 34

Figure 2. 3 Log-plot visualisation of 24,000-state Graphmaster... 37

Figure 2. 4 An example of an AIML code ... 39

Figure 2. 5 Tracing of ALICE matching process ... 40

Figure 2. 6 Elizabeth script command notations ... 41

Figure 2. 7 An example of Elizabeth script: illistrate different keyword types ... 44

Figure 2. 8 Grammar representation rules in Elizabeth... 46

Figure 3. 1 Astronomy transcripts sample/MICASE corpus... 57

Figure 3. 2 Algebra transcript sample/CIRCLE corpus... ... 58

Figure 3. 3 Physics transcript sample/CIRCLE corpus... 58

Figure 3. 4 CSPA corpus sample... 59

Figure 3. 5 TRAINS corpus sample ... 60

Figure 3. 6 ICE-Singapore corpus sample... 61

Figure 3. 7 Sample of the Korpus Gesprooke Afrikaans corpus... 63

Figure 3. 8 A sample of a BNC spoken transcript... 66

Figure 3. 9 A sample representing long monologue in the BNC ... 67

Figure 3. 10 A sample representing unclear utterances in the BNC. ... 68

Figure 3. 11 A sample showing the overlapping problem in the BNC ... 68

Figure 3. 12 Extra-linguistic entities cause problems in the BNC ... 69

Figure 3. 13 A sample of the Qur’an in Arabic and English... 73

Figure 3. 14 The extra notations used within FAQ of SoC files for avigation ... 76

Figure 3. 15 Questions-answers denoted by <DIV CLASS=“sect1”> and “<p>” tags... 76

Figure 3. 16 FAQ questions-answers denoted by “question” and “answer” tags... 77

(14)

xiv Figure 4. 1 A sample of MICASE transcript with its corresponding AIML

category ... 85

Figure 4. 2 An atomic category with its most significant word in 4 positions... 87

Figure 4. 3 The program algorithm of the KGA prototype. ... 88

Figure 4. 4 A sample of Kilgariff unlemmatised frequency list... 90

Figure 5. 1 Screenshot of the semantic comparison between ALICE (01) and CSPA (02) ... 102

Figure 5. 2 Semantic comparisons between ALICE and CSPA... 102

Figure 5. 3 Screenshot of PoS comparison between ALICE (01) and CSPA (02) .. 103

Figure 5. 4 Part-of-Speech comparison between ALICE (01) and CSPA (02)... 104

Figure 5. 5 Screenshot of word comparison between ALICE (01) and CSPA (02). 104 Figure 5. 6 Word comparison between ALICE (01) and CSPA (02)... 105

Figure 5. 7 Dialogue efficiency: "Afrikaana" matching types relative probabilities107 Figure 5. 8 The quality of the dialogue: "Afrikaana" response type relative probabilities ... 108

Figure 5. 9 A sample of chatting with the BNC chatbot ... 111

Figure 5. 10 A sample of chatting with Peter-BNC chatbot ... 113

Figure 5. 11 A sample of chatting with the sport-BNC chatbot... 114

Figure 5. 12 A sample of chatting with the Arabic Qur’an chatbot ... 117

Figure 5. 13 A sample of chatting with the English/Arabic Qur’an chatbot... 118

Figure 5. 14 A sample of matching a whole ayya via chatting with English/Arabic chatbot ... 119

Figure 5. 15 The Qur’an proportion of each answer type denoted by users ... 121

Figure 5. 16 FAQchat interface... 126

Figure 5. 17 The proportion of finding answers by the FAQchat and Google... 128

Figure 5. 18 Which tool do you prefer? Summary... 130

Figure 5. 19 Comparing t-Test values fro finding answers... 133

Figure 5. 20 Comparing t-Test values fro the preferred tool... 134

Figure 5. 21 Answers generated by FAQchat ... 135

(15)

List of abbreviations and acronyms

123 3 3 145676869325683 113 3 159568355319637336843 123 3 3 145676869325683943993 1233 3 1456768693665683254535435653 3 3 3 465639569343 3 !3 6774837393"5#35#343 3 59 94 3 $69563 %3 !35593"43635343 73 437374 3 !3 !3$937353964 35!53 512343 !384656893$9373556473 2&3 43 73 254 6866943 &9483 3 54856$3 9463$6453 1'3 3 5655366 31595683'4 596353 3 3 3 54 695 3689563 (!3 3 3 !3434373 3!93993 )13 3 3 433)4769314689363 3 3 3 6936$465343 2)&33 3 93254956$3)95543&9563*963 +1,3 3 3 +4-531 3,563 .1!3 3 3 .493148658543743!*536463 /3 3 3 /6 394$3 3 23 3 25495693437363 23 3 3 254956934373944363 23 3 3 2749563*5498563 2&3 3 3 2749563&546$93
(16)

1234 4 4 1567857642794364 14 15678674754574654 4 84 7874 854 384 4 754 854 !4 4 4 5678" 9#!78754384 $134 4 $5438447%4754594 $4 4 4 $57478554 &'4 4 4 &6894574'8754 &(4 4 4 '654&6894574(964 ' 4 4 4 '86##774 )4 4 4 )76545*7854 2!!4 4 4 276784!574!8754 +4 4 4 754574+974 34 4 4 9443%654 '34 4 77464556674384 ,24 4 4 ,869427784674 -24 -$4 *874 2768794 965.4 85994 9974 -2/4 540774 4 #864 8097%.4 275%74 -124 54 -$4 *874 157154 54 2768794 867687.4 4 4

(17)

1. Introduction

“Vague and nebulous is the beginning of all things, but not their end, And I fain would you remember me as a beginning.”

Gibran Khalil Gibran, “The Prophet”

1213456758693

123456347895 75495 9 4 785 95495 95 44 479947332597495524573229554244524 433478957549598123456447549544584 9547 98 2 25 25 295 5424 4524 95 784 823456732942384529578 9522784733259749529457984525 1878449785849345248456 7 958994585424452489789955!5 89 89 45 423497 7 9 5 74 3 8 9795 "847872544933549545845957859#2 8484544552"8494$444384 4933584284584529595 4784844 954524"84394758 347895644 %72& 8 784 4524 3 34 !5 89 8983495478#29549594' ( )84 4 8 957 5 4 82345682345 75495 45 823456784 942 45 745 8 9599 295 7324954* + 8423497727795479595262223 472548544784* , !992454234974649578444477 95349545945#295*
(18)

1234 35678694 4 9234 9634 26994 534 634 4 92334 8394 353674 932834 26534 334 834 824 64 3684 9234 6986734 4 9234 336934 67834 3684 9234 86794 64 334 4 9234 7364 9328346467749699674356786949453969349234794484 9234269946464974943935346344643839746348394 !"#4 69663464834426934$44

1

234156789

8

17

1

1

1

!9674%9377334!%#44334641234984424946348934 492469422&4694923439&4373463439934'2464(294)**)#414 4 &4 8934 894 34 6734 94 83964 2864 768634 +34 64 874893493&42249339464336934698674768634464 6244!%446469867476863434,-.#4/864640694 1222#43347686343464923489696749328349269434 34 64 9934 2864 76863&4 64 768634 1234 774 83394 774339492343769243933426994649234!%46234 4

1.2.1 Corpus and computational linguistics

9843493283447686346484923349466734696394 4 93494 6#4 4 64 84 734 4 698674 768634 344 64 84 7894584 7894 4 94 64 624 4 7894 824 64 964&4 3694 64 6694 92694 396934 4 34 4 34764 34 6394 4 768634 834 %94 4 64 39274 92694 64 34 6734 94 34 63447894984'6441226#4!73464!79334)**)#4334 8478946492349844768634492346449349464 4 !43769343749484789448969674789&4224834 4 984 698674 76863&47894 64 969674 7894 3&4 894 84 8934 64 64 974 94 374 6&4 393&4 534 4 67#4 6394 4 7894 92334 3334 4 698764 93394 886354 394 674 )**$#4 1234 9489696747894676943349234623496769493&4 224 63634 49234 76934 )*$249534 363&4 9234 36344 65676794 4 893&4 64 9234 3364 834 4 9234 %9339&4 3396934 35374

(19)

1233454678997498618798778322774874854752484757268618945268 3582635726858178272687874867456478618558 8 6483874948772687982987528968 !8285214988 17"1524684718 78 548474468123345467892#4185587875448449$8 946728%57"3"&448%&!8 618518 69298 687298749298752898 941878495487486756499838778124988526854868 124987877864989812999418268947268'88

1.2.2 Information seeking and knowledge acquisition

6357268944(268264981232456357682985484645287523745635768298 61897894232875237463523785499498)8178547524854998298 78 54752448 8 *4798 358 8 48 14326418 9757548 48 547268 17949!8 28 972938458 748945+9814326418612726986357268,757268,!8 6186357268475248 !8 54878 5474187867586485499268 45487726826297287462-498618745249888975685488298 74854998385475242681467982827848544678589438787489458 54-4978.4#"/74986182425"04781222!888 8 3248 618 4464578 1225!8 1495248 68 ,854998 98 8 4718 6358 957268 758 5498 38 7478 26(268 544678 26357268 248 265268 748 2554467786182(981228!8298778668272683872987468298 985441418868894786872985494788997498478748544678 146798 618 7468 ,8 997498 948 98 146798 78 47578 544678 2635726826884546783545(89529681228!8149524987487783868 ,8 99748 98 68 97575418 545494677268 98 98 178 94!8 38 9447418 26357268156835874874778 8 :5298 456268 52798 618774568 7268 7462-498 4548 2418 78 78748945+9854-4978278748(64148948427458785475244826357268 5878475789423282449838178783288541432641874748&483874948 471985485424418268947268'86872987492987487782989814952418 98878785475244826357268358897575418179486182985418278 7486745647894584626489487863258279822787846457485496949898 14952418268947268'88

(20)

123456789847784

1234 1567894 134 634 4 84 1567894 4 6797834 234 4 4 357894 4 27834 784 364 4 78379383 4 1234 783793834 4 4 634 34 54 34 3534 4 3567894 24 34 74 54 3534 4 2584 83678 44!63463384784848854"38364#67$34831467424 234 3384 334 784 364 4 67%37834 35784 67367 4 4 834 4 2584&5934634973844785347243242484'34468'4234784 3644(85683) 41234156789413484"38364#67$34834634637334 7843784* + 4368734357843242724634634667344234 674 786784 34 7784 634 34 784 2364 4 1234 35784'34584438%3834674463453646236428467774 %7853467 44

4

12464984

1234 784 &3734 74 4 574 2,4 '83934 574 7834 4 938367894 234 '8393%34 854 57894 4 654 34 62 4 1678789423424783-4 .5778944938364696424743443/64785%554653464 736384664578943/436878943287053449734234342 4 * 138367894 736384 36784 4 234 24 24 3'4 736384 895934 84 3634 784 5734 784 4 6787894 234 24 57894 853654664785789-45865563452447953466484 655634834524405378484836 4 2 3787894 5354 7784 4 24 784 7784 4 37894 4 4 64 53384524-4 • 47894 4 24 4 4 4 64 58'884 895934 23634 558'8854 384 74 58'884 4 234 24 526633364 8664 774 58'884 4 5784 7895774 24 74 23634 236347442693443/77894443472423489593 4 • 4789423424444475734648734465 4 • 47894234244444636733483646405378 4 1 http://www.loebner.net/Prizef/loebner-prize.html
(21)

• 1234567896877626676769226469739963473466 6266 6 92379678967678769327345687726964967686789635968327646 37476 87932736 346 46846 49273426 78326 7892326 2826 7876 8772662736962963464629626 6 89696973967699678962 2968676734345687726 263994796346629392667792623456399476646943456 79843!926 76 9226 6 4596 6 2 526 96 ""6 326 6 26 6 22796 77926734345662964678965266399477346 6 6

Prototype Learning technique(s) Goals

The DDC Simple approach:

the first turn is the pattern and the second is the template.

Exploring the problems of using a dialogue corpus (Abu Shawar and Atwell 2003a)

The KGA Simple approach, the first word approach, the first most significant word approach.

Conversational machine for a new language (Abu Shawar and Atwell 2003b).

The BNC Simple approach, the first word approach, the first most significant word,

and the second most significant words

Generating a large number of AIML categories (Abu Shawar and Atwell 2004a), and visualising/animating a corpus (Abu shawar and Atwell 2005a).

The Qur’an The same as the BNC

techniques. Exploring problems in using Arabic language and non-conversational text (Abu Shawar and Atwell 2004b).

The SoC FAQ

Same as the BNC, and the first word with the first most significant word, and the second most

significant word.

Exploring the use as a tool to retrieve answers from FAQ databases (Abu Shawar et al. 2005b).

Table 1. 1 Summary of system prototypes, corpora used, and the goals of implementations

(22)

12

345678984874

123453678989427354428423847534744 4 27354 4 53394 274 2854 2394 843794 234 7864 433894 24 274 794 234 79734 34 82894 274 364 835394 73594 67289432983475346753414274364 !"4794"8#7324 7534 67534 894 3564 4 $9334 5353397894 794 73594 672894 32983412342735489347$59489567894943778946324 783482894$3948734364794234 39354%58#34934282484 344377342745&6789748988483479479734 3989335894537892847534745383344 4 27354 '4 834 234 5574 34 894 5789894 234 !"4 274 364 12334 5574 2734 6794 5364 2824 7534 6734 3754 894 284 27354 (8947534334435634233453644 4 27354)435834234753486363978943589437244282427484 94 54 794 84 94 956787894 6324 2824 34 824 234 54 7997894835394358944 !"435343935734 4 27354 *4 84 794 377894 4 234 835394 35894 4 234 53578934 !"4 73894 232354 234 74 4 3724 35894 3534 728334 1234 377894 84 734 94 6375894 9757934 794 6375894 234 34 4 234 734 37598943298348948894539347349435+4337$44 4 27354,458347466754423423847949894123486878944 2345$47534533934794794545345$47534334 4 4 4
(23)

12345367897783337833

“...If you thought you must measure time into seasons, let each season encircle all the other seasons, and let today embrace the past with remembrance and the future with longing.”

Gibran Kalil Gibran,”The prophet”

2.1 Introduction

3 3 3 7863 3 793 565563 8663 3 8663 3 3 6563 3 3 5953 3 857583 53 7656 3 3 3 3 3 563 853 3 863 3 3 8663 53 3 3 73 9!3 53 539 "3#93$%%%& 33 3 6593 7563 3 3 983 3 7553 75'63 3 563 3 3 63 57953 93 93 876653 7863 95565763785939556576393935533793 9553 793 573 563 3 3 553 85763 663 53 3783575 33 3 (753)3356378357633535653393596 3(753*3 866376356 3236753+3633373763356766 3 3 93 3 8753 75'63 3 6753 53 67563 ,3 3-368759 3.75395375'633863536753/ 303 11223 73 3 12.13 93 3 38953 53 6753 4 3 (753 %3 656633295537 31378563311223329553 56383536753$6 33(753$$3563539539563 37633593666 3
(24)

2.2 Corpus linguistics and computational linguistics

57953 3 53 3 7863 63 3 799753 3 993 7753 9337633775!33633"53339#$3%6533 7863638873363393633&'36378639556576#3 3 (5338538663337)835637837)3576593 "5993533)75393)#357953*3533123456789128453 6337863'5735637353365633)63'338+ 3 5"93 6&6#3 263 7653 85763 3 93 3 7)3 63 3 53556338"7$#3,3563)3)753937863'633 -'3 863 33 1)57+.9563/13 307563*1#3,3 7863 765663 3 885)93 3)59953'633'53 1)57+.9563 &3)3235338638956353*#323133378633-5563 .9563 '63 7673 53 3 %5"653 3 4763 &'3 63 3 4765669+-3 46-3 7863 3 -556+.9563 7663 3* 3 '573)763 3-'3 7863 53)6335637)8655#3-33-'3 3346-37833"5993363537)57367#3 3 -633687557363358556353783'3593353 8866#357953237966556378353399'538693 • :737869365338"537)865"35)533 393##33-5563;593863-;31633-3 3 3 3 3 78633 56993 7))57531'993 3.9953 <==#33 • >537869356383378636335365!3353563833 63)35)335)3##33-&33.9565-5)5)3863 :331333 • ?9993786937997563363693353963##3 19+@A13393&33269) 35636933)9589396B3 3:1,.:37863>7.339#31#3 • )8937869365)593353)3339#303)893 '3'68863336)3)5353.9563307337653 1 http://www.oneummah.net/quran/quran.html
(25)

63 7893 783 13 993 3 7893 7863 563 3 25938633956323 3!""#33 3 63$%%$3633995378386&3 • '8759563 786&3 (63 3 79973 3 863 3 85793 83 3 ()3 3 3*5753 863 3 17573'839563*21'3 $%%+33 • ,3786&379975633(638733963339)3 3 3 2593 863 3,395632,3 3 3 93 $%%$)333275-3'83,37532',319933 93$%%+3 • 565793786&3(63353856335)33396553 786)35737656633!.3599536)368933(63539)3 59)339395638563/03393$%%%3 3

2.2.1 Computational linguistics and language engineering

23899935359537539378)3335337853 3 93 553 6763 3 955663 63 69563 3 -983 573 63 963 3 897393 965631563 7853 763 3 3 73 3 93 93 876653 593 78593 95565763 563!"2#3675378593955657636333633 783 66633 653 3 53 93 94313 7835636333378633593539636736&353 96)3777533(39656396)336753963 3 153963363333(3337863535363673 6&3 • 566687357'3 5&3 66563 733356357933 653 3 5673 3843 3 -)3 3943 3 3 13 653 ,59531573:653'63,1:'35633(8933 57'3 53 963,73 3 93!"2+)3 653!"2;)3573 56363 3 33,7<33<937863131*1, 1*3319933 93$%%%37357'63775336-935378636763

(26)

• 5793 8653 63 73 673 53 3 786353 563 863 6793 3 19933 1993 3 9!3"###$!3%3 69537833&363'&6(!33 • )573 53 66563 733 3 3 6573 3 673 63'*"(3 733'95&(357363373'5936637563 +63387666(!315733196563,3)8&3456763 11)4$3 563 3 -893 ,3 33 663 53 663.5963 3 /630$!3 453 3 1993"##$3 563 3 -893 ,3 3 5158365733,3363,353667553 53*9563-!3 • )87173536335937835363,3687373 7563 673 63 '79(3 ,3 795+3 '21(3 ,3 12653 '(3 ,3 5!3 33 -893 )873 173 13 863 )411$3563336515739357335733.5663 "##0$637339!37$3866336578857353 ,3353,,573793786!3 3 7753 3 -3 9653 963 673 63.653)73"##9$3 3 3/63"##9$!3%3 7773 8+563 95663 ,363 53 3 -3 535637-!33 3 )75396367363:1/133:;5313/5+931889575$3<3 343"##0$ 357356339393,3675393:;53783533 85793 ,763 3 3 63 ,3 7863 9556576!3 =663 73 673 ,3 6656363 8663 3 3 63573 9553:;53 963 3 53 67585+356!3 3 163993 63 6875,573 963 93 553 6763 963 593573 93939&563389,6!3535535633'567589533 73 ,3 553 6,3 6663 3 8,3 6&63 5+9+53 876653 39!(353$!3 3 1%* 339317573,3%-3*55 3563389,3356353 76753 65 3 3 +953 ,3 93 553 6663

(27)

53 3 9333 5!6"63 3 933 53#$$#%3 1&'35637863(33657396)3 • 1363(36535(5333*33363673 6333+735393 • 138579363 5(73(3 9753 876653 963 3 3 3 ,5-533,95336963 • 13 799753 (3 -8863 (3 95573 3 3 6763 3 583-5336335(73 3 &3 1&'39363793366933*735,33,3693 5(53 23 553 3 353 53 (3 ,9853 1&'3 3 ,73 6735333(393963.7,3393#$$#%3333 1&'3793363633((75,393(3753935533 7859395565763&363+65(35633385533963 593 53 1&'3 -573 ((3 /7865,3 057178953959593 6883399-6363373786353396353553 33963(3786356338(73,953 3 &3 93 93 9"5323&4%3 563 3 63 (3 95563-53 5353 653(37537866785939556576323&4399-636633 (763 3 93 93 6"63 563 (3 789*3 853 8963 23&43 8,563 73 65893 *65935(3 (-"3 (3 66563 8+763379663656833833.53#$$#%3323&43563383 (33(99-5358396)3 • &3 "3 9)3 8,563 6573 796663 (3 876653 55,593 963(3*367363-6336763 • &3 39)3 (563 3 6763(3 8653 3 6763 ,3*3673636*363389579363 • &3859539)358963796663373-63(973 565563385953565563 • :3 963 (53 3 673 3 5(763 (3 8(53 6875(57323536"63 3

(28)

336633599637859395565763393553 673 3 9833 3 593 563 63 1993 !!!3 "#6$33%53&'''(3

2.3 Chatbot history

3 53 #3 73 6663 553 53 3 %667663 2653 #3 7933)5*3 !++3 !+,(35893 3-.2/13 73 3 93 3 8678563 )5*3 !++(3 6753 -.2/13 63 3 835730$639393765353378386659133 35363658933633$3753358356356873#3 3 8673 #3 3$3 2#3 673 33#3 3 673 563883 7753339366753533$35#333773#3$3 33 753 75563 3 953 6#53 563 533$83 3 765353-.2/136363#523866353763373#3673 63033565349633133033933513 3 3 2363 8363 417783 5893 53 !,'3 93 !,9(3 675341778363365953#385338766538653 95#63#63 3 25563 23 763 3-.2/13 563 #3 65953 3 8678563417783993385385353355353 563 856346755663353 675863 #3 5963353 417783 3 6353 793856:3 3 86755663 7933565563 3 341778;63$93 563 63 53 755<753 8753 96333657383563667535338368633583 563853538633733563783#365595563533 6573863 6353 3 63093 3417783633933 6333#385337653-.2/13633859379557933 3 793 553 3 5<653#$3 693 93 693 38563313=>*933753 !!?(3 3 @5357$95331$3&'''(357330A3-.2/13341778363 75357$633393367766#9938#35376563-.2/135763 376533#369#336$53B656C3-.2/13663863#3 3 6;63 583 53 3 83 B6563 3 663 3 3 #9953 3
(29)

7656323553366375631363959365633993 333563635337653

3

23 !"3#3$5%3 5&89&3 3'4()3 66&3 3 65&93*3

7563 +3 3 3 575%3,53 3*97-63,93 +3 5++3 793 3 683 97-63,573 793 3 8973 3 3 93 3 83 53 3 .3/'863 "00123$5%3/ !"2367537&&5753,53'4()363*9-5%33 365&8933,53333333359533&5893397-63 3393#336863375%337&&63/53365&93673 335689367373337&82385%336,6336563 3 7785%3 5+&53 363 53 65%3 93 3'4()3663 3 83 ,3 9+335%3 863,573 9663 3 83 55+563 563 673 3 7%5663 563 93+63 3%&&3#363 3&5%3 3 673 ,63 863 53 &63 +3 4763 88563 3 9563 53 3 87938653+&33

2.4 Recent chatbots

*#33+376593%63637&373,533,568363+3 8693&7563,53,56337&&57333653+353&-633 853939%%35+763/$59-63 233 3 5%1(3/763 6737633193 823,633+563733 63&75395%3756353%5%389563+&3355%3786323 663563,39%%3&93,573765663+3,35-3&963#3+563563 93385733,33,5993+99,33%5365%7333738573,573 ,387633%536733#563&93563633%37389563 5%1(356339563+3-,63+&369635833-565%333 &5%335%3+73,6367363:933:9373;73-,3563 63 63 3 63 +3 75%3 3 866593 893'5%3 3 3-,3 3 :+,93&93 %63 3 673 3 7&893 3 673,63 3 :7-,93&93,-63+&33-,37-333%55%3+33673 53753895633%33633+&93563633+5633 123145672896283,573&-633+59375653633,57389356383 363;665993 3 5%63 5+&53+&93,63 7673 63 3
(30)

13 7993 3 3 63 685653 89563 53 3 3 83 3 76535593313633533 !!"33763 #$3% %&'3 3(633 63 68563 31)63 89563 3 3 3 53 63 3 63 *953 8'3 3 3 3 +5,3 3 13 '33 89563 35793 -3 66579'3 3 66.93 (63 9563 563 63 7756573 /3 3 3 93 /3 67*67'36333/935366.376631363 3 3 3 #753 95&3 73 793 3 59'3 3 3 953 8873693399*/'30956571389563 3 253 53$76)3 57533 303 73 /3 3 693 63 3 3 //73 3 753 76'3 3 63 763 3/5361'3 3 63 /3 32413 3+1556'3 93 # !!!&3 863 3 7553 83 83 5937753486653836638563385738853 /3 866563 53 3/3 /3 3 8693 783 8'33663 73 535333866538383563383/3393333593 3 83 93 3 8563 993 77863 3 989563 93 3 86653 8383 5933 8663 3593863 653 3 8575835337653353635/3536636)6358'33 6853888593836635337366339866353 /95'3 95/6'3 3 63653688536'3 3 //5379953 .6563 3 6656383 63 563 63 53 755*753 8753 96'336573863366753383686633736573 835637863/33*986653#6'3556'3866'37&'3573 563 6653 3 3 6573 7'3 3 3 58305513 563 6653 3 3 6653 73568663 379665/533 3 67573 863#.656'3 .66'3 3 6656&'3 3 /53 6573 /63 #95/6'3 766'3/956'3 656'33756&38636866338599335363/36573 :565)3856335637359335938763385357933 36)63 8693 5593 663;6)63 583 563 853 53 86'3 3 73 3 563 783/3 6559556353 3 863 63 53 3 63<3363/38637'336368635636973#383 3533956&32/3373563/'3383536333933 3//633.65336653338573356766533 3 63866305'33%"3*93/9353/33,53953 =53<5'33633533;>'33793956353?3613#5963

(31)

3 533 23 3 3 793 3 765393 6363 5936333533!5"36#5"336$3%&753393'''3(3 66#3 563 7#863 3 3 65763# 96)3 58 3 753 3 3"3 # 933 73 3 579 63 6 3# 963(3563 563 3 58 3# 93 573 63 3 63 58 3 876663 53 3 773"##5793 3 68995"3 #56*633966336+36 7 33#53373673563 3 7##3, 653 3 3 6#3 3 +73 6 93 5#53#3 3 58 3(3 6 93 5#53 563 3 8663 3 3 753# 93573 563 686593)3"5"36633, 656-395"3+7859376636 73 63 3 9" "3 3 6975"3 3 6863#3 3 73 63 3 866-3 3 *85"37*33857639355337653#55"3 6358 33 5"3 3 #73 53 53 3 73 63 3 5993 85763 .5993 3 "3 799763 +3"3 3 3 753# 93 3#"63 53 53 3 73#66"3573563 8 333 633 3 /8*063%&993'''3563359" 37993573586376593 675863 3 793 3 3 7#8 3 773 576353 3 63(3 67583 963+#89633 76333 63#5"3"3535556333 6656163 753 78 93 86563 3 3 +873 58 63 3 "3 #3 3 58 3 +#8963 573 3 3 #73 "563 63 7635333737789388663&9939563363359" 3 67583 , 5#63 3 3 3 3 753 3 3 76593 "-3 3 #635#83)33 • 155"35888538553533665616359" 3 • 177665"3 3 73 76593 563 53 3 3 683 888593 • 5"36#5"336333#73563 3533337963 37659393 3 /8*06163 573 65"3 9963 58393 3 793 3 3 73 33 637"6337+3233#35#3385 637+633 65993599336833 63 37*33#32336#3#73 77 3 #3 3 3 53 663 3 56893 73 5#3 3 53 8553 23 3 763 3 3#73/8*063 89563 3!/3 23 513 63$3(3 8 3563863 65"33"93+4468736653 3"8579377353395#53799753384 35#563

(32)

3 19563393366333853373335753 3337535633!338895753365"#$3%363 563 66853 53 985"3&'1()3 573 563 3 6 3 93 3 963 9863 3 3 3 93 9""3 573 5795"3 6873 3 53 876$3 &'1(3 763 68!3 763 53 753 79963 3 3 568733388957533*7536363+66$3%353563633 5!5"333 93536335563,76)3538856356333 956586338575835$3-63586333636!6)33736!3 63 3 86.3 68755"3 5563 3 3 93 3 3 6!/3 68755"3 3 88563 3 3 3 3!3 75603 799/3 3 68755"3 3 63 3 896)3737685"3337966336763373363355533 6!$3'736!356378633 3567636"6.37659389)3 5739633! 9"3336!)333765935673 573 96337*33373567636"$3%35676356378633 35753786$3%3563563395"56573673337$3 %3 673 563 3 56763 55)3 573 563 78693 3 3 6!63 +5$3%3 53 563 3 593 6)3 573 563 3 799753 3 556)3 8856)3 3 9565863 3 3 93 3 3 6!$3-6063 583 563 963 373 53 3 88853 89)3 3 3 73 53 3 883 +533 3 6!3 3 5653 3 3 6!3 563 *7$3%3 35636559333536353595"3553*753666)33 56335995"3367389365"33786)3363583563633 73 3 893 8$3 23 3 3 63&'1()3 3 36383 3 7595335638895753533,7153956)3 573&'1(3 733306359$3&'1(3 6363 53323'*866)3 5735633 8895753 3 99 63 7593 3 73 956)3 7"96)3 995866)3 3 *3 *6353537963367$3 3 1653333375"3 53375)35995"3"6)3376353 85793 3 895"3 3 583 93 53 79"$3 63 793 68!3 539""6)363535356)33395893869556$3 %933$3656636337376$3 3

(33)

3

Chatbot Application Communication mode

9133 13 57593 863 3 93 53 53 3 63 573 8563 73 5537753375385733 3935633 !339333 9655333"#$63 %&933

PC Therapist3 Simulate a Rogerian therapist, inspired from ELIZA. Different personalities have been developed such as: PC professor, discusses men versus women; PC Politician, discusses Liberals versus Conservatives.

Spoken mode

Speak2Me4 A female chatbot that is used to teach English language through chatting.

Input: textual mode Output: spoken and textual mode.

Marloes2 A female Dutch financial advisor. Spoken mode

MIA5 A German advisor on opening a bank account.

Textual mode

Cybelle6 A female avatar with body and uses gestures while talking. She directs you to discover the agent land, a new land where you can find more information about agents, what they are, how they work, how they could be useful for you.

Textual mode

Table 2. 1 Some recent chatbots

2 http://www.mlab.uiah.fi/mummi/sanelma/ 3 http://www.loebner.net/Prizef/weintraub-bio.html 4 www.speak2me.net

(34)

2.5 Chatbot communication language

3 563 53 53 3 995639563 45753 633 6633 3 86653 3 63 9563 73 3 3 63 3 68 3 663 3 7!593696"3#$ 563%&&%'335633(337575363 663 3 )563 #%&*+'3 3 853 !5793 3 (573 8893 757337355"337575395563733 7966553 53 6873 3 373 373 75753 3 983 7!656,33333653963359333596373 -3 3 655573 5763 (3 63955633 89365893 .3 !53 3 3 663 893 3 853 93 53 73 3 73 75753 #66993 3 9. %&&&',3(593 6663 553 3 3 3 3 !57373653!33833 3 2336333653933375753889333 63 53 93 3 7573 (53 783 7563 /3 893 6873 75753 56363 3 573(53 3 593 663(573 563 53 633 783 663 3 5763(53663 595653 773 9393593(33633939333765633 853 76"3 #/9705 663 122%'3 13 (53 !53 3 663 88957563!338863359366635333650 6(53666336(333653335337!65936663 3353337!653(5336"3#$7-31221'3-363 (3 375953 3 83 2753 #2'3 563 3 99(536633 8663 53 563(5663 33563 5793 3 993 3 68 53 853 3855"3 3#4.3 3931222'3-563(63 3 93 33 !985307535763(57335363353(3 663 37563 235633 663 3 63 73 335375753 87663 563 686593 3 3 8!5653 3 3 753 (9"3 759533!5993553533(3356378593(5330 656375753796336963365637563#6358'353 33#56756676'36933375"3#7855633 7 563%&&8'33 3 (!3 63 5763 3 3 (3 653 3 93 93 3 7573(53 8893 3 3 363 3 7573(53 78633
(35)

6533 53 5753 53 78 !53 7 5753 "3#553 563 881 2341 561 7891 41 5761 66791 461 7616723 3 5#3 85 63#3 3 6753 3 77565763 $3 3 9% %3 $3 "3 3 3 956583 #3 "3 3 683 3#53 9% %&3 63 775657636753"3635737 575'379665$5353 #386(367 636 7323)9332)'3#57355636%3 77353 35%'3953$7!!$73765*3367 636 73 63596&33233379665$33739% %'35%3+,,-3% 633 78 3537 5753"3.5638'3373953#55%'3 3 7%63 3 $3 853 3 5$9'3 3 73 3 953 683 765/&33 3 033637'3196339&33675339% %3$323 )93 32)3 63 3 893 $3.683 9% %3 53#53$/'3 63 53 663 #53 683 9% %3 3 5$953 775657&32 3$ 63 $3 5$953#3 567 66&33$563 563 3 5%3$3 73 $3 673 863 8 633756&33673563393$3553399586563833 3 66&313536 735636839(3%'3%'3338573$337&3 3 $ 3 $73 563 %563 9(3 3 763 5%3 763 3 6 3 7 9'33355333$936&3 3 83 3"%3+,,,3 663 3 3 %3 $3$953 563 3 58376553$365%63$3 359% &3459% 38633 79665$53 53 #3 7%56(3 67593 3 57593 893 $3 567 63 5%&337533336 593$3 53563367593 '3 63 53 5633 5 63 3 63 3 693%9'3#593 3 57593 $9763 3 $$73 $3 3 7653 3 3 956583 $3 3 686&3 83 3"%3 8335#3 3 3 36663 3.3 88 5563$33896 93%5!!3$3665993 357593 9/&3263$35%3355733 359% '3 366636 93 3$ 99377 3$33675936873&3#'337635635$$3 #5376'365638893 6353$3$ '3373#3$95%399'33 563373563 63$336875938 863336875$5735&33 3 16363763 6337 575'3#37363333563793 9533#53$'3 353563353#3683339556&34573

(36)

593 783863933765335937833 697336363333656353563673966333

95565736359355357373365393993

553333 3

2.6 Chatbots and pattern-matching techniques

13583783539853763563333739663 35833563363733363 93!"753 75#637373366$3953333753533933 %338"75375#636355338563763&3 • '753 33889536539636736353()2*13 3%93 • %533593533755"753936366363633 3+73486653663 • ,653' 3936353'1)3 3 )3 3 93-.//013 7953 3263 76593 6363 65893 8" 753 75#633 53 6353 7655363963 3 63 53663 365673 6863343 63 88636535653 339366$35563396333656736866343 9363675863765633395633#56336866386363 7556"753963(73#3563796655377533385793857313 63935637673633833759"83956586333 935963 3 3 83 3 7593 3 3 573596363 8932)75335633833-9137733233266333 289733759363-57613435336333#3387663563 8895363996&3 0 ,66$3583563773536336899533395633 633 66333653387665393 . ,66$396335553339357393573563 355338573-9333#13355353385735793 63653 33297533 3553 3 5735963 3 563 6635335837396333366336 3353 3553

(37)

453 3 8753 87663 3 75563 3 993 85763 3 78333935533333 395631633 6933857633673536339633 63733 375333563673857356336336 3 3 !563 8873 63 88953 53 3 6875573 53 3 6963 6 3 3 686353 63 3 765673 5366"3963 #3538 7537563396363533 53338895756335963!3 99 53667563#5 33875375633786363 8876363 53 553 5#93 553 $753 3653 6 536663 3

2.6.1 Information retrieval

3253 5#93 %2&'3 6663 5#3 93 763 7753 3 6875573633636753536753())3*632&36663633#7 6873932356393733376336"636333 633#733565936873 338663333593 63 5336"6333 53 3 63 3 763 3 3 673!3 2&3 663966336"63335639$5793635$6336"6363 3763775335936335993633763 3 #3 3 56373 3 3583 6388535336"63 3%&669933+#53(,,-'3&5#93876635636335$53 573 55763 3 763 3 3 !3 53 3 5$53 563 .3 5763 12345657833885335#3763339#3335763 23493 3 8853 3 9#3 763 3 3 5#/3%0 563 33 163(,,2'33 33 2535#9366636736339133367356383 3 3 75343 3 3 3 161#623 563 3 $893 3 2&3 6663 5736636$33657633963358335#3383 23$8763633638836563 63393$8763 63 3 7 http://www.google.co.uk
(38)

53 3 63 8 753 75!3 63 53 "#2$13 %53 5&53 3!375363 56853'3 5'53 5(93 53 595337)3*353'33935633+659385793869556,3 '57593 3 93693 -3'3 95,3'59,3 3*3 6%6.353 )3*857932/366635&33799753'376,367363%386,3 3 3 763 3 753 -%63 '3 3 6603 583 !)3 53633655939,3&7833+76.3%353'7355(593 6763 '3 35(3 8695)313 363 '3 3 73 7(66,3 3 7376395-3367352397633-%63'3358333 '5633639(373'35635&376)323563567,33 73%5993 3 3 67,3 3 '3 3 7(653 73 -3 897)3*3 7356383(3'3357933,36335360333 636866)315599,353595663(953956333%3 6763633633'33!)3 3 395-35'535(93666,3373(65635353563656,3 %39335(368663653658938876)3*3535356363 3-56533676,3&753363655'573%63%57333 963'!3 63 37536603!3%53 63 655'573%63 3 5(36%6363%59933567663537834)33 3

2.6.2 Information extraction

*393'35'53&7532"366635633+89936875'5735'53 '3 93(963 '3 &3 3 63 563 5'53 53 673 66,3 %3663 793!57-93!,3,33%5639535).3*93 4)33 3 239,332"36635637863'365&386636367533%533 #36778)3*3'5635633'59538766,3%57356339(3 863'33&)31'338 ' 68739:;3535638895333%63 %553539:;,3)),3(,3,375(,37)3*35363563387663'3 6573 53%573 756633 869353 3-63 3%53 6573 5'5)3 23 3'3 63 3 9&5793 863 56363 3 6%3 3 956583%33869396)3'73695356388953533 '5'3 63 3 7563 35'3 '53 &86656)3<5993 3 &73 5'53 563'3 3 653 3 83 89)3 23 3 3 653 3
(39)

875653379936353236663533596338333 9!53 "3 2#3 6663 563 63 3 785$3 3 !573 $3 8963 53 63 8!73 3 !63"3 3 63 %6&3'75653 563 3 87$3"33"36963"599377933393!3"369633663 "599337993563387$3"369633663"!37793336963 53363&3 3 2#366633!63535""35636!7363"55$3363"39938893 3897635337!35 5$35"53!3889(63$63"55$3993 863 "3 !63 "3 3 6883 3 75$3 5793 853 76&3 45""3 95$3 9$563 3 88953 3 93"3 863 3 579!3 3 )!53355378&3*3666383339 93"336)!73"3 6+363!6367573 965636!7363 55"5$3 3$!8633 !3 $!8633863363533"36!,7- -,73956&31369753 "32#3963563!9539.3 33 • /3013305!93673166756315$332333563 36"393393533"3!634!6566345"5$3443 65&3/335%763 3"563 673 3 93"3443 579633863395 337873)!55 36!3"39933 63 3 3 6753 53 3 96315$3 3 23 3&3/3 %753876635637863"336$6.33"56356335653 39%579387665$35735 9 63 !65$336$538-"-6873 $$5$3 3 -663 $$5$&3/3 673 6$3 563 67573 865$3 573 9663 3 $5793 6!7!3 "3 3 673 3 63 3 %65 3!63 "3 3-73 9%5793 6573 3 3 3 69 3 3 5$!53 89&3/3 963 6$3 563 53 96563 573 5 9 63 69 5$3 7"73 5$!53 !65$3 53 5"5&3 7"735$!538935633"5$333635353 6 93 5""3 63 6875993 53 95$3 53 63 &$&3 755$3 3358346356336386363358&46&3 3 • /3/%739 93 3 9&33 563 3 4!73,3 5663 2#3 66&34 9863833%379665"575388733%73693 "59963 !3 "3 3 %&3 /3 6573 53 563 3 993 3 69573 8653"335563359933%73637$56333

(40)

56553 3 63 5563 3 3 85753 77753 653 63 63 3 63 3 3 53 9993 67563 13 3 3 3 63 9533 56363 393 9933 3 796655753 66339993896333635336353763 3 93 3 !3 73 3 63 53 63 563 533 3 3 3 63"3 73 563 53 3 53 6975363!363 3 6733363 563 579636335386#353$659%3863$6"593 56%337556#37&753386633675399633 3 • '(1)3$*53+,,,%38633357338932'36663 3 -73 '9563 3 2953 23 6763 563 3 57593 663'(1)356378633386#3699!38653933 873 3 3 3!53 83 3 6873 3 63 895793 55.3353755393375635563673 63639756338636338895333393 86656.33933733553336733 3 .3 35993 393 3 953 993 6635533 3 8733693893 3 • /506342*('3$493 253*3(953'865%3 663$/53 1223%3663368853335386353363 3 349345343 3 53 63 3 753 3 953 3 63$359%3856333!3 393563555333 6993636339538563199377733953$359%3 !53563653735633!5533!3'737773563 7863353596#333333593775353333 6(733373!333!6333656385533 3 3 3 !3 3 3 3 593$59%3 3 3 6563 65533593 33777395635636338733863 9563!37383763993777633933633 35935963 3763 3 963 83 3 853 653 3 6(73 349345343 73 3 673653 63 863 3 53!363 3

(41)

• 36763599633363857397336883 673386356333 !!!"#3$38766363 36386337756338573356%33673353 3763755363863539376%3&33 53'73 53 3 93 63 783 3 53'73 53 3 5333786#3$38(&38633333633 638633633)75#33 3

2.6.3 Question answering systems

*653 653*1"3 6663 3 983 3 77836+63 '653 53 93 9%3 3 53 6633 53 3 2%3 3 '65( 63 66#3,$3 93 3 3 '653 653 663 563 3 53 -66+33'656333993763336(75386663 63635535936663793#.3456339#33 !! "#33 3 /$10$%35633'65363663533357%398331563 233 3 42$+63 123 93 53 566732335668"#3/$10$+63 &93 63 563 863 63 697(95(973 896#3$3 663 &6383 3:9563 6735369935637993&93676%33733756393 3#3$63 6763 3 963 68933&633)7#3 $3&633633535333'653896;333333 638963563#3<3363563%3366389563353 633&336369933/3 !!="#33 3 2&339#3 !!5"36895333*1366353353786>3 5# 3 553 593 53 3 3 93 763573 73 633'65#3 # 3'39537563383363'653533'33 32035#3 7# 3 63 )753 3 3 963 3 763 3 )763 6633#3 3 163 3 63 786%34?@4:03 66363 593 63,33 '653 65366.32&339#3 !!5"%3573778636+63'65%3656353

(42)

39367353376336633356383 3 87663 563 3 3 65893 3 5593663!653 563 863 3 53 3 675733673 333673563866333!65379665533 53 3 83 3 63 3 873 13 3 3 863 !653 563 693336563367353!5636533!395393 663 !563 3 653 393 673 53 373 3 9"3 763 36375393#633333739"3 658863 7993$656%33 33 863 3 #3 3 53 63 3 "53583#6357337963373333395633 866593 753 663&5993 3 63 69753 93 #63 3 7563 7753 3 3 7963 3 3 3#6'3 79663 3 65593 663'37663359399339937966'3363339563 3663533563#3353383 3 16(6135633!653653663573#63533655933 3)*+4,-3"35633657533663!6533353,9563 63 )*+4,-3 16(63 77863 3 &73 .8563 2953 3 /63 !656'3 3 3 6953 93 563 63 3 3 3 76853,95636316(6365633!6533333 673 53 57953930315623 3 19"563 63 3 3 !65335"339"3763 3.7537563563633 75363 3 3763 673 63653 3 6875573 57533313 35375373736953373953 3 263 3 5"53 3 6633 33 63)*+4,-3 3 16(63 &134&535-5339367789366359633!936#3!656373 3*.,2, 326363563#9363353663 358"33 753 87663&13453 889563 67573 8653 3 3!653 3 863657375365336573#9363$12%323 763 14&133 6635153677:93663$699%3 93 93 653 75!63573 63$3753 563 63 3#3 7856'33663863367573865333!65353633 7365737786%35.563677793 3 9 http://www.answerbus.com\about\index.shtml

(43)

263 37653 53753 366365353136563 3 7366 3!665339"3#$%%&'3663388737993(63 95)"3 23 563 76 3 63 89635993 3 5939933 73 65 3 3 63 89635993 3633753 563 3653 753 3 73 356653 553 53 33 65593 3 553 75"33 89 35*3 3653(+3 563 3 853 5563 3 ,9-) 338736393 563(.3 85355633,93563 /0,12345416,7)"3.3 3+93+53+3 563 6733 563 73 63 67 3 3 3 3 8635993 3 7"3,*9533 563 75363335333955657395363963 9963366335*33993789367363363336"33 3 23 563 687 3 3 93 93 573 793 3 593 63 3 933 553 5*9 3 553 75 3 3 653 653 #1'3 666"36753 7*6563 3 963 993 13 6663 *3 3 93 935733759533757535366"3 3 8,,52611.3 #289339"3$%%:'3563 3 93935736633 9963663 3 5*3993 3 763 6*5763 *3 3"38,,5 2611.3 63 *983 3 ;<3 3 663 3 4"3 =3>3 68*5653 3 ?5*653 365665"3.3 573 8*563 39953 759556@3 • A5937 3"" 3(33-)33(2335)"3 • 2535*9 3""3(+356303286-)3 • .69375 3""3(533679385733367935793286)3 3 32343 #673 !*53$%%%'3 (563 3 93 93 573 573 663 ,9563 563 3 783 853 666")3 .3 53 893 53 *9853393935736367533673!*5356333 336336933,9563"3.3*7356389 33234363 653393533863356@363355B3633 895B3 3 633 575"332343 563 7863 3 39953 96@3 3 86 3 6573 86 3 93 93 3 3 9936"3 3

(44)

2.6.4 Summary of pattern-matching techniques

56353663323666363673957938753373 93 76 3 !63 3 5653 2"3 3 #13 6663 63 3 6856573$%&3 3 963 583 3 53 66'3573 65763 63 6663 3$%&3 96 3 23 563 67'3 3 53 563 3 5653 3993 65893 95793 8753 73 3 3 3 9'3 893 73 533(6565 3 3 2356367'33937993)1#73*56766353783+,3563593353 663366-3(6563933)1#333.793385333 /5653 3%6 3)1#73 9063 9503(653 653 3 553 593666353633778539393(65'3353 363336339506339366 3'3)1#7356333 85636663536335636589575 3)1#73563655933)1#)5353 63 3 653)1#35963 563 3 5533 3!93!53!3 63 1/%4"3 3 16263 333 3 3 '3)1#73 63 3 63 3 6757'3657333895793965635337538766 3 '3 )1#73 963 3 6663 653 6993 93 93 65 3 )1#73 953 83 3 )1#73 563 875793 3 753685336733393939365 3 3

2.7 Machine learning techniques

%4&543369953*7889,353331234565735363353 3 5953 3 563 65'3 93763 3 656'3 583 75536059963387575'335633093533753 865 336333:36373056339535633 7(56553355353393305385756333 ;3 *%4&543 3699537889, 33 63 7966553 753 953 75(635<3 • .856395'357357963553393385733833 33583653363389358=83856 3 • 965'3573 57963853 3 63 353 8963*53 3 667539995,353937966'3563563963799368563 95 3

(45)

• 573 953573 57963 653 3 9393 3 953 3 3 7563 53 673 33 3 53 3 93 3 3 63 375395367639363563563 6336875937633 685639538533555939!3 3 " #$3%&'''(37966553375395353353363996)33 • *76573753 953 88763573 6753 3 933 3 63 3 85956573 3 3 8653 3 85956573 87563335963!!3353"3"93 %""(!3 • *9573 753 953 88763 573 5793 953 75#63 3 3 363 859556!3+,8963 3 63 88763 )312345467892253573336856333633353 3 7873 5793 673123454678 4595357333956633 7753 963 3272943864958573 363 3 533 75633675736733378!3 3 "7539535635936353786-63785939556576363 3,893%1993.//03&''13&''2(!3468533733637633 3335953393373953887363535636563 3 593 58383 963 563 65593 3 3 563 3 6347597325218 2747826521827478352521827478887!3237633 75653 63 3 79653573 3 3 593 393 86533 573 56363 53 796655753%85753 3 6(3 3 567-63 9535633336856357539535738639933553 353563336635338573%79665(33633335833 6533 ,893 38 722598 72463 95!3 23 3 567-63 953 8873 3 3 563 863 3 63 3 56763 53 6875573 56!343 ,893 353%553 343.///(3 935695663 3 363393373386633567333796363 356!366356336353375387663379665333 567!33 3 23 3 8873 3 7863 563 7653 63 3 5593 !363 ,73 583839633378633995338895)3

(46)

• 73738535883856356376536335673 • 3563333 6365!5573963"363#73 37373376536356337966533633 33583 • 453863573$336 35833!835335673 533 395633863 3 563 35993367535359353783%356375"38$5633 33!5$5!3363636333 6365!55733 75!3 3

2.8 ALICE chatbot system

1&231557593&5!56573 23 835123'9973())*3 563 3 73 66 3 3 !!63 3 3 53 59!365!3 12+&31557593 2995!3+,-83&!!33$6533.+&3386335883 9633395!359!336573563312+&3/763312345678493 73 7!3 63 3233473573 763 3 583 3 66753342343 5735 8956331&2368633 3 65 895753 3 12+&3 ,63 53 633 -8! 63 6875993 63 393,3+&33!36355!37636331&203353 65 895563 3 6 9563 3 573 3 #653 3 76313 5563!93312+&35633533 33889373533763 735335"333#856331&23!5379395993 !-63 3!3533283 57993 55!3895737!563333 3 673 73 3 3 1&23 73 393 $5993 3 3 1&23 5365353337!3633483675353763 639933753332833

2.8.1 Overview of AIML

12+&3467563 3 79663 3 3 /763 7993 12+&3 /763 3 85993 67563 3 $53 3 7 83 8! 63 3 87663 53'9973 ())*337312+&3593756333/7633663533<aiml>3!3 10 http://www.alicebot.org/
(47)

57386633123653123763338335637993 1234563 3 7563 3 3 3 8593 63 7993 <topic>3 3 <that>33 3<topic>386336337563!33<that>33 9633739636863 39931233563635353"#3 3 3 123 83 93 563 6589!3 765653 93 36!3 6876!3 3 3 5973 6963$3 3%3 3633 76563 3 9388763 963 3 963136593687356363368336!33359737763 753 95&363 35636563 3 123 993 93 3593 73 773 83 83 3 3 123#'#3 63 85639589359763 53 73833 3 389356331236863323563658963!338937656633 93895!3&3(3 389339637531236357373 63 !3 753 3 86!353 75593 6866!3 3 76593 7993 3 8373 3 563 3 68663 3 3 7563 3)3 (89!3 3 <javascript>3 3 9963 53 675853 5653 3 8963 3 3 *aiml version=”1.0”> <topic name=”the topic”> <category> <pattern>USER INPUT</pattern> <that>THAT</that> <template>Chatbot answer</template> </category> .. </topic> </aiml>

(48)

2.8.2 Types of categories

3338633756357393376537563

123456768293597363 538633333 59763 33

<category><pattern>10 DOLLARS</pattern>

<template>Wow, that is cheap!</template></category> 3 23 563 763 53 363 583 563!"3 996#3 3 3 6863 5993 3 !$ 3356378%#3 3 & 982768293593363 5386363 5976363863 6933 3 753 87663 593 3 73 563 67533 3 88853733 59763733583333553 53539857933 3 <category><pattern>5 *</pattern> <template>It is five</template></category> 3 23363583563!'3996#3356373563393733 36863563!235635#3 3 ( 96597 68293597 7 88563 3 8963 3 863 3 893799633837376593653<srai> or <sr>3 833

2.8.3 The recursive categories

)3 88957563 793 3 753653 3 7653 63 573 73 563 5333733!12#363557593599573 3!*+#3 3 3 !6596,686#3 !67573 5#3 !69573 75#3 !65893 765#3 3!63 695#3-657993 3 3 3 53 759556333<srai>33<sr>3635312).33

(49)

885366358353335333 • 58953393 573333!633933 673 53 !573 63 3 3 3 789"3 3 765#3 87663 863 673 5863 3 3 65893 63 83$3 "893%433&!3!3 763563'(356388353%)3 563 76(3 • 3493!53663199358633#3363533 363686338833363833%5(3%53 *(33%5(338833%99(3 • 73 689953 3 3 3 %3 3 (3 563 3 73 56&3!5735638833%3(3 • 473 &!63 23 73 63 863 3 773 !53 85793&!3966356385"63365"6363!#3 563&!3 563 73 53 3 673 3 63 6!3!5993 32356376353756333393!53 3&!3865533533&!63563%(333 99!53 583 863 3 3 3 93 3 593 9753%+3 3,-./30(33 93 3563%,-./30(3 3 93 3 963 %+(3 3 53 53 563 93 3 !33 %,-./(3 1 255533&3 !3663 583 53 3 33863 33 753 53 68663 7&3 3 3 3 673 663!53 %36(3 3%4(3!5993 3 3 63 3 !3 683 6763 3563 563 %36(333673563365"333673 3 $53113599663531253689635333631253 3 53 152.3 3 633 3#6563 #983 53 563 6563 3 86363338#5636333765#37563

(50)

3

Sample (a) shows the use of <srai> to simplify the language, obtained from the Afrikaans version developed in this thesis.

<category>3

3<pattern>WAT IS JOU NAAM</pattern>33

3<template>My naam is Karike. Wat is jou naam?</template>33 </category>1

<category>

<pattern>HET JY N NAAM</pattern>

<template><srai>WAT IS JOU NAAM?</srai></template> </category>

<category>

<pattern>HOE HEET JY</pattern>

<template><srai>WAT IS JOU NAAM?</srai></template> </category> 1234567897552676742727926737672277567 <category> <pattern>YES *</pattern> <template><srai>YES</srai><sr/></template> </category>

Whatever input matched this pattern, the portion bound to the wildcard * may be inserted into the reply with the mark-up <sr/>, which is equivalent to <srai></star></srai>

Figure 2. 2 Samples of AIML categories, illustrate the usage of the recursive categories.

(51)

2.8.4 Preparation for pattern-matching in ALICE

3 653 3 8753 8763 73 583 3 3 123 5836386633387666 3 !" #3956538766"3 $" %7535838337367"3 3 12345678996547634 #3 95653 87663 5&9&63 3686 3 66553673 689553 3835539565363635393$"$"33 !" '65539565356335657388953335833863 3535535335833935633963533 673 689553 3 83553 9565"3 23 73 565563 3 3 53 53 53 563 63 63 3 &553 3 3 6763 3(63 3 85)33)653338973533563888535"3 $" '7368955395653563333753373 583 563 8553 53 6763 653 65893 963 95*3 *53 676338563+653*633)7953*6"3 ," %3553956535&9&6336*6 3 • -&53 87533 583 3 *3 53 7859353 68737&65"3 • &53 3 583 963 3 883 763 3 75953 3 75353312386"3 33 4853335836655336895533333889363 835539636"33
(52)

3 Input Substitution normalised form sentence-splitting normalised form pattern-fitting normalised form "What time is it?" "What time is it?" "What time is it" "WHAT TIME IS IT" "Quickly, go to

http://alicebot.org!" "Quickly, go to http://alicebot dot org!"

"Quickly, go to

http://alicebot dot org" "QUICKLY GO TO HTTP ALICEBOT DOT ORG"

":-) That's funny." "That is funny." "That is funny" "THAT IS FUNNY" "I don't know. Do

you, or will you, have a robots.txt file?"

"I do not know. Do you, or will you, have a robots dot txt file?"

"I do not know" "Do you, or will you, have a robots dot txt file"

"I DO NOT KNOW" "DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT FILE"

Table 2. 2 The normalisation process in ALICE pattern-matching processing 3 123456789785326886 93356338766338753358383336733583 6339953 3283<that>39<topic>3!93358383 56378633396 33 " 39563583

<that> tag which 96338563736#39563533 63363583 93$3856336353%5663 &963 93$3'3 <Topic>3 3573 9963 3 583 3 83 68663 953 5338573 !9 3$38573353%5663 &963 !93$3'3

(53)

Normalised

input Previous chatbot output (normalised)

Value of topic

predicate Input path

"YES" "DO YOU LIKE CHEESE"

"" "YES <that> DO YOU LIKE CHEESE <topic> *"

"MY NAME

IS NOEL" "I GUESS SO" "MUSHROOMS" "MY NAME IS NOEL <that> I GUESS SO <topic> MUSHROOMS" Table 2. 3 Producing the input path by the AIML interpreter.

2.8.5 The ALICE pattern-matching algorithm

3123583563373333353396383 73573563363356353733675335363 33 79663123456378923573 765663 333 3637993963449273!3 83763 373"337638633 56333 33893 335976333 33#8635633$88353%3&'''3 7633 373 33 56363 39933863!(''''3533763 3 3738)67312*35"333 39 36353383 563 +93 3 3 3 3 7563 3 73 9 3 3 7563 3 <template>3,3!-9973&''."33 3 3 3

(54)

5335633569565333 3638!63"!3#9973 $3 %3 3 3 3 &8!63 63 3 3 63 3 3 863 85799'3 63 563 563 563!883 53 3 53 3 73 3 563 3 53 68953 %563 8763 3 53 3 (53 3 3 9963 3 5) 753%56335633*(8+35336859335353363,3673 3 %3 8)!753 87663 793 3 6753 53 !63 3 3593 66!3 (59399359633963356357579935333%359366!3 563 7!863 3 3 3 673 63*7-.+3(573 63 763 8635963 3 96337393536363563(3596336)963/3593633 *83 !+3 3 !563 563 !3 3 8655303 3 6!3!3 3 112/3583(26333!89334653*#35633!5+3793 3(5363*-6#1%6276809:6;1</+3(3-35633&8!633 3 166!3366=3583663(533(3*>+338)!75395!3 889536538)56367337237253754636399(6-3 ? *23 393 63 3 693 363(53673 333*@6+33 6733533!73993(636533>3533!73-3 &37233933353369363(53(3>3536333 *>6+36733!75335933>3,633!73233!73 -3 &372333933353369363(5363535363 33*A6+33993!55365363358399(53*>+3363533 !73233!73(6337357372333833563 93383*>+37233333358+3"#9973 $3 #33!735633387663686333!893396333 7356387663358337673383233!7333 !3 6(3(5993 33 673 63*73 23 3 3 5+3533 5996633!75387663 3 3 3 3

(55)

3 3 3 3 3 3 33 63563 8553 3 573373 63 67385533336396385533 38633333 985799393859933633376333333 3 33 13 !893 3 1"2#3 73 3 3 75$3 3 375$3 87663363535$%3&'335$%3&(36875933 3 3 <category>

<pattern>_ WHAT IS 2 AND 2</pattern>

<template><sr/> <srai>WHAT IS 2 AND 2</srai></template> </category> <category> <pattern>WHAT IS 2 *</pattern> <template><random> <li>Two.</li> <li>Four.</li> <li>Six.</li> </random></template> </category> <category> <pattern>HIYA</pattern> <template><srai>HELLO</srai></template> </category> <category> <pattern>HELLO</pattern> <template><random>

<li>Well hello there!</li> <li>Hi there!</li>

<li>Hi there. I was just wanting to talk.</li> </random></template>

</category>

(56)

Figure 2. 5 Tracing of ALICE matching process Applying step 1, match with

Normalisation Process

Hiya, what is 2 and 2?

HIYA WHAT IS 2 AND 2

_ WHAT IS 2 AND 2

<srai> WHAT IS 2 AND 2 </srai>

WHAT IS 2 AND *

WHAT IS 2 *

Two. Four. Six

No match. Go back one step. Try to find

No match. Back another step. Try to find

Match found, select an answer randomly from the list <sr/>

HIYA

HELLO Match with

Well hello there! Hi there!

Hi there. I just want to talk.

Hi there! Four. Atomic match, select an answer

(57)

2.9 Elizabeth system

95359957335633853332138 !"353#5733 $5%63 6975"3 6%65%5"3 3 863 6 3 !756!63 $3 3 733 95633576339&55953385938595'3 (#9 35636363367583533&359"3#37395356363#533 67583 7!!3 5'3)63 563 3 65 93 776"3 33 73 %9*83636#3535 %3'+'373675837!!36335&3733563 3%!5799'3237396335&3%65 33%636875937'3 3

W: Welcome message Q: Quitting message

N: No match V: Void input

I: Input transformation / : Comment

K: Keyword pattern R: Keyword response

M: Memorise phrase N: No match

O: Output transformation &: Action to be perform Figure 2. 6 Elizabeth script command notations

2.9.1 Elizabeth’s script file format

1367583593!37533!63%386,3 -' .75837!!3956395 3#97!"3$5333/#3!66 6'33 ' 28%36!53%96"3#57363#5302133!86358%333 !3337!8593#53353/#6'33 2' 3%8%36!53%96"3#57363#53031337 638693 8%633388853633686'33 4' (#386"3#57363#530(1333/#38333 !7"399#3 3 3 6863 3 3051'3)3 3 #3 863 3/#3 86,3 65!893 86"3#573!73 93 3 65 93 #"3 3 7!8653 86"3#573!73 676"3 866"3#6"3 65 6"396"35 335 '333 3

(58)

2.9.2 Dynamic processing in Elizabeth

3 573 87663 589563 853 3 63 3 75635735563 3 675835933765356338663637563339953 3 & {script command}3 3 573 87663 7563 593 5 3 5653 3 953 763 63 876663 73 3 835793653!3333753876633 ! 153 3 67583 761363 3 3 3 67583 53 5363 3 "56338973533335353563933#3"89 3 I my sister => my sister & { K MOTHER

& { N DOES ANYTHING ELSE ABOUT YOUR MOTHER? } R HOW WELL YOUR MOTHER AND SISTER GET ON? }

3 3 3 7563 3 3 83#569 33753 7763 5335839 33663599367333$3%&'( 3 5356333686333(3599333336863 9563 3 3$3 23 3 3 3 3 3 793 7$635993 3 3 3 3 67583)79 3375353%&'(3 7763 93 3 3 3 3 3$336635993 3 3 3 3 9563 3 8973393353"5663 * %56537633633933633368633 375337356333%3993333 9933393#3"89 3 3 K [ ] MY NAME [phrase] & { M label [phrase]} R YOUR NAME [phrase]? 3 23 363 583 5633 3 563+3 3 3 3 635993 3 ,&-(3.1%'3 2)3+&.3 13 3 63 53 33 993 953 +35633 / 49537613631933675837336533 7$3696353#3"89 3

V\: deletes all current void messages.

(59)

2.9.3 Elizabeth’s pattern-matching algorithm

36533753876633753356333735833 753 63 583 3 93 763 5653 68763 3 63 3 8753353637763335837831232425262728292 992 2 2 2 99 3!53"#3 663 3 893 3$95%3 73 3 93"&3 599663 3 753 83 3 3753 8766 3'3753 87663 59633995353663 3 ( )753 583 653 963 3 73 3 8893 3 573 876663 32336393563889579333753333 (*356353677665335356388953+637 3 " )753,38633993,633889535363593 3733373563 3'3686356353775333 56373 3 53 3 563 3 9563 3 68663 3 3 3 563 6973 93335373356368635993373533 3 363 53 53 3 6373 7763 53 3 3 6665 3 23 3 563 3 5738766353599338895337333835363 5 32337339933333683 3 - .83653963633889533639536353 633337337533388376 33 & 233737763533,386335337533563 83 3 3 3 35366635993 3 6973 93 3 3 ,3333333,36663599335689 3 / 05335738766633853373756353 356 3
(60)

3

/ The Script begins with Welcome, Void, No-Keyword and Quit responses:

‘001’ W HELLO, I'M Elizabeth. WHAT WOULD YOU LIKE TO TALK ABOUT? ‘001’ V CAN'T YOU THINK OF ANYTHING TO SAY?

‘002’ V ARE YOU ALWAYS CHIE ?

‘001’ N TELL ME WHAT YOU LIKE DOING. ‘001’ Q GOODBYE! DO COME BACK SOON. / Next come the Input transformations: ‘001’ I mum => mother

‘002’ I dad => father

/ Then the Output transformations: ‘001’ O i am => YOU ARE ‘002’ O you are => I AM ‘003’ O my => YOUR ‘004’ O your => MY ‘005’ O me => YOU ‘006’ O I IS => I AM ‘007’ O YOU IS => YOU ARE

/ And four groups of Keyword transformations: ‘001’ K I THINK [phrase]

‘001’ R WHY DO YOU THINK [phrase]? ‘002’ K MOTHER

‘003’ K FATHER

‘001’ R TELL ME MORE ABOUT YOUR FAMILY.

‘002’ R WHAT DO YOU REMEMBER MOST ABOUT YOUR CHILDHOOD? ‘003’ R ARE YOU THE YOUNGEST IN YOUR FAMILY?

‘004’ K [phrase1] IS YOUNGER THAN [phrase2] ‘001’ R SO [phrase2] IS OLDER THAN [phrase1]. ‘005’ K I LIKE [string]ING

‘001’ R HAVE YOU [string]ED AT ALL RECENTLY? ‘006’ K [ ] my [phrase]

& {M [phrase]} ‘001’ R YOUR [phrase]?

(61)

3 User Input

Input Transforming

Keyword Patterns Output Transforming Respond Actions Dad loves Mum ‘001’ mum=>mother ‘002’ Dad => father ‘001’ K MOTHER ‘001’ R - TELL ME MORE ABOUT YOUR FAMILY. - My sister is a teacher. - ‘006’ K my [phrase] [phrase] => sister is a teacher ‘001’ R - YOUR SISTER IS A TEACHER? ‘fam’ M sister is a teacher My brother is younger than me. - ‘004’ K [phrs1] IS YOUNGER THAN [phrs2] [phrs1] => my brother [phrs2] => me ‘001’ R [phrs1] ‘003’ my => YOUR [phrs2] ‘005’ me => YOU SO YOU IS OLDER THAN YOUR BROTHER. - I like reading - ‘005’ K I LIKE [string]ING [string] => read ‘001’ R - HAVE YOU READED AT ALL RECENTLY? -

Table 2. 4 Tracing an Elizabeth output

2.9.4 Implementing grammatical rules in Elizabeth

95363359533873336739656333673 653 3 63 3 583 653 963 3 8633 963563 85633575336333377863375 63393 938766536553399!53"33 S => NP VP NP => D N VP => V NP 3 56373386333583963393!5368755736#3633 563636!35353$%3

(62)

I a => (A d) I the => (THE d) I cat => (CAT n) I dog => (DOG n) I likes => (LIKES v)

I ([brak1] d) ([brak2] n) => (([brak1] D) ([brak2] N) np) I ([brak1] v) ([brak2] np) => (([brak1] V) ([brak2] NP) vp) I ([brak1] np) ([brak2] vp) => (([brak1] NP) ([brak2] VP) s) K [any?]

R [any?]

W TYPE A SENTENCE USING: A, THE, CAT, DOG, RABBIT, BITES, CHASES, LIKES

Figure 2. 8 Grammar representation rules in Elizabeth 3 5636758366333973663 953 3 13!""3#!2"$3 3 133 13 4%$3&1''23 '2!31!!3(2)!3 #61337395*6333 9533+++34,3+13",3" ,3++(2)!3-,3++34,3+4%$3",3" ,3- ,36,3

2.10 A comparison between ALICE and Elizabeth

.338/5063136756339915377906563133 3 • 1(230633658938389338635803308033 9630653 65893 82753 9533639530663 58030963*1386330803096333368633 • 37065/375403063531(235637653633*3033 36633235630633658955335803379953737563 7065/933 1/3 3 03 3 63 0963 539533 7063 779533553157356369/3388953309393735353563 88957933375/35333 • 23 1(23 3 563 3 5953 3 753 13 6163 53 3 763 3 689553 0533 95653 87663 338555370633 37065/38766335638/563313385533673313 676337535360963035635633/5993539533 • 36365557303531(23563382753953 157356363386338256367335639535633

(63)

53 3 963 83 73 593 953 563 3 6863 77533356338373 • 3 6663 73 73 8693 8!6"3 3 563 5633 789573 53953 !3 3 763 665553#5633 7!63 63 353!6!5936633 • 366636!883565395$6357387663573 996337563377!359337653563!38663563 765333!3573633%563531&23 • 2336358!3863!5337653953563353 536633!653533697536866333 9563 1&23 563 63 3 3 69753573353 !89573 68663 • 953 63 3 5953 353 3 553 6!7!33 3 673 !6533579396563573563385331&23 • 953 57863 96563 9633 993 6863 53753 9853 3!633!633663333 '993 1&2363 65893 3 3 6!533 !573 553 87663 886353563673573563567!66353783(3 3

2.11 Evaluation methodologies

)75793 88957563 3 9!53 33 566!63 53 9!3 553 !53*+,,,-37756639!355353633./5637!63 3906793875793663331!5539!53386633563 5995663373356333751!6233 3 33 4&)3 6663 73 3 9!3 3 7853 !8!3 563 3.93 6238733!8!3*3!6331993+,,(35363393 788(39953393788(-3!3563887363333575359!3 66633!8!38633!6358!335753 3 #3 &3 )53 *&3 7889-3 78553 63 3 !63 3 9!3 753 7653 763 #3 &3 )53 563

References

Related documents

Zero Error and Zero Correction in a Screw Gauge: Usually, in a Screw Gauge, the zero of the circular scale, coincides with the base line of the main scale as shown below and we

On the Return of Christ, there will be peace on earth and kings and nations shall serve Him; described prophetically as in the reigns of King David and Solomon - types of Christ

We explore the influences of different choices made by the practitioner on the efficiency and accuracy of Bayesian geophysical inversion methods that rely on Markov chain Monte

As breast cancer VEGF levels stay unchanged at low levels and increase with high concentrations of zoledronic acid, our study provides some limited evidence that the decreases in

Inhalation: Mists, vapors or liquid may cause severe irritation or burns. Skin contact: Causes

The instrument contained 13 attitudinal statements related to the newsletter received, measured on a seven-point Likert scale: reactions to the site (newsletter frequency, drop

Here, defendant issued a mandatory directive and required local election officials to apply a presumption of validity to all signatures on absent voter ballot applications and

For the poorest farmers in eastern India, then, the benefits of groundwater irrigation have come through three routes: in large part, through purchased pump irrigation and, in a