The SynDiKATe Text Knowledge Base Generator


Udo Hahn

Text Knowledge Engineering Lab

Albert-Ludwigs-Universität Freiburg

D-79085 Freiburg, Germany

[email protected]

Martin Romacker

Text Knowledge Engineering Lab

Albert-Ludwigs-Universität Freiburg

D-79085 Freiburg, Germany

[email protected]

ABSTRACT

SynDiKATe comprises a family of text understanding systems for automatically acquiring knowledge from real-world texts, viz. information technology test reports and medical finding reports. Their content is transformed to formal representation structures which constitute corresponding text knowledge bases. SynDiKATe's architecture integrates requirements from the analysis of single sentences, as well as those of referentially linked sentences forming cohesive texts. Besides centering-based discourse analysis mechanisms for pronominal, nominal and bridging anaphora, SynDiKATe is supplied with a learning module for automatically bootstrapping its domain knowledge as text analysis proceeds.

1. INTRODUCTION

The SynDiKATe system belongs to the broad family of information extraction (IE) systems [1]. Significant progress has been made already, as current IE systems provide robust shallow text processing such that frame-style templates are filled with factual information about particular entities (locations, persons, event types, etc.) from the analyzed documents. Nevertheless, typical MUC-style systems are also limited in several ways. They provide no inferencing capabilities which allow substantial reasoning about the template fillers (hence, their understanding depth is low), and their potential to deal with textual phenomena is highly constrained, if it is available at all. Also, novel and unexpected though potentially relevant information which does not match given template structures is hard to account for, since system designers commit to a fixed collection of domain knowledge templates (i.e., they have no concept learning facilities).

With SynDiKATe, we are addressing these shortcomings and aim at a more sophisticated level of knowledge acquisition from real-world texts. The documents we deal with are technical narratives in the German language taken from two domains, viz. test reports from the information technology (IT) domain as processed by the itSynDiKATe system [8], and finding reports from a medical subdomain (MED), the framework of the medSynDiKATe system [10, 9]. Our first goal is to extract conceptually and inferentially richer forms of knowledge than those captured by standard IE systems, such as evaluative assertions and comparisons [25, 24], temporal [26] and spatial information [22]. Second, we also want to dynamically enhance the set of knowledge templates through incremental taxonomy learning devices [12] so that the information extraction capability of the system is increased in a bootstrapping manner. Third, SynDiKATe is particularly sensitive to the treatment of textual reference relations [27, 6, 14]. The capability to properly deal with various forms of anaphora is a prerequisite for the soundness and validity of the knowledge bases we create as a result of the text understanding process and likewise for the feasibility of sophisticated retrieval and question answering applications based on the acquired text knowledge.

2. SYSTEM ARCHITECTURE

The overall architecture of SynDiKATe, an acronym which stands for "Synthesis of Distributed Knowledge Acquired from Texts", is summarized in Figure 1. Incoming texts, Ti, are mapped into corresponding text knowledge bases, TKBi, which contain a representation of Ti's content. This knowledge base platform may feed various information services, such as inferentially supported question answering (fact retrieval), text passage retrieval or text summarization [7].

2.1 Sentence-Level Understanding

Figure 1: Architecture of a SynDiKATe System

Conceptual knowledge about the different domains is expressed in a Kl-One-like description logic language [28]. Corresponding to the division at the lexical level, the ontologies we provide are split up into one that is used by all applications, the Upper Ontology, and several dedicated ontologies that account for the conceptual requirements of particular domains, e.g., IT (HardDisk, ColorPrinter, etc.) or MED (Gastritis, SurfaceMucus, etc.).

Semantic knowledge accounts for emerging conceptual relations between conceptual items according to those dependency relations that are established between their corresponding lexical items. Semantic interpretation schemata mediate between both levels in a way as abstract and general as possible [20]. These schemata are applied to semantically interpretable subgraphs which are, from a semantic point of view, "minimal" subgraphs of the incrementally built dependency graph. Their bounding nodes contain content words (i.e., nouns, verbs, and adjectives, all of which have a conceptual correlate in the domain ontologies), while all possibly intervening nodes (zero up to four) contain only noncontent words (such as prepositions, articles, auxiliaries, etc., all of which have no conceptual correlates). Semantic interpretation schemata are fully embedded in the knowledge representation model and system (cf. Figure 1).

The ParseTalk system, which comprises the lexicalized grammar and associated dependency parser, is embedded in an object-oriented computation model. The dependency relations are thus computed by lexical objects, so-called word actors, through strictly local message passing, only involving the lexical items they represent. To illustrate how a dependency relation is established computationally, we give a sketch of the basic protocol for incremental parsing [5]:

After a word has been read from textual input by the WordScanner (step A1 in Figure 1), its associated lexeme (specified in the Lexicon) is identified (step A2) and a corresponding word actor gets initialized (step B1). As all content words are directly linked to the conceptual system, each lexical item w that has a conceptual correlate C in the domain knowledge base (step A3) gets instantiated in the text knowledge base (step B2). The lexical item Festplatte (hard disk) [...] talked about in a given text.

For integration in the parse tree, the newly created word actor searches its head (alternatively, its modifier) by sending parallel requests for dependential government to its left context (step C). The search space is restricted, since these requests are propagated upwards only along the 'right shoulder' of the dependency graph constructed so far. All word actors addressed this way check, in parallel, whether their valency restrictions, i.e., grammatical and conceptual constraints, are met by the requesting word actor. Step D simulates a conceptual check in the text knowledge base, step E illustrates a test in the discourse memory.

If all required constraints are fulfilled by one of the targeted word actors, an immediate semantic interpretation is performed. This usually alters the conceptual representation structures by way of slot filling (step F).
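The head search of this protocol can be illustrated with a minimal, sequential Python sketch. It is not the ParseTalk implementation: the names `WordActor`, `right_shoulder`, `attach` and the toy table `VALENCY` are hypothetical, the parallel message passing between word actors is replaced by a plain loop, and the conceptual and discourse checks (steps D and E) are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy valency table (an assumption for illustration): which word class a head
# accepts as a modifier, and under which dependency relation.
VALENCY = {
    ("verb", "noun"): "object",
    ("noun", "prep"): "ppatt",
    ("prep", "noun"): "pobject",
}

@dataclass(eq=False)
class WordActor:
    form: str
    word_class: str
    head: Optional["WordActor"] = None
    relation: Optional[str] = None
    modifiers: list = field(default_factory=list)

def right_shoulder(root):
    """Return the nodes on the right edge of the tree, lowest node first."""
    path = [root]
    while path[-1].modifiers:
        path.append(path[-1].modifiers[-1])
    return list(reversed(path))

def attach(root, new):
    """Ask each actor on the right shoulder, bottom-up, to govern `new`."""
    for candidate in right_shoulder(root):
        relation = VALENCY.get((candidate.word_class, new.word_class))
        if relation is not None:  # valency constraints are met
            new.head, new.relation = candidate, relation
            candidate.modifiers.append(new)
            return True
    return False  # no head found in the left context

# Simplified fragment of sentence (1): "verkauft Notebook mit Festplatte"
verkauft = WordActor("verkauft", "verb")
notebook = WordActor("Notebook", "noun")
mit = WordActor("mit", "prep")
festplatte = WordActor("Festplatte", "noun")
results = [attach(verkauft, w) for w in (notebook, mit, festplatte)]
```

Restricting the candidates to the right shoulder keeps the number of addressed actors proportional to the depth of the tree rather than to the number of words read so far.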

Semantic interpretation consists of finding a relational link between the conceptual correlates of the two content words bounding the associated semantically interpretable subgraph. The linkage may either be constrained by dependency relations (e.g., the subject: relation of a transitive verb such as "sell" may only be interpreted conceptually in terms of agent or patient roles), by intervening lexical material (e.g., some prepositions impose special role constraints, such as mit (with) does in terms of has-part or instrument roles), or it may be constrained by conceptual criteria only (as with the genitive: dependency relation, which unlike subject: imposes no additional selective conceptual constraints for interpretation). The corresponding knowledge about these language-specific constraints is densely encoded in the Lexicon class hierarchy, an approach which heavily relies on the property inheritance mechanisms inherent to the object-oriented paradigm.
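The three kinds of constraints can be sketched as a filter over the roles that the domain knowledge permits between two concepts. The following Python fragment is only an illustration under stated assumptions: the tables `RELATION_ROLES`, `PREPOSITION_ROLES` and `DOMAIN_ROLES`, the function `interpret`, and the role names `produced-by` and `owned-by` are hypothetical, and the real system encodes this knowledge in the Lexicon class hierarchy rather than in flat tables.

```python
# Roles licensed by dependency relations and by intervening prepositions.
# None means that no additional selective constraint is imposed.
RELATION_ROLES = {"subject": {"agent", "patient"}, "genitive": None}
PREPOSITION_ROLES = {"mit": {"has-part", "instrument"}}

# Toy domain knowledge (assumption): roles that may link two concepts.
DOMAIN_ROLES = {
    ("Sell", "Company"): {"agent"},
    ("Notebook", "Hard-Disk"): {"has-part"},
    ("Notebook", "Company"): {"produced-by", "owned-by"},
}

def interpret(head_concept, mod_concept, relation=None, preposition=None):
    """Return the conceptual roles that survive all applicable constraints."""
    candidates = set(DOMAIN_ROLES.get((head_concept, mod_concept), set()))
    if preposition is not None:
        allowed = PREPOSITION_ROLES.get(preposition)
    else:
        allowed = RELATION_ROLES.get(relation)
    if allowed is not None:
        candidates &= allowed  # keep only the licensed roles
    return sorted(candidates)

subject_reading = interpret("Sell", "Company", relation="subject")
mit_reading = interpret("Notebook", "Hard-Disk", preposition="mit")
genitive_reading = interpret("Notebook", "Company", relation="genitive")
```

In the genitive case the result is determined by the domain knowledge alone, which mirrors the purely conceptual constraint described above.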

2.2 Text-Level Understanding

2.2.1 Referential Text Phenomena

The textual phenomena we deal with in SynDiKATe establish referential links between consecutive utterances in a coherent text, as illustrated by three possible continuations of sentence (1), with three different forms of extrasentential anaphora:

(1) Compaq verkauft ein Notebook mit einer Festplatte, die von Seagate hergestellt wird.
(Compaq sells a notebook with a hard disk that is manufactured by Seagate.)

(2) Pronominal Anaphora:
Es ist mit einer Pentium-III-CPU ausgestattet.
(It comes with a Pentium-III CPU.)

(3) Nominal Anaphora:
Der Rechner ist mit einer Pentium-III-CPU ausgestattet.
(The machine comes with a Pentium-III CPU.)

(4) Functional Anaphora:
Der Arbeitsspeicher kann auf 96 MB erweitert werden.
(The main memory can be expanded up to 96 MB.)

Figure 2: Dependency Parse for Sentence (1)

Figure 3: Conceptual Interpretation for Sentence (1)

The results of sentence-level analysis for sentence (1) are given in Figure 2, which contains a syntactic dependency graph (together with five configurations of semantically interpretable subgraphs), and Figure 3, which displays its conceptual representation. For text-level analysis, pronominal anaphora still heavily depend on grammatical conditions, namely the agreement of the antecedent ("Notebook") and the pronoun ("Es" (it)) in gender and number; conceptual criteria also apply insofar as a potential antecedent must fit the conceptual role (or case frame) restrictions when it is integrated in governing structures, say, the head verb of the clause. In general, however, the influence of grammatical criteria gradually diminishes for other types of text phenomena, while the influence of conceptual criteria increases. For nominal anaphora, number constraints are still valid, while a generalization relation between the anaphoric noun ("Rechner" (machine)) and its proper antecedent ("Notebook") must hold, in addition. In the case of functional anaphora, no grammar constraints at all apply, while quite sophisticated conceptual role path conditions come into play, e.g., "Arbeitsspeicher" (main memory) being a constituent physical part of "Notebook".
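The shift from grammatical to conceptual constraints across the three anaphora types can be made concrete in a small Python sketch. Everything below is an assumption for illustration: the dictionaries `ISA` and `HAS_PART`, the feature dictionaries, and the function `licensed` are hypothetical, the conceptual role restrictions on pronouns are omitted, and the role path condition for functional anaphora is reduced to a single has-part step.

```python
# Toy domain knowledge (assumptions for illustration only).
ISA = {"Notebook": "Computer"}                          # taxonomic links
HAS_PART = {"Computer": {"Main-Memory", "Hard-Disk"}}   # partonomic links

def ancestors(concept):
    """Return the concept followed by all of its superconcepts."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = ISA.get(concept)
    return chain

def licensed(kind, anaphor, antecedent):
    """Check the constraints that the given anaphora type imposes."""
    same_number = anaphor["number"] == antecedent["number"]
    if kind == "pronominal":   # grammatical agreement in gender and number
        return same_number and anaphor["gender"] == antecedent["gender"]
    if kind == "nominal":      # number agreement plus generalization
        return same_number and anaphor["concept"] in ancestors(antecedent["concept"])
    if kind == "functional":   # conceptual part-of condition only
        return any(anaphor["concept"] in HAS_PART.get(c, set())
                   for c in ancestors(antecedent["concept"]))
    raise ValueError(f"unknown anaphora type: {kind}")

notebook = {"concept": "Notebook", "gender": "neut", "number": "sg"}
es = {"concept": None, "gender": "neut", "number": "sg"}
rechner = {"concept": "Computer", "gender": "masc", "number": "sg"}
speicher = {"concept": "Main-Memory", "gender": "masc", "number": "sg"}
```

With these toy entries, "Rechner" fails the pronominal agreement test against "Notebook" but passes the nominal one, since gender no longer matters and Computer generalizes Notebook.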

The problems text phenomena cause are of vital importance for the adequacy of the representation structures resulting from text processing, and are centered around the notions of incomplete, invalid and incoherent knowledge bases.

Incomplete knowledge bases emerge when references to already established discourse entities are simply not recognized, as in the case of pronominal anaphora. Consider the reference relationship between the pronoun "Es" (it) in sentence (2), which refers to the noun phrase "ein Notebook" (a notebook) in sentence (1). The occurrence of the pronoun is not reflected at the conceptual level, since pronouns (as noncontent words) do not have conceptual correlates. Hence, an incomplete concept graph emerges as shown in Figure 4: the referent for the pronoun "Es" (it), Notebook.2, is not linked to Pentium-III-CPU.6. An adequate treatment with a properly resolved anaphor is shown in Figure 6, where the representation of the relevant portions of [...] in particular by determining the equip-patient role between Equip.7 and the proper referent, Notebook.2.

Figure 4: Unresolved Pronominal Anaphor, Sentence (2)

Figure 5: Unresolved Nominal Anaphor, Sentence (3)

Figure 6: Resolved Anaphors, Sentences (1) and (2)/(3)

Invalid knowledge bases emerge when each entity which has a different denotation at the text surface is treated as a formally distinct conceptual item at the symbol level of knowledge representation, although all different denotations refer literally to the same conceptual entity. This is the case for nominal anaphora, an example of which is given by the reference relation between the noun phrase "Der Rechner" (the machine) in sentence (3) and the noun phrase "ein Notebook" (a notebook) in sentence (1). An invalid referential description appears in Figure 5, where Computer.5 is introduced as a new entity in the discourse, whereas Figure 6 shows the valid conceptual representation capturing the intended meaning at the representation level, viz. maintaining Notebook.2 as the proper referent (note that pronominal as well as nominal anaphora are two equivalent ways to corefer to the discourse entity denoted by Notebook.2).

Figure 7: Unresolved Functional Anaphor, Sentence (4)

Figure 8: Resolved Functional Anaphor, Sentences (1) and (4)

Disregarding textual phenomena will cause dysfunctional system behavior. A query Q such as

Q : (retrieve ?x (Computer ?x))
A-: (|I| Notebook.2, |I| Computer.5)
A+: (|I| Notebook.2)

triggers a search for all instances of Computer in the text knowledge base. Given an invalid knowledge base (cf. Figures 3 and 5), the incorrect answer (A-) contains two entities, viz. Notebook.2 and Computer.5; both are in the extension of the concept Computer. If, however, a valid text knowledge base such as the one in Figure 6 or 8 is given, only the correct answer, Notebook.2, is inferred (A+).
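The difference between the two answers can be reproduced with a toy retrieval function. This Python sketch is not the query engine of the terminological system: the taxonomy `ISA`, the functions `subsumes` and `retrieve`, and the `coref` mapping are hypothetical stand-ins, and the knowledge base is reduced to a mapping from instances to concepts.

```python
# Toy taxonomy (assumption): Notebook is a kind of Computer.
ISA = {"Notebook": "Computer", "Computer": "Hardware"}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = ISA.get(specific)
    return False

def retrieve(concept, kb, coref=None):
    """Return the distinct referents in `kb` that instantiate `concept`."""
    coref = coref or {}
    found = []
    for instance, instance_concept in kb.items():
        referent = coref.get(instance, instance)  # follow coreference links
        if subsumes(concept, instance_concept) and referent not in found:
            found.append(referent)
    return found

kb = {"Notebook.2": "Notebook", "Hard-Disk.3": "Hard-Disk", "Computer.5": "Computer"}

# Without anaphora resolution, both instances are returned (answer A-).
invalid_answer = retrieve("Computer", kb)
# With Computer.5 declared coreferent to Notebook.2, one referent remains (A+).
valid_answer = retrieve("Computer", kb, coref={"Computer.5": "Notebook.2"})
```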

To lend quantitative substance to our claims, we analyzed a randomly chosen sample of 100 reports on histological findings with approximately 14,000 text tokens [9]. In IT texts, (pro)nominal anaphora and functional anaphora occur at an almost balanced rate [27]. In the medical texts, however, functional anaphora turn out to be the major glue for establishing local coherence, while anaphora, pronominal anaphora in particular, play a far less important role than in other text genres. The high proportion of functional anaphora (45%) [42%-48%]² and the remarkable rate of nominal (34%) [31%-37%] compared to extrasentential pronominal anaphora (2%) [1%-3%] is clearly an indication of the primary orientation in medical texts to convey facts in a very compact manner. Two consequences can be drawn from this observation. First, resolution procedures for functional anaphora, supplementing well-researched procedures for (pro)nominal anaphora, have to be provided urgently (cf. [6] for a fully worked out approach). Second, functional anaphora presuppose a considerable amount of deep background knowledge, with emphasis on partonomic reasoning [13], supplementing well-known principles of taxonomic reasoning for text understanding.

² For all percentage numbers, 95% confidence intervals are supplied in square brackets.

2.2.2 Centering Model for Anaphora Resolution

In order to avoid the emergence of incomplete, invalid and incoherent text knowledge bases, we consider discourse entities for establishing reference relations with upcoming items from the textual input at a local [27] and at a global level [14] of cohesion. To preserve adequate text representation structures, a centering mechanism is used. The discourse entities which occur in an utterance Ui constitute its set of forward-looking centers, Cf(Ui). The elements in Cf(Ui) are ordered to reflect relative prominence in Ui in the sense that the most highly ranked element of Cf(Ui) is the most likely antecedent of an anaphoric expression in Ui+1, while the remaining elements are ordered according to decreasing preference for establishing referential links.

While it is usually assumed (for the English language, in particular) that grammatical roles are the major determinant for the ranking on the Cf [4], we claim that for German, a language with relatively free word order, it is the functional information structure of the sentence [27]. Accordingly, the constraints on the ordering of entries in Cf(Ui) prefer hearer-old (either evoked or unused) elements in an utterance (i.e., those that can be related to previously introduced discourse elements or generally accessible world knowledge) over mediated (inferrable) ones, while these are preferred over hearer-new (brand-new) elements for anaphora resolution. If two elements belong to the same category, then preference is defined in terms of linear precedence of the discourse units in the source text.
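The ordering criteria amount to a two-level sort key: information status first, linear position second. The Python sketch below illustrates this under stated assumptions; the class `DiscourseEntity`, the table `STATUS_RANK` and the function `rank_cf` are hypothetical names, and the status labels are supplied by hand rather than computed from the discourse.

```python
from dataclasses import dataclass, replace

# Lower rank means higher preference as an antecedent.
STATUS_RANK = {"hearer-old": 0, "mediated": 1, "hearer-new": 2}

@dataclass(frozen=True)
class DiscourseEntity:
    referent: str   # instance in the text knowledge base
    surface: str    # lexical surface form
    status: str     # one of the keys of STATUS_RANK
    position: int   # linear position in the utterance

def rank_cf(entities):
    """Order forward-looking centers by status, then by linear precedence."""
    return sorted(entities, key=lambda e: (STATUS_RANK[e.status], e.position))

# Sentence (1): no discourse-bound elements, so all entries are treated as
# hearer-new here and only textual precedence decides.
s1 = [
    DiscourseEntity("Seagate", "Seagate", "hearer-new", 3),
    DiscourseEntity("Compaq", "Compaq", "hearer-new", 0),
    DiscourseEntity("Hard-Disk.3", "Festplatte", "hearer-new", 2),
    DiscourseEntity("Notebook.2", "Notebook", "hearer-new", 1),
]
cf_s1 = [e.referent for e in rank_cf(s1)]

# Hypothetical variant: if Notebook.2 were hearer-old, it would move to the front.
variant = [replace(e, status="hearer-old") if e.referent == "Notebook.2" else e
           for e in s1]
cf_variant = [e.referent for e in rank_cf(variant)]
```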

When we apply these criteria to sentence (1), Table 1 depicts the resulting order of forward-looking centers in Cf(S1). Since we have no discourse-bound elements in the first sentence, textual precedence applies exclusively to the ordering of the center list items. Only nouns and their conceptual correlates are taken into consideration. The tuple notation takes the conceptual correlate of the lexical item in the text knowledge base in the first place, while the lexical surface form appears in the second place.

(1) Cf: [Compaq: Compaq, Notebook.2: Notebook, Hard-Disk.3: Festplatte, Seagate: Seagate]

Table 1: Centering Data for Sentence (1)

Processing the centering list Cf(S1) for sentence (3) until the generalization constraint is fulfilled finally results in a query whether Notebook is subsumed by Computer, the conceptual correlate of the lexical item "Rechner". As this relationship obviously holds, in the conceptual representation structure of sentence (3) (cf. Figure 5) Computer.5, the literal instance identifier, is declared coreferent to Notebook.2, the referentially valid identifier. Instead of having two unlinked sentence graphs, Figures 3 and 5, the reference resolution for (pro)nominal anaphora leads to joining them in a common valid text graph (Figure 6). In particular, Notebook.2 links to the relation equip-patient, formerly occupied by Computer.5. The corresponding centering list at the end of the analysis of sentence (3) is provided in Table 2 (Cf(S1) has been updated to reflect the consumption of the antecedent, Notebook.2, in the processing of Cf(S3)).

(1) Cf: [Compaq: Compaq, Notebook.2: Notebook, Hard-Disk.3: Festplatte, Seagate: Seagate]
(3) Cf: [Notebook.2: Rechner,
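The resolution of the nominal anaphor in sentence (3) can be sketched as a walk through the center list that stops at the first entry satisfying the generalization constraint. The Python fragment below is a simplified illustration, not the system's procedure: the taxonomy `ISA`, the concept names `Company` and `Storage-Device`, and the functions `subsumes` and `resolve_nominal` are assumptions, and the number constraint is not checked.

```python
# Toy taxonomy (assumption for illustration).
ISA = {"Notebook": "Computer", "Hard-Disk": "Storage-Device"}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = ISA.get(specific)
    return False

def resolve_nominal(anaphor_concept, cf_list):
    """Return the first center whose concept is subsumed by the anaphor's concept."""
    for referent, concept in cf_list:
        if subsumes(anaphor_concept, concept):
            return referent
    return None  # no antecedent: the anaphor introduces a new entity

# Cf(S1) in ranked order, paired with the concept of each referent.
cf_s1 = [
    ("Compaq", "Company"),
    ("Notebook.2", "Notebook"),
    ("Hard-Disk.3", "Hard-Disk"),
    ("Seagate", "Company"),
]

# "Rechner" has the conceptual correlate Computer.
antecedent = resolve_nominal("Computer", cf_s1)
# The literal instance is then declared coreferent to the antecedent.
coref = {"Computer.5": antecedent}
```

Because the list is already ranked, the first match is also the preferred antecedent, so no separate scoring step is needed in this sketch.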


2.3 Textual Learning

The approach to learning new concepts as a result of text understanding builds on two different sources of evidence: the prior knowledge of the domain the texts are about, and grammatical constructions in which unknown lexical items occur in the texts. The architecture of SynDiKATe's concept learning component is depicted in Figure 9.

Figure 9: SynDiKATe's Learning Component

The ParseTalk system generates dependency parse graphs. The kinds of syntactic constructions (e.g., genitive, appositive, comparative) in which unknown lexical items appear are recorded and later assessed relative to the credit they lend to a particular concept hypothesis, e.g., high for appositives ("the notebook X"), lower for genitives ("Compaq's X"). The conceptual interpretation of parse trees involving unknown lexical items in the text knowledge base leads to the deduction of concept hypotheses. These are further enriched by conceptual annotations which reflect structural patterns of consistency, mutual justification, analogy, etc. relative to already available concept descriptions in the text knowledge base or other hypothesis spaces. Both kinds of evidence, in particular their predictive 'goodness' for the learning task, are represented by corresponding sets of linguistic and conceptual quality labels.

Alternative concept hypotheses for each unknown lexical item are organized in terms of corresponding hypothesis spaces, each of which holds a different conceptual reading. An inference engine embedded in the terminological system, the so-called quality machine, determines the overall credibility of single concept hypotheses by taking the available set of quality labels for each hypothesis into account. The qualifier, a terminological classifier extended by an evaluation metric for quality classes, computes a preference ranking of those hypotheses which remain valid after the text has been processed completely (cf. [12] for details).
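The idea of ranking hypothesis spaces by their quality labels can be illustrated with a deliberately simple numeric stand-in. This Python sketch does not reproduce the qualifier, which is a terminological classifier with an evaluation metric over quality classes; the label names, the weights in `LABEL_WEIGHT`, the set `INVALIDATING` and the function `rank_hypotheses` are all assumptions. The only ordering taken from the text is that appositives lend more credit than genitives.

```python
# Illustrative weights (assumption): higher means more credit.
LABEL_WEIGHT = {"appositive": 3, "genitive": 1, "supported": 2}
# Labels that mark a hypothesis space as no longer valid (assumption).
INVALIDATING = {"inconsistent"}

def rank_hypotheses(spaces):
    """Rank the valid concept hypotheses by their accumulated label credit."""
    scored = []
    for concept, labels in spaces.items():
        if INVALIDATING & set(labels):
            continue  # this hypothesis space has become invalid
        score = sum(LABEL_WEIGHT.get(label, 0) for label in labels)
        scored.append((score, concept))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept for _, concept in scored]

# Hypothesis spaces for one unknown lexical item X after the text is processed.
spaces = {
    "Company": ["genitive"],
    "Notebook": ["appositive", "supported"],
    "Printer": ["genitive", "inconsistent"],
}
ranking = rank_hypotheses(spaces)
```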

3. COVERAGE AND EVALUATION

SynDiKATe's coverage varies considerably depending on the target domain. The generic lexicon currently includes 3,000 entries, the IT lexicon adds 5,000, while the MED lexicon contributes 70,000 entries. The Upper Ontology contains 1,200 concepts and roles, to which the IT ontology adds 3,000 and the MED ontology contributes 240,000 items. The IT domain was chosen as a testbed that can be extended on demand. The MED domain, however, is subject to ontology engineering efforts on a larger scale. In order to cope with the enormous knowledge engineering requirements [...] a semantically weak, yet high-volume medical terminology (UMLS) to a very large terminological knowledge base [21].

Admittedly, SynDiKATe has not yet undergone a thorough empirical evaluation in one of the envisaged application dimensions. We have, however, carefully evaluated its subcomponents. The results can be summarized as follows:

Sentence Parsing. We compared a standard active chart parser with full backtracking capabilities with the parser of SynDiKATe, which is characterized by limited memoization and restricted backtracking capabilities, using the same grammar specifications. On average, SynDiKATe's parser exhibits a linear time complexity the factor of which is dependent on ambiguity rates of input sentences. The active chart parser runs into exponential time complexity whenever it encounters extragrammatical or ungrammatical input, since then it conducts an exhaustive search of the entire parse space. The loss of structural descriptions due to the parser's incompleteness amounts to 10% compared with the complete, though intractable parser [5].

Text Parsing. While with respect to resolution capacity (effectiveness) no significant differences could be determined, the functional centering model we propose outperforms the best-known centering algorithms by a rate of 50% with respect to a measure of computation costs which considers "cheap" and "expensive" transitional moves between utterances to assess a text's coherency. Hence, the procedure we propose is more efficient [27].

Semantic Interpretation. Our group has been pioneering work on the empirical evaluation of meaning representations. We assessed the quality and coverage of semantic interpretation for randomly sampled texts in the two domains we consider. While recall was rather low (57% for MED, 31% for IT), precision peaked at 97% and 94%, respectively [19].

"Heavy" Semantics. We can deal with intricate semantic phenomena for which we have provided the first empirical evaluation data available at all. This relates to the resolution of metonymies, where we have determined a gain in effectiveness that amounts to 16% compared with the best procedures known so far [16], as well as to comparatives and evaluative assertions, where gains in effectiveness were almost tripled [25].

Concept Learning. The performance of the concept learning component has been compared to standard learning mechanisms based on the terminological classifier available in any sort of description logics systems. Our data indicate an increase of performance of 8 percentage points (87% accuracy, while that of standard classifiers is on the order of 79%) [12].

Evaluating a text knowledge acquisition system rather than an IE system poses hard methodological problems [2]. The main reason is that a gold standard for comparison (what constitutes a canonical, commonly agreed upon interpretation of the content of a text?) is hard to establish, even for technical texts. A follow-up problem is constituted by the lack of a significant amount of annotated text knowledge bases on which comparative analyses might be assessed. MUC-style evaluation metrics, e.g., have already been qualified not to adequately reflect the functionality of less constrained text understanders [29].

4. CONCLUSIONS

[...] discourse units for reference resolution using the centering model. This allows us to deal with various forms of pronominal, nominal and functional anaphora in a uniform way.

In order to establish local coherence at the text representation level, single discourse entities related by anaphoric expressions have to be conceptually linked. We claim that only sophisticated knowledge representation languages with powerful terminological reasoning capabilities, such as those from the KL-ONE family, are able to deal with the full range of challenges of referentially adequate text understanding, in particular considering nominal and functional anaphora.

These two types of anaphora pose an enormous burden on the availability of rich domain knowledge. We respond to this challenge in two ways. In a large-scale knowledge engineering effort, we semi-automatically transform a semantically weak though huge thesaurus-style medical knowledge source into a terminological knowledge base. If such a human-made resource is missing, we turn to a purely automatic approach of bootstrapping a given domain knowledge base as part of on-going text understanding processes.

The depth of understanding we provide comes closest to systems such as Scisor [18], Tacitus [15] or Pundit/Kernel [17], but SynDiKATe's knowledge acquisition strategies or learning capabilities have no counterpart there. Text understanders which incorporate learning components are even rarer, but systems such as Snowy [3] or Wrap-Up [23] either have a very narrow domain theory and lack robustness for dealing with unseen input effectively, or fail to account for a wide range of referential text phenomena, respectively.

5. ACKNOWLEDGMENTS

The development of the SynDiKATe system has been supported by various grants from Deutsche Forschungsgemeinschaft under Ha2097/*. SynDiKATe would not have come to existence without the exciting contributions and enthusiasm of current and former members of the group, in particular, Steffen Staab, Katja Markert, Michael Strube, Martin Romacker, Stefan Schulz, Klemens Schnattinger, Norbert Bröker, Peter Neuhaus, Susanne Schacht, Manfred Klenner, and Holger Schauer.

6. REFERENCES

[1] Jim Cowie and Wendy Lehnert. Information extraction. Communications of the ACM, 39(1):80-91, 1996.

[2] Carol Friedman and George Hripcsak. Evaluating natural language processors in the clinical domain. Methods of Information in Medicine, 37(4/5):334-344, 1998.

[3] Fernando Gomez and Carlos Segami. The recognition and classification of concepts in understanding scientific texts. Journal of Experimental and Theoretical Artificial Intelligence, 1(1):51-77, 1989.

[4] Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203-225, 1995.

[5] Udo Hahn, Norbert Bröker, and Peter Neuhaus. Let's ParseTalk: Message-passing protocols for object-oriented parsing. In H. Bunt and A. Nijholt, editors, Advances in Probabilistic and other Parsing Technologies, pages 177-201. Kluwer, 2000.

[6] Udo Hahn, Katja Markert, and Michael Strube. A conceptual reasoning approach to textual ellipsis. In Proceedings of the ECAI'96, pages 572-576, 1996.

[7] Udo Hahn and Ulrich Reimer. Knowledge-based text summarization: Salience and generalization operators for knowledge base abstraction. In I. Mani and M. Maybury, editors, Advances in Automatic Text Summarization, pages [...]

[8] [...] the SynDiKATe system: How technical documents are automatically transformed to text knowledge bases. Data & Knowledge Engineering, 35(2):137-159, 2000.

[9] Udo Hahn, Martin Romacker, and Stefan Schulz. Discourse structures in medical reports - watch out! The generation of referentially coherent and valid text knowledge bases in the medSynDiKATe system. International Journal of Medical Informatics, 53(1):1-28, 1999.

[10] Udo Hahn, Martin Romacker, and Stefan Schulz. How knowledge drives understanding: Matching medical ontologies with the needs of medical language processing. Artificial Intelligence in Medicine, 15(1):25-51, 1999.

[11] Udo Hahn, Susanne Schacht, and Norbert Bröker. Concurrent, object-oriented natural language parsing: The ParseTalk model. International Journal of Human-Computer Studies, 41(1/2):179-222, 1994.

[12] Udo Hahn and Klemens Schnattinger. Towards text knowledge engineering. In Proceedings of the AAAI'98, pages 524-531, 1998.

[13] Udo Hahn, Stefan Schulz, and Martin Romacker. Partonomic reasoning as taxonomic reasoning in medicine. In Proceedings of the AAAI'99, pages 271-276, 1999.

[14] Udo Hahn and Michael Strube. Centering in-the-large: Computing referential discourse segments. In Proceedings of the ACL'97/EACL'97, pages 104-111, 1997.

[15] Jerry R. Hobbs, Mark E. Stickel, Douglas E. Appelt, and Paul Martin. Interpretation as abduction. Artificial Intelligence, 63(1/2):69-142, 1993.

[16] Katja Markert and Udo Hahn. On the interaction of metonymies and anaphora. In Proceedings of the IJCAI'97, pages 1010-1015, 1997.

[17] Martha S. Palmer, Rebecca J. Passonneau, Carl Weir, and Tim Finin. The Kernel text understanding system. Artificial Intelligence, 63(1/2):17-68, 1993.

[18] Lisa F. Rau, Paul S. Jacobs, and Uri Zernik. Information extraction and text summarization using linguistic knowledge acquisition. Information Processing & Management, 25(4):419-428, 1989.

[19] Martin Romacker and Udo Hahn. An empirical assessment of semantic interpretation. In Proceedings of the NAACL 2000, pages 327-334, 2000.

[20] Martin Romacker, Katja Markert, and Udo Hahn. Lean semantic interpretation. In Proceedings of the IJCAI'99, pages 868-875, 1999.

[21] Stefan Schulz and Udo Hahn. Knowledge engineering by large-scale knowledge reuse: Experience from the medical domain. In Proceedings of KR 2000, pages 601-610, 2000.

[22] Stefan Schulz, Udo Hahn, and Martin Romacker. Modeling anatomical spatial relations with description logics. In Proceedings of the AMIA 2000, pages 779-783, 2000.

[23] Stephen Soderland and Wendy Lehnert. Wrap-up: A trainable discourse module for information extraction. Journal of Artificial Intelligence Research, 2:131-158, 1994.

[24] Steffen Staab and Udo Hahn. Comparatives in context. In Proceedings of the AAAI'97, pages 616-621, 1997.

[25] Steffen Staab and Udo Hahn. "Tall", "good", "high" - compared to what? In Proceedings of the IJCAI'97, pages 996-1001, 1997.

[26] Steffen Staab and Udo Hahn. Scalable temporal reasoning. In Proceedings of the IJCAI'99, pages 1247-1252, 1999.

[27] Michael Strube and Udo Hahn. Functional centering: Grounding referential coherence in information structure. Computational Linguistics, 25(3):309-344, 1999.

[28] William A. Woods and James G. Schmolze. The Kl-One family. Computers & Mathematics with Applications, 23(2/5):133-177, 1992.