The
SynDiKATeText Knowledge Base Generator
Udo Hahn
Text Knowledge Engineering Lab
Albert-Ludwigs-Universit ¨at Freiburg
D-79085 Freiburg, Germany
Martin Romacker
Text Knowledge Engineering Lab
Albert-Ludwigs-Universit ¨at Freiburg
D-79085 Freiburg, Germany
ABSTRACT
SynDiKATecomprisesafamilyoftextunderstanding sys-temsforautomaticallyacquiringknowledgefromreal-world texts, viz.information technologytest reportsandmedical ndingreports. Theircontentistransformedtoformal rep-resentation structures which constitutecorrespondingtext knowledge bases. SynDiKATe'sarchitectureintegrates re-quirementsfromtheanalysis ofsinglesentences,as wellas thoseofreferentiallylinkedsentencesformingcohesivetexts. Besides centering-based discourseanalysis mechanisms for pronominal,nominal and bridginganaphora,SynDiKATe issuppliedwith alearningmodule forautomatically boot-strappingitsdomainknowledgeastextanalysisproceeds.
1.
INTRODUCTION
TheSynDiKATesystem belongstothe broadfamilyof informationextraction(IE)systems[1]. Signicantprogress hasbeenmadealready,ascurrentIEsystemsproviderobust shallowtextprocessingsuchthatframe-styletemplatesare lledwithfactualinformationaboutparticularentities (lo-cations,persons,eventtypes,etc.) fromtheanalyzed doc-uments. Nevertheless, typical MUC-style systemsare also limited in several ways. They provide no inferencing ca-pabilitieswhichallowsubstantialreasoningaboutthe tem-platellers (hence,their understandingdepthis low), and their potential to deal with textual phenomena is highly constrained, if it is available at all. Also novel and unex-pected thoughpotentially relevant informationwhichdoes notmatchgiventemplatestructuresishardtoaccountfor, since system designers committo a xedcollection of do-mainknowledgetemplates(i.e.,theyhavenoconcept learn-ingfacilities).
WithSynDiKATe,weareaddressingtheseshortcomings and aimat amore sophisticated levelof knowledge acqui-sitionfrom real-world texts. Thedocumentswedeal with aretechnicalnarrativesinGermanlanguagetakenfromtwo domains,viz.testreportsfrom theinformation technology (IT)domainasprocessedbytheitSynDiKATesystem[8],
.
andndingreportsfromamedicalsubdomain(MED),the frameworkofthemedSynDiKATesystem[10,9]. Ourrst goalistoextractconceptuallyandinferentiallyricherforms of knowledgethanthosecaptured by standardIE systems suchasevaluativeassertionsandcomparisons[25,24], tem-poral [26] and spatial information [22]. Second, we also wanttodynamicallyenhancethesetofknowledgetemplates throughincrementaltaxonomylearningdevices[12]sothat the information extraction capability of the system is in-creasedinabootstrappingmanner. Third, SynDiKATeis particularly sensitive to thetreatmentof textualreference relations [27, 6, 14]. Thecapability to properly deal with various forms of anaphora is a prerequisite for the sound-nessandvalidityofthe knowledgebaseswecreateas a re-sult ofthe textunderstanding process andlikewise for the feasibilityofsophisticatedretrievalandquestionanswering applicationsbasedontheacquiredtextknowledge.
2.
SYSTEM ARCHITECTURE
TheoverallarchitectureofSynDiKATe,anacronymwhich stands for \Synthesis of Distributed Knowledge Acquired fromTexts",issummarizedinFigure1. Incomingtexts,T
i , aremappedintocorrespondingtextknowledgebases,TKBi, whichcontainarepresentationofT
i
'scontent. This knowl-edge base platform may feed various information services, suchasinferentiallysupportedquestion answering(fact re-trieval),textpassageretrievalortextsummarization[7].
2.1
Sentence-Level Understanding
Figure1: ArchitectureofaSynDiKATeSystem
Conceptual knowledgeaboutthedierentdomainsis expressedinaKl-One-likedescriptionlogic language[28]. Corresponding to the division at the lexical level, the on-tologiesweprovidearesplitupbetweenonethatisusedby allapplications,theUpperOntology,whileseveraldedicated ontologies account for the conceptual requirementsof par-ticulardomains,e.g.,IT(HardDisk,ColorPrinter,etc.) orMED(Gastritis,SurfaceMucus,etc.).
Semanticknowledgeaccountsforemergingconceptual relations between conceptual items according to those de-pendencyrelations thatare established between their cor-respondinglexicalitems. Semanticinterpretationschemata mediatebetweenbothlevelsinawayasabstractandgeneral aspossible[20]. Theseschemataareappliedtosemantically interpretable subgraphswhichare,fromasemanticpointof view,\minimal"subgraphsoftheincrementallybuilt depen-dencygraph. Theirboundingnodescontaincontentwords (i.e.,nouns,verbs,andadjectives,allofwhichhavea concep-tualcorrelate inthedomainontologies), while all possibly intervening nodes (zero up to four) contain only noncon-tent words (suchas prepositions, articles, auxiliaries, etc., all of which have no conceptual correlates). Semantic in-terpretationschemataarefullyembeddedintheknowledge representationmodelandsystem(cf.Figure1).
TheParseTalksystem,whichcomprisesthelexicalized grammarandassociateddependencyparser,isembeddedin anobject-orientedcomputationmodel. So,thedependency relationsarecomputedbylexicalobjects,so-calledword ac-tors,throughstrictlylocalmessage passing,onlyinvolving the lexical items they represent. To illustrate how a de-pendencyrelationisestablishedcomputationally,wegivea sketchofthebasicprotocolforincrementalparsing[5]:
After a word has been read from textual input by theWordScanner(stepA
1
inFigure1),itsassociated lexeme (specied in the Lexicon) is identied (step A2) and a corresponding word actor gets initialized (stepB1). Asallcontentwordsaredirectlylinkedto theconceptualsystem,eachlexicalitemwthathasa conceptual correlateCinthedomainknowledgebase (stepA3)getsinstantiatedinthetextknowledgebase (step B
2
). The lexical item Festplatte (hard disk)
talkedaboutinagiventext. 1
For integration inthe parse tree, the newly created word actorsearches itshead (alternatively, its modi-er)bysendingparallelrequestsfordependential gov-ernmenttoitsleftcontext(stepC). Thesearchspace is restricted, sincethese requests are propagated up-wards only along the `right shoulder' of the depen-dency graphconstructed so far. Allword actors ad-dressedthis way check, inparallel, whether their va-lencyrestrictions,i.e.,grammaticalandconceptual con-straints, are metbythe requestingword actor. Step Dsimulatesaconceptualcheckinthetextknowledge base,stepEillustratesatestinthediscoursememory.
If all required constraints are fullled by one of the targetedwordactors,animmediatesemantic interpre-tationisperformed. Thisusuallyalterstheconceptual representationstructuresbywayofslotlling(stepF).
Semanticinterpretationconsistsofndingarelational link between the conceptual correlates of the two content words bounding the associated semantically interpretable subgraph. Thelinkagemayeitherbeconstrainedby depen-dency relations (e.g., the subject: relation of a transitive verb such as \sell" may only be interpreted conceptually intermsofagentorpatientroles),byinterveninglexical material (e.g., some prepositions impose special role con-straints, such as mit (with) does in terms of has-part or instrument roles), or it may be constrained by concep-tual criteria only (as with the genitive: dependency rela-tion,whichunlikesubject: imposesnoadditional selective conceptualconstraintsforinterpretation). The correspond-ing knowledge about these language-specic constraints is denselyencodedintheLexiconclasshierarchy,anapproach whichheavilyreliesonthepropertyinheritancemechanisms inherenttotheobject-orientedparadigm.
2.2
Text-Level Understanding
2.2.1
Referential Text Phenomena
Thetextualphenomenawedeal withinSynDiKATe es-tablishreferentiallinksbetweenconsecutiveutterancesina coherenttextsuchasillustratedbythreepossible continua-tionsofsentence(1),withthreedierentformsof extrasen-tentialanaphora:
(1) CompaqverkaufteinNotebookmiteinerFestplatte,die vonSeagatehergestelltwird.
(Compaqsellsanotebookwithaharddiskthatis man-ufacturedbySeagate.)
(2) Pronominal Anaphora:
EsistmiteinerPentium-III-CPUausgestattet. (ItcomeswithaPentium-IIICPU.)
(3) NominalAnaphora:
DerRechneristmiteinerPentium-III-CPUausgestattet. (ThemachinecomeswithaPentium-IIICPU.) (4) Functional Anaphora:
DerArbeitsspeicherkannauf96MBerweitert werden. (Themain memorycanbeexpandedupto96MB.) 1
Compaq sells a Notebook with a hard disk that is manufactured by Seagate.
3
ein
4
2
5
1
Compaq
subject:
propo:
verkauft
die
,
delimiter:
subject:
relative:
specifier:
Notebook
hergestellt
von
Seagate
ppadjunct:
pobject:
wird
ppatt:
pobject:
object:
mit
specifier:
Festplatte
.
einer
verbpart:
Figure2: DependencyParseforSentence(1)
Figure 3: ConceptualInterpretationforSentence(1)
Theresultsofsentence-levelanalysisforsentence(1)are given in Figure 2, which contains a syntactic dependency graph(together withvecongurations ofsemantically in-terpretablesubgraphs),andFigure3,whichdisplaysits con-ceptualrepresentation. Fortext-levelanalysis, pronominal anaphorastillheavilydependongrammatical conditions{ theagreementoftheantecedent(\Notebook")andthe pro-noun(\Es"(it))ingenderandnumber;alsoconceptual cri-teria apply insofar as a potential antecedent must t the conceptual role (or case frame) restrictions when it is in-tegratedingoverningstructures, say, thehead verb ofthe clause. In general, however, the inuence of grammatical criteria gradually diminishes for other types of text phe-nomena,whiletheinuenceofconceptualcriteriaincreases. For nominal anaphora, number constraints are still valid, whileageneralizationrelationbetweentheanaphoricnoun (\Rechner" (machine)) and its proper antecedent (\Note-book") must hold, in addition. In the case of functional anaphora,nogrammarconstraintsatall apply,whilequite sophisticatedconceptualrolepathconditionscomeintoplay, e.g., \Arbeitsspeicher" (mainmemory) being aconstituent physicalpartof\Notebook".
Theproblems textphenomenacause are of vital impor-tancefor the adequacyofthe representation structures re-sultingfromtextprocessing,andarecenteredaroundthe no-tionsofincomplete,invalidandincoherentknowledgebases. Incomplete knowledge bases emerge when references to alreadyestablisheddiscourseentitiesaresimplynot recog-nized,asinthecase ofpronominalanaphora. Considerthe referencerelationshipbetweenthepronoun\Es"(it)in sen-tence(2) which refers to the noun phrase \ein Notebook" (a notebook) in sentence (1). The occurrence of the pro-nounisnotreectedattheconceptuallevel,sincepronouns (as noncontent words) do not have conceptual correlates. Hence, an incomplete concept graphemerges as shown in Figure4 |thereferent for the pronoun\Es" (it), Note-book.2,isnotlinkedtoPentium-III-CPU.6. Anadequate treatmentwithaproperlyresolvedanaphorisshownin Fig-ure6, where therepresentation ofthe relevant portions of
Figure4: UnresolvedPronominalAnaphor,Sentence(2)
Figure5: UnresolvedNominalAnaphor,Sentence(3)
Figure6: ResolvedAnaphors,Sentences(1)and(2)/(3)
larbydeterminingtheequip-patientrolebetweenEquip.7 andtheproperreferent,Notebook.2.
Invalid knowledge basesemerge wheneachentity which has adierent denotation atthe text surfaceis treated as a formally distinct conceptual item at the symbollevel of knowledgerepresentation,althoughalldierentdenotations refer literally to the same conceptual entity. This is the casefornominalanaphora,anexampleofwhichisgivenby thereference relationbetweenthe nounphrase \Der Rech-ner"(themachine)insentence(3)andthenounphrase\ein Notebook" (anotebook) insentence(1). Aninvalid referen-tialdescriptionappearsinFigure5,whereComputer.5 is introducedasanewentityinthediscourse,whereasFigure 6 shows the valid conceptual representation capturing the intendedmeaningattherepresentationlevel,viz. maintain-ingNotebook.2astheproperreferent(notethat pronom-inalaswellasnominalanaphoraaretwoequivalentwaysto corefertothediscourseentitydenotedbyNotebook.2).
sym-Figure7: UnresolvedFunctionalAnaphor,Sentence(4)
Figure 8: ResolvedFunctionalAnaphor, Sentences (1) and(4)
Disregardingtextualphenomenawillcausedysfunctional systembehavior. AqueryQsuchas
Q : (retrieve ?x (Computer ?x)) A-: (|I| Notebook.2, |I| Computer.5) A+: (|I| Notebook.2)
triggersasearchforall instancesofComputerinthetext knowledgebase. Givenaninvalidknowledgebase(cf. Fig-ures3and5),theincorrectanswer(A-)containstwoentities, viz.Notebook.2 andComputer.5|bothareinthe ex-tensionoftheconceptComputer. If,however,avalidtext knowledge base suchas the one inFigure 6 or 8 is given, onlythecorrectanswer, Notebook.2,isinferred(A+).
Renderingalso quantitative substance to ourclaims, we analyzedarandomly chosensampleof 100reports on his-tologicalndingswithapproximately14,000texttokens[9]. InITtexts,(pro)nominalanaphoraandfunctionalanaphora occuratanalmostbalancedrate[27]. Inthemedicaltexts, however,functionalanaphoraturnouttobethemajorglue for establishinglocal coherence, while anaphora, pronomi-nal anaphora in particular, play a far less important role than in other text genres. The high proportion of func-tionalanaphora(45%)[42%-48%]
2
andtheremarkablerate of nominal (34%) [31%-37%] compared to extrasentential pronominalanaphora(2%)[1%-3%]isclearlyanindication oftheprimaryorientationinmedicaltextstoconveyfacts inaverycompactmanner. Twoconsequencescanbedrawn fromthisobservation. First,resolutionproceduresfor func-tionalanaphora{supplementingwell-researchedprocedures for (pro)nominalanaphora {haveto be providedurgently (cf.[6]forafullyworkedoutapproach). Second,functional anaphorapresuppose a considerableamountof deep back-groundknowledge, withemphasisonpartonomic reasoning [13],supplementingwell-knownprinciplesoftaxonomic rea-soningfortextunderstanding.
2
For all percentage numbers 95% condence intervals are suppliedinsquarebrackets.
2.2.2
Centering Model for Anaphora Resolution
Inordertoavoidtheemergenceofincomplete,invalidand incoherenttextknowledgebasesweconsiderdiscourse enti-tiesforestablishingreferencerelationswithupcomingitems from thetextualinputat alocal[27] and ata globallevel [14] of cohesion. To preserve adequatetext representation structuresacenteringmechanismisused. Thediscourse en-tities whichoccur in anutterance U
i
constitute its set of forward-looking centers, Cf(Ui). The elements in Cf(Ui) areorderedtoreectrelativeprominenceinUiinthesense thatthe mosthighlyrankedelementofC
f (U
i
)isthe most likelyantecedent ofananaphoricexpressioninUi+1,while theremainingelementsareorderedaccordingtodecreasing preferenceforestablishingreferentiallinks.
While it is usually assumed (for the English language, in particular) that grammatical rolesare the major deter-minant for the ranking on the C
f
[4], we claim that for German { a language with relatively free word order { it isthe functionalinformation structureofthesentence[27]. Accordingly, the constraints on the ordering of entries in C
f (U
i
)preferhearer-old(eitherevokedorunused)elements in an utterance (i.e., those that can be related to previ-ously introduced discourse elements or generally accessi-bleworldknowledge)overmediated(inferrable)ones,while these are preferred over hearer-new (brand-new) elements foranaphoraresolution. Iftwoelementsbelongtothesame category,thenpreferenceisdenedintermsoflinear prece-denceofthediscourseunitsinthesourcetext.
Whenweapplythesecriteriatosentence(1),Table1 de-pictstheresultingorderofforward-lookingcentersinC
f (S
1 ). Sincewehavenodiscourse-boundelementsintherst sen-tence,textualprecedenceappliesexclusivelytotheordering of the center list items. Only nouns and their conceptual correlates aretakenintoconsideration. Thetuplenotation takestheconceptualcorrelateofthelexicaliteminthetext knowledgebase inthe rstplace, while thelexical surface formappearsinthesecondplace.
(1) Cf: [Compaq: Compaq,Notebook.2: Notebook, Hard-Disk.3: Festplatte,Seagate: Seagate] Table1: CenteringDataforSentence(1)
Processing of the centering list Cf(S1) for sentence (3) untilthegeneralizationconstraintisfullled,nally,results inaquerywhetherNotebookissubsumedbyComputer, the conceptual correlateof thelexicalitem\Rechner". As thisrelationshipobviouslyholds,intheconceptual represen-tationstructureofsentence(3)(cf.Figure5)Computer.5, theliteralinstanceidentier,isdeclaredcoreferentto Note-book.2,the referentiallyvalididentier. Insteadofhaving twounlinkedsentencegraphs,Figures3and5,thereference resolutionfor (pro)nominalanaphoraleadstojoiningthem in a common valid text graph (Figure 6). In particular, Notebook.2linkstotherelationequip-patient,formerly occupiedbyComputer.5. Thecorrespondingcenteringlist attheendoftheanalysisofsentence(3)isprovidedinTable 2 (C
f (S
1
) has beenupdatedto reectthe consumption of theantecedent,Notebook.2,intheprocessing ofC
f (S3)).
(1) Cf: [Compaq: Compaq,Notebook.2: Notebook, Hard-Disk.3: Festplatte,Seagate: Seagate] (3) Cf: [Notebook.2: Rechner,
2.3
Textual Learning
Theapproachtolearningnewconceptsasaresultoftext understandingbuildsontwodierentsourcesofevidence| thepriorknowledgeofthedomainthetextsareabout,and grammaticalconstructions inwhichunknownlexicalitems occurinthetexts. ThearchitectureofSynDiKATe's con-ceptlearningcomponentisdepictedinFigure9.
linguistic
quality
labels
conceptual
labels
Hypothesis
space-p
Hypothesis
Qualifier
quality
space-q
Quality Machine
1
2
space-1
space-i
space-n
Hypothesis
Hypothesis
Hypothesis
Language Processor
dependency parse graph
text knowledge base
Figure9: SynDiKATe'sLearningComponent
TheParseTalksystemgeneratesdependencyparsegraphs. Thekindsofsyntacticconstructions(e.g.,genitive, apposi-tive,comparative),inwhichunknownlexicalitemsappear, are recorded and laterassessed relative to the credit they lend to a particular concept hypothesis, e.g., high for ap-positives(\thenotebookX"),lowerforgenitives(\Compaq's X").Theconceptualinterpretationofparse treesinvolving unknownlexicalitemsinthetextknowledgebaseleadstothe deductionofconcepthypotheses. Thesearefurtherenriched byconceptualannotationswhichreectstructuralpatterns ofconsistency,mutualjustication,analogy,etc.relativeto alreadyavailableconceptdescriptionsinthetextknowledge baseorotherhypothesisspaces. Bothkindsofevidence,in particulartheir predictive `goodness'for the learningtask, arerepresentedbycorrespondingsetsoflinguisticand con-ceptualqualitylabels.
Alternative concept hypotheses for each unknown lexi-calitemareorganizedintermsofcorrespondinghypothesis spaces, eachof whichholds a dierent conceptual reading. Aninferenceengineembeddedintheterminologicalsystem, theso-called qualitymachine,determinestheoverall credi-bilityofsingleconcepthypothesesbytakingtheavailableset ofqualitylabelsforeachhypothesisintoaccount. The qual-ier, a terminological classier extended by anevaluation metricforqualityclasses,computesapreferencerankingof thosehypotheseswhichremainvalidafterthetexthasbeen processedcompletely(cf.[12]for details).
3.
COVERAGE AND EVALUATION
SynDiKATe'scoveragevariesconsiderablydependingon the targetdomain. The generic lexiconcurrentlyincludes 3,000 entries, the IT lexicon adds 5,000, while the MED lexiconcontributes70,000entrieseach.TheUpperOntology contains1,200conceptsandroles,towhichtheITontology adds3,000andtheMEDontologycontributes240,000items. TheIT domainwas chosen asa testbedthat canbe ex-tendedondemand. TheMEDdomain,however, issubject to ontology engineering eorts ona larger scale. In order tocopewiththe enormousknowledgeengineering
require-a semantically weak, yethigh-volumemedical terminology (UMLS)toaverylargeterminologicalknowledgebase[21]. Admittedly,SynDiKATehasnot yetundergone a thor-ough empirical evaluationinone ofthe envisaged applica-tiondimensions. Wehave, however, carefully evaluatedits subcomponents. Theresultscanbesummarizedasfollows: SentenceParsing. Wecomparedastandardactivechart parser with full backtracking capabilities with the parser of SynDiKATe, which is characterized by limited memo-ization and restricted backtracking capabilities, using the same grammar specications. Onaverage, SynDiKATe's parserexhibitsalineartimecomplexitythefactorofwhich is dependent on ambiguity rates of input sentences. The active chart parser runs into exponential time complexity wheneveritencountersextragrammaticalorungrammatical input,sincethenitconductsanexhaustivesearchofthe en-tireparsespace. Theloss ofstructuraldescriptions dueto theparser'sincompletenessamountsto10%comparedwith thecomplete,thoughintractableparser[5].
TextParsing. Whilewithrespectto resolution capac-ity (eectiveness)nosignicant dierencescould be deter-mined, the functional centering model we propose outper-formsthebest-knowncenteringalgorithmsbyarateof50% withrespecttoameasureofcomputationcostswhich con-siders\cheap" and\expensive"transitionalmovesbetween utterances to assess a text's coherency. Hence, the proce-dureweproposeismoreeÆcient[27].
SemanticInterpretation. Ourgrouphasbeen pioneer-ingworkontheempiricalevaluationofmeaning representa-tions. Weassessed thequalityandcoverage ofsemantic in-terpretationforrandomlysampledtextsinthetwodomains weconsider.Whilerecallwasratherlow(57%forMED,31% forIT),precisionpeakedat97%and94%,respectively[19]. \Heavy"Semantics. Wecandealwithintricate seman-ticphenomenaforwhichwehaveprovidedtherstempirical evaluationdataavailable atall. This relatestothe resolu-tion of metonymies, where we have determined a gain in eectivenessthatamountsto16% comparedwiththe best proceduresknownsofar[16],aswellasitrelatesto compar-ativesandevaluativeassertions,wheregainsineectiveness werealmosttripled[25].
Concept Learning. The performance of the concept learningcomponenthasbeencomparedtostandardlearning mechanismsbasedontheterminological classieravailable inanysortofdescriptionlogicssystems.Ourdataindicate anincreaseofperformanceof8%(87%accuracy,whilethat ofstandardclassiersisontheorderof79%)[12].
EvaluatingatextknowledgeacquisitionratherthananIE system poseshardmethodologicalproblems[2]. Themain reason being that agold standardfor comparison |what constitutes acanonical,commonlyagreedupon interpreta-tionof thecontent ofatext? |ishardtoestablish, even for technicaltexts. A follow-up problem is constitutedby the lack of a signicant amount of annotated text knowl-edgebasesonwhichcomparativeanalysesmightbeassessed. MUC-styleevaluationmetrics,e.g.,havealreadybeen qual-ied nottoadequatelyreectthe functionalityofless con-strainedtextunderstanders[29].
4.
CONCLUSIONS
discourseunits for reference resolution using thecentering model. Thisallowsustodealwithvariousformsof pronom-inal,nominalandfunctionalanaphorainauniformway.
Inordertoestablishlocalcoherenceatthetext represen-tation level, single discourse entities related by anaphoric expressionshaveto beconceptuallylinked. We claimthat onlysophisticatedknowledgerepresentationlanguageswith powerfulterminologicalreasoningcapabilities,suchasthose fromtheKL-ONEfamily,areabletodealwiththefullrange ofchallengesofreferentiallyadequatetextunderstanding,in particularconsideringnominalandfunctionalanaphora.
These two typesof anaphora pose anenormous burden onthe availability of richdomainknowledge. We respond to this challenge in two ways. In a large-scale knowledge engineering eort, we semi-automatically transform a se-manticallyweakthoughhugethesaurus-stylemedical knowl-edgesourceintoaterminologicalknowledgebase. Ifsucha human-maderesourceismissing, weturntoapurely auto-maticapproachofbootstrappingagivendomainknowledge baseas partofon-goingtextunderstandingprocesses.
Thedepthofunderstanding weprovidecomesclosest to systemssuchasScisor[18],Tacitus[15]orPundit/Kernel [17],butSynDiKATe'sknowledgeacquisitionstrategiesor learningcapabilitieshavenocounterpartthere. Text under-standers which incorporate learning components are even rarerbutsystems suchas Snowy [3] orWrap-Up [23] ei-therhaveaverynarrowdomaintheoryandlackrobustness fordealing withunseeninputeectively,orfailto account forawiderangeofreferentialtextphenomena,respectively.
5.
ACKNOWLEDGMENTS
The development of the SynDiKATe system has been sup-portedbyvariousgrantsfromDeutscheForschungsgemeinschaft underHa2097/*. SynDiKATewouldnothavecometoexistence withouttheexcitingcontributionsandenthusiasmofcurrentand formermembersofthe group, inparticular,Steen Staab, KatjaMarkert,MichaelStrube,MartinRomacker,StefanSchulz, KlemensSchnattinger,NorbertBroker,PeterNeuhaus,Susanne Schacht,ManfredKlenner,andHolgerSchauer.
6.
REFERENCES
[1] JimCowieandWendyLehnert.Informationextraction. CommunicationsoftheACM,39(1):80{91,1996. [2] CarolFriedmanandGeorgeHripcsak.Evaluatingnatural
languageprocessorsintheclinicaldomain.Methodsof InformationinMedicine,37(4/5):334{344,1998. [3] FernandoGomezandCarlosSegami.Therecognitionand
classicationofconceptsinunderstandingscientictexts. Journalof Experimentaland TheoreticalArticial Intelligence,1(1):51{77,1989.
[4] BarbaraJ.Grosz,AravindK.Joshi,andScottWeinstein. Centering:Aframeworkformodelingthelocalcoherenceof discourse.ComputationalLinguistics,21(2):203{225,1995. [5] UdoHahn,NorbertBroker,andPeterNeuhaus.Let's
ParseTalk:Message-passingprotocolsforobject-oriented parsing.InH.BuntandA.Nijholt,editors,Advancesin ProbabilisticandotherParsingTechnologies,pages 177{201.Kluwer,2000.
[6] UdoHahn,KatjaMarkert,andMichaelStrube.A conceptualreasoningapproachtotextualellipsis.In ProceedingsoftheECAI'96,pages572{576,1996. [7] UdoHahnandUlrichReimer.Knowledge-basedtext
summarization:Salienceandgeneralizationoperatorsfor knowledgebaseabstraction.InI.ManiandM.Maybury, editors,AdvancesinAutomaticTextSummarization,pages
theSynDiKATesystem:Howtechnicaldocumentsare automaticallytransformedtotextknowledgebases.Data& KnowledgeEngineering,35(2):137{159,2000.
[9] UdoHahn,MartinRomacker,andStefanSchulz.Discourse structuresinmedicalreports{watchout! Thegeneration ofreferentiallycoherentandvalidtextknowledgebasesin themedSynDiKATesystem.InternationalJournalof MedicalInformatics,53(1):1{28,1999.
[10] UdoHahn,MartinRomacker,andStefanSchulz.How knowledgedrivesunderstanding: Matchingmedical ontologieswiththeneedsofmedicallanguageprocessing. ArticialIntelligenceinMedicine,15(1):25{51,1999. [11] UdoHahn,SusanneSchacht,andNorbertBroker.
Concurrent,object-oriented naturallanguageparsing:The ParseTalkmodel.InternationalJournalof
Human-ComputerStudies,41(1/2):179{222,1994. [12] UdoHahnandKlemensSchnattinger.Towardstext
knowledgeengineering.InProceedingsoftheAAAI'98, pages524{531,1998.
[13] UdoHahn,StefanSchulz,andMartinRomacker.
Partonomicreasoningastaxonomicreasoninginmedicine. InProceedingsoftheAAAI'99,pages271{276,1999. [14] UdoHahnandMichaelStrube.Centeringin-the-large:
Computingreferentialdiscoursesegments.InProceedingsof theACL'97/EACL'97,pages104{111,1997.
[15] JerryR.Hobbs,MarkE.Stickel,DouglasE.Appelt,and PaulMartin.Interpretationasabduction.Articial Intelligence,63(1/2):69{142,1993.
[16] KatjaMarkertandUdoHahn.Ontheinteractionof metonymiesandanaphora.InProceedingsoftheIJCAI'97, pages1010{1015,1997.
[17] MarthaS.Palmer,RebeccaJ.Passonneau,CarlWeir,and TimFinin.TheKerneltextunderstandingsystem. ArticialIntelligence,63(1/2):17{68,1993.
[18] LisaF.Rau,PaulS.Jacobs,andUriZernik.Information extractionandtextsummarizationusinglinguistic knowledgeacquisition.InformationProcessing & Management,25(4):419{428,1989.
[19] MartinRomackerandUdoHahn.Anempiricalassessment ofsemanticinterpretation.InProceedingsoftheNAACL 2000,pages327{334,2000.
[20] MartinRomacker,KatjaMarkert,andUdoHahn.Lean semanticinterpretation.InProceedingsoftheIJCAI'99, pages868{875,1999.
[21] StefanSchulzandUdoHahn.Knowledgeengineeringby large-scaleknowledgereuse:Experiencefromthemedical domain.InProceedingsofKR2000,pages601{610,2000. [22] StefanSchulz,UdoHahn,andMartinRomacker.Modeling
anatomicalspatialrelationswithdescriptionlogics.In ProceedingsoftheAMIA2000,pages779{783,2000. [23] StephenSoderlandandWendyLehnert.Wrap-up: A trainablediscoursemoduleforinformationextraction. JournalofArticialIntelligenceResearch,2:131{158,1994. [24] SteenStaabandUdoHahn.Comparativesincontext.In
ProceedingsoftheAAAI'97,pages616{621,1997. [25] SteenStaabandUdoHahn.\Tall",\good",\high"{
comparedtowhat? InProceedingsoftheIJCAI'97,pages 996{1001,1997.
[26] SteenStaabandUdoHahn.Scalabletemporalreasoning. InProceedingsoftheIJCAI'99,pages1247{1252,1999. [27] MichaelStrubeandUdoHahn.Functionalcentering:
Groundingreferentialcoherenceininformationstructure. ComputationalLinguistics,25(3):309{344,1999. [28] WilliamA.WoodsandJamesG.Schmolze.TheKl-One
family.Computers&MathematicswithApplications, 23(2/5):133{177,1992.