• No results found

A Reusable Lexical Database Tool for Machine Translation

N/A
N/A
Protected

Academic year: 2020

Share "A Reusable Lexical Database Tool for Machine Translation"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

A R E U S A B L E L E X I C A L D A T A B A S E T O O L F O R M A C l t l N E T R A N S L A T I O N

B R I ( H T T E BL,~SER ULRIKF, SCHWALL

IBM Germany IBM Germany

Institute for Knowledgc Institute for Knowledge

Based Systems Based Systems

P.O. Box 10 30 68 P.O. Box 10 30 68

W-6900 lleidelberg W-6900 lteidelberg

Email: alscbwee schwall

at dhdibm 1 .bitnet at dhdibm l.bitnet

O. A B S T R A C T

This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database Approach) which has been developed fur the construction and maintenance of lexicons for the maclfine translation system I,MT. First, the requirements such a tool should meet are discussed, then I,MT and the lexi- cal information it requires, and some issues con- cerning vocabulary acquisition are presented. Afterwards the architecture aml the components of the I,OLA system are described and it is shown how we tried to meet the requirements worked out em'- hier. Although I,OI,A originally has been designed and implemented for the German-English I,MT prototype, it aimed from the beginning at a repre- sentation of lexical data that can be reused for uther LMT or MT prototypes or even other NLP appli- cations. A special point of discussion will therefore be the adaptability of the tool and its cumponents as well as the reusability of the lexical data stored in the database for the lexicon development for I,MT or for other applications.

i. Introduction

The availability of large-scale lexical information has widely been recognized as a bottleneck in the con- struction of Natural Language Processing (NLI') systems. The lexical database I,OLA has been de- veloped in connection with the Logic- programming-based Machine Translation (LMT) system and shall be presented here. This work is part of the objectives of the project Transl,exis launched in 1991 at the Institute of Knowledge Based Systems of the IBM Germany Scientific Center. Transl,exis aims at the theoretically and empirically well motivated lexical description and the management of the lexical information of LMT in a database. It is conceived as a first step towards a reusable lexical knowledge base.

1.1. Requirements for

convenient

construction and maintenance of Lexicons

Based on our experience and existing literature, a tool for the construction and maintenance of large NLP lexicons with a complex entry structure should meet the following requirements:

tJ Adequate expressive power of the represen- tation formalism: the expressive power must be sufficient to cover the facts of lexical de- scription.

A N G E L I K A S T O R R E R University of Tiibingen Seminar ffir natfirlichsprachl. Systeme Wilhelmstr. 113 W-7400 'Fiibingen storrer at arbuckle.sns. ncuphilologie.uni-tuebingen.de t~ Methodology for the description of lexical in-

formation: criteria and guidelines relevant for encoding should be developed and docu- mented.

n Orientation towards lexicographic procedure: the design of the tool should take the logical course of the lexicographie work procedure into consideration and support it during all its steps and phases. The lexicographer should be ena- bled to concentrate on the lexicographic de- scription of lexical units while the tool itself automatically takes care of the remaining tasks in lexicon development.

n Consistency mid integrity checking of the lexi- cal data: when entries are added or updated, the system should reject invalid values for par- ticular features and check if the input leads to inconsistency of the database.

t~ Data independence: An extreme dependency between the structuring of lexical data stored in the database and the structure of the lexical entries in a given application system should be avoided. In tiffs way the lexical data will remain resistant to modifications in the NLP/MT-systems that make use of these data. t~ Reusability/Rcversability of the data (cf. Calzolari 1989, tlcid 1991): lexical data should be represented in such a way that it can -- apart from its transfer specific components - be re- used for other MT-prototypes with the same source or target languagc, or with the reverse language pair (e.g. German-English and English-German). Ideally, the lexical data should be independent to such a degree that they are also reusable for other NLP-applications.

D Multi-user access: it should be possible for se- veral users to work on the lexicon simultane- ously.

D llelp facilities: the criteria and guidelines for lexical description should be easily accessible. The availability of monolingnal and bilingual dictionaries are to support the lexicographer's linguistic competence.

1.2 L M T

LMT, developed by Michael MeCord, is in basic design a source-based transfer system in which the source analysis is done with Slot Grammar (cf. McCord 1989, 1990, forthcoming). Two main characteristics of LMT should be emphas~ed: 1. the lexicalism, arising from Slot Grammar

source analysis;

(2)

2. a large language-general X4o-Y-translation shell.

Both features facilitate the development of proto- types for ncw hmguage pairs I , Versions of I,MT (in wu'ious stages) exist currently for nine language pairs.

I,MT currently requires the lollowing types of in- formation to bc specified k)r lexical units (I,lJ):

pat1

of

speech; u word SellSCS i

u morphological properties; rl agreelnent features;

o the valency, i.e. the li'anlc of optiomd/ohligatory complement slots; o the specification of the fillers (Nl)s, suhordinate

clauses) for each slot;

t~ semantic compatibility constraints and collocations;

n characterization of mulliword lexmncs; u subject area;

translation relations; tq lexieal transtbmmtions.

In McCord (forthcoming), an external lexical tormat (l';lA:) is prescntcd wtfich allows the representation of the above information. 1Jntil now, however, the lexical data has been kept in sequential files attd updating tins been done with a text editor. Thtls most of tile above-mentiuncd requirenlents eoukl not be met.

1.3 Vocabulary Acquisition

The hand-coding of dictionaries is a laborious and timc-eonsuming task. Thercforc a nmnbcr of at- tempts have been made to exploit corpora and/or machine readable dictionaries (MRDs) for the build-up of NI A)-lexicons (el. 3.5) 2 . In many cases, however, the lexical information in M R D ' s is nci- thor complete nor sufficiently explicit for N L P / M T purposes anti has to he rcvised hy lexicographers. Ideally, the demands on a lexicograptter shoukl only bc of linguistic nature. For this reason a sophisti- cated tool is necdcd to guklc anti suppnrt the NlJ'/MT-lexicographer in revising entries auto- maticaHy converted from machine readable sources as well as in buikling up new vocabulary.

2 . L O L A - a r c h i t e c t u r e a n d c o m p o n e n t s

The lexical data base tool I,OLA aims at meeting the above mentkmed requirements. Its design and development are based on work achieved in the l,EX-project and the COIJ';X-projcct 3 . LOLA makes use of automatic consistency and hltegrity checks as well as of the support of multi-user access provided as standard facilities by the relational

I)BMS SQL/I)S. Updates are made with the help of a user interface tltat supports tim lexicographer during the encoding process. The representation of the lexical data has been worked out to be as inde- pendent as possible of the format of a specific ap- plicatiun lexicon, thus increasing the degree of reusability of the lexical data. In additiun, a cata- logue of criteria and guiddhms for lexical description is being elaborated and will be integrated into tim tool.

IXlLA L ~ x t c a l I ) a r . a i m ~ '1"~l

[image:2.432.226.388.118.266.2]

k . . .

3 / /

]

~x ~

Figure

I. IX)I,A -

architecture

The main components of the architecture of the 1,OI,A system arc tile following (of. l:igure 1):

1. LO1,A-I)B: the database itself.

2. COI,OI,A (COder's interface to LO1,A): Interface tilt hand-codiug and modificatkm of tim lexical data, stored in I,OLA-I)B. 3. I)B TO LMT: program tll~d generates I,MT

lexicon matfies from the lcxieal data stored in

1 I ) I , A - I ) B .

4. I,MT TO 1)11: program that loads already ex- isting 1 ,MT lexicons into I,OI,A-I)B. 5, I,I)11 T O DB: program that converts data

from ~M l(l)'s into I ,O 1 ,A-1)B.

In the tbllowing we give a brief description of these components.

2.1. The Database

The database was desi~md iu two steps: develop- merit of the c o n c e p t u a l scheme and dcvclopment of tim database scheme.

In the conceptual design phase, the lexical objects, ttmir properties, and their interrelations were re- presented in an entity-relatinnship diagram (of.

I , M T is file technical basis of an international project at IBM wifll cooperation between IBM Researdl, tile IBM

S c i e n c e Centers in lleidelberg, Madrid, Paris, llaifa, and Cairo, and IBM European l.anguage Services in

Copenhagen (cf. Rimon et al. 1991).

2 Cf. Byrd et al. 1987; for an overview of related activities within file l,MT-project, cL Rimon et al. 1991, pp. 14-15.

Cf. Barnett et al. 1986; Blumenthal et al. 1988; Storrer 1990.

(3)

C h e n 1976). A l t h o u g h the E R - m o d e l does not have the expressive power to cover all aspects o f lcxical description, especially complex constraints, it has been chosen here as a c o m p r o m i s e between a cmn- plete lexical representation a n d the realization in a traditional database system.

The resulting E R - d i a g r a m for the German-English lexicon is s h o w n in Figure 2 a .

T h e conceptual scheme is still independent o f the choice of a specific D B M S and o f other implemen- tation aspects. The basic principles o f the concep- tual design o f our database will be sketched out in the following.

Orientation towards linguistic structure, not towards the structure of the application lexicon.

The diagram reflects, in the first place, the structure o f the linguistic objects, their properties a n d and their interrelations, and it is influenced to a smaller degree by the structure o f the application lexicon. As a consequence, the data is quite resistant to structural changes hi the format of the application lexicon. The abstraction from the structures of the application lexicon has a positive side effect with regard to the exploitation of m a c h i n e readable lexi- cai resources: o n one hand, we can handle cases, in which not all information required by L M T is pro- vided in the entries o f M R D ' s . T h e information acquired c a n be stored as entries to be completed and revised later. O n the other hand, we are free to store types of lexical information that are o f rele- vance for N L P applications a n d can be acquired from M R D ' s o r other N L P lexicons but are not processed in a current LMT-version. We can save t h e m in the database as coding aids for the lexicographers, for future p r o t o t y p e versions, or other N L P applications.

Analogous structure for source and target language wherever possible.

The lower part of the E R - d i a g r a m represents the G e r m a n source, tile u p p e r part tile English target language. F o r b o t h languages, a n entity of the type entry can have one or more homonyms, each of which c a n have one or m o r e senses. The senses themselves can open one or more sense-specific slots ( o n e - t o - m a n y relations). A sense-specific slot can be filled b y several types o f fillers and the same type of filler can flU several sense-specific slots ( m a n y - t o - m a n y relation). The basic types o f entities and relations, which are the same for all languages, are described by their characteristic features represented as attributes. T h e n u m b e r of attributes as well as

their values m a y differ according to language- specific peculiarities s .

Many-to-many relations between the lexical objects of both languages.

We represent the relation o f lexical equivalence be- tween source a n d target senses, as a m a n y - t o - m a n y relation (one source sense can have multiple target equivalents and vice versa). This breaks with the traditional hierarchical entry structure o f bilingual dictionaries (Calzolari et at. 1990), but it avoids re- d u n d a n t description and storage of one target sense that is lexically equivalent to different source senses. A n o t h e r relation holds for the sense-specific slots o f two senses that are regarded as lexicMly equiv- alent. We decided to establish this relation between slots and not between slot frames. This way we caaa elegantly describe lexically equivalent senses with n o n - c o r r e s p o n d i n g slotframes ~ . In this w a y the re- lations between the two languages m a y be used to a great extent bidirectionally for the XY- as well as for tbe YX-language pair.

The conceptual scheme captured in the E R - d i a g r a m was then m a p p e d into a database scheme and im- plemented in tile relational D B M S S Q L / D S . We chose a relational D B M S , because - - for the main- tenance o f the large L M T - G E lexicon (about 50,000 entries) -- we were in need of a stable D B M S which supports multi-user access, has facilities for auto- matic checking of consistency and integrity of the lexical data mad allows for the specification o f mul- tiple user-specific views o n the data. T o avoid re- d u n d a n c y and update anomalies we tried to normalize o u r relations as far it was useful with re- spect to o u r approach. In total, 32 tables are imple- mented: 25 tables describe lexical objects a n d relations b y means o f attributes with associated val- ues, 7 tables serve to store "knowledge a b o u t the lexical knowledge", e.g. the admitted values for at- tributes sucia as semantic type, filler-type, slot-type for both languages.

2.2. C O L O L A : the u s e r interface to L O L A

C O L O L A is the user interface to L O L A - D B that looks u p the lexical data o f a given search w o r d a n d displays it o n sequentially connected menus. T h e design of the menus as well as their sequential order was guided by the m a n n e r in which lexicographers describe lexical entries. The following operations c a n be performed:

4 The boxes represent types of entities, the diamonds represent types of relations between entities, the ellipses represent attributes which characterize types of entities or relations. The labels of the connection lines indicate whether the relation in question is a one-to-one, one-to-many, many-to-one or many-to-many relation.

The ER-diagram is a simplified version of the actual conceptual model. For the purpose of this paper, several entity types, attributes, and relations have been leR out.

s E.g.: in German a preposition like "auf' can govern either an accusative NP ("warren auf') or a dative NP ("lasten auf') depending on the verb that lakes the prepositional phrase with the respective preposition as a complement. Therefore "case" is a feature, relevant for the description of German slot fillers filling a prepositional complement slot,

s E.g. cases like "like" and "gefallen" where the subject of the English verb corresponds to the dative object of the German verb; or cases like "geigen" and "play the violin" where the English direct object filler "the violin" is incor- porated in the semantics of the German verb "geigen".

(4)

]

" - " 7 " ~ / ~ " 1" \ " \ n,

F i g u r e 2. E n t i t y R e l a t i o n s h i p D i a g r a m f o r [ ; e r m a n - E n g l i , - . h

n addition o f new source or target entries n deletion of existing entries

n change of existing or addition of new features to existing entries

[] deletion of features from existing entries u assigmnent of new or deletitm o f existing

transit(inn equivalents for a given source sense N update, insertinn or deletion of transfer infnr- m a t i o n for each pair o f translation equivalents.

F o r each part o f speech, a specific sequence of menus is defined. There are ntcnus for h o m o n y m s a n d senses of source and target entries; the qinkiug" o f the source senses and target senses regarded to be lexically equivalent is clone via transfer menus. This allows lexicographers to spcci~dize on specific parts o f speech or on specific fi~atures which can be locally updated.

C O I , O L A controls multi-user access to the L O I , A - I ) B so tltat several lexicographers (:an up- date the lcxical database simultaneously. The log- ical unit of work is the source or target h o m o n y m : when a lcxicngraphcr rcqucsts to update a h o m o n y n t , this h o m o n y m , together with its senses, is locked t a r other users.

If a new entry contains blanks, a multiword m e n u is called where the multiword is split u p into its c o m p o n e n t s . F o r each c o m p o n e n t , the following lexical information is gathered: the part of speech, whether the word m a y inflect within the multiword, and w h e t h e r a phrase can be inserted between one multiword c o m p o n e n t mKl tire previous one with- out doing a w a y with idiomaticity.

If n e w h o t n o n y l u s o r 8 e l l s e s ;:ire i n s e r t e d o u t h e multiword m e n u as well as o n other m e n u s , default wducs for tcaturcs arc displayed. They can either be accepted or rejected and overwrittcn by the lcxiengraphcr 7 . T h e assumptions o n dethult values for attributes nf lcxical intbrmation m a y differ ac- cording to diffcrcnt g r a m m a r s and systems. We therefore decided to store the complete lexJcal in- fbrmatinn and use dcfault wdues as proposals in tire user interface. With this a p p r o a c h we aUow for two advantages: on one hand, the data in thc database can be used lot difllsreut appficatinrts having distinct theory specific assumptions o n defaults. O n the other hand, the user of C O 1 , O I , A can bcuetit li'om the economic advantages ()f default assumptions. C()I~OLA does extensive crmsistency chccking of the values entered by tire lexicographers. Illegal val- ues arc rejected a n d warning messages are displayed in situations where errors nfight easily occur. Al- t h o u g h m u c h of tim consistency cttecking is sup.. pnrted by the database m a n a g e m e n t system, some extensions were necessary.

Further support for the lexicographers is provided by an interface tn the W o r d S m i t h nn-linc dictit)nary system (cf. Byrd/Neff 1987). Several machine read- able dictionaries are available e.g. Collins G e r m a n - tinglish, F, nglish-German, L o n g m a n ' s I)ictionai T o f C o n t e m p o r a r y l':uglish, and Webster 7th Collegiate dictionary. The lexicographer can look u p entries in these dictionaries during the encoding process. Furthermore, help menus are provided in which ttre vMid values tbr specific features can be looked tip.

7 Default values are provided, for instance, for slot fillers. German direct objcct slots get an accusative noun phrase as lhe default filler. The lexicographers may accept this, add other fillers or write over it with ann(fief filler.

[image:4.432.54.358.29.215.2]
(5)

2.3. D B 7"0 L M T

A conversion p r o g r a m I)B T O L M T has been dc- veloped which extracts lexical in-formation stored in the relations of I , O I , A - D B and converts it into l , M T - l b r m a t . 1)11 f l ' O _I,MT consists of two com- ponents:

u a datalmsc extractor and O a c o n v e r s i o n program

The database extractor selects the source entries and the corresponding target entries and stores them in database format. This format can be regarded as an intermediate representation betwccn database scheme and I,MT-format. It consists nf a set of Proh)g predicates which correspond to the relations of thc database scheme. Thcrc are, for instance, entry, h o m o n y m , sense, and slot predicates which correspond to the entry, h o m o n y m , sense, and slot relations in the database. The conversion program finally converts the database format into the LMT-format. It has to be adapted according to the changes or extcnsions of" the l,MT-ff~rmat.

2.4. L M T TO D B

Before and during L O L A design and development, I,MT lexicons in EI, F were already crcated and updated in fdes. Since these lexicons still need up- dating and since this is m u c h better supported by 1,OLA, a conversion p r o g r a m L M T T O DB was needed which converts EI, I r entries c~f lextcon files into the database format and loads them into I , O I , A - D B . I M T _ T O _ I ) I ~ consists of three corn- ponents:

t~ the lexiemt cmnpiler o f L M T , tJ a conversion component, and

[] a database loader.

The lexicon compiler is the c o m p o n e n t of the I,M'I" system which converts the EI,F into tire internal I , M T format s . In the internal format all abbrevi- ation conventions and default assumptions are al- ready interpreted a n d expanded accordingly so that the complete lexical itfformation is represented ex- plicitly. The c o n v e r s i o n c o m p o n e n t then converts the internal l , M T - f o r m a t to database forrnat. The database loader generates the SQL-statements and updates the database. It has to check first whether the h m n o n y m or sense to be inserted is identical with ,an h o m o n y m or sense stored in the database. If all the features of two h o m o n y m s or senses can be unified, they are regarded to be identical and the already existing entry is merged with the converted entry. In all other cases the h o m o n y m or sense is

inserted into the database and merging has to be done b y the lexicographers with COI,O1,A.

2.5. L D B T O

D B

T o supplement the lexical coverage o f the L M T system, a dictionary access m o d u l e has been devel- opcd which allows real-time access (cf. Neff/McCord 1990) to Collins bilingual dictionaries awfilable as lexical data bases (l,l)Bs) 9 . The m o d - ule includes a language pair independent shell com- p o n e n t C O l ,1 ,XY a n d laoguage-speeific c o m p o n e n t s ,'rod converts the lexical data of the L l ) l l into the l , M l ' - f o r m a t , l,Dll T O l)ll is based on these programs. It consists of

[] a pattern matching component, tJ a restroettrring enmlmnent ,

a corrvel'sion eonlllofierrt , and t~ the database loader of 1,MT T O I)B.

With the pattern matclfing cmnlmnent, those fea- tures (sub-trees) that are to be converted are selected from the dictionary entries. In printed dictionaries, features colnnlon tO more th,'m one sub-tree are of. ten factorcd out in order to save space. With the re,structuring component, those features can be moved 1o the sub-trees they logically belong to. The conversion ennlponent converts the restructured dictionary entry to database format. T h e database loader of L M T T O I)B merges the entry with a possibly a l r e a d y e x i t i n g one in I , O I , A - D B anti generates the SQl,-staternents to update the data- base. The converted entries can be revised by the lexicographers with C O I , O I , A .

3. R e u s a b i l i t y o f t h e L O L A s y s t e m

3.1 Reusability o f the tool components

The first I,OI,A prototype was tleveloped to sup- port lexicon development for the language pair German-English. In the meantime, work has been started to make the tool usable for lexicon develop- rnent o f the Fnglish-I)anish and English-Spanish I , M T systerns. As a positive result o f the design principles described in section 3.1., the database scheme had to be modified only slightly with regard to prototype-specific differences I° . The values for" 1,'mguage-specific attributes such as types of slots mad fillers will be defined for" the " n e w " languages Spanish and Danish and will be stored in the data- base. T h c y can then bc used fi~r consistency check- ing (only defined valucs can be u p d a t e d in the database). In C O I , O I , A we had to take into ac- count the h o m o n y m level o n thc target side, where

s In tile morpho-lexical processing and compiling phase, I~,I.F entries are converted into an internal format (cf. McCord (forthcoming): sect. 2) which represents file initial source and transfer analysis of an individual input word string.

9 An LI)II provides a tree representation of tile hierarchical structure of tile dictionary entries. The nodes of tile tree are labeled with attributes having specific values for each individual entry. The t,DB can be queried wilt the spe- cialized query language LQL (cf. Neff/Byrd/Rizk 1988).

~o English-Daoish and English-Spanish use lexicon driven morphology for the target languages Spanish (cf. Rimon e t

at. 1991) and Danish, whereas German-Engllsh uses a rule-based target morphology for English (el', McCord/Wolff 1988).

(6)

the features of Spanish ,anti l)anish morphology have to be specitied. The programs that convert the database entries into the timnat of the application lexicons and vice versa ( ] ) l l f l ' O I,MT mul I,MT_TO DIt) need generalization ]il nrdei to achieve an abstraction from prototype-specific lea- lures of ] ,MT.

3.2 R e u s a b i l i t y o f the lexical d a t a

In order to meet the requirement of data independ- ence, the representation of lexical entries in the da- tabase is highly independent of that in the application lexicon. In the database, the description of linguistic entities and their interrelatkms is given in a set of tables where specitic values air stored lor the characteristic attributes of each individual entity. On these tables, different views can be defined for different types of users, l)iffercnt programs (like I)B TO I,MT) can extract exactly the attribute values needed fur their respective appficatiou and convert them into each given format. This way, from one and the same data base several lexicons can be generated, in which the same "lhaguistic world' is structured differently or represented in a completely ditti~rent way. The possibilities of reus- ability ,are naturally defined and limited by thc number of the registered types of lexical information in the origin',d data base. As far as the LOI,A da- tabase is concerned, the very detailed description of slot frames as well as the information about nmlti- words and the properties of their emnponents may be reused for other N L P applicatkms with one of the languages inwflved. The reusability of the transter intormation (specitied in the transfer re- lations between the languages of a given language pair) for other MT systems depends highly on the respective MT approach. As to the question of rcusabilty of the data in the L M T system "family", three different cascs have to be distinguished:

1. lexical-data description given for a source lan- guage X is reused for another language pair having X as source language,

2. lexical-data description given for a source lan- guage X is reused for another language pair having X as target language,

3. lexical-data description given for a target lan- guage Y is reused for another language pair having Y as source language.

In the tirst two cases, reusability of tim lexical data of language X is very high. In tim ttfird case, the description of Y as source 1;mguage may have to be more detailed in order to achieve ml adequate syn- tactic analysis n . New attributes or even new types of entities or relationships may be needed mad the database scheme will have to be etd~aneed accord- ingly.

4. O u t l o o k

Our long-term goal is a multilingnal database, in wlfich the lexical knowledge tbr each language in- volved in the 1,MT project is represented only once. Application lexicons lor I,MT prototypes with dif- ferent language pairs arc generated by extracting tile requitvd informatiun from the database and by converting it into the respective l,MT-format. Furthermore, tim tool is to be extended in such a way that it is not restricted to the construction of MT lexicons, hut c~m also be used as a terminology workbench and thus support tim construction and m;dntenance of terminology. An integra|ed M T and tcrminulogy database would have the advantage that the Icxical knowledge encoded by terminnlogists aud translators can be used by tim tr,'ulslalkm system as wclh For refinement and completion of the description of the German lan- guage, it is pl~mned to integrate lurtber i~ffurmation from available Gerinan NI,I' lexicons into the I,OI,A-DII. A basic proldem concerning this undertaking will bc to identify and to match the basic categories "entries", "homonyms", "senses", which arc detined in various Icxical resources ac- cording to diffbrent criteria, only some nf which be- ing transparent. With this erfurt, we hope to gain Ihrther knowledge on the limits and possibilities concerning the reusability tff lexical data.

5. R e f e r e n c e s

Barnett, B., 11. l,ehmann, M. Zneppritz (1986): "A Word Database for Natured I,anguage Processing", l'roceedings I l th International Con]erence on Com- putational Linguistics COLlNG86 Augast 25th to 291h, 1986, Bonn, Federal Republic of Germany, pp. 435-440.

Btumcnthal ct al. (1988): "Was ist eigentlich ein V e r w e i s ? - Konzeptuellc Dateumodellierung als Voraussctzung computcrgestiilzter Verwcisbehand- lung." , Ilarras,G.(ed.): Das WOrlerbuch. Artikel und Verweisstrukturen. l)fisseldorf 1988.

Byrd, R., N. Calzolari , M. Chodorow, J. Klavans, M. S. Neff, O. Rizk(1987): "Tools and Methods tbr Computatkmal I,exicography", Computational Lin- guistics,13, 3-4.

Byrd, R. J., M. S. Neff" (1987): WordSmith User's Guide, Research Report, IBM Research Division, Yorktown lleigbts, NY 10598.

Chen, Peter P.-S. (1976): "The lintity-Relationship Model - Towards a Unified View of Data" , ACM Tratt~actioas on Database Sy.vtetrtv 1, pp. 9-36.

Calzolmi, N. (1989): "The Development of l,arge Muno- and Bifingual I,exical Databases", Contrib- ution to the IBM Europe Institute "Computer based

ii ii.g. in a source-based translation system like LMT, tile information on whether a target slot is obligatory is not di- rectly encoded in tile I,MT transli~r-lexicon; tile system controls target slot asslgnmeat by tile presence of a corre- sponding source slot in a given input sentence and tile mapping relalion specilied within tim transfer lexicon entry. On the source side, however, the feature of slot obligatorlness is used for purposes of analysis disambiguation.

(7)

TrarLffation of Natural Language", G~ua'nisch- Partenkirchen.

Calzolari et al. (1990): "Computational Model of the Dictionary F.ntry - Preliminary P, epnrt", Pro/ect

AQUILEX, Pisa.

IIeid, U. (1991): A short report on the EUROTRA-7 Study, Univ. of Stuttgart t99t.

MeCord, M. C. (1989) "A NEw Version of the Machine 'l'ranslatkm System I,MT", Literary and Linguistic Computing, 4, pp. 218-229.

McCord, M. ('. (1990): "Slot Grammar: A System for Simpler Construction of Practical Natural l~tH- guage Grammars", In R. Studer (ed.), Natural Lan-

guage and Logic: International Scientific

Symposium, Lecture Notes in Computer Science,

Springer Verlag, Berlin, pp. ] 18-145.

MeCord, M. C. (forthcoming): "The Slot Gram- mar System", In J. Wedekind and Ch. P, ohrer (ed.), Unification in Grammar, to appear in MIT Press.

McCord, M. C., Wolff, S. (1988): The Lexicon and Morphology for LMT, a Prolog-based M T system,

Research P, eport RC 13403, IBM Research Divi- sion, Yorktown lleights, NY 1(1598.

Neff/Byrd/Rizk (1988) M. S. Neff, R. J. Byrd, O. A. P.izk: "Creating and Querying }lierarclfical l,cxical Data Bases", l'roceedings of the 2rid ACI. Conference on Applied NLP, 1988.

Neff, M. S., M. C. McCord (1990): "Acquiring Lcxical l)ata From Machine-readable Dictionary Resources for Machine Translation", Proceedings

of the 3rd Int. Conf. on Theoretical and

Methodological Issues in Machine Translation o f Natural Languages, pp. 87-92. Linguistics Research Center, Univ. of Texas, Austin.

Storrer, A. ( 1 9 9 0 ) : "()berlegungen zur Reprhsentation der Verbsyntax in einer multifunktional-polythcoretisehen lexikalischen I)atenb.'mk" , Sehaeder,B./Rieger,B. (ed.): Lexikon und Lexikographie: ma.~chinell - maschinell gestatzt.

Grundlagen Entwieklungen Produkte,

Hildesheim, pp. 120-133.

Ri[non, M., McCord, M. C., SchwaU, U., Martinez, P. (1991): "Advances in Maehiue Translation Re- search in IBM," Proceedings of M T Summit III, pp. 11-18, Washington D.C.

Figure

Figure I. IX)I,A - architecture
Figure 2. Entity Relationship Diagram for [;erman-Engli,-.h

References

Related documents