
EXPLOITING A LARGE DATA BASE BY LONGMAN

A. MICHIELS (English Dept), J. MULLENDERS (Computer Centre), J. NOËL (English Dept), University of Liège, Belgium

We wish to explore some of the aspects of the exploitation of two dictionary files by LONGMAN Ltd, one for 'core' English and one for English idioms.

We'll try to show the feasibility of an approach to language processing based on a lexicon, conceived of as the repository of grammatical, semantic and knowledge-of-the-world information.

After giving a brief description of the computer files (Section I) we'll focus on the following points :

a) a lexical approach to grammar allows a considerable simplification of the PSG component of a parsing system (Section II, Part One);

b) the syntactic potential of many lexemes (at surface structure level) can serve as a guide to their deep structure configurations (Section II, Part Two);

c) provided that a dictionary makes use of a limited defining vocabulary, the texts of the dictionary definitions can be processed on the basis of correlations between syntactic structures (filled with individual lexemes or lexemes belonging to specifiable classes) and semantic relationships such as that between a process verb and an instrument (Section III).

SECTION I. DESCRIPTION OF THE COMPUTER FILES

A contract with LONGMAN Ltd has made it possible for us to have access to the computer files of two dictionaries, LDOCE (LONGMAN DICTIONARY OF CONTEMPORARY ENGLISH) and LDOEI (LONGMAN DICTIONARY OF ENGLISH IDIOMS). We have had the LDOCE file for some time but have only just received the LDOEI one.

The features of LDOCE which make it specially useful for language processing are the following :

a) it reflects the surface structure environment of its entries by means of a sophisticated system of grammatical codes, most of which can be thought of as strict subcategorization features. For instance, LDOCE specifies

1.- that nouns like FACT or CLAIM can be followed by a THAT-clause,

2.- that a verb such as WATCH can occur followed by an NP followed by an ING-form (we watched the soldiers bleeding).

Though it is mainly concerned with SURFACE structure, LDOCE nevertheless distinguishes between an NP pair following GIVE (He gave his brother a new bicycle : NP1 NP2, [D1] code) and one following CONSIDER (He considered his brother a f[...] : NP1 NP2, [X1] code).

b) through a system of semantic codes of the Katz-and-Fodor type (these codes do not appear in the printed version of the dictionary), LDOCE places semantic restrictions on the subjects and objects of verbs (or on the type of noun that an adjective can modify), specifying for instance that PERSUADE requires a [+HUMAN] object, and EXTEMPORIZE a [+HUMAN] subject.

c) LDOCE makes use of a defining vocabulary of some 2,000 items - all the definitions and all the examples associated with the 60,000 entries are couched in that restricted vocabulary.

Concerning points a and b it should be emphasized that the grammatical and semantic codes can appear at two different levels :

1.- ENTRY level : the code is appropriate to all the definitions of the entry in question,

2.- DEFINITION level : the code is not appropriate to the whole entry (i.e. in all its senses) but only to those readings that correspond to the definitions that the code is tagged to.

For instance, READ cannot be assigned the same grammatical and semantic codes in sentences 1 and 2 :

1.- He manages to read at least one book every day.

2.- Your paper doesn't read too well.

This second level makes it possible to avoid a proliferation of indiscriminate disjunctions in the specification of the codes to be associated with a given lexeme. It seems to us that by restricting the occurrence of code specifications to only one level (namely, the ENTRY level), one reduces the predictive power of both grammatical and semantic codes to practically nil in the case of complex entries. The codes that are appropriate at DEFINITION level, by contrast, provide an interesting type of correlation between strict subcategorization and selection rules on the one hand and choice of appropriate reading on the other : such a type of correlation is bound to prove very useful for machine translation purposes.
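The two levels of coding can be pictured with a small data structure. What follows is only a sketch under stated assumptions : the class and field names, the paraphrased definitions and the codes attached to the two readings of READ are invented for illustration and are not taken from the LDOCE file.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    text: str
    gram_codes: set = field(default_factory=set)  # DEFINITION-level codes
    subject_feature: str = ""                     # DEFINITION-level semantic code


@dataclass
class Entry:
    headword: str
    gram_codes: set = field(default_factory=set)  # ENTRY-level codes (all senses)
    definitions: list = field(default_factory=list)


def readings(entry, satisfied_code):
    """Return the definitions compatible with the code satisfied in the sentence."""
    if satisfied_code in entry.gram_codes:
        # An ENTRY-level code holds for every definition of the entry.
        return list(entry.definitions)
    # Otherwise only the definitions tagged with that code survive.
    return [d for d in entry.definitions if satisfied_code in d.gram_codes]


# Hypothetical values: the codes and the wording are illustrative only.
READ = Entry("read", definitions=[
    Definition("to understand something written", {"T1"}, "+HUMAN"),
    Definition("(of something written) to give a certain impression", {"I"}, ""),
])
```

With these assumed values, a sentence in which READ takes an object keeps only the first reading, while a sentence such as "Your paper doesn't read too well" keeps only the second, so that the choice of reading follows from the code that is satisfied.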

Owing to the use of the same defining vocabulary, LDOEI is a natural extension of LDOCE. Whereas the latter merely lists the idiomatic phrases under the relevant headwords, LDOEI gives the information necessary for recognizing and generating all the syntactic and morphological variants of each idiom. To give only one example, in the entry "TELL° I WHERE TO GET OFF [V : Pass 2]" the sign ° indicates that TELL admits of morphological variation in this phrase, I specifies the place of the indirect object (which does not belong to the idiomatic phrase as such) and the grammatical note [V : Pass 2] informs the user that the syntactic value of the whole phrase is verbal (i.e. that it functions as a VP) and that the passive is to be formed by selecting the indirect object as subject ("He was told where to get off").

LOLEX (LONGMAN LEXICON)

This forthcoming thesaurus is also designed to tie in with LDOCE, of which it is partly a by-product. As Section III will make clear, our analysis of LDOCE definitions will have to rely on a thesaurus, but we do not know yet whether LOLEX will be available in machine-readable form.

SECTION II. TOWARDS A SEMANTICALLY ENRICHED SURFACE PARSER BASED ON LDOCE

I.- [...]

It stands to reason that automatic parsing programmes have to have access to at least two linguistic components : a grammar and a lexicon.

In most systems that we know something about, the grammar is a good deal more sophisticated than the lexicon. The latter includes only a small sub-part of the total lexicon for the language under study, while the grammar takes care of a large proportion of the basic grammatical structures.

We would like to explore a diametrically opposed approach : our starting-point is a sophisticated lexicon for core English and our aim is to make maximum use of the information it contains to keep our grammar within strict bounds.

An obvious first step in developing a parser based on LDOCE is to write algorithms that translate the various grammatical codes into scanning procedures. Most of these algorithms are fairly straightforward and have already been written. What we would like to focus on here is the simplification of the categorial component that such a lexically based syntax permits. Consider 3 :

3. The claim that he has succeeded is patently false.

Since there is a code (namely, 5) that stipulates whether an element (in this case, a countable noun coded C - the whole code is therefore [C5]) can be followed by a THAT-clause, we will not attempt to account for THAT-clauses via rewrite rules for the category NP, i.e. we won't have such a rule as :

NP → NP THAT S

Naturally enough, there is no LDOCE code stipulating that a noun can be followed by a relative clause (such a code would be meaningless since virtually all nouns can have a relative clause - if not a restrictive, then at least an appositive one - tagged on their right). We will therefore have to include relative clauses somewhere in our rewrite rules for the category NP. Here too, however, the lexical approach to syntax can prove useful. To show this, let us first define a CONCATENATION as a string every member of which is tied to some other by means of a LDOCE grammatical code (it requires the other member for the satisfaction of its code or it serves to satisfy the other member's code). The concept of CONCATENATION can be equated with that of CLAUSE if it is extended to cover :

1.- free elements, i.e. elements which are not bound to one particular word or phrase inside the clause (both sentential adjuncts and linking words such as conjunctions would fall into this category).

2.- a subject role, i.e. the creation of a link between a tensed V (the starting-point for the concatenation - see below) and an NP to be found on its right or on its left.

We have already looked into the mechanisms of tensed V searches and subject role assignments and we have found that various properties of English make the task of algorithmizing these mechanisms less formidable than it appears at first sight. The most prominent among these properties are the following :

1.- the conditions of use of the auxiliary DO;

2.- the fact that only tensed Vs require a subject;

3.- the fact that only the first (i.e. leftmost) member of a verbal group can bear tense;

4.- the fact that it must bear tense;

5.- the morphological contrast between verb and noun with respect to number (-S marks singular verbs but plural nouns).

Turning now to relative clauses, we see that we can characterize them with great ease : a relative clause is a concatenation that opens with a relative phrase (one of whose realizations is Ø and another the multi-purpose word THAT, so that a recognition procedure based on the occurrence of particular morphemes is bound to fail in some cases) and that misses an NP (it is this second property that has to be regarded as essential).

The readers who are familiar with Hudson 1976 will have realized that the approach advocated here is nearer to Hudson's version of systemic grammar than to transformational grammar : we make full use of sister-dependencies, starting with the tensed V, which we believe to provide the best entry-point into the network of relationships woven by the various code-bearing elements in a sentence.

II.- Deep structure configurations.

It is obvious that our parser will have to be able to :

1.- recognize the situations in which the basic order of the constituents (i.e. the one stipulated in the scanning procedures associated with the grammatical codes) is disrupted under the effect of transformations (such as PASSIVIZATION, TOPICALIZATION, RELATIVE CLAUSE FORMATION, GAPPING, ...)

2.- keep track of the constituents that have been moved.

We do not intend to deal with these points here but we would like to stress that the problems for RECOGNITION are very different from those for GENERATION. RAISING and EQUI, for instance, are rather formidable and problem-ridden rules from the point of view of generation but we shall argue that we do not need their counterparts for recognition purposes. We shall illustrate this point by looking at verb complementation - at the same time we will show that the syntactic potential of a verb can be used as a guide to its deep structure configuration.

In a VP the SYNTACTIC head is always the first, i.e. tensed verb. As we have seen, the way the parser builds up concatenations reflects this property. As for the SEMANTIC head, it is very often another verb than the first one. This, however, does not matter in so far as the auxiliaries and semi-auxiliaries (HAPPEN, SEEM, ...) do not have any semantic code associated with them and can therefore be regarded as semantically transparent : they have no effect whatsoever on the pairs that the semantic component will be called on to examine for compatibility. Consider such a sentence as 4 :

4.- My father seems to have been reading too many strips.

The starting-point for building the concatenation would be the tensed V, i.e. SEEMS : the concatenation would be allowed to grow both to the left (assignment of subject role to the NP 'my father') and to the right. The appropriate syntactic code for SEEMS is [I3] here (i.e. followed by an infinitive with TO) :

My father (NP, subject) seems to (satisfies [I3] code of SEEMS)

SEEM is not coded semantically, so that the semantic component would not be called on at this stage. In the next step, HAVE would be examined and its [I8] code seen to be applicable ([8] specifies that the code-bearing element be followed by an EN-form) so that a new sister dependency would be established :

My father seems to have been (satisfies [I8] code of HAVE)

In similar fashion, BEEN would have an [I4] code (i.e. + ING-form) satisfied by READING :

My father seems to have been reading

Neither HAVE nor BE is semantically coded with respect to the definitions that have been chosen on the basis of the grammatical codes that are satisfied in the sentence. READING, on the other hand, will be coded syntactically (it requires one NP as object - code [T1]) and semantically (it requires a [+HUMAN] subject). Since SEEM, HAVE and BEEN are semantically transparent, the semantic component will examine the pair 'my father' and 'reading' and find them to be compatible as a subject-verb configuration. But how does the parser know that 'father' is the subject of 'reading' ? A very simple-minded rule states that there is no change in subject in a verbal complex as long as there is no interrupting NP; if there is one, it is to be regarded as the subject of the following verb(s) :

I want to read
I started to read             (I subject of READ)
I happened to be reading

I want you to read
I saw you reading             (you subject of READ)
I made you read

This rule admits of at least one exception, namely PROMISE :

I promised you to read (I subject of READ in spite of interrupting YOU).
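The simple-minded rule and its PROMISE exception can be sketched as follows. This is only an illustration under stated assumptions : the input is taken to be a list of (lemma, tag) pairs already produced by some earlier step, the tags 'NP', 'V' and 'TO' and the function name are ours, and the only exception listed is the one mentioned above.

```python
# Verbs whose following NP does NOT become the subject of the next verb.
SUBJECT_KEEPING_VERBS = {"promise"}


def subject_of_last_verb(tokens):
    """tokens: list of (lemma, tag) pairs, tag being 'NP', 'V' or 'TO'.

    Returns the NP understood as subject of the last verb of the complex,
    or None if the input contains no verb.
    """
    verb_positions = [i for i, (_, tag) in enumerate(tokens) if tag == "V"]
    if not verb_positions:
        return None
    subject = None
    governing = None  # most recent verb met so far
    for lemma, tag in tokens[:verb_positions[-1]]:
        if tag == "V":
            governing = lemma
        elif tag == "NP":
            # The first NP is the subject of the tensed V; a later NP takes
            # over as subject unless it interrupts after a PROMISE-type verb.
            if governing is None or governing not in SUBJECT_KEEPING_VERBS:
                subject = lemma
    return subject


want = [("I", "NP"), ("want", "V"), ("you", "NP"), ("to", "TO"), ("read", "V")]
promise = [("I", "NP"), ("promise", "V"), ("you", "NP"), ("to", "TO"), ("read", "V")]
```

Here `subject_of_last_verb(want)` yields `"you"` and `subject_of_last_verb(promise)` yields `"I"`; an NP that follows the last verb (its object) is deliberately left out of consideration.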

Another problem relating to deep structure configurations is that of determining, in

V + NP + { (TO) + INFINITIVE | ING-FORM }

structures, whether the NP is to be regarded as the object of the V or not (contrast 'I want him to go' with 'I persuaded him to go').

Instead of going into each deep structure distinction that can be drawn within the field of verb complementation, we will show that the verb classes which Akmajian and Heny 1975 (p. 364 and foll.) find it necessary to set up in their introduction to transformational grammar to account for deep structure distinctions (Figure 1) can be held apart on the basis of their surface structure potential as captured in their LDOCE grammatical codes.

Figure 1. Akmajian and Heny's verb classes. See appendix I.

The raised numbers on the features in the matrix below refer to the following list of test sentences :

1. I want to go
2. I want him to go
3. a) ? I want that he should go
   b) * I want that he goes
4. * I persuaded to go
5. I persuaded him to go
6. * I persuaded that he went
7. * I believe to have gone
8. I believe him to have gone
9. I believe that he has gone
10. I failed to go
11. * I failed him to go
12. * I failed that he went

CLASS NUMBER +              CODES
ONE TYPICAL EXPONENT        T3/I3     V3/X (to be) 1     T5/T5a

I   : WANT                  +1        +2                 -3
II  : PERSUADE              -4        +5                 -6
III : BELIEVE               -7        +8                 +9
IV  : FAIL                  +10       -11                -12

The NP following the verb is its deep object only in the case of Class II verbs ('I persuaded him to go' implies 'I persuaded him'); there is no NP in Class IV (* I failed him to go) and the NP is not the object in Class I or in Class III ('I want him to go' does not imply 'I want him'; 'I believe him to have gone' does not imply 'I believe him').

As for PROMISE (not discussed in Akmajian and Heny 1975) it could be defined by means of the following feature row : + T3, + T5, + V3 :

I promised to go (T3)
I promised him to go (V3)
I promised that I would go (T5)

The NP between PROMISE and the TO-INFINITIVE is the object (as in the PERSUADE class) but it is not the subject of the infinitive.
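The feature rows of the matrix, together with the row proposed for PROMISE, are enough to hold the classes apart mechanically. The sketch below is an illustration only : the function names are ours, the questionable sentence 3 is treated as a minus, and the code sets passed in are assumed to have been read off the dictionary entries by some earlier step.

```python
# Each LDOCE code is mapped to the matrix column it belongs to.
COLUMN_OF = {
    "T3": 0, "I3": 0,
    "V3": 1, "X(to be)1": 1,
    "T5": 2, "T5a": 2,
}

# Feature rows (column 1, column 2, column 3) as given in the matrix,
# plus the row suggested for PROMISE.
CLASS_OF_ROW = {
    (True, True, False): "I (WANT)",
    (False, True, False): "II (PERSUADE)",
    (False, True, True): "III (BELIEVE)",
    (True, False, False): "IV (FAIL)",
    (True, True, True): "PROMISE",
}


def verb_class(codes):
    """Return the class label for a set of grammatical codes, or None."""
    row = [False, False, False]
    for code in codes:
        if code in COLUMN_OF:
            row[COLUMN_OF[code]] = True
    return CLASS_OF_ROW.get(tuple(row))


def np_is_deep_object(codes):
    """The NP after the verb is its deep object for PERSUADE-type verbs and PROMISE."""
    return verb_class(codes) in {"II (PERSUADE)", "PROMISE"}
```

For instance, a verb coded only [V3] comes out as Class II, so that the NP in 'I persuaded him to go' is taken as the object of the verb, whereas the code set {T3, V3} yields Class I and the NP is not taken as object.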

SECTION III. LDOCE DEFINITIONS : AN IR APPROACH TO SEMANTIC AND KNOWLEDGE-OF-THE-WORLD INFORMATION

LDOCE definitions convey semantic information in a fairly explicit, but non-formatted, form. Even though all definitions are written in a DEFINING VOCABULARY (not to be confused with a BASIC VOCABULARY - see below), no attempt has been made to stick to a limited number of DEFINING FORMULAE. To give an example of what we mean by DEFINING FORMULA, and to anticipate what will be the main concern of this section, we wish to look at the class of INSTRUMENTS. In theory, it could be agreed by the dictionary-makers that all instruments have to include the phrase "instrument used for Ving" in their definitions. In such a defining formula the word INSTRUMENT would be a DEFINING PRIMITIVE and the predicate USED FOR would be a DEFINING RELATION (in this case, between an instrument and a predicate). Such a kind of formatted definition would be less precise and less exact, but infinitely more usable, than a common type of definition. Smith and Maxwell 1973 (p. 2) point out that in a typical dictionary approximately 50 % of the vocabulary appears in the definitions. LDOCE is a major improvement on such a typical dictionary in that its defining vocabulary is restricted to some 2,000 items (used to define some 60,000 entries). My purpose in this section is to reflect on the possibility of turning a significant number of LDOCE definitions into fully formatted ones (i.e. making use of defining formulae).

Consider the sentence :

I saw the man in the park with a telescope [Woods in Rustin 1973, p. 17]

The PREFERRED reading is the one that associates 'with a telescope' with the predicate 'saw' rather than with either of the NP heads 'man' or 'park' : 'saw with a telescope' rather than 'man with a telescope' or 'park with a telescope'. If we had available a formatted definition of TELESCOPE ("instrument used for seeing ..."), there would be no problem in a system of preferential semantics : the link between 'saw' and 'telescope' (embodied in the definition of the latter) would lead to the selection of the preferred reading on the basis of the DENSEST MATCH FIRST principle. As a matter of fact, the LDOCE definition for 'telescope' is very nearly what we need :

"a tubelike scientific instrument used for seeing distant objects by making them appear nearer and larger"

A simple matching procedure between our suggested defining formula for instruments and the LDOCE definition for 'telescope' would have been sufficient in this case. The problem, of course, is that there is absolutely no guarantee that the defining formula will be part of the definition of all instruments. HAMMER, for instance, is defined as :

"a tool with a heavy head for driving nails into wood or for striking things to break them or move them" (Definition 1)

No simple procedure will associate INSTRUMENT with HAMMER. The fact that LDOCE makes use of a defining vocabulary, however, ensures that the defining noun (TOOL in this case) is a member of a finite list, namely the LDOCE defining vocabulary itself. One can go a step further and make the hypothesis that the defining noun will belong to a definite subset within the defining vocabulary. One can go through that vocabulary and select the words that could stand for INSTRUMENTS. The subset that this procedure yields can fairly easily be divided into two further groups : on the one hand one finds such general words as TOOL and APPARATUS (note that the latter would not be included in a BASIC VOCABULARY) which could also be used in defining formulae; on the other hand one has to include such specific items as BOAT, BICYCLE and GUN, which are instances of instruments. The second group is of course much more problematic than the first : one has to be concerned with TYPICAL instruments, otherwise all PHYSICAL OBJECTS would have to be included :

He hit her with the tail of a dead snake.

The INSTRUMENT reading of the 'with'-phrase is not due to any intrinsic property of either 'tail' or 'snake', but rather to four factors :

a) WITH often introduces an instrumental adjunct;

b) the 'with'-phrase in this sentence cannot be read as postmodifying 'her';

c) it cannot be read as an accompaniment adjunct for 'he' either;

d) the predicate 'hit' can take an instrumental adjunct.

The reader will have noticed that factors a, c and d also apply - mutatis mutandis - to the example involving the predicate SEE. This, however, does not imply that the link between TELESCOPE and SEE was of no use in preferring the instrument reading for the 'with'-phrase - note that 'with a telescope' COULD postmodify the NP heads 'man' and 'park'; besides, even if it could not, we would still have to find a way of telling the system and this task may well prove considerably more formidable than that of associating instruments and predicates.

The following items in the LDOCE defining vocabulary could be regarded as making up the subset for the concept INSTRUMENT :

GROUP I

apparatus, instrument, machine, machinery, means, organ, tool

GROUP II

arm [R], arms [R], army, arrow, axe, beak, belt, bicycle, boat, boot, brain, brick, bridge, brush, bullet, bus, button, camera, candle, car, card, cart, chain, coin, comb, control [R], cover, curtain, drum, engine [R], factory, fence, fork, gate, gift, glass, gun, hammer, hand [R], handle [R], hook, key, knife, knot, ladder, lamp, law, letter, map, mat, medicine, message, microscope, mirror, motor [R], nail, needle, network, pan, pen, pin, plane, poison, pole, post, pot, prayer, proof, pump, radio, railway, road, rod, roof, rope, sail, scales, scissors, screw, servant [R], shield, shoe, sign, signal, slave [R], spade, spring, stairs, stone, string, support, sword, system, taxi, telephone, telegram, telegraph, television, thread, thumb, ticket, tooth, train, trap, vehicle, weapon [R], w[...]l, whip, whistle

NOTES

1. For all items in both groups, POS (Part of Speech) = n

2. All items in Group I - except MEANS, which is itself a head - appear under the head TOOL in Roget's Thesaurus.

3. In Group II the items followed by [R] occur in Roget's Thesaurus under the head TOOL.

4. The underlined items in Group II are more general and could perhaps be singled out in a third group, intermediate between I and II.

Obviously, the lists as such are not sufficient for our purpose : words such as SPRING and MEDICINE are not relevant to the INSTRUMENT concept in some of their most frequent uses - for our purposes the defining vocabulary should not have been limited to a list of LEXICAL ITEMS; in case of polysemic words, numbers should have been added to make clear which definitions were to be associated with the defining word : SPRING 1 (= a source), 2 (= a season), 4 (= elasticity), 5 (= an active healthy quality) and 6 (= an act of springing), are not relevant to the INSTRUMENT concept. Since - in theory - the noun SPRING can be used with all six meanings in LDOCE definitions, its inclusion in our list is liable to prove detrimental : it can lead the system to associate the INSTRUMENT concept with a defining word that has nothing to do with instrumentality.
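The remedy suggested here, i.e. attaching definition numbers to polysemic defining words, can be sketched as a lookup table. This is a minimal illustration under stated assumptions : the function name is ours, Group I words are treated as unambiguous for simplicity, and sense 3 of SPRING is taken to be the instrument-relevant one only because it is the one sense not excluded above.

```python
# Group I of the list above: general defining words for instruments.
GROUP_I = {"apparatus", "instrument", "machine", "machinery",
           "means", "organ", "tool"}

# Polysemic defining words: only the listed senses count as instruments.
# Assumption: SPRING 3 is the sense left over once 1, 2, 4, 5, 6 are excluded.
INSTRUMENT_SENSES = {"spring": {3}}


def is_instrument_head(noun, sense=None):
    """Tell whether a defining noun (with optional sense number) signals INSTRUMENT."""
    noun = noun.lower()
    if noun in GROUP_I:
        return True
    allowed = INSTRUMENT_SENSES.get(noun)
    if allowed is None:
        return False
    # Without a sense number a polysemic word is not trusted.
    return sense in allowed
```

The design choice is to err on the side of caution : an unnumbered occurrence of SPRING is rejected, which avoids associating the INSTRUMENT concept with the season or the source.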

Going back to the LDOCE definition for HAMMER, we realize that the algorithm that will associate instruments and predicates will have to take into account, not only the Ving form (in the formula 'for Ving'), but also its object; otherwise a hammer is going to be thought of as a kind of vehicle :

Compare

a tool ... for driving (DRIVE 1, 2/3 in LDOCE)

with

a tool ... for driving nails (DRIVE 1, 5/6 in LDOCE)

A second difficulty that we must face up to is that there may be no defining NOUN, but an all-purpose indefinite such as SOMETHING or ANYTHING. In that case, however, the INSTRUMENT concept is likely to be expressed somewhere else in the definition, by means of (USED) FOR Ving, for instance. This last point leads us to an examination of the various ways in which the link between instrument and predicate can be conveyed; the existence of a defining vocabulary is a help but the range of SYNTACTIC possibilities remains enormous; however, there is something that could be called the LEXICOGRAPHICAL TRADITION and familiarity with that tradition can help cut down on the number of possible formulae - the following stand a good chance of being rather heavily used :

FIG. 2. Defining formulae : a defining word (SOMETHING, THING, INSTRUMENT, TOOL, ...) followed by one of

USED IN [...] / USED TO V
MADE TO V
THAT CAN V / THAT IS MADE TO V / THAT IS USED TO V
(USED) FOR VING
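As an illustration, the last of these formulae, (USED) FOR VING, can be approximated by a pattern over the definition text. This is a rough sketch, not the procedure used on the LDOCE file : the function name is ours, and the test on the ending -ing stands in for the morphological analysis that a real system would need (a word such as 'nothing' would be wrongly accepted).

```python
import re

# "(USED) FOR VING": an optional "used", then "for", then a word ending in -ing.
FOR_VING = re.compile(r"\b(?:used )?for (\w+ing)\b")


def instrument_predicates(definition):
    """Return the Ving forms that a definition links to its headword via FOR."""
    return FOR_VING.findall(definition.lower())


TELESCOPE = ("a tubelike scientific instrument used for seeing distant objects "
             "by making them appear nearer and larger")
HAMMER = ("a tool with a heavy head for driving nails into wood or for striking "
          "things to break them or move them")
```

Applied to the two definitions quoted earlier, the pattern retrieves 'seeing' for TELESCOPE and 'driving' and 'striking' for HAMMER; the object of the Ving form ('nails', 'things') is not captured and would have to be handled separately.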

Obviously, processing LDOCE definitions is a lot of work in terms of the necessary algorithms and in terms of the sheer volume of language data to be scrutinized. We suggest that a useful approach is provided by IR (Information Retrieval) techniques as embodied in the IBM system known as STAIRS.

STAIRS processes various objects, which can be worked into the following hierarchy :

DOCUMENTS (TOP)
|
PARAGRAPHS
|
SENTENCES
|
WORDS (BOTTOM)

The various paragraphs of a given document can be assigned labels, so that the search within a single document can be oriented.

STAIRS provides a number of SEARCH OPERATORS, which will be briefly characterized below. A STAIRS search operator can be used to link any of the following three categories :

1. word tokens, e.g. DISEASES, APPLIES, COMPUTERIZED, IF;

2. a) stems, e.g. RUN-, ANTAGONIZ-, MOTHER- (the use of a character mask enables the system to assign RUNNING, RUNS, RUNNER, RUNNERS, etc. to the stem RUN-) and b) lexemes for which STAIRS generates the morphological variants;

3. any expression consisting of elements of type 1 or 2 linked by STAIRS SEARCH OPERATORS (the definition is therefore recursive, and allows any degree of embedding).

Let A and B stand for elements belonging to any of the above three types. The operators that STAIRS works with are the following:

A ADJ B : A and B occur next to each other and in that order in the document to be retrieved
A SYN B : A and B are to be regarded as synonyms for a given search operation
A WITH B : A and B occur in the same sentence
A SAME B : A and B occur in the same paragraph
NOT B : B doesn't occur in the document to be retrieved
A AND B : both A & B
A OR B : inclusive OR
A XOR B : exclusive OR
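To make the scope of each operator concrete, here is a minimal Python sketch. It assumes that a document is held as a list of paragraphs, each paragraph a list of sentences, each sentence a list of upper-case word tokens; that representation and all function names are ours, not STAIRS'. The operands are restricted to single word tokens, so the recursive case (category 3) and SYN are left out.

```python
def sentences(doc):
    """Flatten a document (paragraphs -> sentences -> tokens) into sentences."""
    return [sent for para in doc for sent in para]


def occurs(doc, a):
    """True if token `a` occurs anywhere in the document."""
    return any(a in sent for sent in sentences(doc))


def adj(doc, a, b):
    """A ADJ B: `a` immediately followed by `b` within one sentence."""
    return any(x == a and y == b
               for sent in sentences(doc)
               for x, y in zip(sent, sent[1:]))


def with_(doc, a, b):
    """A WITH B: both tokens in the same sentence."""
    return any(a in sent and b in sent for sent in sentences(doc))


def same(doc, a, b):
    """A SAME B: both tokens in the same paragraph."""
    return any(any(a in s for s in para) and any(b in s for s in para)
               for para in doc)


def and_(doc, a, b):
    return occurs(doc, a) and occurs(doc, b)


def or_(doc, a, b):
    return occurs(doc, a) or occurs(doc, b)


def xor_(doc, a, b):
    return occurs(doc, a) != occurs(doc, b)


def not_(doc, b):
    return not occurs(doc, b)
```

The operators differ only in the unit within which co-occurrence is checked: adjacent positions for ADJ, the sentence for WITH, the paragraph for SAME, and the whole document for AND, OR, XOR and NOT.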

In our system the STAIRS hierarchy would correspond to the following:

A. A DOCUMENT - A HOMOGRAPH (e.g. DOUBT 2) or AN ENTRY WITHOUT HOMOGRAPHS (e.g. PONDEROUS) in the LDOCE file
B. A PARAGRAPH - a specified FIELD within A, e.g. POS (part of speech), GRAMMATICAL CODE, SEMANTIC CODE, TEXT OF THE DEFINITION, ...
C. A SENTENCE - any sentence included in the text of a given definition
D. A WORD - the various words within a definition or the various codes and POS within a code field or a POS field.

It will be apparent that in order to rewrite FIG. 2 as a set of search operations in a STAIRS-like system we need to be able to refer to specified morphological analyses. Moreover, V and NP are neither word tokens nor stems: they refer to categories (lexical and phrase structure respectively), and we will have to extend the possibilities of the system so that such categories can be included in the expressions guiding the search and retrieval operations. Phrase structure categories are a hard nut to crack, and we will probably have to do without them in a first stage, but lexical categories such as V can be housed in a STAIRS-like system: a V is the name of any document that includes V in its POS paragraph.
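The treatment of V as "any document that includes V in its POS paragraph" can be sketched as follows. The dictionary layout, the field labels `POS` and `DEFINITION`, and the two toy entries are assumptions for illustration only; they do not reproduce the format of the LDOCE file, and the definition text of EXAMINE is deliberately left out.

```python
# Each document is one homograph or entry; each paragraph is a labelled field.
ENTRIES = {
    "MICROSCOPE": {
        "POS": ["N"],
        "DEFINITION": ["an instrument that makes very small near objects "
                       "seem larger, and so can be used for examining them"],
    },
    "EXAMINE": {"POS": ["V"], "DEFINITION": []},  # definition text left out
}


def has_category(headword, category):
    """True if the POS paragraph of the entry lists `category`."""
    return category in ENTRIES.get(headword, {}).get("POS", [])


def members(category):
    """All headwords whose POS paragraph lists `category`, sorted."""
    return sorted(h for h in ENTRIES if has_category(h, category))
```

A category symbol in a search expression then stands for the set returned by `members`, so that it can be used wherever a word token or a stem is allowed.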

Leaving aside the reference to NPs, FIG. 2 can be rewritten as a complex STAIRS-like search operation with six levels. The embedding of STAIRS expressions within STAIRS expressions gives rise to the use of labels such as A1, B1, etc.; the colon is to be read as "can be defined as":

A1 OR A2
  A1 : B1 WITH B2
    B1 : 'ANYTHING' OR 'SOMETHING'
    B2 : C1 OR C2 OR C3
      C1 : 'USED' WITH 'FOR' ADJ V-ING
      C2 : 'FOR' ADJ V-ING
      C3 : 'MADE' ADJ 'TO' ADJ V
  A2 : B3 WITH B4
    B3 : 'INSTRUMENT' OR SYN-INSTRUMENT
    B4 : C4 OR C5 OR C6 OR C7
      C4 : D1 ADJ D2
        D1 : 'WHICH' OR 'THAT'
        D2 : Vs OR E1 OR E2
          E1 : 'CAN' ADJ V
          E2 : 'IS' ADJ 'USED' WITH 'TO' ADJ V
      C5 : 'MADE' ADJ 'TO' ADJ V
      C6 : 'USED' WITH 'TO' ADJ V
      C7 : D3 OR D4
        D3 : 'USED' WITH 'FOR' ADJ V-ING
        D4 : 'FOR' ADJ V-ING
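A rough Python sketch of how such an expression could be applied to one definition sentence is given below. It is a simplification under stated assumptions, not the search operation itself: the sentence is a list of upper-case tokens, the morphological analyses come from a hypothetical hand-made `LEXICON`, synonyms of INSTRUMENT are ignored, and E2 is not coded separately because the C6 test already finds the same verb. The function returns every candidate verb and does not try to rank them.

```python
# Hypothetical morphological lexicon: token -> (lemma, category).
# A real system would take these analyses from the dictionary file.
LEXICON = {
    "MAKES": ("MAKE", "Vs"),
    "SEEM": ("SEEM", "V"),
    "BE": ("BE", "V"),
    "EXAMINING": ("EXAMINE", "V-ING"),
}


def lemma_if(token, category):
    """Return the lemma of `token` if the lexicon assigns it `category`."""
    lemma, cat = LEXICON.get(token, (None, None))
    return lemma if cat == category else None


def candidate_verbs(sentence):
    """Return the verb lemmas picked out by the ADJ patterns, in order."""
    words = set(sentence)
    instrument = "INSTRUMENT" in words                  # B3 (synonyms left out)
    generic = bool({"ANYTHING", "SOMETHING"} & words)   # B1
    if not (instrument or generic):
        return []
    found = []
    for i, tok in enumerate(sentence[:-1]):
        nxt = sentence[i + 1]
        verb = None
        if tok == "FOR":                                 # C1, C2, D3, D4
            verb = lemma_if(nxt, "V-ING")
        elif tok == "TO":
            after_made = i > 0 and sentence[i - 1] == "MADE"  # C3, C5
            with_used = instrument and "USED" in words         # C6
            if after_made or with_used:
                verb = lemma_if(nxt, "V")
        elif tok in ("WHICH", "THAT") and instrument:    # C4: D1 ADJ D2
            verb = lemma_if(nxt, "Vs")                   # D2: Vs
            if verb is None and nxt == "CAN" and i + 2 < len(sentence):
                verb = lemma_if(sentence[i + 2], "V")    # E1: 'CAN' ADJ V
        if verb:
            found.append(verb)
    return found
```

Applied to the tokens of the LDOCE definition of MICROSCOPE ("an instrument that makes very small near objects seem larger, and so can be used for examining them"), the sketch returns MAKE (via 'THAT' ADJ Vs) and EXAMINE (via 'FOR' ADJ V-ING).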

Note that to be really useful, the algorithms that associate predicates and instruments should have access to a thesaurus-like classification of predicates. Take for instance the definition of MICROSCOPE:

"an instrument that makes very small near objects seem larger, and so can be used for examining them"

The preferential link is between MICROSCOPE and EXAMINE and a sentence such as:

"He examined the new virus with an extremely powerful microscope"

will be interpreted the right way. But what about

"He studied the new virus with an extremely powerful microscope"?

We could get around this problem if we had access to a thesaurus like Roget's: since STUDY and EXAMINE share a SUBHEAD in Roget's, viz. SCAN in 438 VISION, a link between STUDY and MICROSCOPE could be established.

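The thesaurus step for STUDY and MICROSCOPE can be sketched in a few lines of Python. The two tables and the function names are assumptions for illustration: `DEFINITION_LINKS` stands for the verb-instrument pairs obtained from the dictionary definitions, and `SUBHEADS` holds only the single Roget's subhead mentioned in the text.

```python
# Verbs extracted from the dictionary definition of each instrument.
DEFINITION_LINKS = {"MICROSCOPE": {"EXAMINE"}}

# Toy thesaurus: (head, subhead) -> verbs listed under it. Only the one
# subhead mentioned in the text is reproduced here.
SUBHEADS = {("438 VISION", "SCAN"): {"EXAMINE", "STUDY"}}


def share_subhead(v1, v2):
    """True if both verbs are listed under at least one common subhead."""
    return any(v1 in verbs and v2 in verbs for verbs in SUBHEADS.values())


def link_type(verb, instrument):
    """Return 'direct' if the definition names the verb, 'thesaurus' if a
    definition verb shares a subhead with it, and None otherwise."""
    direct = DEFINITION_LINKS.get(instrument, set())
    if verb in direct:
        return "direct"
    if any(share_subhead(verb, d) for d in direct):
        return "thesaurus"
    return None
```

With these tables EXAMINE is linked to MICROSCOPE directly and STUDY through the shared subhead, while a verb that appears in neither table gets no link.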
BIBLIOGRAPHY

- LONGMAN DICTIONARY OF CONTEMPORARY ENGLISH, 1978.
- LONGMAN DICTIONARY OF ENGLISH IDIOMS, 1979.
- AKMAJIAN AND HENY, 1975 : Akmajian, A. and Heny, F., An Introduction to the Principles of Transformational Syntax, MIT Press, Cambridge and London, 1975.
- HUDSON, 1976 : Hudson, R.A., Arguments for a Non-transformational Grammar, The University of Chicago Press, Chicago and London, 1976.
- Roget's Thesaurus of English Words and Phrases, Penguin Books, 1962, Longman Edition.
- RUSTIN, 1973 : Rustin, R. (ed.), Natural Language Processing, Algorithmics Press, New York, 1973.
- SMITH & MAXWELL, 1973 : Smith, R.N., and Maxwell, E., An English Dictionary for Computerized Syntactic and Semantic Processing Systems, mimeo, Pisa, 1973.


Figure 1: Akmajian and Heny's verb classes

CLASS I : prefer, want, hate, like, hope, desire, love.
CLASS II : force, persuade, allow, coax, help, order, permit, make, cause.
CLASS III : believe, assume, know, perceive, find, prove, understand, imagine.
CLASS IV : condescend, dare, endeavour, fail, manage, proceed, refuse.
CLASS V : seem, appear, happen, turn out.
