On Representing Governed Prepositions and Handling “Incorrect” and Novel Prepositions


Hatte R. Blejer, Sharon Flank, and Andrew Kehler
SRA Corporation
2000 15th St. North
Arlington, VA 22201, USA

ABSTRACT

NLP systems, in order to be robust, must handle novel and ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages. We offer a classification scheme for preposition usage errors. Further, we show how the knowledge representation employed in the SRA NLP system facilitates handling these data.

1.0 INTRODUCTION

It is well known that NLP systems, in order to be robust, must handle ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages.

The examples of non-standard prepositions we present in the paper are taken from colloquial language, both written and oral. The type of error these examples represent is quite frequent in colloquial written language. The frequency of such examples rises sharply in evolving sub-languages and in oral colloquial language. In developing an NLP system to be used by various U.S. government customers, we have been sensitized to the need to handle variation and innovation in preposition usage. Handling this type of variation or innovation is part of our overall capability to handle novel predicates, which are frequent in sub-language. Novel predicates created for sub-languages are less "stable" in how they mark arguments (ARGUMENT MAPPING) than general English "core" predicates which speakers learn as children. It can be expected that the eventual advent of successful speech understanding systems will further emphasize the need to handle this and other variation.

The NLP system under development at SRA incorporates a Natural Language Knowledge Base (NLKB), a major part of which consists of objects representing SEMANTIC PREDICATE CLASSES. The system uses hierarchical knowledge sources; all general "class-level" characteristics of a semantic predicate class, including the number, type, and marking of their arguments, are put in the NLKB. This leads to increased efficiency in a number of system aspects, e.g., the lexicon is more compact and easier to modify since it only contains idiosyncratic information. This representation allows us to distinguish between lexically and semantically determined ARGUMENT MAPPING and to formulate general class-level constraint relaxation mechanisms.

1.1 CLASSIFYING PREPOSITION USAGE

Preposition usage in English in positions governed by predicating elements, whether adjectival, verbal, or nominal, may be classified as (1) lexically determined, (2) syntactically determined, or (3) semantically determined. Examples are:

LEXICALLY DETERMINED: laugh at, afraid of

SYNTACTICALLY DETERMINED: by in passive sentences

SEMANTICALLY DETERMINED: move to/from

Preposition usage in idiomatic phrases is also considered to be lexically determined, e.g., with respect to.

1.2 A TYPOLOGY OF ERRORS IN PREPOSITION USAGE


prepositions into the following categories: (1) substitution of a semantically appropriate preposition -- either from the same class or another -- for a semantically determined one, (2) substitution of a semantically appropriate preposition for a lexically determined one, (3) false starts, (4) blends, and (5) substitution of a semantically appropriate preposition for a syntactically determined one. A small percentage of the non-standard use of prepositions appears to be random.

1.3 COMPUTATIONAL APPLICATIONS OF THIS WORK

In a theoretical linguistics forum (Blejer and Flank 1988), we argued that these examples of the use of non-standard prepositions to mark arguments (1) represent the kind of principled variation that underlies language change, and (2) support a semantic analysis of government that utilizes thematic roles, citing other evidence for the semantic basis of prepositional case marking from studies of language dysfunction (Aitchison 1987:103), language acquisition (Pinker 1982:678; Menyuk 1969:56), and typological, cross-linguistic studies on case-marking systems. More theoretical aspects of our work (including diachronic change and arguments for and against particular linguistic theories) were covered in that paper; here we concentrate on issues of interest to a computational linguistics forum. First, our natural language knowledge representation and processing strategies take into account the semantic basis of prepositional case marking, and thus facilitate handling non-standard and novel use of prepositions to mark arguments. The second contribution is our typology of errors in preposition usage. We claim that an NLP system which accepts naturally occurring input must recognize the type of the error to know how to compensate for it. Furthermore, the knowledge representation scheme we have implemented is an efficient representation for English and lends itself to adaptation to representing non-English case-marking as well.

There is wide variation in computational strategies for mapping from the actual natural language expression to some sort of PREDICATE-ARGUMENT representation. At issue is how the system recognizes the arguments of the predicate.

At one end of the spectrum is an approach which allows any marking of arguments if the type of the argument is correct for that predicate. This approach is inadequate because it ignores vital information carried by the preposition. At the other extreme is a semantically constrained syntactic parse, in many ways a highly desirable strategy. This latter method, however, constrains more strictly than what humans actually produce and understand. Our strategy has been to use the latter method, allowing relaxation of those constraints under certain well-specified circumstances.

Constraint relaxation has been recognized as a viable strategy for handling ill-formed input. Most discussion centers around orthographic errors and errors in subject-verb agreement. Jensen, Heidorn, Miller, and Ravin (1983:158) note the importance of "relaxing restrictions in the grammar rules in some principled way." Knowing which constraints to relax and avoiding a proliferation of incorrect parses, however, is a non-trivial task. Weischedel and Sondheimer (1983:163ff) offer cautionary advice on this subject.

There has been some discussion of errors similar to those cited in our paper. Carbonell and Hayes (1983:132) observed that "problems created by the absence of expected case markers can be overcome by the application of domain knowledge" using case frame instantiation. We agree with these authors that the use of domain knowledge is an important element in understanding ill-formed input. However, in instances where the preposition is not omitted, but rather replaced by a non-standard preposition, we claim that an understanding of the linguistic principles involved in the substitution is necessary.


MAPPING rules and other operator-operand semantic rules, semantic translation creates situation frames (SF). SFs consist of a predicate and entity frames (EF), whose semantic roles in the situation are labeled. Other semantic objects are relational frames (e.g. prepositional phrases), property frames (e.g. adjective phrases), and unit frames (measure phrases). During the semantic interpretation and discourse analysis phase, the situation frame is interpreted, resulting in one or more instantiated knowledge base (KB) objects, which are state or event descriptions with entity participants.

2.0 REPRESENTING ARGUMENT MAPPING IN AN NLP SYSTEM

In our lexicons, verbs and adjectives are linked to one or more predicate classes which are defined in the Natural Language Knowledge Base (NLKB). Predicates typically govern one or more arguments or thematic roles. All general, class-level information about the thematic roles which a given predicate governs is represented at the highest possible level. Only idiosyncratic information is represented in the lexicon. When lexicons are loaded, the idiosyncratic information in the lexicon is unified with the general information in the NLKB. Our representation scheme has certain implementational advantages: lexicons are less error-prone and easier to modify, the data are more compact, constraint relaxation is facilitated, etc. More importantly, we claim that such semantic classes are psychologically valid.

Our representation scheme is based on the principle that ARGUMENT MAPPING is generally determined at the class-level, i.e., predicates group along semantic lines as to the type of ARGUMENT MAPPING they take. Our work draws from theoretical linguistic studies of thematic relations (e.g., Gruber 1976, Jackendoff 1983, and Ostler 1980). We do not accept the "strong" version of localism, i.e., that all form mirrors function -- that ARGUMENT MAPPING classes arise from metaphors based on spatial relations. Unlike case grammar, we limit the number of cases or roles to a small set, based on how they are manifested in surface syntax. We subsequently "interpret" roles based on the semantic class of the predicate, e.g., the GOAL of an ATTITUDE is generally an animate "experiencer".

For example, in the NLKB the ARGUMENT MAPPING of predicates which denote a change in spatial relation specifies a GOAL argument, marked with prepositions which posit a GOAL relation (to, into, and onto) and a SOURCE argument, marked with prepositions which posit a SOURCE relation (from, out of, off of). A sub-class of these predicates, namely Vendler's (1967) achievements, mark the GOAL argument with prepositions which posit an OVERLAP relation (at, in). Compare:

MOVE      to/into/onto    from/out of/off of
ARRIVE    at/in           from

The entries for these verbs in SRA's lexicon merely specify which semantic class they belong to (e.g., SPATIAL-RELATION), whether they are stative or dynamic, whether they allow an agent, and whether they denote an achievement. Their ARGUMENT MAPPING is not entered explicitly in the lexicon. The verb reach, on the other hand, which marks its GOAL idiosyncratically, as a direct object, would have this fact in its lexical entry.
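The division of labor between the NLKB and the lexicon can be sketched roughly as follows. This is a minimal illustration in Python, not SRA's implementation: the dictionary layout and the names `NLKB_CLASS_MAPPING`, `ACHIEVEMENT_MAPPING`, `LEXICON`, and `argument_mapping` are assumptions made for the example; only the prepositions and the treatment of reach are taken from the text.

```python
# Class-level ARGUMENT MAPPING, stored once in a (hypothetical) NLKB table.
NLKB_CLASS_MAPPING = {
    "SPATIAL-RELATION": {
        "GOAL": ("to", "into", "onto"),
        "SOURCE": ("from", "out of", "off of"),
    },
}

# Achievements mark GOAL with prepositions positing an OVERLAP relation.
ACHIEVEMENT_MAPPING = {"GOAL": ("at", "in"), "SOURCE": ("from",)}

# Lexical entries hold only idiosyncratic information (layout is assumed).
LEXICON = {
    "move": {"class": "SPATIAL-RELATION", "achievement": False},
    "arrive": {"class": "SPATIAL-RELATION", "achievement": True},
    # reach marks its GOAL idiosyncratically, as a direct object.
    "reach": {"class": "SPATIAL-RELATION",
              "override": {"GOAL": ("DIRECT-OBJECT",)}},
}


def argument_mapping(verb):
    """Unify a verb's lexical entry with its class-level mapping."""
    entry = LEXICON[verb]
    mapping = dict(NLKB_CLASS_MAPPING[entry["class"]])  # inherited defaults
    if entry.get("achievement"):
        mapping.update(ACHIEVEMENT_MAPPING)             # sub-class marking
    mapping.update(entry.get("override", {}))           # lexical facts win
    return mapping
```

Under these assumptions, `argument_mapping("move")` and `argument_mapping("arrive")` are derived entirely from class-level information, while `argument_mapping("reach")` differs only in the one role its lexical entry overrides.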

2.1 GROUPING SEMANTIC ROLES

Both on implementational and on theoretical grounds, we have grouped certain semantic roles into superclasses. Such groupings are common in the literature on case and valency (see Somers 1987) and are also supported by cross-linguistic evidence. Our grouping of roles follows previous work. For example, the AGENT SUPERCLASS covers both animate agents as well as inanimate instruments. A GROUND SUPERCLASS (as discussed in Talmy 1985) includes both SOURCE and GOAL, and a GOAL SUPERCLASS includes GOAL, PURPOSE, and DIRECTION.
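One simple way to picture these groupings is as a table from superclass to member roles. The sketch below is illustrative only: the table layout and the helper names `superclasses_of` and `share_superclass` are assumptions, and the member list for the AGENT SUPERCLASS (AGENT, INSTRUMENT) is our reading of "animate agents as well as inanimate instruments".

```python
# Role superclasses named in the text; the dict layout is an assumption.
ROLE_SUPERCLASSES = {
    "AGENT-SUPERCLASS": {"AGENT", "INSTRUMENT"},
    "GROUND-SUPERCLASS": {"SOURCE", "GOAL"},
    "GOAL-SUPERCLASS": {"GOAL", "PURPOSE", "DIRECTION"},
}


def superclasses_of(role):
    """Return the names of every superclass that contains the role."""
    return {name for name, members in ROLE_SUPERCLASSES.items()
            if role in members}


def share_superclass(role_a, role_b):
    """True if two distinct roles belong to at least one common superclass."""
    if role_a == role_b:
        return False
    return bool(superclasses_of(role_a) & superclasses_of(role_b))
```

Note that GOAL belongs to two superclasses in this table, so it groups with SOURCE on one axis and with PURPOSE and DIRECTION on another, whereas SOURCE and PURPOSE share no superclass.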


relaxation of the constraints depends in many instances on knowing the relationship between the non-standard and the expected prepositions: are they sisters, privatives, or is the non-standard preposition a parent of the expected preposition?

In the following section we present examples of the five types of preposition usage errors. In the subsequent section, we discuss how our system presently handles these errors, or how it might eventually handle them.

3.0 THE DATA

We have classified the variation data according to the type of substitution. The main types are:

(1) semantic for semantic (Section 3.1),

(2) semantic for lexical (Section 3.2),

(3) false starts (Section 3.3),

(4) blends (Section 3.4), and

(5) semantic for syntactic (Section 3.5).

The data presented below are a representative sample of a larger group of examples. The current paper covers the classifications which we have encountered so far; we expect that analysis of additional data will provide further types of substitutions within each class.

3.1 SEMANTIC FOR SEMANTIC

3.1.1 To/From

The substitution of the goal marker for the source marker cross-linguistically is recognized in the case literature (e.g., Ikegami 1987). In English, this appears to be more pronounced in certain regional dialects. Common source/goal alternations cited by Ikegami (1987:125) include: averse from/to, different from/to, immune from/to, and distinction from/to. The majority of examples involve to substituting for from in lexical items which incorporate a negation of the predicate; the standard marker of GROUND in this class of predicates is a SOURCE marker, e.g., different from. The "positive" counterparts mark the GROUND with GOAL, e.g., similar to, as discussed in detail in Gruber (1976). Variation between to and from can only occur with verbs which incorporate a negative; otherwise the semantic distinction which these prepositions denote is necessary.

(1) The way that he came on to that bereaved brother completely alienated me TO Mr. Bush. 9/26/88 MCS

(2) At this moment I'm different TO primitive man. 10/12/88 The Mind, PBS

3.1.2 To/With

Communication and transfer of knowledge can be expressed either as a process with multiple, equally involved participants, or as an asymmetric process with one of the participants as the "agent" of the transfer of information. Our data document the substitution of the GOAL marker for the CO-THEME marker; this may reflect the tendency of English to prefer "agent" focussing. The participants in a COMMUNICATION situation are similar in their semantic roles, the only difference being one of "viewpoint." By no means all communication predicates operate in this way: e.g., EXPLANATION, TRANSFER OF KNOWLEDGE are more clearly asymmetric. The system differentiates between "mutual" and "asymmetric" communication predicates.

(3) The only reason they'll chat TO you is, you're either pretty, or they need something from your husband. 9/30/88 MCS

(4) I'll have to sit down and explore this TO you. 10/16/88

3.2 SEMANTIC FOR LEXICAL

3.2.1 Goal Superclass (Goal/Purpose/Direction)

Goal and purpose are frequently expressed by the same case-marking, with the DIRECTION marker alternating with these at times. The standard preposition in these examples is lexically determined. In example (6), instead of the lexically determined to, which also marks the semantic role GOAL, another preposition within the same superclass is chosen. In example (5) the phrasally determined for is replaced by the GOAL marker. There is abundant cross-linguistic evidence for a GOAL SUPERCLASS which includes GOAL and PURPOSE; to a lesser extent DIRECTION also patterns with these cross-linguistically.

(5)

3.2.2 On/Of

Several examples involve lexical items expressing knowledge or cognition, for which the standard preposition is lexically determined. This preposition is uniformly replaced by on, also a marker of the semantic role of REFERENT. Examples include abreast of, grasp of, an idea of, and knowledge of. We claim that the association of the role REFERENT with knowledge and cognition (as well as with transfer-of-information predicates) is among the more salient associations that language learners encounter.

(7) Terry Brown, 47, a truck driver, agreed; "with eight years in the White House," he said, "Bush ought to have a better grasp ON the details." 9/27/88 NYT p. B8

(8) I did get an idea ON the importance of consistency as far as reward and penalty are concerned. 11/88 ETM journal

3.2.3 With/From/To

In this class, we believe that "mutual action verbs" such as marry and divorce routinely show a CO-THEME marker with being substituted for either to or from. Such predicates have a SECONDARY-MAPPING of PLURAL-THEME in the NLKB. Communication predicates are another class which allows a PLURAL-THEME and show alternation of GOAL and CO-THEME (Section 3.1.2).

(9) Today Robin Givens said she won't ask for any money in her divorce WITH Mike Tyson. 10/19/88 ATC

3.3 FALSE STARTS

The next set of examples suggests that the speaker has "retrieved" a preposition from a different ARGUMENT MAPPING for the verb or for a different argument than the one which is eventually produced. For example, confused with replaces confused by in (10), and say to replaces say about in (11). Such examples are more prevalent in oral language. Handling these examples is difficult since all sorts of contextual information -- linguistic and non-linguistic -- goes into detecting the error.

(10) They didn't want to be confused WITH the facts. 11/14/88 DRS

(11) The memorial service was really well done. The rabbi did a good job. What do you say TO a kid who died like that? 11/14/88

3.4 BLENDS

Here, a lexically or phrasally determined preposition is replaced by a preposition associated with a semantically similar lexical item. In (12) Quayle says he was smitten about Marilyn, possibly thinking of crazy about. In (13) he may be thinking of on the subject/topic of. The questioner in (14) may have in support/favor of in mind. In (15) Quayle may have meant we learn by making mistakes. In (16), the idiomatic phrase in support of is confused with the ARGUMENT MAPPING of the noun support, e.g., "he showed his support for the president".

(12) I was very smitten ABOUT her... I saw a good thing and I responded rather quickly and she did too. 10/20/88 WP, p. C8

(13) ON the area of the federal budget deficit .... 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)

(14) You made one of the most eloquent speeches IN behalf of contra aid. 10/5/88 Questioner in VP debate (& NYT 10/7/88 p. B6)

(15) We learn BY our mistakes. 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)

(16) We testified in support FOR medical leave. 10/22/88 FFS

3.5 SEMANTIC FOR SYNTACTIC -- WITH/BY


substituted for by.

We analyze the WITH in these examples either as the less agentive AGENT (namely the INSTRUMENT) in example (18), or the less agentive CO-THEME in example (17). The substitutions are semantically appropriate and the substitutes are semantically related to AGENT.

(17) All of Russian life was accompanied WITH some kind of singing. 8/5/88 ATC

(18) Audiences here are especially enthused WITH Dukakis's description of the Reagan-Bush economic policies. 11/5/88 ATC

4.0 THE COMPUTATIONAL IMPLEMENTATION

Of the five types of errors cited in Section 3, substitutions of semantic for semantic (Section 3.1), semantic for lexical (Section 3.2), and semantic for syntactic (Section 3.5) are the simplest to handle computationally.

4.1 SEMANTIC FOR SEMANTIC OR LEXICAL

The representation scheme described above (Section 2) facilitates handling the semantic for semantic and semantic for lexical substitutions.

Semantic for semantic substitutions are allowed if

(i) the predicate belongs to the communication class and the standard CO-THEME marker is replaced by a GOAL marker, or

(ii) the predicate incorporates a negative and GOAL is substituted for a standard SOURCE, or vice versa.

Semantic for lexical substitutions are allowed if

(iii) the non-standard preposition is a non-privative sister of the standard preposition (e.g., in the GOAL SUPERCLASS),

(iv) the non-standard preposition is the NLKB-inherited, "default" preposition for the predicate (e.g., REFERENT for predicates of cognition and knowledge), or

(v) in the NLKB the predicate allows a SECONDARY-MAPPING of PLURAL-THEME (e.g., marital predicates as in the divorce with example).
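Conditions (i)-(v) can be read as a small decision procedure. The following Python sketch is one possible rendering under stated assumptions, not the system's actual code: the `Predicate` and `Substitution` records, their field names, and the function `relaxation_condition` are hypothetical, the sister/privative lookup is taken as a precomputed flag, and the extra requirement in (v) that the preposition found be a CO-THEME marker is our assumption based on the divorce with example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str
    semantic_class: str                  # e.g. "COMMUNICATION"
    incorporates_negative: bool = False
    inherited_default: str = ""          # NLKB-inherited "default" preposition
    allows_plural_theme: bool = False    # SECONDARY-MAPPING of PLURAL-THEME


@dataclass(frozen=True)
class Substitution:
    standard_role: str                   # role marked by the expected preposition
    found_role: str                      # role posited by the preposition found
    found_prep: str
    standard_is_lexical: bool            # expected preposition lexically determined?
    nonprivative_sisters: bool = False   # assumed result of an NLKB lookup


def relaxation_condition(pred, sub):
    """Return the label of the first licensing condition, or None."""
    if not sub.standard_is_lexical:      # semantic for semantic
        if (pred.semantic_class == "COMMUNICATION"
                and sub.standard_role == "CO-THEME"
                and sub.found_role == "GOAL"):
            return "i"
        if (pred.incorporates_negative
                and {sub.standard_role, sub.found_role} == {"SOURCE", "GOAL"}):
            return "ii"
        return None
    if sub.nonprivative_sisters:         # semantic for lexical
        return "iii"
    if pred.inherited_default and sub.found_prep == pred.inherited_default:
        return "iv"
    # Assumption: the substitute must itself be a CO-THEME marker.
    if pred.allows_plural_theme and sub.found_role == "CO-THEME":
        return "v"
    return None
```

For instance, with `Predicate("different", "COMPARISON", incorporates_negative=True)` (the class label is made up for the example) and a to-for-from substitution, the sketch returns `"ii"`; the same substitution on a predicate without an incorporated negative returns `None`, reflecting the observation in Section 3.1.1 that the to/from distinction is otherwise necessary.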

Handling the use of a non-standard preposition marking an argument crucially involves "type-checking", wherein the "type" of the noun phrase is checked, e.g. for membership in an NLKB class such as animate-creature, time, etc. Type-checking is also used to narrow the possible senses of the preposition in a prepositional phrase, as well as to prefer certain modifier attachments.
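Type-checking of this kind amounts to a membership test in a class hierarchy. The sketch below shows the idea with a toy hierarchy; the class names other than animate-creature and time, the `NLKB_PARENT` table, and the functions `is_a` and `type_check` are all invented for illustration and say nothing about how the NLKB is actually stored.

```python
# Hypothetical fragment of an NLKB class hierarchy: child -> parent.
NLKB_PARENT = {
    "person": "animate-creature",
    "animate-creature": "entity",
    "weekday": "time",
    "time": "entity",
}


def is_a(nlkb_class, ancestor):
    """True if nlkb_class equals ancestor or inherits from it."""
    current = nlkb_class
    while current is not None:
        if current == ancestor:
            return True
        current = NLKB_PARENT.get(current)  # None once the root is passed
    return False


def type_check(np_class, allowed_classes):
    """A noun phrase passes if its class falls under any allowed class."""
    return any(is_a(np_class, allowed) for allowed in allowed_classes)
```

A role that requires an animate filler would then accept a noun phrase typed `person` and reject one typed `weekday`.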

Prepositional phrases can have two relations to predicating expressions, i.e., a governed argument (PREP-ARG) or an ADJUNCT. During parsing, the system accesses the ARGUMENT MAPPING for the predicate; once the preposition is recognized as the standard marker of an argument, an ADJUNCT reading is disallowed. The rule for PREP-ARG is a separate rule in the grammar. When the preposition does not match the expected preposition, the system checks whether any of the above conditions (i-v) hold; if so, the parse is accepted, but is assigned a lower likelihood. If a parse of the PP as an ADJUNCT is also accepted, it will be preferred over the ill-formed PREP-ARG.

4.2 SEMANTIC FOR SYNTACTIC

The substitution of semantic marking for syntactic (WITH for BY) is easily handled: during semantic mapping, by phrases in the ADJUNCTS are mapped to the role of the active subject, assuming that "type checking" allows that interpretation of the noun phrase. It is also possible for such a sentence to be ambiguous, e.g., "he was seated by the man". We treat with phrases similarly, except that ambiguity between CO-THEME and PASSIVE SUBJECT is not allowed, based on our observation that with for by is used for noun phrases low on the animacy scale. Thus, only the CO-THEME interpretation is valid if the noun phrase is animate.
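The treatment of by and with phrases in passives might be sketched as below. This is a simplified reading of the paragraph above, with several assumptions: animacy is reduced to a boolean rather than a scale, the function name `passive_pp_readings` and its parameters are hypothetical, a by phrase is always allowed an additional ADJUNCT reading to model the ambiguity of "he was seated by the man", and a with phrase that is neither animate nor acceptable as an active subject is simply left without a reading here.

```python
def passive_pp_readings(prep, np_is_animate, np_fits_active_subject):
    """Return candidate readings for a by/with phrase in a passive clause.

    np_fits_active_subject stands for the outcome of type-checking the
    noun phrase against the role of the active subject.
    """
    readings = []
    if prep == "by":
        if np_fits_active_subject:
            readings.append("PASSIVE-SUBJECT")
        readings.append("ADJUNCT")       # by phrases may remain ambiguous
    elif prep == "with":
        if np_is_animate:
            readings.append("CO-THEME")  # the only reading for animate NPs
        elif np_fits_active_subject:
            readings.append("PASSIVE-SUBJECT")
    return readings
```

The asymmetry is visible in the return values: a by phrase can yield two readings, while a with phrase yields at most one.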


type of the argument marked with the "incorrect" preposition must be quite inconsistent with that sense of the predicate for the error even to be noticed, since the preposition is acceptable with some other sense. We are assessing the frequency of false starts in the various genres in which our system is being used, to determine whether we need to implement a strategy to handle these examples. We predict that future systems for understanding spoken language will need to accommodate this phenomenon.

We do not handle blends currently. They involve a form of analogy, i.e., smitten is like mad, syntactically, semantically, and even stylistically; they may shed some light on language storage and retrieval. Recognizing the similarity in order to allow a principled handling seems very difficult.

In addition, blends may provide evidence for a "top down" language production strategy, in which the argument structure is determined before the lexical items are chosen/inserted. Our data suggest that some people may be more prone to making this type of error than are others. Finally, blends are more frequent in genres in which people attempt to use a style that they do not command (e.g., student papers, radio talk shows).

5.0 DIRECTIONS FOR FUTURE WORK

In this paper we have described a frequent type of ill-formed input which NLP systems must handle, involving the use of non-standard prepositions to mark arguments. We presented a classification of these errors and described our algorithm for handling some of these error types. The importance of handling such non-standard input will increase as speech recognition becomes more reliable, because spoken input is less formal.

In the near term, planned enhancements include adjusting the weighting scheme to more accurately reflect the empirical data. A frequency-based model of preposition usage, based on a much larger and broader sampling of text, will improve system handling of those errors.

ACKNOWLEDGEMENTS

We would like to express our appreciation of our colleagues' contributions to the SRA NLP system: Gayle Ayers, Andrew Fano, Ben Fine, Karyn German, Mary Dee Harris, David Reel, and Robert M. Simmons.

REFERENCES

1. Aitchison, Jean. 1987. Words in the Mind. Blackwell, NY.

2. Blejer, Hatte and Sharon Flank. 1988. More Evidence for the Semantic Basis of Prepositional Case Marking, delivered December 28, 1988, Linguistic Society of America Annual Meeting, New Orleans.

3. Bresnan, Joan, ed. 1982. The Mental Representation of Grammatical Relations. MIT Press, Cambridge.

4. Carbonell, Jaime and Philip Hayes. 1983. Recovery Strategies for Parsing Extragrammatical Language. American Journal of Computational Linguistics 9(3-4):123-146.

5. Chierchia, Gennaro, Barbara Partee, and Raymond Turner, eds. 1989. Properties, Types and Meaning. Kluwer, Dordrecht.

6. Chomsky, Noam. 1981. Lectures on Government and Binding. Foris, Dordrecht.

7. Croft, William. 1986. Categories and Relations in Syntax: The Clause-Level Organization of Information. Ph.D. Dissertation, Stanford University.

8. Dahlgren, Kathleen. 1988. Naive Semantics for Natural Language Understanding. Kluwer, Boston.

9. Dirven, Rene and Gunter Radden, eds. 1987. Concepts of Case. Gunter Narr, Tubingen.

10. Dowty, David. 1989. On the Semantic Content of the Notion of 'Thematic Role'. In Chierchia, et al. II:69-129.

11. Foley, William and Robert Van Valin Jr. 1984. Functional Syntax and Universal Grammar. Cambridge Univ. Press.

12. Gawron, Jean Mark. 1988. Lexical Representations and the Semantics of Complementation. Garland, NY.

13. Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag. (GKPS) 1985. Generalized Phrase Structure Grammar. Harvard Univ. Press, Cambridge.

14. Gruber, Jeffrey. 1976. Lexical Structures in Syntax and Semantics. North-Holland, Amsterdam.

15. Haiman, John. 1985. Natural Syntax: Iconicity and Erosion. Cambridge University Press, Cambridge.

16. Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge University Press, Cambridge.

17. Ikegami, Yoshihiko. 1987. 'Source' vs. 'Goal': a Case of Linguistic Dissymmetry, in Dirven and Radden 122-146.

18. Jackendoff, Ray. 1983. Semantics and Cognition.

19. Jensen, Karen, George Heidorn, Lance Miller and Yael Ravin. 1983. Parse Fitting and Prose Fixing: Getting a Hold on Ill-formedness. American Journal of Computational Linguistics 9(3-4):147-160.

20. Menyuk, Paula. 1969. Sentences Children Use.

21. Miller, Glenn and Philip Johnson-Laird. 1976. Language and Perception. Harvard University Press, Cambridge.

22. Ostler, Nicholas. 1980. A Theory of Case Linking and Agreement. Indiana University Linguistics Club.

23. Pinker, Steven. 1982. A Theory of the Acquisition of Lexical Interpretive Grammars, in Bresnan 655-726.

24. Shopen, Timothy, ed. 1985. Language Typology and Syntactic Description. Cambridge University Press, Cambridge.

25. Somers, H. L. 1987. Valency and Case in Computational Linguistics. Edinburgh University Press, Edinburgh.

26. Talmy, Leonard. 1985. Lexicalization Patterns: Semantic Structure in Lexical Forms. In Shopen III:57-149.

27. Tomita, Masuru. 1986. Efficient Parsing for Natural Language. Kluwer, Boston.

28. Vendler, Zeno. 1967. Linguistics in Philosophy. Cornell University Press, Ithaca.

29. Weischedel, Ralph and Norman Sondheimer. 1983. Meta-rules as a Basis for Processing Ill-Formed Input. American Journal of Computational Linguistics 9(3-4):161-177.

APPENDIX A. DATA SOURCES

ATC: National Public Radio news program, "All Things Considered"

ME: National Public Radio news program, "Morning Edition"

WE: National Public Radio news program, "Weekend Edition"

MCS: WAMU radio, Washington D.C., interview program, "The Mike Cuthbert Show"

DRS: WAMU radio, Washington D.C., interview program, "Diane Rehm Show"

FFS: WAMU radio, Washington D.C., interview program, "Fred Fiske Saturday"

AIH: Canadian Broadcasting Company radio news program, "As It Happens"

NYT: The New York Times

WP: The Washington Post