The Acquisition and Use of Context-Dependent Grammars for English


Robert F. Simmons* and Yeong-Ho Yu†
University of Texas

This paper introduces a paradigm of context-dependent grammar (CDG) and an acquisition system that, through interactive teaching sessions, accumulates the CDG rules. The resulting context-sensitive rules are used by a stack-based, shift/reduce parser to compute unambiguous syntactic structures of sentences. The acquisition system and parser have been applied to the phrase structure and case analyses of 345 sentences, mainly from newswire stories, with 99% accuracy. Extrapolation from our current grammar predicts that about 25 thousand CDG rule examples will be sufficient to train the system in phrase structure analysis of most news stories. Overall, this research concludes that CDG is a computationally and conceptually tractable approach for the construction of sentence grammar for large subsets of natural language text.

* Department of Computer Sciences, AI Lab, University of Texas, Austin TX 78712. E-mail @cs.texas.edu
† Boeing Helicopter Computer Svces, Philadelphia, PA.

© 1992 Association for Computational Linguistics

1. Introduction

An enduring goal for natural language processing (NLP) researchers has been to construct computer programs that can read narrative, descriptive texts such as newspaper stories and translate them into knowledge structures that can answer questions, classify the content, and provide summaries or other useful abstractions of the text. An essential aspect of any such NLP system is parsing--to translate the indefinitely long, recursively embedded strings of words into definite ordered structures of constituent elements. Despite decades of research, parsing remains a difficult computation that often results in incomplete, ambiguous structures; and computational grammars for natural languages remain notably incomplete. In this paper we suggest that a solution to these problems may be found in the use of context-sensitive rules applied by a deterministic shift/reduce parser.

A system is described for rapid acquisition of a context-sensitive grammar based on ordinary news text. The resulting grammar is accessed by deterministic, bottom-up parsers to compute phrase structure or case analyses of texts that the grammars cover. The acquisition system allows a linguist to teach a CDG grammar by showing examples of parsing successive constituents of sentences. At this writing, 16,275 example constituents have been shown to the system and used to parse 345 sentences ranging from 10 to 60 words in length, achieving 99% accuracy. These examples compress to a grammar of 3,843 rules that are equally effective in parsing. Extrapolation from our data suggests that acquiring an almost complete phrase structure grammar for AP Wire text will require about 25,000 example rules. The procedure is further demonstrated to apply directly to computing superficial case analyses from English sentences.

One of the first lessons in natural or formal language analysis is the Chomsky (1957) hierarchy of formal grammars, which classifies grammar forms from unrestricted rewrite rules, through context-sensitive, context-free, and the most restricted, regular grammars.
It is usually conceded that pure, context-free grammars are not powerful enough to account for the syntactic analysis of natural languages (NL) such as English, Japanese, or Dutch, and most NL research in computational linguistics has used either augmented context-free or ad hoc grammars. The conventional wisdom is that context-sensitive grammars probably would be too large and conceptually and computationally intractable. There is also an unspoken supposition that the use of a context-sensitive grammar implies using the kind of complex parser required for parsing a fully context-sensitive language.

However, NL research based on simulated neural networks took a context-based approach. One of the first hints came from the striking finding from Sejnowski and Rosenberg's NETtalk (1988) that seven-character contexts were largely sufficient to map each character of a printed word into its corresponding phoneme--where each character actually maps in various contexts into several different phonemes. For accomplishing linguistic case analyses, McClelland and Kawamoto (1986) and Miikkulainen and Dyer (1989) used the entire context of phrases and sentences to map string contexts into case structures. Robert Allen (1987) mapped nine-word sentences of English into Spanish translations, and Yu and Simmons (1990) accomplished comparable context-sensitive translations between English and German simple sentences. It was apparent that the contexts in which a word occurred provided information to a neural network that was sufficient to select correct word sense and syntactic structure for otherwise ambiguous usages of language.

In order to solve the problem of accepting indefinitely long, complex sentences in a fixed-size neural network, Simmons and Yu (1990) showed a method for training a network to act as a context-sensitive grammar. A sequential program accessed that grammar with a deterministic, single-path parser and accurately parsed descriptive texts. Continuing that research, 2,000 rules were accumulated and a network was trained using a back-propagation method. The training of this network required ten days of continuous computation on a Symbolics Lisp Machine. We observed that the training cost increased by more than the square of the number of training examples and calculated that 10,000-20,000 rules might well tax a supercomputer. So we decided that storing the grammar in a hash table would form a far less expensive option, provided we could define a selection algorithm comparable to that provided by the trained neural network.

In this paper we describe such a selection formula to select rules for context-sensitive parsing, a system for acquiring context-sensitive rules, and experiments in analysis and application of the grammar to ordinary newspaper text. We show that the application of context-sensitive rules by a deterministic shift/reduce parser is a conceptually and computationally tractable approach to NLP that may allow us to accumulate practical grammars for large subsets of English texts.

2. Context-Dependent Parsing

In NL research most interest has centered on context-free grammars (CFG), augmented with feature tests and transformations, used to describe the phrase structure of sentences.
There is a broad literature on Generalized Phrase Structure Grammar (Gazdar et al. 1985), Unification Grammars of various types (Shieber 1986), and Augmented Transition Networks (J. Allen 1987). Gazdar (1988) calls attention to a subcategory of context-sensitive grammars called indexed languages and illustrates some applicability to natural languages, and Joshi illustrates an application of "mild context-sensitivity" (Joshi 1987), but in general, NL computation with context-sensitive grammars is a largely unexplored area.

While a few advanced NLP laboratories have developed grammars and parsing capabilities for significantly large subsets of natural language,[1] it cannot be denied that massive effort was required and that the results are plagued by ambiguous interpretations. These grammars are typically of a context-free form, augmented by complex feature tests, transformations, and occasionally, arbitrary programs. The combination of even an efficient parser with such intricate grammars may greatly increase the computational complexity of the parsing system (Tomita 1985). It is extremely difficult to write and maintain such grammars, and they must frequently be revised and retested to ensure internal consistency as new rules are added. We argue here that an acquisition system for accumulating context-sensitive rules and their application by a deterministic shift/reduce parser will greatly simplify the process of constructing and maintaining natural language parsing systems.

Although we use context-sensitive rules of the form

    uXv → uYv

they are interpreted by a shift/reduce parser with the result that they can be applied successfully to the LR(k) subset of context-free languages. Unless the parser is augmented to include shifts in both directions, the system cannot parse context-sensitive languages. It is an open question as to whether English is or is not context-sensitive, but it definitely includes discontinuous constituents that may be separated by indefinitely many symbols. For this reason, future developments of the system may require operations beyond shift and reduce in the parser. To avoid the easy misinterpretation that our present system applies to context-sensitive languages, we call it Context-Dependent Grammar (CDG).

We begin with the simple notion of a shift/reduce parser. Given a stack and an input string of symbols, the shift/reduce parser may only shift a symbol to the stack (Figure 1a) or reduce n symbols on the stack by rewriting them as a single symbol (Figure 1b). We further constrain the parser to reduce no more than two symbols on the stack to a single symbol. The parsing terminates when the stack contains only a single root element and the input string is empty. Usually this class of parser applies a CFG to a sentence, but it is equally applicable to CDG.

[1] Notable examples include the large augmented CFGs at IBM Yorktown Hts, the Univ. of Pennsylvania, and the Linguistic Research Ctr. at the Univ. of Texas.
[Figure 1: Shift/reduce parser. (a) The shift operation moves the next input terminal t_i onto the stack. (b) The reduce operation rewrites the top stack symbols as a single nonterminal NT_k.]

2.1 CDG Rule Forms

The theoretical viewpoint is that the parse of a sentence is a sequence of states, each composed of a condition of the stack and the input string. The sequence ends successfully when the stack contains only the root element (e.g. SNT) and the input string is empty. Each state can be seen as the left half of a context-sensitive rule whose right half is the succeeding state.

    stack_s input_s → stack_s+1 input_s+1

However, sentences may be of any length and are often more than forty words, so the resulting strings and stacks would form very cumbersome rules of variable lengths. To avoid this difficulty, the stack and input parts of a rule are limited to five symbols each. In the following example the stack and input parts are separated by the symbol "*" as the idea is applied to the sentence "The old man from Spain ate fish." The symbol _ stands for blank, art for article, adj for adjective, p for preposition, n for noun, and v for verb. The syntactic classes are assigned by dictionary lookup in a context-sensitive dictionary.[2]

    The old man from Spain ate fish.
    art adj n   p    n     v   n

    _ _ _ _ _     * art adj n p n
    _ _ _ _ art   * adj n p n v
    _ _ _ art adj * n p n v n
    _ _ art adj n * p n v n _
    _ _ _ art np  * p n v n _
    _ _ _ _ np    * p n v n _
    _ _ _ np p    * n v n _ _
    _ _ np p n    * v n _ _ _
    _ _ _ np pp   * v n _ _ _
    _ _ _ _ np    * v n _ _ _
    _ _ _ np v    * n _ _ _ _
    _ _ np v n    * _ _ _ _ _
    _ _ _ np vp   * _ _ _ _ _
    _ _ _ _ snt   * _ _ _ _ _

The analysis terminates with an empty input string and the single symbol "snt" on the stack, successfully completing the parse. Note that the first four operations can be described as shifts followed by the two reductions, adj n → np and art np → np. Subsequently the p and n were shifted onto the stack and then reduced to a pp; then the np and pp on the stack were reduced to an np, followed by the shifting of v and n, their reduction to vp, and a final reduction of np vp → snt. Illustrations similar to this are often used to introduce the concept of parsing in AI texts on natural language (e.g. J. Allen 1987).

We could perfectly well record the grammar in pairs of successive states as follows:

    _ _ _ np p * n v n _ _  →  _ _ np p n * v n _ _ _
    _ _ np p n * v n _ _ _  →  _ _ _ np pp * v n _ _ _

but some economy can be achieved by recording the operation and possible label as the right half of a rule. So for the example immediately above, we record:

    _ _ _ np p * n v n _ _  →  (S)
    _ _ np p n * v n _ _ _  →  (R pp)

where S shifts and (R pp) replaces the top two elements of the stack with pp to form the next state of the parse. Thus a windowed context of ten symbols is created as the left half of a rule and an operation as the right half.

[2] Described in Section 7.3.
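The encoding of a parse state into a rule's left half is mechanical, and a short sketch may make it concrete. The following Python fragment is our illustration only (the system described in this paper was implemented in Lisp), and every name in it is invented for exposition; it builds the ten-symbol windowed context and records the two rules shown above:

```python
BLANK = "_"

def windowed_context(stack, inp):
    # Top five stack symbols (listed bottom-to-top) and first five input
    # symbols, padded with blanks to a fixed ten-symbol window.
    top5 = ([BLANK] * 5 + stack)[-5:]
    first5 = (inp + [BLANK] * 5)[:5]
    return tuple(top5 + first5)

grammar = {}
# The state  _ _ _ np p * n v n _ _  maps to a shift:
grammar[windowed_context(["np", "p"], ["n", "v", "n"])] = ("S",)
# The succeeding state maps to a reduction of the top two symbols to pp:
grammar[windowed_context(["np", "p", "n"], ["v", "n"])] = ("R", "pp")
```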
Note that if the stack were limited to the top two elements, and the input to a single element, the rule system would reduce to a binary rule CFG.

The example in Figure 2 shows how the sentence "Treatment is a complete rest and a special diet" is parsed by a context-sensitive shift/reduce parser. Terminal symbols are lowercase, while nonterminals are uppercase. The parts of the stack and input that fall outside the ten-symbol window are invisible to the system; the next operation is solely decided by the windowed context.

    Treatment is a complete rest and a special diet
    (n v det adj n cnj det adj n)

    Stack (top five)     Input (first five)    Operation
    -                    n v det adj n         shift
    n                    v det adj n cnj       shift
    n v                  det adj n cnj det     shift
    n v det              adj n cnj det adj     shift
    n v det adj          n cnj det adj n       shift
    n v det adj n        cnj det adj n         reduce to NP
    n v det NP           cnj det adj n         reduce to NP
    n v NP               cnj det adj n         shift
    n v NP cnj           det adj n             shift
    n v NP cnj det       adj n                 shift
    v NP cnj det adj     n                     shift
    NP cnj det adj n     -                     reduce to NP
    v NP cnj det NP      -                     reduce to NP
    n v NP cnj NP        -                     reduce to CNP
    n v NP CNP           -                     reduce to NP
    n v NP               -                     reduce to VP
    n VP                 -                     reduce to SNT
    SNT                  -                     done

Figure 2: An example of windowed context.

It can be observed that the last state in the analysis is the single symbol SNT--the designated root symbol--on the stack along with an empty input string, successfully completing the parse. And this is the CDG form of rule used in the phrase structure analysis.

2.2 Algorithm for the Shift/Reduce Parser

The parser accepts a string of syntactic word classes as its input and forms a ten-symbol vector, five symbols each from the stack and the input string. It looks up this vector as the left half of a production in the grammar and interprets the right half of the production as an instruction to modify the stack and input sequences to construct the next state of the parse. To accomplish these tasks, it maintains two stacks, one for the input string and one for the syntactic constituents. These stacks may be arbitrarily large. An algorithm for the parser is described in Figure 3.
The most important part of this algorithm is to find an applicable CDG rule from the grammar. Finding such a rule is based on the current windowed context. If there is a rule whose left side exactly matches the current windowed context, that rule will be applied. However, realistically, it is often the case that there is no exact match with any rule. Therefore, it is necessary to find a rule that best matches the current context.

    CD-SR-Parser(Input, Cdg)
    ; Input is a string of syntactic classes for the given sentence.
    ; Cdg is the given CDG grammar rules.

    Stack := empty
    do until (Input = empty and Stack = (SNT))
        Windowed-context := Append(Top_five(Stack), First_five(Input))
        Operation := Consult_CDG(Windowed-context, Cdg)
        if First(Operation) = SHIFT
            then Stack := Push(First(Input), Stack)
                 Input := Rest(Input)
            else Stack := Push(Second(Operation), Pop(Pop(Stack)))
    end do

The functions Top_five and First_five return the lists of the top (or first) five elements of the Stack and the Input, respectively. If there are not enough elements, these procedures pad with blanks. The function Append concatenates two lists into one. Consult_CDG consults the given CDG rules to find the next operation to take; the details of this function are the subject of the next section. Push and Pop add or delete one element to/from a stack, while First and Second return the first or second elements of a list, respectively. Rest returns the given list minus the first element.

Figure 3: Context-sensitive shift/reduce parser.
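The control loop of Figure 3 is small enough to sketch in a few lines of Python. This is our reconstruction under the data layout described in the next section (a table keyed by the top two stack symbols), not the authors' Lisp implementation; windowed_context is the helper sketched in Section 2.1, and score is the function developed in Section 2.3.

```python
def cd_sr_parse(word_classes, grammar):
    # word_classes: list of syntactic classes, e.g. ["art", "adj", "n", ...]
    # grammar: dict keyed by the (4th, 5th) window symbols, each entry a
    #          list of (left_half, operation) rules.
    stack, inp = [], list(word_classes)
    # Loop until the input is exhausted and only the root remains.
    while not (inp == [] and stack == ["SNT"]):
        context = windowed_context(stack, inp)
        operation = consult_cdg(context, grammar)
        if operation[0] == "S":              # shift the next input symbol
            stack.append(inp.pop(0))
        else:                                # reduce top two stack symbols
            stack[-2:] = [operation[1]]
    return stack

def consult_cdg(context, grammar):
    # Retrieve the subgroup indexed by the top two stack symbols
    # (positions 4 and 5 of the window), then take the best-scoring rule.
    candidates = grammar[(context[3], context[4])]
    left, operation = max(candidates, key=lambda rule: score(context, rule[0]))
    return operation
```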
2.3 Consulting the CDG Rules

There are two related issues in consulting the CDG rules. One is the computational representation of CDG rules, and the other is the method for selecting an applicable rule.

In the traditional CFG paradigms, a CFG rule is applicable if the left-hand side of the rule exactly matches the top elements of the stack. However, in our CDG paradigm, a perfect match between the left side of a CDG rule and the current state cannot be assured, and in most cases, a partial match must suffice for the rule to be applied. Since many rules may partially match the current context, the best matching rule should be selected.

One way to do this is to use a neural network. Through the back-propagation algorithm (Rumelhart, Hinton, and Williams 1986), a feed-forward network can be trained to memorize the CDG rules. After successful training, the network can be used to retrieve the best matching rule. However, this approach based on a neural network usually takes considerable training time. For instance, in our previous experiment (Simmons and Yu 1990), training a network for about 2,000 CDG rules took several days of computation. Therefore, this approach has an intrinsic problem for scaling up, at least on the present generation of neural net simulation software.

Another method is based on a hash table in which every CDG rule is stored according to the top two elements of its stack--the fourth and fifth elements of the left half of the rule. Given the current windowed context, the top two elements of the stack are used to retrieve all the relevant rules from the hash table.

We use no more than 64 word and phrase class symbols, so there can be no more than 4,096 possible pairs. The effect is to divide the large number of rules into no more than 4,096 subgroups, each of which will have a manageable subset. In fact, with 16,275 rules we discovered that we have only 823 pairs and the average number of rules per subgroup is 19.8; however, for frequently occurring pairs the number of rules in the subgroups can be much larger. The problem is to determine what scoring formula should be used to find the rule that best matches a parsing context.

Sejnowski and Rosenberg (1988) analyzed the weight matrix that resulted from training NETtalk and discovered a triangular function with the apex centered at the character in the window and the weights falling off in proportion to distance from that character. We decided that the best matching rule in our system would follow a similar pattern, with maximum weights for the top two elements on the stack and weights decreasing in both directions with distance from those positions. The scoring function we use is developed as follows:

    Let R be the set of vectors {R_1, R_2, ..., R_n}, where R_i is the vector [r_1, r_2, ..., r_10].
    Let C be the vector [c_1, c_2, ..., c_10].
    Let μ(c_i, r_i) be a matching function whose value is 1 if c_i = r_i, and 0 otherwise.

R is the entire set of rules, R_i is (the left half of) a particular rule, and C is the parse context. Then R' is the subset of R such that if R_i ∈ R' then μ(r_4, c_4) · μ(r_5, c_5) = 1. Access of the hash table with the top two elements of the stack, c_4 and c_5, produces the set R'. We can now define the scoring function for each R_i ∈ R':

    Score = Σ_{i=1}^{3} μ(c_i, r_i) · i  +  Σ_{i=6}^{10} μ(c_i, r_i) · (11 - i)

The first summation scores the matches between the stack elements of the rule and the current context, and the second summation scores the matches between the elements in the input string. If two items of the rule and context match, the total score is increased by the weight assigned to that position. The maximum score for a perfect match is 21 according to the above formula.

From several experiments varying the length of vector and the weights, particularly those assigned to blanks, it has been determined that this formula gave the best performance among those tested. More importantly, it has worked well in the current phrase structure and case analysis experiments.
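Transcribed into code, the selection formula amounts to two weighted loops. The sketch below is our rendering, with the window stored as a zero-indexed sequence:

```python
def score(context, rule_left):
    # Positions 1-3 (stack) carry weights 1, 2, 3; positions 6-10 (input)
    # carry weights 5, 4, 3, 2, 1. Positions 4 and 5 already match by
    # construction of the hash retrieval, so a perfect match scores
    # (1+2+3) + (5+4+3+2+1) = 21.
    total = 0
    for i in range(1, 4):                    # stack positions 1..3
        if context[i - 1] == rule_left[i - 1]:
            total += i
    for i in range(6, 11):                   # input positions 6..10
        if context[i - 1] == rule_left[i - 1]:
            total += 11 - i
    return total
```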
It was an unexpected surprise to us[3] that using context-sensitive productions, an elementary, deterministic parsing algorithm proved adequate to provide 99% correct, unambiguous analyses for the entire text studied.

[3] But perhaps not to Marcus (1980) and Berwick (1985), who promote the study of deterministic parsing.

3. Grammar Acquisition for CDG

Constructing an augmented phrase structure grammar of whatever type--unification, GPSG, or ATN--is a painful process usually involving a well-trained linguistic team of several people. These types of grammar require that a CFG recognition rule such as np vp → snt be supported by such additional information as the fact that the np and vp agree in number, that the np is characterized by particular features such as count, animate, etc., and that the vp can or cannot accept certain types of complements. The additional features make the rules exceedingly complex and difficult to prepare and debug. College students can be taught easily to make a phrase structure tree to represent a sentence, but it requires considerable linguistic training to deal successfully with a feature grammar.

We have seen in the preceding section that a CDG is derived from recording the successive states of the parses of sentences. Thus it was natural for us to develop an interactive acquisition system that would assist a linguist (or a student) in constructing such parses to produce easily large sets of example CDG rules.[4] The system continued to evolve as a consequence of our use until we had included capabilities to:

• read in text and data files
• compile dictionary and grammar tables from completed text files
• select a sentence to continue processing or revise
• look up words in a dictionary to suggest the syntactic class for the word in context when assigning syntactic classes to the words in a sentence
• compare each state of the parse with rules in the current grammar to predict the shift/reduce operation. A carriage return signals that the user accepts the prompt, or the typing in of the desired operation overrides it.
• compute and display the parse tree from the local grammar after completion of each sentence, or from the global total grammar at any time
• provide backing up and editing capability to correct errors
• print help messages and guide the user
• compile dictionary and grammar entries at the completion of each sentence, insuring no duplicate entries
• save completed or partially completed grammar files

The resulting tool, GRAMAQ, enables a linguist to construct a context-sensitive grammar for a text corpus at the rate of several sentences per hour. Thousands of rules are accumulated with only weeks of effort, in contrast to the years required for a comparable system of augmented CFG rules.

[4] Starting with an Emacs editor, it was fairly easy to read in a file of sentences and to assign each word its syntactic class according to its context. Then the asterisk was inserted at the beginning of the syntactic string, the string was copied to the next line, the asterisk moved if a shift operation was indicated, or the top two symbols on the stack were rewritten if a reduce was required--just as we constructed the example in the preceding section. Naturally enough, we soon made Emacs macros to help us, and then escalated to a Lisp program that would print the stack-*-string and interpret our shift/reduce commands to produce a new state of the parse.
About ten weeks of effort were required to produce the 16,275 rules on which this study is based. Since GRAMAQ's prompts become more accurate as the dictionary and grammar grow in size, there is a positive acceleration in the speed of grammar accumulation, and the linguist's task gradually converges to one of alert supervision of the system's prompts.

A slightly different version of GRAMAQ is Caseaq, which uses operations that create case constituents to accumulate a context-sensitive grammar that transforms sentences directly to case structures with no intermediate stage of phrase structure trees. It has the same functionality as GRAMAQ but allows the linguist user to specify a case argument and value as the transformation of syntactic elements on the stack, and to rename the head of such a constituent by a syntactic label. Figure 9 in Section 7.3 illustrates the acquisition of case grammar.

4. Experiments with CDG

There are a number of critical questions that need be answered if the claim that CDG grammars are useful is to be supported.

• Can they be used to obtain accurate parses for real texts?
• Do they reduce ambiguity in the parsing process?
• How well do the rules generalize to new texts?
• How large must a CDG be to encompass the syntactic structures for most newspaper text?

4.1 Parsing and Ambiguity with CDG

Over the course of this study we accumulated 345 sentences, mainly from newswire texts. The first two articles were brief disease descriptions from a youth encyclopedia; the remaining fifteen were newspaper articles from February 1989 using the terms "star wars," "SDI," or "Strategic Defense Initiative."
Table 1 characterizes typical articles by the number of CDG rules or states, number of sentences, the range of sentence lengths, and the average number of words per sentence.

    Text             States   Sentences   Wds/Snt   Mn-Wds/Snt
    Hepatitis           236          12      4-19         10.3
    Measles             316          10      4-25         16.3
    News Story          470          10      9-51         23.5
    APWire-Robots      1005          21     11-53         26.0
    APWire-Rocket      1437          25      8-47         29.2
    APWire-Shuttle      598          14     12-32         21.9
    Total              4062          92      4-53         22.8

Table 1: Characteristics of a sample of the text corpus.

We developed our approach to acquiring and parsing context-sensitive grammars on the first two simple texts, and then used GRAMAQ to redo those texts and to construct productions for the news stories. The total text numbered 345 sentences, which accumulated 16,275 context-sensitive rules--an average of 47 per sentence.

The parser embodying the algorithm illustrated earlier in Figure 3 was augmented to compare the constituents it constructed with those prescribed during grammar acquisition by the linguist. In parsing the 345 sentences, 335 parses exactly matched the linguist's original judgement. In nine cases in which differences occurred, the parses were judged correct, but slightly different sequences of parse states occurred. The tenth case clearly made an attachment error--of an introductory adverbial phrase in the sentence "Hours later, Baghdad announced...." This was mistakenly attached to "Baghdad." This evaluation shows that the grammar was in precise agreement with the linguist 97% of the time and completed correct parses in 99.7% of the 345 sentences from which it was derived. Since our primary interest was in evaluating the effectiveness of the CDG, all these evaluations were based on using correct syntactic classes for the words in the sentences. The context-sensitive dictionary lookup procedure described in Section 7.3 is 99.5% accurate, but it assigns 40 word classes incorrectly. As a consequence, using this procedure would result in a reduction of about 10% accuracy in parsing.

An output of a sentence from the parser is displayed as a tree in Figure 4.

[Figure 4: Sentence parse. The parse tree produced for "Another mission soon scheduled that also would have priority over the shuttle is the first firing of a trident two intercontinental range missile from a submerged submarine."]
Since the whole mechanism is coded in Lisp, the actual output of the system is a nested list that is then printed as a tree. Notice in this figure that the PP at the bottom modifies the NP composed of "the first firing of a trident two intercontinental range missile," not just the word "firing." Since the parsing is bottom-up, left-to-right, the constituents are formed in the natural order of words encountered in the sentence, and the terminals of the tree can be read top-to-bottom to give their ordering in the sentence.

Although 345 sentences totaling 8,594 words is a small selection from the infinite set of possible English sentences, it is large enough to assure us that the CDG is a reasonable form of grammar. Since the deterministic parsing algorithm selects a single interpretation, which we have seen almost perfectly agrees with the linguist's parsings, it is apparent that, at least for this size text sample, there is little difficulty with ambiguous interpretations.

5. Generalization of CDG

The purpose of accumulating sample rules from texts is to achieve a grammar general enough to analyze new texts it has never seen. To be useful, the grammar must generalize. There are at least three aspects of generalization to be considered.

• How well does the grammar generalize at the sentence level? That is, how well does the grammar parse new sentences that it has not previously experienced?
• How well does the grammar generalize at the operation level? That is, how well does the grammar predict the correct shift/reduce operation during acquisition of new sentences?
• How much does the rule retention strategy affect generalization? For instance, when the grammar predicts the same output as a new rule does, and the new rule is not saved, how well does the resulting grammar parse?

5.1 Generalization at the Sentence Level

The complete parse of a sentence is a sequence of states recognized by the grammar (whether it be CDG or any other). If all the constituents of the new sentence can be recognized, the new sentence can be parsed correctly. It will be seen in a later paragraph that with 16,275 rules, the grammar predicts the output of new rules correctly about 85% of the time. For the average sentence with 47 states, only 85%, or about 40 states, can be expected to be predicted correctly; consequently the deterministic parse will frequently fail. In fact, 5 of 14 new sentences parsed correctly in a brief experiment that used a grammar based on 320 sentences to attempt to parse the new, 20-sentence text. Considering that only a single path was followed by the deterministic parser, we predicted that a multiple-path parser would perform somewhat better for this aspect of generalization. In fact, our initial experiments with a beam search parser resulted in successful parses of 15 of the 20 new sentences using the same grammar based on the 320 sentences.
5.2 Generalization at the Operation Level

This level of generalization is of central significance to the grammar acquisition system. When GRAMAQ looks up a state in the grammar it finds the best matching state with the same top two elements on the stack, and offers the right half of this rule as its suggestion to the linguist. How often is this prediction correct?

To answer this question we compiled the grammar of 16,275 rules in cumulative increments of 1,017 rules using a procedure, union-grammar, that would only add a rule to the grammar if the grammar did not already predict its operation. We call the result a "minimal grammar," and it contains 3,843 rules. The black line of Figure 5 shows that with the first 1,000 rules 40% were new; with an accumulation of 5,000, 18% were new rules. By the time 16,000 rules have been accumulated, the curve has flattened to an average of 16% new rules added. This means that the acquisition system will make correct prompts about 84% of the time, and the linguist will only need to correct the system's suggestions about 3 or 4 times in 20 context presentations.

[Figure 5: Generalization of CDG rules. The percentage of new rules is plotted against the rules accumulated, by thousands. The black line plots new CDG rules; the shaded line plots new CFG rules derived by eliminating context (see Section 6).]

5.3 Rule Retention and Generalization

If two parsing grammars account equally well for the same sentences, the one with fewer rules is less redundant, more abstract, and the one to be preferred. We used the union-grammar procedure to produce and study the minimal grammar for the 16,275 rules (rule-examples) derived from the sample text. Union-grammar records a new rule for a rule-example:[5]

1. if the best matching rule has an operation that doesn't match
2. if the best matching rule ties with another rule whose operation does not match
3. if 2 is true, and the score = 21, we have a full contradiction and list the rule as an error

A sketch of this retention test follows.

[5] These definite conditions are due to an analysis by Mark Ring.
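Rendered in Python, the retention test looks as follows; the treatment of ties is our reading of conditions 2 and 3, and the data layout is the one assumed in the earlier sketches:

```python
def union_grammar(examples, grammar):
    # examples: iterable of (context, operation) rule-examples.
    # grammar: dict keyed by the top two stack symbols, as in consult_cdg.
    # Returns the number of examples retained as new rules.
    kept = 0
    for context, op in examples:
        rules = grammar.setdefault((context[3], context[4]), [])
        scores = [(score(context, left), operation) for left, operation in rules]
        best = max(scores, default=None)
        ties = [operation for s, operation in scores if best and s == best[0]]
        # Retain if nothing matches, the prediction is wrong, or the best
        # score is tied by a rule with a different operation (a tie at a
        # score of 21 is a full contradiction).
        if best is None or best[1] != op or any(o != op for o in ties):
            rules.append((context, op))
            kept += 1
    return kept
```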
Six contradictions occurred in the grammar; five were inconsistent treatments of "SNT" followed by one or more punctuation marks, while the sixth offered both a shift and a "pp" for a preposition-noun followed by a preposition. The latter case is an attachment ambiguity not resolvable by syntax.

In the first pass, as shown in Table 2, the text resulted in 3,194 rules compared with 16,275 possible rules. That is, 13,081 possible CDG rules were not retained because already existing rules would match and predict the operation. However, using those rules to parse the same text gave very poor results: zero correct parses at the sentence level. Therefore, the process of compiling a minimal grammar was repeated starting with those 3,194 rules. This time only 619 new rules were added. The purpose of this repetition is to get rid of the effect that the rules added later change the predictions made earlier. Finally, in a fourth repetition of the process no rules were new.

    Pass   Unretained   Retained   Total Rules
    1           13081       3194         16275
    2           15656        619         16275
    3           16245         18         16275
    4           16275          0         16275

Table 2: Four passes with minimal grammar.

The resulting grammar of 3,843 rules succeeds in parsing the text with only occasional minor errors in attaching constituents. It is to be emphasized that the unretained rules are similar but not identical to those in the minimal grammar.

We can observe that this technique of minimal retention by "unioning" new rules to the grammar results in a compression of the order 16,275/3,843, or 4.2 to 1, without increase in error. If this ratio holds for larger grammars, then if the linguist accumulates 40,000 training-example rules to account for the syntax of a given subset of language, that grammar can be compressed automatically to about 10,000 rules that will accomplish the same task.

6. Predicting the Size of CDGs

When any kind of acquisition system is used to accumulate knowledge, one very interesting question is, when will the knowledge be complete enough for the intended application? In our case, how many CDG rules will be sufficient to cover almost all newswire stories? To answer this question, an extrapolation can be used to find a point where the solid line of Figure 5 intersects with the y-axis. However, the CDG curve is descending too slowly to make a reliable extrapolation.

Therefore, another question was investigated instead: when will the CDG rules include a complete set of CFG rules? Note that a CDG rule is equivalent to a CFG rule if the context is limited to the top two elements of the stack. What the other elements in the context accomplish is to make one rule preferable to another that has the same top two elements of the stack, but a different context.
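Deriving that embedded CFG from a set of CDG rule-examples is a one-line projection; a sketch, under the same assumed rule layout as before:

```python
def cfg_core(cdg_rules):
    # Strip each CDG rule-example to its top-two stack symbols plus its
    # operation; the distinct survivors are the embedded CFG rules.
    return {(left[3], left[4], operation) for left, operation in cdg_rules}
```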
We allow 64 symbols in our phrase structure analysis. That means there are 64^2 possible combinations for the top two elements of the stack. For each combination, there are 65 possible operations:[6] a shift or a reduction to another symbol. Among 16,275 CDG rules, we studied how many different CFG rules can be derived by eliminating the context. We found 844 different CFG rules that used 600 different left-side pairs of symbols. This shows that a given context-free pair of symbols averages 1.4 different operations.[7]

Then, as we did with CDG rules, we measured how many new CFG rules were added in an accumulative fashion. The shaded line of Figure 5 shows the result. Notice that the line has descended to about 1.5% errors at 16,000 rules. To make an extrapolation easier, a log-log graph shows the same data in Figure 6. From this graph, it can be predicted that, after about 25,000 CDG rules are accumulated, the grammar will encompass a CFG component that is 99% complete. Beyond this point, additional CDG rules will add almost no new CFG rules, but only fine-tune the grammar so that it can resolve ambiguities more effectively.

[Figure 6: Log-log plot of new CFG rules. Extrapolation along the gray line predicts that 99% of the context-free pairs will be achieved with the accumulation of 25,000 context-sensitive rules.]

Also, it is our belief that, after the CDG reaches that point, a multi-path, beam-search parser will be able to parse most newswire stories very reliably. This belief is based on our initial experiment that used a beam search parser to test generalization of the grammar to find parses for fifteen out of twenty new sentences.

[6] Actually, there are fewer than 65 possible operations since the stack elements can be reduced only to nonterminal symbols.

[7] We actually use only 48 different symbols, so only 48^2 or 2,304 combinations could have occurred. The fraction 600/2,304 yields .26, the proportion of the combinatoric space that is actually used, so far.

7. Acquiring Case Grammar

Explicating the phrase structure constituents of sentences is an essential aspect in computer recognition of meaning.
Case analysis organizes the constituents into a hierarchical structure of labeled propositions. The propositions can be used directly to answer questions and are the basis of schemas, scripts, and frames that are used to add meaning to otherwise inexplicit texts. As a result of the experiments with acquiring CDG and exploring its properties for parsing phrase structures, we became fairly confident that we could generalize the system to acquisition and parsing based on a grammar that would compute syntactic case structures directly from syntactic strings. Direct translation from string to structure is supported by neural network experiments such as those by McClelland and Kawamoto (1986), Miikkulainen and Dyer (1989), Yu and Simmons (1990), and Leow and Simmons (1990). We reasoned that if we could acquire case grammar with something approaching the simplicity of acquiring phrase structure rules, the result could be of great value for NL applications.

7.1 Case Structure

Cook (1989) reviewed twenty years of linguistic research on case analysis of natural language sentences. He synthesized the various theories into a system that depends on the subclassification of verbs into twelve categories, and it is apparent from his review that with a fine subcategorization of verbs and nominals, case analysis can be accomplished as a purely syntactic operation--subject to the limitations of attachment ambiguities that are not resolvable by syntax. This conclusion is somewhat at variance with those AI approaches that require a syntactic analysis to be followed by a semantic operation that filters and transforms syntactic constituents to compute case-labeled propositions (e.g. Rim 1990), but it is consistent with the neural network experience of directly mapping from sentence to case structure, and with the AI research that seeks to integrate syntactic and semantic processing while translating sentences to propositional structures.

Linguistic theories of case structure have been concerned only with single propositions headed by verb predications; they have been largely silent with regard to the structure of noun phrases and the relations among embedded and sequential propositions. Additional conventions for managing these complications have been developed in Simmons (1984) and Alterman (1985) and are used here.

The central notion of a case analysis is to translate sentence strings into a nested structure of case relations (or predicates) where each relation has a head term and an indefinite number of labeled arguments. An argument may itself be a case relation. Thus a sentence, as in the examples below, forms a tree of case relations.

    The old man from Spain ate fish.
    (eat Agt (man Mod old From spain) Obj fish)

    Another mission scheduled soon is the first firing of a trident missile from a submerged submarine.
    (is Obj1 (mission Mod another Obj* (scheduled Vmod soon))
        Obj2 (firing Mod first Det the Of (missile Nmod trident Det a)
              From (submarine Mod submerged Det a)))

Note that mission is in Obj* relation to scheduled. This means the object of scheduled is mission, and the expression can be read as "another mission such that mission is scheduled soon." An asterisk as a suffix to a label always signals the reverse direction for the label.
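Any nested list or tuple structure can carry such trees. The fragment below is only an illustration of the notation (the show helper and the tuple encoding are ours, not the paper's Lisp representation):

```python
# A case relation as a head plus a list of (label, argument) pairs;
# an argument may itself be a case relation, giving the tree.
case = ("eat",
        [("Agt", ("man", [("Mod", "old"), ("From", "spain")])),
         ("Obj", "fish")])

def show(rel):
    """Print a case relation back in the paper's parenthesized notation."""
    if isinstance(rel, str):
        return rel
    head, args = rel
    inner = " ".join(f"{label} {show(value)}" for label, value in args)
    return f"({head} {inner})"

print(show(case))   # (eat Agt (man Mod old From spain) Obj fish)
```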
There is a small set of case relations for verb arguments, such as verbmodifier, agent, object, beneficiary, experiencer, location, state, time, direction, etc. For nouns there are determiner, modifier, quantifier, amount, nounmodifier, preposition, and reverse verb relations, agt*, obj*, ben*, etc. Prepositions and conjunctions are usually used directly as argument labels, while sentence conjunctions such as because, while, before, after, etc. are represented as heads of propositions that relate two other propositions with the labels preceding, post, antecedent, and consequent. For example, "Because she ate fish and chips earlier, Mary was not hungry."

    (because Ante (ate Agt she Obj (fish And chips) Vmod earlier)
             Conse (was Vmod not Obj1 mary State hungry))

Verbs are subcategorized as vao, vabo, vo, va, vhav, vbe, where a is agent, o is object, b is beneficiary, and vhav is a form of have and vbe a form of be. So far, only the subcategory of time has been necessary in subcategorizing nouns to accomplish this form of case analysis, but in general, a lexical semantics is required to resolve syntactic attachment ambiguities. The complete set of case relations is presumed to be small, but no one has yet claimed a complete enumeration of them.

Other case systems such as those taught by Schank (1980) and Jackendoff (1983) classify predicate names into such primitives as Do, Event, Thing, Mtrans, Ptrans, Go, Action, etc., to approximate some form of "language of thought," but the present approach is less ambitious, proposing merely to represent in a fairly formal fashion the organization of the words in a sentence. Subsequent operations on this admittedly superficial class of case structures, when augmented with a system of shallow lexical semantics, have been shown to accomplish question answering, focus tracking of topics throughout a text, automatic outlining, and summarization of texts (Seo 1990; Rim 1990). One strong constraint on this type of analysis is that the resulting case structure must maintain all information present in the text so that the text may be exactly reconstituted from the analysis.

7.2 Syntactic Analysis of Case Structure

We've seen earlier that a shift/reduce-rename operation is sufficient to parse most sentences into phrase structures. Case structure, however, requires transformations in addition to these operations. To form a case structure it is frequently necessary to change the order of constituents and to insert case labels. Following Jackendoff's principle of grammatical constraint, which argues essentially that semantic interpretation is frequently reflected in the syntactic form, case transformations are accomplished as each syntactic constituent is discovered. Thus when a verb, say throw, and an NP, say coconuts, are on top of the stack, one must not only create a VP, but also decide the case, Obj, and form the constituent (throw Obj coconuts). This can be accomplished in customary approaches to parsing by using augmented context-free recognition rules of the form:

    VP → VP NP / 1 obj 2

where the numbers following the slash refer to the text dominated by the syntactic class in the referenced position (ordered left-to-right) in the right half of the rule. The resulting constituents can be accumulated to form the case analysis of a sentence (Simmons 1984).

We develop augmented context-sensitive rules following the same principle.
Let us look again at the example "The old man from Spain ate fish," this time to develop case relations.

    * art adj n from n vao n     ; shift
    art * adj n from n vao n     ; shift
    art adj * n from n vao n     ; shift
    art adj n * from n vao n     ; 1 mod 2    (man Mod old)
    art n * from n vao n         ; 1 det 2    (man Mod old Det the)
    n * from n vao n             ; shift
    n from * n vao n             ; shift
    n from n * vao n             ; 3 2 1      (man Mod old Det the From spain)
    n * vao n                    ; shift
    n vao * n                    ; 2 agt 1    (ate Agt (man Mod old ...))
    vao * n                      ; shift
    vao n *                      ; 1 obj 2    (ate Agt (man ...) Obj fish)

In this example the case transformation immediately follows the semicolon, and the result of the transformation is shown in parentheses further to the right. The result in the final constituent is:

    (ate Agt (man Mod old Det the From spain) Obj fish)

Note that we did not rename the syntactic constituents as NP or VP in this example, because we were not interested in showing the phrase structure tree. Renaming in case analysis need only be done when it is necessary to pass on information accumulated from an earlier constituent. For example, in "fish were eaten by birds," the CS parse is as follows:

    * n vbe ppart by n     ; shift
    n * vbe ppart by n     ; shift
    n vbe * ppart by n     ; shift
    n vbe ppart * by n     ; 1 vbe 2, vpasv   (eaten Vbe were)
    n vpasv * by n         ; 1 obj 2          (eaten Vbe were Obj fish)
    vpasv * by n           ; shift
    vpasv by * n           ; shift
    vpasv by n *           ; 1 prep 2         (birds Prep by)
    vpasv n *              ; 2 agt 1          (eaten Vbe were Obj fish Agt (birds Prep by))

Here, it was necessary to rename the combination of a past participle and its auxiliary as a passive verb, vpasv, so that the syntactic subject and object could be recognized as Obj and Agent, respectively. We also chose to use the argument name Prep to form (birds Prep by) so that we could then call that constituent Agent.

We can see that the reduce operation has become a reduce-transform-rename operation where numbers refer to elements of the stack, the second term provides a case argument label, the ordering provides a transformation, and an optional fourth element may rename the constituent. A sample of typical case transformations is shown associated with the top elements of the stack in Table 3.

    Stack           Case-Transform
    adj n2          n2 mod adj
    n1 n2           n2 nmod n1
    n vao           1 agt 2
    n vo            1 obj 2
    vbe v..         1 vbe 2, vpasv
    vabo n          2 ben 1, vao
    n vpasv         1 obj 2
    prep n          3 2 1
    by n            1 prep 2
    snt because     1 conse 2
    because snt     2 ante 1
    snt after       1 pre 2
    after snt       2 post 1

Table 3: Some typical case transformations for syntactic constituents.
    Stack          Case-Transform
    adj n          n mod adj
    n1 n2          n2 nmod n1
    n vao          1 agt 2
    n vo           1 obj 2
    vbe v..        1 vbe 2, vpasv
    vabo n         2 ben 1, vao
    n vpasv        1 obj 2
    vpasv prep n   3 2 1
    n prep n       3 2 1
    by n           1 prep 2
    snt because    1 conse 2
    because snt    2 ante 1
    n and n        1 2 3
    snt after      1 pre 2
    after snt      2 post 1

Table 3
Some typical case transformations for syntactic constituents.

In this table, the first element of the stack is in the third position on the left side of the table, and the number 1 refers to that position, 2 to the second, and 3 to the first. As an aid to the reader, the first two entries in the table refer literally by symbol rather than by reference to the stack. The symbols vao and vabo are subclasses of verbs that take, respectively, agent and object; and agent, beneficiary, and object. The symbol v.. refers to any verb. Forms of the verb be are referred to as vbe, and passivization is marked by relabeling a verb with the suffix -pasv.

    CS-CASE-Parser(input, cdg)
      Input is a string of syntactic classes for the given sentence.
      Cdg is the given CDG grammar rules.

      stack := empty
      outputstack := empty
      do until (input = empty and 2nd(stack) = blank)
          window-context := append(top_five(stack), first_five(input))
          operation := consult_CDG(window-context, cdg)
          if first(operation) = SHIFT
             then stack := push(first(input), stack)
                  input := rest(input)
             else stack := push(select(operation), pop(pop(stack)))
                  outputstack := make_constituent(operation, outputstack)
      end do

Figure 7
Algorithm for case parse.

Parsing case structures. From the discussion above we may observe that the flow of control in accomplishing a case parse is identical to that of a phrase structure parse. The difference lies in the fact that when a constituent is recognized (see Figure 7):

• in phrase structure, a new name is substituted for its stack elements, and a constituent is formed by listing the name and its elements;

• in case analysis, a case transformation is applied to designated elements on the stack to construct a constituent, and the head (i.e., the first element of the transformation) is substituted for its elements, unless a new name is provided for that substitution.

Consequently the algorithm used in phrase structure analysis is easily adapted to case analysis. The difference lies in interpreting and applying the operation to make a new constituent and a new stack. In the algorithm shown in Figure 7, we revise the stack by attaching either the head of the new constituent, or its new name, to the stack resulting from the removal of all elements in the new constituent. The function select chooses either a new name if present, or the first element, the head of the operation.
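The following is a small runnable rendering of this control loop under the representational assumptions of the earlier sketch: a constituent as a (category, tree) pair, apply_transform as defined above, a CDG stored as a dictionary keyed by the ten-symbol context, and '_' as the blank padding symbol. It folds the output-stack bookkeeping of make_constituent, described next, into the trees built on the parse stack, which is a simplification:

    def context(stack, inp):
        # Ten-symbol window: the top five stack categories and the next
        # five input classes, padded with the blank symbol '_' (our convention).
        left = ["_"] * 5 + [cat for cat, _ in stack]
        right = [cat for cat, _ in inp] + ["_"] * 5
        return tuple(left[-5:] + right[:5])

    def cs_case_parse(inp, cdg):
        # inp: list of (syntactic-class, word) pairs; cdg maps a ten-symbol
        # context to "shift" or to (transform-tokens, rename-or-None).
        stack, inp = [], list(inp)
        while inp or len(stack) > 1:
            op = cdg[context(stack, inp)]    # deterministic, first-path choice
            if op == "shift":
                cat, word = inp.pop(0)
                stack.append((cat, [word]))
            else:
                tokens, rename = op
                stack = apply_transform(stack, tokens, rename)
        return stack[0]                      # the single remaining constituent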
Make_constituent applies the transformation rule to form a new constituent from the output stack and pushes the constituent onto the output stack, which is first reduced by removing the elements used in the constituent. Again, the algorithm is a deterministic, first (best) path parser with behavior essentially the same as the phrase structure parser, but this version accomplishes transformations to construct a case structure analysis.

7.3 Acquisition System for Case Grammar
The acquisition system, like the parser, required only minor revisions to accept case grammar. It must apply a shift or any transformation to construct the new stack-string for the linguist user, and it must record the shift or transformation as the right half of a context-sensitive rule, still composed of a ten-symbol left half and an operation as the right half. Consequently, the system will be illustrated in Figure 9 rather than described in detail.

Earlier we mentioned the context-sensitive dictionary. This is compiled by associating with each word the linguist's in-context assignments of each syntactic word class in which it is experienced. When the dictionary is built, the occurrence frequencies of each word class are accumulated for each word. A primitive grammar of four-tuples terminating with each word class is also formed and hashed in a table of syntactic paths. The procedure to determine a word class in context

• first obtains the candidates from the dictionary;

• for each candidate wc, forms a four-tuple, vec, by adding it to the cdr of each immediately preceding vec, stored in IPC;

• tests each such vec against the table of syntactic paths: if it has been seen previously, it is added to the list of IPCs; otherwise it is eliminated.

If the union of first elements of the IPC list is a single word class, that is the choice. If not, the word's most frequent word class among the union of surviving classes for the word is chosen. The effect of this procedure is to examine a context of plus and minus three words to determine the word class in question. Although a larger context based on five-tuple paths is slightly more effective, there is a tradeoff between accuracy and storage requirements. (A sketch of the selection procedure is given below.)

The word class selection procedure was tested on the 8,310 words of the 345-sentence sample of text. A score of 99.52% correct was achieved, with 8,270 words correctly assigned. As a comparison, choosing the most frequent category for a word resulted in 8,137 correct assignments, a score of 97.92%. Although there are only 3,298 word types with an average of 3.7 tokens per type, the occurrence of single word class usages for words in this sample is very high, thus accounting for the effectiveness of the simpler heuristic of assignment of the most frequent category. However, since the effect of misassignment of word class can often ruin the parse, the use of the more complex procedure is amply justified.
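Here is a minimal sketch of the selection procedure under our reading of it; the data-structure names and the oldest-first ordering of the tuples are our assumptions:

    def choose_word_class(word, ipcs, dictionary, paths, freq):
        # dictionary: word -> candidate classes; paths: set of 4-tuples of
        # word classes seen in training (oldest first, current class last);
        # freq: (word, class) -> occurrence count; ipcs: the surviving
        # 4-tuples ending at the previous word (seeding at sentence start
        # is omitted in this sketch).
        new_ipcs = []
        for wc in dictionary[word]:
            for vec in ipcs:
                cand = vec[1:] + (wc,)       # slide the window by one word
                if cand in paths:            # keep only previously seen paths
                    new_ipcs.append(cand)
        survivors = {vec[-1] for vec in new_ipcs}
        if len(survivors) == 1:              # context determines the class
            return survivors.pop(), new_ipcs
        pool = survivors or set(dictionary[word])
        best = max(pool, key=lambda wc: freq.get((word, wc), 0))
        return best, new_ipcs                # fall back on word frequency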
Analysis of the 40 errors in word class assignment showed 7 confusions of nouns and verbs that will certainly cause errors in parsing; other confusions of adjective/noun and adverb/preposition are less devastating, but still serious enough to require further improvements in the procedure. The word class selection procedure is adequate to form the prompts in the lexical acquisition phase, but the statistics on parsing effectiveness given earlier depend on perfect word class assignments.

Shown in Figure 8 is the system's presentation of a sentence and its requests for each word's syntactic class. The protocol in Figure 9 shows the acquisition of shift and transformation rules for the sentence.

Lexical Acquisition: The system prompts for syntactic classes are in capitals. The user accepts the system's prompt with a carriage return, cr, or types in a syntactic class in lower case. We show the user's responses in bold-face, using cr for carriage return. Other abbreviations are wc for word class, y or n for yes or no, and b for backup.

    (THE LAUNCH OF DISCOVERY AND ITS FIVE ASTRONAUTS HAS BEEN DELAYED
     AT-LEAST TWO DAYS UNTIL MARCH ELEVENTH BECAUSE-OF A CRUSHED ELECTRICAL
     PART ON A MAIN ENGINE COMMA OFFICIALS SAID)
    process this one? - y or n   y

    THE        cr for default else wc or b   default is: ART    cr
    LAUNCH     cr for default else wc or b   cr   ; user made an error since there was no default
    LAUNCH     cr for default else wc or b   n    ; system repeated the question
    OF         cr for default else wc or b   default is: OF     cr
    DISCOVERY  cr for default else wc or b   n
    AND        cr for default else wc or b   default is: CONJ   cr
    ITS        cr for default else wc or b   b    ; user decided to redo "and"
    AND        cr for default else wc or b   default is: CONJ   and
    ITS        cr for default else wc or b   ppron
    !! skipping most of the sentence ...
    A          cr for default else wc or b   default is: ART    cr
    MAIN       cr for default else wc or b   n
    ENGINE     cr for default else wc or b   n
    COMMA      cr for default else wc or b   default is: COMMA  cr
    OFFICIALS  cr for default else wc or b   n
    SAID       cr for default else wc or b   vao

Figure 8
Illustration of dictionary acquisition.

What we notice in this second protocol is that the stack shows syntactic labels but the input string presented to the linguist is in English. As the system constructs a CS rule, however, the vector containing five elements of stack and five of input string is composed entirely of syntactic classes. The English input string better enables the linguist to maintain the meaningful context he or she uses to analyze the sentence.
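A schematic rendering of the Figure 8 dialogue loop; the I/O format and function names are ours, and suggest stands for the word-class selection procedure sketched above:

    def acquire_lexicon(words, dictionary, suggest):
        # Prompt for each word's class; the linguist answers cr (accept the
        # default), b (back up one word), or types a word class.
        i, classes = 0, []
        while i < len(words):
            default = suggest(words[i])      # may be None when no default exists
            prompt = f"{words[i].upper()}  cr for default else wc or b  "
            if default:
                prompt += f"default is: {default.upper()}  "
            answer = input(prompt).strip().lower()
            if answer == "b" and i > 0:      # back up and redo the previous word
                i -= 1
                classes.pop()
            elif answer in ("", "cr"):
                if default is None:
                    continue                 # no default: repeat the question
                classes.append(default)
                i += 1
            else:
                classes.append(answer)       # linguist supplied the class
                dictionary.setdefault(words[i], set()).add(answer)
                i += 1
        return classes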
About five to ten minutes were required to make the judgments for this sentence. Appendix A shows the rules acquired in the session. When rules for the sentence were completed, the system added the new syntactic classes and rules to the grammar, then offered to parse the sentence. The resulting parse is shown in Figure 10.

The case acquisition system was used on the texts described earlier in Table 1 to accumulate 3,700 example CDG case rules. Because the case transformations refer to three stack elements and the number of case labels is large, we expected and found that a much larger sample of text would be required to obtain the levels of generalization seen in the phrase structure experiments. Accumulated in increments of 400 rules, the case curve flattens at about 2,400 rules with an average of 33% error in prediction, compared to the 20% found in analysis of the same number of phrase structure rules. The compressed or minimal grammar for this set of case rules reduces the 3,700 rules to 1,633, a compression ratio in this case of 2.3 examples accounted for by each rule. The resulting compressed grammar parses the texts with 99% accuracy. These statistics are from our initial study of a case grammar, and they should be taken only as preliminary estimates of what a more thorough study may show.

Case-Grammar Acquisition: The options are h for a help message, b for backup one state, s for shift, case-trans for a case transformation, and cr for carriage return to accept a system prompt. System prompts are capitalized in parentheses, user responses are in lower case. Where no apparent response is shown, the user did a carriage return to accept the prompt. The first line shows the syntactic classes for the words in the sentence.

    (ART N OF N AND PPRON ADJ N VHAV VBE VAO AT-LEAST ADJ N UNTIL N N
     BECAUSE-OF ART PPART ADJ N ON ART N N COMMA N VAO)

    (* THE LAUNCH OF DISCOVERY AND ITS FIVE ASTRONAUTS HAS BEEN DELAYED
       AT-LEAST TWO DAYS UNTIL MARCH ELEVENTH BECAUSE-OF A CRUSHED ELECTRICAL
       PART ON A MAIN ENGINE COMMA OFFICIALS SAID)
    options-are h b s case-trans or cr for default: (S)
    (ART * LAUNCH OF DISCOVERY AND ITS ... SAID)
    options-are h b s case-trans or cr for default: (S)
    (ART N * OF DISCOVERY AND ITS FIVE ... SAID)
    options-are h b s case-trans or cr for default: (S)   1 det 2
    (N * OF DISCOVERY AND ITS FIVE ... SAID)
    options-are h b s case-trans or cr for default: (S)
    ! skipping several shifts
    (N OF N AND PPRON ADJ N * HAS BEEN DELAYED AT-LEAST ... SAID)
    options-are h b s case-trans or cr for default: (S)   1 mod 2
    (N OF N AND PPRON N * HAS BEEN DELAYED AT-LEAST ... SAID)
    options-are h b s case-trans or cr for default: NIL   1 possby 2
    (N OF N AND N * HAS BEEN DELAYED AT-LEAST ... SAID)
    options-are h b s case-trans or cr for default: NIL   3 2 1
    (N OF N * HAS BEEN DELAYED AT-LEAST TWO ... SAID)
    options-are h b s case-trans or cr for default: (3 2 1)
    (N * HAS BEEN DELAYED AT-LEAST TWO ... SAID)
    options-are h b s case-trans or cr for default: (S)
    (N VHAV * BEEN DELAYED AT-LEAST TWO DAYS ... SAID)
    options-are h b s case-trans or cr for default: (1 OBJ 2)   s
    (N VHAV VBE * DELAYED AT-LEAST TWO DAYS ... SAID)
    options-are h b s case-trans or cr for default: (S)   1 aux 2
    (N VBE * DELAYED AT-LEAST TWO DAYS UNTIL ... SAID)
    options-are h b s case-trans or cr for default: (S)
    (N VBE VAO * AT-LEAST TWO DAYS UNTIL MARCH ... SAID)
    options-are h b s case-trans or cr for default: (1 VBE 2, VAOPASV)
    (N VAOPASV * AT-LEAST TWO DAYS UNTIL MARCH ... SAID)
    options-are h b s case-trans or cr for default: (1 OBJ 2)
    ! skipping now to BECAUSE
    (VAOPASV UNTIL N * BECAUSE-OF A CRUSHED ELECTRICAL ... SAID)
    options-are h b s case-trans or cr for default: (S)   3 2 1
    (VAOPASV * BECAUSE-OF A CRUSHED ELECTRICAL PART ... SAID)
    options-are h b s case-trans or cr for default: (S)
    (VAOPASV BECAUSE-OF * A CRUSHED ELECTRICAL PART ON ... SAID)
    options-are h b s case-trans or cr for default: NIL   1 conse 2
    (BECAUSE-OF * A CRUSHED ELECTRICAL PART ON ... SAID)
    options-are h b s case-trans or cr for default: NIL   s
    ! skipping now to the end
    (BECAUSE-OF COMMA N VAO *)
    options-are h b s case-trans or cr for default: (1 OBJ 2)   1 agt 2
    (BECAUSE-OF COMMA VAO *)
    options-are h b s case-trans or cr for default: NIL   1 obj 3

Figure 9
Illustration of case grammar acquisition.

    (said Agt officials
          Obj (because-of
                Conse (delayed Vbe (been Aux has)
                               Obj (launch Det the
                                           Of (discovery
                                               And (astronauts Mod five
                                                               Possby its)))
                               At-least (days Mod two)
                               Until (eleventh Nmod march))
                Ante (part Mod electrical
                           Mod crushed
                           Det a
                           On (engine Nmod main
                                      Det a))))

    The launch of discovery and its five astronauts has been delayed
    at-least two days until march eleventh because-of a crushed electrical
    part on a main engine comma officials said.

Figure 10
Case analysis of a sentence.

8. Discussion and Conclusions

It seems remarkable that although the theory of context-sensitive grammars appeared in Chomsky (1957), formal context-sensitive rules seem not to have been used previously in computational parsing. As researchers we seem simply to have assumed, without experimentation, that context-sensitive grammars would be too large and cumbersome to be a practical approach to automatic parsing. In fact, context-sensitive, binary phrase structure rules with a context composed of the preceding three stack symbols and the next five input symbols,

    stack1-3 binary-rule input1-5 → operation
provide several encouraging properties:

• The linguist uses the full context of the sentence to make a simple decision: either shift a new element onto the stack or combine the top two elements into a phrase category.

• The system compiles a CS rule composed of ten symbols, the top five elements of the stack and the next five elements of the input string. The context of the embedded binary rule specializes that rule for use in similar environments, thus providing selection criteria to the parser for the choice of shift or reduce, and for assigning the phrase name that has most frequently been used in similar environments. The context provides a simple but powerful approach to preference parsing.

• As a result, a deterministic bottom-up parser is notably successful in finding precisely the parse tree that the linguist who constructed the analysis of a sentence had in mind, and this is true whether the grammar is stored as a trained neural network or in the form of hash-table entries.

• Despite the large combinatoric space for selecting 1 of 64 symbols in each of the 10 slots in the rules (64^10 possible patterns), experiments in accumulating phrase structure grammar suggest that a fairly complete grammar will require only about 25,000 CS rules.

• It is also the case that when redundant rules are removed the CS grammar is reduced by a factor of four and still maintains its accuracy in parsing.

• Because of the simplicity and regular form of the rule structure, it has proved possible to construct an acquisition system that greatly facilitates the accumulation of grammar. The acquisition system presents contexts and suggests operations that have previously been used with similar contexts; thus it helps the linguist to maintain consistency of judgments.

• Parsing with context-sensitive rules generalizes from phrase structure rewriting rules to the transformational rules required by case analysis. Since the case analysis rules retain a regular, simple form, the acquisition system also generalizes to case grammar.

Despite such advantageous properties, a few cautions should be noted. First, the deterministic parsing algorithm is sufficient to apply the CDG to the sentences from which the grammar was derived, but to accomplish effective generalization to new sentences, a bandwidth parsing algorithm that follows multiple parsing paths is superior. Second, the 99% accuracy of the parsing will deteriorate markedly if the dictionary lookup makes errors in word assignment. Third, the shift/reduce parsing is unable to give correct analyses for such embedded discontinuous constituents as "I saw the man yesterday who ...". Finally, the actual parsing structures that we have presented here are skeletal. We did not mark mood, aspect, or tense of verbs, or number for nouns, or deal with long-distance dependencies. We do not resolve pronoun references; and we do not complete ellipses in conjunctive and other constructions.
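To make the rule format concrete, here is a minimal sketch of compiling an example into a ten-symbol hash-table entry and consulting it; the padding convention and names are ours, and a full system would add the frequency-based preference scoring described above:

    def compile_rule(stack, inp, operation, table):
        # One training example becomes a CS rule: the ten-symbol context
        # (top five stack symbols, next five input symbols, '_'-padded)
        # maps to the linguist's operation.
        left = tuple((["_"] * 5 + stack)[-5:])
        right = tuple((inp + ["_"] * 5)[:5])
        table[left + right] = operation

    table = {}
    compile_rule(["art", "adj", "n"], ["from", "n", "vao", "n"],
                 ("1", "mod", "2"), table)
    ctx = ("_", "_", "art", "adj", "n", "from", "n", "vao", "n", "_")
    assert table[ctx] == ("1", "mod", "2")   # the parser's consult step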