A Parsing Algorithm that Extends Phrases


Daniel Chester
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712

It is desirable for a parser to be able to extend a phrase even after it has been combined into a larger syntactic unit. This paper presents an algorithm that does this in two ways, one dealing with "right extension" and the other with "left recursion". A brief comparison with other parsing algorithms shows it to be related to the left-corner parsing algorithm, but it is more flexible in the order in which it permits phrases to be combined. It has many of the properties of the sentence analyzers of Marcus and Riesbeck, but is independent of the language theories on which those programs are based.

1. Introduction

To analyze a sentence of a natural language, a computer program recognizes the phrases within the sentence, builds data structures, such as conceptual representations, for each of them, and combines those structures into one that corresponds to the entire sentence. The algorithm which recognizes the phrases and invokes the structure-building procedures is the parsing algorithm implemented by the program. The parsing algorithm is combined with a set of procedures for deciding between alternative actions and for building the data structures. Since it is organized around phrases, it is primarily concerned with syntax, while the procedures it calls deal with the non-syntactic parts of the analysis. When the program is run, there may be a complex interplay between the code segments that handle syntax and those that handle semantics and pragmatics, but the program organization can still be abstracted into a (syntactic) parsing algorithm and a set of procedures that are called to augment that algorithm.

By taking the view that the parsing algorithm recognizes the phrases in a sentence, that is, the components of its surface structure and how they can be decomposed, it suffices to specify the syntax of a natural language, at least approximately, with a context-free phrase structure grammar, the rules of which serve as phrase decomposition rules.
Although linguists have developed more elaborate grammars for this purpose, most computer programs for sentence analysis, e.g., Heidorn (1972), Winograd (1972) and Woods (1970), specify the syntax with such a grammar, or something equivalent, and then augment that grammar with procedures and data structures to handle the non-context-free components of the language. The notion of parsing algorithm is therefore restricted in this paper to an algorithm that recognizes phrases in accordance with a context-free phrase structure grammar.

Since the parsing algorithm of a sentence analysis program determines when data structures get combined, it seems reasonable to expect that the actions of the parser should reflect the actions on the data structures. In particular, the combination of phrases into larger phrases can be expected to coincide with the combination of corresponding data structures into larger data structures. This happens naturally when the computer program calls the procedures for combining data structures at the same time the parsing algorithm indicates that the corresponding phrases should be combined.

1.1 Other Parsing Algorithms

According to one classification of parsing algorithms (Aho and Ullman 1972), most analysis programs are based on algorithms that are either "top-down", "bottom-up" or "left-corner", though according to a recent study by Grishman (1975), the top-down and bottom-up approaches are dominant. The principle of top-down parsing is that the rules of the controlling grammar are used to generate a sentence that matches the one being analyzed. A serious problem with this approach, if computed phrases are supposed to correspond to natural phrases in a sentence, is that the parser cannot handle left-branching phrases. But such phrases occur in English, Japanese, Turkish and other natural languages (Chomsky 1965, Kimball 1973, Lyons 1970).

The principle of bottom-up parsing is that a sequence of phrases whose types match the right-hand side of a grammar rule is reduced to a phrase of the type on the left-hand side of the rule.
None of the matching is done until all the phrases are present; this can be ensured by matching the phrase types in the right-to-left order shown in the grammar rule. The difficulty with this approach is that the analysis of the first part of a sentence has no influence on the analysis of later parts until the results of the analyses are finally combined. Efforts to overcome this difficulty lead naturally to the third approach, left-corner parsing.

Left-corner parsing, like bottom-up parsing, reduces phrases whose types match the right-hand side of a grammar rule to a phrase of the type on the left-hand side of the rule; the difference is that the types listed in the right-hand side of the rule are matched from left to right for left-corner parsing instead of from right to left. This technique gets its name from the fact that the first phrase found corresponds to the left-most symbol of the right-hand side of the grammar rule, and this symbol has been called the left corner of the rule. (When a grammar rule is drawn graphically with its left-hand side as the parent node and the symbols of the right-hand side as the daughters, forming a triangle, the left-most symbol is the left corner of the triangle.) Once the left-corner phrase has been found, the grammar rule can be used to predict what kind of phrase will come next. This is how the analysis of the first part of a sentence can influence the analysis of later parts.

Most programs based on augmented transition networks employ a top-down parser to which registers and structure-building routines have been added, e.g., Kaplan (1972) and Wanner and Maratsos (1978). It is important to note, however, that the concept of augmented transition networks is a particular way to represent linguistic knowledge; it does not require that the program using the networks operate in top-down fashion. In an early paper by Woods (1970), alternative algorithms that can be used with augmented transition networks are discussed, including the bottom-up and Earley algorithms.

The procedure-based programs of Winograd (1972) and Novak (1976) are basically top-down parsers, too. The NLP program of Heidorn (1972) employs an augmented phrase structure grammar to combine phrases in a basically bottom-up fashion. Likewise, PARRY, the program written by Colby (Parkison, Colby and Faught 1977), uses a kind of bottom-up method to drive a computer model of a paranoid.

1.2 A New Parsing Algorithm

This paper presents a parsing algorithm that allows data structures to be combined as soon as possible. The algorithm permits a structure A to be combined with a structure B to form a structure C, and then to enlarge B to form a new structure B'. This new structure is to be formed in such a way that C is now composed of A and B' instead of A and B. The algorithm permits these actions on data structures because it permits similar actions on phrases, namely, phrases are combined with other phrases and afterward are extended to encompass more words in the sentence being analyzed. This behavior of combining phrases before all of their components have been found is called closure by Kimball (1973). It is desirable because it permits the corresponding data structures to be combined and to influence the construction of other data structures sooner than they otherwise could.
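As a deliberately simplified picture of this behavior on data structures (the dictionary representation and the in-place enlargement below are assumptions made only for illustration, not the author's implementation), a component that has already been combined into a larger structure can still grow:

    # A simplified picture of closure: B is combined into C early and later
    # enlarged in place, so C ends up composed of A and the enlarged B (i.e., B')
    # without being rebuilt.
    A = {"type": "NP", "words": ["this"]}
    B = {"type": "NP", "words": ["the", "cat"]}
    C = {"type": "S", "parts": [A, B]}               # A and B combined into C right away

    B["words"] += ["that", "caught", "the", "rat"]   # B is extended to B' afterward
    print(C["parts"][1]["words"])
    # ['the', 'cat', 'that', 'caught', 'the', 'rat']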
In the next section of this paper the desired behavior for combining phrases is discussed in more detail to show the two kinds of actions that are required. Then the algorithm itself is explained and its operation illustrated by examples, with some details of an experimental implementation also given. Finally, this algorithm is compared to those used in the sentence analysis programs of Marcus and Riesbeck, and some concluding remarks are made.

2. Phrase Extension

A parser that extends phrases combines phrases before it has found all of their components. When parsing the sentence

(1) This is the cat that caught the rat that stole the cheese.

it combines the first four words into the phrase "this is the cat", in which "the cat" is a noun phrase, then extends that noun phrase to "the cat that caught the rat", in which "the rat" is a noun phrase, and finally extends that noun phrase to "the rat that stole the cheese." This is apparently how people parse that sentence, because, as Chomsky (1965) noted, it is natural to break up the sentence into the fragments

    this is the cat
    that caught the rat
    that stole the cheese

(by adding intonation, pauses or commas) rather than at the boundaries of the major noun phrases:

    this is
    the cat that caught
    the rat that stole
    the cheese

Likewise, when the parser parses the sentence

(2) The rat stole the cheese in the pantry by the bread.

it forms the phrase "the rat stole the cheese", then extends the noun phrase "the cheese" to "the cheese in the pantry" and extends that phrase to "the cheese in the pantry by the bread".

These two examples show the two kinds of actions needed by a parser in order to exhibit the behavior called closure. Each of these actions will now be described in more detail, using the following grammar, G1:

    S -> NP VP
    NP -> Pro
    NP -> NP1
    NP -> NP1 RelPro VP
    VP -> V NP
    Pro -> this
    NP1 -> Det N
    NP1 -> NP1 PP
    RelPro -> that
    V -> is
    V -> caught
    V -> stole
    Det -> the
    N -> cat
    N -> rat
    N -> cheese
    N -> pantry
    N -> bread
    PP -> Prep NP
    Prep -> in
    Prep -> by
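For concreteness in the sketches that follow, G1 can be written down as plain data. This encoding, rules as (left-hand side, right-hand side) pairs with words appearing as right-hand-side symbols, is only an illustration; the paper does not prescribe any particular representation.

    # Grammar G1 as a list of (lhs, rhs) pairs; rhs is a tuple of phrase types and words.
    G1 = [
        ("S", ("NP", "VP")),
        ("NP", ("Pro",)), ("NP", ("NP1",)), ("NP", ("NP1", "RelPro", "VP")),
        ("VP", ("V", "NP")),
        ("NP1", ("Det", "N")), ("NP1", ("NP1", "PP")),
        ("PP", ("Prep", "NP")),
        ("Pro", ("this",)), ("RelPro", ("that",)),
        ("V", ("is",)), ("V", ("caught",)), ("V", ("stole",)),
        ("Det", ("the",)),
        ("N", ("cat",)), ("N", ("rat",)), ("N", ("cheese",)), ("N", ("pantry",)), ("N", ("bread",)),
        ("Prep", ("in",)), ("Prep", ("by",)),
    ]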
2.1 Right Extension

The first kind of action needed by the parser is to add more components to a phrase according to a rule whose right-hand side extends the right-hand side of another rule. This is illustrated by sentence 1. With grammar G1, the phrase "this is the cat" has the phrase structure

    S
      NP
        Pro
          this
      VP
        V
          is
        NP
          NP1
            Det
              the
            N
              cat

In order to extend the NP phrase "the cat", the substructure

    NP
      NP1
        Det
          the
        N
          cat

must be changed to the substructure

    NP
      NP1
        Det
          the
        N
          cat
      RelPro
        that
      VP
        V
          caught
        NP
          NP1
            Det
              the
            N
              rat

The parser, in effect, must extend the NP phrase according to the rule NP -> NP1 RelPro VP, whose right-hand side extends the right-hand side of the rule NP -> NP1.

There is a simple way to indicate in a grammar when a phrase can be extended in this way: when two rules have the same left-hand side, and the right-hand side of the first rule is an initial segment of the right-hand side of the second rule, a special mark can be placed after that initial segment in the second rule and then the first rule can be discarded. The special mark can be interpreted as saying that the rest of the right-hand side of the rule is optional. Using * as the special mark, the rule NP -> NP1 * RelPro VP can replace the rules NP -> NP1 and NP -> NP1 RelPro VP in the grammar G1.

Our algorithm therefore requires a modified grammar. For the general case, a grammar is modified by execution of the following step:

1) For each rule of the form X -> Y1 ... Yk, whenever there is another rule of the form X -> Y1 ... Yk Yk+1 ... Yn, replace this other rule by X -> Y1 ... Yk * Yk+1 ... Yn, provided Yk+1 is not *.

When no more rules can be produced by the above step, the following step is executed:

2) Delete each rule of the form X -> Y1 ... Yk for which there is another rule of the form X -> Y1 ... Yk Yk+1 ... Yn.

The mark * tells the parser to combine the phrases that correspond to the symbols preceding * as if the rule ended there. After doing that, the parser tries to extend the combination by looking for phrases to correspond to the symbols following *.
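A compact sketch of the two modification steps, under the same rule encoding as above, is given below. The paper states the modification as a repeated rewriting; for grammars like G1 the following direct computation produces the same result, so it should be read as one possible rendering rather than the definitive procedure.

    def modify_grammar(rules):
        """Mark, with '*', every point in a rule's right-hand side where a shorter
        rule with the same left-hand side ends (step 1), then drop those shorter
        rules (step 2)."""
        rules = set(rules)
        modified = set()
        for lhs, rhs in rules:
            # Step 2: skip a rule whose right-hand side is a proper initial
            # segment of another rule's right-hand side (same left-hand side).
            if any(l == lhs and len(r) > len(rhs) and r[:len(rhs)] == rhs
                   for l, r in rules):
                continue
            # Step 1: insert '*' after every initial segment that is itself a rule.
            marked = []
            for k, sym in enumerate(rhs):
                if k > 0 and (lhs, rhs[:k]) in rules:
                    marked.append("*")
                marked.append(sym)
            modified.add((lhs, tuple(marked)))
        return modified

    print(modify_grammar({("NP", ("NP1",)), ("NP", ("NP1", "RelPro", "VP"))}))
    # {('NP', ('NP1', '*', 'RelPro', 'VP'))}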
2.2 Left Recursion

The second kind of action needed by the parser is to extend a phrase according to a left-recursive rule. A left-recursive rule is a rule of the form X -> X Y1 ... Yn. The action based on a left-recursive rule is illustrated by sentence 2 above. The phrase "the rat stole the cheese" has the phrase structure

    S
      NP
        NP1
          Det
            the
          N
            rat
      VP
        V
          stole
        NP
          NP1
            Det
              the
            N
              cheese

In order to extend the phrase "the cheese", the substructure

    NP1
      Det
        the
      N
        cheese

must be changed to the substructure

    NP1
      NP1
        Det
          the
        N
          cheese
      PP
        Prep
          in
        NP
          NP1
            Det
              the
            N
              pantry

The parser, in effect, must extend the NP1 phrase according to the left-recursive rule NP1 -> NP1 PP.

In the general case, after finding a phrase and combining it with others, the parser must look for more phrases to combine with it according to some left-recursive rule in the grammar. Left-recursive rules, therefore, play a special role in this algorithm.

It might be thought that left-recursive rules are the only mechanism needed to extend phrases. This is not true because phrases extended in this way have a different property from those extended by right extension. In particular, left-recursive rules can be applied several times to extend the same phrase, which is not always the desired behavior for a given language phenomenon. For instance, any number of PP phrases can follow an NP1 phrase without connecting words or punctuation because of the left-recursive rule NP1 -> NP1 PP. If, on the other hand, NP -> NP RelPro VP were used instead of NP -> NP1 RelPro VP, any number of relative clauses (RelPro and VP phrase pairs) could follow an NP phrase, which is generally ungrammatical in English unless the clauses are separated by commas or connective words like "and" and "or".
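Because left-recursive rules play this special role, the parser needs to recognize them; a trivial check, under the rule encoding assumed in the earlier sketches, is:

    # A left-recursive rule has the form X -> X Y1 ... Yn: its left-hand side
    # reappears as the first symbol of its right-hand side.
    def is_left_recursive(lhs, rhs):
        return len(rhs) > 0 and rhs[0] == lhs

    RULES = [("NP1", ("NP1", "PP")), ("NP1", ("Det", "N")), ("NP", ("NP1", "*", "RelPro", "VP"))]
    print([f"{lhs} -> {' '.join(rhs)}" for lhs, rhs in RULES if is_left_recursive(lhs, rhs)])
    # ['NP1 -> NP1 PP']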
3. Details of the Parsing Algorithm

The parsing algorithm that provides for both right extension and left recursion will now be described. It works with structures called phrase status elements to keep track of the status of phrases during the sentence analysis. In this section, after these elements are defined, the algorithm is presented as a set of operations to be performed on a list of phrase status elements. Then the algorithm is applied to sentences 1 and 2 to show how it works, and some refinements that were added in an experimental implementation are discussed.

3.1 Phrase Status Elements

In order to combine and extend phrases properly, a parser must keep track of the status of each phrase; in particular, it must note what kind of phrase will be formed, what component phrases are still needed to complete the phrase, and whether the phrase is a new one or an extension. This information can be represented by a phrase status element. A phrase status element is a triple of the form (Yk ... Yn, X, F) where, for some symbols Y1, ..., Yk-1, there is a grammar rule of the form X -> Y1 ... Yk-1 Yk ... Yn, and F is a flag that has one of the values n, e, or p, which stand for "new", "extendible" and "progressing", respectively. Intuitively, this triple means that the phrase in question is an X phrase and that it will be completed when phrases of types Yk through Yn are found. If F = n, the phrase will be a new phrase. If F = e, the phrase is ready for extension into a longer X phrase, but none of the extending phrase components have been found yet. If F = p, however, some of the extending phrase components have been found already and the rest must be found to complete the phrase. The status of being ready for extension has to be distinguished from the status of having the extension in progress, because it is during the ready status that the decision whether to try to extend the phrase is made.
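One possible concrete rendering of a phrase status element (an assumption for illustration; any triple-like structure would serve) is:

    # A phrase status element (rest, cat, flag): `rest` holds the component types
    # Yk ... Yn still needed, `cat` is the phrase type X being built, and `flag`
    # is 'n' (new), 'e' (extendible) or 'p' (progressing).
    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class PhraseStatus:
        rest: Tuple[str, ...]
        cat: str
        flag: str

    new_np = PhraseStatus((), "NP", "n")                       # (,NP,n): a new NP phrase has been found
    extendible_np = PhraseStatus(("RelPro", "VP"), "NP", "e")  # (RelPro VP,NP,e): ready to be extended
    print(new_np, extendible_np)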
3.2 The Algorithm

The parsing algorithm for extending phrases is embodied in a set of operations for manipulating a list of phrase status elements, called the element list, according to a given modified grammar and a given sentence to be analyzed. Beginning with an empty element list, the procedure is applied by performing the operations repeatedly until the list contains only the element (,S,n), where S is the grammar's start symbol, and all of the words in the given sentence have been processed.

The following are the six operations which are applied to the element list (a concrete sketch of them follows below):

1. Replace an element of the form (* Y1 ... Yn,X,F) with the pair (Y1 ... Yn,X,e) (,X,F).

2. Replace an element of the form (,X,p) with the element (Y1 ... Yn,X,e) if there is a left-recursive rule of the form X -> X Y1 ... Yn in the grammar; if there is no such rule, delete the element.

3. Replace adjacent phrase status elements of the form (,X,n) (X Y1 ... Yn,Z,F) with the pair (,X,p) (Y1 ... Yn,Z,F'). If the flag F = e, then F' = p; otherwise, F' = F.

4. Replace an element of the form (,X,n) with the pair (,X,p) (Y1 ... Yn,Z,n) if there is a grammar rule of the form Z -> X Y1 ... Yn in the grammar, provided the rule is not left-recursive.

5. Get the next word W from the sentence and add the element (,W,n) to the front (left) end of the element list.

6. Delete an element of the form (Y1 ... Yn,X,e).

These operations are applied one at a time, in arbitrary order. If more than one operation can cause a change in the element list at any given time, then one of them is selected for actual application. The manner of selection is not specified here because that is a function of the data structures and procedures that would have to be added to incorporate the algorithm into a complete sentence analysis program.
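The sketch below renders the six operations as rewrites on a Python list of (rest, type, flag) tuples, over a small subset of the modified grammar G1. It is one possible reading, not the author's implementation; where the paper leaves open which rule or which applicable operation to use, the sketch simply takes the first match.

    # Elements are (rest, X, flag) tuples; rest is a tuple of symbols still needed
    # (possibly beginning with the mark '*'), flag is 'n', 'e' or 'p'.  Index 0 is
    # the front (left) end of the element list.  GRAMMAR is a subset of modified G1.
    GRAMMAR = [
        ("S", ("NP", "VP")), ("NP", ("Pro",)), ("NP", ("NP1", "*", "RelPro", "VP")),
        ("VP", ("V", "NP")), ("NP1", ("Det", "N")), ("NP1", ("NP1", "PP")),
        ("PP", ("Prep", "NP")), ("Pro", ("this",)), ("RelPro", ("that",)),
        ("V", ("is",)), ("Det", ("the",)), ("N", ("cat",)), ("Prep", ("in",)),
    ]

    def left_recursive(lhs, rhs):
        return rhs[:1] == (lhs,)

    def op1(elems, i):
        # (* Y1..Yn, X, F) -> (Y1..Yn, X, e) (, X, F): the X phrase is complete but
        # may still be extended by phrases of types Y1..Yn.
        rest, x, f = elems[i]
        if rest[:1] != ("*",):
            return False
        elems[i:i + 1] = [(rest[1:], x, "e"), ((), x, f)]
        return True

    def op2(elems, i):
        # (, X, p) -> (Y1..Yn, X, e) for a left-recursive rule X -> X Y1..Yn,
        # otherwise delete the element.  (The first matching rule is taken here.)
        rest, x, f = elems[i]
        if rest != () or f != "p":
            return False
        lr = [rhs for lhs, rhs in GRAMMAR if lhs == x and left_recursive(lhs, rhs)]
        elems[i:i + 1] = [(lr[0][1:], x, "e")] if lr else []
        return True

    def op3(elems, i):
        # (, X, n) (X Y1..Yn, Z, F) -> (, X, p) (Y1..Yn, Z, F'): the new X phrase
        # becomes the next component of the unfinished Z phrase; F' = p if F = e.
        if i + 1 >= len(elems):
            return False
        (r1, x, f1), (r2, z, f2) = elems[i], elems[i + 1]
        if r1 != () or f1 != "n" or r2[:1] != (x,):
            return False
        elems[i:i + 2] = [((), x, "p"), (r2[1:], z, "p" if f2 == "e" else f2)]
        return True

    def op4(elems, i):
        # (, X, n) -> (, X, p) (Y1..Yn, Z, n) for a non-left-recursive rule
        # Z -> X Y1..Yn: start a larger Z phrase whose first component is X.
        rest, x, f = elems[i]
        if rest != () or f != "n":
            return False
        for lhs, rhs in GRAMMAR:
            if rhs[:1] == (x,) and not left_recursive(lhs, rhs):
                elems[i:i + 1] = [((), x, "p"), (rhs[1:], lhs, "n")]
                return True
        return False

    def op5(elems, words):
        # Shift the next word W of the sentence onto the front as (, W, n).
        elems.insert(0, ((), words.pop(0), "n"))

    def op6(elems, i):
        # Delete an element (Y1..Yn, X, e): give up on extending the X phrase.
        if elems[i][2] != "e":
            return False
        del elems[i]
        return True

    # The first steps of the trace for sentence 1, working on the word "this":
    elems, words = [], ["this", "is", "the", "cat"]
    op5(elems, words)                  # (,this,n)
    op4(elems, 0); op2(elems, 0)       # (,Pro,n)
    op4(elems, 0); op2(elems, 0)       # (,NP,n)
    op4(elems, 0); op2(elems, 0)       # (,NP,p)(VP,S,n), then (VP,S,n)
    print(elems)                       # [(('VP',), 'S', 'n')]

The short demonstration at the end reproduces the first few steps described in the next subsection, in which the word "this" is worked up into the element (VP,S,n).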
These operations have a fairly simple purpose: the major goal is to find new phrases, which are represented by phrase status elements of the form (,X,n). Initially, by application of operation 5, a word of the sentence to be analyzed is made into a new phrase. When a new phrase is found, whether by operation 5 or some other operation, there are two ways that can be used to find a larger phrase. One way is to attempt to find a new phrase that begins with the just-found phrase; this is the purpose of operation 4. Once an X phrase is found, the element (,X,n) is replaced by (,X,p) (Y1 ... Yn,Z,n) for some grammar rule of the form Z -> X Y1 ... Yn and some Z different from X. The element (Y1 ... Yn,Z,n) represents an attempt to find a Z phrase, of which the first component, the X phrase, has been found.

The second way that can be used to make a larger phrase is to make the X phrase a component of a phrase that has already been started. In operation 3, the element (,X,n) represents a new X phrase that can be used in this way. An immediately preceding phrase has been made part of some unfinished Z phrase, represented by the element (X Y1 ... Yn,Z,F). Since the symbol X is the first symbol in the first part of this element, the Z phrase can contain an X phrase at this point in the sentence, so the X phrase is put in the Z phrase, and the result of this action is indicated by the new elements (,X,p) (Y1 ... Yn,Z,F').

In both operations 3 and 4, the element (,X,n) itself is replaced by (,X,p). The flag value p indicates, in this case, that the X phrase has already been added to some larger phrase. Operation 2 tries to extend the X phrase by creating the element (Y1 ... Yn,X,e) for some left-recursive grammar rule X -> X Y1 ... Yn, indicating that the X phrase can be extended if it is followed by phrases of types Y1, ..., Yn, in that order. If a Y1 phrase comes next in the sentence, the extension begins by an application of operation 3, which changes the flag from e to p to indicate that the extension is in progress. If there is no Y1 phrase, however, the X phrase cannot be extended, so the element is deleted by operation 6, allowing the parser to go on to the next phrase.

In the event a phrase status element has the form (* Y1 ... Yn,X,F), the X phrase can be considered completed. It can also be extended, however, if it is followed by phrases of types Y1 through Yn. Operation 1 creates the new element (,X,F) to indicate the completion of the X phrase, and the new element (Y1 ... Yn,X,e) to indicate its possible extension. Again, if extension turns out not to be possible, the element can be deleted by operation 6 so that parsing can continue.
3.3 Examples

As examples of how the algorithm works, consider how it applies to sentences 1 and 2. Grammar G1 must be modified first, but this consists only of substituting the rule NP -> NP1 * RelPro VP for the two NP rules in the original grammar, as described earlier. Starting with an empty element list, the sentence "this is the cat that caught the rat that stole the cheese" is processed as shown in Table 1. By operation 5, the word "this" is removed from the sentence and the element (,this,n) is added to the list. By operation 4, this element is replaced by (,this,p) (,Pro,n), which is shortened to (,Pro,n) by operation 2. By two applications of operations 4 and 2, the element list becomes (,NP,n), then (VP,S,n). The element (,is,n) is now added to the front of the list. This element is changed, by operations 4 and 2 again, to (,V,n) and then to (NP,VP,n). The words "the" and "cat" are processed similarly to produce the element list (,N,n) (N,NP1,n) (NP,VP,n) (VP,S,n) at step 20.

At this point, operation 3 is applied to combine (,N,n) with (N,NP1,n), yielding (,N,p) (,NP1,n) in their place. By operation 2, element (,N,p) is removed. The first element of the list is now (,NP1,n), which by operation 4 is changed to (,NP1,p) (* RelPro VP,NP,n). When operation 2 is applied this time, because there is a left-recursive grammar rule for NP1 phrases, element (,NP1,p) is replaced by (PP,NP1,e). Operation 1 is applied to the next element to eliminate the special mark * that appears in it, changing the element list to (PP,NP1,e) (RelPro VP,NP,e) (,NP,n) (NP,VP,n) (VP,S,n) at step 25.

At this point, the element (,NP,n) represents the NP phrase "the cat". Operation 3 is applied to it and the next element, and then operation 2 to the result, reducing those two elements to (,VP,n). By operations 3 and 2 again, this element is combined with (VP,S,n) to produce (,S,n), which at this point represents the phrase "this is the cat." If this phrase were the whole sentence, operation 6 could be applied, reducing the element list to (,S,n) and the sentence would be successfully parsed. There are more words in the sentence, however, so other operations are tried.

The next word, "that", is processed by operations 5, 4 and 2 to add (,RelPro,n) to the front of the list. Since the grammar does not allow a PP phrase to begin with the word "that", operation 6 is applied to eliminate the element (PP,NP1,e), which represents the possibility of an extension of an NP1 phrase by a PP phrase. The next element, (RelPro VP,NP,e), represents the possibility of an extension of an NP phrase when it is followed by a RelPro phrase, however, so operations 3 and 2 are applied, changing the element list to (VP,NP,p) (,S,n). Note that the flag value e has changed to p; this means that a VP phrase must be found now to complete the NP phrase extension or this sequence of operations will fail.

By continuing in this fashion, the sentence is parsed. Since no new details of how the algorithm works would be illustrated by continuing the narration, the continuation is omitted.

The sentence "the rat stole the cheese in the pantry by the bread" is parsed in a similar fashion. The only detail that is significantly different from the previous sentence is that after the element (RelPro VP,NP,e) is deleted by operation 6, instead of (PP,NP1,e), a new situation occurs, in which a phrase can attach to one of several phrases waiting to be extended. The situation occurs after the part of the sentence corresponding to the phrase "the rat stole the cheese" is represented by the element (,S,n) when it first appears on the element list. When the PP phrase "in the pantry" is found, the element (PP,NP1,e) changes to (,NP1,p), indicating that the NP1 phrase "the cheese" has been extended to "the cheese in the pantry". By operation 2, the element (,NP1,p) is changed to (PP,NP1,e) so that the NP1 phrase can be extended again. But "the pantry" is an NP1 phrase also, which means that an element of the form (PP,NP1,e) has been created to extend it as well. Thus, when the next PP phrase, "by the bread", is found, it can attach to either of the earlier NP1 phrases. The parser does not decide which attachment to make, as that depends on non-syntactic properties of the data structures that would be associated with the phrases in a complete sentence analyzer. In this example the PP phrase can be attached to the NP1 phrase "the cheese", which means that the intervening element (PP,NP1,e) attempting to expand the NP1 phrase "the pantry" has to be deleted.
Table 1. Trace of the parsing algorithm on sentence 1, giving the element list and the operation applied at each of steps 1-35.

3.4 Constraints on the Operations

The algorithms for top-down, bottom-up and left-corner parsing are usually presented so that all operations are performed on the top of a stack that corresponds to our element list. We have not constrained our algorithm in this way because to do so would prevent the desired closure behavior. In particular, in the sentence "this is the cat that caught the rat that stole the cheese," the NP phrase "the cat" would not combine into the phrase "this is the cat" until the rest of the sentence was analyzed if such a constraint were enforced. This is because operation 1 would create an element for extending the NP phrase that would have to be disposed of first before the element (,NP,n), created also by that operation, could combine with anything else. Thus, constraining the operations to apply only to the front end of the element list would nullify the algorithm's purpose.
Our algorithm can be viewed as a modification of the left-corner parser. In fact, if a grammar is not modified before use with our algorithm, and if operation 4 is not restricted to non-left-recursive rules, and if operation 2 is modified to delete all elements of the form (,X,p), then our algorithm would actually be a left-corner parser.

3.5 Experimental Program

The algorithm has been presented here in a simple form to make its exposition easier and its operating principles clearer. When it was tried out in an experimental program, several techniques were used to make it work more efficiently. For example, operations 1 and 2 were combined with the other operations so that they were, in effect, applied as soon as possible. Operation 3 was also applied as soon as possible. The other operations cannot be given a definite order for their application; that must be determined by the non-syntactic procedures that are added to the parser.

Another technique increased efficiency by applying operation 4 only when the result is consistent with the grammar. Suppose grammar G1 contained the rule Det -> that as well as the rule RelPro -> that. When the word "that" in the sentence "this is the cat that caught the rat that stole the cheese" is processed, the element list contains the triple (,that,n) (PP,NP1,e) (RelPro VP,NP,e) at one point. The grammar permits operation 4 to be applied to (,that,n) to produce either (,Det,n) or (,RelPro,n), but only the latter can lead to a successful parse because the grammar does not allow either a PP phrase or a RelPro phrase to begin with a Det phrase. The technique for guiding operation 4 so that it produces only useful elements consists of computing beforehand all the phrase types that can begin each phrase. Then the following operations are used in place of operation 4:

4'a. If an element of the form (,X,n) is at the (right) end of the list, replace it with the pair (,X,p) (Y1 ... Yn,Z,n) when there is a grammar rule of the form Z -> X Y1 ... Yn in the grammar, provided the rule is not left-recursive.

4'b. If an element of the form (,X,n) is followed by an element of the form (U1 ... Um,V,F), replace (,X,n) with the pair (,X,p) (Y1 ... Yn,Z,n) when there is a grammar rule of the form Z -> X Y1 ... Yn in the grammar, provided the rule is not left-recursive, and a Z phrase can begin a U1 phrase or Z = U1.
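One way to precompute the "can begin" information and to apply the filter of operation 4'b is sketched below. The table construction and data layout are assumptions; only the filtering condition itself comes from the text, and operation 4'a, at the right end of the list, is applied without it.

    # Precompute, for each phrase type Z, all phrase types (and words) that can
    # begin a Z phrase, then use that table to filter operation 4'b.
    from collections import defaultdict

    RULES = [
        ("S", ("NP", "VP")), ("NP", ("Pro",)), ("NP", ("NP1", "*", "RelPro", "VP")),
        ("VP", ("V", "NP")), ("NP1", ("Det", "N")), ("NP1", ("NP1", "PP")),
        ("PP", ("Prep", "NP")), ("Pro", ("this",)), ("RelPro", ("that",)),
        ("Det", ("the",)), ("Det", ("that",)),        # including the supposed rule Det -> that
        ("Prep", ("in",)), ("Prep", ("by",)),
    ]

    def can_begin(rules):
        """begins[Z] = everything that can begin a Z phrase, i.e., the
        reflexive-transitive closure of the left-corner relation."""
        begins = defaultdict(set)
        for lhs, rhs in rules:
            begins[lhs].add(lhs)
            if rhs:
                begins[lhs].add(rhs[0])
        changed = True
        while changed:
            changed = False
            for z in list(begins):
                for x in list(begins[z]):
                    new = begins.get(x, set()) - begins[z]
                    if new:
                        begins[z] |= new
                        changed = True
        return begins

    BEGINS = can_begin(RULES)

    def op4b_allowed(z, next_element_rest):
        """Operation 4'b filter: a new Z phrase is worth starting only if the
        element to the right is looking next for a U1 that a Z phrase can begin."""
        if not next_element_rest:
            return False
        u1 = next_element_rest[0]
        return z == u1 or z in BEGINS[u1]

    # (,that,n) (PP,NP1,e) (RelPro VP,NP,e): a Det cannot begin a PP or a RelPro,
    # but a RelPro can satisfy (RelPro VP,NP,e) directly.
    print(op4b_allowed("Det", ("PP",)), op4b_allowed("Det", ("RelPro", "VP")))  # False False
    print(op4b_allowed("RelPro", ("RelPro", "VP")))                             # True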
It is sufficient to consider only the first element after an element of the form (,X,n) because, if operation 4'b cannot be applied, either that first element can be deleted by operation 6 or the parse is going to fail anyway. Thus, in our example above, operation 6 can be used to delete the element (PP,NP1,e) so that operation 4'b can be applied to (,that,n) (RelPro VP,NP,e). This technique is essentially the same as the selective filter technique described by Griffiths and Petrick (1965) for the left-corner parsing algorithm (in their terminology).

Another technique increased efficiency further by postponing the decision about which of several grammar rules to apply via operations 3 or 4' for as long as possible. The grammar rules were stored in Lisp list structures so that rules having the same left-hand side and a common initial segment in their right-hand side shared a common list structure. For example, if the grammar consists of the rules

    X -> Y Z U
    X -> Y Z V
    W -> Y Z U

these rules are stored as the list structure

    ((X (Z (U) (V)))
     (W (Z (U))))

which is stored on the property list for Y. The common initial segment shared by the first two rules is represented by the same path to the atom Z in the list structure. The component (X (Z (U) (V))) in this list structure means that a Y phrase can be followed by a Z phrase and then either a U phrase or a V phrase to make an X phrase. When a Y phrase is found, and it is decided to try to find an X phrase, this component makes it possible to look for a Z phrase, but it postpones the decision as to whether the Z phrase should be followed by a U phrase or a V phrase until after the Z phrase has been found. The left-hand sides (X and W) of the rules are listed first to facilitate operation 4'b. This technique is similar to a technique used by Irons (1961) and described by Griffiths and Petrick (1965).
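The shared-prefix storage can be mimicked outside Lisp as nested dictionaries, keyed first by the left corner Y, then by the left-hand side, then by the remaining right-hand-side symbols (a hypothetical sketch, not the original property-list code):

    def build_rule_index(rules):
        """index[Y][X] is a trie over the symbols that may follow a Y phrase when
        trying to build an X phrase; the key None marks a point where a rule ends."""
        index = {}
        for lhs, rhs in rules:
            trie = index.setdefault(rhs[0], {}).setdefault(lhs, {})
            for sym in rhs[1:]:
                trie = trie.setdefault(sym, {})
            trie[None] = True
        return index

    RULES = [("X", ("Y", "Z", "U")), ("X", ("Y", "Z", "V")), ("W", ("Y", "Z", "U"))]
    print(build_rule_index(RULES)["Y"])
    # {'X': {'Z': {'U': {None: True}, 'V': {None: True}}}, 'W': {'Z': {'U': {None: True}}}}

After a Y phrase is found and an X phrase is attempted, the parser walks down index["Y"]["X"] one component at a time, so the choice between a U phrase and a V phrase is not made until a Z phrase has been found, mirroring the postponement described above.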
4. Related Work

There are two sentence analysis programs with parsing algorithms that resemble ours in many ways, though theirs have been described in terms that are intimately tied to the particular semantic and syntactic representations used by those programs. The programs are PARSIFAL, by Marcus (1976, 1978a,b), and the analyzer of Riesbeck (1975a,b).

4.1 PARSIFAL (Marcus)

The basic structural unit of PARSIFAL is the node, which corresponds approximately to our phrase status element. Nodes are kept in two data structures, a pushdown stack called the active node stack, and a three-place constituent buffer. (The buffer actually has five places in it, but the procedures that use it work with only three places at a time.) The grammar rules are encoded into rule packets. Since the organization of these packets has to do with the efficient selection of appropriate grammar rules and with the invocation of procedures for adding structural details to nodes (the procedures we want to ignore while looking at the parsing algorithm of PARSIFAL), we will ignore the rule packets in this comparison. The essential fact about rule packets is that they examine only the top node of the stack, the S or NP node nearest the top of the stack, and up to three nodes in the buffer.

The basic operations that can be performed on these structures include attaching the first node in the buffer to the top node of the stack, which corresponds to operation 3 in our algorithm; creating a new node that holds one or more of the nodes in the buffer, which corresponds to operation 4; and reactivating a node (pushing it onto the stack) that has already been attached to another node so that more nodes can be attached to it, which corresponds to the phrase-extending operations 1, 2, and 6. PARSIFAL has one operation that is not similar to the operations of our algorithm, which is that it can create nodes for dummy NP phrases called traces. These nodes are intended to provide a way to account for phenomena that would otherwise require transformational grammar rules to explain them. Our algorithm does not allow such an operation; if such an operation should prove to be necessary, however, it would not be hard to add, or its effect could be produced by the procedures called.

One of the benefits of having a buffer in PARSIFAL is that the buffer allows for a kind of look-ahead based on phrases instead of just words. Thus the decision about what grammar rule to apply to the first node in the buffer can be based on the phrases that follow a certain point in the sentence under analysis instead of just the words. The system can look further ahead this way and still keep a strict limit on the amount of look-ahead available. We can get a similar effect with our algorithm if we restrict the application of its operations to the first four or five phrase status elements in the element list. In a sense, the first five elements of the list correspond to the buffer in PARSIFAL and the rest of the list corresponds to the stack. In fact, in a recent modification of PARSIFAL (Shipman and Marcus 1979) the buffer and stack were combined into a single data structure closely resembling our element list.
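The restriction just mentioned can be sketched in a few lines (the window size and list representation are assumptions for illustration):

    # Restrict the operations to a small window at the front of the element list,
    # so that the first few elements play the role of PARSIFAL's constituent
    # buffer and the remainder of the list plays the role of its stack.
    WINDOW = 5   # "the first four or five phrase status elements"

    def positions_in_window(elems):
        """Indices at which the parsing operations are allowed to apply."""
        return range(min(WINDOW, len(elems)))

    print(list(positions_in_window([None] * 8)))   # [0, 1, 2, 3, 4]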
4.2 Riesbeck's Analyzer

The basic structural unit of Riesbeck's analyzer is the Conceptual Dependency structure as developed by Schank (1973, 1975). A Conceptual Dependency structure is intended to represent the meaning of a word, phrase or sentence. The details of what a Conceptual Dependency structure is will not be discussed here.

The monitor in Riesbeck's analyzer has no list or stack on which operations are performed; instead, it has some global variables that serve the same purpose. Only a few of these variables concern us here. The variable WORD holds the current word being looked at and can be thought of as the front element of our element list. The variable SENSE holds the sense of WORD or of the noun phrase of which WORD is the head. It is like the second element in our list. The equivalent to the third element in our list is the variable REQUESTS, which holds a list of pattern-action rules. There are some other variables (such as ART-INT) that on occasion serve as the fourth element of the list.

Unlike the controllers in many other analysis programs, Riesbeck's monitor is not driven explicitly by a grammar. Instead, the syntactic information it uses is buried in the pattern-action rules attached to each word and word sense within his program's lexicon.
Take, for example, the common sense of the verb "give": one pattern-action rule says that if the verb is followed by a noun phrase denoting a person, the sense of that phrase is put in the recipient case slot of the verb. Another pattern-action rule says that if a following noun phrase denotes an object, the sense of the phrase is put in the object case slot of the verb. These pattern-action rules correspond to having grammar rules of the form VP -> give, and VP -> VP NP, where the pattern-action rules describe two different ways that a VP phrase and an NP phrase can combine into a VP phrase. There is a third pattern-action rule that changes the current sense of the word "to" in case it is encountered later in the sentence, but that is one of the actions that occurs below the syntactic level.

Noun phrases are treated by Riesbeck's monitor in a special way. Unmodified nouns are considered to be noun phrases directly, but phrases beginning with an article or adjective are handled by a special subroutine that collects the following adjectives and nouns before building the corresponding Conceptual Dependency structure. Once the whole noun phrase is found, the monitor examines the REQUESTS list to see if there are any pattern-action rules that can use the noun phrase. If so, the associated action is taken and the rule is marked so that it will not be used twice. The monitor is started with a pattern-action rule on the REQUESTS list that puts a beginning noun phrase in the subject case slot of whatever verb follows. (There are provisions to reassign structures to different slots if later words of the sentence require it.)

It can be seen that Riesbeck's analysis program works essentially by putting noun phrases in the case slots of verbs as the phrases are encountered in the sentence under analysis. In a syntactic sense, it builds a phrase of type sentence (first noun phrase plus verb) and then extends that phrase as far as possible, much as our algorithm does using left-recursive grammar rules. Prepositions and connectives between simple sentences complicate the process somewhat, but the process is still similar to ours at the highest level of program control.
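As a schematic illustration of the pattern-action idea only (this is not Riesbeck's code; the tests, slot names and data layout are invented for the example), a request list for "give" might look like this:

    # Pattern-action requests for "give": a following noun phrase denoting a
    # person fills the recipient slot, one denoting an object fills the object slot.
    def give_requests(frame):
        def fill(slot):
            def action(np):
                frame[slot] = np["sense"]
            return action
        return [
            {"test": lambda np: np["denotes"] == "person", "action": fill("recipient"), "used": False},
            {"test": lambda np: np["denotes"] == "object", "action": fill("object"), "used": False},
        ]

    def handle_noun_phrase(np, requests):
        """When a noun phrase is found, fire the first unused request whose pattern
        matches, then mark it so that it is not used twice."""
        for req in requests:
            if not req["used"] and req["test"](np):
                req["action"](np)
                req["used"] = True
                return

    verb_frame = {}
    requests = give_requests(verb_frame)
    handle_noun_phrase({"denotes": "person", "sense": "MARY"}, requests)
    handle_noun_phrase({"denotes": "object", "sense": "BOOK"}, requests)
    print(verb_frame)   # {'recipient': 'MARY', 'object': 'BOOK'}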
5. Concluding Remarks

Since our parsing algorithm deals only with syntax, it is not complete in itself, but can be combined with a variety of conceptual representations to make a sentence analyzer. Whenever an operation that combines phrases is performed, procedures to combine data structures can be called as well. When there is a choice of operations to be performed, procedures to make the choice by examining the data structures involved can be called, too. Because our algorithm combines phrases sooner than others, there is greater opportunity for the data structures to influence the progress of the parsing process. This makes the resulting sentence analyzer behave not only in a more human-like way (the closure property), but also in a more efficient way, because it is less likely to have to back up.

Although the programs of Marcus and Riesbeck share many of these same properties, the syntactic processing aspects of those programs are not clearly separated from the particular conceptual representations on which they are based. We believe that the parsing algorithm presented here captures many of the important properties of those programs so that they may be applied to conceptual representations based on other theories of natural language.

References

Aho, A. and Ullman, J., 1972. The Theory of Parsing, Translation and Compiling, Vol. 1, Prentice-Hall Inc., New Jersey.

Chomsky, N., 1965. Aspects of the Theory of Syntax, MIT Press, Cambridge, Mass.

Griffiths, T. and Petrick, S., 1965. On the relative efficiencies of context-free grammar recognizers. CACM 8, May, 289-300.

Grishman, R., 1975. A Survey of Syntactic Analysis Procedures for Natural Language. Courant Computer Science Report #8, Courant Institute of Mathematical Sciences, New York University, August. Also appears on AJCL Microfiche 47, 1976.

Heidorn, G., 1972. Natural language inputs to a simulation programming system. Report No. NPS-55HD72101A, Naval Postgraduate School, Monterey, Calif., October.

Irons, E., 1961. A syntax directed compiler for ALGOL 60. CACM 4, January, 51-55.

Kaplan, R., 1972. Augmented transition networks as psychological models of sentence comprehension. Artificial Intelligence 3, 77-100.

Kimball, J., 1973. Seven principles of surface structure parsing in natural language. Cognition 2, 15-47.

Lyons, J., 1970. Chomsky. Fontana/Collins, London.
Marcus, M., 1976. A design for a parser for English. ACM '76 Proceedings, Houston, Texas, Oct 20-22, 62-68.

Marcus, M., 1978a. Capturing linguistic generalizations in a parser for English. Proceedings of the Second National Conference of the Canadian Society for Computational Studies of Intelligence, University of Toronto, Toronto, Ontario, July 19-21, 64-73.

Marcus, M., 1978b. A computational account of some constraints on language. Theoretical Issues in Natural Language Processing-2, D. Waltz, general chairman, University of Illinois at Urbana-Champaign, July 25-27, 236-246.

Novak, G., 1976. Computer understanding of physics problems stated in natural language. AJCL Microfiche 53.

Parkison, R., Colby, K. and Faught, W., 1977. Conversational language comprehension using integrated pattern-matching and parsing. Artificial Intelligence 9, October, 111-134.

Riesbeck, C., 1975a. Computational understanding. Theoretical Issues in Natural Language Processing, R. Schank and B. Nash-Webber, eds., Cambridge, Mass., June 10-13, 11-16.

Riesbeck, C., 1975b. Conceptual analysis. In Schank (1975), 83-156.

Schank, R., 1973. Identification of conceptualizations underlying natural language. In Schank, R. and Colby, K., eds., Computer Models of Thought and Language, W. H. Freeman and Company, San Francisco.

Schank, R., 1975. Conceptual Information Processing. American Elsevier, New York.

Shipman, D. and Marcus, M., 1979. Towards minimal data structures for deterministic parsing. IJCAI-79, Tokyo, Aug 20-23, 815-817.

Wanner, E. and Maratsos, M., 1978. In Halle, M., Bresnan, J. and Miller, G., eds., Linguistic Theory and Psychological Reality, MIT Press, Cambridge, Mass., 119-161.

Winograd, T., 1972. Understanding Natural Language. Academic Press, New York.

Woods, W., 1970. Transition network grammars for natural language analysis. CACM 13, October, 591-606.

Daniel Chester is an assistant professor in the Department of Computer Sciences of the University of Texas at Austin. He received his Ph.D. in mathematics from the University of California at Berkeley in 1973.
