Parsing Some Constrained Grammar Formalisms


K. Vijay-Shanker* (University of Delaware)
David J. Weir† (University of Sussex)

In this paper we present a scheme to extend a recognition algorithm for Context-Free Grammars (CFG) that can be used to derive polynomial-time recognition algorithms for a set of formalisms that generate a superset of the languages generated by CFG. We describe the scheme by developing a Cocke-Kasami-Younger (CKY)-like pure bottom-up recognition algorithm for Linear Indexed Grammars and show how it can be adapted to give algorithms for Tree Adjoining Grammars and Combinatory Categorial Grammars. This is the only polynomial-time recognition algorithm for Combinatory Categorial Grammars that we are aware of.

The main contribution of this paper is the general scheme we propose for parsing a variety of formalisms whose derivation process is controlled by an explicit or implicit stack. The ideas presented here can be suitably modified for other parsing styles or used in the generalized framework set out by Lang (1990).

1. Introduction

This paper presents a scheme to extend known recognition algorithms for Context-Free Grammars (CFG) in order to obtain recognition algorithms for a class of grammatical formalisms that generate a strict superset of the set of languages generated by CFG. In particular, we use this scheme to give recognition algorithms for Linear Indexed Grammars (LIG), Tree Adjoining Grammars (TAG), and a version of Combinatory Categorial Grammars (CCG). These formalisms belong to the class of mildly context-sensitive grammar formalisms identified by Joshi (1985) on the basis of some properties of their generative capacity. The parsing strategy that we propose can be applied to the formalisms listed, as well as to others that have similar characteristics (as outlined below) in their derivational process.
Some of the main ideas underlying our scheme have been influenced by observations about the constructions used in the proofs of the equivalence of these formalisms and Head Grammars (HG) (Vijay-Shanker 1987; Weir 1988; Vijay-Shanker and Weir 1993).

There are similarities between the TAG and HG derivation processes and that of Context-Free Grammars (CFG). This is reflected in common features of the parsing algorithms for HG (Pollard 1984) and TAG (Vijay-Shanker and Joshi 1985) and the CKY algorithm for CFG (Kasami 1965; Younger 1967). In particular, what can happen at each step in a derivation can depend only on which of a finite set of "states" the derivation is in (for CFG these states can be considered to be the nonterminal symbols). This property, which we refer to as the context-freeness property, is important because it allows one to keep only a limited amount of context during the recognition process, which results in polynomial-time algorithms. In the recognition algorithms mentioned above for CFG, HG, and TAG, this is reflected in the fact that the recognizer can encode intermediate stages of the derivation with a bounded number of states. An array is used whose entries are associated with a given component of the input.

* Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716. E-mail: vijay@udel.edu.
† School of Cognitive and Computing Sciences, University of Sussex, Brighton BN1 9QH, U.K. E-mail: davidw@cogs.susx.ac.uk.

© 1994 Association for Computational Linguistics. Computational Linguistics Volume 19, Number 4.
In the case of the CKY algorithm, the presence of a particular nonterminal in an array entry is used to encode the fact that the nonterminal derives the associated substring of the input. The context-freeness of CFG has the consequence that there is no need to encode the way, or ways, in which a nonterminal came to be placed in an array entry.

In this respect, the derivation processes of CCG and LIG would appear to differ from that of CFG. In these systems unbounded stack-like structures replace the role played by nonterminals in controlling derivation choices. This would seem to suggest that the context-freeness property of CFG, HG, and TAG derivations no longer holds: unbounded stacks can encode an unbounded number of earlier derivation choices. In fact, while the path sets¹ of CFG, HG, and TAG derivation trees are regular languages, the path sets of CCG and LIG are context-free languages. With respect to recognition algorithms, this suggests that the array (whose entries contain nonterminals in the case of CFG) would need to contain complete encodings of unbounded stacks, giving an exponential-time algorithm.
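The CKY recognizer for CFG that serves as the starting point can be sketched as follows, to make the array convention concrete. This is a minimal illustration assuming a grammar in Chomsky normal form; the dictionaries `unary` and `binary` and the function `cky_recognize` are hypothetical names chosen here, not taken from the paper. Cell `P[i][d]` holds the nonterminals deriving the substring of length `d` that starts at position `i` (0-based in the code), mirroring the $P[i,d]$ convention used below.

```python
def cky_recognize(words, unary, binary, start="S"):
    """Return True iff `start` derives `words` under a CNF grammar.

    unary:  dict terminal -> set of nonterminals A with A -> terminal
    binary: dict (B, C)  -> set of nonterminals A with A -> B C
    P[i][d] is the set of nonterminals deriving words[i:i+d].
    """
    n = len(words)
    if n == 0:
        return False
    P = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        P[i][1] = set(unary.get(w, ()))
    for d in range(2, n + 1):              # span length
        for i in range(n - d + 1):         # span start
            for dl in range(1, d):         # length of the left part
                for B in P[i][dl]:
                    for C in P[i + dl][d - dl]:
                        P[i][d] |= binary.get((B, C), set())
    # Only the nonterminal is kept, not how it got into the cell:
    # this is the context-freeness property discussed above.
    return start in P[0][n]
```

For example, with `unary = {"a": {"A"}, "b": {"B"}}` and `binary = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}` (a grammar for $a^nb^n$, $n \ge 1$), `cky_recognize("aabb", unary, binary)` is `True` and `cky_recognize("aab", unary, binary)` is `False`.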
However, in LIG and CCG, the use of stacks to control derivations is limited in that different branches of a derivation cannot share stacks. Thus, despite the above observations, the context-freeness property does in fact hold. A detailed explanation of why this is so will be presented below. We propose a method to extend the CKY algorithm to handle the limited use of stacks found in CCG and LIG. We have chosen to adapt the CKY algorithm since it is the simplest form of bottom-up parsing. A similar approach using the Earley algorithm is also possible, although not considered here. Since the use of stacks is most explicit in the LIG formalism, we describe our approach in detail by developing a recognition algorithm for LIG (Sections 2 and 3). We then show how the general approach suggested in the parser for LIG can be tailored to CCG (Section 4). In the above discussion TAG has been grouped with HG. However, TAG can also be viewed as making use of stacks in the same way as LIG and CCG. In Section 5 we show how the LIG algorithm presented in Section 3 can be adapted for TAG.

2. Linear Indexed Grammars

An Indexed Grammar (Aho 1968) can be viewed as a CFG in which objects are nonterminals with an associated stack of symbols.
In addition to rewriting nonterminals, the rules of the grammar can have the effect of pushing or popping symbols on top of the stacks that are associated with each nonterminal. Gazdar (1988) discussed a restricted form of Indexed Grammars in which the stack associated with the nonterminal on the left of each production can only be associated with one of the occurrences of nonterminals on the right of the production. Stacks of bounded size are associated with the other occurrences of nonterminals on the right of the production. We call these Linear Indexed Grammars (LIG).²

¹ The path set of a tree is the set of strings labeling paths from the root to the frontier of the tree. The path set of a tree set is the union of the path sets of the trees in the set.
² The name Linear Indexed Grammars is used by Duske and Parchmann (1984) to refer to a different restriction on Indexed Grammars, in which productions were restricted to have only a single nonterminal on their right-hand side.

Definition 2.1
A LIG, $G$, is denoted by $(V_N, V_T, V_I, S, P)$, where $V_N$ is a finite set of nonterminals, $V_T$ is a finite set of terminals, $V_I$ is a finite set of indices (stack symbols), $S \in V_N$ is the start symbol, and $P$ is a finite set of productions.

We adopt the convention that $\alpha, \beta$ (with or without subscripts and primes) denote members of $V_I^*$, and $\gamma$ denotes a stack symbol.
As usual, $A, B, C$ will denote nonterminals, $a, b, c$ will denote terminals, and $u, v, w$ will denote members of $V_T^*$.

Definition 2.2
A pair consisting of a nonterminal, say $A$, and a string of stack symbols, say $\alpha$, will be called an object of the grammar and will be written as $A(\alpha)$. Given a grammar $G$, we define the set of objects $V_C(G) = \{ A(\alpha) \mid A \in V_N, \alpha \in V_I^* \}$.

We use $\Upsilon$ to denote strings in $(V_C(G) \cup V_T)^*$. We write $A(..\alpha)$ to denote the nonterminal $A$ associated with an arbitrary stack with the string $\alpha$ on top. Also, we use $A()$ to denote that an empty stack is associated with $A$. The general form of a production in a LIG is:

$$A(..\alpha) \to w_1 A_1(\alpha_1) w_2 \cdots A_{i-1}(\alpha_{i-1})\, w_i\, A_i(..\alpha_i)\, w_{i+1} A_{i+1}(\alpha_{i+1}) \cdots A_n(\alpha_n)\, w_{n+1}$$

for $n > 0$, where $w_1, \ldots, w_{n+1}$ are members of $V_T^*$.

Definition 2.3
The derivation relation, $\Rightarrow$, is defined as follows. If the above production is used, then for any $\beta \in V_I^*$ and $\Upsilon_1, \Upsilon_2 \in (V_C(G) \cup V_T)^*$:

$$\Upsilon_1 A(\beta\alpha) \Upsilon_2 \Rightarrow \Upsilon_1 w_1 A_1(\alpha_1) w_2 \cdots A_{i-1}(\alpha_{i-1})\, w_i\, A_i(\beta\alpha_i)\, w_{i+1} A_{i+1}(\alpha_{i+1}) \cdots A_n(\alpha_n)\, w_{n+1} \Upsilon_2$$

We use $\Rightarrow^*$ for the reflexive, transitive closure of $\Rightarrow$. As a result of the linearity in the general form of the rules, the stack $\beta\alpha$ associated with the object on the left-hand side of the derivation and the stack $\beta\alpha_i$ associated with one object on the right-hand side have the initial part $\beta$ in common. In the derivation above, we will say that this object $A_i(\beta\alpha_i)$ is the distinguished child of $A(\beta\alpha)$. Given a derivation, the distinguished descendant relation is the reflexive, transitive closure of the distinguished child relation.
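A single rewriting step of Definition 2.3 can be made concrete with the sketch below. It assumes a hypothetical encoding: stacks are Python tuples with the top at the right, a right-hand-side item is either a terminal string or a triple `(nonterminal, alpha_i, distinguished)`, and `exact=True` marks productions such as $T() \to c$ that have no "..". The names `Obj`, `Rule`, `step`, `EX21`, and `derive` are illustrative only. Note that the shared part `beta` is copied only to the distinguished child, which is the linearity restriction.

```python
from typing import NamedTuple, Tuple

Stack = Tuple[str, ...]   # rightmost element is the top of the stack


class Obj(NamedTuple):
    """An object A(alpha): a nonterminal paired with a stack."""
    nt: str
    stack: Stack


class Rule(NamedTuple):
    lhs: str
    pop: Stack            # alpha in the left-hand side A(..alpha)
    rhs: tuple            # terminals (str) or (nt, alpha_i, distinguished) triples
    exact: bool = False   # True when the rule has no "..", e.g. T() -> c


def step(form, pos, rule):
    """Rewrite the object at form[pos] using `rule` (Definition 2.3)."""
    obj = form[pos]
    k = len(rule.pop)
    if not isinstance(obj, Obj) or obj.nt != rule.lhs or len(obj.stack) < k:
        raise ValueError("rule does not apply")
    if obj.stack[len(obj.stack) - k:] != rule.pop or (rule.exact and len(obj.stack) != k):
        raise ValueError("rule does not apply")
    beta = obj.stack[:len(obj.stack) - k]      # shared with the distinguished child
    out = []
    for item in rule.rhs:
        if isinstance(item, str):
            out.append(item)                    # terminal symbol
        else:
            nt, alpha, distinguished = item
            out.append(Obj(nt, beta + alpha if distinguished else alpha))
    return form[:pos] + out + form[pos + 1:]


# Productions of Example 2.1, with gamma_a, gamma_b written "ga", "gb".
EX21 = {
    "S->aS": Rule("S", (), ("a", ("S", ("ga",), True))),
    "S->bS": Rule("S", (), ("b", ("S", ("gb",), True))),
    "S->T":  Rule("S", (), (("T", (), True),)),
    "T->Ta": Rule("T", ("ga",), (("T", (), True), "a")),
    "T->Tb": Rule("T", ("gb",), (("T", (), True), "b")),
    "T->c":  Rule("T", (), ("c",), exact=True),
}


def derive(rule_names):
    """Apply the named rules to the leftmost object, starting from S()."""
    form = [Obj("S", ())]
    for name in rule_names:
        pos = next(j for j, x in enumerate(form) if isinstance(x, Obj))
        form = step(form, pos, EX21[name])
    return "".join(form)    # raises TypeError if an object is left over


print(derive(["S->aS", "S->bS", "S->bS", "S->T",
              "T->Tb", "T->Tb", "T->Ta", "T->c"]))   # abbcabb
```

The rule sequence in the usage line reproduces the derivation of $abbcabb$ from Example 2.1; `derive(["S->T", "T->c"])` gives `"c"`, the case in which $w$ is empty.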
The language generated by a LIG $G$ is $L(G) = \{ w \mid S() \Rightarrow^* w \}$.

Example 2.1
The LIG $G = (\{S, T\}, \{a, b, c\}, \{\gamma_a, \gamma_b\}, S, P)$ generates $\{ wcw \mid w \in \{a, b\}^* \}$, where $P$ contains the following productions:

$$S(..) \to a\, S(..\gamma_a) \qquad S(..) \to b\, S(..\gamma_b) \qquad S(..) \to T(..)$$
$$T(..\gamma_a) \to T(..)\, a \qquad T(..\gamma_b) \to T(..)\, b \qquad T() \to c$$

A derivation tree for the string $abbcabb$ is given in Figure 1.

Figure 1: Derivation tree for LIG.

In this paper, rather than adopting the general form of rules given above, we restrict our attention to grammars whose rules have the following forms. In fact, these can easily be seen to constitute a normal form for LIG.

1. $A(\alpha) \to c$ where $c \in V_T \cup \{\epsilon\}$ and the length of $\alpha$ satisfies $\mathrm{len}(\alpha) \ge 1$.
2. $A(..\gamma_1 \cdots \gamma_m) \to A_p(..\gamma_p)\, A_s(\alpha_s)$ where $m \ge 0$.
3. $A(..\gamma_1 \cdots \gamma_m) \to A_s(\alpha_s)\, A_p(..\gamma_p)$ where $m \ge 0$.
4. $A(..\gamma_1 \cdots \gamma_m) \to A_p(..\gamma_p)$ where $m \ge 0$.

We allow at most two symbols on the right-hand side of productions because we intend to develop CKY-style algorithms. In the above rules we say that $A_p(..\gamma_p)$ is the primary constituent and $A_s(\alpha_s)$ is the secondary constituent. Notice also that in a derivation using such a rule, the primary constituent yields the distinguished child.
(In grammatical theories that use a stack of subcategorized arguments, the top of the stack in the primary constituent determines which secondary constituent it can combine with.)

2.1 Terminators

Let us consider how we may extend the CKY algorithm for the recognition of LIG. Given a fixed grammar $G$ and an input $a_1 \cdots a_n$, the recognition algorithm will complete an $n \times n$ array $P$ such that an encoding of $A(\alpha)$ is stored in $P[i,d]$ if and only if $A(\alpha) \Rightarrow^* a_i \cdots a_{i+d-1}$. The algorithm will operate bottom-up. For example, if $G$ contains the rule $A(..\gamma_1 \cdots \gamma_m) \to A_p(..\gamma_p)\, A_s(\alpha_s)$, and we find an encoding of $A_p(\alpha_p\gamma_p)$ in $P[i, d_p]$ and an encoding of $A_s(\alpha_s)$ in $P[i + d_p, d_s]$, then an encoding of $A(\alpha_p\gamma_1 \cdots \gamma_m)$ will be stored in $P[i, d_p + d_s]$.

What encoding scheme should be used? The most straightforward possibility would be to store a complete encoding of $A(\alpha_p\gamma_1 \cdots \gamma_m)$ in $P[i, d_p + d_s]$. However, in general, if an object $A(\alpha)$ derives a string of length $d$, then the length of $\alpha$ is $O(d)$.³ Hence there can be $O(k^d)$ objects that derive a substring of the input of length $d$, for some constant $k$, and so the space and time complexity of this algorithm is exponential in the worst case.⁴ The inefficiency of this approach can be seen by drawing an analogy with a corresponding algorithm for CFG.
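The straightforward strategy just described, in which each cell $P[i,d]$ holds complete objects (nonterminal plus entire stack), can be sketched as below. It is shown only to make the rejected strategy concrete and is not the algorithm developed in this paper. The grammar is a hypothetical normal-form LIG for $\{wcw \mid w \in \{a,b\}^+\}$ constructed for this illustration: the bottom-of-stack symbol `#`, the start object $S(\#)$, and the helper nonterminals `Xa`, `Xb` are all assumptions, not part of the paper. Stacks are tuples with the top at the right.

```python
# Naive bottom-up recognizer that stores COMPLETE objects (nonterminal, whole
# stack) in each cell: the exponential strategy described in the text.
# Hypothetical normal-form LIG for {w c w : w in {a,b}+}; "#" is an assumed
# bottom-of-stack marker and S(#) an assumed start object.

TERMINAL = [("T", ("#",), "c"), ("Xa", ("#",), "a"), ("Xb", ("#",), "b")]  # type 1

# (lhs, popped gamma_1..gamma_m, primary nt, gamma_p, secondary object, primary_is_left)
BINARY = [
    ("S", (), "S", "ga", ("Xa", ("#",)), False),  # type 3: S(..) -> Xa(#) S(..ga)
    ("S", (), "S", "gb", ("Xb", ("#",)), False),  # type 3: S(..) -> Xb(#) S(..gb)
]
for g in ("#", "ga", "gb"):                       # type 2: T(..g gx) -> T(..g) Xx(#)
    BINARY.append(("T", (g, "ga"), "T", g, ("Xa", ("#",)), True))
    BINARY.append(("T", (g, "gb"), "T", g, ("Xb", ("#",)), True))

UNARY = [("S", ("ga",), "T", "ga"), ("S", ("gb",), "T", "gb")]  # type 4: S(..g) -> T(..g)


def close(cell):
    """Add objects obtained by unary (type 4) rules until nothing changes.
    Assumes no chain of unary rules can grow the stack forever."""
    agenda = list(cell)
    while agenda:
        nt, stack = agenda.pop()
        for lhs, pop, ap, gp in UNARY:
            if nt == ap and stack and stack[-1] == gp:
                new = (lhs, stack[:-1] + pop)
                if new not in cell:
                    cell.add(new)
                    agenda.append(new)


def recognize(word, start=("S", ("#",))):
    n = len(word)
    # P[i][d]: set of complete objects deriving word[i:i+d] (0-based i)
    P = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        P[i][1] = {(a, alpha) for a, alpha, t in TERMINAL if t == ch}
        close(P[i][1])
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            for dp in range(1, d):
                left, right = P[i][dp], P[i + dp][d - dp]
                for lhs, pop, ap, gp, sec, prim_left in BINARY:
                    prims, secs = (left, right) if prim_left else (right, left)
                    if sec not in secs:
                        continue
                    for nt, stack in prims:
                        if nt == ap and stack and stack[-1] == gp:
                            P[i][d].add((lhs, stack[:-1] + pop))
            close(P[i][d])
    return n > 0 and start in P[0][n]
```

Here `recognize("abbcabb")` is `True` and `recognize("abcba")` is `False`. The stacks stored in a cell for a span of length $d$ grow with $d$, so, as argued above, the number of distinct objects per cell can in general be exponential in $d$.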
Suppose that, rather than storing sets of nonterminals in each array entry, we store a set of trees containing all derivation subtrees that yield the corresponding substring. The problem with this is that the number of derivation trees is exponential with respect to the length of the string spanned. However, there is no need to store derivation trees since, in considering the combination of subderivation trees in the CFG, only the nonterminals at the roots of the trees are relevant in determining whether there is a production that licenses the combination.

Likewise, because of the last-in first-out behavior in the manipulation of stacks in LIG, we will argue that it is not necessary to store the entire stack. For instance, consider the derivation depicted by the tree shown in Figure 2, from the point of view of recording the derivation in a bottom-up parser (such as CKY). Let a node $\eta_1$ labeled $B(\beta\gamma_1 \cdots \gamma_k \cdots \gamma_m)$ be a distinguished descendant of a node $\eta$ labeled $A(\beta\gamma_1 \cdots \gamma_k)$, as shown in the figure. Viewing the tree bottom-up, let the node $\eta$ be the first node above $\eta_1$ where $\gamma_k$ gets exposed as the top of the stack. Because of the last-in first-out behavior, every distinguished descendant of $\eta$ above $\eta_1$ will have a label of the form $A'(\beta\gamma_1 \cdots \gamma_k\alpha)$ where $\mathrm{len}(\alpha) \ge 1$. In order to record the derivation from $A(\beta\gamma_1 \cdots \gamma_k)$, it would be sufficient to store $A$ and $\gamma_1 \cdots \gamma_k$ if we could also access the entry that records the derivation from $A_t(\beta\gamma_t)$. In the entry for $\eta$, a pointer to the entry for $A_t(\beta\gamma_t)$ would enable the recovery of the stack below the top $k$ symbols $\gamma_1 \cdots \gamma_k$. However, this scheme works well only when $k \ge 2$. For instance, when $k = 1$, suppose we recorded only $A$, $\gamma_1$, and a pointer to the entry for $A_t(\beta\gamma_t)$, and suppose that we are looking for the symbol below $\gamma_1$, i.e., the top of $\beta$. Then it is possible that, in a similar way, the latter entry also records just $A_t$, $\gamma_t$, and a pointer to some other entry from which to retrieve $\beta$. This situation can occur arbitrarily many times.

Consider the derivation depicted in Figure 3. In this derivation we have indicated the branch containing only the distinguished descendants. We will assume that the node labeled $D(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k \cdots \gamma'_n)$ is the closest distinguished descendant of $C(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k)$ such that every node between them has a label of the form $C'(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k\alpha')$ where $\mathrm{len}(\alpha') \ge 1$. Therefore, any node between the node labeled $C(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k)$ and the node labeled $B(\beta\gamma_1 \cdots \gamma_m)$ will have a label of the form $C''(\beta\gamma_1 \cdots \gamma_{k-1}\alpha'')$ where $\mathrm{len}(\alpha'') \ge 1$. Now the entries representing derivations from both $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k)$ and $C(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k)$ could point back to the entry for the derivation from $A_t(\beta\gamma_t)$, whereas the entry for $C'(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k\alpha')$ will point back to the entry for $A$. We shall now formalize these notions by defining a terminator.

³ For instance, consider the grammar in Example 2.1 and the derivation in Figure 1. In general we can have derivations of the form $T(\gamma_a^n) \Rightarrow^* ca^n$. However, if there exist productions of the form $A(\alpha) \to \epsilon$, then the length of the stack in objects is not even bounded by the length of the strings they derive.
⁴ The CCG parsing algorithms that have been proposed so far follow this strategy (Pareschi and Steedman 1987; Tomita 1988).

Figure 2: Recovering the rest of the stack (part 1).
Figure 3: Recovering the rest of the stack (part 2).
Figure 4: Definition of a terminator.

Definition 2.4
Suppose that we have the derivation tree in Figure 4 that depicts the following derivation:

$$A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma) \Rightarrow^* u\, B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)\, w \Rightarrow u\, A_t(\beta\gamma_t)\, A_s(\alpha_s)\, w \Rightarrow^* uvw$$

or similarly:

$$A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma) \Rightarrow^* u\, B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)\, w \Rightarrow u\, A_s(\alpha_s)\, A_t(\beta\gamma_t)\, w \Rightarrow^* uvw$$

where the following conditions hold:

1. $2 \le k \le m$.
2. The nodes labeled $B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)$ and $A_t(\beta\gamma_t)$ are distinguished descendants of the node labeled $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$ in the respective trees.
3. For any distinguished descendant labeled $C(\alpha')$ between the nodes labeled $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$ and $B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)$, $\alpha'$ is of the form $\beta\gamma_1 \cdots \gamma_{k-1}\alpha$ where $\mathrm{len}(\alpha) \ge 1$. Note that the nodes labeled $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$ and $B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)$ need not be different.

The node labeled $A_t(\beta\gamma_t)$ is the k-terminator of the node labeled $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$.

When it is clear from context, rather than saying that a node is a terminator of another, we will assume that terminators have been defined on the objects that participate in a derivation as well. For instance, in the above derivations, we will say that $A_t(\beta\gamma_t)$ is the k-terminator of $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$. Also, when the derivation is clear from context, we will omit mention of the derivation (or derivation tree). Additionally, we will say that a node (object) has a terminator if it has a k-terminator for some $k$.

We will now state some properties of terminators that influence the design of our recognition algorithm.

Definition 2.5
Given a grammar $G$, define $\mathrm{MCL}(G)$ (Maximum Change in Length) as:

$$\mathrm{MCL}(G) = \max\{ m \mid A(..\gamma_1 \cdots \gamma_m) \to \Upsilon_1 A_p(..\gamma_p) \Upsilon_2 \text{ is a production of } G \}$$

Henceforth, we will write MCL, since the grammar in question will always be known from context.

Observation 2.1
In a derivation tree, if a node (say $\eta$) has a k-terminator (say $\eta_t$), then $\eta_t$ is a distinguished descendant of $\eta$.
If the node $\eta$ is labeled $A(\beta\alpha)$ (where $\mathrm{len}(\alpha) = k$), then the node $\eta_t$ must be labeled $A_t(\beta\gamma_t)$ for some $A_t \in V_N$ and $\gamma_t \in V_I$. Furthermore, $2 \le k \le \mathrm{MCL}$.

Observation 2.2
In a derivation tree, if a node has a k-terminator, then it has a unique terminator.

If $\eta$ is the node in question, then we are claiming here not only that it has a unique k-terminator but also that there does not exist $k'$ with $k' \ne k$ such that $\eta$ has a $k'$-terminator. To see why this is the case, let some node $\eta$ have a k-terminator (for some $k$), say $\eta_t$. Using Observation 2.1, we can assume that they are labeled $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$ and $A_t(\beta\gamma_t)$, respectively, where $k - 1 \ge 1$. From the definition of terminators, we can assume that the parent of the terminator $\eta_t$ is a node (say $\eta'$) that has a label of the form $B(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k \cdots \gamma_m)$. Since (from the definition of terminators) every node between $\eta$ and $\eta'$ (inclusive) must have a label of the form $C(\beta\gamma_1 \cdots \gamma_{k-1}\alpha')$ where $\mathrm{len}(\alpha') \ge 1$, all of these nodes have stacks of length at least $\mathrm{len}(\beta) + k$, while the stack of $\eta_t$ has length $\mathrm{len}(\beta) + 1 < \mathrm{len}(\beta) + k$. It immediately follows that $\eta_t$ is the closest distinguished descendant of $\eta$ such that the length of the stack in the object labeling $\eta_t$ is strictly less than the length of the stack in the object labeling $\eta$. The uniqueness of terminators follows from this.

Observation 2.3
Consider the derivation $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma) \Rightarrow^* u A_t(\beta\gamma_t) w \Rightarrow^* uvw$, where $A_t(\beta\gamma_t)$ is the k-terminator of $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$.
Then for any $\beta'$ and $v'$, if $A_t(\beta'\gamma_t) \Rightarrow^* v'$, then we have the derivation $A(\beta'\gamma_1 \cdots \gamma_{k-1}\gamma) \Rightarrow^* u A_t(\beta'\gamma_t) w \Rightarrow^* uv'w$, where $A_t(\beta'\gamma_t)$ is the k-terminator of $A(\beta'\gamma_1 \cdots \gamma_{k-1}\gamma)$.

This follows from the fact that the derivation of $uA_t(\beta\gamma_t)w$ from $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma)$ is independent of $\beta$. Therefore we can replace $A_t(\beta\gamma_t) \Rightarrow^* v$ by $A_t(\beta'\gamma_t) \Rightarrow^* v'$. This is a very important property that is crucial for obtaining a polynomial-time algorithm.

Note that not all nodes have terminators. For example, if a node labeled $A(\alpha)$ is the parent of a node labeled $a$ (i.e., corresponding to the use of the production $A(\alpha) \to a$, where $a$ is a terminal symbol), then obviously this node does not have a terminator.

Definition 2.6
Given a grammar $G$, we define $\mathrm{MTL}(G)$ (Maximum Length in Terminal production) as:

$$\mathrm{MTL}(G) = \max\{ \mathrm{len}(\alpha) \mid A(\alpha) \to c \text{ is a production of } G \text{ where } c \in V_T \cup \{\epsilon\} \}$$

As in the case of MCL, we will use MTL rather than MTL(G).

Observation 2.4
In the derivation $A(\alpha) \Rightarrow^* w$, if $\mathrm{len}(\alpha) > \mathrm{MTL}$, then $A(\alpha)$ has a terminator.

There must be at least two steps in the above derivation since $\mathrm{len}(\alpha) > \mathrm{MTL}$. Moreover, we can assume that the node (say $\eta$) labeled by the object $A(\alpha)$ has a distinguished descendant, say $\eta'$, with label $B(\beta)$ such that $B(\beta) \Rightarrow c$. Therefore, $\mathrm{len}(\beta) \le \mathrm{MTL}$, and we may rewrite $w$ as $ucv$.
Since $\mathrm{len}(\alpha) > \mathrm{len}(\beta)$, we can find the closest distinguished descendant of $\eta$ labeled $C(\alpha')$, for some $C$ and $\alpha'$, such that $\mathrm{len}(\alpha') < \mathrm{len}(\alpha)$. By the argument made for Observation 2.2, that node is the terminator of $\eta$.

The above observations will be used in the following sections to explain the way in which we represent derivations in the parsing table. We conclude this section with an observation that has a bearing on the steps of the recognition algorithm.

Observation 2.5
Consider the following derivation:

$$A(\beta\gamma_1 \cdots \gamma_{k-1}) \Rightarrow \Upsilon_1 A_p(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k) \Upsilon_2 \Rightarrow^* u_1 A_p(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k) u_2 \Rightarrow^* u_1 v_1 A_t(\beta\gamma_t) v_2 u_2 \Rightarrow^* u_1 v_1 w v_2 u_2$$

where $A_p(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k)$ is the distinguished child of $A(\beta\gamma_1 \cdots \gamma_{k-1})$ and $A_t(\beta\gamma_t)$ is the k-terminator of $A_p(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k)$. Then $A_t(\beta\gamma_t)$ is the $(k-1)$-terminator of $A(\beta\gamma_1 \cdots \gamma_{k-1})$ if and only if $k > 2$ (recall from Observation 2.1 that a terminator index is at least 2). If $k = 2$, then $A(\beta\gamma_1)$ has a terminator if and only if $A_t(\beta\gamma_t)$ does. In fact, in this case, if $A_t(\beta\gamma_t)$ has a $k'$-terminator, then that terminator is also the $k'$-terminator of $A(\beta\gamma_1)$.

This can be seen by considering the derivation shown in Figure 3 and noting the sharing of the terminator of $C(\beta\gamma_1 \cdots \gamma_{k-1}\gamma'_k)$ and $A(\beta\gamma_1 \cdots \gamma_{k-1}\gamma_k)$.

3. Recognition Algorithms

As in the CKY algorithm, we will use a two-dimensional array, $P$, such that if $A(\alpha) \Rightarrow^* a_i \cdots a_{i+d-1}$, then a representation of this derivation will be recorded with an encoding of $A(\alpha)$ in $P[i,d]$.
Here we assume that the given input is $a_1 \cdots a_n$. We start our discussion by considering the data structures we use to record such objects and the derivations from them.

3.1 Anatomy of an Entry

We mentioned earlier that the stack in an object can be unboundedly large. We must first find a compact way to store encodings of such objects, whose size is not bounded by the grammar. In this section we provide some motivation for the encoding scheme used in the recognition algorithm by considering the bottom-up application of the rule

$$A(..\gamma_1 \cdots \gamma_m) \to A_p(..\gamma_p)\, A_s(\alpha_s)$$

and the encoding of the primary constituent.

The Head. An object with nonterminal $A_p$ and top of stack $\gamma_p$ will match the primary category of this rule. Thus, the first requirement is that at least this much of the object must be included in every entry, since it is needed to determine whether the rule can apply. This component is denoted $(A_p, \gamma_p)$ and is called the head of the entry. Thus, in general, an entry in $P[i,d]$ with the head $(A, \gamma)$ encodes derivations of $a_i \cdots a_{i+d-1}$ from an object of the form $A(\beta\gamma)$ for some $\beta \in V_I^*$.

Terminator-pointer. An encoding of the object $A_p(\beta\gamma_p)$ (the primary constituent) that derives the substring $a_i \cdots a_{i+d_p-1}$ (of the input string $a_1 \cdots a_n$) will be stored in the array element $P[i, d_p]$ in our CKY-style recognition algorithms. Now consider the encoding of $A_p(\beta\gamma_p)$ for some sufficiently long $\beta\gamma_p$.
While the head $(A_p, \gamma_p)$ of the entry is sufficient to determine whether the object in question can match the primary category of the rule, we will need to store more information in order to determine the content of the rest of the stack. In the above production, if $m = 0$, then the combination of $A_p(\beta\gamma_p)$ and $A_s(\alpha_s)$ results in $A(\beta)$. In order to record the derivation from $A(\beta)$, we need to know the top symbol of the stack $\beta$, i.e., the symbol below the top of the stack associated with the primary constituent. We need to recover the identity of this symbol from the encoding of the primary category. This is why we introduced the notion of terminators. As mentioned in Section 2.1, terminators can be used to access information about the rest of the stack. In the encoding of $A_p(\beta\gamma_p)$, we will store information that allows us to access the encoding of its terminator. The part of the entry encoding the terminator will be called the terminator pointer.

The Middle. Note that the object $A_p(\beta\gamma_p)$ (in the derivation $A_p(\beta\gamma_p) \Rightarrow^* a_i \cdots a_{i+d_p-1}$) can have a k-terminator where $k$ is between 2 and MCL. Therefore, from Observation 2.1 it follows that the terminator pointer can only be used to determine the $(k+1)$st symbol from the top. Therefore, assuming that $\beta = \beta'\gamma_1 \cdots \gamma_{k-1}$, the terminator pointer will allow us to access $\beta'$.
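The argument of Observation 2.2 (a terminator is the closest distinguished descendant whose stack is strictly shorter) and the split of a stack into head symbol, middle, and the symbol reached through the terminator can be checked mechanically. The following is a sketch under assumed conventions: a path of distinguished descendants is given as a list of stacks (tuples, top at the right) beginning with the node itself, rules are in the normal form of Section 2 so no stack on the path is empty, and `find_terminator` and `split_entry` are hypothetical helper names, not part of the paper's algorithm.

```python
from typing import List, Optional, Tuple

Stack = Tuple[str, ...]   # rightmost element is the top of the stack


def find_terminator(spine: List[Stack]) -> Optional[Tuple[int, int]]:
    """Return (index, k) such that spine[index] is the k-terminator of spine[0].

    spine[0] is the node's stack; spine[1:] are the stacks of its distinguished
    descendants, top-down. Following Observation 2.2, the terminator is the
    closest descendant whose stack is strictly shorter. Returns None if there
    is no such descendant.
    """
    n0 = len(spine[0])
    for j in range(1, len(spine)):
        if len(spine[j]) < n0:
            # node = beta g_1..g_{k-1} g, terminator = beta g_t
            return j, n0 - len(spine[j]) + 1
    return None


def split_entry(node: Stack, term: Stack):
    """Split a node's stack given its terminator's stack.

    Returns (top, middle, below): the top symbol used in the head, the middle
    g_1..g_{k-1}, and the (k+1)-st symbol from the top, which is the symbol
    just below the terminator's top (None if that part of the stack is empty).
    """
    assert term, "normal-form stacks on a distinguished path are never empty"
    beta = term[:-1]
    assert node[:len(beta)] == beta, "terminator must share the prefix beta"
    middle = node[len(beta):-1]
    below = beta[-1] if beta else None
    return node[-1], middle, below
```

For instance, `find_terminator([("#", "ga", "gb"), ("#", "ga"), ("#",)])` returns `(1, 2)`, and `split_entry(("#", "ga", "gb"), ("#", "ga"))` returns `("gb", ("ga",), "#")`: the middle has length $k - 1 = 1$, and the third symbol from the top, `"#"`, is read off the terminator rather than stored in the entry.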
(Recall from the definition that a $k$-terminator of $A(\beta'\gamma_1 \ldots \gamma_{k-1}\gamma_p)$ has the form $A_t(\beta'\gamma_t)$. Thus the $(k+1)$st symbol from the top in $A(\beta\gamma_p)$ is the same as the symbol below the top of the stack of the terminator.) Hence we also need to record the string $\gamma_1 \ldots \gamma_{k-1}$ in the encoding of $A_p(\beta'\gamma_1 \ldots \gamma_{k-1}\gamma_p)$. This part of the entry will be called the middle.

To summarize, the entry stored in $P[i, d_p]$ (where $\beta'\gamma_1 \ldots \gamma_{k-1}\gamma_p$ is assumed to be long enough that $A_p(\beta'\gamma_1 \ldots \gamma_{k-1}\gamma_p)$ is guaranteed to have a terminator) will have a head, $\langle A_p, \gamma_p \rangle$, and a tail comprised of a middle, $\gamma_1 \ldots \gamma_{k-1}$, and a terminator-pointer. Note that the length of the middle must be at least one but at most $\mathrm{MCL} - 1$, since from Observation 2.1 we know that $2 \le k \le \mathrm{MCL}$. We will call an entry of this kind a terminator-type entry.

We now discuss what we need to store in order to point to the terminator. Suppose we would like to record in $P[i,d]$ the derivation of $a_i \ldots a_{i+d-1}$ from $A(\beta\gamma_1 \ldots \gamma_{k-1}\gamma)$ shown below, and assume that $A_t(\beta\gamma_t)$ is the terminator in this derivation:

$$A(\beta\gamma_1 \ldots \gamma_{k-1}\gamma) \Rightarrow^* a_i \ldots a_{t-1}\, A_t(\beta\gamma_t)\, a_{t+d_t} \ldots a_{i+d-1} \Rightarrow^* a_i \ldots a_{t-1}\, a_t \ldots a_{t+d_t-1}\, a_{t+d_t} \ldots a_{i+d-1} = a_i \ldots a_{i+d-1}$$

From Observation 2.3, it follows that it is sufficient to use $(\langle A_t, \gamma_t \rangle, [t, d_t])$ as the terminator-pointer. This is because any entry with the head $\langle A_t, \gamma_t \rangle$ in $P[t, d_t]$ will
represent, in general, a derivation $A_t(\beta'\gamma_t) \Rightarrow^* a_t \ldots a_{t+d_t-1}$. This not only matches the above case; even if $\beta' \neq \beta$, from Observation 2.1 we have

$$A(\beta'\gamma_1 \ldots \gamma_{k-1}\gamma) \Rightarrow^* a_i \ldots a_{t-1}\, A_t(\beta'\gamma_t)\, a_{t+d_t} \ldots a_{i+d-1} \Rightarrow^* a_i \ldots a_{i+d-1}.$$

Thus, the use of the head information (plus the two indices) in the terminator-pointer captures the essence of Observation 2.3. It is this structure-sharing that allows us to achieve polynomial bounds for space and time. Note that the string derived from the terminator, $a_t \ldots a_{t+d_t-1}$, is a substring of $a_i \ldots a_{i+d-1}$. In such a case, i.e., when $i \le t$ and $i + d \ge t + d_t$, we will say that $\langle t, d_t \rangle \le \langle i, d \rangle$. We define $\langle t, d_t \rangle < \langle i, d \rangle$ if $\langle t, d_t \rangle \le \langle i, d \rangle$ and $\langle t, d_t \rangle \neq \langle i, d \rangle$. Any terminator-type entry in $P[i,d]$ can only have terminator-pointers of the form $(\langle A_t, \gamma_t \rangle, [t, d_t])$ where $\langle t, d_t \rangle \le \langle i, d \rangle$; since there are $O(d^2)$ such pairs $\langle t, d_t \rangle$, and the heads and middles range over finite sets determined by the grammar, the number of terminator-type entries in $P[i,d]$ is $O(d^2)$.

Definition 3.1
Given a grammar $G$, define MSL($G$) (Maximum Secondary constituent's stack Length) as
$$\mathrm{MSL}(G) = \max\{\mathrm{len}(\alpha_s) \mid A_s(\alpha_s) \text{ is the secondary constituent of a production}\}.$$
Henceforth we will write MSL rather than MSL($G$).

We now consider the question of when a terminator-type entry is appropriate. Of course, if $A(\alpha) \Rightarrow^* a_i \ldots a_{i+d-1}$, we could store such an entry in $P[i,d]$ only when $A(\alpha)$ has a terminator in this derivation. From Observation 2.4 we know that if $\mathrm{len}(\alpha) > \mathrm{MTL}$ then there exists a terminator of $A(\alpha)$ in this derivation. However, for some grammars it is possible that $\mathrm{MSL} > \mathrm{MTL}$. Therefore, even when $\mathrm{len}(\alpha) > \mathrm{MTL}$ (i.e., the object has a terminator), $A(\alpha)$ can still match the secondary category of a rule if $\mathrm{len}(\alpha) \le \mathrm{MSL}$. In order to verify that an object matches the secondary category of a rule we need to consider the entire stack of the object. When $A(\alpha) \Rightarrow^* a_i \ldots a_{i+d-1}$ and the length of $\alpha$ does not exceed MSL, it is convenient to store $A$ as well as the entire stack $\alpha$, because such an object can potentially match a secondary category of a rule. To be certain that such an object is stored in its entirety when $\mathrm{len}(\alpha) \le \mathrm{MSL}$, the terminator-type entry can only be used when $\mathrm{len}(\alpha) > \max(\mathrm{MSL}, \mathrm{MTL})$. However, we prefer to use the terminator-type entry for representing a derivation from $A(\alpha)$ only when its terminator, say $A_t(\beta)$, is such that $\mathrm{len}(\beta) \ge \max(\mathrm{MSL}, \mathrm{MTL})$, rather than when $\mathrm{len}(\alpha) > \max(\mathrm{MSL}, \mathrm{MTL})$. Again, this choice is made only for convenience, because we feel it leads to a simpler algorithm; the alternative choice could also be made, and would lead to a slightly different algorithm.
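Definition 3.1 and the convention just adopted are easy to state operationally. The Python sketch below is illustrative only: the `BinaryProduction` record, the one-character-per-stack-symbol encoding, and passing MTL in as a plain number (MTL is defined earlier in the paper, outside this section) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BinaryProduction:
    """Hypothetical record for A(..g1...gm) -> Ap(..gp) As(alpha_s) (either order).

    Stack strings use one character per stack symbol (an assumption of this sketch).
    """
    lhs: str              # A
    pushed: str           # g1...gm
    primary: str          # Ap
    primary_top: str      # gp
    secondary: str        # As
    secondary_stack: str  # alpha_s, which is given in full


def msl(productions: Iterable[BinaryProduction]) -> int:
    """Definition 3.1: the longest stack of any secondary constituent."""
    return max((len(p.secondary_stack) for p in productions), default=0)


def use_terminator_entry(terminator_stack_len: Optional[int],
                         msl_value: int, mtl_value: int) -> bool:
    """Use a terminator-type entry only if the object has a terminator whose
    stack length is >= max(MSL, MTL); None means the object has no terminator."""
    if terminator_stack_len is None:
        return False
    return terminator_stack_len >= max(msl_value, mtl_value)


prods = [
    BinaryProduction("S", "", "A", "x", "B", "yy"),
    BinaryProduction("A", "xz", "A", "x", "C", "z"),
]
print(msl(prods))                      # 2
print(use_terminator_entry(2, 2, 1))   # True
print(use_terminator_entry(1, 2, 1))   # False
```

Note that the test compares the terminator's stack length, not the object's own stack length, mirroring the preference stated above.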
Definition 3.2
Define the constant TTC (Terminator-Type Case) as $\mathrm{TTC} = \max(\mathrm{MSL}, \mathrm{MTL})$. In a derivation $A(\beta\gamma_1 \ldots \gamma_k) \Rightarrow^* w$ we will say that $A(\beta\gamma_1 \ldots \gamma_k)$ has the TC-property iff it has a $k$-terminator, say $A_t(\beta\gamma_t)$, such that $\mathrm{len}(\beta\gamma_t) \ge \mathrm{TTC}$.

If $A(\gamma_1 \ldots \gamma_k) \Rightarrow^* a_i \ldots a_{i+d-1}$, where $A(\gamma_1 \ldots \gamma_k)$ does not have the TC-property, then we record the object in its entirety in $P[i,d]$. In order for such an entry to have the same format as a terminator-type entry, we say that the entry has a head $\langle A, \gamma_k \rangle$, and a tail with a middle $\gamma_1 \ldots \gamma_{k-1}$ and a nil terminator-pointer. Note that in this case the middle can be an empty string; for instance, when we encode $A(\gamma) \Rightarrow^* a_i \ldots a_{i+d-1}$. In general, if $\alpha = \beta\gamma$ then we say $\mathrm{top}(\alpha) = \gamma$ and $\mathrm{rest}(\alpha) = \beta$. If $\alpha = \varepsilon$ then we say that $\mathrm{top}(\alpha) = \mathrm{rest}(\alpha) = \varepsilon$.

To summarize, the structure of an entry in $P[i,d]$ is described by the following rules.
• An entry consists of a head and a tail.
• A head consists of a nonterminal and a stack symbol.
• A tail consists of a middle and a terminator-pointer. The exact nature of the middle and the terminator-pointer is as given below.
  – The terminator-pointer may be of the form $(\langle A_t, \gamma_t \rangle, [t, d_t])$ where $A_t \in V_N$, $\gamma_t \in V_I$, and $\langle t, d_t \rangle \le \langle i, d \rangle$. In this case, the middle is a string of stack symbols of length at least one.
    This form of terminator-pointer is used in the encoding of a derivation from an object if its terminator has a stack length greater than or equal to TTC. Recall that we called this type of entry a terminator-type entry.
  – The terminator-pointer can be nil. Then the middle is a (possibly empty) string of stack symbols whose length is less than $\mathrm{TTC} + \mathrm{MCL} - 1$. This form of terminator-pointer is used in the encoding of a derivation from an object if the object does not satisfy the TC-property; i.e., either it has no terminator or, if the terminator exists, its stack length is less than TTC.

3.2 Recognition Algorithms for LIG
Since the full algorithm involves a number of cases, we develop it in stages by restricting the forms of productions. The first algorithm, which considers the most restricted form of productions, introduces much of what lies at the core of our approach. Next we relax these restrictions to some degree. After giving the algorithm at this stage, we discuss how it can be adapted to yield an algorithm for CCG. Later, in Section 5, we consider further relaxation of the restrictions on the form of LIG productions, which helps us produce an algorithm for TAG.
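The entry format summarized at the end of Section 3.1 maps naturally onto a small immutable record. The Python sketch below is one possible encoding, not the authors' implementation: stack symbols are single characters, a middle is a string, a terminator-pointer is a tuple `(head, t, d_t)`, and `None` plays the role of nil. It also implements top/rest and the span order $\langle t, d_t \rangle \le \langle i, d \rangle$, and counts the spans a terminator-pointer stored in $P[i,d]$ can name, which is where the $O(d^2)$ bound comes from.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Head = Tuple[str, str]               # (nonterminal, top stack symbol)
TermPointer = Tuple[Head, int, int]  # (<A_t, g_t>, t, d_t)


@dataclass(frozen=True)
class Entry:
    head: Head
    middle: str                       # stack symbols below the top, one character each
    tp: Optional[TermPointer] = None  # None stands for a nil terminator-pointer

    def __post_init__(self) -> None:
        # A terminator-type entry always has a middle of length at least one.
        if self.tp is not None and not self.middle:
            raise ValueError("terminator-type entry needs a non-empty middle")

    def is_terminator_type(self) -> bool:
        return self.tp is not None


def top(stack: str) -> str:
    """top(beta gamma) = gamma, and top(epsilon) = epsilon."""
    return stack[-1:]


def rest(stack: str) -> str:
    """rest(beta gamma) = beta, and rest(epsilon) = epsilon."""
    return stack[:-1]


def span_le(t: int, dt: int, i: int, d: int) -> bool:
    """<t, dt> <= <i, d>: a_t ... a_{t+dt-1} lies inside a_i ... a_{i+d-1}."""
    return i <= t and t + dt <= i + d


def pointer_spans(i: int, d: int) -> int:
    """Number of spans <t, dt> with dt >= 1 and <t, dt> <= <i, d>; equals d(d+1)/2."""
    return sum(1 for dt in range(1, d + 1) for t in range(i, i + d)
               if span_le(t, dt, i, d))


e = Entry(("A", "x"), "ab", (("B", "y"), 2, 1))
print(e.is_terminator_type(), top("abc"), rest("abc"), pointer_spans(1, 3))
# True c ab 6
```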
Regardless of which set of restrictions we consider, in every algorithm we shall establish that the following proposition holds.

Proposition 3.1
• $(\langle A, \gamma_k \rangle\, (\gamma_1 \ldots \gamma_{k-1}, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i,d]$ if and only if, for some $\beta \in V_I^*$,
$$A(\beta\gamma_1 \ldots \gamma_{k-1}\gamma_k) \Rightarrow^* a_i \ldots a_{t-1}\, A_t(\beta\gamma_t)\, a_{t+d_t} \ldots a_{i+d-1} \Rightarrow^* a_i \ldots a_{i+d-1}$$
where $A_t(\beta\gamma_t)$ is the $k$-terminator of $A(\beta\gamma_1 \ldots \gamma_k)$ and $\mathrm{len}(\beta\gamma_t) \ge \mathrm{TTC}$.
• $(\langle A, \gamma_k \rangle\, (\gamma_1 \ldots \gamma_{k-1}, \mathrm{nil})) \in P[i,d]$ if and only if
$$A(\gamma_1 \ldots \gamma_{k-1}\gamma_k) \Rightarrow^* a_i \ldots a_{i+d-1}$$
where, in this derivation, $A(\gamma_1 \ldots \gamma_{k-1}\gamma_k)$ does not have the TC-property.

3.2.1 Algorithm 1
Recall that the general forms of the rules to be considered are as follows.
1. $A(\alpha) \rightarrow c$ where $c \in \{\varepsilon\} \cup V_T$ and $\mathrm{len}(\alpha) \ge 1$.
2. $A(..\gamma_1 \ldots \gamma_m) \rightarrow A_p(..\gamma_p)\, A_s(\alpha_s)$.
3. $A(..\gamma_1 \ldots \gamma_m) \rightarrow A_s(\alpha_s)\, A_p(..\gamma_p)$.
4. $A(..\gamma_1 \ldots \gamma_m) \rightarrow A_p(..\gamma_p)$.

At this stage we assume that the following restrictions hold of the above rules.
• In the first type of production we assume that $c \in V_T$ and $\mathrm{len}(\alpha) \ge 1$. Thus $\mathrm{MTL} \ge 1$.
• $\mathrm{len}(\alpha_s) \ge 1$ in productions of type 2 and type 3, i.e., $\mathrm{MSL} \ge 1$.
• There are no productions of type 4.

We now give the rules that specify how entries get added to the parsing array. The control structure of the algorithm (a CKY-style dynamic programming structure) will be added later. We assume that the given input is $a_1 \ldots a_n$, where $n \ge 1$.
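The restrictions of Algorithm 1 are purely syntactic, so a grammar can be checked against them before parsing. A minimal Python sketch, assuming a hypothetical tagged-tuple encoding of the four production types (the tags and field layout are ours, not the paper's):

```python
from typing import List, Tuple

# Hypothetical tagged-tuple encodings (one character per stack symbol):
#   ("term",  A, alpha, c)                    type 1: A(alpha) -> c, with c == "" for epsilon
#   ("ps",    A, pushed, Ap, gp, As, alpha_s) type 2: A(..pushed) -> Ap(..gp) As(alpha_s)
#   ("sp",    A, pushed, Ap, gp, As, alpha_s) type 3: A(..pushed) -> As(alpha_s) Ap(..gp)
#   ("unary", A, pushed, Ap, gp)              type 4: A(..pushed) -> Ap(..gp)
Production = Tuple[str, ...]


def algorithm1_violations(productions: List[Production]) -> List[str]:
    """List the reasons a LIG falls outside the restrictions assumed by Algorithm 1."""
    problems: List[str] = []
    for p in productions:
        kind, lhs = p[0], p[1]
        if kind == "term":
            alpha, c = p[2], p[3]
            if c == "":
                problems.append(f"{lhs}: epsilon production")
            if len(alpha) < 1:
                problems.append(f"{lhs}: empty stack in a terminal production")
        elif kind in ("ps", "sp"):
            if len(p[6]) < 1:
                problems.append(f"{lhs}: empty secondary stack")
        elif kind == "unary":
            problems.append(f"{lhs}: type 4 production")
        else:
            raise ValueError(f"unknown production kind {kind!r}")
    return problems


grammar = [("term", "A", "x", "a"), ("ps", "S", "", "A", "x", "B", "y")]
print(algorithm1_violations(grammar))                         # []
print(algorithm1_violations([("unary", "S", "", "A", "x")]))  # ['S: type 4 production']
```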
Initialization Phase
In the initialization phase of the algorithm we store lexical objects (objects deriving a terminal symbol in one step) entirely in a single entry. In other words:

Rule 1.L
$$\frac{A(\alpha) \rightarrow a \qquad a = a_i \qquad 1 \le i \le n}{(\langle A, \mathrm{top}(\alpha) \rangle\, (\mathrm{rest}(\alpha), \mathrm{nil})) \in P[i,1]}$$

Inductive Phase
Here productions of type 2 and type 3 will be considered. Let us assume the presence of the following production in the grammar:⁵

$$A(..\gamma_1 \ldots \gamma_m) \rightarrow A_p(..\gamma_p)\, A_s(\alpha_s).$$

Suppose that, while considering which entries are to be included in $P[i,d]$, we find the following for some $d_p, d_s$ such that $d_p + d_s = d$.
• The entry $(\langle A_p, \gamma_p \rangle\, (\beta_p, tp_p)) \in P[i, d_p]$. This is consistent with the rule's primary constituent. Regardless of whether or not $tp_p = \mathrm{nil}$, for some $\beta \in V_I^*$ we have $A_p(\beta\beta_p\gamma_p) \Rightarrow^* a_i \ldots a_{i+d_p-1}$. When $tp_p = \mathrm{nil}$ we have $\beta = \varepsilon$.
• The entry $(\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d_s]$. This is consistent with the rule's secondary constituent. Thus, since $d = d_p + d_s$, we may assume $A_s(\alpha_s) \Rightarrow^* a_{i+d_p} \ldots a_{i+d-1}$.

From the presence of the two entries specified above (and the derivations they represent) we have $A(\beta\beta_p\gamma_1 \ldots \gamma_m) \Rightarrow A_p(\beta\beta_p\gamma_p)\, A_s(\alpha_s) \Rightarrow^* a_i \ldots a_{i+d-1}$. This derivation must be recorded with an entry in $P[i,d]$. The content of the entry depends on several factors: the value of $m$; whether or not the terminator-pointer in the entry for the primary constituent (i.e., $tp_p$) is nil; and the length of the middle in this entry (i.e., $\beta_p$).
These factors determine whether or not the new entry will be a terminator-type entry. We have cases for $m = 0$, $m = 1$, and $m \ge 2$.

CASE WHEN $m = 0$
The new object to be stored is $A(\beta\beta_p)$. The top of the stack in this object can be obtained from the stack associated with the primary constituent. How this is done depends on whether or not the entry encoding the primary constituent is of terminator type.

When $m = 0$ and $tp_p = \mathrm{nil}$
This means that the primary constituent has been represented in its entirety; i.e., the primary constituent is $A_p(\beta_p\gamma_p)$. Since $tp_p = \mathrm{nil}$, the primary constituent does not satisfy the TC-property (i.e., it does not have a terminator with a stack of length greater than or equal to TTC), so the new constituent cannot be encoded using a terminator-type entry either. Therefore:

Rule 2.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \mathrm{top}(\beta_p) \rangle\, (\mathrm{rest}(\beta_p), \mathrm{nil})) \in P[i,d]}$$

⁵ Similar arguments can be used when we consider the production $A(..\gamma_1 \ldots \gamma_m) \rightarrow A_s(\alpha_s)\, A_p(..\gamma_p)$.

The following rule is the counterpart of Rule 2.ps.L⁶ that corresponds to the use of the production $A(..) \rightarrow A_s(\alpha_s)\, A_p(..\gamma_p)$.

Rule 2.sp.L
$$\frac{(\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i, d_s] \qquad (\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i+d_s, d-d_s]}{(\langle A, \mathrm{top}(\beta_p) \rangle\, (\mathrm{rest}(\beta_p), \mathrm{nil})) \in P[i,d]}$$

When $m = 0$ and $tp_p \neq \mathrm{nil}$
Let the entry for the primary constituent be $(\langle A_p, \gamma_p \rangle\, (\beta_p, (\langle A_t, \gamma_t \rangle, [t, d_t])))$.
Since the primary constituent is $A_p(\beta\beta_p\gamma_p)$, we will assume that its terminator is $A_t(\beta\gamma_t)$ where $\mathrm{len}(\beta\gamma_t) \ge \mathrm{TTC}$. Note also that $\mathrm{len}(\beta_p\gamma_p) \ge 2$. The entry for the new object $A(\beta\beta_p)$ is determined by whether $\mathrm{len}(\beta_p) = 1$ or $\mathrm{len}(\beta_p) > 1$. In the latter case, the $\mathrm{len}(\beta_p\gamma_p)$-terminator of the primary constituent is the $\mathrm{len}(\beta_p)$-terminator of the new object. This is not so in the former case, as noted in Observation 2.5.

Considering the latter case first, i.e., $\mathrm{len}(\beta_p) > 1$, we may write $\beta_p$ as $\gamma_1 \ldots \gamma_{k-1}\gamma_k$ where $k \ge 2$. Since in this case the new object and the primary constituent have the same terminator, and since the primary constituent has the TC-property ($tp_p \neq \mathrm{nil}$), the new object must also be encoded with a terminator-type entry. Thus we have the following rule:

Rule 3.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\gamma_1 \ldots \gamma_k, tp_p)) \in P[i, d_p] \qquad tp_p = (\langle A_t, \gamma_t \rangle, [t, d_t]) \qquad k \ge 2 \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_k \rangle\, (\gamma_1 \ldots \gamma_{k-1}, tp_p)) \in P[i,d]}$$

Henceforth we shall give only the ps versions of the rules and omit the sp versions.

Now let us consider the case when $\mathrm{len}(\beta_p) = 1$. Rewriting $\beta_p$ as $\gamma_1$, the entries represent the following derivation for some $\beta \in V_I^*$ (with $\mathrm{len}(\beta\gamma_1) = \mathrm{len}(\beta\gamma_t) \ge \mathrm{TTC}$):

$$A(\beta\gamma_1) \Rightarrow A_p(\beta\gamma_1\gamma_p)\, A_s(\alpha_s) \Rightarrow^* a_i \ldots a_{t-1}\, A_t(\beta\gamma_t)\, a_{t+d_t} \ldots a_{i+d_p-1}\, A_s(\alpha_s) \Rightarrow^* a_i \ldots a_{t-1}\, a_t \ldots a_{t+d_t-1}\, a_{t+d_t} \ldots a_{i+d_p-1}\, a_{i+d_p} \ldots a_{i+d-1}$$
where $A_t(\beta\gamma_t)$ is the 2-terminator of $A_p(\beta\gamma_1\gamma_p)$. From Observation 2.5 it follows that if $A_t(\beta\gamma_t)$ has a terminator, then the terminator of $A(\beta\gamma_1)$ in this derivation is the same as the terminator of $A_t(\beta\gamma_t)$; and if $A_t(\beta\gamma_t)$ has no terminator, then neither does $A(\beta\gamma_1)$. Additionally, in this derivation $A(\beta\gamma_1)$ satisfies the TC-property if and only if $A_t(\beta\gamma_t)$ has the TC-property. That is, we should use a terminator-type entry to record this derivation from $A(\beta\gamma_1)$ if and only if a terminator-type entry has been used for $A_t(\beta\gamma_t)$. Since these two objects share the same terminator (if it exists), the terminator-pointer must be the same when we record derivations from them. Therefore, suppose we use the terminator-pointer of $(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t])))$ to locate an entry $(\langle A_t, \gamma_t \rangle\, (\beta_t, tp_t)) \in P[t, d_t]$. This suggests the addition of the entry $(\langle A, \gamma_1 \rangle\, (\beta_t, tp_t))$ to $P[i,d]$, regardless of whether or not $tp_t = \mathrm{nil}$. However, we give the two cases ($tp_t = \mathrm{nil}$, or $tp_t = (\langle A_r, \gamma_r \rangle, [r, d_r])$ for some $A_r, \gamma_r, r, d_r$) in the form of two different rules.

⁶ Here L indicates a rule we use in LIG parsing; ps indicates that the primary constituent appears before the secondary constituent. Similarly, sp will be used to indicate that the secondary constituent appears before the primary constituent.
This is because (as we shall see later) these two rules have to appear at different points of the control structure of the parsing algorithm.

Rule 4.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p] \qquad (\langle A_t, \gamma_t \rangle\, (\beta_t, \mathrm{nil})) \in P[t, d_t]}{(\langle A, \gamma_1 \rangle\, (\beta_t, \mathrm{nil})) \in P[i,d]}$$

Rule 5.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p] \qquad (\langle A_t, \gamma_t \rangle\, (\beta_t, tp_t)) \in P[t, d_t] \qquad tp_t = (\langle A_r, \gamma_r \rangle, [r, d_r])}{(\langle A, \gamma_1 \rangle\, (\beta_t, tp_t)) \in P[i,d]}$$

CASE WHEN $m = 1$
The length of the stack in the new object is equal to that of the primary object. In fact, the terminator of the primary object (if it exists) is the same as the terminator of the new object, and when the primary object has no terminator neither does the new object. Therefore the encoding of the new object can easily be derived from that of the primary object by simply modifying the head (to change the top-of-stack symbol). Thus we have:

Rule 6.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_1 \rangle\, (\beta_p, \mathrm{nil})) \in P[i,d]}$$

Rule 7.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\beta_p, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_1 \rangle\, (\beta_p, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i,d]}$$

CASE WHEN $m \ge 2$
If the primary constituent is $A_p(\beta\beta_p\gamma_p)$, then the new constituent is $A(\beta\beta_p\gamma_1 \ldots \gamma_m)$. In fact, in this case the primary constituent is the $m$-terminator of $A(\beta\beta_p\gamma_1 \ldots \gamma_m)$. Of course, this does not mean that the derivation from the new object should be recorded with a terminator-type entry.
We use the terminator-type entry only when $\mathrm{len}(\beta\beta_p\gamma_p) \ge \mathrm{TTC}$. In order to determine the length of this stack we have to use the entry for the primary constituent (i.e., $(\langle A_p, \gamma_p \rangle\, (\beta_p, tp_p)) \in P[i, d_p]$) and consider whether or not it is a terminator-type entry (i.e., whether or not $tp_p = \mathrm{nil}$).

When $m \ge 2$ and $tp_p \neq \mathrm{nil}$
In this case the length of the stack of the terminator of the primary constituent is greater than or equal to TTC. This means that the stack length of the primary constituent (the terminator of the new object) exceeds TTC. Thus we have the following rule:

Rule 8.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\beta_p, tp_p)) \in P[i, d_p] \qquad tp_p = (\langle A_t, \gamma_t \rangle, [t, d_t]) \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_m \rangle\, (\gamma_1 \ldots \gamma_{m-1}, (\langle A_p, \gamma_p \rangle, [i, d_p]))) \in P[i,d]}$$

When $m \ge 2$ and $tp_p = \mathrm{nil}$
The primary constituent (which is the terminator of the new object) is represented in its entirety. Therefore, in order to determine whether or not we have to encode the new object with a terminator-type entry, we look at the entry for the primary constituent. Thus we obtain the following rules:

Rule 9.ps.L
$$\frac{\mathrm{len}(\beta_p\gamma_p) < \mathrm{TTC} \qquad (\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_m \rangle\, (\beta_p\gamma_1 \ldots \gamma_{m-1}, \mathrm{nil})) \in P[i,d]}$$

Rule 10.ps.L
$$\frac{\mathrm{len}(\beta_p\gamma_p) \ge \mathrm{TTC} \qquad (\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(\langle A, \gamma_m \rangle\, (\gamma_1 \ldots \gamma_{m-1}, (\langle A_p, \gamma_p \rangle, [i, d_p]))) \in P[i,d]}$$

In the discussions that follow, we find it convenient to refer to the entries mentioned in the above rules either as the antecedent entries (entries that appear in the antecedent) of a rule or as the consequent entry (the entry that appears in the consequent) of a rule.
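The case analysis of Rules 2.ps.L through 10.ps.L can be collected in a single function. The Python sketch below is a hedged illustration rather than the authors' code: it assumes a hypothetical tuple-based entry encoding (one character per stack symbol, `None` for nil, and `P` as a dictionary from `(t, d_t)` to a list of entries), treats only the ps order, assumes the secondary entry has already been matched against $A_s(\alpha_s)$, and returns the consequent entries for $P[i,d]$.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    head: Tuple[str, str]       # (nonterminal, top stack symbol)
    middle: str                 # one character per stack symbol (assumption)
    tp: Optional[tuple] = None  # (head, t, dt) for a terminator-pointer, None for nil


Table = Dict[Tuple[int, int], List[Entry]]  # hypothetical layout: P[(t, dt)] -> entries


def consequents_ps(lhs: str, pushed: str, prim: Entry, i: int, dp: int,
                   P: Table, ttc: int) -> List[Entry]:
    """Consequent entries for P[i, d] from A(..pushed) -> Ap(..gp) As(alpha_s).

    prim is an entry in P[i, dp] whose head is <Ap, gp>; the secondary entry in
    P[i+dp, d-dp] is assumed to have been matched already.
    """
    m = len(pushed)
    bp, tpp = prim.middle, prim.tp
    if m == 0:
        if tpp is None:                                   # Rule 2.ps.L
            return [Entry((lhs, bp[-1:]), bp[:-1], None)]
        if len(bp) >= 2:                                  # Rule 3.ps.L
            return [Entry((lhs, bp[-1]), bp[:-1], tpp)]
        # len(bp) == 1: copy the tails of the terminator's entries (Rules 4.ps.L, 5.ps.L)
        t_head, t, dt = tpp
        return [Entry((lhs, bp), e.middle, e.tp)
                for e in P.get((t, dt), []) if e.head == t_head]
    if m == 1:                                            # Rules 6.ps.L and 7.ps.L
        return [Entry((lhs, pushed), bp, tpp)]
    pointer = (prim.head, i, dp)                          # the primary is the m-terminator
    if tpp is None and len(bp) + 1 < ttc:                 # Rule 9.ps.L
        return [Entry((lhs, pushed[-1]), bp + pushed[:-1], None)]
    return [Entry((lhs, pushed[-1]), pushed[:-1], pointer)]  # Rules 8.ps.L, 10.ps.L


P = {(2, 1): [Entry(("B", "y"), "z")]}
print(consequents_ps("A", "", Entry(("Ap", "p"), "x", (("B", "y"), 2, 1)), 1, 3, P, 3))
# [Entry(head=('A', 'x'), middle='z', tp=None)]
```

Rules 4.ps.L and 5.ps.L collapse into one branch here; the text keeps them apart only because they belong at different points of the control structure. Likewise, Rules 8.ps.L and 10.ps.L share the same consequent and differ only in their antecedents.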
In Rule 10.ps.L, for example, $(\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil}))$ in $P[i, d_p]$ and $(\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil}))$ in $P[i+d_p, d-d_p]$ are the antecedent entries, and $(\langle A, \gamma_m \rangle\, (\gamma_1 \ldots \gamma_{m-1}, (\langle A_p, \gamma_p \rangle, [i, d_p])))$, which is added to $P[i,d]$, is the consequent entry.

3.3 The Control Structure
We start by giving a simple control structure for the recognition algorithm that follows the dynamic programming style used in the CKY algorithm.

In this section we modify the notation for entries slightly. In the above discussion, the terminator-pointer of a terminator-type entry contains a pair of indices representing input positions. Thus, in effect, P is a four-dimensional array. As an alternative to saying that $(\langle A, \gamma \rangle\, (\beta, (\langle A', \gamma' \rangle, [t, d_t])))$ is in $P[i,d]$, we will sometimes say that $(\langle A, \gamma \rangle\, (\beta, \langle A', \gamma' \rangle))$ is in $P[i,d][t,d_t]$. Also, as an alternative to saying that $(\langle A, \gamma \rangle\, (\beta, \mathrm{nil}))$ is in $P[i,d]$, we will sometimes say that $(\langle A, \gamma \rangle\, (\beta, \mathrm{nil}))$ is in $P[i,d][0,0]$. Thus P can be considered to be an array of size $n \times n \times (n+1) \times (n+1)$.

In the specification of the algorithm (Figure 5) we will not restate all the rules discussed in the previous section. Instead, we only indicate where in the control structure each rule fits. As an example, when we state "Use Rule 2.ps.L with $d_p = d'$" within the $i$, $d$, and $d'$ loops, we mean the following: for the current values of $i$, $d$, and $d'$ (and hence $d_p$ and $d_s$), consider every production of the form $A(..\gamma_1 \ldots \gamma_m) \rightarrow A_p(..\gamma_p)\, A_s(\alpha_s)$ with $m = 0$.
For each such production, look for entries of the form $(\langle A_p, \gamma_p \rangle\, (\beta_p, \mathrm{nil})) \in P[i, d_p][0,0]$ for some $\beta_p$ and $(\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p][0,0]$. If we find such entries, we add $(\langle A, \mathrm{top}(\beta_p) \rangle\, (\mathrm{rest}(\beta_p), \mathrm{nil}))$ to $P[i,d][0,0]$ if it is not already there.

Since the entries in $P[i,d]$ have the form $(\langle A, \gamma \rangle\, (\beta, (\langle A_t, \gamma_t \rangle, [t, d_t])))$ (where $\langle t, d_t \rangle \le \langle i, d \rangle$) or the form $(\langle A, \gamma \rangle\, (\beta, \mathrm{nil}))$, there are $O(d^2)$ entries in $P[i,d]$ (where $1 \le i \le n$ and $1 \le d \le n - i + 1$). Since there are $O(n^2)$ array elements $P[i,d]$, the space complexity of this algorithm is $O(n^4)$. Note that the body within the $r$ loop will be attempted for all possible values of $i$, $d$, $d'$, $t$, $d_t$, $r$, $d_r$. Since the range of each loop is $O(n)$, the time complexity is $O(n^7)$.

The asymptotic complexity of the above algorithm can be improved to $O(n^6)$ with a simple rearrangement of the control structure. The key point is that the steps involving Rules 5.ps.L and 5.sp.L can each be split into two parts. Consider, for example, the use of Rule 5.ps.L, which is repeated below.

Rule 5.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p] \qquad (\langle A_t, \gamma_t \rangle\, (\beta_t, tp_t)) \in P[t, d_t] \qquad tp_t = (\langle A_r, \gamma_r \rangle, [r, d_r])}{(\langle A, \gamma_1 \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r]))) \in P[i,d]}$$

This rule corresponds to the use of the production $A(..) \rightarrow A_p(..\gamma_p)\, A_s(\alpha_s)$.
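The saving comes from the shape of the loop nest alone, so it can be checked by counting. The Python sketch below counts how many times the innermost bodies would run, ignoring the grammar-dependent work done inside each body and covering only the ps half: Rule 5.ps.L in the seven-index nest of Figure 5, versus Rules 5.i.ps.L and 5.ii.ps.L in the split nest of Figure 6. The function names are ours.

```python
from typing import Tuple


def rule5_attempts_alg1(n: int) -> int:
    """Times the Rule 5.ps.L body runs in the ps half of Figure 5's loop nest."""
    count = 0
    for d in range(2, n + 1):
        for i in range(1, n - d + 2):
            for dp in range(1, d):                                # d' loop
                for dt in range(dp - 1, 0, -1):
                    for t in range(i, i + dp - dt + 1):
                        for dr in range(dt - 1, 0, -1):
                            for _r in range(t, t + dt - dr + 1):  # seventh index
                                count += 1
    return count


def rule5_attempts_alg2(n: int) -> Tuple[int, int]:
    """Times the Rule 5.i.ps.L and Rule 5.ii.ps.L bodies run in Figure 6's loop nest."""
    first = second = 0
    for d in range(2, n + 1):
        for i in range(1, n - d + 2):
            for dp in range(1, d):              # 5.i: inside the dp loop, no r or dr
                for dt in range(dp - 1, 0, -1):
                    first += len(range(i, i + dp - dt + 1))
            for dt in range(d - 1, 0, -1):      # 5.ii: outside the dp loop
                for t in range(i, i + d - dt + 1):
                    for dr in range(dt - 1, 0, -1):
                        second += len(range(t, t + dt - dr + 1))
    return first, second


for n in (8, 12):
    a1 = rule5_attempts_alg1(n)
    f, s = rule5_attempts_alg2(n)
    # n, Algorithm 1 count, Algorithm 2 count, and their ratio
    print(n, a1, f + s, round(a1 / (f + s), 2))
```

The ratio between the two counts keeps growing with n, which is what one fewer nested index ($O(n^7)$ versus $O(n^6)$) predicts.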
The values of $i$, $d$, $d'$, $t$, $d_t$ are needed to determine the spans of the substrings derived from the primary constituent and the secondary constituent, and the values of $i$, $d$, $t$, $d_t$, $r$, $d_r$ are needed to locate the entry for the terminator, i.e., $(\langle A_t, \gamma_t \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r])))$, and to place the new entry in the appropriate parsing table element. That is, the values of $r$ and $d_r$ are not required for the first part, and the value of $d'$ need not be known for the second part. This indicates that the second part need not be done within the $d'$ loop. Therefore, we can modify the control structure in the following way. Within the $t$ loop (which appears within the loops for $d$, $i$, $d'$, $d_t$) we find the entries for the primary and secondary constituents. Having found the two relevant entries, we must record the head of the new entry, $\langle A, \gamma_1 \rangle$, and the terminator-pointer of the primary constituent, i.e., $(\langle A_t, \gamma_t \rangle, [t, d_t])$. We can do this by using a two-dimensional array called TEMP, storing $(A, \gamma_1, A_t, \gamma_t)$ in $\mathrm{TEMP}[t, d_t]$. Outside the $d'$ loop (and hence outside the loops for $t$ and $d_t$ as well), but within the loops for $i$ and $d$, we can have loops that vary $t$, $d_t$, $r$, $d_r$ (note that $\langle r, d_r \rangle < \langle t, d_t \rangle$) in order to locate the entry for the terminator by using the information recorded in TEMP. Finally, having found the entry for the terminator, we store the resulting entry in $P[i,d]$. These steps are captured by the following rules. For a specific value of $\langle i, d \rangle$ we have

Rule 5.i.ps.L
$$\frac{(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t]))) \in P[i, d_p] \qquad (\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil})) \in P[i+d_p, d-d_p]}{(A, \gamma_1, A_t, \gamma_t) \in \mathrm{TEMP}[t, d_t]}$$

Rule 5.ii.ps.L
$$\frac{(A, \gamma_1, A_t, \gamma_t) \in \mathrm{TEMP}[t, d_t] \qquad (\langle A_t, \gamma_t \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r]))) \in P[t, d_t]}{(\langle A, \gamma_1 \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r]))) \in P[i,d]}$$

Similarly, we assume we have the pair Rule 5.i.sp.L and Rule 5.ii.sp.L corresponding to Rule 5.sp.L. This leads to the algorithm given in Figure 6. In this algorithm we drop the sp rules and specify only the ps rules, for the sake of simplicity.

Algorithm 1
begin
  for i := 1 to n do                                  % initialization phase
    Use Rule 1.L
  for d := 2 to n do                                  % d loop
    for i := 1 to n - d + 1 do                        % i loop
    begin
      for d' := 1 to d - 1 do                         % d' loop
      begin
        Use Rules 2.ps.L, 6.ps.L, 9.ps.L, 10.ps.L with dp = d'
        for dt := d' - 1 downto 1 do                  % dt loop
          for t := i to i + d' - dt do                % t loop
          begin
            Use Rules 3.ps.L, 4.ps.L, 7.ps.L, 8.ps.L with dp = d'
            for dr := dt - 1 downto 1 do              % dr loop
              for r := t to t + dt - dr do            % r loop
                Use Rule 5.ps.L with dp = d'
          end
        for dt := d - d' - 1 downto 1 do              % dt loop
          for t := i + d' to i + d - dt do            % t loop
          begin
            Use Rules 3.sp.L, 4.sp.L, 7.sp.L, 8.sp.L with ds = d'
            for dr := dt - 1 downto 1 do              % dr loop
              for r := t to t + dt - dr do            % r loop
                Use Rule 5.sp.L with ds = d'
          end
      end                                             % end of d' loop
    end                                               % end of i loop
end
Figure 5
Algorithm 1.

The correctness of Algorithm 2 can be established from the correctness of Algorithm 1 (which is established in Appendix A) and the following lemma.

Lemma 3.1
Given a grammar $G$ and an input $a_1 \ldots a_n$, an entry $(\langle A, \gamma \rangle\, (\beta, tp))$ is added to $P[i,d]$ by Algorithm 1 if and only if $(\langle A, \gamma \rangle\, (\beta, tp))$ is added to $P[i,d]$ by Algorithm 2.

Outline of Proof: By induction on $d$. The base case, $d = 1$, involves only the initialization step, which is the same in the two algorithms. The only difference between the two algorithms (apart from the control structure) is the use of Rule 5.ps.L (and Rule 5.sp.L) by Algorithm 1 versus the use of Rules 5.i.ps.L and 5.ii.ps.L (Rules 5.i.sp.L and 5.ii.sp.L) by Algorithm 2. Rule 5.ps.L is used to add entries of the form $(\langle A, \gamma_1 \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r])))$. We can establish that $(\langle A, \gamma_1 \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r])))$ is added to $P[i,d]$ by an application of Rule 5.ps.L if and only if there exist entries of the form $(\langle A_p, \gamma_p \rangle\, (\gamma_1, (\langle A_t, \gamma_t \rangle, [t, d_t])))$ in $P[i, d_p]$, $(\langle A_s, \mathrm{top}(\alpha_s) \rangle\, (\mathrm{rest}(\alpha_s), \mathrm{nil}))$ in $P[i+d_p, d-d_p]$, and $(\langle A_t, \gamma_t \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r])))$ in $P[t, d_t]$, together with the production $A(..) \rightarrow A_p(..\gamma_p)\, A_s(\alpha_s)$. Using induction, we can establish that these entries exist if and only if $(A, \gamma_1, A_t, \gamma_t)$ is added to $\mathrm{TEMP}[t, d_t]$ using Rule 5.i.ps.L (or Rule 5.i.sp.L) and $(\langle A, \gamma_1 \rangle\, (\beta_t, (\langle A_r, \gamma_r \rangle, [r, d_r])))$ is added to $P[i,d]$ using Rule 5.ii.ps.L.

4. Combinatory Categorial Grammars
Combinatory Categorial Grammars (CCG) (Steedman 1985, 1986) are extensions of Classical Categorial Grammars in which both function composition and function application are allowed. In addition, forward and backward slashes are used to place conditions on the relative ordering of adjacent categories that are to be combined.

Definition 4.1
The set of categories generated from a set $V_N$ of atomic categories is defined as the smallest set such that all members of $V_N$ are categories, and if $c_1, c_2$ are categories then so are $(c_1/c_2)$ and $(c_1\backslash c_2)$.
Algorithm 2
begin
  for i := 1 to n do                              % initialization phase
    Use Rule 1.
  for d := 2 to n do                              % d loop
    for i := 1 to n - d + 1 do                    % i loop
    begin
      Initialize TEMP[t, dt] to ∅ for all subspans (t, dt) of (i, d).
      for dp := 1 to d - 1 do                     % dp loop
      begin
        Use Rule 2.ps.L, 6.ps.L, 9.ps.L, 10.ps.L.
        for dt := dp - 1 to 1 do                  % dt loop
          for t := i to i + dp - dt do            % t loop
            Use Rule 3.ps.L, 4.ps.L, 5.i.ps.L, 7.ps.L, 8.ps.L.
                                                  % end of t loop
                                                  % end of dt loop
      end                                         % end of dp loop
      for dt := d - 1 to 1 do                     % dt loop
        for t := i to i + d - dt do               % t loop
          for dr := dt - 1 to 1 do                % dr loop
            for r := t to t + dt - dr do          % r loop
            begin
              Use Rule 5.ii.ps.L.
            end                                   % end of r loop
                                                  % end of dr loop
                                                  % end of t loop
                                                  % end of dt loop
    end                                           % end of i loop
                                                  % end of d loop
end
Figure 6
Algorithm 2.

Definition 4.2
A CCG, $G$, is denoted by $(V_T, V_N, S, f, R)$ where
• $V_T$ is a finite set of terminals (lexical items),
• $V_N$ is a finite set of nonterminals (atomic categories),
• $S$ is a distinguished member of $V_N$,
• $f$ is a function that maps each element of $V_T$ to a finite set of categories,
• $R$ is a finite set of combinatory rules, where combinatory rules have the following form.
1. A forward rule has the following form, where $m \ge 0$:
$$(x/y)\,(y|_1z_1|_2\cdots|_mz_m) \to (x|_1z_1|_2\cdots|_mz_m)$$
2. A backward rule has the following form, where $m \ge 0$:
$$(y|_1z_1|_2\cdots|_mz_m)\,(x\backslash y) \to (x|_1z_1|_2\cdots|_mz_m)$$
Here $x, y, z_1, \ldots, z_m$ are meta-variables and $|_1, \ldots, |_m \in \{\backslash, /\}$. For $m = 0$ these rules correspond to function application and for $m > 0$ to function composition.
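The rule schemas of Definition 4.2 can be sketched in Python as two functions that try to apply a forward or backward rule of a given degree $m$ to a pair of categories. The tuple encoding and the names `forward`, `backward`, `peel`, and `attach` are hypothetical. The sketch checks only the shape of the schema; it enforces no restriction on variable substitution, so, for instance, the target restriction stated for Example 4.1 is not imposed here.

```python
from typing import Optional, Union

# Hypothetical encoding: an atomic category is a str, and a compound
# category (c1/c2) or (c1\c2) is the triple (c1, "/" or "\\", c2).
Cat = Union[str, tuple]


def peel(c: Cat, m: int):
    """Split (...((y |1 z1) |2 z2) ... |m zm) into y and [(|1, z1), ..., (|m, zm)].

    Returns None when c has fewer than m arguments.
    """
    args = []
    for _ in range(m):
        if isinstance(c, str):
            return None
        c, slash, z = c
        args.append((slash, z))
    args.reverse()  # the outermost argument |m zm was removed first
    return c, args


def attach(x: Cat, args) -> Cat:
    """Rebuild (...((x |1 z1) |2 z2) ... |m zm)."""
    for slash, z in args:
        x = (x, slash, z)
    return x


def forward(primary: Cat, secondary: Cat, m: int) -> Optional[Cat]:
    """(x/y) (y|1z1...|mzm) -> (x|1z1...|mzm); m = 0 is forward application."""
    if isinstance(primary, str) or primary[1] != "/":
        return None
    x, _, y = primary
    split = peel(secondary, m)
    if split is None or split[0] != y:
        return None
    return attach(x, split[1])


def backward(secondary: Cat, primary: Cat, m: int) -> Optional[Cat]:
    """(y|1z1...|mzm) (x\\y) -> (x|1z1...|mzm); m = 0 is backward application."""
    if isinstance(primary, str) or primary[1] != "\\":
        return None
    x, _, y = primary
    split = peel(secondary, m)
    if split is None or split[0] != y:
        return None
    return attach(x, split[1])


# Compose S/T with T\A/T, T\B/T, T\B, then apply to B, B, A from the left.
s = forward(("S", "/", "T"), (("T", "\\", "A"), "/", "T"), 2)  # S\A/T
s = forward(s, (("T", "\\", "B"), "/", "T"), 2)                # S\A\B/T
s = forward(s, ("T", "\\", "B"), 1)                            # S\A\B\B
for left in ("B", "B", "A"):
    s = backward(left, s, 0)                                   # S\A\B, S\A, S
print(s)  # S
```

The demonstration steps through one derivation of abbcabb under the grammar of Example 4.1: three forward compositions (of degrees 2, 2, and 1) followed by three backward applications. The target of the category matched with $x$ happens to be $S$ at every step, so the example's restriction is respected even though the functions do not check it.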
Note that the set $R$ contains a finite subset of these possible forward and backward rules; i.e., for a given CCG only some of the combinatory rules will be available.

Definition 4.3
In the forward and backward rules given above, we say that $(x/y)$ (resp. $(x\backslash y)$) is the primary constituent of the forward (resp. backward) rule and $(y|_1z_1|_2\cdots|_mz_m)$ is the secondary constituent of the rule. The notion of a distinguished child is defined as in the case of LIG; i.e., a category is the distinguished child of its parent if it corresponds to the primary constituent of the rule used. As before, the distinguished descendant relation is the reflexive, transitive closure of the distinguished child relation.

In discussing CCG we use the notational conventions that the variables $|$ and $c$ (with or without primes and subscripts) range over the forward and backward slashes and over categories, respectively. We use $x, y, z$ for meta-variables; $\alpha, \beta$ for strings of directional categories (i.e., strings of the form $|_1c_1|_2\cdots|_nc_n$ for some $n \ge 0$); and $A, B, C$ for atomic categories (i.e., members of $V_N$).

Derivations in a CCG, $G = (V_T, V_N, S, f, R)$, involve the use of the combinatory rules in $R$. Let $\Rightarrow_G$ be defined as follows, where $\tau_1$ and $\tau_2$ are strings of categories and terminal symbols.
• If $c_1c_2 \to c$ is an instance of a rule in $R$, then $\tau_1 c\,\tau_2 \Rightarrow_G \tau_1 c_1 c_2 \tau_2$.
• If $c \in f(a)$ for some $a \in V_T$ and $c$ is a category, then $\tau_1 c\,\tau_2 \Rightarrow_G \tau_1 a\,\tau_2$.

The string language generated by a CCG, $G$, is $L(G) = \{\, w \mid S \Rightarrow_G^{*} w,\ w \in V_T^{*} \,\}$.

Example 4.1
The following CCG generates $\{\, wcw \mid w \in \{a, b\}^{+} \,\}$. Let $G = (\{a, b, c\}, \{S, T, A, B\}, S, f, R)$ where
$f(a) = \{A,\ T\backslash A/T,\ T\backslash A\}$
$f(b) = \{B,\ T\backslash B/T,\ T\backslash B\}$
$f(c) = \{S/T\}$.
The set of rules $R$ includes the following three rules:
$$y\,(x\backslash y) \to x \qquad (x/y)\,(y\backslash z_1/z_2) \to (x\backslash z_1/z_2) \qquad (x/y)\,(y\backslash z_1) \to (x\backslash z_1)$$
In each of these rules, the target of the category matched with $x$ must be $S$.⁷ Figure 7 shows a derivation of the string abbcabb.

We find it convenient to represent categories in a minimally parenthesized form (i.e., without parentheses unless they are needed to override the left associativity of the slashes), where minimally parenthesized form is defined as follows.

7 Following Steedman (1985), we allow certain very limited restrictions on the substitutions of variables in the combinatory rules. A discussion of the use of such restrictions is given in Vijay-Shanker and Weir (in press). However, we have not included this in the formal definition since it does not have a significant impact on the algorithm presented.

Figure 7
CCG example derivation tree.

Definition 4.4
• $A$ is the minimally parenthesized form of $A$, where $A \in V_N$.
• If $c_1, \ldots, c_n$ are the minimally parenthesized forms of categories $c'_1, \ldots, c'_n$ respectively, then $(A|_1c_1|_2\cdots|_nc_n)$ is the minimally parenthesized form of $((\cdots(A|_1c'_1)|_2\cdots)|_nc'_n)$.
A category $c$ is in minimally parenthesized form if $c$ is the minimally parenthesized form of itself.

Definition 4.5
Let a category $c = A|_1c_1|_2\cdots|_nc_n$ be in minimally parenthesized form, where $n \ge 0$, $A \in V_N$, and $c_1, \ldots, c_n$ are minimally parenthesized categories.
• The target category of $c$, denoted by $\mathit{tar}(c)$, is $A$.
• The arity of $c$, denoted by $\mathit{arity}(c)$, is $n$.
• The argument categories of $c$, denoted by $\mathit{args}(c)$, are $\{\, c_i \mid 1 \le i \le n \,\}$.

4.1 CCG and LIG
Before showing how the general parsing scheme illustrated by the LIG recognition algorithm can be instantiated as a recognition algorithm for CCG, we show that CCG and LIG are very closely related. Further details of the relationship between CCG and LIG may be found in Weir and Joshi (1988) and Weir (1988).

A minimally parenthesized category $(A|_1c_1|_2\cdots|_nc_n)$ can be viewed as the atomic category $A$ associated with a stack of directional argument categories $|_1c_1|_2\cdots|_nc_n$. The rule
$$(x/y)\,(y|_1z_1|_2\cdots|_mz_m) \to (x|_1z_1|_2\cdots|_mz_m)$$
has as an instance
$$(A_p|_1c_1\cdots|_nc_n/A_s)\,(A_s|'_1c'_1|'_2\cdots|'_mc'_m) \to (A_p|_1c_1\cdots|_nc_n|'_1c'_1|'_2\cdots|'_mc'_m)$$
as well as
$$(A_p|_1c_1\cdots|_nc_n/(A_s|'c'))\,(A_s|'c'|'_1c'_1|'_2\cdots|'_mc'_m) \to (A_p|_1c_1\cdots|_nc_n|'_1c'_1|'_2\cdots|'_mc'_m)$$
Thus $x$ matches the category $(A_p|_1c_1\cdots|_nc_n)$, $y$ matches an atomic category $A_s$ in the first instance and a nonatomic category $(A_s|'c')$ in the second, and each $z_i$ matches $c'_i$ for $1 \le i \le m$.
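Definitions 4.4 and 4.5, together with the stack reading of categories just described, can be sketched in Python by converting a category into a flat pair of a target and an argument stack. The tree and flat encodings and all function names are hypothetical. `forward_as_stack_op` implements only the forward schema, read as popping $/y$ from the primary category and pushing the top $m$ arguments of the secondary; the usage exercises the second kind of instance above, in which $y$ is nonatomic ($T\backslash A$).

```python
from typing import Union

# Hypothetical encodings (illustrative only):
#   tree form: "A" or (c1, slash, c2), as built by Definition 4.1
#   flat form: (A, ((|1, c1), ..., (|n, cn))), the minimally parenthesized
#              A|1c1...|ncn, where every argument ci is itself in flat form
Tree = Union[str, tuple]


def flatten(c: Tree):
    """Convert a tree-form category to its minimally parenthesized flat form."""
    stack = []
    while not isinstance(c, str):
        c, slash, arg = c
        stack.append((slash, flatten(arg)))
    stack.reverse()  # innermost |1 c1 first; outermost |n cn last (stack top)
    return (c, tuple(stack))


def tar(flat):
    return flat[0]


def arity(flat):
    return len(flat[1])


def args(flat):
    return {arg for _, arg in flat[1]}


def show(flat) -> str:
    """Print with parentheses only around compound arguments."""
    target, stack = flat
    out = target
    for slash, arg in stack:
        text = show(arg)
        out += slash + (text if arity(arg) == 0 else f"({text})")
    return out


def forward_as_stack_op(primary, secondary, m):
    """(x/y)(y|1z1...|mzm) -> (x|1z1...|mzm) read as: pop /y off the primary's
    stack, then push the secondary's top m directional arguments."""
    target, stack = primary
    if not stack or stack[-1][0] != "/":
        return None
    y = stack[-1][1]
    s_target, s_stack = secondary
    k = len(s_stack) - m
    if k < 0 or (s_target, s_stack[:k]) != y:
        return None
    return (target, stack[:-1] + s_stack[k:])


primary = flatten(("S", "/", ("T", "\\", "A")))    # S/(T\A): y is nonatomic
secondary = flatten((("T", "\\", "A"), "/", "T"))  # T\A/T
print(show(primary), tar(primary), arity(primary))       # S/(T\A) S 1
print(show(forward_as_stack_op(primary, secondary, 1)))  # S/T
```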
A derivation involving the second instance (viewed bottom-up) can be seen as popping the top directional argument $/(A_s|'c')$ from the primary category and pushing the $m$ directional arguments $|'_1c'_1|'_2\cdots|'_mc'_m$. Thus, each instance of a combinatory rule closely resembles a LIG production. For example, in the case of the second instance we have
$$A_p(\cdot\cdot\,|'_1c'_1|'_2\cdots|'_mc'_m) \to A_p(\cdot\cdot\,/(A_s|'c'))\;A_s(|'c'\,|'_1c'_1|'_2\cdots|'_mc'_m)$$

We now show that, like the set of stack symbols of a LIG, the set of directional argument categories that we need to be concerned with is finite.

Definition 4.6
A category $c$ is useful with respect to a grammar $G$ if and only if $c \Rightarrow_G^{*} w$ for some $w \in V_T^{*}$. The set of argument categories, $\mathit{args}(G)$, of a CCG $G = (V_T, V_N, S, f, R)$ is defined as
$$\mathit{args}(G) = \bigcup_{a \in V_T,\ c \in f(a)} \mathit{args}(c).$$

Observation 4.1
If $c$ is a useful category, then $\mathit{args}(c) \subseteq \mathit{args}(G)$, a finite set determined by the grammar $G$.

This observation can be shown by induction on the length of the derivation of some string from $c$. The base case corresponds to a lexical assignment, and hence trivially $\mathit{args}(c) \subseteq \mathit{args}(G)$. The inductive step corresponds to a combination using a rule of the form
$$(x/y)\,(y|_1z_1|_2\cdots|_mz_m) \to (x|_1z_1|_2\cdots|_mz_m)$$
or
$$(y|_1z_1|_2\cdots|_mz_m)\,(x\backslash y) \to (x|_1z_1|_2\cdots|_mz_m).$$
By the inductive hypothesis, any useful category matching $(x/y)$, $(x\backslash y)$, or $(y|_1z_1|_2\cdots|_mz_m)$ takes its arguments from $\mathit{args}(G)$ (a finite set). Every argument of the resulting category is an argument either of the primary category (an argument of $x$) or of the secondary category (one of $z_1, \ldots, z_m$), so the resulting useful category also has this property.

The above property makes it possible to adapt the LIG algorithm for CCG. Note that in CKY-style CCG recognition we only need to record derivations from useful categories. From Observation 4.1 it follows that the lexical category assignment, $f$, determines the number of "stack" symbols we need to be concerned with. Therefore, only one of the variables ($x$) in a combinatory rule is essential, in the sense that the number of categories it can usefully match is not bounded by the grammar. Consequently, it would be possible to map each combinatory rule to an equivalent finite set of instances in which ground categories (from $\mathit{args}(G)$) were substituted for all variables other than $x$, i.e., for $y, z_1, \ldots, z_m$ in the combinatory rules above. This would result in a grammar that was a slight notational variant of a LIG, in which the CCG variable $x$ and the LIG notation $\cdot\cdot$ perform similar roles. However, for the purpose of constructing a recognition algorithm it is both unnecessary and undesirable to expand the number of rules in this way. Instead, we adapt the LIG algorithm so that it, in effect, constructs appropriate instances of the combinatory rules as needed during the recognition process.
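Observation 4.1 can be checked on Example 4.1 with a deliberately naive CKY-style recognizer, sketched below in Python, that stores complete categories in each chart cell. This is not the algorithm developed in this paper: keeping whole categories in the chart is exactly what the adapted LIG algorithm is designed to avoid. The flat category encoding and the names `combine`, `chart_for`, and `args_of_grammar` are hypothetical, and the three rules of Example 4.1, including its requirement that the target of the category matched with $x$ be $S$, are hard-coded in `combine`.

```python
from itertools import product


# Hypothetical flat encoding of a minimally parenthesized category:
# (target, ((slash, arg), ...)), with each arg itself in flat form.
def cat(target, *pairs):
    return (target, tuple(pairs))


A, B, S, T = cat("A"), cat("B"), cat("S"), cat("T")
LEXICON = {  # the lexical assignment f of Example 4.1
    "a": {A, cat("T", ("\\", A), ("/", T)), cat("T", ("\\", A))},
    "b": {B, cat("T", ("\\", B), ("/", T)), cat("T", ("\\", B))},
    "c": {cat("S", ("/", T))},
}


def args(c):
    return {arg for _, arg in c[1]}


def args_of_grammar(lexicon):
    """args(G): the union of args(c) over all lexical categories c."""
    return set().union(*(args(c) for cats in lexicon.values() for c in cats))


def combine(left, right):
    """All results of the three rules of Example 4.1, requiring tar(x) = S."""
    out = set()
    # y (x\y) -> x
    if right[0] == "S" and right[1] and right[1][-1] == ("\\", left):
        out.add((right[0], right[1][:-1]))
    # (x/y)(y\z1/z2) -> (x\z1/z2)  and  (x/y)(y\z1) -> (x\z1)
    if left[0] == "S" and left[1] and left[1][-1][0] == "/":
        y = left[1][-1][1]
        for pattern in (("\\", "/"), ("\\",)):
            k = len(right[1]) - len(pattern)
            if (k >= 0
                    and tuple(s for s, _ in right[1][k:]) == pattern
                    and (right[0], right[1][:k]) == y):
                out.add((left[0], left[1][:-1] + right[1][k:]))
    return out


def chart_for(word, lexicon):
    """Naive CKY: chart[(i, d)] holds every category deriving word[i:i+d]."""
    n = len(word)
    chart = {(i, 1): set(lexicon[word[i]]) for i in range(n)}
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            cell = set()
            for dp in range(1, d):
                for l, r in product(chart[(i, dp)], chart[(i + dp, d - dp)]):
                    cell |= combine(l, r)
            chart[(i, d)] = cell
    return chart


ARGS_G = args_of_grammar(LEXICON)  # {A, B, T}
chart = chart_for("abbcabb", LEXICON)
print(S in chart[(0, 7)])                                               # True
print(all(args(c) <= ARGS_G for cell in chart.values() for c in cell))  # True
```

Because $x$ is the only essential variable, the arities of the categories in the chart grow with the input (the cell for cabb holds $S\backslash A\backslash B\backslash B$), while their argument categories stay within $\mathit{args}(G) = \{A, B, T\}$.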
4.2 Recognition of CCG
The first step in modifying the LIG algorithm is to define the constants MSL and MTL for the case of CCG. Let $G = (V_T, V_N, S, f, R)$ be a CCG. These definitions follow immediately from the similarities between CCG combinatory rules and LIG productions.

Observation 4.2
If we were to express a combinatory rule
$$(x/y)\,(y|_1z_1\cdots|_mz_m) \to (x|_1z_1\cdots|_mz_m)$$
in terms of a LIG production
$$A(\cdot\cdot\,\gamma_1\cdots\gamma_m) \to A_p(\cdot\cdot\,\gamma_p)\;A_s(\alpha_s)$$
then we have the following correspondences:
• $\gamma_p$ with $/y$;
• $\gamma_i$ with $|_iz_i$ for $1 \le i \le m$, i.e., $\gamma_1\cdots\gamma_m$ with $|_1z_1\cdots|_mz_m$;
• $A = A_p$;
• $A_s(\alpha_s)$ with $y|_1z_1\cdots|_mz_m$.

Given such a direct correspondence between combinatory rules and LIG productions, we define the following constants, to be used in the CCG algorithm, with minimal explanation.
• MTL is the maximum arity of a lexical category. Thus, $\mathrm{MTL} = \max\{\, \mathit{arity}(c) \mid c \in f(a),\ a \in V_T \,\}$.
• MSL should be the maximum arity of a useful category that can match the secondary category of a rule. Note that a category matching $(y|_1z_1|_2\cdots|_mz_m)$ will have an arity that is the sum of $m$ and the arity of the category matching $y$. Furthermore, since $y$ is an argument of the primary category, it must be bound to a member of $\mathit{args}(G)$. Thus,
$$\mathrm{MSL} = \max\{\, m \mid (y|_1z_1|_2\cdots|_mz_m) \text{ is the secondary category of a rule in } R \,\} + \max\{\, \mathit{arity}(c) \mid c \in \mathit{args}(G) \,\}.$$
• Note that in the case of CCG, MCL need not be defined independently of MSL.
• As before, we define TTC as $\mathrm{TTC} = \max\{\mathrm{MSL}, \mathrm{MTL}\}$.

Since directional categories play the same role that stack symbols have in LIG, we revise the notions of length, $\mathit{top}(\,)$, and $\mathit{rest}(\,)$ as follows. We say that th