Parsing with Principles: Predicting a Phrasal Node Before Its Head Appears¹
Edward Gibson²
Department of Philosophy, Carnegie Mellon University
Pittsburgh, PA 15213
eafg@cad.cs.cmu.edu
1 Introduction
Recent work in generative syntactic theory has shifted the conception of a natural language grammar from a homogeneous set of phrase structure (PS) rules to a heterogeneous set of well-formedness constraints on representations (see, for example, Chomsky (1981), Stowell (1981), Chomsky (1986a) and Pollard & Sag (1987)). In these theories it is assumed that the grammar contains principles that are independent of the language being parsed, together with principles that are parameterized to reflect the varying behavior of different languages. However, there is more to a theory of human sentence processing than just a theory of linguistic competence. A theory of performance consists of both linguistic knowledge and a parsing algorithm. This paper will investigate ways of exploiting principle-based syntactic theories directly in a parsing algorithm in order to determine whether or not a principle-based parsing algorithm can be compatible with psycholinguistic evidence.
Principle-based parsing is an interesting research topic not only from a psycholinguistic point of view but also from a practical point of view. When PS rules are used, a separate grammar must be written for each language parsed. Each of these grammars contains a great deal of redundant information. For example, there may be two rules, in different grammars, that are identical except for the order of the constituents on the right hand side, indicating a difference in word order. This redundancy can be avoided by employing a universal phrase structure component (not necessarily in the form of rules) along with parameters and associated values. A principles and parameters approach provides a single compact grammar for all languages that would otherwise be represented by many different (and redundant) PS grammars.
Any model of human parsing must dictate: a) how structures are projected from the lexicon; b) how structures are attached to one another; and c) what constraints affect the resultant structures. This paper will concentrate on the first two components with respect to principle-based parsing algorithms: node projection and structure attachment.
Two basic control structures exist for any parsing algorithm: data-driven control and hypothesis-driven control. Even if a parser is predominantly hypothesis-driven, the predictions that it makes must at some point be compared with the data that are presented to it. Some data-driven component is therefore necessary for any parsing algorithm. Thus, a reasonable hypothesis to test is that the human parsing algorithm is entirely data-driven. This is exactly the approach that is taken by a number of principle-based parsing algorithms (see, for example, Abney (1986), Kashket (1987), Gibson & Clark (1987) and Pritchett (1987)). These parsing algorithms each include a node projection algorithm that projects an input word to a maximal category, but does not cause the projection of any further nodes. Although this strategy is attractive because of its simplicity, it turns out that it cannot account for certain phenomena observed in the processing of Dutch (Frazier (1987): see Section 2.1). A completely data-driven node projection algorithm also has difficulty accounting for the processing ease of adjective-noun constructions in English (see Section 2.2). As a result of this evidence, a purely data-driven node projection algorithm must be rejected in favor of a node projection algorithm that has a predictive (hypothesis-driven) component (Frazier (1987)).
1 Paper presented at the International Workshop on Parsing Technologies, August 28-31, 1989.
2 I would like to thank Robin Clark, Rick Kazman, Howard Kurtzman, Eric Nyberg and Brad Pritchett for their comments on earlier drafts of this paper, and I offer the usual disclaimer.
This paper describes a node projection algorithm that is part of the Constrained Parallel Parser (CPP) (Gibson (1987), Gibson & Clark (1987) and Clark & Gibson (1988)). This parser is based on the principles of Government-Binding theory (Chomsky (1981, 1986a)). Section 3.1 gives an overview of the CPP model, while Section 3.2 describes the node projection algorithm. Section 3.3 describes the attachment algorithm, and includes an example parse. These node projection and attachment algorithms demonstrate that a principle-based parsing algorithm can account for the Dutch and English data, while avoiding the existence of redundant phrase structure rules. Thus it is concluded that one should continue to investigate hypothesis-driven principle-based models in the search for an optimal psycholinguistic model.
2 Data-Driven Node Projection: Empirical Predictions and Results
2.1 Evidence from Dutch
Consider the sentence fragment in (1):
(1)
... dat het meisje van Holland ...
... "that the girl from Holland" ...
Dutch is like English in that prepositional phrase modifiers of nouns may follow the noun. Thus the prepositional phrase van Holland may be a modifier of the noun phrase the girl in example (1). Unlike English, however, Dutch is SOV in subordinate clauses. Hence in (1) the prepositional phrase van Holland may also be the argument of a verb to follow. In particular, if the word glimlachte ("smiled") follows the fragment in (1), then the prepositional phrase van Holland can attach to the noun phrase that it follows, since the verb glimlachte has no lexical requirements (see (2a)). If, on the other hand, the word houdt ("likes") follows the fragment in (1), then the PP van Holland must attach as argument of the verb houdt, since the verb requires such a complement (see (2b)).
(2)
a. ... dat [S [NP het meisje [PP van Holland]] [VP glimlachte]] ... "that the girl from Holland smiled" ...
b. ... dat [S [NP het meisje] [VP [V' [PP van Holland] [V houdt]]]] ... "that the girl likes Holland"
Following Abney (1986), Frazier (1987), Clark & Gibson (1988) and numerous others, it is assumed that attached structures are preferred over unattached structures. If we also assume that a phrasal node is not projected until its head is encountered, we predict that people will entertain only one hypothesis for the sentence fragment in (1): the modifier attachment. Thus we predict that it should take longer to parse the continuation houdt ("likes") than to parse the continuation glimlachte ("smiled"), since the continuation houdt forces the prepositional phrase to be reanalyzed as an argument of the verb. However, contrary to this prediction, the verb that allows argument attachment is actually parsed faster than the verb that necessitates modifier attachment in sentence fragments like (1). If the verb had been projected before its head was encountered, then the argument attachment of the PP van Holland would be possible at the same time that the modifier attachment is possible.³ Thus Frazier concludes that in some cases phrasal nodes must be projected before their lexical heads have been encountered.
3 It is beyond the scope of this paper to offer an explanation as to why the argument attachment is in fact preferred to the modifier attachment. This paper seeks only to demonstrate that the argument attachment possibility must at least be available for a psychologically real parser. See Abney (1986), Frazier (1987) and Clark & Gibson (1988) for possible explanations for the preference phenomenon.
2.2 Evidence from English
A second piece of evidence against this limited type of node projection is provided by the processing of noun phrases in English that have more than one pre-head constituent.
It is assumed that the primitive operation of attachment is associated with a certain processing cost. Hence the amount of time taken to parse a single input word is directly related to the number of attachments that the parser must execute to incorporate that structure into the existing structure(s). If a phrasal node is not projected until its head is encountered, then parsing the final word of a head-final construction will involve attaching all its pre-head structures at that point. If, in addition, there is more than one pre-head structure and no attachments are possible until the head appears, then a significant proportion of processing time should be spent in processing the head.
The hypothesis that a phrasal node is not projected until its head is encountered can be tested with the English noun phrase, since the head of an English noun phrase appears after a specifier and any adjectival modifiers. For example, consider the English noun phrase the big red book. First, the word the is read and a determiner phrase is built. Since it is assumed that nodes are not projected until their heads are encountered, no noun phrase is built at this point. The word big is now read and causes the projection of an adjective phrase. Attachments are now attempted between the two structures built thus far. Neither of the categories can be argument, specifier or modifier for the other, so no attachment is possible. The next word red now causes the projection of an adjective phrase, and once again no attachments are possible. Only when the word book is read and projected to a noun phrase can attachments take place. First the adjective phrase representing red attaches as a modifier of the noun phrase book. Then the AP representing big attaches as a modifier of the noun phrase just constructed. Finally the determiner phrase representing the attaches as specifier of the noun phrase big red book.
Thus if we assume that a phrasal node is not projected until its head is parsed, we predict that a greater number of attachments will take place in parsing the head than in parsing any other word in the noun phrase. Since it is assumed that an attachment is a significant parser operation, it is predicted that people should take more time parsing the head of the noun phrase than they take parsing the other words of the noun phrase. Since there is no psycholinguistic evidence that people take more time to process heads in head-final constructions, I hypothesize that phrasal nodes are being projected before their heads are being encountered.
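The attachment-count argument for the big red book can be made concrete with a small sketch. It is only an illustration under a simplifying assumption: every pre-head phrase waits unattached until a head noun arrives, and each waiting phrase then costs exactly one attachment. The function name and category labels are hypothetical.

```python
def attachments_per_word(categories):
    """Count the attachments triggered by each word when a phrase is
    projected only once its head has been read (assumed simplification)."""
    pending = 0      # pre-head phrases still waiting for a host
    counts = []
    for cat in categories:
        if cat == "N":
            # Head noun: every waiting phrase attaches at this point.
            counts.append(pending)
            pending = 0
        else:
            # Determiner or adjective: nothing to attach to yet.
            counts.append(0)
            pending += 1
    return counts


words = ["the", "big", "red", "book"]
cats = ["Det", "A", "A", "N"]
print(dict(zip(words, attachments_per_word(cats))))
```

Under these assumptions all three attachments fall on book, which is the processing asymmetry that the head-driven hypothesis predicts.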
3 Hypothesizing a Phrasal Node Before Its Head Appears
3.1 The Parsing Model: The Constrained Parallel Parser
This paper assumes the Constrained Parallel Parser (CPP) as its model of human sentence processing (see Gibson (1987), Gibson & Clark (1987) and Clark & Gibson (1988)). The CPP model is based on the principles of Government-Binding Theory (Chomsky (1981, 1986a)); crucially CPP has no separate module containing language-particular rules. Following Marcus (1980), structures parsed under the CPP model are placed on a stack and the most recently built structures are placed in a data structure called the buffer. The parser builds structure by attaching nodes in the buffer to nodes on top of the stack. Unlike Marcus's model, the CPP model allows multiple representations for the same input string to exist in a buffer or stack cell. Although multiple representations for the same input string are permitted, constraints on parallelism frequently cause one representation to be preferred over the others. Motivation for the parallel hypothesis comes from garden path effects and perception of ambiguity in addition to relative processing load effects. For information on the particular constraints and their motivations, see Gibson & Clark (1987), Clark & Gibson (1988) and the references cited in these papers.
3.1.1 Lexical Entries for CPP
A lexical entry accessed by CPP consists of, among other things, a theta-grid. A theta-grid is an unordered list of theta structures. Each theta structure consists of a thematic role and associated subcategorization information. One theta structure in a theta-grid may be marked as indirect to refer to its subject. For example, the word shout might have the following theta-grid:⁴
(3)
((Subcat = PREP, Thematic-Role = GOAL)
(Subcat = COMP, Thematic-Role = PROPOSITION))
When the word shout (or an inflected variant of shout) is encountered in an input phrase, the thematic role agent will be assigned to its subject, as long as this subject is a noun phrase. The direct thematic roles goal and proposition will be assigned to prepositional and complementizer phrases respectively, as long as each is present. Since the order of theta structures in a theta-grid is not relevant to its use in parsing, the above theta-grid for shout will be sufficient to parse both sentences in (4).
(4)
a. The man shouts [PP to the woman] [CP that Ernie sees the rock]
b. The man shouts [CP that Ernie sees the rock] [PP to the woman]
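The order-independence of the theta-grid can be illustrated with a short sketch. The dictionary field names and the helper `assign_roles` are hypothetical, and only the two direct theta structures listed in (3) are modeled; the matching procedure is an assumption for illustration, not the CPP implementation.

```python
# Theta-grid for "shout" as listed in (3); field names are illustrative.
SHOUT_GRID = [
    {"subcat": "PREP", "role": "GOAL"},
    {"subcat": "COMP", "role": "PROPOSITION"},
]


def assign_roles(grid, complement_categories):
    """Give each complement the role of an unused theta structure whose
    subcategorization matches; return None if a complement gets no role."""
    unused = list(grid)
    assigned = []
    for cat in complement_categories:
        match = next((t for t in unused if t["subcat"] == cat), None)
        if match is None:
            return None
        unused.remove(match)   # each theta structure is used at most once
        assigned.append((cat, match["role"]))
    return assigned


print(assign_roles(SHOUT_GRID, ["PREP", "COMP"]))   # complement order of (4a)
print(assign_roles(SHOUT_GRID, ["COMP", "PREP"]))   # complement order of (4b)
```

Because the grid is searched rather than read in sequence, both complement orders in (4) receive the same role assignments.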
3.1.2 X-bar Theory in CPP
The CPP model assumes X-bar Theory as presented in Chomsky (1986b). X-bar Theory has two basic principles: first, each tree structure must have a head; and second, each structure must have a maximal projection. As a result of these principles and other principles (e.g., the θ-Criterion, the Extended Projection Principle, Case Theory), the positions of arguments, specifiers and modifiers with respect to the head of a given structure are limited. In particular, a specifier may only appear as a sister to the one-bar projection below a maximal projection, and the head, along with its arguments, must appear below the one-bar projection. The order of the specifier and arguments relative to the head is language dependent. For example, the basic structure of English categories is shown below. Furthermore, binary branching is assumed (Kayne (1983)), so that modifiers are Chomsky-adjoined to the two-bar or one-bar levels, giving one possible structure for a post-head modifier below on the right.
[Figure: two tree diagrams. Left: the basic structure of English categories, with a Specifier preceding the head X and its Argument*. Right: the same structure with a post-head Modifier adjoined.]
3.1.3 The CPP Parsing Algorithm
The CPP algorithm is essentially very simple. A word is projected via node projection (see Section 3.2) into the buffer. If attachments are possible between the buffer and the top of the stack, then the results of these attachments are placed into the buffer and the stack is popped. Attachments are attempted again until no longer possible. This entire procedure is repeated for each word in the input string. The formal CPP algorithm is given below:
1. (Initializations) Set the stack to nil. Set the buffer to nil.
2. (Ending Condition) If the end of the input string has been reached and the buffer is empty then return the contents of the stack and stop.
3. If the buffer is empty then project nodes for each lexical entry corresponding to the next word in the input string, and put this list of maximal projections into the buffer.
4. Make all possible attachments between the stack and the buffer, subject to the attachment constraints (see Clark & Gibson (1988)). Put the attached structures in the buffer. If no attachments are possible, then put the contents of the buffer on top of the stack.
5. Go to 2.
4 In a more complete theory, a syntactic category would be determined from the thematic role (Chomsky (1986a)).
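The five steps of the algorithm can be sketched as a control loop. This is a minimal sketch, not the CPP implementation: the callbacks `project` and `attach`, the toy lexicon and the toy attachment rule are all hypothetical stand-ins for the node projection algorithm and the attachment constraints, which are not reproduced here.

```python
def cpp_parse(words, project, attach):
    """Control loop of steps 1-5. `project(word)` returns a list of maximal
    projections; `attach(top, buffer)` returns the list of attached
    structures (empty if none). Both callbacks are hypothetical."""
    stack, buffer = [], []                       # step 1
    remaining = list(words)
    while True:
        if not remaining and not buffer:         # step 2
            return stack
        if not buffer:                           # step 3
            buffer = project(remaining.pop(0))
        attached = attach(stack[-1], buffer) if stack else []   # step 4
        if attached:
            stack.pop()
            buffer = attached
        else:
            stack.append(buffer)
            buffer = []
        # step 5: loop back to step 2


def toy_project(word):
    lexicon = {"the": ["Det"], "rock": ["N", "V"]}   # toy lexicon (assumption)
    return [(cat, word) for cat in lexicon[word]]


def toy_attach(top, buffer):
    # Toy rule: a determiner on the stack combines with a noun in the buffer.
    return [("NP", f"{d} {n}") for (c1, d) in top for (c2, n) in buffer
            if c1 == "Det" and c2 == "N"]


print(cpp_parse(["the", "rock"], toy_project, toy_attach))
```

Each stack cell and the buffer hold a list of structures, so several representations of the same input can coexist in one cell, as the model requires. In the toy run the verb reading of rock is dropped because only the noun reading attaches.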
3.2 The Projection of Nodes from the Lexicon
Node projection proceeds as follows. First a lexical item is projected to a phrasal node: a Confirmed node (C-node). Following X-bar Theory, each lexical entry for a given word is projected maximally. For example, the word rock, which has both a noun and a verb entry, would be projected to at least two maximal projections:
(5)
a. [NP [N' [N rock]]]
b. [VP [V' [V rock]]]
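C-node projection for an ambiguous word can be sketched in a few lines. The lexicon contents and the bracketed-string representation are assumptions chosen only to reproduce the structures in (5).

```python
# Toy lexicon mapping each word to its category entries (assumption).
LEXICON = {"rock": ["N", "V"], "the": ["Det"]}


def project_c_nodes(word):
    """Return one maximal projection [XP [X' [X word]]] per lexical entry."""
    return [f"[{cat}P [{cat}' [{cat} {word}]]]" for cat in LEXICON[word]]


print(project_c_nodes("rock"))
```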
Next, the parser hypothesizes nodes whose heads may appear immediately to the right of the given C-node. These predicted structures are called hypothesized nodes or H-nodes. An H-node is defined to be any node whose head is to the right of all lexical input. In order to determine which H-node structures to hypothesize from a given C-node, it is necessary to consult the argument properties associated with the C-node together with the specifier and modifier properties of the nodal category and the word order properties of the language in question. It is assumed that the ability of one category to act as specifier, modifier or argument of another category is part of unparameterized Universal Grammar. On the other hand, the relative order of two categories is assumed to be parameterized across different languages. For example, a determiner phrase, if it exists in a given language, is universally allowable as a specifier of a noun phrase. Whether the determiner appears before or after its head noun depends on the language-particular values associated with the parameters that determine word order.
Three parameters are proposed to account for variation in word order, one for each of argument, specifier and modifier projections.⁵ For each language, each of these parameters is associated with at least one value, where the parameter values come from the following set: {*head*, *satellite*}.⁶ The value *head* indicates that a category C causes the projection to the right of those categories for which C may be head. Thus this value indicates head-initial word order. The value *satellite* indicates that a category C causes the projection to the right of those categories for which C may be a satellite category. Hence this value indicates head-final word order. H-node projection from a category C is defined in (6).
(6)
(Argument, Specifier, Modifier) H-Node Projection from category C: If the value associated with the (argument, specifier, modifier) projection parameter is *head*, then cause the projection of (argument, specifier, modifier) satellites, and attach them to the right below the appropriate projection of C. If the value associated with the (argument, specifier, modifier) projection parameter is *satellite*, then cause the projection of (argument, specifier, modifier) heads, and attach them to the right above the appropriate projection of C.
5 Furthermore, it is assumed that the value of the modifier projection parameter defaults to the value of the argument projection parameter.
6 I will use the term satellite to indicate non-head constituents: arguments, specifiers and modifiers.
In English the argument projection parameter is set to *head*, so that arguments appear after the head. Hence, if a lexical entry has requirements that must be filled, then structures corresponding to subcategorized categories are hypothesized and attached. For example, the verb see subcategorizes for a noun phrase, so an empty noun phrase node is hypothesized and attached as argument of the verb:
(7)
[VP [V' [V see] [NP e]]]
The specifier projection parameter, on the other hand, is set to the value *satellite* in English so that specifiers appear before their heads. If the category associated with a C-node is an allowable specifier for other categories, then an H-node projection of each of these categories is built and the C-node specifier is attached to each. For example, since a determiner may specify a noun phrase, an H-node noun phrase is hypothesized when parsing a determiner in English:
(8)
[NP [DetP [Det' [Det the]]] [N' [N e]]]
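The English settings described above can be sketched as follows. This is a simplified illustration, not the definition in (6): it covers only argument projection under the value *head* and specifier projection under the value *satellite*, and the tables `SUBCAT` and `SPECIFIES`, like the bracketed-string output, are hypothetical.

```python
# English parameter values as stated in the text.
PARAMS = {"argument": "head", "specifier": "satellite"}
SUBCAT = {"see": ["NP"]}       # toy subcategorization frames (assumption)
SPECIFIES = {"Det": ["N"]}     # a determiner may specify a noun phrase


def h_node_projections(cat, word):
    """Build the structures with hypothesized (empty) nodes for one C-node."""
    c_head = f"[{cat} {word}]"
    results = []
    if PARAMS["argument"] == "head":
        # *head*: hypothesize empty argument nodes to the right, below X'.
        args = "".join(f" [{a} e]" for a in SUBCAT.get(word, []))
        if args:
            results.append(f"[{cat}P [{cat}' {c_head}{args}]]")
    if PARAMS["specifier"] == "satellite":
        # *satellite*: hypothesize each head this category may specify and
        # attach the C-node as its specifier.
        c_node = f"[{cat}P [{cat}' {c_head}]]"
        for h in SPECIFIES.get(cat, []):
            results.append(f"[{h}P {c_node} [{h}' [{h} e]]]")
    return results


print(h_node_projections("V", "see"))
print(h_node_projections("Det", "the"))
```

With these toy tables the two calls yield the structures shown in (7) and (8) respectively.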
Thus the node projection algorithm provides a new derivation of language-particular word order. In previous principle-based systems, word order is derived from parameterized direction of attachment (see Gibson & Clark (1987), Nyberg (1987), Wehrli (1988)). An attachment takes place from buffer to stack in head-initial constructions and from stack to buffer in head-final constructions. Since attachment is now a uniform operation as defined in (17), this parameterization is no longer necessary. Instead, in head-initial constructions, nodes now project to the nodes that they may immediately dominate. In head-final constructions, nodes now project to those nodes that they may be immediately dominated by.
The projection parameters as defined in (6) account for many facts about word order across languages. However, most, if not all, languages have cases that do not fit this clean picture. For example, while modifiers in English are predominantly post-head, adjectives appear before the head. A single global value for modifier projection predicts that this situation is impossible. Hence we must assume that the values given for the projection parameters are only default values. In order to formalize this idea, I assume the existence of a hierarchy of categories and words as shown below:
Category
    Noun: Ernie, rock, ...
    Verb: see, eat, ...
    Adposition: to, on
It is assumed that the value for each of the projection parameters is the default value for that projection type with respect to a particular language. However, a particular category or word may have a value associated with it for a projection parameter in addition to the default one. If this is the case, then only the most specific value is used. For example, in English, the category adjective is associated with the value *satellite* with respect to modifier projection. Thus English adjectives appear before the head. The adjective tall will therefore cause the projection of both a C-node adjective phrase and an H-node noun phrase:
(9)
a. [AP tall ]
b. [NP [N' [AP tall ] [N' [N e ]]]]
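The "most specific value wins" rule can be illustrated with a small lookup over the hierarchy. This is only a sketch under stated assumptions: the table names (PARENT, DEFAULTS, SPECIFIC), the function projection_value, and the use of *head* as the English default for modifier projection are hypothetical choices made for illustration, not part of the parser described here.

```python
# Hypothetical fragment of the category/word hierarchy: child -> parent.
PARENT = {
    "Noun": "Category",
    "Verb": "Category",
    "Adjective": "Category",
    "rock": "Noun",
    "tall": "Adjective",
}

# Assumed language-wide default for English modifier projection.
DEFAULTS = {"modifier": "*head*"}

# More specific values attached to a category or word. From the text:
# English adjectives carry *satellite* for modifier projection.
SPECIFIC = {("Adjective", "modifier"): "*satellite*"}


def projection_value(item, parameter):
    """Return the most specific value found on the path from item to the root."""
    node = item
    while node is not None:
        if (node, parameter) in SPECIFIC:
            return SPECIFIC[(node, parameter)]
        node = PARENT.get(node)  # climb one level; None once past the root
    return DEFAULTS[parameter]


print(projection_value("tall", "modifier"))  # value inherited from Adjective
print(projection_value("rock", "modifier"))  # falls back to the default
```

Because the search starts at the word itself and climbs upward, a value stored on a word would override one stored on its category, which in turn overrides the language default.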
If recursive application of projection to H-nodes were allowed, then it would be possible, in principle, to project an infinite number of nodes from a single lexical entry. In English, for example, a genitive noun phrase can specify another noun phrase. This noun phrase may also be a genitive noun phrase, and so on. If H-nodes could project to further H-nodes, then it would be necessary to hypothesize an infinite number of genitive NP H-nodes for every genitive NP that is read. As a result of this difficulty, the H-node Projection Constraint is proposed:
(10)
The H-node Projection Constraint: Only a C-node may cause the projection of an H-node.
As a result of the H-node Projection Constraint, H-nodes may not invoke H-node projection. For example, if a specifier causes the projection of its head, the resulting head cannot then cause the projection of those categories that it may specify. As a result, the number of nodes that may be projected from a single lexical item is severely restricted.
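The effect of the constraint on the genitive case can be sketched as follows. The Node class, the MAY_PREDICT table and both function names are hypothetical; the table deliberately contains a cycle (a genitive NP predicting a genitive NP) to show that projection still terminates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    category: str
    kind: str  # "C" for a C-node, "H" for an H-node


# Hypothetical table: categories that a node of a given category may predict.
# A genitive NP can specify an NP that may itself be genitive, so the
# relation is cyclic.
MAY_PREDICT = {"GenitiveNP": ["GenitiveNP"], "DetP": ["NP"]}


def project_h_nodes(node):
    """H-nodes predicted by node, honoring the H-node Projection Constraint."""
    if node.kind != "C":
        return []  # an H-node may not cause further H-node projection
    return [Node(cat, "H") for cat in MAY_PREDICT.get(node.category, [])]


def project(c_node):
    """All structures built from one C-node: itself plus its H-nodes."""
    predicted = project_h_nodes(c_node)
    # Applying projection again to the predicted H-nodes adds nothing.
    further = [h for p in predicted for h in project_h_nodes(p)]
    return [c_node] + predicted + further


print(project(Node("GenitiveNP", "C")))
```

Without the kind check in project_h_nodes, repeatedly projecting from the predicted nodes would never terminate for the cyclic genitive entry.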
3.3
Node Attachment
Given the above node projection algorithm, it is necessary to define an algorithm for attachment of nodes. Since structures are predicted by the node projection algorithm, the attachment algorithm must dictate how subsequent structures match these predictions. Consider the following two examples from English: the first is an example of specifier attachment; the second is an example of argument attachment. In English, specifiers precede the head and arguments follow the head. It is desirable for the attachment algorithm to handle both kinds of attachments without stipulations particular to word order.
First, suppose that the word the is on the stack as both a determiner phrase and an H-node noun phrase. Furthermore, suppose that the word woman is projected into the buffer as both a noun phrase and an H-node clausal phrase:7
(11)
Stack: [DetP [Det' [Det the ]]]
[NP [DetP [Det' [Det the ]]] [N' [N e ]]]
Buffer: [NP [N' [N woman ]]]
[XP_clausal [NP [N' [N woman ]]] [X'_clausal [X_clausal e ]]]
The attachment algorithm should allow two attachments at this point: the H-node NP on the stack uniting with each NP C-node in the buffer. It might also seem reasonable to allow the bare determiner phrase to attach directly as specifier of each noun phrase. However, this kind of attachment is undesirable for two reasons. First of all, it makes the attachment operation a disjunctive operation: an attachment would involve either matching an H-node or meeting the satellite requirements of a category. Second of all, it makes H-node projection unnecessary in most situations and therefore somewhat stipulative. That is, allowing a disjunctive attachment operation would permit many derivations that never use an H-node, so that the need for H-nodes would be restricted to head-final constructions with pre-head satellites (see Section 2). It is therefore desirable for all attachments to involve matching an H-node.
Two structures should be returned after attachments in (11): a C-node noun phrase and an H-node clausal phrase:
(12)
a. [NP [DetP the ] [N' [N woman ]]]
b. [XP_clausal [NP [DetP the ] [N' [N woman ]]] [X'_clausal [X_clausal e ]]]
Now consider an English argument attachment. Suppose that a prepositional phrase representing the word beside is on the stack and the noun Frank is represented in the buffer as a noun phrase and a clausal phrase:
(13)
Stack: [PP [P' [P beside ] [NP e ]]]
Buffer: [NP [N' [N Frank ]]]
[XP_clausal [NP [N' [N Frank ]]] [X'_clausal [X_clausal e ]]]
7 A noun phrase is projected to an H-node clausal (or predicate) phrase since nouns may be the subjects of predicates.
Since the preposition beside subcategorizes for a noun phrase, there is an H-node NP attached as its object. The attachment algorithm should allow a single attachment at this point: the noun phrase representing Frank uniting with the H-node NP object of beside:
(14)
[PP [P' [P beside ] [NP Frank ]]]
As should be clear from the two examples, the process of attachment involves comparing a previously predicted category with a current category. If the two categories are compatible, then attachment may be viable.
3.3.1
Node Compatibility
Compatibility is defined in terms of unification, which is defined in terms of subsumption.8 A structure X is said to subsume a structure Y if X is more general than Y. That is, X contains less specific information than Y. So, for example, a structure that is specified as clausal (e.g., head of a predicate), but is not specified for a particular category, subsumes a structure having the category verb, since verbs are predicative and thus clausal categories. Hence structure (15a) subsumes structure (15b):
(15)
a. [XP_clausal [X'_clausal [X_clausal e ]]]
b. [VP [V' [V walk ]]]
The unification operation is the least upper bound operator in the subsumption ordering on information in a structure. Since structure (15a) subsumes structure (15b), the result of unifying structure (15a) with structure (15b) is structure (15b).
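As a rough illustration, subsumption and unification can be sketched over flat feature sets. Representing a structure as a Python dict of features, returning None for a failed (nil) unification, and the names subsumes, unify and compatible are all assumptions made for this sketch; real structures also carry tree shape, which is ignored here.

```python
def subsumes(x, y):
    """True if x is at least as general as y: every feature of x recurs in y."""
    return all(k in y and y[k] == v for k, v in x.items())


def unify(x, y):
    """Merge two feature sets; return None (nil) if any feature conflicts."""
    merged = dict(x)
    for k, v in y.items():
        if k in merged and merged[k] != v:
            return None
        merged[k] = v
    return merged


def compatible(x, y):
    """Two structures count as compatible when their unification succeeds."""
    return unify(x, y) is not None


clausal = {"clausal": True}                # like (15a): no category given
walk = {"clausal": True, "category": "V"}  # like (15b)
noun = {"category": "N"}

print(subsumes(clausal, walk))       # the general structure subsumes the specific one
print(unify(clausal, walk) == walk)  # unifying the two yields the more specific one
print(compatible(walk, noun))        # conflicting categories do not unify
```

When one structure subsumes the other, unification simply returns the more specific of the two, which mirrors the relation between (15a) and (15b).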
Two structures are compatible if the unification of the two structures is non-nil. The information on a structure that is relevant to attachment consists of the node's bar level (e.g., zero level, intermediate or maximal), and the node's lexical features (e.g., category, case, etc.).
3.3.2
Attachment
Roughly speaking, the attachment operation should locate an H-node in a structure on the stack along with
a compatible node in a structure in the buffer. If both of these structures have parent tree structures, then
these parent tree structures must also be compatible. In order to keep the process of attachment simple, it
is proposed that each attachment have at most one compatibility check. This constraint is given in (16):9
(16)
Attachment Constraint: At most one nontrivial lexical feature unification is permitted per attachment.
A nontrivial unification is one that involves two nontrivial structures; a trivial unification is one that
involves at least one trivial structure. For example, if the parent node of the buffer site is as yet undefined,
then the parent node of the stack site trivially unifies with this parent node. Only when both parents are
defined is there a nontrivial unification.
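The trivial/nontrivial distinction can be sketched as a simple count. Modelling an undefined structure as None and the function names below are assumptions for illustration only; the feature dicts stand in for whatever lexical features a node carries.

```python
def is_trivial(a, b):
    """A unification is trivial when at least one structure is undefined (None)."""
    return a is None or b is None


def nontrivial_count(stack_site, buffer_site, stack_parent=None, buffer_parent=None):
    """Count the nontrivial lexical feature unifications one attachment would need."""
    pairs = [(stack_site, buffer_site), (stack_parent, buffer_parent)]
    return sum(not is_trivial(a, b) for a, b in pairs)


def satisfies_attachment_constraint(stack_site, buffer_site,
                                    stack_parent=None, buffer_parent=None):
    """Check (16): at most one nontrivial unification per attachment."""
    return nontrivial_count(stack_site, buffer_site, stack_parent, buffer_parent) <= 1


np_h = {"category": "N"}   # hypothetical H-node site on the stack
np_c = {"category": "N"}   # hypothetical C-node site in the buffer
pp = {"category": "P"}     # hypothetical parent above the stack site
xp = {"clausal": True}     # hypothetical parent above the buffer site

print(satisfies_attachment_constraint(np_h, np_c))
print(satisfies_attachment_constraint(np_h, np_c, stack_parent=pp))
print(satisfies_attachment_constraint(np_h, np_c, stack_parent=pp, buffer_parent=xp))
```

Only the last call needs two nontrivial unifications, one for the sites and one for their parents, and so it is the only one that fails the check.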
Consider the effect of the following three requirements: first, the lexical features of the stack and buffer
attachment sites must be compatible; second, the tree structures above the buffer and stack attachment sites
must be compatible; and third, at most one nontrivial lexical feature unification is permissible per attachment, (16).
Since any attachment must involve at least one nontrivial lexical feature unification, that of the stack and
buffer sites, any additional nontrivial unifications will violate the attachment constraint in (16). If both
the buffer and stack attachment sites have parent tree structures, then the lexical features of these parents
will need to be unified. Since the child structures will also need to be unified, (16) will be violated. Thus
it follows that, in an attachment, either the buffer site or the stack site has no parent tree structure.10
8 See Shieber (1986) for background on the possible uses of unification in particular grammar formalisms.
9 In fact, this constraint follows from two assumptions: first, a compatibility check takes a certain amount of processing time; and second, attachments that take less time are preferred over those that take more time. See Gibson (forthcoming) for further discussion.
Since the order of the words in the input must be maintained in a final parse, only those nodes in a buffer
structure that dominate all lexical items in that structure are permissible as attachment sites. For example,
suppose that the buffer contained a representation for the noun phrase women in college. Furthermore,
suppose that there is an H-node NP on the stack representing the word the. Although it would be suitable
for the buffer structure representing the entire noun phrase women in college to match the stack H-node, it
would not be suitable for the C-node NP representing college to match this H-node. This attachment would
result in a structure that moved the lexical input women in to the left of the lexical input dominated by
the matched H-node, producing a parse for the input women in the college. Since the word order of the
input string must be maintained, sites for buffer attachment must dominate all lexical items in the buffer
structure.
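The restriction on buffer sites can be sketched with a small tree type. The Tree class, its maximal flag, the helper names, and the particular bracketing assumed for women in college are all hypothetical; the point is only that the inner NP for college is excluded because it does not dominate every word in the buffer structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None
    maximal: bool = False  # True for maximal projections


def words(t):
    """Lexical items dominated by t, in left-to-right order."""
    if t.word is not None:
        return [t.word]
    return [w for c in t.children for w in words(c)]


def nodes(t):
    """All nodes of t in preorder."""
    yield t
    for c in t.children:
        yield from nodes(c)


def buffer_sites(root):
    """Maximal projections that dominate every word of the buffer structure."""
    everything = words(root)
    return [n for n in nodes(root) if n.maximal and words(n) == everything]


# Assumed bracketing: [NP [N' [N women ] [PP [P' [P in ] [NP [N' [N college ]]]]]]]
college = Tree("NP", [Tree("N'", [Tree("N", word="college")])], maximal=True)
pp = Tree("PP", [Tree("P'", [Tree("P", word="in"), college])], maximal=True)
women_in_college = Tree("NP", [Tree("N'", [Tree("N", word="women"), pp])], maximal=True)

sites = buffer_sites(women_in_college)
print([words(s) for s in sites])
```

Under this bracketing the only permissible site is the outermost NP, so matching the stack H-node against it preserves the input order.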
Once suitable maximal projections in each of the buffer and stack structures have been identified for
matching, it is still necessary to check that their internal structures are compatible. For example, suppose
that an identified buffer site is a C-node whose head allows exactly one specifier and a specifier is already
attached. If the stack H-node site also contains a specifier, then the attachment should be blocked. On the
other hand, if the stack H-node site does not contain a specifier, and other requirements are satisfied, then
the attachment should be allowed.
Testing for internal structure compatibility is quite simple if all tree structures are assumed to be binary
branching ones. The only possible attachment sites inside the stack H-node are those nodes that dominate
no other nodes. As long as there is some buffer node that both dominates all the buffer input and matches
the H-node attachment site for bar level, then the attachment is possible.
A ttachm ent is formally defined in (17):
(17)
A structure
Win the buffer can attach to a structure
Xon the stack iff all of
(a ), (b ), (c ), (d)
and
(a)
are true:
a.
Structure
Wcontains a maximal projection node,
Y ,such th a t
Ydominates all lexical material in
W \b.
Structure
Xcontains a maximal projection H-node structure, Z;
c. The tree stru ctu re above
Yis compatible with the tree structure above
Z,
subject to the attachm ent
constraint in (16);
d. The lexical features of structure
Yare compatible with the lexical features of structure Z;
e. Structure
Yis
bar-level compatible
with structure Z.
Bar-level compatibility is defined in (18):
(18)
A structure U in the buffer is bar-level compatible with a structure V on the stack iff all of (a), (b) and (c) are true:
a. Structure U contains a node, S, such that S dominates all lexical material in U;
b. Structure V contains an H-node structure, T, that dominates no lexical material;
c. The bar level of S is compatible with the bar level of T.
If attachment is viable, then W contains a structure Y that is bar-level compatible with a structure Z
that is part of X. Since Y and Z are bar-level compatible, there are structures S and T inside Y and Z
respectively, that satisfy the conditions of bar-level compatibility, (18).
10 It might seem that some possible attachments are being thrown away at this point. That is, in principle, there might be a structure that can only be formed by attaching a buffer site to a stack site where both sites have parent tree structures. This attachment would be blocked by (16). However, it turns out that any attachment that could have been formed by an attachment involving more than one lexical feature unification can always be arrived at by a different attachment involving a single lexical feature unification. For the proof, see Gibson (forthcoming).
When the conditions for attachment are satisfied, structures W and X are united in the following way.
First, W and X are copied to nodes W' and X' respectively. Inside X' there is a node, Z', that is a copy of
Z. The lexical features of Z' are set to the unification of the lexical features of structures Y and Z. Next,
structure T' in Z' (corresponding to structure T in Z) is replaced by S', the copy of structure S inside W'.
The bar level of S' is set to the unification of the bar levels of structures S and T.
Finally, the tree structures above Y and Z are unified and this tree structure is attached above Z'. That
is, if Z has some parent tree structure and Y does not, then the copy of this structure inside X' is attached
above Z'. Similarly, if Y has some parent tree structure and Z does not, then the copy of this structure
inside W' is attached above Z'. If neither node has any parent tree structure (i.e., W = Y, X = Z), then
the unification is trivial and no attachment is made. Since Y and Z cannot both have parent tree structures
(see (16) and the discussion following it), unifying the parent tree structures is a very simple process.
3.3.3
Example Attachments
As an illustration of how attachments take place, consider once again the noun phrase the big red book.
First the determiner the is read and is projected to a C-node determiner phrase. Since a determiner is allowable
as the specifier of a noun phrase and specifiers occur before the head in English, an H-node NP is also built.
These two structures are depicted in (19):
(19)
a. [DetP the ]
b. [NP [DetP the ] [N'_1 [N e ]]]
Since there is nothing on the stack, these structures are shifted to the top of the stack. The word big
projects to both a C-node AP and an H-node NP since an adjective is allowable as a pre-head modifier in
English. These two structures are placed in the buffer (depicted in (20)).
(20)
a. [AP big ]
b. [NP [N'_2 [AP big ] [N' [N e ]]]]
An attachment between nodes (19b) and (20b) is now attempted. Note that: a) node (20b) is a maximal
projection dominating all lexical material in its buffer structure; b) node (19b) is a maximal projection
H-node on the stack; c) the tree structures above these two nodes are compatible (both are undefined); and
d) the categories of the two nodes are compatible. It remains to check for bar-level compatibility of the two
structures. Since: a) the N'_2 in structure (20b) dominates all the buffer input; b) the H-node N'_1
in structure (19b) dominates no C-nodes; and c) N'_1 and N'_2 are compatible in bar level, the structures in
(19b) and (20b) can be attached. The two structures are therefore attached by uniting N'_1 and N'_2. The resultant
structure is given in (21):
(21)
[NP [DetP the ] [N' [AP big ] [N'_3 [N e ]]]]
Structure (21), the only possible attachment between the buffer and the stack, is placed back in the
buffer, and the stack is popped. Since there is now nothing left on the stack, no further attachments are
possible at this time. Structure (21) is thus shifted to the stack. The word red now enters the buffer as a
C-node adjective phrase and an H-node noun phrase:
(22)
a. [AP red ]
b. [NP [N'_4 [AP red ] [N' [N e ]]]]
An attachment between nodes (21) and (22b) is now attempted. Requirements (17a)-(17d) are satisfied
and the requirement for bar-level compatibility is satisfied by the node labeled N'_3 in (21) together with N'_4
in (22b). Hence the structures are united, giving (23):
(23)
[NP [DetP the ] [N' [AP big ] [N' [AP red ] [N' [N e ]]]]]
Since (23) is the only possible attachment between the buffer and the stack, it is placed in the buffer
and the stack is popped. Since the stack is now empty, structure (23) shifts to the stack. The noun
book now enters the buffer as both a C-node noun phrase and an H-node clausal phrase:
(24)
a. [NP [N' [N book ]]]
b. [XP_clausal [NP [N' [N book ]]] [X'_clausal [X_clausal e ]]]
Two attachm ents are possible at this point. The NP structure in (23) unites with each NP C-node on
the stack, resulting in the structures in (25):
(25)
a - [vp [
D e t Pthe ] [v'
[a pbig ] [v'
[a pred ] [^/
[ ^ >book ]] [pp e]
[ C p e]]]]]
b - [xp«i..„
[n pthe big red book ]
e]]]
Note that only one attachment per structure takes place in the final parse step. Crucially, no more
attachments per structure take place when parsing the head of the noun phrase than when parsing the pre-head
constituents in the noun phrase.11 Thus, in contrast with the situation when nodes are only projected
when their heads are encountered, the node projection and attachment algorithms described here predict
that there should not be any slowdown when parsing the head of a head-final construction.
The Dutch data described in Section 2.1 are handled in a similar manner.
4
Conclusions
This paper has described a) a principle-based algorithm for the projection of phrasal nodes before their
heads are parsed, and b) an algorithm for attaching the predicted nodes. It is worthwhile to compare the
new projection algorithm with algorithms that do not project H-nodes. The projection algorithm provided
here involves more work and hence, on the surface, may seem somewhat stipulative compared to one that
does not project H-nodes. However, it turns out that although projecting to H-nodes is more complicated
than not doing so, attachment when H-nodes are not present is more complicated than attachment when
they are present. That is, if a projection algorithm does not cause the projection of H-nodes, it will have a more
complicated attachment algorithm. For example, if H-nodes are projected when parsing the noun phrase
the woman, the determiner the is immediately projected to an H-node noun phrase, which leads to a simple
attachment. If H-nodes are not projected, then projection is easier, but attachment is that much more
complicated. When attaching, it will be necessary to check if a determiner is an allowable specifier of a noun
phrase: the same operation that is performed when projecting to H-nodes. Thus although the complexity of
particular components changes, the complexity of the entire parsing algorithm does not change, whether or
not H-nodes are projected. Since the proposed projection and attachment algorithms make better empirical
predictions than ones that do not predict structure, the new algorithms are preferred.
11 Note that it is the number of attachments per structure that is crucial here, and not the number of total attachments, since attachments made upon two independent structures may be performed in parallel, whereas attachments made on the same structure must be performed serially. For example, since structures (24a) and (24b) are independent, attachments may be made to each of these in parallel. But if an attachment B relies on the result of another attachment A, then attachment A must be performed first.
5
References
Abney, S. (1986), "Licensing and Parsing", Proceedings of the Seventeenth North East Linguistic Society Conference, MIT, Cambridge, MA.
Chomsky, N. (1981), Lectures on Government and Binding, Foris, Dordrecht, The Netherlands.
Chomsky, N. (1986a), Knowledge of Language: Its Nature, Origin and Use, Praeger Publishers, New York, NY.
Chomsky, N. (1986b), Barriers, Linguistic Inquiry Monograph 13, MIT Press, Cambridge, MA.
Clark, R. & Gibson, E. (1988), "A Parallel Model for Adult Sentence Processing", Proceedings of the Tenth Cognitive Science Conference, McGill University, Montreal, Quebec.
Frazier, L. (1987), "Syntactic Processing Evidence from Dutch", Natural Language and Linguistic Theory 5, pp. 519-559.
Gibson, E. (1987), Garden-Path Effects in a Parser with Parallel Architecture, Eastern States Conference on Linguistics, Columbus, Ohio.
Gibson, E. (forthcoming), Parsing with Principles: A Computational Theory of Human Sentence Processing, Ms., Carnegie Mellon University, Pittsburgh, PA.
Gibson, E. & Clark, R. (1987), "Positing Gaps in a Parallel Parser", Proceedings of the Eighteenth North East Linguistic Society Conference, University of Toronto, Toronto, Ontario.
Kashket, M. (1987), Government-Binding Parser for Warlpiri, a Free Word Order Language, MIT Master's Thesis, Cambridge, MA.
Kayne, R. (1983), Connectedness and Binary Branching, Foris, Dordrecht, The Netherlands.
Marcus, M. (1980), A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA.
Nyberg, E. (1987), "Parsing and the Acquisition of Word Order", Proceedings of the Fourth Eastern States Conference on Linguistics, The Ohio State University, Columbus, OH.
Pollard, C. & Sag, I. (1987), An Information-based Syntax and Semantics, CSLI Lecture Notes Number 13, Menlo Park, CA.
Pritchett, B. (1987), Garden Path Phenomena and the Grammatical Basis of Language Processing, Harvard University Ph.D. dissertation, Cambridge, MA.
Shieber, S. (1986), An Introduction to Unification-based Approaches to Grammar, CSLI Lecture Notes Number 4, Menlo Park, CA.
Stowell, T. (1981), Origins of Phrase Structure, MIT Ph.D. dissertation.
Wehrli, E. (1988), "Parsing with a GB Grammar", in U. Reyle and C. Rohrer (eds.), Natural Language Parsing and Linguistic Theories, 177-201, Reidel, Dordrecht, the Netherlands.