Parsing with Principles: Predicting a Phrasal Node Before Its Head Appears


Edward Gibson
Department of Philosophy, Carnegie Mellon University
Pittsburgh, PA 15213
eafg@cad.cs.cmu.edu

1 Introduction

Recent work in generative syntactic theory has shifted the conception of a natural language grammar from a homogeneous set of phrase structure (PS) rules to a heterogeneous set of well-formedness constraints on representations (see, for example, Chomsky (1981), Stowell (1981), Chomsky (1986a) and Pollard & Sag (1987)). In these theories it is assumed that the grammar contains principles that are independent of the language being parsed, together with principles that are parameterized to reflect the varying behavior of different languages. However, there is more to a theory of human sentence processing than just a theory of linguistic competence. A theory of performance consists of both linguistic knowledge and a parsing algorithm. This paper will investigate ways of exploiting principle-based syntactic theories directly in a parsing algorithm in order to determine whether or not a principle-based parsing algorithm can be compatible with psycholinguistic evidence.

Principle-based parsing is an interesting research topic not only from a psycholinguistic point of view but also from a practical point of view. When PS rules are used, a separate grammar must be written for each language parsed. Each of these grammars contains a great deal of redundant information. For example, there may be two rules, in different grammars, that are identical except for the order of the constituents on the right hand side, indicating a difference in word order. This redundancy can be avoided by employing a universal phrase structure component (not necessarily in the form of rules) along with parameters and associated values. A principles and parameters approach provides a single compact grammar for all languages that would otherwise be represented by many different (and redundant) PS grammars.

Any model of human parsing must dictate: a) how structures are projected from the lexicon; b) how structures are attached to one another; and c) what constraints affect the resultant structures. This paper will concentrate on the first two components with respect to principle-based parsing algorithms: node projection and structure attachment.

Two basic control structures exist for any parsing algorithm: data-driven control and hypothesis-driven control. Even if a parser is predominantly hypothesis-driven, the predictions that it makes must at some point be compared with the data that are presented to it. Some data-driven component is therefore necessary for any parsing algorithm. Thus, a reasonable hypothesis to test is that the human parsing algorithm is entirely data-driven. This is exactly the approach that is taken by a number of principle-based parsing algorithms (see, for example, Abney (1986), Kashket (1987), Gibson & Clark (1987) and Pritchett (1987)). These parsing algorithms each include a node projection algorithm that projects an input word to a maximal category, but does not cause the projection of any further nodes.

Although this simple strategy is attractive because of its simplicity, it turns out that it cannot account for certain phenomena observed in the processing of Dutch (Frazier (1987): see Section 2.1). A completely data-driven node projection algorithm also has difficulty accounting for the processing ease of adjective-noun constructions in English (see Section 2.2). As a result of this evidence, a purely data-driven node projection algorithm must be rejected in favor of a node projection algorithm that has a predictive (hypothesis-driven) component (Frazier (1987)).

1 Paper presented at the International Workshop on Parsing Technologies, August 28-31, 1989.

2 I would like to thank Robin Clark, Rick Kazman, Howard Kurtzman, Eric Nyberg and Brad Pritchett for their comments on earlier drafts of this paper, and I offer the usual disclaimer.

This paper describes a node projection algorithm that is part of the Constrained Parallel Parser (CPP) (Gibson (1987), Gibson & Clark (1987) and Clark & Gibson (1988)). This parser is based on the principles of Government-Binding theory (Chomsky (1981, 1986a)). Section 3.1 gives an overview of the CPP model, while Section 3.2 describes the node projection algorithm. Section 3.3 describes the attachment algorithm, and includes an example parse. These node projection and attachment algorithms demonstrate that a principle-based parsing algorithm can account for the Dutch and English data, while avoiding the existence of redundant phrase structure rules. Thus it is concluded that one should continue to investigate hypothesis-driven principle-based models in the search for an optimal psycholinguistic model.

2 Data-Driven Node Projection: Empirical Predictions and Results

2.1 Evidence from Dutch

Consider the sentence fragment in (1):

(1)

... dat het meisje van Holland ...
... “that the girl from Holland” ...

Dutch is like English in that prepositional phrase modifiers of nouns may follow the noun. Thus the prepositional phrase van Holland may be a modifier of the noun phrase the girl in example (1). Unlike English, however, Dutch is SOV in subordinate clauses. Hence in (1) the prepositional phrase van Holland may also be the argument of a verb to follow. In particular, if the word glimlachte (“smiled”) follows the fragment in (1), then the prepositional phrase van Holland can attach to the noun phrase that it follows, since the verb glimlachte has no lexical requirements (see (2a)). If, on the other hand, the word houdt (“likes”) follows the fragment in (1), then the PP van Holland must attach as argument of the verb houdt, since the verb requires such a complement (see (2b)).

(2)

a. ... dat [S [NP het meisje [PP van Holland]] [VP glimlachte]] ...
   “that the girl from Holland smiled” ...

b. ... dat [S [NP het meisje] [VP [V' [PP van Holland] [V houdt]]]] ...
   “that the girl likes Holland”

Following Abney (1986), Frazier (1987), Clark & Gibson (1988) and numerous others, it is assumed that attached structures are preferred over unattached structures. If we also assume that a phrasal node is not projected until its head is encountered, we predict that people will entertain only one hypothesis for the sentence fragment in (1): the modifier attachment. Thus we predict that it should take longer to parse the continuation houdt (“likes”) than to parse the continuation glimlachte (“smiled”), since the continuation houdt forces the prepositional phrase to be reanalyzed as an argument of the verb. However, contrary to this prediction, the verb that allows argument attachment is actually parsed faster than the verb that necessitates modifier attachment in sentence fragments like (1). If the verb phrase had been projected before its head was encountered, then the argument attachment of the PP van Holland would be possible at the same time that the modifier attachment is possible.3 Thus Frazier concludes that in some cases phrasal nodes must be projected before their lexical heads have been encountered.

3 It is beyond the scope of this paper to offer an explanation as to why the argument attachment is in fact preferred to the modifier attachment. This paper seeks only to demonstrate that the argument attachment possibility must at least be available for a psychologically real parser. See Abney (1986), Frazier (1987) and Clark & Gibson (1988) for possible explanations for the preference phenomenon.


2.2 Evidence from English

A second piece of evidence against this limited type of node projection is provided by the processing of noun phrases in English that have more than one pre-head constituent.

It is assumed that the primitive operation of attachment is associated with a certain processing cost. Hence the amount of time taken to parse a single input word is directly related to the number of attachments that the parser must execute to incorporate that structure into the existing structure(s). If a phrasal node is not projected until its head is encountered, then parsing the final word of a head-final construction will involve attaching all its pre-head structures at that point. If, in addition, there is more than one pre-head structure and no attachments are possible until the head appears, then a significant proportion of processing time should be spent in processing the head.

The hypothesis that a phrasal node is not projected until its head is encountered can be tested with the English noun phrase, since the head of an English noun phrase appears after a specifier and any adjectival modifiers. For example, consider the English noun phrase the big red book. First, the word the is read and a determiner phrase is built. Since it is assumed that nodes are not projected until their heads are encountered, no noun phrase is built at this point. The word big is now read and causes the projection of an adjective phrase. Attachments are now attempted between the two structures built thus far. Neither of the categories can be argument, specifier or modifier for the other, so no attachment is possible. The next word red now causes the projection of an adjective phrase, and once again no attachments are possible. Only when the word book is read and projected to a noun phrase can attachments take place. First the adjective phrase representing red attaches as a modifier of the noun phrase book. Then the AP representing big attaches as a modifier of the noun phrase just constructed. Finally the determiner phrase representing the attaches as specifier of the noun phrase big red book.

Thus if we assume that a phrasal node is not projected until its head is parsed, we predict that a greater number of attachments will take place in parsing the head than in parsing any other word in the noun phrase. Since it is assumed that an attachment is a significant parser operation, it is predicted that people should take more time parsing the head of the noun phrase than they take parsing the other words of the noun phrase. Since there is no psycholinguistic evidence that people take more time to process heads in head-final constructions, I hypothesize that phrasal nodes are being projected before their heads are being encountered.
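The prediction made by purely head-driven projection can be tallied in a few lines. The following Python sketch is illustrative only: the function name `attachments_per_word` and the encoding of each word as a (word, role) pair are assumptions introduced here, not part of any parser cited in this paper. It simply assumes that every pre-head structure stays unattached until the head of the phrase is read.

```python
def attachments_per_word(phrase):
    """Count the attachments triggered by each word when no phrasal node
    is projected before its head (a hypothetical tally, not parser code).

    `phrase` is a list of (word, role) pairs; role is "head" for the head
    of the phrase and "pre-head" for specifiers and modifiers.
    """
    pending = 0   # pre-head structures still waiting for a host
    counts = []
    for word, role in phrase:
        if role == "head":
            # Every waiting structure attaches once the head is projected.
            counts.append((word, pending))
            pending = 0
        else:
            # No host exists yet, so nothing can attach at this word.
            counts.append((word, 0))
            pending += 1
    return counts


noun_phrase = [("the", "pre-head"), ("big", "pre-head"),
               ("red", "pre-head"), ("book", "head")]
print(attachments_per_word(noun_phrase))
# [('the', 0), ('big', 0), ('red', 0), ('book', 3)]
```

Under this assumption all three attachments fall on the head word book, which is the uneven processing load that the hypothesis predicts and for which no psycholinguistic evidence is reported.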

3 Hypothesizing a Phrasal Node Before Its Head Appears

3.1 The Parsing Model: The Constrained Parallel Parser

This paper assumes the Constrained Parallel Parser (CPP) as its model of human sentence processing (see Gibson (1987), Gibson & Clark (1987) and Clark & Gibson (1988)). The CPP model is based on the principles of Government-Binding Theory (Chomsky (1981, 1986a)); crucially, CPP has no separate module containing language-particular rules. Following Marcus (1980), structures parsed under the CPP model are placed on a stack and the most recently built structures are placed in a data structure called the buffer. The parser builds structure by attaching nodes in the buffer to nodes on top of the stack. Unlike Marcus's model, the CPP model allows multiple representations for the same input string to exist in a buffer or stack cell. Although multiple representations for the same input string are permitted, constraints on parallelism frequently cause one representation to be preferred over the others. Motivation for the parallel hypothesis comes from garden path effects and perception of ambiguity in addition to relative processing load effects. For information on the particular constraints and their motivations, see Gibson & Clark (1987), Clark & Gibson (1988) and the references cited in these papers.


3.1.1 Lexical Entries for CPP

A lexical entry accessed by CPP consists of, among other things, a theta-grid. A theta-grid is an unordered list of theta structures. Each theta structure consists of a thematic role and associated subcategorization information. One theta structure in a theta-grid may be marked as indirect to refer to its subject. For example, the word shout might have the following theta-grid:4

(3)

((Subcat = PREP, Thematic-Role = GOAL)
 (Subcat = COMP, Thematic-Role = PROPOSITION))

When the word shout (or an inflected variant of shout) is encountered in an input phrase, the thematic role agent will be assigned to its subject, as long as this subject is a noun phrase. The direct thematic roles goal and proposition will be assigned to prepositional and complementizer phrases respectively, as long as each is present. Since the order of theta structures in a theta-grid is not relevant to its use in parsing, the above theta-grid for shout will be sufficient to parse both sentences in (4).

(4)

a. The man shouts [PP to the woman] [CP that Ernie sees the rock]
b. The man shouts [CP that Ernie sees the rock] [PP to the woman]
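The order-independence of a theta-grid can be illustrated with a short sketch. Everything named here (`SHOUT_GRID`, `assign_roles`, the use of a Python set) is a hypothetical encoding chosen for illustration; it also makes the simplifying assumption that each subcategorized category occurs at most once in a grid.

```python
# Hypothetical encoding of the theta-grid for "shout" in (3): an unordered
# collection of (subcategorized category, thematic role) pairs.
SHOUT_GRID = frozenset({("PREP", "GOAL"), ("COMP", "PROPOSITION")})


def assign_roles(theta_grid, complement_categories):
    """Assign a thematic role to each complement by its category alone.

    Returns a list of (category, role) pairs in input order, or None if
    some complement has no unused theta structure in the grid.
    """
    available = {subcat: role for subcat, role in theta_grid}
    assigned = []
    for category in complement_categories:
        role = available.pop(category, None)
        if role is None:
            return None  # no matching theta structure remains
        assigned.append((category, role))
    return assigned


# The two complement orders of (4a) and (4b) receive the same roles.
print(assign_roles(SHOUT_GRID, ["PREP", "COMP"]))
print(assign_roles(SHOUT_GRID, ["COMP", "PREP"]))
```

Because the lookup goes by category rather than by position, a single grid serves both word orders, and a complement that is absent simply leaves its theta structure unused.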

3.1.2 X-bar Theory in CPP

The CPP model assumes X-bar Theory as presented in Chomsky (1986b). X-bar Theory has two basic principles: first, each tree structure must have a head; and second, each structure must have a maximal projection. As a result of these principles and other principles (e.g., the θ-Criterion, the Extended Projection Principle, Case Theory), the positions of arguments, specifiers and modifiers with respect to the head of a given structure are limited. In particular, a specifier may only appear as a sister to the one-bar projection below a maximal projection, and the head, along with its arguments, must appear below the one-bar projection. The order of the specifier and arguments relative to the head is language dependent. For example, the basic structure of English categories is shown below. Furthermore, binary branching is assumed (Kayne (1983)), so that modifiers are Chomsky-adjoined to the two-bar or one-bar levels, giving one possible structure for a post-head modifier below on the right.

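[Figure: two tree diagrams of the X-bar schema. Left: the basic English structure, with the Specifier as sister to the one-bar projection, which dominates the head X and its Argument(s). Right: the same structure with a post-head Modifier adjoined on the right.]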

3.1.3 The CPP Parsing Algorithm

The CPP algorithm is essentially very simple. A word is projected via node projection (see Section 3.2) into the buffer. If attachments are possible between the buffer and the top of the stack, then the results of these attachments are placed into the buffer and the stack is popped. Attachments are attempted again until no longer possible. This entire procedure is repeated for each word in the input string. The formal CPP algorithm is given below:

1. (Initializations) Set the stack to nil. Set the buffer to nil.

2. (Ending Condition) If the end of the input string has been reached and the buffer is empty then return the contents of the stack and stop.

3. If the buffer is empty then project nodes for each lexical entry corresponding to the next word in the input string, and put this list of maximal projections into the buffer.

4. Make all possible attachments between the stack and the buffer, subject to the attachment constraints (see Clark & Gibson (1988)). Put the attached structures in the buffer. If no attachments are possible, then put the contents of the buffer on top of the stack.

5. Go to 2.

4 In a more complete theory, a syntactic category would be determined from the thematic role (Chomsky (1986a)).
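The five steps above can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions, not the CPP implementation: the names `cpp_parse`, `project` and `attach` are hypothetical, projection and attachment are passed in as placeholder functions, and popping the stack after a successful attachment follows the prose description of the algorithm rather than the numbered steps. The toy `project` and `attach` functions exist only to exercise the control flow.

```python
def cpp_parse(words, project, attach):
    """Control loop following steps 1-5 (illustrative sketch only).

    project(word) -> list of structures for that word.
    attach(stack_top, buffer) -> list of attached structures; empty if
    no attachment is possible.
    """
    stack, buffer = [], []                 # step 1: initializations
    position = 0
    while True:
        # Step 2: ending condition.
        if position == len(words) and not buffer:
            return stack
        # Step 3: project the next word into an empty buffer.
        if not buffer:
            buffer = project(words[position])
            position += 1
        # Step 4: attempt attachments between stack top and buffer.
        attached = attach(stack[-1], buffer) if stack else []
        if attached:
            stack.pop()                    # stack is popped, per the prose
            buffer = attached
        else:
            stack.append(buffer)
            buffer = []
        # Step 5: go to step 2 (next loop iteration).


def toy_project(word):
    # Placeholder: each word yields a single unanalyzed structure.
    return [word]


def toy_attach(stack_top, buffer):
    # Placeholder rule: a cell holding "the" combines with what follows.
    if stack_top == ["the"]:
        return ["(the " + buffer[0] + ")"]
    return []


print(cpp_parse(["the", "book", "fell"], toy_project, toy_attach))
# [['(the book)'], ['fell']]
```

Each stack and buffer cell is a list so that, as in the CPP model, a cell can hold several representations of the same input string.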

3.2 The Projection of Nodes from the Lexicon

Node projection proceeds as follows. First a lexical item is projected to a phrasal node: a Confirmed node (C-node). Following X-bar Theory, each lexical entry for a given word is projected maximally. For example, the word rock, which has both a noun and a verb entry, would be projected to at least two maximal projections:

(5)

a. [NP [N' [N rock]]]
b. [VP [V' [V rock]]]
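C-node projection of an ambiguous word can be sketched as follows. The toy lexicon `LEXICON` and the function `project_c_nodes` are assumptions made for illustration; structures are rendered as labelled bracketings in the style of (5).

```python
# Hypothetical toy lexicon: each word maps to the categories of its entries.
LEXICON = {"rock": ["N", "V"], "book": ["N"]}


def project_c_nodes(word, lexicon=LEXICON):
    """Project every lexical entry for `word` to a maximal projection,
    written as a labelled bracketing (head, one-bar level, phrase)."""
    return [f"[{cat}P [{cat}' [{cat} {word}]]]" for cat in lexicon[word]]


print(project_c_nodes("rock"))
# ["[NP [N' [N rock]]]", "[VP [V' [V rock]]]"]
```

One C-node is produced per lexical entry, so an ambiguous word places several representations in the same buffer cell.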

Next, the parser hypothesizes nodes whose heads may appear immediately to the right of the given C-node. These predicted structures are called hypothesized nodes or H-nodes. An H-node is defined to be any node whose head is to the right of all lexical input. In order to determine which H-node structures to hypothesize from a given C-node, it is necessary to consult the argument properties associated with the C-node together with the specifier and modifier properties of the nodal category and the word order properties of the language in question. It is assumed that the ability of one category to act as specifier, modifier or argument of another category is part of unparameterized Universal Grammar. On the other hand, the relative order of two categories is assumed to be parameterized across different languages. For example, a determiner phrase, if it exists in a given language, is universally allowable as a specifier of a noun phrase. Whether the determiner appears before or after its head noun depends on the language-particular values associated with the parameters that determine word order.

Three parameters are proposed to account for variation in word order, one for each of argument, specifier and modifier projections.5 For each language, each of these parameters is associated with at least one value, where the parameter values come from the following set: {*head*, *satellite*}.6 The value *head* indicates that a category C causes the projection to the right of those categories for which C may be head. Thus this value indicates head-initial word order. The value *satellite* indicates that a category C causes the projection to the right of those categories for which C may be a satellite category. Hence this value indicates head-final word order. H-node projection from a category C is defined in (6).

(6)

(Argument, Specifier, Modifier) H-Node Projection from category C: If the value associated with the (argument, specifier, modifier) projection parameter is *head*, then cause the projection of (argument, specifier, modifier) satellites, and attach them to the right below the appropriate projection of C. If the value associated with the (argument, specifier, modifier) projection parameter is *satellite*, then cause the projection of (argument, specifier, modifier) heads, and attach them to the right above the appropriate projection of C.

5 Furthermore, it is assumed that the value of the modifier projection parameter defaults to the value of the argument projection parameter.

6 I will use the term satellite to indicate non-head constituents: arguments, specifiers and modifiers.

In English the argument projection parameter is set to *head*, so that arguments appear after the head. Hence, if a lexical entry has requirements that must be filled, then structures corresponding to subcategorized categories are hypothesized and attached. For example, the verb see subcategorizes for a noun phrase, so an empty noun phrase node is hypothesized and attached as argument of the verb:

(7)

[VP [V' [V see] [NP e]]]

The specifier projection parameter, on the other hand, is set to the value *satellite* in English so that specifiers appear before their heads. If the category associated with a C-node is an allowable specifier for other categories, then an H-node projection of each of these categories is built and the C-node specifier is attached to each. For example, since a determiner may specify a noun phrase, an H-node noun phrase is hypothesized when parsing a determiner in English:

(8)

[NP [DetP [Det' [Det the]]] [N' [N e]]]
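The two English settings just described can be sketched together. This is a hedged illustration rather than the paper's algorithm: the tables `ARGUMENTS` and `SPECIFIER_OF`, the parameter dictionary `ENGLISH` and the function names are all hypothetical, the toy lexicon covers only single-argument entries, and only the two parameter settings discussed for English are handled.

```python
# Hypothetical tables for illustration only.
ARGUMENTS = {"see": ["NP"]}        # categories a word subcategorizes for
SPECIFIER_OF = {"Det": ["N"]}      # categories a given category may specify
ENGLISH = {"argument": "head", "specifier": "satellite"}


def c_node(word, cat):
    """Maximal projection of a word, as a labelled bracketing."""
    return f"[{cat}P [{cat}' [{cat} {word}]]]"


def h_node_projections(word, cat, params=ENGLISH):
    """Build structures containing hypothesized (empty-headed) nodes.

    Handles only argument = *head* (empty argument attached to the right,
    below the one-bar projection) and specifier = *satellite* (empty host
    phrase built above, with the C-node as its specifier).
    """
    structures = []
    if params["argument"] == "head":
        for arg in ARGUMENTS.get(word, []):
            structures.append(f"[{cat}P [{cat}' [{cat} {word}] [{arg} e]]]")
    if params["specifier"] == "satellite":
        for host in SPECIFIER_OF.get(cat, []):
            structures.append(
                f"[{host}P {c_node(word, cat)} [{host}' [{host} e]]]")
    return structures


print(h_node_projections("see", "V"))    # structure corresponding to (7)
print(h_node_projections("the", "Det"))  # structure corresponding to (8)
```

In both cases the hypothesized node is headed by the empty element e, which stands for a head that has not yet appeared in the input.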

Thus the node projection algorithm provides a new derivation of language-particular word order. In previous principle-based systems, word order is derived from parameterized direction of attachment (see Gibson & Clark (1987), Nyberg (1987), Wehrli (1988)). An attachment takes place from buffer to stack in head-initial constructions and from stack to buffer in head-final constructions. Since attachment is now a uniform operation as defined in (17), this parameterization is no longer necessary. Instead, in head-initial constructions, nodes now project to the nodes that they may immediately dominate. In head-final constructions, nodes now project to those nodes that they may be immediately dominated by.

The projection parameters as defined in (6) account for many facts about word order across languages. However, most, if not all, languages have cases that do not fit this clean picture. For example, while modifiers in English are predominantly post-head, adjectives appear before the head. A single global value for modifier projection predicts that this situation is impossible. Hence we must assume that the values given for the projection parameters are only default values. In order to formalize this idea, I assume the existence of a hierarchy of categories and words as shown below:

Category

Noun        Verb        Adposition

Ernie rock ...    see eat ...    to on

It is assumed that the value for each of the projection parameters is the default value for that projection type with respect to a particular language. However, a particular category or word may have a value associated with it for a projection parameter in addition to the default one. If this is the case, then only the most specific value is used. For example, in English, the category adjective is associated with the value *satellite* with respect to modifier projection. Thus English adjectives appear before the head. The adjective tall will therefore cause the projection of both a C-node adjective phrase and an H-node noun phrase:

(9)

a. [AP tall ]

b. [NP [N' [AP tall ] [N' [N e ]]]]
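The "most specific value wins" rule can be sketched as a walk up the hierarchy. This is only an illustration under stated assumptions: the PARENT table, the lookup function and the default value name "head" are hypothetical and do not come from the paper; only the value *satellite* for English adjectives is taken from the text.

```python
# Hypothetical fragment of the word/category hierarchy: child -> parent.
PARENT = {
    "tall": "Adjective",
    "rock": "Noun",
    "Adjective": "Category",
    "Noun": "Category",
}

# Values attached to particular categories or words for modifier projection.
# Only the adjective entry comes from the text.
MODIFIER_OVERRIDES = {"Adjective": "satellite"}

# Assumed name for the language-wide default value (post-head modifiers).
ENGLISH_MODIFIER_DEFAULT = "head"


def lookup(item, overrides, default):
    """Return the most specific value found on the path from item to the root."""
    node = item
    while node is not None:
        if node in overrides:
            return overrides[node]
        node = PARENT.get(node)
    return default


tall_value = lookup("tall", MODIFIER_OVERRIDES, ENGLISH_MODIFIER_DEFAULT)
rock_value = lookup("rock", MODIFIER_OVERRIDES, ENGLISH_MODIFIER_DEFAULT)
```

Here the word tall inherits the adjective value, while a word with no override anywhere above it falls back to the default.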

If recursive application of projection to H-nodes were allowed, then it would be possible, in principle, to project an infinite number of nodes from a single lexical entry. In English, for example, a genitive noun phrase can specify another noun phrase. This noun phrase may also be a genitive noun phrase, and so on. If H-nodes could project to further H-nodes, then it would be necessary to hypothesize an infinite number of genitive NP H-nodes for every genitive NP that is read. As a result of this difficulty, the H-node Projection Constraint is proposed:

(10)

The H-node Projection Constraint: Only a C-node may cause the projection of an H-node.

As a result of the H-node Projection Constraint, H-nodes may not invoke H-node projection. For example, if a specifier causes the projection of its head, the resulting head cannot then cause the projection of those categories that it may specify. As a result, the number of nodes that may be projected from a single lexical item is severely restricted.
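A minimal sketch of how the constraint bounds projection, assuming a hypothetical licensing table (PRE_HEAD_SATELLITE_OF) and a dictionary representation of nodes; neither is part of the paper. The point is only that H-nodes are built from the C-node and are never fed back into projection.

```python
# Hypothetical table: category -> categories it may be a pre-head satellite of.
PRE_HEAD_SATELLITE_OF = {
    "DetP": ["NP"],
    "AP": ["NP"],
    "NP": ["NP"],  # a genitive NP may specify another NP
}


def project(category):
    """Return the C-node for a lexical item followed by the H-nodes it predicts."""
    c_node = {"cat": category, "kind": "C"}
    # Only the C-node triggers H-node projection. The H-nodes created here
    # are not projected again, so the result is always finite.
    h_nodes = [
        {"cat": head, "kind": "H", "satellite": c_node}
        for head in PRE_HEAD_SATELLITE_OF.get(category, [])
    ]
    return [c_node] + h_nodes


genitive_projection = project("NP")
```

Without the constraint, the H-node NP predicted by a genitive NP would itself predict another NP, and so on without end; here exactly one H-node is built.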

3.3 Node Attachment

Given the above node projection algorithm, it is necessary to define an algorithm for attachment of nodes. Since structures are predicted by the node projection algorithm, the attachment algorithm must dictate how subsequent structures match these predictions. Consider the following two examples from English: the first is an example of specifier attachment; the second is an example of argument attachment. In English, specifiers precede the head and arguments follow the head. It is desirable for the attachment algorithm to handle both kinds of attachments without stipulations particular to word order.

First, suppose that the word the is on the stack as both a determiner phrase and an H-node noun phrase. Furthermore, suppose that the word woman is projected into the buffer as both a noun phrase and an H-node clausal phrase:7

(11)

Stack: [DetP [Det' [Det the ]]]

[NP [DetP [Det' [Det the ]]] [N' [N e ]]]

Buffer: [NP [N' [N woman ]]]

[XP_clausal [NP [N' [N woman ]]] [X'_clausal [X_clausal e ]]]

The attachment algorithm should allow two attachments at this point: the H-node NP on the stack uniting with each NP C-node in the buffer. It might also seem reasonable to allow the bare determiner phrase to attach directly as specifier of each noun phrase. However, this kind of attachment is undesirable for two reasons. First of all, it makes the attachment operation a disjunctive operation: an attachment would involve either matching an H-node or meeting the satellite requirements of a category. Second of all, it makes H-node projection unnecessary in most situations and therefore somewhat stipulative. That is, allowing a disjunctive attachment operation would permit many derivations that never use an H-node, so that the need for H-nodes would be restricted to head-final constructions with pre-head satellites (see Section 2). It is therefore desirable for all attachments to involve matching an H-node.

Two structures should be returned after attachments in (11): a C-node noun phrase and an H-node clausal phrase:

(12)

a. [NP [DetP the ] [N' [N woman ]]]

b. [XP_clausal [NP [DetP the ] [N' [N woman ]]] [X'_clausal [X_clausal e ]]]

Now consider an English argument attachment. Suppose that a prepositional phrase representing the word beside is on the stack and the noun Frank is represented in the buffer as a noun phrase and a clausal phrase:

(13)

Stack: [PP [P' [P beside ] [NP e ]]]

Buffer: [NP [N' [N Frank ]]]

[XP_clausal [NP [N' [N Frank ]]] [X'_clausal [X_clausal e ]]]

7 A noun phrase is projected to an H-node clausal (or predicate) phrase since nouns may be the subjects of predicates.

Since the preposition beside subcategorizes for a noun phrase, there is an H-node NP attached as its object. The attachment algorithm should allow a single attachment at this point: the noun phrase representing Frank uniting with the H-node NP object of beside:

(14)

[PP [P' [P beside ] [NP Frank ]]]

As should be clear from the two examples, the process of attachment involves comparing a previously predicted category with a current category. If the two categories are compatible, then attachment may be viable.

3.3.1 Node Compatibility

Compatibility is defined in terms of unification, which is defined in terms of subsumption.8 A structure X is said to subsume a structure Y if X is more general than Y. That is, X contains less specific information than Y. So, for example, a structure that is specified as clausal (e.g. the head of a predicate), but is not specified for a particular category, subsumes a structure having the category verb, since verbs are predicative and thus clausal categories. Hence structure (15a) subsumes structure (15b):

(15)

a. [XP_clausal [X'_clausal [X_clausal e ]]]

b. [VP [V' [V walk ]]]

The unification operation is the least upper bound operator in the subsumption ordering on information in a structure. Since structure (15a) subsumes structure (15b), the result of unifying structure (15a) with structure (15b) is structure (15b).

Two structures are compatible if the unification of the two structures is non-nil. The information on a structure that is relevant to attachment consists of the node's bar level (e.g., zero level, intermediate or maximal), and the node's lexical features (e.g., category, case, etc.).
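The three notions above can be illustrated on flat feature dictionaries. This is a simplified sketch, not the paper's representation: the feature names ("clausal", "category") and the use of None for a nil result are assumptions made for the example.

```python
def subsumes(general, specific):
    """True if every feature of general appears with the same value in specific."""
    return all(k in specific and specific[k] == v for k, v in general.items())


def unify(a, b):
    """Least upper bound of two flat feature sets, or None (nil) on a clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def compatible(a, b):
    """Two structures are compatible if their unification is non-nil."""
    return unify(a, b) is not None


# Analogue of (15a): clausal, but with no particular category.
clausal_h_node = {"clausal": True}
# Analogue of (15b): a verb, which is a clausal category.
verb_node = {"clausal": True, "category": "verb"}

unified = unify(clausal_h_node, verb_node)
```

Because the first structure subsumes the second, unifying them returns the more specific one, mirroring the statement about (15a) and (15b).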

3.3.2 Attachment

Roughly speaking, the attachment operation should locate an H-node in a structure on the stack along with a compatible node in a structure in the buffer. If both of these structures have parent tree structures, then these parent tree structures must also be compatible. In order to keep the process of attachment simple, it is proposed that each attachment have at most one compatibility check. This constraint is given in (16):9

(16)

Attachment Constraint: At most one nontrivial lexical feature unification is permitted per attachment.

A nontrivial unification is one that involves two nontrivial structures; a trivial unification is one that involves at least one trivial structure. For example, if the parent node of the buffer site is as yet undefined, then the parent node of the stack site trivially unifies with this parent node. Only when both parents are defined is there a nontrivial unification.
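As a rough illustration of how constraint (16) can be checked, the sketch below counts nontrivial unifications for a candidate attachment. It assumes that an undefined parent is represented by None and that the two attachment sites themselves are always defined; the function names are hypothetical.

```python
def nontrivial_unifications(stack_parent, buffer_parent):
    """Count the nontrivial lexical feature unifications an attachment needs."""
    count = 1  # the stack site and the buffer site are both defined
    if stack_parent is not None and buffer_parent is not None:
        count += 1  # both parents are defined, so their unification is nontrivial
    return count


def satisfies_attachment_constraint(stack_parent, buffer_parent):
    """Constraint (16): at most one nontrivial unification per attachment."""
    return nontrivial_unifications(stack_parent, buffer_parent) <= 1


only_stack_parent = satisfies_attachment_constraint({"category": "verb"}, None)
both_parents = satisfies_attachment_constraint({"category": "verb"}, {"category": "verb"})
```

Under these assumptions an attachment passes only when at least one of the two sites has no parent tree structure.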

Consider the effect of the following three requirements: first, the lexical features of the stack and buffer attachment sites must be compatible; second, the tree structures above the buffer and stack attachment sites must be compatible; and third, at most one nontrivial lexical feature unification is permissible per attachment, (16). Since any attachment must involve at least one nontrivial lexical feature unification, that of the stack and buffer sites, any additional nontrivial unifications will violate the attachment constraint in (16). If both the buffer and stack attachment sites have parent tree structures, then the lexical features of these parents will need to be unified. Since the child structures will also need to be unified, (16) will be violated. Thus it follows that, in an attachment, either the buffer site or the stack site has no parent tree structure.10

8 See Shieber (1986) for background on the possible uses of unification in particular grammar formalisms.

9 In fact, this constraint follows from two assumptions: first, a compatibility check takes a certain amount of processing time; and second, attachments that take less time are preferred over those that take more time. See Gibson (forthcoming) for further discussion.

Since the order of the words in the input must be maintained in a final parse, only those nodes in a buffer structure that dominate all lexical items in that structure are permissible as attachment sites. For example, suppose that the buffer contained a representation for the noun phrase women in college. Furthermore, suppose that there is an H-node NP on the stack representing the word the. Although it would be suitable for the buffer structure representing the entire noun phrase women in college to match the stack H-node, it would not be suitable for the C-node NP representing college to match this H-node. This attachment would result in a structure that moved the lexical input women in to the left of the lexical input dominated by the matched H-node, producing a parse for the input women in the college. Since the word order of the input string must be maintained, sites for buffer attachment must dominate all lexical items in the buffer structure.
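The restriction on buffer sites can be sketched as a filter over the nodes of a tree. The tuple encoding (label, children) and the particular bracketing assumed for women in college are illustrative choices, not taken from the paper.

```python
def leaves(node):
    """Return the lexical items dominated by node, in left-to-right order."""
    _label, children = node
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


def buffer_sites(root):
    """Return the nodes that dominate every lexical item in the buffer structure."""
    all_words = leaves(root)
    sites = []

    def walk(node):
        if leaves(node) == all_words:
            sites.append(node)
        for child in node[1]:
            if not isinstance(child, str):
                walk(child)

    walk(root)
    return sites


# Assumed structure for "women in college", with the PP inside N'.
college_np = ("NP", [("N'", [("N", ["college"])])])
women_in_college = (
    "NP",
    [("N'", [("N'", [("N", ["women"])]),
             ("PP", [("P'", [("P", ["in"]), college_np])])])],
)

site_labels = [site[0] for site in buffer_sites(women_in_college)]
```

The NP for college is excluded because it does not dominate women in, so it can never be offered to a stack H-node.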

Once suitable maximal projections in each of the buffer and stack structures have been identified for matching, it is still necessary to check that their internal structures are compatible. For example, suppose that an identified buffer site is a C-node whose head allows exactly one specifier and a specifier is already attached. If the stack H-node site also contains a specifier, then the attachment should be blocked. On the other hand, if the stack H-node site does not contain a specifier, and other requirements are satisfied, then the attachment should be allowed.

Testing for internal structure compatibility is quite simple if all tree structures are assumed to be binary branching ones. The only possible attachment sites inside the stack H-node are those nodes that dominate no other nodes. As long as there is some buffer node that both dominates all the buffer input and matches the H-node attachment site for bar level, then the attachment is possible.

Attachment is formally defined in (17):

(17)

A structure W in the buffer can attach to a structure X on the stack iff all of (a), (b), (c), (d) and (e) are true:

a. Structure W contains a maximal projection node, Y, such that Y dominates all lexical material in W;

b. Structure X contains a maximal projection H-node structure, Z;

c. The tree structure above Y is compatible with the tree structure above Z, subject to the attachment constraint in (16);

d. The lexical features of structure Y are compatible with the lexical features of structure Z;

e. Structure Y is bar-level compatible with structure Z.

Bar-level compatibility is defined in (18):

(18)

A structure U in the buffer is bar-level compatible with a structure V on the stack iff all of (a), (b) and (c) are true:

a. Structure U contains a node, S, such that S dominates all lexical material in U;

b. Structure V contains an H-node structure, T, that dominates no lexical material;

c. The bar level of S is compatible with the bar level of T.
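Definition (18) can be approximated in a few lines. This sketch assumes a hypothetical Node class, encodes bar levels as integers (0, 1, 2) and treats bar-level compatibility as equality, which is a simplification of unification over bar levels. The two trees are modelled on (19b) and (20b) from the example section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    cat: str                      # e.g. "N", "A", "Det"
    bar: int                      # 0 = zero level, 1 = intermediate, 2 = maximal
    h_node: bool = False          # True for a predicted (H-) node
    word: Optional[str] = None    # lexical item, if any
    children: List["Node"] = field(default_factory=list)


def words(n):
    """Lexical material dominated by n, left to right."""
    own = [n.word] if n.word else []
    return own + [w for c in n.children for w in words(c)]


def descendants(n):
    """Yield n and every node below it."""
    yield n
    for c in n.children:
        yield from descendants(c)


def bar_level_compatible(u, v):
    """Sketch of (18): S in U covers all of U's words, T is an empty H-node in V."""
    all_words = words(u)
    s_nodes = [s for s in descendants(u) if words(s) == all_words]
    t_nodes = [t for t in descendants(v) if t.h_node and not words(t)]
    return any(s.bar == t.bar for s in s_nodes for t in t_nodes)


the = Node("Det", 2, word="the")
stack_np = Node("N", 2, True, children=[
    the, Node("N", 1, True, children=[Node("N", 0, True)])])
big = Node("A", 2, word="big")
buffer_np = Node("N", 2, True, children=[
    Node("N", 1, True, children=[
        big, Node("N", 1, True, children=[Node("N", 0, True)])])])

result = bar_level_compatible(buffer_np, stack_np)
```

The check succeeds here because the outer N' in the buffer dominates all buffer input and the empty N' on the stack has the same bar level.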

If attachment is viable, then W contains a structure Y that is bar-level compatible with a structure Z that is part of X. Since Y and Z are bar-level compatible, there are structures S and T inside Y and Z respectively, that satisfy the conditions of bar-level compatibility, (18).

When the conditions for attachment are satisfied, structures W and X are united in the following way. First, W and X are copied to nodes W' and X' respectively. Inside X' there is a node, Z', that is a copy of Z. The lexical features of Z' are set to the unification of the lexical features of structures Y and Z. Next, structure T' in Z' (corresponding to structure T in Z) is replaced by S', the copy of structure S inside W. The bar level of S' is set to the unification of the bar levels of structures S and T. Finally, the tree structures above Y and Z are unified and this tree structure is attached above Z'. That is, if Z has some parent tree structure and Y does not, then the copy of this structure inside X' is attached above Z'. Similarly, if Y has some parent tree structure and Z does not, then the copy of this structure inside W' is attached above Z'. If neither node has any parent tree structure (i.e., W = Y, X = Z), then the unification is trivial and no attachment is made. Since Y and Z cannot both have parent tree structures (see (16) and the discussion following it), unifying the parent tree structures is a very simple process.

10 It might seem that some possible attachments are being thrown away at this point. That is, in principle, there might be a structure that can only be formed by attaching a buffer site to a stack site where both sites have parent tree structures. This attachment would be blocked by (16). However, it turns out that any attachment that could have been formed by an attachment involving more than one lexical feature unification can always be arrived at by a different attachment involving a single lexical feature unification. For the proof, see Gibson (forthcoming).
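The copy-and-replace step of the uniting operation can be sketched on list-encoded trees. This is a partial illustration only: it assumes trees of the form [label, children], identifies T by a hypothetical path of child indices, and leaves out the unification of lexical features, bar levels and parent tree structures.

```python
import copy


def unite(stack_tree, t_path, buffer_subtree):
    """Return a copy of stack_tree with the node at t_path replaced by a copy of S."""
    new_tree = copy.deepcopy(stack_tree)      # X' is a copy of X
    s_copy = copy.deepcopy(buffer_subtree)    # S' is a copy of S
    parent = new_tree
    for index in t_path[:-1]:
        parent = parent[1][index]
    parent[1][t_path[-1]] = s_copy            # T' is replaced by S'
    return new_tree


# Stack structure modelled on (19b); the empty N' is child 1 of the NP.
stack_19b = ["NP", [["DetP", ["the"]], ["N'", [["N", ["e"]]]]]]
# The N' inside (20b) that dominates all of the buffer input.
s_from_20b = ["N'", [["AP", ["big"]], ["N'", [["N", ["e"]]]]]]

united = unite(stack_19b, [1], s_from_20b)
```

Because both inputs are copied first, the original stack and buffer structures remain available for other attachments, and the result has the shape of structure (21).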

3.3.3 Example Attachments

As an illustration of how attachments take place, consider once again the noun phrase the big red book. First the determiner the is read and is projected to a C-node determiner phrase. Since a determiner is allowable as the specifier of a noun phrase and specifiers occur before the head in English, an H-node NP is also built. These two structures are depicted in (19):

(19)

a. [DetP the ]

b. [NP [DetP the ] [N' [N e ]]]

Since there is nothing on the stack, these structures are shifted to the top of the stack. The word big projects to both a C-node AP and an H-node NP since an adjective is allowable as a pre-head modifier in English. These two structures are placed in the buffer (depicted in (20)).

(20)

a. [AP big ]

b. [NP [N' [AP big ] [N' [N e ]]]]

An attachment between nodes (19b) and (20b) is now attempted. Note that: a) node (20b) is a maximal projection dominating all lexical material in its buffer structure; b) node (19b) is a maximal projection H-node on the stack; c) the tree structures above these two nodes are compatible (both are undefined); and d) the categories of the two nodes are compatible. It remains to check for bar-level compatibility of the two structures. Since: a) the N'2 in structure (20b) dominates all the buffer input; b) the H-node N'1 in structure (19b) dominates no C-nodes; and c) N'1 and N'2 are compatible in bar level, the structures in (19b) and (20b) can be attached. The two structures are therefore attached by uniting N'1 and N'2. The resultant structure is given in (21):

(21)

[NP [DetP the ] [N' [AP big ] [N' [N e ]]]]

Structure (21), the only possible attachment between the buffer and the stack, is placed back in the buffer, and the stack is popped. Since there is now nothing left on the stack, no further attachments are possible at this time. Structure (21) is thus shifted to the stack. The word red now enters the buffer as a C-node adjective phrase and an H-node noun phrase:

(22)

a. [AP red ]

b. [NP [N' [AP red ] [N' [N e ]]]]


An attachment between nodes (21) and (22b) is now attempted. Requirements (17a)-(17d) are satisfied and the requirement for bar-level compatibility is satisfied by the node labeled N'3 in (21) together with N'4 in (22b). Hence the structures are united, giving (23):

(23)

[NP [DetP the ] [N' [AP big ] [N' [AP red ] [N' [N e ]]]]]

Since (23) is the only possible attachment between the buffer and the stack, it is placed in the buffer and the stack is popped. Since the stack is now empty, structure (23) shifts to the stack. The noun book now enters the buffer as both a C-node noun phrase and an H-node clausal phrase:

(24)

a. [NP [N' [N book ]]]

b. [XP_clausal [NP [N' [N book ]]] [X'_clausal [X_clausal e ]]]

Two attachments are possible at this point. The NP structure in (23) unites with each NP C-node in the buffer, resulting in the structures in (25):

(25)

a. [NP [DetP the ] [N' [AP big ] [N' [AP red ] [N' [N book ]] [PP e ] [CP e ]]]]

b. [XP_clausal [NP the big red book ] [X'_clausal [X_clausal e ]]]

Note that only one attachment per structure takes place in the final parse step. Crucially, no more attachments per structure take place when parsing the head of the noun phrase than when parsing the pre-head constituents in the noun phrase.11 Thus, in contrast with the situation when nodes are only projected when their heads are encountered, the node projection and attachment algorithms described here predict that there should not be any slowdown when parsing the head of a head-final construction.

The Dutch data described in Section 2.1 are handled in a similar manner.
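The contrast drawn above can be made concrete with a toy count of attachments per word for the big red book. Both functions are simplified assumptions rather than implementations of either parser: the first assumes every word after the first unites once with the predicted H-node NP, and the second assumes a head-driven parser stores pre-head satellites until the noun arrives.

```python
def attachments_with_prediction(tags):
    """Attachments per word when an H-node NP is predicted by the first word."""
    counts = []
    np_predicted = False
    for _tag in tags:
        counts.append(1 if np_predicted else 0)  # one unification with the H-node
        np_predicted = True
    return counts


def attachments_head_driven(tags, head_tag="N"):
    """Attachments per word when the NP is only built once its head is read."""
    counts = []
    waiting = 0  # satellites stored until the head appears
    for tag in tags:
        if tag == head_tag:
            counts.append(waiting)  # every stored satellite attaches now
            waiting = 0
        else:
            counts.append(0)
            waiting += 1
    return counts


tags = ["Det", "Adj", "Adj", "N"]  # the big red book
predicted_counts = attachments_with_prediction(tags)
head_driven_counts = attachments_head_driven(tags)
```

Under these assumptions the predictive scheme does a constant amount of attachment work at each word, while the head-driven scheme concentrates all of it at the head.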

4 Conclusions

This paper has described a) a principle-based algorithm for the projection of phrasal nodes before their heads are parsed, and b) an algorithm for attaching the predicted nodes. It is worthwhile to compare the new projection algorithm with algorithms that do not project H-nodes. The projection algorithm provided here involves more work and hence, on the surface, may seem somewhat stipulative compared to one that does not project H-nodes. However, it turns out that although projecting to H-nodes is more complicated than not doing so, attachment when H-nodes are not present is more complicated than attachment when they are present. That is, if a projection algorithm does not cause the projection of H-nodes, it will have a more complicated attachment algorithm. For example, if H-nodes are projected when parsing the noun phrase the woman, the determiner the is immediately projected to an H-node noun phrase, which leads to a simple attachment. If H-nodes are not projected, then projection is easier, but attachment is that much more complicated. When attaching, it will be necessary to check if a determiner is an allowable specifier of a noun phrase: the same operation that is performed when projecting to H-nodes. Thus although the complexity of particular components changes, the complexity of the entire parsing algorithm does not change, whether or not H-nodes are projected. Since the proposed projection and attachment algorithms make better empirical predictions than ones that do not predict structure, the new algorithms are preferred.

11 Note that it is the number of attachments per structure that is crucial here, and not the number of total attachments, since attachments made upon two independent structures may be performed in parallel, whereas attachments made on the same structure must be performed serially. For example, since structures (24a) and (24b) are independent, attachments may be made to each of these in parallel. But if an attachment B relies on the result of another attachment A, then attachment A must be performed first.

5 References

Abney (1986), “Licensing and Parsing”, Proceedings of the Seventeenth North East Linguistic Society Conference, MIT, Cambridge, MA.

Chomsky, N. (1981), Lectures on Government and Binding, Foris, Dordrecht, The Netherlands.

Chomsky, N. (1986a), Knowledge of Language: Its Nature, Origin and Use, Praeger Publishers, New York, NY.

Chomsky, N. (1986b), Barriers, Linguistic Inquiry Monograph 13, MIT Press, Cambridge, MA.

Clark, R. & Gibson, E. (1988), “A Parallel Model for Adult Sentence Processing”, Proceedings of the Tenth Cognitive Science Conference, McGill University, Montreal, Quebec.

Frazier, L. (1987), “Syntactic Processing Evidence from Dutch”, Natural Language and Linguistic Theory 5, pp. 519-559.

Gibson, E. (1987), Garden-Path Effects in a Parser with Parallel Architecture, Eastern States Conference on Linguistics, Columbus, Ohio.

Gibson, E. (forthcoming), Parsing with Principles: A Computational Theory of Human Sentence Processing, Ms., Carnegie Mellon University, Pittsburgh, PA.

Gibson, E. & Clark, R. (1987), “Positing Gaps in a Parallel Parser”, Proceedings of the Eighteenth North East Linguistic Society Conference, University of Toronto, Toronto, Ontario.

Kashket, M. (1987), Government-Binding Parser for Warlpiri, a Free Word Order Language, MIT Master’s Thesis, Cambridge, MA.

Kayne, R. (1983), Connectedness and Binary Branching, Foris, Dordrecht, The Netherlands.

Marcus, M. (1980), A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA.

Nyberg, E. (1987), “Parsing and the Acquisition of Word Order”, Proceedings of the Fourth Eastern States Conference on Linguistics, The Ohio State University, Columbus, OH.

Pollard, C. & Sag, I. (1987), An Information-based Syntax and Semantics, CSLI Lecture Notes Number 13, Menlo Park, CA.

Pritchett, B. (1987), Garden Path Phenomena and the Grammatical Basis of Language Processing, Harvard University Ph.D. dissertation, Cambridge, MA.

Shieber, S. (1986), An Introduction to Unification-based Approaches to Grammar, CSLI Lecture Notes Number 4, Menlo Park, CA.

Stowell, T. (1981), Origins of Phrase Structure, MIT Ph.D. dissertation.

Wehrli, E. (1988), “Parsing with a GB Grammar”, in U. Reyle and C. Rohrer (eds.), Natural Language Parsing and Linguistic Theories, 177-201, Reidel, Dordrecht, The Netherlands.
