Phrase Structure Trees Bear More Fruit than You Would Have Thought

(1)

P h r a s e S t r u c t u r e T r e e s Bear M o r e Fruit

t h a n Y o u W o u l d H a v e T h o u g h t 1

Aravind K. Joshi

Department of Computer and Information Science

The Moore School/D2

University of Pennsylvania

Philadelphia, PA 19104

Leon S. Levy

Bell Telephone Laboratories

Whippany, NJ 07981

In this paper we will present several results concerning phrase structure trees. These results show that phrase structure trees, when viewed in certain ways, have much more descriptive power than one would have thought. W e have given a brief account of local constraints on structural descriptions and an intuitive proof o f a theorem about local constraints. We have compared the local constraints approach to some aspects o f Gazdar's framework and that of Peters and Ritchie and of Karttunen. W e have also presented some results on skeletons (phrase structure trees without labels) which show that phrase structure trees, even when deprived of the labels, retain in a certain sense all the structural informa- tion. This result has implications for grammatical inference procedures.

1. Introduction

T h e r e is renewed interest in examining the descriptive as well as generative p o w e r of phrase structure grammars. The primary motivation for this interest has come from the recent investigations in alternatives to t r a n s f o r m a t i o n a l grammars (e.g., B r e s n a n 1978; Kaplan and Bresnan 1979; G a z d a r 1978, 1979a, 1979b; Peters 1980; K a r t t u n e n 1980). 2 Some of these approaches require a m e n d m e n t s to phrase structure grammars (especially G a z d a r 1978, 1979a, 1979b; Peters 1980; K a r t t u n e n 1980) that increase their descriptive p o w e r without increasing their generative power. G a z d a r wants to restrict the p o w e r to that of c o n t e x t - f r e e languages. Others are not com-

l This work was partially s u p p o r t e d by N S F g r a n t M C S 79- 08401 a n d M C S 8 1 - 0 7 2 9 0 .

This paper is a revised and e x p a n d e d version of a p a p e r pres- e n t e d at the 18th A n n u a l M e e t i n g of the A s s o c i a t i o n for C o m p u t a - tion L i n g u i s t i c s , U n i v e r s i t y of P e n n s y l v a n i a , P h i l a d e l p h i a , J u n e 1980.

T h a n k s are e x t e n d e d to the two referees of this p a p e r a n d to Michael M c C o r d for their valuable c o m m e n t s , w h i c h helped in the i m p r o v e m e n t of b o t h the c o n t e n t a n d the p r e s e n t a t i o n of the m a t e - rial herein.

pletely precise on this aspect. Berwick has shown that the K a p l a n - B r e s n a n system is nearly equivalent to the

2 Since this p a p e r was s u b m i t t e d for publication, a n u m b e r of p a p e r s h a v e a p p e a r e d t h a t s h o u l d be o f i n t e r e s t to its readers. " P h r a s e L i n k i n g G r a m m a r s " by S. P e t e r s a n d R . W . R i t c h i e d e - scribes their s y s t e m (Technical R e p o r t , D e p a r t m e n t of Linguistics, U n i v e r s i t y of T e x a s at A u s t i n , 1982). Strong a d e q u a c y of c o n t e x t - free g r a m m a r s h a s b e e n d i s c u s s e d by J. B r e s n a n , R.M. K a p l a n , S. Peters, and A. Z a e n a n in " C r o s s - S e r i a l D e p e n d e n c i e s in D u t c h " (to a p p e a r in Linguistic Inquiry in 1982). T h i s p a p e r s h o w s t h a t c o n t e x t - f r e e g r a m m a r s are n o t s t r o n g l y a d e q u a t e (i.e., t h e y are unable to provide the a p p r o p r i a t e s t r u c t u r a l descriptions) to c h a r a c - terize cross-serial d e p e n d e n c i e s in D u t c h . In a r e c e n t p a p e r ( " H o w m u c h c o n t e x t - s e n s i t i v i t y is n e e d e d to provide r e a s o n a b l e structural descriptions: A tree adjoining s y s t e m for g e n e r a t i n g p h r a s e structure t r e e s , " p r e s e n t e d at the P a r s i n g W o r k s h o p , O h i o State U n i v e r - sity, M a y 1982; also T e c h n i c a l R e p o r t , D e p a r t m e n t of C o m p u t e r a n d I n f o r m a t i o n Science, U n i v e r s i t y of P e n n s y l v a n i a , 1982), A. Joshi d i s c u s s e s w e a k a n d s t r o n g a d e q u a c y of g r a m m a r s a n d p r o p o s - es a tree a d j o i n i n g g r a m m a r ( T A G ) t h a t a p p e a r s to be s t r o n g l y a d e q u a t e a n d h a s only slightly m o r e p o w e r t h a n c o n t e x t - f r e e g r a m - mars. Joshi h a s also given a r o u g h c h a r a c t e r i z a t i o n of a class of c o n t e x t - s e n s i t i v e l a n g u a g e ( M C S L ) t h a t a p p e a r s to be suitable to c h a r a c t e r i z e n a t u r a l l a n g u a g e s . L a n g u a g e s o f T A G b e l o n g to M C S L , a n d l a n g u a g e s of P L G ' s of P e t e r s a n d Ritchie also belong to this class. ( T A G ' s use a linking device similar to t h a t in t h e P L G ' s . )

Copyright 1982 by the A s s o c i a t i o n for C o m p u t a t i o n a l Linguistics. P e r m i s s i o n to copy w i t h o u t fee all or part of this material is g r a n t e d provided that the copies are n o t m a d e for direct c o m m e r c i a l a d v a n t a g e and the Journal r e f e r e n c e a n d this c o p y r i g h t notice are included on the first page. To copy otherwise, or to republish, requires a fee a n d / o r specific p e r m i s s i o n .

0 3 6 2 - 6 1 3 X / 8 2 / 0 1 0 0 0 1 - 1 1 5 0 1 . 0 0

(2)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear M o r e Fruit

so-called indexed grammars. The power of the phrase linking g r a m m a r s of Peters and Ritchie is n o t c o m - pletely k n o w n at this time.

The notion of node admissibility plays an important role in these formulations. The earliest reference to node admissibility appears in C h o m s k y 1965 (p. 215); he suggests the possibility of constructing a rewriting system where the rewriting of a symbol is determined not only by the symbol being rewritten but also by the dominating c a t e g o r y symbol. In his analysis of the base c o m p o n e n t of a t r a n s f o r m a t i o n g r a m m a r , M c C a w l e y 1968 suggested that the appropriate role of c o n t e x t - s e n s i t i v e rules in the base c o m p o n e n t of a transformational g r a m m a r can be viewed as node admissibility c o n d i t i o n s on the base trees. The base c o m p o n e n t is thus a set of labeled trees satisfying certain conditions. Peters and Ritchie 1969 made this notion precise and proved an important result, which roughly states that the weak generative p o w e r of a c o n t e x t - s e n s i t i v e g r a m m a r is that of a c o n t e x t - f r e e grammar, if the rules are used as node admissibility conditions. L a t e r Joshi and L e v y 1977 made a substantial extension of this result and showed that, if the node admissibility conditions include Boolean combi- nations of proper analysis predicates and domination predicates, the weak generative capacity is still that of a c o n t e x t - f r e e grammar.

Besides the n o t i o n of node admissibility, G a z d a r i n t r o d u c e s two o t h e r notions in his f r a m e w o r k ( G e n e r a l i z e d Phrase Structure G r a m m a r s , G P S G ) . These are (1) categories with holes and an associated set of derived rules and linking rules, and (2) metarules for deriving rules from one another. The categories with holes and the associated rules do not increase the weak generative p o w e r b e y o n d that of c o n t e x t - f r e e grammars. The metarules, unless c o n - strained in some fashion, will increase the generative power, because, for example, a metarule can generate an infinite set of c o n t e x t - f r e e rules that can generate a strictly c o n t e x t - s e n s i t i v e language. (The language { a n b n c n / n > l } can be g e n e r a t e d in this way.) The metarules in the actual grammars written in the G P S G f r a m e w o r k so far are constrained enough so that they do not increase the generative power.

Besides node admissibility conditions, Peters 1980 introduces a device for "linking" nodes (see also Kart- tunen 1980). A lower node can be " l i n k e d " to a node higher in the tree and b e c o m e s " v i s i b l e " while the semantic i n t e r p r e t a t i o n is carried out at the lower node. The idea here is to let the c o n t e x t - f r e e g r a m m a r overgenerate and the semantic interpretation weed out ill-formed structures. K a r t t u n e n 1980 has developed a parser using this idea.

Kaplan and Bresnan 1979 have p r o p o s e d an inter- mediate level of representation called functional structure. This level serves to filter structures generated by

a phrase structure grammar. Categories with holes are not used in their framework. In this paper we will not be c o n c e r n e d with the K a p l a n - B r e s n a n system.

In Section 2 we briefly review G a z d a r ' s proposal, especially his notion of categories with holes. We give a short historical review of this notion.

In Section 3 we briefly describe our work on local constraints on structural descriptions (Joshi and L e v y 1977; Joshi, Levy, and Yueh 1980). We give an intuitive explanation of these results.

In Section 4 we p r o p o s e some e x t e n s i o n s of o u r results and discuss them in the c o n t e x t of some long distance rules. We also describe P e t e r s ' s 1980 ap- p r o a c h and present some suggestions for " c o n t e x t - sensitive" compositional semantics.

In Section 5 we briefly present the f r a m e w o r k of Peters and K a r t t u n e n and c o m p a r e it with that of G a z - dar and of ourselves.

In Section 6 we briefly discuss our results c o n c e r n - ing a characterization of structural descriptions entirely in terms of trees without labels.

2. G a z d a r ' s F o r m u l a t i o n

G a z d a r 1979 has introduced categories with holes and some associated rules in o r d e r to allow for the base generation of " u n b o u n d e d " dependencies. L e t V N be the set of

basic

nonterminal symbols. T h e n we define a set D(V N) of

derived

nonterminal symbols as follows.

D(VN) = { a / ~ I ct, fl e V N }

F o r example, if S and NP are the only two n o n t e r - minal symbols, then D(V N) would consist of S / S , S / N P , N P / N P , and N P / S . The intended interpretation of a derived c a t e g o r y (slashed c a t e g o r y or a cate- gory with a hole) is as follows: A node labeled

a/18

will dominate subtrees identical to those that can be d o m i n a t e d by a, except that s o m e w h e r e in every sub- tree of the

a/18

type there will occur a node of the form

~/fl

dominating a resumptive p r o n o u n , a trace, or the e m p t y string, and every node linking

a/fl

and /3//3 will be of the form

a/t9.

Thus

a/18

labels a node of type a that dominates material containing a hole of the type/3 (i.e., an extraction site in a m o v e m e n t analysis). F o r example, S / N P is a sentence that has an N P miss- ing somewhere. The derived rules allow the p r o p a g a - tion of a hole and the linking rules allow the introduction of a c a t e g o r y with a hole. F o r example, given the rule (1)

[ s N P V P ] 3 (1)

3 This is the same as the rule S-,,. NPVP

but written as a node admissibility condition.

(3)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear More Fruit

we will get two derived rules (2) and (3)

[S/NP

NP/NP VP] (2)

[S/NP

NP VP/NP] (3)

An e x a m p l e of a linking rule is a rule (rule s c h e m a ) that introduces a c a t e g o r y with a hole as n e e d e d for topicalization.

[s ~ S/a] (4)

F o r a = P P this b e c o m e s

[s PP S/PP ] (5)

This rule will induce a structure like (6). T h e tech- nique of categories with holes and the associated derived and linking rules allows u n b o u n d e d d e p e n d e n c i e s to be a c c o u n t e d for in the p h r a s e structure r e p r e s e n t a - tion; h o w e v e r , this is a c c o m p l i s h e d at the e x p e n s e of proliferation of categories of the type a/[3 (see also K a r t t u n e n 1980). Later, in Section 3, we will p r e s e n t an alternate w a y of r e p r e s e n t i n g (6) by m e a n s of local constraints and some of their generalizations.

J

PP

P

NP

I

to

Bill

S

S / P P

NP

VPlPP

I

/ \

Mary

V

NP

I

gave

PP/PP

I

a book

¢

(6)

T h e notion of c a t e g o r i e s with holes is not c o m - pletely new. In his ' S t r i n g Analysis of L a n g u a g e S t r u c t u r e ' , H a r r i s 1956, 1962 i n t r o d u c e s c a t e g o r i e s such as S - NP or S Np (like S / N P of G a z d a r ) to a c c o u n t for m o v e d constituents. H e does not h o w e v e r seem to provide, at least not explicitly, m a c h i n e r y for carrying the " h o l e " d o w n w a r d s . H e also has rules in his f r a m e w o r k f o r i n t r o d u c i n g c a t e g o r i e s with holes. Thus, in his f r a m e w o r k , s o m e t h i n g like (6) would be accomplished b y allowing for a sentence f o r m (a cen- ter string) of the f o r m (7) (not entirely his notation).

NP V

£-NP

(7)

£ = O b j e c t or C o m p l e m e n t of V

Sager, who has c o n s t r u c t e d a very substantial p a r s e r starting f r o m some of these ideas and extending t h e m significantly, has allowed for the p r o p a g a t i o n of the ' h o l e ' resulting in structures very similar to those of G a z d a r . She has also used the notion of categories with holes in order to carry out some coordinate structure c o m p u t a t i o n . F o r example, Sager allows for the coordination of S / a and S / a but not S and S/a. (See Sager 1967 for an early r e f e r e n c e to her work.)

G a z d a r is the first, h o w e v e r , to i n c o r p o r a t e the notion of c a t e g o r i e s with holes a n d the a s s o c i a t e d rules in a f o r m a l f r a m e w o r k for his syntactical t h e o r y and also to exploit it in a s y s t e m a t i c m a n n e r for ex- plaining c o o r d i n a t e structure p h e n o m e n a .

3. Local C o n s t r a i n t s

In this section we briefly review our w o r k on local constraints. A l t h o u g h this w o r k has already a p p e a r e d (Joshi and L e v y 1977, Joshi, L e v y , and Yueh 1980) and a t t r a c t e d s o m e a t t e n t i o n recently, the d e m o n s t r a - tion of our results has r e m a i n e d s o m e w h a t inaccessible to m a n y due to the technicalities of the tree a u t o m a t a theory. In this p a p e r we p r e s e n t an intuitive a c c o u n t of these results in t e r m s of interacting finite state m a - chines.

T h e m e t h o d of local c o n s t r a i n t s is an a t t e m p t to d e s c r i b e c o n t e x t - f r e e l a n g u a g e s in an a p p a r e n t l y c o n t e x t - s e n s i t i v e f o r m that helps to retain the intuitive insights a b o u t the g r a m m a t i c a l structure. This f o r m of description, while a p p a r e n t l y c o n t e x t - s e n s i t i v e , is in fact c o n t e x t - f r e e .

3.1 D e f i n i t i o n o f Local C o n s t r a i n t s

C o n t e x t - s e n s i t i v e g r a m m a r s , in general, are m o r e p o w e r f u l (with r e s p e c t to w e a k g e n e r a t i v e c a p a c i t y ) t h a n c o n t e x t - f r e e g r a m m a r s . A fascinating result of Peters and Ritchie 1969 is that if a c o n t e x t - s e n s i t i v e g r a m m a r G is used for " a n a l y s i s " then the language " a n a l y z e d " b y G is c o n t e x t - f r e e . First, we describe w h a t we m e a n b y the use of a c o n t e x t - s e n s i t i v e g r a m - m a r G for " a n a l y s i s " . G i v e n a tree t, we define the set of proper analyses of t. R o u g h l y speaking, a p r o p e r analysis of a tree is a slice across the tree. M o r e precisely, the following recursive definition applies:

(4)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear M o r e Fruit

Definition 3.1. The set of proper analyses of a tree t, denoted Pt, is defined as follows.

(i) If t = q~ (the e m p t y tree), then

Pt = Cp.

( i i ) I f t =

A

then Pt = { A } u P(to).P(t 1) . . . P(t n)

where to, t 1 . . . . t n are trees, and '.' denotes c o n c a - tenation (of sets).

E x a m p l e 3.1

p11rA~#2 ( p l , P 2 e V * ) .

The c o n t e x t u a l c o n d i t i o n associated with such a " v e r t i c a l " p r o p e r analysi~ is called a domination predicate.

The general form of a local constraint combines the proper analysis and d o m i n a t i o n predicates as follows:

Definition 3.2. A local constraint rule is a rule of the form

A -* u / C A

where C A is a B o o l e a n c o m b i n a t i o n of proper analysis and d o m i n a t i o n predicates.

In transformational linguistics the context-sensitive and d o m i n a t i o n predicates are used to describe conditions on transformations; hence we have referred to these local constraints elsewhere as local t r a n s f o r m a - tions.

S

/ \

A

B

'=

/ \

I

C

d

E

C e

pt = {S, AB, AE, Ae, CdB, CdE, Cde, cdB, cdE, cde}.

L e t G be a context-sensitive grammar; i.e., its rules are of the form

A -" ~/~r__~

where A c V - E (V is the alphabet and E is the set of terminal symbols), w e V + (set of non-null strings on V) and ~r, 4~ E V* (set of all strings o n V ) . If ~r and are b o t h null, then the rule is a c o n t e x t - f r e e rule. A tree t is said to be " a n a l y z a b l e " with respect to G if for each node of t some rule of G " h o l d s " . It is obvious how to check whether a c o n t e x t - f r e e rule holds of a node or not. A context-sensitive rule A -- ~/~r ~ holds of a node labeled A if the string corresponding to the immediate descendants of that node is t~ and there is a p r o p e r analysis of t of the form p17rAdpp 2

that "passes t h r o u g h " the node, (Pl,P2 E V*). We call the c o n t e x t u a l c o n d i t i o n qr ff a proper analysis predicate.

Similar to these context-sensitive rules, which allow us to specify context on the " r i g h t " and " l e f t " , we often need rules to specify context on the " t o p " or " b o t t o m " . Given a node labeled A in a tree t, we say that DOM(~r q0, qr, ~ E V*, holds of a node labeled A if there is a path from the root of the tree to the frontier, which passes through the node labeled A, and is of the form

3.2 R e s u l t s on Local C o n s t r a i n t s

Theorem 3.1 (Joshi a n d L e v y 1977) L e t G be a finite set of local constraint rules and z(G) the set of trees analyzable by G. (It is assumed here that the trees in +(G) are sentential trees; i.e., the r o o t node of a tree in ~-(G) is labeled by the start symbol, S, and the terminal nodes are labeled by terminal symbols.) T h e n the string language L(z(G)) = { x l x is the yield of t and t E ~-(G)} is context-free.

E x a m p l e 3.2 L e t V = { S , T , a , b , c , e } a n d Y. = { a , b , c , e } ,

and G be a finite set of local constraint rules: 1. S - - e

2. S --- aT 3. T ~ a S

4. S -- b T c / ( a ) A D O M ( T ) 5. T - ~ b S c / ( a ) A D O M ( S )

In rules 1, 2, and 3, the context is null, and these rules are context-free. In rule 4 (and in rule 5), the constraint requires an ' a ' on the left, and the node d o m i n a t - ed (immediately) by a T (and by an S in rule 5).

The language generated by G can be derived by G l:

S - * e S - - a T 1

S --, aT T --- aS 1

T --,- aS T 1 --,. b S c S 1 --,. b T c

In G 1 there are additional nonterminals S 1 and T 1 that enable the c o n t e x t c h e c k i n g of the local c o n s t r a i n t s grammar, G, in the generation process.

It is easy to see that, u n d e r the h o m o m o r p h i s m that r e m o v e s subscripts on the n o n t e r m i n a l s T 1 a n d S 1, each tree generable in G 1 is analyzable in G. Also, each tree a n a l y z a b l e in G has a h o m o m o r p h i c pre- image in G 1.

The m e t h o d s used in the p r o o f of the t h e o r e m use tree a u t o m a t a to check the local constraint predicates,

(5)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear M o r e Fruit

since tree a u t o m a t a used as recognizers accept only tree sets whose yield languages are context-free.

We now give an informal introduction to the ideas of ( b o t t o m - u p ) tree automata. Tree a u t o m a t a process labeled trees, where there is a left-to-right ordering on the successors of a node in the tree. W h e n all the successors of a node v have been assigned states, then a state is assigned to v by a rule that depends on the label of v and (he states of the successors of v consid- ering their left-to-right ordering. N o t e that the auto- m a t o n may immediately assign states to the nodes on the frontier of the tree since these nodes .have no successors. If the set of states is partitioned into final and non-final states, then a tree is a c c e p t e d by the a u t o m a t o n if the state assigned to the root is a final state. A set of trees accepted by a tree a u t o m a t o n is called a recognizable set. N o t e that the a u t o m a t o n m a y operate non-deterministically, in which case, as usual, a tree is accepted if there is some set of state assign- ments leading to its acceptance.

The importance of tree a u t o m a t a is that they are related to the sets of derivation trees of context-free grammars. Specifically, if T is the set of derivation trees of a context-free grammar, G, then there is a tree a u t o m a t o n that recognizes T. Conversely, if T is the set of trees recognized by a tree a u t o m a t o n , A, then T may be systematically relabeled as the set of derivation trees of a context-free grammar.

The basic idea presented in detail in Joshi and L e v y 1977 is that, because tree a u t o m a t a have nice closure properties (closure under union, intersection, and con- catenation), they can do the computations required to check the local constraints.

A n o t h e r way of looking at the checking of a labeled tree by a tree a u t o m a t o n is as follows. We im- agine a finite state machine sitting at each node of a tree. The role of the finite state machine is to check that a correct rule application is made at the node it is checking. Initially, the nodes on the frontier are turned on and signal their parent nodes. At any other node in the tree, the machine at that node is turned on as soon as all its direct descendants are active. Assum- ing that at each node the machine for that node has c h e c k e d that the rule applied there was one of the rules of the context-free grammar we are looking for, then when the root node of the tree signals that it has correctly checked the root we k n o w that the tree is a proper tree for the given context-free grammar.

W h e n checking for local constraints, a machine at a given node not only passes information to its parent, as described above, but also passes information a b o u t those parts of the local constraints, corresponding to the given node as well as all its descendants, that have not yet been satisfied. The point is that this informa-

tion is always b o u n d e d and hence a finite n u m b e r of states are adequate to code this information.

The fact that the closure p r o p e r t i e s hold can be seen as follows. Consider a slightly more general situation. We consider an A machine and a B machine at each node. D e p e n d i n g on the c o n n e c t i o n s b e t w e e n these A and B machines, we obtain additional results. F o r example, as each A machine passes information to its parent, it m a y also pass information to the B machine, but the [3 m a c h i n e will n o t pass i n f o r m a t i o n back to the A machine. The tree is accepted if the B machine .at the root node of the tree ends up in a final state. A l t h o u g h this seems to be a more complicated model, it can in fact be subsumed in our first model and is the basis of an informal p r o o f that the recogni- tion rules are closed u n d e r intersection, since the A machine and the [3 machine can check different rules.

A n important point is that the local constraint on a rule applied at a given node m a y only be verified by the checking a u t o m a t a at some distant ancestor of that node. In particular, in the case of a proper analysis constraint, it can only be verified at a node sufficiently high in the tree to dominate the entire string specified in the constraint.

The perceptive reader m a y n o w be wondering what replaces all these h y p o t h e t i c a l finite state m a c h i n e s w h e n the set of trees c o r r e s p o n d s to a c o n t e x t - f r e e grammar. Well, if we were to c o n v e r t our local constraints g r a m m a r into a s t a n d a r d f o r m c o n t e x t - f r e e grammar, we would require a larger nonterminal set. In effect this larger nonterminal set is an encoding of the finite state information stored at the nodes.

The intuitive explanation presented in this section is, in fact, a complete characterization of recognizabili- ty. Given a c o n t e x t - f r e e grammar, one can specify the finite state machine to be posted at each node of a tree to check the tree. A n d conversely, given the finite state m a c h i n e description, o n e can derive the equivalent context-free grammar.

The essence of the local constraints formulation is to paraphrase the finite state checking at the nodes of the tree in terms of patterns or predicates.

4. S o m e G e n e r a l i z a t i o n s

The result of T h e o r e m 3.1 can be generalized in various ways. Generalizations in (i) and (ii) below are immediate.

(i) Variables can be included in the constraint. Thus, for example, a local constraint rule can be of the form

A --,,. w I B C D X E FYG

where A,B,C,D.E,F,G are nonterminals, w is a string of terminals a n d / o r nonterminals, and X and Y are variables that range over a r b i t r a r y strings of terminals and nonterminals.

(6)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear M o r e Fruit

(ii) Finite tree f r a g m e n t s can be included in the constraint. Thus, for example, a local constraint rule can be of the f o r m

A - , - w

BG

,

P P

/ \

I X

C

D

P

N P

I

to

A n o t h e r useful g e n e r a l i z a t i o n has the following essential character.

(iii) Predicates that relate nodes mentioned in the prop- er analysis predicates and domination predicates

(associated with a rule), as well as nodes in finite tree f r a g m e n t s d o m i n a t e d b y these n o d e s , can be included in the constraint. U n f o r t u n a t e - ly, at this time we are unable to give a precise c h a r a c t e r i z a t i o n of this generalization. T h e following two predicates are special cases of this generalization, and T h e o r e m 3.1 holds f o r these two cases.

(a) C O M M A N D COM (A B C)

B i m m e d i a t e l y d o m i n a t e s A and B d o m i n a t e s C, not necessarily immediately. Usually B is an S node.

(b) L E F T M O S T S I S T E R L M S ( A B)

A is the l e f t m o s t sister of B

E x a m p l e 4.1

L e t us consider (6) in Section 2 (topicalization).

C o n s i d e r the following f r a g m e n t of a local constraints g r a m m a r , G. (Only s o m e of the relevant rules are shown.)

S .-,,. N P V P VP .-,. V NP PP S' --- PP S

V -.,. give I NP PP

PP--,,. c I PP1 X V 2 NP 3 __

A D O M (S' 4 S 5 Y VP 6 ) A COM (PP1 S'4 V P 6 ) - - /k LMS (PP1 $5)

T h e last rule has a p r o p e r analysis predicate, a domination predicate, and C O M M A N D and L E F T M O S T - S I S T E R p r e d i c a t e s w h o s e a r g u m e n t s satisfy the re- q u i r e m e n t s m e n t i o n e d in (iii) a b o v e (i.e., t h e y relate n o d e s m e n t i o n e d in p r o p e r analysis p r e d i c a t e s and d o m i n a t i o n p r e d i c a t e s ) . T h e indexing m a k e s this clear.

Structure (7) will be w e l l - f o r m e d . C o m p a r e (7) with (6) in Section 1 ( G a z d a r ' s f r a m e w o r k ) and with (8) in Section 5 ( P e t e r s ' s and K a r t t u n e n ' s f r a m e w o r k ) .

J

PPI

/ \

P

NP

I

to

Bitt

Ss

/

NP

V P 6

/

M o r y

V 2

N P 3

P P

g a v e

o b o o k

¢

(7)

A r a v i n d K. J o s h i and Leon S. L e v y P h r a s e S t r u c t u r e T r e e s B e a r M o r e F r u i t

4.1 Local Constraints in Semantics

Since a local constraint rule has a c o n t e x t - f r e e p a r t and contextual constraint part, it is possible to define c o n t e x t - s e n s i t i v e c o m p o s i t i o n a l s e m a n t i c s in the following m a n n e r .

F o r a c o n t e x t - f r e e rule of the f o r m

A--,- BC

if o(A), o(B), o(C) are the ' s e m a n t i c ' translations associated with A, B, and C, respectively, t h e n o(A) is a c o m p o s i t i o n of o(B) and o(C).

F o r a local constraint rule of the f o r m

A - , - BC I P

where A -* BC is the c o n t e x t - f r e e part and P is the contextual constraint, we can have o(A) as a c o m p o s - ition of o(B) and o(C), which d e p e n d s on P. This idea has b e e n pursued in the c o n t e x t of p r o g r a m m i n g languages (Joshi, L e v y , and Yueh 1980). W h e t h e r such an a p p r o a c h would be useful for natural language is an o p e n question. ( A n additional c o m m e n t a p p e a r s in Section 5.)

5. Linked Nodes 4 (Peters's and Kartunnen's frame- work)

Peters 1980 and K a r t u n n e n 1980 have p r o p o s e d a device for linking nodes to handle u n b o u n d e d d e p e n - dencies. Thus, for example, instead of (6) or (7), we have (8).

S

!

/

(8)

. f - p p

$

,-

/ \

/

\

/

P

N P

NP

VP

I

/

I

to

B i l l

M a r y

V

NP

",,

\

g a v e a book

I

S

T h e dotted line that loops f r o m the VP node b a c k to the m o v e d constituent is a w a y of indicating the loca- tion of the gap in the o b j e c t position under the VP. T h e link also indicates that there is certain d e p e n d e n c y b e t w e e n the gap and the dislocated element. Both in our a p p r o a c h and that of Peters and K a r t t u n e n , proliferation of categories as in G a z d a r ' s a p p r o a c h is avoid- ed. Further, for Peters and K a r t t u n e n , while carrying

4 We give a very informal description of a linked tree. A precise definition can be f o u n d in S. Peters and R.W. Ritchie, Phrase Linking Grammars, Technical R e p o r t , D e p a r t m e n t of Lin- guistics, University of Texas at Austin, 1982.

out b o t t o m - u p s e m a n t i c translation, the m o v e d constit- u e n t is " v i s i b l e " at the VP node. In our a p p r o a c h , this "visibility" can be o b t a i n e d if the translation is m a d e to d e p e n d o n the c o n t e x t u a l c o n s t r a i n t which, of course, has already b e e n c h e c k e d prior to the translation. This is the essence of our suggestion in Section 4.1.

K a r t t u n e n 1980 has c o n s t r u c t e d a p a r s e r i n c o r p o - rating the device of linked nodes. K a r t t u n e n also dis- cusses the p r o b l e m of c o m p l e x p a t t e r n s of m o v e d c o n - stituents and their associated gaps or r e s u m p t i v e p r o - nouns. This is not easy to handle in G a z d a r ' s f r a m e - w o r k without multiplying the categories e v e n further, e.g., b y providing categories such as S / N P NP, etc. 5 K a r t t u n e n handles this p r o b l e m by essentially i n c o r p o - rating the checking of the p a t t e r n s of gaps and fillers in the parser, i.e., in the control structure of the p a r - ser.

O u r a p p r o a c h can be r e g a r d e d as s o m e w h a t inter- m e d i a t e b e t w e e n G a z d a r ' s and t h a t of P e t e r s a n d K a r t t u n e n in the following sense. We avoid multipli- cation of categories as do Peters and K a r t t u n e n . O n the o t h e r hand, the r e l a t i o n s h i p b e t w e e n the m o v e d c o n s t i t u e n t and the gap is e x p r e s s e d in the g r a m m a r itself ( m o r e in the spirit of G a z d a r ) instead of in the p a r s e r ( m o r e precisely, in the d a t a structure c r e a t e d b y the p a r s e r ) as in the Peters and K a r t t u n e n a p p r o a c h .

We h a v e not p u r s u e d the topic of multiple gaps and fillers in our f r a m e w o r k but, obviously, in it we would opt for K a r t t u n e n ' s suggestion of checking the constraints on the p a t t e r n s of gaps and fillers in the p a r s e r itself. It could not be done b y local constraints alone b e c a u s e local constraints essentially do the w o r k of the links in the Peters and K a r t t u n e n f r a m e w o r k .

6. Skeletal Structural Descriptions (Skeletons)

In Section 4, we s h o w e d h o w local constraints allowed us to p r e v e n t proliferation of categories. We can dispense with the local c o n s t r a i n t s and construct an equivalent c o n t e x t - f r e e g r a m m a r that would h a v e potentially a very large n u m b e r of categories. While pursuing the relation b e t w e e n ' s t r u c t u r e ' and the size of the n o n t e r m i n a l v o c a b u l a r y (i.e., the syntactic c a t e - gories), we were led to the following surprising result: the actual labels, in a sense, c a r r y no i n f o r m a t i o n . (This result was also used b y us in developing s o m e heuristics for converting a c o n t e x t - f r e e g r a m m a r into a m o r e c o m p a c t but equivalent local constraints g r a m - mar. We will not describe this use of our result in the p r e s e n t p a p e r . ( F o r f u r t h e r i n f o r m a t i o n , see Joshi, L e v y , and Y u e h 1980.)

First we n e e d some definitions. A phrase structure tree w i t h o u t labels will be called a skeletal structural

5 S / N P NP m e a n s an S tree with two NP type holes.

(8)

Aravind K. Joshi and Leon S. L e v y Phrase S t r u c t u r e Trees Bear M o r e Fruit

description or a skeleton. A skeleton exhibits all of the grouping structure without naming the syntactic categories. F o r example, (9) is a skeleton. The structural description is characterized only by the shape of the tree and not the associated labels. The only symbols appearing in the structure are the terminal symbols (more precisely, the preterminal symbols and the terminal symbols, in the linguistic c o n t e x t , as in (10); however, for the rest of the discussion, we will take skeletons to mean trees with terminal symbols only).

L e t G be a c o n t e x t - f r e e grammar and let T G be the set of sentential d e r i v a t i o n trees (structural descriptions) of G. L e t S G be the skeletons of G, i.e., all trees in T G with the labels removed.

It is possible to show that for e v e r y c o n t e x t - f r e e g r a m m a r G we can c o n s t r u c t a skeletal g e n e r a t i n g system (consisting of skeletons and skeletal rewriting rules) that generates exactly SG; i.e., all c a t e g o r y labels can be eliminated while retaining the structural i n f o r m a t i o n ( L e v y and Joshi 1979).

Bitt

J

ove

/ \

a b o o k

/ \

to

Mary

(9)

/

N

I

Bill

V

• gave

DET

N

P

N

I

a

book

to

Mary

(10)

(9)

Aravind K. Joshi and Leon S. Levy Phrase Structure Trees Bear More Fruit

E x a m p l e 6.1

G : S ~ aSb S --,. ab SG:

/ \

/ I \

a b a , b

/X

a b

(1) ( 2 )

A skeletal g e n e r a t i n g s y s t e m can be c o n s t r u c t e d as follows. We have a finite set of initial skeletons and a finite set of skeletal rewriting rules w h o s e l e f t - h a n d and right-hand sides are skeletons.

Initial Skeletons

/ \

a

b

Skeletal rewriting rules

a b a , b

/ \

a b

In this s y s t e m , g e n e r a t i o n p r o c e e d s f r o m an initial skeleton t h r o u g h a sequence of i n t e r m e d i a t e skeletons to the desired skeleton. Clearly, b e c a u s e of the definition of a skeleton and the nature of the skeletal rewriting rules, the rules must always apply to one of the l o w e r m o s t c o n f i g u r a t i o n s in a s k e l e t o n t h a t m a t c h e s with the l e f t - h a n d side of a rule. Thus the derivation of the skeleton (3) in S G would be as in (11). T h e configurations encircled b y a d o t t e d line are the ones to which the skeletal rule is applied.

/ I X

a • b

/ 1 \

a , b

/ \

a b

• • •

( 3 ) = = •

In the a b o v e example, there was only one n o n t e r - minal; h e n c e the result is obvious. F o l l o w i n g is a s o m e w h a t m o r e c o m p l i c a t e d example.

E x a m p l e 6.2

G : E - ~ a E - . ( E )

E - - E + E / ~ ( + )A~ ( * ) A ~ ( * ) E - ~ E * E / ~ ( + )

G is a local c o n s t r a i n t s g r a m m a r . Clearly there is a c o n t e x t - f r e e g r a m m a r G' t h a t is e q u i v a l e n t to G. R a t h e r t h a n taking a c o m p l i c a t e d c o n t e x t - f r e e g r a m - m a r and t h e n exhibiting the equivalent skeletal g r a m - mar, we will take the local constraints g r a m m a r G and exhibit a skeletal g r a m m a r equivalent to G. This will allow us to p r e s e n t a c o m p l i c a t e d e x a m p l e w i t h o u t m a k i n g the resulting skeletal g r a m m a r t o o unwieldy. Also, this e x a m p l e will give s o m e idea a b o u t the rela- tionship b e t w e e n local c o n s t r a i n t s g r a m m a r s and skeletal g r a m m a r s ; in particular, the skeletal rewriting rules indirectly e n c o d e the local constraints in the rules in E x a m p l e 6.2.

We have eliminated all labels b y introducing structural rewriting rules a n d defining the d e r i v a t i o n as p r o c e e d i n g f r o m skeleton to skeleton r a t h e r t h a n f r o m string to string. This result clearly brings out the rela- tionship b e t w e e n the grouping structure and the syntactic categories labeling the nodes.

f • ~ • •

' / \

'

/ I N

l a b l a . . . a ,

,

_, _,

,

_{, - 7 \ ' . .}

j ! \

( a

b ,

, ,

/ \

a

b

(11)

(10)

A r a v i n d K. J o s h i and Leon S. L e v y Phrase S t r u c t u r e T r e e s Bear M o r e Fruit

Skeletal grammar equivalent to G:

Initial Skeletons:

/ \

/i\

t

1

1 I

G Q Q Q

Skeletal Rewriting Rules:

/ \

i ' i

Q Q

/ \

• 4. •

I

Q O

. / i \

I

T

Q Q

/ \

(1 G

G

/',~ T

i ' i

°

Q Q

/ \

• ¢ •

" / \

j . j

o

Q Q

./!\.

I

./l\.

°

I

0 0

/ \

T

r ' r

°

Q Q

/ \

'

T

'

~

'/i\'

°

i ' i

C1 Q

/ \

°

i ' T

Q Q

(11)

Aravind K. J o s h i and Leon S. Levy Phrase Structure Trees Bear M o r e Fruit

Since skeletons p a y a t t e n t i o n to grouping only, this result m a y be psycholinguistically i m p o r t a n t b e c a u s e our first intuition a b o u t the structure of a s e n t e n c e is m o r e likely to be in t e r m s of the grouping structure and not in terms of the c o r r e s p o n d i n g syntactic categories, especially those b e y o n d the p r e t e r m i n a l categories.

T h e t h e o r y of s k e l e t o n s m a y also p r o v i d e s o m e insight into the p r o b l e m of g r a m m a t i c a l inference. F o r a finite state string a u t o m a t o n , it is well k n o w that if the n u m b e r of states is 2k then, if we are p r e s e n t e d with all a c c e p t a b l e stings of length < 2 k , the finite state a u t o m a t o n is c o m p l e t e l y determined. We h a v e a similar situation with the skeletons. First, it c a n be s h o w n that for e a c h skeletal set S G (i.e., the set of skeletons of a c o n t e x t - f r e e g r a m m a r ) we can construct a b o t t o m - u p tree a u t o m a t o n that recognizes precisely S G ( L e v y and Joshi 1978). Further, if the n u m b e r of states of this a u t o m a t o n is k, then the set of all ac- c e p t a b l e sets of s k e l e t o n s of d e p t h _<2k c o m p l e t e l y d e t e r m i n e s 5 G ( L e v y and Joshi 1979). Using skeletons (i.e., string with their grouping structure) r a t h e r t h a n just strings as input to a g r a m m a t i c a l inference machine is an idea w o r t h pursuing further.

6. C o n c l u s i o n :

We h a v e p r e s e n t e d several results c o n c e r n i n g phrase structure trees that show that p h r a s e structure trees w h e n viewed in c e r t a i n w a y s h a v e m u c h m o r e descriptive p o w e r t h a n one would have thought. We have given a brief a c c o u n t of our w o r k on local constraints and p r e s e n t e d an intuitive proof. We have also c o m p a r e d it to s o m e aspects of the f r a m e w o r k of G a z - dar and that of Peters and K a r t t u n e n . We h a v e also shown that phrase structure trees, e v e n w h e n d e p r i v e d of the labels, retain in a certain sense all the structural information. This result has implications for g r a m m a t - ical inference procedures.

R e f e r e n c e s

Bresnan, J.W. 1978 A Realistic transformational grammar. In Halle, M., Bresnan, J. and Miller, G.A., Ed., Linguistic Theory and Psychological Reality. The MIT Press, Cambridge, Mass.

Chomsky, N. 1965 Aspects of the Theory of Syntax, The MIT Press, Cambridge, Mass.

Gazdar, G.J.M. 1978 English as a context-free language, unpubl- ished manuscript.

Gazdar, G.J.M. 1981 Unbounded dependencies and coordinate structure. Linguistic Inquiry 12, 2.

Gazdar. G.J.M. 1982 Phrase Structure Grammar. To appear in Jaeobson, P. and Pullam, G.K., Ed., The Nature of Syntactic Representation. Reidel, Boston, Mass.

Harris, Z.S. 1956 String analysing of language structure, manuscript. Also in manuscripts around 1958 and published in 1962 by Mouton and Co., The Hague.

Joshi, A.K. and Levy, L.S. 1977 Constraints on structural descriptions. S l A M Journal of Computing.

Joshi, A.K., Levy, L.S., and Yueh, K. 1980 Local constraints in the syntax and semantics of programming language. Journal of Theoretical Computer Science.

Kaplan, R. and Bresnan, J.W. 1979 A formal system for grammatical representation. To appear in Bresnan, J.W., Ed., The Men- tal Representation of Grammatical Relations. The MIT Press, Mass.

Karttunen, L. 1980 Unbounded dependencies in phrase structure grammar: slash categories versus dotted lines. Paper presented at the Third Amsterdam Colloquium: Formal Methods in the Study of Language, Amsterdam.

Levy, L.S. and Joshi, A.K. 1978 Skeletal structural descriptions. Information and Control

McCawley, J.D. 1967 Concerning the base component of a transformational grammar. Foundations of Language, Vol. 4. Reprint- ed in McCawley, J.D., Grammar and Meaning, Academic Press, New York, NY, 1968.

Peters, S. 1980 (Talk presented at the Workshop on Alternatives to Transformation Grammars, Stanford University, January.) Peters, S. and Ritchie, R.W. 1969 Context-sensitive immediate

constituent analysis. Proc. A C M Symposium on Theory of Computing.

Sager, N. 1967 Syntactic analysis of natural languages. Advances in Computers, Vol. 8 ed. M. Alt and M. Rubinoff, Academic Press, New York, NY.

Aravind K. Joshi is a professor o f computer and information science, and that department's chairman, at the University o f Pennsylvania, Philadelphia. He re- ceived the Ph.D. degree in electrical engineering from the University o f Pennsylvania in 1960.

Leon S. Levy is a member o f the technical staff o f Bell Telephone Laboratories at Whippany, New Jersey. He received the Ph.D. degree in Computer Science from the University of Pennsylvania in 1970.