LICENSING GRAMMAR, PROCESSING AND EXTRACTION

5.1 - Introduction

In the previous chapter I outlined a model of dependency syntax, Licensing Grammar (LG), a theory which was shown to differ substantially from other dependency-based frameworks, notably in its failure to recognise distinct GRs and in its assumed disparity between syntactic and semantic structure. In this final chapter I wish to return to the central topic of this thesis and explore how the theory of LG might serve as the basis of a holistic and principled account for the extraction data examined in the first three chapters. However, as I suggested at the end of chapter 3, I believe that a purely structural syntactic account of extraction may be inappropriate, and I will argue here that in order to capture many of the intricacies of the data we must look to issues of language processing as well as syntax. One of the main concerns of this chapter will thus be to show how the syntactic mechanisms of LG underpinning extraction can be conditioned and constrained by the operation of the parser. I believe that only in this way is it possible to account for the scalar nature of grammaticality judgements associated with extraction phenomena, as well as some of the pragmatic influences which affect the data.

I have suggested that extraction data can be viewed as a sort of benchmark test against which the adequacy of competing theories may be evaluated. My aim here, then, is to demonstrate that a relatively simple, though unconventional, dependency-based theory of syntax such as LG, when supplemented with processing factors, can offer a plausible and effective account of these data, and one which in my opinion goes beyond anything formulated in other dependency frameworks. In this way I hope to demonstrate for the first time that a dependency theory might after all be able to compete on an equal footing with PP Theory in its treatment of extraction.

In section 5.2 I will outline some key issues in language processing and explain in greater detail why I believe that these are relevant to extraction data. Section 5.3 will then describe a simple LG-based parsing system; I will show, among other things, how the operation of this parser may derive the adjacency constraint which was noticeably absent in LG itself. Section 5.4 introduces LG’s syntactic approach to extraction, and demonstrates how this can be constrained by the operation of the parser. Finally, in sections 5.5 and 5.6 I will show how island constraints as well as extraction asymmetries can be explained in terms of processing difficulty.

5.2 - Some Processing-Related Issues

5.2.1 - The need for processing

LG, like WG, is a monostratal dependency theory, and thus any account of extraction advanced within this model will have to involve a single surface structure of binary relations alone. Unlike WG, however, LG is a ‘mono-relational’ theory which does not recognise distinct GRs, and thus will evidently not be able to invoke a separate relation such as WG’s Visitor in order to help account for the long-distance dependencies arising from extraction. From this point of view, then, LG is doubly constrained in that in seeking to account for extraction phenomena the theory can only call upon a single level of syntactic representation composed of a single relation. Consider, for example, the sentence in (1):

1) What did Wally wear?

Given the more constrained principles of licensing upon which LG is based, the only possible structure for (1) will be (2):

2) What [FIN do] Wally wear

[Dependency diagram]

Assuming that “what” is a syntactic (and semantic) argument of “wear”, this will be the only word capable of licensing “what”; no other word in (2) can actively license it since none of them requires any other dependent. Similarly, “what” cannot be passively licensed by another word in (2) for the simple reason that it already has a head in the form of “wear”; as I described in section 4.3.6, a passive (dependent-sponsored) licensing relation occurs in cases where a dependent requires a head, but the head does not require a dependent. This is obviously inapplicable in the case of “what”, which is actively licensed by “wear”.

I have already described in the third chapter how analyses like the one in (2) are inadequate. For one thing they offer no explanation as to why the WH-object of “wear” occurs so far from its head at the front of the sentence. (2) also violates well-motivated principles of locality, in that “wear” and its dependent “what” are separated by a number of words which bear no direct relation to them. Moreover, this analysis is totally unconstrained, and there is nothing in (2) to prevent the extraction dependency from spanning an ‘island’ domain such as (3):

3) *what [FIN do] Wally arrive while he [FIN be] [ING wear]
*‘What did Wally arrive while he was wearing?’

In chapters 2 and 3 I described how PP Theory and WG both overcome problems of this sort by reducing the long-distance dependencies between heads and extractees to a series of smaller local ones, either by a system of intermediate traces in the case of PP Theory or by the use of ancillary visitor dependencies in the case of WG. The question, then, is whether a similar localisation of long-distance dependencies can also be achieved in LG, given that the theory neither incorporates transformational mechanisms nor allows for the existence of additional dependencies or distinct GRs.

I believe that the key to this question lies in processing. More specifically, I believe that although LG provides the syntactic means of coping with extraction, this can only be properly understood by examining how syntactic structures are processed. Very simply, I will argue that LG may allow the licensing capacity of one word to be transferred to another. Thus FIN in (2) above, for example, could license “what” through a valency specification transferred from “wear”. I will describe this system of valency transferal (VT) more fully in section 5.4. My claim will be, however, that the syntactic process of VT is executed ‘online’ by the parser. Furthermore, I will argue that the parser can only carry out each application of VT locally (in a sense to be defined later). I believe that this procedural approach to syntax may allow a series of temporary dependencies to be established between an extractee and its head, thereby localising the distance between them.

Relating extraction to processing in this way has a number of advantages. For one thing it raises the possibility that ‘island’ violations could be explained in terms of processing difficulty rather than ungrammaticality. This in turn implies a less absolute and more relative approach to these data; after all, some things can be ‘more difficult’ than others, and this more flexible view seems better suited to the often scalar nature of island violations discussed in sections 2.5 and 3.5. Indeed, one of the reasons why I believe that a purely syntactic account of extraction is inappropriate is that it implies an absolute distinction between what is grammatical and ungrammatical. Furthermore, I will also argue that incorporating processing-related factors into the LG account of extraction offers us a realistic hope of taking into account some of the pragmatic influences on the data, discussed in the first chapter, which would otherwise be very difficult to accommodate within a purely syntactic framework.

The rest of this chapter will be devoted to exploring how simple processing factors may relate to syntax in LG, and how the two may be shown to interact in offering a plausible account for extraction phenomena. I should stress at this point, however, that by this I am not seeking to mask any shortcomings of LG by appealing to processing. Nor is it my intention here to put forward a rigorous, fully-defined or necessarily implementable model of parsing; the main emphasis of this chapter is syntax and how it can be constrained and conditioned by processing considerations, and for this reason I will try to keep parsing-related technicalities and formalism to a minimum.

5.2.2 - A brief overview of parsing

I assume, relatively innocuously I think, that the core meaning of a sentence is a product of its constituent words and the relations between these words.¹ The latter point is illustrated by (4) below where two sentences containing the same words display different meanings:

4) i. Nixon kissed Mao. ii. Mao kissed Nixon.

In section 4.4 of the previous chapter I described how in LG the semantic relations (or roles) of a sentence, those of most immediate relevance to its meaning, can be read off or derived from a syntactic structure in a systematic way. This is more or less true of all syntactic theories, and thus the implication is that in order to understand a sentence two key tasks of the hearer will be to recognise its constituent words and to work out the syntactic relations which exist between them. This process of recognition and structure building has traditionally been described as parsing - see Winograd (1983 ch. 3), De Roeck (1983), Jackendoff (1987 ch. 6), Altmann (1989) and Garman (1990 ch. 6) for some general introductory comments.

The basic function of the parser is to assign a well-formed syntactic structure to a grammatical string of words. Thus in some sense the parser can be viewed as a machine which inputs strings of words or elements and outputs syntactic structures created on the basis of these strings. In order to achieve this the parser must, of course, make use of grammatical knowledge, and the parsing process is often described as the procedural implementation of principles and facts stored in the grammar (Marcus 1980, Winograd 1983, Hudson 1990 ch. 2). For example, a PP Theory-based parser might utilise principles of X’ Theory as well as Theta Theory or Case Theory in order to construct a phrase structure tree such as (5ii) on the basis of the input string in (5i) (Berwick and Weinberg 1984, Berwick 1991, Pritchett 1992):

¹By core meaning I intend to exclude pragmatic components such as implicatures and explicatures (Sperber and Wilson 1986).

5) i. A B C
ii. [Tree diagram with nodes CP, AP, C’, BP and B]

One important property of the parser is that it must operate incrementally from left to right (Marslen-Wilson 1973, Marslen-Wilson and Tyler 1980). Theoretically it is of course possible to devise a parser which starts building structure at the end or in the middle of a sentence. Humans evidently do not operate in this way, however, and instead processing gets underway as soon as words are recognised. This, for example, explains the occurrence of ‘garden path’ sentences where the processor is fooled into building a structure which subsequently proves to be incorrect (Marcus 1980, Frazier and Rayner 1982, Pritchett 1992). So, for example, in (5) above an incremental parser might seek to link A and B, creating the partial structure in (6), before C even occurs:

6) [Partial tree diagram with nodes AP and A’]

When C occurs its own projection will be added to (6), yielding the possible structure shown in (5ii). The parsing process continues in this way with structure being built up as new words are recognised in the input string.²

²One issue which pertains particularly to phrase structure parsers is the question of whether the processor operates ‘top-down’ or ‘bottom-up’. Very simply, a ‘top-down’ (or hypothesis-driven) parser will start by assuming the existence of a top node, usually a clausal constituent such as S’ or CP, and will then seek to accommodate words from the input string within the substructures of this higher node. A ‘bottom-up’ parser, however, will instead build up lower nodes in the phrase structure tree on the basis of words as they are encountered in the input string, and then seek to create higher nodes by combining these lower ones.

Since the linguistic mainstream has been dominated by constituency-based theories of syntax, it is not surprising that the majority of parsing models have tended to concentrate on how hearers build up constituent structure on the basis of words, as illustrated in (5) above; as Gorrell (1995 p. 43) concludes: “It is a rare processing model that does not make some reference to syntactic constituents and structural relations” (see also Winograd 1983 chs. 3 and 7 and Fraser 1993 ch. 1). The first serious attempt to offer a coherent and explicit model of phrase structure parsing can be found in Bever (1970) and Fodor, Bever and Garrett (1974). Their main parsing strategy, the Canonical Sentoid Strategy, was to identify sequences of NP-V-NP in the incoming string and to analyse these sequences automatically as subject, verb and object. This is an early example of a parsing axiom or heuristic, an overall strategy which guides the processor in its attempt to build structure on the basis of the input string. Since the work of Fodor et al. (1974) a number of different parsing heuristics have been proposed which seek to build up constituent structure in a simple and principled way; see, for example, Kimball (1973), Frazier and Fodor (1978), Marcus (1980), Frazier and Rayner (1982), Berwick and Weinberg (1984) and Pritchett (1992).

Dependency-based parsers, by contrast, have been comparatively rare; Fraser (1993) contains a valuable survey. Nevertheless, parsers have been formalised on the basis of some dependency theories such as WG (Hudson 1989, 1994, Fraser 1989, 1993) and Lexicase (Starosta and Nomura 1986). In addition, various systems of machine translation and speech recognition have employed a dependency formalism (Hays 1966). Faced with the same string in (5i) a dependency-based parser might output the structure in (7):

7) A B C

[Dependency diagram]

Like its constituency-based counterparts, a dependency parser must also operate incrementally from left to right, and will thus start building structure on the basis of words as they are recognised in the input string. However, the absence of phrase structure in dependency grammars means that for the most part a dependency-based parser will rely less on structural algorithms and concentrate more on the properties of individual words as they are encountered. Thus the structure in (7) is largely a product of the fact that A and C license dependents, whereas B does not, and it is valency information of this sort that the parser will have to access in order to construct a well-formed syntactic representation. In this way dependency parsers will, for the most part, be more data-driven than hypothesis-driven, and for this reason the top-down/bottom-up distinction which characterises phrase structure parsers is of questionable relevance to dependency-oriented processing, although see Fraser (1993).
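The data-driven character of such a parser can be made concrete with a small sketch. What follows is purely illustrative and is not the LG parser of section 5.3: the function name, the numerical valency table and the restriction to one new link per incoming word are all assumptions made for the sake of the example. Each word is consulted only for the number of dependents it licenses, and the links returned for the string A B C reflect the toy valencies rather than the particular arcs of (7).

```python
def link_incrementally(forms, valency):
    """Link words left to right using nothing but valency information.

    `valency` maps each word form to the number of dependents it licenses
    (a hypothetical, simplified stand-in for real valency specifications).
    Each incoming word forms at most one link on arrival, which keeps the
    resulting structure free of cycles.
    """
    needs = []   # dependents each word can still license
    head = []    # index of each word's head, or None if it has none yet
    links = []   # (head form, dependent form) pairs in order of creation
    for i, form in enumerate(forms):
        needs.append(valency[form])
        head.append(None)
        for j in range(i - 1, -1, -1):          # nearest earlier word first
            if head[j] is None and needs[i] > 0:
                needs[i] -= 1                   # new word licenses earlier word
                head[j] = i
                links.append((form, forms[j]))
                break
            if needs[j] > 0:
                needs[j] -= 1                   # earlier word licenses new word
                head[i] = j
                links.append((forms[j], form))
                break
    return links


# Toy valencies: A and C each license one dependent, B licenses none.
abc_links = link_incrementally(["A", "B", "C"], {"A": 1, "B": 0, "C": 1})
# The sentence in (4i), with "kissed" licensing two dependents.
nixon_links = link_incrementally(
    ["Nixon", "kissed", "Mao"], {"Nixon": 0, "kissed": 2, "Mao": 0}
)
```

Note that no top node is ever hypothesised: structure arises only when an incoming word either satisfies, or has satisfied for it, a lexical requirement.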

One issue which pertains equally to dependency and phrase structure parsers is the question of ambiguity. It is often the case that a legitimate input string can be associated with more than one well-formed structure. This is true, for example, in cases such as (8):

8) Visiting relatives can be a nuisance.

In addition to examples of global ambiguity such as these, the parser will also have to resolve temporary (or local) ambiguities such as (9):

9) i. I know the Bishop.

ii. I know the Bishop likes football.

Both (9i) and (ii) display temporary ambiguity in that at the point when “the Bishop” occurs a purely incremental processor will not know whether it is the object of “know”, as in (9i), or the subject of a complement clause, as in (9ii). In order to resolve this ambiguity and establish the correct structure the parser will have to wait and see whether there is any further structure to process.

Following Fodor et al. (1974) it is possible to distinguish two broad approaches to the question of ambiguity, serialism and parallelism. Very basically, when faced with an ambiguous input string, a parallel parser will construct all possible structures that can be formed on the basis of that string. If the ambiguity is later resolved, the parser can go back and eliminate any incorrect structures that it might have built. Presented with the same ambiguous input string a serial parser will construct just one structure. It may of course transpire later that the parser’s choice was wrong - this is one of the hazards of operating incrementally. In this case the parser will have to go back and undo the structure it has created and build another one. This process is called backtracking.
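The contrast between the two strategies can be sketched in a few lines, using the temporary ambiguity in (9). This is a deliberately crude illustration under stated assumptions: the two analysis labels, the `compatible` check (which simply asks whether any words follow “the Bishop”) and the function names are all hypothetical, and stand in for the real structure building a parser would perform.

```python
# Hypothetical labels for the two analyses of "the Bishop" in (9).
ANALYSES = ["object", "clause_subject"]


def compatible(analysis, rest):
    """Toy check: does the analysis survive the words following "the Bishop"?"""
    if analysis == "object":
        return len(rest) == 0      # (9i): nothing further to process
    return len(rest) > 0           # (9ii): a complement clause continues


def parallel_parse(rest):
    """Build every analysis at the ambiguous point, then prune the failures."""
    candidates = list(ANALYSES)
    survivors = [a for a in candidates if compatible(a, rest)]
    return survivors, len(candidates)      # result, structures built


def serial_parse(rest):
    """Commit to one analysis at a time, backtracking whenever it fails."""
    backtracks = 0
    for analysis in ANALYSES:
        if compatible(analysis, rest):
            return analysis, backtracks
        backtracks += 1                    # undo the choice and try the next
    return None, backtracks


serial_9i = serial_parse([])
serial_9ii = serial_parse(["likes", "football"])
parallel_9ii = parallel_parse(["likes", "football"])
```

On this toy model the serial parser handles (9i) without backtracking but must backtrack once in (9ii), whereas the parallel parser never backtracks but always builds both structures; each strategy thus incurs a different kind of cost.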

In different ways the creation of parallel structures and backtracking are both costly in terms of processing effort, and Marcus (1980) seeks to eliminate the need for these strategies by developing a model of deterministic parsing. A deterministic parser, like a serial parser, creates only one structure at a time, but uses a look-ahead device or ‘buffer’ to ensure that each structural choice it makes is correct. Thus in (9i), for example, before linking “the Bishop” as an object of “know”, a deterministic parser will check to see if this analysis is compatible with subsequent words in the input string, which it clearly is not in the case of (9ii). In this way the parser can avoid the need to backtrack later. Since the work of Marcus (1980) most phrase structure parsers have been either strictly deterministic, in that they allow no backtracking, or weakly deterministic, in that they allow only very limited backtracking (van de Koot 1990). So too the majority of recent dependency parsers have been serial and/or weakly deterministic; this is true, for example, of Starosta and Nomura’s (1986) Lexicase parser and Fraser’s (1989, 1993) WG-based parser as well as other systems developed by Hays (1961) and Covington (1990). However, Hudson’s (1989) WG-based parallel parser is an exception.
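The role of the look-ahead buffer can likewise be illustrated with a minimal sketch. It is not a reconstruction of Marcus’s (1980) parser: the function name, the word-based buffer and the rule that any following word signals a complement clause are assumptions adopted only to show how a bounded look-ahead removes the need to backtrack in (9).

```python
def choose_analysis(words, position, buffer_size=1):
    """Decide the role of the noun phrase ending at `position` before
    committing to it, by inspecting a bounded look-ahead buffer.

    Hypothetical rule for (9): if the buffer contains any further word,
    the noun phrase is taken to be the subject of a complement clause;
    otherwise it is taken to be the object of the preceding verb.
    """
    buffer = words[position + 1 : position + 1 + buffer_size]
    return "clause_subject" if buffer else "object"


# "Bishop" is at index 3 in both strings.
choice_9i = choose_analysis(["I", "know", "the", "Bishop"], 3)
choice_9ii = choose_analysis(
    ["I", "know", "the", "Bishop", "likes", "football"], 3
)
```

Because the choice is made only after the buffer has been consulted, a single structure is built in each case and nothing has to be undone; the price is that the decision is delayed until the buffer has been filled.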

5.2.3 - Other processing-related accounts of extraction

Even before serious parsing systems were first formulated, people sought to attribute certain linguistic phenomena to processing factors. For example, Miller and Chomsky (1963) and Chomsky (1965) note how a grammar can be simplified if centre-embedded constructions are excluded by processing rather than grammatical criteria. Subsequently a great deal of interest was shown in interpreting aspects of the grammar in processing terms, or vice versa.

One of the best-known examples of this was the Derivational Theory of Complexity or DTC (Fodor et al. 1974) which, very simply, saw the parser as a transformational grammar ‘run backwards’. Since then people have generally taken a more autonomous view of the parser and grammar (Abney 1988, van de Koot 1990, Berwick 1991). Nevertheless, more recently, Hawkins (1994) suggests that certain aspects of the grammar may in fact be ‘grammaticalised’
