
Chapter 5. Processing cross-serial dependencies

5.7 Discussion: linear order and the parser

So far, we have seen that there may be several dimensions along which crossed dependencies differ from the more ‘usual’ nested dependency structures. In this discussion, I want to take a brief look at the processing of serial order, as it can provide further insight into the differences between the Dutch and German structures. I have not taken these considerations into account in my experiments, however, and I will therefore finish this section with some suggestions for further research.

5.7.1 Processing serial order

It is often argued that linear order information cannot be used in real-time sentence processing, since retrieving it is a slow, serial process (McElree and Dosher, 1993; McElree, 2006), while sentence processing is taken to be a fast, parallel process [140]. According to Lewis, Vasishth and Van Dyke (2005; henceforth LVD), this need not be problematic, since serial order information is almost never needed. Rather, the parser relies on discriminating lexical retrieval cues and on a distinction between the past (elements outside the current focus) and the present (elements just encoded). However, it seems that there are at least some structures in which serial order is needed to parse the structure correctly. This need arises when at least two items in memory match the retrieval cues and the only way to distinguish them is by their serial order. LVD suggest that examples of such structures are the typically difficult self-embeddings and cross-serial dependency structures. Thus the problem with, for example, nested centre-embeddings is not storage overload, but rather impoverished discrimination combined with poor (cognitive) support for serial order. Processing limitations then result from a limited ability to discriminate similar items rather than from limits on storage; when discrimination fails, the parser has to fall back on retained serial order, which is a slow process.

[139] Because we’ve concentrated above all on crossed dependencies throughout this thesis, I won’t go into the exact predictions of the processing models for the German structures, as this would be beyond the scope of this thesis. Instead, experiment 2 will be used to provide suggestions for further research and to see whether the syntactic structure proposed in the previous chapter gives predictions in the right direction.

[140] There is a debate, however, on whether the parser is parallel or serial, a distinction which is very hard to test, as Lewis (2000) points out. The exact implications do not concern us here, and we will assume throughout that the parser in fact works in parallel, as argued by most researchers in the field nowadays (cf. Van Gompel and Pickering, 2007).

The idea that the processing limitations in centre-embeddings are interference difficulties is worked out in more detail in Lewis and Vasishth (2005), in their ACT-R model (see the previous section). Recall that in this model sentence processing relies heavily on discriminating retrieval cues and on the distinction between the present (currently encoded items) and the past (stored items). This works very well in general, but fails in sentences with the following structure, where β is a word that triggers the retrieval of either α1 or α2, and where α1 and α2 cannot be distinguished on anything except their relative serial positions (Lewis and Vasishth, 2005:404):

(33) α1 … α2 … β

Normally, memory retrieval will return the element with the highest activation value. If all other things are equal, that would be α2, the most recent element (which is the most activated due to the decay of α1). Double self-embeddings are typical examples that fit this schema. Consider this sentence containing a double-embedded object relative clause:

(34) The salmon that the man that the dog bit smoked tasted good.
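The retrieval preference described for schema (33) can be illustrated with a minimal sketch. It assumes a simplified, single-encoding version of ACT-R's base-level activation, ln(t^-d), with the conventional default decay d = 0.5. The encoding times and the names `base_level_activation` and `retrieve` are hypothetical and serve illustration only. This is not the full model of Lewis and Vasishth (2005), which involves further components such as cue-based associative activation.

```python
import math


def base_level_activation(encoding_time, now, decay=0.5):
    """Simplified base-level activation for an item that was encoded once.

    Assumption: ln(t ** -decay) = -decay * ln(t), where t is the time
    elapsed since encoding. Older items therefore have lower activation.
    """
    elapsed = now - encoding_time
    return -decay * math.log(elapsed)


def retrieve(candidates, now):
    """Return the name of the candidate with the highest activation.

    All candidates are assumed to match the retrieval cues equally well,
    so recency (through decay) is the only thing that separates them.
    """
    return max(
        candidates,
        key=lambda name: base_level_activation(candidates[name], now),
    )


# Hypothetical encoding times (arbitrary units) for alpha1 ... alpha2 ... beta.
candidates = {"alpha1": 1.0, "alpha2": 3.0}
print(retrieve(candidates, now=5.0))  # prints "alpha2"
```

With an equal cue match, the sketch always returns the more recent item, so nothing in it can select α1 when α1 is the intended target.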

Double self-embeddings such as (34) are notoriously difficult to comprehend (Chomsky & Miller, 1963 et seq.). In many theories, the processing difficulty in these sentences has been explained in terms of demands on memory resources (e.g., Gibson, 1991; 1998; Kimball, 1973; Lewis, 1996; Miller & Isard, 1964; Stabler, 1994). Lewis and Vasishth, however, propose that the problem in these sentences lies in the multiple attachment points, which require the parser to distinguish the candidate constituents primarily (or even exclusively) on the basis of their relative serial order in order to make the correct attachments to the verb. And this is, as noted in independent studies, a troublesome process for the parser (cf. McElree, 2006). One generalization that seems to hold at the very least is that recalling serial order is difficult. Lewis (2003) suggests that there may be no serial order mechanism built into the (automatic) parser, and that distinguishing items might rely on a separate (slower, and possibly more deliberate) process that the parser recruits if it cannot work the structure out. The question that remains is: how can we then analyse the differences found between the processing of crossed and nested dependencies?

5.7.2 Serial order in cross-serial dependency structures

As Lewis and Vasishth (2005) note, cross-serial dependencies are particularly interesting because they are syntactic constructions that appear to violate the standard nested, most-recent-first ordering, which is argued to be the most natural for a parser with activation decay. Instead, cross-serial dependencies occur in a most-recent-last ordering. Note that, in this regard, activation decay predicts that nested dependencies are easier than crossed dependencies. As a way around this problem, Lewis and Vasishth suggest that there might be sufficient information (semantic, presumably) at the verb to discriminate between the candidate NPs, which would mean that the parser could avoid both the heavy costs of processing nested structures and the cost of using serial order. However, this information might not always be available (in sentences without full definite NPs, for example, or without discourse-predictable verbs), so what happens if explicit serial order information is required? Lewis and Vasishth argue for an account in which the first, second and third NPs might be encoded as “discourse anaphors whose semantics are grounded in explicit relations in a discourse model, perhaps held in long-term working memory” (Lewis and Vasishth, 2005:413). Unfortunately, they do not work out this model explicitly (it seems they want to move the processing effort to a kind of pragmatic component).
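The contrast between the two orderings can be made concrete with a small sketch. Under the simplifying assumption that the pending NPs are held in a single list, most-recent-first retrieval (a stack) yields the nested pairing, whereas most-recent-last retrieval (a queue) yields the crossed pairing. The function name `pair_dependencies` and the labels NP1 to NP3 and V1 to V3 are illustrative only; the sketch says nothing about processing cost.

```python
from collections import deque


def pair_dependencies(nps, verbs, most_recent_first):
    """Pair each incoming verb with one pending NP.

    most_recent_first=True  -> stack behaviour (take the newest pending NP)
    most_recent_first=False -> queue behaviour (take the oldest pending NP)
    """
    pending = deque(nps)
    pairs = []
    for verb in verbs:
        np = pending.pop() if most_recent_first else pending.popleft()
        pairs.append((np, verb))
    return pairs


nps = ["NP1", "NP2", "NP3"]

# Nested (German-type) order: NP1 NP2 NP3 V3 V2 V1
nested = pair_dependencies(nps, ["V3", "V2", "V1"], most_recent_first=True)

# Crossed (Dutch-type) order: NP1 NP2 NP3 V1 V2 V3
crossed = pair_dependencies(nps, ["V1", "V2", "V3"], most_recent_first=False)

print(nested)   # [('NP3', 'V3'), ('NP2', 'V2'), ('NP1', 'V1')]
print(crossed)  # [('NP1', 'V1'), ('NP2', 'V2'), ('NP3', 'V3')]
```

Both runs produce the correct NP-verb pairs, but only the first does so by always taking the most recent (and hence, under decay, most active) item.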

So how might we account for the difference between crossed and nested dependencies? One way of looking at it is this: if recalling serial order is difficult, reusing the exact same order would help the most, for it is easier to remember a copy of something than to remember an inverted copy of it [141] (cf. Cooper, 1975, as cited in Vogel, Hahn and Branigan, 1996).

[141] Note that such a hypothesis could have interesting consequences for the analyses of other linguistic processes where serial order is retained, as in Scrambling, Object Shift, etcetera (see chapter 3).

Another solution is offered by Lewis (2001), who posits two codes as general parser cues: the START code and the END code (the current code). Whereas the former works with queue effects, the latter works with stack effects. The best matching item will be retrieved on the basis of these codes, and there is no need for the parser to know any other positions [142]. This hypothesis has not yet been worked out fully, however, and we will now turn to other possibilities.

The question can be approached from two sides: why is it useful (or more economical, parsing-wise) to get to V1 first, or why is it useful to get rid of (the dependency of) NP1 first? These are of course the same question, but the two ways of looking at it are reflected in the answers one might give:

(i) As was suggested in Bach et al. (1986), among others, it might be more useful to get V1 first because that way the main structure can be built and integrated immediately and incrementally, whereas in German there is no way to tell where the substructure NP3-V3 (the first substructure built in structures with three-verb clusters) belongs in the main structure until the end of the sentence. Substructures, being big complexes, place a heavy load on cognitive resources, and should be avoided. I would like to call this the ‘immediate integration hypothesis’. Note that this hypothesis crucially assumes that we build syntactic trees top-down rather than bottom-up, and that it does not take into account the number of derivations that have taken place in a structure.

(ii) If we suppose a theta-driven parser, the difference between crossed and nested dependencies might arise from the time that the first NP has to wait for its theta-role: whereas in German the first NP has to wait until the last verb in the sentence (making this a particularly long theta-dependency), in Dutch every NP is satisfied as soon as possible (given that each NP has to get a theta-role, and that the NPs all occur together initially). I would like to call this the ‘immediate theta-role hypothesis’. Note that this hypothesis crucially assumes that linear order is involved in theta-role assignment, and that a locally short distance of theta-role assignment is thus overridden by ‘overall shortest theta-dependencies’. Immediate theta-assignment is thus counted from the argument that requires it, instead of from the verb, as is usually assumed. This correctly predicts that the dependency of the first NP must be resolved first, and so on.
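The waiting time referred to in (ii) can be made explicit with a toy count. The sketch below assumes schematic three-verb strings in which NPn receives its theta-role from Vn, and it measures waiting time simply as the number of word positions between an NP and its verb. The function name `theta_wait` and this distance measure are illustrative assumptions, not part of any of the models cited.

```python
def theta_wait(word_order):
    """Map each NP to the number of positions it waits for its verb.

    Assumption: the NP labelled 'NPn' receives its theta-role from the
    verb labelled 'Vn'; distance is counted in word positions.
    """
    position = {word: index for index, word in enumerate(word_order)}
    waits = {}
    for word in word_order:
        if word.startswith("NP"):
            verb = "V" + word[2:]
            waits[word] = position[verb] - position[word]
    return waits


german_nested = ["NP1", "NP2", "NP3", "V3", "V2", "V1"]
dutch_crossed = ["NP1", "NP2", "NP3", "V1", "V2", "V3"]

print(theta_wait(german_nested))  # {'NP1': 5, 'NP2': 3, 'NP3': 1}
print(theta_wait(dutch_crossed))  # {'NP1': 3, 'NP2': 3, 'NP3': 3}
```

In this toy count, the first NP waits five positions in the nested order but only three in the crossed order, which is the asymmetry that hypothesis (ii) appeals to.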

Condition (ii) would, however, be violated in German, and in centre-embedded structures in other languages. Possibly, this violation could be neutralized if we assume that the condition only applies clause-internally, and that crossed structures only occur clause-internally (as was the case with Agree, see chapter 3), while nested dependencies typically occur between clauses. A consequence is that we must analyse German verb clusters as containing several clauses (in contrast to Dutch verb clusters, which crucially contain one clause), even though effects of clause union have been put forward for German too (Haider, 2003; Bader and Schmid, 2006). However, the fact that German nested structures become very difficult from three embeddings onwards does closely mirror the findings for the processing of centre-embeddings in other languages, such as English (Gibson and Thomas, 1995). In fact, it has been shown time and again that two elements, and maybe three, is what the parser can process (or keep in mind) in interference contexts (Lewis, 1996, and references therein). Take for example the following pair of sentences, where sentence (35a) is fine, but (35b) is nearly incomprehensible.

(35) a. The salmon₁ that the man₂ smoked₂ fell₁.

b. The salmon₁ that the man₂ that the dog₃ chased₃ smoked₂ fell₁.

[142] Also, it might provide some insights as to why participles in large verb clusters are preferred at either a verb

Thus, we see that this divergence in comprehensibility mirrors the finding of Bach et al. (1986) that a sharp rise in processing difficulty is encountered between two and three levels of embedding. This, together with the matching surface structures, suggests that these are very similar structures.
