This section began by demonstrating the variety of contexts in which do-support may appear in PDE. These contexts of occurrence were then seen not to be the result of a single grammatical parameter (in the sense of a minimal point of crosslinguistic difference), but rather to simultaneously instantiate many discrete environments which are shared, in a piecemeal way, with a variety of other typologically distinct languages. In light of this evidence, it makes sense to disaggregate the various contexts for a diachronic study. Platzack (2008) provides a structural argument for considering VP topicalization, ellipsis, and focus cases separately from negatives and subject-auxiliary inversion. Traditionally, quantitative analysis of English do-support has taken the same tack (Ellegård 1953; Kroch 1989). For these reasons (and also because assembling a sufficient corpus of examples of rare VP topicalization constructions is difficult), I will not address the diachrony of VP topicalization or ellipsis in this dissertation, focusing instead in the first instance on negative imperatives and questions (both affirmative and negative; taken to be exemplars of the subject-verb inversion environment). After disposing of that question, I then went on to consider the usage of do (and its cognates or synonyms in other languages) in non-do-support contexts. This provides two crucial insights for a diachronic analysis. The first is that do in a variety of languages can serve as a marker of argument structure, specifically of transitivity, agentivity, and/or causativity. Second, it is seen that a sort of do different from that found in PDE do-support recurrently impinges on a variety of West Germanic dialects. Both these observations play a key role in motivating my decision to undertake an analysis of the do which appears in EME affirmative declarative sentences, seeing it as a crucial prerequisite to understanding the evolution of do-support as a whole. This theme is taken up in section 4.3 below.
Negative imperatives too show some synchronic oddities with respect to their relationship to the core system of do-support – such as the failure of full complementarity with some auxiliaries noted above. They are included in the investigation as another diachronic development potentially distinct from the other environments (and also because they have been traditionally analyzed alongside other types of do-support). Having established these synchronic bases, I now turn to a discussion of the statistical methods employed in this dissertation.
Chapter 3
Statistical methods
The goal of this chapter is to give the reader an orientation to the statistical and computational issues which underlie the work in this dissertation.
There is an indubitable connection between statistical procedures and corpus methods as applied to questions of historical syntax (and indeed in general). Having extracted a dataset from a corpus (often, counts of construction (non)occurrences in a variety of different linguistic contexts), a researcher then desires a measurement of how meaningful these results are, and what, if any, conclusions may be drawn about the linguistic faculties of the speakers who generated the data. Statistical methodologies are a natural fit for this mode of inquiry. In this chapter I will describe the prevalent modes of statistical inquiry in historical syntactic research and their implementation in the present dissertation. Specifically, in section 3.1 I will discuss the Constant Rate Hypothesis (CRH), the dominant mathematical model of the spread of syntactic changes through the population. In section 3.2 I will delve into further detail on logistic regression, the statistical procedure which underpins the CRH model. Section 3.3 addresses additional complications which arise when computing regression models, and describes the strategies this dissertation employs for addressing them. Finally, section 3.4 discusses a crucial issue in the interpretation of the statistical results in CRH analyses: the question of choosing a best model.
3.1 The Constant Rate Hypothesis
Kroch (1989) formulated the Constant Rate Hypothesis (CRH; sometimes known as the Constant Rate Effect or CRE):
(76) Constant Rate Hypothesis: changes spread at the same rate in all contexts.
This hypothesis is important because “on the basis of [it] substantial progress can be made in understanding the relationship between the structural patterns uncovered by grammatical analysis and the frequency patterns revealed by sociolinguistic methods.” That is, it is a useful tool for the study of language variation and change from a generative standpoint, since it links the domains of frequency and grammar, allowing corpus data to inform theoretical proposals and vice versa.
Figure 3.1: An illustration of two possible parameterizations of parallel lines. The left-hand system considers the lines as separate objects, assigning to each a slope and an intercept (for a total of 4 parameters). The right-hand system uses only three parameters: a slope and an intercept to describe one line, and an offset to measure the distance between the lines.
What motivates the posing of this hypothesis? It is fundamentally a parsimony argument. As illustrated in Figure 3.1, there are two possible mathematical descriptions of a system of two parallel lines. If we consider the two lines to be independent of each other, we must describe each fully, specifying its slope and intercept. On the other hand, if the two lines are taken to be part of a single system, it is necessary to specify the slope only once; the family of lines is then fully described by giving one intercept and the distance between the two lines. If we concretize by taking these lines to represent time courses of a change in two contexts, the first analysis is tantamount to proposing that there are two processes of change – one per context. These
two processes proceed at the same rate merely accidentally; since both slopes are specified in the model, there is no impediment to the lines being (or becoming) non-parallel. The second analysis amounts to a claim that there is only one thing (an abstract grammatical parameter) that is changing, corresponding to the single slope parameter. The intercept parameters are a measurement of context-specific effects which favor or disfavor the manifestation of the parameter change. The CRH counsels us to accept the latter rather than the former hypothesis because it uses fewer parameters to explain the phenomena.
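The two parameterizations contrasted above can be sketched numerically. The following is a minimal illustration, not the procedure used in this dissertation: it fits both descriptions by ordinary least squares to two exactly parallel lines, and all numerical values and variable names are invented for the purpose.

```python
import numpy as np

# Hypothetical time points and two exactly parallel lines on the
# transformed scale; every value here is invented for illustration.
t = np.arange(0.0, 10.0)
true_slope, intercept_a, offset = 0.5, -3.0, 1.2
y_a = intercept_a + true_slope * t
y_b = intercept_a + offset + true_slope * t

y = np.concatenate([y_a, y_b])
time = np.concatenate([t, t])
is_b = np.concatenate([np.zeros_like(t), np.ones_like(t)])  # context indicator

# Independent-lines description: an intercept and a slope per context (4 parameters).
X_separate = np.column_stack([1 - is_b, (1 - is_b) * time, is_b, is_b * time])
# Single-system description: one intercept, one slope, one offset (3 parameters).
X_shared = np.column_stack([np.ones_like(time), time, is_b])

beta_separate, *_ = np.linalg.lstsq(X_separate, y, rcond=None)
beta_shared, *_ = np.linalg.lstsq(X_shared, y, rcond=None)

print("separate (intercept_a, slope_a, intercept_b, slope_b):", beta_separate)
print("shared   (intercept, slope, offset):", beta_shared)
```

When the lines really are parallel, both descriptions fit the data equally well, and the three-parameter version simply states the common slope once.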
Of course, the parsimony gain when there are just two lines is minimal. More convincing cases are adduced by Kroch (1989) – the French V2 case compares 4 lines, and the English do-support case has 6. However, not all of these comparisons are of equal importance. In each case, there is a large family of very surface-similar contexts (V2 with different subject types, or do-support with different clause types), which is compared with another, more distinct surface pattern (left-dislocation and verb movement to the left of never). While it is not a trivial discovery that V2 and do-support each evolve in parallel in various contexts, there is not very much at stake. Any theory of grammar which recognizes the identity of these syntactic constructions will be able to capture their diachronic unity. On the other hand, the out-group comparisons provide evidence bearing on questions of true grammatical abstraction – the effect of a prosodic change on syntax in the case of French and the existence of an abstract verb-raising parameter in English. Thus, we ought to concentrate most of our attention on these comparisons, and take the in-group comparisons to be less important. One extreme method of implementing such a scheme would be to in fact assume that the in-group contexts evolve in parallel, and calculate a pooled slope estimate from them for comparison with the out-group slope. This may or may not be satisfactory, especially in the context of analyses where (non)-parallelism is not immediately visually apparent (because of noise, data sparsity, or an abstract analysis which derives a slope estimate indirectly from observed data).
A statistical analysis using the concept of shrinkage, where the in-group slopes can differ in the presence of especially compelling evidence but their differences are otherwise constrained to be zero, may provide a sensible middle ground (with the disadvantage that it is not straightforward to implement; though Bayesian approaches which involve specifying a detailed implementation of the model should be able to cope at the cost of increased conceptual and implementational complexity).
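As a rough illustration of the shrinkage idea, the sketch below swaps in a deliberately simple technique: a ridge penalty applied to per-context slope deviations in a least-squares fit on the transformed scale. It is not the Bayesian approach alluded to above. The function name, the penalty values, and the data are all hypothetical.

```python
import numpy as np

def fit_with_slope_shrinkage(time, y, context, penalty):
    """Least-squares fit with per-context intercepts, one common slope, and
    per-context slope deviations that are ridge-penalized toward zero.
    `penalty` must be positive, otherwise the system is singular."""
    contexts = np.unique(context)
    k = len(contexts)
    indicators = np.column_stack([(context == c).astype(float) for c in contexts])
    # Columns: k intercepts, 1 common slope, k slope deviations.
    X = np.column_stack([indicators, time, indicators * time[:, None]])
    # Penalize only the slope-deviation columns (the last k columns).
    D = np.zeros((X.shape[1], X.shape[1]))
    D[k + 1:, k + 1:] = np.eye(k)
    beta = np.linalg.solve(X.T @ X + penalty * D, X.T @ y)
    return beta[k], beta[k + 1:]  # common slope, per-context deviations

# Invented in-group data: three contexts whose slopes differ slightly.
t = np.arange(10.0) - 4.5  # centered time
slopes = [0.50, 0.52, 0.48]
time = np.tile(t, 3)
context = np.repeat([0, 1, 2], len(t))
y = np.concatenate([-1.0 * i + s * t for i, s in enumerate(slopes)])

weak_common, weak_dev = fit_with_slope_shrinkage(time, y, context, penalty=1e-3)
strong_common, strong_dev = fit_with_slope_shrinkage(time, y, context, penalty=1e6)
print("weak penalty:  ", weak_common, weak_dev)
print("strong penalty:", strong_common, strong_dev)
```

With a weak penalty the deviations track the per-context slopes. With a strong penalty they are pulled toward zero, so the fit approaches the pooled-slope model.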
Another factor which would help give the parsimony arguments more weight is the presence of more slope parameters which are collapsible across contexts, as would be the case if the diachronic trajectory is described not by a straight line, but rather by a higher-order polynomial or a spline function. Many syntactic changes unfold along an uninterrupted S-curve (which is equivalent to a straight line under the
transformations used in the statistical procedure underlying the CRH; see the following section for details). This is why the hypothesis bears the name Constant Rate Hypothesis, rather than Equal Rate Hypothesis (where the latter is a closer description of the hypothesis’s actual claim). In fact, this circumstance presents a significant challenge to CRH analyses. The finding of a different slope conclusively demonstrates that two contexts are not related by a single underlying change, but the finding of equality does not guarantee that the changes are generated by the same change.¹ In practice, the slope values for syntactic changes which have
been observed and described in the literature are constrained to a narrow range. On the short end, a change cannot take place more rapidly than one generation (and is likely to subsist in writing somewhat longer). Conversely, though some theories of change predict that very long-term syntactic changes are in principle possible, and putative evidence of such changes exists (Wallenberg 2013), it is difficult to distinguish such data from random drift (and indeed, fixation of former loci of variation by random drift over long time spans is predicted; see Kimura 1983).
In any case, the syntactic changes actually observed and described to date take place over roughly 100–300 years, a fact which necessarily restricts the observed slope values of these changes to those characteristic of S-curves with such lengths. The risk of two lines having statistically indistinguishable slopes by accident under these circumstances is thus heightened. The only antidote to this problem is to collect more data, a task which is addressed in chapter 5. Larger datasets will allow more precise measurements to be made of contextual slopes, and thus more precise comparisons to be made between them.²
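To make the restriction on slope values concrete, the following sketch converts the duration of a change into a slope on the logit scale. It rests on an assumption not made in the text: that the "length" of an S-curve is the time taken to move from 5% to 95% use of the incoming form. The function names and the midpoint year are likewise illustrative only.

```python
import math

def logistic(t, slope, midpoint):
    """S-curve: proportion of the incoming form at time t."""
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))

def logit(p):
    """Log-odds transformation, under which the S-curve becomes a straight line."""
    return math.log(p / (1.0 - p))

def slope_for_duration(years, low=0.05, high=0.95):
    """Slope (logit units per year) of a change taking `years` to move from
    proportion `low` to proportion `high` (assumed endpoints)."""
    return (logit(high) - logit(low)) / years

for years in (100, 200, 300):
    print(years, "years ->", round(slope_for_duration(years), 4), "logits per year")

# Equal time steps give equal steps on the logit scale, wherever on the curve they fall.
s = slope_for_duration(200)
steps = [logit(logistic(t + 50, s, 1500)) - logit(logistic(t, s, 1500))
         for t in (1400, 1500, 1600)]
print(steps)
```

Whatever endpoints are chosen, the slopes for 100-year and 300-year changes differ only by a factor of three, which illustrates how narrow the range of attested slope values is.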