Chapter 8 ‘Come to think of it’: a consideration of the issues

8.3 Is there a link between formulaic language and idiolect?

The next question that naturally arises is whether there is an inherent connection between formulaic sequences and idiolect, that is, whether evidence of idiolectal formulaicity exists. Each author produced their own narratives; therefore, all of the language that occurs is part of their idiolect. But this approach can be a little too simplistic (or at least less useful for forensic purposes), since the strength of the argument amounts to no more than noticing that the words strength, of, the and argument are part of my own idiolect, but shared with countless other authors: in the BNC alone, strength occurs 6,951 times, of occurs 3,887,705 times, the occurs 6,047,031 times and argument occurs 8,201 times. These words therefore hold very little diagnostic power of idiolect. However, the combination of these words does become interesting, with strength of the argument occurring only once in the BNC: what Coulthard (2004) would refer to as idiolectal co-selection. Of course, it is not clear whether strength of the argument is formulaic for me (it certainly does not occur anywhere else in these pages, nor is it knowable whether I will ever use this sequence of words again), but it is likely that analysing sequences of words may provide stronger evidence of idiolect than single words in isolation and may therefore be more appropriate for authorship attribution. Similarly, each of the authors in the corpus used the words in, a and way, so each of these single words was available as part of their idiolects, but only eight of the authors used in a way as a contiguous sequence. Furthermore, only one author, Rose, used this sequence in every text she produced, indeed more than once in each text. This phrase therefore must be characteristic of Rose’s idiolect in a way that it is not for other authors. Even though this phrase is not distinctive in comparison to other authors, its frequency of occurrence does appear to be idiosyncratic to Rose.
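The contrast drawn above, between single words that every author shares and a contiguous sequence that only some authors use consistently, can be illustrated with a minimal sketch. It is not the procedure used in this research: the function names, the simple tokeniser and the two-author toy corpus (author_a, author_b) are all hypothetical and serve only to show the kind of per-text count being described.

```python
def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(word)
    return tokens


def count_sequence(tokens, sequence):
    """Count contiguous occurrences of `sequence` (a list of words) in `tokens`."""
    n = len(sequence)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == sequence)


def sequence_profile(corpus, phrase):
    """Map each author to the list of per-text counts of `phrase`."""
    sequence = tokenize(phrase)
    return {
        author: [count_sequence(tokenize(text), sequence) for text in texts]
        for author, texts in corpus.items()
    }


# Hypothetical toy corpus: invented sentences, not data from the study.
corpus = {
    "author_a": [
        "In a way it was fine, and in a way it was not.",
        "She smiled in a way I knew. In a way, that helped.",
    ],
    "author_b": [
        "There was a way in.",
        "In a way I was glad.",
    ],
}

profile = sequence_profile(corpus, "in a way")

# Authors who use the sequence at least once in every one of their texts.
consistent = [author for author, counts in profile.items() if all(c > 0 for c in counts)]
```

In this toy example both authors have in, a and way available as single words, but only author_a uses the contiguous sequence in every text. A real analysis would need a principled tokeniser and far more data.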

The fact that two of the approaches adopted in this research were able to detect significant variation between authors shows that formulaic sequences also appear to be useful in illustrating idiolectal differences and characterising the styles of different authors. And what of the fact that other sequences of words, formulaic clusters, were identified based on recurrence across a series of texts? Again, the repetition and consistency must characterise an author’s idiolect in some way, and the fact that 26 formulaic clusters were identified for Rose is surely significant in relation to the fact that only one was identified for Michael. Finally, the count of words which make up formulaic sequences compared to the overall words used might also characterise idiolect in some way. Is it a characteristic of idiolect that Melanie uses the lowest amount of formulaic language whilst Thomas uses the most? Can it be said that Thomas is more formulaic than Melanie or any other author in the corpus? The fact that variation was demonstrated between authors and that on some occasions a Questioned Document could be attributed to its author would indicate that this is in fact the case. Formulaic sequences are a characteristic of idiolect, and, according to the evidence, are also a useful marker of authorship.

As discussed in Chapter 2, accepting that formulaic sequences are evidence of idiolectal variation rests on the assumption that such a notion actually exists and is not instead only the random or calculated combination of choices made by the language-user. In this research, a direct connection between formulaic sequence usage and authorship has been proposed since there is a strong relationship between formulaic sequences and reality, which gives a reason why this marker should vary between authors as opposed to more objective measures (e.g. the number of words starting with a vowel as used in the Cusum technique, for which no reasonable linguistic explanation exists for why inter-author variation should occur). That is, the way that people have been socialised, the contact with, and interest in, language that they have, and the priorities they face when producing language all have an impact on the individual stores of formulaic sequences contained in each author’s mental lexicon. To this end, there is good sociolinguistic and psycholinguistic theory to support the notion that idiolectal differences between authors exist and that authors use formulaic sequences differently from each other.

This research has made no direct attempt to find evidence of idiolect and was only ever searching for markers of authorship. It is important to keep these two endeavours distinct, since, in light of the discussion in Chapter 2, far more data from many more authors produced over far longer periods would be required before any strong claim of the existence of idiolect could be substantiated (in other words, a closer approximation to the “totality of speech habits” than can be achieved in five short narratives). However, it should be apparent that there is an underlying theory behind this marker of authorship (authors have a different store of formulaic sequences built up over a lifetime of differing experiences), so in Grant’s (2010) terms, this research is not idiolect-free authorship attribution. Therefore, although further testing along the lines suggested in Section 9.1 (p. 186) will be required before any attempt to demonstrate the existence of idiolect, the following comments on idiolect in general may be offered.

This research contributes to the theory of idiolect in as much as it demonstrates that formulaic language is a component. But the crucial point is that it only provides a very small snapshot of idiolect, namely, of each individual author’s idiolect as it relates to writing short personal narratives over a five-day period. This would differ, one assumes, from the features of their idiolect that would manifest whilst, for example, discussing an important matter with colleagues in a meeting over the course of one hour. The point is that whilst any form of language an author produces can legitimately be argued to be an aspect of their idiolect (as it has been defined in this research), there are, as yet, far too many unknown variables affecting which aspects of an idiolect are invoked on one occasion compared to another. In light of this, it seems more appropriate, specifically for forensic purposes, to talk about the stability of linguistic features (Barlow, 2010). Barlow (2010) explains that “the language of an individual changes depending on the interlocutor and the general context” (p. 2), a point which has been repeatedly made throughout this research. In his investigation of the language of White House Press Secretaries, Barlow argued that some features of idiolect, such as bigrams, could be shown to be stable over a one-year period since intra-author variation across different speech samples was low, whilst inter-author variation was higher.
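The comparison of intra-author and inter-author variation can be made concrete with a small sketch. This is not Barlow’s (2010) method, whose details are not given here; it is one simple way of operationalising the idea, under stated assumptions: the relative frequency of a single bigram is computed per sample, intra-author variation is taken as the mean of each speaker’s standard deviation across their samples, and inter-author variation as the standard deviation of the speakers’ mean rates. The function names, the speakers (speaker_x, speaker_y) and the sample sentences are hypothetical.

```python
from statistics import mean, pstdev


def bigram_rate(text, bigram):
    """Relative frequency of `bigram` (a pair of words) among all bigrams in `text`."""
    tokens = text.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return pairs.count(bigram) / len(pairs)


def variation(samples_by_author, bigram):
    """Return (intra, inter) variation in the rate of `bigram`.

    intra: mean over authors of the standard deviation across that author's samples.
    inter: standard deviation of the authors' mean rates.
    Both definitions are assumptions made for this illustration.
    """
    rates = {
        author: [bigram_rate(sample, bigram) for sample in samples]
        for author, samples in samples_by_author.items()
    }
    intra = mean([pstdev(r) for r in rates.values()])
    inter = pstdev([mean(r) for r in rates.values()])
    return intra, inter


# Hypothetical toy samples: invented, unpunctuated text, not data from any study.
samples_by_author = {
    "speaker_x": ["i think we will see", "i think that i think so"],
    "speaker_y": ["we will see about that", "that is all for today"],
}

intra, inter = variation(samples_by_author, ("i", "think"))

# A feature is a candidate for stability when variation within each author
# is low relative to variation between authors.
looks_stable = intra < inter
```

With so few samples the numbers mean nothing in themselves; the sketch only shows the shape of the comparison, in which a feature counts as stable when each author’s own samples resemble one another more than they resemble other authors’ samples.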

Rather than looking for features of idiolect, then, the forensic linguist may benefit from searching for features of idiolectal stability: those features which seem to characterise an individual’s language use regardless of content and despite text length, genre, composition date and medium. These would be the Holy Grail of authorship markers. Such features will likely be rare (if they exist at all) and it is argued here that deeper-level features which lie beyond the individual’s conscious language decisions will be the most fruitful avenues for investigation. Of course, formulaic language is one such possibility and so whilst it has been argued that formulaic sequences are part of these twenty authors’ idiolects, and that this characteristic appears to enable differentiation between some authors, what we cannot know at this stage is whether this feature is evidence of idiolectal stability.

Can we therefore adjust the existing theories of idiolect (as described in Chapter 2) on the basis of the empirical work carried out here? It would be tempting to say yes given that formulaic language, particularly the count of formulaic sequences, appears to be a characteristic of language which remains stable over a five-day period and which differs enough for a Questioned Document to be attributed to its author. However, my more cautious side prevents such a conclusion since this research project was not designed as an investigation into idiolect and some of the criticisms I have levelled against other researchers in Chapter 2 could, I suggest, to differing extents apply to my own. To reach such a conclusion would be to judge my research in a different light to others’. It is therefore more cautious to conclude that through this research an aspect of language use has been identified that appears to differ between individuals. However, the scope for generalisation is very small since the analysis has focussed on one very small aspect of language use (namely, the production of short written narratives over a five-day period). For now, I would prefer to conclude that formulaic language appears to be a consistent and distinctive authorship marker (Grant, 2010) which warrants further investigation to determine its limits. How that feeds into the idiolect debate, if at all, will be for other researchers to assess.

From: To cut a long story short: an analysis of formulaic sequences in short written narratives and their potential as markers of authorship (pp. 173-176)