Chapter 3 ‘Breaking new ground’: formulaic language as a marker of authorship
3.5 Can formulaic sequences be identified in ways sufficiently robust for forensic purposes?
3.5.1 Intuition and shared knowledge
Wray (2002) says that as members of their own speech communities, researchers “often are the self-appointed arbiters of what is idiomatic or formulaic in their data” (p. 20). This is based on the belief that native speakers recognize formulaic language as having special status (Van Lancker-Sidtis & Rallon, 2004: 208). Therefore, intuition can be used as a basis for identifying formulaic sequences. This approach is clearly subjective, and for the technique to be more reliable, at least a second rater should be used (Read & Nation, 2004: 29). Better still, panels of independent judges rather than individuals or pairs can be used to reach consensus about whether a string of words is indeed formulaic (e.g. P. Foster, 2001), for as Wray (2002) comments, “there should be a certain resilience in a consensus achieved in this way ... there can be a wide variation in the overall number of sequences spotted by different judges” (p. 22).
Despite a lack of objectivity, using intuition to identify formulaic sequences is well suited to a researcher who wishes to adopt an exploratory approach. Formulaic sequences are not always fixed and do not always have firm borders, so it sometimes requires a judgement call to decide whether something is formulaic or not: “[T]he problem with formulaic language is that between the extremes of what is definitely formulaic and what is definitely not formulaic, there is a sizeable amount of material that may or may not be” (Wray, 2008: 93, original emphasis). This type of discretionary judgement about the ‘grey areas’ of formulaic sequences can only be exercised by human researchers, not by automated methods.
Tied to the use of intuition is the concept of shared knowledge. If members of the same speech community all share the same knowledge about particular formulaic sequences, then it should be possible to detect which strings of words are formulaic for that community. The method is for one person to start a formulaic sequence and then for other members of the speech community to complete it. How reliably the sequence is completed by others provides insight into how formulaic the sequence is for that particular speech community. However, the technique is not appropriate for formulaic sequences that allow variation (Wray, 2002: 24–5), since not all members of the same speech community could be expected to complete variable formulaic sequences in the same way.
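The completion method can be made concrete with a small sketch. The Python below is illustrative only: the function name `completion_agreement`, the scoring measure (the share of respondents who give the most common completion) and the example responses are all assumptions introduced here, not part of the procedure described by Wray (2002).

```python
from collections import Counter


def completion_agreement(responses):
    """Return (most common completion, share of respondents giving it).

    Hypothetical helper: `responses` is a list of strings supplied by
    members of one speech community for the same sequence opening.
    """
    if not responses:
        raise ValueError("at least one response is required")
    # Normalise case and surrounding whitespace so that trivial
    # differences are not counted as different completions.
    normalised = [r.strip().lower() for r in responses]
    completion, count = Counter(normalised).most_common(1)[0]
    return completion, count / len(normalised)


# Invented responses to the opening "once upon a ..."
responses = ["time", "Time", "time ", "time", "midnight dreary"]
completion, share = completion_agreement(responses)
print(completion, share)
```

A share close to 1 would suggest that the sequence is fixed for that community. A sequence that allows variation would score low under this measure even if it is formulaic, which is the limitation noted above.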
Reliance on intuition is a commonly used approach for the identification of formulaic sequences (Wray, 2002: 20), although researchers acknowledge that it is at the same time the least scientific: “The status of the intuition of an individual investigator is dubious from a modern ‘scientific’ perspective” (Read & Nation, 2004: 29). This immediately causes problems for any method that might be tested against the Daubert criteria (Section 2.3, p. 39). Intuition is not scientific because it lacks reliability: what one researcher judges to be formulaic may not be so for another, so there is a danger of significant variation between judges. To complicate the issue further, what is formulaic for one researcher on one occasion may not be so for the same researcher on a different occasion, for reasons such as tiredness and unintentional changes in how judgements are made (Wray, 2002: 23). Therefore, identifying formulaic sequences using intuition alone cannot offer any reliability.
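The variation between judges described above can be quantified. One standard option, which the sources cited here do not themselves use, is Cohen's kappa, a chance-corrected measure of agreement between two raters. The sketch below assumes binary judgements (formulaic or not) on the same list of candidate word strings; the function name and the ratings are invented for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters making binary judgements.

    Each argument is a list of booleans (True = judged formulaic),
    one entry per candidate word string, in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's rate of "formulaic".
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both raters used a single category throughout
    return (observed - expected) / (1 - expected)


# Invented judgements on eight candidate strings.
rater_a = [True, True, False, False, True, False, True, False]
rater_b = [True, False, False, False, True, True, True, False]
kappa = cohens_kappa(rater_a, rater_b)
print(kappa)
```

In this invented example the raters agree on six of eight items, yet kappa is only 0.5 because half of that agreement would be expected by chance. The same function could compare one researcher's judgements on two occasions.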
Claiming validity can be less problematic. If a string of words is intuitively recognised as formulaic, then it has every potential to be stored and processed holistically, particularly if a group of judges can reach consensus. However, intuitions about language are not always correct, and in an era of corpus analysis linguists are often sceptical of intuitive judgements, as Sinclair noted over 20 years ago:
[T]he contrast exposed between the impressions of language detail noted by people, and the evidence compiled objectively from texts is huge and systematic. It leads one to suppose that human intuition about language is highly specific, and not at all a good guide to what actually happens when the same people actually use the language (Sinclair, 1991: 4).
In addition, to make intuitive judgements that are valid, researchers identifying formulaic sequences need to have the same shared knowledge as the people who produced them: “Clearly, any string that is formulaic for, say, the speaker, but not for the hearers, will simply not be understood unless it is transparent” (Wray, 2002: 24).
Intuitive analysis of texts is often restricted to small datasets, given that each text has to be read carefully and more than once, which can make it a slow and laborious process. It is therefore not feasible to use intuition to identify formulaic sequences in larger texts, or indeed in shorter texts if there are many of them, as is often the case in forensic investigations. The time pressures involved in producing forensic authorship evidence (Shuy, 2006) may therefore preclude this from being a feasible identification technique. Whilst using a panel of judges may increase reliability, the majority of forensic linguists work in isolation and may not have access to similarly trained linguists who could assist. Furthermore, many forensic materials are confidential, so the linguist would be unlikely to get permission for a panel of judges to view the texts. It may be possible for linguists to extract word sequences that they believe to be formulaic and to present them to a panel out of context, but success would rely crucially on the linguist having identified the ‘right’ sequences of words in the first place. As such, using intuition as a technique for identifying formulaic sequences in forensic texts is not feasible.