Chapter 8 ‘Come to think of it’: a consideration of the issues
8.4 Are formulaic sequences a forensically robust marker of authorship?
Having established that formulaic sequences are a potential marker of authorship, it is necessary to assess how this tool may be used and what its future may hold in light of its forensic value. Table 8.3 summarises the three methods outlined in Chapters 5–7 and indicates whether each is valid, reliable and feasible for forensic application, drawing together the discussion at the end of each of those chapters.
Table 8-3 Summary of methods in terms of validity, reliability and forensic feasibility
Method Valid Reliable Feasible
Formulaic clusters X X
Core word X X /
Formulaic sequences reference list
Of the three methods presented in this research, only one, the formulaic sequences reference list, meets the criteria of being both valid and reliable as an analytical tool, and it also holds the most potential to meet evidential standards. The core word method is valid but not reliable: half of the approach is forensically robust and the other half is less so. Based on this research, then, the formulaic sequences reference list appears to be the most suitable method to adopt for analysing texts for authorship. However, it must again be noted that this investigation has focused only on short texts, and it is quite possible that with a greater number of texts and/or longer texts, the other two methods may become more suitable. It stands to reason that with more data, more formulaic clusters are likely to be identified. Likewise, more data may yield more way-phrases, along with a wider variety of meanings and therefore of alternatives. In such a situation, stronger patterns may be established than was possible here. However, both of these approaches will still be limited in terms of their reliability and forensic feasibility.
The most obvious point to make is that using formulaic sequences as a marker of authorship is not yet developed enough to hold any evidential value, taking into account the discussion of the Daubert criteria in Chapter 2. More testing is required in order to establish known rates of error and the exact limitations of the method; peer review and acceptance or rejection by the community can then follow. Assuming these stages are followed, will there be any evidential value to using formulaic sequences as a marker of authorship? This is more questionable, since it does not seem possible to predict which formulaic sequences authors are likely to use, with the possible exception of Rose, whose use of ‘in a way’ could be predicted to recur in additional texts. However, the fact that, when Rose's texts are compared with those of other authors, ‘in a way’ emerges as far more distinctive of Rose may carry evidential value. Moreover, formulaic sequences would never be used as evidence alone; they would always be combined with other markers of authorship. Therefore, as part of the forensic linguist's basket of authorship attribution tools, formulaic sequences may indeed turn out to have significant evidential value.
It should also be remembered that the most successful of the methods was based on the number of words forming part of formulaic sequences, normalised against the total number of words, and this appears to be a far more robust marker of authorship than the other measures, with some authors being more formulaic than others. This method may be especially persuasive for jurors, who are likely to recognise that some people use more clichés, for example, than others. Furthermore, the approaches outlined in this research are lexically based, and since other authorship attribution evidence based on lexis, and notably on strings of words, has been admitted in UK and USA court cases (Coulthard, 2004; Fitzgerald, 2004; McMenamin, 2002), there is no reason to assume that this tool would automatically be inadmissible. In reality, after further testing, admissibility will come down to the judgement of the courts.
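The normalised count can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this research: the reference list is hypothetical, the tokeniser is a simple regular expression, each word is counted at most once even if it falls inside overlapping sequences, and the result is expressed per 100 words.

```python
import re


def tokenise(text: str) -> list[str]:
    """Lower-case word tokens via a simple regex (an assumed, not validated, tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def formulaic_density(text: str, reference_list: list[str], per: int = 100) -> float:
    """Words covered by any sequence in the reference list, normalised per `per` words."""
    tokens = tokenise(text)
    if not tokens:
        return 0.0
    covered = [False] * len(tokens)  # each token is counted at most once
    for phrase in reference_list:
        seq = tokenise(phrase)
        n = len(seq)
        if n == 0:
            continue
        # Slide a window of length n over the text and mark every exact match.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == seq:
                for j in range(i, i + n):
                    covered[j] = True
    return per * sum(covered) / len(tokens)


# Hypothetical reference list, for illustration only.
REFERENCE_LIST = ["in a way", "come to think of it", "at the end of the day"]

if __name__ == "__main__":
    sample = "Come to think of it, in a way it was fine."
    print(round(formulaic_density(sample, REFERENCE_LIST), 2))  # prints 72.73
```

In the sample sentence, 8 of the 11 tokens fall inside a listed sequence, giving roughly 72.7 formulaic words per 100. Comparing such figures across authors' texts is what allows more and less formulaic writers to be told apart.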
What, then, of investigative value? Although Questioned Documents could not always be attributed to their authors, there was a measure of success which may be useful at the investigative stage. Furthermore, it was possible to narrow down a larger pool of candidate authors using formulaic clusters, which would certainly be helpful in an investigation with multiple suspects. It should also be borne in mind that these results were achieved with limited data, so an investigation fortunate enough to yield more and longer texts would likely benefit from incorporating these analyses alongside other markers of authorship. Through such an approach, the accumulation of smaller similarities between texts may produce stronger evidence of authorship, as in Bayesian analysis (e.g. Mosteller & Wallace, 2007).
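The idea that many small similarities can add up to stronger evidence is often expressed with likelihood ratios: under the simplifying assumption that the individual markers are independent, the prior odds are multiplied by each marker's likelihood ratio to give the posterior odds. The sketch below works in log space for numerical stability. The likelihood ratios are hypothetical, independence rarely holds exactly for linguistic features, and this is not a reconstruction of Mosteller and Wallace's model.

```python
import math


def combine_likelihood_ratios(lrs: list[float], prior_odds: float = 1.0) -> float:
    """Posterior odds = prior odds x product of LRs, assuming independent markers."""
    if prior_odds <= 0 or any(lr <= 0 for lr in lrs):
        raise ValueError("odds and likelihood ratios must be positive")
    # Summing logs avoids overflow/underflow when many markers are combined.
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in lrs)
    return math.exp(log_odds)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour (e.g. 9 means 9 to 1) to a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical LRs for three weak markers (e.g. formulaic density,
    # a shared formulaic cluster, and a non-formulaic feature).
    markers = [2.0, 3.0, 1.5]
    posterior = combine_likelihood_ratios(markers)
    print(round(posterior, 3), round(odds_to_probability(posterior), 3))
```

None of the three markers is strong on its own, but together they multiply to odds of 9 to 1 from an even starting point. In casework the prior odds are a matter for the trier of fact, so the prior of 1 here is purely illustrative.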
A key point about the approaches outlined in this research is that some people are more similar to one another than others are, twins being one example (e.g. Künzel, 2010). Likewise, some idiolects can be expected to be harder to differentiate than others, as was found when using the count of formulaic sequences as a marker of authorship: some authors could be differentiated whilst others could not. Notably, there appears to be a threshold of difference below which the method no longer works: the most formulaic and the least formulaic authors can be successfully differentiated, but authors who lie closer together in their use of formulaic sequences cannot. In real terms, this means that this method, like any other marker of authorship that relies on pairwise distinctions, cannot be guaranteed to work in every case. However, no marker of authorship provides 100% success, and the forensic linguist will always have to appraise the available data to establish which markers are the most useful to apply.
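The threshold effect can be illustrated with a deliberately crude criterion: treat two authors as distinguishable only when the ranges of their per-text formulaic densities do not overlap. The criterion, the author labels and the densities below are hypothetical and chosen for illustration, not results from this research; a real analysis would need a validated test with known error rates.

```python
from itertools import combinations


def ranges_overlap(a: list[float], b: list[float]) -> bool:
    """True if the closed intervals [min(a), max(a)] and [min(b), max(b)] intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


def distinguishable_pairs(densities: dict[str, list[float]]) -> list[tuple[str, str]]:
    """Author pairs whose per-text density ranges do not overlap."""
    return [
        (x, y)
        for x, y in combinations(sorted(densities), 2)
        if not ranges_overlap(densities[x], densities[y])
    ]


if __name__ == "__main__":
    # Hypothetical formulaic-sequence words per 100 words, three texts per author.
    densities = {
        "Author A": [4.0, 5.0, 4.5],   # least formulaic
        "Author B": [9.0, 10.0, 9.5],  # most formulaic
        "Author C": [4.8, 5.5, 5.2],   # close to Author A
    }
    for pair in distinguishable_pairs(densities):
        print(pair)
```

Here Author B, at the formulaic extreme, is separated from both other authors, but Authors A and C, whose densities lie close together, are not: below a certain difference the pairwise comparison simply fails.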
In addition to the forensic resilience of formulaic sequences, comments regarding their general nature can now be made.