Suitability of the data for forensic linguistics research: limitations

Chapter 8 ‘Come to think of it’: a consideration of the issues

8.6 Suitability of the data for forensic linguistics research: limitations

In Section 7.5.2 (p. 164), the possibility that the data collection method may have primed or otherwise influenced the narratives produced by the authors was raised. It was demonstrated for that approach that there was no relationship between any formulaic sequences in the data-eliciting questions and the narratives. Nonetheless, it is now necessary to focus more closely on the data in order to assess the impact of this research and, consequently, the level of generalizability that can be afforded to the results. The suitability of the data for this research can therefore be organised under three key questions:

1. Are narratives appropriate data for forensic linguistics research?

2. What are the limitations of the data?

3. How generalizable are the results derived from the data?

To engage with the first question: if narratives are deemed inappropriate data on which to investigate the potential of formulaic sequences as a marker of authorship, the generalizability of the results will be severely limited. In Section 4.4.1, the argument was made that narratives were appropriate data since the participants needed a structured writing task which would be familiar to them. However, this is not the only consideration; other issues that must be borne in mind include the nature of narratives, their relationship with occurrences of formulaic sequences and, ultimately, their appropriateness for the forensic context.

There is certainly some agreement that narratives are ‘special’: that they are not necessarily spontaneous and they are inherently linked to the identity of their authors. Toolan (2001) argues that one of the characteristics of a narrative is the “degree of artificial fabrication or constructedness not usually apparent in spontaneous conversation” because narratives are planned, revised and refined (p. 4). Furthermore, narratives contain a “degree of prefabrication”:

In other words, narratives often seem to have bits we have seen or heard, or think we have seen or heard, before (recurrent chunks far larger than the recurrent chunks we call words). One Mills and Boon heroine or hero seems much like another—and some degree of typicality seems to apply to heroes and heroines in more elevated fictions too, such as nineteenth-century British novels (p. 4).

Of course, in the latter half of the quotation Toolan appears to be describing formulaic genres (Kuiper, 2009) but the reference to ‘recurrent chunks’ no doubt can be subsumed under the definition of the formulaic sequence. It may be argued that narratives inherently attract or demand a higher count of formulaic sequences than other types of text such as a letter of complaint or an application form for employment might. Furthermore, narratives may be more closely connected to identity than other texts:

Because narratives are, relative to ordinary turns of talk, long texts and personalized or evaluated texts, there is a way in which, while your conversational remarks reflect who you are (your identity and values), in the course of any narrative the narrator’s text describes that narrator. In brief snatches of conversation, a person may be able, through accent-mimicry for example, to ‘pass’ for someone of a different class or gender or ethnic identity; but to take on another’s identity in a sustained fashion, across a number of personal narratives, is ordinarily very difficult, and may even imply disabling confusion or a personality disorder (Toolan, 2001: 3)

This is in keeping with Johnstone (1996), as discussed in Section 2.1.5, who argued that linguistic differences between people are especially evident in narratives and that “it is precisely in narrative that people’s individuality is expressed most obviously, because the purpose of narrating is precisely the creation of an autonomous, unique self in discourse” (1996: 56). This creates something of a duality: narratives may well contain more prefabrication and increased opportunity for the use of formulaic sequences than other types of texts, but at the same time, since narratives are so linked to identity, the ground may well be fertile for individual style to manifest. Also, the fact that narratives may invite a higher occurrence of formulaic sequences is not sufficient grounds to dismiss the data. Ong (1982) argues:

Human memory and language grow out of the unconscious into consciousness. Writing and print and electronic devices are produced by conscious planning—though of course their use, like all human activities, involves the unconscious as well as consciousness (p. 22).

Therefore, whilst it may well be true that the narratives analysed in this research were produced with conscious planning, there will also have been elements of unconscious planning. Since formulaic sequences are argued to be produced at a lower, less conscious level, their occurrences may still be linked more closely to authorship than determined by genre, which would indicate that narratives are no less suitable data than any other potential text. Clearly, finding an answer to this puzzle goes far beyond the scope of this research.

What this research cannot establish is whether the 20 authors investigated would use the same formulaic sequences or the same count of formulaic sequences in any other texts that they authored, and to this effect, there is a valid argument that the data are special. However, since this research is only in the initial stages of testing formulaic sequences as a marker of authorship, it is argued here that selecting narratives as data was reasonable and even though this particular decision limits the results, those results may still be useful for the forensic context:

Some texts, through their content, are clearly of interest to police investigators and the wider judicial process. These texts might include, for example, threatening or abusive letters, ransom notes or sexually explicit internet conversations between middle-aged men and under-aged girls. Many texts, however, which are analysed as part of forensic casework, are not inherently criminal; they may be more mundane including for instance, personal letters and diaries. Such texts may provide an alibi or their content may assist an investigation in a less direct way (Grant, 2008: 216).

In this way, the selection of narratives as data for this study is just as valid (and equally, as limited) as any other type of text that could have been selected and clearly, the next stage in the research process will be to explore the relationship between formulaic sequences and individual language use on a wider variety of genres. Having established that narratives are as valid as any type of data for an investigation of this kind, it is next necessary to determine whether other choices made as part of the research design might have limited the generalizability of the data.

The first issue to consider is the length of the texts and, in loose terms, whether the data should have been longer or shorter than the 500 words decided upon. The decision to use shorter texts (and the arbitrariness of this term was discussed in Section 4.4.1) was motivated by the desire to make the results relevant to the forensic context, where texts are typically short. Just as Kredens (2001) argued that demonstrating differences in idiolect between two very similar speakers should make it more likely that it would be possible to find differences between dissimilar speakers, so too it can be claimed of this research that finding evidence of author-specific uses of formulaic sequences in shorter texts should increase the likelihood of finding differences between authors when longer texts are analysed. Certainly, this logic would be in keeping with the notion that more data make conclusions more solid, or as D. Foster (2001) claims: “[g]ive anonymous offenders enough verbal rope and column inches and they will hang themselves for you, every time” (p. 12). Since two of the three approaches outlined in this research met with relative success, it is likely that using longer texts would generate stronger results. Likewise, since the results were limited in some cases, it may be wise to concede that the approaches will be unlikely to work on even shorter texts (i.e. fewer than 500 words).

A further potential limitation of the data is the context in which they were produced. Whilst the point was repeatedly reinforced to participants that they alone should author the texts, beyond that stipulation there was no control over how the texts were produced. It is not known, for instance, whether each author dedicated a reasonable amount of time to the composition of each text or whether they rushed the task. It is not known whether there were any distractions (such as watching television or talking on the telephone) whilst composing these texts. To this end, there may be an argument for further testing of the method using data produced under laboratory conditions. However, since the results from this research indicate a level of success for two of the methods, it is unlikely that there is a need to control the data so tightly. This is not an insignificant fact given that in a forensic investigation, no data would be available which had been produced under laboratory conditions (with the possible exception of encouraging a suspect to complete a writing task under police supervision). Therefore, to suggest that the research should be replicated on more strictly controlled language would seem to be a step backward in the testing process since it would carry less forensic applicability.

The fact that the data were collected over a five-day period may also be problematic. This was necessary to ensure that participants were not faced with an unmanageable task (e.g. producing 2,500 words in one day) and also to capture each author’s use of formulaic sequences over a period of time, rather than in the same sitting where tiredness could have increased (or conversely decreased) the use of formulaic sequences towards the end of the task. What the data have therefore captured is each author’s use of formulaic sequences on five separate occasions over five days. No claims can be made about whether the same authors would use the same formulaic sequences, or indeed the same count of formulaic sequences, if the research were repeated. There is clearly a need for future research to investigate changes in individual repertoires of formulaic sequences over a longer and intermittent period. One potential solution may be to adopt a research design similar to that of Barlow (2010). In collecting his data from the White House Press Conferences, he deliberately avoided using speech produced on consecutive days, typically leaving an interval of three days, in order to reduce the likelihood of priming from one day to the next, although the obvious point of departure is that those data were produced for non-experimental purposes.

It is also important to consider that in Section 4.6, several procedures for editing the data were outlined. How much effect did these procedures have on the analysis? Is editing data in this way practical under typical forensic circumstances? To deal with the second question first, editing the data was relatively straightforward since the grammar and spelling features in Word 2010 could be used to automate the process. Editing the 65,113-word corpus was therefore both reliable and achievable in a short period of time, and the practical issue is of minor concern. Of greater concern is whether editing the data had an effect on the analysis and whether standardising the data (correcting spellings, apostrophe misuse etc.) actually covered up idiosyncrasies or other aspects of author style. To assess this, all of the changes that were made were individually reviewed and only two of them potentially affected the results. In Keith-4, ‘vicer versa’ was corrected to ‘vice versa’, and in Alan-3, ‘besides the point’ was standardised to ‘beside the point’; both corrected forms were subsequently identified as formulaic sequences using the reference list approach. None of the other changes made affected the analysis, and the occurrence of these two formulaic sequences across the entire corpus would have been insufficient to alter the results. Therefore, editing the data to enable automated approaches was justified.
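The review of editing changes described above can be illustrated with a minimal sketch. It is not the procedure used in this research: the two-item reference list, the whole-word matching rule, the function names and the example sentences are all assumptions introduced purely for illustration. The sketch reports which reference-list sequences are matched only after a text has been standardised, which is the kind of change that could affect an automated analysis.

```python
import re

# Hypothetical two-item reference list (an assumption for illustration);
# the reference list used in the study is not reproduced here.
REFERENCE_LIST = ["vice versa", "beside the point"]


def find_sequences(text, reference_list):
    """Return the reference-list items that occur in text as whole-word matches."""
    lowered = text.lower()
    found = []
    for seq in reference_list:
        pattern = r"\b" + re.escape(seq) + r"\b"
        if re.search(pattern, lowered):
            found.append(seq)
    return found


def sequences_introduced_by_editing(original, edited, reference_list):
    """Return sequences matched in the edited text but not in the original."""
    before = set(find_sequences(original, reference_list))
    after = find_sequences(edited, reference_list)
    return [seq for seq in after if seq not in before]


# Invented example sentences, not taken from the corpus.
original = "It works that way and vicer versa, but that is besides the point."
edited = "It works that way and vice versa, but that is beside the point."

print(sequences_introduced_by_editing(original, edited, REFERENCE_LIST))
```

Under these assumptions, both standardised forms are flagged because neither matched before editing. A check of this kind would only highlight candidate cases; judging whether they alter the results would still require the manual review described above.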

Finally, in assessing the overall generalizability of this research, the following points can be made:

• The results and conclusions derived are limited to short narratives and generalizability to

• The texts were composed over a five-day period so claims about consistency in formulaic sequence use over a longer period cannot be made; and

• The effects of editing the data are minimal, so analyses using texts which have potentially been auto-corrected by word-processor spelling and grammar features may be possible using the approaches outlined in this research.

In document To cut a long story short:an analysis of formulaic sequences in short written narratives and their potential as markers of authorship (Page 182-188)