CHAPTER 2: Insights from the Digital Humanities
2.3 Comparison with the New Method
2.3.1 Similarities with the DH Projects
The similarities between the new method and the DH projects fall into three areas: 1. Procedure, 2. Evaluation, 3. Benefits. Each area is explained below.
2.3.1.1 Procedure
Several of the DH projects follow a similar set of steps, which is not surprising, given that they are all looking for instances of text reuse. This common procedure is defined most extensively by the Tracer program, which divides the task into the following seven steps:
1. Segmentation – dividing the texts into subsets (e.g. sentences or clauses);
2. Pre-processing – parsing words and creating synonym files etc.;
3. Featuring – deciding what to search for (e.g. tri-grams);
4. Selection – removal of words/features that might cloud the results;
5. Linking – identifying places in the texts that have matching ‘features’;
6. Scoring – ranking the results;
7. Post-processing – manual analysis of the results.
The method used in this study, which is explained in detail in Chapter 3, follows the same basic sequence, but with a new combination of linking and scoring and minor modifications to the ordering of the steps.
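The seven steps listed above can be illustrated with a minimal sketch. It is not the Tracer implementation, nor the method of this study: the function names, the stop-word list, the choice of word bi-grams and the scoring rule (a count of shared n-grams) are all assumptions made for illustration, and the two example sentences are invented.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "of", "a", "in", "to"}


def segment(text):
    """Step 1, segmentation: split a text into sentence-like units."""
    return [part.strip() for part in re.split(r"[.;!?]", text) if part.strip()]


def preprocess(unit):
    """Step 2, pre-processing: lowercase and tokenise (no lemmas or synonyms)."""
    return re.findall(r"\w+", unit.lower())


def featurise(tokens, n=2):
    """Step 3, featuring: build the set of word n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def select(features):
    """Step 4, selection: drop n-grams made up entirely of stop words."""
    return {f for f in features if not all(word in STOP_WORDS for word in f)}


def detect_reuse(target_text, source_text, n=2):
    """Steps 5 and 6: link units that share features, then rank the links."""
    def prepare(text):
        return [(unit, select(featurise(preprocess(unit), n)))
                for unit in segment(text)]

    source_units = prepare(source_text)
    links = []
    for target_unit, target_features in prepare(target_text):
        for source_unit, source_features in source_units:
            shared = target_features & source_features  # step 5, linking
            if shared:
                links.append((len(shared), target_unit, source_unit))
    # Step 6, scoring: here simply the number of shared n-grams.
    links.sort(key=lambda link: link[0], reverse=True)
    return links  # step 7, post-processing, is manual review of this list


if __name__ == "__main__":
    target = "Fight the good fight of faith. Grace be with you."
    source = "He fought the good fight of faith daily; peace be upon them."
    for score, target_unit, source_unit in detect_reuse(target, source):
        print(score, "|", target_unit, "|", source_unit)
```

In this sketch, linking and scoring happen in the same loop. This is only a convenience of the example and should not be read as a description of how any of the surveyed programs combine those steps.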
2.3.1.2 Evaluation
To evaluate their own effectiveness, the DH projects each began with a published list of instances of text reuse in their research area, trained their method/program to detect as many of these instances as possible, and then presented the parameters that proved to be the most effective.269 These parameters typically involved the level of segmentation (i.e. dividing the texts into lines, sentences or paragraphs), the length of n-grams, and the allowable gap between matching n-grams.270
While these projects compare the effectiveness of different sets of parameters for their research area, they do not compare themselves against what has been achieved by other researchers. So, for example, the researchers who used the Tesserae program do not claim that this program is more effective than Tracer. This is because they were investigating a different research area, and the complexity of detecting text reuse in English newspapers,271 for example, is different to the complexity of detecting text reuse of French dictionaries272 and different to the complexity of detecting instances of Honkadori in Japanese Waka poetry.273 Consequently, the researchers simply judge their own effectiveness by what has been detected manually in their particular research area.
269 For example, Coffee et al., 415.
270 See, for example, Büchler et al., ‘Towards a Historical Text Re-Use Detection’, 227; Horton, Olsen, and Roe, ‘Something Borrowed’, 7.
This study evaluates the effectiveness of the new method using the same process as these DH projects. The baseline, or the set of published/known parallels, is a combination of the lists of parallels in the two standard editions of the Greek New Testament, the UBS5 and NA28, as well as Evans’ collated list of parallels.274 These three lists were chosen because they represent the three most comprehensive lists of parallels with the Septuagint and, importantly, they also list parallels with the Jewish Pseudepigrapha. During the evaluation of the method in Chapters 4–7, the parallels in these three lists that are not detected by the method are noted and discussed, including suggestions as to how the parameters of the method could be modified in order to detect them.
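The evaluation process described above, in which detected parallels are checked against a baseline list and the undetected ones are noted, can be sketched as follows. This is an illustrative assumption rather than the procedure of any of the surveyed projects: the function names, the representation of a parallel as a (target, source) pair, the parameter tuples (n-gram length, allowable gap) and all of the toy data are hypothetical, and effectiveness is reduced here to the share of baseline parallels that were detected.

```python
def evaluate(detected, baseline):
    """Compare one run's detected parallels with the baseline list."""
    detected, baseline = set(detected), set(baseline)
    found = baseline & detected
    return {
        # Share of the known parallels that the run detected.
        "recall": len(found) / len(baseline) if baseline else 0.0,
        # Known parallels the run failed to detect (to be discussed manually).
        "missed": sorted(baseline - detected),
        # Detected parallels that are absent from the baseline list.
        "new": sorted(detected - baseline),
    }


def best_parameters(runs, baseline):
    """Return the parameter setting whose run detects most of the baseline."""
    return max(runs, key=lambda params: evaluate(runs[params], baseline)["recall"])


if __name__ == "__main__":
    # Toy baseline of known parallels: (target passage, source passage).
    baseline = {("T1", "S1"), ("T2", "S2"), ("T3", "S3"), ("T4", "S4")}
    # Toy results keyed by hypothetical (n-gram length, allowable gap).
    runs = {
        (2, 0): {("T1", "S1"), ("T2", "S2"), ("T3", "S3"), ("T9", "S9")},
        (3, 1): {("T1", "S1")},
    }
    best = best_parameters(runs, baseline)
    print(best, evaluate(runs[best], baseline))
```

The "missed" entries correspond to the undetected baseline parallels that are noted and discussed, while the "new" entries are candidates that would still require manual analysis.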
As well as following this DH pattern in the evaluation of the new method, the study also attempts to follow the common Biblical studies process during the analysis of individual parallels (see Chapter 1 for an explanation of this process). This analysis involves consulting the published lists of verbal parallels by Delamarter, Dittmar, Gough, Hübner, McLean, Towner, Wilson and Wolfe, as well as the commentaries of Knight and Towner.275
271 See Chapter 2, Section 2.2.1 – METER. Paul Clough et al., ‘METER: MEasuring TExt Reuse’, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL ’02; Stroudsburg, PA, USA: Association for Computational Linguistics, 2002).
272 See Chapter 2, Section 2.2.6 – The PAIR Program. Horton, Olsen, and Roe.
273 See Chapter 2, Section 2.2.9 – String Resemblance Systems. Takeda et al.
274 Aland et al., Greek New Testament; Aland et al., Novum Testamentum Graece; Evans, Ancient Texts.
2.3.1.3 Benefits
The above survey of DH projects also demonstrated the benefits of searching for verbal parallels in a systematic manner. After searching through a database (or several databases) of texts, several studies reported that they had detected ‘new’ thematically coherent parallels that had not been noted in the lists published by the literary scholars.276 These programs were able to detect these new parallels because, unlike humans, computers do not have a limited intertextual framework (see Chapter 1). As such, computers can detect every source text that contains a set of matching words, not just the source text that an individual scholar is most familiar with.277 Several of these projects also demonstrated how detecting parallels in a systematic manner can provide interesting metadata, such as which set of texts appears to be the most familiar to scholars (see Section 2.2.4 – Tesserae), and in which decade a particular source text appears to have had the most influence (see Section 2.2.6 – the PAIR Program). In Chapters 4–7, the new method will also demonstrate similar benefits. It will detect ‘new’ thematically coherent parallels that are not included in the three baseline lists of known parallels. Many of these new parallels are not contained in the other lists either (see above), nor in the commentaries of Knight and Towner. The method, like Tesserae, will also show which source texts appear to have received the least attention from scholars (by comparing the number of known parallels with those that were detected by the method). This data will highlight particular books of the Septuagint and Jewish Pseudepigrapha that could benefit from more detailed study.
275 Delamarter, A Scripture Index to Charlesworth’s The Old Testament Pseudepigrapha; Dittmar, Vetus Testamentum in Novo; Gough, The New Testament Quotations; Hübner, Vetus Testamentum in Novo: Band 2 Corpus Paulinum; McLean, Citations and Allusions; Towner, ‘1-2 Timothy and Titus’; Wilson, Pauline Parallels; Wolfe, ‘The Sagacious Use of Scripture’; Knight III, The Pastoral Epistles; Towner, The Letters to Timothy and Titus.
276 See, for example, Takeda et al., 487; Ganascia, Glaudes, and Del Lungo, 14.
277 The subjectivity involved in generating computer searches is explained in Chapter 3, Section 3.3.
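The comparison of known and detected parallels per source text can be sketched in a few lines. The function name, the representation of a parallel as a (source book, identifier) pair and the placeholder books and counts below are all hypothetical; they are not results of this study.

```python
from collections import Counter


def attention_gap(known, detected):
    """Rank source books by how far detected parallels exceed known ones."""
    known_counts = Counter(book for book, _ in known)
    detected_counts = Counter(book for book, _ in detected)
    rows = []
    for book in sorted(set(known_counts) | set(detected_counts)):
        k, d = known_counts[book], detected_counts[book]
        rows.append((book, k, d, d - k))  # (book, known, detected, gap)
    # Largest gap first: books that may have received the least attention.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Placeholder data only.
    known = [("Book A", "p1"), ("Book A", "p2"), ("Book B", "p3")]
    detected = known + [("Book B", "p4"), ("Book B", "p5"), ("Book C", "p6")]
    for row in attention_gap(known, detected):
        print(row)
```

A large gap does not by itself establish neglect, since each detected parallel would still need manual analysis before it is counted.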