
CHAPTER 2: Insights from the Digital Humanities

2.3 Comparison with the New Method

2.3.1 Similarities with the DH Projects

The similarities between the new method and the DH projects fall into three areas:

1. Procedure
2. Evaluation
3. Benefits

Each of these is explained below.

2.3.1.1 Procedure

Several of the DH projects follow a similar set of steps, which is not surprising, given that they are all looking for instances of text reuse. This common procedure is defined most extensively by the Tracer program, which divides the task into the following seven steps:

1. Segmentation – dividing the texts into subsets (e.g. sentences or clauses);
2. Pre-processing – parsing words, creating synonym files, etc.;
3. Featuring – deciding what to search for (e.g. tri-grams);
4. Selection – removing words/features that might cloud the results;
5. Linking – identifying places in the texts that have matching ‘features’;
6. Scoring – ranking the results;
7. Post-processing – manual analysis of the results.
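To make the sequence above more concrete, the following Python sketch strings the first six steps together for two short English texts. It is a toy illustration under stated assumptions, not Tracer's implementation: the stopword list, the two-entry lemma map standing in for the pre-processing files, the use of word tri-grams as features and the shared-feature count used as a score are all hypothetical choices made here. For simplicity, selection is applied to the words before the features are built, and step 7 is left to the human reader.

```python
import re
from itertools import product

# Hypothetical stopword list and lemma map, standing in for the word
# lists and synonym/lemma files of a real pre-processing step.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "is"}
LEMMAS = {"fought": "fight", "fights": "fight"}


def segment(text):
    """Step 1 (segmentation): split a text into sentence-like segments."""
    return [s.strip() for s in re.split(r"[.;:!?]", text) if s.strip()]


def preprocess(seg):
    """Step 2 (pre-processing): lowercase, tokenise, map words to lemmas."""
    words = re.findall(r"[a-z]+", seg.lower())
    return [LEMMAS.get(w, w) for w in words]


def select(words):
    """Step 4 (selection): drop stopwords that would cloud the results.
    Applied to words before featuring, a simplification in this sketch."""
    return [w for w in words if w not in STOPWORDS]


def featurize(words, n=3):
    """Step 3 (featuring): represent a segment as its set of word n-grams."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def link_and_score(target, source, n=3):
    """Steps 5-6 (linking, scoring): pair segments that share features and
    score each pair by the number of shared n-grams."""
    t_feats = [featurize(select(preprocess(s)), n) for s in segment(target)]
    s_feats = [featurize(select(preprocess(s)), n) for s in segment(source)]
    links = []
    for (i, tf), (j, sf) in product(enumerate(t_feats), enumerate(s_feats)):
        shared = tf & sf
        if shared:
            links.append((len(shared), i, j))
    # Highest score first; step 7 (post-processing) is manual analysis.
    return sorted(links, reverse=True)


# Invented example sentences, not drawn from any corpus.
TARGET = "The old soldier fought a long war. Rain fell at dawn."
SOURCE = "An old soldier fights a long war alone."
print(link_and_score(TARGET, SOURCE))  # -> [(3, 0, 0)]
```

In this example all three tri-grams of the first target sentence are shared with the source sentence only because the lemma map normalises "fought" and "fights"; without it, no link would be found, which illustrates why pre-processing matters before linking.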

The method used in this study, which is explained in detail in Chapter 3, follows the same basic sequence, but with a new combination of linking and scoring and minor modifications to the order of the steps.

2.3.1.2 Evaluation

To evaluate their own effectiveness, each of the DH projects began with a published list of instances of text reuse in its research area, trained its method or program to detect as many of these instances as possible, and then presented the parameter settings that proved most effective.269 These parameters typically involved the level of segmentation (i.e. dividing the texts into lines, sentences or paragraphs), the length of the n-grams, and the allowable gap between matching n-grams.270
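The evaluation procedure described above can be pictured as a grid search over the parameters: for each combination of n-gram length and allowable gap, run a detector over a list of known parallels and record the proportion it recovers (its recall). In the Python sketch below, the detector, its reading of "allowable gap" (at most `gap` intervening words between consecutive matched words) and the three letter-coded text pairs are all hypothetical; they stand in for a real program and a real published list.

```python
from itertools import product


def gapped_ngrams(words, n, gap):
    """All ordered n-word tuples in which consecutive words are separated
    by at most `gap` intervening words (gap=0 gives ordinary n-grams)."""
    found = set()

    def extend(path):
        if len(path) == n:
            found.add(tuple(words[k] for k in path))
            return
        last = path[-1]
        for nxt in range(last + 1, min(last + gap + 2, len(words))):
            extend(path + [nxt])

    for start in range(len(words)):
        extend([start])
    return found


def detect(target, source, n, gap):
    """Hypothetical detector: do the two word lists share a gapped n-gram?"""
    return bool(gapped_ngrams(target, n, gap) & gapped_ngrams(source, n, gap))


def recall(known_pairs, n, gap):
    """Proportion of the known parallels recovered with these parameters."""
    hits = sum(detect(t, s, n, gap) for t, s in known_pairs)
    return hits / len(known_pairs)


def sweep(known_pairs, ns=(2, 3, 4), gaps=(0, 1, 2)):
    """Recall for every (n, gap) combination, best first (ties keep order)."""
    scores = {(n, g): recall(known_pairs, n, g) for n, g in product(ns, gaps)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy "published list": letter-coded word sequences, not real texts.
KNOWN = [
    (["a", "b", "c"], ["a", "b", "c", "z"]),       # verbatim reuse
    (["a", "b", "c"], ["a", "x", "b", "c"]),       # one inserted word
    (["p", "q", "r"], ["p", "x", "y", "q", "r"]),  # two inserted words
]

for (n, gap), r in sweep(KNOWN):
    print(f"n={n} gap={gap} recall={r:.2f}")
```

On this toy list, tri-grams recover all three pairs only once the gap reaches 2, while bigrams recover them at every gap. Because recall ignores false positives, a search of this kind on its own favours short, permissive settings, so in practice the detected matches would also need checking for spurious links before one setting is preferred.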

While these projects compare the effectiveness of different sets of parameters within their own research area, they do not compare themselves against what other researchers have achieved. For example, the researchers who used the Tesserae program do not claim that it is more effective than Tracer. This is because they were investigating a different research area, and the complexity of detecting text reuse in English newspapers,271 for example, differs from the complexity of detecting text reuse in French dictionaries272 and from that of detecting instances of honkadori in Japanese waka poetry.273 Consequently, the researchers simply judge their own effectiveness by what has already been detected manually in their particular research area.

269 For example, Coffee et al., 415.

270 See, for example, Büchler et al., ‘Towards a Historical Text Re-Use Detection’, 227; Horton, Olsen, and Roe, ‘Something Borrowed’, 7.

This study evaluates the effectiveness of the new method using the same process as these DH projects. The baseline, or the set of published/known parallels, is a combination of the lists of parallels in the two standard editions of the Greek New Testament, the UBS⁵ and NA²⁸, and Evans’ collated list of parallels.274 These three lists were chosen because they are the three most comprehensive lists of parallels with the Septuagint and, importantly, they also list parallels with the Jewish Pseudepigrapha. During the evaluation of the method in Chapters 4–7, the parallels in these three lists that the method does not detect are noted and discussed, together with suggestions as to how the parameters of the method could be modified in order to detect them.
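Merging the three published lists into one baseline, and separating the parallels the method misses from those it adds, amounts to simple set operations. The Python sketch below assumes, hypothetically, that each parallel is recorded as a (New Testament reference, source reference) pair; the identifiers T1, S1 and so on, and the list names, are placeholders and not entries from the UBS⁵, NA²⁸ or Evans.

```python
def compare_with_baseline(detected, *published_lists):
    """Merge published lists into one baseline and compare the method's output.

    Each parallel is a hypothetical (target_ref, source_ref) pair.
    """
    baseline = set().union(*published_lists)
    detected = set(detected)
    found = baseline & detected
    return {
        "baseline": baseline,
        "found": found,                  # known parallels the method recovers
        "missed": baseline - detected,   # known parallels to note and discuss
        "new": detected - baseline,      # detections absent from every list
        "recall": len(found) / len(baseline) if baseline else 0.0,
    }


# Placeholder identifiers, not entries from any real edition or index.
LIST_A = {("T1", "S1"), ("T2", "S2")}
LIST_B = {("T2", "S2"), ("T3", "S3")}
LIST_C = {("T4", "S4")}
DETECTED = {("T1", "S1"), ("T3", "S3"), ("T5", "S5")}

result = compare_with_baseline(DETECTED, LIST_A, LIST_B, LIST_C)
print(sorted(result["missed"]))  # -> [('T2', 'S2'), ('T4', 'S4')]
print(sorted(result["new"]))     # -> [('T5', 'S5')]
```

Taking the union means a parallel listed in more than one source (here T2/S2) is counted once, so the recall figure reflects distinct known parallels rather than list entries.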

As well as following this DH pattern in evaluating the new method, the study also attempts to follow the common Biblical studies process when analysing individual parallels (see Chapter 1 for an explanation of this process). This analysis involves consulting the published lists of verbal parallels by Delamarter, Dittmar, Gough, Hübner, McLean, Towner, Wilson and Wolfe, as well as the commentaries of Knight and Towner.275

271 See Chapter 2, Section 2.2.1 – METER. Paul Clough et al., ‘METER: MEasuring TExt Reuse’, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL ’02; Stroudsburg, PA, USA: Association for Computational Linguistics, 2002).

272 See Chapter 2, Section 2.2.6 – The PAIR Program. Horton, Olsen, and Roe.

273 See Chapter 2, Section 2.2.9 – String Resemblance Systems. Takeda et al.

274 Aland et al., Greek New Testament; Aland et al., Novum Testamentum Graece; Evans, Ancient Texts.

2.3.1.3 Benefits

The above survey of DH projects also demonstrated the benefits of searching for verbal parallels in a systematic manner. After searching through one or more databases of texts, several studies reported that they had detected ‘new’ thematically coherent parallels that had not been noted in the lists published by literary scholars.276 These programs were able to detect these new parallels because, unlike humans, computers are not restricted to a limited intertextual framework (see Chapter 1). As such, computers can detect every source text that contains a set of matching words, not just the source text that an individual scholar is most familiar with.277 Several of these projects also demonstrated how detecting parallels in a systematic manner can provide interesting metadata, such as which sets of texts appear to be most familiar to scholars (see Section 2.2.4 – Tesserae), and in which decade a particular source text appears to have had the most influence (see Section 2.2.6 – The PAIR Program). In Chapters 4–7, the new method will also demonstrate similar benefits. It will detect ‘new’ thematically coherent parallels that are not included in the three baseline lists of known parallels. Many of these new parallels are also absent from the other lists (see above) and from the commentaries of Knight and Towner. The method, like Tesserae, will also show which source texts appear to have received the least attention from scholars, by comparing the number of known parallels with the number detected by the method. This data will highlight particular books of the Septuagint and Jewish Pseudepigrapha that could benefit from more detailed study.

275 Delamarter, A Scripture Index to Charlesworth’s The Old Testament Pseudepigrapha; Dittmar, Vetus Testamentum in Novo; Gough, The New Testament Quotations; Hübner, Vetus Testamentum in Novo: Band 2 Corpus Paulinum; McLean, Citations and Allusions; Towner, ‘1-2 Timothy and Titus’; Wilson, Pauline Parallels; Wolfe, ‘The Sagacious Use of Scripture’; Knight III, The Pastoral Epistles; Towner, The Letters to Timothy and Titus.

276 See, for example, Takeda et al., 487; Ganascia, Glaudes, and Del Lungo, 14.

277 The subjectivity involved in generating computer searches is explained in Chapter 3, Section 3.3.
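The comparison described above, between the number of known parallels for each source text and the number detected by the method, could be tabulated along the following lines. The Python sketch assumes, hypothetically, that parallels are recorded as (New Testament reference, source book) pairs; the book names and counts are invented placeholders, and the ratio of known to detected parallels is only one possible measure of scholarly attention. Books for which the method detects nothing are skipped to avoid division by zero.

```python
from collections import Counter


def attention_by_book(known, detected):
    """Compare known and detected parallels for each source book.

    `known` and `detected` are iterables of hypothetical
    (target_ref, source_book) pairs. Returns rows of
    (book, known_count, detected_count, known/detected), lowest ratio first.
    Books with no detected parallels are skipped to avoid dividing by zero.
    """
    known_counts = Counter(book for _, book in set(known))
    detected_counts = Counter(book for _, book in set(detected))
    rows = [
        (book, known_counts[book], n_det, known_counts[book] / n_det)
        for book, n_det in detected_counts.items()
    ]
    # A low ratio means many detected parallels but few recorded by scholars.
    return sorted(rows, key=lambda row: row[3])


# Invented placeholder data, not results from the study.
KNOWN_PARALLELS = [("T1", "Book A"), ("T2", "Book A"), ("T3", "Book B")]
DETECTED = [("T1", "Book A"), ("T2", "Book A"), ("T4", "Book B"),
            ("T5", "Book B"), ("T6", "Book B"), ("T7", "Book C")]

for row in attention_by_book(KNOWN_PARALLELS, DETECTED):
    print(row)
```

With this toy data, "Book C" comes first because none of its detected parallels appears in the known list, and "Book A" comes last because every detected parallel was already known; on real data the top rows would be the candidates for closer study.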

In document Echoes of Scripture and the Jewish Pseudepigrapha in the Pastoral Epistles: Including a Method of Identifying High-interest Parallels (Page 80-84)