
CHAPTER 3: The Method of Identifying High-Interest Parallels

3.2 The Steps Involved in the Method

3.2.4 Featuring

The third step in the method is Featuring, which involves deciding what type of elements/features are to be detected and then performing the necessary preparation so that source and target texts can be linked based on these selected features. This study chose to use a set of five different features, which are based on the five different reference forms that are listed in Chapter 1, Section 1.3. These features are shown in Figure 3.2.4.1, below. Some of the fields in the ‘Keyword’ search feature, including ‘Max Separation’ and ‘Word order,’ are not used by the method. However, they are included in the definition for consistency with the other features.

299 For example, Google Translate or WordNet.

300 Walter Bauer, A Greek-English Lexicon of the New Testament and Other Early Christian Literature (ed. Frederick William Danker; 3rd ed.; Chicago: University of Chicago Press, 2001); Gerhard Kittel and Gerhard Friedrich, Theological Dictionary of the New Testament (10th ed.; Grand Rapids: Wm. B. Eerdmans Publishing Co.,

Figure 3.2.4.1 – The Initial Set of Five Search Features

Each of the search features has a field called ‘Alternatives’ that refers to its level of lexical alternatives (as explained in the previous section). The value of this field will influence the number of source texts that are returned by each search. When the source texts are large, setting this value to ‘broad’ synonyms will make it difficult to find a combination of words that is only found in a small number of source texts. Within this study, the value of ‘Alternatives’ for the ‘Keyword’ feature was set to ‘one word’ and the value for the other search features was set to ‘narrow’ synonyms. If a parallel was selected for further analysis (based on potential singularity and thematic coherence), then the surrounding words in the source and target texts were manually surveyed for ‘broad’ synonyms.

As discussed in Chapter 8, after the method has been applied to a large set of source texts (e.g. the Septuagint) with these settings, the method can be applied again to a smaller set of source texts with the most parallels (e.g. the Psalms) with the value of ‘Alternatives’ extended to ‘broad’ synonyms for these search features.

Name: Keyword
Word order: fixed
Alternatives: one word
Min words: 1
Max words: 1
Max separation: 1

Name: Verbatim
Word order: fixed
Alternatives: narrow
Min words: 2
Max words: segment
Max separation: 2

Name: Non-verbatim
Word order: flexible
Alternatives: narrow
Min words: 2
Max words: segment
Max separation: 5

Name: Multiple Keywords
Word order: flexible
Alternatives: narrow
Min words: 2
Max words: segment
Max separation: 30

Name: Multiple Segments
Word order: flexible
Alternatives: narrow
Number of segments: 2
Max segment separation: 30
S1 min words: 1
S1 max words: 1
S1 max separation: 1
S2 min words: 1
S2 max words: segment
S2 max separation: 5
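The five definitions in the figure can also be written down as plain data records. The following is only a minimal sketch in Python: the class and field names are hypothetical (they simply mirror the labels in the figure), and nothing here reflects how the study itself stored these definitions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SearchFeature:
    """Hypothetical record for a single-segment search feature."""
    name: str
    word_order: str              # 'fixed' or 'flexible'
    alternatives: str            # 'one word', 'narrow' or 'broad'
    min_words: int
    max_words: Union[int, str]   # a number, or 'segment' for the whole segment
    max_separation: int


@dataclass(frozen=True)
class MultiSegmentFeature:
    """Hypothetical record for the two-segment feature (S1 followed by S2)."""
    name: str
    word_order: str
    alternatives: str
    number_of_segments: int
    max_segment_separation: int
    s1_min_words: int
    s1_max_words: Union[int, str]
    s1_max_separation: int
    s2_min_words: int
    s2_max_words: Union[int, str]
    s2_max_separation: int


# Values copied from the figure.
FEATURES = [
    SearchFeature("Keyword", "fixed", "one word", 1, 1, 1),
    SearchFeature("Verbatim", "fixed", "narrow", 2, "segment", 2),
    SearchFeature("Non-verbatim", "flexible", "narrow", 2, "segment", 5),
    SearchFeature("Multiple Keywords", "flexible", "narrow", 2, "segment", 30),
]

MULTIPLE_SEGMENTS = MultiSegmentFeature(
    "Multiple Segments", "flexible", "narrow",
    number_of_segments=2, max_segment_separation=30,
    s1_min_words=1, s1_max_words=1, s1_max_separation=1,
    s2_min_words=1, s2_max_words="segment", s2_max_separation=5,
)
```

Keeping ‘segment’ as a literal value, rather than resolving it to a number, leaves the definitions independent of the length of any particular segment.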

After defining the features, the method prepares to look for them. If the platform were Tracer or Tesserae, this would involve dividing the source and target texts into n-grams.301 However, in order to take advantage of the source texts that are contained in Accordance, the study adopts an approach that is similar to the PHŒBUS project, which is to define syntax rules.302
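For comparison, the n-gram preparation that this study did not adopt can be illustrated generically. The sketch below is in Python and shows only the general idea of splitting a token sequence into n-grams; the function name is hypothetical, and it makes no claim about how Tracer or Tesserae actually tokenise or index their texts.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Placeholder words, as used in the rule definitions of this section.
tokens = ["AAA", "BBB", "CCC", "DDD"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```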

Like the syntax rules of a traditional language such as English, the syntax rules that are defined by the method are relatively simple, but there are many of them.303 Every search feature needs a different set of rules for every segment length.304 The following explanation shows what these rules look like when a segment with four words is followed by a second segment that also has four words.305 Since Accordance is used to interpret (or test) the rules, they are in the format that is used in Accordance command-line searches. However, the rules could equally be written in SQL or any other database query language, or as rules in a PROLOG program, as they are in the PHŒBUS project.306

For the most basic search feature, a ‘Keyword’, there is one syntax rule for every word in the segment. If the segment has four words – ‘AAA’, ‘BBB’, ‘CCC’ and ‘DDD’ – then any one of these words could potentially be a keyword, so the method generates four syntax rules, as follows:307

1. (AAA)
2. (BBB)
3. (CCC)
4. (DDD)

301 ‘Tesserae: Advanced Search’; Franzini et al., ‘TRACER: A User Manual’.

302 Ganascia, Glaudes, and Del Lungo.

303 Nugues, Language Processing with Perl and Prolog: Theories, Implementation, and Application, 253ff.

304 Which is why the study limited segment lengths to 14 words.

305 The rules for structural parallels require two segments to be considered together.

306 W. F. Clocksin and C. S. Mellish, Programming in Prolog (5th ed.; Heidelberg: Springer, 2003); Nugues, Language Processing with Perl and Prolog: Theories, Implementation, and Application; Ganascia, Glaudes, and Del Lungo.

307 The words ‘AAA’, ‘BBB’, ‘CCC’ and ‘DDD’ etc. are used in these definitions because these were the actual strings (i.e. sequences of characters) used in the definition of the syntax rules (see Appendix A for a sample of these rules). During the syntax analysis, each word (e.g. AAA) was replaced by its lexical alternatives, or the
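The four ‘Keyword’ rules for the segment ‘AAA’, ‘BBB’, ‘CCC’, ‘DDD’ can be reproduced mechanically. The following Python sketch is illustrative only: the function name is hypothetical, and the study's rules were prepared as files rather than generated by this code.

```python
def keyword_rules(segment):
    """One rule per word: each placeholder on its own, wrapped in brackets."""
    return [f"({word})" for word in segment]


rules = keyword_rules(["AAA", "BBB", "CCC", "DDD"])
for number, rule in enumerate(rules, start=1):
    print(f"{number}. {rule}")
```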

The rules for a ‘Verbatim’ search feature are also relatively simple. The ‘Word order’ field in this feature is set to ‘fixed’ and the ‘Max separation’ field is set to 2, so the words in these rules are separated by the Accordance search elements/commands ‘<followed by>’ and ‘<within 2 words>’. The method then generates six rules for the same four-word segment:

1. (AAA) <followed by><within 2 words> (BBB)
2. (AAA) <followed by><within 2 words> (BBB) <followed by><within 2 words> (CCC)
3. (AAA) <followed by><within 2 words> (BBB) <followed by><within 2 words> (CCC) <followed by><within 2 words> (DDD)
4. (BBB) <followed by><within 2 words> (CCC)
5. (BBB) <followed by><within 2 words> (CCC) <followed by><within 2 words> (DDD)
6. (CCC) <followed by><within 2 words> (DDD)
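The six ‘Verbatim’ rules correspond to every run of two or more consecutive words in the segment. A minimal Python sketch of that enumeration follows; the function and parameter names are hypothetical, and the rule strings simply copy the format shown in the list.

```python
def verbatim_rules(segment, max_separation=2):
    """Build one rule for every run of two or more consecutive words."""
    joiner = f" <followed by><within {max_separation} words> "
    rules = []
    for start in range(len(segment)):
        # end is exclusive, so start + 2 gives the shortest (two-word) run
        for end in range(start + 2, len(segment) + 1):
            rules.append(joiner.join(f"({word})" for word in segment[start:end]))
    return rules


rules = verbatim_rules(["AAA", "BBB", "CCC", "DDD"])
```

Under this reading, a segment of n words yields n(n-1)/2 rules, which gives six for the four-word example.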

The ‘Non-verbatim’ search feature is only slightly more complicated than the ‘Verbatim’ feature. The ‘Word order’ field for this reference form is set to ‘flexible’, meaning that there is no <followed by> restriction in the syntax rules. The value of ‘Max separation’ is increased to 5 to allow for this more flexible arrangement of words. Therefore, the rules generated for the same four-word segment are:

1. (AAA) <within 5 words> (BBB)
2. (AAA) <within 5 words> (BBB) <within 5 words> (CCC)
3. (AAA) <within 5 words> (BBB) <within 5 words> (CCC) <within 5 words> (DDD)
4. (AAA) <within 5 words> (BBB) <within 5 words> (DDD)
5. (AAA) <within 5 words> (CCC)
6. (AAA) <within 5 words> (CCC) <within 5 words> (DDD)
7. (AAA) <within 5 words> (DDD)
8. (BBB) <within 5 words> (CCC)
9. (BBB) <within 5 words> (CCC) <within 5 words> (DDD)
10. (BBB) <within 5 words> (DDD)
11. (CCC) <within 5 words> (DDD)
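The eleven ‘Non-verbatim’ rules are the selections of two or more words from the segment, with the words kept in their original order inside each rule. The Python sketch below reproduces the list in the same sequence; the helper names are hypothetical, and the depth-first ordering is inferred from the list itself rather than stated by the study.

```python
def ordered_subsets(words, min_size=1):
    """Yield selections of words (original order kept), depth-first."""
    def extend(prefix, start):
        for i in range(start, len(words)):
            combo = prefix + [words[i]]
            if len(combo) >= min_size:
                yield combo
            # try adding each later word before moving on
            yield from extend(combo, i + 1)
    yield from extend([], 0)


def non_verbatim_rules(segment, max_separation=5):
    """Build one proximity rule per selection of two or more words."""
    joiner = f" <within {max_separation} words> "
    return [joiner.join(f"({word})" for word in combo)
            for combo in ordered_subsets(segment, min_size=2)]


rules = non_verbatim_rules(["AAA", "BBB", "CCC", "DDD"])
```

Calling the same function with `max_separation=30` produces strings in the format used for the ‘Multiple Keywords’ feature.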

The rules for the ‘Multiple Keywords’ search feature are similar to the ones used to define the ‘Non-verbatim’ feature, except that the matching words (i.e. the keywords) in the target text are from more than one clause of the source text. This is illustrated by the reference to Numbers 21 in John 3:14a:

καὶ κατελάλει ὁ λαὸς πρὸς τὸν θεὸν καὶ κατὰ Μωυσῆ λέγοντες Ἵνα τί ἐξήγαγες ἡμᾶς ἐξ Αἰγύπτου ἀποκτεῖναι ἡμᾶς ἐν τῇ ἐρήμῳ; ὅτι οὐκ ἔστιν ἄρτος οὐδὲ ὕδωρ, ἡ δὲ ψυχὴ ἡμῶν προσώχθισεν ἐν τῷ ἄρτῳ τῷ διακένῳ. καὶ ἀπέστειλεν κύριος εἰς τὸν λαὸν τοὺς ὄφεις […] καὶ ἐποίησεν Μωυσῆς ὄφιν χαλκοῦν καὶ ἔστησεν αὐτὸν ἐπὶ σημείου, καὶ ἐγένετο ὅταν ἔδακνεν ὄφις ἄνθρωπον, καὶ ἐπέβλεψεν ἐπὶ τὸν ὄφιν τὸν χαλκοῦν καὶ ἔζη.308 (Num 21:5-9)

Καὶ καθὼς Μωϋσῆς ὕψωσεν τὸν ὄφιν ἐν τῇ ἐρήμῳ309 (John 3:14a)

set of equivalent words that match the word in a search of the source texts (hence the need for the words to be in brackets in these rule definitions). This is explained in more detail below.

The three keywords in John 3:14 are not found together in any single clause of the source text (i.e. Numbers 21); in the closest combination, the words are 24 words apart. In order to cater for examples like these, the value of ‘Max separation’ is set to 30 for this search feature. The rules generated for the same four-word segment are then as follows:

1. (AAA) <within 30 words> (BBB)
2. (AAA) <within 30 words> (BBB) <within 30 words> (CCC)
3. (AAA) <within 30 words> (BBB) <within 30 words> (CCC) <within 30 words> (DDD)
4. (AAA) <within 30 words> (BBB) <within 30 words> (DDD)
5. (AAA) <within 30 words> (CCC)
6. (AAA) <within 30 words> (CCC) <within 30 words> (DDD)
7. (AAA) <within 30 words> (DDD)
8. (BBB) <within 30 words> (CCC)
9. (BBB) <within 30 words> (CCC) <within 30 words> (DDD)
10. (BBB) <within 30 words> (DDD)
11. (CCC) <within 30 words> (DDD)
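The effect of raising ‘Max separation’ from 5 to 30 can be illustrated with a toy proximity check. The Python sketch below rests on a stated assumption: it reads ‘<within N words>’ as "the positions of two matched words differ by at most N", applied to each adjacent pair of words in a rule. This is a simplification for illustration and is not a description of how Accordance evaluates the command; the function name and the toy text are hypothetical.

```python
def chain_within(tokens, keywords, max_separation):
    """Check whether the keywords can be matched so that each one lies
    within max_separation positions of the previously matched keyword."""
    # positions at which the first keyword occurs
    reachable = [i for i, token in enumerate(tokens) if token == keywords[0]]
    for word in keywords[1:]:
        positions = [i for i, token in enumerate(tokens) if token == word]
        # keep only occurrences close enough to an earlier match
        reachable = [p for p in positions
                     if any(abs(p - q) <= max_separation for q in reachable)]
    return bool(reachable)


# Toy text: two placeholder keywords whose positions differ by 24.
toy_text = ["AAA"] + ["x"] * 23 + ["BBB"]
found_with_30 = chain_within(toy_text, ["AAA", "BBB"], 30)
found_with_5 = chain_within(toy_text, ["AAA", "BBB"], 5)
```

In this toy text the pair is found with a separation of 30 but missed with a separation of 5, which is the kind of case the wider setting is meant to cover.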

Finally, the complexity of the ‘Multiple Segments’ search feature is greater than that of the other search features because it spans more than one segment of words. In order to reduce the number of rules that are required (and hence the number of searches involved), this study defines this search feature as having one word from a first segment and then one or more words from the following segment. It will be noted in the following chapters that even with this simplified definition of structural parallels, the number of searches involved for this reference form is about the same as for all the others combined.

308 ET: ‘And the people spoke against God and against Moses, saying, ‘Why did you lead us out of Egypt for us to die in the wilderness? For there is no bread or water and our soul detests this miserable food.’ And the Lord sent snakes to the people […] And Moses made a bronze snake and placed it as a sign. And it came to be that whenever a snake bit a man, and he looked upon the bronze snake, and he lived.’

Words from this second segment should appear in the same phrase/clause of a source text. For the sake of simplicity, this is simulated by requiring the words to be found ‘<within 5 words>’ of each other. Words from the two adjacent segments of the target text can appear in adjacent clauses of the source text. For the sake of simplicity, this is simulated by requiring the words to be found ‘<within 30 words>’ of each other. So, if the same four-word segment (i.e. ‘AAA’, ‘BBB’, ‘CCC’ and ‘DDD’) were followed by another four-word segment with the words ‘111’, ‘222’, ‘333’ and ‘444’, then the syntax rules that are generated would be:

1. (AAA) <within 30 words> (111)
2. (AAA) <within 30 words> ((111) <within 5 words> (222))
3. (AAA) <within 30 words> ((111) <within 5 words> (222) <within 5 words> (333))
4. (AAA) <within 30 words> ((111) <within 5 words> (222) <within 5 words> (333) <within 5 words> (444))
5. (AAA) <within 30 words> ((111) <within 5 words> (333))
6. (AAA) <within 30 words> ((111) <within 5 words> (333) <within 5 words> (444))
7. (AAA) <within 30 words> ((111) <within 5 words> (444))
8. (AAA) <within 30 words> (222)
9. (AAA) <within 30 words> ((222) <within 5 words> (333))
10. (AAA) <within 30 words> ((222) <within 5 words> (333) <within 5 words> (444))
11. (AAA) <within 30 words> ((222) <within 5 words> (444))
12. (AAA) <within 30 words> (333)
13. (AAA) <within 30 words> ((333) <within 5 words> (444))
14. (AAA) <within 30 words> ((111) <within 5 words> (222) <within 5 words> (444))
15. (AAA) <within 30 words> (444)
16. (BBB) … 31. (CCC) … 46. (DDD) ….310
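The ‘Multiple Segments’ rules pair each word of the first segment with every non-empty selection of words from the second segment. The following Python sketch generates them for the 4x4 example. The function and parameter names are hypothetical, and the rules are produced grouped by selection size, which differs from the order of the list; the order is assumed not to affect which searches are run.

```python
from itertools import combinations


def multiple_segment_rules(first, second,
                           segment_separation=30, word_separation=5):
    """Pair each word of the first segment with every non-empty
    selection of words (original order kept) from the second segment."""
    inner_joiner = f" <within {word_separation} words> "
    rules = []
    for anchor in first:
        for size in range(1, len(second) + 1):
            for combo in combinations(second, size):
                inner = inner_joiner.join(f"({word})" for word in combo)
                if size > 1:
                    # group the second-segment words, as in the listed rules
                    inner = f"({inner})"
                rules.append(
                    f"({anchor}) <within {segment_separation} words> {inner}")
    return rules


rules = multiple_segment_rules(["AAA", "BBB", "CCC", "DDD"],
                               ["111", "222", "333", "444"])
```

For this example the sketch yields 15 rules per first-segment word and 60 in total, in line with the numbering of the list, which starts a new block at rules 16, 31 and 46.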

In this manner, the method generates all the required syntax rules for each of the different search features.

The five sets of rules that are listed above illustrate what is required for a four-word segment that is followed by a second four-word segment. In this study, these rules were stored in a Microsoft Word file called ‘Master4x4’, which has a total of 85 rules. Similar files were also created for the other segment combinations. For example, the rules for a seven-word segment that is followed by a second six-word segment were stored in a file called ‘Master7x6’, which has over 300 rules. The largest file that was required for this study, the ‘Master7x14’ file, contains over 1400 rules.

If the maximum segment length is fourteen words, as it was in this study, then there are potentially 196 (i.e. 14x14) different sets of rules, or ‘Master’ files. However, some of these files may not be required. For example, there were no segments with thirteen words in the Pastoral Epistles so there was no need to generate the files for these combinations.

310 There are four words in the second segment, so every word in the first segment has 15 potential combinations with the words in the second segment.
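The count of potential ‘Master’ files can be checked by enumerating the file names. The Python sketch below follows the naming pattern of ‘Master4x4’, ‘Master7x6’ and ‘Master7x14’ (first-segment length, then second-segment length). The function name is hypothetical, as is the assumption that segment lengths run from 1 to 14 and that a file is wanted for every pair of observed lengths; the study may have generated only the pairs that actually occur next to each other.

```python
def master_file_names(observed_lengths, max_length=14):
    """List 'Master<first>x<second>' names for every pair of segment
    lengths that is both allowed and actually observed."""
    lengths = [n for n in range(1, max_length + 1) if n in observed_lengths]
    return [f"Master{first}x{second}"
            for first in lengths for second in lengths]


all_names = master_file_names(set(range(1, 15)))
# Drop length 13, which the study reports as absent from the Pastoral Epistles.
reduced_names = master_file_names(set(range(1, 15)) - {13})
```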

The generation of these files took only a few days because the rules in each file are similar. For example, the rules in the ‘Master7x6’ file are a simplified version of the rules in the ‘Master7x14’ file. This process is similar to writing a set of rules/definitions for a PROLOG program.311 The main difference is that the rules are subsequently instantiated and scanned (see below) using the Accordance program, rather than a PROLOG interpreter, in order to avoid the labor-intensive task of creating the databases of parsed source texts. The following step, Linking and Scoring, outlines how the rules are instantiated and scanned/tested.