Text Division - Biblical Hebrew

In chap. 1 we used, without comment, some examples wherein the basic units for syntactic analysis were parts of words, a word being conventionally defined as a continuous sequence of Hebrew consonants and vowels bounded by spacers: (white) space, line end, verse-end marker (sof pasuq), or dash (maqqep). In general, our basic unit of syntactic analysis is the segment. A segment can be a word (“a free morpheme”), a part of a word (“a bound morpheme”), or a sequence of words. (For the place of morphology in our work, see appendix §A2.3.1.)
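The conventional definition of a word given above can be sketched in a few lines of Python. This is only an illustration, not the authors' software: it assumes Unicode-encoded text in which maqqep is U+05BE and sof pasuq is U+05C3, and the name `split_words` is hypothetical.

```python
import re

# Spacers named in the text: white space (including line ends),
# maqqep (U+05BE), and sof pasuq (U+05C3).
SPACERS = re.compile(r"[\s\u05BE\u05C3]+")


def split_words(text: str) -> list[str]:
    """Split a Hebrew string into conventional words at the spacers."""
    return [w for w in SPACERS.split(text) if w]


# Unpointed sample: two words joined by maqqep, a space, a third word,
# and a closing sof pasuq.
sample = "\u05D1\u05D9\u05EA\u05BE\u05D0\u05DC \u05D9\u05D5\u05DD\u05C3"
print(len(split_words(sample)))  # 3
```

Note that this yields words, not segments: the two halves of a maqqep-joined name come out as separate words and would have to be ligatured in a later step.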

We dissect prepositions and pronoun suffixes from nouns. For example, above (§1.3.1) we analyzed וֹי ְר ִפּ ִמ into three segments וֹ + ′י ְר ִפּ + ′ ִמ ‘from’ + ‘fruit’ + ‘its’. We also detach definite articles and conjunctions. For example, םוֹיּ ַבּ is made up of three segments:

םוֹיּ + ַ + בּ ‘in’ + ‘the’ + ‘day’

while םוּשׂ ְפּ ְתִיּ ַו consists of these three segments:

ם + וּשׂ ְפּ ְתִיּ + ַו ‘and’ + ‘they-seized’ + ‘them’

This example shows that we separate one-consonant segments (the conjunction ַו) and object pronoun suffixes from verbs but not the subject pronoun affixes of verbs. Segments are also formed by the ligaturing of words, almost invariably forming proper nouns. For example, in our analysis, we ligature the two-word sequence ל ֵא־תי ֵבּ to form a single segment glossed ‘Bethel’.

It should be clear that we do not accept the lexicalist hypothesis: “the view that rules of syntax may not refer to elements smaller than a single word.”¹ As is often the case in Anglo-American linguistics, this hypothesis is “English-o-centric.” Rejection of the lexicalist hypothesis is warranted for Biblical Hebrew because its words commonly consist of sequences of morphemes, each having a distinctive role in syntax. For example, were we to treat הּ ָמּ ִא ְלוּ in Gen 24:53 as a single constituent, important syntactical structure would be hidden. We therefore divide this word into four segments:

הָּ + ′ מּ ִא + ′ ְל + ′וּ ‘and’ + ‘to’ + ‘mother’ + ‘her’

We divide 1,258 four-segment words, 20,759 three-segment words, and 120,007 two-segment words into their segments.
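As a rough illustration of what these counts imply, the sketch below represents each example word as a tuple of segment glosses and then totals the figures just quoted. The representation by glosses is an assumption made for readability; it is not the authors' data format.

```python
from collections import Counter

# Each word is a tuple of segment glosses, copied from the examples above.
words = [
    ("from", "fruit", "its"),
    ("in", "the", "day"),
    ("and", "they-seized", "them"),
    ("and", "to", "mother", "her"),
]
by_length = Counter(len(w) for w in words)

# Corpus-wide counts quoted in the text: segments per word -> number of words.
reported = {4: 1_258, 3: 20_759, 2: 120_007}
multisegment_words = sum(reported.values())
segments_in_them = sum(n * count for n, count in reported.items())

print(by_length)           # three 3-segment words, one 4-segment word
print(multisegment_words)  # 142024
print(segments_in_them)    # 307323
```

On these figures, 142,024 words are divided, and together they yield 307,323 segments.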

1. Robert L. Trask, A Dictionary of Grammatical Terms in Linguistics (London: Routledge, 1993) 157.

2.1 Words, Segments, and Ligatures

Having made the global decision to segment the words making up the biblical text, we were left with a large number of local decisions. We needed to choose which words should be segmented or joined and how. Making these choices required repeatedly answering three questions:

1. Segmentation. Should a given word be segmented or left whole?

2. Selection of cut-point. Where should a multisegment word be sliced?

3. Ligaturing. When should a sequence of words be ligatured?

2.1.1 Segmentation

An example from English may help readers understand what the “segment / no segment” decision involves. Consider the word “tomorrow.” This noun evolved from a prepositional phrase.² It is instructive to examine this word in Exod 17:9 as printed in three English versions. The KJV uses the term “to morrow,” the ASV has “to-morrow,” and the RSV has “tomorrow.” Were we to perform a syntactic analysis of the verse in English, how would we represent “tomorrow”: as two segments or as one? The answer depends on the word’s lexical status when the version was published.

Our decisions regarding word segmentation are mostly intuitive, with a few exceptions.³ Our segmentation of one sort of word deserves comment. We always segment יֵנ ְפ ִל into יֵנ ְפ + ′ ִל ‘to’ + ‘face-of’. We do so because we, and most writers of Biblical Hebrew grammars, view יֵנ ְפ ִל as a preposition “prefixed to the substantive םיִנ ָפּ in the construct state.”⁴ The English translation ‘before’ has lexicalized the compound.⁵ Compare “in spite of,” “instead of,” and “in place of.” The preposition יֵנ ְפ ִל is not the only word of this kind in Biblical Hebrew.

2.1.2 Selection of Cut-Point

Early in our work, we segmented the texts ad libitum, that is, “by the seats-of-our-pants.” This approach was indescribably tedious, and the results were unacceptably inconsistent. We therefore enunciated a set of sequentially applied rules that were at first manually implemented. This manual approach increased the tedium and, surprisingly, led to results that were not particularly consistent.

2. E. C. Traugott, “Grammaticalization and Lexicalization,” in Concise Encyclopedia of Grammatical Categories (ed. K. Brown and J. Miller; Oxford: Elsevier, 1999) 182.

3. For specifics, see F. I. Andersen and A. D. Forbes, A Linguistic Concordance of Ruth and Jonah (Wooster, OH: Biblical Research Associates, 1976) 23–26.

4. Bill T. Arnold and John H. Choi, A Guide to Biblical Hebrew Syntax (Cambridge: Cambridge University Press, 2003) 115. Other works referring to the complex nature of יֵנ ְפ ִל include GKC, 377; R. J. Williams, Williams’ Hebrew Syntax (3rd ed.; Toronto: University of Toronto Press, 2007) 135–36; Bruce K. Waltke and M. O’Connor, An Introduction to Biblical Hebrew Syntax (hereafter IBHS; Winona Lake, IN: Eisenbrauns, 1990) 221. Some treat יֵנ ְפ ִל as unitary, as in C. H. J. van der Merwe, Jackie A. Naudé, and Jan H. Kroeze, A Biblical Hebrew Reference Grammar (Biblical Languages: Hebrew 3; Sheffield: Sheffield Academic Press, 1999) 287.

5. Lexicalization is “the process or result of assigning to a word or phrase the status of a lexeme,” according to R. R. K. Hartmann and Gregory James, Dictionary of Lexicography (London: Routledge, 1998) 84.

Our next gambit was to have the computer enforce consistency. The computer was programmed to move an arrow along through the text, pausing at positions where a word cut might be made, and waiting until the human analyst accepted or rejected the proposed segmentation. If the segmentation was accepted, then an arrow was left in position showing the cut, and the text file was adjusted accordingly. Figure 2.1, a black-and-white photo of our computer monitor screen, was made in early 1971.⁶ It shows the opening words of the book of Ruth with the segmenting arrows in place.

Figure 2.1.

The pair of arrows at the beginning of the third line flanks the definite article represented by “ ָ .”
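The accept-or-reject workflow described above might be sketched as follows. This is a minimal illustration rather than a reconstruction of the 1971 program: the function name `segment_word`, the `accept` callback that stands in for the human analyst, and the ASCII transliteration are all hypothetical.

```python
from typing import Callable


def segment_word(
    word: str,
    candidates: list[int],
    accept: Callable[[str, int], bool],
) -> list[str]:
    """Offer each candidate cut position to the analyst; cut where accepted."""
    cuts = [
        i for i in sorted(set(candidates))
        if 0 < i < len(word) and accept(word, i)
    ]
    pieces, start = [], 0
    for i in cuts:
        pieces.append(word[start:i])
        start = i
    pieces.append(word[start:])
    return pieces


# Hypothetical transliteration of 'and' + 'they-seized' + 'them'.
# The stand-in analyst accepts the cuts at offsets 2 and 10 and rejects 5.
accepted = {2, 10}
print(segment_word("wayyitpesum", [2, 5, 10], lambda w, i: i in accepted))
# ['wa', 'yyitpesu', 'm']
```

The point of the design is that the machine proposes every position consistently, while the decision itself stays with the analyst.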

Our final and most accurate approach to segmentation involved “bootstrapping.” The computer used the segmentation manifested in a stretch of correctly segmented text to segment new, previously unseen text. There were arrow-insertion criteria and arrow-exclusion criteria. The new results were then manually corrected, and the newly checked results were used to segment the next block of text. And so on. This approach yielded quite high-quality results that were then perfected in the process of building a computerized, lemmatized dictionary for the Hebrew Bible.
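The bootstrapping cycle can be pictured with the sketch below. It makes a strong simplifying assumption: a plain lookup of previously checked word forms stands in for the arrow-insertion and arrow-exclusion criteria, which are not specified here. The names `bootstrap` and `review` and the ASCII transliterations are hypothetical.

```python
def bootstrap(blocks, seed, review):
    """Segment successive blocks of words, learning from each checked block.

    blocks -- list of word lists, processed in order
    seed   -- dict: word -> tuple of segments, taken from checked text
    review -- callable(word, proposal) -> corrected tuple (the human check)
    """
    lexicon = dict(seed)
    results, corrections = [], []
    for block in blocks:
        # Propose known segmentations; leave unseen words whole.
        proposed = [lexicon.get(w, (w,)) for w in block]
        checked = [review(w, p) for w, p in zip(block, proposed)]
        corrections.append(sum(p != c for p, c in zip(proposed, checked)))
        lexicon.update(zip(block, checked))  # checked results feed the next block
        results.append(checked)
    return results, corrections


# Hypothetical transliterations of two example words from this chapter.
gold = {"bayyom": ("b", "a", "yyom"), "mippiryo": ("mi", "ppiry", "o")}
seed = {"bayyom": gold["bayyom"]}
blocks = [["bayyom", "mippiryo"], ["mippiryo", "bayyom"]]
results, corrections = bootstrap(blocks, seed, lambda w, p: gold.get(w, p))
print(corrections)  # [1, 0]
```

In this toy run the first block needs one manual correction and the second needs none, which is the effect the procedure aims at: each checked block reduces the correction effort for the next.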

2.1.3 Ligaturing

We deal with ligatured words every day. Consider “New York.” This proper noun has been lexicalized for a long time and so, in our system, would be declared to be a single segment. In the case of Biblical Hebrew, we have declared hundreds of proper nouns, plus four common nouns and one subordinating conjunction, to be lexicalized. We have joined adjacent words to produce 382 distinct segments. Two-thirds of these appear only once in the MT. The most frequently occurring ligatured item is the subordinating conjunction ם ִא י ִכּ ‘except’ (120×). The second most frequent consists of forms of ל ֵא־תי ֵבּ ‘Bethel’ (72×), and the third is ם ֶח ֶל תי ֵבּ ‘Bethlehem’ (41×). By our analysis, Biblical Hebrew contains almost one thousand ligatured segments.
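A ligaturing pass over a list of words might look like the sketch below. It assumes that only two-word sequences are joined and that a fixed lookup table decides the matter; the table, the underscore-joined labels, and the ASCII transliterations are hypothetical. A lookup of this kind ignores context, so it is only a first approximation to the decisions described above.

```python
# Hypothetical table of lexicalized two-word sequences (transliterated).
LIGATURES = {
    ("ki", "im"): "ki_im",          # 'except'
    ("bet", "el"): "bet_el",        # 'Bethel'
    ("bet", "lehem"): "bet_lehem",  # 'Bethlehem'
}


def ligature(words: list[str]) -> list[str]:
    """Join adjacent word pairs that are listed as lexicalized."""
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2  # both words are consumed by the ligature
        else:
            out.append(words[i])
            i += 1
    return out


print(ligature(["wayyelek", "bet", "el"]))  # ['wayyelek', 'bet_el']
```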

2.2 Chunking the Text into Clauses

2.2.1 Rule-Based Clause Onset Detection

For computational parsing, the largest appropriate units are clauses. In our approach, a clause typically consists of a predicator and the constituents that accompany it, a predicator being a verbal or quasiverbal constituent that specifies equivalence, activity, state, or process. A single finite verb can be a whole clause.⁷ We have explained elsewhere just how we determined clause boundaries.⁸ Here we briefly provide the flavor of that work.

6. The pin-cushion distortion in the photo is due to the monitor screen’s non-planarity. Four decades later, it is amusing to recall how inordinately pleased we were with our pathetic little unkerned Hebrew stick characters!

7. Even a single constituent can form a “verbless” (or “nominal”) clause lacking a predicator. See chap. 19.

8. F. I. Andersen and A. D. Forbes, “On Marking Clause Boundaries,” in Bible et Informatique: Interprétation, Herméneutique, Compétence Informatique (Paris: Honoré Champion, 1992) 181–202. On the general question, see Marjo C. A. Korpel and Josef Oesch, eds., Unit Delimitation in Biblical Hebrew and Northwest Semitic Literature (Pericope 4; Assen: Van Gorcum, 2003).

We relied on a dozen ordered rules for detecting clause onset, clause-offset detection being a more difficult problem. But, of course, a true main clause onset must coincide with a true offset, setting aside the first clause in the Bible. Our rules are highly heuristic. To provide a sense of our approach, we quote four of the rules from our referenced essay, tell how often they were used (“how often they fired”), and tell their individual accuracies across Biblical Hebrew.

1. Rule A. “The quoting formula רֹמא ֵל is usually followed immediately by a quoted speech.” This rule fired 939 times, 929 of them correctly, a 99% true positive rate. There were 10 places where a speech did not immediately follow the quoting formula.

2. Rule B. “A waw-sequential construction usually begins a new clause.” In 20,691 of its 20,907 firings a new clause begins—a 99% true positive rate.

3. Rule F. “When the first word in a verse is a predicator, it is likely that it begins a . . . clause.” This rule fired 2,520 times, 17 incorrectly—a 97% true positive rate.

4. Rule L. “Each new chapter probably begins a new clause.” This rule fired 929 times, once “incorrectly,” a 99.9% true positive rate. The verse prior to Jer 3:1 ends with a complete and well-formed clause. Then Jer 3:1 begins with a stranded רֹמא ֵל.
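The true positive rates in the list above follow directly from the firing counts: the rate is the number of correct firings divided by the total number of firings. The short check below recomputes them from the counts as printed. The function name is hypothetical.

```python
def true_positive_rate(fired: int, correct: int) -> float:
    """Share of a rule's firings that marked a real clause onset."""
    return correct / fired


# (total firings, correct firings), taken from the four rules quoted above.
counts = {
    "A": (939, 929),
    "B": (20_907, 20_691),
    "F": (2_520, 2_520 - 17),
    "L": (929, 929 - 1),
}
rates = {name: true_positive_rate(f, c) for name, (f, c) in counts.items()}
for name, rate in rates.items():
    print(f"Rule {name}: {rate:.1%}")
# Rule A: 98.9%
# Rule B: 99.0%
# Rule F: 99.3%
# Rule L: 99.9%
```

Rules A, B, and L agree with the rounded percentages quoted. For Rule F, the counts as printed give about 99.3% rather than the quoted 97%, so one of the two printed figures for that rule may be in error.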

Our suite of rules correctly found almost two-thirds of the clause onsets with very few false onset detections but with a fair number of fragmentary “clauses.” Completion of the task of clause isolation required careful human over-reading.
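An ordered rule suite of this kind might be organized as in the sketch below, in which each word is tested against the rules in sequence and the first rule that fires marks a clause onset. Only the four quoted rules are modeled, and their relative order here simply follows their letters; the `Token` fields are hypothetical features that an earlier tagging step is assumed to supply.

```python
from dataclasses import dataclass


@dataclass
class Token:
    gloss: str
    after_quoting_formula: bool = False  # previous word is the quoting formula
    waw_sequential: bool = False
    verse_initial: bool = False
    predicator: bool = False
    chapter_initial: bool = False


# Ordered rules: (name, test). Earlier rules take precedence.
RULES = [
    ("A", lambda t: t.after_quoting_formula),
    ("B", lambda t: t.waw_sequential),
    ("F", lambda t: t.verse_initial and t.predicator),
    ("L", lambda t: t.chapter_initial),
]


def detect_onsets(tokens: list[Token]) -> list[tuple[int, str]]:
    """Return (position, rule name) for each word judged to begin a clause."""
    onsets = []
    for i, tok in enumerate(tokens):
        for name, fires in RULES:
            if fires(tok):
                onsets.append((i, name))
                break  # first matching rule wins
    return onsets


tokens = [
    Token("and-he-spoke", waw_sequential=True, verse_initial=True, predicator=True),
    Token("Moses"),
    Token("saying"),
    Token("go-out", after_quoting_formula=True, predicator=True),
]
print(detect_onsets(tokens))  # [(0, 'B'), (3, 'A')]
```

Recording which rule fired at each position is what makes it possible to tally per-rule accuracies of the kind reported above.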

2.2.2 Clause-Boundary Ambiguity

Clause-onset position can be ambiguous. Consider three clauses from Exod 17:9:

ה ָע ְב ִגּ ַה שׁאֹר־ל ַע ב ָצּ ִנ י ִכֹנ ָא ר ָח ָמ ק ֵל ָמ ֲע ַבּ ם ֵח ָלּ ִה א ֵצ ְו

The NJPS reads:

“. . . go out and do battle with Amalek. Tomorrow I will station myself on the top of the hill . . .”

But it might also be rendered:⁹

“. . . go out and do battle with Amalek tomorrow. I will station myself on the top of the hill . . .”

Whether a clause boundary should occur before or after “tomorrow” is formally ambiguous. In circumstances of this sort, we have the choice of either somehow representing both clause divisions or selecting and representing the more compelling division. We decided to provide mechanisms for handling the former option but, in this first pass at analysis, represented only the latter. This choice, however, leaves us to decide which of the several options is preferred. For the present example, we follow the cantillations, as do all seven English Bibles consulted, and divide the clauses before “tomorrow.” We let unanimity rule, mindful that neither cantillations nor scholarly consensus is always correct.

9. Bear in mind that we have opted not to be constrained by the cantillations. The athnaḥ in Exod 17:9 encodes a pause before “tomorrow,” indicating that the Masoretes divided the verbs into two clauses at this point. See further Emanuel Tov, Textual Criticism of the Hebrew Bible (2nd ed.; Minneapolis: Fortress, 2001) 68–69.

2.3 Brief Summary

Segments. Our basic units of analysis, segments, are whole words (“free morphemes”), parts of words (“bound morphemes”), or ligatured words (“lexicalized phrases”). Delimitation of segments relied on computational “bootstrapping” and consistency enforcement followed by correction by an expert.

Clause Delimitation. Delimitation of clause boundaries involved computational application of a set of heuristic clause-onset rules followed by correction by an expert.
