Chapter 2 The Nature of Authentic and Simplified Texts
2.4 Comparative analysis of authentic and simplified texts
2.4.1 The lexical profile statistics of simplified and authentic samples
It may prove useful to begin with the opening paragraphs of both the simplified and the authentic versions of Oliver Twist and Great Expectations. The examples are restricted to the first paragraph of each simplified version, and the content of each is then matched with the corresponding passage in its authentic counterpart. For this comparison, the four quoted passages will be saved as four individual .TXT files and compared using AntWordProfiler 1.4.0w (Anthony 2013). (The significance of the coloured text in the quoted passages is explained below.)
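The counts reported in the tables that follow (tokens, types and groups per base-word list) can be illustrated with a minimal sketch of how a lexical profiler of this kind tallies a text. This is not AntWordProfiler's own implementation: the two tiny word lists, their level assignments, the letters-only tokenisation rule and the handling of unlisted words are all assumptions made for illustration, and the Group column is assumed to count distinct word families (headwords).

```python
import re

# Hypothetical, heavily truncated base-word lists. Each list maps a word form
# to the headword of its word family. Real lists are far larger, and these
# level assignments are for illustration only.
BASE_LISTS = {
    "level1": {
        "walk": "walk", "walks": "walk", "walked": "walk", "walking": "walk",
        "and": "and", "she": "she", "is": "be", "a": "a",
    },
    "level2": {
        "stimulate": "stimulate", "stimulation": "stimulate",
        "stimulative": "stimulate",
    },
}


def profile(text):
    """Return {level: (tokens, types, groups)} for a text.

    Tokens are running words, types are distinct word forms, and groups are
    distinct word families (headwords). Words found in no list are reported
    under "non-level".
    """
    # Simplifying assumption: a token is any run of letters; apostrophes,
    # hyphens and digits act as separators.
    tokens = re.findall(r"[a-z]+", text.lower())
    by_level = {}
    for token in tokens:
        # Assign each token to the first list that contains it.
        level = next(
            (name for name, words in BASE_LISTS.items() if token in words),
            "non-level",
        )
        by_level.setdefault(level, []).append(token)

    result = {}
    for level, level_tokens in by_level.items():
        types = set(level_tokens)
        if level == "non-level":
            groups = types  # no family information for unlisted words
        else:
            groups = {BASE_LISTS[level][t] for t in types}
        result[level] = (len(level_tokens), len(types), len(groups))
    return result


print(profile("Pip walks and she walks; walking is a stimulation."))
# → {'non-level': (1, 1, 1), 'level1': (7, 6, 5), 'level2': (1, 1, 1)}
```

In the level1 row the three counts diverge: walks occurs twice (seven tokens but six types), and walks and walking belong to the same family (six types but five groups).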
Oliver Twist
The first paragraph of the simplified version of Oliver Twist reads:
Oliver Twist was born in a workhouse, and when he arrived in this hard world, it was very doubtful whether he would live beyond the first three minutes. He lay on a hard little bed and struggled to start breathing.
(Oliver Twist, Oxford Bookworms Library, Stage 6, Chap. 1)
Table 2.4 Oliver Twist – lexical profile statistics for simplified paragraph
File Token Type Group Token%
basewrd1.txt 35 28 28 87.5%
basewrd2.txt 3 3 3 7.5%
basewrd10.txt 1 1 1 2.5%
basewrd31.txt 1 1 1 2.5%
Total 40 33 33 100%
The corresponding authentic text reads:
Among other public buildings in a certain town, which for many reasons it will be prudent to refrain from mentioning, and to which I will assign no fictitious name, there is one anciently common to most towns, great or small: to wit, a workhouse; and in this workhouse was born; on a day and date which I need not trouble myself to repeat, inasmuch as it can be of no possible consequence to the reader, in this stage of the business at all events; the item of mortality whose name is prefixed to the head of this chapter.
For a long time after it was ushered into this world of sorrow and trouble, by the parish surgeon, it remained a matter of considerable doubt whether the child would survive to bear any name at all; in which case it is somewhat more than probable that these memoirs would never have appeared; or, if they had, that being comprised within a couple of pages, they would have possessed the inestimable merit of being the most concise and faithful specimen of biography, extant in the literature of any age or country.
(Oliver Twist, Oxford World’s Classics, Chap. 1)
Table 2.5 Oliver Twist – lexical profile statistics for authentic paragraph
File Token Type Group Token%
basewrd1.txt 153 83 76 80.95%
basewrd2.txt 10 10 10 5.29%
basewrd3.txt 8 8 8 4.23%
basewrd4.txt 5 5 5 2.65%
basewrd5.txt 4 4 4 2.12%
basewrd6.txt 1 1 1 0.53%
basewrd8.txt 1 1 1 0.53%
basewrd9.txt 2 2 2 1.06%
basewrd10.txt 2 1 1 1.06%
basewrd11.txt 1 1 1 0.53%
basewrd13.txt 1 1 1 0.53%
basewrd16.txt 1 1 1 0.53%
Total 189 118 111 100%
Great Expectations
The first paragraph of the simplified version of Great Expectations reads:
My father’s family name being Pirrip, and my Christian name Philip, my infant tongue could make of both names nothing longer than Pip. So I called myself Pip, and came to be called Pip. Having lost both my parents in my infancy, I was brought up by my sister, Mrs Joe Gargery, who married the local blacksmith.
(Great Expectations, Penguin Readers, Level 6, Chap. 1)
Table 2.6 Great Expectations – lexical profile statistics for simplified paragraph
File Token Type Group Token%
basewrd1.txt 46 36 30 79.31%
basewrd2.txt 1 1 1 1.72%
basewrd3.txt 2 2 1 3.45%
basewrd8.txt 1 1 1 1.72%
basewrd31.txt 3 3 3 5.17%
Non-level list words 2 2 2 3.45%
Total 58 46 39 100%
The corresponding authentic text reads:
My father’s family name being Pirrip, and my Christian name Philip, my infant tongue could make of both names nothing longer or more explicit than Pip. So, I called myself Pip, and came to be called Pip.
I give Pirrip as my father’s family name, on the authority of his tombstone and my sister – Mrs Joe Gargery, who married the blacksmith. As I never saw my father or my mother, and never saw any likeness of either of them (for their days were long before the days of photographs), my first fancies regarding what they were like, were unreasonably derived from their tombstones. The shape of the letters on my father’s, gave me an odd idea that he was a square, stout, dark man, with curly black hair. From the character and turn of the inscription, ‘Also Georgiana Wife of the Above,’ I drew a childish conclusion that my mother was freckled and sickly. To five little stone lozenges, each about a foot and a half long, which were arranged in a neat row beside their grave, and were sacred to the memory of five little brothers of mine – who gave up trying to get a living, exceedingly early in that universal struggle – I am indebted for a belief I religiously entertained that they had all been born on their backs with their hands in their trousers-pockets, and had never taken them out in this state of existence.
(Great Expectations, Penguin Classics, Vol. 1, Chap. 1)
Table 2.7 Great Expectations – lexical profile statistics for authentic paragraph
File Token Type Group Token%
basewrd1.txt 200 106 89 82.64%
basewrd2.txt 11 11 11 4.55%
basewrd3.txt 11 11 11 4.55%
basewrd4.txt 2 2 2 0.83%
basewrd6.txt 2 2 2 0.83%
basewrd7.txt 1 1 1 0.41%
basewrd8.txt 2 2 2 0.83%
basewrd11.txt 3 1 1 1.24%
basewrd12.txt 1 1 1 0.41%
basewrd31.txt 4 4 4 1.65%
basewrd33.txt 2 2 1 0.83%
Non-level list words 3 2 2 1.24%
Total 242 145 127 100%
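The Token% column appears to be each list's token count divided by the paragraph's total token count. As a check, the short sketch below recomputes the percentages in Table 2.7 from its Token column; the counts are copied from the table, and the function name is illustrative. Because each share is rounded to two decimal places, the rounded values need not sum to exactly 100%.

```python
# Token counts per base-word list, copied from Table 2.7
# (Great Expectations, authentic paragraph).
TABLE_2_7_TOKENS = {
    "basewrd1.txt": 200,
    "basewrd2.txt": 11,
    "basewrd3.txt": 11,
    "basewrd4.txt": 2,
    "basewrd6.txt": 2,
    "basewrd7.txt": 1,
    "basewrd8.txt": 2,
    "basewrd11.txt": 3,
    "basewrd12.txt": 1,
    "basewrd31.txt": 4,
    "basewrd33.txt": 2,
    "Non-level list words": 3,
}


def token_percentages(counts):
    """Return the total token count and each list's share as a percentage."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, shares


total, shares = token_percentages(TABLE_2_7_TOKENS)
print(total)                   # → 242
print(shares["basewrd1.txt"])  # → 82.64
```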
To examine the vocabulary load of these four short passages, the concept of the ‘word family’ must first be introduced. Understanding what a word family refers to requires recalling the definition of the headword given in Chapter 1: a word in its basic form that can be extended to other related forms. Bauer and Nation (1993: 253; emphasis added) state that a word family ‘consists of a base word and all its derived and inflected forms that can be understood by a learner without having to learn each form separately’. The word family therefore refers to ‘a headword, its inflected forms, and its closely related derived forms’ (Nation 2001: 8; emphasis added). In sum, it comprises three categories: the headword (also referred to as the base, root or stem word), its inflections and its derivations. Schmitt (2000) clarifies the distinction between inflection and derivation. Inflected forms of a headword result from adding affixes for grammatical purposes; for example, the base form (headword) walk becomes walked, walking and walks. These three forms, together with the headword, belong to the same part of speech: they are all verbs. If adding affixes to the headword ‘change[s] the word class’, then ‘the result is a derivative’, as when the derivative forms stimulative and stimulation are formed from the headword stimulate (Schmitt 2000: 2). (Note that although the terms lexeme, headword and word family are used interchangeably in this thesis to refer to the same concept, each belongs to a particular field of study. The lexeme is used in linguistics (semantics) to describe a unit of lexical meaning that exists irrespective of how many inflectional endings it may have or how many words it comprises; the headword is used in second language acquisition (SLA) to denote a word under which a series of dictionary entries appear; and the word family is used in corpus linguistics to describe a word in its base form together with any inflected or derived forms created through affixation.)
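The word-class criterion described above can be expressed as a simple decision rule. The sketch below covers only the criterion as quoted from Schmitt: it assumes each form arrives already tagged for part of speech (the tags here are assigned by hand, with walking treated as a verb form), and the class and function names are illustrative. It would misclassify derivatives that keep the word class, such as unhappy from happy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordForm:
    word: str
    pos: str  # part of speech, e.g. "verb", "noun", "adjective"


def classify(headword: WordForm, form: WordForm) -> str:
    """Classify a related form by the word-class criterion.

    Same part of speech as the headword -> inflection;
    different part of speech -> derivative.
    """
    if form.word == headword.word:
        return "headword"
    return "inflection" if form.pos == headword.pos else "derivative"


walk = WordForm("walk", "verb")
stimulate = WordForm("stimulate", "verb")
for head, form in [
    (walk, WordForm("walked", "verb")),
    (walk, WordForm("walking", "verb")),
    (walk, WordForm("walks", "verb")),
    (stimulate, WordForm("stimulative", "adjective")),
    (stimulate, WordForm("stimulation", "noun")),
]:
    print(form.word, classify(head, form))
```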
The difference between the lemma and the headword (basic form) is that a lemma is restricted to a single part of speech. When discussing word families, by contrast, a headword includes both its inflected and derived forms ‘even if the part of speech is not the same’ (Nation 2004: 6; emphasis added), as stated above. As a result, a headword, as Nation (2006) suggests, can include more than one lemma. For example, the headword abbreviate has the following family members: abbreviate, abbreviates, abbreviated, abbreviating, abbreviation, abbreviations. If these were treated as lemmas, abbreviate and abbreviation would represent two different lemmas: the first comprises the four verb forms abbreviate, abbreviates, abbreviated and abbreviating, and the second the two noun forms abbreviation and abbreviations.
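The contrast between one word family and its two lemmas can be made concrete with a small grouping step. This is a minimal sketch, assuming each family member is already tagged for part of speech (the tags here are assigned by hand) and that the mapping from part of speech to lemma name is supplied as a hypothetical input; it does not reproduce any particular lemmatiser.

```python
from collections import defaultdict

# The word family of the headword "abbreviate", as listed in the text, with
# part-of-speech tags assigned by hand for illustration.
ABBREVIATE_FAMILY = [
    ("abbreviate", "verb"),
    ("abbreviates", "verb"),
    ("abbreviated", "verb"),
    ("abbreviating", "verb"),
    ("abbreviation", "noun"),
    ("abbreviations", "noun"),
]


def group_into_lemmas(family, lemma_for_pos):
    """Split a word family into lemmas, one per part of speech.

    lemma_for_pos maps each part of speech to the base form naming its lemma.
    """
    lemmas = defaultdict(list)
    for form, pos in family:
        lemmas[lemma_for_pos[pos]].append(form)
    return dict(lemmas)


lemmas = group_into_lemmas(
    ABBREVIATE_FAMILY, {"verb": "abbreviate", "noun": "abbreviation"}
)
print(len(ABBREVIATE_FAMILY), "family members,", len(lemmas), "lemmas")
for lemma, forms in lemmas.items():
    print(lemma, forms)
```

Counted as a word family, the six forms form one unit; counted as lemmas, they form two units of four and two members.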