DESIGN OF THE STUDY - The Effect of Grammatical Error on Holistic Scoring. in the Essays of ESL

The objective of this study was to determine whether grammatical error and holistic scores are correlated on departmental final essay exams of NNS students in the composition program at Kean University. The study also sought to establish whether a high incidence of grammatical error is correlated with failing holistic exam scores and, conversely, whether fewer grammatical errors are correlated with a higher incidence of passing scores.

Setting

This study was carried out at Kean University, a state university located in Union, New Jersey. Kean’s predominantly commuter student body comes from a variety of social, economic, and linguistic backgrounds and is among the most multicultural in New Jersey (Kean University, 2009-2010).

Admission to Kean is relatively uncompetitive, with an acceptance rate of approximately seventy percent (College Board, 2011). The mean SAT score for students matriculating into Kean in Fall 2009 was 1378 (Kean University, 2009-2010) out of a possible score of 2400, a figure that was well below the statewide average score for that year of 1505 (Rispoli, 2009).

Many of the students who attend Kean can be characterized as academically disadvantaged (Kean University, 2007), a fact reflected in the large number of students who take remedial courses as preparation for regular college courses. In Fall 2009, approximately two-thirds of all first-time, full-time Kean students were enrolled in remediation courses (Kean University, 2009-2010). Of the total undergraduate population, roughly eleven percent was enrolled in one or more remedial courses during the same time period (Kean University, 2009-2010).

Due to a variety of factors, graduation from Kean rarely occurs within four years. According to data from the United States Department of Education, in 2008, only sixteen percent of full-time Kean University freshmen graduated within four years, while only forty-four percent succeeded in graduating within six years (Heyboer, 2011).

Sources of Data

The data for this study came from a sample of essays written by students in the ESL Program at Kean. The ESL Program, housed in the English Department, serves NNSs who have been accepted to degree programs at the University but still require additional English instruction to be able to succeed in their studies. During the Spring 2011 semester, the ESL Program had 246 students (Appendix A), roughly three-quarters of whom were Hispanic (Thompson, 2011). Fifty-four percent of these students were graduates of foreign high schools, while roughly 38 percent were graduates of high schools within the United States. An estimated 3 to 4 percent were international students. The vast majority, roughly 93 percent, attended the University full-time (Thompson, 2011).

The essays used in the sample came from the final exams of students in College Composition for Nonnative Speakers I, the first course in a required two-part sequence in academic expository writing. Several sections of this course are offered every semester, the majority during morning or afternoon hours. Each is taught by a different full-time or adjunct instructor from the ESL program. Instructors teach students how to write essays using different rhetorical modes, or developmental structures of organization (Alamargot, 2009). At the end of every semester, students are required to demonstrate their writing skills in a departmental final essay exam consisting of two essay prompts, based on assigned readings, written by each instructor and subject to approval by the Program director. Students select one of the questions and have two hours to answer it in an essay. The exam questions state, directly or indirectly, the rhetorical mode students should use in their responding essays; most often the prompts demand responses written as persuasive, compare-contrast, cause and effect, classification, definition, or process analysis essays. To pass the exam, students are expected to achieve a holistic score of 4 on their essays (see Appendix B and Table 1).

Selection of the Sample

Since the objective of the study was to determine whether grammatical error was correlated with holistic scores in general, and with passing and failing holistic scores in particular, in the essays of students in College Composition for Nonnative Speakers I, a stratified random sample consisting of an equal number of passing and failing holistically scored final essay exams was used. The exams came from the Spring 2011 semester of this course, which was the most recent data available.

To be included in the sample, essay exams had to meet two criteria. First, essays had to be at least three hundred words in length, exclusive of quoted material. The length requirement was adopted to ensure there was enough text to analyze for the objective scoring. Second, essays had to have been scored by only two raters. Additional raters are used only when there is divergence in the core numerical score (i.e., disagreement as to whether the essay earned a 3, which is theoretically failing, or a 4, which is passing). Any essay that was graded by more than two raters was therefore eliminated from the sample because of the implicit disagreement about exam quality. It was assumed that the study’s findings would be more meaningful when applied to data that were not subject to dispute.

Table 1

Description of Characteristics of Passing Essays for Students in ENG 1300, College Composition for Nonnative Speakers I

Score of 4 (Place in/Pass to ENG 1430)

The meaning is consistently clear. The writer constructs and follows a thesis statement appropriate in scope to an argument. Logical organization, with points appropriately sequenced, is evident. The writer develops his/her position on an issue in a series of well-constructed paragraphs supported by sufficient detail and explanation and developed through the appropriate use of multiple methods. The writer is persuasive and deals with various points of view objectively, giving thorough support for generalizations through use of sufficient and reliable data. The writer has evaluated information from readings, experience, and other sources; has presented it in a logical and analytical way; and has identified the assumptions behind underlying ideas. The writer thoroughly explores complex ideas.

The language is fluent. Translation is not noted. Vocabulary and idioms are used appropriately. Sentence structure is appropriate and varied. Errors are infrequent and do not interfere with comprehension. Conventions of capitalization, punctuation, paragraphing, titles, quotations, and citations are excellent. Formatting is consistent.

Kean University ESL Program Scoring Guide (2006).

After non-complying essays were set aside, the remaining essays were divided into those with passing holistic scores and those with failing scores. Two passing and two failing exams were then randomly selected from each of the four sections of the course to arrive at a sample of eight passing and eight failing essay exams.
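The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the essay records, their field names (`id`, `section`, `passed`), and the size of the eligible pool are hypothetical, and the study does not say how the random draw was actually performed.

```python
import random

def select_stratified_sample(essays, per_stratum=2, seed=None):
    """Draw `per_stratum` passing and `per_stratum` failing essays from each section."""
    rng = random.Random(seed)
    sample = []
    for section in sorted({e["section"] for e in essays}):
        for passed in (True, False):
            # One stratum = one section crossed with one pass/fail outcome.
            stratum = [e for e in essays
                       if e["section"] == section and e["passed"] == passed]
            sample.extend(rng.sample(stratum, per_stratum))
    return sample

# Hypothetical pool of eligible essays: 4 sections, 5 passing and 5 failing in each.
pool = [
    {"id": f"S{s}-{i}", "section": s, "passed": i < 5}
    for s in range(1, 5)
    for i in range(10)
]

sample = select_stratified_sample(pool, per_stratum=2, seed=0)
print(len(sample), sum(e["passed"] for e in sample))
```

With four sections and two essays per stratum, the draw always yields sixteen essays, eight passing and eight failing, which matches the sample size reported above.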

Procedures

The sixteen essay exams chosen for this study were holistically graded prior to the selection of the sample by eight experienced Kean adjunct faculty members during the penultimate week of the Spring 2011 semester. All eight raters had graduate degrees, with seven holding master’s degrees in Teaching English as a Second Language (TESL) or a related field. The rater without a master’s degree in TESL or a related discipline had approximately twelve years’ experience teaching ESL at the post-secondary level, held two other master’s degrees, and was close to completing a master’s in TESL when the scoring was conducted. The eight raters’ experience teaching ESL at the post-secondary level ranged from roughly nine to thirty-three years. The raters were also very experienced in conducting holistic scoring. Seven of the eight raters had nine or more years’ experience doing holistic assessment. The remaining rater had approximately five years’ holistic grading experience.

In keeping with standard holistic grading protocol, each essay in the sample was independently read by two ESL faculty members and rated based on the overall impression of the essay measured against the Kean University ESL Program Holistic Scoring Guide (see Appendix B). Scores for essays of students in Composition for Nonnative Speakers I generally range between 3, which is failing, and 4, which is passing. Raters were allowed to assign pluses and minuses to the core numerical scores to indicate the degree of relative strength or weakness of each essay. The scores given by each rater were then averaged together for the final holistic score. The resultant composite holistic scores were used for the correlational analysis. As previously mentioned, any essay scored by a third reader was not included in the sample.

Approximately one month later, three of the eight aforementioned Kean University ESL adjunct faculty members conducted an objective rating of each essay in the sample. Only one of these raters was formally trained in objective scoring, though all three reviewed and normed the objective scoring model developed by Brodkey and Young and presented with slight modifications by Bailey (1998). As described in Table 2, the raters read the first three hundred words of each essay, excluding all quoted text, and underlined every syntactic error they encountered. Then, adapting error classifications used by Breland & Jones (1984), Homburg (1984), Roberts & Cimasko (2008), and Weltig (2004), raters counted all the syntactic errors, including mistakes in subject-verb agreement, pronoun usage, verb formation, sentence structure, word order, fragments and run-ons, use of modifiers, article usage, prepositions, word forms, clauses, auxiliaries, singular versus plural nouns, and possessives (see Appendix C). Errors derived from lexical misuse and problems in discourse, mechanics, and content were not counted.

Table 2

Objective Scoring Guide

1. Read the first 300 words of the essay, excluding quoted text, and underline every syntactic error.

2. Assign a weight to each error to indicate its effect on the readability and comprehensibility of the sentence.

• Use a score of 1 to indicate errors that are minor and do not significantly impact comprehensibility;

• Use a score of 2 to indicate errors that result in a moderate distortion of the sentence; and

• Use a score of 3 to indicate errors that cause severe distortions, rendering the affected sentence difficult to read and understand.

3. Sum the scores of the errors.

4. Divide the number of words (300) by the error score to obtain the essay correctness score:

300 (words) / error score = correctness score

Adapted from Bailey (1998).

In the next step of the objective scoring process, raters weighted each error according to its effect on the readability and comprehensibility of the sentence in which it appeared. In accordance with Brodkey and Young’s design, errors were weighted as follows (as cited in Bailey, 1998, p. 193):

Score of 1 - Errors that are minor and do not significantly impact comprehensibility;

Score of 2 - Errors that result in a moderate distortion of the sentence;

Score of 3 - Errors that cause severe distortions, rendering the affected sentence difficult to read and understand.

The weighted error scores were then summed, and an essay correctness score was calculated by dividing the number of words analyzed (300) by the summed error score. A higher correctness score therefore indicates fewer or less severe errors.
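The arithmetic of the objective score can be illustrated with a short Python sketch. The function names and the example list of error weights are hypothetical. The only figures taken from the study are the 300-word window and the error scores reported in Table 5.

```python
WORDS_ANALYZED = 300  # first 300 words of each essay, excluding quoted text

def error_score(weights):
    """Sum the 1-3 severity weights a rater assigned to the errors in one essay."""
    for w in weights:
        if w not in (1, 2, 3):
            raise ValueError(f"weight must be 1, 2, or 3, got {w!r}")
    return sum(weights)

def correctness_score(total_error_score, words=WORDS_ANALYZED):
    """Words analyzed divided by the summed error score; higher means fewer errors."""
    if total_error_score <= 0:
        raise ValueError("error score must be positive")
    return words / total_error_score

# Hypothetical essay with five errors: three minor, one moderate, one severe.
example_weights = [1, 1, 2, 3, 1]
print(error_score(example_weights))                       # summed error score
print(correctness_score(error_score(example_weights)))    # 300 / summed score

# Error score of 53, as reported for Essay 1 by Rater A in Table 5.
print(round(correctness_score(53), 2))
```

Because the error score is the divisor, the correctness score falls as errors accumulate, which is why it serves as an inverse measure of grammatical error.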

After the holistic and objective scorings were completed, the holistic scores, which ranged from a low of 3- to a high of 4+, were converted into numerical equivalents (Table 3) so they could be employed in the data analysis. A plus or minus added or subtracted one-quarter point (.25) from the core numeric score. However, the difference between a 3+ and a 4- was weighted as a half point (.50) on the basis that there is a greater distance in scores between number ranges than within them. In other words, the difference between a 3 and a 3+ is smaller than the difference between a 3+ and a 4-, especially since students are expected to demonstrate the characteristics of someone writing at a level of 4 (refer to Table 1) to pass this course. Whether an essay earns a strong 4 or a weak one is not as important as whether it has surpassed the characteristics of a level 3 writer. Hence the larger numerical interval assigned to the boundary between 3+ and 4-.

Table 3

Holistic Score Numerical Equivalents

Original Score    Numerical Equivalent
3-                3.25
3/3-              3.375
3                 3.5
3/3+              3.625
3+                3.75
4-                4.25
4/4-              4.375
4                 4.5
4/4+              4.625
4+                4.75
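The conversion can be expressed compactly in Python. The sketch below is one way to reproduce the values in Table 3, not necessarily how the conversion was carried out in the study. It assumes each core score maps to the midpoint of its range (3 to 3.5, 4 to 4.5), a reading inferred from the table. The function names are hypothetical.

```python
def numeric_equivalent(score):
    """Convert a single rater's score such as '3+', '4', or '4-' to a number."""
    score = score.strip()
    value = int(score[0]) + 0.5      # core score sits at the midpoint: 3 -> 3.5, 4 -> 4.5
    if score.endswith("+"):
        value += 0.25                # a plus adds a quarter point
    elif score.endswith("-"):
        value -= 0.25                # a minus subtracts a quarter point
    return value

def composite(score_a, score_b):
    """Average the two raters' numerical equivalents into one holistic score."""
    return (numeric_equivalent(score_a) + numeric_equivalent(score_b)) / 2

print(composite("4-", "4"))   # two raters: 4- and 4
print(composite("3", "3+"))   # two raters: 3 and 3+
print(numeric_equivalent("4-") - numeric_equivalent("3+"))  # gap across the pass/fail boundary
```

Under this scheme, adjacent scores within a range differ by a quarter point, while the step from 3+ to 4- is a half point, reflecting the greater weight given to the pass/fail boundary.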

To prepare the objective scores for analysis, the two closest of the three objective correctness scores calculated for each essay were averaged together to arrive at one figure.
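A minimal Python sketch of this averaging step follows. The function name is hypothetical, and the handling of ties (keeping the first closest pair found) is an assumption, since the study does not say how ties were resolved. The example values are the three raters' correctness scores for Essays 1 and 8 in Table 5.

```python
from itertools import combinations

def average_two_closest(scores):
    """Average the pair of scores with the smallest absolute difference."""
    # min() keeps the first pair encountered if two pairs are equally close (assumption).
    a, b = min(combinations(scores, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2

# Essay 1 in Table 5: Raters A, B, and C.
print(round(average_two_closest([5.66, 6.12, 4.92]), 2))

# Essay 8 in Table 5: the outlying third score is dropped.
print(round(average_two_closest([13.64, 12.00, 17.65]), 2))
```

Discarding the most divergent of the three ratings limits the influence of a single rater whose error count differs sharply from the other two.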

Analysis of Data

Two correlational analyses were done using the holistic scores and the objective scores for the essays in the sample. The relationship between grammatical error and holistic scores was measured by calculating a Pearson’s product-moment correlation coefficient (r). Essay correctness scores, which are used in standard objective scoring methodology and are inversely related to grammatical error, were used along with holistic scores in this calculation. The statistical significance of the obtained r was tested at the .05 level with a one-tailed test, which sought to reject the null hypothesis that there is no correlation between the two variables, holistic and objective essay scores.

The relationship between grammatical error and passing and failing final essay exam scores was measured by calculating the one-tailed point-biserial correlation coefficient (rpbi). The same data (holistic scores and grammar correctness scores) were used for this calculation. The obtained rpbi was tested at a .05 level of significance.

Chapter IV: RESULTS

This study examined the relationship between holistic scores and grammatical error in departmental final essay exams of students in Composition for Nonnative Speakers I, the first part of a two-course composition sequence offered through the ESL program at Kean University. Its purpose was to establish whether holistic essay scores, in general, and passing and failing scores, in particular, were correlated with grammatical error during the Spring 2011 semester.

The study used holistic and objective scores from a stratified random sample of an equal number of passing and failing essays. Correlations between grammatical error and holistic scores were determined by calculating a one-tailed Pearson’s product-moment correlation coefficient (r) and a one-tailed point-biserial correlation coefficient (rpbi) with the data.

Results

The holistic scores of the essays in the sample are shown in Table 4. As previously mentioned, all essays in the sample were scored by two raters, and any essay necessitating a third rater was excluded to ensure there was no disagreement on the overall quality of essays in the sample. The holistic scores of both raters are noted, along with their composite numerical equivalents.

Table 4

Holistic Scores for Selected Sample

Essay    Holistic Scores    Numerical Equivalent
1        4-/4               4.375
2        4-/4               4.375
3        4/4                4.5
4        3/3+               3.625
5        3-/3               3.375
6        3/3+               3.625
7        3/3                3.5
8        4/4                4.5
9        4/4                4.5
10       4-/4               4.375
11       4-/4-              4.25
12       3/3                3.5
13       4-/4               4.375
14       3/3+               3.625
15       3-/3               3.375
16       3/3+               3.625

The results of the objective scoring, namely the total error and correctness scores assigned by each rater based on the first three hundred words, omitting quotations, are indicated in Table 5. The averages of the two closest correctness scores per essay and the numeric holistic score for every essay in the sample are shown in Table 6 and presented graphically in Figure 1.

Correlational analyses were conducted to examine the relationships between the variables and ascertain whether correlations existed. The results of the one-tailed Pearson’s product-moment correlation coefficient indicate a statistically significant positive correlation (r = 0.44) between grammatical correctness and holistic scores at the .05 level, meaning that a high incidence of grammatical error was inversely correlated with holistic scores.

Table 5

Objective Essay Scores by Rater

         Rater A                Rater B                Rater C
Essay    Error   Correctness    Error   Correctness    Error   Correctness
         Score   Score          Score   Score          Score   Score
1        53      5.66           49      6.12           61      4.92
2        53      5.66           48      6.25           44      6.82
3        52      5.77           61      4.92           51      5.88
4        27      11.11          32      9.38           16      18.75
5        36      8.33           44      6.82           38      7.89
6        57      5.26           59      5.08           49      6.12
7        46      6.52           54      5.56           46      6.52
8        22      13.64          25      12.00          17      17.65
9        27      11.11          22      13.64          23      13.04
10       34      8.82           40      7.50           33      9.09
11       38      7.89           35      8.57           36      8.33
12       68      4.41           60      5.00           37      8.11
13       40      7.50           47      6.38           41      7.32
14       48      6.25           37      8.11           42      7.14
15       62      4.84           52      5.77           65      4.62
16       40      7.50           46      6.52           47      6.38

The second correlational analysis, the point-biserial correlation coefficient (rpbi), was calculated to assess whether grammatical correctness was correlated with passing and failing holistic essay scores. The results of this analysis showed that at a level of .05, there was a statistically significant correlation (rpbi=0.39), indicating grammar was indeed correlated with passing and failing grades on the holistically scored essay exams. In other words, exams with fewer errors and higher correctness scores were correlated with passing holistic scores, while those with a prevalence of errors and lower correctness scores were correlated with failing scores.

Table 6

Holistic Scores and Objective Essay Correctness Scores*

Essay    Holistic Score    Correctness Score
1        4.375             5.89
2        4.375             6.53
3        4.5               5.83
4        3.625             10.25
5        3.375             8.11
6        3.625             5.17
7        3.5               6.52
8        4.5               12.82
9        4.5               13.34
10       4.375             8.96
11       4.25              8.45
12       3.5               4.71
13       4.375             7.41
14       3.625             7.63
15       3.375             4.73
16       3.625             6.45

* Correctness score is the average of the two closest correctness scores.
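For readers who wish to retrace the two coefficients, the following Python sketch recomputes them from the values in Table 6. It is an illustration, not the procedure used in the study. The function name is hypothetical, the pass/fail coding (1 for a composite of 4.25 or higher, 0 otherwise) is an assumption based on the numerical equivalents in Table 3, and the point-biserial coefficient is obtained through the standard identity that it equals Pearson’s r computed with a 0/1 variable.

```python
import math

# Values transcribed from Table 6 (Essays 1 through 16).
holistic = [4.375, 4.375, 4.5, 3.625, 3.375, 3.625, 3.5, 4.5,
            4.5, 4.375, 4.25, 3.5, 4.375, 3.625, 3.375, 3.625]
correctness = [5.89, 6.53, 5.83, 10.25, 8.11, 5.17, 6.52, 12.82,
               13.34, 8.96, 8.45, 4.71, 7.41, 7.63, 4.73, 6.45]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Assumed coding: 1 = passing (composite of 4.25 or higher), 0 = failing.
passed = [1 if h >= 4.25 else 0 for h in holistic]

r = pearson_r(correctness, holistic)     # correctness vs. holistic score
r_pbi = pearson_r(correctness, passed)   # correctness vs. pass/fail outcome

print(f"r = {r:.2f}, rpbi = {r_pbi:.2f}")
```

Run on these sixteen pairs, the sketch should land on approximately the coefficients reported in this chapter (r = 0.44 and rpbi = 0.39). It does not perform the significance tests.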

The holistic and objective scores for each essay in the sample are presented in a simple stacked line chart in Figure 1, with the data arranged from low to high scores. The general increase in holistic scores that occurred as grammatical correctness scores improved is reflected. Figure 2 shows the regression line for the holistic and correctness scores in the sample. The regression line depicts the best linear prediction of holistic scores in terms of correctness scores.
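A least-squares line of the kind shown in Figure 2 can be recomputed from the Table 6 values with the short Python sketch below. The function name is hypothetical, and ordinary least squares is an assumption about how the line was fitted. The study does not report the slope or intercept, so the output is offered only as an illustration.

```python
# Values transcribed from Table 6 (Essays 1 through 16).
holistic = [4.375, 4.375, 4.5, 3.625, 3.375, 3.625, 3.5, 4.5,
            4.5, 4.375, 4.25, 3.5, 4.375, 3.625, 3.375, 3.625]
correctness = [5.89, 6.53, 5.83, 10.25, 8.11, 5.17, 6.52, 12.82,
               13.34, 8.96, 8.45, 4.71, 7.41, 7.63, 4.73, 6.45]

def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the point of means
    return slope, intercept

slope, intercept = least_squares_line(correctness, holistic)
print(f"predicted holistic score = {intercept:.3f} + {slope:.4f} * correctness score")
```

A positive slope corresponds to the pattern described above: predicted holistic scores rise as correctness scores rise.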

Figure 1

Holistic and Objective Correctness Scores Per Essay

Figure 2

In document The Effect of Grammatical Error on Holistic Scoring. in the Essays of ESL College Composition Students. Robin Rosen Chang (Page 33-47)