
CHAPTER 4. RESULTS

4.1 The relationship among the EPT, the self-assessment, and the TOEFL iBT

4.1.1. Descriptive statistics and reliability analysis of the EPT, the TOEFL iBT and the self-assessment

The descriptive statistics and reliability analysis for the three measures, namely the self-assessment, the EPT, and the TOEFL iBT, are reported one by one in this subsection.

4.1.1.1. The EPT

The descriptive statistics for the EPT are listed in Table 4.1. The mean score of the EPT reading section was 18.36 out of 35, or 52.5 on a 100-point scale, with a standard deviation of 5.21. The mean score of the EPT listening section was 21.09 out of 35, or 62.3 on a 100-point scale, with a standard deviation of 5.15. By comparison, the ESL students had a higher average score on the EPT listening section than on the reading section. The scores on the EPT reading and listening sections were approximately normally distributed, as indicated by the small skewness and kurtosis values and the non-significant p-values of the Shapiro-Wilk test (.336 and .413). The results of the EPT writing section are ordinal, consisting of three levels (Engl101B, Engl101C, and Pass).

Table 4.1

Descriptive Statistics for the EPT (n = 202)

Test    Section     M      SD    Min/Max  Skewness  Kurtosis  Shapiro-Wilk p value
EPT a   Reading     18.36  5.21  6/34      0.001    -0.341    .336
        Listening   21.09  5.15  5/34      0.015    -0.158    .413
        Writing      2.15  0.60  1/3      -0.060    -0.297    <.001

Note. a. The full scores of reading and listening sections of the EPT were 40.
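As a rough illustration of how the skewness and kurtosis columns in a table such as Table 4.1 can be obtained, the following Python sketch computes the bias-adjusted sample skewness and excess kurtosis that many statistics packages report. The study does not state which estimator was used, so these formulas are an assumption, and the scores in the example are hypothetical values, not data from this study. The Shapiro-Wilk test is omitted because it is not part of the Python standard library.

```python
import statistics


def sample_skewness(xs):
    """Bias-adjusted sample skewness (G1); assumed estimator, not confirmed by the study."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample SD with n - 1 in the denominator
    z3 = sum(((x - m) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); 0 corresponds to a normal distribution."""
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction


def describe(xs):
    """Collect the columns used in the descriptive tables of this section."""
    return {
        "n": len(xs),
        "mean": statistics.fmean(xs),
        "sd": statistics.stdev(xs),
        "min": min(xs),
        "max": max(xs),
        "skewness": sample_skewness(xs),
        "kurtosis": sample_excess_kurtosis(xs),
    }


# Hypothetical raw section scores for illustration only (NOT data from this study).
scores = [12, 15, 17, 18, 18, 19, 20, 21, 23, 26]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Values of skewness and excess kurtosis close to zero, as in Table 4.1, are consistent with an approximately symmetric, normal-like distribution.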

Cronbach’s alpha was used as the reliability index for the scores on the EPT reading and listening sections. In fall 2014, the reliability was .67 for the EPT reading section and also .67 for the EPT listening section. Each essay in the EPT writing section was graded by at least two human raters. In the case of rating disagreement, a third rater, and sometimes a fourth rater, was invited to rate the essay independently, and the final rating was determined based on the rating agreed on by two raters. The resultant rating data were typical sparse data with a large number of missing values; in other words, many essays were rated by different combinations of raters. Therefore, only the inter-rating reliability, rather than the inter-rater reliability, of the EPT writing section was estimated. Cronbach’s alpha was computed after treating human raters as a random facet and consolidating the ratings from different human raters into two to four ratings. The inter-rating reliability of the EPT writing section in fall 2014 was .787 (Cronbach’s alpha, N = 587). Because I did not have access to the EPT essay ratings of the sampled ESL students, I used the inter-rating reliability for all fall 2014 EPT essays in this study; the true reliability for the study sample may be slightly lower than .787 because of the smaller sample and the resultant smaller variance.

4.1.1.2. The TOEFL iBT

The descriptive statistics for the TOEFL iBT are listed in Table 4.2. Among the four TOEFL iBT sections, the ESL students had the highest average score on the reading section, with a mean score of 22.87 out of 30 or 76.2% and a standard deviation of 4.09. The TOEFL iBT speaking score was the lowest, with a mean score of 20.63 out of 30 or 68.8% and a standard deviation of 2.54. This relatively small standard deviation suggests that the TOEFL iBT speaking scores varied less than the other section scores. The mean score of the TOEFL iBT listening section was 21.88 out of 30 or 72.9% with a standard deviation of 3.88. The mean score of the TOEFL iBT writing section was 22.33 out of 30 or 74.4% with a standard deviation of 3.05. The mean total score of the TOEFL iBT was 87.83 out of 120 or 73.2% with a standard deviation of 9.92.
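The percentages in this paragraph can be checked with a short Python sketch. The means are taken from Table 4.2; the maximum scores (30 per section and 120 in total) are an assumption based on the TOEFL iBT score scale and the maxima shown in Table 4.2.

```python
# Mean scores as reported in Table 4.2.
toefl_means = {
    "Reading": 22.87,
    "Listening": 21.88,
    "Speaking": 20.63,
    "Writing": 22.33,
    "Total": 87.83,
}

# Assumed maximum possible scores: 30 per section, 120 for the total.
max_scores = {
    "Reading": 30,
    "Listening": 30,
    "Speaking": 30,
    "Writing": 30,
    "Total": 120,
}


def percent_of_max(mean, max_score):
    """Express a mean score as a percentage of the maximum, rounded to one decimal."""
    return round(100 * mean / max_score, 1)


percentages = {
    section: percent_of_max(mean, max_scores[section])
    for section, mean in toefl_means.items()
}

for section, pct in percentages.items():
    print(f"{section}: {pct}%")
```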

Since all of the ESL students in this study were fully admitted international students who met the minimum English language requirement, their TOEFL iBT scores represent a truncated portion of the score distribution, with a relatively narrow range and less variation. In addition, the significant Shapiro-Wilk p-values (< .05) indicated that the TOEFL iBT section scores and total scores were not normally distributed in this data set.

Table 4.2

Descriptive Statistics for the TOEFL iBT (n = 202)

Test        Section     M      SD    Min/Max  Skewness  Kurtosis  Shapiro-Wilk p value
TOEFL iBT   Reading     22.87  4.09  10/30    -0.435    -0.457    <.001
            Listening   21.88  3.88  9/29     -0.294    -0.091    .012
            Speaking    20.63  2.54  14/27     0.308    -0.474    <.001
            Writing     22.33  3.05  15/28     0.065    -0.888    <.001
            Total       87.83  9.92  60/109   -0.160    -0.903    <.001

Because item-level information on the TOEFL iBT scores was not available in this study, I used the reliability information reported by ETS as a compromise (http://www.ets.org/Media/Tests/TOEFL/pdf/TOEFL_iBT_Score_Reliability_Generalizability.pdf). Given the restricted score range in the data in this study, the official reliability information released by ETS may be higher than the actual reliability of the TOEFL iBT scores of the ESL students in this study. The reported reliability is .86 for the TOEFL iBT reading section, .87 for the listening section, .90 for the speaking section, and .78 for the writing section.
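The direction of this range-restriction effect can be sketched with a standard classical test theory relation. Assuming that the error variance is the same in the full testing population and in the restricted sample, the reliability in the restricted sample is 1 minus the error variance divided by the restricted score variance. The sketch below is illustrative only: the reliability of .86 and the sample standard deviation of 4.09 for the reading section come from this section, whereas the unrestricted standard deviation of 6.0 is a hypothetical value chosen for the example and is not reported in this study.

```python
def reliability_under_restriction(rel_unrestricted, sd_unrestricted, sd_restricted):
    """Estimate reliability in a range-restricted sample.

    Assumes equal error variance in the unrestricted and restricted groups
    (a classical test theory assumption, not a result of this study).
    """
    if not 0 <= rel_unrestricted <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    error_variance = sd_unrestricted ** 2 * (1 - rel_unrestricted)
    return 1 - error_variance / sd_restricted ** 2


# Reported reliability and sample SD for the TOEFL iBT reading section.
reported_reliability = 0.86
sample_sd = 4.09

# HYPOTHETICAL population SD, for illustration only (not from this study).
hypothetical_population_sd = 6.0

restricted_reliability = reliability_under_restriction(
    reported_reliability, hypothetical_population_sd, sample_sd
)
print(f"Estimated reliability in the restricted sample: {restricted_reliability:.3f}")
```

Under these assumptions, the narrower the score range in the sample relative to the full population, the lower the estimated reliability, which is why the published ETS values are best read as an upper bound for this sample.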

4.1.1.3. The Self-assessment

The descriptive statistics show the ESL students’ general performance and score distributions on the self-assessment, which used a 6-point Likert scale (see Table 4.3). The means of the responses to the self-assessment items ranged from 3.52 to 4.89, with standard deviations from 1.10 to 1.40. These statistics suggest that the ESL students tended to rate their English proficiency in the specific academic contexts positively, and that their evaluations varied considerably on each self-assessment item, as shown by the magnitudes of the standard deviations. The significant Shapiro-Wilk p-values in Table 4.3 showed that the responses to all of the self-assessment items were not normally distributed, with the majority of the items negatively skewed, that is, with responses concentrated at the higher end of the scale.

Table 4.3

Descriptive Statistics for the Self-assessment (N = 347)

Skill       SA Item  M     SD    Min/Max  Skewness  Kurtosis  Shapiro-Wilk p value
Reading     Rd1      4.27  1.22  1/6      -0.44     -0.39     < .001
            Rd2      4.52  1.22  1/6      -0.67     -0.21     < .001
            Rd3      4.29  1.27  1/6      -0.41     -0.72     < .001
            Rd4      4.89  1.16  1/6      -1.09      0.91     < .001
            Rd5      4.12  1.19  1/6      -0.28     -0.59     < .001
Listening   Lsn1     4.44  1.26  1/6      -0.66     -0.03     < .001
            Lsn2     4.30  1.22  1/6      -0.35     -0.53     < .001
            Lsn3     3.97  1.38  1/6      -0.31     -0.66     < .001
            Lsn4     4.13  1.25  1/6      -0.29     -0.47     < .001
            Lsn5     3.52  1.40  1/6       0.01     -0.78     < .001
Speaking    Spk1     4.03  1.25  1/6      -0.38     -0.36     < .001
            Spk2     4.03  1.29  1/6      -0.37     -0.36     < .001
            Spk3     4.31  1.24  1/6      -0.39     -0.62     < .001
            Spk4     4.01  1.23  1/6      -0.32     -0.39     < .001
            Spk5     4.02  1.23  1/6      -0.28     -0.39     < .001
Writing     Wrt1     4.17  1.10  1/6      -0.23     -0.14     < .001
            Wrt2     4.31  1.17  1/6      -0.50     -0.18     < .001
            Wrt3     3.81  1.21  1/6      -0.20     -0.29     < .001
            Wrt4     4.12  1.20  1/6      -0.43     -0.26     < .001
            Wrt5     4.20  1.19  1/6      -0.31     -0.47     < .001

The reliability of each of the self-assessment sections was high, as shown by Cronbach’s alpha (.90 for reading, .89 for listening, .93 for speaking, and .92 for writing). Since factor scores of the self-assessment sections will be used in the multitrait-multimethod (MTMM) analysis, the scale reliability, or factor rho coefficient, for the factor scores of the self-assessment sections under the confirmatory factor analysis framework will be reported later in this section, using the formula suggested by Kline (2011).
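The two reliability indices mentioned here can be sketched in Python as follows. The first function computes Cronbach’s alpha from an item-by-respondent score matrix. The second computes a commonly cited form of the factor rho coefficient, the ratio of variance explained by the factor to total variance; whether this matches the exact formula applied later in the study is an assumption. The responses, loadings, and error variances below are hypothetical values for illustration, not estimates from this study.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item (column) and of the summed scale score.
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def factor_rho(loadings, error_variances, factor_variance=1.0):
    """Factor rho (composite reliability) from CFA estimates.

    Assumed form: explained variance / (explained variance + error variance),
    with uncorrelated measurement errors.
    """
    explained = sum(loadings) ** 2 * factor_variance
    return explained / (explained + sum(error_variances))


# HYPOTHETICAL 6-point Likert responses to five items (NOT data from this study).
responses = [
    [4, 5, 4, 5, 4],
    [3, 3, 4, 3, 3],
    [5, 6, 5, 6, 5],
    [2, 3, 2, 3, 2],
    [4, 4, 5, 4, 4],
    [6, 5, 6, 6, 5],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")

# HYPOTHETICAL standardized loadings and error variances for a five-item factor.
loadings = [0.80, 0.75, 0.85, 0.70, 0.80]
error_variances = [1 - loading ** 2 for loading in loadings]
rho = factor_rho(loadings, error_variances)
print(f"Factor rho: {rho:.3f}")
```

Unlike Cronbach’s alpha, the factor rho coefficient does not assume that all items load equally on the factor, which is one reason it is often preferred when factor scores from a confirmatory factor analysis are used.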
