Chapter 5. Results and discussion

5.1 Descriptive statistics and reliability estimates

Descriptive statistics and reliability estimates were calculated to describe the performance of each group of test-takers on the Collocational Ability Test using both scoring methods. These statistics were used to describe and compare the characteristics of the groups and the consistency of scoring. The descriptive statistics also guided the selection of subsequent statistics for further analyses. Histograms displayed the distributions of scores visually. The descriptive statistics were useful in supporting the evaluation inference in the interpretive argument. The assumption that scores had appropriate characteristics for norm-referenced decisions was supported or challenged on the basis of these statistical results.

5.1.1 Descriptive statistics and reliability: dichotomous method

Descriptive statistics for the scores on the Collocational Ability Test using the dichotomous scoring scale are shown in Table 5.1. The dichotomous scoring method awarded 1 point to the target response, which had been identified by the corpus search. All other responses were marked as incorrect with 0 points. The mean scores decreased, as expected, from the highest placement group to the lowest placement group. The high-ability group performed the best (M=12.31), followed by the mid-ability group (M=5.01), and then the low-ability group (M=2.81). The reliability estimate calculated for the whole sample (N=206) was acceptable at α = .83; however, the distribution of the scores is skewed, as shown in Table 5.1.

Table 5.1. Descriptive statistics for the Collocational Ability Test for all levels (k=35) using a dichotomous scoring method

Group          N     Min   Max   Mean    Std. deviation   Cronbach's alpha
High-ability   35    3     21    12.31   4.15
Mid-ability    109   0     15    5.01    3.38             .83
Low-ability    62    0     11    2.81    2.75

Figure 5.1 shows a histogram of the distribution of scores for the Collocational Ability Test using the dichotomous scoring method. Although the initial values for skewness (0.957) and kurtosis (0.53) are within the "rule of thumb" values (-2 and +2) for a normal distribution (Bachman, 2004, p. 74), these statistics, divided by their standard errors, give a picture of a non-normal distribution: skewness/SES (5.59) exceeds the maximum value, and kurtosis/SEK (1.75) approaches it. The histogram of scores confirms the asymmetrical, non-normal distribution.
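The check described above, dividing skewness and kurtosis by their standard errors and comparing the result with ±2, can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the standard-error formulas are the ones commonly implemented in statistical packages, and the source does not state which formulas or software produced the reported ratios, so the output may differ slightly from the values of 5.59 and 1.75 given in the text. The function names are hypothetical.

```python
import math


def standard_error_of_skewness(n: int) -> float:
    """Standard error of skewness (SES) for a sample of size n.

    Assumption: the small-sample formula commonly used by statistical
    packages; the source does not say which formula was applied.
    """
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def standard_error_of_kurtosis(n: int) -> float:
    """Standard error of kurtosis (SEK), derived from the SES (same assumption)."""
    ses = standard_error_of_skewness(n)
    return 2.0 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def exceeds_rule_of_thumb(statistic: float, standard_error: float,
                          limit: float = 2.0) -> bool:
    """True when |statistic / standard error| is larger than the limit."""
    return abs(statistic / standard_error) > limit


# Values reported in the text for the dichotomous scores (N = 206).
n = 206
skewness, kurtosis = 0.957, 0.53

ses = standard_error_of_skewness(n)
sek = standard_error_of_kurtosis(n)

print(f"skewness/SES = {skewness / ses:.2f}, "
      f"outside +/-2: {exceeds_rule_of_thumb(skewness, ses)}")
print(f"kurtosis/SEK = {kurtosis / sek:.2f}, "
      f"outside +/-2: {exceeds_rule_of_thumb(kurtosis, sek)}")
```

Under these assumed formulas the skewness ratio falls well outside the ±2 band while the kurtosis ratio stays inside it, which matches the pattern reported in the text.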

Figure 5.1. Histogram showing the distribution of scores on collocation test (k=35) for all groups (N=206)

A distribution with a positive skew, as shown in Figure 5.1, indicates that the test is too difficult for the population represented by the sample of test-takers. This is the type of distribution that would be expected from the scores on a pre-test intended for a criterion-referenced interpretation. The asymmetrical distribution of scores confirms the information provided by the descriptive statistics, which indicate a non-normal distribution.

Backing was therefore not found for the assumption in the evaluation inference that scores from the dichotomous scoring method had appropriate characteristics for norm-referenced decisions.

5.1.2 Descriptive statistics and reliability: polytomous method

The data were recalculated by applying a partial credit scale to the results, following the identification of acceptable relevant collocates in the corpus of written academic English. The polytomous scale allowed for partial credit if a response was identified as a potential collocate of the node in the item in the context of academic English. Response analysis, described in Chapter 3, identified the potential collocates by frequency in the academic texts of COCA. A response that matched the target response was given full credit of 2 points. Responses that were identified as potential collocates in academic English were awarded 1 point. All other responses received no credit. The descriptive statistics and reliability estimate for the results using the partial credit scale are presented in Table 5.2.
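The two scoring rules can be sketched as a pair of small functions. This is only an illustration of the rules as described in the text: the function names, the whitespace and case normalization, and the sample item (node, target, and acceptable collocates) are all hypothetical and are not taken from the test itself.

```python
def normalize(response: str) -> str:
    """Trim whitespace and lowercase a response (an assumed preprocessing step)."""
    return response.strip().lower()


def score_dichotomous(response: str, target: str) -> int:
    """1 point for the target response, 0 points for anything else."""
    return 1 if normalize(response) == normalize(target) else 0


def score_polytomous(response: str, target: str, acceptable_collocates) -> int:
    """2 points for the target, 1 point for an acceptable collocate, else 0."""
    answer = normalize(response)
    if answer == normalize(target):
        return 2
    if answer in {normalize(c) for c in acceptable_collocates}:
        return 1
    return 0


# Hypothetical item, invented for illustration only.
target = "conduct"
acceptable = {"undertake", "perform"}

for response in ["conduct", "Undertake", "make"]:
    print(response,
          score_dichotomous(response, target),
          score_polytomous(response, target, acceptable))
```

With 35 items, the dichotomous rule gives a maximum total of 35 points and the polytomous rule a maximum of 70.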

Table 5.2. Descriptive statistics for the Collocational Ability Test for all levels (k=35) using a polytomous scoring method

Groups         N     Min   Max   Mean    Std. deviation   Cronbach's alpha
High-ability   35    21    52    36.91   7.18
Mid-ability    109   0     41    22.67   8.82             .89
Low-ability    62    0     33    16.05   9.27
TOTAL          206   0     52    11.54   5.54

A similar trend is seen with the partial credit scale as with the dichotomous scale. Mean scores for all three groups descend as expected from the highest to the lowest proficiency group. The high-ability group performed the best (M=18.46), followed by the mid-ability group (M=11.33), and then the low-ability group (M=8.02). These means are the group means in Table 5.2 divided by two, which apparently expresses them on the same 35-point scale as the dichotomous scores. The standard deviations for all three groups are closer together than with the dichotomous scoring method. The reliability estimate using this scoring procedure (α = .89) is higher than the estimate from the dichotomous scale (α = .83), indicating less measurement error.
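For readers who want to see how a reliability estimate of this kind is obtained, Cronbach's alpha can be computed from a matrix of item scores as in the sketch below. It uses the standard formula, α = k/(k-1) × (1 - Σ item variances / variance of total scores); the function name and the small score matrix are made up for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a list of rows (one row per test-taker,
    one column per item), using sample variances throughout."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item (column) across test-takers.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of the total score across test-takers.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up polytomous scores (0, 1, or 2) for four test-takers on two items.
toy_scores = [
    [2, 1],
    [1, 1],
    [0, 0],
    [2, 2],
]
print(f"alpha = {cronbach_alpha(toy_scores):.3f}")
```

The same function applies to both scoring methods, since dichotomous scores are simply the special case in which every item is scored 0 or 1.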

Figure 5.2. Histogram showing the distribution of scores on collocation test (k=35) for all groups (N=206)

Furthermore, scores using the polytomous scoring method are more normally distributed. The symmetrical distribution of scores presented in Figure 5.2 is more beneficial for making placement decisions and provides backing for the assumption that scores had appropriate characteristics for norm-referenced decisions. This assumption was therefore supported for the polytomous scoring method.

In sum, the descriptive statistics indicate that the higher proficiency groups outperform the lower proficiency groups in descending order. This is true for both scoring methods; however, the polytomous scoring method produced a score distribution that is more favorable than that of the dichotomous scoring method for deciding whether to place students into appropriate levels of English instruction or exempt them from instruction. The consistency of the measurement is also superior for the polytomous scoring method, as indicated by the reliability estimates. Backing for the evaluation inference was found for the polytomous scoring method only.

In document A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design (Page 114-119)