
CHAPTER 4: VALIDATION OF THE GUESSING FROM CONTEXT TEST

4.7 Creating New Forms

4.7.1 Equivalent Forms

At least two equivalent forms of the GCT are needed to serve as a tool for future research in this field. Equivalent forms have the same construct to be measured, the same test length, and the same distribution of item difficulties. Having two equivalent forms will allow a pre- and post-test design where the effects of teaching on the skill of guessing from context may be investigated.

The first step for creating new forms was to determine the number of items included in each form in order to achieve a certain level of reliability. The minimum level of reliability was determined so that the Rasch person strata indices would exceed 2. A Rasch person strata index of 2 indicates two statistically distinct levels for person abilities, which is the minimum level for acceptable responsiveness (detecting change after an experimental treatment). The person strata index of 2 is equivalent to person reliability of .610 given the formulae in Linacre (2010a). [20] The number of items needed for achieving the reliability of .610 was estimated based on the following Spearman-Brown prediction formula (Brown, 1910; Spearman, 1910):

\[
T = \frac{C \, R_T \, (1 - R_C)}{(1 - R_T) \, R_C}
\]

where T = target number of items, C = current number of items, R_T = target person reliability, and R_C = current person reliability. Table 32 shows the estimated number of items that are required to arrive at the person reliability of .610 for each form of the three sections.

Table 32. Estimated number of items needed for arriving at person strata of 2

Section            Form A   Form B   Form C   Form D   Form E   Form F
Part of speech        9.3     14.0     16.0      6.6     14.5     16.7
Contextual clue      18.5     19.0     13.0      9.6     16.5     11.8
Meaning              13.1     19.8     11.9     11.6      8.9     12.9

Form B in the meaning section requires the largest number of items (19.8) to arrive at the Rasch person reliability of .610 (= Rasch person strata of 2). A new test form should therefore contain at least 20 items so that any form meets the minimum requirement for a sensitive test (Rasch person strata of 2). As indicated by the pilot studies (see Section 3.7), a 20-item test form may be completed in half an hour and is unlikely to result in a fatigue effect that could affect reliability. Thus, the new test forms had 20 items, which was the minimum number in terms of reliability and the maximum number in terms of fatigue effect.
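The arithmetic behind this step can be sketched as follows. The conversion from strata to reliability assumes the relations commonly given in the Winsteps documentation (strata = (4G + 1) / 3, with separation G = sqrt(R / (1 - R))); these are taken here to be the formulae cited above, and they do reproduce the value of .610. The function names and the 10-item example are hypothetical and are not taken from the study; only the Table 32 estimates come from the text.

```python
import math


def strata_to_reliability(strata: float) -> float:
    """Convert a Rasch person strata index to person reliability.

    Assumed relations: strata = (4G + 1) / 3 and G = sqrt(R / (1 - R)).
    """
    separation = (3.0 * strata - 1.0) / 4.0
    return separation ** 2 / (1.0 + separation ** 2)


def items_needed(current_items: float, current_rel: float, target_rel: float) -> float:
    """Spearman-Brown prediction of the test length needed for target_rel."""
    return (current_items * target_rel * (1.0 - current_rel)
            / ((1.0 - target_rel) * current_rel))


target = strata_to_reliability(2.0)
print(round(target, 3))  # 0.61

# Hypothetical illustration: a 10-item form whose person reliability is .50.
example = items_needed(current_items=10, current_rel=0.50, target_rel=target)
print(round(example, 1))

# Estimates reported in Table 32 (Forms A to F for each section).
TABLE_32 = {
    "Part of speech": [9.3, 14.0, 16.0, 6.6, 14.5, 16.7],
    "Contextual clue": [18.5, 19.0, 13.0, 9.6, 16.5, 11.8],
    "Meaning": [13.1, 19.8, 11.9, 11.6, 8.9, 12.9],
}
largest = max(value for row in TABLE_32.values() for value in row)
form_length = math.ceil(largest)  # round up to a whole number of items
print(largest, form_length)
```

Rounding the largest estimate up to the next whole item gives the 20-item form length adopted above.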

As there are 49 acceptable items, two equivalent 20-item test forms can be constructed. The two equivalent forms were created based on the following criteria in order to maintain the representativeness of the construct:

1. Each form had nine nouns, six verbs, three adjectives, and two adverbs in order to reflect actual language use (noun: verb: adjective: adverb = 9:6:3:2).

2. Each form included all twelve types of contextual clues (one or two items per contextual clue) in order to ensure test representativeness.

3. The proximity of the clue to the test word was controlled so that each form had the same number of clue-inside items (the clue appears in the same sentence as the test word) and clue-outside items (the clue appears in a different sentence from the one containing the test word); that is, 13 clue-inside items and 7 clue-outside items for each form. This ratio (13:7) approximates the ratio of 41:19 for the 60 original items (see Section 3.7).

4. In order to make sure that each form has items with a wide spread of difficulty, the 49 acceptable items were classified into four groups based on the item difficulties in the meaning section [21]: 1) larger than 0.5 logits, 2) between 0 and 0.5 logits, 3) between -0.5 and 0 logits, and 4) smaller than -0.5 logits. Each form had five items selected from each of the four groups. [22]
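The difficulty grouping in the fourth criterion can be illustrated with a minimal sketch. The difficulties below are the meaning-section estimates of the 40 selected items as printed in the person-item map for the meaning section (Figure 24); the nine acceptable items that were not selected do not appear in that map and are therefore not included. The function name and the handling of values falling exactly on a boundary are assumptions, since the text does not state which group a boundary value belongs to (no listed item falls on a boundary).

```python
from collections import Counter

# Item number -> meaning-section difficulty in logits (40 selected items).
MEANING_DIFFICULTY = {
    40: 1.65, 34: 1.34, 14: 1.29, 30: 1.16, 28: 1.15, 42: 0.94, 7: 0.98,
    39: 0.96, 13: 0.76, 24: 0.48, 26: 0.36, 1: 0.38, 32: 0.32, 2: 0.31,
    38: 0.23, 15: 0.22, 33: 0.20, 17: 0.14, 10: -0.12, 56: -0.13, 53: 0.11,
    5: 0.04, 8: -0.02, 23: -0.16, 35: -0.26, 25: -0.18, 50: -0.18,
    18: -0.25, 6: -0.32, 46: -0.47, 36: -0.57, 57: -0.62, 45: -0.68,
    59: -0.78, 55: -0.81, 20: -0.63, 27: -0.87, 48: -1.00, 12: -1.04,
    44: -1.20,
}


def difficulty_group(logit: float) -> int:
    """Return the difficulty group (1 = hardest, 4 = easiest) for an item.

    Boundary values are assigned to the easier group; this is an assumption.
    """
    if logit > 0.5:
        return 1
    if logit > 0.0:
        return 2
    if logit > -0.5:
        return 3
    return 4


group_counts = Counter(difficulty_group(d) for d in MEANING_DIFFICULTY.values())
print(sorted(group_counts.items()))  # [(1, 9), (2, 11), (3, 10), (4, 10)]
```

The counts of 9 and 11 in the two hardest groups agree with the note that only nine items had difficulty estimates larger than 0.5 logits, so that one form took four items from the first group and six from the second.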

The distributions of the item difficulties for the two new forms for each section are shown in Figures 22-24 using the Rasch person-item map. (Person ability and item difficulty estimates larger than 2.0 or smaller than -2.0 are summarised into one row for want of space.) The items of Form A are presented on the left side of the item distribution, and the items of Form B are presented on the right side. For each item, the item number is followed by its Rasch item difficulty in brackets. For example, 13(3.19) means that the item number is 13 and its item difficulty is 3.19 logits.

[21] The spread of item difficulties was determined based on the meaning section instead of the part of speech and the contextual clue sections, because deriving the meaning is arguably the most important aspect in the skill of guessing from context. As will be discussed later, however, no significant difference was found in item difficulty between the two forms for the part of speech and the contextual clue sections.

[22] To be precise, Form A had four items with difficulty estimates larger than 0.5 logits and six items with difficulty estimates between 0 and 0.5, because there were only a total of nine items with difficulty estimates larger than 0.5 logits.


[Person-item map: <More able persons> | <More difficult items>; items of Form A on the left, items of Form B on the right]

Figure 22. Person-item map of the equivalent forms for the part of speech section

[Person-item map: <More able persons> | <More difficult items>; items of Form A on the left, items of Form B on the right]

Figure 23. Person-item map of the equivalent forms for the contextual clue section

[Person-item map: <More able persons> | <More difficult items>; items of Form A on the left, items of Form B on the right]

Figure 24. Person-item map of the equivalent forms for the meaning section

The person-item maps in Figures 22-24 indicate that the item difficulties are evenly distributed between Forms A and B for the three sections. In order to statistically examine the homogeneity of variance of item difficulty between the two forms, Levene’s test was performed. The results showed that the null hypothesis of equal variances was not rejected for the three sections (F = 2.18, p = .148 for the part of speech section; F = 1.81, p = .187 for the contextual clue section; and F = 0.00, p = .957 for the meaning section), indicating that the spread of item difficulties may be acceptably equal between the two forms. Subsequent t-tests (2-tailed) did not detect any significant differences in the mean item difficulties between the two forms for any of the three sections (Table 33). The effect sizes (r) were smaller than .20, which indicates small differences between the two forms (Cohen, 1988, 1992).
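For readers unfamiliar with Levene's test, the statistic can be sketched as below. This is a minimal illustration, not the analysis actually run: the text does not say whether deviations were taken from the group mean or the group median, so the mean-centred version is assumed here, and the two small data sets are hypothetical rather than the item difficulties of Forms A and B. The function name is likewise an assumption.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic based on absolute deviations from group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical difficulties: the second set is twice as spread out as the first.
narrow = [-1.0, -0.5, 0.0, 0.5, 1.0]
wide = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = levene_w(narrow, wide)
print(round(w, 3))
```

The resulting W is referred to an F distribution with (k - 1, N - k) degrees of freedom, which for two 20-item forms would be (1, 38).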


Table 33. Comparison of the item difficulty between the two equivalent forms

                    Form A           Form B
Section             M       SD       M       SD       t       d.f.    p       r
Part of speech      -0.06   1.12     0.01    1.41     -0.17   38      .866    .027
Contextual clue     -0.11   0.46     0.03    0.69     -0.78   38      .440    .119
Meaning              0.07   0.73     0.07    0.74     -0.01   38      .995    .001
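The t values and effect sizes in Table 33 can be approximately recovered from the reported means and standard deviations. The sketch below assumes an equal-variance independent-samples t-test with 20 items per form (consistent with d.f. = 38) and the conversion r = sqrt(t^2 / (t^2 + d.f.)); the text does not state the effect size formula, so that conversion is an assumption, as are the function and variable names. Because the inputs are rounded to two decimals, the results match the table only approximately.

```python
import math

# (mean A, SD A, mean B, SD B) as reported in Table 33.
TABLE_33 = {
    "Part of speech": (-0.06, 1.12, 0.01, 1.41),
    "Contextual clue": (-0.11, 0.46, 0.03, 0.69),
    "Meaning": (0.07, 0.73, 0.07, 0.74),
}
N_PER_FORM = 20  # each form has 20 items


def t_and_r(mean_a, sd_a, mean_b, sd_b, n_a=N_PER_FORM, n_b=N_PER_FORM):
    """Pooled-variance t statistic, degrees of freedom, and effect size r."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / standard_error
    r = math.sqrt(t ** 2 / (t ** 2 + df))
    return t, df, r


results = {name: t_and_r(*row) for name, row in TABLE_33.items()}
for name, (t, df, r) in results.items():
    print(f"{name}: t = {t:.2f}, d.f. = {df}, r = {r:.3f}")
```

Under these assumptions all three effect sizes fall below .20, in line with the interpretation given above.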

Taken together, the two forms may be representative of the construct being measured, and may be equivalent in the mean and the spread of item difficulties. (See Appendix F for the two new forms of the GCT.)

In document Diagnostic Tests of English Vocabulary Learning Proficiency: Guessing From Context and Knowledge of Word Parts (Page 153-158)