
CHAPTER 4: VALIDATION OF THE GUESSING FROM CONTEXT TEST

4.8 Discussion

Previous studies (Bruton & Samuda, 1981; Clarke & Nation, 1980; Williams, 1985) proposed guessing-from-context strategies from a pedagogical perspective, and these strategies essentially included the three aspects measured by the GCT. However, no attempt has been made to examine empirically whether identifying the part of speech of the unknown word and finding a contextual clue really contribute to deriving its meaning. As the present research measured the three aspects of guessing from context using the GCT, this issue may be addressed by investigating the interrelationships among the three aspects. To this end, a multiple regression analysis was performed with the Rasch person ability estimates from the meaning section as the dependent variable and the Rasch person ability estimates from the part of speech section and from the contextual clue section as the independent variables. A path diagram of the results is presented in Figure 27, which is the same as the one presented in Figure 18. This figure shows that both the ability to identify the part of speech of the unknown word (β = .32) and the ability to find a contextual clue (β = .44) significantly contribute to the ability to derive its meaning. Together, the two abilities accounted for nearly half of the variance in the ability to derive the meaning (R² = .45). Given that guessing involves many other factors such as reading ability and world knowledge, this coefficient of determination may be considered high. Taken together, the results showed that both identifying the part of speech of the unknown word and finding a contextual clue to help guess meaning play an important role in deriving meaning, indicating the effectiveness of the guessing strategies proposed by the previous studies.
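The reported coefficients can be checked for internal consistency with a short calculation. The sketch below is illustrative only: it assumes that the value .55 shown in Figure 27 is the correlation between the part of speech and contextual clue ability estimates (an inference from the diagram, not something stated in the text), and the function and variable names are hypothetical. For a regression with two standardized predictors, R² = β₁² + β₂² + 2β₁β₂r₁₂.

```python
# Standardized coefficients as reported in the text and Figure 27.
beta_pos = 0.32    # part of speech -> meaning
beta_clue = 0.44   # contextual clue -> meaning

# ASSUMPTION: the .55 in Figure 27 is the correlation between the two predictors.
r_pos_clue = 0.55


def implied_r_squared(b1, b2, r12):
    """R^2 of a two-predictor regression with standardized coefficients b1, b2
    and predictor intercorrelation r12."""
    return b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r12


def implied_zero_order(b1, b2, r12):
    """Zero-order correlations of each predictor with the outcome that are
    implied by the standardized coefficients."""
    return b1 + b2 * r12, b2 + b1 * r12


r_squared = implied_r_squared(beta_pos, beta_clue, r_pos_clue)
r_pos_meaning, r_clue_meaning = implied_zero_order(beta_pos, beta_clue, r_pos_clue)

print(round(r_squared, 2))        # 0.45, matching the reported R^2
print(round(r_pos_meaning, 3))    # implied r(part of speech, meaning)
print(round(r_clue_meaning, 3))   # implied r(contextual clue, meaning)
```

Under this assumption the three coefficients reproduce the reported R² of .45, which supports reading .55 as the predictor intercorrelation. The implied zero-order correlations are derived quantities and are not reported in the source.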

One of the important features of the GCT is its comprehensiveness (measuring multiple aspects of the skill of guessing). This is in line with recent L2 vocabulary studies (Schmitt, 1998; Webb, 2005, 2007a, 2007b, 2007c), which have underscored the importance of measuring multiple aspects of vocabulary knowledge because different tasks may have varying effects on aspects of vocabulary knowledge. The GCT presupposes that different tasks and teaching materials may result in the development of different aspects of guessing skill and of the ability to derive meaning. For example, instruction in contextual clues may improve scores on the contextual clue section and lead to an improvement in the ability to derive meaning. Grammar instruction may improve scores on the part of speech section and likewise contribute to an improvement in the ability to derive meaning. The introduction of the guessing strategies may also raise learners’ awareness of the importance of identifying the part of speech and looking for contextual clues, which may lead to an improvement in guessing. The GCT may thus contribute to effective and efficient teaching of the skill of guessing from context.

[Path diagram: Part of speech → Meaning, β = .32*; Contextual clue → Meaning, β = .44*; Part of speech ↔ Contextual clue, .55*; R² = .45*. *p < .05]

Figure 27. Relationships of the part of speech and the contextual clue sections to the meaning section


4.9 Summary

This chapter aimed to validate the GCT so that it would be widely available to researchers, teachers, and learners. To this end, 428 Japanese learners of English with a wide range of proficiency levels participated in the present research. Six forms, each with 20 items, were created in a paper-based format and were randomly distributed to the participants. Rasch analysis showed that lucky guessing (unexpected success on difficult items by persons with low ability) was observed in the part of speech section, but not in the contextual clue and the meaning sections; thus, the responses in the part of speech section were corrected for lucky guessing. Rasch analysis also revealed that 49 out of 60 items were acceptable. The validity of the GCT with the 49 acceptable items was investigated from eight aspects of construct validity (content, substantive, structural, generalizability, external, consequential, responsiveness, and interpretability) in order to provide comprehensive logical argumentation and empirical evidence to support its validity. Table 35 summarises the evidence provided for the validity argument. On the whole, both the logical argumentation and the empirical evidence indicated a high degree of validity. The validity argument also revealed the following three points to note:

1. Further evidence may still be needed for item calibration invariance and person measure invariance in the generalizability aspect because the small sample size may have affected the results.

2. The part of speech section may not be responsive (or sensitive) enough to detect able persons’ gains from an experimental intervention because of a ceiling effect.

3. Five items with unacceptable Rasch measurement statistics should be monitored in future use of the GCT. A close look at the passages and the options did not reveal any problems with these items, so whether they work well needs to be examined further. Table 36 presents these five items.


Two equivalent forms were created, each with 20 items covering a wide spread of difficulty. Twenty items per form were used so that each form would yield person strata greater than 2, which is the minimum requirement for a responsive test. These new forms are useful for research involving a pre- and post-test design. They are also useful for teachers and learners because the results may provide learners with diagnostic feedback on their weaknesses in guessing from context. The scores obtained from the GCT are highly interpretable for both norm- and criterion-referenced purposes in the context of Rasch measurement. For more convenient interpretation, conversion tables (Tables 31 and 34) between raw scores and Rasch person ability estimates are provided. The scores may be effectively reported to learners using a bar graph that presents learners’ weaknesses visually. Taken together, it is reasonable to conclude that the GCT is a highly valid measure for assessing the skill of guessing from context and is useful for both research and practical purposes.
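The person strata criterion mentioned above can be illustrated with a short calculation. The sketch below assumes the formulas conventionally used in Rasch measurement, in which the separation index is G = sqrt(R / (1 − R)) for a reliability R and the number of strata is (4G + 1) / 3. The source does not state which formula was applied, and the reliability value used here is hypothetical rather than a GCT result.

```python
import math


def separation_from_reliability(reliability):
    """Rasch separation index G = sqrt(R / (1 - R)) for a reliability R in [0, 1)."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))


def strata(separation):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4 * separation + 1) / 3


# HYPOTHETICAL person reliability, chosen only to illustrate the arithmetic.
hypothetical_reliability = 0.80

g = separation_from_reliability(hypothetical_reliability)  # about 2.0
h = strata(g)                                              # about 3.0

# Separation needed for exactly 2 strata: (4G + 1) / 3 = 2  =>  G = 1.25
min_separation = (3 * 2 - 1) / 4

print(round(g, 2), round(h, 2), min_separation)
```

On these assumptions, person strata greater than 2 correspond to a person separation index above 1.25, or equivalently a person reliability above roughly .61.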


Table 35. Summary of evidence provided for the GCT

Aspects           Sub-category                                  Evidence provided
Content           1. Content relevance                          Test specifications
                  2. Representativeness                         Rasch item strata; Rasch person-item map
                  3. Technical quality                          Rasch item fit analysis
                  4. Expert judgments                           Reviews by English teachers and PhD students in applied linguistics
Substantive                                                     Test of difficulty hypotheses; Rasch person fit analysis
Structural                                                      Dimensionality analysis
Generalizability  1. Item calibration invariance                DIF analysis for gender and L1
                  2. Person measure invariance                  DPF analysis for item order
                  3. Reliability                                Rasch person separation and reliability; Rasch item separation and reliability
                  4. Invariance across administrative contexts  Comparison between person ability estimates from a 20-item form and the missing data design
External                                                        Correlation with the productive version of the GCT; Correlation with TOEIC scores
Consequential                                                   Analysis of sources of invalidity; Item bias
Responsiveness                                                  Person-item map (ceiling effects); Person strata
Interpretability                                                Person-item map; Conversion of raw scores and Rasch person ability estimates

Table 36. Summary of items that need inspecting for future use of the GCT

Item No.  Section  Reason for future inspection
5         M        Technical quality (underfit)
57        M        Technical quality (underfit)
24        P        DIF analysis (gender)
58        C        DIF analysis (gender)
30        M        DIF analysis (gender)
