In the following two tables, Spearman's rho rank-order correlation coefficient is used to show how closely the methods are related to one another, that is, how far they produce similar rankings of the texts within each series. The critical value of rho is 0.89. It has already been shown that only M5 produces similar results across both series, but it is also worthwhile to see which methods agree with one another when applied to the same series. Table 6.7 shows the correlations between the different methods used to rank the Blue series of texts.
Table 6.7 Blue series: Intercorrelation matrix of Methods M1 – M7
Method    M1 Rank  M2 Rank  M3 Rank  M4 Rank  M5 Rank  M6 Rank
M2 Rank   -.09
M3 Rank   -.14      .94*
M4 Rank   -.54      .09     -.09
M5 Rank    .09      .37      .31     -.09
M6 Rank   -.43     -.26     -.37      .89*     .03
M7 Rank   -.14      .94*    1.00*    -.09      .31     -.37
The critical r (n=7) = 0.89, p < .05
For the Blue series, the statistically significant correlations are those of M7 with M2 (.94) and M3 (1.00). M7 and M2 both combine semantic and syntactic proxies, which may explain the strength of their relationship. M3 also correlates strongly with M2 (.94), probably because the two methods share the element of function words. The correlation between M4 and M6 (.89) was also to be expected, because both contain a measure of average sentence length. M1 (vocabulary index) is the method by which this series of texts was selected. The correlations show that M1 has no significant relationship with any of the other methods used.
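The way an intercorrelation matrix of this kind is built can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes that each method yields one rank order per series, it handles ties by average ranks (the study does not state how ties were treated), and it compares rho rounded to two decimals against the critical value of 0.89. The function names and the example rank orders are hypothetical and are not data from the study.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def intercorrelations(rankings, critical=0.89):
    """Return {(a, b): (rho, significant)} for every pair of methods.

    Assumption: significance is judged on |rho| rounded to two decimals.
    """
    result = {}
    for a, b in combinations(sorted(rankings), 2):
        rho = spearman_rho(rankings[a], rankings[b])
        result[(a, b)] = (rho, round(abs(rho), 2) >= critical)
    return result


# Hypothetical rank orders of six texts under three methods (not study data).
example = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 2, 3, 4, 6, 5],
    "C": [6, 5, 4, 3, 2, 1],
}
matrix = intercorrelations(example)
```

In this hypothetical example, methods A and B differ only in the order of the last two texts, so their correlation is high, whereas A and C are exact reversals of each other.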
Table 6.8 shows the correlations between the different methods used to rank the Orange series of texts.
Table 6.8 Orange series: Intercorrelation matrix of Methods M1 – M7
Method    M1 Rank  M2 Rank  M3 Rank  M4 Rank  M5 Rank  M6 Rank
M2 Rank    .09
M3 Rank    .09     1.00*
M4 Rank    .03      .94*     .94*
M5 Rank    .13      .92*     .92*     .85
M6 Rank    .03      .94*     .94*    1.00*     .85
M7 Rank    .07     1.00*    1.00*     .94*     .93*     .94*
The critical r (n=7) = 0.89, p < .05
The Orange series also showed strong correlations for M7 with methods M2 and M3 and, in addition, with M5. This could be a result of these four methods using every word type in the measurement and therefore representing both semantic and syntactic proxies. M2, M3, and M7 correlate significantly with every other method except M1, which is the vocabulary index alone. M5 correlates significantly with M2, M3, and M7, while its correlations with M4 and M6 (.85) fall just below the critical value. The strong association of M4 with M6 was shown again for this series (1.00). Overall, M2, M3, M5, and M7 produce results similar to those of the other methods, with the exception of M1. Across both series of texts, M1 has shown no significant correlation with the other methods used.
6.8 Summary
This chapter has outlined five further methods of measuring text difficulty and has shown that the texts in the two series occupy different rank orders when different methods are applied. The main finding to emerge from exploring these methods as a group was that the Blue series of texts was less stable across differing measurements of text difficulty. In contrast, the Orange series maintained a reasonably consistent rank order when the various methods were applied, with the exception of M1. This suggests that the Blue series, which was selected on vocabulary load alone, was relatively unstable. The other main finding of this section was that the only method able to produce a rank order similar to the original method of selection across the two series was M5, mean standardized type:token ratio. Method M7 correlated very closely with M2, M3, and M5, which, together with the added benefits of ease of calculation and the inclusion of all words in the text, would make M7 the method of preference.
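For readers unfamiliar with the measure behind M5, a mean standardized type:token ratio is commonly obtained by computing the type:token ratio over consecutive segments of equal length and averaging the results, which reduces the effect of text length on the ratio. The sketch below follows that common formulation and is not necessarily the exact procedure used in this study. The segment size, the tokenizing pattern, and the function names are assumptions made for illustration.

```python
import re


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)


def mean_standardized_ttr(text, segment_size=100):
    """Average the type:token ratio over consecutive full segments.

    segment_size is a hypothetical choice, not a value taken from the study.
    A trailing partial segment is dropped so every ratio uses the same length.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    segments = [
        tokens[i:i + segment_size]
        for i in range(0, len(tokens) - segment_size + 1, segment_size)
    ]
    if not segments:
        # Text shorter than one segment: fall back to the plain ratio.
        return type_token_ratio(tokens) if tokens else 0.0
    return sum(type_token_ratio(s) for s in segments) / len(segments)


# Hypothetical sample sentence, with a tiny segment size for demonstration.
sample = "the cat sat on the mat and the dog sat on the log"
score = mean_standardized_ttr(sample, segment_size=5)
```

With a segment size of five, the sample yields two full segments; the last three words form a partial segment and are left out of the average.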
The process of applying various methods to measure the difficulty of texts has shown the wide range of challenge that can be built into the linguistic components of a text. Writers can use simple vocabulary but write very long sentences or use a very complex syntactic style. Conversely, a text with an even spread of difficulty across its features will have vocabulary, sentence length, syntactic complexity, and lexical richness all falling within a close range of difficulty. By giving a global view of the challenge that the components of a text may pose, this part of the study has shown how well rounded and balanced the aspects of text difficulty need to be. This suggests that valuable feedback can be provided to authors on a range of important aspects of texts written specifically to meet levelling criteria using a controlled vocabulary approach. This has implications for monitoring the make-up of a text at the early stages of levelling, should this be desired.
Chapter Seven: Phase Four
Establishing Criterion Measures of Text Difficulty
Following on from phase three, the six methods used to rank the texts were now ready for validation against criterion measures. Phase four established the criterion measures to be used and identified the strongest of them. The criterion measures developed in this phase are teacher rankings, student rankings, combined rankings, and student performance. In addition, teachers completed a questionnaire, which is reported on separately in Chapter 9. While the criterion measures were being established, the participating teachers and students had no knowledge of the rankings or of the methods by which the texts had been ranked. The procedures for collecting reader opinion and student performance to establish the criterion measures are described in this chapter.