
5.2 Results

5.2.2 Comparison of Scoring Methods

5.2.2.2 Cases With High Correlation Between Abilities

Discussion), a criterion is required to define good performance of these methods. As stated before, a high positive correlation indicates that a method performs well, but the size of the correlation that marks the cut-off point for good performance is somewhat arbitrary. Because the reliability of the data is defined as the squared correlation between true and observed scores, the conventions for reliability may be used as such a criterion. Consistent with current standards in psychological research, good reliability is usually defined as at least .80; a correlation of r = .90 or above implies a reliability of at least .81. In addition to this rather strict criterion, satisfactory values for reliability (.70) were also considered; a correlation of r = .84 or above implies a reliability of at least .70. In the following, cases with correlations of r ≥ .84 and r ≥ .90 are considered in more detail for each scoring method. However, HOMALS scoring is not investigated any further in this paragraph, as positive correlations for this method did not exceed r = .29 or r = .38. In addition, NRM V2 is not considered any further for the same reason. The following paragraph explicates cases for consensus-based scoring methods with regard to the most important influencing variables. The case analysis for the NRM V1 and V3 studies is described separately afterward, as different influencing variables were important here.
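The link between the two correlation cut-offs and the reliability conventions can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming only the definition given above (reliability as the squared correlation between true and observed scores); the function names are illustrative and not taken from the study.

```python
import math


def reliability_from_correlation(r: float) -> float:
    """Reliability implied by a true/observed score correlation r (r squared)."""
    return r ** 2


def min_correlation_for_reliability(reliability: float) -> float:
    """Smallest positive correlation that yields the given reliability."""
    return math.sqrt(reliability)


# Cut-offs used in the text (labels are illustrative).
cutoffs = {"satisfactory": 0.84, "good": 0.90}
implied = {label: reliability_from_correlation(r) for label, r in cutoffs.items()}

for label, r in cutoffs.items():
    print(f"{label}: r = {r:.2f} implies reliability = {implied[label]:.4f}")
```

Read in the other direction, a reliability of .70 corresponds to a correlation of about .84 and a reliability of .80 to a correlation just below .90, which is consistent with the two cut-offs chosen.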

Consensus-based Scoring Methods. Tables 29 and 30 provide an overview of the number of conditions in which the correlations between true and estimated abilities were equal to or higher than the criterion values of .84 and .90. More conditions with these values were observed for five-categorical data than for two-categorical data. In addition, for two-categorical data, more conditions with high correlations were observed for Consensus Analysis than for the CBM methods, whereas for five-categorical data Consensus Analysis and the CBM methods (proportion CBM in particular) did not differ substantially.

Table 29

Number of Conditions with Correlations ≥ .84 for Consensus-based Scoring Methods

                    2-categorical data          5-categorical data
Method              Absolute Frequency  %       Absolute Frequency  %       Total
Mode CBM            38                  14.07   78                  28.89   270
Proportion CBM      39                  14.44   87                  32.22   270
Consensus Analysis  48                  22.22   71                  32.87   216

Because ability and difficulty were found to have a non-trivial influence on the dependent variable, Table 31 contains the relative frequencies with which correlations of at least .84 were observed in each factor level combination. Factor level combinations for low ability are omitted because no high correlations were observed here. Table 32 presents these frequencies for a correlation of at least .90. The frequencies indicate how many high correlations were observed in combinations of the factor levels of ability and difficulty and hence indicate in which situations consensus-based scoring methods worked best.
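How such relative frequencies can be obtained from condition-level results is sketched below. This is a minimal illustration, not the procedure used in the study: the record layout, the function name, and all correlation values are hypothetical and serve only to show the tabulation step.

```python
from collections import defaultdict


def relative_frequencies(records, cutoff):
    """Share of conditions with correlation >= cutoff per (ability, difficulty) cell.

    `records` is an iterable of (ability, difficulty, correlation) tuples,
    one tuple per simulated condition (hypothetical layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ability, difficulty, r in records:
        key = (ability, difficulty)
        totals[key] += 1
        if r >= cutoff:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical condition-level correlations, invented for illustration only.
records = [
    ("high", "[e,e,m]", 0.91),
    ("high", "[e,e,m]", 0.86),
    ("high", "[e,e,m]", 0.79),
    ("medium", "[e,e,m]", 0.85),
    ("medium", "[e,e,m]", 0.70),
    ("medium", "[d,d,d]", 0.40),
]

freq_84 = relative_frequencies(records, cutoff=0.84)
freq_90 = relative_frequencies(records, cutoff=0.90)
```

Each resulting value corresponds to one cell of a table such as Table 31 or Table 32: the proportion of conditions within a given ability and difficulty combination whose correlation reached the cut-off.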

As can be seen from Tables 31 and 32, five-categorical data in general yielded more cases with high correlations between true abilities and sum scores based on scored data. Comparing the relative frequencies for high and medium ability conditions shows the main effect of ability for all consensus-based scoring methods; that is, high ability conditions yielded more cases with high correlations than medium ability conditions. For medium ability, only conditions with the easy difficulty combinations [e,e,e] and [e,e,m] yielded high correlations. In contrast, high correlations were also observed for high ability with difficult items, apart from the combination [d,d,d]. The main effect of difficulty becomes apparent when different difficulty combinations are compared: for combinations with easy items, more high correlations were observed than for combinations with difficult items.

Table 30

Number of Conditions with Correlations ≥ .90 for Consensus-based Scoring Methods

                    2-categorical data          5-categorical data
Method              Absolute Frequency  %       Absolute Frequency  %       Total
Mode CBM            10                  3.70    43                  15.93   270
Proportion CBM      11                  4.07    54                  20.00   270
Consensus Analysis  20                  9.26    51                  23.61   216
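The percentage columns in Tables 29 and 30 appear to be the absolute frequencies divided by the Total column (270 conditions for the CBM methods, 216 for Consensus Analysis). The short sketch below recomputes them from the counts transcribed from the two tables; the variable and function names are illustrative.

```python
# Counts transcribed from Tables 29 and 30:
# method -> (2-categorical count, 5-categorical count, total conditions)
TABLE_29 = {
    "Mode CBM": (38, 78, 270),
    "Proportion CBM": (39, 87, 270),
    "Consensus Analysis": (48, 71, 216),
}
TABLE_30 = {
    "Mode CBM": (10, 43, 270),
    "Proportion CBM": (11, 54, 270),
    "Consensus Analysis": (20, 51, 216),
}


def percentages(table):
    """Percentage of conditions meeting the cut-off, relative to each method's total."""
    return {
        method: (round(100 * two_cat / total, 2), round(100 * five_cat / total, 2))
        for method, (two_cat, five_cat, total) in table.items()
    }


pct_84 = percentages(TABLE_29)
pct_90 = percentages(TABLE_30)
```

Because the totals differ between methods, comparisons across methods are better made on the percentages than on the absolute frequencies.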

Comparing the relative frequencies for mode and proportion CBM reveals some differences between these methods: proportion CBM provided more cases that met the criteria. However, a clear pattern of differences with regard to the factor level combinations was not identifiable. In comparison to CBM scoring, Consensus Analysis scoring appeared to perform slightly better, as more conditions yielded high correlations, in particular with the cut-off of r = .84. Differences were more pronounced for difficulty combinations with mixed difficulties than for conditions with equal difficulties (with high ability only). In other words, the number of conditions with high correlations was very similar for all consensus-based scoring methods for the difficulty combinations [e,e,e], [m,m,m] and [d,d,d]. However, differences appeared for the difficulty combinations [e,e,m], [e,m,d] and [d,d,e], in which Consensus Analysis performed better overall. Moreover, Consensus Analysis yielded high correlations in conditions with high ability and mostly difficult items [d,d,e], a condition in which no high correlations were observed for the CBM methods. Similarly, for the difficulty combination [e,m,d], Consensus Analysis provided more cases in which correlations reached the cut-off points of r = .84 and r = .90. Hence, Consensus Analysis scoring showed advantages in conditions with difficult items. However, for only difficult items [d,d,d], no scoring method yielded high positive correlations.

To summarize, high correlations between true and re-estimated abilities were only observed in specific conditions characterized by combinations of ability and difficulty where mean person ability exceeded mean item difficulty. More specifically, the best results were observed for the high ability group with easy or medium-difficulty items.

Table 31

Relative Frequencies of Observed Correlation ≥ .84 for Factor Level Combinations of Ability and Difficulty

Two-categorical data
            Consensus Analysis  Mode CBM            Proportion CBM
            Ability             Ability             Ability
Difficulty  0         1         0         1         0         1
[e, e, e]   .67       .33       .67       .33       .67       .33
[e, e, m]   .67       .67       .27       .60       .33       .33
[e, m, d]   .00       .58       .00       .00       .00       .27
[m, m, m]   .00       .67       .00       .67       .00       .67
[d, d, e]   .00       .42       .00       .00       .00       .00
[d, d, d]   .00       .00       .00       .00       .00       .00

Five-categorical data
            Consensus Analysis  Mode CBM            Proportion CBM
            Ability             Ability             Ability
Difficulty  0         1         0         1         0         1
[e, e, e]   1.00      1.00      1.00      1.00      1.00      .93
[e, e, m]   1.00      1.00      .60       1.00      .93       1.00
[e, m, d]   .00       1.00      .00       .60       .00       .73
[m, m, m]   .00       1.00      .00       1.00      .00       1.00
[d, d, e]   .00       .92       .00       .00       .00       .20
[d, d, d]   .00       .00       .00       .00       .00       .00

Table 32

Relative Frequencies of Observed Correlation ≥ .90 for Factor Level Combinations of Ability and Difficulty

Two-categorical data
            Consensus Analysis  Mode CBM            Proportion CBM
            Ability             Ability             Ability
Difficulty  0         1         0         1         0         1
[e, e, e]   .33       .00       .33       .00       .33       .00
[e, e, m]   .33       .08       .00       .00       .07       .00
[e, m, d]   .00       .33       .00       .00       .00       .00
[m, m, m]   .00       .33       .00       .33       .00       .33
[d, d, e]   .00       .25       .00       .00       .00       .00
[d, d, d]   .00       .00       .00       .00       .00       .00

Five-categorical data
            Consensus Analysis  Mode CBM            Proportion CBM
            Ability             Ability             Ability
Difficulty  0         1         0         1         0         1
[e, e, e]   .67       .33       .67       .33       .93       .33
[e, e, m]   .67       .67       .27       .67       .53       .60
[e, m, d]   .00       .67       .00       .27       .00       .27
[m, m, m]   .00       .67       .00       .67       .00       .93
[d, d, e]   .00       .58       .00       .00       .00       .00
[d, d, d]   .00       .00       .00       .00       .00       .00

Nominal Response Model. For NRM V1, 100% of the conditions revealed a correlation between true abilities and NRM ability estimates of ≥ .84. In 95.4% of the conditions, a correlation of ≥ .90 was observed. The five cases in which the correlation was lower than .90 were observed for the two conditions with low ability and difficulty combination [d,d,d], the two conditions with high ability and difficulty combination [e,e,e], and one of the two conditions with high ability and difficulty combination [e,e,m].

For NRM V3, 101 of 108 (93.5%) conditions revealed a correlation between true abilities and NRM ability estimates of ≥ .84, and 90 of 108 (83.3%) revealed a correlation of ≥ .90. Again, the small main effects of number of variables, ability, and difficulty, as well as the interaction of ability and difficulty, were apparent.

Correlations lower than .84 were only observed for low ability in combination with the high difficulty combinations, that is, [d,d,e] and [d,d,d]. However, with an increasing number of variables, the number of cases with correlations smaller than .84 decreased for these conditions. With twelve variables, no correlation of ≥ .84 was observed for low ability and the difficulty combinations [d,d,e] and [d,d,d], whereas with a higher number of variables more cases reached the criterion. With 24 variables, only the combination of low ability and difficulty combination [d,d,d] revealed correlations lower than .84, whereas the combination of low ability and difficulty combination [d,d,e] yielded correlations equal to or above the cut-off point. With 48 variables, one of the two conditions with low ability and difficulty combination [d,d,d] revealed a correlation smaller than .84.

Results are similar for the stricter criterion: for each level of the independent variable number of variables (twelve, 24, or 48 variables), the combination of low ability and difficult items [d,d,d] revealed correlations smaller than .90. In addition, with twelve variables, correlations were lower than .90 for low ability combined with [e,m,d], [m,m,m], and [d,d,e], as well as for medium ability with only difficult items ([d,d,d]). Moreover, with high ability and the difficulty combinations [e,e,e] and [e,e,m], the correlations were smaller than .90. In all other combinations, the criterion was reached and correlations were higher than .90.

5.2.3 Summary of Results. The results for the HOMALS scoring method are

In document An Investigation of Empirical Scoring Methods for Ability Measurement (Page 161-167)