Consensus Analysis - Individual Analysis of Scoring Methods

5.2 Results

5.2.1 Individual Analysis of Scoring Methods

5.2.1.3 Consensus Analysis

Euclidean Distance. The total mean Euclidean distance between the true scoring key and the scoring key resulting from Consensus Analysis was M = 2.59 (SD = 2.32), ranging from a minimum of zero to a maximum of 6.93 for two-categorical data. For five-categorical data, this dependent variable ranged from a minimum of zero to a maximum of 27.71 with a total mean of M = 10.59 (SD = 9.20).

Table 15

Descriptive Statistics for Euclidean Distance for Factor Levels of the Independent Variable Number of Variables (Consensus Analysis)

Data               Number of variables       M      SD   Minimum   Maximum
two-categorical                     12    1.76    1.45         0      3.46
                                    24    2.49    2.05         0      4.90
                                    48    3.52    2.91         0      6.93
five-categorical                    12    7.12    5.74         0     13.86
                                    24   10.17    8.10         0     19.60
                                    48   14.48   11.40         0     27.71

Note. The descriptive statistics are collapsed over all factor levels of the other independent variables.

Within conditions, mean variation was M = .69 (SD = 1.00), ranging from zero to 3.46, for two-categorical data and M = 2.29 (SD = 3.47), ranging from zero to 13.77, for five-categorical data. Table 15 presents the descriptive statistics for each factor level of the independent variable number of variables for both two- and five-categorical data. Again, the possible minimum as well as maximum values (see paragraph 5.1.2.2) for the Euclidean distance were observed in some conditions. Hence, the scoring method performed very well in some experimental conditions, whereas in other conditions scoring keys did not converge at all.
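The observed maxima in Table 15 are consistent with a simple bound. Assuming (the definition itself is given in paragraph 5.1.2.2 and not restated here) that scoring keys are coded as integer categories and that the distance is the square root of the summed squared differences over the m items, the largest value arises when every item is off by the full category range k − 1. The symbols m and k are introduced here for illustration only.

```latex
d_{\max} = \sqrt{\sum_{i=1}^{m} (k-1)^2} = (k-1)\sqrt{m}
% k = 2: \sqrt{12} \approx 3.46,  \sqrt{24} \approx 4.90,   \sqrt{48} \approx 6.93
% k = 5: 4\sqrt{12} \approx 13.86, 4\sqrt{24} \approx 19.60, 4\sqrt{48} \approx 27.71
```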

Table 16

Four-way ANOVA Results for Euclidean Distance (Consensus Analysis)

                            2-categorical data          5-categorical data
Source                          SS   df   η²                SS   df   η²
M: No. of variables     1121732.69    2  .08       19676059.63    2  .09
N: No. of respondents     17540.77    3  .00         101215.00    3  .00
D: Difficulty           2537545.42    5  .17       41521697.21    5  .19
A: Ability              5840602.22    2  .39       88929858.31    2  .41
M × N                      1589.39    6  .00          18099.20    6  .00
M × D                    196638.52   10  .01        3361844.13   10  .02
M × A                    464056.74    4  .03        6651046.16    4  .03
N × D                      7859.99   15  .00          54746.54   15  .00
N × A                     19293.95    6  .00         268494.40    6  .00
D × A                   1265027.72   10  .09       19163983.76   10  .09
Within                  3177769.58       .21       37240008.92       .17
Total                  14794711.97                219217767.98

Note. Table includes sum of squares, degrees of freedom, and effect sizes for main effects and first order interactions.

Results of the analysis of variance in Table 16 reveal that Euclidean distance varied mostly with ability of respondents (η² = .39 and η² = .41) for both two- and five-categorical data. [...] number of variables (η² = .08 and η² = .09) and the interaction of difficulty and ability (η² = .09 and η² = .09) yielded medium effects and accounted for variation in the dependent variable. Very small and small effects were also present for interactions of number of variables with both ability and difficulty.
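The effect sizes in Table 16 appear to be classical η², that is, each sum of squares divided by the total sum of squares. This is an inference from the reported numbers rather than a statement in the text. A minimal Python sketch, with dictionary and function names chosen here for illustration, reproduces selected values for two-categorical data:

```python
# Sums of squares copied from Table 16 (two-categorical data).
SS_TWO_CAT = {
    "M: No. of variables": 1121732.69,
    "N: No. of respondents": 17540.77,
    "D: Difficulty": 2537545.42,
    "A: Ability": 5840602.22,
    "D x A": 1265027.72,
    "Within": 3177769.58,
}
SS_TOTAL_TWO_CAT = 14794711.97


def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Classical eta squared: share of total variation attributed to an effect.

    Assumption: the table reports SS_effect / SS_total (not partial eta squared).
    """
    return ss_effect / ss_total


for source, ss in SS_TWO_CAT.items():
    print(f"{source:<22} eta^2 = {eta_squared(ss, SS_TOTAL_TWO_CAT):.2f}")
```

The listed sums of squares do not add up to the total, presumably because higher-order interactions are omitted from the table, so the shares printed here do not sum to one.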

The direction of effects is identical for two- and five-categorical data, which is why results are presented graphically only for five-categorical data. Main and interaction effects for two-categorical data are presented in Appendix C, Figures 49 and 50. As the graphical presentation of main effects in Figure 10 shows, Euclidean distance increased with number of variables, was largely unaffected by number of respondents, increased with difficulty, and decreased with higher ability. That is, for five-categorical data, distance was comparatively low for difficulty combination [e,e,e] (M = 3.74), whereas it was markedly higher for difficulty combination [d,d,d] (M = 17.13). Similarly, mean Euclidean distance was lower for high ability (M = 3.65) than for low ability (M = 18.36).

[Figure 10: four panels, each with Euclidean distance (0 to 30) on the vertical axis. Horizontal axes: number of variables (nvar = 12, 24, 48); number of respondents (N = 20, 50, 100, 200); difficulty ([e,e,e], [e,e,m], [m,m,m], [e,m,d], [d,d,e], [d,d,d]); ability (−1, 0, 1).]

Figure 10. Five-categorical data: Main effects of independent variables on Euclidean distance (Consensus Analysis). Top left panel: Number of variables; top right panel: Number of respondents; bottom left panel: Difficulty; bottom right panel: Ability.

The interaction effects of ability and difficulty (Figure 11) were pronounced. With high ability, Euclidean distance was low for easy and medium difficulty combinations, slightly increased with difficulty combination [d,d,e] and strongly increased with only difficult items. For medium ability, only easy difficulty combinations (i.e., [e,e,e] and [e,e,m]) resulted in low Euclidean distance. For low ability, however, Euclidean distance never approached zero, although it was lower for difficulty combination [e,e,e] compared to the other difficulty combinations. The interaction effects observed for number of variables with both ability and difficulty were again caused by the way Euclidean distance is calculated: Because of squaring, with increasing number of variables, values increase more steeply when the distances are greater.
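The dependence on the number of variables can be illustrated with a small Python sketch. It assumes, hypothetically, that every item's estimated key deviates from the true key by the same amount d, in which case the Euclidean distance reduces to d·√m. The function name, the all-zero true key, and the chosen values of d are illustrative and not taken from the study.

```python
import math


def euclidean_distance(true_key, estimated_key):
    """Square root of the summed squared differences between two scoring keys."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true_key, estimated_key)))


# Hypothetical keys: every item is off by the same per-item deviation d.
for d in (1, 4):
    distances = {}
    for m in (12, 24, 48):  # number of variables, as in the simulation design
        true_key = [0] * m
        estimated_key = [d] * m
        distances[m] = euclidean_distance(true_key, estimated_key)
    growth = distances[48] - distances[12]
    summary = ", ".join(f"m = {m}: {v:.2f}" for m, v in distances.items())
    print(f"d = {d}: {summary} (increase from 12 to 48: {growth:.2f})")
```

Under these assumptions the same change from 12 to 48 variables adds about 3.46 to the distance when d = 1 but about 13.86 when d = 4, which is the kind of fan-shaped pattern that shows up as an interaction with ability and difficulty.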

[Figure 11: three panels, each with Euclidean distance (0 to 30) on the vertical axis. Panel 1: number of variables (12, 24, 48) on the horizontal axis, separate lines for ability (−1, 0, 1). Panel 2: difficulty (eee, eem, emd, mmm, dde, ddd) on the horizontal axis, separate lines for number of variables (12, 24, 48). Panel 3: difficulty (eee, eem, emd, mmm, dde, ddd) on the horizontal axis, separate lines for ability (−1, 0, 1).]

Figure 11. Five-categorical data: Interaction effect of independent variables (Consensus Analysis). Top left panel: Interaction of number of variables and ability; top right panel: Interaction of number of variables and difficulty; bottom left panel: Interaction of ability and difficulty.

Confirmatory Factor Analyses. After the application of Consensus Analysis as a scoring method to score the simulated data, some items showed zero variances; hence these data sets could not be used for the confirmation of the one-dimensional structure. For two-categorical data, the mean number of exclusions due to zero variance was M = 5.08 (SD = 16.79), ranging from a minimum of 0 to a maximum of 115 between conditions. A total of 823 data sets had to be excluded because of zero variance after scoring. These data sets were the same ones that were excluded for CFAs of mode CBM scored data because of zero variance (see paragraph 5.2.1.1). The data sets had zero variances before mode CBM or Consensus Analysis were used, which was apparently most likely for low sample size and certain combinations of difficulty and ability.

For five-categorical data, although no zero variances before scoring were observed, zero variances were observed after application of Consensus Analysis⁶. The number of excluded data sets was fairly high, with a mean of M = 277.67 (SD = 1038.91), ranging from 0 to 6854 between conditions. Here, a total of 44982 data sets had to be excluded because of zero variance after scoring. Zero variances after scoring could only occur if response options that were not chosen by any respondent (or that were chosen by every respondent, which did not occur, as zero variance before scoring was tested) were selected as correct. Obviously, the algorithm would fail in this case. This point will be readdressed in the discussion section.

The number of exclusions because of zero variance varied with the manipulated data characteristics for five-categorical data. Figure 51 in Appendix C presents the number of exclusions after scoring for each factor level. The number of exclusions increased with increasing number of variables, and decreased with increasing sample size. In addition, observations of zero variance were dependent upon item difficulty and ability. The number of exclusions was very low for difficulty combinations [e,e,e], [m,m,m], and [d,d,d], that is, for combinations with uniformly distributed difficulty. However, for mixed difficulty combinations, the number of excluded data sets was fairly high. In addition, the number of exclusions slightly increased with higher ability. Moreover, the number of exclusions was highest for the combination low ability and [e,e,m], medium ability and [e,m,d], as well as high ability and [d,d,e] (see Figure 53 in Appendix C). However, for the remaining data sets most CFAs were successfully conducted. Only for two-categorical data, nine CFAs showed difficulties, as the residual covariance matrix was not positive definite. For five-categorical data, no estimation difficulties were observed.

⁶ This was only registered for N > 20, as CFAs were not conducted for data sets with very low sample size.
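How a keyed option that nobody chose leads to zero variance after scoring can be sketched in Python. The response matrix, the key, and the helper names below are hypothetical. The sketch also assumes dichotomous scoring (1 if the response matches the keyed option, 0 otherwise), which is not spelled out in this section.

```python
from statistics import pvariance


def score_item(responses, keyed_option):
    """Score one item dichotomously: 1 if the response equals the keyed option."""
    return [1 if r == keyed_option else 0 for r in responses]


def zero_variance_items(data, key):
    """Return indices of items whose scored values are constant across respondents.

    data: list of respondents, each a list of chosen options (one per item).
    key:  list with the option treated as correct for each item.
    """
    flagged = []
    for j, keyed_option in enumerate(key):
        scored = score_item([row[j] for row in data], keyed_option)
        if pvariance(scored) == 0:
            flagged.append(j)
    return flagged


# Hypothetical five-categorical responses (options 1..5), 4 respondents x 3 items.
data = [
    [1, 2, 5],
    [1, 3, 5],
    [2, 2, 4],
    [1, 2, 5],
]
# The item at index 1 is keyed on option 4, which no respondent chose.
key = [1, 4, 5]
print(zero_variance_items(data, key))
```

In this toy data every raw item varies across respondents, yet the item keyed on an unchosen option is scored 0 for everyone and is therefore flagged, mirroring the situation described for five-categorical data.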

Table 17

Descriptive Statistics for CFA Results (Consensus Analysis)

                      2-categorical data                     5-categorical data
                   M       SD      Min      Max           M       SD      Min      Max
Chi Square                       55.23  1174.01                         54.88  1977.43
df                                  54     1080                            54     1080
No. p-valᵃ   1021.90  1339.51        2     6533      994.40  1810.98        0    10000
CFI              .93      .07      .69      .99         .97      .05      .68     1.00
RMSEA            .02      .01      .01      .05         .02      .01      .01      .09

Note. Estimator: Weighted Least Squares Mean and Variance Adjusted. Min = Minimum; Max = Maximum. ᵃ Number of p-values p < .05.
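The df values of 54 and 1080 in Table 17 are consistent with a one-factor model fitted to 12 and 48 items, respectively. The following Python sketch counts degrees of freedom under assumptions made here and not stated in the text: the input is the p(p − 1)/2 item correlations, one loading per item is free, the factor variance is fixed, there are no correlated residuals, and thresholds cancel between data and model.

```python
def one_factor_df(n_items: int) -> int:
    """Degrees of freedom of a one-factor CFA under the assumptions above.

    Correlations available: p(p - 1) / 2; free loadings: p.
    """
    correlations = n_items * (n_items - 1) // 2
    free_loadings = n_items
    return correlations - free_loadings


for p in (12, 24, 48):  # numbers of variables used in the simulation design
    print(p, one_factor_df(p))
```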

As shown in Table 17, mean results for WLSMV CFAs are satisfactory for five-categorical data, whereas for two-categorical data CFI values indicate non-satisfactory fit, but RMSEA values indicate good fit. Minimum values of CFI reveal for both two- and five-categorical data that fit was non-satisfactory in some conditions. In fact, the fit indices varied with data characteristics, with number of respondents having the biggest influence on fit indices for two-categorical data (see Appendix C for the graphical presentation). In addition, fit was slightly worse for data sets with a higher number of variables (only with respect to CFI and the χ² test), better for medium ability compared to high and low ability, and comparatively better for difficulty combination [m,m,m] (compared to other difficulty combinations) for two-categorical data. The effects of sample size and number of variables were similar, although less pronounced, for five-categorical data. However, fit was comparatively worse for medium ability compared to high and low ability. Moreover, fit was substantially worse for the difficulty combination [e,m,d]. The effects of ability and difficulty seemed to be more pronounced for five-categorical data compared to two-categorical data.

To summarize, the one-dimensional structure remains relatively stable after scoring and is most substantially influenced by sample size and number of variables. Although fit varied with manipulated data characteristics, the pattern of variation differed from that of Euclidean distance. The influence of the independent variables ability and difficulty seems to be somewhat higher compared to CBM methods; however, it differed between two- and five-categorical data and also from the way both independent variables influenced Euclidean distance. Moreover, for the interpretation of the influence of ability and difficulty, one should bear in mind the CFA drop-out for five-categorical data, which also mainly varied with different factor levels of ability and difficulty.

5.2.1.4 HOMALS. Unlike the other scoring methods, for the application of

In document An Investigation of Empirical Scoring Methods for Ability Measurement (Page 131-137)