Chapter 3 Rotatable Asymmetric Variable Compensation MIRT
3.3 Simulation
Simulation studies are used to examine the parameter recovery of the RAVCM under different true models and to check the effect of rotation between correlation structures. To make the parameter estimates comparable across models, all models are estimated by the same method, namely the Metropolis-Hastings within Gibbs sampler. The choices of prior distributions are also similar: when estimating the CM, NCM, and VCM, the prior distributions of a, b, and θ are set to be normal, the same as the priors of the RAVCM. In all simulations, the ability parameters (θ1, θ2) are generated from a multivariate normal distribution, with the correlation varying across conditions. The a parameters are generated from a truncated normal distribution with mean 1 and variance 1. To preserve the monotonicity of the model, the a parameters are assumed to be nonnegative; if a generated value is negative, it is regenerated. The difficulty parameters (b) are generated from a normal distribution with variance 1.5 and with mean 0 for the CM and mean -1 for the NCM, because of the greater overall difficulty of the NCM (Bolt & Lall, 2003).
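The generating scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the study: the function names, the seed, the unit ability variances, and the choice of one difficulty draw per item are all assumptions, since this section states only the distribution families, the truncation rule, and the means and variances.

```python
import math
import random


def draw_abilities(n_examinees, rho, rng):
    """Draw (theta1, theta2) pairs from a bivariate normal with correlation rho.

    Zero means and unit variances are assumed here; the text only says
    'multivariate normal, with different correlation'.
    """
    s = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n_examinees):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + s * z2))
    return pairs


def draw_discrimination(rng, mean=1.0, sd=1.0):
    """Normal(mean, sd^2) truncated below at 0 by re-drawing negative values."""
    while True:
        a = rng.gauss(mean, sd)
        if a >= 0.0:
            return a


def draw_difficulty(rng, mean):
    """Normal difficulty with variance 1.5 (mean 0 for the CM, -1 for the NCM)."""
    return rng.gauss(mean, math.sqrt(1.5))


rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility
thetas = draw_abilities(500, rho=0.4, rng=rng)
a_params = [(draw_discrimination(rng), draw_discrimination(rng)) for _ in range(15)]
b_cm = [draw_difficulty(rng, mean=0.0) for _ in range(15)]
b_ncm = [draw_difficulty(rng, mean=-1.0) for _ in range(15)]
```

Re-drawing negative values is the rejection form of truncation described in the text; the mean and variance passed to the sampler are those of the normal distribution before truncation.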
The data sets are simulated according to the generated parameters and one of the existing MIRT models. Cronbach's alpha is greater than 0.9 for all data sets, and the proportion of correct answers is approximately 0.5 in all data sets. Four sample sizes are considered: (a) 500 examinees, 15 items; (b) 500 examinees, 30 items; (c) 1000 examinees, 15 items; (d) 1000 examinees, 30 items. In the first simulation study, to compare the performance of the CM, NCM, and RAVCM under different true models, the data sets are generated from the CM, the NCM, or a mixture of the CM and NCM. In the second study, to assess the ability to rotate, the data sets are generated from the NCM under different correlations of abilities. To compare the estimates given by different models on the same scale and in the same direction, the estimated abilities are either scaled, or Procrustes rotated and scaled, to match the generated abilities. For rotatable models such as the CM and RAVCM, the Procrustes rotation does not change the fitted probabilities, provided the item vectors are inversely rotated. For the NCM, however, rotation changes the fitted probabilities, so the Procrustes rotation cannot be applied.
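A small numerical check may make the last point concrete. The sketch below assumes the usual two-dimensional forms of the two models (a logistic function of a linear combination for the CM, and a product of per-dimension logistic terms for the NCM); these forms, the function names, and the parameter values are assumptions for illustration and are not restated in this section. For an orthogonal rotation, the compensating transformation of the item vector is the same rotation, so both vectors are rotated by the same angle.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_cm(theta, a, b):
    """Assumed compensatory form: logistic of a linear combination of abilities."""
    return logistic(a[0] * theta[0] + a[1] * theta[1] - b)


def p_ncm(theta, a, b):
    """Assumed noncompensatory form: product of per-dimension logistic terms."""
    return logistic(a[0] * (theta[0] - b[0])) * logistic(a[1] * (theta[1] - b[1]))


def rotate(v, phi):
    """Rotate a 2-D vector counterclockwise by the angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


# Hypothetical values for one examinee and one item.
theta, a, phi = (0.5, -1.0), (1.2, 0.8), math.pi / 4

# CM: rotating the abilities and the item vector together preserves a . theta.
cm_before = p_cm(theta, a, 0.3)
cm_after = p_cm(rotate(theta, phi), rotate(a, phi), 0.3)

# NCM: the same joint rotation changes the probability.
ncm_before = p_ncm(theta, a, (-1.0, -1.0))
ncm_after = p_ncm(rotate(theta, phi), rotate(a, phi), (-1.0, -1.0))

print(abs(cm_before - cm_after), abs(ncm_before - ncm_after))
```

The first printed difference is zero up to floating-point error, while the second is clearly nonzero, which is why a rotated NCM solution is no longer the same fitted model.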
The following statistics are included in the results. The first is the root mean square error (RMSE) of the ability estimates, defined as
\[
\mathrm{RMSE}(\theta_{\cdot}) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(\theta_{\cdot j} - \hat{\theta}_{\cdot j}\right)^{2}},
\]
where N is the total number of examinees, and θ.j and θ̂.j are the true ability and the estimated ability of the j-th examinee, respectively. The dot "." can be replaced by any dimension number, such as 1 or 2 in the two-dimensional case. The second is the RMSE of the estimated response surfaces, defined as
\[
\mathrm{RMSE}(\hat{P}(\theta)) = \sqrt{\frac{1}{nN} \sum_{i=1}^{n} \sum_{j=1}^{N} \left(P_{ij}(\theta_{j}) - \hat{P}_{ij}(\theta_{j})\right)^{2}},
\]
where n and N are the numbers of items and examinees, respectively. Pij(θj) and P̂ij(θj) are the probabilities that examinee j answers item i correctly, given by the true and estimated item response surfaces of item i, respectively. Both probabilities are evaluated at the true abilities, so this statistic measures only how well the item response surfaces are recovered. Unlike an integral over the whole space, it measures the differences between the true and estimated surfaces where the abilities actually lie. The last statistic is the log-likelihood of the estimated model, which is a measure of goodness of fit. The statistics reported in the results are averages over 50 replications for each setting.
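The three statistics can be written as short functions. The following is a minimal sketch with hypothetical function names; the two RMSEs follow the definitions above, while the log-likelihood is assumed to take the usual Bernoulli form for dichotomous responses, which this section does not spell out.

```python
import math


def rmse_ability(theta_true, theta_hat):
    """RMSE over examinees for one ability dimension."""
    n = len(theta_true)
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(theta_true, theta_hat)) / n)


def rmse_surface(p_true, p_hat):
    """RMSE over all item-by-examinee probabilities (n x N nested lists).

    Both inputs are expected to be evaluated at the true abilities.
    """
    sq, count = 0.0, 0
    for row_true, row_hat in zip(p_true, p_hat):
        for p, q in zip(row_true, row_hat):
            sq += (p - q) ** 2
            count += 1
    return math.sqrt(sq / count)


def log_likelihood(responses, p_hat):
    """Assumed Bernoulli log-likelihood of 0/1 responses under fitted probabilities."""
    total = 0.0
    for x_row, p_row in zip(responses, p_hat):
        for x, p in zip(x_row, p_row):
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Toy inputs, for illustration only.
r_theta = rmse_ability([0.0, 1.0], [0.5, 0.5])
r_surface = rmse_surface([[0.2, 0.8]], [[0.3, 0.6]])
loglik = log_likelihood([[1, 0]], [[0.5, 0.5]])
```

In a replication study such as the one described here, each function would be called once per replication and the results averaged over the 50 replications of a setting.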
In the first simulation study (Tables 3.1, 3.2, and 3.3), the true correlation (0) was treated as known. The purpose is to compare the parameter recovery of the CM, NCM, and RAVCM under different true models. The data sets are generated using the CM, the NCM, or a mixture of the two. As expected, when the CM is the true model, the NCM gives the largest RMSEs and, in all but the largest setting (1000 examinees, 30 items), the lowest log-likelihood among the three models. Although the RAVCM cannot recover the abilities and item response surfaces as well as the CM, possibly because of the difference between the CM and RAVCM surfaces, it gives much smaller RMSEs than the NCM. In addition, the ability RMSEs of the RAVCM are very similar to those of the CM. When the NCM is the true model, the log-likelihoods given by the three models are close, but the RMSEs of the ability estimates and surfaces given by the CM are generally larger than those given by the NCM and RAVCM. When half of the items are generated from the CM and the other half from the NCM, the RAVCM gives the smallest ability RMSEs in every setting and the smallest surface RMSE in every setting except 500 examinees with 15 items.
Table 3.1 Performance of the CM, NCM and RAVCM when the true model is the CM. (“R” is the abbreviation of “RMSE”. “Fitted” means the fitted model.)
Sample Size                 Fitted   R(θ1)   R(θ2)   SE(θ1)   SE(θ2)   R(P̂(θ))   Log-likelihood
500 examinees, 15 items     CM       0.58    0.59    0.59     0.81     0.04       -2218
500 examinees, 15 items     NCM      0.67    0.67    0.69     0.79     0.24       -3637
500 examinees, 15 items     RAVCM    0.59    0.59    0.69     0.72     0.08       -2335
500 examinees, 30 items     CM       0.48    0.55    0.46     0.75     0.03       -4739
500 examinees, 30 items     NCM      0.57    0.65    0.67     0.73     0.16       -5026
500 examinees, 30 items     RAVCM    0.50    0.57    0.59     0.68     0.08       -4915
1000 examinees, 15 items    CM       0.63    0.65    0.55     0.91     0.03       -4749
1000 examinees, 15 items    NCM      0.73    0.73    0.80     0.83     0.16       -5313
1000 examinees, 15 items    RAVCM    0.65    0.66    0.48     0.99     0.07       -4997
1000 examinees, 30 items    CM       0.52    0.53    0.40     0.72     0.03       -10107
1000 examinees, 30 items    NCM      0.61    0.62    0.63     0.64     0.14       -10478
1000 examinees, 30 items    RAVCM    0.52    0.53    0.55     0.60     0.07       -10663
In the second simulation (Tables 3.4 and 3.5), the data were generated according to the NCM (2) with correlated abilities (a small correlation, 0.4, and a moderate correlation, 0.7) and estimated using both the NCM (2) and the RAVCM (6) as if the abilities were uncorrelated. Because of its rotatability, the RAVCM is true under any correlation structure, whereas the NCM is no longer the true model when the correlation of abilities is misspecified. To check the effect of rotation, the parameter recovery was evaluated after transforming the estimates back onto the simulated scales where
Table 3.2 Performance of the CM, NCM and RAVCM when the true model is the NCM. (“R” is the abbreviation of “RMSE”. “Fitted” means the fitted model.)
Sample Size                 Fitted   R(θ1)   R(θ2)   SE(θ1)   SE(θ2)   R(P̂(θ))   Log-likelihood
500 examinees, 15 items     CM       0.79    0.81    0.63     0.91     0.10       -3007
500 examinees, 15 items     NCM      0.72    0.74    0.75     0.84     0.07       -3014
500 examinees, 15 items     RAVCM    0.71    0.74    0.78     0.80     0.08       -3036
500 examinees, 30 items     CM       0.65    0.69    0.55     0.83     0.09       -5937
500 examinees, 30 items     NCM      0.47    0.56    0.54     0.63     0.06       -5870
500 examinees, 30 items     RAVCM    0.47    0.56    0.55     0.61     0.07       -5914
1000 examinees, 15 items    CM       0.73    0.75    0.85     0.91     0.11       -5441
1000 examinees, 15 items    NCM      0.63    0.67    0.74     0.83     0.06       -5093
1000 examinees, 15 items    RAVCM    0.63    0.68    0.69     0.72     0.08       -5163
1000 examinees, 30 items    CM       0.68    0.70    0.49     0.75     0.12       -11359
1000 examinees, 30 items    NCM      0.55    0.58    0.58     0.63     0.04       -11045
1000 examinees, 30 items    RAVCM    0.56    0.59    0.60     0.61     0.07       -11132
the abilities are correlated. Under all conditions, the RAVCM gives smaller ability RMSEs than the NCM, while the surface RMSEs are similar. This indicates that, after the estimates are transformed to the correlated space, the RAVCM gives surfaces close to those of the NCM, which means that the true surfaces in the uncorrelated space are well recovered. With a small number of items (15), the log-likelihood of the fitted RAVCM is smaller in magnitude (that is, higher) than that of the NCM. With 30 items, the log-likelihoods are similar.
Note that in Tables 3.4 and 3.5, the surface RMSEs are compared in the spaces with the true ability correlations. The true model in those spaces is the NCM, which is why the surface RMSEs given by the two models are similar. For the RAVCM, the estimated abilities and surfaces in the uncorrelated space can be rotated to other spaces with different correlation structures. The RMSEs of P̂(θ) and the likelihood for the RAVCM in Tables 3.4 and 3.5 do not change. For the RAVCM, if the item response surfaces are well recovered in the uncorrelated space, the rotated surfaces will also be good in the correlated spaces. In Figure 3.6, the correlation was misspecified as 0 while the true correlation for the NCM was 0.7. The estimated RAVCM is very close to the true model under the 0 correlation representation
Table 3.3 Performance of the CM, NCM and RAVCM when the true model is the mixture of the CM and NCM. (Half of items are generated using the CM, and the other half are generated using the NCM). (“R” is the abbreviation of “RMSE”. “Fitted” means the fitted model.)
True Model: 7 CM items, 8 NCM items or 15 CM items, 15 NCM items.
Sample Size                 Fitted   R(θ1)   R(θ2)   SE(θ1)   SE(θ2)   R(P̂(θ))   Log-likelihood
500 examinees, 15 items     CM       0.78    0.79    0.74     0.76     0.09       -2920
500 examinees, 15 items     NCM      0.86    0.87    0.90     0.99     0.16       -3506
500 examinees, 15 items     RAVCM    0.76    0.78    0.78     0.82     0.11       -2983
500 examinees, 30 items     CM       0.67    0.70    0.59     0.63     0.08       -5204
500 examinees, 30 items     NCM      0.62    0.64    0.64     0.64     0.10       -5263
500 examinees, 30 items     RAVCM    0.54    0.58    0.62     0.61     0.06       -5254
1000 examinees, 15 items    CM       0.65    0.70    0.72     0.78     0.08       -5239
1000 examinees, 15 items    NCM      0.67    0.75    0.62     0.88     0.14       -5299
1000 examinees, 15 items    RAVCM    0.62    0.69    0.71     0.75     0.07       -5331
1000 examinees, 30 items    CM       0.63    0.68    0.38     0.80     0.09       -11037
1000 examinees, 30 items    NCM      0.51    0.55    0.54     0.62     0.08       -11075
1000 examinees, 30 items    RAVCM    0.47    0.49    0.50     0.58     0.05       -11077
Table 3.4 Performance comparison between the NCM and RAVCM when abilities are correlated. Parameter recovery of the RAVCM was evaluated after transforming the estimates back onto the simulated scales where the abilities are correlated. The true correlation is 0.4.
Sample Size                 Fitted   R(θ1)   R(θ2)   SE(θ1)   SE(θ2)   R(P̂(θ))   Log-likelihood
500 examinees, 15 items     NCM      0.73    0.75    0.94     0.95     0.12       -3282
500 examinees, 15 items     RAVCM    0.67    0.68    0.87     0.94     0.12       -2789
500 examinees, 30 items     NCM      0.52    0.55    0.59     0.62     0.05       -5933
500 examinees, 30 items     RAVCM    0.48    0.50    0.59     0.58     0.06       -5908
1000 examinees, 15 items    NCM      0.65    0.69    0.90     0.88     0.12       -7403
1000 examinees, 15 items    RAVCM    0.62    0.63    0.82     0.87     0.12       -6648
1000 examinees, 30 items    NCM      0.52    0.53    0.60     0.66     0.05       -11811
1000 examinees, 30 items    RAVCM    0.46    0.49    0.61     0.64     0.06       -11874
and can be rotated to the 0.7 representation. Both representations look close to the true model. The problem with the NCM is that it gives similar item response surfaces under different specified correlations (see the last column of Figure 3.6). If the specified correlation is far from the true correlation, the error of the estimated item response surfaces given by the NCM will be large.
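The idea of moving between an uncorrelated and a correlated representation can be illustrated with a linear map. The sketch below is only an assumption-laden illustration: it uses the Cholesky factor of a 2 x 2 correlation matrix to map uncorrelated abilities to abilities with correlation 0.7, and applies the compensating (inverse-transpose) map to a hypothetical item vector so that a linear predictor is unchanged. The transformation actually used by the RAVCM is defined elsewhere in the chapter and may differ in detail; all names and values here are hypothetical.

```python
import math
import random


def to_correlated(theta_u, rho):
    """Map an uncorrelated ability pair to correlation rho via the Cholesky factor L."""
    s = math.sqrt(1.0 - rho * rho)
    return (theta_u[0], rho * theta_u[0] + s * theta_u[1])


def item_to_correlated(a_u, rho):
    """Apply the compensating map (inverse transpose of L) to an item vector."""
    s = math.sqrt(1.0 - rho * rho)
    return (a_u[0] - rho * a_u[1] / s, a_u[1] / s)


def sample_corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    v1 = sum((p[0] - m1) ** 2 for p in pairs)
    v2 = sum((p[1] - m2) ** 2 for p in pairs)
    return c12 / math.sqrt(v1 * v2)


rng = random.Random(7)  # arbitrary seed
rho = 0.7
theta_u = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
theta_c = [to_correlated(t, rho) for t in theta_u]

a_u = (1.2, 0.8)  # hypothetical item vector in the uncorrelated space
a_c = item_to_correlated(a_u, rho)

# The linear predictor a . theta is the same in both representations.
lin_u = a_u[0] * theta_u[0][0] + a_u[1] * theta_u[0][1]
lin_c = a_c[0] * theta_c[0][0] + a_c[1] * theta_c[0][1]
```

Because the predictor is preserved for every examinee, any quantity computed from it, such as a fitted probability or a likelihood, takes the same value in both representations, which is consistent with the statement that these quantities do not change for a rotatable model.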
Table 3.5 Performance comparison between the NCM and RAVCM when abilities are correlated. Parameter recovery of the RAVCM was evaluated after transforming the estimates back onto the simulated scales where the abilities are correlated. The true correlation is 0.7.
Sample Size                 Fitted   R(θ1)   R(θ2)   SE(θ1)   SE(θ2)   R(P̂(θ))   Log-likelihood
500 examinees, 15 items     NCM      0.65    0.68    0.94     0.95     0.18       -4182
500 examinees, 15 items     RAVCM    0.51    0.55    0.69     0.79     0.18       -3699
500 examinees, 30 items     NCM      0.60    0.61    0.80     0.87     0.11       -6582
500 examinees, 30 items     RAVCM    0.49    0.52    0.79     0.83     0.11       -6372
1000 examinees, 15 items    NCM      0.56    0.60    0.82     0.85     0.08       -6418
1000 examinees, 15 items    RAVCM    0.50    0.54    0.77     0.85     0.09       -5758
1000 examinees, 30 items    NCM      0.55    0.57    0.71     0.72     0.06       -12781
1000 examinees, 30 items    RAVCM    0.45    0.47    0.66     0.73     0.07       -12754
Figure 3.6 Contour plots of the true and estimated RAVCM under different spaces.
Note: Figure 3.6 shows the contour plots of (a) the true model under the 0 and 0.7 correlation representations (the first column); (b) the estimated RAVCM under the misspecified correlation 0 and the transformed estimated RAVCM in the space where the correlation is 0.7 (the second column); (c) the estimated standard NCM when the correlation was specified as 0 and 0.7 (the third column). The horizontal and vertical axes are labeled θ1 and θ2, respectively. The true correlation of the generated ability parameters was 0.7. All plots are for the same item.