2.5 NSCAW Application
2.5.2 Simplex Assumption Testing and Reliability Estimates
The general model for each scale, which is given in Tables 2.3 and 2.4 for each scale, is used as the larger model for testing the QSM assumptions of equal true score variance, equal error variance, and equal reliability over time. Tables 2.5 and 2.6 give results of these tests with non-significant tests highlighted in bold. For the majority of scale scores at least one of the assumptions of constant true score variance or constant error variance may be made. In cases where both assumptions are reasonable, the simultaneous assumption of both equal error variance and equal true score variance over time (that is, equal reliability over time) is also acceptable. Neither one of the assumptions seems dominant for the split-halves models where 11 of the 23 scales have equal error variance over time and 9 of the 23 scales have equal true score variance over time (Table 2.5). The assumption of both equal error variance and true score variance over time holds for the four scales where both assumptions hold separately in the split halves models. There are 7 out of the 23 scales where neither QSM assumption holds indicating that reliability estimates obtained for these scales from a model with one of the QSM assumptions may result in poor estimates.
For the item-level models, the assumption of constant true score variance over time is more prevalent than the assumption of constant error variance over time. Constant true score variance holds for 10 of the 17 scales evaluated using the item-level model while only 3 of the 17 scales have constant error variance over time. These results support the idea of a possible test-retest or Socratic effect for this population since the error variances are declining over time while the true score variances are steady over time. The decreases in error variance are expected since the children are developing and thereby increasing their ability to answer the test questions. There are also a fair number (6 out of 17) of scale scores where neither assumption is reasonable. This occurs at higher rate for the item-level models compared to the split-halves models.
It is interesting that for scales that are comparable across the split-halves and item- level models, which include the CDI, School Engagement, and YSR, the QSM assumption test results sometimes differ. For example, for the CDI the split-halves model indicates equal error variance over time while the item-level model indicates neither assumption is acceptable. Also, the school engagement test results are opposite for the two types of models. One difference is that in the split-halves version, the best model for the CDI has equal loadings over time while the item-level model does not. Also, the split-halves general models fit better for the CDI, School Engagement and the YSR compared to the item-level models. This indicates that there are differences in the two model forms, split- halves and item-level. As noted, the equal error variance assumption does not often hold in the item-level model and the equal true score variance holds more often as compared to the split-halves version of the model.
The reliability estimates for the general model, the constant error variance, and constant true score variance models are presented in Tables 2.7 and 2.9. For the split- halves models, 9 out of the 17 constant true score variance models did not converge (Table 2.7). Four of the 9 scales with non-convergence issues were found to have constant true score variance in the testing of that assumption (see Table 2.5). Most of these have issues of non-invertible matrices, which are likely due to empirical underidentification of one or more of the parameters. As discussed in Section 3.1.3, this may occur in these models when both of the stability parameters approach 1 or are equal to each other. The constant true score variance over time constraint may be causing this to occur for many scales. Therefore, no reliability estimates for the constant true score model are available for these 9 scales. For the scales where reliability estimates are available, the estimates do not differ much across the various models. This indicates that the QSM assumption chosen would not affect the reliability estimates dramatically. In some cases applying a QSM constraint can decrease standard errors, but this is not a consistent result. Reliability estimates are efficient for these data including the general model estimates.
For the item-level models, 4 out of the 17 scales did not estimate for any of the models. These were generally scales with more indicators (15 - 32) and were simply too large to estimate. Also, most of the constant true score models did not estimate likely because of empirical underidentification issues. This is interesting in the item-level model case since almost 60% of the tests of constant true score variance were accepted (Table 2.6). Some of the other item-level models produced reliability estimates that are greater than one. In these cases, the pure random error is negative after applying all possible correlations between indicators within time, the cq. This indicates that the
model should be simplified to include fewer cq elements for these scales. Of the six
scales where this occurs, five have statistically significant negative pure error variances indicating that the number of additional factors should be reduced. Like the split-halves models, the reliability estimates for the item-level models do not differ greatly across the QSM assumptions indicating that these assumptions may not have a large impact on estimates for these data.