Subscores Reliability and Classification

Results from subsections 4.2.4 and 4.3.4 are used to draw inferences about subscore reliability and classification, and so to answer the last research question of this dissertation. The results presented in Chapter 4 showed substantially high Bayesian marginal subscore reliabilities, on average ρ̂ > .90 for the BF-C model and ρ̂ > .80 for the BF-PC model. For both models, higher subscore reliability resulted from lower bias and reduced estimation error (i.e., RMSE) of the thetas in all dimensions, which was achieved at a higher discrimination level or with a longer test.
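The relation between recovery error and reliability described above can be illustrated with a minimal Python sketch. It is not the estimation code used in this dissertation. The function names, the example values, and in particular the reliability formula (variance of the ability estimates divided by that variance plus the mean posterior variance, a common empirical form) are assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance


def bias(true_thetas, est_thetas):
    """Mean signed difference between estimated and true thetas."""
    return mean([e - t for t, e in zip(true_thetas, est_thetas)])


def rmse(true_thetas, est_thetas):
    """Root mean squared error of the theta estimates."""
    return math.sqrt(mean([(e - t) ** 2 for t, e in zip(true_thetas, est_thetas)]))


def marginal_reliability(est_thetas, posterior_sds):
    """Assumed empirical form: var(estimates) / (var(estimates) + mean error variance)."""
    signal = pvariance(est_thetas)
    noise = mean([sd ** 2 for sd in posterior_sds])
    return signal / (signal + noise)


# Hypothetical values for one ability dimension (not results from the study).
true_thetas = [-1.2, -0.4, 0.0, 0.5, 1.3]
est_thetas = [-1.0, -0.5, 0.1, 0.4, 1.1]
posterior_sds = [0.30, 0.28, 0.27, 0.28, 0.31]

print("bias:", bias(true_thetas, est_thetas))
print("RMSE:", rmse(true_thetas, est_thetas))
print("reliability:", marginal_reliability(est_thetas, posterior_sds))
```

Under this assumed form, smaller posterior standard deviations (as would be expected with more items or more discriminating items) raise the ratio toward 1, which is consistent with the pattern reported above.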

In the BF-PC model, subscore reliabilities in the specific ability dimensions (dimensions 2, 3, and 4) were somewhat lower than the reliability of the primary ability subscore. In the BF-C model, the subscore reliabilities in the specific ability dimensions were as high as that of the primary ability dimension. These results support BF-C as the better model: it improved subscore reliabilities in the specific ability dimensions while maintaining high reliability in the primary ability dimension.

The results also indicate that a longer test yields higher subscore reliabilities. Subscore reliabilities also improved when the discrimination levels in the specific ability dimensions were higher than in the primary ability dimension. This result supported the hypothesis that a high discrimination level in the specific content domains improves the reliability of subscores.

A subscore separation index (SSI) greater than 1.00 was used as the benchmark for the quality of the estimated subscores in the specific ability dimensions, that is, for how strongly those subscores are separated from the primary ability dimension. Both models showed that the SSI increases with a longer test and at a higher level of discrimination. Note that the simulation studies counted only SSI values greater than 1.00, so the quality of subscores was judged on subscores in the specific ability dimensions that were one level distinct (i.e., higher or lower) from the primary ability dimension. SSI values less than 1.00 but greater than 0.00 also distinguish subscores in the specific ability dimensions from the primary dimension, but they were not examined in this dissertation.

The BF-PC model produced very low hit-rates of SSI > 1.00, ranging from 0.03 to 2.5 percent. This result may reflect the nature of the partially compensatory model: the model did not substantially distinguish the subscores in the specific ability dimensions from the primary ability dimension. Thus, the subscores reported from the BF-PC model differ little from those that would be estimated with a unidimensional approach. This also implies that an examinee needs high performance in all abilities measured by an item to answer the item correctly, and, likewise, that the integration of abilities within a person is dominated by a single primary ability when performing on the test as a whole. Another explanation is that the criterion SSI > 1.00 might be too strict for this model, because the model requires high proficiency in all abilities for good test performance. Thus, a less strict separation criterion between the specific ability dimensions and the primary ability dimension could be considered, for example SSI > .30 or SSI > .50.
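How the choice of cutoff changes the hit-rate can be sketched in a few lines of Python. The sketch assumes that one SSI value per examinee is already available for a given specific ability dimension; the SSI computation itself is not reproduced here, and the function name and the example values are hypothetical.

```python
def hit_rate(ssi_values, threshold=1.00):
    """Percentage of SSI values strictly greater than the threshold."""
    if not ssi_values:
        raise ValueError("ssi_values must not be empty")
    hits = sum(1 for s in ssi_values if s > threshold)
    return 100.0 * hits / len(ssi_values)


# Hypothetical SSI values for ten examinees (not results from the study).
example_ssi = [0.05, 0.20, 0.35, 0.45, 0.55, 0.80, 1.10, 0.10, 0.00, 0.60]

# Compare the benchmark used in the simulations with the two lower cutoffs
# suggested in the text.
for cutoff in (1.00, 0.50, 0.30):
    print(f"SSI > {cutoff:.2f}: {hit_rate(example_ssi, cutoff):.1f} percent")
```

Because every value that exceeds 1.00 also exceeds .50 and .30, lowering the cutoff can only keep the hit-rate the same or raise it; how much it rises for the BF-PC model would have to be checked against the simulated data.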

SSIs for the BF-C model were higher than those from the BF-PC model across the simulation conditions: hit-rates ranged from 7 to 15 percent, and SSIs were always greater than 0.00. Thus, the estimated subscores in the specific ability dimensions of the BF-C model were one level distinct (i.e., higher or lower) from the subscore for the primary ability dimension. This implies that the BF-C model can account for the many combinations of abilities that an examinee might need to answer an item correctly, as well as to perform better on a test when the examinee has higher abilities in the specific ability dimensions than in the primary ability dimension.

The hit-rates derived from the SSI describe the frequency of examinees with distinct subscores, that is, higher or lower scores, in the specific ability dimensions compared with the primary dimension. Results from the BF-PC model showed that fewer than 2 percent of the examinees had dominant specific abilities that helped them perform better on the test. Out of 1,500 examinees, this is equivalent to about 30 examinees whose distinct specific abilities were recognized as contributing to good or poor performance on a test measuring multiple ability dimensions. The remaining 98 percent of examinees may or may not have a one-level distinction (higher or lower) in their specific ability dimensions compared with the primary ability.

If the BF-C model were used for score reporting, more than 6 percent and up to 15 percent of the examinees showed performance in their specific ability dimensions that was distinct from their performance in the primary ability dimension. Out of 1,500 examinees, these percentages are equivalent to about 90 to 225 examinees whose performance was notably different and was influenced more by their specific abilities than by their primary ability, which indicates distinct performance on a test with a multidimensional structure.
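The conversion from hit-rates to examinee counts used in this discussion is simple arithmetic and can be checked with a short Python sketch. The function name is hypothetical; the sample size of 1,500 and the percentages are those stated in the text.

```python
def examinees_from_rate(percent, n_examinees=1500):
    """Number of examinees corresponding to a hit-rate given in percent."""
    return round(n_examinees * percent / 100)


# Hit-rates quoted in the discussion, paired with the model they refer to.
quoted_rates = [("BF-PC", 2), ("BF-C", 6), ("BF-C", 15)]

for model, percent in quoted_rates:
    count = examinees_from_rate(percent)
    print(f"{model}: {percent} percent of 1,500 examinees = {count}")
```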

In document Bi-factor Multidimensional Item Response Theory Modeling for Subscores Estimation, Reliability, and Classification (Page 129-131)