
Chapter 6 Discussion and Future Work

6.1 Discussion

The speeded nature of operational tests and the easy availability of response time data in computerized tests have inspired much research on response times in educational and psychological testing. In this thesis, likelihood-based approaches to estimating the hierarchical framework (van der Linden, 2007) were developed to obtain parameter estimates for the response and response time models efficiently and accurately.

Two approaches were presented for jointly estimating the 3PLM and the lognormal response time model. One is based on MML estimation, and the other builds on MMAP estimation. Both approaches were implemented with the EM algorithm to handle the unobserved latent variables in item calibration. Using the estimated item parameter values, examinees' latent trait parameters (proficiency and speed) were then estimated with the MAP and EAP estimators.
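The person-scoring step described above can be illustrated with a minimal sketch of EAP estimation on a grid, assuming the item parameters are already known. It is not the implementation used in this thesis. It assumes the common 3PL form for the response model, the lognormal response time density in the form given by van der Linden (2007), and a bivariate normal prior for (theta, tau) with unit variances and correlation `rho_p`. The function names, the grid settings, and the unit-variance prior are assumptions made for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_rt_density(t, tau, alpha, beta):
    """Log density of a response time t under the lognormal model:
    log t is normal with mean (beta - tau) and standard deviation 1/alpha."""
    z = alpha * (math.log(t) - (beta - tau))
    return math.log(alpha) - math.log(t) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def eap_person(u, t, items, rho_p=0.5, n_grid=41, half_width=4.0):
    """EAP estimates of (theta, tau) for one examinee on a rectangular grid.

    u     : list of 0/1 item scores
    t     : list of response times (same order as u)
    items : list of (a, b, c, alpha, beta) tuples, treated as known
    Prior : bivariate normal, zero means, unit variances, correlation rho_p
            (an assumption of this sketch).
    """
    grid = [-half_width + 2.0 * half_width * k / (n_grid - 1) for k in range(n_grid)]
    log_post = {}
    for theta in grid:
        for tau in grid:
            # Log prior, up to an additive constant.
            lp = -(theta**2 - 2.0 * rho_p * theta * tau + tau**2) / (2.0 * (1.0 - rho_p**2))
            for (a, b, c, alpha, beta), u_j, t_j in zip(items, u, t):
                p = p_3pl(theta, a, b, c)
                lp += math.log(p) if u_j == 1 else math.log(1.0 - p)
                lp += log_rt_density(t_j, tau, alpha, beta)
            log_post[(theta, tau)] = lp
    # Normalize in a numerically stable way, then take posterior means.
    m = max(log_post.values())
    weights = {k: math.exp(v - m) for k, v in log_post.items()}
    total = sum(weights.values())
    theta_hat = sum(k[0] * w for k, w in weights.items()) / total
    tau_hat = sum(k[1] * w for k, w in weights.items()) / total
    return theta_hat, tau_hat
```

Because the responses and the response times are conditionally independent given (theta, tau), their log-likelihood contributions simply add at each grid point. The two traits are linked only through the prior correlation.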

The estimation procedures, as implemented in this study, provided highly reliable and accurate results with respect to recovery of both the item and person parameters. Simulation studies presented in Chapters 3.3.1 and 3.3.2 suggest that the overall MSEs and biases of the estimated parameters were small, and that the correlations between the parameters were recovered well. Although the MML estimation procedure showed occasional convergence problems, these mostly concerned the failure to home in on the pre-specified interval, which was set rather stringently. The MMAP estimation procedure, in contrast, showed excellent convergence rates despite the strict convergence criteria, because it uses prior information at the second level of the hierarchical framework. The results relating to the simulation factors were largely consistent with expectations. As a general rule, the larger the sample, the better the quality of the parameter estimates.

Higher levels of correlation also resulted in better estimation of the parameters than lower levels. Overall, no substantive differences across levels of ρP were evident in item calibration, most likely because of marginalization, nor were there systematic differences in latent trait estimates across levels of ρI.

The other aspect explored in this study was the feasibility of the likelihood-based procedures for calibrating items in CAT. In Chapter 4.1, the Fisher information matrix of the item parameters for the hierarchical framework was developed for adaptively selecting calibration samples during CAT administrations. A total of four optimal sampling designs were proposed that differ in their treatment of the information matrix and in the purpose of the online calibration. A simulation study presented in Chapter 4.3 suggests that the MMAP estimation combined with the EM algorithm performed well despite the relatively small sample sizes (N = 400 to 800). The overall MSEs remained small, and the biases of the estimates were close to zero. Increasing N or ρI consistently reduced estimation errors across the simulation conditions.

With respect to the sampling design, D-optimality was generally found to be more effective than A-optimality for selecting the calibration samples. A possible reason for this trend is that D-optimality takes into account the information from the respective item parameters as well as the joint information between the parameters, whereas A-optimality considers only the information from the individual item parameters and thus draws on more limited information in selecting the calibration samples. From the perspective of implementation, the two approaches are comparable in complexity, and hence the choice of strategy should be driven by which yields the better outcomes. The second trend of note concerns the differences relating to the purpose of online calibration. Results from the simulation study suggest that when the purpose of field-testing is centered on accurately estimating the parameters of only the response model, DS- or AS-optimality should be preferred to D- or A-optimality. While the D- and DS-optimal sampling designs produced comparable results, AS-optimality clearly outperformed A-optimality by improving the estimation precision of the parameters of interest.
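The contrast between the two criteria can be made concrete with a small sketch. It uses the textbook definitions (D: determinant of the information matrix, larger is better; A: trace of its inverse, smaller is better) and, for brevity, a two-parameter logistic item rather than the full hierarchical model. The exact criteria used in this thesis are those defined in Chapter 4.1 and may differ. All function names are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information matrix for (a, b) contributed by one examinee
    under a 2PL item, used here only as a simplified stand-in."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    d = theta - b
    return [[w * d * d, -w * a * d],
            [-w * a * d, w * a * a]]

def add(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[r][c] + m2[r][c] for c in range(2)] for r in range(2)]

def d_criterion(m):
    """Determinant of the information matrix; includes the off-diagonal
    (joint) information between the two parameters."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def a_criterion(m):
    """Trace of the inverse information matrix, i.e. the sum of the
    asymptotic variances of the individual parameters (2x2 closed form)."""
    return (m[0][0] + m[1][1]) / d_criterion(m)

def pick_examinee(current, thetas, a, b, rule="D"):
    """Index of the candidate examinee whose responses would most improve
    the chosen criterion when added to the accumulated information."""
    best, best_val = None, None
    for i, theta in enumerate(thetas):
        m = add(current, info_2pl(theta, a, b))
        val = d_criterion(m) if rule == "D" else -a_criterion(m)
        if best_val is None or val > best_val:
            best, best_val = i, val
    return best

# Information accumulated so far from two examinees, then choose the next one.
current = add(info_2pl(-1.0, 1.0, 0.0), info_2pl(1.0, 1.0, 0.0))
next_index = pick_examinee(current, [0.0, 2.0], a=1.0, b=0.0, rule="D")
```

A single examinee contributes a rank-one matrix, so both criteria are only defined once information from several examinees has accumulated. The sketch therefore starts from a nonsingular `current` matrix.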

Chapter 5 extended the hierarchical framework to a more flexible response time model. The PHLTM (Ranger & Ortner, 2012), a modified version of the Cox PH model, was chosen for its increasing popularity in the measurement literature. In this study, the PHLTM was fit in a semiparametric fashion by leaving the baseline rates unspecified, which gives the model flexibility in representing response time distributions. The estimation procedure was based on the PPL, in which the latent speed parameters are constrained by a penalty function. It is computationally similar to other shrinkage methods for penalized regression, such as ridge regression and smoothing splines. Simulation studies presented in Chapter 5.3 suggest that the PPL estimator produced smaller errors than the PL estimator in recovering the true regression parameters and latent speed parameters. While the PL estimator tended to underestimate the standard errors of the parameter estimates, the PPL estimator appeared to capture the true standard errors faithfully. The application of the proposed estimation method within the hierarchical framework was also provided for jointly analyzing accuracy scores and response times.
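The analogy with ridge regression can be seen in a minimal sketch of a penalized partial log-likelihood for a single item. It assumes that every response time is observed (no censoring), that there are no tied times, that the hazard is proportional to exp(gamma * tau_i), and that the penalty is quadratic with a hypothetical variance parameter `sigma2`. These choices are illustrative and are not taken from the thesis's exact parameterization of the PHLTM.

```python
import math

def partial_loglik(times, eta):
    """Cox partial log-likelihood with linear predictors eta, assuming
    all times are observed and distinct."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for pos, i in enumerate(order):
        # Risk set: examinees who have not yet responded at this time.
        risk = order[pos:]
        ll += eta[i] - math.log(sum(math.exp(eta[k]) for k in risk))
    return ll

def penalized_partial_loglik(times, tau, gamma, sigma2=1.0):
    """Partial log-likelihood minus a ridge-type penalty that shrinks
    the latent speed parameters tau toward zero."""
    eta = [gamma * tau_i for tau_i in tau]
    penalty = sum(tau_i * tau_i for tau_i in tau) / (2.0 * sigma2)
    return partial_loglik(times, eta) - penalty
```

The baseline rate never appears in either function, which is what makes the fit semiparametric. The penalty term plays the same role as the squared-norm term in ridge regression.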
