
CHAPTER V. RESULTS

5.1 ARCH Calibration Sufficiency (RQ1) Results

5.1.1 Version 1 of Web-based Computer Program for Simulations

Version 1 of a web-based computer program was written using several thousand lines of custom HTML and JavaScript code to re-administer simulated adaptive tests using the historical COM test data based on specific input values outlined in more detail below. The output of the simulation was displayed in HTML tables that were then copied to Excel and SPSS for analysis.

The web-based computer program allowed calibration settings, SPRT and EXSPRT settings, and ARCH criteria to be entered as input values prior to running the simulations (see figure 10). The calibration settings specified the percentage of correct answers on the total test needed to qualify as a master, with 72.5% representing the cut-score used in previous COM test studies. The remaining calibration settings dictated how the simulation would execute and when it would terminate.

Figure 10. Screenshot of sample input settings for version 1 of the web-based Monte Carlo COM Test Simulation program.

For example, the calibration settings values in figure 10 resulted in the following steps occurring during a simulation run.

Simulation Run Steps:

1. Set the calibration sample size to the minimum calibration sample size of 10.

2. Use item responses from randomly selected masters and nonmasters equal to the calibration sample size (e.g., 10 masters and 10 nonmasters) to calibrate items for use with EXSPRT and empirically establish item-bank level probabilities for use with SPRT.

3. Administer a simulated test to each of the 104 examinees with all relevant SPRT and EXSPRT data being output to the associated tables.

4. Repeat steps 2 and 3 the number of times indicated (e.g., 10 rounds).

5. Increment the calibration sample size by the increment value of 10.

6. If the calibration sample size is less than or equal to the maximum calibration sample size value of 100, then go to step 2. Otherwise, end the simulation.

Each of the adaptive testing algorithms also had a priori error rates that could be set to specific values. However, the simulations conducted for this study used the same values as earlier COM test studies. The prior probability of mastery was set to .5 and both the a priori false mastery and false nonmastery error rates were set to .025.
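The control flow of the simulation run steps can be sketched in a few lines of Python. This is only an illustrative outline under the example settings from figure 10 (minimum 10, maximum 100, increment 10, 10 rounds); the original program was written in JavaScript, and the function name `calibration_schedule` is hypothetical. How the calibration sample itself was drawn (step 2) is not reproduced here.

```python
def calibration_schedule(min_size=10, max_size=100, increment=10, rounds=10):
    """Yield (calibration_sample_size, round_number) pairs in run order."""
    size = min_size                          # step 1: start at the minimum size
    while size <= max_size:                  # step 6: stop once past the maximum
        for rnd in range(1, rounds + 1):     # step 4: repeat steps 2 and 3
            # Steps 2 and 3 (calibrate, then test every examinee) would run here.
            yield size, rnd
        size += increment                    # step 5: grow the calibration sample


schedule = list(calibration_schedule())
print(len(schedule))         # 100 calibration rounds
print(len(schedule) * 104)   # 10400 simulated tests per algorithm
```

Under these example settings the loop visits ten sample sizes with ten rounds each, so each algorithm is run on the 104 examinees 100 times.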

In each simulated adaptive test associated with step 3 above, an examinee was administered one of the 85 items at random, and the examinee's actual correct or incorrect response to that item in the historical data was used as the response in the simulated test. Items continued to be administered randomly to the same examinee until either all 85 items had been exhausted or all the adaptive testing algorithms had been able to make a classification decision. The process then repeated with the next examinee and continued in this way until the conditions for the termination of the simulation specified by the calibration inputs had been met.
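A single simulated test can be sketched as follows for the SPRT case only. The sketch assumes Wald's standard stopping thresholds, (1 - β)/α and β/(1 - α), with α as the false mastery rate and β as the false nonmastery rate; the source does not spell out the decision rule, so this is an assumption. The function name and the item-bank probabilities in the usage lines (0.85 and 0.40) are hypothetical values chosen for illustration. The actual program also ran EXSPRT on the same item sequence and continued until every algorithm had decided.

```python
import random


def sprt_simulated_test(responses, p_c_m, p_c_nm, prior_m=0.5,
                        alpha=0.025, beta=0.025, rng=None):
    """Administer items in random order until SPRT decides or items run out.

    responses: dict mapping item ID -> True/False, the examinee's historical
               answers. p_c_m and p_c_nm are the item-bank level probabilities
               of a correct response from a master and a nonmaster.
    Returns (decision, test_length) with decision in {"M", "NM", "No Dec"}.
    """
    rng = rng or random.Random()
    upper = (1 - beta) / alpha        # assumed Wald bound for a mastery decision
    lower = beta / (1 - alpha)        # assumed Wald bound for a nonmastery decision
    ratio = prior_m / (1 - prior_m)   # prior odds of mastery
    order = list(responses)
    rng.shuffle(order)                # random item selection without replacement
    for n, item in enumerate(order, start=1):
        if responses[item]:
            ratio *= p_c_m / p_c_nm
        else:
            ratio *= (1 - p_c_m) / (1 - p_c_nm)
        if ratio >= upper:
            return "M", n
        if ratio <= lower:
            return "NM", n
    return "No Dec", len(order)       # all items exhausted without a decision


all_correct = {item_id: True for item_id in range(85)}
all_wrong = {item_id: False for item_id in range(85)}
print(sprt_simulated_test(all_correct, 0.85, 0.40))   # ('M', 5)
print(sprt_simulated_test(all_wrong, 0.85, 0.40))     # ('NM', 3)
```

With both error rates at .025 the thresholds are about 39 and 1/39, so an examinee who answers consistently is classified after only a handful of items.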

Each run of the simulations would populate data into five tables: (1) Precision of Item Calibration Estimates; (2) SPRT and EXSPRT Results by Examinee; (3) SPRT Precision of Item Calibration Estimates and Test Metrics By Unique Test; (4) EXSPRT Precision of Item Calibration Estimates and Test Metrics By Unique Test; and (5) Test Metrics By Calibration Sample Size Group. Screenshots of the first few rows of each of the five tables are provided below. It is important to note that these screenshots are not results of the study but are provided to show context on how the simulations operated.

Figure 11. Screenshot of sample output of results from estimating precision of item calibrations.

Where:

Cal Sample Size is the calibration sample size being simulated

Cal Round is the calibration round for a given calibration sample size

Item ID is the ID of one of the 85 items being calibrated

Nonmaster (NM) is the label for all the nonmaster calibration statistics for the item

Master (M) is the label for all the master calibration statistics for the item

# is the number of nonmasters/masters in the calibration sample

s is the number of nonmaster/master successful/correct responses to the item

f is the number of nonmaster/master failed/incorrect responses to the item

P(C|NM)/P(C|M) is the probability of a correct response from a nonmaster/master

P(!C|NM)/P(!C|M) is the probability of an incorrect response from a nonmaster/master

Area I is the area under the masters' beta distribution curve between P(C|NM) and the end of the tail

Area II is the area under the masters' beta distribution curve between P(C|NM) and a specific alpha point

Beta t-tests columns are initial experiments with various versions of the Beta Difference Index statistic
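The per-item count and probability columns described above can be illustrated with a short sketch. It assumes the probabilities are plain proportions (s divided by #); the source does not state which estimator the program used, and a smoothed estimate such as (s + 1)/(# + 2) would avoid probabilities of exactly 0 or 1. The function name and sample responses are hypothetical, and the Area I, Area II, and Beta t-tests columns are not reproduced.

```python
def calibrate_item(group_responses):
    """Summarize one group's answers to one item.

    group_responses: list of True/False answers from the calibration sample
                     of one group (masters or nonmasters).
    """
    n = len(group_responses)     # "#" column
    s = sum(group_responses)     # correct responses
    f = n - s                    # incorrect responses
    return {
        "n": n,
        "s": s,
        "f": f,
        "p_correct": s / n,      # P(C|M) or P(C|NM), assumed plain proportion
        "p_incorrect": f / n,    # P(!C|M) or P(!C|NM)
    }


masters = [True] * 8 + [False] * 2       # hypothetical calibration sample of 10
nonmasters = [True] * 4 + [False] * 6
print(calibrate_item(masters)["p_correct"])      # 0.8
print(calibrate_item(nonmasters)["p_correct"])   # 0.4
```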

Figure 12. Screenshot of sample results for SPRT and EXSPRT tests during Monte Carlo COM test simulations

Where:

Cal Sample Size is the calibration sample size being simulated

Cal Round is the calibration round for a given calibration sample size

Examinee ID is the ID of one of the 104 examinees who took the COM test

Is Master indicates whether the examinee is a master based on their total test score

Master (M) is the label for all the master calibration statistics for the item

SPRT Results are the set of results associated with the SPRT based test

EXSPRT Results are the set of results associated with the EXSPRT based test

Correct/False NM/False M/No Dec indicates the accuracy of the test decision

Test Length is the number of items given on the test before a decision was made

Total Test Score is the percent of correct answers the examinee had on the total test
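The decision accuracy labels in this table follow from comparing each algorithm's decision with the examinee's mastery status on the total test. The sketch below is one way to express that comparison; the function names and the decision codes "M", "NM", and "No Dec" are illustrative assumptions, and treating a score at or above the 72.5% cut-score as mastery is also an assumption about how the boundary was handled.

```python
def is_master(total_test_score, cut_score=72.5):
    """Mastery status from the percent correct on the total test."""
    return total_test_score >= cut_score


def classify_outcome(master, decision):
    """Label a test decision as Correct, False NM, False M, or No Dec."""
    if decision == "No Dec":
        return "No Dec"                       # items ran out before a decision
    if decision == "M":
        return "Correct" if master else "False M"
    if decision == "NM":
        return "False NM" if master else "Correct"
    raise ValueError(f"unknown decision: {decision!r}")


score = 100 * 70 / 85                         # hypothetical examinee: 70 of 85 correct
print(classify_outcome(is_master(score), "NM"))   # False NM
```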

Figure 13. SPRT precision of item calibration estimates and test metrics by unique calibration round output table screenshot from Monte Carlo COM test simulations

Where:

Cal Sample Size is the calibration sample size being simulated

Cal Round is the calibration round for a given calibration sample size

Precision of Item Calibration Estimates are the set of statistics associated with the item-bank level parameter estimates

Nonmaster (NM) is the label for all the nonmaster calibration statistics for the item-bank

Master (M) is the label for all the master calibration statistics for the item-bank

# is the number of nonmasters/masters in the calibration sample

PRE is the proportion reduction in error achieved by the test

Percent Correct/False NM/False M/No Dec indicates the accuracy of the test decisions

Test Length is the number of items given on the test before a decision was made

μ is the mean test length

Figure 14. EXSPRT Precision of Item Calibration Estimates and Test Metrics By Unique Test Output Table Screenshot

Where:

Calibration Sample Size is the calibration sample size being simulated

Calibration Round is the calibration round for a given calibration sample size

Master (M) is the label for all the master calibration statistics for the item-bank

# is the number of nonmasters/masters in the calibration sample

SD is the standard deviation associated with the mean

Beta t-tests columns are initial experiments with various versions of the Beta Difference Index statistic

EXSPRT Test Metrics are the set of statistics associated with EXSPRT testing

PRE is the proportion reduction in error achieved by the test

Percent Correct/False NM/False M/No Dec indicates the accuracy of the test decisions

Test Length is the number of items given on the test before a decision was made

Figure 15. Test Metrics By Calibration Sample Size Group Output Table Screenshot

Where:

Cal Sample Size is the calibration sample size being simulated

SPRT/EXSPRT Results are the set of statistics associated with SPRT/EXSPRT test simulations

PRE is the proportion reduction in error achieved by the test

Percent Correct/False NM/False M/No Dec indicates the accuracy of the test decisions

Test Length is the number of items given on the test before a decision was made

μ is the mean of the associated statistic
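Grouping test results by calibration sample size, as in this table, amounts to a simple aggregation. The sketch below computes the percentage of each decision outcome and the mean test length per group; the function name, the row layout, and the sample rows are hypothetical, and PRE is omitted because its formula is not given in this section.

```python
from collections import Counter, defaultdict
from statistics import mean

OUTCOMES = ("Correct", "False NM", "False M", "No Dec")


def summarize_by_sample_size(rows):
    """rows: iterable of (cal_sample_size, outcome, test_length) tuples."""
    groups = defaultdict(list)
    for size, outcome, length in rows:
        groups[size].append((outcome, length))
    summary = {}
    for size, results in sorted(groups.items()):
        counts = Counter(outcome for outcome, _ in results)
        total = len(results)
        summary[size] = {
            # percentage of tests in the group ending in each outcome
            "percent": {k: 100 * counts[k] / total for k in OUTCOMES},
            # mean number of items administered before stopping
            "mean_test_length": mean(length for _, length in results),
        }
    return summary


rows = [(10, "Correct", 5), (10, "False M", 7), (20, "Correct", 4)]
summary = summarize_by_sample_size(rows)
print(summary[10]["percent"]["Correct"])    # 50.0
print(summary[10]["mean_test_length"])      # 6
```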

5.1.2 High Error Rate with Empirically Established Item-bank Level Probabilities

In document Facilitating Variable-Length Computerized Classification Testing Via Automatic Racing Calibration Heuristics (Page 125-135)