Discussions and Future Directions - Some practical item selection algorithms in cognitive diagn

The current studies represent an effort to advance the feasibility of CD-CAT, an intelligent educational measurement tool that was envisioned more than twenty years ago as a means of enhancing individualized learning. On the one hand, recent developments in cognitive diagnostic modeling and CAT have equipped psychometricians with the tools they need to embark on the development of CD-CAT. On the other hand, the CD assessment component in the PARCC and Smarter Balanced assessments and the pedagogical challenges in MOOCs present great opportunities for CD-CAT.

The current studies have focused on the crucial element of CD-CAT: item selection algorithms. A comprehensive review of item selection algorithms in CD-CAT was conducted. Several new selection algorithms were proposed to address two important issues in CD-CAT: measurement efficiency and item exposure control. The PWCDI and PWACDI are computationally affordable and highly efficient alternatives to other information index-based algorithms. They can be used as building blocks for developing algorithms that deal with issues such as item exposure control, content balancing, and dual-purpose CD-CAT. Each of these could develop into an interesting future study.
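To make the notion of item exposure control concrete, the following minimal sketch computes per-item exposure rates and a chi-square-type index of how far they depart from perfectly uniform exposure, a summary that is common in the CAT literature. It is an illustration only: the function names, the input format, and the toy data are assumptions, and the index shown is not necessarily the one reported in the current studies. It also assumes a fixed-length test.

```python
from collections import Counter


def exposure_rates(administered, pool_size):
    """Return the exposure rate of every item in the pool.

    administered: list of lists; administered[i] holds the indices of the
                  items given to examinee i (hypothetical input format).
    pool_size:    number of items in the item pool.
    """
    n_examinees = len(administered)
    counts = Counter(item for test in administered for item in test)
    # Exposure rate = times administered / number of examinees.
    return [counts.get(j, 0) / n_examinees for j in range(pool_size)]


def exposure_chi_square(rates, test_length):
    """Chi-square-type distance between observed and uniform exposure rates.

    Under perfectly balanced exposure in a fixed-length test, every item
    would have rate test_length / pool_size; smaller values of the index
    indicate better balance.
    """
    uniform = test_length / len(rates)
    return sum((r - uniform) ** 2 / uniform for r in rates)


# Toy data (made up): 4 examinees, a 6-item pool, test length 3.
tests = [[0, 1, 2], [0, 1, 3], [0, 2, 4], [0, 1, 5]]
rates = exposure_rates(tests, pool_size=6)
chi2 = exposure_chi_square(rates, test_length=3)
print(rates, chi2)
```

In the toy data, item 0 is given to every examinee while items 3 to 5 are each given once, which is the kind of skew that exposure control methods aim to reduce.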

Although the binary stratification algorithm is a simpler alternative to the information index-based methods, the current research has demonstrated its edge in balancing item exposure rates in both fixed-length and variable-length CD-CAT. The stratification method has been well studied in traditional CAT. It offers an elegant solution to item exposure control. It also has the potential to solve item selection problems when multiple constraints must be taken into

The two newly proposed approaches in the current studies may appear to be competitors, but this is not necessarily the case, because each may be the better fit in different scenarios. In general, PWCDI and PWACDI are preferred when measurement efficiency is the top priority, while binary stratification is more advantageous in highly constrained CD-CAT. In some applications that have multiple constraints, there exists the possibility of using a hybrid


In document Some practical item selection algorithms in cognitive diagnostic computerized adaptive testing -- smart diagnosis for smart learning (Page 79-91)