Louis A. Roussos William Stout
6.3. Summing Up: A More Complete Approach to DIF Analysis
The general purpose for conducting a DIF analysis is to help ensure test equity. The statistical flagging of items that exhibit evidence of DIF represents an essential contribution toward the achievement of this objective. Because tests are inherently multidimensional and multidimensionality is the basic cause of DIF, increased understanding of test multidimensionality and the effects of these dimensions on DIF holds the potential for a more accurate interpretation of the test score, more control over the influence of relevant auxiliary dimensions, and the reduction of influence by unintended and irrelevant nuisance dimensions.
Thus, the optimal approach for a DIF analysis procedure would seem to be one that incorporates both the immediate critical goal of detecting DIF items, which is the focus of the traditional DIF analysis approach, and the longer range goal of identifying the DIF secondary dimensions, which is the focus of more recent advances in DIF research.
It is important to note that the process of test design and development already involves consideration of a wide variety of substantive item characteristics through the item review processes (including the review of flagged DIF items and the sensitivity review of items for offensive language that could cause DIF) and the creation and implementation of test specifications, and these identified characteristics provide a ready source of secondary dimensions for DIF hypotheses. Also, in the case of linked tests, the large number of pretest items that are typically tested provides a more than adequate pool for forming item bundles for these hypotheses. Moreover, pretest items are already frequently used for research purposes so that some of these pretest slots can be reserved for controlled testing of DIF hypotheses (e.g., see Bolt, 2000).
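As a minimal sketch of how characteristics already recorded during item review could be turned into candidate item bundles, the following Python fragment groups pretest items by shared characteristic. The item identifiers, characteristic tags, the `form_bundles` helper, and the minimum bundle size are all hypothetical and chosen only for illustration; they are not part of any procedure described in this chapter, and the resulting bundles would still need to be tested with a bundle DIF statistic.

```python
from collections import defaultdict

# Hypothetical pretest item metadata: item id -> characteristics noted
# during item review or in the test specifications (illustrative only).
pretest_items = {
    "P01": {"word_problem", "geometry"},
    "P02": {"word_problem"},
    "P03": {"geometry"},
    "P04": {"word_problem", "algebra"},
    "P05": {"algebra"},
    "P06": {"geometry"},
}


def form_bundles(items, min_size=2):
    """Group items into candidate bundles, one per shared characteristic.

    Only characteristics tagged on at least `min_size` items are kept,
    on the assumption that a bundle DIF hypothesis needs several items
    that share the suspected secondary dimension.
    """
    bundles = defaultdict(list)
    for item_id, tags in items.items():
        for tag in tags:
            bundles[tag].append(item_id)
    return {
        tag: sorted(ids)
        for tag, ids in bundles.items()
        if len(ids) >= min_size
    }


# Keep only characteristics shared by three or more pretest items.
bundles = form_bundles(pretest_items, min_size=3)
for tag, ids in sorted(bundles.items()):
    print(tag, ids)
```

Each retained characteristic then corresponds to one DIF hypothesis: that the items in the bundle share a secondary dimension on which the groups being compared differ.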
Thus, the advantages of increased understanding of DIF secondary dimensions by augmenting the traditional DIF analysis implementation procedure with the developing and testing of DIF hypotheses do not necessarily involve any significant increase in expense.
The inclusion of the developing and testing of DIF hypotheses in a DIF analysis implementation procedure often involves merely increased awareness that the hypotheses already exist and can be easily tested.
References
Ackerman, T. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Bolt, D. (2000). A SIBTEST approach to testing DIF hypotheses using experimentally designed test items. Journal of Educational Measurement, 37, 307–327.
Bolt, D. (2002, April). Studying the DIF potential of nuisance dimensions using bundle DIF and multidimensional IRT analyses. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel method. Applied Measurement in Education, 2, 217–233.
Douglas, J., Roussos, L. A., & Stout, W. F. (1996). Item bundle DIF hypothesis testing: Identifying suspect bundles and assessing their DIF. Journal of Educational Measurement, 33, 465–485.
Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2002, April). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis framework. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164–187.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum.
Linn, R. L. (1993). The use of differential item functioning statistics: A discussion of current practice and future implications. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 349–364). Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
McCarty, F. A., Oshima, T. C., & Raju, N. (2002, April). Identifying possible sources of differential bundle functioning with polytomously scored data. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.
Roussos, L. A., Schnipke, D. L., & Pashley, P. J. (1999). A generalized formula for the Mantel-Haenszel differential item functioning parameter. Journal of Educational and Behavioral Statistics, 24, 293–322.
Roussos, L. A., & Stout, W. F. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355–371.
Ryan, K. E., & Chiu, S. (2001). An examination of item context effects, DIF, and gender DIF. Applied Measurement in Education, 14, 73–90.
Shealy, R., & Stout, W. F. (1993). An item response theory model for test bias. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197–239). Hillsdale, NJ: Lawrence Erlbaum.
Stout, W. F., Bolt, D., Froelich, A. G., Habing, B., Hartz, S. M., & Roussos, L. A. (2003). Development of a SIBTEST bundle methodology for improving test equity with applications for GRE test development (GRE Board Professional Rep. No. 98–15P, ETS Research Rep. 03–06). Princeton, NJ: Educational Testing Service.
Stout, W. F., Habing, B., Douglas, J., Kim, H. R., Roussos, L. A., & Zhang, J. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331–354.
U.S. Department of Education. (2002). Draft regulations to implement Part A of Title I of the Elementary Secondary Education Act of 1965 as amended by the No Child Left Behind Act of 2001. Washington, DC: Author.
Walker, C. M., & Beretvas, S. N. (2001). An empirical investigation demonstrating the multidimensional DIF paradigm: A cognitive explanation for DIF. Journal of Educational Measurement, 38, 147–163.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Hillsdale, NJ: Lawrence Erlbaum.