Computability, and Generality
[Figure: frequency table showing the counts 14, 42, and 12, with unobserved cells marked "?"]
A statistical model would be specified for the complete data that would be observed if everyone were measured on the more finely categorized variable.
The remaining part of the model would specify that certain cells are not observed; only their sums, such as those indicated in the figure, are observed. In this case, the second part of the model would show that for the second group of people, two unobserved cells would be merged (i.e., summed) to create the observed category “improved,” and three unobserved cells would be collapsed to create the observed category “not improved.” Researchers familiar with confirmatory factor analysis and structural equation models will probably guess (correctly) that some missing data models have unidentified parameters, so the analysis is sometimes complicated.
A wide variety of missing data cases can be analyzed using this framework. These include estimating frequencies when some people are missing data on some variables; fitting log-linear models when there are missing data; fitting latent class models and, more generally, models with fused cells (such as genetics models); fitting latent class models when some observed variables have missing data; fitting models when some variables are more finely categorized than others; and fitting models with various assumptions about the missing data process. Some of the above models were not previously conceptualized as missing data problems, so it was not realized how many situations could be treated within one general framework.
Rindskopf (1992) describes these models in detail.
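The two-part specification described above (a model for the complete data, plus a rule stating which unobserved cells are summed into each observed category) can be illustrated with a small numerical sketch. The code below is not the composite-link procedure of Rindskopf (1992); it substitutes a plain EM algorithm for a saturated multinomial model, and it assumes that both groups share the same cell probabilities and that the coarse measurement is ignorable. The counts, the function name `em_collapsed_cells`, and the matrix `collapse` are hypothetical and chosen only for illustration.

```python
import numpy as np


def em_collapsed_cells(fine_counts, coarse_counts, collapse, n_iter=200, tol=1e-10):
    """Estimate fine-category probabilities when some people are only coarsely classified.

    fine_counts   : counts for people measured on all K fine categories
    coarse_counts : counts for people measured only on the J coarse categories
    collapse      : J x K matrix of 0s and 1s; row j marks the fine cells
                    that are summed to form coarse category j
    """
    fine_counts = np.asarray(fine_counts, dtype=float)
    coarse_counts = np.asarray(coarse_counts, dtype=float)
    collapse = np.asarray(collapse, dtype=float)
    total = fine_counts.sum() + coarse_counts.sum()

    # Start from equal probabilities for the K complete-data cells.
    p = np.full(fine_counts.size, 1.0 / fine_counts.size)
    for _ in range(n_iter):
        # Probability of each observed (summed) category under the current p.
        coarse_p = collapse @ p
        # E-step: split each coarse count over its fine cells in proportion to p.
        expected = (collapse * p).T @ (coarse_counts / coarse_p)
        # M-step: saturated multinomial estimate from observed plus expected counts.
        p_new = (fine_counts + expected) / total
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p


# Hypothetical data: group 1 is measured on five fine categories; group 2 only
# as "improved" (first two fine cells) versus "not improved" (last three).
fine_counts = np.array([10, 20, 30, 25, 15])
coarse_counts = np.array([40, 60])
collapse = np.array([[1, 1, 0, 0, 0],
                     [0, 0, 1, 1, 1]])

p_hat = em_collapsed_cells(fine_counts, coarse_counts, collapse)
print(np.round(p_hat, 4))
```

Because the coarse categories here form a simple partition of the fine ones, the estimates can be checked by hand: each coarse category's probability is estimated from everyone, and the split within a coarse category is estimated from the finely measured group alone. More complicated patterns of collapsing, or restricted models for the complete data, would generally need the iterative fit.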
Even more general models for missing data can be estimated using the Bayesian program BUGS (Spiegelhalter, Thomas, & Best, 1999; Spiegelhalter, Thomas, Best, & Gilks, 1996). BUGS was primarily developed for Bayesian analysis with missing data, but it has been applied to a wide variety of statistical models.
The categorical data models for which it has been used include logistic regression, Poisson regression, item response theory, latent class analysis, multilevel (nested) models, and log-linear models.
One trend illustrated by this example is the move toward a comprehensive, general model that includes many special cases. This approach requires numerical methods involving heavy computation, especially for large problems, but even relatively large problems can now be analyzed using a microcomputer.
8.10. Summary and Implications
Categorical data analysis, like most of applied statistics, has become more realistic, more general, more comprehensive, and more complex. In fact, there are models even more general than some discussed here. For example, generalized linear models (McCullagh & Nelder, 1989) have been developed that include regression, ANOVA, logistic regression, and log-linear models (among others) as special cases. Computer hardware and software (e.g., BUGS, Mplus, LEM, SPlus) that did not previously exist have made many of these new methods possible, and have stimulated the development of more statistical methods.
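To make the "special cases" point concrete, the following minimal sketch fits ordinary regression, logistic regression, and a Poisson log-linear model with one routine, changing only the link and variance function. It uses iteratively reweighted least squares with canonical links, which is a standard way to fit generalized linear models but is not taken from this chapter. The function name `fit_glm`, the `FAMILIES` table, the starting values, and the 2 x 2 table are all assumptions made for illustration.

```python
import numpy as np

# Each family: (link, inverse link, variance function, starting mean from y).
FAMILIES = {
    "gaussian": (lambda mu: mu,
                 lambda eta: eta,
                 lambda mu: np.ones_like(mu),
                 lambda y: y),
    "binomial": (lambda mu: np.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + np.exp(-eta)),
                 lambda mu: mu * (1.0 - mu),
                 lambda y: (y + 0.5) / 2.0),
    "poisson": (np.log,
                np.exp,
                lambda mu: mu,
                lambda y: y + 0.5),
}


def fit_glm(X, y, family, n_iter=100, tol=1e-10):
    """Fit a canonical-link GLM by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    link, inv_link, variance, start = FAMILIES[family]

    eta = link(start(y))  # initial linear predictor taken from the data
    beta = None
    for _ in range(n_iter):
        mu = inv_link(eta)
        w = variance(mu)         # canonical link: working weight equals V(mu)
        z = eta + (y - mu) / w   # working response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = beta is not None and np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        if done:
            break
    return beta


# Hypothetical 2 x 2 table, cells ordered (row, col) = (0,0), (0,1), (1,0), (1,1).
counts = np.array([10.0, 20.0, 30.0, 40.0])
# Design matrix for the independence log-linear model: intercept, row, column.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

beta_hat = fit_glm(X, counts, "poisson")
fitted = np.exp(X @ beta_hat)  # expected counts under independence
print(np.round(fitted, 3))
```

For this table the fitted counts should agree with the familiar (row total x column total) / N values of 12, 18, 28, and 42. Calling the same function with "gaussian" gives the ordinary least squares solution, and "binomial" gives logistic regression for 0/1 outcomes.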
Many other areas of recent research have expanded the set of tools for analyzing categorical data. Some of these are too specialized for discussion here (e.g., exact methods, meta-analysis, and data-mining methods such as CHAID, CART, and neural networks). Others are covered in separate sections of this volume (e.g., multilevel models, longitudinal models, item response theory, and structural equation models).
Researchers must also keep in mind that analysis has implications for design; a badly designed study cannot be rescued by a brilliant analysis. Complex statistical methods require additional design considerations beyond those encountered with more traditional designs. In particular, latent variable models and multilevel models cannot be used without a properly designed study.
I hope that the examples presented here have provided a taste for the exciting new developments in categorical data analysis. We have not only exciting new methods but also exciting old methods; what could be better?
References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Fisher, R. A. (1930). Statistical methods for research workers (3rd ed.). Edinburgh, UK: Oliver & Boyd.
Freedman, D., Pisani, R., & Purves, R. (1978). Statistics. New York: W. W. Norton.
Friendly, M. (2000). Visualizing categorical data. Cary, NC: SAS Publishing.
Goleman, D. (1985, October 22). Strong emotional response to disease may bolster patient’s immune system. New York Times, p. C1.
Goodman, L. A. (1978). Analyzing qualitative/categorical data: Log-linear models and latent structure analysis. Cambridge, MA: Abt Books.
Laird, N. M., & Olivier, D. (1981). Covariance analysis of censored survival data using log-linear analysis techniques. Journal of the American Statistical Association, 76, 231–240.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
Rindskopf, D. (1987). Using latent class analysis to test developmental models. Developmental Review, 7, 66–85.
Rindskopf, D. (1990). Nonstandard loglinear models. Psychological Bulletin, 108, 150–162.
Rindskopf, D. (1992). A general approach to categorical data analysis with missing data using generalized linear models with composite links. Psychometrika, 57, 29–42.
Rindskopf, D. (1999). Some hazards of using nonstandard log-linear models, and how to avoid them. Psychological Methods, 4, 339–347.
Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis: Focused comparisons in the analysis of variance. New York: Cambridge University Press.
Smith, J. (1990, October 7). Take my advice. Los Angeles Times Magazine, p. 6.
Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1999). WinBUGS Version 1.2 user manual. Cambridge, UK: MRC Biostatistics Unit.
Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. R. (1996). BUGS: Bayesian inference using Gibbs sampling, Version 0.5 (Version ii). Cambridge, UK: MRC Biostatistics Unit.
Wickens, T. D. (1989). Multiway contingency tables analysis for the social sciences. Hillsdale, NJ: Lawrence Erlbaum.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604.
Woolf, B. (1955). On estimating the relation between blood group and disease. Annals of Human Genetics, 19, 251–253.