Random Item IRT Models

De Boeck, Paul

doi:10.1007/s11336-008-9092-x

Presidential Address
Published: 02 December 2008

Psychometrika, Volume 73, pages 533–559 (2008)

Abstract

It is common practice in IRT to treat items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and these are commonly of the fixed type, although exceptions occur. The present article shows that random item parameters make sense theoretically and that, in practice, the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to differential item functioning (DIF). In correspondence with these issues, the article has three parts. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and for its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined; instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.
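
As a reading aid, the first two model types named in the abstract can be sketched in a common notation. The symbols used here (\theta_p for a person parameter, \beta_i for an item difficulty, X_{ik} for the value of item i on item property k, \gamma_k for the weight of that property, \varepsilon_i for the item error term) and the normal distributions are conventional choices assumed for illustration; this page does not state the article's own notation or distributional assumptions.

```latex
% Rasch model for the response Y_{pi} of person p to item i
\operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i

% Random item variant (assumed form): item difficulties are draws
% from a population of items rather than fixed unknown constants
\beta_i \sim N(\mu_\beta, \sigma_\beta^2)

% Linear logistic test model with an error term (assumed form):
% item properties explain the difficulties up to a residual
\beta_i = \sum_{k=1}^{K} \gamma_k X_{ik} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)
```

Setting \sigma_\varepsilon^2 = 0 in the last line gives a linear logistic test model without error, in which the item properties must explain the difficulties perfectly; the error term relaxes that requirement.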

Author information

Authors and Affiliations

K.U.Leuven, Leuven, Belgium
Paul De Boeck

Corresponding author

Correspondence to Paul De Boeck.

About this article

Cite this article

De Boeck, P. Random Item IRT Models. Psychometrika 73, 533–559 (2008). https://doi.org/10.1007/s11336-008-9092-x

Published: 02 December 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s11336-008-9092-x