Abstract
An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.
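The structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a normal-ogive response function with an optional guessing parameter, family mean difficulties that are a linear function of the design rules applied, and item difficulties within a family that vary normally around the family mean. All names and numeric values (rule_effects, design, within_sd, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def family_mean_difficulty(rule_effects, design_row):
    """Family mean difficulty as the sum of the effects of the rules used.

    design_row holds 1 where a design rule is applied to the family, else 0.
    The linear form is an assumption made for this sketch.
    """
    return sum(effect * used for effect, used in zip(rule_effects, design_row))


def simulate_family_items(rng, mean_difficulty, within_sd, n_items):
    """Draw item difficulties that scatter around the family mean.

    The scatter stands in for differences in surface features only.
    """
    return [rng.gauss(mean_difficulty, within_sd) for _ in range(n_items)]


def response_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Normal-ogive probability of a correct response (assumed form)."""
    return guessing + (1.0 - guessing) * normal_cdf(
        discrimination * theta - difficulty
    )


def simulate_responses(rng, thetas, difficulties):
    """Simulate a persons-by-items matrix of 0/1 responses."""
    return [
        [1 if rng.random() < response_probability(t, b) else 0
         for b in difficulties]
        for t in thetas
    ]


rng = random.Random(2011)

# Hypothetical effects of three design rules on difficulty.
rule_effects = [0.5, 1.0, -0.3]

# Hypothetical design matrix: one row per item family.
design = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

family_means = [family_mean_difficulty(rule_effects, row) for row in design]

# Four items per family, differing only by within-family noise.
difficulties = []
for mean in family_means:
    difficulties.extend(simulate_family_items(rng, mean, 0.2, 4))

thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]
responses = simulate_responses(rng, thetas, difficulties)
```

Data generated this way have the two-level item structure (items nested in families, families explained by design rules) that a parameter recovery study of such a model would need. Estimating the parameters from these data, for example with a data-augmented Gibbs sampler, is outside the scope of this sketch.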
Cite this article
Geerlings, H., Glas, C.A.W. & van der Linden, W.J. Modeling Rule-Based Item Generation. Psychometrika 76, 337–359 (2011). https://doi.org/10.1007/s11336-011-9204-x
DOI: https://doi.org/10.1007/s11336-011-9204-x