Modeling Rule-Based Item Generation

Geerlings, Hanneke; Glas, Cees A. W.; van der Linden, Wim J.

doi:10.1007/s11336-011-9204-x


Published: 17 March 2011

Volume 76, pages 337–359, (2011)

Psychometrika


Hanneke Geerlings¹, Cees A. W. Glas¹ & Wim J. van der Linden²


Abstract

An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.
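To make the idea of item families concrete, the following is a minimal simulation sketch, not the authors' model or estimation procedure. It assumes a simplified setup that the abstract does not specify: each family's mean difficulty is a linear combination of design-rule effects, items within a family vary around that mean (standing in for surface features), and responses follow a one-parameter normal-ogive model written in latent-variable form. The design matrix, rule effects, within-family standard deviation, and all function and variable names are hypothetical.

```python
import numpy as np


def simulate_item_families(n_persons=200, items_per_family=5, seed=0):
    """Simulate binary responses to items nested in rule-defined families.

    All numeric settings below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical design matrix: rows are families, columns are design rules.
    # A 1 means the rule is applied to every item in that family.
    design = np.array(
        [
            [1, 0, 0],
            [0, 1, 0],
            [1, 1, 0],
            [1, 0, 1],
            [1, 1, 1],
        ],
        dtype=float,
    )
    rule_effects = np.array([-0.5, 0.4, 0.8])  # assumed rule effects

    # Family-level mean difficulty: linear in the design rules (assumption).
    family_difficulty = design @ rule_effects
    n_families = design.shape[0]

    # Items in a family differ only by small random deviations.
    within_sd = 0.2  # assumed within-family spread
    noise = rng.standard_normal((n_families, items_per_family))
    difficulty = (family_difficulty[:, None] + within_sd * noise).ravel()
    family_of_item = np.repeat(np.arange(n_families), items_per_family)

    # Person abilities from a standard normal population.
    theta = rng.standard_normal(n_persons)

    # Latent-variable form of the normal-ogive model:
    # a response is correct when theta - difficulty + N(0, 1) error > 0.
    error = rng.standard_normal((n_persons, difficulty.size))
    latent = theta[:, None] - difficulty[None, :] + error
    responses = (latent > 0).astype(int)

    return responses, difficulty, family_of_item, family_difficulty


responses, difficulty, family_of_item, family_difficulty = simulate_item_families()
print(responses.shape)       # persons by items
print(family_difficulty)     # one mean difficulty per family
```

The latent-variable form is used here only because it avoids evaluating the normal distribution function; a fuller sketch would also need discrimination and guessing parameters if the model includes them.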



Author information

Authors and Affiliations

Department of Research Methodology, Measurement, and Data Analysis, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Hanneke Geerlings & Cees A. W. Glas
CTB/McGraw-Hill, 20 Ryan Ranch Road, Monterey, CA, 93940, USA
Wim J. van der Linden


Corresponding author

Correspondence to Hanneke Geerlings.


About this article

Cite this article

Geerlings, H., Glas, C.A.W. & van der Linden, W.J. Modeling Rule-Based Item Generation. Psychometrika 76, 337–359 (2011). https://doi.org/10.1007/s11336-011-9204-x


Received: 23 March 2010
Revised: 19 July 2010
Published: 17 March 2011
Issue Date: April 2011
DOI: https://doi.org/10.1007/s11336-011-9204-x
