Abstract
I address two issues inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is understanding the problems that test constructors and researchers who use tests have with psychometric knowledge. I argue that this understanding is important for a field such as psychometrics, in which the dissemination of psychometric knowledge among test constructors and researchers in general is highly important. The second issue concerns the identification of psychometric research topics that are relevant for test constructors and test users but, in my view, do not receive enough attention in psychometrics. I discuss the influence of test length on decision quality in personnel selection and on the quality of difference scores in therapy assessment, as well as theory development in test construction and validity research. I also briefly mention the issue of whether particular attributes are continuous or discrete.
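The abstract names test length and the quality of difference scores without giving formulas. As a minimal sketch, assuming classical test theory with parallel items, two standard results connect them: the Spearman–Brown formula predicts the reliability of a shortened test, and the Jacobson–Truax reliable change index (RCI) shows how lower reliability enlarges the pre-post difference needed before a change counts as reliable. The function names and all numbers below are hypothetical illustrations, not results from the article.

```python
import math


def spearman_brown(rho: float, k: float) -> float:
    """Predicted reliability after changing test length by factor k.

    k > 1 lengthens the test, k < 1 shortens it. Assumes (hypothetically)
    that the items added or removed are parallel to the existing ones.
    """
    return k * rho / (1 + (k - 1) * rho)


def reliable_change_index(pre: float, post: float, sd: float, rho: float) -> float:
    """Jacobson-Truax RCI: (post - pre) / S_diff.

    SEM = sd * sqrt(1 - rho) and S_diff = sqrt(2) * SEM, where sd is the
    standard deviation of observed scores in a reference group.
    """
    sem = sd * math.sqrt(1 - rho)
    s_diff = math.sqrt(2) * sem
    return (post - pre) / s_diff


def is_reliable_change(rci: float, critical: float = 1.96) -> bool:
    """Two-sided test at the 5% level under a normal approximation."""
    return abs(rci) > critical


if __name__ == "__main__":
    rho_full = 0.90                                # hypothetical reliability of a 40-item test
    rho_short = spearman_brown(rho_full, 10 / 40)  # predicted reliability of a 10-item short form
    print(f"short-form reliability: {rho_short:.3f}")

    # Hypothetical client: symptom score drops from 30 to 20; observed-score SD = 10.
    for label, rho in [("40 items", rho_full), ("10 items", rho_short)]:
        rci = reliable_change_index(30, 20, 10, rho)
        print(f"{label}: RCI = {rci:.2f}, reliable = {is_reliable_change(rci)}")
```

With these illustrative values, the same 10-point drop counts as reliable change on the full 40-item test but not on the 10-item short form, which is one way shortening a test can weaken individual decisions about change.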
Additional information
This article is based on the author’s Presidential Address, presented at the International Meeting of the Psychometric Society 2011, July 18–22, 2011, Hong Kong, China.
About this article
Cite this article
Sijtsma, K. Future of Psychometrics: Ask What Psychometrics Can Do for Psychology. Psychometrika 77, 4–20 (2012). https://doi.org/10.1007/s11336-011-9242-4
DOI: https://doi.org/10.1007/s11336-011-9242-4