Summary
This chapter provides an overview of several concepts and terms that were originally defined and investigated in the corner of education that housed psychometrics but have since migrated to the more general education literature. The first part gives definitions, explanations, and examples for commonly used terms, including reliability, generalizability, and validity. The second part provides an overview of how one might use these concepts in designing or choosing an instrument. The third part introduces some newer and more advanced topics that have received attention in recent years. The chapter concludes with a brief review of practical suggestions for those engaged in educational research.
Copyright information
© 2002 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Shea, J.A., Fortna, G.S. (2002). Psychometric Methods. In: Norman, G.R., et al. International Handbook of Research in Medical Education. Springer International Handbooks of Education, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0462-6_4
DOI: https://doi.org/10.1007/978-94-010-0462-6_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3904-8
Online ISBN: 978-94-010-0462-6
eBook Packages: Springer Book Archive