Psychometric Methods

Shea, Judy A.; Fortna, Gregory S.

doi:10.1007/978-94-010-0462-6_4

Part of the book series: Springer International Handbooks of Education (SIHE, volume 7)

Summary

The purpose of this chapter is to provide an overview of several concepts and terms that were originally defined and investigated in the corner of education that housed psychometrics, but have migrated to the more general education literature. Definitions, explanations, and examples will be given for commonly used terms, including reliability, generalizability, and validity. Following the discussion of these common psychometric concepts and terms, the second part of the chapter provides an overview of how one might use them in designing or choosing an instrument. The third part of the chapter introduces some newer and more advanced topics that have received attention in recent years. The chapter concludes with a brief review of practical suggestions for those engaged in educational research.

Author information

Authors and Affiliations

University of Pennsylvania, USA
Judy A. Shea
American Board of Internal Medicine, USA
Gregory S. Fortna

Editor information

Editors and Affiliations

McMaster University, Canada
Geoff R. Norman
University of Maastricht, The Netherlands
Cees P. M. van der Vleuten & Diana H. J. M. Dolmans
University of Sheffield, UK
David I. Newble
Dalhousie University, Canada
Karen V. Mann
University of Toronto, Canada
Arthur Rothman
CurryCorp, Canada
Lynn Curry

About this chapter

Cite this chapter

Shea, J.A., Fortna, G.S. (2002). Psychometric Methods. In: Norman, G.R., et al. International Handbook of Research in Medical Education. Springer International Handbooks of Education, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0462-6_4

DOI: https://doi.org/10.1007/978-94-010-0462-6_4
Published: 05 May 2011
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3904-8
Online ISBN: 978-94-010-0462-6
eBook Packages: Springer Book Archive
