The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

Downing, Steven M.

doi:10.1007/s10459-004-4019-5

The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

Published: June 2005

Volume 10, pages 133–143, (2005)
Cite this article

Download PDF

Access provided by CONRICYT-eBooks

Advances in Health Sciences Education Aims and scope Submit manuscript

The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

Download PDF

Steven M. Downing¹

1554 Accesses
134 Citations
1 Altmetric
Explore all metrics

Abstract

The purpose of this research was to study the effects of violations of standard multiple-choice item writing principles on test characteristics, student scores, and pass–fail outcomes. Four basic science examinations, administered to year-one and year-two medical students, were randomly selected for study. Test items were classified as either standard or flawed by three independent raters, blinded to all item performance data. Flawed test questions violated one or more standard principles of effective item writing. Thirty-six to sixty-five percent of the items on the four tests were flawed. Flawed items were 0–15 percentage points more difficult than standard items measuring the same construct. Over all four examinations, 646 (53%) students passed the standard items while 575 (47%) passed the flawed items. The median passing rate difference between flawed and standard items was 3.5 percentage points, but ranged from −1 to 35 percentage points. Item flaws had little effect on test score reliability or other psychometric quality indices. Results showed that flawed multiple-choice test items, which violate well established and evidence-based principles of effective item writing, disadvantage some medical students. Item flaws introduce the systematic error of construct-irrelevant variance to assessments, thereby reducing the validity evidence for examinations and penalizing some examinees.

Article PDF

Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale

Article Open access 03 January 2017

Theory, Process, and Validation Evidence for a Staff-Driven Medical Education Exam Quality Improvement Process

Article 06 July 2016

Modifying Hofstee standard setting for assessments that vary in difficulty, and to determine boundaries for different levels of achievement

Article Open access 28 January 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

M. Albanese (1993) ArticleTitleType K and other complex multiple-choice items: An analysis of research and item properties Educational Measurement: Issues and Practices 12 28–33
Google Scholar
Case S.M. Downing S.M. (1989). Performance of various multiple-choice item types on medical specialty examinations: Types A,B,C, K, and X. Proceedings of the Twenty-Eighth Annual Conference on Research in Medical Education, pp. 167–172
D.B. CaseS.M. Swanson (1998) Constructing Written Test Questions for the Basic and Clinical Sciences National Board of Medical Examiners Philadelphia, PA
Google Scholar
K.D. Crehan T.M. Haladyna (1991) ArticleTitleThe validity of two item-writing rules Journal of Experimental Education 59 183–192
Google Scholar
Dawson-Saunders B., Nungester R.J. Downing S.M. (1989). A comparison of single best answer multiple-choice items (A-type) and complex multiple-choice items (K-type). Proceedings of the Twenty-Eighth Annual Conference on Research in Medical Education, pp. 161–166
S.M. Downing (2002) ArticleTitleConstruct-irrelevant variance and flawed test questions: Do multiple-choice item writing principles make any difference? Academic Medicine 77 s103–104 Occurrence Handle12377719
PubMed Google Scholar
S.M. Downing R.A. Baranowski L.J. Grosso J.J. Norcini (1995) ArticleTitleItem type and cognitive ability measured: The validity evidence for multiple true–false items in medical specialty certification Applied Measurement in Education 8 89–199
Google Scholar
Downing S.M., Dawson-Saunders B., Case S.M. Powell R.D. (April, 1991). The psychometric effects of negative stems, unfocused questions, and heterogeneous options on NBME Part I and Part II characteristics. A paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL
R.B. Frary (1991) ArticleTitleThe none-of-the-above option: An empirical study Applied Measurement in Education 4 115–124
Google Scholar
T.M. Haladyna (2004) Developing and Validating Multiple-choice Test Items Lawrence Erlbaum Associates Hillsdale, NJ
Google Scholar
T.M. Haladyna M.C. DowningS.M. Rodriguez (2002) ArticleTitleA review of multiple-choice item-writing guidelines Applied Measurement in Education 15 309–334 Occurrence Handle10.1207/S15324818AME1503_5
Article Google Scholar
P.H. Harasym E.J. Leong C. Violato R. Brant F.F. Lorscheider (1998) ArticleTitleCuing effect of “all of the above” on the reliability and validity of multiple-choice test items Evaluation and the Health Profession 21 120–133
Google Scholar
R.F. Jozefowicz B.M. Koeppen S. Case R. Galbraith D. Swanson H. Glew (2002) ArticleTitleThe quality of in-house medical school examinations Academic Medicine 77 156–161 Occurrence Handle11841981
PubMed Google Scholar
R.L. Linn N.E. Gronlund (2000) Measurement and Assessment in Teaching EditionNumber8 Prentice-Hall Upper Saddle River, NJ
Google Scholar
W.A. Mehrens I.J. Lehmann (1991) Measurement and Evaluation in Education and Psychology Harcourt Brace New York
Google Scholar
S. Messick (1989) Validity R.L. Linn (Eds) Educational Measurement EditionNumber3 American Council on Education and Macmillan New York 13–104
Google Scholar
L. Nedelsky (1954) ArticleTitleAbsolute grading standards for objective tests Educational and Psychological Measurement 14 181–201
Google Scholar
A.J. Nitko (1996) Educational Assessment of Students Merrill Englewood Cliffs, NJ
Google Scholar
P. Tamir (1993) ArticleTitlePositive and negative multiple choice items: How difficult are they? Studies in Educational Evaluation 19 311–32
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Medical Education (MC 591), College of Medicine, University of Illinois at Chicago, 808 South Wood Street, Chicago, Il, 60612-7309, USA
Steven M. Downing

Authors

Steven M. Downing
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Steven M. Downing.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Downing, S.M. The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education. Adv Health Sci Educ Theory Pract 10, 133–143 (2005). https://doi.org/10.1007/s10459-004-4019-5

Download citation

Received: 10 December 2003
Accepted: 01 October 2004
Issue Date: June 2005
DOI: https://doi.org/10.1007/s10459-004-4019-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

Abstract

Article PDF

Similar content being viewed by others

Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale

Theory, Process, and Validation Evidence for a Staff-Driven Medical Education Exam Quality Improvement Process

Modifying Hofstee standard setting for assessments that vary in difficulty, and to determine boundaries for different levels of achievement

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

Abstract

Article PDF

Similar content being viewed by others

Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale

Theory, Process, and Validation Evidence for a Staff-Driven Medical Education Exam Quality Improvement Process

Modifying Hofstee standard setting for assessments that vary in difficulty, and to determine boundaries for different levels of achievement

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation