Statistical significance testing should be discontinued in mathematics education research

Menon, Rama

doi:10.1007/BF03217248

Article
Published: September 1993

Volume 5, pages 4–18, (1993)

Mathematics Education Research Journal

Abstract

It is claimed here that the confidence mathematics education researchers have in statistical significance testing (SST) as an inference tool par excellence for experimental research is misplaced. Five common myths about SST are discussed, namely that SST: (a) is a controversy-free, recipe-like method for decision making; (b) answers the question of whether there is a low probability that the research results were due to chance; (c) follows a logic that parallels that of mathematical proof by contradiction; (d) addresses the reliability/replicability question; and (e) is a necessary but not sufficient condition for the credibility of results. It is argued that SST’s contribution to educational research in general, and mathematics education research in particular, is not beneficial, and that SST should be discontinued as a tool for such research. Some alternatives to SST are suggested, and a call is made for mathematics education researchers to take the lead in using these alternatives.
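
Myth (b) can be made concrete with a small calculation. The sketch below is not taken from the article; it is an illustration under stated assumptions. The significance level is the probability of a significant result given a true null hypothesis, whereas "the probability that the results were due to chance" is usually read as the probability that the null hypothesis is true given a significant result. By Bayes' rule the second quantity also depends on the test's power and on how often null hypotheses are true in the field, which the test itself does not supply. The function name and the example values for `prior_null`, `alpha`, and `power` are hypothetical.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Return P(H0 true | test rejected H0) using Bayes' rule.

    prior_null: assumed share of tested hypotheses for which H0 is true
    alpha:      significance level, P(reject H0 | H0 true)
    power:      P(reject H0 | H0 false)
    All three inputs are illustrative assumptions, not values from the article.
    """
    for value in (prior_null, alpha, power):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be probabilities in [0, 1]")

    false_positives = alpha * prior_null          # H0 true, yet rejected
    true_positives = power * (1.0 - prior_null)   # H0 false, and rejected
    total_rejections = false_positives + true_positives
    if total_rejections == 0.0:
        raise ValueError("no rejections are possible with these inputs")
    return false_positives / total_rejections


if __name__ == "__main__":
    # Same alpha and power, two different assumed shares of true nulls.
    for prior in (0.5, 0.9):
        p = prob_null_given_significant(prior_null=prior, alpha=0.05, power=0.5)
        print(f"prior P(H0) = {prior:.1f} -> P(H0 | significant) = {p:.3f}")
```

With alpha fixed at 0.05 and power at 0.5, the computed probability is about 0.09 when half of the tested null hypotheses are assumed true and about 0.47 when nine in ten are. Neither figure equals 0.05, so a significant result at the .05 level does not by itself say that the probability of a chance result is low.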

Author information

Authors and Affiliations

Nanyang Technological University, Singapore
Rama Menon

About this article

Cite this article

Menon, R. Statistical significance testing should be discontinued in mathematics education research. Math Ed Res J 5, 4–18 (1993). https://doi.org/10.1007/BF03217248

Issue Date: September 1993
DOI: https://doi.org/10.1007/BF03217248
