Summary
Although there can be many reasons that one study fails to confirm the results of another, the consequences of data exploration and the potential for spuriously significant results are often overlooked. A series of simulation experiments was designed to mimic the characteristics of relapse-free survival data that might be encountered in a prognostic factor study of node-negative breast cancer patients. Each simulated dataset of 500 or 250 cases was divided into a training set, used to select the “best” prognostic factor cutpoint, and a validation set, used to confirm the cutpoint. Testing multiple cutpoints on the training set markedly increased the risk of making a Type I error. The power to detect even small true differences was substantial and increased as the number of cutpoints increased. Regardless of the number of cutpoints tested on the training set, the Type I error rate on an independent validation set was quite stable, and the power of the validation set to detect true differences was not related to the number of cutpoints. Validation power closely approximated that predicted for a simple two-group comparison. It is therefore recommended that exploratory analyses of prognostic factors formally employ some method of adjusting for increased Type I errors, such as independent validation sets, ad hoc adjustment factors, or other statistical methods of estimating the true risk.
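The training/validation design described in the summary can be illustrated with a small simulation. The sketch below is not the authors' code, and every specific in it is an assumption made for illustration: exponential survival times with a baseline hazard of 0.1 per year, administrative censoring at 5 years, a uniformly distributed marker whose true effect (if any) begins at 0.5, an equal split into training and validation halves, evenly spaced candidate cutpoints, and selection of the cutpoint with the smallest log-rank p-value. All function names are hypothetical.

```python
import math
import random


def logrank_p(times, events, groups):
    """Two-sided log-rank p-value (1 df). Assumes no tied event times."""
    order = sorted(range(len(times)), key=times.__getitem__)
    n, n1 = len(times), sum(groups)      # at risk: total and in group 1
    o_minus_e = var = 0.0
    for i in order:
        if events[i]:
            frac = n1 / n                # expected share of events in group 1
            o_minus_e += groups[i] - frac
            var += frac * (1.0 - frac)
        n -= 1                           # subject i leaves the risk set
        n1 -= groups[i]
    if var == 0.0:                       # e.g. one group is empty
        return 1.0
    return math.erfc(math.sqrt(o_minus_e ** 2 / var / 2.0))


def simulate_case(rng, hazard_ratio, follow_up=5.0, base_hazard=0.1):
    """One case: (marker, observed time, event flag). Values are assumptions."""
    marker = rng.random()
    rate = base_hazard * (hazard_ratio if marker > 0.5 else 1.0)
    t = rng.expovariate(rate)
    return marker, min(t, follow_up), t <= follow_up


def one_study(rng, n_cases, n_cutpoints, hazard_ratio):
    """Pick the best cutpoint on the training half, retest it on the other."""
    data = [simulate_case(rng, hazard_ratio) for _ in range(n_cases)]
    train, valid = data[: n_cases // 2], data[n_cases // 2:]
    cutpoints = [(j + 1) / (n_cutpoints + 1) for j in range(n_cutpoints)]

    def p_at(sample, cut):
        markers, times, events = zip(*sample)
        return logrank_p(times, events, [int(m > cut) for m in markers])

    best_p, best_cut = min((p_at(train, c), c) for c in cutpoints)
    return best_p, p_at(valid, best_cut)


def rejection_rates(n_cutpoints, hazard_ratio=1.0, n_sims=300,
                    n_cases=500, alpha=0.05, seed=1):
    """Fraction of simulated studies significant on (training, validation)."""
    rng = random.Random(seed)
    train_hits = valid_hits = 0
    for _ in range(n_sims):
        p_train, p_valid = one_study(rng, n_cases, n_cutpoints, hazard_ratio)
        train_hits += p_train < alpha
        valid_hits += p_valid < alpha
    return train_hits / n_sims, valid_hits / n_sims


if __name__ == "__main__":
    # hazard_ratio=1.0 is the null case, so every rejection is a Type I error.
    for k in (1, 3, 9):
        train_rate, valid_rate = rejection_rates(k, n_sims=200)
        print(f"{k} cutpoints: training {train_rate:.3f}, "
              f"validation {valid_rate:.3f}")
```

With `hazard_ratio=1.0` the marker is unrelated to survival, so the two returned rates estimate Type I error; setting `hazard_ratio` above 1 turns them into power estimates instead. Under these assumptions the training-set rate should climb as more cutpoints are tried while the validation-set rate stays near the nominal level, which is the pattern the summary reports.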
Additional information
We regret to report that Dr. McGuire died on March 25, 1992, while this work was in progress.
Cite this article
Hilsenbeck, S.G., Clark, G.M. & McGuire, W.L. Why do so many prognostic factors fail to pan out? Breast Cancer Res Tr 22, 197–206 (1992). https://doi.org/10.1007/BF01840833
DOI: https://doi.org/10.1007/BF01840833