Abstract
Modern high-throughput technologies allow us to simultaneously measure the expression levels of a huge number of candidate predictors, some of which are likely to be associated with survival. One difficult task is to search among this enormous number of potential predictors and correctly identify most of the important ones without mistakenly identifying too many spurious associations. Mere variable selection is insufficient, however, because the information from the multiple predictors must be intelligently combined and calibrated to form the final composite predictor. Many commonly used procedures overfit the training data, miss many important predictors, or both. Although it is impossible to simultaneously adjust for a huge number of predictors in an unconstrained way, we propose a method that offers a middle ground in which some partial multivariate adjustments can be made in an adaptive fashion, regardless of the number of candidate predictors. We demonstrate the performance of our proposed procedure in a simulation study within the Cox proportional hazards regression framework, and we apply our new method to a publicly available data set to construct a novel prognostic gene signature for breast cancer survival.
Acknowledgments
This work was supported in part by Concept Award W81XWH-04-1-0714 from the Breast Cancer Research Program of the Congressionally Directed Medical Research Programs run by the US Department of Defense via the US Army Medical Research and Materiel Command; the University of Rochester CTSI grant # 1 UL1 RR024160-01 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH); and the NIH Roadmap for Medical Research.
Copyright information
© 2013 Springer Science+Business Media New York
About this protocol
Cite this protocol
Peterson, D.R. (2013). Constructing Multivariate Prognostic Gene Signatures with Censored Survival Data. In: Yakovlev, A., Klebanov, L., Gaile, D. (eds) Statistical Methods for Microarray Data Analysis. Methods in Molecular Biology, vol 972. Humana Press, New York, NY. https://doi.org/10.1007/978-1-60327-337-4_6
DOI: https://doi.org/10.1007/978-1-60327-337-4_6
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-60327-336-7
Online ISBN: 978-1-60327-337-4
eBook Packages: Springer Protocols