Abstract
Forest-based classification and prediction is one of the most widely used nonparametric statistical methods in science and engineering, particularly in machine learning and the analysis of high-throughput genomic data. In this chapter, we first introduce the construction of random forests and deterministic forests, and then address a fundamental and practical question: how large do the forests need to be?
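The abstract refers to the construction of random forests. Below is a minimal Python sketch of the generic Breiman-style recipe: one bootstrap sample per tree, a random subset of `mtry` features searched at each node, unpruned Gini-split trees, and a majority vote. It is not taken from the chapter. The function names, the Gini criterion, and the sqrt(p) default for `mtry` are illustrative assumptions, and deterministic forests are not shown. The sketch also tracks the out-of-bag (OOB) error as trees are added. This is one common heuristic for judging whether a forest is large enough, not necessarily the criterion the chapter develops.

```python
import random
from collections import Counter

# Illustrative sketch only: function names, the Gini criterion and the
# sqrt(p) default for mtry are assumptions, not the chapter's own code.


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # threshold does not separate the node
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(X, y, mtry, rng):
    """Grow an unpruned tree; each node searches only mtry randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return majority  # pure node becomes a leaf
    features = rng.sample(range(len(X[0])), min(mtry, len(X[0])))
    split = best_split(X, y, features)
    if split is None:
        return majority  # no usable split among the sampled features
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    return {
        "feature": f,
        "threshold": t,
        "left": grow_tree([r for r, g in zip(X, go_left) if g],
                          [l for l, g in zip(y, go_left) if g], mtry, rng),
        "right": grow_tree([r for r, g in zip(X, go_left) if not g],
                           [l for l, g in zip(y, go_left) if not g], mtry, rng),
    }


def predict_tree(node, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def grow_forest(X, y, n_trees=100, mtry=None, seed=0):
    """Bagging plus random feature selection; returns trees and their bootstrap index sets."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # sqrt(p): a common default for classification
    trees, bags = [], []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        trees.append(grow_tree([X[i] for i in bag], [y[i] for i in bag], mtry, rng))
        bags.append(set(bag))
    return trees, bags


def predict_forest(trees, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def oob_error_curve(trees, bags, X, y):
    """Out-of-bag error after 1, 2, ..., len(trees) trees (nan if no sample is out of bag yet)."""
    votes = [Counter() for _ in y]
    curve = []
    for tree, bag in zip(trees, bags):
        for i, x in enumerate(X):
            if i not in bag:  # this tree never saw sample i
                votes[i][predict_tree(tree, x)] += 1
        scored = [i for i in range(len(y)) if votes[i]]
        wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
        curve.append(wrong / len(scored) if scored else float("nan"))
    return curve


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(100)]
    y = [int(row[0] > 0.5) for row in X]  # toy labels depend on feature 0 only
    trees, bags = grow_forest(X, y, n_trees=25)
    curve = oob_error_curve(trees, bags, X, y)
    print("OOB error after 5, 10, 15, 20, 25 trees:", [round(e, 3) for e in curve[4::5]])
    print("prediction for [0.9, 0.2]:", predict_forest(trees, [0.9, 0.2]))
```

On the toy data, you can inspect the OOB curve to see whether adding trees still changes the error. The point where it levels off gives only a rough, data-dependent indication of a sufficient forest size. The pure-Python split search is quadratic in node size and is intended only for small examples.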
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zhang, H., Singer, B.H. (2010). Random and Deterministic Forests. In: Recursive Partitioning and Applications. Springer Series in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6824-1_6
DOI: https://doi.org/10.1007/978-1-4419-6824-1_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6823-4
Online ISBN: 978-1-4419-6824-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)