Abstract
A new method for estimating the importance of attributes in supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: the values of each attribute found unimportant at earlier steps are randomly permuted among objects. At each step, the apparent importance of each attribute is calculated, and an attribute is declared unimportant unless its importance is uniformly better than that of the attributes already found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones remain. The statistical significance of the results so obtained is then verified. The method has been applied to 12 data sets of biological origin and shown to be more reliable than assessing attribute importance by a single standard run of a random forest.
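The iterative scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's `RandomForestClassifier` as a stand-in for Breiman's random forest program, averages Gini-based `feature_importances_` over several runs as the "apparent importance", and (as an assumed bootstrap step, since the abstract does not specify how the first unimportant attribute is chosen) seeds the unimportant set with the single lowest-scoring attribute. The function name `iterative_importance` and all parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def iterative_importance(X, y, n_runs=3, max_steps=10, seed=0):
    """Illustrative sketch of the iterative permutation scheme.

    Returns a boolean mask marking the attributes still considered
    important when the procedure stops.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    unimportant = np.zeros(n_features, dtype=bool)

    for _ in range(max_steps):
        # Modify the data set: permute values of each attribute
        # found unimportant at earlier steps.
        Xm = X.copy()
        for j in np.where(unimportant)[0]:
            Xm[:, j] = rng.permutation(Xm[:, j])

        # Apparent importance: average over several forest runs.
        imps = np.zeros(n_features)
        for r in range(n_runs):
            rf = RandomForestClassifier(n_estimators=50, random_state=r)
            rf.fit(Xm, y)
            imps += rf.feature_importances_
        imps /= n_runs

        if not unimportant.any():
            # Assumed bootstrap: start with the lowest-scoring attribute.
            newly = imps == imps.min()
        else:
            # An attribute stays important only if it scores uniformly
            # better than every attribute already declared unimportant.
            newly = (~unimportant) & (imps <= imps[unimportant].max())

        if not newly.any():
            break  # only attributes beating the randomized ones remain
        unimportant |= newly

    return ~unimportant

# Toy usage: 8 attributes, of which 3 are informative.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=1)
important = iterative_importance(X, y)
```

Note that the published method additionally verifies the statistical significance of the surviving attributes, a step omitted from this sketch.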
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Rudnicki, W.R., Kierczak, M., Koronacki, J., Komorowski, J. (2006). A Statistical Method for Determining Importance of Variables in an Information System. In: Greco, S., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol. 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_58
DOI: https://doi.org/10.1007/11908029_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47693-1
Online ISBN: 978-3-540-49842-1