Abstract
An algorithm for filtering information based on the Kolmogorov-Smirnov correlation-based approach has been implemented and tested on feature selection. The only parameter of this algorithm is the statistical confidence level at which two distributions are considered identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
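The idea described in the abstract can be illustrated with a minimal sketch: rank features by some relevance score, then greedily keep a feature only if a two-sample Kolmogorov-Smirnov test says its value distribution differs from every feature already kept, at confidence level alpha. This is an illustrative reconstruction, not the authors' implementation; the function names (`ks_filter`, `ks_same_distribution`) and the assumption that relevance scores are supplied externally are hypothetical, and the asymptotic critical-value formula is the standard textbook approximation.

```python
import math

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic:
    maximum distance between the two empirical CDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        if x[i] <= y[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_same_distribution(x, y, alpha=0.05):
    """Accept the null hypothesis (samples drawn from the same distribution)
    when D falls below the asymptotic critical value for level alpha."""
    n, m = len(x), len(y)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(x, y) <= c_alpha * math.sqrt((n + m) / (n * m))

def ks_filter(features, relevance, alpha=0.05):
    """Greedy redundancy filter: visit features in decreasing relevance and
    keep a feature only if the KS test distinguishes its distribution from
    every feature kept so far. Returns the indices of the kept features."""
    order = sorted(range(len(features)), key=lambda i: -relevance[i])
    kept = []
    for i in order:
        if all(not ks_same_distribution(features[i], features[k], alpha)
               for k in kept):
            kept.append(i)
    return kept
```

With this sketch, the confidence level alpha is indeed the only free parameter: a larger alpha makes the test quicker to declare two distributions different, so fewer features are discarded as redundant.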
References
S.D. Bay. The UCI KDD archive. Univ. of California, Irvine, 1999. http://kdd.ics.uci.edu.
T.M. Cover. The best two independent measurements are not the two best. IEEE Transactions on Systems, Man, and Cybernetics, 4:116–117, 1974.
M. Dash and H. Liu. Consistency-based search in feature selection. Artificial Intelligence, 151:155–176, 2003.
W. Duch, T. Winiarski, J. Biesiada, and A. Kachel. Feature ranking, selection and discretization. In Proceedings of Int. Conf. on Artificial Neural Networks (ICANN), pages 251–254, Istanbul, 2003. Bogazici University Press.
U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In R. Bajcsy, editor, Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Chambery, France, pages 1022–1027, San Francisco, CA, 1993. Morgan Kaufmann.
M.A. Hall. Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, Department of Computer Science, University of Waikato, Waikato, N.Z., 1999.
I. Chakravarti, R. Laha, and J. Roy. Handbook of Methods of Applied Statistics. John Wiley and Sons, Chichester, 1967.
K. Kira and L.A. Rendell. A practical approach to feature selection. In Proceedings of the Ninth International Conference on Machine Learning (ICML-92), pages 249–256, San Francisco, CA, 1992. Morgan Kaufmann.
M. Evans, N. Hastings, and B. Peacock. Statistical Distributions, 3rd ed. John Wiley and Sons, Chichester, 2000.
C.J. Merz and P.M. Murphy. The UCI repository of machine learning databases. Univ. of California, Irvine, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical recipes in C. The art of scientific computing. Cambridge University Press, Cambridge, UK, 1988.
J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
M. Robnik-Sikonja and I. Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53:23–69, 2003.
G.T. Toussaint. Note on optimal selection of independent binary-valued features for pattern recognition. IEEE Transactions on Information Theory, 17:618–618, 1971.
I. Witten and E. Frank. Data Mining — Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA, 2000.
L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, D.C., pages 856–863, San Francisco, CA, 2003. Morgan Kaufmann.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Biesiada, J., Duch, W. (2005). Feature Selection for High-Dimensional Data: A Kolmogorov-Smirnov Correlation-Based Filter. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_9
DOI: https://doi.org/10.1007/3-540-32390-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25054-8
Online ISBN: 978-3-540-32390-7