Abstract
A filter algorithm using the F-measure has been combined with feature redundancy removal based on the Kolmogorov-Smirnov (KS) test for rough equality of statistical distributions. As a result, a computationally efficient K-S Correlation-Based Selection algorithm has been developed and tested on three high-dimensional microarray datasets using four types of classifiers. Results are quite encouraging, and several improvements are suggested.
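The two-stage idea in the abstract (rank features by relevance, then drop features whose distributions are roughly equal to one already kept) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the greedy loop, and the use of the standard large-sample two-sample KS critical value are assumptions made for illustration. Relevance scores are assumed to be supplied by the caller (the F-measure computation is not reproduced here), and features are assumed to be on comparable scales.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def remove_redundant(features, relevance, alpha=0.05):
    """Greedy redundancy filter (hypothetical helper, not from the paper).

    features:  dict mapping feature name -> list of values
    relevance: dict mapping feature name -> relevance score (assumed given)
    A feature is dropped when the KS test cannot distinguish its
    distribution from that of an already kept, more relevant feature.
    """
    ranked = sorted(features, key=lambda k: relevance[k], reverse=True)
    kept = []
    for name in ranked:
        x = features[name]
        redundant = any(
            ks_statistic(x, features[k])
            < ks_critical(len(x), len(features[k]), alpha)
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

With this sketch, a feature is compared only against the features already kept, so the number of KS tests stays well below the number of all feature pairs when many features are redundant.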
Keywords
- Feature Selection
- Feature Subset
- Feature Selection Algorithm
- Feature Ranking
- Microarray Gene Expression Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Biesiada, J., Duch, W. (2008). A Kolmogorov-Smirnov Correlation-Based Filter for Microarray Data. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_30
DOI: https://doi.org/10.1007/978-3-540-69162-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69159-4
Online ISBN: 978-3-540-69162-4
eBook Packages: Computer Science, Computer Science (R0)