Abstract
Document preprocessing and feature selection are major problems in data mining, machine learning, and pattern recognition, and feature subset selection has become an important preprocessing step in data mining. Document preprocessing, feature selection, and attribute reduction are therefore important for reducing the dimensionality of the feature space and improving performance. To address these problems, a theoretical framework based on a hybrid information gain-rough set (IG-RS) model is proposed. In the first stage, the documents are preprocessed. In the second stage, information gain is used to rank the importance of the features. In the third stage, a neighborhood rough set model is used to evaluate the lower and upper approximations. In the fourth stage, an attribute reduction algorithm based on the rough set model is proposed. Experimental results show that the method based on the hybrid IG-RS model is more flexible in dealing with documents.
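The stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the textbook definition of information gain for discrete features, a Euclidean neighborhood of radius `delta`, and a greedy forward search driven by the size of the positive region. All function names, the `delta` parameter, and the stopping rule are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(C) - sum_v P(f = v) * H(C | f = v) for a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def neighborhood(X, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the attribute indices in attrs (assumed metric)."""
    return [
        j for j in range(len(X))
        if math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs)) <= delta
    ]


def approximations(X, labels, attrs, delta, target):
    """Lower and upper approximations of the class `target`.
    Lower: neighborhood lies entirely inside the class.
    Upper: neighborhood intersects the class."""
    lower, upper = [], []
    for i in range(len(X)):
        classes = {labels[j] for j in neighborhood(X, i, attrs, delta)}
        if classes == {target}:
            lower.append(i)
        if target in classes:
            upper.append(i)
    return lower, upper


def dependency(X, labels, attrs, delta):
    """Fraction of samples whose neighborhood is pure in its own class,
    i.e. the relative size of the positive region."""
    if not attrs:
        return 0.0
    positive = sum(
        1 for i in range(len(X))
        if all(labels[j] == labels[i] for j in neighborhood(X, i, attrs, delta))
    )
    return positive / len(X)


def greedy_reduct(X, labels, candidate_attrs, delta):
    """Greedy forward selection: repeatedly add the attribute that raises
    the dependency most, and stop when no attribute improves it."""
    reduct, best = [], 0.0
    remaining = list(candidate_attrs)
    while remaining:
        scored = [(dependency(X, labels, reduct + [a], delta), a) for a in remaining]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        reduct.append(attr)
        remaining.remove(attr)
        best = score
    return reduct
```

In a pipeline of this shape, the information gain scores would be used to order or prefilter `candidate_attrs` before the rough-set stage, so the more expensive neighborhood computations run on fewer attributes. The value of `delta` strongly affects which samples fall in the positive region and would need tuning per dataset.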
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Patil, L.H., Atique, M. (2015). A Novel Feature Selection and Attribute Reduction Based on Hybrid IG-RS Approach. In: Satapathy, S., Govardhan, A., Raju, K., Mandal, J. (eds) Emerging ICT for Bridging the Future - Proceedings of the 49th Annual Convention of the Computer Society of India CSI Volume 2. Advances in Intelligent Systems and Computing, vol 338. Springer, Cham. https://doi.org/10.1007/978-3-319-13731-5_59
DOI: https://doi.org/10.1007/978-3-319-13731-5_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13730-8
Online ISBN: 978-3-319-13731-5
eBook Packages: Engineering, Engineering (R0)