Abstract
Dimensionality reduction can substantially improve the computational efficiency of classifiers in text categorization, and non-negative matrix factorization (NMF) can readily map the high-dimensional term space into a low-dimensional semantic subspace. Moreover, the non-negativity of the basis vectors gives the semantic subspace a meaningful interpretation. However, as a linear reconstruction method, NMF is sensitive to noise, missing data, and outliers, so it usually fails to achieve satisfactory classification performance. This paper proposes a novel approach in which the training texts and their category information are fused, and a transformation matrix that maps the term space into a semantic subspace is obtained by non-negative matrix factorization with basis orthogonality, followed by truncation. With this transformation, the dimensionality can be reduced aggressively. Experimental results show that the proposed approach maintains good classification performance even at very low dimensionality.
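The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion rule, the orthogonality constraint, or the truncation step, so everything below is an assumption made for illustration: category information is fused by stacking one-hot label rows under the term-document matrix, the factorization is plain NMF with the standard Lee and Seung multiplicative updates (no orthogonality constraint), truncation keeps only the term rows of the basis, and the transformation matrix is the pseudo-inverse of that truncated basis. The names `nmf`, `fit_transform_matrix`, and `label_weight` are hypothetical and do not come from the paper.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, V ~= W @ H, via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def fit_transform_matrix(X_terms, labels, n_classes, rank, label_weight=1.0):
    """Return a (rank, n_terms) matrix mapping term vectors to the subspace.

    X_terms: non-negative (n_terms, n_docs) term-document matrix.
    labels:  integer class index for each training document.
    """
    n_terms, n_docs = X_terms.shape
    # ASSUMPTION: fuse category information as one-hot rows under the terms.
    Y = np.zeros((n_classes, n_docs))
    Y[labels, np.arange(n_docs)] = label_weight
    V = np.vstack([X_terms, Y])
    W, _ = nmf(V, rank)
    # ASSUMPTION: "truncation" means dropping the label rows of the basis.
    W_terms = W[:n_terms]
    # ASSUMPTION: project with the pseudo-inverse of the truncated basis.
    return np.linalg.pinv(W_terms)


# Toy data: 50 terms, 20 documents, 3 classes, reduced to 4 dimensions.
rng = np.random.default_rng(1)
X = rng.random((50, 20))
labels = np.arange(20) % 3
T = fit_transform_matrix(X, labels, n_classes=3, rank=4)
Z = T @ X  # low-dimensional document representations
print(Z.shape)
```

Because the projection uses a pseudo-inverse, the reduced representation `Z` is not guaranteed to be non-negative; a non-negative least-squares projection would be one alternative if that property matters to the downstream classifier.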
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, W., Qian, Y., Tang, H. (2011). Dimensionality Reduction with Category Information Fusion and Non-negative Matrix Factorization for Text Categorization. In: Deng, H., Miao, D., Lei, J., Wang, F.L. (eds) Artificial Intelligence and Computational Intelligence. AICI 2011. Lecture Notes in Computer Science, vol 7004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23896-3_62
DOI: https://doi.org/10.1007/978-3-642-23896-3_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23895-6
Online ISBN: 978-3-642-23896-3
eBook Packages: Computer Science (R0)