Abstract
In text categorization, term weighting is the task of assigning weights to terms during the document representation phase, so it directly affects classification performance. In this paper, we propose a new term weighting scheme, logtf.rf_max, which improves on tf.rf, one of the most effective term weighting schemes to date. We conducted experiments comparing the new scheme with tf.rf and other schemes on common text categorization benchmark data sets. The experimental results show that logtf.rf_max consistently outperforms tf.rf as well as the other schemes. Furthermore, the new scheme is simpler than tf.rf.
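The abstract names the schemes but does not give their formulas. The sketch below computes tf.rf using the relevance-frequency factor defined by Lan et al. (2009), rf = log2(2 + a / max(1, c)). Here a is the number of documents in the positive category that contain the term, and c is the number of documents outside it that contain the term. The logtf.rf_max part is an assumption about the proposed scheme and is not taken from this page. It assumes logtf = log2(1 + tf), and it takes rf_max as the largest rf of the term over all categories, scoring each category one-vs-rest. The function names and the toy corpus are hypothetical.

```python
import math
from typing import List, Set


def rf(a: int, c: int) -> float:
    """Relevance frequency (Lan et al., 2009).

    a: number of documents in the positive category that contain the term
    c: number of documents outside the positive category that contain the term
    """
    return math.log2(2.0 + a / max(1, c))


def tf_rf(tf: int, a: int, c: int) -> float:
    # tf.rf: raw term frequency in the document times the term's rf for one category
    return tf * rf(a, c)


def rf_max(term: str, docs: List[Set[str]], labels: List[str]) -> float:
    # ASSUMPTION: rf_max is the maximum of rf over all categories,
    # with each category scored one-vs-rest against the remaining documents.
    best = 0.0
    for category in set(labels):
        a = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
        c = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
        best = max(best, rf(a, c))
    return best


def logtf_rf_max(tf: int, rf_max_value: float) -> float:
    # ASSUMPTION: the log-scaled term frequency is log2(1 + tf)
    return math.log2(1.0 + tf) * rf_max_value


# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "goal"}]
labels = ["sport", "sport", "politics", "politics"]

w = rf_max("goal", docs, labels)   # sport: a=2, c=1 -> log2(4) = 2.0
print(tf_rf(3, a=2, c=1))          # 6.0
print(logtf_rf_max(3, w))          # 4.0
```

Under this assumed definition, rf_max does not depend on which category is being learned. A document therefore gets one weighted vector that all binary classifiers share. With tf.rf, by contrast, each category needs its own vector.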
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xuan, N.P., Le Quang, H. (2014). A New Improved Term Weighting Scheme for Text Categorization. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_23
DOI: https://doi.org/10.1007/978-3-319-02741-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02740-1
Online ISBN: 978-3-319-02741-8
eBook Packages: Engineering, Engineering (R0)