A New Supervised Term Ranking Method for Text Categorization

Mammadov, Musa; Yearwood, John; Zhao, Lei

doi:10.1007/978-3-642-17432-2_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6464)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

In text categorization, different supervised term weighting methods, for example Information Gain, the χ² statistic, and Odds Ratio, have been applied to improve classification performance by weighting terms with respect to different categories. The literature offers three term ranking methods for summarizing a term's weights across categories in multi-class text categorization: the Summation, Average, and Maximum methods. In this paper we present a new term ranking method for summarizing term weights, namely Maximum Gap. Using two term weighting methods, information gain and the χ² statistic, we set up controlled experiments comparing the different term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are adopted to evaluate the performance of the term ranking methods. Experimental results show that the new term ranking method performs better than the existing ones.
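To make the terminology above concrete, the following is a minimal sketch (not the authors' code) of how per-category χ² and information gain weights can be computed from the usual 2×2 document counts and then collapsed into one ranking score per term with the Summation, Average, and Maximum methods. The function names, the unweighted mean used for Average (some works weight by category probability instead), and the toy corpus are all illustrative assumptions; Maximum Gap is not sketched because the abstract does not define it.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# A document is (set of terms, set of category labels); labels form a set
# because Reuters-21578 documents can belong to several categories.
Doc = Tuple[Set[str], Set[str]]


def contingency(docs: List[Doc], term: str, category: str) -> Tuple[int, int, int, int]:
    """Return document counts (A, B, C, D) for one term/category pair.

    A: term present, in category     B: term present, not in category
    C: term absent, in category      D: term absent, not in category
    """
    a = sum(1 for terms, labels in docs if term in terms and category in labels)
    b = sum(1 for terms, labels in docs if term in terms and category not in labels)
    c = sum(1 for terms, labels in docs if term not in terms and category in labels)
    d = len(docs) - a - b - c
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """chi^2(t, c) = N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); 0 if a marginal is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """Information gain (in bits) of term presence about membership in one category."""
    n = a + b + c + d
    # (joint count, term-marginal count, category-marginal count) for each cell
    cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
    return sum(
        (joint / n) * math.log2(joint * n / (n_term * n_cat))
        for joint, n_term, n_cat in cells
        if joint > 0  # empty cells contribute 0 (0 * log 0 := 0)
    )


# Hypothetical names for the three existing summaries of per-category weights.
SUMMARIES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "avg": lambda w: sum(w) / len(w),  # unweighted mean (assumption)
    "max": max,
}


def rank_terms(docs: List[Doc], weight_fn, summary: str) -> List[Tuple[str, float]]:
    """Score every term by summarizing its per-category weights, best first."""
    categories = sorted(set().union(*(labels for _, labels in docs)))
    vocabulary = sorted(set().union(*(terms for terms, _ in docs)))
    scored = []
    for term in vocabulary:
        weights = [weight_fn(*contingency(docs, term, cat)) for cat in categories]
        scored.append((term, SUMMARIES[summary](weights)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy corpus (made up for illustration, not drawn from Reuters-21578).
docs: List[Doc] = [
    ({"oil", "price"}, {"crude"}),
    ({"oil", "barrel"}, {"crude"}),
    ({"wheat", "price"}, {"grain"}),
    ({"wheat", "export"}, {"grain"}),
    ({"bank", "rate"}, {"money"}),
    ({"rate", "price"}, {"money"}),
]

print(rank_terms(docs, chi_square, "max")[:3])
# [('oil', 6.0), ('rate', 6.0), ('wheat', 6.0)]
```

In this toy corpus "price" appears once in every category, so its χ² weight is 0 for each category and it ranks last under all three summaries, while "oil" (weights 6, 1.5, 1.5) scores 9 under Summation, 3 under Average, and 6 under Maximum. The choice of summary therefore changes how strongly a term concentrated in one category is favored over one spread thinly across several.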

Author information

Authors and Affiliations

Graduate School of Information Technology and Mathematical Science, University of Ballarat, Ballarat, VIC, 3350, Australia
Musa Mammadov, John Yearwood & Lei Zhao

Editor information

Editors and Affiliations

School of Computer and Information Science, University of South Australia, 5095, Mawson Lakes, SA, Australia
Jiuyong Li

About this paper

Cite this paper

Mammadov, M., Yearwood, J., Zhao, L. (2010). A New Supervised Term Ranking Method for Text Categorization. In: Li, J. (ed.) AI 2010: Advances in Artificial Intelligence. AI 2010. Lecture Notes in Computer Science, vol 6464. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17432-2_11

DOI: https://doi.org/10.1007/978-3-642-17432-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17431-5
Online ISBN: 978-3-642-17432-2
eBook Packages: Computer Science, Computer Science (R0)