A Genetic Semi-supervised Fuzzy Clustering Approach to Text Classification

Liu, Hong; Huang, Shang-teng

doi:10.1007/978-3-540-45160-0_17


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2762)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

A genetic semi-supervised fuzzy clustering algorithm is proposed that learns a text classifier from both labeled and unlabeled documents. Labeled documents guide the evolution of each chromosome, which encodes a fuzzy partition of the unlabeled documents. The fitness of each chromosome is evaluated by combining the fuzzy within-cluster variance of the unlabeled documents with the misclassification error on the labeled documents. The resulting cluster structure can then be used to classify new documents. Experimental results show that the proposed approach improves text classification accuracy significantly compared with text classifiers trained on only a small number of labeled documents. The approach also performs at least as well as a similar semi-supervised method, EM with Naïve Bayes.
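The chromosome encoding, fitness evaluation, and evolution described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be numeric vectors (e.g., TF-IDF), distances are Euclidean, cluster k is assumed to stand for class k, prototypes use the fuzzy c-means center update with fuzzifier m = 2, and the weight `alpha` between the two fitness terms, the crossover and mutation operators, and truncation selection are all hypothetical choices. The names `prototypes`, `classify`, `fitness`, `normalize`, and `evolve` are illustrative only.

```python
import numpy as np

def prototypes(U, X, m=2.0):
    """Cluster centers as membership-weighted means (fuzzy c-means style).
    U: (c, n) memberships of n unlabeled docs in c clusters; X: (n, d)."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)

def classify(V, X):
    """Assign each row of X to its nearest prototype (Euclidean, assumed)."""
    D2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # shape (n, c)
    return D2.argmin(axis=1)

def fitness(U, X_unl, X_lab, y_lab, alpha=1.0, m=2.0):
    """Lower is better: fuzzy within-cluster variance of unlabeled docs
    plus alpha * misclassification error on labeled docs.
    Assumption: cluster k represents class k; alpha is a hypothetical weight."""
    V = prototypes(U, X_unl, m)
    D2 = ((X_unl[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # shape (c, n)
    within = float(((U ** m) * D2).sum())
    error = float(np.mean(classify(V, X_lab) != y_lab))
    return within + alpha * error

def normalize(U):
    """Keep a chromosome a valid fuzzy partition: each column sums to 1."""
    U = np.clip(U, 1e-9, None)
    return U / U.sum(axis=0, keepdims=True)

def evolve(X_unl, X_lab, y_lab, c, pop_size=20, generations=50,
           alpha=1.0, seed=0):
    """Hypothetical GA: truncation selection, arithmetic crossover,
    Gaussian mutation. Returns prototypes of the best partition found."""
    rng = np.random.default_rng(seed)
    n = X_unl.shape[0]
    pop = [normalize(rng.random((c, n))) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(U, X_unl, X_lab, y_lab, alpha) for U in pop]
        order = np.argsort(scores)
        parents = [pop[i] for i in order[: pop_size // 2]]  # elitist survivors
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            w = rng.random()
            child = w * parents[a] + (1 - w) * parents[b]  # convex combination
            child = normalize(child + rng.normal(0.0, 0.05, child.shape))
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda U: fitness(U, X_unl, X_lab, y_lab, alpha))
    return prototypes(best, X_unl)

# Toy usage (hypothetical data): two separated blobs, one labeled doc per class.
X_unl = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
X_lab = np.array([[0, 0], [10, 10]], dtype=float)
y_lab = np.array([0, 1])
V = evolve(X_unl, X_lab, y_lab, c=2, alpha=100.0)
print(classify(V, np.array([[0.5, 0.5], [10.5, 10.5]])))
```

Because each child is a convex combination of two parents plus noise that is clipped and renormalized, every chromosome remains a fuzzy partition whose columns sum to 1. The prototypes of the fittest partition are then what a nearest-prototype rule would use to classify future documents.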



Author information

Authors and Affiliations

Dept. of Computer Science, Shanghai Jiaotong University, Xinjian Building 2008, Shanghai, 200030, China
Hong Liu & Shang-teng Huang


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Wright State University, USA
Guozhu Dong
School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
UNC Chapel Hill
Wei Wang


About this paper

Cite this paper

Liu, H., Huang, St. (2003). A Genetic Semi-supervised Fuzzy Clustering Approach to Text Classification. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_17


DOI: https://doi.org/10.1007/978-3-540-45160-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40715-7
Online ISBN: 978-3-540-45160-0
eBook Packages: Springer Book Archive
