A new feature selection method for text clustering

Xu, Junling; Xu, Baowen; Zhang, Weifeng; Cui, Zifeng; Zhang, Wei

doi:10.1007/s11859-007-0040-x

Published: September 2007

Volume 12, pages 912–916, (2007)

Wuhan University Journal of Natural Sciences

Xu Junling¹, Xu Baowen¹,², Zhang Weifeng³, Cui Zifeng¹ & Zhang Wei¹

Abstract

Feature selection methods have been successfully applied to text categorization but seldom to text clustering, because class label information is unavailable. This paper proposes a new feature selection method for text clustering based on expectation maximization and cluster validity. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering, treating the cluster assignments as class labels; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments on several popular datasets show the advantages of the proposed method.
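
The abstract gives no implementation details, so the following Python code is only a minimal sketch of the general idea, not the authors' algorithm. It rests on several assumptions: k-means with a deterministic farthest-point initialization stands in for the expectation-maximization clustering, the spread of per-cluster feature means stands in for the supervised feature selection criterion, clustering is rerun from scratch for each candidate subset, and the Davies-Bouldin index is computed in the reduced feature space. The function names (`kmeans`, `feature_scores`, `db_curve`) and the toy data are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group(X, labels):
    """Map each cluster label to the list of rows assigned to it."""
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    return clusters

def project(X, features):
    """Keep only the given feature columns of every row."""
    return [[row[f] for f in features] for row in X]

def kmeans(X, k, iters=10):
    """Plain k-means (assumed stand-in for EM) with farthest-point init."""
    cents = [X[0]]
    while len(cents) < k:
        cents.append(max(X, key=lambda r: min(dist(r, c) for c in cents)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(row, cents[c])) for row in X]
        for c in range(k):
            members = [row for row, lab in zip(X, labels) if lab == c]
            if members:
                cents[c] = centroid(members)
    return labels

def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower means more compact, better separated.
    Assumes at least two non-empty clusters with distinct centroids."""
    clusters = group(X, labels)
    cents = {k: centroid(rows) for k, rows in clusters.items()}
    scatter = {k: sum(dist(r, cents[k]) for r in rows) / len(rows)
               for k, rows in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i) for i in clusters]
    return sum(worst) / len(worst)

def feature_scores(X, labels):
    """Assumed supervised criterion: spread of per-cluster feature means,
    with cluster labels used in place of class labels."""
    means = [centroid(rows) for rows in group(X, labels).values()]
    overall = centroid(X)
    return [sum((m[f] - overall[f]) ** 2 for m in means)
            for f in range(len(overall))]

def db_curve(X, k, sizes):
    """For each (decreasing) subset size, cluster, rank features against the
    cluster labels, shrink the subset, and record its Davies-Bouldin index."""
    features = list(range(len(X[0])))
    curve = []
    for size in sizes:
        sub = project(X, features)
        scores = feature_scores(sub, kmeans(sub, k))
        keep = sorted(range(len(features)), key=lambda i: scores[i],
                      reverse=True)[:size]
        features = sorted(features[i] for i in keep)
        reduced = project(X, features)
        curve.append((size, davies_bouldin(reduced, kmeans(reduced, k)), features))
    return curve

# Toy data: columns 0 and 1 separate two groups, columns 2 and 3 are noise.
X = [[0, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0],
     [10, 10, 0, 0], [11, 10, 1, 1], [10, 11, 1, 0]]
curve = db_curve(X, k=2, sizes=[3, 2])
for size, db, features in curve:
    print(size, round(db, 4), features)
```

The resulting list of (subset size, index value, features) triples plays the role of the Davies-Bouldin curve. Taking the subset with the lowest index value is one possible selection rule; the abstract does not state which rule the authors apply.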

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, 210096, Jiangsu, China
Xu Junling, Xu Baowen, Cui Zifeng & Zhang Wei
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, 430072, Hubei, China
Xu Baowen
Department of Computer Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210003, Jiangsu, China
Zhang Weifeng

Corresponding author

Correspondence to Xu Baowen.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (60503020, 60373066), the Outstanding Young Scientist’s Fund (60425206), the Natural Science Foundation of Jiangsu Province (BK2005060) and the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University.

Biography: XU Junling (1984–), male, Ph.D. candidate, research direction: statistical pattern recognition, machine learning and data mining.

About this article

Cite this article

Xu, J., Xu, B., Zhang, W. et al. A new feature selection method for text clustering. Wuhan Univ. J. of Nat. Sci. 12, 912–916 (2007). https://doi.org/10.1007/s11859-007-0040-x

Received: 18 February 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s11859-007-0040-x

Key words

CLC number

TP 393
