A Frequent Term-Based Multiple Clustering Approach for Text Documents

Zheng, Hai-Tao; Chen, Hao; Gong, Shu-Qin

doi:10.1007/978-3-319-11116-2_57

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8709)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

With the boom of the web and social networks, the amount of generated text data has increased enormously. On the one hand, text clustering methods can group text data and support data mining tasks such as information retrieval and recommendation, but they still have shortcomings. In particular, most existing text clustering methods produce either a hard partition or a hierarchy, neither of which can describe the data from multiple perspectives. On the other hand, multiple clustering approaches, which are designed to group data from multiple perspectives, face challenges such as high time complexity and incomprehensible results when applied to text documents. In this paper, we propose a frequent term-based multiple clustering approach for text documents. Our approach clusters text documents from multiple perspectives and provides a semantic explanation for each cluster. Through a series of experiments, we show that our method is more scalable and yields more comprehensible results than traditional multiple clustering methods such as OSCLU and ASCLU when applied to text documents. We also find that our approach achieves better clustering quality than existing text clustering approaches such as FTC.
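
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general frequent term-based idea, not the authors' method. It assumes whitespace tokenization, a brute-force enumeration of term sets (rather than an Apriori-style search), and hypothetical names (`frequent_term_sets`, `min_support`, `max_size`). Each frequent term set is treated as a candidate cluster whose members are the documents containing all of its terms, so the term set itself serves as a readable label and a document may belong to several clusters.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (indices of documents containing it).

    Illustrative assumptions: terms are lowercase whitespace tokens, and term
    sets are enumerated by brute force up to max_size terms.
    """
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    covers = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            # The cover is the set of documents that contain every term in combo.
            cover = frozenset(
                i for i, terms in enumerate(doc_terms) if terms.issuperset(combo)
            )
            if len(cover) >= min_support:
                covers[frozenset(combo)] = cover
    return covers


# Toy corpus (made up for illustration).
docs = [
    "apple banana fruit",
    "apple phone store",
    "banana fruit market",
    "phone store market",
]
clusters = frequent_term_sets(docs, min_support=2)
for terms, cover in clusters.items():
    print(sorted(terms), "->", sorted(cover))
```

In this toy corpus, document 0 falls under both the term set {apple} and the term set {banana, fruit}, which illustrates how overlapping covers can describe the same document from more than one perspective.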

References

Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (1995)
Günnemann, S., Müller, E., Färber, I., Seidl, T.: Detection of orthogonal concepts in subspaces of high dimensional data. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1317–1326. ACM (2009)
Günnemann, S., Färber, I., Müller, E., Seidl, T.: ASCLU: Alternative subspace clustering. In: MultiClust at KDD. Citeseer (2010)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008 (1997)
Müller, E., Günnemann, S., Assent, I., Seidl, T.: Evaluating clustering in subspace projections of high dimensional data. Proceedings of the VLDB Endowment 2(1), 1270–1281 (2009)

Author information

Authors and Affiliations

Tsinghua-Southampton Web Science Laboratory, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
Hai-Tao Zheng, Hao Chen & Shu-Qin Gong

Editor information

Editors and Affiliations

Beijing Institute of Spacecraft System Engineering, Beijing, China
Lei Chen
School of Computer Science, National University of Defense Technology, 410073, Changsha, Hunan, China
Yan Jia
RMIT University, Melbourne, Australia
Timos Sellis
School of Computer Science and Technology, Soochow University, 215006, Suzhou, China
Guanfeng Liu

Cite this paper

Zheng, HT., Chen, H., Gong, SQ. (2014). A Frequent Term-Based Multiple Clustering Approach for Text Documents. In: Chen, L., Jia, Y., Sellis, T., Liu, G. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8709. Springer, Cham. https://doi.org/10.1007/978-3-319-11116-2_57

DOI: https://doi.org/10.1007/978-3-319-11116-2_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11115-5
Online ISBN: 978-3-319-11116-2
eBook Packages: Computer Science, Computer Science (R0)
