Abstract
Biomedical research is progressing rapidly, and there is a need to identify the sub-disciplines of biomedicine that attract the most interest. This amounts to identifying the popular research sub-fields that are growing at a fast pace within biomedical research. Applying topic models to research articles leads to better clustering algorithms, which reveal interesting insights into the underlying research problem. This paper proposes a new clustering algorithm that fuses the Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) with the hierarchical Dirichlet process (HDP). The resulting clusters of scientific articles are compared with those produced by K-means clustering, which reveals interesting results.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mahalakshmi, G.S., MuthuSelvi, G., Sendhilkumar, S. (2020). Gibbs Sampled Hierarchical Dirichlet Mixture Model Based Approach for Clustering Scientific Articles. In: Elçi, A., Sa, P., Modi, C., Olague, G., Sahoo, M., Bakshi, S. (eds) Smart Computing Paradigms: New Progresses and Challenges. Advances in Intelligent Systems and Computing, vol 766. Springer, Singapore. https://doi.org/10.1007/978-981-13-9683-0_18
DOI: https://doi.org/10.1007/978-981-13-9683-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9682-3
Online ISBN: 978-981-13-9683-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)