Gibbs Sampled Hierarchical Dirichlet Mixture Model Based Approach for Clustering Scientific Articles

  • Conference paper
Smart Computing Paradigms: New Progresses and Challenges

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 766)

Abstract

Biomedical research is progressing rapidly, and with it comes the need to identify which sub-disciplines of biomedicine attract the most interest. This amounts to identifying the popular research subfields that are growing fastest within biomedical research. Applying topic models to research articles leads to better clustering algorithms, which in turn reveal insights into the underlying research problem. This paper proposes a new clustering algorithm that fuses GSDMM (Gibbs sampling for the Dirichlet multinomial mixture model) with HDP (the hierarchical Dirichlet process). The resulting clusters of scientific articles are compared with those produced by K-means clustering.

Corresponding author

Correspondence to G. MuthuSelvi.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahalakshmi, G.S., MuthuSelvi, G., Sendhilkumar, S. (2020). Gibbs Sampled Hierarchical Dirichlet Mixture Model Based Approach for Clustering Scientific Articles. In: Elçi, A., Sa, P., Modi, C., Olague, G., Sahoo, M., Bakshi, S. (eds) Smart Computing Paradigms: New Progresses and Challenges. Advances in Intelligent Systems and Computing, vol 766. Springer, Singapore. https://doi.org/10.1007/978-981-13-9683-0_18
