Aggregation of Semantically Similar News Articles with the Help of Embedding Techniques and Unsupervised Machine Learning Algorithms: A Machine Learning Application with Semantic Technologies

Tarbani, Nitesh; Wadhva, Kanchan

doi:10.1007/978-3-030-99079-4_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 1038)

Abstract

Business news supports the everyday decisions of leaders and entrepreneurs: setting corporate strategy, making marketing decisions, planning operations, and investing in human capital. It gives them an overview of what is happening in the corporate world, including mergers and takeovers, and keeps interested readers informed. Staying up to date on corporate business has therefore become essential. However, there are many news websites, and the same news article is often published on each of them under a slightly changed title. As a consequence, people spend more time searching for information than they have available to catch up on the news. It would help such readers if semantically similar news articles from different websites were grouped into clusters, so that reading one article from each cluster is sufficient. This chapter explains a few approaches to aggregating similar news articles. The first step is to collect the data: for model development, data is collected from sites such as Kaggle and UCI; once the model is developed, real-time data can be collected using the news API. The second step is to preprocess the collected data, which involves subtasks such as tokenization, stop-word removal, stemming or lemmatization, and case transformation. The third step is to embed the text as vectors, using embedding techniques such as Bag-of-Words, TF-IDF, and Word2Vec. The fourth step is to cluster these embedded vectors, using unsupervised learning algorithms such as K-means and agglomerative clustering. The final step is to compare the various combinations of embedding techniques and clustering algorithms.
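The preprocessing, embedding, and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, pairs one embedding (TF-IDF) with one clustering algorithm (average-link agglomerative clustering over cosine similarity), and skips stemming and lemmatization. The headlines, the tiny stop-word list, the function names, and the similarity threshold of 0.2 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "for", "on", "and", "is", "by"}


def preprocess(text):
    """Case transformation, tokenization, and stop-word removal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [preprocess(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a small weight.
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, threshold=0.2):
    """Average-link agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops when the best
    pair has a mean pairwise cosine similarity below `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented headlines: two stories, each reported under different titles.
headlines = [
    "Acme Corp to acquire Globex in $2 billion deal",
    "Globex acquired by Acme Corp for $2 billion",
    "Central bank raises interest rates by 25 basis points",
    "Interest rates raised again by the central bank",
    "Acme Corp agrees $2 billion takeover of Globex",
]

clusters = agglomerative(tfidf_vectors(headlines))
for cluster in clusters:
    print([headlines[i] for i in sorted(cluster)])
```

A similarity threshold is used here instead of a fixed number of clusters because the number of distinct stories in a news stream is usually not known in advance. K-means, which the abstract also mentions, would instead require choosing the number of clusters up front.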

Author information

Authors and Affiliations

Sipna COET, Amravati, India
Nitesh Tarbani
Great Learning, Hyderabad, India
Kanchan Wadhva

Corresponding author

Correspondence to Nitesh Tarbani.

Editor information

Editors and Affiliations

Faculty of Computers and Information, Minia University, Minia, Egypt
Essam Halim Houssein
Faculty of Computer Science & Engineering, Galala University, Suez, Egypt
Mohamed Abd Elaziz
Department of Computer Sciences, University of Guadalajara, Guadalajara, Jalisco, Mexico
Diego Oliva
Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan
Laith Abualigah

About this chapter

Cite this chapter

Tarbani, N., Wadhva, K. (2022). Aggregation of Semantically Similar News Articles with the Help of Embedding Techniques and Unsupervised Machine Learning Algorithms: A Machine Learning Application with Semantic Technologies. In: Houssein, E.H., Abd Elaziz, M., Oliva, D., Abualigah, L. (eds) Integrating Meta-Heuristics and Machine Learning for Real-World Optimization Problems. Studies in Computational Intelligence, vol 1038. Springer, Cham. https://doi.org/10.1007/978-3-030-99079-4_5

DOI: https://doi.org/10.1007/978-3-030-99079-4_5
Published: 05 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99078-7
Online ISBN: 978-3-030-99079-4
eBook Packages: Intelligent Technologies and Robotics (R0)
