Abstract
Semantic Relatedness (SR) defines a relation between linguistic items, which may be words, phrases, or documents. It has many applications, such as information extraction, word sense disambiguation, text summarization, and text clustering. Quantifying SR manually is fairly natural and axiomatic, whereas quantifying it automatically is complex, because humans draw on background experience and external domain concepts that are not available to computational methods. This paper focuses on Semantic Relatedness in Short Texts (SRST). A multi-corpus-based Vector Space Model is proposed to measure SRST. Word synonyms and anaphoric information are used to improve the semantic representation of each document. Since the set of verses in the Holy Quran is a precious sample of short texts, it is used as the main case study in this paper, where the degree of relatedness between verses is measured. Experiments are conducted, and their results show the efficiency of the proposed model in improving SR measurement: recall improves to 60%, compared with 11.3% for the best previous studies.
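As a rough illustration of the general idea, a vector space comparison of two short texts with synonym folding might be sketched as follows. This is a minimal sketch under stated assumptions, not the model proposed in the paper: the synonym map `SYNONYMS`, the whitespace tokenizer, the raw term-frequency weighting, and the function names `to_vector`, `cosine`, and `relatedness` are all hypothetical, and the multi-corpus and anaphora-resolution steps are not represented.

```python
import math
from collections import Counter

# Hypothetical synonym map (illustrative only, not taken from the paper):
# each word is mapped to a single canonical representative.
SYNONYMS = {"merciful": "mercy", "compassionate": "mercy", "lord": "god"}


def to_vector(text, synonyms=SYNONYMS):
    """Build a term-frequency vector, folding synonyms into one term."""
    tokens = text.lower().split()
    return Counter(synonyms.get(tok, tok) for tok in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def relatedness(a, b, synonyms=SYNONYMS):
    """Score two short texts in [0, 1] using the synonym-folded vectors."""
    return cosine(to_vector(a, synonyms), to_vector(b, synonyms))


with_synonyms = relatedness("the merciful lord", "the compassionate god")
without_synonyms = relatedness("the merciful lord", "the compassionate god", {})
```

In this toy case the two texts share only one surface word, so the score without synonym folding is low, while folding synonyms makes the two vectors identical. This is the kind of gain in semantic representation that synonym information is meant to provide for short texts.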
Cite this article
El-Deeb, R., Al-Zoghby, A.M. & Elmougy, S. Multi-corpus-Based Model for Measuring the Semantic Relatedness in Short Texts (SRST). Arab J Sci Eng 43, 7933–7943 (2018). https://doi.org/10.1007/s13369-018-3232-0
DOI: https://doi.org/10.1007/s13369-018-3232-0