Abstract
Because resources for studying opinions and feelings in Arabic dialects are scarce, identifying sentiment polarity on the Arabic web is a challenging task. In this paper we present MYC, a Moroccan YouTube Corpus of manually annotated comments, with the aim of facilitating sentiment analysis of the Moroccan dialect on the web. Comments are collected from the widely used website YouTube and manually annotated by several annotators. Using a voting approach, we created the largest Moroccan dialect subjectivity corpus, consisting of 20,000 comments labeled as positive or negative and including additional information (topic, likes and dislikes). This dataset could be a useful resource for building Moroccan dialect-specific NLP applications in the future. In the experiments, two well-known supervised learning classifiers, Support Vector Machines (SVM) and Naive Bayes (NB), were used to classify comments as positive or negative, each with its own set of parameters. Recall, precision, and F-measure are computed for each classifier. Both SVM and NB perform well in terms of precision. These results encourage us to extend the work with additional Moroccan comments from different videos in order to generalize our model.
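The abstract mentions two mechanical steps: merging several annotators' labels by voting, and scoring a classifier with precision, recall, and F-measure. The sketch below illustrates both in plain Python. The abstract does not give the exact voting rule or say how ties are handled, so the majority rule, the tie handling, and the names `majority_label` and `precision_recall_f1` are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None if there is no winner.

    Assumption: a simple majority rule; ties yield None so the comment
    can be discarded or sent back for re-annotation.
    """
    if not votes:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the two most frequent labels
    return counts[0][0]


def precision_recall_f1(gold, predicted, positive="positive"):
    """Compute precision, recall, and F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical votes from three annotators on one comment.
label = majority_label(["positive", "negative", "positive"])

# Hypothetical gold labels and classifier predictions for four comments.
gold = ["positive", "positive", "negative", "negative"]
pred = ["positive", "negative", "positive", "negative"]
scores = precision_recall_f1(gold, pred)
```

Returning `None` on a tie keeps ambiguous comments out of the labeled set instead of assigning them an arbitrary polarity. With an odd number of annotators and two labels, ties cannot occur.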
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jbel, M., Hafidi, I., Metrane, A. (2023). MYC: A Moroccan Corpus for Sentiment Analysis. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_6
DOI: https://doi.org/10.1007/978-3-031-29313-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics (R0)