Abstract
This paper presents a grammatically motivated sentiment classification model applied to a morphologically rich language: Urdu. The morphological complexity and flexible grammatical rules of this language require an improved or altogether different approach. We emphasize the identification of SentiUnits rather than the subjective words in the given text. SentiUnits are sentiment-carrying expressions that reveal the inherent sentiment of a sentence toward a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through shallow-parsing-based chunking. A dependency parsing algorithm then creates associations between these extracted expressions. For our system, we develop a sentiment-annotated lexicon of Urdu words. Each entry of the lexicon is marked with its orientation (positive or negative) and its intensity (force of orientation) score. To evaluate the system, we collect two corpora of reviews, from the domains of movies and electronic appliances. The experimental results show that we achieve state-of-the-art performance in sentiment analysis of Urdu text.
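As a rough illustration of the lexicon and target association described above, the following minimal sketch stores an orientation and an intensity for each lexicon entry and labels each target from the SentiUnits attached to it. Everything in it is an assumption made for illustration: the names `LexiconEntry`, `LEXICON`, `senti_unit_score`, and `classify_targets`, the romanized sample words and their scores, the intensity range, and the rule of summing orientation times intensity. The abstract does not specify how the system combines scores, and the chunking and dependency parsing steps are not modeled here; the (target, SentiUnit) pairs are simply given as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of orientation; range [0, 1] is assumed


# Hypothetical entries: words and scores are illustrative only,
# not taken from the lexicon described in the paper.
LEXICON = {
    "acha": LexiconEntry(+1, 0.5),
    "bura": LexiconEntry(-1, 0.5),
    "behtareen": LexiconEntry(+1, 0.9),
}


def senti_unit_score(tokens, lexicon=LEXICON):
    """Sum orientation * intensity over the tokens found in the lexicon.

    Summation is an assumed combination rule, not the paper's method.
    """
    return sum(
        lexicon[t].orientation * lexicon[t].intensity
        for t in tokens
        if t in lexicon
    )


def classify_targets(pairs, lexicon=LEXICON):
    """Label each target from its associated SentiUnits.

    pairs: iterable of (target, senti_unit_tokens) tuples, assumed to come
    from an upstream chunking and association step that is not shown.
    """
    totals = {}
    for target, tokens in pairs:
        totals[target] = totals.get(target, 0.0) + senti_unit_score(tokens, lexicon)
    return {
        target: "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for target, score in totals.items()
    }
```

Keeping the score per target, rather than per sentence, mirrors the idea that a SentiUnit expresses sentiment about a specific noun phrase, so one review can be positive about one target and negative about another.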
Cite this article
Syed, A.Z., Aslam, M. & Martinez-Enriquez, A.M. Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text. Artif Intell Rev 41, 535–561 (2014). https://doi.org/10.1007/s10462-012-9322-6
DOI: https://doi.org/10.1007/s10462-012-9322-6