Sarcasm Detection of Amazon Alexa Sample Set

Pandey, Avinash Chandra; Seth, Saksham Raj; Varshney, Mahima

doi:10.1007/978-981-13-2553-3_54


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 526)

Abstract

Sentiment analysis based on the positive and negative scores of words has been one of the most researched topics in data mining. Such analysis is most often applied to content available on social media, such as comments on Facebook and tweets on Twitter. Sarcasm can be understood as a form of irony expressed in a manner that evokes laughter and humor. It is a type of sentiment in which people express negative feelings using positive or intensified positive words. While speaking, people often use heavy tonal stress and gestural clues, such as rolling of the eyes or hand movements, to reveal sarcasm. In this paper, NLTK, a Python toolkit for extracting information from large text datasets, has been used. Sampled data from Amazon Alexa has been collected and processed using SentiWordNet 3.0 and TextBlob to remove noise and irrelevant data. Thereafter, the Gaussian naive Bayes algorithm, along with TextBlob, has been used to detect sarcasm in the dataset. The performance of the proposed method is compared with naïve Bayes, decision tree, and support vector machine classifiers. The experimental results indicate the effectiveness of the proposed method.

1 Introduction

In the present-day world, where people often express conflicting emotions, analyzing their sentiments is a tedious task. Sarcasm requires shared knowledge between the speaker and the listener [1]. Detecting sarcasm in text is difficult because gestural and tonal clues are missing. Many machine learning studies collect their datasets from social media text, especially tweets, to detect sarcasm [1]. We used the dataset provided by the Amazon Alexa sample set and attempted to design a machine learning algorithm that detects sarcasm in text. Naive Bayes, one-class SVM, and Gaussian kernel methods are a few of the algorithms commonly used for this task [2].

Semi-supervised sarcasm identification has been carried out on two different datasets: a collection of millions of tweets from Twitter and a collection of millions of product reviews from Amazon [3]. On Twitter, a common form of sarcasm is one in which a positive sentiment is contrasted with a negative situation. For example, many sarcastic tweets include a positive sentiment, such as “love” or “enjoy”, followed by an expression that describes an undesirable activity or state (e.g., “taking exams” or “being ignored”) [4].

Sarcasm changes the polarity of an apparently positive or negative statement into its opposite. Several authors have created corpora of sarcastic Twitter messages in which the sarcasm of each message was determined by its own author. These corpora serve as a reliable benchmark for comparing sarcastic expressions on Twitter. Many authors have also investigated the impact of lexical and pragmatic factors on identifying sarcastic statements. Because sarcastic statements are difficult to identify, the performance of machine learning techniques and human judges on this task has been compared. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques [5,6,7] perform very well [8]. Many computational approaches to sarcasm detection using lexical cues have also been proposed [9].

Many properties [10,11,12,13,14] have been explored for finding sarcasm in text, such as theories of sarcasm, syntactic properties [15], and lexical features [16, 17], among others [4, 18, 19]. A model's accuracy can be improved by identifying positive and negative words, which can be done using a bag-of-words representation. Using bag-of-words for feature extraction increases accuracy [12]. The experimental results show that the proposed method outperforms the existing methods. The rest of the paper is organized as follows: Sect. 2 describes the proposed method, Sect. 3 discusses the experimental results, and Sect. 4 concludes the paper.

2 Proposed Work

We performed several operations to build our model. The SentiWordNet 3.0 dictionary is preprocessed and transformed into a map of key-value pairs, where each key combines the POS tag and synset terms of a SentiWordNet entry and each value holds the mean of the positive and negative scores of the respective words. We used this map to calculate the sentiment of the provided textual data. The complete steps of the proposed method are shown in Fig. 1.
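As a rough illustration of the map described above, the sketch below parses lines in the tab-separated SentiWordNet 3.0 layout (POS, synset ID, positive score, negative score, synset terms, gloss) and averages the scores over all senses of a word. The function names `build_sentiment_map` and `score_tokens`, the `(pos_tag, word)` key, and the use of positive minus negative as the text score are assumptions made for this example and are not details given in the paper. The third sample row is invented for illustration.

```python
from collections import defaultdict


def build_sentiment_map(lines):
    """Map (pos_tag, word) -> (mean positive score, mean negative score).

    Assumes each data line is tab-separated as:
    POS, synset ID, PosScore, NegScore, SynsetTerms, Gloss.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # pos sum, neg sum, sense count
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blank lines
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue
        pos_tag, _synset_id, pos_score, neg_score, terms = fields[:5]
        for term in terms.split():
            word = term.rsplit("#", 1)[0]  # "able#1" -> "able"
            entry = totals[(pos_tag, word)]
            entry[0] += float(pos_score)
            entry[1] += float(neg_score)
            entry[2] += 1
    return {key: (p / n, q / n) for key, (p, q, n) in totals.items()}


def score_tokens(tagged_tokens, sentiment_map):
    """Sum (positive - negative) over tokens found in the map (an assumed scoring rule)."""
    score = 0.0
    for key in tagged_tokens:
        pos, neg = sentiment_map.get(key, (0.0, 0.0))
        score += pos - neg
    return score


# Illustrative rows only; the last one is made up to show averaging over senses.
sample_lines = [
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
    "a\t00001740\t0.125\t0\table#1\thaving the necessary means",
    "a\t00002098\t0\t0.75\tunable#1\tnot having the necessary means",
    "a\t00009999\t0.5\t0.25\table#2\thypothetical second sense",
]
sentiment_map = build_sentiment_map(sample_lines)
text_score = score_tokens([("a", "able"), ("a", "unable"), ("n", "robot")], sentiment_map)
```

Words missing from the map contribute nothing to the score here; a real pipeline would also need to convert tagger output to the single-letter POS codes used by SentiWordNet.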

TextBlob is a Python library that provides access to common text-processing operations for natural language processing. Polarity and subjectivity are the two main sentiment measures it reports. For the provided textual data, polarity and subjectivity are calculated with TextBlob objects to improve the sentiment score. These two methods were very useful and improved accuracy by up to 5–7%. Apart from these two methods, we implemented a vectorization method.

In this method, a vector was created to store the counts of nouns, adverbs, adjectives, and verbs in the provided textual data. It was implemented with POS tagging (part-of-speech tagging), which is available in the Python library NLTK (Natural Language Toolkit). NLTK handles textual data and simplifies the work of Python programmers. The POS tagger returns a list of the words in the provided text, each paired with its tag (noun, verb, adverb, adjective, etc.).
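The count vector can be sketched as follows. The function takes already tagged `(word, tag)` pairs, such as those returned by `nltk.pos_tag(nltk.word_tokenize(text))` with its default Penn Treebank tagset, so the example itself runs without NLTK. The name `pos_count_vector`, the ordering of the four counts, and the hand-tagged sample sentence are assumptions for illustration.

```python
# Penn Treebank tag prefixes for the four word classes named in the text.
TAG_PREFIXES = (("nouns", "NN"), ("adverbs", "RB"), ("adjectives", "JJ"), ("verbs", "VB"))


def pos_count_vector(tagged_words):
    """Return [nouns, adverbs, adjectives, verbs] counts for (word, tag) pairs."""
    counts = {name: 0 for name, _ in TAG_PREFIXES}
    for _word, tag in tagged_words:
        for name, prefix in TAG_PREFIXES:
            if tag.startswith(prefix):  # e.g. NNS, NNP, VBG, RBR, JJS
                counts[name] += 1
                break
    return [counts[name] for name, _ in TAG_PREFIXES]


# Hand-tagged stand-in for tagger output on "I absolutely love waiting in long lines".
tagged = [
    ("I", "PRP"), ("absolutely", "RB"), ("love", "VBP"), ("waiting", "VBG"),
    ("in", "IN"), ("long", "JJ"), ("lines", "NNS"),
]
vector = pos_count_vector(tagged)
```

Each review then contributes one such four-element row to the feature matrix.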

To improve accuracy by about a further 2–3%, we implemented a technique called capitalization, in which attention is given to words written in capital letters so that we can detect the words meant to be stressed when spoken. Plain text otherwise gives no indication of which words are stressed. This technique was very useful for judging the sense of the textual data.
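A minimal sketch of such a capitalization feature is given below. The paper does not state how capitalized words were counted, so the rule used here (fully upper-case tokens of at least two letters, which excludes the pronoun "I") and the name `capitalization_features` are assumptions.

```python
import re


def capitalization_features(text, min_len=2):
    """Count fully upper-case words as a proxy for spoken stress."""
    tokens = re.findall(r"[A-Za-z']+", text)
    caps = [t for t in tokens if len(t) >= min_len and t.isupper()]
    ratio = len(caps) / len(tokens) if tokens else 0.0
    return {"caps_count": len(caps), "caps_ratio": ratio, "caps_words": caps}


features = capitalization_features("Oh GREAT, another update. I LOVE it")
```

Either the raw count or the ratio could be appended to the feature row for a review; the ratio is less sensitive to review length.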

A matrix containing the features extracted with the above techniques was created for the whole dataset, and the final step was to apply the naive Bayes algorithm. Naive Bayes is commonly used as a baseline for text categorization. The classifier makes the naïve assumption that all features are independent of one another given the class. It is derived from Bayes' theorem, and its simplicity makes it a popular machine learning classifier. For a class C_k and a feature vector x, Bayes' theorem states:

$$ P\left( {C_{k} |x} \right) = \frac{{P\left( {x|C_{k} } \right)\, \cdot \,P\left( {C_{k} } \right)}}{P\left( x \right)} $$
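To make the rule concrete, the following is a from-scratch sketch of a Gaussian naive Bayes classifier: each feature is modeled per class by a normal distribution, and the class with the highest log prior plus summed log likelihoods is chosen (the denominator P(x) is the same for every class and can be dropped). This is not the authors' implementation, which the paper does not list; the class name `TinyGaussianNB`, the variance floor, and the toy feature rows (an assumed polarity score and an assumed capitalized-word count) are illustrative only.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes over lists of numeric feature rows."""

    def fit(self, X, y, var_floor=1e-9):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.params = {}
        for label, rows in rows_by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # Population variance plus a small floor to avoid division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / n + var_floor
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def log_posterior(self, x):
        """Unnormalized log P(C_k | x) = log P(C_k) + sum_i log P(x_i | C_k)."""
        scores = {}
        for label, (log_prior, means, variances) in self.params.items():
            log_likelihood = sum(
                -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                for xi, m, var in zip(x, means, variances)
            )
            scores[label] = log_prior + log_likelihood
        return scores

    def predict(self, x):
        scores = self.log_posterior(x)
        return max(scores, key=scores.get)


# Toy rows: [polarity score, capitalized-word count]; values are invented.
X_train = [[0.9, 3], [0.8, 2], [-0.5, 0], [-0.7, 1]]
y_train = ["sarcastic", "sarcastic", "plain", "plain"]
model = TinyGaussianNB().fit(X_train, y_train)
prediction = model.predict([0.85, 2])
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.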

Overall, after applying all of these techniques, an accuracy of 70.96% was obtained.

3 Experimental Results

The performance of the proposed method has been tested on the sarcasm dataset, and its accuracy is compared with that of naïve Bayes, decision tree, and SVM. Table 1 shows that the proposed method outperforms the existing methods. A histogram of the accuracy is plotted in Fig. 2, which also illustrates the effectiveness of the proposed method.

Table 1 Accuracy of the existing method and the proposed method

The histogram in Fig. 2 shows the variation in accuracy across three executions of the same model.
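The evaluation code is not listed in the paper; the conclusion states only that the data were split using cross-validation. One way such an accuracy figure could be computed is sketched below, assuming contiguous k-fold splits. The helper names, the majority-class baseline, and the toy labels are hypothetical, and a real experiment would normally shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X, y, k, make_predictor):
    """Mean accuracy over k folds; make_predictor(X_train, y_train) returns a predict function."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        predict = make_predictor([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = [predict(X[i]) for i in test_idx]
        fold_scores.append(accuracy([y[i] for i in test_idx], predictions))
    return sum(fold_scores) / len(fold_scores)


def majority_predictor(X_train, y_train):
    """Baseline that always predicts the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return lambda x: label


X_toy = [[0], [1], [2], [3], [4], [5]]
y_toy = ["plain", "plain", "plain", "plain", "plain", "sarcastic"]
mean_accuracy = cross_val_accuracy(X_toy, y_toy, 3, majority_predictor)
```

Repeating the procedure with different shuffles would produce the kind of run-to-run variation that a histogram over several executions displays.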

4 Conclusion

Automatic sarcasm detection is a formidable task. This paper offers a novel naïve Bayes method to detect sarcasm in the Amazon Alexa dataset [20]. The dataset is divided into training and test sets using cross-validation. The quality of the features extracted from the training set affects the performance of the technique. Therefore, SentiWordNet and TextBlob have been used to extract important features from the dataset, and the model is trained on those features. The test set is evaluated using the Gaussian naïve Bayes method and three baseline methods, namely naïve Bayes, decision tree, and support vector machine. The experimental results show that the proposed method outperforms the baseline methods.

Sarcasm is closely related to language- or culture-specific traits. Future approaches to identifying sarcasm in new languages can benefit from identifying such traits.

References

1. Bharti, S.K., Vachha, B., Pradhan, R.K., Babu, K.S., Jena, S.K.: Sarcastic sentiment detection in tweets streamed in real time: a big data approach. Digital Communications and Networks 2(3), 108–121 (2016)
2. Peng, C.-C., Lakis, M., Pan, J.W.: Detecting Sarcasm in Text: An Obvious Solution to a Trivial Problem (2015)
3. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 107–116. Association for Computational Linguistics (2010)
4. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., Huang, R.: Sarcasm as contrast between a positive sentiment and negative situation. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 704–714 (2013)
5. Pandey, A.C., Pal, R., Kulhari, A.: Unsupervised data classification using improved biogeography based optimization. Int. J. Syst. Assur. Eng. Manag. 1–9
6. Pandey, A.C., Rajpoot, D.S., Saraswat, M.: Data clustering using hybrid improved cuckoo search method. In: 2016 Ninth International Conference on Contemporary Computing (IC3), pp. 1–6. IEEE (2016)
7. Pal, R., Avinash Pandey, H.M., Saraswat, M.: BEECP: Biogeography optimization-based energy efficient clustering protocol for HWSNS. In: 2016 Ninth International Conference on Contemporary Computing (IC3), pp. 1–6. IEEE (2016)
8. González-Ibánez, R., Muresan, S., Wacholder, N.: Identifying sarcasm in Twitter: a closer look. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, vol. 2, pp. 581–586. Association for Computational Linguistics (2011)
9. Forslid, E., Wikén, N.: Automatic Irony-and Sarcasm Detection in Social Media (2015)
10. Bamman, D., Smith, N.A.: Contextualized sarcasm detection on Twitter. In: ICWSM, pp. 574–577 (2015)
11. Pandey, A.C., Rajpoot, D.S., Saraswat, M.: Twitter sentiment analysis using hybrid cuckoo search method. Inf. Process. Manag. 53(4), 764–779 (2017)
12. Wicana, S.G., İbisoglu, T.Y., Yavanoglu, U.: A Review on sarcasm detection from machine-learning perspective. In: 2017 IEEE 11th International Conference on Semantic Computing (ICSC), pp. 469–476. IEEE (2017)
13. Dave, A.D., Desai, N.P.: A comprehensive study of classification techniques for sarcasm detection on textual data. In: International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 1985–1991. IEEE (2016)
14. Rajadesingan, A., Zafarani, R., Liu, H.: Sarcasm detection on Twitter: a behavioral modeling approach. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 97–106. ACM (2015)
15. Mishra, A., Kanojia, D., Nagar, S., Dey, K., Bhattacharyya, P.: Harnessing Cognitive Features for Sarcasm Detection (2017). arXiv:1701.05574
16. Sharada, A., Krishna, P.P.: Sentiment Mining: an approach for Hindi reviews. Algorithms (2017)
17. Forslid, E., Wikén, N.: Automatic Irony-and Sarcasm Detection in Social Media (2015)
18. Ratcliffe, C., Griffith, J.: A Machine Learning Approach to Automatic Sarcasm Detection. National University of Ireland, Galway
19. Joshi, A., Kanojia, D., Bhattacharyya, P., Carman, M.J.: Sarcasm Suite: a browser-based engine for sarcasm detection and generation. In: AAAI, pp. 5095–5096 (2017)
20. Amazon Alexa dataset, http://curtis.ml.cmu.edu/w/courses/index.php/Amazon_Dataset_for_Sarcasm

Author information

Authors and Affiliations

Jaypee Institute of Information Technology, Noida, India
Avinash Chandra Pandey, Saksham Raj Seth & Mahima Varshney

Corresponding author

Correspondence to Saksham Raj Seth.

Editor information

Editors and Affiliations

University of Nevada, Reno, Reno, NV, USA
Banmali S. Rawat
Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior, Gwalior, Madhya Pradesh, India
Aditya Trivedi
Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Sanjeev Manhas
Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
Vikram Karwal

About this paper

Cite this paper

Pandey, A.C., Seth, S.R., Varshney, M. (2019). Sarcasm Detection of Amazon Alexa Sample Set. In: Rawat, B., Trivedi, A., Manhas, S., Karwal, V. (eds) Advances in Signal Processing and Communication. Lecture Notes in Electrical Engineering, vol 526. Springer, Singapore. https://doi.org/10.1007/978-981-13-2553-3_54

DOI: https://doi.org/10.1007/978-981-13-2553-3_54
Published: 20 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2552-6
Online ISBN: 978-981-13-2553-3
eBook Packages: Engineering, Engineering (R0)
