Abstract
Sentiment analysis aims to determine a writer's view of a product, event, government policy, service, topic, individual, and so on, from text they have written on social media platforms such as Twitter and Facebook. This study considers two datasets, STS-Gold and IMDb, drawn from different domains and containing texts of different lengths. The objective is to determine which classification algorithm performs better across two domains with different text lengths. We applied six machine learning algorithms (support vector machine, logistic regression, K-Nearest Neighbors, random forest, Naïve Bayes, and decision tree) and compared them on the basis of f-score, precision, recall, and accuracy. On the IMDb dataset, logistic regression performs best, with the highest accuracy of 96.3% and an f-score of 80.6%; Naïve Bayes is second, with an accuracy of 95.89% and an f-score of 80.05%. On the STS-Gold dataset, Naïve Bayes gives the highest accuracy of 81.08% and an f-score of 42.45%; logistic regression is second, with an accuracy of 80.09% and an f-score of 41.52%. Overall, logistic regression and Naïve Bayes outperform the other algorithms on both datasets.
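The four measures used for the comparison can be sketched as follows. This is a minimal illustration for a binary task, not the chapter's own code: the function name `binary_metrics`, the label encoding (1 for positive, 0 for negative), and the toy labels are all assumptions made for the example, and the abstract does not say which averaging scheme the reported f-scores use.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and f-score for one positive class.

    Hypothetical helper for illustration; ratios with a zero
    denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # f-score is the harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up labels (assumption): 2 positive and 8 negative examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

On these made-up labels the accuracy is higher than the f-score, which shows that the two measures can diverge when classes are unevenly represented. Whether that accounts for the gap between the accuracy and f-score values reported above cannot be determined from the abstract alone.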
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ahuja, R., Sharma, S.C. (2022). Sentiment Analysis on Different Domains Using Machine Learning Algorithms. In: Tiwari, S., Trivedi, M.C., Kolhe, M.L., Mishra, K., Singh, B.K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 318. Springer, Singapore. https://doi.org/10.1007/978-981-16-5689-7_13
DOI: https://doi.org/10.1007/978-981-16-5689-7_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5688-0
Online ISBN: 978-981-16-5689-7
eBook Packages: Intelligent Technologies and Robotics (R0)