Abstract
Sentiment analysis is the study of systematically extracting meaning from subjective text. When sentiment is analyzed with machine learning techniques, feature extraction becomes a significant step. We study the performance of two feature extraction techniques, TF-IDF (term frequency-inverse document frequency) and Doc2vec (document to vector), on the Cornell movie review datasets, the UCI sentiment labeled datasets, and the Stanford movie review datasets. The text is classified into positive and negative polarities after preprocessing steps such as stop-word elimination and tokenization, which improve the performance of sentiment analysis in terms of accuracy and the time taken by the classifier. The extracted features are used to train and test the following classifiers: logistic regression, support vector machines, K-nearest neighbors, decision tree, and Bernoulli Naive Bayes.
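To make the preprocessing and TF-IDF steps concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a tiny hand-written stop-word list, a simple regular-expression tokenizer, and the unsmoothed textbook weighting tf × log(N/df); the function names `tokenize` and `tfidf` and the two sample reviews are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "and", "of", "it"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against documents that are all stop words
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical two-review corpus, one positive and one negative.
reviews = [
    "This movie was a great and moving story",
    "This movie was a dull and boring story",
]
vocab, vectors = tfidf(reviews)
```

In this toy corpus, words shared by both reviews ("movie", "story") receive a weight of zero, while polarity-bearing words ("great", "dull") keep a positive weight, which is the property a downstream classifier can exploit. Library implementations typically add idf smoothing and vector normalization, so their values will differ from this sketch.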
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Avinash, M., Sivasankar, E. (2019). A Study of Feature Extraction Techniques for Sentiment Analysis. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 814. Springer, Singapore. https://doi.org/10.1007/978-981-13-1501-5_41
DOI: https://doi.org/10.1007/978-981-13-1501-5_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1500-8
Online ISBN: 978-981-13-1501-5
eBook Packages: Intelligent Technologies and Robotics (R0)