A Study of Feature Extraction Techniques for Sentiment Analysis

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 814)

Abstract

Sentiment analysis is the systematic extraction of meaning from subjective text. When sentiment is analyzed with machine learning techniques, feature extraction is a significant step. We study the performance of two feature extraction techniques, TF-IDF (term frequency-inverse document frequency) and Doc2vec (document to vector), on the Cornell movie review datasets, the UCI sentiment labeled datasets, and the Stanford movie review datasets. The text is classified into positive and negative polarities after preprocessing steps such as stop-word elimination and tokenization, which improve the performance of sentiment analysis in terms of accuracy and the time taken by the classifier. The features obtained by applying the feature extraction techniques to the text sentences are used to train and test five classifiers: logistic regression, support vector machines, K-nearest neighbors, decision tree, and Bernoulli Naive Bayes.
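
To make the TF-IDF step concrete, the following is a minimal sketch of tokenization, stop-word removal, and TF-IDF weighting in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the stop-word list, the sample reviews, the function names, and the particular weighting (length-normalized term frequency times ln(N / df)) are all chosen here for the example, and the abstract does not say which TF-IDF variant was used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "this", "it"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return (vocabulary, vectors) using tf * idf with idf = ln(N / df).

    tf is the term count divided by the document length; this is one common
    variant and is an assumption, not necessarily the one used in the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Made-up sample reviews, used only to exercise the functions.
reviews = [
    "This movie was great and the acting was great",
    "The plot was boring",
    "A great plot",
]
vocab, vectors = tfidf(reviews)
```

Each row of `vectors` is a fixed-length feature vector aligned with `vocab`, which is the form a classifier such as logistic regression or a support vector machine would take as input. Terms that appear in every document get weight zero under this variant, which is why library implementations often smooth the idf term.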

Corresponding author

Correspondence to E. Sivasankar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Avinash, M., Sivasankar, E. (2019). A Study of Feature Extraction Techniques for Sentiment Analysis. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 814. Springer, Singapore. https://doi.org/10.1007/978-981-13-1501-5_41
