
On the Sensitivity of LSTMs to Hyperparameters and Word Embeddings in the Context of Sentiment Analysis

  • Conference paper
  • In: Proceedings of the 5th International Conference on Big Data and Internet of Things (BDIoT 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 489)


Abstract

Recurrent neural networks still deliver excellent results in sentiment analysis tasks, and variants such as LSTM and Bidirectional LSTM have become a reference for building fast and accurate predictive models. However, such performance is difficult to obtain in practice because of the complexity of these models and the number of hyperparameters to choose. LSTM-based models can easily overfit the studied domain, and tuning the hyperparameters to obtain the desired model is the keystone of the training process. In this work, we study the sensitivity of a selection of LSTM-based models to various hyperparameters, and we highlight important aspects to consider when using similar models in the context of sentiment analysis.
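
As a concrete illustration of the kind of experiment the abstract describes, the minimal Keras sketch below (Keras is among the tools footnoted later in the paper) retrains the same Bidirectional LSTM sentiment classifier across a small hyperparameter grid and compares validation accuracy. This is not the authors' implementation: the architecture, the grid values (units, dropout rate, optimizer), the dummy data, and the three-class label set are all illustrative assumptions.

```python
import itertools

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 50          # assumed maximum tweet length, in tokens
EMBED_DIM = 100       # assumed embedding dimension
NUM_CLASSES = 3       # assumed label set: positive / negative / neutral

def build_model(units: int, dropout: float) -> Sequential:
    # Bidirectional LSTM classifier; `units` and `dropout` are the knobs under study.
    return Sequential([
        Embedding(VOCAB_SIZE, EMBED_DIM),
        Bidirectional(LSTM(units)),
        Dropout(dropout),
        Dense(NUM_CLASSES, activation="softmax"),
    ])

# Random integer sequences stand in for tokenized, padded tweets.
x = np.random.randint(1, VOCAB_SIZE, size=(512, MAX_LEN))
y = np.random.randint(0, NUM_CLASSES, size=(512,))

# Retrain the same architecture over a small grid and compare validation
# accuracy; the grid values below are illustrative, not the paper's.
for units, dropout, opt in itertools.product([32, 64], [0.2, 0.5], ["adam", "sgd"]):
    model = build_model(units, dropout)
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(x, y, validation_split=0.2, epochs=2, verbose=0)
    print(f"units={units} dropout={dropout} optimizer={opt} "
          f"val_acc={max(hist.history['val_accuracy']):.3f}")
```

A grid like this makes the sensitivity question concrete: data and architecture are held fixed, so any spread in validation accuracy between runs is attributable to the hyperparameters alone.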


Notes

  1. https://www.internetworldstats.com/stats.htm
  2. https://www.statista.com
  3. https://twitter.com
  4. https://www.mturk.com
  5. https://www.nltk.org
  6. https://pypi.org/project/tweet-preprocessor
  7. https://nlp.stanford.edu/projects/glove
  8. https://keras.io
  9. https://numpy.org
  10. https://pandas.pydata.org
  11. https://colab.research.google.com
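
Several of the footnoted tools (tweet-preprocessor, GloVe, Keras, NumPy) fit together into a standard embedding pipeline for tweet classification. The sketch below illustrates that pipeline under stated assumptions rather than reproducing the authors' code: the GloVe file name, the sample tweet, and the frozen-embedding choice are all assumptions.

```python
import numpy as np
import preprocessor as p                      # PyPI package: tweet-preprocessor
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer

# 1) Clean raw tweets (URLs, mentions, emojis, ...) before tokenization.
raw_tweets = ["Loving the new #phone! http://example.com @someone"]
tweets = [p.clean(t) for t in raw_tweets]

# 2) Fit a tokenizer to build the vocabulary.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(tweets)
word_index = tokenizer.word_index

# 3) Load pretrained GloVe vectors; the Twitter 100-d file is an assumed choice.
EMBED_DIM = 100
glove = {}
with open("glove.twitter.27B.100d.txt", encoding="utf-8") as f:
    for line in f:
        token, *values = line.rstrip().split(" ")
        glove[token] = np.asarray(values, dtype="float32")

# 4) Map each vocabulary word to its GloVe vector; out-of-vocabulary words
#    keep an all-zero row.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    vec = glove.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

# 5) Freezing the embeddings (trainable=False) versus fine-tuning them is
#    itself a modeling choice whose effect on accuracy can be measured.
embedding_layer = Embedding(len(word_index) + 1, EMBED_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            trainable=False)
```

The resulting layer can replace the randomly initialized Embedding layer in a model like the one sketched after the abstract, which is one way to compare pretrained against learned embeddings.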


Author information

Corresponding author

Correspondence to Bousselham El Haddaoui.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

El Haddaoui, B., Chiheb, R., Faizi, R., El Afia, A. (2022). On the Sensitivity of LSTMs to Hyperparameters and Word Embeddings in the Context of Sentiment Analysis. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-07969-6_40
