
Improving Mental Health Through Multimodal Emotion Detection from Speech and Text Data Using Long-Short Term Memory

  • Conference paper
Frontiers of ICT in Healthcare

Abstract

In today’s world of cut-throat competition, where everyone is running an invisible race, we often find ourselves alone amongst the crowd. Advances in technology are making our lives easier, yet humans, as social animals, are losing touch with society. As a result, a large part of the population today suffers from psychological disorders. Inferiority complex, the inability to fulfil one’s dreams, loneliness, and similar factors are common causes of disturbed mental stability, which may further lead to disorders such as depression. In extreme cases, depression costs precious lives when an individual decides to commit suicide. The primary focus of this work is assessing an individual’s mental health in an interactive way with the core help of machine learning. To realize this objective, we use a long short-term memory (LSTM) architecture, a recurrent neural network (RNN) from the field of deep learning, trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and FastText word vectors, achieving 86% accuracy when fed with model-patient conversational data. Further, we discuss the scope for enhancing cognitive control over psychiatric disorders, which may otherwise escalate to severe depression and suicide attempts. The proposed system helps determine the severity of a person’s depression and supports the recovery process. It comprises a wristband to measure biological parameters, a headband to analyse mental health, and a user-friendly website and mobile application with a built-in chatbot. The AI-based chatbot talks to patients and helps them reveal thoughts that they are otherwise unable to communicate to their peers. A person can chat via text message, which is stored in a database for further analysis. The novelty of this work lies in the sentiment analysis of voice chat, which creates a comfortable environment for the user.
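
The abstract describes the model only at a high level. The following minimal sketch, in Python with Keras and librosa, illustrates one plausible realization: an LSTM over MFCC features extracted from RAVDESS audio, fused with an LSTM over pre-computed FastText word vectors for the chat text. The layer sizes, sequence lengths, and late fusion by concatenation are illustrative assumptions, not the authors’ published architecture.

```python
# A minimal sketch (not the authors' released code) of the multimodal setup
# described in the abstract: an LSTM over MFCC features from RAVDESS audio,
# fused with an LSTM over FastText word vectors for the chat text.
# Shapes, layer sizes, and the fusion strategy are illustrative assumptions.

import numpy as np
import librosa                        # audio loading and feature extraction
from tensorflow.keras import layers, Model

N_MFCC, MAX_FRAMES = 40, 200          # assumed MFCC count and padded frame length
MAX_WORDS, EMB_DIM = 50, 300          # assumed text length; FastText vectors are 300-d
N_CLASSES = 8                         # RAVDESS defines 8 emotion categories

def audio_to_mfcc(path):
    """Load a RAVDESS .wav file and return a padded (MAX_FRAMES, N_MFCC) matrix."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T   # (frames, N_MFCC)
    out = np.zeros((MAX_FRAMES, N_MFCC), dtype=np.float32)
    out[:min(len(mfcc), MAX_FRAMES)] = mfcc[:MAX_FRAMES]
    return out

# Speech branch: LSTM over the MFCC sequence.
audio_in = layers.Input(shape=(MAX_FRAMES, N_MFCC), name="audio")
a = layers.LSTM(128)(audio_in)

# Text branch: LSTM over pre-looked-up FastText embeddings of the chat message.
text_in = layers.Input(shape=(MAX_WORDS, EMB_DIM), name="text")
t = layers.LSTM(128)(text_in)

# Late fusion by concatenation, then a softmax over the emotion classes.
x = layers.concatenate([a, t])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(N_CLASSES, activation="softmax")(x)

model = Model([audio_in, text_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Late fusion by concatenating the two branch outputs is one common design choice for combining speech and text; attention-based fusion over the two branches is a frequently used alternative.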

References

  1. Al Banna MH et al (2021) Attention-based bi-directional long-short term memory network for earthquake prediction. IEEE Access 9:56589–56603

    Article  Google Scholar 

  2. Anagnostopoulos CN, Iliou T, Giannoukos I (2015) Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artif Intell Rev 43(2):155–177

    Article  Google Scholar 

  3. Dini L, Bittar A (2016) Emotion analysis on Twitter: the hidden challenge. In: Proceedings of LREC’16, pp 3953–3958 (2016)

    Google Scholar 

  4. Fabietti M et al (2020) Artifact detection in chronically recorded local field potentials using long-short term memory neural network. In: Proceedings of AICT 2020, pp 1–6 (2020)

    Google Scholar 

  5. Ghosh T et al (2021) An attention-based mood controlling framework for social media users. In: Proceedings of brain informatics, pp 245–256 (2021)

    Google Scholar 

  6. Ghosh T et al (2021) A hybrid deep learning model to predict the impact of covid-19 on mental health form social media big data. Preprints (2021060654)

    Google Scholar 

  7. Humphrey EJ, Bello JP, LeCun Y (2012) Moving beyond feature design: deep architectures and automatic feature learning in music informatics. In: ISMIR, pp 403–408

    Google Scholar 

  8. Kahou SE et al (2016) Emonets: multimodal deep learning approaches for emotion recognition in video. J Multimodal User Interfaces 10(2):99–111

    Article  Google Scholar 

  9. Livingstone SR, Russo FA (2018) The ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in North American English. PloS One 13(5):e0196391

    Article  Google Scholar 

  10. Mikolov T, Grave E, Puhrsch C, Joulin A (2017) Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405, pp 1–4

  11. Mohammad SM, Bravo-Marquez F (2017) Emotion intensities in tweets. arXiv preprint arXiv:1708.03696, pp 1–13

  12. Poria S, Cambria E, Howard N, Huang GB, Hussain A (2016) Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 174:50–59

    Article  Google Scholar 

  13. Sreeja PS, Mahalakshmi G (2017) Emotion models: a review. Int J Control Theor Appl 10:651–657

    Google Scholar 

  14. Sailunaz K, Dhaliwal M, Rokne J, Alhajj R (2018) Emotion detection from text and speech: a survey. Soc Netw Anal Mining 8(1):1–26

    Google Scholar 

  15. Satu M et al (2020) Towards improved detection of cognitive performance using bidirectional multilayer long-short term memory neural network. In: Proceedings of brain informatics, pp 297–306

    Google Scholar 

  16. Satu MS et al (2021) Tclustvid: a novel machine learning classification model to investigate topics and sentiment in covid-19 tweets. Knowl-Based Syst 226:107126

    Article  Google Scholar 

  17. Semwal N, Kumar A, Narayanan S (2017) Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models. In: Proceedings of ISBA, pp 1–6

    Google Scholar 

Download references

Acknowledgements

MM is supported by the AI-TOP (2020-1-UK01-KA201-079167) and DIVERSASIA (618615-EPP-1-2020-1-UKEPPKA2-CBHEJP) projects funded by the European Commission under the Erasmus+ programme.

Author information

Corresponding author

Correspondence to Nilanjana Dutta Roy.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhagat, D., Ray, A., Sarda, A., Dutta Roy, N., Mahmud, M., De, D. (2023). Improving Mental Health Through Multimodal Emotion Detection from Speech and Text Data Using Long-Short Term Memory. In: Mandal, J.K., De, D. (eds) Frontiers of ICT in Healthcare. Lecture Notes in Networks and Systems, vol 519. Springer, Singapore. https://doi.org/10.1007/978-981-19-5191-6_2
