
Comparison of Machine Learning Models for Early Depression Detection from Users’ Posts

Chapter in: Early Detection of Mental Health Disorders by Social Media Monitoring

Part of the book series: Studies in Computational Intelligence (SCI, volume 1018)

Abstract

With around 300 million people worldwide suffering from depression, detecting this disorder is both crucial and challenging for individual and public health. As with many diseases, early detection enables better medical management; social media messages, as potential clues to depression, offer an opportunity to support this early detection automatically. This chapter is based on the participation of the CNRS IRIT laboratory in the eRisk task on early detection of depression at the CLEF evaluation forum. Early depression detection differs from depression detection in that it takes temporality into account: the system must decide whether a user may be depressed using as little data as possible. In this chapter, we re-evaluate the models we developed for our eRisk participations over the years on the different collections, in order to obtain a more robust comparison, and we also add new models. We use well-established classification methods such as logistic regression, random forest, and support vector machines (SVM). Each user, whom the system must classify as depressed or not, is represented by a vector composed of (a) various task-oriented features, including depression-related lexicons, and (b) word and document embeddings, all extracted from the user's posts. We perform an ablation study to identify the most important features for our models. For comparison, we also use the BERT deep learning architecture, for both depression detection and early depression detection. According to our results, well-established machine learning models still outperform more modern models for (early) depression detection.
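The abstract describes three steps: comparing classical classifiers on per-user feature vectors, an ablation over feature groups, and an early decision taken from as little data as possible. The following is a minimal scikit-learn sketch of those ideas under loudly stated assumptions, not the authors' pipeline: the data is synthetic (`make_classification`), the split into "lexicon" and "embedding" columns (`FEATURE_GROUPS`), the `class_weight="balanced"` setting, the F1 metric, the fixed probability threshold, and the helper names `compare_models`, `ablation` and `early_decision` are all hypothetical illustrations. scikit-learn is used only because the chapter's reference list includes it [33].

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-user feature vectors (NOT the eRisk data).
# weights=[0.9, 0.1] mimics a minority of depressed users.
X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

# Hypothetical split of the columns into two feature families.
FEATURE_GROUPS = {
    "lexicon_features": range(0, 10),
    "embedding_features": range(10, 30),
}

# The three well-established classifiers named in the abstract;
# class_weight="balanced" is an assumed way to handle class imbalance.
MODELS = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0),
    "svm": make_pipeline(
        StandardScaler(), SVC(kernel="linear", class_weight="balanced")),
}


def compare_models(X, y, models, cv=5):
    """Mean cross-validated F1 score (positive class) of each model."""
    return {name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
            for name, model in models.items()}


def ablation(X, y, model, groups, cv=5):
    """F1 lost when each feature group is removed (larger loss = more useful group)."""
    full = cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    losses = {}
    for name, cols in groups.items():
        dropped = set(cols)
        keep = [c for c in range(X.shape[1]) if c not in dropped]
        reduced = cross_val_score(model, X[:, keep], y, cv=cv, scoring="f1").mean()
        losses[name] = full - reduced
    return losses


def early_decision(chunk_probabilities, threshold=0.5):
    """Flag a user at the first chunk of posts whose predicted probability
    reaches the threshold; return (decision, number_of_chunks_read)."""
    for k, p in enumerate(chunk_probabilities, start=1):
        if p >= threshold:
            return 1, k
    return 0, len(chunk_probabilities)


scores = compare_models(X, y, MODELS)
for name, f1 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20s}  F1 = {f1:.3f}")

losses = ablation(X, y, MODELS["logistic_regression"], FEATURE_GROUPS)
print({name: round(loss, 3) for name, loss in losses.items()})

print(early_decision([0.2, 0.4, 0.7, 0.9], threshold=0.6))  # (1, 3)
```

The threshold rule illustrates the earliness trade-off only: a lower `threshold` flags users after fewer chunks of posts, at the price of more false alarms.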

Notes

  1. https://www.la-depression.org/, accessed January 28, 2021.

  2. http://www.doctissimo.fr/psychologie/news/la-france-pays-le-plus-touche-par-la-depression, accessed January 28, 2021.

  3. CES-D stands for Center for Epidemiologic Studies Depression scale, a questionnaire that can be used to detect depression [35].

  4. Reddit is a social news aggregation, web content rating, and discussion website (https://www.reddit.com).

  5. https://www.who.int/news-room/fact-sheets/detail/depression.

  6. http://en.wikipedia.org/wiki/List_of_antidepressants, accessed February 23, 2017.

  7. http://www.webmd.com/depression/guide/depression-medications-antidepressants, accessed January 10, 2018.

  8. http://empath.stanford.edu.

  9. https://github.com/google-research/bert, accessed February 2, 2021.

References

  1. Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2019). t-SS3: A text classifier with dynamic n-grams for early risk detection over text streams. arXiv:1911.06147.

  2. Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2019). A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications, 133, 182–197.

  3. Cacheda, F., Iglesias, D. F., Nóvoa, F. J., & Carneiro, V. (2018). Analysis and experiments on early detection of depression. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, September 10–14, 2018.

  4. Choudhury, M. D., Counts, S., Horvitz, E., & Hoff, A. (2014). Characterizing and predicting postpartum depression from shared Facebook data. In Computer Supported Cooperative Work, CSCW ’14, Baltimore, MD, USA, February 15–19, 2014 (pp. 626–638).

  5. Choudhury, M. D., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting depression via social media. In Proceedings of the Seventh International Conference on Weblogs and Social Media.

  6. Colombo, G. B., Burnap, P., Hodorog, A., & Scourfield, J. (2016). Analysing the connectivity and communication of suicidal users on Twitter. Computer Communications, 73, 291–300.

  7. Dai, A. M., Olah, C., & Le, Q. V. (2015). Document embedding with paragraph vectors. arXiv:1507.07998.

  8. Dalloux, C., Claveau, V., Cuggia, M., Bouzillé, G., & Grabar, N. (2020). Supervised learning for the ICD-10 coding of French clinical narratives. In Digital Personalized Health and Medicine—Proceedings of MIE 2020, Medical Informatics Europe, Geneva, Switzerland, April 28–May 1, 2020 (pp. 427–431).

  9. Deveaud, R., Mothe, J., Ullah, M. Z., & Nie, J.-Y. (2018). Learning to adaptively rank document retrieval system configurations. ACM Transactions on Information Systems (TOIS), 37(1), 1–41.

  10. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.

  11. Fast, E., Chen, B., & Bernstein, M. S. (2016). Empath: Understanding topic signals in large-scale text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7–12, 2016 (pp. 4647–4657).

  12. France, D. J., Shiavi, R. G., Silverman, S. E., Silverman, M. K., & Wilkes, D. M. (2000). Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Transactions on Biomedical Engineering, 47(7), 829–837.

  13. Funez, D. G., Errecalde, M. L., Villegas, M. P., Ucelay, M. J. G., & Cagnina, L. C. (2017). Temporal variation of terms as concept space for early risk prediction. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11–14, 2017.

  14. Funez, D. G., Ucelay, M. J. G., Villegas, M. P., Burdisso, S., Cagnina, L. C., Montes-y-Gómez, M., & Errecalde, M. (2018). UNSL’s participation at eRisk 2018 lab. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, September 10–14, 2018.

  15. Hoang, T. B. N., & Mothe, J. (2018). Predicting information diffusion on Twitter: Analysis of predictive features. Journal of Computational Science, 28, 257–264.

  16. Iarivony Faneva, R. (2020). Extraction et fouille de données textuelles: application à la détection de la dépression, de l’anorexie et de l’agressivité dans les réseaux sociaux [Extraction and mining of textual data: application to the detection of depression, anorexia and aggressiveness in social networks]. Ph.D. thesis, Université de Toulouse.

  17. Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3–7, 2017, Volume 2: Short Papers (pp. 427–431).

  18. King, G., & Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137–163.

  19. Kulkarni, A. B. K. (2018). Early detection of depression. Master’s thesis, University of Houston.

  20. Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In International Conference on Machine Learning, PMLR (pp. 1188–1196).

  21. Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2017). Feature selection: A data perspective. ACM Computing Surveys, 50(6).

  22. Li, Z., Xiong, Z., Zhang, Y., Liu, C., & Li, K. (2011). Fast text categorization using concise semantic analysis. Pattern Recognition Letters, 32(3), 441–448.

  23. Low, L. A., Maddage, N. C., Lech, M., Sheeber, L., & Allen, N. B. (2011). Detection of clinical depression in adolescents’ speech during family interactions. IEEE Transactions on Biomedical Engineering, 58(3), 574–586.

  24. Malam, I. A., Arziki, M., Bellazrak, M. N., Benamara, F., Kaidi, A. E., Es-Saghir, B., He, Z., Housni, M., Moriceau, V., Mothe, J., & Ramiandrisoa, F. (2017). IRIT at e-risk. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11–14, 2017.

  25. Marriott, T. C., & Buchanan, T. (2014). The true self online: Personality correlates of preference for self-expression online, and observer ratings of personality online and offline. Computers in Human Behavior, 32, 171–177.

  26. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 26). Curran Associates, Inc.

  27. Mohammad, S., & Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3).

  28. Mowery, D., Park, A., Conway, M., & Bryan, C. (2016). Towards automatically classifying depressive symptoms from Twitter data for population health. In Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (pp. 182–191).

  29. World Health Organization. (2017). Depression and other common mental disorders: Global health estimates. Geneva: WHO.

  30. Øverland, S., Woicik, W., Sikora, L., Whittaker, K., Heli, H., Skjelkvåle, F. S., Sivertsen, B., & Colman, I. (2020). Seasonality and symptoms of depression: A systematic review of the literature. Epidemiology and Psychiatric Sciences, 29.

  31. Ozdas, A., Shiavi, R. G., Silverman, S. E., Silverman, M. K., & Wilkes, D. M. (2004). Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Transactions on Biomedical Engineering, 51(9), 1530–1540.

  32. Paul, S., Jandhyala, S. K., & Basu, T. (2018). Early detection of signs of anorexia and depression over social media using effective machine learning frameworks. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, September 10–14, 2018.

  33. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.

  34. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 2227–2237).

  35. Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401.

  36. Ramiandrisoa, F., & Mothe, J. (2020). Early detection of depression and anorexia from social media: A machine learning approach. In Proceedings of the Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2020), Samatan, Gers, France, July 6–9, 2020.

  37. Ramiandrisoa, F., Mothe, J., Benamara, F., & Moriceau, V. (2018). IRIT at e-risk 2018. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, September 10–14, 2018.

  38. Resnik, P., Armstrong, W., Claudino, L. M. B., Nguyen, T., Nguyen, V., & Boyd-Graber, J. L. (2015). Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of CLPsych@NAACL-HLT.

  39. Rude, S., Gortner, E., & Pennebaker, J. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8), 1121–1133.

  40. Sadeque, F., Xu, D., & Bethard, S. (2017). UArizona at the CLEF eRisk 2017 pilot task: Linear and recurrent models for early depression detection. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11–14, 2017.

  41. Trotzek, M., Koitka, S., & Friedrich, C. M. (2017). Linguistic metadata augmented classifiers at the CLEF 2017 task for early detection of depression. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11–14, 2017.

  42. Trotzek, M., Koitka, S., & Friedrich, C. M. (2018). Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences. IEEE Transactions on Knowledge and Data Engineering, 32(3), 588–601.

  43. Trotzek, M., Koitka, S., & Friedrich, C. M. (2018). Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, September 10–14, 2018.

  44. Villegas, M. P., Funez, D. G., Ucelay, M. J. G., Cagnina, L. C., & Errecalde, M. L. (2017). LIDIC—UNSL’s participation at eRisk 2017: Pilot task on early detection of depression. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11–14, 2017.

  45. Wang, X., Zhang, C., Ji, Y., Sun, L., Wu, L., & Bao, Z. (2013). A depression detection model based on sentiment analysis in micro-blog social network. In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Gold Coast, QLD, Australia, April 14–17, 2013, Revised Selected Papers (pp. 201–213).

  46. Xue, Y., Li, Q., Jin, L., Feng, L., Clifton, D. A., & Clifford, G. D. (2014). Detecting adolescent psychological pressures from micro-blog. In Proceedings of the Health Information Science—Third International Conference, HIS 2014, Shenzhen, China, April 22–23, 2014 (pp. 83–94).

Acknowledgements

This work is partially supported by the PREVISION project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under GA No 833115 (https://cordis.europa.eu/project/id/833115). The paper reflects the authors’ view and the Commission is not responsible for any use that may be made of the information it contains.

Author information

Correspondence to Josiane Mothe.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mothe, J., Ramiandrisoa, F., Ullah, M.Z. (2022). Comparison of Machine Learning Models for Early Depression Detection from Users’ Posts. In: Crestani, F., Losada, D.E., Parapar, J. (eds) Early Detection of Mental Health Disorders by Social Media Monitoring. Studies in Computational Intelligence, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-04431-1_5
