1 Introduction

Young people, particularly university students, have attracted the attention of researchers in the field of mental health [13]. This is due to the frequency with which university students present difficulties that affect them emotionally and prevent them from adapting properly to their community [17]. This phenomenon has led to several mental health problems, particularly anxiety and depression disorders, which have become a serious health and economic problem (accounting for over a trillion dollars each year) [19] and tend to interrupt the proper development of the student [17]. Depression in particular, a precursor to suicide, has a high cure rate if detected and diagnosed early [4]. Therefore, it is very important for institutions to develop collaborative work environments so that the emergence of these issues can be detected and avoided in a timely manner [10]. Even though institutions nowadays rely on management systems to accelerate student learning, they tend to omit communications technologies such as social networks and microblogging, where students participate in real-time collaborative interactions that more genuinely represent the students’ thinking and shed light on the mental health issues that students may present [1].

Due to the constant evolution of computer applications, several of them have been used for mental health purposes [7], including social media [26], self-report questionnaires [6], biosignals [8, 27], and distributed environments. Distributed environments have gained particular importance in information technologies [3], particularly those that have evolved to become more intuitive and user-friendly, allowing collaborations at very accessible costs [1]. With this in mind, educational institutions have tried to create better learning management systems to reduce the gap which still exists in terms of these types of platforms for students [21]. Even though there have been multiple efforts in this direction, institutions have yet to harmonize three important factors: education, support, and interaction between members of the educational communities [1].

One particular option to strengthen these factors in educational communities is microblogging [5]: a form of communication in which users describe their status in brief posts [22] (usually with a character limit), which can then be shared through social networks, cell phones, Web platforms, etc. Microblogging has been increasingly studied as a recent phenomenon for which Twitter® is largely responsible. It is estimated that in 2009 there were more than 32 million people on this platform [9] and that in 2008 11% of the US population had posted on a microblogging site. At the time, microblogging was used mainly for sharing information, keeping up with topics of interest, and communicating directly with others [9]; today, users share their activities or search for information [14], including during emergency situations [30].

In recent years, microblogging data has emerged as a valuable resource for detecting mental health symptoms, particularly depression [31]. Numerous studies have explored the detection of depression symptoms using text analysis techniques [18] and machine learning algorithms, such as Support Vector Machines (SVMs) [16] and multi-kernel SVMs [25]. These studies have primarily focused on English and Chinese languages, delving into various aspects of depression and related tendencies, including rumination [24]. However, one notable gap in the literature is the scarcity of studies conducted in the Spanish language. This limitation hinders our understanding of mental health symptoms in Spanish-speaking populations. To address this gap, there is a need for the development of an ad-hoc microblogging tool specifically designed for Spanish users. Such a tool would enable comprehensive research and analysis in the context of Spanish language and culture, ultimately contributing to a more inclusive and diverse understanding of mental health.

In this work, we present microblogging as a dual-purpose mechanism: first, as a means of interaction between students, teachers, entrepreneurs, etc., that gives users the freedom to share experiences, opinions, and concerns; second, as a means of detecting patterns that denote mental health problems in students through automated natural language processing (NLP) applications, so that students can be referred for attention if required.

2 Materials and Methods

2.1 Software Development

For the development of the platform, the Scrum methodology was used [12]. Scrum is an agile development methodology which has been reported to be more productive and effective for these types of purpose than other methodologies [23]; it is based on closing short development cycles known as “Sprints” [23].

The elements of Scrum are described as follows [28]:

  • Roles. The specific responsibilities and accountabilities assigned to the individuals or groups involved: the Product Owner, the Scrum Master, and the development team. They ensure that the philosophy of the methodology is followed as closely as possible:

    • Product Owner (owner of the product).

    • Scrum Master (facilitator of collaboration, process adherence, and impediment removal).

    • Team (development team).

  • Events. Events aim to minimize undefined and improvised meetings and to establish a series of instances that allow better communication and collaboration in the team [28]. This reduces time spent on lengthy meetings and restrictive predictive processes. They also have a “TimeBox” with a fixed duration; an event ends when its purpose has been achieved. The events commonly used in Scrum are [28]:

    • Sprint.

    • Sprint Planning.

    • Daily Scrum.

    • Sprint Review.

    • Sprint Retrospective.

  • Artifacts. Artifacts provide transparency and the ability to inspect and adapt processes so that all team members understand what is being accomplished [28]:

    • Product Backlog.

    • Sprint Backlog.

    • Increment.

A schematic diagram of the Scrum methodology is shown in Fig. 1. First, the Product Owner prioritizes the Product Backlog, which is a list of desired features and requirements. Then, the Scrum Team conducts Sprint Planning to determine which items from the Product Backlog will be included in the Sprint and creates the Sprint Backlog. During the Sprint, the Scrum Team works on developing the product increment. At the end of the Sprint, a Sprint Review is held to demonstrate the product increment to stakeholders and gather feedback. The final stage of the cycle is the Sprint Retrospective, where the Scrum Team reflects on the Sprint and identifies areas for improvement. The Sprint ends after the Sprint Retrospective meeting.

Fig. 1. The Scrum iterative cycle, showing the different elements and operations: Product Backlog, Sprint Planning, Sprint Backlog, Daily Scrum, Increment, Sprint Review, and finally, Sprint Retrospective. Adapted from [22, 28].

2.2 Platform Description

The microblogging web platform supports the management of conditions that affect students’ mood. It fosters interaction between students through communities that address the topics of greatest current interest or usefulness to them, with the objective of detecting states of mind that could affect their emotional well-being, both in everyday life and at school. Users can communicate through the platform and interact with students of higher and lower semesters to provide support for any doubt or concern that may arise, which indirectly promotes teamwork based on the support given and received by the students who make up the communities. Likewise, there are communities dedicated to leisure, so that students are free to look beyond the academic field. The platform also provides a tool for the a posteriori analysis of data to enhance the early detection of psychological disorders such as anxiety and depression, among others. This detection is proposed through the analysis, using NLP techniques, of text collected directly from student publications, so that symptoms can be detected in a timely manner and appropriate measures taken for treatment.

The technologies used for the development of the platform were the following:

  • Python: Used for backend development and integrated with an Object Relational Mapping (ORM) system.

  • Vue.js: Employed for frontend development, enabling the creation of interactive user interfaces.

  • PostgreSQL: Utilized as the database management system to handle relational databases.

  • Django: Employed for building application programming interfaces (APIs) that facilitate communication between the frontend and backend components of the platform.

2.3 Platform Architecture

The Vue.js framework, as mentioned above, was used for FrontEnd, which together with Buefy (a component library for user interfaces) allowed the platform to be implemented in a responsive, pleasant, and intuitive way for students. Figure 2 presents the architecture of the platform and also shows how the different technologies that were used relate to each other.

The Web platform interacts with the database, which provides the storage needed for the subsequent analysis of the content of the publications. This interaction is performed through various APIs built with the Django framework, which define the models adapted to the needs of the platform, the logic for creating, updating, listing, and deleting data, and the URLs needed to consume the services. The database is hosted on Heroku, a cloud platform with high accessibility and low cost. Unlike other platforms, Heroku supports a large number of programming languages [11]. Finally, PostgreSQL was used as the database manager. This system uses relational SQL databases and integrated very well with the Django framework, which was an advantage when implementing the platform and running it in production.

Fig. 2. The services and technologies used in the platform. Those for FrontEnd (left), Backend APIs (center), and database (right).

2.4 Platform Functionalities

The main functionality of the platform is the communities module. For a student to be able to post and see what other people have already posted, they need to join the community. Figure 3a shows the interface where the student is prompted to join a community, in this case, the “Sports” community. Once the student has joined the community, they can post publications and see the publications that have been made in that community (Fig. 3b).

Fig. 3. Interface of the “Sports” community, seen by a student who (a) has not yet joined it, and who (b) has already done so.

For a publication to be displayed correctly within a community, the student must first enter its content in the text input provided by each community they have joined and assign it the label that best suits its content.
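The posting rules described above (join first, then publish content with a label) can be illustrated with a minimal Python sketch. It is not the platform's actual Django code: the class names, the `publish` and `visible_posts` methods, and the fixed label catalog are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    content: str
    label: str


@dataclass
class Community:
    name: str
    labels: set[str]  # hypothetical catalog of labels a post may carry
    members: set[str] = field(default_factory=set)
    posts: list[Post] = field(default_factory=list)

    def join(self, student: str) -> None:
        self.members.add(student)

    def publish(self, student: str, content: str, label: str) -> Post:
        # A student must have joined the community before posting.
        if student not in self.members:
            raise PermissionError(f"{student} must join '{self.name}' before posting")
        # A publication needs both content and a label.
        if not content.strip():
            raise ValueError("a publication needs non-empty content")
        if label not in self.labels:
            raise ValueError(f"unknown label: {label!r}")
        post = Post(student, content.strip(), label)
        self.posts.append(post)
        return post

    def visible_posts(self, student: str) -> list[Post]:
        # Only members can see what has been posted in the community.
        return list(self.posts) if student in self.members else []
```

Keeping the membership check inside `publish` means the rule holds regardless of which interface calls it; in a web framework the same check would more likely live in a permission class or view.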

There are several ways in which a user can interact with a given post: rating, reporting, and commenting. Rating is done when the student wants to award a prize (in experience points) to a post by another student in the community. For this, the student must select the experience points to grant and register them on the platform (Fig. 4a). The experience points will be reflected in the profile of the user who received them. Each student can send a maximum of 5,000 experience points per day (modifiable). Reporting is available when a user detects a misplaced post within a community; the platform gives the user the option to report it (Fig. 4b). This is important to ensure that the content on the platform is appropriate. Finally, a student can comment on an existing post by pressing the Discuss button, writing the comment, and submitting it (Fig. 4c).

Fig. 4. Possible interactions with a post, such as (a) awarding experience points, (b) reporting a post for removal, or (c) commenting on a post.
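As an illustration of the rating rule, the following Python sketch enforces a per-sender daily cap on experience points. The 5,000-point limit comes from the text; the `ExperienceLedger` class, its in-memory storage, and the rejection of non-positive amounts are assumptions, not the platform's implementation.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 5_000  # per-sender daily cap stated in the text (modifiable)


class ExperienceLedger:
    """Hypothetical in-memory record of experience points sent and received."""

    def __init__(self, daily_limit: int = DAILY_LIMIT) -> None:
        self.daily_limit = daily_limit
        self.sent = defaultdict(int)      # (sender, day) -> points sent that day
        self.received = defaultdict(int)  # recipient -> total shown on the profile

    def award(self, sender: str, recipient: str, points: int, day: date) -> None:
        if points <= 0:
            raise ValueError("points must be positive")  # assumption
        if sender == recipient:
            # Ratings are described as going to a post by another student.
            raise ValueError("points can only be awarded to another student")
        if self.sent[(sender, day)] + points > self.daily_limit:
            raise ValueError("daily experience-point limit exceeded")
        self.sent[(sender, day)] += points
        self.received[recipient] += points
```

Keying the counter on the pair (sender, day) makes the cap reset on the next day without any scheduled cleanup.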

An administration module for the exclusive use of the administrator of the microblogging platform was also developed. One of the functionalities to which the administrator has access is to see the posts that have been reported as inappropriate by users and to decide whether they should be removed or remain on the platform (Fig. 5).

Fig. 5. List of publications that have been reported by students. The list includes IDs for the report and post, as well as the name of the user, the post itself, the date it was posted, and the option to remove it.

3 Development and Results

The methodology presented in this document allowed the development and launch of the microblogging platform for students of the Department of Systems and Computing of the Technological Institute of Mexico (TecNM)/Technology Institute of Merida (ITM) during the January-June and August-December 2022 semesters. The platform presented in this work, being part of a comprehensive system, allows direct association of the results of the text analysis with other indicators obtained by other modules of the system to which it belongs, for example, scores obtained from commonly used questionnaires for the detection of mental health symptoms, as described in [22]. The original study included 157 undergraduate students. Ethical guidelines and regulations were strictly followed for both the self-report questionnaires and the input of natural language; all participants provided informed consent. The participant group consisted of 30 females and 127 males, with ages ranging from 17 to 23 years. The results obtained by this module, delivered as CSV files, allow the results from the self-report questionnaires to be directly integrated with artificial intelligence or data science algorithms.

In particular, our deployment allowed us to capture texts that were subsequently analyzed using an NLP methodology for detecting mental health symptoms. While a comprehensive description of the technique is beyond the scope of this work, we can outline its general design. The data for each user were linked to their responses to the PHQ-9 questionnaire, a very common questionnaire for assessing symptoms of depression [15]. The scores of all users were sorted in ascending order and separated into four data sets corresponding to quartiles, Q1 (lowest) through Q4 (highest). The texts entered into the microblogging platform by all users in quartiles Q1 and Q4 were pre-processed using common NLP techniques, particularly tokenization (splitting the texts into individual words or tokens) and the elimination of punctuation marks, prepositions, and other commonly occurring linguistic elements that typically do not carry significant semantic meaning. After preprocessing, we used Word2Vec [20] feature extraction to represent each text in a numerical format that machine learning (ML) models could process. Word2Vec is a predictive word embedding model that learns word representations based on their contextual usage: it uses the words surrounding a target word to predict it, or vice versa. Next, we used an SVM classifier to train a model on labeled data. The labeled data consisted of user texts that had been manually annotated as “depressed” (Q4) or “non-depressed” (Q1) according to their corresponding PHQ-9 scores. The performance of the SVM model in classifying depressive symptoms in the written language of undergraduate students was evaluated using recall.
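The first two steps of this design, quartile-based labeling and text preprocessing, can be sketched with the Python standard library alone. This is a rough illustration under several assumptions: the stop-word list is a tiny placeholder, the function names are hypothetical, and how scores that fall exactly on a quartile boundary were handled in the study is not stated. The Word2Vec and SVM steps are left out because they depend on third-party libraries.

```python
import re
import statistics

# Tiny illustrative stop-word list (assumption); a real Spanish list is much longer.
STOP_WORDS = {"a", "de", "en", "con", "por", "para", "el", "la", "los", "las", "y", "que"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into word tokens (dropping punctuation), remove stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    return [t for t in tokens if t not in STOP_WORDS]


def label_by_quartile(scores: dict[str, int]) -> dict[str, str]:
    """Label users in the lowest quartile (Q1) and the highest quartile (Q4).

    Users in Q2 and Q3 are left out, mirroring the design in which only the
    two extreme quartiles are used for training.
    """
    q1_cut, _, q3_cut = statistics.quantiles(list(scores.values()), n=4)
    labels = {}
    for user, score in scores.items():
        if score <= q1_cut:
            labels[user] = "non-depressed"  # Q1: lowest PHQ-9 scores
        elif score > q3_cut:
            labels[user] = "depressed"      # Q4: highest PHQ-9 scores
    return labels
```

Discarding the middle quartiles trades data volume for cleaner class separation, which is one plausible reading of why only Q1 and Q4 were used.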

In the context of depression screening, recall can be a more effective metric to consider due to the nature of the task. Detecting individuals who may be experiencing depressive symptoms is crucial to ensure that appropriate support and intervention can be provided. By focusing on recall, we aim to minimize false negatives, which are instances where individuals with depressive symptoms are incorrectly classified as non-depressed. To assess the robustness and generalizability of the model, we employed 5-fold cross-validation, in which the SVM model was trained and evaluated iteratively using different combinations of training and validation sets. The average recall obtained from cross-validation was 0.7823. While there is room for improvement, these results suggest that the NLP-based SVM model is effective in detecting depressive symptoms in the written language of undergraduate students. More research is needed to determine whether more samples will improve the efficacy of the classifier.
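To make the evaluation procedure concrete, the sketch below computes recall, defined as TP / (TP + FN), and averages it over k folds. It is a standard-library illustration, not the code used in the study: the helper names are hypothetical, the folds are contiguous and unshuffled (whether the original folds were shuffled or stratified is not stated), and `fit` stands in for any training routine that returns a prediction function.

```python
def recall(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Recall = TP / (TP + FN): the share of truly positive cases that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def k_fold_indices(n_samples: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs, without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def cross_validated_recall(fit, X, y, k: int = 5) -> float:
    """Average recall over k folds; fit(X_train, y_train) must return a predict function."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in val]
        scores.append(recall([y[i] for i in val], y_pred))
    return sum(scores) / len(scores)
```

A false negative lowers recall directly, which is why a screening setting that prioritizes not missing affected students would favor this metric over accuracy.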

For a qualitative analysis, particularly useful for mental health professionals, these datasets were used to generate Word Clouds. Word Clouds are a commonly used visual tool to highlight the results of qualitative analyses [2]. They show the words that appear most often in a corpus. The size of each word in the Word Cloud corresponds to its frequency of occurrence in the corpus, allowing interpreters to analyze differences between distinct corpora and hypothesize possible reasons behind those differences [2]. The Word Clouds corresponding to Q1 and Q4 are shown in Fig. 6. Words such as “creo” (“I believe”) occur noticeably more often in the Q1 Word Cloud than in the Q4 one. These results are consistent with the literature, which has identified how the language of depressed patients differs from that of those who are not depressed [29].

Fig. 6. Word Clouds of users in quartiles (a) Q1 and (b) Q4 according to the PHQ-9 questionnaire results and their inputs to the microblogging module.
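The quantity behind a Word Cloud is simply a word-frequency table. The sketch below counts token frequencies for a corpus and maps them to font sizes; the linear scaling, the size range, and the function names are assumptions chosen for illustration, and actual Word Cloud tools may scale and lay out words differently.

```python
from collections import Counter


def word_frequencies(texts: list[list[str]]) -> Counter:
    """Count how often each token appears across all tokenized texts of a corpus."""
    counts = Counter()
    for tokens in texts:
        counts.update(tokens)
    return counts


def font_sizes(counts: Counter, top_n: int = 50,
               min_size: int = 10, max_size: int = 60) -> dict[str, int]:
    """Map the top_n most frequent words to font sizes scaled linearly by frequency."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest, lowest = top[0][1], top[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lowest) / span * (max_size - min_size))
        for word, count in top
    }
```

Computing one frequency table per quartile and comparing them side by side gives the same information as the visual comparison, in a form that can also be tested statistically.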

4 Discussion and Conclusions

In this article, a microblogging platform was presented as part of a system that allows students to express themselves freely about various topics of interest among communities of students with the same hobbies, preferences, and interests. The development of this tool offers students the possibility of expressing themselves, which could be particularly useful when they have emotional difficulties, either in their daily life or at school. This could support the timely detection and prevention of mental health problems, such as psychological disorders, through the textual analysis of the publications made by users. In particular, the platform allowed us to collect the data needed to apply NLP techniques and train an SVM, and to generate Word Clouds for users showing and lacking symptoms of depression according to the PHQ-9 questionnaire; the differences observed are consistent with the reported literature.

The performance of the SVM classifier relies heavily on the availability of accurately labeled data, which, in this particular case, ultimately depends on how the depression symptoms are detected. Obtaining a large and diverse dataset with accurate labels for depressive symptoms may pose challenges. Without such a dataset, the SVM classifier may struggle to generalize to unseen data, leading to reduced performance. The choice of features and their representation, such as the Word2Vec embeddings, can also influence the performance of the classifier. Inappropriate feature selection or representation may result in the loss of important information or introduce noise. The optimal selection of features for this particular study requires further research.

The classification of depressive symptoms based on written language can be challenging due to the subjective and context-dependent nature of text. Different individuals may express symptoms differently, and the classifier might struggle with capturing nuanced variations in language use. Different approaches to capture text should be considered and the results of the classifier analyzed.

In terms of the platform, cutting-edge technologies were used to speed up the development process. Vue.js allows code to be reused through its components, which made the code more readable and less repetitive, while Django provided a vast number of functions for building the APIs. These technologies integrated very well with the PostgreSQL database management system, used to store and manage information from student posts in the various communities.

The development tools were selected due to their scalability, community support and integration capabilities. However, some software development tools might have a steeper learning curve, requiring time and effort for developers to become proficient in their usage. On the other hand, depending on the chosen tools, there may be limitations in terms of customizability or flexibility, which could restrict certain design choices or unique feature implementations. This may suggest considering alternative software development tools, including frameworks or libraries, cloud services and DevOps tools.

Finally, the noninvasive nature of the platform also allows more reliable data to be obtained than with digital tools that ask the user to enter text in answer to a particular question, avoiding unnatural information or information that is influenced by the user’s awareness of being psychologically evaluated. The design of this tool allows the “automatic” generation of data; that is, it frees researchers from tasks that involve asking patients to generate texts periodically or as part of extraordinary research protocols, and simply allows the data to be downloaded and analyzed.

5 Future Work

The microblogging platform has proved to be useful for research and mental health attention purposes, so it is expected that its operability and functionality will be constantly improved. The main improvements that can be added include the following:

  • Enhance representativeness of data: Investigate active engagement of students experiencing psychological distress with the microblogging platform and address any hesitancy to openly share struggles, ensuring a more representative sample.

  • Consider alternative methods for visualizing mental health expressions: Recognize limitations of word clouds and explore additional methods or visualizations that capture the context and nuances of students’ expressions related to mental health.

  • Broaden assessment of mental health dimensions: Expand beyond the PHQ-9 questionnaire to capture a broader range of mental health dimensions and related conditions, ensuring a comprehensive understanding of students’ well-being.

  • Evaluate alternative machine learning algorithms: Compare the effectiveness of other machine learning algorithms in identifying mental health symptoms in student texts, beyond SVM, to determine the most suitable approach.

  • Incorporate user-centered design principles: Involve students in the development process and consider their experiences, concerns, and feedback to improve the platform’s effectiveness and user-friendliness.

  • Expand socialization spaces and tags: Allow students to create additional socialization spaces and expand the catalog of communities, as well as provide flexibility in adding tags that best fit the content of their publications.

  • Enhance post editing and deletion options: Allow users the freedom to edit their posts, add tags if forgotten, and permanently delete posts when desired, while defining editing protocols for NLP and analysis purposes.

  • Improve reactions and comments: Expand the diversity of reactions to publications, ensuring they are mutually exclusive, and enable users to respond and react to comments, while incorporating reporting and hiding functionalities for improved security.

  • Refine pagination and visibility: Improve the platform’s pagination system to exclude hidden publications for a more streamlined user experience.

  • Expand publication options: Enable a wider range of publication formats, such as links, images, gifs, and surveys, offering more versatile content creation capabilities.

  • Increase training data for SVM classifier and explore alternative ML models: Augment the SVM classifier with more text samples for training and evaluate the performance of different machine learning models in identifying mental health symptoms.