1 Introduction

Over 73 million people have been affected by COVID-19 across the globe. The pandemic has had a significant impact on the mental health of individuals who have lost loved ones and of those who are unable to socialize because of mandatory social-isolation health policies. Complex psychological reactions to COVID-19 quarantine measures and the related emotional responses have been recognized as hard to disentangle (Becker et al. 2019; Ong et al. 2019; WHO 2020; Cauberghe et al. 2021). A study conducted in Belgium found social media use to be positively associated with constructive coping among adolescents with anxious feelings during the quarantine period (Cauberghe et al. 2021; de Las Heras-Pedrosa et al. 2020). Social media also provides a platform for risk communication and for exchanging feelings and emotions that can ease social isolation. Research has shown that social media data provide a wealth of information on the natural flow of people’s emotional feelings and expressions (Li et al. 2020), and this rich source of data can be used to overcome data collection barriers during the pandemic. The goal of this research was to use artificial intelligence (AI) to uncover hidden, implicit patterns related to the emotional health of people subject to mandatory quarantine, embedded latently in their tweets.

In this paper, we create a natural language processing (NLP)-based emotion detection framework that aims to provide useful information by examining unstructured social media data. The framework is intended to reveal the meaning and emotions of users’ expressions related to a particular topic, which can be used to understand their psychological health and emotional well-being. Using NLP approaches to detect emotions in unstructured texts such as social media posts (e.g., Twitter) remains a challenge in biomedical applications of AI. We aim to demonstrate the effectiveness of deep learning models for detecting emotions in COVID-19 tweets. In addition, our emotion-semantic trends depicting public responses to “Stay At Home” measures could provide useful insights for improved decision making in handling future pandemics. The contributions of this paper are as follows:

  • We created a new database of emotion-annotated COVID-19 social isolation tweets which could be used for future comparisons and implementations of emotion recognition systems based on machine learning models.

  • We design a three-task framework, based on Plutchik’s model, to investigate eight emotional states in COVID-19 tweets; the three tasks complement each other towards a common goal.

  • We discover semantic word trends using latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) techniques. Our aim is semantic knowledge discovery based on topic trends and semantic structures during the first wave of the pandemic, which provides an effective mechanism for managing future waves.

This paper is organized as follows: Sect. 2 reviews the literature on existing emotion detection models based on social media data related to health and well-being, while Sect. 3 describes our multi-task framework for emotion detection. Section 4 describes the experimental setup and findings, while Sect. 5 discusses limitations and future research directions. Section 6 summarizes and concludes this work.

2 Related work

Over the years, natural language processing (NLP) and machine learning (ML) have been used to identify the types of emotions elicited in unstructured data such as tweets. In this section, we review studies on emotion detection using data from online health communities, on emotion recognition using lexicon-based, deep learning, and machine learning models, and on directions for public health decision making based on COVID-19-related text analytics. Emotion detection through information retrieval and NLP has been used to explore large text corpora of online health community communications in mental health, dentistry, cancer treatment, fitness, and general health and wellness. For example, a communication tool based on emotion detection and NLP was introduced in mental health care to help understand counseling content (Kabir and Madria 2021; Yousaf et al. 2020; Wechsler 2023; Castiglione et al. 2021b, a; Umer et al. 2020). Most studies on emotion analysis have used machine learning techniques (Yu et al. 2014; Johnsen et al. 2019; Khanpour and Caragea 2018; Plaza-Del-Arco et al. 2020; Hasan et al. 2019). Other work has focused on Internet of Things (IoT) applications for COVID-19 diagnosis (Castiglione et al. 2021a; Kabir and Madria 2021; Yousaf et al. 2020). Moreover, Kabir and Madria (2021) developed a machine learning model based on a voting classifier to detect happy or unhappy emotions in COVID-19 tweets. In our previous work (Jelodar et al. 2020), sentiment analysis and latent topic extraction techniques were applied to uncover issues related to the COVID-19 pandemic (Aslam et al. 2020; Venigalla et al. 2020). In this study, we extend that methodology to detect emotional states and semantic trends in people’s reactions during mandatory quarantine, using the #StayAtHome hashtag on Twitter (Plutchik 1980; Ekman et al. 1999).

3 Research model

The main goal of this work is to develop a framework for detecting the discrete emotions expressed in COVID-19 tweets using NLP techniques. Specifically, we present a multi-task emotion detection framework focused on the Stay-At-Home aspects of COVID-19 tweets. The framework uses a multi-layer CNN deep learning model to capture the emotion of sentences and their context representation based on Plutchik’s theory. Our approach involves three main tasks. The first task is to create models that investigate people’s emotional reactions to mandatory Stay-At-Home restrictions based on their tweets. The second task is to discover semantic and emotional trends that uncover the patterns depicted in the first wave of the pandemic, from 28 April to 1 June 2020. Finally, the third task is to develop a deep learning-based emotion detection model for analyzing quarantine-related tweets. Our framework, including these three tasks, is presented in Fig. 1.


1. #StayAtHome Tweets Preprocessing

We included only tweets containing the COVID-19 Stay-At-Home hashtag (#StayAtHome). We then performed lexical text analysis and preprocessing to clean the data by removing noisy and irrelevant tweets. To achieve this, we applied the following NLP techniques: sentence splitting and word tokenization, HTML cleaning (to remove HTML tags), removal of stop-words and hashtags, and stemming (to strip prefixes and suffixes and reduce words to their root forms).
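The preprocessing steps above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the authors’ pipeline: the stop-word list is a tiny hypothetical sample, lowercasing is an assumption, and `naive_stem` is a deliberately crude suffix stripper standing in for a real stemmer such as Porter’s.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "at", "for", "i", "in", "is", "my", "of", "the", "to", "with"}

# Hypothetical crude suffix list; NOT the Porter stemmer or any published algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess_tweet(raw: str) -> list[list[str]]:
    """Return one list of cleaned, stemmed tokens per sentence."""
    text = html.unescape(raw)              # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)   # HTML cleaning: remove tags
    text = re.sub(r"#\w+", " ", text)      # remove hashtags
    sentences = []
    for sentence in split_sentences(text):
        # Word tokenization (lowercasing is an assumption of this sketch).
        tokens = re.findall(r"[a-z']+", sentence.lower())
        tokens = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


print(preprocess_tweet("<p>Staying home with my family!</p> Missing friends &amp; hugs. #StayAtHome"))
# → [['stay', 'home', 'family'], ['miss', 'friend', 'hug']]
```

A fuller pipeline would swap in a complete stop-word list and an established stemmer; the order of operations (clean, split, tokenize, filter, stem) stays the same.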


2. Task 1: Emotion Detection of #StayAtHome Tweets

Task 1, the automatic detection of emotions from #StayAtHome tweets, is the most important process for achieving our research goal. It provides the initial determination of the types of emotions expressed and therefore directly influences Tasks 2 and 3. We used the National Research Council Canada (NRC) Word-Emotion Lexicon (Mohammad et al. 2013), which is based on Plutchik’s Wheel of Emotions, to perform this task. The lexicon maps individual words to eight emotional states (anger, joy, fear, disgust, sadness, surprise, trust, and anticipation), with a score attached to each word-emotion association, as shown in Algorithm 1. In this research, three processing steps were carried out on every annotated tweet: (a) identifying the type of emotion using Plutchik’s theory, (b) assigning the emotion score obtained from the NRC Word-Emotion Lexicon, and (c) identifying the emotion with the maximum association score based on the computed scores.
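Steps (a) to (c) can be illustrated with the small sketch below. `TOY_LEXICON` is a hypothetical stand-in: its words and scores are invented and are not real NRC entries, and summing per-token scores before taking the maximum is one plausible reading of step (c), not necessarily the exact rule in Algorithm 1. With the example tweet discussed later in this section, the toy lexicon reproduces the FEAR label.

```python
from collections import defaultdict

PLUTCHIK_EMOTIONS = ("anger", "anticipation", "disgust", "fear",
                     "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon: words and scores are invented, NOT real NRC values.
TOY_LEXICON = {
    "sad": {"fear": 0.6, "sadness": 0.5},
    "friend": {"joy": 0.4, "trust": 0.5},
    "wait": {"anticipation": 0.5},
    "lockdown": {"fear": 0.4, "anticipation": 0.3},
}


def score_emotions(tokens, lexicon=TOY_LEXICON):
    """Steps (a) and (b): accumulate a score for each Plutchik emotion."""
    scores = defaultdict(float)
    for token in tokens:
        for emotion, value in lexicon.get(token, {}).items():
            if emotion in PLUTCHIK_EMOTIONS:
                scores[emotion] += value
    return dict(scores)


def dominant_emotion(tokens, lexicon=TOY_LEXICON):
    """Step (c): return (emotion, score) with the maximum association, or None."""
    scores = score_emotions(tokens, lexicon)
    if not scores:
        return None  # no lexicon word matched; the tweet stays unlabelled
    # max() keeps the first maximum, so ties follow PLUTCHIK_EMOTIONS order.
    best = max(PLUTCHIK_EMOTIONS, key=lambda e: scores.get(e, 0.0))
    return best, scores.get(best, 0.0)


tweet = "sad man friend whos livin skin cant stand company".split()
print(dominant_emotion(tweet))  # → ('fear', 0.6)
```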

Fig. 1 Research framework and pipeline for the COVID-19 tweet emotion capture and analysis

Because the labelling is automatic and involves no human tagging, the data annotation is consistent. Plutchik’s model defines eight basic emotions, which makes a consistent classification of texts possible. Moreover, we uncover trends in the data based on semantic and emotional aspects through NLP methods. Figure 2 provides an example of selecting the score for COVID-19 tweets. For example, after text processing, one tweet reads “Sad man friend whos livin skin cant stand company”; the term ‘Sad’ is associated with FEAR, and this association yields the highest score from the NRC-based lexicon, so FEAR is detected as the predominant emotion.


3. Task 2: Emotion/Semantic-Trends of #StayAtHome

Researchers have studied the timing of emotional progression and noted that positive emotions arise significantly earlier, whereas negative emotions take longer to develop (Kwon et al. 2021; Sun et al. 2020; Gautam and Sharma 2020; Hofmann 2001; Blei et al. 2003). Identifying emotion and semantic trends over time can help in understanding temporal changes in opinions related to human behavior; in fact, awareness of mood changes and emotion trends has practical applications in public health decision making (Li and Xu 2020; Lee et al. 2021; Wu et al. 2020; Kim 2014; Kim and Chung 2020). We use semantic topics (Hochreiter and Schmidhuber 1997; Uddin and Nilsson 2020) discovered in the entire dataset to detect and describe semantic trends, and to obtain these topics we developed a topic model. We considered two popular methods for determining the best approach to obtaining semantic trends from #StayAtHome tweets: Probabilistic Latent Semantic Analysis (PLSA) (Hofmann 2001) and Latent Dirichlet Allocation (LDA) (Blei et al. 2003), which we used to obtain the most semantically related words and discover the semantic structures of the COVID-19 tweets. PLSA is an NLP technique that captures topical similarities between words, while LDA has proven very useful for semantic extraction and for generating trends over time; it has been applied successfully to topic discovery, temporal semantic trends, document classification, and finding relations between documents. The aim of Task 2 is to capture two kinds of trends in the COVID-19 tweets, based on emotion and semantic aspects. By investigating the distributions of the semantic topics across days, we obtain semantic trends. Algorithm 2 gives the pseudocode for this task. To supply Task 2 with the required data, we first used the Task 1 output to determine the emotion type of each tweet, from which we identified trends among the different emotions.
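Turning per-tweet topic distributions into day-by-day semantic trends can be sketched as follows. The sketch assumes each tweet already has a topic distribution from a fitted topic model (for example, a document-topic matrix exported after LDA) and a posting date; averaging the proportions per day is one simple aggregation, not necessarily the one used in Algorithm 2. The input values are made up.

```python
from collections import defaultdict
from datetime import date


def daily_topic_trends(doc_topics, doc_dates, num_topics):
    """Average per-tweet topic proportions for each calendar day.

    doc_topics: per-tweet topic distributions (lists of num_topics probabilities).
    doc_dates:  datetime.date objects aligned with doc_topics.
    Returns {day: [mean proportion of topic k for k in range(num_topics)]}.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for dist, day in zip(doc_topics, doc_dates):
        for k, p in enumerate(dist):
            sums[day][k] += p
        counts[day] += 1
    return {day: [s / counts[day] for s in sums[day]] for day in sorted(sums)}


# Toy example: three tweets over two days and two topics (invented values).
trends = daily_topic_trends(
    [[0.75, 0.25], [0.25, 0.75], [1.0, 0.0]],
    [date(2020, 4, 28), date(2020, 4, 28), date(2020, 4, 29)],
    num_topics=2,
)
print(trends)
# → {datetime.date(2020, 4, 28): [0.5, 0.5], datetime.date(2020, 4, 29): [1.0, 0.0]}
```

Plotting one topic’s value across the sorted days gives its trend line.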


4. Task 3: Modeling Sentence and COVID-19 Emotion-Detection

Machine learning offers the advantage of automatic emotion detection beyond existing lexical dictionaries for emotion analysis. In particular, deep learning models have proven successful in many NLP applications for emotion detection from health and medical text data (Uddin and Nilsson 2020). We used a convolutional neural network (CNN) (Kim 2014; Kim and Chung 2020) to implement an emotion detection model based on emotion vectors in #StayAtHome tweets, as shown in Algorithm 3.


4 Experimental evaluation

4.1 Twitter dataset and #StayAtHome!

To collect the data, we considered Twitter a valuable source, as it plays an important role in reflecting people’s opinions, requests, problems, and needs on a variety of issues during the COVID-19 outbreak. We extracted just over one million (1,047,968) tweets containing #StayAtHome using the Twitter Search API between 28 April and 2 June 2020.

4.2 Informative trends of the first wave: emotion and semantic

Trending topics, to a certain extent, describe the opinion of a community and provide the means to analyze it by revealing where public attention lies at a given time; this has become a matter of interest for researchers and health professionals. For Task 2, we need to predict topic trends and explain the important variations in trends related to COVID-19 social isolation. To test our machine learning approach on this task, we randomly split our dataset into 90% for training and the remaining 10% for testing.
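A seeded random 90/10 split like the one described can be written as below. The seed value and the rounding of the test-set size are illustrative assumptions; the text does not specify either.

```python
import random


def train_test_split(items, test_fraction=0.1, seed=42):
    """Shuffle a copy of items with a fixed seed and return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: global state untouched
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))  # → 90 10
```

Applied to the 1,047,968 collected tweets, `round(1047968 * 0.1)` gives 104,797 test tweets.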

4.3 Relationship between semantic trends and StayAtHome tweets

It is difficult to identify the key concepts discussed by users in a million tweets using traditional methods, so we examined NLP methods (LDA and PLSA) to extract topics based on semantic aspects and better understand people’s behaviours and reactions while staying at home. We then investigated the distribution of the generated topics on different days of the initial wave of the outbreak, which can help in managing public health in the community. First, we investigated the PLSA and LDA models to analyze and validate the relationship between semantic topics extracted from COVID-19 tweets and related issues of the pandemic, using the Mallet package for this purpose. We then generated 100 topics and focused on the top five topics across all COVID-19 tweets to discover semantically related words. We use an LDA model for performing Task 2.
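Selecting the top five of the 100 generated topics can be sketched by ranking topics on their total proportion over all tweets. The ranking criterion is an assumption (the text describes the top topic as the “most frequent”), and `doc_topics` stands for a document-topic matrix such as the per-document topic proportions MALLET can write out; reading that file is omitted here.

```python
def top_topics(doc_topics, k=5):
    """Return the ids of the k topics with the largest total proportion.

    doc_topics: one list of topic proportions per tweet (same length for all).
    Ties keep the lower topic id first because sorted() is stable.
    """
    num_topics = len(doc_topics[0])
    totals = [0.0] * num_topics
    for dist in doc_topics:
        for t, p in enumerate(dist):
            totals[t] += p
    ranked = sorted(range(num_topics), key=lambda t: totals[t], reverse=True)
    return ranked[:k]


# Toy example with six topics (invented proportions for one tweet).
print(top_topics([[0.05, 0.3, 0.1, 0.25, 0.2, 0.1]], k=3))  # → [1, 3, 4]
```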

To detect the semantic trends shown in the topics during the initial wave of the pandemic, we investigated the top five topics (S1, S2, S3, S4, and S5, as shown in Fig. 2) to better understand how online community reactions change over time. These topics were distributed across different days, which allowed us to isolate the time-varying nature of the semantic trends in #StayAtHome tweets labelled by the automated process described in Task 1. As shown in Fig. 2, the highest-ranked (most frequent) topic is characterized by the words Home, Staysafe, Lockdown, Love, and Family. These words correspond to safety issues related to staying at home, and we label this topic S1. It decreases rapidly over time, at a rate of 0.11 (p = 0.04) within the first 28 days, and the decline is greater within the last 14 days, at a rate of 0.28 (p = 0.001), with some day-to-day fluctuations (see Fig. 4).
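One way to obtain decline rates like those reported (0.11 over the first 28 days and 0.28 over the last 14 days) is to fit a least-squares slope to a topic’s daily share within each window; the text does not state how the rate or its p-value was computed, so this reading is an assumption. `daily_share` is a hypothetical list indexed by day number. A p-value for the slope could come from a routine such as `scipy.stats.linregress`, which is left out here to keep the example dependency-free.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def windowed_slope(daily_share, start, stop):
    """Slope of a topic's daily share over days start .. stop-1 (0-based)."""
    days = list(range(start, stop))
    return ols_slope(days, [daily_share[d] for d in days])


# Toy series (invented) that falls by exactly 2 units per day.
share = [10 - 2 * d for d in range(5)]
print(windowed_slope(share, 0, 5), windowed_slope(share, 2, 5))  # → -2.0 -2.0
```

A negative slope indicates a declining topic; comparing slopes from an early and a late window shows whether the decline accelerates.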

Fig. 2 Semantic trends of #StayAtHome tweets in the initial wave of the COVID-19 pandemic

4.4 Relationship between emotion trends and StayAtHome tweets

In Tasks 1 and 2 of the framework, we applied the NRC emotion lexicon, which is grounded in Plutchik’s theory and contains about 14,000 words mapped to eight primary emotions: anger, anticipation, joy, surprise, sadness, disgust, trust, and fear. The NRC lexicon has been widely used for emotion analysis of social media data (such as Twitter). Through Task 2, we identified the relationship between emotion trends and the tweets based on Plutchik’s emotion theory. As shown in Fig. 3, the dominant emotion elicited in tweets over time is “Anticipation”. In the psychology literature, anticipation and surprise can be related to either positive or negative emotional health outcomes. In this study, however, anticipation stemmed from the hashtag “#StayAtHome”, which denotes a socially undesirable restriction; one can therefore assume that anticipation is mostly directed towards a negative emotional feeling of perceived susceptibility.
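Per-day emotion distributions of the kind plotted in Fig. 3 can be computed from the Task 1 labels as sketched below. The inputs are assumptions: one dominant-emotion label per tweet (or `None` when no lexicon word matched) and the tweet’s posting date. The example labels are invented.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_emotion_shares(labels, days):
    """Fraction of each day's labelled tweets whose dominant emotion is each label."""
    by_day = defaultdict(Counter)
    for label, day in zip(labels, days):
        if label is not None:  # skip tweets Task 1 could not label
            by_day[day][label] += 1
    return {
        day: {e: c / sum(by_day[day].values()) for e, c in by_day[day].items()}
        for day in sorted(by_day)
    }


def dominant_per_day(shares):
    """Most frequent emotion on each day."""
    return {day: max(dist, key=dist.get) for day, dist in shares.items()}


d1, d2 = date(2020, 4, 28), date(2020, 4, 29)
shares = daily_emotion_shares(
    ["anticipation", "fear", "anticipation", None, "joy"], [d1, d1, d1, d1, d2])
print(dominant_per_day(shares))  # d1 → 'anticipation', d2 → 'joy'
```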

Fig. 3 Distribution of emotion trends in #StayAtHome tweets over time in the initial wave of COVID-19 pandemic

4.5 Deep learning model configurations and training details

The objective of Task 3 is to automatically detect emotions in #StayAtHome tweets using a multi-channel CNN as the computational model. We developed a three-channel CNN with multiple parallel convolution layers that learn sentence representations using different kernel sizes. In each channel, an input layer is fed with emotion-word vectors, and the word embedding vectors are concatenated to form the feature representation of the sequence. In the first convolution layer, convolution is performed with multiple filters of different window sizes, generating a local emotion feature vector for each word window. A dropout layer follows each convolution layer to reduce overfitting during training. Three max-pooling layers then consolidate the outputs of the dropout layers, and a flatten layer reshapes the pooled features into a single column vector. Finally, we concatenate all learned features and feed them into dense layers that produce scores and recognize the type of emotion elicited in each sentence (tweet).
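To make the data flow of the three-channel CNN concrete, the following NumPy sketch runs one forward pass with random, untrained weights. Every size here is hypothetical (sequence length 20, 50-dimensional embeddings, 16 filters per channel, kernel widths 2, 3 and 4, pooling size 2, and a single dense softmax layer instead of several dense layers); the text does not give these values. Dropout is omitted because it is only active during training, and for simplicity all three channels read the same embedded tweet. A real implementation would use a framework such as Keras or PyTorch and learn the weights from the Task 1 labels.

```python
import numpy as np

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical sizes; the paper does not report these values.
SEQ_LEN, EMB_DIM, NUM_FILTERS, POOL = 20, 50, 16, 2
KERNEL_WIDTHS = (2, 3, 4)  # one kernel width per channel (assumed)

rng = np.random.default_rng(0)


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution of x (seq_len, emb_dim) with kernels (filters, width, emb_dim), then ReLU."""
    num_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, num_filters))
    for i in range(steps):
        window = x[i:i + width]  # one word window: (width, emb_dim)
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool(h, size=POOL):
    """Non-overlapping max pooling along the sequence axis."""
    steps = h.shape[0] // size
    return h[: steps * size].reshape(steps, size, -1).max(axis=1)


def forward(embedded, channels, dense_w, dense_b):
    """Forward pass for one tweet; embedded has shape (SEQ_LEN, EMB_DIM)."""
    features = []
    for kernels, bias in channels:
        h = conv1d_relu(embedded, kernels, bias)
        # Dropout would sit here during training; it is a no-op at inference.
        h = max_pool(h)
        features.append(h.reshape(-1))  # flatten
    logits = np.concatenate(features) @ dense_w + dense_b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax over the eight emotions


# Random, untrained parameters: (kernels, bias) per channel, then one dense layer.
channels = [(rng.normal(0.0, 0.1, (NUM_FILTERS, w, EMB_DIM)), np.zeros(NUM_FILTERS))
            for w in KERNEL_WIDTHS]
flat_size = sum(((SEQ_LEN - w + 1) // POOL) * NUM_FILTERS for w in KERNEL_WIDTHS)
dense_w = rng.normal(0.0, 0.1, (flat_size, len(EMOTIONS)))
dense_b = np.zeros(len(EMOTIONS))

tweet = rng.normal(size=(SEQ_LEN, EMB_DIM))  # stand-in for Word2Vec vectors
probs = forward(tweet, channels, dense_w, dense_b)
print(flat_size, EMOTIONS[int(probs.argmax())])
```

The parallel channels let each kernel width pick up emotion cues over a different word span (two, three, or four words) before the features are merged.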

The models were trained on the dataset generated by Task 1. Beforehand, we trained our own word embeddings using the Word2Vec technique (Yu et al. 2014), which provides richer and more meaningful text representations than the bag-of-words approach. Because long short-term memory (LSTM) recurrent neural networks (Hochreiter and Schmidhuber 1997; Uddin and Nilsson 2020) have been widely used for NLP problems in recent years and are a standard baseline in this research area, we also built an LSTM baseline model consisting of 64 units. There are eight emotion labels, which serve as the detection targets, so the output of each deep learning model assigns each COVID-19 tweet one of these labels. We trained the models with different numbers of epochs (10, 20, 30, 40, and 50) to assess performance, and Fig. 4 shows how the results vary across these settings using the trained word embeddings. The advantage of the CNN model for detecting emotion types in COVID-19 tweets is that it helps avoid overfitting while still finding complex emotion patterns in the data.
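Fig. 4 compares the models by F1-score. The text does not say how the per-class scores are averaged over the eight emotions; assuming macro averaging (the unweighted mean of per-class F1), the metric can be computed as follows. The labels in the example are invented.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A class with neither correct nor incorrect hits contributes 0.
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


y_true = ["joy", "fear", "fear", "trust"]
y_pred = ["joy", "fear", "trust", "trust"]
print(round(macro_f1(y_true, y_pred, ["joy", "fear", "trust"]), 4))  # → 0.7778
```

Macro averaging weights rare emotions as heavily as common ones, which matters when one emotion such as anticipation dominates the labels; a micro average would instead be driven mostly by the frequent classes.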

Fig. 4 F1-scores of the CNN and LSTM models for COVID-19 emotion detection

5 Discussion, limitation and future work

Overall, our results suggest that the machine learning methods we used are appropriate for emotion detection in COVID-19 tweets. The results clearly showed anticipation to be a prominent emotion. Among the various definitions of this emotion, anticipation is considered one of “the mature ways of dealing with real stress”. Under this definition, people can lower their stress during the COVID-19 pandemic by anticipating and preparing for how they are going to deal with it. According to Wu et al. (2020), anticipation can concern either positive or negative future events, and it is aligned with hope and fear, the typical anticipatory feelings that arise in response to the possibility of such events. A study that used machine learning approaches to analyze multiple unigrams and bigrams from COVID-19 Twitter feeds reported findings similar to ours: the dominant theme identified was anticipation, mixed with feelings of trust, anger, and fear (Kim 2014).

The use of online social network text data to understand user health behaviors and emotions has become an emerging research area in NLP and health informatics. COVID-19 introduced an unprecedented global threat, and the public health planning and policy-making communities are still struggling to find best practices to curb the pandemic. This study’s findings show how the emotional and semantic trends in people’s reactions to COVID-19 public health restrictions can be obtained for knowledge discovery and used to inform related decision making. The advantage of this approach is that identifying such online trends provides accessible and helpful information about public reactions to particular issues, which is why it has recently attracted the attention of medical and computer science researchers. In this research, our framework covers three related practical tasks with a common goal: developing a deep learning system for emotion detection and for analysis of informative trends in COVID-19 tweets reflecting people’s reactions during the stay-at-home period. Our final results point public health policy makers and decision makers to emotional issues that stemmed from these strict public health restrictions, and could therefore support decision making. However, this research is limited to sentiments and emotions expressed in text about COVID-19 issues; we did not analyze images posted on social media. Currently, our database consists of 1,047,968 #StayAtHome tweets from 28 April to 2 June 2020. Although more tweets could be extracted based on #StayAtHome, we believe the current number of tweets is sufficient for this research.

6 Conclusion

This paper presented a novel framework for emotion detection in COVID-19 tweets related to the “stay-at-home” public health guidelines: a multi-task framework for COVID-19 emotion detection built on a CNN model. The research further shows that the framework is effective in capturing emotion and semantic trends in social media messages during the pandemic. Moreover, it offers a more insightful understanding of COVID-19 tweets by automatically identifying the type of emotion, including both negative and positive reactions, and the magnitude of its expression. Given the length and strictness of the stay-at-home public health orders in the first wave, we believe it is necessary to examine changes in people’s emotions by monitoring temporal trends and their fluctuations using Twitter data.