1 Introduction

Emerging infectious diseases are responsible for many deaths and disabilities globally [1]. Evidence shows that at least 43 million people contracted the H1N1 flu worldwide within 12 months of the pandemic which, in turn, resulted in over 200,000 deaths [2, 3]. In addition, 770,000 HIV/AIDS-related deaths were reported in 2018 alone, with over 37 million people infected globally [4]. The latest emerging infectious disease, COVID-19 [5, 6], has already infected over 89.5 million people worldwide, with a mortality of at least 1.9 million as of January 9, 2021 [7]. Emerging infectious diseases have also been shown to inflict significant burden on economies and public health systems [8,9,10]. For example, global health systems are struggling to cope with the COVID-19 pandemic, while unemployment/job losses, reduced income/productivity, and business closures are prevalent among individuals and organizations due to the lockdown measures imposed by governments. To understand public perceptions toward the pandemic, social media data can provide the required insights from a global perspective [11].

Social media has been a major and rich data source for research in many domains, including health, due to its 3.8 billion active users [12] from diverse geographic locations across the globe. For instance, researchers analyzed user comments extracted from social media platforms (such as Facebook, Twitter, Instagram, and discussion forums) to uncover insights about health-related issues (e.g., mental health [13, 14], substance use [15, 16], and diseases [17,18,19,20]), political issues (e.g., elections [21,22,23,24]), and business-related issues (e.g., customer engagement [25, 26]). With respect to COVID-19, social media comments can reveal public opinions about governments and health organizations’ response to the pandemic, as well as economic, health, social, political, physical, and psychological impact of COVID-19 on global populations in line with the factors affecting efforts to limit the spread of the disease either negatively or positively.

In this paper, we apply natural language processing (NLP) to analyze COVID-19-related comments from six social media platforms (Twitter, Facebook, YouTube, and three online discussion forums) to uncover issues surrounding the pandemic based on public perceptions. NLP is a widely used method for extracting insights from unstructured texts, such as social media data and clinical texts (e.g., electronic health records [27] and patient journals [28]). We aim to answer the following research questions in this work:

  • RQ1: What are the negative issues (economic, socio-political, educational, and political issues) shared by people on social media with respect to the COVID-19 pandemic?

  • RQ2: What are the positive opinions or perceptions of people with respect to COVID-19 and how it is being handled?

  • RQ3: How can the negative issues be addressed using insights from the positive opinions and other research evidence?

The methodological approach used to answer our research questions is as follows:

  1) We applied an NLP approach to detect relevant and opinionated keyphrases from social media comments related to the COVID-19 pandemic. To extract meaningful keyphrases, our approach considers the context in which words appear in the unstructured comments.

  2) We identified negative and positive themes that capture public opinions about the pandemic. Our results reveal 34 negative themes, of which 17 are economic, socio-political, educational, and political issues. Twenty (20) positive themes were also identified.

  3) We suggest interventions to tackle the negative issues. The interventions, which are based on the positive themes and research evidence, would inform and help governments and relevant agencies, as well as individuals, to minimize the spread and impact of COVID-19, and to respond effectively to future pandemics.

2 Related Work

Over the years, social media has been a rich source of data for health informatics research [29]. Natural language processing (NLP) techniques have been widely used to analyze social media comments and clinical texts, for example to identify health-related and psychosocial issues with respect to the COVID-19 pandemic [30].

The lexicon-based NLP technique was used to detect the prevalence of keywords indicating public interests in e-cigarette, marijuana, influenza, and Ebola using social media data, while latent Dirichlet allocation (LDA) technique was used to retrieve topics from the corpus [31]. LDA has also been utilized to extract latent topics from COVID-19-related comments posted on social media [32]. Also, the Natural Language Toolkit (NLTK) was used by Bekhuis et al. [33] to identify top collocated n-grams (bigrams and trigrams) from clinical emails.

Furthermore, a custom topic modeling technique, called Ailment Topic Aspect Model, was employed to generate latent topics from Twitter data with the aim of identifying mentions of ailments of interest, including allergies, obesity, and insomnia [34]. Non-negative matrix factorization is another topic modeling technique used in health informatics research to extract topics from social media data [35]. A third-party tool for text mining, called KH-Coder, has also been used to explore potential topics related to H1N1-related advice, vaccine, and antiviral uptake in the UK based on Twitter data [36]. Machine learning-based NLP was utilized to analyze unstructured clinical notes to predict hospital readmissions for COPD patients [37] and to perform sentiment analysis of user comments on mental health apps [38]. None of the techniques above considered the context in which words appear in unstructured texts, even though doing so can yield more meaningful and relevant keyphrases.

To demonstrate the significance of context-based text analysis, Dave and Varma conducted experiments to compare N-Gram chunking technique and the part-of-speech (POS) chunking technique [39]. Rather than just extracting n-grams, the POS chunking method considers context of words by using regular grammars or POS patterns that specify how sentences should be deconstructed into keyphrases of interest. Their results show that systems using the POS chunking technique extracted relevant features (keyphrases) and outperformed systems adopting N-Gram chunking for feature extraction. We extend this approach with enhanced part-of-speech (POS) patterns tailored to our goal, chunking and CoNLL IOB tagging, as well as keyphrase transformation and sentiment scoring. We further categorized the extracted keyphrases into broader themes using the thematic analysis method.

3 Methodology

Based on our research questions, the goal of this paper is to investigate and reflect on people’s personal experiences and opinions with respect to the COVID-19 pandemic using social media data. To achieve this, we utilize the following well-established computational techniques:

  1) We developed programs or scripts to mine user comments related to COVID-19 from six social media platforms.

  2) We preprocessed the data using NLP techniques.

  3) We applied a seven-stage context-aware NLP approach to identify opinionated and meaningful keyphrases from the comments.

  4) We applied thematic analysis to iteratively categorize related keyphrases identified in step 3 above into broader themes or categories.

Figure 1 shows the NLP pipeline utilized in extracting opinionated keyphrases from comments related to the COVID-19 pandemic.

Fig. 1 NLP pipeline for extracting opinionated keyphrases from COVID-19-related comments

3.1 Data Collection

A total of 47,410,795 comments related to COVID-19 were collected across six social media platforms (i.e., Twitter, YouTube, Facebook, PushSquare.com, Archinect.com, and LiveScience.com), as described below:

  1) Twitter: We built a console application to mine 47,249,973 tweets in real time using the Twitter Streaming API [40] and the C# programming language. The program targets tweets with the following hashtags: #COVID19, #COVID, #ncov2019, #Covid_19, #StopTheSpread, #CoronaVirusUpdates, #StayAtHome, #selfquarantine, #COVID-19, #COVIDー19, #CoronaCrisis, #panicbuying, #caronavirusoutbreak, #SocialDistancing, #cronovirus, #CoronaVirusUpdate, #QuarantineLife, #Quarantined, #pandemic, #CoronavirusPandemic, #Coronavid19, #coronapocalypse, #QuarantineAndChill, #CoronaVirus, #MyPandemicSurvivalPlan, and #CoronavirusOutbreak.

  2) YouTube: We wrote a Python script to automatically extract 111,722 user comments linked to 2,939 COVID-19-related videos using the YouTube Data API [41]. The keywords used for the video search include covid-19, covid19, and coronavirus.

  3) Facebook: We adopted a semi-automatic technique to extract comments due to search restrictions imposed by Facebook. We first obtained 91 groups and 68 pages related to COVID-19 manually using the following keywords: COVID, COVID-19, and Coronavirus. Afterwards, we developed a Python script to retrieve 8,382 and 777 comments from the pages and groups, respectively.

  4) Discussion forums: We collected 18,401, 20,747, and 793 user comments from COVID-19-related threads on PushSquare.com [42], Archinect.com [43, 44], and LiveScience.com [45], respectively, using Python scripts.
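The per-platform figures listed above can be cross-checked against the reported total with a short calculation. This is only an arithmetic sanity check; the dictionary keys are our own labels, and the counts are those reported in this section.

```python
# Per-platform comment counts as reported in Sect. 3.1.
counts = {
    "Twitter": 47_249_973,
    "YouTube": 111_722,
    "Facebook pages": 8_382,
    "Facebook groups": 777,
    "PushSquare.com": 18_401,
    "Archinect.com": 20_747,
    "LiveScience.com": 793,
}

total = sum(counts.values())
twitter_share = counts["Twitter"] / total

print(f"total comments collected: {total:,}")   # 47,410,795
print(f"share from Twitter: {twitter_share:.2%}")
```

The sum matches the 47,410,795 comments stated at the start of this section, and it makes explicit that the raw collection is dominated by tweets.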

3.2 Data Preprocessing

To clean the data and prepare it for keyphrase extraction, we apply the following preprocessing steps using NLP techniques implemented in Python:

  1) Remove mentions, URLs, and hashtags

  2) Expand contractions (such as replacing “couldn’t” with “could not”)

  3) Replace HTML character entities with their Unicode equivalents (such as replacing “&amp;” with “&”)

  4) Remove HTML tags (such as “<div>” and “<p>”)

  5) Remove special characters that are not required for sentence boundary detection

  6) Compress words with repeated characters (such as compressing “poooool” to “pool”)

  7) Convert slang terms to standard English words using relevant online slang dictionaries [46, 47]

  8) Remove words that are numbers

After applying the above steps to the data, and removing non-English comments (identified using the langdetect Python library [48]) and duplicate comments, the total number of comments was reduced to 8,021,341. We randomly selected about 13% of these comments (n = 1,051,616) to form the corpus used in this paper.
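The eight preprocessing steps can be sketched with Python's standard library alone. The lookup tables `CONTRACTIONS` and `SLANG` and the function name `preprocess` are hypothetical placeholders, not the resources used in this work, and the regular expressions are illustrative assumptions rather than the exact rules applied to the corpus. Language detection and duplicate removal are omitted.

```python
import html
import re

# Hypothetical, tiny lookup tables for illustration only.
CONTRACTIONS = {"couldn't": "could not", "can't": "cannot", "won't": "will not"}
SLANG = {"u": "you", "pls": "please"}


def preprocess(comment: str) -> str:
    """Apply the eight cleaning steps, in the order listed, to one comment."""
    # Step 1: remove URLs, then mentions and hashtags. The lookbehind avoids
    # touching numeric HTML entities such as "&#39;" and e-mail addresses.
    text = re.sub(r"https?://\S+|www\.\S+", " ", comment)
    text = re.sub(r"(?<![&\w])[@#]\w+", " ", text)
    # Step 2: expand contractions (curly apostrophes are normalized first).
    text = text.replace("’", "'")
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text, flags=re.IGNORECASE)
    # Step 3: replace HTML character entities, e.g. "&amp;" becomes "&".
    text = html.unescape(text)
    # Step 4: remove HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # Step 5: keep only characters useful for sentence boundary detection.
    text = re.sub(r"[^\w\s.!?,']", " ", text)
    # Step 6: compress runs of three or more identical letters to two.
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)
    # Step 7: convert slang using the lookup table.
    words = [SLANG.get(w.lower(), w) for w in text.split()]
    # Step 8: remove words that are numbers.
    words = [w for w in words if not re.fullmatch(r"\d+(?:[.,]\d+)*", w)]
    return " ".join(words)


sample = ("@WHO I couldn’t find masks &amp; gloves!!! <p>This is goooood news</p> "
          "https://t.co/x #COVID19 pls share with 100 friends")
print(preprocess(sample))
# I could not find masks gloves!!! This is good news please share with friends
```

Compressing repeated letters to two characters reproduces the “poooool” to “pool” example, but it would leave a word such as “sooo” as “soo”; a dictionary check would be needed to handle such cases.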

3.3 Keyphrase Extraction

To extract meaningful and opinionated keyphrases, which are words or phrases representing the topical content of each document (or comment) in our corpus, we utilized a context-aware NLP approach. This approach extends the version adopted by Dave and Varma [39] with enhanced part-of-speech (POS) patterns tailored to our objective, chunking (in conjunction with CoNLL IOB tagging [49]), as well as transformation and sentiment scoring stages. In subsequent subsections, we describe the keyphrase extraction component of the NLP pipeline in Fig. 1. In line with this, we present an algorithm (see Fig. 2) that accepts a regular grammar and our corpus as input parameters and returns opinionated keyphrases of interest as output. The algorithm was implemented in Python using the Natural Language Toolkit (NLTK).

Fig. 2 The KeyphraseExtractor algorithm based on the context-aware NLP approach

3.3.1 Grammar Definition

We defined a regular grammar (see below), which is a set of rules composed of POS patterns that describe how the syntactic units of each document in our corpus are deconstructed into their constituents or parts. The grammar captures the context of each comment and the opinions/sentiments expressed using nouns, adjectives, and verbs. Prior research shows that nouns are crucial for detecting the context of a conversation [50], while both adjectives and verbs are significant for sentiment classification [51].

Grammar: { <DT>? <JJ.*>* <NN.*>* <VB.*>? (<IN>? <DT>? <JJ.*>* <NN.*>*)? }

The regular grammar above is composed of patterns of POS tags from the well-established Penn Treebank Tagset [52, 53]. For instance, the <NN.*> pattern matches any type of noun (see Table 1), <JJ.*> matches any type of adjective, <VB.*> matches any type of verb, <IN> matches a preposition or subordinating conjunction, and <DT> matches a determiner. We considered determiners and prepositions since they usually occur together with nouns and adjectives in sentences (e.g., public concern about the virus). Also, the “*” symbol after a POS pattern refers to “zero or more occurrences,” while “?” refers to “zero or one occurrence.”

Table 1 Part-of-speech (POS) tags and description
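To make the quantifiers concrete, the grammar can be rewritten as an ordinary regular expression over a string of bracketed tags. The sketch below is an emulation with Python's `re` module, not the NLTK implementation used in this work; the names `GRAMMAR` and `chunk` are hypothetical, and the input is the lemmatized example sentence from Sect. 3.3.4.

```python
import re

# Each grammar element becomes a regex over bracketed tags such as "<NN>".
ADJ = r"(?:<JJ[^>]*>)"    # <JJ.*>  any adjective
NOUN = r"(?:<NN[^>]*>)"   # <NN.*>  any noun
VERB = r"(?:<VB[^>]*>)"   # <VB.*>  any verb
GRAMMAR = re.compile(
    rf"(?:<DT>)?{ADJ}*{NOUN}*{VERB}?"
    rf"(?:(?:<IN>)?(?:<DT>)?{ADJ}*{NOUN}*)?"
)


def chunk(tagged):
    """Return the word sequences whose tags match the grammar, in order."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in GRAMMAR.finditer(tag_string):
        if not match.group():
            continue  # every element is optional, so empty matches are skipped
        start = tag_string.count("<", 0, match.start())  # index of first token
        length = match.group().count("<")                # number of tokens
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks


TAGGED = [("stop", "NNP"), ("panic", "NN"), ("buying", "NN"), ("and", "CC"),
          ("be", "VB"), ("sure", "JJ"), ("to", "TO"), ("use", "VB"),
          ("face", "NN"), ("mask", "NNS"), ("in", "IN"), ("public", "JJ"),
          ("area", "NNS")]

for words in chunk(TAGGED):
    print(" ".join(words))
# stop panic buying
# be sure
# use face mask
# in public area
```

Under this reading of the grammar, “in public area” is matched as a chunk of its own because the optional verb slot sits before the prepositional component; this agrees with the B-KT tag assigned to “in” in the IOB example of Sect. 3.3.5.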

3.3.2 Sentence Breaking and Tokenization

Next, each document is separated into individual sentences. To achieve this, we utilized a robust unsupervised algorithm (within the Python NLTK’s tokenize library [54]) which considers collocations, punctuation, capitalization, and abbreviations in determining sentence boundaries within each document. Afterwards, each sentence is further broken down into words or tokens in preparation for POS tagging.

3.3.3 POS Tagging

Each token is assigned a POS tag (within the Penn Treebank Tagset) denoting its part of speech in the English language. For example, tokens in the following sentence “Stop panic buying and be sure to use face masks in public areas” are tagged as follows: [(‘Stop’, ‘NNP’), (‘panic’, ‘NN’), (‘buying’, ‘NN’), (‘and’, ‘CC’), (‘be’, ‘VB’), (‘sure’, ‘JJ’), (‘to’, ‘TO’), (‘use’, ‘VB’), (‘face’, ‘NN’), (‘masks’, ‘NNS’), (‘in’, ‘IN’), (‘public’, ‘JJ’), (‘areas’, ‘NNS’)].

3.3.4 Lemmatization

Next, each tagged token is lemmatized, that is, converted into its root word based on its part of speech. Prior to lemmatization, we converted the tokens to lowercase. Lemmatization is achieved by using the English vocabulary and conducting morphological analysis of words [55]. Hence, a root word is the dictionary form of the original word. By converting the tokens to their root form, we harmonized similar words while preserving their meaning. For instance, the verb forms “seen” and “sees” are both converted to their root form, “see.” Referring to our previous sample tagged tokens, the output of the lemmatization stage is: [(‘stop’, ‘NNP’), (‘panic’, ‘NN’), (‘buying’, ‘NN’), (‘and’, ‘CC’), (‘be’, ‘VB’), (‘sure’, ‘JJ’), (‘to’, ‘TO’), (‘use’, ‘VB’), (‘face’, ‘NN’), (‘mask’, ‘NNS’), (‘in’, ‘IN’), (‘public’, ‘JJ’), (‘area’, ‘NNS’)].

3.3.5 Chunking

Next, we created a chunker that uses the regular grammar defined above to match phrases comprising an optional determiner, followed by zero or more of any type of adjective, zero or more of any type of noun, zero or one of any type of verb, and an optional component. This component consists of an optional preposition, followed by an optional determiner, zero or more of any type of adjective, and zero or more of any type of noun. Using our previous example, the chunker produces the parse tree in Fig. 3, showing the key terms (KT) that match the grammar specified.

Fig. 3 A sample parse tree illustrating the output of the chunker

To generate the candidate keyphrases, we first converted the parse tree (or chunks) generated by the chunker for each document into CoNLL IOB format. An IOB (Inside-Outside-Beginning) tag specifies how a key term functions in the context of a phrase: whether the term begins (B-KT), is inside (I-KT), or is outside (O-KT or O) the phrase [49]. Next, we iteratively group terms that are part of a keyphrase (i.e., B-KT and I-KT) and stop when a term that does not belong to the keyphrase (i.e., O-KT or O) is encountered.

For example, the CoNLL IOB format of the parse tree in Fig. 3 gives [(‘stop’, ‘NNP’, ‘B-KT’), (‘panic’, ‘NN’, ‘I-KT’), (‘buying’, ‘NN’, ‘I-KT’), (‘and’, ‘CC’, ‘O’), (‘be’, ‘VB’, ‘B-KT’), (‘sure’, ‘JJ’, ‘I-KT’), (‘to’, ‘TO’, ‘O’), (‘use’, ‘VB’, ‘B-KT’), (‘face’, ‘NN’, ‘I-KT’), (‘mask’, ‘NNS’, ‘I-KT’), (‘in’, ‘IN’, ‘B-KT’), (‘public’, ‘JJ’, ‘I-KT’), (‘area’, ‘NNS’, ‘I-KT’)]. By iteratively grouping the B-KT and I-KT terms, the following keyphrases emerged: “stop panic buying,” “be sure,” and “use face mask in public area.”
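The grouping rule can be sketched in a few lines of Python. The function name `group_keyphrases` is hypothetical, and the sketch reflects our reading of the procedure described above: consecutive B-KT and I-KT terms are joined, and only an outside tag closes the current keyphrase.

```python
def group_keyphrases(iob_tagged):
    """Join consecutive B-KT/I-KT terms; an O (or O-KT) tag ends a keyphrase."""
    keyphrases, current = [], []
    for word, _pos, iob in iob_tagged:
        if iob in ("B-KT", "I-KT"):
            current.append(word)
        elif current:
            keyphrases.append(" ".join(current))
            current = []
    if current:  # flush the last keyphrase if the sentence ends inside one
        keyphrases.append(" ".join(current))
    return keyphrases


IOB_TAGGED = [
    ("stop", "NNP", "B-KT"), ("panic", "NN", "I-KT"), ("buying", "NN", "I-KT"),
    ("and", "CC", "O"), ("be", "VB", "B-KT"), ("sure", "JJ", "I-KT"),
    ("to", "TO", "O"), ("use", "VB", "B-KT"), ("face", "NN", "I-KT"),
    ("mask", "NNS", "I-KT"), ("in", "IN", "B-KT"), ("public", "JJ", "I-KT"),
    ("area", "NNS", "I-KT"),
]

print(group_keyphrases(IOB_TAGGED))
# ['stop panic buying', 'be sure', 'use face mask in public area']
```

Because a B-KT tag does not close the running group, the adjacent chunks “use face mask” and “in public area” merge into the single keyphrase reported in the example.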

3.3.6 Transformation and Filtering

In this stage, we first removed keyphrases that consist solely of stopwords (i.e., common words, such as about, the, from, there, had, and can) from our list of candidate keyphrases. Second, we removed selected stopwords from the start, end, and interior of keyphrases while preserving their meaning. For example, “be sure” is filtered out since “be” and “sure” are included in our pre-defined list of stopwords that can neither start nor end a keyphrase. Third, we removed keyphrases whose length exceeds ten. While previous research retained only keyphrases up to length six [39], we extended our threshold to ten to avoid losing important keyphrases that would enrich insights from this research.
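A simplified sketch of this stage is shown below. The stopword set is a hypothetical stand-in for our pre-defined list, the length is assumed to be counted in words after trimming, and removal of stopwords from the interior of a keyphrase is left out for brevity.

```python
# Hypothetical stopword list for illustration; not the list used in this work.
STOPWORDS = {"about", "the", "from", "there", "had", "can", "be", "sure", "a", "an", "to"}
MAX_LENGTH = 10  # keyphrases longer than ten words are dropped (assumed unit: words)


def filter_keyphrase(keyphrase):
    """Return the trimmed keyphrase, or None if it should be discarded."""
    words = keyphrase.split()
    # Trim stopwords that may not start a keyphrase.
    while words and words[0] in STOPWORDS:
        words.pop(0)
    # Trim stopwords that may not end a keyphrase.
    while words and words[-1] in STOPWORDS:
        words.pop()
    # Discard keyphrases that are now empty or exceed the length threshold.
    if not words or len(words) > MAX_LENGTH:
        return None
    return " ".join(words)


candidates = ["stop panic buying", "be sure", "the death toll rise"]
kept = [k for k in map(filter_keyphrase, candidates) if k is not None]
print(kept)
# ['stop panic buying', 'death toll rise']
```

Here “be sure” disappears because nothing remains once its leading and trailing stopwords are trimmed, which mirrors the example given in the text.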

3.3.7 Sentiment Scoring and Filtering

In line with our objective to keep only opinionated candidate keyphrases (i.e., keyphrases with “negative” or “positive” sentiment polarity [56]), we assigned a sentiment score Ss ranging from −1 to +1 to each keyphrase using the popular VADER (Valence Aware Dictionary for sEntiment Reasoning) lexicon-based algorithm [57]. Afterwards, we filtered out non-opinionated or “neutral” keyphrases using the criteria summarized in Table 2. For example, the Ss values for “stop panic buying” and “use face mask in public area” are −0.6705 and 0.1027, respectively; hence, both are retained since they are opinionated. The neutral range, from −0.05 to +0.05, is based on the outcome of the experiments conducted by Hutto and Gilbert [57].

Table 2 Criteria for sentiment classification
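The filtering rule can be illustrated as follows. The sketch does not call VADER; it takes precomputed scores as input. The first two scores are those quoted above, the score for “coronavirus news” is a hypothetical value chosen to show a neutral case, and treating the ±0.05 boundaries as inclusive follows the usual VADER convention, which we assume matches Table 2.

```python
NEUTRAL_BAND = 0.05  # neutral range of -0.05 to +0.05, following Hutto and Gilbert


def classify(score):
    """Map a sentiment score in [-1, 1] to a polarity label."""
    if score >= NEUTRAL_BAND:
        return "positive"
    if score <= -NEUTRAL_BAND:
        return "negative"
    return "neutral"


scores = {
    "stop panic buying": -0.6705,             # score reported in the text
    "use face mask in public area": 0.1027,   # score reported in the text
    "coronavirus news": 0.0,                  # hypothetical neutral example
}

# Keep only opinionated keyphrases, i.e. drop those labeled neutral.
opinionated = {k: classify(s) for k, s in scores.items() if classify(s) != "neutral"}
print(opinionated)
# {'stop panic buying': 'negative', 'use face mask in public area': 'positive'}
```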

3.4 Categorizing Keyphrases

Next, the final opinionated keyphrases were manually categorized into broader themes (an approach also used by Bekhuis et al. [33] to categorize phrases) by four reviewers. The reviewers were divided into two teams—T1 and T2. T1 consists of two reviewers who were tasked with grouping the negative keyphrases, while T2 comprises the two other reviewers who grouped the positive keyphrases.

Each reviewer independently and iteratively examined the keyphrases and continued to categorize them until no new category emerged due to saturation. The reviewers used coding sheets to record the category assigned to each keyphrase after examining it. Each reviewer determined the appropriate category names and created a new category whenever none of the existing categories matched the keyphrase being examined. Moreover, a keyphrase was assigned to only one category, the most appropriate one, since keyphrases are more specific than comments. Next, the reviewers in each team validated each other’s work by agreeing or disagreeing with the category mapped to each keyphrase and offered suggestions for every disagreement. Finally, each team applied the suggestions and ensured that all category names were unique while harmonizing similar categories. To measure interrater reliability between reviewers in each team, we used the percentage agreement metric [58]. The percentage agreement score between reviewers in T1 was 98%, while that of T2 was 99.3%.
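Percentage agreement is simply the share of items on which two reviewers assign the same category. A minimal sketch follows; the function name and the five example labels are hypothetical and are not drawn from our coding sheets.

```python
def percentage_agreement(labels_a, labels_b):
    """Percentage of items to which two reviewers assigned the same category."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both reviewers must rate the same items")
    if not labels_a:
        raise ValueError("at least one rated item is required")
    agreed = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * agreed / len(labels_a)


# Hypothetical category assignments for five keyphrases.
reviewer_1 = ["Misinformation", "Economic Crisis", "Misinformation", "Poor governance", "Financial issues"]
reviewer_2 = ["Misinformation", "Economic Crisis", "Misinformation", "Political influence", "Financial issues"]

print(percentage_agreement(reviewer_1, reviewer_2))
# 80.0
```

Unlike chance-corrected statistics such as Cohen's kappa, this metric does not adjust for agreement expected by chance, which should be kept in mind when interpreting the scores above.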

4 Results

In this section, we present our experimental results including keyphrase categorization. From our large corpus, a total of 427,875 negative and 520,685 positive keyphrases were autogenerated.

4.1 Negative Keyphrases

Our results showed that death is the most dominant keyphrase (n = 10,187), followed by die (n = 7,240), fight (n = 5,891), bad (n = 3,808), kill (n = 3,668), lose (n = 3,631), pay (n = 3,486), leave (n = 3,234), crisis (n = 2,783), hard (n = 2,720), worry (n = 2,476), sick (n = 2,314), sad (n = 2,129), etc. Other keyphrases include self isolation, difficult time, life at risk, death toll rise, conspiracy theory, become infected, spread misinformation, panic buy, lack of leadership, no social distancing, travel restriction, spread fake news, in time of uncertainty, public health emergency, biological weapon, desperate time call for desperate measure, contagious disease, hospital overwhelm, take advantage of crisis, suffer from underlie medical condition, and so on.

Figure 4 shows some of the negative keyphrases and their corresponding category and dominance (as indicated by the bubble size). For example, under the “Economic Crisis” category, recession is the most dominant keyphrase, followed by economic crisis, destroy economy, and crash economy. On the other hand, hoax is the most dominant keyphrase under the “Misinformation” category followed by fake news, while unemployment is the most dominant keyphrase under the “Job & Business issues” category followed by lose job.

Fig. 4 Sample negative keyphrases and their frequency of occurrence (a larger bubble size illustrates higher dominance)

4.2 Positive Keyphrases

For positive keyphrases, our results showed that the most dominant keyphrase is help (n = 18,498), followed by hope (n = 7,708), protect (n = 7,130), love (n = 6,895), support (n = 6,198), good (n = 5,740), share (n = 5,187), care (n = 4,917), and stay safe (n = 4,917). Other keyphrases include stay healthy, gratitude, relief fund, help slow spread, solidarity, ask for friend, encourage people, stay calm, great initiative, fresh air, use hand sanitizer, artificial intelligence, support business, keep safe distance, practice good hygiene, pray at home, play video game, use defense production act, protect public health, encourage social distancing, free webinar, and so on.

Figure 5 shows some of the positive keyphrases and their associated category and dominance (as indicated by the bubble size). Under the “Public awareness” category, stay safe is the most dominant keyphrase, followed by stay home stay safe, wash hand, and ensure social distancing. On the other hand, relief fund is the most dominant keyphrase under the “Charity” category, while gratitude is the most dominant keyphrase under the “Gratitude” category followed by appreciate effort, show appreciation, thank doctor, and thank healthcare worker.

Fig. 5 Sample positive keyphrases and their frequency of occurrence (a larger bubble size illustrates higher dominance)

4.3 Keyphrase Categories

Since the majority of the keyphrases were similar, reviewers reached a saturation point where no new categories were emerging. As a result, a total of 18,694 negative and 19,841 positive keyphrases were categorized. In terms of content coverage, the categorized keyphrases spanned 104,619 unique comments.

After grouping related keyphrases into categories or broader themes using the thematic analysis method described in “Sect. 3.4,” 34 negative and 20 positive categories emerged. We refer to these categories as “themes,” and the keyphrases under each category as “subthemes” in the remaining parts of this paper. The 34 negative themes were further distributed into health-related issues, economic issues, psychosocial issues, socio-political issues, social issues, educational issues, and political issues. In this paper, we focused on 17 negative themes mapped to economic, socio-political, educational, and political issues (see Table 3 and Fig. 6). Other issues (i.e., health-related, psychosocial, and social issues) have been discussed in our previous work [30]. As shown in Fig. 7, the top 5 negative themes based on number of user comments are Concerns about social distancing and isolation policies (n = 8,872), followed by Misinformation (n = 2,223), Political influence (n = 1,640), Financial issues (n = 1,622), and Poor governance (n = 1,559). Figure 6 shows the number of subthemes under each theme.

Fig. 6 The chart shows negative themes and the corresponding number of subthemes

Table 3 Negative themes, description, and sample comments

Fig. 7 The chart shows the total number of user comments associated with each negative theme

Furthermore, Table 4 shows the 20 positive themes and sample comment(s) for each theme, while Figs. 8 and 9 show the positive themes and the corresponding number of subthemes and comments, respectively. Based on the number of comments, Public awareness (n = 22,749) emerged as the top theme, followed by Spiritual support (n = 12,130) and Encouragement (n = 5,244). Other themes include Charity (n = 942), Entertainment (n = 798), Gratitude (n = 758), Development of curative solutions or treatments (n = 653), Advocacy for increased testing (n = 341), Physical activity (n = 285), Cleaner environment (n = 278), etc.

Table 4 Positive themes, description, and sample comments

Fig. 8 The chart shows positive themes and the corresponding number of subthemes

Fig. 9 The chart shows the total number of user comments associated with each positive theme

By identifying both negative and positive themes, we have answered the first two research questions—RQ1 and RQ2—respectively.

Finally, we randomly selected 100 comments from our original corpus to examine whether they can be categorized into existing themes. Our results show that 82% (n = 82/100) of the comments were successfully mapped to appropriate themes. The remaining 18 unmapped comments either contain keyphrases that are not opinionated (i.e., neither positive nor negative) or are unrelated to COVID-19 pandemic issues, for example, “Update: Coronavirus news, at a glance” [C39], “Worship for the 5th Sunday in Lent, from St. Martin’s…” [C79], “23:59 For more information, please check MOH’s announcement…” [C31], and “Beef stew, bread, butter and a Red Bull. Another day…” [C94].

5 Discussion

Our results revealed various negative and positive themes representing public opinions about the pandemic, as well as impact of COVID-19 on people and institutions in line with the factors affecting efforts to limit the spread of the disease either negatively or positively. To answer our third research question (RQ3), we first discuss the negative issues (see Table 3) and then suggest interventions based on the positive themes (see Table 4) and research evidence.

5.1 Negative Issues Regarding COVID-19 Pandemic

5.1.1 Economic Issues

Based on our findings, the COVID-19 pandemic led to unemployment, low revenue or losses for businesses, low supply of essential items, challenging living conditions, economic downturn, and financial crisis.

Job- and Business-Related Crisis

In line with our findings (see sample comments below), research shows that the pandemic triggered a massive global unemployment crisis [59,60,61] in which people are losing jobs or are unable to find one. This is due to lockdowns and reduced consumer spending, which led to businesses and companies experiencing low revenue and losses, with many nearing bankruptcy, shutting down temporarily, or likely to go out of business [62,63,64,65].

“...job layoffs are soaring faster than any time in recorded history...This looks bad and it is bad. The worst jobless claims in U.S. history means the economy has fallen into the abyss.” [C9100]

“My job is shutdown; my husband job is shutdown...How am I supposed to pull this off? There is NO income. We have 4 children including an 8-week-old baby. I need help NOW.” [C7119]

Economic Downturn

Based on our findings, the pandemic pushed global economies toward recession as stock market indices crashed, as shown in the sample comment below. Evidence shows that the COVID-19 pandemic negatively impacted stock markets more forcefully than any other disease outbreak in history [66]. For example, primary sectors (e.g., agriculture and petroleum and oil), secondary sectors (e.g., manufacturing), and other sectors (e.g., finance, food, real estate, tourism, and transportation sectors) driving stock market indices experienced various challenges (such as supply chain disruption, revenue crash, and transaction halt) compounded by lockdown and social isolation policies aimed at curbing the spread of COVID-19 [67].

“Our 250 economists have updated our global forecasts. Coronavirus will inflict a short, sharp global recession. We expect 2020 world growth to drop to zero. In Q1, we see the global economy shrinking faster than in the financial crisis” [C10002]

Shortage of Essential Items

People lamented shortages of food items, toiletries including hand sanitizers, and personal protective equipment (e.g., face masks and protective gear and garments) necessary to prevent contracting the disease. In addition, public health centres and hospitals experienced shortages of testing kits and ventilators, which hampered efforts to identify COVID-19 cases and keep patients alive. Also, blood shortages were reported in blood banks, and lockdown measures may prevent many people from donating blood. Our findings (see sample comments below) align with research which confirms critical supply shortages of the items highlighted above [68,69,70,71,72].

“U.S. cities have acute shortages of masks, test kits, ventilators as they face coronavirus threat” [C11119]

“Acute shortage of blood in the blood banks...Blood donations needed during & after coronavirus pandemic” [C7999]

“Is anybody else having a food shortage in their grocery stores? My hometown stores are about completely empty.” [C4442]

Challenging Living Condition and Financial Issues

As shown in the comments below, people experienced difficulty providing for their families or meeting their needs such as paying bills (e.g., rent, mortgage installment, credit card payment, and phone bill) and buying sufficient food, as a result of job losses and the strict lockdowns which impose financial hardship on people, including owners of small businesses (such as restaurants and cosmetics shops) and hustlers. Many organizations are unable to pay employees’ full salary due to financial constraints caused by the pandemic and resort to half salaries or job cuts (see [C621]). Research has shown that households experienced food insecurity as a result of poor financial status caused by the COVID-19 pandemic [73].

“Please help us. Lost my job due to coronavirus shutting down my workplace. I have no income...no rent.” [C9991]

“...there are a heartbreaking number of hungry Americans posting their Venmo’s and asking for help in this thread...” [C12883]

“Today my company instituted across-the-board pay cuts of 10-30%, and canceled merit increases, bonuses, and 401(k) matches...” [C621]

Flight Cancellations

Based on our findings, people lamented the sudden cancellation of flights and difficulty in getting refunds from affected airlines, as shown in the sample comment below. These cancellations are due to border closures and travel bans imposed by governments of many countries to curtail the importation of COVID-19; however, such actions inflicted much pain and distress on stranded passengers, as well as financial losses on the airlines [74].

“Very disappointed how Etihad is handling COVID19. Not only did they cancel all flights, but it is legally impossible for me to travel given the travel bans. Instead of refunding my money, I am getting credit that has restrictions to re-book by Sept. How is this fair?” [C144]

5.1.2 Socio-political Issues

Concerns About Social Distancing and Isolation Policies

Our findings revealed public concerns over lockdown, social distancing, and isolation policies irrespective of their perceived benefits. Some of the concerns include (i) people disrespectfully snubbing those who are not 6 feet away from them; (ii) social distancing/lockdown without financial support; (iii) implementing isolation/quarantine policies that contradict the World Health Organization’s advice; (iv) human rights violation; (v) weak enforcement of lockdown policy; (vi) reliance on self-isolation without aggressive testing; (vii) ineffectiveness of isolation/lockdown in slums; (viii) devastating effect of social isolation on domestic violence victims; (ix) millions stranded due to lockdown and struggling to get food and water; and (x) spike in anxiety and depression cases after lockdown announcement. Sample comments are shown below:

“...but of societal norms; so much so that there are now actual social distancing snobs who look down on you if you’re less than 6 feet away. Will coronavirus kill all our humanity too?” [C7116]

“This is exactly what I am afraid of since lockdown...exploiting the crisis to strip us of our rights.” [C909]

Controversy over Precautionary Measures

Based on our findings, precautionary measures such as wearing face masks and gloves generated controversy. For example, some people think N95 masks with a valve can aid the spread of the virus from infected patients (see sample comment below), while others are concerned about the stigma attached to wearing masks.

“They tell the infected to wear a N95 mask which is 95 effective with no oils...half the masks have a one way valve for the exhale which is unfiltered. They are trying to kill everyone.” [C193]

Lack of Preparedness and Protests

Furthermore, people highlighted the lack of preparedness on the part of governments and health systems as a factor aiding the spread of the disease. Evidence shows that many countries and health authorities failed to rapidly perceive the threat posed by COVID-19 [75, 76], thereby allowing it to escalate into a pandemic that imposes hardship on the world’s population. It is therefore unsurprising that protests have occurred in several countries: health workers and other essential workers protesting the shortage of protective equipment, citizens of developing countries protesting the lack of food and electricity during the lockdown, essential workers requesting hazard pay during the pandemic, and citizens protesting their governments’ inaction in protecting them from COVID-19. Below are sample comments:

“Another nurses protest calling attention to the shortage of protective equipment, and rationing policies by hospitals.” [C299]

“This is what is happening in Chile. People protesting because President…is not taking the correct responsibility on Covid_19. We need national quarantine!!” [C10384]

“We are hungry, no food no light - you cannot tell us to stay indoors. Nigerians in parts of Lagos already threatening to defy government lockdown directives with a protest in two days” [C8886]

Risk of Spread at Detention Centres

Moreover, people raised concerns about the risk of COVID-19 spreading in prisons and the need for decongestion, as shown in the sample comment below. Evidence already shows that incarcerated populations are vulnerable to infectious diseases, including COVID-19, due to unavoidable close contact (since prisons or detention centres are often overcrowded and poorly ventilated/sanitized) and poor healthcare access [77, 78].

“As cases increase in Texas, so do concerns about the well-being of people in Texas prisons and jails.” [C1555]

5.1.3 Educational Issues

Disruption in Education

Our findings revealed the disruptive effect of the COVID-19 pandemic on education globally, such as school closures. People are concerned about their children’s or wards’ education, including the cost implications of the virtual learning put in place by schools, as well as about children in rural areas who would be deprived of learning. This aligns with evidence highlighting the effect of school closures on 80% of children worldwide, and the worsened inequalities in educational outcomes between children in lower- and higher-income countries [79].

“There is still NO date for schools to reopen in the Capital.” [C4440]

“You are putting the most disadvantaged students at a further educational disadvantage...” [C91]

“What kind of education are we getting? We haven’t paid such high fees for ZOOM kind of education. Our online education system is so sick and badly affecting our grades and that’s totally unfair!! 90k for this kind of education is way too worthy!” [C181]

Knowledge Gap

Furthermore, based on our findings, a knowledge gap (in the form of ignorance and lack of intelligence) on the part of leadership and society is another factor hampering the containment of COVID-19. As shown in the comments below, authorities lack knowledge of what should be done, while people are ill-informed due to limited access to accurate and coherent information about the disease and preventive/control measures.

“Nursing, a primarily female profession, is under attack. The CDC in ignorance says wear a bandana. THIS IS NOT PROTECTION! Lives at stake! Hospitals have their heads in the sand. Please! Can you hear us?” [C818]

“Some neighbors even want us out because they think they would breathe this virus in the air and we’re inside our own hostel! This is dangerous ignorance!” [C6000]

Misinformation

The proliferation of misinformation is impeding access to accurate information about COVID-19 that could have helped curb the spread of the disease and save lives. Misinformation, which refers to false information or information with limited or no scientific evidence, is one of the top 5 issues that emerged in our findings and has also been reported by previous research [80,81,82,83,84,85]. The sample comment below reveals public concerns about misinformation regarding COVID-19:

“The amount of fake news on all the health concerns regarding COVID19 is shocking. Only person I trust with info is my cousin who is a doctor. She has just told me it DOESN’T last in the air, as long as you are two metres away and sneeze into a tissue you are fine!!” [C12010]

5.1.4 Political Issues

Elected governments and political appointees are central to the decision-making and governance that should improve people’s standard of living and assure their health and safety. However, based on our findings, people are concerned about widespread interference in COVID-19-related matters for political gain. In addition, they are concerned about the absence of strong leadership in the wake of the pandemic, and the poor state (or lack) of key public infrastructure (e.g., electricity, water, internet, and healthcare facilities/centres). Research has shown that political beliefs and partisanship pose a significant limitation on the effectiveness of preventive measures (such as social distancing) [86,87,88]. The sample comments below reveal public opinion regarding these issues:

“Quite possibly the worst governor in the country. He’s hurting not only his own citizens but all Americans, as all the people on spring break and theme parks go home with covid19.” [C5777]

“When authorities and armed forces asked you to self-quarantine with no internet and no electricity (16 hours load shedding) in Hunza ...?!” [C108]

5.2 Interventions for Addressing the Negative Issues

In this section, we suggest interventions that can help address the negative issues while drawing insights from the positive themes and relevant research evidence. This answers our third research question (RQ3).

To cushion the effect of economic issues on people, “charity” and “grassroots support” are important factors, as revealed in our findings. Mobile technology can play a significant role in ensuring effective distribution of relief items. For example, GPS-enabled and multilingual mobile apps can help people easily find the food banks nearest to them. Moreover, government-funded or non-governmental charity organizations responsible for distributing economic relief can expand their reach and make delivery decisions based on data collected through these apps. For instance, people can use the apps to indicate their needs and other information, such as location, age group, health condition, and whether they are in self-isolation (due to exposure to COVID-19). These apps can also be used to onboard volunteers who want to offer financial and material assistance and to connect them with those in need. The data collected through these apps can also be analyzed using artificial intelligence (AI) techniques (such as machine learning or deep learning) to predict the communities that are in dire need of assistance. Besides technology usage, governments can budget for additional measures to protect the finances of people and businesses, such as keeping people employed through financial partnerships with employers, providing stimulus packages, and facilitating quick employment for the jobless [89]. Evidence shows that governments of some countries are adopting these measures to varying degrees [90].
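As an illustration of the food bank lookup idea, the following minimal sketch ranks food banks by great-circle (haversine) distance from a user's GPS position. It is not part of our study: the `FoodBank` record, the function names, and the coordinates in the usage note are hypothetical, and a real app would likely rely on a mapping service and travel distance rather than straight-line distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass(frozen=True)
class FoodBank:
    """Hypothetical record for a relief distribution point."""
    name: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_food_banks(user_lat, user_lon, banks, k=3):
    """Return up to k food banks, closest first (straight-line distance)."""
    return sorted(
        banks,
        key=lambda b: haversine_km(user_lat, user_lon, b.lat, b.lon),
    )[:k]
```

For example, with made-up points `FoodBank("A", 0.0, 0.0)`, `FoodBank("B", 0.0, 1.0)`, and `FoodBank("C", 0.0, 5.0)`, a user at latitude 0.0 and longitude 0.9 would be shown B first, then A.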

Regarding the shortage of items that protect people from the pandemic (such as face masks and hand sanitizers), a “homemade protective equipment” approach can be employed as a viable alternative, as revealed in our findings. Evidence shows that homemade masks, for example, can offer protection from COVID-19 transmission when medical masks are not available [91]. To address supply chain issues with respect to high-demand and essential products, research suggests recovery strategies (such as an increase in production shifts, use of spare capacity, emergency sourcing, bolstering capacity locally, and collaboration with supply chain partners) [92, 93].

Regarding concerns about social distancing and isolation policies imposed by governments, as well as controversies over precautionary measures suggested by health professionals, “public awareness” is a major and useful tool for addressing these issues, including misinformation, as revealed in our findings. Providing timely and accurate COVID-19-related information to people, and connecting them to evidence-based resources and health professionals to resolve their questions or confusion, can be lifesaving. To reach a wider audience on a personalized basis, mobile- and voice-enabled chatbots equipped with real-time access to evidence-based and validated resources (such as safety measures approved by the World Health Organization, as well as government-approved policies or guidelines) can be developed so that people can interact with the chatbots (in their own language) using their smartphones at any time. Difficult questions can be automatically channeled to health experts for responses within the same chat window. For those with traditional cellular phones, governments and local health agencies can partner with telecom firms to deliver COVID-19-related information directly to people’s phones through the short message service (SMS) at regular intervals. In addition, official COVID-19-related channels on social media (such as [94]) supervised by health experts and local/international health organizations can provide accurate and frequent updates.
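The routing of difficult questions to health experts could work roughly as in the sketch below. This is a simplified illustration under our own assumptions, not an implemented system: the knowledge base, the word-overlap (Jaccard) similarity, the 0.5 threshold, and the function names are all hypothetical, and a deployed chatbot would need multilingual language understanding and expert-reviewed content.

```python
import re


def tokenize(text):
    """Lowercase a question and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def route_question(question, validated_answers, threshold=0.5):
    """Answer from validated resources when a close match exists.

    validated_answers maps a known question to its vetted answer text
    (hypothetical structure). Returns ("bot", answer) for a confident
    match, otherwise ("expert", None) so that the question can be
    forwarded to a health expert in the same chat window.
    """
    tokens = tokenize(question)
    best_score, best_answer = 0.0, None
    for known_question, answer in validated_answers.items():
        score = jaccard(tokens, tokenize(known_question))
        if score > best_score:
            best_score, best_answer = score, answer
    if best_score >= threshold:
        return ("bot", best_answer)
    return ("expert", None)
```

The threshold controls the trade-off: a higher value sends more questions to experts and lowers the risk of the chatbot returning an answer to a question it only loosely matches.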

Regarding educational disruptions due to COVID-19, evidence shows that digital technologies are pedagogical tools that can enhance diverse forms of learning both within and outside the school environment [95]. Based on our findings, “online learning” (also called e-learning, virtual learning, virtual classroom, digital classroom, or distance learning) will help mitigate the impact of educational disruptions caused by the COVID-19 pandemic. While it may not be as effective as in-class learning in some cases, it will prevent the potential brain drain that may result from the absence of continuous learning. Mobile and web-based learning platforms, many of which are available today, should be readily accessible in schools at all levels going forward. Designers should ensure that these tools provide a personalized learning experience, such that students can manage their own content and the tools offer tailored suggestions that fit their interests or needs. The tools should also support collaborative learning, where students can work together on assignments, projects, or other tasks similar to what they do in the real world. Furthermore, governments across the globe should ensure equitable access to these educational technologies irrespective of economic, financial, racial, or cultural differences. Public infrastructure supporting these technologies (such as stable electricity, as well as affordable and reliable internet) should be considered a top priority and made readily accessible to people.

Finally, governments at all levels should partner (rather than compete) with health professionals and researchers to form a strong force against COVID-19. Our findings revealed “advocacy for testing”, which reflects the public call for increased testing, since some governments are still struggling in this area because their political interests supersede public health. Research argues for the significance and effectiveness of adaptive evidence-making intervention (a fusion of scientific evidence and policy) during public health emergencies (such as the COVID-19 pandemic) [96]. This is only possible if political leaders and health experts align and work harmoniously to address current and future pandemics.

6 Conclusion and Future Work

We explored the impact of the COVID-19 pandemic on people globally using social media data. We analyzed over 1 million comments obtained from six social media platforms using a seven-stage context-aware natural language processing (NLP) approach to extract relevant keyphrases which were further categorized into broader themes. Our results revealed 34 negative themes and 20 positive themes surrounding the COVID-19 pandemic. We discussed the economic, socio-political, educational, and political issues and suggested interventions to tackle them based on the positive themes and research evidence. These interventions would inform and help governments, organizations, and individuals to minimize the spread and impact of COVID-19 and to respond effectively to future pandemics.

As part of future work, we plan to use the data generated from this work to train, evaluate, and compare machine learning models that detect the themes and corresponding sentiment of social media comments related to COVID-19 in real time. The best-performing model(s) could be integrated with applications and visualization tools to provide useful and personalized features/interventions, as well as to uncover real-time insights that could help curb the spread of the virus and mitigate its impact on society.