COVID-19 Pandemic: Identifying Key Issues Using Social Media and Natural Language Processing

Oyebode, Oladapo; Ndulue, Chinenye; Mulchandani, Dinesh; Suruliraj, Banuchitra; Adib, Ashfaq; Orji, Fidelia Anulika; Milios, Evangelos; Matwin, Stan; Orji, Rita

doi:10.1007/s41666-021-00111-w

Research Article
Published: 11 February 2022

Volume 6, pages 174–207, (2022)
Journal of Healthcare Informatics Research

Oladapo Oyebode ORCID: orcid.org/0000-0002-5797-7790¹,
Chinenye Ndulue¹,
Dinesh Mulchandani¹,
Banuchitra Suruliraj¹,
Ashfaq Adib¹,
Fidelia Anulika Orji²,
Evangelos Milios¹,
Stan Matwin¹,³ &
…
Rita Orji ORCID: orcid.org/0000-0001-6152-8034¹

Abstract

The COVID-19 pandemic has affected people’s lives in many ways. Social media data can reveal public perceptions and experience with respect to the pandemic, and also reveal factors that hamper or support efforts to curb global spread of the disease. In this paper, we analyzed COVID-19-related comments collected from six social media platforms using natural language processing (NLP) techniques. We identified relevant opinionated keyphrases and their respective sentiment polarity (negative or positive) from over 1 million randomly selected comments, and then categorized them into broader themes using thematic analysis. Our results uncover 34 negative themes out of which 17 are economic, socio-political, educational, and political issues. Twenty (20) positive themes were also identified. We discuss the negative issues and suggest interventions to tackle them based on the positive themes and research evidence.

1 Introduction

Emerging infectious diseases are responsible for many deaths and disabilities globally [1]. Evidence shows that at least 43 million people contracted the H1N1 flu worldwide within 12 months of the pandemic which, in turn, resulted in over 200,000 deaths [2, 3]. In addition, 770,000 HIV/AIDS-related deaths were reported in 2018 alone, with over 37 million people infected globally [4]. The latest emerging infectious disease, COVID-19 [5, 6], has already infected over 89.5 million people worldwide, with a mortality of at least 1.9 million as of January 9, 2021 [7]. Emerging infectious diseases have also been shown to inflict significant burden on economies and public health systems [8,9,10]. For example, global health systems are struggling to cope with the COVID-19 pandemic, while unemployment/job losses, reduced income/productivity, and business closures are prevalent among individuals and organizations due to the lockdown measures imposed by governments. To understand public perceptions toward the pandemic, social media data can provide the required insights from a global perspective [11].

Social media has been a major and rich data source for research in many domains, including health, due to its 3.8 billion active users [12] from diverse geographic locations across the globe. For instance, researchers analyzed user comments extracted from social media platforms (such as Facebook, Twitter, Instagram, and discussion forums) to uncover insights about health-related issues (e.g., mental health [13, 14], substance use [15, 16], and diseases [17,18,19,20]), political issues (e.g., elections [21,22,23,24]), and business-related issues (e.g., customer engagement [25, 26]). With respect to COVID-19, social media comments can reveal public opinions about governments and health organizations’ response to the pandemic, as well as economic, health, social, political, physical, and psychological impact of COVID-19 on global populations in line with the factors affecting efforts to limit the spread of the disease either negatively or positively.

In this paper, we apply natural language processing (NLP) to analyze COVID-19-related comments from six social media platforms (Twitter, Facebook, YouTube, and three online discussion forums) to uncover issues surrounding the pandemic based on public perceptions. NLP is a widely used method for extracting insights from unstructured texts, such as social media data and clinical texts (e.g., electronic health records [27] and patient journals [28]). We aim to answer the following research questions in this work:

RQ1: What are the negative issues (economic, socio-political, educational, and political issues) shared by people on social media with respect to the COVID-19 pandemic?
RQ2: What are the positive opinions or perceptions of people with respect to COVID-19 and how it is being handled?
RQ3: How can the negative issues be addressed using insights from the positive opinions and other research evidence?

The methodological approach used to answer our research questions is as follows:

1) We applied an NLP approach for detecting relevant and opinionated keyphrases from social media comments related to the COVID-19 pandemic. To extract meaningful keyphrases, our approach considers the context in which words appear in the unstructured comments.
2) We identified negative and positive themes that capture public opinions about the pandemic. Our results reveal 34 negative themes, of which 17 are economic, socio-political, educational, and political issues. Twenty (20) positive themes were also identified.
3) We suggest interventions to tackle the negative issues. The interventions, which are based on the positive themes and research evidence, would inform and help governments and relevant agencies, as well as individuals, to minimize the spread and impact of COVID-19 and to respond effectively to future pandemics.

2 Related Work

Over the years, social media has been a rich source of data for health informatics research [29]. Natural language processing (NLP) techniques have been widely used for analyzing social media comments and clinical texts (such as identifying health-related and psychosocial issues with respect to the COVID-19 pandemic [30]).

The lexicon-based NLP technique was used to detect the prevalence of keywords indicating public interests in e-cigarette, marijuana, influenza, and Ebola using social media data, while latent Dirichlet allocation (LDA) technique was used to retrieve topics from the corpus [31]. LDA has also been utilized to extract latent topics from COVID-19-related comments posted on social media [32]. Also, the Natural Language Toolkit (NLTK) was used by Bekhuis et al. [33] to identify top collocated n-grams (bigrams and trigrams) from clinical emails.

Furthermore, a custom topic modeling technique, called Ailment Topic Aspect Model, was employed to generate latent topics from Twitter data with the aim of identifying mentions of ailments of interest, including allergies, obesity, and insomnia [34]. Non-negative matrix factorization is another topic modeling technique used in health informatics research to extract topics from social media data [35]. A third-party tool for text mining, called KH-Coder, has also been used to explore potential topics related to H1N1-related advice, vaccine, and antiviral uptake in the UK based on Twitter data [36]. Machine learning-based NLP was used to analyze unstructured clinical notes to predict hospital readmissions for COPD patients [37] and to perform sentiment analysis of user comments on mental health apps [38]. None of the techniques above considered the context in which words appear in unstructured texts, even though doing so can yield more meaningful and relevant keyphrases.

To demonstrate the significance of context-based text analysis, Dave and Varma conducted experiments to compare N-Gram chunking technique and the part-of-speech (POS) chunking technique [39]. Rather than just extracting n-grams, the POS chunking method considers context of words by using regular grammars or POS patterns that specify how sentences should be deconstructed into keyphrases of interest. Their results show that systems using the POS chunking technique extracted relevant features (keyphrases) and outperformed systems adopting N-Gram chunking for feature extraction. We extend this approach with enhanced part-of-speech (POS) patterns tailored to our goal, chunking and CoNLL IOB tagging, as well as keyphrase transformation and sentiment scoring. We further categorized the extracted keyphrases into broader themes using the thematic analysis method.

3 Methodology

Based on our research questions, the goal of this paper is to investigate and reflect on people’s personal experiences and opinions with respect to the COVID-19 pandemic using social media data. To achieve this, we utilize the following well-established computational techniques:

1) We developed programs or scripts to mine user comments related to COVID-19 from six social media platforms.
2) We preprocessed the data using NLP techniques.
3) We applied a seven-stage context-aware NLP approach to identify opinionated and meaningful keyphrases from the comments.
4) We applied thematic analysis to iteratively categorize related keyphrases identified in step 3 above into broader themes or categories.

Figure 1 shows the NLP pipeline utilized in extracting opinionated keyphrases from comments related to COVID-19 pandemic.

3.1 Data Collection

A total of 47,410,795 comments related to COVID-19 were collected across six social media platforms (i.e., Twitter, YouTube, Facebook, PushSquare.com, Archinect.com, and LiveScience.com), as described below:

1) Twitter: We built a console application to mine 47,249,973 tweets in real-time using the Twitter Streaming API [40] and C# programming language. The program targets tweets from the following hashtags: #COVID19, #COVID, #ncov2019, #Covid_19, #StopTheSpread, #CoronaVirusUpdates, #StayAtHome, #selfquarantine, #COVID-19, #COVIDー19, #CoronaCrisis, #panicbuying, #caronavirusoutbreak, #SocialDistancing, #cronovirus, #CoronaVirusUpdate, #QuarantineLife, #Quarantined, #pandemic, #CoronavirusPandemic, #Coronavid19, #coronapocalypse, #QuarantineAndChill, #CoronaVirus, #MyPandemicSurvivalPlan, and #CoronavirusOutbreak.
2) YouTube: We wrote a Python script to automatically extract 111,722 user comments linked to 2,939 COVID-19-related videos using the YouTube Data API [41]. The keywords used for the video search include covid-19, covid19, and coronavirus.
3) Facebook: We adopted a semi-automatic technique to extract comments due to search restrictions imposed by Facebook. We first obtained 91 groups and 68 pages related to COVID-19 manually using the following keywords: COVID, COVID-19, and Coronavirus. Afterwards, we developed a Python script to retrieve 8,382 and 777 comments from the pages and groups, respectively.
4) Discussion forums: We collected 18,401, 20,747, and 793 user comments from COVID-19-related threads on PushSquare.com [42], Archinect.com [43, 44], and LiveScience.com [45], respectively, using Python scripts.
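As a quick consistency check, the per-platform counts listed above can be summed and compared with the reported total. The sketch below only restates the numbers given in this section; the dictionary keys are illustrative labels, not identifiers from the authors' scripts.

```python
# Per-platform comment counts as reported in Sect. 3.1.
# The keys are illustrative labels chosen for this sketch.
COMMENT_COUNTS = {
    "twitter": 47_249_973,
    "youtube": 111_722,
    "facebook_pages": 8_382,
    "facebook_groups": 777,
    "pushsquare": 18_401,
    "archinect": 20_747,
    "livescience": 793,
}

total_comments = sum(COMMENT_COUNTS.values())
twitter_share = COMMENT_COUNTS["twitter"] / total_comments

print(f"total comments: {total_comments:,}")
print(f"share from Twitter: {twitter_share:.2%}")
```

The sum agrees with the 47,410,795 comments stated above, and it makes explicit that the raw collection is dominated by tweets.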

3.2 Data Preprocessing

To clean the data and prepare it for keyphrase extraction, we apply the following preprocessing steps using NLP techniques implemented in Python:

1) Remove mentions, URLs, and hashtags
2) Expand contractions (such as replacing “couldn’t” with “could not”)
3) Replace HTML character entities with their Unicode equivalents (such as replacing “&amp;” with “&”)
4) Remove HTML tags (such as “<div>” and “<p>”)
5) Remove special characters that are not required for sentence boundary detection
6) Compress words with repeated characters (such as compressing “poooool” to “pool”)
7) Convert slang terms to English words using relevant online slang dictionaries [46, 47]
8) Remove words that are numbers
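The eight steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' implementation: the function name, the regular expressions, and the tiny `CONTRACTIONS` and `SLANG` dictionaries are all assumptions made for the example (the paper uses online slang dictionaries [46, 47]).

```python
import html
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
CONTRACTIONS = {"couldn't": "could not", "can't": "cannot", "won't": "will not"}
SLANG = {"u": "you", "gr8": "great"}


def preprocess(comment):
    """Apply the eight cleaning steps, in the order listed, to one comment."""
    text = comment
    # 1) Remove URLs, then mentions and hashtags (but not entities like &#39;).
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"(?<![&\w])[@#]\w+", " ", text)
    # 2) Expand contractions (curly apostrophes are normalized first).
    text = text.replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text, flags=re.IGNORECASE)
    # 3) Replace HTML character entities with their Unicode equivalents.
    text = html.unescape(text)
    # 4) Remove HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # 5) Drop special characters not needed for sentence boundary detection.
    text = re.sub(r"[^\w\s.!?,']", " ", text)
    # 6) Compress runs of three or more identical letters down to two.
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)
    # 7) Convert slang terms using the lookup table.
    words = [SLANG.get(word.lower(), word) for word in text.split()]
    # 8) Remove words that are numbers.
    words = [word for word in words if not re.fullmatch(r"\d+(?:[.,]\d+)*", word)]
    return " ".join(words)


raw = "@user Couldn't go to the poooool &amp; gym <p>stay safe</p> #COVID19 https://t.co/x 2020"
cleaned = preprocess(raw)
print(cleaned)
```

Collapsing repeated letters to two (rather than one) is a design choice that happens to reproduce the “poooool” to “pool” example; a real pipeline would likely need a dictionary check to handle words such as “sooo”.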

After applying the above steps on the data, and removing non-English comments (identified using the langdetect Python library [48]) and duplicate comments, the total number of comments reduced to 8,021,341. We randomly selected about 13% of these comments (n = 1,051,616) to form the corpus used in this paper.
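The deduplication and random-sampling step can be sketched as follows. The paper does not state how duplicates were matched or whether a random seed was fixed, so exact-string matching and the `seed` parameter are assumptions here; language detection is omitted because it relies on the third-party langdetect library.

```python
import random


def deduplicate(comments):
    """Drop exact duplicate comments while keeping first-seen order."""
    return list(dict.fromkeys(comments))


def sample_corpus(comments, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    sample_size = round(len(comments) * fraction)
    return rng.sample(comments, sample_size)


# Sampling fraction implied by the counts reported in this section.
reported_fraction = 1_051_616 / 8_021_341
print(f"reported sampling fraction: {reported_fraction:.1%}")

# Toy data standing in for the cleaned comments.
toy_comments = ["stay safe", "stay safe", "wash hand", "lose job", "panic buy"]
unique_comments = deduplicate(toy_comments)
toy_corpus = sample_corpus(unique_comments, fraction=0.5)
```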

3.3 Keyphrase Extraction

To extract meaningful and opinionated keyphrases which are words or phrases representing topical content of each document (or comment) in our corpus, we utilized a context-aware NLP approach. This approach extends the version adopted by Dave and Varma [39] with enhanced part-of-speech (POS) patterns tailored to our objective, chunking (in conjunction with CoNLL IOB tagging [49]), as well as transformation and sentiment scoring stages. In subsequent subsections, we describe the keyphrase extraction component of the NLP pipeline in Fig. 1. In line with this, we present an algorithm (see Fig. 2) that accepts a regular grammar and our corpus as input parameters and returns opinionated keyphrases of interest as output. The algorithm was implemented in Python using the Natural Language Toolkit (NLTK).

3.3.1 Grammar Definition

We defined a regular grammar (see below) which is a set of rules composed of POS patterns that describe how the syntactic units of each document in our corpus are deconstructed into their constituents or parts. The grammar captures the context of each comment and the opinions/sentiments expressed using nouns, adjectives, and verbs. Research revealed that nouns are crucial for detecting the context of a conversation [50], while both adjectives and verbs are significant for sentiment classification [51].

Grammar: { <DT>? <JJ.*>* <NN.*>* <VB.*>? (<IN>? <DT>? <JJ.*>* <NN.*>*)? }

The regular grammar above is composed of patterns of POS tags from the well-established Penn Treebank Tagset [52, 53]. For instance, the <NN.*> pattern matches any type of noun (see Table 1), <JJ.*> matches any type of adjective, <VB.*> matches any type of verb, <IN> matches a preposition or subordinating conjunction, and <DT> matches a determiner. We considered determiners and prepositions since they usually occur together with nouns and adjectives in sentences (e.g., public concern about the virus). Also, the “*” symbol after a POS pattern refers to “zero or more occurrences,” while “?” refers to “zero or one occurrence.”

Table 1 Part-of-speech (POS) tags and description

3.3.2 Sentence Breaking and Tokenization

Next, each document is separated into unique sentences. To achieve this, we utilized a robust unsupervised algorithm (within the Python NLTK’s tokenize library [54]) which considers collocations, punctuation, capitalization, and abbreviations in determining sentence boundaries within each document. Afterwards, each sentence is further broken down into words or tokens in preparation for POS tagging.

3.3.3 POS Tagging

Each token is assigned a POS tag (within the Penn Treebank Tagset) denoting its part of speech in the English language. For example, tokens in the following sentence “Stop panic buying and be sure to use face masks in public areas” are tagged as follows: [(‘Stop’, ‘NNP’), (‘panic’, ‘NN’), (‘buying’, ‘NN’), (‘and’, ‘CC’), (‘be’, ‘VB’), (‘sure’, ‘JJ’), (‘to’, ‘TO’), (‘use’, ‘VB’), (‘face’, ‘NN’), (‘masks’, ‘NNS’), (‘in’, ‘IN’), (‘public’, ‘JJ’), (‘areas’, ‘NNS’)].

3.3.4 Lemmatization

Next, each tagged token is lemmatized or converted into its root word based on its part of speech. Prior to lemmatization, we converted the tokens or words to lowercase. Lemmatization is achieved by using the English vocabulary and conducting morphological analysis of words [55]. Hence, a root word is the dictionary form of the original word. By converting the tokens to their root form, we harmonized similar words while preserving their meaning. For instance, the verb forms “seen” and “sees” are both converted to their root form, “see.” Referring to our previous sample tagged tokens, the output of the lemmatization stage is: [(‘stop’, ‘NNP’), (‘panic’, ‘NN’), (‘buying’, ‘NN’), (‘and’, ‘CC’), (‘be’, ‘VB’), (‘sure’, ‘JJ’), (‘to’, ‘TO’), (‘use’, ‘VB’), (‘face’, ‘NN’), (‘mask’, ‘NNS’), (‘in’, ‘IN’), (‘public’, ‘JJ’), (‘area’, ‘NNS’)].

3.3.5 Chunking

Next, we created a chunker that uses the regular grammar defined above to match phrases comprising an optional determiner, followed by zero or more of any type of adjective, zero or more of any type of noun, zero or one of any type of verb, and an optional component. This component consists of an optional preposition, followed by an optional determiner, zero or more of any type of adjective, and zero or more of any type of noun. Using our previous example, the chunker produces the parse tree in Fig. 3, showing the key terms (KT) that match the grammar specified.
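To make the matching behaviour concrete, the sketch below emulates the grammar with Python's standard `re` module instead of NLTK's chunker, so it runs without extra dependencies. It is an approximation under stated assumptions: the function name `chunk_key_terms` and the regex translation `KT_PATTERN` are introduced for this example, and NLTK's own matching may differ in edge cases.

```python
import re

# Regex translation of the tag pattern; every POS tag is encoded as "<TAG>".
# <NN.*>, <JJ.*>, and <VB.*> become "<NN...>", "<JJ...>", and "<VB...>".
KT_PATTERN = re.compile(
    r"(?:<DT>)?(?:<JJ[^<>]*>)*(?:<NN[^<>]*>)*(?:<VB[^<>]*>)?"
    r"(?:(?:<IN>)?(?:<DT>)?(?:<JJ[^<>]*>)*(?:<NN[^<>]*>)*)?"
)


def chunk_key_terms(tagged):
    """Return the word lists of all non-empty, left-to-right pattern matches."""
    tag_string = "".join(f"<{pos}>" for _, pos in tagged)
    chunks = []
    for match in KT_PATTERN.finditer(tag_string):
        if match.start() == match.end():
            continue  # every element is optional, so empty matches are skipped
        first = tag_string.count("<", 0, match.start())  # index of first token
        length = match.group().count("<")                # tokens in this chunk
        chunks.append([word for word, _ in tagged[first:first + length]])
    return chunks


# Lemmatized, tagged tokens of the running example sentence.
tagged_tokens = [
    ("stop", "NNP"), ("panic", "NN"), ("buying", "NN"), ("and", "CC"),
    ("be", "VB"), ("sure", "JJ"), ("to", "TO"), ("use", "VB"),
    ("face", "NN"), ("mask", "NNS"), ("in", "IN"), ("public", "JJ"),
    ("area", "NNS"),
]
chunks = chunk_key_terms(tagged_tokens)
print(chunks)
```

In this emulation the conjunction “and” and the particle “to” fall outside every chunk, while “use face mask” and “in public area” are matched as two adjacent chunks because the optional trailing component can occur only once per match.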

To generate the candidate keyphrases, we first converted the parse tree (or chunks) generated by the chunker for each document into the CoNLL IOB format. An IOB (Inside-Outside-Beginning) tag specifies how a key term functions in the context of a phrase: whether the term begins (B-KT), is inside (I-KT), or is outside (O-KT or O) the phrase [49]. Next, we iteratively grouped terms that are part of a keyphrase (i.e., B-KT and I-KT) and stopped when a term that does not belong to the keyphrase (i.e., O-KT or O) was encountered.

For example, the CoNLL IOB format of the parse tree in Fig. 3 gives [(‘stop’, ‘NNP’, ‘B-KT’), (‘panic’, ‘NN’, ‘I-KT’), (‘buying’, ‘NN’, ‘I-KT’), (‘and’, ‘CC’, ‘O’), (‘be’, ‘VB’, ‘B-KT’), (‘sure’, ‘JJ’, ‘I-KT’), (‘to’, ‘TO’, ‘O’), (‘use’, ‘VB’, ‘B-KT’), (‘face’, ‘NN’, ‘I-KT’), (‘mask’, ‘NNS’, ‘I-KT’), (‘in’, ‘IN’, ‘B-KT’), (‘public’, ‘JJ’, ‘I-KT’), (‘area’, ‘NNS’, ‘I-KT’)]. By iteratively grouping the B-KT and I-KT terms, the following keyphrases emerged: “stop panic buying,” “be sure,” and “use face mask in public area.”
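The grouping rule can be sketched in a few lines. The function name `group_keyphrases` is hypothetical; the input is the IOB-tagged example given above, and the rule implemented is the one described in the text: collect B-KT and I-KT terms until an outside term is reached.

```python
def group_keyphrases(iob_triples):
    """Join consecutive B-KT/I-KT terms; an O (or O-KT) term ends the phrase."""
    keyphrases, current = [], []
    for word, _pos, iob in iob_triples:
        if iob in ("B-KT", "I-KT"):
            current.append(word)
        elif current:
            keyphrases.append(" ".join(current))
            current = []
    if current:  # flush a phrase that runs to the end of the sentence
        keyphrases.append(" ".join(current))
    return keyphrases


iob_example = [
    ("stop", "NNP", "B-KT"), ("panic", "NN", "I-KT"), ("buying", "NN", "I-KT"),
    ("and", "CC", "O"), ("be", "VB", "B-KT"), ("sure", "JJ", "I-KT"),
    ("to", "TO", "O"), ("use", "VB", "B-KT"), ("face", "NN", "I-KT"),
    ("mask", "NNS", "I-KT"), ("in", "IN", "B-KT"), ("public", "JJ", "I-KT"),
    ("area", "NNS", "I-KT"),
]
keyphrases = group_keyphrases(iob_example)
print(keyphrases)
```

Because only an outside term ends a phrase, the B-KT tag on “in” does not start a new keyphrase here, which appears to be how the example yields “use face mask in public area” as a single keyphrase.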

3.3.6 Transformation and Filtering

In this stage, we first removed keyphrases that are stopwords (i.e., common words, such as about, the, from, there, had, and can) from our list of candidate keyphrases. Second, we removed selected stopwords from the start, end, and interior of keyphrases while preserving their meaning. For example, “be sure” is filtered out since both “be” and “sure” are included in our pre-defined list of stopwords that cannot start or end a keyphrase. Third, we removed keyphrases whose length exceeds ten. While previous research retained only keyphrases up to length six [39], we extended our threshold to ten to avoid losing important keyphrases that would enrich insights from this research.
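A minimal sketch of this filtering stage is given below. The stopword set is a small hypothetical stand-in for the authors' pre-defined list, length is assumed to be counted in words, and only the stripping of stopwords from the start and end of a keyphrase is shown (removal from within keyphrases is omitted).

```python
# Hypothetical stand-in for the pre-defined list of stopwords that
# cannot start or end a keyphrase.
EDGE_STOPWORDS = {"about", "the", "from", "there", "had", "can", "be", "sure", "a", "an"}
MAX_KEYPHRASE_LENGTH = 10  # assumed to be measured in words


def filter_keyphrase(phrase):
    """Strip edge stopwords; return None if the phrase should be discarded."""
    words = phrase.split()
    while words and words[0] in EDGE_STOPWORDS:
        words.pop(0)
    while words and words[-1] in EDGE_STOPWORDS:
        words.pop()
    if not words or len(words) > MAX_KEYPHRASE_LENGTH:
        return None
    return " ".join(words)


candidates = ["stop panic buying", "be sure", "the death toll rise"]
filtered = [kept for kept in map(filter_keyphrase, candidates) if kept is not None]
print(filtered)
```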

3.3.7 Sentiment Scoring and Filtering

In line with our objective to keep only opinionated candidate keyphrases (i.e., keyphrases with “negative” or “positive” sentiment polarity [56]), we assigned a sentiment score S_s ranging from −1 to +1 to each keyphrase using the popular VADER (Valence Aware Dictionary for sEntiment Reasoning) lexicon-based algorithm [57]. Afterwards, we filtered out non-opinionated or “neutral” keyphrases using the criteria summarized in Table 2. For example, the S_s values for “stop panic buying” and “use face mask in public area” are −0.6705 and 0.1027, respectively; hence, both are retained since they are opinionated. The neutral score ranges from −0.05 to +0.05 based on the outcome of the experiments conducted by Hutto and Gilbert [57].
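The polarity filter can be sketched without the VADER library by starting from already-computed scores. The two negative and positive scores below are the ones reported in the text; the third entry is a hypothetical neutral keyphrase added for illustration. Treating scores exactly equal to ±0.05 as opinionated is an assumption, since the exact boundary handling is defined in Table 2.

```python
NEUTRAL_BAND = 0.05  # neutral scores lie strictly between -0.05 and +0.05 (assumed)


def classify_sentiment(score):
    """Map a sentiment score in [-1, 1] to a polarity label."""
    if score >= NEUTRAL_BAND:
        return "positive"
    if score <= -NEUTRAL_BAND:
        return "negative"
    return "neutral"


def keep_opinionated(scored_keyphrases):
    """Keep only keyphrases whose polarity is positive or negative."""
    return {
        phrase: classify_sentiment(score)
        for phrase, score in scored_keyphrases.items()
        if classify_sentiment(score) != "neutral"
    }


scored = {
    "stop panic buying": -0.6705,            # score reported in the text
    "use face mask in public area": 0.1027,  # score reported in the text
    "coronavirus news": 0.0,                 # hypothetical neutral example
}
opinionated = keep_opinionated(scored)
print(opinionated)
```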

Table 2 Criteria for sentiment classification

3.4 Categorizing Keyphrases

Next, the final opinionated keyphrases were manually categorized into broader themes (an approach also used by Bekhuis et al. [33] to categorize phrases) by four reviewers. The reviewers were divided into two teams—T1 and T2. T1 consists of two reviewers who were tasked with grouping the negative keyphrases, while T2 comprises the two other reviewers who grouped the positive keyphrases.

Each reviewer independently and iteratively examined the keyphrases and continued to categorize them until no new category emerged due to saturation. The reviewers used coding sheets to record the category assigned to a keyphrase after examining it. Each reviewer determined the appropriate category names; in addition, a new category was created if none of the existing categories matched the keyphrase being examined. Moreover, a keyphrase was assigned to only one category since keyphrases are more specific than comments. In other words, reviewers assigned a keyphrase to the most appropriate category or to a new category if none of the existing categories fit. Next, the reviewers in each team validated each other’s work by agreeing or disagreeing with the category mapped to each keyphrase and offered suggestions for every disagreement. Finally, each team applied the suggestions and ensured that all category names were unique while harmonizing similar categories. To measure interrater reliability between reviewers in each team, we used the percentage agreement metric [58]. The percentage agreement score between reviewers in T1 was 98%, while that of T2 was 99.3%.
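Percentage agreement is the share of items on which two reviewers chose the same category. The sketch below shows the calculation on made-up labels; the function name and the toy data are illustrative and do not come from the authors' coding sheets.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of items to which two reviewers assigned the same category."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both reviewers must label the same items")
    if not labels_a:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Toy labels for four keyphrases (hypothetical data).
reviewer_1 = ["Misinformation", "Economic Crisis", "Misinformation", "Poor governance"]
reviewer_2 = ["Misinformation", "Economic Crisis", "Poor governance", "Poor governance"]
agreement = percent_agreement(reviewer_1, reviewer_2)
print(f"agreement: {agreement:.1f}%")
```

Unlike chance-corrected statistics, this metric does not adjust for agreement expected by chance, which is worth keeping in mind when reading the scores above.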

4 Results

In this section, we present our experimental results including keyphrase categorization. From our large corpus, a total of 427,875 negative and 520,685 positive keyphrases were autogenerated.

4.1 Negative Keyphrases

Our results showed that death is the most dominant keyphrase (n = 10,187), followed by die (n = 7,240), fight (n = 5,891), bad (n = 3,808), kill (n = 3,668), lose (n = 3,631), pay (n = 3,486), leave (n = 3,234), crisis (n = 2,783), hard (n = 2,720), worry (n = 2,476), sick (n = 2,314), sad (n = 2,129), etc. Other keyphrases include self isolation, difficult time, life at risk, death toll rise, conspiracy theory, become infected, spread misinformation, panic buy, lack of leadership, no social distancing, travel restriction, spread fake news, in time of uncertainty, public health emergency, biological weapon, desperate time call for desperate measure, contagious disease, hospital overwhelm, take advantage of crisis, suffer from underlie medical condition, and so on.

Figure 4 shows some of the negative keyphrases and their corresponding category and dominance (as indicated by the bubble size). For example, under the “Economic Crisis” category, recession is the most dominant keyphrase, followed by economic crisis, destroy economy, and crash economy. On the other hand, hoax is the most dominant keyphrase under the “Misinformation” category followed by fake news, while unemployment is the most dominant keyphrase under the “Job & Business issues” category followed by lose job.

4.2 Positive Keyphrases

For positive keyphrases, our results showed that the most dominant keyphrase is help (n = 18,498), followed in decreasing order by hope (n = 7,708), protect (n = 7,130), love (n = 6,895), support (n = 6,198), good (n = 5,740), share (n = 5,187), care (n = 4,917), and stay safe (n = 4,917). Other keyphrases include stay healthy, gratitude, relief fund, help slow spread, solidarity, ask for friend, encourage people, stay calm, great initiative, fresh air, use hand sanitizer, artificial intelligence, support business, keep safe distance, practice good hygiene, pray at home, play video game, use defense production act, protect public health, encourage social distancing, free webinar, and so on.

Figure 5 shows some of the positive keyphrases and their associated category and dominance (as indicated by the bubble size). Under the “Public awareness” category, stay safe is the most dominant keyphrase, followed by stay home stay safe, wash hand, and ensure social distancing. On the other hand, relief fund is the most dominant keyphrase under the “Charity” category, while gratitude is the most dominant keyphrase under the “Gratitude” category followed by appreciate effort, show appreciation, thank doctor, and thank healthcare worker.

4.3 Keyphrase Categories

Since the majority of the keyphrases were similar, reviewers reached a saturation point where no new categories were emerging. As a result, a total of 18,694 negative and 19,841 positive keyphrases were categorized. In terms of content coverage, the categorized keyphrases spanned 104,619 unique comments.

After grouping related keyphrases into categories or broader themes using the thematic analysis method described in Sect. 3.4, 34 negative and 20 positive categories emerged. We refer to these categories as “themes,” and the keyphrases under each category as “subthemes” in the remaining parts of this paper. The 34 negative themes were further distributed into health-related issues, economic issues, psychosocial issues, socio-political issues, social issues, educational issues, and political issues. In this paper, we focused on 17 negative themes mapped to economic, socio-political, educational, and political issues (see Table 3 and Fig. 6). Other issues (i.e., health-related, psychosocial, and social issues) have been discussed in our previous work [30]. As shown in Fig. 7, the top 5 negative themes based on number of user comments are Concerns about social distancing and isolation policies (n = 8,872), followed by Misinformation (n = 2,223), Political influence (n = 1,640), Financial issues (n = 1,622), and Poor governance (n = 1,559). Figure 6 shows the number of subthemes under each theme.

Table 3 Negative themes, description, and sample comments

Furthermore, Table 4 shows the 20 positive themes and sample comment(s) for each theme, while Fig. 8 and Fig. 9 show the positive themes and the corresponding number of subthemes and comments, respectively. Based on number of comments, Public awareness (n = 22,749) emerged as the top theme, followed by Spiritual support (n = 12,130) and Encouragement (n = 5,244). Other themes include Charity (n = 942), Entertainment (n = 798), Gratitude (n = 758), Development of curative solutions or treatments (n = 653), Advocacy for increased testing (n = 341), Physical activity (n = 285), Cleaner environment (n = 278), etc.

Table 4 Positive themes, description, and sample comments

By identifying both negative and positive themes, we have answered the first two research questions—RQ1 and RQ2—respectively.

Finally, we randomly selected 100 comments from our original corpus to examine if they can be categorized into existing themes. Our results show that 82% (n = 82/100) of the comments were successfully mapped to appropriate themes. The remaining 18 unmapped comments either contain keyphrases that are not opinionated (i.e., neither positive nor negative) or are unrelated to COVID-19 pandemic issues, for example, “Update: Coronavirus news, at a glance” [C39], “Worship for the 5th Sunday in Lent, from St. Martin’s…” [C79], “23:59 For more information, please check MOH’s announcement…” [C31], and “Beef stew, bread, butter and a Red Bull. Another day…” [C94].

5 Discussion

Our results revealed various negative and positive themes representing public opinions about the pandemic, as well as impact of COVID-19 on people and institutions in line with the factors affecting efforts to limit the spread of the disease either negatively or positively. To answer our third research question (RQ3), we first discuss the negative issues (see Table 3) and then suggest interventions based on the positive themes (see Table 4) and research evidence.

5.1 Negative Issues Regarding COVID-19 Pandemic

5.1.1 Economic Issues

Based on our findings, the COVID-19 pandemic led to unemployment, low revenue or losses for business, low supply of essential items, challenging living condition, economic downturn, and financial crisis.

Job- and Business-Related Crisis

In line with our findings (see sample comments below), research shows that the pandemic triggered a massive global unemployment crisis [59,60,61] in which people are losing jobs or are unable to find one. This is due to lockdowns and reduced consumer spending, which led to businesses/companies experiencing low income/revenue and losses; many are near bankruptcy, have shut down temporarily, or are likely to go out of business [62,63,64,65].

“...job layoffs are soaring faster than any time in recorded history...This looks bad and it is bad. The worst jobless claims in U.S. history means the economy has fallen into the abyss.” [C9100]

“My job is shutdown; my husband job is shutdown...How am I supposed to pull this off? There is NO income. We have 4 children including an 8-week-old baby. I need help NOW.” [C7119]

Economic Downturn

Based on our findings, the pandemic pushed global economies toward recession as stock market indices crashed, as shown in the sample comment below. Evidence shows that the COVID-19 pandemic negatively impacted stock markets more forcefully than any other disease outbreak in history [66]. For example, primary sectors (e.g., agriculture and petroleum and oil), secondary sectors (e.g., manufacturing), and other sectors (e.g., finance, food, real estate, tourism, and transportation sectors) driving stock market indices experienced various challenges (such as supply chain disruption, revenue crash, and transaction halt) compounded by lockdown and social isolation policies aimed at curbing COVID-19 spread [67].

“Our 250 economists have updated our global forecasts. Coronavirus will inflict a short, sharp global recession. We expect 2020 world growth to drop to zero. In Q1, we see the global economy shrinking faster than in the financial crisis” [C10002]

Shortage of Essential Items

People lamented the shortage of food items, toiletries including hand sanitizers, and personal protective equipment (e.g., face masks and protective gear and garments) necessary to prevent contracting the disease. In addition, public health centres and hospitals experienced shortages of testing kits and ventilators, which hampered efforts to identify COVID-19 cases and keep patients alive. Also, blood shortages were reported in blood banks, and lockdown measures may prevent many people from donating blood. Our findings (see sample comments below) align with research which confirms critical supply shortages of the items highlighted above [68,69,70,71,72].

“U.S. cities have acute shortages of masks, test kits, ventilators as they face coronavirus threat” [C11119]

“Acute shortage of blood in the blood banks...Blood donations needed during & after coronavirus pandemic” [C7999]

“Is anybody else having a food shortage in their grocery stores? My hometown stores are about completely empty.” [C4442]

Challenging Living Condition and Financial Issues

As shown in the comments below, people experienced difficulty providing for their families or meeting their needs, such as paying bills (e.g., rent, mortgage installment, credit card payment, and phone bill) and buying sufficient food, as a result of job losses and the strict lockdowns, which imposed financial hardship on people, including owners of small businesses (such as restaurants and cosmetics shops) and hustlers. Many organizations were unable to pay employees’ full salaries due to financial constraints caused by the pandemic and resorted to half salaries or job cuts (see [C621]). Research has shown that households experienced food insecurity as a result of poor financial status caused by the COVID-19 pandemic [73].

“Please help us. Lost my job due to coronavirus shutting down my workplace. I have no income...no rent.” [C9991]

“...there are a heartbreaking number of hungry Americans posting their Venmo’s and asking for help in this thread...” [C12883]

“Today my company instituted across-the-board pay cuts of 10-30%, and canceled merit increases, bonuses, and 401(k) matches...” [C621]

Flight Cancellations

Based on our findings, people lamented the sudden cancellation of flights and difficulty in getting refunds from affected airlines, as shown in the sample comment below. These cancellations were due to border closures and travel bans imposed by governments of many countries to curtail the importation of COVID-19; however, such actions inflicted much pain and distress on stranded passengers, as well as financial losses on the airlines [74].

“Very disappointed how Etihad is handling COVID19. Not only did they cancel all flights, but it is legally impossible for me to travel given the travel bans. Instead of refunding my money, I am getting credit that has restrictions to re-book by Sept. How is this fair?” [C144]

5.1.2 Socio-political Issues

Concerns About Social Distancing and Isolation Policies

Our findings revealed public concerns over lockdown, social distancing, and isolation policies irrespective of their perceived benefits. Some of the concerns include (i) people disrespectfully snubbing those who are not 6 feet away from them; (ii) social distancing/lockdown without financial support; (iii) implementing isolation/quarantine policies that contradict the World Health Organization’s advice; (iv) human rights violations; (v) weak enforcement of lockdown policy; (vi) reliance on self-isolation without aggressive testing; (vii) ineffectiveness of isolation/lockdown in slums; (viii) devastating effect of social isolation on domestic violence victims; (ix) millions stranded due to lockdown and struggling to get food and water; and (x) spike in anxiety and depression cases after lockdown announcement. Sample comments are shown below:

“...but of societal norms; so much so that there are now actual social distancing snobs who look down on you if you’re less than 6 feet away. Will coronavirus kill all our humanity too?” [C7116]

“This is exactly what I am afraid of since lockdown...exploiting the crisis to strip us of our rights.” [C909]

Controversy over Precautionary Measures

Precautionary measures, such as wearing face masks and gloves, generated controversies based on our findings. For example, some people think N95 masks with a valve can aid the spread of the virus from infected patients (see sample comment below), while some are concerned about the stigma attached to wearing masks.

“They tell the infected to wear a N95 mask which is 95 effective with no oils...half the masks have a one way valve for the exhale which is unfiltered. They are trying to kill everyone.” [C193]

Lack of Preparedness and Protests

Furthermore, people highlighted a lack of preparedness on the part of governments and health systems as a factor aiding the spread of the disease. Evidence shows that many countries and health authorities failed to rapidly perceive the threat posed by COVID-19 [75, 76], thereby allowing it to escalate into a pandemic that imposes hardship on the world’s population. It is therefore unsurprising that there were protests in several countries: health workers and some essential workers protesting about the shortage of protective equipment, citizens of developing countries protesting about the lack of food and electricity during the lockdown, essential workers requesting hazard pay during the pandemic, citizens protesting against their governments’ inaction toward protecting them from COVID-19, and so on. Below are sample comments:

“Another nurses protest calling attention to the shortage of protective equipment, and rationing policies by hospitals.” [C299]

“This is what is happening in Chile. People protesting because President…is not taking the correct responsibility on Covid_19. We need national quarantine!!” [C10384]

“We are hungry, no food no light - you cannot tell us to stay indoors. Nigerians in parts of Lagos already threatening to defy government lockdown directives with a protest in two days” [C8886]

Risk of Spread at Detention Centres

Moreover, people raised concerns about the risk of COVID-19 spreading in prisons and the need for decongestion, as shown in the sample comment below. Evidence already shows that incarcerated populations are vulnerable to infectious diseases, including COVID-19, due to unavoidable close contact (since prisons or detention centres are often overcrowded and poorly ventilated/sanitized) and poor healthcare access [77, 78].

“As cases increase in Texas, so do concerns about the well-being of people in Texas prisons and jails.” [C1555]

5.1.3 Educational Issues

Disruption in Education

Our findings revealed the disruptive effect of the COVID-19 pandemic on education globally, such as school closures. People are concerned about the education of their children or wards, including the cost implications of the virtual learning put in place by schools, as well as about children in rural areas who would be deprived of learning. This aligns with evidence highlighting the effect of school closures on 80% of children worldwide, and the worsened inequalities in educational outcomes between children in lower- and higher-income countries [79].

“There is still NO date for schools to reopen in the Capital.” [C4440]

“You are putting the most disadvantaged students at a further educational disadvantage...” [C91]

“What kind of education are we getting? We haven’t paid such high fees for ZOOM kind of education. Our online education system is so sick and badly affecting our grades and that’s totally unfair!! 90k for this kind of education is way too worthy!” [C181]

Knowledge Gap

Furthermore, a knowledge gap (in the form of ignorance and lack of intelligence) on the part of leadership and society is another factor hampering the containment of COVID-19, based on our findings. As shown in the comments below, authorities lack knowledge of what should be done, while people are ill-informed due to limited access to accurate and coherent information about the disease and preventive/control measures.

“Nursing, a primarily female profession, is under attack. The CDC in ignorance says wear a bandana. THIS IS NOT PROTECTION! Lives at stake! Hospitals have their heads in the sand. Please! Can you hear us?” [C818]

“Some neighbors even want us out because they think they would breathe this virus in the air and we’re inside our own hostel! This is dangerous ignorance!” [C6000]

Misinformation

The proliferation of misinformation is impeding access to accurate information about COVID-19 that could have helped curb the spread of the disease and save lives. Misinformation, which refers to false information or information with limited or no scientific evidence, is one of the top 5 issues that emerged in our findings and has also been reported by previous research [80,81,82,83,84,85]. The sample comment below reveals public concerns about misinformation regarding COVID-19:

“The amount of fake news on all the health concerns regarding COVID19 is shocking. Only person I trust with info is my cousin who is a doctor. She has just told me it DOESN’T last in the air, as long as you are two metres away and sneeze into a tissue you are fine!!” [C12010]

5.1.4 Political Issues

Elected governments or political appointees are central to decision-making or governance that should improve the standard of living of people and assure their health and safety. However, people are concerned about widespread interference in COVID-19-related matters for political gain, based on our findings. In addition, they are concerned about the absence of strong leadership in the wake of the pandemic, and the poor state (or lack) of key public infrastructure (e.g., electricity, water, internet, and healthcare facilities/centres). Research has shown that political beliefs and partisanship pose a significant limitation on the effectiveness of preventive measures (such as social distancing) [86,87,88]. Sample comments below reveal public opinion regarding these issues:

“Quite possibly the worst governor in the country. He’s hurting not only his own citizens but all Americans, as all the people on spring break and theme parks go home with covid19.” [C5777]

“When authorities and armed forces asked you to self-quarantine with no internet and no electricity (16 hours load shedding) in Hunza ...?!” [C108]

5.2 Interventions for Addressing the Negative Issues

In this section, we suggest interventions that can help address the negative issues while drawing insights from the positive themes and relevant research evidence. This answers our third research question (RQ3).

To cushion the effect of economic issues on people, “charity” and “grassroots support” are important factors as revealed in our findings. Mobile technology can play a significant role in ensuring effective distribution of relief items. For example, GPS-enabled and multilingual mobile apps can help people to easily find food banks nearest to them. Moreover, government-funded or non-governmental charity organizations responsible for distributing economic relief to people can easily expand their reach or coverage and make delivery decisions based on data collected through these apps. For instance, people can indicate their needs through these apps and other information, such as location, age group, health condition, and whether they are in self-isolation (due to exposure to COVID-19). These apps can also be used to onboard volunteers who want to offer financial and material assistance and connect them to those in need. Furthermore, the data collected through these apps can further be analyzed using artificial intelligence (AI) techniques (such as machine learning or deep learning) to predict the communities that are in dire need of assistance. Besides technology usage, governments can budget for additional measures to protect the finances of people and businesses such as keeping people employed through financial partnership with employers, providing stimulus packages, and facilitating quick employment for the jobless [89]. Evidence shows that governments of some countries are adopting these measures to varying degrees [90].
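As an illustration of the location-based lookup described above, the following is a minimal sketch (not part of our study) of how an app might select the food bank nearest to a user from GPS coordinates, using the haversine great-circle distance. The function names, the tuple layout, and the sample sites and coordinates are hypothetical and chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_food_bank(user_lat, user_lon, food_banks):
    """Return the (name, lat, lon) entry closest to the user's position."""
    return min(
        food_banks,
        key=lambda fb: haversine_km(user_lat, user_lon, fb[1], fb[2]),
    )


# Hypothetical sample data: names and coordinates are illustrative only.
FOOD_BANKS = [
    ("North site", 44.70, -63.60),
    ("Downtown site", 44.65, -63.58),
    ("West site", 44.64, -63.90),
]

closest = nearest_food_bank(44.648, -63.575, FOOD_BANKS)
print(closest[0])
```

A linear scan such as this is adequate for a short list of sites; a deployed service covering many locations would more likely rely on a spatial index or a mapping service.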

Regarding the shortage of items that protect people from the pandemic (such as face masks and hand sanitizers), a “homemade protective equipment” approach can be employed as a viable alternative, as revealed in our findings. Evidence shows that homemade masks, for example, can offer protection from COVID-19 transmission in the event that medical masks are not available [91]. To address supply chain issues with respect to high-demand and essential products, research suggests recovery strategies (such as increase in production shifts, use of spare capacity, emergency sourcing, bolstering capacity locally, and collaboration with supply chain partners) [92, 93].

Regarding concerns about social distancing and isolation policies imposed by governments, as well as controversies over precautionary measures suggested by health professionals, “public awareness” is a major and useful tool to address these issues, including misinformation, as revealed in our findings. Providing timely and accurate COVID-19-related information to people, and also connecting them to evidence-based resources and health professionals to answer their questions or clear up confusion, can be lifesaving. To reach a wider audience on a personalized basis, mobile- and voice-enabled chatbots equipped with real-time access to evidence-based and validated resources (such as safety measures approved by the World Health Organization, as well as government-approved policies or guidelines) can be developed such that people can interact (in their own language) with the chatbots using their smartphones anytime. Difficult questions can be automatically channeled to health experts for responses within the same chat window. For those with traditional cellular phones, governments and local health agencies can partner with telecom firms to deliver COVID-19-related information directly to people’s phones via short message service (SMS) at regular intervals. In addition, official COVID-19-related channels on social media (such as [94]) supervised by health experts and local/international health organizations can provide accurate and frequent updates.
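The routing of difficult questions to health experts could, under simple assumptions, look like the sketch below. It is illustrative only and was not implemented in this work: the FAQ entries are placeholders rather than validated guidance, the word-overlap (Jaccard) matching stands in for whatever language understanding a real chatbot would use, and the `route` function and its 0.5 threshold are hypothetical.

```python
import re

# Placeholder FAQ entries; a real system would load vetted, validated content.
FAQ = {
    "how far apart should people stand": "Keep the distance recommended by your local health authority.",
    "should i wear a mask": "Follow the current mask guidance from your health authority.",
}


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route(question, threshold=0.5):
    """Answer from the FAQ when a close match exists, otherwise escalate.

    Returns ("bot", answer) or ("expert", None).
    """
    q = tokens(question)
    best_answer, best_score = None, 0.0
    for known, answer in FAQ.items():
        k = tokens(known)
        union = q | k
        score = len(q & k) / len(union) if union else 0.0  # Jaccard similarity
        if score > best_score:
            best_answer, best_score = answer, score
    if best_score >= threshold:
        return ("bot", best_answer)
    return ("expert", None)


print(route("Should I wear a mask?"))
print(route("Can my dog catch it from the mailman?"))
```

The threshold trades coverage against safety: a higher value sends more questions to human experts, which may be preferable for health information.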

Regarding educational disruptions due to COVID-19, evidence shows that digital technologies are pedagogical tools that can enhance diverse forms of learning both within and outside the school environment [95]. Based on our findings, “online learning” (also called e-learning, virtual learning, virtual classroom, digital classroom, or distance learning) will help mitigate the impact of educational disruptions caused by the COVID-19 pandemic. While it may not be as effective as in-class learning in some cases, it will prevent the potential brain drain that may result from the absence of continuous learning. Mobile and web-based learning platforms, many of which are available today, should be readily accessible in schools at all levels going forward. Designers should ensure these tools provide a personalized learning experience such that students can manage their own content and the tools offer tailored suggestions that fit their interests or needs. The tools should also support collaborative learning where students can work together on assignments, projects, or other tasks similar to what they do in the real world. Furthermore, governments across the globe should ensure equitable access to these educational technologies irrespective of economic, financial, racial, or cultural differences. Public infrastructure supporting these technologies (such as stable electricity, as well as affordable and reliable internet) should be considered a top priority and made readily accessible to people.

Finally, governments at all levels should partner (rather than compete) with health professionals and researchers to form a strong force against COVID-19. Our findings revealed “advocacy for testing”, which reflects the public call for increased testing, since some governments are still struggling in this area because their political interests supersede public health. Research argues for the significance and effectiveness of adaptive evidence-making intervention (a fusion of scientific evidence and policy) during public health emergencies (such as the COVID-19 pandemic) [96]. This is only possible if political leaders and health experts align and work harmoniously to address current and future pandemics.

6 Conclusion and Future Work

We explored the impact of the COVID-19 pandemic on people globally using social media data. We analyzed over 1 million comments obtained from six social media platforms using a seven-stage context-aware natural language processing (NLP) approach to extract relevant keyphrases which were further categorized into broader themes. Our results revealed 34 negative themes and 20 positive themes surrounding the COVID-19 pandemic. We discussed the economic, socio-political, educational, and political issues and suggested interventions to tackle them based on the positive themes and research evidence. These interventions would inform and help governments, organizations, and individuals to minimize the spread and impact of COVID-19 and to respond effectively to future pandemics.

As part of future work, we will use the data generated from this work to train, evaluate, and compare machine learning models that detect the themes and corresponding sentiment of social media comments related to COVID-19 in real time. The best performing model(s) could be integrated with applications and visualization tools to provide useful and personalized features/interventions, as well as uncover real-time insights that could help to curb the spread of the virus and mitigate its impact on society.

Availability of Data and Material

Data and materials related to this research may be shared upon request and subject to approval by the research supervisor.

Code Availability

The source code related to this research may be shared upon request and subject to approval by the research supervisor.

References

Morens DM, Folkers GK, Fauci AS (2004) The challenge of emerging and re-emerging infectious diseases. Nature 430:242–249. https://doi.org/10.1038/nature02759
Jilani TN, Jamil RT, Siddiqui AH (2021) H1N1 Influenza. In: StatPearls. StatPearls Publishing, Treasure Island (FL)
Dawood FS, Iuliano AD, Reed C, Meltzer MI, Shay DK, Cheng PY, Bandaranayake D, Breiman RF, Brooks WA, Buchy P, Feikin DR, Fowler KB, Gordon A, Hien NT, Horby P, Huang QS, Katz MA, Krishnan A, Lal R, Montgomery JM, Mølbak K, Pebody R, Presanis AM, Razuri H, Steens A, Tinoco YO, Wallinga J, Yu H, Vong S, Bresee J, Widdowson MA (2012) Estimated global mortality associated with the first 12 months of 2009 pandemic influenza A H1N1 virus circulation: a modelling study. Lancet Infect Dis 12:687–695. https://doi.org/10.1016/S1473-3099(12)70121-4
World Health Organization (2019) HIV/AIDS. https://www.who.int/news-room/fact-sheets/detail/hiv-aids. Accessed 16 May 2020
Tian H, Liu Y, Li Y, Wu C-H, Chen B, Kraemer MUG, Li B, Cai J, Xu B, Yang Q, Wang B, Yang P, Cui Y, Song Y, Zheng P, Wang Q, Bjornstad ON, Yang R, Grenfell BT, Pybus OG, Dye C (2020) An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science 368:eabb6105. https://doi.org/10.1126/science.abb6105
Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A new coronavirus associated with human respiratory disease in China. Nature 579:265–269. https://doi.org/10.1038/s41586-020-2008-3
Johns Hopkins Coronavirus Resource Center COVID-19 map. https://coronavirus.jhu.edu/map.html. Accessed 16 May 2020
Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P (2008) Global trends in emerging infectious diseases. Nature 451:990–993. https://doi.org/10.1038/nature06536
Bloom DE, Black S, Rappuoli R (2017) Emerging infectious diseases: a proactive approach. Proc Natl Acad Sci U S A 114:4055–4059
Fan V, Jamison D, Summers L (2016) The inclusive cost of pandemic influenza risk. Natl Bur Econ Res. https://doi.org/10.3386/w22137
Barbier G, Liu H (2011) Data mining in social media. Social network data analytics. Springer, US, pp 327–352
Kemp S (2020) Digital 2020: global digital overview. https://datareportal.com/reports/digital-2020-global-digital-overview. Accessed 17 May 2020
Robinson P, Turk D, Jilka S, Cella M (2019) Measuring attitudes towards mental health using social media: investigating stigma and trivialisation. Soc Psychiatry Psychiatr Epidemiol 54:51–58. https://doi.org/10.1007/s00127-018-1571-5
Guntuku SC, Buffone A, Jaidka K, Eichstaedt J, Ungar L (2018) Understanding and measuring psychological stress using social media. In: Proceedings of the 13th international conference on web and social media, ICWSM 2019. Association for the Advancement of Artificial Intelligence, pp 214–225
Zhan Y, Etter J-F, Leischow S, Zeng D (2019) Electronic cigarette usage patterns: a case study combining survey and social media data. J Am Med Informatics Assoc 26:9–18. https://doi.org/10.1093/jamia/ocy140
Hassanpour S, Tomita N, DeLise T, Crosier B, Marsch LA (2019) Identifying substance use risk based on deep neural networks and Instagram social media data. Neuropsychopharmacology 44:487–494. https://doi.org/10.1038/s41386-018-0247-x
Huang Y, Huang D, Nguyen QC (2019) Census tract food tweets and chronic disease outcomes in the U.S., 2015–2018. Int J Environ Res Public Health 16:975. https://doi.org/10.3390/ijerph16060975
Oyebode O, Orji R (2019) Detecting factors responsible for diabetes prevalence in Nigeria using social media and machine learning. In: 15th international conference on network and service management (CNSM 2019). Institute of Electrical and Electronics Engineers Inc., pp 1–4
Chew C, Eysenbach G (2010) Pandemics in the age of Twitter: content analysis of tweets during the 2009 H1N1 outbreak. PLoS One 5:e14118. https://doi.org/10.1371/journal.pone.0014118
Signorini A, Segre AM, Polgreen PM (2011) The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS One 6:e19467. https://doi.org/10.1371/journal.pone.0019467
Oyebode O, Orji R (2019) Social media and sentiment analysis: the Nigeria presidential election 2019. In: 2019 IEEE 10th annual information technology, electronics and mobile communication conference, IEMCON 2019. Institute of Electrical and Electronics Engineers Inc., pp 140–146
Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. Proc Fourth Int AAAI Conf Weblogs Soc Media
Budiharto W, Meiliana M (2018) Prediction and analysis of Indonesia presidential election from Twitter using sentiment analysis. J Big Data 5. https://doi.org/10.1186/s40537-018-0164-1
Tjong Kim Sang E, Bos J (2012) Predicting the 2011 Dutch senate election results with Twitter. 13th Conf Eur Chapter Assoc Comput Linguist 65–72
Ma J, Tse YK, Wang X, Zhang M (2019) Examining customer perception and behaviour through social media research – an empirical study of the United Airlines overbooking crisis. Transp Res Part E Logist Transp Rev 127:192–205. https://doi.org/10.1016/j.tre.2019.05.004
Ibrahim NF, Wang X (2019) Decoding the sentiment dynamics of online retailing customers: time series analysis of social media. Comput Human Behav 96:32–45. https://doi.org/10.1016/j.chb.2019.02.004
Tissot HC, Asselbergs FW, Shah AD, Brealey D, Harris S, Agbakoba R, Folarin A, Romao L, Roguski L, Dobson R (2020) Natural language processing for mimicking clinical trial recruitment in critical care: a semi-automated simulation based on the LeoPARDS trial. IEEE J Biomed Heal Informatics 1–1. https://doi.org/10.1109/jbhi.2020.2977925
Vilic A, Petersen JA, Hoppe K, Sorensen HBD (2016) Visualizing patient journals by combining vital signs monitoring and natural language processing. In: Proceedings of the annual international conference of the IEEE Engineering in Medicine and Biology Society, EMBS. Institute of Electrical and Electronics Engineers Inc., pp 2529–2532
Grajales FJ, Sheps S, Ho K, Novak-Lauscher H, Eysenbach G (2014) Social media: a review and tutorial of applications in medicine and health care. J Med Internet Res 16:1–68. https://doi.org/10.2196/jmir.2912
Oyebode O, Ndulue C, Adib A, Mulchandani D, Suruliraj B, Orji FA, Chambers C, Meier S, Orji R (2021) Health, psychosocial, and social issues emanating from the COVID-19 pandemic based on social media comments: text mining and thematic analysis approach. JMIR Med Informatics 9:e22734. https://doi.org/10.2196/22734
Park A, Conway M (2017) Tracking health related discussions on Reddit for public health applications. AMIA Annu Symp proceedings AMIA Symp 2017:1362–1371
Jelodar H, Wang Y, Orji R, Huang H (2020) Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach. IEEE J Biomed Heal Informatics 1–1. https://doi.org/10.1109/jbhi.2020.3001216
Bekhuis T, Kreinacke M, Spallek H, Song M, O’Donnell JA (2011) Using natural language processing to enable in-depth analysis of clinical messages posted to an Internet mailing list: a feasibility study. J Med Internet Res 13:e98. https://doi.org/10.2196/jmir.1799
Paul MJ, Dredze M (2011) You are what you tweet: Analyzing twitter for public health. In Fifth International AAAI Conference on Weblogs and Social Media, pp. 265–272
Nobles AL, Dreisbach CN, Keim-Malpass J, Barnes LE (2018) “Is This an STD? Please Help!”: Online Information Seeking for Sexually Transmitted Diseases on Reddit. In Twelfth International AAAI Conference on Web and Social Media, pp. 660–663.
McNeill A, Harris PR, Briggs P (2016) Twitter Influence on UK vaccination and antiviral uptake during the 2009 H1N1 pandemic. Front Public Heal 4:26. https://doi.org/10.3389/fpubh.2016.00026
Agarwal A, Baechle C, Behara R, Zhu X (2018) A natural language processing framework for assessing hospital readmissions for patients with COPD. IEEE J Biomed Heal Informatics 22:588–596. https://doi.org/10.1109/JBHI.2017.2684121
Oyebode O, Alqahtani F, Orji R (2020) Using machine learning and thematic analysis methods to evaluate mental health apps based on user reviews. IEEE Access 8:111141–111158. https://doi.org/10.1109/ACCESS.2020.3002176
Dave K, Varma V (2010) Pattern based keyword extraction for contextual advertising. International conference on information and knowledge management, proceedings. ACM Press, New York, New York, USA, pp 1885–1888
Twitter Inc. Consuming streaming data. https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data. Accessed 20 May 2020
Google Inc. YouTube Data API. https://developers.google.com/youtube/v3. Accessed 20 May 2020
Nlife media corona virus panic/discussion thread - general discussion forum. https://www.pushsquare.com/forums/ps_general_discussion/corona_virus_panicdiscussion_thread. Accessed 2 Apr 2020
Archinect corona virus covid-19 and you | Forum. https://archinect.com/forum/thread/150187455/corona-virus-covid-19-and-you. Accessed 2 Apr 2020
Archinect COVID - 19 thread central | Forum | Archinect. https://archinect.com/forum/thread/150188615/covid-19-thread-central. Accessed 2 Apr 2020
Future US Inc. Coronavirus & epidemiology | Live Science forums. https://forums.livescience.com/forums/coronavirus-epidemiology.42/. Accessed 2 Apr 2020
Slang words dictionary. https://raw.githubusercontent.com/sifei/Dictionary-for-Sentiment-Analysis/master/slang/acrynom.csv. Accessed 19 Jun 2019
Slang lookup table. https://raw.githubusercontent.com/felipebravom/StaticTwitterSent/master/extra/SentiStrength/SlangLookupTable.txt. Accessed 19 Jun 2019
PyPI langdetect 1.0.8. https://pypi.org/project/langdetect/. Accessed 9 Aug 2020
Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003. Association for Computational Linguistics (ACL), Morristown, NJ, USA, pp 142–147
Asmuth JA, Gentner D (2005) Context sensitivity of relational nouns. In: Proceedings of the 27th annual meeting of the Cognitive Science Society. pp 163–168
Chesley P, Vincent B, Xu L, Srihari RK (2006) Using verbs and adjectives to automatically classify blog sentiment. Training 580:233
Santorini B (1990) Part-of-speech tagging guidelines for the Penn Treebank project (3rd revision). Technical Reports (CIS) :570. https://www.repository.upenn.edu/cgi/viewcontent.cgi?article=1603&context=cis_reports
Taylor A, Marcus M, Santorini B (2003) The Penn Treebank: an overview. Treebanks: building and using parsed corpora. Springer, Dordrecht, pp 5–22
nltk.tokenize package — NLTK 3.5 documentation. http://www.nltk.org/api/nltk.tokenize.html?highlight=tokenizer#nltk.tokenize.punkt.PunktSentenceTokenizer. Accessed 23 May 2020
Han P, Shen S, Wang D, Liu Y (2012) The influence of word normalization in English document clustering. In: CSAE 2012 - proceedings, 2012 IEEE international conference on computer science and automation engineering. pp 116–120
Liu B (2015) Sentence subjectivity and sentiment classification. In: Sentiment analysis: mining opinions, sentiments, and emotions. pp 70–88
Hutto CJ, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth international AAAI conference on weblogs and social media. pp 216–225
McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Medica 22:276–282. https://doi.org/10.11613/bm.2012.031
Blustein DL, Duffy R, Ferreira JA, Cohen-Scali V, Cinamon RG, Allan BA (2020) Unemployment in the time of COVID-19: a research agenda. J Vocat Behav. 119:103436
Kawohl W, Nordt C (2020) COVID-19, unemployment, and suicide. The Lancet Psychiatry 7:389–390
Fairlie R, Couch K, Xu H (2020) The impacts of COVID-19 on minority unemployment: first evidence from April 2020 CPS microdata. Cambridge, MA
Coibion O, Gorodnichenko Y, Weber M (2020) The cost of the Covid-19 crisis: lockdowns, macroeconomic expectations, and consumer spending. Cambridge, MA
Bartik AW, Bertrand M, Cullen ZB, Glaeser EL, Luca M, Stanton CT (2020) How are small businesses adjusting to COVID-19? Early evidence from a survey. Nat Bureau Econ Res 1–35. https://doi.org/10.3386/w26989
Didier T, Huneeus F, Larrain M, Schmukler SL (2021) Financing firms in hibernation during the COVID-19 pandemic. J Financial Stab 53:100837. https://doi.org/10.1016/j.jfs.2020.100837
Hevia C, Neumeyer PA (2020) A perfect storm: COVID-19 in emerging economies | VOX, CEPR Policy Portal. https://voxeu.org/article/perfect-storm-covid-19-emerging-economies. Accessed 21 Jun 2020
Baker S, Bloom N, Davis S, Kost K, Sammon M, Viratyosin T (2020) The unprecedented stock market impact of COVID-19. Natl Bur Econ Res. https://doi.org/10.3386/w26945
Nicola M, Alsafi Z, Sohrabi C, Kerwan A, Al-Jabir A, Iosifidis C, Agha M, Agha R (2020) The socio-economic implications of the coronavirus pandemic (COVID-19): a review. Int J Surg 78:185–193
Ranney ML, Griffeth V, Jha AK (2020) Critical supply shortages - the need for ventilators and personal protective equipment during the Covid-19 pandemic. N Engl J Med 382:E41
Iacobucci G (2020) Covid-19: lack of PPE in care homes is risking spread of virus, leaders warn. BMJ 368:m1280. https://doi.org/10.1136/bmj.m1280
Nogee D, Tomassoni A (2020) Concise communication: Covid-19 and the N95 respirator shortage: closing the gap. Infect Control Hosp Epidemiol 1–1. https://doi.org/10.1017/ice.2020.124
Goddard E (2020) The impact of COVID-19 on food retail and food service in Canada: preliminary assessment. Can J Agric Econ. https://doi.org/10.1111/cjag.12243
Wang Y, Han W, Pan L, Wang C, Liu Y, Hu W, Zhou H, Zheng X (2020) Impact of COVID‐19 on blood centres in Zhejiang province China. Vox Sang. https://doi.org/10.1111/vox.12931
Akinleye O, S. Dauda RO, Iwegub O, Popogbe OO (2020) Impact of COVID-19 pandemic on financial health and food security: a survey-based analysis. SSRN Electron J. https://doi.org/10.2139/ssrn.3619245
Suzumura T, Kanezashi H, Dholakia M, Ishii E, Napagao SA, Pérez-Arnal R, Garcia-Gasulla D (2020) The impact of COVID-19 on flight networks. In 2020 IEEE International Conference on Big Data 2443–2452. https://doi.org/10.1109/BigData50022.2020.9378218
Villa S, Lombardi A, Mangioni D, Bozzi G, Bandera A, Gori A, Raviglione MC (2020) The COVID-19 pandemic preparedness ... or lack thereof: from China to Italy. Glob Heal Med 2:73–77. https://doi.org/10.35772/ghm.2020.01016
Timmis K, Brüssow H (2020) The COVID-19 pandemic: some lessons learned about crisis preparedness and management, and the need for international benchmarking to reduce deficits. Environ Microbiol. https://doi.org/10.1111/1462-2920.15029
Dolan K, Wirtz AL, Moazen B, Ndeffo-mbah M, Galvani A, Kinner SA, Courtney R, McKee M, Amon JJ, Maher L, Hellard M, Beyrer C, Altice FL (2016) Global burden of HIV, viral hepatitis, and tuberculosis in prisoners and detainees. Lancet 388:1089–1102
Kinner SA, Young JT, Snow K, Southalan L, Lopez-Acuña D, Ferreira-Borges C, O’Moore É (2020) Prisons and custodial settings are part of a comprehensive response to COVID-19. Lancet Public Heal 5:e188–e189
Van Lancker W, Parolin Z (2020) COVID-19, school closures, and child poverty: a social crisis in the making. Lancet Public Heal 5:e243–e244
Mian A, Khan S (2020) Coronavirus: the spread of misinformation. BMC Med 18:89
Erku DA, Belachew SA, Abrha S, Sinnollareddy M, Thomas J, Steadman KJ, Tesfaye WH (2021) When fear and misinformation go viral: Pharmacists’ role in deterring medication misinformation during the ‘infodemic’ surrounding COVID-19. Res Soc Admin Pharm 17(1):1954–1963. https://doi.org/10.1016/j.sapharm.2020.04.032
Earnshaw VA, Katz IT (2020) Educate, amplify, and focus to address COVID-19 misinformation. JAMA Heal Forum 1:e200460–e200460. https://doi.org/10.1001/JAMAHEALTHFORUM.2020.0460
Laato S, Islam AKMN, Islam MN, Whelan E (2020) Why do people share misinformation during the COVID-19 pandemic? https://doi.org/10.1080/0960085X.2020.1770632
Nasir NM, Baequni B, Nurmansyah MI (2020) Misinformation related to COVID-19 in Indonesia. J Adm Kesehat Indones 8:51–59
Motta M, Stecula D, Farhart C (2020) How right-leaning media coverage of COVID-19 facilitated the spread of misinformation in the early stages of the pandemic in the U.S. Can J Polit Sci 1–8. https://doi.org/10.1017/S0008423920000396
Painter M, Qiu T (2020) Political beliefs affect compliance with COVID-19 social distancing orders. SSRN Electron J. https://doi.org/10.2139/ssrn.3569098
Grossman G, Kim S, Rexer J, Thirumurthy H (2020) Political partisanship influences behavioral responses to governors’ recommendations for COVID-19 prevention in the United States. SSRN Electron J. https://doi.org/10.2139/ssrn.3578695
Adolph C, Amano K, Bang-Jensen B, Fullman N, Wilkerson J (2020) Pandemic politics: timing state-level social distancing responses to COVID-19. medRxiv 2020.03.30.20046326. https://doi.org/10.1101/2020.03.30.20046326
Stuckler D, Basu S, Suhrcke M, Coutts A, McKee M (2009) The public health effect of economic crises and alternative policy responses in Europe: an empirical analysis. Lancet 374:315–323. https://doi.org/10.1016/S0140-6736(09)61124-7
Gentilini U, Almenfi M, Orton I, Dale P (2020) Social protection and jobs responses to COVID-19: a real-time review of country measures. WB, Washington DC
Eikenberry SE, Mancuso M, Iboi E, Phan T, Eikenberry K, Kuang Y, Kostelich E, Gumel AB (2020) To mask or not to mask: modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic. Infect Dis Model 5:293–308. https://doi.org/10.1016/j.idm.2020.04.001
Paul SK, Chowdhury P (2020) A production recovery plan in manufacturing supply chains for a high-demand item during COVID-19. Int J Phys Distrib Logist Manag. https://doi.org/10.1108/IJPDLM-04-2020-0127
Gereffi G (2020) What does the COVID-19 pandemic teach us about global value chains? The case of medical supplies. J Int Bus Policy 1–15. https://doi.org/10.1057/s42214-020-00062-w
Facebook Inc. Coronavirus (COVID-19) Information Center. https://www.facebook.com/coronavirus_info. Accessed 13 Jul 2020
John P, Wheeler S (2015) The digital classroom: harnessing technology for the future of learning and teaching. Routledge
Lancaster K, Rhodes T, Rosengarten M (2020) Making evidence and policy in public health emergencies: lessons from COVID-19 for adaptive evidence-making and intervention. Evid Policy A J Res Debate Pract. https://doi.org/10.1332/174426420x15913559981103

Acknowledgements

The authors would like to thank the DeepSense team at Dalhousie University, as well as Compute Canada, for providing the computing infrastructure used to perform our experiments and analysis.

Funding

This research was undertaken, in part, thanks to funding from the Canada Research Chairs Program. We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Discovery Grant.

Author information

Authors and Affiliations

Faculty of Computer Science, Dalhousie University, Halifax, NS, B3H 4R2, Canada
Oladapo Oyebode, Chinenye Ndulue, Dinesh Mulchandani, Banuchitra Suruliraj, Ashfaq Adib, Evangelos Milios, Stan Matwin & Rita Orji
Department of Computer Science, University of Saskatchewan, Saskatoon, SK, S7N 5C9, Canada
Fidelia Anulika Orji
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Stan Matwin

Contributions

Material preparation, data collection, and analysis were performed by Oladapo Oyebode. Chinenye Ndulue, Dinesh Mulchandani, Banuchitra Suruliraj, and Ashfaq Adib also participated in data collection and conducted the thematic analysis to categorize the keyphrases. The manuscript was written by Oladapo Oyebode and reviewed by the other authors. All authors proofread and approved the final manuscript.

Corresponding author

Correspondence to Oladapo Oyebode.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Oyebode, O., Ndulue, C., Mulchandani, D. et al. COVID-19 Pandemic: Identifying Key Issues Using Social Media and Natural Language Processing. J Healthc Inform Res 6, 174–207 (2022). https://doi.org/10.1007/s41666-021-00111-w

Received: 23 January 2021
Revised: 03 November 2021
Accepted: 01 December 2021
Published: 11 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s41666-021-00111-w
