Sentiment Analysis of COVID-19 Vaccines

Singh, Amritpal; Bhasin, Vandana; Jatana, Abhishek; Saxena, Naval; Rojal, Shivam

doi:10.1007/978-981-19-2535-1_2

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 471)

Abstract

Social media is widely used these days for exchanging information and views on global affairs, including the COVID-19 pandemic. In this study, we examine public attitudes in different countries towards COVID-19 vaccines using the social media platform Twitter. We applied the natural language processing technique of sentiment analysis to gain insight into people’s views, and we categorized our results into fine-grained polarities to capture the sentiment more precisely. For the analysis, we used tweets from all countries as well as tweets from four countries with high fatality rates: the United States of America, Mexico, Brazil and India. Overall, people expressed a neutral opinion towards the vaccines. The vaccines were also ranked by sentiment; people expressed the most faith in the Sputnik V and Covishield vaccines.

1 Introduction

The world has gone through tremendous turmoil with the outbreak of the pneumonia-like coronavirus disease. This unprecedented health crisis brought unforeseen lockdowns and had a marked effect on the economies of the world. The outbreak of the virus, SARS-CoV-2, started in December 2019 in Wuhan, Hubei Province, China, and became a public health emergency. In February 2020, the World Health Organization (WHO) named the disease COVID-19, and it subsequently declared it a global pandemic [1]. Two major factors drove the spread of the disease: its highly contagious nature, with numerous variants, and the public’s nonadherence to guidelines on wearing masks and maintaining social distance. The distress increased because many asymptomatic cases were carriers of the disease, so the infection kept spreading over time [2]. The coronavirus resource center of Johns Hopkins University indicates that the virus has infected over 243 million people all over the world and has taken the lives of more than 4.9 million people [3]. The outbreak became so difficult to handle that the only feasible solution in sight was to develop vaccines. Pfizer/BioNTech made the first vaccine that was approved for use amongst all communities. It was authorized for use by the people of the United Kingdom. The development process of the vaccines, their different varieties and the front-runner vaccine candidates are described in [4, 5]. Vaccination drives worldwide have been able to vaccinate 68.02 million people [3]. The Indian government has also achieved the feat of vaccinating 100 crore people.

As such, the governments’ only hope of controlling the viral spread was public acceptance of safe and efficacious vaccines. The vaccination process has been under tremendous pressure due to hesitancy, distrust and debate. WHO has named this public hesitancy towards vaccines as one of the top 10 global health threats. Combined with widespread rumors, it is the biggest obstacle to vaccinating enough of the population to achieve herd immunity [6, 7].

Traditionally, the opinions of people were gauged through surveys and similar methods. Although these traditional methods can measure the willingness of individuals to get vaccinated, they miss the dynamic nature of the sentiment. Hence, instead of applying traditional methods of data collection, we explored data from social media, so that public sentiments and attitudes could be gauged in real time [8]. This study is conducted to understand the general public's frame of mind towards vaccines using social media. About half of the world's population is on social media, and India alone has 448 million active social media users [9]. However, the data on social media are largely unstructured, and hence we have used natural language processing on tweets extracted from Twitter. We first took tweets from all over the world to perform sentiment analysis. Then we retrieved tweets for four countries where the fatality rates were high: India, the United States of America (US), Brazil and Mexico. Sentiment analysis was performed for these countries as well as for the entire dataset.

Sentiment analysis is a process in which subjective opinions are extracted from text, audio and video sources and categorized to determine polarities, subjectivities, feelings or states of mind towards target issues, themes or areas of relevance [10]. These approaches can be used in the medical arena as well as by governments for public policy research. The main contributions of this study can be summarized as follows.

To classify the sentiments of people around the globe, and in four countries (India, US, Mexico and Brazil) in particular, towards six COVID-19 vaccines: AstraZeneca, Sputnik V, Covishield, Covaxin, Moderna and Pfizer.
To perform word cloud mapping to monitor the most frequently used words.
To rank the vaccines according to the sentiment of the people.

2 Related Literature

Posts on social media express the views and opinions of the public in an unadulterated and unstructured form. Researchers have therefore been using these platforms extensively, as the unbiased opinion of the public is easily available. Twitter is amongst those social media platforms whose posts number in the millions. A lot of research has been carried out using Twitter datasets on different aspects of the COVID-19 pandemic.

Hussain et al. applied an artificial intelligence-based approach to 300,000 social media posts from Facebook and Twitter for the United Kingdom and the United States [8]. They used natural language processing to predict average sentiments and trends and to identify topics and themes. They identified the positive, negative and neutral sentiments for both countries and correlated their findings. They identified the optimism of the public about vaccines as well as views on their safety and economic viability. They also compared their results with nationwide surveys.

Glowacki et al. analyzed a Twitter dataset to examine addiction concerns during the COVID-19 pandemic in the US and other countries [11]. Their work focused on two keywords: covid and addiction. They were able to bring forth 14 topics prevalent during that time using 3301 tweets. The authors highlighted the public discussions on Twitter related to addiction for the consideration of health management authorities, but did not base their study on addictions due to COVID-19.

Sanders et al. performed an exploratory study of public sentiment towards the effectiveness of masks for the prevention of COVID-19 [12]. They performed this analysis using natural language processing, sentiment analysis and clustering. The clustering technique helped in identifying high-level themes and fifteen subtopics within each of these themes. They found that initially there were negative trends towards wearing masks in each of the themes. These trends can help government bodies gain deeper insight into public fears and address them appropriately.

Villavicencio et al. carried out a study in which they collected the sentiment of Filipinos from the social networking site Twitter [13]. They used a Naïve Bayes model to annotate and train their data for the English and Filipino languages using the RapidMiner data science software. They achieved an accuracy of 81.77% in classifying tweets into positive, negative and neutral polarity.

This work is closest to ours, but the authors performed it only for their own country, the Philippines, whereas we have targeted countries on the basis of their high death rates. In our work, we have analyzed 4000 live tweets for each of six vaccines from four countries, as well as textual data pertaining to all countries, to comprehend public opinion. The swing of the public mood towards vaccines would bring an important insight for governments, especially in countries where the fatalities have been very high.

3 Methodology

Researchers all over the world want to understand the erratic aspects of the COVID-19 pandemic. In our study, we have explored the sentiment of people towards COVID-19 vaccines. The workflow of our research methodology is shown in Fig. 1. We first collected 24,000 live English-language tweets from Twitter that were related to COVID-19 vaccines, initially without separating the tweets by country. We were interested in exploring six vaccines (Moderna, AstraZeneca, Sputnik V, Covishield, Pfizer and Covaxin), and for each vaccine we retrieved 4000 tweets.

In the first phase of data collection, we collected tweets on a global basis to obtain an overall perspective. Then, we retrieved tweets filtered by country to perceive cross-cultural polarity. The four countries that we selected were India, the United States, Mexico and Brazil, as the fatality rates in these countries were high.

The data were then preprocessed to collect hashtags required for sentiment analysis. Fine-grained sentiment analysis is then performed on the tweets to get different classification of the vaccines. Finally, the data were visualized using different representations. In the next section, we explain each step in detail that we have taken to perform the sentiment analysis.

3.1 Twitter Authentication

To retrieve data from Twitter, we extracted tweets through the Twitter API using Tweepy, a Python library for that API. This involved an authentication process in which a Twitter developer account was created. Tweepy was accessed using the Python programming language (version 3.7.3).

The authentication object was then invoked to carry out the authentication process. It takes two values, the access token and its corresponding token secret, which complete the authentication process. Figure 2 illustrates the Twitter authentication process using a flowchart. In the next section, we describe data collection.
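The authentication steps described here could be sketched roughly as follows. This is a minimal sketch rather than the authors' code: it assumes Tweepy's OAuth 1a handler (`tweepy.OAuthHandler` with `set_access_token`), and the environment-variable names and the helpers `load_credentials` and `build_api` are hypothetical.

```python
import os

# Hypothetical environment-variable names; the paper does not say how keys were stored.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Collect the four OAuth 1a values, failing early if any is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def build_api(credentials):
    """Create an authenticated Tweepy client from the developer-account keys."""
    import tweepy  # third-party; imported lazily so load_credentials works without it

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    # The access token and its secret complete the authentication.
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys outside the source code is a design choice of this sketch, not something the paper specifies.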

3.2 Data Collection

We collected tweets related to the vaccines from Twitter using hashtags associated with each vaccine. For each country, we collected 4000 live tweets for each vaccine, that is, 24,000 tweets per country. Since we collected tweets for four countries, this amounted to 96,000 tweets. In addition, we extracted 4000 live tweets for each of the six vaccines without restricting the country, giving a total of 120,000 tweets. We used the Tweepy library for mining the data. The hashtags related to the vaccines are listed below.

Moderna–#moderna, #modernavaccine
AstraZeneca–#AstraZeneca, #astrazenecavaccine, #oxfordvaccine
Sputnik V–#SputnikV, #sputnik, #SputnikLight
Covishield–#covishield, #covishieldvaccine, #covishieldsideeffects
Pfizer–#Pfizer, #PfizerVaccine
Covaxin–#Covaxin, #Covaxinvaccine
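As an illustration, the hashtag list above can be turned into search queries, and the tweet totals checked, with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, joining hashtags with the `OR` search operator is one plausible way to query them, and the country filtering step is left out because the paper does not specify how it was done.

```python
# Hashtags per vaccine, as listed in the text above.
VACCINE_HASHTAGS = {
    "Moderna": ["#moderna", "#modernavaccine"],
    "AstraZeneca": ["#AstraZeneca", "#astrazenecavaccine", "#oxfordvaccine"],
    "Sputnik V": ["#SputnikV", "#sputnik", "#SputnikLight"],
    "Covishield": ["#covishield", "#covishieldvaccine", "#covishieldsideeffects"],
    "Pfizer": ["#Pfizer", "#PfizerVaccine"],
    "Covaxin": ["#Covaxin", "#Covaxinvaccine"],
}

# Four countries plus the unrestricted ("World") collection.
REGIONS = ["India", "United States", "Mexico", "Brazil", "World"]
TWEETS_PER_VACCINE = 4000


def build_query(vaccine):
    """Join a vaccine's hashtags with OR so that any one of them matches."""
    return " OR ".join(VACCINE_HASHTAGS[vaccine])


def expected_total():
    """Tweets per vaccine x number of vaccines x number of regions."""
    return TWEETS_PER_VACCINE * len(VACCINE_HASHTAGS) * len(REGIONS)
```

With six vaccines and five regions, `expected_total()` reproduces the 120,000 tweets stated in the text.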

Next, we discuss the preprocessing of the extracted tweets.

3.3 Preprocessing Dataset

The datasets acquired from social media were raw and highly unstructured, so in this form they could not be fed to machine learning algorithms. We therefore prepared and cleaned the data. The data cleaning activities that we performed are as follows.

Removed stop words.
Removed HTML tags.
Removed special characters such as # and @ that normally add noise to text.
Tokenized the retrieved data.
Converted all words into their lemmas.
Standardized any accented characters into ASCII characters.
Converted all upper-case words into lower case to reduce feature set complexity.
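The cleaning steps listed above could be sketched with the Python standard library as follows. This is an illustrative sketch, not the authors' pipeline: the stop-word list is a tiny placeholder, the order of the steps is an assumption, and lemmatization is left as a pluggable `lemmatize` hook (identity by default) because the paper does not name the lemmatizer used.

```python
import html
import re
import unicodedata

# Tiny illustrative stop-word list (assumption); the paper does not say which list was used.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "and", "or", "of",
    "to", "in", "for", "on", "it", "this", "that", "i", "my",
}


def preprocess(tweet, lemmatize=lambda token: token):
    """Clean one tweet and return its list of tokens."""
    text = html.unescape(tweet)                # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    # Standardize accented characters into ASCII (e.g. "café" -> "cafe").
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()                        # lower-case everything
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # remove '#', '@' and other special characters
    tokens = text.split()                      # whitespace tokenization
    return [lemmatize(token) for token in tokens if token not in STOP_WORDS]
```

A real lemmatizer can be passed in through the `lemmatize` argument without changing the rest of the function.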

3.4 Sentiment Analysis

Sentiment analysis is the analysis of subjective judgments about different aspects of an entity. It allows those judgments to be extracted and analyzed [14]. As a machine learning process, it uses natural language processing so that the emotions of people can be understood through their written words [15]. Hence, it provides a computational distinction and classification of the opinion that the author of a text expresses about its subject.

Sentiment analysis is used to measure polarity and subjectivity. Subjectivity helps us to distinguish facts, opinions and desires, whereas polarity determines the positive, negative or neutral tone of an author in a particular data corpus. We performed this work using the Python library TextBlob to process the collected tweets. TextBlob processes textual data using natural language processing to determine the overall sentiment based on lexicons.

We used fine-grained polarity for the extracted tweets: the positive tweets were further classified into highly positive and weakly positive tweets, with polarity ranges of [0.5, 1] and (0, 0.5), respectively. Highly positive polarity is an indication that people were highly appreciative of COVID-19 vaccines and willingly got vaccinated. People with weakly positive polarity were those who were aware that vaccines would make them safe. Similarly, we classified the negative tweets into highly negative and weakly negative tweets, with polarity ranges of [−1, −0.5] and (−0.5, 0), respectively. Tweets that contained negative remarks and a refusal to get vaccinated were marked as weakly negative, and tweets in which people reported adverse side effects after vaccination were taken as highly negative.

The polarity of neutral tweets was taken as 0. Tweets where the user did not have a negative or a positive opinion about the vaccine were classified as neutral.
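The fine-grained classes described in this section could be expressed as a small mapping from a polarity score to a label. This is a sketch under assumptions: the negative ranges are taken to mirror the positive ones, that is [−1, −0.5] for highly negative and (−0.5, 0) for weakly negative, and the function names are hypothetical. The optional `tweet_sentiment` helper assumes the third-party TextBlob package and its `sentiment.polarity` attribute.

```python
def classify_polarity(polarity):
    """Map a polarity score in [-1, 1] to one of five fine-grained classes."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity >= 0.5:
        return "highly positive"   # [0.5, 1]
    if polarity > 0:
        return "weakly positive"   # (0, 0.5)
    if polarity == 0:
        return "neutral"           # exactly 0
    if polarity > -0.5:
        return "weakly negative"   # (-0.5, 0)
    return "highly negative"       # [-1, -0.5]


def tweet_sentiment(text):
    """Score a cleaned tweet with TextBlob and return its fine-grained class."""
    from textblob import TextBlob  # third-party; imported lazily

    return classify_polarity(TextBlob(text).sentiment.polarity)
```

Treating only an exact score of 0 as neutral follows the text; a small tolerance band around 0 would be an alternative design.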

We also created word clouds to visualize important words based on their occurrence.

4 Results

In our work, we analyzed live tweets for four different countries, as well as tweets from around the world, to understand the polarity of sentiment in each country and the overall sentiment of the world population. We analyzed and compared the tweets for the different countries and vaccines, as illustrated in Tables 1, 2, 3, 4, 5 and 6.

Table 1 Polarities for AstraZeneca for four countries and entire world

Table 2 Polarities for Covishield for four countries and entire world

Table 3 Polarities for Pfizer for four countries and entire world

Table 4 Polarities for Covaxin for four countries and entire world

Table 5 Polarities for Moderna for four countries and entire world

Table 6 Polarities for Sputnik V for four countries and entire world

4.1 Overall Sentiments

We have analyzed the sentiments of four countries (India, the United States, Mexico and Brazil), selected because they suffered some of the highest fatalities related to COVID-19, as well as the entire dataset covering countries all over the world. Table 1 shows the polarities for AstraZeneca; the numbers indicate a neutral sentiment for a large share of the population.

Similarly, Table 2 gives the polarities for the Covishield vaccine for the four countries and the world. Here, too, the sentiment of the people falls mostly in the neutral class.

Tables 3, 4, 5 and 6 depict the polarities computed using TextBlob for Pfizer, Covaxin, Moderna and Sputnik V, respectively.

We then constructed histograms to visualize the fine-grained polarities for all vaccines under examination. Figures 3, 4, 5, 6, 7 and 8 are the visual representations of the sentiments for AstraZeneca, Covishield, Pfizer, Covaxin, Moderna and Sputnik V, respectively. Figure 3 is a bar graph of the AstraZeneca polarities given in Table 1. It shows that the neutral sentiment is the highest for all countries, even when the weakly and highly positive polarities are combined. Similarly, Figs. 4, 5, 6, 7 and 8 represent the polarity data given in Tables 2, 3, 4, 5 and 6, respectively.

The results obtained for sentiment analysis using TextBlob have been represented using confusion matrices. Each confusion matrix was constructed using the Decision Tree classifier of TextBlob, for all six vaccines and their respective countries. Figure 9 illustrates the confusion matrix for the AstraZeneca vaccine along with accuracy scores for the different countries and the world. We have also tabulated the accuracies for the different vaccines in Table 7. Due to paucity of space, the results for all other vaccines are given in Appendix 1.
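For readers unfamiliar with the metric, a confusion matrix and its accuracy score can be computed from two lists of labels as sketched below. This is a generic illustration, not the authors' evaluation code: the paper does not state how the reference labels or the train/test split were obtained, and the function names and the toy labels in the usage note are hypothetical.

```python
from collections import Counter

LABELS = [
    "highly negative", "weakly negative", "neutral",
    "weakly positive", "highly positive",
]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For example, with four toy tweets of which three are predicted correctly, `accuracy(confusion_matrix(actual, predicted))` evaluates to 0.75.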

Table 7 Accuracy scores for all countries and the world

4.2 WordCloud

A WordCloud was generated for each of three polarities: the highly positive and weakly positive tweets were clubbed together to generate the positive cloud, and the highly negative and weakly negative tweets were combined to generate the negative cloud. WordClouds were generated for all vaccines and their related sentiments for the different countries. Figures 10, 11 and 12 show the WordClouds for AstraZeneca, Covishield and Covaxin, respectively, for Mexico.

The WordCloud helps us visualize different occurrences of a word. The word whose frequency is higher is highlighted and categorized by different sizes for different scores [15, 16]. The words that reflected the highly positive sentiment included Pfizer, vaccine, good, Covaxin, like, safe, dos, get, variant and so on. The WordCloud for other vaccines has been given in Appendix 2 for further reference.
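The word-frequency step behind these clouds could be sketched as follows. The grouping of the five fine-grained classes into three clouds follows the text; everything else is an assumption for illustration, including the helper names and the use of the third-party `wordcloud` package (its `WordCloud.generate_from_frequencies` and `to_file` methods) for rendering.

```python
from collections import Counter

# Fine-grained classes are merged into three clouds, as described in the text.
COARSE = {
    "highly positive": "positive",
    "weakly positive": "positive",
    "neutral": "neutral",
    "weakly negative": "negative",
    "highly negative": "negative",
}


def word_frequencies(labelled_tokens):
    """Count tokens per coarse polarity.

    labelled_tokens: iterable of (fine_grained_label, list_of_tokens) pairs.
    """
    freqs = {"positive": Counter(), "neutral": Counter(), "negative": Counter()}
    for label, tokens in labelled_tokens:
        freqs[COARSE[label]].update(tokens)
    return freqs


def render_cloud(frequencies, path):
    """Draw one cloud; word size grows with frequency. Needs a non-empty mapping."""
    from wordcloud import WordCloud  # third-party; imported lazily

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies).to_file(path)
```

Counting first and rendering second keeps the frequencies available for inspection, for instance with `Counter.most_common`.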

4.3 Ranking of Vaccines

We also used the tweets that we had categorized as positive to rank the popularity of the different vaccines. This ranking is based only on the tweets that we extracted. The percentage of positive tweets for the different vaccines is shown in Fig. 13, and we used this percentage to rank the vaccines in order of their popularity. In our dataset, the public favors the Sputnik V and Covishield vaccines, as can be seen in Fig. 14.
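The ranking by percentage of positive tweets could be computed as in the sketch below. The function names are hypothetical, and the counts in the example use placeholder vaccine names with made-up numbers purely to show the arithmetic; they are not results from the paper.

```python
def positive_share(counts):
    """Percentage of tweets that are highly or weakly positive."""
    total = sum(counts.values())
    positive = counts.get("highly positive", 0) + counts.get("weakly positive", 0)
    return 100.0 * positive / total if total else 0.0


def rank_vaccines(counts_by_vaccine):
    """Return (vaccine, positive percentage) pairs, most positive first."""
    shares = {name: positive_share(c) for name, c in counts_by_vaccine.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Made-up counts for two placeholder vaccines (illustration only).
example_counts = {
    "Vaccine A": {"highly positive": 10, "weakly positive": 30,
                  "neutral": 50, "weakly negative": 10},
    "Vaccine B": {"highly positive": 5, "weakly positive": 15,
                  "neutral": 70, "highly negative": 10},
}
example_ranking = rank_vaccines(example_counts)
```

Here `example_ranking` places "Vaccine A" (40% positive) ahead of "Vaccine B" (20% positive).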

5 Discussion

With the onset of this disease, the world entered a continuous phase of lockdowns and disruptions. The vaccines put forth by the governments were the only solace in this scenario. The vaccines have been rolled out in every country, and it is worthwhile to examine their role in fighting the disease. Hence, we have carried out sentiment analysis of people's attitudes towards different vaccines.

We have categorized the sentiment towards the vaccines into fine-grained polarities and compared it across different countries and the world overall. The classification accuracy using the decision tree classifier for the AstraZeneca vaccine was 100% for Mexico, 94.26% for India, 98.7% for the United States and 95.5% for Brazil, with similar results for the other vaccines. We found that although many negative theories about the vaccines are in circulation, people still have a neutral opinion about them. Although highly positive tweets were few compared with weakly positive and neutral tweets, the general sentiment seems to be in favor of vaccines. This study was limited in that we examined only English-language tweets. Another limitation is that only live tweets were scrutinized; tweets could also be studied over a period of time. This study could be very beneficial to governments in building policies to handle such global health crises.

References

1. Wang C, Horby PW, Hayden FG, Gao GF (2020) A novel coronavirus outbreak of global health concern. The Lancet 395(10223):470–473
2. Nishiura H, Tetsuro K, Takeshi M, Ayako S, Sung-mok J, Katsuma H, Ryo K (2020) Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Int J Infect Dis 94:154
3. Johns Hopkins University and Medicine. 26 10 2021. https://coronavirus.jhu.edu/map.html
4. Krammer F (2020) SARS-CoV-2 vaccines in development. Nature 586(7830):516–527
5. Forni G, Alberto M (2021) COVID-19 vaccines: where we stand and challenges ahead. Cell Death Differ 28(2):626–639
6. Lazarus JV, Ratzan SC, Palayew A, Gostin LO, Larson HJ, Rabin K, Kimbals S, El-Mohandes A (2021) A global survey of potential acceptance of a COVID-19 vaccine. Nat Med 27(2):225–228
7. Lane S, MacDonald NE, Marti M, Dumolard L (2018) Vaccine hesitancy around the globe: analysis of three years of WHO/UNICEF Joint Reporting Form data-2015–2017. Vaccine 36(26):3861–3867
8. Hussain A, Ahsen T, Zain H, Zakariya S, Kia MGD, Azhar A, Aziz S (2021) Artificial intelligence–enabled analysis of public attitudes on facebook and twitter toward covid-19 vaccines in the united kingdom and the united states: observational study. J Med Internet Res 23(4):e26627
9. Statista 27 10 2021. https://www.statista.com/statistics/284436/india-social-network-penetration/
10. Dashtipour K, Mandar G, Jingpeng L, Fengling J, Bin K, Amir H (2020) A hybrid Persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks. Neurocomputing 380:1–10
11. Glowacki EM, Wilcox GB, Glowacki JB (2021) Identifying #addiction concerns on twitter during the COVID-19 pandemic: A text mining analysis. Substance Abuse 42(1):39–46
12. Sanders AC, White RC, Severson LS, Ma R, McQueen R, Paulo HC, Zhang Y, Erickson JS, Bennett KP (2021) Unmasking the conversation on masks: natural language processing for topical sentiment analysis of COVID-19 Twitter discourse. medRxiv, pp 2020–2028
13. Villavicencio C, Macrohon JJ, Inbaraj XA, Jyh-Horng J, Jer-Guang H (2021) Twitter sentiment analysis towards COVID-19 vaccines in the Philippines using Naïve Bayes. Information 12(5):204
14. Soleymani M, David G, Brendan J, Björn S, Shih-Fu C, Maja P (2017) A survey of multimodal sentiment analysis. Image Vis Comput 65:3–14
15. Li J, Eduard H (2017) A practical guide to sentiment analysis. In: Reflections on sentiment/opinion analysis. Springer, Cham, pp 41–59
16. Agarwal A, Boyi X, Ilia V, Owen R, Passonneau RJ (2011) Sentiment analysis of twitter data. In: Proceedings of the workshop on language in social media (LSM 2011), Portland, Oregon

Author information

Authors and Affiliations

Lal Bahadur Shastri Institute of Management, Delhi, India
Amritpal Singh, Vandana Bhasin, Abhishek Jatana, Naval Saxena & Shivam Rojal

Corresponding author

Correspondence to Vandana Bhasin.

Editor information

Editors and Affiliations

Department of Computer Science Engineering, Maharaja Agrasen Institute of Technology, Rohini, Delhi, India
Deepak Gupta
Department of Computer Science Engineering, Maharaja Agrasen Institute of Technology, Rohini, Delhi, India
Ashish Khanna
Rajnagar Mahavidyalaya, Birbhum, West Bengal, India
Siddhartha Bhattacharyya
Department of Information Technology, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Department of Computer Science, Sukhdev College of Business Studies, University of Delhi, Delhi, India
Sameer Anand
Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi, India
Ajay Jaiswal

Appendices

Appendix 1

The accuracy scores and confusion matrix for Covishield, Covaxin, Moderna, Sputnik V and Pfizer vaccines with their respective countries are shown in Tables 8, 9, 10, 11 and 12, respectively.

Table 8 Covishield confusion matrix

Table 9 Covaxin confusion matrix

Table 10 Moderna confusion matrix

Table 11 Sputnik V confusion matrix

Table 12 Pfizer confusion matrix

Appendix 2

See Figs. 16–36.

Cite this paper

Singh, A., Bhasin, V., Jatana, A., Saxena, N., Rojal, S. (2023). Sentiment Analysis of COVID-19 Vaccines. In: Gupta, D., Khanna, A., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, vol 471. Springer, Singapore. https://doi.org/10.1007/978-981-19-2535-1_2

DOI: https://doi.org/10.1007/978-981-19-2535-1_2
Published: 23 September 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2534-4
Online ISBN: 978-981-19-2535-1
eBook Packages: Engineering (R0)