1 Introduction

The world has gone through tremendous turmoil with the outbreak of the pneumonia-like coronavirus disease. This unprecedented health crisis has brought unplanned lockdowns and has had a marked effect on the economies of the world. In December 2019, the outbreak of the SARS-CoV-2 virus began in Wuhan, Hubei Province, China, and it soon became a public health emergency. In February 2020, the World Health Organization (WHO) named the disease COVID-19, and it subsequently declared a global pandemic [1]. The major reasons for the spread of the disease were its highly contagious nature and its numerous variants, along with the public's nonadherence to guidelines on wearing masks and maintaining social distance. The distress increased because many asymptomatic cases were carriers of the disease and spread the infection over time [2]. The Coronavirus Resource Center of Johns Hopkins University indicates that the virus has infected over 243 million people worldwide and has taken the lives of more than 4.9 million people [3]. The outbreak became so difficult to handle that the only feasible solution was to develop vaccines. Pfizer/BioNTech made the first vaccine that was approved for use amongst all communities; it was authorized for use in the United Kingdom. The development process of the vaccines, their different varieties and the front-runner vaccine candidates are described in [4, 5]. Vaccination drives worldwide have been able to vaccinate 68.02 million people [3]. The Indian government has also achieved the feat of vaccinating 100 crore (1 billion) people.

As such, the governments' only hope of controlling the viral spread was public acceptance of safe and efficacious vaccines. The vaccination process has been under tremendous pressure due to hesitancy, distrust and debate. WHO has named this public hesitancy towards vaccines as one of the top 10 global health threats. Combined with widespread rumors, it is the biggest obstacle to achieving mass vaccination of the population and, through it, herd immunity [6, 7].

Public opinion has traditionally been gauged through surveys and similar methods. Although these traditional methods can measure the willingness of individuals to get vaccinated, they miss the dynamic nature of the sentiment. Hence, instead of applying the traditional methods of data collection, we have explored data from social media, so that public sentiments and attitudes can be gauged in real time [8]. This study is conducted to understand the general public's psychological frame of mind towards vaccines using social media. Half of the world's population is on social media, and India alone has 448 million active social media users [9]. However, the data on social media are largely unstructured, and hence we have used natural language processing to analyze tweets extracted from Twitter. We first took tweets from all over the world to perform sentiment analysis. We then retrieved tweets for four countries where the fatality rates were high: India, the United States of America (US), Brazil and Mexico. Sentiment analysis was performed for each of these countries as well as for the entire dataset.

Sentiment analysis is a process in which subjective opinions are extracted from text, audio and video sources and categorized to determine polarities and subjectivities, feelings, or states of mind towards target issues, themes or areas of relevance [10]. These approaches can be used in the medical arena as well as by governments for public policy research. The main contributions of this study can be summarized as follows.

  • To classify the sentiments of people towards six COVID-19 vaccines (AstraZeneca, Sputnik V, Covishield, Covaxin, Moderna and Pfizer), both around the globe and in four countries (India, US, Mexico and Brazil).

  • To perform word cloud mapping to monitor the frequency of highly used words.

  • To rank the vaccines according to the sentiment of the people.

2 Related Literature

Posts on social media express the views and opinions of the public in an unadulterated and unstructured form. Researchers have therefore been using these platforms extensively, as the unbiased opinion of the public is easily available. Twitter is amongst the social media platforms that receive millions of posts. A lot of research work has been carried out using Twitter datasets on different areas of the COVID-19 pandemic.

The authors of [8] applied an artificial intelligence-based approach to 300,000 social media posts from Facebook and other platforms for the United Kingdom and the United States. They used natural language processing to predict average sentiments and trends and to identify the underlying topics and themes. They identified the positive, negative and neutral sentiments for both countries and correlated their findings. They identified the optimism of the public related to vaccines as well as their safety and economic viability. They also compared their results with nationwide surveys.

Glowacki et al. analyzed a Twitter dataset to examine addiction concerns during the COVID-19 pandemic in the US and other countries [11]. Their work focused on two keywords, covid and addiction. They were able to bring forth 14 topics prevalent during that time using 3301 tweets. The authors highlighted the public discussions on Twitter related to addiction for consideration by health management authorities, but did not base their study on addictions due to COVID-19.

The authors of [12] performed an exploratory study of public sentiment towards the effectiveness of masks for the prevention of COVID-19. They performed this analysis using natural language processing, sentiment analysis and clustering. The clustering technique helped in classifying high-level themes and fifteen subtopics within each of these themes. They found that initially there were negative trends towards wearing masks in each of the themes. These trends indicate where government bodies can gain deeper insight into public fears and address them appropriately.

The authors of [13] collected the sentiment of Filipinos from the social networking site Twitter. They used a Naïve Bayes model to annotate and train their data for the English and Filipino languages using the RapidMiner data science software. They achieved an accuracy of 81.77% in classifying tweets into positive, negative and neutral polarity.

This work is the closest to ours, but the authors performed it only for their own country, the Philippines, whereas we have targeted countries on the basis of their high death rates. In our work, we have analyzed 4000 live tweets for each of six vaccines from four countries, as well as textual data pertaining to all countries, to comprehend public opinion. The swing of the public mood towards vaccines would provide an important insight for governments, especially in countries where the fatalities have been very high.

3 Methodology

Researchers all over the world want to understand the erratic aspects of the COVID-19 pandemic. In our study, we have explored the sentiment of people towards COVID-19 vaccines. The workflow of our research methodology is shown in Fig. 1. We first collected 24,000 live tweets in English from Twitter that were related to COVID-19 vaccines, initially without separating the tweets by country. We were interested in exploring six vaccines (Moderna, AstraZeneca, Sputnik V, Covishield, Pfizer and Covaxin), and for each vaccine we retrieved 4000 tweets.

Fig. 1 Different phases of sentiment analysis in our study

In the first phase of data collection, we collected tweets on a global basis to obtain an overall perspective. Then, we retrieved tweets filtered by country to perceive cross-cultural polarity. The four countries that we selected were India, the United States, Mexico and Brazil, as the fatality rates in these countries were high.

The data were then preprocessed after collecting the hashtags required for sentiment analysis. Fine-grained sentiment analysis was then performed on the tweets to obtain the different polarity classes for each vaccine. Finally, the data were visualized using different representations. In the following subsections, we explain in detail each step that we have taken to perform the sentiment analysis.

3.1 Twitter Authentication

To retrieve data from Twitter, we extracted tweets through the Twitter API using the Tweepy library. This involved an authentication process wherein a Twitter developer account was created. Tweepy was accessed using the Python programming language (version 3.7.3).

The authentication object was subsequently invoked to carry out the authentication process. It fetched two values, the access token and its corresponding token secret, which completed the authentication process. Figure 2 illustrates the Twitter authentication process using a flowchart. In the next section, we describe data collection.

Fig. 2 Steps to create a Twitter authentication developer account
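The authentication step can be sketched as follows. This is only an illustration under stated assumptions: the environment-variable names and the helper functions `load_credentials` and `build_api` are hypothetical and do not come from the study, and the consumer key and secret are included because Tweepy's OAuth 1.0a handler expects them alongside the access token and token secret.

```python
import os

# Environment-variable names are assumptions made for this sketch.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Read the four OAuth 1.0a values, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def build_api(credentials):
    """Create an authenticated Tweepy client from the loaded credentials."""
    import tweepy  # imported lazily so load_credentials works without Tweepy

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the secrets in the environment rather than in the script is a design choice of this sketch, not something the study reports.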

3.2 Data Collection

To extract tweets related to vaccines from Twitter, we collected hashtags related to the various vaccines. For each country, we collected 4000 live tweets for each vaccine, which implies that 24,000 tweets were collected per country. Since we collected tweets for four countries, the total number of country-level tweets was 96,000. In addition, we extracted tweets for all countries, again 4000 live tweets for each of the six vaccines; hence a total of 120,000 tweets were extracted. We used the Tweepy library for mining the data. The hashtags related to the vaccines are listed below.

  • Moderna–#moderna, #modernavaccine

  • AstraZeneca–#AstraZeneca, #astrazenecavaccine, #oxfordvaccine

  • Sputnik V–#SputnikV, #sputnik, #SputnikLight

  • Covishield–#covishield, #covishieldvaccine, #covishieldsideeffects

  • Pfizer–#Pfizer, #PfizerVaccine

  • Covaxin–#Covaxin, #Covaxinvaccine
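The hashtag lists and the tweet counts described above can be written down as a small sketch. The hashtags, countries and per-vaccine count come from the text; the function names `build_query` and `expected_totals`, and the choice to join hashtags with `OR`, are assumptions made for illustration and may differ from how the queries were actually issued.

```python
VACCINE_HASHTAGS = {
    "Moderna": ["#moderna", "#modernavaccine"],
    "AstraZeneca": ["#AstraZeneca", "#astrazenecavaccine", "#oxfordvaccine"],
    "Sputnik V": ["#SputnikV", "#sputnik", "#SputnikLight"],
    "Covishield": ["#covishield", "#covishieldvaccine", "#covishieldsideeffects"],
    "Pfizer": ["#Pfizer", "#PfizerVaccine"],
    "Covaxin": ["#Covaxin", "#Covaxinvaccine"],
}
COUNTRIES = ["India", "United States", "Mexico", "Brazil"]
TWEETS_PER_VACCINE = 4000


def build_query(vaccine):
    """Join a vaccine's hashtags with OR (an assumed query style)."""
    return " OR ".join(VACCINE_HASHTAGS[vaccine])


def expected_totals():
    """Return (tweets per scope, tweets for all countries, grand total)."""
    per_scope = len(VACCINE_HASHTAGS) * TWEETS_PER_VACCINE  # 6 x 4000
    by_country = per_scope * len(COUNTRIES)                 # 4 countries
    return per_scope, by_country, by_country + per_scope    # plus worldwide
```

Calling `expected_totals()` reproduces the 24,000, 96,000 and 120,000 figures quoted in the text.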

Next, we discuss the preprocessing of the extracted tweets.

3.3 Preprocessing Dataset

The datasets acquired from social media were raw and highly unstructured, so they could not be fed to machine learning algorithms in this form. We therefore prepared and cleaned the data. The data cleaning activities that we performed are as follows.

  • Removal of stop words.

  • Removal of HTML tags.

  • Removal of special characters like hash, @ that normally add noise to text.

  • Tokenization of the retrieved data.

  • Conversion of all words into their lemmas.

  • Standardization of any accented characters into ASCII characters.

  • Conversion of all upper-case words into lower case so that feature set complexity is reduced.
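The cleaning steps above can be sketched with the Python standard library. This is a minimal illustration, not the pipeline used in the study: the stop-word list is a tiny placeholder, the order of the steps is an assumption, and lemmatization is left as an optional callable (for example, one backed by a third-party lemmatizer) because the text does not say which lemmatizer was used.

```python
import html
import re
import unicodedata

# A tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of",
              "the", "to", "was"}


def clean_tweet(text, lemmatize=None):
    """Return the cleaned tokens of one tweet."""
    text = html.unescape(text)                   # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)         # remove HTML tags
    text = unicodedata.normalize("NFKD", text)   # split accents from letters
    text = text.encode("ascii", "ignore").decode("ascii")  # keep ASCII only
    text = text.lower()                          # lower-case everything
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # drop '#', '@' and the like
    tokens = text.split()                        # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    if lemmatize is not None:                    # optional, caller-supplied
        tokens = [lemmatize(t) for t in tokens]
    return tokens
```

For instance, `clean_tweet("<b>Café</b> is SAFE &amp; the #Covaxin works!")` yields the tokens `cafe`, `safe`, `covaxin` and `works`.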

3.4 Sentiment Analysis

Sentiment analysis is the analysis of subjective judgments about different aspects of an entity. It allows those judgments to be extracted and analyzed [14]. As a machine learning process, it uses natural language processing so that the emotions of people can be understood through their written words [15]. It thus provides a computational way to distinguish and classify the opinion that the author of a text expresses about its subject.

Sentiment analysis is used to measure polarity and subjectivity. Subjectivity helps us to distinguish facts, opinions and desires, whereas polarity determines the positive, negative or neutral tone of an author in a particular data corpus. We performed this work using the Python library TextBlob to process the collected tweets. TextBlob processes textual data using natural language processing to determine the overall sentiment based on lexicons.

We have used fine-grained polarity for the extracted tweets: positive tweets have been further classified into highly positive and weakly positive tweets, with polarity ranges of [0.5, 1] and (0, 0.5), respectively. Highly positive polarity indicates that people were highly appreciative of COVID-19 vaccines and willingly got vaccinated. Weakly positive polarity indicates people who were aware that vaccines would make them safe. Similarly, we have classified the negative tweets into highly negative and weakly negative tweets, with polarity ranges of [−1, −0.5] and (−0.5, 0), respectively. Tweets that contained negative remarks and a refusal to get vaccinated were marked as weakly negative, and tweets in which people reported adverse side effects after vaccination were taken as highly negative.

The polarity of neutral tweets was taken as 0. Tweets where the user did not have a negative or a positive opinion about the vaccine were classified as neutral.
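The five polarity classes can be expressed as a simple binning function. This is a sketch under one assumption: the negative ranges are taken to mirror the positive ones, that is [−1, −0.5] for highly negative and (−0.5, 0) for weakly negative. The function names are hypothetical; `TextBlob(text).sentiment.polarity` is the TextBlob call that returns a score in [−1, 1].

```python
def polarity_label(polarity):
    """Map a polarity score in [-1, 1] to one of five classes."""
    if polarity >= 0.5:
        return "highly positive"   # [0.5, 1]
    if polarity > 0:
        return "weakly positive"   # (0, 0.5)
    if polarity == 0:
        return "neutral"           # exactly 0
    if polarity > -0.5:
        return "weakly negative"   # (-0.5, 0), assumed mirror of positive
    return "highly negative"       # [-1, -0.5], assumed mirror of positive


def tweet_polarity(text):
    """Score one tweet with TextBlob's lexicon-based sentiment."""
    from textblob import TextBlob  # third-party; imported lazily

    return TextBlob(text).sentiment.polarity
```

A tweet would then be labelled with `polarity_label(tweet_polarity(text))`.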

We also created word clouds to visualize important words based on their occurrence.

4 Results

In our work, we collected live tweets to analyze the polarity of sentiment in four different countries and the overall sentiment of people throughout the world. We have analyzed and compared the tweets from the different countries for the different vaccines, as illustrated in Tables 1, 2, 3, 4, 5 and 6.

Table 1 Polarities for AstraZeneca for four countries and entire world
Table 2 Polarities for Covishield for four countries and entire world
Table 3 Polarities for Pfizer for four countries and entire world
Table 4 Polarities for Covaxin for four countries and entire world
Table 5 Polarities for Moderna for four countries and entire world
Table 6 Polarities for Sputnik V for four countries and entire world

4.1 Overall Sentiments

We have analyzed the sentiments of four countries (India, United States, Mexico and Brazil), chosen because they have suffered the highest fatalities related to COVID-19, as well as the entire dataset covering countries all over the world. Table 1 shows the polarities for AstraZeneca; the numbers clearly indicate a neutral sentiment for a large share of the population.

Similarly, Table 2 gives the polarities for the Covishield vaccine for the four countries and the world. Here too, the sentiment of the people leans towards neutral.

Tables 3, 4, 5 and 6 depict the polarities computed using TextBlob for Pfizer, Covaxin, Moderna and Sputnik V, respectively.

We then constructed histograms to help us visualize the fine-grained polarities for all vaccines under examination. Figures 3, 4, 5, 6, 7 and 8 are the visual representations of the sentiments for AstraZeneca, Covishield, Pfizer, Covaxin, Moderna and Sputnik V, respectively. The data in Table 1, which represent the polarities for the AstraZeneca vaccine, were used to create the bar graph shown in Fig. 3. It can be seen in Fig. 3 that the neutral sentiment is the highest for all countries, even when the weakly and highly positive polarities are combined. Similarly, Figs. 4, 5, 6, 7 and 8 represent the polarity data given in Tables 2, 3, 4, 5 and 6, respectively.

Fig. 3 Visual representation of polarities for AstraZeneca for four countries and the entire world

Fig. 4 Visual representation of polarities for Covishield for four countries and the entire world

Fig. 5 Visual representation of polarities for Pfizer for four countries and the entire world

Fig. 6 Visual representation of polarities for Covaxin for four countries and the entire world

Fig. 7 Visual representation of polarities for Moderna for four countries and the entire world

Fig. 8 Visual representation of polarities for Sputnik V for four countries and the entire world

The results obtained for sentiment analysis using TextBlob have been represented using confusion matrices. Each confusion matrix was constructed using the Decision Tree classifier of TextBlob, for all six vaccines and their respective countries. Figure 9 illustrates the confusion matrix for the AstraZeneca vaccine along with the accuracy scores for the different countries and the world. We have also tabulated the accuracies for the different vaccines in Table 7. Owing to limited space, the results for all other vaccines are given in Appendix 1.
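How an accuracy score follows from a confusion matrix can be shown with a short sketch. It is illustrative only: the text does not state which labels appear in the matrices or how the reference labels were obtained, so the five-class label list and both function names are assumptions.

```python
from collections import Counter

# Assumed label set; the study may have used fewer classes in its matrices.
LABELS = ["highly negative", "weakly negative", "neutral",
          "weakly positive", "highly positive"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Rows are reference labels, columns are predicted labels."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of tweets on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

An accuracy of 100% therefore corresponds to a matrix whose off-diagonal entries are all zero.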

Fig. 9 AstraZeneca confusion matrix

Table 7 Accuracy scores for all countries and the world

4.2 WordCloud

WordClouds have been generated for three polarities: the highly positive and weakly positive tweets have been combined to generate the positive cloud, and the highly negative and weakly negative tweets have been combined to generate the negative cloud. The WordClouds have been generated for all vaccines and their related sentiments for the different countries. Figures 10, 11 and 12 show the WordClouds for AstraZeneca, Covishield and Covaxin for Mexico, respectively.

Fig. 10 WordCloud for AstraZeneca in Mexico

Fig. 11 WordCloud for Covishield in Mexico

Fig. 12 WordCloud for Covaxin in Mexico

The WordCloud helps us visualize how often each word occurs. Words with a higher frequency are highlighted by being drawn in a larger size [15, 16]. The words that reflected the highly positive sentiment included Pfizer, vaccine, good, Covaxin, like, safe, dos, get, variant and so on. The WordClouds for the other vaccines are given in Appendix 2 for further reference.
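The frequency counting behind a word cloud can be sketched as follows. The helper names and the image settings are assumptions for illustration; `WordCloud.generate_from_frequencies` and `to_file` are calls from the third-party `wordcloud` package, which is imported lazily so that the counting step runs without it.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each token occurs across a set of cleaned tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def render_cloud(frequencies, path):
    """Draw the cloud; more frequent words are rendered larger."""
    from wordcloud import WordCloud  # third-party; imported lazily

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
```

One cloud per polarity would be obtained by calling `word_frequencies` separately on the positive, negative and neutral tweets.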

4.3 Ranking of Vaccines

We also used the tweets categorized as positive to rank the popularity of the different vaccines. This ranking is based only on the tweet dataset that we collected. The percentage of positive tweets for the different vaccines is shown in Fig. 13, and we have used this percentage to rank the vaccines in order of their popularity. The public clearly favors the Sputnik V and Covishield vaccines, as can be seen in Fig. 14.
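The ranking step can be sketched as a percentage computation followed by a sort. This assumes that highly positive and weakly positive tweets are counted together as positive, which the text does not state explicitly for the ranking; the function names are hypothetical.

```python
def positive_share(label_counts):
    """Percentage of tweets labelled highly or weakly positive."""
    total = sum(label_counts.values())
    positive = (label_counts.get("highly positive", 0)
                + label_counts.get("weakly positive", 0))
    return 100.0 * positive / total if total else 0.0


def rank_vaccines(counts_by_vaccine):
    """Sort vaccines from the highest to the lowest positive share."""
    shares = {name: positive_share(counts)
              for name, counts in counts_by_vaccine.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

The input would be one dictionary of label counts per vaccine, taken from the polarity tables.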

Fig. 13 Ranking of vaccines on the basis of positive tweets collected

Fig. 14 Visual representation of the rankings of the vaccine

5 Discussion

With the onset of this disease, the world entered a continuous phase of lockdowns and disruptions. The vaccines put forth by the governments were the only solace in this scenario. Vaccines have been rolled out in every country, and it is worthwhile to examine their role in fighting the disease. Hence, we have carried out sentiment analysis of people's attitudes towards the different vaccines.

We have categorized the tweets about each vaccine into fine-grained polarities and compared the sentiment across the different countries and the world overall. The classification accuracy using the decision tree classifier for the AstraZeneca vaccine was 100% for Mexico, 94.26% for India, 98.7% for the United States and 95.5% for Brazil, with similar results for the other vaccines. We found that although many negative theories about the vaccines are in circulation, people still have a neutral opinion about them. Although the highly positive tweets were few compared with the weakly positive and neutral tweets, the general sentiment seems to be in favor of vaccines. However, this study was limited in that we examined only tweets in the English language. Another limitation is that only live tweets were scrutinized; tweets could also be studied over a period of time. This study could be very beneficial to governments in building policies to handle such global health crises.