
1 Introduction

Twitter data analytics is predominantly conducted to ascertain the sentiment underlying tweets and images, so the task effectively reduces to sentiment analysis. Sentiment analysis is the computational study of the opinions, emotions, sentiments, and attitudes that users express in text about an entity of interest. It is also called review mining, opinion mining, or attitude analysis [1,2,3,4,5].

The global surge in user-generated content is attributed to technological advances and to increased Internet activity, such as discussion forums, conferencing, online transactions, e-commerce, chatting, surveillance, ticket booking, merchant websites, widespread and continual communication on social media, and a variety of other online activities [1, 3, 6, 7].

The rest of this work is organized as follows: Sect. 2 covers the motivation, Sect. 3 the literature survey, Sect. 4 the experimentation and results, Sect. 5 the observations, Sect. 6 the novelty, Sect. 7 the applications, Sect. 8 the challenges, Sect. 9 the research contribution, and Sect. 10 the conclusion.

2 Motivation

This novel technique helps users analyze Twitter data and understand the public opinion or sentiment behind specific keywords, which is useful in sectors such as business, marketing, forecasting, politics, and tourism.

3 Literature Survey

The existing literature [1, 7,8,9,10,11,12,13,14,15,16,17,18] describes two main approaches to sentiment analysis: lexicon-based and machine-learning-based. The former relies on the polarity scores that a sentiment lexicon assigns to words, while the latter trains suitable classification models on labeled data.
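To make the notion of polarity concrete, the following is a minimal sketch of lexicon-based scoring. It assumes a tiny hypothetical lexicon and simple averaging of matched words; production lexicons are far larger and typically also handle negation and intensifiers.

```python
import re

# Hypothetical toy lexicon (word -> polarity in [-1, 1]), for illustration only.
TOY_LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "proud": 0.6,
    "bad": -0.7,
    "poor": -0.4,
    "terrible": -1.0,
}


def lexicon_polarity(text: str, lexicon: dict = TOY_LEXICON) -> float:
    """Average polarity of the words found in the lexicon; 0.0 if none match."""
    # Lowercase and keep only letter runs so punctuation does not block matches.
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_polarity("Great initiative, proud of local industry"))
```

A machine-learning approach would instead learn such weights, or a decision boundary, from labeled tweets.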

A detailed survey of recent work is presented in Table 1, together with the research gaps identified.

Table 1 Survey of sentiment analysis in recent works

4 Experimentation and Results

We used Tweepy to fetch tweets in real time for three currently popular hashtags in India: #MakeInIndia, #AtmNirbharIndia, and #VocalforLocal. The tweepy.Cursor class was used to page through the latest tweets. Preprocessing was performed with Python's 're' library, and TextBlob was used to determine each tweet's polarity, a score between -1 and 1. We wrote a Python program that maps the polarity score to one of seven class labels: strongly negative (-1 to -0.6), negative (-0.6 to -0.3), weakly negative (-0.3 to 0), neutral (0), weakly positive (0 to 0.3), positive (0.3 to 0.6), and strongly positive (0.6 to 1). We then performed the three experiments described below.
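The preprocessing and labeling steps of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' program: the cleaning patterns in `clean_tweet` are assumptions, and `seven_class` assumes thresholds at ±0.3 and ±0.6, with exactly 0 treated as neutral and every band including its upper boundary (so 0.3 is weakly positive and -0.3 is negative). In the described pipeline the score would come from TextBlob (`TextBlob(text).sentiment.polarity`); the Tweepy fetch is omitted because it requires API credentials.

```python
import re


def clean_tweet(text: str) -> str:
    """Strip retweet markers, @mentions, URLs, and '#' signs; collapse whitespace.

    The exact patterns are an illustrative assumption, not the paper's code.
    """
    text = re.sub(r"^RT\s+", "", text)           # leading retweet marker
    text = re.sub(r"@\w+", "", text)             # user mentions
    text = re.sub(r"https?://\S+", "", text)     # links
    text = text.replace("#", "")                 # keep the hashtag word, drop '#'
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remaining punctuation and emoji
    return re.sub(r"\s+", " ", text).strip()


def seven_class(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to one of seven labels (assumed thresholds)."""
    if polarity == 0:
        return "neutral"
    if polarity > 0.6:
        return "strongly positive"   # (0.6, 1]
    if polarity > 0.3:
        return "positive"            # (0.3, 0.6]
    if polarity > 0:
        return "weakly positive"     # (0, 0.3]
    if polarity <= -0.6:
        return "strongly negative"   # [-1, -0.6]
    if polarity <= -0.3:
        return "negative"            # (-0.6, -0.3]
    return "weakly negative"         # (-0.3, 0)


print(clean_tweet("RT @user: Proud of #MakeInIndia! https://t.co/xyz"))
print(seven_class(0.45))
```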

4.1 Experiment 1: #MakeInIndia

We fetched 1000 tweets in real time and analyzed them to ascertain their sentiment. The visualization of the seven sentiment classes is shown in Fig. 1.

Fig. 1 Analysis of 1000 tweets for #MakeInIndia

4.2 Experiment 2: #AtmNirbhar

We fetched 1000 tweets in real time and analyzed them to ascertain their sentiment. The visualization results are shown in Fig. 2.

Fig. 2 Analysis of 1000 tweets for #AtmNirbhar

4.3 Experiment 3: #VocalforLocal

Figure 3 illustrates the outcome of analyzing 200 tweets for #VocalforLocal.

Fig. 3 Analysis of 200 tweets for #VocalforLocal

We performed a comparative analysis of two of the hashtags with respect to the seven sentiment classes, as illustrated by the stacked bar chart in Fig. 4.

Fig. 4 Comparative analysis of two hashtags with respect to seven sentiment classes
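In the comparative view of Fig. 4, each stacked bar corresponds to one hashtag's distribution over the seven classes. The following is a minimal standard-library sketch of computing those percentages, assuming per-tweet labels are already available; the hashtag names and labels in the example are hypothetical placeholders, not the paper's data.

```python
from collections import Counter

CLASSES = [
    "strongly negative", "negative", "weakly negative", "neutral",
    "weakly positive", "positive", "strongly positive",
]


def class_percentages(labels: list) -> dict:
    """Percentage of tweets in each of the seven classes (0.0 for absent classes)."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: 100.0 * counts[c] / total for c in CLASSES}


def compare_hashtags(labels_by_tag: dict) -> dict:
    """One row of class percentages per hashtag; each row becomes one stacked bar."""
    return {tag: class_percentages(labels) for tag, labels in labels_by_tag.items()}


# Hypothetical labels for illustration only (not the paper's data).
example = {
    "#TagA": ["neutral", "positive", "positive", "weakly negative"],
    "#TagB": ["negative", "neutral"],
}
for tag, row in compare_hashtags(example).items():
    print(tag, {c: round(p, 1) for c, p in row.items() if p > 0})
```

Keeping a fixed class order in `CLASSES` means every bar stacks the segments identically. The rows could then be drawn, for instance, with matplotlib's `bar`, passing the running total of the earlier classes as `bottom`.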

To validate the results, we had two human experts annotate the tweets and recorded their findings. Figures 5 and 6 illustrate the differences between the two experts' annotations in terms of root-mean-square error (RMSE) and standard deviation, respectively.

Fig. 5 Differences in RMSE values for the two human experts

Fig. 6 Inter-individual differences in annotation by two human experts
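The two agreement measures used for Figs. 5 and 6 can be computed as sketched below. This assumes (the paper does not say) that each expert assigns every tweet a numeric score, here a hypothetical class index from -3 (strongly negative) to +3 (strongly positive). RMSE is sqrt(sum((a_i - b_i)^2) / n) over the n tweets; the standard deviation is taken over the per-tweet differences a_i - b_i, which is one reasonable reading of the measure.

```python
import math
import statistics


def rmse(a: list, b: list) -> float:
    """Root-mean-square error between two equally long score lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def diff_std(a: list, b: list) -> float:
    """Population standard deviation of the per-tweet differences a[i] - b[i]."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return statistics.pstdev([x - y for x, y in zip(a, b)])


# Hypothetical class indices for five tweets (not the paper's annotations).
expert_1 = [3, 1, 0, -2, 2]
expert_2 = [2, 1, 0, -3, 2]
print(rmse(expert_1, expert_2), diff_std(expert_1, expert_2))
```

RMSE penalizes every disagreement, whereas the standard deviation of the differences ignores a constant offset between the experts (one always scoring a class higher), so the two measures capture different aspects of inter-annotator disagreement.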

5 Observations

  • From Figs. 1, 2, 3, and 4, we infer that #MakeInIndia had the highest percentage of positive tweets, while #AtmNirbhar had the highest percentage of negative tweets.

  • Table 1 shows that, although some standard datasets exist, most researchers prefer to gather tweets in real time, with Tweepy the predominant choice. SVM and Random Forests have frequently yielded accuracies above 95%.

6 Novelty

The technique visualizes the results as a pie chart over seven sentiment classes, which gives a clearer idea of the sentiment behind a keyword than a coarse positive/negative/neutral split would. This novel approach to result visualization helps users understand the results in detail.

7 Applications

Twitter data analytics has a variety of applications, such as:

  • Building the reputation of brands or products [26,27,28],

  • Increasing customer engagement, making better-informed decisions in risk analysis, producing efficient credit ratings for customers, and performing competitive analysis [29],

  • Increasing the productivity and efficiency of restaurants [30],

  • Improving market intelligence and customer satisfaction [3, 36],

  • Increasing tourism [37],

  • Monitoring and analyzing public opinion on political issues [3],

  • Forecasting price changes from news sentiment [1],

  • Developing new products and services, promoting products based on customer reviews [1], and social advertising [38, 39].

8 Challenges in Twitter Data Analytics

  i. Determining the contextual information behind sentiments and building a globally generalizable foundation is difficult [30].

  ii. The widespread use of onomatopoeia, idioms, homophones, alliteration, and acronyms adds difficulty [30], so sophisticated NLP techniques are required to decipher the correct context and meaning of words.

  iii. Aspect-based sentiment analysis is an important challenge [36].

  iv. Opinion summarization, subjectivity classification, and opinion retrieval are further challenges [36].

  v. There is a lack of large annotated datasets for training models across domains [40].

9 Research Contribution

  • The current work presents a novel approach to visualizing and analyzing the sentiment of three currently popular hashtags in India. We expect our experimentation and analysis of the prevailing sentiment to benefit fellow researchers.

  • We have also covered important aspects such as current challenges, future trends, and applications of sentiment analysis.

10 Conclusion

We have implemented a proof of concept for gathering tweets in real time and analyzing the sentiment of a section of the population using a lexicon-based technique. We performed experiments and analyzed the sentiment of sets of 100, 200, 500, and 1000 tweets for three of the currently most popular hashtags. The data visualizations in this work should be a useful asset to fellow researchers and help pave the way for future research.