An Automatic Emotion Analysis of Real Time Corona Tweets

Kalaivani, A.; Vijayalakshmi, R.

doi:10.1007/978-981-16-3660-8_34


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1393)

Included in the following conference series:

International Conference on Advanced Informatics for Computing Research


Abstract

Emotion analysis from text is a recent research field that originated from sentiment analysis. Sentiment analysis identifies the sentiments expressed in a text and classifies them as positive, neutral, or negative. Emotion analysis focuses on detecting and recognizing the emotions expressed in text, namely anger, disgust, fear, happiness, sadness, and surprise. Emotion analysis has real-time applications in software engineering, website customization, education, and gaming. It can be performed on social media data such as tweets, reviews, and blogs, and can capture public opinion on a particular product or topic.

In this paper we perform emotion analysis of the public on recent real-time corona tweets. The tweets are collected through the Twitter application programming interface. Pre-processing strategies are applied to the collected tweets to remove inconsistent and redundant elements, and the pre-processed tweets are visualized through a word cloud. The emotion scores are obtained using the nrc_sentiment dictionary, which covers the basic emotions as well as positive and negative sentiment. The emotion score levels of the public are identified from the corona tweets and depicted through graphical analysis.


1 Introduction

Sentiment analysis applies natural language processing to text documents to understand the sentiments they express. Sentiment classification assigns a text to positive, negative, or neutral sentiment. Sentiment polarity can be determined for a whole document, a sentence, or a word. Sentiment analysis can be applied in marketing to identify customer trends and in medicine to find patients' opinions on a particular disease. Twitter analysis can be used to identify the impact of the education system on society, to track trends, and to disseminate health information during viral epidemics.

Coronavirus disease (COVID-19) is a global infectious disease that originated in China at the end of 2019. COVID-19 spread across the world, and people are still living in a pandemic situation. Most people infected with the COVID-19 virus experience mild to moderate respiratory illness and recover at the initial stage without special treatment. Older people and those with underlying medical conditions such as heart problems, diabetes, respiratory illness, and cancer are more susceptible to the disease. Transmission can be slowed by informing the public about the COVID-19 virus and conducting awareness programs. A simple way to protect yourself is to wash your hands frequently or use a sanitizer. The COVID-19 virus spreads through personal contact with an infected person who is coughing or has a cold, by means of saliva droplets or nasal discharge.

Humans are gifted with the capability to interpret the tone of a piece of writing or speech. Consider the sentence: “My flight’s been delayed. Brilliant!” Most readers quickly recognize the writer as sarcastic, because a delayed flight is not a good experience. By applying this contextual understanding, a reader classifies the sentence as negative. Without contextual understanding, a machine looks at the word “brilliant” and identifies the sentiment as positive. In this paper we collect real-time data on corona and show what people think about corona in terms of sentiment analysis.
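The sarcasm problem can be illustrated with a minimal Python sketch of a context-free, word-level scorer. This is not the method used in this paper (which works in R with the nrc_sentiment dictionary); the `POLARITY` lexicon and the function name `word_level_label` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; it is not the
# dictionary used in the paper.
POLARITY = {"brilliant": 1, "good": 1, "bad": -1, "terrible": -1}


def word_level_label(text):
    """Label a text by summing per-word polarity, ignoring all context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(POLARITY.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# The only lexicon word in the sarcastic sentence is "brilliant",
# so the scorer labels it positive although a human reads it as negative.
print(word_level_label("My flight's been delayed. Brilliant!"))
```

Because the scorer sees only isolated words, it cannot use the delayed flight as evidence that “Brilliant!” is meant ironically.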

Sentiment analysis is a resource that any organization can use to analyze and enhance its products and services. Mobile phones are used across all communities, and before purchasing a mobile phone we can analyze feedback from mobile users through online tweets. Online tweets from customers about their mobile products can be used for conducting sentiment analysis. In this paper, the online mobile tweets are collected and processed using the R tool to identify the emotion scores of mobile users. The tweets are extracted, pre-processed, and classified into positive and negative sentiments, and the customers' emotion scores are identified. The basic emotion classification system for online mobile tweets is depicted in Fig. 1.

The paper is organized as follows: Sect. 2 describes related research in this domain, and Sect. 3 describes the system methodology, from gathering Twitter data to emotion classification. The stage-by-stage implementation of the proposed work is discussed in Sect. 4, and Sect. 5 concludes the paper with the contributions of the proposed work and future enhancements.

2 Literature Review

Many researchers have contributed work on pre-processing online tweets and on tweet classification, and further work has extended to emotion scoring. Researchers have concentrated on stop word removal to achieve higher accuracy, formation of word clouds to identify frequent words, frequency counts for the words present in the word cloud, and finally the sentiment emotions of the overall tweets. Vijayalakshmi and Kalaivani [1] gave a brief account of word cloud formation and pre-processing techniques for Apple mobile tweets, and depicted the sentiment emotion scores through a bar graph.

Venkatesh and Kalaivani [2] proposed preprocessing techniques for mobile tweets, formed a word cloud visualization on Apple mobile data sets, and computed the frequency of the words. Bhattacharjee et al. [3] proposed a lexicon-based preprocessing algorithm for noise reduction. A cosine similarity algorithm is proposed to classify sentiment comments on a five-point scale from −2 (highly negative) to +2 (highly positive). Ghag and Shah [4] used movie document datasets to analyze the effect of stop word removal on sentiment classification models. Their improved algorithm produced better accuracy than a traditional classifier. The survey was carried out with a classifier based on a term weighting technique.

Rill et al. [5] developed a system, PoliTwi, to detect emerging political topics on Twitter earlier than other standard information channels. The identified top topics are shared via different channels for wider publicity. The topics were compared with Google Trends, and it was observed that topics emerged earlier on Twitter than in Google Trends. Finally, these topics can be used as a knowledge base for concept-level sentiment analysis.

Khan et al. [6] proposed a hybrid algorithm for classifying Twitter feeds. The proposed method applies multiple pre-processing steps before the data is sent to the classifier. The proposed technique overcomes previous limitations and achieves higher accuracy when compared to state-of-the-art techniques.

Duwairi and El-Orfali [7] analyzed the role of text pre-processing, feature selection and representation, and classification using support vector machines. The level of accuracy achieved improved on the existing literature. Duwairi and El-Orfali [8] discussed sentiment analysis of Arabic text. Sentiment analysis was investigated on Arabic text datasets with multiple classifiers: SVM, Naive Bayes, and K-Nearest Neighbour. The experimental results show that the selection of preprocessing strategies applied to the input tweets increases the performance of the classifiers.

Haddi, Liu, and Shi [9] demonstrated the role of text pre-processing in sentiment analysis. The experimental results showed that, with appropriate feature selection and representation, sentiment analysis accuracy using support vector machines (SVM) improved. Pak and Paroubek [10] performed sentiment analysis on microblogging, a popular communication tool among Internet users. The proposed work automatically collects a corpus for both sentiment analysis and opinion mining purposes. A textual analysis is performed on the collected corpus, and a sentiment classifier is built to determine different types of sentiment. The proposed techniques are efficient and perform better than state-of-the-art techniques. The research currently focuses on the English language and can be extended to other languages in the future. The primary issue discussed in the literature is classification accuracy, for example a very high percentage of tweets being classified as neutral. Other issues to be considered by researchers in future work are data sparsity and sarcasm.

3 System Methodology

Data is gathered from users through web analytics tools in forms that are independent, semi-structured, and not directly readable. Using the Twitter API, we collect real-time data on corona: what people tweet about the virus and how they are protecting themselves from COVID-19. The collected data is then converted into .csv files. The Twitter data collected from users is unstructured, incomplete, noisy, and inconsistent. Data processing strategies are applied to discover knowledge from these records.

The general data pre-processing steps are conversion to lowercase and removal of punctuation, numbers, URLs, special characters, and expressions. Stop word removal eliminates noise from the raw text by removing words such as “the,” “and,” and “a”. Tokenization and visualization are effective methods to discover abstract ideas and express the information in the raw text. The outcomes of sentiment analysis are represented in the form of graphs, histograms, and matrices. The most popular representations are interactive maps and word clouds. Visual presentations are used in multimedia, medicine, education, engineering, and technological applications. In a word cloud, the largest words are the most frequently used and the smallest are the least used.

In our proposed system, the dataset chosen for sentiment analysis is a real-time corona dataset. Pre-processing strategies are applied to the dataset to remove inconsistent and redundant elements. Our proposed pre-processing techniques involve elimination of punctuation, special characters, and digits, and escaping of HTML characters. Further, the dataset is fine-tuned by removing stop words, URLs, and expressions. The pre-processed data is visualized as a word cloud with the frequency of the key words. Finally, the tweets are classified into emotions based on the nrc_sentiment dictionary, and a descriptive analysis of the emotions is presented in the form of a graph. Real-world data may contain unreadable formats and may be unpolished, disordered, and noisy, with errors and a lack of clear trends. Data pre-processing is the best technique to resolve these issues, and the proposed block diagram is shown in Fig. 2.
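The pre-processing steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the R code used in this work: the function name `preprocess` and the short `STOP_WORDS` set are hypothetical, “escaping HTML characters” is interpreted here as decoding entities such as `&amp;`, and URLs are removed before punctuation so that they are still recognizable as URLs.

```python
import html
import re
import string

# Hypothetical, deliberately short stop word list for illustration.
STOP_WORDS = {"the", "and", "a", "an", "is", "are", "to", "of", "in", "for", "on"}


def preprocess(tweet):
    """Return a cleaned, lowercased, space-separated version of a tweet."""
    text = html.unescape(tweet)                         # decode entities such as &amp;
    text = text.lower()                                 # convert to lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    # Remove ASCII punctuation and special characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)                    # remove digits
    # Splitting on whitespace also collapses repeated spaces.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("Stay SAFE!!! Wash your hands for 20 seconds: https://example.com/covid #COVID19"))
```

Only ASCII punctuation is stripped in this sketch; emoji and other non-ASCII symbols would need an additional rule.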

a. Data Collection

Customers are free to express their comments on public forums such as blogs, discussion boards, and reviews. Public opinions are also collected from private or public social network sites such as Facebook and Twitter. Opinions and feelings are expressed through varied vocabulary, writing context, short forms, and slang. The data collected through public forums or social networks is huge, unstructured, and disorganized. Manual analysis of sentiment data is virtually impossible, so in our proposed work we used the “R” tool for efficient data analysis.

b. Text Preparation

The data collected through public forums should be filtered to extract the data needed for analysis. Text preparation eliminates non-textual content from the collected data. After text preparation, only the relevant data remains, which can be used for further analysis.

c. Sentence Classification

Each pre-processed sentence of the tweets is examined for subjective and objective expressions. Sentences with subjective expressions are retained, and those that convey objective expressions are discarded. The computational techniques used for identifying subjective sentences include unigrams, lemmas, and negation.

d. Sentiment Scores

The subjective sentences identified are further classified into two groups, positive and negative. Sentiment analysis plays a vital role in analyzing and categorizing the sentences into positive and negative tweets, and the emotion scores are also calculated.
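Sentence classification and sentiment scoring can be sketched together in Python using a simple unigram approach. The `POSITIVE` and `NEGATIVE` word sets and the function names are hypothetical, chosen only for illustration; lemmas and negation handling, which are mentioned above, are omitted from this sketch.

```python
import re

# Hypothetical unigram lists for illustration only.
POSITIVE = {"recovered", "safe", "hope", "good", "happy"}
NEGATIVE = {"afraid", "dangerous", "fear", "bad", "sad"}


def tokenize(sentence):
    """Split a sentence into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any opinion word."""
    return any(tok in POSITIVE or tok in NEGATIVE for tok in tokenize(sentence))


def polarity(sentence):
    """Label a sentence by the sign of (positive hits - negative hits)."""
    toks = tokenize(sentence)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    # Ties are arbitrarily labelled positive in this sketch.
    return "positive" if score >= 0 else "negative"


def classify(sentences):
    """Discard objective sentences and label the subjective ones."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]
```

For example, `classify(["I am afraid of this dangerous virus", "The meeting is at noon"])` keeps only the first sentence and labels it negative, since the second contains no opinion word.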

4 System Implementation

The steps to connect R and the Twitter API to extract tweets on COVID-19 are:

1. Create a Twitter account with a mobile number.
2. Create your first Twitter app from this link: http://apps.twitter.com
3. Click on Create New App. Pick a name for your app, give a concise description of your application, and give your profile link.
4. Click on “Create your Twitter application”. Once your application is created, it should look as shown in Fig. 3.
Fig. 3. Creation of Twitter app
5. Open your application and go to “Keys and Access Tokens” to find your Consumer Key (API Key) and Consumer Secret (API Secret), as shown in Fig. 4.
Fig. 4. Twitter data extraction: “Keys and Access Tokens”
6. If you are doing this for the first time, scroll down on the same Keys and Access Tokens page and generate your access tokens, as shown in Fig. 5.
Fig. 5. Twitter data extraction

RStudio is then set up using the following steps:
1. Install the necessary packages and load the libraries as shown in Fig. 6. These packages are important because they allow R to interface with Twitter and provide authentication for third-party applications.
Fig. 6. Installing packages in RStudio
2. Run the following commands to set up the connection using the keys, as shown in Fig. 7.
Fig. 7. Setting the connections between keys
3. Once the environment and connection for R to communicate with Twitter have been set up, tweets are extracted. There are several commands to extract tweets from a particular user or containing a particular word. The R code to extract tweets on a particular word is specified in Fig. 8.
Fig. 8. Tweets extraction for a particular word
4. Finally, the tweets are downloaded as shown in Fig. 9.
Fig. 9. Tweets after downloading
5. Nearly 1500 recent tweets are downloaded. After downloading, the tweets are converted into a .csv file for convenient viewing, using the queries shown in Fig. 10.
Fig. 10. Tweets conversion into .CSV
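The final conversion step can be sketched in Python with the standard `csv` module. This is not the R code shown in the figures, which is not reproduced here; the function name `tweets_to_csv`, the field names `id`, `created_at`, and `text`, and the sample record are all hypothetical.

```python
import csv
import io

# Hypothetical column names; the fields actually exported in this
# work are not listed in the text.
FIELDS = ["id", "created_at", "text"]


def tweets_to_csv(tweets, out):
    """Write a list of tweet dictionaries to an open text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)


# Usage with an in-memory buffer; for a real file, pass
# open("tweets.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
tweets_to_csv(
    [{"id": "1", "created_at": "2020-04-01", "text": "Stay home, stay safe"}],
    buffer,
)
print(buffer.getvalue())
```

The `csv` module quotes fields that contain commas, so tweet text such as “Stay home, stay safe” survives a round trip through the file.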

From that .csv file, around 5–10 tweets are taken for pre-processing. The tweets are tokenized in different structures; a single tokenizer can report the number of words present in each sentence, as shown in Fig. 11.

The tweets are then pre-processed further: upper case characters are converted into lowercase, as shown in Fig. 12. The tweets after removal of punctuation marks are shown in Fig. 13, and after removal of numbers in Fig. 14. The tweets after removal of stop words are shown in Fig. 15, and after removal of URLs in Fig. 16. Finally, whitespace is removed, and the resulting tweets are shown in Fig. 17.

The preprocessed tweets are tokenized, extracting the tokens from the input tweets. The frequency of the extracted words is computed and depicted using a bar graph, as shown in Fig. 18. A word cloud, the visual representation of the tokenized words, is depicted in Fig. 19.
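The word frequency computation behind the bar graph and the word cloud can be sketched in Python as follows. The function name `word_frequencies` and the sample tweets are hypothetical, and the input is assumed to be already pre-processed (lowercased, with punctuation and stop words removed).

```python
from collections import Counter


def word_frequencies(tweets, top_n=10):
    """Count whitespace-separated tokens across all tweets.

    Returns the top_n (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.split())
    return counts.most_common(top_n)


sample = ["stay home stay safe", "wash hands stay safe"]
print(word_frequencies(sample, top_n=2))  # [('stay', 3), ('safe', 2)]
```

These counts are what a bar graph plots directly, and what a word cloud maps to font size.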

The final preprocessed tweets are applied to the nrc_sentiment dictionary to identify the emotion scores and to classify the tweets as positive or negative. The emotions shown by the tweets include anger, disgust, fear, joy, and surprise. The sentiment emotion scores for the COVID-19 tweets are shown in Fig. 20.
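Dictionary-based emotion scoring can be sketched in Python as follows. The `LEXICON` below is a hypothetical excerpt written in the style of a word-emotion association lexicon; its entries are illustrative and are not taken from the dictionary used in this work. The function name `emotion_scores` is likewise an assumption.

```python
from collections import Counter

# Hypothetical word-to-category associations, for illustration only.
LEXICON = {
    "afraid": {"fear", "negative"},
    "death": {"fear", "sadness", "negative"},
    "recovered": {"joy", "positive"},
    "hope": {"joy", "positive"},
}


def emotion_scores(tweets):
    """Sum, over all tweets, how many tokens are associated with each category."""
    scores = Counter()
    for tweet in tweets:
        for token in tweet.split():
            # Words missing from the lexicon contribute nothing.
            scores.update(LEXICON.get(token, ()))
    return scores


scores = emotion_scores(["afraid of death", "recovered with hope"])
print(sorted(scores.items()))
```

The scores are aggregated over the whole set of tweets, matching the corpus-level emotion score reported in Fig. 20; calling the function on a single-tweet list would give a per-tweet score instead.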

5 Conclusion

Sentiment emotion scoring for COVID-19 tweets shows high negativity, because people consider the disease very dangerous and no medicine has yet been found. People are afraid of the disease, so the emotion score for negativity is high. People who have recovered from COVID-19 have tweeted about self-care and their experience of the disease. The proposed work can be extended in future to other pandemic diseases, or to products in order to guide customers. Emotion scores are currently identified for the whole set of tweets based on words; in future, we can classify the sentiment of individual tweets based on each tweet's emotion score and analyse the data in greater depth.

References

Vijayalakshmi, R., Kalaivani, A.: Sentiment emotion scoring for Apple mobile tweets. Test Eng. Manag. 82, 6756–6763 (2020)
Kalaivani, A., Venkatesh, N.: Word cloud for online mobile phone tweets towards sentiment analysis. Int. J. Eng. Adv. Technol. 8(6), 2249–8958 (2019)
Bhattacharjee, S., Das, A., Bhattacharya, U., Parui, S.K., Roy, S.: Sentiment analysis using cosine similarity measure. In: IEEE 2nd International Conference Recent Trends in Information Systems (ReTIS), pp. 27–32 (2015)
Ghag, K., Shah, K.: Comparing analysis of effect of stop words removal on sentiment classification. In: IEEE International Conference on Computer Communication and Control, pp. 2–7 (2015)
Rill, S., Reinel, D., Scheidt, J., Zicari, R.: Early detection of emerging Politics topic on twitter and the impact on concept-level sentiment analysis. Knowl.-Based Syst. 69, 24–33 (2014)
Khan, F.H., Bashir, S., Qamar, U.: TOM: Twitter opinion mining framework using hybrid classification scheme. Decis. Support Syst. 57(1), 245–257 (2014)
Duwairi, R., El-Orfali, M.: A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J. Inf. Sci. 215–221 (2014)
Duwairi, R., Elorfali, M.: A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J. Inf. Sci. 1–13 (2013)
Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17, 26–32 (2013)
Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 17–23 (2010)


Author information

Authors and Affiliations

Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India
A. Kalaivani
IBM, Bangalore, India
R. Vijayalakshmi


Editor information

Editors and Affiliations

Papua New Guinea University of Technology, Lae, Papua New Guinea
Ashish Kumar Luhach
Namibia University of Science and Technology, Windhoek, Namibia
Dharm Singh Jat
Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
Kamarul Hawari Bin Ghazali
University of Eastern Finland, Kuopio, Finland
Xiao-Zhi Gao
Saint Mary's University, Halifax, NS, Canada
Pawan Lingras


About this paper

Cite this paper

Kalaivani, A., Vijayalakshmi, R. (2021). An Automatic Emotion Analysis of Real Time Corona Tweets. In: Luhach, A.K., Jat, D.S., Bin Ghazali, K.H., Gao, XZ., Lingras, P. (eds) Advanced Informatics for Computing Research. ICAICR 2020. Communications in Computer and Information Science, vol 1393. Springer, Singapore. https://doi.org/10.1007/978-981-16-3660-8_34


DOI: https://doi.org/10.1007/978-981-16-3660-8_34
Published: 20 June 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3659-2
Online ISBN: 978-981-16-3660-8
eBook Packages: Computer Science, Computer Science (R0)
