1 Introduction

Twitter is a micro-blogging site on which users post and interact through short messages known as tweets. Tweets are visible to everyone by default, although users can restrict their tweets to their followers, mute accounts they do not want to interact with, and block others from viewing their tweets. During a crisis, Twitter acts as a news source that gives information seekers timely access to details about disaster events, often faster than conventional news providers, and these details remain available for later reference. The Retweet feature republishes a post so that the information reaches a wider audience, making Twitter a vital channel for information sharing during crisis events. However, because such varied tweets are broadcast rapidly and in large volumes, extracting only the needed information is a challenging task. Studies of tweets collected during various crisis events have found that crisis-related tweets fall into categories such as affected people, infrastructure and utility damage, caution and advice, sympathy and emotional support, donations and volunteering, and other useful information; yet tools for automatically classifying tweets and extracting this useful information are largely unavailable [1]. In this review article, we compare the classifiers different authors have used to classify tweets, evaluate their performance, and examine the information extracted for the identification of informative tweets using deep neural networks during crisis events [2]. The following sections describe the crisis-related datasets, tweet classification, preprocessing methods, methodology, and machine learning algorithms used in these studies.

2 Crisis-Related Dataset

Collecting data from social media is an important step in building models that automatically detect particular events. Researchers have scraped tweets about hurricanes, floods, earthquakes, wildfires, and similar events from Twitter and made the data publicly available. Images posted during four natural disasters, Typhoon Ruby, Hurricane Matthew, the Ecuador Earthquake, and the Nepal Earthquake, were used for evaluation [3]: 3518 selected images underwent damage severity assessment and were classified into three categories, severe, mild, and no damage. Twitter datasets of roughly 21,703 tweets collected during the 2015 Nepal Earthquake (NEQ) and the 2013 Queensland Floods (QFL) were classified into relevant and non-relevant data; they consist of both labeled and unlabeled data from related events [4]. The CrisisMMD dataset covers seven natural disasters, Hurricane Harvey 2017, Hurricane Irma 2017, Hurricane Maria 2017, the Mexico Earthquake 2017, the California Wildfires 2017, the Iraq-Iran Earthquake 2017, and the Sri Lanka Floods 2017, with 3.5 million tweets and 176,000 images. The authors focused on both textual content and labeled images to extract useful information, which helps humanitarian organizations plan relief operations [5, 6]. The CrisisNLP dataset, which contains data from various crisis events, was used with a CNN and word-embedding model to classify crisis-related textual content from Twitter, achieving the best performance compared with other models [7]. The CrisisLexT26 dataset, covering crisis events from 2012 to 2013, was used to identify different information categories with a CNN model [8]. A dataset collected from Twitter's API using the hashtags #Joplin and #sandy was used to identify useful textual content with a model based on conditional random fields, achieving a 90% detection rate [9]. Finally, data collected from Twitter on Hurricane Florence 2018 provides a detailed picture of the affected people, areas, and utility damage [10].
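As a concrete illustration, the sketch below loads a CrisisMMD-style annotation file with pandas and inspects the class balance before any training. The file name and the column names (`tweet_text`, `label`) are assumptions for illustration only; actual CrisisMMD releases ship TSV annotation files whose exact schema varies by task and version.

```python
import pandas as pd

# Hypothetical path and column names; adjust to the schema of the
# annotation file actually downloaded (CrisisMMD distributes TSVs).
ANNOTATIONS = "crisismmd_annotations.tsv"

df = pd.read_csv(ANNOTATIONS, sep="\t")

# Keep only the tweet text and its humanitarian-category label.
df = df[["tweet_text", "label"]].dropna()

# Inspect the class balance; crisis datasets are often heavily skewed,
# which matters when choosing classifiers and evaluation metrics.
print(df["label"].value_counts())
```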

3 Preprocessing Methods

Preprocessing is required for data collected from Twitter because tweets contain misspellings, incomplete sentences, and grammatical errors. To process the input data, the CNN with pre-trained word vectors developed by Kim is used for sentence-level classification tasks [8,9,10,11]. The Lovins stemmer was used to reduce inflected words to their stems [12], and unigram and bigram features were selected for the classification tasks. One author used the jieba segmentation package to automatically segment Chinese text from the Twitter dataset [13], while the Convolutional Sparse Auto-Encoder (CSAE) is used to extract Chinese text from images [14]. Stemming, stop word removal, and spell checking are applied as preprocessing steps [15].
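A minimal preprocessing sketch along these lines is shown below, assuming English tweets. NLTK does not ship a Lovins stemmer, so the Porter stemmer stands in for the stemming step, and the stop word set is a small illustrative list rather than a full lexicon.

```python
import re
from nltk.stem import PorterStemmer  # stand-in: NLTK has no Lovins stemmer

# Small illustrative stop word set; a real pipeline would use a full list.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "for"}
stemmer = PorterStemmer()

def preprocess(tweet: str) -> list[str]:
    """Clean a raw tweet and return stemmed tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", " ", tweet)  # drop URLs
    tweet = re.sub(r"@\w+", " ", tweet)          # drop user mentions
    tweet = tweet.replace("#", " ")              # keep hashtag words, drop symbol
    tweet = re.sub(r"[^a-z\s]", " ", tweet)      # keep letters only
    tokens = [t for t in tweet.split() if t not in STOPWORDS]
    return [stemmer.stem(t) for t in tokens]     # stemming step

print(preprocess("Massive flooding near #Joplin, see https://example.com @user"))
```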

4 Methodology

The various methods used for the automatic detection of crisis-related messages on Twitter are shown in the figure. Most of them share a common pipeline: collect tweets, preprocess the text, convert it to features, and classify; a minimal sketch of this shared pipeline follows.
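The sketch below uses scikit-learn with TF-IDF features and a linear SVM, mirroring the SVM (TF-IDF) setup described in [1]; the toy tweets and labels are invented for illustration, not drawn from any of the reviewed datasets.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labeled tweets standing in for a real crisis dataset (invented examples).
texts = [
    "Bridge collapsed on Highway 9, avoid the area",
    "Volunteers needed at the shelter on Main Street",
    "Had a great lunch with friends today",
    "Our thoughts are with everyone affected by the flood",
]
labels = ["informative", "informative", "not_informative", "informative"]

# TF-IDF vectorization (unigrams and bigrams) followed by a linear SVM.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
model.fit(texts, labels)

print(model.predict(["Road blocked by debris near the river"]))
```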

5 Machine Learning Algorithms

Supervised learning is a machine learning approach in which a model is trained on a labeled dataset to map input variables to a predicted output variable. A semi-supervised learning approach combining self-training-based and graph-based experiments was applied to datasets collected from Twitter; the graph-based semi-supervised learning algorithm achieves better results in terms of F1-score [4]. The machine learning classifiers SVM (TF-IDF) and SVM (Word2Vec) are used to identify tweets related to crisis events [1]. A Transformer-based technique, Bidirectional Encoder Representations from Transformers (BERT), is used for natural language processing (NLP) [16]. Domain adaptation with the Naive Bayes classifier is used to classify tweets from labeled and unlabeled data [17]. To evaluate Crisis2Vec embeddings, a linear Logistic Regression model and a non-linear Long Short-Term Memory (LSTM) model are used to measure performance [6]. Knowledge graphs (KGs), an emerging AI technology, are surveyed in terms of the opportunities, challenges, and implementation of COVID-19 KGs in industry and academia [18]. A text steganalysis model based on a CNN framework is used for better identification of short text [19]. An unsupervised approach, the convolutional sparse auto-encoder (CSAE), is used to pre-train the CNN model for extracting Chinese text from images and achieves better results [14]. A supervised candidate text region (CTR) generation method based on text-aware saliency detection predicts the initial location of text [20]. The Naive Bayes text classification algorithm is used to identify text based on opinion [11] (Table 1).

Table 1 Various methods on the classification of text and images
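Of the techniques above, BERT [16] is the most widely adopted in recent work. The sketch below shows a minimal inference pass with the Hugging Face Transformers library, assuming a binary informative/not-informative task; the `bert-base-uncased` checkpoint carries an untrained classification head here, so real use requires fine-tuning on labeled crisis tweets first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic checkpoint; the reviewed works fine-tune on crisis tweets, which
# this sketch omits. The 2-label head below is randomly initialized.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

tweets = [
    "Power lines down across the east side after the storm",
    "Happy birthday to my best friend!",
]
inputs = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, 2)

# argmax gives a 0/1 class per tweet; meaningful only after fine-tuning.
print(logits.argmax(dim=-1))
```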

6 Evaluation Metrics

The performance of each model has been evaluated using AUC, precision, recall, and F1-score, as shown in Table 2.

Table 2 Evaluation metrics of various methods
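All four metrics are available in scikit-learn; the sketch below computes them on invented binary labels (informative = 1) to show how the values in Table 2 would typically be produced. Note that AUC is computed from scores or probabilities rather than from hard predictions.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Illustrative ground truth and predictions for a binary informative/not task.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # model probabilities, for AUC

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
```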

7 Comparison Chart

The figure shows a comparison chart of parameters such as F1-score, Precision, Recall, and Accuracy for algorithms including crowdsourcing, CNN crisis embedding, BiLSTM crisis embedding, the Sem CNN model, TLex embedding, the Markov chain algorithm, and the SVM algorithm. The chart shows that the F1-score is highest for the TLex embedding algorithm, Precision and Recall are highest for the Random Forest algorithm, and accuracy is highest for the Image4Act method compared with the other algorithms.

The graphical representation of the F1-scores of the different algorithms is depicted in Fig. 1a. The graph shows that the TLex algorithm has a better F1-score than the other algorithms.

Fig. 1a Graphical representation of F1-score of different algorithms

The graphical representation of the precision of the different algorithms is shown in Fig. 1b. The graph shows that the Random Forest algorithm, implemented on the European flood dataset, has a higher precision score than the other algorithms.

Fig. 1b Graphical representation of precision score of different algorithms

The graphical representation of the recall score of the different algorithms is shown in Fig. 1c. The figure shows that the Random Forest algorithm has a better recall score and classified the dataset into relevant and irrelevant tweets.

Fig. 1c Graphical representation of recall score of different algorithms

The graphical representation of the accuracy of the different algorithms is depicted in Fig. 1d. The graph shows that the Image4Act algorithm has the highest accuracy compared with the other algorithms.

Fig. 1d Graphical representation of accuracy of different algorithms
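For reference, the short matplotlib sketch below redraws only the headline values quoted in this section and the conclusion (TLex F1-score 94%, Random Forest precision 98.3% and recall 80.4%, Image4Act accuracy 98%); the full per-algorithm series behind Fig. 1a-d are not reproduced here.

```python
import matplotlib.pyplot as plt

# Headline scores quoted in Sects. 7-8 of this review; all other
# per-algorithm values from the original charts are omitted.
metrics = ["TLex F1", "RF Precision", "RF Recall", "Image4Act Accuracy"]
values  = [94.0, 98.3, 80.4, 98.0]

plt.bar(metrics, values)
plt.ylabel("Score (%)")
plt.ylim(0, 100)
plt.xticks(rotation=20)
plt.title("Best-reported score per metric")
plt.tight_layout()
plt.savefig("comparison_chart.png")
```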

8 Conclusion

This review has detailed the classification of tweets, the datasets, preprocessing methods, and machine learning algorithms used in the surveyed studies. The performance of each model, evaluated using AUC, precision, recall, and F1-score, has been discussed. Of the classification algorithms analyzed, Naive Bayes and SVM, the SVM outperformed the other classifiers. This article gives a brief review of existing publications that focus on detecting related and relevant tweets, event types, and information types, together with a few works on detecting crisis-related images on Twitter, extracting informative textual content from images, and detecting Chinese text. The evaluation metrics of the various algorithms were analyzed graphically. From the charts, the TLex algorithm, implemented on the COCO dataset, has a high F1-score of 94%; the Random Forest algorithm, implemented on the European flood dataset, has high precision and recall scores of 98.3% and 80.4%, respectively; and the Image4Act algorithm, which predicts disaster-related images posted on Twitter, achieves the highest accuracy of 98% compared with the other algorithms. Hence, this review article has provided a detailed analysis of the methodology, algorithms, datasets, and evaluation metrics of the various methods. Future research may focus on evaluating other machine learning algorithms with improved evaluation metrics.