1 Introduction

Twitter is a micro-blogging site on which users post and interact through short messages known as tweets. Tweets are visible to everyone by default, although users can restrict their tweets to their followers, mute accounts they do not want to interact with, and block others from viewing their tweets. During a crisis, Twitter acts as a news source that gives information seekers timely access to details about disaster events, often faster than conventional news providers, and these details remain available for later reference. The Retweet feature republishes a post so that the information reaches a wider audience, making Twitter a vital channel for information sharing during crisis events. However, because such varied tweets are broadcast rapidly and in large volumes, extracting only the needed information is a challenging task. Studies of tweets collected during various crisis events have found that crisis-related tweets fall into categories such as affected people, infrastructure and utility damage, caution and advice, sympathy and emotional support, donations and volunteering, and other useful information; yet tools for automatically classifying tweets and extracting this useful information are largely unavailable [1]. In this review article, we compare the classifiers different authors have used to classify tweets, evaluate their performance, and examine the information extracted for the identification of informative tweets using deep neural networks during crisis events [2]. The following sections describe the crisis-related datasets, tweet classification, preprocessing methods, methodology, and machine learning algorithms used in these studies.

2 Crisis-Related Dataset

Collecting data from social media is an important step in building models that automatically detect particular events. Researchers have scraped tweets about hurricanes, floods, earthquakes, wildfires, and similar events from Twitter and made the data publicly available. Images posted during four natural disasters, Typhoon Ruby, Hurricane Matthew, the Ecuador Earthquake, and the Nepal Earthquake, were used for evaluation [3]: 3518 selected images underwent damage severity assessment and were classified into three categories, severe, mild, and no damage. Twitter datasets of roughly 21,703 tweets collected during the 2015 Nepal Earthquake (NEQ) and the 2013 Queensland Floods (QFL) were classified into relevant and non-relevant data; they consist of both labeled and unlabeled data from related events [4]. The CrisisMMD dataset covers seven natural disasters, Hurricane Harvey 2017, Hurricane Irma 2017, Hurricane Maria 2017, the Mexico Earthquake 2017, the California Wildfires 2017, the Iraq-Iran Earthquake 2017, and the Sri Lanka Floods 2017, with 3.5 million tweets and 176,000 images. The authors focused on both textual content and labeled images to extract useful information, which helps humanitarian organizations plan relief operations [5, 6]. The CrisisNLP dataset, which contains data from various crisis events, was used with a CNN and word-embedding model to classify crisis-related textual content from Twitter, achieving the best performance compared with other models [7]. The CrisisLexT26 dataset, covering crisis events from 2012 to 2013, was used to identify different information categories with a CNN model [8]. A dataset collected from Twitter's API using the hashtags #Joplin and #sandy was used to identify useful textual content with a model based on conditional random fields, achieving a 90% detection rate [9]. Finally, data collected from Twitter on Hurricane Florence 2018 provides a detailed picture of the affected people, areas, and utility damage [10].
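As a concrete illustration, the sketch below loads a CrisisMMD-style annotation file with pandas and inspects the class balance before any training. The file name and the column names (`tweet_text`, `label`) are assumptions for illustration only; actual CrisisMMD releases ship TSV annotation files whose exact schema varies by task and version.

```python
import pandas as pd

# Hypothetical path and column names; adjust to the schema of the
# annotation file actually downloaded (CrisisMMD distributes TSVs).
ANNOTATIONS = "crisismmd_annotations.tsv"

df = pd.read_csv(ANNOTATIONS, sep="\t")

# Keep only the tweet text and its humanitarian-category label.
df = df[["tweet_text", "label"]].dropna()

# Inspect the class balance; crisis datasets are often heavily skewed,
# which matters when choosing classifiers and evaluation metrics.
print(df["label"].value_counts())
```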

3 Preprocessing Methods

Preprocessing is required for data collected from Twitter because tweets contain misspellings, incomplete sentences, and grammatical errors. To process the input data, the CNN with pre-trained word vectors developed by Kim is used for sentence-level classification tasks [8,9,10,11]. The Lovins stemmer was used to reduce inflected words to their stems [12], and unigram and bigram features were selected for the classification tasks. One author used the jieba segmentation package to automatically segment Chinese text from the Twitter dataset [13], while the Convolutional Sparse Auto-Encoder (CSAE) is used to extract Chinese text from images [14]. Stemming, stop word removal, and spell checking are applied as preprocessing steps [15].
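A minimal preprocessing sketch along these lines is shown below, assuming English tweets. NLTK does not ship a Lovins stemmer, so the Porter stemmer stands in for the stemming step, and the stop word set is a small illustrative list rather than a full lexicon.

```python
import re
from nltk.stem import PorterStemmer  # stand-in: NLTK has no Lovins stemmer

# Small illustrative stop word set; a real pipeline would use a full list.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "for"}
stemmer = PorterStemmer()

def preprocess(tweet: str) -> list[str]:
    """Clean a raw tweet and return stemmed tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", " ", tweet)  # drop URLs
    tweet = re.sub(r"@\w+", " ", tweet)          # drop user mentions
    tweet = tweet.replace("#", " ")              # keep hashtag words, drop symbol
    tweet = re.sub(r"[^a-z\s]", " ", tweet)      # keep letters only
    tokens = [t for t in tweet.split() if t not in STOPWORDS]
    return [stemmer.stem(t) for t in tokens]     # stemming step

print(preprocess("Massive flooding near #Joplin, see https://example.com @user"))
```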

4 Methodology

The various methods used for the automatic detection of crisis-related messages on Twitter are shown in the figure. Most of them share a common pipeline: collect tweets, preprocess the text, convert it to features, and classify; a minimal sketch of this shared pipeline follows.
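The sketch below uses scikit-learn with TF-IDF features and a linear SVM, mirroring the SVM (TF-IDF) setup described in [1]; the toy tweets and labels are invented for illustration, not drawn from any of the reviewed datasets.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labeled tweets standing in for a real crisis dataset (invented examples).
texts = [
    "Bridge collapsed on Highway 9, avoid the area",
    "Volunteers needed at the shelter on Main Street",
    "Had a great lunch with friends today",
    "Our thoughts are with everyone affected by the flood",
]
labels = ["informative", "informative", "not_informative", "informative"]

# TF-IDF vectorization (unigrams and bigrams) followed by a linear SVM.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
model.fit(texts, labels)

print(model.predict(["Road blocked by debris near the river"]))
```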

5 Machine Learning Algorithms

Supervised learning is a machine learning approach in which a model is trained on a labeled dataset to map input variables to a predicted output variable. A semi-supervised learning approach combining self-training-based and graph-based experiments was applied to datasets collected from Twitter; the graph-based semi-supervised learning algorithm achieves better results in terms of F1-score [4]. The machine learning classifiers SVM (TF-IDF) and SVM (Word2Vec) are used to identify tweets related to crisis events [1]. A Transformer-based technique, Bidirectional Encoder Representations from Transformers (BERT), is used for natural language processing (NLP) [16]. Domain adaptation with the Naive Bayes classifier is used to classify tweets from labeled and unlabeled data [17]. To evaluate Crisis2Vec embeddings, a linear Logistic Regression model and a non-linear Long Short-Term Memory (LSTM) model are used to measure performance [6]. Knowledge graphs (KGs), an emerging AI technology, are surveyed in terms of the opportunities, challenges, and implementation of COVID-19 KGs in industry and academia [18]. A text steganalysis model based on a CNN framework is used for better identification of short text [19]. An unsupervised approach, the convolutional sparse auto-encoder (CSAE), is used to pre-train the CNN model for extracting Chinese text from images and achieves better results [14]. A supervised candidate text region (CTR) generation method based on text-aware saliency detection predicts the initial location of text [20]. The Naive Bayes text classification algorithm is used to identify text based on opinion [11] (Table 1).

Table 1 Various methods on the classification of text and images
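Of the techniques above, BERT [16] is the most widely adopted in recent work. The sketch below shows a minimal inference pass with the Hugging Face Transformers library, assuming a binary informative/not-informative task; the `bert-base-uncased` checkpoint carries an untrained classification head here, so real use requires fine-tuning on labeled crisis tweets first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic checkpoint; the reviewed works fine-tune on crisis tweets, which
# this sketch omits. The 2-label head below is randomly initialized.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

tweets = [
    "Power lines down across the east side after the storm",
    "Happy birthday to my best friend!",
]
inputs = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, 2)

# argmax gives a 0/1 class per tweet; meaningful only after fine-tuning.
print(logits.argmax(dim=-1))
```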

6 Evaluation Metrics

The performance of each model has been evaluated using AUC, precision, recall, and F1-score, as shown in Table 2.

Table 2 Evaluation metrics of various methods
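All four metrics are available in scikit-learn; the sketch below computes them on invented binary labels (informative = 1) to show how the values in Table 2 would typically be produced. Note that AUC is computed from scores or probabilities rather than from hard predictions.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Illustrative ground truth and predictions for a binary informative/not task.
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # model probabilities, for AUC

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
```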

7 Comparison Chart

The figure shows a comparison chart of parameters such as F1-score, Precision, Recall, and Accuracy for algorithms including crowdsourcing, CNN crisis embedding, BiLSTM crisis embedding, the Sem CNN model, TLex embedding, the Markov chain algorithm, and the SVM algorithm. The chart shows that the F1-score is highest for the TLex embedding algorithm, Precision and Recall are highest for the Random Forest algorithm, and accuracy is highest for the Image4Act method compared with the other algorithms.

The graphical representation of the F1-scores of the different algorithms is depicted in Fig. 1a. The graph shows that the TLex algorithm has a better F1-score than the other algorithms.

Fig. 1a Graphical representation of F1-score of different algorithms

The graphical representation of the precision of the different algorithms is shown in Fig. 1b. The graph shows that the Random Forest algorithm, implemented on the European flood dataset, has a higher precision score than the other algorithms.

Fig. 1b Graphical representation of precision score of different algorithms

The graphical representation of the recall score of the different algorithms is shown in Fig. 1c. The figure shows that the Random Forest algorithm has a better recall score and classified the dataset into relevant and irrelevant tweets.

Fig. 1c Graphical representation of recall score of different algorithms

The graphical representation of the accuracy of the different algorithms is depicted in Fig. 1d. The graph shows that the Image4Act algorithm has the highest accuracy compared with the other algorithms.

Fig. 1d Graphical representation of accuracy of different algorithms
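For reference, the short matplotlib sketch below redraws only the headline values quoted in this section and the conclusion (TLex F1-score 94%, Random Forest precision 98.3% and recall 80.4%, Image4Act accuracy 98%); the full per-algorithm series behind Fig. 1a-d are not reproduced here.

```python
import matplotlib.pyplot as plt

# Headline scores quoted in Sects. 7-8 of this review; all other
# per-algorithm values from the original charts are omitted.
metrics = ["TLex F1", "RF Precision", "RF Recall", "Image4Act Accuracy"]
values  = [94.0, 98.3, 80.4, 98.0]

plt.bar(metrics, values)
plt.ylabel("Score (%)")
plt.ylim(0, 100)
plt.xticks(rotation=20)
plt.title("Best-reported score per metric")
plt.tight_layout()
plt.savefig("comparison_chart.png")
```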

8 Conclusion

This review has detailed the classification of tweets, the datasets, preprocessing methods, and machine learning algorithms used in the surveyed studies. The performance of each model, evaluated using AUC, precision, recall, and F1-score, has been discussed. Of the classification algorithms analyzed, Naive Bayes and SVM, the SVM outperformed the other classifiers. This article gives a brief review of existing publications that focus on detecting related and relevant tweets, event types, and information types, together with a few works on detecting crisis-related images on Twitter, extracting informative textual content from images, and detecting Chinese text. The evaluation metrics of the various algorithms were analyzed graphically. From the charts, the TLex algorithm, implemented on the COCO dataset, has a high F1-score of 94%; the Random Forest algorithm, implemented on the European flood dataset, has high precision and recall scores of 98.3% and 80.4%, respectively; and the Image4Act algorithm, which predicts disaster-related images posted on Twitter, achieves the highest accuracy of 98% compared with the other algorithms. Hence, this review article has provided a detailed analysis of the methodology, algorithms, datasets, and evaluation metrics of the various methods. Future research may focus on evaluating other machine learning algorithms with improved evaluation metrics.