1 Introduction

Multimedia-based surveillance has become increasingly vital, as many incidents go unnoticed even where CCTV is installed. The November 2015 terror attacks in Paris threw the social media community into a frenzy [26]. A fire hose of information, opinions, and news streamed through Twitter, Facebook, and other networks: some of it real, some of it false, and much of it miscommunication. Social media has made disaster response more complicated, yet often more effective.

In any disaster, especially one producing casualties, there are two things that people want to know: what is going on, and are my friends and family safe? Before the internet was widespread, those questions were answered by legacy news sources such as television and radio, and the whereabouts of loved ones would likely depend on a series of phone calls. Today, the game has completely changed. As soon as a major event occurs, a flood of information from hundreds, if not thousands, of sources springs up, readily available to anyone with internet access. It has even transformed the nature of journalism, with citizen journalists sometimes providing the most up-to-date and accurate information while traditional reporters scramble for sources. But with this overwhelming amount of information there is a glaring problem: whom can we trust?

This is where another major factor comes in: crowdsourcing. While people post information, educated groups fact-check, reference, and corroborate those posts. Companies like Grasswire can collect huge numbers of tweets and photos while following up on stories and checking with experts, all in real time. The faked photo of a Sikh man, for example, was almost immediately flagged as “fake” by Grasswire, preventing it from gaining much traction.

The internet’s ability to regulate itself can lead to even stronger journalistic standards than those of staffed legacy news networks. At the same time, crowdsourcing information is not always the best idea. During the hunt for the perpetrators of the 2013 Boston bombing, users of the crowdsourcing website Reddit attempted to figure out the bombers’ identity. A witch-hunt based on weak evidence led to multiple innocent people being targeted in the media and online, despite having nothing to do with the case at all.

Clearly, it is important that there be some editorial standard and organization, or a crowd can run amok with the wrong information. At the same time, social media has made it fast and easy to reassure everybody [13]. Instead of having to create a phone tree, we can reassure anyone interested in our wellbeing and then deal with the disaster at hand. Public officials are also able to reach as many people as possible by providing directions and updates online, thereby potentially saving lives. Despite the drawbacks of a barrage of information, social media has made disaster response more effective and, under the right organization, more accurate. As technology changes the way we interact with the world, it is important to keep regulating and adapting to new tools, given the power of a single social media platform [16].

Apparently, people in the neighbourhood took out their phones, took pictures of the fire, and posted them on Facebook rather than calling the fire department; it was not until 15 minutes after the fire started that the fire department was alerted. Whether to post first on social media or to alert the relevant authority has thus become a matter of personal choice.

In 2014, a gas explosion occurred in two apartment buildings in the East Harlem neighbourhood of Manhattan, New York City (Fig. 1). People were taking selfies with the burning buildings in the background [34]. The NYC fire department responded to the scene within two minutes of the explosion. This was a heart-of-the-city scenario; if such a tragedy happened in a remote area, the authorities would not learn of it unless someone alerted them right away.

Fig. 1. Aerial view of the explosion

It is hardly news that the Internet has turned much of our world upside down. This is as true for how we communicate when an event occurs as it is for how we book our travel: social media is creating new crises, new ways to respond, and new channels to exploit. Almost everyone in public relations and crisis communications understands that these changes are big.

We would like to suggest that they are much bigger than most realize; in fact, event communication in the social media era is nothing like it was before. Consider how communication used to be done in the old days, so to speak. An event occurs; we picture it as an explosion, because crises by nature tend to be sudden, explosive events with major consequences. When an event like this happens, the response team is activated; if the incident command system is used, the various sections are organized and the public information officer (PIO) is identified and gets to work. As head of the communications function, the PIO gathers information from the responders, drafts the press release, gets the information approved by command, answers media questions, and hands the release to the waiting media. The PIO then sets up the press conference, which is the main way to get response information to the media.

Recently, various technologically advanced methods of fire detection have been introduced, since fire breakouts often cause severe economic damage as well as loss of life. Fire spreads easily to its surroundings, carrying strong heat and preventing people from escaping, and as it spreads it becomes more destructive, resulting in more casualties and deaths. Fire accounts for huge economic losses and fatalities; early fire detection is therefore effective in limiting economic losses and saving human lives. Detecting a fire at a very early stage is necessary for people to escape the fire area and for the fire source to be extinguished properly.

The most suitable way to detect fire at an early stage and avoid losses is to install a fire alarm system. A fire alarm system is equipped with various interconnected devices that work together to detect fire and alert the surrounding people, via connected video and audio appliances, about the fire breakout and the emergency exits. Detection devices such as heat, smoke, and gas detectors first detect their respective signals, after which the alarm is activated automatically. Bells, mountable sounders, and horns are fitted in the alarm system to give a proper alert.

Fire has various features, such as light, heat, and smoke, that are detectable by different sensors. These days a large amount of fire data exists on the cloud in the form of smoke and heat sensor data as well as fire image data (smoke images, fire images). We can use these datasets for fire detection, together with data uploaded to social media, such as fire images. In this era of artificial intelligence, machine learning methods can be used both to predict fire and to detect it. To provide an efficient solution to the problem described above, we propose a strategy based on two machine learning methods. Our first proposed model is a hybrid neural network made up of AdaBoost and many MLP (multi-layer perceptron) neural networks; the aim of this hybrid AdaBoost-MLP model is to predict fire proficiently, and it is trained on data from different sensors such as smoke, heat, and gas. After prediction, we use a deep neural network, specifically a convolutional neural network (CNN), to detect fire in videos or images. The proposed CNN model is trained on data available on the cloud as well as data gathered from social media.

The rest of the paper is organized as follows. Section 2 describes our motivation. Section 3 gives an overview of the proposed work, with subsections describing each module in detail. Section 4 provides a detailed description of the experiments and results. Finally, we conclude our work.

2 Motivation

Since many of the initial reports related to a fire appear on social media, we would like to collect information from social media together with whatever sensor data is available; in the case of a wildfire, this means sensors inside the forest (if any) and satellite images. A satellite image may not clearly reveal the exact details of the fire, but it can easily confirm the existence of a fire. Once this is established, we can gather further data both from social media and from sensors placed inside the forest or at research stations near the fire, and use machine learning techniques to find the required solution.

3 Proposed work

Early fire detection for the surveillance of industries, homes, forests, nuclear power plants, and other public areas can limit ecological, economic, and social damage. Although plenty of work has been done on early fire detection, it remains a challenging problem to come up with a more efficient method; the problem demands an algorithm that can achieve better accuracy while minimizing false alarms [34]. A fire has features such as light, smoke, heat, and gas, among many others. As this is the era of artificial intelligence and machine-learning-based algorithms, we propose a method that uses machine learning techniques to provide a better solution to the problem mentioned above. Our proposed model consists of two main deep neural network models. First, we use a hybrid model made of AdaBoost and many MLP neural networks; the purpose of this hybrid AdaBoost-MLP model is to predict fire efficiently, and it is trained on data from different sensors such as smoke, heat, and gas. After predicting the fire, we use a CNN model to detect the fire immediately. The CNN model takes two types of input: (i) pictures and videos from the nearest selected surveillance camera node, or (ii) data posted on social media by people in that region. Each part is described in detail in the following subsections, and a minimal sketch of the two-stage pipeline is given below.
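To make the two-stage design concrete, the following Python sketch chains a sensor-based predictor and an image-based detector. The function name, the model interfaces, and the 0.5 thresholds are illustrative assumptions, not part of the proposed system.

```python
import numpy as np

def fire_pipeline(sensor_window, camera_frames, predictor, detector,
                  risk_threshold=0.5):
    """Two-stage fire monitoring sketch: predict from sensors, confirm from images.

    sensor_window : 1-D array of recent smoke/heat/gas readings
    camera_frames : list of preprocessed RGB frames (CCTV or social media posts)
    predictor     : AdaBoost-MLP style model exposing predict()
    detector      : CNN exposing predict_proba(), returning [P(normal), P(fire)]
    """
    # Stage 1: predict the fire risk from the sensor streams.
    risk = float(predictor.predict(sensor_window.reshape(1, -1))[0])
    if risk < risk_threshold:
        return {"risk": risk, "fire_detected": False}

    # Stage 2: confirm visually with the CNN detector on each frame.
    fire_probs = [float(detector.predict_proba(frame[np.newaxis])[0, 1])
                  for frame in camera_frames]
    return {"risk": risk,
            "fire_detected": max(fire_probs) > 0.5,
            "max_fire_probability": max(fire_probs)}
```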

3.1 Adaboost-MLP hybrid model

Our proposed hybrid model for fire prediction is shown in Fig. 2. First, the AdaBoost algorithm is used to obtain better predictions from MLP neural networks: many MLP networks act as weak predictors of fire, and all of their forecasts are finally combined using the weights provided by AdaBoost. A detailed description of these parts is given in the following subsections.

Fig. 2. Proposed AdaBoost-MLP model

3.1.1 Adaboost algorithm

AdaBoost, short for Adaptive Boosting, is a powerful machine learning algorithm formulated by Freund [14]. It is commonly used in conjunction with many weak artificial neural networks: it combines the outputs of the weak classifiers (‘weak learners’) into a weighted sum, which is then taken as the final output (boosting). Although it was originally designed for feature classification and regression, its efficiency in classification has led to its use in many image processing applications [17, 30, 32]. The AdaBoost algorithm is sensitive to noisy data and outliers. In our proposed scheme, we use AdaBoost to combine the outputs of many MLP neural networks for better and more efficient fire prediction.

3.1.2 MLP neural networks

MLP, short for multilayer perceptron, is a feed-forward artificial neural network [31]. It performs well on nonlinear mapping problems and has good generalization ability, which is why it has been applied in many signal processing tasks. A simple MLP neural network consists of three layers: one input layer, one hidden layer, and one output layer. Except in the input layer, every neuron uses a nonlinear activation function (commonly the hyperbolic tangent or the logistic function). Training uses the backpropagation technique. We used several MLP neural networks as weak prediction models for fire, as sketched below.
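For illustration, here is a minimal numpy sketch of the forward pass of such a three-layer MLP; the weights are random placeholders, whereas the actual networks are trained with backpropagation.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a minimal 3-layer MLP (input -> hidden -> output)."""
    h = np.tanh(x @ W1 + b1)      # hidden layer with nonlinear activation
    return h @ W2 + b2            # linear output for regression-style prediction

# Example with 8 sensor inputs, 16 hidden neurons, 1 output (shapes only).
rng = np.random.default_rng(0)
x = rng.random(8)
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)), np.zeros(1)
print(mlp_forward(x, W1, b1, W2, b2))
```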

3.2 Convolutional neural network architecture

Deep CNNs have enabled powerful applications in computer vision, such as image classification and image retrieval [5, 8, 20, 38], object detection [35, 36], and image segmentation and localization [21, 23]. The CNN model was first derived by Fukushima [15], who devised a hierarchical neural network architecture following the seminal research of Hubel [19]. Later, a CNN model was designed by LeCun [12] for the classification of digits, and Ciresan [11] proposed a CNN model for image classification. The popularity of CNNs across computer vision applications is associated with their hierarchical architecture: the basic idea behind a deep convolutional neural network is to partition the problem into subparts, repeatedly, until the final solution is reached [1,2,3, 7, 27]. Other benefits of CNNs are that networks can be modified very easily, and training datasets and other parameters can be updated without much complexity.

Normally, a CNN comprises different types of processing layers, including convolutional, max-pooling, and fully connected (FC) layers, connected so that the output of one layer becomes the input of the next. Convolutional layers are the core building blocks of the CNN model. They are made of rectangular frames of neurons, each with a small receptive field that extends through the input volume, so a convolutional layer is a convolution of the preceding layer; however, adding more convolutional layers can make the network more complex, especially for recognition systems. Convolutional layers may be followed by max-pooling layers, whose primary function is to subsample their inputs. After several convolutional and max-pooling layers, fully connected layers perform the high-level reasoning in the neural network. For real-time applications, these layers can be combined and rearranged many times to obtain the expected results.

Our proposed CNN model has three convolutional layers and three max-pooling layers, and is illustrated in Fig. 3. The input to the CNN model is taken from stored images or from images posted by people on social media over the internet. These images are resized to 224 × 224 × 3 and passed through the first convolutional layer, where 96 kernels of size 11 × 11 with a stride of 4 are applied to produce 96 feature maps. These features are then sent to a max-pooling layer to reduce complexity, shrinking the feature maps by a factor of 2. The second convolutional layer, with 256 kernels of size 5 × 5, follows, together with a similar max-pooling layer; the last convolutional layer has the same 256 kernels and is followed by a final pooling layer like the previous ones. At the end of the model there are three fully connected layers that classify the output of the previous layers; in the final layer we use a SoftMax classifier, so the output is classified as “Fire” or “Normal”. The SoftMax discriminant classifier does this by providing a weighted distance between training and testing samples of the particular data class. We use a leaky ReLU activation function with coefficient a = 13 for the convolutional and fully connected layers, similarly to [4, 37], as shown in Fig. 4. A sketch of this architecture is given below.
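The following PyTorch sketch instantiates this layer layout. The fully connected layer widths, the third convolution's kernel size, and the leaky slope are illustrative assumptions, since the text above fixes only the filter counts, the first two kernel sizes, and the stride.

```python
import torch
import torch.nn as nn

class FireCNN(nn.Module):
    """Sketch: 3 conv + 3 max-pool layers followed by 3 fully connected layers."""
    def __init__(self, negative_slope=0.01):
        super().__init__()
        act = nn.LeakyReLU(negative_slope)
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), act,    # 224 -> 54
            nn.MaxPool2d(2),                                     # 54 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2), act,   # 27 -> 27
            nn.MaxPool2d(2),                                     # 27 -> 13
            nn.Conv2d(256, 256, kernel_size=5, padding=2), act,  # 13 -> 13
            nn.MaxPool2d(2),                                     # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 512), act,
            nn.Dropout(0.5),                 # dropout 0.5, as used in Section 4.2
            nn.Linear(512, 128), act,
            nn.Linear(128, 2),               # logits for "Fire" vs "Normal"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Forward pass on one resized 224 x 224 x 3 image; softmax gives class probabilities.
model = FireCNN()
probs = torch.softmax(model(torch.randn(1, 3, 224, 224)), dim=1)
print(probs)
```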

Fig. 3. Proposed CNN model

Fig. 4. Leaky ReLU

For calculating the precision p, recall r, F_measure, and accuracy, we used the following equations. Precision is the fraction of the retrieved objects that are relevant, whereas recall is the fraction of all relevant objects that are retrieved; the F_measure combines precision and recall into a single measure of efficiency. These quantities are calculated as follows.

$$ \begin{aligned} p &= \frac{T^p}{T^p+F^p}\\ r &= \frac{T^p}{T^p+F^n}\\ F\_Measure &= 2\times\left(\frac{p\times r}{p+r}\right)\\ Accuracy &= \frac{T^p+T^n}{T^p+T^n+F^p+F^n} \end{aligned} $$

Where:

Tp (True positive) = (Fire images predicted as fire images).

Tn (True negative) = (Normal images predicted as Normal images).

Fp (False positive) = (Normal images predicted as fire images).

Fn (False negative) = (Fire images predicted as Normal images).

The true positive rate and the false positive rate are computed as follows.

$$ \begin{aligned} T^p\ \mathrm{rate} &= \frac{T^p}{T^p+F^n}\\ F^p\ \mathrm{rate} &= 1-\frac{T^n}{T^n+F^p} \end{aligned} $$
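These definitions translate directly into code; the following helper is a sketch, and the example counts at the bottom are placeholders rather than results from this paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure, accuracy, TP rate and FP rate from counts."""
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # also the true positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    fp_rate   = 1 - tn / (tn + fp)      # equivalently fp / (fp + tn)
    return {"precision": precision, "recall": recall, "F_measure": f_measure,
            "accuracy": accuracy, "TP_rate": recall, "FP_rate": fp_rate}

# Hypothetical counts, for illustration only.
print(detection_metrics(tp=90, tn=85, fp=10, fn=15))
```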

4 Experiments and results

The purpose of our work is to predict fire before it happens and, if a fire does happen, to classify whether the input image contains fire. For this, both of our ANN models are trained on collections of labelled data (sensor data and images). Furthermore, after prediction, we want to locate the position of the fire. The training of each model is described below.

4.1 Training of Adaboost-MLP model

Fires produce many observable outputs, such as light, smoke, various gases, and heat, and these parameters can be used to detect them. We used CO2 gas data and heat data as training data for both the AdaBoost and MLP models. The complete MLP network training process is illustrated in Fig. 5. As the figure shows, we chose 900 samples of original, pre-processed fire sensor data, where each sample is a vector containing several sensor values, and divided the data into three parts: a training set, a validation set, and a testing set. In total we used almost 3.45 GB of data. The first set, samples 1 to 700, is used to train the model; the second set of one hundred samples (701–800) is used to validate the model; and the remaining samples, 801 to 900, are used to test it.

Fig. 5. Flow chart of the MLP neural network training process

The training set was further divided into eight different input streams. After training the model, we evaluated it on the 100 held-out samples (vectors 801 to 900). In our model the number of input neurons is 8, the number of hidden neurons is 16, and there is a single output neuron; the learning rate was 0.01 and training was run for 100 iterations. A sketch of this configuration is given below.
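A minimal scikit-learn sketch of this setup is shown below. The sensor matrix is a random placeholder standing in for the eight pre-processed input streams; the split indices, layer sizes, learning rate, and iteration count follow the values stated above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((900, 8))           # 900 samples, 8 sensor input streams
y = rng.random(900)                # placeholder fire-risk targets

# Split as described: 1-700 training, 701-800 validation, 801-900 testing.
X_train, y_train = X[:700], y[:700]
X_val,   y_val   = X[700:800], y[700:800]
X_test,  y_test  = X[800:900], y[800:900]

mlp = MLPRegressor(hidden_layer_sizes=(16,),   # 8 inputs -> 16 hidden -> 1 output
                   learning_rate_init=0.01,
                   max_iter=100)
mlp.fit(X_train, y_train)
print("validation R^2:", mlp.score(X_val, y_val))
print("test R^2:", mlp.score(X_test, y_test))
```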

The basic operations of the commonly used AdaBoost algorithm [22] are as follows.

To calculate the weights {D_t(i)} of the fire data series, the following equation can be used:

$$ D_t(i)=\frac{1}{n}, \quad i=1,2,3,\dots,n;\ \ t=1,2,3,\dots,T $$

Here n is the sample size of the fire data series and T is the number of MLP-based forecasting models. After initializing the weights, the model uses equation (1) to calculate the forecasting error ε_i (the error made during MLP forecasting for each input channel), equation (2) to calculate the average error ε_t, and equation (3) to calculate the weight of each MLP predictor. Equation (4) is used to update the sampling weights of the input streams, where z_t is the normalization factor. This procedure is repeated until all MLP models have produced their outputs; finally, all the MLP predictions are combined within the AdaBoost framework to make the final, more accurate prediction. A code sketch of this weighting procedure is given after the equations. Figure 6 shows the training error of our model. The formulae are as follows:

$$ \varepsilon_i=\frac{\left| x_i-\overline{x_i}\right|}{x_i}, \quad i=1,2,3,\dots,n $$
(1)
$$ \varepsilon_t=\frac{1}{n}\sum_{i=1}^n \varepsilon_i $$
(2)
$$ W_t=\frac{1}{2}\ln\left[\frac{1-\varepsilon_t}{\varepsilon_t}\right] $$
(3)
$$ \left\{\begin{array}{l} D_t(i)=\dfrac{D_{t-1}(i)\,\beta_t^{-\varepsilon_i}}{z_t}\\[6pt] \beta_t=\dfrac{\varepsilon_t}{1-\varepsilon_t}\end{array}\right. $$
(4)
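A direct numpy transcription of Eqs. (1)–(4) is sketched below; the toy series and forecasts at the end are placeholders for the outputs of the trained MLP predictors, and the final weighted combination assumes each ε_t stays below 0.5 so that the weights W_t are positive.

```python
import numpy as np

def adaboost_mlp_combine(x_true, mlp_forecasts):
    """Combine forecasts from T weak MLP predictors as in Eqs. (1)-(4).

    x_true        : array of shape (n,)   -- observed sensor values
    mlp_forecasts : array of shape (T, n) -- forecast of each weak MLP
    Returns the weighted ensemble forecast and the predictor weights W_t.
    """
    T, n = mlp_forecasts.shape
    D = np.full(n, 1.0 / n)                                   # initial sample weights
    W = np.zeros(T)

    for t in range(T):
        eps_i = np.abs(x_true - mlp_forecasts[t]) / x_true    # Eq. (1)
        eps_t = eps_i.mean()                                  # Eq. (2)
        W[t] = 0.5 * np.log((1 - eps_t) / eps_t)              # Eq. (3)
        beta_t = eps_t / (1 - eps_t)                          # Eq. (4)
        D = D * beta_t ** (-eps_i)
        D /= D.sum()                                          # z_t normalization

    weights = W / W.sum()
    return weights @ mlp_forecasts, weights

# Toy example with 3 weak predictors over 5 samples (values are placeholders).
x = np.array([20.0, 21.0, 22.0, 23.0, 24.0])
forecasts = x + np.array([[0.5], [-0.4], [0.2]])
combined, w = adaboost_mlp_combine(x, forecasts)
print(combined, w)
```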
Fig. 6. Training error of the AdaBoost-MLP model

4.2 Training of CNN model

As is usual, CNNs work with two types of neuron layers: convolutional layers built with a non-linear activation function, and subsampling layers implementing a non-linear pooling operation. Mixing these kinds of layers is fruitful for building applications such as object detection. As mentioned in the proposed scheme, we use a ReLU-type activation function; we chose it over the sigmoid and hyperbolic tangent because of its higher accuracy and quicker training. The basic formula of the leaky rectified linear unit is the following [37]:

$$ f(x)=\begin{cases} x & \text{if } x>0\\ ax & \text{if } x\le 0 \end{cases} $$
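In code this amounts to the following one-line function; the slope a = 0.01 used here is a common default chosen only for illustration.

```python
import numpy as np

def leaky_relu(x, a=0.01):
    """Leaky ReLU: identity for positive inputs, small slope a otherwise."""
    return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))
```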

The goal of classification with the CNN model is to classify the image content as fire or normal. To achieve this, we used 74,719 labelled RGB images from different datasets, Foggia's video dataset [25], Chino's dataset [10], and some other datasets [24], to train our model; a detailed description is given in Table 1. We used 45% of the data for training and the remaining 55% for testing, and Table 2 shows this partition into training and testing sets. We trained our model on a computer with an Intel(R) Core(TM) i5-3570 CPU @ 3.40 GHz (up to 3.80 GHz), 32 GB of RAM, and an Nvidia Titan XP graphics card. We used stochastic gradient descent with the backpropagation algorithm, starting from randomly chosen weights and adjusting them towards the best values. The learning rate was initially 0.01 and decreased by a factor of 0.95 after every five epochs; a sketch of this schedule is given below. We trained our model over 6 iterations and calculated the average error and miss rate for each iteration. The graph in Fig. 7 shows the variation in error rate and miss rate: after each iteration the error rate decreased by a factor of 0.1, while the miss rate increased, remaining in the range of 80% to 90%. To avoid overfitting in the fully connected layers, we used a dropout of 0.5.
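A minimal PyTorch sketch of this decay schedule is shown below; the stand-in model, the epoch count, and the omitted batch loop are placeholders for the actual training code.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(8, 2)                           # stand-in for the CNN sketched above
optimizer = SGD(model.parameters(), lr=0.01)            # initial learning rate 0.01
scheduler = StepLR(optimizer, step_size=5, gamma=0.95)  # x0.95 every 5 epochs
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):                   # epoch count is illustrative
    # one pass over the training batches would go here, e.g.:
    #   optimizer.zero_grad()
    #   loss = criterion(model(x_batch), y_batch)
    #   loss.backward()
    optimizer.step()                      # placeholder step for this sketch
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```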

Table 1 Dataset description
Table 2 Training and testing dataset of smoke and fire
Fig. 7. Overview of miss rate and error rate during training of the CNN model

4.3 Results of proposed model

The results for both deep learning models are presented in the following subsections.

4.3.1 Results of Adaboost-MLP model

To test the performance of our AdaBoost-MLP model, we used the held-out samples 801–900 for temperature and CO2 gas. Figure 8 (a) and (b) show the forecast results for temperature and CO2 gas, respectively. The figures show that our hybrid AdaBoost-MLP model produces good predictions of both temperature and gas concentration.

Fig. 8. Predicted results of the proposed AdaBoost-MLP model

4.3.2 Results of CNN based fire detection

We performed our experiments on two datasets: the Foggia video fire dataset [25] and the Chino smoke dataset [10]. The fire dataset has 31 videos covering both indoor and outdoor environments, of which 17 contain fire and the rest contain no fire. The second dataset concerns smoke: some of its videos contain smoke and others do not. We selected these two datasets because their videos are captured in a variety of indoor and outdoor scenes. Some of the videos contain fire-like objects or situations that resemble fire but are not actual fire, which are very difficult to classify; similarly, techniques based on motion detection may fail to classify scenes such as fire or smoke on mountains, clouds, or fog. For these reasons, the selected datasets are challenging. Some of the results on these datasets are shown in Fig. 9.

Fig. 9. CNN model results for different images

We divided our CNN model into two parts: the first maps the features, and the second performs the high-level reasoning for classification. Our main aim is to characterize fire in a video frame. Other models slide a window over the image for classification; we also use a sliding-window technique, but in a different way: instead of sliding a 64 × 64 window over the input to detect fire, we try to detect the fire window on the last feature map. The first part of our CNN model, which includes the three convolutional and three max-pooling layers, computes this last feature map. To detect smoke and fire in video, a sliding window of size 12 × 12, corresponding to a 64 × 64 RGB image region, is applied to the last feature map (Fig. 10). To speed up predictions we use the GPU, batching the windows of the last feature map into a tensor of size 12 × 12 × 1 × N (where N is the number of windows); a sketch of this windowing is given below.
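The following PyTorch sketch shows one way to collect such feature-map windows into a single batch for GPU evaluation; the channel count, feature-map size, and stride used here are toy placeholders.

```python
import torch

def sliding_windows(feature_map, window=12, stride=1):
    """Collect window x window patches of the last feature map as one batch.

    feature_map : tensor of shape (C, H, W) produced by the convolutional part
    Returns a tensor of shape (N, C, window, window), N = number of windows.
    """
    # unfold height and width into overlapping window-sized slices
    patches = feature_map.unsqueeze(0).unfold(2, window, stride).unfold(3, window, stride)
    # shape is now (1, C, nH, nW, window, window); flatten the spatial grid into a batch
    c = feature_map.shape[0]
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, c, window, window)

# Toy feature map: 1 channel, 20 x 20 spatial grid (values are placeholders).
fmap = torch.randn(1, 20, 20)
windows = sliding_windows(fmap)
print(windows.shape)   # torch.Size([81, 1, 12, 12]) for stride 1
```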

Fig. 10. Feature map of the original image

The calculated results for precision p, recall r, F_measure, accuracy, true positive rate, and false positive rate are shown in Fig. 11. These results show that our trained model reaches nearly 91% fire detection accuracy, and the false positive rate is quite low; the results could be improved further with additional training. The overall performance of both models is very efficient. In the future, this system could be extended into a smarter system that reaches down to individuals, who could contribute to detecting fires and sending notifications to the fire station [28, 29]; an efficient multimedia retrieval scheme for fire detection could also be employed at the fire station, so that fires could be detected directly from the CCTV cameras installed in the forest vicinity. Table 3 compares our proposed model with other existing methods. We also tested our model on a real-time dataset: we made 8 different videos containing smoke and fire and tested them using our model. Figure 12 shows the test results on this dataset.

Fig. 11. Graph of the calculated results for p, r, F-measure, Tp rate, and Fp rate

Table 3 Comparison of proposed technique with existing methods
Fig. 12. Real-time testing results