1 Introduction

Fire knows no discrimination and is a leading cause of catastrophic disasters around the globe; it can occur in many environments such as offices, industries, residential complexes and schools. Automated fire detection systems offer the flexibility to assess essential physical and environmental parameters and their impact on the detection and prediction of fire, either at an early stage or even prior to an outbreak. Accordingly, automatic fire detection systems have attracted considerable attention owing to their importance in reducing fire damage.

The best way to deal with disasters is to nip them in the bud. To reduce casualties, limit the consequences of fire and minimise the associated financial losses, the spread of fire must be prevented at a nascent stage. Fire detection plays a pivotal role here by triggering a timely warning signal that reports the onset of a fire event.

Most traditional building fire detection systems use off-the-shelf, single-sensor fire detection with no intelligence whatsoever. This leaves two prominent bottlenecks that call for research intervention: (a) reliable and accurate detection of fire occurrence, and (b) an early prediction and warning system that forecasts the occurrence of fire from similar pre-conditions. Accurate and timely detection is essential to reduce false positive alarms raised by the fire detection system. The detection sensors must be able to discriminate actual fire smoke from non-fire incidents, because inappropriate triggering of a fire alarm not only disrupts the production pipeline but also causes panic. At the same time, an AI-based early warning system that can predict the occurrence of fire would facilitate pre-emptive scheduling of the necessary activities, thus minimising fire-linked damage. Therefore, to overcome the drawbacks of present fire alarm systems, it is necessary to develop and implement reliable and effective fire management systems.

Technological interventions for mitigating fire outbreaks have been studied extensively. Some researchers have used computer vision to analyse fire images [1, 2]; however, this approach calls for expensive fire-grade, long-range cameras that entail considerable hardware cost. Advances in deep learning have played a prominent role in enhancing the quality of our lives over the past decade, as elaborated later, with a few contributions in fire technology as well. On the whole, AI and ML have much to offer as promising technological solutions for the fire disaster management landscape. In a pioneering work, Brian et al. [3] employed a neural network approach to detect and analyse fire signals and addressed false alarm conditions. Machine learning based classification has also been used to detect fire problems [4].

Soft computing techniques have the advantage of eliminating expensive hardware such as image acquisition devices and have played a major role in fire outbreak mitigation [5,6,7]. Soft computing has also been employed successfully to bring out the underlying relationship between causal variables and context-sensitive fire occurrences [8,9,10], and to estimate the fire-affected area under future environmental conditions [11,12,13]. Several studies have also targeted fire prediction using soft computing and learning technologies [14,15,16,17,18,19,20]. Artificial neural networks (ANN) have also been adopted for early fire detection. Harnessing ANN and logistic regression, Bisquert et al. [21] and ref [22] report good classification accuracy. Maeda et al. [23] identify areas at high risk of fire incidents in Brazil using an ANN, and the approach suggests efficient detection. Several trade-offs must be considered when implementing an ANN, such as the number of hidden layers and the number of nodes in each hidden layer [24]. A large number of training iterations may over-train the network, negatively affecting prediction accuracy [25]. The support vector machine (SVM) is another machine learning approach that has been widely used for fire detection and is reported to achieve good results and effective prediction [26,27,28,29]. An advantage of SVM is that it does not require prior determination of probabilities, which makes it preferable in such settings.

Machine learning based soft computing algorithms have also been used extensively in the recent past because of their efficient prediction capabilities. Notable examples include the random forest classifier [30,31,32], decision tree classifier [33, 34], support vector machine [35, 36], logistic regression [37], artificial neural network [38] and Naïve Bayes classifier [39] for classification and prediction tasks. Among classifier approaches, ensemble classifiers have been found to be more efficient [40] than individual classifiers, as they learn from different aspects of the training data, considering features from the entire solution space [41]. Ref [42] presents a hybrid ensemble method for improved prediction of slope stability using ensemble classifiers and individual classifier techniques. In [43], a weighted majority voting technique is used to combine the models and tenfold cross validation is used to validate the data for slope prediction analysis. A weight-based ensemble method, WhmBoost, is proposed in [44] for classifying imbalanced data in a binary classification task. That work uses two sampling methods and base classifiers, each associated with a weight factor, which yields better complementary advantages.

To reduce the effect of data imbalance, a hybrid method combining data-level approaches with changes to the learning process and to the sensitivity of the algorithm is implemented in [45]. Hybrid ensemble methods [46,47,48,49,50] favour the minority class more strongly because they can separate the majority data from the minority data effectively. Various sampling techniques can be adopted to improve classification performance. Methods that combine sampling and ensemble techniques to achieve the desired classification performance primarily include AdaBoost [51, 52], voting [53] and gradient boosting [54]. Ref [55] presented a novel ensemble learning method that can detect forest fire in different scenarios. In that work, two individual detectors, YOLOv5 and EfficientDet, are used to detect fire, and a third learner, EfficientNet, is used to reduce the false positive rate by 51.3%; experiments on the dataset show that the proposed ensemble learning method improves detection performance by 2.5% to 10.9%. In [56], an ensemble model is developed that produces exact solutions and improves feature selection compared with multiple individual models; experiments are conducted with a hybrid MultiBoostAB ensemble technique using different feature selections to determine model accuracy. Ensemble learning combines multiple learning algorithms to enhance the predictive performance of a model, and a hybrid ensemble learning method combines multiple individual classifiers to solve a particular computational intelligence problem. In the literature, several classification problems have been investigated using hybrid ensemble techniques, such as classification of imbalanced data [57], pulsar candidate classification [58], classification in medical databases [59] and multiclass classification of an oilseed disease dataset [60].
Ref [61] improved the prediction of slope stability using a hybrid ensemble technique. D. Rosadi et al. proposed forest fire prediction using an adaptive boosting ensemble classification method [62], in which decision tree and SVM individual classifiers are used and a public dataset is used to configure the model. An extreme gradient boosting hybrid ensemble learning method was developed by Ying Xie et al. to predict the burned area of forest fires using a forest fire dataset [63]; the proposed ensemble technique outperforms individual classifiers in prediction accuracy for large-scale fire occurrences. Thus, hybrid ensemble learning methods have been used in the literature for different classification problems and for forest fire prediction; however, they have not been deployed for building fire detection.

In this research, a novel machine learning based algorithm is proposed and validated on robust multi-sensor data. The contribution of this work is the design of a real-time hybrid ensemble classifier that synergistically integrates four individual classifiers, namely logistic regression, support vector machine (SVM), decision tree and Naive Bayes. The dataset is used after the necessary pre-processing. An average voting ensemble technique is used to combine the classifiers for better prediction and is seen to improve the robustness of the learning algorithm. Tenfold cross validation is used to compare the performance of the proposed machine learning algorithm under different fire scenarios. Results are quantified using model accuracy, precision, recall, the receiver operating characteristic (ROC), area under the curve (AUC), cumulative and individual importance of the parameters, and error calculation. After validation of the proposed methodology, experiments have been carried out on a laboratory test bench using a developed smart IoT sensor node prototype.

The paper is organised as follows: Sect. 2 describes the system and the architecture of the hybrid ensemble learning technique, including a brief introduction to the individual classifiers and the proposed hybrid ensemble based on average voting; Sect. 3 presents the research methodology, covering the proposed machine learning algorithm, cross validation, data collection and the experimental setup; and Sect. 4 presents the results and discussion.

2 System Description

2.1 Hybrid and Individual Ensemble Learning

Compared with a single-model learner, which holds only one hypothesis over the data, ensemble learning can consider multiple hypotheses, as seen in Figure 1.

Figure 1 Comparison between individual and hybrid learning techniques for fire detection

Ensemble learning is a class of machine learning methods that trains multiple learners, such as random forests, decision trees or other learning algorithms, and combines them into a new, better learner. Instead of selecting the single best learner, the base learners, which may be the same model trained on different data or with different parameters, are all retained. The final output of the ensemble can be obtained by voting, averaging or boosting (e.g. AdaBoost), as shown in Figure 1.

A combined model generally gives better predictions than a single model. Ensemble methods can be classified as homogeneous or heterogeneous. Homogeneous ensemble methods, such as bagging, boosting and random forest, combine multiple classifiers of the same type trained on different training data, whereas heterogeneous ensemble methods, such as voting and stacking, combine different kinds of learning algorithms trained on the training dataset to develop multiple models.

In the present study, a hybrid ensemble technique is proposed for better prediction and enhanced accuracy. The hybrid ensemble integrates four individual base learners, namely logistic regression, decision tree, support vector machine and Naive Bayes, using average voting. First, the individual classifier models are trained for prediction; the four models are then combined through the hybrid ensemble technique. The classification accuracies are compared using the confusion matrix of each model. For validation, tenfold cross validation is carried out and the accuracy of each of the four models is observed. The optimum hybrid ensemble classifier is also compared with each single classifier model and shows better prediction accuracy and a lower RMSE than the other classifier models.

2.2 Individual Classifier Model

In this research, a hybrid ensemble model has been developed from different classifier models (logistic regression, support vector machine (SVM), decision tree and Naive Bayes). The experimental data are collected from a sensor node fitted into the laboratory test bench setup shown in Figure 8. The collected sensor data are pre-processed and a dataset is prepared for model configuration. The model is trained on 80% of the dataset and the remaining 20% is kept for testing. After splitting, the model is fitted to the training data. Tenfold cross validation is used to increase the effectiveness of the model: the training dataset is divided into 10 subsets, of which 9 are used for training and the remaining one for prediction (validation). Finally, the ensemble approach is used to build a more accurate ensemble classifier by combining multiple individual classifiers.

Four individual classifier models have been trained with real-time sensor data collected through the multi-sensor node from the experimental setup. Each classifier model is briefly described below:

2.2.1 Logistic Regression

Logistic regression is a machine learning algorithm that uses the logistic (sigmoid) function and is applied to binary as well as multi-class classification problems. Logistic regression is a linear classifier: it first computes a linear function of the inputs,

$$ f(x) = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{r} x_{r} $$
(1)

where f(x) is the linear predictor (the log-odds of the positive class), x1, x2, …, xr are the explanatory variables and \(\beta_{0}, \beta_{1}, \ldots, \beta_{r}\) are the estimated regression coefficients (predicted weights). The predicted probability of the positive class is then obtained by passing f(x) through the logistic function, \(p(x) = 1/(1 + e^{-f(x)})\).
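To make Eq. (1) concrete, the sketch below fits the coefficients β by batch gradient descent on the log-loss and maps f(x) to a probability with the sigmoid. It is a minimal, dependency-free illustration under stated assumptions: the toy readings, learning rate and epoch count are hypothetical, and this is not claimed to be the solver or configuration used in this work.

```python
import math

def sigmoid(z):
    """Logistic function p = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, x):
    """Eq. (1): f(x) = beta_0 + beta_1*x_1 + ... + beta_r*x_r."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss (illustrative hyperparameters)."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, t in zip(X, y):
            err = sigmoid(linear_predictor(beta, x)) - t  # derivative of log-loss w.r.t. f(x)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, x):
    """Probability of the positive (fire) class."""
    return sigmoid(linear_predictor(beta, x))

# Hypothetical, already min-max scaled readings [temperature, smoke]; label 1 = fire
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.80, 0.90], [0.90, 0.70], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]
beta = fit_logistic(X, y)
print([round(predict_proba(beta, x), 3) for x in X])
```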

2.2.2 Support Vector Machine

The support vector machine is one of the most robust classification and regression algorithms and is used in many fields of science and engineering, including voice recognition, pattern recognition and text categorisation. In binary classification, the objective of the SVM is to find the separating hyperplane with the maximum margin, i.e. the maximum distance to the nearest training points (the support vectors). In nonlinear problems, a kernel function implicitly maps the data to a higher-dimensional feature space in which a separating hyperplane is found; this hyperplane corresponds to a nonlinear decision boundary in the input space.
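The maximum-margin objective can be illustrated with a linear-kernel SVM trained by sub-gradient descent on the L2-regularised hinge loss. This sketch swaps in the Pegasos algorithm with deterministic cycling through the samples (rather than random sampling) purely for illustration; the paper does not state its SVM solver, kernel or regularisation value, so `lam`, `epochs` and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Pegasos-style sub-gradient descent on (lam/2)*||w||^2 + mean hinge loss.

    Labels must be -1/+1. A constant 1.0 feature is appended so a bias is learned;
    in this simple variant the bias is regularised along with the weights.
    """
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        for x, t in data:  # deterministic cycling instead of random sampling
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos step size 1 / (lambda * t)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the regulariser
            if margin < 1.0:  # hinge loss active: inside the margin or misclassified
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w

def decision_function(w, x):
    """Signed score w . [x, 1]; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def predict(w, x):
    return 1 if decision_function(w, x) >= 0 else -1

# Hypothetical scaled readings [temperature, smoke]; +1 = fire, -1 = non-fire
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.80, 0.90], [0.90, 0.70], [0.85, 0.95]]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

Fire labels coded 0/1 would be mapped to -1/+1 before training. A nonlinear (e.g. RBF) kernel would replace the dot product with a kernel evaluation and is normally solved in the dual; that is outside this sketch.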

2.2.3 Decision Tree Classifier

The decision tree is a machine learning technique that can replace conventional statistical procedures for partitioning data and extracting decision rules. Different decision tree algorithms have been used for their accuracy and cost-effectiveness. A decision tree is a flowchart-like tree structure consisting of a root node, internal nodes, branches and leaf nodes. Each internal node represents a test on a feature (attribute), each branch represents an outcome of the test (a decision rule), and each leaf node denotes a class label. The topmost node of the tree is the root node, as seen in Figure 2.

Figure 2 Basic algorithm structure of the decision tree classifier
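The paper does not name its split criterion, so the sketch below assumes Gini impurity (the CART criterion) and shows how a single node chooses its test by exhaustive threshold search. Growing a full tree would repeat this recursively on each branch until a stopping rule is met. The toy readings are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best test x[f] <= thr.

    Candidate thresholds are midpoints between consecutive distinct feature values.
    """
    best = (None, None, gini(y))  # no split: impurity of the parent node
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best

# Hypothetical samples [temperature in deg C, smoke level]; 1 = fire
X = [[20.0, 0.10], [22.0, 0.20], [25.0, 0.15], [60.0, 0.90], [70.0, 0.80], [24.0, 0.85]]
y = [0, 0, 0, 1, 1, 1]
print(best_split(X, y))
```

Here temperature alone cannot separate the classes (one fire sample reads 24 deg C), so the root test chosen is on the smoke feature.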

2.2.4 Naive Bayes Model

Naïve Bayes classification is widely used for classification, including multi-label learning problems. A naïve Bayes classifier is a simple Bayesian network in which a single class variable C is the parent of n attribute variables X1, …, Xn, which are assumed to be conditionally independent given C. Here c denotes a class label (a value of C) and xi a value of attribute Xi. The naïve Bayes joint distribution is then given by Equation 2:

$$ P_{r}(c, x_{1}, \ldots, x_{n}) = P_{r}(c) \prod_{i=1}^{n} P_{r}(x_{i} \mid c) $$
(2)

where \(P_{r}(c)\) is the class prior and \(P_{r}(x_{i} \mid c)\) is the class-conditional distribution of attribute Xi.
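Eq. (2) leaves the form of the class-conditional distribution open. Because the confusion-matrix figure of this paper refers to a Gaussian NB classifier, the sketch below assumes each attribute is normally distributed within each class, and predicts the class that maximises log P_r(c) + Σ log P_r(x_i | c). The variance floor `eps` and the toy data are illustrative assumptions.

```python
import math

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior P(c) and per-attribute Gaussian mean/variance for each class."""
    model = {}
    n = len(y)
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + eps  # eps avoids zero variance
            for col, m in zip(cols, means)
        ]
        model[c] = (len(rows) / n, means, variances)
    return model

def log_joint(model, x, c):
    """log P(c) + sum_i log P(x_i | c): the logarithm of Eq. (2)."""
    prior, means, variances = model[c]
    total = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total

def predict(model, x):
    return max(model, key=lambda c: log_joint(model, x, c))

def predict_proba(model, x):
    """Posterior P(c | x), normalising the joint over classes with log-sum-exp."""
    logs = {c: log_joint(model, x, c) for c in model}
    top = max(logs.values())
    z = sum(math.exp(v - top) for v in logs.values())
    return {c: math.exp(v - top) / z for c, v in logs.items()}

# Hypothetical scaled readings [temperature, smoke]; 1 = fire
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.80, 0.90], [0.90, 0.70], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.12, 0.18]), predict(model, [0.88, 0.85]))
```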

2.3 Hybrid Ensemble Classifier

In the literature, most ensemble methods are built from a single type of base estimator or a single sampling method; mixing several base estimators and sampling methods can give the system better performance.

The main objective of the hybrid ensemble approach is to develop a more accurate ensemble classifier by combining multiple individual classifiers. An ensemble classifier combines the predictions of several base estimators to improve robustness over any individual estimator. It is not certain that a hybrid ensemble classifier will always outperform every individual classifier; however, its accuracy is typically better than the average accuracy of the single classifiers. Many methods for building hybrid ensemble classifiers are available in the literature; the most widely used and computationally inexpensive are majority voting and average voting.

In this research, the average voting method is implemented in real time: several base estimators are built independently and their predictions are averaged. The combined estimator is seen to perform better than any of the single base estimators. A general architecture of the hybrid average voting classifier is shown in Figure 3: the input dataset is pre-processed and passed to the intermediate base estimators, and the logistic regression, support vector machine, decision tree and naive Bayes models are combined using average voting. All combined classifiers follow the probability rule of average voting: each individual classifier forms its own hypothesis (H1, H2, H3, H4) and generates a probability for every output class; these probabilities are averaged, and the class with the highest average probability is selected as the final prediction of the hybrid ensemble, as shown in Figure 3.

Figure 3 Proposed hybrid ensemble classifier model
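The averaging rule can be written in a few lines. The sketch assumes each base classifier already outputs a probability for every class; the four probability vectors are made-up numbers standing in for H1 to H4, not outputs of the trained models. The optional `weights` argument is an assumption added for generality; with equal weights it reduces to plain average voting.

```python
def average_vote(prob_vectors, weights=None):
    """Average (soft) voting: mean class-probability vector and the argmax class index."""
    k = len(prob_vectors)
    if weights is None:
        weights = [1.0] * k  # equal weights: plain average voting
    total_w = sum(weights)
    n_classes = len(prob_vectors[0])
    avg = [
        sum(w * p[j] for w, p in zip(weights, prob_vectors)) / total_w
        for j in range(n_classes)
    ]
    return avg, max(range(n_classes), key=lambda j: avg[j])

# Hypothetical [P(non-fire), P(fire)] from H1..H4 for one sensor sample
h1 = [0.30, 0.70]  # logistic regression
h2 = [0.45, 0.55]  # SVM (with probability estimates)
h3 = [0.00, 1.00]  # decision tree
h4 = [0.60, 0.40]  # naive Bayes
avg, label = average_vote([h1, h2, h3, h4])
print(avg, label)
```

Note that an SVM does not natively output probabilities; in practice its scores must be calibrated (e.g. Platt scaling) before they can be averaged with the other models.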

3 Research Methodology

This section describes how the individual classifiers and the hybrid ensemble technique are used for fire prediction. The research methodology consists of three parts: dataset preparation, design of the novel machine learning algorithm, and cross validation on the dataset.

3.1 Proposed Machine Learning Algorithm for Fire Detection

Hybrid ensemble learning techniques have attracted growing interest in predictive modelling; they combine various learning classifiers to improve prediction accuracy over a single classifier model [64]. A voting technique combines the results of multiple classifier models; in [65], the weights are determined by a gating network that receives the same input as the base models and returns a weight for each base model. Two voting techniques are mainly used: majority voting and average voting. In majority (hard) voting, the class receiving the majority (more than 50%) of the votes is selected as the final prediction, whereas in average (soft) voting, the outputs of the individual classifiers are averaged before the final decision is made. In this work, average voting is used to combine the classifiers; a general architecture of the hybrid average voting classifier is shown in Figure 4.

Figure 4 General architecture of the hybrid average voting classifier
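The majority (hard) and average (soft) voting rules described above can give different answers for the same sample. In the made-up example below, three of four classifiers lean slightly towards non-fire while one is very confident of fire: hard voting counts three votes for non-fire, whereas soft voting averages the probabilities and selects fire. The tie-breaking rule in `majority_vote` is an arbitrary assumption.

```python
from collections import Counter

def majority_vote(prob_vectors):
    """Hard voting: each classifier votes for its most probable class; most votes wins.

    Ties are broken in favour of the lowest class index (an arbitrary choice).
    """
    votes = [max(range(len(p)), key=lambda j: p[j]) for p in prob_vectors]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

def average_vote(prob_vectors):
    """Soft voting: average the class probabilities, then take the most probable class."""
    n_classes = len(prob_vectors[0])
    avg = [sum(p[j] for p in prob_vectors) / len(prob_vectors) for j in range(n_classes)]
    return max(range(n_classes), key=lambda j: avg[j])

# Hypothetical [P(non-fire), P(fire)] from four classifiers for one sample
probs = [[0.55, 0.45], [0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
print(majority_vote(probs), average_vote(probs))
```

Soft voting lets a confident classifier outweigh several uncertain ones, which is why it relies on the base models producing reasonably calibrated probabilities.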

A correlation coefficient measures the strength of the relationship between two input variables. Several kinds of correlation coefficient exist; here Pearson’s coefficient, denoted by \({\uprho }\), is used owing to its advantages, chiefly that it directly quantifies the strength and direction of a linear relationship.

Pearson’s coefficient is defined as the covariance between two input variables divided by the product of their standard deviations:

$$ \rho (X,Y) = \frac{COV(X,Y)}{{\sigma_{X} \sigma_{Y} }} $$
(3)
$$ \rho (X,Y) = \frac{{E[(X - \mu_{X} )(Y - \mu_{Y} )]}}{{\sigma_{X} \sigma_{Y} }} $$
(4)

where \(\mu_{X}, \mu_{Y}\) are the means of X and Y, \(\sigma_{X}, \sigma_{Y}\) are their standard deviations, and E[·] denotes the expectation.
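Eq. (4) translates directly into code. The sketch below uses population (1/n) moments; the 1/n factors cancel in the ratio, so sample (1/(n−1)) moments give the same ρ. The CO2 and O2 series are hypothetical readings, not values from the NIST dataset.

```python
import math

def pearson(x, y):
    """Pearson's rho, Eq. (4): E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y).

    Undefined (division by zero) if either series is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict mapping column name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical readings: CO2 (ppm) rising while O2 (%) falls during a fire
co2 = [400.0, 450.0, 520.0, 610.0, 700.0]
o2 = [20.9, 20.8, 20.6, 20.3, 20.0]
m = correlation_matrix({"co2": co2, "o2": o2})
print(round(m[("co2", "o2")], 3))
```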

A correlation matrix has been obtained to visualise the relationship between the sensor input data and the labelled output data. Figure 5 indicates that the variables of the dataset are not symmetrically distributed. Variables are normalised to the range [0, 1] using their minimum and maximum values to improve the computational efficiency of the classifier. Correlation values range from −1 to 1, corresponding to maximum negative and maximum positive correlation, respectively. In this work, the correlation coefficients among the sensor inputs range from −0.85 to 0.36. The CO2 and O2 sensor outputs are strongly correlated with each other, as shown in Figure 5.

Figure 5 Correlation matrix of input and output variables of fire detection in the dataset
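The normalisation to [0, 1] described above is min-max scaling. The sketch below implements it and checks a useful property: because min-max scaling is a positive affine transformation, it leaves Pearson’s ρ unchanged, so the correlation matrix is the same whether it is computed before or after scaling. A compact Pearson helper is repeated so the block runs on its own; the readings are hypothetical.

```python
import math

def min_max_scale(column):
    """Rescale values to [0, 1] via (v - min) / (max - min); a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson's rho (the 1/n factors cancel, so they are omitted)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical raw readings: temperature in deg C and CO2 in ppm
temp = [21.0, 23.5, 30.0, 48.0, 75.0]
co2 = [420.0, 455.0, 610.0, 980.0, 1500.0]
temp_s, co2_s = min_max_scale(temp), min_max_scale(co2)
print(min(temp_s), max(temp_s))
print(abs(pearson(temp, co2) - pearson(temp_s, co2_s)) < 1e-9)
```

In a train/test setting, the minimum and maximum should be taken from the training data only and then applied to the test data, so that no test information leaks into the model.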

3.2 Cross‐Validation and Performance Measures

The K-fold cross validation technique, which has been used in [66], is applied here to reduce the bias resulting from the random selection of training and hold-out data samples. In this paper, tenfold cross validation is used: the training dataset is divided into 10 subsets, of which 9 are used for training and the remaining one for prediction. The training and prediction process is repeated 10 times, with a different subset used as the prediction set each time. Finally, prediction performance is evaluated by averaging the performance over the ten iterations. Performance is measured using model accuracy, the ROC curve and AUC. Prediction performance is also portrayed by the confusion matrix shown in Figure 10, and the tenfold cross validation is illustrated in Figure 6.

Figure 6 Tenfold cross validation of the obtained dataset
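The 80/20 hold-out split and the tenfold procedure can be sketched with index bookkeeping alone. In this sketch the fixed `seed`, the strided fold assignment and the caller-supplied `evaluate(train, val)` function are assumptions for illustration; the folds are not stratified by class, which may matter when fire samples are rare.

```python
import random

def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for final testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold is used exactly once for validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def cross_validate(indices, evaluate, k=10):
    """Average the score returned by `evaluate(train, val)` over the k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(indices, k)]
    return sum(scores) / len(scores)

# Usage: 100 hypothetical samples, 80 kept for tenfold cross validation
train_idx, test_idx = train_test_split(100)
print(len(train_idx), len(test_idx))
print(cross_validate(train_idx, lambda tr, va: len(va) / (len(tr) + len(va))))
```

In practice `evaluate` would fit a classifier on the `train` indices and return its accuracy on the `val` indices; the 20% test set is touched only once, after model selection.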

3.3 Dataset Preparation

The fire data from the NIST website “https://www.nist.gov/el/nist-report-test-fr-4016” have been used to evaluate the performance of the proposed model. The proposed hybrid ensemble machine learning model is validated on five fire scenarios from this dataset, conducted in a mock-up of a small house or apartment: two smoldering fire tests (SDC1, SDC3), two flaming fire tests (SDC5, SDC15) and one cooking oil fire test (SDC12). Concentrations of CO, CO2 and O2, as well as smoke and temperature, were measured at multiple positions within the structure. Details of the dataset are available on the referenced website.

Table 1 reports the precision, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of each machine learning model under different fire scenarios. Table 4 compares the performance of the proposed work with similar reported research on fire detection systems. The results indicate improved performance of the proposed model, with good performance across the different fire scenarios. In the dataset, the fire label column is labelled “0” for non-fire cases and “1” for fire cases under the different test conditions.

Table 1 Proposed Hybrid Machine Learning Models Under Different Fire Scenarios (Individual Dataset) and a Mixed Fire Scenario (Merged Dataset) in Terms of Precision, MAE and RMSE
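With labels coded 0 (non-fire) and 1 (fire), MAE and RMSE computed on hard predictions reduce to simple functions of the misclassification rate: every error contributes |1 − 0| = 1, so MAE equals the error rate and RMSE equals its square root. The sketch assumes MAE and RMSE are computed on hard 0/1 predictions; if they are computed on predicted probabilities this identity does not hold. The labels and predictions are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical labels (0 = non-fire, 1 = fire) and hard predictions with 2 errors in 10
y_true = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
error_rate = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(mae(y_true, y_pred), rmse(y_true, y_pred), error_rate)
```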

The proposed algorithm has also been validated experimentally using the developed sensor node in a laboratory prototype, as shown in Figure 8. Different gas sensors (MQ-3, MQ-135, MQ-2) are used in the sensor node of the experimental setup to detect fire. The MQ-3 gas sensor is highly sensitive to alcohol, and the MQ-135 gas sensor is used to detect NH3, NOx, alcohol, benzene, smoke, CO2, etc. The MQ-2 is a semiconductor sensor for combustible gases with high sensitivity to H2, LPG, propane and CO. The experimental fire data, i.e. the temperature and gas concentration profiles, are shown in Figure 7.

Figure 7 Experimental gas concentration results from the gas sensors

The smart fire sensor node comprises:

  • Embedded controller board: Microchip ATmega328P single-board microcontroller (16 MHz clock speed, 32 KB in-system programmable flash) with a cloud connectivity chip.

  • Sensors: different gas sensors (MQ-135, MQ-2, MQ-3) are used to sense quantities pertaining to a fire outbreak, e.g. smoke, CO2, CO and O2. The performance of a gas sensor may be improved by adjusting its load resistance value.

  • Temperature and humidity sensor: the DHT11 is an embedded humidity and temperature sensor that provides a digital output over a single-wire serial interface; temperature and humidity provide useful fire-related precursor information.

  • Buzzer: an actionable, downlink-based indicator of a fire event. Its functionality is proposed to be extended by interfacing it with a relay to trigger countermeasures, such as switching on a pump, when the cloud-based analytics engine confirms the presence of fire through the downlink.
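The remark that gas-sensor readings may be improved by adjusting the load resistance can be illustrated with the usual voltage-divider relation for MQ-series modules: the sensing resistance R_S is in series with a load resistor R_L, and the node reads the voltage across R_L, so V_out = V_C·R_L/(R_S + R_L) and R_S = R_L(V_C − V_out)/V_out. The 5 V supply and reference, the 10-bit ADC (the resolution of the ATmega328P converter) and the resistor values are assumptions about the hardware; converting R_S to a gas concentration additionally requires the sensor-specific calibration curve from its datasheet, which is not reproduced here.

```python
def adc_to_voltage(adc, vref=5.0, bits=10):
    """ATmega328P-style conversion: ADC = Vin * 2**bits / Vref, so Vin = ADC * Vref / 2**bits."""
    return adc * vref / (2 ** bits)

def sensor_resistance(v_out, r_load=10_000.0, v_supply=5.0):
    """Invert the divider V_out = V_supply * R_L / (R_S + R_L) to get R_S in ohms."""
    if v_out <= 0:
        raise ValueError("output voltage must be positive")
    return r_load * (v_supply - v_out) / v_out

def divider_output(r_sensor, r_load, v_supply=5.0):
    """Voltage across the load resistor for a given sensing resistance."""
    return v_supply * r_load / (r_sensor + r_load)

# The same hypothetical 20 kOhm sensing resistance read through two load resistors
for r_l in (1_000.0, 20_000.0):
    v = divider_output(20_000.0, r_l)
    print(r_l, round(v, 3), round(sensor_resistance(v, r_l), 1))
```

The sensitivity of V_out to changes in R_S is greatest when R_L equals R_S, so choosing a load resistance close to the sensing resistance expected in the operating range places the reading mid-scale, where the ADC resolves changes best.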

A laboratory-scale test bed has been fabricated for experimentation under different fire conditions. The test bed is largely automated and comprises two chambers: one for electrical fires (right side) and the other for gas-linked fires (left side). The right chamber simulates a common electrical fire and is powered on through a control switch that energises and ignites an electric coil through a step-down transformer. An electric blower is also attached to bring down the flame. The left chamber is designed for experiments on fires caused by the presence of various inflammable gases. Sensing these conditions before and after a fire reveals the presence of crucial gases and physical environmental factors, which are indispensable for gaining valuable insights for effective fire management. The setup can also extinguish the fire through piped CO2 release after the experiment is over. The developed smart, wireless multi-sensor nodes can be placed in either chamber (in the left chamber in Figure 8); they automatically detect the presence of flame, gas and fire conditions and transmit the data to the base station for onward transmission to the cloud.

Figure 8 (a) Experimental laboratory test bed setup for the fire detection system; (b) sensor node with the different sensors attached

3.4 Internet of Things (IoT) Framework Used in the Study

A real-time fire detection framework is essential to save lives and prevent catastrophic disasters. After the physical parameters related to a fire event are sensed, an IoT system is essential not only for proper detection of the fire using advanced cloud-based machine learning algorithms, but also for taking pre-emptive and timely countermeasures to mitigate the disaster. After the proposed algorithm was validated on the NIST dataset, a small experiment was carried out on the laboratory test bench using the developed smart sensor node. Real-time data from the wireless smart nodes are sent over the IoT uplink to the cloud platform for further data analytics, and automated alarms are sent to the beneficiaries. This entire system flow is part of the smart fire detection setup proposed in this research. The results establish an end-to-end working prototype of an intelligent fire detection framework using the IoT chain. The IoT-enabled architecture implemented for real-time management of fire situations comprises four major components, namely sensors, networking, cloud and application server; the respective communication protocol layers are shown in Figure 9.

  (a) Sensors: devices that detect the presence of fire-related physical elements in the environment. Multiple parameters pertaining to a fire outbreak can be captured through such sensors. Fire and smoke sensors provide valuable insights into the intensity of a fire. Gas sensors such as CO, CO2 and O2 sensors help develop the intelligent AI-based framework by providing valuable information about pre-conditions and fire-associated parameters, which may be useful in forecasting an outbreak.

  (b) Networking: several networking and communication technologies can be used to transmit the acquired sensor data to the cloud. Commonly used ones are cellular (2G, 3G, 4G, etc.), radio frequency (LoRa, Zigbee, etc.) and inexpensive WiFi. These technologies differ in transmission range, data latency, power consumption, battery life, etc., and the choice depends on the specific requirements and local conditions. The primary networking components are data loggers, repeaters or gateways, depending on the coverage of the local network and the coverage required.

  (c) Cloud: data received from the sensors through the internet are stored on a cloud framework for future use. The cloud platform, whether public or private, can host multiple applications, enabling sensor device management, configuration and routing.

  (d) Application server: the last stage of the IoT chain, focusing on advanced analytics for the fire management application. Data visualisations and customised dashboards offer insights through diverse use cases, facilitating predictive management and fire projections.

Figure 9 Illustrative IoT system floor plan and communication layer protocols for fire management
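The sensor-to-cloud-to-buzzer loop described above can be sketched as a pair of JSON messages. Everything in this sketch is hypothetical: the message fields, the node identifier and the `placeholder_classify` rule, which merely stands in for the trained hybrid ensemble that would run in the cloud. No specific IoT platform, transport or topic layout is implied.

```python
import json
import time

def build_uplink(node_id, readings, timestamp=None):
    """Serialise one sensor-node sample as a JSON uplink message (hypothetical schema)."""
    return json.dumps({
        "node_id": node_id,
        "ts": timestamp if timestamp is not None else int(time.time()),
        "readings": readings,
    })

def handle_uplink(message, classify):
    """Cloud side: decode the uplink, classify the sample, and build a downlink command."""
    sample = json.loads(message)
    fire = classify(sample["readings"]) == 1
    return json.dumps({
        "node_id": sample["node_id"],
        "buzzer": "on" if fire else "off",  # actionable downlink; could also drive a relay
    })

def placeholder_classify(readings):
    """Stand-in for the trained ensemble: NOT the proposed model, just a fixed toy rule."""
    return 1 if readings["smoke"] > 0.5 and readings["temperature"] > 0.5 else 0

msg = build_uplink("node-01", {"temperature": 0.82, "smoke": 0.91, "co2": 0.77}, timestamp=0)
print(handle_uplink(msg, placeholder_classify))
```

Passing the classifier in as a function keeps the message handling independent of the model, so the placeholder can be swapped for the deployed ensemble without changing the uplink or downlink format.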

4 Results and Discussion

4.1 Comparison of Confusion Matrix for Prediction

Prediction performance is displayed by the confusion matrix plot, which compares the predicted class with the actual class. The numbers of correctly predicted positive and negative samples are denoted \({\text{T}}_{{\text{P}}}\) and \({\text{T}}_{{\text{N}}}\), respectively, and the numbers of incorrect predictions (false positives and false negatives) are denoted \({\text{F}}_{{\text{P}}}\) and \({\text{F}}_{{\text{N}}}\), as shown in Table 2. The confusion matrix plot in Figure 10 shows that the hybrid ensemble model classifies the fire test data with a low error rate.

Table 2 Confusion Matrices for Prediction Analysis
Figure 10 (a) Confusion matrix plots for the individual classifiers (Logistic, Decision tree, SVM and Gaussian NB); (b) prediction matrix for the hybrid ensemble classifier for the fire detection system

The hybrid ensemble model is well suited to predicting the majority class but may fail to predict minority classes, which is challenging in real-time applications. Like most machine learning models, the proposed model has a non-zero misclassification rate, although the rate is low. The reasons are as follows. (a) High bias: when the model underfits the training dataset, it does not capture an accurate relationship between the input and predicted variables.

(b) High variance: when the proposed hybrid ensemble algorithm fits the training dataset too closely, it may not give comparable results on new sensor data, thus sacrificing accuracy. High bias can be addressed by increasing the number of features in the dataset, while high variance can be addressed by reducing the sensitivity of the model, e.g. by reducing the number of features. An optimal balance of features has been chosen after careful evaluation of the correlation matrix on a sizeable dataset.

Based on the confusion matrix, accuracy is defined as

$$ {\text{Accuracy}} = \frac{{{\text{T}}_{{\text{P}}} {\text{ + T}}_{{\text{N}}} }}{{{\text{T}}_{{\text{P}}} {\text{ + T}}_{{\text{N}}} {\text{ + F}}_{{\text{P}}} {\text{ + F}}_{{\text{N}}} }}, $$

Where,

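\({\text{T}}_{{\text{P}}}\) (true positives) = positive samples correctly classified as positive, \({\text{T}}_{{\text{N}}}\) (true negatives) = negative samples correctly classified as negative, \({\text{F}}_{{\text{P}}}\) (false positives) = negative samples incorrectly classified as positive, \({\text{F}}_{{\text{N}}}\) (false negatives) = positive samples incorrectly classified as negative.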
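The accuracy formula translates directly into code. The counts below are hypothetical and only make the arithmetic concrete: (90 + 85) / (90 + 85 + 15 + 10) = 175 / 200 = 0.875.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for illustration only.
print(accuracy(tp=90, tn=85, fp=15, fn=10))  # 0.875
```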

Precision, recall and the false positive rate (FPR) are defined as

$$ {\text{Precision}} = \frac{{{\text{T}}_{{\text{P}}} }}{{{\text{T}}_{{\text{P}}} {\text{ + F}}_{{\text{P}}} }},\quad {\text{Recall = TPR = Sensitivity}} = \frac{{{\text{T}}_{{\text{P}}} }}{{{\text{T}}_{{\text{P}}} {\text{ + F}}_{{\text{N}}} }},\quad {\text{FPR}} = \frac{{{\text{F}}_{{\text{P}}} }}{{{\text{F}}_{{\text{P}}} {\text{ + T}}_{{\text{N}}} }} $$

Here, TPR (true positive rate) is the fraction of actual positives correctly predicted as positive, and FPR is the fraction of actual negatives incorrectly predicted as positive.
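A minimal sketch of the three rates, using hypothetical counts (TP = 90, TN = 85, FP = 15, FN = 10). Returning 0.0 when a denominator is zero is a convention assumed here, not something the text specifies.

```python
from typing import Dict


def rates(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall (TPR, sensitivity) and FPR from binary confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # TPR
        "fpr": fp / (fp + tn) if fp + tn else 0.0,     # share of non-fire cases flagged as fire
    }


# Hypothetical counts: precision = 90/105 ≈ 0.857, recall = 90/100 = 0.9, FPR = 15/100 = 0.15
print(rates(tp=90, tn=85, fp=15, fn=10))
```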

The confusion matrix for the hybrid ensemble method is shown in Figure 10, with TP = 777, FN = 267, TN = 18 and FP = 28; the resulting accuracy of the model is tabulated in Table 2.

4.2 Comparison of the ROC curves and AUC Score

ROC analysis is a visual and numerical method for assessing how well a classification algorithm distinguishes between the given classes. Figure 11 shows the ROC curves of the individual learning models and of the hybrid ensemble classifier with average voting. A classifier whose ROC curve lies above another's performs better, and an AUC value closer to 1 indicates better overall performance. The AUC value of the voting ensemble classifier is very close to 1, indicating better fire outbreak prediction than the individual classifiers.
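To show how an ROC curve and its AUC are obtained, the sketch below sweeps the decision threshold down through the predicted fire scores, records an (FPR, TPR) point after each distinct score, and integrates the curve with the trapezoidal rule. The labels and scores are hypothetical; in practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would typically be used instead.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    # Highest score first; tied scores are added in a single step.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = fire) and predicted fire probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
pts = roc_points(y_true, scores)
print(f"AUC = {auc(pts):.3f}")  # AUC = 0.889
```

The result, 8/9, equals the fraction of (fire, non-fire) pairs in which the fire sample receives the higher score, which is the usual probabilistic reading of AUC.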

Figure 11

ROC plots of the individual classifier models and the voting-based ensemble model with AUC scores

4.3 Cumulative and Individual Importance of the Parameters of the Hybrid Ensemble

Feature importance assigns each feature a score according to how useful it is in predicting the target variable; selecting informative features improves the efficiency and effectiveness of prediction. Figure 12 shows the individual and cumulative importance of the sensor features for the hybrid ensemble classifier model. The rising cumulative curve also shows the relative weight of each factor contributing to fire detection.
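Individual and cumulative importance curves of this kind can be derived from raw importance scores by normalizing them, sorting in descending order and taking a running sum. The feature names and scores below are hypothetical, and how the raw scores are computed for the ensemble is left open in this sketch.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def cumulative_importance(importances: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (feature, normalized importance, cumulative importance), most important first."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    shares = [value / total for _, value in ranked]
    return [
        (name, share, cum)
        for (name, _), share, cum in zip(ranked, shares, accumulate(shares))
    ]


# Hypothetical raw importance scores; names and values are illustrative only.
scores = {"smoke": 4.0, "temperature": 3.0, "co": 2.0, "humidity": 1.0}
for name, share, cum in cumulative_importance(scores):
    print(f"{name:12s} {share:.2f} {cum:.2f}")
```

In this toy case the top three features account for 90% of the total importance, which is the kind of cut-off a cumulative curve makes easy to read off.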

Figure 12

Feature importance: individual and cumulative plots for the hybrid ensemble model

Four individual classifier models were trained, and the scores of both the individual models and the hybrid ensemble model are recorded in the performance evaluation in Table 3. In the hybrid ensemble model, each of the four individual machine learning models is generated multiple times, giving a combined pool of 20 weak learners. These weak learners are then combined with an average voting classifier, in which the class most strongly supported by the weak learners becomes the final prediction of the hybrid ensemble model. The proposed hybrid ensemble model achieves better accuracy, AUC and precision than the individual models, with lower MAE and RMSE. Table 4 compares the precision of the proposed work with that of similar fire detection studies; the proposed model performs better than the existing approaches.
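The text describes combining the weak learners by average voting. The sketch below contrasts two common variants, assuming each weak learner outputs class probabilities: soft voting averages the probabilities and picks the class with the highest mean, while hard voting takes the majority of the predicted labels. The three probability vectors are hypothetical (the ensemble described above uses 20 weak learners). scikit-learn's `VotingClassifier` exposes the same choice through `voting="soft"` or `voting="hard"`.

```python
from collections import Counter
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Average voting: mean probability per class over all learners, then argmax."""
    n_classes = len(probas[0])
    means: List[float] = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: means[c])


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Majority voting: each learner votes for its most probable class."""
    votes = Counter(max(range(len(p)), key=lambda c: p[c]) for p in probas)
    # Counter.most_common orders equal counts by first occurrence.
    return votes.most_common(1)[0][0]


# Hypothetical [P(no fire), P(fire)] outputs of three weak learners for one sample.
learner_probas = [
    [0.45, 0.55],
    [0.45, 0.55],
    [0.95, 0.05],
]
print("soft vote:", soft_vote(learner_probas))  # 0 (mean P(fire) ≈ 0.383)
print("hard vote:", hard_vote(learner_probas))  # 1 (two of three learners say fire)
```

Here the two rules disagree: two learners lean weakly towards fire while one is confident there is no fire, so averaging favours class 0 and the majority vote favours class 1. Soft voting lets confident learners carry more weight, which is the usual motivation for averaging.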

Table 3 Performance Comparison Table for Hybrid Ensemble Classifier Model with Other Single Classifier Model
Table 4 Performance comparison of the proposed work with similar existing work

5 Conclusion

This manuscript proposes a hybrid ensemble model based on the average voting technique for fire detection from real-time multi-sensor data. Four individual classifiers, namely logistic regression, support vector machine (SVM), decision tree and naive Bayes, were used, each of which performs satisfactorily for fire detection. The proposed machine learning algorithm was validated on five different fire scenarios drawn from the NIST dataset. The proposed ensemble classifier outperforms both its constituent classifiers and approaches reported in the literature, with improved accuracy, AUC and precision and reduced mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE). A smart multi-sensor fire detection device was also developed that efficiently detects the presence of fire and wirelessly transmits the sensor data to a cloud platform for further data analytics.