Introduction

Conventional fire detectors, such as temperature-sensing and smoke-sensing fire detectors, monitor the temperature or the concentration of smoke particles in ambient air. The installation location of a detector is chosen to avoid other particles, such as excessive dust and water vapor caused by high ambient humidity. These detectors are also susceptible to light interference and electromagnetic interference, which cause false negatives and false positives. Image-based flame detection (IBFD) technology addresses these shortcomings. Image-based flame detectors automatically detect fire through high-definition cameras, feed back information accurately, and issue early alarms; they can also detect fires at an early or even ultra-early stage and are more adaptable to the detection environment. IBFD algorithms are at the core of IBFD technology and directly determine the basic performance metrics of image-based flame detectors, such as sensitivity, accuracy, false alarm rate, and alarm duration.

Several studies have focused on the optimization of IBFD. Zhang et al. [1] proposed a method for image-based flame recognition and detection that simulates the human visual system; this method can obtain target information at high detection speeds. The effectiveness of the method was verified experimentally, and the results indicated that the algorithm based on the vision mechanism can considerably improve recognition accuracy. Celik et al. [2] proposed a flame detection algorithm that addressed problems of existing video IBFD algorithms, such as severe loss of foreground information, a high false alarm rate, and weak generalizability. The algorithm used the YCbCr color space to effectively separate image luminance from chrominance. Its performance was tested using two sets of images, and a detection rate of 99% was achieved. Tan et al. [3] first adopted the concept of the circularity of fire to identify flames, used brightness values to binarize images, and then used an algorithm to extract the fire area, thereby achieving highly efficient detection. Zhang et al. [4] and Huang et al. [5] proposed the use of convolutional neural networks based on spatiotemporal saliency features for video-based fire smoke detection. Schröder et al. [6] proposed a knock detection algorithm and tested it in different environments. The test results indicated that the algorithm could accurately detect and distinguish between fire and deflagration. Li et al. [7] and Yu et al. [8] proposed a method for identifying suspected smoke regions by combining two-step image segmentation with moving object detection and analyzed a video-based smoke detection algorithm based on the multi-feature fusion of smoke. Marbach et al. [9] proposed an image processing technique for automatic fire detection. Experiments have revealed that this technique is more suitable for special environments, such as those with other dynamic objects and with high wind speed, than for general environments. In the technique, dynamic information is extracted from separate images, a velocity gradient image is constructed, and time is divided equally; the results are then analyzed. Rosas-Romero [10] and Hu et al. [11] developed techniques for the post-disaster assessment of forest fires based on image segmentation, area calculation, and loss calculation methods. Mohajane et al. [12] developed and trained five hybrid machine learning algorithms to detect forest fires and evaluated the effectiveness of the models. According to the training results, the frequency ratio logistic regression algorithm had the best forest fire prediction performance and can be used for accurate forest fire detection. Wang et al. [13] proposed an image processing–based gas detection algorithm to identify anomalies in flames. Xu et al. [14] developed a geostationary algorithm to detect fire thermal anomalies; the algorithm can be used to detect active fire pixels and considerably increase the capture rate of detected targets in video images.

National standards for IBFD algorithms are currently lacking. The existing standards for fire alarms are not completely applicable to IBFD, and their threshold is low, which makes it easy for image-based flame detectors to enter the market. As a result, the performance of these detectors is highly variable, and their application scenarios are limited. Standard methods for evaluating IBFD algorithms are also lacking. The performance evaluation of image-based flame detectors relies on conventional methods, which are primarily based on the anti-interference ability of the detectors against external light sources and on the response accuracy and sensitivity observed in experimental fire tests. Fire image data, however, contain considerably more information and are used in detection algorithms to help identify complex information and disturbing images. Image-based flame detector standards rarely account for the characteristics of image data. The conventional evaluation methods cannot distinguish the advantages and disadvantages of the algorithms and are thus not suited to evaluating IBFD algorithms.

Li et al. [15, 16] analyzed convolutional neural network–based and image complexity–based fire detection algorithms. Wei et al. [17] adopted a risk evaluation method based on a fuzzy algorithm for relay protection and designed a simulation experiment. This method was more effective in improving the probability of risk assessment results than the traditional risk assessment method. Zhao et al. [18], Liu et al. [19], and Singh et al. [20] proposed video image–based fire risk assessment methods based on the random forest algorithm. First, an assessment model was established on the basis of the random forest algorithm and fire characteristic data categories, and the fire risk was numerically assessed. The risk level was then determined according to the actual risk situation. Wang et al. [21] adopted a forest fire detection algorithm that was suitable for infrared sensors and used a medium-resolution imaging spectrometer. Using forest fire data, the algorithm produced highly accurate results. Rao et al. [22] proposed a method to detect forest fires using a space fire monitoring system. Xie et al. [23] established a bow-tie model of fire and explosion in oil depots and combined it with a risk matrix for risk assessment. Based on cloud model theory, a quantitative risk assessment algorithm was developed that can identify risks more accurately, providing a theoretical basis for fire detection. Qu et al. [24] proposed a multiparameter fire detection method based on feature depth extraction. In this method, a variety of fire characteristic parameters are collected as raw data, and an algorithm based on XGBoost and other data are selected for training. The performance of the algorithm is improved by accounting for the classification bias of the upper-layer model. Experimental verification revealed that the method has high accuracy and sensitivity for a variety of single models. Wu et al. [25] used image processing technology to analyze image data, obtain the image feature vectors, and examine the effectiveness of the algorithm. Cho et al. [26] conducted extensive performance evaluations of block-based image steganography algorithms. To increase the assessment efficiency, Wooster et al. [27] used high-resolution fire detectors to provide an independent, accurate assessment.

Based on the current testing standards for IBFD technology, this study established a more accurate IBFD algorithm evaluation method to determine the advantages and disadvantages of an algorithm. The evaluation of two algorithms from the YOLOv3 series verified the reliability of the proposed evaluation method and showed that it greatly improves the efficiency of evaluating IBFD algorithms. This study contributes to the evaluation criteria for IBFD and facilitates the optimization of the algorithms, thereby supporting better early warning of fire.

Research on the mass of algorithm evaluation indexes

Assessment method

A fuzzy comprehensive hierarchical evaluation method was used to determine the mass (that is, the weight) of each algorithm index in this study. It is a combined subjective and objective evaluation method based on fuzzy mathematics and the analytic hierarchy process [28]. It combines qualitative and quantitative analysis to formulate the masses. The algorithm index is decomposed into several indexes at multiple levels, and the unstable factors of the evaluation process are then adjusted through fuzzy comprehensive evaluation. This method addresses the difficulty of quantification, making the evaluation results more realistic, effective, and versatile.

Determination of the mass of the algorithm index

Construction of the fuzzy comprehensive evaluation method

The fuzzy interval matrix and the masses among the factors have notable influences on the results of fuzzy comprehensive evaluation. Different evaluation indexes have different effects on the final evaluation results. Therefore, an expert scoring method and the analytic hierarchy process were used to construct an evaluation index system and determine the mass of each algorithm index.

Constructing a judgment matrix

The relative importance of the four first-level evaluation indexes was evaluated by 20 experts. According to the indexes, five experts used their own knowledge and experience to evaluate the fuzzy interval between the indexes. Because each level of indexes influences the upper-level indexes or the overall goal differently, the importance of indexes at the same level had to be compared and ranked. A nine-level scale (1–9) was used to rate the relative importance of the former index compared with the latter index. A higher value indicates higher relative importance.

According to Table 1, the evaluation value intervals determined by experts were calculated according to the following Eq. (1).

$$x_{{\text{i}}} = \frac{{\sum\nolimits_{{{\text{j}} = 1}}^{{\text{q}}} {\left[ {b_{{{\text{ij}}}}^{2} - a_{{{\text{ij}}}}^{2} } \right]} }}{{2\sum\nolimits_{{{\text{j}} = 1}}^{{\text{q}}} {\left[ {b_{{{\text{ij}}}} - a_{{{\text{ij}}}} } \right]} }}$$
(1)

where $b_{ij}$ represents the upper limit of the interval and $a_{ij}$ represents the lower limit of the interval.

Table 1 Evaluation value intervals determined by experts
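Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `interval_score` and the example intervals are hypothetical and are not taken from Table 1. Because $b^{2} - a^{2} = (b - a)(b + a)$, the formula is a width-weighted average of the interval midpoints, so a single interval reduces to its midpoint.

```python
def interval_score(intervals):
    """Collapse a list of (lower, upper) intervals into one value per Eq. (1)."""
    # Numerator of Eq. (1): sum of b^2 - a^2 over all intervals
    numerator = sum(b * b - a * a for a, b in intervals)
    # Denominator of Eq. (1): twice the sum of the interval widths
    denominator = 2 * sum(b - a for a, b in intervals)
    if denominator == 0:
        raise ValueError("all intervals have zero width; Eq. (1) is undefined")
    return numerator / denominator


# Hypothetical intervals on the 1-9 scale (illustrative only)
expert_intervals = [(2, 4), (3, 5), (2, 3)]
x_i = interval_score(expert_intervals)
single = interval_score([(2, 4)])  # one interval gives its midpoint
```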

The first-level evaluation index judgment matrix was calculated, as displayed in Table 2. According to the experts’ scores, the judgment matrix of each level of indexes was determined, and the mass of each algorithm index was obtained through the following calculation process. The elements of the matrix were multiplied row by row to obtain a new vector, as shown in Eq. (2).

$$A_{{\text{i}}} = \mathop \prod \limits_{{{\text{j}} = 1}}^{{\text{n}}} a_{{{\text{ij}}}}$$
(2)
Table 2 Judgment matrix of first-level evaluation index

According to the number of indexes n in the matrix, the nth root of each element of this vector was calculated, as shown in Eq. (3).

$$\overline{A}_{{\text{i}}} = \left( {\mathop \prod \limits_{{{\text{j}} = 1}}^{{\text{n}}} a_{{{\text{ij}}}} } \right)^{{\frac{1}{{\text{n}}}}}$$
(3)

Normalizing the values $\overline{A}_{i}$ gave the elements $w_{i}$ of the mass vector W [29], as shown in Eqs. (4) and (5).

$$w_{{\text{i}}} = \frac{{\overline{A}_{{\text{i}}} }}{{\sum\nolimits_{{{\text{k}} = 1}}^{{\text{n}}} {\overline{A}_{{\text{k}}} } }} = \frac{{\left( {\prod\nolimits_{{{\text{j}} = 1}}^{{\text{n}}} {a_{{{\text{ij}}}} } } \right)^{{\frac{1}{{\text{n}}}}} }}{{\sum\nolimits_{{{\text{k}} = 1}}^{{\text{n}}} {\left( {\prod\nolimits_{{{\text{j}} = 1}}^{{\text{n}}} {a_{{{\text{kj}}}} } } \right)^{{\frac{1}{{\text{n}}}}} } }}\quad i = 1,2, \ldots ,n$$
(4)
$$W = \left( {w_{1} ,w_{2} , \ldots ,w_{{\text{n}}} } \right)^{{\text{T}}}$$
(5)
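The steps in Eqs. (2) to (5) can be sketched as follows. This is a minimal sketch under two assumptions: the judgment matrix is square, and the normalization divides each nth root by the sum of all nth roots, which is the usual geometric-mean form in the analytic hierarchy process. The function name `ahp_masses` and the example matrix are hypothetical; the matrix is not the one in Table 2.

```python
import math


def ahp_masses(judgment_matrix):
    """Return the normalized mass (weight) vector W of a square judgment matrix."""
    n = len(judgment_matrix)
    if n == 0 or any(len(row) != n for row in judgment_matrix):
        raise ValueError("judgment matrix must be square and non-empty")
    # Eq. (2): product of the elements in each row
    row_products = [math.prod(row) for row in judgment_matrix]
    # Eq. (3): nth root of each row product
    roots = [p ** (1.0 / n) for p in row_products]
    # Eqs. (4) and (5): normalize so that the masses sum to 1
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical 3 x 3 reciprocal judgment matrix (illustrative only)
matrix = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
masses = ahp_masses(matrix)
```

For this example the row products are 8, 1, and 1/8, the cube roots are 2, 1, and 0.5, and the resulting masses are 4/7, 2/7, and 1/7.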

The elements of the first-level evaluation index judgment matrix were multiplied row by row, the nth root of each product was taken, and the results were normalized to obtain the masses of the first-level evaluation indexes, as listed in Table 3. Then, the mass distribution of the algorithm evaluation index system was calculated, as displayed in Table 4.

Table 3 Mass of first-level evaluation index
Table 4 Mass of algorithm evaluation index system

Algorithm evaluation

After the algorithm evaluation index system was determined, the final evaluation results of the algorithms were obtained according to the index system. First, the algorithm under test was executed to identify the images in the dataset, and the actual value corresponding to each index was then obtained from the recognition process. Because the evaluation was driven by the algorithm output and did not depend completely on the subjective judgment of evaluation experts, the reliability and objectivity of the algorithm evaluation were strengthened. The detection algorithm was evaluated according to the masses of the algorithm indexes and the experimental values, and the results were expressed in terms of grades and scores. Two YOLOv3 algorithms were trained on different datasets and named Algorithm 1 (A1) and Algorithm 2 (A2). A total of 1,000 images selected from the 10,795 images in the fire image dataset were used to examine the fire recognition ability of A1 and A2 individually. According to the evaluation mode of the algorithm, the evaluation of the IBFD algorithm was divided into four first-level evaluation indexes: “difficulty of image recognition,” “image recognition accuracy,” “algorithm efficiency,” and “algorithm anti-interference ability.” The evaluation results were divided into four grades, namely “excellent,” “good,” “medium,” and “poor,” with corresponding grade scores of (100, 80, 50, 0). The score of an algorithm was obtained by multiplying its evaluation vector by these grade scores. A score was then graded as “excellent,” “good,” “medium,” or “poor” if it fell within [85, 100], [70, 85), [60, 70), or [0, 60), respectively.
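The mapping from a score to a grade described above can be sketched as a simple lookup. The function name `grade_for` is hypothetical; only the interval boundaries come from the text.

```python
def grade_for(score):
    """Map a score in [0, 100] to a grade using the intervals stated in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 85:
        return "excellent"  # [85, 100]
    if score >= 70:
        return "good"       # [70, 85)
    if score >= 60:
        return "medium"     # [60, 70)
    return "poor"           # [0, 60)


boundary_grades = [grade_for(s) for s in (100, 85, 84.99, 70, 60, 59.99, 0)]
```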

According to the actual value obtained for each index when the algorithm was run, the grade interval into which that index fell was determined. In this study, 0 and 1 were used as the values in the level matrix K: according to the specified range of each grade, the entry of K for the grade that matched the actual value of an index was set to 1, and the other entries for that index were set to 0.

Figure 1 illustrates the results of comparing the two algorithms in terms of image recognition at different stages of fire development. A1 could identify images of three fire stages at the same time, whereas A2 could identify the images of the initial stage of the fire and the full combustion stage but not those of the fire recession stage.

Fig. 1 Results of comparing the two algorithms in terms of image recognition at different stages of fire development: (a) initial stage, (b) full combustion stage, and (c) fire recession stage for A1; (d) initial stage, (e) full combustion stage, and (f) fire recession stage for A2

As illustrated in Fig. 2, A1 and A2 could not identify the characteristics of smoldering fire. Although they could identify both open fires and explosive fires, their recognition accuracy, recognition range, and recognition degree differed. The level matrix K obtained for A1 in terms of fire complexity is shown in Eq. (6).

(6)
Fig. 2 Recognition results of fire characteristics: (a) smoldering fire, (b) open fire, and (c) explosive fire for A1; (d) smoldering fire, (e) open fire, and (f) explosive fire for A2

According to the level matrix, the mass distribution of the evaluation results for the four levels was obtained, as shown in Eq. (7).

$$\left\{ {0.3005,0.3655,0.0892,0.0892} \right\} \cdot K = \left\{ {0.4559,0.3655,0.1784,0} \right\}$$
(7)
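The multiplication of a mass vector by a 0/1 level matrix can be sketched as below. This is an illustration under stated assumptions: each row of K is taken to contain exactly one 1, marking the grade of that index, and the masses and matrix in the example are hypothetical. They are not the values of Eqs. (6) and (7).

```python
def grade_distribution(masses, level_matrix):
    """Sum the masses of the indexes that fall into each grade column of K."""
    if len(masses) != len(level_matrix):
        raise ValueError("one row of K is required per index mass")
    n_grades = len(level_matrix[0])
    for row in level_matrix:
        # Assumption: every index is assigned to exactly one grade
        if len(row) != n_grades or sorted(row) != [0] * (n_grades - 1) + [1]:
            raise ValueError("each row of K must contain a single 1")
    return [
        sum(m * row[g] for m, row in zip(masses, level_matrix))
        for g in range(n_grades)
    ]


# Hypothetical masses of three indexes (they sum to 1)
example_masses = [0.5, 0.3, 0.2]
# Hypothetical K; columns are excellent, good, medium, poor
example_K = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
distribution = grade_distribution(example_masses, example_K)
```

In this example the first and third indexes are graded “excellent,” so their masses add to 0.7 in the first column, and the remaining 0.3 falls under “good.”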

The fire complexity index system of A1 was calculated: 45.59% of the results were “excellent,” 36.55% were “good,” 17.84% were “medium,” and none were “poor.” The next step was to calculate the rating, as shown in Eq. (8).

$$\left\{ {0.4559,0.3655,0.1784,0} \right\} \cdot \left\{ {100,80,50,0} \right\}^{{\text{T}}} = 83.7500$$
(8)

The calculated rating value of 83.7500 indicated that the evaluation score of A1 in the second-level index system of fire complexity was within the range [70, 85), which corresponds to the “good” level. The score of A2 in terms of fire complexity was 77.3080. Thus, the evaluation results for A1 were better than those for A2, indicating that the detection performance of A1 was higher than that of A2 in terms of fire complexity.
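The rating in Eq. (8) is a dot product of the grade distribution and the grade scores, which can be checked with a short sketch. The function name `rating` is hypothetical; the distribution and the grade scores are the values stated in the text.

```python
GRADE_SCORES = (100, 80, 50, 0)  # excellent, good, medium, poor


def rating(distribution, grade_scores=GRADE_SCORES):
    """Dot product of a grade distribution with the grade scores, as in Eq. (8)."""
    if len(distribution) != len(grade_scores):
        raise ValueError("distribution and grade scores must have equal length")
    return sum(p * s for p, s in zip(distribution, grade_scores))


# Grade distribution of A1 for fire complexity, from Eq. (7)
a1_fire_complexity = rating([0.4559, 0.3655, 0.1784, 0.0])
# 45.59 + 29.24 + 8.92 = 83.75, which lies in [70, 85), the "good" interval
in_good_interval = 70 <= a1_fire_complexity < 85
```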

The actual values of all the third-level evaluation indexes of the two algorithms were used to calculate the corresponding scores, and then the scores of the upper-level indexes were calculated step by step to obtain the scores of the first-level and second-level index systems of the two algorithms.
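The step-by-step calculation of upper-level scores can be sketched as a recursive weighted sum. This is an assumption about the roll-up rule: each parent score is taken to be the sum of its child scores multiplied by the child masses, with the masses under one parent summing to 1. The tree structure, names, masses, and scores below are all hypothetical.

```python
def roll_up(node):
    """Return the score of a node; a parent is the mass-weighted sum of its children."""
    if "score" in node:
        return node["score"]  # leaf index with a score obtained from testing
    return sum(child["mass"] * roll_up(child) for child in node["children"])


# Hypothetical two-level index tree (illustrative values only)
index_tree = {
    "name": "overall",
    "children": [
        {
            "name": "first-level index A",
            "mass": 0.6,
            "children": [
                {"name": "second-level index A1", "mass": 0.5, "score": 100},
                {"name": "second-level index A2", "mass": 0.5, "score": 80},
            ],
        },
        {"name": "first-level index B", "mass": 0.4, "score": 50},
    ],
}
overall_score = roll_up(index_tree)  # 0.6 * 90 + 0.4 * 50
```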

Tables 5 and 6 display the first-level and second-level evaluation index scores of A1 and A2, respectively. According to the results, the comprehensive scores of A1 and A2 were 90.2790 and 79.5143, respectively. Testing likewise showed that the overall performance of A1 was better than that of A2. Specifically, A1 had a wider recognition range and higher accuracy. Under dark background conditions, the recognition speed of A1 was considerably higher than that of A2, and the error rate of A2 was higher under the interference of red background conditions. Therefore, the test results for A1 and A2 are consistent with the results calculated using the evaluation method, which verifies the scientific rationality of the evaluation method for IBFD algorithms.

Table 5 Score of first-level and second-level evaluation index for A1
Table 6 Score of first-level and second-level evaluation index for A2

Conclusions

In this study, a method for evaluating IBFD algorithms was designed, and a three-level evaluation index system was established using the fuzzy comprehensive hierarchical analysis method. Different dimensions were used to evaluate the advantages and disadvantages of various IBFD algorithms, forming 4 first-level evaluation indexes, 9 second-level evaluation indexes, and 29 third-level evaluation indexes that make up the algorithm index evaluation system. A set of algorithm evaluation methods was designed by setting evaluation standards and defining ranges for each third-level evaluation index, and the algorithms were evaluated according to their identification test results. Two algorithms based on the YOLOv3 series were tested on the fire image dataset; the final evaluation results were obtained according to the score levels and the corresponding masses of the algorithm indexes, verifying the validity and rationality of the proposed algorithm evaluation method. The results provide a basis for the formulation of image-based flame detector standards and facilitate algorithm optimization.