1 Introduction

Wildfires are a constant threat to ecological systems and human safety, especially in sparsely populated rural areas with high fire risk factors. Estimates indicate that the total volume of the world's forest stand is rapidly decreasing and that wildfires are one of the main causes of this trend [1]. To minimize potential damage caused by wildfires, three types of action are available: preventive action in the pre-fire phase, early fire detection in the initial ignition phase, and quick and efficient fire-fighting in the active burning phase. The focus of this paper is on detecting fire in its initial stage so that fire-fighting activities can be started promptly. The traditional way of detecting wildfire relies on human observers situated on fire lookout towers with good visibility over large areas around the post. Thanks to advances in visual systems technology, remote cameras can be located on different posts, enabling a single observer to cover multiple remote areas. In the past 10–15 years, automatic fire detection systems have been developed as an aid to human wildfire observation. There are many different approaches to wildfire detection that use color and motion information to detect fire flames [2–9]. However, most visual detection systems are designed mainly for smoke detection, since smoke is in most cases more visible than the fire itself; terrain configuration and other obstacles often occlude the flames. Methods dealing with smoke detection are mainly based on a fusion of several different approaches, such as the texture-based approach [10–12], the chromatic-based approach [13–15], neural networks [16], detection of smoke motion [17–21], support vector machines [22, 23], clustering on a fractal curve [24], detection based on fractal properties of smoke [25], fuzzy-finite automata [26] and wavelet analysis [16, 27–32].
Smoke detection systems provide surveillance assistance, but cannot be used as a self-sufficient solution. Such systems are capable of detecting smoke in the image with high accuracy, but they often need additional human confirmation for the final decision. The goal of our research is to increase detection accuracy, bringing the performance of detection systems closer to the results currently obtained by human observers. We present a novel method for visual smoke detection. The proposed method is based on motion detection, chromatic analysis and smoke-dynamics analysis, followed by alarm verification based on spatial context information. The method is named the spatial context smoke detection method (SCSD). The rest of the paper is organized as follows: the SCSD method is presented in detail in Sect. 2. The evaluation methodology is given in Sect. 3, followed by evaluation results in Sect. 4. The conclusion is given in Sect. 5.

2 Smoke detection

The detection phases of the SCSD method are presented in Fig. 1. A sequence of visible-spectrum images is used as input to the smoke detection system, along with real-time meteorological data from the detection site. The smoke detection process is divided into several phases dedicated to different aspects of detection. In the motion detection phase, moving regions are extracted and forwarded to the following phases together with the original image. In the chromatic analysis phase, additional verification of the motion regions is performed based on region color characteristics. Wavelet-based analysis is used to detect changes in the high-frequency content of the image. The appearance of smoke over a region should gradually change the energy of the region, and the edges should lose their sharpness without vanishing instantly.

Fig. 1 SCSD method consisting of several detection phases calculates the possibility of smoke in the input image

Every detected region is iteratively examined for smoke dynamics such as growth and lateral or upward motion. Another part of the detection system is the visual context analysis. First, the image is divided into homogeneous regions in the segmentation phase. The classifier generates the possibility that a region belongs to a certain class, so only regions with high classification certainty are classified. Regions that are not classified do not play a role in the spatial-context analysis. Certain categories of false alarms can be eliminated based on the spatial arrangement of the detected regions and the classified regions. Real-time meteorological data are obtained from the detection site using a meteorological station. The meteorological analysis phase outputs a value that indicates the potential for fire in the surroundings given the current meteorological conditions. Finally, all the information regarding color, texture, dynamics, meteorological and spatial context is used as input to the inference engine. If all the indicators imply that the region in the image could be classified as smoke, an alarm is raised and a possibility of smoke for the given image is generated.

2.1 Motion detection algorithm

The first step in most fire and smoke detection systems is motion detection. There are many approaches to motion analysis used for smoke detection, such as detection based on binning and clustering, motion segmentation using a motion history image, or clustering motion on a fractal curve [33–35]. The method used in SCSD is an adaptation of a general motion detection method presented in [36], modified to take into account smoke characteristics and smoke-detection scenarios. First, a statistical background model is used

$$\begin{aligned} \mathrm{class}({p_n}) = \left\{ {\begin{array}{*{20}{l}} \mathrm{foreground}&\quad {|{p_{n - 1}} - {p_n}| > \delta \cdot \overline{{\sigma _n}} } \\ \mathrm{background}&\quad \mathrm{else} \\ \end{array}} \right., \end{aligned}$$
(1)

where \({p_n}\) represents value of pixel \(p\) in the \({n\mathrm{th}}\) frame, \(\delta \) is the relative deviation threshold and \(\overline{{\sigma _n}}\) represents standard deviation for a given pixel in the \({n\mathrm{th}}\) frame. Standard deviation is calculated for every pixel using

$$\begin{aligned} \overline{{\sigma }}_{n + 1} = a|{p_{n + 1}} - \overline{p}_{n + 1} | + (1 - a)\overline{\sigma }_n, \end{aligned}$$
(2)

where \(a\) is a parameter defining responsiveness of the model to the changes in the background, and \(\overline{{p_n}}\) is the running average for a given pixel, calculated by

$$\begin{aligned} \overline{p}_{n + 1} = a \cdot {p_{n + 1}} + (1 - a)\overline{p}_n. \end{aligned}$$
(3)

Parameters \(a\) and \(\delta \) are variable parameters, and we have to determine the values that give the best results on image testing database. Testing algorithm performance for every parameter combination is computationally expensive, since the quality of selected parameters has to be evaluated using several different detection sequences and scenarios, so a heuristic approach is adopted using genetic algorithms. Genetic algorithm fitness function evaluates the motion detection error for given set of parameters. Errors from all the test sequences are accumulated and represent the fitness value. The aim of the optimization is to find a global minimum of the fitness function in continuous search space. Calculating fitness value for every individual in every generation could be computationally expensive if the test database contains a large number of images. So the population size is set to 20 to achieve a compromise between search space coverage and computational requirements. Diversity is defined by limiting the initial range of the input variables to \(a \in \left[ {0,1} \right]\) and \(\delta \in \left[ {1,5} \right]\). Two of the best-scored individuals are labeled elite and are guaranteed to survive to the next generation. The rest of the parents create new children through crossover and mutation. In this case, 80 % of children are generated using crossover, and 20 % through mutation. Best fitness function values are obtained for parameter values \(a = 0.026\) and \(\delta = 1.719\).
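The per-pixel update of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `update_background` and the order of the updates (average first, then deviation, then classification with the freshly updated deviation) are assumptions made for the example. The default parameter values are the optimized ones reported above.

```python
def update_background(p_prev, p_new, mean, sigma, a=0.026, delta=1.719):
    """Advance the per-pixel background model by one frame.

    p_prev, p_new -- pixel value in the previous and in the current frame
    mean, sigma   -- running average and deviation from the previous step
    Returns (is_foreground, new_mean, new_sigma).
    """
    new_mean = a * p_new + (1 - a) * mean                    # Eq. (3)
    new_sigma = a * abs(p_new - new_mean) + (1 - a) * sigma  # Eq. (2)
    is_foreground = abs(p_prev - p_new) > delta * new_sigma  # Eq. (1)
    return is_foreground, new_mean, new_sigma


# A stable pixel stays background; a sudden jump is flagged as foreground.
static = update_background(100, 100, mean=100.0, sigma=2.0)
jump = update_background(100, 150, mean=100.0, sigma=2.0)
```

With the small value of \(a\), a single outlier barely moves the running average, so the model adapts slowly to gradual background changes while still reacting to abrupt ones.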

2.2 Chromatic and texture analysis

Chromatic analysis is performed after the candidate regions are detected. The analysis is based on results obtained from a database of ground-truth segmentations containing 1,000 images. Ground-truth images are analyzed taking into account the mean value of the smoke regions, the pixel distribution in color space, and the first, second and third region moments. Based on the experimental results, rules are defined that specify the range of chromatic properties characteristic of smoke. Since smoke is a semi-transparent phenomenon, in specific scenarios it is difficult to distinguish it from the background based only on color information. Figure 2 shows RGB values for neighboring smoke and background pixels taken from 5 images of different scenes. However, the color space covered by smoke pixels is relatively limited and diagonally positioned, because smoke is always colored in the light-to-dark gray interval. So by adopting color-spread rules and limiting region intensity moment values, a certain number of non-smoke regions can be eliminated. Smoke is visible because of light scattering from smoke molecules, and this effect is not the same in every color channel. Light is scattered when it comes in contact with atmospheric molecules, an effect known as Rayleigh scattering, where the intensity of the scattered light is inversely proportional to the fourth power of the wavelength. This implies that the scattering at \(400\) nm is 9.4 times as great as that at \(700\) nm for equal incident intensity, so the blue channel is used more extensively than the other channels in chromatic analysis.
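As a quick check of the quoted factor, taking the scattered intensity as proportional to \(\lambda ^{-4}\) (the standard Rayleigh dependence; the symbols \(I_{400}\) and \(I_{700}\) are introduced here only for illustration) gives

$$\begin{aligned} \frac{I_{400}}{I_{700}} = \left(\frac{700}{400}\right)^{4} = 1.75^{4} \approx 9.38, \end{aligned}$$

which rounds to the factor of 9.4 stated in the text.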

Fig. 2 Smoke (red) and adjacent background (blue) pixels from neighboring regions in RGB color space (color figure online)

Besides the color information, texture of the region is also used as an indicator about smoke characteristics. After the initial change detection, region candidates are checked for texture consistency. The appearance of smoke affects the high frequency content of the image by gradually smoothing the edges. Edges in the image represent local extrema in the wavelet domain and the smoothing effect of the edges results in a decrease in values in these extrema. Since the edges and texture contribute to the high frequency information of the image, energies of the wavelet subimages drop due to smoke in the image sequence [31]. The possibility of frequency analysis with different band-pass filters of different sizes is the reason why wavelet analysis was chosen over other image analysis tools. Deviation in texture content is analyzed using spatial discrete wavelet transformation (DWT). The analysis is performed in the first four decomposition levels, since the further decomposition does not contribute any significant information regarding the possibility of smoke in the analyzed region [27]. After testing several wavelet-basis functions, Daubechies 10 wavelet was selected giving equal or better results over other basis functions. The results of the transformation in particular decomposition levels are approximation coefficients matrix (cA) and detail coefficient matrices in horizontal (cH), vertical (cV) and diagonal direction (cD). The total energy of the region is calculated using standard wavelet energy equation:

$$\begin{aligned} {E_w} = \sum \limits _{r,c} {\mathrm{cD}_{(r,c)}^2 + \mathrm{cH}_{(r,c)}^2 + \mathrm{cV}_{(r,c)}^2} \end{aligned}$$
(4)

The energies of the detected region are normalized for every decomposition level; components of all the directions are taken into account and compared to the referent background values. Significant deviation from the referent values is an important indicator considered in further detection phases.
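The energy drop described in this section can be illustrated with a small sketch. To keep the example dependency-free, it uses a single-level Haar transform as a stand-in for the four-level Daubechies 10 decomposition used in the paper, so the numbers are only indicative. The function name and the test patterns are assumptions made for the example.

```python
def haar_detail_energy(image):
    """Return E_w of Eq. (4) for one Haar decomposition level.

    image -- list of rows of equal, even length, with an even number of rows.
    """
    energy = 0.0
    for r in range(0, len(image), 2):
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            c_h = (a + b - d - e) / 2.0  # difference between the two rows
            c_v = (a - b + d - e) / 2.0  # difference between the two columns
            c_d = (a - b - d + e) / 2.0  # diagonal difference
            energy += c_d ** 2 + c_h ** 2 + c_v ** 2
    return energy


sharp = [[0, 255, 0, 255]] * 4     # strong vertical edges
hazy = [[100, 155, 100, 155]] * 4  # same pattern with reduced contrast
flat = [[128] * 4] * 4             # no texture at all

energies = [haar_detail_energy(img) for img in (sharp, hazy, flat)]
```

The reduced-contrast pattern, which mimics edges softened by smoke, keeps its structure but carries far less detail energy than the sharp one, while a textureless patch has none.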

2.3 Motion dynamics and meteorological context

Regions detected using motion detection are checked for smoke-like behavior. Smoke regions should exhibit gradual growth without intensive variations in size. The size of the detected region is measured in every iteration. Since the distance between the camera and the real position of the detected region is unknown, certain aberrations are allowed, but the size of the region should gradually increase. Smoke regions should also exhibit upward and lateral motion. The upward shift of the center of gravity of the detected region should be persistent in the first phase of the detection. These motion characteristics are compared against referent values obtained from the test. The correlation value between the referent and current motion dynamics values is used as the output of this phase. The meteorological context includes information about wind speed, wind direction, air pressure, humidity and temperature measured by appropriate meteorological stations. Meteorological information is used to determine the weather conditions and automatically adjust the sensitivity of the detection process; for example, excessive humidity implies rainy weather or an after-rain period. The meteorological analysis outputs a value that indicates the potential for fire in the surroundings given the current meteorological conditions. Wind speed and direction are used in combination with motion dynamics analysis. The lateral motion of the detected regions should not greatly deviate from the wind direction readings of the station. However, a certain extent of deviation is allowed due to the possible difference in meteorological conditions between the detection site and the actual position of the region. If the lateral motion of the region is opposite to the wind direction readings from the station, the region should not be classified as smoke. It is important to emphasize that although wildfires can affect the direction of the wind, this is not the case for early fire detection. We are trying to detect the fire in its incipient phase, when the fire is small, so the influence of the fire on the wind direction can be neglected.
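The dynamics checks described in this section can be sketched as below. The paper does not give concrete tolerances, so the thresholds `max_growth`, `max_shrink` and `max_dev_deg` are hypothetical, as are the function names. The wind angle is assumed to be the direction the wind blows toward, expressed in the same angular convention as the region motion.

```python
def grows_gradually(sizes, max_growth=1.5, max_shrink=0.9):
    """True if the region area increases overall without abrupt steps.

    Small decreases (down to max_shrink of the previous size) are tolerated
    because the camera-to-region distance is unknown.
    """
    steps_ok = all(prev * max_shrink <= cur <= prev * max_growth
                   for prev, cur in zip(sizes, sizes[1:]))
    return steps_ok and sizes[-1] > sizes[0]


def wind_consistent(motion_deg, wind_deg, max_dev_deg=90.0):
    """True if lateral motion deviates from the wind by at most max_dev_deg."""
    diff = abs((motion_deg - wind_deg + 180.0) % 360.0 - 180.0)
    return diff <= max_dev_deg


steady = grows_gradually([100, 120, 115, 140, 170])  # small dip is tolerated
burst = grows_gradually([100, 400])                  # abrupt jump is rejected
with_wind = wind_consistent(10.0, 350.0)             # 20 degrees apart
against_wind = wind_consistent(180.0, 0.0)           # opposite directions
```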

2.4 Segmentation based on region merging

There are many approaches to image segmentation such as clustering methods, compression based methods, graph partitioning methods, histogram-based methods and region merging methods [37]. SCSD uses statistical region merging [38, 39] for image segmentation. Statistical region merging used in SCSD is adapted for segmentation that is more sensitive to visual smoke characteristics, giving priority to correct segmentation of smoke regions over the segmentation of other categories. In the start of the segmentation process, every pixel is considered as a separate region, and the merging is iteratively performed based on merging predicate \(P(R,R^{\prime })\) that decides whether two regions \(R\) and \(R^{\prime }\) belong to the same statistical region. Merging predicate is defined as:

$$\begin{aligned} P(R,R^{\prime }) = \left\{ {\begin{array}{*{20}{l}} \mathrm{true}&\quad |\overline{{R_x}^{\prime }} - \overline{{R_x}} | \le b(R,R^{\prime }) \ \mathrm{and}\ |\overline{{R_\mathrm{b}}^{\prime }} - \overline{{R_\mathrm{b}}} | \le {T_\mathrm{b}} \ \mathrm{and}\ |\overline{{R_x}} - \overline{{R_y}} | < {T_\mathrm{G}},\\ \mathrm{false}&\quad \mathrm{else} \\ \end{array}} \right., \end{aligned}$$
(5)

where \(\overline{{R_x}}\) and \(\overline{{R_y}}\) represent average values of different color channels from RGB color model, \({R_\mathrm{b}}\) represents blue color channel, \(T_\mathrm{b}\) and \(T_\mathrm{G}\) represent blue and general threshold, respectively, where \({T_\mathrm{b}} < {T_\mathrm{G}},\) and \(b(R,R^{\prime })\) is defined as:

$$\begin{aligned} b(R,R^{\prime }) = g\sqrt{\frac{1}{{2Q}}\left(\frac{1}{{|R|}} + \frac{1}{{|R^{\prime }|}}\right)\ln \frac{2}{\delta }}, \end{aligned}$$
(6)

where \(| \cdot |\) represents the cardinality, \(g\) is the maximum allowed value for used color channel (usually 255), \(\delta \) is defined as \(\delta = \frac{1}{{3|I{|^2}}}\), where \(|I|\) represents the number of pixels in the image and \(Q\) is a tunable parameter used for varying the general number of the segmented regions. Predicate \(P(R,R^{\prime })\) defines three conditions to be satisfied in order for merging to occur. First condition is a statistical region merging criterion defined by \(|\overline{{R_x}^{\prime }} - \overline{{R_x}} | \le b(R,R^{\prime }){}\) [38]. Using this criterion, visually homogenous regions could be extracted based on their chromatic characteristics. This enables the method to perform under variable lighting conditions, since the segmentation is based on statistical region homogeneity, rather than intensity differentiation constraints. Regions satisfying the region homogeneity condition for every color channel are further analyzed. To make the segmentation procedure more sensitive to smoke regions, additional conditions are introduced in the predicate. Appearance of smoke in the image region results in change in the chromatic characteristics that is most noticeable in the blue channel. To make the segmentation procedure more rigorous when dealing with smoke regions, conditions concerning the aberration in the blue channel are limited using a threshold \(T_\mathrm{b}\). Another characteristic of smoke is its neutral coloring, spanning from light to dark gray. Average values for the regions should be similar in every color channel, so a general threshold \(T_\mathrm{G}\) is introduced. These conditions are introduced to reduce partial blending of smoke regions with the background regions in the process of segmentation. 
However, the process of segmentation is not the replacement for smoke detection, it is rather used to improve segmentation of regions adjacent to smoke for false alarms reduction using spatial context analysis in the following phases. The experimental testing performed with test database images and ground-truth segmentation shows that the segmentation gives the best results in the interval \(Q \in [{195,207}]\), \({T_\mathrm{b}} \in \left[ {17,21} \right]\) and \({T_\mathrm{G}} \in \left[ {35,39} \right]\) for various scenarios, but there is not a single optimal value for every situation. For evaluation purposes, values \(Q = 202\), \(T_\mathrm{b}=19\) and \(T_\mathrm{G}=38\) are used giving the best average segmentation results on segmentation test database.
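The merging predicate of Eqs. (5) and (6) can be sketched as follows, using the evaluation values \(Q = 202\), \(T_\mathrm{b}=19\) and \(T_\mathrm{G}=38\) as defaults. Two points are assumptions made for the example: the channel-similarity condition is checked on the channel averages of region \(R\) only, and the 640 × 480 image size in the usage lines is arbitrary. The function names are illustrative.

```python
import math


def merge_bound(size_r, size_r2, image_pixels, q=202, g=255):
    """b(R, R') of Eq. (6), with delta = 1 / (3 |I|^2)."""
    delta = 1.0 / (3.0 * image_pixels ** 2)
    inner = (1.0 / (2.0 * q)) * (1.0 / size_r + 1.0 / size_r2)
    return g * math.sqrt(inner * math.log(2.0 / delta))


def should_merge(mean_r, mean_r2, size_r, size_r2, image_pixels,
                 t_b=19, t_g=38, q=202):
    """Evaluate P(R, R') for two regions given their (R, G, B) averages."""
    b = merge_bound(size_r, size_r2, image_pixels, q=q)
    # Condition 1: statistical homogeneity in every color channel.
    homogeneous = all(abs(x2 - x1) <= b for x1, x2 in zip(mean_r, mean_r2))
    # Condition 2: limited aberration in the blue channel.
    blue_close = abs(mean_r2[2] - mean_r[2]) <= t_b
    # Condition 3: neutral (gray) coloring, channels close to each other.
    red, green, blue = mean_r
    neutral = max(abs(red - green), abs(red - blue), abs(green - blue)) < t_g
    return homogeneous and blue_close and neutral


pixels = 640 * 480
gray_pair = should_merge((120, 122, 125), (123, 124, 128), 500, 400, pixels)
green_pair = should_merge((60, 140, 50), (61, 141, 51), 500, 400, pixels)
```

For single-pixel regions in an image of this size the bound \(b\) is roughly 93, far above both thresholds, so early in the merging process it is \(T_\mathrm{b}\) and \(T_\mathrm{G}\) that constrain the result. For regions of a few hundred pixels the bound shrinks to about 4 and the homogeneity condition dominates.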

2.5 Categorization of segmented regions

Fire detection systems are placed in natural surroundings, and for the task of general categorization six main categories are defined:

  • Smoke

  • Low clouds and fog

  • Sun and light effects

  • Water surfaces (sea, lakes, rivers, etc.) and sky

  • Distant landscapes

  • Vegetation.

The phase of categorization is not used for detection itself. It is used in validation process for false alarms elimination, so it is not necessary to categorize the whole image. Categorization of a certain region should be performed only if the classifier decision with high degree of confidence could be made. Regions that are not classified do not play a role in the alarm validation process. In this phase, regions are not categorized into Smoke category as well as Low clouds and fog category, because these phenomena are of our prime interest. Method used for general categorization is Naïve Bayes (NB) classification based on kernel density estimation [40]. NB is selected for the classification process for several reasons. NB classifiers are effective, efficient, robust and support incremental training. Furthermore, NB classifiers can deal with a large number of variables and large data sets, and they handle both discrete and continuous attribute variables. Features such as average RGB and HSI values, first and second region moments, and the size of the region are selected for the classification process. Let \(c_k\) be the possible semantic class from one of the defined six categories. The probability that a region having a feature vector \(\varvec{F}\) belongs to a class \(c_k\) is given by the Bayes’ theorem

$$\begin{aligned} p({c_k}|\varvec{F}) = p(\varvec{F}|{c_k})\frac{{p({c_k})}}{{p(\varvec{F})}}, \end{aligned}$$
(7)

where \(p(c_k)\) represents the prior probability of region belonging to the class \(c_k\), that can be obtained from the ground truth segmentation, and \(p(\varvec{F})\) represents the probability of the given feature vector from the prior feature distribution. Probability of an observed value could be easily obtained using kernel density estimation. Gaussian kernels are used with estimated density averaged over a large set of kernels

$$\begin{aligned}& p(\varvec{F}|{c_k}) = \frac{1}{n}\sum \limits _i {g(\varvec{F};{\mu _i},{\sigma _k})}\end{aligned}$$
(8)
$$\begin{aligned}& g(\varvec{F};{\mu _i},{\sigma _k}) = \frac{1}{{\sqrt{2\pi }\,{\sigma _k}}}{e^{ - \frac{{{{(\varvec{F} - {\mu _i})}^2}}}{{2\sigma _k^2}}}}, \end{aligned}$$
(9)

where \(i\) ranges over the training points of feature vector \(\varvec{F}\). The training of the classifier is performed using a ground truth image database containing over 6,000 regions belonging to one of the previously defined classes.
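A minimal sketch of the kernel-density Naïve Bayes step of Eqs. (7)–(9) is given below. It uses the standard Gaussian kernel and a single bandwidth `sigma` shared by all classes and features, which is a simplification. The toy one-dimensional training data, the bandwidth value and the `min_confidence` threshold are hypothetical and serve only to show the "classify only when certain" behavior described above.

```python
import math


def gaussian(x, mu, sigma):
    """Standard Gaussian kernel centered at mu."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm


def likelihood(features, training, sigma):
    """p(F | c_k): per-feature kernel density, multiplied across features."""
    result = 1.0
    for j, x in enumerate(features):
        kernels = sum(gaussian(x, point[j], sigma) for point in training)
        result *= kernels / len(training)  # Eq. (8)
    return result


def classify(features, training_sets, priors, sigma=10.0, min_confidence=0.9):
    """Return (label or None, posterior of the best class), following Eq. (7)."""
    scores = {c: priors[c] * likelihood(features, pts, sigma)
              for c, pts in training_sets.items()}
    evidence = sum(scores.values())
    if evidence == 0.0:
        return None, 0.0
    best = max(scores, key=scores.get)
    posterior = scores[best] / evidence
    return (best if posterior >= min_confidence else None), posterior


# Toy data: one feature (e.g. mean intensity) per training region.
training_sets = {"vegetation": [(40.0,), (45.0,), (50.0,)],
                 "sky": [(200.0,), (210.0,), (220.0,)]}
priors = {"vegetation": 0.5, "sky": 0.5}

sure_label, sure_post = classify((48.0,), training_sets, priors)
unsure_label, unsure_post = classify((125.0,), training_sets, priors)
```

A region whose features fall between the two classes receives a posterior close to 0.5 and is left unclassified, so it plays no role in the later alarm validation.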

2.6 False alarms reduction

The primary causes of false alarms are natural phenomena visually similar to smoke that could, in certain conditions, be misinterpreted as smoke even by a human observer, such as fog, clouds low to the ground (particularly when video cameras are located on mountain tops), dust from the ground, water evaporation, etc. On the other hand, there are also false alarms caused by natural phenomena that a human observer could easily dismiss. Examples are raindrops on the camera (Fig. 3a) or sun effects (Fig. 3b). These occurrences can easily lead the system to trigger a false alarm.

Fig. 3 Generated false alarms caused by (a) raindrop on camera, (b) sunlight effect

Such scenarios can be avoided by introducing specific methods for false alarms elimination using alarm shape characteristics.

2.6.1 Shape analysis based false alarm reduction

Smoke has irregular convex shape that is not compact, while raindrops on the other hand have high compactness factor. Measure of compactness and curvature can be calculated using Eqs. (10) and (11), respectively, taking into account the perimeter and area of the object:

$$\begin{aligned}& c = \frac{{4\pi A}}{{{l^2}}} \end{aligned}$$
(10)
$$\begin{aligned}& B = \frac{1}{l}\sum \limits _i {\alpha ^2(i)}, \end{aligned}$$
(11)

where \(c\) represents compactness, \(l\) is the perimeter, \(A\) is the area, \(B\) is curvature, \(i\) is the index of border point of the region and \(\alpha \) is the angle between two lines intersecting at point \(i\). Each line is passing through a different boundary point with distance of three neighboring boundary points in opposite circular directions from \(i\). The method used for reducing alarms generated due to sunlight effects is based on specific characteristic for sunrays that have a rather elongated shape very distinguishable from smoke. The goal of the method is to calculate the elongation factor for the detected region and determine if it could be rejected as sunray using this property. The axis of the least moment of inertia is calculated by

$$\begin{aligned} \theta = \frac{1}{2}\arctan \left(\frac{{2{\mu _{11}}}}{{{\mu _{20}} - {\mu _{02}}}}\right)\!, \end{aligned}$$
(12)

where \(\theta \) is the angle of this axis with the positive x axis, and \({\mu _{xy}}\) represents the moment of the \(xy\) order. Once the axis is determined, a bounding box for the region can be found and the elongation factor obtained. Experimental data show that smoke regions have an elongation factor of less than 3, while sunrays have an elongation factor greater than 5. The presented methods address two particular types of false alarms; by using the spatial context information, additional scenarios resulting in false alarms can be dismissed.
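The compactness of Eq. (10) and the elongation factor derived from Eq. (12) can be sketched as follows. The sketch deviates from Eq. (12) in one labeled respect: it uses `atan2` instead of a plain arctangent to avoid a division by zero when \({\mu _{20}} = {\mu _{02}}\). Measuring the oriented bounding box by projecting pixel coordinates onto the axis, and adding one pixel to each extent, are assumptions made for the example.

```python
import math


def compactness(area, perimeter):
    """Eq. (10): equals 1 for a circle and decreases for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def elongation(pixels):
    """Length-to-width ratio of the bounding box aligned with the main axis.

    pixels -- list of (x, y) coordinates belonging to the region.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    # Eq. (12); atan2 is used here to handle mu20 == mu02 safely.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    along = [(x - cx) * cos_t + (y - cy) * sin_t for x, y in pixels]
    across = [-(x - cx) * sin_t + (y - cy) * cos_t for x, y in pixels]
    length = max(along) - min(along) + 1.0
    width = max(across) - min(across) + 1.0
    return length / width


ray = [(x, y) for x in range(30) for y in range(5)]    # 30 x 5 pixel strip
blob = [(x, y) for x in range(10) for y in range(10)]  # 10 x 10 pixel square

ray_elongation = elongation(ray)    # above 5: sunray-like
blob_elongation = elongation(blob)  # below 3: compatible with smoke
```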

2.6.2 Spatial-context based false alarms reduction

Most of the cameras used for smoke detection have pan-tilt ability and cover a \(360^{\circ }\) area around the mounting point. In some situations, large water surfaces appear in part of the scenes, where water motion and spray in windy weather could be a trigger for false alarms. It is also possible that clouds, which by shape and chromatic characteristic resemble smoke, could trigger a false detection, as well as partial capture of sun contours on the edge of the image. All potential detections are compared with categorization results. In most cases, categorized regions do not cover the whole area of the image because only those regions that are classified with high certainty factor are used in this phase. Relative positions of the candidate alarm regions and categorized regions in their vicinity are analyzed. Figure 4 shows a possible spatial relation between two regions. These relative relations could be divided into directional and topological. Directional relations describe the relative positions of the regions to each other, e.g. left, above, completely below, mostly right, etc. Topological relations describe the non-directional relations such as near, far, surrounds, excludes, connects, borders, etc. [41]. Absolute positions of the regions are also used, such as left corner, middle of the image, bottom, etc. Relative relation between two regions can be acquired using regions bounding box, center of gravity and relative angle between two regions as Fig. 4 shows.

Fig. 4 The bounding boxes, centers of gravity and relative angle \(\alpha \) define spatial relation between regions R1 and R3

Depending on the relative position of the candidate regions for detection and classified categories in the vicinity, certain rules can be introduced for elimination of specific scenarios. Examples of spatial-context based elimination rules are

figure a1

The set of elimination rules contains all the constraints regarding spatial arrangement of specific region classes. Based on this information, certain categories of false alarms could be dismissed in this phase of the post-detection false alarm reduction. There are possible scenarios where smoke could occur beyond direct vision of the camera and gradually appear in the scene (e.g. behind a mountain). In such cases, the smoke candidate regions would be detected after emerging behind the occluding terrain. Such scenarios are not eliminated as false alarms due to the base-line contact of the emerging region with regions categorized as vegetation or distant landscape. In case of the base-line contact of the region with the mentioned categories, the rejection is not performed. This prevents the elimination of valid alarms that appear out of the context of their origin.
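The directional and topological relations used by the elimination rules can be sketched as below. The sketch covers only how relations between two regions might be computed, not the rules themselves. The bounding-box center stands in for the center of gravity, and the angle convention, the four-way direction split and the `gap` tolerance are all assumptions made for the example.

```python
import math


def bbox_center(bbox):
    """Center of bbox = (x_min, y_min, x_max, y_max) in image coordinates."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_angle(center_a, center_b):
    """Angle in degrees of b as seen from a; 90 means directly above a.

    Image rows grow downward, so the vertical difference is flipped.
    """
    dx = center_b[0] - center_a[0]
    dy = center_a[1] - center_b[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def direction(angle):
    """Coarse directional relation derived from the relative angle."""
    if 45.0 <= angle < 135.0:
        return "above"
    if 135.0 <= angle < 225.0:
        return "left"
    if 225.0 <= angle < 315.0:
        return "below"
    return "right"


def borders(bbox_a, bbox_b, gap=1):
    """Topological relation: boxes touch or overlap within gap pixels."""
    ax0, ay0, ax1, ay1 = bbox_a
    bx0, by0, bx1, by1 = bbox_b
    return (ax0 <= bx1 + gap and bx0 <= ax1 + gap and
            ay0 <= by1 + gap and by0 <= ay1 + gap)


water = (0, 300, 640, 480)        # region categorized as water surface
candidate = (200, 250, 260, 299)  # candidate alarm region
angle = relative_angle(bbox_center(water), bbox_center(candidate))
relation = (direction(angle), borders(water, candidate))
```

A rule set could then combine such relations with the category of the neighboring region, for example to treat a candidate that sits directly on a water surface differently from one in base-line contact with vegetation.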

2.7 The inference engine

The results from all detection phases are used as inputs to the inference engine. Every input value represents the extent of deviation from the referent values for the given input type. All the values are mapped to the interval [0,1], where 1 represents significant deviation from the referent values, and 0 represents no deviation. The mapping function is specific to every input type, e.g. the deviation for the shape elongation parameter is mapped to 0 if the calculated elongation is 2.5, which is the actual referent value, and in case the calculated elongation is greater than 7 the parameter is mapped to 1. Similarly, mapping functions are defined for every input type based on the referent values from the training set and the allowed range. The mapping is performed for every input type except for meteorological information. Meteorological input is used for the adjustment of the detection sensitivity. In the case of weather conditions that imply low fire risk (e.g. excessive humidity implies rainy weather or after-rain period), the sensitivity of the system is adjusted accordingly. Every region detected by the motion detection phase is recorded in the system, and the analysis methods are performed in several iterations. This ensures the consistency of smoke-like behavior. The system uses elimination-based detection, where every input is used as a basis for potential elimination. Significant deviation from the referent value in any of the input parameters results in elimination of the region from the detection process. One example would be the deviation in chromatic characteristics. If the chromatic values of the candidate region fall out of the allowed range predefined in the testing phase of the system, the deviation values would be significant and would act as a trigger for the region elimination. The same rule applies for motion analysis, dynamics analysis and texture analysis. 
The deviation tolerance is additionally adjusted based on the current sensitivity of the system. In the case of weather conditions where fire occurrence is highly unlikely, the tolerance to deviation is decreased by 10 % in all the detection aspects. If the candidate region persists through the confirmation period, an alarm is raised along with the calculated possibility of smoke. The possibility of smoke is calculated as the complement of the average deviation value over all the inputs. Examples of detection images are shown in Figs. 3 and 6.
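The elimination-based inference can be sketched as follows. The elongation mapping endpoints (2.5 maps to 0, values above 7 map to 1) and the 10 % tolerance reduction come from the text. The linear shape of the mapping between those endpoints, the base `tolerance` of 0.8 and the function names are assumptions made for the example.

```python
def map_deviation(value, referent, limit):
    """Map a measurement to [0, 1]: referent -> 0, limit or beyond -> 1."""
    span = abs(limit - referent)
    return min(1.0, abs(value - referent) / span)


def evaluate_region(deviations, tolerance=0.8, low_fire_risk=False):
    """Return the possibility of smoke, or None if the region is eliminated.

    deviations -- dict mapping each input type to its deviation in [0, 1].
    """
    if low_fire_risk:
        tolerance *= 0.9  # tolerance decreased by 10 % in low-risk weather
    if any(d > tolerance for d in deviations.values()):
        return None       # significant deviation in any input eliminates it
    # Possibility of smoke: complement of the average deviation.
    return 1.0 - sum(deviations.values()) / len(deviations)


shape_dev = map_deviation(4.75, referent=2.5, limit=7.0)  # halfway: 0.5
possibility = evaluate_region({"color": 0.1, "texture": 0.2, "shape": 0.3})
borderline = {"color": 0.75, "texture": 0.2, "shape": 0.1}
kept = evaluate_region(borderline)
dropped = evaluate_region(borderline, low_fire_risk=True)
```

The borderline region survives under normal sensitivity but is eliminated once the tolerance is tightened for low-risk weather.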

2.8 Computational complexity

The computational complexity could be obtained through analysis of the computationally significant phases of the algorithm. The most computationally demanding phases are motion detection, wavelet analysis and segmentation based on region merging. Other detection phases could be omitted from computational complexity analysis, since they are not computationally expensive in relation to the mentioned phases.

The motion detection phase can be shown to have complexity of \(O(4N+5M)\), where \(N\) denotes the number of pixels in the image, and \(M\) is the number of foreground pixels. The complexity of the wavelet analysis phase is \(O(8M)\) for four decomposition levels, where only pixels from the candidate regions are taken into account. The segmentation based on region merging can be shown to have complexity of \(O(N\mathrm{log}(g))\), where \(g\) is a constant defined in Sect. 2.4, so the complexity is linear in \(N\). The overall complexity of the algorithm is cumulative complexity of all the phases \(O((4+\mathrm{log}(g))N + 13M) \approx O(k_1N+k_2M)\) which is linear in \(N\) and as such suitable for real-time operation.

3 Evaluation methodology

To evaluate and compare different smoke detection algorithms certain evaluation measures have to be defined. A novel approach to smoke detection quality assessment is introduced through the definition of fuzzy measures. They have been used in evaluation process in combination with the state of the art measures that are best suited for smoke detection.

3.1 General detection quality

Quality measures for such systems can be divided into two categories: evaluation on the global scale and on the local scale. Evaluation on the global scale is performed using classifier results from multiple consecutive images or multiple image sequences. Classifiers are evaluated based on true–false detections at the image level. When evaluating on the local scale, the focus is on detection quality in a single image [42]. A pixel is considered to be the smallest detection unit on the local scale, and a single image is considered to be the smallest detection unit on the global scale. In detection theory, various detector evaluation measures can be defined [43, 44], and for wildfire smoke sensor evaluation seven measures of general detection quality have been used: cd, the sensitivity measure or true positive rate, defined by (13); cr, the specificity measure or true negative rate, defined by (14); md, the false negative rate (complement to sensitivity), defined by (15); fd, the false positive rate (complement to specificity), defined by (16); acc, the accuracy measure, defined by (17); ppv, the precision or positive predictive value, defined by (18); and mcc, the Matthews correlation coefficient [45], defined by (19).

$$\begin{aligned}&\mathrm{cd} = \frac{{|\mathrm{TP}|}}{{|\mathrm{TP}| + |\mathrm{FN}|}} \end{aligned}$$
(13)
$$\begin{aligned}&\mathrm{cr} = \frac{{|\mathrm{TN}|}}{{|\mathrm{FP}| + |\mathrm{TN}|}} \end{aligned}$$
(14)
$$\begin{aligned}&\mathrm{md} = \frac{{|\mathrm{FN}|}}{{|\mathrm{TP}| + |\mathrm{FN}|}} = 1 - \mathrm{cd} \end{aligned}$$
(15)
$$\begin{aligned}&\mathrm{fd} = \frac{{|\mathrm{FP}|}}{{|\mathrm{FP}| + |\mathrm{TN}|}} = 1 - \mathrm{cr} \end{aligned}$$
(16)
$$\begin{aligned}&\mathrm{acc} = \frac{{|\mathrm{TP}| + |\mathrm{TN}|}}{{|\mathrm{TP}| + |\mathrm{TN}| + |\mathrm{FP}| + |\mathrm{FN}|}} \end{aligned}$$
(17)
$$\begin{aligned}&\mathrm{ppv} = \frac{{|\mathrm{TP}|}}{{|\mathrm{TP}| + |\mathrm{FP}|}} \end{aligned}$$
(18)
$$\begin{aligned}&\mathrm{mcc} = \frac{{|\mathrm{TP}| \cdot |\mathrm{TN}| - |\mathrm{FP}| \cdot |\mathrm{FN}|}}{{\sqrt{(|\mathrm{TP}| + |\mathrm{FP}|)(|\mathrm{TP}| + |\mathrm{FN}|)(|\mathrm{TN}| + |\mathrm{FP}|)(|\mathrm{TN}| + |\mathrm{FN}|)} }}, \end{aligned}$$
(19)

where TP is the number of correctly classified positive detections, FP is the number of falsely classified positive detections, TN is the number of correctly classified negative detections and FN is the number of falsely classified negative detections. In the ideal case FP and FN are zero, so an ideal classifier has cd \(=\) 1, fd \(=\) 0, cr \(=\) 1, md \(=\) 0, acc \(=\) 1, ppv \(=\) 1 and mcc \(=\) 1.
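The seven measures in Eqs. (13)–(19) can be computed directly from the four confusion counts. The following is a minimal sketch, not the evaluation code used in this work: the function name `detection_measures` is hypothetical, and returning `None` for a measure whose denominator is zero is an assumption made here for illustration.

```python
import math
from typing import Dict, Optional


def detection_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute cd, cr, md, fd, acc, ppv and mcc from confusion counts.

    A measure with a zero denominator is reported as None
    (an assumption of this sketch, not a convention from the paper).
    """

    def ratio(num: float, den: float) -> Optional[float]:
        return num / den if den != 0 else None

    # Denominator of the Matthews correlation coefficient, Eq. (19).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "cd": ratio(tp, tp + fn),                   # Eq. (13), sensitivity
        "cr": ratio(tn, fp + tn),                   # Eq. (14), specificity
        "md": ratio(fn, tp + fn),                   # Eq. (15), 1 - cd
        "fd": ratio(fp, fp + tn),                   # Eq. (16), 1 - cr
        "acc": ratio(tp + tn, tp + tn + fp + fn),   # Eq. (17), accuracy
        "ppv": ratio(tp, tp + fp),                  # Eq. (18), precision
        "mcc": ratio(tp * tn - fp * fn, mcc_den),   # Eq. (19)
    }


# Ideal classifier: no false positives and no false negatives.
ideal = detection_measures(tp=5, fp=0, tn=7, fn=0)
```

For the ideal case above, cd, cr, acc, ppv and mcc all evaluate to 1 and md and fd evaluate to 0, in line with the values listed in the text.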

3.2 Quality evaluation based on fuzzy logic

Individual pixels can be classified as partially smoke or background, based on their degree of membership in each of these classes. Using fuzzy evaluation, the classification error is determined from the membership difference between the referent (ground-truth) decision and the observer decision regarding individual pixels for each class [42]. The error \(\mathrm{err}_\mathrm{p}\) of an observer for a pixel \(p\) is calculated by

$$\begin{aligned} {\mathrm{err}_\mathrm{p}}(R,O) = \left\{ {\begin{array}{*{20}{l}} {\alpha \cdot R \cdot (R - O)}&\quad {R > O} \\ {O - R}&\quad {O \ge R} \\ \end{array}} \right., \end{aligned}$$
(20)

where \(R\) is the referent fuzzy value for the pixel \(p\), \(O\) is the value given by the classifier, and \(\alpha \) is the parameter defining the cost of the error. This error measure takes into account the type of error as well as its extent. The case where the assessed smoke membership value is greater than the referent ground-truth fuzzy segmentation (\(O>R\)) is called fuzzy false detection. The real cost or impact on the environment of such a scenario is significantly less than in the case of fuzzy missed detection, when \(O<R\). The error cost for fuzzy missed detection increases with the referent fuzzy value for that pixel and with the difference between the observer and referent values. When both values are equal (\(O=R\)), the error is zero. The parameter \(\alpha \) is defined experimentally, and the value \(\alpha = 3\) gives the best cost proportion between the two types of errors. This parameter takes into account the impact of the error on the environment as well as on the practicality of the detection system.
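The asymmetric cost in Eq. (20) can be illustrated with a short sketch. It assumes that the weighted branch applies to fuzzy missed detection (classifier value below the referent value), as the description above indicates, and that both membership values lie in \([0, 1]\). The function name `fuzzy_pixel_error` is hypothetical.

```python
def fuzzy_pixel_error(r: float, o: float, alpha: float = 3.0) -> float:
    """Fuzzy error of one pixel, following Eq. (20).

    r     -- referent (ground-truth) smoke membership, assumed in [0, 1]
    o     -- smoke membership assigned by the classifier, assumed in [0, 1]
    alpha -- cost weight of a missed detection (the text reports alpha = 3)
    """
    if not (0.0 <= r <= 1.0 and 0.0 <= o <= 1.0):
        raise ValueError("membership values are assumed to lie in [0, 1]")
    if o >= r:
        # Fuzzy false detection, or exact agreement (error is zero).
        return o - r
    # Fuzzy missed detection: weighted by alpha and by the referent value.
    return alpha * r * (r - o)


false_detection = fuzzy_pixel_error(r=0.2, o=0.6)   # O - R
missed_detection = fuzzy_pixel_error(r=0.8, o=0.2)  # alpha * R * (R - O)
```

With \(\alpha = 3\), missing strongly smoke-like pixels (large \(R\)) is penalized more heavily than over-detecting by the same margin, which reflects the cost asymmetry described above.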

4 Results

The proposed SCSD method is compared with two existing methods: the method presented in [32], denoted as Method 1, and the method presented in [46], denoted as Method 2. Figure 5 shows comparison flowcharts for all three methods. Method 1 consists of several detection steps. First, a motion detection algorithm is used to obtain moving regions from the image. Motion detection is based on background estimation and recursive thresholding. The next phase is the detection of a decrease in high-frequency content using a spatial wavelet transform. The image is divided into blocks of 8 by 8 pixels and wavelet coefficients are calculated for each block. A single-level discrete wavelet transform is used in the implementation. This is followed by the detection of a decrease in the \(U\) and \(V\) color channels caused by the appearance of smoke in the region. In the next step, the flicker on the smoke boundaries is used as additional information for detection. Finally, shape analysis is performed to determine the convexity of the detected region. The algorithm gives the best results at close range (\({<}100\) m), but is also very reliable at greater distances (\({<}2\) km). The authors have also designed algorithms that deal primarily with long-range smoke detection, such as [47, 48].

Fig. 5 Flowcharts of sequential detection phases for (a) the novel SCSD method, (b) Method 1, and (c) Method 2

Method 2 is based on the detection of changes in the blue channel of the image and on dynamics analysis of the detected regions. The first step of the method is the binning of the image, based on calculations of the average variation in the number of smoke pixels. The mean value of every bin is compared with the referent background value. Background bin values are periodically updated so that significant changes in scene lighting do not trigger false alarms. If the difference in the blue channel between the current and the referent bin is greater than the calculated threshold, the detected bins are declared candidate bins and the confirmation phase is initiated. During the confirmation phase, a predefined minimum number of clustered candidate regions must exist in the scene; otherwise the confirmation phase is interrupted and the process reverts to the pre-detection state. In addition, the detected regions should exhibit gradual growth, so regions that exceed the growth speed specific to smoke are dismissed. Regions satisfying all the detection conditions for a sufficient validation time period are declared smoke regions and a smoke alarm is raised. Evaluation is performed on a set of 2,870 test images from 10 different video sequences [10]. Meteorological data were not used in the evaluation process. Figure 6 shows samples of detection outputs for the same frame in one testing sequence.

Fig. 6 Original input image and final detection outputs for the same frame in one of the testing sequences for all compared methods

The basic type of evaluation is based on the most important global measures, the sensitivity measure (cd) and the specificity measure (cr), shown in Table 1. The results show that none of the methods performs flawlessly, since for all methods \(\mathrm{cd}_\mathrm{gl}\ne 1\) and \(\mathrm{cr}_\mathrm{gl}\ne 1\). Results for the measures \(\mathrm{md}_\mathrm{gl}\) and \(\mathrm{fd}_\mathrm{gl}\) are not presented, since these measures are complementary to \(\mathrm{cd}_\mathrm{gl}\) and \(\mathrm{cr}_\mathrm{gl}\), respectively. Local measures can be used for a more detailed comparison of detector quality when the compared methods have similar results on the global measures.

Table 1 Global measures for all compared methods

Various graphical representations of detection quality measures have been proposed, such as ROC curves [32] or DET curves [44], but they are suitable for analysis when the discrimination threshold is varied. For comparing the quality of detection algorithms, quality graphs [42] are more appropriate (Fig. 7). Quality graphs have the image rank on the x axis and the values of the corresponding quality measure, sorted in increasing order, on the y axis. It is important to emphasize that the image rank is not the same for all quality measures. There are specific situations in which certain measures cannot be calculated. For example, when \(\mathrm{TP} + \mathrm{FN} = 0\), the \(\mathrm{cd}\) measure cannot be calculated due to division by zero. Hence, in Fig. 7 the number of sample images was not 2,870 (the number of images in the test sequences), but 185.

Fig. 7 Quality graph for measures \(\mathrm{cd}_\mathrm{lc},\, \mathrm{cr}_\mathrm{lc},\, \mathrm{acc}_\mathrm{lc}\; \text{ and}\; \mathrm{mcc}_\mathrm{lc}\)
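The construction of a quality graph can be sketched as follows. This is an illustrative sketch only: the function name `quality_graph` is hypothetical, undefined measure values are represented by `None`, and ranks are assumed to start at 1.

```python
from typing import List, Optional, Sequence, Tuple


def quality_graph(values: Sequence[Optional[float]]) -> List[Tuple[int, float]]:
    """Return (image rank, measure value) points of a quality graph.

    Images for which the measure could not be calculated (None) are
    dropped; the remaining values are sorted in increasing order and
    ranked starting from 1.
    """
    defined = sorted(v for v in values if v is not None)
    return list(enumerate(defined, start=1))


# Four images; the measure is undefined for the second one.
points = quality_graph([0.9, None, 0.4, 1.0])
# points == [(1, 0.4), (2, 0.9), (3, 1.0)]
```

Because the sorting is done separately for each measure, the same rank generally refers to different images in different curves, and the number of points can be smaller than the number of test images.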

Table 2 shows average numerical values of local measures for all compared methods.

Table 2 Average local measures for all compared methods

The last quality evaluation of the SCSD method against the other two methods from the literature uses the error measure \(\mathrm{err}_\mathrm{p}(R,O)\) based on fuzzy logic, described in Sect. 3.2. Equation (20) defines the fuzzy quality measure for one pixel, \(\mathrm{err}_\mathrm{p}\), and the average error for one individual image, \(\mathrm{err}_\mathrm{I}\), is calculated by

$$\begin{aligned} {\mathrm{err}_\mathrm{I}} = \frac{1}{N}\sum \limits _{p = 1}^N {{\mathrm{err}_\mathrm{p}}} (R,O), \end{aligned}$$
(21)

where \(N\) represents the number of pixels in the image. As with the positive predictive value, quality graphs are not suitable for this measure, so the average \(\mathrm{err}_\mathrm{I}\) values over the test set are given in Table 2. The target value is 0, so it is evident that the SCSD method has the best results. It is important to emphasize that the evaluation of the fuzzy measure was performed on only one sequence, due to the complexity of gradational hand-segmentation of every image.

Based on the presented evaluation results, it is clear that the proposed method delivers accurate detections when compared with similar detection methods. The evaluation shows that the sensitivity of the proposed method (\(\mathrm{cd}_\mathrm{gl}\)) exceeds the results obtained by the referent methods while retaining the high specificity (\(\mathrm{cr}_\mathrm{gl}\)) of the system. High sensitivity allows for a more prompt reaction, which is a very important characteristic of any detection system, especially one dealing with phenomena that have a great impact on human and ecological safety. The results regarding the accuracy of the systems (\(\mathrm{acc}_\mathrm{lc}\)) show that the proposed method outperforms the referent methods. A similar measure describing the overall detector quality (\(\mathrm{mcc}_\mathrm{lc}\)), which takes into account all detection classification types, also shows the best performance for the proposed method. Finally, the results regarding the cumulative fuzzy error of the systems show that the proposed method generates the lowest fuzzy error when compared to the referent methods.
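The per-image average of Eq. (21) can be sketched as below. The sketch assumes that the referent and classifier memberships are given as flattened sequences of equal length with values in \([0, 1]\), and that the weighted branch of the per-pixel error applies when the classifier value is below the referent value. The function name `fuzzy_image_error` is hypothetical.

```python
from typing import Sequence


def fuzzy_image_error(
    referent: Sequence[float],
    observed: Sequence[float],
    alpha: float = 3.0,
) -> float:
    """Average fuzzy error of one image, following Eq. (21).

    referent -- ground-truth smoke memberships, one value per pixel
    observed -- classifier smoke memberships, one value per pixel
    alpha    -- cost weight of a missed detection (the text reports 3)
    """
    if len(referent) != len(observed):
        raise ValueError("both images must have the same number of pixels")
    if len(referent) == 0:
        raise ValueError("the image must contain at least one pixel")

    total = 0.0
    for r, o in zip(referent, observed):
        # Per-pixel error: false detection cost O - R, or weighted
        # missed detection cost alpha * R * (R - O).
        total += (o - r) if o >= r else alpha * r * (r - o)
    return total / len(referent)


# Four pixels: one slight false detection, two exact, one full miss.
err_image = fuzzy_image_error([0.0, 0.5, 1.0, 1.0], [0.2, 0.5, 1.0, 0.0])
```

A perfect fuzzy segmentation gives an image error of 0, which is the target value mentioned above.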

5 Conclusion

The method is evaluated using measures that provide both global and local evaluation of smoke detection systems. Since smoke is a semi-transparent and amorphous phenomenon, it is hard to determine a clear border between smoke and background pixels, so fuzzy evaluation metrics were also used in the evaluation process to minimize the evaluation error. The proposed SCSD method was compared with two existing methods for smoke detection. The most important indicator of smoke detection system quality is evaluation on the global scale, where the SCSD method outperforms the other compared methods, having the highest sensitivity measure \(\mathrm{cd}_\mathrm{gl}\). A more precise evaluation can be performed using measures evaluated on the local scale, where SCSD gives better results for the measures \(\mathrm{cd}_\mathrm{lc},\, \mathrm{acc}_\mathrm{lc},\, \text{ and}\,\mathrm{mcc}_\mathrm{lc}\). Evaluating all three methods using fuzzy measures also shows that the SCSD method generates the lowest average fuzzy error, taking into account the extent of the error as well as the error type. The overall conclusion is that the newly proposed SCSD method could be appropriate for implementation in a wildfire smoke sensor used in a wildfire monitoring and surveillance system, enabling a prompt reaction in the case of wildfire and thereby minimizing environmental damage.