1 Introduction

Fire poses a major risk to human safety, health, and property. Traditional “point sensor” fire detection technologies based on particle sampling, temperature sampling, and smoke analysis have slow response times, usually measured in minutes, and have little applicability in outdoor environments. To avoid false alarms, smoke and heat detectors only trigger once a sufficient quantity of smoke particles has flowed into the device or the temperature has increased substantially. However, time is a major factor in minimizing the damage caused by a fire: decreasing the response time greatly increases our chances of extinguishing a fire and reducing the damage caused by the incident.

The slow response time of point sensors has led researchers to consider volumetric sensing as an alternative basis for automatic fire detection systems. Detecting smoke and flame in images or video frames grabbed from a vision sensor is one such technique [1]. Vision sensor based methods hold the promise of decreasing the response time, increasing the probability of detection, and providing coverage of large areas, including open areas. Vision sensors can provide information about the direction, size, and growth of fire and smoke [2]. Existing CCTV surveillance systems in factories and public sites can in principle be upgraded at low cost to provide early warnings of a fire using video processing and machine vision techniques to detect flame and smoke.

Çetin et al. [3] provide a recent review of short-range video-based fire detection systems and summarize the existing approaches in terms of the underlying techniques they use. They note that current vision sensor based approaches address both flame and smoke detection, whereas earlier work only investigated flame detection. Since the early research of the late 1990s, significant improvements have been made, leading to commercial products such as VFDS [4], VisiFire [5], and SIGNIFIRE [6], which are now widespread in buildings and outdoor wildfire warning systems. However, these fire detection systems still require human intelligence to distinguish between real fires and false alarms.

In general, vision sensor based fire detection systems consist of three components: motion detection, candidate object segmentation, and flame or smoke blob detection. As the camera is normally assumed to be fixed in such applications, most attempts use some variation on background modeling or image subtraction methods [7] to detect initial motion. Motion detection is followed by candidate object segmentation using color information [8, 9], texture cues [10], pixel variability measures [11], or optical flow field distribution [12]. Flame and smoke blobs may be detected either by modeling flame or smoke in mathematical terms [13–15] or using image separation approaches [16, 17]. After motion detection, morphological image processing techniques are often used as a preprocessing step for region detection. Millan-Garcia et al. [18] present an early fire alarm system using IP cameras that focuses only on smoke detection; motion and smoke detection are carried out in the Discrete Cosine Transform (DCT) domain rather than the pixel domain. Cui et al. [10] and Truong et al. [19] also focus on the detection of smoke for early fire alarms; they use texture and color cues, respectively, for smoke segmentation and machine learning classifiers for the final region classification decision.

Classifiers able to identify regions of interest can be used at a number of stages in the process. A classifier may involve simple image processing based segmentation criteria [20, 21] or more complex approaches involving machine learning [22, 23], Bayesian classification [24], classifiers using a mixture of Gaussians (MoG) model [25], support vector machines [26], Markov random fields [27–30], or neural networks [31, 32]. Classifiers may incorporate features characterizing color, the texture of flame or smoke, and spatial or temporal frequency analysis.

Video evidence of a fire can be characterized by either flame or smoke regions. In either case, the system’s ability to detect fire and smoke will depend on the specific scene depth and camera field of view. However, flame and smoke have different physical properties and dynamic behavior, so a system that can detect both smoke and flame regions has a greater probability of detecting fire earlier than one that only detects one or the other. Yu et al. [33] present a real-time flame and smoke detection algorithm. Initial candidate moving pixels are computed using differential background subtraction, and then flame and smoke color models are used to obtain a decision rule that segments out the flame and smoke regions. Foreground images for flame and smoke are accumulated. Candidate flame regions are declared if a block (an \(8 \times 8\) group of pixels) of the accumulated foreground image has a value greater than a threshold. Smoke features are extracted using optical flow of blocks of pixels of the accumulated foreground image, and smoke candidate regions are classified using a neural network classifier. The authors report a processing rate of 25 frames per second for video at \(320 \times 240\) resolution, but they do not mention the specification of the processor, the response time (the time from the point the fire started to the time the fire is detected), or the false alarm rate for their method.

Some authors have reported empirical evaluation results more thoroughly. Wang et al. [34] present a fire flame detection algorithm that extracts initial flame candidates using motion, texture, and color cues. They then use flame area variation to eliminate false detections. Finally, the authors compare their algorithm’s performance with that of Chen et al. [35] in terms of response time. The authors claim to process a \(320 \times 240\) video at 24 frames per second. However, they do not detect smoke, which results in a slow response time for incidents where smoke is the better early indicator of fire.

In this paper we focus on the use of an RGB camera to detect flame and smoke as an early indicator of fire incidents occurring within a range of one meter to 20 meters from the camera. Our system, called QuickBlaze, is an extension of Rinsurongkawong et al.’s [20] and Malenichev et al.’s [21] methods. We combine the two techniques into parallel streams to achieve a better response time for early fire detection. Flame is detected by a combination of growth rate analysis and Lucas-Kanade pyramidal optical flow analysis [36] on candidate regions segmented out by background subtraction and an RGB color model. To detect smoke, we use turbulence analysis on candidate regions detected after motion and color cues.

We present a comprehensive empirical evaluation of QuickBlaze in terms of response time, processing speed in frames per second (FPS), and frame error rate (FER, for videos that do not contain fire) on a wide variety of videos, including a new set of videos of fire incidents and non-fire events as well as a set of videos available online. For comparison, we choose the commercial-grade software VisiFire [5], a real-time vision-based flame and smoke detection application available for commercial use. It is based on a series of academic research contributions [3, 27, 28, 37, 38] in the area of fire incident detection. Out of 20 test video sequences that contain flame or smoke regions, QuickBlaze’s response time is better for 18 of the sequences. QuickBlaze is able to detect fire incidents in cases (three video sequences) where VisiFire failed to detect the fire incident. The algorithm runs \(2.66\) times faster than VisiFire on the same hardware. The name QuickBlaze is not a trademark; it is simply used for ease of reading the manuscript. All the videos used for testing in the empirical evaluation described in this paper are available online at the AIT Computer Vision Wiki [39].

In summary, the detailed experimental evaluation reported upon in this paper shows that QuickBlaze is faster and has a lower response time to fire incidents than state of the art commercial software. More generally, simultaneous detection of flame and smoke in video holds the promise to increase the probability of detection, decrease false alarm rates, and decrease the response time of fire detection systems.

In the rest of this paper, we first describe our methods (Sect. 2). Then, in Sect. 3, we present and discuss the empirical evaluation. Finally, in Sect. 4, we conclude.

2 Methodology

The general framework of QuickBlaze is shown in Figure 1. We provide details on each block in the following subsections.

Figure 1
figure 1

QuickBlaze framework. Blocks shown in parallel can be executed in parallel on separate cores

2.1 Color Balancing

An essential preprocessing step of our video processing pipeline is color balancing. Color balancing is essential when object segmentation is based on a color model and the system may be deployed under different illumination conditions. Color balancing consists of two steps, estimation of the illumination and chromatic normalization by a scaling factor [40]. We assume that there is a good color distribution in each frame and therefore use a “gray-world” algorithm, one of the simplest illuminant estimation methods. We compute the average of the image intensities in the R, G, and B planes over each frame. The resulting vector of three intensities is known as the “gray-value” for the image. Each R, G, and B plane is then scaled independently using a multiplication factor that normalizes the gray-value to the average intensity of the frame in the R, G, and B planes. If a scaled value is greater than the maximum possible intensity, it is clamped to the maximum. After color balancing, the frames are fed to the independent flame and smoke detection pipelines as shown in Figure 1.
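
As a concrete illustration of this step, the sketch below applies gray-world scaling to an 8-bit BGR frame as delivered by OpenCV, the library used in our implementation (Sect. 3). The function name and the interpretation of the clamping step are ours; treat it as an illustrative sketch rather than the production code.

```cpp
#include <opencv2/opencv.hpp>

// Gray-world color balancing: scale each channel so that its mean matches the
// overall mean intensity of the frame, saturating to the 8-bit range.
// Sketch only; assumes an 8-bit BGR input frame.
cv::Mat grayWorldBalance(const cv::Mat &bgrFrame)
{
    CV_Assert(bgrFrame.type() == CV_8UC3);

    // Per-channel means (B, G, R) and the frame's overall "gray value".
    cv::Scalar channelMean = cv::mean(bgrFrame);
    double gray = (channelMean[0] + channelMean[1] + channelMean[2]) / 3.0;

    std::vector<cv::Mat> planes;
    cv::split(bgrFrame, planes);
    for (int c = 0; c < 3; ++c) {
        double scale = (channelMean[c] > 0.0) ? gray / channelMean[c] : 1.0;
        // convertTo saturates to [0, 255], which implements the clamping step.
        planes[c].convertTo(planes[c], CV_8U, scale);
    }

    cv::Mat balanced;
    cv::merge(planes, balanced);
    return balanced;
}
```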

2.2 Motion Detection

Generally, vision sensors for fire detection are assumed to be fixed at a certain location and orientation, keeping the field of view and the background scene fixed. Motion regions are detected and segmented by extracting the foreground objects using a background model. Let the intensity of pixel \((x,y)\) in frame \(t\) be represented by \(I(x, y; t)\), and let the estimated background intensity of pixel \((x,y)\) in frame \(t\) be denoted by \(B(x,y;t)\). To determine whether the pixel positioned at \((x,y)\) is moving, we first compute \(D_1(x,y;t)\), \(D_2(x,y;t)\), and \(D_3(x,y;t)\), given as

$$\begin{aligned}&D_1(x,y;t) \triangleq I(x,y; t) - I(x,y; t-1),\\&D_2(x,y;t) \triangleq I(x,y; t) - I(x,y; t-2),\\&D_3(x,y;t) \triangleq I(x,y; t) - B(x,y; t). \end{aligned}$$

Let \(F(x,y;t)\) be a binary image that specifies whether pixel \((x,y)\) is apparently moving in frame \(t\). \(F(x,y;t)\) can be defined by

$$\begin{aligned} F(x,y;t) = {\left\{ \begin{array}{ll} 1 &{} {\text {if}} \,\, |D_{1}(x,y;t)| > T(x,y;t) \,\wedge \, |D_{2}(x,y;t)| > T(x,y;t) \,\wedge \, |D_{3}(x,y;t)| > T(x,y;t) \\ 0 &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(1)

where \(T(x,y;t)\) is a per-pixel motion detection threshold. We use the same procedure to obtain \(F(x,y;t)\) for flame and smoke independently, but the background model \(B(x,y;t)\) and the threshold \(T(x,y;t)\) are different for flame and smoke.
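
To make the rule concrete, the following sketch computes the motion mask of Eq. (1) with OpenCV. It assumes single-channel float images for the frames, the background, and the threshold map; the function and variable names are illustrative.

```cpp
#include <opencv2/opencv.hpp>

// Motion mask from Eq. (1): a pixel is flagged as moving only if it differs
// from the previous frame, the frame before that, and the background model
// by more than the per-pixel threshold T. All inputs are CV_32F images of
// identical size. Sketch only.
cv::Mat motionMask(const cv::Mat &I_t, const cv::Mat &I_t1, const cv::Mat &I_t2,
                   const cv::Mat &B, const cv::Mat &T)
{
    cv::Mat d1, d2, d3;
    cv::absdiff(I_t, I_t1, d1);   // |D1| = |I(t) - I(t-1)|
    cv::absdiff(I_t, I_t2, d2);   // |D2| = |I(t) - I(t-2)|
    cv::absdiff(I_t, B,    d3);   // |D3| = |I(t) - B(t)|

    // F(x,y;t) = 1 where all three differences exceed the threshold.
    cv::Mat F = (d1 > T) & (d2 > T) & (d3 > T);   // 8-bit mask, 255 where true
    return F;
}
```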

2.2.1 Dynamic Background Model and Adaptive Motion Threshold for Flame

For flame, the background is dynamically updated for pixels considered stationary using the update rule

$$\begin{aligned} B(x,y; t+1) = {\left\{ \begin{array}{ll} \alpha B(x,y; t) + (1-\alpha ) I(x,y; t) &{} {\text {if}} \,\, F(x,y;t) = 0 \\ B(x,y; t) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(2)

The threshold \(T(x, y; t)\) for flame is adaptive, updated according to two cases:

$$\begin{aligned} T(x,y; t+1) = {\left\{ \begin{array}{ll} \alpha T(x,y; t) + (1-\alpha )|I(x,y;t) - B(x,y;t+1) | &{} {\text {if}}\; F(x,y;t) = 0 \\ T(x,y; t) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(3)

The update parameter \(\alpha \in (0,1)\) controls how quickly the background and threshold adapt (values close to one give slow adaptation); we use the same \(\alpha \) for the adaptive threshold and the background model update.
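
A minimal sketch of the update rules (2) and (3), assuming, as the text above indicates, that only stationary pixels are updated. The matrices are 32-bit float and F is the 8-bit mask from the motion step; names are illustrative.

```cpp
#include <opencv2/opencv.hpp>

// Running update of the flame background model B and adaptive threshold T
// (Eqs. 2 and 3). Pixels flagged as moving in F keep their previous B and T.
// B, T, I are CV_32F; F is the 8-bit motion mask. Sketch only.
void updateBackgroundAndThreshold(cv::Mat &B, cv::Mat &T, const cv::Mat &I,
                                  const cv::Mat &F, double alpha)
{
    cv::Mat stationary = (F == 0);            // update only where no motion was detected

    cv::Mat newB = alpha * B + (1.0 - alpha) * I;
    newB.copyTo(B, stationary);               // B(t+1)

    cv::Mat diff;
    cv::absdiff(I, B, diff);                  // |I(t) - B(t+1)|
    cv::Mat newT = alpha * T + (1.0 - alpha) * diff;
    newT.copyTo(T, stationary);               // T(t+1)
}
```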

2.2.2 Static Background Model and Static Threshold for Smoke

Smoke pixels are generally more sparse than fire pixels, and the rate of change of the intensity of smoke pixels per frame is slower than that of flame. Regularly updating the background model suppresses detection of slow-moving smoke pixels as foreground. To address this problem, we use a static background model \(B(x,y; t)\) for smoke detection that is computed at the start of video processing, and an empirical constant \(T(x, y; t) = \tau \) for the threshold. The parameter \(\tau \) can be determined through experimental evaluation on training videos.

2.3 Chromatic Filtering

The color of flame is generally in the red-yellow or reddish range, while smoke, at least at the start of a fire, has a color that varies from bluish white to white. We therefore use separate color models to detect flame and smoke regions. Pixels are only considered as candidate members of potential flame or smoke regions if they are marked as being in motion according to the motion map \(F(x,y;t)\) described in Sect. 2.2 and then survive the chromatic filtering step for flame or smoke.

2.3.1 Flame Color Model

To find the candidate fire pixels we follow the method of Chen et al. [9]. The first filter selects pixels that have an appropriate hue in the HSI color space, eliminating from consideration any pixels outside the red-yellow range of \(0^\circ \) to \(60^\circ \) for H. Let the intensity of a pixel under consideration in the red, green, and blue channels be R, G, and B respectively. This hue filter corresponds to the RGB region where \(R \ge G\) and \(G > B\). The consequence of the filter is that, for all selected pixels, in the RGB color space,

$$\begin{aligned} R \ge G > B. \end{aligned}$$

To prevent selection of reddish pixels with low brightness and low saturation, we further filter out pixels with low red levels and low saturation levels. For the red level, we apply a simple threshold:

$$\begin{aligned} R > R_T, \end{aligned}$$

where \(R\) is the red level of the pixel in question and \(R_T\) is an empirical threshold for the red level that is determined by calibrating on training videos. For saturation, we apply a slightly more complicated rule:

$$\begin{aligned} S \ge (255 - R) \times \frac{S_T}{R_T}. \end{aligned}$$

The set of valid saturation and red levels according to this rule is shown in Figure 2. As a final step, following Chen et al., we filter out any candidate fire pixels with intensity below a threshold \(I_T\). As with all other parameters, we set the values of \(R_T\), \(S_T\), and \(I_T\) empirically using training videos.

Figure 2
figure 2

Filtering for candidate flame pixels based on red level and saturation level
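
The chromatic filter can be summarized in a few lines of per-pixel logic. The sketch below assumes 8-bit BGR input and a saturation value scaled to \([0, 255]\); the exact saturation scaling and the thresholds \(R_T\), \(S_T\), and \(I_T\) are calibration details, so treat the code as illustrative only.

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

// Candidate flame pixels: R >= G > B, red level above R_T, saturation above
// the red-dependent line of Chen et al., and intensity above I_T.
// Sketch only; thresholds are placeholders calibrated on training videos.
cv::Mat flameColorMask(const cv::Mat &bgr, double R_T, double S_T, double I_T)
{
    CV_Assert(bgr.type() == CV_8UC3);
    cv::Mat mask(bgr.size(), CV_8U, cv::Scalar(0));

    for (int y = 0; y < bgr.rows; ++y) {
        const cv::Vec3b *row = bgr.ptr<cv::Vec3b>(y);
        for (int x = 0; x < bgr.cols; ++x) {
            double B = row[x][0], G = row[x][1], R = row[x][2];
            double I = (R + G + B) / 3.0;                                 // HSI intensity
            double maxc = std::max(R, std::max(G, B));
            double minc = std::min(R, std::min(G, B));
            double S = (maxc > 0.0) ? 255.0 * (1.0 - minc / maxc) : 0.0;  // saturation in [0,255] (assumed scaling)

            bool hueOk = (R >= G) && (G > B);                    // red-yellow hue range
            bool redOk = (R > R_T);                              // red level threshold
            bool satOk = (S >= (255.0 - R) * S_T / R_T);         // saturation rule
            bool intOk = (I > I_T);                              // intensity threshold

            if (hueOk && redOk && satOk && intOk)
                mask.at<uchar>(y, x) = 255;
        }
    }
    return mask;
}
```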

2.3.2 Smoke Color Model

For the smoke color model, we used the rule presented by Celik et al. [41]:

$$\begin{aligned} |R - G|< T \,\wedge \, |G - B|< T \,\wedge \, |R - B| < T, \end{aligned}$$

where threshold \(T\) is adjusted for good performance on the training video set.
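
A sketch of this rule applied to a whole frame is shown below, again assuming 8-bit BGR input; the threshold \(T\) and the function name are illustrative.

```cpp
#include <opencv2/opencv.hpp>

// Grayish-smoke test: a pixel is a smoke candidate when its R, G, and B
// values are close to one another (within threshold T). Sketch only.
cv::Mat smokeColorMask(const cv::Mat &bgr, int T)
{
    CV_Assert(bgr.type() == CV_8UC3);
    std::vector<cv::Mat> ch;
    cv::split(bgr, ch);                       // ch[0]=B, ch[1]=G, ch[2]=R

    cv::Mat dRG, dGB, dRB;
    cv::absdiff(ch[2], ch[1], dRG);           // |R - G|
    cv::absdiff(ch[1], ch[0], dGB);           // |G - B|
    cv::absdiff(ch[2], ch[0], dRB);           // |R - B|

    return (dRG < T) & (dGB < T) & (dRB < T); // 255 where all three hold
}
```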

2.4 Morphological Image Processing

After basic filtering according to motion and color cues, the resulting candidate regions are typically noisy. We perform morphological image processing on both the flame and smoke pixel maps to remove noise and extract final candidate regions. We first perform basic operations to close holes and remove small noise elements. We then obtain connected components and eliminate small components according to the length of the region’s perimeter. Finally, we get a chain code for each surviving candidate fire and smoke region separately.

2.4.1 Candidate Flame Regions

On the potential flame pixels, we perform morphological closing and then opening to eliminate small holes and connect nearby regions. Since fire normally moves and spreads vertically, we use a specifically tuned \(5\times 5\) vertical structuring element for closing and a standard \(3\times 3\) rectangular structuring element for opening.

2.4.2 Candidate Smoke Regions

Candidate smoke pixel regions tend to be more sparse than candidate flame pixel regions. Instead of using opening and closing, we use dilation and erosion (with a \(3\times 3\) rectangular structuring element) on the probable smoke pixels to remove noise.
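
Both cleanup passes map directly onto OpenCV's morphology primitives, as sketched below. Since the exact tuned shape of the \(5\times 5\) vertical element is not specified above, the sketch approximates it with a \(5\times 5\) kernel whose center column is set; everything else follows the text.

```cpp
#include <opencv2/opencv.hpp>

// Flame mask (Sect. 2.4.1): closing with a vertical element, then opening
// with a 3x3 rectangle.
void cleanFlameMask(cv::Mat &flameMask)
{
    // 5x5 "vertical" element: only the center column is set (an assumption;
    // the paper's tuned element may differ).
    cv::Mat vert = cv::Mat::zeros(5, 5, CV_8U);
    vert.col(2).setTo(cv::Scalar(1));
    cv::Mat rect3 = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));

    cv::morphologyEx(flameMask, flameMask, cv::MORPH_CLOSE, vert);
    cv::morphologyEx(flameMask, flameMask, cv::MORPH_OPEN, rect3);
}

// Smoke mask (Sect. 2.4.2): dilation followed by erosion with a 3x3 rectangle.
void cleanSmokeMask(cv::Mat &smokeMask)
{
    cv::Mat rect3 = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::dilate(smokeMask, smokeMask, rect3);
    cv::erode(smokeMask, smokeMask, rect3);
}
```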

2.5 Localization of Candidate Regions

A frame may contain multiple candidate flame and smoke regions. Each candidate region should be tracked and localized separately in order to analyze growth rate and flow rate in the case of flame regions, or turbulence in the case of smoke regions. When tracking regions from frame to frame, each region can be classified as either newly born or a previously existing region. To perform this classification, we logically AND the masks (obtained after motion filtering, color filtering, and morphological processing) for the current and previous frames. If all the values for a given foreground region evaluate to 0, the region is classified as newly born. If, on the other hand, some values for the given foreground region are nonzero, the region is associated with the previously detected region. For both smoke and flame, a candidate region must survive (be associated with a region in the previous frame) for some number of frames before being declared evidence of an actual fire (see the next subsection for details).
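
The newborn-versus-existing test amounts to a mask intersection, as sketched below with illustrative names; the per-region bookkeeping (ages, identities) needed for the survival criterion is only hinted at in the comments.

```cpp
#include <opencv2/opencv.hpp>

// Decide whether a candidate region in the current frame continues a region
// from the previous frame, by AND-ing the region's mask with the previous
// frame's foreground mask. Sketch only.
bool isPreviouslySeen(const cv::Mat &currentRegionMask, const cv::Mat &previousForeground)
{
    cv::Mat overlap;
    cv::bitwise_and(currentRegionMask, previousForeground, overlap);
    // Non-zero overlap: the region existed in some form in the previous frame.
    // A region must then survive for an empirically chosen number of
    // consecutive frames before being reported as flame or smoke.
    return cv::countNonZero(overlap) > 0;
}
```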

2.6 Smoke Detection

For smoke, we use simple motion, color, and connectivity rules to finalize the set of detected regions, as detailed below.

2.6.1 Absolute Color Change

When smoke appears in a scene, it tends to cause fading of the color of the object behind it, until it completely covers that object. This color change, or increase in smoke opacity, does not happen instantly but gradually as the smoke density increases. When a smoke candidate region is detected, due to the morphological processing in the previous step, the region may include not only actual smoke pixels but also neighboring background or object pixels. This leads us to define a criterion for the presence of smoke: let \((\delta R, \delta G, \delta B)\) be the absolute change in the color of a pixel from the previous frame to the current frame. We define a threshold \(C_{\max }\) such that, to be considered part of a smoke region, the pixel must satisfy

$$\begin{aligned} \delta R < C_{\max } \wedge \delta G < C_{\max } \wedge \delta B < C_{\max }. \end{aligned}$$

As with all other parameters, we fix \(C_{\max }\) for good performance on a set of training videos.
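
This per-pixel test is analogous to the chromatic rules above, but applied to the frame-to-frame change at each pixel rather than to a single frame. A sketch assuming 8-bit BGR frames; \(C_{\max }\) and the function name are placeholders.

```cpp
#include <opencv2/opencv.hpp>

// Keep only pixels whose color changed by less than C_max in every channel
// between the previous and current frame (gradual fading, as smoke causes).
cv::Mat gradualChangeMask(const cv::Mat &prevBGR, const cv::Mat &currBGR, int C_max)
{
    std::vector<cv::Mat> prev, curr;
    cv::split(prevBGR, prev);
    cv::split(currBGR, curr);

    cv::Mat dB, dG, dR;
    cv::absdiff(curr[0], prev[0], dB);
    cv::absdiff(curr[1], prev[1], dG);
    cv::absdiff(curr[2], prev[2], dR);

    return (dR < C_max) & (dG < C_max) & (dB < C_max);
}
```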

2.6.2 Turbulence Rate Analysis

Smoke emanating from a fire is warmer and lighter than surrounding air and therefore tends to move upwards. The shape of a smoke region’s projection onto the image plane is therefore complex; furthermore, its perimeter will change more abruptly than its area. This property can be modeled mathematically to more accurately detect possible smoke regions.

Let the perimeter of a candidate smoke blob be \(P(t)\), and let the area of the blob be \(A(t)\). We define a ratio \(\Omega (t)\) to represent the irregularity of the blob at time \(t\):

$$\begin{aligned} \Omega (t) = \frac{P(t)}{2\sqrt{\pi A(t)}}. \end{aligned}$$

We can characterize the turbulence of the blob [42] as the derivative of the irregularity with respect to time:

$$\begin{aligned} \frac{d\Omega }{dt} = \frac{2\frac{dP}{dt}A(t) - P(t)\frac{dA}{dt}}{4\sqrt{\pi A^{3}(t)}}. \end{aligned}$$

\(\frac{d\Omega }{dt}\) increases with the complexity of the shape. To increase our level of confidence in declaring a candidate region as smoke, we maintain a cumulative sum of \(\frac{d\Omega }{dt}\) and \(A(t)\) for \(\Delta _t\) frames. If the cumulative sum of \(\frac{d\Omega }{dt}\) for a region is greater than \(\Omega _{threshold}\), and \(A(t)\) is greater than \(\Delta _A\), we classify the region as a smoke region. The thresholds are empirically calculated on the training dataset.
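
The bookkeeping for this test is small enough to show in full. The sketch below tracks \(\Omega \) for one blob contour across frames and accumulates its frame-to-frame change; the threshold names, the window handling, and the use of the current-frame area \(A(t)\) in the final test reflect our reading of the description above rather than the exact implementation.

```cpp
#include <cmath>
#include <opencv2/opencv.hpp>

// Per-blob state for turbulence analysis (Sect. 2.6.2).
struct SmokeBlobState {
    double prevOmega  = 0.0;   // Omega at the previous frame
    double omegaAccum = 0.0;   // cumulative sum of per-frame changes in Omega
    int    frames     = 0;     // number of frames accumulated
};

// Update the state with the blob's contour for the current frame and report
// whether the blob now qualifies as smoke. Thresholds are placeholders tuned
// on training data.
bool updateAndClassify(SmokeBlobState &s, const std::vector<cv::Point> &contour,
                       int windowFrames, double omegaThreshold, double minArea)
{
    double P = cv::arcLength(contour, /*closed=*/true);   // perimeter P(t)
    double A = cv::contourArea(contour);                  // area A(t)
    if (A <= 0.0) return false;

    double omega = P / (2.0 * std::sqrt(CV_PI * A));       // irregularity Omega(t)
    if (s.frames > 0)
        s.omegaAccum += omega - s.prevOmega;               // discrete dOmega/dt
    s.prevOmega = omega;
    ++s.frames;

    return s.frames >= windowFrames && s.omegaAccum > omegaThreshold && A > minArea;
}
```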

2.7 Flame Detection Algorithm

Similar to the way in which we use pixel color change and turbulence analysis to eliminate false positive smoke regions, we can eliminate false positive flame regions when their behavior is inconsistent with the physics of flame. Fire is characterized by turbulent flames. As shown in Figure 3, as air is heated by a fire, since the density of a gas is inversely proportional to its temperature, a plume will rise above the burning object, causing upward motion. As air in the hot plume is cooled by the surrounding air, its buoyancy decreases, causing it to cease rising and start falling. As cool air is induced to flow into the fire plume (a process called entrainment), eddies form, creating rising vortices and turbulence. We use two heuristics, growth rate analysis and flow rate analysis, to characterize the dynamics typical of flame regions and to filter out regions unlikely to contain flame.

Figure 3
figure 3

Buoyancy and fire plumes

2.7.1 Growth Rate Analysis

As explained in the previous section, a burning fire will generally expand upward and outward, depending on the air flow and fuel type. Our growth rate analysis method exploits this feature of growing fires. When a candidate fire connected component is initially detected, we record its bounding box for the frame in which it is first detected. We then extract growth rate information from the next \(n\) frames, measuring growth separately in the left, right, upward, and downward directions. Only candidate regions that grow more in the upward direction than in the other three directions are finally considered fire regions. Our method of measuring growth in each frame is as follows. In the first frame subsequent to detection of the region, we initialize a search region obtained from the original bounding box by expanding the bounding box by one pixel in all four directions and then removing the pixels corresponding to the original bounding box. On each subsequent frame, we expand the search region by one pixel. In every frame, we count candidate fire pixels separately for the left, right, upward, and downward directions. We take the summation of the number of candidate fire pixels in the region over all \(n\) frames as a measure of how much the candidate fire blob has grown in the respective direction from the frame in which it was originally detected. Finally, at frame \(n\), we only retain as candidate fire regions those components whose growth in the upward direction is greater than their growth in the left, right, and downward directions.
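
The directional counting can be sketched as follows, using one expanding band per side of the original bounding box. Attributing the corner areas to the up and down bands is a simplification of the exact search-region construction described above, and all names are illustrative.

```cpp
#include <opencv2/opencv.hpp>

struct GrowthCounters { long up = 0, down = 0, left = 0, right = 0; };

// Count candidate fire pixels in a band around one side of the original
// bounding box, clipped to the image. Returns 0 for empty bands.
static long countInBand(const cv::Mat &fireMask, cv::Rect band)
{
    band &= cv::Rect(0, 0, fireMask.cols, fireMask.rows);
    return band.area() > 0 ? cv::countNonZero(fireMask(band)) : 0;
}

// Accumulate directional growth for one frame; m is the number of frames
// elapsed since first detection (the band grows by one pixel per frame).
void accumulateGrowth(GrowthCounters &g, const cv::Mat &fireMask,
                      const cv::Rect &box, int m)
{
    g.up    += countInBand(fireMask, cv::Rect(box.x - m, box.y - m, box.width + 2 * m, m));
    g.down  += countInBand(fireMask, cv::Rect(box.x - m, box.y + box.height, box.width + 2 * m, m));
    g.left  += countInBand(fireMask, cv::Rect(box.x - m, box.y, m, box.height));
    g.right += countInBand(fireMask, cv::Rect(box.x + box.width, box.y, m, box.height));
}

// After n frames, keep the region only if upward growth dominates.
bool growsUpward(const GrowthCounters &g)
{
    return g.up > g.down && g.up > g.left && g.up > g.right;
}
```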

2.7.2 Flow Rate Analysis

Flow rate analysis is based on the hypothesis that image regions containing turbulent flames generate a large amount of flow compared to more rigid moving objects.

We use the intensity-based pyramidal Lucas-Kanade optical flow method with Shi and Tomasi keypoint features [43] to estimate motion in a candidate flame region from frame to frame. Lucas-Kanade first solves for the optical flow at the top of the pyramid, then at each layer below, it uses the motion estimates from the previous layer as a starting point. The method continues down the pyramid until it reaches the lowest level. Pyramidal Lucas-Kanade estimates a motion velocity vector for each feature point in the region of interest. The results at time \(t\) are

$$\begin{aligned} P^{(t)}&= \{(p^{(t)}_{xi}, p^{(t)}_{yi})\}_{i \in 1, \ldots , m^{(t)}} \\ Q^{(t)}&= \{(q^{(t)}_{xi}, q^{(t)}_{yi})\}_{i \in 1, \ldots , m^{(t)}}, \end{aligned}$$

where each \(\mathbf {p}_i^{(t)}\) and \(\mathbf {q}_i^{(t)}\) denote the starting and ending point of a feature point, respectively. \(m^{(t)}\) is the total number of feature points tracked from frame \(t-1\) to frame \(t\). We will use the velocity vectors \(\mathbf {q}_i^{(t)}-\mathbf {p}_i^{(t)}\) to calculate two features of the overall flow rate. The first feature is the average flow from frame 0 to frame 1:

$$\begin{aligned} F_0 = \frac{1}{m^{(1)}}\sum _{i=1}^{m^{(1)}}\sqrt{(p_{yi}^{(1)}-q_{yi}^{(1)})^2 + (p_{xi}^{(1)}-q_{xi}^{(1)})^2} \; . \end{aligned}$$
(4)

\(F_0\) is used as a reference value for the subsequent \(n-1\) frames:

$$\begin{aligned} F(t) = \frac{1}{m^{(t)}}\sum _{i=1}^{m^{(t)}}\sqrt{(p_{yi}^{(t)}-q_{yi}^{(t)})^2 + (p_{xi}^{(t)}-q_{xi}^{(t)})^2} \; . \end{aligned}$$
(5)
$$\begin{aligned} F_v = \frac{1}{n-1}\sum _{t=2}^n(F(t) - F_0), \end{aligned}$$
(6)

\(F_v\) is the average growth in the flow rate, relative to the reference \(F_0\), over a window of \(n\) frames. We expect the flow rate of turbulent fire regions to grow faster than that of other moving objects. We fix a threshold \(F_T\) for the flow rate growth and require \(F_v > F_T\) to declare a candidate flame region a fire region. \(F_T\) is computed using training video data.
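
The two flow features can be computed with OpenCV's Shi-Tomasi detector and pyramidal Lucas-Kanade tracker as sketched below. The corner-detection parameters are illustrative defaults rather than the calibrated values, and the region mask is assumed to be a single-channel 8-bit image of the same size as the frame.

```cpp
#include <cmath>
#include <opencv2/opencv.hpp>

// F(t): mean displacement magnitude of Shi-Tomasi corners tracked inside a
// candidate flame region from the previous frame to the current one.
// Sketch only; parameter values are illustrative.
double averageFlow(const cv::Mat &prevGray, const cv::Mat &currGray, const cv::Mat &regionMask)
{
    std::vector<cv::Point2f> p, q;
    cv::goodFeaturesToTrack(prevGray, p, /*maxCorners=*/100, /*qualityLevel=*/0.01,
                            /*minDistance=*/3, regionMask);
    if (p.empty()) return 0.0;

    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, p, q, status, err);

    double sum = 0.0;
    int m = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        if (!status[i]) continue;                 // keep only successfully tracked points
        double dx = q[i].x - p[i].x, dy = q[i].y - p[i].y;
        sum += std::sqrt(dx * dx + dy * dy);
        ++m;
    }
    return m > 0 ? sum / m : 0.0;
}

// F_v over a window (Eq. 6): flows[0] plays the role of the reference F_0 (Eq. 4).
double flowGrowth(const std::vector<double> &flows)   // flows[t] = F(t)
{
    if (flows.size() < 2) return 0.0;
    double sum = 0.0;
    for (size_t t = 1; t < flows.size(); ++t) sum += flows[t] - flows[0];
    return sum / static_cast<double>(flows.size() - 1);
}
```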

3 Experimental Results

We implemented the algorithm pipeline defined in Sect. 2 for fire detection using an Intel Core 2 Quad CPU Q9550 with four cores, running at 2.83 GHz with 4 GB RAM and Ubuntu Linux as the operating system. The current implementation is sequential, but since the two processing streams are in fact independent, they can in principle easily be run on separate cores after the color balancing step. The software was written in C and C++ with the help of the OpenCV 2.4 library. The focus of our experiments is to analyze the ability of the system to detect fires as early as possible after the burning process has begun, so as key indicators of the algorithm’s effectiveness we measure the response time, the frame processing rate, and the frame error rate. We manually adjusted the threshold parameters (Table 1) to enable the system to perform well on a set of three training videos, which contains two videos with flame and smoke regions and one distracter video containing an orange balloon.

After training, the system was tested on the videos listed in Table 2 without any modification to the parameters. Of course, some further calibration might be necessary to detect flames or smoke with characteristics very different from those in the videos tested, but the manually adjusted parameter settings proved suitable without modification for this set of test videos, including those with orange distracter objects, moving people and cars, and so on. Sample frames from each of the training videos are shown in Figure 4.

Figure 4
figure 4

Sample frames from training videos used for adjusting thresholds for QuickBlaze. T1 has 7343 frames, T2 has 1607 frames, and T3 has 3383 frames. T1 and T3 contain both fire and smoke regions, while T2 contains no fire or smoke but does contain an orange balloon as a distracter (Color figure online)

Table 1 Threshold Parameters Adjusted for Good Performance on Training Videos

For testing we collected a total of 30 videos, 20 of which contain flame and/or smoke and 10 of which are distracters containing no flame or smoke. M-1, M-2, M-3, M-4, M-8, and M-22 were created by the authors [20]. M-5, M-6, M-9, M-10, M-11, M-12, M-13, M-14, M-21, M-23, M-24, and M-29 are from the VisiFire website [5]. M-7 was taken from NIST [44]. M-16, M-17, M-18, M-19, M-25, M-26, M-27, and M-28 were taken from Malenichev et al. [21]. To verify that our algorithm runs in real time, we created videos M-20 and M-30 using an IQEye IP camera with a resolution of \(720 \times 480\) pixels at 15 frames per second and ran our algorithm on the live streams as the videos were being recorded. M-1 through M-15 and M-20 contain both flame and smoke, while M-16 to M-19 only contain smoke. M-21, M-25, M-27, and M-28 contain distracter objects likely to be classified as smoke. M-22, M-24, M-29, and M-30 contain distracter objects likely to be classified as flame. M-23 has both flame-like and smoke-like objects. M-1, M-3, M-4, and M-22 are similar to the training videos, but they were created at different times with different target objects. Detailed descriptions of each video are given in Table 2, and a sample frame from each video that contains flame and smoke is shown in Figure 5. We selected the first frame containing flame or smoke according to human observation of the incident. Sample frames from the test videos that do not contain flame or smoke are shown in Figure 6.

Table 2 Video Sequences Selected for Testing
Figure 5
figure 5

Sample frames from test videos that contain flame or smoke. In each case, the first frame containing flame or smoke according to human observations is shown. Although M-1, M-3, and M-4 are similar to videos in the training set, they were acquired at different times with different target objects

Figure 6
figure 6

Arbitrary sample frames from test video sequences without any flame or smoke regions. Although M-22 is similar to one of the training videos, it was acquired at a different time with different target objects

In our implementation, to speed up total compute time, every frame is downsampled to \(320 \times 240\) as a preprocessing step. For the ground truth, the first author manually identified the first frame of each video in which evidence of fire (flame or smoke) was visible. This manual identification process is tedious, and the precise localization of the first frame would no doubt differ for different people, especially for videos that contain faint smoke regions before flame appears. However, as we shall see, the ground truth frame numbers are always lower than those identified by the machine vision algorithms. Since any human error affects both of the tested algorithms equally, it can be ignored.

To demonstrate the effectiveness of combining the two methods in parallel streams, we compare the performance of our combined approach with that of the individual methods described in [20] and [21]. In Table 3, we compare the combined approach to the individual methods in terms of fire detection time, and the data clearly show that the combined approach is better in terms of detection time than either of the individual methods alone. In Table 4, we compare the combined approach to the individual methods in terms of false alarm rates, and the data clearly show that the improvement in response time comes at the cost of a small increase in false alarm rates, as the false alarms for the combined system will necessarily be the union of the false alarms for the individual systems.

Table 3 Experimental Results for Videos Containing Fire and/or Smoke
Table 4 Experimental Results for Distracter Videos (Videos without any Fire or Smoke)

For a comparative evaluation of our machine vision detection algorithm, we contrast our results with those obtained from VisiFire [5], a Windows-based program that is available for commercial use with a fee and for free limited academic use on request. VisiFire contains several different fire analyzers, including a fire and smoke detector, a forest fire detector, and an infrared sensor-based fire detector. We chose VisiFire because it runs in real time and has evolved through a series of academic research contributions [3, 27, 28, 37, 38]. In the program, we selected fire and smoke detection and used the default settings for the parameters available. The detailed settings are shown in Figure 7. We set the processing size for each video frame to \(320 \times 240\), the same as used by our algorithm. The S_T parameter shown in Figure 7 represents the threshold number of neighboring blocks with evidence of fire necessary to trigger a fire alarm. The software marks likely flame blocks with red squares and likely smoke blocks with dark blue squares. Although the VisiFire executable was run on Windows and our method was run on Ubuntu, the test video sequences were identical, and the runs were performed on the same hardware. VisiFire displays the frame number and number of frames currently being processed per second on its display, so we could easily record the first frame in which fire was detected and the overall frame processing rate by observing this display.

Figure 7
figure 7

Settings used for VisiFire fire and smoke analyzer in the experimental comparison

Table 5 Experimental Results for Videos Containing Fire and/or Smoke

Comparisons of VisiFire and QuickBlaze for response time, frames per second, and false alarms are shown in Tables 5 and 6.

In terms of speed of processing, although the frame rate varies from video to video based on the number of candidate fire and smoke regions, the proposed method runs faster than VisiFire on every video, with a total overall speedup of 2.66.

In terms of accuracy, we observe that on 18 of the 20 videos that contain flame and/or smoke, fire detection using our proposed method results in better response time than using VisiFire.

In general, we observe that detectability of a fire incident is characterized by the visibility of flame and/or smoke, which depends upon the surrounding environment, the illumination of the scene, and the camera’s field of view. We now take a closer look at results on specific videos.

Figure 8 compares detection of fire based on smoke regions by our method and by VisiFire. The first row shows the first frame of video sequences M-1, M-3, M-4, and M-15, respectively, in which a fire incident was first detected by our method, and the second row shows the first frame in the corresponding video in which VisiFire detected fire, if it did. VisiFire detected smoke regions in M-1 and M-3 (albeit much later than our method), a flame region in M-4, and did not detect any fire incident in M-15 (see Table 5 for details for all fire videos).

Figure 9 shows a similar comparison of fire detection based on flame regions. The first row shows the first frame of video sequences M-7, M-8, M-12, and M-14, respectively, in which a fire incident was first detected by our method, and the second row shows the first frame in the corresponding video in which VisiFire detected fire, if it did. VisiFire detected smoke in M-7, flame in M-8, and did not detect a fire incident in M-12 or M-14. Clearly, our method detects flame regions earlier than VisiFire in these videos. QuickBlaze tends to work very well when illumination is dim relative to any flame and when clutter is low with little background contrast.

Table 6 Experimental Results for Distracter Videos (Videos Without any Fire or Smoke)
Figure 8
figure 8

First frame in which fire is detected through smoke detection by QuickBlaze. The first row shows the first frame in video sequences M-1, M-3, M-4, and M-15 in which the fire incident was detected by the proposed method’s smoke region detector. The second row shows the first frame in the same sequences in which the fire incident was detected by VisiFire. VisiFire failed to detect any fire incident in video M-15

Figure 9
figure 9

First frame in which fire is detected through flame detection by QuickBlaze. The first row shows the first frame in video M-7, M-8, M-12, and M-14 in which the fire incident was detected by the proposed method’s flame region detector. The second row shows the first frame in the same sequences in which the fire incident was detected by VisiFire. VisiFire failed to detect any fire incident in videos M-12 and M-14

In two cases, VisiFire detected flame or smoke regions earlier than the proposed method. See Figure 10 for a visual comparison of the frames in which fire was first detected by the proposed method and VisiFire. In video M-6, the flame is quite transparent, so the background shows through, reducing our method’s ability to detect it. The method does detect the smoke given off by the fire, but later than VisiFire, which is able to detect the flame early on. In the second case, VisiFire rapidly detects the thick, opaque smoke initially coming from the smoke bomb, whereas our method performs better on thin, wispy smoke, resulting in later detection once the smoke is more diffuse.

Figure 10
figure 10

First frame in which fire was detected by QuickBlaze and VisiFire on two videos for which QuickBlaze’s response time was slower than VisiFire’s

We also observe that our algorithm precisely localizes fire incidents in all of the test video sequences, both in daytime and at nighttime. This is shown in Figure 11. The first row shows the first frame in which a fire incident is detected by our method in nighttime videos M-9, M-10, and M-13, while the second row shows the VisiFire result for the corresponding sequences. Visual comparison clearly shows that the proposed method localizes the fire incident better than does VisiFire.

Figure 11
figure 11

The first frame in which fire incidents were detected in nighttime video sequences M-9, M-10, and M-13. QuickBlaze detected flame regions, while VisiFire detected smoke regions. Note the better localization of the fire by the proposed method

Although the best possible threshold parameters may vary with the type of fire or smoke, we observe that the threshold parameters we obtained manually using the training videos provide very good performance across a wide variety of early fire detection scenarios. For instance, in video sequences M-7, M-8, M-10, and M-13, actual fires were detected via their indirect reflections from other surfaces. These results can be seen in Figure 12.

Figure 12
figure 12

Frame in which fire was detected from an indirect surface that reflected the radiation emitted by fire (QuickBlaze)

As a final comparison, Figure 13 shows sample false detections in M-21, M-22, M-23, and M-29. M-21 contains a cloud of dust produced by a car accident, which triggers a false alarm for our method. Our method was distracted by the partly orange neon sign in M-29, whereas VisiFire was not; VisiFire, on the other hand, was distracted by the headlights in M-23, whereas our method was not.

Figure 13
figure 13

Sample frames from videos M-21, M-22, M-23, and M-29 in which false alarms were raised by the QuickBlaze or VisiFire. No false alarm was raised for M-23 by QuickBlaze. No false alarm was raised for M-29 by VisiFire

This extensive evaluation in comparison with state-of-the-art commercial software shows that our method responds to fire incidents sooner, processes the images faster, has a slightly lower false alarm rate, and better localizes fire incidents, especially at nighttime.

4 Conclusion

In this paper, we have proposed QuickBlaze, a real-time early fire incident detection system that detects both flame and smoke regions in video images. We use parallel image processing streams to detect flame and smoke regions at high speed, with low response times. Our method does not require any offline training, although manual adjustment of parameters during a calibration phase is required to cater to the particular camera’s scene depth, field of view, and surrounding environment. Compared to the individual methods on which it builds, the combined approach improves response time at the cost of a small increase in false alarm rate, since the false alarms of the combined system are necessarily the union of the false alarms of the individual systems. We evaluate the algorithm on videos of a variety of real-world fires, and we have also performed a live test. Our focus in this paper is on detection of bright red-to-orange flame and white smoke typical of early fires [45, 46]. Detecting different kinds of smoke and flame more typical of mature fires is out of the scope of this paper. For comparison, we benchmark our method against commercial real-time fire and smoke detection software that evolved through a series of academic research contributions. We find that QuickBlaze has a better response time, is faster, and provides better fire localization than the commercial system. QuickBlaze could be deployed in any environment, and would likely have a faster response time than smoke detectors, as long as the fire or smoke is in view of the camera sensor. This does mean, however, that it may be most practical for large open spaces (commercial or industrial spaces) and less practical for areas with many small rooms, such as residences. Furthermore, one sensor system might not be the best possible detector for all situations. In indoor environments, it would of course be possible to augment the video based approach reported upon here with traditional point based sensors such as those demonstrated by [47–50].