1 Introduction

Fire is one of the leading hazards affecting everyday life around the world. The sooner a fire is detected, the better the chances of survival. Today's fire alarm systems, however, still pose many problems. To cope with these problems, research on video-based fire detection (VFD) started at the beginning of this century and has resulted in a large number of vision-based detection techniques that can detect fire at an early stage [44]. Given the numerous advantages of video-based sensors, e.g. fast detection (no transport delay) and the ability to provide information on fire progress, VFD has recently become a viable alternative or complement to the more traditional fire sensors.

Although ordinary video has been shown to promise good fire detection and analysis results, vision-based detectors still suffer from a significant number of missed detections and false alarms. The main cause of both problems is that visual detection is often subject to constraints regarding the scene under investigation, e.g. changing environmental conditions, and the target characteristics. To avoid the disadvantages of visual sensors, the use of other types of sensors has been explored over the last decade.

Thanks to improvements in the resolution, speed and sensitivity of infrared (IR) imaging, this newer type of imagery is already used successfully in many video surveillance applications, e.g. traffic safety, airport security and material inspection. Recently, IR imaging approaches for flame detection have also been proposed [6, 18, 31, 41, 45]. When lighting conditions are poor or the target's color is similar to the background, IR vision is a fundamental aid. Even other problems specific to visual object detection, such as shadows, do not occur in IR [20]. Nevertheless, IR has its own specific limitations, such as thermal reflections, IR blocking and thermal-distance problems. Furthermore, the cost of IR cameras is still very high.

Recently, as an alternative to IR and visual sensors, time-of-flight (TOF) imaging sensors have started to be used to improve everyday video analysis tasks. TOF cameras are a relatively new innovation capable of providing three-dimensional image data from a single sensor. TOF imaging takes advantage of the different kinds of information produced by TOF cameras, i.e. depth and amplitude information. The ability to describe scenes using a depth map and an amplitude image provides new opportunities in various applications, including visual monitoring (object detection, tracking, recognition and image understanding), human-computer interaction (e.g. gaming) and video surveillance. As the cost of TOF sensors decreases, the number of such applications is expected to increase significantly in the near future.

The possibilities of TOF-based fire detection have not yet been investigated. As such, this paper is the first attempt in this direction. Based on our preliminary experiments with a Panasonic D-Imager [32], of which some exemplary TOF flame images are shown in Fig. 1, it is already possible to state that TOF cameras have great potential for fire detection. Flames produce many measurement artifacts (~TOF noise), which can most likely be attributed to the IR light emitted by the flames themselves. Contrary to ordinary objects, such as people, the depth of flames changes very fast over time and ranges over the entire depth range of the camera. Furthermore, the amplitude of the boundary pixels of flames shows a high degree of disorder. As our experiments revealed that the combination of these TOF characteristics is unique to flames, they are the most appropriate features for TOF fire detection.

Fig. 1

Exemplary TOF flame images: a depth maps and b corresponding amplitude images; c ordinary video (not registered)

In outdoor situations, outside the range of the TOF camera, and when smoke appears in the field of view, the TOF depth map becomes unreliable and can no longer be used for accurate flame detection. This can also be seen in Fig. 1. To cope with this problem, a visual detector is used instead of the fast changing depth detector. The proposed visual flame detector is based on a set of low-cost visual flame features which have proven useful in distinguishing flames from ordinary moving objects. The main benefit of using TOF amplitude data as well as visible information in outdoor environments (or outside the range of the TOF camera) is that mis-detections in visual images can be corrected by TOF detections and vice versa. As such, fewer false alarms will occur when both are combined. It is important to mention that, in order to do this, the visual and TOF images need to be registered, i.e. aligned with each other.

The remainder of this paper is organized as follows. Section 2 lists the related work in TOF video analysis and video fire detection. Very briefly, this section also reflects on conventional (3D) depth acquisition using stereo vision. Next, Section 3 describes the working principle and the advantages and disadvantages of TOF imaging. Subsequently, Section 4 proposes the novel TOF-based indoor flame detection algorithm. Flames are detected by looking for regions which contain multiple pixels with high amplitude disorder and high accumulated depth differences. Section 5, on the other hand, presents the novel multi-sensor visual-TOF flame detector. Next, Section 6 reports the performance results of our primary experiments. Finally, Section 7 lists the conclusions and points out directions for future work.

2 Related work

2.1 Time-of-flight based video analysis

To the best of our knowledge, the work described in this paper is the first attempt to develop a fire detection system based on a TOF depth sensor. Nevertheless, the use of TOF cameras for video analysis is not new. Recently, TOF has started to be explored as a way to improve many conventional video-based applications. The results of these first approaches already seem very promising and support the feasibility of TOF imaging in other domains, such as fire detection. So far, TOF imaging devices have been used for:

  • Video surveillance: Hügli and Zamofing [22] explore a remedy to shadow and illumination problems in 'conventional' video surveillance by using range cameras. Tanner et al. [37] and Bevilacqua et al. [3] propose a TOF-based improvement for the detection, tracking and counting of people. Similarly, Grassi et al. [17] fuse TOF and infrared images to detect pedestrians and to classify them according to their moving direction and relative speed; Tombari et al. [38] detect graffiti by looking for stationary changes of brightness that do not correspond to changes in depth.

  • Image/video segmentation: In [4, 34] fusion of depth and color images results in significant improvements in segmentation of challenging sequences.

  • Face detection/recognition: Hansen et al. [21] improve the performance of face detection by using both depth and gray scale images; Meers et al. [28] generate accurate TOF-based 3D face prints, suitable for face recognition.

  • Human Computer Interaction: TOF cameras also pave the way to new types of interfaces that make use of gesture recognition [7] or the user's head pose and facial features [28]. These novel interfaces can be used in many systems, e.g. view control in 3D simulation programs, video conferencing, interactive tabletops [51], and support systems for the disabled.

  • (Deviceless) gaming: TOF imaging also increases the gaming immersion, as with this technology, people can play video games using their bodies as controllers. This is done by markerless motion capture, i.e. tracking and gesture recognition, using a single depth sensor. The sensor smoothly mirrors the player’s movements onto the gaming character. Recently, several companies, e.g. Omek Interactive and Softkinetic, started to provide commercially available TOF technology for gesture-based video gaming. Furthermore, Microsoft also focuses on this new way of gaming with its recently launched TOF-like Kinect.

  • Other applications: e-health (e.g. fall detection [25]), medical applications, interactive shopping and automotive applications (e.g. driving assistance and safety functions such as collision avoidance [35, 43]).

2.2 Video fire detection

Numerous vision-based fire and smoke detection algorithms have been proposed in the literature, resulting in a large set of VFD algorithms that can be used to detect the presence of fire. Each of these algorithms detects flames or smoke by analyzing one or more fire features in visible light [44].

Color was one of the first features used in VFD and is still by far the most popular [9]. The majority of color-based approaches in VFD make use of the RGB color space, sometimes in combination with the saturation of the HSI (Hue-Saturation-Intensity) color space [10, 33]. The main reasons for using RGB are the near-equal R, G and B values of smoke pixels and the easily distinguishable red-yellow range of flames.

Other frequently used fire features are flickering [27, 33] and energy variation [8, 40, 44]. Both focus on the temporal behavior of flames and smoke. Flickering refers to the temporal intermittency with which pixels appear and disappear at the edges of turbulent flames. Energy variation refers to the temporal disorder of pixels in the high-pass components of the discrete wavelet transform of the camera images.

Fire also has the unique characteristic that it does not remain a steady color, i.e., the flames are composed of several varying colors within a small area. Spatial difference analysis [33, 39] focuses on this feature and analyzes the spatial color variations in pixel values to eliminate ordinary fire-colored objects with a solid flame color.

Another interesting feature for fire detection is the disorder of smoke and flame regions over time. Some examples of frequently used metrics to measure this disorder are randomness of area size [5], boundary roughness [40], and turbulence variance [52]. Although not directly related to fire characteristics, motion is also used in most VFD systems as a feature to simplify and improve the detection process, i.e., to eliminate the disturbance of stationary non-fire objects. To detect motion possibly caused by fire, the moving part of the current video frame is extracted by means of a motion segmentation algorithm [8, 39, 40, 52].

Based on the analysis of the discussed state-of-the-art and our own experiments [49], a low-cost visual flame detector is presented in Section 5. This detector is used in conjunction with the TOF amplitude disorder detector to detect flames in outdoor situations and outside the range of the TOF camera.

2.3 Conventional depth acquisition using stereo-vision

The acquisition of high-quality real-time depth information is a very challenging issue. Traditionally, this problem has been tackled by means of stereo vision systems, which exploit the information coming from two or more conventional video cameras. Stereo vision provides a direct way of inferring depth information by using two images, i.e., the stereo pair, destined for the left and right eye, respectively. When each image of a stereo pair is viewed by its respective eye, the human brain can process subtle differences between the images to yield a 3D perception of the scene being viewed [42]. Currently, stereo vision systems are used in a wide range of application domains such as remote machine control, medical imaging, automatic surveillance and multi-modal (depth/color) content-based segmentation [1, 14].

Although stereo vision systems have improved greatly in recent years and have obtained interesting results, they cannot handle all scene situations (~aperture problem). Moreover, the algorithmic complexity of the most advanced stereo vision systems is quite high, i.e., they are very time-consuming algorithms, not suited for real-time operation. Hence, stereo vision systems do not provide completely satisfactory solutions for the extraction of depth information from generic scenes [12]. This is also confirmed by the quantitative comparison of Beder et al. [2], who found that the TOF system outperformed the stereo system in terms of achievable accuracy for distance measurements. TOF cameras can as such be considered a competing technique for stereo-vision-based video surveillance applications. Furthermore, since TOF systems directly yield accurate 3D measurements, they can (contrary to stereo vision systems) be used in highly dynamic environments, such as large and open spaces like car parks, atria and airports, i.e., our use cases.

3 Time-of-flight imaging

The working principle of TOF imaging is shown in Fig. 2. In order to measure the depth of every pixel in the image, the TOF camera is surrounded by infrared LEDs which illuminate the scene with a modulated IR signal. This signal is reflected by the scene, and the camera measures the time Δt needed by the signal to travel there and back. If the emitter and the receiver are point devices located at the same place, then Δt yields the depth of each pixel as d = c Δt / 2, where c is the signal's speed (c ≈ 3 × 10⁸ m/s for light). Simultaneously, the camera also measures the strength of the reflected infrared signal, i.e. its amplitude, which is an indicator of the accuracy of the distance estimation [7].
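
As a minimal numerical illustration of this relation (not part of the original paper; the round-trip time below is a hypothetical value), the following Python sketch converts a measured round-trip time into a per-pixel depth:

```python
# Minimal sketch of the TOF depth relation d = c * dt / 2 (example value is hypothetical).
C = 3e8  # speed of light in m/s

def depth_from_round_trip(dt_seconds: float) -> float:
    """Depth of a pixel given the measured round-trip time of the IR signal."""
    return C * dt_seconds / 2.0

# A round trip of ~33.4 ns corresponds to a depth of roughly 5 m:
print(depth_from_round_trip(33.4e-9))  # ≈ 5.01
```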

Fig. 2

Working principle of TOF imaging: modulated light is emitted from IR LEDs on the sensor. The light is reflected by the object and captured by the sensor. The time between emission and reception and the measured amplitude are used to generate the depth and intensity images

As the depth and amplitude information are obtained using the same sensor, the depth map (Fig. 3a) and the amplitude image (Fig. 3b) are registered (Fig. 3c). Compared to other multi-modal detectors, no additional processing is required for correspondence matching, which is one of the strengths of the TOF sensor. Other advantages of TOF imaging are:

  • Insensitivity to light changes/shadows: the TOF camera uses its own (invisible) light, which greatly simplifies moving object detection.

  • A minimal amount of post-processing, giving application processing more time for real-time detection.

  • The depth map, whose information represents the physical properties of object location and shape, can help in separating objects during occlusion or partial overlapping [13].

  • Low price compared to other IR-based sensors.

Fig. 3

Correspondence matching between a TOF depth map and b amplitude image; c registration check

In general, one can conclude that time-of-flight data compensates for the disadvantages and weaknesses, e.g. noise and other problematic artifacts, present in other data [4]. However, time-of-flight imaging also has its disadvantages, i.e., its advantages come at a price. The disadvantages associated with TOF imaging cameras are:

  • Low spatial resolution: The average commercially available TOF camera has a QCIF resolution (176 × 144 pixels), which is rather low. However, as with traditional imaging technology, resolution is increasing steadily, with each new model offering higher resolution as the technology matures.

  • Measurement artifacts: Objects outside the camera's working range (closer than 1.2 m for the type of camera used in our experiments) can be improperly illuminated, leading to low-quality range/depth measurements. This is also illustrated by the experiments shown in Fig. 4a. Significant motion can also corrupt range/amplitude data, because the scene may change between consecutive range acquisitions. To cope with these problems, the discrete wavelet transform (DWT) detail images are also investigated to distinguish flames from other 'fast' moving objects. The sensor also has a limited 'non-ambiguity range' before the signals get out of phase. In small rooms this is no problem, but in large rooms it can raise problems, as shown by the experiment in Fig. 4b. For this reason, the outdoor detector, which can also be used for detection outside the range of the TOF camera, uses a visual flame detector instead of the 'unreliable' depth maps.

  • Need for active illumination: This increases power consumption and physical size, complicates thermal dissipation and, perhaps most importantly, limits the useful operating distance of the cameras. However, as the proposed detectors mainly focus on the IR emitted by the flames themselves, this active illumination can (probably) be switched off.

Fig. 4

TOF measurement artifacts: a poor illumination and b out of phase problem

4 Time-of-flight based flame detection (indoor, distance < 10 m)

A general scheme of the indoor TOF-based flame detector is shown in Fig. 5. The proposed algorithm consists of three stages. The first two stages, i.e., the fast changing depth detection and the amplitude disorder detection, are processed simultaneously. The last stage, i.e., the region overlap detection, investigates the overlap between the resulting candidate flame regions of the prior stages. If there is an overlap, a fire alarm is raised. Because the proposed flame detection algorithm requires reliable depth maps, its detection distance is limited to the range of the TOF camera, i.e., less than ten meters.

Fig. 5

General scheme of the TOF based flame detector

4.1 Fast changing depth detection

The fast changing depth detection starts with calculating the accumulated frame difference \(AFD^{\rm depth}_n\) (1) between the current depth frame \(F^{\rm depth}_n\) and the previous and next depth frames, i.e., \(F^{\rm depth}_{n-1}\) and \(F^{\rm depth}_{n+1}\) respectively. By rounding the absolute frame differences, \(AFD^{\rm depth}_n\) is able to distinguish fast changing flames from more slowly moving ordinary objects. Pixels whose \(AFD^{\rm depth}_n\) is greater than zero get label 1 in the candidate flames image \(Flames^{\rm depth}_n\) (2). Other pixels get label 0.

$$ AFD^{\rm depth}_n = \lfloor \, |F^{\rm depth}_n - F^{\rm depth}_{n+1}| + |F^{\rm depth}_n - F^{\rm depth}_{n-1}| \, \rceil $$
(1)
$$ Flames^{\rm depth}_n = \begin{cases} 1 & \mbox{where } AFD^{\rm depth}_n > 0\\ 0 & \mbox{otherwise } \end{cases} $$
(2)

Next, a morphological closing with a 3 × 3 structuring element connects neighboring candidate flame pixels, i.e. pixels with label 1 in \(Flames^{\rm depth}_n\). Subsequently, a morphological opening filters out isolated candidate flame pixels using the same structuring element. The resulting connected flame pixel group(s) of \(Flames^{\rm depth}_n\) form the depth candidate flame region(s). An example of the fast changing depth detection is shown in Fig. 6.
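
For illustration, a minimal Python/NumPy sketch of this stage is given below (the actual implementation was written in MATLAB, cf. Section 6; function and variable names are ours, and the depth frames are assumed to be floating-point arrays):

```python
import numpy as np
from scipy import ndimage

def fast_changing_depth(prev_d, cur_d, next_d):
    """Depth candidate flame regions via accumulated frame differencing (Eqs. 1-2)."""
    # Rounded accumulated absolute depth difference over three consecutive frames.
    afd = np.rint(np.abs(cur_d - next_d) + np.abs(cur_d - prev_d))
    flames = afd > 0
    # 3x3 closing connects neighboring candidate pixels, opening removes isolated ones.
    se = np.ones((3, 3), dtype=bool)
    flames = ndimage.binary_closing(flames, structure=se)
    flames = ndimage.binary_opening(flames, structure=se)
    # Connected groups of candidate pixels form the depth candidate flame regions.
    labels, n_regions = ndimage.label(flames)
    return flames, labels, n_regions
```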

Fig. 6

Fast changing depth detection: a consecutive TOF depth images and their b morphologically filtered accumulated depth difference (\(Flames^{\rm depth}_n\))

It is important to mention that, as an alternative to the conventional morphological operators, we have also evaluated the added value of more advanced morphological alternatives, such as the opening by reconstruction proposed by Doulamis et al. [15]. According to the authors, this more advanced operator better retains the contours of image objects. In the experimental results (Section 6), the performance of this more advanced technique is compared to the conventional morphological operations and the results of tests on realistic fire/non-fire video sequences are discussed.

4.2 Amplitude disorder detection

The amplitude disorder detection starts with an accumulated frame differencing (3) similar to that used for the fast changing depth detection. However, as high \(AFD^{\rm amp}_n\) frame differences also occur at the boundary pixels of ordinary moving objects which are close to the TOF sensor, this feature alone is not enough for accurate flame detection.

$$ AFD^{\rm amp}_n = \lfloor \, |F^{\rm amp}_n - F^{\rm amp}_{n+1}| + |F^{\rm amp}_n - F^{\rm amp}_{n-1}| \, \rceil $$
(3)

In order to distinguish flame pixels from the boundary pixels of ordinary 'close' moving objects, the discrete wavelet transform (DWT) [26] of the amplitude image is also investigated. Experiments (Fig. 7) revealed that flame regions are uniquely characterized by high values in the horizontal H, vertical V and diagonal D detail images of the DWT. Ordinary 'close' moving objects do not have this characteristic. For this reason, an \(AFD^{\rm amp}_n\) region R with high accumulated amplitude differences is only labeled as a candidate flame region, i.e. gets a value of 1 in \(Flames^{\rm amp}_n\) (5), if it contains high H, V and D values in the binarized DWT detail images, i.e. \(DWT^{\rm detail}_R = 1\) (4).

$$ DWT^{\rm detail}_R = \begin{cases} 1 & \mbox{if } \max(H_R) \times \max(V_R) \times \max(D_R) = 1 \\ 0 & \mbox{otherwise } \end{cases} $$
(4)
$$ Flames^{\rm amp}_n = \begin{cases} 1 & \mbox{where } AFD^{\rm amp}_n > 0 \,\,\mbox{AND} \,\, DWT^{\rm detail}_R = 1\\ 0 & \mbox{otherwise } \end{cases} $$
(5)
Fig. 7

Discrete wavelet transform of amplitude image: flames show high values in horizontal (H), vertical (V) and diagonal (D) detail images

Analogously to the fast changing depth detection, morphological filtering connects neighboring candidate flame pixels in \(Flames^{\rm amp}_n\) and filters out isolated candidate flame pixels. The resulting connected flame pixel group(s) of \(Flames^{\rm amp}_n\) (Fig. 8) form the amplitude candidate flame region(s).
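
In the same vein, a sketch of the amplitude disorder stage using PyWavelets for the single-level DWT (not the authors' MATLAB implementation; the threshold used to binarize the detail images is an assumption, as the paper does not state one):

```python
import numpy as np
import pywt
from scipy import ndimage

def amplitude_disorder(prev_a, cur_a, next_a, detail_thresh=0.5):
    """Amplitude candidate flame regions from accumulated amplitude
    differences and DWT detail images (Eqs. 3-5)."""
    afd = np.rint(np.abs(cur_a - next_a) + np.abs(cur_a - prev_a))
    candidates = afd > 0
    # Single-level DWT of the current amplitude image.
    _, (cH, cV, cD) = pywt.dwt2(cur_a, 'haar')
    # Binarize each detail image (threshold is an assumption) and
    # upsample it back to the original image size.
    rows, cols = cur_a.shape
    details = [np.kron((np.abs(c) > detail_thresh).astype(np.uint8),
                       np.ones((2, 2), np.uint8))[:rows, :cols].astype(bool)
               for c in (cH, cV, cD)]
    labels, n = ndimage.label(candidates)
    flames = np.zeros_like(candidates)
    for r in range(1, n + 1):
        region = labels == r
        # Eq. 4: keep a region only if all three detail images fire inside it.
        if all(d[region].any() for d in details):
            flames |= region
    se = np.ones((3, 3), dtype=bool)
    return ndimage.binary_opening(ndimage.binary_closing(flames, se), se)
```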

Fig. 8

Amplitude disorder detection: a consecutive TOF amplitude images and their b morphologically and DWT filtered accumulated amplitude differences (\(Flames^{\rm amp}_n\))

4.3 Region overlap detection

This last stage investigates the overlap between the depth and amplitude candidate flame region(s), i.e. \(Flames^{\rm depth}_n\) and \(Flames^{\rm amp}_n\) respectively. It is important to mention that, in order to do this, the depth map and the amplitude image need to be registered. However, as they are both obtained using the same sensor, both TOF outputs are already aligned with each other. To detect the overlap, it is sufficient to perform a logical AND operation between \(Flames^{\rm amp}_n\) and \(Flames^{\rm depth}_n\). If the resulting binary image contains one or more 'common' pixels, i.e. pixels with a value of 1, a fire alarm is raised. An example of this region overlap detection is shown in Fig. 9.
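
A minimal sketch of this final stage (function names are ours), taking the two binary masks produced by the previous stages:

```python
import numpy as np

def fire_alarm(flames_depth: np.ndarray, flames_amp: np.ndarray) -> bool:
    """Raise an alarm if the depth and amplitude candidate regions share any pixel.
    Both masks come from the same sensor, so no registration step is needed."""
    overlap = np.logical_and(flames_depth, flames_amp)
    return bool(overlap.any())
```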

Fig. 9

Region overlap detection: c logical AND of a depth and b amplitude candidate flame regions

5 Visual-TOF flame detection (outdoor, distance ≥ 10 m)

In an outdoor environment or over longer distances, i.e., outside the range of the TOF camera, the depth information of the TOF sensor becomes unreliable. For this reason, the indoor TOF-based flame detector introduced in the previous section cannot be used under these circumstances. One could consider using only the TOF amplitude information; however, relying on this feature alone can cause mis-detections, as high values in the H, V and D detail images of the DWT amplitude images can also occur due to amplitude measurement artifacts. As such, we propose to use a visual detector in addition to the TOF amplitude disorder detection. The proposed visual flame detector is based on a set of low-cost visual flame features which have proven useful in distinguishing fire from ordinary moving objects [48].

A general scheme of the 'outdoor' visual-TOF based flame detector is shown in Fig. 10. The proposed algorithm consists of three stages and is similar to the previously described 'indoor' TOF-based flame detection algorithm. The first two stages, i.e., the low-cost visual flame detection and the amplitude disorder detection, are processed simultaneously. The last stage, i.e., the region overlap detection, investigates the overlap between the resulting candidate flame regions of the prior stages. If there is an overlap between these flame regions, a fire alarm is raised.

Fig. 10

General scheme of the TOF-visual based flame detector

It is important to mention that, in order to perform the region overlap detection, the visual RGB image and the amplitude image need to be registered. Some types of TOF cameras, e.g. the OptriCam [30], already offer both TOF sensing and RGB capabilities, and their visual and TOF images are already registered. The majority of TOF cameras, however, still do not have these RGB capabilities. As such, visual-TOF registration, i.e. the calculation of the visual-TOF transformation parameters, is necessary.

5.1 Low-cost visual flame detector

The low-cost visual flame detector (Fig. 11) starts with a dynamic background subtraction [39, 44], which extracts moving objects by subtracting from the video frames everything in the scene that remains constant over time, i.e. the estimated background. To avoid unnecessary computational work and to decrease the number of false alarms caused by noisy objects, a morphological opening, which filters out the noise, is performed after the dynamic background subtraction. Each of the remaining foreground objects is further analyzed using a set of visual flame features.
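
A minimal sketch of such a dynamic background subtraction, using a running-average background model (the paper does not specify the exact model used; the learning rate and threshold below are assumptions):

```python
import numpy as np
from scipy import ndimage

class RunningAverageBackground:
    """Simple dynamic background model: foreground = |frame - background| > thresh."""
    def __init__(self, first_frame, alpha=0.05, thresh=25.0):
        self.bg = first_frame.astype(np.float64)
        self.alpha = alpha      # learning rate (assumption)
        self.thresh = thresh    # foreground threshold (assumption)

    def apply(self, frame):
        frame = frame.astype(np.float64)
        fg = np.abs(frame - self.bg) > self.thresh
        # Everything that stays constant over time is absorbed into the background.
        self.bg = (1 - self.alpha) * self.bg + self.alpha * frame
        # Morphological opening removes noisy foreground pixels.
        return ndimage.binary_opening(fg, np.ones((3, 3), bool))
```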

Fig. 11

General scheme of the low-cost visual flame detector

In the case of a fire object, the selected features, i.e. spatial flame color disorder, principal orientation disorder and bounding box disorder, vary considerably over time. Due to this high degree of disorder, extrema analysis is chosen as a technique to easily distinguish between flames and other objects. It relates to the number of local maxima and minima in the set of data points. For more detailed information the reader is referred to [49].
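
As an illustrative sketch of such an extrema analysis (the exact procedure is detailed in [49]; the example values and the alarm threshold below are assumptions), the number of local extrema can be counted as sign changes in the first difference of a feature's time series:

```python
import numpy as np

def count_extrema(values: np.ndarray) -> int:
    """Number of local maxima and minima in a feature time series,
    counted as sign changes of the first difference."""
    diffs = np.sign(np.diff(values))
    diffs = diffs[diffs != 0]               # ignore flat segments
    return int(np.sum(diffs[1:] != diffs[:-1]))

# A flame feature (e.g. bounding box size) fluctuates strongly over time,
# while the same feature of an ordinary object does not; a high extrema
# count therefore indicates fire-like disorder (threshold is an assumption).
is_disordered = count_extrema(np.array([3, 7, 2, 9, 1, 8, 2])) > 3  # True
```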

5.2 Multi-sensor image registration

In order to combine the information in our multi-sensor visual-TOF setup, the corresponding objects in the scene need to be aligned, i.e. registered. The goal of registration is to establish geometric correspondence between the multi-sensor images so that they may be transformed, compared and analyzed in a common reference frame [36]. Because corresponding objects in the visual and TOF amplitude images may have different sizes, shapes, features, positions and intensities, as shown in Fig. 12, the fundamental question to address during registration is: what is a good image representation to work with, i.e. what representation will bring out the common information between the two multi-sensor images, while suppressing the non-common information [23]?

Fig. 12

Comparison of corresponding objects in TOF-visual images

When choosing an appropriate registration method, a first distinction can be made between automatic and manual registration. In applications with manual registration, e.g. using a calibration checkerboard [24], a set of corresponding points is manually selected from the two images in order to compute the parameters of the transformation, and the registration performance is evaluated by subjectively comparing the registered images. This is repeated several times until the registration performance is satisfactory, i.e., the registration criterion is reached. If the background changes, e.g. due to camera movement, the entire procedure needs to be repeated. Because this manual process is labor intensive, automatic registration is more desirable. Therefore, we adopt the latter in our system.

A second distinction for an appropriate (automatic) registration method is between region, line and point feature-based methods [54]. It is necessary to use features that are stable with respect to the sensors, i.e. the same physical artifact produces features in both images. Compared to the correspondence of individual points and lines, region-based methods, such as silhouette mapping, provide more reliable correspondence between color and TOF amplitude image pairs [11].

For example, comparing the visual and TOF images in Fig. 12, one can see that some information varies a lot, but the silhouettes are most similar. Therefore, the proposed image registration method matches the transformed color silhouette of the calibration object, i.e. a moving person, to its amplitude silhouette. The mutual information, i.e. the silhouette coverage, is assumed to reach its maximal value when both images are registered. However, knowing that the same silhouettes extracted from TOF and visual images can still differ in detail (as shown in the experiments), a completely exact match is (quasi) impossible. It is also important to mention that, instead of a person, other (moving) objects in the scene can also be used as the calibration object.

The proposed silhouette-contour-based image registration algorithm, shown in Fig. 13, coarsely registers the images taken simultaneously by the parallel TOF and visual sensors, whose lines of sight are close to each other. The registration starts with a moving object silhouette extraction [11] in both the visual and the TOF image to separate the calibration objects, i.e. the moving foreground, from the background, which is assumed to be static. Key components of the moving object silhouette extraction are the dynamic background subtraction, automatic thresholding and morphological filtering with growing structuring elements, which grow iteratively until a resulting silhouette is suitable for visual-TOF silhouette matching. After silhouette extraction, 1D contour vectors are generated from the resulting TOF and visual silhouettes using silhouette boundary extraction, Cartesian-to-polar transformation and radial vector analysis. Next, in order to retrieve the rotation angle and the scale factor between the TOF amplitude and visual image, these contours are mapped onto each other using circular cross-correlation [19] and contour scaling. Finally, the translation between the two images is calculated by maximization of binary correlation. The retrieved transformation parameters are used in the region overlap detection to align the visual image with the TOF amplitude image.
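
A sketch of the contour mapping step: the 1D radial contour signatures of the two silhouettes are aligned via FFT-based circular cross-correlation, whose peak gives the angular shift (signature extraction and the translation step are omitted; function names are illustrative, not the authors' code):

```python
import numpy as np

def rotation_from_contours(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Estimate the rotation angle between two silhouettes from their 1D radial
    contour signatures (radius as a function of polar angle, sampled over a full
    revolution with the same number of angular bins)."""
    # Circular cross-correlation via FFT; the peak index is the angular shift.
    corr = np.fft.ifft(np.fft.fft(sig_a) * np.conj(np.fft.fft(sig_b))).real
    shift = int(np.argmax(corr))
    return 360.0 * shift / len(sig_a)

def scale_from_contours(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Rough scale factor between the silhouettes from their mean radii."""
    return float(np.mean(sig_a) / np.mean(sig_b))
```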

Fig. 13

Silhouette-based image registration of visual and TOF amplitude images

For a more detailed description of this silhouette-based registration method, the reader is referred to [46, 47], in which the same registration process is used for LWIR-visual image registration. The experiments in Fig. 14 (and the referenced work) show that the proposed method automatically finds the correspondence between silhouettes from synchronous multi-sensor images.

Fig. 14

Examples of visual and TOF amplitude image registration: a visual and b TOF amplitude images; c registration check

6 Experimental results

The TOF camera used in this work is the Panasonic D-Imager [32], one of the leading commercial products of its kind. Other appropriate TOF cameras are the CanestaVision from Canesta, the SwissRanger from Mesa Imaging, the PMD[vision] CamCube and the OptriCam from Optrima [30]. The technical specifications of the D-Imager are shown in Fig. 15. The image processing code was written in MATLAB and is sufficiently simple to operate in real time on a standard desktop or portable personal computer.

Fig. 15

D-Imager and its technical specification

6.1 Indoor experiments

To illustrate the potential of the proposed indoor TOF-based flame detector, several realistic fire and non-fire indoor experiments were performed. An example of these experiments, i.e., the paper fire test, is shown in Fig. 16. As can be seen in the depth maps, the measured depth of flames changes very fast. Even between two consecutive frames, very large depth differences are noticeable. The amplitude images, on the other hand, show that the boundaries of the flames have a very high amplitude. Simultaneously with the TOF recording with the Panasonic D-Imager, we also recorded the experiments with an ordinary video camera. As such, the TOF detection results can be compared to state-of-the-art VFD methods.

Fig. 16

Paper fire test: a TOF depth map and b corresponding amplitude image of two consecutive frames; c ordinary video (not registered)

In order to objectively evaluate the detection results of the proposed algorithm, the detection rate metric (6) is used. This metric is comparable to the evaluation methods used by Celik et al. [9] and Toreyin et al. [40]. The detection rate equals the ratio of the number of correctly detected fire frames, i.e. the detected fire frames minus the number of falsely detected frames, to the number of frames with fire in the manually created ground truth (GT).

$$ detection\,rate = \frac{(\# detected - \# false\,detections)} {\#\, GT\, fire\, frames} $$
(6)
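
In code, with hypothetical counts:

```python
def detection_rate(detected: int, false_detections: int, gt_fire_frames: int) -> float:
    """Eq. 6: correctly detected fire frames over ground-truth fire frames."""
    return (detected - false_detections) / gt_fire_frames

# Hypothetical example: 480 detections, 10 of them false, 500 GT fire frames -> 0.94
print(detection_rate(480, 10, 500))
```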

The results in Table 1 show that robust fire detection can be obtained with relatively simple TOF image processing. Compared to the VFD detection results, i.e., an average detection rate of 93% and an average false positive rate of 2%, the proposed TOF-based flame detector, with its 96% detection rate and no false positive detections, performs better in these primary experiments. The indoor detector, however, is not able to detect fire in outdoor situations or outside the range of the TOF camera. The main reason for this failure is that its depth maps become unreliable under these circumstances. To cope with this problem, an outdoor visual-TOF flame detector was introduced in Section 5. The following subsection discusses its performance.

Table 1 Performance evaluation of indoor TOF-based fire detection

It is important to mention that, as an alternative to the conventional morphological operators, we have also evaluated the added value of the more advanced morphological technique proposed by Doulamis et al. [15]. As can be seen in the last column of Table 1, the results achieved with the conventional morphological operators are comparable to, i.e., differ little from, those of the advanced opening by reconstruction (shown between brackets). As such, due to their lower computational complexity, the conventional morphological operators still seem a good choice.

6.2 Outdoor experiments (> 10 m)

Analogously to the evaluation of the indoor detector, several realistic fire and non-fire experiments were performed to illustrate the potential of the outdoor visual-TOF flame detector. An example of these experiments, i.e. the Christmas tree fire, is shown in Fig. 17. In order to test the detection range of the proposed multi-sensor detector, the distance between the sensors and the fire/moving objects was also varied during these experiments.

Fig. 17

Christmas tree experiment: a TOF depth map and b corresponding amplitude image; c ordinary video

As the results in Table 2 show, robust flame detection can be obtained with the proposed multi-sensor visual-TOF image processing. Compared to the VFD detection results, i.e., an average detection rate of 88% and an average false positive rate of 4%, the outdoor detector, with its average detection rate of 92% and no false positive detections, performs better.

Table 2 Performance evaluation of outdoor visual-TOF fire detection

Further inspection of the tests in Table 2 also shows that increasing the distance between the cameras and the fire source has little influence on the detection results. For example, the detection rate of the outdoor wood fire test at 22 m is around 89%, which is almost as good as the 91% of the straw fire test at 7 m. Similar to the indoor detector results, the use of more advanced morphological techniques also had no significant effect on the flame detection rate.

The results also show that, compared to the indoor detector, the average detection rate of the outdoor detector is slightly lower. This can mainly be attributed to the fact that the resolution of the TOF camera is, for the moment, too low to detect small objects over long distances. Very small flames (e.g. at the start of a fire) are, as such, not detected.

7 Conclusions

Two novel time-of-flight based fire detection methods, for indoor and outdoor fire detection, are proposed in this paper. The indoor detector focuses on the most appropriate TOF flame features, i.e., fast changing depth and high amplitude disorder, the combination of which is unique to flames. The fast changing depth is detected by accumulated frame differencing over three consecutive depth frames. Regions with multiple pixels showing a high accumulated depth difference are labeled as candidate flame regions. Simultaneously, regions with high accumulated amplitude differences and high values in all detail images of the discrete wavelet transform of the amplitude image are also detected. These regions are labeled as the candidate flame regions of the amplitude images. If the resulting candidate flame regions of the fast changing depth detection and the amplitude disorder detection overlap, a fire alarm is raised. Experiments show that the proposed indoor detector yields an average flame detection rate of 96% with no false positive detections.

The outdoor detector, on the other hand, differs from the indoor detector in only one of its multi-modal inputs. As depth maps are unreliable in outdoor environments and outside the range of the camera, the outdoor detector uses a visual flame detector instead of the fast changing depth detection. Outdoor experiments show that this multi-sensor detector has an average flame detection rate of 92%, also with no false positive detections.

Both of the proposed detectors can be essential tools for future environmental crisis management systems. An example of such a system is described in the work of Vescoukis et al. [50]. Currently, the proposed work is also being evaluated within test cases of a Belgian and a European project, i.e., the car park fire safety project [29] and the Firesense project [16]. Both projects have objectives similar to the work of Vescoukis et al., i.e., they aim to deliver a system in which several sensors are monitored simultaneously and combined/fused with context information about the environment. Preliminary test results within those projects show the practical applicability, i.e., the valorization potential, of the proposed time-of-flight work. As the proposed techniques can easily be translated to other domains, e.g., detecting dynamic regions within the context of crowd analysis [53], our work on multi-modal TOF-based video analysis will only increase in value in the coming years.

Finally, it is important to remark that the proposed algorithms are also able to detect multiple fire seeds within the same 'view', a requirement for many applications. As the proposed algorithms work at 'blob' level (~flame regions), they have no difficulty detecting multiple fireplaces. Furthermore, as the multi-modal images are registered, a candidate flame region at position (x, y) in one image cannot influence a candidate flame region at a 'non-overlapping' position (x′, y′) in the other multi-modal image.