
7.1 Introduction

Decades of research in face recognition have explored several directions, mostly in the visible spectrum, and many high-performing algorithms have been developed for this purpose. To instigate further research, several research programs such as Janus (Footnote 1) have been initiated, where the goal is to take face recognition to the next significant level. It is also well understood that, for large-scale application, the technology has to encompass face recognition both in and beyond the visible spectrum, i.e., it must be able to recognize face images/videos in the visible, near-infrared, and thermal spectra. Compared to the visible spectrum, face recognition beyond the visible spectrum is relatively less explored and has primarily focused on near-infrared and thermal imagery [2, 3, 5, 6, 15, 19]. As shown in Fig. 7.1, face images in these three spectra provide non-overlapping information and can be used individually or in combination for identity management.

Fig. 7.1 Sample face images captured in the visible, thermal, and near-infrared (NIR) spectra. The NIR image is taken from the CASIA NIR-VIS 2.0 face database [14]

For recognizing faces captured in thermal images (spectrum range of 8–12 μm), the first step is face detection, followed by feature extraction and matching against gallery image(s). Similar to the visible spectrum, thermal face detection can be modeled as a two-class problem (face and non-face). Trujillo et al. [25] proposed a thresholding-based approach for detecting faces in thermal images. Since the goal of that study was to recognize expressions, face detection accuracy was not reported. Selinger and Socolinsky [21] and Socolinsky and Selinger [23] applied boosted class-cover catch digraph (CCCD) [24] for face detection. In [23], the authors observed that thermal face recognition performance degrades in outdoor environments. Since the overall goal was to identify the subject, the results of the intermediate face detection stage were not reported. However, it is possible that the outdoor setting affects the detection stage too. Martinez et al. [18] utilized GentleBoost along with Haar-like features [26], and the results show that, to an extent, boosting with Haar-like features can be utilized for face detection. However, evaluation in challenging environments remains an open research problem. Wang et al. [28] observed that Haar-like features with AdaBoost can be useful for detecting eyes, even in the presence of glasses. Zhang et al. [29] proposed a modified boosting approach in which visible images could also be utilized along with images of other spectra to train the cascade model. Table 7.1 briefly summarizes these algorithms for face detection in thermal images.

Table 7.1 Summary of related research for face detection in thermal images

For designing an efficient face recognition algorithm, it is important that face detection is accurate. It has been observed that imprecise eye localization, and therefore imprecise face localization, degrades the performance of the overall thermal face recognition pipeline [6]. Since the majority of researchers have used manually detected face images in the recognition pipeline, thermal face detection has not been well explored in the literature. Moreover, in order to learn an efficient face detector, it is imperative to have access to a large number of face and non-face images. Samples obtained in diverse conditions, such as indoor and outdoor environments, with session variations are necessary to learn a generalizable detector. In our opinion, thermal face detection and recognition research is impaired by the non-availability of a challenging database that includes face and non-face images captured in both indoor and outdoor environments with time-lapse variations. Moreover, there is very limited literature on the detection of occluded thermal faces. Therefore, it is important that the challenge of face detection is addressed to achieve a fully automated and efficient thermal face recognition system. In view of these limitations, this chapter attempts to bridge the gap in the following ways:

  • A database, namely the IIITD thermal face database, is prepared that consists of 614 face images pertaining to 65 subjects and 150 non-face images. Face images are captured in two sessions separated by about two years. Non-face images are captured in both indoor and outdoor settings. A small set, IIITD-People in Sun and Evening (IIITD-PSE), consisting of 22 subjects, is also prepared to study the variations due to outdoor daylight (sun) and nighttime environments. The database and the ground truth annotations of face regions will be made publicly available for researchers to undertake research on thermal face detection via https://research.iiitd.edu.in/groups/iab/.

  • Baseline experiments pertaining to face localization are performed on the IIITD thermal face, IIITD-PSE, and Notre Dame (ND) thermal face [6] databases with Haar- and LBP-cascaded AdaBoost to analyze the challenges associated with face detection in thermal images. Challenging scenarios such as cross-sensor thermal face detection and the effects of outdoor conditions (i.e., day or night) are also examined. A baseline evaluation of detecting faces under occlusion (using disguise accessories) is also performed on the IIITD In and Beyond Visible Spectrum Disguise (I2BVSD) face database [8, 9]. A skin detection-based region of interest (ROI) selection is proposed to improve the face detection performance. We also propose two novel evaluation measures to assess the performance of face detection algorithms.

7.2 IIITD Thermal Face Database

As shown in Table 7.1, there are multiple thermal face databases available. However, all of them captured face data with the objective of face recognition in controlled environments and may not be suitable for understanding the state of the art in face detection in the thermal spectrum. Further, existing face detection algorithms have been optimized for the visible spectrum, and since the visible and thermal spectra have different characteristics, such pre-trained models may not yield the best results. Therefore, we have collected the IIITD thermal face database with a focus on capturing the variations that may affect the appearance of facial regions in a thermal image, for instance, time lapse and environment. The IIITD database consists of 614 thermal face images pertaining to 65 individuals and 150 non-face images. All the images are captured using a thermal camera with a micro-bolometer sensor operating in the 8–14 μm spectrum range, also known as the long-wave infrared spectrum. Face images are near frontal with neutral expressions and are captured in two sessions:

  • Session I was captured in October/November 2011 and consists of 82 images pertaining to 41 subjects.

  • Session II was captured in January 2014 and consists of 532 images from 65 subjects. The two sessions have 41 subjects in common.

A set of 150 non-face images is collected, of which an equal number are captured indoors and outdoors. Since a face can appear very different during day and night in a thermal image, we collected a separate dataset named IIITD-People in Sun and Evening (IIITD-PSE) to capture these variations. It consists of 120 images pertaining to 22 subjects acquired in outdoor settings during both day (around 2 p.m. and ~36 °C temperature) and night (around 10 p.m. and ~22 °C temperature). Each of the two subsets, IIITD-PSE-Day and IIITD-PSE-Night, contains 60 images pertaining to 15 subjects, with 8 subjects common to both.

All the images are of size 720 × 576 pixels. The details of both IIITD and IIITD-PSE datasets are summarized in Table 7.2. Figure 7.2 illustrates the variety of images contained in the IIITD and IIITD-PSE databases. For evaluating the performance of face detection algorithms, the ground truth has been manually annotated in terms of two eyes, nose, and mouth coordinates. To encourage the research on the problem of thermal face detection, the database and annotated ground truth will be made publicly available to researchers.

Table 7.2 Dataset details pertaining to sessions, subjects, and classes
Fig. 7.2 Images illustrating the variations captured in the IIITD and IIITD-PSE thermal face databases. Each row contains images of one subject under different environments

7.3 Databases, Algorithms, and Evaluation Measures

For understanding the performance of thermal face detection, Collection XI of the University of Notre Dame (ND) face dataset [6] and the I2BVSD database [8, 9] are used along with the IIITD and IIITD-PSE databases. The ND face database consists of 2292 infrared frontal face images of size 312 × 239 from 82 subjects. It also contains a separate training set of 159 face images. Figure 7.3 shows sample images of subjects from the ND database.

Fig. 7.3 Sample images from the ND thermal face database [6]

For studying the effect of occlusion using disguise accessories, we utilize the I2BVSD face dataset [8, 9], which is the only publicly available dataset containing images with occlusion. The dataset consists of 681 face images pertaining to 75 subjects in both the visible and thermal spectra. The use of disguise accessories results in varying amounts of face occlusion. Depending on the facial regions that are occluded, the dataset is divided into two parts, major disguises (231 images) and minor disguises (307 images). Sample images contained in the dataset are shown in Fig. 7.4. For this research, we utilize only the thermal spectrum images having disguise variations (538 images, 75 subjects).

Fig. 7.4 Sample images from the I2BVSD thermal face database [8, 9]. Face images from the subsets pertaining to face occlusion due to (a) minor and (b) major usage of disguise accessories. While the I2BVSD database has images in both the visible and thermal spectra, we have used only thermal images in the experiments

7.3.1 Algorithms

The baseline performance has been established for two face detection algorithms:

  1. Haar-like features [17, 26] with cascaded AdaBoost classifier: Haar-like features are computed from rectangular regions of the image. Every rectangle is divided into multiple non-overlapping sub-rectangles. Pixel intensities of each sub-rectangle are added and the differences of summed intensities of sub-rectangles are used as features. Sub-rectangles are created such that these differences provide coarse information about horizontal, vertical, and diagonal gradients.

  2. Local binary patterns (LBP) features [16] with cascaded AdaBoost classifier: In LBP, the difference between every pixel and its neighbors is computed. The sign of each difference is represented using a binary bit. The string of these binary bits for every pixel is converted to decimal. An LBP-coded image representation is obtained by replacing every pixel with its corresponding decimal value. The final feature is represented in terms of histograms obtained from local regions of the LBP-coded image representation.
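
The two feature types described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in the experiments: the function names, the choice of a single two-rectangle horizontal Haar-like feature, and the 8-neighbour, radius-1 LBP with a "neighbour >= centre" sign convention and clockwise bit order are all assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of intensities inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half: a coarse horizontal-gradient feature."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


def lbp_code(img, r, c):
    """8-neighbour LBP code: one bit per neighbour, set when neighbour >= centre."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over the interior pixels of a region."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

In a cascade, many such features computed at different positions and scales would be passed to the boosted classifier; the sketch only shows how a single feature value or histogram is obtained.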

Algorithm 1 briefly summarizes the cascade boosting face detection approach utilizing Haar/LBP.

7.3.2 Evaluation Measures

In the existing literature, the performance and effectiveness of face detection algorithms are measured using metrics such as mean square error (MSE). However, these statistics only present the difference between the ground truth and automatically segmented landmark points. In our understanding, MSE is a more useful metric for landmark detection, and it is not very informative if one wants to compare the regions of interest and analyze falsely detected and falsely rejected regions. The objective of a face detection algorithm is to detect the complete region of interest so that it has maximum intersection with the ground truth. To evaluate the performance with respect to this intersection criterion, we propose the following two measures:

  • Ratio with ground truth (RG): RG presents the ratio of intersection of the predicted region and ground truth with the area of ground truth segmentation.

    $${\text{RG}} = \frac{{{\text{Area}}_{D \cap G} }}{{{\text{Area}}_{G} }}$$
    (7.1)
  • Ratio with detection (RD): RD presents the ratio of intersection of the predicted region and ground truth with the area of detected region.

    $${\text{RD}} = \frac{{{\text{Area}}_{D \cap G} }}{{{\text{Area}}_{D} }}$$
    (7.2)

Here, D and G represent the detected and ground truth face regions (rectangles), respectively. The visual interpretation of RG and RD together is shown in Fig. 7.5. High RD along with low RG indicates that, while there is a good overlap between the two, a smaller face rectangle is detected compared to the ground truth. Similarly, high RG along with low RD indicates that the automatic face detection algorithm has detected a larger face rectangle compared to the ground truth. Both values lie in the range [0, 1], and for ideal face detection both should be very close to one. RG and RD are most informative when analyzed together. In this research, we observe that a threshold of 0.7 for both RG and RD can be used to consider a face detection successful.
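
Equations (7.1) and (7.2) can be sketched in a few lines of Python for axis-aligned rectangles. The `(x, y, w, h)` rectangle convention, the function names, and the handling of a missing detection as a failure are assumptions made for this illustration.

```python
def area(rect):
    """Area of an (x, y, w, h) rectangle."""
    _, _, w, h = rect
    return max(w, 0) * max(h, 0)


def intersection_area(d, g):
    """Area of the overlap between two (x, y, w, h) rectangles."""
    dx, dy, dw, dh = d
    gx, gy, gw, gh = g
    iw = min(dx + dw, gx + gw) - max(dx, gx)
    ih = min(dy + dh, gy + gh) - max(dy, gy)
    return max(iw, 0) * max(ih, 0)


def rg_rd(detected, ground_truth):
    """Return (RG, RD) as defined in Eqs. (7.1) and (7.2)."""
    inter = intersection_area(detected, ground_truth)
    rg = inter / area(ground_truth)
    rd = inter / area(detected) if area(detected) > 0 else 0.0
    return rg, rd


def is_successful(detected, ground_truth, threshold=0.7):
    """Detection counts as successful when both RG and RD exceed the threshold."""
    if detected is None:  # assumption: no detection is treated as a failure
        return False
    rg, rd = rg_rd(detected, ground_truth)
    return rg > threshold and rd > threshold
```

For a 100 × 100 ground truth at the origin, a detected 50 × 50 rectangle in its corner gives RG = 0.25 and RD = 1.0 (too small), whereas a detected 200 × 200 rectangle gives RG = 1.0 and RD = 0.25 (too large); neither passes the 0.7 threshold on both measures.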

Fig. 7.5 RG and RD measures. High values of RG and RD together ensure a good face detection

7.4 Results and Analysis

The performance of the algorithms has been evaluated in four different scenarios:

  • Testing with visible cascade: The publicly available visible spectrum face detector model is utilized to detect faces in thermal spectrum images. This experiment establishes the baseline for performance evaluation.

  • Learning a model for thermal images: We learn detectors using thermal face and non-face images. The images are preprocessed using histogram equalization followed by feature extraction using LBP or Haar.

  • Effect of environment and sensor: The effect of environmental factors such as indoor/outdoor setting and day/nighttime is studied using the IIITD-PSE dataset. We also evaluate the effect of sensor interoperability on face detection. This set of experiments aims to study the generalizability of face detection models.

  • Effect of occlusion: In this set of experiments, we study the effectiveness of the learned thermal face detector on occluded faces. Along with LBP and Haar features, a skin detection-based ROI selection approach is also presented.

7.4.1 Testing with Cascade Trained on Visible Images

The first experiment is performed to evaluate the performance of pre-trained Haar cascades (available with OpenCV [4]) on thermal spectrum images. Since the Haar cascade was originally trained for the visible spectrum, this experiment also provides an understanding of face detection performance with cross-spectral training. For this experiment, the IIITD database (614 images) and the test partition of the ND dataset (2292 images) are utilized as test sets. In all the experiments, when multiple face rectangles are detected in an image, the largest one is considered as the detected rectangle. The graphs of normalized image count versus RG and RD are shown in Fig. 7.6. The horizontal axis represents RG (or RD), whereas the vertical axis represents the ratio of the number of images having a specific RG (or RD) to the total number of images. As can be seen, only a very small proportion of face images result in high RG or RD values. This shows that the pre-trained visible image cascade is not appropriate for face detection in thermal images. No face rectangle is detected in 15.9 % of the images of the IIITD dataset and 31.5 % of the images of the ND dataset. As shown in Table 7.3, for both the IIITD and ND datasets, many images have low RG and RD, which further indicates poor face detection results.
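
The largest-rectangle rule and the normalized image count used in the plots can be sketched as follows. The helper names and the use of ten equal-width bins over [0, 1] are assumptions; the chapter does not state how RG and RD values are binned.

```python
def largest_rectangle(rects):
    """Pick the largest (x, y, w, h) rectangle by area, or None if there is none."""
    if not rects:
        return None
    return max(rects, key=lambda r: r[2] * r[3])


def normalized_image_count(values, num_bins=10):
    """Fraction of all images whose RG (or RD) value falls into each bin of [0, 1]."""
    counts = [0] * num_bins
    for v in values:
        # A value of exactly 1.0 is placed in the last bin.
        idx = min(int(v * num_bins), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]
```

Here `values` would hold one RG (or RD) value per test image; how images without any detection enter the count is left to the caller.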

Fig. 7.6 Face detection with the pre-trained Haar cascade on the (a) IIITD dataset and (b) ND dataset. The horizontal axis represents the value of RG and RD. The vertical axis represents the normalized image count with the corresponding RG and RD, computed as \(\frac{\#\,\text{images with corresponding RG or RD}}{\#\,\text{total images}}\)

Table 7.3 Summary of face detection results with pre-trained cascade and the cascade trained with thermal images on the IIITD and ND thermal face databases

7.4.2 Learning a Cascade Model for Thermal Faces

Since the pre-trained cascade model does not exhibit effective performance, it is important to train the face detection model using thermal data. We utilize face and non-face images captured in the thermal spectrum for this task. From the IIITD dataset, 307 randomly selected face images and all 150 non-face images are used as the training set, and testing is performed on the remaining (unseen) 307 images. For the ND dataset, training is performed with the predefined set of 159 training images and testing with 2292 images. The LBP cascade model is trained and the results on the testing database are shown in Fig. 7.7. It can be observed that, compared with the pre-trained cascade, there is a substantial increase in the number of images with higher RG and RD when cascades trained on thermal images are used. At least one face rectangle is detected in each image of the IIITD dataset, whereas no face rectangle is detected in 6.41 % of the images of the ND dataset. Further, Table 7.3 also shows that training on thermal images helps in improving face detection results. However, there is further scope for improvement, as faces are detected reasonably well in only about 60 % of the images.

Fig. 7.7 Face detection using the LBP cascade learned on the (a) IIITD dataset and (b) ND dataset

7.4.2.1 Learning a Cascade Model from Combined Dataset

One possible way to further improve the face detection performance is to learn the model using data containing large variations. To achieve this, we train a model using both datasets: 307 and 159 images from the IIITD and ND datasets, respectively, comprise the face samples of the training set for this experiment. The cascaded AdaBoost model is trained using LBP features. The results pertaining to this experiment are shown in Table 7.4 and Fig. 7.8a, b. Table 7.4 shows an improvement in correct detections (RG > 0.7 and RD > 0.7 together) of 2 and 3 % for the IIITD and ND datasets, respectively. The number of undetected faces also reduces significantly.

Table 7.4 Results of face detection when the model is trained with combined IIITD + ND databases and tested on IIITD and ND thermal face databases
Fig. 7.8 Face detection using the LBP cascade with training on the combined ND and IIITD dataset and testing on the (a) IIITD dataset and (b) ND dataset. Corresponding results on the (c) IIITD dataset and (d) ND dataset when the images are preprocessed using histogram equalization

To further reduce the difference between images from the two databases, image histogram equalization is applied. It is our assertion that histogram equalization can help reduce the effect of the sensor- and/or environment-specific variations. Therefore, LBP features are obtained after preprocessing the images using histogram equalization. As shown in Table 7.4 and Fig. 7.8c, d, there is a slight improvement in performance when images are preprocessed using histogram equalization. The detection rate of 0.65 is obtained on both the sets.
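
Histogram equalization of an 8-bit grayscale image can be sketched as below. This is the textbook cumulative-histogram mapping written in plain Python, not necessarily the exact routine used in the experiments; the function name and the list-of-rows image representation are assumptions.

```python
def equalize_histogram(img, levels=256):
    """Spread the intensities of an image (rows of ints in [0, levels - 1])."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to equalize.
        return [row[:] for row in img]

    # Map every intensity through the normalized cumulative distribution.
    scale = (levels - 1) / (total - cdf_min)
    lut = [round(max(c - cdf_min, 0) * scale) for c in cdf]
    return [[lut[v] for v in row] for row in img]
```

After the mapping, the darkest occupied level becomes 0 and the brightest becomes 255, which is one way in which sensor- or environment-specific intensity offsets between two databases can be reduced before feature extraction.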

On this combined training set, the effectiveness of the Haar cascade is also evaluated with histogram equalization preprocessing. For this experiment, the cascaded AdaBoost model is learned on the Haar features obtained from histogram-equalized images. The results are shown in Fig. 7.9 and Table 7.4. The results show that the Haar cascade with histogram equalization preprocessing performs considerably better in the given scenario, further improving the detection rate to 0.70 and 0.77 on the IIITD and ND datasets, respectively. However, there is a trade-off between training time and accuracy, with the Haar cascade requiring more training time and exhibiting better results than the LBP cascade. Note that there is still scope for improvement, as the detection rate is in the range of 0.70–0.80. Figure 7.10 shows sample detection results of the Haar feature-based cascade learned using the combined training set on the IIITD and ND datasets, which yields about 65 % of images with successful face detection.

Fig. 7.9 Face detection using the Haar cascade with training on the histogram-equalized ND and IIITD datasets and testing on the (a) IIITD dataset and (b) ND dataset

Fig. 7.10 Examples of (a) good (\({\text{RG}} > 0.7\) and \({\text{RD}} > 0.7\)) and (b) poor detection results on the IIITD (top row) and ND (bottom row) datasets. Green (solid lines) and red (dashed lines) rectangles represent the ground truth and detected face region (using the Haar feature-based cascade learned on the combined training set), respectively

7.4.2.2 Decision Fusion of Haar and LBP Cascades

Since Haar and LBP do not encode the same information, one may expect them to be suited to encoding different kinds of variations. Therefore, it is possible that the sets of images on which each technique works best do not completely overlap. This makes the two techniques plausible candidates for fusion, which can potentially further improve the overall accuracy. To combine them, we follow a simple approach: if a face is detected by only one of the techniques, the detected region is taken as the final decision. However, if a face is detected by both techniques, one of the following two decision fusion approaches is applied.

  • Fusion Approach 1: Out of the two candidate rectangles, select the smaller one. This approach assumes that the detection techniques are prone to overestimating the face rectangle size, thus selecting the smaller candidate rectangle should result in better detection.

  • Fusion Approach 2: Out of the two candidate rectangles, select the larger one. The second approach assumes that the detection techniques are prone to underestimating the size of face rectangle.
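
The two decision fusion rules can be written as a short Python sketch. The function name, the `(x, y, w, h)` rectangle convention, and the use of `None` for "no face detected" are assumptions made for illustration.

```python
def fuse_detections(haar_rect, lbp_rect, approach=1):
    """Fuse the outputs of two detectors.

    Each argument is an (x, y, w, h) rectangle or None when nothing was found.
    Approach 1 keeps the smaller rectangle, approach 2 keeps the larger one.
    """
    if approach not in (1, 2):
        raise ValueError("approach must be 1 or 2")
    # If only one detector fires, its output is the final decision.
    if haar_rect is None:
        return lbp_rect
    if lbp_rect is None:
        return haar_rect
    pick = min if approach == 1 else max
    return pick(haar_rect, lbp_rect, key=lambda r: r[2] * r[3])
```

With this rule, an image is left without a detection only when both detectors return nothing.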

Both sets of experiments are performed along with histogram equalization and the results are shown in Table 7.5 and Fig. 7.11. As shown in Table 7.5, in both the fusion approaches, at least one face rectangle is detected in all but three images. The results show that the first fusion approach exhibits better results for the IIITD dataset, and fusion has little effect on the ND dataset.

Table 7.5 Summary of face detection results with the proposed fusion approaches on the IIITD and ND thermal face databases
Fig. 7.11 Face detection using fusion approach 1 on the (a) IIITD and (b) ND datasets, and using fusion approach 2 on the (c) IIITD and (d) ND datasets

7.4.3 Effect of Sensor and Environment

Since thermal images are dependent on the heat emissivity of the object or surface, they may be affected by environmental aspects such as ambient temperature. These images are also dependent on the type of sensor used, and therefore, the interoperability of sensors can also affect the accuracy of the models learned from one camera. This section studies these two aspects of thermal face detection.

Effect of Day and Night Outdoor Environments

We utilize the IIITD-PSE dataset to understand the challenges of thermal face detection in outdoor settings along with the effect of capture during day and night. Figure 7.2 shows sample images of the same person captured in day and night environments. Since the fusion approaches yield better results on the IIITD and ND databases, this experiment is performed with the fusion approaches only. The results of this experiment are reported in Fig. 7.12. It can be observed that the model learned from images captured in indoor settings (IIITD + ND) is not effective on images captured outdoors during daytime (IIITD-PSE-Day). However, as shown in Table 7.6, the results are relatively better for outdoor nighttime images (IIITD-PSE-Night). During daytime, the temperature difference between the skin and the environment is smaller than at nighttime, which might affect the overall contrast of the image. Therefore, daytime outdoor face detection in the thermal spectrum needs further research.

Fig. 7.12 Detection results on the IIITD-PSE dataset. Face localization results on the IIITD-PSE-Day set using (a) fusion approach 1 and (b) fusion approach 2. Similarly, results on the IIITD-PSE-Night set using (c) fusion approach 1 and (d) fusion approach 2

Table 7.6 Summarizing the results of face detection with proposed fusion approaches on thermal images acquired in daytime and nighttime

Effect of Cross-sensor Training

Previous experiments show that if the cascade is trained using images from the same database, it provides good results. However, it is important to have a model that can be utilized across multiple datasets captured using different thermal imaging sensors. As shown in Figs. 7.2 and 7.3, images captured using two different sensors might look quite different. Therefore, a logical step is to examine the challenges due to cross-sensor data (i.e., the problem of sensor interoperability).

The cascades trained on individual datasets during the previous experiments are used for detecting faces pertaining to the other dataset. For example, the cascade trained using the ND database is used for detecting faces from the IIITD database and vice versa. The results in Fig. 7.13 show that the model learned from one dataset does not cross-validate well when tested on the other dataset. This may be due to the fact that different datasets include different properties such as capturing environment, set of subjects, imaging resolution, and sensor characteristics.

Fig. 7.13 Face detection using the LBP cascade with (a) training on the ND dataset and testing on the IIITD dataset and (b) training on the IIITD dataset and testing on the ND dataset

7.4.4 Effect of Occlusion

In order to study the effect of occlusion, we have performed experiments on the I2BVSD face disguise dataset (thermal spectrum images only). The dataset is divided into two parts, (i) minor disguise and (ii) major disguise. The minor disguise subset consists of images of subjects wearing headgear, hair, and beard extensions, which do not cover any of the vital features such as the eyes, nose, and mouth. The major disguise subset consists of images of subjects wearing shades, mouth pieces, heavy beards, and/or any accessory that covers one of the vital features. The baseline evaluation is performed with the cascaded AdaBoost models learned using the combined training set (IIITD + ND). As shown in Table 7.7, detection accuracy for the minor disguise subset is about 25 % for both the LBP- and Haar-based cascaded AdaBoost detectors. For the major disguise subset, the LBP- and Haar-based detectors yield 10 and 15 % detection accuracy, respectively. This shows that the cascaded AdaBoost-based approach is not very effective in the presence of occlusions. The corresponding results in terms of RG and RD are reported in Fig. 7.14.

If we pose a constrained problem of face localization, i.e., given that the image contains a face, locate it, we can utilize a skin detection-based approach for approximating facial regions with occlusion variations. Skin color-based region detection has been studied extensively in the visible spectrum [11, 13, 20, 22]. However, to the best of our knowledge, skin detection in the thermal spectrum is still unexplored. Skin detection is comparatively easier in the thermal spectrum because the heat patterns generated by body temperature are typically distinct from the background. Therefore, we present a skin detection-based ROI selection approach.

Table 7.7 Summarizing the results of face detection on the faces occluded using disguised accessories
Fig. 7.14 Face detection using the LBP cascade on the (a) minor and (b) major disguise subsets, and using the Haar cascade on the (c) minor and (d) major disguise subsets. The results obtained using skin detection-based ROI selection are also reported

7.4.4.1 Skin Detection-Based ROI Selection

In order to reduce the number of falsely detected faces, we propose a skin detection-based ROI selection approach as a preprocessing stage to cascaded AdaBoost face detection. The steps involved in the proposed skin detection approach are shown in Fig. 7.15. Further details of skin detection are as follows:

Fig. 7.15 The face detection pipeline with skin detection-based ROI selection. The blue, red, and green rectangles represent the selected ROI, detected face region, and ground truth face region, respectively

  • Features: For every pixel, a square neighborhood of \(k \times k\) is chosen as the representation. Thus, for every pixel a \(k^{2}\)-dimensional feature vector is obtained, and an image of size \(m \times n\) is represented using an \(mn \times k^{2}\) feature set. In this work, a neighborhood of \(k = 3\) is chosen. This feature representation helps encode every skin pixel with respect to its neighborhood.

  • Skin and Non-skin Modeling: Using the ground truth face and non-face regions, corresponding skin and non-skin pixels are obtained. These distributions of skin and non-skin features are learned from the training data. The distributions essentially capture how the heat patterns of skin and non-skin pixels appear in the local neighborhood. Skin and non-skin distributions are modeled as:

    $$\begin{aligned} f_{s} ({\mathbf{x}}) & = {\mathcal{N}}({\mathbf{x}},\mu_{s} ,\Sigma_{s} ), \\ f_{ns} ({\mathbf{x}}) & = {\mathcal{N}}({\mathbf{x}},\mu_{ns} ,\Sigma_{ns} ) \end{aligned}$$

    where \(f_{s}\) and \(f_{ns}\) denote the probability of \({\mathbf{x}}\) belonging to skin and non-skin regions, respectively, and \({\mathcal{N}}( \cdot ,\mu ,\Sigma )\) denotes a normal distribution with mean \(\mu\) and covariance \(\Sigma\). The training phase includes learning the mean and covariance of the skin and non-skin distributions.

  • Pixel classification: A pixel with feature representation x is classified as skin, if \(\log \left( {\frac{{f_{s} ({\mathbf{x}})}}{{f_{ns} ({\mathbf{x}}) + \varepsilon }}} \right) > \varepsilon\) where \(\varepsilon\) is a very small positive real number.

  • ROI Selection: At the end of the pixel classification stage, a binary mask is obtained for every image. Although almost all the facial regions are often obtained as skin regions, there may be some holes and/or there may be multiple connected components (due to occlusion). We propose to utilize the largest connected component of the binary mask, and the corresponding bounding box is utilized as the region of interest. Once the ROI is selected, a cascade classifier is utilized for finding the face location/bounding box.
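
The steps above can be sketched in plain Python. Several simplifications are assumptions of this sketch rather than details given in the chapter: the covariance is taken to be diagonal, image borders are handled by clamping coordinates, the likelihood ratio is evaluated directly in the log domain (which makes the small constant in the denominator unnecessary), and connected components use 4-connectivity. All function names are hypothetical.

```python
import math


def neighbourhood(img, r, c, k=3):
    """Flatten the k x k neighbourhood of pixel (r, c); borders are clamped."""
    h, w, half = len(img), len(img[0]), k // 2
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in range(-half, half + 1) for dc in range(-half, half + 1)]


def fit_diag_gaussian(samples, floor=1e-6):
    """Per-dimension mean and variance (diagonal covariance is an assumption)."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / n, floor)
           for i in range(dim)]
    return mean, var


def log_density(x, model):
    """Log of a diagonal-covariance normal density evaluated at x."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def skin_mask(img, skin_model, nonskin_model, k=3, eps=1e-6):
    """Binary mask: True where log f_s(x) - log f_ns(x) exceeds eps."""
    mask = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            x = neighbourhood(img, r, c, k)
            ratio = log_density(x, skin_model) - log_density(x, nonskin_model)
            row.append(ratio > eps)
        mask.append(row)
    return mask


def largest_component_bbox(mask):
    """Bounding box (x, y, w, h) of the largest 4-connected True region, or None."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = None  # (size, bbox)
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            stack, size = [(r0, c0)], 0
            seen[r0][c0] = True
            rmin = rmax = r0
            cmin = cmax = c0
            while stack:  # depth-first flood fill of one component
                r, c = stack.pop()
                size += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            if best is None or size > best[0]:
                best = (size, (cmin, rmin, cmax - cmin + 1, rmax - rmin + 1))
    return None if best is None else best[1]
```

In a full pipeline, `fit_diag_gaussian` would be called once on skin samples and once on non-skin samples from the training set, and the cascade detector would then be run only inside the bounding box returned by `largest_component_bbox`.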

7.4.4.2 Effectiveness of ROI Selection

Figures 7.16 and 7.17 show some examples of ROI selections and the effectiveness of the ROI selection approach, respectively. It can be observed that the proposed approach effectively rejects non-face regions of the image. As shown in Table 7.7 and Fig. 7.17, the skin modeling-based ROI selection alone, without any further face detector, yields an RG of around 100 %. This shows that the ROI almost always covers the face region. We further applied the AdaBoost cascade, learned in earlier experiments, on the ROI obtained using skin detection. As shown in Table 7.7, Fig. 7.14, and Fig. 7.18, ROI selection helps in improving detection accuracy, especially in the case of minor disguise.

Fig. 7.16 Samples of the skin detection-based ROI selection approach. The red rectangles represent the ROIs obtained

Fig. 7.17 Effectiveness of skin detection-based ROI selection on the (a) minor and (b) major disguise subsets. On both sets, the RG values are very high for a large number of images, suggesting that very little of the ground truth facial region is discarded in ROI selection

Fig. 7.18 Results of the proposed fusion approaches on the (a) minor and (b) major disguise subsets

7.5 Conclusion and Future Research Directions

Thermal face detection has been a relatively unexplored area of research, and there are multiple covariates that affect the performance of face detection algorithms. This chapter presents a study to understand the effect of thermal imaging-specific covariates on the performance of face detection in thermal images. There are three contributions of this chapter:

  1. We prepared two thermal face databases, namely the IIITD thermal face database that contains 614 face images of 65 subjects and 150 non-face images, and the IIITD-PSE database comprising 120 images captured during daytime and nighttime to study the effect of ambient temperature on thermal face detection.

  2. We analyzed the performance of three algorithms: AdaBoost-based face detector with LBP features, AdaBoost face detector with Haar-like features, and fusion of the LBP and Haar-like features. The performance is analyzed not only on the IIITD thermal and PSE databases, but also on the Notre Dame and I2BVSD face databases. The use of these two existing databases helps us to understand the impact of interoperability and occlusion on thermal face detection.

  3. We propose two new metrics of face localization, RG and RD, which in combination characterize the performance of face detection.

The results show that decision-level fusion of Haar-like and LBP features is promising and that preprocessing using histogram equalization also helps in improving the detection accuracy. This suggests that preprocessing is one of the key components in addressing environmental covariates in thermal face detection. Further, the results on cross-dataset experiments, indoor–outdoor and day–night variations, and occlusions using facial accessories reveal the challenging nature of the problem.