
1 Introduction

The performance of visible imaging-based face recognition methods depends heavily on illumination conditions. Changes in the illumination intensity of the light source and in the position of the camera degrade performance significantly. Visible light face recognition systems may not work efficiently in outdoor settings, where it is difficult to control the intensity of light, and they do not work in dark environments or night vision applications. To overcome these limitations, infrared imaging-based recognition systems have been proposed in the literature. Unlike visible light systems, infrared imaging-based recognition systems do not depend on illumination conditions. For night vision applications, passive thermal sensors have been proposed recently. These sensors capture radiation with wavelengths of 3–14 µm emitted by objects. In critical security applications, the use of infrared imaging for face recognition has been expanding. Various setups have been proposed for face recognition using infrared imaging, including some methods that are also used in visible light-based face recognition systems. Some methods work with local features and extract characteristic local structural information. Some methods are fusion based, in which visible and thermal images or features are fused to obtain the final output. In cross modality-based methods, distinctive features of visible and thermal images are matched. Deep learning-based techniques learn a mapping between thermal images and the corresponding visible images or features.

The infrared (IR) portion of the electromagnetic spectrum is further split into four bands: LWIR (long-wave infrared), MWIR (medium-wave infrared), SWIR (short-wave infrared), and NIR (near infrared). Long-wave IR is also known as thermal IR, and it has drawn the most attention because of its robustness compared to the other bands. Thermal sensors measure the amount of heat emitted by an object; unlike visible light sensors, they do not rely on reflected light. Thermal sensors can work under different lighting conditions, even in dark environments. The heat emitted by an object is also less affected by scattering and absorption from smoke and dust. In addition, thermal sensors reveal anatomical information of the human face, which makes them capable of detecting disguised faces [1]. The objective of this study is to survey recent research in human face recognition using thermal imaging.

The rest of this paper is organized as follows. In Sect. 2, the literature review of work related to infrared-based face recognition systems is presented. In Sect. 3, the challenges faced by researchers in the thermal face recognition system are discussed. In Sect. 4, the two datasets used by various researchers in their work are presented. The conclusion of this study is given in Sect. 5.

2 Literature Review

A face recognition/identification system aims to discover and learn the unique features of a face. Along with learning unique features, it is also crucial to maximize the similarity between different images of the same person, that is, images taken under different conditions such as the distance between sensor and face, lighting conditions, mood, and pose.

Various researchers have proposed well-performing human face detection and recognition methods based on visible imaging, as shown in Table 1. However, variation in facial pose and illumination conditions remains a limitation.

Table 1 Different techniques used by researchers for face recognition systems in thermal images

Visible light-based human face identification systems require both local and global features of face images, and these features are typically extracted in an illumination-controlled environment. Some methods focus on the pose of a person, transforming images or features into a subspace where intra-class variation is minimized and inter-class variation is maximized to better classify the facial images.

Recently, researchers have shown growing interest in deep learning for face recognition, mainly using convolutional networks. Many neural networks have been proposed to overcome the limitations of face recognition systems, such as the pose of a person, low resolution, the distance between face and sensor, and variation in lighting conditions. The major challenge of neural network methods is acquiring the huge dataset required for training, and with a large dataset the computational cost also increases.

Visible light face recognition methods perform well, but they can fail in dark environments or under improper lighting conditions, whereas infrared sensors work well even in a completely dark environment and do not require any external light source.

2.1 Infrared Face Recognition Techniques

For human identification in security systems, the biometric features of the human face can be used. In face recognition, a face image is first captured through a camera and then matched as accurately as possible against face images of the same person already stored in a database. The major challenges in this setting are the illumination conditions at the time the input images are captured, finding an accurate match for the same person within a large database, and dealing with disguised faces. Unlike visible light face recognition systems, infrared imaging techniques use facial thermograms.

Infrared sensors measure heat radiation from objects, so the thermal images they capture are independent of illumination conditions. Analogous to visible light face recognition systems, a thermal imaging-based system also extracts unique features of the image and learns them for recognition, as shown in Fig. 1 [9]. There are also methods that fuse thermal and visible images before the actual feature extraction, as shown in Fig. 2 [9].

Fig. 1 An infrared face recognition system

Fig. 2 Cross modality in the face recognition system

The fused image carries the best features of both modalities and can be used for testing as required. For 24 × 7 surveillance systems, cross modality-based methods are the most suitable. To improve efficiency, researchers have also concatenated features of visible and thermal images, as shown in Fig. 3 [9].

Fig. 3 Heterogeneous face recognition using concatenated features

Figure 4 [9] shows another infrared face recognition system, in which features of both modalities are projected into a common subspace using Canonical Correlation Analysis (CCA) or similar algorithms; a minimal sketch of such a projection is given after the figure.

Fig. 4 Heterogeneous recognition using a common subspace for learning
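The following Python sketch, using scikit-learn's CCA, illustrates the idea of such a common-subspace projection; the feature dimensions, sample counts, and random data are placeholders rather than values from any of the cited works.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Paired feature vectors of the same subjects: one set extracted from visible
# images, one from thermal images (synthetic placeholders here).
rng = np.random.default_rng(0)
visible_feats = rng.normal(size=(300, 120))
thermal_feats = rng.normal(size=(300, 100))

# CCA learns projections that maximize the correlation between the two views,
# giving a common subspace in which cross-modal matching can be performed.
cca = CCA(n_components=20)
vis_sub, thermal_sub = cca.fit_transform(visible_feats, thermal_feats)

# At test time, a thermal probe and the visible gallery are projected into this
# subspace and matched with, e.g., cosine similarity or a nearest neighbor rule.
```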

Some researchers have also tried to extract common features using a deep neural network, in which the network learns a mapping between visible light and thermal image features, as shown in Fig. 5 [9]. After training, the network is able to map a thermal image to its corresponding visible mode representation. Nowadays, deep learning can be used for many different purposes in face recognition systems.

Fig. 5 Feature mapping using a DNN

2.2 Preprocessing in Infrared Face Recognition Techniques

Pre-processing is an important step for extracting quality features from facial images. After face detection in the whole image, the face region is cropped and normalized, and noise is removed using a low pass filter. A low pass filter also helps to remove illumination variations in the images. To remove illumination variations, the Difference of Gaussians (DoG) filter is frequently used in visible face recognition [10]. A DoG-filtered image is constructed by convolving the original image with a two-dimensional DoG kernel. In visible facial imagery the DoG filter reduces illumination variations, whereas in thermal images it reduces local variations due to the temperature distribution of the face; this makes the thermal image closer to a visible light image and enhances edge information. The σ values of the Gaussian kernels are chosen carefully to balance the positive and negative components for face recognition. The DoG filter is beneficial for methods using local features, as it removes noise and enhances edges, and it can be used on both visible and thermal images by tuning the σ values appropriately.
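A minimal sketch of such a DoG filter, written with OpenCV, is given below; the σ values and the final normalization step are illustrative assumptions and would have to be tuned per modality, as discussed above.

```python
import cv2
import numpy as np

def dog_filter(gray, sigma_inner=1.0, sigma_outer=2.0):
    """Difference of Gaussians: subtract a wider Gaussian blur from a narrower one."""
    img = gray.astype(np.float32)
    g1 = cv2.GaussianBlur(img, (0, 0), sigma_inner)
    g2 = cv2.GaussianBlur(img, (0, 0), sigma_outer)
    dog = g1 - g2
    # Rescale to [0, 255] so the result can be fed to later processing steps.
    dog = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX)
    return dog.astype(np.uint8)

# face = cv2.imread("face_thermal.png", cv2.IMREAD_GRAYSCALE)
# filtered = dog_filter(face, sigma_inner=1.0, sigma_outer=2.0)
```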

2.3 Local Features in Infrared Face Recognition Techniques

In the beginning, researchers adapted visible light face recognition methods to develop thermal face recognition systems. Socolinsky et al. compared visible and IR imaging for appearance-based face recognition algorithms [2]. They considered several analysis tools for measuring performance, such as PCA (principal component analysis), ICA (independent component analysis), and LDA (linear discriminant analysis).
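As a hedged illustration of an appearance-based (eigenface-style) pipeline of this kind, the sketch below projects flattened face images onto principal components with scikit-learn and matches a probe by nearest neighbor; the image sizes and data are synthetic placeholders, not the setup of [2].

```python
import numpy as np
from sklearn.decomposition import PCA

# Appearance-based matching: project flattened face images onto the top
# principal components and compare probes to the gallery in that subspace.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64 * 64))   # 100 enrolled face images (flattened)
probe = rng.normal(size=(1, 64 * 64))       # one query face image

pca = PCA(n_components=50).fit(gallery)
gallery_sub = pca.transform(gallery)
probe_sub = pca.transform(probe)

# Nearest neighbor in the eigenspace gives the predicted gallery index.
distances = np.linalg.norm(gallery_sub - probe_sub, axis=1)
print(int(np.argmin(distances)))
```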

Mendez et al. achieved better performance with infrared face recognition than with visible light imaging [3]. The authors used one hundred dimensions for PCA, ICA, LFA (local feature analysis), and LDA across the different algorithms; they found that recognition performance was worst when the illumination and facial expression of the training and testing sets were not the same.

Ahonen et al. introduced the LBP (local binary pattern)-based face recognition system [4], and afterward, various extensions of the original operator were proposed [11]. LBP is an efficient texture operator that produces features with high recognition ability [12]. LBP is resistant to lighting effects because it is invariant to monotonic gray level transformations. Further investigation of uniform facial regions is recommended for improving feature robustness.
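A minimal sketch of computing an LBP histogram descriptor with scikit-image is shown below; the neighborhood parameters and the single global histogram (rather than the block-wise histograms usually concatenated for faces) are simplifying assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, points=8, radius=1):
    """Uniform LBP codes pooled into a normalized histogram descriptor.
    Expects an 8-bit grayscale face crop (visible or thermal)."""
    codes = local_binary_pattern(gray, P=points, R=radius, method="uniform")
    n_bins = points + 2                      # uniform patterns + one non-uniform bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / max(hist.sum(), 1)

# face = ...                 # 2-D uint8 face crop
# descriptor = lbp_histogram(face)
```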

A dataset of LWIR face images contains 40 frames each of 91 persons under different settings, including frontal pose, speaking action, with or without glasses, and various illumination conditions [13]. Tan et al. conducted a study on this dataset focusing on images with and without glasses. They compared LBP and LDA and found that, on average, the two perform similarly for images without eyeglasses, whereas performance degrades significantly for images with glasses. To compute similarity in the derived feature space, several measures have been used with a nearest neighbor classifier, including the Chi-square test, the log-likelihood statistic, and histogram intersection [10].
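The similarity measures mentioned above can be sketched as follows; this is a generic illustration with hypothetical helper names, not the exact formulation used in [10].

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms (smaller = more similar)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def histogram_intersection(h1, h2):
    """Histogram intersection similarity (larger = more similar)."""
    return np.sum(np.minimum(h1, h2))

def nearest_neighbor(probe_hist, gallery_hists):
    """Index of the gallery histogram closest to the probe under chi-square distance."""
    dists = [chi_square_distance(probe_hist, g) for g in gallery_hists]
    return int(np.argmin(dists))
```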

Several researchers [14,15,16] have carried out comparative studies of many thermal face recognition techniques on the UCH thermal face and Equinox datasets. They found the Weber Local Descriptor (WLD) to be the best performing method on Equinox, both for plain images and for images with glasses. They also found that appearance-based techniques do not perform well, especially with disguised faces or faces with different expressions. On the UCH thermal face dataset, Speeded Up Robust Features (SURF) and the Scale-Invariant Feature Transform (SIFT) show better results under rotations and facial expressions.

Recently, researchers have shown interest in wide baseline matching approaches and achieved significant improvements. In these methods, local interest points are extracted separately from the test image and the reference image and described by invariant descriptors, and the descriptors are then matched to recover the transformation between the two images. Lowe showed that SIFT descriptors, combined with a probabilistic hypothesis rejection approach, provide a strong object recognition method with real-time recognition capability [17]. Additionally, SIFT features can also be used to register thermal and visible light images [18].
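The sketch below, using OpenCV's SIFT implementation, illustrates such interest-point matching with Lowe's ratio test; the ratio threshold and the function name are illustrative assumptions, and a RANSAC step would normally follow to recover the image transformation.

```python
import cv2

def match_sift(img_a, img_b, ratio=0.75):
    """Detect SIFT keypoints in two grayscale images and keep matches that
    pass Lowe's ratio test; the survivors can feed a homography/RANSAC step."""
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)

    good = []
    for pair in candidates:
        # Keep a match only if it is clearly better than the second-best candidate.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return kp_a, kp_b, good
```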

2.4 Fusion Techniques

Image fusion of thermal and visible light images is possible at several levels: image level, feature level, match score level, and decision level. Concatenating the feature vectors of the visible and thermal images is the simplest form of fusion; eigenspace fusion of images has also been proposed. Fusion has also been carried out by transforming images into the wavelet domain [19] before training a 2ν-GSVM, whereas Singh et al. [6, 20] fused images at the image level and domain level.
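As a hedged illustration of the simplest fusion levels described above, the following sketch shows feature-level concatenation and match-score-level weighting; the normalization and weighting choices are assumptions, not those of [6, 19, 20].

```python
import numpy as np

def fuse_feature_level(visible_feat, thermal_feat):
    """Feature-level fusion: L2-normalize each modality's descriptor, then concatenate."""
    v = visible_feat / (np.linalg.norm(visible_feat) + 1e-10)
    t = thermal_feat / (np.linalg.norm(thermal_feat) + 1e-10)
    return np.concatenate([v, t])

def fuse_score_level(score_visible, score_thermal, w=0.5):
    """Match-score-level fusion: weighted sum of per-modality similarity scores."""
    return w * score_visible + (1.0 - w) * score_thermal
```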

Bebis et al. [5] studied the effect of facial occlusion by eyeglasses on thermal face recognition and found that recognition efficiency drops when faces with spectacles are present in the gallery images but not in the probe image, and vice versa. To address this problem, they took advantage of fusion by combining the thermal image with the visible light image. In multi-resolution fusion, features with different spatial extent are fused at the resolution at which they are most salient. A combined multi-scale representation is created by applying specific fusion rules to the multi-scale transforms of the thermal and visible light images [21]. The final fused image is obtained by performing an inverse multi-scale transform.
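A minimal sketch of such a wavelet-domain (multi-scale) fusion, using PyWavelets, is given below; the wavelet, decomposition level, and fusion rules (averaged approximation, maximum-magnitude details) are common illustrative choices rather than the exact rules of [21].

```python
import numpy as np
import pywt

def wavelet_fuse(visible, thermal, wavelet="db2", level=2):
    """Fuse two registered, same-size grayscale images in the wavelet domain.
    Approximation coefficients are averaged; detail coefficients keep the
    value with the larger magnitude."""
    cv = pywt.wavedec2(visible.astype(np.float32), wavelet, level=level)
    ct = pywt.wavedec2(thermal.astype(np.float32), wavelet, level=level)

    fused = [(cv[0] + ct[0]) / 2.0]                      # approximation band: average
    for v_bands, t_bands in zip(cv[1:], ct[1:]):
        fused.append(tuple(
            np.where(np.abs(v) >= np.abs(t), v, t)       # detail bands: max magnitude
            for v, t in zip(v_bands, t_bands)
        ))
    return pywt.waverec2(fused, wavelet)                 # inverse multi-scale transform
```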

2.5 Deep Learning in Face Recognition Techniques

Recently, researchers' interest in deep learning has increased, and it has been applied in various fields such as computer vision, artificial intelligence, and pattern recognition. Many works demonstrate its benefits, and deep learning exhibits a very strong learning ability. Its main advantages for face recognition are that automatic feature learning reduces manual overhead, and that feature extraction/selection and classification are performed in a single step, unlike traditional methods.

A convolutional neural network (CNN) is a type of deep neural network widely used for analyzing visual images. Wu et al. [7] used a CNN for face recognition in thermal imaging. In this work, the RGB-D-T dataset of thermal images is used to train the CNN and learn efficient features of thermal images. The CNN achieves a higher recognition rate than classical methods such as Histograms of Oriented Gradients (HOG), Local Binary Patterns (LBP), and moment invariants. They achieved significant recognition rates in different experimental setups: 98% under head rotations, 99.4% under varying expressions, and 100% under different illumination conditions.
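The sketch below shows a small PyTorch CNN classifier for thermal face crops of the kind described above; the input size, layer widths, and number of identities are assumptions for illustration and do not reproduce the architecture of [7].

```python
import torch
import torch.nn as nn

class ThermalFaceCNN(nn.Module):
    """Small CNN for classifying 64x64 grayscale thermal face crops by identity."""
    def __init__(self, num_identities: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_identities),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# model = ThermalFaceCNN(num_identities=51)    # hypothetical number of subjects
# logits = model(torch.randn(8, 1, 64, 64))    # batch of 8 thermal face crops
```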

In cross-modal thermal face recognition, the probe image is thermal and has to be matched against the visible images stored in the database, which is very challenging because thermal and visible images belong to completely different modalities. Deep Neural Networks (DNNs) have been used for this task, and Deep Perceptual Mapping (DPM) captures the non-linear relation between the two modalities [8]. The network preserves person identity information while learning the non-linear mapping between infrared and visible light images. The mapped descriptors of the visible light image are concatenated to form a single feature vector, which is normalized and then matched with the thermal image vector, constructed in the same way. On a dataset of 4584 visible and thermal images of 82 persons, this approach achieved a rank-1 identification accuracy of 83.73%.
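A hedged sketch of a DPM-style mapping network in PyTorch is given below; the descriptor dimension, hidden width, mapping direction, and training loss are illustrative assumptions rather than the exact configuration of [8].

```python
import torch
import torch.nn as nn

class PerceptualMapping(nn.Module):
    """Regresses a descriptor of one modality from the corresponding descriptor of the
    other modality for the same face patch (descriptor size 128 is an assumption)."""
    def __init__(self, dim: int = 128, hidden: int = 500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, source_desc):
        return self.net(source_desc)

# Training sketch: minimize the distance between mapped descriptors and the
# corresponding target-modality descriptors of the same patches.
# model = PerceptualMapping()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = nn.functional.mse_loss(model(source_batch), target_batch)
# loss.backward(); opt.step()
```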

3 Challenges in Thermal Face Recognition

Thermal face recognition systems measure the temperature of the face, and this temperature may vary with several factors, such as the ambient temperature, fever or other health issues, alcohol consumption, and the presence of eyeglasses.

Eyeglasses are a serious challenge in thermal imaging because thermal sensors cannot measure temperature through glass; in addition, the eyeglasses act as an obstacle between the face and the camera, so the sensor cannot record important information.

The surrounding temperature, the person's mood, and their health status also affect the measured temperature, which may lead to performance degradation, especially for MWIR and LWIR images. The facial temperature of an intoxicated person can change, so recognition efficiency drops significantly.

4 Various Datasets

4.1 ND Collection X1 [22]

This collection was gathered by the University of Notre Dame (UND) to support research on human recognition algorithms. The database contains a total of 2292 pairs of visible and IR frontal facial images captured from 82 subjects between 2002 and 2004. A Merlin uncooled LWIR sensor was used for capturing the thermal images, and high-resolution sensors were used for the visible images.

4.2 Equinox Dataset [13]

This LWIR image dataset, collected by Equinox Corporation, NY, contains 40 frames from 91 persons with three sequences per frame. Frontal as well as left and right lateral settings were used for the external light. Images were captured while the person was speaking, smiling, frowning, and showing a surprised expression, and additional shots were taken with the person wearing eyeglasses.

5 Conclusion

In this study, we reviewed various thermal face recognition works from the literature. We found that both local and global features are used, and that methods based on local features give better recognition efficiency than those based on global features. If both visible light and infrared images are available, methods based on fusion and DNNs can be used to take advantage of both imaging modalities. Medium-wave infrared face recognition performs well in dark environments, and texture- and appearance-based methods also give significant performance in medium-wave infrared face recognition. Long-wave infrared imagery is robust to changes in illumination settings and has low intra-class variation. When thermal images are available for training, cross-spectral matching can be used for recognition in dark and outdoor settings. A recognition system gives better performance if both visible and thermal images are used for learning. Promising directions for future work include Common Representation Learning, Canonical Correlation Analysis, and Deep Neural Networks for face recognition systems.