Abstract
This paper proposes a novel eigenface-based face recognition approach aided by the fusion of visible and thermal face images to improve recognition accuracy. We adopt three fusion schemes wherein the face information is fused using optimal weights obtained by different optimization algorithms. The first two fusion approaches operate in the dual-tree discrete wavelet transform (DT-DWT) domain, while the third operates in the curvelet transform (CT) domain. We employ particle swarm optimization (PSO), self-tuning particle swarm optimization (ST-PSO) and the brain storm optimization algorithm (BSO) to find the optimal fusion coefficients. The proposed fusion-aided face recognition approaches are evaluated through extensive experiments on the OTCBVS benchmark face database using the eigenface detection methodology. Simulation results show that the proposed techniques yield a significant improvement in recognition accuracy, suggesting that fusion-aided face recognition deserves further study and consideration whenever high recognition accuracy is desired.
1 Introduction
Face recognition is one of the most successful and widely applied biometric traits for security purposes [50]. Face recognition based on the visible spectrum has shown good performance when the face images are captured in a controlled environment [20, 42]. However, the performance of such face recognition systems degrades significantly under uncontrolled illumination conditions [21, 32]. The accuracy of face recognition drops quickly when the lighting is dim or when the face is not uniformly illuminated [27]. Thus, face recognition with visible face images in an uncontrolled lighting environment is a challenging task. Face recognition using thermal infrared (IR) imaging sensors has become an area of growing interest [18]. The use of thermal images in face recognition systems has been shown to improve recognition accuracy and to be robust under uncontrolled illumination conditions [48]. However, in such systems the thermal face image is sensitive to ambient temperature changes, which can lead to misidentification [26].
Infrared images can distinguish targets from their backgrounds based on the radiation difference, which works well in all-weather and all-day/night conditions. By contrast, visible images can provide texture details with high spatial resolution and definition in a manner consistent with the human visual system [36, 37]. Therefore, it is desirable to fuse these two types of images, which can combine the advantages of thermal radiation information in infrared images and detailed texture information in visible images.
In our proposed work, we have developed three multi-resolution based fusion schemes to enhance face recognition performance. In the first proposed scheme, the source images are decomposed into high- and low-frequency coefficients through DT-DWT. The reason for choosing a multi-resolution approach is that the high frequencies are relatively independent of global changes in illumination, while the low frequencies take the spatial relationships among the pixels into account and are less sensitive to noise and small changes (e.g., facial expression). Fusion in the multi-resolution domain involves combining the coefficients of the visible and thermal images, and the fused image is obtained by applying the inverse transform to the combined coefficients.
The rest of the paper is organized as follows. Section 2 reviews related work in face recognition. Section 3 presents the preliminaries. The proposed image fusion schemes are described in Section 4. Experimental results and discussion are given in Section 5. Finally, conclusions are drawn in Section 6.
2 Related work
Face recognition is one of the most efficient and widely used biometric modalities today [9]. Face recognition methods can be classified into two main categories: holistic and texture-based methods [58, 59, 64]. In the holistic approach, all the pixels of the entire face image are taken as a single signal and processed to extract the relevant features for classification [12].
Holistic or appearance-based approaches to face recognition involve encoding the entire facial image in a high-dimensional space [29, 64]. It is assumed that all faces are constrained to particular positions, orientations, and scales. The most widely used holistic approaches are the principal component analysis (PCA) [6], linear discriminant analysis (LDA) [55] and a blind source separation technique, called independent component analysis (ICA) [4].
Principal component analysis was used for face recognition by Turk and Pentland [56]; it was later compared with linear discriminant analysis in [39]. Gabor-based kernel PCA with a fractional power polynomial model was used by Liu in [33]. Yang et al. proposed two-dimensional PCA for face recognition [61]. In 2005, locally linear discriminant analysis (LLDA) was used for face recognition [25]. Texture-based approaches rely on the detection of individual facial characteristics and their geometric relationships prior to performing face recognition [40, 51, 64]. Apart from these approaches, face recognition can also be performed using different local regions of the face images [5, 11].
Jiayi Ma et al. [15] proposed a novel fusion algorithm, named gradient transfer fusion (GTF), based on gradient transfer and total variation (TV) minimization. The authors formulated the fusion problem as an ℓ1-TV minimization problem, where the data fidelity term keeps the main intensity distribution of the infrared image and the regularization term preserves the gradient variation of the visible image. The method can simultaneously keep the thermal radiation information of the infrared image and preserve the appearance information of the visible image. The fusion results look like high-resolution infrared images with clearly highlighted targets, which is beneficial for fusion-based target detection and recognition systems.
Recently, the sparse representation based classification (SRC) method has received a lot of attention for face recognition [15]. In SRC, a sparse coefficient vector is introduced in order to represent the test image by a small number of training images, and the SRC model is formulated by jointly minimizing the reconstruction error and the ℓ1-norm of the sparse coefficient vector. The main advantages of SRC, as pointed out in [15], are that (i) it is simple to use without carefully crafted feature extraction, and (ii) it is robust to occlusion and corruption.
Jiayi Ma et al. [15] address the problem of face recognition when there are only a few, or even a single, labelled examples of the face to be recognized. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive, such as bad lighting and the wearing of glasses) and non-linear (i.e., non-additive, pixel-wise, such as expression changes). The small number of labelled examples makes it hard to remove these nuisance variables between the training and testing faces and thus obtain good recognition performance. To address this problem, the authors proposed a method called semi-supervised sparse representation-based classification.
Some other methods proposed for face recognition are: Bayesian inference [41], Elastic Bunch Graph Matching (EBGM) [30], Support Vector Machines (SVM) [43], Linear Discriminant Analysis (LDA) [17], Kernel Methods [49], Neural Networks [31], Local Feature Analysis (LFA) [45]. Parkhi et al. [44] used deep convolutional neural networks for face recognition and this approach achieved results comparable to the state of the art.
Face recognition systems also use transform-domain techniques to address challenges such as illumination compensation and normalisation [10]. The discrete cosine transform (DCT) minimizes illumination variations, is robust, and can be implemented in real time [7]. High-speed face recognition can be implemented by combining DCT with Fisher linear discriminant (FLD) analysis and radial basis function (RBF) neural networks; such a system achieves high recognition rates with fast training and recognition and handles illumination challenges [14]. The 3D discrete wavelet transform (DWT) has been employed for feature extraction in hyper-spectral facial analysis, and the achieved accuracy shows that the 3D DWT method is superior to spatio-spectral classification [16]. The authors of [2] used a multi-resolution transform, the Gabor wavelet transform (GWT), for recognizing facial images collected from the benchmark Yale database. Alaa Eleyan et al. [13] combined wavelets with PCA to improve face recognition accuracy. Hafiz Imtiaz et al. [24] proposed a face recognition approach based on the two-dimensional discrete wavelet transform (2D-DWT), which efficiently exploits the local spatial variations in a face image. Other formal algorithms for feature extraction with multivariate statistical techniques in the complex domain have been fused with deep learning, and the results show an advance over state-of-the-art methods in computer vision and pattern recognition [54].
However, the existing approaches mentioned above operate on either visible or thermal images alone. As noted earlier, face recognition based on visible images performs well when the face images are captured in a controlled environment, but degrades significantly under uncontrolled illumination, dim lighting, or non-uniform illumination of the face. Thermal images improve recognition accuracy and are robust under uncontrolled illumination, but are sensitive to ambient temperature changes, which can lead to misidentification. Hence, considering the complementary information contained in visible and thermal face images, fusion can be used to improve the accuracy of the face recognition task [1, 8, 60].
The image fusion literature on visible and thermal images shows that multi-resolution analysis (MRA) based fusion is efficient and makes it possible to integrate information at different levels of decomposition [47]. Multi-resolution methods provide powerful signal analysis and are widely used in feature extraction. Wavelet transform techniques achieve optimal decomposition without much affecting the image quality, and owing to shift invariance, wavelet-based approaches are among the most robust feature extraction schemes, even under variable illumination. Some of the most popular multi-resolution approaches include the Laplacian pyramid (LAP) [52], gradient pyramid (GRAD), ratio-of-Laplacian pyramid (ROLP) [53], contourlet transform, non-subsampled contourlet transform (NSCT) [28], discrete wavelet transform (DWT), shift-invariant discrete wavelet transform (SIDWT) [62], dual-tree discrete wavelet transform (DT-DWT) [38] and curvelet transform (CT).
In this paper, we propose three optimization-based fusion methods that aid the face recognition problem. The ultimate goal of the paper is to enhance face recognition performance through optimization-based fusion. In the first proposed scheme, the source images are decomposed into high- and low-frequency coefficients through DT-DWT, and particle swarm optimization (PSO) is used to find the optimal weights for combining face information from the thermal and visible images. The fused images are then recognized using the eigenface approach in order to demonstrate the benefits of fusion.
In the second proposed scheme, the source images are again decomposed into high- and low-frequency coefficients through DT-DWT, and self-tuning particle swarm optimization (ST-PSO) is used to find the optimal weights for combining face information from the thermal and visible images. The fused images are then recognized using the eigenface approach.
In the third scheme, the curvelet transform is applied for image decomposition, which preserves edges along curves. To further improve the search for the optimal weight coefficients, the brain storm optimization algorithm is used. The fused images are then recognized using the eigenface approach. In our work, we have used the OTCBVS face database [22] for carrying out experiments with the proposed fusion methods.
3 Preliminaries
3.1 Eigenface detection methodology
In the eigenface detection methodology [57], PCA is applied to the task of face recognition. PCA converts the pixels of a face image into a number of eigen feature vectors, which are used to measure the similarity between two face images. Let the training set of face images be I1, I2, ..., IS. Every training image Ii is represented as a vector Γi, and the mean face vector ψ is computed as follows:

\( \psi =\frac{1}{S}{\sum}_{i=1}^S{\Gamma}_i \)

where S is the number of faces in the training set. Subtracting the mean from the training images gives the mean-shifted image vectors:

\( {\phi}_i={\Gamma}_i-\psi \)
The eigenvectors and eigenvalues of the mean-shifted images are computed from the covariance matrix c:

\( c=\frac{1}{S}{\sum}_{i=1}^S{\phi}_i{\phi}_i^T=A{A}^T \)

where A = [ϕ1, ϕ2, ϕ3, ..., ϕS]. The eigenfaces are defined by computing the eigenface vectors μi of c:

\( {\mu}_i={\sum}_{j=1}^S{v}_{ij}{\phi}_j \)

where j = 1, 2, 3, ..., S and vij are the eigenvectors of ATA. The eigenvectors are ordered in descending order of their corresponding eigenvalues; those with the largest eigenvalues are retained and projected into the eigenface space.
The last step in this method is to classify a given face image. To perform face recognition, the similarity score is calculated between the test image and each of the training images. A given new image Γ is transformed into its eigenface components (projected into face space) by subtracting the mean (ϕ = Γ − ψ) and computing the projection

\( \Omega ={\left[{w}_1,{w}_2,\dots, {w}_S\right]}^T \)

where \( {w}_i={\mu}_i^T\phi \) are the coefficients of the projection, referred to as eigenfeatures. The matched image is the one with the highest similarity score.
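As a concrete illustration, the eigenface training and matching steps above can be sketched in a few lines of NumPy (a minimal sketch on toy data; the function names, toy image size and nearest-neighbour similarity measure are our own illustrative choices):

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: (S, d) array, one flattened face per row; keep k eigenfaces."""
    psi = images.mean(axis=0)                # mean face vector
    A = (images - psi).T                     # d x S matrix of mean-shifted faces
    vals, vecs = np.linalg.eigh(A.T @ A)     # eigenpairs of the small S x S matrix
    order = np.argsort(vals)[::-1][:k]       # largest k eigenvalues first
    U = A @ vecs[:, order]                   # back to image space: mu_i = A v_i
    U /= np.linalg.norm(U, axis=0)           # unit-norm eigenfaces
    return psi, U

def project(psi, U, image):
    """Eigenfeature weights w_i = mu_i^T (image - psi)."""
    return U.T @ (image - psi)

def recognise(psi, U, train_images, test_image):
    """Nearest neighbour in eigenface space (highest similarity)."""
    train_w = np.array([project(psi, U, im) for im in train_images])
    w = project(psi, U, test_image)
    return int(np.argmin(np.linalg.norm(train_w - w, axis=1)))

# toy demo: four random 64-pixel "faces"
rng = np.random.default_rng(0)
faces = rng.random((4, 64))
psi, U = train_eigenfaces(faces, k=3)
assert recognise(psi, U, faces, faces[2] + 0.01 * rng.random(64)) == 2
```

Note the standard trick of diagonalizing the small S × S matrix ATA instead of the d × d covariance matrix, exactly as in the derivation above.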
3.2 Particle swarm optimization algorithm (PSO)
PSO is a population-based optimization technique that finds an optimal solution to the problem within a feasible solution space. PSO is initialized with a population of random solutions, called particles, distributed over the search space. The movement of each particle is updated based on two factors: (i) the personal best position (Pi(t)) that the ith particle has found, and (ii) the global best position (Pg(t)) found by the whole swarm. Each particle updates its velocity (Vi(t)) and position (Xi(t)) as follows:

\( {V}_i\left(t+1\right)=\omega {V}_i(t)+{c}_1{r}_1\left({P}_i(t)-{X}_i(t)\right)+{c}_2{r}_2\left({P}_g(t)-{X}_i(t)\right) \)

\( {X}_i\left(t+1\right)={X}_i(t)+{V}_i\left(t+1\right) \)

where ω is the inertia weight that controls the convergence of PSO. The random parameters r1 and r2 control the diversity of the population. The acceleration coefficients c1 and c2 take fixed values (i.e., c1 = c2 = 2) that balance the influence of Pi(t) and Pg(t). The current global best replaces the previous iteration's Pg(t) if it has a better fitness value. The process is repeated until the maximum number of iterations is reached.
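The velocity and position updates can be sketched directly in NumPy (a minimal sketch; the inertia weight, swarm size and toy sphere objective below are illustrative choices, not values from the paper):

```python
import numpy as np

def pso_step(X, V, P, Pg, w=0.7, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration. X, V, P: (N, D) positions, velocities, personal bests;
    Pg: (D,) global best. Returns updated positions and velocities."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(X.shape), rng.random(X.shape)  # diversity factors
    V = w * V + c1 * r1 * (P - X) + c2 * r2 * (Pg - X)
    return X + V, V

# minimise a toy sphere function f(x) = sum(x^2)
rng = np.random.default_rng(1)
f = lambda x: np.sum(x ** 2, axis=-1)
X = rng.uniform(-5, 5, (20, 2))
V = np.zeros_like(X)
P = X.copy()
Pg = P[np.argmin(f(P))].copy()
f_start = f(Pg)
for _ in range(30):
    X, V = pso_step(X, V, P, Pg, rng=rng)
    better = f(X) < f(P)
    P[better] = X[better]              # update personal bests
    Pg = P[np.argmin(f(P))].copy()     # update global best
assert f(Pg) <= f_start                # personal/global bests never worsen
```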
3.3 Self-tuning particle swarm optimization (ST-PSO)
PSO can easily locate nearly optimal solutions with a fast convergence speed, but it fails to adjust the acceleration coefficients, leading to premature convergence; the algorithm tends to perform poorly when the acceleration coefficients (c1 and c2) are fixed. Following [38], dynamically varying acceleration coefficients are introduced to improve the search ability and avoid premature convergence. The modified acceleration coefficients are represented as follows:

\( {c}_1={c}_{1 iv}+\left({c}_{1 fv}-{c}_{1 iv}\right)\frac{N_m}{N_{max}} \)

\( {c}_2={c}_{2 iv}+\left({c}_{2 fv}-{c}_{2 iv}\right)\frac{N_m}{N_{max}} \)

where Nm is the current iteration, Nmax is the maximum number of iterations, and the constants (c1iv, c1fv) and (c2iv, c2fv) make c1 decrease linearly from 2.5 to 0.5 while c2 increases linearly from 0.5 to 2.5. The objective of ST-PSO is to avoid premature convergence.
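A linear schedule is one common realisation of such time-varying acceleration coefficients. The sketch below follows the stated ranges ([2.5, 0.5] for c1 and [0.5, 2.5] for c2); the exact schedule used in [38] may differ:

```python
def st_pso_coefficients(iteration, max_iter,
                        c1_init=2.5, c1_final=0.5,
                        c2_init=0.5, c2_final=2.5):
    """Linearly decrease c1 (exploration) and increase c2 (exploitation)."""
    t = iteration / max_iter
    c1 = c1_init + (c1_final - c1_init) * t
    c2 = c2_init + (c2_final - c2_init) * t
    return c1, c2

assert st_pso_coefficients(0, 100) == (2.5, 0.5)     # early: explore
assert st_pso_coefficients(100, 100) == (0.5, 2.5)   # late: exploit
```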
3.4 Brain storm optimization algorithm (BSO)
Shi proposed BSO [19] in 2011 by modelling the human brainstorming process and creatively mapping it to the optimization field. In BSO, each position within the solution space is called an idea and is randomly initialized. During each generation, the ideas are grouped into clusters using k-means clustering, and the idea with the best fitness in each cluster is selected as the cluster center. To avoid premature convergence and improve the search efficiency, a randomly selected cluster center can be replaced by a newly generated individual with probability pr. To generate a new idea, one cluster or two clusters are randomly selected with pre-determined probabilities (p1, p2). If a new idea is generated from one existing idea, it is produced by Eq. (6):

\( {X}_{new}^t={X}_{old}^t+\xi \cdot N\left(\mu, \sigma \right) \)

where \( {X}_{new}^t \) and \( {X}_{old}^t \) are the tth dimensions of Xnew and Xold, respectively, N(μ, σ) represents the Gaussian distribution with mean μ and variance σ, and ξ is the regulatory factor that controls the convergence speed, defined as

\( \xi =\operatorname{logsig}\left(\frac{0.5\cdot {N}_{max}-{N}_m}{K}\right)\cdot \operatorname{rand}\left(0,1\right) \)

where Nmax is the maximum number of iterations, Nm is the current iteration number and K is a scale factor (K = 20). If the new idea is generated from two existing ideas, the selected ideas are first combined as

\( {X}_{old}={w}_1\cdot {X}_{i1}+\left(1-{w}_1\right)\cdot {X}_{i2} \)

where w1 is the weight of the selected ideas and Xi1, Xi2 are the two randomly selected ideas. After a new idea is generated, its quality is evaluated by the fitness function; if the new idea is better than the old one, the old idea is replaced. This process is repeated for all ideas until the maximum number of iterations is reached, and the best idea is output as the optimal solution to the problem.
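The idea-generation step above can be sketched as follows (a minimal sketch: the k-means clustering loop is omitted, and the helper names are our own):

```python
import numpy as np

def bso_xi(n_cur, n_max, K=20.0, rng=None):
    """Regulatory factor: logsig((0.5*Nmax - Nm)/K) * rand, shrinking over time."""
    rng = rng or np.random.default_rng()
    return 1.0 / (1.0 + np.exp(-(0.5 * n_max - n_cur) / K)) * rng.random()

def bso_new_idea(x_old, xi, rng):
    """Perturb an existing idea with Gaussian noise scaled by xi."""
    return x_old + xi * rng.normal(0.0, 1.0, size=x_old.shape)

def bso_combine(x1, x2, w1, xi, rng):
    """New idea from two existing ideas: weighted blend, then Gaussian step."""
    return bso_new_idea(w1 * x1 + (1.0 - w1) * x2, xi, rng)

rng = np.random.default_rng(0)
xi = bso_xi(n_cur=10, n_max=100, rng=rng)
assert 0.0 <= xi <= 1.0                   # logsig and rand are both in [0, 1]
idea = bso_combine(np.zeros(2), np.ones(2), w1=0.5, xi=0.0, rng=rng)
assert np.allclose(idea, [0.5, 0.5])      # with xi = 0, just the weighted blend
```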
4 Proposed optimized image fusion framework for face recognition
In our proposed work, we have developed three multi-resolution based fusion schemes to enhance face recognition performance. The visible (V) and thermal (T) face images are captured with different cameras, so the images have different fields of view and spatial resolutions. The thermal images are registered using an affine transform, taking the visible image as the base image [34, 35]. After registration, the source images are decomposed using the chosen transform and fused using the optimal weights obtained by the optimization algorithm. The fused images are then used for training and recognition with the eigenface detection methodology.
In the first proposed scheme, the source images are decomposed into high- and low-frequency coefficients through DT-DWT. The reason for choosing a multi-resolution approach is that the high frequencies are relatively independent of global changes in illumination, while the low frequencies take the spatial relationships among the pixels into account and are less sensitive to noise and small changes (e.g., facial expression). Fusion in the multi-resolution domain involves combining the coefficients of the visible and thermal images, and the fused image is obtained by applying the inverse transform to the combined coefficients.
The key question in implementing this idea is how to optimally combine the coefficients from each spectrum. Using an unweighted average is not appropriate, since it assumes that the two spectra are equally important (weight = 0.5 for both images). George Bebis et al. [3] employed a genetic algorithm (GA) to find an optimal fusion strategy for combining information from thermal and visible images. Gabriel et al. [19] also used a genetic algorithm to choose optimal face areas where one spectrum is more representative than the other. Genetic algorithms require additional operations such as crossover and mutation, which are time-consuming; moreover, genetic algorithms can get stuck in local optima.
In the first scheme, PSO is used to find the optimal weights to combine face information from the thermal and visible images. PSO was chosen because it has lower time complexity than GA, being free of the complex crossover and mutation operations. PSO can locate nearly optimal solutions with fast convergence, but it usually fails to adjust its acceleration coefficients, which often leads to premature convergence [38].
In the second scheme, we modify PSO by introducing dynamically varying acceleration coefficients to improve the global search ability and avoid premature convergence. This modified version of PSO is named self-tuning particle swarm optimization (ST-PSO). ST-PSO is employed to find the optimal weights used to combine information from the thermal and visible face images.
Edges in the face images need to be properly synthesized in the fused image in order to improve face recognition accuracy, but DT-DWT fails to preserve edges along curves. Therefore, in the third scheme, the curvelet transform is applied for image decomposition, which preserves edges along curves. To further improve the search for the optimal weight coefficients, the brain storm optimization algorithm is used. Fig. 1 illustrates the steps involved in the proposed work.
The fused images are recognized using the eigenface approach for the purpose of demonstrating the benefits of fusion. The projected eigenface space is constructed from the training face images. The similarity score between the test image and each training image is calculated, and the matched image is the one with the highest similarity score. Recognition performance is computed by finding the percentage of images in the test set for which the top match is an image of the same person from the training images. Experimental results show that the proposed image fusion scheme is a viable approach for enhancing face recognition performance.
4.1 Algorithm 1: Image fusion through DT-DWT and PSO for face recognition
-
Step 1:
The visible and thermal images are resampled to a common size (m × n), since the DT-DWT operates only on images whose dimensions are powers of two. Hence, we generated 128 × 128 images using bi-cubic interpolation.
-
Step 2:
The images T and V are registered using affine transformation in order to spatially align the images.
-
Step 3:
The images T and V are decomposed into low and high frequency components using DT-DWT.
-
Step 4:
The coefficients of the T and V face images are combined using the fusion rule (FR)

\( {C}_F\left(i,j\right)={w}_1\cdot {C}_T\left(i,j\right)+{w}_2\cdot {C}_V\left(i,j\right) \)

where CT and CV are the decomposed coefficients of the thermal and visible images, and w1 and w2 determine the percentage of each image's coefficients in the fused image. PSO is used to obtain the optimal weights that maximize the entropy and minimize the root mean square error. The procedure for obtaining the optimal weights is given below.
In our work, image fusion is formulated as an optimization problem. The set of solutions is defined as a set of N particles (weights) w = (w1, w2)T ∈ A, which should maximize the entropy E of the fused image:

\( E=-{\sum}_{j=0}^{255}p(j){\log}_2p(j) \)

where p(j) is the probability of occurrence of the jth intensity in the fused image. The solution set w should also minimize the root mean square error between the fused image F and a source image R:

\( RMSE=\sqrt{\frac{1}{mn}{\sum}_{i=1}^m{\sum}_{j=1}^n{\left(F\left(i,j\right)-R\left(i,j\right)\right)}^2} \)

The overall objective function combines the two criteria with constants α1 and α2, whose values indicate the relative significance of each objective; in this work we choose α1 = α2 = 0.5. The solution that gives the maximum entropy and minimal RMSE is taken as the global best. After the maximum number of iterations is reached, the global best value is used to obtain the final fused image. Fig. 1 illustrates the steps involved in the proposed work.
Entropy and RMSE are chosen as objective functions because maximizing the information content (entropy) while keeping the fused image faithful to the sources (low RMSE) indicates the quality of the fused image.
-
Step 5:
The fused image (F) is obtained by applying the inverse DT-DWT to the fused coefficients:

\( F={T}^{-1}\left({C}_F\right) \)

where T−1 is the inverse DT-DWT.
-
Step 6:
The fused face images are recognized using the eigenface detection methodology. As shown in Fig. 1, the mean vector (ψ) of the training face images and the mean-shifted image vectors (ϕi) are calculated using Eqs. (9) and (10).
-
Step 7:
Calculate the eigenvectors and eigenvalues of the mean-shifted images from the covariance matrix c. The eigenvectors are ordered in descending order of their corresponding eigenvalues; those with the largest eigenvalues are retained and projected into the eigenface space.
-
Step 8:
The last step is to classify a face image. To perform face recognition the similarity score is calculated between the test image and each of the training images. The matched image is the one with the highest similarity score.
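The weight-fitness evaluation used in Step 4 can be sketched as follows. The entropy and RMSE definitions below are standard; the exact way the paper combines them with α1 and α2 is not spelled out in the text, so the signed combination, like the helper names, is an illustrative assumption:

```python
import numpy as np

def entropy(img):
    """Shannon entropy from the 256-bin grey-level histogram of an image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def rmse(fused, reference):
    """Root mean square error between two equally sized images."""
    d = np.asarray(fused, float) - np.asarray(reference, float)
    return float(np.sqrt(np.mean(d ** 2)))

def fitness(fused, visible, thermal, a1=0.5, a2=0.5):
    """Reward information content, penalise deviation from the two sources."""
    avg_rmse = 0.5 * (rmse(fused, visible) + rmse(fused, thermal))
    return a1 * entropy(fused) - a2 * avg_rmse

rng = np.random.default_rng(0)
v = rng.integers(0, 256, (8, 8))
t = rng.integers(0, 256, (8, 8))
flat = np.full((8, 8), 128)
assert entropy(flat) == 0.0          # a constant image carries no information
assert rmse(v, v) == 0.0
assert entropy(v) > entropy(flat)
```

PSO (or ST-PSO/BSO) would evaluate such a fitness for each candidate weight pair (w1, w2) and keep the best-scoring pair as the global best.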
4.2 Algorithm 2: Image fusion through DT-DWT and ST-PSO for face recognition
-
Step 1:
Thermal and visible images are resampled using bi-cubic interpolation to a size of 128 × 128.
-
Step 2:
The images are registered using an affine transform to spatially align them.
-
Step 3:
The registered images are decomposed using DT-DWT and fused using the optimal weights obtained from ST-PSO, following Step 4 of Section 4.1. The optimal weights are obtained by substituting the solution-set values into Eq. (10); the solution that gives the maximum fitness value at the end of the maximum number of iterations is taken as the optimal value.
-
Step 4:
The inverse DT-DWT is applied to the fused coefficients to obtain the final fused image.
-
Step 5:
To perform face recognition on the fused images, Steps 6 to 8 of Section 4.1 are followed.
4.3 Algorithm 3: Image fusion through Curvelet and BSO for face recognition
-
Step 1:
As in Schemes I and II, the thermal and visible images are resampled using bi-cubic interpolation to a size of 128 × 128. The curvelet transform can operate on images of any size; resampling is performed here only to keep a consistent image size across all the proposed fusion approaches.
-
Step 2:
After image resampling, image registration is performed. The DT-DWT has good reconstruction and shift-invariance properties, but it does not handle edges effectively. The curvelet transform effectively captures edges along curves, which can improve the face recognition accuracy.
-
Step 3:
Optimal weights are obtained using the BSO formulation by substituting the idea set into Eq. (15). The idea that gives the maximum fitness value at the end of the maximum number of iterations is taken as the global best value.
-
Step 4:
The fused image (F) is obtained by applying the inverse curvelet transform to the fused coefficients:

\( F={T}^{-1}\left({C}_F\right) \)

where T−1 is the inverse curvelet transform.
-
Step 5:
To perform face recognition on the fused images, Steps 6 to 8 of Section 4.1 are followed.
5 Experimental results and discussion
5.1 OTCBVS-dataset
In our experiments, we used the OTCBVS face database, a standard benchmark of thermal and visible images for face recognition techniques [23]. OTCBVS consists of 700 visible and 700 thermal images of 16 persons. The images were taken at different times and contain variability in illumination, facial expression (open/closed eyes, smiling/not smiling), pose (upright, frontal position) and facial details (glasses/without glasses). From this database, 400 images of 10 persons are used: 200 thermal images and 200 visible images. 20% of the images are used as the training set and 80% as the testing set.
5.2 Parameter settings
The proposed image fusion based face recognition techniques are compared with several existing image fusion based face recognition techniques, namely the Laplacian pyramid (LAP) [52], ratio-of-Laplacian pyramid (ROLP) [53], gradient pyramid (GRAD) [47], shift-invariant discrete wavelet transform (SIDWT) [62] and non-subsampled contourlet transform (NSCT) [28]. All techniques are implemented in Matlab R2015. In this work, DT-DWT and the curvelet transform are used for image decomposition. The number of decomposition levels is set to 6 for DT-DWT, 5 for the curvelet transform, 4 for NSCT, and 2 for LAP, ROLP, GRAD and SIDWT. For the state-of-the-art methods, the high-frequency components are fused using the maximum selection rule, whereas the low-frequency components are fused using the average fusion rule. The parameters selected for PSO, ST-PSO and BSO are listed in Tables 1 and 2.
5.3 Face recognition accuracy
The visible and thermal face images are pre-processed prior to recognition: they are converted into grayscale images, and the thermal images are registered to the visible images. The images are then fused using the proposed image fusion schemes and the state-of-the-art image fusion techniques. The fused face images are recognized using the eigenface recognition methodology. The projected eigenface space is constructed from the training face images, and the similarity score between the test image and each training image is calculated; the matched image is the one with the highest similarity score. Recognition performance is computed by finding the percentage of images in the test set for which the top match is an image of the same person from the training images. The recognition ratio (R) is computed as follows:

\( R=\frac{1}{N}{\sum}_{i=1}^N{f}_i\times 100 \)

where N is the number of images in the test set, fi = 1 if the topmost match from the training set belongs to the same person, and fi = 0 otherwise.
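The recognition-ratio computation is a direct tally of correct top matches (function and variable names below are our own):

```python
def recognition_ratio(top_matches, true_labels):
    """R = (1/N) * sum(f_i) * 100, with f_i = 1 when the top match is correct."""
    hits = sum(1 for m, t in zip(top_matches, true_labels) if m == t)
    return 100.0 * hits / len(true_labels)

# e.g. 3 of 4 test images matched to the right person
assert recognition_ratio([1, 2, 2, 4], [1, 2, 3, 4]) == 75.0
```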
5.4 Evaluation metrics
Evaluating the quality of the fused image is a challenging task, as no reference image is available against which to compare the fusion results. Researchers have proposed several quality metrics to assess the quality of such an image. Zheng Liu et al. [63] grouped twelve quality metrics into four categories, of which mutual information (MI), the Petrovic metrics, and spatial frequency (SF) are considered in the performance analysis of the proposed work. The first two metrics fall under information theory, as image fusion aims to combine information content, and they do not require a reference image. Other quality metrics, such as fusion symmetry (FS) and the correlation coefficient (CC), are also used in the proposed system. The metrics are defined and computed as follows:
-
1)
Mutual information: MI quantifies the mutual dependence between the source and fused images and is given by

\( M{I}_F^{AB}=M{I}_{AF}+M{I}_{BF} \)

\( M{I}_{AF}={\sum}_{m,n}{h}_{AF}\left(m,n\right){\log}_2\frac{h_{AF}\left(m,n\right)}{h_A(m)\,{h}_F(n)} \)

where MIAF and MIBF are the mutual information between the source images A, B and the fused image F, hAF(m, n) is the joint probability distribution function of A and F, and hA(m) and hF(n) are the marginal probability distribution functions of A and F, respectively.
-
2)
Petrovic metrics: QAB/F computes the amount of edge information transferred from the source images to the fused image, LAB/F computes the loss of edge information, and NAB/F computes the artefacts (noise) introduced into the fused image by the fusion process. The procedure for computing QAB/F, LAB/F and NAB/F given in [46] is adopted in our work to compute the Petrovic metrics.
-
3)
Fusion symmetry (FS) quantifies the symmetry of the fused image with respect to the two source images and is computed by

\( FS=2-\left|\frac{M{I}_{AF}}{M{I}_{AF}+M{I}_{BF}}-0.5\right| \)
A higher value of FS denotes better performance of the fusion system.
-
4)
Spatial frequency (SF) is used to measure the activity level in an image; a large SF value indicates a clear image. The spatial frequency is computed as

\( SF=\sqrt{R{F}^2+C{F}^2} \)

where RF is the row frequency, given by

\( RF=\sqrt{\frac{1}{MN}{\sum}_{i=1}^M{\sum}_{j=2}^N{\left(F\left(i,j\right)-F\left(i,j-1\right)\right)}^2} \)

and CF is the column frequency, given by

\( CF=\sqrt{\frac{1}{MN}{\sum}_{i=2}^M{\sum}_{j=1}^N{\left(F\left(i,j\right)-F\left(i-1,j\right)\right)}^2} \)
-
5)
The correlation coefficient (CC) computes the relevance of the fused image to the source images and is defined by

\( CC=\frac{r_{AF}+{r}_{BF}}{2} \)

where

\( {r}_{AF}=\frac{\sum_{i,j}\left(a\left(i,j\right)-\overline{A}\right)\left(f\left(i,j\right)-\overline{F}\right)}{\sqrt{\sum_{i,j}{\left(a\left(i,j\right)-\overline{A}\right)}^2{\sum}_{i,j}{\left(f\left(i,j\right)-\overline{F}\right)}^2}} \)

and rBF is defined analogously for the second source image. Here \( \overline{A} \), \( \overline{B} \) and \( \overline{F} \) are the average pixel intensities of the source and fused images, and a(i, j), b(i, j) and f(i, j) represent the pixel intensities at (i, j) of the source and fused images, respectively.
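Several of these metrics can be sketched compactly in NumPy (the MI estimate is histogram-based with an assumed 256-bin choice; helper names are our own):

```python
import numpy as np

def mutual_information(a, f, bins=256):
    """MI between a source image and the fused image, estimated from the
    joint grey-level histogram."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(f), bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)      # marginal of the source
    pf = p.sum(axis=0, keepdims=True)      # marginal of the fused image
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (pa @ pf)[mask])))

def fusion_mi(a, b, f):
    """MI metric: MI_AF + MI_BF."""
    return mutual_information(a, f) + mutual_information(b, f)

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row/column first differences."""
    f = np.asarray(img, dtype=float)
    rf2 = np.mean((f[:, 1:] - f[:, :-1]) ** 2)   # row frequency squared
    cf2 = np.mean((f[1:, :] - f[:-1, :]) ** 2)   # column frequency squared
    return float(np.sqrt(rf2 + cf2))

def correlation_coefficient(a, b, f):
    """CC = (r_AF + r_BF) / 2, Pearson correlation with each source."""
    def corr(x, y):
        dx = np.ravel(x) - np.mean(x)
        dy = np.ravel(y) - np.mean(y)
        return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
    return 0.5 * (corr(a, f) + corr(b, f))

rng = np.random.default_rng(0)
a = rng.integers(0, 256, (16, 16)).astype(float)
b = rng.integers(0, 256, (16, 16)).astype(float)
fused = 0.5 * (a + b)
assert fusion_mi(a, b, fused) > 0.0                 # fused image shares info with both
assert spatial_frequency(np.full((8, 8), 42.0)) == 0.0   # a flat image has no activity
assert abs(correlation_coefficient(a, a, a) - 1.0) < 1e-9
```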
5.5 Performance of various image fusion algorithms
We conducted the experiments in four ways: (i) the first experiment uses all types of images (varying facial expression, illumination and eyeglasses) as the training dataset; (ii) the second experiment uses only images with varying facial expressions as the training dataset; (iii) the third experiment uses face images under varying illumination conditions; and (iv) the last experiment contains the face images with eyeglasses.
The sample input visible and thermal images for the overall test and the detailed tests are given in Figs. 2, 3, 4, 5, 6 and 7. Fusion results of the various image fusion algorithms are given in Figs. 8 and 9. The LAP and ROLP fusion methods do not effectively capture the eye region of the face; moreover, features such as the nose and ears are not effectively synthesized.
The edges of the nose and ears are not completely reproduced compared with the GRAD-based fusion approach. Compared with LAP and ROLP, the GRAD-based fusion method captures the eye features better, but the teeth in the face image lose their contrast. The reason is that the visible image gives a clear representation of the eyes and teeth under bright illumination, whereas thermal images lack these sharp features. On the other hand, thermal images provide the same features regardless of whether they are captured in the presence or absence of light.
The visibility of the eye features and the sharpness of the nose are better in SIDWT, but the contrast of the teeth is not preserved. The results of NSCT are better than those of SIDWT. Compared with the pyramid-based approaches, the wavelet-based fusion approaches give better fusion results. The proposed image fusion scheme I, based on the dual tree discrete wavelet transform and PSO, effectively captures the eye and teeth features from the visible images under well-illuminated conditions and from the thermal images in the absence of light.
Compared with LAP, ROLP, GRAD and NSCT, the proposed scheme I more effectively synthesizes the facial features from the thermal and visible images. The reason is that the proposed algorithm is based on a multiresolution approach, so details that are missing at one level can be acquired at another level. The coarse details of the image are effectively fused by the DT-DWT, and the decomposed coefficients of the thermal and visible images are fused using the optimal weights determined by PSO, which improves the quality of image interpretation. However, in the last face image of Fig. 9a, the person with eyeglasses is not effectively fused in the result of the proposed scheme I: the left eye behind the eyeglass is not visible. That feature is effectively captured by the proposed image fusion scheme II.
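The weighted coefficient fusion described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it operates on raw pixels instead of DT-DWT subbands, and uses a bare-bones PSO that searches for a single weight w maximising the spatial frequency of the fused image as a stand-in fitness function (all function and parameter names are ours).

```python
import numpy as np

def spatial_frequency(img):
    """Row/column gradient energy, used here as the PSO fitness."""
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)

def pso_fusion_weight(visible, thermal, n_particles=10, n_iters=30, seed=0):
    """Search w in [0, 1] so that F = w*visible + (1-w)*thermal
    maximises the fitness (here: spatial frequency of the fused image)."""
    rng = np.random.default_rng(seed)
    fitness = lambda w: spatial_frequency(w * visible + (1 - w) * thermal)
    pos = rng.random(n_particles)   # candidate weights
    vel = np.zeros(n_particles)     # particle velocities
    pbest = pos.copy()
    pbest_fit = np.array([fitness(w) for w in pos])
    gbest = pbest[np.argmax(pbest_fit)]
    for _ in range(n_iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # standard PSO update: inertia + cognitive + social terms
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(w) for w in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)]
    return gbest
```

In the actual fusion schemes, the same kind of search would be run over the DT-DWT (or Curvelet) coefficients with the paper's fitness criterion, rather than over raw pixels with spatial frequency.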
The Curvelet and brain storm optimization based image fusion gives better results than all the other methods: the fused images have high contrast and sharp edge and nose features. The reason is that the Curvelet transform preserves edges along curves, and face images naturally contain many curves, which are effectively synthesized in the fused images and yield a better image representation. From Figs. 10, 11, 12 and 13 we can observe that the proposed image fusion algorithms effectively combine the information from the thermal and visible images in the varying-illumination, expression and eyeglass tests.
The quantitative analysis, based on various image fusion quality metrics for the fused images produced by the fusion algorithms, is presented in Table 3. Among the quality metrics, a high value for SF, MIF, FS, CC and QAB/F and a low value for RMSE, LAB/F and NAB/F indicate good quality of the fused image. The high value of QAB/F for the proposed algorithm indicates that more edge information has been transferred to the fused image. It can also be observed that the low value of LAB/F indicates only minimal information loss compared with the other methods. The low value of NAB/F indicates that the proposed method introduces minimal artefacts into the fused image, whereas LAP introduces more artefacts than all of the other methods.
5.6 Face recognition performance of fused images for overall dataset
The overall dataset tests had varying success, as shown in Table 4 and Figs. 8, 9 and 10. Face recognition using visible images gives a recognition accuracy of 80.00%. In general, fusion led to improved recognition performance compared with visible images.
The face recognition accuracies (Table 4) using the fusion methods LAP, ROLP, GRAD, SIDWT, NSCT, Ma et al. [15] and Parkhi et al. [44] are 90.50%, 90.21%, 90.80%, 92.32%, 93.54%, 94.00% and 93.98%, respectively. The authors thank Ma et al. [15] for posting their code on GitHub (https://github.com/jiayi-ma/S3RC) and Parkhi et al. [44] for posting their code on their website (http://www.robots.ox.ac.uk/~vgg/software/vgg_face/). With recognition accuracies of 94.17%, 94.50% and 96.00%, the three proposed fusion methods outperform all of the other methods.
5.7 Face recognition performance based on different facial expressions
The facial expression tests had varying success, as shown in Table 4 and Figs. 10, 11 and 13. Face recognition using visible images gives a recognition accuracy of 85.32%. In general, fusion led to improved recognition performance compared with recognition in the visible spectrum. Comparing thermal images with fusion, the thermal images sometimes performed better than fusion and vice versa; the reason is that undesired illumination effects present in the visible images are carried into the fused image. Among all the methods, the proposed Curvelet and BSO based fusion approach gives the best recognition accuracy (90.90%).
5.8 Face recognition performance under varying illumination conditions
The illumination tests had varying success, as shown in Table 4 and Figs. 14, 15, 16 and 17. Face recognition using visible images gives a recognition accuracy of 85.32%. In general, fusion led to improved recognition performance compared with recognition in the visible spectrum. Comparing thermal images with fusion, the thermal images sometimes performed better than fusion and vice versa; the reason is that undesired illumination effects present in the visible images are carried into the fused image.
Among all the methods, the proposed Curvelet and BSO based fusion approach gives the best recognition accuracy (90.90%). From Fig. 17 we can observe that, under varying illumination conditions, the fused images give better recognition accuracy (LAP 85.30%, ROLP 86.74%, GRAD 86.52%, SIDWT 87.88% and NSCT 84.00%) than recognition in the visible spectrum. Recognition in the visible spectrum was not satisfactory, while recognition using the proposed fused images had performance comparable to that in the thermal spectrum. The recognition accuracies using the three proposed fused images are 90.65%, 91.23% and 90.34%, respectively.
5.9 Face recognition performance with eyeglasses
Face recognition for images with eyeglasses using IR images gives poor recognition performance (62.67%). The reason is that eyeglasses are opaque to thermal infrared radiation, so the region behind the lenses cannot be effectively captured in thermal images. Our experimental results illustrate clearly that IR is robust to illumination changes but performs poorly when glasses are present in the face image (Fig. 17). From Table 4 and Fig. 12 we can observe that considerable improvement is achieved in this case by fusing the IR and visible images in the Curvelet and dual tree discrete wavelet transform domains.
We also analysed the PSO, ST-PSO and BSO solutions to understand which parts of the face are encoded by IR features and which by visible features. The eye and teeth regions were optimally combined mostly using features from the visible spectrum, whereas the head regions were optimally combined using features from the thermal spectrum.
5.10 Processing time
The computational efficiency of the different fusion methods is compared here. In our experiments, all five test methods are implemented in MATLAB R2015 on a computer with a 3.0 GHz CPU and 4 GB RAM. The average running times of the different fusion methods are listed in Table 5. The GRAD method has high computational efficiency, whereas LAP and SIDWT take 7 s to fuse the source images. The proposed method takes 100 s to complete the fusion process. We believe that with a more efficient implementation, such as one in C++, the running time can be reduced considerably.
6 Conclusion
We presented and compared three different fusion schemes for combining thermal and visible imagery for the purpose of face recognition. IR images are more robust to varying illumination conditions but give poor performance when eyeglasses are present in the face images. The proposed swarm intelligence based fusion methodology is general enough to be applied in these cases, as well as to improve recognition performance when neither the thermal nor the visible images are very reliable.
Several interesting conclusions can be drawn by considering these results.
(i) As expected, face recognition in the thermal images is not influenced by illumination changes. However, thermal images yielded very low success when eyeglasses were present in the face images.
(ii) Illumination changes had an important influence on the success of face recognition in the visible domain. Illumination changes also affect the fused images, since the fusion was not able to completely discard the undesired illumination effects present in the visible images.
(iii) The success of face recognition using fused images implies that fusion reduced the sensitivity to both eyeglasses and illumination changes.
(iv) Among the three proposed fusion schemes, fusion in the Curvelet domain yielded the highest recognition performance overall.
References
Ahmad A, Riaz MM, Ghafoor A, Zaidi T (2016) Noise resistant fusion for multi-exposure sensors. IEEE Sensors J 16(13):5123–5124
Arivazhagan S, Mumtaj J (2007) Face Recognition using Multi-Resolution Transform. International Conference on Computational Intelligence and Multimedia Applications, 301–306
Bebis G, Gyaourova A, Singh S, Pavlidis I (2006) Face recognition by fusing thermal infrared and visible imagery. Image Vis Comput 24:727–742
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs.fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
BenAbdelkader C, Griffin P (2005) A local region-based approach to gender classification from face images. Proc. IEEE Comp. Society Conf. Computer Vision and Pattern Recognition, pages 52–57
Brunelli R, Poggio T (1993) Face recognition: features versus templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1052
Chen W, Er MJ, Wu S (2006) Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain. IEEE Trans Syst Man Cybern B Cybern 36(2):458–466
Chen YL, Jahanshahi MR, Manjunatha P, Gan WP, Abdelbarr M, Masri SF, Becerik-Gerber B, Caffrey JP (2016) Inexpensive multimodal sensor fusion system for autonomous data acquisition of road surface conditions. IEEE Sensors J 16(21):7731–7743
Choudhary A, Vig R (2018) Face Recognition Using Multiresolution Hybrid Kekre-DCT Wavelet Transform Features with Multiclass ECOC Framework. Proc Comp Sci 132:1781–1787
Dakin SC, Watt RJ (2009) Biological 'bar codes' in human faces. World Acad Sci Eng Technol 9:1–10
Ekenel HK, Sankur B (2005) Multiresolution face recognition. Image Vis Comput 23:469–477
Eleyan A, Demirel H (2007) Face Recognition using Multiresolution PCA. IEEE International Symposium on Signal Processing and Information Technology, 52–55.
Er MJ, Chen W, Wu S (2005) High-speed face recognition based on discrete cosine transform and RBF neural networks. IEEE Trans Neural Netw 16(3):679–691
Gao Y, Ma J (2017) Semi-Supervised Sparse Representation Based Classification for Face Recognition with Insufficient Labeled Samples. IEEE Transactions on Image Processing
Ghasemzadeh A, Demirel H (2017) 3D discrete wavelet transform-based feature extraction for hyperspectral face recognition. IET Biometrics 7(1):49–55
Grotschel M, Lovász L (1993) Combinatorial optimization: a survey, DIMACS technical report 93–29, Princeton University. In Internet
Guzman AM, Goryawala M, Wang J, Barreto A, Andrian J, Rishe N, Adjouadi M (2013) Thermal imaging as a biometrics approach to facial signature authentication. IEEE J Biomed Health Inf 17(1):214–222
Hermosilla G, Gallardo F, Farias G, San Martin C (2015) Fusion of visible and thermal descriptors using genetic algorithms for face recognition systems. Sensors 15:17944–17962. https://doi.org/10.3390/s150817944
Hizem W, Allano L, Mellakh A, Dorizzi B (2009) Face recognition from synchronised visible and near-infrared images. IET Signal Process 3(4):282–288
Hollingsworth KP, Darnell SS, Miller PE, Woodard DL, Bowyer KW, Flynn PJ (2012) Human and Machine Performance on Periocular Biometrics Under Near-Infrared Light and Visible Light. IEEE Transact Inf Foren Sec 7(2):588–601
IEEE OTCBVS WS Series Bench (n.d.); DOE University Research Program in Robotics under grant DOE-DE-FG02-86NE37968; DOD/TACOM/NAC/ARC Program under grant R01–1344-18; FAA/NSSA grant R01–1344-48/49; Office of Naval Research under grant #N000143010022. http://vcipl-okstate.org/pbvs/bench/Data/02/download.html
Imtiaz H, Fattah SA (2011) A face recognition scheme using wavelet-based dominant features. Signal Image Process Int J (SIPIJ) 2(3)
Kim T-K, Kittler J (2005) Locally Linear Discriminant Analysis for Multi-modally Distributed Classes for Face Recognition with a Single Model Image. IEEE Trans Pattern Anal Mach Intell 27(3):318–327
Klare BF, Jain AK (2013) Heterogeneous face recognition using kernel prototype similarities. IEEE Trans Pattern Anal Mach Intell 35(6):1410–1422
Kong SG, Heo J, Boughorbel F, Zheng Y, Abidi BR, Koschan A, Yi M, Abidi MA (2007) Multiscale Fusion of Visible and Thermal IR Images for Illumination-Invariant Face Recognition. Int J Comput Vis 71(2):215–233
Kong WW, Lei YJ, Lei Y, Zhang J (2010) Technique for image fusion based on non-subsampled contourlet transform domain improved NMF. SCIENCE CHINA Inf Sci 53(12):2429–2440
Kumar A, Zhang D (2006) Personal recognition using hand shape and texture. IEEE Trans Image Process 15(8):2454–2461
Wiskott L, Fellous JM, Krüger N, von der Malsburg C (1997) Face recognition by elastic bunch graph matching. IEEE Trans Pattern Anal Mach Intell 19(7):775–779
Lawrence S, Giles CL, Tsoi A, Back AD (1998) Face recognition: a convolutional neural network approach. IEEE Trans Neural Netw 8(1):98–113
Li SZ, Chu RF, Liao SC, Zhang L (2007) Illumination invariant face recognition using near-infrared images. IEEE Trans Pattern Anal Mach Intell 29(4):627–639
Liu C (2004) Gabor-Based Kernel PCA with Fractional Power Polynomial Models for Face Recognition. IEEE Trans Pattern Anal Mach Intell 26(5):572–581
Ma J, Zhao J, Ma Y, Tian J (2015) Non-rigid visible and infrared face registration via regularized Gaussian fields criterion. Pattern Recogn 48:772–784
Ma J, Qiu W, Zhao J, Ma Y, Yuille AL (2015) Robust L2E estimation of transformation for non-rigid registration. IEEE Trans Signal Process 63(5):1115–1129
Ma J, Chen C, Li C, Huang J (2016) Infrared and visible image fusion via gradient transfer and total variation minimization. Info Fusion 31:100–109
Ma J, Ma Y, Li C (2019) Infrared and visible image fusion methods and applications: a survey. Info Fusion 45:153–178
Madheswari K, Venkateswaran N (2016) Swarm intelligence based optimisation in thermal image fusion using dual tree discrete wavelet transform. Quant Infra Thermograph J, First Online, 1–20
Martínez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
Meraoumia A, Chitroub S, Bouridane A (2010) Gaussian modeling and Discrete Cosine Transform for efficient and automatic palmprint identification. Int Conf Mach Web Intell (ICMWI):121–125
Moghaddam B, Pentland A (1997) Probabilistic visual learning for object representation. IEEE Trans Pattern Analysis Mach Intell 18:696–710
Nicolò F, Schmid NA (2012) Long range cross-spectral face recognition: matching SWIR against visible light images. IEEE Transact Inf Foren Sec 7:1717–1725
Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection
Parkhi OM, Vedaldi A, Zisserman A (2015) Deep Face Recognition British Machine Vision Conference
Penev P, Atick J (1996) Local Feature Analysis: A general statistical theory for object representation
Petrovic V (2007) Subjective tests for image fusion evaluation and objective metric validation. Info Fusion 8(2):208–216
Raghavendra R, Dorizzi B, Rao A, Kumar GH (2011) Particle swarm optimization based fusion of near infrared and visible images for improved face verification. Pattern Recogn 44:401–411
Rajoub BA, Zwiggelaar R (2014) Thermal facial analysis for deception detection. IEEE Transact Inf Foren Sec 9(6):1015–1023
Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: Artificial Neural Networks — ICANN'97
Seal A, Bhattacharjee D, Nasipuri M (2016) Human face recognition using random forest based fusion of à-trous wavelet transform coefficients from thermal and visible images. Int J Electron Commun (AEÜ) 70:1041–1049
Shen LL, Bai L (2004) Gabor feature based face recognition using kernal methods. Proc IEEE Int Conf Auto Face Gest Recog:386–389
Tan H, Huang X, Tan H, He C Pixel-level Image Fusion Algorithm Based on Maximum Likelihood and Laplacian Pyramid Transformation. J Comput Inf Syst 9(1):327–334
Toet A (1989) Image fusion by a ratio of low-pass pyramid. Pattern Recogn Lett 9:245–253
Tripathi BK (2017) On the complex domain deep machine learning for face recognition. Appl Intell 47(2):382–396
Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
Turk MA, Pentland AP (1991) Face recognition using eigenfaces. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'91)
Villegas-Quezada C, Climent J (2008) Holistic face recognition using multivariate approximation,genetic algorithms and AdaBoost classifier: preliminary results. World Acad Sci Eng Technol 44:802–806
Xiaozheng Z, Gao Y (2009) Face recognition across pose: A review. Pattern Recogn 42:2876–2896
Yang Y, Que Y, Huang S, Lin P (2016) Multimodal sensor medical image fusion based on type-2 fuzzy logic in NSCT domain. IEEE Sensors J 16(10):3735–3745
Yang J, Zhang D, Frangi AF, Yang J-y (2004) Two-Dimensional PCA : A New Approach to Appearance-Based Face Representation and Recognition. IEEE Trans Pattern Anal Mach Intell 26(1):131–137
Zheng L, Blasch E, Xue Z, Zhao J, Laganiere R, Wu W (2012) Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: a comparative study. IEEE Trans Pattern Anal Mach Intell 34(1):94–109
Zhou M, Wei H (2006) Face verification using Gabor wavelets and adaboost. Proc Int Conf Pattern Recog:404–407
Kanmani, M., Narasimhan, V. Optimal fusion aided face recognition from visible and thermal face images. Multimed Tools Appl 79, 17859–17883 (2020). https://doi.org/10.1007/s11042-020-08628-9