1 Introduction

In the past two decades, automated face recognition has attracted much attention [1–3]. Appearance-based methods are among the most widely used face recognition methods. Typical examples include methods based on principal component analysis (PCA) [4], linear discriminant analysis (LDA) [5, 6] and Gabor wavelets. Gabor-feature-based methods are commonly regarded as among the state-of-the-art appearance-based face recognition methods [7–9].

Gabor-feature-based methods can be grouped into analytical methods and holistic methods. Analytical methods compute the response of an image to a Gabor wavelet at a set of discrete locations such as the eyes, eyebrows, chin and nose, whereas holistic methods make use of a global response of the face image [7–9]. Holistic methods are also usually combined with other methods such as PCA, LDA, 2DPCA [10] and kernel methods for face recognition applications. It has been demonstrated that holistic face recognition methods outperform analytical methods, and that holistic Gabor-feature-based algorithms rank higher than their analytical counterparts [8]. This is mainly because holistic methods rely not only on the Gabor coefficients computed at a limited number of facial landmarks (as analytical methods do) but also extract relevant information on the global distribution of the facial structure [8]. Moreover, holistic methods are easier to implement.

When investigating strategies for fusing the Gabor transformation results of face images, we identify the following aspects. Many Gabor-feature-based face recognition applications use a filter bank with five frequencies and eight orientations. Moreover, the majority of Gabor-feature-based face recognition methods first extract Gabor features of the face image with respect to different orientations and spatial frequencies and then treat the concatenation of all the Gabor features as the representation of the face image [11, 12]. We refer to these methods as conventional Gabor-based face recognition methods (CGFRM). Hereafter, a Gabor feature means the transformation result of an image with respect to one frequency and one orientation. A Gabor feature is a complex matrix with the same size as the original image. The real and imaginary parts of each complex number in the Gabor feature are the real and imaginary parts of the transformation result of the corresponding pixel, respectively. Clearly, most CGFRMs fuse multiple Gabor features directly at the feature level. On the other hand, the previous literature also contains examples of decision level fusion and of matching score level fusion of Gabor features. Specifically, Wang et al. [13] divided the 64 Gaborface features into groups, applied classification algorithms to these feature groups simultaneously, and then performed decision level fusion to obtain the final classification results. Serrano et al. [14] fused the Gabor features at the matching score level. They first applied 40 classifiers to the 40 Gabor features, one classifier per feature, and then directly summed the 40 SVM matching scores to produce the final matching score.

We think that there is room to improve the fusion strategy of previous CGFRMs. First, the literature has pointed out that decision level fusion cannot completely exploit the information of the multiple features [15]. Second, CGFRM combines all the Gabor features equally at the feature level and ignores the fact that different Gabor features might have different influence on face recognition. Indeed, experimental results have shown that Gabor banks with different frequencies and orientations lead to different face recognition performance [16]. In our opinion, matching score level fusion has great potential to exploit the strengths of different Gabor features. Although a matching score level fusion method was proposed in [14], it is computationally very inefficient because it relies on 40 support vector machines (SVMs). More importantly, only the SVM was used as the classifier in [14], so it is not known whether the matching score level fusion strategy performs well when combined with other classifiers. Moreover, only the magnitude of the Gabor feature was exploited in [14], and the authors did not explore the applicability of the phase.

We note that previous studies also attempted to improve the discriminant capability of Gabor features by selecting optimal parameters for the Gabor wavelet [17, 18]. Perez et al. [19] used entropy and genetic algorithms to select Gabor jets and exploited a weighted Borda count to perform classification. Guo et al. [20] encoded Gabor phase difference relationships between neighboring pixels and used the codes to represent the image. Štruc et al. [21, 22] integrated the phase of the Gabor feature with LDA for face recognition. Previous studies have also shown that the code or histogram of the phase of the Gabor feature, rather than the phase itself, can lead to promising face recognition performance [19, 23]. This suggests that a low-resolution representation of the phase is suitable for face recognition.

It seems that different biometrics applications prefer different information of the Gabor feature. For example, Gabor-based face recognition usually uses the magnitude that is directly associated with both the real and imaginary parts of the Gabor feature, whereas Gabor-based palmprint authentication usually exploits only the real part of the Gabor feature [22].

In this paper, we propose a matching score level fusion method for Gabor features. The first scheme of the method views the Gabor features with different spatial frequencies as features of the face image at different resolutions and directly sums their matching scores to obtain the final score. These features are also referred to as multi-resolution Gabor features of face images. The second scheme first encodes the phase of the Gabor feature and then takes the weighted sum of the matching scores of the magnitude and the phase code of the Gabor features with different spatial frequencies to obtain the final score. The weighted fusion scheme has the following rationale: the magnitude and phase of the Gabor feature differ in their capability to represent and recognize the face, and a proper weight can reflect this difference. We also illustrate that, in face recognition, a low-resolution representation of the phase of the Gabor feature, such as the code of the phase, is more discriminative than the phase itself.

2 Gabor transform and our method

The Gabor filter takes the form of a complex plane wave modulated by a Gaussian envelope function. The Gabor filter can be formulated in spatial-frequency domain as:

$$ \psi_{u,v}(z) = \frac{\|k_{u,v}\|^{2}}{\sigma^{2}} \, e^{-\|k_{u,v}\|^{2}\|z\|^{2}/(2\sigma^{2})} \left[ e^{i\,k_{u,v}\cdot z} - e^{-\sigma^{2}/2} \right] $$
(1)

where \( z = (x,y) \), \( \sigma = 2\pi \), \( k_{u,v} = \begin{pmatrix} k_{v} \cos \phi_{u} \\ k_{v} \sin \phi_{u} \end{pmatrix} \), and \( k_{v} \) and \( \phi_{u} \) control the scale and orientation of the Gabor wavelet, respectively. The first term in the brackets is the oscillatory part of the kernel and the second compensates for the DC value. Let \( I(z) \), \( z = (x,y) \), be a face image matrix. The Gaborface is then the convolution of \( I(z) \) with the Gabor wavelet \( \psi_{u,v} (z) \), defined as \( O_{u,v} (z) = I(z)*\psi_{u,v} (z) \).
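As an illustration of the kernel and the Gaborface, the following minimal Python sketch samples the kernel on a small grid and convolves it with an image. It is not the implementation used in the experiments. It assumes the squared-norm prefactor \( \|k_{u,v}\|^{2}/\sigma^{2} \) commonly used for this kernel, and the function names, the 9 × 9 kernel support and the zero padding at the image border are choices made only for this sketch.

```python
import cmath
import math


def gabor_kernel(k_v, phi_u, sigma=2 * math.pi, half_size=4):
    """Sample psi_{u,v}(z) on a (2*half_size+1) x (2*half_size+1) grid.

    half_size is an assumption of this sketch; the text does not fix it.
    """
    kx, ky = k_v * math.cos(phi_u), k_v * math.sin(phi_u)
    k_sq = kx * kx + ky * ky
    dc = math.exp(-sigma ** 2 / 2)  # term that compensates the DC value
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            envelope = (k_sq / sigma ** 2) * math.exp(
                -k_sq * (x * x + y * y) / (2 * sigma ** 2))
            wave = cmath.exp(1j * (kx * x + ky * y))  # oscillatory part
            row.append(envelope * (wave - dc))
        kernel.append(row)
    return kernel


def gabor_feature(image, kernel):
    """Same-size convolution O = I * psi, with zero padding (an assumption)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0j] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            acc = 0j
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = m - dy, n - dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + r][dx + r]
            out[m][n] = acc
    return out
```

The result of `gabor_feature` is a complex matrix with the same size as the input image. Its element-wise modulus gives the magnitude used by both schemes, and the signs of its real and imaginary parts are what the phase code of Sect. 2.2 is built from.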

2.1 The first scheme of our method

The first scheme of our method processes each spatial frequency separately. For every spatial frequency, it performs the Gabor transformations for all orientations on every face image and then concatenates the magnitudes of the resulting Gabor features of a face image into one matrix. Let \( Y_{f} \) and \( X_{f}^{i} \) denote the matrices of the Gabor features with respect to frequency f of the test sample and the i-th training sample, respectively. The first scheme calculates the distance between the training and test samples using \( d_{i}^{f} = ||X_{f}^{i} - Y_{f} || \). \( d_{i}^{f} \) is used as the matching score, with respect to frequency f, between the magnitudes of the Gabor features of the test sample and the i-th training sample. The matching scores are fused using \( d_{i} = \sum\nolimits_{{f = f_{1} }}^{{f_{N} }} {d_{i}^{f} } \), where \( f_{1} , \ldots ,f_{N} \) stand for the N frequencies. \( d_{i} \) is referred to as the final matching score between the test sample and the i-th training sample. It is the sum of the distances between the N multi-resolution Gabor feature matrices of the test sample and those of the i-th training sample, so the smaller \( d_{i} \) is, the higher the similarity between the test sample and the i-th training sample. If \( k = \arg\min_{i} d_{i} \), the first scheme assigns the test sample to the class of the k-th training sample.
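The fusion rule of the first scheme can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the norm \( ||\cdot|| \) is taken to be the Frobenius norm, which the text does not specify, and the names `frobenius_distance` and `first_scheme` as well as the nested-list data layout are hypothetical.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of a - b for two equally sized matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def first_scheme(train, test):
    """Sum the per-frequency distances and pick the nearest training sample.

    train[i][f] is the magnitude matrix X_f^i of the i-th training sample,
    test[f] is the magnitude matrix Y_f of the test sample.
    Returns (k, d) with d[i] = sum over f of ||X_f^i - Y_f||.
    """
    d = [sum(frobenius_distance(x_f, y_f) for x_f, y_f in zip(sample, test))
         for sample in train]
    k = min(range(len(d)), key=d.__getitem__)
    return k, d
```

The test sample is then assigned the class label of training sample `k`. Because every frequency contributes one unweighted distance, frequencies whose distances have a larger numerical range influence the sum more, which is one motivation for the normalization used in the second scheme.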

2.2 The second scheme of our method

The second scheme of our method uses a weighted matching score level fusion strategy to fuse the magnitude and the phase code of the Gabor feature. It obtains the magnitude of the Gabor feature in the same way as the first scheme. In addition, it encodes the phase and uses the code for face recognition.

The main steps of the second scheme are as follows. It processes each spatial frequency separately and performs the Gabor transformations for all orientations on every face image. For each frequency, it concatenates the magnitudes of all the Gabor features of a face image, over the different orientations, to form a matrix (referred to as the magnitude matrix). For each frequency, it also concatenates the codes of the “phase” of all the Gabor features to produce a matrix (referred to as the phase matrix). If there are N frequencies, we obtain N magnitude matrices and N phase matrices per face image, one of each for every frequency.

Let \( X_{f}^{i} \) and \( Y_{f} \) denote the magnitude matrices with respect to frequency f of the i-th training sample and the test sample, respectively. The distance between \( X_{f}^{i} \) and \( Y_{f} \) is calculated using \( d_{i}^{f} = ||X_{f}^{i} - Y_{f} || \). As in the first scheme, \( d_{i}^{f} \) is the matching score with respect to frequency f between the test sample and the i-th training sample.

The second scheme of our method codes the “phase” as the number 1, 2, 3 or 4 as follows. Let O be a Gabor feature and let \( O(m,n) \) denote the element located in the m-th row and n-th column of O. The “phase” is coded using

$$ C(m,n) = \begin{cases} 1, & \text{if } Re(O(m,n)) > 0 \text{ and } Im(O(m,n)) > 0 \\ 2, & \text{if } Re(O(m,n)) > 0 \text{ and } Im(O(m,n)) \le 0 \\ 3, & \text{if } Re(O(m,n)) \le 0 \text{ and } Im(O(m,n)) \le 0 \\ 4, & \text{if } Re(O(m,n)) \le 0 \text{ and } Im(O(m,n)) > 0 \end{cases} $$

C stands for the coding result and has the same size as O, and \( C(m,n) \) denotes the element located in the m-th row and n-th column of C. C is the so-called phase matrix, also referred to as the phase code. Each sample, whether a training sample or a test sample, has one phase matrix per frequency.
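The coding rule amounts to recording the quadrant of each complex element. A minimal Python sketch is given below; the function name `phase_code` and the representation of a Gabor feature as a list of rows of Python complex numbers are assumptions of this sketch.

```python
def phase_code(gabor_feature):
    """Map every complex element O(m, n) to its code C(m, n) in {1, 2, 3, 4}."""
    def code(value):
        if value.real > 0:
            return 1 if value.imag > 0 else 2
        return 4 if value.imag > 0 else 3

    return [[code(value) for value in row] for row in gabor_feature]
```

Note that elements lying exactly on an axis are covered by the non-strict inequalities of the rule: for example, a zero element satisfies \( Re \le 0 \) and \( Im \le 0 \) and is therefore coded as 3.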

Let \( \tilde{C}_{f} \) and \( \tilde{C}_{f}^{i} \) denote the phase matrices with respect to frequency f of the test sample and the i-th training sample, respectively. The distance between \( \tilde{C}_{f}^{i} \) and \( \tilde{C}_{f} \) is calculated using \( \tilde{d}_{i}^{f} = ||\tilde{C}_{f}^{i} - \tilde{C}_{f} || \). The second scheme normalizes the matching scores using \( e_{i}^{f} = \frac{{d_{i}^{f} - d_{\min }^{f} }}{{d_{\max }^{f} - d_{\min }^{f} }} \) and \( \tilde{e}_{i}^{f} = \frac{{\tilde{d}_{i}^{f} - \tilde{d}_{\min }^{f} }}{{\tilde{d}_{\max }^{f} - \tilde{d}_{\min }^{f} }} \), where \( d_{\max }^{f} \) and \( d_{\min }^{f} \) denote the maximum and minimum values of \( d_{i}^{f} \) over all training samples, and \( \tilde{d}_{\max }^{f} \) and \( \tilde{d}_{\min }^{f} \) denote the maximum and minimum values of \( \tilde{d}_{i}^{f} \), respectively. It is clear that \( 0 \le e_{i}^{f} ,\tilde{e}_{i}^{f} \le 1 \).

The second scheme obtains the final matching scores using \( d_{i} = q_{1} \sum\nolimits_{{f = f_{1} }}^{{f_{N} }} {e_{i}^{f} } + q_{2} \sum\nolimits_{{f = f_{1} }}^{{f_{N} }} {\tilde{e}_{i}^{f} } \), where \( q_{1} \) and \( q_{2} \) are the weights of the magnitude and the phase code, respectively, and both sums run over all the frequencies. If \( k = \arg\min_{i} d_{i} \), the second scheme assigns the test sample to the class of the k-th training sample.
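The normalization and weighted fusion of the second scheme can be sketched as follows. The sketch is illustrative only: the function names, the layout of the score lists and the handling of the degenerate case in which all scores of one frequency are equal (mapped to 0 here) are assumptions, and the weights passed in are hypothetical values rather than those used in the experiments.

```python
def min_max(scores):
    """Min-max normalize the scores of one frequency over all training samples."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption of this sketch: identical scores carry no information.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def second_scheme(mag_dist, phase_dist, q1, q2):
    """Weighted fusion of normalized magnitude and phase-code scores.

    mag_dist[f][i] is the magnitude distance d_i^f and phase_dist[f][i] is
    the phase-code distance for frequency f and training sample i.
    Returns (k, fused), where fused[i] is the final matching score d_i.
    """
    n_train = len(mag_dist[0])
    fused = [0.0] * n_train
    for d_f, d_tilde_f in zip(mag_dist, phase_dist):
        e_f, e_tilde_f = min_max(d_f), min_max(d_tilde_f)
        for i in range(n_train):
            fused[i] += q1 * e_f[i] + q2 * e_tilde_f[i]
    k = min(range(n_train), key=fused.__getitem__)
    return k, fused
```

Setting `q2 = 0` reduces the rule to a normalized variant of the first scheme, while increasing `q2` relative to `q1` lets the phase code dominate the decision.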

2.3 Analysis of our method

This subsection presents the characteristics and rationale of our method. First, our method is based on the matching score level fusion strategy. Among the fusion strategies of multi-biometrics, matching score level fusion is the most common owing to the ease of accessing and combining the scores generated from different biometric traits [4, 5, 24, 25]. Matching score level fusion also conveys more information than decision level fusion. For a biometric trait of a test sample, decision level fusion predicts a class label in the form of an integer and passes it on to the final decision, whereas matching score level fusion passes on a matching score in the form of a real number. The matching score measures the similarity or dissimilarity between the test sample and each training sample. As a result, matching score level fusion exploits information on how similar the test sample is to every training sample to obtain the final authentication result, whereas decision level fusion can use only a few integers to do so. Specifically, if there are N frequencies and M training samples, our method relies on NM real numbers to produce the final authentication result, whereas a decision level fusion method uses only N integers, each of which is the class label of the test sample predicted from one frequency.

Compared with feature level fusion, matching score level fusion allows the multiple biometric traits to be processed independently. Moreover, when integrating the matching scores of all the biometric traits to obtain the final authentication result, it can assign a larger weight to a more accurate trait, which helps to obtain a better final result.
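The difference in information content can be made concrete with a toy example in Python. The scores below are invented purely for illustration, and majority voting is assumed as the decision level rule, which is only one of several possible choices. Two frequencies narrowly prefer training sample 0 and one frequency strongly prefers training sample 1.

```python
from collections import Counter


def decision_level(scores, labels):
    """Majority vote over the labels predicted separately by each frequency."""
    votes = [labels[min(range(len(s)), key=s.__getitem__)] for s in scores]
    return Counter(votes).most_common(1)[0][0]


def score_level(scores, labels):
    """Sum the distances over frequencies, then take the nearest sample."""
    fused = [sum(column) for column in zip(*scores)]
    return labels[min(range(len(fused)), key=fused.__getitem__)]


# scores[f][i]: distance between the test sample and training sample i
# for frequency f (hypothetical values).
scores = [[0.49, 0.51],
          [0.49, 0.51],
          [0.90, 0.10]]
labels = [0, 1]  # class label of each training sample
```

With these values, `decision_level(scores, labels)` returns class 0 because only the three predicted labels are counted, whereas `score_level(scores, labels)` returns class 1 because the large margin of the third frequency is retained in the sum.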

Second, the second scheme of our method provides a reasonable way to fuse the magnitude and phase of the Gabor feature of the face image. Most previous studies regard the phase of the Gabor feature as less powerful than the magnitude for classifying subjects, and indeed most CGFRMs do not exploit the phase [7]. However, it has also been demonstrated that codes of the phase can be very useful for distinguishing faces [20, 21, 23]. It is therefore reasonable for our method to encode the phase of the Gabor features in a simple way and to fuse the magnitude and the phase code at the matching score level. As shown in Sect. 3, the experimental results also illustrate that the code of the phase is more discriminative than the phase itself.

3 Experiments

In this section, we test the two schemes of our method and perform experimental comparison on the phase of the Gabor feature and its code.

3.1 Experiments on the FERET face database

The FERET program ran from 1993 through 1997 and was sponsored by the Department of Defense’s Counterdrug Technology Development Program through the Defense Advanced Research Projects Agency (DARPA). The primary mission of this program was to develop automatic face recognition capabilities that could be employed to assist security, intelligence and law enforcement personnel in the performance of their duties. The FERET image corpus was assembled to support government-monitored testing and evaluation of face recognition algorithms using standardized tests and procedures. The final corpus consists of 14051 eight-bit grayscale images of human heads with views ranging from frontal to left and right profiles.

We used the subset of the FERET face database described in [26] to test our method. This subset includes 1,400 images of 200 individuals, each providing seven images. It is composed of the images whose names are marked with the two-character strings “ba”, “bj”, “bk”, “be”, “bf”, “bd”, and “bg”. This subset involves variations in facial expression, illumination and pose [26]. We cropped the facial portion of each original image, resized the cropped image to 80 × 80 pixels and pre-processed it by histogram equalization. In order to reduce the computational cost, we further resized the images to 40 × 40 matrices. We took the first four face images of each subject as training images and treated the others as test images. Before testing our method and the other methods, we normalized each face image to a vector of unit length.
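Two of the pre-processing steps, histogram equalization and normalization to unit length, can be sketched in Python as follows. Cropping and resizing are omitted. The equalization shown is the common cumulative-histogram mapping for eight-bit images, which is an assumption, since the exact variant used is not stated; the function names are hypothetical.

```python
import math


def equalize_histogram(image, levels=256):
    """Histogram-equalize an integer grayscale image given as a list of rows."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [0] * levels, 0
    for g in range(levels):
        running += hist[g]
        cdf[g] = running
    cdf_min = min(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in image]
    return [[round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min))
             for p in row] for row in image]


def to_unit_vector(image):
    """Flatten the image row by row and scale it to Euclidean length 1."""
    flat = [float(p) for row in image for p in row]
    norm = math.sqrt(sum(p * p for p in flat))
    return [p / norm for p in flat] if norm > 0 else flat
```

Scaling every image to unit length removes differences in overall intensity between images, so that the distances used as matching scores are not dominated by global brightness.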

Table 1 shows the rates of classification errors of CGFRM and of the first and second schemes of our method on the FERET face database. Both schemes of our method obtain lower rates of classification errors than CGFRM. In addition, the second scheme outperforms the first scheme. This is mainly because the weights in the second scheme allow it to balance the contributions of the magnitude and phase of the Gabor feature to face recognition. Table 2 shows the rates of classification errors of face recognition based on either the “phase code” or the “original phase angle” of the face images in the FERET face database. It indicates that face recognition based on the “phase code” produces a much lower rate of classification errors than face recognition based on the “original phase angle.” Hereafter, the “original phase angle” is calculated as \( A(m,n) = \arctan a(m,n) \), \( a(m,n) = Im(O(m,n))/Re(O(m,n)) \), where O is a Gabor feature as in Sect. 2.2 and \( A(m,n) \) is the so-called “original phase angle.” Face recognition based on the “original phase angle” takes the “original phase angles” as features of the face image and uses the nearest neighbor classifier to perform classification. Face recognition based on the “phase code” takes the “phase code” defined in Sect. 2.2 as features of the face image and also uses the nearest neighbor classifier to perform classification.
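For reference, the “original phase angle” feature can be computed as in the following Python sketch. The function name is hypothetical, and the treatment of elements with a zero real part, where the ratio is undefined, is an assumption of this sketch (the limiting value ±π/2 is used, and 0 for a zero element).

```python
import math


def original_phase_angle(gabor_feature):
    """Compute A(m, n) = arctan(Im / Re) for every complex element."""
    def angle(value):
        if value.real == 0:
            # Assumption: fall back to the limiting value of arctan.
            if value.imag == 0:
                return 0.0
            return math.copysign(math.pi / 2, value.imag)
        return math.atan(value.imag / value.real)

    return [[angle(value) for value in row] for row in gabor_feature]
```

One property of this definition is that elements in opposite quadrants, such as 1 + i and −1 − i, receive the same angle π/4 because only the ratio of the imaginary to the real part enters, whereas the phase code of Sect. 2.2 assigns them the distinct codes 1 and 3.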

Table 1 Rates of classification errors of CGFRM, the first and second schemes of our method on the FERET face database
Table 2 Rates of classification errors of face recognition based on either “phase code” or “original phase angle” of the face images in the FERET face database

3.2 Experiments on the Yale B face database

The Yale B face image database contains images captured under varying illumination and poses. We used the 45 face images with pose 00 of each subject to conduct experiments. Each of these images was cropped to form a 32 × 32 image. As in [27], we divided these face images into four subsets. The samples from subset 1 were used as training samples, and the others served as test samples.

Table 3 shows the rates of classification errors of CGFRM and of the first and second schemes of our method on the Yale B face database. Table 4 shows the rates of classification errors of face recognition based on either the “phase code” or the “original phase angle” of the face images. Table 3 confirms that both schemes of our method obtain lower rates of classification errors than CGFRM, and Table 4 supports the conclusion that face recognition based on the “phase code” produces a much lower rate of classification errors than face recognition based on the “original phase angle.”

Table 3 Rates of classification errors of CGFRM, the first and second schemes of our method on the Yale B face database
Table 4 Rates of classification errors of face recognition based on either “phase code” or “original phase angle” of the face images in the Yale B face database

4 Conclusion

This paper shows that matching score level fusion is a good strategy for Gabor-feature-based face recognition and that the two proposed fusion schemes obtain good performance. The proposed schemes have the following rationale. First, they convey sufficient information about multiple Gabor features to the face recognition procedure. Because multi-resolution Gabor features reflect different characteristics of the face image and are processed and fused independently, matching score level fusion allows the complementary information of different Gabor features to be exploited more fully than the direct feature level fusion usually adopted by previous Gabor-feature-based face recognition schemes. Second, the second scheme provides a very effective way to fuse the magnitude and phase of the Gabor features at the matching score level. By properly setting the weights, the second scheme makes it easy to control the influence of the magnitude and the phase on the final classification decision. The paper also illustrates that face recognition based on the “phase code” produces a much lower rate of classification errors than face recognition based on the “original phase angle.”