1 Introduction

Owing to its non-intrusive, natural acquisition and high uniqueness, together with the growing demand for security-related applications, face recognition has long been an active topic in computer vision and pattern recognition [1]. A wide range of applications, from public security to personal consumer electronics, have made it one of the most popular research topics. Over the past decade, numerous face recognition approaches have been reported in the literature to handle variations in illumination, expression, occlusion, etc. Despite a significant level of maturity and a few practical successes, finding efficient and robust algorithms for face recognition remains an active and challenging problem.

As a generalization of the Fourier transform (FT), the fractional Fourier transform (FrFT) is a powerful signal analysis and processing tool. Fourier analysis [2, 3] is an effective tool for facial images; however, it discards the spatial-domain information entirely, so some significant facial information is inevitably lost. The conventional FT can be regarded as a rotation by a fixed angle in the time-frequency plane, whereas the FrFT rotates the signal by an arbitrary angle. Therefore, the two-dimensional fractional Fourier transform (2D-FrFT) captures time-domain and frequency-domain information simultaneously. A facial image can be regarded as a gradually changing signal, and the 2D-FrFT has been shown to be an effective analysis tool for facial images [6]. Gao et al. made use of the phase information of the 2D-FrFT for emotion recognition [4]. Wang et al. utilized the phase and magnitude information of the 2D-FrFT for face registration [15]. Kong et al. combined the 2D-FrFT with a fuzzy fusion classification algorithm for human emotional state recognition [14].

Motivated by the above observations, in this paper we adopt the 2D-FrFT phase information to improve face recognition. First, we extract the phase information of a facial image in the 2D-FrFT domain, which we call the generalized phase spectrum (GPS). It has been shown that the lower frequency bands contain the smooth information of a facial image, while the higher bands contain its edge information [2]. We then present an improved two-dimensional separability judgment to select appropriate order parameters for the 2D-FrFT. By selecting appropriate transform orders and choosing the optimal generalized phase spectrum bands, a new spectral feature can be extracted that contains both the smooth information and the edge information of a facial image. In order to make full use of the discriminative information from different orders, the proposed approach merges the GPS bands of multiple orders of the 2D-FrFT and assigns different weights to different orders, as illustrated in Fig. 1. Finally, we perform classification with the nearest neighbor classifier. In the experiments, two well-known public face databases are employed as test data.

Fig. 1. Illustration of the WMFP.

In the following sections, we first analyze the FrFT and define the GPS in Sect. 2. Our face recognition approach is presented in Sect. 3. The experimental results based on the nearest neighbor classifier are summarized in Sect. 4, and conclusions are drawn in Sect. 5.

2 Fractional Fourier Transform

The discrete FrFT [6] is a time-frequency analysis tool that has been developed and widely used in recent years. In essence, the signal representation in the fractional Fourier domain integrates information from both the time domain and the frequency domain. The 2D-FrFT of an image is defined as follows:

$$ \mathbf{X}_{(\alpha,\beta)}(m,n) = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} x(p,q)\, K_{(\alpha,\beta)}(p,q,m,n) $$
(1)
$$ K_{(\alpha,\beta)} = K_{\alpha} \otimes K_{\beta} $$
(2)
$$ K_{\alpha}(m,n) = \frac{A_{\alpha}}{2\Delta x} \exp\!\left( \frac{j\pi(\cot\alpha)\,m^{2}}{(2\Delta x)^{2}} - \frac{j2\pi(\csc\alpha)\,mn}{(2\Delta x)^{2}} + \frac{j\pi(\cot\alpha)\,n^{2}}{(2\Delta x)^{2}} \right) $$
(3)

The transform kernel is \( K_{(\alpha,\beta)} \), as defined in Eq. (2). The 2D-FrFT is equivalent to applying the discrete FrFT successively along the two dimensions of a signal \( x(p,q) \). The separable kernel \( K_{\alpha} \), with different orders for the x and y axes, provides additional degrees of freedom that can be exploited for feature extraction in specific applications.
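For concreteness, the following numpy sketch builds a sampling-type discrete FrFT kernel in the form of Eq. (3) and applies it along both axes as in Eqs. (1) and (2). The amplitude factor \( A_{\alpha} \) and the sampling interval \( \Delta x \) are assumed conventions (the paper does not specify them), and the kernel is not valid at orders where \( \sin\alpha = 0 \).

```python
import numpy as np

def dfrft_kernel(N, p, dx=1.0):
    """Sampling-type discrete FrFT kernel of order p (rotation angle
    alpha = p*pi/2), following the form of Eq. (3). The amplitude A_alpha
    and the interval dx are assumed conventions, not taken from the paper."""
    alpha = p * np.pi / 2
    A = np.exp(-1j * (np.pi * np.sign(np.sin(alpha)) / 4 - alpha / 2)) \
        / np.sqrt(np.abs(np.sin(alpha)))
    m = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    scale = (2 * dx) ** 2
    phase = np.pi * ((1 / np.tan(alpha)) * m ** 2
                     - 2 * (1 / np.sin(alpha)) * m * n
                     + (1 / np.tan(alpha)) * n ** 2) / scale
    return (A / (2 * dx)) * np.exp(1j * phase)

def frft2d(img, a, b):
    """2D-FrFT of orders (a, b): the separable kernel of Eq. (2) applied as
    1D transforms along the rows and columns, as in Eq. (1)."""
    M, N = img.shape
    return dfrft_kernel(M, a) @ img.astype(complex) @ dfrft_kernel(N, b).T
```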

As is well known, the two-dimensional Fourier transform (2D-FT) is not reality preserving: it maps the original real-valued data to complex values that carry both magnitude and phase information. The same holds for the 2D-FrFT, which is the generalized form of the 2D-FT; in addition, its transform order can be selected according to different needs. In this paper we use the phase, because it contains a large amount of edge information [4] and is therefore more conducive to identifying faces. The GPS generalizes the phase spectrum of the FT: it is the phase information of a facial image distributed over the fractional domain of the 2D-FrFT.
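Under this definition, the GPS of an image is simply the phase angle of its 2D-FrFT coefficients; a minimal sketch reusing the hypothetical frft2d helper above:

```python
def generalized_phase_spectrum(img, a, b):
    """GPS: the phase of the 2D-FrFT coefficients of orders (a, b)."""
    return np.angle(frft2d(img, a, b))
```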

3 Proposed Approach

It has been demonstrated that not all phase spectrum bands of the FT are beneficial for face recognition [2]. This motivates us to exploit the discriminative nature of different GPS bands. Obviously, how to choose appropriate GPS bands from different transform orders is the key question. According to the properties of the FrFT [6], it is periodic with period 4. Based on the definition, we give a detailed description of transform orders changing from 0.1 to 1. Pei et al. have given some explanation of the phase information of different transform orders [6]. The transform order \( p \), varying from 0.1 to 1, is divided into three ranges according to the inter-class and intra-class differences of the phase information: the small order range (0.1 to 0.4), the middle order range (0.5 to 0.7) and the large order range (0.8 to 1). Figure 2 indicates that the difference between inter-class figures from different ranges is obvious, while the intra-class figures from the same range are similar.

Fig. 2. (a) The transformed face image from the ORL database with the order changing from 0 to 1; (b) and (c) the intra-class and inter-class average differences of the three ranges on the ORL database; (d), (e) and (f) the corresponding results on the AR database.

As discussed in [3], with the 2D-FrFT the small transform orders lose the edge information of an image, while the large transform orders contain much noise caused by illumination, expression and so on. The middle orders are a compromise, but they still cannot make full use of either the low or the high spectral phase bands. To make full use of the GPS bands of different ranges, we randomly select one transform order from each range (see the sketch after Eq. (4)) and then choose the bands with the improved two-dimensional separability judgment. Once the division of orders is completed, how to choose the GPS bands from the transform ranges becomes critical. The two-dimensional separability judgment [2] has been adopted to select frequency bands, but it neglects the structure of the transformed data. In this paper, an improved two-dimensional separability judgment is proposed that takes the transformed data, rather than the raw face images, into account. Suppose that the original image sample set is \( X \), and each gray image matrix of size \( M \times N \) is expressed as \( X(p,q) \), where \( 1 \le p \le M \), \( 1 \le q \le N \). We perform the 2D-FrFT as in Eq. (4):

$$ \mathbf{F}_{(\alpha,\beta)}(u,v) = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \mathbf{X}(p,q)\, K_{(\alpha,\beta)}(p,q,u,v) $$
(4)
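As a rough illustration of the order-selection strategy referred to above, the sketch below draws one order at random from each of the three ranges and transforms a set of images with it; the 0.1 step between candidate orders and the use of the same order on both axes are assumptions made for illustration.

```python
import numpy as np

def select_orders(seed=0):
    """Draw one transform order at random from each range (small / middle / large).
    The 0.1 step between candidate orders is an assumption."""
    rng = np.random.default_rng(seed)
    ranges = [(0.1, 0.4), (0.5, 0.7), (0.8, 1.0)]
    return [float(rng.choice(np.round(np.arange(lo, hi + 0.05, 0.1), 1)))
            for lo, hi in ranges]

def transform_set(images, orders):
    """Transform every image with each selected order, reusing the hypothetical
    frft2d helper above; the same order is used on both axes for simplicity."""
    return {p: np.stack([frft2d(x, p, p) for x in images]) for p in orders}
```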

For simplicity, we still use \( X \) to denote the transformed data. We choose the GPS bands according to the properties of the phase information from different ranges. Let \( F_{(\alpha,\beta)}(u_{0},v_{0}) \) denote the zero GPS band, and shift \( F_{(\alpha,\beta)}(u_{0},v_{0}) \) to the center of the image at \( (M/2, N/2) \). Since the GPS domain is represented in matrix form, we use a square ring \( \mathrm{Ring}(k) \) to represent the \( k \)th GPS band, where \( 0 \le k \le M/2 \). The four vertexes of \( \mathrm{Ring}(k) \) are \( (u-k, v-k) \), \( (u+k, v-k) \), \( (u-k, v+k) \) and \( (u+k, v+k) \), respectively. Hence the \( k \)th GPS band satisfies:

$$ F_{(\alpha,\beta)}(u,v) \in \mathrm{Ring}(k) $$
(5)

Different GPS bands expressed in this way are illustrated in Fig. 3. If a GPS band is selected, we keep the original values of \( F_{(\alpha,\beta)}(u,v) \); otherwise we set the values of \( F_{(\alpha,\beta)}(u,v) \) to zero. The selection step can be expressed by Eq. (6):

Fig. 3. Illustration of the GPS bands.

$$ F_{(\alpha,\beta)}(u,v) = \begin{cases} \text{original value} & \text{if } F_{(\alpha,\beta)}(u,v) \in \mathrm{Ring}(k) \\ 0 & \text{if } F_{(\alpha,\beta)}(u,v) \notin \mathrm{Ring}(k) \end{cases} $$
(6)
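A possible numpy realization of the square-ring selection in Eqs. (5) and (6) is sketched below; the one-pixel ring width and the already-centered spectrum are assumptions, since the paper only specifies the four vertices of Ring(k).

```python
import numpy as np

def ring_mask(M, N, k):
    """Boolean mask of the square ring Ring(k) centered at (M/2, N/2).
    A one-pixel ring width is assumed; the paper only gives the four vertices."""
    rows = np.abs(np.arange(M).reshape(-1, 1) - M // 2)
    cols = np.abs(np.arange(N).reshape(1, -1) - N // 2)
    return np.maximum(rows, cols) == k      # Chebyshev distance equal to k

def select_band(F_centered, k):
    """Eq. (6): keep values on Ring(k) and zero the rest.
    F_centered is the transform with the zero band already shifted to the center."""
    return np.where(ring_mask(*F_centered.shape, k), F_centered, 0)
```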

Then we evaluate the separability \( J_{k} \) with the improved two-dimensional separability judgment. Let \( A_{i}\ (i = 1, 2, \ldots, C) \) denote the mean of class \( w_{i} \) and \( A \) denote the total mean of \( X \); both \( A_{i} \) and \( A \) are matrices. \( F_{k(\alpha,\beta)} \) represents the \( k \)th GPS band of a sample. For the \( k \)th GPS band, we compute the between-class scatter matrix \( S_{b} \) and the within-class scatter matrix \( S_{w} \) according to Eqs. (7) and (8), and the trace ratio is calculated by Eq. (9).

$$ S_{b} = \sum_{i=1}^{C} (A_{i} - A)(A_{i} - A)^{T} $$
(7)
$$ S_{w} = \sum_{i=1}^{C} (A_{i} - F_{k(\alpha,\beta)})(A_{i} - F_{k(\alpha,\beta)})^{T} $$
(8)
$$ J_{k} = \frac{\mathrm{tr}(S_{b})}{\mathrm{tr}(S_{w})} $$
(9)

For all the GPS bands, we select the bands with high trace ratio by maximizing the between-class scatter while minimizing the within-class scatter.
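The trace ratio of Eqs. (7)–(9) might be evaluated as in the following sketch; summing the within-class scatter over every sample of each class is our reading of Eq. (8), and the real part of the traces is taken because the scatter matrices of complex-valued bands are Hermitian.

```python
import numpy as np

def trace_ratio(bands, labels):
    """Separability J_k of Eq. (9) for one GPS band.
    bands: array of shape (num_samples, M, N) holding the selected band of each
    transformed training image; labels: class label of each sample."""
    labels = np.asarray(labels)
    A = bands.mean(axis=0)                        # total mean matrix
    M = bands.shape[1]
    Sb = np.zeros((M, M), dtype=complex)          # between-class scatter, Eq. (7)
    Sw = np.zeros((M, M), dtype=complex)          # within-class scatter,  Eq. (8)
    for c in np.unique(labels):
        Xc = bands[labels == c]
        Ac = Xc.mean(axis=0)                      # class mean A_i
        Sb += (Ac - A) @ (Ac - A).conj().T
        for F in Xc:                              # sum over the samples of class c
            Sw += (Ac - F) @ (Ac - F).conj().T
    return np.trace(Sb).real / np.trace(Sw).real

# Bands are then ranked by J_k and those with the highest trace ratio are retained.
```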

4 Experimental Results

In this section, we present experimental results on publicly available face databases, which illustrate the efficacy of the proposed approach. The details of the experiments and results are described below. The nearest neighbor classifier with the Euclidean distance is applied for recognition.
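For reference, the nearest neighbor rule used throughout the experiments reduces to the following sketch, where the feature vectors are assumed to be the flattened, fused GPS features.

```python
import numpy as np

def nn_classify(train_feats, train_labels, test_feat):
    """1-NN with Euclidean distance: assign the label of the closest training sample."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[np.argmin(dists)]
```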

4.1 Experiments on the FERET Face Database

The FERET database consists of 13539 facial images of 1565 subjects, who are diverse in ethnicity, gender and age. From the FERET face database, we select a subset of 1400 images of 200 individuals (seven images per individual), comprising 71 females and 129 males. This subset involves variations in facial expression, illumination and pose (±15° and ±25°). All chosen images are resized to 64 × 64 with 256 gray levels per pixel. Figure 4 shows some sample images of one subject from the FERET database. In the experiment, the first three images of each person are used as training samples and the remainder as test samples, giving 600 (= 200 × 3) training samples and 800 (= 200 × 4) test samples.

Fig. 4. Samples of cropped face images from the FERET database.

Tables 1 and 2 show the accuracy of different transform orders and the comparison of WMFP with Eigenface, Fisherface, SRC [10], LGBP [14], Fourier + PCA [2] and LFMP [15] on the FERET database. We implemented these methods ourselves and tuned the parameters of each method for a fair comparison. The results strongly support the proposed method. Figure 5 illustrates the discriminability of the GPS bands for different transform orders, from which we randomly select 0.4, 0.5 and 0.9 as the band fusion orders. The color is positively correlated with the trace ratio of the GPS band: the closer a region is to white, the higher its trace ratio.

Table 1. Recognition rates of different transform orders on the FERET database
Table 2. The comparison results on the FERET database
Fig. 5. Illustration of the GPS bands according to different transform orders on the FERET database.

The experiments on the FERET database test the effectiveness of our method on the overall problem. As can be seen from Table 2, the proposed WMFP outperforms the six compared methods, with a gain in accuracy of at least 5% on the FERET database. We make three observations from the results in Tables 1 and 2: (1) Table 1 shows that different orders of the 2D-FrFT may result in different recognition rates. However, the highest rate of 68%, achieved at order 0.3, is still not satisfactory, because a single-order 2D-FrFT feature is usually not sufficient for recognition; both the amount of information and the redundancy increase when weighted multi-order bands are fused. (2) The Gabor-based method (LGBP) is very time consuming and better suited to high-resolution images, which imposes a heavy computational burden on the target device, in particular on mobile devices with low computational power. The proposed method does not need to construct a subspace through feature extraction methods and has a lower computational cost. (3) WMFP obtains the best recognition performance in all the experiments, which implies that both the edge and the smooth information of a facial image are important for recognition. Compared with the other methods, WMFP is more intuitive and effective while achieving competitive results.
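To make observation (1) concrete, a possible feature-level fusion is sketched below; weighting each selected order's GPS band by its normalized trace ratio is an assumption, as the paper does not state the weighting scheme explicitly.

```python
import numpy as np

def fuse_orders(gps_bands, trace_ratios):
    """Hypothetical weighted multi-order fusion: concatenate the flattened GPS
    bands of the selected orders, each scaled by its normalized trace ratio."""
    w = np.asarray(trace_ratios, dtype=float)
    w = w / w.sum()
    return np.concatenate([wi * g.ravel() for wi, g in zip(w, gps_bands)])
```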

4.2 Robustness to Illumination on the Extended Yale-B Face Database

The Extended Yale-B database consists of 2414 frontal face images of 38 individuals under various laboratory-controlled lighting conditions. All test images used in the experiments are manually aligned, cropped, and then resized to 64 × 64. The Extended Yale-B set has little variability in expression, aging, etc.; however, its extreme lighting conditions still make it challenging for most face recognition methods. The database is divided into five subsets, as shown in Table 3. In this experiment, we select 0.2, 0.5 and 0.8 as the band fusion orders, chosen randomly from the three transform ranges.

Table 3. Data partition on Extended Yale-B database for various experiments

Figure 6 shows the performance on the individual subsets. We compare the proposed method with PCA, FLD, LPP, SRC and ESERPCA [11] in terms of their capability to handle illumination changes. The three classical methods are sensitive to illumination variation. Note that SRC, ESERPCA and the proposed method show excellent performance for moderate light variations, yielding 100%. In particular, WMFP obtains the best recognition rate of 62.36% on Subset 5, whereas the recognition accuracy of the other approaches drops heavily on this subset. The figure not only visualizes the performance across the different subsets and methods, but also demonstrates the robustness of the proposed algorithm to varying illumination.

Fig. 6. Comparison results on the Extended Yale-B database.

5 Conclusion

This paper proposes a new method called WMFP, which significantly improves face recognition based on the phase information of the 2D-FrFT. The GPS bands are extracted from different orders of the 2D-FrFT, so that the fused feature contains both the smooth information of the small orders and the edge information of the large orders. With our method, the phase information yields improved recognition accuracy, and the approach is also robust to illumination variation because it makes full use of the phase information of different transform orders. For future work, we are interested in designing effective classifiers to further improve the recognition performance. Moreover, how to combine the 2D-FrFT magnitude and phase of face images to further improve face recognition accuracy is another interesting direction.