1 Introduction

Face recognition is an important technique in biometrics. It is widely used in daily life, especially in the field of personal safety. Many research approaches and applications based on face recognition have been reported, and the techniques are categorized into identification, determination and verification. Conventional recognition systems include access control systems, visual surveillance systems, and so on. Most face recognition systems are developed under non-illumination-variation conditions (normal conditions). To address the limitations in dealing with illumination variation, many efforts have been made in illumination normalization research. However, these studies consider the human face only under little or partial illumination variation, such as the local binary pattern (LBP) method [35,36,37] and homomorphic wavelet-based illumination normalization with a difference of Gaussian filter (HWIN + DoG), but not under full illumination variation.

In this paper, we propose an approach to illumination normalization for facial images based on the enhanced contrast method of histogram equalization (HE) and a Gaussian low-pass filter (GLPF) smoother. A new filter for illumination normalization is generated by combining HE and GLPF, called HE_GLPF for short. After normalizing face images with HE_GLPF, features of the preprocessed images are extracted using Gabor wavelets, which consider both magnitude and phase in the frequency domain of the images, together with principal component analysis (PCA). Then, a support vector machine (SVM) is applied as a classifier.

Up to now, many popular methods have been proposed in the academic community for face recognition. The forerunner of these typical methods is the well-known PCA method. A related article [1] presented an approach using PCA to extract principal component features from two-dimensional (2-D) facial images, expressing them in place of the large 1-D vector of pixels. Linear discriminant analysis (LDA) was then applied to face recognition; one article [2] used LDA to provide a small set of features containing the most relevant information for classification, overcoming a limitation of the PCA method by applying a linear discriminant criterion. Afterwards, many improved applications based on PCA and LDA were presented [3,4,5,6,7,8,9,10,11]. However, these methods do not perform well because of their limited ability to manage variations in facial expression, lighting conditions and position.

In order to solve the above-mentioned problem, Gabor filters were used for face recognition, since a Gabor filter can extract the features of a face image at multiple scales and orientations. Early representative articles [12,13,14,15,16] presented a method of elastic bunch graph matching (EBGM) that extracts fiducial points on the face to reconstruct face images, with the recognition task based on these image graphs. Finding the fiducial points precisely is very important, but is challenging because of illumination variations. The Gabor–Fisher classifier (GFC) method was then presented [17], which is robust to changes in illumination and facial expression. However, the GFC method extracts the Gabor features only by down-sampling, which risks losing some important features. After that, many methods based on Gabor filters were proposed [19,20,21]. All of these Gabor filter–based methods consider only the magnitude of the Gabor wavelet, not its phase.

Until recently, only Bellakhdhar et al. [17] had presented a method to raise the facial recognition rate by fusing the phase and magnitude of a Gabor wavelet, building a classifier for facial recognition based on PCA and SVM. Their method was verified on the public Facial Recognition Grand Challenge v2 database and the Olivetti Research Laboratory (ORL) database, with experimental results showing that it achieves a higher recognition rate than some existing approaches [17,18,19]. However, this approach may not perform well under illumination variation, because both of the experimental databases used in the paper were collected under controlled illumination conditions.

To solve the problem of illumination variation in facial images for face recognition, many techniques have been proposed. A popular and effective solution is the difference of Gaussians (DoG) filter, used by some researchers [21,22,23] for illumination normalization of facial images, after which the recognition rate is higher than with face recognition methods that perform no illumination normalization. Most worthy of mention is the illumination normalization method of homomorphic wavelet-based illumination normalization with a difference of Gaussian filter (HWIN + DoG) [23], which performs better than other proposed illumination normalization methods [21, 24, 26].

In this paper, we propose a new illumination normalization method that combines an enhanced contrast method, HE, with a smoother, GLPF. HE changes image intensities to enhance contrast; after the adjustment, the intensities are better distributed on the histogram, which yields higher contrast in areas of lower local contrast. HE can address illumination variation effectively, yet even with HE a lot of noise remains and the image is not smooth enough for facial recognition. To cope with this shortcoming, GLPF is used to filter out the noise and smooth the image. After illumination normalization, features of the preprocessed images are extracted using a Gabor wavelet and PCA. SVM is then used as a classifier for facial recognition.

This paper is organized as follows. In Sect. 2, we introduce the HE, GLPF, Gabor, PCA and SVM methods step by step. In Sect. 3, we evaluate our proposed approach on various face databases. The experiments show that the proposed HE_GLPF approach performs well under both full and partial illumination variation and is effective at dealing with the problem of illumination variation in face images. Finally, conclusions and discussion are given in Sect. 4.

2 Proposed approach

In this section, we build a new filter for illumination normalization, generated by the combination of HE and GLPF. After normalizing face images with HE_GLPF, features of the preprocessed images are extracted by combining a Gabor wavelet, which considers both magnitude and phase, with PCA. Then, SVM is applied as a classifier for face recognition. Figure 1 shows the workflow of our proposed approach.

Fig. 1

Workflow of the proposed approach

2.1 Illumination normalization

In this stage, we normalize the image considering both spatial and frequency domains. In the process of illumination normalization, we use HE as a method for changing image intensities to enhance contrast. Subsequently, GLPF is applied to eliminate noise and smooth the target image. HE and GLPF are combined as a filter to normalize the image to address the problem of illumination variation.

2.2 Histogram equalization

Histogram equalization is a way to adjust contrast in image processing by using the image’s histogram; it operates in the spatial domain of the image. Given an image I with equalized image E, the integer pixel intensities of I range from 0 to \(L-\)1, where L is the number of possible intensity values. The normalized histogram p of I, with a bin for each possible intensity, is described in (1):

$$\begin{aligned} p_n =\frac{N_n }{N}, n=0, 1, \ldots , L-1 \end{aligned}$$
(1)

where \(N_n \) is the number of pixels with intensity n, and N is the total number of pixels. Then the transformation that produces the equalized image E is defined by (2):

$$\begin{aligned} T( k )=floor\left( {\left( {L-1} \right) \mathop \sum \nolimits _{n=0}^k p_n } \right) \end{aligned}$$
(2)

where k ranges over the pixel intensities of I and E, and the function floor() rounds down to the nearest integer. Treating the intensities of I and E as continuous random variables X and Y, Y can be defined with (3):

$$\begin{aligned} Y_k =T\left( {X_k } \right) =\left( {L-1} \right) \mathop \int \nolimits _0^k p_X \left( x \right) dx \end{aligned}$$
(3)

where \(Y_k \) is the equalized intensity corresponding to level k in E, and \(T\left( {X_k } \right) \) is the transform applied to pixel intensities with level k in I. Substituting Eq. (1), we finally obtain the discrete histogram equalization transform, defined in (4):

$$\begin{aligned} Y_k =\left( {L-1} \right) \mathop \sum \nolimits _{n=0}^k \frac{N_n }{N} \end{aligned}$$
(4)
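The discrete transform of Eqs. (1)–(4) can be sketched in a few lines of NumPy. The function name `histogram_equalize` and the default of L = 256 gray levels are our own illustrative choices, not part of the original method description:

```python
import numpy as np

def histogram_equalize(image, L=256):
    """Equalize an integer grayscale image following Eqs. (1)-(4).

    image: 2-D array with intensities in [0, L-1].
    """
    img = np.asarray(image)
    # Eq. (1): normalized histogram p_n = N_n / N
    p = np.bincount(img.ravel(), minlength=L) / img.size
    # Eq. (2): T(k) = floor((L - 1) * cumulative sum of p up to k)
    T = np.floor((L - 1) * np.cumsum(p)).astype(img.dtype)
    # Map every pixel intensity k of I to T(k), giving E
    return T[img]
```

On a dark, low-contrast image, the mapping stretches the occupied intensity range toward the full [0, L-1] scale, which is exactly the contrast enhancement used here.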

2.3 Gaussian low-pass filter

GLPF is an image processing method in the frequency domain, used to smooth images and remove noise. Given an image I represented as an M-by-N array of integer pixels, the GLPF can be defined with (5):

$$\begin{aligned} H\left( u \right) =e^{-\frac{1}{2}\left( {\frac{u^{2}}{u_c^2 }} \right) } \end{aligned}$$
(5)

where u is the distance from point (i, j) to the center of the Fourier transform, and \(u_c \) is the standard deviation of the Gaussian function; its value ranges from 0 to 255. The distance from point (i, j) to the center of the Fourier transform is defined with (6):

$$\begin{aligned} u=\sqrt{\left( {i-floor\left( {\frac{M}{2}} \right) } \right) ^{2}+\left( {j-floor\left( {\frac{N}{2}} \right) } \right) ^{2}} \end{aligned}$$
(6)

where floor() rounds down to the nearest integer. We then multiply the Fourier transform of the original image by H(u), which corresponds to a convolution in the spatial domain, and the inverse transform yields the filtered image. Note that decreasing \(u_c \) in (5) narrows the passband and causes more blurring.
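A minimal frequency-domain sketch of this filter, assuming the conventional negative exponent in Eq. (5) and the centered distance of Eq. (6); the helper name `gaussian_lowpass` is illustrative:

```python
import numpy as np

def gaussian_lowpass(image, u_c=50.0):
    """Smooth an image with a Gaussian low-pass filter in the
    frequency domain, following Eqs. (5)-(6).

    u_c is the standard deviation (cutoff) of the Gaussian; smaller
    values pass fewer frequencies and blur more.
    """
    img = np.asarray(image, dtype=float)
    M, N = img.shape
    # Eq. (6): distance from each frequency sample to the spectrum center
    di = np.arange(M)[:, None] - M // 2
    dj = np.arange(N)[None, :] - N // 2
    # Eq. (5): H(u) = exp(-(1/2) * u^2 / u_c^2)
    H = np.exp(-(di ** 2 + dj ** 2) / (2.0 * u_c ** 2))
    # Multiply the centered spectrum by H and transform back
    F = np.fft.fftshift(np.fft.fft2(img))
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
```

A constant image passes through unchanged (only the DC component is nonzero, and H is 1 there), while high-frequency noise is attenuated.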

2.4 Feature extraction

In this stage, we extract features from images by combining a Gabor wavelet and principal component analysis.

2.4.1 Gabor wavelet

A Gabor filter is a linear filter used for edge detection in image processing. A set of Gabor filters with different frequencies and orientations is applied to extract characteristics from an image. A Gabor wavelet is a combination of elements from a family of mutually similar Gabor functions. Commonly, in the spatial domain, a Gabor wavelet is defined as a 2-D plane wave with wave vector \({\vec k_{\mu ,\upsilon }}\), restricted by a Gaussian envelope with relative width \(\sigma \) [13, 24], as shown in (7):

$$\begin{aligned}&{\Psi _{\mu ,\upsilon }}\left( {\vec z} \right) = \frac{{{{\left| {\left| {{{\vec k}_{\mu ,\upsilon }}} \right| } \right| }^2}}}{{{\sigma ^2}}}\exp \left( { - \frac{{{{\left| {\left| {{{\vec k}_{\mu ,\upsilon }}} \right| } \right| }^2}{{\left| {\left| {\vec z} \right| } \right| }^2}}}{{2{\sigma ^2}}}} \right) \nonumber \\&\quad \left[ {\exp \left( {i{{\vec k}_{\mu ,\upsilon }}\vec z} \right) - \exp \left( { - \frac{{{\sigma ^2}}}{2}} \right) } \right] \end{aligned}$$
(7)

where \(\mu \) and \(\upsilon \) define the orientation and scale of the Gabor kernels. Gabor wavelets generally use 8 different orientations and 5 different scales, \(\mu =\left\{ {0,1,\ldots , 7} \right\} \) and \(\upsilon =\left\{ {0, 1, \ldots , 4} \right\} \). \(\left| {\left| \cdot \right| } \right| \) denotes the norm, and the wave vector \({\vec k_{\mu ,\upsilon }}\) is defined in (8):

$$\begin{aligned} {\vec k_{\mu ,\upsilon }} = {k_\upsilon }{e^{i{\phi _\mu }}} \end{aligned}$$
(8)

where \(k_\upsilon =k_{max} /f^{\upsilon }\) and \(\phi _\mu =\pi \mu /8\). \(k_{max} \) is the maximum frequency, and f is the spacing factor between kernels in the frequency domain [23, 26]. In most face recognition cases, the parameters \(\sigma =2\pi \), \(k_{max} =\pi /2\) and \(f=\sqrt{2}\) are used for the Gabor wavelet [13, 28]. The Gabor wavelet features are generated by performing a convolution between the image and the family of Gabor filters, as described by (9):

$$\begin{aligned} F_{\mu ,\upsilon } \left( z \right) =I\left( z \right) *\Psi _{\mu ,\upsilon } \left( z \right) \end{aligned}$$
(9)

where \(*\) denotes the convolution operator, and \(F_{\mu ,\upsilon } \left( z \right) \) is the Gabor filter response of the image with orientation \(\mu \) and scale \(\upsilon \).

In this paper, we build a Gabor wavelet considering both magnitude and phase in the frequency domain, with five different scales and eight different orientations; 5 \(\times \) 8 = 40 Gabor kernels are generated for both magnitude and phase. In the experiment, we set the size of input images to 128 \(\times \) 128 pixels. We then generate the Gabor features by convolving the image with the generated Gabor kernels, but the resulting feature vector, of size 128 \(\times \) 128 \(\times \) 40 \(\times \) 2, is too large to compute with directly. We therefore apply PCA to reduce the dimension and extract features for classification.
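Under the parameter choices cited above (\(\sigma =2\pi \), \(k_{max} =\pi /2\), \(f=\sqrt{2}\)), the 40-kernel bank of Eqs. (7)–(8) can be sketched as follows. The 31 \(\times \) 31 kernel size is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def gabor_kernel(mu, nu, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2), size=31):
    """Build one complex Gabor kernel Psi_{mu, nu} following Eq. (7).

    mu in {0..7} selects the orientation, nu in {0..4} the scale.
    """
    # Eq. (8): wave vector k_{mu, nu} = k_nu * exp(i * phi_mu)
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = np.abs(k) ** 2
    z2 = x ** 2 + y ** 2
    # Gaussian envelope times the oscillatory (DC-compensated) carrier
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# The 8 x 5 = 40 kernel bank; Eq. (9) is then a convolution of the
# image with each complex kernel, whose magnitude and phase (angle)
# give the two response maps used for the features.
bank = [gabor_kernel(mu, nu) for mu in range(8) for nu in range(5)]
```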

2.4.2 Principal component analysis

PCA is an effective method for reducing the number of dimensions of input data without much loss of information. In this section, the goal of using PCA is to extract features that can well preserve the principal components in a matrix.

Given a set of face images \(I_1 \), \(I_2 \), ..., \(I_m \), the average face of these given face images is defined by (10):

$$\begin{aligned} {\Psi }=\frac{1}{m}\mathop \sum \nolimits _{i=1}^m I_i \end{aligned}$$
(10)

where \(i=\left\{ {1,\ldots ,m} \right\} \). The difference of each input face from the average face is expressed by (11):

$$\begin{aligned} \phi _i =I_i -{\Psi } \end{aligned}$$
(11)

Then, the covariance matrix CM can be calculated with (12):

$$\begin{aligned} { {CM}}=\mathop \sum \nolimits _{i=1}^m \phi _i \phi _i^T =AA^{T} \end{aligned}$$
(12)

where \(A=\left[ {\phi _1 , \ldots , \phi _m } \right] \). During this process, the face database is loaded and the transformation is applied. Because the signal contains information useful for recognition, the relevant parameters are extracted. The resulting model is a compact representation of the signal, which both eases the recognition process and reduces the quantity of data to be stored.
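A compact sketch of the projection in Eqs. (10)–(12); the eigenvectors are obtained via SVD of the centered data rather than forming \(AA^{T}\) explicitly (the two are equivalent, and the SVD route is numerically safer). The function name is illustrative:

```python
import numpy as np

def pca_features(faces, n_components):
    """Project face vectors onto principal components, per Eqs. (10)-(12).

    faces: (m, d) array with one flattened face image per row.
    Returns (projections, mean_face, eigenfaces).
    """
    faces = np.asarray(faces, dtype=float)
    mean_face = faces.mean(axis=0)       # Eq. (10): average face Psi
    A = faces - mean_face                # Eq. (11): rows are phi_i
    # Principal directions of the covariance in Eq. (12), via SVD of A
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:n_components]
    return A @ eigenfaces.T, mean_face, eigenfaces
```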

2.5 Classification based on support vector machine

A support vector machine (SVM) is a supervised learning model. The SVM algorithm was originally designed for binary classification; to handle multi-class problems, a suitable multi-class classifier must be constructed. There are two kinds of methods for multi-class SVM classification. The first is the direct method, which appears simple but has high computational complexity; it is difficult to realize and suitable only for small problems. The second is the indirect method, which is used in most cases and is constructed by combining several binary classifiers; the common schemes are one-versus-rest and one-versus-one. In one-versus-rest, the samples of each class in turn are trained against all remaining samples as the other class, so K SVMs are constructed for K classes, and an unknown sample is assigned to the class with the maximum classification value. This method has a drawback: the training sets are imbalanced (1:m), which is problematic in the case of bias. We therefore use the one-versus-one method for classification. In one-versus-one, an SVM is designed between every pair of classes, so K*(\(K-\)1)/2 SVMs are needed for K classes. An unknown sample is assigned to the class with the most votes.

The advantage of using SVM for face classification is the low expected probability of generalization errors. In our work, we split the face data set into training sets based on the one-versus-one method. Assume there are four persons: A, B, C and D. In the training process, the sample data of (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D) are used as six training sets, yielding six trained classifiers: classifier-(A, B), classifier-(A, C), classifier-(A, D), classifier-(B, C), classifier-(B, D) and classifier-(C, D). When testing, an unknown sample is presented to all six classifiers, which then vote, and a set of results is obtained. The voting process is shown in Fig. 2; the class of an unknown sample is the class with the maximum number of votes.

Fig. 2

The process of voting
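The pairwise voting scheme can be sketched as follows. Note that a simple nearest-centroid rule stands in for each pairwise SVM, purely to keep the example self-contained; the sketch therefore demonstrates only the K*(K-1)/2 pairing and the vote, not the SVM training itself:

```python
import numpy as np
from itertools import combinations

def one_vs_one_vote(train_X, train_y, sample):
    """Assign `sample` to the class with the most pairwise votes.

    For K classes, one classifier is built per pair (K*(K-1)/2 in
    total) and each votes for one of its two classes. A nearest-
    centroid rule stands in for each pairwise SVM here.
    """
    classes = np.unique(train_y)
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):        # e.g. (A, B), (A, C), ...
        ca = train_X[train_y == a].mean(axis=0)  # stand-in "classifier-(a, b)"
        cb = train_X[train_y == b].mean(axis=0)
        da = np.linalg.norm(sample - ca)
        db = np.linalg.norm(sample - cb)
        votes[a if da <= db else b] += 1
    return max(votes, key=votes.get)             # class with maximum votes
```

With real SVMs, each pairwise classifier's prediction replaces the centroid comparison, but the tallying of votes is identical.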

3 Experiment and analysis

In the experiments, we use three different face databases: the Olivetti Research Laboratory face database (ORL), the Yale University face database (Yale) and the Brazilian face database (FEI), which represent normalized, partial illumination variation and full illumination variation face databases, respectively. We compared the results of our proposed approach with those of existing approaches on these three face databases.

3.1 Data set

The ORL face database is composed of 40 distinct persons, with 10 different images of each person. It was compiled between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK [29]. The face images in ORL have no illumination variation; some sample face data are shown in Fig. 3a. For partial illumination variation, the Yale face database is used. It contains 165 grayscale images of 15 distinct persons, with 11 images per subject showing different facial expressions or configurations [32]. Some of these face images exhibit partial illumination variation: part of the contour and texture of the face is not shown clearly, which may cause inaccurate recognition results. Samples of these face data are shown in Fig. 3b. Full illumination variation data are obtained from the FEI face database, which was collected between June 2005 and March 2006 at the Artificial Intelligence Laboratory of FEI in São Bernardo do Campo, São Paulo, Brazil [34]. It contains 14 images for each of 200 distinct persons. Some of these face images exhibit full illumination variation: almost all of the contour and texture of the face is not shown clearly, which will cause inaccurate results. Some sample face data are shown in Fig. 3c.

Fig. 3

Sample face images: a from the ORL database; b from the Yale database; c from the FEI database

3.2 Analysis of proposed approach

In this subsection, we evaluate our proposed approach by discussing the standard deviation parameter \(u_c \) used in the process of illumination normalization, and by comparing it with previously proposed methods.

3.2.1 Analysis of parameter of standard deviation

In this stage, we applied our proposed approach to the FEI face database to evaluate the standard deviation parameter in the process of illumination normalization. Note that decreasing \(u_c \) used in Eq. (5) causes more blurring; its value ranges from 0 to 255. The experimental result is shown in Fig. 4.

The result shows that the recognition rate will be high when the value of \(u_c \) ranges from 50 to 100.

Fig. 4

Effect of parameter of \(u_c \)

3.2.2 Comparison with proposed methods

We now compare our proposed illumination normalization approach (HE_GLPF) with other illumination normalization approaches (HE [34], LVT [31], HWIN + DoG [23]). Figure 5 shows the results of illumination normalization.

Fig. 5

Results of illumination normalization in sample facial images

Fig. 6

Comparison of recognition rate in different face databases

In Fig. 5, the first facial image in row 1 comes from the ORL dataset; it is a normal face image. The first image in row 2 is from the Yale dataset; it is a partial illumination variation image. The first image in row 3 is a full illumination variation image from the FEI dataset. All of the approaches mentioned (HE, LVT, HWIN + DoG and our proposed HE_GLPF) were applied to the illumination normalization of these three original images (shown in the first column of Fig. 5). In the first case, where the face is a normal image (the first row of Fig. 5), the characteristics of the face can be seen clearly for all approaches. In the second case, the partial illumination variation image in the second row, the characteristics of the face can also be discriminated clearly. Finally, for the full illumination variation image in the third row, our proposed HE_GLPF performs best among the approaches.

With the results of Fig. 5, we can evaluate our proposed illumination normalization approach (HE_GLPF) against the other existing illumination normalization approaches (HE, LVT, HWIN + DoG) by comparing the results of performing classification using Gabor_PCA_SVM. We applied these approaches to the three face databases, respectively, and set the value of \(u_c \) in our proposed illumination normalization approach to 50. The recognition rates of our proposed method, HE_GLPF + Gabor_PCA_SVM (HE_GLPF_Gabor_PCA_SVM), and the other existing methods, HE + Gabor_PCA_SVM (HE_Gabor_PCA_SVM), LVT + Gabor_PCA_SVM (LVT_Gabor_PCA_SVM) and HWIN + DoG + Gabor_PCA_SVM (HWIN_DoG_Gabor_PCA_SVM), are illustrated in Fig. 6. The SVM classifiers used in this paper were all run with default parameters (main parameters: C = 1.0, kernel = ’rbf’, degree = 3, gamma = ’auto’, coef0 = 0.0). From the results, the proposed HE_GLPF_Gabor_PCA_SVM showed the best performance regardless of data type: normal, full or partial illumination. Generally, the recognition rate on the normal data is better than on the illumination-affected data, and the recognition rate on the partial illumination variation data is slightly higher than on the full illumination variation data.

The results show that all approaches achieve a recognition rate of over 95% on the normal face database from ORL. This is an expected result, because there are no illumination problems. The proposed HE_GLPF_Gabor_PCA_SVM approach showed one of the highest recognition rates, together with HE_Gabor_PCA_SVM and HWIN_DoG_Gabor_PCA_SVM. On the partial illumination variation data from Yale, the proposed HE_GLPF_Gabor_PCA_SVM approach is tied for first with the LVT_Gabor_PCA_SVM and HWIN_DoG_Gabor_PCA_SVM approaches. In the FEI face database test, which contains face images with full illumination variation, our proposed approach performs best in comparison to all other approaches, because our proposed illumination normalization approach, HE_GLPF, can solve the full illumination variation problem more effectively than the others. Specifically, the recognition rates of the other approaches (Gabor_PCA_SVM, HE_Gabor_PCA_SVM, HWIN_DoG_Gabor_PCA_SVM and LVT_Gabor_PCA_SVM) are distinctly lower than that of our proposed HE_GLPF_Gabor_PCA_SVM approach on the facial dataset containing full illumination variation.

The proposed classification model has also been evaluated using the receiver operating characteristic (ROC). An ROC curve is a graphical plot that illustrates the performance of a classifier system as its discrimination threshold is varied. The ROC is created by plotting the true positive rate (TPR) against the false positive rate (FPR). In the experiment, TPR and FPR are described by Eqs. (13) and (14):

$$\begin{aligned} { {TPR}}=\frac{{ {TP}}}{{ {TP}}{+}{} { {FN}}} \end{aligned}$$
(13)
$$\begin{aligned} { {FPR}}=\frac{{ {FP}}}{{ {FP}}{+}{} { {TN}}} \end{aligned}$$
(14)

where TP, FN, FP and TN are described in Table 1. TP is assigned when the instance is positive and classified as positive; FN when the instance is positive but classified as negative; TN when the instance is negative and classified as negative; and FP when the instance is negative but classified as positive.

Table 1 Contingency table of ROC components
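Eqs. (13) and (14) reduce to two ratios over the four counts of Table 1; a minimal sketch with an illustrative function name:

```python
def tpr_fpr(y_true, y_pred):
    """Compute TPR (Eq. 13) and FPR (Eq. 14) from binary label lists."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)
```

Sweeping the classifier's decision threshold and plotting each resulting (FPR, TPR) pair traces out the ROC curve.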

The experimental results from the ROC are shown in Fig. 7. We can see that our proposed approach performs best among all of the approaches.

Fig. 7

Experimental results from ROC

Fig. 8

Experimental result from EPC

The expected performance curve (EPC) [33] is used to compare different classification models over a range of possible expected performances. EPC accounts for a possible mismatch in estimating the desired threshold, and the parameter alpha (\({\alpha })\) is used to model this mismatch. The result is shown in Fig. 8. It shows that our proposed approach maintains a small error rate across the variation of alpha.

4 Conclusions

In this paper, in order to improve the accuracy of face recognition on poorly illuminated face images, we proposed an approach to illumination normalization of face images that combines HE and GLPF. HE is a method to adjust contrast during image processing and is very useful when both the background and the foreground of a facial image are too bright or too dark. A drawback of this method is that it processes the data indiscriminately, which may amplify background noise. GLPF, a frequency-domain image processing method used to smooth images and remove noise, largely repairs the defects introduced by HE. Gabor wavelets and PCA are then used to extract features, and an SVM is applied for face classification. The experiments show that our proposed HE_GLPF approach performs well with both full and partial illumination variation and is effective at dealing with the problem of illumination variation in face images.

In future work, we will extend the application of our proposed illumination normalization approach to bioimages, such as X-rays and mammograms.