1 Introduction

Face recognition (FR) is the problem of verifying or identifying a face from its image. It has received a great deal of attention from the scientific and industrial communities over the last three decades due to its wide range of applications in entertainment, smart cards, information security, law enforcement, access control and video surveillance [1, 2]. Various methods have been proposed for facial feature extraction and classification, among which representative approaches include subspace learning methods (e.g., Eigenface [3], Fisherface [4], Laplacianfaces [5], and subspace learning from image gradient orientations (IGO) [6]), kernel-based subspace learning methods [7–10], local binary pattern (LBP) methods [11, 12], Gabor feature-based classification methods [13–15], preprocessing-based approaches [16–21, 30, 31, 33, 34], the recently developed sparse representation-based methods [22–25], and so on. While most of these methods have already achieved impressive performance over large-scale galleries under controlled conditions, many challenges remain for FR in uncontrolled environments, such as partial occlusions, varying lighting conditions, expressions, and poses.

This paper focuses mainly on image preprocessing and on robustness to facial expression variations, lighting variations and partial occlusions. Preprocessing-based approaches seek to reduce the image to a more “canonical” form, and many well-known algorithms have been developed for this purpose. Histogram equalization (HE) is one simple example, but purpose-designed methods often exploit the fact that (on the scale of a face) naturally occurring illumination distributions typically have predominantly low spatial frequencies and soft edges, so that high-frequency information in the image is predominantly signal (i.e., intrinsic facial appearance). For example, the Multiscale Retinex (MSR) method of Jobson et al. [16] cancels much of the low-frequency information by dividing the image by a smoothed version of itself. Zhang et al. developed a technique called Gradientfaces (GF) that extracts an illumination-insensitive measure from the gradient domain [17]. Wang et al. [18] use a similar idea (with a different local filter) in the self-quotient image (SQI). Gross and Brajovic (GB) [19] developed an anisotropic smoothing method that relies on the iterative estimation of a blurred version of the original image. More recently, Tan and Triggs (TT) [20] presented a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while preserving the essential appearance details needed for recognition, and Wang et al. [21] presented an illumination normalization technique called Weberfaces (WF) based on Weber’s law. In addition, nonnegative matrix factorization (NMF) has become a popular data-representation method and has been widely used in image processing and pattern-recognition problems [30, 31].

In this paper, we propose an image preprocessing method based on a local approximation gradient (LAG). The traditional gradient is calculated only along the 0° and 90° directions; however, many other directional gradients exist in an image block. In order to take more directions into account, we introduce a novel LAG operator, which is computed by integrating these additional directional gradients. Generally, the main purpose of gradient calculation is to obtain edge information. Because it considers more directional gradients, LAG captures more edge information for each pixel and finally generates an LAG image, which yields a more robust dissimilarity measure between images. More edge information corresponds to regions of higher variance in the image, and such regions intuitively deserve more attention than flat regions; the orientation computed by LAG should therefore play an important role in a classification task. The LAG image is then normalized into an augmented feature vector using the “z-score” method, and the dimensionality of this vector is reduced by linear discriminant analysis (LDA) to yield a low-dimensional feature vector. Experimental results show that LAG achieves more robust results than state-of-the-art methods on the AR, Extended Yale B and CMU PIE face databases. The rest of the paper is organized as follows: Section 2 presents our preprocessing method, Sect. 3 reports experimental results, and Sect. 4 concludes the paper.

2 Image preprocessing method based on local approximation gradient

2.1 Local approximation gradient

Generally, the gradient operator is a first-order derivative operator. For an input image Γ(x, y), the gradient is the vector:

$$\nabla \Gamma (x,y) = \left( {\begin{array}{*{20}c} {G_{x} } \\ {G_{y} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\frac{\partial \Gamma }{\partial x}} \\ {\frac{\partial \Gamma }{\partial y}} \\ \end{array} } \right).$$
(1)

The amplitude of the vector is expressed as:

$${\text{mag}}(\Gamma ) = (G_{x}^{2} + G_{y}^{2} )^{1/2}$$
(2)

The orientation angle of the vector is expressed as:

$$\beta (x,y) = \arctan \left( {\frac{{G_{y} }}{{G_{x} }}} \right)$$
(3)

For a discrete image Γ(x, y), \(G_{x}\) and \(G_{y}\) are usually calculated with different templates. An image template neighborhood T ∊ ℜ3×3 is shown in Fig. 1. We denote by α the intensity value at the central pixel location and by \(b_{m}\) (m = 1, 2, …, 8) the mth neighbor of α. The orientation angle of α can then be calculated according to \(\beta_{\alpha } = \arctan \left( {\frac{{b_{3} - b_{7} }}{{b_{1} - b_{5} }}} \right).\)

Fig. 1
figure 1

Image template neighborhood
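For illustration, the following minimal NumPy sketch computes the traditional gradient magnitude and orientation of formulas (1)–(3) from simple pixel differences. The assumption that \(b_{1}/b_{5}\) are the horizontal neighbors and \(b_{3}/b_{7}\) the vertical neighbors of α is taken from the orientation formula above, and the function name is only illustrative.

```python
import numpy as np

def gradient_3x3(image):
    """Traditional gradient magnitude and orientation (formulas (1)-(3)).

    Assumes b1/b5 are the right/left neighbours and b3/b7 the top/bottom
    neighbours of the centre pixel alpha (indexing as in Fig. 1).
    """
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode='reflect')   # mirror the one-pixel border
    b1 = padded[1:-1, 2:]                     # right neighbour
    b5 = padded[1:-1, :-2]                    # left neighbour
    b3 = padded[:-2, 1:-1]                    # top neighbour
    b7 = padded[2:, 1:-1]                     # bottom neighbour
    gx = b1 - b5                              # gradient along 0 degrees
    gy = b3 - b7                              # gradient along 90 degrees
    mag = np.sqrt(gx ** 2 + gy ** 2)          # formula (2)
    beta = np.arctan2(gy, gx)                 # formula (3), quadrant-aware
    return mag, beta
```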

We can see that the traditional gradient is calculated only along 0° and 90°; however, many other directional gradients exist in an image block. To take more directions into account, we design a novel gradient operator called LAG. The local approximation gradient is likewise calculated in a local image block. Figure 2 shows the eight different directions in a 3 × 3 image template neighborhood.

Fig. 2
figure 2

Eight different directions in image template neighborhood

For an input image Γ(x, y), we can calculate its LAG vector according to formula (4).

$$\nabla \Gamma^{\text{LAG}} (x,y) = \left( {\begin{array}{*{20}c} {G_{0^{\circ}} } \\ {G_{45^{\circ}} } \\ {G_{90^{\circ}} } \\ {G_{135^{\circ}} } \\ {G_{180^{\circ}} } \\ {G_{225^{\circ}} } \\ {G_{270^{\circ}} } \\ {G_{315^{\circ}} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {b_{1} - \alpha } \\ {b_{2} - \alpha } \\ {b_{3} - \alpha } \\ {b_{4} - \alpha } \\ {b_{5} - \alpha } \\ {b_{6} - \alpha } \\ {b_{7} - \alpha } \\ {b_{8} - \alpha } \\ \end{array} } \right)$$
(4)

The amplitude of the vector is expressed as:

$${\text{mag}}(\Gamma^{\text{LAG}} ) = \left( {G_{0^{\circ}}^{2} + G_{45^{\circ}}^{2} + \cdots + G_{315^{\circ}}^{2} } \right)^{1/2} = \left( {\sum\limits_{m = 1}^{8} {(b_{m} - \alpha )^{2} } } \right)^{1/2}$$
(5)

The orientation angle of the vector is expressed as:

$$\beta^{\text{LAG}} = \arctan \left( {\frac{1}{8}\sum\limits_{m = 1}^{8} {(b_{m} - \alpha )} } \right)$$
(6)
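A minimal sketch of formulas (4)–(6) for a single pixel is given below; the neighbor ordering b1, …, b8 is assumed to follow Fig. 2, and the example values and function name are purely illustrative.

```python
import numpy as np

def lag_pixel(neighbours, alpha):
    """LAG vector, amplitude and orientation of one pixel (formulas (4)-(6)).

    `neighbours` holds [b1, ..., b8] of the 3x3 neighbourhood of the centre
    value `alpha`, ordered as in Fig. 2.
    """
    g = np.asarray(neighbours, dtype=np.float64) - float(alpha)  # formula (4)
    mag = np.sqrt(np.sum(g ** 2))                                # formula (5)
    beta = np.arctan(np.mean(g))                                 # formula (6)
    return g, mag, beta

# Illustrative values: centre intensity 120 and its eight neighbours
g, mag, beta = lag_pixel([118, 121, 119, 125, 130, 117, 122, 124], 120)
```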

Suppose there are N pixels in an image. We treat a pixel α as a center and determine its l neighbor pixels on a square of radius L using the city-block distance (note that for pixels on the margin of the image, we first enlarge the image by a mirror transform and then determine the neighbors). These neighbor pixels form a square neighbor set. We denote by \(\varOmega_{\alpha }^{L} = \left\{ {b_{1} ,b_{2} , \ldots ,b_{l} } \right\},\;l = \# (\varOmega_{\alpha }^{L} )\) the Lth neighbor set of pixel α, where L is the radius and l is the number of neighbor pixels.

Figure 3 illustrates the neighbor sets for different radii L. Figure 3a shows the first neighbor set \(\varOmega_{\alpha }^{1}\) (including 8 neighbor pixels), Fig. 3b the second neighbor set \(\varOmega_{\alpha }^{2}\) (including 24 neighbor pixels), and Fig. 3c the third neighbor set \(\varOmega_{\alpha }^{3}\) (including 48 neighbor pixels).

Fig. 3
figure 3

Square neighbor sets for radius L. a l = 8 (L = 1), b l = 24 (L = 2), and c l = 48 (L = 3)

For the Lth neighbor set of pixel α, we can calculate the corresponding orientation angle according to formula (7)

$$\beta^{\text{LAG}} = \arctan \left( {\frac{1}{l}\sum\limits_{m = 1}^{l} {(b_{m} - \alpha )} } \right),\quad \varOmega_{\alpha }^{L} = \left\{ {b_{1} ,b_{2} , \ldots ,b_{l} } \right\},\quad l = \# (\varOmega_{\alpha }^{L} )$$
(7)
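The whole orientation LAG image can then be obtained by applying formula (7) at every pixel. The following sketch is one possible NumPy implementation under the stated mirror-transform border handling; the function name and the vectorized shifting strategy are our own illustrative choices.

```python
import numpy as np

def lag_image(image, L=2):
    """Orientation LAG image for neighbour radius L (formula (7)).

    Every pixel uses the l = (2L+1)**2 - 1 neighbours inside its square
    window; border pixels rely on a mirrored copy of the image.
    """
    img = image.astype(np.float64)
    padded = np.pad(img, L, mode='reflect')        # mirror transform
    h, w = img.shape
    l = (2 * L + 1) ** 2 - 1                       # number of neighbours
    acc = np.zeros_like(img)
    for dy in range(-L, L + 1):
        for dx in range(-L, L + 1):
            if dy == 0 and dx == 0:
                continue                           # skip the centre pixel
            shifted = padded[L + dy:L + dy + h, L + dx:L + dx + w]
            acc += shifted - img                   # accumulate (b_m - alpha)
    return np.arctan(acc / l)                      # formula (7)
```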

For each pixel of an image, we consider all neighbor pixels in a local block and calculate the corresponding directional gradients; the LAG is obtained by integrating them. Since it considers more directional gradients, LAG captures more edge information, and more edge information corresponds to higher variance in the image. Intuitively, one pays more attention to regions of higher variance than to flat regions, so the orientation computed by LAG should play an important role in a classification task.

In the following, we compare the dissimilarities between differently preprocessed images under several distance metrics. Consider the motivating example in Fig. 4, which shows the differently preprocessed images (Original, HE, GF, GB, MSR, TT, WF, and LAG images); the resolution of all images is adjusted to 60 × 60. For GB [19] we set λ = 1. For MSR [16] we set hsiz = [7, 15, 21]. For GF [17] we set σ = 0.75. For TT [20] we set γ = 0.2, σ 0 = 1, σ 1 = 2, α = 0.1, and τ = 10. For WF [21] we set σ = 0.75, nn = 9, and α = 2. For the proposed LAG, we set L = 2. Table 1 shows the dissimilarity between the neutral expression image and the other images under different measures. The other images consist of a smiling expression image, a sunglasses image and a scarf image from the same subject, and a neutral expression image from a different subject. The measures are the Euclidean, Cosine, Correlation and Manhattan distances [29]. As can be seen in Table 1, seven methods (None, HE, GF, GB, MSR, TT and WF) assign a smaller distance between the original neutral image and a neutral image from a different subject than between the original and the same subject's occluded images. In contrast, LAG yields a larger distance to the image of a different person than to the occluded images of the same person. We can therefore conclude that LAG provides a more robust dissimilarity measure.

Fig. 4
figure 4

Images used for the dissimilarity measurement. a Original images, b HE images, c GF images, d GB images, e MSR images, f TT images, g WF images, h LAG images. Each row from left to right shows, in turn, the neutral expression, smiling expression, occluded (sunglasses), occluded (scarf) and another person’s neutral expression

Table 1 Dissimilarity comparison between the neutral expression image and the other images with different measures
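For reference, the four dissimilarity measures used in Table 1 can be computed as in the sketch below; the function operates on flattened 60 × 60 images and is only an illustrative re-statement of the standard definitions.

```python
import numpy as np

def dissimilarities(img_a, img_b):
    """Euclidean, Cosine, Correlation and Manhattan dissimilarities."""
    x = img_a.astype(np.float64).ravel()
    y = img_b.astype(np.float64).ravel()
    euclidean = np.linalg.norm(x - y)
    manhattan = np.sum(np.abs(x - y))
    cosine = 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    xc, yc = x - x.mean(), y - y.mean()
    correlation = 1.0 - np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return euclidean, cosine, correlation, manhattan
```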

2.2 Image normalization

Assume that we are given a set of n input images {Γ i }, i = 1, 2, …, n. We obtain the corresponding local approximation gradient images {Ψ i }, i = 1, 2, …, n by calculating the LAG of each image. Let y i be the p-dimensional column vector obtained by writing Ψ i in lexicographic ordering. All column vectors form a matrix B = [y 1, …, y n ] ∊ ℜp×n. Each gradient image is normalized using the “z-score” method [32]. We denote by δ i the normalized vector of y i . The qth dimension of δ i is calculated according to formula (8).

$$\delta_{i}^{\left( q \right)} = \left( {y_{i}^{\left( q \right)} - \mu^{\left( q \right)} } \right)/\sigma^{\left( q \right)} ,\quad i = 1, \ldots ,n,\quad q = 1, \ldots ,p$$
(8)

where y (q) i and δ (q) i are, respectively, the qth dimensions of the column vectors \(y_{i}\) and \(\delta_{i}\), and μ (q) and σ (q) are the corresponding mean and standard deviation, calculated as \(\mu^{(q)} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {y_{i}^{(q)} }\) and \(\sigma^{(q)} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {(y_{i}^{(q)} - \mu^{(q)} )^{2} } }\). Computing all p dimensions according to formula (8) yields, for each input image Γ i , the corresponding normalized vector \(\delta_{i}\).
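A compact sketch of the “z-score” normalization of formula (8) over the matrix B is shown below; the guard against a zero standard deviation is our own addition for numerical safety.

```python
import numpy as np

def zscore_normalize(B):
    """Dimension-wise z-score normalisation of B in R^{p x n} (formula (8)).

    Each column of B is one lexicographically ordered LAG image y_i; the
    mean and standard deviation of each dimension are taken over all n images.
    """
    mu = B.mean(axis=1, keepdims=True)       # mu^(q)
    sigma = B.std(axis=1, keepdims=True)     # sigma^(q)
    sigma[sigma == 0] = 1.0                  # guard against flat dimensions
    return (B - mu) / sigma                  # delta_i^(q)
```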

2.3 Dimensionality reduction

The dimensionality of the normalized vector is reduced by LDA to yield a low-dimensional feature vector. LDA searches for the feature vectors in the underlying space that best discriminate among classes. More formally, given a number of independent features relative to which the data are described, LDA creates the linear combination of these that yields the largest mean differences between the desired classes. Mathematically, over all the samples of all classes, we define two measures: (1) the within-class scatter matrix, given by formula (9)

$$S_{w} = \sum\limits_{c = 1}^{C} {\sum\limits_{h = 1}^{{l^{c} }} {(\delta_{h,c} - m^{c} )(\delta_{h,c} - m^{c} )^{T} } }$$
(9)

where δ h,c denotes the hth training sample in class c, l c is the number of training samples in class c, m c is the mean of the training samples in class c, and C is the number of classes. (2) The other is the between-class scatter matrix, given by formula (10)

$$S_{b} = \sum\limits_{c = 1}^{C} {l^{c} } (m^{c} - m)(m^{c} - m)^{T}$$
(10)

where m represents the mean of all classes.

The goal is to maximize the between-class measure while minimizing the within-class measure. The discriminative projection basis W and the eigenvalues Λ are obtained by solving the regularized linear discriminant analysis (LDA) eigendecomposition:

$$(S_{w} + \varepsilon I)^{ - 1} S_{b} W = W\Lambda$$
(11)

where ε is a small regularization constant (\(10^{ - 3}\) in our experiments) and I is the identity matrix. Thus, the original p-dimensional space is projected onto a final d-dimensional space using LDA.
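The scatter matrices and the regularized eigendecomposition of formulas (9)–(11) might be implemented as in the sketch below. For clarity it works directly in the p-dimensional space and omits the preliminary PCA step mentioned in Sect. 3.1, so it is a conceptual illustration rather than the exact implementation used in the experiments.

```python
import numpy as np

def lda_projection(X, labels, d, eps=1e-3):
    """Regularised LDA projection basis (formulas (9)-(11)).

    X is p x n with one normalised feature vector delta_i per column;
    eps corresponds to the regularisation constant (1e-3) in formula (11).
    """
    labels = np.asarray(labels)
    p = X.shape[0]
    m = X.mean(axis=1, keepdims=True)                 # global mean
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)           # class mean m^c
        Sw += (Xc - mc) @ (Xc - mc).T                 # formula (9)
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T     # formula (10)
    # formula (11): (Sw + eps I)^{-1} Sb W = W Lambda
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + eps * np.eye(p), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:d]].real                   # d leading projections
```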

2.4 Robustness to occlusion

In order to evaluate the robustness of our algorithm to occlusion, we considered a single-sample-per-class experiment using neutral face images taken from the AR database. Our training set consisted of 100 face images of 100 different subjects taken from session 1, and our testing set consisted of one image per subject taken from session 2. We evaluated the performance of our algorithms in the case of synthetic occlusions. Our algorithms are LAG and LAG with LDA (LAG-LDA). All test images were artificially occluded by a dinosaur patch of increasing size at a random location; Fig. 5 shows the corresponding corrupted images. Figure 6 shows the recognition rates of the different algorithms without LDA, and Fig. 7 shows the recognition rates with LDA. As shown in Figs. 6 and 7, our algorithms (LAG and LAG-LDA) consistently outperform the compared methods; they maintain a recognition rate over 75 % even when the percentage of occlusion is about 4 %. In addition, LAG also achieves better FR accuracy than the compared methods both without and with LDA.

Fig. 5
figure 5

Artificial occlusion images using a dinosaur patch of increasing size at random location
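The synthetic occlusion itself can be reproduced with a few lines of NumPy, as sketched below; an arbitrary patch array stands in for the dinosaur image, and the helper name is purely illustrative.

```python
import numpy as np

def occlude_random(image, patch, rng=None):
    """Paste an occluding patch at a random position inside a face image."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    ph, pw = patch.shape[:2]
    y = rng.integers(0, image.shape[0] - ph + 1)      # random top-left corner
    x = rng.integers(0, image.shape[1] - pw + 1)
    out[y:y + ph, x:x + pw] = patch                   # overwrite with the patch
    return out
```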

Fig. 6
figure 6

Recognition rate as a function of the percentage of occlusion (without LDA)

Fig. 7
figure 7

Recognition rate as a function of the percentage of occlusion (with LDA)

Figure 8 compares the recognition rates of LAG and LAG-LDA; as shown in Fig. 8, the two results are very close. Figure 9 shows the running time of all methods (with and without LDA). It is worth noting that the methods with LDA achieve a faster running time than those without LDA.

Fig. 8
figure 8

Recognition rate as a function of the percentage of occlusion (LAG and LAG-LDA)

Fig. 9
figure 9

Running time as a function of the percentage of occlusion

3 Experiments

In this section, we evaluate the robustness of our method for FR. Three publicly available face databases, namely AR [26], Extended Yale B [27], and CMU PIE [28], are used for experimental evaluation; all images are resized to 60 × 60. To verify the effectiveness of the proposed method, in this series of experiments we evaluate its performance and compare it with that of several methods using LDA and the 1-NN classifier. To evaluate performance in the case of synthetic occlusions, a series of similar experiments was also performed in which all test images were artificially occluded by a dinosaur patch of resolution 30 × 30 at a random position.

3.1 Parameter selection

For GB [19] we set λ = 1 for all experiments. For MSR [16] we set hsiz = [7, 15, 21]. For GF [17] we set σ = 0.75. For TT [20] we set γ = 0.2, σ 0 = 1, σ 1 = 2, α = 0.1, and τ = 10. For WF [21] we set σ = 0.75, nn = 9, and α = 2. For the proposed LAG, we set the neighbor radius L = 2 for AR and Extended Yale B, and L = 4 for CMU PIE. In the 1-NN classifier, the dissimilarity measure based on cosine distance is adopted. In LDA, the PCA ratio is set to 1, so all non-zero eigenvalues are kept.
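A minimal sketch of the 1-NN classifier with cosine dissimilarity used throughout the experiments is given below; the function name and the matrix-based formulation are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def nn_classify_cosine(train, train_labels, test):
    """1-NN classification with cosine dissimilarity.

    `train` and `test` hold one low-dimensional feature vector per row.
    """
    train_labels = np.asarray(train_labels)
    tr = train / np.linalg.norm(train, axis=1, keepdims=True)
    te = test / np.linalg.norm(test, axis=1, keepdims=True)
    dist = 1.0 - te @ tr.T                    # cosine dissimilarity matrix
    return train_labels[np.argmin(dist, axis=1)]
```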

3.2 AR database

The AR database consists of more than 4000 frontal view face images of 126 subjects. Each subject has up to 26 images taken in two sessions. The first session contains 13 images, numbered from 1 to 13, including different facial expressions (1–4), illumination changes (5–7), and different occlusions under different illumination changes (8–13). The second session duplicates the first session 2 weeks later. We randomly select a subset with 100 subjects.

Figure 10 shows a sample of images used in our experiments. Figure 10a shows the non-occluded images of session 1. Figure 10b shows the non-occluded images of session 2. Figure 10c shows the corresponding artificial occlusion images of (b). Figure 10d shows the face images occluded by the scarf or sunglasses.

Fig. 10
figure 10

Face images of the same subject taken from the AR database. a Shows the non-occluded images of session 1. b Shows the non-occluded images of session 2. c Shows the corresponding artificial occlusion images of b. d Shows the face images occluded by the scarf or sunglasses

We investigate the robustness of our scheme in the cases of facial expression variations, illumination variations, occlusion (sunglasses) with illumination changes, and occlusion (scarf) with illumination changes. Table 2 provides the detailed information of each experiment (referred to as Exp.). The best recognition rates are shown in Tables 3, 4 and 5; the dimension that results in the best performance for each method is given in parentheses.

Table 2 Detailed information of all experiments of the AR database
Table 3 Best recognition rate comparison on AR face database with different methods (from Exp. 1 to Exp. 4)
Table 4 Best recognition rate comparison on AR face database with different methods (from Exp. 5 to Exp. 8 and with LDA)
Table 5 Best recognition rate comparison on AR face database with different methods (from Exp. 9 to Exp. 12)

From Exp. 1 to Exp. 4, we use images 1–4 of session 1 for training. As can be seen from Table 3, LAG-LDA achieves a better recognition rate than the other methods: 93.25 % for different expressions (Exp. 1) and 89.33 % for illumination changes (Exp. 2). For Exp. 3 and Exp. 4, which are more difficult experiments on the AR database, LAG-LDA achieves 67 % for occlusion (sunglasses) with illumination changes and 67.33 % for occlusion (scarf) with illumination changes.

Furthermore, from Exp. 5 to Exp. 8, we use images 8–13 of session 1 for training; these are also very difficult experiments on the AR database. As can be seen from Table 4, LAG-LDA again achieves a better recognition rate than the other methods: 75.25 % for different expressions (Exp. 5) and 90.67 % for illumination changes (Exp. 6). Because the occlusion images are used as the training set, LAG-LDA also achieves good results in Exp. 7, with 92 % for occlusion (sunglasses) with illumination changes, and 73.33 % in Exp. 8 for occlusion (scarf) with illumination changes.

In order to evaluate the performance of our algorithm in the case of synthetic occlusions, we carried out four additional experiments (Exp. 9, Exp. 10, Exp. 11 and Exp. 12) on the AR database. They mirror Exp. 1, Exp. 2, Exp. 5 and Exp. 6, the only difference being that all test images were artificially occluded by a dinosaur patch at a random position. Table 5 shows the corresponding results. As can be seen from Table 5, the proposed LAG-LDA performs better than the other state-of-the-art methods, whose performance is not satisfactory. It appears that the seven compared methods are sensitive to artificial occlusion, whereas LAG-LDA is more robust to it and thus achieves better results.

3.3 Extended Yale B database

The Extended Yale B database contains 2414 frontal face images of 38 individuals. For each individual, about 64 pictures were taken under various laboratory-controlled lighting conditions. Figure 11 shows some of the face images used in our experiments: Fig. 11a shows the non-occluded images of one subject, and Fig. 11b the corresponding artificially occluded images.

Fig. 11
figure 11

Partial face images of the same subject taken from the Extended Yale B database. a Shows the non-occluded images of one subject, and the corresponding images were artificially occluded by a dinosaur patch at random position in (b)

We carried out the following two experiments on the Extended Yale B database. In Exp. 1, a random subset with d (= 15, 20, 25, 32) labeled images per individual is taken to form the training set, and the rest of the database is considered the testing set. We summarize the best recognition rates of these methods in Table 6; the dimension that results in the best performance for each method is given in parentheses. Exp. 2 is identical to Exp. 1 except that all test images were artificially occluded by a dinosaur patch at a random position; Table 7 summarizes the corresponding best recognition rates, again with the best-performing dimension in parentheses.

Table 6 Best recognition rate comparison on Extended Yale B face database with different methods
Table 7 Best recognition rate comparison on Extended Yale B face database with different methods (dinosaur occlusion)

As can be seen from Table 6, LAG-LDA obtains a better recognition rate than the other algorithms; in all experiments, its recognition rate exceeds 99 %. For each method, we calculate the standard deviation of all the recognition rates. As can be seen from Table 6, LAG-LDA obtains smaller standard deviations than the other methods, so its performance is more stable. Therefore, our algorithm is more robust to illumination changes.

As can be seen from Table 7, the performance of all methods degrades compared with Table 6. However, LAG-LDA still obtains a better recognition rate than the other algorithms, again exceeding 99 % in all experiments. For each method, we also calculate the standard deviation of all the recognition rates; LAG-LDA still obtains a smaller standard deviation than the other methods, so its performance remains more stable. We therefore believe that our algorithm is more robust to illumination changes combined with occlusions.

3.4 CMU PIE database

The CMU PIE database consists of more than 41,000 face images of 68 subjects, containing faces under varying pose, illumination, and expression. We used the five near-frontal poses (C05, C07, C09, C27, C29), giving a total of 170 images per subject. Figure 12 shows some of the face images used in our experiments: Fig. 12a shows the non-occluded images of one subject, and Fig. 12b the corresponding artificially occluded images.

Fig. 12
figure 12

Partial face images of the same subject taken from the CMU PIE database. a Shows the non-occluded images of one subject, and the corresponding images were artificially occluded by a dinosaur patch at random position in (b)

We carried out the following two experiments on the CMU PIE database. In Exp. 1, a random subset with d (= 10, 15, 20 and 25) labeled images per individual is taken to form the training set, and the rest of the database is considered the testing set. We summarize the best recognition rates of these methods in Table 8; the dimension that results in the best performance for each method is given in parentheses. Exp. 2 is identical to Exp. 1 except that all test images were artificially occluded by a dinosaur patch at a random position; Table 9 summarizes the corresponding best recognition rates, again with the best-performing dimension in parentheses.

Table 8 Best recognition rate comparison on CMU PIE face database with different methods
Table 9 Best recognition rate comparison on CMU PIE face database with different methods (dinosaur occlusion)

As can be seen from Table 8, our method LAG-LDA achieves the best performance among all methods, irrespective of the training sample size; in all experiments, its recognition rate exceeds 92 %. For each method, we calculate the standard deviation of all the recognition rates. Our method obtains the second smallest standard deviation; however, it outperforms the method with the smallest standard deviation (HE-LDA) in recognition accuracy by approximately 84, 85, 86 and 86 % for training sample sizes of 10, 15, 20 and 25, respectively. In addition, LAG-LDA outperforms the second best method (MSR-LDA) by approximately 6, 4, 5 and 5 % for the same training sample sizes. We therefore believe that our algorithm is more robust to pose changes.

As can be seen from Table 9, the performance of all methods degrades compared with Table 8. However, LAG-LDA still achieves the best performance among all methods, irrespective of the training sample size; in all experiments, its recognition rate exceeds 84 %. For each method, we also calculate the standard deviation of all the recognition rates. Our method does not obtain the smallest standard deviation; however, it outperforms the method with the smallest standard deviation (HE-LDA) in recognition accuracy by approximately 76, 79, 79 and 80 % for training sample sizes of 10, 15, 20 and 25, respectively. In addition, LAG-LDA outperforms the second best method (GF-LDA) by approximately 21, 21, 22 and 20 % for the same training sample sizes. We therefore believe that our algorithm is more robust to pose changes combined with occlusions.

As can be seen from Tables 8 and 9, LAG-LDA obtains better results with or without occlusion, whereas the other methods are very sensitive to artificial occlusion. Therefore, LAG-LDA is the more stable method.

3.5 Parameter analysis

In this section, we discuss the influence of the neighbor radius L. The performance of LAG-LDA with different neighbor radii is evaluated on the face databases mentioned above. On the AR database, the first session of each individual is used for training and the second session for testing. On the Extended Yale B database, 15 random images per class are chosen for training and the rest for testing. Likewise, on the CMU PIE database, 15 random images per class are selected for training and the remaining images for testing.

The recognition rates of LAG-LDA with respect to different neighbor radii on the three databases are shown in Table 10. As can be seen from Table 10, LAG-LDA achieves better results when L is set to 2 on AR and Extended Yale B. It is worth noting that LAG-LDA achieves better results when L is set to 4 on CMU PIE. Therefore, we generally choose the neighbor radius L = 2. For a face database with pose changes, we intuitively believe that a larger neighbor radius covers a larger local image block and can better adapt to pose changes; in this paper, we therefore choose L = 4 for CMU PIE.

Table 10 The performance of LAG-LDA with different neighbor radii

4 Conclusions

In this paper, we propose an image preprocessing method based on the LAG. The LAG operator is calculated by integrating more directional gradients; because of this, it captures more edge information for each pixel of an image and finally generates an LAG image, which yields a more robust dissimilarity measure between images. The LAG image is then normalized into a feature vector using the “z-score” method, and the dimensionality of this vector is reduced by linear discriminant analysis to yield a low-dimensional feature vector. Our experiments show that the proposed method is robust to different facial expressions, illumination variations and occlusions (including random occlusion positions and different occlusion sizes), and achieves better recognition rates.